Commit Graph

6 Commits

Author SHA1 Message Date
Tim Asp
23231d65a9
Add PyMuPDF PDF loader (#1426)
Different PDF libraries have different strengths and weaknesses. PyMuPDF
does a good job of extracting the most content from the doc,
regardless of the source quality, and is extremely fast (especially compared to
Unstructured).

https://pymupdf.readthedocs.io/en/latest/index.html
2023-03-03 20:59:28 -08:00
Harrison Chase
7fb33fca47
chroma docs (#1012) 2023-02-12 23:02:01 -08:00
Harrison Chase
0998577dfe
Harrison/unstructured structured (#1004) 2023-02-12 07:36:11 -08:00
Harrison Chase
bbb06ca4cf
pdfminer (#1003) 2023-02-12 07:29:26 -08:00
Harrison Chase
c64f98e2bb
Harrison/format agent instructions (#973)
Co-authored-by: Andrew White <white.d.andrew@gmail.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
Co-authored-by: Peng Qu <82029664+pengqu123@users.noreply.github.com>
2023-02-10 10:07:26 -08:00
Harrison Chase
2ec25ddd4c
add unstructured examples (#913) 2023-02-06 18:13:46 -08:00