Mirror of https://github.com/hwchase17/langchain, synced 2024-10-29 17:07:25 +00:00
23231d65a9
| Different PDF libraries have different strengths and weaknesses. PyMuPDF does a good job of extracting as much content as possible from a document, regardless of source quality, and it is extremely fast (especially compared to Unstructured). https://pymupdf.readthedocs.io/en/latest/index.html | | |
|---|---|---|
| .. | | |
| __init__.py | | |
| test_ifixit.py | | |
| test_pdf.py | | |
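
The commit message above credits PyMuPDF with fast, thorough text extraction. As a rough illustration of what that extraction looks like, here is a minimal sketch. It is not the loader added in this commit: the helper names `extract_page_texts` and `load_pdf_texts` are hypothetical, and the code assumes PyMuPDF is installed and importable as `fitz`.

```python
from typing import Iterable, List


def extract_page_texts(pages: Iterable) -> List[str]:
    """Return the plain text of each page.

    Hypothetical helper: every item only needs a get_text() method,
    which PyMuPDF page objects provide.
    """
    return [page.get_text() for page in pages]


def load_pdf_texts(path: str) -> List[str]:
    """Open a PDF with PyMuPDF and return one string per page (hypothetical helper)."""
    # Imported lazily so extract_page_texts stays usable without PyMuPDF installed.
    import fitz  # PyMuPDF; assumed to be installed

    # A PyMuPDF document is iterable page by page and closes on exiting the block.
    with fitz.open(path) as doc:
        return extract_page_texts(doc)
```

Calling `load_pdf_texts("example.pdf")` on a local file (the file name is a placeholder) would return a list with one text string per page. Keeping the per-page step in its own function makes it easy to exercise with stand-in page objects in a test.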