mirror of
https://github.com/hwchase17/langchain
synced 2024-11-04 06:00:26 +00:00
50c511d75f
Adds a new PDF loader that uses the existing PDFMiner dependency. The loader can help chunk text semantically into sections: its HTML output can be parsed with `BeautifulSoup` to recover structured details such as font size, page numbers, and PDF headers/footers, which other PDF loaders may not expose. |
||
---|---|---|
__init__.py | ||
test_bigquery.py | ||
test_bshtml.py | ||
test_dataframe.py | ||
test_duckdb.py | ||
test_email.py | ||
test_figma.py | ||
test_gitbook.py | ||
test_ifixit.py | ||
test_pdf.py | ||
test_sitemap.py |
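
The commit description above says the loader's HTML output can be parsed to recover font sizes and used to split text into sections. The following is a minimal sketch of that idea, not the loader's actual implementation. To stay dependency-free it uses the standard library's `html.parser` in place of `BeautifulSoup`. The sample HTML is hand-written and only assumed to resemble PDFMiner's output, and the names `FontSizeParser`, `split_into_sections`, and `heading_min_size` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches inline styles such as "font-family: Times; font-size:20px".
FONT_SIZE_RE = re.compile(r"font-size:\s*(\d+)px")


class FontSizeParser(HTMLParser):
    """Collect (font_size, text) pairs from <span style="...font-size:Npx"> elements."""

    def __init__(self):
        super().__init__()
        self.snippets = []
        self._size_stack = []  # font sizes of the currently open spans

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            style = dict(attrs).get("style") or ""
            match = FONT_SIZE_RE.search(style)
            self._size_stack.append(int(match.group(1)) if match else None)

    def handle_endtag(self, tag):
        if tag == "span" and self._size_stack:
            self._size_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._size_stack and self._size_stack[-1] is not None:
            self.snippets.append((self._size_stack[-1], text))


def split_into_sections(html, heading_min_size):
    """Start a new section at every snippet whose font size reaches the threshold."""
    parser = FontSizeParser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    sections = []
    for size, text in parser.snippets:
        if size >= heading_min_size:
            sections.append({"heading": text, "body": []})
        elif sections:
            sections[-1]["body"].append(text)
        else:
            # Body text that appears before the first heading.
            sections.append({"heading": "", "body": [text]})
    return sections


# Hand-written sample; real PDFMiner output carries more attributes and markup.
SAMPLE_HTML = (
    '<div><span style="font-family: Times; font-size:20px">Introduction<br></span></div>'
    '<div><span style="font-family: Times; font-size:10px">First paragraph.<br></span></div>'
    '<div><span style="font-family: Times; font-size:20px">Methods<br></span></div>'
    '<div><span style="font-family: Times; font-size:10px">Second paragraph.<br></span></div>'
)

sections = split_into_sections(SAMPLE_HTML, heading_min_size=16)
for section in sections:
    print(section["heading"], section["body"])
```

A fixed size threshold is the simplest possible heuristic. In practice the cutoff would likely need to be derived per document, for example from the most common font size on the page.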