langchain/tests/integration_tests/document_loaders
Chetanya Rastogi 50c511d75f
Add new loader to load pdf as html content (#2607)
Adds a new pdf loader using the existing dependency on PDFMiner. 

The new loader can be helpful for chunking texts semantically into
sections as the output html content can be parsed via `BeautifulSoup` to
get more structured and rich information about font size, page numbers,
pdf headers/footers, etc. which may not be available otherwise with
other pdf loaders
2023-04-09 17:57:25 -07:00
..
__init__.py Add new iFixit document loader (#1333) 2023-02-27 20:40:20 -08:00
test_bigquery.py Harrison/big query (#2100) 2023-03-28 08:17:22 -07:00
test_bshtml.py Add ability to pass kwargs to loader classes in DirectoryLoader, add ability to modify encoding and BeautifulSoup behaviour in BSHTMLLoader (#2275) 2023-04-01 12:48:27 -07:00
test_dataframe.py rm pandas dependency (#2102) 2023-03-28 08:38:19 -07:00
test_duckdb.py Harrison/duckdb (#2064) 2023-03-27 19:51:34 -07:00
test_email.py Harrison/msg files (#2375) 2023-04-04 06:48:34 -07:00
test_figma.py Harrison/figma doc loader (#1908) 2023-03-22 19:57:46 -07:00
test_gitbook.py Harrison/gitbook (#2044) 2023-03-28 15:28:33 -07:00
test_ifixit.py Add new iFixit document loader (#1333) 2023-02-27 20:40:20 -08:00
test_pdf.py Add new loader to load pdf as html content (#2607) 2023-04-09 17:57:25 -07:00
test_sitemap.py Harrison/site map (#2061) 2023-03-27 16:28:08 -07:00