langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-04 06:00:26 +00:00


Chetanya Rastogi 50c511d75f Add new loader to load PDF as HTML content (#2607)	2023-04-09 17:57:25 -07:00

Adds a new PDF loader using the existing dependency on PDFMiner. The new loader can help chunk text semantically into sections, because the output HTML can be parsed with `BeautifulSoup` to recover structured information (font sizes, page numbers, PDF headers and footers, etc.) that other PDF loaders may not expose.
__init__.py	Add new iFixit document loader (#1333)	2023-02-27 20:40:20 -08:00
test_bigquery.py	Harrison/big query (#2100)	2023-03-28 08:17:22 -07:00
test_bshtml.py	Add ability to pass kwargs to loader classes in `DirectoryLoader`, add ability to modify encoding and BeautifulSoup behaviour in `BSHTMLLoader` (#2275)	2023-04-01 12:48:27 -07:00
test_dataframe.py	rm pandas dependency (#2102)	2023-03-28 08:38:19 -07:00
test_duckdb.py	Harrison/duckdb (#2064)	2023-03-27 19:51:34 -07:00
test_email.py	Harrison/msg files (#2375)	2023-04-04 06:48:34 -07:00
test_figma.py	Harrison/figma doc loader (#1908)	2023-03-22 19:57:46 -07:00
test_gitbook.py	Harrison/gitbook (#2044)	2023-03-28 15:28:33 -07:00
test_ifixit.py	Add new iFixit document loader (#1333)	2023-02-27 20:40:20 -08:00
test_pdf.py	Add new loader to load pdf as html content (#2607)	2023-04-09 17:57:25 -07:00
test_sitemap.py	Harrison/site map (#2061)	2023-03-27 16:28:08 -07:00
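
The commit message for #2607 says the loader's HTML output can be parsed to chunk text by font size. A minimal sketch of that idea follows, under several assumptions: the sample HTML is hypothetical and only approximates what a PDF-to-HTML conversion might emit (spans with an inline `font-size:Npx` style); the names `FontSizeCollector`, `split_into_sections`, and `heading_min_size` are invented for illustration; and Python's standard-library `html.parser` is swapped in for `BeautifulSoup` so the example has no third-party dependencies. It is not the loader's own implementation.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample; real converter output may differ in structure.
SAMPLE_HTML = (
    '<div><span style="font-family: Times; font-size:18px">Introduction</span></div>'
    '<div><span style="font-family: Times; font-size:10px">First paragraph. </span>'
    '<span style="font-family: Times; font-size:10px">Second sentence.</span></div>'
    '<div><span style="font-family: Times; font-size:18px">Methods</span></div>'
    '<div><span style="font-family: Times; font-size:10px">Details here.</span></div>'
)

FONT_SIZE = re.compile(r"font-size:\s*(\d+)px")


class FontSizeCollector(HTMLParser):
    """Collect [font_size, text] runs, merging adjacent runs of equal size."""

    def __init__(self):
        super().__init__()
        self.runs = []    # list of [font_size, text]
        self._stack = []  # font sizes of the currently open <span> tags

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            style = dict(attrs).get("style") or ""
            match = FONT_SIZE.search(style)
            self._stack.append(int(match.group(1)) if match else None)

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        size = self._stack[-1] if self._stack else None
        if size is None or not data.strip():
            return  # skip text with no known font size and pure whitespace
        if self.runs and self.runs[-1][0] == size:
            self.runs[-1][1] += data
        else:
            self.runs.append([size, data])


def split_into_sections(html, heading_min_size):
    """Treat runs at or above heading_min_size as headings; attach smaller text
    to the most recent heading. Text before the first heading is dropped."""
    collector = FontSizeCollector()
    collector.feed(html)
    collector.close()
    sections = []
    for size, text in collector.runs:
        if size >= heading_min_size:
            sections.append({"heading": text.strip(), "body": ""})
        elif sections:
            sections[-1]["body"] += text
    for section in sections:
        section["body"] = section["body"].strip()
    return sections


sections = split_into_sections(SAMPLE_HTML, heading_min_size=14)
for section in sections:
    print(section["heading"], "->", section["body"])
```

The heading threshold is a fixed guess here; on real documents it would likely need to be derived from the distribution of font sizes in the file.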