langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00


Daniel Chalef b157e0c1c3 Add HTML document_loader that includes page title metadata (#1720)		2023-03-16 21:47:17 -07:00

This `BSHTMLLoader` document_loader loads an HTML document, extracts its text, and adds the page title to the returned Document's metadata. The loader uses the already installed bs4 package to extract both the text content and the page title.

Included in this PR are an example HTML file and an integration test that runs against this file.

---------

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
example.html	Add HTML document_loader that includes page title metadata (#1720)	2023-03-16 21:47:17 -07:00
hello.pdf	Harrison/format agent instructions (#973)	2023-02-10 10:07:26 -08:00
layout-parser-paper.pdf	Harrison/remote paths pdf (#1544)	2023-03-08 20:53:37 -08:00
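
The commit message for `example.html` describes a loader that extracts the text of an HTML document and stores the page title in the result's metadata. The sketch below illustrates that idea only; it is not the `BSHTMLLoader` implementation. It swaps bs4 for the standard library's `html.parser` so that it runs without extra dependencies, and the names `TitleAndTextParser` and `load_html`, the plain-dict return shape, and the default `source` value are all hypothetical.

```python
from html.parser import HTMLParser


class TitleAndTextParser(HTMLParser):
    """Collects visible text and the <title> contents from an HTML string.

    Hypothetical helper for illustration; the real loader uses bs4.
    """

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.title_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style bodies
        if self._in_title:
            self.title_parts.append(data)
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def load_html(html, source="example.html"):
    """Return a document-like dict with the text and title metadata."""
    parser = TitleAndTextParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.title_parts).strip()
    return {
        "page_content": "\n".join(parser.text_parts),
        "metadata": {"source": source, "title": title},
    }


if __name__ == "__main__":
    sample = (
        "<html><head><title>Hello</title></head>"
        "<body><p>World</p><script>var x = 1;</script></body></html>"
    )
    print(load_html(sample))
```

In this sketch the title text also appears in `page_content`, since it is ordinary character data in the document; whether the actual loader does the same is not stated in the commit message.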