Mirror of https://github.com/hwchase17/langchain (synced 2024-10-29 17:07:25 +00:00)
b157e0c1c3
The `BSHTMLLoader` document loader loads an HTML document, extracts its text, and adds the page title to the returned Document's metadata. The loader uses the already-installed `bs4` package to extract both the text content and the page title. This PR also includes an example HTML file and an integration test that runs against that file.

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>

| File |
|---|
| example.html |
| hello.pdf |
| layout-parser-paper.pdf |
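
The behavior described in the commit message (extract the text of an HTML document and record the page title in the returned Document's metadata) can be illustrated with a minimal sketch. This is not the `BSHTMLLoader` source: to stay dependency-free it uses Python's standard-library `html.parser` instead of `bs4`, and the `Document` class, the `load_html` function, and the `source`/`title` metadata keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Document:
    """Hypothetical stand-in for the Document type named in the commit message."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class _TextAndTitleParser(HTMLParser):
    """Collects text nodes and the contents of the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.skip_depth = 0  # > 0 while inside <script> or <style>
        self.title_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return  # ignore script and stylesheet bodies
        if self.in_title:
            self.title_parts.append(data)
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def load_html(html: str, source: str = "") -> Document:
    """Return a Document whose metadata carries the page title (assumed keys)."""
    parser = _TextAndTitleParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.title_parts).strip()
    return Document(
        page_content="\n".join(parser.text_parts),
        metadata={"source": source, "title": title},
    )


if __name__ == "__main__":
    sample = (
        "<html><head><title>Test Title</title></head>"
        "<body><h1>Hello</h1><p>World</p></body></html>"
    )
    doc = load_html(sample, source="example.html")
    print(doc.metadata["title"])
    print(doc.page_content)
```

An integration test in the spirit of the one the commit message mentions would load a file such as `example.html` and assert on both the extracted text and the title stored in the metadata.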