langchain/docs/modules/document_loaders/examples
Tim Asp 030ce9f506
fix import error of bs4 (#1952)
Ran into a broken build if bs4 wasn't installed in the project.

Minor tweak to follow the other doc loaders' optional package-loading
conventions.
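
A minimal sketch of what an optional package-loading convention can look like, assuming the import is deferred until the loader is constructed. The helper `require_optional`, the class `LocalHTMLLoader`, and the choice of `ValueError` are hypothetical names and choices for illustration, not the code from this commit.

```python
import importlib


def require_optional(module_name: str, pip_name: str):
    """Import an optional dependency lazily, raising an actionable error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Assumption: surface a readable install hint instead of a bare ImportError.
        raise ValueError(
            f"Could not import the `{module_name}` package. "
            f"Install it with `pip install {pip_name}`."
        ) from exc


class LocalHTMLLoader:
    """Hypothetical loader: bs4 is imported only when the loader is built."""

    def __init__(self, file_path: str):
        # Deferred import: importing this module never fails when bs4 is absent;
        # only constructing the loader requires it.
        self._bs4 = require_optional("bs4", "beautifulsoup4")
        self.file_path = file_path
```

With this shape, a project that never instantiates the loader can build without bs4 installed, and one that does gets an install hint instead of a failure at import time.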

Also updated html docs to include reference to this new html loader.

Side note: should there be two different html-to-text document loaders?
This new one only handles local files, while the existing unstructured
html loader handles HTML from both local and remote sources. So it seems
the only improvement was adding the title to the metadata, which is useful
but could also be added to `html.py`.
2023-03-23 21:56:13 -07:00
example_data fix import error of bs4 (#1952) 2023-03-23 21:56:13 -07:00
airbyte_json.ipynb Added initial capital letter to bullet points that had it missing (#1000) 2023-02-11 20:31:34 -08:00
azlyrics.ipynb adding webpage loading logic (#942) 2023-02-09 07:52:50 -08:00
blackboard.ipynb Harrison/blackboard loader (#1737) 2023-03-17 08:02:44 -07:00
college_confidential.ipynb adding webpage loading logic (#942) 2023-02-09 07:52:50 -08:00
CoNLL-U.ipynb add CoNLL-U document loader (#1297) 2023-02-26 17:27:00 -08:00
copypaste.ipynb copy paste loader (#1302) 2023-02-26 17:26:37 -08:00
csv.ipynb Harrison/add source column (#1784) 2023-03-19 10:32:13 -07:00
directory_loader.ipynb directory loader improvements (#1162) 2023-02-19 20:47:08 -08:00
email.ipynb Harrison/unstructured structured (#1004) 2023-02-12 07:36:11 -08:00
evernote.ipynb Harrison/evernote nb (#1078) 2023-02-15 22:47:30 -08:00
facebook_chat.ipynb Harrison/fb loader (#1277) 2023-02-24 07:22:48 -08:00
figma.ipynb Harrison/figma doc loader (#1908) 2023-03-22 19:57:46 -07:00
gcs_directory.ipynb Harrison/add roam loader (#939) 2023-02-08 00:35:33 -08:00
gcs_file.ipynb Harrison/add roam loader (#939) 2023-02-08 00:35:33 -08:00
gitbook.ipynb Harrison/updating docs (#1196) 2023-02-20 22:54:26 -08:00
googledrive.ipynb add GoogleDriveLoader (#914) 2023-02-06 21:44:35 -08:00
gutenberg.ipynb gutenberg books (#946) 2023-02-08 12:00:47 -08:00
hn.ipynb Harrison/hn loader (#1130) 2023-02-17 15:15:02 -08:00
html.ipynb fix import error of bs4 (#1952) 2023-03-23 21:56:13 -07:00
ifixit.ipynb Add new iFixit document loader (#1333) 2023-02-27 20:40:20 -08:00
image.ipynb feat: document loader for image files (#1330) 2023-02-27 14:43:32 -08:00
imsdb.ipynb adding webpage loading logic (#942) 2023-02-09 07:52:50 -08:00
markdown.ipynb feat: document loader for markdown files (#1558) 2023-03-09 10:55:07 -08:00
notebook.ipynb cleanup (#1274) 2023-02-24 07:38:24 -08:00
notion.ipynb update docs (#905) 2023-02-06 00:26:20 -08:00
obsidian.ipynb Harrison/obsidian (#920) 2023-02-06 22:21:16 -08:00
pdf.ipynb cleanup: unify 3 different pdf loaders, rename PagedPDFSplitter (#1615) 2023-03-13 23:06:50 -07:00
powerpoint.ipynb Harrison/unstructured structured (#1004) 2023-02-12 07:36:11 -08:00
readthedocs_documentation.ipynb Harrison/unstructured support (#903) 2023-02-05 23:02:07 -08:00
roam.ipynb Harrison/add roam loader (#939) 2023-02-08 00:35:33 -08:00
s3_directory.ipynb Harrison/add roam loader (#939) 2023-02-08 00:35:33 -08:00
s3_file.ipynb Harrison/add roam loader (#939) 2023-02-08 00:35:33 -08:00
srt.ipynb add srt loader (#1140) 2023-02-18 10:58:39 -08:00
telegram.ipynb Harrison/telegram loader (#1080) 2023-02-15 23:24:32 -08:00
unstructured_file.ipynb feat: allow the unstructured kwargs to be passed in to Unstructured document loaders (#1667) 2023-03-14 18:15:28 -07:00
url.ipynb feat: adds UnstructuredURLLoader for loading data from urls (#979) 2023-02-10 10:18:38 -08:00
web_base.ipynb adding webpage loading logic (#942) 2023-02-09 07:52:50 -08:00
word_document.ipynb Remove redundant .docx loader (closes #1716) + update how_to_guides.rst (#1891) 2023-03-22 15:19:42 -07:00
youtube.ipynb bump version to 106 (#1562) 2023-03-09 10:20:54 -08:00