langchain/docs/modules/document_loaders/examples
Tim Asp 030ce9f506
fix import error of bs4 (#1952)
The build broke when bs4 wasn't installed in the project.

Minor tweak to follow the other doc loaders' optional package-loading
conventions.
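
For illustration, the optional package-loading convention mentioned above usually means deferring the import until the loader actually runs, so that importing the project never fails when bs4 is absent. The sketch below is not the code from this commit: `require_optional` and `LocalHTMLLoader` are hypothetical names, and the error wording is an assumption.

```python
import importlib


def require_optional(module_name: str, pip_name: str):
    """Import an optional dependency on demand (hypothetical helper).

    Raises ImportError with an install hint if the package is missing.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Could not import the `{module_name}` package. "
            f"Install it with `pip install {pip_name}`."
        ) from exc


class LocalHTMLLoader:
    """Hypothetical loader name, used only for this sketch."""

    def __init__(self, file_path: str):
        # No third-party import happens here, so constructing the loader
        # (and importing this module) works even without bs4 installed.
        self.file_path = file_path

    def load(self) -> str:
        # bs4 is imported only when a file is actually parsed.
        bs4 = require_optional("bs4", "beautifulsoup4")
        with open(self.file_path, encoding="utf-8") as f:
            soup = bs4.BeautifulSoup(f, "html.parser")
        return soup.get_text()
```

With this shape, a missing bs4 only surfaces when `load()` is called, and the error message tells the user which package to install.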

Also updated the HTML docs to reference this new HTML loader.

Side note: should there be two different HTML-to-text document loaders?
This new one only handles local files, while the existing unstructured
HTML loader handles both local and remote HTML. So the main improvement
seems to be adding the title to the metadata, which is useful but
could also be added to `html.py`.
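
As a rough picture of what "adding the title to the metadata" amounts to, the sketch below pulls the `<title>` text out of an HTML string and stores it next to the source path. It deliberately uses the standard library's `html.parser` rather than bs4 so that it runs without extra packages; the function name `html_to_document` and the dictionary layout are assumptions for illustration, not the loader's actual output type.

```python
from html.parser import HTMLParser


class _TitleAndText(HTMLParser):
    """Collects the <title> text separately from the remaining page text."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)
        elif data.strip():
            self.text_parts.append(data.strip())


def html_to_document(html: str, source: str) -> dict:
    """Hypothetical helper: return page text plus source/title metadata."""
    parser = _TitleAndText()
    parser.feed(html)
    parser.close()
    return {
        "page_content": "\n".join(parser.text_parts),
        "metadata": {
            "source": source,
            "title": "".join(parser.title_parts).strip(),
        },
    }
```

The same metadata step could in principle be applied to text fetched from a URL, which is the point of the question about whether two separate loaders are needed.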
1 year ago
example_data fix import error of bs4 (#1952) 1 year ago
CoNLL-U.ipynb add CoNLL-U document loader (#1297) 1 year ago
airbyte_json.ipynb Added initial capital letter to bullet points that had it missing (#1000) 1 year ago
azlyrics.ipynb adding webpage loading logic (#942) 1 year ago
blackboard.ipynb Harrison/blackboard loader (#1737) 1 year ago
college_confidential.ipynb adding webpage loading logic (#942) 1 year ago
copypaste.ipynb copy paste loader (#1302) 1 year ago
csv.ipynb Harrison/add source column (#1784) 1 year ago
directory_loader.ipynb directory loader improvements (#1162) 1 year ago
email.ipynb Harrison/unstructured structured (#1004) 1 year ago
evernote.ipynb Harrison/evernote nb (#1078) 1 year ago
facebook_chat.ipynb Harrison/fb loader (#1277) 1 year ago
figma.ipynb Harrison/figma doc loader (#1908) 1 year ago
gcs_directory.ipynb Harrison/add roam loader (#939) 1 year ago
gcs_file.ipynb Harrison/add roam loader (#939) 1 year ago
gitbook.ipynb Harrison/updating docs (#1196) 1 year ago
googledrive.ipynb add GoogleDriveLoader (#914) 1 year ago
gutenberg.ipynb gutenberg books (#946) 1 year ago
hn.ipynb Harrison/hn loader (#1130) 1 year ago
html.ipynb fix import error of bs4 (#1952) 1 year ago
ifixit.ipynb Add new iFixit document loader (#1333) 1 year ago
image.ipynb feat: document loader for image files (#1330) 1 year ago
imsdb.ipynb adding webpage loading logic (#942) 1 year ago
markdown.ipynb feat: document loader for markdown files (#1558) 1 year ago
notebook.ipynb cleanup (#1274) 1 year ago
notion.ipynb update docs (#905) 1 year ago
obsidian.ipynb Harrison/obsidian (#920) 1 year ago
pdf.ipynb cleanup: unify 3 different pdf loaders, rename PagedPDFSplitter (#1615) 1 year ago
powerpoint.ipynb Harrison/unstructured structured (#1004) 1 year ago
readthedocs_documentation.ipynb Harrison/unstructured support (#903) 1 year ago
roam.ipynb Harrison/add roam loader (#939) 1 year ago
s3_directory.ipynb Harrison/add roam loader (#939) 1 year ago
s3_file.ipynb Harrison/add roam loader (#939) 1 year ago
srt.ipynb add srt loader (#1140) 1 year ago
telegram.ipynb Harrison/telegram loader (#1080) 1 year ago
unstructured_file.ipynb feat: allow the unstructured kwargs to be passed in to Unstructured document loaders (#1667) 1 year ago
url.ipynb feat: adds `UnstructuredURLLoader` for loading data from urls (#979) 1 year ago
web_base.ipynb adding webpage loading logic (#942) 1 year ago
word_document.ipynb Remove redundant .docx loader (closes #1716) + update how_to_guides.rst (#1891) 1 year ago
youtube.ipynb bump version to 106 (#1562) 1 year ago