langchain/langchain/document_loaders
Tim Asp 030ce9f506
fix import error of bs4 (#1952)
The build broke when bs4 wasn't installed in the project.

Minor tweak to follow the other doc loaders' optional package-loading
conventions.
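
A minimal sketch of what an optional package-loading convention can look
like, assuming the third-party import is deferred until the loader is
actually used. `import_optional` and `LazyHTMLLoader` are hypothetical
names for illustration, not the code in `html_bs.py`:

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, pip_name: str) -> ModuleType:
    """Import an optional dependency only at the point where it is needed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Turn the bare import failure into an actionable message.
        raise ValueError(
            f"Could not import the `{module_name}` package. "
            f"Install it with `pip install {pip_name}`."
        ) from exc


class LazyHTMLLoader:
    """Hypothetical loader: importing or constructing it never touches bs4."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def load(self) -> str:
        # bs4 is imported here rather than at module top level, so a project
        # without bs4 installed can still import this module.
        bs4 = import_optional("bs4", "beautifulsoup4")
        with open(self.file_path, encoding="utf-8") as f:
            soup = bs4.BeautifulSoup(f, "html.parser")
        return soup.get_text()
```

With this shape, a missing bs4 only surfaces when `load()` is called, and
the error names the package to install.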

Also updated the HTML docs to include a reference to this new HTML loader.

Side note: should there be two different HTML-to-text document loaders?
This new one only handles local files, while the existing unstructured
HTML loader handles HTML from both local and remote sources. So the
improvement seems to be adding the title to the metadata, which is useful
but could also be added to `html.py`.
2023-03-23 21:56:13 -07:00
__init__.py Remove redundant .docx loader (closes #1716) + update how_to_guides.rst (#1891) 2023-03-22 15:19:42 -07:00
airbyte_json.py Harrison/airbyte (#989) 2023-02-10 18:08:00 -08:00
azlyrics.py clean up loaders (#1178) 2023-02-20 08:20:48 -08:00
base.py Harrison/unstructured support (#903) 2023-02-05 23:02:07 -08:00
blackboard.py hotfix (#1742) 2023-03-17 09:05:08 -07:00
college_confidential.py clean up loaders (#1178) 2023-02-20 08:20:48 -08:00
conllu.py add CoNLL-U document loader (#1297) 2023-02-26 17:27:00 -08:00
csv_loader.py Allow passing in encoding to csv_loader (#1836) 2023-03-20 22:03:00 -07:00
directory.py Add HTML document_loader that includes page title metadata (#1720) 2023-03-16 21:47:17 -07:00
email.py Harrison/unstructured structured (#1004) 2023-02-12 07:36:11 -08:00
evernote.py Update and rename everynote.py to evernote.py (#1060) 2023-02-15 22:41:42 -08:00
facebook_chat.py Harrison/fb loader (#1277) 2023-02-24 07:22:48 -08:00
figma.py Harrison/figma doc loader (#1908) 2023-03-22 19:57:46 -07:00
gcs_directory.py Harrison/add roam loader (#939) 2023-02-08 00:35:33 -08:00
gcs_file.py Harrison/add roam loader (#939) 2023-02-08 00:35:33 -08:00
gitbook.py Add optional base_url arg to GitbookLoader (#1552) 2023-03-09 16:32:40 -08:00
googledrive.py Add service account support to Google Drive (#1761) 2023-03-18 19:55:17 -07:00
gutenberg.py gutenberg books (#946) 2023-02-08 12:00:47 -08:00
hn.py Refactor some loops into list comprehensions (#1185) 2023-02-20 16:38:43 -08:00
html_bs.py fix import error of bs4 (#1952) 2023-03-23 21:56:13 -07:00
html.py feat: allow the unstructured kwargs to be passed in to Unstructured document loaders (#1667) 2023-03-14 18:15:28 -07:00
ifixit.py Harrison/ifixit (#1680) 2023-03-14 21:17:50 -07:00
image.py feat: allow the unstructured kwargs to be passed in to Unstructured document loaders (#1667) 2023-03-14 18:15:28 -07:00
imsdb.py clean up loaders (#1178) 2023-02-20 08:20:48 -08:00
markdown.py feat: document loader for markdown files (#1558) 2023-03-09 10:55:07 -08:00
notebook.py fix imports (#1288) 2023-02-25 08:48:02 -08:00
notion.py Harrison/unstructured support (#903) 2023-02-05 23:02:07 -08:00
obsidian.py add encoding parameter to ObsidianLoader (#1752) 2023-03-19 09:48:31 -07:00
pdf.py feat: allow the unstructured kwargs to be passed in to Unstructured document loaders (#1667) 2023-03-14 18:15:28 -07:00
powerpoint.py feat: allow the unstructured kwargs to be passed in to Unstructured document loaders (#1667) 2023-03-14 18:15:28 -07:00
readthedocs.py Harrison/rtd loader (#1513) 2023-03-07 21:09:54 -08:00
roam.py Harrison/add roam loader (#939) 2023-02-08 00:35:33 -08:00
s3_directory.py Harrison/add roam loader (#939) 2023-02-08 00:35:33 -08:00
s3_file.py Support S3 Object keys with / in S3FileLoader (#1517) 2023-03-08 16:17:26 -08:00
srt.py add srt loader (#1140) 2023-02-18 10:58:39 -08:00
telegram.py fix telegram imports (#1110) 2023-02-17 00:53:01 -08:00
text.py Harrison/0083 (#996) 2023-02-11 08:29:28 -08:00
unstructured.py feat: allow the unstructured kwargs to be passed in to Unstructured document loaders (#1667) 2023-03-14 18:15:28 -07:00
url.py Fix description of UnstructuredURLLoader & UnstructuredHTMLLoader (#1570) 2023-03-10 07:08:58 -08:00
web_base.py Harrison/headers (#1696) 2023-03-15 13:13:21 -07:00
word_document.py feat: allow the unstructured kwargs to be passed in to Unstructured document loaders (#1667) 2023-03-14 18:15:28 -07:00
youtube.py Harrison/subtitles (#1842) 2023-03-20 22:53:52 -07:00