langchain/langchain/document_loaders
blob42 c0cca393a7 DirectoryLoader parallel file loading 1 year ago
blob_loaders Add progress bar to filesystemblob loader, update pytest config for unit tests (#4212) 1 year ago
parsers Add PDF parser implementations (#4356) 1 year ago
__init__.py feat: add loader for open office odt files (#4405) 1 year ago
airbyte_json.py Dev2049/add modern treasury (#3924) 1 year ago
apify_dataset.py Harrison/apify (#2215) 1 year ago
arxiv.py `Arxiv` document loader (#3627) 1 year ago
azlyrics.py clean up loaders (#1178) 1 year ago
azure_blob_storage_container.py Minor: Remove duplicated word in error message (#2706) 1 year ago
azure_blob_storage_file.py Minor: Remove duplicated word in error message (#2706) 1 year ago
base.py Add PDF parser implementations (#4356) 1 year ago
bigquery.py Harrison/big query (#2100) 1 year ago
bilibili.py Added bilibili loader (#2673) (#2724) 1 year ago
blackboard.py hotfix (#1742) 1 year ago
blockchain.py Enhancement: option to Get All Tokens with a single Blockchain Document Loader call (#3797) 1 year ago
chatgpt.py Add ChatGPT Data Loader (#3336) 1 year ago
college_confidential.py clean up loaders (#1178) 1 year ago
confluence.py Improve error messages formatting in doc loaders (#4586) 1 year ago
conllu.py add CoNLL-U document loader (#1297) 1 year ago
csv_loader.py simplify csv args (#4182) 1 year ago
dataframe.py rm pandas dependency (#2102) 1 year ago
diffbot.py consistently use getLogger(__name__), no root logger (#2989) 1 year ago
directory.py DirectoryLoader parallel file loading 1 year ago
discord.py Harrison/discord loader (#3200) 1 year ago
duckdb_loader.py Minor: Remove duplicated word in error message (#2706) 1 year ago
email.py fix: pass unstructured kwargs down in all unstructured loaders (#2506) 1 year ago
epub.py fix: pass unstructured kwargs down in all unstructured loaders (#2506) 1 year ago
evernote.py Update and rename everynote.py to evernote.py (#1060) 1 year ago
facebook_chat.py Refactor TelegramChatLoader and FacebookChatLoader classes and add tests (#3863) 1 year ago
figma.py Dev2049/add modern treasury (#3924) 1 year ago
gcs_directory.py Support GCS Objects with `/` in GCS Loaders (#3356) 1 year ago
gcs_file.py Support GCS Objects with `/` in GCS Loaders (#3356) 1 year ago
git.py Harrison/cohere reranker (#3904) 1 year ago
gitbook.py Gitbook enhancements (#2279) 1 year ago
googledrive.py Improve error messages formatting in doc loaders (#4586) 1 year ago
gutenberg.py gutenberg books (#946) 1 year ago
helpers.py timeout on file encoding detection (#4479) 1 year ago
hn.py Refactor some loops into list comprehensions (#1185) 1 year ago
html.py feat: allow the unstructured kwargs to be passed in to Unstructured document loaders (#1667) 1 year ago
html_bs.py Add get_text_separator parameter to BSHTMLLoader (#3551) 1 year ago
hugging_face_dataset.py Harrison/hf document loader (#3394) 1 year ago
ifixit.py Harrison/ifixit (#1680) 1 year ago
image.py feat: allow the unstructured kwargs to be passed in to Unstructured document loaders (#1667) 1 year ago
image_captions.py Improve error messages formatting in doc loaders (#4586) 1 year ago
imsdb.py clean up loaders (#1178) 1 year ago
json_loader.py JSON loader (#4067) 1 year ago
markdown.py fix: pass unstructured kwargs down in all unstructured loaders (#2506) 1 year ago
mediawikidump.py Harrison/media wiki xml (#4072) 1 year ago
modern_treasury.py Dev2049/add modern treasury (#3924) 1 year ago
notebook.py fix imports (#1288) 1 year ago
notion.py Harrison/unstructured support (#903) 1 year ago
notiondb.py opt: document_loader notiondb to extract url (#4222) 1 year ago
obsidian.py Dev2049/obsidian patch (#4204) 1 year ago
odt.py feat: add loader for open office odt files (#4405) 1 year ago
onedrive.py Improve error messages formatting in doc loaders (#4586) 1 year ago
onedrive_file.py Harrison/one drive loader (#4081) 1 year ago
pdf.py Improve error messages formatting in doc loaders (#4586) 1 year ago
powerpoint.py feat: allow the unstructured kwargs to be passed in to Unstructured document loaders (#1667) 1 year ago
python.py Add PythonLoader which auto-detects encoding of Python files (#3311) 1 year ago
readthedocs.py fix: ReadTheDocs loader main content filter (#2609) 1 year ago
reddit.py Langchain with reddit (#3661) (#3768) 1 year ago
roam.py Harrison/add roam loader (#939) 1 year ago
rtf.py feat: add loader for rich text files (#3227) 1 year ago
s3_directory.py Minor: Remove duplicated word in error message (#2706) 1 year ago
s3_file.py Improve error messages formatting in doc loaders (#4586) 1 year ago
sitemap.py Add an option to extract more metadata from crawled websites (#4347) 1 year ago
slack_directory.py Add Slack Directory Loader (#2841) 1 year ago
spreedly.py Harrison/spreedly (#3937) 1 year ago
srt.py add srt loader (#1140) 1 year ago
stripe.py Dev2049/add modern treasury (#3924) 1 year ago
telegram.py Refactor TelegramChatLoader and FacebookChatLoader classes and add tests (#3863) 1 year ago
text.py verbose catch of open() errors on TextLoader (#4479) 1 year ago
toml.py Add PDF parser implementations (#4356) 1 year ago
twitter.py Add Twitter Tweet Loader (#3050) 1 year ago
unstructured.py feat: add loader for open office odt files (#4405) 1 year ago
url.py enhancement: add elements mode to `UnstructuredURLLoader` (#3456) 1 year ago
url_playwright.py Harrison/playwright selector (#3185) 1 year ago
url_selenium.py Add support for passing binary_location to the SeleniumURLLoader when creating Chrome or Firefox web drivers (#4305) 1 year ago
web_base.py Improve error messages formatting in doc loaders (#4586) 1 year ago
whatsapp_chat.py Update WhatsAppChatLoader to include the character ~ in the sender name (#4420) 1 year ago
wikipedia.py added `Wikipedia` document loader (#4141) 1 year ago
word_document.py Harrison/doc2txt (#3772) 1 year ago
youtube.py Improve error messages formatting in doc loaders (#4586) 1 year ago