langchain/libs/community/langchain_community/document_loaders
Will Higgins 83d10df78d
community[patch]: Update firecrawl api key name (#22183)
Change 'FIREWALL' to 'FIRECRAWL' as I believe this may have been in
error. Other docs refer to 'FIRECRAWL_API_KEY'.

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-27 21:39:29 +00:00
..
blob_loaders community[patch]: Update doc-string in CloudBlobLoader (#22069) 2024-05-23 15:31:41 +00:00
parsers community[patch]: Fix remaining __inits__ in community (#22037) 2024-05-22 17:42:17 +00:00
__init__.py infra: rm unused # noqa violations (#22049) 2024-05-22 15:21:08 -07:00
acreom.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
airbyte_json.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
airbyte.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
airtable.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
apify_dataset.py community[patch]: update apify integration to attribute API activity to langchain (#21909) 2024-05-20 14:49:23 -07:00
arcgis_loader.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
arxiv.py community[minor]: Implement lazy_load() for ArxivLoader (#18664) 2024-03-06 09:16:49 -05:00
assemblyai.py community[patch]: docstrings update (#20301) 2024-04-11 16:23:27 -04:00
astradb.py (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) 2024-05-03 14:29:36 -04:00
async_html.py community[minor]: allow enabling proxy in aiohttp session in AsyncHTML (#19499) 2024-05-22 18:25:06 +00:00
athena.py community[minor]: import fix (#20995) 2024-04-29 10:32:50 -04:00
azlyrics.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
azure_ai_data.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
azure_blob_storage_container.py community[patch]: type ignore fixes (#18395) 2024-03-01 11:21:02 -08:00
azure_blob_storage_file.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
baiducloud_bos_directory.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
baiducloud_bos_file.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
base_o365.py community[minor]: Added propagation of document metadata from O365BaseLoader (#20663) 2024-05-23 11:42:19 -04:00
base.py core: Move document loader interfaces to core (#17723) 2024-03-06 13:59:00 -05:00
bibtex.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
bigquery.py (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) 2024-05-03 14:29:36 -04:00
bilibili.py community[patch]: docstrings update (#20301) 2024-04-11 16:23:27 -04:00
blackboard.py infra: rm unused # noqa violations (#22049) 2024-05-22 15:21:08 -07:00
blockchain.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
brave_search.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
browserbase.py community: updated Browserbase loader (#21757) 2024-05-16 08:21:23 -07:00
browserless.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
cassandra.py community[minor]: Add Cassandra ByteStore (#22064) 2024-05-23 10:46:23 -04:00
chatgpt.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
chm.py community[patch]: docstrings (#16810) 2024-02-09 12:48:57 -08:00
chromium.py community[patch]: Update comments for lazy_load method (#21063) 2024-05-01 01:20:57 -04:00
college_confidential.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
concurrent.py community[patch]: import flattening fix (#20110) 2024-04-10 13:01:19 -04:00
confluence.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
conllu.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
couchbase.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
csv_loader.py community: Fix CSVLoader columns is None (#20701) 2024-05-22 12:57:46 -07:00
cube_semantic.py community[patch]: Implement lazy_load() for CubeSemanticLoader (#18535) 2024-03-05 17:32:31 -08:00
datadog_logs.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
dataframe.py community[patch]: support modin document loader (#18866) 2024-03-10 18:40:04 -07:00
diffbot.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
directory.py community: fix DirectoryLoader progress bar (#19821) 2024-04-17 21:12:16 +00:00
discord.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
doc_intelligence.py docs: community docstring updates (#21040) 2024-04-29 17:40:23 -04:00
docugami.py (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) 2024-05-03 14:29:36 -04:00
docusaurus.py docs: docstrings langchain_community update (#14889) 2023-12-19 08:58:24 -05:00
dropbox.py infra: add print rule to ruff (#16221) 2024-02-09 16:13:30 -08:00
duckdb_loader.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
email.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
epub.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
etherscan.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
evernote.py infra: rm unused # noqa violations (#22049) 2024-05-22 15:21:08 -07:00
excel.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
facebook_chat.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
fauna.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
figma.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
firecrawl.py community[patch]: Update firecrawl api key name (#22183) 2024-05-27 21:39:29 +00:00
gcs_directory.py (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) 2024-05-03 14:29:36 -04:00
gcs_file.py (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) 2024-05-03 14:29:36 -04:00
generic.py community[patch]: import flattening fix (#20110) 2024-04-10 13:01:19 -04:00
geodataframe.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
git.py Merge pull request #18539 2024-03-06 13:25:14 -05:00
gitbook.py community[minor]: Implement lazy_load() for GitbookLoader (#18670) 2024-03-06 09:14:36 -05:00
github.py community: Implement lazy_load() for GithubFileLoader (#18584) 2024-03-05 09:35:50 -08:00
glue_catalog.py community[minor]: Add glue catalog loader (#20220) 2024-04-16 11:39:23 -04:00
google_speech_to_text.py (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) 2024-05-03 14:29:36 -04:00
googledrive.py (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) 2024-05-03 14:29:36 -04:00
gutenberg.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
helpers.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
hn.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
html_bs.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
html.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
hugging_face_dataset.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
hugging_face_model.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
ifixit.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
image_captions.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
image.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
imsdb.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
iugu.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
joplin.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
json_loader.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
kinetica_loader.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
lakefs.py docs: docstrings langchain_community update (#14889) 2023-12-19 08:58:24 -05:00
larksuite.py community[minor]: Add LarkSuite wiki document loader. (#21016) 2024-04-29 10:37:50 -04:00
llmsherpa.py community[minor]: add support for llmsherpa (#19741) 2024-03-29 16:04:57 -07:00
markdown.py corrected outdated link (#15053) 2023-12-22 12:39:38 -08:00
mastodon.py Merge pull request #18671 2024-03-06 13:23:14 -05:00
max_compute.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
mediawikidump.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
merge.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
mhtml.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
mintbase.py community[minor]: add mintbase loader to langchain (#20089) 2024-04-30 04:11:56 +00:00
modern_treasury.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
mongodb.py community[minor]: added a feature to filter documents in Mongoloader (#18253) 2024-03-08 12:06:35 -08:00
news.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
notebook.py community[patch]: add NotebookLoader unit test (#17721) 2024-03-29 00:27:46 +00:00
notion.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
notiondb.py community[patch]: Fix NotionDBLoader 400 Error by conditionally adding filter parameter (#19075) 2024-03-14 13:56:57 +00:00
nuclia.py infra: add print rule to ruff (#16221) 2024-02-09 16:13:30 -08:00
obs_directory.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
obs_file.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
obsidian.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
odt.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
onedrive_file.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
onedrive.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
onenote.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
open_city_data.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
oracleadb_loader.py community[minor]: add oracle autonomous database doc loader integration (#19536) 2024-03-26 17:02:18 -07:00
oracleai.py community[minor]: Oraclevs integration (#21123) 2024-05-04 03:15:35 +00:00
org_mode.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
pdf.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
pebblo.py infra: rm unused # noqa violations (#22049) 2024-05-22 15:21:08 -07:00
polars_dataframe.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
powerpoint.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
psychic.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
pubmed.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
pyspark_dataframe.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
python.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
quip.py community[major]: lint for usage of xml library (#22132) 2024-05-24 15:23:53 +00:00
readthedocs.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
recursive_url_loader.py community[patch]: Using the right encoding to parse the web page in RecursiveUrlLoader (#20632) 2024-04-30 18:41:36 +00:00
reddit.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
roam.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
rocksetdb.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
rspace.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
rss.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
rst.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
rtf.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
s3_directory.py community[patch]: Skip nested directories when using S3DirectoryLoader (#17829) 2024-03-08 16:50:58 -08:00
s3_file.py community[patch]: support unstructured_kwargs for s3 loader (#15473) 2024-03-27 22:03:48 +00:00
scrapfly.py community[minor]: Add Scrapfly Loader community integration (#22036) 2024-05-22 21:29:13 +00:00
sharepoint.py community[patch]: Put authorized identities behind a feature flag in SharepointLoader (#22125) 2024-05-24 12:42:57 -04:00
sitemap.py community[minor]: Implement lazy_load() for SitemapLoader (#18667) 2024-03-06 09:15:35 -05:00
slack_directory.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
snowflake_loader.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
spider.py doc list not empty (#21208) 2024-05-20 08:24:06 -07:00
spreedly.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
sql_database.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
srt.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
stripe.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
surrealdb.py community[patch]: SurrealDB fix for asyncio (#16092) 2024-01-23 19:46:19 -08:00
telegram.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
tencent_cos_directory.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
tencent_cos_file.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
tensorflow_datasets.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
text.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
tidb.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
tomarkdown.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
toml.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
trello.py community: Implement lazy_load() for TrelloLoader (#18658) 2024-03-06 13:04:36 -05:00
tsv.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
twitter.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
unstructured.py community[minor]: import fix (#20995) 2024-04-29 10:32:50 -04:00
url_playwright.py docs: community docstring updates (#21040) 2024-04-29 17:40:23 -04:00
url_selenium.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
url.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
vsdx.py community[patch]: import flattening fix (#20110) 2024-04-10 13:01:19 -04:00
weather.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
web_base.py community[patch]: raise_for_status logic missing in async _fetch of WebBaseLoader (#21948) 2024-05-21 23:51:03 +00:00
whatsapp_chat.py community: Implement lazy_load() for WhatsAppChatLoader (#18677) 2024-03-06 13:03:46 -05:00
wikipedia.py community[patch]: upgrade to recent version of mypy (#21616) 2024-05-13 14:55:07 -04:00
word_document.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
xml.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
xorbits.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
youtube.py community[patch]: docstrings (#16810) 2024-02-09 12:48:57 -08:00
yuque.py community[minor]: add Yuque document loader (#17924) 2024-03-05 15:54:07 -08:00