langchain/docs/docs/integrations/document_loaders
WilliamEspegren 804390ba4b
community: Spider integration (#20937)
Added the [Spider.cloud](https://spider.cloud) document loader.
[Spider](https://github.com/spider-rs/spider) is the
[fastest](https://github.com/spider-rs/spider/blob/main/benches/BENCHMARKS.md)
and cheapest crawler that returns LLM-ready data.

```
- **Description:** Adds Spider data loader
- **Dependencies:** spider-client
- **Twitter handle:** @WilliamEspegren 
```

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: = <=>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
5 months ago
example_data docs: Remove example vsdx data (#20620) 5 months ago
acreom.ipynb docs: `integrations/providers` update 9 (#19941) 6 months ago
airbyte.ipynb docs: airbyte deps note (#18243) 7 months ago
airbyte_cdk.ipynb docs: import update (#20610) 5 months ago
airbyte_gong.ipynb docs: import update (#20610) 5 months ago
airbyte_hubspot.ipynb docs: import update (#20610) 5 months ago
airbyte_json.ipynb docs: deprecate old airbyte loader docs (#19048) 6 months ago
airbyte_salesforce.ipynb docs: import update (#20610) 5 months ago
airbyte_shopify.ipynb docs: import update (#20610) 5 months ago
airbyte_stripe.ipynb docs: import update (#20610) 5 months ago
airbyte_typeform.ipynb docs: import update (#20610) 5 months ago
airbyte_zendesk_support.ipynb docs: import update (#20610) 5 months ago
airtable.ipynb docs: integration package pip installs (#15762) 9 months ago
alibaba_cloud_maxcompute.ipynb docs: integration package pip installs (#15762) 9 months ago
amazon_textract.ipynb community[patch]: adding linearization config to AmazonTextractPDFLoader (#17489) 7 months ago
apify_dataset.ipynb docs: import update (#20610) 5 months ago
arcgis.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
arxiv.ipynb docs: integration package pip installs (#15762) 9 months ago
assemblyai.ipynb docs: integration package pip installs (#15762) 9 months ago
astradb.ipynb Add doc for AstraDB document loader (#15703) 9 months ago
async_chromium.ipynb docs: Update async_chromium.ipynb (#19514) 6 months ago
async_html.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
athena.ipynb docs: `integrations/providers` update 9 (#19941) 6 months ago
aws_s3_directory.ipynb docs: integration package pip installs (#15762) 9 months ago
aws_s3_file.ipynb docs: integration package pip installs (#15762) 9 months ago
azlyrics.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
azure_ai_data.ipynb docs: integration package pip installs (#15762) 9 months ago
azure_blob_storage_container.ipynb docs: integration package pip installs (#15762) 9 months ago
azure_blob_storage_file.ipynb docs: integration package pip installs (#15762) 9 months ago
azure_document_intelligence.ipynb community[patch]: Microsoft Azure Document Intelligence updates (#16932) 6 months ago
bibtex.ipynb docs: `integrations/providers` update 9 (#19941) 6 months ago
bilibili.ipynb community[patch]: fix bugs for bilibili Loader (#18036) 6 months ago
blackboard.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
blockchain.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
brave_search.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
browserbase.ipynb community[minor]: added Browserbase loader (#20478) 5 months ago
browserless.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
cassandra.ipynb astradb: bootstrapping Astra DB as Partner Package (#16875) 7 months ago
chatgpt_loader.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
college_confidential.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
concurrent.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
confluence.ipynb docs: integration package pip installs (#15762) 9 months ago
conll-u.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
copypaste.ipynb docs: import update (#20610) 5 months ago
couchbase.ipynb docs: `integrations/providers` update 9 (#19941) 6 months ago
csv.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
cube_semantic.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
datadog_logs.ipynb docs: integration package pip installs (#15762) 9 months ago
diffbot.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
discord.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
docugami.ipynb patch: deprecate (a)get_relevant_documents (#20477) 5 months ago
docusaurus.ipynb docs: integration package pip installs (#15762) 9 months ago
dropbox.ipynb docs: Remove non-rendering images & output spamming from doc ntbks (#19475) 6 months ago
duckdb.ipynb docs: integration package pip installs (#15762) 9 months ago
email.ipynb docs: integration package pip installs (#15762) 9 months ago
epub.ipynb docs: integration package pip installs (#15762) 9 months ago
etherscan.ipynb docs: integration package pip installs (#15762) 9 months ago
evernote.ipynb docs: integration package pip installs (#15762) 9 months ago
facebook_chat.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
fauna.ipynb docs: integration package pip installs (#15762) 9 months ago
figma.ipynb patch: deprecate (a)get_relevant_documents (#20477) 5 months ago
firecrawl.ipynb community[minor]: Firecrawl.dev integration (#20364) 5 months ago
geopandas.ipynb docs: make links internal (#19063) 6 months ago
git.ipynb docs: integration package pip installs (#15762) 9 months ago
gitbook.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
github.ipynb docs: Fix typo in github.ipynb (#17259) 8 months ago
glue_catalog.ipynb community[minor]: Add glue catalog loader (#20220) 5 months ago
google_alloydb.ipynb docs: update Google Cloud database integration docs (#18711) 7 months ago
google_bigquery.ipynb docs: integration package pip installs (#15762) 9 months ago
google_bigtable.ipynb docs: make links internal (#19063) 6 months ago
google_cloud_sql_mssql.ipynb docs: make links internal (#19063) 6 months ago
google_cloud_sql_mysql.ipynb docs: make links internal (#19063) 6 months ago
google_cloud_sql_pg.ipynb docs: update Google Cloud database integration docs (#18711) 7 months ago
google_cloud_storage_directory.ipynb community[patch]: Adding try-except block for GCSDirectoryLoader (#19591) 6 months ago
google_cloud_storage_file.ipynb docs: integration package pip installs (#15762) 9 months ago
google_datastore.ipynb docs: make links internal (#19063) 6 months ago
google_drive.ipynb Update google_drive.ipynb (#20731) 5 months ago
google_el_carro.ipynb docs: make links internal (#19063) 6 months ago
google_firestore.ipynb docs: make links internal (#19063) 6 months ago
google_memorystore_redis.ipynb docs: make links internal (#19063) 6 months ago
google_spanner.ipynb docs: make links internal (#19063) 6 months ago
google_speech_to_text.ipynb docs: integration package pip installs (#15762) 9 months ago
grobid.ipynb docs: make links internal (#19063) 6 months ago
gutenberg.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
hacker_news.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
huawei_obs_directory.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
huawei_obs_file.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
hugging_face_dataset.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
ifixit.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
image.ipynb docs: integration package pip installs (#15762) 9 months ago
image_captions.ipynb docs: integration package pip installs (#15762) 9 months ago
imsdb.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
iugu.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
joplin.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
jupyter_notebook.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
kinetica.ipynb community[minor]: Implemented Kinetica Document Loader and added notebooks (#20002) 5 months ago
lakefs.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
larksuite.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
llmsherpa.ipynb community[minor]: add support for llmsherpa (#19741) 6 months ago
mastodon.ipynb docs: integration package pip installs (#15762) 9 months ago
mediawikidump.ipynb docs: remove unnecessary args from the pip install (#19823) 6 months ago
merge_doc.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
mhtml.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
microsoft_excel.ipynb community[patch]: Microsoft Azure Document Intelligence updates (#16932) 6 months ago
microsoft_onedrive.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
microsoft_onenote.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
microsoft_powerpoint.ipynb community[patch]: Microsoft Azure Document Intelligence updates (#16932) 6 months ago
microsoft_sharepoint.ipynb community[patch]: Changes to base_o365 and sharepoint document loaders (#20373) 5 months ago
microsoft_word.ipynb community[patch]: Microsoft Azure Document Intelligence updates (#16932) 6 months ago
modern_treasury.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
mongodb.ipynb community[patch]: documented the feature to filter documents in MongoDBloader (#18842) 7 months ago
news.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
notion.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
notiondb.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
nuclia.ipynb docs: integration package pip installs (#15762) 9 months ago
obsidian.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
odt.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
open_city_data.ipynb docs: integration package pip installs (#15762) 9 months ago
oracleadb_loader.ipynb docs: Fix oracle doc loader format issue (#19628) 6 months ago
org_mode.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
pandas_dataframe.ipynb docs: integration package pip installs (#15762) 9 months ago
pebblo.ipynb community[patch]: Add semantic info to metadata, classified by pebblo-server. (#20468) 5 months ago
polars_dataframe.ipynb docs: integration package pip installs (#15762) 9 months ago
psychic.ipynb docs: langchain-chroma package (#20394) 5 months ago
pubmed.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
pyspark_dataframe.ipynb docs: integration package pip installs (#15762) 9 months ago
quip.ipynb docs: Fix broken imports in documentation (#19655) 6 months ago
readthedocs_documentation.ipynb docs: integration package pip installs (#15762) 9 months ago
recursive_url.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
reddit.ipynb docs: integration package pip installs (#15762) 9 months ago
roam.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
rockset.ipynb docs: integration package pip installs (#15762) 9 months ago
rspace.ipynb docs: integration package pip installs (#15762) 9 months ago
rss.ipynb docs: integration package pip installs (#15762) 9 months ago
rst.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
sitemap.ipynb docs: fixed xml URL on sitemap docs exmaple, issue #17236 (#17304) 6 months ago
slack.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
snowflake.ipynb docs: integration package pip installs (#15762) 9 months ago
source_code.ipynb text-splitters[minor], langchain[minor], community[patch], templates, docs: langchain-text-splitters 0.0.1 (#18346) 7 months ago
spider.ipynb community: Spider integration (#20937) 5 months ago
spreedly.ipynb patch: deprecate (a)get_relevant_documents (#20477) 5 months ago
stripe.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
subtitle.ipynb docs: integration package pip installs (#15762) 9 months ago
surrealdb.ipynb community[minor]: Added document loader for SurrealDB (#15995) 8 months ago
telegram.ipynb community:update telegram notebook (#18569) 7 months ago
tencent_cos_directory.ipynb docs: integration package pip installs (#15762) 9 months ago
tencent_cos_file.ipynb docs: integration package pip installs (#15762) 9 months ago
tensorflow_datasets.ipynb docs, templates: update schema imports to core (#17885) 7 months ago
tidb.ipynb community[minor]: Add Initial Support for TiDB Vector Store (#15796) 7 months ago
tomarkdown.ipynb docs: make links internal (#19063) 6 months ago
toml.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
trello.ipynb docs: integration package pip installs (#15762) 9 months ago
tsv.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
twitter.ipynb docs: integration package pip installs (#15762) 9 months ago
unstructured_file.ipynb docs: Fix link in Unstructured notebook (#19851) 6 months ago
upstage.ipynb upstage: Add Upstage partner package LA and GC (#20651) 5 months ago
url.ipynb docs: `integrations/providers/unstructured` update (#19892) 6 months ago
vsdx.ipynb community[minor]: New documents loader for visio files (with extension .vsdx) (#16171) 8 months ago
weather.ipynb docs: integration package pip installs (#15762) 9 months ago
web_base.ipynb community: Spider integration (#20937) 5 months ago
whatsapp_chat.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
wikipedia.ipynb docs: integration package pip installs (#15762) 9 months ago
xml.ipynb docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429) 9 months ago
xorbits.ipynb docs: integration package pip installs (#15762) 9 months ago
youtube_audio.ipynb docs: use standard openai params (#20160) 6 months ago
youtube_transcript.ipynb docs: integration package pip installs (#15762) 9 months ago
yuque.ipynb community[minor]: add Yuque document loader (#17924) 7 months ago