langchain/docs/extras/integrations/document_loaders
Christophe Bornet 9870bfb9cd
Add bucket and object key to metadata in S3 loader (#9317)
  - Description: this PR adds `s3_object_key` and `s3_bucket` to the document
    metadata when loading an S3 file. This is particularly useful with
    `S3DirectoryLoader` for removing files from the directory once they have
    been processed (deriving the object keys from the metadata `source` field
    seems brittle).
  - Dependencies: N/A
  - Tag maintainer: ?
  - Twitter handle: _cbornet
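
The cleanup use case from the description could look roughly like the sketch below. It is only an illustration under stated assumptions: `Document` is a stand-in dataclass rather than the library's class, `processed_objects` and `delete_processed` are hypothetical helper names, and the client is assumed to expose `delete_object(Bucket=..., Key=...)` the way a boto3 S3 client does. Only the two metadata keys, `s3_bucket` and `s3_object_key`, come from the commit message.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class Document:
    """Stand-in for a loaded document; only `metadata` matters here."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def processed_objects(docs: Iterable[Document]) -> List[Tuple[str, str]]:
    """Collect unique (bucket, key) pairs from metadata, preserving order.

    One S3 object may yield several documents, so duplicates are dropped.
    """
    seen = set()
    pairs: List[Tuple[str, str]] = []
    for doc in docs:
        pair = (doc.metadata["s3_bucket"], doc.metadata["s3_object_key"])
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


def delete_processed(s3_client: Any, docs: Iterable[Document]) -> List[Tuple[str, str]]:
    """Delete each processed object once and return what was deleted.

    Assumption: `s3_client` has a boto3-style `delete_object(Bucket=..., Key=...)`.
    """
    pairs = processed_objects(docs)
    for bucket, key in pairs:
        s3_client.delete_object(Bucket=bucket, Key=key)
    return pairs
```

Reading the bucket and key directly from metadata avoids parsing them back out of the `source` string, which is the brittleness the description mentions.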

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-08-30 11:03:24 -04:00
..
example_data RSS Feed / OPML loader (#8694) 2023-08-03 14:58:06 -07:00
acreom.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
airbyte_cdk.ipynb Airbyte based loaders (#8586) 2023-08-08 14:49:25 -07:00
airbyte_gong.ipynb Airbyte based loaders (#8586) 2023-08-08 14:49:25 -07:00
airbyte_hubspot.ipynb Airbyte based loaders (#8586) 2023-08-08 14:49:25 -07:00
airbyte_json.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
airbyte_salesforce.ipynb Airbyte based loaders (#8586) 2023-08-08 14:49:25 -07:00
airbyte_shopify.ipynb Airbyte based loaders (#8586) 2023-08-08 14:49:25 -07:00
airbyte_stripe.ipynb Airbyte based loaders (#8586) 2023-08-08 14:49:25 -07:00
airbyte_typeform.ipynb Airbyte based loaders (#8586) 2023-08-08 14:49:25 -07:00
airbyte_zendesk_support.ipynb Airbyte based loaders (#8586) 2023-08-08 14:49:25 -07:00
airtable.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
alibaba_cloud_maxcompute.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
apify_dataset.ipynb Update Integrations links (#8206) 2023-07-24 21:20:32 -07:00
arcgis.ipynb ArcGISLoader update (#9240) 2023-08-14 23:44:29 -07:00
arxiv.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
assemblyai.ipynb Fix docs for AssemblyAIAudioTranscriptLoader (shorter import path) (#9687) 2023-08-24 07:24:53 -07:00
async_chromium.ipynb Added new use case docs for Web Scraping, Chromium loader, BS4 transformer (#8732) 2023-08-11 11:46:59 -07:00
async_html.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
aws_s3_directory.ipynb Add bucket and object key to metadata in S3 loader (#9317) 2023-08-30 11:03:24 -04:00
aws_s3_file.ipynb Add bucket and object key to metadata in S3 loader (#9317) 2023-08-30 11:03:24 -04:00
azlyrics.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
azure_blob_storage_container.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
azure_blob_storage_file.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
bibtex.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
bilibili.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
blackboard.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
blockchain.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
brave_search.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
browserless.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
chatgpt_loader.ipynb Add api cross ref linking (#8275) 2023-07-26 12:38:58 -07:00
college_confidential.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
concurrent.ipynb Add ConcurrentLoader (#7512) 2023-07-31 17:56:31 -07:00
confluence.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
conll-u.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
copypaste.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
csv.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
cube_semantic.ipynb Cube semantic loader: allow cubes processing (#9927) 2023-08-29 07:21:01 -07:00
datadog_logs.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
diffbot.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
discord.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
docugami.ipynb comment update for poetry install 2023-08-19 13:50:16 -07:00
dropbox.ipynb mv dropbox (#8438) 2023-07-28 16:07:56 -07:00
duckdb.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
email.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
embaas.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
epub.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
Etherscan.ipynb docs: Fix spelling mistakes in Etherscan.ipynb (#9845) 2023-08-28 19:30:00 -07:00
evernote.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
excel.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
facebook_chat.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
fauna.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
figma.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
geopandas.ipynb fix geopandas link (#8305) 2023-07-26 11:30:17 -07:00
git.ipynb docs:misc fixes (#9671) 2023-08-23 22:36:54 -07:00
gitbook.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
github.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
google_bigquery.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
google_cloud_storage_directory.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
google_cloud_storage_file.ipynb Allow to specify a custom loader for GcsFileLoader (#8868) 2023-08-07 22:57:31 -04:00
google_drive.ipynb Update google drive notebooks (#9851) 2023-08-28 19:29:35 -07:00
grobid.ipynb Improved grobid documentation (#9025) 2023-08-10 10:47:22 -04:00
gutenberg.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
hacker_news.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
huawei_obs_directory.ipynb Add Support for Loading Documents from Huawei OBS (#8573) 2023-08-01 09:30:30 -07:00
huawei_obs_file.ipynb Add Support for Loading Documents from Huawei OBS (#8573) 2023-08-01 09:30:30 -07:00
hugging_face_dataset.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
ifixit.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
image_captions.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
image.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
imsdb.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
index.mdx mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
iugu.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
joplin.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
jupyter_notebook.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
larksuite.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
mastodon.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
mediawikidump.ipynb Github add "Create PR" tool + Docs update (#8235) 2023-07-27 19:19:44 -07:00
merge_doc_loader.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
mhtml.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
microsoft_onedrive.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
microsoft_powerpoint.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
microsoft_sharepoint.ipynb Add SharePoint Loader (#4284) 2023-08-21 07:49:07 -07:00
microsoft_word.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
modern_treasury.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
news.ipynb Newspaper (#8647) 2023-08-02 17:56:08 -07:00
notion.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
notiondb.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
nuclia.ipynb Bagatur/revert revert nuclia (#8833) 2023-08-06 11:24:36 -07:00
obsidian.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
odt.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
open_city_data.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
org_mode.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
pandas_dataframe.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
pdf-amazonTextractPDFLoader.ipynb AmazonTextractPDFLoader documentation updates (#9415) 2023-08-20 16:40:15 -07:00
polars_dataframe.ipynb Add to support polars (#9610) 2023-08-22 07:36:24 -07:00
psychic.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
pubmed.ipynb PubMed document loader (#8893) 2023-08-08 14:26:03 -04:00
pyspark_dataframe.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
readthedocs_documentation.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
recursive_url_loader.ipynb Async Recursive URL loader (#8502) 2023-08-06 16:22:31 -07:00
reddit.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
roam.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
rockset.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
rss.ipynb update rss doc (#8761) 2023-08-04 08:25:20 -07:00
rst.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
sitemap.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
slack.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
snowflake.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
source_code.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
spreedly.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
stripe.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
subtitle.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
telegram.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
tencent_cos_directory.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
tencent_cos_file.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
tensorflow_datasets.ipynb tensorflow_datasets document loader (#8721) 2023-08-08 15:19:28 -04:00
tomarkdown.ipynb mv popular and additional chains to use cases (#8242) 2023-07-27 12:55:13 -07:00
toml.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
trello.ipynb added lxml to the pip install example since it is required (#8260) 2023-07-25 18:16:07 -07:00
tsv.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
twitter.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
unstructured_file.ipynb fix: apply unstructured preprocess functions (#9473) 2023-08-18 18:54:28 -07:00
url.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
weather.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
web_base.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
whatsapp_chat.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
wikipedia.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
xml.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
xorbits.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00
youtube_audio.ipynb Add local support for audio models (PR #7329) (#7591) 2023-08-02 01:24:53 -07:00
youtube_transcript.ipynb mv module integrations docs (#8101) 2023-07-23 23:23:16 -07:00