langchain/libs/community/langchain_community/document_loaders
Luis Antonio Vieira Junior 67c880af74
community[patch]: adding linearization config to AmazonTextractPDFLoader (#17489)
- **Description:** Adds an optional `linearization_config` parameter
to `AmazonTextractPDFLoader` so the caller can define how the output
is linearized, instead of being forced to use a predefined set of
linearization configs. Because the parameter is optional, a default
configuration still applies when it is omitted.
- **Issue:** #17457
- **Dependencies:** None beyond those `AmazonTextractPDFLoader`
already requires
- **Twitter handle:** [@lvieirajr19](https://twitter.com/lvieirajr19)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
6 months ago
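
The commit description above boils down to a common pattern: a keyword-only optional config that falls back to a default when the caller passes nothing. Below is a minimal sketch of that pattern only. `LinearizationConfig`, its fields, `DEFAULT_CONFIG`, and `SketchLoader` are hypothetical stand-ins invented for illustration; they are not the real `AmazonTextractPDFLoader` or its actual config type. Only the parameter name `linearization_config` comes from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LinearizationConfig:
    """Hypothetical stand-in for a linearization config (fields are illustrative)."""

    hide_page_numbers: bool = True
    line_separator: str = "\n"


# Assumed default, applied when the caller does not supply a config.
DEFAULT_CONFIG = LinearizationConfig()


class SketchLoader:
    """Hypothetical loader; NOT the real AmazonTextractPDFLoader."""

    def __init__(
        self,
        lines: List[str],
        *,
        linearization_config: Optional[LinearizationConfig] = None,
    ) -> None:
        self.lines = lines
        # Fall back to the default so existing callers keep their behavior.
        self.config = (
            linearization_config
            if linearization_config is not None
            else DEFAULT_CONFIG
        )

    def linearize(self) -> str:
        # Drop lines that are only digits when page numbers are hidden.
        kept = [
            line
            for line in self.lines
            if not (self.config.hide_page_numbers and line.strip().isdigit())
        ]
        return self.config.line_separator.join(kept)


lines = ["Invoice", "Total: 10", "1"]

# No config passed: the default applies.
default_text = SketchLoader(lines).linearize()

# Caller-supplied config overrides the default.
custom_text = SketchLoader(
    lines,
    linearization_config=LinearizationConfig(
        hide_page_numbers=False, line_separator=" | "
    ),
).linearize()
```

Making the parameter keyword-only and defaulting it to `None` keeps the change backward compatible in this sketch: callers that never mention `linearization_config` see the same output as before.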
blob_loaders core: Move document loader interfaces to core (#17723) 7 months ago
parsers community[patch]: adding linearization config to AmazonTextractPDFLoader (#17489) 6 months ago
__init__.py Merge pull request #18421 7 months ago
acreom.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
airbyte.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
airbyte_json.py
airtable.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
apify_dataset.py
arcgis_loader.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
arxiv.py community[minor]: Implement lazy_load() for ArxivLoader (#18664) 7 months ago
assemblyai.py Merge pull request #18421 7 months ago
astradb.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
async_html.py
athena.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
azlyrics.py
azure_ai_data.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
azure_blob_storage_container.py community[patch]: type ignore fixes (#18395) 7 months ago
azure_blob_storage_file.py
baiducloud_bos_directory.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
baiducloud_bos_file.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
base.py core: Move document loader interfaces to core (#17723) 7 months ago
base_o365.py
bibtex.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
bigquery.py
bilibili.py
blackboard.py community[patch]: type ignore fixes (#18395) 7 months ago
blockchain.py
brave_search.py
browserless.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
cassandra.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
chatgpt.py
chm.py community[patch]: docstrings (#16810) 7 months ago
chromium.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
college_confidential.py
concurrent.py
confluence.py Merge pull request #18436 7 months ago
conllu.py
couchbase.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
csv_loader.py community[patch]: Implement lazy_load() for CSVLoader (#18391) 7 months ago
cube_semantic.py community[patch]: Implement lazy_load() for CubeSemanticLoader (#18535) 7 months ago
datadog_logs.py
dataframe.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
diffbot.py
directory.py community[minor]: add exclude parameter to DirectoryLoader (#17316) 7 months ago
discord.py
doc_intelligence.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
docugami.py
docusaurus.py
dropbox.py infra: add print rule to ruff (#16221) 7 months ago
duckdb_loader.py
email.py community[minor]: Implement lazy_load() for OutlookMessageLoader (#18668) 7 months ago
epub.py
etherscan.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
evernote.py community[patch]: Implement lazy_load() for EverNoteLoader (#18538) 7 months ago
excel.py
facebook_chat.py community[minor]: Implement lazy_load() for FacebookChatLoader (#18669) 7 months ago
fauna.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
figma.py
gcs_directory.py
gcs_file.py
generic.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
geodataframe.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
git.py Merge pull request #18539 7 months ago
gitbook.py community[minor]: Implement lazy_load() for GitbookLoader (#18670) 7 months ago
github.py community: Implement lazy_load() for GithubFileLoader (#18584) 7 months ago
google_speech_to_text.py
googledrive.py infra: add print rule to ruff (#16221) 7 months ago
gutenberg.py
helpers.py
hn.py
html.py
html_bs.py Merge pull request #18423 7 months ago
hugging_face_dataset.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
hugging_face_model.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
ifixit.py
image.py
image_captions.py
imsdb.py
iugu.py
joplin.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
json_loader.py community: Implement lazy_load() for JSONLoader (#18643) 6 months ago
lakefs.py
larksuite.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
markdown.py
mastodon.py Merge pull request #18671 7 months ago
max_compute.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
mediawikidump.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
merge.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
mhtml.py community[patch]: Implement lazy_load() for MHTMLLoader (#18648) 6 months ago
modern_treasury.py
mongodb.py community[minor]: added a feature to filter documents in Mongoloader (#18253) 6 months ago
news.py
notebook.py
notion.py
notiondb.py community[patch]: support query filters for NotionDBLoader (#17217) 7 months ago
nuclia.py infra: add print rule to ruff (#16221) 7 months ago
obs_directory.py
obs_file.py
obsidian.py Merge pull request #18654 7 months ago
odt.py
onedrive.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
onedrive_file.py
onenote.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
open_city_data.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
org_mode.py
pdf.py community[patch]: adding linearization config to AmazonTextractPDFLoader (#17489) 6 months ago
pebblo.py community[patch]: Fix pwd import that is not available on windows (#17532) 7 months ago
polars_dataframe.py
powerpoint.py
psychic.py Merge pull request #18656 7 months ago
pubmed.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
pyspark_dataframe.py
python.py
quip.py
readthedocs.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
recursive_url_loader.py community[patch]: fix RecursiveUrlLoader metadata_extractor return type (#18193) 7 months ago
reddit.py
roam.py
rocksetdb.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
rspace.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
rss.py
rst.py
rtf.py
s3_directory.py community[patch]: Skip nested directories when using S3DirectoryLoader (#17829) 6 months ago
s3_file.py
sharepoint.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
sitemap.py community[minor]: Implement lazy_load() for SitemapLoader (#18667) 7 months ago
slack_directory.py community: Implement lazy_load() for SlackDirectoryLoader (#18675) 7 months ago
snowflake_loader.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
spreedly.py
sql_database.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
srt.py
stripe.py
surrealdb.py
telegram.py text-splitters[minor], langchain[minor], community[patch], templates, docs: langchain-text-splitters 0.0.1 (#18346) 7 months ago
tencent_cos_directory.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
tencent_cos_file.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
tensorflow_datasets.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
text.py Merge pull request #18674 7 months ago
tidb.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
tomarkdown.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
toml.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
trello.py community: Implement lazy_load() for TrelloLoader (#18658) 7 months ago
tsv.py
twitter.py
unstructured.py Merge pull request #18647 7 months ago
url.py
url_playwright.py community: Implement lazy_load() for PlaywrightURLLoader (#18676) 7 months ago
url_selenium.py
vsdx.py
weather.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
web_base.py community: Use default load() implementation in doc loaders (#18385) 7 months ago
whatsapp_chat.py community: Implement lazy_load() for WhatsAppChatLoader (#18677) 7 months ago
wikipedia.py community[minor]: Implement lazy_load() for WikipediaLoader (#18680) 7 months ago
word_document.py
xml.py
xorbits.py
youtube.py community[patch]: docstrings (#16810) 7 months ago
yuque.py community[minor]: add Yuque document loader (#17924) 7 months ago