langchain/libs/community/langchain_community/document_loaders
Rohan Aggarwal 8021d2a2ab
community[minor]: Oraclevs integration (#21123)
Thank you for contributing to LangChain!

- Oracle AI Vector Search 
Oracle AI Vector Search is designed for Artificial Intelligence (AI)
workloads and allows you to query data based on semantics, rather than
keywords. One of the biggest benefits of Oracle AI Vector Search is that
semantic search on unstructured data can be combined with relational
search on business data in a single system. This is not only powerful
but also significantly more effective because you don't need to add a
specialized vector database, eliminating the pain of data fragmentation
between multiple systems.


This pull request adds the following functionality:
Oracle AI Vector Search: Vector Store
Oracle AI Vector Search: Document Loader
Oracle AI Vector Search: Document Splitter
Oracle AI Vector Search: Summary
Oracle AI Vector Search: Oracle Embeddings


- We have added unit tests and also run our own local unit test suite to
verify the code. We have added a guide for each of the components and
one end-to-end guide that shows how the components run together.


- We have made sure that make format and make lint run clean.

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
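
The first guideline above (import optional dependencies within a function) can be illustrated with a minimal sketch. The names `import_optional` and `ExampleLoader` are hypothetical and are not part of langchain_community, and the standard-library `json` module stands in for a real optional package so the example runs anywhere.

```python
import importlib
from types import ModuleType
from typing import Optional


def import_optional(module_name: str, pip_name: Optional[str] = None) -> ModuleType:
    """Import a dependency at call time and raise an actionable error if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        package = pip_name or module_name
        raise ImportError(
            f"Could not import {module_name}. "
            f"Install it with `pip install {package}`."
        ) from exc


class ExampleLoader:
    """Hypothetical loader used only for illustration."""

    def __init__(self, payload: str) -> None:
        # Nothing optional is imported here, so importing this module
        # never fails when the optional package is absent.
        self.payload = payload

    def load(self) -> dict:
        # The dependency is resolved only when load() is actually called.
        # "json" is a stdlib stand-in for a real optional package.
        json = import_optional("json")
        return json.loads(self.payload)
```

With this pattern, a user who never calls `load()` does not need the optional package installed, and a user who does gets an install hint instead of a bare import failure at module import time.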

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: skmishraoracle <shailendra.mishra@oracle.com>
Co-authored-by: hroyofc <harichandan.roy@oracle.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-04 03:15:35 +00:00
..
blob_loaders langchain[patch]: Migrate document loaders to use optional langchain community imports (#21095) 2024-05-01 11:26:25 -04:00
parsers (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) 2024-05-03 14:29:36 -04:00
__init__.py community[minor]: Oraclevs integration (#21123) 2024-05-04 03:15:35 +00:00
acreom.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
airbyte_json.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
airbyte.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
airtable.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
apify_dataset.py
arcgis_loader.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
arxiv.py community[minor]: Implement lazy_load() for ArxivLoader (#18664) 2024-03-06 09:16:49 -05:00
assemblyai.py community[patch]: docstrings update (#20301) 2024-04-11 16:23:27 -04:00
astradb.py (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) 2024-05-03 14:29:36 -04:00
async_html.py
athena.py community[minor]: import fix (#20995) 2024-04-29 10:32:50 -04:00
azlyrics.py
azure_ai_data.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
azure_blob_storage_container.py community[patch]: type ignore fixes (#18395) 2024-03-01 11:21:02 -08:00
azure_blob_storage_file.py
baiducloud_bos_directory.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
baiducloud_bos_file.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
base_o365.py community[patch]: Changes to base_o365 and sharepoint document loaders (#20373) 2024-04-17 00:36:15 +00:00
base.py core: Move document loader interfaces to core (#17723) 2024-03-06 13:59:00 -05:00
bibtex.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
bigquery.py (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) 2024-05-03 14:29:36 -04:00
bilibili.py community[patch]: docstrings update (#20301) 2024-04-11 16:23:27 -04:00
blackboard.py community[patch]: type ignore fixes (#18395) 2024-03-01 11:21:02 -08:00
blockchain.py
brave_search.py
browserbase.py community[minor]: added Browserbase loader (#20478) 2024-04-25 01:11:03 +00:00
browserless.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
cassandra.py community[minor]: Add async methods to CassandraLoader (#20609) 2024-04-18 19:45:20 +00:00
chatgpt.py
chm.py community[patch]: docstrings (#16810) 2024-02-09 12:48:57 -08:00
chromium.py community[patch]: Update comments for lazy_load method (#21063) 2024-05-01 01:20:57 -04:00
college_confidential.py
concurrent.py community[patch]: import flattening fix (#20110) 2024-04-10 13:01:19 -04:00
confluence.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
conllu.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
couchbase.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
csv_loader.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
cube_semantic.py community[patch]: Implement lazy_load() for CubeSemanticLoader (#18535) 2024-03-05 17:32:31 -08:00
datadog_logs.py
dataframe.py community[patch]: support modin document loader (#18866) 2024-03-10 18:40:04 -07:00
diffbot.py
directory.py community: fix DirectoryLoader progress bar (#19821) 2024-04-17 21:12:16 +00:00
discord.py
doc_intelligence.py docs: community docstring updates (#21040) 2024-04-29 17:40:23 -04:00
docugami.py (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) 2024-05-03 14:29:36 -04:00
docusaurus.py
dropbox.py infra: add print rule to ruff (#16221) 2024-02-09 16:13:30 -08:00
duckdb_loader.py
email.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
epub.py
etherscan.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
evernote.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
excel.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
facebook_chat.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
fauna.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
figma.py
firecrawl.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
gcs_directory.py (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) 2024-05-03 14:29:36 -04:00
gcs_file.py (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) 2024-05-03 14:29:36 -04:00
generic.py community[patch]: import flattening fix (#20110) 2024-04-10 13:01:19 -04:00
geodataframe.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
git.py Merge pull request #18539 2024-03-06 13:25:14 -05:00
gitbook.py community[minor]: Implement lazy_load() for GitbookLoader (#18670) 2024-03-06 09:14:36 -05:00
github.py community: Implement lazy_load() for GithubFileLoader (#18584) 2024-03-05 09:35:50 -08:00
glue_catalog.py community[minor]: Add glue catalog loader (#20220) 2024-04-16 11:39:23 -04:00
google_speech_to_text.py (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) 2024-05-03 14:29:36 -04:00
googledrive.py (all): update removal in deprecation warnings from 0.2 to 0.3 (#21265) 2024-05-03 14:29:36 -04:00
gutenberg.py
helpers.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
hn.py
html_bs.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
html.py
hugging_face_dataset.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
hugging_face_model.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
ifixit.py
image_captions.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
image.py
imsdb.py
iugu.py
joplin.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
json_loader.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
kinetica_loader.py community[minor]: Implemented Kinetica Document Loader and added notebooks (#20002) 2024-04-25 13:39:00 -07:00
lakefs.py
larksuite.py community[minor]: Add LarkSuite wiki document loader. (#21016) 2024-04-29 10:37:50 -04:00
llmsherpa.py community[minor]: add support for llmsherpa (#19741) 2024-03-29 16:04:57 -07:00
markdown.py corrected outdated link (#15053) 2023-12-22 12:39:38 -08:00
mastodon.py Merge pull request #18671 2024-03-06 13:23:14 -05:00
max_compute.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
mediawikidump.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
merge.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
mhtml.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
mintbase.py community[minor]: add mintbase loader to langchain (#20089) 2024-04-30 04:11:56 +00:00
modern_treasury.py
mongodb.py community[minor]: added a feature to filter documents in Mongoloader (#18253) 2024-03-08 12:06:35 -08:00
news.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
notebook.py community[patch]: add NotebookLoader unit test (#17721) 2024-03-29 00:27:46 +00:00
notion.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
notiondb.py community[patch]: Fix NotionDBLoader 400 Error by conditionally adding filter parameter (#19075) 2024-03-14 13:56:57 +00:00
nuclia.py infra: add print rule to ruff (#16221) 2024-02-09 16:13:30 -08:00
obs_directory.py
obs_file.py
obsidian.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
odt.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
onedrive_file.py
onedrive.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
onenote.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
open_city_data.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
oracleadb_loader.py community[minor]: add oracle autonomous database doc loader integration (#19536) 2024-03-26 17:02:18 -07:00
oracleai.py community[minor]: Oraclevs integration (#21123) 2024-05-04 03:15:35 +00:00
org_mode.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
pdf.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
pebblo.py community[patch]: Add classifier_url argument in PebbloSafeLoader and documentation update. (#21030) 2024-04-29 17:41:09 -04:00
polars_dataframe.py
powerpoint.py
psychic.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
pubmed.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
pyspark_dataframe.py
python.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
quip.py
readthedocs.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
recursive_url_loader.py community[patch]: Using the right encoding to parse the web page in RecursiveUrlLoader (#20632) 2024-04-30 18:41:36 +00:00
reddit.py
roam.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
rocksetdb.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
rspace.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
rss.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
rst.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
rtf.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
s3_directory.py community[patch]: Skip nested directories when using S3DirectoryLoader (#17829) 2024-03-08 16:50:58 -08:00
s3_file.py community[patch]: support unstructured_kwargs for s3 loader (#15473) 2024-03-27 22:03:48 +00:00
sharepoint.py community[patch]: Changes to base_o365 and sharepoint document loaders (#20373) 2024-04-17 00:36:15 +00:00
sitemap.py community[minor]: Implement lazy_load() for SitemapLoader (#18667) 2024-03-06 09:15:35 -05:00
slack_directory.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
snowflake_loader.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
spider.py multiple: Remove unnecessary Ruff suppression comments (#21050) 2024-04-30 17:13:48 +00:00
spreedly.py
sql_database.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
srt.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
stripe.py
surrealdb.py community[patch]: SurrealDB fix for asyncio (#16092) 2024-01-23 19:46:19 -08:00
telegram.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
tencent_cos_directory.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
tencent_cos_file.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
tensorflow_datasets.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
text.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
tidb.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
tomarkdown.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
toml.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
trello.py community: Implement lazy_load() for TrelloLoader (#18658) 2024-03-06 13:04:36 -05:00
tsv.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
twitter.py
unstructured.py community[minor]: import fix (#20995) 2024-04-29 10:32:50 -04:00
url_playwright.py docs: community docstring updates (#21040) 2024-04-29 17:40:23 -04:00
url_selenium.py
url.py
vsdx.py community[patch]: import flattening fix (#20110) 2024-04-10 13:01:19 -04:00
weather.py community: Use default load() implementation in doc loaders (#18385) 2024-03-01 14:46:52 -05:00
web_base.py core[minor]: Add aload to document loader (#19936) 2024-04-03 10:46:47 -04:00
whatsapp_chat.py community: Implement lazy_load() for WhatsAppChatLoader (#18677) 2024-03-06 13:03:46 -05:00
wikipedia.py community[minor]: Implement lazy_load() for WikipediaLoader (#18680) 2024-03-06 13:03:21 -05:00
word_document.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
xml.py community: better support of pathlib paths in document loaders (#18396) 2024-03-26 11:51:52 -04:00
xorbits.py
youtube.py community[patch]: docstrings (#16810) 2024-02-09 12:48:57 -08:00
yuque.py community[minor]: add Yuque document loader (#17924) 2024-03-05 15:54:07 -08:00