docs: `integrations/providers` update 9 (#19941)

- Added missed providers
- Added links, descriptions in related examples
- Formatted in a consistent format

Co-authored-by: Erick Friis <erick@langchain.dev>
pull/20030/head
Leonid Ganeline 2 months ago committed by GitHub
parent 644ff46100
commit 3856dedff4
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

@ -67,7 +67,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -8,7 +8,25 @@
"source": [
"# Athena\n",
"\n",
"This notebooks goes over how to load documents from AWS Athena"
">[Amazon Athena](https://aws.amazon.com/athena/) is a serverless, interactive analytics service built\n",
">on open-source frameworks, supporting open-table and file formats. `Athena` provides a simplified,\n",
">flexible way to analyze petabytes of data where it lives. Analyze data or build applications\n",
">from an Amazon Simple Storage Service (S3) data lake and 30 data sources, including on-premises data\n",
">sources or other cloud systems using SQL or Python. `Athena` is built on open-source `Trino`\n",
">and `Presto` engines and `Apache Spark` frameworks, with no provisioning or configuration effort required.\n",
"\n",
"This notebook goes over how to load documents from `AWS Athena`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setting up\n",
"\n",
"Follow [instructions to set up an AWS account](https://docs.aws.amazon.com/athena/latest/ug/setting-up.html).\n",
"\n",
"Install a Python library:"
]
},
{
@ -22,6 +40,13 @@
"! pip install boto3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example"
]
},
{
"cell_type": "code",
"execution_count": null,
@ -98,13 +123,23 @@
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python"
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}

@ -7,11 +7,11 @@
"source": [
"# BibTeX\n",
"\n",
"> BibTeX is a file format and reference management system commonly used in conjunction with LaTeX typesetting. It serves as a way to organize and store bibliographic information for academic and research documents.\n",
">[BibTeX](https://www.ctan.org/pkg/bibtex) is a file format and reference management system commonly used in conjunction with `LaTeX` typesetting. It serves as a way to organize and store bibliographic information for academic and research documents.\n",
"\n",
"BibTeX files have a .bib extension and consist of plain text entries representing references to various publications, such as books, articles, conference papers, theses, and more. Each BibTeX entry follows a specific structure and contains fields for different bibliographic details like author names, publication title, journal or book title, year of publication, page numbers, and more.\n",
"`BibTeX` files have a `.bib` extension and consist of plain text entries representing references to various publications, such as books, articles, conference papers, theses, and more. Each `BibTeX` entry follows a specific structure and contains fields for different bibliographic details like author names, publication title, journal or book title, year of publication, page numbers, and more.\n",
"\n",
"Bibtex files can also store the path to documents, such as `.pdf` files that can be retrieved."
"BibTeX files can also store the path to documents, such as `.pdf` files that can be retrieved."
]
},
{
@ -184,7 +184,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -6,7 +6,8 @@
"metadata": {},
"source": [
"# Couchbase\n",
"[Couchbase](http://couchbase.com/) is an award-winning distributed NoSQL cloud database that delivers unmatched versatility, performance, scalability, and financial value for all of your cloud, mobile, AI, and edge computing applications.\n"
"\n",
">[Couchbase](http://couchbase.com/) is an award-winning distributed NoSQL cloud database that delivers unmatched versatility, performance, scalability, and financial value for all of your cloud, mobile, AI, and edge computing applications.\n"
]
},
{
@ -195,7 +196,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -112,6 +112,17 @@ See a [usage example](/docs/integrations/document_loaders/amazon_textract).
from langchain_community.document_loaders import AmazonTextractPDFLoader
```
### Amazon Athena
>[Amazon Athena](https://aws.amazon.com/athena/) is a serverless, interactive analytics service built
>on open-source frameworks, supporting open-table and file formats.
See a [usage example](/docs/integrations/document_loaders/athena).
```python
from langchain_community.document_loaders.athena import AthenaLoader
```
## Vector stores
### Amazon OpenSearch Service

@ -0,0 +1,15 @@
# Acreom
>[acreom](https://acreom.com) is a dev-first knowledge base with tasks running on local `markdown` files.
## Installation and Setup
No installation is required.
## Document Loader
See a [usage example](/docs/integrations/document_loaders/acreom).
```python
from langchain_community.document_loaders import AcreomLoader
```

@ -3,11 +3,7 @@
>[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs,
> databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.
## [AirbyteLoader](/docs/integrations/document_loaders/airbyte)
This loader is built on top of [PyAirbyte](https://pypi.org/project/airbyte/) for easy setup and use.
### Installation and Setup
## Installation and Setup
```bash
pip install -U langchain-airbyte
@ -15,58 +11,22 @@ pip install -U langchain-airbyte
:::note
Currently, the `airbyte` library does not support Pydantic v2.
Currently, the `langchain-airbyte` library does not support Pydantic v2.
Please downgrade to Pydantic v1 to use this package.
This package also currently requires Python 3.10+.
:::
The integration package doesn't have any global environment variables that need to be
The integration package doesn't require any global environment variables to be
set, but some integrations (e.g. `source-github`) may need credentials passed in.
### Document Loader
`AirbyteLoader` class exposes a single document loader for Airbyte sources.
```python
from langchain_airbyte import AirbyteLoader
loader = AirbyteLoader(
source="source-faker",
stream="users",
config={"count": 100},
)
docs = loader.load()
```
For more information, see the full [AirbyteLoader docs](/docs/integrations/document_loaders/airbyte).
## AirbyteJSONLoader (Deprecated)
This loader is deprecated and should be swapped out for `AirbyteLoader`, which doesn't require any of the docker setup!
### Installation and Setup
## Document loader
This instruction shows how to load any source from `Airbyte` into a local `JSON` file that can be read in as a document.
### AirbyteLoader
**Prerequisites:**
Have `docker desktop` installed.
**Steps:**
1. Clone Airbyte from GitHub - `git clone https://github.com/airbytehq/airbyte.git`.
2. Switch into Airbyte directory - `cd airbyte`.
3. Start Airbyte - `docker compose up`.
4. In your browser, just visit http://localhost:8000. You will be asked for a username and password. By default, that's username `airbyte` and password `password`.
5. Setup any source you wish.
6. Set destination as Local JSON, with specified destination path - lets say `/json_data`. Set up a manual sync.
7. Run the connection.
8. To see what files are created, navigate to: `file:///tmp/airbyte_local/`.
### Document Loader
See a [usage example](/docs/integrations/document_loaders/airbyte_json).
See a [usage example](/docs/integrations/document_loaders/airbyte).
```python
from langchain_community.document_loaders import AirbyteJSONLoader
from langchain_airbyte import AirbyteLoader
```

@ -0,0 +1,20 @@
# Alchemy
>[Alchemy](https://www.alchemy.com) is the platform to build blockchain applications.
## Installation and Setup
Check out the [installation guide](/docs/integrations/document_loaders/blockchain).
## Document loader
### BlockchainLoader on the Alchemy platform
See a [usage example](/docs/integrations/document_loaders/blockchain).
```python
from langchain_community.document_loaders.blockchain import (
BlockchainDocumentLoader,
BlockchainType,
)
```

@ -0,0 +1,27 @@
# ArcGIS
>[ArcGIS](https://www.esri.com/en-us/arcgis/about-arcgis/overview) is a family of client,
> server and online geographic information system software developed and maintained by [Esri](https://www.esri.com/).
>
>`ArcGISLoader` uses the `arcgis` package.
> `arcgis` is a Python library for vector and raster analysis, geocoding, map making,
> routing and directions. It administers, organizes and manages users,
> groups and information items in your GIS.
>It enables access to ready-to-use maps and curated geographic data from `Esri`
> and other authoritative sources, and works with your own data as well.
## Installation and Setup
We have to install the `arcgis` package.
```bash
pip install -U arcgis
```
## Document Loader
See a [usage example](/docs/integrations/document_loaders/arcgis).
```python
from langchain_community.document_loaders import ArcGISLoader
```

@ -0,0 +1,32 @@
# AssemblyAI
>[AssemblyAI](https://www.assemblyai.com/) builds `Speech AI` models for tasks like
> speech-to-text, speaker diarization, speech summarization, and more.
> `AssemblyAI`'s Speech AI models include accurate speech-to-text for voice data
> (such as calls, virtual meetings, and podcasts), speaker detection, sentiment analysis,
> chapter detection, and PII redaction.
## Installation and Setup
Get your [API key](https://www.assemblyai.com/dashboard/signup).
Install the `assemblyai` package.
```bash
pip install -U assemblyai
```
## Document Loader
### AssemblyAI Audio Transcript
The `AssemblyAIAudioTranscriptLoader` transcribes audio files with the `AssemblyAI API`
and loads the transcribed text into documents.
See a [usage example](/docs/integrations/document_loaders/assemblyai).
```python
from langchain_community.document_loaders import AssemblyAIAudioTranscriptLoader
```

@ -0,0 +1,20 @@
# BibTeX
>[BibTeX](https://www.ctan.org/pkg/bibtex) is a file format and reference management system commonly used in conjunction with `LaTeX` typesetting. It serves as a way to organize and store bibliographic information for academic and research documents.
## Installation and Setup
We have to install the `bibtexparser` and `pymupdf` packages.
```bash
pip install bibtexparser pymupdf
```
## Document loader
See a [usage example](/docs/integrations/document_loaders/bibtex).
```python
from langchain_community.document_loaders import BibtexLoader
```
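BibTeX entries are plain text: an entry type, a citation key, and `key = value` fields such as author, title, and year. To make that structure concrete, here is a minimal stdlib-only sketch that pulls the fields out of a single entry. It is not how `BibtexLoader` or `bibtexparser` work internally; the helper `parse_entry` and the sample entry are hypothetical, and nested braces, `@string` macros, and multi-entry files are deliberately not handled.

```python
import re

# Entry header, e.g. "@book{doe2020," -> type "book", citation key "doe2020".
ENTRY_HEADER = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")

# One field: `key = {value}`, `key = "value"`, or a bare word/number like `year = 2020`.
# Values containing nested braces are skipped by this simplified pattern.
FIELD = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)"|(\w+))')


def parse_entry(text: str) -> dict:
    """Parse a single, simple BibTeX entry into a flat dict (hypothetical helper)."""
    header = ENTRY_HEADER.search(text)
    if header is None:
        raise ValueError("no BibTeX entry header found")
    entry = {
        "entry_type": header.group(1).lower(),
        "citation_key": header.group(2),
    }
    # Scan only the text after the header for `key = value` pairs.
    for key, braced, quoted, bare in FIELD.findall(text[header.end():]):
        entry[key.lower()] = braced or quoted or bare
    return entry


# Hypothetical entry; the `file` field stores a path to a document such as a `.pdf`.
SAMPLE = """@book{doe2020,
  author = {Jane Doe},
  title = "A Hypothetical Book",
  year = 2020,
  file = {docs/doe2020.pdf}
}"""

if __name__ == "__main__":
    print(parse_entry(SAMPLE))
```

The `file` value in the parsed dict is the kind of document path a BibTeX-based loader can follow to retrieve the referenced PDF, which is presumably why `pymupdf` is installed alongside `bibtexparser`.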

@ -0,0 +1,18 @@
# Browserless
>[Browserless](https://www.browserless.io/docs/start) is a service that allows you to
> run headless Chrome instances in the cloud. It's a great way to run browser-based
> automation at scale without having to worry about managing your own infrastructure.
## Installation and Setup
We have to get the API key [here](https://www.browserless.io/pricing/).
## Document loader
See a [usage example](/docs/integrations/document_loaders/browserless).
```python
from langchain_community.document_loaders import BrowserlessLoader
```

@ -0,0 +1,22 @@
# ByteDance
>[ByteDance](https://bytedance.com/) is a Chinese internet technology company.
## Installation and Setup
Get the access token.
You can find the access instructions [here](https://open.larksuite.com/document).
## Document Loader
### Lark Suite
>[Lark Suite](https://www.larksuite.com/) is an enterprise collaboration platform
> developed by `ByteDance`.
See a [usage example](/docs/integrations/document_loaders/larksuite).
```python
from langchain_community.document_loaders.larksuite import LarkSuiteDocLoader
```

@ -0,0 +1,22 @@
# Couchbase
>[Couchbase](http://couchbase.com/) is an award-winning distributed NoSQL cloud database
> that delivers unmatched versatility, performance, scalability, and financial value
> for all of your cloud, mobile, AI, and edge computing applications.
## Installation and Setup
We have to install the `couchbase` package.
```bash
pip install couchbase
```
## Document loader
See a [usage example](/docs/integrations/document_loaders/couchbase).
```python
from langchain_community.document_loaders.couchbase import CouchbaseLoader
```

@ -0,0 +1,21 @@
# Cube
>[Cube](https://cube.dev/) is the Semantic Layer for building data apps. It helps
> data engineers and application developers access data from modern data stores,
> organize it into consistent definitions, and deliver it to every application.
## Installation and Setup
We have to get the API key and the URL of the Cube instance. See
[these instructions](https://cube.dev/docs/product/apis-integrations/rest-api#configuration-base-path).
## Document loader
### Cube Semantic Layer
See a [usage example](/docs/integrations/document_loaders/cube_semantic).
```python
from langchain_community.document_loaders import CubeSemanticLoader
```

@ -0,0 +1,20 @@
# Docusaurus
>[Docusaurus](https://docusaurus.io/) is a static-site generator which provides
> out-of-the-box documentation features.
## Installation and Setup
```bash
pip install -U beautifulsoup4 lxml
```
## Document Loader
See a [usage example](/docs/integrations/document_loaders/docusaurus).
```python
from langchain_community.document_loaders import DocusaurusLoader
```
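`DocusaurusLoader` builds on LangChain's sitemap loader, so it discovers pages through the site's `sitemap.xml` rather than by following links. The stdlib-only sketch below illustrates that discovery step on an inline sitemap: collect every `<loc>` URL and keep the ones under a given prefix. The helper `page_urls`, its prefix filter, and the `example.com` URLs are hypothetical and are not the loader's API; fetching and parsing the pages themselves (presumably what `beautifulsoup4` and `lxml` are installed for) is left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def page_urls(sitemap_xml: str, prefix: str = "") -> list[str]:
    """Return the <loc> URLs in a sitemap that start with `prefix` (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in urls if url.startswith(prefix)]


# Inline stand-in for https://<your-site>/sitemap.xml.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/docs/tutorial/basics</loc></url>
  <url><loc>https://example.com/blog/welcome</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in page_urls(SAMPLE_SITEMAP, prefix="https://example.com/docs/"):
        print(url)
```

A prefix filter is one simple way to keep documentation pages and skip blog posts; the real loader may expose its own filtering options, so check the usage example before relying on any particular parameter.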

@ -0,0 +1,21 @@
# Dropbox
>[Dropbox](https://en.wikipedia.org/wiki/Dropbox) is a file hosting service that brings everything (traditional
> files, cloud content, and web shortcuts) together in one place.
## Installation and Setup
See the detailed [installation guide](/docs/integrations/document_loaders/dropbox#prerequisites).
```bash
pip install -U dropbox
```
## Document Loader
See a [usage example](/docs/integrations/document_loaders/dropbox).
```python
from langchain_community.document_loaders import DropboxLoader
```

@ -0,0 +1,18 @@
# Etherscan
>[Etherscan](https://docs.etherscan.io/) is the leading blockchain explorer,
> search, API and analytics platform for `Ethereum`, a decentralized smart contracts platform.
## Installation and Setup
See the detailed [installation guide](/docs/integrations/document_loaders/etherscan).
## Document Loader
See a [usage example](/docs/integrations/document_loaders/etherscan).
```python
from langchain_community.document_loaders import EtherscanLoader
```

@ -0,0 +1,25 @@
# Fauna
>[Fauna](https://fauna.com/) is a distributed document-relational database
> that combines the flexibility of documents with the power of a relational,
> ACID-compliant database that scales across regions, clouds or the globe.
## Installation and Setup
We have to get the secret key.
See the detailed [guide](https://docs.fauna.com/fauna/current/learn/security_model/).
We have to install the `fauna` package.
```bash
pip install -U fauna
```
## Document Loader
See a [usage example](/docs/integrations/document_loaders/fauna).
```python
from langchain_community.document_loaders.fauna import FaunaLoader
```

@ -0,0 +1,23 @@
# Geopandas
>[GeoPandas](https://geopandas.org/) is an open source project to make working
> with geospatial data in Python easier. `GeoPandas` extends the datatypes used by
> `pandas` to allow spatial operations on geometric types.
> Geometric operations are performed by `shapely`.
## Installation and Setup
We have to install several Python packages.
```bash
pip install -U sodapy pandas geopandas
```
## Document Loader
See a [usage example](/docs/integrations/document_loaders/geopandas).
```python
from langchain_community.document_loaders import OpenCityDataLoader
```

@ -0,0 +1,23 @@
# GitHub
>[GitHub](https://github.com/) is a developer platform that allows developers to create,
> store, manage and share their code. It uses `Git` software, providing the
> distributed version control of Git plus access control, bug tracking,
> software feature requests, task management, continuous integration, and wikis for every project.
## Installation and Setup
To access the GitHub API, you need a [personal access token](https://github.com/settings/tokens).
## Document Loader
There are two document loaders available for GitHub.
See a [usage example](/docs/integrations/document_loaders/github).
```python
from langchain_community.document_loaders import GitHubIssuesLoader
from langchain_community.document_loaders import GithubFileLoader
```
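Authenticated calls to the GitHub REST API carry the personal access token in the `Authorization` header. As a rough, stdlib-only illustration of that (not the loaders' internals), the sketch below builds, without sending, a request that lists a repository's issues. The helper `issues_request`, the sample repository, and the placeholder token are hypothetical; the `/repos/{owner}/{repo}/issues` endpoint and the `application/vnd.github+json` media type come from GitHub's REST API.

```python
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def issues_request(owner: str, repo: str, token: str, state: str = "open") -> urllib.request.Request:
    """Build (but do not send) a request listing a repository's issues (hypothetical helper)."""
    query = urllib.parse.urlencode({"state": state})
    url = f"{API_ROOT}/repos/{owner}/{repo}/issues?{query}"
    return urllib.request.Request(
        url,
        headers={
            # GitHub's recommended media type for REST API responses.
            "Accept": "application/vnd.github+json",
            # The personal access token authenticates the request.
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    # Placeholder token: substitute your own, and never commit a real one.
    req = issues_request("octocat", "Hello-World", token="<your-token>")
    print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; in practice the LangChain loaders take the token and make these calls for you.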

@ -0,0 +1,37 @@
# Huawei
>[Huawei Technologies Co., Ltd.](https://www.huawei.com/) is a Chinese multinational
> digital communications technology corporation.
>
>[Huawei Cloud](https://www.huaweicloud.com/intl/en-us/product/) provides a comprehensive suite of
> global cloud computing services.
## Installation and Setup
To access `Huawei Cloud`, you need an access token.
You also have to install a Python library:
```bash
pip install -U esdk-obs-python
```
## Document Loader
### Huawei OBS Directory
See a [usage example](/docs/integrations/document_loaders/huawei_obs_directory).
```python
from langchain_community.document_loaders import OBSDirectoryLoader
```
### Huawei OBS File
See a [usage example](/docs/integrations/document_loaders/huawei_obs_file).
```python
from langchain_community.document_loaders.obs_file import OBSFileLoader
```

@ -0,0 +1,19 @@
# Iugu
>[Iugu](https://www.iugu.com/) is a Brazilian services and software as a service (SaaS)
> company. It offers payment-processing software and application programming
> interfaces for e-commerce websites and mobile applications.
## Installation and Setup
The `Iugu API` requires an access token, which can be found in the `Iugu` dashboard.
## Document Loader
See a [usage example](/docs/integrations/document_loaders/iugu).
```python
from langchain_community.document_loaders import IuguLoader
```

@ -0,0 +1,19 @@
# Joplin
>[Joplin](https://joplinapp.org/) is an open-source note-taking app. It lets you capture your thoughts
> and securely access them from any device.
## Installation and Setup
The `Joplin API` requires an access token.
You can find installation instructions [here](https://joplinapp.org/api/references/rest_api/).
## Document Loader
See a [usage example](/docs/integrations/document_loaders/joplin).
```python
from langchain_community.document_loaders import JoplinLoader
```

@ -0,0 +1,18 @@
# lakeFS
>[lakeFS](https://docs.lakefs.io/) provides scalable version control over
> the data lake, and uses Git-like semantics to create and access those versions.
## Installation and Setup
Get the `ENDPOINT`, `LAKEFS_ACCESS_KEY`, and `LAKEFS_SECRET_KEY`.
You can find installation instructions [here](https://docs.lakefs.io/quickstart/launch.html).
## Document Loader
See a [usage example](/docs/integrations/document_loaders/lakefs).
```python
from langchain_community.document_loaders import LakeFSLoader
```