docs: `integrations/providers` update 9 (#19941)

- Added missed providers
- Added links, descriptions in related examples
- Formatted in a consistent format

Co-authored-by: Erick Friis <erick@langchain.dev>
2 months ago · 3856dedff4
parent 644ff46100
commit 3856dedff4
25 changed files with 485 additions and 58 deletions
--- a/docs/docs/integrations/document_loaders/acreom.ipynb
+++ b/docs/docs/integrations/document_loaders/acreom.ipynb
@ -67,7 +67,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.10.10"
+   "version": "3.10.12"
  }
 },
 "nbformat": 4,
--- a/docs/docs/integrations/document_loaders/athena.ipynb
+++ b/docs/docs/integrations/document_loaders/athena.ipynb
@ -8,7 +8,25 @@
   "source": [
    "# Athena\n",
    "\n",
-    "This notebooks goes over how to load documents from AWS Athena"
+    ">[Amazon Athena](https://aws.amazon.com/athena/) is a serverless, interactive analytics service built\n",
+    ">on open-source frameworks, supporting open-table and file formats. `Athena` provides a simplified,\n",
+    ">flexible way to analyze petabytes of data where it lives. Analyze data or build applications\n",
+    ">from an Amazon Simple Storage Service (S3) data lake and 30 data sources, including on-premises data\n",
+    ">sources or other cloud systems using SQL or Python. `Athena` is built on open-source `Trino`\n",
+    ">and `Presto` engines and `Apache Spark` frameworks, with no provisioning or configuration effort required.\n",
+    "\n",
+    "This notebook goes over how to load documents from `AWS Athena`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Setting up\n",
+    "\n",
+    "Follow [instructions to set up an AWS account](https://docs.aws.amazon.com/athena/latest/ug/setting-up.html).\n",
+    "\n",
+    "Install a Python library:"
   ]
  },
  {
@ -22,6 +40,13 @@
    "! pip install boto3"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Example"
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": null,
@ -98,13 +123,23 @@
   "provenance": []
  },
  "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
   "name": "python3"
  },
  "language_info": {
-   "name": "python"
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
  }
 },
 "nbformat": 4,
- "nbformat_minor": 0
+ "nbformat_minor": 4
 }
--- a/docs/docs/integrations/document_loaders/bibtex.ipynb
+++ b/docs/docs/integrations/document_loaders/bibtex.ipynb
@ -7,11 +7,11 @@
   "source": [
    "# BibTeX\n",
    "\n",
-    "> BibTeX is a file format and reference management system commonly used in conjunction with LaTeX typesetting. It serves as a way to organize and store bibliographic information for academic and research documents.\n",
+    ">[BibTeX](https://www.ctan.org/pkg/bibtex) is a file format and reference management system commonly used in conjunction with `LaTeX` typesetting. It serves as a way to organize and store bibliographic information for academic and research documents.\n",
    "\n",
-    "BibTeX files have a .bib extension and consist of plain text entries representing references to various publications, such as books, articles, conference papers, theses, and more. Each BibTeX entry follows a specific structure and contains fields for different bibliographic details like author names, publication title, journal or book title, year of publication, page numbers, and more.\n",
+    "`BibTeX` files have a `.bib` extension and consist of plain text entries representing references to various publications, such as books, articles, conference papers, theses, and more. Each `BibTeX` entry follows a specific structure and contains fields for different bibliographic details like author names, publication title, journal or book title, year of publication, page numbers, and more.\n",
    "\n",
-    "Bibtex files can also store the path to documents, such as `.pdf` files that can be retrieved."
+    "BibTeX files can also store the path to documents, such as `.pdf` files that can be retrieved."
   ]
  },
  {
@ -184,7 +184,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.11.3"
+   "version": "3.10.12"
  }
 },
 "nbformat": 4,
--- a/docs/docs/integrations/document_loaders/couchbase.ipynb
+++ b/docs/docs/integrations/document_loaders/couchbase.ipynb
@ -6,7 +6,8 @@
   "metadata": {},
   "source": [
    "# Couchbase\n",
-    "[Couchbase](http://couchbase.com/) is an award-winning distributed NoSQL cloud database that delivers unmatched versatility, performance, scalability, and financial value for all of your cloud, mobile, AI, and edge computing applications.\n"
+    "\n",
+    ">[Couchbase](http://couchbase.com/) is an award-winning distributed NoSQL cloud database that delivers unmatched versatility, performance, scalability, and financial value for all of your cloud, mobile, AI, and edge computing applications.\n"
   ]
  },
  {
@ -195,7 +196,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.9.13"
+   "version": "3.10.12"
  }
 },
 "nbformat": 4,
--- a/docs/docs/integrations/platforms/aws.mdx
+++ b/docs/docs/integrations/platforms/aws.mdx
@ -112,6 +112,17 @@ See a [usage example](/docs/integrations/document_loaders/amazon_textract).
 from langchain_community.document_loaders import AmazonTextractPDFLoader
 ```

+### Amazon Athena
+
+>[Amazon Athena](https://aws.amazon.com/athena/) is a serverless, interactive analytics service built
+>on open-source frameworks, supporting open-table and file formats.
+
+See a [usage example](/docs/integrations/document_loaders/athena).
+
+```python
+from langchain_community.document_loaders.athena import AthenaLoader
+```
+
 ## Vector stores

 ### Amazon OpenSearch Service
--- a/docs/docs/integrations/providers/acreom.mdx
+++ b/docs/docs/integrations/providers/acreom.mdx
@ -0,0 +1,15 @@
+# Acreom
+
+[acreom](https://acreom.com) is a dev-first knowledge base with tasks running on local `markdown` files.
+
+## Installation and Setup
+
+No installation is required. 
+
+## Document Loader
+
+See a [usage example](/docs/integrations/document_loaders/acreom).
+
+```python
+from langchain_community.document_loaders import AcreomLoader
+```
--- a/docs/docs/integrations/providers/airbyte.mdx
+++ b/docs/docs/integrations/providers/airbyte.mdx
@ -3,11 +3,7 @@
 >[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, 
 > databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.

-## [AirbyteLoader](/docs/integrations/document_loaders/airbyte)
-
-This loader is built on top of [PyAirbyte](https://pypi.org/project/airbyte/) for easy setup and use.
-
-### Installation and Setup
+## Installation and Setup

 ```bash
 pip install -U langchain-airbyte
@ -15,58 +11,22 @@ pip install -U langchain-airbyte

 :::note

-Currently, the `airbyte` library does not support Pydantic v2.
+Currently, the `langchain-airbyte` library does not support Pydantic v2.
 Please downgrade to Pydantic v1 to use this package.

 This package also currently requires Python 3.10+.

 :::

-The integration package doesn't have any global environment variables that need to be
+The integration package doesn't require any global environment variables to be
 set, but some integrations (e.g. `source-github`) may need credentials passed in.

-### Document Loader
-
-`AirbyteLoader` class exposes a single document loader for Airbyte sources.
-
-```python
-from langchain_airbyte import AirbyteLoader
-
-loader = AirbyteLoader(
-    source="source-faker",
-    stream="users",
-    config={"count": 100},
-)
-docs = loader.load()
-```
-
-For more information, see the full [AirbyteLoader docs](/docs/integrations/document_loaders/airbyte).
-
-## AirbyteJSONLoader (Deprecated)
-
-This loader is deprecated and should be swapped out for `AirbyteLoader`, which doesn't require any of the docker setup!
-
-### Installation and Setup
+## Document loader

-This instruction shows how to load any source from `Airbyte` into a local `JSON` file that can be read in as a document.
+### AirbyteLoader

-**Prerequisites:**
-Have `docker desktop` installed.
-
-**Steps:**
-1. Clone Airbyte from GitHub - `git clone https://github.com/airbytehq/airbyte.git`.
-2. Switch into Airbyte directory - `cd airbyte`.
-3. Start Airbyte - `docker compose up`.
-4. In your browser, just visit http://localhost:8000. You will be asked for a username and password. By default, that's username `airbyte` and password `password`.
-5. Setup any source you wish.
-6. Set destination as Local JSON, with specified destination path - lets say `/json_data`. Set up a manual sync.
-7. Run the connection.
-8. To see what files are created, navigate to: `file:///tmp/airbyte_local/`.
-
-### Document Loader
-
-See a [usage example](/docs/integrations/document_loaders/airbyte_json).
+See a [usage example](/docs/integrations/document_loaders/airbyte).

 ```python
-from langchain_community.document_loaders import AirbyteJSONLoader
+from langchain_airbyte import AirbyteLoader
 ```
--- a/docs/docs/integrations/providers/alchemy.mdx
+++ b/docs/docs/integrations/providers/alchemy.mdx
@ -0,0 +1,20 @@
+# Alchemy
+
+>[Alchemy](https://www.alchemy.com) is the platform to build blockchain applications.
+
+## Installation and Setup
+
+Check out the [installation guide](/docs/integrations/document_loaders/blockchain).
+
+## Document loader
+
+### BlockchainLoader on the Alchemy platform
+
+See a [usage example](/docs/integrations/document_loaders/blockchain).
+
+```python
+from langchain_community.document_loaders.blockchain import (
+    BlockchainDocumentLoader,
+    BlockchainType,
+)
+```
--- a/docs/docs/integrations/providers/arcgis.mdx
+++ b/docs/docs/integrations/providers/arcgis.mdx
@ -0,0 +1,27 @@
+# ArcGIS
+
+>[ArcGIS](https://www.esri.com/en-us/arcgis/about-arcgis/overview) is a family of client, 
+> server and online geographic information system software developed and maintained by [Esri](https://www.esri.com/).
+> 
+>`ArcGISLoader` uses the `arcgis` package.
+> `arcgis` is a Python library for vector and raster analysis, geocoding, map making, 
+> routing and directions. It administers, organizes and manages users, 
+> groups and information items in your GIS.
+>It enables access to ready-to-use maps and curated geographic data from `Esri` 
+> and other authoritative sources, and works with your own data as well. 
+
+## Installation and Setup
+
+We have to install the `arcgis` package.
+
+```bash
+pip install -U arcgis
+```
+
+## Document Loader
+
+See a [usage example](/docs/integrations/document_loaders/arcgis).
+
+```python
+from langchain_community.document_loaders import ArcGISLoader
+```
--- a/docs/docs/integrations/providers/assemblyai.mdx
+++ b/docs/docs/integrations/providers/assemblyai.mdx
@ -0,0 +1,32 @@
+# AssemblyAI
+
+>[AssemblyAI](https://www.assemblyai.com/) builds `Speech AI` models for tasks like 
+> speech-to-text, speaker diarization, speech summarization, and more.
+> `AssemblyAI’s` Speech AI models include accurate speech-to-text for voice data 
+> (such as calls, virtual meetings, and podcasts), speaker detection, sentiment analysis, 
+> chapter detection, and PII redaction.
+ 
+
+
+## Installation and Setup
+
+Get your [API key](https://www.assemblyai.com/dashboard/signup).
+
+Install the `assemblyai` package.
+
+```bash
+pip install -U assemblyai
+```
+
+## Document Loader
+
+###  AssemblyAI Audio Transcript
+
+The `AssemblyAIAudioTranscriptLoader` transcribes audio files with the `AssemblyAI API` 
+and loads the transcribed text into documents.
+
+See a [usage example](/docs/integrations/document_loaders/assemblyai).
+
+```python
+from langchain_community.document_loaders import AssemblyAIAudioTranscriptLoader
+```
--- a/docs/docs/integrations/providers/bibtex.mdx
+++ b/docs/docs/integrations/providers/bibtex.mdx
@ -0,0 +1,20 @@
+# BibTeX
+
+>[BibTeX](https://www.ctan.org/pkg/bibtex) is a file format and reference management system commonly used in conjunction with `LaTeX` typesetting. It serves as a way to organize and store bibliographic information for academic and research documents.
+
+## Installation and Setup
+
+We have to install the `bibtexparser` and `pymupdf` packages.
+
+```bash
+pip install bibtexparser pymupdf
+```
+
+
+## Document loader
+
+See a [usage example](/docs/integrations/document_loaders/bibtex).
+
+```python
+from langchain_community.document_loaders import BibtexLoader
+```
--- a/docs/docs/integrations/providers/browserless.mdx
+++ b/docs/docs/integrations/providers/browserless.mdx
@ -0,0 +1,18 @@
+# Browserless
+
+>[Browserless](https://www.browserless.io/docs/start) is a service that allows you to 
+> run headless Chrome instances in the cloud. It’s a great way to run browser-based 
+> automation at scale without having to worry about managing your own infrastructure.
+
+## Installation and Setup
+
+We have to get the API key [here](https://www.browserless.io/pricing/).
+
+
+## Document loader
+
+See a [usage example](/docs/integrations/document_loaders/browserless).
+
+```python
+from langchain_community.document_loaders import BrowserlessLoader
+```
--- a/docs/docs/integrations/providers/byte_dance.mdx
+++ b/docs/docs/integrations/providers/byte_dance.mdx
@ -0,0 +1,22 @@
+# ByteDance
+
+>[ByteDance](https://bytedance.com/) is a Chinese internet technology company.
+
+## Installation and Setup
+
+Get the access token.
+You can find the access instructions [here](https://open.larksuite.com/document).
+
+
+## Document Loader
+
+### Lark Suite
+
+>[Lark Suite](https://www.larksuite.com/) is an enterprise collaboration platform 
+> developed by `ByteDance`.
+
+See a [usage example](/docs/integrations/document_loaders/larksuite).
+
+```python
+from langchain_community.document_loaders.larksuite import LarkSuiteDocLoader
+```
--- a/docs/docs/integrations/providers/couchbase.mdx
+++ b/docs/docs/integrations/providers/couchbase.mdx
@ -0,0 +1,22 @@
+# Couchbase
+
+>[Couchbase](http://couchbase.com/) is an award-winning distributed NoSQL cloud database 
+> that delivers unmatched versatility, performance, scalability, and financial value 
+> for all of your cloud, mobile, AI, and edge computing applications.
+
+## Installation and Setup
+
+We have to install the `couchbase` package.
+
+```bash
+pip install couchbase
+```
+
+
+## Document loader
+
+See a [usage example](/docs/integrations/document_loaders/couchbase).
+
+```python
+from langchain_community.document_loaders.couchbase import CouchbaseLoader
+```
--- a/docs/docs/integrations/providers/cube.mdx
+++ b/docs/docs/integrations/providers/cube.mdx
@ -0,0 +1,21 @@
+# Cube
+
+>[Cube](https://cube.dev/) is the Semantic Layer for building data apps. It helps 
+> data engineers and application developers access data from modern data stores, 
+> organize it into consistent definitions, and deliver it to every application.
+
+## Installation and Setup
+
+We have to get the API key and the URL of the Cube instance. See 
+[these instructions](https://cube.dev/docs/product/apis-integrations/rest-api#configuration-base-path).
+
+
+## Document loader
+
+### Cube Semantic Layer
+
+See a [usage example](/docs/integrations/document_loaders/cube_semantic).
+
+```python
+from langchain_community.document_loaders import CubeSemanticLoader
+```
--- a/docs/docs/integrations/providers/docusaurus.mdx
+++ b/docs/docs/integrations/providers/docusaurus.mdx
@ -0,0 +1,20 @@
+# Docusaurus
+
+>[Docusaurus](https://docusaurus.io/) is a static-site generator which provides 
+> out-of-the-box documentation features.
+ 
+
+## Installation and Setup
+
+
+```bash
+pip install -U beautifulsoup4 lxml
+```
+
+## Document Loader
+
+See a [usage example](/docs/integrations/document_loaders/docusaurus).
+
+```python
+from langchain_community.document_loaders import DocusaurusLoader
+```
--- a/docs/docs/integrations/providers/dropbox.mdx
+++ b/docs/docs/integrations/providers/dropbox.mdx
@ -0,0 +1,21 @@
+# Dropbox
+
+>[Dropbox](https://en.wikipedia.org/wiki/Dropbox) is a file hosting service that brings everything (traditional 
+> files, cloud content, and web shortcuts) together in one place.
+ 
+
+## Installation and Setup
+
+See the detailed [installation guide](/docs/integrations/document_loaders/dropbox#prerequisites).
+
+```bash
+pip install -U dropbox
+```
+
+## Document Loader
+
+See a [usage example](/docs/integrations/document_loaders/dropbox).
+
+```python
+from langchain_community.document_loaders import DropboxLoader
+```
--- a/docs/docs/integrations/providers/etherscan.mdx
+++ b/docs/docs/integrations/providers/etherscan.mdx
@ -0,0 +1,18 @@
+# Etherscan
+
+>[Etherscan](https://docs.etherscan.io/) is the leading blockchain explorer, 
+> search, API and analytics platform for `Ethereum`, a decentralized smart contracts platform.
+ 
+
+## Installation and Setup
+
+See the detailed [installation guide](/docs/integrations/document_loaders/etherscan).
+
+
+## Document Loader
+
+See a [usage example](/docs/integrations/document_loaders/etherscan).
+
+```python
+from langchain_community.document_loaders import EtherscanLoader
+```
--- a/docs/docs/integrations/providers/fauna.mdx
+++ b/docs/docs/integrations/providers/fauna.mdx
@ -0,0 +1,25 @@
+# Fauna
+
+>[Fauna](https://fauna.com/) is a distributed document-relational database 
+> that combines the flexibility of documents with the power of a relational, 
+> ACID compliant database that scales across regions, clouds or the globe.
+ 
+
+## Installation and Setup
+
+We have to get the secret key.
+See the detailed [guide](https://docs.fauna.com/fauna/current/learn/security_model/).
+
+We have to install the `fauna` package.
+
+```bash
+pip install -U fauna
+```
+
+## Document Loader
+
+See a [usage example](/docs/integrations/document_loaders/fauna).
+
+```python
+from langchain_community.document_loaders.fauna import FaunaLoader
+```
--- a/docs/docs/integrations/providers/geopandas.mdx
+++ b/docs/docs/integrations/providers/geopandas.mdx
@ -0,0 +1,23 @@
+# GeoPandas
+
+>[GeoPandas](https://geopandas.org/) is an open source project to make working 
+> with geospatial data in Python easier. `GeoPandas` extends the datatypes used by 
+> `pandas` to allow spatial operations on geometric types. 
+> Geometric operations are performed by `shapely`.
+ 
+
+## Installation and Setup
+
+We have to install several Python packages.
+
+```bash
+pip install -U sodapy pandas geopandas
+```
+
+## Document Loader
+
+See a [usage example](/docs/integrations/document_loaders/geopandas).
+
+```python
+from langchain_community.document_loaders import OpenCityDataLoader
+```
--- a/docs/docs/integrations/providers/github.mdx
+++ b/docs/docs/integrations/providers/github.mdx
@ -0,0 +1,23 @@
+# GitHub
+
+>[GitHub](https://github.com/) is a developer platform that allows developers to create, 
+> store, manage and share their code. It uses `Git` software, providing the 
+> distributed version control of Git plus access control, bug tracking, 
+> software feature requests, task management, continuous integration, and wikis for every project.
+ 
+
+## Installation and Setup
+
+To access the GitHub API, you need a [personal access token](https://github.com/settings/tokens).
+
+
+## Document Loader
+
+There are two document loaders available for GitHub.
+
+See a [usage example](/docs/integrations/document_loaders/github).
+
+```python
+from langchain_community.document_loaders import GitHubIssuesLoader
+from langchain_community.document_loaders import GithubFileLoader
+```
--- a/docs/docs/integrations/providers/huawei.mdx
+++ b/docs/docs/integrations/providers/huawei.mdx
@ -0,0 +1,37 @@
+# Huawei
+
+>[Huawei Technologies Co., Ltd.](https://www.huawei.com/) is a Chinese multinational 
+> digital communications technology corporation.
+> 
+>[Huawei Cloud](https://www.huaweicloud.com/intl/en-us/product/) provides a comprehensive suite of 
+> global cloud computing services. 
+ 
+
+## Installation and Setup
+
+To access the `Huawei Cloud`, you need an access token.
+
+You also have to install a Python library:
+
+```bash
+pip install -U esdk-obs-python
+```
+
+
+## Document Loader
+
+### Huawei OBS Directory
+
+See a [usage example](/docs/integrations/document_loaders/huawei_obs_directory).
+
+```python
+from langchain_community.document_loaders import OBSDirectoryLoader
+```
+
+### Huawei OBS File
+
+See a [usage example](/docs/integrations/document_loaders/huawei_obs_file).
+
+```python
+from langchain_community.document_loaders.obs_file import OBSFileLoader
+```
--- a/docs/docs/integrations/providers/iugu.mdx
+++ b/docs/docs/integrations/providers/iugu.mdx
@ -0,0 +1,19 @@
+# Iugu
+
+>[Iugu](https://www.iugu.com/) is a Brazilian services and software as a service (SaaS)
+> company. It offers payment-processing software and application programming 
+> interfaces for e-commerce websites and mobile applications.
+ 
+
+## Installation and Setup
+
+The `Iugu API` requires an access token, which can be found inside of the `Iugu` dashboard.
+
+
+## Document Loader
+
+See a [usage example](/docs/integrations/document_loaders/iugu).
+
+```python
+from langchain_community.document_loaders import IuguLoader
+```
--- a/docs/docs/integrations/providers/joplin.mdx
+++ b/docs/docs/integrations/providers/joplin.mdx
@ -0,0 +1,19 @@
+# Joplin
+
+>[Joplin](https://joplinapp.org/) is an open-source note-taking app. It captures your thoughts 
+> and securely accesses them from any device.
+ 
+
+## Installation and Setup
+
+The `Joplin API` requires an access token. 
+You can find installation instructions [here](https://joplinapp.org/api/references/rest_api/).
+
+
+## Document Loader
+
+See a [usage example](/docs/integrations/document_loaders/joplin).
+
+```python
+from langchain_community.document_loaders import JoplinLoader
+```
--- a/docs/docs/integrations/providers/lakefs.mdx
+++ b/docs/docs/integrations/providers/lakefs.mdx
@ -0,0 +1,18 @@
+# lakeFS
+
+>[lakeFS](https://docs.lakefs.io/) provides scalable version control over 
+> the data lake, and uses Git-like semantics to create and access those versions. 
+
+## Installation and Setup
+
+Get the `ENDPOINT`, `LAKEFS_ACCESS_KEY`, and `LAKEFS_SECRET_KEY`.
+You can find installation instructions [here](https://docs.lakefs.io/quickstart/launch.html).
+
+
+## Document Loader
+
+See a [usage example](/docs/integrations/document_loaders/lakefs).
+
+```python
+from langchain_community.document_loaders import LakeFSLoader
+```
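
Review note: most of the provider pages added in this commit end in a single import line, so one cheap sanity check is to confirm that those imports resolve in the target environment. The following is a minimal sketch, not part of the commit. The helper name `find_unresolved` and the `LOADER_IMPORTS` list are hypothetical, and the list holds only a few of the module/class pairs copied from the diff above. It assumes the relevant packages (`langchain-community`, `langchain-airbyte`) are installed; any that are not will be reported as unresolved.

```python
import importlib

# A few (module, attribute) pairs copied from import lines added in this commit.
# The list is illustrative, not exhaustive.
LOADER_IMPORTS = [
    ("langchain_community.document_loaders.athena", "AthenaLoader"),
    ("langchain_community.document_loaders", "AcreomLoader"),
    ("langchain_airbyte", "AirbyteLoader"),
    ("langchain_community.document_loaders", "GitHubIssuesLoader"),
    ("langchain_community.document_loaders.obs_file", "OBSFileLoader"),
]


def find_unresolved(pairs):
    """Return the (module, attribute) pairs that cannot be imported."""
    unresolved = []
    for module_name, attr in pairs:
        try:
            module = importlib.import_module(module_name)
            getattr(module, attr)  # raises AttributeError if the name is missing
        except (ImportError, AttributeError):
            unresolved.append((module_name, attr))
    return unresolved


if __name__ == "__main__":
    # Print one line per import statement that would fail for a reader.
    for module_name, attr in find_unresolved(LOADER_IMPORTS):
        print(f"unresolved: from {module_name} import {attr}")
```

An empty result only shows that the names import; it says nothing about whether a loader works against its backing service, which still needs the credentials described on each provider page.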