docs: `ecosystem/integrations` update 1 (#5219)

# docs: ecosystem/integrations update

This is the first in a series of `ecosystem/integrations` updates.

The ecosystem/integrations list is missing many integrations.
I'm adding the missing integrations in a consistent format: 
1. description of the integrated system
2. `Installation and Setup` section with `pip install ...`, API key setup,
and other necessary settings
3. Sections like `LLM`, `Text Embedding Models`, `Chat Models`... with
links to the corresponding examples and imports of the classes used.

This PR covers the new docs that are present in
`docs/modules/models/text_embedding/examples` but missing from
`ecosystem/integrations`. The next PRs will cover the other example
sections.

Also updated `integrations.rst`: added the `Dependencies` section with a
link to the packages used in LangChain.

## Who can review?

@hwchase17
@eyurtsev
@dev2049

@@ -20,6 +20,12 @@ Integrations by Module
- `Toolkit Integrations <./modules/agents/toolkits.html>`_
Dependencies
----------------
| LangChain depends on `several hundred Python packages <https://github.com/hwchase17/langchain/network/dependencies>`_.
All Integrations
-------------------------------------------

@@ -0,0 +1,29 @@
# Airbyte JSON
>[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs,
> databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.
## Installation and Setup
These instructions show how to load any `Airbyte` source into a local `JSON` file that can be read in as a document.
**Prerequisites:**
Have `Docker Desktop` installed.
**Steps:**
1. Clone Airbyte from GitHub - `git clone https://github.com/airbytehq/airbyte.git`.
2. Switch into Airbyte directory - `cd airbyte`.
3. Start Airbyte - `docker compose up`.
4. In your browser, visit http://localhost:8000. You will be asked for a username and password. By default, the username is `airbyte` and the password is `password`.
5. Set up any source you wish.
6. Set the destination as `Local JSON`, with a specified destination path, let's say `/json_data`. Set up a manual sync.
7. Run the connection.
8. To see what files are created, navigate to: `file:///tmp/airbyte_local/`.
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/airbyte_json.ipynb).
```python
from langchain.document_loaders import AirbyteJSONLoader
```
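
For example, assuming the sync above wrote its output to `/json_data`, a minimal sketch of reading it back (the stream file name below is hypothetical and depends on the source you configured):

```python
from langchain.document_loaders import AirbyteJSONLoader

# The Local JSON destination writes one `_airbyte_raw_<stream>.jsonl` file
# per stream; the file name below is hypothetical.
loader = AirbyteJSONLoader("/tmp/airbyte_local/json_data/_airbyte_raw_pokemon.jsonl")
docs = loader.load()
```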

@@ -0,0 +1,36 @@
# Aleph Alpha
>[Aleph Alpha](https://docs.aleph-alpha.com/) was founded in 2019 with the mission to research and build the foundational technology for an era of strong AI. The team of international scientists, engineers, and innovators researches, develops, and deploys transformative AI like large language and multimodal models and runs the fastest European commercial AI cluster.
>[The Luminous series](https://docs.aleph-alpha.com/docs/introduction/luminous/) is a family of large language models.
## Installation and Setup
```bash
pip install aleph-alpha-client
```
You have to create a new token. Please see the [instructions](https://docs.aleph-alpha.com/docs/account/#create-a-new-token).
```python
from getpass import getpass
ALEPH_ALPHA_API_KEY = getpass()
```
## LLM
See a [usage example](../modules/models/llms/integrations/aleph_alpha.ipynb).
```python
from langchain.llms import AlephAlpha
```
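A minimal usage sketch, assuming the API key from above (the model name and parameters are illustrative):
```python
from langchain.llms import AlephAlpha

llm = AlephAlpha(
    model="luminous-extended",  # illustrative model choice
    maximum_tokens=20,
    stop_sequences=["Q:"],
    aleph_alpha_api_key=ALEPH_ALPHA_API_KEY,
)
print(llm("Q: What is AI? A:"))
```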
## Text Embedding Models
See a [usage example](../modules/models/text_embedding/examples/aleph_alpha.ipynb).
```python
from langchain.embeddings import AlephAlphaSymmetricSemanticEmbedding, AlephAlphaAsymmetricSemanticEmbedding
```
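A short sketch; the asymmetric model embeds documents and queries with different representations:
```python
from langchain.embeddings import AlephAlphaAsymmetricSemanticEmbedding

embeddings = AlephAlphaAsymmetricSemanticEmbedding()
doc_vectors = embeddings.embed_documents(["This is a document about AI."])
query_vector = embeddings.embed_query("What is this document about?")
```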

@@ -0,0 +1,28 @@
# Arxiv
>[arXiv](https://arxiv.org/) is an open-access archive for 2 million scholarly articles in the fields of physics,
> mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and
> systems science, and economics.
## Installation and Setup
First, you need to install the `arxiv` Python package.
```bash
pip install arxiv
```
Second, you need to install the `PyMuPDF` Python package, which transforms PDF files downloaded from the `arxiv.org` site into the text format.
```bash
pip install pymupdf
```
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/arxiv.ipynb).
```python
from langchain.document_loaders import ArxivLoader
```
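For example, a minimal sketch that downloads the matching papers and returns them as documents (the query value is illustrative; it can be an arXiv ID or a free-text search):
```python
from langchain.document_loaders import ArxivLoader

docs = ArxivLoader(query="2005.14165", load_max_docs=2).load()
print(docs[0].metadata)  # title, authors, summary, ...
```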

@@ -0,0 +1,50 @@
# Azure OpenAI
>[Microsoft Azure](https://en.wikipedia.org/wiki/Microsoft_Azure), often referred to as `Azure`, is a cloud computing platform run by `Microsoft`, which offers access, management, and development of applications and services through global data centers. It provides a range of capabilities, including software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). `Microsoft Azure` supports many programming languages, tools, and frameworks, including Microsoft-specific and third-party software and systems.
>[Azure OpenAI](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/) is an `Azure` service with powerful language models from `OpenAI` including the `GPT-3`, `Codex` and `Embeddings model` series for content generation, summarization, semantic search, and natural language to code translation.
## Installation and Setup
```bash
pip install openai
pip install tiktoken
```
Set the environment variables to get access to the `Azure OpenAI` service.
```python
import os
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://<your-endpoint.openai.azure.com/"
os.environ["OPENAI_API_KEY"] = "your AzureOpenAI key"
os.environ["OPENAI_API_VERSION"] = "2023-03-15-preview"
```
## LLM
See a [usage example](../modules/models/llms/integrations/azure_openai_example.ipynb).
```python
from langchain.llms import AzureOpenAI
```
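A minimal sketch, assuming you have already deployed a completion model in Azure (the deployment name below is hypothetical):
```python
from langchain.llms import AzureOpenAI

# `deployment_name` is the name of *your* Azure deployment, not the model itself.
llm = AzureOpenAI(deployment_name="my-davinci-deployment", model_name="text-davinci-003")
print(llm("Tell me a joke"))
```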
## Text Embedding Models
See a [usage example](../modules/models/text_embedding/examples/azureopenai.ipynb).
```python
from langchain.embeddings import OpenAIEmbeddings
```
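A usage sketch, assuming an embeddings model deployment (the deployment name is hypothetical; Azure may require a small `chunk_size`):
```python
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(deployment="my-ada-deployment", chunk_size=1)
vector = embeddings.embed_query("This is a test document.")
```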
## Chat Models
See a [usage example](../modules/models/chat/integrations/azure_chat_openai.ipynb).
```python
from langchain.chat_models import AzureChatOpenAI
```
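A minimal sketch (the deployment name is hypothetical; the API version matches the environment variable set above):
```python
from langchain.chat_models import AzureChatOpenAI
from langchain.schema import HumanMessage

chat = AzureChatOpenAI(
    deployment_name="my-gpt-35-turbo-deployment",
    openai_api_version="2023-03-15-preview",
)
print(chat([HumanMessage(content="Translate this sentence to French: I love programming.")]))
```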

@@ -0,0 +1,56 @@
# SageMaker Endpoint
>[Amazon SageMaker](https://aws.amazon.com/sagemaker/) is a system that can build, train, and deploy machine learning (ML) models with fully managed infrastructure, tools, and workflows.
We use `SageMaker` to host our model and expose it as a `SageMaker Endpoint`.
## Installation and Setup
```bash
pip install boto3
```
For instructions on how to expose a model as a `SageMaker Endpoint`, please see [here](https://www.philschmid.de/custom-inference-huggingface-sagemaker).
**Note**: In order to handle batched requests, we need to adjust the return line in the `predict_fn()` function within the custom `inference.py` script:
Change from
```python
return {"vectors": sentence_embeddings[0].tolist()}
```
to:
```python
return {"vectors": sentence_embeddings.tolist()}
```
We have to set up the following required parameters of the `SagemakerEndpoint` call:
- `endpoint_name`: The name of the endpoint from the deployed Sagemaker model.
Must be unique within an AWS Region.
- `credentials_profile_name`: The name of the profile in the `~/.aws/credentials` or `~/.aws/config` files, which
has either access keys or role information specified.
If not specified, the default credential profile or, if on an EC2 instance,
credentials from IMDS will be used.
See [this guide](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html).
## LLM
See a [usage example](../modules/models/llms/integrations/sagemaker.ipynb).
```python
from langchain import SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler
```
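A content handler translates the prompt and the endpoint response to and from the endpoint's JSON payload. A minimal sketch, assuming an endpoint that accepts `{"inputs": ..., "parameters": ...}` (the endpoint name, region, and payload shape are assumptions that depend on your deployment):
```python
import json

from langchain import SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler


class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        # Serialize the prompt into the payload the endpoint expects.
        return json.dumps({"inputs": prompt, "parameters": model_kwargs}).encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        # Extract the generated text from the endpoint response.
        return json.loads(output.read().decode("utf-8"))[0]["generated_text"]


llm = SagemakerEndpoint(
    endpoint_name="my-llm-endpoint",     # hypothetical
    credentials_profile_name="default",  # see the credentials guide above
    region_name="us-east-1",             # hypothetical
    content_handler=ContentHandler(),
)
```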
## Text Embedding Models
See a [usage example](../modules/models/text_embedding/examples/sagemaker-endpoint.ipynb).
```python
from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.llms.sagemaker_endpoint import ContentHandlerBase
```
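The embeddings class follows the same content-handler pattern; note how `transform_output` reads the `vectors` key returned by the adjusted `predict_fn()` above (the endpoint name and region are hypothetical):
```python
import json
from typing import Dict, List

from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.llms.sagemaker_endpoint import ContentHandlerBase


class ContentHandler(ContentHandlerBase):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, inputs: List[str], model_kwargs: Dict) -> bytes:
        return json.dumps({"inputs": inputs, **model_kwargs}).encode("utf-8")

    def transform_output(self, output: bytes) -> List[List[float]]:
        # `predict_fn()` returns {"vectors": [...]}, as described in the note above.
        return json.loads(output.read().decode("utf-8"))["vectors"]


embeddings = SagemakerEndpointEmbeddings(
    endpoint_name="my-embeddings-endpoint",  # hypothetical
    region_name="us-east-1",                 # hypothetical
    credentials_profile_name="default",
    content_handler=ContentHandler(),
)
```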

@@ -47,7 +47,7 @@
"tags": []
},
"source": [
"Second, you need to install `PyMuPDF` python package which transform PDF files from the `arxiv.org` site into the text format."
"Second, you need to install `PyMuPDF` python package which transforms PDF files downloaded from the `arxiv.org` site into the text format."
]
},
{
