docs: integrations/providers (#9631)

Added missing pages for `integrations/providers` from `vectorstores`.
Updated several `vectorstores` notebooks.
Leonid Ganeline 2023-08-22 20:28:11 -07:00 committed by GitHub
parent b2d9970fc1
commit e1f4f9ac3e
24 changed files with 337 additions and 82 deletions

@ -1,27 +1,19 @@
# AtlasDB
# Atlas
>[Nomic Atlas](https://docs.nomic.ai/index.html) is a platform for interacting with both
> small and internet scale unstructured datasets.
This page covers how to use Nomic's Atlas ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Atlas wrappers.
## Installation and Setup
- Install the Python package with `pip install nomic`
- Nomic is also included in langchains poetry extras `poetry install -E all`
## Wrappers
### VectorStore
There exists a wrapper around the Atlas neural database, allowing you to use it as a vectorstore.
This vectorstore also gives you full access to the underlying AtlasProject object, which will allow you to use the full range of Atlas map interactions, such as bulk tagging and automatic topic modeling.
Please see [the Atlas docs](https://docs.nomic.ai/atlas_api.html) for more detailed information.
- `Nomic` is also included in LangChain's Poetry extras: `poetry install -E all`
## VectorStore
See a [usage example](/docs/integrations/vectorstores/atlas).
To import this vectorstore:
```python
from langchain.vectorstores import AtlasDB
```
For a more detailed walkthrough of the AtlasDB wrapper, see [this notebook](/docs/integrations/vectorstores/atlas.html)

@ -0,0 +1,25 @@
# ClickHouse
> [ClickHouse](https://clickhouse.com/) is a fast and resource-efficient open-source database for real-time
> apps and analytics with full SQL support and a wide range of functions to assist users in writing analytical queries.
> It has data structures and distance search functions (like `L2Distance`) as well as
> [approximate nearest neighbor search indexes](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/annindexes).
> These enable ClickHouse to be used as a high-performance and scalable vector database to store and search vectors with SQL.
## Installation and Setup
We need to install the `clickhouse-connect` Python package.
```bash
pip install clickhouse-connect
```
## Vector Store
See a [usage example](/docs/integrations/vectorstores/clickhouse).
```python
from langchain.vectorstores import Clickhouse, ClickhouseSettings
```

@ -0,0 +1,30 @@
# DocArray
> [DocArray](https://docarray.jina.ai/) is a library for nested, unstructured, multimodal data in transit,
> including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process,
> embed, search, recommend, store, and transfer multimodal data with a Pythonic API.
## Installation and Setup
We need to install the `docarray` Python package.
```bash
pip install docarray
```
## Vector Store
LangChain provides access to the `In-memory` and `HNSW` vector stores from the `DocArray` library.
See a [usage example](/docs/integrations/vectorstores/docarray_hnsw).
```python
from langchain.vectorstores import DocArrayHnswSearch
```
See a [usage example](/docs/integrations/vectorstores/docarray_in_memory).
```python
from langchain.vectorstores import DocArrayInMemorySearch
```

@ -0,0 +1,32 @@
# Facebook Faiss
>[Facebook AI Similarity Search (Faiss)](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/)
> is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that
> search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting
> code for evaluation and parameter tuning.
[Faiss documentation](https://faiss.ai/).
## Installation and Setup
We need to install the `faiss` Python package.
```bash
pip install faiss-gpu # For CUDA 7.5+ supported GPUs.
```
OR
```bash
pip install faiss-cpu # For CPU Installation
```
## Vector Store
See a [usage example](/docs/integrations/vectorstores/faiss).
```python
from langchain.vectorstores import FAISS
```

@ -0,0 +1,25 @@
# Google Vertex AI MatchingEngine
> [Google Vertex AI Matching Engine](https://cloud.google.com/vertex-ai/docs/matching-engine/overview) provides
> the industry's leading high-scale low latency vector database. These vector databases are commonly
> referred to as vector similarity-matching or an approximate nearest neighbor (ANN) service.
## Installation and Setup
We need to install several Python packages.
```bash
pip install tensorflow \
google-cloud-aiplatform \
tensorflow-hub \
tensorflow-text
```
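Before running the notebook, it can help to confirm that these packages are importable. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and the import names in `REQUIRED` are assumed to correspond to the pip packages listed above.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises when a parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Assumed import names for the pip packages listed above.
REQUIRED = [
    "tensorflow",
    "google.cloud.aiplatform",
    "tensorflow_hub",
    "tensorflow_text",
]

if __name__ == "__main__":
    print("Missing modules:", missing_modules(REQUIRED))
```

An empty list means every module was found. Any names that are printed point to packages that still need to be installed.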
## Vector Store
See a [usage example](/docs/integrations/vectorstores/matchingengine).
```python
from langchain.vectorstores import MatchingEngine
```

@ -0,0 +1,30 @@
# Meilisearch
> [Meilisearch](https://meilisearch.com) is an open-source, lightning-fast, and hyper
> relevant search engine.
> It comes with great defaults to help developers build snappy search experiences.
>
> You can [self-host Meilisearch](https://www.meilisearch.com/docs/learn/getting_started/installation#local-installation)
> or run on [Meilisearch Cloud](https://www.meilisearch.com/pricing).
>
>`Meilisearch v1.3` supports vector search.
## Installation and Setup
See a [usage example](/docs/integrations/vectorstores/meilisearch) for detailed configuration instructions.
We need to install the `meilisearch` Python package.
```bash
pip install meilisearch
```
## Vector Store
See a [usage example](/docs/integrations/vectorstores/meilisearch).
```python
from langchain.vectorstores import Meilisearch
```

@ -0,0 +1,24 @@
# MongoDB Atlas
>[MongoDB Atlas](https://www.mongodb.com/docs/atlas/) is a fully-managed cloud
> database available in AWS, Azure, and GCP. It now has support for native
> Vector Search on the MongoDB document data.
## Installation and Setup
See [detailed configuration instructions](/docs/integrations/vectorstores/mongodb_atlas).
We need to install the `pymongo` Python package.
```bash
pip install pymongo
```
## Vector Store
See a [usage example](/docs/integrations/vectorstores/mongodb_atlas).
```python
from langchain.vectorstores import MongoDBAtlasVectorSearch
```

@ -0,0 +1,24 @@
# Postgres Embedding
> [pg_embedding](https://github.com/neondatabase/pg_embedding) is an open-source package for
> vector similarity search using `Postgres` and the `Hierarchical Navigable Small Worlds`
> algorithm for approximate nearest neighbor search.
## Installation and Setup
We need to install several Python packages.
```bash
pip install openai
pip install psycopg2-binary
pip install tiktoken
```
## Vector Store
See a [usage example](/docs/integrations/vectorstores/pgembedding).
```python
from langchain.vectorstores import PGEmbedding
```

@ -0,0 +1,29 @@
# ScaNN
>[Google ScaNN](https://github.com/google-research/google-research/tree/master/scann)
> (Scalable Nearest Neighbors) is a Python package.
>
>`ScaNN` is a method for efficient vector similarity search at scale.
>ScaNN includes search space pruning and quantization for Maximum Inner
> Product Search and also supports other distance functions such as
> Euclidean distance. The implementation is optimized for x86 processors
> with AVX2 support. See its [Google Research GitHub](https://github.com/google-research/google-research/tree/master/scann)
> for more details.
## Installation and Setup
We need to install the `scann` Python package.
```bash
pip install scann
```
## Vector Store
See a [usage example](/docs/integrations/vectorstores/scann).
```python
from langchain.vectorstores import ScaNN
```
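To make the distinction between Maximum Inner Product Search and Euclidean distance concrete, here is a small exhaustive-search sketch in plain Python. It does not use the `scann` package and does none of the pruning or quantization described above. All function names are hypothetical and chosen for illustration only.

```python
import math


def inner_product(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_inner_product(vectors, query, k):
    """Indices of the k vectors with the largest inner product with query."""
    order = sorted(
        range(len(vectors)),
        key=lambda i: inner_product(vectors[i], query),
        reverse=True,
    )
    return order[:k]


def top_k_euclidean(vectors, query, k):
    """Indices of the k vectors with the smallest Euclidean distance to query."""
    order = sorted(
        range(len(vectors)),
        key=lambda i: euclidean_distance(vectors[i], query),
    )
    return order[:k]


vectors = [[1.0, 0.0], [3.0, 3.0], [0.9, 0.1]]
query = [1.0, 0.0]

print(top_k_inner_product(vectors, query, 2))  # [1, 0]
print(top_k_euclidean(vectors, query, 2))      # [0, 2]
```

The two rankings differ because the inner product rewards vectors with a large norm, while Euclidean distance rewards vectors that lie close to the query.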

@ -0,0 +1,26 @@
# Supabase (Postgres)
>[Supabase](https://supabase.com/docs) is an open source `Firebase` alternative.
> `Supabase` is built on top of `PostgreSQL`, which offers strong `SQL`
> querying capabilities and enables a simple interface with already-existing tools and frameworks.
>[PostgreSQL](https://en.wikipedia.org/wiki/PostgreSQL), also known as `Postgres`,
> is a free and open-source relational database management system (RDBMS)
> emphasizing extensibility and `SQL` compliance.
## Installation and Setup
We need to install the `supabase` Python package.
```bash
pip install supabase
```
## Vector Store
See a [usage example](/docs/integrations/vectorstores/supabase).
```python
from langchain.vectorstores import SupabaseVectorStore
```

@ -0,0 +1,25 @@
# USearch
>[USearch](https://unum-cloud.github.io/usearch/) is a Smaller & Faster Single-File Vector Search Engine.
>`USearch`'s base functionality is identical to `FAISS`, and the interface should look
> familiar if you have ever investigated Approximate Nearest Neighbors search.
> `USearch` and `FAISS` both employ the `HNSW` algorithm, but they differ significantly
> in their design principles. `USearch` is compact and broadly compatible with FAISS without
> sacrificing performance, with a primary focus on user-defined metrics and fewer dependencies.
>
## Installation and Setup
We need to install the `usearch` Python package.
```bash
pip install usearch
```
## Vector Store
See a [usage example](/docs/integrations/vectorstores/usearch).
```python
from langchain.vectorstores import USearch
```
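The idea of a user-defined metric can be illustrated without the `usearch` package. The sketch below is an exact brute-force search in plain Python, not the `HNSW` algorithm and not the `USearch` API. The function names are hypothetical, and the metric is simply any callable that returns a distance.

```python
import heapq
import math


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(vectors, query, k, metric):
    """Exact k-nearest-neighbor search under a caller-supplied distance function."""
    scored = ((metric(vector, query), index) for index, vector in enumerate(vectors))
    return [index for _, index in heapq.nsmallest(k, scored)]


vectors = [[10.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
query = [1.0, 0.0]

print(search(vectors, query, 2, l2_squared))       # [1, 2]
print(search(vectors, query, 2, cosine_distance))  # [0, 1]
```

Swapping the metric changes which vectors count as nearest: the long vector at index 0 is far from the query in Euclidean terms but points in exactly the same direction.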

@ -0,0 +1,28 @@
# Xata
> [Xata](https://xata.io) is a serverless data platform, based on `PostgreSQL`.
> It provides a Python SDK for interacting with your database, and a UI
> for managing your data.
> `Xata` has a native vector type, which can be added to any table, and
> supports similarity search. LangChain inserts vectors directly into `Xata`,
> and queries it for the nearest neighbors of a given vector, so that you can
> use all the LangChain Embeddings integrations with `Xata`.
## Installation and Setup
We need to install the `xata` Python package.
```bash
pip install xata==1.0.0a7
```
## Vector Store
See a [usage example](/docs/integrations/vectorstores/xata).
```python
from langchain.vectorstores import XataVectorStore
```

@ -5,7 +5,7 @@
"id": "683953b3",
"metadata": {},
"source": [
"# ClickHouse Vector Search\n",
"# ClickHouse\n",
"\n",
"> [ClickHouse](https://clickhouse.com/) is the fastest and most resource efficient open-source database for real-time apps and analytics with full SQL support and a wide range of functions to assist users in writing analytical queries. Lately added data structures and distance search functions (like `L2Distance`) as well as [approximate nearest neighbor search indexes](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/annindexes) enable ClickHouse to be used as a high performance and scalable vector database to store and search vectors with SQL.\n",
"\n",
@ -198,8 +198,7 @@
"ExecuteTime": {
"end_time": "2023-06-03T08:28:58.252991Z",
"start_time": "2023-06-03T08:28:58.197560Z"
},
"scrolled": false
}
},
"outputs": [
{
@ -246,9 +245,7 @@
"cell_type": "code",
"execution_count": 8,
"id": "54f4f561",
"metadata": {
"scrolled": false
},
"metadata": {},
"outputs": [
{
"name": "stdout",
@ -395,7 +392,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -1,20 +1,18 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "2ce41f46-5711-4311-b04d-2fe233ac5b1b",
"metadata": {},
"source": [
"# DocArrayHnswSearch\n",
"# DocArray HnswSearch\n",
"\n",
">[DocArrayHnswSearch](https://docs.docarray.org/user_guide/storing/index_hnswlib/) is a lightweight Document Index implementation provided by [Docarray](https://docs.docarray.org/) that runs fully locally and is best suited for small- to medium-sized datasets. It stores vectors on disk in [hnswlib](https://github.com/nmslib/hnswlib), and stores all other data in [SQLite](https://www.sqlite.org/index.html).\n",
">[DocArrayHnswSearch](https://docs.docarray.org/user_guide/storing/index_hnswlib/) is a lightweight Document Index implementation provided by [Docarray](https://github.com/docarray/docarray) that runs fully locally and is best suited for small- to medium-sized datasets. It stores vectors on disk in [hnswlib](https://github.com/nmslib/hnswlib), and stores all other data in [SQLite](https://www.sqlite.org/index.html).\n",
"\n",
"This notebook shows how to use functionality related to the `DocArrayHnswSearch`."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "7ee37d28",
"metadata": {},
@ -57,7 +55,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8dbb6de2",
"metadata": {
@ -103,7 +100,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "ed6f905b-4853-4a44-9730-614aa8e22b78",
"metadata": {},
@ -151,7 +147,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3febb987-e903-416f-af26-6897d84c8d61",
"metadata": {},
@ -160,7 +155,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "bb1df11a",
"metadata": {},
@ -236,7 +230,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -1,20 +1,18 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "a3afefb0-7e99-4912-a222-c6b186da11af",
"metadata": {},
"source": [
"# DocArrayInMemorySearch\n",
"# DocArray InMemorySearch\n",
"\n",
">[DocArrayInMemorySearch](https://docs.docarray.org/user_guide/storing/index_in_memory/) is a document index provided by [Docarray](https://docs.docarray.org/) that stores documents in memory. It is a great starting point for small datasets, where you may not want to launch a database server.\n",
">[DocArrayInMemorySearch](https://docs.docarray.org/user_guide/storing/index_in_memory/) is a document index provided by [Docarray](https://github.com/docarray/docarray) that stores documents in memory. It is a great starting point for small datasets, where you may not want to launch a database server.\n",
"\n",
"This notebook shows how to use functionality related to the `DocArrayInMemorySearch`."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5031a3ec",
"metadata": {},
@ -56,7 +54,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6e57a389-f637-4b8f-9ab2-759ae7485f78",
"metadata": {},
@ -98,7 +95,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "efbb6684-3846-4332-a624-ddd4d75844c1",
"metadata": {},
@ -146,7 +142,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "43896697-f99e-47b6-9117-47a25e9afa9c",
"metadata": {},
@ -155,7 +150,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "414a9bc9",
"metadata": {},
@ -224,7 +218,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -5,7 +5,7 @@
"id": "683953b3",
"metadata": {},
"source": [
"# FAISS\n",
"# Faiss\n",
"\n",
">[Facebook AI Similarity Search (Faiss)](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.\n",
"\n",
@ -596,7 +596,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.17"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -5,9 +5,9 @@
"id": "655b8f55-2089-4733-8b09-35dea9580695",
"metadata": {},
"source": [
"# MatchingEngine\n",
"# Google Vertex AI MatchingEngine\n",
"\n",
"This notebook shows how to use functionality related to the GCP Vertex AI `MatchingEngine` vector database.\n",
"This notebook shows how to use functionality related to the `GCP Vertex AI MatchingEngine` vector database.\n",
"\n",
"> Vertex AI [Matching Engine](https://cloud.google.com/vertex-ai/docs/matching-engine/overview) provides the industry's leading high-scale low latency vector database. These vector databases are commonly referred to as vector similarity-matching or an approximate nearest neighbor (ANN) service.\n",
"\n",
@ -348,7 +348,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -197,7 +197,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@ -205,7 +204,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@ -229,7 +227,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@ -298,9 +295,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

@ -1,14 +1,13 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# MongoDB Atlas\n",
"\n",
">[MongoDB Atlas](https://www.mongodb.com/docs/atlas/) is a fully-managed cloud database available in AWS , Azure, and GCP. It now has support for native Vector Search on your MongoDB document data.\n",
">[MongoDB Atlas](https://www.mongodb.com/docs/atlas/) is a fully-managed cloud database available in AWS, Azure, and GCP. It now has support for native Vector Search on your MongoDB document data.\n",
"\n",
"This notebook shows how to use `MongoDB Atlas Vector Search` to store your embeddings in MongoDB documents, create a vector search index, and perform KNN search with an approximate nearest neighbor algorithm.\n",
"\n",
@ -44,7 +43,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "457ace44-1d95-4001-9dd5-78811ab208ad",
"metadata": {},
@ -63,7 +61,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "1f3ecc42",
"metadata": {},
@ -147,7 +144,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "851a2ec9-9390-49a4-8412-3e132c9f789d",
"metadata": {},
@ -191,7 +187,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -1,18 +1,17 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "1292f057",
"metadata": {},
"source": [
"# pg_embedding\n",
"# Postgres Embedding\n",
"\n",
"> [pg_embedding](https://github.com/neondatabase/pg_embedding) is an open-source vector similarity search for `Postgres` that uses Hierarchical Navigable Small Worlds for approximate nearest neighbor search.\n",
"> [Postgres Embedding](https://github.com/neondatabase/pg_embedding) is an open-source vector similarity search for `Postgres` that uses `Hierarchical Navigable Small Worlds (HNSW)` for approximate nearest neighbor search.\n",
"\n",
"It supports:\n",
"- exact and approximate nearest neighbor search using HNSW\n",
"- L2 distance\n",
">It supports:\n",
">- exact and approximate nearest neighbor search using HNSW\n",
">- L2 distance\n",
"\n",
"This notebook shows how to use the Postgres vector database (`PGEmbedding`).\n",
"\n",
@ -36,7 +35,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "b2e49694",
"metadata": {},
@ -158,7 +156,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "7ef7b052",
"metadata": {},
@ -167,7 +164,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "939151f7",
"metadata": {},
@ -192,7 +188,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f9510e6b",
"metadata": {},
@ -214,7 +209,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "7adacf29",
"metadata": {},
@ -236,7 +230,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "528893fb",
"metadata": {},
@ -330,7 +323,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -182,7 +182,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -1,7 +1,6 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
@ -10,7 +9,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "cc80fa84-1f2f-48b4-bd39-3e6412f012f1",
"metadata": {},
@ -87,7 +85,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "69bff365-3039-4ff8-a641-aa190166179d",
"metadata": {},
@ -237,7 +234,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "18152965",
"metadata": {},
@ -246,7 +242,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "ea13e80a",
"metadata": {},
@ -287,7 +282,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "794a7552",
"metadata": {},
@ -439,7 +433,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -8,7 +8,7 @@
"# USearch\n",
">[USearch](https://unum-cloud.github.io/usearch/) is a Smaller & Faster Single-File Vector Search Engine\n",
"\n",
"USearch's base functionality is identical to FAISS, and the interface should look familiar if you have ever investigated Approximate Nearest Neigbors search. FAISS is a widely recognized standard for high-performance vector search engines. USearch and FAISS both employ the same HNSW algorithm, but they differ significantly in their design principles. USearch is compact and broadly compatible without sacrificing performance, with a primary focus on user-defined metrics and fewer dependencies."
">USearch's base functionality is identical to FAISS, and the interface should look familiar if you have ever investigated Approximate Nearest Neighbors search. FAISS is a widely recognized standard for high-performance vector search engines. USearch and FAISS both employ the same HNSW algorithm, but they differ significantly in their design principles. USearch is compact and broadly compatible without sacrificing performance, with a primary focus on user-defined metrics and fewer dependencies."
]
},
{
@ -187,7 +187,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -232,7 +232,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.10.12"
}
},
"nbformat": 4,