The [`self-querying` navbar](https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/)
repeats `self-querying` in each menu item. I've simplified
it to be more readable:
- removed `self-querying` from the title of each page;
- added a description to each vector store;
- added a description and a link to the Integration Card
(`integrations/providers`) of each vector store where they were missing.
>[Vectara](https://docs.vectara.com/docs/) is a GenAI platform for developers. It provides a simple API to build Grounded Generation
>(aka Retrieval-augmented-generation or RAG) applications.
**Vectara Overview:**
- Vectara is a developer-first API platform for building GenAI applications
- `Vectara` is a developer-first API platform for building GenAI applications
- To use Vectara - first [sign up](https://console.vectara.com/signup) and create an account. Then create a corpus and an API key for indexing and searching.
- You can use Vectara's [indexing API](https://docs.vectara.com/docs/indexing-apis/indexing) to add documents into Vectara's index
- You can use Vectara's [Search API](https://docs.vectara.com/docs/search-apis/search) to query Vectara's index (which also supports Hybrid search implicitly).
- You can use Vectara's integration with LangChain as a Vector store or using the Retriever abstraction.
## Installation and Setup
To use Vectara with LangChain no special installation steps are required.
To use `Vectara` with LangChain no special installation steps are required.
To get started, follow our [quickstart](https://docs.vectara.com/docs/quickstart) guide to create an account, a corpus and an API key.
Once you have these, you can provide them as arguments to the Vectara vectorstore, or you can set them as environment variables.
@ -19,9 +20,8 @@ Once you have these, you can provide them as arguments to the Vectara vectorstor
- export `VECTARA_CORPUS_ID`="your_corpus_id"
- export `VECTARA_API_KEY`="your-vectara-api-key"
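
As a rough illustration of the environment-variable route, the sketch below reads the two variables visible in this hunk using only the standard library and fails early if either is missing. The helper name `load_vectara_settings` is hypothetical and is not part of LangChain or Vectara. Any further variables the integration may need, and how the values are passed to the vectorstore, are not covered here.

```python
import os

# Environment variable names taken from the export lines above.
REQUIRED_VARS = ("VECTARA_CORPUS_ID", "VECTARA_API_KEY")


def load_vectara_settings(env=None):
    """Hypothetical helper: collect the Vectara settings from the environment.

    Raises KeyError naming every variable that is missing or empty.
    """
    # Default to the real process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_vectara_settings()` with no argument reads from `os.environ`. Passing a dictionary instead makes the check easy to exercise without touching the real environment.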
## Usage
### VectorStore
## VectorStore
There exists a wrapper around the Vectara platform, allowing you to use it as a vectorstore, whether for semantic search or example selection.
This page covers how to use the Weaviate ecosystem within LangChain.
>[Weaviate](https://weaviate.io/) is an open-source vector database. It allows you to store data objects and vector embeddings from
>your favorite ML models, and scale seamlessly into billions of data objects.
What is Weaviate?
**Weaviate in a nutshell:**
What is `Weaviate`?
- Weaviate is an open-source database of the type vector search engine.
- Weaviate allows you to store JSON documents in a class property-like fashion while attaching machine learning vectors to these documents to represent them in vector space.
- Weaviate can be used stand-alone (aka bring your vectors) or with a variety of modules that can do the vectorization for you and extend the core capabilities.
@ -14,15 +14,20 @@ What is Weaviate?
**Weaviate in detail:**
Weaviate is a low-latency vector search engine with out-of-the-box support for different media types (text, images, etc.). It offers Semantic Search, Question-Answer Extraction, Classification, Customizable Models (PyTorch/TensorFlow/Keras), etc. Built from scratch in Go, Weaviate stores both objects and vectors, allowing for combining vector search with structured filtering and the fault tolerance of a cloud-native database. It is all accessible through GraphQL, REST, and various client-side programming languages.
`Weaviate` is a low-latency vector search engine with out-of-the-box support for different media types (text, images, etc.). It offers Semantic Search, Question-Answer Extraction, Classification, Customizable Models (PyTorch/TensorFlow/Keras), etc. Built from scratch in Go, Weaviate stores both objects and vectors, allowing for combining vector search with structured filtering and the fault tolerance of a cloud-native database. It is all accessible through GraphQL, REST, and various client-side programming languages.
## Installation and Setup
- Install the Python SDK with `pip install weaviate-client`
## Wrappers
### VectorStore
Install the Python SDK:
There exists a wrapper around Weaviate indexes, allowing you to use it as a vectorstore,
```bash
pip install weaviate-client
```
## Vector Store
There exists a wrapper around `Weaviate` indexes, allowing you to use it as a vectorstore,
"> [DashVector](https://help.aliyun.com/document_detail/2510225.html) is a fully-managed vectorDB service that supports high-dimension dense and sparse vectors, real-time insertion and filtered search. It is built to scale automatically and can adapt to different application requirements.\n",
"> [DashVector](https://help.aliyun.com/document_detail/2510225.html) is a fully managed vector DB service that supports high-dimension dense and sparse vectors, real-time insertion and filtered search. It is built to scale automatically and can adapt to different application requirements.\n",
"> The vector retrieval service `DashVector` is based on the `Proxima` core of the efficient vector engine independently developed by `DAMO Academy`,\n",
"> and provides a cloud-native, fully managed vector retrieval service with horizontal expansion capabilities.\n",
"> `DashVector` exposes its powerful vector management, vector query and other diversified capabilities through a simple and\n",
"> easy-to-use SDK/API interface, which can be quickly integrated by upper-layer AI applications, thereby providing\n",
"> the efficient vector retrieval capabilities required by a variety of application scenarios, including the large-model\n",
"> ecosystem, multi-modal AI search, and molecular structure analysis.\n",
"\n",
"In this notebook we'll demo the `SelfQueryRetriever` with a `DashVector` vector store."
],
"metadata": {
"collapsed": false
},
"id": "59895c73d1a0f3ca"
"In this notebook, we'll demo the `SelfQueryRetriever` with a `DashVector` vector store."
]
},
{
"cell_type": "markdown",
"id": "539ae9367e45a178",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Create DashVector vectorstore\n",
"\n",
@ -24,46 +40,55 @@
"To use DashVector, you have to have `dashvector` package installed, and you must have an API key and an Environment. Here are the [installation instructions](https://help.aliyun.com/document_detail/2510223.html).\n",
"\n",
"NOTE: The self-query retriever requires you to have `lark` package installed."
"Now we can instantiate our retriever. To do this we'll need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents."
"And now we can try actually using our retriever!"
],
"metadata": {
"collapsed": false
},
"id": "a54af0d67b473db6"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "dad9da670a267fe7",
"metadata": {
"ExecuteTime": {
"end_time": "2023-08-24T02:59:28.577901Z",
"start_time": "2023-08-24T02:59:26.780184Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
@ -210,7 +254,12 @@
},
{
"data": {
"text/plain": "[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'rating': 7.699999809265137, 'genre': 'action'}),\n Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'}),\n Document(page_content='Leo DiCaprio gets lost in a dream within a dream within a dream within a ...', metadata={'year': 2010, 'director': 'Christopher Nolan', 'rating': 8.199999809265137}),\n Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.600000381469727})]"
"text/plain": [
"[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'rating': 7.699999809265137, 'genre': 'action'}),\n",
" Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'}),\n",
" Document(page_content='Leo DiCaprio gets lost in a dream within a dream within a dream within a ...', metadata={'year': 2010, 'director': 'Christopher Nolan', 'rating': 8.199999809265137}),\n",
" Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.600000381469727})]"
]
},
"execution_count": 6,
"metadata": {},
@ -220,19 +269,22 @@
"source": [
"# This example only specifies a relevant query\n",
"retriever.get_relevant_documents(\"What are some movies about dinosaurs\")"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-08-24T02:59:28.577901Z",
"start_time": "2023-08-24T02:59:26.780184Z"
}
},
"id": "dad9da670a267fe7"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "d486a64316153d52",
"metadata": {
"ExecuteTime": {
"end_time": "2023-08-24T02:59:32.370774Z",
"start_time": "2023-08-24T02:59:30.614252Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
@ -243,7 +295,10 @@
},
{
"data": {
"text/plain": "[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'director': 'Andrei Tarkovsky', 'rating': 9.899999618530273, 'genre': 'science fiction'}),\n Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.600000381469727})]"
"text/plain": [
"[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'director': 'Andrei Tarkovsky', 'rating': 9.899999618530273, 'genre': 'science fiction'}),\n",
" Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.600000381469727})]"
]
},
"execution_count": 7,
"metadata": {},
@ -253,19 +308,22 @@
"source": [
"# This example only specifies a filter\n",
"retriever.get_relevant_documents(\"I want to watch a movie rated higher than 8.5\")"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-08-24T02:59:32.370774Z",
"start_time": "2023-08-24T02:59:30.614252Z"
}
},
"id": "d486a64316153d52"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "e05919cdead7bd4a",
"metadata": {
"ExecuteTime": {
"end_time": "2023-08-24T02:59:35.353439Z",
"start_time": "2023-08-24T02:59:33.278255Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
@ -276,7 +334,9 @@
},
{
"data": {
"text/plain": "[Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'year': 2019, 'director': 'Greta Gerwig', 'rating': 8.300000190734863})]"
"text/plain": [
"[Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'year': 2019, 'director': 'Greta Gerwig', 'rating': 8.300000190734863})]"
]
},
"execution_count": 8,
"metadata": {},
@ -286,19 +346,22 @@
"source": [
"# This example specifies a query and a filter\n",
"retriever.get_relevant_documents(\"Has Greta Gerwig directed any movies about women\")"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-08-24T02:59:35.353439Z",
"start_time": "2023-08-24T02:59:33.278255Z"
}
},
"id": "e05919cdead7bd4a"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "ac2c7012379e918e",
"metadata": {
"ExecuteTime": {
"end_time": "2023-08-24T02:59:38.913707Z",
"start_time": "2023-08-24T02:59:36.659271Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
@ -309,7 +372,9 @@
},
{
"data": {
"text/plain": "[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'director': 'Andrei Tarkovsky', 'rating': 9.899999618530273, 'genre': 'science fiction'})]"
"text/plain": [
"[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'director': 'Andrei Tarkovsky', 'rating': 9.899999618530273, 'genre': 'science fiction'})]"
]
},
"execution_count": 9,
"metadata": {},
@ -319,33 +384,39 @@
"source": [
"# This example specifies a composite filter\n",
"retriever.get_relevant_documents(\"What's a highly rated (above 8.5) science fiction film?\")"
],
]
},
{
"cell_type": "markdown",
"id": "af6aa93ae44af414",
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-08-24T02:59:38.913707Z",
"start_time": "2023-08-24T02:59:36.659271Z"
"jupyter": {
"outputs_hidden": false
}
},
"id": "ac2c7012379e918e"
},
{
"cell_type": "markdown",
"source": [
"## Filter k\n",
"\n",
"We can also use the self query retriever to specify `k`: the number of documents to fetch.\n",
"\n",
"We can do this by passing `enable_limit=True` to the constructor."
],
"metadata": {
"collapsed": false
},
"id": "af6aa93ae44af414"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "a8c8f09bf5702767",
"metadata": {
"ExecuteTime": {
"end_time": "2023-08-24T02:59:41.594073Z",
"start_time": "2023-08-24T02:59:41.563323Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"retriever = SelfQueryRetriever.from_llm(\n",
@ -356,19 +427,22 @@
" enable_limit=True,\n",
" verbose=True,\n",
")"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-08-24T02:59:41.594073Z",
"start_time": "2023-08-24T02:59:41.563323Z"
}
},
"id": "a8c8f09bf5702767"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "b1089a6043980b84",
"metadata": {
"ExecuteTime": {
"end_time": "2023-08-24T02:59:48.450506Z",
"start_time": "2023-08-24T02:59:46.252944Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
@ -379,7 +453,10 @@
},
{
"data": {
"text/plain": "[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'rating': 7.699999809265137, 'genre': 'action'}),\n Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'})]"
"text/plain": [
"[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'rating': 7.699999809265137, 'genre': 'action'}),\n",
" Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'})]"
]
},
"execution_count": 11,
"metadata": {},
@ -389,44 +466,39 @@
"source": [
"# This example only specifies a relevant query\n",
"retriever.get_relevant_documents(\"what are two movies about dinosaurs\")"
"> [Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine.\n",
"> It provides a distributed, multi-tenant-capable full-text search engine with an HTTP web interface and schema-free\n",
"> JSON documents.\n",
"\n",
"In this notebook, we'll demo the `SelfQueryRetriever` with an `Elasticsearch` vector store."
]
},
{
@ -13,8 +19,9 @@
"id": "68e75fb9",
"metadata": {},
"source": [
"## Creating a Elasticsearch vector store\n",
"First we'll want to create a Elasticsearch vector store and seed it with some data. We've created a small demo set of documents that contain summaries of movies.\n",
"## Creating an Elasticsearch vector store\n",
"\n",
"First, we'll want to create an `Elasticsearch` vector store and seed it with some data. We've created a small demo set of documents that contain summaries of movies.\n",
"\n",
"**Note:** The self-query retriever requires you to have `lark` installed (`pip install lark`). We also need the `elasticsearch` package."
"In the walkthrough we'll demo the `SelfQueryRetriever` with a `Milvus` vector store."
">[Milvus](https://milvus.io/docs/overview.md) is a database that stores, indexes, and manages massive embedding vectors generated by deep neural networks and other machine learning (ML) models.\n",
"\n",
"In the walkthrough, we'll demo the `SelfQueryRetriever` with a `Milvus` vector store."
">[MyScale](https://docs.myscale.com/en/) is an integrated vector database. You can access your database in SQL and also from here, LangChain. MyScale can make a use of [various data types and functions for filters](https://blog.myscale.com/2023/06/06/why-integrated-database-solution-can-boost-your-llm-apps/#filter-on-anything-without-constraints). It will boost up your LLM app no matter if you are scaling up your data or expand your system to broader application.\n",
">[MyScale](https://docs.myscale.com/en/) is an integrated vector database. You can access your database in SQL and also from here, LangChain.\n",
">`MyScale` can make use of [various data types and functions for filters](https://blog.myscale.com/2023/06/06/why-integrated-database-solution-can-boost-your-llm-apps/#filter-on-anything-without-constraints). It will boost up your LLM app no matter if you are scaling up your data or expand your system to broader application.\n",
"\n",
"In the notebook we'll demo the `SelfQueryRetriever` wrapped around a MyScale vector store with some extra pieces we contributed to LangChain. In short, it can be condensed into 4 points:\n",
"1. Add `contain` comparator to match list of any if there is more than one element matched\n",
"In the notebook, we'll demo the `SelfQueryRetriever` wrapped around a `MyScale` vector store with some extra pieces we contributed to LangChain. \n",
"\n",
"In short, it can be condensed into 4 points:\n",
"1. Add `contain` comparator to match the list of any if there is more than one element matched\n",
"2. Add `timestamp` data type for datetime match (ISO-format, or YYYY-MM-DD)\n",
"3. Add `like` comparator for string pattern search\n",
">[Qdrant](https://qdrant.tech/documentation/) (read: quadrant) is a vector similarity search engine. It provides a production-ready service with a convenient API to store, search, and manage points - vectors with an additional payload. `Qdrant` is tailored to extended filtering support.\n",
"\n",
"In the notebook we'll demo the `SelfQueryRetriever` wrapped around a Qdrant vector store. "
"In the notebook, we'll demo the `SelfQueryRetriever` wrapped around a `Qdrant` vector store. "
">[Vectara](https://docs.vectara.com/docs/) is a GenAI platform for developers. It provides a simple API to build Grounded Generation (aka Retrieval-augmented-generation) applications.\n",
">[Vectara](https://docs.vectara.com/docs/) is a GenAI platform for developers. It provides a simple API to build Grounded Generation\n",
">(aka Retrieval-augmented-generation or RAG) applications.\n",
"\n",
"In the notebook we'll demo the `SelfQueryRetriever` wrapped around a Vectara vector store. "
"In the notebook, we'll demo the `SelfQueryRetriever` wrapped around a Vectara vector store. "