docs: `integrations/retrievers` cleanup (#20357)

Fixed format inconsistencies; added descriptions, links.
pull/20256/head
Leonid Ganeline 6 months ago committed by GitHub
parent 0b99e9201d
commit beebd73f95

@ -4,8 +4,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Arcee Retriever\n",
"This notebook demonstrates how to use the `ArceeRetriever` class to retrieve relevant document(s) for Arcee's Domain Adapted Language Models (DALMs)."
"# Arcee\n",
"\n",
">[Arcee](https://www.arcee.ai/about/about-us) helps with the development of SLMs: small, specialized, secure, and scalable language models.\n",
"\n",
"This notebook demonstrates how to use the `ArceeRetriever` class to retrieve relevant document(s) for Arcee's `Domain Adapted Language Models` (`DALMs`)."
]
},
{

@ -7,7 +7,7 @@
"source": [
"# Azure AI Search\n",
"\n",
">[Azure AI Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) (formerly known as `Azure Cognitive Search`or Azure Search) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.\n",
">[Microsoft Azure AI Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) (formerly known as `Azure Cognitive Search` or `Azure Search`) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.\n",
"\n",
">Search is foundational to any app that surfaces text to users, where common scenarios include catalog or document search, online retail apps, or data exploration over proprietary content. When you create a search service, you'll work with the following capabilities:\n",
">- A search engine for full text search over a search index containing user-owned content\n",
@ -283,7 +283,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -7,10 +7,9 @@
"source": [
"# BM25\n",
"\n",
"[BM25](https://en.wikipedia.org/wiki/Okapi_BM25) also known as the Okapi BM25, is a ranking function used in information retrieval systems to estimate the relevance of documents to a given search query.\n",
"\n",
"This notebook goes over how to use a retriever that under the hood uses BM25 using [`rank_bm25`](https://github.com/dorianbrown/rank_bm25) package.\n",
"\n"
">[BM25 (Wikipedia)](https://en.wikipedia.org/wiki/Okapi_BM25), also known as `Okapi BM25`, is a ranking function used in information retrieval systems to estimate the relevance of documents to a given search query.\n",
">\n",
">The `BM25Retriever` uses the [`rank_bm25`](https://github.com/dorianbrown/rank_bm25) package.\n"
]
},
{
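The ranking function described above can be sketched in plain Python. This is an illustrative Okapi BM25 implementation with conventional `k1`/`b` defaults (my own sketch for intuition, not the `rank_bm25` package's code):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: how many docs contain each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            n = df[term]
            idf = math.log((N - n + 0.5) / (n + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [["hello", "world"], ["foo", "bar"], ["hello", "foo", "foo"]]
print(bm25_scores(["foo"], docs))
```

Documents that never mention the query term score zero, and repeated mentions raise the score with diminishing returns, which is the behavior the notebook's retriever relies on.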

@ -6,7 +6,7 @@
"source": [
"# BREEBS (Open Knowledge)\n",
"\n",
"[BREEBS](https://www.breebs.com/) is an open collaborative knowledge platform. \n",
">[BREEBS](https://www.breebs.com/) is an open collaborative knowledge platform. \n",
">Anybody can create a Breeb, a knowledge capsule, based on PDFs stored on a Google Drive folder.\n",
">A Breeb can be used by any LLM/chatbot to improve its expertise, reduce hallucinations and give access to sources.\n",
">Behind the scenes, Breebs implements several Retrieval Augmented Generation (RAG) models to seamlessly provide useful context at each iteration.\n",

@ -5,11 +5,11 @@
"id": "1edb9e6b",
"metadata": {},
"source": [
"# ChatGPT Plugin\n",
"# ChatGPT plugin\n",
"\n",
">[OpenAI plugins](https://platform.openai.com/docs/plugins/introduction) connect ChatGPT to third-party applications. These plugins enable ChatGPT to interact with APIs defined by developers, enhancing ChatGPT's capabilities and allowing it to perform a wide range of actions.\n",
">[OpenAI plugins](https://platform.openai.com/docs/plugins/introduction) connect `ChatGPT` to third-party applications. These plugins enable `ChatGPT` to interact with APIs defined by developers, enhancing `ChatGPT`'s capabilities and allowing it to perform a wide range of actions.\n",
"\n",
">Plugins can allow ChatGPT to do things like:\n",
">Plugins allow `ChatGPT` to do things like:\n",
">- Retrieve real-time information; e.g., sports scores, stock prices, the latest news, etc.\n",
">- Retrieve knowledge-base information; e.g., company docs, personal notes, etc.\n",
">- Perform actions on behalf of the user; e.g., booking a flight, ordering food, etc.\n",

@ -5,7 +5,7 @@
"id": "fc0db1bc",
"metadata": {},
"source": [
"# Cohere Reranker\n",
"# Cohere reranker\n",
"\n",
">[Cohere](https://cohere.ai/about) is a Canadian startup that provides natural language processing models that help companies improve human-machine interactions.\n",
"\n",

@ -5,9 +5,11 @@
"id": "bf733a38-db84-4363-89e2-de6735c37230",
"metadata": {},
"source": [
"# Cohere RAG retriever\n",
"# Cohere RAG\n",
"\n",
"This notebook covers how to get started with Cohere RAG retriever. This allows you to leverage the ability to search documents over various connectors or by supplying your own."
">[Cohere](https://cohere.ai/about) is a Canadian startup that provides natural language processing models that help companies improve human-machine interactions.\n",
"\n",
"This notebook covers how to get started with the `Cohere RAG` retriever. This allows you to leverage the ability to search documents over various connectors or by supplying your own."
]
},
{
@ -231,7 +233,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -8,7 +8,7 @@
"source": [
"# Dria\n",
"\n",
"Dria is a hub of public RAG models for developers to both contribute and utilize a shared embedding lake. This notebook demonstrates how to use the Dria API for data retrieval tasks."
">[Dria](https://dria.co/) is a hub of public RAG models for developers to both contribute and utilize a shared embedding lake. This notebook demonstrates how to use the `Dria API` for data retrieval tasks."
]
},
{
@ -169,7 +169,7 @@
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -183,9 +183,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.x"
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
"nbformat_minor": 4
}

@ -5,11 +5,11 @@
"id": "ab66dd43",
"metadata": {},
"source": [
"# ElasticsearchRetriever\n",
"# Elasticsearch\n",
"\n",
"[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. It support keyword search, vector search, hybrid search and complex filtering.\n",
">[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. It supports keyword search, vector search, hybrid search and complex filtering.\n",
"\n",
"The `ElasticsearchRetriever` is a generic wrapper to enable flexible access to all Elasticsearch features through the [Query DSL](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html). For most use cases the other classes (`ElasticsearchStore`, `ElasticsearchEmbeddings`, etc.) should suffice, but if they don't you can use `ElasticsearchRetriever`."
"The `ElasticsearchRetriever` is a generic wrapper to enable flexible access to all `Elasticsearch` features through the [Query DSL](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html). For most use cases the other classes (`ElasticsearchStore`, `ElasticsearchEmbeddings`, etc.) should suffice, but if they don't you can use `ElasticsearchRetriever`."
]
},
{
@ -561,7 +561,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
"version": "3.10.12"
}
},
"nbformat": 4,
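To give a sense of the flexibility the Query DSL buys you, here is a hypothetical hybrid-search request body of the kind you could hand to `ElasticsearchRetriever`. The field names and the `knn` section's shape (Elasticsearch 8.x approximate-kNN syntax) are illustrative assumptions, not taken from the notebook:

```python
# Hypothetical sketch: build a Query DSL body combining keyword (BM25) search
# with approximate kNN vector search. Field names are illustrative.
def hybrid_body(query_text, query_vector, text_field="text", vector_field="vector", k=5):
    return {
        "query": {"match": {text_field: query_text}},
        "knn": {
            "field": vector_field,
            "query_vector": query_vector,
            "k": k,
            # A common heuristic: consider more candidates than you return.
            "num_candidates": 10 * k,
        },
    }

body = hybrid_body("hello world", [0.1, 0.2, 0.3])
print(body["knn"]["num_candidates"])  # 50
```

Because the retriever accepts an arbitrary body, anything expressible in the Query DSL (filters, boosting, hybrid ranking) stays available without a dedicated wrapper class.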

@ -7,11 +7,11 @@
"source": [
"# Embedchain\n",
"\n",
"Embedchain is a RAG framework to create data pipelines. It loads, indexes, retrieves and syncs all the data.\n",
">[Embedchain](https://github.com/embedchain/embedchain) is a RAG framework to create data pipelines. It loads, indexes, retrieves and syncs all the data.\n",
">\n",
">It is available as an [open source package](https://github.com/embedchain/embedchain) and as a [hosted platform solution](https://app.embedchain.ai/).\n",
"\n",
"It is available as an [open source package](https://github.com/embedchain/embedchain) and as a [hosted platform solution](https://app.embedchain.ai/).\n",
"\n",
"This notebook shows how to use a retriever that uses Embedchain."
"This notebook shows how to use a retriever that uses `Embedchain`."
]
},
{

@ -9,7 +9,9 @@
}
},
"source": [
"# Flashrank Reranker\n",
"# FlashRank reranker\n",
"\n",
">[FlashRank](https://github.com/PrithivirajDamodaran/FlashRank) is the Ultra-lite & Super-fast Python library to add re-ranking to your existing search & retrieval pipelines. It is based on SoTA cross-encoders, with gratitude to all the model owners.\n",
"\n",
"This notebook shows how to use [flashrank](https://github.com/PrithivirajDamodaran/FlashRank) for document compression and retrieval."
]
@ -512,7 +514,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -5,11 +5,13 @@
"id": "a33a03c9-f11d-45ef-a563-9da0652fcf92",
"metadata": {},
"source": [
"# Fleet AI Libraries Context\n",
"# Fleet AI Context\n",
"\n",
"The Fleet AI team is on a mission to embed the world's most important data. They've started by embedding the top 1200 Python libraries to enable code generation with up-to-date knowledge. They've been kind enough to share their embeddings of the [LangChain docs](/docs/get_started/introduction) and [API reference](https://api.python.langchain.com/en/latest/api_reference.html).\n",
">[Fleet AI Context](https://www.fleet.so/context) is a dataset of high-quality embeddings of the top 1200 most popular & permissive Python libraries & their documentation.\n",
">\n",
">The `Fleet AI` team is on a mission to embed the world's most important data. They've started by embedding the top 1200 Python libraries to enable code generation with up-to-date knowledge. They've been kind enough to share their embeddings of the [LangChain docs](/docs/get_started/introduction) and [API reference](https://api.python.langchain.com/en/latest/api_reference.html).\n",
"\n",
"Let's take a look at how we can use these embeddings to power a docs retrieval system and ultimately a simple code generating chain!"
"Let's take a look at how we can use these embeddings to power a docs retrieval system and ultimately a simple code-generating chain!"
]
},
{

@ -6,13 +6,13 @@
"source": [
"# Google Vertex AI Search\n",
"\n",
"[Vertex AI Search](https://cloud.google.com/enterprise-search) (formerly known as Enterprise Search on Generative AI App Builder) is a part of the [Vertex AI](https://cloud.google.com/vertex-ai) machine learning platform offered by Google Cloud.\n",
">[Google Vertex AI Search](https://cloud.google.com/enterprise-search) (formerly known as `Enterprise Search` on `Generative AI App Builder`) is a part of the [Vertex AI](https://cloud.google.com/vertex-ai) machine learning platform offered by `Google Cloud`.\n",
">\n",
">`Vertex AI Search` lets organizations quickly build generative AI-powered search engines for customers and employees. It's underpinned by a variety of `Google Search` technologies, including semantic search, which helps deliver more relevant results than traditional keyword-based search techniques by using natural language processing and machine learning techniques to infer relationships within the content and intent from the user's query input. Vertex AI Search also benefits from Google's expertise in understanding how users search and factors in content relevance to order displayed results.\n",
"\n",
"Vertex AI Search lets organizations quickly build generative AI powered search engines for customers and employees. It's underpinned by a variety of Google Search technologies, including semantic search, which helps deliver more relevant results than traditional keyword-based search techniques by using natural language processing and machine learning techniques to infer relationships within the content and intent from the users query input. Vertex AI Search also benefits from Googles expertise in understanding how users search and factors in content relevance to order displayed results.\n",
">`Vertex AI Search` is available in the `Google Cloud Console` and via an API for enterprise workflow integration.\n",
"\n",
"Vertex AI Search is available in the Google Cloud Console and via an API for enterprise workflow integration.\n",
"\n",
"This notebook demonstrates how to configure Vertex AI Search and use the Vertex AI Search retriever. The Vertex AI Search retriever encapsulates the [Python client library](https://cloud.google.com/generative-ai-app-builder/docs/libraries#client-libraries-install-python) and uses it to access the [Search Service API](https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1beta.services.search_service).\n"
"This notebook demonstrates how to configure `Vertex AI Search` and use the Vertex AI Search retriever. The Vertex AI Search retriever encapsulates the [Python client library](https://cloud.google.com/generative-ai-app-builder/docs/libraries#client-libraries-install-python) and uses it to access the [Search Service API](https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1beta.services.search_service).\n"
]
},
{
@ -351,7 +351,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -5,16 +5,18 @@
"id": "671e9ec1-fa00-4c92-a2fb-ceb142168ea9",
"metadata": {},
"source": [
"# Jaguar Vector Database\n",
"\n",
"1. It is a distributed vector database\n",
"2. The “ZeroMove” feature of JaguarDB enables instant horizontal scalability\n",
"3. Multimodal: embeddings, text, images, videos, PDFs, audio, time series, and geospatial\n",
"4. All-masters: allows both parallel reads and writes\n",
"5. Anomaly detection capabilities\n",
"6. RAG support: combines LLM with proprietary and real-time data\n",
"7. Shared metadata: sharing of metadata across multiple vector indexes\n",
"8. Distance metrics: Euclidean, Cosine, InnerProduct, Manhatten, Chebyshev, Hamming, Jeccard, Minkowski"
"# JaguarDB Vector Database\n",
"\n",
">[JaguarDB Vector Database](http://www.jaguardb.com/windex.html)\n",
">\n",
">1. It is a distributed vector database\n",
">2. The “ZeroMove” feature of JaguarDB enables instant horizontal scalability\n",
">3. Multimodal: embeddings, text, images, videos, PDFs, audio, time series, and geospatial\n",
">4. All-masters: allows both parallel reads and writes\n",
">5. Anomaly detection capabilities\n",
">6. RAG support: combines LLM with proprietary and real-time data\n",
">7. Shared metadata: sharing of metadata across multiple vector indexes\n",
">8. Distance metrics: Euclidean, Cosine, InnerProduct, Manhattan, Chebyshev, Hamming, Jaccard, Minkowski"
]
},
{
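For intuition, the first few distance metrics in the list above can be sketched in plain Python (illustrative implementations, not JaguarDB's own code):

```python
import math

# Illustrative distance metrics over equal-length numeric vectors.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (na * nb)

a, b = [0.0, 3.0], [4.0, 0.0]
print(euclidean(a, b), manhattan(a, b), chebyshev(a, b))  # 5.0 7.0 4.0
```

Which metric fits depends on the embedding model: cosine distance ignores vector magnitude, while Euclidean and Manhattan do not.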

@ -7,10 +7,9 @@
"source": [
"# Kay.ai\n",
"\n",
">[Kay Data API](https://www.kay.ai/) is a data API built for RAG 🕵️. We are curating the world's largest datasets as high-quality embeddings so your AI agents can retrieve context on the fly. Latest models, fast retrieval, and zero infra.\n",
"\n",
"> Data API built for RAG 🕵️ We are curating the world's largest datasets as high-quality embeddings so your AI agents can retrieve context on the fly. Latest models, fast retrieval, and zero infra.\n",
"\n",
"This notebook shows you how to retrieve datasets supported by [Kay](https://kay.ai/). You can currently search SEC Filings and Press Releases of US companies. Visit [kay.ai](https://kay.ai) for the latest data drops. For any questions, join our [discord](https://discord.gg/hAnE4e5T6M) or [tweet at us](https://twitter.com/vishalrohra_)"
"This notebook shows you how to retrieve datasets supported by [Kay](https://kay.ai/). You can currently search `SEC Filings` and `Press Releases of US companies`. Visit [kay.ai](https://kay.ai) for the latest data drops. For any questions, join our [discord](https://discord.gg/hAnE4e5T6M) or [tweet at us](https://twitter.com/vishalrohra_)"
]
},
{
@ -18,10 +17,27 @@
"id": "fc507b8e-ea51-417c-93da-42bf998a1195",
"metadata": {},
"source": [
"Installation\n",
"=\n",
"## Installation\n",
"\n",
"First you will need to install the [`kay` package](https://pypi.org/project/kay/). You will also need an API key: you can get one for free at [https://kay.ai](https://kay.ai/). Once you have an API key, you must set it as an environment variable `KAY_API_KEY`.\n",
"First, install the [`kay` package](https://pypi.org/project/kay/). "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ae22ad3e-4643-4314-8dea-a5abff0d87b0",
"metadata": {},
"outputs": [],
"source": [
"!pip install kay"
]
},
{
"cell_type": "markdown",
"id": "efd317f7-9b7d-4e71-875c-5f0b6efeca05",
"metadata": {},
"source": [
"You will also need an API key: you can get one for free at [https://kay.ai](https://kay.ai/). Once you have an API key, you must set it as an environment variable `KAY_API_KEY`.\n",
"\n",
"`KayAiRetriever` has a static `.create()` factory method that takes the following arguments:\n",
"\n",
@ -35,11 +51,9 @@
"id": "c923bea0-585a-4f62-8662-efc167e8d793",
"metadata": {},
"source": [
"Examples\n",
"=\n",
"## Examples\n",
"\n",
"Basic Retriever Usage\n",
"-"
"### Basic Retriever Usage"
]
},
{
@ -111,8 +125,7 @@
"id": "21f6e9e5-478c-4b2c-9d61-f7a84f4d2f8f",
"metadata": {},
"source": [
"Usage in a chain\n",
"-"
"### Usage in a chain"
]
},
{

@ -7,11 +7,11 @@
"source": [
"# kNN\n",
"\n",
">In statistics, the [k-nearest neighbors algorithm (k-NN)](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression.\n",
">In statistics, the [k-nearest neighbors algorithm (k-NN)](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) is a non-parametric supervised learning method first developed by `Evelyn Fix` and `Joseph Hodges` in 1951, and later expanded by `Thomas Cover`. It is used for classification and regression.\n",
"\n",
"This notebook goes over how to use a retriever that under the hood uses an kNN.\n",
"This notebook goes over how to use a retriever that under the hood uses a kNN.\n",
"\n",
"Largely based on https://github.com/karpathy/randomfun/blob/master/knn_vs_svm.html"
"Largely based on the code of [Andrej Karpathy](https://github.com/karpathy/randomfun/blob/master/knn_vs_svm.html)."
]
},
{
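The retrieval idea behind the notebook can be sketched without any library: embed the documents, then take the k nearest by cosine similarity. This is a brute-force sketch with toy 2-D vectors standing in for real embeddings:

```python
import math

def knn_retrieve(query_vec, doc_vecs, k=2):
    """Return indices of the k most similar documents by cosine similarity."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    sims = [(cos(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    # Highest similarity first.
    return [i for _, i in sorted(sims, reverse=True)[:k]]

docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(knn_retrieve([1.0, 0.0], docs))  # [0, 1]
```

Real retrievers replace the toy vectors with model embeddings and use approximate nearest-neighbor indexes once brute force gets too slow.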

@ -8,7 +8,7 @@
"source": [
"# LOTR (Merger Retriever)\n",
"\n",
"`Lord of the Retrievers`, also known as `MergerRetriever`, takes a list of retrievers as input and merges the results of their get_relevant_documents() methods into a single list. The merged results will be a list of documents that are relevant to the query and that have been ranked by the different retrievers.\n",
">`Lord of the Retrievers (LOTR)`, also known as `MergerRetriever`, takes a list of retrievers as input and merges the results of their `get_relevant_documents()` methods into a single list. The merged results will be a list of documents that are relevant to the query and that have been ranked by the different retrievers.\n",
"\n",
"The `MergerRetriever` class can be used to improve the accuracy of document retrieval in a number of ways. First, it can combine the results of multiple retrievers, which can help to reduce the risk of bias in the results. Second, it can rank the results of the different retrievers, which can help to ensure that the most relevant documents are returned first."
]
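The merging behavior described above can be sketched as a round-robin interleave with de-duplication. This is an illustrative sketch of the idea, not `MergerRetriever`'s actual implementation:

```python
def merge_results(result_lists):
    """Round-robin merge of per-retriever ranked result lists, dropping duplicates."""
    merged, seen = [], set()
    longest = max(len(r) for r in result_lists)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged

print(merge_results([["a", "b", "c"], ["b", "d"]]))  # ['a', 'b', 'd', 'c']
```

Interleaving by rank is what lets each retriever's top hits surface early in the merged list, rather than one retriever's results dominating.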

@ -5,12 +5,12 @@
"id": "ce0f17b9",
"metadata": {},
"source": [
"# Qdrant Sparse Vector Retriever\n",
"# Qdrant Sparse Vector\n",
"\n",
">[Qdrant](https://qdrant.tech/) is an open-source, high-performance vector search engine/database.\n",
"\n",
"\n",
">`QdrantSparseVectorRetriever` uses [sparse vectors](https://qdrant.tech/articles/sparse-vectors/) introduced in Qdrant [v1.7.0](https://qdrant.tech/articles/qdrant-1.7.x/) for document retrieval.\n"
">`QdrantSparseVectorRetriever` uses [sparse vectors](https://qdrant.tech/articles/sparse-vectors/) introduced in `Qdrant` [v1.7.0](https://qdrant.tech/articles/qdrant-1.7.x/) for document retrieval.\n"
]
},
{

@ -8,9 +8,13 @@
"# RAGatouille\n",
"\n",
"\n",
"This page covers how to use [RAGatouille](https://github.com/bclavie/RAGatouille) as a retriever in a LangChain chain. RAGatouille makes it as simple as can be to use ColBERT! [ColBERT](https://github.com/stanford-futuredata/ColBERT) is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.\n",
">[RAGatouille](https://github.com/bclavie/RAGatouille) makes it as simple as can be to use `ColBERT`!\n",
">\n",
">[ColBERT](https://github.com/stanford-futuredata/ColBERT) is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.\n",
"\n",
"We can use this as a [retriever](/docs/modules/data_connection/retrievers). It will show functionality specific to this integration. After going through, it may be useful to explore [relevant use-case pages](/docs/use_cases/question_answering) to learn how to use this vectorstore as part of a larger chain.\n",
"We can use this as a [retriever](/docs/modules/data_connection/retrievers). It will show functionality specific to this integration. After going through, it may be useful to explore [relevant use-case pages](/docs/use_cases/question_answering) to learn how to use this vector store as part of a larger chain.\n",
"\n",
"This page covers how to use [RAGatouille](https://github.com/bclavie/RAGatouille) as a retriever in a LangChain chain. \n",
"\n",
"## Setup\n",
"\n",

@ -8,9 +8,9 @@
"# SEC filing\n",
"\n",
"\n",
">The SEC filing is a financial statement or other formal document submitted to the U.S. Securities and Exchange Commission (SEC). Public companies, certain insiders, and broker-dealers are required to make regular SEC filings. Investors and financial professionals rely on these filings for information about companies they are evaluating for investment purposes.\n",
">An [SEC filing](https://www.sec.gov/edgar) is a financial statement or other formal document submitted to the U.S. Securities and Exchange Commission (SEC). Public companies, certain insiders, and broker-dealers are required to make regular `SEC filings`. Investors and financial professionals rely on these filings for information about companies they are evaluating for investment purposes.\n",
">\n",
">SEC filings data powered by [Kay.ai](https://kay.ai) and [Cybersyn](https://www.cybersyn.com/) via [Snowflake Marketplace](https://app.snowflake.com/marketplace/providers/GZTSZAS2KCS/Cybersyn%2C%20Inc).\n"
">`SEC filings` data powered by [Kay.ai](https://kay.ai) and [Cybersyn](https://www.cybersyn.com/) via [Snowflake Marketplace](https://app.snowflake.com/marketplace/providers/GZTSZAS2KCS/Cybersyn%2C%20Inc).\n"
]
},
{

@ -4,9 +4,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Astra DB\n",
"# Astra DB (Cassandra)\n",
"\n",
"DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available through an easy-to-use JSON API.\n",
">[DataStax Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on `Cassandra` and made conveniently available through an easy-to-use JSON API.\n",
"\n",
"In the walkthrough, we'll demo the `SelfQueryRetriever` with an `Astra DB` vector store."
]
@ -57,6 +57,9 @@
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%% md\n"
}
@ -276,7 +279,10 @@
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Cleanup\n",
@ -290,7 +296,10 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@ -300,7 +309,7 @@
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -314,9 +323,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

@ -7,7 +7,7 @@
"source": [
"# Chroma\n",
"\n",
">[Chroma](https://docs.trychroma.com/getting-started) is a database for building AI applications with embeddings.\n",
">[Chroma](https://docs.trychroma.com/getting-started) is a vector database for building AI applications with embeddings.\n",
"\n",
"In the notebook, we'll demo the `SelfQueryRetriever` wrapped around a `Chroma` vector store. "
]

@ -2,7 +2,7 @@
sidebar-position: 0
---
# Self-querying retriever
# Self-querying retrievers
Learn about how the self-querying retriever works [here](/docs/modules/data_connection/retrievers/self_query).

@ -6,8 +6,8 @@
"source": [
"# MongoDB Atlas\n",
"\n",
"[MongoDB Atlas](https://www.mongodb.com/) is a document database that can be \n",
"used as a vector databse.\n",
">[MongoDB Atlas](https://www.mongodb.com/) is a document database that can be \n",
"used as a vector database.\n",
"\n",
"In the walkthrough, we'll demo the `SelfQueryRetriever` with a `MongoDB Atlas` vector store."
]
@ -299,7 +299,7 @@
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -313,9 +313,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

@ -5,9 +5,9 @@
"id": "13afcae7",
"metadata": {},
"source": [
"# PGVector\n",
"# PGVector (Postgres)\n",
"\n",
">[PGVector](https://github.com/pgvector/pgvector) is a vector similarity search for Postgres.\n",
">[PGVector](https://github.com/pgvector/pgvector) is a vector similarity search extension for the `Postgres` database.\n",
"\n",
"In the notebook, we'll demo the `SelfQueryRetriever` wrapped around a `PGVector` vector store."
]
@ -300,7 +300,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.10.12"
}
},
"nbformat": 4,

@ -5,7 +5,7 @@
"id": "13afcae7",
"metadata": {},
"source": [
"# Supabase\n",
"# Supabase (Postgres)\n",
"\n",
">[Supabase](https://supabase.com/docs) is an open-source `Firebase` alternative. \n",
"> `Supabase` is built on top of `PostgreSQL`, which offers strong `SQL` \n",

@ -6,9 +6,13 @@
"id": "13afcae7",
"metadata": {},
"source": [
"# Timescale Vector (Postgres) self-querying \n",
"# Timescale Vector (Postgres) \n",
"\n",
"[Timescale Vector](https://www.timescale.com/ai) is PostgreSQL++ for AI applications. It enables you to efficiently store and query billions of vector embeddings in `PostgreSQL`.\n",
">[Timescale Vector](https://www.timescale.com/ai) is `PostgreSQL++` for AI applications. It enables you to efficiently store and query billions of vector embeddings in `PostgreSQL`.\n",
">\n",
">[PostgreSQL](https://en.wikipedia.org/wiki/PostgreSQL), also known as `Postgres`,\n",
"> is a free and open-source relational database management system (RDBMS) \n",
"> emphasizing extensibility and `SQL` compliance.\n",
"\n",
"This notebook shows how to use the Postgres vector database (`TimescaleVector`) to perform self-querying. In the notebook we'll demo the `SelfQueryRetriever` wrapped around a `TimescaleVector` vector store. \n",
"\n",
@ -528,6 +532,18 @@
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,

@ -5,19 +5,15 @@
"id": "13afcae7",
"metadata": {},
"source": [
"# Vectara self-querying \n",
"# Vectara \n",
"\n",
">[Vectara](https://vectara.com/) is the trusted GenAI platform that provides an easy-to-use API for document indexing and querying. \n",
"\n",
"Vectara provides an end-to-end managed service for Retrieval Augmented Generation or [RAG](https://vectara.com/grounded-generation/), which includes:\n",
"\n",
"1. A way to extract text from document files and chunk them into sentences.\n",
"\n",
"2. The state-of-the-art [Boomerang](https://vectara.com/how-boomerang-takes-retrieval-augmented-generation-to-the-next-level-via-grounded-generation/) embeddings model. Each text chunk is encoded into a vector embedding using Boomerang, and stored in the Vectara internal knowledge (vector+text) store\n",
"\n",
"3. A query service that automatically encodes the query into embedding, and retrieves the most relevant text segments (including support for [Hybrid Search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) and [MMR](https://vectara.com/get-diverse-results-and-comprehensive-summaries-with-vectaras-mmr-reranker/))\n",
"\n",
"4. An option to create [generative summary](https://docs.vectara.com/docs/learn/grounded-generation/grounded-generation-overview), based on the retrieved documents, including citations.\n",
">\n",
">`Vectara` provides an end-to-end managed service for `Retrieval Augmented Generation` or [RAG](https://vectara.com/grounded-generation/), which includes:\n",
">1. A way to `extract text` from document files and `chunk` them into sentences.\n",
">2. The state-of-the-art [Boomerang](https://vectara.com/how-boomerang-takes-retrieval-augmented-generation-to-the-next-level-via-grounded-generation/) embeddings model. Each text chunk is encoded into a vector embedding using `Boomerang`, and stored in the Vectara internal knowledge (vector+text) store\n",
">3. A query service that automatically encodes the query into an embedding, and retrieves the most relevant text segments (including support for [Hybrid Search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) and [MMR](https://vectara.com/get-diverse-results-and-comprehensive-summaries-with-vectaras-mmr-reranker/))\n",
">4. An option to create [generative summary](https://docs.vectara.com/docs/learn/grounded-generation/grounded-generation-overview), based on the retrieved documents, including citations.\n",
"\n",
"See the [Vectara API documentation](https://docs.vectara.com/docs/) for more information on how to use the API.\n",
"\n",
@ -31,17 +27,17 @@
"source": [
"# Setup\n",
"\n",
"You will need a Vectara account to use Vectara with LangChain. To get started, use the following steps (see our [quickstart](https://docs.vectara.com/docs/quickstart) guide):\n",
"1. [Sign up](https://console.vectara.com/signup) for a Vectara account if you don't already have one. Once you have completed your sign up you will have a Vectara customer ID. You can find your customer ID by clicking on your name, on the top-right of the Vectara console window.\n",
"2. Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingest from input documents. To create a corpus, use the **\"Create Corpus\"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.\n",
"You will need a `Vectara` account to use `Vectara` with `LangChain`. To get started, use the following steps (see our [quickstart](https://docs.vectara.com/docs/quickstart) guide):\n",
"1. [Sign up](https://console.vectara.com/signup) for a `Vectara` account if you don't already have one. Once you have completed your sign up you will have a Vectara customer ID. You can find your customer ID by clicking on your name, on the top-right of the Vectara console window.\n",
"2. Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingesting from input documents. To create a corpus, use the **\"Create Corpus\"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.\n",
"3. Next you'll need to create API keys to access the corpus. Click on the **\"Authorization\"** tab in the corpus view and then the **\"Create API Key\"** button. Give your key a name, and choose whether you want query only or query+index for your key. Click \"Create\" and you now have an active API key. Keep this key confidential. \n",
"\n",
"To use LangChain with Vectara, you'll need to have these three values: customer ID, corpus ID and api_key.\n",
"To use LangChain with Vectara, you need three values: customer ID, corpus ID and api_key.\n",
"You can provide those to LangChain in two ways:\n",
"\n",
"1. Include in your environment these three variables: `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`.\n",
"\n",
"> For example, you can set these variables using os.environ and getpass as follows:\n",
"> For example, you can set these variables using `os.environ` and `getpass` as follows:\n",
"\n",
"```python\n",
"import os\n",
@@ -52,7 +48,7 @@
"os.environ[\"VECTARA_API_KEY\"] = getpass.getpass(\"Vectara API Key:\")\n",
"```\n",
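"\n",
"With the variables set, downstream code can read them back from the environment. As a minimal sketch (the `vectara_config` helper below is illustrative only, not part of the Vectara integration), you might merge explicit arguments with the environment like this:\n",
"\n",

```python
import os


def vectara_config(customer_id=None, corpus_id=None, api_key=None):
    """Illustrative helper: prefer explicit arguments, fall back to env vars."""
    return {
        "vectara_customer_id": customer_id or os.environ.get("VECTARA_CUSTOMER_ID", ""),
        "vectara_corpus_id": corpus_id or os.environ.get("VECTARA_CORPUS_ID", ""),
        "vectara_api_key": api_key or os.environ.get("VECTARA_API_KEY", ""),
    }


os.environ["VECTARA_CUSTOMER_ID"] = "12345"
cfg = vectara_config(corpus_id="7")
# cfg["vectara_customer_id"] is "12345" (taken from the environment)
# cfg["vectara_corpus_id"] is "7"       (explicit argument wins)
```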
"\n",
"1. Provide them as arguments when creating the Vectara vectorstore object:\n",
"1. Provide them as arguments when creating the `Vectara` vectorstore object:\n",
"\n",
"```python\n",
"vectorstore = Vectara(\n",
@@ -398,7 +394,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.10.12"
}
},
"nbformat": 4,

@@ -6,7 +6,7 @@
"source": [
"# Tavily Search API\n",
"\n",
"[Tavily's Search API](https://tavily.com) is a search engine built specifically for AI agents (LLMs), delivering real-time, accurate, and factual results at speed.\n",
">[Tavily's Search API](https://tavily.com) is a search engine built specifically for AI agents (LLMs), delivering real-time, accurate, and factual results at speed.\n",
"\n",
"We can use this as a [retriever](/docs/modules/data_connection/retrievers). It will show functionality specific to this integration. After going through, it may be useful to explore [relevant use-case pages](/docs/use_cases/question_answering) to learn how to use this vectorstore as part of a larger chain.\n",
"\n",

@@ -5,9 +5,9 @@
"id": "818fc023",
"metadata": {},
"source": [
"# You.com Retriever\n",
"# You.com\n",
"\n",
"The [you.com API](https://api.you.com) is a suite of tools designed to help developers ground the output of LLMs in the most recent, most accurate, most relevant information that may not have been included in their training dataset."
[you.com API]">
">The [you.com API](https://api.you.com) is a suite of tools designed to help developers ground the output of LLMs in the most recent, most accurate, most relevant information that may not have been included in their training dataset."
]
},
{
