diff --git a/docs/docs/integrations/retrievers/arcee.ipynb b/docs/docs/integrations/retrievers/arcee.ipynb index 1f637458fa..1013baf72c 100644 --- a/docs/docs/integrations/retrievers/arcee.ipynb +++ b/docs/docs/integrations/retrievers/arcee.ipynb @@ -4,8 +4,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Arcee Retriever\n", - "This notebook demonstrates how to use the `ArceeRetriever` class to retrieve relevant document(s) for Arcee's Domain Adapted Language Models (DALMs)." + "# Arcee\n", + "\n", + ">[Arcee](https://www.arcee.ai/about/about-us) helps with the development of SLMs: small, specialized, secure, and scalable language models.\n", + "\n", + "This notebook demonstrates how to use the `ArceeRetriever` class to retrieve relevant document(s) for Arcee's `Domain Adapted Language Models` (`DALMs`)." ] }, { diff --git a/docs/docs/integrations/retrievers/azure_ai_search.ipynb b/docs/docs/integrations/retrievers/azure_ai_search.ipynb index a88120d2a9..6151fc2227 100644 --- a/docs/docs/integrations/retrievers/azure_ai_search.ipynb +++ b/docs/docs/integrations/retrievers/azure_ai_search.ipynb @@ -7,7 +7,7 @@ "source": [ "# Azure AI Search\n", "\n", - ">[Azure AI Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) (formerly known as `Azure Cognitive Search`or Azure Search) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.\n", + ">[Microsoft Azure AI Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) (formerly known as `Azure Cognitive Search` or Azure Search) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.\n", "\n", ">Search is foundational to any app that surfaces text to users, where 
common scenarios include catalog or document search, online retail apps, or data exploration over proprietary content. When you create a search service, you'll work with the following capabilities:\n", ">- A search engine for full text search over a search index containing user-owned content\n", @@ -283,7 +283,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.8" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/retrievers/bm25.ipynb b/docs/docs/integrations/retrievers/bm25.ipynb index 241b3e5639..7f15bb5b9b 100644 --- a/docs/docs/integrations/retrievers/bm25.ipynb +++ b/docs/docs/integrations/retrievers/bm25.ipynb @@ -7,10 +7,9 @@ "source": [ "# BM25\n", "\n", - "[BM25](https://en.wikipedia.org/wiki/Okapi_BM25) also known as the Okapi BM25, is a ranking function used in information retrieval systems to estimate the relevance of documents to a given search query.\n", - "\n", - "This notebook goes over how to use a retriever that under the hood uses BM25 using [`rank_bm25`](https://github.com/dorianbrown/rank_bm25) package.\n", - "\n" + ">[BM25 (Wikipedia)](https://en.wikipedia.org/wiki/Okapi_BM25) also known as the `Okapi BM25`, is a ranking function used in information retrieval systems to estimate the relevance of documents to a given search query.\n", + ">\n", + ">`BM25Retriever` retriever uses the [`rank_bm25`](https://github.com/dorianbrown/rank_bm25) package.\n" ] }, { diff --git a/docs/docs/integrations/retrievers/breebs.ipynb b/docs/docs/integrations/retrievers/breebs.ipynb index 5d7e26b711..f9fa9d84b2 100644 --- a/docs/docs/integrations/retrievers/breebs.ipynb +++ b/docs/docs/integrations/retrievers/breebs.ipynb @@ -6,7 +6,7 @@ "source": [ "# BREEBS (Open Knowledge)\n", "\n", - "[BREEBS](https://www.breebs.com/) is an open collaborative knowledge platform. \n", + ">[BREEBS](https://www.breebs.com/) is an open collaborative knowledge platform. 
\n", "Anybody can create a Breeb, a knowledge capsule, based on PDFs stored on a Google Drive folder.\n", "A breeb can be used by any LLM/chatbot to improve its expertise, reduce hallucinations and give access to sources.\n", "Behind the scenes, Breebs implements several Retrieval Augmented Generation (RAG) models to seamlessly provide useful context at each iteration. \n", diff --git a/docs/docs/integrations/retrievers/chatgpt-plugin.ipynb b/docs/docs/integrations/retrievers/chatgpt-plugin.ipynb index 1e0388606b..5b00552d80 100644 --- a/docs/docs/integrations/retrievers/chatgpt-plugin.ipynb +++ b/docs/docs/integrations/retrievers/chatgpt-plugin.ipynb @@ -5,11 +5,11 @@ "id": "1edb9e6b", "metadata": {}, "source": [ - "# ChatGPT Plugin\n", + "# ChatGPT plugin\n", "\n", - ">[OpenAI plugins](https://platform.openai.com/docs/plugins/introduction) connect ChatGPT to third-party applications. These plugins enable ChatGPT to interact with APIs defined by developers, enhancing ChatGPT's capabilities and allowing it to perform a wide range of actions.\n", + ">[OpenAI plugins](https://platform.openai.com/docs/plugins/introduction) connect `ChatGPT` to third-party applications. 
These plugins enable `ChatGPT` to interact with APIs defined by developers, enhancing `ChatGPT's` capabilities and allowing it to perform a wide range of actions.\n", "\n", - ">Plugins can allow ChatGPT to do things like:\n", + ">Plugins allow `ChatGPT` to do things like:\n", ">- Retrieve real-time information; e.g., sports scores, stock prices, the latest news, etc.\n", ">- Retrieve knowledge-base information; e.g., company docs, personal notes, etc.\n", ">- Perform actions on behalf of the user; e.g., booking a flight, ordering food, etc.\n", diff --git a/docs/docs/integrations/retrievers/cohere-reranker.ipynb b/docs/docs/integrations/retrievers/cohere-reranker.ipynb index 5602e66d9f..2378ccec45 100644 --- a/docs/docs/integrations/retrievers/cohere-reranker.ipynb +++ b/docs/docs/integrations/retrievers/cohere-reranker.ipynb @@ -5,7 +5,7 @@ "id": "fc0db1bc", "metadata": {}, "source": [ - "# Cohere Reranker\n", + "# Cohere reranker\n", "\n", ">[Cohere](https://cohere.ai/about) is a Canadian startup that provides natural language processing models that help companies improve human-machine interactions.\n", "\n", diff --git a/docs/docs/integrations/retrievers/cohere.ipynb b/docs/docs/integrations/retrievers/cohere.ipynb index 867ac192da..55640d8f6c 100644 --- a/docs/docs/integrations/retrievers/cohere.ipynb +++ b/docs/docs/integrations/retrievers/cohere.ipynb @@ -5,9 +5,11 @@ "id": "bf733a38-db84-4363-89e2-de6735c37230", "metadata": {}, "source": [ - "# Cohere RAG retriever\n", + "# Cohere RAG\n", "\n", - "This notebook covers how to get started with Cohere RAG retriever. This allows you to leverage the ability to search documents over various connectors or by supplying your own." + ">[Cohere](https://cohere.ai/about) is a Canadian startup that provides natural language processing models that help companies improve human-machine interactions.\n", + "\n", + "This notebook covers how to get started with the `Cohere RAG` retriever. 
This allows you to leverage the ability to search documents over various connectors or by supplying your own." ] }, { @@ -231,7 +233,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.7" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/retrievers/dria_index.ipynb b/docs/docs/integrations/retrievers/dria_index.ipynb index ced1cb822c..5f6329ec1b 100644 --- a/docs/docs/integrations/retrievers/dria_index.ipynb +++ b/docs/docs/integrations/retrievers/dria_index.ipynb @@ -8,7 +8,7 @@ "source": [ "# Dria\n", "\n", - "Dria is a hub of public RAG models for developers to both contribute and utilize a shared embedding lake. This notebook demonstrates how to use the Dria API for data retrieval tasks." + ">[Dria](https://dria.co/) is a hub of public RAG models for developers to both contribute and utilize a shared embedding lake. This notebook demonstrates how to use the `Dria API` for data retrieval tasks." ] }, { @@ -169,7 +169,7 @@ "provenance": [] }, "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -183,9 +183,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.x" + "version": "3.10.12" } }, "nbformat": 4, - "nbformat_minor": 0 -} \ No newline at end of file + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/retrievers/elasticsearch_retriever.ipynb b/docs/docs/integrations/retrievers/elasticsearch_retriever.ipynb index 8c51c8326c..0b72a99829 100644 --- a/docs/docs/integrations/retrievers/elasticsearch_retriever.ipynb +++ b/docs/docs/integrations/retrievers/elasticsearch_retriever.ipynb @@ -5,11 +5,11 @@ "id": "ab66dd43", "metadata": {}, "source": [ - "# ElasticsearchRetriever\n", + "# Elasticsearch\n", "\n", - "[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine. 
It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. It support keyword search, vector search, hybrid search and complex filtering.\n", + ">[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. It supports keyword search, vector search, hybrid search and complex filtering.\n", "\n", - "The `ElasticsearchRetriever` is a generic wrapper to enable flexible access to all Elasticsearch features through the [Query DSL](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html). For most use cases the other classes (`ElasticsearchStore`, `ElasticsearchEmbeddings`, etc.) should suffice, but if they don't you can use `ElasticsearchRetriever`." + "The `ElasticsearchRetriever` is a generic wrapper to enable flexible access to all `Elasticsearch` features through the [Query DSL](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html). For most use cases the other classes (`ElasticsearchStore`, `ElasticsearchEmbeddings`, etc.) should suffice, but if they don't you can use `ElasticsearchRetriever`." ] }, { @@ -561,7 +561,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.7" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/retrievers/embedchain.ipynb b/docs/docs/integrations/retrievers/embedchain.ipynb index 6a1295f336..97dc8a99b7 100644 --- a/docs/docs/integrations/retrievers/embedchain.ipynb +++ b/docs/docs/integrations/retrievers/embedchain.ipynb @@ -7,11 +7,11 @@ "source": [ "# Embedchain\n", "\n", - "Embedchain is a RAG framework to create data pipelines. 
It loads, indexes, retrieves and syncs all the data.\n", + ">[Embedchain](https://github.com/embedchain/embedchain) is a RAG framework to create data pipelines. It loads, indexes, retrieves and syncs all the data.\n", + ">\n", + ">It is available as an [open source package](https://github.com/embedchain/embedchain) and as a [hosted platform solution](https://app.embedchain.ai/).\n", "\n", - "It is available as an [open source package](https://github.com/embedchain/embedchain) and as a [hosted platform solution](https://app.embedchain.ai/).\n", - "\n", - "This notebook shows how to use a retriever that uses Embedchain." + "This notebook shows how to use a retriever that uses `Embedchain`." ] }, { diff --git a/docs/docs/integrations/retrievers/flashrank-reranker.ipynb b/docs/docs/integrations/retrievers/flashrank-reranker.ipynb index bdd4ed6d76..f63605526d 100644 --- a/docs/docs/integrations/retrievers/flashrank-reranker.ipynb +++ b/docs/docs/integrations/retrievers/flashrank-reranker.ipynb @@ -9,7 +9,9 @@ } }, "source": [ - "# Flashrank Reranker\n", + "# FlashRank reranker\n", + "\n", + ">[FlashRank](https://github.com/PrithivirajDamodaran/FlashRank) is the Ultra-lite & Super-fast Python library to add re-ranking to your existing search & retrieval pipelines. It is based on SoTA cross-encoders, with gratitude to all the model owners.\n", "\n", "This notebook shows how to use [flashrank](https://github.com/PrithivirajDamodaran/FlashRank) for document compression and retrieval." 
] @@ -512,7 +514,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.2" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/retrievers/fleet_context.ipynb b/docs/docs/integrations/retrievers/fleet_context.ipynb index b480f09c59..af85caa0cb 100644 --- a/docs/docs/integrations/retrievers/fleet_context.ipynb +++ b/docs/docs/integrations/retrievers/fleet_context.ipynb @@ -5,11 +5,13 @@ "id": "a33a03c9-f11d-45ef-a563-9da0652fcf92", "metadata": {}, "source": [ - "# Fleet AI Libraries Context\n", + "# Fleet AI Context\n", "\n", - "The Fleet AI team is on a mission to embed the world's most important data. They've started by embedding the top 1200 Python libraries to enable code generation with up-to-date knowledge. They've been kind enough to share their embeddings of the [LangChain docs](/docs/get_started/introduction) and [API reference](https://api.python.langchain.com/en/latest/api_reference.html).\n", + ">[Fleet AI Context](https://www.fleet.so/context) is a dataset of high-quality embeddings of the top 1200 most popular & permissive Python Libraries & their documentation.\n", + ">\n", + ">The `Fleet AI` team is on a mission to embed the world's most important data. They've started by embedding the top 1200 Python libraries to enable code generation with up-to-date knowledge. They've been kind enough to share their embeddings of the [LangChain docs](/docs/get_started/introduction) and [API reference](https://api.python.langchain.com/en/latest/api_reference.html).\n", "\n", - "Let's take a look at how we can use these embeddings to power a docs retrieval system and ultimately a simple code generating chain!" + "Let's take a look at how we can use these embeddings to power a docs retrieval system and ultimately a simple code-generating chain!" 
] }, { diff --git a/docs/docs/integrations/retrievers/google_vertex_ai_search.ipynb b/docs/docs/integrations/retrievers/google_vertex_ai_search.ipynb index 8c1b8b748c..4da87c1ce7 100644 --- a/docs/docs/integrations/retrievers/google_vertex_ai_search.ipynb +++ b/docs/docs/integrations/retrievers/google_vertex_ai_search.ipynb @@ -6,13 +6,13 @@ "source": [ "# Google Vertex AI Search\n", "\n", - "[Vertex AI Search](https://cloud.google.com/enterprise-search) (formerly known as Enterprise Search on Generative AI App Builder) is a part of the [Vertex AI](https://cloud.google.com/vertex-ai) machine learning platform offered by Google Cloud.\n", + ">[Google Vertex AI Search](https://cloud.google.com/enterprise-search) (formerly known as `Enterprise Search` on `Generative AI App Builder`) is a part of the [Vertex AI](https://cloud.google.com/vertex-ai) machine learning platform offered by `Google Cloud`.\n", + ">\n", + ">`Vertex AI Search` lets organizations quickly build generative AI-powered search engines for customers and employees. It's underpinned by a variety of `Google Search` technologies, including semantic search, which helps deliver more relevant results than traditional keyword-based search techniques by using natural language processing and machine learning techniques to infer relationships within the content and intent from the user’s query input. Vertex AI Search also benefits from Google’s expertise in understanding how users search and factors in content relevance to order displayed results.\n", "\n", - "Vertex AI Search lets organizations quickly build generative AI powered search engines for customers and employees. It's underpinned by a variety of Google Search technologies, including semantic search, which helps deliver more relevant results than traditional keyword-based search techniques by using natural language processing and machine learning techniques to infer relationships within the content and intent from the user’s query input. 
Vertex AI Search also benefits from Google’s expertise in understanding how users search and factors in content relevance to order displayed results.\n", + ">`Vertex AI Search` is available in the `Google Cloud Console` and via an API for enterprise workflow integration.\n", "\n", - "Vertex AI Search is available in the Google Cloud Console and via an API for enterprise workflow integration.\n", - "\n", - "This notebook demonstrates how to configure Vertex AI Search and use the Vertex AI Search retriever. The Vertex AI Search retriever encapsulates the [Python client library](https://cloud.google.com/generative-ai-app-builder/docs/libraries#client-libraries-install-python) and uses it to access the [Search Service API](https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1beta.services.search_service).\n" + "This notebook demonstrates how to configure `Vertex AI Search` and use the Vertex AI Search retriever. The Vertex AI Search retriever encapsulates the [Python client library](https://cloud.google.com/generative-ai-app-builder/docs/libraries#client-libraries-install-python) and uses it to access the [Search Service API](https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1beta.services.search_service).\n" ] }, { @@ -351,7 +351,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.0" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/retrievers/jaguar.ipynb b/docs/docs/integrations/retrievers/jaguar.ipynb index 3d3287a69e..e1b56d7732 100644 --- a/docs/docs/integrations/retrievers/jaguar.ipynb +++ b/docs/docs/integrations/retrievers/jaguar.ipynb @@ -5,16 +5,18 @@ "id": "671e9ec1-fa00-4c92-a2fb-ceb142168ea9", "metadata": {}, "source": [ - "# Jaguar Vector Database\n", - "\n", - "1. It is a distributed vector database\n", - "2. 
The “ZeroMove” feature of JaguarDB enables instant horizontal scalability\n", - "3. Multimodal: embeddings, text, images, videos, PDFs, audio, time series, and geospatial\n", - "4. All-masters: allows both parallel reads and writes\n", - "5. Anomaly detection capabilities\n", - "6. RAG support: combines LLM with proprietary and real-time data\n", - "7. Shared metadata: sharing of metadata across multiple vector indexes\n", - "8. Distance metrics: Euclidean, Cosine, InnerProduct, Manhatten, Chebyshev, Hamming, Jeccard, Minkowski" + "# JaguarDB Vector Database\n", + "\n", + ">[JaguarDB Vector Database](http://www.jaguardb.com/windex.html)\n", + ">\n", + ">1. It is a distributed vector database\n", + ">2. The “ZeroMove” feature of JaguarDB enables instant horizontal scalability\n", + ">3. Multimodal: embeddings, text, images, videos, PDFs, audio, time series, and geospatial\n", + ">4. All-masters: allows both parallel reads and writes\n", + ">5. Anomaly detection capabilities\n", + ">6. RAG support: combines LLM with proprietary and real-time data\n", + ">7. Shared metadata: sharing of metadata across multiple vector indexes\n", + ">8. Distance metrics: Euclidean, Cosine, InnerProduct, Manhattan, Chebyshev, Hamming, Jaccard, Minkowski" ] }, { diff --git a/docs/docs/integrations/retrievers/kay.ipynb b/docs/docs/integrations/retrievers/kay.ipynb index 6af7787720..66d8ed7b73 100644 --- a/docs/docs/integrations/retrievers/kay.ipynb +++ b/docs/docs/integrations/retrievers/kay.ipynb @@ -7,10 +7,9 @@ "source": [ "# Kay.ai\n", "\n", + ">[Kay Data API](https://www.kay.ai/) built for RAG 🕵️ We are curating the world's largest datasets as high-quality embeddings so your AI agents can retrieve context on the fly. Latest models, fast retrieval, and zero infra.\n", "\n", - "> Data API built for RAG 🕵️ We are curating the world's largest datasets as high-quality embeddings so your AI agents can retrieve context on the fly. 
Latest models, fast retrieval, and zero infra.\n", - "\n", - "This notebook shows you how to retrieve datasets supported by [Kay](https://kay.ai/). You can currently search SEC Filings and Press Releases of US companies. Visit [kay.ai](https://kay.ai) for the latest data drops. For any questions, join our [discord](https://discord.gg/hAnE4e5T6M) or [tweet at us](https://twitter.com/vishalrohra_)" + "This notebook shows you how to retrieve datasets supported by [Kay](https://kay.ai/). You can currently search `SEC Filings` and `Press Releases of US companies`. Visit [kay.ai](https://kay.ai) for the latest data drops. For any questions, join our [discord](https://discord.gg/hAnE4e5T6M) or [tweet at us](https://twitter.com/vishalrohra_)" ] }, { @@ -18,10 +17,27 @@ "id": "fc507b8e-ea51-417c-93da-42bf998a1195", "metadata": {}, "source": [ - "Installation\n", - "=\n", + "## Installation\n", "\n", - "First you will need to install the [`kay` package](https://pypi.org/project/kay/). You will also need an API key: you can get one for free at [https://kay.ai](https://kay.ai/). Once you have an API key, you must set it as an environment variable `KAY_API_KEY`.\n", + "First, install the [`kay` package](https://pypi.org/project/kay/). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ae22ad3e-4643-4314-8dea-a5abff0d87b0", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install kay" + ] + }, + { + "cell_type": "markdown", + "id": "efd317f7-9b7d-4e71-875c-5f0b6efeca05", + "metadata": {}, + "source": [ + "You will also need an API key: you can get one for free at [https://kay.ai](https://kay.ai/). 
Once you have an API key, you must set it as an environment variable `KAY_API_KEY`.\n", "\n", "`KayAiRetriever` has a static `.create()` factory method that takes the following arguments:\n", "\n", @@ -35,11 +51,9 @@ "id": "c923bea0-585a-4f62-8662-efc167e8d793", "metadata": {}, "source": [ - "Examples\n", - "=\n", + "## Examples\n", "\n", - "Basic Retriever Usage\n", - "-" + "### Basic Retriever Usage" ] }, { @@ -111,8 +125,7 @@ "id": "21f6e9e5-478c-4b2c-9d61-f7a84f4d2f8f", "metadata": {}, "source": [ - "Usage in a chain\n", - "-" + "### Usage in a chain" ] }, { diff --git a/docs/docs/integrations/retrievers/knn.ipynb b/docs/docs/integrations/retrievers/knn.ipynb index 0324a1823f..9eb641ffe8 100644 --- a/docs/docs/integrations/retrievers/knn.ipynb +++ b/docs/docs/integrations/retrievers/knn.ipynb @@ -7,11 +7,11 @@ "source": [ "# kNN\n", "\n", - ">In statistics, the [k-nearest neighbors algorithm (k-NN)](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression.\n", + ">In statistics, the [k-nearest neighbours algorithm (k-NN)](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) is a non-parametric supervised learning method first developed by `Evelyn Fix` and `Joseph Hodges` in 1951, and later expanded by `Thomas Cover`. It is used for classification and regression.\n", "\n", - "This notebook goes over how to use a retriever that under the hood uses an kNN.\n", + "This notebook goes over how to use a retriever that under the hood uses a kNN.\n", "\n", - "Largely based on https://github.com/karpathy/randomfun/blob/master/knn_vs_svm.html" + "Largely based on the code of [Andrej Karpathy](https://github.com/karpathy/randomfun/blob/master/knn_vs_svm.html)." 
] }, { diff --git a/docs/docs/integrations/retrievers/merger_retriever.ipynb b/docs/docs/integrations/retrievers/merger_retriever.ipynb index cc6dc2cb45..b308683939 100644 --- a/docs/docs/integrations/retrievers/merger_retriever.ipynb +++ b/docs/docs/integrations/retrievers/merger_retriever.ipynb @@ -8,7 +8,7 @@ "source": [ "# LOTR (Merger Retriever)\n", "\n", - "`Lord of the Retrievers`, also known as `MergerRetriever`, takes a list of retrievers as input and merges the results of their get_relevant_documents() methods into a single list. The merged results will be a list of documents that are relevant to the query and that have been ranked by the different retrievers.\n", + ">`Lord of the Retrievers (LOTR)`, also known as `MergerRetriever`, takes a list of retrievers as input and merges the results of their get_relevant_documents() methods into a single list. The merged results will be a list of documents that are relevant to the query and that have been ranked by the different retrievers.\n", "\n", "The `MergerRetriever` class can be used to improve the accuracy of document retrieval in a number of ways. First, it can combine the results of multiple retrievers, which can help to reduce the risk of bias in the results. Second, it can rank the results of the different retrievers, which can help to ensure that the most relevant documents are returned first." 
] diff --git a/docs/docs/integrations/retrievers/qdrant-sparse.ipynb b/docs/docs/integrations/retrievers/qdrant-sparse.ipynb index 17b81b543c..54607f97f4 100644 --- a/docs/docs/integrations/retrievers/qdrant-sparse.ipynb +++ b/docs/docs/integrations/retrievers/qdrant-sparse.ipynb @@ -5,12 +5,12 @@ "id": "ce0f17b9", "metadata": {}, "source": [ - "# Qdrant Sparse Vector Retriever\n", + "# Qdrant Sparse Vector\n", "\n", ">[Qdrant](https://qdrant.tech/) is an open-source, high-performance vector search engine/database.\n", "\n", "\n", - ">`QdrantSparseVectorRetriever` uses [sparse vectors](https://qdrant.tech/articles/sparse-vectors/) introduced in Qdrant [v1.7.0](https://qdrant.tech/articles/qdrant-1.7.x/) for document retrieval.\n" + ">`QdrantSparseVectorRetriever` uses [sparse vectors](https://qdrant.tech/articles/sparse-vectors/) introduced in `Qdrant` [v1.7.0](https://qdrant.tech/articles/qdrant-1.7.x/) for document retrieval.\n" ] }, { diff --git a/docs/docs/integrations/retrievers/ragatouille.ipynb b/docs/docs/integrations/retrievers/ragatouille.ipynb index 350c831c14..868fde5f60 100644 --- a/docs/docs/integrations/retrievers/ragatouille.ipynb +++ b/docs/docs/integrations/retrievers/ragatouille.ipynb @@ -8,9 +8,13 @@ "# RAGatouille\n", "\n", "\n", - "This page covers how to use [RAGatouille](https://github.com/bclavie/RAGatouille) as a retriever in a LangChain chain. RAGatouille makes it as simple as can be to use ColBERT! 
[ColBERT](https://github.com/stanford-futuredata/ColBERT) is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.\n", + ">[RAGatouille](https://github.com/bclavie/RAGatouille) makes it as simple as can be to use `ColBERT`!\n", + ">\n", + ">[ColBERT](https://github.com/stanford-futuredata/ColBERT) is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.\n", "\n", - "We can use this as a [retriever](/docs/modules/data_connection/retrievers). It will show functionality specific to this integration. After going through, it may be useful to explore [relevant use-case pages](/docs/use_cases/question_answering) to learn how to use this vectorstore as part of a larger chain.\n", + "We can use this as a [retriever](/docs/modules/data_connection/retrievers). It will show functionality specific to this integration. After going through, it may be useful to explore [relevant use-case pages](/docs/use_cases/question_answering) to learn how to use this vector store as part of a larger chain.\n", + "\n", + "This page covers how to use [RAGatouille](https://github.com/bclavie/RAGatouille) as a retriever in a LangChain chain. \n", "\n", "## Setup\n", "\n", diff --git a/docs/docs/integrations/retrievers/sec_filings.ipynb b/docs/docs/integrations/retrievers/sec_filings.ipynb index 3cfbcddd20..b23cc05cc0 100644 --- a/docs/docs/integrations/retrievers/sec_filings.ipynb +++ b/docs/docs/integrations/retrievers/sec_filings.ipynb @@ -8,9 +8,9 @@ "# SEC filing\n", "\n", "\n", - ">The SEC filing is a financial statement or other formal document submitted to the U.S. Securities and Exchange Commission (SEC). Public companies, certain insiders, and broker-dealers are required to make regular SEC filings. 
Investors and financial professionals rely on these filings for information about companies they are evaluating for investment purposes.\n", + ">[SEC filing](https://www.sec.gov/edgar) is a financial statement or other formal document submitted to the U.S. Securities and Exchange Commission (SEC). Public companies, certain insiders, and broker-dealers are required to make regular `SEC filings`. Investors and financial professionals rely on these filings for information about companies they are evaluating for investment purposes.\n", ">\n", - ">SEC filings data powered by [Kay.ai](https://kay.ai) and [Cybersyn](https://www.cybersyn.com/) via [Snowflake Marketplace](https://app.snowflake.com/marketplace/providers/GZTSZAS2KCS/Cybersyn%2C%20Inc).\n" + ">`SEC filings` data powered by [Kay.ai](https://kay.ai) and [Cybersyn](https://www.cybersyn.com/) via [Snowflake Marketplace](https://app.snowflake.com/marketplace/providers/GZTSZAS2KCS/Cybersyn%2C%20Inc).\n" ] }, { diff --git a/docs/docs/integrations/retrievers/self_query/astradb.ipynb b/docs/docs/integrations/retrievers/self_query/astradb.ipynb index aa8e81b5e1..a37597cf2e 100644 --- a/docs/docs/integrations/retrievers/self_query/astradb.ipynb +++ b/docs/docs/integrations/retrievers/self_query/astradb.ipynb @@ -4,9 +4,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Astra DB\n", + "# Astra DB (Cassandra)\n", "\n", - "DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available through an easy-to-use JSON API.\n", + ">[DataStax Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on `Cassandra` and made conveniently available through an easy-to-use JSON API.\n", "\n", "In the walkthrough, we'll demo the `SelfQueryRetriever` with an `Astra DB` vector store." 
] @@ -57,6 +57,9 @@ "cell_type": "markdown", "metadata": { "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, "pycharm": { "name": "#%% md\n" } @@ -276,7 +279,10 @@ { "cell_type": "markdown", "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "source": [ "## Cleanup\n", @@ -290,7 +296,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -300,7 +309,7 @@ ], "metadata": { "kernelspec": { - "display_name": ".venv", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -314,9 +323,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.5" + "version": "3.10.12" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/docs/docs/integrations/retrievers/self_query/chroma_self_query.ipynb b/docs/docs/integrations/retrievers/self_query/chroma_self_query.ipynb index e04509af54..08ee33c5c3 100644 --- a/docs/docs/integrations/retrievers/self_query/chroma_self_query.ipynb +++ b/docs/docs/integrations/retrievers/self_query/chroma_self_query.ipynb @@ -7,7 +7,7 @@ "source": [ "# Chroma\n", "\n", - ">[Chroma](https://docs.trychroma.com/getting-started) is a database for building AI applications with embeddings.\n", + ">[Chroma](https://docs.trychroma.com/getting-started) is a vector database for building AI applications with embeddings.\n", "\n", "In the notebook, we'll demo the `SelfQueryRetriever` wrapped around a `Chroma` vector store. 
" ] diff --git a/docs/docs/integrations/retrievers/self_query/index.mdx b/docs/docs/integrations/retrievers/self_query/index.mdx index 71899a6397..dc438601a2 100644 --- a/docs/docs/integrations/retrievers/self_query/index.mdx +++ b/docs/docs/integrations/retrievers/self_query/index.mdx @@ -2,7 +2,7 @@ sidebar-position: 0 --- -# Self-querying retriever +# Self-querying retrievers Learn about how the self-querying retriever works [here](/docs/modules/data_connection/retrievers/self_query). diff --git a/docs/docs/integrations/retrievers/self_query/mongodb_atlas.ipynb b/docs/docs/integrations/retrievers/self_query/mongodb_atlas.ipynb index cfe0aa6a79..d7b13b47f0 100644 --- a/docs/docs/integrations/retrievers/self_query/mongodb_atlas.ipynb +++ b/docs/docs/integrations/retrievers/self_query/mongodb_atlas.ipynb @@ -6,8 +6,8 @@ "source": [ "# MongoDB Atlas\n", "\n", - "[MongoDB Atlas](https://www.mongodb.com/) is a document database that can be \n", - "used as a vector databse.\n", + ">[MongoDB Atlas](https://www.mongodb.com/) is a document database that can be \n", + "used as a vector database.\n", "\n", "In the walkthrough, we'll demo the `SelfQueryRetriever` with a `MongoDB Atlas` vector store." 
] @@ -299,7 +299,7 @@ ], "metadata": { "kernelspec": { - "display_name": ".venv", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -313,9 +313,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.5" + "version": "3.10.12" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/docs/docs/integrations/retrievers/self_query/pgvector_self_query.ipynb b/docs/docs/integrations/retrievers/self_query/pgvector_self_query.ipynb index 8daf192f59..0ea1983673 100644 --- a/docs/docs/integrations/retrievers/self_query/pgvector_self_query.ipynb +++ b/docs/docs/integrations/retrievers/self_query/pgvector_self_query.ipynb @@ -5,9 +5,9 @@ "id": "13afcae7", "metadata": {}, "source": [ - "# PGVector\n", + "# PGVector (Postgres)\n", "\n", - ">[PGVector](https://github.com/pgvector/pgvector) is a vector similarity search for Postgres.\n", + ">[PGVector](https://github.com/pgvector/pgvector) is a vector similarity search package for `Postgres` data base.\n", "\n", "In the notebook, we'll demo the `SelfQueryRetriever` wrapped around a `PGVector` vector store." ] @@ -300,7 +300,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.16" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/retrievers/self_query/supabase_self_query.ipynb b/docs/docs/integrations/retrievers/self_query/supabase_self_query.ipynb index 7477cfec58..d1bed3d9dc 100644 --- a/docs/docs/integrations/retrievers/self_query/supabase_self_query.ipynb +++ b/docs/docs/integrations/retrievers/self_query/supabase_self_query.ipynb @@ -5,7 +5,7 @@ "id": "13afcae7", "metadata": {}, "source": [ - "# Supabase\n", + "# Supabase (Postgres)\n", "\n", ">[Supabase](https://supabase.com/docs) is an open-source `Firebase` alternative. 
\n", "> `Supabase` is built on top of `PostgreSQL`, which offers strong `SQL` \n", diff --git a/docs/docs/integrations/retrievers/self_query/timescalevector_self_query.ipynb b/docs/docs/integrations/retrievers/self_query/timescalevector_self_query.ipynb index 9dc762d025..f74fff3255 100644 --- a/docs/docs/integrations/retrievers/self_query/timescalevector_self_query.ipynb +++ b/docs/docs/integrations/retrievers/self_query/timescalevector_self_query.ipynb @@ -6,9 +6,13 @@ "id": "13afcae7", "metadata": {}, "source": [ - "# Timescale Vector (Postgres) self-querying \n", + "# Timescale Vector (Postgres) \n", "\n", - "[Timescale Vector](https://www.timescale.com/ai) is PostgreSQL++ for AI applications. It enables you to efficiently store and query billions of vector embeddings in `PostgreSQL`.\n", + ">[Timescale Vector](https://www.timescale.com/ai) is `PostgreSQL++` for AI applications. It enables you to efficiently store and query billions of vector embeddings in `PostgreSQL`.\n", + ">\n", + ">[PostgreSQL](https://en.wikipedia.org/wiki/PostgreSQL) also known as `Postgres`,\n", + "> is a free and open-source relational database management system (RDBMS) \n", + "> emphasizing extensibility and `SQL` compliance.\n", "\n", "This notebook shows how to use the Postgres vector database (`TimescaleVector`) to perform self-querying. In the notebook we'll demo the `SelfQueryRetriever` wrapped around a TimescaleVector vector store. 
\n", "\n", @@ -528,6 +532,18 @@ "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/retrievers/self_query/vectara_self_query.ipynb b/docs/docs/integrations/retrievers/self_query/vectara_self_query.ipynb index c95fe311df..807fe75be7 100644 --- a/docs/docs/integrations/retrievers/self_query/vectara_self_query.ipynb +++ b/docs/docs/integrations/retrievers/self_query/vectara_self_query.ipynb @@ -5,19 +5,15 @@ "id": "13afcae7", "metadata": {}, "source": [ - "# Vectara self-querying \n", + "# Vectara \n", "\n", ">[Vectara](https://vectara.com/) is the trusted GenAI platform that provides an easy-to-use API for document indexing and querying. \n", - "\n", - "Vectara provides an end-to-end managed service for Retrieval Augmented Generation or [RAG](https://vectara.com/grounded-generation/), which includes:\n", - "\n", - "1. A way to extract text from document files and chunk them into sentences.\n", - "\n", - "2. The state-of-the-art [Boomerang](https://vectara.com/how-boomerang-takes-retrieval-augmented-generation-to-the-next-level-via-grounded-generation/) embeddings model. Each text chunk is encoded into a vector embedding using Boomerang, and stored in the Vectara internal knowledge (vector+text) store\n", - "\n", - "3. A query service that automatically encodes the query into embedding, and retrieves the most relevant text segments (including support for [Hybrid Search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) and [MMR](https://vectara.com/get-diverse-results-and-comprehensive-summaries-with-vectaras-mmr-reranker/))\n", - "\n", - "4. 
An option to create [generative summary](https://docs.vectara.com/docs/learn/grounded-generation/grounded-generation-overview), based on the retrieved documents, including citations.\n", + ">\n", + ">`Vectara` provides an end-to-end managed service for `Retrieval Augmented Generation` or [RAG](https://vectara.com/grounded-generation/), which includes:\n", + ">1. A way to `extract text` from document files and `chunk` them into sentences.\n", + ">2. The state-of-the-art [Boomerang](https://vectara.com/how-boomerang-takes-retrieval-augmented-generation-to-the-next-level-via-grounded-generation/) embeddings model. Each text chunk is encoded into a vector embedding using `Boomerang`, and stored in the Vectara internal knowledge (vector+text) store\n", + ">3. A query service that automatically encodes the query into embedding, and retrieves the most relevant text segments (including support for [Hybrid Search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) and [MMR](https://vectara.com/get-diverse-results-and-comprehensive-summaries-with-vectaras-mmr-reranker/))\n", + ">4. An option to create [generative summary](https://docs.vectara.com/docs/learn/grounded-generation/grounded-generation-overview), based on the retrieved documents, including citations.\n", "\n", "See the [Vectara API documentation](https://docs.vectara.com/docs/) for more information on how to use the API.\n", "\n", @@ -31,17 +27,17 @@ "source": [ "# Setup\n", "\n", - "You will need a Vectara account to use Vectara with LangChain. To get started, use the following steps (see our [quickstart](https://docs.vectara.com/docs/quickstart) guide):\n", - "1. [Sign up](https://console.vectara.com/signup) for a Vectara account if you don't already have one. Once you have completed your sign up you will have a Vectara customer ID. You can find your customer ID by clicking on your name, on the top-right of the Vectara console window.\n", - "2. 
Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingest from input documents. To create a corpus, use the **\"Create Corpus\"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.\n", + "You will need a `Vectara` account to use `Vectara` with `LangChain`. To get started, use the following steps (see our [quickstart](https://docs.vectara.com/docs/quickstart) guide):\n", + "1. [Sign up](https://console.vectara.com/signup) for a `Vectara` account if you don't already have one. Once you have completed your sign up you will have a Vectara customer ID. You can find your customer ID by clicking on your name, on the top-right of the Vectara console window.\n", + "2. Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingesting from input documents. To create a corpus, use the **\"Create Corpus\"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.\n", "3. Next you'll need to create API keys to access the corpus. Click on the **\"Authorization\"** tab in the corpus view and then the **\"Create API Key\"** button. Give your key a name, and choose whether you want query only or query+index for your key. Click \"Create\" and you now have an active API key. Keep this key confidential. \n", "\n", - "To use LangChain with Vectara, you'll need to have these three values: customer ID, corpus ID and api_key.\n", + "To use LangChain with Vectara, you need three values: customer ID, corpus ID and api_key.\n", "You can provide those to LangChain in two ways:\n", "\n", "1. 
Include in your environment these three variables: `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`.\n", "\n", - "> For example, you can set these variables using os.environ and getpass as follows:\n", + "> For example, you can set these variables using `os.environ` and `getpass` as follows:\n", "\n", "```python\n", "import os\n", @@ -52,7 +48,7 @@ "os.environ[\"VECTARA_API_KEY\"] = getpass.getpass(\"Vectara API Key:\")\n", "```\n", "\n", - "1. Provide them as arguments when creating the Vectara vectorstore object:\n", + "1. Provide them as arguments when creating the `Vectara` vectorstore object:\n", "\n", "```python\n", "vectorstore = Vectara(\n", @@ -398,7 +394,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.9" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/retrievers/tavily.ipynb b/docs/docs/integrations/retrievers/tavily.ipynb index 8358202612..6c5c61fb3d 100644 --- a/docs/docs/integrations/retrievers/tavily.ipynb +++ b/docs/docs/integrations/retrievers/tavily.ipynb @@ -6,7 +6,7 @@ "source": [ "# Tavily Search API\n", "\n", - "[Tavily's Search API](https://tavily.com) is a search engine built specifically for AI agents (LLMs), delivering real-time, accurate, and factual results at speed.\n", + ">[Tavily's Search API](https://tavily.com) is a search engine built specifically for AI agents (LLMs), delivering real-time, accurate, and factual results at speed.\n", "\n", "We can use this as a [retriever](/docs/modules/data_connection/retrievers). It will show functionality specific to this integration. 
After going through, it may be useful to explore [relevant use-case pages](/docs/use_cases/question_answering) to learn how to use this vectorstore as part of a larger chain.\n", "\n", diff --git a/docs/docs/integrations/retrievers/you-retriever.ipynb b/docs/docs/integrations/retrievers/you-retriever.ipynb index d32f167251..d0f41b4fdf 100644 --- a/docs/docs/integrations/retrievers/you-retriever.ipynb +++ b/docs/docs/integrations/retrievers/you-retriever.ipynb @@ -5,9 +5,9 @@ "id": "818fc023", "metadata": {}, "source": [ - "# You.com Retriever\n", + "# You.com\n", "\n", - "The [you.com API](https://api.you.com) is a suite of tools designed to help developers ground the output of LLMs in the most recent, most accurate, most relevant information that may not have been included in their training dataset." + ">[you.com API](https://api.you.com) is a suite of tools designed to help developers ground the output of LLMs in the most recent, most accurate, most relevant information that may not have been included in their training dataset." ] }, {