">[DocArray](https://github.com/docarray/docarray) is a versatile, open-source tool for managing your multi-modal data. It lets you shape your data however you want, and offers the flexibility to store and search it using various document index backends. Even better, you can use your `DocArray` document index to create a `DocArrayRetriever` and build awesome LangChain apps!\n",
"\n",
"\n",
"\n",
"- [Document Index Backends](#document-index-backends)\n",
"- [Movie Retrieval using HnswDocumentIndex](#movie-retrieval-using-hnswdocumentindex)\n",
"  - [Normal Retriever](#normal-retriever)\n",
"  - [Retriever with Filters](#retriever-with-filters)\n",
"  - [Retriever with MMR Search](#retriever-with-mmr-search)\n",
"This notebook is split into two sections. The [first section](#document-index-backends) offers an introduction to all five supported document index backends. It provides guidance on setting up and indexing each backend and also instructs you on how to build a `DocArrayRetriever` for finding relevant documents. \n",
"In the [second section](#movie-retrieval-using-hnswdocumentindex), we'll select one of these backends and illustrate how to use it through a basic example.\n"
]
},
{
@ -31,7 +18,7 @@
"id": "51db6285-58db-481d-8d24-b13d1888056b",
"metadata": {},
"source": [
"## Document Index Backends"
]
},
{
@ -86,9 +73,9 @@
"tags": []
},
"source": [
"### InMemoryExactNNIndex\n",
"\n",
"`InMemoryExactNNIndex` stores all documents in memory. It is a great starting point for small datasets, where you may not want to launch a database server.\n",
"\n",
"Learn more [here](https://docs.docarray.org/user_guide/storing/index_in_memory/)"
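Here is a minimal, dependency-free sketch of what "exact nearest-neighbour search in memory" means. Every stored vector is scored against the query, and the top results by cosine similarity are returned. This is not DocArray's implementation. The class `TinyExactIndex` and its `index`/`find` methods are hypothetical names used only for illustration.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyExactIndex:
    """Hypothetical in-memory index: keeps every (text, vector) pair in a Python list."""

    def __init__(self) -> None:
        self._docs: List[Tuple[str, List[float]]] = []

    def index(self, text: str, vector: List[float]) -> None:
        # Nothing is persisted: the data lives only as long as this object.
        self._docs.append((text, vector))

    def find(self, query: List[float], limit: int = 3) -> List[Tuple[str, float]]:
        # Exact search: score *every* stored vector, then keep the best `limit` ones.
        scored = [(text, cosine(query, vec)) for text, vec in self._docs]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]


idx = TinyExactIndex()
idx.index("cats", [1.0, 0.0])
idx.index("dogs", [0.9, 0.1])
idx.index("cars", [0.0, 1.0])
print(idx.find([1.0, 0.0], limit=2))
```

Because each query touches every stored vector, the cost grows linearly with the number of documents. This is one reason an exact in-memory index suits small datasets.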
]
@ -159,9 +146,9 @@
"id": "a9daf2c4-6568-4a49-ba6e-21687962d2c1",
"metadata": {},
"source": [
"### HnswDocumentIndex\n",
"\n",
"`HnswDocumentIndex` is a lightweight Document Index implementation that runs fully locally and is best suited for small- to medium-sized datasets. It stores vectors on disk in [hnswlib](https://github.com/nmslib/hnswlib), and stores all other data in [SQLite](https://www.sqlite.org/index.html).\n",
"\n",
"Learn more [here](https://docs.docarray.org/user_guide/storing/index_hnswlib/)"
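The storage split described above (vectors in one store, all other fields in SQLite) can be sketched with the standard library. This toy version rests on stated assumptions. A plain dict stands in for hnswlib, and the search is brute force rather than an HNSW graph. An in-memory `sqlite3` database holds the non-vector fields. The class `SplitStorageIndex` is hypothetical and is not DocArray's API.

```python
import sqlite3
from typing import Dict, List, Tuple


class SplitStorageIndex:
    """Hypothetical index: vectors in a dict (stand-in for hnswlib), other fields in SQLite."""

    def __init__(self) -> None:
        self._vectors: Dict[int, List[float]] = {}
        # ":memory:" keeps the demo self-contained; a file path would persist to disk.
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)")

    def index(self, doc_id: int, title: str, vector: List[float]) -> None:
        self._vectors[doc_id] = vector
        self._db.execute("INSERT INTO docs (id, title) VALUES (?, ?)", (doc_id, title))
        self._db.commit()

    def find(self, query: List[float], limit: int = 1) -> List[Tuple[int, str]]:
        # 1) Vector side: rank ids by dot product (brute force, NOT an HNSW graph).
        ranked = sorted(
            self._vectors,
            key=lambda i: sum(q * v for q, v in zip(query, self._vectors[i])),
            reverse=True,
        )[:limit]
        # 2) Metadata side: fetch the remaining fields for the winning ids from SQLite.
        return [
            self._db.execute("SELECT id, title FROM docs WHERE id = ?", (doc_id,)).fetchone()
            for doc_id in ranked
        ]


movies = SplitStorageIndex()
movies.index(1, "The Matrix", [1.0, 0.0])
movies.index(2, "Toy Story", [0.0, 1.0])
print(movies.find([0.9, 0.1]))
```

Keeping vectors and metadata apart lets each store do what it is good at. The vector side only has to rank ids, and SQLite returns the remaining fields for the few winners.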
]
@ -233,9 +220,9 @@
"id": "7177442e-3fd3-4f3d-ab22-cd8265b35112",
"metadata": {},
"source": [
"### WeaviateDocumentIndex\n",
"\n",
"`WeaviateDocumentIndex` is a document index that is built upon the [Weaviate](https://weaviate.io/) vector database.\n",
"\n",
"Learn more [here](https://docs.docarray.org/user_guide/storing/index_weaviate/)"
]
@ -331,11 +318,11 @@
"id": "6ee8f920-9297-4b0a-a353-053a86947d10",
"metadata": {},
"source": [
"### ElasticDocIndex\n",
"\n",
"`ElasticDocIndex` is a document index that is built upon [Elasticsearch](https://github.com/elastic/elasticsearch).\n",
"\n",
"Learn more [here](https://docs.docarray.org/user_guide/storing/index_elastic/)"
]
},
{
@ -407,11 +394,11 @@
"id": "281432f8-87a5-4f22-a582-9d5dac33d158",
"metadata": {},
"source": [
"### QdrantDocumentIndex\n",
"\n",
"`QdrantDocumentIndex` is a document index that is built upon the [Qdrant](https://qdrant.tech/) vector database.\n",
"\n",
"Learn more [here](https://docs.docarray.org/user_guide/storing/index_qdrant/)"
"## Retrieve the Google Docs\n",
"\n",
"By default, the `GoogleDriveRetriever` expects the `credentials.json` file to be located at `~/.credentials/credentials.json`, but this is configurable using the `GOOGLE_ACCOUNT_FILE` environment variable. \n",
"By default, `token.json` is stored in the same directory (use the `token_path` parameter to change this). Note that `token.json` will be created automatically the first time you use the retriever.\n",
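The lookup rule described above can be sketched in plain Python. This only illustrates the described behaviour and is not `GoogleDriveRetriever`'s source. The helper `resolve_credential_paths` is hypothetical. The sketch also assumes that `GOOGLE_ACCOUNT_FILE` holds the full path to the credentials file.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def resolve_credential_paths(
    env: Mapping[str, str] = os.environ,
    token_path: Optional[str] = None,
) -> Tuple[Path, Path]:
    """Hypothetical helper: return (credentials_path, token_path) per the rule above."""
    # Default location, unless GOOGLE_ACCOUNT_FILE overrides it (assumed to be a file path).
    default = Path.home() / ".credentials" / "credentials.json"
    credentials = Path(env.get("GOOGLE_ACCOUNT_FILE", str(default))).expanduser()
    # token.json sits next to credentials.json unless an explicit token_path is given.
    token = Path(token_path).expanduser() if token_path else credentials.parent / "token.json"
    return credentials, token


creds, token = resolve_credential_paths()
print(creds, token)
```

Passing `env` explicitly keeps the helper easy to test without touching the real environment.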
"\n",
"`GoogleDriveRetriever` can retrieve a selection of files with some requests. \n",
"\n",
@ -36,49 +38,6 @@
"The special value `root` is for your personal home."
">An SEC filing is a financial statement or other formal document submitted to the U.S. Securities and Exchange Commission (SEC). Public companies, certain insiders, and broker-dealers are required to make regular SEC filings. Investors and financial professionals rely on these filings for information about companies they are evaluating for investment purposes.\n",
">\n",
">SEC filings data powered by [Kay.ai](https://kay.ai) and [Cybersyn](https://www.cybersyn.com/) via [Snowflake Marketplace](https://app.snowflake.com/marketplace/providers/GZTSZAS2KCS/Cybersyn%2C%20Inc).\n"
]
},
{
@ -18,22 +18,12 @@
"id": "fc507b8e-ea51-417c-93da-42bf998a1195",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"First, you will need to install the `kay` package. You will also need an API key: you can get one for free at [https://kay.ai](https://kay.ai/). Once you have an API key, you must set it as the environment variable `KAY_API_KEY`.\n",
"\n",
"In this example, we're going to use the `KayAiRetriever`. Take a look at the [kay notebook](/docs/integrations/retrievers/kay) for more detailed information about the parameters it accepts."
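One common way to provide the key in a notebook is to prompt for it only when it is not already set. Below is a minimal standard-library sketch. The helper name `ensure_kay_api_key` is hypothetical, and the `kay` package itself is not imported here.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping


def ensure_kay_api_key(
    env: MutableMapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Hypothetical helper: make sure KAY_API_KEY is set, prompting only if it is missing."""
    key = env.get("KAY_API_KEY")
    if not key:
        key = prompt("Enter your Kay API key: ").strip()
        if not key:
            raise ValueError("KAY_API_KEY is required; get a free key at https://kay.ai")
        # Store it under the variable name the setup instructions call for.
        env["KAY_API_KEY"] = key
    return key
```

In a notebook you would call `ensure_kay_api_key()` once before creating the retriever. With the default `env`, the key is written to `os.environ` for the rest of the session.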
]
},
{
"cell_type": "markdown",
"id": "c923bea0-585a-4f62-8662-efc167e8d793",
"metadata": {},
"source": [
"## Examples\n",
"\n"