mirror of https://github.com/hwchase17/langchain
You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
1289 lines
46 KiB
Plaintext
1289 lines
46 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Redis\n",
|
|
"\n",
|
|
">[Redis vector database](https://redis.io/docs/get-started/vector-database/) introduction and langchain integration guide.\n",
|
|
"\n",
|
|
"## What is Redis?\n",
|
|
"\n",
|
|
"Most developers from a web services background are familiar with `Redis`. At its core, `Redis` is an open-source key-value store that is used as a cache, message broker, and database. Developers choose `Redis` because it is fast, has a large ecosystem of client libraries, and has been deployed by major enterprises for years.\n",
|
|
"\n",
|
|
"On top of these traditional use cases, `Redis` provides additional capabilities like the Search and Query capability that allows users to create secondary index structures within `Redis`. This allows `Redis` to be a Vector Database, at the speed of a cache. \n",
|
|
"\n",
|
|
"\n",
|
|
"## Redis as a Vector Database\n",
|
|
"\n",
|
|
"`Redis` uses compressed, inverted indexes for fast indexing with a low memory footprint. It also supports a number of advanced features such as:\n",
|
|
"\n",
|
|
"* Indexing of multiple fields in Redis hashes and `JSON`\n",
|
|
"* Vector similarity search (with `HNSW` (ANN) or `FLAT` (KNN))\n",
|
|
"* Vector Range Search (e.g. find all vectors within a radius of a query vector)\n",
|
|
"* Incremental indexing without performance loss\n",
|
|
"* Document ranking (using [tf-idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf), with optional user-provided weights)\n",
|
|
"* Field weighting\n",
|
|
"* Complex boolean queries with `AND`, `OR`, and `NOT` operators\n",
|
|
"* Prefix matching, fuzzy matching, and exact-phrase queries\n",
|
|
"* Support for [double-metaphone phonetic matching](https://redis.io/docs/stack/search/reference/phonetic_matching/)\n",
|
|
"* Auto-complete suggestions (with fuzzy prefix suggestions)\n",
|
|
"* Stemming-based query expansion in [many languages](https://redis.io/docs/stack/search/reference/stemming/) (using [Snowball](http://snowballstem.org/))\n",
|
|
"* Support for Chinese-language tokenization and querying (using [Friso](https://github.com/lionsoul2014/friso))\n",
|
|
"* Numeric filters and ranges\n",
|
|
"* Geospatial searches using Redis geospatial indexing\n",
|
|
"* A powerful aggregations engine\n",
|
|
"* Supports for all `utf-8` encoded text\n",
|
|
"* Retrieve full documents, selected fields, or only the document IDs\n",
|
|
"* Sorting results (for example, by creation date)\n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
|
|
"## Clients\n",
|
|
"\n",
|
|
"Since `Redis` is much more than just a vector database, there are often use cases that demand the usage of a `Redis` client besides just the `LangChain` integration. You can use any standard `Redis` client library to run Search and Query commands, but it's easiest to use a library that wraps the Search and Query API. Below are a few examples, but you can find more client libraries [here](https://redis.io/resources/clients/).\n",
|
|
"\n",
|
|
"| Project | Language | License | Author | Stars |\n",
|
|
"|----------|---------|--------|---------|-------|\n",
|
|
"| [jedis][jedis-url] | Java | MIT | [Redis][redis-url] | ![Stars][jedis-stars] |\n",
|
|
"| [redisvl][redisvl-url] | Python | MIT | [Redis][redis-url] | ![Stars][redisvl-stars] |\n",
|
|
"| [redis-py][redis-py-url] | Python | MIT | [Redis][redis-url] | ![Stars][redis-py-stars] |\n",
|
|
"| [node-redis][node-redis-url] | Node.js | MIT | [Redis][redis-url] | ![Stars][node-redis-stars] |\n",
|
|
"| [nredisstack][nredisstack-url] | .NET | MIT | [Redis][redis-url] | ![Stars][nredisstack-stars] |\n",
|
|
"\n",
|
|
"[redis-url]: https://redis.com\n",
|
|
"\n",
|
|
"[redisvl-url]: https://github.com/RedisVentures/redisvl\n",
|
|
"[redisvl-stars]: https://img.shields.io/github/stars/RedisVentures/redisvl.svg?style=social&label=Star&maxAge=2592000\n",
|
|
"[redisvl-package]: https://pypi.python.org/pypi/redisvl\n",
|
|
"\n",
|
|
"[redis-py-url]: https://github.com/redis/redis-py\n",
|
|
"[redis-py-stars]: https://img.shields.io/github/stars/redis/redis-py.svg?style=social&label=Star&maxAge=2592000\n",
|
|
"[redis-py-package]: https://pypi.python.org/pypi/redis\n",
|
|
"\n",
|
|
"[jedis-url]: https://github.com/redis/jedis\n",
|
|
"[jedis-stars]: https://img.shields.io/github/stars/redis/jedis.svg?style=social&label=Star&maxAge=2592000\n",
|
|
"[Jedis-package]: https://search.maven.org/artifact/redis.clients/jedis\n",
|
|
"\n",
|
|
"[nredisstack-url]: https://github.com/redis/nredisstack\n",
|
|
"[nredisstack-stars]: https://img.shields.io/github/stars/redis/nredisstack.svg?style=social&label=Star&maxAge=2592000\n",
|
|
"[nredisstack-package]: https://www.nuget.org/packages/nredisstack/\n",
|
|
"\n",
|
|
"[node-redis-url]: https://github.com/redis/node-redis\n",
|
|
"[node-redis-stars]: https://img.shields.io/github/stars/redis/node-redis.svg?style=social&label=Star&maxAge=2592000\n",
|
|
"[node-redis-package]: https://www.npmjs.com/package/redis\n",
|
|
"\n",
|
|
"[redis-om-python-url]: https://github.com/redis/redis-om-python\n",
|
|
"[redis-om-python-author]: https://redis.com\n",
|
|
"[redis-om-python-stars]: https://img.shields.io/github/stars/redis/redis-om-python.svg?style=social&label=Star&maxAge=2592000\n",
|
|
"\n",
|
|
"[redisearch-go-url]: https://github.com/RediSearch/redisearch-go\n",
|
|
"[redisearch-go-author]: https://redis.com\n",
|
|
"[redisearch-go-stars]: https://img.shields.io/github/stars/RediSearch/redisearch-go.svg?style=social&label=Star&maxAge=2592000\n",
|
|
"\n",
|
|
"[redisearch-api-rs-url]: https://github.com/RediSearch/redisearch-api-rs\n",
|
|
"[redisearch-api-rs-author]: https://redis.com\n",
|
|
"[redisearch-api-rs-stars]: https://img.shields.io/github/stars/RediSearch/redisearch-api-rs.svg?style=social&label=Star&maxAge=2592000\n",
|
|
"\n",
|
|
"\n",
|
|
"## Deployment options\n",
|
|
"\n",
|
|
"There are many ways to deploy Redis with RediSearch. The easiest way to get started is to use Docker, but there are are many potential options for deployment such as\n",
|
|
"\n",
|
|
"- [Redis Cloud](https://redis.com/redis-enterprise-cloud/overview/)\n",
|
|
"- [Docker (Redis Stack)](https://hub.docker.com/r/redis/redis-stack)\n",
|
|
"- Cloud marketplaces: [AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-e6y7ork67pjwg?sr=0-2&ref_=beagle&applicationId=AWSMPContessa), [Google Marketplace](https://console.cloud.google.com/marketplace/details/redislabs-public/redis-enterprise?pli=1), or [Azure Marketplace](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/garantiadata.redis_enterprise_1sp_public_preview?tab=Overview)\n",
|
|
"- On-premise: [Redis Enterprise Software](https://redis.com/redis-enterprise-software/overview/)\n",
|
|
"- Kubernetes: [Redis Enterprise Software on Kubernetes](https://docs.redis.com/latest/kubernetes/)\n",
|
|
"\n",
|
|
"\n",
|
|
"## Additional examples\n",
|
|
"\n",
|
|
"Many examples can be found in the [Redis AI team's GitHub](https://github.com/RedisVentures/)\n",
|
|
"\n",
|
|
"- [Awesome Redis AI Resources](https://github.com/RedisVentures/redis-ai-resources) - List of examples of using Redis in AI workloads\n",
|
|
"- [Azure OpenAI Embeddings Q&A](https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna) - OpenAI and Redis as a Q&A service on Azure.\n",
|
|
"- [ArXiv Paper Search](https://github.com/RedisVentures/redis-arXiv-search) - Semantic search over arXiv scholarly papers\n",
|
|
"- [Vector Search on Azure](https://learn.microsoft.com/azure/azure-cache-for-redis/cache-tutorial-vector-similarity) - Vector search on Azure using Azure Cache for Redis and Azure OpenAI\n",
|
|
"\n",
|
|
"\n",
|
|
"## More resources\n",
|
|
"\n",
|
|
"For more information on how to use Redis as a vector database, check out the following resources:\n",
|
|
"\n",
|
|
"- [RedisVL Documentation](https://redisvl.com) - Documentation for the Redis Vector Library Client\n",
|
|
"- [Redis Vector Similarity Docs](https://redis.io/docs/stack/search/reference/vectors/) - Redis official docs for Vector Search.\n",
|
|
"- [Redis-py Search Docs](https://redis.readthedocs.io/en/latest/redismodules.html#redisearch-commands) - Documentation for redis-py client library\n",
|
|
"- [Vector Similarity Search: From Basics to Production](https://mlops.community/vector-similarity-search-from-basics-to-production/) - Introductory blog post to VSS and Redis as a VectorDB."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Setting up\n",
|
|
"\n",
|
|
"\n",
|
|
"### Install Redis Python client\n",
|
|
"\n",
|
|
"`Redis-py` is the officially supported client by Redis. Recently released is the `RedisVL` client which is purpose-built for the Vector Database use cases. Both can be installed with pip."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%pip install --upgrade --quiet redis redisvl langchain-openai tiktoken"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import getpass\n",
|
|
"import os\n",
|
|
"\n",
|
|
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_openai import OpenAIEmbeddings\n",
|
|
"\n",
|
|
"embeddings = OpenAIEmbeddings()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Deploy Redis locally\n",
|
|
"\n",
|
|
"To locally deploy Redis, run:\n",
|
|
"\n",
|
|
"```console\n",
|
|
"docker run -d -p 6379:6379 -p 8001:8001 redis/redis-stack:latest\n",
|
|
"```\n",
|
|
"If things are running correctly you should see a nice Redis UI at `http://localhost:8001`. See the [Deployment options](#deployment-options) section above for other ways to deploy.\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Redis connection Url schemas\n",
|
|
"\n",
|
|
"Valid Redis Url schemas are:\n",
|
|
"1. `redis://` - Connection to Redis standalone, unencrypted\n",
|
|
"2. `rediss://` - Connection to Redis standalone, with TLS encryption\n",
|
|
"3. `redis+sentinel://` - Connection to Redis server via Redis Sentinel, unencrypted\n",
|
|
"4. `rediss+sentinel://` - Connection to Redis server via Redis Sentinel, booth connections with TLS encryption\n",
|
|
"\n",
|
|
"More information about additional connection parameters can be found in the [redis-py documentation](https://redis-py.readthedocs.io/en/stable/connections.html)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# connection to redis standalone at localhost, db 0, no password\n",
|
|
"redis_url = \"redis://localhost:6379\"\n",
|
|
"# connection to host \"redis\" port 7379 with db 2 and password \"secret\" (old style authentication scheme without username / pre 6.x)\n",
|
|
"redis_url = \"redis://:secret@redis:7379/2\"\n",
|
|
"# connection to host redis on default port with user \"joe\", pass \"secret\" using redis version 6+ ACLs\n",
|
|
"redis_url = \"redis://joe:secret@redis/0\"\n",
|
|
"\n",
|
|
"# connection to sentinel at localhost with default group mymaster and db 0, no password\n",
|
|
"redis_url = \"redis+sentinel://localhost:26379\"\n",
|
|
"# connection to sentinel at host redis with default port 26379 and user \"joe\" with password \"secret\" with default group mymaster and db 0\n",
|
|
"redis_url = \"redis+sentinel://joe:secret@redis\"\n",
|
|
"# connection to sentinel, no auth with sentinel monitoring group \"zone-1\" and database 2\n",
|
|
"redis_url = \"redis+sentinel://redis:26379/zone-1/2\"\n",
|
|
"\n",
|
|
"# connection to redis standalone at localhost, db 0, no password but with TLS support\n",
|
|
"redis_url = \"rediss://localhost:6379\"\n",
|
|
"# connection to redis sentinel at localhost and default port, db 0, no password\n",
|
|
"# but with TLS support for booth Sentinel and Redis server\n",
|
|
"redis_url = \"rediss+sentinel://localhost\""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Sample data\n",
|
|
"\n",
|
|
"First we will describe some sample data so that the various attributes of the Redis vector store can be demonstrated."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"metadata = [\n",
|
|
" {\n",
|
|
" \"user\": \"john\",\n",
|
|
" \"age\": 18,\n",
|
|
" \"job\": \"engineer\",\n",
|
|
" \"credit_score\": \"high\",\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"user\": \"derrick\",\n",
|
|
" \"age\": 45,\n",
|
|
" \"job\": \"doctor\",\n",
|
|
" \"credit_score\": \"low\",\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"user\": \"nancy\",\n",
|
|
" \"age\": 94,\n",
|
|
" \"job\": \"doctor\",\n",
|
|
" \"credit_score\": \"high\",\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"user\": \"tyler\",\n",
|
|
" \"age\": 100,\n",
|
|
" \"job\": \"engineer\",\n",
|
|
" \"credit_score\": \"high\",\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"user\": \"joe\",\n",
|
|
" \"age\": 35,\n",
|
|
" \"job\": \"dentist\",\n",
|
|
" \"credit_score\": \"medium\",\n",
|
|
" },\n",
|
|
"]\n",
|
|
"texts = [\"foo\", \"foo\", \"foo\", \"bar\", \"bar\"]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create Redis vector store\n",
|
|
"\n",
|
|
"The Redis VectorStore instance can be initialized in a number of ways. There are multiple class methods that can be used to initialize a Redis VectorStore instance.\n",
|
|
"\n",
|
|
"- ``Redis.__init__`` - Initialize directly\n",
|
|
"- ``Redis.from_documents`` - Initialize from a list of ``Langchain.docstore.Document`` objects\n",
|
|
"- ``Redis.from_texts`` - Initialize from a list of texts (optionally with metadata)\n",
|
|
"- ``Redis.from_texts_return_keys`` - Initialize from a list of texts (optionally with metadata) and return the keys\n",
|
|
"- ``Redis.from_existing_index`` - Initialize from an existing Redis index\n",
|
|
"\n",
|
|
"Below we will use the ``Redis.from_texts`` method."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_community.vectorstores.redis import Redis\n",
|
|
"\n",
|
|
"rds = Redis.from_texts(\n",
|
|
" texts,\n",
|
|
" embeddings,\n",
|
|
" metadatas=metadata,\n",
|
|
" redis_url=\"redis://localhost:6379\",\n",
|
|
" index_name=\"users\",\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"'users'"
|
|
]
|
|
},
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"rds.index_name"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Inspecting the created Index\n",
|
|
"\n",
|
|
"Once the ``Redis`` VectorStore object has been constructed, an index will have been created in Redis if it did not already exist. The index can be inspected with both the ``rvl``and the ``redis-cli`` command line tool. If you installed ``redisvl`` above, you can use the ``rvl`` command line tool to inspect the index."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"\u001b[32m16:58:26\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n",
|
|
"\u001b[32m16:58:26\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m 1. users\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# assumes you're running Redis locally (use --host, --port, --password, --username, to change this)\n",
|
|
"!rvl index listall"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"The ``Redis`` VectorStore implementation will attempt to generate index schema (fields for filtering) for any metadata passed through the ``from_texts``, ``from_texts_return_keys``, and ``from_documents`` methods. This way, whatever metadata is passed will be indexed into the Redis search index allowing\n",
|
|
"for filtering on those fields.\n",
|
|
"\n",
|
|
"Below we show what fields were created from the metadata we defined above"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"\n",
|
|
"\n",
|
|
"Index Information:\n",
|
|
"╭──────────────┬────────────────┬───────────────┬─────────────────┬────────────╮\n",
|
|
"│ Index Name │ Storage Type │ Prefixes │ Index Options │ Indexing │\n",
|
|
"├──────────────┼────────────────┼───────────────┼─────────────────┼────────────┤\n",
|
|
"│ users │ HASH │ ['doc:users'] │ [] │ 0 │\n",
|
|
"╰──────────────┴────────────────┴───────────────┴─────────────────┴────────────╯\n",
|
|
"Index Fields:\n",
|
|
"╭────────────────┬────────────────┬─────────┬────────────────┬────────────────╮\n",
|
|
"│ Name │ Attribute │ Type │ Field Option │ Option Value │\n",
|
|
"├────────────────┼────────────────┼─────────┼────────────────┼────────────────┤\n",
|
|
"│ user │ user │ TEXT │ WEIGHT │ 1 │\n",
|
|
"│ job │ job │ TEXT │ WEIGHT │ 1 │\n",
|
|
"│ credit_score │ credit_score │ TEXT │ WEIGHT │ 1 │\n",
|
|
"│ content │ content │ TEXT │ WEIGHT │ 1 │\n",
|
|
"│ age │ age │ NUMERIC │ │ │\n",
|
|
"│ content_vector │ content_vector │ VECTOR │ │ │\n",
|
|
"╰────────────────┴────────────────┴─────────┴────────────────┴────────────────╯\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"!rvl index info -i users"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 9,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"\n",
|
|
"Statistics:\n",
|
|
"╭─────────────────────────────┬─────────────╮\n",
|
|
"│ Stat Key │ Value │\n",
|
|
"├─────────────────────────────┼─────────────┤\n",
|
|
"│ num_docs │ 5 │\n",
|
|
"│ num_terms │ 15 │\n",
|
|
"│ max_doc_id │ 5 │\n",
|
|
"│ num_records │ 33 │\n",
|
|
"│ percent_indexed │ 1 │\n",
|
|
"│ hash_indexing_failures │ 0 │\n",
|
|
"│ number_of_uses │ 4 │\n",
|
|
"│ bytes_per_record_avg │ 4.60606 │\n",
|
|
"│ doc_table_size_mb │ 0.000524521 │\n",
|
|
"│ inverted_sz_mb │ 0.000144958 │\n",
|
|
"│ key_table_size_mb │ 0.000193596 │\n",
|
|
"│ offset_bits_per_record_avg │ 8 │\n",
|
|
"│ offset_vectors_sz_mb │ 2.19345e-05 │\n",
|
|
"│ offsets_per_term_avg │ 0.69697 │\n",
|
|
"│ records_per_doc_avg │ 6.6 │\n",
|
|
"│ sortable_values_size_mb │ 0 │\n",
|
|
"│ total_indexing_time │ 0.32 │\n",
|
|
"│ total_inverted_index_blocks │ 16 │\n",
|
|
"│ vector_index_sz_mb │ 6.0126 │\n",
|
|
"╰─────────────────────────────┴─────────────╯\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"!rvl stats -i users"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"It's important to note that we have not specified that the ``user``, ``job``, ``credit_score`` and ``age`` in the metadata should be fields within the index, this is because the ``Redis`` VectorStore object automatically generate the index schema from the passed metadata. For more information on the generation of index fields, see the API documentation."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Querying\n",
|
|
"\n",
|
|
"There are multiple ways to query the ``Redis`` VectorStore implementation based on what use case you have:\n",
|
|
"\n",
|
|
"- ``similarity_search``: Find the most similar vectors to a given vector.\n",
|
|
"- ``similarity_search_with_score``: Find the most similar vectors to a given vector and return the vector distance\n",
|
|
"- ``similarity_search_limit_score``: Find the most similar vectors to a given vector and limit the number of results to the ``score_threshold``\n",
|
|
"- ``similarity_search_with_relevance_scores``: Find the most similar vectors to a given vector and return the vector similarities\n",
|
|
"- ``max_marginal_relevance_search``: Find the most similar vectors to a given vector while also optimizing for diversity"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"foo\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"results = rds.similarity_search(\"foo\")\n",
|
|
"print(results[0].page_content)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 11,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Key of the document in Redis: doc:users:a70ca43b3a4e4168bae57c78753a200f\n",
|
|
"Metadata of the document: {'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# return metadata\n",
|
|
"results = rds.similarity_search(\"foo\", k=3)\n",
|
|
"meta = results[1].metadata\n",
|
|
"print(\"Key of the document in Redis: \", meta.pop(\"id\"))\n",
|
|
"print(\"Metadata of the document: \", meta)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Content: foo --- Score: 0.0\n",
|
|
"Content: foo --- Score: 0.0\n",
|
|
"Content: foo --- Score: 0.0\n",
|
|
"Content: bar --- Score: 0.1566\n",
|
|
"Content: bar --- Score: 0.1566\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# with scores (distances)\n",
|
|
"results = rds.similarity_search_with_score(\"foo\", k=5)\n",
|
|
"for result in results:\n",
|
|
" print(f\"Content: {result[0].page_content} --- Score: {result[1]}\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 13,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Content: foo --- Score: 0.0\n",
|
|
"Content: foo --- Score: 0.0\n",
|
|
"Content: foo --- Score: 0.0\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# limit the vector distance that can be returned\n",
|
|
"results = rds.similarity_search_with_score(\"foo\", k=5, distance_threshold=0.1)\n",
|
|
"for result in results:\n",
|
|
" print(f\"Content: {result[0].page_content} --- Score: {result[1]}\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 14,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Content: foo --- Similiarity: 1.0\n",
|
|
"Content: foo --- Similiarity: 1.0\n",
|
|
"Content: foo --- Similiarity: 1.0\n",
|
|
"Content: bar --- Similiarity: 0.8434\n",
|
|
"Content: bar --- Similiarity: 0.8434\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# with scores\n",
|
|
"results = rds.similarity_search_with_relevance_scores(\"foo\", k=5)\n",
|
|
"for result in results:\n",
|
|
" print(f\"Content: {result[0].page_content} --- Similiarity: {result[1]}\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 15,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Content: foo --- Similarity: 1.0\n",
|
|
"Content: foo --- Similarity: 1.0\n",
|
|
"Content: foo --- Similarity: 1.0\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# limit scores (similarities have to be over .9)\n",
|
|
"results = rds.similarity_search_with_relevance_scores(\"foo\", k=5, score_threshold=0.9)\n",
|
|
"for result in results:\n",
|
|
" print(f\"Content: {result[0].page_content} --- Similarity: {result[1]}\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 16,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"['doc:users:b9c71d62a0a34241a37950b448dafd38']"
|
|
]
|
|
},
|
|
"execution_count": 16,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# you can also add new documents as follows\n",
|
|
"new_document = [\"baz\"]\n",
|
|
"new_metadata = [{\"user\": \"sam\", \"age\": 50, \"job\": \"janitor\", \"credit_score\": \"high\"}]\n",
|
|
"# both the document and metadata must be lists\n",
|
|
"rds.add_texts(new_document, new_metadata)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 17,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"{'id': 'doc:users:b9c71d62a0a34241a37950b448dafd38', 'user': 'sam', 'job': 'janitor', 'credit_score': 'high', 'age': '50'}\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# now query the new document\n",
|
|
"results = rds.similarity_search(\"baz\", k=3)\n",
|
|
"print(results[0].metadata)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# use maximal marginal relevance search to diversify results\n",
|
|
"results = rds.max_marginal_relevance_search(\"foo\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 11,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# the lambda_mult parameter controls the diversity of the results, the lower the more diverse\n",
|
|
"results = rds.max_marginal_relevance_search(\"foo\", lambda_mult=0.1)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Connect to an existing Index\n",
|
|
"\n",
|
|
"In order to have the same metadata indexed when using the ``Redis`` VectorStore. You will need to have the same ``index_schema`` passed in either as a path to a yaml file or as a dictionary. The following shows how to obtain the schema from an index and connect to an existing index."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 18,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# write the schema to a yaml file\n",
|
|
"rds.write_schema(\"redis_schema.yaml\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"The schema file for this example should look something like:\n",
|
|
"\n",
|
|
"```yaml\n",
|
|
"numeric:\n",
|
|
"- name: age\n",
|
|
" no_index: false\n",
|
|
" sortable: false\n",
|
|
"text:\n",
|
|
"- name: user\n",
|
|
" no_index: false\n",
|
|
" no_stem: false\n",
|
|
" sortable: false\n",
|
|
" weight: 1\n",
|
|
" withsuffixtrie: false\n",
|
|
"- name: job\n",
|
|
" no_index: false\n",
|
|
" no_stem: false\n",
|
|
" sortable: false\n",
|
|
" weight: 1\n",
|
|
" withsuffixtrie: false\n",
|
|
"- name: credit_score\n",
|
|
" no_index: false\n",
|
|
" no_stem: false\n",
|
|
" sortable: false\n",
|
|
" weight: 1\n",
|
|
" withsuffixtrie: false\n",
|
|
"- name: content\n",
|
|
" no_index: false\n",
|
|
" no_stem: false\n",
|
|
" sortable: false\n",
|
|
" weight: 1\n",
|
|
" withsuffixtrie: false\n",
|
|
"vector:\n",
|
|
"- algorithm: FLAT\n",
|
|
" block_size: 1000\n",
|
|
" datatype: FLOAT32\n",
|
|
" dims: 1536\n",
|
|
" distance_metric: COSINE\n",
|
|
" initial_cap: 20000\n",
|
|
" name: content_vector\n",
|
|
"```\n",
|
|
"\n",
|
|
"**Notice**, this include **all** possible fields for the schema. You can remove any fields that you don't need."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 19,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"{'id': 'doc:users:8484c48a032d4c4cbe3cc2ed6845fabb', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# now we can connect to our existing index as follows\n",
|
|
"\n",
|
|
"new_rds = Redis.from_existing_index(\n",
|
|
" embeddings,\n",
|
|
" index_name=\"users\",\n",
|
|
" redis_url=\"redis://localhost:6379\",\n",
|
|
" schema=\"redis_schema.yaml\",\n",
|
|
")\n",
|
|
"results = new_rds.similarity_search(\"foo\", k=3)\n",
|
|
"print(results[0].metadata)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 20,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"True"
|
|
]
|
|
},
|
|
"execution_count": 20,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# see the schemas are the same\n",
|
|
"new_rds.schema == rds.schema"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Custom metadata indexing\n",
|
|
"\n",
|
|
"In some cases, you may want to control what fields the metadata maps to. For example, you may want the ``credit_score`` field to be a categorical field instead of a text field (which is the default behavior for all string fields). In this case, you can use the ``index_schema`` parameter in each of the initialization methods above to specify the schema for the index. Custom index schema can either be passed as a dictionary or as a path to a YAML file.\n",
|
|
"\n",
|
|
"All arguments in the schema have defaults besides the name, so you can specify only the fields you want to change. All the names correspond to the snake/lowercase versions of the arguments you would use on the command line with ``redis-cli`` or in ``redis-py``. For more on the arguments for each field, see the [documentation](https://redis.io/docs/interact/search-and-query/basic-constructs/field-and-type-options/)\n",
|
|
"\n",
|
|
"The below example shows how to specify the schema for the ``credit_score`` field as a Tag (categorical) field instead of a text field. \n",
|
|
"\n",
|
|
"```yaml\n",
|
|
"# index_schema.yml\n",
|
|
"tag:\n",
|
|
" - name: credit_score\n",
|
|
"text:\n",
|
|
" - name: user\n",
|
|
" - name: job\n",
|
|
"numeric:\n",
|
|
" - name: age\n",
|
|
"```\n",
|
|
"\n",
|
|
"In Python, this would look like:\n",
|
|
"\n",
|
|
"```python\n",
|
|
"\n",
|
|
"index_schema = {\n",
|
|
" \"tag\": [{\"name\": \"credit_score\"}],\n",
|
|
" \"text\": [{\"name\": \"user\"}, {\"name\": \"job\"}],\n",
|
|
" \"numeric\": [{\"name\": \"age\"}],\n",
|
|
"}\n",
|
|
"\n",
|
|
"```\n",
|
|
"\n",
|
|
"Notice that only the ``name`` field needs to be specified. All other fields have defaults."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 21,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"`index_schema` does not match generated metadata schema.\n",
|
|
"If you meant to manually override the schema, please ignore this message.\n",
|
|
"index_schema: {'tag': [{'name': 'credit_score'}], 'text': [{'name': 'user'}, {'name': 'job'}], 'numeric': [{'name': 'age'}]}\n",
|
|
"generated_schema: {'text': [{'name': 'user'}, {'name': 'job'}, {'name': 'credit_score'}], 'numeric': [{'name': 'age'}], 'tag': []}\n",
|
|
"\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# create a new index with the new schema defined above\n",
|
|
"index_schema = {\n",
|
|
" \"tag\": [{\"name\": \"credit_score\"}],\n",
|
|
" \"text\": [{\"name\": \"user\"}, {\"name\": \"job\"}],\n",
|
|
" \"numeric\": [{\"name\": \"age\"}],\n",
|
|
"}\n",
|
|
"\n",
|
|
"rds, keys = Redis.from_texts_return_keys(\n",
|
|
" texts,\n",
|
|
" embeddings,\n",
|
|
" metadatas=metadata,\n",
|
|
" redis_url=\"redis://localhost:6379\",\n",
|
|
" index_name=\"users_modified\",\n",
|
|
" index_schema=index_schema, # pass in the new index schema\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"The above warning is meant to notify users when they are overriding the default behavior. Ignore it if you are intentionally overriding the behavior."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Hybrid filtering\n",
|
|
"\n",
|
|
"With the Redis Filter Expression language built into LangChain, you can create arbitrarily long chains of hybrid filters\n",
|
|
"that can be used to filter your search results. The expression language is derived from the [RedisVL Expression Syntax](https://redisvl.com)\n",
|
|
"and is designed to be easy to use and understand.\n",
|
|
"\n",
|
|
"The following are the available filter types:\n",
|
|
"- ``RedisText``: Filter by full-text search against metadata fields. Supports exact, fuzzy, and wildcard matching.\n",
|
|
"- ``RedisNum``: Filter by numeric range against metadata fields.\n",
|
|
"- ``RedisTag``: Filter by the exact match against string-based categorical metadata fields. Multiple tags can be specified like \"tag1,tag2,tag3\".\n",
|
|
"\n",
|
|
"The following are examples of utilizing these filters.\n",
|
|
"\n",
|
|
"```python\n",
|
|
"\n",
|
|
"from langchain_community.vectorstores.redis import RedisText, RedisNum, RedisTag\n",
|
|
"\n",
|
|
"# exact matching\n",
|
|
"has_high_credit = RedisTag(\"credit_score\") == \"high\"\n",
|
|
"does_not_have_high_credit = RedisTag(\"credit_score\") != \"low\"\n",
|
|
"\n",
|
|
"# fuzzy matching\n",
|
|
"job_starts_with_eng = RedisText(\"job\") % \"eng*\"\n",
|
|
"job_is_engineer = RedisText(\"job\") == \"engineer\"\n",
|
|
"job_is_not_engineer = RedisText(\"job\") != \"engineer\"\n",
|
|
"\n",
|
|
"# numeric filtering\n",
|
|
"age_is_18 = RedisNum(\"age\") == 18\n",
|
|
"age_is_not_18 = RedisNum(\"age\") != 18\n",
|
|
"age_is_greater_than_18 = RedisNum(\"age\") > 18\n",
|
|
"age_is_less_than_18 = RedisNum(\"age\") < 18\n",
|
|
"age_is_greater_than_or_equal_to_18 = RedisNum(\"age\") >= 18\n",
|
|
"age_is_less_than_or_equal_to_18 = RedisNum(\"age\") <= 18\n",
|
|
"\n",
|
|
"```\n",
|
|
"\n",
|
|
"The ``RedisFilter`` class can be used to simplify the import of these filters as follows\n",
|
|
"\n",
|
|
"```python\n",
|
|
"\n",
|
|
"from langchain_community.vectorstores.redis import RedisFilter\n",
|
|
"\n",
|
|
"# same examples as above\n",
|
|
"has_high_credit = RedisFilter.tag(\"credit_score\") == \"high\"\n",
|
|
"does_not_have_high_credit = RedisFilter.num(\"age\") > 8\n",
|
|
"job_starts_with_eng = RedisFilter.text(\"job\") % \"eng*\"\n",
|
|
"```\n",
|
|
"\n",
|
|
"The following are examples of using a hybrid filter for search"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 22,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Job: engineer\n",
|
|
"Engineers in the dataset: 2\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"from langchain_community.vectorstores.redis import RedisText\n",
|
|
"\n",
|
|
"is_engineer = RedisText(\"job\") == \"engineer\"\n",
|
|
"results = rds.similarity_search(\"foo\", k=3, filter=is_engineer)\n",
|
|
"\n",
|
|
"print(\"Job:\", results[0].metadata[\"job\"])\n",
|
|
"print(\"Engineers in the dataset:\", len(results))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 23,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Job: doctor\n",
|
|
"Job: doctor\n",
|
|
"Jobs in dataset that start with 'doc': 2\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# fuzzy match\n",
|
|
"starts_with_doc = RedisText(\"job\") % \"doc*\"\n",
|
|
"results = rds.similarity_search(\"foo\", k=3, filter=starts_with_doc)\n",
|
|
"\n",
|
|
"for result in results:\n",
|
|
" print(\"Job:\", result.metadata[\"job\"])\n",
|
|
"print(\"Jobs in dataset that start with 'doc':\", len(results))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 24,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"User: derrick is 45\n",
|
|
"User: nancy is 94\n",
|
|
"User: joe is 35\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"from langchain_community.vectorstores.redis import RedisNum\n",
|
|
"\n",
|
|
"is_over_18 = RedisNum(\"age\") > 18\n",
|
|
"is_under_99 = RedisNum(\"age\") < 99\n",
|
|
"age_range = is_over_18 & is_under_99\n",
|
|
"results = rds.similarity_search(\"foo\", filter=age_range)\n",
|
|
"\n",
|
|
"for result in results:\n",
|
|
" print(\"User:\", result.metadata[\"user\"], \"is\", result.metadata[\"age\"])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 25,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"User: derrick is 45\n",
|
|
"User: nancy is 94\n",
|
|
"User: joe is 35\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# make sure to use parenthesis around FilterExpressions\n",
|
|
"# if initializing them while constructing them\n",
|
|
"age_range = (RedisNum(\"age\") > 18) & (RedisNum(\"age\") < 99)\n",
|
|
"results = rds.similarity_search(\"foo\", filter=age_range)\n",
|
|
"\n",
|
|
"for result in results:\n",
|
|
" print(\"User:\", result.metadata[\"user\"], \"is\", result.metadata[\"age\"])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Redis as Retriever\n",
|
|
"\n",
|
|
"Here we go over different options for using the vector store as a retriever.\n",
|
|
"\n",
|
|
"There are three different search methods we can use to do retrieval. By default, it will use semantic similarity."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 26,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Content: foo --- Score: 0.0\n",
|
|
"Content: foo --- Score: 0.0\n",
|
|
"Content: foo --- Score: 0.0\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"query = \"foo\"\n",
|
|
"results = rds.similarity_search_with_score(query, k=3, return_metadata=True)\n",
|
|
"\n",
|
|
"for result in results:\n",
|
|
" print(\"Content:\", result[0].page_content, \" --- Score: \", result[1])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 27,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"retriever = rds.as_retriever(search_type=\"similarity\", search_kwargs={\"k\": 4})"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 28,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"[Document(page_content='foo', metadata={'id': 'doc:users_modified:988ecca7574048e396756efc0e79aeca', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),\n",
|
|
" Document(page_content='foo', metadata={'id': 'doc:users_modified:009b1afeb4084cc6bdef858c7a99b48e', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}),\n",
|
|
" Document(page_content='foo', metadata={'id': 'doc:users_modified:7087cee9be5b4eca93c30fbdd09a2731', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'}),\n",
|
|
" Document(page_content='bar', metadata={'id': 'doc:users_modified:01ef6caac12b42c28ad870aefe574253', 'user': 'tyler', 'job': 'engineer', 'credit_score': 'high', 'age': '100'})]"
|
|
]
|
|
},
|
|
"execution_count": 28,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"docs = retriever.invoke(query)\n",
|
|
"docs"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"There is also the `similarity_distance_threshold` retriever which allows the user to specify the vector distance"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 29,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"retriever = rds.as_retriever(\n",
|
|
" search_type=\"similarity_distance_threshold\",\n",
|
|
" search_kwargs={\"k\": 4, \"distance_threshold\": 0.1},\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 30,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"[Document(page_content='foo', metadata={'id': 'doc:users_modified:988ecca7574048e396756efc0e79aeca', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),\n",
|
|
" Document(page_content='foo', metadata={'id': 'doc:users_modified:009b1afeb4084cc6bdef858c7a99b48e', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}),\n",
|
|
" Document(page_content='foo', metadata={'id': 'doc:users_modified:7087cee9be5b4eca93c30fbdd09a2731', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'})]"
|
|
]
|
|
},
|
|
"execution_count": 30,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"docs = retriever.invoke(query)\n",
|
|
"docs"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Lastly, the ``similarity_score_threshold`` allows the user to define the minimum score for similar documents"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 31,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"retriever = rds.as_retriever(\n",
|
|
" search_type=\"similarity_score_threshold\",\n",
|
|
" search_kwargs={\"score_threshold\": 0.9, \"k\": 10},\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 32,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"[Document(page_content='foo', metadata={'id': 'doc:users_modified:988ecca7574048e396756efc0e79aeca', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),\n",
|
|
" Document(page_content='foo', metadata={'id': 'doc:users_modified:009b1afeb4084cc6bdef858c7a99b48e', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}),\n",
|
|
" Document(page_content='foo', metadata={'id': 'doc:users_modified:7087cee9be5b4eca93c30fbdd09a2731', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'})]"
|
|
]
|
|
},
|
|
"execution_count": 32,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"retriever.invoke(\"foo\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"retriever = rds.as_retriever(\n",
|
|
" search_type=\"mmr\", search_kwargs={\"fetch_k\": 20, \"k\": 4, \"lambda_mult\": 0.1}\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 13,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"[Document(page_content='foo', metadata={'id': 'doc:users:8f6b673b390647809d510112cde01a27', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),\n",
|
|
" Document(page_content='bar', metadata={'id': 'doc:users:93521560735d42328b48c9c6f6418d6a', 'user': 'tyler', 'job': 'engineer', 'credit_score': 'high', 'age': '100'}),\n",
|
|
" Document(page_content='foo', metadata={'id': 'doc:users:125ecd39d07845eabf1a699d44134a5b', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'}),\n",
|
|
" Document(page_content='foo', metadata={'id': 'doc:users:d6200ab3764c466082fde3eaab972a2a', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'})]"
|
|
]
|
|
},
|
|
"execution_count": 13,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"retriever.invoke(\"foo\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Delete keys and index"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"To delete your entries you have to address them by their keys."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 33,
|
|
"metadata": {
|
|
"scrolled": true
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"True"
|
|
]
|
|
},
|
|
"execution_count": 33,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"Redis.delete(keys, redis_url=\"redis://localhost:6379\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 34,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"True"
|
|
]
|
|
},
|
|
"execution_count": 34,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# delete the indices too\n",
|
|
"Redis.drop_index(\n",
|
|
" index_name=\"users\", delete_documents=True, redis_url=\"redis://localhost:6379\"\n",
|
|
")\n",
|
|
"Redis.drop_index(\n",
|
|
" index_name=\"users_modified\",\n",
|
|
" delete_documents=True,\n",
|
|
" redis_url=\"redis://localhost:6379\",\n",
|
|
")"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.12"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
}
|