{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Redis\n", "\n", "Redis vector database introduction and LangChain integration guide.\n", "\n", "## What is Redis?\n", "\n", "Most developers from a web services background are probably familiar with Redis. At its core, Redis is an open-source key-value store that can be used as a cache, message broker, and database. Developers choose Redis because it is fast, has a large ecosystem of client libraries, and has been deployed by major enterprises for years.\n", "\n", "On top of these traditional use cases, Redis provides additional capabilities like the Search and Query capability that allows users to create secondary index structures within Redis. This allows Redis to be a Vector Database, at the speed of a cache.\n", "\n", "\n", "## Redis as a Vector Database\n", "\n", "Redis uses compressed, inverted indexes for fast indexing with a low memory footprint. It also supports a number of advanced features such as:\n", "\n", "* Indexing of multiple fields in Redis hashes and JSON\n", "* Vector similarity search (with HNSW (ANN) or FLAT (KNN))\n", "* Vector Range Search (e.g. 
find all vectors within a radius of a query vector)\n", "* Incremental indexing without performance loss\n", "* Document ranking (using [tf-idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf), with optional user-provided weights)\n", "* Field weighting\n", "* Complex boolean queries with AND, OR, and NOT operators\n", "* Prefix matching, fuzzy matching, and exact-phrase queries\n", "* Support for [double-metaphone phonetic matching](https://redis.io/docs/stack/search/reference/phonetic_matching/)\n", "* Auto-complete suggestions (with fuzzy prefix suggestions)\n", "* Stemming-based query expansion in [many languages](https://redis.io/docs/stack/search/reference/stemming/) (using [Snowball](http://snowballstem.org/))\n", "* Support for Chinese-language tokenization and querying (using [Friso](https://github.com/lionsoul2014/friso))\n", "* Numeric filters and ranges\n", "* Geospatial searches using [Redis geospatial indexing](/commands/georadius)\n", "* A powerful aggregations engine\n", "* Support for all UTF-8 encoded text\n", "* Retrieve full documents, selected fields, or only the document IDs\n", "* Sorting results (for example, by creation date)\n", "\n", "\n", "\n", "## Clients\n", "\n", "Since Redis is much more than just a vector database, there are often use cases that demand usage of a Redis client besides just the LangChain integration. You can use any standard Redis client library to run Search and Query commands, but it's easiest to use a library that wraps the Search and Query API. 
Below are a few examples, but you can find more client libraries [here](https://redis.io/resources/clients/).\n", "\n", "| Project | Language | License | Author | Stars |\n", "|----------|---------|--------|---------|-------|\n", "| [jedis][jedis-url] | Java | MIT | [Redis][redis-url] | ![Stars][jedis-stars] |\n", "| [redisvl][redisvl-url] | Python | MIT | [Redis][redis-url] | ![Stars][redisvl-stars] |\n", "| [redis-py][redis-py-url] | Python | MIT | [Redis][redis-url] | ![Stars][redis-py-stars] |\n", "| [node-redis][node-redis-url] | Node.js | MIT | [Redis][redis-url] | ![Stars][node-redis-stars] |\n", "| [nredisstack][nredisstack-url] | .NET | MIT | [Redis][redis-url] | ![Stars][nredisstack-stars] |\n", "\n", "[redis-url]: https://redis.com\n", "\n", "[redisvl-url]: https://github.com/RedisVentures/redisvl\n", "[redisvl-stars]: https://img.shields.io/github/stars/RedisVentures/redisvl.svg?style=social&label=Star&maxAge=2592000\n", "[redisvl-package]: https://pypi.python.org/pypi/redisvl\n", "\n", "[redis-py-url]: https://github.com/redis/redis-py\n", "[redis-py-stars]: https://img.shields.io/github/stars/redis/redis-py.svg?style=social&label=Star&maxAge=2592000\n", "[redis-py-package]: https://pypi.python.org/pypi/redis\n", "\n", "[jedis-url]: https://github.com/redis/jedis\n", "[jedis-stars]: https://img.shields.io/github/stars/redis/jedis.svg?style=social&label=Star&maxAge=2592000\n", "[Jedis-package]: https://search.maven.org/artifact/redis.clients/jedis\n", "\n", "[nredisstack-url]: https://github.com/redis/nredisstack\n", "[nredisstack-stars]: https://img.shields.io/github/stars/redis/nredisstack.svg?style=social&label=Star&maxAge=2592000\n", "[nredisstack-package]: https://www.nuget.org/packages/nredisstack/\n", "\n", "[node-redis-url]: https://github.com/redis/node-redis\n", "[node-redis-stars]: https://img.shields.io/github/stars/redis/node-redis.svg?style=social&label=Star&maxAge=2592000\n", "[node-redis-package]: https://www.npmjs.com/package/redis\n", 
"\n", "[redis-om-python-url]: https://github.com/redis/redis-om-python\n", "[redis-om-python-author]: https://redis.com\n", "[redis-om-python-stars]: https://img.shields.io/github/stars/redis/redis-om-python.svg?style=social&label=Star&maxAge=2592000\n", "\n", "[redisearch-go-url]: https://github.com/RediSearch/redisearch-go\n", "[redisearch-go-author]: https://redis.com\n", "[redisearch-go-stars]: https://img.shields.io/github/stars/RediSearch/redisearch-go.svg?style=social&label=Star&maxAge=2592000\n", "\n", "[redisearch-api-rs-url]: https://github.com/RediSearch/redisearch-api-rs\n", "[redisearch-api-rs-author]: https://redis.com\n", "[redisearch-api-rs-stars]: https://img.shields.io/github/stars/RediSearch/redisearch-api-rs.svg?style=social&label=Star&maxAge=2592000\n", "\n", "\n", "## Deployment Options\n", "\n", "There are many ways to deploy Redis with RediSearch. The easiest way to get started is to use Docker, but there are are many potential options for deployment such as\n", "\n", "- [Redis Cloud](https://redis.com/redis-enterprise-cloud/overview/)\n", "- [Docker (Redis Stack)](https://hub.docker.com/r/redis/redis-stack)\n", "- Cloud marketplaces: [AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-e6y7ork67pjwg?sr=0-2&ref_=beagle&applicationId=AWSMPContessa), [Google Marketplace](https://console.cloud.google.com/marketplace/details/redislabs-public/redis-enterprise?pli=1), or [Azure Marketplace](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/garantiadata.redis_enterprise_1sp_public_preview?tab=Overview)\n", "- On-premise: [Redis Enterprise Software](https://redis.com/redis-enterprise-software/overview/)\n", "- Kubernetes: [Redis Enterprise Software on Kubernetes](https://docs.redis.com/latest/kubernetes/)\n", "\n", "\n", "## Examples\n", "\n", "Many examples can be found in the [Redis AI team's GitHub](https://github.com/RedisVentures/)\n", "\n", "- [Awesome Redis AI 
Resources](https://github.com/RedisVentures/redis-ai-resources) - List of examples of using Redis in AI workloads\n", "- [Azure OpenAI Embeddings Q&A](https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna) - OpenAI and Redis as a Q&A service on Azure.\n", "- [ArXiv Paper Search](https://github.com/RedisVentures/redis-arXiv-search) - Semantic search over arXiv scholarly papers\n", "\n", "\n", "## More Resources\n", "\n", "For more information on how to use Redis as a vector database, check out the following resources:\n", "\n", "- [RedisVL Documentation](https://redisvl.com) - Documentation for the Redis Vector Library Client\n", "- [Redis Vector Similarity Docs](https://redis.io/docs/stack/search/reference/vectors/) - Redis official docs for Vector Search.\n", "- [Redis-py Search Docs](https://redis.readthedocs.io/en/latest/redismodules.html#redisearch-commands) - Documentation for redis-py client library\n", "- [Vector Similarity Search: From Basics to Production](https://mlops.community/vector-similarity-search-from-basics-to-production/) - Introductory blog post to VSS and Redis as a VectorDB." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install Redis Python Client\n", "\n", "redis-py is the Python client officially supported by Redis. The recently released RedisVL client is purpose-built for vector database use cases. Both can be installed with pip." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install redis redisvl openai tiktoken" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to use `OpenAIEmbeddings`, so we need an OpenAI API key." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import os\n", "import getpass\n", "\n", "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from langchain.embeddings import OpenAIEmbeddings\n", "\n", "embeddings = OpenAIEmbeddings()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sample Data\n", "\n", "First we will describe some sample data so that the various attributes of the Redis vector store can be demonstrated." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "metadata = [\n", " {\n", " \"user\": \"john\",\n", " \"age\": 18,\n", " \"job\": \"engineer\",\n", " \"credit_score\": \"high\",\n", " },\n", " {\n", " \"user\": \"derrick\",\n", " \"age\": 45,\n", " \"job\": \"doctor\",\n", " \"credit_score\": \"low\",\n", " },\n", " {\n", " \"user\": \"nancy\",\n", " \"age\": 94,\n", " \"job\": \"doctor\",\n", " \"credit_score\": \"high\",\n", " },\n", " {\n", " \"user\": \"tyler\",\n", " \"age\": 100,\n", " \"job\": \"engineer\",\n", " \"credit_score\": \"high\",\n", " },\n", " {\n", " \"user\": \"joe\",\n", " \"age\": 35,\n", " \"job\": \"dentist\",\n", " \"credit_score\": \"medium\",\n", " },\n", "]\n", "texts = [\"foo\", \"foo\", \"foo\", \"bar\", \"bar\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initializing Redis\n", "\n", "To locally deploy Redis, run:\n", "```console\n", "docker run -d -p 6379:6379 -p 8001:8001 redis/redis-stack:latest\n", "```\n", "If things are running correctly you should see a nice Redis UI at http://localhost:8001. See the [Deployment Options](#deployment-options) section above for other ways to deploy.\n", "\n", "The Redis VectorStore instance can be initialized in a number of ways. 
There are multiple class methods that can be used to initialize a Redis VectorStore instance.\n", "\n", "- ``Redis.__init__`` - Initialize directly\n", "- ``Redis.from_documents`` - Initialize from a list of ``langchain.docstore.Document`` objects\n", "- ``Redis.from_texts`` - Initialize from a list of texts (optionally with metadata)\n", "- ``Redis.from_texts_return_keys`` - Initialize from a list of texts (optionally with metadata) and return the keys\n", "- ``Redis.from_existing_index`` - Initialize from an existing Redis index\n", "\n", "Below we will use the ``Redis.from_texts`` method." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.vectorstores.redis import Redis\n", "\n", "rds = Redis.from_texts(\n", " texts,\n", " embeddings,\n", " metadatas=metadata,\n", " redis_url=\"redis://localhost:6379\",\n", " index_name=\"users\"\n", ")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'users'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rds.index_name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inspecting the Created Index\n", "\n", "Once the ``Redis`` VectorStore object has been constructed, an index will have been created in Redis if it did not already exist. The index can be inspected with both the ``rvl`` and ``redis-cli`` command line tools. If you installed ``redisvl`` above, you can use the ``rvl`` command line tool to inspect the index." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32m16:58:26\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n", "\u001b[32m16:58:26\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m 1. 
users\n" ] } ], "source": [ "# assumes you're running Redis locally (use --host, --port, --password, --username, to change this)\n", "!rvl index listall" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ``Redis`` VectorStore implementation will attempt to generate index schema (fields for filtering) for any metadata passed through the ``from_texts``, ``from_texts_return_keys``, and ``from_documents`` methods. This way, whatever metadata is passed will be indexed into the Redis search index allowing\n", "for filtering on those fields.\n", "\n", "Below we show what fields were created from the metadata we defined above" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "Index Information:\n", "╭──────────────┬────────────────┬───────────────┬─────────────────┬────────────╮\n", "│ Index Name │ Storage Type │ Prefixes │ Index Options │ Indexing │\n", "├──────────────┼────────────────┼───────────────┼─────────────────┼────────────┤\n", "│ users │ HASH │ ['doc:users'] │ [] │ 0 │\n", "╰──────────────┴────────────────┴───────────────┴─────────────────┴────────────╯\n", "Index Fields:\n", "╭────────────────┬────────────────┬─────────┬────────────────┬────────────────╮\n", "│ Name │ Attribute │ Type │ Field Option │ Option Value │\n", "├────────────────┼────────────────┼─────────┼────────────────┼────────────────┤\n", "│ user │ user │ TEXT │ WEIGHT │ 1 │\n", "│ job │ job │ TEXT │ WEIGHT │ 1 │\n", "│ credit_score │ credit_score │ TEXT │ WEIGHT │ 1 │\n", "│ content │ content │ TEXT │ WEIGHT │ 1 │\n", "│ age │ age │ NUMERIC │ │ │\n", "│ content_vector │ content_vector │ VECTOR │ │ │\n", "╰────────────────┴────────────────┴─────────┴────────────────┴────────────────╯\n" ] } ], "source": [ "!rvl index info -i users" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Statistics:\n", 
"╭─────────────────────────────┬─────────────╮\n", "│ Stat Key │ Value │\n", "├─────────────────────────────┼─────────────┤\n", "│ num_docs │ 5 │\n", "│ num_terms │ 15 │\n", "│ max_doc_id │ 5 │\n", "│ num_records │ 33 │\n", "│ percent_indexed │ 1 │\n", "│ hash_indexing_failures │ 0 │\n", "│ number_of_uses │ 4 │\n", "│ bytes_per_record_avg │ 4.60606 │\n", "│ doc_table_size_mb │ 0.000524521 │\n", "│ inverted_sz_mb │ 0.000144958 │\n", "│ key_table_size_mb │ 0.000193596 │\n", "│ offset_bits_per_record_avg │ 8 │\n", "│ offset_vectors_sz_mb │ 2.19345e-05 │\n", "│ offsets_per_term_avg │ 0.69697 │\n", "│ records_per_doc_avg │ 6.6 │\n", "│ sortable_values_size_mb │ 0 │\n", "│ total_indexing_time │ 0.32 │\n", "│ total_inverted_index_blocks │ 16 │\n", "│ vector_index_sz_mb │ 6.0126 │\n", "╰─────────────────────────────┴─────────────╯\n" ] } ], "source": [ "!rvl stats -i users" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's important to note that we have not specified that the ``user``, ``job``, ``credit_score`` and ``age`` in the metadata should be fields within the index, this is because the ``Redis`` VectorStore object automatically generate the index schema from the passed metadata. For more information on the generation of index fields, see the API documentation." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Querying\n", "\n", "There are multiple ways to query the ``Redis`` VectorStore implementation based on what use case you have:\n", "\n", "- ``similarity_search``: Find the most similar vectors to a given vector.\n", "- ``similarity_search_with_score``: Find the most similar vectors to a given vector and return the vector distance\n", "- ``similarity_search_limit_score``: Find the most similar vectors to a given vector and limit the number of results to the ``score_threshold``\n", "- ``similarity_search_with_relevance_scores``: Find the most similar vectors to a given vector and return the vector similarities\n", "- ``max_marginal_relevance_search``: Find the most similar vectors to a given vector while also optimizing for diversity" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "foo\n" ] } ], "source": [ "results = rds.similarity_search(\"foo\")\n", "print(results[0].page_content)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Key of the document in Redis: doc:users:a70ca43b3a4e4168bae57c78753a200f\n", "Metadata of the document: {'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}\n" ] } ], "source": [ "# return metadata\n", "results = rds.similarity_search(\"foo\", k=3)\n", "meta = results[1].metadata\n", "print(\"Key of the document in Redis: \", meta.pop(\"id\"))\n", "print(\"Metadata of the document: \", meta)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Content: foo --- Score: 0.0\n", "Content: foo --- Score: 0.0\n", "Content: foo --- Score: 0.0\n", "Content: bar --- Score: 0.1566\n", "Content: bar --- Score: 0.1566\n" ] } ], "source": [ "# with scores (distances)\n", "results = 
rds.similarity_search_with_score(\"foo\", k=5)\n", "for result in results:\n", " print(f\"Content: {result[0].page_content} --- Score: {result[1]}\")" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Content: foo --- Score: 0.0\n", "Content: foo --- Score: 0.0\n", "Content: foo --- Score: 0.0\n" ] } ], "source": [ "# limit the vector distance that can be returned\n", "results = rds.similarity_search_with_score(\"foo\", k=5, distance_threshold=0.1)\n", "for result in results:\n", " print(f\"Content: {result[0].page_content} --- Score: {result[1]}\")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Content: foo --- Similarity: 1.0\n", "Content: foo --- Similarity: 1.0\n", "Content: foo --- Similarity: 1.0\n", "Content: bar --- Similarity: 0.8434\n", "Content: bar --- Similarity: 0.8434\n" ] } ], "source": [ "# with scores (similarities)\n", "results = rds.similarity_search_with_relevance_scores(\"foo\", k=5)\n", "for result in results:\n", " print(f\"Content: {result[0].page_content} --- Similarity: {result[1]}\")" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Content: foo --- Similarity: 1.0\n", "Content: foo --- Similarity: 1.0\n", "Content: foo --- Similarity: 1.0\n" ] } ], "source": [ "# limit scores (similarities have to be over .9)\n", "results = rds.similarity_search_with_relevance_scores(\"foo\", k=5, score_threshold=0.9)\n", "for result in results:\n", " print(f\"Content: {result[0].page_content} --- Similarity: {result[1]}\")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['doc:users:b9c71d62a0a34241a37950b448dafd38']" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# you can also add new documents 
as follows\n", "new_document = [\"baz\"]\n", "new_metadata = [{\n", " \"user\": \"sam\",\n", " \"age\": 50,\n", " \"job\": \"janitor\",\n", " \"credit_score\": \"high\"\n", "}]\n", "# both the document and metadata must be lists\n", "rds.add_texts(new_document, new_metadata)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'id': 'doc:users:b9c71d62a0a34241a37950b448dafd38', 'user': 'sam', 'job': 'janitor', 'credit_score': 'high', 'age': '50'}\n" ] } ], "source": [ "# now query the new document\n", "results = rds.similarity_search(\"baz\", k=3)\n", "print(results[0].metadata)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# use maximal marginal relevance search to diversify results\n", "results = rds.max_marginal_relevance_search(\"foo\")" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# the lambda_mult parameter controls the diversity of the results, the lower the more diverse\n", "results = rds.max_marginal_relevance_search(\"foo\", lambda_mult=0.1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Connect to an Existing Index\n", "\n", "To have the same metadata indexed when reconnecting with the ``Redis`` VectorStore, you need to pass the same ``index_schema``, either as a path to a YAML file or as a dictionary. The following shows how to obtain the schema from an index and connect to an existing index." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# write the schema to a yaml file\n", "rds.write_schema(\"redis_schema.yaml\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The schema file for this example should look something like:\n", "\n", "```yaml\n", "numeric:\n", "- name: age\n", " no_index: false\n", " sortable: false\n", "text:\n", "- name: user\n", " no_index: false\n", " no_stem: false\n", " sortable: false\n", " weight: 1\n", " withsuffixtrie: false\n", "- name: job\n", " no_index: false\n", " no_stem: false\n", " sortable: false\n", " weight: 1\n", " withsuffixtrie: false\n", "- name: credit_score\n", " no_index: false\n", " no_stem: false\n", " sortable: false\n", " weight: 1\n", " withsuffixtrie: false\n", "- name: content\n", " no_index: false\n", " no_stem: false\n", " sortable: false\n", " weight: 1\n", " withsuffixtrie: false\n", "vector:\n", "- algorithm: FLAT\n", " block_size: 1000\n", " datatype: FLOAT32\n", " dims: 1536\n", " distance_metric: COSINE\n", " initial_cap: 20000\n", " name: content_vector\n", "```\n", "\n", "**Note** that this includes **all** possible fields for the schema. You can remove any fields that you don't need." 
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'id': 'doc:users:8484c48a032d4c4cbe3cc2ed6845fabb', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}\n" ] } ], "source": [ "# now we can connect to our existing index as follows\n", "\n", "new_rds = Redis.from_existing_index(\n", " embeddings,\n", " index_name=\"users\",\n", " redis_url=\"redis://localhost:6379\",\n", " schema=\"redis_schema.yaml\"\n", ")\n", "results = new_rds.similarity_search(\"foo\", k=3)\n", "print(results[0].metadata)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# see the schemas are the same\n", "new_rds.schema == rds.schema" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Custom Metadata Indexing\n", "\n", "In some cases, you may want to control what fields the metadata maps to. For example, you may want the ``credit_score`` field to be a categorical field instead of a text field (which is the default behavior for all string fields). In this case, you can use the ``index_schema`` parameter in each of the initialization methods above to specify the schema for the index. Custom index schema can either be passed as a dictionary or as a path to a yaml file.\n", "\n", "All arguments in the schema have defaults besides the name, so you can specify only the fields you want to change. All the names correspond to the snake/lowercase versions of the arguments you would use on the command line with ``redis-cli`` or in ``redis-py``. 
For more on the arguments for each field, see the [documentation](https://redis.io/docs/interact/search-and-query/basic-constructs/field-and-type-options/)\n", "\n", "The below example shows how to specify the schema for the ``credit_score`` field as a Tag (categorical) field instead of a text field. \n", "\n", "```yaml\n", "# index_schema.yml\n", "tag:\n", " - name: credit_score\n", "text:\n", " - name: user\n", " - name: job\n", "numeric:\n", " - name: age\n", "```\n", "\n", "In Python this would look like:\n", "\n", "```python\n", "\n", "index_schema = {\n", " \"tag\": [{\"name\": \"credit_score\"}],\n", " \"text\": [{\"name\": \"user\"}, {\"name\": \"job\"}],\n", " \"numeric\": [{\"name\": \"age\"}],\n", "}\n", "\n", "```\n", "\n", "Notice that only the ``name`` field needs to be specified. All other fields have defaults." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`index_schema` does not match generated metadata schema.\n", "If you meant to manually override the schema, please ignore this message.\n", "index_schema: {'tag': [{'name': 'credit_score'}], 'text': [{'name': 'user'}, {'name': 'job'}], 'numeric': [{'name': 'age'}]}\n", "generated_schema: {'text': [{'name': 'user'}, {'name': 'job'}, {'name': 'credit_score'}], 'numeric': [{'name': 'age'}], 'tag': []}\n", "\n" ] } ], "source": [ "# create a new index with the new schema defined above\n", "index_schema = {\n", " \"tag\": [{\"name\": \"credit_score\"}],\n", " \"text\": [{\"name\": \"user\"}, {\"name\": \"job\"}],\n", " \"numeric\": [{\"name\": \"age\"}],\n", "}\n", "\n", "rds, keys = Redis.from_texts_return_keys(\n", " texts,\n", " embeddings,\n", " metadatas=metadata,\n", " redis_url=\"redis://localhost:6379\",\n", " index_name=\"users_modified\",\n", " index_schema=index_schema, # pass in the new index schema\n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above warning is meant to 
notify users when they are overriding the default behavior. Ignore it if you are intentionally overriding the behavior." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hybrid Filtering\n", "\n", "With the Redis Filter Expression language built into LangChain, you can create arbitrarily long chains of hybrid filters\n", "that can be used to filter your search results. The expression language is derived from the [RedisVL Expression Syntax](https://redisvl.com)\n", "and is designed to be easy to use and understand.\n", "\n", "The following are the available filter types:\n", "- ``RedisText``: Filter by full-text search against metadata fields. Supports exact, fuzzy, and wildcard matching.\n", "- ``RedisNum``: Filter by numeric range against metadata fields.\n", "- ``RedisTag``: Filter by exact match against string-based categorical metadata fields. Multiple tags can be specified like \"tag1,tag2,tag3\".\n", "\n", "The following are examples of utilizing these filters.\n", "\n", "```python\n", "\n", "from langchain.vectorstores.redis import RedisText, RedisNum, RedisTag\n", "\n", "# exact matching\n", "has_high_credit = RedisTag(\"credit_score\") == \"high\"\n", "does_not_have_high_credit = RedisTag(\"credit_score\") != \"high\"\n", "\n", "# fuzzy and exact text matching\n", "job_starts_with_eng = RedisText(\"job\") % \"eng*\"\n", "job_is_engineer = RedisText(\"job\") == \"engineer\"\n", "job_is_not_engineer = RedisText(\"job\") != \"engineer\"\n", "\n", "# numeric filtering\n", "age_is_18 = RedisNum(\"age\") == 18\n", "age_is_not_18 = RedisNum(\"age\") != 18\n", "age_is_greater_than_18 = RedisNum(\"age\") > 18\n", "age_is_less_than_18 = RedisNum(\"age\") < 18\n", "age_is_greater_than_or_equal_to_18 = RedisNum(\"age\") >= 18\n", "age_is_less_than_or_equal_to_18 = RedisNum(\"age\") <= 18\n", "\n", "```\n", "\n", "The ``RedisFilter`` class can be used to simplify the import of these filters as follows\n", "\n", "```python\n", "\n", "from langchain.vectorstores.redis 
import RedisFilter\n", "\n", "# same examples as above\n", "has_high_credit = RedisFilter.tag(\"credit_score\") == \"high\"\n", "does_not_have_high_credit = RedisFilter.tag(\"credit_score\") != \"high\"\n", "job_starts_with_eng = RedisFilter.text(\"job\") % \"eng*\"\n", "```\n", "\n", "The following are examples of using hybrid filters for search:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Job: engineer\n", "Engineers in the dataset: 2\n" ] } ], "source": [ "from langchain.vectorstores.redis import RedisText\n", "\n", "is_engineer = RedisText(\"job\") == \"engineer\"\n", "results = rds.similarity_search(\"foo\", k=3, filter=is_engineer)\n", "\n", "print(\"Job:\", results[0].metadata[\"job\"])\n", "print(\"Engineers in the dataset:\", len(results))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Job: doctor\n", "Job: doctor\n", "Jobs in dataset that start with 'doc': 2\n" ] } ], "source": [ "# fuzzy match\n", "starts_with_doc = RedisText(\"job\") % \"doc*\"\n", "results = rds.similarity_search(\"foo\", k=3, filter=starts_with_doc)\n", "\n", "for result in results:\n", " print(\"Job:\", result.metadata[\"job\"])\n", "print(\"Jobs in dataset that start with 'doc':\", len(results))" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "User: derrick is 45\n", "User: nancy is 94\n", "User: joe is 35\n" ] } ], "source": [ "from langchain.vectorstores.redis import RedisNum\n", "\n", "is_over_18 = RedisNum(\"age\") > 18\n", "is_under_99 = RedisNum(\"age\") < 99\n", "age_range = is_over_18 & is_under_99\n", "results = rds.similarity_search(\"foo\", filter=age_range)\n", "\n", "for result in results:\n", " print(\"User:\", result.metadata[\"user\"], \"is\", result.metadata[\"age\"])" ] }, { "cell_type": "code", "execution_count": 25, 
"metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "User: derrick is 45\n", "User: nancy is 94\n", "User: joe is 35\n" ] } ], "source": [ "# make sure to use parenthesis around FilterExpressions\n", "# if initializing them while constructing them\n", "age_range = (RedisNum(\"age\") > 18) & (RedisNum(\"age\") < 99)\n", "results = rds.similarity_search(\"foo\", filter=age_range)\n", "\n", "for result in results:\n", " print(\"User:\", result.metadata[\"user\"], \"is\", result.metadata[\"age\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Redis as Retriever\n", "\n", "Here we go over different options for using the vector store as a retriever.\n", "\n", "There are three different search methods we can use to do retrieval. By default, it will use semantic similarity." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Content: foo --- Score: 0.0\n", "Content: foo --- Score: 0.0\n", "Content: foo --- Score: 0.0\n" ] } ], "source": [ "query = \"foo\"\n", "results = rds.similarity_search_with_score(query, k=3, return_metadata=True)\n", "\n", "for result in results:\n", " print(\"Content:\", result[0].page_content, \" --- Score: \", result[1])\n" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "retriever = rds.as_retriever(search_type=\"similarity\", search_kwargs={\"k\": 4})" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Document(page_content='foo', metadata={'id': 'doc:users_modified:988ecca7574048e396756efc0e79aeca', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),\n", " Document(page_content='foo', metadata={'id': 'doc:users_modified:009b1afeb4084cc6bdef858c7a99b48e', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}),\n", " Document(page_content='foo', metadata={'id': 
'doc:users_modified:7087cee9be5b4eca93c30fbdd09a2731', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'}),\n", " Document(page_content='bar', metadata={'id': 'doc:users_modified:01ef6caac12b42c28ad870aefe574253', 'user': 'tyler', 'job': 'engineer', 'credit_score': 'high', 'age': '100'})]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "docs = retriever.get_relevant_documents(query)\n", "docs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is also the `similarity_distance_threshold` retriever which allows the user to specify the vector distance" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "retriever = rds.as_retriever(search_type=\"similarity_distance_threshold\", search_kwargs={\"k\": 4, \"distance_threshold\": 0.1})" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Document(page_content='foo', metadata={'id': 'doc:users_modified:988ecca7574048e396756efc0e79aeca', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),\n", " Document(page_content='foo', metadata={'id': 'doc:users_modified:009b1afeb4084cc6bdef858c7a99b48e', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}),\n", " Document(page_content='foo', metadata={'id': 'doc:users_modified:7087cee9be5b4eca93c30fbdd09a2731', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'})]" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "docs = retriever.get_relevant_documents(query)\n", "docs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lastly, the ``similarity_score_threshold`` allows the user to define the minimum score for similar documents" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "retriever = rds.as_retriever(search_type=\"similarity_score_threshold\", 
search_kwargs={\"score_threshold\": 0.9, \"k\": 10})" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Document(page_content='foo', metadata={'id': 'doc:users_modified:988ecca7574048e396756efc0e79aeca', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),\n", " Document(page_content='foo', metadata={'id': 'doc:users_modified:009b1afeb4084cc6bdef858c7a99b48e', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'}),\n", " Document(page_content='foo', metadata={'id': 'doc:users_modified:7087cee9be5b4eca93c30fbdd09a2731', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'})]" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "retriever.get_relevant_documents(\"foo\")" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "retriever = rds.as_retriever(search_type=\"mmr\", search_kwargs={\"fetch_k\": 20, \"k\": 4, \"lambda_mult\": 0.1})" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Document(page_content='foo', metadata={'id': 'doc:users:8f6b673b390647809d510112cde01a27', 'user': 'john', 'job': 'engineer', 'credit_score': 'high', 'age': '18'}),\n", " Document(page_content='bar', metadata={'id': 'doc:users:93521560735d42328b48c9c6f6418d6a', 'user': 'tyler', 'job': 'engineer', 'credit_score': 'high', 'age': '100'}),\n", " Document(page_content='foo', metadata={'id': 'doc:users:125ecd39d07845eabf1a699d44134a5b', 'user': 'nancy', 'job': 'doctor', 'credit_score': 'high', 'age': '94'}),\n", " Document(page_content='foo', metadata={'id': 'doc:users:d6200ab3764c466082fde3eaab972a2a', 'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '45'})]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "retriever.get_relevant_documents(\"foo\")" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "# Delete keys" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To delete your entries, you have to address them by their keys." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Redis.delete(keys, redis_url=\"redis://localhost:6379\")" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# delete the indices too\n", "Redis.drop_index(index_name=\"users\", delete_documents=True, redis_url=\"redis://localhost:6379\")\n", "Redis.drop_index(index_name=\"users_modified\", delete_documents=True, redis_url=\"redis://localhost:6379\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Redis connection URL examples\n", "\n", "Valid Redis URL schemes are:\n", "1. `redis://` - Connection to Redis standalone, unencrypted\n", "2. `rediss://` - Connection to Redis standalone, with TLS encryption\n", "3. `redis+sentinel://` - Connection to Redis server via Redis Sentinel, unencrypted\n", "4. 
`rediss+sentinel://` - Connection to Redis server via Redis Sentinel, both connections with TLS encryption\n", "\n", "More information about additional connection parameters can be found in the redis-py documentation at https://redis-py.readthedocs.io/en/stable/connections.html" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "# connection to redis standalone at localhost, db 0, no password\n", "redis_url = \"redis://localhost:6379\"\n", "# connection to host \"redis\" port 7379 with db 2 and password \"secret\" (old style authentication scheme without username / pre 6.x)\n", "redis_url = \"redis://:secret@redis:7379/2\"\n", "# connection to host redis on default port with user \"joe\", pass \"secret\" using redis version 6+ ACLs\n", "redis_url = \"redis://joe:secret@redis/0\"\n", "\n", "# connection to sentinel at localhost with default group mymaster and db 0, no password\n", "redis_url = \"redis+sentinel://localhost:26379\"\n", "# connection to sentinel at host redis with default port 26379 and user \"joe\" with password \"secret\" with default group mymaster and db 0\n", "redis_url = \"redis+sentinel://joe:secret@redis\"\n", "# connection to sentinel, no auth with sentinel monitoring group \"zone-1\" and database 2\n", "redis_url = \"redis+sentinel://redis:26379/zone-1/2\"\n", "\n", "# connection to redis standalone at localhost, db 0, no password but with TLS support\n", "redis_url = \"rediss://localhost:6379\"\n", "# connection to redis sentinel at localhost and default port, db 0, no password\n", "# but with TLS support for both Sentinel and Redis server\n", "redis_url = \"rediss+sentinel://localhost\"" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", 
"pygments_lexer": "ipython3", "version": "3.9.1" } }, "nbformat": 4, "nbformat_minor": 4 }