Redis hybrid VSS queries notebook

pull/1077/head
Michael Yuan 1 year ago
parent d3d8d9742a
commit fcdc92d6ad

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

@ -0,0 +1,862 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "cb1537e6",
"metadata": {},
"source": [
"# Running Hybrid VSS Queries with Redis and OpenAI\n",
"\n",
"This notebook provides an introduction to using Redis as a vector database with OpenAI embeddings and running hybrid queries that combine VSS and lexical search using Redis Search and Query capability. Redis is a scalable, real-time database that can be used as a vector database when using the [RediSearch Module](https://oss.redislabs.com/redisearch/). The RediSearch module allows you to index and search for vectors in Redis. This notebook will show you how to use the RediSearch module to index and search for vectors created by using the OpenAI API and stored in Redis.\n",
"\n",
"### What is Redis?\n",
"\n",
"Most developers from a web services background are probably familiar with Redis. At it's core, Redis is an open-source key-value store that can be used as a cache, message broker, and database. Developers choice Redis because it is fast, has a large ecosystem of client libraries, and has been deployed by major enterprises for years.\n",
"\n",
"In addition to the traditional uses of Redis. Redis also provides [Redis Modules](https://redis.io/modules) which are a way to extend Redis with new data types and commands. Example modules include [RedisJSON](https://redis.io/docs/stack/json/), [RedisTimeSeries](https://redis.io/docs/stack/timeseries/), [RedisBloom](https://redis.io/docs/stack/bloom/) and [RediSearch](https://redis.io/docs/stack/search/).\n",
"\n",
"### What is RediSearch?\n",
"\n",
"RediSearch is a [Redis module](https://redis.io/modules) that provides querying, secondary indexing, full-text search and vector search for Redis. To use RediSearch, you first declare indexes on your Redis data. You can then use the RediSearch clients to query that data. For more information on the feature set of RediSearch, see the [README](./README.md) or the [RediSearch documentation](https://redis.io/docs/stack/search/).\n",
"\n",
"### Deployment options\n",
"\n",
"There are a number of ways to deploy Redis. For local development, the quickest method is to use the [Redis Stack docker container](https://hub.docker.com/r/redis/redis-stack) which we will use here. Redis Stack contains a number of Redis modules that can be used together to create a fast, multi-model data store and query engine.\n",
"\n",
"For production use cases, The easiest way to get started is to use the [Redis Cloud](https://redislabs.com/redis-enterprise-cloud/overview/) service. Redis Cloud is a fully managed Redis service. You can also deploy Redis on your own infrastructure using [Redis Enterprise](https://redislabs.com/redis-enterprise/overview/). Redis Enterprise is a fully managed Redis service that can be deployed in kubernetes, on-premises or in the cloud.\n",
"\n",
"Additionally, every major cloud provider ([AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-e6y7ork67pjwg?sr=0-2&ref_=beagle&applicationId=AWSMPContessa), [Google Marketplace](https://console.cloud.google.com/marketplace/details/redislabs-public/redis-enterprise?pli=1), or [Azure Marketplace](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/garantiadata.redis_enterprise_1sp_public_preview?tab=Overview)) offers Redis Enterprise in a marketplace offering.\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "f1a618c5",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"Before we start this project, we need to set up the following:\n",
"\n",
"* start a Redis database with RediSearch (redis-stack)\n",
"* install libraries\n",
" * [Redis-py](https://github.com/redis/redis-py)\n",
"* get your [OpenAI API key](https://beta.openai.com/account/api-keys)\n",
"\n",
"===========================================================\n",
"\n",
"### Start Redis\n",
"\n",
"To keep this example simple, we will use the Redis Stack docker container which we can start as follows\n",
"\n",
"```bash\n",
"$ docker-compose up -d\n",
"```\n",
"\n",
"This also includes the [RedisInsight](https://redis.com/redis-enterprise/redis-insight/) GUI for managing your Redis database which you can view at [http://localhost:8001](http://localhost:8001) once you start the docker container.\n",
"\n",
"You're all set up and ready to go! Next, we import and create our client for communicating with the Redis database we just created."
]
},
{
"cell_type": "markdown",
"id": "b9babafe",
"metadata": {},
"source": [
"## Install Requirements\n",
"\n",
"Redis-Py is the python client for communicating with Redis. We will use this to communicate with our Redis-stack database. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2b04113f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Defaulting to user installation because normal site-packages is not writeable\n",
"Requirement already satisfied: redis in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (4.5.4)\n",
"Requirement already satisfied: pandas in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (2.0.1)\n",
"Requirement already satisfied: openai in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (0.27.6)\n",
"Requirement already satisfied: async-timeout>=4.0.2 in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from redis) (4.0.2)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from pandas) (2.8.2)\n",
"Requirement already satisfied: pytz>=2020.1 in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from pandas) (2023.3)\n",
"Requirement already satisfied: tzdata>=2022.1 in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from pandas) (2023.3)\n",
"Requirement already satisfied: numpy>=1.20.3 in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from pandas) (1.23.4)\n",
"Requirement already satisfied: requests>=2.20 in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from openai) (2.28.1)\n",
"Requirement already satisfied: tqdm in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from openai) (4.64.1)\n",
"Requirement already satisfied: aiohttp in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from openai) (3.8.4)\n",
"Requirement already satisfied: six>=1.5 in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)\n",
"Requirement already satisfied: charset-normalizer<3,>=2 in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from requests>=2.20->openai) (2.1.1)\n",
"Requirement already satisfied: idna<4,>=2.5 in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from requests>=2.20->openai) (3.4)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from requests>=2.20->openai) (1.26.12)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from requests>=2.20->openai) (2022.9.24)\n",
"Requirement already satisfied: attrs>=17.3.0 in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from aiohttp->openai) (23.1.0)\n",
"Requirement already satisfied: multidict<7.0,>=4.5 in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from aiohttp->openai) (6.0.4)\n",
"Requirement already satisfied: yarl<2.0,>=1.0 in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from aiohttp->openai) (1.9.2)\n",
"Requirement already satisfied: frozenlist>=1.1.1 in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from aiohttp->openai) (1.3.3)\n",
"Requirement already satisfied: aiosignal>=1.1.2 in /Users/michael.yuan/Library/Python/3.9/lib/python/site-packages (from aiohttp->openai) (1.3.1)\n"
]
}
],
"source": [
"! pip install redis pandas openai"
]
},
{
"cell_type": "markdown",
"id": "36fe86f4",
"metadata": {},
"source": [
"===========================================================\n",
"## Prepare your OpenAI API key\n",
"\n",
"The `OpenAI API key` is used for vectorization of query data.\n",
"\n",
"If you don't have an OpenAI API key, you can get one from [https://beta.openai.com/account/api-keys](https://beta.openai.com/account/api-keys).\n",
"\n",
"Once you get your key, please add it to your environment variables as `OPENAI_API_KEY` by using following command:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "be28faa6",
"metadata": {},
"outputs": [],
"source": [
"! export OPENAI_API_KEY=\"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\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "88be138c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"OPENAI_API_KEY is ready\n"
]
}
],
"source": [
"# Test that your OpenAI API key is correctly set as an environment variable\n",
"# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for the env variables to be live.\n",
"import os\n",
"import openai\n",
"\n",
"# Note. alternatively you can set a temporary env variable like this:\n",
"# os.environ[\"OPENAI_API_KEY\"] = 'sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'\n",
"\n",
"if os.getenv(\"OPENAI_API_KEY\") is not None:\n",
" openai.api_key = os.getenv(\"OPENAI_API_KEY\")\n",
" print (\"OPENAI_API_KEY is ready\")\n",
"else:\n",
" print (\"OPENAI_API_KEY environment variable not found\")"
]
},
{
"cell_type": "markdown",
"id": "97fefe4c",
"metadata": {},
"source": [
"## Load data\n",
"\n",
"In this section we'll load and clean an ecommerce dataset. We'll generate embeddings using OpenAI and use this data to create an index in Redis and then search for similar vectors."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "9fbebe0d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Index: 1978 entries, 0 to 1998\n",
"Data columns (total 10 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 id 1978 non-null int64 \n",
" 1 gender 1978 non-null object\n",
" 2 masterCategory 1978 non-null object\n",
" 3 subCategory 1978 non-null object\n",
" 4 articleType 1978 non-null object\n",
" 5 baseColour 1978 non-null object\n",
" 6 season 1978 non-null object\n",
" 7 year 1978 non-null int64 \n",
" 8 usage 1978 non-null object\n",
" 9 productDisplayName 1978 non-null object\n",
"dtypes: int64(2), object(8)\n",
"memory usage: 170.0+ KB\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>gender</th>\n",
" <th>masterCategory</th>\n",
" <th>subCategory</th>\n",
" <th>articleType</th>\n",
" <th>baseColour</th>\n",
" <th>season</th>\n",
" <th>year</th>\n",
" <th>usage</th>\n",
" <th>productDisplayName</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>15970</td>\n",
" <td>Men</td>\n",
" <td>Apparel</td>\n",
" <td>Topwear</td>\n",
" <td>Shirts</td>\n",
" <td>Navy Blue</td>\n",
" <td>Fall</td>\n",
" <td>2011</td>\n",
" <td>Casual</td>\n",
" <td>Turtle Check Men Navy Blue Shirt</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>39386</td>\n",
" <td>Men</td>\n",
" <td>Apparel</td>\n",
" <td>Bottomwear</td>\n",
" <td>Jeans</td>\n",
" <td>Blue</td>\n",
" <td>Summer</td>\n",
" <td>2012</td>\n",
" <td>Casual</td>\n",
" <td>Peter England Men Party Blue Jeans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>59263</td>\n",
" <td>Women</td>\n",
" <td>Accessories</td>\n",
" <td>Watches</td>\n",
" <td>Watches</td>\n",
" <td>Silver</td>\n",
" <td>Winter</td>\n",
" <td>2016</td>\n",
" <td>Casual</td>\n",
" <td>Titan Women Silver Watch</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>21379</td>\n",
" <td>Men</td>\n",
" <td>Apparel</td>\n",
" <td>Bottomwear</td>\n",
" <td>Track Pants</td>\n",
" <td>Black</td>\n",
" <td>Fall</td>\n",
" <td>2011</td>\n",
" <td>Casual</td>\n",
" <td>Manchester United Men Solid Black Track Pants</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>53759</td>\n",
" <td>Men</td>\n",
" <td>Apparel</td>\n",
" <td>Topwear</td>\n",
" <td>Tshirts</td>\n",
" <td>Grey</td>\n",
" <td>Summer</td>\n",
" <td>2012</td>\n",
" <td>Casual</td>\n",
" <td>Puma Men Grey T-shirt</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id gender masterCategory subCategory articleType baseColour season \n",
"0 15970 Men Apparel Topwear Shirts Navy Blue Fall \\\n",
"1 39386 Men Apparel Bottomwear Jeans Blue Summer \n",
"2 59263 Women Accessories Watches Watches Silver Winter \n",
"3 21379 Men Apparel Bottomwear Track Pants Black Fall \n",
"4 53759 Men Apparel Topwear Tshirts Grey Summer \n",
"\n",
" year usage productDisplayName \n",
"0 2011 Casual Turtle Check Men Navy Blue Shirt \n",
"1 2012 Casual Peter England Men Party Blue Jeans \n",
"2 2016 Casual Titan Women Silver Watch \n",
"3 2011 Casual Manchester United Men Solid Black Track Pants \n",
"4 2012 Casual Puma Men Grey T-shirt "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# imports\n",
"import pandas as pd\n",
"import numpy as np\n",
"from typing import List\n",
"\n",
"from openai.embeddings_utils import (\n",
" get_embedding,\n",
" distances_from_embeddings,\n",
" tsne_components_from_embeddings,\n",
" chart_from_components,\n",
" indices_of_nearest_neighbors_from_distances,\n",
")\n",
"\n",
"# constants\n",
"EMBEDDING_MODEL = \"text-embedding-ada-002\"\n",
"\n",
"# load in data and clean data types and drop null rows\n",
"df = pd.read_csv(\"../../data/styles_2k.csv\", on_bad_lines='skip')\n",
"df.dropna(inplace=True)\n",
"df[\"year\"] = df[\"year\"].astype(int)\n",
"df.info()\n",
"\n",
"# print dataframe\n",
"n_examples = 5\n",
"df.head(n_examples)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "3ce1ec50",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Index: 1978 entries, 0 to 1998\n",
"Data columns (total 11 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 product_id 1978 non-null int64 \n",
" 1 gender 1978 non-null object\n",
" 2 masterCategory 1978 non-null object\n",
" 3 subCategory 1978 non-null object\n",
" 4 articleType 1978 non-null object\n",
" 5 baseColour 1978 non-null object\n",
" 6 season 1978 non-null object\n",
" 7 year 1978 non-null int64 \n",
" 8 usage 1978 non-null object\n",
" 9 productDisplayName 1978 non-null object\n",
" 10 product_text 1978 non-null object\n",
"dtypes: int64(2), object(9)\n",
"memory usage: 185.4+ KB\n"
]
}
],
"source": [
"df[\"product_text\"] = df.apply(lambda row: f\"name {row['productDisplayName']} category {row['masterCategory']} subcategory {row['subCategory']} color {row['baseColour']} gender {row['gender']}\".lower(), axis=1)\n",
"df.rename({\"id\":\"product_id\"}, inplace=True, axis=1)\n",
"\n",
"df.info()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "13859ab5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'name turtle check men navy blue shirt category apparel subcategory topwear color navy blue gender men'"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# check out one of the texts we will use to create semantic embeddings\n",
"df[\"product_text\"][0]"
]
},
{
"cell_type": "markdown",
"id": "91df4d5b",
"metadata": {},
"source": [
"## Connect to Redis\n",
"\n",
"Now that we have our Redis database running, we can connect to it using the Redis-py client. We will use the default host and port for the Redis database which is `localhost:6379`.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "cc662c1b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import redis\n",
"from redis.commands.search.indexDefinition import (\n",
" IndexDefinition,\n",
" IndexType\n",
")\n",
"from redis.commands.search.query import Query\n",
"from redis.commands.search.field import (\n",
" TagField,\n",
" NumericField,\n",
" TextField,\n",
" VectorField\n",
")\n",
"\n",
"REDIS_HOST = \"localhost\"\n",
"REDIS_PORT = 6379\n",
"REDIS_PASSWORD = \"\" # default for passwordless Redis\n",
"\n",
"# Connect to Redis\n",
"redis_client = redis.Redis(\n",
" host=REDIS_HOST,\n",
" port=REDIS_PORT,\n",
" password=REDIS_PASSWORD\n",
")\n",
"redis_client.ping()"
]
},
{
"cell_type": "markdown",
"id": "7d3dac3c",
"metadata": {},
"source": [
"## Creating a Search Index in Redis\n",
"\n",
"The below cells will show how to specify and create a search index in Redis. We will:\n",
"\n",
"1. Set some constants for defining our index like the distance metric and the index name\n",
"2. Define the index schema with RediSearch fields\n",
"3. Create the index"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "f894b911",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Constants\n",
"INDEX_NAME = \"product_embeddings\" # name of the search index\n",
"PREFIX = \"doc\" # prefix for the document keys\n",
"DISTANCE_METRIC = \"L2\" # distance metric for the vectors (ex. COSINE, IP, L2)\n",
"NUMBER_OF_VECTORS = len(df)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "15db8380",
"metadata": {},
"outputs": [],
"source": [
"# Define RediSearch fields for each of the columns in the dataset\n",
"name = TextField(name=\"productDisplayName\")\n",
"category = TagField(name=\"masterCategory\")\n",
"articleType = TagField(name=\"articleType\")\n",
"gender = TagField(name=\"gender\")\n",
"season = TagField(name=\"season\")\n",
"year = NumericField(name=\"year\")\n",
"text_embedding = VectorField(\"product_vector\",\n",
" \"FLAT\", {\n",
" \"TYPE\": \"FLOAT32\",\n",
" \"DIM\": 1536,\n",
" \"DISTANCE_METRIC\": DISTANCE_METRIC,\n",
" \"INITIAL_CAP\": NUMBER_OF_VECTORS,\n",
" }\n",
")\n",
"fields = [name, category, articleType, gender, season, year, text_embedding]"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "3658693c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index already exists\n"
]
}
],
"source": [
"# Check if index exists\n",
"try:\n",
" redis_client.ft(INDEX_NAME).info()\n",
" print(\"Index already exists\")\n",
"except:\n",
" # Create RediSearch Index\n",
" redis_client.ft(INDEX_NAME).create_index(\n",
" fields = fields,\n",
" definition = IndexDefinition(prefix=[PREFIX], index_type=IndexType.HASH)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "775c15b4",
"metadata": {},
"source": [
"## Generate OpenAI Embeddings and Load Documents into the Index\n",
"\n",
"Now that we have a search index, we can load documents into it. We will use the dataframe containing the styles dataset loaded previously. In Redis, either the HASH or JSON (if using RedisJSON in addition to RediSearch) data types can be used to store documents. We will use the HASH data type in this example. The cells bellow will show how to get OpenAI embeddings for the different products and load documents into the index."
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "0d791186",
"metadata": {},
"outputs": [],
"source": [
"def index_documents(client: redis.Redis, prefix: str, documents: pd.DataFrame):\n",
" records = documents.to_dict(\"records\")\n",
" for doc in records:\n",
" key = f\"{prefix}:{str(doc['product_id'])}\"\n",
"\n",
" # create byte vectors\n",
" text_embedding = np.array(get_embedding(doc[\"product_text\"], EMBEDDING_MODEL), dtype=np.float32).tobytes()\n",
"\n",
" # replace list of floats with byte vectors\n",
" doc[\"product_vector\"] = text_embedding\n",
"\n",
" client.hset(key, mapping = doc)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "5bfaeafa",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loaded 1978 documents in Redis search index with name: product_embeddings\n"
]
}
],
"source": [
"# the styles_2k dataset should take 5-10min to load. the larger styles_40k dataset will take hours\n",
"index_documents(redis_client, PREFIX, df)\n",
"print(f\"Loaded {redis_client.info()['db0']['keys']} documents in Redis search index with name: {INDEX_NAME}\")"
]
},
{
"cell_type": "markdown",
"id": "46050ca9",
"metadata": {},
"source": [
"## Simple Vector Search Queries with OpenAI Query Embeddings\n",
"\n",
"Now that we have a search index and documents loaded into it, we can run search queries. Below we will provide a function that will run a search query and return the results. Using this function we run a few queries that will show how you can utilize Redis as a vector database."
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "b044aa93",
"metadata": {},
"outputs": [],
"source": [
"def search_redis(\n",
" redis_client: redis.Redis,\n",
" user_query: str,\n",
" index_name: str = \"product_embeddings\",\n",
" vector_field: str = \"product_vector\",\n",
" return_fields: list = [\"productDisplayName\", \"masterCategory\", \"gender\", \"season\", \"year\", \"vector_score\"],\n",
" hybrid_fields = \"*\",\n",
" k: int = 20,\n",
" print_results: bool = True,\n",
") -> List[dict]:\n",
"\n",
" # Creates embedding vector from user query\n",
" embedded_query = openai.Embedding.create(input=user_query,\n",
" model=\"text-embedding-ada-002\",\n",
" )[\"data\"][0]['embedding']\n",
"\n",
" # Prepare the Query\n",
" base_query = f'{hybrid_fields}=>[KNN {k} @{vector_field} $vector AS vector_score]'\n",
" query = (\n",
" Query(base_query)\n",
" .return_fields(*return_fields)\n",
" .sort_by(\"vector_score\")\n",
" .paging(0, k)\n",
" .dialect(2)\n",
" )\n",
" params_dict = {\"vector\": np.array(embedded_query).astype(dtype=np.float32).tobytes()}\n",
" \n",
" # perform vector search\n",
" results = redis_client.ft(index_name).search(query, params_dict)\n",
" if print_results:\n",
" for i, product in enumerate(results.docs):\n",
" score = 1 - float(product.vector_score)\n",
" print(f\"{i}. {product.productDisplayName} (Score: {round(score ,3) })\")\n",
" return results.docs"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "7e2025f6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0. John Players Men Blue Jeans (Score: 0.791)\n",
"1. Lee Men Tino Blue Jeans (Score: 0.775)\n",
"2. Peter England Men Party Blue Jeans (Score: 0.763)\n",
"3. Lee Men Blue Chicago Fit Jeans (Score: 0.761)\n",
"4. Lee Men Blue Chicago Fit Jeans (Score: 0.761)\n",
"5. French Connection Men Blue Jeans (Score: 0.74)\n",
"6. Locomotive Men Washed Blue Jeans (Score: 0.739)\n",
"7. Locomotive Men Washed Blue Jeans (Score: 0.739)\n",
"8. Do U Speak Green Men Blue Shorts (Score: 0.736)\n",
"9. Palm Tree Kids Boy Washed Blue Jeans (Score: 0.732)\n"
]
}
],
"source": [
"# For using OpenAI to generate query embedding\n",
"results = search_redis(redis_client, 'man blue jeans', k=10)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "93c4a696",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0. John Players Men Blue Jeans (Score: 0.739)\n",
"1. Lee Men Tino Blue Jeans (Score: 0.72)\n",
"2. Peter England Men Party Blue Jeans (Score: 0.718)\n",
"3. Denizen Women Blue Jeans (Score: 0.715)\n",
"4. Jealous 21 Women Washed Blue Jeans (Score: 0.708)\n",
"5. Jealous 21 Women Washed Blue Jeans (Score: 0.708)\n",
"6. Levis Kids Blue Solid Jean (Score: 0.706)\n",
"7. French Connection Men Blue Jeans (Score: 0.705)\n",
"8. Lee Men Blue Chicago Fit Jeans (Score: 0.705)\n",
"9. Lee Men Blue Chicago Fit Jeans (Score: 0.705)\n"
]
}
],
"source": [
"results = search_redis(redis_client, 'blue jeans', vector_field='product_vector', k=10)"
]
},
{
"cell_type": "markdown",
"id": "2007be48",
"metadata": {},
"source": [
"## Hybrid Queries with Redis\n",
"\n",
"The previous examples showed how run vector search queries with RediSearch. In this section, we will show how to combine vector search with other RediSearch fields for hybrid search. In the example below, we will combine vector search with full text search."
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "8a56633b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0. Basics Men White Slim Fit Striped Shirt (Score: 0.633)\n",
"1. ADIDAS Men's Slim Fit White T-shirt (Score: 0.628)\n",
"2. Basics Men Blue Slim Fit Checked Shirt (Score: 0.627)\n",
"3. Basics Men Blue Slim Fit Checked Shirt (Score: 0.627)\n",
"4. Basics Men Red Slim Fit Checked Shirt (Score: 0.623)\n",
"5. Basics Men Navy Slim Fit Checked Shirt (Score: 0.613)\n",
"6. Lee Rinse Navy Blue Slim Fit Jeans (Score: 0.555)\n",
"7. Tokyo Talkies Women Navy Slim Fit Jeans (Score: 0.553)\n"
]
}
],
"source": [
"# hybrid query for shirt in the product vector and only include results with the phrase \"slim fit\" in the title\n",
"results = search_redis(redis_client,\n",
" \"shirt\",\n",
" vector_field=\"product_vector\",\n",
" k=10,\n",
" hybrid_fields='@productDisplayName:\"slim fit\"'\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "6c25ee8d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0. Titan Women Gold Watch (Score: 0.544)\n",
"1. Being Human Men Grey Dial Blue Strap Watch (Score: 0.544)\n",
"2. Police Men Black Dial Watch PL12170JSB (Score: 0.544)\n",
"3. Titan Men Black Watch (Score: 0.543)\n",
"4. Police Men Black Dial Chronograph Watch PL12777JS-02M (Score: 0.542)\n",
"5. Police Men Black Dial Watch PL12778MSU-61 (Score: 0.542)\n",
"6. CASIO Youth Series Digital Men Black Small Dial Digital Watch W-210-1CVDF I065 (Score: 0.542)\n",
"7. Titan Women Silver Watch (Score: 0.542)\n",
"8. Titan Raga Women Gold Watch (Score: 0.539)\n",
"9. ADIDAS Original Men Black Dial Chronograph Watch ADH2641 (Score: 0.539)\n"
]
}
],
"source": [
"# hybrid query for watch in the product vector and only include results with the tag \"Accessories\" in the masterCategory field\n",
"results = search_redis(redis_client,\n",
" \"watch\",\n",
" vector_field=\"product_vector\",\n",
" k=10,\n",
" hybrid_fields='@masterCategory:{Accessories}'\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "2c0d11d8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0. Enroute Teens Orange Sandals (Score: 0.701)\n",
"1. Fila Men Camper Brown Sandals (Score: 0.694)\n",
"2. Enroute Teens Brown Sandals (Score: 0.69)\n",
"3. Coolers Men Black Sandals (Score: 0.69)\n",
"4. Coolers Men Black Sandals (Score: 0.69)\n",
"5. Clarks Men Black Leather Closed Sandals (Score: 0.69)\n",
"6. Crocs Dora Boots Pink Sandals (Score: 0.69)\n",
"7. Enroute Men Leather Black Sandals (Score: 0.685)\n",
"8. ADIDAS Men Navy Blue Benton Sandals (Score: 0.684)\n",
"9. Coolers Men Black Sports Sandals (Score: 0.684)\n"
]
}
],
"source": [
"# hybrid query for sandals in the product vector and only include results within the 2011-2012 year range\n",
"results = search_redis(redis_client,\n",
" \"sandals\",\n",
" vector_field=\"product_vector\",\n",
" k=10,\n",
" hybrid_fields='@year:[2011 2012]'\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7caad384",
"metadata": {},
"outputs": [],
"source": [
"# hybrid query for sandals in the product vector and only include results within the 2011-2012 year range from the summer season\n",
"results = search_redis(redis_client,\n",
" \"blue sandals\",\n",
" vector_field=\"product_vector\",\n",
" k=10,\n",
" hybrid_fields='(@year:[2011 2012] @season:{Summer})'\n",
" )"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
},
"vscode": {
"interpreter": {
"hash": "9b1e6e9c2967143209c2f955cb869d1d3234f92dc4787f49f155f3abbdfb1316"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Loading…
Cancel
Save