Changes and additions in response to PR comments

pull/1077/head
Michael Yuan 1 year ago
parent ff64125156
commit 4e22d695c8

@ -7,26 +7,9 @@
"source": [
"# Running Hybrid VSS Queries with Redis and OpenAI\n",
"\n",
"This notebook provides an introduction to using Redis as a vector database with OpenAI embeddings and running hybrid queries that combine VSS and lexical search using Redis Search and Query capability. Redis is a scalable, real-time database that can be used as a vector database when using the [RediSearch Module](https://oss.redislabs.com/redisearch/). The RediSearch module allows you to index and search for vectors in Redis. This notebook will show you how to use the RediSearch module to index and search for vectors created by using the OpenAI API and stored in Redis.\n",
"This notebook provides an introduction to using Redis as a vector database with OpenAI embeddings and running hybrid queries that combine vector similarity search (VSS) and lexical search using the Redis Query and Search capability. Redis is a scalable, real-time database that can be used as a vector database when using the [RediSearch Module](https://oss.redislabs.com/redisearch/). The Redis Query and Search capability allows you to index and search for vectors in Redis. This notebook will show you how to use Redis Query and Search to index and search vectors created with the OpenAI API and stored in Redis.\n",
"\n",
"### What is Redis?\n",
"\n",
"Most developers from a web services background are probably familiar with Redis. At its core, Redis is an open-source key-value store that can be used as a cache, message broker, and database. Developers choose Redis because it is fast, has a large ecosystem of client libraries, and has been deployed by major enterprises for years.\n",
"\n",
"In addition to its traditional uses, Redis also provides [Redis Modules](https://redis.io/modules), which are a way to extend Redis with new data types and commands. Example modules include [RedisJSON](https://redis.io/docs/stack/json/), [RedisTimeSeries](https://redis.io/docs/stack/timeseries/), [RedisBloom](https://redis.io/docs/stack/bloom/) and [RediSearch](https://redis.io/docs/stack/search/).\n",
"\n",
"### What is RediSearch?\n",
"\n",
"RediSearch is a [Redis module](https://redis.io/modules) that provides querying, secondary indexing, full-text search and vector search for Redis. To use RediSearch, you first declare indexes on your Redis data. You can then use the RediSearch clients to query that data. For more information on the feature set of RediSearch, see the [README](./README.md) or the [RediSearch documentation](https://redis.io/docs/stack/search/).\n",
"\n",
"### Deployment options\n",
"\n",
"There are a number of ways to deploy Redis. For local development, the quickest method is to use the [Redis Stack docker container](https://hub.docker.com/r/redis/redis-stack) which we will use here. Redis Stack contains a number of Redis modules that can be used together to create a fast, multi-model data store and query engine.\n",
"\n",
"For production use cases, the easiest way to get started is to use the [Redis Cloud](https://redislabs.com/redis-enterprise-cloud/overview/) service. Redis Cloud is a fully managed Redis service. You can also deploy Redis on your own infrastructure using [Redis Enterprise](https://redislabs.com/redis-enterprise/overview/). Redis Enterprise is a fully managed Redis service that can be deployed in Kubernetes, on-premises or in the cloud.\n",
"\n",
"Additionally, every major cloud provider ([AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-e6y7ork67pjwg?sr=0-2&ref_=beagle&applicationId=AWSMPContessa), [Google Marketplace](https://console.cloud.google.com/marketplace/details/redislabs-public/redis-enterprise?pli=1), or [Azure Marketplace](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/garantiadata.redis_enterprise_1sp_public_preview?tab=Overview)) offers Redis Enterprise in a marketplace offering.\n",
"\n"
"Hybrid queries combine vector similarity with traditional Redis Query and Search filtering capabilities on GEO, NUMERIC, TAG or TEXT data, simplifying application code. A common example of a hybrid query in an e-commerce use case is to find items visually similar to a given query image, limited to items available in a given GEO location and within a price range."
]
},
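The filter half of a hybrid query is an ordinary query expression over NUMERIC, TAG and TEXT fields. The sketch below shows one way such an expression could be assembled. The helper names (`numeric_range`, `tag`, `phrase`, `all_of`) are hypothetical and are not part of the notebook or of redis-py; the field names are taken from the queries that appear later in this diff. Tag values containing spaces or punctuation would need escaping, which this sketch does not attempt.

```python
def numeric_range(field: str, low: float, high: float) -> str:
    """NUMERIC clause, inclusive on both ends, e.g. @year:[2011 2012]."""
    return f"@{field}:[{low} {high}]"


def tag(field: str, *values: str) -> str:
    """TAG clause; several values are OR-ed with '|', e.g. @season:{Summer|Winter}."""
    return "@" + field + ":{" + "|".join(values) + "}"


def phrase(field: str, text: str) -> str:
    """TEXT clause matching an exact phrase, e.g. @productDisplayName:"blue jeans"."""
    return f'@{field}:"{text}"'


def all_of(*clauses: str) -> str:
    """Clauses separated by a space are AND-ed; parentheses group them."""
    return "(" + " ".join(clauses) + ")"


hybrid_fields = all_of(numeric_range("year", 2011, 2012), tag("season", "Summer"))
print(hybrid_fields)  # (@year:[2011 2012] @season:{Summer})
```

The resulting string has the same shape as the `hybrid_fields` arguments passed to `search_redis` in the cells further down.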
{
@ -125,16 +108,6 @@
{
"cell_type": "code",
"execution_count": 2,
"id": "be28faa6",
"metadata": {},
"outputs": [],
"source": [
"! export OPENAI_API_KEY=\"<YOUR_OPENAI_API_KEY>\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "88be138c",
"metadata": {},
"outputs": [
@ -152,14 +125,13 @@
"import os\n",
"import openai\n",
"\n",
"# Note. alternatively you can set a temporary env variable like this:\n",
"# os.environ[\"OPENAI_API_KEY\"] = 'sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'\n",
"os.environ[\"OPENAI_API_KEY\"] = '<YOUR_OPENAI_API_KEY>'\n",
"\n",
"if os.getenv(\"OPENAI_API_KEY\") is not None:\n",
" openai.api_key = os.getenv(\"OPENAI_API_KEY\")\n",
" print (\"OPENAI_API_KEY is ready\")\n",
"else:\n",
" print (\"OPENAI_API_KEY environment variable not found\")"
" print (\"OPENAI_API_KEY environment variable not found\")\n"
]
},
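One way to keep a key out of the notebook entirely is to read it from the environment and fail loudly when it is absent. The following is a minimal sketch, assuming the variable is exported in the shell that launches Jupyter; `load_openai_api_key` is a hypothetical helper, not something the notebook defines.

```python
import os


def load_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in the environment, or raise if it is missing or empty."""
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before starting Jupyter")
    return key


# Intended use in the notebook (left as a comment so this sketch has no side effects):
# openai.api_key = load_openai_api_key()
```

Raising instead of printing means a missing key stops the run at this cell rather than at the first embedding request.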
{
@ -174,7 +146,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 3,
"id": "9fbebe0d",
"metadata": {},
"outputs": [
@ -320,7 +292,7 @@
"4 2012 Casual Puma Men Grey T-shirt "
]
},
"execution_count": 15,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@ -355,7 +327,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 4,
"id": "3ce1ec50",
"metadata": {},
"outputs": [
@ -393,7 +365,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 5,
"id": "13859ab5",
"metadata": {},
"outputs": [
@ -403,7 +375,7 @@
"'name turtle check men navy blue shirt category apparel subcategory topwear color navy blue gender men'"
]
},
"execution_count": 17,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@ -426,7 +398,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 6,
"id": "cc662c1b",
"metadata": {},
"outputs": [
@ -436,7 +408,7 @@
"True"
]
},
"execution_count": 18,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@ -484,7 +456,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 7,
"id": "f894b911",
"metadata": {
"scrolled": true
@ -500,7 +472,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 8,
"id": "15db8380",
"metadata": {},
"outputs": [],
@ -525,18 +497,10 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 9,
"id": "3658693c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index already exists\n"
]
}
],
"outputs": [],
"source": [
"# Check if index exists\n",
"try:\n",
@ -562,13 +526,17 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": 10,
"id": "0d791186",
"metadata": {},
"outputs": [],
"source": [
"def index_documents(client: redis.Redis, prefix: str, documents: pd.DataFrame):\n",
" records = documents.to_dict(\"records\")\n",
" \n",
" # Use Redis pipelines to batch calls and save on round trip network communication\n",
" pipe = client.pipeline()\n",
" batch = 0\n",
" for doc in records:\n",
" key = f\"{prefix}:{str(doc['product_id'])}\"\n",
"\n",
@ -578,12 +546,17 @@
" # replace list of floats with byte vectors\n",
" doc[\"product_vector\"] = text_embedding\n",
"\n",
" client.hset(key, mapping = doc)"
" pipe.hset(key, mapping = doc)\n",
" batch += 1\n",
" if batch == 500:\n",
" pipe.execute()\n",
" batch = 0\n",
" pipe.execute()"
]
},
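The batching logic in `index_documents` can be tried without a running Redis server. The sketch below swaps the real pipeline for a hypothetical `RecordingPipeline` stand-in that only counts what would be sent, and it assumes the index stores vectors as FLOAT32 (the dtype is not visible in this diff). `to_float32_bytes` uses the standard library and yields the bytes `np.array(v, dtype=np.float32).tobytes()` would give on a little-endian machine. Unlike the notebook version, the final flush is skipped when nothing is pending.

```python
import struct
from typing import Dict, Iterable, List


def to_float32_bytes(vector: List[float]) -> bytes:
    """Pack a list of floats as little-endian float32 values."""
    return struct.pack(f"<{len(vector)}f", *vector)


class RecordingPipeline:
    """Hypothetical stand-in for client.pipeline(); records flush sizes only."""

    def __init__(self) -> None:
        self.queued: List[str] = []
        self.flush_sizes: List[int] = []

    def hset(self, key: str, mapping: Dict[str, object]) -> None:
        self.queued.append(key)

    def execute(self) -> None:
        self.flush_sizes.append(len(self.queued))
        self.queued = []


def write_in_batches(pipe, docs: Iterable[dict], prefix: str, batch_size: int = 500) -> None:
    pending = 0
    for doc in docs:
        fields = dict(doc)  # copy so the caller's record is not mutated
        fields["product_vector"] = to_float32_bytes(doc["product_vector"])
        pipe.hset(f"{prefix}:{doc['product_id']}", mapping=fields)
        pending += 1
        if pending == batch_size:
            pipe.execute()  # one network round trip per full batch
            pending = 0
    if pending:
        pipe.execute()  # flush the remainder


pipe = RecordingPipeline()
docs = [{"product_id": i, "product_vector": [0.0, 1.0]} for i in range(1200)]
write_in_batches(pipe, docs, prefix="doc")
print(pipe.flush_sizes)  # [500, 500, 200]
```

With 1200 records and a batch size of 500, the stand-in sees three flushes instead of 1200 individual `HSET` round trips.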
{
"cell_type": "code",
"execution_count": 24,
"execution_count": 11,
"id": "5bfaeafa",
"metadata": {},
"outputs": [
@ -613,7 +586,7 @@
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": 12,
"id": "b044aa93",
"metadata": {},
"outputs": [],
@ -629,7 +602,7 @@
" print_results: bool = True,\n",
") -> List[dict]:\n",
"\n",
" # Creates embedding vector from user query\n",
" # Use OpenAI to create embedding vector from user query\n",
" embedded_query = openai.Embedding.create(input=user_query,\n",
" model=\"text-embedding-ada-002\",\n",
" )[\"data\"][0]['embedding']\n",
@ -656,7 +629,7 @@
},
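The rest of `search_redis` is elided in this diff, so the following is only a sketch of the KNN query string such a function would typically send, based on the Redis vector search query syntax. `build_knn_query` and the `vector_score` alias are assumptions made for illustration; a filter of `*` means "no filter", which gives a pure vector search.

```python
def build_knn_query(
    k: int,
    vector_field: str = "product_vector",
    hybrid_fields: str = "*",
    score_alias: str = "vector_score",
) -> str:
    """Compose '<filter>=>[KNN k @field $vector AS alias]'.

    The filter narrows the candidate set; KNN then ranks the k nearest
    vectors among the documents that pass the filter.
    """
    return f"{hybrid_fields}=>[KNN {k} @{vector_field} $vector AS {score_alias}]"


# Pure vector search: every document is a candidate.
print(build_knn_query(10))

# Hybrid search: only documents whose name contains the phrase "blue jeans".
print(build_knn_query(10, hybrid_fields='@productDisplayName:"blue jeans"'))
```

With redis-py, a string like this is usually wrapped in `Query(...).dialect(2)` and the embedding bytes are bound to `$vector` through the `query_params` argument of `search`.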
{
"cell_type": "code",
"execution_count": 26,
"execution_count": 13,
"id": "7e2025f6",
"metadata": {},
"outputs": [
@ -678,50 +651,56 @@
}
],
"source": [
"# For using OpenAI to generate query embedding\n",
"# Execute a simple vector search in Redis\n",
"results = search_redis(redis_client, 'man blue jeans', k=10)"
]
},
{
"cell_type": "markdown",
"id": "2007be48",
"metadata": {},
"source": [
"## Hybrid Queries with Redis\n",
"\n",
"The previous examples showed how to run vector search queries with RediSearch. In this section, we will show how to combine vector search with other RediSearch fields for hybrid search. In the example below, we will combine vector search with full text search."
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "93c4a696",
"execution_count": 14,
"id": "0c4f4d0f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0. John Players Men Blue Jeans (Score: 0.739)\n",
"1. Lee Men Tino Blue Jeans (Score: 0.72)\n",
"2. Peter England Men Party Blue Jeans (Score: 0.718)\n",
"3. Denizen Women Blue Jeans (Score: 0.715)\n",
"4. Jealous 21 Women Washed Blue Jeans (Score: 0.708)\n",
"5. Jealous 21 Women Washed Blue Jeans (Score: 0.708)\n",
"6. Levis Kids Blue Solid Jean (Score: 0.706)\n",
"7. French Connection Men Blue Jeans (Score: 0.705)\n",
"8. Lee Men Blue Chicago Fit Jeans (Score: 0.705)\n",
"9. Lee Men Blue Chicago Fit Jeans (Score: 0.705)\n"
"0. John Players Men Blue Jeans (Score: 0.791)\n",
"1. Lee Men Tino Blue Jeans (Score: 0.775)\n",
"2. Peter England Men Party Blue Jeans (Score: 0.763)\n",
"3. French Connection Men Blue Jeans (Score: 0.74)\n",
"4. Locomotive Men Washed Blue Jeans (Score: 0.739)\n",
"5. Locomotive Men Washed Blue Jeans (Score: 0.739)\n",
"6. Palm Tree Kids Boy Washed Blue Jeans (Score: 0.732)\n",
"7. Denizen Women Blue Jeans (Score: 0.725)\n",
"8. Jealous 21 Women Washed Blue Jeans (Score: 0.713)\n",
"9. Jealous 21 Women Washed Blue Jeans (Score: 0.713)\n"
]
}
],
"source": [
"results = search_redis(redis_client, 'blue jeans', vector_field='product_vector', k=10)"
]
},
{
"cell_type": "markdown",
"id": "2007be48",
"metadata": {},
"source": [
"## Hybrid Queries with Redis\n",
"\n",
"The previous examples showed how to run vector search queries with RediSearch. In this section, we will show how to combine vector search with other RediSearch fields for hybrid search. In the example below, we will combine vector search with full text search."
"# improve search quality with a hybrid query: vector search for \"man blue jeans\" on the product vector combined with a phrase search for \"blue jeans\"\n",
"results = search_redis(redis_client,\n",
" \"man blue jeans\",\n",
" vector_field=\"product_vector\",\n",
" k=10,\n",
" hybrid_fields='@productDisplayName:\"blue jeans\"'\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 15,
"id": "8a56633b",
"metadata": {},
"outputs": [
@ -735,7 +714,7 @@
"3. Basics Men Blue Slim Fit Checked Shirt (Score: 0.627)\n",
"4. Basics Men Red Slim Fit Checked Shirt (Score: 0.623)\n",
"5. Basics Men Navy Slim Fit Checked Shirt (Score: 0.613)\n",
"6. Lee Rinse Navy Blue Slim Fit Jeans (Score: 0.555)\n",
"6. Lee Rinse Navy Blue Slim Fit Jeans (Score: 0.559)\n",
"7. Tokyo Talkies Women Navy Slim Fit Jeans (Score: 0.553)\n"
]
}
@ -752,7 +731,7 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": 16,
"id": "6c25ee8d",
"metadata": {},
"outputs": [
@ -765,11 +744,11 @@
"2. Police Men Black Dial Watch PL12170JSB (Score: 0.544)\n",
"3. Titan Men Black Watch (Score: 0.543)\n",
"4. Police Men Black Dial Chronograph Watch PL12777JS-02M (Score: 0.542)\n",
"5. Police Men Black Dial Watch PL12778MSU-61 (Score: 0.542)\n",
"6. CASIO Youth Series Digital Men Black Small Dial Digital Watch W-210-1CVDF I065 (Score: 0.542)\n",
"7. Titan Women Silver Watch (Score: 0.542)\n",
"8. Titan Raga Women Gold Watch (Score: 0.539)\n",
"9. ADIDAS Original Men Black Dial Chronograph Watch ADH2641 (Score: 0.539)\n"
"5. CASIO Youth Series Digital Men Black Small Dial Digital Watch W-210-1CVDF I065 (Score: 0.542)\n",
"6. Titan Women Silver Watch (Score: 0.542)\n",
"7. Police Men Black Dial Watch PL12778MSU-61 (Score: 0.541)\n",
"8. ADIDAS Original Men Black Dial Chronograph Watch ADH2641 (Score: 0.539)\n",
"9. Titan Raga Women Gold Watch (Score: 0.539)\n"
]
}
],
@ -785,7 +764,7 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": 17,
"id": "2c0d11d8",
"metadata": {},
"outputs": [
@ -794,11 +773,11 @@
"output_type": "stream",
"text": [
"0. Enroute Teens Orange Sandals (Score: 0.701)\n",
"1. Fila Men Camper Brown Sandals (Score: 0.694)\n",
"2. Enroute Teens Brown Sandals (Score: 0.69)\n",
"1. Fila Men Camper Brown Sandals (Score: 0.692)\n",
"2. Coolers Men Black Sandals (Score: 0.69)\n",
"3. Coolers Men Black Sandals (Score: 0.69)\n",
"4. Coolers Men Black Sandals (Score: 0.69)\n",
"5. Clarks Men Black Leather Closed Sandals (Score: 0.69)\n",
"4. Clarks Men Black Leather Closed Sandals (Score: 0.69)\n",
"5. Enroute Teens Brown Sandals (Score: 0.69)\n",
"6. Crocs Dora Boots Pink Sandals (Score: 0.69)\n",
"7. Enroute Men Leather Black Sandals (Score: 0.685)\n",
"8. ADIDAS Men Navy Blue Benton Sandals (Score: 0.684)\n",
@ -818,10 +797,27 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 18,
"id": "7caad384",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0. ADIDAS Men Navy Blue Benton Sandals (Score: 0.691)\n",
"1. Enroute Teens Brown Sandals (Score: 0.681)\n",
"2. ADIDAS Women's Adi Groove Blue Flip Flop (Score: 0.672)\n",
"3. Enroute Women Turquoise Blue Flats (Score: 0.671)\n",
"4. Red Tape Men Black Sandals (Score: 0.67)\n",
"5. Enroute Teens Orange Sandals (Score: 0.661)\n",
"6. Vans Men Blue Era Scilla Plaid Shoes (Score: 0.658)\n",
"7. FILA Men Aruba Navy Blue Sandal (Score: 0.657)\n",
"8. Quiksilver Men Blue Flip Flops (Score: 0.656)\n",
"9. Reebok Men Navy Twist Sandals (Score: 0.656)\n"
]
}
],
"source": [
"# hybrid query for sandals in the product vector and only include results within the 2011-2012 year range from the summer season\n",
"results = search_redis(redis_client,\n",
@ -831,6 +827,31 @@
" hybrid_fields='(@year:[2011 2012] @season:{Summer})'\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "9bc2fd74",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0. Wrangler Men Leather Brown Belt (Score: 0.67)\n",
"1. Wrangler Women Black Belt (Score: 0.639)\n"
]
}
],
"source": [
"# hybrid query for a brown belt filtering results by a year (NUMERIC) with a specific article type (TAG) and with a brand name (TEXT)\n",
"results = search_redis(redis_client,\n",
" \"brown belt\",\n",
" vector_field=\"product_vector\",\n",
" k=10,\n",
" hybrid_fields='(@year:[2012 2012] @articleType:{Belts} @productDisplayName:\"Wrangler\")'\n",
" )"
]
}
],
"metadata": {
