Fix redisjson

pull/408/head
joeywhelan 1 year ago
parent d505735974
commit f69e8a4ace

@ -7,6 +7,7 @@
"source": [
"# Redis Vectors as JSON with OpenAI\n",
"This notebook expands on the other Redis OpenAI-cookbook examples with examples of how to use JSON with vectors.\n",
"[Storing Vectors in JSON](https://redis.io/docs/stack/search/reference/vectors/#storing-vectors-in-json)\n",
"\n",
"## Prerequisites\n",
"* Redis instance with the Redis Search and Redis JSON modules\n",
@ -27,7 +28,7 @@
},
"outputs": [],
"source": [
"pip install -r requirements.txt"
"! pip install redis openai python-dotenv openai[datalib]"
]
},
{
@ -63,7 +64,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
@ -86,7 +87,6 @@
"The government was keen to play down the worrying implications of the data. \"I maintain the view that Japan's economy remains in a minor adjustment phase in an upward climb, and we will monitor developments carefully,\" said economy minister Heizo Takenaka. But in the face of the strengthening yen making exports less competitive and indications of weakening economic conditions ahead, observers were less sanguine. \"It's painting a picture of a recovery... much patchier than previously thought,\" said Paul Sheard, economist at Lehman Brothers in Tokyo. Improvements in the job market apparently have yet to feed through to domestic demand, with private consumption up just 0.2% in the third quarter.\n",
"\"\"\"\n",
"\n",
"\n",
"text_2 = \"\"\"Dibaba breaks 5,000m world record\n",
"\n",
"Ethiopia's Tirunesh Dibaba set a new world record in winning the women's 5,000m at the Boston Indoor Games.\n",
@ -123,15 +123,33 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 24,
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[1A\u001b[1B\u001b[0G\u001b[?25l[+] Running 0/0\n",
" ⠿ Container redisjson-redis-1 Starting \u001b[34m0.1s \u001b[0m\n",
"\u001b[?25h\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 0/1\n",
" ⠿ Container redisjson-redis-1 Starting \u001b[34m0.2s \u001b[0m\n",
"\u001b[?25h\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 0/1\n",
" ⠿ Container redisjson-redis-1 Starting \u001b[34m0.3s \u001b[0m\n",
"\u001b[?25h\u001b[1A\u001b[1A\u001b[0G\u001b[?25l[+] Running 0/1\n",
" ⠿ Container redisjson-redis-1 Starting \u001b[34m0.4s \u001b[0m\n",
"\u001b[?25h\u001b[1A\u001b[1A\u001b[0G\u001b[?25l\u001b[34m[+] Running 1/1\u001b[0m\n",
" \u001b[32m✔\u001b[0m Container redisjson-redis-1 \u001b[32mStarted\u001b[0m \u001b[34m0.4s \u001b[0m\n",
"\u001b[?25h"
]
}
],
"source": [
"docker compose up -d"
"! docker compose up -d"
]
},
{
@ -144,9 +162,20 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 25,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from redis import from_url\n",
"\n",
@ -160,14 +189,26 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Index"
"## Create Index\n",
"[FT.CREATE](https://redis.io/commands/ft.create/)"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 26,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"b'OK'"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from redis.commands.search.field import TextField, VectorField\n",
"from redis.commands.search.indexDefinition import IndexDefinition, IndexType\n",
@ -181,6 +222,10 @@
" TextField('$.content', as_name='content')\n",
" ]\n",
"idx_def = IndexDefinition(index_type=IndexType.JSON, prefix=['doc:'])\n",
"try:\n",
" # Drop any index left over from a previous run so create_index does not fail\n",
" client.ft('idx').dropindex()\n",
"except Exception:\n",
" pass\n",
"client.ft('idx').create_index(schema, definition=idx_def)"
]
},
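Review note: the drop-then-create step in the hunk above ignores any error raised by `dropindex()`, which can also hide unrelated problems such as a failed connection. A minimal sketch of a narrower variant follows. `recreate_index` and its `ignored` parameter are hypothetical names, `search` is assumed to behave like `client.ft('idx')`, and the assumption is that redis-py signals a missing index with `redis.exceptions.ResponseError`, which is what a caller would pass as `ignored`.

```python
def recreate_index(search, schema, definition, ignored=(Exception,)):
    """Drop the index if it exists, then create it again.

    `search` is assumed to expose dropindex() and create_index(),
    as the object returned by client.ft('idx') does in the notebook.
    """
    try:
        search.dropindex()
    except ignored:
        # Only the listed exception types are treated as "index was not there";
        # anything else propagates to the caller.
        pass
    return search.create_index(schema, definition=definition)
```

In the notebook this would be called as `recreate_index(client.ft('idx'), schema, idx_def, ignored=(ResponseError,))`, assuming `ResponseError` is imported from `redis.exceptions`.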
@ -189,14 +234,26 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Data into Redis as JSON objects"
"## Load Data into Redis as JSON objects\n",
"[Redis JSON](https://redis.io/docs/stack/json/)"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 27,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"client.json().set('doc:1', '$', doc_1)\n",
"client.json().set('doc:2', '$', doc_2)\n",
@ -208,15 +265,51 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Semantic Search (KNN)\n",
"Given a sports-related article, search Redis via Vector Similarity Search (VSS) for similar articles. As a reminder, document #2 was a sports article."
"# Semantic Search\n",
"Given a sports-related article, search Redis via Vector Similarity Search (VSS) for similar articles. \n",
"[KNN Search](https://redis.io/docs/stack/search/reference/vectors/#knn-search)"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 28,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"distance:0.188 content:Dibaba breaks 5,000m world record\n",
"\n",
"Ethiopia's Tirunesh Dibaba set a new world record in winning the women's 5,000m at the Boston Indoor Games.\n",
"\n",
"Dibaba won in 14 minutes 32.93 seconds to erase the previous world indoor mark of 14:39.29 set by another Ethiopian, Berhane Adera, in Stuttgart last year. But compatriot Kenenisa Bekele's record hopes were dashed when he miscounted his laps in the men's 3,000m and staged his sprint finish a lap too soon. Ireland's Alistair Cragg won in 7:39.89 as Bekele battled to second in 7:41.42. \"I didn't want to sit back and get out-kicked,\" said Cragg. \"So I kept on the pace. The plan was to go with 500m to go no matter what, but when Bekele made the mistake that was it. The race was mine.\" Sweden's Carolina Kluft, the Olympic heptathlon champion, and Slovenia's Jolanda Ceplak had winning performances, too. Kluft took the long jump at 6.63m, while Ceplak easily won the women's 800m in 2:01.52. \n",
"\n",
"\n",
"distance:0.268 content:Japan narrowly escapes recession\n",
"\n",
"Japan's economy teetered on the brink of a technical recession in the three months to September, figures show.\n",
"\n",
"Revised figures indicated growth of just 0.1% - and a similar-sized contraction in the previous quarter. On an annual basis, the data suggests annual growth of just 0.2%, suggesting a much more hesitant recovery than had previously been thought. A common technical definition of a recession is two successive quarters of negative growth.\n",
"The government was keen to play down the worrying implications of the data. \"I maintain the view that Japan's economy remains in a minor adjustment phase in an upward climb, and we will monitor developments carefully,\" said economy minister Heizo Takenaka. But in the face of the strengthening yen making exports less competitive and indications of weakening economic conditions ahead, observers were less sanguine. \"It's painting a picture of a recovery... much patchier than previously thought,\" said Paul Sheard, economist at Lehman Brothers in Tokyo. Improvements in the job market apparently have yet to feed through to domestic demand, with private consumption up just 0.2% in the third quarter.\n",
"\n",
"\n",
"distance:0.287 content:Google's toolbar sparks concern\n",
"\n",
"Search engine firm Google has released a trial tool which is concerning some net users because it directs people to pre-selected commercial websites.\n",
"\n",
"The AutoLink feature comes with Google's latest toolbar and provides links in a webpage to Amazon.com if it finds a book's ISBN number on the site. It also links to Google's map service, if there is an address, or to car firm Carfax, if there is a licence plate. Google said the feature, available only in the US, \"adds useful links\". But some users are concerned that Google's dominant position in the search engine market place could mean it would be giving a competitive edge to firms like Amazon.\n",
"\n",
"AutoLink works by creating a link to a website based on information contained in a webpage - even if there is no link specified and whether or not the publisher of the page has given permission.\n",
"\n",
"If a user clicks the AutoLink feature in the Google toolbar then a webpage with a book's unique ISBN number would link directly to Amazon's website. It could mean online libraries that list ISBN book numbers find they are directing users to Amazon.com whether they like it or not. Websites which have paid for advertising on their pages may also be directing people to rival services. Dan Gillmor, founder of Grassroots Media, which supports citizen-based media, said the tool was a \"bad idea, and an unfortunate move by a company that is looking to continue its hypergrowth\". In a statement Google said the feature was still only in beta, ie trial, stage and that the company welcomed feedback from users. It said: \"The user can choose never to click on the AutoLink button, and web pages she views will never be modified. \"In addition, the user can choose to disable the AutoLink feature entirely at any time.\"\n",
"\n",
"The new tool has been compared to the Smart Tags feature from Microsoft by some users. It was widely criticised by net users and later dropped by Microsoft after concerns over trademark use were raised. Smart Tags allowed Microsoft to link any word on a web page to another site chosen by the company. Google said none of the companies which received AutoLinks had paid for the service. Some users said AutoLink would only be fair if websites had to sign up to allow the feature to work on their pages or if they received revenue for any \"click through\" to a commercial site. Cory Doctorow, European outreach coordinator for digital civil liberties group Electronic Fronter Foundation, said that Google should not be penalised for its market dominance. \"Of course Google should be allowed to direct people to whatever proxies it chooses. \"But as an end user I would want to know - 'Can I choose to use this service?, 'How much is Google being paid?', 'Can I substitute my own companies for the ones chosen by Google?'.\" Mr Doctorow said the only objection would be if users were forced into using AutoLink or \"tricked into using the service\".\n",
"\n",
"\n"
]
}
],
"source": [
"from redis.commands.search.query import Query\n",
"import numpy as np\n",
@ -231,11 +324,13 @@
"vec = np.array(get_vector(text_4), dtype=np.float32).tobytes()\n",
"q = Query('*=>[KNN 3 @vector $query_vec AS vector_score]')\\\n",
" .sort_by('vector_score')\\\n",
" .return_fields('vector_score')\\\n",
" .return_fields('vector_score', 'content')\\\n",
" .dialect(2) \n",
"params = {\"query_vec\": vec}\n",
"\n",
"print(client.ft('idx').search(q, query_params=params))"
"results = client.ft('idx').search(q, query_params=params)\n",
"for doc in results.docs:\n",
" print(f\"distance:{round(float(doc['vector_score']),3)} content:{doc['content']}\\n\")\n"
]
},
{
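Review note: the query in the hunk above passes the embedding as raw bytes via `np.array(..., dtype=np.float32).tobytes()`. As a rough illustration of what that byte string contains, the sketch below packs a list of floats into 32-bit floats using only the standard library. `pack_float32` and `unpack_float32` are hypothetical helpers, and the little-endian layout is an assumption that matches NumPy's output on common x86 and ARM machines.

```python
import struct

def pack_float32(values):
    """Pack floats as consecutive little-endian 32-bit floats (4 bytes each)."""
    return struct.pack(f"<{len(values)}f", *values)

def unpack_float32(blob):
    """Inverse of pack_float32, handy for inspecting a query vector."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

blob = pack_float32([0.5, -1.0, 2.0])
print(len(blob))             # 12 (3 values x 4 bytes)
print(unpack_float32(blob))  # [0.5, -1.0, 2.0]
```

The byte length is a quick sanity check: it should equal four times the vector dimension declared in the index schema.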
@ -244,14 +339,30 @@
"metadata": {},
"source": [
"## Hybrid Search\n",
"Use a combination of full text search and VSS to find a matching article. For this scenario, we filter on a full text search of the term 'recession' and then find the KNN articles. In this case, business-related. Reminder document #1 was about a recession in Japan."
"Combine full-text search with VSS to find a matching article. In this scenario, we first filter on a full-text search for the term 'recession' and then find the nearest neighbors (KNN) among the matches, which here are business-related articles. As a reminder, document #1 was about a recession in Japan.\n",
"[Hybrid Queries](https://redis.io/docs/stack/search/reference/vectors/#hybrid-queries)"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 29,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"distance:0.241 content:Japan narrowly escapes recession\n",
"\n",
"Japan's economy teetered on the brink of a technical recession in the three months to September, figures show.\n",
"\n",
"Revised figures indicated growth of just 0.1% - and a similar-sized contraction in the previous quarter. On an annual basis, the data suggests annual growth of just 0.2%, suggesting a much more hesitant recovery than had previously been thought. A common technical definition of a recession is two successive quarters of negative growth.\n",
"The government was keen to play down the worrying implications of the data. \"I maintain the view that Japan's economy remains in a minor adjustment phase in an upward climb, and we will monitor developments carefully,\" said economy minister Heizo Takenaka. But in the face of the strengthening yen making exports less competitive and indications of weakening economic conditions ahead, observers were less sanguine. \"It's painting a picture of a recovery... much patchier than previously thought,\" said Paul Sheard, economist at Lehman Brothers in Tokyo. Improvements in the job market apparently have yet to feed through to domestic demand, with private consumption up just 0.2% in the third quarter.\n",
"\n",
"\n"
]
}
],
"source": [
"text_5 = \"\"\"Ethiopia's crop production up 24%\n",
"\n",
@ -267,11 +378,13 @@
"vec = np.array(get_vector(text_5), dtype=np.float32).tobytes()\n",
"q = Query('@content:recession => [KNN 3 @vector $query_vec AS vector_score]')\\\n",
" .sort_by('vector_score')\\\n",
" .return_fields('vector_score')\\\n",
" .return_fields('vector_score', 'content')\\\n",
" .dialect(2) \n",
"params = {\"query_vec\": vec}\n",
"\n",
"print(client.ft('idx').search(q, query_params=params))"
"results = client.ft('idx').search(q, query_params=params)\n",
"for doc in results.docs:\n",
" print(f\"distance:{round(float(doc['vector_score']),3)} content:{doc['content']}\\n\")"
]
}
],
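Review note: both queries in this diff share the same `filter=>[KNN k @field $param AS alias]` shape, and only the filter changes (`*` for the pure KNN search, `@content:recession` for the hybrid one). A small sketch of a helper that builds that string follows. `knn_query` is a hypothetical name and not part of redis-py; the default field, parameter, and alias names are copied from the notebook. The notebook writes the hybrid query with spaces around `=>`, which this sketch omits.

```python
def knn_query(k, filter_expr="*", field="vector",
              param="query_vec", alias="vector_score"):
    """Build a KNN query string, optionally pre-filtered.

    filter_expr="*" searches every indexed document; any other
    expression restricts the KNN search to documents matching it.
    """
    return f"{filter_expr}=>[KNN {k} @{field} ${param} AS {alias}]"

print(knn_query(3))
# *=>[KNN 3 @vector $query_vec AS vector_score]
print(knn_query(3, "@content:recession"))
# @content:recession=>[KNN 3 @vector $query_vec AS vector_score]
```

The returned string would be passed to `Query(...)` exactly as the literal strings are in the notebook, followed by `.sort_by(...)`, `.return_fields(...)`, and `.dialect(2)`.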

@ -1,105 +0,0 @@
aiohttp==3.8.4
aiosignal==1.3.1
anyio==3.6.2
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
async-timeout==4.0.2
attrs==23.1.0
backcall==0.2.0
beautifulsoup4==4.12.2
bleach==6.0.0
certifi==2023.5.7
cffi==1.15.1
charset-normalizer==3.1.0
comm==0.1.3
debugpy==1.6.7
decorator==5.1.1
defusedxml==0.7.1
et-xmlfile==1.1.0
executing==1.2.0
fastjsonschema==2.16.3
fqdn==1.5.1
frozenlist==1.3.3
idna==3.4
ipykernel==6.22.0
ipython==8.13.2
ipython-genutils==0.2.0
ipywidgets==8.0.6
isoduration==20.11.0
jedi==0.18.2
Jinja2==3.1.2
jsonpointer==2.3
jsonschema==4.17.3
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.6.3
jupyter_client==8.2.0
jupyter_core==5.3.0
jupyter_server==2.5.0
jupyter_server_terminals==0.4.4
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.7
MarkupSafe==2.1.2
matplotlib-inline==0.1.6
mistune==2.0.5
multidict==6.0.4
nbclassic==1.0.0
nbclient==0.7.4
nbconvert==7.3.1
nbformat==5.8.0
nest-asyncio==1.5.6
notebook==6.5.4
notebook_shim==0.2.3
numpy==1.24.3
openai==0.27.6
openpyxl==3.1.2
packaging==23.1
pandas==2.0.1
pandas-stubs==2.0.1.230501
pandocfilters==1.5.0
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
platformdirs==3.5.0
prometheus-client==0.16.0
prompt-toolkit==3.0.38
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pycparser==2.21
Pygments==2.15.1
pyrsistent==0.19.3
python-dateutil==2.8.2
python-dotenv==1.0.0
python-json-logger==2.0.7
pytz==2023.3
PyYAML==6.0
pyzmq==25.0.2
qtconsole==5.4.2
QtPy==2.3.1
redis==4.5.5
requests==2.30.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
Send2Trash==1.8.2
six==1.16.0
sniffio==1.3.0
soupsieve==2.4.1
stack-data==0.6.2
terminado==0.17.1
tinycss2==1.2.1
tornado==6.3.1
tqdm==4.65.0
traitlets==5.9.0
types-pytz==2023.3.0.0
tzdata==2023.3
uri-template==1.2.0
urllib3==2.0.2
wcwidth==0.2.6
webcolors==1.13
webencodings==0.5.1
websocket-client==1.5.1
widgetsnbextension==4.0.7
yarl==1.9.2