mirror of
https://github.com/openai/openai-cookbook
synced 2024-11-17 15:29:46 +00:00
Merge pull request #81 from kacperlukawski/qdrant-example
Add Qdrant as another example of vector database
This commit is contained in:
commit
2fed004763
@ -69,6 +69,9 @@
|
|||||||
"# Weaviate's client library for Python\n",
|
"# Weaviate's client library for Python\n",
|
||||||
"import weaviate\n",
|
"import weaviate\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"# Qdrant's client library for Python\n",
|
||||||
|
"import qdrant_client\n",
|
||||||
|
"\n",
|
||||||
"# I've set this to our new embeddings model, this can be changed to the embedding model of your choice\n",
|
"# I've set this to our new embeddings model, this can be changed to the embedding model of your choice\n",
|
||||||
"EMBEDDING_MODEL = \"text-embedding-ada-002\"\n",
|
"EMBEDDING_MODEL = \"text-embedding-ada-002\"\n",
|
||||||
"\n",
|
"\n",
|
||||||
@ -1048,7 +1051,313 @@
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"id": "ad74202e",
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Qdrant\n",
|
||||||
|
"\n",
|
||||||
|
"The last vector database we'll consider is **[Qdrant](https://qdrant.tech/)**. This is a high-performance vector search database written in Rust. It offers both on-premise and cloud versions, but for the purposes of this example we're going to use the local deployment mode.\n",
|
||||||
|
"\n",
|
||||||
|
"Setting everything up will require:\n",
|
||||||
|
"- Spinning up a local instance of Qdrant\n",
|
||||||
|
"- Configuring the collection and storing the data in it\n",
|
||||||
|
"- Trying it out with some queries"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Setup\n",
|
||||||
|
"\n",
|
||||||
|
"For the local deployment, we are going to use Docker, according to the Qdrant documentation: https://qdrant.tech/documentation/quick_start/. Qdrant requires just a single container, but an example of the docker-compose.yaml file is available at `./qdrant/docker-compose.yaml` in this repo.\n",
|
||||||
|
"\n",
|
||||||
|
"You can start a Qdrant instance locally by navigating to this directory and running `docker-compose up -d`"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 27,
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2023-01-18T09:28:38.928205Z",
|
||||||
|
"start_time": "2023-01-18T09:28:38.913987Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"qdrant = qdrant_client.QdrantClient(host='localhost', prefer_grpc=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 29,
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2023-01-18T09:29:19.806639Z",
|
||||||
|
"start_time": "2023-01-18T09:29:19.727897Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"text/plain": [
|
||||||
|
"CollectionsResponse(collections=[])"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"execution_count": 29,
|
||||||
|
"metadata": {},
|
||||||
|
"output_type": "execute_result"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"qdrant.get_collections()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Index data\n",
|
||||||
|
"\n",
|
||||||
|
"Qdrant stores data in __collections__, where each object is described by at least one vector and may contain additional metadata called a __payload__. Our collection will be called **Articles** and each object will be described by both **title** and **content** vectors.\n",
|
||||||
|
"\n",
|
||||||
|
"We're going to use the official [qdrant-client](https://github.com/qdrant/qdrant_client) package, which has all the utility methods already built in."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 30,
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2023-01-18T09:29:22.530121Z",
|
||||||
|
"start_time": "2023-01-18T09:29:22.524604Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from qdrant_client.http import models as rest"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 34,
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2023-01-18T09:31:14.413334Z",
|
||||||
|
"start_time": "2023-01-18T09:31:13.619079Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"vector_size = len(article_df['content_vector'][0])\n",
|
||||||
|
"\n",
|
||||||
|
"qdrant.recreate_collection(\n",
|
||||||
|
" collection_name='Articles',\n",
|
||||||
|
" vectors_config={\n",
|
||||||
|
" 'title': rest.VectorParams(\n",
|
||||||
|
" distance=rest.Distance.COSINE,\n",
|
||||||
|
" size=vector_size,\n",
|
||||||
|
" ),\n",
|
||||||
|
" 'content': rest.VectorParams(\n",
|
||||||
|
" distance=rest.Distance.COSINE,\n",
|
||||||
|
" size=vector_size,\n",
|
||||||
|
" ),\n",
|
||||||
|
" }\n",
|
||||||
|
")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 37,
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2023-01-18T09:36:28.597535Z",
|
||||||
|
"start_time": "2023-01-18T09:36:24.108867Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"text/plain": [
|
||||||
|
"UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"execution_count": 37,
|
||||||
|
"metadata": {},
|
||||||
|
"output_type": "execute_result"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"qdrant.upsert(\n",
|
||||||
|
" collection_name='Articles',\n",
|
||||||
|
" points=[\n",
|
||||||
|
" rest.PointStruct(\n",
|
||||||
|
" id=k,\n",
|
||||||
|
" vector={\n",
|
||||||
|
" 'title': v['title_vector'],\n",
|
||||||
|
" 'content': v['content_vector'],\n",
|
||||||
|
" },\n",
|
||||||
|
" payload=v.to_dict(),\n",
|
||||||
|
" )\n",
|
||||||
|
" for k, v in article_df.iterrows()\n",
|
||||||
|
" ],\n",
|
||||||
|
")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 52,
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2023-01-18T09:58:13.825886Z",
|
||||||
|
"start_time": "2023-01-18T09:58:13.816248Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"text/plain": [
|
||||||
|
"CountResult(count=250)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"execution_count": 52,
|
||||||
|
"metadata": {},
|
||||||
|
"output_type": "execute_result"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# Check the collection size to make sure all the points have been stored\n",
|
||||||
|
"qdrant.count(collection_name='Articles')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Search Data\n",
|
||||||
|
"\n",
|
||||||
|
"Once the data is in Qdrant, we can start querying the collection for the closest vectors. We may provide an additional parameter `vector_name` to switch from title-based to content-based search."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 49,
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2023-01-18T09:50:35.265647Z",
|
||||||
|
"start_time": "2023-01-18T09:50:35.256065Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"def query_qdrant(query, collection_name, vector_name='title', top_k=20):\n",
|
||||||
|
"\n",
|
||||||
|
" # Creates embedding vector from user query\n",
|
||||||
|
" embedded_query = openai.Embedding.create(\n",
|
||||||
|
" input=query,\n",
|
||||||
|
" model=EMBEDDING_MODEL,\n",
|
||||||
|
" )['data'][0]['embedding']\n",
|
||||||
|
" \n",
|
||||||
|
" query_results = qdrant.search(\n",
|
||||||
|
" collection_name=collection_name,\n",
|
||||||
|
" query_vector=(\n",
|
||||||
|
" vector_name, embedded_query\n",
|
||||||
|
" ),\n",
|
||||||
|
" limit=top_k,\n",
|
||||||
|
" )\n",
|
||||||
|
" \n",
|
||||||
|
" return query_results"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 50,
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2023-01-18T09:50:46.545145Z",
|
||||||
|
"start_time": "2023-01-18T09:50:35.711020Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"0. Art (Score: 0.841)\n",
|
||||||
|
"1. Europe (Score: 0.839)\n",
|
||||||
|
"2. Italy (Score: 0.816)\n",
|
||||||
|
"3. Architecture (Score: 0.815)\n",
|
||||||
|
"4. Madrid (Score: 0.815)\n",
|
||||||
|
"5. France (Score: 0.812)\n",
|
||||||
|
"6. Belgium (Score: 0.808)\n",
|
||||||
|
"7. Austria (Score: 0.802)\n",
|
||||||
|
"8. London (Score: 0.799)\n",
|
||||||
|
"9. History (Score: 0.797)\n",
|
||||||
|
"10. Creativity (Score: 0.796)\n",
|
||||||
|
"11. Archaeology (Score: 0.795)\n",
|
||||||
|
"12. Cartography (Score: 0.794)\n",
|
||||||
|
"13. Denmark (Score: 0.793)\n",
|
||||||
|
"14. Finland (Score: 0.79)\n",
|
||||||
|
"15. English (Score: 0.789)\n",
|
||||||
|
"16. Catharism (Score: 0.788)\n",
|
||||||
|
"17. Dublin (Score: 0.787)\n",
|
||||||
|
"18. Ireland (Score: 0.787)\n",
|
||||||
|
"19. Japan (Score: 0.787)\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"query_results = query_qdrant('modern art in Europe', 'Articles')\n",
|
||||||
|
"for i, article in enumerate(query_results):\n",
|
||||||
|
" print(f'{i + 1}. {article.payload[\"title\"]} (Score: {round(article.score, 3)})')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 51,
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2023-01-18T09:53:11.038910Z",
|
||||||
|
"start_time": "2023-01-18T09:52:55.248029Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"1. History (Score: 0.797)\n",
|
||||||
|
"2. Dublin (Score: 0.787)\n",
|
||||||
|
"3. Ireland (Score: 0.786)\n",
|
||||||
|
"4. History of Australia (Score: 0.782)\n",
|
||||||
|
"5. Historian (Score: 0.778)\n",
|
||||||
|
"6. Belgium (Score: 0.776)\n",
|
||||||
|
"7. Black pudding (Score: 0.773)\n",
|
||||||
|
"8. London (Score: 0.769)\n",
|
||||||
|
"9. History of Spain (Score: 0.768)\n",
|
||||||
|
"10. Cartography (Score: 0.763)\n",
|
||||||
|
"11. March (Score: 0.762)\n",
|
||||||
|
"12. France (Score: 0.761)\n",
|
||||||
|
"13. Bubonic plague (Score: 0.76)\n",
|
||||||
|
"14. Great Lakes (Score: 0.759)\n",
|
||||||
|
"15. Inch (Score: 0.758)\n",
|
||||||
|
"16. Dissolution of the monasteries (Score: 0.758)\n",
|
||||||
|
"17. Austria (Score: 0.757)\n",
|
||||||
|
"18. English (Score: 0.757)\n",
|
||||||
|
"19. British English (Score: 0.757)\n",
|
||||||
|
"20. Armenia (Score: 0.756)\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# This time we're going to query using content vector\n",
|
||||||
|
"query_results = query_qdrant('Famous battles in Scottish history', 'Articles', 'content')\n",
|
||||||
|
"for i, article in enumerate(query_results):\n",
|
||||||
|
" print(f'{i + 1}. {article.payload[\"title\"]} (Score: {round(article.score, 3)})')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Thanks for following along, you're now equipped to set up your own vector databases and use embeddings to do all kinds of cool things - enjoy! For more complex use cases please continue to work through other cookbook examples in this repo."
|
"Thanks for following along, you're now equipped to set up your own vector databases and use embeddings to do all kinds of cool things - enjoy! For more complex use cases please continue to work through other cookbook examples in this repo."
|
||||||
|
8
examples/vector_databases/qdrant/docker-compose.yaml
Normal file
@ -0,0 +1,8 @@
|
|||||||
|
version: '3.4'
|
||||||
|
services:
|
||||||
|
qdrant:
|
||||||
|
image: qdrant/qdrant:v0.11.7
|
||||||
|
restart: on-failure
|
||||||
|
ports:
|
||||||
|
- "6333:6333"
|
||||||
|
- "6334:6334"
|
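A quick way to confirm that the container from this compose file is up before running the notebook cells is to probe the two published ports. The sketch below is not part of the commit: the helper name `is_port_open` and the `QDRANT_PORTS` mapping are hypothetical, and it assumes Qdrant's default port roles (6333 for the REST API, 6334 for gRPC, which is what `prefer_grpc=True` relies on).

```python
import socket

# Ports published by the docker-compose.yaml above.
# Assumed roles: 6333 serves the REST API, 6334 serves gRPC.
QDRANT_PORTS = {"rest": 6333, "grpc": 6334}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in QDRANT_PORTS.items():
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} port {port}: {state}")
```

A successful TCP connection only shows that something is listening on the port; calling `qdrant.get_collections()` as in the notebook is still the real check that the service answers.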