Add Qdrant as another example of vector database

pull/81/head
Kacper Łukawski 1 year ago
parent 225b9177c8
commit 5ee5fecb76

@ -69,6 +69,9 @@
"# Weaviate's client library for Python\n",
"import weaviate\n",
"\n",
"# Qdrant's client library for Python\n",
"import qdrant_client\n",
"\n",
"# I've set this to our new embeddings model, this can be changed to the embedding model of your choice\n",
"EMBEDDING_MODEL = \"text-embedding-ada-002\"\n",
"\n",
@ -1048,7 +1051,313 @@
},
{
"cell_type": "markdown",
"id": "ad74202e",
"metadata": {},
"source": [
"## Qdrant\n",
"\n",
"The last vector database we'll consider in **[Qdrant](https://qdrant.tech/)**. This is a high-performant vector search database written in Rust. It offers both on-premise and cloud version, but for the purposes of that example we're going to use the local deployment mode.\n",
"\n",
"Setting everything up will require:\n",
"- Spinning up a local instance of Qdrant\n",
"- Configuring the collection and storing the data in it\n",
"- Trying out with some queries"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup\n",
"\n",
"For the local deployment, we are going to use Docker, according to the Qdrant documentation: https://qdrant.tech/documentation/quick_start/. Qdrant requires just a single container, but an example of the docker-compose.yaml file is available at `./qdrant/docker-compose.yaml` in this repo.\n",
"\n",
"You can start Qdrant instance locally by navigating to this directory and running `docker-compose up -d `"
]
},
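Before creating the client, it can help to confirm that the container is actually accepting connections. The following is a minimal sketch using only the standard library. `wait_for_port` is a hypothetical helper (it is not part of qdrant-client), and the ports 6333 and 6334 are the ones mapped in the docker-compose file added by this change.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For instance, `wait_for_port('localhost', 6334)` could be called right after `docker-compose up -d` and before constructing the client. A successful TCP connection only shows that the port is open, not that Qdrant has finished starting up.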
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-18T09:28:38.928205Z",
"start_time": "2023-01-18T09:28:38.913987Z"
}
},
"outputs": [],
"source": [
"qdrant = qdrant_client.QdrantClient(host='localhost', prefer_grpc=True)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-18T09:29:19.806639Z",
"start_time": "2023-01-18T09:29:19.727897Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"CollectionsResponse(collections=[])"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"qdrant.get_collections()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Index data\n",
"\n",
"Qdrant stores data in __collections__ where each object is described by at least one vector and may contain an additional metadata called __payload__. Our collection will be called **Articles** and each object will be described by both **title** and **content** vectors.\n",
"\n",
"We're going to be using an official [qdrant-client](https://github.com/qdrant/qdrant_client) package that has all the utility methods already built-in."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-18T09:29:22.530121Z",
"start_time": "2023-01-18T09:29:22.524604Z"
}
},
"outputs": [],
"source": [
"from qdrant_client.http import models as rest"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-18T09:31:14.413334Z",
"start_time": "2023-01-18T09:31:13.619079Z"
}
},
"outputs": [],
"source": [
"vector_size = len(article_df['content_vector'][0])\n",
"\n",
"qdrant.recreate_collection(\n",
" collection_name='Articles',\n",
" vectors_config={\n",
" 'title': rest.VectorParams(\n",
" distance=rest.Distance.COSINE,\n",
" size=vector_size,\n",
" ),\n",
" 'content': rest.VectorParams(\n",
" distance=rest.Distance.COSINE,\n",
" size=vector_size,\n",
" ),\n",
" }\n",
")"
]
},
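The collection above takes its vector size from the first content vector, so it assumes every title and content embedding has that same length. A minimal sketch of a sanity check that could run before `recreate_collection` follows. `find_mismatched_rows` is a hypothetical helper, and the rows are assumed to be dictionaries with `title_vector` and `content_vector` keys, mirroring the columns of `article_df`.

```python
from typing import Iterable, List, Mapping, Sequence


def find_mismatched_rows(
    rows: Iterable[Mapping[str, Sequence[float]]],
    expected_size: int,
    vector_columns: Sequence[str] = ("title_vector", "content_vector"),
) -> List[int]:
    """Return the positions of rows where any vector column differs from expected_size."""
    bad = []
    for position, row in enumerate(rows):
        if any(len(row[column]) != expected_size for column in vector_columns):
            bad.append(position)
    return bad
```

With the notebook's DataFrame, `article_df.to_dict('records')` should yield rows in this shape. An empty result means all vectors match the size used for the collection.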
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-18T09:36:28.597535Z",
"start_time": "2023-01-18T09:36:24.108867Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"qdrant.upsert(\n",
" collection_name='Articles',\n",
" points=[\n",
" rest.PointStruct(\n",
" id=k,\n",
" vector={\n",
" 'title': v['title_vector'],\n",
" 'content': v['content_vector'],\n",
" },\n",
" payload=v.to_dict(),\n",
" )\n",
" for k, v in article_df.iterrows()\n",
" ],\n",
")"
]
},
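The upsert above sends all points in a single request, which is fine for a few hundred articles. For larger datasets, one option is to split the points into fixed-size chunks and call `upsert` once per chunk. A sketch of that idea, where `chunked` is a hypothetical helper and the batch size of 100 is an arbitrary assumption:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical usage with the notebook's client, given a list of PointStruct objects:
# for batch in chunked(points, 100):
#     qdrant.upsert(collection_name='Articles', points=batch)
```

Because `chunked` accepts any iterable, the points could also be produced lazily by a generator instead of being built as one large list.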
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-18T09:58:13.825886Z",
"start_time": "2023-01-18T09:58:13.816248Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"CountResult(count=250)"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Check the collection size to make sure all the points have been stored\n",
"qdrant.count(collection_name='Articles')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Search Data\n",
"\n",
"Once the data is put into Qdrant we can start querying the collection for the closest vectors. We may provide an additional parameter `vector_name` to switch from title to content based search."
]
},
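Because the collection was created with `Distance.COSINE`, the score attached to each hit is a cosine similarity between the query embedding and the stored vector, with higher values meaning closer matches. The sketch below reproduces that ranking in pure Python on toy 2-dimensional vectors. The function names and data are illustrative assumptions, and the brute-force scan only demonstrates the scoring, not how Qdrant searches internally.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy data: made-up 2-d "embeddings" keyed by article title
toy_vectors = {"Art": [1.0, 0.0], "Europe": [0.6, 0.8], "Inch": [0.0, 1.0]}
ranked = top_k([1.0, 0.0], toy_vectors, k=2)
```

Here `ranked` lists "Art" first and "Europe" second, since their vectors point most nearly in the same direction as the query.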
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-18T09:50:35.265647Z",
"start_time": "2023-01-18T09:50:35.256065Z"
}
},
"outputs": [],
"source": [
"def query_qdrant(query, collection_name, vector_name='title', top_k=20):\n",
"\n",
" # Creates embedding vector from user query\n",
" embedded_query = openai.Embedding.create(\n",
" input=query,\n",
" model=EMBEDDING_MODEL,\n",
" )['data'][0]['embedding']\n",
" \n",
" query_results = qdrant.search(\n",
" collection_name=collection_name,\n",
" query_vector=(\n",
" vector_name, embedded_query\n",
" ),\n",
" limit=top_k,\n",
" )\n",
" \n",
" return query_results"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-18T09:50:46.545145Z",
"start_time": "2023-01-18T09:50:35.711020Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0. Art (Score: 0.841)\n",
"1. Europe (Score: 0.839)\n",
"2. Italy (Score: 0.816)\n",
"3. Architecture (Score: 0.815)\n",
"4. Madrid (Score: 0.815)\n",
"5. France (Score: 0.812)\n",
"6. Belgium (Score: 0.808)\n",
"7. Austria (Score: 0.802)\n",
"8. London (Score: 0.799)\n",
"9. History (Score: 0.797)\n",
"10. Creativity (Score: 0.796)\n",
"11. Archaeology (Score: 0.795)\n",
"12. Cartography (Score: 0.794)\n",
"13. Denmark (Score: 0.793)\n",
"14. Finland (Score: 0.79)\n",
"15. English (Score: 0.789)\n",
"16. Catharism (Score: 0.788)\n",
"17. Dublin (Score: 0.787)\n",
"18. Ireland (Score: 0.787)\n",
"19. Japan (Score: 0.787)\n"
]
}
],
"source": [
"query_results = query_qdrant('modern art in Europe', 'Articles')\n",
"for i, article in enumerate(query_results):\n",
" print(f'{i + 1}. {article.payload[\"title\"]} (Score: {round(article.score, 3)})')"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-18T09:53:11.038910Z",
"start_time": "2023-01-18T09:52:55.248029Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1. History (Score: 0.797)\n",
"2. Dublin (Score: 0.787)\n",
"3. Ireland (Score: 0.786)\n",
"4. History of Australia (Score: 0.782)\n",
"5. Historian (Score: 0.778)\n",
"6. Belgium (Score: 0.776)\n",
"7. Black pudding (Score: 0.773)\n",
"8. London (Score: 0.769)\n",
"9. History of Spain (Score: 0.768)\n",
"10. Cartography (Score: 0.763)\n",
"11. March (Score: 0.762)\n",
"12. France (Score: 0.761)\n",
"13. Bubonic plague (Score: 0.76)\n",
"14. Great Lakes (Score: 0.759)\n",
"15. Inch (Score: 0.758)\n",
"16. Dissolution of the monasteries (Score: 0.758)\n",
"17. Austria (Score: 0.757)\n",
"18. English (Score: 0.757)\n",
"19. British English (Score: 0.757)\n",
"20. Armenia (Score: 0.756)\n"
]
}
],
"source": [
"# This time we're going to query using content vector\n",
"query_results = query_qdrant('Famous battles in Scottish history', 'Articles', 'content')\n",
"for i, article in enumerate(query_results):\n",
" print(f'{i + 1}. {article.payload[\"title\"]} (Score: {round(article.score, 3)})')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Thanks for following along, you're now equipped to set up your own vector databases and use embeddings to do all kinds of cool things - enjoy! For more complex use cases please continue to work through other cookbook examples in this repo."

@ -0,0 +1,8 @@
version: '3.4'
services:
qdrant:
image: qdrant/qdrant:v0.11.7
restart: on-failure
ports:
- "6333:6333"
- "6334:6334"