" - *Setup*: Here we'll set up the Python client for Weaviate. For more details, see the [Weaviate Python client documentation](https://weaviate.io/developers/weaviate/current/client-libraries/python.html)\n",
" - *Index Data*: We'll create an index containing __title__ search vectors\n",
" - *Search Data*: We'll run a few searches to confirm it works\n",
"- **Milvus**\n",
" - *Setup*: Here we'll set up the Python client for Milvus. For more details, see the [Milvus documentation](https://milvus.io/docs)\n",
" - *Index Data*: We'll create a collection and index it for both __titles__ and __content__\n",
" - *Search Data*: We'll test out both collections with search queries to confirm they work\n",
"- **Qdrant**\n",
" - *Setup*: Here we'll set up the Python client for Qdrant. For more details, see the [qdrant_client repository](https://github.com/qdrant/qdrant_client)\n",
" - *Index Data*: We'll create a collection with vectors for __titles__ and __content__\n",
"# We'll need to install the clients for all vector databases\n",
"The next vector database we will look at is **Milvus**, which, like the previous two, offers a SaaS option as well as self-hosted deployments using either Helm or Docker Compose. In keeping with our open-source focus, we will use a self-hosted deployment here.\n",
"\n",
"In this example we will:\n",
"- Set up a local Docker Compose-based deployment\n",
"- Create the title and content collections\n",
"- Store our data\n",
"- Test out our system with real-world searches"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "fe4914e9",
"metadata": {},
"source": [
"### Setup\n",
"\n",
"There are many ways to run Milvus (see the [installation guide](https://milvus.io/docs/install_cluster-milvusoperator.md)), but here we will stick to a simple standalone Milvus instance run with Docker Compose.\n",
"\n",
"A simple Docker Compose file is provided at `./milvus/docker-compose.yaml`. Start it with `docker-compose up` from within that directory, or with `docker-compose -f path/to/file up` from anywhere else.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e10f2ed",
"metadata": {},
"outputs": [],
"source": [
"from pymilvus import connections\n",
"\n",
"connections.connect(host='localhost', port=19530) # Local instance defaults to port 19530"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "64ffed22",
"metadata": {},
"source": [
"### Index data\n",
"\n",
"In Milvus, data is stored in collections, and each collection can hold the vectors together with any attributes that accompany them.\n",
"\n",
"In this case we'll create a collection called **articles** that contains the url, title, text, and content_embedding fields.\n",
"\n",
"We will also create an index on the content embedding. Milvus supports many state-of-the-art indexing methods; in this case we will use HNSW.\n",
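"\n",
"A minimal sketch of how the **articles** collection and its HNSW index could be created with the `pymilvus` ORM API (2.x), assuming the connection opened above. This is not the notebook's own implementation: the primary-key field `id`, the `max_length` values, the 1536-dimensional embedding (the output size of OpenAI's `text-embedding-ada-002`, assumed here to be the embedding model), the `L2` metric, and the HNSW parameters `M` and `efConstruction` are all assumptions to adapt to your data.\n",
"\n",
"```python\n",
"from pymilvus import Collection, CollectionSchema, DataType, FieldSchema\n",
"\n",
"# Assumption: embeddings are 1536-dimensional (e.g. text-embedding-ada-002)\n",
"EMBEDDING_DIM = 1536\n",
"\n",
"# Fields of the articles collection; the max_length values are illustrative\n",
"# assumptions, and texts longer than max_length must be truncated before insertion\n",
"fields = [\n",
"    FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True),\n",
"    FieldSchema(name='url', dtype=DataType.VARCHAR, max_length=1000),\n",
"    FieldSchema(name='title', dtype=DataType.VARCHAR, max_length=1000),\n",
"    FieldSchema(name='text', dtype=DataType.VARCHAR, max_length=65535),\n",
"    FieldSchema(name='content_embedding', dtype=DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM),\n",
"]\n",
"schema = CollectionSchema(fields=fields, description='articles with content embeddings')\n",
"\n",
"# Create the collection (uses the connection opened with connections.connect above)\n",
"collection = Collection(name='articles', schema=schema)\n",
"\n",
"# HNSW index on the content embedding; metric and parameters are assumptions\n",
"index_params = {\n",
"    'index_type': 'HNSW',\n",
"    'metric_type': 'L2',\n",
"    'params': {'M': 16, 'efConstruction': 200},\n",
"}\n",
"collection.create_index(field_name='content_embedding', index_params=index_params)\n",
"\n",
"# Load the collection into memory so it can be searched\n",
"collection.load()\n",
"```\n",
"\n",
"Larger `M` and `efConstruction` values generally improve recall at the cost of memory and index build time; at query time, HNSW's `ef` search parameter plays a similar role.\n",
"\n",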
"Once the data is inserted into Milvus, we can perform searches. In this example, the search function takes a single argument, `top_k`, which is the number of closest matches to return."