{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Meilisearch\n", "\n", "> [Meilisearch](https://meilisearch.com) is an open-source, lightning-fast, and hyper relevant search engine. It comes with great defaults to help developers build snappy search experiences.\n", ">\n", "> You can [self-host Meilisearch](https://www.meilisearch.com/docs/learn/getting_started/installation#local-installation) or run it on [Meilisearch Cloud](https://www.meilisearch.com/pricing).\n", "\n", "Meilisearch v1.3 supports vector search. This page guides you through integrating Meilisearch as a vector store and using it to perform vector search." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "### Launching a Meilisearch instance\n", "\n", "You will need a running Meilisearch instance to use as your vector store. You can run [Meilisearch locally](https://www.meilisearch.com/docs/learn/getting_started/installation#local-installation) or create a [Meilisearch Cloud](https://cloud.meilisearch.com/) account.\n", "\n", "As of Meilisearch v1.3, vector storage is an experimental feature. After launching your Meilisearch instance, you need to **enable vector storage**. For self-hosted Meilisearch, read the docs on [enabling experimental features](https://www.meilisearch.com/docs/learn/experimental/vector-search). On **Meilisearch Cloud**, enable _Vector Store_ via your project _Settings_ page.\n", "\n", "You should now have a running Meilisearch instance with vector storage enabled. 🎉\n", "\n", "### Credentials\n", "\n", "To interact with your Meilisearch instance, the Meilisearch SDK needs a host (the URL of your instance) and an API key.\n", "\n", "**Host**\n", "\n", "- **Locally**, the default host is `localhost:7700`\n", "- On **Meilisearch Cloud**, find the host in your project _Settings_ page\n", "\n", "**API keys**\n", "\n", "A Meilisearch instance provides three API keys out of the box:\n", "- A `MASTER KEY`: use it only to create your Meilisearch instance\n", "- An `ADMIN KEY`: use it only server-side to update your database and its settings\n", "- A `SEARCH KEY`: a key that you can safely share in front-end applications\n", "\n", "You can create [additional API keys](https://www.meilisearch.com/docs/learn/security/master_api_keys) as needed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Installing dependencies\n", "\n", "This guide uses the [Meilisearch Python SDK](https://github.com/meilisearch/meilisearch-python). You can install it by running:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install meilisearch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more information, refer to the [Meilisearch Python SDK documentation](https://meilisearch.github.io/meilisearch-python/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examples\n", "\n", "You can initialize the Meilisearch vector store in two ways: by providing a Meilisearch client, or by providing the _URL_ and _API key_. In our examples, the credentials are loaded from the environment.\n", "\n", "You can make environment variables available in your notebook environment by using `os` and `getpass`. You can use this technique for all the following examples."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import getpass\n", "\n", "os.environ[\"MEILI_HTTP_ADDR\"] = getpass.getpass(\"Meilisearch HTTP address and port:\")\n", "os.environ[\"MEILI_MASTER_KEY\"] = getpass.getpass(\"Meilisearch API Key:\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to use `OpenAIEmbeddings`, so we also need an OpenAI API key." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding text and embeddings\n", "\n", "This example adds text to the Meilisearch vector database without having to initialize a Meilisearch vector store." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from langchain.vectorstores import Meilisearch\n", "from langchain.embeddings.openai import OpenAIEmbeddings\n", "from langchain.text_splitter import CharacterTextSplitter\n", "\n", "embeddings = OpenAIEmbeddings()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(\"../../../state_of_the_union.txt\") as f:\n", "    state_of_the_union = f.read()\n", "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", "texts = text_splitter.split_text(state_of_the_union)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use the Meilisearch vector store to store texts and their associated embeddings as vectors\n", "vector_store = Meilisearch.from_texts(texts=texts, embedding=embeddings)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Behind the scenes, the texts are converted to vectors and stored in Meilisearch. This gives the same result as the following example."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding documents and embeddings\n", "\n", "In this example, we'll use LangChain's `CharacterTextSplitter` to split the text into multiple documents. Then, we'll store these documents along with their embeddings." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from langchain.document_loaders import TextLoader\n", "\n", "# Load text\n", "loader = TextLoader(\"../../../state_of_the_union.txt\")\n", "documents = loader.load()\n", "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", "\n", "# Split the loaded text into chunked documents\n", "docs = text_splitter.split_documents(documents)\n", "\n", "# Import the chunked documents & their embeddings into the vector store\n", "vector_store = Meilisearch.from_documents(documents=docs, embedding=embeddings)\n", "\n", "# Search in our vector store\n", "query = \"What did the president say about Ketanji Brown Jackson\"\n", "results = vector_store.similarity_search(query)\n", "print(results[0].page_content)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Add documents by creating a Meilisearch vector store" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "In this approach, we create a vector store object and add documents to it." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from langchain.vectorstores import Meilisearch\n", "import meilisearch\n", "\n", "client = meilisearch.Client(url=\"http://127.0.0.1:7700\", api_key=\"***\")\n", "vector_store = Meilisearch(\n", "    embedding=embeddings, client=client, index_name=\"langchain_demo\", text_key=\"text\"\n", ")\n", "# Add the chunked documents created in the previous example\n", "vector_store.add_documents(docs)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Similarity Search with score\n", "\n", "This method returns the matching documents along with the distance score between the query and each of them."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "docs_and_scores = vector_store.similarity_search_with_score(query)\n", "docs_and_scores[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Similarity Search by vector" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Embed the query ourselves, then search with the resulting vector\n", "embedding_vector = embeddings.embed_query(query)\n", "docs_by_vector = vector_store.similarity_search_by_vector(embedding_vector)\n", "docs_by_vector[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Additional resources\n", "\n", "Documentation\n", "- [Meilisearch](https://www.meilisearch.com/docs/)\n", "- [Meilisearch Python SDK](https://python-sdk.meilisearch.com)\n", "\n", "Open-source repositories\n", "- [Meilisearch repository](https://github.com/meilisearch/meilisearch)\n", "- [Meilisearch Python SDK](https://github.com/meilisearch/meilisearch-python)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 2 }