{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "683953b3", "metadata": {}, "source": [ "# MyScale\n", "\n", ">[MyScale](https://docs.myscale.com/en/overview/) is a cloud-based database optimized for AI applications and solutions, built on the open-source [ClickHouse](https://github.com/ClickHouse/ClickHouse). \n", "\n", "This notebook shows how to use functionality related to the `MyScale` vector database." ] }, { "attachments": {}, "cell_type": "markdown", "id": "43ead5d5-2c1f-4dce-a69a-cb00e4f9d6f0", "metadata": {}, "source": [ "## Setting up environments" ] }, { "cell_type": "code", "execution_count": null, "id": "7dccc580-8270-4714-ad61-f79783dd6eea", "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install clickhouse-connect" ] }, { "attachments": {}, "cell_type": "markdown", "id": "15a1d477-9cdb-4d82-b019-96951ecb2b72", "metadata": {}, "source": [ "We want to use `OpenAIEmbeddings`, so we need an OpenAI API key." ] }, { "cell_type": "code", "execution_count": null, "id": "91003ea5-0c8c-436c-a5de-aaeaeef2f458", "metadata": {}, "outputs": [], "source": [ "import os\n", "import getpass\n", "\n", "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a9d16fa3", "metadata": {}, "source": [ "There are two ways to set up parameters for the MyScale index.\n", "\n", "1. Environment Variables\n", "\n", " Before you run the app, set the environment variables with `export`:\n", " `export MYSCALE_HOST='' MYSCALE_PORT= MYSCALE_USERNAME= MYSCALE_PASSWORD= ...`\n", "\n", " You can easily find your account, password and other info on our SaaS. For details, please refer to [this document](https://docs.myscale.com/en/cluster-management/).\n", "\n", " Every attribute under `MyScaleSettings` can be set with the prefix `MYSCALE_` and is case insensitive.\n", "\n", "2. 
Create a `MyScaleSettings` object with parameters\n", "\n", "\n", " ```python\n", " from langchain.vectorstores import MyScale, MyScaleSettings\n", " config = MyScaleSettings(host=\"\", port=8443, ...)\n", " index = MyScale(embedding_function, config)\n", " index.add_documents(...)\n", " ```" ] }, { "cell_type": "code", "execution_count": null, "id": "aac9563e", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.embeddings.openai import OpenAIEmbeddings\n", "from langchain.text_splitter import CharacterTextSplitter\n", "from langchain.vectorstores import MyScale\n", "from langchain.document_loaders import TextLoader" ] }, { "cell_type": "code", "execution_count": null, "id": "a3c3999a", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.document_loaders import TextLoader\n", "\n", "loader = TextLoader(\"../../../state_of_the_union.txt\")\n", "documents = loader.load()\n", "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", "docs = text_splitter.split_documents(documents)\n", "\n", "embeddings = OpenAIEmbeddings()" ] }, { "cell_type": "code", "execution_count": null, "id": "6e104aee", "metadata": {}, "outputs": [], "source": [ "for d in docs:\n", " d.metadata = {\"some\": \"metadata\"}\n", "docsearch = MyScale.from_documents(docs, embeddings)\n", "\n", "query = \"What did the president say about Ketanji Brown Jackson\"\n", "docs = docsearch.similarity_search(query)" ] }, { "cell_type": "code", "execution_count": null, "id": "9c608226", "metadata": {}, "outputs": [], "source": [ "print(docs[0].page_content)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e3a8b105", "metadata": {}, "source": [ "## Get connection info and data schema" ] }, { "cell_type": "code", "execution_count": null, "id": "69996818", "metadata": {}, "outputs": [], "source": [ "print(str(docsearch))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f59360c0", "metadata": {}, "source": [ "## Filtering\n", "\n", "You can 
have direct access to the MyScale SQL `WHERE` statement. You can write a `WHERE` clause following standard SQL.\n", "\n", "**NOTE**: Be aware of SQL injection; this interface must not be called directly by end users.\n", "\n", "If you customized your `column_map` in your settings, you can search with a filter like this:" ] }, { "cell_type": "code", "execution_count": null, "id": "232055f6", "metadata": {}, "outputs": [], "source": [ "from langchain.vectorstores import MyScale, MyScaleSettings\n", "from langchain.document_loaders import TextLoader\n", "\n", "loader = TextLoader(\"../../../state_of_the_union.txt\")\n", "documents = loader.load()\n", "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", "docs = text_splitter.split_documents(documents)\n", "\n", "embeddings = OpenAIEmbeddings()\n", "\n", "for i, d in enumerate(docs):\n", " d.metadata = {\"doc_id\": i}\n", "\n", "docsearch = MyScale.from_documents(docs, embeddings)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8d867b05", "metadata": {}, "source": [ "### Similarity search with score" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9ec25cc5", "metadata": {}, "source": [ "The returned distance score is cosine distance. Therefore, a lower score is better." 
] }, { "cell_type": "code", "execution_count": null, "id": "ddbcee77", "metadata": {}, "outputs": [], "source": [ "meta = docsearch.metadata_column\n", "output = docsearch.similarity_search_with_relevance_scores(\n", " \"What did the president say about Ketanji Brown Jackson?\",\n", " k=4,\n", " where_str=f\"{meta}.doc_id<10\",\n", ")\n", "for d, dist in output:\n", " print(dist, d.metadata, d.page_content[:20] + \"...\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a359ed74", "metadata": {}, "source": [ "## Deleting your data" ] }, { "cell_type": "code", "execution_count": null, "id": "fb6a9d36", "metadata": {}, "outputs": [], "source": [ "docsearch.drop()" ] }, { "cell_type": "code", "execution_count": null, "id": "48dbd8e0", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 5 }