{ "cells": [ { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "# DashVector\n", "\n", "> [DashVector](https://help.aliyun.com/document_detail/2510225.html) is a fully managed vector database service that supports high-dimensional dense and sparse vectors, real-time insertion, and filtered search. It is built to scale automatically and can adapt to different application requirements.\n", "\n", "This notebook shows how to use the `DashVector` vector database.\n", "\n", "To use DashVector, you must have an API key.\n", "Here are the [installation instructions](https://help.aliyun.com/document_detail/2510223.html)." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## Install" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "!pip install dashvector dashscope" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "This example uses `DashScopeEmbeddings`, so we also need a DashScope API key."
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "pycharm": { "name": "#%%\n", "is_executing": true }, "ExecuteTime": { "end_time": "2023-08-11T10:37:15.091585Z", "start_time": "2023-08-11T10:36:51.859753Z" } }, "outputs": [], "source": [ "import os\n", "import getpass\n", "\n", "os.environ[\"DASHVECTOR_API_KEY\"] = getpass.getpass(\"DashVector API Key:\")\n", "os.environ[\"DASHSCOPE_API_KEY\"] = getpass.getpass(\"DashScope API Key:\")" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## Example" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "pycharm": { "name": "#%%\n", "is_executing": true }, "ExecuteTime": { "end_time": "2023-08-11T10:42:30.243460Z", "start_time": "2023-08-11T10:42:27.783785Z" } }, "outputs": [], "source": [ "from langchain.embeddings.dashscope import DashScopeEmbeddings\n", "from langchain.text_splitter import CharacterTextSplitter\n", "from langchain.vectorstores import DashVector" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" }, "ExecuteTime": { "end_time": "2023-08-11T10:42:30.391580Z", "start_time": "2023-08-11T10:42:30.249021Z" } }, "outputs": [], "source": [ "from langchain.document_loaders import TextLoader\n", "\n", "loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n", "documents = loader.load()\n", "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", "docs = text_splitter.split_documents(documents)\n", "\n", "embeddings = DashScopeEmbeddings()" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "We can create DashVector from documents." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "dashvector = DashVector.from_documents(docs, embeddings)\n", "\n", "query = \"What did the president say about Ketanji Brown Jackson\"\n", "docs = dashvector.similarity_search(query)\n", "print(docs)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "We can add texts with metadata and IDs, and search with a metadata filter." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "pycharm": { "name": "#%%\n" }, "ExecuteTime": { "end_time": "2023-08-11T10:42:51.641309Z", "start_time": "2023-08-11T10:42:51.132109Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Document(page_content='baz', metadata={'key': 2})]\n" ] } ], "source": [ "texts = [\"foo\", \"bar\", \"baz\"]\n", "metadatas = [{\"key\": i} for i in range(len(texts))]\n", "ids = [\"0\", \"1\", \"2\"]\n", "\n", "dashvector.add_texts(texts, metadatas=metadatas, ids=ids)\n", "\n", "docs = dashvector.similarity_search(\"foo\", filter=\"key = 2\")\n", "print(docs)" ] }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [], "metadata": { "collapsed": false } } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" } }, "nbformat": 4, "nbformat_minor": 1 }