{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# DashVector\n",
"\n",
"> [DashVector](https://help.aliyun.com/document_detail/2510225.html) is a fully managed vector database service that supports high-dimensional dense and sparse vectors, real-time insertion, and filtered search. It scales automatically and can adapt to different application requirements.\n",
"\n",
"This notebook shows how to use the `DashVector` vector database.\n",
"\n",
"To use DashVector, you must have an API key.\n",
"Here are the [installation instructions](https://help.aliyun.com/document_detail/2510223.html)."
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Install"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"!pip install dashvector dashscope"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"This example uses `DashScopeEmbeddings`, so you also need a DashScope API key."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"pycharm": {
"name": "#%%\n",
"is_executing": true
},
"ExecuteTime": {
"end_time": "2023-08-11T10:37:15.091585Z",
"start_time": "2023-08-11T10:36:51.859753Z"
}
},
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ[\"DASHVECTOR_API_KEY\"] = getpass.getpass(\"DashVector API Key:\")\n",
"os.environ[\"DASHSCOPE_API_KEY\"] = getpass.getpass(\"DashScope API Key:\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Example"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"pycharm": {
"name": "#%%\n",
"is_executing": true
},
"ExecuteTime": {
"end_time": "2023-08-11T10:42:30.243460Z",
"start_time": "2023-08-11T10:42:27.783785Z"
}
},
"outputs": [],
"source": [
"from langchain.embeddings.dashscope import DashScopeEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import DashVector"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"pycharm": {
"is_executing": true,
"name": "#%%\n"
},
"ExecuteTime": {
"end_time": "2023-08-11T10:42:30.391580Z",
"start_time": "2023-08-11T10:42:30.249021Z"
}
},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = DashScopeEmbeddings()"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Create a `DashVector` store from the documents, then run a similarity search against it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"dashvector = DashVector.from_documents(docs, embeddings)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = dashvector.similarity_search(query)\n",
"print(docs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"You can also add texts with metadata and IDs, then search with a metadata filter."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"pycharm": {
"name": "#%%\n"
},
"ExecuteTime": {
"end_time": "2023-08-11T10:42:51.641309Z",
"start_time": "2023-08-11T10:42:51.132109Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Document(page_content='baz', metadata={'key': 2})]\n"
]
}
],
"source": [
"texts = [\"foo\", \"bar\", \"baz\"]\n",
"metadatas = [{\"key\": i} for i in range(len(texts))]\n",
"ids = [\"0\", \"1\", \"2\"]\n",
"\n",
"dashvector.add_texts(texts, metadatas=metadatas, ids=ids)\n",
"\n",
"docs = dashvector.similarity_search(\"foo\", filter=\"key = 2\")\n",
"print(docs)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 1
}