langchain/docs/extras/integrations/vectorstores/alibabacloud_opensearch.ipynb

350 lines
9.8 KiB
Plaintext
Raw Normal View History

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Alibaba Cloud OpenSearch\n",
"\n",
">[Alibaba Cloud Opensearch](https://www.alibabacloud.com/product/opensearch) is a one-stop platform to develop intelligent search services. `OpenSearch` was built on the large-scale distributed search engine developed by `Alibaba`. `OpenSearch` serves more than 500 business cases in Alibaba Group and thousands of Alibaba Cloud customers. `OpenSearch` helps develop search services in different search scenarios, including e-commerce, O2O, multimedia, the content industry, communities and forums, and big data query in enterprises.\n",
"\n",
">`OpenSearch` helps you develop high quality, maintenance-free, and high performance intelligent search services to provide your users with high search efficiency and accuracy.\n",
"\n",
">`OpenSearch` provides the vector search feature. In specific scenarios, especially test question search and image search scenarios, you can use the vector search feature together with the multimodal search feature to improve the accuracy of search results.\n",
"\n",
"This notebook shows how to use functionality related to the `Alibaba Cloud OpenSearch Vector Search Edition`.\n",
"To run, you should have an [OpenSearch Vector Search Edition](https://opensearch.console.aliyun.com) instance up and running:\n",
"\n",
"Read the [help document](https://www.alibabacloud.com/help/en/opensearch/latest/vector-search) to quickly familiarize and configure OpenSearch Vector Search Edition instance."
]
},
{
"cell_type": "markdown",
"source": [
"After the instance is up and running, follow these steps to split documents, get embeddings, connect to the alibaba cloud opensearch instance, index documents, and perform vector retrieval."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"We need to install the following Python packages first."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"#!pip install alibabacloud-ha3engine"
]
},
{
"cell_type": "markdown",
"source": [
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"\n",
Fix `make docs_build` and related scripts (#7276) **Description: a description of the change** Fixed `make docs_build` and related scripts which caused errors. There are several changes. First, I made the build of the documentation and the API Reference into two separate commands. This is because it takes less time to build. The commands for documents are `make docs_build`, `make docs_clean`, and `make docs_linkcheck`. The commands for API Reference are `make api_docs_build`, `api_docs_clean`, and `api_docs_linkcheck`. It looked like `docs/.local_build.sh` could be used to build the documentation, so I used that. Since `.local_build.sh` was also building API Rerefence internally, I removed that process. `.local_build.sh` also added some Bash options to stop in error or so. Futher more added `cd "${SCRIPT_DIR}"` at the beginning so that the script will work no matter which directory it is executed in. `docs/api_reference/api_reference.rst` is removed, because which is generated by `docs/api_reference/create_api_rst.py`, and added it to .gitignore. Finally, the description of CONTRIBUTING.md was modified. **Issue: the issue # it fixes (if applicable)** https://github.com/hwchase17/langchain/issues/6413 **Dependencies: any dependencies required for this change** `nbdoc` was missing in group docs so it was added. I installed it with the `poetry add --group docs nbdoc` command. I am concerned if any modifications are needed to poetry.lock. I would greatly appreciate it if you could pay close attention to this file during the review. **Tag maintainer** - General / Misc / if you don't know who to tag: @baskaryan If this PR needs any additional changes, I'll be happy to make them! --------- Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 02:05:14 +00:00
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import (\n",
" AlibabaCloudOpenSearch,\n",
" AlibabaCloudOpenSearchSettings,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Split documents and get embeddings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Create opensearch settings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"settings = AlibabaCloudOpenSearchSettings(\n",
" endpoint=\"The endpoint of opensearch instance, You can find it from the console of Alibaba Cloud OpenSearch.\",\n",
" instance_id=\"The identify of opensearch instance, You can find it from the console of Alibaba Cloud OpenSearch.\",\n",
" datasource_name=\"The name of the data source specified when creating it.\",\n",
" username=\"The username specified when purchasing the instance.\",\n",
" password=\"The password specified when purchasing the instance.\",\n",
" embedding_index_name=\"The name of the vector attribute specified when configuring the instance attributes.\",\n",
" field_name_mapping={\n",
" \"id\": \"id\", # The id field name mapping of index document.\n",
" \"document\": \"document\", # The text field name mapping of index document.\n",
" \"embedding\": \"embedding\", # The embedding field name mapping of index document.\n",
" \"name_of_the_metadata_specified_during_search\": \"opensearch_metadata_field_name,=\", # The metadata field name mapping of index document, could specify multiple, The value field contains mapping name and operator, the operator would be used when executing metadata filter query.\n",
" },\n",
")\n",
"\n",
"# for example\n",
"# settings = AlibabaCloudOpenSearchSettings(\n",
"# endpoint=\"ha-cn-5yd39d83c03.public.ha.aliyuncs.com\",\n",
"# instance_id=\"ha-cn-5yd39d83c03\",\n",
"# datasource_name=\"ha-cn-5yd39d83c03_test\",\n",
"# username=\"this is a user name\",\n",
"# password=\"this is a password\",\n",
"# embedding_index_name=\"index_embedding\",\n",
"# field_name_mapping={\n",
"# \"id\": \"id\",\n",
"# \"document\": \"document\",\n",
"# \"embedding\": \"embedding\",\n",
"# \"metadata_a\": \"metadata_a,=\" #The value field contains mapping name and operator, the operator would be used when executing metadata filter query\n",
"# \"metadata_b\": \"metadata_b,>\"\n",
"# \"metadata_c\": \"metadata_c,<\"\n",
"# \"metadata_else\": \"metadata_else,=\"\n",
"# })"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create an opensearch access instance by settings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# Create an opensearch instance and index docs.\n",
"opensearch = AlibabaCloudOpenSearch.from_texts(\n",
" texts=docs, embedding=embeddings, config=settings\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"or"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# Create an opensearch instance.\n",
"opensearch = AlibabaCloudOpenSearch(embedding=embeddings, config=settings)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Add texts and build index."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"metadatas = {\"md_key_a\": \"md_val_a\", \"md_key_b\": \"md_val_b\"}\n",
"# the key of metadatas must match field_name_mapping in settings.\n",
"opensearch.add_texts(texts=docs, ids=[], metadatas=metadatas)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Query and retrieve data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = opensearch.similarity_search(query)\n",
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Query and retrieve data with metadata.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"metadatas = {\"md_key_a\": \"md_val_a\"}\n",
"docs = opensearch.similarity_search(query, filter=metadatas)\n",
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"If you encounter any problems during use, please feel free to contact <xingshaomin.xsm@alibaba-inc.com>, and we will do our best to provide you with assistance and support.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}