mirror of https://github.com/hwchase17/langchain
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Alibaba Cloud OpenSearch\n",
    "\n",
    ">[Alibaba Cloud OpenSearch](https://www.alibabacloud.com/product/opensearch) is a one-stop platform to develop intelligent search services. `OpenSearch` was built on the large-scale distributed search engine developed by `Alibaba`. `OpenSearch` serves more than 500 business cases in Alibaba Group and thousands of Alibaba Cloud customers. `OpenSearch` helps develop search services in different search scenarios, including e-commerce, O2O, multimedia, the content industry, communities and forums, and big data query in enterprises.\n",
    "\n",
    ">`OpenSearch` helps you develop high quality, maintenance-free, and high performance intelligent search services to provide your users with high search efficiency and accuracy.\n",
    "\n",
    ">`OpenSearch` provides the vector search feature. In specific scenarios, especially test question search and image search scenarios, you can use the vector search feature together with the multimodal search feature to improve the accuracy of search results.\n",
    "\n",
    "This notebook shows how to use functionality related to the `Alibaba Cloud OpenSearch Vector Search Edition`.\n",
    "To run it, you need an [OpenSearch Vector Search Edition](https://opensearch.console.aliyun.com) instance up and running.\n",
    "\n",
    "Read the [help document](https://www.alibabacloud.com/help/en/opensearch/latest/vector-search) to quickly familiarize yourself with and configure an OpenSearch Vector Search Edition instance."
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "After the instance is up and running, follow these steps to split documents, get embeddings, connect to the Alibaba Cloud OpenSearch instance, index documents, and perform vector retrieval."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "First, install the required Python package."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "#!pip install alibabacloud-ha3engine"
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "This notebook uses `OpenAIEmbeddings`, so an OpenAI API key is required."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import os\n",
    "import getpass\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain.vectorstores import (\n",
    "    AlibabaCloudOpenSearch,\n",
    "    AlibabaCloudOpenSearchSettings,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Split documents and get embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "\n",
    "loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
    "documents = loader.load()\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "docs = text_splitter.split_documents(documents)\n",
    "\n",
    "embeddings = OpenAIEmbeddings()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Create the OpenSearch settings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "settings = AlibabaCloudOpenSearchSettings(\n",
    "    endpoint=\"The endpoint of the OpenSearch instance; you can find it in the Alibaba Cloud OpenSearch console.\",\n",
    "    instance_id=\"The ID of the OpenSearch instance; you can find it in the Alibaba Cloud OpenSearch console.\",\n",
    "    datasource_name=\"The name of the data source specified when creating it.\",\n",
    "    username=\"The username specified when purchasing the instance.\",\n",
    "    password=\"The password specified when purchasing the instance.\",\n",
    "    embedding_index_name=\"The name of the vector attribute specified when configuring the instance attributes.\",\n",
    "    field_name_mapping={\n",
    "        \"id\": \"id\",  # The id field name mapping of the index document.\n",
    "        \"document\": \"document\",  # The text field name mapping of the index document.\n",
    "        \"embedding\": \"embedding\",  # The embedding field name mapping of the index document.\n",
    "        # The metadata field name mapping of the index document; multiple entries may be specified.\n",
    "        # Each value contains the mapped field name and an operator; the operator is used when executing a metadata filter query.\n",
    "        \"name_of_the_metadata_specified_during_search\": \"opensearch_metadata_field_name,=\",\n",
    "    },\n",
    ")\n",
    "\n",
    "# For example:\n",
    "# settings = AlibabaCloudOpenSearchSettings(\n",
    "#     endpoint=\"ha-cn-5yd39d83c03.public.ha.aliyuncs.com\",\n",
    "#     instance_id=\"ha-cn-5yd39d83c03\",\n",
    "#     datasource_name=\"ha-cn-5yd39d83c03_test\",\n",
    "#     username=\"this is a user name\",\n",
    "#     password=\"this is a password\",\n",
    "#     embedding_index_name=\"index_embedding\",\n",
    "#     field_name_mapping={\n",
    "#         \"id\": \"id\",\n",
    "#         \"document\": \"document\",\n",
    "#         \"embedding\": \"embedding\",\n",
    "#         \"metadata_a\": \"metadata_a,=\",  # The value contains the mapped field name and an operator; the operator is used when executing a metadata filter query.\n",
    "#         \"metadata_b\": \"metadata_b,>\",\n",
    "#         \"metadata_c\": \"metadata_c,<\",\n",
    "#         \"metadata_else\": \"metadata_else,=\",\n",
    "#     },\n",
    "# )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create an OpenSearch access instance from the settings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Create an opensearch instance and index docs.\n",
    "# `from_texts` expects plain strings, so pass each document's page_content.\n",
    "opensearch = AlibabaCloudOpenSearch.from_texts(\n",
    "    texts=[doc.page_content for doc in docs], embedding=embeddings, config=settings\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, create the instance without indexing any documents:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Create an opensearch instance.\n",
    "opensearch = AlibabaCloudOpenSearch(embedding=embeddings, config=settings)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Add texts and build the index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "texts = [doc.page_content for doc in docs]\n",
    "# One metadata dict per text; the keys of each dict must match field_name_mapping in settings.\n",
    "metadatas = [{\"md_key_a\": \"md_val_a\", \"md_key_b\": \"md_val_b\"} for _ in texts]\n",
    "opensearch.add_texts(texts=texts, metadatas=metadatas)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Query and retrieve data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "docs = opensearch.similarity_search(query)\n",
    "print(docs[0].page_content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Query and retrieve data with a metadata filter.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "metadatas = {\"md_key_a\": \"md_val_a\"}\n",
    "docs = opensearch.similarity_search(query, filter=metadatas)\n",
    "print(docs[0].page_content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "If you encounter any problems during use, please feel free to contact <xingshaomin.xsm@alibaba-inc.com>, and we will do our best to provide you with assistance and support.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}