{
"cells": [
{
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# OpenSearch\n",
"\n",
"This notebook shows how to use functionality related to the OpenSearch database.\n",
"\n",
"To run it, you need an OpenSearch instance up and running; see the [installation guide](https://opensearch.org/docs/latest/install-and-configure/install-opensearch/index/).\n",
"\n",
"By default, `similarity_search` performs Approximate k-NN Search, which uses one of several engines (`lucene`, `nmslib`, or `faiss`) and is recommended for large datasets. To run an exact, brute-force search instead, use one of the other search methods: Script Scoring or Painless Scripting.\n",
"See the [k-NN plugin documentation](https://opensearch.org/docs/latest/search-plugins/knn/index/) for more details."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "aac9563e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import OpenSearchVectorSearch\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
"source": [
"# Load the sample document and split it into roughly 1,000-character chunks\n",
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"# Plain-text chunks passed to OpenSearchVectorSearch.from_texts in the cells below\n",
"texts = [doc.page_content for doc in docs]\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db3fa309",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"docsearch = OpenSearchVectorSearch.from_texts(texts, embeddings, opensearch_url=\"http://localhost:9200\")\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c160d5bb",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "01a9a035",
"metadata": {},
"source": [
"#### `similarity_search` using Approximate k-NN Search with Custom Parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "96215c90",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"docsearch = OpenSearchVectorSearch.from_texts(\n",
"    texts,\n",
"    embeddings,\n",
"    opensearch_url=\"http://localhost:9200\",\n",
"    engine=\"faiss\",\n",
"    space_type=\"innerproduct\",\n",
"    ef_construction=256,\n",
"    m=48,\n",
")\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62a7cea0",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "0d0cd877",
"metadata": {},
"source": [
"#### `similarity_search` using Script Scoring with Custom Parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0a8e3c0e",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"docsearch = OpenSearchVectorSearch.from_texts(texts, embeddings, opensearch_url=\"http://localhost:9200\", is_appx_search=False)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query, k=1, search_type=\"script_scoring\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "92bc40db",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "a4af96cc",
"metadata": {},
"source": [
"#### `similarity_search` using Painless Scripting with Custom Parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d9f436e",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"docsearch = OpenSearchVectorSearch.from_texts(texts, embeddings, opensearch_url=\"http://localhost:9200\", is_appx_search=False)\n",
"\n",
"# Pre-filter: only chunks whose \"text\" field contains the term \"smuggling\" are scored\n",
"boolean_filter = {\"bool\": {\"filter\": {\"term\": {\"text\": \"smuggling\"}}}}\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query, search_type=\"painless_scripting\", space_type=\"cosineSimilarity\", pre_filter=boolean_filter)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ca50bce",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}