"> [OpenSearch](https://opensearch.org/) is a scalable, flexible, and extensible open-source software suite for search, analytics, and observability applications licensed under Apache 2.0. `OpenSearch` is a distributed search and analytics engine based on `Apache Lucene`.\n",
"\n",
"\n",
"This notebook shows how to use functionality related to the `OpenSearch` database.\n",
"\n",
"To run, you should have an OpenSearch instance up and running: [see here for an easy Docker installation](https://hub.docker.com/r/opensearchproject/opensearch).\n",
"\n",
"`similarity_search` by default performs the Approximate k-NN Search which uses one of the several algorithms like lucene, nmslib, faiss recommended for\n",
"large datasets. To perform brute force search we have other search methods known as Script Scoring and Painless Scripting.\n",
"Check [this](https://opensearch.org/docs/latest/search-plugins/knn/index/) for more details."
]
},
{
"cell_type": "markdown",
"id": "94963977-9dfc-48b7-872a-53f2947f46c6",
"metadata": {},
"source": [
"## Installation\n",
"Install the Python client."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e606066-9386-4427-8a87-1b93f435c57e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!pip install opensearch-py"
]
},
{
"cell_type": "markdown",
"id": "b1fa637e-4fbf-4d5a-9188-2cad826a193e",
"metadata": {},
"source": [
"We want to use OpenAIEmbeddings so we have to get the OpenAI API Key."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "28e5455e-322d-4010-9e3b-491d522ef5db",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(\n",
" \"What did the president say about Ketanji Brown Jackson\",\n",
" search_type=\"painless_scripting\",\n",
" space_type=\"cosineSimilarity\",\n",
" pre_filter=filter,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ca50bce",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "4f8fb0d0",
"metadata": {},
"source": [
"## Maximum marginal relevance search (MMR)\n",
"If you\u2019d like to look up for some similar documents, but you\u2019d also like to receive diverse results, MMR is method you should consider. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ba85e092",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",