Update Azure Cognitive Search to latest stable/GA Python SDK version + Renaming to Azure AI Search (#924)

pull/1033/head^2
Farzad Sunavala 3 months ago committed by GitHub
parent d75f2dd40d
commit 4b352176be
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

@ -9,7 +9,7 @@
"\n",
"Azure OpenAI on your data enables you to run supported chat models such as GPT-3.5-Turbo and GPT-4 on your data without needing to train or fine-tune models. Running models on your data enables you to chat on top of, and analyze your data with greater accuracy and speed. One of the key benefits of Azure OpenAI on your data is its ability to tailor the content of conversational AI. Because the model has access to, and can reference specific sources to support its responses, answers are not only based on its pretrained knowledge but also on the latest information available in the designated data source. This grounding data also helps the model avoid generating responses based on outdated or incorrect information.\n",
"\n",
"Azure OpenAI on your own data with Azure Cognitive Search provides a customizable, pre-built solution for knowledge retrieval, from which a conversational AI application can be built. To see alternative methods for knowledge retrieval and semantic search, check out the cookbook examples for [vector databases](https://github.com/openai/openai-cookbook/tree/main/examples/vector_databases)."
"Azure OpenAI on your own data with Azure AI Search (f.k.a. Azure Cognitive Search) provides a customizable, pre-built solution for knowledge retrieval, from which a conversational AI application can be built. To see alternative methods for knowledge retrieval and semantic search, check out the cookbook examples for [vector databases](https://github.com/openai/openai-cookbook/tree/main/examples/vector_databases)."
]
},
{
@ -18,7 +18,7 @@
"source": [
"## How it works\n",
"\n",
"[Azure OpenAI on your own data](https://learn.microsoft.com/azure/ai-services/openai/concepts/use-your-data) connects the model with your data, giving it the ability to retrieve and utilize data in a way that enhances the model's output. Together with Azure Cognitive Search, data is retrieved from designated data sources based on the user input and provided conversation history. The data is then augmented and resubmitted as a prompt to the model, giving the model contextual information it can use to generate a response.\n",
"[Azure OpenAI on your own data](https://learn.microsoft.com/azure/ai-services/openai/concepts/use-your-data) connects the model with your data, giving it the ability to retrieve and utilize data in a way that enhances the model's output. Together with Azure AI Search, data is retrieved from designated data sources based on the user input and provided conversation history. The data is then augmented and resubmitted as a prompt to the model, giving the model contextual information it can use to generate a response.\n",
"\n",
"See the [Data, privacy, and security for Azure OpenAI Service](https://learn.microsoft.com/legal/cognitive-services/openai/data-privacy?context=%2Fazure%2Fai-services%2Fopenai%2Fcontext%2Fcontext) for more information."
]
@ -35,7 +35,7 @@
"To use your own data with Azure OpenAI models, you will need:\n",
"\n",
"1. Azure OpenAI access and a resource with a chat model deployed (for example, GPT-3 or GPT-4)\n",
"2. Azure Cognitive Search resource\n",
"2. Azure AI Search (f.k.a. Azure Cognitive Search) resource\n",
"3. Azure Blob Storage resource\n",
"4. Your documents to be used as data (See [data source options](https://learn.microsoft.com/azure/ai-services/openai/concepts/use-your-data#data-source-options))\n",
"\n",
@ -70,8 +70,8 @@
"\n",
"* `AZURE_OPENAI_ENDPOINT` - the Azure OpenAI endpoint. This can be found under \"Keys and Endpoints\" for your Azure OpenAI resource in the Azure Portal.\n",
"* `AZURE_OPENAI_API_KEY` - the Azure OpenAI API key. This can be found under \"Keys and Endpoints\" for your Azure OpenAI resource in the Azure Portal. Omit if using Azure Active Directory authentication (see below `Authentication using Microsoft Active Directory`)\n",
"* `SEARCH_ENDPOINT` - the Cognitive Search endpoint. This URL be found on the \"Overview\" of your Search resource on the Azure Portal.\n",
"* `SEARCH_KEY` - the Cognitive Search API key. Found under \"Keys\" for your Search resource in the Azure Portal.\n",
"* `SEARCH_ENDPOINT` - the AI Search endpoint. This URL be found on the \"Overview\" of your Search resource on the Azure Portal.\n",
"* `SEARCH_KEY` - the AI Search API key. Found under \"Keys\" for your Search resource in the Azure Portal.\n",
"* `SEARCH_INDEX_NAME` - the name of the index you created with your own data."
]
},

@ -9,7 +9,7 @@ Each provider has their own named directory, with a standard notebook to introdu
## Guides & deep dives
- [AnalyticDB](https://www.alibabacloud.com/help/en/analyticdb-for-postgresql/latest/get-started-with-analyticdb-for-postgresql)
- [Cassandra/Astra DB](https://docs.datastax.com/en/astra-serverless/docs/vector-search/qandasimsearch-quickstart.html)
- [AzureSearch](https://learn.microsoft.com/azure/search/search-get-started-vector)
- [Azure AI Search](https://learn.microsoft.com/azure/search/search-get-started-vector)
- [Chroma](https://docs.trychroma.com/getting-started)
- [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html)
- [Hologres](https://www.alibabacloud.com/help/en/hologres/latest/procedure-to-use-hologres)

@ -0,0 +1,738 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure AI Search as a vector database for OpenAI embeddings"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook provides step by step instuctions on using Azure AI Search (f.k.a Azure Cognitive Search) as a vector database with OpenAI embeddings. Azure AI Search is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.\n",
"\n",
"## Prerequistites:\n",
"For the purposes of this exercise you must have the following:\n",
"- [Azure AI Search Service](https://learn.microsoft.com/azure/search/)\n",
"- [OpenAI Key](https://platform.openai.com/account/api-keys) or [Azure OpenAI credentials](https://learn.microsoft.com/azure/cognitive-services/openai/)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install wget\n",
"! pip install azure-search-documents \n",
"! pip install azure-identity\n",
"! pip install openai"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import required libraries"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import json \n",
"import wget\n",
"import pandas as pd\n",
"import zipfile\n",
"from openai import AzureOpenAI\n",
"from azure.identity import DefaultAzureCredential, get_bearer_token_provider\n",
"from azure.core.credentials import AzureKeyCredential \n",
"from azure.search.documents import SearchClient, SearchIndexingBufferedSender \n",
"from azure.search.documents.indexes import SearchIndexClient \n",
"from azure.search.documents.models import (\n",
" QueryAnswerType,\n",
" QueryCaptionType,\n",
" QueryType,\n",
" VectorizedQuery,\n",
")\n",
"from azure.search.documents.indexes.models import (\n",
" HnswAlgorithmConfiguration,\n",
" HnswParameters,\n",
" SearchField,\n",
" SearchableField,\n",
" SearchFieldDataType,\n",
" SearchIndex,\n",
" SemanticConfiguration,\n",
" SemanticField,\n",
" SemanticPrioritizedFields,\n",
" SemanticSearch,\n",
" SimpleField,\n",
" VectorSearch,\n",
" VectorSearchAlgorithmKind,\n",
" VectorSearchAlgorithmMetric,\n",
" VectorSearchProfile,\n",
")\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure OpenAI settings\n",
"\n",
"This section guides you through setting up authentication for Azure OpenAI, allowing you to securely interact with the service using either Azure Active Directory (AAD) or an API key. Before proceeding, ensure you have your Azure OpenAI endpoint and credentials ready. For detailed instructions on setting up AAD with Azure OpenAI, refer to the [official documentation](https://learn.microsoft.com/azure/ai-services/openai/how-to/managed-identity).\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"endpoint: str = \"YOUR_AZURE_OPENAI_ENDPOINT\"\n",
"api_key: str = \"YOUR_AZURE_OPENAI_KEY\"\n",
"api_version: str = \"2023-05-15\"\n",
"deployment = \"YOUR_AZURE_OPENAI_DEPLOYMENT_NAME\"\n",
"credential = DefaultAzureCredential()\n",
"token_provider = get_bearer_token_provider(\n",
" credential, \"https://cognitiveservices.azure.com/.default\"\n",
")\n",
"\n",
"# Set this flag to True if you are using Azure Active Directory\n",
"use_aad_for_aoai = True \n",
"\n",
"if use_aad_for_aoai:\n",
" # Use Azure Active Directory (AAD) authentication\n",
" client = AzureOpenAI(\n",
" azure_endpoint=endpoint,\n",
" api_version=api_version,\n",
" azure_ad_token_provider=token_provider,\n",
" )\n",
"else:\n",
" # Use API key authentication\n",
" client = AzureOpenAI(\n",
" api_key=api_key,\n",
" api_version=api_version,\n",
" azure_endpoint=endpoint,\n",
" )"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure Azure AI Search Vector Store settings\n",
"This section explains how to set up the Azure AI Search client for integrating with the Vector Store feature. You can locate your Azure AI Search service details in the Azure Portal or programmatically via the [Search Management SDK](https://learn.microsoft.com/rest/api/searchmanagement/).\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Configuration\n",
"search_service_endpoint: str = \"YOUR_AZURE_SEARCH_ENDPOINT\"\n",
"search_service_api_key: str = \"YOUR_AZURE_SEARCH_ADMIN_KEY\"\n",
"index_name: str = \"azure-ai-search-openai-cookbook-demo\"\n",
"\n",
"# Set this flag to True if you are using Azure Active Directory\n",
"use_aad_for_search = True \n",
"\n",
"if use_aad_for_search:\n",
" # Use Azure Active Directory (AAD) authentication\n",
" credential = DefaultAzureCredential()\n",
"else:\n",
" # Use API key authentication\n",
" credential = AzureKeyCredential(search_service_api_key)\n",
"\n",
"# Initialize the SearchClient with the selected authentication method\n",
"search_client = SearchClient(\n",
" endpoint=search_service_endpoint, index_name=index_name, credential=credential\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load data\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'vector_database_wikipedia_articles_embedded.zip'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"embeddings_url = \"https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip\"\n",
"\n",
"# The file is ~700 MB so this will take some time\n",
"wget.download(embeddings_url)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"with zipfile.ZipFile(\"vector_database_wikipedia_articles_embedded.zip\", \"r\") as zip_ref:\n",
" zip_ref.extractall(\"../../data\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>url</th>\n",
" <th>title</th>\n",
" <th>text</th>\n",
" <th>title_vector</th>\n",
" <th>content_vector</th>\n",
" <th>vector_id</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>https://simple.wikipedia.org/wiki/April</td>\n",
" <td>April</td>\n",
" <td>April is the fourth month of the year in the J...</td>\n",
" <td>[0.001009464613161981, -0.020700545981526375, ...</td>\n",
" <td>[-0.011253940872848034, -0.013491976074874401,...</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>https://simple.wikipedia.org/wiki/August</td>\n",
" <td>August</td>\n",
" <td>August (Aug.) is the eighth month of the year ...</td>\n",
" <td>[0.0009286514250561595, 0.000820168002974242, ...</td>\n",
" <td>[0.0003609954728744924, 0.007262262050062418, ...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>6</td>\n",
" <td>https://simple.wikipedia.org/wiki/Art</td>\n",
" <td>Art</td>\n",
" <td>Art is a creative activity that expresses imag...</td>\n",
" <td>[0.003393713850528002, 0.0061537534929811954, ...</td>\n",
" <td>[-0.004959689453244209, 0.015772193670272827, ...</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>8</td>\n",
" <td>https://simple.wikipedia.org/wiki/A</td>\n",
" <td>A</td>\n",
" <td>A or a is the first letter of the English alph...</td>\n",
" <td>[0.0153952119871974, -0.013759135268628597, 0....</td>\n",
" <td>[0.024894846603274345, -0.022186409682035446, ...</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>9</td>\n",
" <td>https://simple.wikipedia.org/wiki/Air</td>\n",
" <td>Air</td>\n",
" <td>Air refers to the Earth's atmosphere. Air is a...</td>\n",
" <td>[0.02224554680287838, -0.02044147066771984, -0...</td>\n",
" <td>[0.021524671465158463, 0.018522677943110466, -...</td>\n",
" <td>4</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id url title \\\n",
"0 1 https://simple.wikipedia.org/wiki/April April \n",
"1 2 https://simple.wikipedia.org/wiki/August August \n",
"2 6 https://simple.wikipedia.org/wiki/Art Art \n",
"3 8 https://simple.wikipedia.org/wiki/A A \n",
"4 9 https://simple.wikipedia.org/wiki/Air Air \n",
"\n",
" text \\\n",
"0 April is the fourth month of the year in the J... \n",
"1 August (Aug.) is the eighth month of the year ... \n",
"2 Art is a creative activity that expresses imag... \n",
"3 A or a is the first letter of the English alph... \n",
"4 Air refers to the Earth's atmosphere. Air is a... \n",
"\n",
" title_vector \\\n",
"0 [0.001009464613161981, -0.020700545981526375, ... \n",
"1 [0.0009286514250561595, 0.000820168002974242, ... \n",
"2 [0.003393713850528002, 0.0061537534929811954, ... \n",
"3 [0.0153952119871974, -0.013759135268628597, 0.... \n",
"4 [0.02224554680287838, -0.02044147066771984, -0... \n",
"\n",
" content_vector vector_id \n",
"0 [-0.011253940872848034, -0.013491976074874401,... 0 \n",
"1 [0.0003609954728744924, 0.007262262050062418, ... 1 \n",
"2 [-0.004959689453244209, 0.015772193670272827, ... 2 \n",
"3 [0.024894846603274345, -0.022186409682035446, ... 3 \n",
"4 [0.021524671465158463, 0.018522677943110466, -... 4 "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"article_df = pd.read_csv(\"../../data/vector_database_wikipedia_articles_embedded.csv\")\n",
"\n",
"# Read vectors from strings back into a list using json.loads\n",
"article_df[\"title_vector\"] = article_df.title_vector.apply(json.loads)\n",
"article_df[\"content_vector\"] = article_df.content_vector.apply(json.loads)\n",
"article_df[\"vector_id\"] = article_df[\"vector_id\"].apply(str)\n",
"article_df.head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an index\n",
"This code snippet demonstrates how to define and create a search index using the `SearchIndexClient` from the Azure AI Search Python SDK. The index incorporates both vector search and semantic ranker capabilities. For more details, visit our documentation on how to [Create a Vector Index](https://learn.microsoft.com/azure/search/vector-search-how-to-create-index?.tabs=config-2023-11-01%2Crest-2023-11-01%2Cpush%2Cportal-check-index)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"azure-ai-search-openai-cookbook-demo created\n"
]
}
],
"source": [
"# Initialize the SearchIndexClient\n",
"index_client = SearchIndexClient(\n",
" endpoint=search_service_endpoint, credential=credential\n",
")\n",
"\n",
"# Define the fields for the index\n",
"fields = [\n",
" SimpleField(name=\"id\", type=SearchFieldDataType.String),\n",
" SimpleField(name=\"vector_id\", type=SearchFieldDataType.String, key=True),\n",
" SimpleField(name=\"url\", type=SearchFieldDataType.String),\n",
" SearchableField(name=\"title\", type=SearchFieldDataType.String),\n",
" SearchableField(name=\"text\", type=SearchFieldDataType.String),\n",
" SearchField(\n",
" name=\"title_vector\",\n",
" type=SearchFieldDataType.Collection(SearchFieldDataType.Single),\n",
" vector_search_dimensions=1536,\n",
" vector_search_profile_name=\"my-vector-config\",\n",
" ),\n",
" SearchField(\n",
" name=\"content_vector\",\n",
" type=SearchFieldDataType.Collection(SearchFieldDataType.Single),\n",
" vector_search_dimensions=1536,\n",
" vector_search_profile_name=\"my-vector-config\",\n",
" ),\n",
"]\n",
"\n",
"# Configure the vector search configuration\n",
"vector_search = VectorSearch(\n",
" algorithms=[\n",
" HnswAlgorithmConfiguration(\n",
" name=\"my-hnsw\",\n",
" kind=VectorSearchAlgorithmKind.HNSW,\n",
" parameters=HnswParameters(\n",
" m=4,\n",
" ef_construction=400,\n",
" ef_search=500,\n",
" metric=VectorSearchAlgorithmMetric.COSINE,\n",
" ),\n",
" )\n",
" ],\n",
" profiles=[\n",
" VectorSearchProfile(\n",
" name=\"my-vector-config\",\n",
" algorithm_configuration_name=\"my-hnsw\",\n",
" )\n",
" ],\n",
")\n",
"\n",
"# Configure the semantic search configuration\n",
"semantic_search = SemanticSearch(\n",
" configurations=[\n",
" SemanticConfiguration(\n",
" name=\"my-semantic-config\",\n",
" prioritized_fields=SemanticPrioritizedFields(\n",
" title_field=SemanticField(field_name=\"title\"),\n",
" keywords_fields=[SemanticField(field_name=\"url\")],\n",
" content_fields=[SemanticField(field_name=\"text\")],\n",
" ),\n",
" )\n",
" ]\n",
")\n",
"\n",
"# Create the search index with the vector search and semantic search configurations\n",
"index = SearchIndex(\n",
" name=index_name,\n",
" fields=fields,\n",
" vector_search=vector_search,\n",
" semantic_search=semantic_search,\n",
")\n",
"\n",
"# Create or update the index\n",
"result = index_client.create_or_update_index(index)\n",
"print(f\"{result.name} created\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Uploading Data to Azure AI Search Index\n",
"\n",
"The following code snippet outlines the process of uploading a batch of documents—specifically, Wikipedia articles with pre-computed embeddings—from a pandas DataFrame to an Azure AI Search index. For a detailed guide on data import strategies and best practices, refer to [Data Import in Azure AI Search](https://learn.microsoft.com/azure/search/search-what-is-data-import).\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Uploaded 25000 documents in total\n"
]
}
],
"source": [
"from azure.core.exceptions import HttpResponseError\n",
"\n",
"# Convert the 'id' and 'vector_id' columns to string so one of them can serve as our key field\n",
"article_df[\"id\"] = article_df[\"id\"].astype(str)\n",
"article_df[\"vector_id\"] = article_df[\"vector_id\"].astype(str)\n",
"# Convert the DataFrame to a list of dictionaries\n",
"documents = article_df.to_dict(orient=\"records\")\n",
"\n",
"# Create a SearchIndexingBufferedSender\n",
"batch_client = SearchIndexingBufferedSender(\n",
" search_service_endpoint, index_name, credential\n",
")\n",
"\n",
"try:\n",
" # Add upload actions for all documents in a single call\n",
" batch_client.upload_documents(documents=documents)\n",
"\n",
" # Manually flush to send any remaining documents in the buffer\n",
" batch_client.flush()\n",
"except HttpResponseError as e:\n",
" print(f\"An error occurred: {e}\")\n",
"finally:\n",
" # Clean up resources\n",
" batch_client.close()\n",
"\n",
"print(f\"Uploaded {len(documents)} documents in total\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If your dataset didn't already contain pre-computed embeddings, you can create embeddings by using the below function using the `openai` python library. You'll also notice the same function and model are being used to generate query embeddings for performing vector searches."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Content: April is the fourth month of the year in the Julian and Gregorian calendars, and comes between March\n",
"Content vector generated\n"
]
}
],
"source": [
"# Example function to generate document embedding\n",
"def generate_embeddings(text, model):\n",
" # Generate embeddings for the provided text using the specified model\n",
" embeddings_response = client.embeddings.create(model=model, input=text)\n",
" # Extract the embedding data from the response\n",
" embedding = embeddings_response.data[0].embedding\n",
" return embedding\n",
"\n",
"\n",
"first_document_content = documents[0][\"text\"]\n",
"print(f\"Content: {first_document_content[:100]}\")\n",
"\n",
"content_vector = generate_embeddings(first_document_content, deployment)\n",
"print(\"Content vector generated\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Perform a vector similarity search"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Title: Documenta\n",
"Score: 0.8599451\n",
"URL: https://simple.wikipedia.org/wiki/Documenta\n",
"\n",
"Title: Museum of Modern Art\n",
"Score: 0.85260946\n",
"URL: https://simple.wikipedia.org/wiki/Museum%20of%20Modern%20Art\n",
"\n",
"Title: Expressionism\n",
"Score: 0.852354\n",
"URL: https://simple.wikipedia.org/wiki/Expressionism\n",
"\n"
]
}
],
"source": [
"# Pure Vector Search\n",
"query = \"modern art in Europe\"\n",
" \n",
"search_client = SearchClient(search_service_endpoint, index_name, credential) \n",
"vector_query = VectorizedQuery(vector=generate_embeddings(query, deployment), k_nearest_neighbors=3, fields=\"content_vector\")\n",
" \n",
"results = search_client.search( \n",
" search_text=None, \n",
" vector_queries= [vector_query], \n",
" select=[\"title\", \"text\", \"url\"] \n",
")\n",
" \n",
"for result in results: \n",
" print(f\"Title: {result['title']}\") \n",
" print(f\"Score: {result['@search.score']}\") \n",
" print(f\"URL: {result['url']}\\n\") "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Perform a Hybrid Search\n",
"Hybrid search combines the capabilities of traditional keyword-based search with vector-based similarity search to provide more relevant and contextual results. This approach is particularly useful when dealing with complex queries that benefit from understanding the semantic meaning behind the text.\n",
"\n",
"The provided code snippet demonstrates how to execute a hybrid search query:"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Title: Wars of Scottish Independence\n",
"Score: 0.03306011110544205\n",
"URL: https://simple.wikipedia.org/wiki/Wars%20of%20Scottish%20Independence\n",
"\n",
"Title: Battle of Bannockburn\n",
"Score: 0.022253260016441345\n",
"URL: https://simple.wikipedia.org/wiki/Battle%20of%20Bannockburn\n",
"\n",
"Title: Scottish\n",
"Score: 0.016393441706895828\n",
"URL: https://simple.wikipedia.org/wiki/Scottish\n",
"\n"
]
}
],
"source": [
"# Hybrid Search\n",
"query = \"Famous battles in Scottish history\" \n",
" \n",
"search_client = SearchClient(search_service_endpoint, index_name, credential) \n",
"vector_query = VectorizedQuery(vector=generate_embeddings(query, deployment), k_nearest_neighbors=3, fields=\"content_vector\")\n",
" \n",
"results = search_client.search( \n",
" search_text=query, \n",
" vector_queries= [vector_query], \n",
" select=[\"title\", \"text\", \"url\"],\n",
" top=3\n",
")\n",
" \n",
"for result in results: \n",
" print(f\"Title: {result['title']}\") \n",
" print(f\"Score: {result['@search.score']}\") \n",
" print(f\"URL: {result['url']}\\n\") "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Perform a Hybrid Search with Reranking (powered by Bing)\n",
"[Semantic ranker](https://learn.microsoft.com/azure/search/semantic-search-overview) measurably improves search relevance by using language understanding to rerank search results. Additionally, you can get extractive captions, answers, and highlights. "
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Semantic Answer: Advancements During the industrial revolution, new technology brought many changes. For example:<em> Canals</em> were built to allow heavy goods to be moved easily where they were needed. The steam engine became the main source of power. It replaced horses and human labor. Cheap iron and steel became mass-produced.\n",
"Semantic Answer Score: 0.90478515625\n",
"\n",
"Title: Industrial Revolution\n",
"Reranker Score: 3.408700942993164\n",
"URL: https://simple.wikipedia.org/wiki/Industrial%20Revolution\n",
"Caption: Advancements During the industrial revolution, new technology brought many changes. For example: Canals were built to allow heavy goods to be moved easily where they were needed. The steam engine became the main source of power. It replaced horses and human labor. Cheap iron and steel became mass-produced.\n",
"\n",
"Title: Printing\n",
"Reranker Score: 1.603400707244873\n",
"URL: https://simple.wikipedia.org/wiki/Printing\n",
"Caption: Machines to speed printing, cheaper paper, automatic stitching and binding all arrived in the 19th century during the industrial revolution. What had once been done by a few men by hand was now done by limited companies on huge machines. The result was much lower prices, and a much wider readership.\n",
"\n",
"Title: Industrialisation\n",
"Reranker Score: 1.3238357305526733\n",
"URL: https://simple.wikipedia.org/wiki/Industrialisation\n",
"Caption: <em>Industrialisation</em> (or<em> industrialization)</em> is a process that happens in countries when they start to use machines to do work that was once done by people.<em> Industrialisation changes</em> the things people do.<em> Industrialisation</em> caused towns to grow larger. Many people left farming to take higher paid jobs in factories in towns.\n",
"\n"
]
}
],
"source": [
"# Semantic Hybrid Search\n",
"query = \"What were the key technological advancements during the Industrial Revolution?\"\n",
"\n",
"search_client = SearchClient(search_service_endpoint, index_name, credential)\n",
"vector_query = VectorizedQuery(\n",
" vector=generate_embeddings(query, deployment),\n",
" k_nearest_neighbors=3,\n",
" fields=\"content_vector\",\n",
")\n",
"\n",
"results = search_client.search(\n",
" search_text=query,\n",
" vector_queries=[vector_query],\n",
" select=[\"title\", \"text\", \"url\"],\n",
" query_type=QueryType.SEMANTIC,\n",
" semantic_configuration_name=\"my-semantic-config\",\n",
" query_caption=QueryCaptionType.EXTRACTIVE,\n",
" query_answer=QueryAnswerType.EXTRACTIVE,\n",
" top=3,\n",
")\n",
"\n",
"semantic_answers = results.get_answers()\n",
"for answer in semantic_answers:\n",
" if answer.highlights:\n",
" print(f\"Semantic Answer: {answer.highlights}\")\n",
" else:\n",
" print(f\"Semantic Answer: {answer.text}\")\n",
" print(f\"Semantic Answer Score: {answer.score}\\n\")\n",
"\n",
"for result in results:\n",
" print(f\"Title: {result['title']}\")\n",
" print(f\"Reranker Score: {result['@search.reranker_score']}\")\n",
" print(f\"URL: {result['url']}\")\n",
" captions = result[\"@search.captions\"]\n",
" if captions:\n",
" caption = captions[0]\n",
" if caption.highlights:\n",
" print(f\"Caption: {caption.highlights}\\n\")\n",
" else:\n",
" print(f\"Caption: {caption.text}\\n\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -1,647 +0,0 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure Cognitive Search as a vector database for OpenAI embeddings"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook provides step by step instuctions on using Azure Cognitive Search as a vector database with OpenAI embeddings. Azure Cognitive Search (formerly known as \"Azure Search\") is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.\n",
"\n",
"## Prerequistites:\n",
"For the purposes of this exercise you must have the following:\n",
"- [Azure Cognitive Search Service](https://learn.microsoft.com/azure/search/)\n",
"- [OpenAI Key](https://platform.openai.com/account/api-keys) or [Azure OpenAI credentials](https://learn.microsoft.com/azure/cognitive-services/openai/)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install wget\n",
"! pip install azure-search-documents --pre "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import required libraries"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import openai\n",
"import json \n",
"import openai\n",
"import wget\n",
"import pandas as pd\n",
"import zipfile\n",
"from azure.core.credentials import AzureKeyCredential \n",
"from azure.search.documents import SearchClient \n",
"from azure.search.documents.indexes import SearchIndexClient \n",
"from azure.search.documents.models import Vector \n",
"from azure.search.documents import SearchIndexingBufferedSender\n",
"from azure.search.documents.indexes.models import ( \n",
" SearchIndex, \n",
" SearchField, \n",
" SearchFieldDataType, \n",
" SimpleField, \n",
" SearchableField, \n",
" SearchIndex, \n",
" SemanticConfiguration, \n",
" PrioritizedFields, \n",
" SemanticField, \n",
" SearchField, \n",
" SemanticSettings, \n",
" VectorSearch, \n",
" HnswVectorSearchAlgorithmConfiguration, \n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure OpenAI settings\n",
"\n",
"Configure your OpenAI or Azure OpenAI settings. For this example, we use Azure OpenAI."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"openai.api_type = \"azure\"\n",
"openai.api_base = \"YOUR_AZURE_OPENAI_ENDPOINT\"\n",
"openai.api_version = \"2023-05-15\"\n",
"openai.api_key = \"YOUR_AZURE_OPENAI_KEY\"\n",
"model: str = \"text-embedding-3-small\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure Azure Cognitive Search Vector Store settings\n",
"You can find this in the Azure Portal or using the [Search Management SDK](https://learn.microsoft.com/rest/api/searchmanagement/)\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"search_service_endpoint: str = \"YOUR_AZURE_SEARCH_ENDPOINT\"\n",
"search_service_api_key: str = \"YOUR_AZURE_SEARCH_ADMIN_KEY\"\n",
"index_name: str = \"azure-cognitive-search-vector-demo\"\n",
"credential = AzureKeyCredential(search_service_api_key)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load data\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'vector_database_wikipedia_articles_embedded.zip'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"embeddings_url = \"https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip\"\n",
"\n",
"# The file is ~700 MB so this will take some time\n",
"wget.download(embeddings_url)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"with zipfile.ZipFile(\"vector_database_wikipedia_articles_embedded.zip\",\"r\") as zip_ref:\n",
" zip_ref.extractall(\"../../data\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>url</th>\n",
" <th>title</th>\n",
" <th>text</th>\n",
" <th>title_vector</th>\n",
" <th>content_vector</th>\n",
" <th>vector_id</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>https://simple.wikipedia.org/wiki/April</td>\n",
" <td>April</td>\n",
" <td>April is the fourth month of the year in the J...</td>\n",
" <td>[0.001009464613161981, -0.020700545981526375, ...</td>\n",
" <td>[-0.011253940872848034, -0.013491976074874401,...</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>https://simple.wikipedia.org/wiki/August</td>\n",
" <td>August</td>\n",
" <td>August (Aug.) is the eighth month of the year ...</td>\n",
" <td>[0.0009286514250561595, 0.000820168002974242, ...</td>\n",
" <td>[0.0003609954728744924, 0.007262262050062418, ...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>6</td>\n",
" <td>https://simple.wikipedia.org/wiki/Art</td>\n",
" <td>Art</td>\n",
" <td>Art is a creative activity that expresses imag...</td>\n",
" <td>[0.003393713850528002, 0.0061537534929811954, ...</td>\n",
" <td>[-0.004959689453244209, 0.015772193670272827, ...</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>8</td>\n",
" <td>https://simple.wikipedia.org/wiki/A</td>\n",
" <td>A</td>\n",
" <td>A or a is the first letter of the English alph...</td>\n",
" <td>[0.0153952119871974, -0.013759135268628597, 0....</td>\n",
" <td>[0.024894846603274345, -0.022186409682035446, ...</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>9</td>\n",
" <td>https://simple.wikipedia.org/wiki/Air</td>\n",
" <td>Air</td>\n",
" <td>Air refers to the Earth's atmosphere. Air is a...</td>\n",
" <td>[0.02224554680287838, -0.02044147066771984, -0...</td>\n",
" <td>[0.021524671465158463, 0.018522677943110466, -...</td>\n",
" <td>4</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id url title \\\n",
"0 1 https://simple.wikipedia.org/wiki/April April \n",
"1 2 https://simple.wikipedia.org/wiki/August August \n",
"2 6 https://simple.wikipedia.org/wiki/Art Art \n",
"3 8 https://simple.wikipedia.org/wiki/A A \n",
"4 9 https://simple.wikipedia.org/wiki/Air Air \n",
"\n",
" text \\\n",
"0 April is the fourth month of the year in the J... \n",
"1 August (Aug.) is the eighth month of the year ... \n",
"2 Art is a creative activity that expresses imag... \n",
"3 A or a is the first letter of the English alph... \n",
"4 Air refers to the Earth's atmosphere. Air is a... \n",
"\n",
" title_vector \\\n",
"0 [0.001009464613161981, -0.020700545981526375, ... \n",
"1 [0.0009286514250561595, 0.000820168002974242, ... \n",
"2 [0.003393713850528002, 0.0061537534929811954, ... \n",
"3 [0.0153952119871974, -0.013759135268628597, 0.... \n",
"4 [0.02224554680287838, -0.02044147066771984, -0... \n",
"\n",
" content_vector vector_id \n",
"0 [-0.011253940872848034, -0.013491976074874401,... 0 \n",
"1 [0.0003609954728744924, 0.007262262050062418, ... 1 \n",
"2 [-0.004959689453244209, 0.015772193670272827, ... 2 \n",
"3 [0.024894846603274345, -0.022186409682035446, ... 3 \n",
"4 [0.021524671465158463, 0.018522677943110466, -... 4 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"article_df = pd.read_csv('../../data/vector_database_wikipedia_articles_embedded.csv') \n",
" \n",
"# Read vectors from strings back into a list using json.loads \n",
"article_df[\"title_vector\"] = article_df.title_vector.apply(json.loads) \n",
"article_df[\"content_vector\"] = article_df.content_vector.apply(json.loads) \n",
"article_df['vector_id'] = article_df['vector_id'].apply(str) \n",
"article_df.head() \n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an index"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"azure-cognitive-search-vector-demo created\n"
]
}
],
"source": [
"# Configure a search index\n",
"index_client = SearchIndexClient(\n",
" endpoint=search_service_endpoint, credential=credential)\n",
"fields = [\n",
" SimpleField(name=\"id\", type=SearchFieldDataType.String),\n",
" SimpleField(name=\"vector_id\", type=SearchFieldDataType.String, key=True),\n",
" SimpleField(name=\"url\", type=SearchFieldDataType.String),\n",
" SearchableField(name=\"title\", type=SearchFieldDataType.String),\n",
" SearchableField(name=\"text\", type=SearchFieldDataType.String),\n",
" SearchField(name=\"title_vector\", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),\n",
" searchable=True, vector_search_dimensions=1536, vector_search_configuration=\"my-vector-config\"),\n",
" SearchField(name=\"content_vector\", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),\n",
" searchable=True, vector_search_dimensions=1536, vector_search_configuration=\"my-vector-config\"),\n",
"]\n",
"\n",
"# Configure the vector search configuration\n",
"vector_search = VectorSearch(\n",
" algorithm_configurations=[\n",
" HnswVectorSearchAlgorithmConfiguration(\n",
" name=\"my-vector-config\",\n",
" kind=\"hnsw\",\n",
" parameters={\n",
" \"m\": 4,\n",
" \"efConstruction\": 400,\n",
" \"efSearch\": 500,\n",
" \"metric\": \"cosine\"\n",
" }\n",
" )\n",
" ]\n",
")\n",
"\n",
"# Optional: configure semantic reranking by passing your title, keywords, and content fields\n",
"semantic_config = SemanticConfiguration(\n",
" name=\"my-semantic-config\",\n",
" prioritized_fields=PrioritizedFields(\n",
" title_field=SemanticField(field_name=\"title\"),\n",
" prioritized_keywords_fields=[SemanticField(field_name=\"url\")],\n",
" prioritized_content_fields=[SemanticField(field_name=\"text\")]\n",
" )\n",
")\n",
"# Create the semantic settings with the configuration\n",
"semantic_settings = SemanticSettings(configurations=[semantic_config])\n",
"\n",
"# Create the index \n",
"index = SearchIndex(name=index_name, fields=fields,\n",
" vector_search=vector_search, semantic_settings=semantic_settings)\n",
"result = index_client.create_or_update_index(index)\n",
"print(f'{result.name} created')\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Insert text and embeddings into vector store\n",
"In this notebook, the wikipedia articles dataset provided by OpenAI, the embeddings are pre-computed. The code below takes the data frame and converts it into a dictionary list to upload to your Azure Search index.\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Uploaded 25000 documents in total\n"
]
}
],
"source": [
"# Convert the 'id' and 'vector_id' columns to string so one of them can serve as our key field \n",
"article_df['id'] = article_df['id'].astype(str) \n",
"article_df['vector_id'] = article_df['vector_id'].astype(str) \n",
" \n",
"# Convert the DataFrame to a list of dictionaries \n",
"documents = article_df.to_dict(orient='records') \n",
" \n",
"# Use SearchIndexingBufferedSender to upload the documents in batches optimized for indexing \n",
"with SearchIndexingBufferedSender(search_service_endpoint, index_name, AzureKeyCredential(search_service_api_key)) as batch_client: \n",
" # Add upload actions for all documents \n",
" batch_client.upload_documents(documents=documents) \n",
" \n",
"print(f\"Uploaded {len(documents)} documents in total\") "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If your dataset didn't already contain pre-computed embeddings, you can create embeddings by using the below function using the `openai` python library. You'll also notice the same function and model are being used to generate query embeddings for performing vector searches."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Content: April is the fourth month of the year in the Julian and Gregorian calendars, and comes between March\n",
"Content vector generated\n"
]
}
],
"source": [
"# Example function to generate document embedding \n",
"def generate_document_embeddings(text): \n",
" response = openai.Embedding.create( \n",
" input=text, engine=model) \n",
" embeddings = response['data'][0]['embedding'] \n",
" return embeddings \n",
" \n",
"# Sampling the first document content as an example \n",
"first_document_content = documents[0]['text'] \n",
"print(f\"Content: {first_document_content[:100]}\") \n",
" \n",
"# Generate the content vector using the `generate_document_embeddings` function \n",
"content_vector = generate_document_embeddings(first_document_content) \n",
"print(f\"Content vector generated\") \n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Perform a vector similarity search"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Title: Documenta\n",
"Score: 0.8599451\n",
"URL: https://simple.wikipedia.org/wiki/Documenta\n",
"\n",
"Title: Museum of Modern Art\n",
"Score: 0.85260946\n",
"URL: https://simple.wikipedia.org/wiki/Museum%20of%20Modern%20Art\n",
"\n",
"Title: Expressionism\n",
"Score: 0.85235393\n",
"URL: https://simple.wikipedia.org/wiki/Expressionism\n",
"\n"
]
}
],
"source": [
"# Function to generate query embedding\n",
"def generate_embeddings(text):\n",
" response = openai.Embedding.create(\n",
" input=text, engine=model)\n",
" embeddings = response['data'][0]['embedding']\n",
" return embeddings\n",
"\n",
"# Pure Vector Search\n",
"query = \"modern art in Europe\"\n",
" \n",
"search_client = SearchClient(search_service_endpoint, index_name, AzureKeyCredential(search_service_api_key)) \n",
"vector = Vector(value=generate_embeddings(query), k=3, fields=\"content_vector\") \n",
" \n",
"results = search_client.search( \n",
" search_text=None, \n",
" vectors=[vector], \n",
" select=[\"title\", \"text\", \"url\"] \n",
")\n",
" \n",
"for result in results: \n",
" print(f\"Title: {result['title']}\") \n",
" print(f\"Score: {result['@search.score']}\") \n",
" print(f\"URL: {result['url']}\\n\") "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Perform a Hybrid Search"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Title: Wars of Scottish Independence\n",
"Score: 0.03306011110544205\n",
"URL: https://simple.wikipedia.org/wiki/Wars%20of%20Scottish%20Independence\n",
"\n",
"Title: Battle of Bannockburn\n",
"Score: 0.022253260016441345\n",
"URL: https://simple.wikipedia.org/wiki/Battle%20of%20Bannockburn\n",
"\n",
"Title: Scottish\n",
"Score: 0.016393441706895828\n",
"URL: https://simple.wikipedia.org/wiki/Scottish\n",
"\n"
]
}
],
"source": [
"# Hybrid Search\n",
"query = \"Famous battles in Scottish history\" \n",
" \n",
"search_client = SearchClient(search_service_endpoint, index_name, AzureKeyCredential(search_service_api_key)) \n",
"vector = Vector(value=generate_embeddings(query), k=3, fields=\"content_vector\") \n",
" \n",
"results = search_client.search( \n",
" search_text=query, \n",
" vectors=[vector],\n",
" select=[\"title\", \"text\", \"url\"],\n",
" top=3\n",
") \n",
" \n",
"for result in results: \n",
" print(f\"Title: {result['title']}\") \n",
" print(f\"Score: {result['@search.score']}\") \n",
" print(f\"URL: {result['url']}\\n\") "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Perform a Hybrid Search with Reranking (powered by Bing)\n",
"[Semantic search](https://learn.microsoft.com/azure/search/semantic-ranking) allows you to leverage deep neural networks from Microsoft Bing to further increase your search accuracy. Additionally, you can get captions, answers, and highlights. "
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Semantic Answer: The<em> Battle of Bannockburn,</em> fought on 23 and 24 June 1314, was an important Scottish victory in the Wars of Scottish Independence. A smaller Scottish army defeated a much larger and better armed English army. Background When King Alexander III of Scotland died in 1286, his heir was his granddaughter Margaret, Maid of Norway.\n",
"Semantic Answer Score: 0.8857421875\n",
"\n",
"Title: Wars of Scottish Independence\n",
"URL: https://simple.wikipedia.org/wiki/Wars%20of%20Scottish%20Independence\n",
"Caption: Important Figures Scotland King David II King John Balliol King Robert I the Bruce William Wallace England King Edward I King Edward II King Edward III Battles Battle of Bannockburn The Battle of Bannockburn (2324 June 1314) was an important Scottish victory. It was the decisive battle in the First War of Scottish Independence.\n",
"\n",
"Title: Battle of Bannockburn\n",
"URL: https://simple.wikipedia.org/wiki/Battle%20of%20Bannockburn\n",
"Caption: The Battle of Bannockburn, fought on 23 and 24 June 1314, was an important<em> Scottish</em> victory in the Wars of<em> Scottish</em> Independence. A smaller Scottish army defeated a much larger and better armed English army. Background When King Alexander III of Scotland died in 1286, his heir was his granddaughter Margaret, Maid of Norway.\n",
"\n",
"Title: First War of Scottish Independence\n",
"URL: https://simple.wikipedia.org/wiki/First%20War%20of%20Scottish%20Independence\n",
"Caption: The First War of<em> Scottish Independence</em> lasted from the outbreak of the war in 1296 until the 1328. The Scots were defeated at Dunbar on 27 April 1296. John Balliol abdicated in Montrose castle on 10 July 1296.\n",
"\n"
]
}
],
"source": [
"# Semantic Hybrid Search\n",
"query = \"Famous battles in Scottish history\" \n",
"\n",
"search_client = SearchClient(search_service_endpoint, index_name, AzureKeyCredential(search_service_api_key)) \n",
"vector = Vector(value=generate_embeddings(query), k=3, fields=\"content_vector\") \n",
"\n",
"results = search_client.search( \n",
" search_text=query, \n",
" vectors=[vector], \n",
" select=[\"title\", \"text\", \"url\"],\n",
" query_type=\"semantic\", query_language=\"en-us\", semantic_configuration_name='my-semantic-config', query_caption=\"extractive\", query_answer=\"extractive\",\n",
" top=3\n",
")\n",
"\n",
"semantic_answers = results.get_answers()\n",
"for answer in semantic_answers:\n",
" if answer.highlights:\n",
" print(f\"Semantic Answer: {answer.highlights}\")\n",
" else:\n",
" print(f\"Semantic Answer: {answer.text}\")\n",
" print(f\"Semantic Answer Score: {answer.score}\\n\")\n",
"\n",
"for result in results:\n",
" print(f\"Title: {result['title']}\")\n",
" print(f\"URL: {result['url']}\")\n",
" captions = result[\"@search.captions\"]\n",
" if captions:\n",
" caption = captions[0]\n",
" if caption.highlights:\n",
" print(f\"Caption: {caption.highlights}\\n\")\n",
" else:\n",
" print(f\"Caption: {caption.text}\\n\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -578,9 +578,9 @@
- embeddings
- tiktoken
- title: Azure Cognitive Search as a vector database for OpenAI embeddings
- title: Azure AI Search as a vector database for OpenAI embeddings
path: >-
examples/vector_databases/azuresearch/Getting_started_with_azure_cognitive_search_and_openai.ipynb
examples/vector_databases/azuresearch/Getting_started_with_azure_ai_search_and_openai.ipynb
date: 2023-09-11
authors:
- farzad528

Loading…
Cancel
Save