Community: Updating Azure Retriever and Docs to be Azure AI Search instead of Azure Cognitive Search (#19925)

Last year Microsoft [changed the
name](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search)
of Azure Cognitive Search to Azure AI Search. This PR updates the
LangChain Azure Retriever API and its associated docs to reflect this
change. It may be confusing for users to see the name Cognitive here and
AI in the Microsoft documentation, which is why this is needed. I've also
added a more detailed example to the Azure retriever doc page.

There are more places that need a similar update but I'm breaking it up
so the PRs are not too big 😄 Fixing my errors from the previous PR.

Twitter: @marlene_zw

Two new tests added to test backward compatibility in
`libs/community/tests/integration_tests/retrievers/test_azure_cognitive_search.py`
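
The backward compatibility covered here is for the class name. The diff also renames the environment variables from `AZURE_COGNITIVE_SEARCH_*` to `AZURE_AI_SEARCH_*`, so callers who still export the old names may want a fallback on their side. A minimal sketch of that idea, assuming a hypothetical helper `get_search_setting` that is not part of this PR or of LangChain:

```python
import os
from typing import Mapping, Optional


def get_search_setting(
    suffix: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Look up a setting such as "API_KEY", preferring the new prefix.

    Hypothetical helper for illustration only; not part of LangChain.
    """
    env = os.environ if env is None else env
    # Try the new variable name first, then fall back to the old one.
    for prefix in ("AZURE_AI_SEARCH_", "AZURE_COGNITIVE_SEARCH_"):
        value = env.get(prefix + suffix)
        if value:
            return value
    return None
```

The returned values could then be passed as the `service_name`, `index_name` and `api_key` arguments instead of relying on the environment lookup inside the retriever.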

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
pull/9293/merge
Marlene 1 month ago committed by GitHub
parent 820b713086
commit 2f03bc397e
GPG Key ID: B5690EEEBB952194

@ -933,7 +933,7 @@
"**Answer**: The LangChain class includes various types of retrievers such as:\n",
"\n",
"- ArxivRetriever\n",
"- AzureCognitiveSearchRetriever\n",
"- AzureAISearchRetriever\n",
"- BM25Retriever\n",
"- ChaindeskRetriever\n",
"- ChatGPTPluginRetriever\n",
@ -993,7 +993,7 @@
{
"data": {
"text/plain": [
"{'question': 'LangChain possesses a variety of retrievers including:\\n\\n1. ArxivRetriever\\n2. AzureCognitiveSearchRetriever\\n3. BM25Retriever\\n4. ChaindeskRetriever\\n5. ChatGPTPluginRetriever\\n6. ContextualCompressionRetriever\\n7. DocArrayRetriever\\n8. ElasticSearchBM25Retriever\\n9. EnsembleRetriever\\n10. GoogleVertexAISearchRetriever\\n11. AmazonKendraRetriever\\n12. KNNRetriever\\n13. LlamaIndexGraphRetriever\\n14. LlamaIndexRetriever\\n15. MergerRetriever\\n16. MetalRetriever\\n17. MilvusRetriever\\n18. MultiQueryRetriever\\n19. ParentDocumentRetriever\\n20. PineconeHybridSearchRetriever\\n21. PubMedRetriever\\n22. RePhraseQueryRetriever\\n23. RemoteLangChainRetriever\\n24. SelfQueryRetriever\\n25. SVMRetriever\\n26. TFIDFRetriever\\n27. TimeWeightedVectorStoreRetriever\\n28. VespaRetriever\\n29. WeaviateHybridSearchRetriever\\n30. WebResearchRetriever\\n31. WikipediaRetriever\\n32. ZepRetriever\\n33. ZillizRetriever\\n\\nIt also includes self query translators like:\\n\\n1. ChromaTranslator\\n2. DeepLakeTranslator\\n3. MyScaleTranslator\\n4. PineconeTranslator\\n5. QdrantTranslator\\n6. WeaviateTranslator\\n\\nAnd remote retrievers like:\\n\\n1. RemoteLangChainRetriever'}"
"{'question': 'LangChain possesses a variety of retrievers including:\\n\\n1. ArxivRetriever\\n2. AzureAISearchRetriever\\n3. BM25Retriever\\n4. ChaindeskRetriever\\n5. ChatGPTPluginRetriever\\n6. ContextualCompressionRetriever\\n7. DocArrayRetriever\\n8. ElasticSearchBM25Retriever\\n9. EnsembleRetriever\\n10. GoogleVertexAISearchRetriever\\n11. AmazonKendraRetriever\\n12. KNNRetriever\\n13. LlamaIndexGraphRetriever\\n14. LlamaIndexRetriever\\n15. MergerRetriever\\n16. MetalRetriever\\n17. MilvusRetriever\\n18. MultiQueryRetriever\\n19. ParentDocumentRetriever\\n20. PineconeHybridSearchRetriever\\n21. PubMedRetriever\\n22. RePhraseQueryRetriever\\n23. RemoteLangChainRetriever\\n24. SelfQueryRetriever\\n25. SVMRetriever\\n26. TFIDFRetriever\\n27. TimeWeightedVectorStoreRetriever\\n28. VespaRetriever\\n29. WeaviateHybridSearchRetriever\\n30. WebResearchRetriever\\n31. WikipediaRetriever\\n32. ZepRetriever\\n33. ZillizRetriever\\n\\nIt also includes self query translators like:\\n\\n1. ChromaTranslator\\n2. DeepLakeTranslator\\n3. MyScaleTranslator\\n4. PineconeTranslator\\n5. QdrantTranslator\\n6. WeaviateTranslator\\n\\nAnd remote retrievers like:\\n\\n1. RemoteLangChainRetriever'}"
]
},
"execution_count": 31,
@ -1117,7 +1117,7 @@
"The LangChain class includes various types of retrievers such as:\n",
"\n",
"- ArxivRetriever\n",
"- AzureCognitiveSearchRetriever\n",
"- AzureAISearchRetriever\n",
"- BM25Retriever\n",
"- ChaindeskRetriever\n",
"- ChatGPTPluginRetriever\n",

File diff suppressed because one or more lines are too long

@ -252,23 +252,23 @@ from langchain_community.vectorstores import AzureCosmosDBVectorSearch
```
## Retrievers
### Azure Cognitive Search
### Azure AI Search
>[Azure Cognitive Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) (formerly known as `Azure Search`) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.
>[Azure AI Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) (formerly known as `Azure Search` or `Azure Cognitive Search`) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.
>Search is foundational to any app that surfaces text to users, where common scenarios include catalog or document search, online retail apps, or data exploration over proprietary content. When you create a search service, you'll work with the following capabilities:
>- A search engine for full text search over a search index containing user-owned content
>- Rich indexing, with lexical analysis and optional AI enrichment for content extraction and transformation
>- Rich query syntax for text search, fuzzy search, autocomplete, geo-search and more
>- Programmability through REST APIs and client libraries in Azure SDKs
>- Azure integration at the data layer, machine learning layer, and AI (Cognitive Services)
>- Azure integration at the data layer, machine learning layer, and AI (AI Services)
See [set up instructions](https://learn.microsoft.com/en-us/azure/search/search-create-service-portal).
See a [usage example](/docs/integrations/retrievers/azure_cognitive_search).
See a [usage example](/docs/integrations/retrievers/azure_ai_search).
```python
from langchain.retrievers import AzureCognitiveSearchRetriever
from langchain.retrievers import AzureAISearchRetriever
```
## Toolkits

File diff suppressed because one or more lines are too long

@ -1,147 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1edb9e6b",
"metadata": {},
"source": [
"# Azure Cognitive Search\n",
"\n",
">[Azure Cognitive Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) (formerly known as `Azure Search`) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.\n",
"\n",
">Search is foundational to any app that surfaces text to users, where common scenarios include catalog or document search, online retail apps, or data exploration over proprietary content. When you create a search service, you'll work with the following capabilities:\n",
">- A search engine for full text search over a search index containing user-owned content\n",
">- Rich indexing, with lexical analysis and optional AI enrichment for content extraction and transformation\n",
">- Rich query syntax for text search, fuzzy search, autocomplete, geo-search and more\n",
">- Programmability through REST APIs and client libraries in Azure SDKs\n",
">- Azure integration at the data layer, machine learning layer, and AI (Cognitive Services)\n",
"\n",
"This notebook shows how to use Azure Cognitive Search (ACS) within LangChain."
]
},
{
"cell_type": "markdown",
"id": "074b0004",
"metadata": {},
"source": [
"## Set up Azure Cognitive Search\n",
"\n",
"To set up ACS, please follow the instructions [here](https://learn.microsoft.com/en-us/azure/search/search-create-service-portal).\n",
"\n",
"Please note\n",
"1. the name of your ACS service, \n",
"2. the name of your ACS index,\n",
"3. your API key.\n",
"\n",
"Your API key can be either Admin or Query key, but as we only read data it is recommended to use a Query key."
]
},
{
"cell_type": "markdown",
"id": "0474661d",
"metadata": {},
"source": [
"## Using the Azure Cognitive Search Retriever"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "39d6074e",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_community.retrievers import (\n",
" AzureCognitiveSearchRetriever,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "b7243e6d",
"metadata": {},
"source": [
"Set Service Name, Index Name and API key as environment variables (alternatively, you can pass them as arguments to `AzureCognitiveSearchRetriever`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33fd23d1",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"AZURE_COGNITIVE_SEARCH_SERVICE_NAME\"] = \"<YOUR_ACS_SERVICE_NAME>\"\n",
"os.environ[\"AZURE_COGNITIVE_SEARCH_INDEX_NAME\"] = \"<YOUR_ACS_INDEX_NAME>\"\n",
"os.environ[\"AZURE_COGNITIVE_SEARCH_API_KEY\"] = \"<YOUR_API_KEY>\""
]
},
{
"cell_type": "markdown",
"id": "057deaad",
"metadata": {},
"source": [
"Create the Retriever"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c18d0c4c",
"metadata": {},
"outputs": [],
"source": [
"retriever = AzureCognitiveSearchRetriever(content_key=\"content\", top_k=10)"
]
},
{
"cell_type": "markdown",
"id": "e94ea104",
"metadata": {},
"source": [
"Now you can retrieve documents from Azure Cognitive Search"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c8b5794b",
"metadata": {},
"outputs": [],
"source": [
"retriever.get_relevant_documents(\"what is langchain\")"
]
},
{
"cell_type": "markdown",
"id": "72eca08e",
"metadata": {},
"source": [
"You can change the number of results returned with the `top_k` parameter. The default value is `None`, which returns all results. "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -26,7 +26,8 @@ _module_lookup = {
"AmazonKnowledgeBasesRetriever": "langchain_community.retrievers.bedrock",
"ArceeRetriever": "langchain_community.retrievers.arcee",
"ArxivRetriever": "langchain_community.retrievers.arxiv",
"AzureCognitiveSearchRetriever": "langchain_community.retrievers.azure_cognitive_search", # noqa: E501
"AzureAISearchRetriever": "langchain_community.retrievers.azure_ai_search", # noqa: E501
"AzureCognitiveSearchRetriever": "langchain_community.retrievers.azure_ai_search", # noqa: E501
"BM25Retriever": "langchain_community.retrievers.bm25",
"BreebsRetriever": "langchain_community.retrievers.breebs",
"ChaindeskRetriever": "langchain_community.retrievers.chaindesk",
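
Both the new and the old class name map to the same `azure_ai_search` module in this table. The code that consumes `_module_lookup` is not shown in this hunk; a table like this is commonly paired with a module-level `__getattr__` that imports the owning module on first access. A minimal sketch of that pattern, assuming stdlib modules as stand-ins for the retriever modules and a hypothetical function name `lazy_getattr`:

```python
import importlib
from typing import Any

# Stand-in table: maps an exported name to the module that defines it.
# The table in the diff maps retriever class names to langchain_community
# submodules; stdlib modules are used here so the sketch runs anywhere.
_module_lookup = {
    "OrderedDict": "collections",
    "sqrt": "math",
}


def lazy_getattr(name: str) -> Any:
    """Import the owning module on demand and return the attribute."""
    if name in _module_lookup:
        module = importlib.import_module(_module_lookup[name])
        return getattr(module, name)
    raise AttributeError(f"module has no attribute {name!r}")
```

Inside a package `__init__.py`, a function with this body would be named `__getattr__` so that Python calls it for names not found by normal lookup.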

@ -18,13 +18,13 @@ DEFAULT_URL_SUFFIX = "search.windows.net"
"""Default URL Suffix for endpoint connection - commercial cloud"""
class AzureCognitiveSearchRetriever(BaseRetriever):
"""`Azure Cognitive Search` service retriever."""
class AzureAISearchRetriever(BaseRetriever):
"""`Azure AI Search` service retriever."""
service_name: str = ""
"""Name of Azure Cognitive Search service"""
"""Name of Azure AI Search service"""
index_name: str = ""
"""Name of Index inside Azure Cognitive Search service"""
"""Name of Index inside Azure AI Search service"""
api_key: str = ""
"""API Key. Both Admin and Query keys work, but for reading data it's
recommended to use a Query key."""
@ -45,27 +45,30 @@ class AzureCognitiveSearchRetriever(BaseRetriever):
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that service name, index name and api key exist in environment."""
values["service_name"] = get_from_dict_or_env(
values, "service_name", "AZURE_COGNITIVE_SEARCH_SERVICE_NAME"
values, "service_name", "AZURE_AI_SEARCH_SERVICE_NAME"
)
values["index_name"] = get_from_dict_or_env(
values, "index_name", "AZURE_COGNITIVE_SEARCH_INDEX_NAME"
values, "index_name", "AZURE_AI_SEARCH_INDEX_NAME"
)
values["api_key"] = get_from_dict_or_env(
values, "api_key", "AZURE_COGNITIVE_SEARCH_API_KEY"
values, "api_key", "AZURE_AI_SEARCH_API_KEY"
)
return values
def _build_search_url(self, query: str) -> str:
url_suffix = get_from_env(
"", "AZURE_COGNITIVE_SEARCH_URL_SUFFIX", DEFAULT_URL_SUFFIX
)
url_suffix = get_from_env("", "AZURE_AI_SEARCH_URL_SUFFIX", DEFAULT_URL_SUFFIX)
if url_suffix in self.service_name and "https://" in self.service_name:
base_url = f"{self.service_name}/"
elif url_suffix in self.service_name and "https://" not in self.service_name:
base_url = f"https://{self.service_name}/"
elif url_suffix not in self.service_name and "https://" in self.service_name:
base_url = f"{self.service_name}.{url_suffix}/"
elif (
url_suffix not in self.service_name and "https://" not in self.service_name
):
base_url = f"https://{self.service_name}.{url_suffix}/"
else:
# pass to Azure to throw a specific error
base_url = self.service_name
endpoint_path = f"indexes/{self.index_name}/docs?api-version={self.api_version}"
top_param = f"&$top={self.top_k}" if self.top_k else ""
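
The four branches above normalize whatever the caller passed as `service_name` (a bare name, a host name, or a full URL) into one base URL. A standalone sketch of the same logic, assuming a hypothetical free function `build_base_url` in place of the method and taking the suffix as an argument instead of reading it from the environment:

```python
DEFAULT_URL_SUFFIX = "search.windows.net"


def build_base_url(service_name: str, url_suffix: str = DEFAULT_URL_SUFFIX) -> str:
    """Normalize a service name, host name, or URL into a base URL."""
    has_suffix = url_suffix in service_name
    has_scheme = "https://" in service_name
    if has_suffix and has_scheme:
        # Already a full URL: only add the trailing slash.
        return f"{service_name}/"
    if has_suffix:
        # Host name without scheme.
        return f"https://{service_name}/"
    if has_scheme:
        # Scheme plus bare service name: append the suffix.
        return f"{service_name}.{url_suffix}/"
    # Bare service name: add both scheme and suffix.
    return f"https://{service_name}.{url_suffix}/"
```

With the default suffix, `"mysvc"`, `"mysvc.search.windows.net"`, `"https://mysvc"` and `"https://mysvc.search.windows.net"` all resolve to the same base URL. Since the two boolean checks cover every combination, the final `else` in the diff is not reachable in this sketch and is omitted.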
@ -119,3 +122,11 @@ class AzureCognitiveSearchRetriever(BaseRetriever):
Document(page_content=result.pop(self.content_key), metadata=result)
for result in search_results
]
# For backwards compatibility
class AzureCognitiveSearchRetriever(AzureAISearchRetriever):
"""`Azure Cognitive Search` service retriever.
This version of the retriever will soon be
deprecated. Please switch to AzureAISearchRetriever
"""
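
The alias above signals the deprecation only through its docstring. If a runtime signal is wanted later, one option is to emit a `DeprecationWarning` from the old class's constructor. A minimal sketch of that option, assuming simplified stand-in classes (the `Sketch` names are hypothetical, and the warning is not part of this PR):

```python
import warnings


class AzureAISearchRetrieverSketch:
    """Stand-in for the renamed class; holds only what the sketch needs."""

    def __init__(self, top_k=None):
        self.top_k = top_k


class AzureCognitiveSearchRetrieverSketch(AzureAISearchRetrieverSketch):
    """Old name kept as a thin subclass so existing imports keep working."""

    def __init__(self, *args, **kwargs):
        # Warn at the caller's location, then defer to the new class.
        warnings.warn(
            "AzureCognitiveSearchRetrieverSketch is deprecated; "
            "use AzureAISearchRetrieverSketch instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```

Because the old name subclasses the new one, `isinstance` checks against the new class still pass for objects created through the old name.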

@ -0,0 +1,70 @@
"""Test Azure AI Search wrapper."""
from langchain_core.documents import Document
from langchain_community.retrievers.azure_ai_search import (
AzureAISearchRetriever,
AzureCognitiveSearchRetriever,
)
def test_azure_ai_search_get_relevant_documents() -> None:
"""Test valid call to Azure AI Search.
In order to run this test, you should provide
a `service_name`, azure search `api_key` and an `index_name`
as arguments for the AzureAISearchRetriever in both tests.
api_version, aiosession and top_k are optional parameters.
"""
retriever = AzureAISearchRetriever()
documents = retriever.get_relevant_documents("what is langchain?")
for doc in documents:
assert isinstance(doc, Document)
assert doc.page_content
retriever = AzureAISearchRetriever(top_k=1)
documents = retriever.get_relevant_documents("what is langchain?")
assert len(documents) <= 1
async def test_azure_ai_search_aget_relevant_documents() -> None:
"""Test valid async call to Azure AI Search.
In order to run this test, you should provide
a `service_name`, azure search `api_key` and an `index_name`
as arguments for the AzureAISearchRetriever.
"""
retriever = AzureAISearchRetriever()
documents = await retriever.aget_relevant_documents("what is langchain?")
for doc in documents:
assert isinstance(doc, Document)
assert doc.page_content
def test_azure_cognitive_search_get_relevant_documents() -> None:
"""Test valid call to Azure Cognitive Search.
This is to test backwards compatibility of the retriever
"""
retriever = AzureCognitiveSearchRetriever()
documents = retriever.get_relevant_documents("what is langchain?")
for doc in documents:
assert isinstance(doc, Document)
assert doc.page_content
retriever = AzureCognitiveSearchRetriever(top_k=1)
documents = retriever.get_relevant_documents("what is langchain?")
assert len(documents) <= 1
async def test_azure_cognitive_search_aget_relevant_documents() -> None:
"""Test valid async call to Azure Cognitive Search.
This is to test backwards compatibility of the retriever
"""
retriever = AzureCognitiveSearchRetriever()
documents = await retriever.aget_relevant_documents("what is langchain?")
for doc in documents:
assert isinstance(doc, Document)
assert doc.page_content

@ -1,37 +0,0 @@
"""Test Azure Cognitive Search wrapper."""
from langchain_core.documents import Document
from langchain_community.retrievers.azure_cognitive_search import (
AzureCognitiveSearchRetriever,
)
def test_azure_cognitive_search_get_relevant_documents() -> None:
"""Test valid call to Azure Cognitive Search.
In order to run this test, you should provide a service name, azure search api key
and an index_name as arguments for the AzureCognitiveSearchRetriever in both tests.
"""
retriever = AzureCognitiveSearchRetriever()
documents = retriever.get_relevant_documents("what is langchain?")
for doc in documents:
assert isinstance(doc, Document)
assert doc.page_content
retriever = AzureCognitiveSearchRetriever()
documents = retriever.get_relevant_documents("what is langchain?")
assert len(documents) <= 1
async def test_azure_cognitive_search_aget_relevant_documents() -> None:
"""Test valid async call to Azure Cognitive Search.
In order to run this test, you should provide a service name, azure search api key
and an index_name as arguments for the AzureCognitiveSearchRetriever.
"""
retriever = AzureCognitiveSearchRetriever()
documents = await retriever.aget_relevant_documents("what is langchain?")
for doc in documents:
assert isinstance(doc, Document)
assert doc.page_content

@ -5,6 +5,7 @@ EXPECTED_ALL = [
"AmazonKnowledgeBasesRetriever",
"ArceeRetriever",
"ArxivRetriever",
"AzureAISearchRetriever",
"AzureCognitiveSearchRetriever",
"BreebsRetriever",
"ChatGPTPluginRetriever",

@ -60,6 +60,7 @@ __all__ = [
"AmazonKnowledgeBasesRetriever",
"ArceeRetriever",
"ArxivRetriever",
"AzureAISearchRetriever",
"AzureCognitiveSearchRetriever",
"ChatGPTPluginRetriever",
"ContextualCompressionRetriever",

@ -0,0 +1,6 @@
from langchain_community.retrievers.azure_ai_search import (
AzureAISearchRetriever,
AzureCognitiveSearchRetriever,
)
__all__ = ["AzureAISearchRetriever", "AzureCognitiveSearchRetriever"]

@ -1,5 +0,0 @@
from langchain_community.retrievers.azure_cognitive_search import (
AzureCognitiveSearchRetriever,
)
__all__ = ["AzureCognitiveSearchRetriever"]

@ -6,6 +6,7 @@ EXPECTED_ALL = [
"AmazonKnowledgeBasesRetriever",
"ArceeRetriever",
"ArxivRetriever",
"AzureAISearchRetriever",
"AzureCognitiveSearchRetriever",
"ChatGPTPluginRetriever",
"ContextualCompressionRetriever",
