langchain/libs/community/langchain_community
Clément Tamines a6cbb755a7
community[patch]: fix semantic answer bug in AzureSearch vector store (#18938)
- **Description:** The `semantic_hybrid_search_with_score_and_rerank`
method of `AzureSearch` hardcodes the field name "metadata" for the
document metadata in the Azure AI Search Index. This field is optional
when creating an Azure AI Search Index, which is why other snippets
from `AzureSearch` test for its existence before trying to access it.
Furthermore, the metadata field name shouldn't be hardcoded as
"metadata"; it should come from the `FIELDS_METADATA` variable, which
defines this field name. In the current implementation, any index
without a metadata field named "metadata" raises an error if a semantic
answer is returned by the search in
`semantic_hybrid_search_with_score_and_rerank`.

- **Issue:** https://github.com/langchain-ai/langchain/issues/18731

- **Prior fix to this bug:** This bug was fixed in
https://github.com/langchain-ai/langchain/pull/15642 by adding a check
for the existence of the metadata field named `FIELDS_METADATA` and, if
it exists, retrieving the value for the key called "key" in that
metadata. If the field named `FIELDS_METADATA` was not present, an empty
string was returned. This fix was removed in
https://github.com/langchain-ai/langchain/pull/15659 (see
ed1ffca911#).
@lz-chen: could you confirm this wasn't intentional? 

- **New fix to this bug:** I believe there was an oversight in the logic
of the fix from
[#15642](https://github.com/langchain-ai/langchain/pull/15642), which I
explain below.
The `semantic_hybrid_search_with_score_and_rerank` method creates a
dictionary `semantic_answers_dict` with semantic answers returned by the
search as follows.

5c2f7e6b2b/libs/community/langchain_community/vectorstores/azuresearch.py (L574-L581)
The keys in this dictionary are the unique document ids in the index, if
I understand the [documentation of semantic
answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers)
in Azure AI Search correctly. When the method transforms a search result
into a `Document` object, an "answer" key is added to the document's
metadata. The value for this "answer" key should be the semantic answer
returned by the search from this document, if such an answer is
returned. A `Document` object should be matched to its semantic answer
through the unique document id, which is the key used in the
`semantic_answers_dict` dictionary. This id is stored in the search
result's field named `FIELDS_ID`. I added a check to avoid an error if a
search result has no field named `FIELDS_ID` (which shouldn't happen in
theory).
A benefit of this approach is that this fix should work whether or not
the Azure AI Search Index contains a metadata field.
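
The id-based matching described above can be sketched as follows. This is a minimal illustration, not the code from `azuresearch.py`: search results and semantic answers are modeled as plain dicts, the helper names `build_semantic_answers_dict` and `attach_answer` are hypothetical, and the field names assigned to `FIELDS_ID` and `FIELDS_METADATA` are assumed values.

```python
from typing import Any, Dict, List

# Assumed field names for illustration; the real values are defined
# in the vector store module and may differ per index.
FIELDS_ID = "id"
FIELDS_METADATA = "metadata"


def build_semantic_answers_dict(answers: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Index semantic answers by the id of the document they came from."""
    return {
        answer["key"]: {"text": answer["text"], "highlights": answer["highlights"]}
        for answer in answers
    }


def attach_answer(
    result: Dict[str, Any], semantic_answers_dict: Dict[str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Build document metadata with an "answer" key, matched by document id.

    The lookup never touches the metadata field, so it works whether or
    not the index defines one. A missing id or a missing answer both fall
    back to an empty string.
    """
    metadata = dict(result.get(FIELDS_METADATA) or {})
    doc_id = result.get(FIELDS_ID, "")
    metadata["answer"] = semantic_answers_dict.get(doc_id, "")
    return metadata


# Usage: one result with an answer and no metadata field, one without an id.
answers = [{"key": "doc-1", "text": "Paris", "highlights": "<em>Paris</em>"}]
answers_by_id = build_semantic_answers_dict(answers)
with_answer = attach_answer({"id": "doc-1", "content": "..."}, answers_by_id)
without_id = attach_answer({"content": "..."}, answers_by_id)
```

Because both lookups use `dict.get` with a default, a result lacking `FIELDS_ID` or `FIELDS_METADATA` produces an empty "answer" value rather than a `KeyError`.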

@levalencia could you confirm my analysis and test the fix?
@raunakshrivastava7 do you agree with the fix?

Thanks for the help!
4 months ago
..
adapters docs: added `community` modules descriptions (#17827) 5 months ago
agent_toolkits community: Add PolygonAggregates tool (#18882) 5 months ago
callbacks community[patch] : [Fiddler] ensure dataset is not added if model is present (#19293) 4 months ago
chat_loaders community[patch]: speed up import times in the community package (#18928) 5 months ago
chat_message_histories community[patch]: speed up import times in the community package (#18928) 5 months ago
chat_models community[minor]: Prem AI langchain integration (#19113) 4 months ago
docstore community[patch]: speed up import times in the community package (#18928) 5 months ago
document_compressors community[patch]: speed up import times in the community package (#18928) 5 months ago
document_loaders community[patch]: expanding version in confluence loader (#19324) 4 months ago
document_transformers community[patch]: flattening imports 3 (#18939) 5 months ago
embeddings community[minor]: Prem AI langchain integration (#19113) 4 months ago
example_selectors docs: added `community` modules descriptions (#17827) 5 months ago
graphs community[patch]: flattening imports 3 (#18939) 5 months ago
indexes community: Add document manager and mongo document manager (#17320) 5 months ago
llms community[patch]: YandexGPT Use recent yandexcloud sdk version (#19341) 4 months ago
output_parsers langchain[patch], community[minor]: move `output_parsers.ernie_functions` (#16057) 7 months ago
retrievers community[patch]: Fixing incorrect base URLs for Azure Cognitive Search Retriever (#19352) 4 months ago
storage community[patch]: flattening imports 3 (#18939) 5 months ago
tools Josha91 fix docstring (#19249) 5 months ago
utilities community[patch]: flattening imports 3 (#18939) 5 months ago
utils core[patch]: Convert SimSIMD back to NumPy (#19473) 4 months ago
vectorstores community[patch]: fix semantic answer bug in AzureSearch vector store (#18938) 4 months ago
__init__.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
cache.py community: Use langchain-astradb for AstraDB caches (#18419) 5 months ago
py.typed community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago