langchain/libs/community/langchain_community/vectorstores
Clément Tamines a6cbb755a7
community[patch]: fix semantic answer bug in AzureSearch vector store (#18938)
- **Description:** The `semantic_hybrid_search_with_score_and_rerank`
method of `AzureSearch` contains a hardcoded field name "metadata" for
the document metadata in the Azure AI Search Index. Adding such a field
is optional when creating an Azure AI Search Index, as shown by other
snippets from `AzureSearch` that test for the existence of this field
before trying to access it. Furthermore, the metadata field name
shouldn't be hardcoded as "metadata"; the method should use the
`FIELDS_METADATA` variable that defines this field name instead. In the
current implementation, any index without a metadata field named
"metadata" will yield an error if a semantic answer is returned by the
search in `semantic_hybrid_search_with_score_and_rerank`.

- **Issue:** https://github.com/langchain-ai/langchain/issues/18731

- **Prior fix to this bug:** This bug was fixed in
https://github.com/langchain-ai/langchain/pull/15642 by adding a check
for the existence of the metadata field named `FIELDS_METADATA` and
retrieving the value for the key called "key" in that metadata if it
exists. If the field named `FIELDS_METADATA` was not present, an empty
string was returned. This fix was removed in
https://github.com/langchain-ai/langchain/pull/15659 (see
ed1ffca911#).
@lz-chen: could you confirm this wasn't intentional?

- **New fix to this bug:** I believe there was an oversight in the logic
of the fix from
[#15642](https://github.com/langchain-ai/langchain/pull/15642), which I
explain below.
The `semantic_hybrid_search_with_score_and_rerank` method creates a
dictionary `semantic_answers_dict` with semantic answers returned by the
search as follows.

5c2f7e6b2b/libs/community/langchain_community/vectorstores/azuresearch.py (L574-L581)
The keys in this dictionary are the unique document ids in the index, if
I understand the [documentation of semantic
answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers)
in Azure AI Search correctly. When the method transforms a search result
into a `Document` object, an "answer" key is added to the document's
metadata. The value for this "answer" key should be the semantic answer
returned by the search from this document, if such an answer is
returned. The match between a `Document` object and the semantic answers
returned by the search should be done through the unique document id,
which is used as a key for the `semantic_answers_dict` dictionary. This
id is defined in the search result's field named `FIELDS_ID`. I added a
check to avoid any error in case no field named `FIELDS_ID` exists in a
search result (which shouldn't happen in theory).
A benefit of this approach is that this fix should work whether or not
the Azure AI Search Index contains a metadata field.
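
The difference between the two lookups can be sketched as follows. This is a minimal illustration, not the code from `azuresearch.py`: search results and semantic answers are modeled as plain dicts, the values assigned to `FIELDS_ID` and `FIELDS_METADATA` are assumed, the metadata field is assumed to hold a JSON string, and the helper function names are hypothetical.

```python
import json
from typing import Any, Dict, List, Union

# Assumed values for illustration only; the real constants are defined in
# azuresearch.py and may be configured differently.
FIELDS_ID = "id"
FIELDS_METADATA = "metadata"

Answer = Dict[str, Any]


def build_semantic_answers_dict(answers: List[Answer]) -> Dict[str, Answer]:
    """Index semantic answers by the key of the document they came from."""
    return {
        a["key"]: {"text": a["text"], "highlights": a["highlights"]}
        for a in answers
    }


def answer_via_metadata(
    result: Dict[str, Any], answers: Dict[str, Answer]
) -> Union[Answer, str]:
    """Lookup through the metadata field, as the prior fix is described.

    Returns an empty string when the index has no metadata field, so the
    semantic answer is lost for such indexes.
    """
    if FIELDS_METADATA not in result:
        return ""
    metadata = json.loads(result[FIELDS_METADATA])
    return answers.get(metadata.get("key"), "")


def answer_via_id(
    result: Dict[str, Any], answers: Dict[str, Answer]
) -> Union[Answer, str]:
    """Lookup through the document id, as the new fix is described.

    Works whether or not the index contains a metadata field, and falls
    back to an empty string if the id field is missing from the result.
    """
    return answers.get(result.get(FIELDS_ID, ""), "")


if __name__ == "__main__":
    answers = build_semantic_answers_dict(
        [{"key": "doc-1", "text": "Paris", "highlights": "<em>Paris</em>"}]
    )
    # A search result from an index that has no metadata field.
    result = {"id": "doc-1", "content": "Paris is the capital of France."}
    print(answer_via_metadata(result, answers))
    print(answer_via_id(result, answers))
```

With a result from an index that has no metadata field, the metadata-based lookup returns an empty string even though a semantic answer exists for that document, while the id-based lookup still finds it.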

@levalencia could you confirm my analysis and test the fix?
@raunakshrivastava7 do you agree with the fix?

Thanks for the help!
4 months ago
docarray community[patch]: docarray requires hnsw installation (#19416) 4 months ago
redis community[patch]: the syntax error for Redis generated query (#17717) 5 months ago
__init__.py community[minor]: Add `DuckDB` as a vectorstore (#18916) 4 months ago
alibabacloud_opensearch.py infra: add -p to mkdir in lint steps (#17013) 6 months ago
analyticdb.py infra: add print rule to ruff (#16221) 6 months ago
annoy.py community[major]: breaking change in some APIs to force users to opt-in for pickling (#18696) 5 months ago
apache_doris.py community[minor]: Add Apache Doris as vector store (#17527) 6 months ago
astradb.py community: Fix deprecation version of AstraDB VectorStore (#17991) 5 months ago
atlas.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
awadb.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
azure_cosmos_db.py community[minor]: Adding Azure Cosmos Mongo vCore Vector DB Cache (#16856) 5 months ago
azuresearch.py community[patch]: fix semantic answer bug in AzureSearch vector store (#18938) 4 months ago
bageldb.py community[patch]: apply embedding functions during query if defined (#16646) 6 months ago
baiducloud_vector_search.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
baiduvectordb.py community[patch]: fix bugs in baiduvectordb as vectorstore (#19380) 4 months ago
bigquery_vector_search.py community[patch]: BigQueryVectorSearch JSON type unsupported for metadatas (#18234) 5 months ago
cassandra.py community[patch]: Use newer MetadataVectorCassandraTable in Cassandra vector store (#15987) 7 months ago
chroma.py community[patch]: Chroma use uuid4 instead of uuid1 to generate random ids (#18723) 5 months ago
clarifai.py community[patch] : Tidy up and update Clarifai SDK functions (#18314) 5 months ago
clickhouse.py Add docstrings for Clickhouse class methods (#19195) 5 months ago
couchbase.py community: add Couchbase Vector Store (#18994) 5 months ago
dashvector.py community: Add `partition` parameter to DashVector (#19023) 5 months ago
databricks_vector_search.py community[minor]: Add mmr and similarity_score_threshold retrieval to DatabricksVectorSearch (#16829) 6 months ago
deeplake.py community[patch]: fix, better error message in deeplake vectoriser (#18397) 5 months ago
dingo.py community[patch], langchain[minor]: Add retriever self_query and score_threshold in DingoDB (#18106) 5 months ago
documentdb.py community[minor]: Add DocumentDBVectorSearch VectorStore (#17757) 5 months ago
duckdb.py community[minor]: Add `DuckDB` as a vectorstore (#18916) 4 months ago
elastic_vector_search.py elasticsearch[patch], community[patch]: update references, deprecate community classes (#18506) 5 months ago
elasticsearch.py elasticsearch[patch], community[patch]: update references, deprecate community classes (#18506) 5 months ago
epsilla.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
faiss.py commnity[patch]: refactor code for faiss vectorstore, update faiss vectorstore documentation (#18092) 5 months ago
hanavector.py community[patch]: More flexible handling for entity names in vector store "HANA Cloud" (#19523) 4 months ago
hippo.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
hologres.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
infinispanvs.py community: VectorStore Infinispan, adding autoconfiguration (#18967) 5 months ago
inmemory.py community[minor]: Add InMemoryVectorStore (#19326) 5 months ago
jaguar.py infra: add print rule to ruff (#16221) 6 months ago
kdbai.py community: vectorstores.kdbai - Added support for when no docs are present (#18103) 5 months ago
kinetica.py Langchain vectorstore integration with Kinetica (#18102) 5 months ago
lancedb.py community[patch]: LanceDB integration improvements/fixes (#16173) 5 months ago
lantern.py community[patch]: docstrings (#16810) 6 months ago
llm_rails.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
marqo.py infra: add print rule to ruff (#16221) 6 months ago
matching_engine.py marked MatchingEngine as deprecated (#18585) 5 months ago
meilisearch.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
milvus.py community[patch]: milvus will autoflush, manual flush is slowly (#19300) 4 months ago
momento_vector_index.py community[patch]: support momento vector index filter expressions (#14978) 8 months ago
mongodb_atlas.py community[patch]: Fix MongoDBAtlasVectorSearch max_marginal_relevance_search (#17971) 5 months ago
myscale.py community: fix myscale delete function bug (#15675) 7 months ago
neo4j_vector.py Switch to md5 for deduplication in neo4j integrations (#18846) 5 months ago
nucliadb.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
opensearch_vector_search.py community[patch]: Opensearch delete method added - indexing supported (#18522) 5 months ago
pgembedding.py infra: add print rule to ruff (#16221) 6 months ago
pgvecto_rs.py infra: add -p to mkdir in lint steps (#17013) 6 months ago
pgvector.py community[patch]: avoid creating extension PGvector while using readOnly Databases (#19268) 4 months ago
pinecone.py pinecone[patch], docs: PineconeVectorStore, release 0.0.3 (#17896) 5 months ago
qdrant.py community[patch]: implement qdrant _aembed_query and use it in other async funcs (#19155) 5 months ago
rocksetdb.py community[patch]: update copy of metadata in rockset vectorstore integration (#17612) 6 months ago
scann.py community[major]: breaking change in some APIs to force users to opt-in for pickling (#18696) 5 months ago
semadb.py infra: add print rule to ruff (#16221) 6 months ago
singlestoredb.py community[patch]: Added add_images method to SingleStoreDB vector store (#17871) 5 months ago
sklearn.py docs: docstrings `langchain_community` update (#14889) 8 months ago
sqlitevss.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
starrocks.py infra: add print rule to ruff (#16221) 6 months ago
supabase.py Support `score_threshold` in SupabaseVectorStore similarity search (#14439) 7 months ago
surrealdb.py community[patch]: bug fix - add empty metadata when metadata not provided (#17669) 5 months ago
tair.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
tencentvectordb.py infra: add -p to mkdir in lint steps (#17013) 6 months ago
thirdai_neuraldb.py community[patch]: fix lint (#17984) 5 months ago
tidb_vector.py community[minor]: Add Initial Support for TiDB Vector Store (#15796) 5 months ago
tigris.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
tiledb.py community[major]: breaking change in some APIs to force users to opt-in for pickling (#18696) 5 months ago
timescalevector.py docs: docstrings `langchain_community` update (#14889) 8 months ago
typesense.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
usearch.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
utils.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
vald.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
vearch.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
vectara.py infra: add print rule to ruff (#16221) 6 months ago
vespa.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
vikingdb.py docs: add vikingdb docstrings(#19016) 5 months ago
weaviate.py community: Fixed schema discrepancy in from_texts function for weaviate vectorstore (#16693) 6 months ago
xata.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
yellowbrick.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
zep.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 8 months ago
zilliz.py community[patch]: Milvus supports add & delete texts by ids (#16256) 6 months ago