langchain/tests/integration_tests/vectorstores
Raymond Yuan 5171c3bcca
Refactor vector storage to correctly handle relevancy scores (#6570)
Description: This pull request aims to support generating the correct
generic relevancy scores for different vector stores by refactoring the
relevance score functions and their selection in the base class and
subclasses of VectorStore. This is especially relevant with VectorStores
that require a distance metric upon initialization. Note that many of the
current implementations of `_similarity_search_with_relevance_scores` are
not technically correct, as they just return
`self.similarity_search_with_score(query, k, **kwargs)` without applying
the relevance score function.
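
As a rough illustration of why returning the raw scores is a problem, here is a minimal sketch (not the library's code). A raw Euclidean distance shrinks as documents become more similar, while a relevance score is expected to lie in [0, 1] with higher meaning more relevant. The function name `euclidean_relevance`, the sample data, and the division by `sqrt(2)` (which assumes unit-length embeddings) are assumptions made for this example.

```python
import math


def euclidean_relevance(distance: float) -> float:
    """Map an L2 distance to a relevance score (assumes unit-norm embeddings)."""
    return 1.0 - distance / math.sqrt(2)


# Hypothetical (document, raw distance) pairs, as a store might return them.
raw_results = [("doc_a", 0.0), ("doc_b", math.sqrt(2))]

# Returning raw_results unchanged would give the closest document the
# lowest number; applying the score function puts higher = more relevant.
relevance_results = [(doc, euclidean_relevance(d)) for doc, d in raw_results]
```

With this mapping, an exact match scores 1.0 and the most distant document in the sample scores 0.0.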

Also includes changes associated with:
https://github.com/hwchase17/langchain/pull/6564 and
https://github.com/hwchase17/langchain/pull/6494

See the more in-depth discussion in the thread on #6494.

Issue: 
https://github.com/hwchase17/langchain/issues/6526
https://github.com/hwchase17/langchain/issues/6481
https://github.com/hwchase17/langchain/issues/6346

Dependencies: None

The changes include:
- Properly handling score thresholding in FAISS
`similarity_search_with_score_by_vector` for the corresponding distance
metric.
- Refactoring the `_similarity_search_with_relevance_scores` method in
the base class and removing it from the subclasses that implemented it
incorrectly.
- Adding a `_select_relevance_score_fn` method in the base class and
implementing it in the subclasses to select the appropriate relevance
score function based on the distance strategy.
- Updating the `__init__` methods of the subclasses to set the
`relevance_score_fn` attribute.
- Removing the `_default_relevance_score_fn` function from the FAISS
class and using the base class's `_euclidean_relevance_score_fn`
instead.
- Adding the `DistanceStrategy` enum to the `utils.py` file and updating
the imports in the vector store classes.
- Updating the tests to import the `DistanceStrategy` enum from the
`utils.py` file.
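
The selection and thresholding logic described in the list above could be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the pull request: `SketchStore` and `filter_by_threshold` are hypothetical names, the enum lists only three strategies, and the score formulas assume normalized embeddings.

```python
import math
import operator
from enum import Enum
from typing import Callable, List, Optional, Tuple


class DistanceStrategy(str, Enum):
    # Assumed subset of strategies, for illustration only.
    EUCLIDEAN_DISTANCE = "EUCLIDEAN_DISTANCE"
    MAX_INNER_PRODUCT = "MAX_INNER_PRODUCT"
    COSINE = "COSINE"


class SketchStore:
    """Hypothetical stand-in for a VectorStore subclass."""

    def __init__(
        self,
        distance_strategy: DistanceStrategy,
        relevance_score_fn: Optional[Callable[[float], float]] = None,
    ) -> None:
        self.distance_strategy = distance_strategy
        self.override_relevance_score_fn = relevance_score_fn

    def _select_relevance_score_fn(self) -> Callable[[float], float]:
        # A user-supplied function takes precedence over the defaults.
        if self.override_relevance_score_fn is not None:
            return self.override_relevance_score_fn
        if self.distance_strategy == DistanceStrategy.EUCLIDEAN_DISTANCE:
            return lambda d: 1.0 - d / math.sqrt(2)
        if self.distance_strategy == DistanceStrategy.COSINE:
            return lambda d: 1.0 - d
        raise ValueError(f"No default score function for {self.distance_strategy}")


def filter_by_threshold(
    results: List[Tuple[str, float]],
    threshold: float,
    strategy: DistanceStrategy,
) -> List[Tuple[str, float]]:
    # Inner product: larger is better, so keep scores >= threshold.
    # Distances: smaller is better, so keep scores <= threshold.
    cmp = operator.ge if strategy == DistanceStrategy.MAX_INNER_PRODUCT else operator.le
    return [(doc, score) for doc, score in results if cmp(score, threshold)]
```

For example, `SketchStore(DistanceStrategy.COSINE)._select_relevance_score_fn()` returns a function that maps a cosine distance `d` to `1.0 - d`. The point of the design is that the comparison direction and the score mapping both follow from the distance strategy instead of being hard-coded per store.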

---------

Co-authored-by: Hanit <37485638+hanit-com@users.noreply.github.com>
2023-07-10 20:37:03 -07:00
cassettes
docarray Add DocArray vector stores (#4483) 2023-05-10 15:22:16 -07:00
docker-compose
fixtures
__init__.py
conftest.py
fake_embeddings.py Vector store support for Cassandra (#6426) 2023-06-20 10:46:20 -07:00
test_alibabacloud_opensearch.py Add Alibaba Cloud OpenSearch as a new vector store (#6154) 2023-06-20 10:07:40 -07:00
test_analyticdb.py Implement delete interface of vector store on AnalyticDB (#7170) 2023-07-05 13:01:00 -07:00
test_annoy.py
test_atlas.py
test_awadb.py Add a new vector store - AwaDB (#5971) (#5992) 2023-06-10 15:42:32 -07:00
test_azuresearch.py Harrison/cognitive search (#6011) 2023-06-11 21:15:42 -07:00
test_cassandra.py Second Attempt - Add concurrent insertion of vector rows in the Cassandra Vector Store (#7017) 2023-07-01 11:09:52 -07:00
test_chroma.py Refactor vector storage to correctly handle relevancy scores (#6570) 2023-07-10 20:37:03 -07:00
test_clarifai.py Clarifai integration (#5954) 2023-06-22 08:00:15 -07:00
test_clickhouse.py Integrate Clickhouse as Vector Store (#5650) 2023-06-05 13:32:04 -07:00
test_deeplake.py Added deeplake use case examples of the new features (#6528) 2023-07-10 07:04:29 -07:00
test_elasticsearch.py Add the usage of SSL certificates for Elasticsearch and user password authentication (#5058) 2023-05-22 11:51:32 -07:00
test_faiss.py Refactor vector storage to correctly handle relevancy scores (#6570) 2023-07-10 20:37:03 -07:00
test_hologres.py Harrison/hologres (#6012) 2023-06-11 20:56:51 -07:00
test_lancedb.py
test_marqo.py Adding Marqo to vectorstore ecosystem (#7068) 2023-07-05 14:44:12 -07:00
test_milvus.py
test_mongodb_atlas.py adding max_marginal_relevance_search method to MongoDBAtlasVectorSearch (#7310) 2023-07-10 04:04:19 -04:00
test_myscale.py
test_opensearch.py OpenSearch: Add Similarity Search with Score (#4089) 2023-05-08 16:35:21 -07:00
test_pgvector.py Refactor vector storage to correctly handle relevancy scores (#6570) 2023-07-10 20:37:03 -07:00
test_pinecone.py Refactor vector storage to correctly handle relevancy scores (#6570) 2023-07-10 20:37:03 -07:00
test_qdrant.py Refactor vector storage to correctly handle relevancy scores (#6570) 2023-07-10 20:37:03 -07:00
test_redis.py Update redis integration tests (#4937) 2023-05-18 10:22:17 -07:00
test_rocksetdb.py Integrate Rockset as Vectorstore (#6216) 2023-06-21 01:22:27 -07:00
test_singlestoredb.py Refactor vector storage to correctly handle relevancy scores (#6570) 2023-07-10 20:37:03 -07:00
test_tair.py Harrison/tair (#3770) 2023-04-28 21:25:33 -07:00
test_vectara.py Vectara upd2 (#6506) 2023-07-02 12:15:50 -07:00
test_weaviate.py Fixes issue #5072 - adds additional support to Weaviate (#5085) 2023-05-22 18:57:10 -07:00
test_zilliz.py