langchain/tests/integration_tests
Stefano Lottini 22af93d851
Vector store support for Cassandra (#6426)
This addresses #6291, adding support for using Cassandra (and compatible
databases, such as DataStax Astra DB) as a [Vector
Store](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor(ANN)+Vector+Search+via+Storage-Attached+Indexes).

A new class `Cassandra` is introduced, which complies with the contract
and interface for a vector store, along with the corresponding
integration test, a sample notebook and modified dependency toml.

Dependencies: the implementation relies on the library `cassio`, which
simplifies interacting with Cassandra for ML- and LLM-oriented
workloads. CassIO, in turn, uses the low-level `cassandra-driver`
library to communicate with the database. The former is added as an
optional dependency (and to `extended_testing`); the latter was already in
the project.

Integration testing relies on a locally-running instance of Cassandra.
A detailed description of how to compile and run it can be found
[here](https://cassio.org/more_info/#use-a-local-vector-capable-cassandra)
(at the time of writing, the feature has not yet made it into a release).

During development of the integration tests, I added a new "fake
embedding" class for what I consider a more controlled way of testing
the MMR search method. Likewise, I had to amend what looked like a
glitch in the behaviour of `ConsistentFakeEmbeddings`, whereby an
`embed_query` call would bypass storing the requested text in the class
cache for use in later repeated invocations.
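
The caching behaviour described above can be illustrated with a minimal
sketch. This is not the actual LangChain code: the class name
`CachingFakeEmbeddings`, its attributes and the vector layout are
assumptions made up for illustration. The point is only that
`embed_query` registers unseen texts, so that repeated calls and later
`embed_documents` calls agree on the vector for a given text.

```python
from typing import List


class CachingFakeEmbeddings:
    """Hypothetical stand-in for a deterministic fake embedding class."""

    def __init__(self, dimensionality: int = 10) -> None:
        # Texts seen so far; the position in this list identifies a text.
        self.known_texts: List[str] = []
        self.dimensionality = dimensionality

    def _index_of(self, text: str) -> int:
        # Store unseen texts so that later calls map to the same index.
        if text not in self.known_texts:
            self.known_texts.append(text)
        return self.known_texts.index(text)

    def embed_query(self, text: str) -> List[float]:
        # The query path goes through the cache too (the behaviour the fix
        # is about): the text is stored before its vector is built.
        idx = self._index_of(text)
        return [1.0] * (self.dimensionality - 1) + [float(idx)]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Documents reuse the same cached mapping as queries.
        return [self.embed_query(text) for text in texts]
```

With this sketch, embedding the same text first as a query and then as a
document yields identical vectors, which is what makes similarity
assertions in a test predictable.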

@dev2049 might be the right person to tag here for a review. Thank you!

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-06-20 10:46:20 -07:00
..
agent Add Multi-CSV/DF support in CSV and DataFrame Toolkits (#5009) 2023-05-25 14:23:11 -07:00
cache feat: add Momento as a standard cache and chat message history provider (#5221) 2023-05-25 19:13:21 -07:00
callbacks Add support for tags (#5898) 2023-06-13 12:30:59 -07:00
chains fix neo4j schema query (#6381) 2023-06-19 22:48:35 -07:00
chat_models fix anthropic chat model mutating input list (#6457) 2023-06-19 21:30:52 -07:00
document_loaders Harrison/unstructured page number (#6464) 2023-06-19 22:31:43 -07:00
embeddings add dashscope text embedding (#5929) 2023-06-11 21:14:20 -07:00
examples feat: Add UnstructuredXMLLoader for .xml files (#5955) 2023-06-10 16:24:42 -07:00
llms Baseten integration (#5862) 2023-06-08 23:05:57 -07:00
memory feat: add Momento as a standard cache and chat message history provider (#5221) 2023-05-25 19:13:21 -07:00
prompts
retrievers DocArray as a Retriever (#6031) 2023-06-17 09:09:33 -07:00
utilities ArxivAPIWrapper - doc_content_chars_max (#6063) 2023-06-15 22:16:42 -07:00
vectorstores Vector store support for Cassandra (#6426) 2023-06-20 10:46:20 -07:00
__init__.py
.env.example adding MongoDBAtlasVectorSearch (#5338) 2023-05-30 07:59:01 -07:00
conftest.py
test_document_transformers.py
test_nebulagraph.py Harrison/nebula graph (#5865) 2023-06-07 21:56:43 -07:00
test_nlp_text_splitters.py
test_pdf_pagesplitter.py
test_schema.py Add 'get_token_ids' method (#4784) 2023-05-22 13:17:26 +00:00
test_text_splitter.py chore: speed up integration test by using smaller model (#6044) 2023-06-12 13:27:10 -07:00