Commit Graph

60 Commits (e257380debb8640268d2d2577f89139b3ea0b46f)

Author SHA1 Message Date
yujiosaka 6561efebb7
Accept uuids kwargs for weaviate (#4800)
# Accept uuids kwargs for weaviate

Fixes #4791
1 year ago
Magnus Friberg d126276693
Specify which data to return from chromadb (#4393)
# Improve the Chroma get() method by adding the optional "include"
parameter.

The Chroma get() method excludes embeddings by default. You can
customize the response by specifying the "include" parameter to
selectively retrieve the desired data from the collection.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Anirudh Suresh 03ac39368f
Fixing DeepLake Overwrite Flag (#4683)
# Fix DeepLake Overwrite Flag Issue

Fixes Issue #4682: essentially, setting overwrite to False in the
DeepLake constructor still triggers an overwrite, because the logic is
just checking for the presence of "overwrite" in kwargs. The fix is
simple--just add some checks to inspect if "overwrite" in kwargs AND
kwargs["overwrite"]==True.

Added a new test in
tests/integration_tests/vectorstores/test_deeplake.py to reflect the
desired behavior.


Co-authored-by: Anirudh Suresh <ani@Anirudhs-MBP.cable.rcn.com>
Co-authored-by: Anirudh Suresh <ani@Anirudhs-MacBook-Pro.local>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
1 year ago
Evan Jones f668251948
parameterized distance metrics; lint; format; tests (#4375)
# Parameterize Redis vectorstore index

Redis vectorstore allows for three different distance metrics: `L2`
(flat L2), `COSINE`, and `IP` (inner product). Currently, the
`Redis._create_index` method hard codes the distance metric to COSINE.

I've parameterized this as an argument in the `Redis.from_texts` method
-- pretty simple.

Fixes #4368 

## Before submitting

I've added an integration test showing indexes can be instantiated with
all three values in the `REDIS_DISTANCE_METRICS` literal. An example
notebook seemed overkill here. Normal API documentation would be more
appropriate, but no standards are in place for that yet.

## Who can review?

Not sure who's responsible for the vectorstore module... Maybe @eyurtsev
/ @hwchase17 / @agola11 ?
1 year ago
Davis Chase 46b100ea63
Add DocArray vector stores (#4483)
Thanks to @anna-charlotte and @jupyterjazz for the contribution! Made
few small changes to get it across the finish line

---------

Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
Co-authored-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Co-authored-by: jupyterjazz <saba.sturua@jina.ai>
Co-authored-by: Saba Sturua <45267439+jupyterjazz@users.noreply.github.com>
1 year ago
Aivin V. Solatorio 6335cb5b3a
Add support for Qdrant nested filter (#4354)
# Add support for Qdrant nested filter

This extends the filter functionality for the Qdrant vectorstore. The
current filter implementation is limited to a single-level metadata
structure; however, Qdrant supports nested metadata filtering. This
extends the functionality for users to maximize the filter functionality
when using Qdrant as the vectorstore.

Reference: https://qdrant.tech/documentation/filtering/#nested-key

---------

Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>
1 year ago
Naveen Tatikonda 782df1db10
OpenSearch: Add Similarity Search with Score (#4089)
### Description
Add `similarity_search_with_score` method for OpenSearch to return
scores along with documents in the search results

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
1 year ago
George 2324f19c85
Update qdrant interface (#3971)
Hello

1) Passing `embedding_function` as a callable seems to be outdated and
the common interface is to pass `Embeddings` instance

2) At the moment `Qdrant.add_texts` is designed to be used with
`embeddings.embed_query`, which is 1) slow 2) causes ambiguity due to 1.
It should be used with `embeddings.embed_documents`

This PR solves both problems and also provides some new tests
1 year ago
Davis Chase 2451310975
Chroma fix mmr (#3897)
Fixes #3628, thanks @derekmoeller for the issue!
1 year ago
Harrison Chase 0c0f14407c
Harrison/tair (#3770)
Co-authored-by: Seth Huang <848849+seth-hg@users.noreply.github.com>
1 year ago
Harrison Chase be7a8e0824
Harrison/redis cache (#3766)
Co-authored-by: Tyler Hutcherson <tyler.hutcherson@redis.com>
1 year ago
Harrison Chase a35bbbfa9e
Harrison/lancedb (#3634)
Co-authored-by: Minh Le <minhle@canva.com>
1 year ago
Harrison Chase ab749fa1bb
Harrison/opensearch logic (#3631)
Co-authored-by: engineer-matsuo <95115586+engineer-matsuo@users.noreply.github.com>
1 year ago
cs0lar 440c98e24b
Fix/issue 2695 (#3608)
## Background
fixes #2695  

## Changes
The `add_text` method uses the internal embedding function if one was
passes to the `Weaviate` constructor.
NOTE: the latest merge on the `Weaviate` class made the specification of
a `weaviate_api_key` mandatory which might not be desirable for all
users and connection methods (for example weaviate also support Embedded
Weaviate which I am happy to add support to here if people think it's
desirable). I wrapped the fetching of the api key into a try catch in
order to allow the `weaviate_api_key` to be unspecified. Do let me know
if this is unsatisfactory.

## Test Plan
added test for `add_texts` method.
1 year ago
Harrison Chase 408a0183cd
Harrison/weaviate (#3494)
Co-authored-by: Nick Rubell <nick@rubell.com>
1 year ago
cs0lar 3033c6b964
fixes #1214 (#3003)
### Background

Continuing to implement all the interface methods defined by the
`VectorStore` class. This PR pertains to implementation of the
`max_marginal_relevance_search_by_vector` method.

### Changes

- a `max_marginal_relevance_search_by_vector` method implementation has
been added in `weaviate.py`
- tests have been added to the the new method
- vcr cassettes have been added for the weaviate tests

### Test Plan

Added tests for the `max_marginal_relevance_search_by_vector`
implementation

### Change Safety

- [x] I have added tests to cover my changes
1 year ago
Davit Buniatyan 2c0023393b
Deep Lake mini upgrades (#3375)
Improvements
* set default num_workers for ingestion to 0
* upgraded notebooks for avoiding dataset creation ambiguity
* added `force_delete_dataset_by_path`
* bumped deeplake to 3.3.0
* creds arg passing to deeplake object that would allow custom S3

Notes
* please double check if poetry is not messed up (thanks!)

Asks
* Would be great to create a shared slack channel for quick questions

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
1 year ago
Harrison Chase a6664be79c
Harrison/myscale (#3352)
Co-authored-by: Fangrui Liu <fangruil@moqi.ai>
Co-authored-by: 刘 方瑞 <fangrui.liu@outlook.com>
Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>
1 year ago
Filip Haltmayer 215dcc2d26
Refactor Milvus/Zilliz (#3047)
Refactoring milvus/zilliz to clean up and have a more consistent
experience.

Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
1 year ago
Richy Wang 88a8f59aa7
Add a full PostgresSQL syntax database 'AnalyticDB' as vector store. (#3135)
Hi there!
I'm excited to open this PR to add support for using a fully Postgres
syntax compatible database 'AnalyticDB' as a vector.
As AnalyticDB has been proved can be used with AutoGPT,
ChatGPT-Retrieve-Plugin, and LLama-Index, I think it is also good for
you.
AnalyticDB is a distributed Alibaba Cloud-Native vector database. It
works better when data comes to large scale. The PR includes:

- [x]  A new memory: AnalyticDBVector
- [x]  A suite of integration tests verifies the AnalyticDB integration

I have read your [contributing
guidelines](72b7d76d79/.github/CONTRIBUTING.md).
And I have passed the tests below
- [x]  make format
- [x]  make lint
- [x]  make coverage
- [x]  make test
1 year ago
Naveen Tatikonda bb6c459f7a
OpenSearch: Add Support for Lucene Filter (#3201)
### Description
Add Support for Lucene Filter. When you specify a Lucene filter for a
k-NN search, the Lucene algorithm decides whether to perform an exact
k-NN search with pre-filtering or an approximate search with modified
post-filtering. This filter is supported only for approximate search
with the indexes that are created using `lucene` engine.

OpenSearch Documentation -
https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#lucene-k-nn-filter-implementation

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
1 year ago
Naveen Tatikonda 3453b7457c
OpenSearch: Add Support for Boolean Filter with ANN search (#3038)
### Description
Add Support for Boolean Filter with ANN search
Documentation -
https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#boolean-filter-with-ann-search

### Issues Resolved
https://github.com/hwchase17/langchain/issues/2924

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
1 year ago
Jan Backes a9310a3e8b
Add Annoy as VectorStore (#2939)
Adds Annoy (https://github.com/spotify/annoy) as vector Store. 

RESOLVES hwchase17/langchain#2842

discord ref:
https://discord.com/channels/1038097195422978059/1051632794427723827/1096089994168377354

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>
1 year ago
cs0lar 8b9e02da9d
Fix/issue 1213 (#2932)
### Background

Continuing to implement all the interface methods defined by the
`VectorStore` class. This PR pertains to implementation of the
`max_marginal_relevance_search` method.

### Changes

- a `max_marginal_relevance_search` method implementation has been added
in `weaviate.py`
- tests have been added to the the new method
- vcr cassettes have been added for the weaviate tests

### Test Plan

Added tests for the `max_marginal_relevance_search` implementation

### Change Safety

- [x] I have added tests to cover my changes
1 year ago
vowelparrot 4ffc58e07b
Add similarity_search_with_normalized_similarities (#2916)
Add a method that exposes a similarity search with corresponding
normalized similarity scores. Implement only for FAISS now.

### Motivation:

Some memory definitions combine `relevance` with other scores, like
recency , importance, etc.

While many (but not all) of the `VectorStore`'s expose a
`similarity_search_with_score` method, they don't all interpret the
units of that score (depends on the distance metric and whether or not
the the embeddings are normalized).

This PR proposes a `similarity_search_with_normalized_similarities`
method that lets consumers of the vector store not have to worry about
the metric and embedding scale.

*Most providers default to euclidean distance, with Pinecone being one
exception (defaults to cosine _similarity_).*

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Davit Buniatyan b3a5b51728
[minor] Deep Lake auth improvements in docs, kwargs pass, faster tests (#2927)
Minor cosmetic changes 
- Activeloop environment cred authentication in notebooks with
`getpass.getpass` (instead of CLI which not always works)
- much faster tests with Deep Lake pytest mode on 
- Deep Lake kwargs pass

Notes
- I put pytest environment creds inside `vectorstores/conftest.py`, but
feel free to suggest a better location. For context, if I put in
`test_deeplake.py`, `ruff` doesn't let me to set them before import
deeplake

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
1 year ago
Harrison Chase 1e9378d0a8
Harrison/weaviate fixes (#2872)
Co-authored-by: cs0lar <cristiano.solarino@gmail.com>
Co-authored-by: cs0lar <cristiano.solarino@brightminded.com>
1 year ago
sergerdn 04c458a270
feat: improve pinecone tests (#2806)
Improve the integration tests for Pinecone by adding an `.env.example`
file for local testing. Additionally, add some dev dependencies
specifically for integration tests.

This change also helps me understand how Pinecone deals with certain
things, see related issues
https://github.com/hwchase17/langchain/issues/2484
https://github.com/hwchase17/langchain/issues/2816
1 year ago
sergerdn 4bdcedab54
fix: some imports for integration tests (#2612)
Add more missed imports for integration tests. Bump `pytest` to the
current latest version.
Fix `tests/integration_tests/vectorstores/test_elasticsearch.py` to
update its cassette(easy fix).

Related PR: https://github.com/hwchase17/langchain/pull/2560
1 year ago
Ankush Gola c1521ddbdb
Add workaround for not having async vector store methods (#2733)
This allows us to use the async API for the Retrieval chains, though it is not guaranteed to be thread safe.
1 year ago
Naveen Tatikonda 4364d3316e
Add custom vector fields and text fields for OpenSearch (#2652)
**Description**
Add custom vector field name and text field name while indexing and
querying for OpenSearch

**Issues**
https://github.com/hwchase17/langchain/issues/2500

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
1 year ago
sergerdn 6dc86ad48f
feat: add pytest-vcr for recording HTTP interactions in integration tests (#2445)
Using `pytest-vcr` in integration tests has several benefits. Firstly,
it removes the need to mock external services, as VCR records and
replays HTTP interactions on the fly. Secondly, it simplifies the
integration test setup by eliminating the need to set up and tear down
external services in some cases. Finally, it allows for more reliable
and deterministic integration tests by ensuring that HTTP interactions
are always replayed with the same response.
Overall, `pytest-vcr` is a valuable tool for simplifying integration
test setup and improving their reliability

This commit adds the `pytest-vcr` package as a dependency for
integration tests in the `pyproject.toml` file. It also introduces two
new fixtures in `tests/integration_tests/conftest.py` files for managing
cassette directories and VCR configurations.

In addition, the
`tests/integration_tests/vectorstores/test_elasticsearch.py` file has
been updated to use the `@pytest.mark.vcr` decorator for recording and
replaying HTTP interactions.

Finally, this commit removes the `documents` fixture from the
`test_elasticsearch.py` file and replaces it with a new fixture defined
in `tests/integration_tests/vectorstores/conftest.py` that yields a list
of documents to use in any other tests.

This also includes my second attempt to fix issue :
https://github.com/hwchase17/langchain/issues/2386

Maybe related https://github.com/hwchase17/langchain/issues/2484
1 year ago
Davit Buniatyan b4914888a7
Deep Lake upgrade to include attribute search, distance metrics, returning scores and MMR (#2455)
### Features include

- Metadata based embedding search
- Choice of distance metric function (`L2` for Euclidean, `L1` for
Nuclear, `max` L-infinity distance, `cos` for cosine similarity, 'dot'
for dot product. Defaults to `L2`
- Returning scores
- Max Marginal Relevance Search
- Deleting samples from the dataset

### Notes
- Added numerous tests, let me know if you would like to shorten them or
make smarter

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
1 year ago
leo-gan fd69cc7e42
Removed duplicate BaseModel dependencies (#2471)
Removed duplicate BaseModel dependencies in class inheritances.
Also, sorted imports by `isort`.
1 year ago
sergerdn b410dc76aa
fix: elasticsearch (#2402)
- Create a new docker-compose file to start an Elasticsearch instance
for integration tests.
- Add new tests to `test_elasticsearch.py` to verify Elasticsearch
functionality.
- Include an optional group `test_integration` in the `pyproject.toml`
file. This group should contain dependencies for integration tests and
can be installed using the command `poetry install --with
test_integration`. Any new dependencies should be added by running
`poetry add some_new_deps --group "test_integration" `

Note:
New tests running in live mode, which involve end-to-end testing of the
OpenAI API. In the future, adding `pytest-vcr` to record and replay all
API requests would be a nice feature for testing process.More info:
https://pytest-vcr.readthedocs.io/en/latest/

Fixes https://github.com/hwchase17/langchain/issues/2386
1 year ago
Kacper Łukawski 585f60a5aa
Qdrant update to 1.1.1 & docs polishing (#2388)
This PR updates Qdrant to 1.1.1 and introduces local mode, so there is
no need to spin up the Qdrant server. By that occasion, the Qdrant
example notebooks also got updated, covering more cases and answering
some commonly asked questions. All the Qdrant's integration tests were
switched to local mode, so no Docker container is required to launch
them.
1 year ago
Eli 12f868b292
Propagate "filter" arg in Chroma similarity_search (#1869)
Technically a duplicate fix to #1619 but with unit tests and a small
documentation update
- Propagate `filter` arg in Chroma `similarity_search` to delegated call
to `similarity_search_with_score`
- Add `filter` arg to `similarity_search_by_vector`
- Clarify doc strings on FakeEmbeddings
2 years ago
Maurício Maia f155d9d3ec
Add metadata filter to PGVector search (#1872)
Add ability to filter pgvector documents by metadata.
2 years ago
Maurício Maia 2212520a6c
Add PGVector collection metadata (#1887)
The `CollectionStore` for `PGVector` has a `cmetadata` field but it's
never used. This PR add the ability to save metadata information to the
collection.
2 years ago
LeoGrin 3701b2901e
use namespace argument in Pinecone constructor (#1757)
Fix #1756

Use the `namespace` argument of `Pinecone.from_exisiting_index` to set
the default value of `namespace` for other methods. Leads to more
expected behavior and easier integration in chains.

For the test, I've added a line to delete and rebuild the
`langchain-demo` index at the beginning of the test. I'm not 100% sure
if it's a good idea but it makes the test reproducible.
2 years ago
Kacper Łukawski 4a327dd1d6
Implement basic metadata filtering in Qdrant (#1689)
This PR implements a basic metadata filtering mechanism similar to the
ones in Chroma and Pinecone. It still cannot express complex conditions,
as there are no operators, but some users requested to have that feature
available.
2 years ago
Harrison Chase 0b29e68c17
Harrison/pgvector (#1679)
Co-authored-by: Aman Kumar <krsingh.aman@gmail.com>
2 years ago
Xin Qiu 4e13cef05a
feat: add redisearch vectorstore (#1307)
# Description

Add `RediSearch` vectorstore for LangChain

RediSearch: [RediSearch quick
start](https://redis.io/docs/stack/search/quick_start/)

# How to use

```
from langchain.vectorstores.redisearch import RediSearch

rds = RediSearch.from_documents(docs, embeddings,redisearch_url="redis://localhost:6379")
```
2 years ago
Harrison Chase a1b9dfc099
Harrison/similarity search chroma (#1434)
Co-authored-by: shibuiwilliam <shibuiyusuke@gmail.com>
2 years ago
Kacper Łukawski 9ac442624c
Add Qdrant named arguments (#1386)
This PR:
- Increases `qdrant-client` version to 1.0.4
- Introduces custom content and metadata keys (as requested in #1087)
- Moves all the `QdrantClient` parameters into the method parameters to
simplify code completion
2 years ago
Harrison Chase 166cda2cc6
Harrison/deeplake (#1316)
Co-authored-by: Davit Buniatyan <d@activeloop.ai>
2 years ago
Harrison Chase aaad6cc954
Harrison/atlas db (#1315)
Co-authored-by: Brandon Duderstadt <brandonduderstadt@gmail.com>
2 years ago
Naveen Tatikonda 0118706fd6
Add Support for OpenSearch Vector database (#1191)
### Description
This PR adds a wrapper which adds support for the OpenSearch vector
database. Using opensearch-py client we are ingesting the embeddings of
given text into opensearch cluster using Bulk API. We can perform the
`similarity_search` on the index using the 3 popular searching methods
of OpenSearch k-NN plugin:

- `Approximate k-NN Search` use approximate nearest neighbor (ANN)
algorithms from the [nmslib](https://github.com/nmslib/nmslib),
[faiss](https://github.com/facebookresearch/faiss), and
[Lucene](https://lucene.apache.org/) libraries to power k-NN search.
- `Script Scoring` extends OpenSearch’s script scoring functionality to
execute a brute force, exact k-NN search.
- `Painless Scripting` adds the distance functions as painless
extensions that can be used in more complex combinations. Also, supports
brute force, exact k-NN search like Script Scoring.

### Issues Resolved 
https://github.com/hwchase17/langchain/issues/1054

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
2 years ago
Andrew White c5015d77e2
Allow k to be higher than doc size in max_marginal_relevance_search (#1187)
Fixes issue #1186. For some reason, #1117 didn't seem to fix it.
2 years ago
Noah Gundotra 8c5fbab72d
[Integration Tests] Cast fake embeddings to ALL float values (#1102)
Pydantic validation breaks tests for example (`test_qdrant.py`) because
fake embeddings contain an integer.

This PR casts the embeddings array to all floats.

Now the `qdrant` test passes, `poetry run pytest
tests/integration_tests/vectorstores/test_qdrant.py`
2 years ago