Commit Graph

24 Commits

Author SHA1 Message Date
Eli
12f868b292
Propagate "filter" arg in Chroma similarity_search (#1869)
Technically a duplicate fix to #1619 but with unit tests and a small
documentation update
- Propagate `filter` arg in Chroma `similarity_search` to delegated call
to `similarity_search_with_score`
- Add `filter` arg to `similarity_search_by_vector`
- Clarify doc strings on FakeEmbeddings
2023-03-22 19:40:10 -07:00
Maurício Maia
f155d9d3ec
Add metadata filter to PGVector search (#1872)
Add ability to filter pgvector documents by metadata.
2023-03-22 15:21:40 -07:00
Maurício Maia
2212520a6c
Add PGVector collection metadata (#1887)
The `CollectionStore` for `PGVector` has a `cmetadata` field but it's
never used. This PR add the ability to save metadata information to the
collection.
2023-03-22 11:27:07 -07:00
LeoGrin
3701b2901e
use namespace argument in Pinecone constructor (#1757)
Fix #1756

Use the `namespace` argument of `Pinecone.from_exisiting_index` to set
the default value of `namespace` for other methods. Leads to more
expected behavior and easier integration in chains.

For the test, I've added a line to delete and rebuild the
`langchain-demo` index at the beginning of the test. I'm not 100% sure
if it's a good idea but it makes the test reproducible.
2023-03-18 19:55:38 -07:00
Kacper Łukawski
4a327dd1d6
Implement basic metadata filtering in Qdrant (#1689)
This PR implements a basic metadata filtering mechanism similar to the
ones in Chroma and Pinecone. It still cannot express complex conditions,
as there are no operators, but some users requested to have that feature
available.
2023-03-15 07:31:39 -07:00
Harrison Chase
0b29e68c17
Harrison/pgvector (#1679)
Co-authored-by: Aman Kumar <krsingh.aman@gmail.com>
2023-03-14 21:13:58 -07:00
Xin Qiu
4e13cef05a
feat: add redisearch vectorstore (#1307)
# Description

Add `RediSearch` vectorstore for LangChain

RediSearch: [RediSearch quick
start](https://redis.io/docs/stack/search/quick_start/)

# How to use

```
from langchain.vectorstores.redisearch import RediSearch

rds = RediSearch.from_documents(docs, embeddings,redisearch_url="redis://localhost:6379")
```
2023-03-14 18:06:03 -07:00
Harrison Chase
a1b9dfc099
Harrison/similarity search chroma (#1434)
Co-authored-by: shibuiwilliam <shibuiyusuke@gmail.com>
2023-03-04 08:10:15 -08:00
Kacper Łukawski
9ac442624c
Add Qdrant named arguments (#1386)
This PR:
- Increases `qdrant-client` version to 1.0.4
- Introduces custom content and metadata keys (as requested in #1087)
- Moves all the `QdrantClient` parameters into the method parameters to
simplify code completion
2023-03-02 07:05:14 -08:00
Harrison Chase
166cda2cc6
Harrison/deeplake (#1316)
Co-authored-by: Davit Buniatyan <d@activeloop.ai>
2023-02-26 22:35:04 -08:00
Harrison Chase
aaad6cc954
Harrison/atlas db (#1315)
Co-authored-by: Brandon Duderstadt <brandonduderstadt@gmail.com>
2023-02-26 22:11:38 -08:00
Naveen Tatikonda
0118706fd6
Add Support for OpenSearch Vector database (#1191)
### Description
This PR adds a wrapper which adds support for the OpenSearch vector
database. Using opensearch-py client we are ingesting the embeddings of
given text into opensearch cluster using Bulk API. We can perform the
`similarity_search` on the index using the 3 popular searching methods
of OpenSearch k-NN plugin:

- `Approximate k-NN Search` use approximate nearest neighbor (ANN)
algorithms from the [nmslib](https://github.com/nmslib/nmslib),
[faiss](https://github.com/facebookresearch/faiss), and
[Lucene](https://lucene.apache.org/) libraries to power k-NN search.
- `Script Scoring` extends OpenSearch’s script scoring functionality to
execute a brute force, exact k-NN search.
- `Painless Scripting` adds the distance functions as painless
extensions that can be used in more complex combinations. Also, supports
brute force, exact k-NN search like Script Scoring.

### Issues Resolved 
https://github.com/hwchase17/langchain/issues/1054

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
2023-02-20 18:39:34 -08:00
Andrew White
c5015d77e2
Allow k to be higher than doc size in max_marginal_relevance_search (#1187)
Fixes issue #1186. For some reason, #1117 didn't seem to fix it.
2023-02-20 16:39:13 -08:00
Noah Gundotra
8c5fbab72d
[Integration Tests] Cast fake embeddings to ALL float values (#1102)
Pydantic validation breaks tests for example (`test_qdrant.py`) because
fake embeddings contain an integer.

This PR casts the embeddings array to all floats.

Now the `qdrant` test passes, `poetry run pytest
tests/integration_tests/vectorstores/test_qdrant.py`
2023-02-17 15:18:09 -08:00
seanaedmiston
f0a258555b
Support similarity search by vector (in FAISS) (#961)
Alternate implementation to PR #960 Again - only FAISS is implemented.
If accepted can add this to other vectorstores or leave as
NotImplemented? Suggestions welcome...
2023-02-15 22:50:00 -08:00
Anton Troynikov
d43d430d86
Chroma persistence (#1028)
This PR adds persistence to the Chroma vector store.

Users can supply a `persist_directory` with any of the `Chroma` creation
methods. If supplied, the store will be automatically persisted at that
directory.

If a user creates a new `Chroma` instance with the same persistence
directory, it will get loaded up automatically. If they use `from_texts`
or `from_documents` in this way, the documents will be loaded into the
existing store.

There is the chance of some funky behavior if the user passes a
different embedding function from the one used to create the collection
- we will make this easier in future updates. For now, we log a warning.
2023-02-13 21:09:06 -08:00
Anton Troynikov
78abd277ff
Chroma in LangChain (#1010)
Chroma is a simple to use, open-source, zero-config, zero setup
vectorstore.

Simply `pip install chromadb`, and you're good to go. 

Out-of-the-box Chroma is suitable for most LangChain workloads, but is
highly flexible. I tested to 1M embs on my M1 mac, with out issues and
reasonably fast query times.

Look out for future releases as we integrate more Chroma features with
LangChain!
2023-02-12 17:43:48 -08:00
Harrison Chase
1e56879d38
Harrison/save faiss (#916)
Co-authored-by: Shrey Joshi <shreyjoshi2004@gmail.com>
2023-02-06 21:44:50 -08:00
Kevin Huo
31b054f69d
Add pinecone integration test (#911)
Basic integration test for pinecone
2023-02-06 18:13:35 -08:00
Harrison Chase
3f48eed5bd
Harrison/milvus (#856)
Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
Signed-off-by: Frank Liu <frank.liu@zilliz.com>
Co-authored-by: Filip Haltmayer <81822489+filip-halt@users.noreply.github.com>
Co-authored-by: Frank Liu <frank@frankzliu.com>
2023-02-02 22:05:47 -08:00
dham
e04b063ff4
add faiss local saving/loading (#676)
- This uses the faiss built-in `write_index` and `load_index` to save
and load faiss indexes locally
- Also fixes #674
- The save/load functions also use the faiss library, so I refactored
the dependency into a function
2023-01-21 16:08:14 -08:00
Harrison Chase
0b204d8c21
Harrison/quadrant (#665)
Co-authored-by: Kacper Łukawski <kacperlukawski@users.noreply.github.com>
2023-01-20 09:45:01 -08:00
Samantha Whitmore
315b0c09c6
wip: add method for both docstore and embeddings (#119)
this will break atm but wanted to get thoughts on implementation.

1. should add() be on docstore interface?
2. should InMemoryDocstore change to take a list of documents as init?
(makes this slightly easier to implement in FAISS -- if we think it is
less clean then could expose a method to get the number of documents
currently in the dict, and perform the logic of creating the necessary
dictionary in the FAISS.add_texts method.

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2022-11-20 16:23:58 -08:00
Harrison Chase
c02eb199b6
add few shot example (#148) 2022-11-19 20:32:45 -08:00