Checking if weaviate similarity_search kwargs contains "certainty" and
use it accordingly. The minimal level of certainty must be a float, and
it is computed by normalized distance.
### Description
This PR adds a wrapper which adds support for the OpenSearch vector
database. Using opensearch-py client we are ingesting the embeddings of
given text into opensearch cluster using Bulk API. We can perform the
`similarity_search` on the index using the 3 popular searching methods
of OpenSearch k-NN plugin:
- `Approximate k-NN Search` use approximate nearest neighbor (ANN)
algorithms from the [nmslib](https://github.com/nmslib/nmslib),
[faiss](https://github.com/facebookresearch/faiss), and
[Lucene](https://lucene.apache.org/) libraries to power k-NN search.
- `Script Scoring` extends OpenSearch’s script scoring functionality to
execute a brute force, exact k-NN search.
- `Painless Scripting` adds the distance functions as painless
extensions that can be used in more complex combinations. Also, supports
brute force, exact k-NN search like Script Scoring.
### Issues Resolved
https://github.com/hwchase17/langchain/issues/1054
---------
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
Implementation fails if there are not enough documents. Added the same
check as used for similarity search.
Current implementation raises
```
File ".venv/lib/python3.9/site-packages/langchain/vectorstores/faiss.py", line 160, in max_marginal_relevance_search
_id = self.index_to_docstore_id[i]
KeyError: -1
```
The #1088 introduced a bug in Qdrant integration. That PR reverts those
changes and provides class attributes to ensure consistent payload keys.
In addition to that, an exception will be thrown if any of texts is None
(that could have been an issue reported in #1087)
Alternate implementation to PR #960 Again - only FAISS is implemented.
If accepted can add this to other vectorstores or leave as
NotImplemented? Suggestions welcome...
This PR adds persistence to the Chroma vector store.
Users can supply a `persist_directory` with any of the `Chroma` creation
methods. If supplied, the store will be automatically persisted at that
directory.
If a user creates a new `Chroma` instance with the same persistence
directory, it will get loaded up automatically. If they use `from_texts`
or `from_documents` in this way, the documents will be loaded into the
existing store.
There is the chance of some funky behavior if the user passes a
different embedding function from the one used to create the collection
- we will make this easier in future updates. For now, we log a warning.
Chroma is a simple to use, open-source, zero-config, zero setup
vectorstore.
Simply `pip install chromadb`, and you're good to go.
Out-of-the-box Chroma is suitable for most LangChain workloads, but is
highly flexible. I tested to 1M embs on my M1 mac, with out issues and
reasonably fast query times.
Look out for future releases as we integrate more Chroma features with
LangChain!
Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
Signed-off-by: Frank Liu <frank.liu@zilliz.com>
Co-authored-by: Filip Haltmayer <81822489+filip-halt@users.noreply.github.com>
Co-authored-by: Frank Liu <frank@frankzliu.com>
If `distance_func` and `collection_name` are in `kwargs` they are sent
to the `QdrantClient` which results in an error being raised.
Co-authored-by: Francisco Ingham <>
I'm providing a hotfix for Qdrant integration. Calculating a single
embedding to obtain the vector size was great idea. However, that change
introduced a bug trying to put only that single embedding into the
database. It's fixed. Right now all the embeddings will be pushed to
Qdrant.
- This uses the faiss built-in `write_index` and `load_index` to save
and load faiss indexes locally
- Also fixes#674
- The save/load functions also use the faiss library, so I refactored
the dependency into a function
Allow optionally specifying a list of ids for pinecone rather than
having them randomly generated.
This also permits editing the embedding/metadata of existing pinecone
entries, by id.
Also updated docs, and noticed an issue with the add_texts method on
VectorStores that I had missed before -- the metadatas arg should be
required to match the classmethod which initializes the VectorStores
(the add_example methods break otherwise in the ExampleSelectors)
this will break atm but wanted to get thoughts on implementation.
1. should add() be on docstore interface?
2. should InMemoryDocstore change to take a list of documents as init?
(makes this slightly easier to implement in FAISS -- if we think it is
less clean then could expose a method to get the number of documents
currently in the dict, and perform the logic of creating the necessary
dictionary in the FAISS.add_texts method.
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>