**Description:**
This integrates Infinispan as a vector store.
Infinispan is an open-source key-value data grid; it can run as a single
node or distributed.
Vector search has been supported since release 15.x.
For more: [Infinispan Home](https://infinispan.org)
Integration tests are provided, as well as a demo notebook.
Follow-up on https://github.com/langchain-ai/langchain/pull/17467.
- Update all references to the Elasticsearch classes to use the partner
package.
- Deprecate community classes.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
```
ValidationError: 2 validation errors for DocArrayDoc
text
  Field required [type=missing, input_value={'embedding': [-0.0191128...9, 0.01005221541175212]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.5/v/missing
metadata
  Field required [type=missing, input_value={'embedding': [-0.0191128...9, 0.01005221541175212]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.5/v/missing
```
In the `_get_doc_cls` method, the `DocArrayDoc` class is defined as
follows:
```python
class DocArrayDoc(BaseDoc):
    text: Optional[str]
    embedding: Optional[NdArray] = Field(**embeddings_params)
    metadata: Optional[dict]
```
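Under pydantic v2, an `Optional[...]` annotation only permits `None` as a value. It does not supply a default, so `text` and `metadata` are still required, which matches the two errors above. The minimal sketch below uses a plain pydantic `BaseModel` as a stand-in for docarray's `BaseDoc`, and its class and helper names are hypothetical. It shows what difference an explicit `= None` default makes:

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class RequiredOptionalDoc(BaseModel):
    # Hypothetical stand-in: under pydantic v2 these accept None but are still required.
    text: Optional[str]
    embedding: Optional[List[float]] = None
    metadata: Optional[dict]


class DefaultedDoc(BaseModel):
    # Hypothetical stand-in: an explicit default makes each field genuinely optional.
    text: Optional[str] = None
    embedding: Optional[List[float]] = None
    metadata: Optional[dict] = None


def count_missing_fields(model_cls, **data) -> int:
    """Return how many validation errors model_cls raises for data (0 if none)."""
    try:
        model_cls(**data)
    except ValidationError as exc:
        return len(exc.errors())
    return 0
```

Under pydantic v2, `count_missing_fields(RequiredOptionalDoc, embedding=[0.1, 0.2])` reports two errors, while `DefaultedDoc` accepts the same input. Under pydantic v1 both classes accept it, because v1 gave `Optional` fields an implicit `None` default.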
This PR adds a dangerous-load parameter that forces users to opt in before pickle is used.
It is meant to raise user awareness that the pickle module is involved.
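The minimal sketch below shows this opt-in pattern. It uses a hypothetical `load_pickled_index` function and a hypothetical `allow_dangerous_load` flag, because the source does not name the real parameter. Loading refuses to call `pickle` unless the caller explicitly opts in:

```python
import pickle
from pathlib import Path
from typing import Any


def load_pickled_index(path: Path, *, allow_dangerous_load: bool = False) -> Any:
    """Hypothetical loader: unpickles `path` only if the caller opts in.

    Unpickling can execute arbitrary code, so it should only be done on
    files you created yourself or otherwise trust.
    """
    if not allow_dangerous_load:
        raise ValueError(
            "Loading this file uses pickle, which can execute arbitrary code. "
            "Pass allow_dangerous_load=True only if you trust the source of the file."
        )
    with open(path, "rb") as f:
        return pickle.load(f)
```

The flag is keyword-only and defaults to `False`. As a result, opting in is always visible at the call site.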
- [ ] **PR title**: "community: deprecate vectorstores.MatchingEngine"
- [ ] **PR message**:
  - **Description:** Deprecates this integration, since it has been moved
to `langchain_google_vertexai`.
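Deprecation usually means the old class keeps working but emits a warning when it is used. Below is a minimal standard-library sketch with a hypothetical `LegacyVectorStore` class and hypothetical message text. LangChain has its own deprecation helpers, and this sketch does not reproduce them:

```python
import warnings


class LegacyVectorStore:
    """Hypothetical stand-in for a community class that has moved to a partner package."""

    def __init__(self) -> None:
        # stacklevel=2 attributes the warning to the caller's line, not to this one.
        warnings.warn(
            "LegacyVectorStore is deprecated; use the class from "
            "langchain_google_vertexai instead.",
            DeprecationWarning,
            stacklevel=2,
        )
```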
Description:
This pull request introduces several enhancements for Azure Cosmos
Vector DB, primarily focused on improving caching and search
capabilities using Azure Cosmos MongoDB vCore Vector DB. Here's a
summary of the changes:
- **AzureCosmosDBSemanticCache**: Added a new cache implementation
called AzureCosmosDBSemanticCache, which utilizes Azure Cosmos MongoDB
vCore Vector DB for efficient caching of semantic data. Added
comprehensive test cases for AzureCosmosDBSemanticCache to ensure its
correctness and robustness. These tests cover various scenarios and edge
cases to validate the cache's behavior.
- **HNSW Vector Search**: Added HNSW vector search functionality in the
CosmosDB Vector Search module. This enhancement enables more efficient
and accurate vector searches by utilizing the HNSW (Hierarchical
Navigable Small World) algorithm. Added corresponding test cases to
validate the HNSW vector search functionality in both
AzureCosmosDBSemanticCache and AzureCosmosDBVectorSearch. These tests
ensure the correctness and performance of the HNSW search algorithm.
- **LLM Caching Notebook**: The notebook now includes a comprehensive
example showcasing the usage of the AzureCosmosDBSemanticCache. This
example highlights how the cache can be employed to efficiently store
and retrieve semantic data. Additionally, the example provides default
values for all parameters used within the AzureCosmosDBSemanticCache,
ensuring clarity and ease of understanding for users who are new to the
cache implementation.
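A semantic cache returns a hit when a new prompt is *similar enough* to a cached one, not only when it matches exactly. The minimal in-memory sketch below illustrates that idea. It assumes cosine similarity and a hypothetical `score_threshold`, and it is not the AzureCosmosDBSemanticCache API. It also scans every entry linearly, whereas the real cache would rely on a vector index such as HNSW:

```python
import math
from typing import List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemorySemanticCache:
    """Hypothetical cache keyed by prompt embeddings instead of exact prompt strings."""

    def __init__(self, score_threshold: float = 0.9) -> None:
        self.score_threshold = score_threshold
        self._entries: List[Tuple[List[float], str]] = []

    def update(self, prompt_embedding: List[float], response: str) -> None:
        self._entries.append((prompt_embedding, response))

    def lookup(self, prompt_embedding: List[float]) -> Optional[str]:
        # Find the most similar cached prompt; hit only if it clears the threshold.
        best_score, best_response = -1.0, None
        for embedding, response in self._entries:
            score = cosine_similarity(prompt_embedding, embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.score_threshold else None
```

For example, after `cache.update([1.0, 0.0], "cached answer")`, a lookup with the nearby vector `[0.99, 0.05]` returns the cached answer. An orthogonal vector misses.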
@hwchase17, @baskaryan, @eyurtsev
### Description
Fixed a small bug in `add_images()` in chroma.py. Previously, when no
metadata was passed, the stored documents contained the base64 encoding
of the images at the given URIs. When metadata was passed, however, the
documents contained the plain URI strings, which should not be the case.
### Issue
In the `add_images()` method, the call to `upsert()` has to use
`b64_texts` instead of the plain string `uris`.
### Twitter handle
https://twitter.com/whitepegasus01
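The intended behaviour can be sketched as follows. The sketch assumes a hypothetical `build_upsert_kwargs` helper, and the other upsert arguments, such as ids and embeddings, are omitted. The stored documents are the base64-encoded images whether or not metadata is passed, so both code paths agree:

```python
import base64
from typing import Any, Dict, List, Optional


def encode_image(uri: str) -> str:
    """Read the image file at `uri` and return its bytes as a base64 string."""
    with open(uri, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def build_upsert_kwargs(
    uris: List[str], metadatas: Optional[List[Dict[str, Any]]] = None
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the document/metadata arguments for an image upsert."""
    b64_texts = [encode_image(uri) for uri in uris]
    # Documents are always the base64 strings, never the raw URI strings.
    kwargs: Dict[str, Any] = {"documents": b64_texts}
    if metadatas is not None:
        # The bug: this branch used `uris` as the documents instead of `b64_texts`.
        kwargs["metadatas"] = metadatas
    return kwargs
```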
If the document loader receives a `pathlib.Path` instead of a `str`, it
reads the file correctly, but problems begin when the document is added
to DeepLake.
The problem arises because the path is stored in the metadata as a
`Path` object instead of being cast to `str`.
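```python
from pathlib import Path

from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma, DeepLake

# `embeddings`, `ds_path` and `activeloop_token` are defined elsewhere.
deeplake = True
fname = Path('./lorem_ipsum.txt')  # a Path, not a str
loader = TextLoader(fname, encoding="utf-8")
docs = loader.load_and_split()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = text_splitter.split_documents(docs)
if deeplake:
    db = DeepLake(dataset_path=ds_path, embedding=embeddings, token=activeloop_token)
    db.add_documents(chunks)
else:
    db = Chroma.from_documents(docs, embeddings)
```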
With this snippet, the DeepLake error message looks like
this:
```
[part of error message omitted]
Traceback (most recent call last):
  File "/home/mwm/repositories/sources/fixing_langchain/main.py", line 53, in <module>
    db.add_documents(chunks)
  File "/home/mwm/repositories/sources/langchain/libs/core/langchain_core/vectorstores.py", line 139, in add_documents
    return self.add_texts(texts, metadatas, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/deeplake.py", line 258, in add_texts
    return self.vectorstore.add(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/deeplake_vectorstore.py", line 226, in add
    return self.dataset_handler.add(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/dataset_handlers/client_side_dataset_handler.py", line 139, in add
    dataset_utils.extend_or_ingest_dataset(
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/vector_search/dataset/dataset.py", line 544, in extend_or_ingest_dataset
    extend(
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/vector_search/dataset/dataset.py", line 505, in extend
    dataset.extend(batched_processed_tensors, progressbar=False)
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/dataset/dataset.py", line 3247, in extend
    raise SampleExtendError(str(e)) from e.__cause__
deeplake.util.exceptions.SampleExtendError: Failed to append a sample to the tensor 'metadata'. See more details in the traceback. If you wish to skip the samples that cause errors, please specify `ignore_errors=True`.
```
This does not explain the error well enough.
The same error in Chroma looks like this:
```
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mwm/repositories/sources/fixing_langchain/main.py", line 56, in <module>
    db = Chroma.from_documents(docs, embeddings)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 778, in from_documents
    return cls.from_texts(
           ^^^^^^^^^^^^^^^
  File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 736, in from_texts
    chroma_collection.add_texts(
  File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 309, in add_texts
    raise ValueError(e.args[0] + "\n\n" + msg)
ValueError: Expected metadata value to be a str, int, float or bool, got lorem_ipsum.txt which is a <class 'pathlib.PosixPath'>

Try filtering complex metadata from the document using langchain_community.vectorstores.utils.filter_complex_metadata.
```
That is much more user-friendly, so I added information about a
possible type mismatch to the error message, the same way it is
covered in Chroma:
https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/vectorstores/chroma.py#L224
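A check that produces this kind of friendlier message can be sketched as below. `describe_unsupported_metadata` is a hypothetical helper, not the code added to the DeepLake integration. It assumes that the acceptable metadata value types are the ones named in the Chroma message above (str, int, float, bool). Any other value, such as a `pathlib.Path`, is reported together with its type:

```python
from typing import Any, Dict, List

# Assumption: mirrors the types named in the Chroma error message.
SIMPLE_TYPES = (str, int, float, bool)


def describe_unsupported_metadata(metadatas: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: one readable problem per metadata value of an unexpected type."""
    problems = []
    for i, metadata in enumerate(metadatas):
        for key, value in metadata.items():
            if not isinstance(value, SIMPLE_TYPES):
                problems.append(
                    f"metadata[{i}][{key!r}] = {value} is a {type(value)}; "
                    "expected str, int, float or bool (e.g. use str(path) for paths)"
                )
    return problems
```

An integration could append these lines to the exception it re-raises. The caller then sees which key holds the `Path`.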
This PR migrates the existing MongoDBAtlasVectorSearch abstraction from
the `langchain_community` section to the partners package section of the
codebase.
- [x] Run the partner package script as advised in the partner-packages
documentation.
- [x] Add Unit Tests
- [x] Migrate Integration Tests
- [x] Rename the autogenerated `MongoDBAtlasVectorStore` to
`MongoDBAtlasVectorSearch`
- [x] ~Remove~ deprecate the old `langchain_community` VectorStore
references.
## Additional Callouts
- Implemented the `delete` method
- Added the missing async method implementations:
  - `amax_marginal_relevance_search_by_vector`
  - `adelete`
- Added new unit tests covering the functionality of the
`MongoDBAtlasVectorSearch` methods
- Removed [`del
res[self._embedding_key]`](e0c81e1cb0/libs/community/langchain_community/vectorstores/mongodb_atlas.py (L218))
in the `_similarity_search_with_score` function, because otherwise the
`maximal_marginal_relevance` function would fail: the `Document`
needs to keep the embedding key in its metadata for MMR to work.
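The embedding has to survive because maximal marginal relevance (MMR) re-ranks candidates by comparing each candidate's embedding with the query and with the documents already selected. Without the embeddings, MMR cannot run. Below is a minimal pure-Python sketch of the standard MMR selection loop. It is not the library's implementation, which works on arrays, and it assumes cosine similarity and a `lambda_mult` trade-off weight:

```python
import math
from typing import List


def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def maximal_marginal_relevance(
    query: List[float], candidates: List[List[float]], k: int = 4, lambda_mult: float = 0.5
) -> List[int]:
    """Return indices of up to k candidates, trading off relevance against diversity."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = _cosine(query, candidates[i])
            # Penalise similarity to anything already picked.
            redundancy = max(
                (_cosine(candidates[i], candidates[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lambda_mult=1.0` this reduces to plain similarity ranking. Lower values increasingly skip near-duplicates of documents already chosen.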
Checklist:
- [x] PR title: Please title your PR "package: description", where
"package" is whichever of langchain, community, core, experimental, etc.
is being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
- [x] PR message
- [x] Pass lint and test: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified to check that you're
passing lint and testing. See contribution guidelines for more
information on how to write/run tests, lint, etc:
https://python.langchain.com/docs/contributing/
- [x] Add tests and docs: If you're adding a new integration, please
include:
  1. Existing tests supplied in docs/docs do not change. Updated docstrings for new functions like `delete`.
  2. An example notebook showing its use. It lives in the `docs/docs/integrations` directory. (This already exists.)
---------
Co-authored-by: Steven Silvester <steven.silvester@ieee.org>
Co-authored-by: Erick Friis <erick@langchain.dev>
Sometimes you want to use various parameters in the retrieval query of
Neo4j Vector to personalize or customize results. Before, when there were
only predefined chains, this didn't really make sense. Now that everything
is about custom chains and LCEL, it is worth adding, since users can inject
any parameters they wish at query time. This isn't prone to injection
attacks in the style of SQL injection, since the values are passed as query
parameters rather than concatenated into the query string.
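The safety argument is the usual one for parameterized queries: user values travel separately from the query text, so they are never parsed as query syntax. The sketch below shows the principle with Python's built-in `sqlite3` rather than Neo4j, and its table, data and `find_docs_by_author` helper are hypothetical. Cypher parameters (`$name`) play the same role:

```python
import sqlite3


def find_docs_by_author(conn: sqlite3.Connection, author: str) -> list:
    # The `?` placeholder keeps `author` out of the SQL text entirely.
    return conn.execute(
        "SELECT title FROM docs WHERE author = ? ORDER BY title", (author,)
    ).fetchall()


# Hypothetical in-memory data for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [("Intro", "alice"), ("Guide", "alice"), ("Notes", "bob")],
)
```

A classic injection string such as `"alice' OR '1'='1"` is compared literally as an author name and matches nothing. It is not spliced into the `WHERE` clause.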
- **Description:** By default it expects a list, but that's not the case
in a corner scenario where no documents have been ingested yet (use case:
bootstrapping an application).
Hence I added a check: if the result is a pandas DataFrame instead of a
list, the method returns immediately.
- **Issue:** NA
- **Dependencies:** NA
- **Twitter handle:** jaskiratsingh1
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
## Description & Issue
While following the official docs to use ClickHouse as a vector store, I
found that only the default `annoy` index is properly supported. I want
to try another engine, `usearch`, because `annoy` is not properly
supported on ARM platforms.
Here are the settings I prefer:
```python
from langchain_community.vectorstores import ClickhouseSettings

settings = ClickhouseSettings(
    table="wiki_Ethereum",
    index_type="usearch",  # annoy by default
    index_param=[],
)
```
The above settings do not work because the command `set
allow_experimental_annoy_index=1` is hard-coded.
This PR makes the experimental setting follow `index_type`, which is
also consistent with ClickHouse's naming conventions.
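Below is a minimal sketch of the idea, using a hypothetical `experimental_index_setting` helper. The `SET` statement is derived from `index_type` instead of being hard-coded for `annoy`. The sketch assumes that ClickHouse's per-engine setting names follow the `allow_experimental_<engine>_index` pattern for `annoy` and `usearch`. It also checks the value against an allowlist before interpolating it into SQL:

```python
from typing import Optional

# Assumption: the engines whose experimental flag this sketch knows about.
SUPPORTED_INDEX_TYPES = {"annoy", "usearch"}


def experimental_index_setting(index_type: Optional[str]) -> Optional[str]:
    """Hypothetical helper: return the SET statement needed for `index_type`, if any."""
    if index_type is None:
        # No vector index requested, so no experimental feature needs enabling.
        return None
    if index_type not in SUPPORTED_INDEX_TYPES:
        raise ValueError(f"Unsupported index_type: {index_type!r}")
    # Interpolation is safe only because index_type was checked against the allowlist.
    return f"SET allow_experimental_{index_type}_index=1"
```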
**Description:** This PR adds an `__init__` method to the
NeuralDBVectorStore class, which takes in a NeuralDB object to
instantiate the state of NeuralDBVectorStore.
**Issue:** N/A
**Dependencies:** N/A
**Twitter handle:** N/A
**Description:**
Updated the documentation for the DeepLake `__init__` method.
The `exec_option` docs especially needed improvement, but I did a general
cleanup while I was looking at it.
**Issue:** n/a
**Dependencies:** None
---------
Co-authored-by: Nathan Voxland <nathan@voxland.net>
In this pull request, we introduce the `add_images` method to the
SingleStoreDB vector store class, expanding its capabilities to handle
multi-modal embeddings. This method incorporates image data into the
vector store by associating each image's URI with the corresponding
document content, metadata, and either pre-generated embeddings or
embeddings computed with the `embed_image` method of the provided
embedding object.
The change includes integration tests validating the behavior of
`add_images`. Additionally, we provide a notebook showcasing the usage of
this new method.
---------
Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
Hi, I'm from the LanceDB team.
Improves the LanceDB integration by making it easier to use: you are no
longer required to create tables manually and pass them to the
constructor, although that is still supported for backward compatibility.
Bug fix: pandas was being used even though it's not a dependency of
LanceDB or LangChain.
PS: this issue was raised a few months ago but lost traction. It is a
feature improvement for our users; kindly review this. Thanks!
Another PR will be done for the langchain-astradb package.
Note: future development will be done in the partner package only. This one is just to align with the rest of the components in the community package, and it fixes a bunch of issues.
- **Description:** Addresses the bugs described in the linked issue, where
an import was erroneously removed and the rename of a keyword argument was
missed when migrating from the beta to the stable version of the
azure-search-documents package
- **Issue:** https://github.com/langchain-ai/langchain/issues/17598
- **Dependencies:** N/A
- **Twitter handle:** N/A
- **Description:** This fixes an issue when using RecordManager: it was
generating new hashes for documents because `add_texts` was modifying the
metadata in place. Additionally, I moved some tests to the unit tests,
since that was a more appropriate home.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** `@_morgan_adams_`
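In-place mutation breaks deduplication because the record manager identifies a document by hashing its content together with its metadata. If `add_texts` writes into the caller's metadata dict, the same document hashes differently the next time it is indexed. The minimal sketch below uses hypothetical helpers (`doc_hash`, `add_texts_copying`) to show that copying the dict before annotating it keeps the hash stable:

```python
import hashlib
import json
from typing import Dict, List


def doc_hash(text: str, metadata: Dict) -> str:
    """Hypothetical content+metadata hash, similar in spirit to a record manager's."""
    payload = json.dumps({"text": text, "metadata": metadata}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_texts_copying(texts: List[str], metadatas: List[Dict]) -> List[Dict]:
    """Hypothetical add_texts: annotates copies so the caller's dicts stay untouched."""
    stored = []
    for text, metadata in zip(texts, metadatas):
        record = dict(metadata)  # copy first; mutating `metadata` would change its hash
        record["text"] = text
        stored.append(record)
    return stored
```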
**Description:** This PR introduces a new "Astra DB" Partner Package.
So far only the vector store class is _duplicated_ there; all other
classes will follow once this is validated and established.
Along with the move to a separate package, the class name will
incidentally change from `AstraDB` to `AstraDBVectorStore`.
The strategy has been to duplicate the module (with prospective removal
from community at LangChain 0.2). Until then, the code will be kept in
sync with minimal, known differences (there is a makefile target to
automate drift control; for convenience with this check, the
community package has a class `AstraDBVectorStore` aliased to `AstraDB`
at the end of the module).
This PR also brings several bug fixes and improvements to the vector store,
as well as a reshuffling of the doc pages/notebooks (Astra and
Cassandra) to align with the move to a separate package.
**Dependencies:** A brand new pyproject.toml in the new package, no
changes otherwise.
**Twitter handle:** `@rsprrs`
---------
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
This pull request introduces support for various Approximate Nearest
Neighbor (ANN) vector index algorithms in the VectorStore class,
available starting from version 8.5 of SingleStore DB. Using a vector
index can significantly speed up searches, particularly over large sets
of vectors.
---------
Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Users can provide an Elasticsearch connection with custom headers. This
PR makes sure these headers are preserved when adding the langchain user
agent header.
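The fix amounts to merging the headers rather than replacing them. The minimal sketch below uses a hypothetical `with_user_agent` helper and a placeholder user-agent string. It copies the existing headers and sets only the `user-agent` entry:

```python
from typing import Dict, Optional


def with_user_agent(headers: Optional[Dict[str, str]], user_agent: str) -> Dict[str, str]:
    """Hypothetical helper: return a copy of `headers` with the user agent added."""
    merged = dict(headers or {})  # keep every custom header the user configured
    merged["user-agent"] = user_agent
    return merged
```

Working on a copy also leaves the caller's original dict unchanged.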
- **Description:** The `from_*` methods of the FAISS class hard-code the
InMemoryStore implementation and thereby do not let users pass a custom
docstore implementation.
- **Issue:** no referenced issue,
- **Dependencies:** none,
- **Twitter handle:** ksachdeva
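Below is a minimal sketch of the change in spirit, using a hypothetical `TinyIndex` class and a toy `DictDocstore` rather than the real FAISS wrapper or LangChain's docstore classes. The factory method takes an optional `docstore` and falls back to the default store only when none is given:

```python
from typing import Dict, List, Optional, Protocol


class Docstore(Protocol):
    def add(self, texts: Dict[str, str]) -> None: ...
    def search(self, doc_id: str) -> Optional[str]: ...


class DictDocstore:
    """Toy dict-backed default store (hypothetical)."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def add(self, texts: Dict[str, str]) -> None:
        self._docs.update(texts)

    def search(self, doc_id: str) -> Optional[str]:
        return self._docs.get(doc_id)


class TinyIndex:
    """Hypothetical index whose factory accepts any Docstore implementation."""

    def __init__(self, docstore: Docstore) -> None:
        self.docstore = docstore

    @classmethod
    def from_texts(cls, texts: List[str], docstore: Optional[Docstore] = None) -> "TinyIndex":
        # Only build the default store when the caller did not supply one.
        store = docstore if docstore is not None else DictDocstore()
        store.add({str(i): text for i, text in enumerate(texts)})
        return cls(store)
```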
- **Description:** This adds a delete method so that rocksetdb can be
used with `RecordManager`.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** `@_morgan_adams_`
---------
Co-authored-by: Rockset API Bot <admin@rockset.io>
**Description:** Changed filtering so that a document that fails the
filter is not added to the results. Currently, filtering is entirely
broken: all documents are returned whether or not they pass the filter.
Fixes an issue introduced in
https://github.com/langchain-ai/langchain/pull/16190
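The intended behaviour can be sketched minimally as follows. The sketch uses a hypothetical `filter_results` helper and assumes an exact-match metadata filter, so the real integration's filter semantics may differ. A document is appended only when every filter condition matches, not unconditionally:

```python
from typing import Any, Dict, List, Optional


def filter_results(
    docs: List[Dict[str, Any]], metadata_filter: Optional[Dict[str, Any]] = None
) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep only docs whose metadata matches every filter key."""
    if not metadata_filter:
        return list(docs)
    results = []
    for doc in docs:
        metadata = doc.get("metadata", {})
        if all(metadata.get(key) == value for key, value in metadata_filter.items()):
            results.append(doc)  # a failed match must skip the append entirely
    return results
```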
- **Description:** This PR adds support for `search_type="mmr"` and
`search_type="similarity_score_threshold"` to retrievers using
`DatabricksVectorSearch`.
- **Issue:**
- **Dependencies:**
- **Twitter handle:**
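The minimal sketch below shows what the `similarity_score_threshold` mode does conceptually. It uses a hypothetical `filter_by_score_threshold` helper and assumes relevance scores are normalised so that higher means more relevant. The sketch takes the top `k` results and keeps only those at or above the threshold:

```python
from typing import List, Tuple


def filter_by_score_threshold(
    docs_and_scores: List[Tuple[str, float]], score_threshold: float, k: int = 4
) -> List[str]:
    """Hypothetical helper: top-k documents whose relevance score clears the threshold."""
    # Assumption: scores are normalised so that higher means more relevant.
    ranked = sorted(docs_and_scores, key=lambda pair: pair[1], reverse=True)
    return [doc for doc, score in ranked[:k] if score >= score_threshold]
```

In LangChain, this mode is typically selected on the retriever, for example with `vectorstore.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5})`. MMR is selected in the same way with `search_type="mmr"`.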
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>