Commit Graph

203 Commits

Author SHA1 Message Date
Hamza Muhammad Farooqi
24a0a4472a
Add docstrings for Clickhouse class methods (#19195)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
2024-03-19 04:03:12 +00:00
Rohit Gupta
785f8ab174
[langchain_community] milvus vectorstores upsert: add **kwargs to make it use for other argument also (#19193)
add **kwargs in add_documents for upsert, to make it use for other
argument also.
Lets use this, it was unused as of now.

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

Co-authored-by: Rohit Gupta <rohit.gupta2@walmart.com>
2024-03-18 21:01:12 -07:00
Guangdong Liu
c3310c5e7f
community: Fix Milvus got multiple values for keyword argument 'timeout' (#19232)
- **Description:** Fix Milvus got multiple values for keyword argument
'timeout'
- **Issue:**  fix #18580
- @baskaryan @eyurtsev PTAL
2024-03-18 20:44:25 -07:00
Cailin Wang
7cd87d2f6a
community: Add partition parameter to DashVector (#19023)
**Description**: DashVector Add partition parameter
**Twitter handle**: @CailinWang_

---------

Co-authored-by: root <root@Bluedot-AI>
2024-03-16 15:20:30 -07:00
高远
ef9813dae6
docs: add vikingdb docstrings(#19016)
Co-authored-by: gaoyuan <gaoyuan.20001218@bytedance.com>
2024-03-15 16:29:29 -07:00
kaijietti
c20aeef79a
community[patch]: implement qdrant _aembed_query and use it in other async funcs (#19155)
`amax_marginal_relevance_search ` and `asimilarity_search_with_score `
should use an async version of `_embed_query `.
2024-03-15 21:20:12 +00:00
fengjial
c922ea36cb
community[minor]: Add Baidu VectorDB as vector store (#17997)
Co-authored-by: fengjialin <fengjialin@MacBook-Pro.local>
2024-03-15 19:01:58 +00:00
Erick Friis
781aee0068
community, langchain, infra: revert store extended test deps outside of poetry (#19153)
Reverts langchain-ai/langchain#18995

Because it makes installing dependencies in python 3.11 extended testing
take 80 minutes
2024-03-15 17:10:47 +00:00
Erick Friis
9e569d85a4
community, langchain, infra: store extended test deps outside of poetry (#18995)
poetry can't reliably handle resolving the number of optional "extended
test" dependencies we have. If we instead just rely on pip to install
extended test deps in CI, this isn't an issue.
2024-03-15 05:55:30 +00:00
Eugene Yurtsev
6cdca4355d
community[minor]: Revamp PGVector Filtering (#18992)
This PR makes the following updates in the pgvector database:

1. Use JSONB field for metadata instead of JSON
2. Update operator syntax to include required `$` prefix before the
operators (otherwise there will be name collisions with fields)
3. The change is non-breaking, old functionality is still the default,
but it will emit a deprecation warning
4. Previous functionality has bugs associated with comparisons due to
casting to text (so lexical ordering is used incorrectly for numeric
fields)
5. Adds an a GIN index on the JSONB field for more efficient querying
2024-03-14 16:56:00 -04:00
Leonid Ganeline
9c8523b529
community[patch]: flattening imports 3 (#18939)
@eyurtsev
2024-03-12 15:18:54 -07:00
Tymofii
0bec1f6877
commnity[patch]: refactor code for faiss vectorstore, update faiss vectorstore documentation (#18092)
**Description:** Refactor code of FAISS vectorcstore and update the
related documentation.
Details: 
 - replace `.format()` with f-strings for strings formatting;
- refactor definition of a filtering function to make code more readable
and more flexible;
- slightly improve efficiency of
`max_marginal_relevance_search_with_score_by_vector` method by removing
unnecessary looping over the same elements;
- slightly improve efficiency of `delete` method by using set data
structure for checking if the element was already deleted;

**Issue:** fix small inconsistency in the documentation (the old example
was incorrect and unappliable to faiss vectorstore)

**Dependencies:** basic langchain-community dependencies and `faiss`
(for CPU or for GPU)

**Twitter handle:** antonenkodev
2024-03-11 22:33:03 -07:00
Tomaz Bratanic
a28be31a96
Switch to md5 for deduplication in neo4j integrations (#18846)
Deduplicate documents using MD5 of the page_content. Also allows for
custom deduplication with graph ingestion method by providing metadata
id attribute

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-03-09 13:28:55 -08:00
Keith Chan
914af69b44
community[patch]: Update azuresearch vectorstore from_texts() method to include fields argument (#17661)
- **Description:** Update azuresearch vectorstore from_texts() method to
include fields argument, necessary for creating an Azure AI Search index
with custom fields.
- **Issue:** Currently index fields are fixed to default fields if Azure
Search index is created using from_texts() method
- **Dependencies:** None
- **Twitter handle:** None

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-08 17:05:35 -08:00
Phat Vo
3ecb903d49
community[patch] : Tidy up and update Clarifai SDK functions (#18314)
Description :
* Tidy up, add missing docstring and fix unused params
* Enable using session token
2024-03-07 19:47:44 -08:00
Ian
390ef6abe3
community[minor]: Add Initial Support for TiDB Vector Store (#15796)
This pull request introduces initial support for the TiDB vector store.
The current version is basic, laying the foundation for the vector store
integration. While this implementation provides the essential features,
we plan to expand and improve the TiDB vector store support with
additional enhancements in future updates.

Upcoming Enhancements:
* Support for Vector Index Creation: To enhance the efficiency and
performance of the vector store.
* Support for max marginal relevance search. 
* Customized Table Structure Support: Recognizing the need for
flexibility, we plan for more tailored and efficient data store
solutions.

Simple use case exmaple

```python
from typing import List, Tuple
from langchain.docstore.document import Document
from langchain_community.vectorstores import TiDBVectorStore
from langchain_openai import OpenAIEmbeddings

db = TiDBVectorStore.from_texts(
    embedding=embeddings,
    texts=['Andrew like eating oranges', 'Alexandra is from England', 'Ketanji Brown Jackson is a judge'],
    table_name="tidb_vector_langchain",
    connection_string=tidb_connection_url,
    distance_strategy="cosine",
)

query = "Can you tell me about Alexandra?"
docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query)
for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)
```
2024-03-07 17:18:20 -08:00
axiangcoding
9745b5894d
community[patch]: Chroma use uuid4 instead of uuid1 to generate random ids (#18723)
- **Description:** Chroma use uuid4 instead of uuid1 as random ids. Use
uuid1 may leak mac address, changing to uuid4 will not cause other
effects.
  - **Issue:** None
  - **Dependencies:** None
  - **Twitter handle:** None
2024-03-07 11:48:25 -05:00
Sam Khano
1b4dcf22f3
community[minor]: Add DocumentDBVectorSearch VectorStore (#17757)
**Description:**
- Added Amazon DocumentDB Vector Search integration (HNSW index)
- Added integration tests
- Updated AWS documentation with DocumentDB Vector Search instructions
- Added notebook for DocumentDB integration with example usage

---------

Co-authored-by: EC2 Default User <ec2-user@ip-172-31-95-226.ec2.internal>
2024-03-06 15:11:34 -08:00
Vittorio Rigamonti
51f3902bc4
community[minor]: Adding support for Infinispan as VectorStore (#17861)
**Description:**
This integrates Infinispan as a vectorstore.
Infinispan is an open-source key-value data grid, it can work as single
node as well as distributed.

Vector search is supported since release 15.x 

For more: [Infinispan Home](https://infinispan.org)

Integration tests are provided as well as a demo notebook
2024-03-06 15:11:02 -08:00
Max Jakob
cca0167917
elasticsearch[patch], community[patch]: update references, deprecate community classes (#18506)
Follow up on https://github.com/langchain-ai/langchain/pull/17467.

- Update all references to the Elasticsearch classes to use the partners
package.
- Deprecate community classes.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-06 15:09:12 -08:00
Djordje
12b4a4d860
community[patch]: Opensearch delete method added - indexing supported (#18522)
- **Description:** Added delete method for OpenSearchVectorSearch,
therefore indexing supported
    - **Issue:** No
    - **Dependencies:** No
    - **Twitter handle:** stkbmf
2024-03-06 15:08:47 -08:00
Aaron Yi
c092db862e
community[patch]: make metadata and text optional as expected in DocArray (#18678)
ValidationError: 2 validation errors for DocArrayDoc
text
Field required [type=missing, input_value={'embedding': [-0.0191128...9, 0.01005221541175212]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.5/v/missing
metadata
Field required [type=missing, input_value={'embedding': [-0.0191128...9, 0.01005221541175212]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.5/v/missing
```
In the `_get_doc_cls` method, the `DocArrayDoc` class is defined as
follows:

```python
class DocArrayDoc(BaseDoc):
    text: Optional[str]
    embedding: Optional[NdArray] = Field(**embeddings_params)
    metadata: Optional[dict]
```
2024-03-06 16:51:41 -05:00
Eugene Yurtsev
4c25b49229
community[major]: breaking change in some APIs to force users to opt-in for pickling (#18696)
This is a PR that adds a dangerous load parameter to force users to opt in to use pickle.

This is a PR that's meant to raise user awareness that the pickling module is involved.
2024-03-06 16:43:01 -05:00
Hech
6a08134661
community[patch], langchain[minor]: Add retriever self_query and score_threshold in DingoDB (#18106) 2024-03-05 15:47:29 -08:00
Tomaz Bratanic
353248838d
Add precedence for input params over env variables in neo4j integration (#18581)
input parameters take precedence over env variables
2024-03-05 09:36:56 -08:00
Leonid Kuligin
04d134df17
marked MatchingEngine as deprecated (#18585)
Thank you for contributing to LangChain!

- [ ] **PR title**: "community: deprecate vectorstores.MatchingEngine"


- [ ] **PR message**: 
- **Description:** announced a deprecation since this integration has
been moved to langchain_google_vertexai
2024-03-05 09:34:53 -08:00
Aayush Kataria
7c2f3f6f95
community[minor]: Adding Azure Cosmos Mongo vCore Vector DB Cache (#16856)
Description:

This pull request introduces several enhancements for Azure Cosmos
Vector DB, primarily focused on improving caching and search
capabilities using Azure Cosmos MongoDB vCore Vector DB. Here's a
summary of the changes:

- **AzureCosmosDBSemanticCache**: Added a new cache implementation
called AzureCosmosDBSemanticCache, which utilizes Azure Cosmos MongoDB
vCore Vector DB for efficient caching of semantic data. Added
comprehensive test cases for AzureCosmosDBSemanticCache to ensure its
correctness and robustness. These tests cover various scenarios and edge
cases to validate the cache's behavior.
- **HNSW Vector Search**: Added HNSW vector search functionality in the
CosmosDB Vector Search module. This enhancement enables more efficient
and accurate vector searches by utilizing the HNSW (Hierarchical
Navigable Small World) algorithm. Added corresponding test cases to
validate the HNSW vector search functionality in both
AzureCosmosDBSemanticCache and AzureCosmosDBVectorSearch. These tests
ensure the correctness and performance of the HNSW search algorithm.
- **LLM Caching Notebook** - The notebook now includes a comprehensive
example showcasing the usage of the AzureCosmosDBSemanticCache. This
example highlights how the cache can be employed to efficiently store
and retrieve semantic data. Additionally, the example provides default
values for all parameters used within the AzureCosmosDBSemanticCache,
ensuring clarity and ease of understanding for users who are new to the
cache implementation.
 
 @hwchase17,@baskaryan, @eyurtsev,
2024-03-03 14:04:15 -08:00
Sourav Pradhan
50abeb7ed9
community[patch]: fix Chroma add_images (#17964)
###  Description

Fixed a small bug in chroma.py add_images(), previously whenever we are
not passing metadata the documents is containing the base64 of the uris
passed, but when we are passing the metadata the documents is containing
normal string uris which should not be the case.

### Issue

In add_images() method when we are calling upsert() we have to use
"b64_texts" instead of normal string "uris".

### Twitter handle

https://twitter.com/whitepegasus01
2024-03-01 21:55:58 +00:00
sarahberenji
08fa38d56d
community[patch]: the syntax error for Redis generated query (#17717)
To fix the reported error:
https://github.com/langchain-ai/langchain/discussions/17397
2024-03-01 12:18:10 -08:00
certified-dodo
43e3244573
community[patch]: Fix MongoDBAtlasVectorSearch max_marginal_relevance_search (#17971)
Description:
* `self._embedding_key` is accessed after deletion, breaking
`max_marginal_relevance_search` search
* Introduced in:
e135e5257c
* Updated but still persists in:
ce22e10c4b

Issue: https://github.com/langchain-ai/langchain/issues/17963

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-01 12:17:42 -08:00
mwmajewsk
e192f6b6eb
community[patch]: fix, better error message in deeplake vectoriser (#18397)
If the document loader recieves Pathlib path instead of str, it reads
the file correctly, but the problem begins when the document is added to
Deeplake.
This problem arises from casting the path to str in the metadata.

```python
deeplake = True
fname = Path('./lorem_ipsum.txt')
loader = TextLoader(fname, encoding="utf-8")
docs = loader.load_and_split()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks= text_splitter.split_documents(docs)
if deeplake:
    db = DeepLake(dataset_path=ds_path, embedding=embeddings, token=activeloop_token)
    db.add_documents(chunks)
else:
    db = Chroma.from_documents(docs, embeddings)
```

So using this snippet of code the error message for deeplake looks like
this:

```
[part of error message omitted]

Traceback (most recent call last):
  File "/home/mwm/repositories/sources/fixing_langchain/main.py", line 53, in <module>
    db.add_documents(chunks)
  File "/home/mwm/repositories/sources/langchain/libs/core/langchain_core/vectorstores.py", line 139, in add_documents
    return self.add_texts(texts, metadatas, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/deeplake.py", line 258, in add_texts
    return self.vectorstore.add(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/deeplake_vectorstore.py", line 226, in add
    return self.dataset_handler.add(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/dataset_handlers/client_side_dataset_handler.py", line 139, in add
    dataset_utils.extend_or_ingest_dataset(
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/vector_search/dataset/dataset.py", line 544, in extend_or_ingest_dataset
    extend(
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/vector_search/dataset/dataset.py", line 505, in extend
    dataset.extend(batched_processed_tensors, progressbar=False)
  File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/dataset/dataset.py", line 3247, in extend
    raise SampleExtendError(str(e)) from e.__cause__
deeplake.util.exceptions.SampleExtendError: Failed to append a sample to the tensor 'metadata'. See more details in the traceback. If you wish to skip the samples that cause errors, please specify `ignore_errors=True`.
```

Which is does not explain the error well enough.
The same error for chroma looks like this 

```
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mwm/repositories/sources/fixing_langchain/main.py", line 56, in <module>
    db = Chroma.from_documents(docs, embeddings)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 778, in from_documents
    return cls.from_texts(
           ^^^^^^^^^^^^^^^
  File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 736, in from_texts
    chroma_collection.add_texts(
  File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 309, in add_texts
    raise ValueError(e.args[0] + "\n\n" + msg)
ValueError: Expected metadata value to be a str, int, float or bool, got lorem_ipsum.txt which is a <class 'pathlib.PosixPath'>

Try filtering complex metadata from the document using langchain_community.vectorstores.utils.filter_complex_metadata.
```

Which is way more user friendly, so I just added information about
possible mismatch of the type in the error message, the same way it is
covered in chroma
https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/vectorstores/chroma.py#L224
2024-03-01 11:21:21 -08:00
Erick Friis
eefb49680f
multiple[patch]: fix deprecation versions (#18349) 2024-02-29 16:58:33 -08:00
Jib
72bfc1d3db
mongodb[minor]: MongoDB Partner Package -- Porting MongoDBAtlasVectorSearch (#17652)
This PR migrates the existing MongoDBAtlasVectorSearch abstraction from
the `langchain_community` section to the partners package section of the
codebase.
- [x] Run the partner package script as advised in the partner-packages
documentation.
- [x] Add Unit Tests
- [x] Migrate Integration Tests
- [x] Refactor `MongoDBAtlasVectorStore` (autogenerated) to
`MongoDBAtlasVectorSearch`
- [x] ~Remove~ deprecate the old `langchain_community` VectorStore
references.

## Additional Callouts
- Implemented the `delete` method
- Included any missing async function implementations
  - `amax_marginal_relevance_search_by_vector`
  - `adelete` 
- Added new Unit Tests that test for functionality of
`MongoDBVectorSearch` methods
- Removed [`del
res[self._embedding_key]`](e0c81e1cb0/libs/community/langchain_community/vectorstores/mongodb_atlas.py (L218))
in `_similarity_search_with_score` function as it would make the
`maximal_marginal_relevance` function fail otherwise. The `Document`
needs to store the embedding key in metadata to work.

Checklist:

- [x] PR title: Please title your PR "package: description", where
"package" is whichever of langchain, community, core, experimental, etc.
is being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
  - Example: "community: add foobar LLM"
- [x] PR message
- [x] Pass lint and test: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified to check that you're
passing lint and testing. See contribution guidelines for more
information on how to write/run tests, lint, etc:
https://python.langchain.com/docs/contributing/
- [x] Add tests and docs: If you're adding a new integration, please
include
1. Existing tests supplied in docs/docs do not change. Updated
docstrings for new functions like `delete`
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory. (This already exists)

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Steven Silvester <steven.silvester@ieee.org>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-29 23:09:48 +00:00
Tomaz Bratanic
5999c4a240
Add support for parameters in neo4j retrieval query (#18310)
Sometimes, you want to use various parameters in the retrieval query of
Neo4j Vector to personalize/customize results. Before, when there were
only predefined chains, it didn't really make sense. Now that it's all
about custom chains and LCEL, it is worth adding since users can inject
any params they wish at query time. Isn't prone to SQL injection-type
attacks since we use parameters and not concatenating strings.
2024-02-29 13:00:54 -08:00
Christophe Bornet
8a81fcd5d3
community: Fix deprecation version of AstraDB VectorStore (#17991) 2024-02-28 17:15:09 -05:00
Ashley Xu
e3211c2b3d
community[patch]: BigQueryVectorSearch JSON type unsupported for metadatas (#18234) 2024-02-28 08:19:53 -08:00
Jaskirat Singh
ce682f5a09
community: vectorstores.kdbai - Added support for when no docs are present (#18103)
- **Description:** By default it expects a list but that's not the case
in corner scenarios when there is no document ingested(use case:
Bootstrap application).
\
Hence added as check, if the instance is panda Dataframe instead of list
then it will procced with return immediately.

- **Issue:** NA
- **Dependencies:** NA
- **Twitter handle:**  jaskiratsingh1

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-02-26 12:47:06 -08:00
am-kinetica
9b8f6455b1
Langchain vectorstore integration with Kinetica (#18102)
- **Description:** New vectorstore integration with the Kinetica
database
  - **Issue:** 
- **Dependencies:** the Kinetica Python API `pip install
gpudb==7.2.0.1`,
  - **Tag maintainer:** @baskaryan, @hwchase17 
  - **Twitter handle:**

---------

Co-authored-by: Chad Juliano <cjuliano@kinetica.com>
2024-02-26 12:46:48 -08:00
GoodBai
3589a135ef
community: make SET allow_experimental_[engine]_index configurabe in vectorstores.clickhouse (#18107)
## Description & Issue
While following the official doc to use clickhouse as a vectorstore, I
found only the default `annoy` index is properly supported. But I want
to try another engine `usearch` for `annoy` is not properly supported on
ARM platforms.
Here is the settings I prefer:

``` python
settings = ClickhouseSettings(
    table="wiki_Ethereum",
    index_type="usearch",  # annoy by default
    index_param=[],
)
```
The above settings do not work for the command `set
allow_experimental_annoy_index=1` is hard-coded.
This PR will make sure the experimental feature follow the `index_type`
which is also consistent with Clickhouse's naming conventions.
2024-02-26 12:39:17 -08:00
Bagatur
9b0b0032c2
community[patch]: fix lint (#17984) 2024-02-22 15:15:27 -08:00
kartikTAI
9cf6661dc5
community: use NeuralDB object to initialize NeuralDBVectorStore (#17272)
**Description:** This PR adds an `__init__` method to the
NeuralDBVectorStore class, which takes in a NeuralDB object to
instantiate the state of NeuralDBVectorStore.
**Issue:** N/A
**Dependencies:** N/A
**Twitter handle:** N/A
2024-02-22 12:05:01 -05:00
Erick Friis
a53370a060
pinecone[patch], docs: PineconeVectorStore, release 0.0.3 (#17896) 2024-02-22 08:24:08 -08:00
Hasan
7248e98b9e
community[patch]: Return PK in similarity search Document (#17561)
Issue: #17390

Co-authored-by: hasan <hasan@m2sys.com>
2024-02-21 17:03:50 -08:00
ehude
9e54c227f1
community[patch]: Bug Neo4j VectorStore when having multiple indexes the sort is not working and the store that returned is random (#17396)
Bug fix: when having multiple indexes the sort is not working and the
store that returned is random.
The following small fix resolves the issue.
2024-02-21 16:33:33 -08:00
Christophe Bornet
5019951a5d
docs: AstraDB VectorStore docstring (#17834) 2024-02-21 16:16:31 -08:00
Rohit Gupta
3acd0c74fc
community[patch]: added SCANN index in default search params (#17889)
This will enable users to add data in same collection for index type
SCANN for milvus
2024-02-21 15:47:47 -08:00
Karim Assi
afc1ba0329
community[patch]: add possibility to search by vector in OpenSearchVectorSearch (#17878)
- **Description:** implements the missing `similarity_search_by_vector`
function for `OpenSearchVectorSearch`
- **Issue:** N/A
- **Dependencies:** N/A
2024-02-21 15:44:55 -08:00
Nathan Voxland (Activeloop)
9ece134d45
docs: Improved deeplake.py init documentation (#17549)
**Description:** 
Updated documentation for DeepLake init method.

Especially the exec_option docs needed improvement, but did a general
cleanup while I was looking at it.

**Issue:** n/a
**Dependencies:** None

---------

Co-authored-by: Nathan Voxland <nathan@voxland.net>
2024-02-21 15:33:00 -08:00
volodymyr-memsql
0a9a519a39
community[patch]: Added add_images method to SingleStoreDB vector store (#17871)
In this pull request, we introduce the add_images method to the
SingleStoreDB vector store class, expanding its capabilities to handle
multi-modal embeddings seamlessly. This method facilitates the
incorporation of image data into the vector store by associating each
image's URI with corresponding document content, metadata, and either
pre-generated embeddings or embeddings computed using the embed_image
method of the provided embedding object.

the change includes integration tests, validating the behavior of the
add_images. Additionally, we provide a notebook showcasing the usage of
this new method.

---------

Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
2024-02-21 15:16:32 -08:00
nbyrneKX
c1bb5fd498
community[patch]: typo in doc-string for kdbai vectorstore (#17811)
community[patch]: typo in doc-string for kdbai vectorstore (#17811)
2024-02-21 10:35:11 -05:00
Karim Lalani
ea61302f71
community[patch]: bug fix - add empty metadata when metadata not provided (#17669)
Code fix to include empty medata dictionary to aadd_texts if metadata is
not provided.
2024-02-19 10:54:52 -08:00
Raghav Dixit
6c18f73ca5
community[patch]: LanceDB integration improvements/fixes (#16173)
Hi, I'm from the LanceDB team.

Improves LanceDB integration by making it easier to use - now you aren't
required to create tables manually and pass them in the constructor,
although that is still backward compatible.

Bug fix - pandas was being used even though it's not a dependency for
LanceDB or langchain

PS - this issue was raised a few months ago but lost traction. It is a
feature improvement for our users kindly review this , Thanks !
2024-02-19 10:22:02 -08:00
Guangdong Liu
73edf17b4e
community[minor]: Add Apache Doris as vector store (#17527)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-18 12:05:58 -07:00
Christophe Bornet
19ebc7418e
community: Use _AstraDBCollectionEnvironment in AstraDB VectorStore (community) (#17635)
Another PR will be done for the langchain-astradb package.

Note: for future PRs, devs will be done in the partner package only. This one is just to align with the rest of the components in the community package and it fixes a bunch of issues.
2024-02-16 11:28:16 -05:00
Krista Pratico
bf8e3c6dd1
community[patch]: add fixes for AzureSearch after update to stable azure-search-documents library (#17599)
- **Description:** Addresses the bugs described in linked issue where an
import was erroneously removed and the rename of a keyword argument was
missed when migrating from beta --> stable of the azure-search-documents
package
- **Issue:** https://github.com/langchain-ai/langchain/issues/17598
- **Dependencies:** N/A
- **Twitter handle:** N/A
2024-02-15 22:23:52 -08:00
morgana
9d7ca7df6e
community[patch]: update copy of metadata in rockset vectorstore integration (#17612)
- **Description:** This fixes an issue with working with RecordManager.
RecordManager was generating new hashes on documents because `add_texts`
was modifying the metadata directly. Additionally moved some tests to
unit tests since that was a more appropriate home.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** `@_morgan_adams_`
2024-02-15 23:13:40 -07:00
Stefano Lottini
5240ecab99
astradb: bootstrapping Astra DB as Partner Package (#16875)
**Description:** This PR introduces a new "Astra DB" Partner Package.

So far only the vector store class is _duplicated_ there, all others
following once this is validated and established.

Along with the move to separate package, incidentally, the class name
will change `AstraDB` => `AstraDBVectorStore`.

The strategy has been to duplicate the module (with prospected removal
from community at LangChain 0.2). Until then, the code will be kept in
sync with minimal, known differences (there is a makefile target to
automate drift control. Out of convenience with this check, the
community package has a class `AstraDBVectorStore` aliased to `AstraDB`
at the end of the module).

With this PR several bugfixes and improvement come to the vector store,
as well as a reshuffling of the doc pages/notebooks (Astra and
Cassandra) to align with the move to a separate package.

**Dependencies:** A brand new pyproject.toml in the new package, no
changes otherwise.

**Twitter handle:** `@rsprrs`

---------

Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-15 15:50:59 -08:00
volodymyr-memsql
e36bc379f2
community[patch]: Add vector index support to SingleStoreDB VectorStore (#17308)
This pull request introduces support for various Approximate Nearest
Neighbor (ANN) vector index algorithms in the VectorStore class,
starting from version 8.5 of SingleStore DB. Leveraging this enhancement
enables users to harness the power of vector indexing, significantly
boosting search speed, particularly when handling large sets of vectors.

---------

Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-14 11:43:12 -08:00
Ashley Xu
f746a73e26
Add the BQ job usage tracking from LangChain (#17123)
- **Description:**
Add the BQ job usage tracking from LangChain

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-13 14:47:57 -08:00
Max Jakob
ab3d944667
community[patch]: ElasticsearchStore: preserve user headers (#16830)
Users can provide an Elasticsearch connection with custom headers. This
PR makes sure these headers are preserved when adding the langchain user
agent header.
2024-02-13 12:37:35 -08:00
Kapil Sachdeva
cd00a87db7
community[patch] - in FAISS vector store, support passing custom DocStore implementation when using from_xxx methods (#16801)
- **Description:** The from__xx methods of FAISS class have hardcoded
InMemoryStore implementation and thereby not let users pass a custom
DocStore implementation,
  - **Issue:** no referenced issue,
  - **Dependencies:** none,
  - **Twitter handle:** ksachdeva
2024-02-12 19:51:55 -08:00
morgana
722aae4fd1
community: add delete method to rocksetdb vectorstore to support recordmanager (#17030)
- **Description:** This adds a delete method so that rocksetdb can be
used with `RecordManager`.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** `@_morgan_adams_`

---------

Co-authored-by: Rockset API Bot <admin@rockset.io>
2024-02-12 19:50:20 -08:00
Lingzhen Chen
30af711c34
community[patch]: update AzureSearch class to work with azure-search-documents=11.4.0 (#15659)
- **Description:** Updates
`libs/community/langchain_community/vectorstores/azuresearch.py` to
support the stable version `azure-search-documents=11.4.0`
- **Issue:** https://github.com/langchain-ai/langchain/issues/14534,
https://github.com/langchain-ai/langchain/issues/15039,
https://github.com/langchain-ai/langchain/issues/15355
  - **Dependencies:** azure-search-documents>=11.4.0

---------

Co-authored-by: Clément Tamines <Skar0@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-12 19:23:35 -08:00
Spencer Kelly
54fa78c887
community[patch]: fixed vector similarity filtering (#16967)
**Description:** changed filtering so that failed filter doesn't add
document to results. Currently filtering is entirely broken and all
documents are returned whether or not they pass the filter.

fixes issue introduced in
https://github.com/langchain-ai/langchain/pull/16190
2024-02-12 14:52:57 -08:00
david-tempelmann
93da18b667
community[minor]: Add mmr and similarity_score_threshold retrieval to DatabricksVectorSearch (#16829)
- **Description:** This PR adds support for `search_types="mmr"` and
`search_type="similarity_score_threshold"` to retrievers using
`DatabricksVectorSearch`,
  - **Issue:** 
  - **Dependencies:**
  - **Twitter handle:**

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-12 12:51:37 -08:00
Erick Friis
3a2eb6e12b
infra: add print rule to ruff (#16221)
Added noqa for existing prints. Can slowly remove / will prevent more
being intro'd
2024-02-09 16:13:30 -08:00
Jael Gu
c07c0da01a
community[patch]: Fix Milvus add texts when ids=None (#17021)
- **Description:** Fix Milvus add texts when ids=None (auto_id=True)

Signed-off-by: Jael Gu <mengjia.gu@zilliz.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-09 18:48:37 -05:00
Quang Hoa
54c1fb3f25
community[patch]: Make some functions work with Milvus (#10695)
**Description**
Make some functions work with Milvus:
1. get_ids: Get primary keys by field in the metadata
2. delete: Delete one or more entities by ids
3. upsert: Update/Insert one or more entities

**Issue**
None
**Dependencies**
None
**Tag maintainer:**
@hwchase17 
**Twitter handle:**
None

---------

Co-authored-by: HoaNQ9 <hoanq.1811@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-09 15:21:31 -08:00
Leonid Ganeline
932c52c333
community[patch]: docstrings (#16810)
- added missed docstrings
- formated docstrings to the consistent form
2024-02-09 12:48:57 -08:00
Kononov Pavel
15bc201967
langchain_community: Fix typo bug (#17324)
Problem from #17095

This error wasn't in the v1.4.0
2024-02-09 11:27:33 -05:00
Bagatur
02ef9164b5
langchain[patch]: expose cohere rerank score, add parent doc param (#16887) 2024-02-08 16:07:18 -08:00
cjpark-data
ce22e10c4b
community[patch]: Fix KeyError 'embedding' (MongoDBAtlasVectorSearch) (#17178)
- **Description:**
Embedding field name was hard-coded named "embedding".
So I suggest that change `res["embedding"]` into
`res[self._embedding_key]`.
  - **Issue:** #17177,
- **Twitter handle:**
[@bagcheoljun17](https://twitter.com/bagcheoljun17)
2024-02-08 12:06:42 -08:00
ByeongUk Choi
b88329e9a5
community[patch]: Implement Unique ID Enforcement in FAISS (#17244)
**Description:**
Implemented unique ID validation in the FAISS component to ensure all
document IDs are distinct. This update resolves issues related to
non-unique IDs, such as inconsistent behavior during deletion processes.
2024-02-08 12:03:33 -08:00
Bagatur
af74301ab9
core[patch], community[patch]: link extraction continue on failure (#17200) 2024-02-07 14:15:30 -08:00
Erick Friis
6ffd5b15bc
pinecone: init pkg (#16556)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-02-05 11:55:01 -08:00
Harrison Chase
4eda647fdd
infra: add -p to mkdir in lint steps (#17013)
Previously, if this did not find a mypy cache then it wouldnt run

this makes it always run

adding mypy ignore comments with existing uncaught issues to unblock other prs

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-02-05 11:22:06 -08:00
Killinsun - Ryota Takeuchi
bcfce146d8
community[patch]: Correct the calling to collection_name in qdrant (#16920)
## Description

In #16608, the calling `collection_name` was wrong.
I made a fix for it. 
Sorry for the inconvenience!

## Issue

https://github.com/langchain-ai/langchain/issues/16962

## Dependencies

N/A



<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Kumar Shivendu <kshivendu1@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-02-04 10:45:35 -08:00
Christophe Bornet
744070ee85
Add async methods for the AstraDB VectorStore (#16391)
- **Description**: fully async versions are available for astrapy 0.7+.
For older astrapy versions or if the user provides a sync client without
an async one, the async methods will call the sync ones wrapped in
`run_in_executor`
  - **Twitter handle:** cbornet_
2024-01-29 20:22:25 -08:00
thiswillbeyourgithub
1d082359ee
community: add support for callable filters in FAISS (#16190)
- **Description:**
Filtering in a FAISS vectorstores is very inflexible and doesn't allow
that many use case. I think supporting callable like this enables a lot:
regular expressions, condition on multiple keys etc. **Note** I had to
manually alter a test. I don't understand if it was falty to begin with
or if there is something funky going on.
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None

Signed-off-by: thiswillbeyourgithub <26625900+thiswillbeyourgithub@users.noreply.github.com>
2024-01-29 20:05:56 -08:00
Killinsun - Ryota Takeuchi
52f4ad8216
community: Add new fields in metadata for qdrant vector store (#16608)
## Description

The PR is to return the ID and collection name from qdrant client to
metadata field in `Document` class.

## Issue

The motivation is almost same to
[11592](https://github.com/langchain-ai/langchain/issues/11592)

Returning ID is useful to update existing records in a vector store, but
we cannot know them if we use some retrievers.

In order to avoid any conflicts, breaking changes, the new fields in
metadata have a prefix `_`

## Dependencies

N/A

## Twitter handle

@kill_in_sun

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-01-29 19:59:54 -08:00
Harrison Chase
8457c31c04
community[patch]: activeloop ai tql deprecation (#14634)
Co-authored-by: AdkSarsen <adilkhan@activeloop.ai>
2024-01-29 12:43:54 -08:00
Jael Gu
a1aa3a657c
community[patch]: Milvus supports add & delete texts by ids (#16256)
# Description

To support [langchain
indexing](https://python.langchain.com/docs/modules/data_connection/indexing)
as requested by users, vectorstore Milvus needs to support:
- document addition by id (`add_documents` method with `ids` argument)
- delete by id (`delete` method with `ids` argument)

Example usage:

```python
from langchain.indexes import SQLRecordManager, index
from langchain.schema import Document
from langchain_community.vectorstores import Milvus
from langchain_openai import OpenAIEmbeddings

collection_name = "test_index"
embedding = OpenAIEmbeddings()
vectorstore = Milvus(embedding_function=embedding, collection_name=collection_name)

namespace = f"milvus/{collection_name}"
record_manager = SQLRecordManager(
    namespace, db_url="sqlite:///record_manager_cache.sql"
)
record_manager.create_schema()

doc1 = Document(page_content="kitty", metadata={"source": "kitty.txt"})
doc2 = Document(page_content="doggy", metadata={"source": "doggy.txt"})

index(
    [doc1, doc1, doc2],
    record_manager,
    vectorstore,
    cleanup="incremental",  # None, "incremental", or "full"
    source_id_key="source",
)
```

# Fix issues

Fix https://github.com/milvus-io/milvus/issues/30112

---------

Signed-off-by: Jael Gu <mengjia.gu@zilliz.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-01-29 11:19:50 -08:00
Michard Hugo
e9d3527b79
community[patch]: Add missing async similarity_distance_threshold handling in RedisVectorStoreRetriever (#16359)
Add missing async similarity_distance_threshold handling in
RedisVectorStoreRetriever

- **Description:** added method `_aget_relevant_documents` to
`RedisVectorStoreRetriever` that overrides parent method to add support
of `similarity_distance_threshold` in async mode (as for sync mode)
  - **Issue:** #16099
  - **Dependencies:** N/A
  - **Twitter handle:** N/A
2024-01-29 11:19:30 -08:00
Benito Geordie
f3fdc5c5da
community: Added integrations for ThirdAI's NeuralDB with Retriever and VectorStore frameworks (#15280)
**Description:** Adds ThirdAI NeuralDB retriever and vectorstore
integration. NeuralDB is a CPU-friendly and fine-tunable text retrieval
engine.
2024-01-29 08:35:42 -08:00
Pashva Mehta
22d90800c8
community: Fixed schema discrepancy in from_texts function for weaviate vectorstore (#16693)
* Description: Fixed schema discrepancy in **from_texts** function for
weaviate vectorstore which created a redundant property "key" inside a
class.
* Issue: Fixed: https://github.com/langchain-ai/langchain/issues/16692
* Twitter handle: @pashvamehta1
2024-01-28 16:53:31 -08:00
Rashedul Hasan Rijul
481493dbce
community[patch]: apply embedding functions during query if defined (#16646)
**Description:** This update ensures that the user-defined embedding
function specified during vector store creation is applied during
queries. Previously, even if a custom embedding function was defined at
the time of store creation, Bagel DB would default to using the standard
embedding function during query execution. This pull request addresses
this issue by consistently using the user-defined embedding function for
queries if one has been specified earlier.
2024-01-27 16:46:33 -08:00
Martin Kolb
04651f0248
community[minor]: VectorStore integration for SAP HANA Cloud Vector Engine (#16514)
- **Description:**
This PR adds a VectorStore integration for SAP HANA Cloud Vector Engine,
which is an upcoming feature in the SAP HANA Cloud database
(https://blogs.sap.com/2023/11/02/sap-hana-clouds-vector-engine-announcement/).

  - **Issue:** N/A
- **Dependencies:** [SAP HANA Python
Client](https://pypi.org/project/hdbcli/)
  - **Twitter handle:** @sapopensource

Implementation of the integration:
`libs/community/langchain_community/vectorstores/hanavector.py`

Unit tests:
`libs/community/tests/unit_tests/vectorstores/test_hanavector.py`

Integration tests:
`libs/community/tests/integration_tests/vectorstores/test_hanavector.py`

Example notebook:
`docs/docs/integrations/vectorstores/hanavector.ipynb`

Access credentials for execution of the integration tests can be
provided to the maintainers.

---------

Co-authored-by: sascha <sascha.stoll@sap.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-01-24 14:05:07 -08:00
bu2kx
ff3163297b
community[minor]: Add KDBAI vector store (#12797)
Addition of KDBAI vector store (https://kdb.ai).

Dependencies: `kdbai_client` v0.1.2 Python package.

Sample notebook: `docs/docs/integrations/vectorstores/kdbai.ipynb`

Tag maintainer: @bu2kx
Twitter handle: @kxsystems
2024-01-23 18:37:01 -08:00
Noah Stapp
e135e5257c
community[patch]: Include scores in MongoDB Atlas QA chain results (#14666)
Adds the ability to return similarity scores when using
`RetrievalQA.from_chain_type` with `MongoDBAtlasVectorSearch`. Requires
that `return_source_documents=True` is set.

Example use:

```
vector_search = MongoDBAtlasVectorSearch.from_documents(...)

qa = RetrievalQA.from_chain_type(
	llm=OpenAI(), 
	chain_type="stuff", 
	retriever=vector_search.as_retriever(search_kwargs={"additional": ["similarity_score"]}),
	return_source_documents=True
)

...

docs = qa({"query": "..."})

docs["source_documents"][0].metadata["score"] # score will be here
```

I've tested this feature locally, using a MongoDB Atlas Cluster with a
vector search index.
2024-01-23 18:18:28 -08:00
Frank995
5694728816
community[patch]: Implement vector length definition at init time in PGVector for indexing (#16133)
Replace this entire comment with:
- **Description:** allow user to define tVector length in PGVector when
creating the embedding store, this allows for later indexing
  - **Issue:** #16132
  - **Dependencies:** None
2024-01-22 14:32:44 -08:00
s-g-1
fbe592a5ce
community[patch]: fix typo in pgvecto_rs debug msg (#16318)
fixes typo in pip install message for the pgvecto_rs community vector
store
no issues found mentioning this
no dependents changed
2024-01-22 14:01:33 -08:00
Max Jakob
8569b8f680
community[patch]: ElasticsearchStore enable max inner product (#16393)
Enable max inner product for approximate retrieval strategy. For exact
strategy we lack the necessary `maxInnerProduct` function in the
Painless scripting language, this is why we do not add it there.

Similarity docs:
https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#dense-vector-params

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Joe McElroy <joseph.mcelroy@elastic.co>
2024-01-22 11:26:18 -08:00
Max Jakob
de209af533
community[patch]: ElasticsearchStore: add relevance function selector (#16378)
Implement similarity function selector for ElasticsearchStore. The
scores coming back from Elasticsearch are already similarities (not
distances) and they are already normalized (see
[docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#dense-vector-params)).
Hence we leave the scores untouched and just forward them.

This fixes #11539.

However, in hybrid mode (when keyword search and vector search are
involved) Elasticsearch currently returns no scores. This PR adds an
error message around this fact. We need to think a bit more to come up
with a solution for this case.

This PR also corrects a small error in the Elasticsearch integration
test.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-01-22 11:52:20 -07:00
Ofer Mendelevitch
ffae98d371
template: Update Vectara templates (#15363)
fixed multi-query template for Vectara
added self-query template for Vectara

Also added prompt_name parameter to summarization

CC @efriis 
 **Twitter handle:** @ofermend
2024-01-19 17:32:33 -08:00
Andreas Motl
3613d8a2ad
community[patch]: Use SQLAlchemy's bulk_save_objects method to improve insert performance (#16244)
- **Description:** Improve [pgvector vector store
adapter](https://github.com/langchain-ai/langchain/blob/v0.1.1/libs/community/langchain_community/vectorstores/pgvector.py)
to save embeddings in batches, to improve its performance.
  - **Issue:** NA
  - **Dependencies:** NA
  - **References:** https://github.com/crate-workbench/langchain/pull/1


Hi again from the CrateDB team,

following up on GH-16243, this is another minor patch to the pgvector
vector store adapter. Inserting embeddings in batches, using
[SQLAlchemy's
`bulk_save_objects`](https://docs.sqlalchemy.org/en/20/orm/session_api.html#sqlalchemy.orm.Session.bulk_save_objects)
method, can deliver substantial performance gains.

With kind regards,
Andreas.

NB: As I am seeing just now that this method is a legacy feature of SA
2.0, it will need to be reworked on a future iteration. However, it is
not deprecated yet, and I haven't been able to come up with a different
implementation, yet.
2024-01-18 18:35:39 -08:00
Christophe Bornet
fb940d11df
community[patch]: Use newer MetadataVectorCassandraTable in Cassandra vector store (#15987)
as VectorTable is deprecated

Tested manually with `test_cassandra.py` vector store integration test.
2024-01-17 10:37:07 -08:00
Felix Krones
d91126fc64
community[patch]: missing unpack operator for or_clause in pgvector document filter (#16148)
- Fix for #16146 
- Adding unpack operation to "or" and "and" filter for pgvector
retriever. #
2024-01-17 09:10:43 -08:00
James Briggs
ca288d8f2c
community[patch]: add vector param to index query for pinecone vec store (#16054) 2024-01-16 06:12:19 -08:00
Antonio Morales
476fb328ee
community[patch]: implement adelete from VectorStore in Qdrant (#16005)
**Description:**
Implement `adelete` function from `VectorStore` in `Qdrant` to support
other asynchronous flows such as async indexing (`aindex`) which
requires `adelete` to be implemented. Since `Qdrant` can be passed an
async qdrant client, this can be supported easily.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-01-15 19:57:09 -08:00
高远
061e63eef2
community[minor]: add vikingdb vecstore (#15155)
---------

Co-authored-by: gaoyuan <gaoyuan.20001218@bytedance.com>
2024-01-15 12:34:01 -08:00