Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
Support for old clients (Thin and Thick) Oracle Vector Store
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
Support for old clients (Thin and Thick) Oracle Vector Store
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
Have our own local tests
---------
Co-authored-by: rohan.aggarwal@oracle.com <rohaagga@phoenix95642.dev3sub2phx.databasede3phx.oraclevcn.com>
This PR add supports for Azure Cosmos DB for NoSQL vector store.
Summary:
Description: added vector store integration for Azure Cosmos DB for
NoSQL Vector Store,
Dependencies: azure-cosmos dependency,
Tag maintainer: @hwchase17, @baskaryan @efriis @eyurtsev
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** As pointed out in this issue #22770, DocumentDB
`similarity_search` does not support filtering through metadata which
this PR adds by passing in the parameter `filter`. Also this PR fixes a
minor Documentation error.
- **Issue:** #22770
---------
Co-authored-by: Erick Friis <erickfriis@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- [ ] **Miscellaneous updates and fixes**:
- **Description:** Handled error in querying; quotes in table names;
updated gpudb API
- **Issue:** Threw an error with an error message difficult to
understand if a query failed or returned no records
- **Dependencies:** Updated GPUDB API version to `7.2.0.9`
@baskaryan @hwchase17
They cause `poetry lock` to take a ton of time, and `uv pip install` can
resolve the constraints from these toml files in trivial time
(addressing problem with #19153)
This allows us to properly upgrade lockfile dependencies moving forward,
which revealed some issues that were either fixed or type-ignored (see
file comments)
This PR adds a constructor `metadata_indexing` parameter to the
Cassandra vector store to allow optional fine-tuning of which fields of
the metadata are to be indexed.
This is a feature supported by the underlying CassIO library. Indexing
mode of "all", "none" or deny- and allow-list based choices are
available.
The rationale is, in some cases it's advisable to programmatically
exclude some portions of the metadata from the index if one knows in
advance they won't ever be used at search-time. this keeps the index
more lightweight and performant and avoids limitations on the length of
_indexed_ strings.
I added a integration test of the feature. I also added the possibility
of running the integration test with Cassandra on an arbitrary IP
address (e.g. Dockerized), via
`CASSANDRA_CONTACT_POINTS=10.1.1.5,10.1.1.6 poetry run pytest [...]` or
similar.
While I was at it, I added a line to the `.gitignore` since the mypy
_test_ cache was not ignored yet.
My X (Twitter) handle: @rsprrs.
- **Description:** The InMemoryVectorStore is a nice and simple vector
store implementation for quick development and debugging. The current
implementation is quite limited in its functionalities. This PR extends
the functionalities by adding utility function to persist the vector
store to a json file and to load it from a json file. We choose the json
file format because it allows inspection of the database contents in a
text editor, which is great for debugging. Furthermore, it adds a
`filter` keyword that can be used to filter out documents on their
`page_content` or `metadata`.
- **Issue:** -
- **Dependencies:** -
- **Twitter handle:** @Vincent_Min
- [ ] **community**: "vectorstore: added filtering support for LanceDB
vector store"
- [ ] **This PR adds filtering capabilities to LanceDB**:
- **Description:** In LanceDB filtering can be applied when searching
for data into the vectorstore. It is using the SQL language as mentioned
in the LanceDB documentation.
- **Issue:** #18235
- **Dependencies:** No
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Thank you for contributing to LangChain!
**Description:** update to the Vectara / Langchain integration to
integrate new Vectara capabilities:
- Full RAG implemented as a Runnable with as_rag()
- Vectara chat supported with as_chat()
- Both support streaming response
- Updated documentation and example notebook to reflect all the changes
- Updated Vectara templates
**Twitter handle:** ofermend
**Add tests and docs**: no new tests or docs, but updated both existing
tests and existing docs
The Vectorstore's API `as_retriever` doesn't expose explicitly the
parameters `search_type` and `search_kwargs` and so these are not well
documented.
This PR improves `as_retriever` for the Cassandra VectorStore by making
these parameters explicit.
NB: An alternative would have been to modify `as_retriever` in
`Vectorstore`. But there's probably a good reason these were not exposed
in the first place ? Is it because implementations may decide to not
support them and have fixed values when creating the
VectorStoreRetriever ?
This PR introduces namespace support for Upstash Vector Store, which
would allow users to partition their data in the vector index.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description**
Fix AzureSearch delete documents method by using FIELDS_ID variable
instead of the hard coded "id" value
**Issue:**
This is linked to this issue:
https://github.com/langchain-ai/langchain/issues/22314
Co-authored-by: dseban <dan.seban@neoxia.com>
### Issue: #22299
### descriptions
The documentation appears to be wrong. When the user actually sets this
parameter "asynchronous" to be True, it fails because the __init__
function of FAISS class doesn't allow this parameter. In fact, most of
the class/instance functions of this class have both the sync/async
version, so it looks like what we need is just to remove this parameter
from the doc.
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Lifu Wu <lifu@nextbillion.ai>
Thank you for contributing to LangChain!
- [x] **PR title**: community: Add Zep Cloud components + docs +
examples
- [x] **PR message**:
We have recently released our new zep-cloud sdks that are compatible
with Zep Cloud (not Zep Open Source). We have also maintained our Cloud
version of langchain components (ChatMessageHistory, VectorStore) as
part of our sdks. This PRs goal is to port these components to langchain
community repo, and close the gap with the existing Zep Open Source
components already present in community repo (added
ZepCloudMemory,ZepCloudVectorStore,ZepCloudRetriever).
Also added a ZepCloudChatMessageHistory components together with an
expression language example ported from our repo. We have left the
original open source components intact on purpose as to not introduce
any breaking changes.
- **Issue:** -
- **Dependencies:** Added optional dependency of our new cloud sdk
`zep-cloud`
- **Twitter handle:** @paulpaliychuk51
- [x] **Add tests and docs**
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
3 fixes of DuckDB vector store:
- unify defaults in constructor and from_texts (users no longer have to
specify `vector_key`).
- include search similarity into output metadata (fixes#20969)
- significantly improve performance of `from_documents`
Dependencies: added Pandas to speed up `from_documents`.
I was thinking about CSV and JSON options, but I expect trouble loading
JSON values this way and also CSV and JSON options require storing data
to disk.
Anyway, the poetry file for langchain-community already contains a
dependency on Pandas.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
- **Description:** this PR gives clickhouse client the ability to use a
secure connection to the clickhosue server
- **Issue:** fixes#22082
- **Dependencies:** -
- **Twitter handle:** `_codingcoffee_`
Signed-off-by: Ameya Shenoy <shenoy.ameya@gmail.com>
Co-authored-by: Shresth Rana <shresth@grapevine.in>
This PR contains 4 added functions:
- max_marginal_relevance_search_by_vector
- amax_marginal_relevance_search_by_vector
- max_marginal_relevance_search
- amax_marginal_relevance_search
I'm no langchain expert, but tried do inspect other vectorstore sources
like chroma, to build these functions for SurrealDB. If someone has some
changes for me, please let me know. Otherwise I would be happy, if these
changes are added to the repository, so that I can use the orignal repo
and not my local monkey patched version.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Fixed `AzureSearchVectorStoreRetriever` to account
for search_kwargs. More explanation is in the mentioned issue.
- **Issue:** #21492
---------
Co-authored-by: MAC <mac@MACs-MacBook-Pro.local>
Co-authored-by: Massimiliano Pronesti <massimiliano.pronesti@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Updates Meilisearch vectorstore for compatibility
with v1.8. Adds [”showRankingScore”:
true”](https://www.meilisearch.com/docs/reference/api/search#ranking-score)
in the search parameters and replaces `_semanticScore` field with `
_rankingScore`
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description:**
- Extend AzureSearch with `maximal_marginal_relevance` (for vector and
hybrid search)
- Add construction `from_embeddings` - if the user has already embedded
the texts
- Add `add_embeddings`
- Refactor common parts (`_simple_search`, `_results_to_documents`,
`_reorder_results_with_maximal_marginal_relevance`)
- Add `vector_search_dimensions` as a parameter to the constructor to
avoid extra calls to `embed_query` (most of the time the user applies
the same model and knows the dimension)
**Issue:** none
**Dependencies:** none
- [x] **Add tests and docs**: The docstrings have been added to the new
functions, and unified for the existing ones. The example notebook is
great in illustrating the main usage of AzureSearch, adding the new
methods would only dilute the main content.
- [x] **Lint and test**
---------
Co-authored-by: Oleksii Pokotylo <oleksii.pokotylo@pwc.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Backwards compatible extension of the initialisation
interface of HanaDB to allow the user to specify
specific_metadata_columns that are used for metadata storage of selected
keys which yields increased filter performance. Any not-mentioned
metadata remains in the general metadata column as part of a JSON
string. Furthermore switched to executemany for batch inserts into
HanaDB.
**Issue:** N/A
**Dependencies:** no new dependencies added
**Twitter handle:** @sapopensource
---------
Co-authored-by: Martin Kolb <martin.kolb@sap.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Please let me know if you see any possible areas of improvement. I would
very much appreciate your constructive criticism if time allows.
**Description:**
- Added a aerospike vector store integration that utilizes
[Aerospike-Vector-Search](https://aerospike.com/products/vector-database-search-llm/)
add-on.
- Added both unit tests and integration tests
- Added a docker compose file for spinning up a test environment
- Added a notebook
**Dependencies:** any dependencies required for this change
- aerospike-vector-search
**Twitter handle:**
- No twitter, you can use my GitHub handle or LinkedIn if you'd like
Thanks!
---------
Co-authored-by: Jesse Schumacher <jschumacher@aerospike.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "community: enable SupabaseVectorStore to support
extended table fields"
- [x] **PR message**:
- Added extension fields to the function _add_vectors so that users can
add other custom fields when insert a record into the database. eg:
![image](https://github.com/langchain-ai/langchain/assets/10885578/e1d5ca20-936e-4cab-ba69-8fdd23b8ce8f)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
## Description
This PR introduces the new `langchain-qdrant` partner package, intending
to deprecate the community package.
## Changes
- Moved the Qdrant vector store implementation `/libs/partners/qdrant`
with integration tests.
- The conditional imports of the client library are now regular with
minor implementation improvements.
- Added a deprecation warning to
`langchain_community.vectorstores.qdrant.Qdrant`.
- Replaced references/imports from `langchain_community` with either
`langchain_core` or by moving the definitions to the `langchain_qdrant`
package itself.
- Updated the Qdrant vector store documentation to reflect the changes.
## Testing
- `QDRANT_URL` and
[`QDRANT_API_KEY`](583e36bf6b)
env values need to be set to [run integration
tests](d608c93d1f)
in the [cloud](https://cloud.qdrant.tech).
- If a Qdrant instance is running at `http://localhost:6333`, the
integration tests will use it too.
- By default, tests use an
[`in-memory`](https://github.com/qdrant/qdrant-client?tab=readme-ov-file#local-mode)
instance(Not comprehensive).
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Erick Friis <erickfriis@gmail.com>
Description: Adds NeuralDBClientVectorStore to the langchain, which is
our enterprise client.
---------
Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com>
Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
If Session and/or keyspace are not provided, they are resolved from
cassio's context. So they are not required.
This change is fully backward compatible.
Thank you for contributing to LangChain!
- Oracle AI Vector Search
Oracle AI Vector Search is designed for Artificial Intelligence (AI)
workloads that allows you to query data based on semantics, rather than
keywords. One of the biggest benefit of Oracle AI Vector Search is that
semantic search on unstructured data can be combined with relational
search on business data in one single system. This is not only powerful
but also significantly more effective because you don't need to add a
specialized vector database, eliminating the pain of data fragmentation
between multiple systems.
- Oracle AI Vector Search is designed for Artificial Intelligence (AI)
workloads that allows you to query data based on semantics, rather than
keywords. One of the biggest benefit of Oracle AI Vector Search is that
semantic search on unstructured data can be combined with relational
search on business data in one single system. This is not only powerful
but also significantly more effective because you don't need to add a
specialized vector database, eliminating the pain of data fragmentation
between multiple systems.
This Pull Requests Adds the following functionalities
Oracle AI Vector Search : Vector Store
Oracle AI Vector Search : Document Loader
Oracle AI Vector Search : Document Splitter
Oracle AI Vector Search : Summary
Oracle AI Vector Search : Oracle Embeddings
- We have added unit tests and have our own local unit test suite which
verifies all the code is correct. We have made sure to add guides for
each of the components and one end to end guide that shows how the
entire thing runs.
- We have made sure that make format and make lint run clean.
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: skmishraoracle <shailendra.mishra@oracle.com>
Co-authored-by: hroyofc <harichandan.roy@oracle.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description**
This pull request updates the Bagel Network package name from
"betabageldb" to "bagelML" to align with the latest changes made by the
Bagel Network team.
The following modifications have been made:
- Updated all references to the old package name ("betabageldb") with
the new package name ("bagelML") throughout the codebase.
- Modified the documentation, and any relevant scripts to reflect the
package name change.
- Tested the changes to ensure that the functionality remains intact and
no breaking changes were introduced.
By merging this pull request, our project will stay up to date with the
latest Bagel Network package naming convention, ensuring compatibility
and smooth integration with their updated library.
Please review the changes and provide any feedback or suggestions. Thank
you!