langchain

Commit Graph

Author	SHA1	Message	Date
volodymyr-memsql	e36bc379f2	community[patch]: Add vector index support to SingleStoreDB VectorStore (#17308 ) This pull request introduces support for various Approximate Nearest Neighbor (ANN) vector index algorithms in the VectorStore class, starting from version 8.5 of SingleStore DB. Leveraging this enhancement enables users to harness the power of vector indexing, significantly boosting search speed, particularly when handling large sets of vectors. --------- Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	5 months ago
Ashley Xu	f746a73e26	Add the BQ job usage tracking from LangChain (#17123 ) - Description: Add the BQ job usage tracking from LangChain --------- Co-authored-by: Erick Friis <erick@langchain.dev>	5 months ago
Max Jakob	ab3d944667	community[patch]: ElasticsearchStore: preserve user headers (#16830 ) Users can provide an Elasticsearch connection with custom headers. This PR makes sure these headers are preserved when adding the langchain user agent header.	5 months ago
Kapil Sachdeva	cd00a87db7	community[patch] - in FAISS vector store, support passing custom DocStore implementation when using from_xxx methods (#16801 ) - Description: The from_xxx methods of the FAISS class hardcode the InMemoryStore implementation and thereby do not let users pass a custom DocStore implementation, - Issue: no referenced issue, - Dependencies: none, - Twitter handle: ksachdeva	5 months ago
morgana	722aae4fd1	community: add delete method to rocksetdb vectorstore to support recordmanager (#17030 ) - Description: This adds a delete method so that rocksetdb can be used with `RecordManager`. - Issue: N/A - Dependencies: N/A - Twitter handle: `@_morgan_adams_` --------- Co-authored-by: Rockset API Bot <admin@rockset.io>	5 months ago
Lingzhen Chen	30af711c34	community[patch]: update AzureSearch class to work with azure-search-documents=11.4.0 (#15659 ) - Description: Updates `libs/community/langchain_community/vectorstores/azuresearch.py` to support the stable version `azure-search-documents=11.4.0` - Issue: https://github.com/langchain-ai/langchain/issues/14534, https://github.com/langchain-ai/langchain/issues/15039, https://github.com/langchain-ai/langchain/issues/15355 - Dependencies: azure-search-documents>=11.4.0 --------- Co-authored-by: Clément Tamines <Skar0@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	5 months ago
Spencer Kelly	54fa78c887	community[patch]: fixed vector similarity filtering (#16967 ) Description: changed filtering so that failed filter doesn't add document to results. Currently filtering is entirely broken and all documents are returned whether or not they pass the filter. fixes issue introduced in https://github.com/langchain-ai/langchain/pull/16190	5 months ago
david-tempelmann	93da18b667	community[minor]: Add mmr and similarity_score_threshold retrieval to DatabricksVectorSearch (#16829 ) - Description: This PR adds support for `search_type="mmr"` and `search_type="similarity_score_threshold"` to retrievers using `DatabricksVectorSearch` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	5 months ago
Erick Friis	3a2eb6e12b	infra: add print rule to ruff (#16221 ) Added noqa for existing prints. Can slowly remove / will prevent more being intro'd	5 months ago
Jael Gu	c07c0da01a	community[patch]: Fix Milvus add texts when ids=None (#17021 ) - Description: Fix Milvus add texts when ids=None (auto_id=True) Signed-off-by: Jael Gu <mengjia.gu@zilliz.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	5 months ago
Quang Hoa	54c1fb3f25	community[patch]: Make some functions work with Milvus (#10695 ) Description Make some functions work with Milvus: 1. get_ids: Get primary keys by field in the metadata 2. delete: Delete one or more entities by ids 3. upsert: Update/Insert one or more entities Issue None Dependencies None Tag maintainer: @hwchase17 Twitter handle: None --------- Co-authored-by: HoaNQ9 <hoanq.1811@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	5 months ago
Leonid Ganeline	932c52c333	community[patch]: docstrings (#16810 ) - added missing docstrings - formatted docstrings to a consistent form	5 months ago
Kononov Pavel	15bc201967	langchain_community: Fix typo bug (#17324 ) Problem from #17095 This error wasn't in the v1.4.0	5 months ago
Bagatur	02ef9164b5	langchain[patch]: expose cohere rerank score, add parent doc param (#16887 )	5 months ago
cjpark-data	ce22e10c4b	community[patch]: Fix KeyError 'embedding' (MongoDBAtlasVectorSearch) (#17178 ) - Description: Embedding field name was hard-coded named "embedding". So I suggest that change `res["embedding"]` into `res[self._embedding_key]`. - Issue: #17177, - Twitter handle: [@bagcheoljun17](https://twitter.com/bagcheoljun17)	5 months ago
ByeongUk Choi	b88329e9a5	community[patch]: Implement Unique ID Enforcement in FAISS (#17244 ) Description: Implemented unique ID validation in the FAISS component to ensure all document IDs are distinct. This update resolves issues related to non-unique IDs, such as inconsistent behavior during deletion processes.	5 months ago
Bagatur	af74301ab9	core[patch], community[patch]: link extraction continue on failure (#17200 )	5 months ago
Erick Friis	6ffd5b15bc	pinecone: init pkg (#16556 )	5 months ago
Harrison Chase	4eda647fdd	infra: add -p to mkdir in lint steps (#17013 ) Previously, if this did not find a mypy cache, it wouldn't run. This makes it always run. Adding mypy ignore comments for existing uncaught issues to unblock other PRs. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	5 months ago
Killinsun - Ryota Takeuchi	bcfce146d8	community[patch]: Correct the calling to collection_name in qdrant (#16920 ) ## Description In #16608, the calling `collection_name` was wrong. I made a fix for it. Sorry for the inconvenience! ## Issue https://github.com/langchain-ai/langchain/issues/16962 ## Dependencies N/A --------- Co-authored-by: Kumar Shivendu <kshivendu1@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	5 months ago
Christophe Bornet	744070ee85	Add async methods for the AstraDB VectorStore (#16391 ) - Description: fully async versions are available for astrapy 0.7+. For older astrapy versions or if the user provides a sync client without an async one, the async methods will call the sync ones wrapped in `run_in_executor` - Twitter handle: cbornet_	5 months ago
thiswillbeyourgithub	1d082359ee	community: add support for callable filters in FAISS (#16190 ) - Description: Filtering in a FAISS vectorstore is very inflexible and doesn't allow many use cases. I think supporting callables like this enables a lot: regular expressions, conditions on multiple keys, etc. Note I had to manually alter a test. I don't understand if it was faulty to begin with or if there is something funky going on. - Issue: None - Dependencies: None - Twitter handle: None Signed-off-by: thiswillbeyourgithub <26625900+thiswillbeyourgithub@users.noreply.github.com>	5 months ago
Killinsun - Ryota Takeuchi	52f4ad8216	community: Add new fields in metadata for qdrant vector store (#16608 ) ## Description The PR is to return the ID and collection name from qdrant client to metadata field in `Document` class. ## Issue The motivation is almost same to [11592](https://github.com/langchain-ai/langchain/issues/11592) Returning ID is useful to update existing records in a vector store, but we cannot know them if we use some retrievers. In order to avoid any conflicts, breaking changes, the new fields in metadata have a prefix `_` ## Dependencies N/A ## Twitter handle @kill_in_sun	5 months ago
Harrison Chase	8457c31c04	community[patch]: activeloop ai tql deprecation (#14634 ) Co-authored-by: AdkSarsen <adilkhan@activeloop.ai>	5 months ago
Jael Gu	a1aa3a657c	community[patch]: Milvus supports add & delete texts by ids (#16256 ) # Description To support [langchain indexing](https://python.langchain.com/docs/modules/data_connection/indexing) as requested by users, vectorstore Milvus needs to support: - document addition by id (`add_documents` method with `ids` argument) - delete by id (`delete` method with `ids` argument) Example usage: ```python from langchain.indexes import SQLRecordManager, index from langchain.schema import Document from langchain_community.vectorstores import Milvus from langchain_openai import OpenAIEmbeddings collection_name = "test_index" embedding = OpenAIEmbeddings() vectorstore = Milvus(embedding_function=embedding, collection_name=collection_name) namespace = f"milvus/{collection_name}" record_manager = SQLRecordManager( namespace, db_url="sqlite:///record_manager_cache.sql" ) record_manager.create_schema() doc1 = Document(page_content="kitty", metadata={"source": "kitty.txt"}) doc2 = Document(page_content="doggy", metadata={"source": "doggy.txt"}) index( [doc1, doc1, doc2], record_manager, vectorstore, cleanup="incremental", # None, "incremental", or "full" source_id_key="source", ) ``` # Fix issues Fix https://github.com/milvus-io/milvus/issues/30112 --------- Signed-off-by: Jael Gu <mengjia.gu@zilliz.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	5 months ago
Michard Hugo	e9d3527b79	community[patch]: Add missing async similarity_distance_threshold handling in RedisVectorStoreRetriever (#16359 ) Add missing async similarity_distance_threshold handling in RedisVectorStoreRetriever - Description: added method `_aget_relevant_documents` to `RedisVectorStoreRetriever` that overrides parent method to add support of `similarity_distance_threshold` in async mode (as for sync mode) - Issue: #16099 - Dependencies: N/A - Twitter handle: N/A	5 months ago
Benito Geordie	f3fdc5c5da	community: Added integrations for ThirdAI's NeuralDB with Retriever and VectorStore frameworks (#15280 ) Description: Adds ThirdAI NeuralDB retriever and vectorstore integration. NeuralDB is a CPU-friendly and fine-tunable text retrieval engine.	5 months ago
Pashva Mehta	22d90800c8	community: Fixed schema discrepancy in from_texts function for weaviate vectorstore (#16693 ) * Description: Fixed schema discrepancy in from_texts function for weaviate vectorstore which created a redundant property "key" inside a class. * Issue: Fixed: https://github.com/langchain-ai/langchain/issues/16692 * Twitter handle: @pashvamehta1	5 months ago
Rashedul Hasan Rijul	481493dbce	community[patch]: apply embedding functions during query if defined (#16646 ) Description: This update ensures that the user-defined embedding function specified during vector store creation is applied during queries. Previously, even if a custom embedding function was defined at the time of store creation, Bagel DB would default to using the standard embedding function during query execution. This pull request addresses this issue by consistently using the user-defined embedding function for queries if one has been specified earlier.	5 months ago
Martin Kolb	04651f0248	community[minor]: VectorStore integration for SAP HANA Cloud Vector Engine (#16514 ) - Description: This PR adds a VectorStore integration for SAP HANA Cloud Vector Engine, which is an upcoming feature in the SAP HANA Cloud database (https://blogs.sap.com/2023/11/02/sap-hana-clouds-vector-engine-announcement/). - Issue: N/A - Dependencies: [SAP HANA Python Client](https://pypi.org/project/hdbcli/) - Twitter handle: @sapopensource Implementation of the integration: `libs/community/langchain_community/vectorstores/hanavector.py` Unit tests: `libs/community/tests/unit_tests/vectorstores/test_hanavector.py` Integration tests: `libs/community/tests/integration_tests/vectorstores/test_hanavector.py` Example notebook: `docs/docs/integrations/vectorstores/hanavector.ipynb` Access credentials for execution of the integration tests can be provided to the maintainers. --------- Co-authored-by: sascha <sascha.stoll@sap.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	5 months ago
bu2kx	ff3163297b	community[minor]: Add KDBAI vector store (#12797 ) Addition of KDBAI vector store (https://kdb.ai). Dependencies: `kdbai_client` v0.1.2 Python package. Sample notebook: `docs/docs/integrations/vectorstores/kdbai.ipynb` Tag maintainer: @bu2kx Twitter handle: @kxsystems	5 months ago
Noah Stapp	e135e5257c	community[patch]: Include scores in MongoDB Atlas QA chain results (#14666 ) Adds the ability to return similarity scores when using `RetrievalQA.from_chain_type` with `MongoDBAtlasVectorSearch`. Requires that `return_source_documents=True` is set. Example use: ``` vector_search = MongoDBAtlasVectorSearch.from_documents(...) qa = RetrievalQA.from_chain_type( llm=OpenAI(), chain_type="stuff", retriever=vector_search.as_retriever(search_kwargs={"additional": ["similarity_score"]}), return_source_documents=True ) ... docs = qa({"query": "..."}) docs["source_documents"][0].metadata["score"] # score will be here ``` I've tested this feature locally, using a MongoDB Atlas Cluster with a vector search index.	5 months ago
Frank995	5694728816	community[patch]: Implement vector length definition at init time in PGVector for indexing (#16133 ) - Description: allow user to define vector length in PGVector when creating the embedding store, this allows for later indexing - Issue: #16132 - Dependencies: None	5 months ago
s-g-1	fbe592a5ce	community[patch]: fix typo in pgvecto_rs debug msg (#16318 ) fixes typo in pip install message for the pgvecto_rs community vector store no issues found mentioning this no dependents changed	5 months ago
Max Jakob	8569b8f680	community[patch]: ElasticsearchStore enable max inner product (#16393 ) Enable max inner product for approximate retrieval strategy. For exact strategy we lack the necessary `maxInnerProduct` function in the Painless scripting language, this is why we do not add it there. Similarity docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#dense-vector-params --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Joe McElroy <joseph.mcelroy@elastic.co>	5 months ago
Max Jakob	de209af533	community[patch]: ElasticsearchStore: add relevance function selector (#16378 ) Implement similarity function selector for ElasticsearchStore. The scores coming back from Elasticsearch are already similarities (not distances) and they are already normalized (see [docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#dense-vector-params)). Hence we leave the scores untouched and just forward them. This fixes #11539. However, in hybrid mode (when keyword search and vector search are involved) Elasticsearch currently returns no scores. This PR adds an error message around this fact. We need to think a bit more to come up with a solution for this case. This PR also corrects a small error in the Elasticsearch integration test. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	5 months ago
Ofer Mendelevitch	ffae98d371	template: Update Vectara templates (#15363 ) fixed multi-query template for Vectara added self-query template for Vectara Also added prompt_name parameter to summarization CC @efriis Twitter handle: @ofermend	5 months ago
Andreas Motl	3613d8a2ad	community[patch]: Use SQLAlchemy's `bulk_save_objects` method to improve insert performance (#16244 ) - Description: Improve [pgvector vector store adapter](https://github.com/langchain-ai/langchain/blob/v0.1.1/libs/community/langchain_community/vectorstores/pgvector.py) to save embeddings in batches, to improve its performance. - Issue: NA - Dependencies: NA - References: https://github.com/crate-workbench/langchain/pull/1 Hi again from the CrateDB team, following up on GH-16243, this is another minor patch to the pgvector vector store adapter. Inserting embeddings in batches, using [SQLAlchemy's `bulk_save_objects`](https://docs.sqlalchemy.org/en/20/orm/session_api.html#sqlalchemy.orm.Session.bulk_save_objects) method, can deliver substantial performance gains. With kind regards, Andreas. NB: As I am seeing just now that this method is a legacy feature of SA 2.0, it will need to be reworked on a future iteration. However, it is not deprecated yet, and I haven't been able to come up with a different implementation, yet.	6 months ago
Christophe Bornet	fb940d11df	community[patch]: Use newer MetadataVectorCassandraTable in Cassandra vector store (#15987 ) as VectorTable is deprecated Tested manually with `test_cassandra.py` vector store integration test.	6 months ago
Felix Krones	d91126fc64	community[patch]: missing unpack operator for or_clause in pgvector document filter (#16148 ) - Fix for #16146 - Adding unpack operation to "or" and "and" filter for pgvector retriever.	6 months ago
James Briggs	ca288d8f2c	community[patch]: add vector param to index query for pinecone vec store (#16054 )	6 months ago
Antonio Morales	476fb328ee	community[patch]: implement adelete from VectorStore in Qdrant (#16005 ) Description: Implement `adelete` function from `VectorStore` in `Qdrant` to support other asynchronous flows such as async indexing (`aindex`) which requires `adelete` to be implemented. Since `Qdrant` can be passed an async qdrant client, this can be supported easily. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	6 months ago
高远	061e63eef2	community[minor]: add vikingdb vecstore (#15155 ) --------- Co-authored-by: gaoyuan <gaoyuan.20001218@bytedance.com>	6 months ago
盐粒 Yanli	ddf4e7c633	community[minor]: Update pgvecto_rs to use its high level sdk (#15574 ) - Description: Update pgvecto_rs to use its high level sdk, - Issue: fix #15173	6 months ago
YHW	ce21392a21	community: add a flag that determines whether to load the milvus collection (#15693 ) fix https://github.com/langchain-ai/langchain/issues/15694 --------- Co-authored-by: hyungwookyang <hyungwookyang@worksmobile.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	6 months ago
JaguarDB	b11fd3bedc	community[patch]: jaguar vector store fix integer-element error when joining metadata values (#15939 ) - Description: some document loaders add integer-type metadata values which cause error - Issue: 15937 - Dependencies: none --------- Co-authored-by: JY <jyjy@jaguardb>	6 months ago
Neo Zhao	21e0df937f	community[patch]: fix a bug that mistakenly handles the zip iterator in FAISS.from_embeddings (#16020 ) Description: `zip` returns an iterator that can be consumed only once, so the previous code caused `embeddings` to be an empty list. Issue: I could not find a related issue. Dependencies: this PR does not introduce or affect dependencies. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	6 months ago
Ashley Xu	ce7723c1e5	community[minor]: add additional support for `BigQueryVectorSearch` (#15904 ) BigQuery vector search lets you use GoogleSQL to do semantic search, using vector indexes for fast but approximate results, or using brute force for exact results. This PR: 1. Adds `metadata[_job_ib]` to the Document returned by any similarity search 2. Adds `explore_job_stats` to let users explore job statistics and improve debuggability 3. Sets the minimum row limit for running create vector index.	6 months ago
Karim Lalani	768e5e33bc	community[minor]: Fix to match SurrealDB 0.3.2 SDK (#15996 ) New version of SurrealDB python sdk was causing the integration to break. This fix addresses that change.	6 months ago
Varik Matevosyan	efe6cfafe2	community: Added Lantern as VectorStore (#12951 ) Support [Lantern](https://github.com/lanterndata/lantern) as a new VectorStore type. - Added Lantern as VectorStore. It will support 3 distance functions `l2 squared`, `cosine` and `hamming` and will use `HNSW` index. - Added tests - Added example notebook	6 months ago

96 Commits (70c296ae96972344a291389f02258309fa5387a1)