langchain

Commit Graph

Author	SHA1	Message	Date
Jakub Pawłowski	b0b1a67771	community[patch]: Skip unexpected 404 HTTP Error in Arxiv download (#21042 ) ### Description: When attempting to download PDF files from arXiv, an unexpected 404 error frequently occurs. This error halts the operation, regardless of whether there are additional documents to process. As a solution, I suggest implementing a mechanism to ignore and communicate this error and continue processing the next document from the list. Proposed Solution: To address the issue of unexpected 404 errors during PDF downloads from arXiv, I propose implementing the following solution: - Error Handling: Implement error handling mechanisms to catch and handle 404 errors gracefully. - Communication: Inform the user or logging system about the occurrence of the 404 error. - Continued Processing: After encountering a 404 error, continue processing the remaining documents from the list without interruption. This solution ensures that the application can handle unexpected errors without terminating the entire operation. It promotes resilience and robustness in the face of intermittent issues encountered during PDF downloads from arXiv. ### Issue: #20909 ### Dependencies: none --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	1 month ago
Erick Friis	b9c53e95b7	community: release 0.0.35 (#21104 )	1 month ago
Eugene Yurtsev	3c064a757f	core[minor],langchain[patch],community[patch]: Move storage interfaces to core (#20750 ) * Move storage interface to core * Move in memory and file system implementation to core	1 month ago
Charlie Marsh	8f38b7a725	multiple: Remove unnecessary Ruff suppression comments (#21050 ) ## Summary I ran `ruff check --extend-select RUF100 -n` to identify `# noqa` comments that weren't having any effect in Ruff, and then `ruff check --extend-select RUF100 -n --fix` on select files to remove all of the unnecessary `# noqa: F401` violations. It's possible that these were needed at some point in the past, but they're not necessary in Ruff v0.1.15 (used by LangChain) or in the latest release. Co-authored-by: Erick Friis <erick@langchain.dev>	1 month ago
Christophe Bornet	5c77f45b06	community[minor]: Add async methods to CassandraCache and CassandraSemanticCache (#20654 )	1 month ago
Kuro Denjiro	fa4124b821	community[minor]: add mintbase loader to langchain (#20089 ) - [x] Add Near NFT loader: "community: Load NFT near block chain using mintbase graph API" - [x] PR message: - Description: a description of the change - Twitter handle:Kurodenjiro --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 month ago
Alexander Dicke	d7e12750df	community[patch]: allows using `text-generation-inference` /generate route with `HuggingFaceEndpoint` (#20100) - Description: allows using the /generate route of `text-generation-inference` with the `HuggingFaceEndpoint`	1 month ago
davidkgp	28b0b0d863	community[patch]: Fix for github issue #17690 (#20117) - Fix Google Lens knowledge graph issue: "langchain: community" - Description: handled the existence of keys in the JSON response of Google Lens - Issue: [No "knowledge_graph" property in Google Lens API call from SerpAPI](https://github.com/langchain-ai/langchain/issues/17690) Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	1 month ago
高远	a7a4630bf4	community[patch]: Modify the text field type and add new exception handling (#20116 ) Co-authored-by: gaoyuan <gaoyuan.20001218@bytedance.com>	1 month ago
Rahul Triptahi	c172611647	community[patch]: Add classifier_url argument in PebbloSafeLoader and documentation update. (#21030 ) Description: Add classifier_url argument in PebbloSafeLoader. Documentation: Updated PebbloSafeLoader documentation with above change and new links for pebblo github pages. --------- Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	1 month ago
Leonid Ganeline	85094cbb3a	docs: community docstring updates (#21040 ) Added missed docstrings. Updated docstrings to consistent format.	1 month ago
Rodrigo Nogueira	90f19028e5	community[patch]: Add maritalk streaming (sync and async) (#19203 ) Co-authored-by: RosevalJr <rdmalajr@gmail.com> Co-authored-by: Roseval Donisete Malaquias Junior <roseval@maritaca.ai>	1 month ago
Cahid Arda Öz	cc6191cb90	community[minor]: Add support for Upstash Vector (#20824) ## Description Adding `UpstashVectorStore` to utilize [Upstash Vector](https://upstash.com/docs/vector/overall/getstarted)! #17012 was opened to add Upstash Vector to langchain but was closed to wait for filtering. Filtering has now been added to Upstash Vector, so we are opening a new PR. Additionally, an [embedding feature](https://upstash.com/docs/vector/features/embeddingmodels) was added, and we add this to our vectorstore as well. ## Dependencies [upstash-vector](https://pypi.org/project/upstash-vector/) should be installed to use `UpstashVectorStore`. Didn't update dependencies because of [this comment in the previous PR](https://github.com/langchain-ai/langchain/pull/17012#pullrequestreview-1876522450). ## Tests Tests are added and they pass. Tests are naturally network bound since Upstash Vector is offered through an API. There was [a discussion in the previous PR about mocking the unit tests](https://github.com/langchain-ai/langchain/pull/17012#pullrequestreview-1891820567). We haven't made changes to this end yet. We can update the tests if you can explain how the tests should be mocked. --------- Co-authored-by: ytkimirti <yusuftaha9@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 month ago
chyroc	3e241956d3	community[minor]: add coze chat model (#20770 ) add coze chat model, to call coze.com apis	1 month ago
Massimiliano Pronesti	ce89b34fc0	community[patch]: support hybrid search with threshold in Azure AI Search Retriever (#20907 ) Support hybrid search with a score threshold -- similar to what we do for similarity search.	1 month ago
Andrei Panferov	b3efa38cc0	community[patch]: GigaChat model selection fix (#20988 ) Fixed the error that the model name is never actually put into GigaChat request payload, always defaulting to `GigaChat-Lite`. With this fix, model selection through ```python import os from langchain.chat_models.gigachat import GigaChat chat = GigaChat( name="GigaChat-Pro", # <- HERE!!!!! ... ) ``` should actually work, as intended in [here](`804390ba4b/libs/community/langchain_community/llms/gigachat.py (L36)`). --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	1 month ago
Patrick McFadin	3331865f6b	community[minor]: add Cassandra Database Toolkit (#20246) Description: Toolkit and tools for accessing data in a Cassandra database, primarily for agent integration. Initially, this includes the following tools: - `cassandra_db_schema` Gathers all schema information for the connected database or a specific schema. Critical for the agent when determining actions. - `cassandra_db_select_table_data` Selects data from a specific keyspace and table. The agent can pass parameters for a predicate and limits on the number of returned records. - `cassandra_db_query` Experimental alternative to `cassandra_db_select_table_data` which takes a query string completely formed by the agent instead of parameters. May be removed in future versions. Includes unit test and two notebooks to demonstrate usage. Dependencies: cassio Twitter handle: @PatrickMcFadin --------- Co-authored-by: Phil Miesle <phil.miesle@datastax.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 month ago
Igor Brai	b3e74f2b98	community[minor]: add mojeek search util (#20922 ) Description: This pull request introduces a new feature to community tools, enhancing its search capabilities by integrating the Mojeek search engine Dependencies: None --------- Co-authored-by: Igor Brai <igor@mojeek.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: ccurme <chester.curme@gmail.com>	1 month ago
Tomaz Bratanic	67428c4052	community[patch]: Neo4j enhanced schema (#20983 ) Scan the database for example values and provide them to an LLM for better inference of Text2cypher	1 month ago
Pengcheng Liu	1fad39be1c	community[minor]: Add LarkSuite wiki document loader. (#21016) Description: Add LarkSuite wiki document loader. Refer to the [LarkSuite API document](https://open.feishu.cn/document/server-docs/docs/wiki-v2/space-node/list) for details. Issue: None Dependencies: None Twitter handle: None	1 month ago
Leonid Ganeline	dc7c06bc07	community[minor]: import fix (#20995) Issue: When a third-party package is not installed and the user needs to `pip install <package>`, an `ImportError` is usually raised. But sometimes a `ValueError` or `ModuleNotFoundError` is raised instead, which is inconsistent. Change: replaced the `ValueError` or `ModuleNotFoundError` with `ImportError` when we raise an error with the `pip install <package>` message. Note: Ideally, we would replace all `try: import... except... raise ...` blocks with helper functions like `import_aim`, or just use the existing [langchain_core.utils.utils.guard_import](https://api.python.langchain.com/en/latest/utils/langchain_core.utils.utils.guard_import.html#langchain_core.utils.utils.guard_import). But that would be a much bigger refactoring. @baskaryan Please advise on this.	1 month ago
WilliamEspegren	804390ba4b	community: Spider integration (#20937 ) Added the [Spider.cloud](https://spider.cloud) document loader. [Spider](https://github.com/spider-rs/spider) is the [fastest](https://github.com/spider-rs/spider/blob/main/benches/BENCHMARKS.md) and cheapest crawler that returns LLM-ready data. ``` - Description: Adds Spider data loader - Dependencies: spider-client - Twitter handle: @WilliamEspegren ``` --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: = <=> Co-authored-by: Chester Curme <chester.curme@gmail.com>	1 month ago
Chip Davis	e818c75f8a	infra: test directory loader multithreaded (#20281 ) This is a unit test for #20230 which was a fix for using multithreaded mode with directory loader @eyurtsev	1 month ago
Guilherme Zanotelli	f931a9ce60	community[patch]: Pass kwargs to SPARQLStore from RdfGraph (#20385) This introduces `store_kwargs` which behaves similarly to `graph_kwargs` on the `RdfGraph` object, which will enable users to pass `headers` and other arguments to the underlying `SPARQLStore` object. I have also made a [PR in `rdflib` to support passing `default_graph`](https://github.com/RDFLib/rdflib/pull/2761). Example usage: ```python from langchain_community.graphs import RdfGraph graph = RdfGraph( query_endpoint="http://localhost/sparql", standard="rdf", store_kwargs=dict( default_graph="http://example.com/mygraph" ) ) ``` --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 month ago
Jorge Piedrahita Ortiz	40b2e2916b	community[minor]: Sambanova llm integration (#20955 ) - Description: Added [Sambanova systems](https://sambanova.ai/) integration, including sambaverse and sambastudio LLMs - Dependencies: sseclient-py (optional) --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 month ago
Rahul Triptahi	955cf186d2	community[patch]: Ingest source, owner and full_path if present in Document's metadata. (#20949 ) Description: The PebbloSafeLoader should first check for owner, full_path and size in metadata before implementing its own logic. Dependencies: None Documentation: NA. Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	1 month ago
Amine Djeghri	790ea75cf7	community[minor]: add exllamav2 library for GPTQ & EXL2 models (#17817 ) Added 3 files : - Library : ExLlamaV2 - Test integration - Notebook --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	1 month ago
Naveen Tatikonda	8bbdb4f6a0	community[patch]: Add OpenSearch as semantic cache (#20254 ) ### Description Use OpenSearch vector store as Semantic Cache. ### Twitter Handle @OpenSearchProj --------- Signed-off-by: Naveen Tatikonda <navtat@amazon.com> Co-authored-by: Harish Tatikonda <harishtatikonda@Harishs-MacBook-Air.local> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-31-155.ec2.internal> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 month ago
Mayank Solanki	8c085fc697	community[patch]: Added a function `from_existing_collection` in `Qdrant` vector database. (#20779) Issue: #20514 The current implementation of `construct_instance` expects a `texts: List[str]` that will call the embedding function. This might not be needed when we already have a client with a collection and `path` and don't want to add any text. This PR adds a class method that returns a Qdrant instance with an existing client. Every time `construct_instance` is called, this line sends some text for embedding generation: `cb6e5e56c2/libs/community/langchain_community/vectorstores/qdrant.py (L1592)`. --------- Co-authored-by: Anush <anushshetty90@gmail.com>	1 month ago
Leonid Kuligin	893a924b90	core[minor], community[patch], langchain[patch]: move BaseChatLoader to core (#19607) - PR title: "core: move BaseChatLoader and BaseToolkit from community" - PR message: move BaseChatLoader and BaseToolkit --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 month ago
Lei Zhang	9281841cfe	community[patch]: fix integrated test case test_recursive_url_loader.py assertions (issue-20919) (#20920) Description: Fix the integration test case test_recursive_url_loader.py. Local testing successful. Two runs of `poetry run pytest tests/integration_tests/document_loaders/test_recursive_url_loader.py` are recorded (platform darwin, Python 3.11.4, pytest-7.4.4, pluggy-1.4.0, 6 items collected). First run: `1 failed, 5 passed, 5 warnings in 172.97s (0:02:52)`; test_sync_recursive_url_loader failed at test_recursive_url_loader.py:38 with `AssertionError: assert 24 == 23` on `assert len(docs) == 23`, where the loader was `RecursiveUrlLoader(url, extractor=lambda _: "placeholder", use_async=False, max_depth=2)` with `url = "https://docs.python.org/3.9/"`. Second run: `6 passed, 5 warnings in 157.42s (0:02:37)`. In both runs the warnings were `XMLParsedAsHTMLWarning` raised from html/parser.py:170, and the slowest test was test_async_recursive_url_loader_deterministic (56.75s, then 46.99s). Issue: https://github.com/langchain-ai/langchain/issues/20919 Twitter handle: @coolbeevip	1 month ago
Matt	28df4750ef	community[patch]: Add initial tests for AzureSearch vector store (#17663 ) Description: AzureSearch vector store has no tests. This PR adds initial tests to validate the code can be imported and used. Issue: N/A Dependencies: azure-search-documents and azure-identity are added as optional dependencies for testing --------- Co-authored-by: Matt Gotteiner <[email protected]> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 month ago
Dristy Srivastava	5f1d1666e3	community[patch]: Add support for pebblo server and client version (#20269 ) Description: _PebbloSafeLoader_: Add support for pebblo server and client version Documentation: NA Unit test: NA Issue: NA Dependencies: None --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	1 month ago
am-kinetica	b54b19ba1c	community[minor]: Implemented Kinetica Document Loader and added notebooks (#20002 ) - [ ] Kinetica Document Loader: "community: a class to load Documents from Kinetica" - [ ] Kinetica Document Loader: - Description: implemented KineticaLoader in `kinetica_loader.py` - Dependencies: install the Kinetica API using `pip install gpudb==7.2.0.1 `	1 month ago
Shengsheng Huang	fd1061e7bf	community[patch]: add more data types support to ipex-llm llm integration (#20833) - Description: - add support for more data types: by default `IpexLLM` will load the model in int4 format. This PR adds support for more data types such as `sym_int5`, `sym_int8`, etc. Data formats like NF3, NF4, FP4 and FP8 are only supported on GPU and will be added in a future PR. - Fix a small issue in saving/loading, update api docs - Dependencies: `ipex-llm` library - Document: In `docs/docs/integrations/llms/ipex_llm.ipynb`, added instructions for saving/loading low-bit model. - Tests: added new test cases to `libs/community/tests/integration_tests/llms/test_ipex_llm.py`, added config params. - Contribution maintainer: @shane-huang	1 month ago
Rahul Triptahi	dc921f0823	community[patch]: Add semantic info to metadata, classified by pebblo-server. (#20468) Description: Add support for Semantic topics and entities. Classification done by pebblo-server is now used to enhance metadata of Documents loaded by document loaders. Dependencies: None Documentation: Updated. Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	1 month ago
Jingpan Xiong	1202017c56	community[minor]: Add relyt vector database (#20316 ) Co-authored-by: kaka <kaka@zbyte-inc.cloud> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: jingsi <jingsi@leadincloud.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	1 month ago
davidefantiniIntel	f386f71bb3	community: fix tqdm import (#20263 ) Description: Fix tqdm import in QuantizedBiEncoderEmbeddings	1 month ago
Andres Algaba	05ae8ca7d4	community[patch]: deprecate persist method in Chroma (#20855) - Description: Deprecate the persist method in Chroma, which no longer exists in Chroma 0.4.x - Issue: #20851 - Dependencies: None - Twitter handle: AndresAlgaba1 --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	1 month ago
ccurme	b8db73233c	core, community: deprecate tool.__call__ (#20900 ) Does not update docs.	1 month ago
Tomaz Bratanic	520972fd0f	community[patch]: Support passing graph object to Neo4j integrations (#20876) To allow driver connection reuse, we introduce passing the graph object to Neo4j integrations	1 month ago
Lei Zhang	748a6ae609	community[patch]: add HTTP response headers Content-Type to metadata of RecursiveUrlLoader document (#20875) Description: The RecursiveUrlLoader loader offers a link_regex parameter that can filter out URLs. However, this filtering capability is limited, and if the internal links of the website change, unexpected resources may be loaded. These resources, such as font files, can cause problems in subsequent embedding processing. > https://blog.langchain.dev/assets/fonts/source-sans-pro-v21-latin-ext_latin-regular.woff2?v=0312715cbf We can add the Content-Type from the HTTP response headers to the document metadata so developers can choose which resources to use. For example, the following may be a good choice for text knowledge: - text/plain - simple text file - text/html - HTML web page - text/xml - XML format file - text/json - JSON format data - application/pdf - PDF file - application/msword - Word document and ignore the following: - text/css - CSS stylesheet - text/javascript - JavaScript script - application/octet-stream - binary data - image/jpeg - JPEG image - image/png - PNG image - image/gif - GIF image - image/svg+xml - SVG image - audio/mpeg - MPEG audio files - video/mp4 - MP4 video file - application/font-woff - WOFF font file - application/font-ttf - TTF font file - application/zip - ZIP compressed file Twitter handle: @coolbeevip --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	1 month ago
Joan Fontanals	baefbfb14e	community[minor]: add Jina Reranker in retrievers module (#19406) - Description: Adapt JinaEmbeddings to run with the new Jina AI Rerank API - Twitter handle: https://twitter.com/JinaAI_ --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 month ago
Jason_Chen	53bb7dbd29	community[patch]: add BeautifulSoupTransformer remove_unwanted_classnames method (#20467 ) Add the remove_unwanted_classnames method to the BeautifulSoupTransformer class, which can filter more effectively. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 month ago
Bagatur	5b83130855	core[minor], langchain[patch], community[patch]: mv StructuredQuery (#20849 ) mv StructuredQuery to core	1 month ago
Mish Ushakov	6ccecf2363	community[minor]: added Browserbase loader (#20478 )	1 month ago
ccurme	481d3855dc	patch: remove usage of llm, chat model __call__ (#20788) - `llm(prompt)` -> `llm.invoke(prompt)` - `llm(prompt=prompt)` -> `llm.invoke(prompt)` (same with `messages=`) - `llm(prompt, callbacks=callbacks)` -> `llm.invoke(prompt, config={"callbacks": callbacks})` - `llm(prompt, kwargs)` -> `llm.invoke(prompt, kwargs)`	1 month ago
Raghav Dixit	9b7fb381a4	community[patch]: LanceDB integration patch update (#20686) Description: - added functionalities: delete, index creation, using an existing connection object, etc. - updated usage - added LanceDB cloud OSS support. `make lint_diff` and `make test` checks done.	1 month ago
volodymyr-memsql	493afe4d8d	community[patch]: add hybrid search to singlestoredb vectorstore (#20793 ) Implemented the ability to enable full-text search within the SingleStore vector store, offering users a versatile range of search strategies. This enhancement allows users to seamlessly combine full-text search with vector search, enabling the following search strategies: * Search solely by vector similarity. * Conduct searches exclusively based on text similarity, utilizing Lucene internally. * Filter search results by text similarity score, with the option to specify a threshold, followed by a search based on vector similarity. * Filter results by vector similarity score before conducting a search based on text similarity. * Perform searches using a weighted sum of vector and text similarity scores. Additionally, integration tests have been added to comprehensively cover all scenarios. Updated notebook with examples. CC: @baskaryan, @hwchase17 --------- Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	1 month ago
Tomaz Bratanic	9efab3ed66	community[patch]: Add driver config param for neo4j graph (#20772 ) Co-authored-by: Bagatur <baskaryan@gmail.com>	1 month ago
Leonid Ganeline	13751c3297	community: `tigergraph` fixes (#20034 ) - added guard on the `pyTigerGraph` import - added a missed example page in the `docs/integrations/graphs/` - formatted the `docs/integrations/providers/` page to the consistent format. Added links.	1 month ago
Martin Kolb	0186e4e633	community[patch]: Advanced filtering for HANA Cloud Vector Engine (#20821 ) - Description: This PR adds support for advanced filtering to the integration of HANA Vector Engine. The newly supported filtering operators are: $eq, $ne, $gt, $gte, $lt, $lte, $between, $in, $nin, $like, $and, $or - Issue: N/A - Dependencies: no new dependencies added Added integration tests to: `libs/community/tests/integration_tests/vectorstores/test_hanavector.py` Description of the new capabilities in notebook: `docs/docs/integrations/vectorstores/hanavector.ipynb`	1 month ago
Alex Sherstinsky	12e5ec6de3	community: Support both Predibase SDK-v1 and SDK-v2 in Predibase-LangChain integration (#20859 )	1 month ago
JeffKatzy	5ab3f9a995	community[patch]: standardize chat init args (#20844 ) community:perplexity[patch]: standardize init args. Updated pplx_api_key and request_timeout so that they are aliased to api_key and timeout, respectively. Added a test that both continue to set the same underlying attributes. Related to [20085](https://github.com/langchain-ai/langchain/issues/20085) --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	1 month ago
Massimiliano Pronesti	8d1167b32f	community[patch]: add support for similarity_score_threshold search in… (#20852 ) See https://github.com/langchain-ai/langchain/issues/20600#issuecomment-2075569338 for details. @chrislrobert	1 month ago
Eugene Yurtsev	30e48c9878	core[patch],community[patch]: Move file chat history back to community (#20834 ) Marking as patch since we haven't had releases in between. This just reverts part of a PR from yesterday.	1 month ago
Nestor Qin	9111d3a636	community[patch]: Fix message formatting for Anthropic models on Amazon Bedrock (#20801 ) Description: This PR fixes an issue in the message formatting function for Anthropic models on Amazon Bedrock. Currently, the LangChain BedrockChat model crashes if it uses an Anthropic model and the model returns a message of the following type: - `AIMessageChunk` Moreover, when BedrockChat is used to build an agent, the following message types trigger the same issue: - `HumanMessageChunk` - `FunctionMessage` Issue: https://github.com/langchain-ai/langchain/issues/18831 Dependencies: None. Testing: Manually tested. The following code was failing before the patch and works after. ``` @tool def square_root(x: str): "Useful when you need to calculate the square root of a number" return math.sqrt(int(x)) llm = ChatBedrock( model_id="anthropic.claude-3-sonnet-20240229-v1:0", model_kwargs={ "temperature": 0.0 }, ) prompt = ChatPromptTemplate.from_messages( [ ("system", FUNCTION_CALL_PROMPT), ("human", "Question: {user_input}"), MessagesPlaceholder(variable_name="agent_scratchpad"), ] ) tools = [square_root] tools_string = format_tool_to_anthropic_function(square_root) agent = ( RunnablePassthrough.assign( user_input=lambda x: x['user_input'], agent_scratchpad=lambda x: format_to_openai_function_messages( x["intermediate_steps"] ) ) | prompt | llm | AnthropicFunctionsAgentOutputParser() ) agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, return_intermediate_steps=True) output = agent_executor.invoke({ "user_input": "What is the square root of 2?", "tools_string": tools_string, }) ``` List of messages returned from Bedrock: ``` <SystemMessage> content='You are a helpful assistant.' <HumanMessage> content='Question: What is the square root of 2?' 
<AIMessageChunk> content="Okay, let's calculate the square root of 2.<scratchpad>\nTo calculate the square root of a number, I can use the square_root tool:\n\n<function_calls>\n <invoke>\n <tool_name>square_root</tool_name>\n <parameters>\n <__arg1>2</__arg1>\n </parameters>\n </invoke>\n</function_calls>\n</scratchpad>\n\n<function_results>\n<search_result>\nThe square root of 2 is approximately 1.414213562373095\n</search_result>\n</function_results>\n\n<answer>\nThe square root of 2 is approximately 1.414213562373095\n</answer>" id='run-92363df7-eff6-4849-bbba-fa16a1b2988c'" <FunctionMessage> content='1.4142135623730951' name='square_root' ```	2 months ago
Aliaksandr Kuzmik	5560cc448c	community[patch]: fix CometTracer bug (#20796 ) Hi! My name is Alex, I'm an SDK engineer from [Comet](https://www.comet.com/site/) This PR updates the `CometTracer` class. Fixed an issue when `CometTracer` failed while logging the data to Comet because this data is not JSON-encodable. The problem was in some of the `Run` attributes that could contain non-default types inside, now these attributes are taken not from the run instance, but from the `run.dict()` return value.	2 months ago
Eugene Yurtsev	645b1e142e	core[minor],langchain[patch],community[patch]: Move InMemory and File implementations of Chat History to core (#20752 ) This PR moves the implementations for chat history to core. So it's easier to determine which dependencies need to be broken / add deprecation warnings	2 months ago
ccurme	7a922f3e48	core, openai: support custom token encoders (#20762 )	2 months ago
Christophe Bornet	0ae5027d98	community[patch]: Remove usage of deprecated StoredBlobHistory in CassandraChatMessageHistory (#20666 )	2 months ago
Eugene Yurtsev	38adbfdf34	community[patch],core[minor]: Move BaseToolKit to core.tools (#20669 )	2 months ago
Mark Needham	ce23f8293a	Community patch clickhouse make it possible to not specify index (#20460 ) Vector indexes in ClickHouse are experimental at the moment and can sometimes break/change behaviour. So this PR makes it possible to say that you don't want to specify an index type. Any queries against the embedding column will be brute force/linear scan, but that gives reasonable performance for small-medium dataset sizes. --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
ccurme	c010ec8b71	patch: deprecate (a)get_relevant_documents (#20477 ) - `.get_relevant_documents(query)` -> `.invoke(query)` - `.get_relevant_documents(query=query)` -> `.invoke(query)` - `.get_relevant_documents(query, callbacks=callbacks)` -> `.invoke(query, config={"callbacks": callbacks})` - `.get_relevant_documents(query, kwargs)` -> `.invoke(query, kwargs)` --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2 months ago
Matheus Henrique Raymundo	bb69819267	community: Fix the stop sequence key name for Mistral in Bedrock (#20709 ) Fixing the wrong stop sequence key name that causes an error on AWS Bedrock. You can check the MistralAI bedrock parameters [here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-mistral.html) This change fixes this [issue](https://github.com/langchain-ai/langchain/issues/20095)	2 months ago
Bagatur	1c7b3c75a7	community[patch], experimental[patch]: support tool-calling sql and p… (#20639 ) d agents	2 months ago
shumway743	cb6e5e56c2	community[minor]: add graph store implementation for apache age (#20582 ) Description: implemented GraphStore class for Apache Age graph db Dependencies: depends on psycopg2 Unit and integration tests included. Formatting and linting have been run. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Christophe Bornet	c909ae0152	community[minor]: Add async methods to CassandraVectorStore (#20602 ) Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2 months ago
Dmitry Tyumentsev	f111efeb6e	community[patch]: YandexGPT API add ability to disable request logging (#20670 ) Closes (#20622) Added the ability to [disable logging of requests to YandexGPT](https://yandex.cloud/en/docs/foundation-models/operations/yandexgpt/disable-logging).	2 months ago
Erick Friis	73809817ff	community: release 0.0.34 (#20672 )	2 months ago
Tomaz Bratanic	8c08cf4619	community: Add support for relationship indexes in neo4j vector (#20657 ) Neo4j has added relationship vector indexes. We can't populate them, but we can use existing indexes for retrieval	2 months ago
Charlie Holtz	1cbab0ebda	community: update Replicate to work with official models (#20633 ) Description: you don't need to pass a version for Replicate official models. That was broken on LangChain until now! You can now run: ``` llm = Replicate( model="meta/meta-llama-3-8b-instruct", model_kwargs={"temperature": 0.75, "max_length": 500, "top_p": 1}, ) prompt = """ User: Answer the following yes/no question by reasoning step by step. Can a dog drive a car? Assistant: """ llm(prompt) ``` I've updated the replicate.ipynb to reflect that. twitter: @charliebholtz --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2 months ago
Congyu	dd5139e304	community[patch]: truncate zhipuai `temperature` and `top_p` parameters to [0.01, 0.99] (#20261 ) The ZhipuAI API only accepts a `temperature` parameter in the open interval `(0, 1)`, and if `0` is passed, it responds with status code `400`. However, 0 and 1 are often accepted by other APIs; for example, OpenAI allows the closed range `[0, 2]` for temperature. This PR truncates the temperature parameter to `[0.01, 0.99]` to improve compatibility between LangChain's ecosystem and ZhipuAI (e.g., ragas `evaluate` often generates temperature 0, which results in a lot of 400 invalid responses). The PR also truncates the `top_p` parameter since it has the same restriction. Reference: [glm-4 doc](https://open.bigmodel.cn/dev/api#glm-4) (which unfortunately is in Chinese). --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
Lance Martin	d5c22b80a5	community[patch]: Fix Ollama for LLaMA3 (#20624 ) We see verbose generations w/ LLaMA3 and Ollama - https://smith.langchain.com/public/88c4cd21-3d57-4229-96fe-53443398ca99/r --- Fix here implies that when stop was being set to an empty list, the stream had no conditions under which to stop, which could lead to excessive or unintended output. Test LLaMA2 - https://smith.langchain.com/public/57dfc64a-591b-46fa-a1cd-8783acaefea2/r Test LLaMA3 - https://smith.langchain.com/public/76ff5f47-ac89-4772-a7d2-5caa907d3fd6/r https://smith.langchain.com/public/a31d2fad-9094-4c93-949a-964b27630ccb/r Test Mistral - https://smith.langchain.com/public/a4fe7114-c308-4317-b9fd-6c86d31f1c5b/r --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2 months ago
hulitaitai	7d0a008744	community[minor]: Add audio-parser "faster-whisper" in audio.py (#20012 ) faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU. It can automatically detect the following 14 languages and transcribe the text into their respective languages: en, zh, fr, de, ja, ko, ru, es, th, it, pt, vi, ar, tr. The GitHub repository for faster-whisper is: https://github.com/SYSTRAN/faster-whisper --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2 months ago
Guangdong Liu	e3c2431c5b	community[patch]: Fix error in Apache Doris insert (#19989 ) - Issue: #19886	2 months ago
Tomaz Bratanic	27370b679e	community[patch]: Ignore null and invalid embedding values for neo4j metadata filtering (#20558 )	2 months ago
Massimiliano Pronesti	2542a09abc	community[patch]: AzureSearch incorrectly converted to retriever (#20601 ) Closes #20600. Please see the issue for more details.	2 months ago
Christophe Bornet	8f0b5687a3	community[minor]: Add hybrid search to Cassandra VectorStore (#20286 ) Only supported by Astra DB at the moment. Twitter handle: cbornet_	2 months ago
Christophe Bornet	d2d01370bc	community[minor]: Add async methods to CassandraLoader (#20609 ) Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2 months ago
balloonio	e786da7774	community[patch]: Invoke callback prior to yielding token fix [HuggingFaceTextGenInference] (#20426 ) - Description: Invoke callback prior to yielding token in stream method in [HuggingFaceTextGenInference] - Issue: https://github.com/langchain-ai/langchain/issues/16913 - Dependencies: None - Twitter handle: @bolun_zhang --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2 months ago
Ethan Yang	2d6d796040	community: Add save_model function for openvino reranker and embedding (#19896 )	2 months ago
zR	9c1d7f2405	update zhipuai notebook (#20595 ) Fix timeout issue; fix zhipuai use-case notebook.	2 months ago
ccurme	c897264b9b	community: (milvus) check for num_shards (#20603 ) @rgupta2508 I believe this change is necessary following https://github.com/langchain-ai/langchain/pull/20318 because of how Milvus handles defaults: `59bf5e811a/pymilvus/client/prepare.py (L82-L85)` ```python num_shards = kwargs[next(iter(same_key))] if not isinstance(num_shards, int): msg = f"invalid num_shards type, got {type(num_shards)}, expected int" raise ParamError(message=msg) req.shards_num = num_shards ``` this way lets Milvus control the default value (instead of maintaining a separate default in Langchain). Let me know if I've got this wrong or you feel it's unnecessary. Thanks.	2 months ago
Rohit Gupta	25c4c24e89	Support to create shards_num in milvus vectorstores (#20318 ) Adds support for specifying the number of shards for the collection created in the Milvus vectorstore.	2 months ago
Erick Friis	e395115807	docs: aws docs updates (#20571 )	2 months ago
Erick Friis	f09bd0b75b	upstage: init package (#20574 ) Co-authored-by: Sean Cho <sean@upstage.ai> Co-authored-by: JuHyung-Son <sonju0427@gmail.com>	2 months ago
Marco Perini	11c9ed3362	community[patch]: exposing headless flag parameter to AsyncChromiumLoader class (#20424 ) - Description: added the headless parameter as an optional argument to the langchain_community.document_loaders AsyncChromiumLoader class - Dependencies: None - Twitter handle: @perinim_98 --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Christophe Bornet	a22da4315b	community[patch]: Replace function in CassandraVectorStore with simpler lambda (#20323 )	2 months ago
Christophe Bornet	75733c5cc1	community[minor]: Improve CassandraVectorStore from_texts (#20284 )	2 months ago
Tomer Cagan	463160c3f6	community: fix `DirectoryLoader` progress bar (#19821 ) Description: currently, the `DirectoryLoader` progress-bar maximum value is based on an incorrect number of files to process In langchain_community/document_loaders/directory.py:127: ```python paths = p.rglob(self.glob) if self.recursive else p.glob(self.glob) items = [ path for path in paths if not (self.exclude and any(path.match(glob) for glob in self.exclude)) ] ``` `paths` returns both files and directories. `items` is later used to determine the maximum value of the progress-bar which gives an incorrect progress indication.	2 months ago
Pengcheng Liu	ecd19a9e58	community[patch]: Add function call support in Tongyi chat model. (#20119 ) - [ ] PR message: - Description: This pr adds function calling support in Tongyi chat model. - Issue: None - Dependencies: None - Twitter handle: None Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
kaijietti	80679ab906	zep[patch]: implement add_messages and aadd_messages (#20099 ) This PR implement `add_messages` and `aadd_messages` to avoid unnecessary round-trips.	2 months ago
Sevin F. Varoglu	3f156e0ece	community[minor]: add ChatOctoAI (#20059 ) This PR adds ChatOctoAI, a chat model integration for OctoAI.	2 months ago
Eun Hye Kim	b34f1086fe	community[patch]: Add streaming logic in ChatHuggingFace (#18784 ) - Add functions (_stream, _astream) - Connect to _generate and _agenerate - Issue: #18782 - Dependencies: none - Twitter handle: @lunara_x	2 months ago
pjb157	479be3cc91	community[minor]: Unify Titan Takeoff Integrations and Adding Embedding Support (#18775 ) Description: Neither of the integrations in the community folder reflects Titan Takeoff any longer. The two integrations (TitanTakeoffPro and TitanTakeoff) were causing confusion with clients, so the code has been moved into one place with an alias for backwards compatibility. Added the Takeoff Client Python package to do the bulk of the work with the requests, because this package is actively updated with new versions of Takeoff, so this integration will be far more robust and will not degrade as badly over time. Issue: Fixes bugs in the old Titan integrations and unifies the code, with added unit test coverage to avoid future problems. Dependencies: Added optional dependency takeoff-client; all imports still work without the dependency, including the Titan Takeoff classes, but they will fail on initialisation if takeoff-client is not pip installed. Twitter @MeryemArik9 Thanks all :) --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
Rahul Triptahi	2cbfc94bcb	community[patch]: Add support for authorized identities in PebbloSafeLoader. (#20055 ) Description: Add support for authorized identities in PebbloSafeLoader. Now with this change, PebbloSafeLoader will extract authorized_identities from metadata and send it to pebblo server Dependencies: None Documentation: None Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	2 months ago
Guangdong Liu	b78ede2f96	community[patch]: standardize init args (#20166 ) Related to https://github.com/langchain-ai/langchain/issues/20085 @baskaryan	2 months ago
Guangdong Liu	3729bec1a2	community[patch]: standardize init args (#20210 ) Related to https://github.com/langchain-ai/langchain/issues/20085 @baskaryan	2 months ago
sdan	a7c5e41443	community[minor]: Added VLite as VectorStore (#20245 ) Support [VLite](https://github.com/sdan/vlite) as a new VectorStore type. Description: vlite is a simple and blazing fast vector database(vdb) made with numpy. It abstracts a lot of the functionality around using a vdb in the retrieval augmented generation(RAG) pipeline such as embeddings generation, chunking, and file processing while still giving developers the functionality to change how they're made/stored. Before submitting: Added tests [here](`c09c2ebd5c/libs/community/tests/integration_tests/vectorstores/test_vlite.py`) Added ipython notebook [here](`c09c2ebd5c/docs/docs/integrations/vectorstores/vlite.ipynb`) Added simple docs on how to use [here](`c09c2ebd5c/docs/docs/integrations/providers/vlite.mdx`) Profiles Maintainers: @sdan Twitter handles: [@sdand](https://x.com/sdand) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Hyeongchan Kim	7824291252	community[patch]: Fix not to cast to str type when `file_path` is None (#20057 ) Since `langchain_community 0.0.30`, a file-like object cannot be sent via the `file` parameter instead of a file path, because `file_path` is cast to str even when it is None. `partition_via_api()` requires exactly one of `filename` and `file` to be specified, but since `langchain_community 0.0.30` `get_elements_from_api()` casts `file_path` to `str` even when it is None, so the call fails at `exactly_one(filename=filename, file=file)`. Here is the error message: ``` ---> 51 exactly_one(filename=filename, file=file) 53 if metadata_filename and file_filename: 54 raise ValueError( 55 "Only one of metadata_filename and file_filename is specified. " 56 "metadata_filename is preferred. file_filename is marked for deprecation.", 57 ) File /opt/homebrew/lib/python3.11/site-packages/unstructured/partition/common.py:441, in exactly_one(**kwargs) 439 else: 440 message = f"{names[0]} must be specified." --> 441 raise ValueError(message) ValueError: Exactly one of filename and file must be specified. ``` So I changed the code to cast to str only when `file_path` is not None. I use `UnstructuredAPIFileLoader` as shown below. ``` from langchain_community.document_loaders.unstructured import UnstructuredAPIFileLoader documents: list = UnstructuredAPIFileLoader( file_path=None, file=file, # file-like object, io.BytesIO type mode='elements', url='http://127.0.0.1:8000/general/v0/general', content_type='application/pdf', metadata_filename='asdf.pdf', ).load_and_split() ```	2 months ago
MacanPN	bce69ae43d	community[patch]: Changes to base_o365 and sharepoint document loaders (#20373 ) ## Description: The PR introduces 3 changes: 1. added `recursive` property to `O365BaseLoader`. (To keep the behavior unchanged, it is set to `False` by default.) When `recursive=True`, `_load_from_folder()` also recursively loads all nested folders. 2. added `folder_id` to SharePointLoader (similar to [this PR](https://github.com/langchain-ai/langchain/pull/10780)). This provides an alternative to `folder_path`, which doesn't seem to work reliably. 3. when none of `document_ids`, `folder_id`, `folder_path` is provided, the loader fetches documents from the root folder. Combined with `recursive=True` this provides an easy way of loading all compatible documents from SharePoint. The PR contains the same logic as [this stale PR](https://github.com/langchain-ai/langchain/pull/10780) by @WaleedAlfaris. I'd like to ask his blessing for moving forward with this one. ## Issue: - As described in https://github.com/langchain-ai/langchain/issues/19938 and https://github.com/langchain-ai/langchain/pull/10780 the sharepoint loader often does not seem to work with folder_path. - Recursive loading of subfolders is a missing functionality ## Dependencies: None Twitter handle: @martintriska1 @WRhetoric This is my first PR here, please be gentle :-) Please review @baskaryan --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
Sevin F. Varoglu	54d388d898	community[patch]: update OctoAI endpoint to subclass BaseOpenAI (#19757 ) This PR updates OctoAIEndpoint LLM to subclass BaseOpenAI as OctoAI is an OpenAI-compatible service. The documentation and tests have also been updated.	2 months ago
Benito Geordie	57b226532d	community[minor]: Added integrations for ThirdAI's NeuralDB as a Retriever (#17334 ) Description: Adds ThirdAI NeuralDB retriever integration. NeuralDB is a CPU-friendly and fine-tunable text retrieval engine. We previously added a vector store integration but we think that it will be easier for our customers if they can also find us under under langchain-community/retrievers. --------- Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com> Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>	2 months ago
WeichenXu	e9fc87aab1	community[patch]: Make ChatDatabricks model supports streaming response (#19912 ) Description: Make the ChatDatabricks model support streaming. Issue: N/A Dependencies: MLflow nightly build version (we will release next MLflow version soon) Twitter handle: N/A Manually test: (Before testing, please install `pip install git+https://github.com/mlflow/mlflow.git`) ```python # Test Databricks Foundation LLM model from langchain.chat_models import ChatDatabricks chat_model = ChatDatabricks( endpoint="databricks-llama-2-70b-chat", max_tokens=500 ) from langchain_core.messages import AIMessageChunk for chunk in chat_model.stream("What is mlflow?"): print(chunk.content, end="|") ``` --------- Signed-off-by: Weichen Xu <weichen.xu@databricks.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Dhruv Chawla	d6d559d50d	community[minor]: add UpTrainCallbackHandler (#19956 ) - Description: This PR adds a callback handler for UpTrain. It performs evaluations in the RAG pipeline to check the quality of retrieved documents, generated queries and responses. - Dependencies: - The UpTrainCallbackHandler requires the uptrain package --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>	2 months ago
Ravindu Somawansa	5acc7ba622	community[minor]: Add glue catalog loader (#20220 ) Add Glue Catalog loader	2 months ago
Martín Gotelli Ferenaz	b48add4353	community[patch]: Fix pgvector deprecated filter clause usage with OR and AND conditions (#20446 ) Description: Support filter by OR and AND for deprecated PGVector version Issue: #20445 Dependencies: N/A Twitter handle: @martinferenaz	2 months ago
Eugene Yurtsev	c50099161b	community[patch]: Use uuid4 not uuid1 (#20487 ) Using UUID1 is incorrect since it's time dependent, which makes it easy to generate the exact same uuid	2 months ago
Erick Friis	86cf1d3ee1	community: release 0.0.33 (#20490 )	2 months ago
Leonid Kuligin	676c68d318	community[patch]: deprecating remaining google_community integrations (#20471 ) Deprecating remaining google community integrations	2 months ago
balloonio	b66a4f48fa	community[patch]: Invoke callback prior to yielding token fix [DeepInfra] (#20427 ) - Description: Invoke callback prior to yielding token in stream method in [DeepInfra] - Issue: https://github.com/langchain-ai/langchain/issues/16913 - Dependencies: None - Twitter handle: @bolun_zhang	2 months ago
Juan Carlos José Camacho	450c458f8f	community[minor]: Add Dataherald tool (#19680 ) Description: Integrate the [dataherald](https://www.dataherald.com) tool, a natural language-to-SQL tool. Dependencies: Install the dataherald SDK to use it: ``` pip install dataherald ``` --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Christophe Bornet <cbornet@hotmail.com>	2 months ago
Egor Krasheninnikov	c8391d4ff1	community[patch]: Fix YandexGPT embeddings (#19720 ) Fix of YandexGPT embeddings. The current version uses a single `model_name` for queries and documents, essentially making the `embed_documents` and `embed_query` methods the same. Yandex has a different endpoint (`model_uri`) for encoding documents, see [this](https://yandex.cloud/en/docs/yandexgpt/concepts/embeddings). The bug may impact retrievers built with `YandexGPTEmbeddings` (for instance FAISS database as retriever) since they use both `embed_documents` and `embed_query`. A simple snippet to test the behaviour: ```python from langchain_community.embeddings.yandex import YandexGPTEmbeddings embeddings = YandexGPTEmbeddings() q_emb = embeddings.embed_query('hello world') doc_emb = embeddings.embed_documents(['hello world', 'hello world']) q_emb == doc_emb[0] ``` The response is `True` with the current version and `False` with the changes I made. Twitter: @egor_krash --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Guangdong Liu	4be7ca7b4c	community[patch]:sparkllm standardize init args (#20194 ) Related to https://github.com/langchain-ai/langchain/issues/20085 @baskaryan	2 months ago
Yuki Oshima	0758da8940	community[patch]: Set default value for _ListSQLDatabaseToolInput tool_input (#20409 ) Description: `_ListSQLDatabaseToolInput` raises an error if the model returns `{}`. For example, gpt-4-turbo returns `{}` with a SQL Agent initialized by `create_sql_agent`. So, I set the default value `""` for `_ListSQLDatabaseToolInput` tool_input. This is actually a gpt-4-turbo issue, not a LangChain issue, but I thought it would be helpful to set a default value `""`. This problem is discussed in detail in the following issue. Issue: https://github.com/langchain-ai/langchain/issues/20405 Dependencies: none Sorry, I did not add or change the test code, as tests for this component did not exist. However, I have tested the following code, based on the [SQL Agent Document](https://python.langchain.com/docs/use_cases/sql/agents/), to make sure it works. ``` from langchain_community.agent_toolkits.sql.base import create_sql_agent from langchain_community.utilities.sql_database import SQLDatabase from langchain_openai import ChatOpenAI db = SQLDatabase.from_uri("sqlite:///Chinook.db") llm = ChatOpenAI(model="gpt-4-turbo", temperature=0) agent_executor = create_sql_agent(llm, db=db, agent_type="openai-tools", verbose=True) result = agent_executor.invoke("List the total sales per country. Which country's customers spent the most?") print(result["output"]) ```	2 months ago
ccurme	38faa74c23	community[patch]: update use of deprecated llm methods (#20393 ) .predict and .predict_messages for BaseLanguageModel and BaseChatModel	2 months ago
Corey Zumar	3a068b26f3	community[patch]: Databricks - fix scope of dangerous deserialization error in Databricks LLM connector (#20368 ) fix scope of dangerous deserialization error in Databricks LLM connector --------- Signed-off-by: dbczumar <corey.zumar@databricks.com>	2 months ago
balloonio	e7b1a44c5b	community[patch]: Invoke callback prior to yielding token fix for Llamafile (#20365 ) - Description: Invoke callback prior to yielding token in stream method in community llamafile.py - Issue: https://github.com/langchain-ai/langchain/issues/16913 - Dependencies: None - Twitter handle: @bolun_zhang	2 months ago
balloonio	93caa568f9	community[patch]: Invoke callback prior to yielding token fix for HuggingFaceEndpoint (#20366 ) - Description: Invoke callback prior to yielding token in stream method in community HuggingFaceEndpoint - Issue: https://github.com/langchain-ai/langchain/issues/16913 - Dependencies: None - Twitter handle: @bolun_zhang --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2 months ago
Nicolas	ad04585e30	community[minor]: Firecrawl.dev integration (#20364 ) Added the [FireCrawl](https://firecrawl.dev) document loader. Firecrawl crawls and converts any website into LLM-ready data. It crawls all accessible subpages and gives you clean markdown for each. - Description: Adds FireCrawl data loader - Dependencies: firecrawl-py - Twitter handle: @mendableai ccing contributors: (@ericciarla @nickscamara) --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
P. Taylor Goetz	9317df7f16	community[patch]: Add "model" attribute to the payload sent to Ollama in `ChatOllama` (#20354 ) Example Ollama API calls: Request without "model": ``` curl --location 'http://localhost:11434/api/chat' \ --header 'Content-Type: application/json' \ --data '{ "messages": [ { "role": "user", "content": "What is the capitol of PA?" } ], "stream": false }' ``` Response: ``` {"error":"model is required"} ``` Request with "model": ``` curl --location 'http://localhost:11434/api/chat' \ --header 'Content-Type: application/json' \ --data '{ "model": "openchat", "messages": [ { "role": "user", "content": "What is the capitol of PA?" } ], "stream": false }' ``` Response: ``` { "eval_duration" : 733248000, "created_at" : "2024-04-11T23:04:08.735766843Z", "model" : "openchat", "message" : { "content" : " The capital city of Pennsylvania is Harrisburg.", "role" : "assistant" }, "total_duration" : 3138731168, "prompt_eval_count" : 25, "load_duration" : 466562959, "done" : true, "prompt_eval_duration" : 1938495000, "eval_count" : 10 } ```	2 months ago
Alex Sherstinsky	fad0962643	community: for Predibase -- enable both Predibase-hosted and HuggingFace-hosted fine-tuned adapter repositories (#20370 )	2 months ago
Isak Nyberg	bac9fb9a7c	community: add gpt-4 pricing in callback (#20292 ) Added the pricing for `gpt-4-turbo` and `gpt-4-turbo-2024-04-09` in the callback method. related to issue #17173 https://openai.com/pricing#language-models	2 months ago
Leonid Ganeline	7cf2d2759d	community[patch]: docstrings update (#20301 ) Added missed docstrings. Formatted docstrings to a consistent form.	2 months ago
Eugene Yurtsev	22fd844e8a	community[patch]: Add deprecation warnings to postgres implementation (#20222 ) Add deprecation warnings to postgres implementation that are in langchain-postgres.	2 months ago
Leonid Ganeline	4cb5f4c353	community[patch]: import flattening fix (#20110 ) This PR should make it easier for linters to do type checking and for IDEs to jump to definition of code. See #20050 as a template for this PR. - As a byproduct: Added 3 missed `test_imports`. - Added the missed `SolarChat` to __init__.py and to the test_import unit test. - Added `# type: ignore` to fix linting. It is not clear why linting errors appear after the changes above. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2 months ago
Bagatur	e39fdfddf1	community[patch]: Release 0.0.32 (#20236 )	2 months ago
Chip Davis	806d4ae48f	community[patch]: fixed multithreading returning List[List[Documents]] instead of List[Documents] (#20230 ) Description: When multithreading is set to True and using the DirectoryLoader, there was a bug that caused the return type to be a double nested list. This resulted in other places upstream not being able to utilize the from_documents method as it was no longer a `List[Documents]` it was a `List[List[Documents]]`. The change made was to just loop through the `future.result()` and yield every item. Issue: #20093 Dependencies: N/A Twitter handle: N/A	2 months ago
seray	add31f46d0	community[patch]: OpenLLM Async Client Fixes and Timeout Parameter (#20007 ) Same changes as this merged [PR](https://github.com/langchain-ai/langchain/pull/17478), but for the async client, as the same issues persist. - Replaced the 'responses' attribute of OpenLLM's GenerationOutput schema with 'outputs'. reference: `66de54eae7/openllm-core/src/openllm_core/_schemas.py (L135)` - Added timeout parameter for the async client. --------- Co-authored-by: Seray Arslan <seray.arslan@knime.com>	2 months ago
Erick Friis	37a9e23c05	community: switch to falkordb python client (#20229 )	2 months ago
David Lee	0394c6e126	community[minor]: add allow_dangerous_requests for OpenAPI toolkits (#19493 ) Description: Due to BaseRequestsTool changes, we need to pass allow_dangerous_requests manually. `b617085af0/libs/community/langchain_community/tools/requests/tool.py (L26-L46)` The OpenAPI toolkits, however, didn't pass it in the arguments. `b617085af0/libs/community/langchain_community/agent_toolkits/openapi/planner.py (L262-L269)` Issue: https://github.com/langchain-ai/langchain/issues/19440 If allow_dangerous_requests is not passed, the toolkit cannot make requests. Dependencies: Not much --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2 months ago
Timothy	0c848a25ad	community[patch]: GCSDirectoryLoader bugfix (#20005 ) - Description: Bug fix. Removed extra line in `GCSDirectoryLoader` to allow catching Exceptions. Now also logs the file path if Exception is raised for easier debugging. - Issue: #20198 Bug since langchain-community==0.0.31 - Dependencies: No change - Twitter handle: timothywong731 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
jeff kit	ac42e96e4c	community[patch], langchain[minor]: Enhance Tencent Cloud VectorDB, langchain: make Tencent Cloud VectorDB self query retrieve compatible (#19651 ) - make Tencent Cloud VectorDB support metadata filtering. - implement delete function for Tencent Cloud VectorDB. - support both Langchain Embedding model and Tencent Cloud VDB embedding model. - Tencent Cloud VectorDB supports filter search keywords, compatible with langchain filtering syntax. - add Tencent Cloud VectorDB TranslationVisitor, which now works with the self query retriever. - more documentation. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
Bagatur	1a34c65e01	community[patch]: pass through sql agent kwargs (#19962 ) Fix #19961	2 months ago
Guangdong Liu	97d91ec17c	community[patch]: standardize baichuan init args (#20209 ) Related to https://github.com/langchain-ai/langchain/issues/20085 @baskaryan	2 months ago
Piyush Jain	cd7abc495a	community[minor]: add neptune analytics graph (#20047 ) Replacement for PR [#19772](https://github.com/langchain-ai/langchain/pull/19772). --------- Co-authored-by: Dave Bechberger <dbechbe@amazon.com> Co-authored-by: bechbd <bechbd@users.noreply.github.com>	2 months ago
Shuqian	ad9750403b	community[minor]: add bedrock anthropic callback for token usage counting (#19864 ) Description: add bedrock anthropic callback for token usage counting, consulted openai callback. --------- Co-authored-by: Massimiliano Pronesti <massimiliano.pronesti@gmail.com>	2 months ago
Prince Canuma	1f9f4d8742	community[minor]: Add support for MLX models (chat & llm) (#18152 ) Description: This PR adds support for MLX models of both chat (i.e., instruct) and llm (i.e., pretrained) types. Dependencies: mlx, mlx_lm, transformers Twitter handle: @Prince_Canuma --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
Leonid Ganeline	2f8dd1a161	community[patch]: `cross_encoders` flatten namespaces (#20183 ) Issue `langchain_community.cross_encoders` didn't have flattening namespace code in the __init__.py file. Changes: - added code to flattening namespaces (used #20050 as a template) - added ut for a change - added missed `test_imports` for `chat_loaders` and `chat_message_histories` modules	2 months ago
Alex Sherstinsky	5f563e040a	community: extend Predibase integration to support fine-tuned LLM adapters (#19979 ) - Description: Langchain-Predibase integration was failing, because it was not current with the Predibase SDK; in addition, Predibase integration tests were instantiating the Langchain Community `Predibase` class with one required argument (`model`) missing. This change updates the Predibase SDK usage and fixes the integration tests. - Twitter handle: `@alexsherstinsky` --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2 months ago
Bagatur	5ae0e687b3	docs: use standard openai params (#20160 ) Part of #20085	2 months ago
david02871	e1a24d09c5	community: Add PHP language parser to document_loaders (#19850 ) Description: Added a PHP language parser to document_loaders Issue: N/A Dependencies: N/A Twitter handle: N/A --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2 months ago
Marlene	2f03bc397e	Community: Updating Azure Retriever and Docs to be Azure AI Search instead of Azure Cognitive Search (#19925 ) Last year Microsoft [changed the name](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) of Azure Cognitive Search to Azure AI Search. This PR updates the Langchain Azure Retriever API and its associated docs to reflect this change. It may be confusing for users to see the name Cognitive here and AI in the Microsoft documentation, which is why this is needed. I've also added a more detailed example to the Azure retriever doc page. There are more places that need a similar update but I'm breaking it up so the PRs are not too big 😄 Fixing my errors from the previous PR. Twitter: @marlene_zw Two new tests added to test backward compatibility in `libs/community/tests/integration_tests/retrievers/test_azure_cognitive_search.py` --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2 months ago
Rahul Triptahi	820b713086	community[minor]: Add support for Pebblo cloud_api_key in PebbloSafeLoader (#19855 ) Description: _PebbloSafeLoader_: Add support for pebblo's cloud api-key in PebbloSafeLoader - This Pull request enables PebbloSafeLoader to accept pebblo's cloud api-key and send the semantic classification data to pebblo cloud. Documentation: Updated Unit test: Added Issue: NA Dependencies: - None Twitter handle: @rahul_tripathi2 Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	2 months ago
Eugene Yurtsev	520ff50adc	community[patch]: Improve import callbacks to make it IDE friendly (#20050 ) * declares __all__ as a list of strings (instead of dynamically computing it) * import type definitions when TYPE_CHECKING is true	2 months ago
Leonid Ganeline	3aacd11846	community[minor]: added missed class to __all__ (#19888 ) Added missed `UnstructuredCHMLoader` class to the document_loader.\_\_init\_\_.py \_\_all\_\_	2 months ago
Tomaz Bratanic	df25829f33	community[minor]: Add metadata filtering support for neo4j vector (#20001 )	2 months ago
Ben Mitchell	b52b78478f	community[minor]: Implement Async OpenSearch `afrom_texts` & `afrom_embeddings` (#20009 ) - Description: Adds the async variants `afrom_texts` and `afrom_embeddings` to `OpenSearchVectorSearch`, which allows `afrom_documents` to be called. - Issue: I implemented this because my use case involves an async scraper generating documents as and when they're ready to be ingested by Embedding/OpenSearch - Dependencies: None that I'm aware of Co-authored-by: Ben Mitchell <b.mitchell@reply.com>	2 months ago
Tomaz Bratanic	09a0ecd000	langchain[minor]: Tests update metadata filtering examples of documents (#19963 ) Removing metadata properties that are dicts, as some databases don't support that and those properties aren't used in tests anyhow.	2 months ago
happy-go-lucky	c6432abdbe	community[patch]: Implement delete method and all async methods in opensearch_vector_search (#17321 ) - Description: In order to use index and aindex in libs/langchain/langchain/indexes/_api.py, I implemented delete method and all async methods in opensearch_vector_search - Dependencies: No changes	2 months ago
Cheng, Penghui	cc407e8a1b	community[minor]: weight only quantization with intel-extension-for-transformers. (#14504 ) Support weight only quantization with intel-extension-for-transformers. [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers) is an innovative toolkit to accelerate Transformer-based models on Intel platforms, particularly effective on the 4th Gen Intel Xeon Scalable processor (codenamed [Sapphire Rapids](https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html)). The toolkit provides the below key features: * Seamless user experience of model compressions on Transformer-based models by extending [Hugging Face transformers](https://github.com/huggingface/transformers) APIs and leveraging [Intel® Neural Compressor](https://github.com/intel/neural-compressor) * Advanced software optimizations and unique compression-aware runtime. * Optimized Transformer-based model packages. * [NeuralChat](https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat), a customizable chatbot framework to create your own chatbot within minutes by leveraging a rich set of plugins and SOTA optimizations. * [Inference](https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/llm/runtime/graph) of Large Language Model (LLM) in pure C/C++ with weight-only quantization kernels. This PR is an integration of the weight only quantization feature with intel-extension-for-transformers. Unit test is in lib/langchain/tests/integration_tests/llm/test_weight_only_quantization.py The notebook is in docs/docs/integrations/llms/weight_only_quantization.ipynb. The document is in docs/docs/integrations/providers/weight_only_quantization.mdx. --------- Signed-off-by: Cheng, Penghui <penghui.cheng@intel.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Eugene Yurtsev	d293431e10	core[minor]: Add aload to document loader (#19936 ) Add aload to document loader	2 months ago
Leonid Kuligin	eb0521064e	deprecating integrations moved to langchain_google_community (#19841 ) Deprecating integrations moved to langchain_google_community. --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2 months ago
Erick Friis	f0d5b59962	core[patch]: remove requests (#19891 ) Removes required usage of `requests` from `langchain-core`, all of which has been deprecated. - removes Tracer V1 implementations - removes old `try_load_from_hub` github-based hub implementations Removal done in a way where imports will still succeed, and usage will fail with a `RuntimeError`.	2 months ago
Peter Vandenabeele	e830a4e731	community[patch]: Add remove_comments option (default True): do not extract html comments (#13259 ) - Description: add `remove_comments` option (default: True): do not extract html _comments_, - Issue: None, - Dependencies: None, - Tag maintainer: @nfcampos , - Twitter handle: peter_v I ran `make format`, `make lint` and `make test`. Discussion: In my use case, I prefer not to have the comments in the extracted text: * e.g. from a Google tag that is added in the html as a comment * e.g. content that the authors have temporarily hidden to make it invisible to the regular reader Removing the comments makes the extracted text more like the text intended to be seen by the reader. Choice to make: do we prefer to make the default for this `remove_comments` option True or False? I have changed it to True in a second commit, since that is how I would prefer to use it by default. This gives cleaned text (without technical Google tags etc.) that is also closer to the actually visible and intended content. I am not sure what is best aligned with the conventions of langchain in general ... INITIAL VERSION (new version above): ~Choice to make: do we prefer to make the default for this `ignore_comments` option to be True or False? I have set it to False now to be backwards compatible. On the other hand, I would use it mostly with True. I am not sure what is best aligned with the conventions of langchain in general ...~ --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Jamsheed Mistri	4f70bc119d	community[minor]: add Layerup Security integration (#19787 ) Description: adds integration with [Layerup Security](https://uselayerup.com). Docs can be found [here](https://docs.uselayerup.com). Integrates directly with our Python SDK. Dependencies: [LayerupSecurity](https://pypi.org/project/LayerupSecurity/) Note: all methods for our product require a paid API key, so I only included 1 test which checks for an invalid API key response. I have tested extensively locally. Twitter handle: [@layerup_](https://twitter.com/layerup_) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Anıl Berk Altuner	4384fa8e49	community[minor]: Add Dria retriever (#17098 ) [Dria](https://dria.co/) is a hub of public RAG models for developers to both contribute and utilize a shared embedding lake. This PR adds a retriever that can retrieve documents from Dria.	2 months ago
Ethan Yang	48f84e253e	community[minor]: Add OpenVINO rerank model support (#19791 ) @eaidova @AlexKoff88 Could you help to review, thanks --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Chenhui Zhang	a1f3e9f537	community[minor]: Update ChatZhipuAI to support GLM-4 model (#16695 ) Description: Update `ChatZhipuAI` to support the latest `glm-4` model. Issue: N/A Dependencies: httpx, httpx-sse, PyJWT The previous `ChatZhipuAI` implementation requires the `zhipuai` package, and cannot call the latest GLM model. This is because - The old version `zhipuai==1.` doesn't support the latest model. - `zhipuai==2.` requires `pydantic V2`, which is incompatible with 'langchain-community'. This re-implementation invokes the GLM model by sending HTTP requests to [open.bigmodel.cn](https://open.bigmodel.cn/dev/api) via the `httpx` package, and uses the `httpx-sse` package to handle stream events. --------- Co-authored-by: zR <2448370773@qq.com>	2 months ago
Bagatur	d25b5b6f25	community[patch]: Release 0.0.31 (#19873 )	2 months ago
Bagatur	d62e84c4f5	community[patch]: Revert " Fix the bug that Chroma does not specify `e… (#19866 ) …mbedding_function` (#19277)" This reverts commit `7042934b5f`. Fixes #19848	2 months ago
Bagatur	0242bce38c	community[patch]: Release 0.0.30 (#19838 )	2 months ago
hsuyuming	5ab6b39098	community[patch]: add attribution_token within GoogleVertexAISearchRetriever (#18520 ) - Description: Add attribution_token within GoogleVertexAISearchRetriever so user can provide this information to Google support team or product team during debug session. Reference: https://cloud.google.com/generative-ai-app-builder/docs/view-analytics#user-events Attribution tokens. Attribution tokens are unique IDs generated by Vertex AI Search and returned with each search request. Make sure to include that attribution token as UserEvent.attributionToken with any user events resulting from a search. This is needed to identify if a search is served by the API. Only user events with a Google-generated attribution token are used to compute metrics. - Issue: No - Dependencies: No - Twitter handle: abehsu1992626 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Kenneth Choe	f98d7f7494	langchain[minor], community[minor]: add CrossEncoderReranker with HuggingFaceCrossEncoder and SagemakerEndpointCrossEncoder (#13687 ) - Description: Support reranking based on cross encoder models available from HuggingFace. - Added `CrossEncoder` schema - Implemented `HuggingFaceCrossEncoder` and `SagemakerEndpointCrossEncoder` - Implemented `CrossEncoderReranker` that performs similar functionality to `CohereRerank` - Added `cross-encoder-reranker.ipynb` to demonstrate how to use it. Please let me know if anything else needs to be done to make it visible on the table-of-contents navigation bar on the left, or on the card list on [retrievers documentation page](https://python.langchain.com/docs/integrations/retrievers). - Issue: N/A - Dependencies: None other than the existing ones. --------- Co-authored-by: Kenny Choe <kchoe@amazon.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Kamal Zhang	368e35c3b1	community[patch]: introduce convert_to_secret() to bananadev llm (#14283 ) - Description: Per #12165, this PR adds the function convert_to_secret_str() to BananaLLM during environment variable validation. - Issue: #12165 - Tag maintainer: @eyurtsev - Twitter handle: @treewatcha75751 --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
anshaneel	0884e5de7f	community[minor]: Add Alpha Vantage API Tool (#14332 ) ### Description This implementation adds functionality from the AlphaVantage API, renowned for its comprehensive financial data. The class encapsulates various methods, each dedicated to fetching specific types of financial information from the API. ### Implemented Functions - `search_symbols`: - Searches the AlphaVantage API for financial symbols using the provided keywords. - `_get_market_news_sentiment`: - Retrieves market news sentiment for a specified stock symbol from the AlphaVantage API. - `_get_time_series_daily`: - Fetches daily time series data for a specific symbol from the AlphaVantage API. - `_get_quote_endpoint`: - Obtains the latest price and volume information for a given symbol from the AlphaVantage API. - `_get_time_series_weekly`: - Gathers weekly time series data for a particular symbol from the AlphaVantage API. - `_get_top_gainers_losers`: - Provides details on top gainers, losers, and most actively traded tickers in the US market from the AlphaVantage API. ### Issue: - #11994 ### Dependencies: - 'requests' library for HTTP requests. (import requests) - 'pytest' library for testing. (import pytest) --------- Co-authored-by: Adam Badar <94140103+adam-badar@users.noreply.github.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Alex Sherstinsky	a9bc212bf2	community[minor]: fix failing Predibase integration (#19776 ) - Description: Langchain-Predibase integration was failing, because it was not current with the Predibase SDK; in addition, Predibase integration tests were instantiating the Langchain Community `Predibase` class with one required argument (`model`) missing. This change updates the Predibase SDK usage and fixes the integration tests. - Twitter handle: `@alexsherstinsky` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
ethynic	e9caa22d47	community[patch]: Update minimax.py (#14384 ) The MiniMaxChat class `_generate` method should return a ChatResult object, not a str. Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
M.Abdulrahman Alnaseer	ba54f1577f	community[minor]: add support for llmsherpa (#19741 ) - Add tests and docs: 1. Integration test: 'docs/docs/integrations/document_loaders/test_llmsherpa.py'. 2. An example notebook: `docs/docs/integrations/document_loaders/llmsherpa.ipynb`. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Hrvoje Milković	b7344e3347	community[minor]: Infobip tool integration (#16805 ) Description: Adding Tool that wraps Infobip API for sending sms or emails and email validation. Dependencies: None, Twitter handle: @hmilkovic Implementation: ``` libs/community/langchain_community/utilities/infobip.py ``` Integration tests: ``` libs/community/tests/integration_tests/utilities/test_infobip.py ``` Example notebook: ``` docs/docs/integrations/tools/infobip.ipynb ``` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Luka Krapic	727a2ea9f1	community[patch]: history size support for DynamoDBChatMessageHistory (#16794 ) Description: PR adds support for limiting number of messages preserved in a session history for DynamoDBChatMessageHistory --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Dt22	6dbf1a2de0	community[patch]: fix redis input type for index_schema field (#16874 ) ### Subject: Fix Type Misdeclaration for index_schema in redis/base.py I noticed a type misdeclaration for the index_schema column in the redis/base.py file. When following the instructions outlined in [Redis Custom Metadata Indexing](https://python.langchain.com/docs/integrations/vectorstores/redis) to create our own index_schema, it leads to a Pylance type error. The error message indicates that Dict[str, list[Dict[str, str]]] is incompatible with the type Optional[Union[Dict[str, str], str, os.PathLike]]. ``` index_schema = { "tag": [{"name": "credit_score"}], "text": [{"name": "user"}, {"name": "job"}], "numeric": [{"name": "age"}], } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users_modified", index_schema=index_schema, ) ``` Therefore, I have created this pull request to rectify the type declaration problem. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
morgana	074ad5095f	community[patch]: mmr search for Rockset vectorstore integration (#16908 ) - Description: Adding support for mmr search in the Rockset vectorstore integration. - Issue: N/A - Dependencies: N/A - Twitter handle: `@_morgan_adams_` --------- Co-authored-by: Rockset API Bot <admin@rockset.io> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
shahrin014	f51e6a35ba	community[patch]: OllamaEmbeddings - Pass headers to post request (#16880 ) ## Feature - Set additional headers in constructor - Headers will be sent in post request This feature is useful if deploying Ollama on a cloud service such as hugging face, which requires authentication tokens to be passed in the request header. ## Tests - Test if header is passed - Test if header is not passed Similar to https://github.com/langchain-ai/langchain/pull/15881 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Jan Chorowski	b8b42ccbc5	community[minor]: Pathway vectorstore(#14859 ) - Description: Integration with pathway.com data processing pipeline acting as an always updated vectorstore - Issue: not applicable - Dependencies: optional dependency on [`pathway`](https://pypi.org/project/pathway/) - Twitter handle: pathway_com The PR provides an integration with `pathway` to provide an easy to use always updated vector store: ```python import os import pathway as pw from langchain.embeddings.openai import OpenAIEmbeddings from langchain.text_splitter import CharacterTextSplitter from langchain.vectorstores import PathwayVectorClient, PathwayVectorServer data_sources = [] data_sources.append( pw.io.gdrive.read(object_id="17H4YpBOAKQzEJ93xmC2z170l0bP2npMy", service_user_credentials_file="credentials.json", with_metadata=True)) text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) embeddings_model = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"]) vector_server = PathwayVectorServer( *data_sources, embedder=embeddings_model, splitter=text_splitter, ) vector_server.run_server(host="127.0.0.1", port="8765", threaded=True, with_cache=False) client = PathwayVectorClient( host="127.0.0.1", port="8765", ) query = "What is Pathway?" docs = client.similarity_search(query) ``` The `PathwayVectorServer` builds a data processing pipeline which continuously scans documents in a given source connector (google drive, s3, ...) and builds a vector store. The `PathwayVectorClient` implements LangChain's `VectorStore` interface and connects to the server to retrieve documents. 
--------- Co-authored-by: Mateusz Lewandowski <lewymati@users.noreply.github.com> Co-authored-by: mlewandowski <mlewandowski@MacBook-Pro-mlewandowski.local> Co-authored-by: Berke <berkecanrizai1@gmail.com> Co-authored-by: Adrian Kosowski <adrian@pathway.com> Co-authored-by: mlewandowski <mlewandowski@macbook-pro-mlewandowski.home> Co-authored-by: berkecanrizai <63911408+berkecanrizai@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: mlewandowski <mlewandowski@MBPmlewandowski.ht.home> Co-authored-by: Szymon Dudycz <szymond@pathway.com> Co-authored-by: Szymon Dudycz <szymon.dudycz@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
Arturs Konfino	2319212d54	community[patch]: avoid executing `toolkit.get_context()` when not necessary (#19762 ) If `prompt` is passed into `create_sql_agent()`, then `toolkit.get_context()` shouldn't be executed against the database unless relevant prompt variables (`table_info` or `table_names`) are present .	2 months ago
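The guard described in commit 2319212d54 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `context_for_prompt` and `fake_get_context` are hypothetical names, and only the two prompt variable names (`table_info`, `table_names`) come from the commit message.

```python
from typing import Callable, Dict, Iterable

# Prompt variables that need database context, as named in the commit message.
CONTEXT_VARIABLES = ("table_info", "table_names")


def context_for_prompt(
    input_variables: Iterable[str],
    get_context: Callable[[], Dict[str, str]],
) -> Dict[str, str]:
    """Hypothetical helper: call `get_context` only if the prompt needs it."""
    wanted = set(input_variables) & set(CONTEXT_VARIABLES)
    if not wanted:
        return {}  # no database round trip at all
    context = get_context()
    return {name: context[name] for name in wanted if name in context}


calls = []


def fake_get_context() -> Dict[str, str]:
    """Hypothetical stand-in for `toolkit.get_context()`; records each call."""
    calls.append(1)
    return {"table_info": "CREATE TABLE t (id INT)", "table_names": "t"}


no_context = context_for_prompt(["input", "agent_scratchpad"], fake_get_context)
with_context = context_for_prompt(["input", "table_names"], fake_get_context)
```

Only the second call reaches the stand-in, which is the behavior the commit asks for when a custom `prompt` is passed.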
高璟琦	ec7a59c96c	community[minor]: Add solar embedding (#19761 ) Solar is a large language model developed by [Upstage](https://upstage.ai/). It's a powerful and purpose-trained LLM. This PR adds access to the embedding service provided by Solar. You can get a SOLAR_API_KEY from https://console.upstage.ai/services/embedding and find more details about the accepted LLM integration at https://python.langchain.com/docs/integrations/llms/solar.	2 months ago
Tomaz Bratanic	dec00d3050	community[patch]: Add the ability to pass maps to neo4j retrieval query (#19758 ) Makes it easier to flatten complex values to text, so you don't have to use a lot of Cypher to do it.	2 months ago
Robby	f7e8a382cc	community[minor]: add hugging face text-to-speech inference API (#18880 ) Description: I implemented a tool to use Hugging Face text-to-speech inference API. Issue: n/a Dependencies: n/a Twitter handle: No Twitter, but do have [LinkedIn](https://www.linkedin.com/in/robby-horvath/) lol. --------- Co-authored-by: Robby <h0rv@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2 months ago
DasDingoCodes	73eb3f8fd9	community[minor]: Implement DirectoryLoader lazy_load function (#19537 ) Thank you for contributing to LangChain! - [x] PR title: "community: Implement DirectoryLoader lazy_load function" - [x] Description: The `lazy_load` function of the `DirectoryLoader` yields each document separately. If the given `loader_cls` of the `DirectoryLoader` also implemented `lazy_load`, it will be used to yield subdocuments of the file. - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access: `libs/community/tests/unit_tests/document_loaders/test_directory_loader.py` 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory: `docs/docs/integrations/document_loaders/directory.ipynb` - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2 months ago
Jialei	f7c903e24a	community[minor]: add support for Moonshot llm and chat model (#17100 )	2 months ago
Ethan Yang	7164015135	community[minor]: Add Openvino embedding support (#19632 ) This PR is used to support both HF and BGE embeddings with openvino --------- Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>	2 months ago
T Cramer	540ebf35a9	community[patch]: Add explicit error message to Bedrock error output. (#17328 ) - Description: Propagate Bedrock errors into Langchain explicitly. Use-case: unset region error is hidden behind 'Could not load credentials...' message - Issue: [17654](https://github.com/langchain-ai/langchain/issues/17654) - Dependencies: None --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
Marcus Virginia	69bb96c80f	community[patch]: surrealdb handle for empty metadata and allow collection names with complex characters (#17374 ) - Description: Handle for empty metadata and allow collection names with complex characters - Issue: #17057 - Dependencies: `surrealdb` --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
kYLe	124ab79c23	community[minor]: Add Anyscale embedding support (#17605 ) Description: Add embedding model support for Anyscale Endpoint Dependencies: openai --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Lance Martin	12843f292f	community[patch]: llama cpp embeddings reset default n_batch (#17594 ) When testing Nomic embeddings -- ``` from langchain_community.embeddings import LlamaCppEmbeddings embd_model_path = "/Users/rlm/Desktop/Code/llama.cpp/models/nomic-embd/nomic-embed-text-v1.Q4_K_S.gguf" embd_lc = LlamaCppEmbeddings(model_path=embd_model_path) embedding_lc = embd_lc.embed_query(query) ``` We were seeing this error for strings > a certain size -- ``` File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/llama.py:827, in Llama.embed(self, input, normalize, truncate, return_count) 824 s_sizes = [] 826 # add to batch --> 827 self._batch.add_sequence(tokens, len(s_sizes), False) 828 t_batch += n_tokens 829 s_sizes.append(n_tokens) File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/_internals.py:542, in _LlamaBatch.add_sequence(self, batch, seq_id, logits_all) 540 self.batch.token[j] = batch[i] 541 self.batch.pos[j] = i --> 542 self.batch.seq_id[j][0] = seq_id 543 self.batch.n_seq_id[j] = 1 544 self.batch.logits[j] = logits_all ValueError: NULL pointer access ``` The default `n_batch` of llama-cpp-python's Llama is `512` but we were explicitly setting it to `8`. These need to be set to equal for embedding models. * The embedding.cpp example has an assertion to make sure these are always equal. * Apparently this is not being done properly in llama-cpp-python. With `n_batch` set to 8, if more than 8 tokens are passed the batch runs out of space and it crashes. 
This also explains why the CPU compute buffer size was small: raw client with default `n_batch=512` ``` llama_new_context_with_model: CPU input buffer size = 3.51 MiB llama_new_context_with_model: CPU compute buffer size = 21.00 MiB ``` langchain with `n_batch=8` ``` llama_new_context_with_model: CPU input buffer size = 0.04 MiB llama_new_context_with_model: CPU compute buffer size = 0.33 MiB ``` We can work around this by passing `n_batch=512`, but this will not be obvious to some users: ``` embedding = LlamaCppEmbeddings(model_path=embd_model_path, n_batch=512) ``` From discussion w/ @cebtenzzre. Related: https://github.com/abetlen/llama-cpp-python/issues/1189 Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Zijian Han	8e976545f3	community[patch]: support OpenAI whisper base url (#17695 ) Description: The base URL for OpenAI is retrieved from the environment variable "OPENAI_BASE_URL", whereas for langchain it is obtained from "OPENAI_API_BASE". By adding `base_url = os.environ.get("OPENAI_API_BASE")`, the OpenAI proxy can execute correctly. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Paulo Nascimento	44a3484503	community[patch]: add NotebookLoader unit test (#17721 ) Thank you for contributing to LangChain! - Description: added unit tests for NotebookLoader. Linked PR: https://github.com/langchain-ai/langchain/pull/17614 - Issue: [#17614](https://github.com/langchain-ai/langchain/pull/17614) - Twitter handle: @paulodoestech - [x] Pass lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified to check that you're passing lint and testing. See contribution guidelines for more information on how to write/run tests, lint, etc: https://python.langchain.com/docs/contributing/ - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: lachiewalker <lachiewalker1@hotmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Paulo Nascimento	4c3a67122f	community[patch]: add Integration for OpenAI image gen with v1 sdk (#17771 ) Description: Created a Langchain Tool for OpenAI DALLE Image Generation. Issue: [#15901](https://github.com/langchain-ai/langchain/issues/15901) Dependencies: n/a Twitter handle: @paulodoestech - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Jiaming	3d3cc71287	community[patch]: fix bugs for bilibili Loader (#18036 ) - Description: 1. Fix the BiliBiliLoader that can receive cookie parameters, it requires 3 other parameters to run. The change is backward compatible. 2. Add test; 3. Add example in docs - Issue: [#14213] Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
Sachin Paryani	25c9f3d1d1	community[patch]: Support Streaming in Azure Machine Learning (#18246 ) - [x] PR title: "community: Support streaming in Azure ML and few naming changes" - [x] PR message: - Description: Added support for streaming for azureml_endpoint. Also renamed AzureMLEndpointApiType.realtime to AzureMLEndpointApiType.dedicated, added new classes CustomOpenAIChatContentFormatter and CustomOpenAIContentFormatter, and updated the classes LlamaChatContentFormatter and LlamaContentFormatter to show a deprecation warning when instantiated. --------- Co-authored-by: Sachin Paryani <saparan@microsoft.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Victor Adan	afa2d85405	community[patch]: Added missing from_documents method to KNNRetriever. (#18411 ) - Description: Added missing `from_documents` method to `KNNRetriever`, providing the ability to supply metadata to LangChain `Document`s, and to give it parity to the other retrievers, which do have `from_documents`. - Issue: None - Dependencies: None - Twitter handle: None Co-authored-by: Victor Adan <vadan@netroadshow.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
Smit Parmar	dfc4177b50	community[patch]: mypy ignore fix (#18483 ) Relates to #17048 Description : Applied fix to dynamodb and elasticsearch file. Error was : `Cannot override writeable attribute with read-only property` Suggestion: instead of adding ``` @messages.setter def messages(self, messages: List[BaseMessage]) -> None: raise NotImplementedError("Use add_messages instead") ``` we can change base class property `messages: List[BaseMessage]` to ``` @property def messages(self) -> List[BaseMessage]:... ``` then we don't need to add `@messages.setter` in all child classes.	2 months ago
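The suggestion in commit dfc4177b50 above (declare `messages` as a read-only property on the base class so subclasses need no `@messages.setter`) can be sketched as below. This is a minimal illustration, not the LangChain classes: `BaseHistory` and `InMemoryHistory` are hypothetical, and messages are plain strings rather than `BaseMessage` objects.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseHistory(ABC):
    """Hypothetical base class: `messages` is a read-only property."""

    @property
    @abstractmethod
    def messages(self) -> List[str]:
        ...


class InMemoryHistory(BaseHistory):
    """Hypothetical subclass: overrides the property, no setter required."""

    def __init__(self) -> None:
        self._messages: List[str] = []

    @property
    def messages(self) -> List[str]:
        # Return a copy so callers cannot mutate the internal list.
        return list(self._messages)

    def add_message(self, message: str) -> None:
        self._messages.append(message)


history = InMemoryHistory()
history.add_message("hello")
```

Because the base attribute is already read-only, a type checker has no writeable attribute to complain about, and assigning to `history.messages` fails at runtime with `AttributeError`.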
Luca Dorigo	f19229c564	core[patch]: fix beta, deprecated typing (#18877 ) Description: While not technically incorrect, the TypeVar used for the `@beta` decorator prevented pyright (and thus most vscode users) from correctly seeing the types of functions/classes decorated with `@beta`. This is in part due to a small bug in pyright (https://github.com/microsoft/pyright/issues/7448 ) - however, the `Type` bound in the typevar `C = TypeVar("C", Type, Callable)` is not doing anything - classes are `Callables` by default, so by my understanding binding to `Type` does not actually provide any more safety - the modified annotation still works correctly for both functions, properties, and classes. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
wulixuan	b7c8bc8268	community[patch]: fix yuan2 errors in LLMs (#19004 ) 1. Fix yuan2 errors when invoking Yuan2. 2. Update tests.	2 months ago
高远	688ca48019	community[patch]: Adding validation when vector does not exist (#19698 ) Adding validation when vector does not exist Co-authored-by: gaoyuan <gaoyuan.20001218@bytedance.com>	2 months ago
Chaunte W. Lacewell	4a49fc5a95	community[patch]: Fix bug in vdms (#19728 ) Description: Fix embedding check in vdms Contribution maintainer: [@cwlacewe](https://github.com/cwlacewe)	2 months ago
高璟琦	75173d31db	community[minor]: Add solar model chat model (#18556 ) Add our solar chat models, available model choices: * solar-1-mini-chat * solar-1-mini-translate-enko * solar-1-mini-translate-koen More documents and pricing can be found at https://console.upstage.ai/services/solar. The references to our solar model can be found at * https://arxiv.org/abs/2402.17032 --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Davide Menini	f7042321f1	community[patch]: gather token usage info in BedrockChat during generation (#19127 ) This PR calculates token usage for prompts and completions directly in the generation method of BedrockChat. The token usage details are then returned together with the generations, so that downstream tasks can access them easily. This makes it possible to define a callback for token tracking and cost calculation, similar to what happens with OpenAI (see [OpenAICallbackHandler](https://api.python.langchain.com/en/latest/_modules/langchain_community/callbacks/openai_info.html#OpenAICallbackHandler)). I plan on adding a BedrockCallbackHandler later. Right now keeping track of tokens in the callback is already possible, but it requires passing the llm, as done here: https://how.wtf/how-to-count-amazon-bedrock-anthropic-tokens-with-langchain.html. However, I find the approach of this PR cleaner. Thanks for your reviews. FYI @baskaryan, @hwchase17 --------- Co-authored-by: taamedag <Davide.Menini@swisscom.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
ligang-super	a662468dde	community[patch]: Fix the error of Baidu Qianfan not passing the stop parameter (#18666 ) - [x] PR title: "community: fix baidu qianfan missing stop parameter" - [x] PR message: - **Description: Baidu Qianfan lost the stop parameter when requesting service due to extracting it from kwargs. This bug can cause the agent to receive incorrect results --------- Co-authored-by: ligang33 <ligang33@baidu.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
kaijietti	9c4b6dc979	community[patch]: fix bug in cohere that `async for` a coroutine in ChatCohere (#19381 ) Without `await`, the `stream` returned from the `async_client` is actually a coroutine, which could not be used in `async for`.	2 months ago
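The failure mode described in commit 9c4b6dc979 above can be reproduced without any Cohere code. In this sketch `open_stream` is a hypothetical stand-in for an async client method that returns its stream from a coroutine; the real client API is not shown here.

```python
import asyncio
from typing import AsyncIterator, List


async def _chunks() -> AsyncIterator[str]:
    """Async generator that plays the role of the streamed response."""
    for piece in ("a", "b", "c"):
        yield piece


async def open_stream() -> AsyncIterator[str]:
    """Hypothetical client call: a coroutine that resolves to an async iterator."""
    return _chunks()


async def collect() -> List[str]:
    out: List[str] = []
    # `async for piece in open_stream()` would raise TypeError here,
    # because the un-awaited call is a coroutine, not an async iterator.
    stream = await open_stream()
    async for piece in stream:
        out.append(piece)
    return out


result = asyncio.run(collect())
```

The single `await` is the whole fix: it turns the coroutine into the object that `async for` can actually iterate.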
Christian Galo	1adaa3c662	community[minor]: Update Azure Cognitive Services to Azure AI Services (#19488 ) This is a follow up to #18371. These are the changes: - New Azure AI Services toolkit and tools to replace those of Azure Cognitive Services. - Updated documentation for Microsoft platform. - The image analysis tool has been rewritten to use the new package `azure-ai-vision-imageanalysis`, doing a proper replacement of `azure-ai-vision`. These changes: - Update outdated naming from "Azure Cognitive Services" to "Azure AI Services". - Update documentation to use non-deprecated methods to create and use agents. - Removes need to depend on yanked python package (`azure-ai-vision`) There is one new dependency that is needed as a replacement to `azure-ai-vision`: - `azure-ai-vision-imageanalysis`. This is optional and declared within a function. There is a new `azure_ai_services.ipynb` notebook showing usage; Changes have been linted and formatted. I am leaving the actions of adding deprecation notices and future removal of Azure Cognitive Services up to the LangChain team, as I am not sure what the current practice around this is. --- If this PR makes it, my handle is @galo@mastodon.social --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: ccurme <chester.curme@gmail.com>	2 months ago
Shengsheng Huang	ac1dd8ad94	community[minor]: migrate `bigdl-llm` to `ipex-llm` (#19518 ) - Description: `bigdl-llm` library has been renamed to [`ipex-llm`](https://github.com/intel-analytics/ipex-llm). This PR migrates the `bigdl-llm` integration to `ipex-llm` . - Issue: N/A. The original PR of `bigdl-llm` is https://github.com/langchain-ai/langchain/pull/17953 - Dependencies: `ipex-llm` library - Contribution maintainer: @shane-huang Updated doc: docs/docs/integrations/llms/ipex_llm.ipynb Updated test: libs/community/tests/integration_tests/llms/test_ipex_llm.py	2 months ago
Chaunte W. Lacewell	a31f692f4e	community[minor]: Add VDMS vectorstore (#19551 ) - Description: Add support for Intel Lab's [Visual Data Management System (VDMS)](https://github.com/IntelLabs/vdms) as a vector store - Dependencies: `vdms` library which requires protobuf = "4.24.2". There is a conflict with dashvector in `langchain` package but conflict is resolved in `community`. - Contribution maintainer: [@cwlacewe](https://github.com/cwlacewe) - Added tests: libs/community/tests/integration_tests/vectorstores/test_vdms.py - Added docs: docs/docs/integrations/vectorstores/vdms.ipynb - Added cookbook: cookbook/multi_modal_RAG_vdms.ipynb --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
William FH	b7b62e29fb	community[patch], mongodb[patch]: Stop spamming SIMD import warnings (#19531 ) If you use an embedding dist function in an eval loop, you get warned every time. Would prefer to just check once and forget about it. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
yongheng.liu	7e29b6061f	community[minor]: integrate China Mobile Ecloud vector search (#15298 ) - Description: integrate China Mobile Ecloud vector search, - Dependencies: elasticsearch==7.10.1 Co-authored-by: liuyongheng <liuyongheng@cmss.chinamobile.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Hyeongchan Kim	9b70131aed	community[patch]: refactor the type hint of `file_path` in `UnstructuredAPIFileLoader` class (#18839 ) * Description: add `None` type for `file_path` along with `str` and `List[str]` types. * `file_path`/`filename` arguments in `get_elements_from_api()` and `partition()` can be `None`, however, there's no `None` type hint for `file_path` in `UnstructuredAPIFileLoader` and `UnstructuredFileLoader` currently. * calling the function with `file_path=None` is no problem, but my IDE annoys me lol. * Issue: N/A * Dependencies: N/A Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
CaroFG	cf96060ab7	community[patch]: update for compatibility with latest Meilisearch version (#18970 ) - Description: Updates Meilisearch vectorstore for compatibility with v1.6 and above. Adds embedders settings and embedder_name which are now required. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
chyroc	be2adb1083	community[patch]: support unstructured_kwargs for s3 loader (#15473 ) fix https://github.com/langchain-ai/langchain/issues/15472 Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Evgenii Zheltonozhskii	5b1f9c6d3a	infra: Consistent lxml requirements (#19520 ) Update the dependency for lxml to be consistent among different packages; should fix https://github.com/langchain-ai/langchain/issues/19040	2 months ago
Tomaz Bratanic	87d2a6b777	community[minor]: Add the option to omit schema refresh in Neo4jGraph (#19654 )	2 months ago
Rajendra Kadam	0019d8a948	community[minor]: Add support for non-file-based Document Loaders in PebbloSafeLoader (#19574 ) Description: PebbloSafeLoader: Add support for non-file-based Document Loaders This pull request enhances PebbloSafeLoader by introducing support for several non-file-based Document Loaders. With this update, PebbloSafeLoader now seamlessly integrates with the following loaders: - GoogleDriveLoader - SlackDirectoryLoader - Unstructured EmailLoader Issue: NA Dependencies: - None Twitter handle: @Raj__725 --------- Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	2 months ago
hulitaitai	dc2c9dd4d7	Update text2vec.py (#19657 ) Add the URL of the embedding tool "text2vec". Fix minor mistakes in the docstring.	2 months ago
Guangdong Liu	7042934b5f	community[patch]: Fix the bug that Chroma does not specify `embedding_function` (#19277 ) - Issue: close #18291 - @baskaryan, @eyurtsev PTAL	2 months ago
yuwenzho	3a7d2cf443	community[minor]: Add ITREX optimized Embeddings (#18474 ) Introduction [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers) is an innovative toolkit designed to accelerate GenAI/LLM everywhere with the optimal performance of Transformer-based models on various Intel platforms Description adding ITREX runtime embeddings using intel-extension-for-transformers. added mdx documentation and example notebooks added embedding import testing. --------- Signed-off-by: yuwenzho <yuwen.zhou@intel.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Fabrizio Ruocco	f12cb0bea4	community[patch]: Microsoft Azure Document Intelligence updates (#16932 ) - Description: Update Azure Document Intelligence implementation by Microsoft team and RAG cookbook with Azure AI Search --------- Co-authored-by: Lu Zhang (AI) <luzhan@microsoft.com> Co-authored-by: Yateng Hong <yatengh@microsoft.com> Co-authored-by: teethache <hongyateng2006@126.com> Co-authored-by: Lu Zhang <44625949+luzhang06@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
Timothy	ad77fa15ee	community[patch]: Adding try-except block for GCSDirectoryLoader (#19591 ) - Description: Implemented try-except block for `GCSDirectoryLoader`. Reason: Users processing large number of unstructured files in a folder may experience many different errors. A try-exception block is added to capture these errors. A new argument `use_try_except=True` is added to enable silent failure so that error caused by processing one file does not break the whole function. - Issue: N/A - Dependencies: no new dependencies - Twitter handle: timothywong731 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
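The per-file error handling described in commit ad77fa15ee above follows a common pattern, sketched here in isolation. `load_all` and `fake_load` are hypothetical and do not touch Google Cloud Storage; the flag name `use_try_except` is taken from the commit message, while its default value in this sketch is an assumption.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger(__name__)


def load_all(
    names: Iterable[str],
    load_one: Callable[[str], str],
    use_try_except: bool = False,  # default chosen for this sketch only
) -> List[str]:
    """Load every item; optionally skip items whose loader raises."""
    documents: List[str] = []
    for name in names:
        if not use_try_except:
            documents.append(load_one(name))
            continue
        try:
            documents.append(load_one(name))
        except Exception as exc:
            # Report the failure and move on to the next file.
            logger.warning("Skipping %s: %s", name, exc)
    return documents


def fake_load(name: str) -> str:
    """Hypothetical loader that fails on one kind of file."""
    if name.endswith(".bad"):
        raise ValueError("cannot parse")
    return name.upper()


loaded = load_all(["a.txt", "b.bad", "c.txt"], fake_load, use_try_except=True)
```

With the flag off, the same input raises on `b.bad` and nothing after it is processed, which is the behavior the commit set out to avoid.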
xsai9101	160a8eb178	community[minor]: add oracle autonomous database doc loader integration (#19536 ) - Description: Adding oracle autonomous database document loader integration. This will allow users to connect to oracle autonomous database through connection string or TNS configuration. https://www.oracle.com/autonomous-database/ - Issue: None - Dependencies: oracledb python package https://pypi.org/project/oracledb/ - Tests and docs: Unit test and doc are added. 
--------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Adam Law	aeb7b6b11d	community[patch]: use semantic_configurations in AzureSearch (#19347 ) - Description: Currently the semantic_configurations are not used when creating an AzureSearch instance, instead creating a new one with default values. This PR changes the behavior to use the passed semantic_configurations if it is present, and the existing default configuration if not. --------- Co-authored-by: Adam Law <adamlaw@microsoft.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2 months ago
Adrian Valente	2763d8cbe5	community: add len() implementation to Chroma (#19419 ) Thank you for contributing to LangChain! - [x] Add len() implementation to Chroma: "package: community" - [x] PR message: - Description: add an implementation of the __len__() method for the Chroma vectorstore, for convenience. - Issue: no exposed method to know the size of a Chroma vectorstore - Dependencies: None - Twitter handle: lowrank_adrian - [x] Add tests and docs - [x] Lint and test --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
Tom Aarsen	e0a1278d2b	docs: HFEmbeddings: Add more information to model_kwargs/encode_kwargs (#19594 ) - Description: Be more explicit with the `model_kwargs` and `encode_kwargs` for `HuggingFaceEmbeddings`. - Issue: - - Dependencies: - I received some reports by my users that they didn't realise that you could change the default `batch_size` with `HuggingFaceEmbeddings`, which may be attributed to how the `model_kwargs` and `encode_kwargs` don't give much information about what you can specify. I've added some parameter names & links to the Sentence Transformers documentation to help clear it up. Let me know if you'd rather have Markdown/Sphinx-style hyperlinks rather than a "bare URL". - Tom Aarsen	2 months ago
Dobiichi-Origami	18e6f9376d	community[Qianfan]: add function_call in additional_kwargs (#19550 ) - Description: add the `function_call` field to `additional_kwargs`, which was missing in the previous version - Dependencies: no new dependencies	2 months ago
mwmajewsk	f7a1fd91b8	community: better support of pathlib paths in document loaders (#18396 ) This arose from the https://github.com/langchain-ai/langchain/pull/18397 problem of document loaders not supporting `pathlib.Path`. This pull request provides more uniform support for Path as an argument. The core ideas for this upgrade: - if a local file path is used as an argument, it should be supported as `pathlib.Path` - if there are external calls that may or may not support pathlib, the argument is immediately converted to `str` - if `self.file_path` is used in a way that allows it to stay a `pathlib.Path` without conversion, it is only converted for the metadata. Twitter handle: https://twitter.com/mwmajewsk	2 months ago
Yuki Watanabe	cfecbda48b	community[minor]: Allow passing `allow_dangerous_deserialization` when loading LLM chain (#18894 ) ### Issue Recently, the new `allow_dangerous_deserialization` flag was introduced for preventing unsafe model deserialization that relies on pickle without user's notice (#18696). Since then some LLMs like Databricks requires passing in this flag with true to instantiate the model. However, this breaks existing functionality to loading such LLMs within a chain using `load_chain` method, because the underlying loader function [load_llm_from_config](`f96dd57501/libs/langchain/langchain/chains/loading.py (L40)`) (and load_llm) ignores keyword arguments passed in. ### Solution This PR fixes this issue by propagating the `allow_dangerous_deserialization` argument to the class loader iff the LLM class has that field. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
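The propagation rule in commit cfecbda48b above (forward `allow_dangerous_deserialization` only if the LLM class has that field) can be sketched as follows. This is an assumption-laden illustration: the dataclasses and `load_llm` below are hypothetical stand-ins, not the LangChain loader or its model classes.

```python
import dataclasses
from typing import Any, Dict, Type

FLAG = "allow_dangerous_deserialization"


@dataclasses.dataclass
class PickleBackedLLM:
    """Hypothetical model class that declares the flag as a field."""

    name: str
    allow_dangerous_deserialization: bool = False


@dataclasses.dataclass
class PlainLLM:
    """Hypothetical model class without the flag."""

    name: str


def load_llm(cls: Type[Any], config: Dict[str, Any], **kwargs: Any) -> Any:
    """Build `cls` from `config`, forwarding the flag only if `cls` has it."""
    params = dict(config)
    field_names = {f.name for f in dataclasses.fields(cls)}
    if FLAG in kwargs and FLAG in field_names:
        params[FLAG] = kwargs[FLAG]
    return cls(**params)


risky = load_llm(PickleBackedLLM, {"name": "a"}, allow_dangerous_deserialization=True)
plain = load_llm(PlainLLM, {"name": "b"}, allow_dangerous_deserialization=True)
```

Checking the class for the field first means classes that never declared the flag are constructed exactly as before, instead of failing on an unexpected keyword argument.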
hulitaitai	d7c14cb6f9	community[minor]: Add embeddings integration for text2vec (#19267 ) Create a class that allows use of the "text2vec" open source embedding model. The model is installed by running 'pip install -U text2vec'. Example of calling the model through LangChain: from langchain_community.embeddings.text2vec import Text2vecEmbeddings embedding = Text2vecEmbeddings() embedding.embed_documents([ "This is a CoSENT(Cosine Sentence) model.", "It maps sentences to a 768 dimensional dense vector space.", ]) embedding.embed_query( "It can be used for text matching or semantic search." ) --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Eugene Yurtsev <eugene@langchain.dev> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2 months ago
Kalyan Mudumby	d27600c6f7	community[patch]: GPTCache pydantic validation error on lookup (#19427 ) Description: this change fixes the pydantic validation error when looking up from GPTCache, the `ChatOpenAI` class returns `ChatGeneration` as response which is not handled. use the existing `_loads_generations` and `_dumps_generations` functions to handle it Trace ``` File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/development/scripts/chatbot-postgres-test.py", line 90, in <module> print(llm.invoke("tell me a joke")) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 166, in invoke self.generate_prompt( File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 544, in generate_prompt return self.generate(prompt_messages, stop=stop, callbacks=callbacks, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 408, in generate raise e File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 398, in generate self._generate_with_cache( File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 585, in _generate_with_cache cache_val = llm_cache.lookup(prompt, llm_string) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_community/cache.py", line 807, in lookup return [ ^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_community/cache.py", line 808, in <listcomp> Generation(generation_dict) for generation_dict in json.loads(res) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/load/serializable.py", line 120, in __init__ super().__init__(**kwargs) File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/pydantic/v1/main.py", line 341, in __init__ raise validation_error pydantic.v1.error_wrappers.ValidationError: 1 validation error for Generation type unexpected value; permitted: 'Generation' (type=value_error.const; given=ChatGeneration; permitted=('Generation',)) ``` Although I don't seem to find any issues here, here's an [issue](https://github.com/zilliztech/GPTCache/issues/585) raised in GPTCache. Please let me know if I need to do anything else Thank you --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
Piyush Jain	72ba738bf5	community[minor]: Improvements for NeptuneRdfGraph, Improve discovery of graph schema using database statistics (#19546 ) Fixes linting for PR [19244](https://github.com/langchain-ai/langchain/pull/19244) --------- Co-authored-by: mhavey <mchavey@gmail.com>	2 months ago
Christophe Bornet	8595c3ab59	community[minor]: Add InMemoryVectorStore to module level imports (#19576 )	2 months ago
Aayush Kataria	03c38005cb	community[patch]: Fixing some caching issues for AzureCosmosDBSemanticCache (#18884 ) Fixing some issues for AzureCosmosDBSemanticCache - Added the entry for "AzureCosmosDBSemanticCache" which was missing in langchain/cache.py - Added application name when creating the MongoClient for the AzureCosmosDBVectorSearch, for tracking purposes. @baskaryan, can you please review this PR, we need this to go in asap. These are just small fixes which we found today in our testing.	2 months ago
Clément Tamines	a6cbb755a7	community[patch]: fix semantic answer bug in AzureSearch vector store (#18938 ) - Description: The `semantic_hybrid_search_with_score_and_rerank` method of `AzureSearch` contains a hardcoded field name "metadata" for the document metadata in the Azure AI Search Index. Adding such a field is optional when creating an Azure AI Search Index, as other snippets from `AzureSearch` test for the existence of this field before trying to access it. Furthermore, the metadata field name shouldn't be hardcoded as "metadata"; it should use the `FIELDS_METADATA` variable that defines this field name instead. In the current implementation, any index without a metadata field named "metadata" will yield an error if a semantic answer is returned by the search in `semantic_hybrid_search_with_score_and_rerank`. - Issue: https://github.com/langchain-ai/langchain/issues/18731 - Prior fix to this bug: This bug was fixed in this PR https://github.com/langchain-ai/langchain/pull/15642 by adding a check for the existence of the metadata field named `FIELDS_METADATA` and retrieving a value for the key called "key" in that metadata if it exists. If the field named `FIELDS_METADATA` was not present, an empty string was returned. This fix was removed in this PR https://github.com/langchain-ai/langchain/pull/15659 (see `ed1ffca911`). @lz-chen: could you confirm this wasn't intentional? - New fix to this bug: I believe there was an oversight in the logic of the fix from [#15642](https://github.com/langchain-ai/langchain/pull/15642) which I explain below. The `semantic_hybrid_search_with_score_and_rerank` method creates a dictionary `semantic_answers_dict` with semantic answers returned by the search as follows. `5c2f7e6b2b/libs/community/langchain_community/vectorstores/azuresearch.py (L574-L581)` The keys in this dictionary are the unique document ids in the index, if I understand the [documentation of semantic answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers) in Azure AI Search correctly. When the method transforms a search result into a `Document` object, an "answer" key is added to the document's metadata. The value for this "answer" key should be the semantic answer returned by the search from this document, if such an answer is returned. The match between a `Document` object and the semantic answers returned by the search should be done through the unique document id, which is used as a key for the `semantic_answers_dict` dictionary. This id is defined in the search result's field named `FIELDS_ID`. I added a check to avoid any error in case no field named `FIELDS_ID` exists in a search result (which shouldn't happen in theory). A benefit of this approach is that this fix should work whether or not the Azure AI Search Index contains a metadata field. @levalencia could you confirm my analysis and test the fix? @raunakshrivastava7 do you agree with the fix? Thanks for the help!	2 months ago
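
The matching logic described in the commit above can be sketched roughly as follows. This is an illustration only: the value of `FIELDS_ID`, the `attach_answers` helper, and the plain-dict shapes of results and answers are assumptions made for the example, not the actual `AzureSearch` code.

```python
from typing import Any, Dict, List

# Assumed value for illustration; AzureSearch defines its own FIELDS_ID constant.
FIELDS_ID = "id"


def attach_answers(
    results: List[Dict[str, Any]],
    semantic_answers: List[Dict[str, str]],
) -> List[Dict[str, Any]]:
    """Attach each semantic answer to the result with the matching document id."""
    # Key the answers by document id (hypothetical "key"/"text" answer shape).
    semantic_answers_dict = {a["key"]: a["text"] for a in semantic_answers}
    docs = []
    for result in results:
        # Guard against a result without the id field (should not happen in theory).
        doc_id = result.get(FIELDS_ID)
        answer = semantic_answers_dict.get(doc_id, "") if doc_id is not None else ""
        docs.append({"content": result.get("content", ""), "answer": answer})
    return docs
```

Because the lookup goes through the document id only, it does not depend on whether the index has a metadata field.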
Anindyadeep	b2a11ce686	community[minor]: Prem AI langchain integration (#19113 ) ### Prem SDK integration in LangChain This PR adds the integration of [PremAI's](https://www.premai.io/) prem-sdk with LangChain. Users can now access deployed models (LLMs/embeddings) and use them within LangChain's ecosystem. ### This PR adds the following: - [x] Add chat support - [X] Adding embedding support - [X] writing integration tests - [X] writing tests for chat - [X] writing tests for embedding - [X] writing unit tests - [X] writing tests for chat - [X] writing tests for embedding - [X] Adding documentation - [X] writing documentation for chat - [X] writing documentation for embedding - [X] run `make test` - [X] run `make lint`, `make lint_diff` - [X] Final checks (spell check, lint, format and overall testing) --------- Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
Souhail Hanfi	cbec43afa9	community[patch]: avoid creating extension PGvector while using readOnly Databases (#19268 ) - Description: The PGVector class always runs "create extension" on init, and this statement crashes on read-only databases (read-only replicas). Weirdly, the subsequent statements (create collection, etc.) work even on read-only databases - Dependencies: no new dependencies - Twitter handle: @VenOmaX666 Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
Barun Amalkumar Halder	9246ec6b36	community[patch] : [Fiddler] ensure dataset is not added if model is present (#19293 ) Description: - minor PR to speed up onboarding by not trying to add a dataset, if a model is already present. - replace batch publish API with streaming when single events are published. Dependencies: any dependencies required for this change Twitter handle: behalder Co-authored-by: Barun Halder <barun@fiddler.ai>	2 months ago
JSDu	6e090280fd	community[patch]: milvus will autoflush, manual flush is slowly (#19300 ) reference: https://milvus.io/docs/configure_quota_limits.md#quotaAndLimitsflushRateenabled https://github.com/milvus-io/milvus/issues/31407 Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
mackong	e65dc4b95b	community[patch]: clean warning when delete by ids (#19301 ) * Description: rearrange to avoid a variable overwrite, which always caused a warning. * Issue: N/A * Dependencies: N/A	2 months ago
Stefano Mosconi	01fc69c191	community[patch]: expanding version in confluence loader (#19324 ) Description: Expanding version in all the Confluence API calls so to get when the page was last modified/created in all cases. Issue: #12812 Twitter handle: zzste	2 months ago
Dmitry Tyumentsev	08b769d539	community[patch]: YandexGPT Use recent yandexcloud sdk version (#19341 ) Fixed inability to work with [yandexcloud SDK](https://pypi.org/project/yandexcloud/) versions higher than 0.265.0	2 months ago
Marlene	f1313339ac	community[patch]: Fixing incorrect base URLs for Azure Cognitive Search Retriever (#19352 ) This PR adds code to make sure that the correct base URL is being created for the Azure Cognitive Search retriever. At the moment an incorrect base URL is being generated. I think this is happening because the original code was based on a deprecated API version. No dependencies need to be added. I've also added more context to the test doc strings. I should also note that ACS is now Azure AI Search. I will open a separate PR to make these changes as that would be a breaking change and should potentially be discussed. Twitter: @marlene_zw - No new tests added, however the current ACS retriever tests are now passing when I run them. - Code was linted. Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
FinTech秋田	03ba1d4731	community[patch]: Add Support for GPU Index Types in Milvus 2.4 (#19468 ) - Description: This commit introduces support for the newly available GPU index types introduced in Milvus 2.4 within the LangChain project's `milvus.py`. With the release of Milvus 2.4, a range of GPU-accelerated index types have been added, offering enhanced search capabilities and performance optimizations for vector search operations. This update ensures LangChain users can fully utilize the new performance benefits for vector search operations. - Reference: https://milvus.io/docs/gpu_index.md Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2 months ago
Ash Vardanian	d01bad5169	core[patch]: Convert SimSIMD back to NumPy (#19473 ) This patch fixes the #18022 issue, converting the SimSIMD internal zero-copy outputs to NumPy. I've also noticed, that oftentimes `dtype=np.float32` conversion is used before passing to SimSIMD. Which numeric types do LangChain users generally care about? We support `float64`, `float32`, `float16`, and `int8` for cosine distances and `float16` seems reasonable for practically any kind of embeddings and any modern piece of hardware, so we can change that part as well 🤗	2 months ago
Mikelarg	dac2e0165a	community[minor]: Added GigaChat Embeddings support + updated previous GigaChat integration (#19516 ) - Description: Added integration with [GigaChat](https://developers.sber.ru/portal/products/gigachat) embeddings. Also added support for extra fields in GigaChat LLM and fixed docs.	2 months ago
Martin Kolb	e5bdb26f76	community[patch]: More flexible handling for entity names in vector store "HANA Cloud" (#19523 ) - Description: Added support for lower-case and mixed-case names. The names for tables and columns previously had to be UPPER_CASE. With this enhancement, lower_case and MixedCase are also supported. - Issue: N/A - Dependencies: no new dependencies added - Twitter handle: @sapopensource	2 months ago
billytrend-cohere	63343b4987	cohere[patch]: add cohere as a partner package (#19049 ) Description: adds support for langchain_cohere --------- Co-authored-by: Harry M <127103098+harry-cohere@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2 months ago
ccurme	82de8fd6c9	add kwargs (#19519 ) `HanaDB.add_texts` is missing **kwargs.	2 months ago
Nikhil Kumar	3d3b46a782	docs: Update docs for `HuggingFacePipeline` (#19306 ) Updated `HuggingFacePipeline` docs to be in sync with list of supported tasks, including translation. - [x] PR title: "community: Update docs for `HuggingFacePipeline`" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [x] PR message: - Description: Update docs for `HuggingFacePipeline`, was earlier missing `translation` as a valid task - Issue: N/A - Dependencies: N/A - Twitter handle: None - [x] Add tests and docs: - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/	3 months ago
Igor Muniz Soares	743f888580	community[minor]: Dappier chat model integration (#19370 ) Description: This PR adds [Dappier](https://dappier.com/) for the chat model. It supports generate, async generate, and batch functionalities. We added unit and integration tests as well as a notebook with more details about our chat model. Dependencies: No extra dependencies are needed.	3 months ago
Hugoberry	96dc180883	community[minor]: Add `DuckDB` as a vectorstore (#18916 ) DuckDB has a cosine similarity function along list and array data types, which can be used as a vector store. - Description: The latest version of DuckDB features a cosine similarity function, which can be used with its support for list or array column types. This PR surfaces this functionality to langchain. - Dependencies: duckdb 0.10.0 - Twitter handle: @igocrite --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
preak95	6ea3e57a63	community[minor]: S3FileLoader to use expose mode and post_processors arguments of unstructured loader (#19270 ) Description: Update s3_file.py to use arguments mode and post_processors from the base class UnstructuredBaseLoader to include more metadata about the files from the S3 bucket such as 'page_number', 'languages' etc. Issue: NA Dependencies: None Twitter handle: preak95 --------- Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
fengjial	3b52ee05d1	community[patch]: fix bugs in baiduvectordb as vectorstore (#19380 ) fix small bugs in vectorstore/baiduvectordb	3 months ago


1121 Commits (master)