langchain

Commit Graph

Author	SHA1	Message	Date
高璟琦	ec7a59c96c	community[minor]: Add solar embedding (#19761 ) Solar is a large language model developed by [Upstage](https://upstage.ai/). It's a powerful and purpose-trained LLM. You can visit the embedding service provided by Solar within this pr. You may get SOLAR_API_KEY from https://console.upstage.ai/services/embedding You can refer to more details about accepted llm integration at https://python.langchain.com/docs/integrations/llms/solar.	3 months ago
Tomaz Bratanic	dec00d3050	community[patch]: Add the ability to pass maps to neo4j retrieval query (#19758 ) Makes it easier to flatten complex values to text, so you don't have to use a lot of Cypher to do it.	3 months ago
Robby	f7e8a382cc	community[minor]: add hugging face text-to-speech inference API (#18880 ) Description: I implemented a tool to use Hugging Face text-to-speech inference API. Issue: n/a Dependencies: n/a Twitter handle: No Twitter, but do have [LinkedIn](https://www.linkedin.com/in/robby-horvath/) lol. --------- Co-authored-by: Robby <h0rv@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	3 months ago
DasDingoCodes	73eb3f8fd9	community[minor]: Implement DirectoryLoader lazy_load function (#19537 ) Thank you for contributing to LangChain! - [x] PR title: "community: Implement DirectoryLoader lazy_load function" - [x] Description: The `lazy_load` function of the `DirectoryLoader` yields each document separately. If the given `loader_cls` of the `DirectoryLoader` also implemented `lazy_load`, it will be used to yield subdocuments of the file. - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access: `libs/community/tests/unit_tests/document_loaders/test_directory_loader.py` 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory: `docs/docs/integrations/document_loaders/directory.ipynb` - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	3 months ago
Jialei	f7c903e24a	community[minor]: add support for Moonshot llm and chat model (#17100 )	3 months ago
Ethan Yang	7164015135	community[minor]: Add Openvino embedding support (#19632 ) This PR is used to support both HF and BGE embeddings with openvino --------- Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>	3 months ago
T Cramer	540ebf35a9	community[patch]: Add explicit error message to Bedrock error output. (#17328 ) - Description: Propagate Bedrock errors into Langchain explicitly. Use-case: unset region error is hidden behind 'Could not load credentials...' message - Issue: [17654](https://github.com/langchain-ai/langchain/issues/17654) - Dependencies: None --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
Marcus Virginia	69bb96c80f	community[patch]: surrealdb handle for empty metadata and allow collection names with complex characters (#17374 ) - Description: Handle for empty metadata and allow collection names with complex characters - Issue: #17057 - Dependencies: `surrealdb` --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
kYLe	124ab79c23	community[minor]: Add Anyscale embedding support (#17605 ) Description: Add embedding model support for Anyscale Endpoint Dependencies: openai --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
Lance Martin	12843f292f	community[patch]: llama cpp embeddings reset default n_batch (#17594 ) When testing Nomic embeddings -- ``` from langchain_community.embeddings import LlamaCppEmbeddings embd_model_path = "/Users/rlm/Desktop/Code/llama.cpp/models/nomic-embd/nomic-embed-text-v1.Q4_K_S.gguf" embd_lc = LlamaCppEmbeddings(model_path=embd_model_path) embedding_lc = embd_lc.embed_query(query) ``` We were seeing this error for strings > a certain size -- ``` File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/llama.py:827, in Llama.embed(self, input, normalize, truncate, return_count) 824 s_sizes = [] 826 # add to batch --> 827 self._batch.add_sequence(tokens, len(s_sizes), False) 828 t_batch += n_tokens 829 s_sizes.append(n_tokens) File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/_internals.py:542, in _LlamaBatch.add_sequence(self, batch, seq_id, logits_all) 540 self.batch.token[j] = batch[i] 541 self.batch.pos[j] = i --> 542 self.batch.seq_id[j][0] = seq_id 543 self.batch.n_seq_id[j] = 1 544 self.batch.logits[j] = logits_all ValueError: NULL pointer access ``` The default `n_batch` of llama-cpp-python's Llama is `512` but we were explicitly setting it to `8`. These need to be set to equal for embedding models. * The embedding.cpp example has an assertion to make sure these are always equal. * Apparently this is not being done properly in llama-cpp-python. With `n_batch` set to 8, if more than 8 tokens are passed the batch runs out of space and it crashes. This also explains why the CPU compute buffer size was small: raw client with default `n_batch=512` ``` llama_new_context_with_model: CPU input buffer size = 3.51 MiB llama_new_context_with_model: CPU compute buffer size = 21.00 MiB ``` langchain with `n_batch=8` ``` llama_new_context_with_model: CPU input buffer size = 0.04 MiB llama_new_context_with_model: CPU compute buffer size = 0.33 MiB ``` We can work around this by passing `n_batch=512`, but this will not be obvious to some users: ``` embedding = LlamaCppEmbeddings(model_path=embd_model_path, n_batch=512) ``` From discussion w/ @cebtenzzre. Related: https://github.com/abetlen/llama-cpp-python/issues/1189 Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
Zijian Han	8e976545f3	community[patch]: support OpenAI whisper base url (#17695 ) Description: The base URL for OpenAI is retrieved from the environment variable "OPENAI_BASE_URL", whereas for langchain it is obtained from "OPENAI_API_BASE". By adding `base_url = os.environ.get("OPENAI_API_BASE")`, the OpenAI proxy can execute correctly. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
Paulo Nascimento	44a3484503	community[patch]: add NotebookLoader unit test (#17721 ) Thank you for contributing to LangChain! - Description: added unit tests for NotebookLoader. Linked PR: https://github.com/langchain-ai/langchain/pull/17614 - Issue: [#17614](https://github.com/langchain-ai/langchain/pull/17614) - Twitter handle: @paulodoestech - [x] Pass lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified to check that you're passing lint and testing. See contribution guidelines for more information on how to write/run tests, lint, etc: https://python.langchain.com/docs/contributing/ - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: lachiewalker <lachiewalker1@hotmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
Paulo Nascimento	4c3a67122f	community[patch]: add Integration for OpenAI image gen with v1 sdk (#17771 ) Description: Created a Langchain Tool for OpenAI DALLE Image Generation. Issue: [#15901](https://github.com/langchain-ai/langchain/issues/15901) Dependencies: n/a Twitter handle: @paulodoestech - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
Jiaming	3d3cc71287	community[patch]: fix bugs for bilibili Loader (#18036 ) - Description: 1. Fix the BiliBiliLoader that can receive cookie parameters, it requires 3 other parameters to run. The change is backward compatible. 2. Add test; 3. Add example in docs - Issue: [#14213] Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
Sachin Paryani	25c9f3d1d1	community[patch]: Support Streaming in Azure Machine Learning (#18246 ) - [x] PR title: "community: Support streaming in Azure ML and few naming changes" - [x] PR message: - Description: Added support for streaming for azureml_endpoint. Also, renamed and AzureMLEndpointApiType.realtime to AzureMLEndpointApiType.dedicated. Also, added new classes CustomOpenAIChatContentFormatter and CustomOpenAIContentFormatter and updated the classes LlamaChatContentFormatter and LlamaContentFormatter to now show a deprecated warning message when instantiated. --------- Co-authored-by: Sachin Paryani <saparan@microsoft.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
Victor Adan	afa2d85405	community[patch]: Added missing from_documents method to KNNRetriever. (#18411 ) - Description: Added missing `from_documents` method to `KNNRetriever`, providing the ability to supply metadata to LangChain `Document`s, and to give it parity to the other retrievers, which do have `from_documents`. - Issue: None - Dependencies: None - Twitter handle: None Co-authored-by: Victor Adan <vadan@netroadshow.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
Smit Parmar	dfc4177b50	community[patch]: mypy ignore fix (#18483 ) Relates to #17048 Description : Applied fix to dynamodb and elasticsearch file. Error was : `Cannot override writeable attribute with read-only property` Suggestion: instead of adding ``` @messages.setter def messages(self, messages: List[BaseMessage]) -> None: raise NotImplementedError("Use add_messages instead") ``` we can change base class property `messages: List[BaseMessage]` to ``` @property def messages(self) -> List[BaseMessage]:... ``` then we don't need to add `@messages.setter` in all child classes.	3 months ago
Luca Dorigo	f19229c564	core[patch]: fix beta, deprecated typing (#18877 ) Description: While not technically incorrect, the TypeVar used for the `@beta` decorator prevented pyright (and thus most vscode users) from correctly seeing the types of functions/classes decorated with `@beta`. This is in part due to a small bug in pyright (https://github.com/microsoft/pyright/issues/7448 ) - however, the `Type` bound in the typevar `C = TypeVar("C", Type, Callable)` is not doing anything - classes are `Callables` by default, so by my understanding binding to `Type` does not actually provide any more safety - the modified annotation still works correctly for both functions, properties, and classes. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
wulixuan	b7c8bc8268	community[patch]: fix yuan2 errors in LLMs (#19004 ) 1. fix yuan2 errors while invoke Yuan2. 2. update tests.	3 months ago
高远	688ca48019	community[patch]: Adding validation when vector does not exist (#19698 ) Adding validation when vector does not exist Co-authored-by: gaoyuan <gaoyuan.20001218@bytedance.com>	3 months ago
Chaunte W. Lacewell	4a49fc5a95	community[patch]: Fix bug in vdms (#19728 ) Description: Fix embedding check in vdms Contribution maintainer: [@cwlacewe](https://github.com/cwlacewe)	3 months ago
高璟琦	75173d31db	community[minor]: Add solar model chat model (#18556 ) Add our solar chat models, available model choices: * solar-1-mini-chat * solar-1-mini-translate-enko * solar-1-mini-translate-koen More documents and pricing can be found at https://console.upstage.ai/services/solar. The references to our solar model can be found at * https://arxiv.org/abs/2402.17032 --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
Davide Menini	f7042321f1	community[patch]: gather token usage info in BedrockChat during generation (#19127 ) This PR allows to calculate token usage for prompts and completion directly in the generation method of BedrockChat. The token usage details are then returned together with the generations, so that other downstream tasks can access them easily. This allows to define a callback for tokens tracking and cost calculation, similarly to what happens with OpenAI (see [OpenAICallbackHandler](https://api.python.langchain.com/en/latest/_modules/langchain_community/callbacks/openai_info.html#OpenAICallbackHandler). I plan on adding a BedrockCallbackHandler later. Right now keeping track of tokens in the callback is already possible, but it requires passing the llm, as done here: https://how.wtf/how-to-count-amazon-bedrock-anthropic-tokens-with-langchain.html. However, I find the approach of this PR cleaner. Thanks for your reviews. FYI @baskaryan, @hwchase17 --------- Co-authored-by: taamedag <Davide.Menini@swisscom.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
ligang-super	a662468dde	community[patch]: Fix the error of Baidu Qianfan not passing the stop parameter (#18666 ) - [x] PR title: "community: fix baidu qianfan missing stop parameter" - [x] PR message: - **Description: Baidu Qianfan lost the stop parameter when requesting service due to extracting it from kwargs. This bug can cause the agent to receive incorrect results --------- Co-authored-by: ligang33 <ligang33@baidu.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
kaijietti	9c4b6dc979	community[patch]: fix bug in cohere that `async for` a coroutine in ChatCohere (#19381 ) Without `await`, the `stream` returned from the `async_client` is actually a coroutine, which could not be used in `async for`.	3 months ago
Christian Galo	1adaa3c662	community[minor]: Update Azure Cognitive Services to Azure AI Services (#19488 ) This is a follow up to #18371. These are the changes: - New Azure AI Services toolkit and tools to replace those of Azure Cognitive Services. - Updated documentation for Microsoft platform. - The image analysis tool has been rewritten to use the new package `azure-ai-vision-imageanalysis`, doing a proper replacement of `azure-ai-vision`. These changes: - Update outdated naming from "Azure Cognitive Services" to "Azure AI Services". - Update documentation to use non-deprecated methods to create and use agents. - Removes need to depend on yanked python package (`azure-ai-vision`) There is one new dependency that is needed as a replacement to `azure-ai-vision`: - `azure-ai-vision-imageanalysis`. This is optional and declared within a function. There is a new `azure_ai_services.ipynb` notebook showing usage; Changes have been linted and formatted. I am leaving the actions of adding deprecation notices and future removal of Azure Cognitive Services up to the LangChain team, as I am not sure what the current practice around this is. --- If this PR makes it, my handle is @galo@mastodon.social --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: ccurme <chester.curme@gmail.com>	3 months ago
Shengsheng Huang	ac1dd8ad94	community[minor]: migrate `bigdl-llm` to `ipex-llm` (#19518 ) - Description: `bigdl-llm` library has been renamed to [`ipex-llm`](https://github.com/intel-analytics/ipex-llm). This PR migrates the `bigdl-llm` integration to `ipex-llm` . - Issue: N/A. The original PR of `bigdl-llm` is https://github.com/langchain-ai/langchain/pull/17953 - Dependencies: `ipex-llm` library - Contribution maintainer: @shane-huang Updated doc: docs/docs/integrations/llms/ipex_llm.ipynb Updated test: libs/community/tests/integration_tests/llms/test_ipex_llm.py	3 months ago
Chaunte W. Lacewell	a31f692f4e	community[minor]: Add VDMS vectorstore (#19551 ) - Description: Add support for Intel Lab's [Visual Data Management System (VDMS)](https://github.com/IntelLabs/vdms) as a vector store - Dependencies: `vdms` library which requires protobuf = "4.24.2". There is a conflict with dashvector in `langchain` package but conflict is resolved in `community`. - Contribution maintainer: [@cwlacewe](https://github.com/cwlacewe) - Added tests: libs/community/tests/integration_tests/vectorstores/test_vdms.py - Added docs: docs/docs/integrations/vectorstores/vdms.ipynb - Added cookbook: cookbook/multi_modal_RAG_vdms.ipynb --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
William FH	b7b62e29fb	community[patch], mongodb[patch]: Stop spamming SIMD import warnings (#19531 ) If you use an embedding dist function in an eval loop, you get warned every time. Would prefer to just check once and forget about it. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
yongheng.liu	7e29b6061f	community[minor]: integrate China Mobile Ecloud vector search (#15298 ) - Description: integrate China Mobile Ecloud vector search, - Dependencies: elasticsearch==7.10.1 Co-authored-by: liuyongheng <liuyongheng@cmss.chinamobile.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
Hyeongchan Kim	9b70131aed	community[patch]: refactor the type hint of `file_path` in `UnstructuredAPIFileLoader` class (#18839 ) * Description: add `None` type for `file_path` along with `str` and `List[str]` types. * `file_path`/`filename` arguments in `get_elements_from_api()` and `partition()` can be `None`, however, there's no `None` type hint for `file_path` in `UnstructuredAPIFileLoader` and `UnstructuredFileLoader` currently. * calling the function with `file_path=None` is no problem, but my IDE annoys me lol. * Issue: N/A * Dependencies: N/A Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
CaroFG	cf96060ab7	community[patch]: update for compatibility with latest Meilisearch version (#18970 ) - Description: Updates Meilisearch vectorstore for compatibility with v1.6 and above. Adds embedders settings and embedder_name which are now required. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
chyroc	be2adb1083	community[patch]: support unstructured_kwargs for s3 loader (#15473 ) fix https://github.com/langchain-ai/langchain/issues/15472 Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
Evgenii Zheltonozhskii	5b1f9c6d3a	infra: Consistent lxml requirements (#19520 ) Update the dependency for lxml to be consistent among different packages; should fix https://github.com/langchain-ai/langchain/issues/19040	3 months ago
Tomaz Bratanic	87d2a6b777	community[minor]: Add the option to omit schema refresh in Neo4jGraph (#19654 )	3 months ago
Rajendra Kadam	0019d8a948	community[minor]: Add support for non-file-based Document Loaders in PebbloSafeLoader (#19574 ) Description: PebbloSafeLoader: Add support for non-file-based Document Loaders This pull request enhances PebbloSafeLoader by introducing support for several non-file-based Document Loaders. With this update, PebbloSafeLoader now seamlessly integrates with the following loaders: - GoogleDriveLoader - SlackDirectoryLoader - Unstructured EmailLoader Issue: NA Dependencies: - None Twitter handle: @Raj__725 --------- Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	3 months ago
hulitaitai	dc2c9dd4d7	Update text2vec.py (#19657 ) Add that URL of the embedding tool "text2vec". Fix minor mistakes in the doc-string.	3 months ago
Guangdong Liu	7042934b5f	community[patch]: Fix the bug that Chroma does not specify `embedding_function` (#19277 ) - Issue: close #18291 - @baskaryan, @eyurtsev PTAL	3 months ago
yuwenzho	3a7d2cf443	community[minor]: Add ITREX optimized Embeddings (#18474 ) Introduction [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers) is an innovative toolkit designed to accelerate GenAI/LLM everywhere with the optimal performance of Transformer-based models on various Intel platforms Description adding ITREX runtime embeddings using intel-extension-for-transformers. added mdx documentation and example notebooks added embedding import testing. --------- Signed-off-by: yuwenzho <yuwen.zhou@intel.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
Fabrizio Ruocco	f12cb0bea4	community[patch]: Microsoft Azure Document Intelligence updates (#16932 ) - Description: Update Azure Document Intelligence implementation by Microsoft team and RAG cookbook with Azure AI Search --------- Co-authored-by: Lu Zhang (AI) <luzhan@microsoft.com> Co-authored-by: Yateng Hong <yatengh@microsoft.com> Co-authored-by: teethache <hongyateng2006@126.com> Co-authored-by: Lu Zhang <44625949+luzhang06@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
Timothy	ad77fa15ee	community[patch]: Adding try-except block for GCSDirectoryLoader (#19591 ) - Description: Implemented try-except block for `GCSDirectoryLoader`. Reason: Users processing large number of unstructured files in a folder may experience many different errors. A try-exception block is added to capture these errors. A new argument `use_try_except=True` is added to enable silent failure so that error caused by processing one file does not break the whole function. - Issue: N/A - Dependencies: no new dependencies - Twitter handle: timothywong731 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
xsai9101	160a8eb178	community[minor]: add oracle autonomous database doc loader integration (#19536 ) Thank you for contributing to LangChain! - [ ] PR title: "package: description" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [ ] PR message: *Delete this entire checklist* and replace with - Description: Adding oracle autonomous database document loader integration. This will allow users to connect to oracle autonomous database through connection string or TNS configuration. https://www.oracle.com/autonomous-database/ - Issue: None - Dependencies: oracledb python package https://pypi.org/project/oracledb/ - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [ ] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. Unit test and doc are added. - [ ] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
Adam Law	aeb7b6b11d	community[patch]: use semantic_configurations in AzureSearch (#19347 ) - Description: Currently the semantic_configurations are not used when creating an AzureSearch instance, instead creating a new one with default values. This PR changes the behavior to use the passed semantic_configurations if it is present, and the existing default configuration if not. --------- Co-authored-by: Adam Law <adamlaw@microsoft.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
Adrian Valente	2763d8cbe5	community: add len() implementation to Chroma (#19419 ) Thank you for contributing to LangChain! - [x] Add len() implementation to Chroma: "package: community" - [x] PR message: - Description: add an implementation of the __len__() method for the Chroma vectostore, for convenience. - Issue: no exposed method to know the size of a Chroma vectorstore - Dependencies: None - Twitter handle: lowrank_adrian - [x] Add tests and docs - [x] Lint and test --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
Tom Aarsen	e0a1278d2b	docs: HFEmbeddings: Add more information to model_kwargs/encode_kwargs (#19594 ) - Description: Be more explicit with the `model_kwargs` and `encode_kwargs` for `HuggingFaceEmbeddings`. - Issue: - - Dependencies: - I received some reports by my users that they didn't realise that you could change the default `batch_size` with `HuggingFaceEmbeddings`, which may be attributed to how the `model_kwargs` and `encode_kwargs` don't give much information about what you can specify. I've added some parameter names & links to the Sentence Transformers documentation to help clear it up. Let me know if you'd rather have Markdown/Sphinx-style hyperlinks rather than a "bare URL". - Tom Aarsen	3 months ago
Dobiichi-Origami	18e6f9376d	community[Qianfan]: add function_call in additional_kwargs (#19550 ) - Description: add lacked `function_call` field in `additional_kwargs` in previous version - Dependencies: None of new dependency	3 months ago
mwmajewsk	f7a1fd91b8	community: better support of pathlib paths in document loaders (#18396 ) So this arose from the https://github.com/langchain-ai/langchain/pull/18397 problem of document loaders not supporting `pathlib.Path`. This pull request provides more uniform support for Path as an argument. The core ideas for this upgrade: - if there is a local file path used as an argument, it should be supported as `pathlib.Path` - if there are some external calls that may or may not support Pathlib, the argument is immidiately converted to `str` - if there `self.file_path` is used in a way that it allows for it to stay pathlib without conversion, is is only converted for the metadata. Twitter handle: https://twitter.com/mwmajewsk	3 months ago
Yuki Watanabe	cfecbda48b	community[minor]: Allow passing `allow_dangerous_deserialization` when loading LLM chain (#18894 ) ### Issue Recently, the new `allow_dangerous_deserialization` flag was introduced for preventing unsafe model deserialization that relies on pickle without user's notice (#18696). Since then some LLMs like Databricks requires passing in this flag with true to instantiate the model. However, this breaks existing functionality to loading such LLMs within a chain using `load_chain` method, because the underlying loader function [load_llm_from_config](`f96dd57501/libs/langchain/langchain/chains/loading.py (L40)`) (and load_llm) ignores keyword arguments passed in. ### Solution This PR fixes this issue by propagating the `allow_dangerous_deserialization` argument to the class loader iff the LLM class has that field. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
hulitaitai	d7c14cb6f9	community[minor]: Add embeddings integration for text2vec (#19267 ) Create a Class which allows to use the "text2vec" open source embedding model. It should install the model by running 'pip install -U text2vec'. Example to call the model through LangChain: from langchain_community.embeddings.text2vec import Text2vecEmbeddings embedding = Text2vecEmbeddings() bookend.embed_documents([ "This is a CoSENT(Cosine Sentence) model.", "It maps sentences to a 768 dimensional dense vector space.", ]) bookend.embed_query( "It can be used for text matching or semantic search." ) --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Eugene Yurtsev <eugene@langchain.dev> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	3 months ago
Kalyan Mudumby	d27600c6f7	community[patch]: GPTCache pydantic validation error on lookup (#19427 ) Description: this change fixes the pydantic validation error when looking up from GPTCache, the `ChatOpenAI` class returns `ChatGeneration` as response which is not handled. use the existing `_loads_generations` and `_dumps_generations` functions to handle it Trace ``` File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/development/scripts/chatbot-postgres-test.py", line 90, in <module> print(llm.invoke("tell me a joke")) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 166, in invoke self.generate_prompt( File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 544, in generate_prompt return self.generate(prompt_messages, stop=stop, callbacks=callbacks, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 408, in generate raise e File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 398, in generate self._generate_with_cache( File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 585, in _generate_with_cache cache_val = llm_cache.lookup(prompt, llm_string) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_community/cache.py", line 807, in lookup return [ ^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_community/cache.py", line 808, in <listcomp> Generation(generation_dict) for generation_dict in json.loads(res) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/load/serializable.py", line 120, in __init__ super().__init__(**kwargs) File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/pydantic/v1/main.py", line 341, in __init__ raise validation_error pydantic.v1.error_wrappers.ValidationError: 1 validation error for Generation type unexpected value; permitted: 'Generation' (type=value_error.const; given=ChatGeneration; permitted=('Generation',)) ``` Although I don't seem to find any issues here, here's an [issue](https://github.com/zilliztech/GPTCache/issues/585) raised in GPTCache. Please let me know if I need to do anything else Thank you --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
Piyush Jain	72ba738bf5	community[minor]: Improvements for NeptuneRdfGraph, Improve discovery of graph schema using database statistics (#19546 ) Fixes linting for PR [19244](https://github.com/langchain-ai/langchain/pull/19244) --------- Co-authored-by: mhavey <mchavey@gmail.com>	3 months ago
Christophe Bornet	8595c3ab59	community[minor]: Add InMemoryVectorStore to module level imports (#19576 )	3 months ago
Aayush Kataria	03c38005cb	community[patch]: Fixing some caching issues for AzureCosmosDBSemanticCache (#18884 ) Fixing some issues for AzureCosmosDBSemanticCache - Added the entry for "AzureCosmosDBSemanticCache" which was missing in langchain/cache.py - Added application name when creating the MongoClient for the AzureCosmosDBVectorSearch, for tracking purposes. @baskaryan, can you please review this PR, we need this to go in asap. These are just small fixes which we found today in our testing.	3 months ago
Clément Tamines	a6cbb755a7	community[patch]: fix semantic answer bug in AzureSearch vector store (#18938 ) - Description: The `semantic_hybrid_search_with_score_and_rerank` method of `AzureSearch` contains a hardcoded field name "metadata" for the document metadata in the Azure AI Search Index. Adding such a field is optional when creating an Azure AI Search Index, as other snippets from `AzureSearch` test for the existence of this field before trying to access it. Furthermore, the metadata field name shouldn't be hardcoded as "metadata" and use the `FIELDS_METADATA` variable that defines this field name instead. In the current implementation, any index without a metadata field named "metadata" will yield an error if a semantic answer is returned by the search in `semantic_hybrid_search_with_score_and_rerank`. - Issue: https://github.com/langchain-ai/langchain/issues/18731 - Prior fix to this bug: This bug was fixed in this PR https://github.com/langchain-ai/langchain/pull/15642 by adding a check for the existence of the metadata field named `FIELDS_METADATA` and retrieving a value for the key called "key" in that metadata if it exists. If the field named `FIELDS_METADATA` was not present, an empty string was returned. This fix was removed in this PR https://github.com/langchain-ai/langchain/pull/15659 (see `ed1ffca911`#). @lz-chen: could you confirm this wasn't intentional? - New fix to this bug: I believe there was an oversight in the logic of the fix from [#1564](https://github.com/langchain-ai/langchain/pull/15642) which I explain below. The `semantic_hybrid_search_with_score_and_rerank` method creates a dictionary `semantic_answers_dict` with semantic answers returned by the search as follows. `5c2f7e6b2b/libs/community/langchain_community/vectorstores/azuresearch.py (L574-L581)` The keys in this dictionary are the unique document ids in the index, if I understand the [documentation of semantic answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers) in Azure AI Search correctly. When the method transforms a search result into a `Document` object, an "answer" key is added to the document's metadata. The value for this "answer" key should be the semantic answer returned by the search from this document, if such an answer is returned. The match between a `Document` object and the semantic answers returned by the search should be done through the unique document id, which is used as a key for the `semantic_answers_dict` dictionary. This id is defined in the search result's field named `FIELDS_ID`. I added a check to avoid any error in case no field named `FIELDS_ID` exists in a search result (which shouldn't happen in theory). A benefit of this approach is that this fix should work whether or not the Azure AI Search Index contains a metadata field. @levalencia could you confirm my analysis and test the fix? @raunakshrivastava7 do you agree with the fix? Thanks for the help!	3 months ago
Anindyadeep	b2a11ce686	community[minor]: Prem AI langchain integration (#19113 ) ### Prem SDK integration in LangChain This PR adds the integration with [PremAI's](https://www.premai.io/) prem-sdk with langchain. User can now access to deployed models (llms/embeddings) and use it with langchain's ecosystem. This PR adds the following: ### This PR adds the following: - [x] Add chat support - [X] Adding embedding support - [X] writing integration tests - [X] writing tests for chat - [X] writing tests for embedding - [X] writing unit tests - [X] writing tests for chat - [X] writing tests for embedding - [X] Adding documentation - [X] writing documentation for chat - [X] writing documentation for embedding - [X] run `make test` - [X] run `make lint`, `make lint_diff` - [X] Final checks (spell check, lint, format and overall testing) --------- Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
Souhail Hanfi	cbec43afa9	community[patch]: avoid creating extension PGvector while using readOnly Databases (#19268 ) - Description: PgVector class always runs "create extension" on init and this statement crashes on ReadOnly databases (read only replicas). but wierdly the next create collection etc work even in readOnly databases - Dependencies: no new dependencies - Twitter handle: @VenOmaX666 Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
Barun Amalkumar Halder	9246ec6b36	community[patch] : [Fiddler] ensure dataset is not added if model is present (#19293 ) Description: - minor PR to speed up onboarding by not trying to add a dataset, if a model is already present. - replace batch publish API with streaming when single events are published. Dependencies: any dependencies required for this change Twitter handle: behalder Co-authored-by: Barun Halder <barun@fiddler.ai>	3 months ago
JSDu	6e090280fd	community[patch]: milvus will autoflush, manual flush is slowly (#19300 ) reference: https://milvus.io/docs/configure_quota_limits.md#quotaAndLimitsflushRateenabled https://github.com/milvus-io/milvus/issues/31407 Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
mackong	e65dc4b95b	community[patch]: clean warning when delete by ids (#19301 ) * Description: rearrange to avoid variable overwrite, which cause warning always. * Issue: N/A * Dependencies: N/A	3 months ago
Stefano Mosconi	01fc69c191	community[patch]: expanding version in confluence loader (#19324 ) Description: Expanding version in all the Confluence API calls so to get when the page was last modified/created in all cases. Issue: #12812 Twitter handle: zzste	3 months ago
Dmitry Tyumentsev	08b769d539	community[patch]: YandexGPT Use recent yandexcloud sdk version (#19341 ) Fixed inability to work with [yandexcloud SDK](https://pypi.org/project/yandexcloud/) version higher 0.265.0	3 months ago
Marlene	f1313339ac	community[patch]: Fixing incorrect base URLs for Azure Cognitive Search Retriever (#19352 ) This PR adds code to make sure that the correct base URL is being created for the Azure Cognitive Search retriever. At the moment an incorrect base URL is being generated. I think this is happening because the original code was based on a depreciated API version. No dependencies need to be added. I've also added more context to the test doc strings. I should also note that ACS is now Azure AI Search. I will open a separate PR to make these changes as that would be a breaking change and should potentially be discussed. Twitter: @marlene_zw - No new tests added, however the current ACS retriever tests are now passing when I run them. - Code was linted. Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
FinTech秋田	03ba1d4731	community[patch]: Add Support for GPU Index Types in Milvus 2.4 (#19468 ) - Description: This commit introduces support for the newly available GPU index types introduced in Milvus 2.4 within the LangChain project's `milvus.py`. With the release of Milvus 2.4, a range of GPU-accelerated index types have been added, offering enhanced search capabilities and performance optimizations for vector search operations. This update ensures LangChain users can fully utilize the new performance benefits for vector search operations. - Reference: https://milvus.io/docs/gpu_index.md Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
Ash Vardanian	d01bad5169	core[patch]: Convert SimSIMD back to NumPy (#19473 ) This patch fixes the #18022 issue, converting the SimSIMD internal zero-copy outputs to NumPy. I've also noticed, that oftentimes `dtype=np.float32` conversion is used before passing to SimSIMD. Which numeric types do LangChain users generally care about? We support `float64`, `float32`, `float16`, and `int8` for cosine distances and `float16` seems reasonable for practically any kind of embeddings and any modern piece of hardware, so we can change that part as well 🤗	3 months ago
Mikelarg	dac2e0165a	community[minor]: Added GigaChat Embeddings support + updated previous GigaChat integration (#19516 ) - Description: Added integration with [GigaChat](https://developers.sber.ru/portal/products/gigachat) embeddings. Also added support for extra fields in GigaChat LLM and fixed docs.	3 months ago
Martin Kolb	e5bdb26f76	community[patch]: More flexible handling for entity names in vector store "HANA Cloud" (#19523 ) - Description: Added support for lower-case and mixed-case names The names for tables and columns previouly had to be UPPER_CASE. With this enhancement, also lower_case and MixedCase are supported, - Issue: N/A - Dependencies: no new dependecies added - Twitter handle: @sapopensource	3 months ago
billytrend-cohere	63343b4987	cohere[patch]: add cohere as a partner package (#19049 ) Description: adds support for langchain_cohere --------- Co-authored-by: Harry M <127103098+harry-cohere@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev>	3 months ago
ccurme	82de8fd6c9	add kwargs (#19519 ) `HanaDB.add_texts` is missing **kwargs.	3 months ago
Nikhil Kumar	3d3b46a782	docs: Update docs for `HuggingFacePipeline` (#19306 ) Updated `HuggingFacePipeline` docs to be in sync with list of supported tasks, including translation. - [x] PR title: "community: Update docs for `HuggingFacePipeline`" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [x] PR message: - Description: Update docs for `HuggingFacePipeline`, was earlier missing `translation` as a valid task - Issue: N/A - Dependencies: N/A - Twitter handle: None - [x] Add tests and docs: - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/	3 months ago
Igor Muniz Soares	743f888580	community[minor]: Dappier chat model integration (#19370 ) Description: This PR adds [Dappier](https://dappier.com/) for the chat model. It supports generate, async generate, and batch functionalities. We added unit and integration tests as well as a notebook with more details about our chat model. Dependencies: No extra dependencies are needed.	3 months ago
Hugoberry	96dc180883	community[minor]: Add `DuckDB` as a vectorstore (#18916 ) DuckDB has a cosine similarity function along list and array data types, which can be used as a vector store. - Description: The latest version of DuckDB features a cosine similarity function, which can be used with its support for list or array column types. This PR surfaces this functionality to langchain. - Dependencies: duckdb 0.10.0 - Twitter handle: @igocrite --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
preak95	6ea3e57a63	community[minor]: S3FileLoader to use expose mode and post_processors arguments of unstructured loader (#19270 ) Description: Update s3_file.py to use arguments mode and post_processors from the base class UnstructuredBaseLoader to include more metadata about the files from the S3 bucket such as 'page_number', 'languages' etc. Issue: NA Dependencies: None Twitter handle: preak95 --------- Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
fengjial	3b52ee05d1	community[patch]: fix bugs in baiduvectordb as vectorstore (#19380 ) fix small bugs in vectorstore/baiduvectordb	3 months ago
aditya thomas	515aab3312	community[patch]: invoke callback prior to yielding token (openai) (#19389 ) Description: Invoke callback prior to yielding token for BaseOpenAI & OpenAIChat Issue: [Callback for on_llm_new_token should be invoked before the token is yielded by the model #16913](https://github.com/langchain-ai/langchain/issues/16913) Dependencies: None	3 months ago
aditya thomas	49e932cd24	community[patch]: invoke callback prior to yielding token (fireworks) (#19388 ) Description: Invoke callback prior to yielding token for Fireworks Issue: [Callback for on_llm_new_token should be invoked before the token is yielded by the model #16913](https://github.com/langchain-ai/langchain/issues/16913) Dependencies: None	3 months ago
Tarun Jain	ef6d3d66d6	community[patch]: docarray requires hnsw installation (#19416 ) I have a small dataset, and I tried to use docarray: ``DocArrayHnswSearch ``. But when I execute, it returns: ```bash raise ImportError( ImportError: Could not import docarray python package. Please install it with `pip install "langchain[docarray]"`. ``` Instead of docarray it needs to be ```bash docarray[hnswlib] ``` Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	3 months ago
German Swan	d4dc98a9f9	community[patch]: RecursiveUrlLoader: add base_url option (#19421 ) RecursiveUrlLoader does not currently provide an option to set `base_url` other than the `url`, though it uses a function with such an option. For example, this causes it unable to parse the `https://python.langchain.com/docs`, as it returns the 404 page, and `https://python.langchain.com/docs/get_started/introduction` has no child routes to parse. `base_url` allows setting the `https://python.langchain.com/docs` to filter by, while the starting URL is anything inside, that contains relevant links to continue crawling. I understand that for this case, the docusaurus loader could be used, but it's a common issue with many websites. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	3 months ago
aditya thomas	4856a87261	community[patch]: invoke callback prior to yielding token (llama.cpp) (#19392 ) Description: Invoke callback prior to yielding token for llama.cpp Issue: [Callback for on_llm_new_token should be invoked before the token is yielded by the model #16913](https://github.com/langchain-ai/langchain/issues/16913) Dependencies: None	3 months ago
billytrend-cohere	f6bcd42421	community[patch]: Replace positional argument with text=text for cohere>=5 compatibility (#19407 ) - Description: Replace positional argument with text=text for cohere>=5 compatibility	3 months ago
Bagatur	b58b38769d	community[patch]: Release 0.0.29 (#19350 )	3 months ago
Yudhajit Sinha	7d216ad1e1	community[patch]: Invoke callback prior to yielding token (titan_takeoff_pro) (#18624 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/titan_takeoff_pro. - Issue: #16913 - Dependencies: None	3 months ago
Yudhajit Sinha	455a74486b	community[patch]: Invoke callback prior to yielding token (sparkllm) (#18625 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/sparkllm. - Issue: #16913 - Dependencies: None	3 months ago
Yudhajit Sinha	5ac1860484	community[patch]: Invoke callback prior to yielding token (replicate) (#18626 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/replicate. - Issue: #16913 - Dependencies: None	3 months ago
Yudhajit Sinha	9525e392de	community[patch]: Invoke callback prior to yielding token (pai_eas_endpoint) (#18627 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/pai_eas_endpoint. - Issue: #16913 - Dependencies: None	3 months ago
Yudhajit Sinha	140f06e59a	community[patch]: Invoke callback prior to yielding token (openai) (#18628 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/openai. - Issue: #16913 - Dependencies: None	3 months ago
Yudhajit Sinha	280a914920	community[patch]: Invoke callback prior to yielding token (ollama) (#18629 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ & _astream_ methods in llms/ollama. - Issue: #16913 - Dependencies: None	3 months ago
Christophe Bornet	00614f332a	community[minor]: Add InMemoryVectorStore (#19326 ) This is a basic VectorStore implementation using an in-memory dict to store the documents. It doesn't need any extra/optional dependency as it uses numpy which is already a dependency of langchain. This is useful for quick testing, demos, examples. Also it allows to write vendor-neutral tutorials, guides, etc...	3 months ago
Nithish Raghunandanan	7ad0a3f2a7	community: add Couchbase Vector Store (#18994 ) - Description: Added support for Couchbase Vector Search to LangChain. - Dependencies: couchbase>=4.1.12 - Twitter handle: @nithishr --------- Co-authored-by: Nithish Raghunandanan <nithishr@users.noreply.github.com>	4 months ago
Christophe Bornet	30e4a35d7a	community: Use langchain-astradb for AstraDB caches (#18419 ) - [x] Needs https://github.com/langchain-ai/langchain-datastax/pull/4 - [x] Needs a new release of langchain-astradb	4 months ago
Vittorio Rigamonti	9b2f9ee952	community: VectorStore Infinispan, adding autoconfiguration (#18967 ) Description: this PR enable VectorStore autoconfiguration for Infinispan: if metadatas are only of basic types, protobuf config will be automatically generated for the user.	4 months ago
gonvee	b82644078e	community: Add `keep_alive` parameter to control how long the model w… (#19005 ) Add `keep_alive` parameter to control how long the model will stay loaded into memory with Ollama。 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
Harrison Chase	efcdf54edd	Josha91 fix docstring (#19249 ) Co-authored-by: Josha van Houdt <josha.van.houdt@sap.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	4 months ago
Taqi Jaffri	044bc22acc	Community: Add mistral oss model support to azureml endpoints, plus configurable timeout (#19123 ) - Description: There was no formatter for mistral models for Azure ML endpoints. Adding that, plus a configurable timeout (it was hard coded before) - Dependencies: none - Twitter handle: @tjaffri @docugami	4 months ago
Hamza Muhammad Farooqi	24a0a4472a	Add docstrings for Clickhouse class methods (#19195 ) Thank you for contributing to LangChain! - [ ] PR title: "package: description" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [ ] PR message: *Delete this entire checklist* and replace with - Description: a description of the change - Issue: the issue # it fixes, if applicable - Dependencies: any dependencies required for this change - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [ ] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17.	4 months ago
Rohit Gupta	785f8ab174	[langchain_community] milvus vectorstores upsert: add kwargs to make it use for other argument also (#19193 ) add kwargs in add_documents for upsert, to make it use for other argument also. Lets use this, it was unused as of now. - [ ] PR title: "package: description" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [ ] PR message: *Delete this entire checklist* and replace with - Description: a description of the change - Issue: the issue # it fixes, if applicable - Dependencies: any dependencies required for this change - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [ ] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [ ] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. Co-authored-by: Rohit Gupta <rohit.gupta2@walmart.com>	4 months ago
Guangdong Liu	c3310c5e7f	community: Fix Milvus got multiple values for keyword argument 'timeout' (#19232 ) - Description: Fix Milvus got multiple values for keyword argument 'timeout' - Issue: fix #18580 - @baskaryan @eyurtsev PTAL	4 months ago
Leonid Ganeline	7de1d9acfd	community: `llms` imports fixes (#18943 ) Classes are missed in __all__ and in different places of __init__.py - BaichuanLLM - ChatDatabricks - ChatMlflow - Llamafile - Mlflow - Together Added classes to __all__. I also sorted __all__ list. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	4 months ago
Kenzie Mihardja	21f75991d4	deprecate community docugami loader (#19230 ) Thank you for contributing to LangChain! - [x] PR title: "community: deprecate DocugamiLoader" - [x] PR message: Deprecate the langchain_community and use the docugami_langchain DocugamiLoader --------- Co-authored-by: Kenzie Mihardja <kenzie28@cs.washington.edu>	4 months ago
Pengfei Jiang	514fe80778	community[patch]: add stop parameter support to volcengine maas (#19052 ) - Description: add stop parameter to volcengine maas model - Dependencies: no --------- Co-authored-by: 江鹏飞 <jiangpengfei.jiangpf@bytedance.com>	4 months ago
htaoruan	bcc771e37c	docs: ChatTongyi example error (#19013 )	4 months ago
primate88	5aa68936e0	community: Fix import path for StreamingStdOutCallbackHandler example (#19170 ) - Description: - Updated the import path for `StreamingStdOutCallbackHandler` in the streaming response example within `huggingface_endpoint.py`. This change corrects the import statement to reflect the actual location of `StreamingStdOutCallbackHandler` in `langchain_core.callbacks.streaming_stdout`. - Issue: - None - Dependencies: - No additional dependencies are required for this change. - Twitter handle: - None ## Note: I have tested this change locally and confirmed that the `StreamingStdOutCallbackHandler` works as expected with the updated import path. This PR does not require the addition of new tests since it is a correction to documentation/examples rather than functional code.	4 months ago
Nikhil Kumar	635b3372bd	community[minor]: Add support for translation in HuggingFacePipeline (#19190 ) - [x] Support for translation: "community: Add support for translation in `HuggingFacePipeline`" - [x] Add support for translation in `HuggingFacePipeline`: - Description: Add support for translation in `HuggingFacePipeline`, which earlier used to support only text summarization and generation. - Issue: N/A - Dependencies: N/A - Twitter handle: None	4 months ago
k.muto	8d2c34e655	community: Fix all page numbers were the same for _BaseGoogleVertexAISearchRetriever (#19175 ) - Description: - This pull request is to fix a bug where page numbers were not set correctly. In the current code, all chunks share the same metadata object doc_metadata, so the page number is set with the same value for all documents. To fix this, I changed to using separate metadata objects for each chunk. - Issue: - None - Dependencies: - No additional dependencies are required for this change. - Twitter handle: - @eycjur - Test - Even if it's not a bug, there are cases where everything ends up with the same number of pages, so it's very difficult for me to write integration tests.	4 months ago
Cailin Wang	7cd87d2f6a	community: Add `partition` parameter to DashVector (#19023 ) Description: DashVector Add partition parameter Twitter handle: @CailinWang_ --------- Co-authored-by: root <root@Bluedot-AI>	4 months ago
Rodrigo Nogueira	e64cf1aba4	community: Add model argument for maritalk models and better error handling (#19187 )	4 months ago
Sergey Kozlov	1a55e950aa	community[patch]: support fastembed v1 and v2 (#19125 ) Description: #18040 forces `fastembed>2.0`, and this causes dependency conflicts with the new `unstructured` package (different `onnxruntime`). There may be other dependency conflicts.. The only way to use `langchain-community>=0.0.28` is rollback to `unstructured 0.10.X`. But new `unstructured` contains many fixes. This PR allows to use both `fastembed` `v1` and `v2`. How to reproduce: `pyproject.toml`: ```toml [tool.poetry] name = "depstest" version = "0.0.0" description = "test" authors = ["<dev@example.org>"] [tool.poetry.dependencies] python = ">=3.10,<3.12" langchain-community = "^0.0.28" fastembed = "^0.2.0" unstructured = {extras = ["pdf"], version = "^0.12"} ``` ```bash $ poetry lock ``` Co-authored-by: Sergey Kozlov <sergey.kozlov@ludditelabs.io>	4 months ago
高远	ef9813dae6	docs: add vikingdb docstrings(#19016 ) Co-authored-by: gaoyuan <gaoyuan.20001218@bytedance.com>	4 months ago
wulixuan	0e0030f494	community[patch]: fix yuan2 chat model errors while invoke. (#19015 ) 1. fix yuan2 chat model errors while invoke. 2. update related tests. 3. fix some deprecationWarning.	4 months ago
Shuai Liu	c244e1a50b	community[patch]: Fixed bug in merging `generation_info` during chunk concatenation in Tongyi and ChatTongyi (#19014 ) - Description: In #16218 , during the `GenerationChunk` and `ChatGenerationChunk` concatenation, the `generation_info` merging changed from simple keys & values replacement to using the util method [`merge_dicts`](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/utils/_merge.py): ![image](https://github.com/langchain-ai/langchain/assets/2098020/10f315bf-7fe0-43a7-a0ce-6a3834b99a15) The `merge_dicts` method could not handle merging values of `int` or some other types, and would raise a [`TypeError`](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/utils/_merge.py#L55). This PR fixes this issue in the Tongyi and ChatTongyi Model by adopting the `generation_info` of the last chunk and discarding the `generation_info` of the intermediate chunks, ensuring that `stream` and `astream` function correctly. - Issue: - Related issues or PRs about Tongyi & ChatTongyi: #16605, #17105 - Other models or cases: #18441, #17376 - Dependencies: No new dependencies	4 months ago
Christophe Bornet	f2a7dda4bd	community[patch]: Use langchain-astradb for AstraDB doc loader (#19071 ) Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	4 months ago
Holt Skinner	cee03630d9	community[patch]: Add Blended Search Support to `GoogleVertexAISearchRetriever` (#19082 ) https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es#multi-data-stores --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	4 months ago
case-k	ebc4a64f9e	docs: fix databricks document url (#19096 ) Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	4 months ago
Guangdong Liu	cced3eb9bc	community[patch]: Fix sparkllm embeddings api bug. (#19122 ) - Description: Fix sparkllm embeddings api bug. @baskaryan PTAL	4 months ago
kaijietti	c20aeef79a	community[patch]: implement qdrant _aembed_query and use it in other async funcs (#19155 ) `amax_marginal_relevance_search ` and `asimilarity_search_with_score ` should use an async version of `_embed_query `.	4 months ago
Barun Amalkumar Halder	34d6f0557d	community[patch] : publishes duration as milliseconds to Fiddler (#19166 ) Description: Many LLM steps complete in sub-second duration, which can lead to non-collection of duration field for Fiddler. This PR updates duration from seconds to milliseconds. Issue: [INTERNAL] FDL-17568 Dependencies: NA Twitter handle: behalder Co-authored-by: Barun Halder <barun@fiddler.ai>	4 months ago
Barun Amalkumar Halder	b551d49cf5	community[patch] : adds feedback and status for Fiddler callback handler events (#19157 ) Description: This PR adds updates the fiddler events schema to also pass user feedback, and llm status to fiddler Tickets: [INTERNAL] FDL-17559 Dependencies: NA Twitter handle: behalder Co-authored-by: Barun Halder <barun@fiddler.ai>	4 months ago
Juan Felipe Arias	f5b9aedc48	community[patch]: add args_schema to sql_database tools for langGraph integration (#18595 ) - Description: This modification adds pydantic input definition for sql_database tools. This helps for function calling capability in LangGraph. Since actions nodes will usually check for the args_schema attribute on tools, This update should make these tools compatible with it (only implemented on the InfoSQLDatabaseTool) - Issue: N/A - Dependencies: N/A - Twitter handle: juanfe8881	4 months ago
fengjial	c922ea36cb	community[minor]: Add Baidu VectorDB as vector store (#17997 ) Co-authored-by: fengjialin <fengjialin@MacBook-Pro.local>	4 months ago
Erick Friis	781aee0068	community, langchain, infra: revert store extended test deps outside of poetry (#19153 ) Reverts langchain-ai/langchain#18995 Because it makes installing dependencies in python 3.11 extended testing take 80 minutes	4 months ago
Erick Friis	9e569d85a4	community, langchain, infra: store extended test deps outside of poetry (#18995 ) poetry can't reliably handle resolving the number of optional "extended test" dependencies we have. If we instead just rely on pip to install extended test deps in CI, this isn't an issue.	4 months ago
Erick Friis	7ce81eb6f4	voyageai[patch]: init package (#19098 ) Co-authored-by: fodizoltan <zoltan@conway.expert> Co-authored-by: Yujie Qian <thomasq0809@gmail.com> Co-authored-by: fzowl <160063452+fzowl@users.noreply.github.com>	4 months ago
billytrend-cohere	7253b816cc	community: Add support for cohere SDK v5 (keeps v4 backwards compatibility) (#19084 ) - Description: Add support for cohere SDK v5 (keeps v4 backwards compatibility) --------- Co-authored-by: Erick Friis <erick@langchain.dev>	4 months ago
Eugene Yurtsev	6cdca4355d	community[minor]: Revamp PGVector Filtering (#18992 ) This PR makes the following updates in the pgvector database: 1. Use JSONB field for metadata instead of JSON 2. Update operator syntax to include required `$` prefix before the operators (otherwise there will be name collisions with fields) 3. The change is non-breaking, old functionality is still the default, but it will emit a deprecation warning 4. Previous functionality has bugs associated with comparisons due to casting to text (so lexical ordering is used incorrectly for numeric fields) 5. Adds an a GIN index on the JSONB field for more efficient querying	4 months ago
Anton Parkhomenko	ae73b9d839	community[patch]: Fix NotionDBLoader 400 Error by conditionally adding filter parameter (#19075 ) - Description: This change fixes a bug where attempts to load data from Notion using the NotionDBLoader resulted in a 400 Bad Request error. The issue was traced to the unconditional addition of an empty 'filter' object in the request payload, which Notion's API does not accept. The modification ensures that the 'filter' object is only included in the payload when it is explicitly provided and not empty, thus preventing the 400 error from occurring. - Issue: Fixes [#18009](https://github.com/langchain-ai/langchain/issues/18009) - Dependencies: None - Twitter handle: @gunnzolder Co-authored-by: Anton Parkhomenko <anton@merge.rocks>	4 months ago
Leonid Ganeline	9c8523b529	community[patch]: flattening imports 3 (#18939 ) @eyurtsev	4 months ago
Erick Friis	af50f21765	community[patch]: release 0.0.28 (#18993 )	4 months ago
Dobiichi-Origami	471f2ed40a	community[patch]: re-arrange the addtional_kwargs of returned qianfan structure to avoid _merge_dict issue (#18889 ) fix issue: https://github.com/langchain-ai/langchain/issues/18441 PTAL, thanks @baskaryan, @efriis, @eyurtsev, @hwchase17. --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	4 months ago
Tymofii	0bec1f6877	commnity[patch]: refactor code for faiss vectorstore, update faiss vectorstore documentation (#18092 ) Description: Refactor code of FAISS vectorcstore and update the related documentation. Details: - replace `.format()` with f-strings for strings formatting; - refactor definition of a filtering function to make code more readable and more flexible; - slightly improve efficiency of `max_marginal_relevance_search_with_score_by_vector` method by removing unnecessary looping over the same elements; - slightly improve efficiency of `delete` method by using set data structure for checking if the element was already deleted; Issue: fix small inconsistency in the documentation (the old example was incorrect and unappliable to faiss vectorstore) Dependencies: basic langchain-community dependencies and `faiss` (for CPU or for GPU) Twitter handle: antonenkodev	4 months ago
Leonid Ganeline	11195cfa42	community[patch]: speed up import times in the community package (#18928 ) This PR speeds up import times in the community package	4 months ago
Virat Singh	cafffe8a21	community: Add PolygonAggregates tool (#18882 ) Description: In this PR, I am adding a `PolygonAggregates` tool, which can be used to get historical stock price data (called aggregates by Polygon) for a given ticker. Polygon [docs](https://polygon.io/docs/stocks/get_v2_aggs_ticker__stocksticker__range__multiplier___timespan___from___to) for this endpoint. Twitter: [@virattt](https://twitter.com/virattt)	4 months ago
Mohammad Mohtashim	43db4cd20e	core[major]: On Tool End Observation Casting Fix (#18798 ) This PR updates the on_tool_end handlers to return the raw output from the tool instead of casting it to a string. This is technically a breaking change, though it's impact is expected to be somewhat minimal. It will fix behavior in `astream_events` as well. Fixes the following issue #18760 raised by @eyurtsev --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	4 months ago
Massimiliano Pronesti	8113d612bb	community[patch]: support modin document loader (#18866 ) Langchain community document loaders support `pyspark`, `polars`, and `pandas` dataframes but not `modin`'s. This PR addresses this point.	4 months ago
Pol Ruiz Farre	a7f63d8cb4	community[patch]: Fix BasePDFLoader suffix for s3 presigned urls (#18844 ) BasePDFLoader doesn't parse the suffix of the file correctly when parsing S3 presigned urls. This fix enables the proper detection and parsing of S3 presigned URLs to prevent errors such as `OSError: [Errno 36] File name too long`. No additional dependencies required.	4 months ago
Joshua Carroll	ddaf9de169	community: Fix bug with StreamlitChatMessageHistory (#18834 ) - Description: Fix Streamlit bug which was introduced by https://github.com/langchain-ai/langchain/pull/18250, update integration test - Issue: https://github.com/langchain-ai/langchain/issues/18684 - Dependencies: None	4 months ago
Tomaz Bratanic	a28be31a96	Switch to md5 for deduplication in neo4j integrations (#18846 ) Deduplicate documents using MD5 of the page_content. Also allows for custom deduplication with graph ingestion method by providing metadata id attribute --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	4 months ago
Leonid Ganeline	476d6dc596	community[patch]: Use getattr for `toolkits` imports (#18825 ) This will preserve the namespace, without actually loading the underlying packages on init.	4 months ago
Luis Antonio Vieira Junior	67c880af74	community[patch]: adding linearization config to AmazonTextractPDFLoader (#17489 ) - Description: Adding an optional parameter `linearization_config` to the `AmazonTextractPDFLoader` so the caller can define how the output will be linearized, instead of forcing a predefined set of linearization configs. It will still have a default configuration as this will be an optional parameter. - Issue: #17457 - Dependencies: The same ones that already exist for `AmazonTextractPDFLoader` - Twitter handle: [@lvieirajr19](https://twitter.com/lvieirajr19) --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
Anis ZAKARI	37e89ba5b1	community[patch]: Bedrock add support for mistral models (#18756 ) Description*: My previous [PR](https://github.com/langchain-ai/langchain/pull/18521) was mistakenly closed, so I am reopening this one. Context: AWS released two Mistral models on Bedrock last Friday (March 1, 2024). This PR includes some code adjustments to ensure their compatibility with the Bedrock class. --------- Co-authored-by: Anis ZAKARI <anis.zakari@hymaia.com> Co-authored-by: Erick Friis <erick@langchain.dev>	4 months ago
Keith Chan	914af69b44	community[patch]: Update azuresearch vectorstore from_texts() method to include fields argument (#17661 ) - Description: Update azuresearch vectorstore from_texts() method to include fields argument, necessary for creating an Azure AI Search index with custom fields. - Issue: Currently index fields are fixed to default fields if Azure Search index is created using from_texts() method - Dependencies: None - Twitter handle: None --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
al1p	46f0cea2b9	community[patch][: improved the suffix prompt to avoid loop (#17791 ) Small improvement to the openapi prompt. The agent was not finding the server base URL (looping through all nodes). This small change narrows the search and enables finding the url faster. No dependency Twitter : @al1pra	4 months ago
Théo LEBRUN	cf94091cd0	community[patch]: Skip nested directories when using S3DirectoryLoader (#17829 ) - Description: `S3DirectoryLoader` is failing if prefix is a folder (ex: `my_folder/`) because `S3FileLoader` will try to load that folder and will fail. This PR skip nested directories so prefix can be set to folder instead of `my_folder/files_prefix`. - Issue: - #11917 - #6535 - #4326 - Dependencies: none - Twitter handle: @Falydoor - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/	4 months ago
Venkatesan	7a18b63dbf	community[patch]: Mongo index creation (#17748 ) - [ ] Title: Mongodb: MongoDB connection performance improvement. - [ ] Message: - Description: I made collection index_creation as optional. Index Creation is one time process. - Issue: MongoDBChatMessageHistory class object is attempting to create an index during connection, causing each request to take longer than usual. This should be optional with a parameter. - Dependencies: N/A - Branch to be checked: origin/mongo_index_creation --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
wt3639	5b5b37a999	community[patch]: Add embedding instruction to HuggingFaceBgeEmbeddings (#18017 ) - Description: Add embedding instruction to HuggingFaceBgeEmbeddings, so that it can be compatible with nomic and other models that need embedding instruction. --------- Co-authored-by: Tao Wu <tao.wu@rwth-aachen.de> Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
Ishani Vyas	2b0cbd65ba	community[patch]: Add Passio Nutrition AI Food Search Tool to Community Package (#18278 ) ## Add Passio Nutrition AI Food Search Tool to Community Package ### Description We propose adding a new tool to the `community` package, enabling integration with Passio Nutrition AI for food search functionality. This tool will provide a simple interface for retrieving nutrition facts through the Passio Nutrition AI API, simplifying user access to nutrition data based on food search queries. ### Implementation Details - Class Structure: Implement `NutritionAI`, extending `BaseTool`. It includes an `_run` method that accepts a query string and, optionally, a `CallbackManagerForToolRun`. - API Integration: Use `NutritionAIAPI` for the API wrapper, encapsulating all interactions with the Passio Nutrition AI and providing a clean API interface. - Error Handling: Implement comprehensive error handling for API request failures. ### Expected Outcome - User Benefits: Enable easy querying of nutrition facts from Passio Nutrition AI, enhancing the utility of the `langchain_community` package for nutrition-related projects. - Functionality: Provide a straightforward method for integrating nutrition information retrieval into users' applications. ### Dependencies - `langchain_core` for base tooling support - `pydantic` for data validation and settings management - Consider `requests` or another HTTP client library if not covered by `NutritionAIAPI`. ### Tests and Documentation - Unit Tests: Include tests that mock network interactions to ensure tool reliability without external API dependency. - Documentation: Create an example notebook in `docs/docs/integrations/tools/passio_nutrition_ai.ipynb` showing usage, setup, and example queries. ### Contribution Guidelines Compliance - Adhere to the project's linting and formatting standards (`make format`, `make lint`, `make test`). - Ensure compliance with LangChain's contribution guidelines, particularly around dependency management and package modifications. ### Additional Notes - Aim for the tool to be a lightweight, focused addition, not introducing significant new dependencies or complexity. - Potential future enhancements could include caching for common queries to improve performance. ### Twitter Handle - Here is our Passio AI [twitter handle](https://twitter.com/@passio_ai) where we announce our products. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17.	4 months ago
Kushagra	b1f22bf76c	community[minor]: added a feature to filter documents in Mongoloader (#18253 ) "community: added a feature to filter documents in Mongoloader" - Description: added a feature to filter documents in Mongoloader - Feature: the feature #18251 - Dependencies: No - Twitter handle: https://twitter.com/im_Kushagra	4 months ago
Eugene Yurtsev	1f50274df7	community[patch]: Add pgvector to docker compose and update settings used in integration test (#18815 )	4 months ago
Christophe Bornet	e54a49b697	community[minor]: Add lazy_table_reflection param to SqlDatabase (#18742 ) For some DBs with lots of tables, reflection of all the tables can take very long. So this change will make the tables be reflected lazily when get_table_info() is called and `lazy_table_reflection` is True.	4 months ago
Christophe Bornet	ead2a74806	community: Implement lazy_load() for JSONLoader (#18643 ) Covered by `tests/unit_tests/document_loaders/test_json_loader.py`	4 months ago
Phat Vo	3ecb903d49	community[patch] : Tidy up and update Clarifai SDK functions (#18314 ) Description : * Tidy up, add missing docstring and fix unused params * Enable using session token	4 months ago
Tomaz Bratanic	4bfe888717	comunity[patch]: Fix neo4j sanitizing values (#18750 ) Fixing sanitization for when deeply nested lists appear	4 months ago
Yunmo Koo	fee6f983ef	community[minor]: Integration for `Friendli` LLM and `ChatFriendli` ChatModel. (#17913 ) ## Description - Add [Friendli](https://friendli.ai/) integration for `Friendli` LLM and `ChatFriendli` chat model. - Unit tests and integration tests corresponding to this change are added. - Documentations corresponding to this change are added. ## Dependencies - Optional dependency [`friendli-client`](https://pypi.org/project/friendli-client/) package is added only for those who use `Frienldi` or `ChatFriendli` model. ## Twitter handle - https://twitter.com/friendliai	4 months ago
Smit Parmar	aed46cd6f2	community[patch]: Added support for filter out AWS Kendra search by score confidence (#12920 ) Description: It will add support for filter out kendra search by score confidence which will make result more accurate. For example ``` retriever = AmazonKendraRetriever( index_id=kendra_index_id, top_k=5, region_name=region, score_confidence="HIGH" ) ``` Result will not include the records which has score confidence "LOW" or "MEDIUM". Relevant docs https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kendra/client/query.html https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kendra/client/retrieve.html Issue: the issue # it resolve #11801 twitter: [@SmitCode](https://twitter.com/SmitCode)	4 months ago
Ian	390ef6abe3	community[minor]: Add Initial Support for TiDB Vector Store (#15796 ) This pull request introduces initial support for the TiDB vector store. The current version is basic, laying the foundation for the vector store integration. While this implementation provides the essential features, we plan to expand and improve the TiDB vector store support with additional enhancements in future updates. Upcoming Enhancements: * Support for Vector Index Creation: To enhance the efficiency and performance of the vector store. * Support for max marginal relevance search. * Customized Table Structure Support: Recognizing the need for flexibility, we plan for more tailored and efficient data store solutions. Simple use case exmaple ```python from typing import List, Tuple from langchain.docstore.document import Document from langchain_community.vectorstores import TiDBVectorStore from langchain_openai import OpenAIEmbeddings db = TiDBVectorStore.from_texts( embedding=embeddings, texts=['Andrew like eating oranges', 'Alexandra is from England', 'Ketanji Brown Jackson is a judge'], table_name="tidb_vector_langchain", connection_string=tidb_connection_url, distance_strategy="cosine", ) query = "Can you tell me about Alexandra?" docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query) for doc, score in docs_with_score: print("-" * 80) print("Score: ", score) print(doc.page_content) print("-" * 80) ```	4 months ago
Bagatur	3b1eb1f828	community[patch]: chat hf typing fix (#18693 )	4 months ago
Eugene Yurtsev	e188d4ecb0	Add dangerous parameter to requests tool (#18697 ) The tools are already documented as dangerous. Not clear whether adding an opt-in parameter is necessary or not	4 months ago
Erick Friis	1beb84b061	community[patch]: move pdf text tests to integration (#18746 )	4 months ago
Christophe Bornet	4a7d73b39d	community: If load() has been overridden, use it in default lazy_load() (#18690 )	4 months ago
Christophe Bornet	6cd7607816	community[patch]: Implement lazy_load() for MHTMLLoader (#18648 ) Covered by `tests/unit_tests/document_loaders/test_mhtml.py`	4 months ago
axiangcoding	9745b5894d	community[patch]: Chroma use uuid4 instead of uuid1 to generate random ids (#18723 ) - Description: Chroma use uuid4 instead of uuid1 as random ids. Use uuid1 may leak mac address, changing to uuid4 will not cause other effects. - Issue: None - Dependencies: None - Twitter handle: None	4 months ago
Guangdong Liu	ced5e7bae7	community[patch]: Fix sparkllm authentication problem. (#18651 ) - Description: fix sparkllm authentication problem.The current timestamp is in RFC1123 format. The time deviation must be controlled within 300s. I changed to re-obtain the url every time I ask a question. https://www.xfyun.cn/doc/spark/general_url_authentication.html#_1-2-%E9%89%B4%E6%9D%83%E5%8F%82%E6%95%B0	4 months ago
Erick Friis	89d32ffbbd	community[patch]: release 0.0.27 (#18708 )	4 months ago
Piyush Jain	2b234a4d96	Support for claude v3 models. (#18630 ) Fixes #18513. ## Description This PR attempts to fix the support for Anthropic Claude v3 models in BedrockChat LLM. The changes here has updated the payload to use the `messages` format instead of the formatted text prompt for all models; `messages` API is backwards compatible with all models in Anthropic, so this should not break the experience for any models. ## Notes The PR in the current form does not support the v3 models for the non-chat Bedrock LLM. This means, that with these changes, users won't be able to able to use the v3 models with the Bedrock LLM. I can open a separate PR to tackle this use-case, the intent here was to get this out quickly, so users can start using and test the chat LLM. The Bedrock LLM classes have also grown complex with a lot of conditions to support various providers and models, and is ripe for a refactor to make future changes more palatable. This refactor is likely to take longer, and requires more thorough testing from the community. Credit to PRs [18579](https://github.com/langchain-ai/langchain/pull/18579) and [18548](https://github.com/langchain-ai/langchain/pull/18548) for some of the code here. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	4 months ago
Sam Khano	1b4dcf22f3	community[minor]: Add DocumentDBVectorSearch VectorStore (#17757 ) Description: - Added Amazon DocumentDB Vector Search integration (HNSW index) - Added integration tests - Updated AWS documentation with DocumentDB Vector Search instructions - Added notebook for DocumentDB integration with example usage --------- Co-authored-by: EC2 Default User <ec2-user@ip-172-31-95-226.ec2.internal>	4 months ago
Vittorio Rigamonti	51f3902bc4	community[minor]: Adding support for Infinispan as VectorStore (#17861 ) Description: This integrates Infinispan as a vectorstore. Infinispan is an open-source key-value data grid, it can work as single node as well as distributed. Vector search is supported since release 15.x For more: [Infinispan Home](https://infinispan.org) Integration tests are provided as well as a demo notebook	4 months ago
Max Jakob	cca0167917	elasticsearch[patch], community[patch]: update references, deprecate community classes (#18506 ) Follow up on https://github.com/langchain-ai/langchain/pull/17467. - Update all references to the Elasticsearch classes to use the partners package. - Deprecate community classes. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
Djordje	12b4a4d860	community[patch]: Opensearch delete method added - indexing supported (#18522 ) - Description: Added delete method for OpenSearchVectorSearch, therefore indexing supported - Issue: No - Dependencies: No - Twitter handle: stkbmf	4 months ago
Christophe Bornet	db8db6faae	community: Implement lazy_load() for PlaywrightURLLoader (#18676 ) Integration tests: `tests/integration_tests/document_loaders/test_url_playwright.py`	4 months ago
Aaron Yi	c092db862e	community[patch]: make metadata and text optional as expected in DocArray (#18678 ) ValidationError: 2 validation errors for DocArrayDoc text Field required [type=missing, input_value={'embedding': [-0.0191128...9, 0.01005221541175212]}, input_type=dict] For further information visit https://errors.pydantic.dev/2.5/v/missing metadata Field required [type=missing, input_value={'embedding': [-0.0191128...9, 0.01005221541175212]}, input_type=dict] For further information visit https://errors.pydantic.dev/2.5/v/missing ``` In the `_get_doc_cls` method, the `DocArrayDoc` class is defined as follows: ```python class DocArrayDoc(BaseDoc): text: Optional[str] embedding: Optional[NdArray] = Field(**embeddings_params) metadata: Optional[dict] ```	4 months ago
Eugene Yurtsev	4c25b49229	community[major]: breaking change in some APIs to force users to opt-in for pickling (#18696 ) This is a PR that adds a dangerous load parameter to force users to opt in to use pickle. This is a PR that's meant to raise user awareness that the pickling module is involved.	4 months ago
Eugene Yurtsev	0e52961562	community[patch]: Patch tdidf retriever (CVE-2024-2057) (#18695 ) This is a patch for `CVE-2024-2057`: https://www.cve.org/CVERecord?id=CVE-2024-2057 This affects users that: * Use the `TFIDFRetriever` * Attempt to de-serialize it from an untrusted source that contains a malicious payload	4 months ago
Christophe Bornet	ea141511d8	core: Move document loader interfaces to core (#17723 ) This is needed to be able to move document loaders to partner packages. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	4 months ago
Christophe Bornet	5985454269	Merge pull request #18539 * Implement lazy_load() for GitLoader	4 months ago
Christophe Bornet	9a6f7e213b	Merge pull request #18423 * Implement lazy_load() for BSHTMLLoader	4 months ago
Christophe Bornet	b3a0c44838	Merge pull request #18673 * Implement lazy_load() for PDFMinerPDFasHTMLLoader and PyMuPDFLoader	4 months ago
Christophe Bornet	68fc0cf909	Merge pull request #18674 * Implement lazy_load() for TextLoader	4 months ago
Christophe Bornet	5b92f962f1	Merge pull request #18671 * Implement lazy_load() for MastodonTootsLoader	4 months ago
Christophe Bornet	15b1770326	Merge pull request #18421 * Implement lazy_load() for AssemblyAIAudioTranscriptLoader	4 months ago
Christophe Bornet	bb284eebe4	Merge pull request #18436 * Implement lazy_load() for ConfluenceLoader	4 months ago
Christophe Bornet	691480f491	Merge pull request #18647 * Implement lazy_load() for UnstructuredBaseLoader	4 months ago
Christophe Bornet	52ac67c5d8	Merge pull request #18654 * Implement lazy_load() for ObsidianLoader	4 months ago
Christophe Bornet	b9c0cf9025	Merge pull request #18656 * Implement lazy_load() for PsychicLoader	4 months ago
Christophe Bornet	aa7ac57b67	community: Implement lazy_load() for TrelloLoader (#18658 ) Covered by `tests/unit_tests/document_loaders/test_trello.py`	4 months ago
Christophe Bornet	302985fea1	community: Implement lazy_load() for SlackDirectoryLoader (#18675 ) Integration tests: `tests/integration_tests/document_loaders/test_slack.py`	4 months ago
Christophe Bornet	ed36f9f604	community: Implement lazy_load() for WhatsAppChatLoader (#18677 ) Integration test: `tests/integration_tests/document_loaders/test_whatsapp_chat.py`	4 months ago
Christophe Bornet	f414f5cdb9	community[minor]: Implement lazy_load() for WikipediaLoader (#18680 ) Integration test: `tests/integration_tests/document_loaders/test_wikipedia.py`	4 months ago
Bagatur	4cbfeeb1c2	community[patch]: Release 0.0.26 (#18683 )	4 months ago
Christophe Bornet	1100f8de7a	community[minor]: Implement lazy_load() for ArxivLoader (#18664 ) Integration tests: `tests/integration_tests/utilities/test_arxiv.py` and `tests/integration_tests/document_loaders/test_arxiv.py`	4 months ago
Christophe Bornet	2d96803ddd	community[minor]: Implement lazy_load() for OutlookMessageLoader (#18668 ) Integration test: `tests/integration_tests/document_loaders/test_email.py`	4 months ago
Christophe Bornet	ae167fb5b2	community[minor]: Implement lazy_load() for SitemapLoader (#18667 ) Integration tests: `test_sitemap.py` and `test_docusaurus.py`	4 months ago
Christophe Bornet	623dfcc55c	community[minor]: Implement lazy_load() for FacebookChatLoader (#18669 ) Integration test: `tests/integration_tests/document_loaders/test_facebook_chat.py`	4 months ago
Christophe Bornet	20794bb889	community[minor]: Implement lazy_load() for GitbookLoader (#18670 ) Integration test: `tests/integration_tests/document_loaders/test_gitbook.py`	4 months ago
Liang Zhang	81985b31e6	community[patch]: Databricks SerDe uses cloudpickle instead of pickle (#18607 ) - Description: Databricks SerDe uses cloudpickle instead of pickle when serializing a user-defined function transform_input_fn since pickle does not support functions defined in `__main__`, and cloudpickle supports this. - Dependencies: cloudpickle>=2.0.0 Added a unit test.	4 months ago
Christophe Bornet	7d6de96186	community[patch]: Implement lazy_load() for CubeSemanticLoader (#18535 ) Covered by `test_cube_semantic.py`	4 months ago
Christophe Bornet	a6b5d45e31	community[patch]: Implement lazy_load() for EverNoteLoader (#18538 ) Covered by `test_evernote_loader.py`	4 months ago
Sunchao Wang	dc81dba6cf	community[patch]: Improve amadeus tool and doc (#18509 ) Description: This pull request addresses two key improvements to the langchain repository: Fix for Crash in Flight Search Interface: Previously, the code would crash when encountering a failure scenario in the flight ticket search interface. This PR resolves this issue by implementing a fix to handle such scenarios gracefully. Now, the code handles failures in the flight search interface without crashing, ensuring smoother operation. Documentation Update for Amadeus Toolkit: Prior to this update, examples provided in the documentation for the Amadeus Toolkit were unable to run correctly due to outdated information. This PR includes an update to the documentation, ensuring that all examples can now be executed successfully. With this update, users can effectively utilize the Amadeus Toolkit with accurate and functioning examples. These changes aim to enhance the reliability and usability of the langchain repository by addressing issues related to error handling and ensuring that documentation remains up-to-date and actionable. Issue: https://github.com/langchain-ai/langchain/issues/17375 Twitter Handle: SingletonYxx	4 months ago
Christophe Bornet	f77f7dc3ec	community[patch]: Fix VectorStoreQATool (#18529 ) Fix #18460	4 months ago
Dounx	ad48f55357	community[minor]: add Yuque document loader (#17924 ) This pull request support loading documents from Yuque with Langchain. Yuque is a professional cloud-based knowledge base for team collaboration in documentation. Website: https://www.yuque.com OpenAPI: https://www.yuque.com/yuque/developer/openapi	4 months ago
Kazuki Maeda	60c5d964a8	community[minor]: use jq schema for content_key in json_loader (#18003 ) ### Description Changed the value specified for `content_key` in JSONLoader from a single key to a value based on jq schema. I created [similar PR](https://github.com/langchain-ai/langchain/pull/11255) before, but it has several conflicts because of the architectural change associated stable version release, so I re-create this PR to fit new architecture. ### Why For json data like the following, specify `.data[].attributes.message` for page_content and `.data[].attributes.id` or `.data[].attributes.attributes. tags`, etc., the `content_key` must also parse the json structure. <details> <summary>sample json data</summary> ```json { "data": [ { "attributes": { "message": "message1", "tags": [ "tag1" ] }, "id": "1" }, { "attributes": { "message": "message2", "tags": [ "tag2" ] }, "id": "2" } ] } ``` </details> <details> <summary>sample code</summary> ```python def metadata_func(record: dict, metadata: dict) -> dict: metadata["source"] = None metadata["id"] = record.get("id") metadata["tags"] = record["attributes"].get("tags") return metadata sample_file = "sample1.json" loader = JSONLoader( file_path=sample_file, jq_schema=".data[]", content_key=".attributes.message", ## content_key is parsable into jq schema is_content_key_jq_parsable=True, ## this is added parameter metadata_func=metadata_func ) data = loader.load() data ``` </details> ### Dependencies none ### Twitter handle [kzk_maeda](https://twitter.com/kzk_maeda)	4 months ago
Hech	6a08134661	community[patch], langchain[minor]: Add retriever self_query and score_threshold in DingoDB (#18106 )	4 months ago
Yudhajit Sinha	4570b477b9	community[patch]: Invoke callback prior to yielding token (titan_takeoff) (#18560 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream_ method in llms/titan_takeoff. - Issue: #16913 - Dependencies: None	4 months ago
Tomaz Bratanic	ea51cdaede	Remove neo4j bloom labels from graph schema (#18564 ) Neo4j tools use particular node labels and relationship types to store metadata, but are irrelevant for text2cypher or graph generation, so we want to ignore them in the schema representation.	4 months ago
Erick Friis	e1924b3e93	core[patch]: deprecate hwchase17/langchain-hub, address path traversal (#18600 ) Deprecates the old langchain-hub repository. Does not deprecate the new https://smith.langchain.com/hub @PinkDraconian has correctly raised that in the event someone is loading unsanitized user input into the `try_load_from_hub` function, they have the ability to load files from other locations in github than the hwchase17/langchain-hub repository. This PR adds some more path checking to that function and deprecates the functionality in favor of the hub built into LangSmith.	4 months ago
Jib	9da1e0cf34	mongodb[patch]: Migrate MongoDBChatMessageHistory (#18590 ) ## Description Migrate the `MongoDBChatMessageHistory` to the managed `langchain-mongodb` partner-package ## Dependencies None ## Twitter handle @mongodb ## tests and docs - [x] Migrate existing integration test - [x ]~ Convert existing integration test to a unit test~ Creation is out of scope for this ticket - [x ] ~Considering delaying work until #17470 merges to leverage the `MockCollection` object. ~ - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ --------- Co-authored-by: Erick Friis <erick@langchain.dev>	4 months ago
Tomaz Bratanic	353248838d	Add precedence for input params over env variables in neo4j integration (#18581 ) input parameters take precedence over env variables	4 months ago
Christophe Bornet	c8a171a154	community: Implement lazy_load() for GithubFileLoader (#18584 )	4 months ago
Leonid Kuligin	04d134df17	marked MatchingEngine as deprecated (#18585 ) Thank you for contributing to LangChain! - [ ] PR title: "community: deprecate vectorstores.MatchingEngine" - [ ] PR message: - Description: announced a deprecation since this integration has been moved to langchain_google_vertexai	4 months ago
Erick Friis	343438e872	community[patch]: deprecate community fireworks (#18544 )	4 months ago
Scott Nath	b051bba1a9	community: Add you.com tool, add async to retriever, add async testing, add You tool doc (#18032 ) - Description: finishes adding the you.com functionality including: - add async functions to utility and retriever - add the You.com Tool - add async testing for utility, retriever, and tool - add a tool integration notebook page - Dependencies: any dependencies required for this change - Twitter handle: @scottnath	4 months ago
William De Vena	275877980e	community[patch]: Invoke callback prior to yielding token (#18447 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message Description: Invoke callback prior to yielding token in _stream method in llms/vertexai. Issue: https://github.com/langchain-ai/langchain/issues/16913 Dependencies: None	4 months ago
William De Vena	67375e96e0	community[patch]: Invoke callback prior to yielding token (#18448 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream method in llms/tongyi. - Issue: https://github.com/langchain-ai/langchain/issues/16913 - Dependencies: None	4 months ago
William De Vena	2087cbae64	community[patch]: Invoke callback prior to yielding token (#18449 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream method in chat_models/perplexity. - Issue: https://github.com/langchain-ai/langchain/issues/16913 - Dependencies: None	4 months ago
William De Vena	eb04d0d3e2	community[patch]: Invoke callback prior to yielding token (#18452 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream and _astream methods in llms/anthropic. - Issue: https://github.com/langchain-ai/langchain/issues/16913 - Dependencies: None	4 months ago
William De Vena	371bec79bc	community[patch]: Invoke callback prior to yielding token (#18454 ) ## PR title community[patch]: Invoke callback prior to yielding token ## PR message - Description: Invoke callback prior to yielding token in _stream and _astream methods in llms/baidu_qianfan_endpoint. - Issue: https://github.com/langchain-ai/langchain/issues/16913 - Dependencies: None	4 months ago
Aayush Kataria	7c2f3f6f95	community[minor]: Adding Azure Cosmos Mongo vCore Vector DB Cache (#16856 ) Description: This pull request introduces several enhancements for Azure Cosmos Vector DB, primarily focused on improving caching and search capabilities using Azure Cosmos MongoDB vCore Vector DB. Here's a summary of the changes: - AzureCosmosDBSemanticCache: Added a new cache implementation called AzureCosmosDBSemanticCache, which utilizes Azure Cosmos MongoDB vCore Vector DB for efficient caching of semantic data. Added comprehensive test cases for AzureCosmosDBSemanticCache to ensure its correctness and robustness. These tests cover various scenarios and edge cases to validate the cache's behavior. - HNSW Vector Search: Added HNSW vector search functionality in the CosmosDB Vector Search module. This enhancement enables more efficient and accurate vector searches by utilizing the HNSW (Hierarchical Navigable Small World) algorithm. Added corresponding test cases to validate the HNSW vector search functionality in both AzureCosmosDBSemanticCache and AzureCosmosDBVectorSearch. These tests ensure the correctness and performance of the HNSW search algorithm. - LLM Caching Notebook - The notebook now includes a comprehensive example showcasing the usage of the AzureCosmosDBSemanticCache. This example highlights how the cache can be employed to efficiently store and retrieve semantic data. Additionally, the example provides default values for all parameters used within the AzureCosmosDBSemanticCache, ensuring clarity and ease of understanding for users who are new to the cache implementation. @hwchase17,@baskaryan, @eyurtsev,	4 months ago
Erick Friis	1fd1ac8e95	community[patch]: release 0.0.25 (#18408 )	4 months ago
Sourav Pradhan	50abeb7ed9	community[patch]: fix Chroma add_images (#17964 ) ### Description Fixed a small bug in chroma.py add_images(), previously whenever we are not passing metadata the documents is containing the base64 of the uris passed, but when we are passing the metadata the documents is containing normal string uris which should not be the case. ### Issue In add_images() method when we are calling upsert() we have to use "b64_texts" instead of normal string "uris". ### Twitter handle https://twitter.com/whitepegasus01	4 months ago
Kate Silverstein	b7c71e2e07	community[minor]: llamafile embeddings support (#17976 ) * Description: adds `LlamafileEmbeddings` class implementation for generating embeddings using [llamafile](https://github.com/Mozilla-Ocho/llamafile)-based models. Includes related unit tests and notebook showing example usage. * Issue: N/A * Dependencies: N/A	4 months ago
Tomaz Bratanic	f6bfb969ba	community[patch]: Add an option for indexed generic label when import neo4j graph documents (#18122 ) Current implementation doesn't have an indexed property that would optimize the import. I have added a `baseEntityLabel` parameter that allows you to add a secondary node label, which has an indexed id `property`. By default, the behaviour is identical to previous version. Since multi-labeled nodes are terrible for text2cypher, I removed the secondary label from schema representation object and string, which is used in text2cypher.	4 months ago
Arun Sathiya	4adac20d7b	community[patch]: Make cohere_api_key a SecretStr (#12188 ) This PR makes `cohere_api_key` in `llms/cohere` a SecretStr, so that the API Key is not leaked when `Cohere.cohere_api_key` is represented as a string. --------- Signed-off-by: Arun <arun@arun.blog> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	4 months ago
Petteri Johansson	6c1989d292	community[minor], langchain[minor], docs: Gremlin Graph Store and QA Chain (#17683 ) - Description: New feature: Gremlin graph-store and QA chain (including docs). Compatible with Azure CosmosDB. - Dependencies: no changes	4 months ago
Ather Fawaz	a5ccf5d33c	community[minor]: Add support for Perplexity chat model(#17024 ) - Description: This PR adds support for [Perplexity AI APIs](https://blog.perplexity.ai/blog/introducing-pplx-api). - Issues: None - Dependencies: None - Twitter handle: [@atherfawaz](https://twitter.com/AtherFawaz) --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	4 months ago
Rodrigo Nogueira	3438d2cbcc	community[minor]: add maritalk chat (#17675 ) Description: Adds the MariTalk chat that is based on a LLM specially trained for Portuguese. Twitter handle: @MaritacaAI	4 months ago
sarahberenji	08fa38d56d	community[patch]: the syntax error for Redis generated query (#17717 ) To fix the reported error: https://github.com/langchain-ai/langchain/discussions/17397	4 months ago
certified-dodo	43e3244573	community[patch]: Fix MongoDBAtlasVectorSearch max_marginal_relevance_search (#17971 ) Description: * `self._embedding_key` is accessed after deletion, breaking `max_marginal_relevance_search` search * Introduced in: `e135e5257c` * Updated but still persists in: `ce22e10c4b` Issue: https://github.com/langchain-ai/langchain/issues/17963 Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
Nikita Titov	9f2ab37162	community[patch]: don't try to parse json in case of errored response (#18317 ) Related issue: #13896. In case Ollama is behind a proxy, proxy error responses cannot be viewed. You aren't even able to check response code. For example, if your Ollama has basic access authentication and it's not passed, `JSONDecodeError` will overwrite the truth response error. <details> <summary><b>Log now:</b></summary> ``` { "name": "JSONDecodeError", "message": "Expecting value: line 1 column 1 (char 0)", "stack": "--------------------------------------------------------------------------- JSONDecodeError Traceback (most recent call last) File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/requests/models.py:971, in Response.json(self, kwargs) 970 try: --> 971 return complexjson.loads(self.text, kwargs) 972 except JSONDecodeError as e: 973 # Catch JSON-related errors and raise as requests.JSONDecodeError 974 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError File /opt/miniforge3/envs/.gpt/lib/python3.10/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, kw) 343 if (cls is None and object_hook is None and 344 parse_int is None and parse_float is None and 345 parse_constant is None and object_pairs_hook is None and not kw): --> 346 return _default_decoder.decode(s) 347 if cls is None: File /opt/miniforge3/envs/.gpt/lib/python3.10/json/decoder.py:337, in JSONDecoder.decode(self, s, _w) 333 \"\"\"Return the Python representation of ``s`` (a ``str`` instance 334 containing a JSON document). 335 336 \"\"\" --> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end()) 338 end = _w(s, end).end() File /opt/miniforge3/envs/.gpt/lib/python3.10/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx) 354 except StopIteration as err: --> 355 raise JSONDecodeError(\"Expecting value\", s, err.value) from None 356 return obj, end JSONDecodeError: Expecting value: line 1 column 1 (char 0) During handling of the above exception, another exception occurred: JSONDecodeError Traceback (most recent call last) Cell In[3], line 1 ----> 1 print(translate_func().invoke('text')) File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/runnables/base.py:2053, in RunnableSequence.invoke(self, input, config) 2051 try: 2052 for i, step in enumerate(self.steps): -> 2053 input = step.invoke( 2054 input, 2055 # mark each step as a child run 2056 patch_config( 2057 config, callbacks=run_manager.get_child(f\"seq:step:{i+1}\") 2058 ), 2059 ) 2060 # finish the root run 2061 except BaseException as e: File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:165, in BaseChatModel.invoke(self, input, config, stop, kwargs) 154 def invoke( 155 self, 156 input: LanguageModelInput, (...) 160 kwargs: Any, 161 ) -> BaseMessage: 162 config = ensure_config(config) 163 return cast( 164 ChatGeneration, --> 165 self.generate_prompt( 166 [self._convert_input(input)], 167 stop=stop, 168 callbacks=config.get(\"callbacks\"), 169 tags=config.get(\"tags\"), 170 metadata=config.get(\"metadata\"), 171 run_name=config.get(\"run_name\"), 172 kwargs, 173 ).generations[0][0], 174 ).message File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:543, in BaseChatModel.generate_prompt(self, prompts, stop, callbacks, kwargs) 535 def generate_prompt( 536 self, 537 prompts: List[PromptValue], (...) 540 kwargs: Any, 541 ) -> LLMResult: 542 prompt_messages = [p.to_messages() for p in prompts] --> 543 return self.generate(prompt_messages, stop=stop, callbacks=callbacks, kwargs) File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:407, in BaseChatModel.generate(self, messages, stop, callbacks, tags, metadata, run_name, kwargs) 405 if run_managers: 406 run_managers[i].on_llm_error(e, response=LLMResult(generations=[])) --> 407 raise e 408 flattened_outputs = [ 409 LLMResult(generations=[res.generations], llm_output=res.llm_output) 410 for res in results 411 ] 412 llm_output = self._combine_llm_outputs([res.llm_output for res in results]) File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:397, in BaseChatModel.generate(self, messages, stop, callbacks, tags, metadata, run_name, kwargs) 394 for i, m in enumerate(messages): 395 try: 396 results.append( --> 397 self._generate_with_cache( 398 m, 399 stop=stop, 400 run_manager=run_managers[i] if run_managers else None, 401 kwargs, 402 ) 403 ) 404 except BaseException as e: 405 if run_managers: File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:576, in BaseChatModel._generate_with_cache(self, messages, stop, run_manager, kwargs) 572 raise ValueError( 573 \"Asked to cache, but no cache found at `langchain.cache`.\" 574 ) 575 if new_arg_supported: --> 576 return self._generate( 577 messages, stop=stop, run_manager=run_manager, kwargs 578 ) 579 else: 580 return self._generate(messages, stop=stop, kwargs) File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_community/chat_models/ollama.py:250, in ChatOllama._generate(self, messages, stop, run_manager, kwargs) 226 def _generate( 227 self, 228 messages: List[BaseMessage], (...) 231 kwargs: Any, 232 ) -> ChatResult: 233 \"\"\"Call out to Ollama's generate endpoint. 234 235 Args: (...) 247 ]) 248 \"\"\" --> 250 final_chunk = self._chat_stream_with_aggregation( 251 messages, 252 stop=stop, 253 run_manager=run_manager, 254 verbose=self.verbose, 255 kwargs, 256 ) 257 chat_generation = ChatGeneration( 258 message=AIMessage(content=final_chunk.text), 259 generation_info=final_chunk.generation_info, 260 ) 261 return ChatResult(generations=[chat_generation]) File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_community/chat_models/ollama.py:183, in ChatOllama._chat_stream_with_aggregation(self, messages, stop, run_manager, verbose, kwargs) 174 def _chat_stream_with_aggregation( 175 self, 176 messages: List[BaseMessage], (...) 180 kwargs: Any, 181 ) -> ChatGenerationChunk: 182 final_chunk: Optional[ChatGenerationChunk] = None --> 183 for stream_resp in self._create_chat_stream(messages, stop, kwargs): 184 if stream_resp: 185 chunk = _chat_stream_response_to_chat_generation_chunk(stream_resp) File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_community/chat_models/ollama.py:156, in ChatOllama._create_chat_stream(self, messages, stop, kwargs) 147 def _create_chat_stream( 148 self, 149 messages: List[BaseMessage], 150 stop: Optional[List[str]] = None, 151 kwargs: Any, 152 ) -> Iterator[str]: 153 payload = { 154 \"messages\": self._convert_messages_to_ollama_messages(messages), 155 } --> 156 yield from self._create_stream( 157 payload=payload, stop=stop, api_url=f\"{self.base_url}/api/chat/\", kwargs 158 ) File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_community/llms/ollama.py:234, in _OllamaCommon._create_stream(self, api_url, payload, stop, kwargs) 228 raise OllamaEndpointNotFoundError( 229 \"Ollama call failed with status code 404. \" 230 \"Maybe your model is not found \" 231 f\"and you should pull the model with `ollama pull {self.model}`.\" 232 ) 233 else: --> 234 optional_detail = response.json().get(\"error\") 235 raise ValueError( 236 f\"Ollama call failed with status code {response.status_code}.\" 237 f\" Details: {optional_detail}\" 238 ) 239 return response.iter_lines(decode_unicode=True) File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/requests/models.py:975, in Response.json(self, kwargs) 971 return complexjson.loads(self.text, kwargs) 972 except JSONDecodeError as e: 973 # Catch JSON-related errors and raise as requests.JSONDecodeError 974 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError --> 975 raise RequestsJSONDecodeError(e.msg, e.doc, e.pos) JSONDecodeError: Expecting value: line 1 column 1 (char 0)" } ``` </details> <details> <summary><b>Log after a fix:</b></summary> ``` { "name": "ValueError", "message": "Ollama call failed with status code 401. Details: <html>\r <head><title>401 Authorization Required</title></head>\r <body>\r <center><h1>401 Authorization Required</h1></center>\r <hr><center>nginx/1.18.0 (Ubuntu)</center>\r </body>\r </html>\r ", "stack": "--------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[2], line 1 ----> 1 print(translate_func().invoke('text')) File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/runnables/base.py:2053, in RunnableSequence.invoke(self, input, config) 2051 try: 2052 for i, step in enumerate(self.steps): -> 2053 input = step.invoke( 2054 input, 2055 # mark each step as a child run 2056 patch_config( 2057 config, callbacks=run_manager.get_child(f\"seq:step:{i+1}\") 2058 ), 2059 ) 2060 # finish the root run 2061 except BaseException as e: File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:165, in BaseChatModel.invoke(self, input, config, stop, kwargs) 154 def invoke( 155 self, 156 input: LanguageModelInput, (...) 160 kwargs: Any, 161 ) -> BaseMessage: 162 config = ensure_config(config) 163 return cast( 164 ChatGeneration, --> 165 self.generate_prompt( 166 [self._convert_input(input)], 167 stop=stop, 168 callbacks=config.get(\"callbacks\"), 169 tags=config.get(\"tags\"), 170 metadata=config.get(\"metadata\"), 171 run_name=config.get(\"run_name\"), 172 kwargs, 173 ).generations[0][0], 174 ).message File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:543, in BaseChatModel.generate_prompt(self, prompts, stop, callbacks, kwargs) 535 def generate_prompt( 536 self, 537 prompts: List[PromptValue], (...) 540 kwargs: Any, 541 ) -> LLMResult: 542 prompt_messages = [p.to_messages() for p in prompts] --> 543 return self.generate(prompt_messages, stop=stop, callbacks=callbacks, kwargs) File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:407, in BaseChatModel.generate(self, messages, stop, callbacks, tags, metadata, run_name, kwargs) 405 if run_managers: 406 run_managers[i].on_llm_error(e, response=LLMResult(generations=[])) --> 407 raise e 408 flattened_outputs = [ 409 LLMResult(generations=[res.generations], llm_output=res.llm_output) 410 for res in results 411 ] 412 llm_output = self._combine_llm_outputs([res.llm_output for res in results]) File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:397, in BaseChatModel.generate(self, messages, stop, callbacks, tags, metadata, run_name, kwargs) 394 for i, m in enumerate(messages): 395 try: 396 results.append( --> 397 self._generate_with_cache( 398 m, 399 stop=stop, 400 run_manager=run_managers[i] if run_managers else None, 401 kwargs, 402 ) 403 ) 404 except BaseException as e: 405 if run_managers: File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py:576, in BaseChatModel._generate_with_cache(self, messages, stop, run_manager, kwargs) 572 raise ValueError( 573 \"Asked to cache, but no cache found at `langchain.cache`.\" 574 ) 575 if new_arg_supported: --> 576 return self._generate( 577 messages, stop=stop, run_manager=run_manager, kwargs 578 ) 579 else: 580 return self._generate(messages, stop=stop, kwargs) File /opt/miniforge3/envs/.gpt/lib/python3.10/site-packages/langchain_community/chat_models/ollama.py:250, in ChatOllama._generate(self, messages, stop, run_manager, kwargs) 226 def _generate( 227 self, 228 messages: List[BaseMessage], (...) 231 kwargs: Any, 232 ) -> ChatResult: 233 \"\"\"Call out to Ollama's generate endpoint. 234 235 Args: (...) 247 ]) 248 \"\"\" --> 250 final_chunk = self._chat_stream_with_aggregation( 251 messages, 252 stop=stop, 253 run_manager=run_manager, 254 verbose=self.verbose, 255 kwargs, 256 ) 257 chat_generation = ChatGeneration( 258 message=AIMessage(content=final_chunk.text), 259 generation_info=final_chunk.generation_info, 260 ) 261 return ChatResult(generations=[chat_generation]) File /storage/gpt-project/Repos/repo_nikita/gpt_lib/langchain/ollama.py:328, in ChatOllamaCustom._chat_stream_with_aggregation(self, messages, stop, run_manager, verbose, kwargs) 319 def _chat_stream_with_aggregation( 320 self, 321 messages: List[BaseMessage], (...) 325 kwargs: Any, 326 ) -> ChatGenerationChunk: 327 final_chunk: Optional[ChatGenerationChunk] = None --> 328 for stream_resp in self._create_chat_stream(messages, stop, kwargs): 329 if stream_resp: 330 chunk = _chat_stream_response_to_chat_generation_chunk(stream_resp) File /storage/gpt-project/Repos/repo_nikita/gpt_lib/langchain/ollama.py:301, in ChatOllamaCustom._create_chat_stream(self, messages, stop, kwargs) 292 def _create_chat_stream( 293 self, 294 messages: List[BaseMessage], 295 stop: Optional[List[str]] = None, 296 kwargs: Any, 297 ) -> Iterator[str]: 298 payload = { 299 \"messages\": self._convert_messages_to_ollama_messages(messages), 300 } --> 301 yield from self._create_stream( 302 payload=payload, stop=stop, api_url=f\"{self.base_url}/api/chat\", kwargs 303 ) File /storage/gpt-project/Repos/repo_nikita/gpt_lib/langchain/ollama.py:134, in _OllamaCommonCustom._create_stream(self, api_url, payload, stop, **kwargs) 132 else: 133 optional_detail = response.text --> 134 raise ValueError( 135 f\"Ollama call failed with status code {response.status_code}.\" 136 f\" Details: {optional_detail}\" 137 ) 138 return response.iter_lines(decode_unicode=True) ValueError: Ollama call failed with status code 401. Details: <html>\r <head><title>401 Authorization Required</title></head>\r <body>\r <center><h1>401 Authorization Required</h1></center>\r <hr><center>nginx/1.18.0 (Ubuntu)</center>\r </body>\r </html>\r " } ``` </details> The same is true for timeout errors or when you simply mistyped in `base_url` arg and get response from some other service, for instance. Real Ollama errors are still clearly readable: ``` ValueError: Ollama call failed with status code 400. Details: {"error":"invalid options: unknown_option"} ``` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
Yudhajit Sinha	e2b901c35b	community[patch]: chat message histrory mypy fix (#18250 ) Description: Fixed type: ignore's for mypy for chat_message_histories(streamlit) Adresses #17048 Planning to add more based on reviews	4 months ago
Hemslo Wang	58a2abf089	community[patch]: fix RecursiveUrlLoader metadata_extractor return type (#18193 ) Description: Fix `metadata_extractor` type for `RecursiveUrlLoader`, the default `_metadata_extractor` returns `dict` instead of `str`. Issue: N/A Dependencies: N/A Twitter handle: N/A Signed-off-by: Hemslo Wang <hemslo.wang@gmail.com>	4 months ago
Maxime Perrin	98380cff9b	community[patch]: removing "response_mode" parameter in llama_index retriever (#18180 ) - Description: Removing this line ```python response = index.query(query, response_mode="no_text", self.query_kwargs) ``` to ```python response = index.query(query, self.query_kwargs) ``` Since llama index query does not support response_mode anymore : ``` \| TypeError: BaseQueryEngine.query() got an unexpected keyword argument 'response_mode'```` - Twitter handle: @maximeperrin_ --------- Co-authored-by: Maxime Perrin <mperrin@doing.fr>	4 months ago
Christophe Bornet	177f51c7bd	community: Use default load() implementation in doc loaders (#18385 ) Following https://github.com/langchain-ai/langchain/pull/18289	4 months ago
mwmajewsk	e192f6b6eb	community[patch]: fix, better error message in deeplake vectoriser (#18397 ) If the document loader recieves Pathlib path instead of str, it reads the file correctly, but the problem begins when the document is added to Deeplake. This problem arises from casting the path to str in the metadata. ```python deeplake = True fname = Path('./lorem_ipsum.txt') loader = TextLoader(fname, encoding="utf-8") docs = loader.load_and_split() text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100) chunks= text_splitter.split_documents(docs) if deeplake: db = DeepLake(dataset_path=ds_path, embedding=embeddings, token=activeloop_token) db.add_documents(chunks) else: db = Chroma.from_documents(docs, embeddings) ``` So using this snippet of code the error message for deeplake looks like this: ``` [part of error message omitted] Traceback (most recent call last): File "/home/mwm/repositories/sources/fixing_langchain/main.py", line 53, in <module> db.add_documents(chunks) File "/home/mwm/repositories/sources/langchain/libs/core/langchain_core/vectorstores.py", line 139, in add_documents return self.add_texts(texts, metadatas, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/deeplake.py", line 258, in add_texts return self.vectorstore.add( ^^^^^^^^^^^^^^^^^^^^^ File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/deeplake_vectorstore.py", line 226, in add return self.dataset_handler.add( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/dataset_handlers/client_side_dataset_handler.py", line 139, in add dataset_utils.extend_or_ingest_dataset( File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/vector_search/dataset/dataset.py", line 544, in extend_or_ingest_dataset extend( File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/vector_search/dataset/dataset.py", line 505, in extend dataset.extend(batched_processed_tensors, progressbar=False) File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/dataset/dataset.py", line 3247, in extend raise SampleExtendError(str(e)) from e.__cause__ deeplake.util.exceptions.SampleExtendError: Failed to append a sample to the tensor 'metadata'. See more details in the traceback. If you wish to skip the samples that cause errors, please specify `ignore_errors=True`. ``` Which is does not explain the error well enough. The same error for chroma looks like this ``` During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/mwm/repositories/sources/fixing_langchain/main.py", line 56, in <module> db = Chroma.from_documents(docs, embeddings) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 778, in from_documents return cls.from_texts( ^^^^^^^^^^^^^^^ File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 736, in from_texts chroma_collection.add_texts( File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 309, in add_texts raise ValueError(e.args[0] + "\n\n" + msg) ValueError: Expected metadata value to be a str, int, float or bool, got lorem_ipsum.txt which is a <class 'pathlib.PosixPath'> Try filtering complex metadata from the document using langchain_community.vectorstores.utils.filter_complex_metadata. ``` Which is way more user friendly, so I just added information about possible mismatch of the type in the error message, the same way it is covered in chroma https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/vectorstores/chroma.py#L224	4 months ago
Daniel Chico	7d962278f6	community[patch]: type ignore fixes (#18395 ) Related to #17048	4 months ago
Christophe Bornet	69be82c86d	community[patch]: Implement lazy_load() for CSVLoader (#18391 ) Covered by `test_csv_loader.py`	4 months ago
Guangdong Liu	760a16ff32	community[patch]: Fix ChatModel for sparkllm Bug. (#18375 ) PR message: *Delete this entire checklist* and replace with - Description: fix sparkllm paramer error - Issue: close #18370 - Dependencies: change `IFLYTEK_SPARK_APP_URL` to `IFLYTEK_SPARK_API_URL` - Twitter handle: No	4 months ago
Yujie Qian	cbb65741a7	community[patch]: Voyage AI updates default model and batch size (#17655 ) - Description: update the default model and batch size in VoyageEmbeddings - Issue: N/A - Dependencies: N/A - Twitter handle: N/A --------- Co-authored-by: fodizoltan <zoltan@conway.expert>	4 months ago
Shengsheng Huang	ae471a7dcb	community[minor]: add BigDL-LLM integrations (#17953 ) - Description: [`bigdl-llm`](https://github.com/intel-analytics/BigDL) is a library for running LLM on Intel XPU (from Laptop to GPU to Cloud) using INT4/FP4/INT8/FP8 with very low latency (for any PyTorch model). This PR adds bigdl-llm integrations to langchain. - Issue: NA - Dependencies: `bigdl-llm` library - Contribution maintainer: @shane-huang Examples added: - docs/docs/integrations/llms/bigdl.ipynb	4 months ago
Ethan Yang	f61cb8d407	community[minor]: Add openvino backend support (#11591 ) - Description: add openvino backend support by HuggingFace Optimum Intel, - Dependencies: “optimum[openvino]”, --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	4 months ago
RadhikaBansal97	8bafd2df5e	community[patch]: Change github endpoint in GithubLoader (#17622 ) Description- - Changed the GitHub endpoint as existing was not working and giving 404 not found error - Also the existing function was failing if file_filter is not passed as the tree api return all paths including directory as well, and when get_file_content was iterating over these path, the function was failing for directory as the api was returning list of files inside the directory, so added a condition to ignore the paths if it a directory - Fixes this issue - https://github.com/langchain-ai/langchain/issues/17453 Co-authored-by: Radhika Bansal <Radhika.Bansal@veritas.com>	4 months ago
Anush	9d663f31fa	community[patch]: FastEmbed to latest (#18040 ) ## Description Updates the `langchain_community.embeddings.fastembed` provider as per the recent updates to [`FastEmbed`](https://github.com/qdrant/fastembed) library.	4 months ago
Eugene Yurtsev	51b661cfe8	community[patch]: BaseLoader load method should just delegate to lazy_load (#18289 ) load() should just reference lazy_load()	4 months ago
Bagatur	5efb5c099f	text-splitters[minor], langchain[minor], community[patch], templates, docs: langchain-text-splitters 0.0.1 (#18346 )	4 months ago
Erick Friis	eefb49680f	multiple[patch]: fix deprecation versions (#18349 )	4 months ago
Jib	72bfc1d3db	mongodb[minor]: MongoDB Partner Package -- Porting MongoDBAtlasVectorSearch (#17652 ) This PR migrates the existing MongoDBAtlasVectorSearch abstraction from the `langchain_community` section to the partners package section of the codebase. - [x] Run the partner package script as advised in the partner-packages documentation. - [x] Add Unit Tests - [x] Migrate Integration Tests - [x] Refactor `MongoDBAtlasVectorStore` (autogenerated) to `MongoDBAtlasVectorSearch` - [x] ~Remove~ deprecate the old `langchain_community` VectorStore references. ## Additional Callouts - Implemented the `delete` method - Included any missing async function implementations - `amax_marginal_relevance_search_by_vector` - `adelete` - Added new Unit Tests that test for functionality of `MongoDBVectorSearch` methods - Removed [`del res[self._embedding_key]`](`e0c81e1cb0/libs/community/langchain_community/vectorstores/mongodb_atlas.py (L218)`) in `_similarity_search_with_score` function as it would make the `maximal_marginal_relevance` function fail otherwise. The `Document` needs to store the embedding key in metadata to work. Checklist: - [x] PR title: Please title your PR "package: description", where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [x] PR message - [x] Pass lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified to check that you're passing lint and testing. See contribution guidelines for more information on how to write/run tests, lint, etc: https://python.langchain.com/docs/contributing/ - [x] Add tests and docs: If you're adding a new integration, please include 1. Existing tests supplied in docs/docs do not change. Updated docstrings for new functions like `delete` 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. (This already exists) If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Steven Silvester <steven.silvester@ieee.org> Co-authored-by: Erick Friis <erick@langchain.dev>	4 months ago
Kai Kugler	df234fb171	community[patch]: Fixing embedchain document mapping (#18255 ) - Description: The current embedchain implementation seems to handle document metadata differently than done in the current implementation of langchain and a KeyError is thrown. I would love for someone else to test this... --------- Co-authored-by: KKUGLER <kai.kugler@mercedes-benz.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Deshraj Yadav <deshraj@gatech.edu>	4 months ago
Erick Friis	040271f33a	community[patch]: remove llmlingua extended tests (#18344 )	4 months ago
Tomaz Bratanic	5999c4a240	Add support for parameters in neo4j retrieval query (#18310 ) Sometimes, you want to use various parameters in the retrieval query of Neo4j Vector to personalize/customize results. Before, when there were only predefined chains, it didn't really make sense. Now that it's all about custom chains and LCEL, it is worth adding since users can inject any params they wish at query time. Isn't prone to SQL injection-type attacks since we use parameters and not concatenating strings.	4 months ago
Virat Singh	cd926ac3dd	community: Add PolygonFinancials Tool (#18324 ) Description: In this PR, I am adding a `PolygonFinancials` tool, which can be used to get financials data for a given ticker. The financials data is the fundamental data that is found in income statements, balance sheets, and cash flow statements of public US companies. Twitter: [@virattt](https://twitter.com/virattt)	4 months ago
Christophe Bornet	8a81fcd5d3	community: Fix deprecation version of AstraDB VectorStore (#17991 )	4 months ago
mackong	2c42f3a955	ollama[patch]: delete suffix slash to avoid redirect (#18260 ) - Description: see [ollama](https://github.com/ollama/ollama/blob/main/server/routes.go#L949)'s route definitions - Issue: N/A - Dependencies: N/A	4 months ago
William De Vena	6b58943917	community[patch]: Invoke callback prior to yielding token (#18288 ) ## PR title community[patch]: Invoke callback prior to yielding PR message Description: Invoke on_llm_new_token callback prior to yielding token in _stream and _astream methods. Issue: https://github.com/langchain-ai/langchain/issues/16913 Dependencies: None Twitter handle: None	4 months ago
Eugene Yurtsev	cd52433ba0	community[minor]: Add `SQLDatabaseLoader` document loader (#18281 ) - Description: A generic document loader adapter for SQLAlchemy on top of LangChain's `SQLDatabaseLoader`. - Needed by: https://github.com/crate-workbench/langchain/pull/1 - Depends on: GH-16655 - Addressed to: @baskaryan, @cbornet, @eyurtsev Hi from CrateDB again, in the same spirit like GH-16243 and GH-16244, this patch breaks out another commit from https://github.com/crate-workbench/langchain/pull/1, in order to reduce the size of this patch before submitting it, and to separate concerns. To accompany the SQLAlchemy adapter implementation, the patch includes integration tests for both SQLite and PostgreSQL. Let me know if corresponding utility resources should be added at different spots. With kind regards, Andreas. ### Software Tests ```console docker compose --file libs/community/tests/integration_tests/document_loaders/docker-compose/postgresql.yml up ``` ```console cd libs/community pip install psycopg2-binary pytest -vvv tests/integration_tests -k sqldatabase ``` ``` 14 passed ``` ![image](https://github.com/langchain-ai/langchain/assets/453543/42be233c-eb37-4c76-a830-474276e01436) --------- Co-authored-by: Andreas Motl <andreas.motl@crate.io>	4 months ago

... 3 4 5 6 7 ...

944 Commits (c9e9470c5a3a12e2fe7fd14164dd5a26aec37fb0)