Commit Graph

3049 Commits (c53aa5cd37da7a2b6c6630097b813064247c460c)

Author SHA1 Message Date
Bagatur ad285ca15c
community[patch]: Release 0.0.21 (#17750) 5 months ago
Karim Lalani ea61302f71
community[patch]: bug fix - add empty metadata when metadata not provided (#17669)
Code fix to include empty medata dictionary to aadd_texts if metadata is
not provided.
5 months ago
CogniJT 919ebcc596
community[minor]: CogniSwitch Agent Toolkit for LangChain (#17312)
**Description**: CogniSwitch focusses on making GenAI usage more
reliable. It abstracts out the complexity & decision making required for
tuning processing, storage & retrieval. Using simple APIs documents /
URLs can be processed into a Knowledge Graph that can then be used to
answer questions.

**Dependencies**: No dependencies. Just network calls & API key required
**Tag maintainer**: @hwchase17
**Twitter handle**: https://github.com/CogniSwitch
**Documentation**: Please check
`docs/docs/integrations/toolkits/cogniswitch.ipynb`
**Tests**: The usual tool & toolkits tests using `test_imports.py`

PR has passed linting and testing before this submission.

---------

Co-authored-by: Saicharan Sridhara <145636106+saiCogniswitch@users.noreply.github.com>
5 months ago
Christophe Bornet 6275d8b1bf
docs: Fix AstraDBChatMessageHistory docstrings (#17740) 5 months ago
Pranav Agarwal 86ae48b781
experimental[minor]: Amazon Personalize support (#17436)
## Amazon Personalize support on Langchain

This PR is a successor to this PR -
https://github.com/langchain-ai/langchain/pull/13216

This PR introduces an integration with [Amazon
Personalize](https://aws.amazon.com/personalize/) to help you to
retrieve recommendations and use them in your natural language
applications. This integration provides two new components:

1. An `AmazonPersonalize` client, that provides a wrapper around the
Amazon Personalize API.
2. An `AmazonPersonalizeChain`, that provides a chain to pull in
recommendations using the client, and then generating the response in
natural language.

We have added this to langchain_experimental since there was feedback
from the previous PR about having this support in experimental rather
than the core or community extensions.

Here is some sample code to explain the usage.

```python

from langchain_experimental.recommenders import AmazonPersonalize
from langchain_experimental.recommenders import AmazonPersonalizeChain
from langchain.llms.bedrock import Bedrock

recommender_arn = "<insert_arn>"

client=AmazonPersonalize(
    credentials_profile_name="default",
    region_name="us-west-2",
    recommender_arn=recommender_arn
)
bedrock_llm = Bedrock(
    model_id="anthropic.claude-v2", 
    region_name="us-west-2"
)

chain = AmazonPersonalizeChain.from_llm(
    llm=bedrock_llm, 
    client=client
)
response = chain({'user_id': '1'})
```


Reviewer: @3coins
5 months ago
Aymeric Roucher 0d294760e7
Community: Fuse HuggingFace Endpoint-related classes into one (#17254)
## Description
Fuse HuggingFace Endpoint-related classes into one:
-
[HuggingFaceHub](5ceaf784f3/libs/community/langchain_community/llms/huggingface_hub.py)
-
[HuggingFaceTextGenInference](5ceaf784f3/libs/community/langchain_community/llms/huggingface_text_gen_inference.py)
- and
[HuggingFaceEndpoint](5ceaf784f3/libs/community/langchain_community/llms/huggingface_endpoint.py)

Are fused into
- HuggingFaceEndpoint

## Issue
The deduplication of classes was creating a lack of clarity, and
additional effort to develop classes leads to issues like [this
hack](5ceaf784f3/libs/community/langchain_community/llms/huggingface_endpoint.py (L159)).

## Dependancies

None, this removes dependancies.

## Twitter handle

If you want to post about this: @AymericRoucher

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Bagatur 8009be862e
core[patch]: Release 0.1.24 (#17744) 5 months ago
Raghav Dixit 6c18f73ca5
community[patch]: LanceDB integration improvements/fixes (#16173)
Hi, I'm from the LanceDB team.

Improves LanceDB integration by making it easier to use - now you aren't
required to create tables manually and pass them in the constructor,
although that is still backward compatible.

Bug fix - pandas was being used even though it's not a dependency for
LanceDB or langchain

PS - this issue was raised a few months ago but lost traction. It is a
feature improvement for our users kindly review this , Thanks !
5 months ago
Christophe Bornet e92e96193f
community[minor]: Add async methods to the AstraDB BaseStore (#16872)
---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago
Mohammad Mohtashim 43dc5d3416
community[patch]: OpenLLM Client Fixes + Added Timeout Parameter (#17478)
- OpenLLM was using outdated method to get the final text output from
openllm client invocation which was raising the error. Therefore
corrected that.
- OpenLLM `_identifying_params` was getting the openllm's client
configuration using outdated attributes which was raising error.
- Updated the docstring for OpenLLM.
- Added timeout parameter to be passed to underlying openllm client.
5 months ago
Leonid Ganeline 1d2aa19aee
docs: Fix bug that caused the word "Beta" to appear twice in doc-strings (#17704)
The current issue:
Several beta descriptions in the API Reference are duplicated. For
example:
`[Beta] Get a context value.[Beta] Get a context value.` for the
[ContextGet
class](https://api.python.langchain.com/en/latest/core_api_reference.html#module-langchain_core.beta)
description.

NOTE: I've tested it only with a new ut! I cannot build API Reference
locally :(
This PR related to #17615
5 months ago
Guangdong Liu 73edf17b4e
community[minor]: Add Apache Doris as vector store (#17527)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Bagatur a058c8812d
community[patch]: add VoyageEmbeddings truncation (#17638) 5 months ago
Mohammad Mohtashim 8d4547ae97
[Langchain_community]: Corrected the imports to make them compatible with Sqlachemy <2.0 (#17653)
- Small Change in Imports in sql_database module to make it work with
Sqlachemy <2.0
 - This was identified in the following issue: #17616
5 months ago
Christophe Bornet 75465a2a3c
partners/astradb: Add dotenv to langchain-astradb integration tests (#17629) 5 months ago
Christophe Bornet 19ebc7418e
community: Use _AstraDBCollectionEnvironment in AstraDB VectorStore (community) (#17635)
Another PR will be done for the langchain-astradb package.

Note: for future PRs, devs will be done in the partner package only. This one is just to align with the rest of the components in the community package and it fixes a bunch of issues.
5 months ago
Mateusz Szewczyk e25b722ea9
watsonx[patch]: Invoke callback prior to yielding token when streaming (#17625)
**Description**: Invoke callback prior to yielding token in stream
method for watsonx.
 **Issue**: https://github.com/langchain-ai/langchain/issues/16913
5 months ago
Nejc Habjan b4fa847a90
community[minor]: add exclude parameter to DirectoryLoader (#17316)
- **Description:** adds an `exclude` parameter to the DirectoryLoader
class, based on similar behavior in GenericLoader
- **Issue:** discussed in
https://github.com/langchain-ai/langchain/discussions/9059 and I think
in some other issues that I cannot find at the moment 🙇
  - **Dependencies:** None
  - **Twitter handle:** don't have one sorry! Just https://github/nejch

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
Bagatur 8f14234afb
infra: ignore flakey lua test (#17618) 5 months ago
Krista Pratico bf8e3c6dd1
community[patch]: add fixes for AzureSearch after update to stable azure-search-documents library (#17599)
- **Description:** Addresses the bugs described in linked issue where an
import was erroneously removed and the rename of a keyword argument was
missed when migrating from beta --> stable of the azure-search-documents
package
- **Issue:** https://github.com/langchain-ai/langchain/issues/17598
- **Dependencies:** N/A
- **Twitter handle:** N/A
5 months ago
William FH 64743dea14
core[patch], community[patch], langchain[patch], experimental[patch], robocorp[patch]: bump LangSmith 0.1.* (#17567) 5 months ago
morgana 9d7ca7df6e
community[patch]: update copy of metadata in rockset vectorstore integration (#17612)
- **Description:** This fixes an issue with working with RecordManager.
RecordManager was generating new hashes on documents because `add_texts`
was modifying the metadata directly. Additionally moved some tests to
unit tests since that was a more appropriate home.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** `@_morgan_adams_`
5 months ago
Erick Friis c8d96f30bd
exa[patch]: fix lint (#17610) 5 months ago
Erick Friis 8f5c70769d
astradb[patch]: fix core dep 3 (#17617) 5 months ago
Leonid Ganeline 0835ebad70
docs: Fix bug that caused the word "Deprecated" to appear twice in doc-strings (#17615)
The current issue:
Most of the deprecation descriptions are duplicated. For example:
`[Deprecated] Chat Agent.[Deprecated] Chat Agent.` for the [ChatAgent
class](https://api.python.langchain.com/en/latest/langchain_api_reference.html#classes)
description.

NOTE: I've tested it only with new ut! I cannot build API Reference
locally :(
5 months ago
Erick Friis aa31025dd7
astradb[patch]: fix core dep 2 (#17608) 5 months ago
Erick Friis cc562e7c58
astradb[patch]: fix core dep (#17606) 5 months ago
Stefano Lottini 5240ecab99
astradb: bootstrapping Astra DB as Partner Package (#16875)
**Description:** This PR introduces a new "Astra DB" Partner Package.

So far only the vector store class is _duplicated_ there, all others
following once this is validated and established.

Along with the move to separate package, incidentally, the class name
will change `AstraDB` => `AstraDBVectorStore`.

The strategy has been to duplicate the module (with prospected removal
from community at LangChain 0.2). Until then, the code will be kept in
sync with minimal, known differences (there is a makefile target to
automate drift control. Out of convenience with this check, the
community package has a class `AstraDBVectorStore` aliased to `AstraDB`
at the end of the module).

With this PR several bugfixes and improvement come to the vector store,
as well as a reshuffling of the doc pages/notebooks (Astra and
Cassandra) to align with the move to a separate package.

**Dependencies:** A brand new pyproject.toml in the new package, no
changes otherwise.

**Twitter handle:** `@rsprrs`

---------

Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
Erick Friis 6cc6faa00e
ai21: init package (#17592)
Co-authored-by: Asaf Gardin <asafg@ai21.com>
Co-authored-by: etang <etang@ai21.com>
Co-authored-by: asafgardin <147075902+asafgardin@users.noreply.github.com>
5 months ago
Moshe Berchansky 20a56fe0a2
community[minor]: Add QuantizedEmbedders (#17391)
**Description:** 
* adding Quantized embedders using optimum-intel and
intel-extension-for-pytorch.
* added mdx documentation and example notebooks 
* added embedding import testing.

**Dependencies:** 
optimum = {extras = ["neural-compressor"], version = "^1.14.0", optional
= true}
intel_extension_for_pytorch = {version = "^2.2.0", optional = true}

Dependencies have been added to pyproject.toml for the community lib.  

**Twitter handle:** @peter_izsak

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Amir Karbasi bccc9241ea
community[patch]: Resolve KuzuQAChain API Changes (#16885)
- **Description:** Updates to the Kuzu API had broken this
functionality. These updates resolve those issues and add a new test to
demonstrate the updates.
- **Issue:** #11874
- **Dependencies:** No new dependencies
- **Twitter handle:** @amirk08


Test results:
```
tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_query_no_params PASSED                                   [ 33%]
tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_query_params PASSED                                      [ 66%]
tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_refresh_schema PASSED                                    [100%]

=================================================== slowest 5 durations =================================================== 
0.53s call     tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_refresh_schema
0.34s call     tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_query_no_params
0.28s call     tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_query_params
0.03s teardown tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_refresh_schema
0.02s teardown tests/integration_tests/graphs/test_kuzu.py::TestKuzu::test_query_params
==================================================== 3 passed in 1.27s ==================================================== 
```
5 months ago
Rafail Giavrimis a84a3add25
Community[patch]: Adjusted import to be compatible with SQLAlchemy<2 (#17520)
- **Description:** Adjusts an import to directly import `Result` from
`sqlalchemy.engine`.
- **Issue:** #17519 
- **Dependencies:** N/A
- **Twitter handle:** @grafail
5 months ago
Zachary Toliver 6746adf363
community[patch]: pass bool value for fetch_schema_from_transport in GraphQLAPIWrapper (#17552)
- **Description:** Allow a bool value to be passed to
fetch_schema_from_transport since not all GraphQL instances support this
feature, such as TigerGraph.
- **Threads:** @zacharytoliver

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Christophe Bornet 789cd5198d
community[patch]: Use astrapy built-in pagination prefetch in AstraDBLoader (#17569) 5 months ago
Christophe Bornet 387cacb881
community[minor]: Add async methods to AstraDBChatMessageHistory (#17572) 5 months ago
Christophe Bornet ff1f985a2a
community: Fix some mypy types in cassandra doc loader (#17570)
Thank you!
5 months ago
Mo Latif f3e4a0e27f
langchain[patch]: Update Chain prep_inputs docstring (#17575)
**Description**: @eyurtsev Following up on #16644 to fix the docstring,
because `prep_inputs` is not longer doing any validation.
5 months ago
William FH fc1617c44f
Update contact link (#17563) 5 months ago
Christophe Bornet ca2d4078f3
community: Add async methods to AstraDBCache (#17415)
Adds async methods to AstraDBCache
5 months ago
Jan Cap 7ae3ce60d2
community[patch]: Fix pwd import that is not available on windows (#17532)
- **Description:** Resolving problem in
`langchain_community\document_loaders\pebblo.py` with `import pwd`.
`pwd` is not available on windows. import moved to try catch block
  - **Issue:** #17514
5 months ago
nvpranak 91bcc9c5c9
community[minor]: Nemo embeddings(#16206)
This PR is adding support for NVIDIA NeMo embeddings issue #16095.

---------

Co-authored-by: Praveen Nakshatrala <pnakshatrala@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Mattt394 7c6009b76f
experimental[patch]: Fixed typos in SmartLLMChain ideation and critique prompts (#11507)
Noticed and fixed a few typos in the SmartLLMChain default ideation and
critique prompts

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
5 months ago
Erick Friis 86d3e42853
core[minor]: add name to basemessage (#17539)
Adds an optional name param to our base message to support passing names
into LLMs.

OpenAI supports having a name on anything except tool message now
(system, ai, user/human).
5 months ago
Mateusz Szewczyk 916332ef5b
ibm: added partners package `langchain_ibm`, added llm (#16512)
- **Description:** Added `langchain_ibm` as an langchain partners
package of IBM [watsonx.ai](https://www.ibm.com/products/watsonx-ai) LLM
provider (`WatsonxLLM`)
- **Dependencies:**
[ibm-watsonx-ai](https://pypi.org/project/ibm-watsonx-ai/),
  - **Tag maintainer:** : 
---------

Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
Shawn f6d3a3546f
community[patch]: document_loaders: modified athena key logic to handle s3 uris without a prefix (#17526)
https://github.com/langchain-ai/langchain/issues/17525

### Example Code

```python
from langchain_community.document_loaders.athena import AthenaLoader

database_name = "database"
s3_output_path = "s3://bucket-no-prefix"
query="""SELECT 
  CAST(extract(hour FROM current_timestamp) AS INTEGER) AS current_hour,
  CAST(extract(minute FROM current_timestamp) AS INTEGER) AS current_minute,
  CAST(extract(second FROM current_timestamp) AS INTEGER) AS current_second;
"""
profile_name = "AdministratorAccess"

loader = AthenaLoader(
    query=query,
    database=database_name,
    s3_output_uri=s3_output_path,
    profile_name=profile_name,
)

documents = loader.load()
print(documents)
```



### Error Message and Stack Trace (if applicable)

NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject
operation: The specified key does not exist

### Description

Athena Loader errors when result s3 bucket uri has no prefix. The Loader
instance call results in a "NoSuchKey: An error occurred (NoSuchKey)
when calling the GetObject operation: The specified key does not exist."
error.

If s3_output_path contains a prefix like:

```python
s3_output_path = "s3://bucket-with-prefix/prefix"
```

Execution works without an error.

## Suggested solution

Modify:

```python
key = "/".join(tokens[1:]) + "/" + query_execution_id + ".csv"
```

to

```python
key = "/".join(tokens[1:]) + ("/" if tokens[1:] else "") + query_execution_id + ".csv"
```


9e8a3fc4ff/libs/community/langchain_community/document_loaders/athena.py (L128)


### System Info


System Information
------------------
> OS:  Darwin
> OS Version: Darwin Kernel Version 22.6.0: Fri Sep 15 13:41:30 PDT
2023; root:xnu-8796.141.3.700.8~1/RELEASE_ARM64_T8103
> Python Version:  3.9.9 (main, Jan  9 2023, 11:42:03) 
[Clang 14.0.0 (clang-1400.0.29.102)]

Package Information
-------------------
> langchain_core: 0.1.23
> langchain: 0.1.7
> langchain_community: 0.0.20
> langsmith: 0.0.87
> langchain_openai: 0.0.6
> langchainhub: 0.1.14

Packages not installed (Not Necessarily a Problem)
--------------------------------------------------
The following packages were not found:

> langgraph
> langserve

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
wulixuan c776cfc599
community[minor]: integrate with model Yuan2.0 (#15411)
1. integrate with
[`Yuan2.0`](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/README-EN.md)
2. update `langchain.llms`
3. add a new doc for [Yuan2.0
integration](docs/docs/integrations/llms/yuan2.ipynb)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Philippe PRADOS d07db457fc
community[patch]: Fix SQLAlchemyMd5Cache race condition (#16279)
If the SQLAlchemyMd5Cache is shared among multiple processes, it is
possible to encounter a race condition during the cache update.

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
Alex Peplowski 70c296ae96
community[patch]: Expose Anthropic Retry Logic (#17069)
**Description:**

Expose Anthropic's retry logic, so that `max_retries` can be configured
via langchain. Anthropic's retry logic is implemented in their Python
SDK here:
https://github.com/anthropics/anthropic-sdk-python?tab=readme-ov-file#retries

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
DanisJiang de9a6cdf16
experimental[patch]: Enhance protection against arbitrary code execution in PALChain (#17091)
- **Description:** Block some ways to trigger arbitrary code execution
bug in PALChain.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
Lyndsey 8562a1e7d4
community[patch]: support query filters for NotionDBLoader (#17217)
- **Description:** Support filtering databases in the use case where
devs do not want to query ALL entries within a DB,
- **Issue:** N/A,
- **Dependencies:** N/A,
- **Twitter handle:** I don't have Twitter but feel free to tag my
Github!

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
volodymyr-memsql e36bc379f2
community[patch]: Add vector index support to SingleStoreDB VectorStore (#17308)
This pull request introduces support for various Approximate Nearest
Neighbor (ANN) vector index algorithms in the VectorStore class,
starting from version 8.5 of SingleStore DB. Leveraging this enhancement
enables users to harness the power of vector indexing, significantly
boosting search speed, particularly when handling large sets of vectors.

---------

Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Kate Silverstein 0bc4a9b3fc
community[minor]: Adds Llamafile as an LLM (#17431)
* **Description:** Adds a simple LLM implementation for interacting with
[llamafile](https://github.com/Mozilla-Ocho/llamafile)-based models.
* **Dependencies:** N/A
* **Issue:** N/A

**Detail**
[llamafile](https://github.com/Mozilla-Ocho/llamafile) lets you run LLMs
locally from a single file on most computers without installing any
dependencies.

To use the llamafile LLM implementation, the user needs to:

1. Download a llamafile e.g.
https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile?download=true
2. Make the file executable.
3. Run the llamafile in 'server mode'. (All llamafiles come packaged
with a lightweight server; by default, the server listens at
`http://localhost:8080`.)


```bash
wget https://url/of/model.llamafile
chmod +x model.llamafile
./model.llamafile --server --nobrowser
```

Now, the user can invoke the LLM via the LangChain client:

```python
from langchain_community.llms.llamafile import Llamafile

llm = Llamafile()

llm.invoke("Tell me a joke.")
```
5 months ago
Rakib Hosen 5ce1827d31
community[patch]: fix import in language parser (#17538)
- **Description:** Resolving import error in language_parser.py during
"from langchain.langchain.text_splitter import Language - **Issue:** the
issue #17536
- **Dependencies:** NO
- **Twitter handle:** @iRakibHosen

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Raunak 685d62b032
community[patch]: Added functions in NetworkxEntityGraph class (#17535)
- **Description:** 
1. Added _clear_edges()_ and _get_number_of_nodes()_ functions in
NetworkxEntityGraph class.
2. Added the above two function in graph_networkx_qa.ipynb
documentation.
5 months ago
Erick Friis bfaa8c3048
anthropic[patch]: de-beta anthropic messages, release 0.0.2 (#17540) 5 months ago
Erick Friis a99c667c22
partners: version constraints (#17492)
Core should be ^0.1 by default

Careful about 0.x.y and 0.0.z packages
5 months ago
Erick Friis d7418acbe1
nomic[patch]: release 0.0.2, dimensionality (#17534)
- nomic[patch]: release 0.0.2
- x
5 months ago
shibuiwilliam c502736841
infra: add test for ensemble retriever to ensure multiple retrievers (#8401)
Add tests to ensemble retriever to ensure it works with combination of
multiple retrievers

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Qihui Xie 5738143d4b
add mongodb_store (#13801)
# Add MongoDB storage
  - **Description:** 
  Add MongoDB Storage as an option for large doc store. 

Example usage: 
```Python
# Instantiate the MongodbStore with a MongoDB connection
from langchain.storage import MongodbStore

mongo_conn_str = "mongodb://localhost:27017/"
mongodb_store = MongodbStore(mongo_conn_str, db_name="test-db",
                                collection_name="test-collection")

# Set values for keys
doc1 = Document(page_content='test1')
doc2 = Document(page_content='test2')
mongodb_store.mset([("key1", doc1), ("key2", doc2)])

# Get values for keys
values = mongodb_store.mget(["key1", "key2"])
# [doc1, doc2]

# Iterate over keys
for key in mongodb_store.yield_keys():
    print(key)

# Delete keys
mongodb_store.mdelete(["key1", "key2"])
 ```

  - **Dependencies:**
  Use `mongomock` for integration test.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
Mo Latif 50b48a8e6a
langchain[patch]: Invoke chain prep_inputs and prep_outputs inside try block to catch validation errors (#16644)
- **Description:** Callback manager can't catch chain input or output
validation errors because `prepare_input` and `prepare_output` are not
part of the try/raise logic, this PR fixes that logic.
 
  - **Issue:** #15954
5 months ago
Christophe Bornet a8f530bc4d
Add async methods to CacheBackedEmbeddings (#16873)
Adds async methods to CacheBackedEmbeddings
5 months ago
Bagatur 50de7a31f0
langchain[patch]: structured output chain nits (#17291) 5 months ago
Nat Noordanus 8a3b74fe1f
community[patch]: Fix pydantic ForwardRef error in BedrockBase (#17416)
- **Description:** Fixes a type annotation issue in the definition of
BedrockBase. This issue was that the annotation for the `config`
attribute includes a ForwardRef to `botocore.client.Config` which is
only imported when `TYPE_CHECKING`. This can cause pydantic to raise an
error like `pydantic.errors.ConfigError: field "config" not yet prepared
so type is still a ForwardRef, ...`.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** `@__nat_n__`
5 months ago
Ashley Xu f746a73e26
Add the BQ job usage tracking from LangChain (#17123)
- **Description:**
Add the BQ job usage tracking from LangChain

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
JongRok BAEK 8d6cc90fc5
langchain.core : Use shallow copy for schema manipulation in JsonOutputParser.get_format_instructions (#17162)
- **Description :**  

Fix: Use shallow copy for schema manipulation in get_format_instructions

Prevents side effects on the original schema object by using a
dictionary comprehension for a safer and more controlled manipulation of
schema key-value pairs, enhancing code reliability.

  - **Issue:**  #17161 
  - **Dependencies:** None
  -  **Twitter handle:** None
5 months ago
Bagatur b5d3416563
experimental[patch]: Release 0.0.51 (#17484) 5 months ago
Bagatur de7c4b277c
langchain[patch]: Release 0.1.7 (#17482) 5 months ago
Bagatur 39342d98d6
community[patch]: Release 0.0.20 (#17480) 5 months ago
Bagatur 89b765ec27
core[patch]: Release 0.1.23 (#17479) 5 months ago
Max Jakob ab3d944667
community[patch]: ElasticsearchStore: preserve user headers (#16830)
Users can provide an Elasticsearch connection with custom headers. This
PR makes sure these headers are preserved when adding the langchain user
agent header.
5 months ago
Erick Friis 9eb1b56e73
pinecone[patch]: release 0.0.2 (#17477) 5 months ago
Erick Friis 37678471c4
openai[patch]: relax tiktoken constraint, release 0.0.6 (#17472) 5 months ago
Wendy H. Chun 2df7387c91
langchain[patch]: Fix to avoid infinite loop during collapse chain in map reduce (#16253)
- **Description:** Depending on `token_max` used in
`load_summarize_chain`, it could cause an infinite loop when documents
cannot collapse under `token_max`. This change would not affect the
existing feature, but it also gives an option to users to avoid the
situation.
  - **Issue:** https://github.com/langchain-ai/langchain/issues/16251
  - **Dependencies:** None
  - **Twitter handle:** None

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
wulixuan 5d06797905
community[minor]: integrate chat models with Yuan2.0 (#16575)
1. integrate chat models with
[`Yuan2.0`](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/README-EN.md)
2. add a new doc for [Yuan2.0
integration](docs/docs/integrations/llms/yuan2.ipynb)
 
Yuan2.0 is a new generation Fundamental Large Language Model developed
by IEIT System. We have published all three models, Yuan 2.0-102B, Yuan
2.0-51B, and Yuan 2.0-2B.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Taha Khabouss 15baffc484
langchain[patch]: Ensure that the Elasticsearch Query Translator functions accurately w… (#17044)
Description:
Addresses a problem where the Date type within an Elasticsearch
SelfQueryRetriever would encounter difficulties in generating a valid
query.

Issue: #17042

---------

Co-authored-by: Max Jakob <max.jakob@elastic.co>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Erick Friis e5c76f9dbd
pinecone[patch]: poetry update (#17471) 5 months ago
Erick Friis 10bdf2422c
pinecone[patch]: release 0.0.2rc0, remove simsimd dep (#17469) 5 months ago
Erick Friis 065cde69b1
google-genai[patch]: release 0.0.9, safety settings docs (#17432) 5 months ago
Sergey Kozlov db6f266d97
core: improve None value processing in merge_dicts() (#17462)
- **Description:** fix `None` and `0` merging in `merge_dicts()`, add
tests.
```python
from langchain_core.utils._merge import merge_dicts
assert merge_dicts({"a": None}, {"a": 0}) == {"a": 0}
```

---------

Co-authored-by: Sergey Kozlov <sergey.kozlov@ludditelabs.io>
5 months ago
Ian Gregory e5472b5eb8
Framework for supporting more languages in LanguageParser (#13318)
## Description

I am submitting this for a school project as part of a team of 5. Other
team members are @LeilaChr, @maazh10, @Megabear137, @jelalalamy. This PR
also has contributions from community members @Harrolee and @Mario928.

Initial context is in the issue we opened (#11229).

This pull request adds:

- Generic framework for expanding the languages that `LanguageParser`
can handle, using the
[tree-sitter](https://github.com/tree-sitter/py-tree-sitter#py-tree-sitter)
parsing library and existing language-specific parsers written for it
- Support for the following additional languages in `LanguageParser`:
  - C
  - C++
  - C#
  - Go
- Java (contributed by @Mario928
https://github.com/ThatsJustCheesy/langchain/pull/2)
  - Kotlin
  - Lua
  - Perl
  - Ruby
  - Rust
  - Scala
- TypeScript (contributed by @Harrolee
https://github.com/ThatsJustCheesy/langchain/pull/1)

Here is the [design
document](https://docs.google.com/document/d/17dB14cKCWAaiTeSeBtxHpoVPGKrsPye8W0o_WClz2kk)
if curious, but no need to read it.

## Issues

- Closes #11229
- Closes #10996
- Closes #8405

## Dependencies

`tree_sitter` and `tree_sitter_languages` on PyPI. We have tried to add
these as optional dependencies.

## Documentation

We have updated the list of supported languages, and also added a
section to `source_code.ipynb` detailing how to add support for
additional languages using our framework.

## Maintainer

- @hwchase17 (previously reviewed
https://github.com/langchain-ai/langchain/pull/6486)

Thanks!!

## Git commits

We will gladly squash any/all of our commits (esp merge commits) if
necessary. Let us know if this is desirable, or if you will be
squash-merging anyway.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Maaz Hashmi <mhashmi373@gmail.com>
Co-authored-by: LeilaChr <87657694+LeilaChr@users.noreply.github.com>
Co-authored-by: Jeremy La <jeremylai511@gmail.com>
Co-authored-by: Megabear137 <zubair.alnoor27@gmail.com>
Co-authored-by: Lee Harrold <lhharrold@sep.com>
Co-authored-by: Mario928 <88029051+Mario928@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
5 months ago
Bagatur 3925071dd6
langchain[patch], templates[patch]: fix multi query retriever, web re… (#17434)
…search retriever

Fixes #17352
5 months ago
Bagatur c0ce93236a
experimental[patch]: fix zero-shot pandas agent (#17442) 5 months ago
Abhishek Jain 37e1275f9e
community[patch]: Fixed the 'aembed' method of 'CohereEmbeddings'. (#16497)
**Description:**
- The existing code was trying to find a `.embeddings` property on the
`Coroutine` returned by calling `cohere.async_client.embed`.
- Instead, the `.embeddings` property is present on the value returned
by the `Coroutine`.
- Also, it seems that the original cohere client expects a value of
`max_retries` to not be `None`. Hence, setting the default value of
`max_retries` to `3`.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Sridhar Ramaswamy 9f1cbbc6ed
community[minor]: Add pebblo safe document loader (#16862)
- **Description:** Pebblo opensource project enables developers to
safely load data to their Gen AI apps. It identifies semantic topics and
entities found in the loaded data and summarizes them in a
developer-friendly report.
  - **Dependencies:** none
  - **Twitter handle:** srics

@hwchase17
5 months ago
mhavey 1bbb64d956
community[minor], langchian[minor]: Add Neptune Rdf graph and chain (#16650)
**Description**: This PR adds a chain for Amazon Neptune graph database
RDF format. It complements the existing Neptune Cypher chain. The PR
also includes a Neptune RDF graph class to connect to, introspect, and
query a Neptune RDF graph database from the chain. A sample notebook is
provided under docs that demonstrates the overall effect: invoking the
chain to make natural language queries against Neptune using an LLM.

**Issue**: This is a new feature
 
**Dependencies**: The RDF graph class depends on the AWS boto3 library
if using IAM authentication to connect to the Neptune database.

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Michael Feil e1cfd0f3e7
community[patch]: infinity embeddings update incorrect default url (#16759)
The default url has always been incorrect (7797 instead 7997). Here is a
update to the correct url.
5 months ago
Massimiliano Pronesti df7cbd6fbb
community[minor]: add FlashRank ranker (#16785)
**Description:** This PR adds support for
[flashrank](https://github.com/PrithivirajDamodaran/FlashRank) for
reranking as alternative to Cohere.

I'm not sure `libs/langchain` is the right place for this change. At
first, I wanted to put it under `libs/community`. All the compressors
were under `libs/langchain/retrievers/document_compressors` though. Hope
this makes sense!
5 months ago
Andreas Motl 1fdd9bd980
community/SQLDatabase: Generalize and trim software tests (#16659)
- **Description:** Improve test cases for `SQLDatabase` adapter
component, see
[suggestion](https://github.com/langchain-ai/langchain/pull/16655#pullrequestreview-1846749474).
  - **Depends on:** GH-16655
  - **Addressed to:** @baskaryan, @cbornet, @eyurtsev

_Remark: This PR is stacked upon GH-16655, so that one will need to go
in first._

Edit: Thank you for bringing in GH-17191, @eyurtsev. This is a little
aftermath, improving/streamlining the corresponding test cases.
5 months ago
Theo / Taeyoon Kang 1987f905ed
core[patch]: Support .yml extension for YAML (#16783)
- **Description:**

[AS-IS] When dealing with a yaml file, the extension must be .yaml.  

[TO-BE] In the absence of extension length constraints in the OS, the
extension of the YAML file is yaml, but control over the yml extension
must still be made.

It's as if it's an error because it's a .jpg extension in jpeg support.

  - **Issue:** - 

  - **Dependencies:**
no dependencies required for this change,
5 months ago
Kapil Sachdeva cd00a87db7
community[patch] - in FAISS vector store, support passing custom DocStore implementation when using from_xxx methods (#16801)
- **Description:** The from__xx methods of FAISS class have hardcoded
InMemoryStore implementation and thereby not let users pass a custom
DocStore implementation,
  - **Issue:** no referenced issue,
  - **Dependencies:** none,
  - **Twitter handle:** ksachdeva
5 months ago
Chris f9f5626ca4
community[patch]: Fix github search issues and PRs PaginatedList has no len() error (#16806)
**Description:** 
Bugfix: Langchain_community's GitHub Api wrapper throws a TypeError when
searching for issues and/or PRs (the `search_issues_and_prs` method).
This is because PyGithub's PageinatedList type does not support the
len() method. See https://github.com/PyGithub/PyGithub/issues/1476

![image](https://github.com/langchain-ai/langchain/assets/8849021/57390b11-ed41-4f48-ba50-f3028610789c)
  **Dependencies:** None 
  **Twitter handle**: @ChrisKeoghNZ
  
I haven't registered an issue as it would take me longer to fill the
template out than to make the fix, but I'm happy to if that's deemed
essential.

I've added a simple integration test to cover this as there were no
existing unit tests and it was going to be tricky to set them up.

Co-authored-by: Chris Keogh <chris.keogh@xero.com>
5 months ago
morgana 722aae4fd1
community: add delete method to rocksetdb vectorstore to support recordmanager (#17030)
- **Description:** This adds a delete method so that rocksetdb can be
used with `RecordManager`.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** `@_morgan_adams_`

---------

Co-authored-by: Rockset API Bot <admin@rockset.io>
5 months ago
yin1991 c454dc36fc
community[proxy]: Enhancement/add proxy support playwrighturlloader 16751 (#16822)
- **Description:** Enhancement/add proxy support playwrighturlloader
16751
- **Issue:** [Enhancement: Add Proxy Support to PlaywrightURLLoader
Class](https://github.com/langchain-ai/langchain/issues/16751)
  - **Dependencies:** 
  - **Twitter handle:** @ootR77013489

---------

Co-authored-by: root <root@ip-172-31-46-160.ap-southeast-1.compute.internal>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Bhupesh Varshney e3b775e035
infra: make `.gitignore` consistent with standard python gitignore (#16828)
- The new .gitignore version is inherited from the one maintained by the
github community over at
https://github.com/github/gitignore/blob/main/Python.gitignore
- This should cover all the cases of how a langchain app can be used.
5 months ago
James Braza 64938ae6f2
infra: unit testing `check_package_version` (#16825)
Wrote a unit test for `check_package_version` in the core package.

Note that this is a revival of
https://github.com/langchain-ai/langchain/pull/16387 after GitHub
incident (see
https://github.com/langchain-ai/langchain/discussions/16796).
5 months ago
Lingzhen Chen 30af711c34
community[patch]: update AzureSearch class to work with azure-search-documents=11.4.0 (#15659)
- **Description:** Updates
`libs/community/langchain_community/vectorstores/azuresearch.py` to
support the stable version `azure-search-documents=11.4.0`
- **Issue:** https://github.com/langchain-ai/langchain/issues/14534,
https://github.com/langchain-ai/langchain/issues/15039,
https://github.com/langchain-ai/langchain/issues/15355
  - **Dependencies:** azure-search-documents>=11.4.0

---------

Co-authored-by: Clément Tamines <Skar0@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Robby e135dc70c3
community[patch]: Invoke callback prior to yielding token (#17348)
**Description:** Invoke callback prior to yielding token in stream
method for Ollama.
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)

Co-authored-by: Robby <h0rv@users.noreply.github.com>
5 months ago
Christophe Bornet ab025507bc
community[patch]: Add async methods to VectorStoreQATool (#16949) 5 months ago
Christophe Bornet fb7552bfcf
Add async methods to InMemoryCache (#17425)
Add async methods to InMemoryCache
5 months ago
Eugene Yurtsev 93472ee9e6
core[patch]: Replace memory stream implementation used by LogStreamCallbackHandler (#17185)
This PR replaces the memory stream implementation used by the 
LogStreamCallbackHandler.

This implementation resolves an issue in which streamed logs and
streamed events originating from sync code would arrive only after the
entire sync code would finish execution (rather than arriving in real
time as they're generated).

One example is if trying to stream tokens from an llm within a tool. If
the tool was an async tool, but the llm was invoked via stream (sync
variant) rather than astream (async variant), then the tokens would fail
to stream in real time and would all arrived bunched up after the tool
invocation completed.
5 months ago
yin1991 37ef6ac113
community[patch]: Add Pagination to GitHubIssuesLoader for Efficient GitHub Issues Retrieval (#16934)
- **Description:** Add Pagination to GitHubIssuesLoader for Efficient
GitHub Issues Retrieval
- **Issue:** [the issue # it fixes if
applicable,](https://github.com/langchain-ai/langchain/issues/16864)

---------

Co-authored-by: root <root@ip-172-31-46-160.ap-southeast-1.compute.internal>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Bagatur 22638e5927
community[patch]: give reranker default client val (#17289) 5 months ago
Robby ece4b43a81
community[patch]: doc loaders mypy fixes (#17368)
**Description:** Fixed `type: ignore`'s for mypy for some
document_loaders.
**Issue:** [Remove "type: ignore" comments #17048
](https://github.com/langchain-ai/langchain/issues/17048)

---------

Co-authored-by: Robby <h0rv@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
Robby 0653aa469a
community[patch]: Invoke callback prior to yielding token (#17346)
**Description:** Invoke callback prior to yielding token in stream
method for watsonx.
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)

Co-authored-by: Robby <h0rv@users.noreply.github.com>
5 months ago
Bagatur f7e453971d
community[patch]: remove print (#17435) 5 months ago
Spencer Kelly 54fa78c887
community[patch]: fixed vector similarity filtering (#16967)
**Description:** changed filtering so that failed filter doesn't add
document to results. Currently filtering is entirely broken and all
documents are returned whether or not they pass the filter.

fixes issue introduced in
https://github.com/langchain-ai/langchain/pull/16190
5 months ago
Aditya a23c719c8b
google-genai[minor]: add safety settings (#16836)
Replace this entire comment with:
- **Description:Expose safety_settings for Gemini integrations on
google-generativeai
  - **Issue:NA,
  - **Dependencies:NA
  - **Twitter handle:@aditya_rane

@lkuligin for review

---------

Co-authored-by: adityarane@google.com <adityarane@google.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
Abhijeeth Padarthi 584b647b96
community[minor]: AWS Athena Document Loader (#15625)
- **Description:** Adds the document loader for [AWS
Athena](https://aws.amazon.com/athena/), a serverless and interactive
analytics service.
  - **Dependencies:** Added boto3 as a dependency
5 months ago
david-tempelmann 93da18b667
community[minor]: Add mmr and similarity_score_threshold retrieval to DatabricksVectorSearch (#16829)
- **Description:** This PR adds support for `search_types="mmr"` and
`search_type="similarity_score_threshold"` to retrievers using
`DatabricksVectorSearch`,
  - **Issue:** 
  - **Dependencies:**
  - **Twitter handle:**

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Erick Friis 42648061ad
openai[patch]: code cleaning (#17355)
h/t @tdene for finding cleanup op in #17047
5 months ago
Massimiliano Pronesti 3894b4d9a5
community: add gpt-4-turbo and gpt-4-0125 costs (#17349)
Ref: https://openai.com/pricing
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
5 months ago
Tomaz Bratanic 19a1c9183d
Improve graph cypher qa prompt (#17380)
Unlike vector results, the LLM has to completely trust the context of a
graph database result, even if it doesn't provide whole context. We
tried with instructions, but it seems that adding a single example is
the way to go to solve this issue.
5 months ago
Sandeep Banerjee 183daa6e6f
google-genai[patch]: on_llm_new_token fix (#16924)
### This pull request makes the following changes:
* Fixed issue #16913

Fixed the google gen ai chat_models.py code to make sure that the
callback is called before the token is yielded

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
Bagatur 10c10f2dea
cli[patch]: integration template nits (#14691)
Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
Erick Friis 99540d3d75
infra: no print in newer partner packages (#17353)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
5 months ago
William FH 7c03cc5ed4
Support serialization when inputs/outputs contain generators (#17338)
Pydantic's `dict()` function raises an error here if you pass in a
generator. We have a more robust serialization function in lagnsmith
that we will use instead.
5 months ago
Erick Friis 3a2eb6e12b
infra: add print rule to ruff (#16221)
Added noqa for existing prints. Can slowly remove / will prevent more
being intro'd
5 months ago
Jael Gu c07c0da01a
community[patch]: Fix Milvus add texts when ids=None (#17021)
- **Description:** Fix Milvus add texts when ids=None (auto_id=True)

Signed-off-by: Jael Gu <mengjia.gu@zilliz.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Quang Hoa 54c1fb3f25
community[patch]: Make some functions work with Milvus (#10695)
**Description**
Make some functions work with Milvus:
1. get_ids: Get primary keys by field in the metadata
2. delete: Delete one or more entities by ids
3. upsert: Update/Insert one or more entities

**Issue**
None
**Dependencies**
None
**Tag maintainer:**
@hwchase17 
**Twitter handle:**
None

---------

Co-authored-by: HoaNQ9 <hoanq.1811@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
kYLe c9999557bf
community[patch]: Modify LLMs/Anyscale work with OpenAI API v1 (#14206)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
- **Description:** 
1. Modify LLMs/Anyscale to work with OAI v1
2. Get rid of openai_ prefixed variables in Chat_model/ChatAnyscale
3. Modify `anyscale_api_base` to `anyscale_base_url` to follow OAI name
convention (reverted)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
Charlie Marsh 24c0bab57b
infra, multiple: Upgrade configuration for Ruff v0.2.0 (#16905)
## Summary

This PR upgrades LangChain's Ruff configuration in preparation for
Ruff's v0.2.0 release. (The changes are compatible with Ruff v0.1.5,
which LangChain uses today.) Specifically, we're now warning when
linter-only options are specified under `[tool.ruff]` instead of
`[tool.ruff.lint]`.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Bagatur 01409add5a
google-vertexai[patch]: rm deps (#17077)
Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
Erick Friis 1c2facf88d
nvidia-ai-endpoints[patch]: release 0.0.3 (#17345) 5 months ago
Vadim Kudlay 5f9ac6986e
nvidia-ai-endpoints[patch]: model arguments (e.g. temperature) on construction bug (#17290)
- **Issue:** Issue with model argument support (been there for a while
actually):
- Non-specially-handled arguments like temperature don't work when
passed through constructor.
- Such arguments DO work quite well with `bind`, but also do not abide
by field requirements.
- Since initial push, server-side error messages have gotten better and
v0.0.2 raises better exceptions. So maybe it's better to let server-side
handle such issues?
- **Description:**
- Removed ChatNVIDIA's argument fields in favor of
`model_kwargs`/`model_kws` arguments which aggregates constructor kwargs
(from constructor pathway) and merges them with call kwargs (bind
pathway).
- Shuffled a few functions from `_NVIDIAClient` to `ChatNVIDIA` to
streamline construction for future integrations.
- Minor/Optional: Old services didn't have stop support, so client-side
stopping was implemented. Now do both.
- **Any Breaking Changes:** Minor breaking changes if you strongly rely
on chat_model.temperature, etc. This is captured by
chat_model.model_kwargs.

PR passes tests and example notebooks and example testing. Still gonna
chat with some people, so leaving as draft for now.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
Leonid Ganeline 932c52c333
community[patch]: docstrings (#16810)
- added missed docstrings
- formated docstrings to the consistent form
5 months ago
Leonid Ganeline ae66bcbc10
core[patch]: docstring update (#16813)
- added missed docstrings
- formated docstrings to consistent form
5 months ago
Eugene Yurtsev e10030e241
core[patch]: Add unit test to cover different streaming format for json parsing (#17063)
Add unit test to cover this issue:

https://github.com/langchain-ai/langchain/issues/16423

which was resolved by this PR:

https://github.com/langchain-ai/langchain/pull/16670/files

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Kononov Pavel 15bc201967
langchain_community: Fix typo bug (#17324)
Problem from #17095

This error wasn't in the v1.4.0
5 months ago
Erick Friis e660a1685b
google-genai[patch]: release 0.0.8 (#17285) 5 months ago
Erick Friis febf9540b9
google-genai[patch]: fix tool format, use protos (#17284) 5 months ago
German Martin 1032faba5f
langchain_google_genai : Add missing _identifying_params property. (#17224)
Description: Missing _identifying_params create issues when dealing with
callbacks to get current run model parameters.
All other model partners implementation provide this property and also
provide _default_params. I'm not sure about the default values to
include or if we can re-use the same as for _VertexAICommon(), this
change allows you to access the model parameters correctly.
Issue: Not exactly this issue but could be related
https://github.com/langchain-ai/langchain/issues/14711
Twitter handle:@musicaoriginal2
5 months ago
Erick Friis e4da7918f3
google-genai[patch]: fix streaming, function calling (#17268) 5 months ago
Ruben Hakopian 96b5711a0c
google-vertexai[patch]: Fixed SafetySettings handling in streaming API in VertexAI (#17278)
The streaming API doesn't separate safety_settings from the
generation_config payload. As the result the following error is observed
when using `stream` API. The functionality is correct with `invoke` API.

The fix separates the `safety_settings` from params and sets it as
argument to the `send_message` method.

```
ERROR:         Unknown field for GenerationConfig: safety_settings
Traceback (most recent call last):
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 250, in stream
    raise e
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 234, in stream
    for chunk in self._stream(
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/langchain_google_vertexai/chat_models.py", line 501, in _stream
    for response in responses:
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/vertexai/generative_models/_generative_models.py", line 921, in _send_message_streaming
    for chunk in stream:
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/vertexai/generative_models/_generative_models.py", line 514, in _generate_content_streaming
    request = self._prepare_request(
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/vertexai/generative_models/_generative_models.py", line 256, in _prepare_request
    gapic_generation_config = gapic_content_types.GenerationConfig(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/proto/message.py", line 576, in __init__
    raise ValueError(
ValueError: Unknown field for GenerationConfig: safety_settings
```

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
Bagatur 65e97c9b53
infra: mv SQLDatabase tests to community (#17276) 5 months ago
Bagatur 72c7af0bc0
langchain[patch]: undo redis cache import (#17275) 5 months ago
Bagatur 8bad4157ad
langchain[patch]: Release 0.1.6 (#17133) 5 months ago
Bagatur 7fa4dc593f
core[patch]: Release 0.1.22 (#17274) 5 months ago
Bagatur 02ef9164b5
langchain[patch]: expose cohere rerank score, add parent doc param (#16887) 5 months ago
Bagatur 35c1bf339d
infra: rm boto3, gcaip from pyproject (#17270) 5 months ago
Alex de5e96b5f9
community[patch]: updated openai prices in mapping (#17009)
- **Description:** there are january prices update for chatgpt
[blog](https://openai.com/blog/new-embedding-models-and-api-updates),
also there are updates on their website on page
[pricing](https://openai.com/pricing)
- **Issue:** N/A

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Mohammad Mohtashim e35c7fa3b2
[Langchain_core]: Added Docstring for RunnableConfigurableAlternatives (#17263)
I noticed that RunnableConfigurableAlternatives which is an important
composition in LCEL has no Docstring. Therefore I added the detailed
Docstring for it.
@baskaryan, @eyurtsev, @hwchase17 please have a look and let me if the
docstring is looking good.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Armin Stepanyan 641efcf41c
community: add runtime kwargs to HuggingFacePipeline (#17005)
This PR enables changing the behaviour of huggingface pipeline between
different calls. For example, before this PR there's no way of changing
maximum generation length between different invocations of the chain.
This is desirable in cases, such as when we want to scale the maximum
output size depending on a dynamic prompt size.

Usage example:

```python
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
hf = HuggingFacePipeline(pipeline=pipe)

hf("Say foo:", pipeline_kwargs={"max_new_tokens": 42})
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Scott Nath a32798abd7
community: Add you.com utility, update you retriever integration docs (#17014)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

- **Description: changes to you.com files** 
    - general cleanup
- adds community/utilities/you.py, moving bulk of code from retriever ->
utility
    - removes `snippet` as endpoint
    - adds `news` as endpoint
    - adds more tests

<s>**Description: update community MAKE file** 
    - adds `integration_tests`
    - adds `coverage`</s>

- **Issue:** the issue # it fixes if applicable,
- [For New Contributors: Update Integration
Documentation](https://github.com/langchain-ai/langchain/issues/15664#issuecomment-1920099868)
- **Dependencies:** n/a
- **Twitter handle:** @scottnath
- **Mastodon handle:** scottnath@mastodon.social

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
joelsprunger 3984f6604f
langchain: adds recursive json splitter (#17144)
- **Description:** This adds a recursive json splitter class to the
existing text_splitters as well as unit tests
- **Issue:** splitting text from structured data can cause issues if you
have a large nested json object and you split it as regular text you may
end up losing the structure of the json. To mitigate against this you
can split the nested json into large chunks and overlap them, but this
causes unnecessary text processing and there will still be times where
the nested json is so big that the chunks get separated from the parent
keys.

As an example you wouldn't want the following to be split in half:
```shell
{'val0': 'DFWeNdWhapbR',
 'val1': {'val10': 'QdJo',
          'val11': 'FWSDVFHClW',
          'val12': 'bkVnXMMlTiQh',
          'val13': 'tdDMKRrOY',
          'val14': 'zybPALvL',
          'val15': 'JMzGMNH',
          'val16': {'val160': 'qLuLKusFw',
                    'val161': 'DGuotLh',
                    'val162': 'KztlcSBropT',
-----------------------------------------------------------------------split-----
                    'val163': 'YlHHDrN',
                    'val164': 'CtzsxlGBZKf',
                    'val165': 'bXzhcrWLmBFp',
                    'val166': 'zZAqC',
                    'val167': 'ZtyWno',
                    'val168': 'nQQZRsLnaBhb',
                    'val169': 'gSpMbJwA'},
          'val17': 'JhgiyF',
          'val18': 'aJaqjUSFFrI',
          'val19': 'glqNSvoyxdg'}}
```
Any llm processing the second chunk of text may not have the context of
val1, and val16 reducing accuracy. Embeddings will also lack this
context and this makes retrieval less accurate.

Instead you want it to be split into chunks that retain the json
structure.
```shell
{'val0': 'DFWeNdWhapbR',
 'val1': {'val10': 'QdJo',
          'val11': 'FWSDVFHClW',
          'val12': 'bkVnXMMlTiQh',
          'val13': 'tdDMKRrOY',
          'val14': 'zybPALvL',
          'val15': 'JMzGMNH',
          'val16': {'val160': 'qLuLKusFw',
                    'val161': 'DGuotLh',
                    'val162': 'KztlcSBropT',
                    'val163': 'YlHHDrN',
                    'val164': 'CtzsxlGBZKf'}}}
```
and
```shell
{'val1':{'val16':{
                    'val165': 'bXzhcrWLmBFp',
                    'val166': 'zZAqC',
                    'val167': 'ZtyWno',
                    'val168': 'nQQZRsLnaBhb',
                    'val169': 'gSpMbJwA'},
          'val17': 'JhgiyF',
          'val18': 'aJaqjUSFFrI',
          'val19': 'glqNSvoyxdg'}}
```
This recursive json text splitter does this. Values that contain a list
can be converted to dict first by using split(... convert_lists=True)
otherwise long lists will not be split and you may end up with chunks
larger than the max chunk.

In my testing large json objects could be split into small chunks with 
   Increased question answering accuracy
 The ability to split into smaller chunks meant retrieval queries can
use fewer tokens


- **Dependencies:** json import added to text_splitter.py, and random
added to the unit test
  - **Twitter handle:** @joelsprunger

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
5 months ago
Leonid Kuligin 1862900078
google-genai[patch]: added parsing of function call / response (#17245) 5 months ago
Cailin Wang a210a8bc53
langchain[patch]: Fix create_retriever_tool missing on_retriever_end Document content (#16933)
- **Description:** In create_retriever_tool create_tool, fix
create_retriever_tool's missing Document content for on_retriever_end,
caused by create_retriever_tool's missing callbacks parameter,
  - **Twitter handle:** @CailinWang_

---------

Co-authored-by: root <root@Bluedot-AI>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Sparsh Jain a2167614b7
google-genai[patch]: Invoke callback prior to yielding token (#17092)
- **Description:** Invoke callback prior to yielding token in stream and
astream methods for Google-genai,
  - **Issue:** the issue # 16913,
  - **Twitter handle:** Sparsh10649446

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
Liang Zhang 7306600e2f
community[patch]: Support SerDe transform functions in Databricks LLM (#16752)
**Description:** Databricks LLM does not support SerDe the
transform_input_fn and transform_output_fn. After saving and loading,
the LLM will be broken. This PR serialize these functions into a hex
string using pickle, and saving the hex string in the yaml file. Using
pickle to serialize a function can be flaky, but this is a simple
workaround that unblocks many use cases. If more sophisticated SerDe is
needed, we can improve it later.

Test:
Added a simple unit test.
I did manual test on Databricks and it works well.
The saved yaml looks like:
```
llm:
      _type: databricks
      cluster_driver_port: null
      cluster_id: null
      databricks_uri: databricks
      endpoint_name: databricks-mixtral-8x7b-instruct
      extra_params: {}
      host: e2-dogfood.staging.cloud.databricks.com
      max_tokens: null
      model_kwargs: null
      n: 1
      stop: null
      task: null
      temperature: 0.0
      transform_input_fn: 80049520000000000000008c085f5f6d61696e5f5f948c0f7472616e73666f726d5f696e7075749493942e
      transform_output_fn: null
```

@baskaryan

```python
from langchain_community.embeddings import DatabricksEmbeddings
from langchain_community.llms import Databricks
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
import mlflow

embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")

def transform_input(**request):
  request["messages"] = [
    {
      "role": "user",
      "content": request["prompt"]
    }
  ]
  del request["prompt"]
  return request

llm = Databricks(endpoint_name="databricks-mixtral-8x7b-instruct", transform_input_fn=transform_input)

persist_dir = "faiss_databricks_embedding"

# Create the vector db, persist the db to a local fs folder
loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
db = FAISS.from_documents(docs, embeddings)
db.save_local(persist_dir)

def load_retriever(persist_directory):
    embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
    vectorstore = FAISS.load_local(persist_directory, embeddings)
    return vectorstore.as_retriever()

retriever = load_retriever(persist_dir)
retrievalQA = RetrievalQA.from_llm(llm=llm, retriever=retriever)
with mlflow.start_run() as run:
    logged_model = mlflow.langchain.log_model(
        retrievalQA,
        artifact_path="retrieval_qa",
        loader_fn=load_retriever,
        persist_dir=persist_dir,
    )

# Load the retrievalQA chain
loaded_model = mlflow.pyfunc.load_model(logged_model.model_uri)
print(loaded_model.predict([{"query": "What did the president say about Ketanji Brown Jackson"}]))

```
5 months ago
cjpark-data ce22e10c4b
community[patch]: Fix KeyError 'embedding' (MongoDBAtlasVectorSearch) (#17178)
- **Description:**
Embedding field name was hard-coded named "embedding".
So I suggest that change `res["embedding"]` into
`res[self._embedding_key]`.
  - **Issue:** #17177,
- **Twitter handle:**
[@bagcheoljun17](https://twitter.com/bagcheoljun17)
5 months ago
Neli Hateva 9bb5157a3d
langchain[patch], community[patch]: Fixes in the Ontotext GraphDB Graph and QA Chain (#17239)
- **Description:** Fixes in the Ontotext GraphDB Graph and QA Chain
related to the error handling in case of invalid SPARQL queries, for
which `prepareQuery` doesn't throw an exception, but the server returns
400 and the query is indeed invalid
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** @OntotextGraphDB
5 months ago