Commit Graph

3072 Commits (9bf58ec7dd0dd16760baa95f5ede37ae7bbd77fa)

Author SHA1 Message Date
Erick Friis 065cde69b1
google-genai[patch]: release 0.0.9, safety settings docs (#17432) 7 months ago
Sergey Kozlov db6f266d97
core: improve None value processing in merge_dicts() (#17462)
- **Description:** fix `None` and `0` merging in `merge_dicts()`, add
tests.
```python
from langchain_core.utils._merge import merge_dicts
assert merge_dicts({"a": None}, {"a": 0}) == {"a": 0}
```

---------

Co-authored-by: Sergey Kozlov <sergey.kozlov@ludditelabs.io>
7 months ago
Ian Gregory e5472b5eb8
Framework for supporting more languages in LanguageParser (#13318)
## Description

I am submitting this for a school project as part of a team of 5. Other
team members are @LeilaChr, @maazh10, @Megabear137, @jelalalamy. This PR
also has contributions from community members @Harrolee and @Mario928.

Initial context is in the issue we opened (#11229).

This pull request adds:

- Generic framework for expanding the languages that `LanguageParser`
can handle, using the
[tree-sitter](https://github.com/tree-sitter/py-tree-sitter#py-tree-sitter)
parsing library and existing language-specific parsers written for it
- Support for the following additional languages in `LanguageParser`:
  - C
  - C++
  - C#
  - Go
- Java (contributed by @Mario928
https://github.com/ThatsJustCheesy/langchain/pull/2)
  - Kotlin
  - Lua
  - Perl
  - Ruby
  - Rust
  - Scala
- TypeScript (contributed by @Harrolee
https://github.com/ThatsJustCheesy/langchain/pull/1)

Here is the [design
document](https://docs.google.com/document/d/17dB14cKCWAaiTeSeBtxHpoVPGKrsPye8W0o_WClz2kk)
if curious, but no need to read it.

## Issues

- Closes #11229
- Closes #10996
- Closes #8405

## Dependencies

`tree_sitter` and `tree_sitter_languages` on PyPI. We have tried to add
these as optional dependencies.

## Documentation

We have updated the list of supported languages, and also added a
section to `source_code.ipynb` detailing how to add support for
additional languages using our framework.

## Maintainer

- @hwchase17 (previously reviewed
https://github.com/langchain-ai/langchain/pull/6486)

Thanks!!

## Git commits

We will gladly squash any/all of our commits (esp merge commits) if
necessary. Let us know if this is desirable, or if you will be
squash-merging anyway.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Maaz Hashmi <mhashmi373@gmail.com>
Co-authored-by: LeilaChr <87657694+LeilaChr@users.noreply.github.com>
Co-authored-by: Jeremy La <jeremylai511@gmail.com>
Co-authored-by: Megabear137 <zubair.alnoor27@gmail.com>
Co-authored-by: Lee Harrold <lhharrold@sep.com>
Co-authored-by: Mario928 <88029051+Mario928@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
7 months ago
Bagatur 3925071dd6
langchain[patch], templates[patch]: fix multi query retriever, web re… (#17434)
…search retriever

Fixes #17352
7 months ago
Bagatur c0ce93236a
experimental[patch]: fix zero-shot pandas agent (#17442) 7 months ago
Abhishek Jain 37e1275f9e
community[patch]: Fixed the 'aembed' method of 'CohereEmbeddings'. (#16497)
**Description:**
- The existing code was trying to find a `.embeddings` property on the
`Coroutine` returned by calling `cohere.async_client.embed`.
- Instead, the `.embeddings` property is present on the value returned
by the `Coroutine`.
- Also, it seems that the original cohere client expects a value of
`max_retries` to not be `None`. Hence, setting the default value of
`max_retries` to `3`.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Sridhar Ramaswamy 9f1cbbc6ed
community[minor]: Add pebblo safe document loader (#16862)
- **Description:** Pebblo opensource project enables developers to
safely load data to their Gen AI apps. It identifies semantic topics and
entities found in the loaded data and summarizes them in a
developer-friendly report.
  - **Dependencies:** none
  - **Twitter handle:** srics

@hwchase17
7 months ago
mhavey 1bbb64d956
community[minor], langchian[minor]: Add Neptune Rdf graph and chain (#16650)
**Description**: This PR adds a chain for Amazon Neptune graph database
RDF format. It complements the existing Neptune Cypher chain. The PR
also includes a Neptune RDF graph class to connect to, introspect, and
query a Neptune RDF graph database from the chain. A sample notebook is
provided under docs that demonstrates the overall effect: invoking the
chain to make natural language queries against Neptune using an LLM.

**Issue**: This is a new feature
 
**Dependencies**: The RDF graph class depends on the AWS boto3 library
if using IAM authentication to connect to the Neptune database.

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Michael Feil e1cfd0f3e7
community[patch]: infinity embeddings update incorrect default url (#16759)
The default url has always been incorrect (7797 instead 7997). Here is a
update to the correct url.
7 months ago
Massimiliano Pronesti df7cbd6fbb
community[minor]: add FlashRank ranker (#16785)
**Description:** This PR adds support for
[flashrank](https://github.com/PrithivirajDamodaran/FlashRank) for
reranking as alternative to Cohere.

I'm not sure `libs/langchain` is the right place for this change. At
first, I wanted to put it under `libs/community`. All the compressors
were under `libs/langchain/retrievers/document_compressors` though. Hope
this makes sense!
7 months ago
Andreas Motl 1fdd9bd980
community/SQLDatabase: Generalize and trim software tests (#16659)
- **Description:** Improve test cases for `SQLDatabase` adapter
component, see
[suggestion](https://github.com/langchain-ai/langchain/pull/16655#pullrequestreview-1846749474).
  - **Depends on:** GH-16655
  - **Addressed to:** @baskaryan, @cbornet, @eyurtsev

_Remark: This PR is stacked upon GH-16655, so that one will need to go
in first._

Edit: Thank you for bringing in GH-17191, @eyurtsev. This is a little
aftermath, improving/streamlining the corresponding test cases.
7 months ago
Theo / Taeyoon Kang 1987f905ed
core[patch]: Support .yml extension for YAML (#16783)
- **Description:**

[AS-IS] When dealing with a yaml file, the extension must be .yaml.  

[TO-BE] In the absence of extension length constraints in the OS, the
extension of the YAML file is yaml, but control over the yml extension
must still be made.

It's as if it's an error because it's a .jpg extension in jpeg support.

  - **Issue:** - 

  - **Dependencies:**
no dependencies required for this change,
7 months ago
Kapil Sachdeva cd00a87db7
community[patch] - in FAISS vector store, support passing custom DocStore implementation when using from_xxx methods (#16801)
- **Description:** The from__xx methods of FAISS class have hardcoded
InMemoryStore implementation and thereby not let users pass a custom
DocStore implementation,
  - **Issue:** no referenced issue,
  - **Dependencies:** none,
  - **Twitter handle:** ksachdeva
7 months ago
Chris f9f5626ca4
community[patch]: Fix github search issues and PRs PaginatedList has no len() error (#16806)
**Description:** 
Bugfix: Langchain_community's GitHub Api wrapper throws a TypeError when
searching for issues and/or PRs (the `search_issues_and_prs` method).
This is because PyGithub's PageinatedList type does not support the
len() method. See https://github.com/PyGithub/PyGithub/issues/1476

![image](https://github.com/langchain-ai/langchain/assets/8849021/57390b11-ed41-4f48-ba50-f3028610789c)
  **Dependencies:** None 
  **Twitter handle**: @ChrisKeoghNZ
  
I haven't registered an issue as it would take me longer to fill the
template out than to make the fix, but I'm happy to if that's deemed
essential.

I've added a simple integration test to cover this as there were no
existing unit tests and it was going to be tricky to set them up.

Co-authored-by: Chris Keogh <chris.keogh@xero.com>
7 months ago
morgana 722aae4fd1
community: add delete method to rocksetdb vectorstore to support recordmanager (#17030)
- **Description:** This adds a delete method so that rocksetdb can be
used with `RecordManager`.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** `@_morgan_adams_`

---------

Co-authored-by: Rockset API Bot <admin@rockset.io>
7 months ago
yin1991 c454dc36fc
community[proxy]: Enhancement/add proxy support playwrighturlloader 16751 (#16822)
- **Description:** Enhancement/add proxy support playwrighturlloader
16751
- **Issue:** [Enhancement: Add Proxy Support to PlaywrightURLLoader
Class](https://github.com/langchain-ai/langchain/issues/16751)
  - **Dependencies:** 
  - **Twitter handle:** @ootR77013489

---------

Co-authored-by: root <root@ip-172-31-46-160.ap-southeast-1.compute.internal>
Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Bhupesh Varshney e3b775e035
infra: make `.gitignore` consistent with standard python gitignore (#16828)
- The new .gitignore version is inherited from the one maintained by the
github community over at
https://github.com/github/gitignore/blob/main/Python.gitignore
- This should cover all the cases of how a langchain app can be used.
7 months ago
James Braza 64938ae6f2
infra: unit testing `check_package_version` (#16825)
Wrote a unit test for `check_package_version` in the core package.

Note that this is a revival of
https://github.com/langchain-ai/langchain/pull/16387 after GitHub
incident (see
https://github.com/langchain-ai/langchain/discussions/16796).
7 months ago
Lingzhen Chen 30af711c34
community[patch]: update AzureSearch class to work with azure-search-documents=11.4.0 (#15659)
- **Description:** Updates
`libs/community/langchain_community/vectorstores/azuresearch.py` to
support the stable version `azure-search-documents=11.4.0`
- **Issue:** https://github.com/langchain-ai/langchain/issues/14534,
https://github.com/langchain-ai/langchain/issues/15039,
https://github.com/langchain-ai/langchain/issues/15355
  - **Dependencies:** azure-search-documents>=11.4.0

---------

Co-authored-by: Clément Tamines <Skar0@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Robby e135dc70c3
community[patch]: Invoke callback prior to yielding token (#17348)
**Description:** Invoke callback prior to yielding token in stream
method for Ollama.
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)

Co-authored-by: Robby <h0rv@users.noreply.github.com>
7 months ago
Christophe Bornet ab025507bc
community[patch]: Add async methods to VectorStoreQATool (#16949) 7 months ago
Christophe Bornet fb7552bfcf
Add async methods to InMemoryCache (#17425)
Add async methods to InMemoryCache
7 months ago
Eugene Yurtsev 93472ee9e6
core[patch]: Replace memory stream implementation used by LogStreamCallbackHandler (#17185)
This PR replaces the memory stream implementation used by the 
LogStreamCallbackHandler.

This implementation resolves an issue in which streamed logs and
streamed events originating from sync code would arrive only after the
entire sync code would finish execution (rather than arriving in real
time as they're generated).

One example is if trying to stream tokens from an llm within a tool. If
the tool was an async tool, but the llm was invoked via stream (sync
variant) rather than astream (async variant), then the tokens would fail
to stream in real time and would all arrived bunched up after the tool
invocation completed.
7 months ago
yin1991 37ef6ac113
community[patch]: Add Pagination to GitHubIssuesLoader for Efficient GitHub Issues Retrieval (#16934)
- **Description:** Add Pagination to GitHubIssuesLoader for Efficient
GitHub Issues Retrieval
- **Issue:** [the issue # it fixes if
applicable,](https://github.com/langchain-ai/langchain/issues/16864)

---------

Co-authored-by: root <root@ip-172-31-46-160.ap-southeast-1.compute.internal>
Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Bagatur 22638e5927
community[patch]: give reranker default client val (#17289) 7 months ago
Robby ece4b43a81
community[patch]: doc loaders mypy fixes (#17368)
**Description:** Fixed `type: ignore`'s for mypy for some
document_loaders.
**Issue:** [Remove "type: ignore" comments #17048
](https://github.com/langchain-ai/langchain/issues/17048)

---------

Co-authored-by: Robby <h0rv@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
7 months ago
Robby 0653aa469a
community[patch]: Invoke callback prior to yielding token (#17346)
**Description:** Invoke callback prior to yielding token in stream
method for watsonx.
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)

Co-authored-by: Robby <h0rv@users.noreply.github.com>
7 months ago
Bagatur f7e453971d
community[patch]: remove print (#17435) 7 months ago
Spencer Kelly 54fa78c887
community[patch]: fixed vector similarity filtering (#16967)
**Description:** changed filtering so that failed filter doesn't add
document to results. Currently filtering is entirely broken and all
documents are returned whether or not they pass the filter.

fixes issue introduced in
https://github.com/langchain-ai/langchain/pull/16190
7 months ago
Aditya a23c719c8b
google-genai[minor]: add safety settings (#16836)
Replace this entire comment with:
- **Description:Expose safety_settings for Gemini integrations on
google-generativeai
  - **Issue:NA,
  - **Dependencies:NA
  - **Twitter handle:@aditya_rane

@lkuligin for review

---------

Co-authored-by: adityarane@google.com <adityarane@google.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
Abhijeeth Padarthi 584b647b96
community[minor]: AWS Athena Document Loader (#15625)
- **Description:** Adds the document loader for [AWS
Athena](https://aws.amazon.com/athena/), a serverless and interactive
analytics service.
  - **Dependencies:** Added boto3 as a dependency
7 months ago
david-tempelmann 93da18b667
community[minor]: Add mmr and similarity_score_threshold retrieval to DatabricksVectorSearch (#16829)
- **Description:** This PR adds support for `search_types="mmr"` and
`search_type="similarity_score_threshold"` to retrievers using
`DatabricksVectorSearch`,
  - **Issue:** 
  - **Dependencies:**
  - **Twitter handle:**

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Erick Friis 42648061ad
openai[patch]: code cleaning (#17355)
h/t @tdene for finding cleanup op in #17047
7 months ago
Massimiliano Pronesti 3894b4d9a5
community: add gpt-4-turbo and gpt-4-0125 costs (#17349)
Ref: https://openai.com/pricing
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
7 months ago
Tomaz Bratanic 19a1c9183d
Improve graph cypher qa prompt (#17380)
Unlike vector results, the LLM has to completely trust the context of a
graph database result, even if it doesn't provide whole context. We
tried with instructions, but it seems that adding a single example is
the way to go to solve this issue.
7 months ago
Sandeep Banerjee 183daa6e6f
google-genai[patch]: on_llm_new_token fix (#16924)
### This pull request makes the following changes:
* Fixed issue #16913

Fixed the google gen ai chat_models.py code to make sure that the
callback is called before the token is yielded

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
Bagatur 10c10f2dea
cli[patch]: integration template nits (#14691)
Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
Erick Friis 99540d3d75
infra: no print in newer partner packages (#17353)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
7 months ago
William FH 7c03cc5ed4
Support serialization when inputs/outputs contain generators (#17338)
Pydantic's `dict()` function raises an error here if you pass in a
generator. We have a more robust serialization function in lagnsmith
that we will use instead.
7 months ago
Erick Friis 3a2eb6e12b
infra: add print rule to ruff (#16221)
Added noqa for existing prints. Can slowly remove / will prevent more
being intro'd
7 months ago
Jael Gu c07c0da01a
community[patch]: Fix Milvus add texts when ids=None (#17021)
- **Description:** Fix Milvus add texts when ids=None (auto_id=True)

Signed-off-by: Jael Gu <mengjia.gu@zilliz.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Quang Hoa 54c1fb3f25
community[patch]: Make some functions work with Milvus (#10695)
**Description**
Make some functions work with Milvus:
1. get_ids: Get primary keys by field in the metadata
2. delete: Delete one or more entities by ids
3. upsert: Update/Insert one or more entities

**Issue**
None
**Dependencies**
None
**Tag maintainer:**
@hwchase17 
**Twitter handle:**
None

---------

Co-authored-by: HoaNQ9 <hoanq.1811@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
kYLe c9999557bf
community[patch]: Modify LLMs/Anyscale work with OpenAI API v1 (#14206)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
- **Description:** 
1. Modify LLMs/Anyscale to work with OAI v1
2. Get rid of openai_ prefixed variables in Chat_model/ChatAnyscale
3. Modify `anyscale_api_base` to `anyscale_base_url` to follow OAI name
convention (reverted)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
Charlie Marsh 24c0bab57b
infra, multiple: Upgrade configuration for Ruff v0.2.0 (#16905)
## Summary

This PR upgrades LangChain's Ruff configuration in preparation for
Ruff's v0.2.0 release. (The changes are compatible with Ruff v0.1.5,
which LangChain uses today.) Specifically, we're now warning when
linter-only options are specified under `[tool.ruff]` instead of
`[tool.ruff.lint]`.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Bagatur 01409add5a
google-vertexai[patch]: rm deps (#17077)
Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
Erick Friis 1c2facf88d
nvidia-ai-endpoints[patch]: release 0.0.3 (#17345) 7 months ago
Vadim Kudlay 5f9ac6986e
nvidia-ai-endpoints[patch]: model arguments (e.g. temperature) on construction bug (#17290)
- **Issue:** Issue with model argument support (been there for a while
actually):
- Non-specially-handled arguments like temperature don't work when
passed through constructor.
- Such arguments DO work quite well with `bind`, but also do not abide
by field requirements.
- Since initial push, server-side error messages have gotten better and
v0.0.2 raises better exceptions. So maybe it's better to let server-side
handle such issues?
- **Description:**
- Removed ChatNVIDIA's argument fields in favor of
`model_kwargs`/`model_kws` arguments which aggregates constructor kwargs
(from constructor pathway) and merges them with call kwargs (bind
pathway).
- Shuffled a few functions from `_NVIDIAClient` to `ChatNVIDIA` to
streamline construction for future integrations.
- Minor/Optional: Old services didn't have stop support, so client-side
stopping was implemented. Now do both.
- **Any Breaking Changes:** Minor breaking changes if you strongly rely
on chat_model.temperature, etc. This is captured by
chat_model.model_kwargs.

PR passes tests and example notebooks and example testing. Still gonna
chat with some people, so leaving as draft for now.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
Leonid Ganeline 932c52c333
community[patch]: docstrings (#16810)
- added missed docstrings
- formated docstrings to the consistent form
7 months ago
Leonid Ganeline ae66bcbc10
core[patch]: docstring update (#16813)
- added missed docstrings
- formated docstrings to consistent form
7 months ago
Eugene Yurtsev e10030e241
core[patch]: Add unit test to cover different streaming format for json parsing (#17063)
Add unit test to cover this issue:

https://github.com/langchain-ai/langchain/issues/16423

which was resolved by this PR:

https://github.com/langchain-ai/langchain/pull/16670/files

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Kononov Pavel 15bc201967
langchain_community: Fix typo bug (#17324)
Problem from #17095

This error wasn't in the v1.4.0
7 months ago
Erick Friis e660a1685b
google-genai[patch]: release 0.0.8 (#17285) 7 months ago
Erick Friis febf9540b9
google-genai[patch]: fix tool format, use protos (#17284) 7 months ago
German Martin 1032faba5f
langchain_google_genai : Add missing _identifying_params property. (#17224)
Description: Missing _identifying_params create issues when dealing with
callbacks to get current run model parameters.
All other model partners implementation provide this property and also
provide _default_params. I'm not sure about the default values to
include or if we can re-use the same as for _VertexAICommon(), this
change allows you to access the model parameters correctly.
Issue: Not exactly this issue but could be related
https://github.com/langchain-ai/langchain/issues/14711
Twitter handle:@musicaoriginal2
7 months ago
Erick Friis e4da7918f3
google-genai[patch]: fix streaming, function calling (#17268) 7 months ago
Ruben Hakopian 96b5711a0c
google-vertexai[patch]: Fixed SafetySettings handling in streaming API in VertexAI (#17278)
The streaming API doesn't separate safety_settings from the
generation_config payload. As the result the following error is observed
when using `stream` API. The functionality is correct with `invoke` API.

The fix separates the `safety_settings` from params and sets it as
argument to the `send_message` method.

```
ERROR:         Unknown field for GenerationConfig: safety_settings
Traceback (most recent call last):
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 250, in stream
    raise e
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 234, in stream
    for chunk in self._stream(
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/langchain_google_vertexai/chat_models.py", line 501, in _stream
    for response in responses:
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/vertexai/generative_models/_generative_models.py", line 921, in _send_message_streaming
    for chunk in stream:
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/vertexai/generative_models/_generative_models.py", line 514, in _generate_content_streaming
    request = self._prepare_request(
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/vertexai/generative_models/_generative_models.py", line 256, in _prepare_request
    gapic_generation_config = gapic_content_types.GenerationConfig(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/proto/message.py", line 576, in __init__
    raise ValueError(
ValueError: Unknown field for GenerationConfig: safety_settings
```

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
Bagatur 65e97c9b53
infra: mv SQLDatabase tests to community (#17276) 7 months ago
Bagatur 72c7af0bc0
langchain[patch]: undo redis cache import (#17275) 7 months ago
Bagatur 8bad4157ad
langchain[patch]: Release 0.1.6 (#17133) 7 months ago
Bagatur 7fa4dc593f
core[patch]: Release 0.1.22 (#17274) 7 months ago
Bagatur 02ef9164b5
langchain[patch]: expose cohere rerank score, add parent doc param (#16887) 7 months ago
Bagatur 35c1bf339d
infra: rm boto3, gcaip from pyproject (#17270) 7 months ago
Alex de5e96b5f9
community[patch]: updated openai prices in mapping (#17009)
- **Description:** there are january prices update for chatgpt
[blog](https://openai.com/blog/new-embedding-models-and-api-updates),
also there are updates on their website on page
[pricing](https://openai.com/pricing)
- **Issue:** N/A

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Mohammad Mohtashim e35c7fa3b2
[Langchain_core]: Added Docstring for RunnableConfigurableAlternatives (#17263)
I noticed that RunnableConfigurableAlternatives which is an important
composition in LCEL has no Docstring. Therefore I added the detailed
Docstring for it.
@baskaryan, @eyurtsev, @hwchase17 please have a look and let me if the
docstring is looking good.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Armin Stepanyan 641efcf41c
community: add runtime kwargs to HuggingFacePipeline (#17005)
This PR enables changing the behaviour of huggingface pipeline between
different calls. For example, before this PR there's no way of changing
maximum generation length between different invocations of the chain.
This is desirable in cases, such as when we want to scale the maximum
output size depending on a dynamic prompt size.

Usage example:

```python
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
hf = HuggingFacePipeline(pipeline=pipe)

hf("Say foo:", pipeline_kwargs={"max_new_tokens": 42})
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Scott Nath a32798abd7
community: Add you.com utility, update you retriever integration docs (#17014)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

- **Description: changes to you.com files** 
    - general cleanup
- adds community/utilities/you.py, moving bulk of code from retriever ->
utility
    - removes `snippet` as endpoint
    - adds `news` as endpoint
    - adds more tests

<s>**Description: update community MAKE file** 
    - adds `integration_tests`
    - adds `coverage`</s>

- **Issue:** the issue # it fixes if applicable,
- [For New Contributors: Update Integration
Documentation](https://github.com/langchain-ai/langchain/issues/15664#issuecomment-1920099868)
- **Dependencies:** n/a
- **Twitter handle:** @scottnath
- **Mastodon handle:** scottnath@mastodon.social

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
joelsprunger 3984f6604f
langchain: adds recursive json splitter (#17144)
- **Description:** This adds a recursive json splitter class to the
existing text_splitters as well as unit tests
- **Issue:** splitting text from structured data can cause issues if you
have a large nested json object and you split it as regular text you may
end up losing the structure of the json. To mitigate against this you
can split the nested json into large chunks and overlap them, but this
causes unnecessary text processing and there will still be times where
the nested json is so big that the chunks get separated from the parent
keys.

As an example you wouldn't want the following to be split in half:
```shell
{'val0': 'DFWeNdWhapbR',
 'val1': {'val10': 'QdJo',
          'val11': 'FWSDVFHClW',
          'val12': 'bkVnXMMlTiQh',
          'val13': 'tdDMKRrOY',
          'val14': 'zybPALvL',
          'val15': 'JMzGMNH',
          'val16': {'val160': 'qLuLKusFw',
                    'val161': 'DGuotLh',
                    'val162': 'KztlcSBropT',
-----------------------------------------------------------------------split-----
                    'val163': 'YlHHDrN',
                    'val164': 'CtzsxlGBZKf',
                    'val165': 'bXzhcrWLmBFp',
                    'val166': 'zZAqC',
                    'val167': 'ZtyWno',
                    'val168': 'nQQZRsLnaBhb',
                    'val169': 'gSpMbJwA'},
          'val17': 'JhgiyF',
          'val18': 'aJaqjUSFFrI',
          'val19': 'glqNSvoyxdg'}}
```
Any llm processing the second chunk of text may not have the context of
val1, and val16 reducing accuracy. Embeddings will also lack this
context and this makes retrieval less accurate.

Instead you want it to be split into chunks that retain the json
structure.
```shell
{'val0': 'DFWeNdWhapbR',
 'val1': {'val10': 'QdJo',
          'val11': 'FWSDVFHClW',
          'val12': 'bkVnXMMlTiQh',
          'val13': 'tdDMKRrOY',
          'val14': 'zybPALvL',
          'val15': 'JMzGMNH',
          'val16': {'val160': 'qLuLKusFw',
                    'val161': 'DGuotLh',
                    'val162': 'KztlcSBropT',
                    'val163': 'YlHHDrN',
                    'val164': 'CtzsxlGBZKf'}}}
```
and
```shell
{'val1':{'val16':{
                    'val165': 'bXzhcrWLmBFp',
                    'val166': 'zZAqC',
                    'val167': 'ZtyWno',
                    'val168': 'nQQZRsLnaBhb',
                    'val169': 'gSpMbJwA'},
          'val17': 'JhgiyF',
          'val18': 'aJaqjUSFFrI',
          'val19': 'glqNSvoyxdg'}}
```
This recursive json text splitter does this. Values that contain a list
can be converted to dict first by using split(... convert_lists=True)
otherwise long lists will not be split and you may end up with chunks
larger than the max chunk.

In my testing large json objects could be split into small chunks with 
   Increased question answering accuracy
 The ability to split into smaller chunks meant retrieval queries can
use fewer tokens


- **Dependencies:** json import added to text_splitter.py, and random
added to the unit test
  - **Twitter handle:** @joelsprunger

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
7 months ago
Leonid Kuligin 1862900078
google-genai[patch]: added parsing of function call / response (#17245) 7 months ago
Cailin Wang a210a8bc53
langchain[patch]: Fix create_retriever_tool missing on_retriever_end Document content (#16933)
- **Description:** In create_retriever_tool create_tool, fix
create_retriever_tool's missing Document content for on_retriever_end,
caused by create_retriever_tool's missing callbacks parameter,
  - **Twitter handle:** @CailinWang_

---------

Co-authored-by: root <root@Bluedot-AI>
Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Sparsh Jain a2167614b7
google-genai[patch]: Invoke callback prior to yielding token (#17092)
- **Description:** Invoke callback prior to yielding token in stream and
astream methods for Google-genai,
  - **Issue:** the issue # 16913,
  - **Twitter handle:** Sparsh10649446

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
7 months ago
Liang Zhang 7306600e2f
community[patch]: Support SerDe transform functions in Databricks LLM (#16752)
**Description:** Databricks LLM does not support SerDe the
transform_input_fn and transform_output_fn. After saving and loading,
the LLM will be broken. This PR serialize these functions into a hex
string using pickle, and saving the hex string in the yaml file. Using
pickle to serialize a function can be flaky, but this is a simple
workaround that unblocks many use cases. If more sophisticated SerDe is
needed, we can improve it later.

Test:
Added a simple unit test.
I did manual test on Databricks and it works well.
The saved yaml looks like:
```
llm:
      _type: databricks
      cluster_driver_port: null
      cluster_id: null
      databricks_uri: databricks
      endpoint_name: databricks-mixtral-8x7b-instruct
      extra_params: {}
      host: e2-dogfood.staging.cloud.databricks.com
      max_tokens: null
      model_kwargs: null
      n: 1
      stop: null
      task: null
      temperature: 0.0
      transform_input_fn: 80049520000000000000008c085f5f6d61696e5f5f948c0f7472616e73666f726d5f696e7075749493942e
      transform_output_fn: null
```

@baskaryan

```python
from langchain_community.embeddings import DatabricksEmbeddings
from langchain_community.llms import Databricks
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
import mlflow

embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")

def transform_input(**request):
  request["messages"] = [
    {
      "role": "user",
      "content": request["prompt"]
    }
  ]
  del request["prompt"]
  return request

llm = Databricks(endpoint_name="databricks-mixtral-8x7b-instruct", transform_input_fn=transform_input)

persist_dir = "faiss_databricks_embedding"

# Create the vector db, persist the db to a local fs folder
loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
db = FAISS.from_documents(docs, embeddings)
db.save_local(persist_dir)

def load_retriever(persist_directory):
    embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
    vectorstore = FAISS.load_local(persist_directory, embeddings)
    return vectorstore.as_retriever()

retriever = load_retriever(persist_dir)
retrievalQA = RetrievalQA.from_llm(llm=llm, retriever=retriever)
with mlflow.start_run() as run:
    logged_model = mlflow.langchain.log_model(
        retrievalQA,
        artifact_path="retrieval_qa",
        loader_fn=load_retriever,
        persist_dir=persist_dir,
    )

# Load the retrievalQA chain
loaded_model = mlflow.pyfunc.load_model(logged_model.model_uri)
print(loaded_model.predict([{"query": "What did the president say about Ketanji Brown Jackson"}]))

```
7 months ago
cjpark-data ce22e10c4b
community[patch]: Fix KeyError 'embedding' (MongoDBAtlasVectorSearch) (#17178)
- **Description:**
Embedding field name was hard-coded named "embedding".
So I suggest that change `res["embedding"]` into
`res[self._embedding_key]`.
  - **Issue:** #17177,
- **Twitter handle:**
[@bagcheoljun17](https://twitter.com/bagcheoljun17)
7 months ago
Neli Hateva 9bb5157a3d
langchain[patch], community[patch]: Fixes in the Ontotext GraphDB Graph and QA Chain (#17239)
- **Description:** Fixes in the Ontotext GraphDB Graph and QA Chain
related to the error handling in case of invalid SPARQL queries, for
which `prepareQuery` doesn't throw an exception, but the server returns
400 and the query is indeed invalid
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** @OntotextGraphDB
7 months ago
ByeongUk Choi b88329e9a5
community[patch]: Implement Unique ID Enforcement in FAISS (#17244)
**Description:**
Implemented unique ID validation in the FAISS component to ensure all
document IDs are distinct. This update resolves issues related to
non-unique IDs, such as inconsistent behavior during deletion processes.
7 months ago
Bagatur 852973d616
langchain[minor], core[minor]: update json, pydantic parser. add openai-json structured output runnable (#16914) 7 months ago
hsuyuming e22c4d4eb0
google-vertexai[patch]: fix _parse_response_candidate issue (#16647)
**Description:** enable _parse_response_candidate to support complex
structure format.
  **Issue:** 
currently, if Gemini response complex args format, people will get
"TypeError: Object of type RepeatedComposite is not JSON serializable"
error from _parse_response_candidate.
  
 response candidate example
```
content {
  role: "model"
  parts {
    function_call {
      name: "Information"
      args {
        fields {
          key: "people"
          value {
            list_value {
              values {
                string_value: "Joe is 30, his mom is Martha"
              }
            }
          }
        }
      }
    }
  }
}
finish_reason: STOP
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}
```
 
error msg:
```
Traceback (most recent call last):
  File "/home/jupyter/user/abehsu/gemini_langchain_tools/example2.py", line 36, in <module>
    print(tagging_chain.invoke({"input": "Joe is 30, his mom is Martha"}))
  File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 2053, in invoke
    input = step.invoke(
  File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3887, in invoke
    return self.bound.invoke(
  File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 165, in invoke
    self.generate_prompt(
  File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 543, in generate_prompt
    return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
  File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 407, in generate
    raise e
  File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 397, in generate
    self._generate_with_cache(
  File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 576, in _generate_with_cache
    return self._generate(
  File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_google_vertexai/chat_models.py", line 406, in _generate
    generations = [
  File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_google_vertexai/chat_models.py", line 408, in <listcomp>
    message=_parse_response_candidate(c),
  File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_google_vertexai/chat_models.py", line 280, in _parse_response_candidate
    function_call["arguments"] = json.dumps(
  File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type RepeatedComposite is not JSON serializable
```
  

  **Twitter handle:**  @abehsu1992626
7 months ago
Erick Friis d77bb7b4e9
google-vertexai[patch]: integration test fix, release 0.0.5 (#17258) 7 months ago
Aditya 98176ac982
langchain_google_vertexai : added logic to override get_num_tokens_from_messages() for ChatVertexAI (#16784)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
- **Description: added logic to override get_num_tokens_from_messages()
for ChatVertexAI. Currently ChatVertexAI was inheriting
get_num_tokens_from_messages() from BaseChatModel which in-turn was
calling GPT-2 tokenizer
  - **Issue: NA
  - **Dependencies: NA
  - **Twitter handle:@aditya_rane

@lkuligin for review

---------

Co-authored-by: adityarane@google.com <adityarane@google.com>
Co-authored-by: Leonid Kuligin <lkuligin@yandex.ru>
7 months ago
Bassem Yacoube 4e3ed7f043
community[patch]: octoai embeddings bug fix (#17216)
fixes a bug in octoa_embeddings provider
7 months ago
Eugene Yurtsev 780e84ae79
community[minor]: SQLDatabase Add fetch mode `cursor`, query parameters, query by selectable, expose execution options, and documentation (#17191)
- **Description:** Improve `SQLDatabase` adapter component to promote
code re-use, see
[suggestion](https://github.com/langchain-ai/langchain/pull/16246#pullrequestreview-1846590962).
  - **Needed by:** GH-16246
  - **Addressed to:** @baskaryan, @cbornet 

## Details
- Add `cursor` fetch mode
- Accept SQL query parameters
- Accept both `str` and SQLAlchemy selectables as query expression
- Expose `execution_options`
- Documentation page (notebook) about `SQLDatabase` [^1]
See [About
SQLDatabase](https://github.com/langchain-ai/langchain/blob/c1c7b763/docs/docs/integrations/tools/sql_database.ipynb).

[^1]: Apparently there hasn't been any yet?

---------

Co-authored-by: Andreas Motl <andreas.motl@crate.io>
7 months ago
Tomaz Bratanic 7e4b676d53
community[patch]: Better error propagation for neo4jgraph (#17190)
There are other errors that could happen when refreshing the schema, so
we want to propagate specific errors for more clarity
7 months ago
Luiz Ferreira 34d2daffb3
community[patch]: Fix chat openai unit test (#17124)
- **Description:** 
Actually the test named `test_openai_apredict` isn't testing the
apredict method from ChatOpenAI.
  - **Twitter handle:**
  https://twitter.com/OAlmofadas
7 months ago
Dmitry Kankalovich f92738a6f6
langchain[minor], community[minor], core[minor]: Async Cache support and AsyncRedisCache (#15817)
* This PR adds async methods to the LLM cache. 
* Adds an implementation using Redis called AsyncRedisCache.
* Adds a docker compose file at the /docker to help spin up docker
* Updates redis tests to use a context manager so flushing always happens by default
7 months ago
Erick Friis 4153837502
google-genai[patch]: release 0.0.7 (#17193) 7 months ago
Erick Friis 927ab77d6e
google-genai[patch]: no error for FunctionMessage (#17215)
Both should eventually match this:
https://github.com/langchain-ai/langchain/blob/master/libs/partners/google-vertexai/langchain_google_vertexai/chat_models.py#L179

But seems undocumented / can't find types in genai package
7 months ago
Erick Friis 2ecf318218
google-genai[patch]: match function call interface (#17213)
should match vertex
7 months ago
Erick Friis e17173c403
google-vertexai[patch]: function calling integration test (#17209) 7 months ago
Erick Friis 52be84a603
google-vertexai[patch]: serializable citation metadata, release 0.0.4 (#17145)
was breaking in langserve before
7 months ago
Nuno Campos 19ff81e74f
Fix stream events/log with some kinds of non addable output (#17205)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
7 months ago
Bagatur 6f1403b9b6
community[patch]: Release 0.0.19 (#17207)
Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
Erick Friis a13dc47a08
cli[patch]: copyright 2024 default (#17204) 7 months ago
Bagatur 00757567ba
core[patch]: Release 0.1.21 (#17202) 7 months ago
Bagatur af74301ab9
core[patch], community[patch]: link extraction continue on failure (#17200) 7 months ago
Henry 2281f00198
langchain: Standardize `output_parser.py` across all agent types for custom `FORMAT_INSTRUCTIONS` (#17168)
- **Description:** 
This PR standardizes the `output_parser.py` file across all agent types
to ensure a uniform parsing mechanism is implemented. It introduces a
cohesive structure and common interface for output parsing, facilitating
easier modifications and extensions by users. The standardized approach
enhances maintainability and scalability of the codebase by providing a
consistent pattern for output parsing, which can be easily understood
and utilized across different agent types.

This PR builds upon the foundation set by a previously merged PR, which
focused exclusively on standardizing the `output_parser.py` for the
`conversational_agent` ([PR
#16945](https://github.com/langchain-ai/langchain/pull/16945)). With
this new update, I extend the standardization efforts to encompass
`output_parser.py` files across all agent types. This enhancement not
only unifies the parsing mechanism across the board but also introduces
the flexibility for users to incorporate custom `FORMAT_INSTRUCTIONS`.

  - **Issue:** 
https://github.com/langchain-ai/langchain/issues/10721
https://github.com/langchain-ai/langchain/issues/4044

  - **Dependencies:**
No new dependencies required for this change

  - **Twitter handle:**
With my github user is enough. Thanks

I hope you accept my PR.
7 months ago
Bagatur 78409634fe
core[patch]: Release 0.1.20 (#17194) 7 months ago
Nuno Campos 65798289a4
core[minor]: Use batched tracing in sdk (#16305)
Remove threadpool executor usage in langchain tracer, this is now
handled by sdk
7 months ago
chyroc f87b38a559
google-genai[minor]: support functions call (#15146)
Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
Tomaz Bratanic 302989a2b1
allow optional newline in the action responses of JSON Agent parser (#17186)
Based on my experiments, the newline isn't always there, so we can make
the regex slightly more robust by allowing an optional newline after the
bacticks
7 months ago
William FH 9fa07076da
Add trace_as_chain_group metadata (#17187) 7 months ago
Erick Friis 3e58df43c2
mistralai[patch]: release 0.0.4 (#17139) 7 months ago
Erick Friis 22b6a03a28
infra: read min versions (#17135) 7 months ago
Erick Friis f881a3330c
mistralai[patch]: 16k token batching logic embed (#17136) 7 months ago
Bagatur 226f376d59
community[patch]: Release 0.0.18 (#17129)
Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
Erick Friis 980e30c361
nvidia-ai-endpoints[patch]: release 0.0.2 (#17125) 7 months ago
Erick Friis 15bd1154a7
pinecone[patch]: integration test new namespace (#17121) 7 months ago
Mikhail Khludnev 14ff1438e6
nvidia-trt[patch]: propagate InferenceClientException to the caller. (#16936)
- **Description:**  
 
before the change I've got

1. propagate InferenceClientException to the caller.
2. stop grpc receiver thread on exception 

```
        for token in result_queue:
>           result_str += token
E           TypeError: can only concatenate str (not "InferenceServerException") to str

../../langchain_nvidia_trt/llms.py:207: TypeError
```
And stream thread keeps running. 

after the change request thread stops correctly and caller got a root
cause exception:

```
E                   tritonclient.utils.InferenceServerException: [request id: 4529729] expected number of inputs between 2 and 3 but got 10 inputs for model 'vllm_model'

../../langchain_nvidia_trt/llms.py:205: InferenceServerException
```

  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
  - **Twitter handle:** [t.me/mkhl_spb](https://t.me/mkhl_spb)
 
I'm not sure about test coverage. Should I setup deep mocks or there's a
kind of triton stub via testcontainers or so.
7 months ago
Junyoung Park 1ed73f1992
community[minor]: Add SelfQueryRetriever support to PGVector (#16991)
- **Description:** Add SelfQueryRetriever support to PGVector
  - **Issue:** -
  - **Dependencies:** -
  - **Twitter handle:** -

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Bagatur cd945e3a5b
core[patch]: Release 0.1.19 (#17117) 7 months ago
Frank ef082c77b1
community[minor]: add github file loader to load any github file content b… (#15305)
### Description
support load any github file content based on file extension.  

Why not use [git
loader](https://python.langchain.com/docs/integrations/document_loaders/git#load-existing-repository-from-disk)
?
git loader clones the whole repo even only interested part of files,
that's too heavy. This GithubFileLoader only downloads that you are
interested files.

### Twitter handle
my twitter: @shufanhaotop

---------

Co-authored-by: Hao Fan <h_fan@apple.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Henry eaeb8a5f71
langchain[patch]: `output_parser.py` in conversation_chat is customizable (#16945)
**Description:**
With this modification, users can customize the `FORMAT_INSTRUCTIONS`
template, allowing them to create their own prompts

As it is happening in
[this](https://github.com/langchain-ai/langchain/issues/10721) issue,
the `FORMAT_INSTRUCTIONS` is not customizable for the output parser,
unless you create your own class `ConvoOutputParser`. To avoid this, a
modification was done, creating a `format_instruction` variable that
users can customize with ease after initialize the agent.

For example:
```
agent = initialize_agent(
    agent = AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    tools = tools,
    llm = llm_agent,
    verbose = True,
    max_iterations = 3,
    early_stopping_method = 'generate',
    memory = b_w_memory,
    handle_parsing_errors = True,
    agent_kwargs={
        'system_message':PREFIX,
        'human_message':SUFFIX,
        'template_tool_response':TEMPLATE_TOOL_RESPONSE,
        }
)
agent.agent.output_parser.format_instructions = "MY CUSTOM FORMAT INSTRUCTIONS"
print(agent.agent.output_parser.get_format_instructions())
MY CUSTOM FORMAT INSTRUCTIONS
```

Other parameters like `system_message`, `human_message`, or
`template_tool_response` are already customizable and with this PR, the
last parameter `FORMAT_INSTRUCTIONS` in
`langchain.agents.conversational_chat.prompt` can be modified.


**Issue:**
https://github.com/langchain-ai/langchain/issues/10721

**Dependencies:**
No new dependencies required for this change

**Twitter handle:**
With my github user is enough. Thanks

I hope you accept my PR.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Ryan Kraus f027696b5f
community: Added new Utility runnables for NVIDIA Riva. (#15966)
**Please tag this issue with `nvidia_genai`**

- **Description:** Added new Runnables for integration NVIDIA Riva into
LCEL chains for Automatic Speech Recognition (ASR) and Text To Speech
(TTS).
- **Issue:** N/A
- **Dependencies:** To use these runnables, the NVIDIA Riva client
libraries are required. It they are not installed, an error will be
raised instructing how to install them. The Runnables can be safely
imported without the riva client libraries.
- **Twitter handle:** N/A

All of the Riva Runnables are inside a single folder in the Utilities
module. In this folder are four files:
- common.py - Contains all code that is common to both TTS and ASR
- stream.py - Contains a class representing an audio stream that allows
the end user to put data into the stream like a queue.
- asr.py - Contains the RivaASR runnable
- tts.py - Contains the RivaTTS runnable

The following Python function is an example of creating a chain that
makes use of both of these Runnables:

```python
def create(
    config: Configuration,
    audio_encoding: RivaAudioEncoding,
    sample_rate: int,
    audio_channels: int = 1,
) -> Runnable[ASRInputType, TTSOutputType]:
    """Create a new instance of the chain."""
    _LOGGER.info("Instantiating the chain.")

    # create the riva asr client
    riva_asr = RivaASR(
        url=str(config.riva_asr.service.url),
        ssl_cert=config.riva_asr.service.ssl_cert,
        encoding=audio_encoding,
        audio_channel_count=audio_channels,
        sample_rate_hertz=sample_rate,
        profanity_filter=config.riva_asr.profanity_filter,
        enable_automatic_punctuation=config.riva_asr.enable_automatic_punctuation,
        language_code=config.riva_asr.language_code,
    )

    # create the prompt template
    prompt = PromptTemplate.from_template("{user_input}")

    # model = ChatOpenAI()
    model = ChatNVIDIA(model="mixtral_8x7b")  # type: ignore

    # create the riva tts client
    riva_tts = RivaTTS(
        url=str(config.riva_asr.service.url),
        ssl_cert=config.riva_asr.service.ssl_cert,
        output_directory=config.riva_tts.output_directory,
        language_code=config.riva_tts.language_code,
        voice_name=config.riva_tts.voice_name,
    )

    # construct and return the chain
    return {"user_input": riva_asr} | prompt | model | riva_tts  # type: ignore
```

The following code is an example of creating a new audio stream for
Riva:

```python
input_stream = AudioStream(maxsize=1000)
# Send bytes into the stream
for chunk in audio_chunks:
    await input_stream.aput(chunk)
input_stream.close()
```

The following code is an example of how to execute the chain with
RivaASR and RivaTTS

```python
output_stream = asyncio.Queue()
while not input_stream.complete:
    async for chunk in chain.astream(input_stream):
        output_stream.put(chunk)    
```

Everything should be async safe and thread safe. Audio data can be put
into the input stream while the chain is running without interruptions.

---------

Co-authored-by: Hayden Wolff <hwolff@nvidia.com>
Co-authored-by: Hayden Wolff <hwolff@Haydens-Laptop.local>
Co-authored-by: Hayden Wolff <haydenwolff99@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
François Paupier 929f071513
community[patch]: Fix error in `LlamaCpp` community LLM with Configurable Fields, 'grammar' custom type not available (#16995)
- **Description:** Ensure the `LlamaGrammar` custom type is always
available when instantiating a `LlamaCpp` LLM
  - **Issue:** #16994 
  - **Dependencies:** None
  - **Twitter handle:** @fpaupier

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Leonid Ganeline 563f325034
experimental[patch]: fixed import in `experimental` (#17078) 7 months ago
Eugene Yurtsev fbab8baac5
core[patch]: Add astream events config test (#17055)
Verify that astream events propagates config correctly

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
7 months ago
Scott Nath 10bd901139
infra: add integration_tests and coverage to MAKEFILE (#17053)
- **Description: update community MAKE file** 
    - adds `integration_tests`
    - adds `coverage`

- **Issue:** the issue # it fixes if applicable,
    - moving out of https://github.com/langchain-ai/langchain/pull/17014
- **Dependencies:** n/a
- **Twitter handle:** @scottnath
- **Mastodon handle:** scottnath@mastodon.social

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
8 months ago
Giulio Zani 9f0b63dba0
experimental[patch]: Fixes issue #17060 (#17062)
As described in issue #17060, in the case in which text has only one
sentence the following function fails. Checking for that and adding a
return case fixed the issue.

```python
    def split_text(self, text: str) -> List[str]:
        """Split text into multiple components."""
        # Splitting the essay on '.', '?', and '!'
        single_sentences_list = re.split(r"(?<=[.?!])\s+", text)
        sentences = [
            {"sentence": x, "index": i} for i, x in enumerate(single_sentences_list)
        ]
        sentences = combine_sentences(sentences)
        embeddings = self.embeddings.embed_documents(
            [x["combined_sentence"] for x in sentences]
        )
        for i, sentence in enumerate(sentences):
            sentence["combined_sentence_embedding"] = embeddings[i]
        distances, sentences = calculate_cosine_distances(sentences)
        start_index = 0

        # Create a list to hold the grouped sentences
        chunks = []
        breakpoint_percentile_threshold = 95
        breakpoint_distance_threshold = np.percentile(
            distances, breakpoint_percentile_threshold
        )  # If you want more chunks, lower the percentile cutoff

        indices_above_thresh = [
            i for i, x in enumerate(distances) if x > breakpoint_distance_threshold
        ]  # The indices of those breakpoints on your list

        # Iterate through the breakpoints to slice the sentences
        for index in indices_above_thresh:
            # The end index is the current breakpoint
            end_index = index

            # Slice the sentence_dicts from the current start index to the end index
            group = sentences[start_index : end_index + 1]
            combined_text = " ".join([d["sentence"] for d in group])
            chunks.append(combined_text)

            # Update the start index for the next group
            start_index = index + 1

        # The last group, if any sentences remain
        if start_index < len(sentences):
            combined_text = " ".join([d["sentence"] for d in sentences[start_index:]])
            chunks.append(combined_text)
        return chunks
```

Co-authored-by: Giulio Zani <salamanderxing@Giulios-MBP.homenet.telecomitalia.it>
8 months ago
Jimmy Moore 912210ac19
core[patch]: fix _sql_record_manager mypy for #17048 (#17073)
- **Description:** Add relevant type annotations for relevant session
and query objects to resolve mypy errors when `# type: ignore` comments
are removed.
  - **Issue:** #17048
  - **Dependencies:** None,
  - **Twitter handle:** [clesiemo3](https://twitter.com/clesiemo3)
 
I attempted to solve the `UpsertionRecord` ignore but it would require
added a deprecated plugin or moving completely to sqlalchemy 2.0+ from
my understanding. I'm assuming this is not something desired at this
point in time.
8 months ago
William FH 3d5e988c55
Add prompt metadata + tags (#17054) 8 months ago
Bagatur 6e2ed9671f
infra: fix breebs test lint (#17075) 8 months ago
T Cramer cf01fc3790
docs: update parse_partial_json source info (#17036)
- **Description:** Update source-link following recent license update at
open-interpreter project
  - **Issue:** N/A
  - **Dependencies:** None
8 months ago
Alex Boury 334b6ebdf3
community[minor]: Breebs docs retriever (#16578)
- **Description:** Implementation of breeb retriever with integration
tests ->
libs/community/tests/integration_tests/retrievers/test_breebs.py and
documentation (notebook) ->
docs/docs/integrations/retrievers/breebs.ipynb.
  - **Dependencies:** None
8 months ago
Serena Ruan 9b279ac127
community[patch]: MLflow callback update (#16687)
Signed-off-by: Serena Ruan <serena.rxy@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
8 months ago
Mohammad Mohtashim 3c4b24b69a
community[patch]: Fix the _call of HuggingFaceHub (#16891)
Fixed the following identified issue: #16849

@baskaryan
8 months ago
Tyler Titsworth 304f3f5fc1
community[patch]: Add Progress bar to HuggingFaceEmbeddings (#16758)
- **Description:** Adds a function parameter to HuggingFaceEmbeddings
called `show_progress` that enables a `tqdm` progress bar if enabled.
Does not function if `multi_process = True`.
  - **Issue:** n/a
  - **Dependencies:** n/a
8 months ago
Supreet Takkar ae33979813
community[patch]: Allow adding ARNs as model_id to support Amazon Bedrock custom models (#16800)
- **Description:** Adds an additional class variable to `BedrockBase`
called `provider` that allows sending a model provider such as amazon,
cohere, ai21, etc.
Up until now, the model provider is extracted from the `model_id` using
the first part before the `.`, such as `amazon` for
`amazon.titan-text-express-v1` (see [supported list of Bedrock model IDs
here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids-arns.html)).
But for custom Bedrock models where the ARN of the provisioned
throughput must be supplied, the `model_id` is like
`arn:aws:bedrock:...` so the `model_id` cannot be extracted from this. A
model `provider` is required by the LangChain Bedrock class to perform
model-based processing. To allow the same processing to be performed for
custom-models of a specific base model type, passing this `provider`
argument can help solve the issues.
The alternative considered here was the use of
`provider.arn:aws:bedrock:...` which then requires ARN to be extracted
and passed separately when invoking the model. The proposed solution
here is simpler and also does not cause issues for current models
already using the Bedrock class.
  - **Issue:** N/A
  - **Dependencies:** N/A

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
8 months ago
T Cramer e022bfaa7d
langchain: add partial parsing support to JsonOutputToolsParser (#17035)
- **Description:** Add partial parsing support to JsonOutputToolsParser
- **Issue:**
[16736](https://github.com/langchain-ai/langchain/issues/16736)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
8 months ago
calvinweb dcf973c22c
Langchain: `json_chat` don't need stop sequenes (#16335)
This is a PR about #16334
The Stop sequenes isn't meanful in `json_chat` because it depends json
to work, not completions
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
8 months ago
Bagatur 66e45e8ab7
community[patch]: chat model mypy fixes (#17061)
Related to #17048
8 months ago
Bagatur d93de71d08
community[patch]: chat message history mypy fixes (#17059)
Related to #17048
8 months ago
Bagatur af5ae24af2
community[patch]: callbacks mypy fixes (#17058)
Related to #17048
8 months ago
Vadim Kudlay 75b6fa1134
nvidia-ai-endpoints[patch]: Support User-Agent metadata and minor fixes. (#16942)
- **Description:** Several meta/usability updates, including User-Agent.
  - **Issue:** 
- User-Agent metadata for tracking connector engagement. @milesial
please check and advise.
- Better error messages. Tries harder to find a request ID. @milesial
requested.
- Client-side image resizing for multimodal models. Hope to upgrade to
Assets API solution in around a month.
- `client.payload_fn` allows you to modify payload before network
request. Use-case shown in doc notebook for kosmos_2.
- `client.last_inputs` put back in to allow for advanced
support/debugging.
  - **Dependencies:** 
- Attempts to pull in PIL for image resizing. If not installed, prints
out "please install" message, warns it might fail, and then tries
without resizing. We are waiting on a more permanent solution.

For LC viz: @hinthornw 
For NV viz: @fciannella @milesial @vinaybagade

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
8 months ago
Nuno Campos ae56fd020a
Fix condition on custom root type in runnable history (#17017)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
8 months ago
Nuno Campos f0ffebb944
Shield callback methods from cancellation: Fix interrupted runs marked as pending forever (#17010)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
8 months ago
Bagatur e7b3290d30
community[patch]: fix agent_toolkits mypy (#17050)
Related to #17048
8 months ago
Erick Friis 6ffd5b15bc
pinecone: init pkg (#16556)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
8 months ago
Harrison Chase 4eda647fdd
infra: add -p to mkdir in lint steps (#17013)
Previously, if this did not find a mypy cache then it wouldnt run

this makes it always run

adding mypy ignore comments with existing uncaught issues to unblock other prs

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
8 months ago
Eugene Yurtsev fb245451d2
core[patch]: Add langsmith to printed sys information (#16899) 8 months ago
Mikhail Khludnev 2145636f1d
Nvidia trt model name for stop_stream() (#16997)
just removing some legacy leftover.
8 months ago
Christophe Bornet 2ef69fe11b
Add async methods to BaseChatMessageHistory and BaseMemory (#16728)
Adds:
   * async methods to BaseChatMessageHistory
   * async methods to ChatMessageHistory
   * async methods to BaseMemory
   * async methods to BaseChatMemory
   * async methods to ConversationBufferMemory
   * tests of ConversationBufferMemory's async methods

  **Twitter handle:** cbornet_
8 months ago
Ryan Kraus b3c3b58f2c
core[patch]: Fixed bug in dict to message conversion. (#17023)
- **Description**: We discovered a bug converting dictionaries to
messages where the ChatMessageChunk message type isn't handled. This PR
adds support for that message type.
- **Issue**: #17022 
- **Dependencies**: None
- **Twitter handle**: None
8 months ago
Killinsun - Ryota Takeuchi bcfce146d8
community[patch]: Correct the calling to collection_name in qdrant (#16920)
## Description

In #16608, the calling `collection_name` was wrong.
I made a fix for it. 
Sorry for the inconvenience!

## Issue

https://github.com/langchain-ai/langchain/issues/16962

## Dependencies

N/A



<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Kumar Shivendu <kshivendu1@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
8 months ago
Erick Friis 849051102a
google-genai[patch]: fix new core typing (#16988) 8 months ago
Bagatur 35446c814e
openai[patch]: rm tiktoken model warning (#16964) 8 months ago
ccurme 0826d87ecd
langchain_mistralai[patch]: Invoke callback prior to yielding token (#16986)
- **Description:** Invoke callback prior to yielding token in stream and
astream methods for ChatMistralAI.
- **Issue:** https://github.com/langchain-ai/langchain/issues/16913
8 months ago
Erick Friis afdd636999
docs: partner packages (#16960) 8 months ago
Erick Friis 06660bc78c
core[patch]: handle some optional cases in tools (#16954)
primary problem in pydantic still exists, where `Optional[str]` gets
turned to `string` in the jsonschema `.schema()`

Also fixes the `SchemaSchema` naming issue

---------

Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>
8 months ago
Mohammad Mohtashim f8943e8739
core[patch]: Add doc-string to RunnableEach (#16892)
Add doc-string to Runnable Each
---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
8 months ago
Bagatur 2a510c71a0
core[patch]: doc init positional args (#16854) 8 months ago
Bagatur d80c612c92
core[patch]: Message content as positional arg (#16921) 8 months ago
Bagatur c29e9b6412
core[patch]: fix chat prompt partial messages placeholder var (#16918) 8 months ago
hmasdev cc17334473
core[minor]: add validation error handler to `BaseTool` (#14007)
- **Description:** add a ValidationError handler as a field of
[`BaseTool`](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/tools.py#L101)
and add unit tests for the code change.
- **Issue:** #12721 #13662
- **Dependencies:** None
- **Tag maintainer:** 
- **Twitter handle:** @hmdev3
- **NOTE:**
  - I'm wondering if the update of document is required.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
8 months ago
William FH bdacfafa05
core[patch]: Remove deep copying of run prior to submitting it to LangChain Tracing (#16904) 8 months ago
William FH e02efd513f
core[patch]: Hide aliases when serializing (#16888)
Currently, if you dump an object initialized with an alias, we'll still
dump the secret values since they're retained in the kwargs
8 months ago
William FH 131c043864
Fix loading of ImagePromptTemplate (#16868)
We didn't override the namespace of the ImagePromptTemplate, so it is
listed as being in langchain.schema

This updates the mapping to let the loader deserialize.

Alternatively, we could make a slight breaking change and update the
namespace of the ImagePromptTemplate since we haven't broadly
publicized/documented it yet..
8 months ago
Eugene Yurtsev a265878d71
langchain_openai[patch]: Invoke callback prior to yielding token (#16909)
All models should be calling the callback for new token prior to
yielding the token.

Not doing this can cause callbacks for downstream steps to be called
prior to the callback for the new token; causing issues in
astream_events APIs and other things that depend in callback ordering
being correct.

We need to make this change for all chat models.
8 months ago
Erick Friis b1a847366c
community: revert SQL Stores (#16912)
This reverts commit cfc225ecb3.


https://github.com/langchain-ai/langchain/pull/15909#issuecomment-1922418097

These will have existed in langchain-community 0.0.16 and 0.0.17.
8 months ago
Leonid Ganeline c2ca6612fe
refactor `langchain.prompts.example_selector` (#15369)
The `langchain.prompts.example_selector` [still holds several
artifacts](https://api.python.langchain.com/en/latest/langchain_api_reference.html#module-langchain.prompts)
that belongs to `community`. If they moved to
`langchain_community.example_selectors`, the `langchain.prompts`
namespace would be effectively removed which is great.
- moved a class and afunction to `langchain_community`

Note:
- Previously, the `langchain.prompts.example_selector` artifacts were
moved into the `langchain_core.exampe_selectors`. See the flattened
namespace (`.prompts` was removed)!
Similar flattening was implemented for the `langchain_core` as the
`langchain_core.exampe_selectors`.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
8 months ago
Qihui Xie c5b01ac621
community[patch]: support LIKE comparator (full text match) in Qdrant (#12769)
**Description:** 
Support [Qdrant full text match
filtering](https://qdrant.tech/documentation/concepts/filtering/#full-text-match)
by adding Comparator.LIKE to QdrantTranslator.
8 months ago
Christophe Bornet 9d458d089a
community: Factorize AstraDB components constructors (#16779)
* Adds `AstraDBEnvironment` class and use it in `AstraDBLoader`,
`AstraDBCache`, `AstraDBSemanticCache`, `AstraDBBaseStore` and
`AstraDBChatMessageHistory`
* Create an `AsyncAstraDB` if we only have an `AstraDB` and vice-versa
so:
  * we always have an instance of `AstraDB`
* we always have an instance of `AsyncAstraDB` for recent versions of
astrapy
* Create collection if not exists in `AstraDBBaseStore`
* Some typing improvements

Note: `AstraDB` `VectorStore` not using `AstraDBEnvironment` at the
moment. This will be done after the `langchain-astradb` package is out.
8 months ago
Christophe Bornet 78a1af4848
langchain[patch]: Add async methods to MultiVectorRetriever (#16878)
Adds async support to multi vector retriever
8 months ago
Bagatur 7d03d8f586
docs: fix docstring examples (#16889) 8 months ago
Bagatur c2d09fb151
infra: bump exp min test reqs (#16884) 8 months ago
Bagatur 65ba5c220b
experimental[patch]: Release 0.0.50 (#16883) 8 months ago
Bagatur 9e7d9f9390
infra: bump langchain min test reqs (#16882) 8 months ago
Bagatur db442c635b
langchain[patch]: Release 0.1.5 (#16881) 8 months ago
Bagatur 2b4abed25c
commmunity[patch]: Release 0.0.17 (#16871) 8 months ago
Bagatur bb73251146
core[patch]: Release 0.1.18 (#16870) 8 months ago
Christophe Bornet a0ec045495
Add async methods to BaseStore (#16669)
- **Description:**

The BaseStore methods are currently blocking. Some implementations
(AstraDBStore, RedisStore) would benefit from having async methods.
Also once we have async methods for BaseStore, we can implement the
async `aembed_documents` in CacheBackedEmbeddings to cache the
embeddings asynchronously.

* adds async methods amget, amset, amedelete and ayield_keys to
BaseStore
  * implements the async methods for InMemoryStore
  * adds tests for InMemoryStore async methods

- **Twitter handle:** cbornet_
8 months ago
Erick Friis 17e886388b
nomic: init pkg (#16853)
Co-authored-by: Lance Martin <lance@langchain.dev>
8 months ago
Eugene Yurtsev 2e5949b6f8
core(minor): Add bulk add messages to BaseChatMessageHistory interface (#15709)
* Add bulk add_messages method to the interface.
* Update documentation for add_ai_message and add_human_message to
denote them as being marked for deprecation. We should stop using them
as they create more incorrect (inefficient) ways of doing things
8 months ago
Christophe Bornet af8c5c185b
langchain[minor],community[minor]: Add async methods in BaseLoader (#16634)
Adds:
* methods `aload()` and `alazy_load()` to interface `BaseLoader`
* implementation for class `MergedDataLoader `
* support for class `BaseLoader` in async function `aindex()` with unit
tests

Note: this is compatible with existing `aload()` methods that some
loaders already had.

**Twitter handle:** @cbornet_

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
8 months ago
Erick Friis c37ca45825
nvidia-trt: remove tritonclient all extra dep (#16749) 8 months ago
Erick Friis bb3b6bde33
openai[minor]: change to secretstr (#16803) 8 months ago
Raphael bf9068516e
community[minor]: add the ability to load existing transcripts from AssemblyAI by their id. (#16051)
- **Description:** the existing AssemblyAI API allows to pass a path or
an url to transcribe an audio file and turn in into Langchain Documents,
this PR allows to get existing transcript by their transcript id and
turn them into Documents.
  - **Issue:** not related to an existing issue
  - **Dependencies:** requests

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
8 months ago
Bagatur daf820c77b
community[patch]: undo create_sql_agent breaking (#16797) 8 months ago
Eugene Yurtsev ef2bd745cb
docs: Update doc-string in base callback managers (#15885)
Update doc-strings with a comment about on_llm_start vs.
on_chat_model_start.
8 months ago
William FH 881dc28d2c
Fix Dep Recommendation (#16793)
Tools are different than functions
8 months ago
Bagatur b0347f3e2b
docs: add csv use case (#16756) 8 months ago
Alexander Conway 4acd2654a3
Report which file was errored on in DirectoryLoader (#16790)
The current implementation leaves it up to the particular file loader
implementation to report the file on which an error was encountered - in
my case pdfminer was simply saying it could not parse a file as a PDF,
but I didn't know which of my hundreds of files it was failing on.

No reason not to log the particular item on which an error was
encountered, and it should be an immense debugging assistant.

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
8 months ago
Erick Friis a372b23675
robocorp: release 0.0.3 (#16789) 8 months ago
Rihards Gravis 442fa52b30
[partners]: langchain-robocorp ease dependency version (#16765) 8 months ago
Bob Lin 546b757303
community: Add ChatGLM3 (#15265)
Add [ChatGLM3](https://github.com/THUDM/ChatGLM3) and updated
[chatglm.ipynb](https://python.langchain.com/docs/integrations/llms/chatglm)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
8 months ago
Marina Pliusnina a1ce7ab672
adding parameter for changing the language in SpacyEmbeddings (#15743)
Description: Added the parameter for a possibility to change a language
model in SpacyEmbeddings. The default value is still the same:
"en_core_web_sm", so it shouldn't affect a code which previously did not
specify this parameter, but it is not hard-coded anymore and easy to
change in case you want to use it with other languages or models.

Issue: At Barcelona Supercomputing Center in Aina project
(https://github.com/projecte-aina), a project for Catalan Language
Models and Resources, we would like to use Langchain for one of our
current projects and we would like to comment that Langchain, while
being a very powerful and useful open-source tool, is pretty much
focused on English language. We would like to contribute to make it a
bit more adaptable for using with other languages.

Dependencies: This change requires the Spacy library and a language
model, specified in the model parameter.

Tag maintainer: @dev2049

Twitter handle: @projecte_aina

---------

Co-authored-by: Marina Pliusnina <marina.pliusnina@bsc.es>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
8 months ago
Christophe Bornet 744070ee85
Add async methods for the AstraDB VectorStore (#16391)
- **Description**: fully async versions are available for astrapy 0.7+.
For older astrapy versions or if the user provides a sync client without
an async one, the async methods will call the sync ones wrapped in
`run_in_executor`
  - **Twitter handle:** cbornet_
8 months ago
baichuan-assistant f8f2649f12
community: Add Baichuan LLM to community (#16724)
Replace this entire comment with:
- **Description:** Add Baichuan LLM to integration/llm, also updated
related docs.

Co-authored-by: BaiChuanHelper <wintergyc@WinterGYCs-MacBook-Pro.local>
8 months ago
thiswillbeyourgithub 1d082359ee
community: add support for callable filters in FAISS (#16190)
- **Description:**
Filtering in a FAISS vectorstores is very inflexible and doesn't allow
that many use case. I think supporting callable like this enables a lot:
regular expressions, condition on multiple keys etc. **Note** I had to
manually alter a test. I don't understand if it was falty to begin with
or if there is something funky going on.
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None

Signed-off-by: thiswillbeyourgithub <26625900+thiswillbeyourgithub@users.noreply.github.com>
8 months ago
Yudhajit Sinha 1703fe2361
core[patch]: preserve inspect.iscoroutinefunction with @beta decorator (#16440)
Adjusted deprecate decorator to make sure decorated async functions are
still recognized as "coroutinefunction" by inspect

Addresses #16402

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
8 months ago
Killinsun - Ryota Takeuchi 52f4ad8216
community: Add new fields in metadata for qdrant vector store (#16608)
## Description

The PR is to return the ID and collection name from qdrant client to
metadata field in `Document` class.

## Issue

The motivation is almost same to
[11592](https://github.com/langchain-ai/langchain/issues/11592)

Returning ID is useful to update existing records in a vector store, but
we cannot know them if we use some retrievers.

In order to avoid any conflicts, breaking changes, the new fields in
metadata have a prefix `_`

## Dependencies

N/A

## Twitter handle

@kill_in_sun

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
8 months ago
hulitaitai 32cad38ec6
<langchain_community\llms\chatglm.py>: <Correcting "history"> (#16729)
Use the real "history" provided by the original program instead of
putting "None" in the history.

- **Description:** I change one line in the code to make it return the
"history" of the chat model.
- **Issue:** At the moment it returns only the answers of the chat
model. However the chat model himself provides a history more complet
with the questions of the user.
  - **Dependencies:** no dependencies required for this change,
8 months ago
Bassem Yacoube 85e93e05ed
community[minor]: Update OctoAI LLM, Embedding and documentation (#16710)
This PR includes updates for OctoAI integrations:
- The LLM class was updated to fix a bug that occurs with multiple
sequential calls
- The Embedding class was updated to support the new GTE-Large endpoint
released on OctoAI lately
- The documentation jupyter notebook was updated to reflect using the
new LLM sdk
Thank you!
8 months ago
Shay Ben Elazar 84ebfb5b9d
openai[patch]: Added annotations support to azure openai (#13704)
- **Description:** Added Azure OpenAI Annotations (content filtering
results) to ChatResult

  - **Issue:** 13090

  - **Twitter handle:** ElazarShay

Co-authored-by: Bagatur <baskaryan@gmail.com>
8 months ago
Volodymyr Machula 32c5be8b73
community[minor]: Connery Tool and Toolkit (#14506)
## Summary

This PR implements the "Connery Action Tool" and "Connery Toolkit".
Using them, you can integrate Connery actions into your LangChain agents
and chains.

Connery is an open-source plugin infrastructure for AI.

With Connery, you can easily create a custom plugin with a set of
actions and seamlessly integrate them into your LangChain agents and
chains. Connery will handle the rest: runtime, authorization, secret
management, access management, audit logs, and other vital features.
Additionally, Connery and our community offer a wide range of
ready-to-use open-source plugins for your convenience.

Learn more about Connery:

- GitHub: https://github.com/connery-io/connery-platform
- Documentation: https://docs.connery.io
- Twitter: https://twitter.com/connery_io

## TODOs

- [x] API wrapper
   - [x] Integration tests
- [x] Connery Action Tool
   - [x] Docs
   - [x] Example
   - [x] Integration tests
- [x] Connery Toolkit
  - [x] Docs
  - [x] Example
- [x] Formatting (`make format`)
- [x] Linting (`make lint`)
- [x] Testing (`make test`)
8 months ago
Harrison Chase 8457c31c04
community[patch]: activeloop ai tql deprecation (#14634)
Co-authored-by: AdkSarsen <adilkhan@activeloop.ai>
8 months ago
Neli Hateva c95facc293
langchain[minor], community[minor]: Implement Ontotext GraphDB QA Chain (#16019)
- **Description:** Implement Ontotext GraphDB QA Chain
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** @OntotextGraphDB
8 months ago
chyroc a08f9a7ff9
langchain[patch]: support OpenAIAssistantRunnable async (#15302)
fix https://github.com/langchain-ai/langchain/issues/15299

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
8 months ago
Elliot 39eb00d304
community[patch]: Adapt more parameters related to MemorySearchPayload for the search method of ZepChatMessageHistory (#15441)
- **Description:** To adapt more parameters related to
MemorySearchPayload for the search method of ZepChatMessageHistory,
  - **Issue:** None,
  - **Dependencies:** None,
  - **Twitter handle:** None
8 months ago
Jael Gu a1aa3a657c
community[patch]: Milvus supports add & delete texts by ids (#16256)
# Description

To support [langchain
indexing](https://python.langchain.com/docs/modules/data_connection/indexing)
as requested by users, vectorstore Milvus needs to support:
- document addition by id (`add_documents` method with `ids` argument)
- delete by id (`delete` method with `ids` argument)

Example usage:

```python
from langchain.indexes import SQLRecordManager, index
from langchain.schema import Document
from langchain_community.vectorstores import Milvus
from langchain_openai import OpenAIEmbeddings

collection_name = "test_index"
embedding = OpenAIEmbeddings()
vectorstore = Milvus(embedding_function=embedding, collection_name=collection_name)

namespace = f"milvus/{collection_name}"
record_manager = SQLRecordManager(
    namespace, db_url="sqlite:///record_manager_cache.sql"
)
record_manager.create_schema()

doc1 = Document(page_content="kitty", metadata={"source": "kitty.txt"})
doc2 = Document(page_content="doggy", metadata={"source": "doggy.txt"})

index(
    [doc1, doc1, doc2],
    record_manager,
    vectorstore,
    cleanup="incremental",  # None, "incremental", or "full"
    source_id_key="source",
)
```

# Fix issues

Fix https://github.com/milvus-io/milvus/issues/30112

---------

Signed-off-by: Jael Gu <mengjia.gu@zilliz.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
8 months ago
Michard Hugo e9d3527b79
community[patch]: Add missing async similarity_distance_threshold handling in RedisVectorStoreRetriever (#16359)
Add missing async similarity_distance_threshold handling in
RedisVectorStoreRetriever

- **Description:** added method `_aget_relevant_documents` to
`RedisVectorStoreRetriever` that overrides parent method to add support
of `similarity_distance_threshold` in async mode (as for sync mode)
  - **Issue:** #16099
  - **Dependencies:** N/A
  - **Twitter handle:** N/A
8 months ago
Bagatur 7237dc67d4
core[patch]: Release 0.1.17 (#16737) 8 months ago
Anthony Bernabeu 2db79ab111
community[patch]: Implement TTL for DynamoDBChatMessageHistory (#15478)
- **Description:** Implement TTL for DynamoDBChatMessageHistory, 
  - **Issue:** see #15477,
  - **Dependencies:** N/A,

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
8 months ago
Massimiliano Pronesti 1bc8d9a943
experimental[patch]: missing resolution strategy in anonymization (#16653)
- **Description:** Presidio-based anonymizers are not working because
`_remove_conflicts_and_get_text_manipulation_data` was being called
without a conflict resolution strategy. This PR fixes this issue. In
addition, it removes some mutable default arguments (antipattern).
 
To reproduce the issue, just run the very first cell of this
[notebook](https://python.langchain.com/docs/guides/privacy/2/) from
langchain's documentation.

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
8 months ago
taimo d3d9244fee
langchain-community: fix unicode escaping issue with SlackToolkit (#16616)
- **Description:** fix unicode escaping issue with SlackToolkit
  - **Issue:**  #16610
8 months ago
Benito Geordie f3fdc5c5da
community: Added integrations for ThirdAI's NeuralDB with Retriever and VectorStore frameworks (#15280)
**Description:** Adds ThirdAI NeuralDB retriever and vectorstore
integration. NeuralDB is a CPU-friendly and fine-tunable text retrieval
engine.
8 months ago
Pashva Mehta 22d90800c8
community: Fixed schema discrepancy in from_texts function for weaviate vectorstore (#16693)
* Description: Fixed schema discrepancy in **from_texts** function for
weaviate vectorstore which created a redundant property "key" inside a
class.
* Issue: Fixed: https://github.com/langchain-ai/langchain/issues/16692
* Twitter handle: @pashvamehta1
8 months ago
ccurme ec0ae23645
core: expand docstring for RunnableGenerator (#16672)
- **Description:** expand docstring for RunnableGenerator
  - **Issue:** https://github.com/langchain-ai/langchain/issues/16631
8 months ago
Daniel Erenrich 0600998f38
community: Wikidata tool support (#16691)
- **Description:** Adds Wikidata support to langchain. Can read out
documents from Wikidata.
  - **Issue:** N/A
- **Dependencies:** Adds implicit dependencies for
`wikibase-rest-api-client` (for turning items into docs) and
`mediawikiapi` (for hitting the search endpoint)
  - **Twitter handle:** @derenrich

You can see an example of this tool used in a chain
[here](https://nbviewer.org/urls/d.erenrich.net/upload/Wikidata_Langchain.ipynb)
or
[here](https://nbviewer.org/urls/d.erenrich.net/upload/Wikidata_Lars_Kai_Hansen.ipynb)

<!-- Thank you for contributing to LangChain!


Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
8 months ago
Tze Min 6ef718c5f4
Core: fix Anthropic json issue in streaming (#16670)
**Description:** fix ChatAnthropic json issue in streaming 
**Issue:** https://github.com/langchain-ai/langchain/issues/16423
**Dependencies:** n/a

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
8 months ago
Christophe Bornet 2e3af04080
Use Postponed Evaluation of Annotations in Astra and Cassandra doc loaders (#16694)
Minor/cosmetic change
8 months ago
Erick Friis 88e3129587
robocorp: release 0.0.2 (#16706) 8 months ago
Christophe Bornet 36e432672a
community[minor]: Add async methods to AstraDBLoader (#16652) 8 months ago
William FH 38425c99d2
core[minor]: Image prompt template (#14263)
Builds on Bagatur's (#13227). See unit test for example usage (below)

```python
def test_chat_tmpl_from_messages_multipart_image() -> None:
    base64_image = "abcd123"
    other_base64_image = "abcd123"
    template = ChatPromptTemplate.from_messages(
        [
            ("system", "You are an AI assistant named {name}."),
            (
                "human",
                [
                    {"type": "text", "text": "What's in this image?"},
                    # OAI supports all these structures today
                    {
                        "type": "image_url",
                        "image_url": "data:image/jpeg;base64,{my_image}",
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": "data:image/jpeg;base64,{my_image}"},
                    },
                    {"type": "image_url", "image_url": "{my_other_image}"},
                    {
                        "type": "image_url",
                        "image_url": {"url": "{my_other_image}", "detail": "medium"},
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": "https://www.langchain.com/image.png"},
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": "data:image/jpeg;base64,foobar"},
                    },
                ],
            ),
        ]
    )
    messages = template.format_messages(
        name="R2D2", my_image=base64_image, my_other_image=other_base64_image
    )
    expected = [
        SystemMessage(content="You are an AI assistant named R2D2."),
        HumanMessage(
            content=[
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{other_base64_image}"
                    },
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"{other_base64_image}"},
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"{other_base64_image}",
                        "detail": "medium",
                    },
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://www.langchain.com/image.png"},
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,foobar"},
                },
            ]
        ),
    ]
    assert messages == expected
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Brace Sproul <braceasproul@gmail.com>
8 months ago
Rashedul Hasan Rijul 481493dbce
community[patch]: apply embedding functions during query if defined (#16646)
**Description:** This update ensures that the user-defined embedding
function specified during vector store creation is applied during
queries. Previously, even if a custom embedding function was defined at
the time of store creation, Bagel DB would default to using the standard
embedding function during query execution. This pull request addresses
this issue by consistently using the user-defined embedding function for
queries if one has been specified earlier.
8 months ago
Serena Ruan f01fb47597
community[patch]: MLflowCallbackHandler -- Move textstat and spacy as optional dependency (#16657)
Signed-off-by: Serena Ruan <serena.rxy@gmail.com>
8 months ago
Zhuoyun(John) Xu 508bde7f40
community[patch]: Ollama - Pass headers to post request in async method (#16660)
# Description
A previous PR (https://github.com/langchain-ai/langchain/pull/15881)
added option to pass headers to ollama endpoint, but headers are not
pass to the async method.
8 months ago
João Carlos Ferra de Almeida 3e87b67a3c
community[patch]: Add Cookie Support to Fetch Method (#16673)
- **Description:** This change allows the `_fetch` method in the
`WebBaseLoader` class to utilize cookies from an existing
`requests.Session`. It ensures that when the `fetch` method is used, any
cookies in the provided session are included in the request. This
enhancement maintains compatibility with existing functionality while
extending the utility of the `fetch` method for scenarios where cookie
persistence is necessary.
- **Issue:** Not applicable (new feature),
- **Dependencies:** Requires `aiohttp` and `requests` libraries (no new
dependencies introduced),
- **Twitter handle:** N/A

Co-authored-by: Joao Almeida <joao.almeida@mercedes-benz.io>
8 months ago
Harrison Chase 27665e3546
[community] fix anthropic streaming (#16682) 8 months ago
Christophe Bornet 4915c3cd86
[Fix] Fix Cassandra Document loader default page content mapper (#16273)
We can't use `json.dumps` by default as many types returned by the
cassandra driver are not serializable. It's safer to use `str` and let
users define their own custom `page_content_mapper` if needed.
8 months ago
Nuno Campos e86fd946c8
In stream_event and stream_log handle closed streams (#16661)
if eg. the stream iterator is interrupted then adding more events to the
send_stream will raise an exception that we should catch (and handle
where appropriate)

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
8 months ago
Nuno Campos 52ccae3fb1
Accept message-like things in Chat models, LLMs and MessagesPlaceholder (#16418) 8 months ago
Pasha 4e189cd89a
community[patch]: youtube loader transcript format (#16625)
- **Description**: YoutubeLoader right now returns one document that
contains the entire transcript. I think it would be useful to add an
option to return multiple documents, where each document would contain
one line of transcript with the start time and duration in the metadata.
For example,
[AssemblyAIAudioTranscriptLoader](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/document_loaders/assemblyai.py)
is implemented in a similar way, it allows you to choose between the
format to use for the document loader.
8 months ago
yin1991 a936472512
docs: Update documentation to use 'model_id' rather than 'model_name' to match actual API (#16615)
- **Description:** Replace 'model_name' with 'model_id' for accuracy 
- **Issue:**
[link-to-issue](https://github.com/langchain-ai/langchain/issues/16577)
  - **Dependencies:** 
  - **Twitter handle:**
8 months ago
Micah Parker 6543e585a5
community[patch]: Added support for Ollama's num_predict option in ChatOllama (#16633)
Just a simple default addition to the options payload for a ollama
generate call to support a max_new_tokens parameter.

Should fix issue: https://github.com/langchain-ai/langchain/issues/14715
8 months ago
baichuan-assistant 70ff54eace
community[minor]: Add Baichuan Text Embedding Model and Baichuan Inc introduction (#16568)
- **Description:** Adding Baichuan Text Embedding Model and Baichuan Inc
introduction.

Baichuan Text Embedding ranks #1 in C-MTEB leaderboard:
https://huggingface.co/spaces/mteb/leaderboard

Co-authored-by: BaiChuanHelper <wintergyc@WinterGYCs-MacBook-Pro.local>
8 months ago
Bagatur 5b5115c408
google-vertexai[patch]: streaming bug (#16603)
Fixes errors seen here
https://github.com/langchain-ai/langchain/actions/runs/7661680517/job/20881556592#step:9:229
8 months ago
ccurme a989f82027
core: expand docstring for RunnableParallel (#16600)
- **Description:** expand docstring for RunnableParallel
  - **Issue:** https://github.com/langchain-ai/langchain/issues/16462

Feel free to modify this or let me know how it can be improved!
8 months ago
Ghani e30c6662df
Langchain-community : EdenAI chat integration. (#16377)
- **Description:** This PR adds [EdenAI](https://edenai.co/) for the
chat model (already available in LLM & Embeddings). It supports all
[ChatModel] functionality: generate, async generate, stream, astream and
batch. A detailed notebook was added.

  - **Dependencies**: No dependencies are added as we call a rest API.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
8 months ago
Antonio Lanza 08d3fd7f2e
langchain[patch]: inconsistent results with `RecursiveCharacterTextSplitter`'s `add_start_index=True` (#16583)
This PR fixes issue #16579
8 months ago
Eugene Yurtsev 42db96477f
docs: Update in code documentation for runnable with message history (#16585)
Update the in code documentation for Runnable With Message History
8 months ago
Jatin Chawda a79345f199
community[patch]: Fixed tool names snake_case (#16397)
#16396
Fixed
1. golden_query
2. google_lens
3. memorize
4. merriam_webster
5. open_weather_map
6. pub_med
7. stack_exchange
8. generate_image
9. wikipedia
8 months ago
Bagatur bcc71d1a57
openai[patch]: Release 0.0.5 (#16598) 8 months ago
Bagatur 68f7468754
google-vertexai[patch]: Release 0.0.3 (#16597) 8 months ago
Bagatur 61e876aad8
openai[patch]: Explicitly support embedding dimensions (#16596) 8 months ago
Bagatur 5df8ab574e
infra: move indexing documentation test (#16595) 8 months ago
Bagatur f3d61a6e47
langchain[patch]: Release 0.1.4 (#16592) 8 months ago
Bagatur 61b200947f
community[patch]: Release 0.0.16 (#16591) 8 months ago
Bagatur 75ad0bba2d
openai[patch]: Release 0.0.4 (#16590) 8 months ago
Bagatur 1e3ce338ca
core[patch]: Release 0.1.16 (#16589) 8 months ago
Bagatur 6c89507988
docs: add rag citations page (#16549) 8 months ago
Bagatur 31790d15ec
openai[patch]: accept function_call dict in bind_functions (#16483)
Confusing that you can't pass in a dict
8 months ago
Bagatur ef42d9d559
core[patch], community[patch], openai[patch]: consolidate openai tool… (#16485)
… converters

One way to convert anything to an OAI function:
convert_to_openai_function
One way to convert anything to an OAI tool: convert_to_openai_tool
Corresponding bind functions on OAI models: bind_functions, bind_tools
8 months ago
Brian Burgin 148347e858
community[minor]: Add LiteLLM Router Integration (#15588)
community:

  - **Description:**
- Add new ChatLiteLLMRouter class that allows a client to use a LiteLLM
Router as a LangChain chat model.
- Note: The existing ChatLiteLLM integration did not cover the LiteLLM
Router class.
    - Add tests and Jupyter notebook.
  - **Issue:** None
  - **Dependencies:** Relies on existing ChatLiteLLM integration
  - **Twitter handle:** @bburgin_0

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
8 months ago
JongRok BAEK 3b8eba32f9
anthropic[patch]: Fix message type lookup in Anthropic Partners (#16563)
- **Description:** 

The parameters for user and assistant in Anthropic should be 'ai ->
assistant,' but they are reversed to 'assistant -> ai.'
Below is error code.
```python
anthropic.BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages: Unexpected role "ai". Allowed roles are "user" or "assistant"'}}
```

[anthropic](7177f3a71f/src/anthropic/types/beta/message_param.py (L13))

  - **Issue:** : #16561
  -  **Dependencies:** : None
   - **Twitter handle:** : None
8 months ago
Dmitry Tyumentsev e86e66bad7
community[patch]: YandexGPT models - add sleep_interval (#16566)
Added sleep between requests to prevent errors associated with
simultaneous requests.
8 months ago
Bagatur e510cfaa23
core[patch]: passthrough BaseRetriever.invoke(**kwargs) (#16551)
Fix for #16547
8 months ago
Anders Åhsman 355ef2a4a6
langchain[patch]: Fix doc-string grammar (#16543)
- **Description:** Small grammar fix in docstring for class
`BaseCombineDocumentsChain`.
8 months ago
Aditya 9dd7cbb447
google-genai: added logic for method get_num_tokens() (#16205)
<!-- Thank you for contributing to LangChain!

Please title your PR "partners: google-genai",

Replace this entire comment with:
- **Description:** : added logic for method get_num_tokens() for
ChatGoogleGenerativeAI , GoogleGenerativeAI,
  - **Issue:** : https://github.com/langchain-ai/langchain/issues/16204,
  - **Dependencies:** : None,
  - **Twitter handle:** @Aditya_Rane

---------

Co-authored-by: adityarane@google.com <adityarane@google.com>
Co-authored-by: Leonid Kuligin <lkuligin@yandex.ru>
8 months ago
James Braza 0785432e7b
langchain-google-vertexai: perserving grounding metadata (#16309)
Revival of https://github.com/langchain-ai/langchain/pull/14549 that
closes https://github.com/langchain-ai/langchain/issues/14548.
8 months ago
Erick Friis adc008407e
exa: init pkg (#16553) 8 months ago
Rave Harpaz c4e9c9ca29
community[minor]: Add OCI Generative AI integration (#16548)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
- **Description:** Adding Oracle Cloud Infrastructure Generative AI
integration. Oracle Cloud Infrastructure (OCI) Generative AI is a fully
managed service that provides a set of state-of-the-art, customizable
large language models (LLMs) that cover a wide range of use cases, and
which is available through a single API. Using the OCI Generative AI
service you can access ready-to-use pretrained models, or create and
host your own fine-tuned custom models based on your own data on
dedicated AI clusters.
https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm
  - **Issue:** None,
  - **Dependencies:** OCI Python SDK,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
Passed

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

we provide unit tests. However, we cannot provide integration tests due
to Oracle policies that prohibit public sharing of api keys.
 
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
8 months ago
Bagatur c173a69908
langchain[patch]: oai tools output parser nit (#16540)
allow positional init args
8 months ago