Users can provide an Elasticsearch connection with custom headers. This
PR makes sure these headers are preserved when adding the langchain user
agent header.
- **Description:** Depending on `token_max` used in
`load_summarize_chain`, it could cause an infinite loop when documents
cannot collapse under `token_max`. This change would not affect the
existing feature, but it also gives an option to users to avoid the
situation.
- **Issue:** https://github.com/langchain-ai/langchain/issues/16251
- **Dependencies:** None
- **Twitter handle:** None
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
1. integrate chat models with
[`Yuan2.0`](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/README-EN.md)
2. add a new doc for [Yuan2.0
integration](docs/docs/integrations/llms/yuan2.ipynb)
Yuan2.0 is a new generation Fundamental Large Language Model developed
by IEIT System. We have published all three models, Yuan 2.0-102B, Yuan
2.0-51B, and Yuan 2.0-2B.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description:
Addresses a problem where the Date type within an Elasticsearch
SelfQueryRetriever would encounter difficulties in generating a valid
query.
Issue: #17042
---------
Co-authored-by: Max Jakob <max.jakob@elastic.co>
Co-authored-by: Bagatur <baskaryan@gmail.com>
## Description
I am submitting this for a school project as part of a team of 5. Other
team members are @LeilaChr, @maazh10, @Megabear137, @jelalalamy. This PR
also has contributions from community members @Harrolee and @Mario928.
Initial context is in the issue we opened (#11229).
This pull request adds:
- Generic framework for expanding the languages that `LanguageParser`
can handle, using the
[tree-sitter](https://github.com/tree-sitter/py-tree-sitter#py-tree-sitter)
parsing library and existing language-specific parsers written for it
- Support for the following additional languages in `LanguageParser`:
- C
- C++
- C#
- Go
- Java (contributed by @Mario928
https://github.com/ThatsJustCheesy/langchain/pull/2)
- Kotlin
- Lua
- Perl
- Ruby
- Rust
- Scala
- TypeScript (contributed by @Harrolee
https://github.com/ThatsJustCheesy/langchain/pull/1)
Here is the [design
document](https://docs.google.com/document/d/17dB14cKCWAaiTeSeBtxHpoVPGKrsPye8W0o_WClz2kk)
if curious, but no need to read it.
## Issues
- Closes#11229
- Closes#10996
- Closes#8405
## Dependencies
`tree_sitter` and `tree_sitter_languages` on PyPI. We have tried to add
these as optional dependencies.
## Documentation
We have updated the list of supported languages, and also added a
section to `source_code.ipynb` detailing how to add support for
additional languages using our framework.
## Maintainer
- @hwchase17 (previously reviewed
https://github.com/langchain-ai/langchain/pull/6486)
Thanks!!
## Git commits
We will gladly squash any/all of our commits (esp merge commits) if
necessary. Let us know if this is desirable, or if you will be
squash-merging anyway.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Maaz Hashmi <mhashmi373@gmail.com>
Co-authored-by: LeilaChr <87657694+LeilaChr@users.noreply.github.com>
Co-authored-by: Jeremy La <jeremylai511@gmail.com>
Co-authored-by: Megabear137 <zubair.alnoor27@gmail.com>
Co-authored-by: Lee Harrold <lhharrold@sep.com>
Co-authored-by: Mario928 <88029051+Mario928@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
**Description:**
- The existing code was trying to find a `.embeddings` property on the
`Coroutine` returned by calling `cohere.async_client.embed`.
- Instead, the `.embeddings` property is present on the value returned
by the `Coroutine`.
- Also, it seems that the original cohere client expects a value of
`max_retries` to not be `None`. Hence, setting the default value of
`max_retries` to `3`.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Pebblo opensource project enables developers to
safely load data to their Gen AI apps. It identifies semantic topics and
entities found in the loaded data and summarizes them in a
developer-friendly report.
- **Dependencies:** none
- **Twitter handle:** srics
@hwchase17
**Description**: This PR adds a chain for Amazon Neptune graph database
RDF format. It complements the existing Neptune Cypher chain. The PR
also includes a Neptune RDF graph class to connect to, introspect, and
query a Neptune RDF graph database from the chain. A sample notebook is
provided under docs that demonstrates the overall effect: invoking the
chain to make natural language queries against Neptune using an LLM.
**Issue**: This is a new feature
**Dependencies**: The RDF graph class depends on the AWS boto3 library
if using IAM authentication to connect to the Neptune database.
---------
Co-authored-by: Piyush Jain <piyushjain@duck.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** This PR adds support for
[flashrank](https://github.com/PrithivirajDamodaran/FlashRank) for
reranking as alternative to Cohere.
I'm not sure `libs/langchain` is the right place for this change. At
first, I wanted to put it under `libs/community`. All the compressors
were under `libs/langchain/retrievers/document_compressors` though. Hope
this makes sense!
- **Description:** Improve test cases for `SQLDatabase` adapter
component, see
[suggestion](https://github.com/langchain-ai/langchain/pull/16655#pullrequestreview-1846749474).
- **Depends on:** GH-16655
- **Addressed to:** @baskaryan, @cbornet, @eyurtsev
_Remark: This PR is stacked upon GH-16655, so that one will need to go
in first._
Edit: Thank you for bringing in GH-17191, @eyurtsev. This is a little
aftermath, improving/streamlining the corresponding test cases.
- **Description:**
[AS-IS] When dealing with a yaml file, the extension must be .yaml.
[TO-BE] In the absence of extension length constraints in the OS, the
extension of the YAML file is yaml, but control over the yml extension
must still be made.
It's as if it's an error because it's a .jpg extension in jpeg support.
- **Issue:** -
- **Dependencies:**
no dependencies required for this change,
- **Description:** The from__xx methods of FAISS class have hardcoded
InMemoryStore implementation and thereby not let users pass a custom
DocStore implementation,
- **Issue:** no referenced issue,
- **Dependencies:** none,
- **Twitter handle:** ksachdeva
**Description:**
Bugfix: Langchain_community's GitHub Api wrapper throws a TypeError when
searching for issues and/or PRs (the `search_issues_and_prs` method).
This is because PyGithub's PageinatedList type does not support the
len() method. See https://github.com/PyGithub/PyGithub/issues/1476
![image](https://github.com/langchain-ai/langchain/assets/8849021/57390b11-ed41-4f48-ba50-f3028610789c)
**Dependencies:** None
**Twitter handle**: @ChrisKeoghNZ
I haven't registered an issue as it would take me longer to fill the
template out than to make the fix, but I'm happy to if that's deemed
essential.
I've added a simple integration test to cover this as there were no
existing unit tests and it was going to be tricky to set them up.
Co-authored-by: Chris Keogh <chris.keogh@xero.com>
- **Description:** This adds a delete method so that rocksetdb can be
used with `RecordManager`.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** `@_morgan_adams_`
---------
Co-authored-by: Rockset API Bot <admin@rockset.io>
**Description:** Invoke callback prior to yielding token in stream
method for Ollama.
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)
Co-authored-by: Robby <h0rv@users.noreply.github.com>
This PR replaces the memory stream implementation used by the
LogStreamCallbackHandler.
This implementation resolves an issue in which streamed logs and
streamed events originating from sync code would arrive only after the
entire sync code would finish execution (rather than arriving in real
time as they're generated).
One example is if trying to stream tokens from an llm within a tool. If
the tool was an async tool, but the llm was invoked via stream (sync
variant) rather than astream (async variant), then the tokens would fail
to stream in real time and would all arrived bunched up after the tool
invocation completed.
**Description:** Invoke callback prior to yielding token in stream
method for watsonx.
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)
Co-authored-by: Robby <h0rv@users.noreply.github.com>
**Description:** changed filtering so that failed filter doesn't add
document to results. Currently filtering is entirely broken and all
documents are returned whether or not they pass the filter.
fixes issue introduced in
https://github.com/langchain-ai/langchain/pull/16190
- **Description:** Adds the document loader for [AWS
Athena](https://aws.amazon.com/athena/), a serverless and interactive
analytics service.
- **Dependencies:** Added boto3 as a dependency
- **Description:** This PR adds support for `search_types="mmr"` and
`search_type="similarity_score_threshold"` to retrievers using
`DatabricksVectorSearch`,
- **Issue:**
- **Dependencies:**
- **Twitter handle:**
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Ref: https://openai.com/pricing
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Unlike vector results, the LLM has to completely trust the context of a
graph database result, even if it doesn't provide whole context. We
tried with instructions, but it seems that adding a single example is
the way to go to solve this issue.
### This pull request makes the following changes:
* Fixed issue #16913
Fixed the google gen ai chat_models.py code to make sure that the
callback is called before the token is yielded
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Pydantic's `dict()` function raises an error here if you pass in a
generator. We have a more robust serialization function in lagnsmith
that we will use instead.
**Description**
Make some functions work with Milvus:
1. get_ids: Get primary keys by field in the metadata
2. delete: Delete one or more entities by ids
3. upsert: Update/Insert one or more entities
**Issue**
None
**Dependencies**
None
**Tag maintainer:**
@hwchase17
**Twitter handle:**
None
---------
Co-authored-by: HoaNQ9 <hoanq.1811@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description:**
1. Modify LLMs/Anyscale to work with OAI v1
2. Get rid of openai_ prefixed variables in Chat_model/ChatAnyscale
3. Modify `anyscale_api_base` to `anyscale_base_url` to follow OAI name
convention (reverted)
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
## Summary
This PR upgrades LangChain's Ruff configuration in preparation for
Ruff's v0.2.0 release. (The changes are compatible with Ruff v0.1.5,
which LangChain uses today.) Specifically, we're now warning when
linter-only options are specified under `[tool.ruff]` instead of
`[tool.ruff.lint]`.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Issue:** Issue with model argument support (been there for a while
actually):
- Non-specially-handled arguments like temperature don't work when
passed through constructor.
- Such arguments DO work quite well with `bind`, but also do not abide
by field requirements.
- Since initial push, server-side error messages have gotten better and
v0.0.2 raises better exceptions. So maybe it's better to let server-side
handle such issues?
- **Description:**
- Removed ChatNVIDIA's argument fields in favor of
`model_kwargs`/`model_kws` arguments which aggregates constructor kwargs
(from constructor pathway) and merges them with call kwargs (bind
pathway).
- Shuffled a few functions from `_NVIDIAClient` to `ChatNVIDIA` to
streamline construction for future integrations.
- Minor/Optional: Old services didn't have stop support, so client-side
stopping was implemented. Now do both.
- **Any Breaking Changes:** Minor breaking changes if you strongly rely
on chat_model.temperature, etc. This is captured by
chat_model.model_kwargs.
PR passes tests and example notebooks and example testing. Still gonna
chat with some people, so leaving as draft for now.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Description: Missing _identifying_params create issues when dealing with
callbacks to get current run model parameters.
All other model partners implementation provide this property and also
provide _default_params. I'm not sure about the default values to
include or if we can re-use the same as for _VertexAICommon(), this
change allows you to access the model parameters correctly.
Issue: Not exactly this issue but could be related
https://github.com/langchain-ai/langchain/issues/14711
Twitter handle:@musicaoriginal2
The streaming API doesn't separate safety_settings from the
generation_config payload. As the result the following error is observed
when using `stream` API. The functionality is correct with `invoke` API.
The fix separates the `safety_settings` from params and sets it as
argument to the `send_message` method.
```
ERROR: Unknown field for GenerationConfig: safety_settings
Traceback (most recent call last):
File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 250, in stream
raise e
File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 234, in stream
for chunk in self._stream(
File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/langchain_google_vertexai/chat_models.py", line 501, in _stream
for response in responses:
File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/vertexai/generative_models/_generative_models.py", line 921, in _send_message_streaming
for chunk in stream:
File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/vertexai/generative_models/_generative_models.py", line 514, in _generate_content_streaming
request = self._prepare_request(
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/vertexai/generative_models/_generative_models.py", line 256, in _prepare_request
gapic_generation_config = gapic_content_types.GenerationConfig(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/Library/Caches/pypoetry/virtualenvs/chatbot-worker-main-Ju-qIM-X-py3.12/lib/python3.12/site-packages/proto/message.py", line 576, in __init__
raise ValueError(
ValueError: Unknown field for GenerationConfig: safety_settings
```
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
I noticed that RunnableConfigurableAlternatives which is an important
composition in LCEL has no Docstring. Therefore I added the detailed
Docstring for it.
@baskaryan, @eyurtsev, @hwchase17 please have a look and let me if the
docstring is looking good.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
This PR enables changing the behaviour of huggingface pipeline between
different calls. For example, before this PR there's no way of changing
maximum generation length between different invocations of the chain.
This is desirable in cases, such as when we want to scale the maximum
output size depending on a dynamic prompt size.
Usage example:
```python
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
hf = HuggingFacePipeline(pipeline=pipe)
hf("Say foo:", pipeline_kwargs={"max_new_tokens": 42})
```
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description: changes to you.com files**
- general cleanup
- adds community/utilities/you.py, moving bulk of code from retriever ->
utility
- removes `snippet` as endpoint
- adds `news` as endpoint
- adds more tests
<s>**Description: update community MAKE file**
- adds `integration_tests`
- adds `coverage`</s>
- **Issue:** the issue # it fixes if applicable,
- [For New Contributors: Update Integration
Documentation](https://github.com/langchain-ai/langchain/issues/15664#issuecomment-1920099868)
- **Dependencies:** n/a
- **Twitter handle:** @scottnath
- **Mastodon handle:** scottnath@mastodon.social
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** This adds a recursive json splitter class to the
existing text_splitters as well as unit tests
- **Issue:** splitting text from structured data can cause issues if you
have a large nested json object and you split it as regular text you may
end up losing the structure of the json. To mitigate against this you
can split the nested json into large chunks and overlap them, but this
causes unnecessary text processing and there will still be times where
the nested json is so big that the chunks get separated from the parent
keys.
As an example you wouldn't want the following to be split in half:
```shell
{'val0': 'DFWeNdWhapbR',
'val1': {'val10': 'QdJo',
'val11': 'FWSDVFHClW',
'val12': 'bkVnXMMlTiQh',
'val13': 'tdDMKRrOY',
'val14': 'zybPALvL',
'val15': 'JMzGMNH',
'val16': {'val160': 'qLuLKusFw',
'val161': 'DGuotLh',
'val162': 'KztlcSBropT',
-----------------------------------------------------------------------split-----
'val163': 'YlHHDrN',
'val164': 'CtzsxlGBZKf',
'val165': 'bXzhcrWLmBFp',
'val166': 'zZAqC',
'val167': 'ZtyWno',
'val168': 'nQQZRsLnaBhb',
'val169': 'gSpMbJwA'},
'val17': 'JhgiyF',
'val18': 'aJaqjUSFFrI',
'val19': 'glqNSvoyxdg'}}
```
Any llm processing the second chunk of text may not have the context of
val1, and val16 reducing accuracy. Embeddings will also lack this
context and this makes retrieval less accurate.
Instead you want it to be split into chunks that retain the json
structure.
```shell
{'val0': 'DFWeNdWhapbR',
'val1': {'val10': 'QdJo',
'val11': 'FWSDVFHClW',
'val12': 'bkVnXMMlTiQh',
'val13': 'tdDMKRrOY',
'val14': 'zybPALvL',
'val15': 'JMzGMNH',
'val16': {'val160': 'qLuLKusFw',
'val161': 'DGuotLh',
'val162': 'KztlcSBropT',
'val163': 'YlHHDrN',
'val164': 'CtzsxlGBZKf'}}}
```
and
```shell
{'val1':{'val16':{
'val165': 'bXzhcrWLmBFp',
'val166': 'zZAqC',
'val167': 'ZtyWno',
'val168': 'nQQZRsLnaBhb',
'val169': 'gSpMbJwA'},
'val17': 'JhgiyF',
'val18': 'aJaqjUSFFrI',
'val19': 'glqNSvoyxdg'}}
```
This recursive json text splitter does this. Values that contain a list
can be converted to dict first by using split(... convert_lists=True)
otherwise long lists will not be split and you may end up with chunks
larger than the max chunk.
In my testing large json objects could be split into small chunks with
✅ Increased question answering accuracy
✅ The ability to split into smaller chunks meant retrieval queries can
use fewer tokens
- **Dependencies:** json import added to text_splitter.py, and random
added to the unit test
- **Twitter handle:** @joelsprunger
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
**Description:** Databricks LLM does not support SerDe the
transform_input_fn and transform_output_fn. After saving and loading,
the LLM will be broken. This PR serialize these functions into a hex
string using pickle, and saving the hex string in the yaml file. Using
pickle to serialize a function can be flaky, but this is a simple
workaround that unblocks many use cases. If more sophisticated SerDe is
needed, we can improve it later.
Test:
Added a simple unit test.
I did manual test on Databricks and it works well.
The saved yaml looks like:
```
llm:
_type: databricks
cluster_driver_port: null
cluster_id: null
databricks_uri: databricks
endpoint_name: databricks-mixtral-8x7b-instruct
extra_params: {}
host: e2-dogfood.staging.cloud.databricks.com
max_tokens: null
model_kwargs: null
n: 1
stop: null
task: null
temperature: 0.0
transform_input_fn: 80049520000000000000008c085f5f6d61696e5f5f948c0f7472616e73666f726d5f696e7075749493942e
transform_output_fn: null
```
@baskaryan
```python
from langchain_community.embeddings import DatabricksEmbeddings
from langchain_community.llms import Databricks
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
import mlflow
embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
def transform_input(**request):
request["messages"] = [
{
"role": "user",
"content": request["prompt"]
}
]
del request["prompt"]
return request
llm = Databricks(endpoint_name="databricks-mixtral-8x7b-instruct", transform_input_fn=transform_input)
persist_dir = "faiss_databricks_embedding"
# Create the vector db, persist the db to a local fs folder
loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
db = FAISS.from_documents(docs, embeddings)
db.save_local(persist_dir)
def load_retriever(persist_directory):
embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
vectorstore = FAISS.load_local(persist_directory, embeddings)
return vectorstore.as_retriever()
retriever = load_retriever(persist_dir)
retrievalQA = RetrievalQA.from_llm(llm=llm, retriever=retriever)
with mlflow.start_run() as run:
logged_model = mlflow.langchain.log_model(
retrievalQA,
artifact_path="retrieval_qa",
loader_fn=load_retriever,
persist_dir=persist_dir,
)
# Load the retrievalQA chain
loaded_model = mlflow.pyfunc.load_model(logged_model.model_uri)
print(loaded_model.predict([{"query": "What did the president say about Ketanji Brown Jackson"}]))
```
- **Description:**
Embedding field name was hard-coded named "embedding".
So I suggest that change `res["embedding"]` into
`res[self._embedding_key]`.
- **Issue:** #17177,
- **Twitter handle:**
[@bagcheoljun17](https://twitter.com/bagcheoljun17)
- **Description:** Fixes in the Ontotext GraphDB Graph and QA Chain
related to the error handling in case of invalid SPARQL queries, for
which `prepareQuery` doesn't throw an exception, but the server returns
400 and the query is indeed invalid
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** @OntotextGraphDB
**Description:**
Implemented unique ID validation in the FAISS component to ensure all
document IDs are distinct. This update resolves issues related to
non-unique IDs, such as inconsistent behavior during deletion processes.
**Description:** enable _parse_response_candidate to support complex
structure format.
**Issue:**
currently, if Gemini response complex args format, people will get
"TypeError: Object of type RepeatedComposite is not JSON serializable"
error from _parse_response_candidate.
response candidate example
```
content {
role: "model"
parts {
function_call {
name: "Information"
args {
fields {
key: "people"
value {
list_value {
values {
string_value: "Joe is 30, his mom is Martha"
}
}
}
}
}
}
}
}
finish_reason: STOP
safety_ratings {
category: HARM_CATEGORY_HARASSMENT
probability: NEGLIGIBLE
}
safety_ratings {
category: HARM_CATEGORY_HATE_SPEECH
probability: NEGLIGIBLE
}
safety_ratings {
category: HARM_CATEGORY_SEXUALLY_EXPLICIT
probability: NEGLIGIBLE
}
safety_ratings {
category: HARM_CATEGORY_DANGEROUS_CONTENT
probability: NEGLIGIBLE
}
```
error msg:
```
Traceback (most recent call last):
File "/home/jupyter/user/abehsu/gemini_langchain_tools/example2.py", line 36, in <module>
print(tagging_chain.invoke({"input": "Joe is 30, his mom is Martha"}))
File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 2053, in invoke
input = step.invoke(
File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3887, in invoke
return self.bound.invoke(
File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 165, in invoke
self.generate_prompt(
File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 543, in generate_prompt
return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 407, in generate
raise e
File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 397, in generate
self._generate_with_cache(
File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 576, in _generate_with_cache
return self._generate(
File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_google_vertexai/chat_models.py", line 406, in _generate
generations = [
File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_google_vertexai/chat_models.py", line 408, in <listcomp>
message=_parse_response_candidate(c),
File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/site-packages/langchain_google_vertexai/chat_models.py", line 280, in _parse_response_candidate
function_call["arguments"] = json.dumps(
File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/json/__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/opt/conda/envs/gemini_langchain_tools/lib/python3.10/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type RepeatedComposite is not JSON serializable
```
**Twitter handle:** @abehsu1992626
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description: added logic to override get_num_tokens_from_messages()
for ChatVertexAI. Currently ChatVertexAI was inheriting
get_num_tokens_from_messages() from BaseChatModel which in-turn was
calling GPT-2 tokenizer
- **Issue: NA
- **Dependencies: NA
- **Twitter handle:@aditya_rane
@lkuligin for review
---------
Co-authored-by: adityarane@google.com <adityarane@google.com>
Co-authored-by: Leonid Kuligin <lkuligin@yandex.ru>
- **Description:**
Actually the test named `test_openai_apredict` isn't testing the
apredict method from ChatOpenAI.
- **Twitter handle:**
https://twitter.com/OAlmofadas
* This PR adds async methods to the LLM cache.
* Adds an implementation using Redis called AsyncRedisCache.
* Adds a docker compose file at the /docker to help spin up docker
* Updates redis tests to use a context manager so flushing always happens by default
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description:**
This PR standardizes the `output_parser.py` file across all agent types
to ensure a uniform parsing mechanism is implemented. It introduces a
cohesive structure and common interface for output parsing, facilitating
easier modifications and extensions by users. The standardized approach
enhances maintainability and scalability of the codebase by providing a
consistent pattern for output parsing, which can be easily understood
and utilized across different agent types.
This PR builds upon the foundation set by a previously merged PR, which
focused exclusively on standardizing the `output_parser.py` for the
`conversational_agent` ([PR
#16945](https://github.com/langchain-ai/langchain/pull/16945)). With
this new update, I extend the standardization efforts to encompass
`output_parser.py` files across all agent types. This enhancement not
only unifies the parsing mechanism across the board but also introduces
the flexibility for users to incorporate custom `FORMAT_INSTRUCTIONS`.
- **Issue:**
https://github.com/langchain-ai/langchain/issues/10721https://github.com/langchain-ai/langchain/issues/4044
- **Dependencies:**
No new dependencies required for this change
- **Twitter handle:**
With my github user is enough. Thanks
I hope you accept my PR.
Based on my experiments, the newline isn't always there, so we can make
the regex slightly more robust by allowing an optional newline after the
bacticks
- **Description:**
before the change I've got
1. propagate InferenceClientException to the caller.
2. stop grpc receiver thread on exception
```
for token in result_queue:
> result_str += token
E TypeError: can only concatenate str (not "InferenceServerException") to str
../../langchain_nvidia_trt/llms.py:207: TypeError
```
And stream thread keeps running.
after the change request thread stops correctly and caller got a root
cause exception:
```
E tritonclient.utils.InferenceServerException: [request id: 4529729] expected number of inputs between 2 and 3 but got 10 inputs for model 'vllm_model'
../../langchain_nvidia_trt/llms.py:205: InferenceServerException
```
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** [t.me/mkhl_spb](https://t.me/mkhl_spb)
I'm not sure about test coverage. Should I setup deep mocks or there's a
kind of triton stub via testcontainers or so.
### Description
support load any github file content based on file extension.
Why not use [git
loader](https://python.langchain.com/docs/integrations/document_loaders/git#load-existing-repository-from-disk)
?
git loader clones the whole repo even only interested part of files,
that's too heavy. This GithubFileLoader only downloads that you are
interested files.
### Twitter handle
my twitter: @shufanhaotop
---------
Co-authored-by: Hao Fan <h_fan@apple.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:**
With this modification, users can customize the `FORMAT_INSTRUCTIONS`
template, allowing them to create their own prompts
As it is happening in
[this](https://github.com/langchain-ai/langchain/issues/10721) issue,
the `FORMAT_INSTRUCTIONS` is not customizable for the output parser,
unless you create your own class `ConvoOutputParser`. To avoid this, a
modification was done, creating a `format_instruction` variable that
users can customize with ease after initialize the agent.
For example:
```
agent = initialize_agent(
agent = AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
tools = tools,
llm = llm_agent,
verbose = True,
max_iterations = 3,
early_stopping_method = 'generate',
memory = b_w_memory,
handle_parsing_errors = True,
agent_kwargs={
'system_message':PREFIX,
'human_message':SUFFIX,
'template_tool_response':TEMPLATE_TOOL_RESPONSE,
}
)
agent.agent.output_parser.format_instructions = "MY CUSTOM FORMAT INSTRUCTIONS"
print(agent.agent.output_parser.get_format_instructions())
MY CUSTOM FORMAT INSTRUCTIONS
```
Other parameters like `system_message`, `human_message`, or
`template_tool_response` are already customizable and with this PR, the
last parameter `FORMAT_INSTRUCTIONS` in
`langchain.agents.conversational_chat.prompt` can be modified.
**Issue:**
https://github.com/langchain-ai/langchain/issues/10721
**Dependencies:**
No new dependencies required for this change
**Twitter handle:**
With my github user is enough. Thanks
I hope you accept my PR.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Please tag this issue with `nvidia_genai`**
- **Description:** Added new Runnables for integration NVIDIA Riva into
LCEL chains for Automatic Speech Recognition (ASR) and Text To Speech
(TTS).
- **Issue:** N/A
- **Dependencies:** To use these runnables, the NVIDIA Riva client
libraries are required. It they are not installed, an error will be
raised instructing how to install them. The Runnables can be safely
imported without the riva client libraries.
- **Twitter handle:** N/A
All of the Riva Runnables are inside a single folder in the Utilities
module. In this folder are four files:
- common.py - Contains all code that is common to both TTS and ASR
- stream.py - Contains a class representing an audio stream that allows
the end user to put data into the stream like a queue.
- asr.py - Contains the RivaASR runnable
- tts.py - Contains the RivaTTS runnable
The following Python function is an example of creating a chain that
makes use of both of these Runnables:
```python
def create(
config: Configuration,
audio_encoding: RivaAudioEncoding,
sample_rate: int,
audio_channels: int = 1,
) -> Runnable[ASRInputType, TTSOutputType]:
"""Create a new instance of the chain."""
_LOGGER.info("Instantiating the chain.")
# create the riva asr client
riva_asr = RivaASR(
url=str(config.riva_asr.service.url),
ssl_cert=config.riva_asr.service.ssl_cert,
encoding=audio_encoding,
audio_channel_count=audio_channels,
sample_rate_hertz=sample_rate,
profanity_filter=config.riva_asr.profanity_filter,
enable_automatic_punctuation=config.riva_asr.enable_automatic_punctuation,
language_code=config.riva_asr.language_code,
)
# create the prompt template
prompt = PromptTemplate.from_template("{user_input}")
# model = ChatOpenAI()
model = ChatNVIDIA(model="mixtral_8x7b") # type: ignore
# create the riva tts client
riva_tts = RivaTTS(
url=str(config.riva_asr.service.url),
ssl_cert=config.riva_asr.service.ssl_cert,
output_directory=config.riva_tts.output_directory,
language_code=config.riva_tts.language_code,
voice_name=config.riva_tts.voice_name,
)
# construct and return the chain
return {"user_input": riva_asr} | prompt | model | riva_tts # type: ignore
```
The following code is an example of creating a new audio stream for
Riva:
```python
input_stream = AudioStream(maxsize=1000)
# Send bytes into the stream
for chunk in audio_chunks:
await input_stream.aput(chunk)
input_stream.close()
```
The following code is an example of how to execute the chain with
RivaASR and RivaTTS
```python
output_stream = asyncio.Queue()
while not input_stream.complete:
async for chunk in chain.astream(input_stream):
output_stream.put(chunk)
```
Everything should be async safe and thread safe. Audio data can be put
into the input stream while the chain is running without interruptions.
---------
Co-authored-by: Hayden Wolff <hwolff@nvidia.com>
Co-authored-by: Hayden Wolff <hwolff@Haydens-Laptop.local>
Co-authored-by: Hayden Wolff <haydenwolff99@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Ensure the `LlamaGrammar` custom type is always
available when instantiating a `LlamaCpp` LLM
- **Issue:** #16994
- **Dependencies:** None
- **Twitter handle:** @fpaupier
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
As described in issue #17060, in the case in which text has only one
sentence the following function fails. Checking for that and adding a
return case fixed the issue.
```python
def split_text(self, text: str) -> List[str]:
"""Split text into multiple components."""
# Splitting the essay on '.', '?', and '!'
single_sentences_list = re.split(r"(?<=[.?!])\s+", text)
sentences = [
{"sentence": x, "index": i} for i, x in enumerate(single_sentences_list)
]
sentences = combine_sentences(sentences)
embeddings = self.embeddings.embed_documents(
[x["combined_sentence"] for x in sentences]
)
for i, sentence in enumerate(sentences):
sentence["combined_sentence_embedding"] = embeddings[i]
distances, sentences = calculate_cosine_distances(sentences)
start_index = 0
# Create a list to hold the grouped sentences
chunks = []
breakpoint_percentile_threshold = 95
breakpoint_distance_threshold = np.percentile(
distances, breakpoint_percentile_threshold
) # If you want more chunks, lower the percentile cutoff
indices_above_thresh = [
i for i, x in enumerate(distances) if x > breakpoint_distance_threshold
] # The indices of those breakpoints on your list
# Iterate through the breakpoints to slice the sentences
for index in indices_above_thresh:
# The end index is the current breakpoint
end_index = index
# Slice the sentence_dicts from the current start index to the end index
group = sentences[start_index : end_index + 1]
combined_text = " ".join([d["sentence"] for d in group])
chunks.append(combined_text)
# Update the start index for the next group
start_index = index + 1
# The last group, if any sentences remain
if start_index < len(sentences):
combined_text = " ".join([d["sentence"] for d in sentences[start_index:]])
chunks.append(combined_text)
return chunks
```
Co-authored-by: Giulio Zani <salamanderxing@Giulios-MBP.homenet.telecomitalia.it>
- **Description:** Add relevant type annotations for relevant session
and query objects to resolve mypy errors when `# type: ignore` comments
are removed.
- **Issue:** #17048
- **Dependencies:** None,
- **Twitter handle:** [clesiemo3](https://twitter.com/clesiemo3)
I attempted to solve the `UpsertionRecord` ignore but it would require
added a deprecated plugin or moving completely to sqlalchemy 2.0+ from
my understanding. I'm assuming this is not something desired at this
point in time.
- **Description:** Adds a function parameter to HuggingFaceEmbeddings
called `show_progress` that enables a `tqdm` progress bar if enabled.
Does not function if `multi_process = True`.
- **Issue:** n/a
- **Dependencies:** n/a
- **Description:** Adds an additional class variable to `BedrockBase`
called `provider` that allows sending a model provider such as amazon,
cohere, ai21, etc.
Up until now, the model provider is extracted from the `model_id` using
the first part before the `.`, such as `amazon` for
`amazon.titan-text-express-v1` (see [supported list of Bedrock model IDs
here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids-arns.html)).
But for custom Bedrock models where the ARN of the provisioned
throughput must be supplied, the `model_id` is like
`arn:aws:bedrock:...` so the `model_id` cannot be extracted from this. A
model `provider` is required by the LangChain Bedrock class to perform
model-based processing. To allow the same processing to be performed for
custom-models of a specific base model type, passing this `provider`
argument can help solve the issues.
The alternative considered here was the use of
`provider.arn:aws:bedrock:...` which then requires ARN to be extracted
and passed separately when invoking the model. The proposed solution
here is simpler and also does not cause issues for current models
already using the Bedrock class.
- **Issue:** N/A
- **Dependencies:** N/A
---------
Co-authored-by: Piyush Jain <piyushjain@duck.com>
This is a PR about #16334
The Stop sequenes isn't meanful in `json_chat` because it depends json
to work, not completions
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** Several meta/usability updates, including User-Agent.
- **Issue:**
- User-Agent metadata for tracking connector engagement. @milesial
please check and advise.
- Better error messages. Tries harder to find a request ID. @milesial
requested.
- Client-side image resizing for multimodal models. Hope to upgrade to
Assets API solution in around a month.
- `client.payload_fn` allows you to modify payload before network
request. Use-case shown in doc notebook for kosmos_2.
- `client.last_inputs` put back in to allow for advanced
support/debugging.
- **Dependencies:**
- Attempts to pull in PIL for image resizing. If not installed, prints
out "please install" message, warns it might fail, and then tries
without resizing. We are waiting on a more permanent solution.
For LC viz: @hinthornw
For NV viz: @fciannella @milesial @vinaybagade
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Previously, if this did not find a mypy cache then it wouldnt run
this makes it always run
adding mypy ignore comments with existing uncaught issues to unblock other prs
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description**: We discovered a bug converting dictionaries to
messages where the ChatMessageChunk message type isn't handled. This PR
adds support for that message type.
- **Issue**: #17022
- **Dependencies**: None
- **Twitter handle**: None
## Description
In #16608, the calling `collection_name` was wrong.
I made a fix for it.
Sorry for the inconvenience!
## Issue
https://github.com/langchain-ai/langchain/issues/16962
## Dependencies
N/A
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Kumar Shivendu <kshivendu1@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
primary problem in pydantic still exists, where `Optional[str]` gets
turned to `string` in the jsonschema `.schema()`
Also fixes the `SchemaSchema` naming issue
---------
Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>
- **Description:** add a ValidationError handler as a field of
[`BaseTool`](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/tools.py#L101)
and add unit tests for the code change.
- **Issue:** #12721#13662
- **Dependencies:** None
- **Tag maintainer:**
- **Twitter handle:** @hmdev3
- **NOTE:**
- I'm wondering if the update of document is required.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
We didn't override the namespace of the ImagePromptTemplate, so it is
listed as being in langchain.schema
This updates the mapping to let the loader deserialize.
Alternatively, we could make a slight breaking change and update the
namespace of the ImagePromptTemplate since we haven't broadly
publicized/documented it yet..
All models should be calling the callback for new token prior to
yielding the token.
Not doing this can cause callbacks for downstream steps to be called
prior to the callback for the new token; causing issues in
astream_events APIs and other things that depend in callback ordering
being correct.
We need to make this change for all chat models.
The `langchain.prompts.example_selector` [still holds several
artifacts](https://api.python.langchain.com/en/latest/langchain_api_reference.html#module-langchain.prompts)
that belongs to `community`. If they moved to
`langchain_community.example_selectors`, the `langchain.prompts`
namespace would be effectively removed which is great.
- moved a class and afunction to `langchain_community`
Note:
- Previously, the `langchain.prompts.example_selector` artifacts were
moved into the `langchain_core.exampe_selectors`. See the flattened
namespace (`.prompts` was removed)!
Similar flattening was implemented for the `langchain_core` as the
`langchain_core.exampe_selectors`.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
* Adds `AstraDBEnvironment` class and use it in `AstraDBLoader`,
`AstraDBCache`, `AstraDBSemanticCache`, `AstraDBBaseStore` and
`AstraDBChatMessageHistory`
* Create an `AsyncAstraDB` if we only have an `AstraDB` and vice-versa
so:
* we always have an instance of `AstraDB`
* we always have an instance of `AsyncAstraDB` for recent versions of
astrapy
* Create collection if not exists in `AstraDBBaseStore`
* Some typing improvements
Note: `AstraDB` `VectorStore` not using `AstraDBEnvironment` at the
moment. This will be done after the `langchain-astradb` package is out.
- **Description:**
The BaseStore methods are currently blocking. Some implementations
(AstraDBStore, RedisStore) would benefit from having async methods.
Also once we have async methods for BaseStore, we can implement the
async `aembed_documents` in CacheBackedEmbeddings to cache the
embeddings asynchronously.
* adds async methods amget, amset, amedelete and ayield_keys to
BaseStore
* implements the async methods for InMemoryStore
* adds tests for InMemoryStore async methods
- **Twitter handle:** cbornet_
* Add bulk add_messages method to the interface.
* Update documentation for add_ai_message and add_human_message to
denote them as being marked for deprecation. We should stop using them
as they create more incorrect (inefficient) ways of doing things
Adds:
* methods `aload()` and `alazy_load()` to interface `BaseLoader`
* implementation for class `MergedDataLoader `
* support for class `BaseLoader` in async function `aindex()` with unit
tests
Note: this is compatible with existing `aload()` methods that some
loaders already had.
**Twitter handle:** @cbornet_
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
- **Description:** the existing AssemblyAI API allows to pass a path or
an url to transcribe an audio file and turn in into Langchain Documents,
this PR allows to get existing transcript by their transcript id and
turn them into Documents.
- **Issue:** not related to an existing issue
- **Dependencies:** requests
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
The current implementation leaves it up to the particular file loader
implementation to report the file on which an error was encountered - in
my case pdfminer was simply saying it could not parse a file as a PDF,
but I didn't know which of my hundreds of files it was failing on.
No reason not to log the particular item on which an error was
encountered, and it should be an immense debugging assistant.
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Description: Added the parameter for a possibility to change a language
model in SpacyEmbeddings. The default value is still the same:
"en_core_web_sm", so it shouldn't affect a code which previously did not
specify this parameter, but it is not hard-coded anymore and easy to
change in case you want to use it with other languages or models.
Issue: At Barcelona Supercomputing Center in Aina project
(https://github.com/projecte-aina), a project for Catalan Language
Models and Resources, we would like to use Langchain for one of our
current projects and we would like to comment that Langchain, while
being a very powerful and useful open-source tool, is pretty much
focused on English language. We would like to contribute to make it a
bit more adaptable for using with other languages.
Dependencies: This change requires the Spacy library and a language
model, specified in the model parameter.
Tag maintainer: @dev2049
Twitter handle: @projecte_aina
---------
Co-authored-by: Marina Pliusnina <marina.pliusnina@bsc.es>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description**: fully async versions are available for astrapy 0.7+.
For older astrapy versions or if the user provides a sync client without
an async one, the async methods will call the sync ones wrapped in
`run_in_executor`
- **Twitter handle:** cbornet_
Replace this entire comment with:
- **Description:** Add Baichuan LLM to integration/llm, also updated
related docs.
Co-authored-by: BaiChuanHelper <wintergyc@WinterGYCs-MacBook-Pro.local>
- **Description:**
Filtering in a FAISS vectorstores is very inflexible and doesn't allow
that many use case. I think supporting callable like this enables a lot:
regular expressions, condition on multiple keys etc. **Note** I had to
manually alter a test. I don't understand if it was falty to begin with
or if there is something funky going on.
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None
Signed-off-by: thiswillbeyourgithub <26625900+thiswillbeyourgithub@users.noreply.github.com>
Adjusted deprecate decorator to make sure decorated async functions are
still recognized as "coroutinefunction" by inspect
Addresses #16402
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
## Description
The PR is to return the ID and collection name from qdrant client to
metadata field in `Document` class.
## Issue
The motivation is almost same to
[11592](https://github.com/langchain-ai/langchain/issues/11592)
Returning ID is useful to update existing records in a vector store, but
we cannot know them if we use some retrievers.
In order to avoid any conflicts, breaking changes, the new fields in
metadata have a prefix `_`
## Dependencies
N/A
## Twitter handle
@kill_in_sun
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Use the real "history" provided by the original program instead of
putting "None" in the history.
- **Description:** I change one line in the code to make it return the
"history" of the chat model.
- **Issue:** At the moment it returns only the answers of the chat
model. However the chat model himself provides a history more complet
with the questions of the user.
- **Dependencies:** no dependencies required for this change,
This PR includes updates for OctoAI integrations:
- The LLM class was updated to fix a bug that occurs with multiple
sequential calls
- The Embedding class was updated to support the new GTE-Large endpoint
released on OctoAI lately
- The documentation jupyter notebook was updated to reflect using the
new LLM sdk
Thank you!
## Summary
This PR implements the "Connery Action Tool" and "Connery Toolkit".
Using them, you can integrate Connery actions into your LangChain agents
and chains.
Connery is an open-source plugin infrastructure for AI.
With Connery, you can easily create a custom plugin with a set of
actions and seamlessly integrate them into your LangChain agents and
chains. Connery will handle the rest: runtime, authorization, secret
management, access management, audit logs, and other vital features.
Additionally, Connery and our community offer a wide range of
ready-to-use open-source plugins for your convenience.
Learn more about Connery:
- GitHub: https://github.com/connery-io/connery-platform
- Documentation: https://docs.connery.io
- Twitter: https://twitter.com/connery_io
## TODOs
- [x] API wrapper
- [x] Integration tests
- [x] Connery Action Tool
- [x] Docs
- [x] Example
- [x] Integration tests
- [x] Connery Toolkit
- [x] Docs
- [x] Example
- [x] Formatting (`make format`)
- [x] Linting (`make lint`)
- [x] Testing (`make test`)
- **Description:** To adapt more parameters related to
MemorySearchPayload for the search method of ZepChatMessageHistory,
- **Issue:** None,
- **Dependencies:** None,
- **Twitter handle:** None
Add missing async similarity_distance_threshold handling in
RedisVectorStoreRetriever
- **Description:** added method `_aget_relevant_documents` to
`RedisVectorStoreRetriever` that overrides parent method to add support
of `similarity_distance_threshold` in async mode (as for sync mode)
- **Issue:** #16099
- **Dependencies:** N/A
- **Twitter handle:** N/A
- **Description:** Presidio-based anonymizers are not working because
`_remove_conflicts_and_get_text_manipulation_data` was being called
without a conflict resolution strategy. This PR fixes this issue. In
addition, it removes some mutable default arguments (antipattern).
To reproduce the issue, just run the very first cell of this
[notebook](https://python.langchain.com/docs/guides/privacy/2/) from
langchain's documentation.
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
* Description: Fixed schema discrepancy in **from_texts** function for
weaviate vectorstore which created a redundant property "key" inside a
class.
* Issue: Fixed: https://github.com/langchain-ai/langchain/issues/16692
* Twitter handle: @pashvamehta1
- **Description:** Adds Wikidata support to langchain. Can read out
documents from Wikidata.
- **Issue:** N/A
- **Dependencies:** Adds implicit dependencies for
`wikibase-rest-api-client` (for turning items into docs) and
`mediawikiapi` (for hitting the search endpoint)
- **Twitter handle:** @derenrich
You can see an example of this tool used in a chain
[here](https://nbviewer.org/urls/d.erenrich.net/upload/Wikidata_Langchain.ipynb)
or
[here](https://nbviewer.org/urls/d.erenrich.net/upload/Wikidata_Lars_Kai_Hansen.ipynb)
<!-- Thank you for contributing to LangChain!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
**Description:** This update ensures that the user-defined embedding
function specified during vector store creation is applied during
queries. Previously, even if a custom embedding function was defined at
the time of store creation, Bagel DB would default to using the standard
embedding function during query execution. This pull request addresses
this issue by consistently using the user-defined embedding function for
queries if one has been specified earlier.
- **Description:** This change allows the `_fetch` method in the
`WebBaseLoader` class to utilize cookies from an existing
`requests.Session`. It ensures that when the `fetch` method is used, any
cookies in the provided session are included in the request. This
enhancement maintains compatibility with existing functionality while
extending the utility of the `fetch` method for scenarios where cookie
persistence is necessary.
- **Issue:** Not applicable (new feature),
- **Dependencies:** Requires `aiohttp` and `requests` libraries (no new
dependencies introduced),
- **Twitter handle:** N/A
Co-authored-by: Joao Almeida <joao.almeida@mercedes-benz.io>
We can't use `json.dumps` by default as many types returned by the
cassandra driver are not serializable. It's safer to use `str` and let
users define their own custom `page_content_mapper` if needed.
if eg. the stream iterator is interrupted then adding more events to the
send_stream will raise an exception that we should catch (and handle
where appropriate)
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description**: YoutubeLoader right now returns one document that
contains the entire transcript. I think it would be useful to add an
option to return multiple documents, where each document would contain
one line of transcript with the start time and duration in the metadata.
For example,
[AssemblyAIAudioTranscriptLoader](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/document_loaders/assemblyai.py)
is implemented in a similar way, it allows you to choose between the
format to use for the document loader.
- **Description:** Adding Baichuan Text Embedding Model and Baichuan Inc
introduction.
Baichuan Text Embedding ranks #1 in C-MTEB leaderboard:
https://huggingface.co/spaces/mteb/leaderboard
Co-authored-by: BaiChuanHelper <wintergyc@WinterGYCs-MacBook-Pro.local>
- **Description:** This PR adds [EdenAI](https://edenai.co/) for the
chat model (already available in LLM & Embeddings). It supports all
[ChatModel] functionality: generate, async generate, stream, astream and
batch. A detailed notebook was added.
- **Dependencies**: No dependencies are added as we call a rest API.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>