<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
**Description:** Implement stream and astream methods for RunnableLambda
to make streaming work for functions returning Runnable
- **Issue:** https://github.com/langchain-ai/langchain/issues/11998
- **Dependencies:** No new dependencies
- **Twitter handle:** https://twitter.com/qtangs
---------
Co-authored-by: Nuno Campos <nuno@langchain.dev>
**Description:**
Adding async methods to booth OllamaLLM and ChatOllama to enable async
streaming and async .on_llm_new_token callbacks.
**Issue:**
ChatOllama is not working in combination with an AsyncCallbackManager
because the .on_llm_new_token method is not awaited.
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- Added ensure_ascii property to ElasticsearchChatMessageHistory
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Ivan Chetverikov <ivan.chetverikov@raftds.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
**Description**: The parameter chunk_type was being hard coded to
"extractive_answers", so that when "snippet" was being passed, it was
being ignored. This change simply doesn't do that.
Added the call function get_summaries_as_docs inside of Arxivloader
- **Description:** Added a function that returns the documents from
get_summaries_as_docs, as the call signature is present in the parent
file but never used from Arxivloader, this can be used from Arxivloader
itself just like .load() as both the signatures are same.
- **Issue:** Reduces time to load papers as no pdf is processed only
metadata is pulled from Arxiv allowing users for faster load times on
bulk loads. Users can then choose one or more paper and use ID directly
with .load() to load pdf thereby loading all the contents of the paper.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
## Description
Changes the behavior of `add_user_message` and `add_ai_message` to allow
for messages of those types to be passed in. Currently, if you want to
use the `add_user_message` or `add_ai_message` methods, you have to pass
in a string. For `add_message` on `ChatMessageHistory`, however, you
have to pass a `BaseMessage`. This behavior seems a bit inconsistent.
Personally, I'd love to be able to be explicit that I want to
`add_user_message` and pass in a `HumanMessage` without having to grab
the `content` attribute. This PR allows `add_user_message` to accept
`HumanMessage`s or `str`s and `add_ai_message` to accept `AIMessage`s or
`str`s to add that functionality and ensure backwards compatibility.
## Issue
* None
## Dependencies
* None
## Tag maintainer
@hinthornw
@baskaryan
## Note
`make test` results in `make: *** No rule to make target 'test'. Stop.`
- **Description:** `tools.gmail.send_message` implements a
`SendMessageSchema` that is not used anywhere. `GmailSendMessage` also
does not have an `args_schema` attribute (this led to issues when
invoking the tool with an OpenAI functions agent, at least for me). Here
we add the missing attribute and a minimal test for the tool.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** N/A
---------
Co-authored-by: Chester Curme <chestercurme@microsoft.com>
- **Description:** In response to user feedback, this PR refactors the
Baseten integration with updated model endpoints, as well as updates
relevant documentation. This PR has been tested by end users in
production and works as expected.
- **Issue:** N/A
- **Dependencies:** This PR actually removes the dependency on the
`baseten` package!
- **Twitter handle:** https://twitter.com/basetenco
# Description
This PR adds the ability to pass a `botocore.config.Config` instance to
the boto3 client instantiated by the Bedrock LLM.
Currently, the Bedrock LLM doesn't support a way to pass a Config, which
means that some settings (e.g., timeouts and retry configuration)
require instantiating a new boto3 client with a Config and then
replacing the LLM's client:
```python
llm = Bedrock(
region_name='us-west-2',
model_id="anthropic.claude-v2",
model_kwargs={'max_tokens_to_sample': 4096, 'temperature': 0},
)
llm.client = boto_client('bedrock-runtime', region_name='us-west-2', config=Config({'read_timeout': 300}))
```
# Issue
N/A
# Dependencies
N/A
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
fix spellings
**seperate -> separate**: found more occurrences, see
https://github.com/langchain-ai/langchain/pull/14602
**initialise -> intialize**: the latter is more common in the repo
**pre-defined > predefined**: adding a comma after a prefix is a
delicate matter, but this is a generally accepted word
also, another word that appears in the repo is "fs" (stands for
filesystem), e.g., in `libs/core/langchain_core/prompts/loading.py`
` """Unified method for loading a prompt from LangChainHub or local
fs."""`
Isn't "filesystem" better?
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Ran <rccalman@gmail.com>
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description:** This PR fixes test failures on Windows caused by path
handling differences and unescaped special characters in regex. The
failing tests are:
```
FAILED tests/unit_tests/storage/test_filesystem.py::test_yield_keys - AssertionError: assert ['key1', 'subdir\\key2'] == ['key1', 'subdir/key2']
FAILED tests/unit_tests/test_imports.py::test_importable_all - ModuleNotFoundError: No module named 'langchain_community.langchain_community\\adapters'
FAILED tests/unit_tests/tools/file_management/test_utils.py::test_get_validated_relative_path_errs_on_absolute - re.error: incomplete escape \U at position 53
FAILED tests/unit_tests/tools/file_management/test_utils.py::test_get_validated_relative_path_errs_on_parent_dir - re.error: incomplete escape \U at position 69
FAILED tests/unit_tests/tools/file_management/test_utils.py::test_get_validated_relative_path_errs_for_symlink_outside_root - re.error: incomplete escape \U at position 64
```
- **Issue:** fixes
https://github.com/langchain-ai/langchain/issues/11775 (partially)
- **Dependencies:** none
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Builds on #14040 with community refactor merged and notebook updated.
Note that with this refactor, models will be imported from
`langchain_community.chat_models.huggingface` rather than the main
`langchain` repo.
---------
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: ugm2 <unaigaraymaestre@gmail.com>
Signed-off-by: Yuchen Liang <yuchenl3@andrew.cmu.edu>
Co-authored-by: Andrew Reed <andrew.reed.r@gmail.com>
Co-authored-by: Andrew Reed <areed1242@gmail.com>
Co-authored-by: A-Roucher <aymeric.roucher@gmail.com>
Co-authored-by: Aymeric Roucher <69208727+A-Roucher@users.noreply.github.com>
Replace this entire comment with:
- **Description:** @kurtisvg has raised a point that it's a good idea to
have a fixed version for embeddings (since otherwise a user might run a
query with one version vs a vectorstore where another version was used).
In order to avoid breaking changes, I'd suggest to give users a warning,
and make a `model_name` a required argument in 1.5 months.
Surrealdb client changes from 0.3.1 to 0.3.2 broke the surrealdb vectore
integration.
This PR updates the code to work with the updated client. The change is
backwards compatible with previous versions of surrealdb client.
Also expanded the vector store implementation to store and retrieve
metadata that's included with the document object.
- **Description:** Fixed jaguar.py to import JaguarHttpClient with try
and catch
- **Issue:** the issue # Unable to use the JaguarHttpClient at run time
- **Dependencies:** It requires "pip install -U jaguardb-http-client"
- **Twitter handle:** workbot
---------
Co-authored-by: JY <jyjy@jaguardb>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description**
For the Momento Vector Index (MVI) vector store implementation, pass
through `filter_expression` kwarg to the MVI client, if specified. This
change will enable the MVI self query implementation in a future PR.
Also fixes some integration tests.
- **Description:** Fix typo in class Docstring to replace
AZURE_OPENAI_API_ENDPOINT by AZURE_OPENAI_ENDPOINT
- **Issue:** the issue #14901
- **Dependencies:** NA
- **Twitter handle:**
Co-authored-by: Yacine Bouakkaz <Yacine.Bouakkaz@evokegroup.com>
* This PR adds `stream` implementations to Runnable Branch.
* Runnable Branch still does not support `transform` so it'll break streaming if it happens in middle or end of sequence, but will work if happens at beginning of sequence.
* Fixes use the async callback manager for async methods
* Handle BaseException rather than Exception, so more errors could be logged as errors when they are encountered
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Description: Adding Summarization to Vectara, to reflect it provides not
only vector-store type functionality but also can return a summary.
Also added:
MMR capability (in the Vectara platform side)
Updated templates
Updated documentation and IPYNB examples
Tag maintainer: @baskaryan
Twitter handle: @ofermend
---------
Co-authored-by: Ofer Mendelevitch <ofermend@gmail.com>
**What is the reproduce code?**
```python
from langchain.chains import LLMChain, load_chain
from langchain.llms import Databricks
from langchain.prompts import PromptTemplate
def transform_output(response):
# Extract the answer from the responses.
return str(response["candidates"][0]["text"])
def transform_input(**request):
full_prompt = f"""{request["prompt"]}
Be Concise.
"""
request["prompt"] = full_prompt
return request
chat_model = Databricks(
endpoint_name="llama2-13B-chat-Brambles",
transform_input_fn=transform_input,
transform_output_fn=transform_output,
verbose=True,
)
print(f"Test chat model: {chat_model('What is Apache Spark')}") # This works
llm_chain = LLMChain(llm=chat_model, prompt=PromptTemplate.from_template("{chat_input}"))
llm_chain("colorful socks") # this works
llm_chain.save("databricks_llm_chain.yaml") # transform_input_fn and transform_output_fn are not serialized into the model yaml file
loaded_chain = load_chain("databricks_llm_chain.yaml") # The Databricks LLM is recreated with transform_input_fn=None, transform_output_fn=None.
loaded_chain("colorful socks") # Thus this errors. The transform_output_fn is needed to produce the correct output
```
Error:
```
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-6c34afab-3473-421d-877f-1ef18930ef4d/lib/python3.10/site-packages/pydantic/v1/main.py", line 341, in __init__
raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for Generation
text
str type expected (type=type_error.str)
request payload: {'query': 'What is a databricks notebook?'}'}
```
**What does the error mean?**
When the LLM generates an answer, represented by a Generation data
object. The Generation data object takes a str field called text, e.g.
Generation(text=”blah”). However, the Databricks LLM tried to put a
non-str to text, e.g. Generation(text={“candidates”:[{“text”: “blah”}]})
Thus, pydantic errors.
**Why the output format becomes incorrect after saving and loading the
Databricks LLM?**
Databrick LLM does not support serializing transform_input_fn and
transform_output_fn, so they are not serialized into the model yaml
file. When the Databricks LLM is loaded, it is recreated with
transform_input_fn=None, transform_output_fn=None. Without
transform_output_fn, the output text is not unwrapped, thus errors.
Missing transform_output_fn causes this error.
Missing transform_input_fn causes the additional prompt “Be Concise.” to
be lost after saving and loading.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
## Description
This PR intends to add support for Qdrant's new [sparse vector
retrieval](https://qdrant.tech/articles/sparse-vectors/) by introducing
a new retriever class, `QdrantSparseVectorRetriever`.
Necessary usage docs and integration tests have been added for the
retriever.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:**
This PR fixes the issue faces with duplicate input id in Clarifai
vectorstore class when ingesting documents into the vectorstore more
than the batch size.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
## Description
Similar to https://github.com/langchain-ai/langchain/issues/5861, I've
experienced `KeyError`s resulting from unsafe lookups in the
`convert_dict_to_message` function in [this
file](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/adapters/openai.py).
While that issue focused on `KeyError 'content'`, I've opened another
issue (#14764) about how the problem still exists in the same function
but with `KeyError 'role'`. The fix for #5861 only added a safe lookup
to the specific line that was giving them trouble.. This PR fixes the
unsafe lookup in the rest of the function but the problem still exists
across the repo.
## Issues
* #14764
* #5861
## Dependencies
* None
## Checklist
[x] make format
[x] make lint
[ ] make test - Results in `make: *** No rule to make target 'test'.
Stop.`
## Maintainers
* @hinthornw
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
This PR adds support for PygmalionAI's [Aphrodite
Engine](https://github.com/PygmalionAI/aphrodite-engine), based on
vLLM's attention mechanism. At the moment, this PR does not include
support for the API servers, but they will be added in a later PR.
The only dependency as of now is `aphrodite-engine==0.4.2`. We pin the
version to prevent breakage due to changes in the aphrodite-engine
library.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Introducing an ability to work with the
[YandexGPT](https://cloud.yandex.com/en/services/yandexgpt) embeddings
models.
---------
Co-authored-by: Dmitry Tyumentsev <dmitry.tyumentsev@raftds.com>
- **Description:** Modify community chat model vertexai to handle png
and other image types encoded in base64
- **Dependencies:** added `import re` but no new dependencies.
This addresses a problem where the vertexai method
_parse_chat_history_gemini() was only recognizing image uris in jpeg
format. I made a simple change to cover other extension types.
- **Description:** The Qianfan SDK offers multiple authentication
methods, but in the `QianfanEndpoint` of Langchain, it currently only
supports authentication through AK and SK. In order to accommodate users
who wish to use alternative authentication methods, this pull request
makes AK and SK optional. This change should not impact existing users,
while allowing users to configure other authentication methods as per
the Qianfan SDK documentation.
- **Issue:** /
- **Dependencies:** No
- **Tag maintainer:** No
- **Twitter handle:**
Added Entry ID as a return value inside get_summaries_as_docs
- **Description:** Added the Entry ID as a return, so it's easier to
track the IDs of the papers that are being returned.
With the addition return of the entry ID in functions like
ArxivRetriever, it will be easier to reference the ID of the paper
itself.
- **Description:** Going forward, we have a own API `pip install
gradientai`. Therefore gradually removing the self-build packages in
llamaindex, haystack and langchain.
- **Issue:** None.
- **Dependencies:** `pip install gradientai`
- **Tag maintainer:** @michaelfeil
Very simple change in relation to the issue
https://github.com/langchain-ai/langchain/issues/14550
@baskaryan, @eyurtsev, @hwchase17.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Added logic for re-calling the YandexGPT API in case of
an error
---------
Co-authored-by: Dmitry Tyumentsev <dmitry.tyumentsev@raftds.com>
Description: A new vector store Jaguar is being added. Class, test
scripts, and documentation is added.
Issue: None -- This is the first PR contributing to LangChain
Dependencies: This depends on "pip install -U jaguardb-http-client"
client http package
Tag maintainer: @baskaryan, @eyurtsev, @hwchase1
Twitter handle: @workbot
---------
Co-authored-by: JY <jyjy@jaguardb>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Addded missed docstrings. Fixed inconsistency in docstrings.
**Note** CC @efriis
There were PR errors on
`langchain_experimental/prompt_injection_identifier/hugging_face_identifier.py`
But, I didn't touch this file in this PR! Can it be some cache problems?
I fixed this error.
- **Description:** added support for chat_history for Google
GenerativeAI (to actually use the `chat` API) plus since Gemini
currently doesn't have a support for SystemMessage, added support for it
only if a user provides additional `convert_system_message_to_human`
flag during model initialization (in this case, SystemMessage would be
prepanded to the first HumanMessage)
- **Issue:** #14710
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** lkuligin
---------
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
- **Description:** This is addition to [my previous
PR](https://github.com/langchain-ai/langchain/pull/13930) with
improvements to flexibility allowing different models and notebook to
use ONNX runtime for faster speed. Since the last PR, [our
model](https://huggingface.co/laiyer/deberta-v3-base-prompt-injection)
got more than 660k downloads, and with the [public
benchmark](https://huggingface.co/spaces/laiyer/prompt-injection-benchmark)
showed much fewer false-positives than the previous one from deepset.
Additionally, on the ONNX runtime, it can be running 3x faster on the
CPU, which might be handy for builders using Langchain.
**Issue:** N/A
- **Dependencies:** N/A
- **Tag maintainer:** N/A
- **Twitter handle:** `@laiyer_ai`
Fixing issue - https://github.com/langchain-ai/langchain/issues/14494 to
avoid Kendra query ValidationException
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** Update kendra.py to avoid Kendra query
ValidationException,
- **Issue:** the issue
#https://github.com/langchain-ai/langchain/issues/14494,
- **Dependencies:** None,
- **Tag maintainer:** ,
- **Twitter handle:**
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:**
- Add a break case to `text_splitter.py::split_text_on_tokens()` to
avoid unwanted item at the end of result.
- Add a testcase to enforce the behavior.
- **Issue:**
- #14649
- #5897
- **Dependencies:** n/a,
---
**Quick illustration of change:**
```
text = "foo bar baz 123"
tokenizer = Tokenizer(
chunk_overlap=3,
tokens_per_chunk=7
)
output = split_text_on_tokens(text=text, tokenizer=tokenizer)
```
output before change: `["foo bar", "bar baz", "baz 123", "123"]`
output after change: `["foo bar", "bar baz", "baz 123"]`
This is technically a breaking change because it'll switch out default
models from `text-davinci-003` to `gpt-3.5-turbo-instruct`, but OpenAI
is shutting off those endpoints on 1/4 anyways.
Feels less disruptive to switch out the default instead.
Gpt-3.5 sometimes calls with empty string arguments instead of `{}`
I'd assume it's because the typescript representation on their backend
makes it a bit ambiguous.
- **Description:** VertexAIEmbeddings performance improvements
- **Twitter handle:** @vladkol
## Improvements
- Dynamic batch size, starting from 250, lowering down to 5. Batch size
varies across regions.
Some regions support larger batches, and it significantly improves
performance.
When running large batches of texts in `us-central1`, performance gain
can be up to 3.5x.
The dynamic batching also makes sure every batch is below 20K token
limit.
- New model parameter `embeddings_type` that translates to `task_type`
parameter of the API. Newer model versions support [different embeddings
task
types](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings#api_changes_to_models_released_on_or_after_august_2023).
Now that it's supported again for OAI chat models .
Shame this wouldn't include it in the `.invoke()` output though (it's
not included in the message itself). Would need to do a follow-up for
that to be the case
Fixed:
- `_agenerate` return value in the YandexGPT Chat Model
- duplicate line in the documentation
Co-authored-by: Dmitry Tyumentsev <dmitry.tyumentsev@raftds.com>
Builds out a developer documentation section in the docs
- Links it from contributing.md
- Adds an initial guide on how to contribute an integration
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Adds the option for `similarity_score_threshold` when using
`MongoDBAtlasVectorSearch` as a vector store retriever.
Example use:
```
vector_search = MongoDBAtlasVectorSearch.from_documents(...)
qa_retriever = vector_search.as_retriever(
search_type="similarity_score_threshold",
search_kwargs={
"score_threshold": 0.5,
}
)
qa = RetrievalQA.from_chain_type(
llm=OpenAI(),
chain_type="stuff",
retriever=qa_retriever,
)
docs = qa({"query": "..."})
```
I've tested this feature locally, using a MongoDB Atlas Cluster with a
vector search index.
… (#14723)
- **Description:** Minor updates per marketing requests. Namely, name
decisions (AI Foundation Models / AI Playground)
- **Tag maintainer:** @hinthornw
Do want to pass around the PR for a bit and ask a few more marketing
questions before merge, but just want to make sure I'm not working in a
vacuum. No major changes to code functionality intended; the PR should
be for documentation and only minor tweaks.
Note: QA model is a bit borked across staging/prod right now. Relevant
teams have been informed and are looking into it, and I'm placeholdered
the response to that of a working version in the notebook.
Co-authored-by: Vadim Kudlay <32310964+VKudlay@users.noreply.github.com>
Replace this entire comment with:
- **Description:** added support for new Google GenerativeAI models
- **Twitter handle:** lkuligin
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Description: Added NVIDIA AI Playground Initial support for a selection of models (Llama models, Mistral, etc.)
Dependencies: These models do depend on the AI Playground services in NVIDIA NGC. API keys with a significant amount of trial compute are available (10K queries as of the time of writing).
H/t to @VKudlay
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Co-authored-by: fangkeke <3339698829@qq.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Add a new ChatGoogleGenerativeAI class in a `langchain-google-genai`
package.
Still todo: add a deprecation warning in PALM
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Leonid Kuligin <lkuligin@yandex.ru>
Co-authored-by: Bagatur <baskaryan@gmail.com>
h/t to @lkuligin
- **Description:** added new models on VertexAI
- **Twitter handle:** @lkuligin
---------
Co-authored-by: Leonid Kuligin <lkuligin@yandex.ru>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
This PR adds an example notebook for the Databricks Vector Search vector
store. It also adds an introduction to the Databricks Vector Search
product on the Databricks's provider page.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
When using local Chatglm2-6B by changing OPENAI_BASE_URL to localhost,
the token_usage in ChatOpenAI becomes None. This leads to an
AttributeError when trying to access token_usage.items().
This commit adds a check to ensure token_usage is not None before
accessing its items. This change prevents the AttributeError and allows
ChatOpenAI to work seamlessly with a local Chatglm2-6B model, aligning
with the way it operates with the OpenAI API.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
**Description:** This PR fixes `HuggingFaceHubEmbeddings` by making the
API token optional (as in the client beneath). Most models don't require
one. I also updated the notebook for TEI (text-embeddings-inference)
accordingly as requested here #14288. In addition, I fixed a mistake in
the POST call parameters.
**Tag maintainers:** @baskaryan
## Description
New YAML output parser as a drop-in replacement for the Pydantic output
parser. Yaml is a much more token-efficient format than JSON, proving to
be **~35% faster and using the same percentage fewer completion
tokens**.
☑️ Formatted
☑️ Linted
☑️ Tested (analogous to the existing`test_pydantic_parser.py`)
The YAML parser excels in situations where a list of objects is
required, where the root object needs no key:
```python
class Products(BaseModel):
__root__: list[Product]
```
I ran the prompt `Generate 10 healthy, organic products` 10 times on one
chain using the `PydanticOutputParser`, the other one using
the`YamlOutputParser` with `Products` (see below) being the targeted
model to be created.
LLMs used were Fireworks' `lama-v2-34b-code-instruct` and OpenAI
`gpt-3.5-turbo`. All runs succeeded without validation errors.
```python
class Nutrition(BaseModel):
sugar: int = Field(description="Sugar in grams")
fat: float = Field(description="% of daily fat intake")
class Product(BaseModel):
name: str = Field(description="Product name")
stats: Nutrition
class Products(BaseModel):
"""A list of products"""
products: list[Product] # Used `__root__` for the yaml chain
```
Stats after 10 runs reach were as follows:
### JSON
ø time: 7.75s
ø tokens: 380.8
### YAML
ø time: 5.12s
ø tokens: 242.2
Looking forward to feedback, tips and contributions!
- **Description:** There is a bug in RedisNum filter that filter towards
value 0 will be parsed as "*". This is a fix to it.
- **Issue:** NA
- **Dependencies:** NA
- **Tag maintainer:** NA
- **Twitter handle:** NA
This PR updates RunnableWithMessage history to support user specific
configuration for the factory.
It extends support to passing multiple named arguments into the factory
if the factory takes more than a single argument.