Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
DefusedXML is causing parsing errors on previously functional code with
the 0.7.x versions. These do not seem to support newer version of python
well. 0.8.x has only been released as rc, so we're not going to to use
it in the core package
* Adds support for `additional_kwargs` in `get_cohere_chat_request`
* This functionality passes in Cohere SDK specific parameters from
`BaseMessage` based classes to the API
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [x] **Add len() implementation to Chroma**: "package: community"
- [x] **PR message**:
- **Description:** add an implementation of the __len__() method for the
Chroma vectostore, for convenience.
- **Issue:** no exposed method to know the size of a Chroma vectorstore
- **Dependencies:** None
- **Twitter handle:** lowrank_adrian
- [x] **Add tests and docs**
- [x] **Lint and test**
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** Be more explicit with the `model_kwargs` and
`encode_kwargs` for `HuggingFaceEmbeddings`.
- **Issue:** -
- **Dependencies:** -
I received some reports by my users that they didn't realise that you
could change the default `batch_size` with `HuggingFaceEmbeddings`,
which may be attributed to how the `model_kwargs` and `encode_kwargs`
don't give much information about what you can specify.
I've added some parameter names & links to the Sentence Transformers
documentation to help clear it up. Let me know if you'd rather have
Markdown/Sphinx-style hyperlinks rather than a "bare URL".
- Tom Aarsen
So this arose from the
https://github.com/langchain-ai/langchain/pull/18397 problem of document
loaders not supporting `pathlib.Path`.
This pull request provides more uniform support for Path as an argument.
The core ideas for this upgrade:
- if there is a local file path used as an argument, it should be
supported as `pathlib.Path`
- if there are some external calls that may or may not support Pathlib,
the argument is immidiately converted to `str`
- if there `self.file_path` is used in a way that it allows for it to
stay pathlib without conversion, is is only converted for the metadata.
Twitter handle: https://twitter.com/mwmajewsk
### Issue
Recently, the new `allow_dangerous_deserialization` flag was introduced
for preventing unsafe model deserialization that relies on pickle
without user's notice (#18696). Since then some LLMs like Databricks
requires passing in this flag with true to instantiate the model.
However, this breaks existing functionality to loading such LLMs within
a chain using `load_chain` method, because the underlying loader
function
[load_llm_from_config](f96dd57501/libs/langchain/langchain/chains/loading.py (L40))
(and load_llm) ignores keyword arguments passed in.
### Solution
This PR fixes this issue by propagating the
`allow_dangerous_deserialization` argument to the class loader iff the
LLM class has that field.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Create a Class which allows to use the "text2vec" open source embedding
model.
It should install the model by running 'pip install -U text2vec'.
Example to call the model through LangChain:
from langchain_community.embeddings.text2vec import Text2vecEmbeddings
embedding = Text2vecEmbeddings()
bookend.embed_documents([
"This is a CoSENT(Cosine Sentence) model.",
"It maps sentences to a 768 dimensional dense vector space.",
])
bookend.embed_query(
"It can be used for text matching or semantic search."
)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
## Description
This PR proposes a modification to the `libs/langchain/dev.Dockerfile`
configuration to copy the `libs/langchain/poetry.lock` into the working
directory. The change aims to address the issue where the Poetry install
command, the last command in the `dev.Dockerfile`, takes excessively
long hours, and to ensure the reproducibility of the poetry environment
in the devcontainer.
## Problem
The `dev.Dockerfile`, prepared for development environments such as
`.devcontainer`, encounters an unending dependency resolution when
attempting the Poetry installation.
### Steps to Reproduce
Execute the following build command:
```bash
docker build -f libs/langchain/dev.Dockerfile .
```
### Current Behavior
The Docker build process gets stuck at the following step, which, in my
experience, did not conclude even after an entire night:
```
=> [langchain-dev-dependencies 4/6] COPY libs/community/ ../community/ 0.9s
=> [langchain-dev-dependencies 5/6] COPY libs/text-splitters/ ../text-splitters/ 0.0s
=> [langchain-dev-dependencies 6/6] RUN poetry install --no-interaction --no-ansi --with dev,test,docs 12.3s
=> => # Updating dependencies
=> => # Resolving dependencies...
```
### Expected Behavior
The Docker build completes in a realistic timeframe. By applying this
PR, the build finishes within a few minutes.
### Analysis
The complexity of LangChain's dependencies has reached a point where
Poetry is required to resolve dependencies akin to threading a needle.
Consequently, poetry install fails to complete in a practical timeframe.
## Solution
The solution for dependency resolution is already recorded in
`libs/langchain/poetry.lock`, so we can use it. When copying
`project.toml` and `poetry.toml`, the `poetry.lock` located in the same
directory should also be copied.
```diff
# Copy only the dependency files for installation
-COPY libs/langchain/pyproject.toml libs/langchain/poetry.toml ./
+COPY libs/langchain/pyproject.toml libs/langchain/poetry.toml libs/langchain/poetry.lock ./
```
## Note
I am not intimately familiar with the historical context of the
`dev.Dockerfile` and thus do not know why `poetry.lock` has not been
copied until now. It might have been an oversight, or perhaps dependency
resolution used to complete quickly even without the `poetry.lock` file
in the past. However, if there are deliberate reasons why copying
`poetry.lock` is not advisable, please just close this PR.
Description:
this change fixes the pydantic validation error when looking up from
GPTCache, the `ChatOpenAI` class returns `ChatGeneration` as response
which is not handled.
use the existing `_loads_generations` and `_dumps_generations` functions
to handle it
Trace
```
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/development/scripts/chatbot-postgres-test.py", line 90, in <module>
print(llm.invoke("tell me a joke"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 166, in invoke
self.generate_prompt(
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 544, in generate_prompt
return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 408, in generate
raise e
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 398, in generate
self._generate_with_cache(
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 585, in _generate_with_cache
cache_val = llm_cache.lookup(prompt, llm_string)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_community/cache.py", line 807, in lookup
return [
^
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_community/cache.py", line 808, in <listcomp>
Generation(**generation_dict) for generation_dict in json.loads(res)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/load/serializable.py", line 120, in __init__
super().__init__(**kwargs)
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/pydantic/v1/main.py", line 341, in __init__
raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for Generation
type
unexpected value; permitted: 'Generation' (type=value_error.const; given=ChatGeneration; permitted=('Generation',))
```
Although I don't seem to find any issues here, here's an
[issue](https://github.com/zilliztech/GPTCache/issues/585) raised in
GPTCache. Please let me know if I need to do anything else
Thank you
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Few-Shot prompt template may use a `SemanticSimilarityExampleSelector`
that in turn uses a `VectorStore` that does I/O operations.
So to work correctly on the event loop, we need:
* async methods for the `VectorStore` (OK)
* async methods for the `SemanticSimilarityExampleSelector` (this PR)
* async methods for `BasePromptTemplate` and `BaseChatPromptTemplate`
(future work)
This is a small breaking change but I think it should be done as:
* No external dependency needs to be installed anymore for the default
to work
* It is vendor-neutral
Fixing some issues for AzureCosmosDBSemanticCache
- Added the entry for "AzureCosmosDBSemanticCache" which was missing in
langchain/cache.py
- Added application name when creating the MongoClient for the
AzureCosmosDBVectorSearch, for tracking purposes.
@baskaryan, can you please review this PR, we need this to go in asap.
These are just small fixes which we found today in our testing.
- **Description:** The `semantic_hybrid_search_with_score_and_rerank`
method of `AzureSearch` contains a hardcoded field name "metadata" for
the document metadata in the Azure AI Search Index. Adding such a field
is optional when creating an Azure AI Search Index, as other snippets
from `AzureSearch` test for the existence of this field before trying to
access it. Furthermore, the metadata field name shouldn't be hardcoded
as "metadata" and use the `FIELDS_METADATA` variable that defines this
field name instead. In the current implementation, any index without a
metadata field named "metadata" will yield an error if a semantic answer
is returned by the search in
`semantic_hybrid_search_with_score_and_rerank`.
- **Issue:** https://github.com/langchain-ai/langchain/issues/18731
- **Prior fix to this bug:** This bug was fixed in this PR
https://github.com/langchain-ai/langchain/pull/15642 by adding a check
for the existence of the metadata field named `FIELDS_METADATA` and
retrieving a value for the key called "key" in that metadata if it
exists. If the field named `FIELDS_METADATA` was not present, an empty
string was returned. This fix was removed in this PR
https://github.com/langchain-ai/langchain/pull/15659 (see
ed1ffca911#).
@lz-chen: could you confirm this wasn't intentional?
- **New fix to this bug:** I believe there was an oversight in the logic
of the fix from
[#1564](https://github.com/langchain-ai/langchain/pull/15642) which I
explain below.
The `semantic_hybrid_search_with_score_and_rerank` method creates a
dictionary `semantic_answers_dict` with semantic answers returned by the
search as follows.
5c2f7e6b2b/libs/community/langchain_community/vectorstores/azuresearch.py (L574-L581)
The keys in this dictionary are the unique document ids in the index, if
I understand the [documentation of semantic
answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers)
in Azure AI Search correctly. When the method transforms a search result
into a `Document` object, an "answer" key is added to the document's
metadata. The value for this "answer" key should be the semantic answer
returned by the search from this document, if such an answer is
returned. The match between a `Document` object and the semantic answers
returned by the search should be done through the unique document id,
which is used as a key for the `semantic_answers_dict` dictionary. This
id is defined in the search result's field named `FIELDS_ID`. I added a
check to avoid any error in case no field named `FIELDS_ID` exists in a
search result (which shouldn't happen in theory).
A benefit of this approach is that this fix should work whether or not
the Azure AI Search Index contains a metadata field.
@levalencia could you confirm my analysis and test the fix?
@raunakshrivastava7 do you agree with the fix?
Thanks for the help!
Description: Added support for AI21 Labs model - Segmentation, as a Text
Splitter
Dependencies: ai21, langchain-text-splitter
Twitter handle: https://github.com/AI21Labs
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
### Prem SDK integration in LangChain
This PR adds the integration with [PremAI's](https://www.premai.io/)
prem-sdk with langchain. User can now access to deployed models
(llms/embeddings) and use it with langchain's ecosystem. This PR adds
the following:
### This PR adds the following:
- [x] Add chat support
- [X] Adding embedding support
- [X] writing integration tests
- [X] writing tests for chat
- [X] writing tests for embedding
- [X] writing unit tests
- [X] writing tests for chat
- [X] writing tests for embedding
- [X] Adding documentation
- [X] writing documentation for chat
- [X] writing documentation for embedding
- [X] run `make test`
- [X] run `make lint`, `make lint_diff`
- [X] Final checks (spell check, lint, format and overall testing)
---------
Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** PgVector class always runs "create extension" on init
and this statement crashes on ReadOnly databases (read only replicas).
but wierdly the next create collection etc work even in readOnly
databases
- **Dependencies:** no new dependencies
- **Twitter handle:** @VenOmaX666
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Thank you for contributing to LangChain!
When run command langchain app new my-app, i get this error:
File
"/home/mauricio/.local/lib/python3.8/site-packages/langchain_cli/utils/pyproject.py",
line 15, in <module>
pyproject_toml: Path, local_editable_dependencies: Iterable[tuple[str,
Path]]
TypeError: 'type' object is not subscriptable
This PR fix the error.
**Description:**
- minor PR to speed up onboarding by not trying to add a dataset, if a
model is already present.
- replace batch publish API with streaming when single events are
published.
**Dependencies:** any dependencies required for this change
**Twitter handle:** behalder
Co-authored-by: Barun Halder <barun@fiddler.ai>
**Description:**
Expanding version in all the Confluence API calls so to get when the
page was last modified/created in all cases.
**Issue:** #12812
**Twitter handle:** zzste
This PR adds code to make sure that the correct base URL is being
created for the Azure Cognitive Search retriever. At the moment an
incorrect base URL is being generated. I think this is happening because
the original code was based on a depreciated API version. No
dependencies need to be added. I've also added more context to the test
doc strings.
I should also note that ACS is now Azure AI Search. I will open a
separate PR to make these changes as that would be a breaking change and
should potentially be discussed.
Twitter: @marlene_zw
- No new tests added, however the current ACS retriever tests are now
passing when I run them.
- Code was linted.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** This commit introduces support for the newly
available GPU index types introduced in Milvus 2.4 within the LangChain
project's `milvus.py`. With the release of Milvus 2.4, a range of
GPU-accelerated index types have been added, offering enhanced search
capabilities and performance optimizations for vector search operations.
This update ensures LangChain users can fully utilize the new
performance benefits for vector search operations.
- Reference: https://milvus.io/docs/gpu_index.md
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
This patch fixes the #18022 issue, converting the SimSIMD internal
zero-copy outputs to NumPy.
I've also noticed, that oftentimes `dtype=np.float32` conversion is used
before passing to SimSIMD. Which numeric types do LangChain users
generally care about? We support `float64`, `float32`, `float16`, and
`int8` for cosine distances and `float16` seems reasonable for
practically any kind of embeddings and any modern piece of hardware, so
we can change that part as well 🤗
- **Description:** Added support for lower-case and mixed-case names
The names for tables and columns previouly had to be UPPER_CASE.
With this enhancement, also lower_case and MixedCase are supported,
- **Issue:** N/A
- **Dependencies:** no new dependecies added
- **Twitter handle:** @sapopensource
Previous PR passed _parser attribute which apparently is not meant to be
used by user code and causes non deterministic failures on CI when
testing the transform and a transform methods. Reverting this change
temporarily.
Description: adds support for langchain_cohere
---------
Co-authored-by: Harry M <127103098+harry-cohere@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
This mitigates a security concern for users still using older versions of libexpat that causes an attacker to compromise the availability of the system if an attacker manages to surface malicious payload to this XMLParser.
**Description:** This change passes through `batch_size` to
`add_documents()`/`aadd_documents()` on calls to `index()` and
`aindex()` such that the documents are processed in the expected batch
size.
**Issue:** #19415
**Dependencies:** N/A
**Twitter handle:** N/A
Updated `HuggingFacePipeline` docs to be in sync with list of supported
tasks, including translation.
- [x] **PR title**: "community: Update docs for `HuggingFacePipeline`"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**:
- **Description:** Update docs for `HuggingFacePipeline`, was earlier
missing `translation` as a valid task
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** None
- [x] **Add tests and docs**:
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
**Description:**
This PR adds [Dappier](https://dappier.com/) for the chat model. It
supports generate, async generate, and batch functionalities. We added
unit and integration tests as well as a notebook with more details about
our chat model.
**Dependencies:**
No extra dependencies are needed.
DuckDB has a cosine similarity function along list and array data types,
which can be used as a vector store.
- **Description:** The latest version of DuckDB features a cosine
similarity function, which can be used with its support for list or
array column types. This PR surfaces this functionality to langchain.
- **Dependencies:** duckdb 0.10.0
- **Twitter handle:** @igocrite
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Update s3_file.py to use arguments **mode** and
**post_processors** from the base class **UnstructuredBaseLoader** to
include more metadata about the files from the S3 bucket such as
*'page_number', 'languages'* etc.
**Issue:** NA
**Dependencies:** None
**Twitter handle:** preak95
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Invoke callback prior to yielding token for BaseOpenAI
& OpenAIChat
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)
**Dependencies:** None
**Description:** Invoke callback prior to yielding token for Fireworks
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)
**Dependencies:** None
**Description:** Moving FireworksEmbeddings documentation to the
location docs/integration/text_embedding/ from langchain_fireworks/docs/
**Issue:** FireworksEmbeddings documentation was not in the correct
location
**Dependencies:** None
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
I have a small dataset, and I tried to use docarray:
``DocArrayHnswSearch ``. But when I execute, it returns:
```bash
raise ImportError(
ImportError: Could not import docarray python package. Please install it with `pip install "langchain[docarray]"`.
```
Instead of docarray it needs to be
```bash
docarray[hnswlib]
```
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
RecursiveUrlLoader does not currently provide an option to set
`base_url` other than the `url`, though it uses a function with such an
option.
For example, this causes it unable to parse the
`https://python.langchain.com/docs`, as it returns the 404 page, and
`https://python.langchain.com/docs/get_started/introduction` has no
child routes to parse.
`base_url` allows setting the `https://python.langchain.com/docs` to
filter by, while the starting URL is anything inside, that contains
relevant links to continue crawling.
I understand that for this case, the docusaurus loader could be used,
but it's a common issue with many websites.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Modified regular expression to add support for
unicode chars and simplify pattern
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Delete MistralAIEmbeddings usage document from folder
partners/mistralai/docs
**Issue:** The document is present in the folder docs/docs
**Dependencies:** None
**Description:** Invoke callback prior to yielding token for llama.cpp
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)
**Dependencies:** None
```python
from langchain.agents import tool
from langchain_mistralai import ChatMistralAI
llm = ChatMistralAI(model="mistral-large-latest", temperature=0)
@tool
def get_word_length(word: str) -> int:
"""Returns the length of a word."""
return len(word)
tools = [get_word_length]
llm_with_tools = llm.bind_tools(tools)
llm_with_tools.invoke("how long is the word chrysanthemum")
```
currently raises
```
AttributeError: 'dict' object has no attribute 'model_dump'
```
Same with `.with_structured_output`
```python
from langchain_mistralai import ChatMistralAI
from langchain_core.pydantic_v1 import BaseModel
class AnswerWithJustification(BaseModel):
"""An answer to the user question along with justification for the answer."""
answer: str
justification: str
llm = ChatMistralAI(model="mistral-large-latest", temperature=0)
structured_llm = llm.with_structured_output(AnswerWithJustification)
structured_llm.invoke("What weighs more a pound of bricks or a pound of feathers")
```
This appears to fix.
For prompt templates with only 1 variable (common in e.g.,
MessageGraph), it's convenient to wrap the incoming object in the
variable before formatting.
The downside of this, of course, would be that some number of
invocations will successfully format when the user may have intended to
format it properly before
This is a basic VectorStore implementation using an in-memory dict to
store the documents.
It doesn't need any extra/optional dependency as it uses numpy which is
already a dependency of langchain.
This is useful for quick testing, demos, examples.
Also it allows to write vendor-neutral tutorials, guides, etc...
Classes and functions defined in __init__.py are not parsed into the API
Reference.
For example:
- libs/core/langchain_core/messages/__init__.py : AnyMessage,
MessageLikeRepresentation, get_buffer_string(), messages_from_dict(),
...
Opinionated: __init__.py is not a typical place to define artifacts.
Moved artifacts from __init__ into utils.py.
Added `MessageLikeRepresentation` to __all__ since it is used outside of
`messages`, for example, in
`libs/core/langchain_core/language_models/base.py`
Added `_message_from_dict` to __all__ since it is used outside of
`messages`(???) I would add `message_from_dict` (without underscore) as
an alias. Please, advise.
Covered by tests in
`libs/core/tests/unit_tests/language_models/chat_models/test_base.py`,
`libs/core/tests/unit_tests/language_models/llms/test_base.py` and
`libs/core/tests/unit_tests/runnables/test_runnable_events.py`
**Description:**
Currently, `CacheBackedEmbeddings` computes vectors for *all* uncached
documents before updating the store. This pull request updates the
embedding computation loop to compute embeddings in batches, updating
the store after each batch.
I noticed this when I tried `CacheBackedEmbeddings` on our 30k document
set and the cache directory hadn't appeared on disk after 30 minutes.
The motivation is to minimize compute/data loss when problems occur:
* If there is a transient embedding failure (e.g. a network outage at
the embedding endpoint triggers an exception), at least the completed
vectors are written to the store instead of being discarded.
* If there is an issue with the store (e.g. no write permissions), the
condition is detected early without computing (and discarding!) all the
vectors.
**Issue:**
Implements enhancement #18026.
**Testing:**
I was unable to run unit tests; details in [this
post](https://github.com/langchain-ai/langchain/discussions/15019#discussioncomment-8576684).
---------
Signed-off-by: chrispy <chrispy@synopsys.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
## Description
Semantic Cache can retrieve noisy information if the score threshold for
the value is too low. Adding the ability to set a `score_threshold` on
cache construction can allow for less noisy scores to appear.
- [x] **Add tests and docs**
1. Added tests that confirm the `score_threshold` query is valid.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Classes and functions defined in __init__.py are not parsed into the API
Reference.
For example: libs/core/langchain_core/globals/__init__.py :
`set_verbose` `get_llm_cache`, `set_llm_cache`, ...
And the whole `langchain_core.globals` namespace is not visible in the
API Reference. The refactoring is just file renaming.
- **Description:** Enhanced the `BaseChatModel` to support an
`Optional[Union[bool, BaseCache]]` type for the `cache` attribute,
allowing for both boolean flags and custom cache implementations.
Implemented logic within chat model methods to utilize the provided
custom cache implementation effectively. This change aims to provide
more flexibility in caching strategies for chat models.
- **Issue:** Implements enhancement request #17242.
- **Dependencies:** No additional dependencies required for this change.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Changing OpenAIAssistantRunnable.create_assistant to send the `file_ids`
parameter to openai.beta.assistants.create
Co-authored-by: Frederico Wu <fred.diaswu@coxautoinc.com>
**Description**:
this PR enable VectorStore autoconfiguration for Infinispan: if
metadatas are only of basic types, protobuf
config will be automatically generated for the user.
When creating a new index, if we use a retrieval strategy that expects a
model to be deployed in Elasticsearch, check if a model with this name
is indeed deployed before creating an index. This lowers the probability
to get into a state in which an index was created with a faulty model
ID, which cannot be overwritten any more (the index has to manually be
deleted).
Add `keep_alive` parameter to control how long the model will stay
loaded into memory with Ollama。
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Issue : For functions which have an argument with the name 'title', the
convert_pydantic_to_openai_function generates an incorrect output and
omits the argument all together. This is because the _rm_titles function
removes all instances of the the key 'title' from the output.
Description : Updates the _rm_titles function to check the presence of
the 'type' key as well before removing the 'title' key. As the title key
that we wish to omit always has a type key along with it.
Potential gap if there is a function defined which has both title and
key as argument names, in which case this would fail. Maybe we could set
a filter on the function argument names and reject those with keyword
argument names.
No dependencies. Passed all tests.
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- **Description:** There was no formatter for mistral models for Azure
ML endpoints. Adding that, plus a configurable timeout (it was hard
coded before)
- **Dependencies:** none
- **Twitter handle:** @tjaffri @docugami
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
add **kwargs in add_documents for upsert, to make it use for other
argument also.
Lets use this, it was unused as of now.
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Co-authored-by: Rohit Gupta <rohit.gupta2@walmart.com>
## Description
This PR modifies the settings in `libs/langchain/dev.Dockerfile` to
ensure that the `text-splitters` directory is copied before the poetry
installation process begins.
Without this modification, the `docker build` command fails for
`dev.Dockerfile`, preventing the setup of some development environments,
including `.devcontainer`.
## Bug Details
### Repro
Run the following command:
```bash
docker build -f libs/langchain/dev.Dockerfile .
```
### Current Behavior
The docker build command fails, raising the following error:
```
...
=> [langchain-dev-dependencies 4/5] COPY libs/community/ ../community/ 0.4s
=> ERROR [langchain-dev-dependencies 5/5] RUN poetry install --no-interaction --no-ansi --with dev,test,docs 1.1s
------
> [langchain-dev-dependencies 5/5] RUN poetry install --no-interaction --no-ansi --with dev,test,docs:
#13 0.970
#13 0.970 Directory ../text-splitters does not exist
------
executor failed running [/bin/sh -c poetry install --no-interaction --no-ansi --with dev,test,docs]: exit code: 1
```
### Expected Behavior
The `docker build` command successfully completes without the poetry
error.
### Analysis
The error occurs because the `text-splitters` directory is not copied
into the build environment, unlike the other packages under the `libs`
directory. I suspect that the `COPY` setting was overlooked since
`text-splitters` was separated in a recent PR.
## Fix
Add the following lines to the `libs/langchain/dev.Dockerfile`:
```dockerfile
# Copy the text-splitters library for installation
COPY libs/text-splitters/ ../text-splitters/
```
- **Description:** Tests fail to do value lookup because it does not
specify the index name
- **Issue:** the issue # Failing integration test
- [x] **Add tests and docs**: Tests now pass
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
The root run id (~trace id's) is useful for assigning feedback, but the
current recommended approach is to use callbacks to retrieve it, which
has some drawbacks:
1. Doesn't work for streaming until after the first event
2. Doesn't let you call other endpoints with the same trace ID in
parallel (since you have to wait until the call is completed/started to
use
This PR lets you provide = "run_id" in the runnable config.
Couple considerations:
1. For batch calls, we split the trace up into separate trees (to permit
better rendering). We keep the provided run ID for the first one and
generate a unique one for other elements of the batch.
2. For nested calls, the provided ID is ONLY used on the top root/trace.
### Example Usage
```
chain.invoke("foo", {"run_id": uuid.uuid4()})
```
Classes are missed in __all__ and in different places of __init__.py
- BaichuanLLM
- ChatDatabricks
- ChatMlflow
- Llamafile
- Mlflow
- Together
Added classes to __all__. I also sorted __all__ list.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [x] **PR title**: "community: deprecate DocugamiLoader"
- [x] **PR message**: Deprecate the langchain_community and use the
docugami_langchain DocugamiLoader
---------
Co-authored-by: Kenzie Mihardja <kenzie28@cs.washington.edu>
## Description
* In memory cache easily gets out of sync with the server cache, so we
will remove it entirely to reduce the issues around invalidated caches.
## Dependencies
None
- [x] If you're adding a new integration, please include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Co-authored-by: Erick Friis <erick@langchain.dev>
## Description
Returning the embedding is not necessary in the vector search
functionality unless specified as a debugging step. This change defaults
the behavior such that the server _only_ returns the embedding key if
explicitly requested, such as in the case of
`max_marginal_relevance_search`.
- [x] **Add tests and docs**: If you're adding a new integration, please
include
* Added `test_from_documents_no_embedding_return`
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- Description:
- Updated the import path for `StreamingStdOutCallbackHandler` in the
streaming response example within `huggingface_endpoint.py`. This change
corrects the import statement to reflect the actual location of
`StreamingStdOutCallbackHandler` in
`langchain_core.callbacks.streaming_stdout`.
- Issue:
- None
- Dependencies:
- No additional dependencies are required for this change.
- Twitter handle:
- None
## Note:
I have tested this change locally and confirmed that the
`StreamingStdOutCallbackHandler` works as expected with the updated
import path. This PR does not require the addition of new tests since it
is a correction to documentation/examples rather than functional code.
- [x] **Support for translation**: "community: Add support for
translation in `HuggingFacePipeline`"
- [x] **Add support for translation in `HuggingFacePipeline`**:
- **Description:** Add support for translation in `HuggingFacePipeline`,
which earlier used to support only text summarization and generation.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** None
- Description:
- This pull request is to fix a bug where page numbers were not set
correctly. In the current code, all chunks share the same metadata
object doc_metadata, so the page number is set with the same value for
all documents. To fix this, I changed to using separate metadata objects
for each chunk.
- Issue:
- None
- Dependencies:
- No additional dependencies are required for this change.
- Twitter handle:
- @eycjur
- Test
- Even if it's not a bug, there are cases where everything ends up with
the same number of pages, so it's very difficult for me to write
integration tests.
**Description:**
#18040 forces `fastembed>2.0`, and this causes dependency conflicts with
the new `unstructured` package (different `onnxruntime`). There may be
other dependency conflicts.. The only way to use
`langchain-community>=0.0.28` is rollback to `unstructured 0.10.X`. But
new `unstructured` contains many fixes.
This PR allows to use both `fastembed` `v1` and `v2`.
How to reproduce:
`pyproject.toml`:
```toml
[tool.poetry]
name = "depstest"
version = "0.0.0"
description = "test"
authors = ["<dev@example.org>"]
[tool.poetry.dependencies]
python = ">=3.10,<3.12"
langchain-community = "^0.0.28"
fastembed = "^0.2.0"
unstructured = {extras = ["pdf"], version = "^0.12"}
```
```bash
$ poetry lock
```
Co-authored-by: Sergey Kozlov <sergey.kozlov@ludditelabs.io>
- **Description:** This modification addresses the issue of mutable
default parameters in functions. In the original code, the `chunks`
parameter is defaulted to a list containing an empty dictionary, which
is mutable. Since default parameters in Python are evaluated only once
at function definition time, modifications to the parameter would
persist across future calls. By changing the default to `None` and
checking/initializing within the function, a new list is created for
each call, thus avoiding potential issues.
---------
Co-authored-by: sixiang <sixiang@lixiang.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Update docstring of Together class to show example and
update API URL
**Issue:** Improves usability
**Dependencies:** None
**Lint and test**: `make format`, `make lint` and `make test` were run
- Description: [a description of the change] Add in code documentation
to core Runnable with_fallbacks method (docs only)
- Issue: the issue #18804
@eyurtsev PTAL
**Description:** Many LLM steps complete in sub-second duration, which
can lead to non-collection of duration field for Fiddler. This PR
updates duration from seconds to milliseconds.
**Issue:** [INTERNAL] FDL-17568
**Dependencies:** NA
**Twitter handle:** behalder
Co-authored-by: Barun Halder <barun@fiddler.ai>
- **Description:** Handling fallbacks when calling async streaming for a
LLM that doesn't support it.
- **Issue:** #18920
- **Twitter handle:**@maximeperrin_
---------
Co-authored-by: Maxime Perrin <mperrin@doing.fr>
**Description:** This PR adds updates the fiddler events schema to also
pass user feedback, and llm status to fiddler
**Tickets:** [INTERNAL] FDL-17559
**Dependencies:** NA
**Twitter handle:** behalder
Co-authored-by: Barun Halder <barun@fiddler.ai>
- **Description:** This modification adds pydantic input definition for
sql_database tools. This helps for function calling capability in
LangGraph. Since actions nodes will usually check for the args_schema
attribute on tools, This update should make these tools compatible with
it (only implemented on the InfoSQLDatabaseTool)
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** juanfe8881
poetry can't reliably handle resolving the number of optional "extended
test" dependencies we have. If we instead just rely on pip to install
extended test deps in CI, this isn't an issue.
Description: Added support for batching when using AI21 Embeddings model
Twitter handle: https://github.com/AI21Labs
---------
Co-authored-by: Asaf Gardin <asafg@ai21.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** add async tests, add tokenize support
- **Dependencies:**
[ibm-watsonx-ai](https://pypi.org/project/ibm-watsonx-ai/),
- **Tag maintainer:**
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally -> ✅
Please make sure integration_tests passing locally -> ✅
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR makes the following updates in the pgvector database:
1. Use JSONB field for metadata instead of JSON
2. Update operator syntax to include required `$` prefix before the
operators (otherwise there will be name collisions with fields)
3. The change is non-breaking, old functionality is still the default,
but it will emit a deprecation warning
4. Previous functionality has bugs associated with comparisons due to
casting to text (so lexical ordering is used incorrectly for numeric
fields)
5. Adds an a GIN index on the JSONB field for more efficient querying
**PR message**: ***Delete this entire checklist*** and replace with
- **Description:** [a description of the change](docs: Add in code
documentation to core Runnable assign method)
- **Issue:** the issue #18804
- **Description:** When calling the `_stream_log_implementation` from
the `astream_log` method in the `Runnable` class, it is not handing over
the `kwargs` argument. Therefore, even if i want to customize APIHandler
and implement additional features with additional arguments, it is not
possible. Conversely, the `astream_events` method normally handing over
the `kwargs` argument.
- **Issue:** https://github.com/langchain-ai/langchain/issues/19054
- **Dependencies:**
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
Co-authored-by: hyungwookyang <hyungwookyang@worksmobile.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- **Description:** This change fixes a bug where attempts to load data
from Notion using the NotionDBLoader resulted in a 400 Bad Request
error. The issue was traced to the unconditional addition of an empty
'filter' object in the request payload, which Notion's API does not
accept. The modification ensures that the 'filter' object is only
included in the payload when it is explicitly provided and not empty,
thus preventing the 400 error from occurring.
- **Issue:** Fixes
[#18009](https://github.com/langchain-ai/langchain/issues/18009)
- **Dependencies:** None
- **Twitter handle:** @gunnzolder
Co-authored-by: Anton Parkhomenko <anton@merge.rocks>
This PR adds `batch as completed` method to the standard Runnable
interface. It takes in a list of inputs and yields the corresponding
outputs as the inputs are completed.
Add documentation notebook for `ElasticsearchRetriever`.
## Dependencies
- [ ] Release new `langchain-elasticsearch` version 0.2.0 that includes
`ElasticsearchRetriever`
**Description:** Circular dependencies when parsing references leading
to `RecursionError: maximum recursion depth exceeded` issue. This PR
address the issue by handling previously seen refs as in any typical DFS
to avoid infinite depths.
**Issue:** https://github.com/langchain-ai/langchain/issues/12163
**Twitter handle:** https://twitter.com/theBhulawat
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Refactor code of FAISS vectorcstore and update the
related documentation.
Details:
- replace `.format()` with f-strings for strings formatting;
- refactor definition of a filtering function to make code more readable
and more flexible;
- slightly improve efficiency of
`max_marginal_relevance_search_with_score_by_vector` method by removing
unnecessary looping over the same elements;
- slightly improve efficiency of `delete` method by using set data
structure for checking if the element was already deleted;
**Issue:** fix small inconsistency in the documentation (the old example
was incorrect and unappliable to faiss vectorstore)
**Dependencies:** basic langchain-community dependencies and `faiss`
(for CPU or for GPU)
**Twitter handle:** antonenkodev
Issue : _call method of LLMRouterChain uses predict_and_parse, which is
slated for deprecation.
Description : Instead of using predict_and_parse, this replaces it with
individual predict and parse functions.
**Description:** Update the docstring of OpenAI, OpenAIEmbeddings and
ChatOpenAI classes
**Issue:** Update import module paths to the current LangChain API
**Dependencies:** None
**Lint and test**: `make format` and `make lint` were run
This incorporates the review comments from langchain-ai/langchain#18637
which I closed due to an issue I had in updating that pr branch
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:** Update AnthropicLLM deprecation message import path for
ChatAnthropic
**Issue:** Incorrect import path in deprecation message
**Dependencies:** None
**Lint and test**: `make format`, `make lint` and `make test` were run
This PR updates the on_tool_end handlers to return the raw output from the tool instead of casting it to a string.
This is technically a breaking change, though it's impact is expected to be somewhat minimal. It will fix behavior in `astream_events` as well.
Fixes the following issue #18760 raised by @eyurtsev
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
BasePDFLoader doesn't parse the suffix of the file correctly when
parsing S3 presigned urls. This fix enables the proper detection and
parsing of S3 presigned URLs to prevent errors such as `OSError: [Errno
36] File name too long`.
No additional dependencies required.
Deduplicate documents using MD5 of the page_content. Also allows for
custom deduplication with graph ingestion method by providing metadata
id attribute
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** Adding an optional parameter `linearization_config`
to the `AmazonTextractPDFLoader` so the caller can define how the output
will be linearized, instead of forcing a predefined set of linearization
configs. It will still have a default configuration as this will be an
optional parameter.
- **Issue:** #17457
- **Dependencies:** The same ones that already exist for
`AmazonTextractPDFLoader`
- **Twitter handle:** [@lvieirajr19](https://twitter.com/lvieirajr19)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
*Description**: My previous
[PR](https://github.com/langchain-ai/langchain/pull/18521) was
mistakenly closed, so I am reopening this one. Context: AWS released two
Mistral models on Bedrock last Friday (March 1, 2024). This PR includes
some code adjustments to ensure their compatibility with the Bedrock
class.
---------
Co-authored-by: Anis ZAKARI <anis.zakari@hymaia.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Update azuresearch vectorstore from_texts() method to
include fields argument, necessary for creating an Azure AI Search index
with custom fields.
- **Issue:** Currently index fields are fixed to default fields if Azure
Search index is created using from_texts() method
- **Dependencies:** None
- **Twitter handle:** None
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Small improvement to the openapi prompt.
The agent was not finding the server base URL (looping through all
nodes). This small change narrows the search and enables finding the url
faster.
No dependency
Twitter : @al1pra
# Proper example for AzureOpenAI usage in error message
The original error message is wrong in part of a usage example it gives.
Corrected to the right one.
Co-authored-by: Dzmitry Kankalovich <dzmitry_kankalovich@epam.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** `S3DirectoryLoader` is failing if prefix is a folder
(ex: `my_folder/`) because `S3FileLoader` will try to load that folder
and will fail. This PR skip nested directories so prefix can be set to
folder instead of `my_folder/files_prefix`.
- **Issue:**
- #11917
- #6535
- #4326
- **Dependencies:** none
- **Twitter handle:** @Falydoor
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
- [ ] Title: Mongodb: MongoDB connection performance improvement.
- [ ] Message:
- **Description:** I made collection index_creation as optional. Index
Creation is one time process.
- **Issue:** MongoDBChatMessageHistory class object is attempting to
create an index during connection, causing each request to take longer
than usual. This should be optional with a parameter.
- **Dependencies:** N/A
- **Branch to be checked:** origin/mongo_index_creation
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Add embedding instruction to
HuggingFaceBgeEmbeddings, so that it can be compatible with nomic and
other models that need embedding instruction.
---------
Co-authored-by: Tao Wu <tao.wu@rwth-aachen.de>
Co-authored-by: Bagatur <baskaryan@gmail.com>
_generate() and _agenerate() both accept **kwargs, then pass them on to
_format_output; but _format_output doesn't accept **kwargs. Attempting
to pass, e.g.,
timeout=50
to _generate (or invoke()) results in a TypeError.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
## Add Passio Nutrition AI Food Search Tool to Community Package
### Description
We propose adding a new tool to the `community` package, enabling
integration with Passio Nutrition AI for food search functionality. This
tool will provide a simple interface for retrieving nutrition facts
through the Passio Nutrition AI API, simplifying user access to
nutrition data based on food search queries.
### Implementation Details
- **Class Structure:** Implement `NutritionAI`, extending `BaseTool`. It
includes an `_run` method that accepts a query string and, optionally, a
`CallbackManagerForToolRun`.
- **API Integration:** Use `NutritionAIAPI` for the API wrapper,
encapsulating all interactions with the Passio Nutrition AI and
providing a clean API interface.
- **Error Handling:** Implement comprehensive error handling for API
request failures.
### Expected Outcome
- **User Benefits:** Enable easy querying of nutrition facts from Passio
Nutrition AI, enhancing the utility of the `langchain_community` package
for nutrition-related projects.
- **Functionality:** Provide a straightforward method for integrating
nutrition information retrieval into users' applications.
### Dependencies
- `langchain_core` for base tooling support
- `pydantic` for data validation and settings management
- Consider `requests` or another HTTP client library if not covered by
`NutritionAIAPI`.
### Tests and Documentation
- **Unit Tests:** Include tests that mock network interactions to ensure
tool reliability without external API dependency.
- **Documentation:** Create an example notebook in
`docs/docs/integrations/tools/passio_nutrition_ai.ipynb` showing usage,
setup, and example queries.
### Contribution Guidelines Compliance
- Adhere to the project's linting and formatting standards (`make
format`, `make lint`, `make test`).
- Ensure compliance with LangChain's contribution guidelines,
particularly around dependency management and package modifications.
### Additional Notes
- Aim for the tool to be a lightweight, focused addition, not
introducing significant new dependencies or complexity.
- Potential future enhancements could include caching for common queries
to improve performance.
### Twitter Handle
- Here is our Passio AI [twitter handle](https://twitter.com/@passio_ai)
where we announce our products.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
"community: added a feature to filter documents in Mongoloader"
- **Description:** added a feature to filter documents in Mongoloader
- **Feature:** the feature #18251
- **Dependencies:** No
- **Twitter handle:** https://twitter.com/im_Kushagra
For some DBs with lots of tables, reflection of all the tables can take
very long. So this change will make the tables be reflected lazily when
get_table_info() is called and `lazy_table_reflection` is True.
Allows all chat models that implement _stream, but not _astream to still have async streaming to work.
Amongst other things this should resolve issues with streaming community model implementations through langserve since langserve is exclusively async.
**Description:** Replacing the deprecated predict() and apredict()
methods in the unit tests
**Issue:** Not applicable
**Dependencies:** None
**Lint and test**: `make format`, `make lint` and `make test` have been
run
Make `ElasticsearchRetriever` available as top-level import.
The `langchain` package depends on `langchain-community` so we do not
need to depend on it explicitly.
## Description
- Add [Friendli](https://friendli.ai/) integration for `Friendli` LLM
and `ChatFriendli` chat model.
- Unit tests and integration tests corresponding to this change are
added.
- Documentations corresponding to this change are added.
## Dependencies
- Optional dependency
[`friendli-client`](https://pypi.org/project/friendli-client/) package
is added only for those who use `Frienldi` or `ChatFriendli` model.
## Twitter handle
- https://twitter.com/friendliai
This pull request introduces initial support for the TiDB vector store.
The current version is basic, laying the foundation for the vector store
integration. While this implementation provides the essential features,
we plan to expand and improve the TiDB vector store support with
additional enhancements in future updates.
Upcoming Enhancements:
* Support for Vector Index Creation: To enhance the efficiency and
performance of the vector store.
* Support for max marginal relevance search.
* Customized Table Structure Support: Recognizing the need for
flexibility, we plan for more tailored and efficient data store
solutions.
Simple use case exmaple
```python
from typing import List, Tuple
from langchain.docstore.document import Document
from langchain_community.vectorstores import TiDBVectorStore
from langchain_openai import OpenAIEmbeddings
db = TiDBVectorStore.from_texts(
embedding=embeddings,
texts=['Andrew like eating oranges', 'Alexandra is from England', 'Ketanji Brown Jackson is a judge'],
table_name="tidb_vector_langchain",
connection_string=tidb_connection_url,
distance_strategy="cosine",
)
query = "Can you tell me about Alexandra?"
docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query)
for doc, score in docs_with_score:
print("-" * 80)
print("Score: ", score)
print(doc.page_content)
print("-" * 80)
```
## **Description:**
MongoDB integration tests link to a provided Atlas Cluster. We have very
stringent permissions set against the cluster provided. In order to make
it easier to track and isolate the collections each test gets run
against, we've updated the collection names to map the test file name.
i.e. `langchain_{filename}` => `langchain_test_vectorstores`
Fixes integration test results
![image](https://github.com/langchain-ai/langchain/assets/2887713/41f911b9-55f7-4fe4-9134-5514b82009f9)
## **Dependencies:**
Provided MONGODB_ATLAS_URI
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
cc: @shaneharvey, @blink1073 , @NoahStapp , @caseyclements
- **Description:** Chroma use uuid4 instead of uuid1 as random ids. Use
uuid1 may leak mac address, changing to uuid4 will not cause other
effects.
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None
Fixes#18513.
## Description
This PR attempts to fix the support for Anthropic Claude v3 models in
BedrockChat LLM. The changes here has updated the payload to use the
`messages` format instead of the formatted text prompt for all models;
`messages` API is backwards compatible with all models in Anthropic, so
this should not break the experience for any models.
## Notes
The PR in the current form does not support the v3 models for the
non-chat Bedrock LLM. This means, that with these changes, users won't
be able to able to use the v3 models with the Bedrock LLM. I can open a
separate PR to tackle this use-case, the intent here was to get this out
quickly, so users can start using and test the chat LLM. The Bedrock LLM
classes have also grown complex with a lot of conditions to support
various providers and models, and is ripe for a refactor to make future
changes more palatable. This refactor is likely to take longer, and
requires more thorough testing from the community. Credit to PRs
[18579](https://github.com/langchain-ai/langchain/pull/18579) and
[18548](https://github.com/langchain-ai/langchain/pull/18548) for some
of the code here.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
This integrates Infinispan as a vectorstore.
Infinispan is an open-source key-value data grid, it can work as single
node as well as distributed.
Vector search is supported since release 15.x
For more: [Infinispan Home](https://infinispan.org)
Integration tests are provided as well as a demo notebook
Follow up on https://github.com/langchain-ai/langchain/pull/17467.
- Update all references to the Elasticsearch classes to use the partners
package.
- Deprecate community classes.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
ValidationError: 2 validation errors for DocArrayDoc
text
Field required [type=missing, input_value={'embedding': [-0.0191128...9, 0.01005221541175212]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.5/v/missing
metadata
Field required [type=missing, input_value={'embedding': [-0.0191128...9, 0.01005221541175212]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.5/v/missing
```
In the `_get_doc_cls` method, the `DocArrayDoc` class is defined as
follows:
```python
class DocArrayDoc(BaseDoc):
text: Optional[str]
embedding: Optional[NdArray] = Field(**embeddings_params)
metadata: Optional[dict]
```
This is a PR that adds a dangerous load parameter to force users to opt in to use pickle.
This is a PR that's meant to raise user awareness that the pickling module is involved.
This is a patch for `CVE-2024-2057`:
https://www.cve.org/CVERecord?id=CVE-2024-2057
This affects users that:
* Use the `TFIDFRetriever`
* Attempt to de-serialize it from an untrusted source that contains a
malicious payload
- **Description:** Databricks SerDe uses cloudpickle instead of pickle
when serializing a user-defined function transform_input_fn since pickle
does not support functions defined in `__main__`, and cloudpickle
supports this.
- **Dependencies:** cloudpickle>=2.0.0
Added a unit test.