- **Description:** This includes Pydantic field metadata in
`_create_subset_model_v2` so that it gets included in the final
serialized form that get sent out.
- **Issue:** #25031
- **Dependencies:** n/a
- **Twitter handle:** @gramliu
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
This PR adds a minimal document indexer abstraction.
The goal of this abstraction is to allow developers to create custom
retrievers that also have a standard indexing API and allow updating the
document content in them.
The abstraction comes with a test suite that can verify that the indexer
implements the correct semantics.
This is an iteration over a previous PRs
(https://github.com/langchain-ai/langchain/pull/24364). The main
difference is that we're sub-classing from BaseRetriever in this
iteration and as so have consolidated the sync and async interfaces.
The main problem with the current design is that runt time search
configuration has to be specified at init rather than provided at run
time.
We will likely resolve this issue in one of the two ways:
(1) Define a method (`get_retriever`) that will allow creating a
retriever at run time with a specific configuration.. If we do this, we
will likely break the subclass on BaseRetriever
(2) Generalize base retriever so it can support structured queries
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR introduces a module with some helper utilities for the pydantic
1 -> 2 migration.
They're meant to be used in the following way:
1) Use the utility code to get unit tests pass without requiring
modification to the unit tests
2) (If desired) upgrade the unit tests to match pydantic 2 output
3) (If desired) stop using the utility code
Currently, this module contains a way to map `schema()` generated by
pydantic 2 to (mostly) match the output from pydantic v1.
Add compatibility for pydantic 2 for a utility function.
This will help push some small changes to master, so they don't have to
be kept track of on a separate branch.
supports following UX
```python
class SubTool(TypedDict):
"""Subtool docstring"""
args: Annotated[Dict[str, Any], {}, "this does bar"]
class Tool(TypedDict):
"""Docstring
Args:
arg1: foo
"""
arg1: str
arg2: Union[int, str]
arg3: Optional[List[SubTool]]
arg4: Annotated[Literal["bar", "baz"], ..., "this does foo"]
arg5: Annotated[Optional[float], None]
```
- can parse google style docstring
- can use Annotated to specify default value (second arg)
- can use Annotated to specify arg description (third arg)
- can have nested complex types
Anthropic models (including via Bedrock and other cloud platforms)
accept a status/is_error attribute on tool messages/results
(specifically in `tool_result` content blocks for Anthropic API). Adding
a ToolMessage.status attribute so that users can set this attribute when
using those models
This PR proposes to create a rate limiter in the chat model directly,
and would replace: https://github.com/langchain-ai/langchain/pull/21992
It resolves most of the constraints that the Runnable rate limiter
introduced:
1. It's not annoying to apply the rate limiter to existing code; i.e.,
possible to roll out the change at the location where the model is
instantiated,
rather than at every location where the model is used! (Which is
necessary
if the model is used in different ways in a given application.)
2. batch rate limiting is enforced properly
3. the rate limiter works correctly with streaming
4. the rate limiter is aware of the cache
5. The rate limiter can take into account information about the inputs
into the
model (we can add optional inputs to it down-the road together with
outputs!)
The only downside is that information will not be properly reflected in
tracing
as we don't have any metadata evens about a rate limiter. So the total
time
spent on a model invocation will be:
* time spent waiting for the rate limiter
* time spend on the actual model request
## Example
```python
from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_groq import ChatGroq
groq = ChatGroq(rate_limiter=InMemoryRateLimiter(check_every_n_seconds=1))
groq.invoke('hello')
```
### Description
* support asynchronous in InMemoryVectorStore
* since embeddings might be possible to call asynchronously, ensure that
both asynchronous and synchronous functions operate correctly.
This PR introduces the following Runnables:
1. BaseRateLimiter: an abstraction for specifying a time based rate
limiter as a Runnable
2. InMemoryRateLimiter: Provides an in-memory implementation of a rate
limiter
## Example
```python
from langchain_core.runnables import InMemoryRateLimiter, RunnableLambda
from datetime import datetime
foo = InMemoryRateLimiter(requests_per_second=0.5)
def meow(x):
print(datetime.now().strftime("%H:%M:%S.%f"))
return x
chain = foo | meow
for _ in range(10):
print(chain.invoke('hello'))
```
Produces:
```
17:12:07.530151
hello
17:12:09.537932
hello
17:12:11.548375
hello
17:12:13.558383
hello
17:12:15.568348
hello
17:12:17.578171
hello
17:12:19.587508
hello
17:12:21.597877
hello
17:12:23.607707
hello
17:12:25.617978
hello
```
![image](https://github.com/user-attachments/assets/283af59f-e1e1-408b-8e75-d3910c3c44cc)
## Interface
The rate limiter uses the following interface for acquiring a token:
```python
class BaseRateLimiter(Runnable[Input, Output], abc.ABC):
@abc.abstractmethod
def acquire(self, *, blocking: bool = True) -> bool:
"""Attempt to acquire the necessary tokens for the rate limiter.```
```
The flag `blocking` has been added to the abstraction to allow
supporting streaming (which is easier if blocking=False).
## Limitations
- The rate limiter is not designed to work across different processes.
It is an in-memory rate limiter, but it is thread safe.
- The rate limiter only supports time-based rate limiting. It does not
take into account the size of the request or any other factors.
- The current implementation does not handle streaming inputs well and
will consume all inputs even if the rate limit has been reached. Better
support for streaming inputs will be added in the future.
- When the rate limiter is combined with another runnable via a
RunnableSequence, usage of .batch() or .abatch() will only respect the
average rate limit. There will be bursty behavior as .batch() and
.abatch() wait for each step to complete before starting the next step.
One way to mitigate this is to use batch_as_completed() or
abatch_as_completed().
## Bursty behavior in `batch` and `abatch`
When the rate limiter is combined with another runnable via a
RunnableSequence, usage of .batch() or .abatch() will only respect the
average rate limit. There will be bursty behavior as .batch() and
.abatch() wait for each step to complete before starting the next step.
This becomes a problem if users are using `batch` and `abatch` with many
inputs (e.g., 100). In this case, there will be a burst of 100 inputs
into the batch of the rate limited runnable.
1. Using a RunnableBinding
The API would look like:
```python
from langchain_core.runnables import InMemoryRateLimiter, RunnableLambda
rate_limiter = InMemoryRateLimiter(requests_per_second=0.5)
def meow(x):
return x
rate_limited_meow = RunnableLambda(meow).with_rate_limiter(rate_limiter)
```
2. Another option is to add some init option to RunnableSequence that
changes `.batch()` to be depth first (e.g., by delegating to
`batch_as_completed`)
```python
RunnableSequence(first=rate_limiter, last=model, how='batch-depth-first')
```
Pros: Does not require Runnable Binding
Cons: Feels over-complicated
Feedback that `RunnableWithMessageHistory` is unwieldy compared to
ConversationChain and similar legacy abstractions is common.
Legacy chains using memory typically had no explicit notion of threads
or separate sessions. To use `RunnableWithMessageHistory`, users are
forced to introduce this concept into their code. This possibly felt
like unnecessary boilerplate.
Here we enable `RunnableWithMessageHistory` to run without a config if
the `get_session_history` callable has no arguments. This enables
minimal implementations like the following:
```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-3.5-turbo-0125")
memory = InMemoryChatMessageHistory()
chain = RunnableWithMessageHistory(llm, lambda: memory)
chain.invoke("Hi I'm Bob") # Hello Bob!
chain.invoke("What is my name?") # Your name is Bob.
```
This will allow tools and parsers to accept pydantic models from any of
the
following namespaces:
* pydantic.BaseModel with pydantic 1
* pydantic.BaseModel with pydantic 2
* pydantic.v1.BaseModel with pydantic 2
Description:
This PR fixes a KeyError: 400 that occurs in the JSON schema processing
within the reduce_openapi_spec function. The _retrieve_ref function in
json_schema.py was modified to handle missing components gracefully by
continuing to the next component if the current one is not found. This
ensures that the OpenAPI specification is fully interpreted and the
agent executes without errors.
Issue:
Fixes issue #24335
Dependencies:
No additional dependencies are required for this change.
Twitter handle:
@lunara_x
The functions `convert_to_messages` has had an expansion of the
arguments it can take:
1. Previously, it only could take a `Sequence` in order to iterate over
it. This has been broadened slightly to an `Iterable` (which should have
no other impact).
2. Support for `PromptValue` and `BaseChatPromptTemplate` has been
added. These are generated when combining messages using the overloaded
`+` operator.
Functions which rely on `convert_to_messages` (namely `filter_messages`,
`merge_message_runs` and `trim_messages`) have had the type of their
arguments similarly expanded.
Resolves#23706.
<!--
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
-->
---------
Signed-off-by: JP-Ellis <josh@jpellis.me>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Spell check fixes for docs, comments, and a couple of
strings. No code change e.g. variable names.
**Issue:** none
**Dependencies:** none
**Twitter handle:** hmartin