- Description: Adds a new chain that acts as a wrapper around Sympy to
give LLMs the ability to do some symbolic math.
- Dependencies: SymPy
---------
Co-authored-by: sreiswig <sreiswig@github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
When using callbacks, there are times when callbacks can be added
redundantly: for instance sometimes you might need to create an llm with
specific callbacks, but then also create and agent that uses a chain
that has those callbacks already set. This means that "callbacks" might
get passed down again to the llm at predict() time, resulting in
duplicate calls to the `on_llm_start` callback.
For the sake of simplicity, I made it so that langchain never adds an
exact handler/callbacks object in `add_handler`, thus avoiding the
duplicate handler issue.
Tagging @hwchase17 for callback review
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Currently `ChatOutputParser` extracts actions by splitting the text on
"```", and then load the second part as a json string.
But sometimes the LLM will wrap the action in markdown code block like:
````markdown
```json
{
"action": "foo",
"action_input": "bar"
}
```
````
Splitting text on "```" will cause `OutputParserException` in such case.
This PR changes the behaviour to extract the `$JSON_BLOB` by regex, so
that it can handle both ` ``` ``` ` and ` ```json ``` `
@hinthornw
---------
Co-authored-by: Junlin Zhou <jlzhou@zjuici.com>
Hey @hwchase17 -
This PR adds a `ZepMemory` class, improves handling of Zep's message
metadata, and makes it easier for folks building custom chains to
persist metadata alongside their chat history.
We've had plenty confused users unfamiliar with ChatMessageHistory
classes and how to wrap the `ZepChatMessageHistory` in a
`ConversationBufferMemory`. So we've created the `ZepMemory` class as a
light wrapper for `ZepChatMessageHistory`.
Details:
- add ZepMemory, modify notebook to demo use of ZepMemory
- Modify summary to be SystemMessage
- add metadata argument to add_message; add Zep metadata to
Message.additional_kwargs
- support passing in metadata
Have noticed transient ref example misalignment. I believe this is
caused by the logic of assigning an example within the thread executor
rather than before.
Current problems:
1. Evaluating LLMs or Chat models isn't smooth. Even specifying
'generations' as the output inserts a redundant list into the eval
template
2. Configuring input / prediction / reference keys in the
`get_qa_evaluator` function is confusing. Unless you are using a chain
with the default keys, you have to specify all the variables and need to
reason about whether the key corresponds to the traced run's inputs,
outputs or the examples inputs or outputs.
Proposal:
- Configure the run evaluator according to a model. Use the model type
and input/output keys to assert compatibility where possible. Only need
to specify a reference_key for certain evaluators (which is less
confusing than specifying input keys)
When does this work:
- If you have your langchain model available (assumed always for
run_on_dataset flow)
- If you are evaluating an LLM, Chat model, or chain
- If the LLM or chat models are traced by langchain (wouldn't work if
you add an incompatible schema via the REST API)
When would this fail:
- Currently if you directly create an example from an LLM run, the
outputs are generations with all the extra metadata present. A simple
`example_key` and dumping all to the template could make the evaluations
unreliable
- Doesn't help if you're not using the low level API
- If you want to instantiate the evaluator without instantiating your
chain or LLM (maybe common for monitoring, for instance) -> could also
load from run or run type though
What's ugly:
- Personally think it's better to load evaluators one by one since
passing a config down is pretty confusing.
- Lots of testing needs to be added
- Inconsistent in that it makes a separate run and example input mapper
instead of the original `RunEvaluatorInputMapper`, which maps a run and
example to a single input.
Example usage running the for an LLM, Chat Model, and Agent.
```
# Test running for the string evaluators
evaluator_names = ["qa", "criteria"]
model = ChatOpenAI()
configured_evaluators = load_run_evaluators_for_model(evaluator_names, model=model, reference_key="answer")
run_on_dataset(ds_name, model, run_evaluators=configured_evaluators)
```
<details>
<summary>Full code with dataset upload</summary>
```
## Create dataset
from langchain.evaluation.run_evaluators.loading import load_run_evaluators_for_model
from langchain.evaluation import load_dataset
import pandas as pd
lcds = load_dataset("llm-math")
df = pd.DataFrame(lcds)
from uuid import uuid4
from langsmith import Client
client = Client()
ds_name = "llm-math - " + str(uuid4())[0:8]
ds = client.upload_dataframe(df, name=ds_name, input_keys=["question"], output_keys=["answer"])
## Define the models we'll test over
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, AgentType
from langchain.tools import tool
llm = OpenAI(temperature=0)
chat_model = ChatOpenAI(temperature=0)
@tool
def sum(a: float, b: float) -> float:
"""Add two numbers"""
return a + b
def construct_agent():
return initialize_agent(
llm=chat_model,
tools=[sum],
agent=AgentType.OPENAI_MULTI_FUNCTIONS,
)
agent = construct_agent()
# Test running for the string evaluators
evaluator_names = ["qa", "criteria"]
models = [llm, chat_model, agent]
run_evaluators = []
for model in models:
run_evaluators.append(load_run_evaluators_for_model(evaluator_names, model=model, reference_key="answer"))
# Run on LLM, Chat Model, and Agent
from langchain.client.runner_utils import run_on_dataset
to_test = [llm, chat_model, construct_agent]
for model, configured_evaluators in zip(to_test, run_evaluators):
run_on_dataset(ds_name, model, run_evaluators=configured_evaluators, verbose=True)
```
</details>
---------
Co-authored-by: Nuno Campos <nuno@boringbits.io>
- Description: pydantic's `ModelField.type_` only exposes the native
data type but not complex type hints like `List`. Thus, generating a
Tool with `from_function` through function signature produces incorrect
argument schemas (e.g., `str` instead of `List[str]`)
- Issue: N/A
- Dependencies: N/A
- Tag maintainer: @hinthornw
- Twitter handle: `mapped`
All the unittest (with an additional one in this PR) passed, though I
didn't try integration tests...
- Description:
- When `keep_separator` is `True` the `_split_text_with_regex()` method
in `text_splitter` uses regex to split, but when `keep_separator` is
`False` it uses `str.split()`. This causes problems when the separator
is a special regex character like `.` or `*`. This PR fixes that by
using `re.split()` in both cases.
- Issue: #7262
- Tag maintainer: @baskaryan
### Description
This pull request introduces the "Cube Semantic Layer" document loader,
which demonstrates the retrieval of Cube's data model metadata in a
format suitable for passing to LLMs as embeddings. This enhancement aims
to provide contextual information and improve the understanding of data.
Twitter handle:
@the_cube_dev
---------
Co-authored-by: rlm <pexpresss31@gmail.com>
- Description: Allow `InMemoryDocstore` to be created without passing a
dict to the constructor; the constructor can create a dict at runtime if
one isn't provided.
- Tag maintainer: @dev2049
## Description
Added Office365 tool modules to `__init__.py` files
## Issue
As described in Issue
https://github.com/hwchase17/langchain/issues/6936, the Office365
toolkit can't be loaded easily because it is not included in the
`__init__.py` files.
## Reviewer
@dev2049
- Description: If their are missing or extra variables when validating
Jinja 2 template then a warning is issued rather than raising an
exception. This allows for better flexibility for the developer as
described in #7044. Also changed the relevant test so pytest is checking
for raised warnings rather than exceptions.
- Issue: #7044
- Tag maintainer: @hwchase17, @baskaryan
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
<!-- Thank you for contributing to LangChain!
Replace this comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @dev2049
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @dev2049
- Memory: @hwchase17
- Agents / Tools / Toolkits: @vowelparrot
- Tracing / Callbacks: @agola11
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the
same people again.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
<!-- Thank you for contributing to LangChain!
Replace this comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @dev2049
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @dev2049
- Memory: @hwchase17
- Agents / Tools / Toolkits: @vowelparrot
- Tracing / Callbacks: @agola11
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the
same people again.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
**Description**:
The JSON Lines format is used by some services such as OpenAI and
HuggingFace. It's also a convenient alternative to CSV.
This PR adds JSON Lines support to `JSONLoader` and also updates related
tests.
**Tag maintainer**: @rlancemartin, @eyurtsev.
PS I was not able to build docs locally so didn't update related
section.
should be no functional changes
also keep __init__ exposing a lot for backwards compat
---------
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Handle the new retriever events in a way that (I think) is entirely
backwards compatible? Needs more testing for some of the chain changes
and all.
This creates an entire new run type, however. We could also just treat
this as an event within a chain run presumably (same with memory)
Adds a subclass initializer that upgrades old retriever implementations
to the new schema, along with tests to ensure they work.
First commit doesn't upgrade any of our retriever implementations (to
show that we can pass the tests along with additional ones testing the
upgrade logic).
Second commit upgrades the known universe of retrievers in langchain.
- [X] Add callback handling methods for retriever start/end/error (open
to renaming to 'retrieval' if you want that)
- [X] Update BaseRetriever schema to support callbacks
- [X] Tests for upgrading old "v1" retrievers for backwards
compatibility
- [X] Update existing retriever implementations to implement the new
interface
- [X] Update calls within chains to .{a]get_relevant_documents to pass
the child callback manager
- [X] Update the notebooks/docs to reflect the new interface
- [X] Test notebooks thoroughly
Not handled:
- Memory pass throughs: retrieval memory doesn't have a parent callback
manager passed through the method
---------
Co-authored-by: Nuno Campos <nuno@boringbits.io>
Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>
If you create a dataset from runs and run the same chain or llm on it
later, it usually works great.
If you have an agent dataset and want to run a different agent on it, or
have more complex schema, it's hard for us to automatically map these
values every time. This PR lets you pass in an input_mapper function
that converts the example inputs to whatever format your model expects
### Scientific Article PDF Parsing via Grobid
`Description:`
This change adds the GrobidParser class, which uses the Grobid library
to parse scientific articles into a universal XML format containing the
article title, references, sections, section text etc. The GrobidParser
uses a local Grobid server to return PDFs document as XML and parses the
XML to optionally produce documents of individual sentences or of whole
paragraphs. Metadata includes the text, paragraph number, pdf relative
bboxes, pages (text may overlap over two pages), section title
(Introduction, Methodology etc), section_number (i.e 1.1, 2.3), the
title of the paper and finally the file path.
Grobid parsing is useful beyond standard pdf parsing as it accurately
outputs sections and paragraphs within them. This allows for
post-fitering of results for specific sections i.e. limiting results to
the methodology section or results. While sections are split via
headings, ideally they could be classified specifically into
introduction, methodology, results, discussion, conclusion. I'm
currently experimenting with chatgpt-3.5 for this function, which could
later be implemented as a textsplitter.
`Dependencies:`
For use, the grobid repo must be cloned and Java must be installed, for
colab this is:
```
!apt-get install -y openjdk-11-jdk -q
!update-alternatives --set java /usr/lib/jvm/java-11-openjdk-amd64/bin/java
!git clone https://github.com/kermitt2/grobid.git
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
os.chdir('grobid')
!./gradlew clean install
```
Once installed the server is ran on localhost:8070 via
```
get_ipython().system_raw('nohup ./gradlew run > grobid.log 2>&1 &')
```
@rlancemartin, @eyurtsev
Twitter Handle: @Corranmac
Grobid Demo Notebook is
[here](https://colab.research.google.com/drive/1X-St_mQRmmm8YWtct_tcJNtoktbdGBmd?usp=sharing).
---------
Co-authored-by: rlm <pexpresss31@gmail.com>
Replace this comment with:
- Description: Add Async functionality to Zapier NLA Tools
- Issue: n/a
- Dependencies: n/a
- Tag maintainer:
Maintainer responsibilities:
- Agents / Tools / Toolkits: @vowelparrot
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the
same people again.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
Description: When a 401 response is given back by Zapier, hint to the
end user why that may have occurred
- If an API Key was initialized with the wrapper, ask them to check
their API Key value
- if an access token was initialized with the wrapper, ask them to check
their access token or verify that it doesn't need to be refreshed.
Tag maintainer: @dev2049
#### Summary
A new approach to loading source code is implemented:
Each top-level function and class in the code is loaded into separate
documents. Then, an additional document is created with the top-level
code, but without the already loaded functions and classes.
This could improve the accuracy of QA chains over source code.
For instance, having this script:
```
class MyClass:
def __init__(self, name):
self.name = name
def greet(self):
print(f"Hello, {self.name}!")
def main():
name = input("Enter your name: ")
obj = MyClass(name)
obj.greet()
if __name__ == '__main__':
main()
```
The loader will create three documents with this content:
First document:
```
class MyClass:
def __init__(self, name):
self.name = name
def greet(self):
print(f"Hello, {self.name}!")
```
Second document:
```
def main():
name = input("Enter your name: ")
obj = MyClass(name)
obj.greet()
```
Third document:
```
# Code for: class MyClass:
# Code for: def main():
if __name__ == '__main__':
main()
```
A threshold parameter is added to control whether small scripts are
split in this way or not.
At this moment, only Python and JavaScript are supported. The
appropriate parser is determined by examining the file extension.
#### Tests
This PR adds:
- Unit tests
- Integration tests
#### Dependencies
Only one dependency was added as optional (needed for the JavaScript
parser).
#### Documentation
A notebook is added showing how the loader can be used.
#### Who can review?
@eyurtsev @hwchase17
---------
Co-authored-by: rlm <pexpresss31@gmail.com>
Description: Update documentation to
1) point to updated documentation links at Zapier.com (we've revamped
our help docs and paths), and
2) To provide clarity how to use the wrapper with an access token for
OAuth support
Demo:
Initializing the Zapier Wrapper with an OAuth Access Token
`ZapierNLAWrapper(zapier_nla_oauth_access_token="<redacted>")`
Using LangChain to resolve the current weather in Vancouver BC
leveraging Zapier NLA to lookup weather by coords.
```
> Entering new chain...
I need to use a tool to get the current weather.
Action: The Weather: Get Current Weather
Action Input: Get the current weather for Vancouver BC
Observation: {"coord__lon": -123.1207, "coord__lat": 49.2827, "weather": [{"id": 802, "main": "Clouds", "description": "scattered clouds", "icon": "03d", "icon_url": "http://openweathermap.org/img/wn/03d@2x.png"}], "weather[]icon_url": ["http://openweathermap.org/img/wn/03d@2x.png"], "weather[]icon": ["03d"], "weather[]id": [802], "weather[]description": ["scattered clouds"], "weather[]main": ["Clouds"], "base": "stations", "main__temp": 71.69, "main__feels_like": 71.56, "main__temp_min": 67.64, "main__temp_max": 76.39, "main__pressure": 1015, "main__humidity": 64, "visibility": 10000, "wind__speed": 3, "wind__deg": 155, "wind__gust": 11.01, "clouds__all": 41, "dt": 1687806607, "sys__type": 2, "sys__id": 2011597, "sys__country": "CA", "sys__sunrise": 1687781297, "sys__sunset": 1687839730, "timezone": -25200, "id": 6173331, "name": "Vancouver", "cod": 200, "summary": "scattered clouds", "_zap_search_was_found_status": true}
Thought: I now know the current weather in Vancouver BC.
Final Answer: The current weather in Vancouver BC is scattered clouds with a temperature of 71.69 and wind speed of 3
```
Notebook shows preference scoring between two chains and reports wilson
score interval + p value
I think I'll add the option to insert ground truth labels but doesn't
have to be in this PR
MHTML is a very interesting format since it's used both for emails but
also for archived webpages. Some scraping projects want to store pages
in disk to process them later, mhtml is perfect for that use case.
This is heavily inspired from the beautifulsoup html loader, but
extracting the html part from the mhtml file.
---------
Co-authored-by: rlm <pexpresss31@gmail.com>
# Add caching to BaseChatModel
Fixes#1644
(Sidenote: While testing, I noticed we have multiple implementations of
Fake LLMs, used for testing. I consolidated them.)
## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
Models
- @hwchase17
- @agola11
Twitter: [@UmerHAdil](https://twitter.com/@UmerHAdil) | Discord:
RicChilligerDude#7589
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Fixes#5456
This PR removes the `callbacks` argument from a tool's schema when
creating a `Tool` or `StructuredTool` with the `from_function` method
and `infer_schema` is set to `True`. The `callbacks` argument is now
removed in the `create_schema_from_function` and `_get_filtered_args`
methods. As suggested by @vowelparrot, this fix provides a
straightforward solution that minimally affects the existing
implementation.
A test was added to verify that this change enables the expected use of
`Tool` and `StructuredTool` when using a `CallbackManager` and inferring
the tool's schema.
- @hwchase17
A new implementation of `StreamlitCallbackHandler`. It formats Agent
thoughts into Streamlit expanders.
You can see the handler in action here:
https://langchain-mrkl.streamlit.app/
Per a discussion with Harrison, we'll be adding a
`StreamlitCallbackHandler` implementation to an upcoming
[Streamlit](https://github.com/streamlit/streamlit) release as well, and
will be updating it as we add new LLM- and LangChain-specific features
to Streamlit.
The idea with this PR is that the LangChain `StreamlitCallbackHandler`
will "auto-update" in a way that keeps it forward- (and backward-)
compatible with Streamlit. If the user has an older Streamlit version
installed, the LangChain `StreamlitCallbackHandler` will be used; if
they have a newer Streamlit version that has an updated
`StreamlitCallbackHandler`, that implementation will be used instead.
(I'm opening this as a draft to get the conversation going and make sure
we're on the same page. We're really excited to land this into
LangChain!)
#### Who can review?
@agola11, @hwchase17