Current problems:
1. Evaluating LLMs or Chat models isn't smooth. Even specifying
'generations' as the output inserts a redundant list into the eval
template
2. Configuring input / prediction / reference keys in the
`get_qa_evaluator` function is confusing. Unless you are using a chain
with the default keys, you have to specify all the variables and need to
reason about whether the key corresponds to the traced run's inputs,
outputs or the examples inputs or outputs.
Proposal:
- Configure the run evaluator according to a model. Use the model type
and input/output keys to assert compatibility where possible. Only need
to specify a reference_key for certain evaluators (which is less
confusing than specifying input keys)
When does this work:
- If you have your langchain model available (assumed always for
run_on_dataset flow)
- If you are evaluating an LLM, Chat model, or chain
- If the LLM or chat models are traced by langchain (wouldn't work if
you add an incompatible schema via the REST API)
When would this fail:
- Currently if you directly create an example from an LLM run, the
outputs are generations with all the extra metadata present. A simple
`example_key` and dumping all to the template could make the evaluations
unreliable
- Doesn't help if you're not using the low level API
- If you want to instantiate the evaluator without instantiating your
chain or LLM (maybe common for monitoring, for instance) -> could also
load from run or run type though
What's ugly:
- Personally think it's better to load evaluators one by one since
passing a config down is pretty confusing.
- Lots of testing needs to be added
- Inconsistent in that it makes a separate run and example input mapper
instead of the original `RunEvaluatorInputMapper`, which maps a run and
example to a single input.
Example usage running the for an LLM, Chat Model, and Agent.
```
# Test running for the string evaluators
evaluator_names = ["qa", "criteria"]
model = ChatOpenAI()
configured_evaluators = load_run_evaluators_for_model(evaluator_names, model=model, reference_key="answer")
run_on_dataset(ds_name, model, run_evaluators=configured_evaluators)
```
<details>
<summary>Full code with dataset upload</summary>
```
## Create dataset
from langchain.evaluation.run_evaluators.loading import load_run_evaluators_for_model
from langchain.evaluation import load_dataset
import pandas as pd
lcds = load_dataset("llm-math")
df = pd.DataFrame(lcds)
from uuid import uuid4
from langsmith import Client
client = Client()
ds_name = "llm-math - " + str(uuid4())[0:8]
ds = client.upload_dataframe(df, name=ds_name, input_keys=["question"], output_keys=["answer"])
## Define the models we'll test over
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, AgentType
from langchain.tools import tool
llm = OpenAI(temperature=0)
chat_model = ChatOpenAI(temperature=0)
@tool
def sum(a: float, b: float) -> float:
"""Add two numbers"""
return a + b
def construct_agent():
return initialize_agent(
llm=chat_model,
tools=[sum],
agent=AgentType.OPENAI_MULTI_FUNCTIONS,
)
agent = construct_agent()
# Test running for the string evaluators
evaluator_names = ["qa", "criteria"]
models = [llm, chat_model, agent]
run_evaluators = []
for model in models:
run_evaluators.append(load_run_evaluators_for_model(evaluator_names, model=model, reference_key="answer"))
# Run on LLM, Chat Model, and Agent
from langchain.client.runner_utils import run_on_dataset
to_test = [llm, chat_model, construct_agent]
for model, configured_evaluators in zip(to_test, run_evaluators):
run_on_dataset(ds_name, model, run_evaluators=configured_evaluators, verbose=True)
```
</details>
---------
Co-authored-by: Nuno Campos <nuno@boringbits.io>
fixes https://github.com/hwchase17/langchain/issues/7289
A simple fix of the buggy output of `graph_qa`. If we have several
entities with triplets then the last entry of `triplets` for a given
entity merges with the first entry of the `triplets` of the next entity.
- Description: `MlflowCallbackHandler` fails with `KeyError: "['name']
not in index"`. See https://github.com/hwchase17/langchain/issues/5770
for more details. Root cause is that LangChain does not pass "name" as a
part of `serialized` argument to `on_llm_start()` callback method. The
commit where this change was made is probably this:
18af149e91.
My bug fix derives "name" from "id" field.
- Issue: https://github.com/hwchase17/langchain/issues/5770
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
### Description
Adding a callback handler for Context. Context is a product analytics
platform for AI chat experiences to help you understand how users are
interacting with your product.
I've added the callback library + an example notebook showing its use.
### Dependencies
Requires the user to install the `context-python` library. The library
is lazily-loaded when the callback is instantiated.
### Announcing the feature
We spoke with Harrison a few weeks ago about also doing a blog post
announcing our integration, so will coordinate this with him. Our
Twitter handle for the company is @getcontextai, and the founders are
@_agamble and @HenrySG.
Thanks in advance!
**Title:** Add verbose parameter for llamacpp
**Description:**
This pull request adds a 'verbose' parameter to the llamacpp module. The
'verbose' parameter, when set to True, will enable the output of
detailed logs during the execution of the Llama model. This added
parameter can aid in debugging and understanding the internal processes
of the module.
The verbose parameter is a boolean that prints verbose output to stderr
when set to True. By default, the verbose parameter is set to True but
can be toggled off if less output is desired. This new parameter has
been added to the `validate_environment` method of the `LlamaCpp` class
which initializes the `llama_cpp.Llama` API:
```python
class LlamaCpp(LLM):
...
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
...
model_param_names = [
...
"verbose", # New verbose parameter added
]
...
values["client"] = Llama(model_path, **model_params)
...
```
---------
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
At the moment, pinecone vectorStore does not support filters and
namespaces when using similarity_score_threshold search type.
In this PR, I've implemented that. It passes all the kwargs except
"score_threshold" as that is not a supported argument for method
"similarity_search_with_score".
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
## Changes
- [X] Fill the `llm_output` param when there is an output parsing error
in a Pydantic schema so that we can get the original text that failed to
parse when handling the exception
## Background
With this change, we could do something like this:
```
output_parser = PydanticOutputParser(pydantic_object=pydantic_obj)
chain = ConversationChain(..., output_parser=output_parser)
try:
response: PydanticSchema = chain.predict(input=input)
except OutputParserException as exc:
logger.error(
'OutputParserException while parsing chatbot response: %s', exc.llm_output,
)
```
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
hi @rlancemartin ,
We had a new deployment and the `pg_extension` creation command was
updated from `CREATE EXTENSION pg_embedding` to `CREATE EXTENSION
embedding`.
https://github.com/neondatabase/neon/pull/4646
The extension not made public yet. No users will be affected by this.
Will be public next week.
Please let me know if you have any questions.
Thank you in advance 🙏
Continuing with Tolkien inspired series of langchain tools. I bring to
you:
**The Fellowship of the Vectors**, AKA EmbeddingsClusteringFilter.
This document filter uses embeddings to group vectors together into
clusters, then allows you to pick an arbitrary number of documents
vector based on proximity to the cluster centers. That's a
representative sample of the cluster.
The original idea is from [Greg Kamradt](https://github.com/gkamradt)
from this video (Level4):
https://www.youtube.com/watch?v=qaPMdcCqtWk&t=365s
I added few tricks to make it a bit more versatile, so you can
parametrize what to do with duplicate documents in case of cluster
overlap: replace the duplicates with the next closest document or remove
it. This allow you to use it as an special kind of redundant filter too.
Additionally you can choose 2 diff orders: grouped by cluster or
respecting the original retriever scores.
In my use case I was using the docs grouped by cluster to run refine
chains per cluster to generate summarization over a large corpus of
documents.
Let me know if you want to change anything!
@rlancemartin, @eyurtsev, @hwchase17,
---------
Co-authored-by: rlm <pexpresss31@gmail.com>
Change details:
- Description: When calling db.persist(), a check prevents from it
proceeding as the constructor only sets member `_persist_directory` from
parameters. But the ChromaDB client settings also has this parameter,
and if the client_settings parameter is used without passing the
persist_directory (which is optional), the `persist` method raises
`ValueError` for not setting `_persist_directory`. This change fixes it
by setting the member `_persist_directory` variable from client_settings
if it is set, else uses the constructor parameter.
- Issue: I didn't find any github issue of this, but I discovered it
after calling the persist method
- Dependencies: None
- Tag maintainer: vectorstore related change - @rlancemartin, @eyurtsev
- Twitter handle: Don't have one :(
*Additional discussion*: We may need to discuss the way I implemented
the fallback using `or`.
---------
Co-authored-by: rlm <pexpresss31@gmail.com>
This PR improves the example notebook for the Marqo vectorstore
implementation by adding a new RetrievalQAWithSourcesChain example. The
`embedding` parameter in `from_documents` has its type updated to
`Union[Embeddings, None]` and a default parameter of None because this
is ignored in Marqo.
This PR also upgrades the Marqo version to 0.11.0 to remove the device
parameter after a breaking change to the API.
Related to #7068 @tomhamer @hwchase17
---------
Co-authored-by: Tom Hamer <tom@marqo.ai>
Description: Pack of small fixes and refactorings that don't affect
functionality, just making code prettier & fixing some misspelling
(hand-filtered improvements proposed by SeniorAi.online, prototype of
code improving tool based on gpt4), agents and callbacks folders was
covered.
Dependencies: Nothing changed
Twitter: https://twitter.com/nayjest
Co-authored-by: Bagatur <baskaryan@gmail.com>
This PR improves upon the Clarifai LangChain integration with improved docs, errors, args and the addition of embedding model support in LancChain for Clarifai's embedding models and an overview of the various ways you can integrate with Clarifai added to the docs.
---------
Co-authored-by: Matthew Zeiler <zeiler@clarifai.com>
Description: Added number_of_head_rows as a parameter to pandas agent.
number_of_head_rows allows the user to select the number of rows to pass
with the prompt when include_df_in_prompt is True. This gives the
ability to control the token length and can be helpful in dealing with
large dataframe.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Description: Solving, anthropic packaging version issue by clearing
the mixup from package.version that is being confused with version from
- importlib.metadata.version.
- Issue: it fixes the issue #7283
- Maintainer: @hwchase17
The following change has been explained in the comment -
https://github.com/hwchase17/langchain/issues/7283#issuecomment-1624328978
- Description: pydantic's `ModelField.type_` only exposes the native
data type but not complex type hints like `List`. Thus, generating a
Tool with `from_function` through function signature produces incorrect
argument schemas (e.g., `str` instead of `List[str]`)
- Issue: N/A
- Dependencies: N/A
- Tag maintainer: @hinthornw
- Twitter handle: `mapped`
All the unittest (with an additional one in this PR) passed, though I
didn't try integration tests...
- Description: Sometimes there are csv attachments with the media type
"application/vnd.ms-excel". These files failed to be loaded via the xlrd
library. It throws a corrupted file error. I fixed it by separately
processing excel files using pandas. Excel files will be processed just
like before.
- Dependencies: pandas, os, io
---------
Co-authored-by: Chathura <chathurar@yaalalabs.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
In some cases, the OpenAI response is missing the `finish_reason`
attribute. It seems to happen when using Ada or Babbage and
`stream=true`, but I can't always reproduce it. This change just
gracefully handles the missing key.