Commit Graph

3287 Commits

Author SHA1 Message Date
Jona Sassenhagen
0ea7224535
[Minor] Remove tagger from spacy sentencizer (#7534)
@svlandeg gave me a tip for how to improve a bit on
https://github.com/hwchase17/langchain/pull/7442 for some extra speed
and memory gains. The tagger isn't needed for sentencization, so can be
disabled too.
2023-07-11 16:43:46 -04:00
Kacper Łukawski
1f83b5f47e
Reuse the existing collection if configured properly in Qdrant.from_texts (#7530)
This PR changes the behavior of `Qdrant.from_texts` so the collection is
reused if not requested to recreate it. Previously, calling
`Qdrant.from_texts` or `Qdrant.from_documents` resulted in removing the
old data which was confusing for many.
2023-07-11 16:24:35 -04:00
Leonid Kuligin
6674b33cf5
Added support for chat_history (#7555)
#7469

Co-authored-by: Leonid Kuligin <kuligin@google.com>
2023-07-11 15:27:26 -04:00
Felix Brockmeier
406a9dc11f
Add notebook example for Lemon AI NLP Workflow Automation (#7556)
- Description: Added notebook to LangChain docs that explains how to use
Lemon AI NLP Workflow Automation tool with Langchain
  
- Issue: not applicable
  
- Dependencies: not applicable
  
- Tag maintainer: @agola11
  
- Twitter handle: felixbrockm
2023-07-11 15:15:11 -04:00
Lance Martin
9e067b8cc9
Add env setup (#7550)
Include setup
2023-07-11 09:48:40 -07:00
Bagatur
3c4338470e
bump 230 (#7544) 2023-07-11 11:24:08 -04:00
Bagatur
d2137eea9f
fix cpal docs (#7545) 2023-07-11 11:07:45 -04:00
Boris
9129318466
CPAL (#6255)
# Causal program-aided language (CPAL) chain

## Motivation

This builds on the recent [PAL](https://arxiv.org/abs/2211.10435) to
stop LLM hallucination. The problem with the
[PAL](https://arxiv.org/abs/2211.10435) approach is that it hallucinates
on a math problem with a nested chain of dependence. The innovation here
is that this new CPAL approach includes causal structure to fix
hallucination.

For example, using the below word problem, PAL answers with 5, and CPAL
answers with 13.

    "Tim buys the same number of pets as Cindy and Boris."
    "Cindy buys the same number of pets as Bill plus Bob."
    "Boris buys the same number of pets as Ben plus Beth."
    "Bill buys the same number of pets as Obama."
    "Bob buys the same number of pets as Obama."
    "Ben buys the same number of pets as Obama."
    "Beth buys the same number of pets as Obama."
    "If Obama buys one pet, how many pets total does everyone buy?"

The CPAL chain represents the causal structure of the above narrative as
a causal graph or DAG, which it can also plot, as shown below.


![complex-graph](https://github.com/hwchase17/langchain/assets/367522/d938db15-f941-493d-8605-536ad530f576)

.

The two major sections below are:

1. Technical overview
2. Future application

Also see [this jupyter
notebook](https://github.com/borisdev/langchain/blob/master/docs/extras/modules/chains/additional/cpal.ipynb)
doc.


## 1. Technical overview

### CPAL versus PAL

Like [PAL](https://arxiv.org/abs/2211.10435), CPAL intends to reduce
large language model (LLM) hallucination.

The CPAL chain is different from the PAL chain for a couple of reasons. 

* CPAL adds a causal structure (or DAG) to link entity actions (or math
expressions).
* The CPAL math expressions are modeling a chain of cause and effect
relations, which can be intervened upon, whereas for the PAL chain math
expressions are projected math identities.

PAL's generated python code is wrong. It hallucinates when complexity
increases.

```python
def solution():
    """Tim buys the same number of pets as Cindy and Boris.Cindy buys the same number of pets as Bill plus Bob.Boris buys the same number of pets as Ben plus Beth.Bill buys the same number of pets as Obama.Bob buys the same number of pets as Obama.Ben buys the same number of pets as Obama.Beth buys the same number of pets as Obama.If Obama buys one pet, how many pets total does everyone buy?"""
    obama_pets = 1
    tim_pets = obama_pets
    cindy_pets = obama_pets + obama_pets
    boris_pets = obama_pets + obama_pets
    total_pets = tim_pets + cindy_pets + boris_pets
    result = total_pets
    return result  # math result is 5
```

CPAL's generated python code is correct.

```python
story outcome data
    name                                   code  value      depends_on
0  obama                                   pass    1.0              []
1   bill               bill.value = obama.value    1.0         [obama]
2    bob                bob.value = obama.value    1.0         [obama]
3    ben                ben.value = obama.value    1.0         [obama]
4   beth               beth.value = obama.value    1.0         [obama]
5  cindy   cindy.value = bill.value + bob.value    2.0     [bill, bob]
6  boris   boris.value = ben.value + beth.value    2.0     [ben, beth]
7    tim  tim.value = cindy.value + boris.value    4.0  [cindy, boris]

query data
{
    "question": "how many pets total does everyone buy?",
    "expression": "SELECT SUM(value) FROM df",
    "llm_error_msg": ""
}
# query result is 13
```

Based on the comments below, CPAL's intended location in the library is
`experimental/chains/cpal` and PAL's location is`chains/pal`.

### CPAL vs Graph QA

Both the CPAL chain and the Graph QA chain extract entity-action-entity
relations into a DAG.

The CPAL chain is different from the Graph QA chain for a few reasons.

* Graph QA does not connect entities to math expressions
* Graph QA does not associate actions in a sequence of dependence.
* Graph QA does not decompose the narrative into these three parts:
  1. Story plot or causal model
  4. Hypothetical question
  5. Hypothetical condition 

### Evaluation

Preliminary evaluation on simple math word problems shows that this CPAL
chain generates less hallucination than the PAL chain on answering
questions about a causal narrative. Two examples are in [this jupyter
notebook](https://github.com/borisdev/langchain/blob/master/docs/extras/modules/chains/additional/cpal.ipynb)
doc.

## 2. Future application

### "Describe as Narrative, Test as Code"

The thesis here is that the Describe as Narrative, Test as Code approach
allows you to represent a causal mental model both as code and as a
narrative, giving you the best of both worlds.

#### Why describe a causal mental mode as a narrative?

The narrative form is quick. At a consensus building meeting, people use
narratives to persuade others of their causal mental model, aka. plan.
You can share, version control and index a narrative.

#### Why test a causal mental model as a code?

Code is testable, complex narratives are not. Though fast, narratives
are problematic as their complexity increases. The problem is LLMs and
humans are prone to hallucination when predicting the outcomes of a
narrative. The cost of building a consensus around the validity of a
narrative outcome grows as its narrative complexity increases. Code does
not require tribal knowledge or social power to validate.

Code is composable, complex narratives are not. The answer of one CPAL
chain can be the hypothetical conditions of another CPAL Chain. For
stochastic simulations, a composable plan can be integrated with the
[DoWhy library](https://github.com/py-why/dowhy). Lastly, for the
futuristic folk, a composable plan as code allows ordinary community
folk to design a plan that can be integrated with a blockchain for
funding.

An explanation of a dependency planning application is
[here.](https://github.com/borisdev/cpal-llm-chain-demo)

--- 
Twitter handle: @boris_dev

---------

Co-authored-by: Boris Dev <borisdev@Boriss-MacBook-Air.local>
2023-07-11 10:11:21 -04:00
Alejandra De Luna
2e4047e5e7
feat: support generate as an early stopping method for OpenAIFunctionsAgent (#7229)
This PR proposes an implementation to support `generate` as an
`early_stopping_method` for the new `OpenAIFunctionsAgent` class.

The motivation behind is to facilitate the user to set a maximum number
of actions the agent can take with `max_iterations` and force a final
response with this new agent (as with the `Agent` class).

The following changes were made:

- The `OpenAIFunctionsAgent.return_stopped_response` method was
overwritten to support `generate` as an `early_stopping_method`
- A boolean `with_functions` parameter was added to the
`OpenAIFunctionsAgent.plan` method

This way the `OpenAIFunctionsAgent.return_stopped_response` method can
call the `OpenAIFunctionsAgent.plan` method with `with_function=False`
when the `early_stopping_method` is set to `generate`, making a call to
the LLM with no functions and forcing a final response from the
`"assistant"`.

  - Relevant maintainer: @hinthornw
  - Twitter handle: @aledelunap

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 09:25:02 -04:00
Hashem Alsaket
1dd4236177
Fix HF endpoint returns blank for text-generation (#7386)
Description: Current `_call` function in the
`langchain.llms.HuggingFaceEndpoint` class truncates response when
`task=text-generation`. Same error discussed a few days ago on Hugging
Face: https://huggingface.co/tiiuae/falcon-40b-instruct/discussions/51
Issue: Fixes #7353 
Tag maintainer: @hwchase17 @baskaryan @hinthornw

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 03:06:05 -04:00
Lance Martin
4a94f56258
Minor edits to QA docs (#7507)
Small clean-ups
2023-07-10 22:15:05 -07:00
Raymond Yuan
5171c3bcca
Refactor vector storage to correctly handle relevancy scores (#6570)
Description: This pull request aims to support generating the correct
generic relevancy scores for different vector stores by refactoring the
relevance score functions and their selection in the base class and
subclasses of VectorStore. This is especially relevant with VectorStores
that require a distance metric upon initialization. Note many of the
current implenetations of `_similarity_search_with_relevance_scores` are
not technically correct, as they just return
`self.similarity_search_with_score(query, k, **kwargs)` without applying
the relevant score function

Also includes changes associated with:
https://github.com/hwchase17/langchain/pull/6564 and
https://github.com/hwchase17/langchain/pull/6494

See more indepth discussion in thread in #6494 

Issue: 
https://github.com/hwchase17/langchain/issues/6526
https://github.com/hwchase17/langchain/issues/6481
https://github.com/hwchase17/langchain/issues/6346

Dependencies: None

The changes include:
- Properly handling score thresholding in FAISS
`similarity_search_with_score_by_vector` for the corresponding distance
metric.
- Refactoring the `_similarity_search_with_relevance_scores` method in
the base class and removing it from the subclasses for incorrectly
implemented subclasses.
- Adding a `_select_relevance_score_fn` method in the base class and
implementing it in the subclasses to select the appropriate relevance
score function based on the distance strategy.
- Updating the `__init__` methods of the subclasses to set the
`relevance_score_fn` attribute.
- Removing the `_default_relevance_score_fn` function from the FAISS
class and using the base class's `_euclidean_relevance_score_fn`
instead.
- Adding the `DistanceStrategy` enum to the `utils.py` file and updating
the imports in the vector store classes.
- Updating the tests to import the `DistanceStrategy` enum from the
`utils.py` file.

---------

Co-authored-by: Hanit <37485638+hanit-com@users.noreply.github.com>
2023-07-10 20:37:03 -07:00
Lance Martin
bd0c6381f5
Minor update to clarify map-reduce custom prompt usage (#7453)
Update docs for map-reduce custom prompt usage
2023-07-10 16:43:44 -07:00
Lance Martin
28d2b213a4
Update landing page for "question answering over documents" (#7152)
Improve documentation for a central use-case, qa / chat over documents.

This will be merged as an update to `index.mdx`
[here](https://python.langchain.com/docs/use_cases/question_answering/).

Testing w/ local Docusaurus server:

```
From `docs` directory:
mkdir _dist
cp -r {docs_skeleton,snippets} _dist
cp -r extras/* _dist/docs_skeleton/docs
cd _dist/docs_skeleton
yarn install
yarn start
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-10 14:15:13 -07:00
William FH
dd648183fa
Rm create_project line (#7486)
not needed
2023-07-10 10:49:55 -07:00
Leonid Ganeline
5eec74d9a5
docstrings document_loaders 3 (#6937)
- Updated docstrings for `document_loaders`
- Mass update `"""Loader that loads` to `"""Loads`

@baskaryan  - please, review
2023-07-10 08:56:53 -07:00
Stanko Kuveljic
9d13dcd17c
Pinecone: Add V4 support (#7473) 2023-07-10 08:39:47 -07:00
Adilkhan Sarsen
5debd5043e
Added deeplake use case examples of the new features (#6528)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
 
 1. Added use cases of the new features
 2. Done some code refactoring

---------

Co-authored-by: Ivo Stranic <istranic@gmail.com>
2023-07-10 07:04:29 -07:00
Bagatur
9b615022e2
bump 229 (#7467) 2023-07-10 04:38:55 -04:00
Kazuki Maeda
92b4418c8c
Datadog logs loader (#7356)
### Description
Created a Loader to get a list of specific logs from Datadog Logs.

### Dependencies
`datadog_api_client` is required.

### Twitter handle
[kzk_maeda](https://twitter.com/kzk_maeda)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-10 04:27:55 -04:00
Yifei Song
7d29bb2c02
Add Xorbits Dataframe as a Document Loader (#7319)
- [Xorbits](https://doc.xorbits.io/en/latest/) is an open-source
computing framework that makes it easy to scale data science and machine
learning workloads in parallel. Xorbits can leverage multi cores or GPUs
to accelerate computation on a single machine, or scale out up to
thousands of machines to support processing terabytes of data.

- This PR added support for the Xorbits document loader, which allows
langchain to leverage Xorbits to parallelize and distribute the loading
of data.
- Dependencies: This change requires the Xorbits library to be installed
in order to be used.
`pip install xorbits`
- Request for review: @rlancemartin, @eyurtsev
- Twitter handle: https://twitter.com/Xorbitsio

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-10 04:24:47 -04:00
Sergio Moreno
21a353e9c2
feat: ctransformers support async chain (#6859)
- Description: Adding async method for CTransformers 
- Issue: I've found impossible without this code to run Websockets
inside a FastAPI micro service and a CTransformers model.
  - Tag maintainer: Not necessary yet, I don't like to mention directly 
  - Twitter handle: @_semoal
2023-07-10 04:23:41 -04:00
Paul-Emile Brotons
d2cf0d16b3
adding max_marginal_relevance_search method to MongoDBAtlasVectorSearch (#7310)
Adding a maximal_marginal_relevance method to the
MongoDBAtlasVectorSearch vectorstore enhances the user experience by
providing more diverse search results

Issue: #7304
2023-07-10 04:04:19 -04:00
Bagatur
04cddfba0d
Add lark import error (#7465) 2023-07-10 03:21:23 -04:00
Matt Robinson
bcab894f4e
feat: Add UnstructuredTSVLoader (#7367)
### Summary

Adds an `UnstructuredTSVLoader` for TSV files. Also updates the doc
strings for `UnstructuredCSV` and `UnstructuredExcel` loaders.

### Testing

```python
from langchain.document_loaders.tsv import UnstructuredTSVLoader

loader = UnstructuredTSVLoader(
    file_path="example_data/mlb_teams_2012.csv", mode="elements"
)
docs = loader.load()
```
2023-07-10 03:07:10 -04:00
Ronald Li
490f4a9ff0
Fixes KeyError in AmazonKendraRetriever initializer (#7464)
### Description
argument variable client is marked as required in commit
81e5b1ad36 which breaks the default way of
initialization providing only index_id. This commit avoid KeyError
exception when it is initialized without a client variable
### Dependencies
no dependency required
2023-07-10 03:02:36 -04:00
Jona Sassenhagen
7ffc431b3a
Add spacy sentencizer (#7442)
`SpacyTextSplitter` currently uses spacy's statistics-based
`en_core_web_sm` model for sentence splitting. This is a good splitter,
but it's also pretty slow, and in this case it's doing a lot of work
that's not needed given that the spacy parse is then just thrown away.
However, there is also a simple rules-based spacy sentencizer. Using
this is at least an order of magnitude faster than using
`en_core_web_sm` according to my local tests.
Also, spacy sentence tokenization based on `en_core_web_sm` can be sped
up in this case by not doing the NER stage. This shaves some cycles too,
both when loading the model and when parsing the text.

Consequently, this PR adds the option to use the basic spacy
sentencizer, and it disables the NER stage for the current approach,
*which is kept as the default*.

Lastly, when extracting the tokenized sentences, the `text` attribute is
called directly instead of doing the string conversion, which is IMO a
bit more idiomatic.
2023-07-10 02:52:05 -04:00
charosen
50a9fcccb0
feat(module): add param ids to ElasticVectorSearch.from_texts method (#7425)
# add param ids to ElasticVectorSearch.from_texts method.

- Description: add param ids to ElasticVectorSearch.from_texts method.
- Issue: NA. It seems `add_texts` already supports passing in document
ids, but param `ids` is omitted in `from_texts` classmethod,
- Dependencies: None,
- Tag maintainer: @rlancemartin, @eyurtsev please have a look, thanks

```
    # ElasticVectorSearch add_texts
    def add_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        refresh_indices: bool = True,
        ids: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> List[str]:
        ...

```

```
    # ElasticVectorSearch from_texts
    @classmethod
    def from_texts(
        cls,
        texts: List[str],
        embedding: Embeddings,
        metadatas: Optional[List[dict]] = None,
        elasticsearch_url: Optional[str] = None,
        index_name: Optional[str] = None,
        refresh_indices: bool = True,
        **kwargs: Any,
    ) -> ElasticVectorSearch:

```


Co-authored-by: charosen <charosen@bupt.cn>
2023-07-10 02:25:35 -04:00
James Yin
a5fd8873b1
fix: type hint of get_chat_history in BaseConversationalRetrievalChain (#7461)
The type hint of `get_chat_history` property in
`BaseConversationalRetrievalChain` is incorrect. @baskaryan
2023-07-10 02:14:00 -04:00
nikkie
dfc3f83b0f
docs(vectorstores/integrations/chroma): Fix loading and saving (#7437)
- Description: Fix loading and saving code about Chroma
- Issue: the issue #7436 
- Dependencies: -
- Twitter handle: https://twitter.com/ftnext
2023-07-10 02:05:15 -04:00
Daniel Chalef
c7f7788d0b
Add ZepMemory; improve ZepChatMessageHistory handling of metadata; Fix bugs (#7444)
Hey @hwchase17 - 

This PR adds a `ZepMemory` class, improves handling of Zep's message
metadata, and makes it easier for folks building custom chains to
persist metadata alongside their chat history.

We've had plenty confused users unfamiliar with ChatMessageHistory
classes and how to wrap the `ZepChatMessageHistory` in a
`ConversationBufferMemory`. So we've created the `ZepMemory` class as a
light wrapper for `ZepChatMessageHistory`.

Details:
- add ZepMemory, modify notebook to demo use of ZepMemory
- Modify summary to be SystemMessage
- add metadata argument to add_message; add Zep metadata to
Message.additional_kwargs
- support passing in metadata
2023-07-10 01:53:49 -04:00
Saurabh Chaturvedi
8f8e8d701e
Fix info about YouTube (#7447)
(Unintentionally mean 😅) nit: YouTube wasn't created by Google, this PR
fixes the mention in docs.
2023-07-10 01:52:55 -04:00
Leonid Ganeline
560c4dfc98
docstrings: docstore and client (#6783)
updated docstrings in `docstore/` and `client/`

@baskaryan
2023-07-09 01:34:28 -04:00
Jeroen Van Goey
f5bd88757e
Fix typo (#7416)
`quesitons` -> `questions`.
2023-07-09 00:54:48 -04:00
Alejandro Garrido Mota
ea9c3cc9c9
Fix syntax erros in documentation (#7409)
- Description: Tiny documentation fix. In Python, when defining function
parameters or providing arguments to a function or class constructor, we
do not use the `:` character.
- Issue: N/A
- Dependencies: N/A,
- Tag maintainer: @rlancemartin, @eyurtsev
- Twitter handle: @mogaal
2023-07-08 19:52:01 -04:00
Nolan
5da9f9abcb
docs(agents/toolkits): Fix error in document_comparison_toolkit.ipynb (#7417)
Replace this comment with:
- Description: Removes unneeded output warning in documentation at
https://python.langchain.com/docs/modules/agents/toolkits/document_comparison_toolkit
  - Issue: -
  - Dependencies: -
  - Tag maintainer: @baskaryan
  - Twitter handle: @finnless
2023-07-08 19:51:08 -04:00
nikkie
2eb4a2ceea
docs(retrievers/get-started): Fix broken state_of_the_union.txt link (#7399)
Thank you for this awesome library.

- Description: Fix broken link in documentation 
- Issue:
-
https://python.langchain.com/docs/modules/data_connection/retrievers/#get-started
- the URL:
https://github.com/hwchase17/langchain/blob/master/docs/modules/state_of_the_union.txt
- I think the right one is
https://github.com/hwchase17/langchain/blob/master/docs/extras/modules/state_of_the_union.txt
- Dependencies: -
- Tag maintainer: @baskaryan
- Twitter handle: -
2023-07-08 11:11:05 -04:00
Delgermurun
e7420789e4
improve description of JinaChat (#7397)
very small doc string change in the `JinaChat` class.
2023-07-08 10:57:11 -04:00
Bagatur
26c86a197c
bump 228 (#7393) 2023-07-08 03:05:20 -04:00
SvMax
1d649b127e
Added param to return only a structured json from the get_format_instructions method (#5848)
I just added a parameter to the method get_format_instructions, to
return directly the JSON instructions without the leading instruction
sentence. I'm planning to use it to define the structure of a JSON
object passed in input, the get_format_instructions().

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-08 02:57:26 -04:00
Bagatur
362bc301df
fix jina (#7392) 2023-07-08 02:41:54 -04:00
Delgermurun
a1603fccfb
integrate JinaChat (#6927)
Integration with https://chat.jina.ai/api. It is OpenAI compatible API.

- Twitter handle:
[https://twitter.com/JinaAI_](https://twitter.com/JinaAI_)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-07-08 02:17:04 -04:00
William FH
4ba7396f96
Add single run eval loader (#7390)
Plus 
- add evaluation name to make string and embedding validators work with
the run evaluator loader.
- Rm unused root validator
2023-07-07 23:06:49 -07:00
Roger Yu
633b673b85
Update pinecone.ipynb (#7382)
Fix typo
2023-07-08 01:48:03 -04:00
Oleg Zabluda
4d697d3f24
Allow passing custom prompts to GraphIndexCreator (#7381)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-08 01:47:53 -04:00
William FH
612a74eb7e
Make Ref Example Threadsafe (#7383)
Have noticed transient ref example misalignment. I believe this is
caused by the logic of assigning an example within the thread executor
rather than before.
2023-07-07 21:50:42 -07:00
William FH
4789c99bc2
Add String Distance and Embedding Evaluators (#7123)
Add a string evaluator and pairwise string evaluator implementation for:
- Embedding distance
- String distance

Update docs
2023-07-07 21:44:31 -07:00
ljeagle
fb6e63dc36
Upgrade the AwaDB from 0.3.5 to 0.3.6 (#7363) 2023-07-07 20:41:17 -07:00
William FH
c5edbea34a
Load Run Evaluator (#7101)
Current problems:
1. Evaluating LLMs or Chat models isn't smooth. Even specifying
'generations' as the output inserts a redundant list into the eval
template
2. Configuring input / prediction / reference keys in the
`get_qa_evaluator` function is confusing. Unless you are using a chain
with the default keys, you have to specify all the variables and need to
reason about whether the key corresponds to the traced run's inputs,
outputs or the examples inputs or outputs.


Proposal:
- Configure the run evaluator according to a model. Use the model type
and input/output keys to assert compatibility where possible. Only need
to specify a reference_key for certain evaluators (which is less
confusing than specifying input keys)


When does this work:
- If you have your langchain model available (assumed always for
run_on_dataset flow)
- If you are evaluating an LLM, Chat model, or chain
- If the LLM or chat models are traced by langchain (wouldn't work if
you add an incompatible schema via the REST API)

When would this fail:
- Currently if you directly create an example from an LLM run, the
outputs are generations with all the extra metadata present. A simple
`example_key` and dumping all to the template could make the evaluations
unreliable
- Doesn't help if you're not using the low level API
- If you want to instantiate the evaluator without instantiating your
chain or LLM (maybe common for monitoring, for instance) -> could also
load from run or run type though

What's ugly:
- Personally think it's better to load evaluators one by one since
passing a config down is pretty confusing.
- Lots of testing needs to be added
- Inconsistent in that it makes a separate run and example input mapper
instead of the original `RunEvaluatorInputMapper`, which maps a run and
example to a single input.

Example usage running the for an LLM, Chat Model, and Agent.

```
# Test running for the string evaluators
evaluator_names = ["qa", "criteria"]

model = ChatOpenAI()
configured_evaluators = load_run_evaluators_for_model(evaluator_names, model=model, reference_key="answer")
run_on_dataset(ds_name, model, run_evaluators=configured_evaluators)
```


<details>
  <summary>Full code with dataset upload</summary>
```
## Create dataset
from langchain.evaluation.run_evaluators.loading import load_run_evaluators_for_model
from langchain.evaluation import load_dataset
import pandas as pd

lcds = load_dataset("llm-math")
df = pd.DataFrame(lcds)

from uuid import uuid4
from langsmith import Client
client = Client()
ds_name = "llm-math - " + str(uuid4())[0:8]
ds = client.upload_dataframe(df, name=ds_name, input_keys=["question"], output_keys=["answer"])



## Define the models we'll test over
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, AgentType

from langchain.tools import tool

llm = OpenAI(temperature=0)
chat_model = ChatOpenAI(temperature=0)

@tool
    def sum(a: float, b: float) -> float:
        """Add two numbers"""
        return a + b
    
def construct_agent():
    return initialize_agent(
        llm=chat_model,
        tools=[sum],
        agent=AgentType.OPENAI_MULTI_FUNCTIONS,
    )

agent = construct_agent()

# Test running for the string evaluators
evaluator_names = ["qa", "criteria"]

models = [llm, chat_model, agent]
run_evaluators = []
for model in models:
    run_evaluators.append(load_run_evaluators_for_model(evaluator_names, model=model, reference_key="answer"))
    

# Run on LLM, Chat Model, and Agent
from langchain.client.runner_utils import run_on_dataset

to_test = [llm, chat_model, construct_agent]

for model, configured_evaluators in zip(to_test, run_evaluators):
    run_on_dataset(ds_name, model, run_evaluators=configured_evaluators, verbose=True)
```
</details>

---------

Co-authored-by: Nuno Campos <nuno@boringbits.io>
2023-07-07 19:57:59 -07:00
Bagatur
1ac347b4e3
update databerry-chaindesk redirect (#7378) 2023-07-07 19:11:46 -04:00