langchain/docs/use_cases/evaluation
Graham Neubig 31303d0b11
Added other evaluation metrics for data-augmented QA (#1521)
This PR adds additional evaluation metrics for data-augmented QA,
resulting in a report like this at the end of the notebook:

![Screen Shot 2023-03-08 at 8 53 23 AM](https://user-images.githubusercontent.com/398875/223731199-8eb8e77f-5ff3-40a2-a23e-f3bede623344.png)

The score calculation is based on the
[Critique](https://docs.inspiredco.ai/critique/) toolkit, an API-based
service (like OpenAI's) with minimal dependencies, so it should be easy
for people to run if they choose.
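
For reference, the scoring boils down to calls like the sketch below. It is a minimal illustration rather than the notebook's exact code: the metric names and configs shown, the shape of the `predictions` dicts, and the `"examples"`/`"value"` fields of the result are assumptions about the Critique client's interface.

```python
import os

from inspiredco.critique import Critique

# Client for the Critique evaluation API (assumes INSPIREDCO_API_KEY is set).
critique = Critique(api_key=os.environ["INSPIREDCO_API_KEY"])

# Metrics to report, each with its Critique config (names and configs are
# illustrative; see the Critique docs for the full list).
metrics = {
    "rouge": {"metric": "rouge", "config": {"variety": "rouge_l"}},
    "chrf": {"metric": "chrf", "config": {}},
    "bert_score": {"metric": "bert_score", "config": {"model": "bert-base-uncased"}},
}

# `predictions` is assumed to be the QA chain's output: one dict per example,
# with the generated answer under "result" and the gold answer under "answer".
predictions = [
    {"result": "The president discussed the economy.",
     "answer": "He talked about the economy."},
]

# Critique expects a dataset of {"target": ..., "references": [...]} items.
dataset = [
    {"target": pred["result"], "references": [pred["answer"]]}
    for pred in predictions
]

# One API call per metric; each result is assumed to carry per-example
# scores under "examples".
results = {
    name: critique.evaluate(metric=spec["metric"], config=spec["config"], dataset=dataset)
    for name, spec in metrics.items()
}

for name, result in results.items():
    for example, score in zip(dataset, result["examples"]):
        print(f"{name}: {score['value']:.4f}  target={example['target']!r}")
```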

The code could be simplified further by adding a chain that calls
Critique directly, but that should probably be saved for another PR if
it proves necessary. Any comments or change requests are welcome!
2023-03-08 20:41:03 -08:00
| File | Last commit | Date |
| --- | --- | --- |
| data_augmented_question_answering.ipynb | Added other evaluation metrics for data-augmented QA (#1521) | 2023-03-08 20:41:03 -08:00 |
| huggingface_datasets.ipynb | Update huggingface_datasets.ipynb (#1417) | 2023-03-04 00:22:31 -08:00 |
| question_answering.ipynb | feat: add custom prompt for QAEvalChain chain (#610) | 2023-01-14 07:23:48 -08:00 |