add eval cookbook

pull/1106/head
Shahules786 3 months ago
parent bed41103a2
commit 812a2dea93

@@ -0,0 +1,162 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "18c253af-fdb3-414b-bc68-8bd18004f5cc",
"metadata": {},
"source": [
"# Introduction\n",
"\n",
"Ragas is the de-facto opensource standard for RAG evaluations. Ragas provides features and methods to help evaluate RAG applications. In this notebook we will cover basic steps for evaluating your RAG application with Ragas. \n",
"\n",
"### Contents\n",
"- [Prerequisites]()\n",
"- [Dataset preparation]()\n",
"- [Evaluation]()\n",
"- [Analysis]()"
]
},
{
"cell_type": "markdown",
"id": "73c40aa9-aa04-44fc-8ef3-2ab7bd132c36",
"metadata": {},
"source": [
"### Prerequisites\n",
"- Ragas is a python package and we can install it using pip\n",
"- Ragas uses model guided techniques underneath to produce scores for each metric. In this tutorial, we will use OpenAI `gpt-3.5-turbo` and `text-embedding-ada-002`. These are the default models used in ragas but you can use any LLM or Embedding of your choice by referring to this [guide](https://docs.ragas.io/en/stable/howtos/customisations/bring-your-own-llm-or-embs.html). I highly recommend that you try this notebook with open-ai so that you get a feel of it with ease."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "59db1ff3-b618-4dca-924b-035a2f5def0c",
"metadata": {},
"outputs": [],
"source": [
"! pip install ragas"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ebc22310-92b3-4819-ae93-2b9a85c25ef7",
"metadata": {},
"outputs": [],
"source": [
"! export OPENAI_API_KEY=\"your-openai-key\""
]
},
{
"cell_type": "markdown",
"id": "8ae41c3a-4fc2-4596-93f2-c763adeef56e",
"metadata": {},
"source": [
"And that's it. You're ready to go."
]
},
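{
"cell_type": "markdown",
"id": "f3d9a2b1-57c4-4e1a-9b0d-2c3d4e5f6a7b",
"metadata": {},
"source": [
"If you'd rather not use the defaults, here is a minimal sketch of overriding them, assuming the LangChain OpenAI integrations (`langchain_openai`); the model name below is just an example. You would pass these objects to `evaluate` via its `llm` and `embeddings` parameters later."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7e8f9a0-63d5-4b2c-8a1e-3e4f5a6b7c8d",
"metadata": {},
"outputs": [],
"source": [
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
"\n",
"# hypothetical overrides; ragas otherwise defaults to gpt-3.5-turbo and text-embedding-ada-002\n",
"custom_llm = ChatOpenAI(model=\"gpt-4\")\n",
"custom_embeddings = OpenAIEmbeddings(model=\"text-embedding-ada-002\")"
]
},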
{
"cell_type": "markdown",
"id": "b7b14d57-68cc-4beb-9b99-76827687db88",
"metadata": {},
"source": [
"## Dataset preparation\n",
"\n",
"Evaluating any ML pipeline will require several data points that constitues a test dataset. For Ragas, the data points required for evaluating your RAG completely are\n",
"\n",
"- `question`: A question or query that is relevant to your RAG.\n",
"- `contexts`: The retrieved contexts corresponding to each question. This is a `list[list]` since each question can retrieve multiple text chunks.\n",
"- `answer`: The answer generated by your RAG corresponding to each question.\n",
"- `ground_truth`: The expected correct answer corresponding to each question.\n",
"\n",
"For the purpose of this notebook, I have this dataset prepared from a simple RAG that I created myself to help me with NLP research. Let's use it."
]
},
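{
"cell_type": "markdown",
"id": "5a6b7c8d-9e2f-4d3a-8b1c-4a5b6c7d8e9f",
"metadata": {},
"source": [
"Before loading it, here is a minimal sketch of the expected schema, built as a one-row dataset by hand; all sample values are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9e8d7c6b-1f2a-4c5d-9e8f-5c6d7e8f9a0b",
"metadata": {},
"outputs": [],
"source": [
"from datasets import Dataset\n",
"\n",
"# a hypothetical single-sample dataset illustrating the expected columns\n",
"schema_example = Dataset.from_dict(\n",
"    {\n",
"        \"question\": [\"What is attention in transformers?\"],\n",
"        \"contexts\": [[\"Attention lets the model weigh tokens against each other.\"]],\n",
"        \"answer\": [\"Attention computes weighted interactions between tokens.\"],\n",
"        \"ground_truth\": [\"Attention is a mechanism that weighs token interactions.\"],\n",
"    }\n",
")\n",
"schema_example"
]
},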
{
"cell_type": "code",
"execution_count": null,
"id": "326f3dc1-775f-4ec5-8f27-afd76e9b5b22",
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"PATH = \"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ee49bc5-4661-4435-8463-197877c18fa3",
"metadata": {},
"outputs": [],
"source": [
"eval_dataset = load_dataset(PATH)\n",
"eval_dataset"
]
},
{
"cell_type": "markdown",
"id": "84ae0719-82bc-4103-8299-3df7021951e1",
"metadata": {},
"source": [
"As you can see, the dataset contains all the required attributes mentioned above. Now we can move on our next step of actually doing the evaluation with it.\n",
"\n",
"**Note:**\n",
"*We know that it's hard to formulate a test data containing Question and ground truth answer pairs when starting out. We have the perfect solution for this in this form of a ragas synthetic test data generation feature. The questions and ground truth answers were created by [ragas synthetic data generation]() feature. Check it out here once you finish this notebook*"
]
},
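{
"cell_type": "markdown",
"id": "0a1b2c3d-8d4e-4f5a-9b6c-6d7e8f9a0b1c",
"metadata": {},
"source": [
"For reference, a rough sketch of generating such a testset, assuming ragas' `TestsetGenerator` API and a corpus loaded with a LangChain `DirectoryLoader`; the directory path, test size, and distributions below are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4e5f6a7b-2c3d-4e5f-8a9b-7e8f9a0b1c2d",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders import DirectoryLoader\n",
"from ragas.testset.generator import TestsetGenerator\n",
"from ragas.testset.evolutions import simple, reasoning, multi_context\n",
"\n",
"# load your own corpus; the path is a placeholder\n",
"documents = DirectoryLoader(\"./your-docs/\").load()\n",
"\n",
"generator = TestsetGenerator.with_openai()\n",
"testset = generator.generate_with_langchain_docs(\n",
"    documents,\n",
"    test_size=10,\n",
"    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},\n",
")\n",
"testset.to_pandas()"
]
},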
{
"cell_type": "markdown",
"id": "6be10812-894e-43ba-857d-36627eb54dc8",
"metadata": {},
"source": [
"## Evaluation\n",
"For evaluation ragas provides several metrics which is aimed to quantify the end-end performance of the pipeline and also the component wise performance of the pipeline. For this tutorial let's consider few of them"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "98b6cbba-fecb-4b92-8cd9-839d80025b22",
"metadata": {},
"outputs": [],
"source": []
},
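{
"cell_type": "markdown",
"id": "8b9c0d1e-3f4a-4b5c-9d6e-8f9a0b1c2d3e",
"metadata": {},
"source": [
"`evaluate` returns a result object with one aggregate score per metric, each between 0 and 1 (higher is better). Aggregates are a quick health check, but the real insight comes from per-sample scores, which we look at next."
]
},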
{
"cell_type": "markdown",
"id": "7fa8eaaa-b6b9-4ae9-a2df-f65f0559b565",
"metadata": {},
"source": [
"## Analysis"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2ff74280-02c4-4992-998d-4a9689e47b89",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "ragas",
"language": "python",
"name": "ragas"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}