" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a>\n",
"\n",
"Ragas is the de-facto open-source standard for RAG evaluation. Ragas provides features and methods to help evaluate RAG applications. In this notebook we will cover the basic steps for evaluating your RAG application with Ragas.\n",
"\n",
"As you can see, the dataset contains two of the required attributes mentioned above, namely the `question` and `ground_truth` answers. Now we can move on to our next step of collecting the other two attributes.\n",
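"\n",
"As a minimal sketch (plain Python; the example values are hypothetical), each test sample at this stage carries only these two attributes, and the RAG pipeline will fill in the rest:\n",
"\n",
"```python\n",
"# A test sample before running the RAG pipeline: only the\n",
"# human-curated attributes are present.\n",
"test_sample = {\n",
"    \"question\": \"What is instruction tuning?\",\n",
"    \"ground_truth\": \"Instruction tuning fine-tunes a model on instruction-response pairs.\",\n",
"}\n",
"\n",
"# The pipeline must supply the remaining two attributes.\n",
"to_collect = {\"contexts\", \"answer\"}\n",
"assert to_collect.isdisjoint(test_sample)\n",
"```\n",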
"\n",
"**Note:**\n",
"*We know that it's hard to formulate test data containing question and ground-truth answer pairs when starting out. We have the perfect solution for this in the form of ragas' synthetic test data generation feature. The questions and ground truth answers here were created by the [ragas synthetic data generation]() feature. Check it out once you finish this notebook.*"
]
},
{
"cell_type": "markdown",
"id": "6184b6b5-7373-4665-9754-b4fc08929000",
"metadata": {},
"source": [
"#### Simple RAG pipeline\n",
"\n",
"Now with the above step we have two of the attributes needed for evaluation, namely the `question` and `ground_truth` answers. We now need to feed these test questions to our RAG pipeline to collect the other two attributes, i.e. `contexts` and `answer`. Let's build a simple RAG pipeline using llama-index to do that.\n",
"\n",
"For evaluation, ragas provides several metrics aimed at quantifying both the end-to-end performance of the pipeline and the performance of its individual components. For this tutorial let's consider a few of them.\n",
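"\n",
"Once the pipeline has run, every sample should carry all four attributes before scoring. A minimal sketch in plain Python (the values are hypothetical) of a fully collected sample:\n",
"\n",
"```python\n",
"# A fully collected evaluation sample: all four attributes present.\n",
"sample = {\n",
"    \"question\": \"What is instruction tuning?\",\n",
"    \"contexts\": [\"Instruction tuning fine-tunes a model on ...\"],  # retrieved chunks\n",
"    \"answer\": \"Instruction tuning adapts a model to follow instructions.\",  # generated answer\n",
"    \"ground_truth\": \"Instruction tuning fine-tunes a model on instruction-response pairs.\",\n",
"}\n",
"\n",
"required = {\"question\", \"contexts\", \"answer\", \"ground_truth\"}\n",
"assert required <= sample.keys()\n",
"```\n",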
"\n",
"**Note**: *Refer to our [metrics](https://docs.ragas.io/en/stable/concepts/metrics/index.html) docs to read more about the different metrics.*\n",
"\n",
"You can export the individual scores to a dataframe and analyse them. You can also add [callbacks and tracing](https://docs.ragas.io/en/latest/howtos/applications/tracing.html) to ragas for in-depth analysis."
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "2ff74280-02c4-4992-998d-4a9689e47b89",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>question</th>\n",
" <th>answer</th>\n",
" <th>contexts</th>\n",
" <th>ground_truth</th>\n",
" <th>answer_correctness</th>\n",
" <th>faithfulness</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>How does instruction tuning affect the zero-sh...</td>\n",
" <td>Instruction tuning enhances the zero-shot perf...</td>\n",
" <td>[34\\nthe effectiveness of different constructi...</td>\n",
" <td>For larger models on the order of 100B paramet...</td>\n",
" <td>0.781983</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>What is the Zero-shot-CoT method and how does ...</td>\n",
" <td>Zero-shot-CoT is a method that involves append...</td>\n",