add synthetic data cookbook
parent
812a2dea93
commit
2905dd4094
@@ -0,0 +1,549 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "45b95acd-543f-4248-be8a-28e7379d2470",
"metadata": {},
"source": [
"# Introduction\n",
"\n",
"Ragas is the de facto open-source standard for RAG evaluation. It provides features and methods to help evaluate RAG applications. In this notebook we will build a synthetic test dataset with Ragas that you can use to evaluate your RAG pipeline.\n",
"\n",
"### Contents\n",
"- [Prerequisites](#Prerequisites)\n",
"- [Data preparation](#Data-preparation)\n",
"- [Test set generation](#Test-set-generation)\n",
"- [Saving results](#Saving-results)"
]
},
{
"cell_type": "markdown",
"id": "36edfc55-b18a-44db-bac1-c1ec0a91c9db",
"metadata": {},
"source": [
"### Prerequisites\n",
"- Ragas is a Python package, so we can install it with pip.\n",
"- To create QA pairs, you will need some documents to generate them from. For this notebook, I am using a few papers on prompt engineering.\n",
"- Ragas uses model-guided techniques under the hood to produce scores for each metric. In this tutorial we will use OpenAI `gpt-3.5-turbo` and `text-embedding-ada-002`. These are the default models used in Ragas, but you can use any LLM or embedding model of your choice by following this [guide](https://docs.ragas.io/en/stable/howtos/customisations/bring-your-own-llm-or-embs.html). I recommend trying this notebook with OpenAI first to get a feel for it with ease.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bc320c4f-2367-4ecc-b2a7-5df941e07bf9",
"metadata": {},
"outputs": [],
"source": [
"! pip install ragas"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "50779956",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cloning into 'prompt-engineering-guide-papers'...\n",
"remote: Enumerating objects: 15, done.\u001b[K\n",
"remote: Counting objects: 100% (12/12), done.\u001b[K\n",
"remote: Compressing objects: 100% (12/12), done.\u001b[K\n",
"remote: Total 15 (delta 1), reused 0 (delta 0), pack-reused 3\u001b[K\n",
"Unpacking objects: 100% (15/15), 2.70 MiB | 11.71 MiB/s, done.\n",
"Filtering content: 100% (2/2), 8.11 MiB | 5.72 MiB/s, done.\n"
]
}
],
"source": [
"!git clone https://huggingface.co/datasets/explodinggradients/prompt-engineering-guide-papers"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8dbfaeda-49a2-437f-8543-dd242c6422b2",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.environ[\"OPENAI_API_KEY\"] = \"<your-openai-api-key>\""
]
},
{
"cell_type": "markdown",
"id": "de2bf933-50cb-4d79-ad34-bed8db5a5872",
"metadata": {},
"source": [
"### Data preparation\n",
"\n",
"Here I am loading and parsing each of our documents into a `Document` object using LangChain document loaders. You can also use llama-index to do the same."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8dc30b79",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import DirectoryLoader\n",
"from ragas.testset.generator import TestsetGenerator\n",
"from ragas.testset.evolutions import simple, reasoning, multi_context, conditional\n",
"\n",
"loader = DirectoryLoader(\"./prompt-engineering-guide-papers\", use_multithreading=True, silent_errors=True, sample_size=5)\n",
"documents = loader.load()\n",
"\n",
"for document in documents:\n",
"    document.metadata['filename'] = document.metadata['source']"
]
},
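{
"cell_type": "markdown",
"id": "b7f01a2e-0000-4000-8000-1a2b3c4d5e01",
"metadata": {},
"source": [
"*Optional sanity check (not part of the original flow): confirm how many documents the loader actually picked up before generating the test set.*"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7f01a2e-0000-4000-8000-1a2b3c4d5e02",
"metadata": {},
"outputs": [],
"source": [
"# Optional: quick sanity check on the loaded corpus\n",
"print(f\"Loaded {len(documents)} documents\")"
]
},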
{
"cell_type": "markdown",
"id": "1e73de7f-b983-419a-9bd1-b60aae48dc67",
"metadata": {},
"source": [
"### Test set generation\n",
"\n",
"Ragas aims to create a high-quality and diverse test dataset containing questions of different difficulty levels and types. For this it uses a paradigm inspired by the idea of question evolution. You can control which types of questions Ragas synthesizes using the `distributions` parameter. Here I am creating samples with a uniform distribution over the question types.\n",
"\n",
"**Note:** *To learn more about the underlying paradigm, refer to our [docs](https://docs.ragas.io/en/stable/concepts/testset_generation.html).*"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "782f15f8-0503-48a7-9b38-5e59ce692c3e",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/var/folders/ww/sk5dkfhn673234cmy5w7008r0000gn/T/ipykernel_51325/2981689800.py:2: DeprecationWarning: The function with_openai was deprecated in 0.1.4, and will be removed in the 0.2.0 release. Use from_langchain instead.\n",
"  generator = TestsetGenerator.with_openai()\n"
]
}
],
"source": [
"generator = TestsetGenerator.with_openai()\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "360880ab-d5c7-485a-8ca0-fee1e639c8f6",
"metadata": {},
"outputs": [],
"source": [
"distributions = {simple: 0.25, reasoning: 0.25, multi_context: 0.25, conditional: 0.25}"
]
},
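{
"cell_type": "markdown",
"id": "c8e12f3a-0000-4000-8000-1a2b3c4d5e03",
"metadata": {},
"source": [
"*Optional check (my own addition, not required by Ragas): the evolution weights in `distributions` should sum to 1.0, so a quick assertion catches typos before the (slow) generation step.*"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c8e12f3a-0000-4000-8000-1a2b3c4d5e04",
"metadata": {},
"outputs": [],
"source": [
"# Optional: verify the question-type weights form a valid distribution\n",
"assert abs(sum(distributions.values()) - 1.0) < 1e-9"
]
},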
{
"cell_type": "code",
"execution_count": 6,
"id": "438335a5",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"embedding nodes: 0%| | 0/286 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "508f7c85484b49efadee68da0030eeec",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Generating: 0%| | 0/25 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"testset = generator.generate_with_langchain_docs(documents, test_size=25,\n",
"                                                 raise_exceptions=False, with_debugging_logs=False,\n",
"                                                 distributions=distributions)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c603d429",
"metadata": {},
"outputs": [],
"source": [
"df = testset.to_pandas()"
]
},
{
"cell_type": "markdown",
"id": "165e010d-3f8f-4201-bf1d-7cc3c0a13413",
"metadata": {},
"source": [
"And voilà! That's it. You now have a test dataset. Let's inspect and save it."
]
},
{
"cell_type": "markdown",
"id": "cb3721b0-1e04-4b25-9348-71c251c0eff9",
"metadata": {},
"source": [
"### Saving results"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "fc0f24ad-645a-4923-93ee-1e05acf0a47e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
"    .dataframe tbody tr th:only-of-type {\n",
"        vertical-align: middle;\n",
"    }\n",
"\n",
"    .dataframe tbody tr th {\n",
"        vertical-align: top;\n",
"    }\n",
"\n",
"    .dataframe thead th {\n",
"        text-align: right;\n",
"    }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>question</th>\n",
"      <th>contexts</th>\n",
"      <th>ground_truth</th>\n",
"      <th>evolution_type</th>\n",
"      <th>episode_done</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>How does instruction tuning affect the zero-sh...</td>\n",
"      <td>[ tasks (see Table 2 in the Appendix), FLAN on...</td>\n",
"      <td>For larger models on the order of 100B paramet...</td>\n",
"      <td>simple</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>What is the Zero-shot-CoT method and how does ...</td>\n",
"      <td>[ prompts have also focused on per-task engine...</td>\n",
"      <td>Zero-shot-CoT is a zero-shot template-based pr...</td>\n",
"      <td>simple</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>How does prompt tuning affect model performanc...</td>\n",
"      <td>[080.863.867.439.249.4\\n\\nTask Cluster:# datas...</td>\n",
"      <td>Prompt tuning improves model performance in im...</td>\n",
"      <td>simple</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>What is the purpose of instruction tuning in l...</td>\n",
"      <td>[ via natural language instructions, such as “...</td>\n",
"      <td>The purpose of instruction tuning in language ...</td>\n",
"      <td>reasoning</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>What distinguishes Zero-shot-CoT from Few-shot...</td>\n",
"      <td>[ prompts have also focused on per-task engine...</td>\n",
"      <td>Zero-shot-CoT differs from Few-shot-CoT in tha...</td>\n",
"      <td>reasoning</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>5</th>\n",
"      <td>Which language models were used in the experim...</td>\n",
"      <td>[list\\n\\n1. For all authors...\\n\\n(a) Do the m...</td>\n",
"      <td>The language models used in the experiment 'Ex...</td>\n",
"      <td>reasoning</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>6</th>\n",
"      <td>How does Zero-shot-CoT differ from previous fe...</td>\n",
"      <td>[ prompts have also focused on per-task engine...</td>\n",
"      <td>Zero-shot-CoT differs from previous few-shot a...</td>\n",
"      <td>reasoning</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>7</th>\n",
"      <td>What are the stages in the Zero-shot-CoT metho...</td>\n",
"      <td>[ it differs from most of the prior template p...</td>\n",
"      <td>The Zero-shot-CoT method for reasoning and ans...</td>\n",
"      <td>reasoning</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>8</th>\n",
"      <td>What are the main approaches for inducing LLMs...</td>\n",
"      <td>[2 2 0 2\\n\\nt c O 7\\n\\n] L C . s c [\\n\\n1 v 3 ...</td>\n",
"      <td>The main approaches for inducing LLMs to perfo...</td>\n",
"      <td>reasoning</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>9</th>\n",
"      <td>Which sorting method has the most impact on Au...</td>\n",
"      <td>[ t a R\\n\\n30\\n\\n20\\n\\n%\\n\\n(\\n\\ne t a R\\n\\n40...</td>\n",
"      <td>The sorting method that has the most impact on...</td>\n",
"      <td>multi_context</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>10</th>\n",
"      <td>What are the pros and cons of prompting method...</td>\n",
"      <td>[ prompts have also focused on per-task engine...</td>\n",
"      <td>Our work is based on prompting methods for lar...</td>\n",
"      <td>multi_context</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>11</th>\n",
"      <td>What are the stages in Zero-shot-CoT for reaso...</td>\n",
"      <td>[ it differs from most of the prior template p...</td>\n",
"      <td>Zero-shot-CoT involves two stages: reasoning e...</td>\n",
"      <td>multi_context</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>12</th>\n",
"      <td>How does the number of datasets and templates ...</td>\n",
"      <td>[oze\\n\\n94.8a 90.0 92.0 90.0 89.0 [10] 91.0 92...</td>\n",
"      <td>Using more datasets per task cluster improves ...</td>\n",
"      <td>multi_context</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>13</th>\n",
"      <td>What technique surpasses zero-shot large langu...</td>\n",
"      <td>[3 2 0 2\\n\\nn a J\\n\\n9 2\\n\\n] L C . s c [\\n\\n4...</td>\n",
"      <td>Chain of thought (CoT) prompting</td>\n",
"      <td>multi_context</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>14</th>\n",
"      <td>How does language model scale impact instructi...</td>\n",
"      <td>[ tasks (see Table 2 in the Appendix), FLAN on...</td>\n",
"      <td>For larger language models on the order of 100...</td>\n",
"      <td>conditional</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>15</th>\n",
"      <td>What's the advantage of using Zero-shot-CoT pr...</td>\n",
"      <td>[ prompts have also focused on per-task engine...</td>\n",
"      <td>Zero-shot-CoT prompts offer the advantage of n...</td>\n",
"      <td>conditional</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>16</th>\n",
"      <td>What's the difference in unresolving rate betw...</td>\n",
"      <td>[-Q-CoT.\\n\\nTo begin with, we invoke Zero-Shot...</td>\n",
"      <td>The unresolving rate of Retrieval-Q-CoT is 46....</td>\n",
"      <td>conditional</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>17</th>\n",
"      <td>What are the pros and cons of prompting method...</td>\n",
"      <td>[ prompts have also focused on per-task engine...</td>\n",
"      <td>Prompting methods for large language models ha...</td>\n",
"      <td>conditional</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>18</th>\n",
"      <td>What are the stages and processes in the Auto-...</td>\n",
"      <td>[ wrong demonstrations may be eliminated with ...</td>\n",
"      <td>The Auto-CoT method for constructing demonstra...</td>\n",
"      <td>conditional</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>19</th>\n",
"      <td>How are genes passed from one generation to th...</td>\n",
"      <td>[ Penguin is a kind of bird. Knowledge: Clouds...</td>\n",
"      <td>Genes are passed from parent to offspring.</td>\n",
"      <td>reasoning</td>\n",
"      <td>True</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" question \\\n",
"0 How does instruction tuning affect the zero-sh... \n",
"1 What is the Zero-shot-CoT method and how does ... \n",
"2 How does prompt tuning affect model performanc... \n",
"3 What is the purpose of instruction tuning in l... \n",
"4 What distinguishes Zero-shot-CoT from Few-shot... \n",
"5 Which language models were used in the experim... \n",
"6 How does Zero-shot-CoT differ from previous fe... \n",
"7 What are the stages in the Zero-shot-CoT metho... \n",
"8 What are the main approaches for inducing LLMs... \n",
"9 Which sorting method has the most impact on Au... \n",
"10 What are the pros and cons of prompting method... \n",
"11 What are the stages in Zero-shot-CoT for reaso... \n",
"12 How does the number of datasets and templates ... \n",
"13 What technique surpasses zero-shot large langu... \n",
"14 How does language model scale impact instructi... \n",
"15 What's the advantage of using Zero-shot-CoT pr... \n",
"16 What's the difference in unresolving rate betw... \n",
"17 What are the pros and cons of prompting method... \n",
"18 What are the stages and processes in the Auto-... \n",
"19 How are genes passed from one generation to th... \n",
"\n",
" contexts \\\n",
"0 [ tasks (see Table 2 in the Appendix), FLAN on... \n",
"1 [ prompts have also focused on per-task engine... \n",
"2 [080.863.867.439.249.4\\n\\nTask Cluster:# datas... \n",
"3 [ via natural language instructions, such as “... \n",
"4 [ prompts have also focused on per-task engine... \n",
"5 [list\\n\\n1. For all authors...\\n\\n(a) Do the m... \n",
"6 [ prompts have also focused on per-task engine... \n",
"7 [ it differs from most of the prior template p... \n",
"8 [2 2 0 2\\n\\nt c O 7\\n\\n] L C . s c [\\n\\n1 v 3 ... \n",
"9 [ t a R\\n\\n30\\n\\n20\\n\\n%\\n\\n(\\n\\ne t a R\\n\\n40... \n",
"10 [ prompts have also focused on per-task engine... \n",
"11 [ it differs from most of the prior template p... \n",
"12 [oze\\n\\n94.8a 90.0 92.0 90.0 89.0 [10] 91.0 92... \n",
"13 [3 2 0 2\\n\\nn a J\\n\\n9 2\\n\\n] L C . s c [\\n\\n4... \n",
"14 [ tasks (see Table 2 in the Appendix), FLAN on... \n",
"15 [ prompts have also focused on per-task engine... \n",
"16 [-Q-CoT.\\n\\nTo begin with, we invoke Zero-Shot... \n",
"17 [ prompts have also focused on per-task engine... \n",
"18 [ wrong demonstrations may be eliminated with ... \n",
"19 [ Penguin is a kind of bird. Knowledge: Clouds... \n",
"\n",
" ground_truth evolution_type \\\n",
"0 For larger models on the order of 100B paramet... simple \n",
"1 Zero-shot-CoT is a zero-shot template-based pr... simple \n",
"2 Prompt tuning improves model performance in im... simple \n",
"3 The purpose of instruction tuning in language ... reasoning \n",
"4 Zero-shot-CoT differs from Few-shot-CoT in tha... reasoning \n",
"5 The language models used in the experiment 'Ex... reasoning \n",
"6 Zero-shot-CoT differs from previous few-shot a... reasoning \n",
"7 The Zero-shot-CoT method for reasoning and ans... reasoning \n",
"8 The main approaches for inducing LLMs to perfo... reasoning \n",
"9 The sorting method that has the most impact on... multi_context \n",
"10 Our work is based on prompting methods for lar... multi_context \n",
"11 Zero-shot-CoT involves two stages: reasoning e... multi_context \n",
"12 Using more datasets per task cluster improves ... multi_context \n",
"13 Chain of thought (CoT) prompting multi_context \n",
"14 For larger language models on the order of 100... conditional \n",
"15 Zero-shot-CoT prompts offer the advantage of n... conditional \n",
"16 The unresolving rate of Retrieval-Q-CoT is 46.... conditional \n",
"17 Prompting methods for large language models ha... conditional \n",
"18 The Auto-CoT method for constructing demonstra... conditional \n",
"19 Genes are passed from parent to offspring. reasoning \n",
"\n",
" episode_done \n",
"0 True \n",
"1 True \n",
"2 True \n",
"3 True \n",
"4 True \n",
"5 True \n",
"6 True \n",
"7 True \n",
"8 True \n",
"9 True \n",
"10 True \n",
"11 True \n",
"12 True \n",
"13 True \n",
"14 True \n",
"15 True \n",
"16 True \n",
"17 True \n",
"18 True \n",
"19 True "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = df[df['ground_truth'] != \"nan\"].reset_index(drop=True)\n",
"df"
]
},
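{
"cell_type": "markdown",
"id": "d9f23a4b-0000-4000-8000-1a2b3c4d5e05",
"metadata": {},
"source": [
"*Optionally, before saving, you can check how many questions of each evolution type made it into the final dataset (the `evolution_type` column name is taken from the DataFrame shown above).*"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9f23a4b-0000-4000-8000-1a2b3c4d5e06",
"metadata": {},
"outputs": [],
"source": [
"# Optional: distribution of generated question types\n",
"df['evolution_type'].value_counts()"
]
},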
{
"cell_type": "code",
"execution_count": 19,
"id": "ad315aee-3029-46c2-812c-edf821e3f033",
"metadata": {},
"outputs": [],
"source": [
"df.to_csv(\"synthetic_test_dataset.csv\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "ragas",
"language": "python",
"name": "ragas"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}