From cbe292bd93632d7df9291d1ae17b294f49f0acf9 Mon Sep 17 00:00:00 2001 From: simonpfish Date: Tue, 22 Aug 2023 13:48:39 -0700 Subject: [PATCH] update fine-tuning cookbook --- examples/How_to_finetune_chat_models.ipynb | 686 +++++++ ...odels_for_structured_data_extraction.ipynb | 1598 ----------------- 2 files changed, 686 insertions(+), 1598 deletions(-) create mode 100644 examples/How_to_finetune_chat_models.ipynb delete mode 100644 examples/How_to_finetune_chat_models_for_structured_data_extraction.ipynb diff --git a/examples/How_to_finetune_chat_models.ipynb b/examples/How_to_finetune_chat_models.ipynb new file mode 100644 index 00000000..c368790b --- /dev/null +++ b/examples/How_to_finetune_chat_models.ipynb @@ -0,0 +1,686 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7bcdad0e-b67c-4927-b00e-3a4d950cd8ce", + "metadata": {}, + "source": [ + "# How to fine-tune chat models\n", + "\n", + "This notebook provides a step-by-step guide for our new `gpt-3.5-turbo` fine-tuning. We'll perform entity extraction using the [RecipeNLG dataset](https://github.com/Glorf/recipenlg), which provides various recipes and a list of extracted generic ingredients for each. This is a common dataset for named entity recognition (NER) tasks.\n", + "\n", + "We will go through the following steps:\n", + "\n", + "1. **Setup:** Loading our dataset and filtering down to one domain to fine-tune on.\n", + "2. **Data preparation:** Preparing your data for fine-tuning by creating training and validation examples, and uploading them to the `Files` endpoint.\n", + "3. **Create the fine-tune:** Creating your fine-tuned model.\n", + "4. 
**Use model for inference:** Using your fine-tuned model for inference on new inputs.\n", + "\n", + "By the end of this notebook you should be able to train, evaluate and deploy a fine-tuned `gpt-3.5-turbo` model.\n" + ] + }, + { + "cell_type": "markdown", + "id": "6f49cb10-f895-41f4-aa97-da606d0084d4", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "First we will import any required libraries and prepare our data.\n", + "\n", + "Fine-tuning works best when focused on a particular domain. It's important to make sure your dataset is both focused enough for the model to learn and general enough that unseen examples won't be missed. With this in mind, we have already extracted a subset of the RecipeNLG dataset that only contains documents from www.cookbooks.com.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "32036e70", + "metadata": {}, + "outputs": [], + "source": [ + "# make sure to use the latest version of the openai python package\n", + "!pip install --upgrade openai " + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "id": "6e1f4403-37e1-4115-a215-12fd7daa1eb6", + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "import openai\n", + "import os\n", + "import pandas as pd\n", + "import requests\n", + "from pprint import pprint\n", + "\n", + "OPENAI_API_KEY = os.getenv(\"OPENAI_API_KEY\", \"\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "f57ebc23-14b7-47f9-90b8-1d791ccfc9bc", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " title ingredients \\\n", + "0 No-Bake Nut Cookies [\"1 c. firmly packed brown sugar\", \"1/2 c. eva... \n", + "1 Jewell Ball'S Chicken [\"1 small jar chipped beef, cut up\", \"4 boned ... \n", + "2 Creamy Corn [\"2 (16 oz.) pkg. frozen corn\", \"1 (8 oz.) pkg... \n", + "3 Chicken Funny [\"1 large whole chicken\", \"2 (10 1/2 oz.) cans... \n", + "4 Reeses Cups(Candy) [\"1 c. peanut butter\", \"3/4 c. graham cracker ... \n", + "\n", + " directions \\\n", + "0 [\"In a heavy 2-quart saucepan, mix brown sugar... \n", + "1 [\"Place chipped beef on bottom of baking dish.... \n", + "2 [\"In a slow cooker, combine all ingredients. C... \n", + "3 [\"Boil and debone chicken.\", \"Put bite size pi... \n", + "4 [\"Combine first four ingredients and press in ... \n", + "\n", + " link source \\\n", + "0 www.cookbooks.com/Recipe-Details.aspx?id=44874 www.cookbooks.com \n", + "1 www.cookbooks.com/Recipe-Details.aspx?id=699419 www.cookbooks.com \n", + "2 www.cookbooks.com/Recipe-Details.aspx?id=10570 www.cookbooks.com \n", + "3 www.cookbooks.com/Recipe-Details.aspx?id=897570 www.cookbooks.com \n", + "4 www.cookbooks.com/Recipe-Details.aspx?id=659239 www.cookbooks.com \n", + "\n", + " NER \n", + "0 [\"brown sugar\", \"milk\", \"vanilla\", \"nuts\", \"bu... \n", + "1 [\"beef\", \"chicken breasts\", \"cream of mushroom... \n", + "2 [\"frozen corn\", \"cream cheese\", \"butter\", \"gar... \n", + "3 [\"chicken\", \"chicken gravy\", \"cream of mushroo... \n", + "4 [\"peanut butter\", \"graham cracker crumbs\", \"bu... 
" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Read in the dataset we'll use for this task.\n", + "# This will be the RecipesNLG dataset, which we've cleaned to only contain documents from www.cookbooks.com\n", + "recipe_df = pd.read_csv(\"data/cookbook_recipes_nlg_10k.csv\")\n", + "\n", + "recipe_df.head()" + ] + }, + { + "cell_type": "markdown", + "id": "2b3151e9-8715-47bd-a153-195d6a0d0a70", + "metadata": {}, + "source": [ + "## Data preparation\n", + "\n", + "We'll begin by preparing our data. When fine-tuning with the `ChatCompletion` format, each training example is a simple list of `messages`. For example, an entry could look like:\n", + "\n", + "```\n", + "[{'role': 'system',\n", + " 'content': 'You are a helpful recipe assistant. You are to extract the generic ingredients from each of the recipes provided.'},\n", + "\n", + " {'role': 'user',\n", + " 'content': 'Title: No-Bake Nut Cookies\\n\\nIngredients: [\"1 c. firmly packed brown sugar\", \"1/2 c. evaporated milk\", \"1/2 tsp. vanilla\", \"1/2 c. broken nuts (pecans)\", \"2 Tbsp. butter or margarine\", \"3 1/2 c. bite size shredded rice biscuits\"]\\n\\nGeneric ingredients: '},\n", + "\n", + " {'role': 'assistant',\n", + " 'content': '[\"brown sugar\", \"milk\", \"vanilla\", \"nuts\", \"butter\", \"bite size shredded rice biscuits\"]'}]\n", + "```\n", + "\n", + "During the training process this conversation will be split, with the final entry being the `completion` that the model will produce, and the remainder of the `messages` acting as the prompt. Consider this when building your training examples - if your model will act on multi-turn conversations, then please provide representative examples so it doesn't perform poorly when the conversation starts to expand.\n", + "\n", + "For fine-tuning with `ChatCompletion` you can begin with even 30-50 well-pruned examples. 
You should see performance continue to scale linearly as you increase the size of the training set, but your jobs will also take longer.\n", + "\n", + "Please note that currently there is a 4096 token limit for each training example. Anything longer than this will be truncated at 4096 tokens.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "9a8216b0-d1dc-472d-b07d-1be03acd70a5", + "metadata": {}, + "outputs": [], + "source": [ + "training_data = []\n", + "\n", + "system_message = \"You are a helpful recipe assistant. You are to extract the generic ingredients from each of the recipes provided.\"\n", + "\n", + "\n", + "def create_user_message(row):\n", + " return f\"\"\"Title: {row['title']}\\n\\nIngredients: {row['ingredients']}\\n\\nGeneric ingredients: \"\"\"\n", + "\n", + "\n", + "# Take first 100 records for training\n", + "for x, y in recipe_df.head(100).iterrows():\n", + " training_message = []\n", + " training_message.append({\"role\": \"system\", \"content\": system_message})\n", + "\n", + " user_message = create_user_message(y)\n", + " training_message.append({\"role\": \"user\", \"content\": user_message})\n", + "\n", + " training_message.append({\"role\": \"assistant\", \"content\": y[\"NER\"]})\n", + " training_message_dict = {\"messages\": training_message}\n", + " training_data.append(training_message_dict)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "5b853efa-dfea-4770-ab88-9b7e17794421", + "metadata": {}, + "outputs": [], + "source": [ + "validation_data = []\n", + "\n", + "# We'll pick a test set from further on in the dataset\n", + "test_df = recipe_df.loc[100:200]\n", + "\n", + "for x, y in test_df.iterrows():\n", + " validation_message = []\n", + " validation_message.append({\"role\": \"system\", \"content\": system_message})\n", + "\n", + " user_message = create_user_message(y)\n", + " validation_message.append({\"role\": \"user\", \"content\": user_message})\n", + "\n", + " 
validation_message.append({\"role\": \"assistant\", \"content\": y[\"NER\"]})\n", + " validation_message_dict = {\"messages\": validation_message}\n", + " validation_data.append(validation_message_dict)" + ] + }, + { + "cell_type": "markdown", + "id": "1d5e7bfe-f6c8-4a23-a951-3df3f3791d7f", + "metadata": {}, + "source": [ + "We then need to export these as `.jsonl` files, with each line being one training example.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "8d2eb207-2c2b-43f6-a613-64a7e92d494d", + "metadata": {}, + "outputs": [], + "source": [ + "def dicts_to_jsonl(data_list: list, filename: str) -> None:\n", + "    \"\"\"\n", + "    Save a list of dicts to a JSONL file, one JSON object per line.\n", + "    :param data_list: (list) list of dicts to be stored,\n", + "    :param filename: (str) path to the output file. If the .jsonl suffix is missing,\n", + "        it is appended to the filename.\n", + "    \"\"\"\n", + "    sjsonl = \".jsonl\"\n", + "\n", + "    # Check filename\n", + "    if not filename.endswith(sjsonl):\n", + "        filename = filename + sjsonl\n", + "\n", + "    # Save data\n", + "    with open(filename, \"w\") as out:\n", + "        for ddict in data_list:\n", + "            jout = json.dumps(ddict) + \"\\n\"\n", + "            out.write(jout)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "8b53e7a2-1cac-4c5f-8ba4-3292ba2a0770", + "metadata": {}, + "outputs": [], + "source": [ + "# Save training_data to JSONL\n", + "training_file_name = \"tmp_recipe_finetune_training\"\n", + "dicts_to_jsonl(training_data, training_file_name)\n", + "\n", + "# Save validation_data to JSONL\n", + "validation_file_name = \"tmp_recipe_finetune_validation\"\n", + "dicts_to_jsonl(validation_data, validation_file_name)" + ] + }, + { + "cell_type": "markdown", + "id": "0d149e2e-50dd-45c1-bd8d-1291975670b4", + "metadata": {}, + "source": [ + "### Upload files\n", + "\n", + "You can then upload the files to our `Files` endpoint to be used by the fine-tuning job.\n" + ] + }, + { + "cell_type": 
"code", + "execution_count": 21, + "id": "69462d9e-e6bd-49b9-a064-9eae4ea5b7a8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Training file id: file-XMhftmLXyyTvvmiLpMRMIAcL\n", + "Validation file id: file-kZz433aerpPIMADZF7xC5NKd\n" + ] + } + ], + "source": [ + "training_response = openai.File.create(\n", + " file=open(training_file_name + \".jsonl\", \"rb\"), purpose=\"fine-tune\"\n", + ")\n", + "training_file_id = training_response[\"id\"]\n", + "\n", + "validation_response = openai.File.create(\n", + " file=open(validation_file_name + \".jsonl\", \"rb\"), purpose=\"fine-tune\"\n", + ")\n", + "validation_file_id = validation_response[\"id\"]\n", + "\n", + "print(\"Training file id:\", training_file_id)\n", + "print(\"Validation file id:\", validation_file_id)" + ] + }, + { + "cell_type": "markdown", + "id": "d61cd381-63ad-4ed9-b0be-47a438891028", + "metadata": {}, + "source": [ + "### Create fine-tune job\n", + "\n", + "Now we can create our fine-tuning job with the generated files and an optional suffix to identify the model. The response will contain an `id` which you can use to retrieve updates on the job.\n", + "\n", + "Note: The files have to first be processed by our system, so you might get a `File is not ready` error. 
In that case, simply retry a few minutes later.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "05541ceb-5628-447e-962d-7e57c112439c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"object\": \"fine_tuning.job\",\n", + " \"id\": \"ftjob-ksOzx7zjpsrADZfhB5eyfB0Z\",\n", + " \"model\": \"gpt-3.5-turbo-0613\",\n", + " \"created_at\": 1692734343,\n", + " \"finished_at\": null,\n", + " \"fine_tuned_model\": null,\n", + " \"organization_id\": \"org-l89177bnhkme4a44292n5r3j\",\n", + " \"result_files\": [],\n", + " \"status\": \"created\",\n", + " \"validation_file\": \"file-kZz433aerpPIMADZF7xC5NKd\",\n", + " \"training_file\": \"file-XMhftmLXyyTvvmiLpMRMIAcL\",\n", + " \"hyperparameters\": {\n", + " \"n_epochs\": 3\n", + " },\n", + " \"trained_tokens\": null\n", + "}\n" + ] + } + ], + "source": [ + "suffix_name = \"recipe-ner\"\n", + "\n", + "\n", + "response = openai.FineTuningJob.create(\n", + "    training_file=training_file_id,\n", + "    validation_file=validation_file_id,\n", + "    model=\"gpt-3.5-turbo\",\n", + "    suffix=suffix_name,\n", + ")\n", + "\n", + "job_id = response[\"id\"]\n", + "\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "1de3ed71-f2d4-4138-95a3-70da187a007e", + "metadata": {}, + "source": [ + "#### Check job status\n", + "\n", + "You can make a `GET` request to the `https://api.openai.com/v1/fine_tuning/jobs` endpoint to list your fine-tuning jobs. 
In this instance you'll want to check that the ID you got from the previous step ends up as `status: succeeded`.\n", + "\n", + "Once it is completed, you can use the `result_files` to sample the results from the validation set (if you uploaded one), and use the ID from the `fine_tuned_model` parameter to invoke your trained model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "id": "d7392f48", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"object\": \"fine_tuning.job\",\n", + " \"id\": \"ftjob-ksOzx7zjpsrADZfhB5eyfB0Z\",\n", + " \"model\": \"gpt-3.5-turbo-0613\",\n", + " \"created_at\": 1692734343,\n", + " \"finished_at\": 1692735182,\n", + " \"fine_tuned_model\": \"ft:gpt-3.5-turbo-0613:openai:recipe-ner:7qS2GFaX\",\n", + " \"organization_id\": \"org-l89177bnhkme4a44292n5r3j\",\n", + " \"result_files\": [\n", + " \"file-Tjt0E6BvQ846m75gYxtWiRzb\"\n", + " ],\n", + " \"status\": \"succeeded\",\n", + " \"validation_file\": \"file-kZz433aerpPIMADZF7xC5NKd\",\n", + " \"training_file\": \"file-XMhftmLXyyTvvmiLpMRMIAcL\",\n", + " \"hyperparameters\": {\n", + " \"n_epochs\": 3\n", + " },\n", + " \"trained_tokens\": 39687\n", + "}\n" + ] + } + ], + "source": [ + "response = openai.FineTuningJob.retrieve(job_id)\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "30a57fbb", + "metadata": {}, + "source": [ + "We can track the progress of the fine-tune with the events endpoint. You can rerun the cell below a few times until the fine-tune is ready. 
\n" + ] + }, + { + "cell_type": "code", + "execution_count": 69, + "id": "08cace28", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Created fine-tune: ftjob-ksOzx7zjpsrADZfhB5eyfB0Z\n", + "Fine tuning job started\n", + "Step 10: training loss=0.16\n", + "Step 20: training loss=0.14\n", + "Step 30: training loss=0.30\n", + "Step 40: training loss=0.17\n", + "Step 50: training loss=0.03\n", + "Step 60: training loss=0.55\n", + "Step 70: training loss=0.09\n", + "Step 80: training loss=0.00\n", + "Step 90: training loss=0.15\n", + "Step 100: training loss=0.06\n", + "Step 110: training loss=0.03\n", + "Step 120: training loss=0.04\n", + "Step 130: training loss=0.21\n", + "Step 140: training loss=0.00\n", + "Step 150: training loss=0.02\n", + "Step 160: training loss=0.00\n", + "Step 170: training loss=0.00\n", + "Step 180: training loss=0.19\n", + "Step 190: training loss=0.55\n", + "Step 200: training loss=0.01\n", + "Step 210: training loss=0.00\n", + "Step 220: training loss=0.00\n", + "Step 230: training loss=0.00\n", + "Step 240: training loss=0.00\n", + "Step 250: training loss=0.00\n", + "Step 260: training loss=0.36\n", + "Step 270: training loss=0.16\n", + "Step 280: training loss=0.01\n", + "Step 290: training loss=0.04\n", + "Step 300: training loss=0.53\n", + "New fine-tuned model created: ft:gpt-3.5-turbo-0613:openai:recipe-ner:7qS2GFaX\n", + "Fine-tuning job successfully completed\n" + ] + } + ], + "source": [ + "response = openai.FineTuningJob.list_events(id=job_id, limit=50)\n", + "\n", + "events = response[\"data\"]\n", + "events.reverse()\n", + "\n", + "for event in events:\n", + "    print(event[\"message\"])" + ] + }, + { + "cell_type": "markdown", + "id": "d0da4e32", + "metadata": {}, + "source": [ + "Now that it's done, we can get the fine-tuned model ID from the job.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "id": "40b28c26", + "metadata": {}, + "outputs": [ + { + "name": 
"stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"object\": \"fine_tuning.job\",\n", + " \"id\": \"ftjob-ksOzx7zjpsrADZfhB5eyfB0Z\",\n", + " \"model\": \"gpt-3.5-turbo-0613\",\n", + " \"created_at\": 1692734343,\n", + " \"finished_at\": 1692735182,\n", + " \"fine_tuned_model\": \"ft:gpt-3.5-turbo-0613:openai:recipe-ner:7qS2GFaX\",\n", + " \"organization_id\": \"org-l89177bnhkme4a44292n5r3j\",\n", + " \"result_files\": [\n", + " \"file-Tjt0E6BvQ846m75gYxtWiRzb\"\n", + " ],\n", + " \"status\": \"succeeded\",\n", + " \"validation_file\": \"file-kZz433aerpPIMADZF7xC5NKd\",\n", + " \"training_file\": \"file-XMhftmLXyyTvvmiLpMRMIAcL\",\n", + " \"hyperparameters\": {\n", + " \"n_epochs\": 3\n", + " },\n", + " \"trained_tokens\": 39687\n", + "}\n", + "\n", + "Fine-tuned model id: ft:gpt-3.5-turbo-0613:openai:recipe-ner:7qS2GFaX\n" + ] + } + ], + "source": [ + "response = openai.FineTuningJob.retrieve(job_id)\n", + "fine_tuned_model_id = response[\"fine_tuned_model\"]\n", + "\n", + "print(response)\n", + "print(\"\\nFine-tuned model id:\", fine_tuned_model_id)" + ] + }, + { + "cell_type": "markdown", + "id": "0025e392-84cd-4566-a384-ea31ca43e567", + "metadata": {}, + "source": [ + "## Generate with the fine-tuned model" + ] + }, + { + "cell_type": "markdown", + "id": "0ab9ac11", + "metadata": {}, + "source": [ + "The last step is to use your fine-tuned model for inference. As with the legacy fine-tuning endpoint, you simply call `ChatCompletion` with your fine-tuned model ID as the `model` parameter." + ] + }, + { + "cell_type": "code", + "execution_count": 87, + "id": "1c7de631-b68f-4eff-9ae7-051641579c2b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[{'content': 'You are a helpful recipe assistant. 
You are to extract the '\n", + " 'generic ingredients from each of the recipes provided.',\n", + " 'role': 'system'},\n", + " {'content': 'Title: Pancakes\\n'\n", + " '\\n'\n", + " 'Ingredients: [\"1 c. flour\", \"1 tsp. soda\", \"1 tsp. salt\", \"1 '\n", + " 'Tbsp. sugar\", \"1 egg\", \"3 Tbsp. margarine, melted\", \"1 c. '\n", + " 'buttermilk\"]\\n'\n", + " '\\n'\n", + " 'Generic ingredients: ',\n", + " 'role': 'user'}]\n" + ] + } + ], + "source": [ + "test_row = test_df.iloc[0]\n", + "test_messages = []\n", + "test_messages.append({\"role\": \"system\", \"content\": system_message})\n", + "user_message = create_user_message(test_row)\n", + "test_messages.append({\"role\": \"user\", \"content\": user_message})\n", + "\n", + "pprint(test_messages)" + ] + }, + { + "cell_type": "code", + "execution_count": 88, + "id": "1a1d2589", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\"flour\", \"soda\", \"salt\", \"sugar\", \"egg\", \"margarine\", \"buttermilk\"]\n" + ] + } + ], + "source": [ + "response = openai.ChatCompletion.create(\n", + "    model=fine_tuned_model_id, messages=test_messages, temperature=0, max_tokens=500\n", + ")\n", + "print(response[\"choices\"][0][\"message\"][\"content\"])" + ] + }, + { + "cell_type": "markdown", + "id": "07799909-3f2a-4274-b81e-dabc048be28f", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "Congratulations, you are now ready to fine-tune your own models using the `ChatCompletion` format! 
We look forward to seeing what you build.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/How_to_finetune_chat_models_for_structured_data_extraction.ipynb b/examples/How_to_finetune_chat_models_for_structured_data_extraction.ipynb deleted file mode 100644 index 6ba6e2f5..00000000 --- a/examples/How_to_finetune_chat_models_for_structured_data_extraction.ipynb +++ /dev/null @@ -1,1598 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "7bcdad0e-b67c-4927-b00e-3a4d950cd8ce", - "metadata": {}, - "source": [ - "# How to fine-tune chat models for structured data extraction\n", - "\n", - "This notebook walks through using our new fine-tuning feature in `ChatCompletion` format. 
To illustrate this example we'll perform entity extraction using the [RecipeNLG dataset](https://github.com/Glorf/recipenlg), which provides the title, ingredients and directions of a food dish and extracts generic ingredients from it.\n", - "\n", - "We will walk through the end-to-end process which consists of:\n", - "- **Setup:** Loading our dataset and filtering down to one domain to fine-tune on.\n", - "- **Few-shot predictions:** Run a first attempt using few-shot examples, so we have a baseline to measure how well our fine-tune performs.\n", - "- **Data preparation:** Prepare your data for fine-tuning by creating training/validation examples and uploading to the `Files` endpoint.\n", - "- **Create and evaluate fine-tune:** Create your model and evaluate it against a validation set to measure its performance.\n", - "- **Use model for inference:** Close off the process by using your model for inference on new inputs.\n", - "\n", - "By the end of this you should be able to train, evaluate and deploy your fine-tuned models confidently. \n" - ] - }, - { - "cell_type": "markdown", - "id": "6f49cb10-f895-41f4-aa97-da606d0084d4", - "metadata": {}, - "source": [ - "## Setup\n", - "\n", - "Import any required libraries and prepare our data.\n", - "\n", - "Our data preparation includes breaking our source data up by where it was sourced from, and then focusing on one source only. This is an important consideration when fine-tuning - it will work best when tuning to a particular domain, so make sure your dataset is both focused enough for the model to learn, but general enough that unseen examples won't be missed." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "32036e70", - "metadata": {}, - "outputs": [], - "source": [ - "!pip install matplotlib \n", - "!pip install openai \n", - "!pip install pandas\n", - "!pip install requests" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "6e1f4403-37e1-4115-a215-12fd7daa1eb6", - "metadata": {}, - "outputs": [], - "source": [ - "import ast\n", - "import json\n", - "import matplotlib\n", - "import openai\n", - "import os\n", - "import pandas as pd\n", - "import requests\n", - "\n", - "GPT_MODEL = 'gpt-3.5-turbo'\n" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "f57ebc23-14b7-47f9-90b8-1d791ccfc9bc", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " title ingredients \\\n", - "0 No-Bake Nut Cookies [\"1 c. firmly packed brown sugar\", \"1/2 c. eva... \n", - "1 Jewell Ball'S Chicken [\"1 small jar chipped beef, cut up\", \"4 boned ... \n", - "2 Creamy Corn [\"2 (16 oz.) pkg. frozen corn\", \"1 (8 oz.) pkg... \n", - "3 Chicken Funny [\"1 large whole chicken\", \"2 (10 1/2 oz.) cans... \n", - "4 Reeses Cups(Candy) [\"1 c. peanut butter\", \"3/4 c. graham cracker ... \n", - "\n", - " directions \\\n", - "0 [\"In a heavy 2-quart saucepan, mix brown sugar... \n", - "1 [\"Place chipped beef on bottom of baking dish.... \n", - "2 [\"In a slow cooker, combine all ingredients. C... \n", - "3 [\"Boil and debone chicken.\", \"Put bite size pi... \n", - "4 [\"Combine first four ingredients and press in ... \n", - "\n", - " link source \\\n", - "0 www.cookbooks.com/Recipe-Details.aspx?id=44874 www.cookbooks.com \n", - "1 www.cookbooks.com/Recipe-Details.aspx?id=699419 www.cookbooks.com \n", - "2 www.cookbooks.com/Recipe-Details.aspx?id=10570 www.cookbooks.com \n", - "3 www.cookbooks.com/Recipe-Details.aspx?id=897570 www.cookbooks.com \n", - "4 www.cookbooks.com/Recipe-Details.aspx?id=659239 www.cookbooks.com \n", - "\n", - " NER \n", - "0 [\"brown sugar\", \"milk\", \"vanilla\", \"nuts\", \"bu... \n", - "1 [\"beef\", \"chicken breasts\", \"cream of mushroom... \n", - "2 [\"frozen corn\", \"cream cheese\", \"butter\", \"gar... \n", - "3 [\"chicken\", \"chicken gravy\", \"cream of mushroo... \n", - "4 [\"peanut butter\", \"graham cracker crumbs\", \"bu... 
" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Read in the dataset we'll use for this task.\n", - "# This will be the RecipesNLG dataset, which we've cleaned to only contain documents from www.cookbooks.com\n", - "recipe_df = pd.read_csv('data/cookbook_recipes_nlg_10k.csv')\n", - "\n", - "recipe_df.head()\n" - ] - }, - { - "cell_type": "markdown", - "id": "a2ad00de-18fe-4454-b27d-9758c6f73365", - "metadata": {}, - "source": [ - "## Few-shot\n", - "\n", - "First we'll try to solve this problem few-shot by giving 5 examples and then measuring the proportion of entities that were accurately tagged for each entry.\n", - "\n", - "First we set a basic system prompt that outlines our task, to extract generic ingredients from each recipe provided." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "e2ba5fc2-8d5d-43ad-b51b-d625dcd4cfc9", - "metadata": {}, - "outputs": [], - "source": [ - "messages = []\n", - "system_prompt = 'You are a helpful recipe assistant. You are to extract the generic ingredients from each of the recipes provided.'\n", - "messages.append({\"role\":\"system\",\"content\":system_prompt})\n" - ] - }, - { - "cell_type": "markdown", - "id": "120fb541-a0bb-4fbc-87eb-419b9356b6e8", - "metadata": {}, - "source": [ - "Next we'll build up our user prompt with this `create_prompt` function. As with the old `FineTuning` endpoint, this is a critical point - you need to make sure your training examples match the format the model will see in production, otherwise your results may be poor." 
- ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "7d5cf5bd-618f-4f89-9649-8cd2c286dcb4", - "metadata": {}, - "outputs": [], - "source": [ - "def create_prompt(row):\n", - " \"\"\"Simple function to take in one of our observations and build it into a prompt\"\"\"\n", - " title = row['title']\n", - " ingredients = row['ingredients']\n", - " directions = row['directions']\n", - "\n", - " user_prompt = f'''Title: {title}\\n\\nIngredients: {ingredients}\\n\\nDirections: {directions}\\n\\nGeneric ingredients: '''\n", - " return user_prompt" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "6ec43c84-56e7-4c1b-9f66-795f55b4c7c5", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "User Prompt: Title: No-Bake Nut Cookies\n", - "\n", - "Ingredients: [\"1 c. firmly packed brown sugar\", \"1/2 c. evaporated milk\", \"1/2 tsp. vanilla\", \"1/2 c. broken nuts (pecans)\", \"2 Tbsp. butter or margarine\", \"3 1/2 c. bite size shredded rice biscuits\"]\n", - "\n", - "Directions: [\"In a heavy 2-quart saucepan, mix brown sugar, nuts, evaporated milk and butter or margarine.\", \"Stir over medium heat until mixture bubbles all over top.\", \"Boil and stir 5 minutes more. Take off heat.\", \"Stir in vanilla and cereal; mix well.\", \"Using 2 teaspoons, drop and shape into 30 clusters on wax paper.\", \"Let stand until firm, about 30 minutes.\"]\n", - "\n", - "Generic ingredients: \n", - "Assistant Response: ['brown sugar', 'milk', 'vanilla', 'nuts', 'butter', 'bite size shredded rice biscuits']\n", - "User Prompt: Title: Jewell Ball'S Chicken\n", - "\n", - "Ingredients: [\"1 small jar chipped beef, cut up\", \"4 boned chicken breasts\", \"1 can cream of mushroom soup\", \"1 carton sour cream\"]\n", - "\n", - "Directions: [\"Place chipped beef on bottom of baking dish.\", \"Place chicken on top of beef.\", \"Mix soup and cream together; pour over chicken. 
Bake, uncovered, at 275\\u00b0 for 3 hours.\"]\n", - "\n", - "Generic ingredients: \n", - "Assistant Response: ['beef', 'chicken breasts', 'cream of mushroom soup', 'sour cream']\n", - "User Prompt: Title: Creamy Corn\n", - "\n", - "Ingredients: [\"2 (16 oz.) pkg. frozen corn\", \"1 (8 oz.) pkg. cream cheese, cubed\", \"1/3 c. butter, cubed\", \"1/2 tsp. garlic powder\", \"1/2 tsp. salt\", \"1/4 tsp. pepper\"]\n", - "\n", - "Directions: [\"In a slow cooker, combine all ingredients. Cover and cook on low for 4 hours or until heated through and cheese is melted. Stir well before serving. Yields 6 servings.\"]\n", - "\n", - "Generic ingredients: \n", - "Assistant Response: ['frozen corn', 'cream cheese', 'butter', 'garlic powder', 'salt', 'pepper']\n", - "User Prompt: Title: Chicken Funny\n", - "\n", - "Ingredients: [\"1 large whole chicken\", \"2 (10 1/2 oz.) cans chicken gravy\", \"1 (10 1/2 oz.) can cream of mushroom soup\", \"1 (6 oz.) box Stove Top stuffing\", \"4 oz. shredded cheese\"]\n", - "\n", - "Directions: [\"Boil and debone chicken.\", \"Put bite size pieces in average size square casserole dish.\", \"Pour gravy and cream of mushroom soup over chicken; level.\", \"Make stuffing according to instructions on box (do not make too moist).\", \"Put stuffing on top of chicken and gravy; level.\", \"Sprinkle shredded cheese on top and bake at 350\\u00b0 for approximately 20 minutes or until golden and bubbly.\"]\n", - "\n", - "Generic ingredients: \n", - "Assistant Response: ['chicken', 'chicken gravy', 'cream of mushroom soup', 'shredded cheese']\n", - "User Prompt: Title: Reeses Cups(Candy) \n", - "\n", - "Ingredients: [\"1 c. peanut butter\", \"3/4 c. graham cracker crumbs\", \"1 c. melted butter\", \"1 lb. (3 1/2 c.) powdered sugar\", \"1 large pkg. chocolate chips\"]\n", - "\n", - "Directions: [\"Combine first four ingredients and press in 13 x 9-inch ungreased pan.\", \"Melt chocolate chips and spread over mixture. 
Refrigerate for about 20 minutes and cut into pieces before chocolate gets hard.\", \"Keep in refrigerator.\"]\n", - "\n", - "Generic ingredients: \n", - "Assistant Response: ['peanut butter', 'graham cracker crumbs', 'butter', 'powdered sugar', 'chocolate chips']\n" - ] - } - ], - "source": [ - "# Pick the top 5 entries to be our few-shot examples\n", - "for x,y in recipe_df.head(5).iterrows():\n", - " \n", - " user_prompt = create_prompt(y)\n", - " print(f\"User Prompt: {user_prompt}\")\n", - "\n", - " messages.append({\"role\":\"user\",\"content\":user_prompt})\n", - "\n", - " results = ast.literal_eval(y['NER'])\n", - " messages.append({\"role\":\"assistant\",\"content\": y['NER']})\n", - " print(f\"Assistant Response: {results}\")\n", - " " - ] - }, - { - "cell_type": "markdown", - "id": "542386a1-e5e3-4222-9f58-bc96ddbbfb63", - "metadata": {}, - "source": [ - "### Test few-shot approach\n", - "\n", - "We'll pick 300 test cases from our dataset and measure how many of the entities were extracted correctly.\n", - "\n", - "If a result scores 100% then all of the expected entities were extracted - if it scores less, then some were missed by the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "f8d81b7f-366f-4a77-a0f5-599d45bbc183", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "300" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# We'll pick a test set from further on in the dataset\n", - "test_df = recipe_df.loc[1500:1799]\n", - "len(test_df)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "22d98bfd-2f76-4540-89fb-6e67fc336082", - "metadata": {}, - "outputs": [], - "source": [ - "results = []\n", - "for x,y in test_df.iterrows():\n", - " test_messages = messages.copy()\n", - " user_prompt = create_prompt(y)\n", - " test_messages.append({\"role\":\"user\",\"content\": user_prompt})\n", - "\n", - " try:\n", - " response = openai.ChatCompletion.create(\n", - " model=GPT_MODEL,\n", - " messages=test_messages,\n", - " temperature=0,\n", - " max_tokens=500\n", - " )\n", - " results.append((user_prompt,y['NER'],response['choices'][0]['message']['content']))\n", - "\n", - " except Exception as e:\n", - " print(e)\n", - " " - ] - }, - { - "cell_type": "markdown", - "id": "37dc6d6d-b1cf-423e-a998-4837f1b69ca2", - "metadata": {}, - "source": [ - "### Evaluate results\n", - "\n", - "Let's evaluate how well the model performed with 5 few-shot examples" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "e451b7ff-e474-4dfb-aeaa-08ae55cb86fd", - "metadata": {}, - "outputs": [], - "source": [ - "def evaluate_ner(row):\n", - " actual_results = ast.literal_eval(row['actual'])\n", - " predicted_results = ast.literal_eval(row['prediction'])\n", - " actual_set = set(actual_results)\n", - "\n", - " accuracy = 0\n", - " for x in predicted_results:\n", - " if x in actual_set:\n", - " accuracy += 1\n", - "\n", - " if len(actual_results) == 0:\n", - " score = 1\n", - " else:\n", - " score = accuracy / len(actual_results)\n", - " return score\n", - " " - ] - }, - { - "cell_type": "code", - 
"execution_count": 9, - "id": "cd9b5b56-7570-4910-8b35-5c9d2340b2c1", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
inputactualpredictionscore
0Title: Pretzel Candy\\n\\nIngredients: [\"1 lb. w...[\"white chocolate\", \"pretzel sticks\", \"peanuts\"][\"white chocolate\", \"pretzel sticks\", \"salted ...0.666667
1Title: Salmon Party Ball\\n\\nIngredients: [\"8 o...[\"cream cheese\", \"salmon\", \"lemon juice\", \"hor...[\"cream cheese\", \"salmon\", \"lemon juice\", \"hor...0.777778
2Title: Fancy Fried Green Tomatoes\\n\\nIngredien...[\"sour cream\", \"green onion\", \"salt\", \"eggs\", ...[\"sour cream\", \"green onion\", \"salt\", \"eggs\", ...0.800000
3Title: Potluck Potatoes\\n\\nIngredients: [\"1 (2...[\"frozen hash brown potatoes\", \"onions\", \"salt...[\"frozen hash brown potatoes\", \"onions\", \"salt...1.000000
4Title: Old-Fashioned Sweet-Sour Cole Slaw\\n\\nI...[\"shredded green cabbage\", \"salt\", \"sugar\", \"c...[\"shredded green cabbage\", \"salt\", \"sugar\", \"c...1.000000
5Title: Peanut Brittle\\n\\nIngredients: [\"3 c. w...[\"white sugar\", \"water\", \"butter\", \"soda\", \"wh...[\"white sugar\", \"water\", \"butter\", \"soda\", \"wh...0.857143
6Title: Chicken Inspiration\\n\\nIngredients: [\"1...[\"chicken breast\", \"fresh mushrooms\", \"Provolo...[\"chicken breast\", \"mushrooms\", \"Provolone che...0.636364
7Title: Down East Blueberry Cake\\n\\nIngredients...[\"butter\", \"sugar\", \"eggs\", \"sour milk\", \"blue...[\"butter\", \"sugar\", \"eggs\", \"sour milk\", \"blue...1.000000
8Title: Cranberry-Pecan Bars\\n\\nIngredients: [\"...[\"flour\", \"sugar\", \"salt\", \"margarine\", \"pecan...[\"flour\", \"sugar\", \"salt\", \"margarine\", \"pecan...0.833333
9Title: Stacked Twinkie Cake\\n\\nIngredients: [\"...[\"coconut\", \"Twinkies\", \"pineapple\", \"pecans\",...[\"coconut\", \"instant French vanilla pudding\", ...0.600000
\n", - "
" - ], - "text/plain": [ - " input \\\n", - "0 Title: Pretzel Candy\\n\\nIngredients: [\"1 lb. w... \n", - "1 Title: Salmon Party Ball\\n\\nIngredients: [\"8 o... \n", - "2 Title: Fancy Fried Green Tomatoes\\n\\nIngredien... \n", - "3 Title: Potluck Potatoes\\n\\nIngredients: [\"1 (2... \n", - "4 Title: Old-Fashioned Sweet-Sour Cole Slaw\\n\\nI... \n", - "5 Title: Peanut Brittle\\n\\nIngredients: [\"3 c. w... \n", - "6 Title: Chicken Inspiration\\n\\nIngredients: [\"1... \n", - "7 Title: Down East Blueberry Cake\\n\\nIngredients... \n", - "8 Title: Cranberry-Pecan Bars\\n\\nIngredients: [\"... \n", - "9 Title: Stacked Twinkie Cake\\n\\nIngredients: [\"... \n", - "\n", - " actual \\\n", - "0 [\"white chocolate\", \"pretzel sticks\", \"peanuts\"] \n", - "1 [\"cream cheese\", \"salmon\", \"lemon juice\", \"hor... \n", - "2 [\"sour cream\", \"green onion\", \"salt\", \"eggs\", ... \n", - "3 [\"frozen hash brown potatoes\", \"onions\", \"salt... \n", - "4 [\"shredded green cabbage\", \"salt\", \"sugar\", \"c... \n", - "5 [\"white sugar\", \"water\", \"butter\", \"soda\", \"wh... \n", - "6 [\"chicken breast\", \"fresh mushrooms\", \"Provolo... \n", - "7 [\"butter\", \"sugar\", \"eggs\", \"sour milk\", \"blue... \n", - "8 [\"flour\", \"sugar\", \"salt\", \"margarine\", \"pecan... \n", - "9 [\"coconut\", \"Twinkies\", \"pineapple\", \"pecans\",... \n", - "\n", - " prediction score \n", - "0 [\"white chocolate\", \"pretzel sticks\", \"salted ... 0.666667 \n", - "1 [\"cream cheese\", \"salmon\", \"lemon juice\", \"hor... 0.777778 \n", - "2 [\"sour cream\", \"green onion\", \"salt\", \"eggs\", ... 0.800000 \n", - "3 [\"frozen hash brown potatoes\", \"onions\", \"salt... 1.000000 \n", - "4 [\"shredded green cabbage\", \"salt\", \"sugar\", \"c... 1.000000 \n", - "5 [\"white sugar\", \"water\", \"butter\", \"soda\", \"wh... 0.857143 \n", - "6 [\"chicken breast\", \"mushrooms\", \"Provolone che... 0.636364 \n", - "7 [\"butter\", \"sugar\", \"eggs\", \"sour milk\", \"blue... 
1.000000 \n", - "8 [\"flour\", \"sugar\", \"salt\", \"margarine\", \"pecan... 0.833333 \n", - "9 [\"coconut\", \"instant French vanilla pudding\", ... 0.600000 " - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "results_df = pd.DataFrame(results)\n", - "results_df.columns = ['input','actual','prediction']\n", - "results_df['score'] = results_df.apply(lambda x: evaluate_ner(x),axis=1)\n", - "results_df.head(10)" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "ff94c184-ce76-46b0-9cdd-8bec041ee18f", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0.806756229007777" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "results_df['score'].mean()" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "24ee164a-b619-4e14-a15f-ee361259329c", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "results_df['score'].plot.hist(bins=10)" - ] - }, - { - "cell_type": "markdown", - "id": "d3f012fc-0891-4ea6-8fc5-ae49398cc8c7", - "metadata": {}, - "source": [ - "Results are ok, with 80% of entities extracted correctly - the histogram doesn't look great though, with under half of results being 100% accurate. Lets see if baking learning into the model using fine-tuning gets us to a better place!" - ] - }, - { - "cell_type": "markdown", - "id": "8dd6be2c", - "metadata": {}, - "source": [ - "## OPTIONAL: Few-shot with GPT-4\n", - "\n", - "For some use cases the increased cost and latency of GPT-4 will be worth the trade-off of increased accuracy. In these cases, you may want to extend the few-shot approach to GPT-4 before diving in with fine-tuning.\n", - "\n", - "We'll perform the above test again with GPT-4 this time to give us another data point to compare our fine-tuning performance to." - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "62bb8702", - "metadata": {}, - "outputs": [], - "source": [ - "# If you don't run this section, run this cell so the evaluation at the bottom works\n", - "gpt_4_score = 0.8201281042085995" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "aeec72b7", - "metadata": {}, - "outputs": [], - "source": [ - "results_gpt4 = []\n", - "for x,y in test_df.iterrows():\n", - " test_messages = messages.copy()\n", - " user_prompt = create_prompt(y)\n", - " test_messages.append({\"role\":\"user\",\"content\": user_prompt})\n", - "\n", - " try:\n", - " response = openai.ChatCompletion.create(\n", - " model='gpt-4',\n", - " messages=test_messages,\n", - " temperature=0,\n", - " max_tokens=500\n", - " )\n", - " results_gpt4.append((user_prompt,y['NER'],response['choices'][0]['message']['content']))\n", - "\n", - " except Exception as e:\n", - " print(e)\n", - " " - ] - }, - { - "cell_type": "code", - "execution_count": 14, - 
"id": "e5e6e8fe", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
inputactualpredictionscore
0Title: Pretzel Candy\\n\\nIngredients: [\"1 lb. w...[\"white chocolate\", \"pretzel sticks\", \"peanuts\"][\"white chocolate\", \"pretzel sticks\", \"salted ...0.666667
1Title: Salmon Party Ball\\n\\nIngredients: [\"8 o...[\"cream cheese\", \"salmon\", \"lemon juice\", \"hor...[\"cream cheese\", \"salmon\", \"lemon juice\", \"hor...0.888889
2Title: Fancy Fried Green Tomatoes\\n\\nIngredien...[\"sour cream\", \"green onion\", \"salt\", \"eggs\", ...[\"sour cream\", \"green onion\", \"salt\", \"eggs\", ...0.800000
3Title: Potluck Potatoes\\n\\nIngredients: [\"1 (2...[\"frozen hash brown potatoes\", \"onions\", \"salt...[\"hash brown potatoes\", \"onions\", \"salt\", \"pep...0.900000
4Title: Old-Fashioned Sweet-Sour Cole Slaw\\n\\nI...[\"shredded green cabbage\", \"salt\", \"sugar\", \"c...[\"green cabbage\", \"salt\", \"sugar\", \"cider vine...0.800000
5Title: Peanut Brittle\\n\\nIngredients: [\"3 c. w...[\"white sugar\", \"water\", \"butter\", \"soda\", \"wh...[\"white sugar\", \"water\", \"butter\", \"soda\", \"wh...0.857143
6Title: Chicken Inspiration\\n\\nIngredients: [\"1...[\"chicken breast\", \"fresh mushrooms\", \"Provolo...[\"chicken breast\", \"fresh mushrooms\", \"Provolo...0.636364
7Title: Down East Blueberry Cake\\n\\nIngredients...[\"butter\", \"sugar\", \"eggs\", \"sour milk\", \"blue...[\"butter\", \"sugar\", \"eggs\", \"sour milk\", \"blue...1.000000
8Title: Cranberry-Pecan Bars\\n\\nIngredients: [\"...[\"flour\", \"sugar\", \"salt\", \"margarine\", \"pecan...[\"flour\", \"sugar\", \"salt\", \"margarine\", \"pecan...0.833333
9Title: Stacked Twinkie Cake\\n\\nIngredients: [\"...[\"coconut\", \"Twinkies\", \"pineapple\", \"pecans\",...[\"coconut\", \"instant French vanilla pudding\", ...0.800000
\n", - "
" - ], - "text/plain": [ - " input \\\n", - "0 Title: Pretzel Candy\\n\\nIngredients: [\"1 lb. w... \n", - "1 Title: Salmon Party Ball\\n\\nIngredients: [\"8 o... \n", - "2 Title: Fancy Fried Green Tomatoes\\n\\nIngredien... \n", - "3 Title: Potluck Potatoes\\n\\nIngredients: [\"1 (2... \n", - "4 Title: Old-Fashioned Sweet-Sour Cole Slaw\\n\\nI... \n", - "5 Title: Peanut Brittle\\n\\nIngredients: [\"3 c. w... \n", - "6 Title: Chicken Inspiration\\n\\nIngredients: [\"1... \n", - "7 Title: Down East Blueberry Cake\\n\\nIngredients... \n", - "8 Title: Cranberry-Pecan Bars\\n\\nIngredients: [\"... \n", - "9 Title: Stacked Twinkie Cake\\n\\nIngredients: [\"... \n", - "\n", - " actual \\\n", - "0 [\"white chocolate\", \"pretzel sticks\", \"peanuts\"] \n", - "1 [\"cream cheese\", \"salmon\", \"lemon juice\", \"hor... \n", - "2 [\"sour cream\", \"green onion\", \"salt\", \"eggs\", ... \n", - "3 [\"frozen hash brown potatoes\", \"onions\", \"salt... \n", - "4 [\"shredded green cabbage\", \"salt\", \"sugar\", \"c... \n", - "5 [\"white sugar\", \"water\", \"butter\", \"soda\", \"wh... \n", - "6 [\"chicken breast\", \"fresh mushrooms\", \"Provolo... \n", - "7 [\"butter\", \"sugar\", \"eggs\", \"sour milk\", \"blue... \n", - "8 [\"flour\", \"sugar\", \"salt\", \"margarine\", \"pecan... \n", - "9 [\"coconut\", \"Twinkies\", \"pineapple\", \"pecans\",... \n", - "\n", - " prediction score \n", - "0 [\"white chocolate\", \"pretzel sticks\", \"salted ... 0.666667 \n", - "1 [\"cream cheese\", \"salmon\", \"lemon juice\", \"hor... 0.888889 \n", - "2 [\"sour cream\", \"green onion\", \"salt\", \"eggs\", ... 0.800000 \n", - "3 [\"hash brown potatoes\", \"onions\", \"salt\", \"pep... 0.900000 \n", - "4 [\"green cabbage\", \"salt\", \"sugar\", \"cider vine... 0.800000 \n", - "5 [\"white sugar\", \"water\", \"butter\", \"soda\", \"wh... 0.857143 \n", - "6 [\"chicken breast\", \"fresh mushrooms\", \"Provolo... 0.636364 \n", - "7 [\"butter\", \"sugar\", \"eggs\", \"sour milk\", \"blue... 
1.000000 \n", - "8 [\"flour\", \"sugar\", \"salt\", \"margarine\", \"pecan... 0.833333 \n", - "9 [\"coconut\", \"instant French vanilla pudding\", ... 0.800000 " - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "results_gpt4_df = pd.DataFrame(results_gpt4)\n", - "results_gpt4_df.columns = ['input','actual','prediction']\n", - "results_gpt4_df['score'] = results_gpt4_df.apply(lambda x: evaluate_ner(x),axis=1)\n", - "results_gpt4_df.head(10)" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "b01d1a12", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0.8183172576477531" - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "results_gpt4_df['score'].mean()" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "id": "9fc1a911", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "results_gpt4_df['score'].plot.hist(bins=10)" - ] - }, - { - "cell_type": "markdown", - "id": "22f175e0-4613-4e70-ace9-b7f1bba51459", - "metadata": {}, - "source": [ - "## Fine-tuning\n", - "\n", - "Our results so far are **81%** correct classifications for `gpt-3.5-turbo` and **X%** for `gpt-4`. `gpt-3.5-turbo` is far cheaper and quicker, but is much less accurate.\n", - "\n", - "Lets see if we can achieve the best of both worlds, by fine-tuning `gpt-3.5-turbo` on a domain-specific training set of 1500 examples." - ] - }, - { - "cell_type": "markdown", - "id": "2b3151e9-8715-47bd-a153-195d6a0d0a70", - "metadata": {}, - "source": [ - "### Data preparation\n", - "\n", - "We'll begin by preparing our data. When fine-tuning with the `ChatCompletion` format, each training example is a simple list of `messages`. For example, an entry could look like:\n", - "```\n", - "[{'role': 'system',\n", - " 'content': 'You are a helpful recipe assistant. You are to extract the generic ingredients from each of the recipes provided.'},\n", - " {'role': 'user',\n", - " 'content': 'Title: No-Bake Nut Cookies\\n\\nIngredients: [\"1 c. firmly packed brown sugar\", \"1/2 c. evaporated milk\", \"1/2 tsp. vanilla\", \"1/2 c. broken nuts (pecans)\", \"2 Tbsp. butter or margarine\", \"3 1/2 c. bite size shredded rice biscuits\"]\\n\\nDirections: [\"In a heavy 2-quart saucepan, mix brown sugar, nuts, evaporated milk and butter or margarine.\", \"Stir over medium heat until mixture bubbles all over top.\", \"Boil and stir 5 minutes more. 
Take off heat.\", \"Stir in vanilla and cereal; mix well.\", \"Using 2 teaspoons, drop and shape into 30 clusters on wax paper.\", \"Let stand until firm, about 30 minutes.\"]\\n\\nGeneric ingredients: '},\n", - " {'role': 'assistant',\n", - " 'content': '[\"brown sugar\", \"milk\", \"vanilla\", \"nuts\", \"butter\", \"bite size shredded rice biscuits\"]'}]\n", - "```\n", - "\n", - "During the training process this conversation will be split, with the final entry being the `completion` that the model will produce, and the remainder of the `messages` acting as the prompt. Consider this when building your training examples - if your model will act on multi-turn conversations, then please provide representative multi-turn examples so it doesn't perform poorly when the conversation starts to expand.\n", - "\n", - "For fine-tuning with `ChatCompletion` you can begin with even 30-50 well-pruned examples. However, given we have a large representative dataset at hand, we'll take a larger set of 1500 training examples to start with. You should generally see performance continue to improve as you increase the size of the training set.\n", - "\n", - "Please note that currently there is a 4096 token limit for each training example. Anything longer than this will be truncated at 4096 tokens." 
- ] - }, - { - "cell_type": "code", - "execution_count": 17, - "id": "9a8216b0-d1dc-472d-b07d-1be03acd70a5", - "metadata": {}, - "outputs": [], - "source": [ - "training_data = []\n", - "\n", - "# Take first 1500 records for training\n", - "for x, y in recipe_df.head(1500).iterrows():\n", - " training_prompt_message = []\n", - " training_prompt_message.append({\"role\":\"system\",\"content\":system_prompt})\n", - " \n", - " user_prompt = create_prompt(y)\n", - " training_prompt_message.append({\"role\":\"user\",\"content\": user_prompt})\n", - "\n", - " training_prompt_message.append({\"role\":\"assistant\",\"content\": y['NER']})\n", - " training_message_dict = {\"messages\": training_prompt_message}\n", - " training_data.append(training_message_dict)\n", - " " - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "id": "5b853efa-dfea-4770-ab88-9b7e17794421", - "metadata": {}, - "outputs": [], - "source": [ - "validation_data = []\n", - "\n", - "for x, y in test_df.iterrows():\n", - " validation_prompt_message = []\n", - " validation_prompt_message.append({\"role\":\"system\",\"content\":system_prompt})\n", - " \n", - " user_prompt = create_prompt(y)\n", - " validation_prompt_message.append({\"role\":\"user\",\"content\": user_prompt})\n", - "\n", - " validation_prompt_message.append({\"role\":\"assistant\",\"content\": y['NER']})\n", - " validation_message_dict = {\"messages\": validation_prompt_message}\n", - " validation_data.append(validation_message_dict)\n" - ] - }, - { - "cell_type": "markdown", - "id": "1d5e7bfe-f6c8-4a23-a951-3df3f3791d7f", - "metadata": {}, - "source": [ - "We then need to export these as `.jsonl` files, with each row being one training example." 
- ] - }, - { - "cell_type": "code", - "execution_count": 19, - "id": "8d2eb207-2c2b-43f6-a613-64a7e92d494d", - "metadata": {}, - "outputs": [], - "source": [ - "def dicts_to_jsonl(data_list: list, filename: str) -> None:\n", - " \"\"\"\n", - " Save a list of dicts to a JSONL file, one JSON object per line.\n", - " :param data_list: (list) list of dicts to be stored,\n", - " :param filename: (str) path to the output file. If the .jsonl suffix is missing,\n", - " it is appended to the filename.\n", - " \"\"\"\n", - " sjsonl = '.jsonl'\n", - "\n", - " # Check filename\n", - " if not filename.endswith(sjsonl):\n", - " filename = filename + sjsonl\n", - " # Save data\n", - " \n", - " with open(filename, 'w') as out:\n", - " for ddict in data_list:\n", - " jout = json.dumps(ddict) + '\\n'\n", - " out.write(jout)" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "id": "8b53e7a2-1cac-4c5f-8ba4-3292ba2a0770", - "metadata": {}, - "outputs": [], - "source": [ - "# Save training_data to JSONL\n", - "dicts_to_jsonl(training_data,'recipe_finetune_training')\n", - "\n", - "# Save validation_data to JSONL\n", - "dicts_to_jsonl(validation_data,'recipe_finetune_validation')" - ] - }, - { - "cell_type": "markdown", - "id": "0d149e2e-50dd-45c1-bd8d-1291975670b4", - "metadata": {}, - "source": [ - "### Upload files\n", - "\n", - "You can then upload the files to our `Files` endpoint so they can be used by the fine-tuning job." 
- ] - }, - { - "cell_type": "code", - "execution_count": 26, - "id": "25302bbc-80b1-4d56-8d19-ab8bd41579ac", - "metadata": {}, - "outputs": [], - "source": [ - "def upload_chat_file(filename):\n", - " \n", - "\n", - " headers = {\n", - " 'Authorization': 'Bearer ' + os.getenv('OPENAI_API_KEY', ''),\n", - " }\n", - " \n", - " files = {\n", - " 'purpose': (None, 'fine-tune'),\n", - " #'format': (None, 'fine-tune-chat'),\n", - " 'file': open(filename, 'rb'),\n", - " }\n", - "\n", - " response = requests.post('https://api.openai.com/v1/files', headers=headers, files=files)\n", - " return response" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "id": "69462d9e-e6bd-49b9-a064-9eae4ea5b7a8", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{'object': 'file', 'id': 'file-5bQjePpOBzsxyjYE5hvd6WSs', 'purpose': 'fine-tune', 'filename': 'recipe_finetune_training.jsonl', 'bytes': 1347985, 'created_at': 1692727484, 'status': 'uploaded', 'status_details': None}\n", - "{'object': 'file', 'id': 'file-h7SRtdHu3Mr5gTnuIX4mmVED', 'purpose': 'fine-tune', 'filename': 'recipe_finetune_validation.jsonl', 'bytes': 269054, 'created_at': 1692727486, 'status': 'uploaded', 'status_details': None}\n", - "['file-5bQjePpOBzsxyjYE5hvd6WSs', 'file-h7SRtdHu3Mr5gTnuIX4mmVED']\n" - ] - } - ], - "source": [ - "file_ids = []\n", - "files = ['recipe_finetune_training.jsonl','recipe_finetune_validation.jsonl']\n", - "\n", - "for file in files:\n", - " response = upload_chat_file(file)\n", - " #print(response.json())\n", - " file_ids.append(response.json()['id'])\n", - " print(response.json())\n", - " \n", - "print(file_ids)" - ] - }, - { - "cell_type": "markdown", - "id": "d61cd381-63ad-4ed9-b0be-47a438891028", - "metadata": {}, - "source": [ - "### Create fine-tune job\n", - "\n", - "Now we can create our fine-tuning job - we do this using the ```https://api.openai.com/v1/fine_tuning/jobs``` endpoint, which is the URL the code below posts to. 
We supply the following parameters (`validation_file` and `suffix` are optional):\n", - "- **training_file:** the ID of the uploaded file to train on\n", - "- **validation_file:** the ID of the uploaded file to validate on\n", - "- **model:** the name of the base model to fine-tune\n", - "- **suffix:** up to 18 character suffix to customize output name\n", - "\n", - "The response will contain an `id` which you can use to retrieve updates on the job." - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "id": "3452cafc", - "metadata": {}, - "outputs": [], - "source": [ - "training_file_id = file_ids[0]\n", - "validation_file_id = file_ids[1]\n", - "suffix_name = 'recipe-cj'\n" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "id": "d84ee75d-823c-4cfc-bb1f-7342b485ddd3", - "metadata": {}, - "outputs": [], - "source": [ - "def create_chatcompletion_finetune(training_file_id,validation_file_id,suffix):\n", - " \"\"\"This function creates a fine-tuning job given a training file ID, a validation file ID and a suffix\"\"\"\n", - " \n", - "\n", - " headers = {\n", - " 'Content-Type': 'application/json',\n", - " 'Authorization': 'Bearer ' + os.getenv('OPENAI_API_KEY', ''),\n", - " }\n", - "\n", - " json_data = {\n", - " 'training_file': training_file_id,\n", - " 'validation_file': validation_file_id,\n", - " 'model': 'gpt-3.5-turbo-0613-alpha',\n", - " 'suffix': suffix,\n", - " }\n", - " \n", - " response = requests.post('https://api.openai.com/v1/fine_tuning/jobs', headers=headers, json=json_data)\n", - " \n", - " return response\n" - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "id": "05541ceb-5628-447e-962d-7e57c112439c", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{'error': {'message': 'Invalid URL (POST /v1/fine_tuning/jobs)', 'type': 'invalid_request_error', 'param': None, 'code': None}}\n", - "'id'\n" - ] - } - ], - "source": [ - "try:\n", - " response = 
create_chatcompletion_finetune(training_file_id,validation_file_id,suffix_name)\n", - " 
print(response.json())\n", - " job_id = response.json()['id']\n", - " print(response.json())\n", - "\n", - "except Exception as e:\n", - " print(e)" - ] - }, - { - "cell_type": "markdown", - "id": "1de3ed71-f2d4-4138-95a3-70da187a007e", - "metadata": {}, - "source": [ - "#### Check job status\n", - "\n", - "You can make a `GET` request to the `https://api.openai.com/v1/fine_tuning/jobs/{job_id}` endpoint to retrieve your fine-tuning job. In this instance you'll want to check that the job you created in the previous step ends up as `status: succeeded`.\n", - "\n", - "Once it is completed, you can use the `result_files` to sample the results from the validation set (if you uploaded one), and use the model name in the `fine_tuned_model` field to invoke your trained model." - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "id": "b6f65822-ed88-4667-bea5-4575f7df62a2", - "metadata": {}, - "outputs": [], - "source": [ - "def check_chatcompletion_finetunes(job_id):\n", - "\n", - " headers = {\n", - " 'Authorization': 'Bearer ' + os.getenv('OPENAI_API_KEY', ''),\n", - " }\n", - "\n", - " return requests.get(f'https://api.openai.com/v1/fine_tuning/jobs/{job_id}', headers=headers)\n" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "id": "6df1faf1", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{'object': 'fine_tuning.job', 'id': 'ft-laDr3mkgkv5N5FMlsKNzueBu', 'model': 'gpt-3.5-turbo-0613-alpha', 'created_at': 1692592059, 'fine_tuned_model': None, 'organization_id': 'org-p13k3klgno5cqxbf0q8hpgrk', 'result_files': [], 'status': 'running', 'validation_file': 'file-NLQBnaKFUSS8NYsnBRltq2Rd', 'training_file': 'file-lNnvY0Qon022gY4Fyh33EK6T', 'hyperparameters': {'n_epochs': 3, 'batch_size': 3, 'learning_rate_multiplier': 2}}\n" - ] - } - ], - "source": [ - "response = check_chatcompletion_finetunes(job_id)\n", - "\n", - "print(response.json())" - ] - }, - { - "cell_type": "markdown", - "id": 
"0025e392-84cd-4566-a384-ea31ca43e567", - "metadata": {}, - "source": [ - "## Evaluate fine-tuned model\n", - "\n", - "The last step is to use your fine-tuned model for inference. Similar to the classic `FineTuning` or Reserved Capacity, you simply call `ChatCompletions` with your new fine-tuned model name filling the `model` parameter.\n", - "\n", - "**NOTE:** If you uploaded your fine-tune job with a validation file, you can inspect the results immediately following training by using the `Files` endpoint." - ] - }, - { - "cell_type": "code", - "execution_count": 47, - "id": "85b109e5-f183-4537-9b5d-439a549be7c0", - "metadata": {}, - "outputs": [], - "source": [ - "# Input your FT_MODEL ID here - this will usually be something like ft:{MODEL_NAME}:{ORG_ID}:{SUFFIX_NAME}{STRING}\n", - "FT_MODEL = 'gpt-recipe-ft'" - ] - }, - { - "cell_type": "code", - "execution_count": 48, - "id": "1c7de631-b68f-4eff-9ae7-051641579c2b", - "metadata": {}, - "outputs": [], - "source": [ - "validation_results = []\n", - "for x,y in test_df.iterrows():\n", - " test_messages = messages.copy()\n", - " user_prompt = create_prompt(y)\n", - " test_messages.append({\"role\":\"user\",\"content\": user_prompt})\n", - "\n", - " try:\n", - " response = openai.ChatCompletion.create(\n", - " model=FT_MODEL,\n", - " messages=test_messages,\n", - " temperature=0,\n", - " max_tokens=500\n", - " )\n", - " validation_results.append((user_prompt,y['NER'],response['choices'][0]['message']['content']))\n", - "\n", - " except Exception as e:\n", - " print(e)" - ] - }, - { - "cell_type": "code", - "execution_count": 49, - "id": "8e3de5a9", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
<div>\n",
- "<table>\n",
- "  <thead>\n",
- "    <tr><th></th><th>input</th><th>actual</th><th>prediction</th><th>score</th></tr>\n",
- "  </thead>\n",
- "  <tbody>\n",
- "    <tr><th>0</th><td>Title: Pretzel Candy\\n\\nIngredients: [\"1 lb. w...</td><td>[\"white chocolate\", \"pretzel sticks\", \"peanuts\"]</td><td>[\"white chocolate\", \"pretzel sticks\", \"peanuts\"]</td><td>1.000000</td></tr>\n",
- "    <tr><th>1</th><td>Title: Salmon Party Ball\\n\\nIngredients: [\"8 o...</td><td>[\"cream cheese\", \"salmon\", \"lemon juice\", \"hor...</td><td>[\"cream cheese\", \"salmon\", \"lemon juice\", \"hor...</td><td>1.000000</td></tr>\n",
- "    <tr><th>2</th><td>Title: Fancy Fried Green Tomatoes\\n\\nIngredien...</td><td>[\"sour cream\", \"green onion\", \"salt\", \"eggs\", ...</td><td>[\"sour cream\", \"green onion\", \"salt\", \"eggs\", ...</td><td>0.900000</td></tr>\n",
- "    <tr><th>3</th><td>Title: Potluck Potatoes\\n\\nIngredients: [\"1 (2...</td><td>[\"frozen hash brown potatoes\", \"onions\", \"salt...</td><td>[\"frozen hash brown potatoes\", \"onions\", \"salt...</td><td>1.000000</td></tr>\n",
- "    <tr><th>4</th><td>Title: Old-Fashioned Sweet-Sour Cole Slaw\\n\\nI...</td><td>[\"shredded green cabbage\", \"salt\", \"sugar\", \"c...</td><td>[\"green cabbage\", \"salt\", \"sugar\", \"cider vine...</td><td>0.800000</td></tr>\n",
- "    <tr><th>5</th><td>Title: Peanut Brittle\\n\\nIngredients: [\"3 c. w...</td><td>[\"white sugar\", \"water\", \"butter\", \"soda\", \"wh...</td><td>[\"white sugar\", \"water\", \"butter\", \"soda\", \"wh...</td><td>1.000000</td></tr>\n",
- "    <tr><th>6</th><td>Title: Chicken Inspiration\\n\\nIngredients: [\"1...</td><td>[\"chicken breast\", \"fresh mushrooms\", \"Provolo...</td><td>[\"boneless\", \"fresh mushrooms\", \"Provolone che...</td><td>0.636364</td></tr>\n",
- "    <tr><th>7</th><td>Title: Down East Blueberry Cake\\n\\nIngredients...</td><td>[\"butter\", \"sugar\", \"eggs\", \"sour milk\", \"blue...</td><td>[\"butter\", \"sugar\", \"eggs\", \"sour milk\", \"blue...</td><td>1.000000</td></tr>\n",
- "    <tr><th>8</th><td>Title: Cranberry-Pecan Bars\\n\\nIngredients: [\"...</td><td>[\"flour\", \"sugar\", \"salt\", \"margarine\", \"pecan...</td><td>[\"flour\", \"sugar\", \"salt\", \"margarine\", \"pecan...</td><td>1.000000</td></tr>\n",
- "    <tr><th>9</th><td>Title: Stacked Twinkie Cake\\n\\nIngredients: [\"...</td><td>[\"coconut\", \"Twinkies\", \"pineapple\", \"pecans\",...</td><td>[\"coconut\", \"instant French vanilla pudding\", ...</td><td>1.000000</td></tr>\n",
- "  </tbody>\n",
- "</table>\n",
- "</div>
" - ], - "text/plain": [ - " input \\\n", - "0 Title: Pretzel Candy\\n\\nIngredients: [\"1 lb. w... \n", - "1 Title: Salmon Party Ball\\n\\nIngredients: [\"8 o... \n", - "2 Title: Fancy Fried Green Tomatoes\\n\\nIngredien... \n", - "3 Title: Potluck Potatoes\\n\\nIngredients: [\"1 (2... \n", - "4 Title: Old-Fashioned Sweet-Sour Cole Slaw\\n\\nI... \n", - "5 Title: Peanut Brittle\\n\\nIngredients: [\"3 c. w... \n", - "6 Title: Chicken Inspiration\\n\\nIngredients: [\"1... \n", - "7 Title: Down East Blueberry Cake\\n\\nIngredients... \n", - "8 Title: Cranberry-Pecan Bars\\n\\nIngredients: [\"... \n", - "9 Title: Stacked Twinkie Cake\\n\\nIngredients: [\"... \n", - "\n", - " actual \\\n", - "0 [\"white chocolate\", \"pretzel sticks\", \"peanuts\"] \n", - "1 [\"cream cheese\", \"salmon\", \"lemon juice\", \"hor... \n", - "2 [\"sour cream\", \"green onion\", \"salt\", \"eggs\", ... \n", - "3 [\"frozen hash brown potatoes\", \"onions\", \"salt... \n", - "4 [\"shredded green cabbage\", \"salt\", \"sugar\", \"c... \n", - "5 [\"white sugar\", \"water\", \"butter\", \"soda\", \"wh... \n", - "6 [\"chicken breast\", \"fresh mushrooms\", \"Provolo... \n", - "7 [\"butter\", \"sugar\", \"eggs\", \"sour milk\", \"blue... \n", - "8 [\"flour\", \"sugar\", \"salt\", \"margarine\", \"pecan... \n", - "9 [\"coconut\", \"Twinkies\", \"pineapple\", \"pecans\",... \n", - "\n", - " prediction score \n", - "0 [\"white chocolate\", \"pretzel sticks\", \"peanuts\"] 1.000000 \n", - "1 [\"cream cheese\", \"salmon\", \"lemon juice\", \"hor... 1.000000 \n", - "2 [\"sour cream\", \"green onion\", \"salt\", \"eggs\", ... 0.900000 \n", - "3 [\"frozen hash brown potatoes\", \"onions\", \"salt... 1.000000 \n", - "4 [\"green cabbage\", \"salt\", \"sugar\", \"cider vine... 0.800000 \n", - "5 [\"white sugar\", \"water\", \"butter\", \"soda\", \"wh... 1.000000 \n", - "6 [\"boneless\", \"fresh mushrooms\", \"Provolone che... 0.636364 \n", - "7 [\"butter\", \"sugar\", \"eggs\", \"sour milk\", \"blue... 
1.000000 \n", - "8 [\"flour\", \"sugar\", \"salt\", \"margarine\", \"pecan... 1.000000 \n", - "9 [\"coconut\", \"instant French vanilla pudding\", ... 1.000000 " - ] - }, - "execution_count": 49, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "validation_results_df = pd.DataFrame(validation_results)\n", - "validation_results_df.columns = ['input','actual','prediction']\n", - "validation_results_df['score'] = validation_results_df.apply(lambda x: evaluate_ner(x),axis=1)\n", - "validation_results_df.head(10)\n" - ] - }, - { - "cell_type": "code", - "execution_count": 50, - "id": "cb51b42f", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0.923313319921679" - ] - }, - "execution_count": 50, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "validation_results_df['score'].mean()" - ] - }, - { - "cell_type": "code", - "execution_count": 51, - "id": "9f31acb1", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 51, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "validation_results_df['score'].plot.hist(bins=10)" - ] - }, - { - "cell_type": "markdown", - "id": "8972f998", - "metadata": {}, - "source": [ - "Looks like a great improvement - **92%** of entities were extracted correctly, with around 60% completely correct, an improvement of roughly **10/11%**." - ] - }, - { - "cell_type": "markdown", - "id": "318b7175", - "metadata": {}, - "source": [ - "## Model comparison\n", - "\n", - "The last step is to summarize our results in a table for comparison." - ] - }, - { - "cell_type": "code", - "execution_count": 52, - "id": "c7c2a539", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
<div>\n",
- "<table>\n",
- "  <thead>\n",
- "    <tr><th></th><th>model</th><th>result</th></tr>\n",
- "  </thead>\n",
- "  <tbody>\n",
- "    <tr><th>0</th><td>gpt-3.5-turbo</td><td>0.810288</td></tr>\n",
- "    <tr><th>1</th><td>gpt-4</td><td>0.820128</td></tr>\n",
- "    <tr><th>2</th><td>gpt-3.5-turbo-ft</td><td>0.923313</td></tr>\n",
- "  </tbody>\n",
- "</table>\n",
- "</div>
" - ], - "text/plain": [ - " model result\n", - "0 gpt-3.5-turbo 0.810288\n", - "1 gpt-4 0.820128\n", - "2 gpt-3.5-turbo-ft 0.923313" - ] - }, - "execution_count": 52, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "gpt_35_score = results_df['score'].mean()\n", - "gpt_4_score = results_gpt4_df['score'].mean()\n", - "gpt_35_ft_score = validation_results_df['score'].mean()\n", - "\n", - "eval_results = pd.DataFrame({'model': ['gpt-3.5-turbo', 'gpt-4','gpt-3.5-turbo-ft'], 'result': [gpt_35_score,gpt_4_score,gpt_35_ft_score]})\n", - "eval_results.head()" - ] - }, - { - "cell_type": "markdown", - "id": "04ebcaa7", - "metadata": {}, - "source": [ - "Great! Our fine-tuned model is faster and cheaper than `gpt-4` for this task - mission accomplished." - ] - }, - { - "cell_type": "markdown", - "id": "07799909-3f2a-4274-b81e-dabc048be28f", - "metadata": {}, - "source": [ - "## Conclusion\n", - "\n", - "Congratulations, you are now ready to fine-tune your own models using the `ChatCompletion` format!\n", - "\n", - "Please reach out via Slack or email to your Account Director or Account Engineer with any successes or issues you experience, we're excited to hear what works and what doesn't. Look forward to seeing what you build!" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "chat_ft", - "language": "python", - "name": "chat_ft" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.11" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -}