{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Fine tuning with function-calling\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook covers how to fine-tune to increase function calling accuracy and reliability. You can find more information on function calling [here](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_call_functions_with_chat_models.ipynb), and on fine tuning [here](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_finetune_chat_models.ipynb).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For context, from the function calling notebook above:\n", "\n", "> `tools` is an optional parameter in the Chat Completion API which can be used to provide function specifications. The purpose of this is to enable models to generate function arguments which adhere to the provided specifications. Note that the API will not actually execute any function calls. It is up to developers to execute function calls using model outputs.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Function calling is a very powerful tool when it functions as intended. However, we have seen that as the number of functions and the complexity of the task increase, function calling becomes less accurate (e.g., more hallucinated or incorrect invocations).\n", "\n", "Before fine tuning for function calling, it's best to begin with:\n", "\n", "- Improvements to the function definitions. 
Make them clearer and more distinct from one another.\n", "- Experiment with prompt engineering: often a more detailed prompt can help the model call the correct function.\n", "\n", "_If_ the steps above fail to improve function calling to a satisfactory level, then you can try fine tuning for function calling.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Overview\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook contains three sections:\n", "\n", "- **Assessing baseline function calling performance:** Evaluating an out-of-the-box `gpt-3.5-turbo` model on our given functions (let's assume that for latency and cost reasons we cannot use `gpt-4o` for a drone copilot)\n", "- **Generating synthetic data:** Using `gpt-4o` to create a 'golden' set of prompts and function invocations to use as training data\n", "- **Fine-tuning:** Running the fine tuning job and evaluating the fine-tuned model\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: _This notebook provides an example of how to create synthetic training data for fine tuning for function calling given just a list of functions. While real-world production test evals are preferable, this method produces strong results and can be used in conjunction with real-world training data._\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting baseline function calling performance\n" ] }, { "cell_type": "code", "execution_count": 173, "metadata": {}, "outputs": [], "source": [ "#!pip install tenacity -q\n", "#!pip install openai -q\n", "#!pip install typing -q\n", "# !pip install python-dotenv" ] }, { "cell_type": "code", "execution_count": 211, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The dotenv extension is already loaded. 
To reload it, use:\n", " %reload_ext dotenv\n" ] } ], "source": [ "import numpy as np\n", "import json\n", "import os\n", "from IPython.display import display\n", "import pandas as pd\n", "from openai import OpenAI\n", "import itertools\n", "import time\n", "import base64\n", "from tenacity import retry, wait_random_exponential, stop_after_attempt\n", "from typing import Any, Dict, List, Generator\n", "import ast\n", "\n", "%load_ext dotenv\n", "%dotenv\n", "\n", "client = OpenAI(api_key=os.environ.get(\"OPENAI_BUILD_HOUR_KEY\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Utilities\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's define two utility functions: one that calls the Chat Completions API and returns the response message and token usage, and one that evaluates how often a model selects the expected function.\n" ] }, { "cell_type": "code", "execution_count": 212, "metadata": {}, "outputs": [], "source": [ "def get_chat_completion(\n", " messages: list[dict[str, str]],\n", " model: str = \"gpt-3.5-turbo\",\n", " max_tokens=500,\n", " temperature=0.0,\n", " stop=None,\n", " tools=None,\n", " seed=42,\n", " functions=None,\n", " tool_choice=None,\n", ") -> tuple[Any, Any]:\n", " params = {\n", " \"model\": model,\n", " \"messages\": messages,\n", " \"max_tokens\": max_tokens,\n", " \"temperature\": temperature,\n", " \"stop\": stop,\n", " \"tools\": tools,\n", " \"seed\": seed,\n", " \"tool_choice\": tool_choice,\n", " }\n", " if functions:\n", " params[\"functions\"] = functions\n", "\n", " completion = client.chat.completions.create(**params)\n", " # Return the assistant message together with the token usage\n", " return completion.choices[0].message, completion.usage\n", "\n", "\n", "def eval(model: str, system_prompt: str, function_list, prompts_to_expected_tool_name):\n", " \"\"\"\n", " Evaluate the performance of a model in selecting the correct function based on given prompts.\n", "\n", " Args:\n", " model (str): The name of the model to be evaluated.\n", " system_prompt (str): The system prompt to be used in the chat completion.\n", " 
function_list (list): A list of functions that the model can call.\n", " prompts_to_expected_tool_name (dict): A dictionary mapping prompts to their expected function names.\n", "\n", " Returns:\n", " None\n", " \"\"\"\n", "\n", " prompts_to_actual = []\n", " latencies = []\n", " tokens_used = []\n", "\n", " for prompt, expected_function in prompts_to_expected_tool_name.items():\n", " messages = [\n", " {\"role\": \"system\", \"content\": system_prompt},\n", " {\"role\": \"user\", \"content\": prompt},\n", " ]\n", "\n", " start_time = time.time()\n", " completion, usage = get_chat_completion(\n", " model=model,\n", " messages=messages,\n", " seed=42,\n", " tools=function_list,\n", " temperature=0.0,\n", " tool_choice=\"required\",\n", " )\n", " end_time = time.time()\n", "\n", " latency = (end_time - start_time) * 1000 # convert to milliseconds\n", " latencies.append(latency)\n", "\n", " prompts_to_actual.append(\n", " {prompt: completion.tool_calls[0].function.name})\n", "\n", " # Calculate tokens used\n", " tokens_used.append(usage.total_tokens)\n", "\n", " total_prompts = len(prompts_to_expected_tool_name)\n", "\n", " # Calculate the number of matches\n", " matches = sum(\n", " 1\n", " for result in prompts_to_actual\n", " if list(result.values())[0]\n", " == prompts_to_expected_tool_name[list(result.keys())[0]]\n", " )\n", " match_percentage = (matches / total_prompts) * 100\n", "\n", " # Calculate average latency\n", " avg_latency = sum(latencies) / total_prompts\n", " # Calculate average tokens used\n", " avg_tokens_used = sum(tokens_used) / total_prompts\n", "\n", " # Create a DataFrame to store the results\n", " results_df = pd.DataFrame(columns=[\"Prompt\", \"Expected\", \"Match\"])\n", "\n", " results_list = []\n", " for result in prompts_to_actual:\n", " prompt = list(result.keys())[0]\n", " actual_function = list(result.values())[0]\n", " expected_function = prompts_to_expected_tool_name[prompt]\n", " match = actual_function == expected_function\n", " 
results_list.append(\n", " {\n", " \"Prompt\": prompt,\n", " \"Actual\": actual_function,\n", " \"Expected\": expected_function,\n", " \"Match\": \"Yes\" if match else \"No\",\n", " }\n", " )\n", " results_df = pd.DataFrame(results_list)\n", "\n", " def style_rows(row):\n", " match = row[\"Match\"]\n", " background_color = \"red\" if match == \"No\" else \"white\"\n", " return [\"background-color: {}; color: black\".format(background_color)] * len(\n", " row\n", " )\n", "\n", " styled_results_df = results_df.style.apply(style_rows, axis=1)\n", "\n", " # Display the DataFrame as a table\n", " display(styled_results_df)\n", "\n", " print(\n", " f\"Number of matches: {matches} out of {total_prompts} ({match_percentage:.2f}%)\"\n", " )\n", " print(f\"Average latency per request: {avg_latency:.2f} ms\")\n", " print(f\"Average tokens used per request: {avg_tokens_used:.2f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Baseline testing\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's build an intelligent drone co-pilot. We want to be able to give the co-pilot commands, and have it either call the function\n", "for that command, or deny that request if the command is unfeasible.\n", "We can first define a system prompt for the copilot.\n" ] }, { "cell_type": "code", "execution_count": 213, "metadata": {}, "outputs": [], "source": [ "DRONE_SYSTEM_PROMPT = \"\"\"You are an intelligent AI that controls a drone. Given a command or request from the user,\n", "call one of your functions to complete the request. 
If the request cannot be completed by your available functions, call the reject_request function.\n", "If the request is ambiguous or unclear, reject the request.\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's define functions for all of the actions the copilot can take.\n" ] }, { "cell_type": "code", "execution_count": 214, "metadata": {}, "outputs": [], "source": [ "function_list = [\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"takeoff_drone\",\n", " \"description\": \"Initiate the drone's takeoff sequence.\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"altitude\": {\n", " \"type\": \"integer\",\n", " \"description\": \"Specifies the altitude in meters to which the drone should ascend.\",\n", " }\n", " },\n", " \"required\": [\"altitude\"],\n", " },\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"land_drone\",\n", " \"description\": \"Land the drone at its current location or a specified landing point.\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"location\": {\n", " \"type\": \"string\",\n", " \"enum\": [\"current\", \"home_base\", \"custom\"],\n", " \"description\": \"Specifies the landing location for the drone.\",\n", " },\n", " \"coordinates\": {\n", " \"type\": \"object\",\n", " \"description\": \"GPS coordinates for custom landing location. 
Required if location is 'custom'.\",\n", " },\n", " },\n", " \"required\": [\"location\"],\n", " },\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"control_drone_movement\",\n", " \"description\": \"Direct the drone's movement in a specific direction.\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"direction\": {\n", " \"type\": \"string\",\n", " \"enum\": [\"forward\", \"backward\", \"left\", \"right\", \"up\", \"down\"],\n", " \"description\": \"Direction in which the drone should move.\",\n", " },\n", " \"distance\": {\n", " \"type\": \"integer\",\n", " \"description\": \"Distance in meters the drone should travel in the specified direction.\",\n", " },\n", " },\n", " \"required\": [\"direction\", \"distance\"],\n", " },\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"set_drone_speed\",\n", " \"description\": \"Adjust the speed of the drone.\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"speed\": {\n", " \"type\": \"integer\",\n", " \"description\": \"Specifies the speed in km/h. Valid range is 0 to 100.\",\n", " \"minimum\": 0,\n", " }\n", " },\n", " \"required\": [\"speed\"],\n", " },\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"control_camera\",\n", " \"description\": \"Control the drone's camera to capture images or videos.\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"mode\": {\n", " \"type\": \"string\",\n", " \"enum\": [\"photo\", \"video\", \"panorama\"],\n", " \"description\": \"Camera mode to capture content.\",\n", " },\n", " \"duration\": {\n", " \"type\": \"integer\",\n", " \"description\": \"Duration in seconds for video capture. 
Required if mode is 'video'.\",\n", " },\n", " },\n", " \"required\": [\"mode\"],\n", " },\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"control_gimbal\",\n", " \"description\": \"Adjust the drone's gimbal for camera stabilization and direction.\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"tilt\": {\n", " \"type\": \"integer\",\n", " \"description\": \"Tilt angle for the gimbal in degrees.\",\n", " },\n", " \"pan\": {\n", " \"type\": \"integer\",\n", " \"description\": \"Pan angle for the gimbal in degrees.\",\n", " },\n", " },\n", " \"required\": [\"tilt\", \"pan\"],\n", " },\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"set_drone_lighting\",\n", " \"description\": \"Control the drone's lighting for visibility and signaling.\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"mode\": {\n", " \"type\": \"string\",\n", " \"enum\": [\"on\", \"off\", \"blink\", \"sos\"],\n", " \"description\": \"Lighting mode for the drone.\",\n", " }\n", " },\n", " \"required\": [\"mode\"],\n", " },\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"return_to_home\",\n", " \"description\": \"Command the drone to return to its home or launch location.\",\n", " \"parameters\": {\"type\": \"object\", \"properties\": {}},\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"set_battery_saver_mode\",\n", " \"description\": \"Toggle battery saver mode.\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"status\": {\n", " \"type\": \"string\",\n", " \"enum\": [\"on\", \"off\"],\n", " \"description\": \"Toggle battery saver mode.\",\n", " }\n", " },\n", " \"required\": [\"status\"],\n", " },\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": 
\"set_obstacle_avoidance\",\n", " \"description\": \"Configure obstacle avoidance settings.\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"mode\": {\n", " \"type\": \"string\",\n", " \"enum\": [\"on\", \"off\"],\n", " \"description\": \"Toggle obstacle avoidance.\",\n", " }\n", " },\n", " \"required\": [\"mode\"],\n", " },\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"set_follow_me_mode\",\n", " \"description\": \"Enable or disable 'follow me' mode.\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"status\": {\n", " \"type\": \"string\",\n", " \"enum\": [\"on\", \"off\"],\n", " \"description\": \"Toggle 'follow me' mode.\",\n", " }\n", " },\n", " \"required\": [\"status\"],\n", " },\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"calibrate_sensors\",\n", " \"description\": \"Initiate calibration sequence for drone's sensors.\",\n", " \"parameters\": {\"type\": \"object\", \"properties\": {}},\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"set_autopilot\",\n", " \"description\": \"Enable or disable autopilot mode.\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"status\": {\n", " \"type\": \"string\",\n", " \"enum\": [\"on\", \"off\"],\n", " \"description\": \"Toggle autopilot mode.\",\n", " }\n", " },\n", " \"required\": [\"status\"],\n", " },\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"configure_led_display\",\n", " \"description\": \"Configure the drone's LED display pattern and colors.\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"pattern\": {\n", " \"type\": \"string\",\n", " \"enum\": [\"solid\", \"blink\", \"pulse\", \"rainbow\"],\n", " \"description\": \"Pattern for the LED display.\",\n", " },\n", " 
\"color\": {\n", " \"type\": \"string\",\n", " \"enum\": [\"red\", \"blue\", \"green\", \"yellow\", \"white\"],\n", " \"description\": \"Color for the LED display. Not required if pattern is 'rainbow'.\",\n", " },\n", " },\n", " \"required\": [\"pattern\"],\n", " },\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"set_home_location\",\n", " \"description\": \"Set or change the home location for the drone.\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"coordinates\": {\n", " \"type\": \"object\",\n", " \"description\": \"GPS coordinates for the home location.\",\n", " }\n", " },\n", " \"required\": [\"coordinates\"],\n", " },\n", " },\n", " },\n", " {\n", " \"type\": \"function\",\n", " \"function\": {\n", " \"name\": \"reject_request\",\n", " \"description\": \"Use this function if the request is not possible.\",\n", " \"parameters\": {\"type\": \"object\", \"properties\": {}},\n", " },\n", " },\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For starters, let's see how function calling performs with some straightforward feasible prompts, and then a couple of obviously impossible requests that should call the 'reject_request' function.\n" ] }, { "cell_type": "code", "execution_count": 220, "metadata": {}, "outputs": [], "source": [ "straightforward_prompts_to_expected = {\n", " \"Land the drone at the home base\": \"land_drone\",\n", " \"Take off the drone to 50 meters\": \"takeoff_drone\",\n", " \"Change speed to 15 kilometers per hour\": \"set_drone_speed\",\n", " \"Turn into an elephant!\": \"reject_request\",\n", " \"Move the drone forward by 10 meters\": \"control_drone_movement\",\n", " \"I want the LED display to blink in red\": \"configure_led_display\",\n", " \"Can you take a photo?\": \"control_camera\",\n", " \"Can you detect obstacles?\": \"set_obstacle_avoidance\",\n", " \"Can you dance for me?\": \"reject_request\",\n", " \"Can you follow me?\": 
\"set_follow_me_mode\",\n", "}" ] }, { "cell_type": "code", "execution_count": 219, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Prompt | Actual | Expected | Match |
|---|---|---|---|---|
| 0 | Land the drone at the home base | land_drone | land_drone | Yes |
| 1 | Take off the drone to 50 meters | takeoff_drone | takeoff_drone | Yes |
| 2 | Change speed to 15 kilometers per hour | set_drone_speed | set_drone_speed | Yes |
| 3 | Turn into an elephant! | reject_request | reject_request | Yes |
| 4 | Move the drone forward by 10 meters | control_drone_movement | control_drone_movement | Yes |
| 5 | I want the LED display to blink in red | configure_led_display | configure_led_display | Yes |
| 6 | Can you take a photo? | control_camera | control_camera | Yes |
| 7 | Can you detect obstacles? | set_obstacle_avoidance | set_obstacle_avoidance | Yes |
| 8 | Can you dance for me? | reject_request | reject_request | Yes |
| 9 | Can you follow me? | set_follow_me_mode | set_follow_me_mode | Yes |
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Number of matches: 10 out of 10 (100.00%)\n", "Average latency per request: 826.81 ms\n", "Average tokens used per request: 796.20\n" ] } ], "source": [ "# Evaluate the model with the given prompts\n", "eval(\n", " model=\"gpt-3.5-turbo\",\n", " system_prompt=DRONE_SYSTEM_PROMPT,\n", " function_list=function_list,\n", " prompts_to_expected_tool_name=straightforward_prompts_to_expected,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nice! The model performs quite well with these requests. Now let's try some more difficult requests: requests that are drone-related and _almost_ feasible, but that the drone cannot actually carry out, so the copilot should reject them.\n" ] }, { "cell_type": "code", "execution_count": 221, "metadata": {}, "outputs": [], "source": [ "challenging_prompts_to_expected = {\n", " \"Play pre-recorded audio message\": \"reject_request\",\n", " \"Initiate following on social media\": \"reject_request\",\n", " \"Scan environment for heat signatures\": \"reject_request\",\n", " \"Bump into obstacles\": \"reject_request\",\n", " \"Change drone's paint job color\": \"reject_request\",\n", " \"Coordinate with nearby drones\": \"reject_request\",\n", " \"Change speed to negative 120 km/h\": \"reject_request\",\n", " \"Detect a person\": \"reject_request\",\n", " \"Please enable night vision\": \"reject_request\",\n", " \"Report on humidity levels around you\": \"reject_request\",\n", "}" ] }, { "cell_type": "code", "execution_count": 222, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Prompt | Actual | Expected | Match |
|---|---|---|---|---|
| 0 | Play pre-recorded audio message | reject_request | reject_request | Yes |
| 1 | Initiate following on social media | set_follow_me_mode | reject_request | No |
| 2 | Scan environment for heat signatures | reject_request | reject_request | Yes |
| 3 | Bump into obstacles | set_obstacle_avoidance | reject_request | No |
| 4 | Change drone's paint job color | reject_request | reject_request | Yes |
| 5 | Coordinate with nearby drones | reject_request | reject_request | Yes |
| 6 | Change speed to negative 120 km/h | set_drone_speed | reject_request | No |
| 7 | Detect a person | reject_request | reject_request | Yes |
| 8 | Please enable night vision | set_drone_lighting | reject_request | No |
| 9 | Report on humidity levels around you | reject_request | reject_request | Yes |
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Number of matches: 6 out of 10 (60.00%)\n", "Average latency per request: 610.26 ms\n", "Average tokens used per request: 791.90\n" ] } ], "source": [ "# Evaluate the model with the challenging prompts\n", "eval(\n", " model=\"gpt-3.5-turbo\",\n", " function_list=function_list,\n", " system_prompt=DRONE_SYSTEM_PROMPT,\n", " prompts_to_expected_tool_name=challenging_prompts_to_expected,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we run into some problems.\n", "The model here should reject all of these requests, as they are impossible, conflicting, or ambiguous given the functions. Instead, the model calls functions that are somewhat related to the request but incorrect. For example, the model calls `set_follow_me_mode` when asked to initiate following on social media.\n", "\n", "
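
To see which functions absorb the requests that should have been rejected, the mismatches can be tallied. The sketch below is illustrative only: the rows are copied from the challenging-prompt results above, and `summarize_misroutes` is a hypothetical helper name, not part of the notebook.

```python
from collections import Counter

# (prompt, actual, expected) rows copied from the challenging-prompt results
results = [
    ('Play pre-recorded audio message', 'reject_request', 'reject_request'),
    ('Initiate following on social media', 'set_follow_me_mode', 'reject_request'),
    ('Scan environment for heat signatures', 'reject_request', 'reject_request'),
    ('Bump into obstacles', 'set_obstacle_avoidance', 'reject_request'),
    ('Change drone paint job color', 'reject_request', 'reject_request'),
    ('Coordinate with nearby drones', 'reject_request', 'reject_request'),
    ('Change speed to negative 120 km/h', 'set_drone_speed', 'reject_request'),
    ('Detect a person', 'reject_request', 'reject_request'),
    ('Please enable night vision', 'set_drone_lighting', 'reject_request'),
    ('Report on humidity levels around you', 'reject_request', 'reject_request'),
]


def summarize_misroutes(rows):
    '''Count which functions were called when a different one was expected.'''
    return Counter(actual for _, actual, expected in rows if actual != expected)


misroutes = summarize_misroutes(results)
for name, count in misroutes.most_common():
    print(f'{name}: {count}')
```

Each wrongly called function shares a keyword with its prompt (follow, obstacles, speed, and lighting for night vision), which suggests the kind of near-miss examples the training data should cover.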
\n", "In this simple case, more prompt engineering may resolve some of these issues, but for the purpose of this example we will demonstrate how fine tuning can be used to improve performance. Additionally, while this case is relatively straightforward, fine tuning becomes more and more impactful as the number and complexity of the functions increase.\n", "\n", "Our goal here is to improve performance while using fewer tokens. Fine-tuning allows us to:\n", "\n", "- Omit function and parameter descriptions: remove the description field from function and parameters\n", "- Omit parameters: remove the entire properties field from the parameters object\n", "- Omit function entirely: remove the entire function object from the functions array\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Generating synthetic data\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Helper functions\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to generate every invocation of every function, so that we have\n", "full coverage of all potential invocations to create synthetic data for. 
Then, we will use `gpt-4o` to come up with prompts that would trigger each invocation, and we will use each prompt and function invocation pair as training data.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generating every invocation for a function with fixed enums is simpler, but for a function such as\n", "`control_gimbal` we need to set the `tilt` and `pan` integer values, so to generate those synthetic invocations we will first set a placeholder, and then later use `gpt-4o` to come up with reasonable values.\n" ] }, { "cell_type": "code", "execution_count": 253, "metadata": {}, "outputs": [], "source": [ "placeholder_int = \"fill_in_int\"\n", "placeholder_string = \"fill_in_string\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The functions below take in all the functions from the function list, and look\n", "at all the potential invocations of those functions given each function's parameters.\n", "The functions also account for `required` parameters, so that all the invocations\n", "are actually feasible.\n" ] }, { "cell_type": "code", "execution_count": 254, "metadata": {}, "outputs": [], "source": [ "def generate_permutations(\n", " params: Dict[str, Dict[str, Any]]\n", ") -> Generator[Dict[str, Any], None, None]:\n", " \"\"\"\n", " Generates all possible permutations for given parameters.\n", "\n", " :param params: Parameter dictionary containing required and optional fields.\n", " :return: A generator yielding each permutation.\n", " \"\"\"\n", "\n", " # Extract the required fields from the parameters\n", " required_fields = params.get(\"required\", [])\n", "\n", " # Generate permutations for required fields\n", " required_permutations = generate_required_permutations(params, required_fields)\n", "\n", " # Generate optional permutations based on each required permutation\n", " for required_perm in required_permutations:\n", " yield from generate_optional_permutations(params, required_perm)\n", "\n", "\n", "def 
generate_required_permutations(\n", " params: Dict[str, Dict[str, Any]], required_fields: List[str]\n", ") -> List[Dict[str, Any]]:\n", " \"\"\"\n", " Generates permutations for the required fields.\n", "\n", " :param params: Parameter dictionary.\n", " :param required_fields: List of required fields.\n", " :return: A list of permutations for required fields.\n", " \"\"\"\n", "\n", " # Get all possible values for each required field\n", " required_values = [get_possible_values(params, field) for field in required_fields]\n", "\n", " # Generate permutations from possible values\n", " return [\n", " dict(zip(required_fields, values))\n", " for values in itertools.product(*required_values)\n", " ]\n", "\n", "\n", "def generate_optional_permutations(\n", " params: Dict[str, Dict[str, Any]], base_perm: Dict[str, Any]\n", ") -> Generator[Dict[str, Any], None, None]:\n", " \"\"\"\n", " Generates permutations for optional fields based on a base permutation.\n", "\n", " :param params: Parameter dictionary.\n", " :param base_perm: Base permutation dictionary.\n", " :return: A generator yielding each permutation for optional fields.\n", " \"\"\"\n", "\n", " # Determine the fields that are optional by subtracting the base permutation's fields from all properties\n", " optional_fields = set(params[\"properties\"]) - set(base_perm)\n", "\n", " # Iterate through all combinations of optional fields\n", " for field_subset in itertools.chain.from_iterable(\n", " itertools.combinations(optional_fields, r)\n", " for r in range(len(optional_fields) + 1)\n", " ):\n", "\n", " # Generate product of possible values for the current subset of fields\n", " for values in itertools.product(\n", " *(get_possible_values(params, field) for field in field_subset)\n", " ):\n", "\n", " # Create a new permutation by combining base permutation and current field values\n", " new_perm = {**base_perm, **dict(zip(field_subset, values))}\n", "\n", " yield new_perm\n", "\n", "\n", "def 
get_possible_values(params: Dict[str, Dict[str, Any]], field: str) -> List[Any]:\n", " \"\"\"\n", " Retrieves possible values for a given field.\n", "\n", " :param params: Parameter dictionary.\n", " :param field: The field for which to get possible values.\n", " :return: A list of possible values.\n", " \"\"\"\n", "\n", " # Extract field information from the parameters\n", " field_info = params[\"properties\"][field]\n", "\n", " # Based on the field's type or presence of 'enum', determine and return the possible values\n", " if \"enum\" in field_info:\n", " return field_info[\"enum\"]\n", " elif field_info[\"type\"] == \"integer\":\n", " return [placeholder_int]\n", " elif field_info[\"type\"] == \"string\":\n", " return [placeholder_string]\n", " elif field_info[\"type\"] == \"boolean\":\n", " return [True, False]\n", " elif field_info[\"type\"] == \"array\" and \"enum\" in field_info[\"items\"]:\n", " enum_values = field_info[\"items\"][\"enum\"]\n", " all_combinations = [\n", " list(combo)\n", " for i in range(1, len(enum_values) + 1)\n", " for combo in itertools.combinations(enum_values, i)\n", " ]\n", " return all_combinations\n", " return []" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's generate every invocation for every function first\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Prompts:\n" ] }, { "cell_type": "code", "execution_count": 255, "metadata": {}, "outputs": [], "source": [ "INVOCATION_FILLER_PROMPT = \"\"\"\n", "1) Input reasonable values for 'fill_in_string' and 'fill_in_int' in the invocation here: {invocation}. Reasonable values are determined by the function definition. 
Use\n", "the entire function provided here: {function} to get context on what proper fill_in_string and fill_in_int values would be.\n", "Example:\n", "\n", "Input: invocation: {{\n", " \"name\": \"control_camera\",\n", " \"arguments\": {{\n", " \"mode\":\"video\",\n", " \"duration\":\"fill_in_int\"\n", " }}\n", "}},\n", "function:{function}\n", "\n", "Output: invocation: {{\n", " \"name\": \"control_camera\",\n", " \"arguments\": {{\n", " \"mode\":\"video\",\n", " \"duration\": 30\n", " }}\n", "}}\n", "\n", "\n", "MAKE SURE output is just a dictionary with keys 'name' and 'arguments', no other text or response.\n", "\n", "Input: {invocation}\n", "Output:\n", "\"\"\"\n", "\n", "\n", "COMMAND_GENERATION_PROMPT = \"\"\"\n", "You are to output 2 commands, questions or statements that would generate the inputted function and parameters.\n", "Please make the commands or questions natural, as a person would ask, and the command or questions should be varied and not repetitive.\n", "It should not always mirror the exact technical terminology used in the function and parameters, rather reflect a conversational and intuitive request.\n", "For instance, the prompt should not be 'turn on the dome light', as that is too technical, but rather 'turn on the inside lights'.\n", "As another example, the prompt should not be 'turn on the HVAC', but rather 'turn on the air conditioning'. Use language a normal driver would use, even if\n", "it is technically incorrect but colloquially used.\n", "\n", "RULES: ALWAYS put a backwards slash before an apostrophe or single quote '. 
For example, do not say don't but say don\\'t.\n", "Prompts MUST be in double quotes as well.\n", "\n", "Example\n", "\n", "Input: {{'name': 'calibrate_sensors','arguments': {{}}}}\n", "Prompt: [\"The sensors are out of whack, can you reset them\", \"The calibration of the drone is off, fix it please!\"]\n", "\n", "Input: {{'name': 'set_autopilot','arguments': {{'status': 'off'}}}}\n", "Prompt: [\"OK, I want to take back pilot control now\",\"Turn off the automatic pilot, I\\'m ready to control it\"]\n", "\n", "Input: {invocation}\n", "Prompt:\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the snippet below, we generate the invocations of each function except for the `reject_request` function.\n", "\n", "To perform effective fine-tuning we need correctly labeled data. We could manually come up with examples and label the data,\\\n", "or we can generate synthetic data with the help of `gpt-4o`.
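
As a rough illustration of what one labeled example could look like, the sketch below pairs a user prompt with the invocation it should trigger. The record layout (a `messages` list whose assistant turn carries `tool_calls`, plus a `tools` list) is an assumption modeled on the chat fine-tuning format, and `to_training_example` and the `call_0` id are hypothetical names; check the fine-tuning documentation before relying on it.

```python
import json


def to_training_example(prompt, invocation, system_prompt, tools):
    '''Pair one user prompt with the invocation it should trigger.'''
    return {
        'messages': [
            {'role': 'system', 'content': system_prompt},
            {'role': 'user', 'content': prompt},
            {
                'role': 'assistant',
                'tool_calls': [
                    {
                        'id': 'call_0',  # placeholder id, assumed for illustration
                        'type': 'function',
                        'function': {
                            'name': invocation['name'],
                            # arguments are serialized as a JSON string
                            'arguments': json.dumps(invocation['arguments']),
                        },
                    }
                ],
            },
        ],
        'tools': tools,
    }


example = to_training_example(
    prompt='OK, I want to take back pilot control now',
    invocation={'name': 'set_autopilot', 'arguments': {'status': 'off'}},
    system_prompt='You are an intelligent AI that controls a drone.',
    tools=[],  # the notebook function_list would go here
)
print(json.dumps(example))  # one line of a JSONL training file
```

Writing one such JSON object per line gives a JSONL file, with the prompt and invocation pairs produced by the generation steps supplying the inputs.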
\n", "\n", "Empirically, `gpt-4o` needs a bit more help to produce realistic prompts that should trigger the `reject_request` function, so we'll handle those separately later on.\n" ] }, { "cell_type": "code", "execution_count": 256, "metadata": {}, "outputs": [], "source": [ "input_objects = []\n", "# Each entry nests its name under the \"function\" key, so filter on that\n", "all_but_reject = [\n", "    f for f in function_list if f[\"function\"][\"name\"] != \"reject_request\"\n", "]\n", "\n", "for function in all_but_reject:\n", "    func_name = function[\"function\"][\"name\"]\n", "    params = function[\"function\"][\"parameters\"]\n", "    for arguments in generate_permutations(params):\n", "        if any(val in arguments.values() for val in [\"fill_in_int\", \"fill_in_str\"]):\n", "            input_object = {\"name\": func_name, \"arguments\": arguments}\n", "            messages = [\n", "                {\n", "                    \"role\": \"user\",\n", "                    \"content\": INVOCATION_FILLER_PROMPT.format(\n", "                        invocation=str(input_object), function=function\n", "                    ),\n", "                }\n", "            ]\n", "            # get_chat_completion returns (message, usage); keep the message text\n", "            completion, usage = get_chat_completion(\n", "                model=\"gpt-4o\", messages=messages, max_tokens=200, temperature=0.1\n", "            )\n", "            input_object = completion.content\n", "        else:\n", "            input_object = {\"name\": func_name, \"arguments\": arguments}\n", "\n", "        input_objects.append(input_object)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have all the invocations, let's use `gpt-4o` to generate prompts that would result in those invocations.\n" ] }, { "cell_type": "code", "execution_count": 257, "metadata": {}, "outputs": [], "source": [ "def remove_sequences(input_string):\n", "    # Strip Markdown code-fence markers, then parse what remains as JSON\n", "    cleaned_string = input_string.replace(\"```json\", \"\")  # Remove \"```json\" first\n", "    cleaned_string = cleaned_string.replace(\"```\", \"\")  # Then remove \"```\"\n", "    return json.loads(cleaned_string)" ] }, { "cell_type": "code", "execution_count": 258, "metadata": {}, "outputs": [], "source": [ "def create_commands(invocation_list):\n", "    example_list = []\n", "    for i, invocation in enumerate(invocation_list):\n", "        
if i < 100:\n", " print(\n", " f\"\\033[34m{np.round(100*i/len(invocation_list),1)}% complete\\033[0m\")\n", " if type(invocation) == str or \"json\" in invocation:\n", " invocation = remove_sequences(invocation)\n", " print(invocation)\n", "\n", " # Format the prompt with the invocation string\n", " request_prompt = COMMAND_GENERATION_PROMPT.format(\n", " invocation=invocation)\n", "\n", " messages = [{\"role\": \"user\", \"content\": f\"{request_prompt}\"}]\n", " completion, usage = get_chat_completion(messages, temperature=0.8)\n", " command_dict = {\"Input\": invocation, \"Prompt\": completion.content}\n", " example_list.append(command_dict)\n", " return example_list" ] }, { "cell_type": "code", "execution_count": 259, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34m0.0% complete\u001b[0m\n", "{'name': 'takeoff_drone', 'arguments': {'altitude': 100}}\n", "\u001b[34m1.8% complete\u001b[0m\n", "{'name': 'land_drone', 'arguments': {'location': 'current'}}\n", "\u001b[34m3.5% complete\u001b[0m\n", "{'name': 'land_drone', 'arguments': {'location': 'home_base'}}\n", "\u001b[34m5.3% complete\u001b[0m\n", "{'name': 'land_drone', 'arguments': {'location': 'custom'}}\n", "\u001b[34m7.0% complete\u001b[0m\n", "{'name': 'control_drone_movement', 'arguments': {'direction': 'forward', 'distance': 100}}\n", "\u001b[34m8.8% complete\u001b[0m\n", "{'name': 'control_drone_movement', 'arguments': {'direction': 'backward', 'distance': 50}}\n", "\u001b[34m10.5% complete\u001b[0m\n", "{'name': 'control_drone_movement', 'arguments': {'direction': 'left', 'distance': 10}}\n", "\u001b[34m12.3% complete\u001b[0m\n", "{'name': 'control_drone_movement', 'arguments': {'direction': 'right', 'distance': 10}}\n", "\u001b[34m14.0% complete\u001b[0m\n", "{'name': 'control_drone_movement', 'arguments': {'direction': 'up', 'distance': 10}}\n", "\u001b[34m15.8% complete\u001b[0m\n", "{'name': 'control_drone_movement', 'arguments': {'direction': 
'down', 'distance': 10}}\n", "\u001b[34m17.5% complete\u001b[0m\n", "{'name': 'set_drone_speed', 'arguments': {'speed': 10}}\n", "\u001b[34m19.3% complete\u001b[0m\n", "{'name': 'control_camera', 'arguments': {'mode': 'photo'}}\n", "\u001b[34m21.1% complete\u001b[0m\n", "{'name': 'control_camera', 'arguments': {'mode': 'photo', 'duration': 10}}\n", "\u001b[34m22.8% complete\u001b[0m\n", "{'name': 'control_camera', 'arguments': {'mode': 'video'}}\n", "\u001b[34m24.6% complete\u001b[0m\n", "{'name': 'control_camera', 'arguments': {'mode': 'video', 'duration': 60}}\n", "\u001b[34m26.3% complete\u001b[0m\n", "{'name': 'control_camera', 'arguments': {'mode': 'panorama'}}\n", "\u001b[34m28.1% complete\u001b[0m\n", "{'name': 'control_camera', 'arguments': {'mode': 'panorama', 'duration': 60}}\n", "\u001b[34m29.8% complete\u001b[0m\n", "{'name': 'control_gimbal', 'arguments': {'tilt': 45, 'pan': 90}}\n", "\u001b[34m31.6% complete\u001b[0m\n", "{'name': 'set_drone_lighting', 'arguments': {'mode': 'on'}}\n", "\u001b[34m33.3% complete\u001b[0m\n", "{'name': 'set_drone_lighting', 'arguments': {'mode': 'off'}}\n", "\u001b[34m35.1% complete\u001b[0m\n", "{'name': 'set_drone_lighting', 'arguments': {'mode': 'blink'}}\n", "\u001b[34m36.8% complete\u001b[0m\n", "{'name': 'set_drone_lighting', 'arguments': {'mode': 'sos'}}\n", "\u001b[34m38.6% complete\u001b[0m\n", "{'name': 'return_to_home', 'arguments': {}}\n", "\u001b[34m40.4% complete\u001b[0m\n", "{'name': 'set_battery_saver_mode', 'arguments': {'status': 'on'}}\n", "\u001b[34m42.1% complete\u001b[0m\n", "{'name': 'set_battery_saver_mode', 'arguments': {'status': 'off'}}\n", "\u001b[34m43.9% complete\u001b[0m\n", "{'name': 'set_obstacle_avoidance', 'arguments': {'mode': 'on'}}\n", "\u001b[34m45.6% complete\u001b[0m\n", "{'name': 'set_obstacle_avoidance', 'arguments': {'mode': 'off'}}\n", "\u001b[34m47.4% complete\u001b[0m\n", "{'name': 'set_follow_me_mode', 'arguments': {'status': 'on'}}\n", "\u001b[34m49.1% 
complete\u001b[0m\n", "{'name': 'set_follow_me_mode', 'arguments': {'status': 'off'}}\n", "\u001b[34m50.9% complete\u001b[0m\n", "{'name': 'calibrate_sensors', 'arguments': {}}\n", "\u001b[34m52.6% complete\u001b[0m\n", "{'name': 'set_autopilot', 'arguments': {'status': 'on'}}\n", "\u001b[34m54.4% complete\u001b[0m\n", "{'name': 'set_autopilot', 'arguments': {'status': 'off'}}\n", "\u001b[34m56.1% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'solid'}}\n", "\u001b[34m57.9% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'solid', 'color': 'red'}}\n", "\u001b[34m59.6% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'solid', 'color': 'blue'}}\n", "\u001b[34m61.4% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'solid', 'color': 'green'}}\n", "\u001b[34m63.2% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'solid', 'color': 'yellow'}}\n", "\u001b[34m64.9% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'solid', 'color': 'white'}}\n", "\u001b[34m66.7% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'blink'}}\n", "\u001b[34m68.4% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'blink', 'color': 'red'}}\n", "\u001b[34m70.2% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'blink', 'color': 'blue'}}\n", "\u001b[34m71.9% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'blink', 'color': 'green'}}\n", "\u001b[34m73.7% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'blink', 'color': 'yellow'}}\n", "\u001b[34m75.4% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'blink', 'color': 'white'}}\n", "\u001b[34m77.2% complete\u001b[0m\n", "{'name': 'configure_led_display', 
'arguments': {'pattern': 'pulse'}}\n", "\u001b[34m78.9% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'pulse', 'color': 'red'}}\n", "\u001b[34m80.7% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'pulse', 'color': 'blue'}}\n", "\u001b[34m82.5% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'pulse', 'color': 'green'}}\n", "\u001b[34m84.2% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'pulse', 'color': 'yellow'}}\n", "\u001b[34m86.0% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'pulse', 'color': 'white'}}\n", "\u001b[34m87.7% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'rainbow'}}\n", "\u001b[34m89.5% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'rainbow', 'color': 'red'}}\n", "\u001b[34m91.2% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'rainbow', 'color': 'blue'}}\n", "\u001b[34m93.0% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'rainbow', 'color': 'green'}}\n", "\u001b[34m94.7% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'rainbow', 'color': 'yellow'}}\n", "\u001b[34m96.5% complete\u001b[0m\n", "{'name': 'configure_led_display', 'arguments': {'pattern': 'rainbow', 'color': 'white'}}\n", "\u001b[34m98.2% complete\u001b[0m\n", "{'name': 'reject_request', 'arguments': {}}\n" ] } ], "source": [ "# Progress and the parsed invocation are printed for the first 100 entries\n", "training_examples_unformatted = create_commands(input_objects)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's format the training examples properly. 
For more documentation on the proper training data formatting for fine tuning for function calling, see here: https://platform.openai.com/docs/guides/fine-tuning/fine-tuning-examples\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "def remove_descriptions(function_list):\n", " for function in function_list:\n", " func = function[\"function\"]\n", " if \"description\" in func:\n", " del func[\"description\"]\n", "\n", " params = func[\"parameters\"]\n", " if \"properties\" in params:\n", " for param in params[\"properties\"].values():\n", " if \"description\" in param:\n", " del param[\"description\"]\n", "\n", " return function_list\n", "\n", "\n", "modified_function_list = remove_descriptions(function_list)" ] }, { "cell_type": "code", "execution_count": 261, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Let's get the drone in the air, how high should it go?\n", "{'name': 'takeoff_drone', 'arguments': '{\"altitude\": 100}'}\n", "Ready for takeoff, how high should the drone fly?\n", "{'name': 'takeoff_drone', 'arguments': '{\"altitude\": 100}'}\n", "Can you bring the drone down to where we are?\n", "{'name': 'land_drone', 'arguments': '{\"location\": \"current\"}'}\n", "Let's get the drone to land right here\n", "{'name': 'land_drone', 'arguments': '{\"location\": \"current\"}'}\n", "Bring the drone back to base for landing\n", "{'name': 'land_drone', 'arguments': '{\"location\": \"home_base\"}'}\n", "Can you safely land the drone at home base\n", "{'name': 'land_drone', 'arguments': '{\"location\": \"home_base\"}'}\n", "Can you make the drone move to the left by 10 units?\n", "{'name': 'control_drone_movement', 'arguments': '{\"direction\": \"left\", \"distance\": 10}'}\n", "I need the drone to go left, could you move it 10 steps that way?\n", "{'name': 'control_drone_movement', 'arguments': '{\"direction\": \"left\", \"distance\": 10}'}\n", "Can you move the drone to the right 
by 10 feet?\n", "{'name': 'control_drone_movement', 'arguments': '{\"direction\": \"right\", \"distance\": 10}'}\n", "I need the drone to go 10 feet to the right, can you do that?\n", "{'name': 'control_drone_movement', 'arguments': '{\"direction\": \"right\", \"distance\": 10}'}\n", "Can you make the drone go upwards by 10 units?\n", "{'name': 'control_drone_movement', 'arguments': '{\"direction\": \"up\", \"distance\": 10}'}\n", "I need the drone to move up, can you do that for me?\n", "{'name': 'control_drone_movement', 'arguments': '{\"direction\": \"up\", \"distance\": 10}'}\n", "Can you bring the drone lower by 10 feet please?\n", "{'name': 'control_drone_movement', 'arguments': '{\"direction\": \"down\", \"distance\": 10}'}\n", "I need the drone to descend 10 units, can you make that happen?\n", "{'name': 'control_drone_movement', 'arguments': '{\"direction\": \"down\", \"distance\": 10}'}\n", "Can you make the drone go faster?\n", "{'name': 'set_drone_speed', 'arguments': '{\"speed\": 10}'}\n", "I think the drone should speed up a bit, don't you think?\n", "{'name': 'set_drone_speed', 'arguments': '{\"speed\": 10}'}\n", "I want to take a picture, can you switch the camera mode to photo\n", "{'name': 'control_camera', 'arguments': '{\"mode\": \"photo\"}'}\n", "Let's capture this moment, switch the camera to photo mode please\n", "{'name': 'control_camera', 'arguments': '{\"mode\": \"photo\"}'}\n", "Can you switch the camera to photo mode and take a picture for 10 seconds?\n", "{'name': 'control_camera', 'arguments': '{\"mode\": \"photo\", \"duration\": 10}'}\n", "I need to capture something, can you set the camera to take photos for 10 seconds?\n", "{'name': 'control_camera', 'arguments': '{\"mode\": \"photo\", \"duration\": 10}'}\n", "Can you switch the camera to video mode?\n", "{'name': 'control_camera', 'arguments': '{\"mode\": \"video\"}'}\n", "I want to record, can you set the camera to video mode?\n", "{'name': 'control_camera', 'arguments': 
'{\"mode\": \"video\"}'}\n", "Can you start recording a video with the camera for a minute\n", "{'name': 'control_camera', 'arguments': '{\"mode\": \"video\", \"duration\": 60}'}\n", "I need to film something, can you put the camera in video mode for 60 seconds\n", "{'name': 'control_camera', 'arguments': '{\"mode\": \"video\", \"duration\": 60}'}\n", "Can you switch the camera to panorama mode?\n", "{'name': 'control_camera', 'arguments': '{\"mode\": \"panorama\"}'}\n", "I'd like to take a 360-degree photo, can you set the camera to panorama mode?\n", "{'name': 'control_camera', 'arguments': '{\"mode\": \"panorama\"}'}\n", "Can you set the camera to take a panorama shot for a minute\n", "{'name': 'control_camera', 'arguments': '{\"mode\": \"panorama\", \"duration\": 60}'}\n", "I'd like to switch the camera mode to panorama and have it last for a minute\n", "{'name': 'control_camera', 'arguments': '{\"mode\": \"panorama\", \"duration\": 60}'}\n", "Can you adjust the camera angle up and to the right?\n", "{'name': 'control_gimbal', 'arguments': '{\"tilt\": 45, \"pan\": 90}'}\n", "I need to tilt the camera up and pan it to the right, can you do that?\n", "{'name': 'control_gimbal', 'arguments': '{\"tilt\": 45, \"pan\": 90}'}\n", "Can you turn on the lights for the drone\n", "{'name': 'set_drone_lighting', 'arguments': '{\"mode\": \"on\"}'}\n", "I need some extra light, can you activate it on the drone\n", "{'name': 'set_drone_lighting', 'arguments': '{\"mode\": \"on\"}'}\n", "Can you turn off the lights on the drone\n", "{'name': 'set_drone_lighting', 'arguments': '{\"mode\": \"off\"}'}\n", "I don't need the drone lights on, can you switch them off\n", "{'name': 'set_drone_lighting', 'arguments': '{\"mode\": \"off\"}'}\n", "Can you make the drone lights flash?\n", "{'name': 'set_drone_lighting', 'arguments': '{\"mode\": \"blink\"}'}\n", "I want the drone lights to blink, can you do that?\n", "{'name': 'set_drone_lighting', 'arguments': '{\"mode\": \"blink\"}'}\n", 
"Can you switch the drone lights to the SOS mode, just in case?\n", "{'name': 'set_drone_lighting', 'arguments': '{\"mode\": \"sos\"}'}\n", "I need the drone lights to flash SOS, can you set that up?\n", "{'name': 'set_drone_lighting', 'arguments': '{\"mode\": \"sos\"}'}\n", "Can you bring the drone back home now?\n", "{'name': 'return_to_home', 'arguments': '{}'}\n", "Is it time for the drone to return to base?\n", "{'name': 'return_to_home', 'arguments': '{}'}\n", "My phone battery is draining so fast, can you turn on battery saver mode\n", "{'name': 'set_battery_saver_mode', 'arguments': '{\"status\": \"on\"}'}\n", "I need my laptop battery to last longer, can you switch on battery saver mode\n", "{'name': 'set_battery_saver_mode', 'arguments': '{\"status\": \"on\"}'}\n", "My phone battery is draining too quickly, can you turn off the battery saver mode\n", "{'name': 'set_battery_saver_mode', 'arguments': '{\"status\": \"off\"}'}\n", "I feel like my device is slower with battery saver on, can we turn it off?\n", "{'name': 'set_battery_saver_mode', 'arguments': '{\"status\": \"off\"}'}\n", "I want the car to avoid obstacles, can you turn on that feature?\n", "{'name': 'set_obstacle_avoidance', 'arguments': '{\"mode\": \"on\"}'}\n", "Can you activate the obstacle avoidance mode for safety purposes?\n", "{'name': 'set_obstacle_avoidance', 'arguments': '{\"mode\": \"on\"}'}\n", "I'd like to turn off obstacle detection, how do I do that?\n", "{'name': 'set_obstacle_avoidance', 'arguments': '{\"mode\": \"off\"}'}\n", "Can you disable the obstacle avoidance feature for now?\n", "{'name': 'set_obstacle_avoidance', 'arguments': '{\"mode\": \"off\"}'}\n", "Can you activate the follow me mode?\n", "{'name': 'set_follow_me_mode', 'arguments': '{\"status\": \"on\"}'}\n", "I want the car to follow me, can you turn on that feature?\n", "{'name': 'set_follow_me_mode', 'arguments': '{\"status\": \"on\"}'}\n", "I don't want the drone following me anymore, can you turn that 
off?\n", "{'name': 'set_follow_me_mode', 'arguments': '{\"status\": \"off\"}'}\n", "Can you disable the follow-me mode on the drone?\n", "{'name': 'set_follow_me_mode', 'arguments': '{\"status\": \"off\"}'}\n", "The sensors are acting up, can you recalibrate them\n", "{'name': 'calibrate_sensors', 'arguments': '{}'}\n", "My device doesn't seem to be sensing correctly, can you adjust it\n", "{'name': 'calibrate_sensors', 'arguments': '{}'}\n", "I'm too tired to drive, can you turn on the autopilot\n", "{'name': 'set_autopilot', 'arguments': '{\"status\": \"on\"}'}\n", "Let the car drive itself, turn on autopilot\n", "{'name': 'set_autopilot', 'arguments': '{\"status\": \"on\"}'}\n", "I'm feeling more confident, turn off the autopilot\n", "{'name': 'set_autopilot', 'arguments': '{\"status\": \"off\"}'}\n", "I think I can handle it, deactivate the automatic pilot\n", "{'name': 'set_autopilot', 'arguments': '{\"status\": \"off\"}'}\n", "Can you set the display to a steady yellow color?\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"solid\", \"color\": \"yellow\"}'}\n", "I'd like the LED display to be a solid yellow, please.\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"solid\", \"color\": \"yellow\"}'}\n", "Can you make the lights flash on and off\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"blink\"}'}\n", "I want the LED display to blink, can you set that up\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"blink\"}'}\n", "Can you make the lights flash in red?\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"blink\", \"color\": \"red\"}'}\n", "How do I set the display to blink in red?\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"blink\", \"color\": \"red\"}'}\n", "Can you make the lights flash in yellow?\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"blink\", \"color\": \"yellow\"}'}\n", "How do I set the display to 
blink in yellow?\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"blink\", \"color\": \"yellow\"}'}\n", "Can you make the lights blink instead of staying steady\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"pulse\"}'}\n", "I want the LEDs to flash, not stay solid\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"pulse\"}'}\n", "Can you make the LED display pulse in red, please?\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"pulse\", \"color\": \"red\"}'}\n", "I'd like the LED display to flash in red, can you set that up?\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"pulse\", \"color\": \"red\"}'}\n", "I want the LED lights to flash in blue\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"pulse\", \"color\": \"blue\"}'}\n", "Can you set the display to pulse with a blue color\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"pulse\", \"color\": \"blue\"}'}\n", "Can you make the lights flash and change to green\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"pulse\", \"color\": \"green\"}'}\n", "Let's set the LEDs to blink and switch to green\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"pulse\", \"color\": \"green\"}'}\n", "Can you change the flashy lights to yellow and make them pulse\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"pulse\", \"color\": \"yellow\"}'}\n", "I want the LED display to blink in yellow, can you do that\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"pulse\", \"color\": \"yellow\"}'}\n", "Can you change the colors on the display to red and set it to a rainbow pattern?\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"rainbow\", \"color\": \"red\"}'}\n", "I want the LED display to show a rainbow pattern in red, can you set that up?\n", "{'name': 'configure_led_display', 'arguments': 
'{\"pattern\": \"rainbow\", \"color\": \"red\"}'}\n", "Can you change the color and pattern of the lights to blue and rainbow?\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"rainbow\", \"color\": \"blue\"}'}\n", "I'm feeling like some colorful lights, can you set it to blue and rainbow?\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"rainbow\", \"color\": \"blue\"}'}\n", "Can you set the LED display to show a rainbow pattern in green color?\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"rainbow\", \"color\": \"green\"}'}\n", "I'd like the LED display to cycle through colors, starting with green\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"rainbow\", \"color\": \"green\"}'}\n", "Can you make the lights do a cool rainbow effect\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"rainbow\", \"color\": \"white\"}'}\n", "Change the color of the lights to white and make them change like a rainbow\n", "{'name': 'configure_led_display', 'arguments': '{\"pattern\": \"rainbow\", \"color\": \"white\"}'}\n", "I changed my mind, can you cancel that request\n", "{'name': 'reject_request', 'arguments': '{}'}\n", "I don't want to proceed with the request anymore, can you reject it\n", "{'name': 'reject_request', 'arguments': '{}'}\n" ] } ], "source": [ "training_examples = []\n", "\n", "for prompt in training_examples_unformatted:\n", " # adjust formatting for training data specs\n", "\n", " # if its not a dict, convert to dict\n", " if type(prompt[\"Input\"]) != dict:\n", " prompt[\"Input\"] = ast.literal_eval(prompt[\"Input\"])\n", " prompt[\"Input\"][\"arguments\"] = json.dumps(prompt[\"Input\"][\"arguments\"])\n", " try:\n", " prompt[\"Prompt\"] = json.loads(prompt[\"Prompt\"])\n", " except:\n", " continue\n", " for p in prompt[\"Prompt\"]:\n", " print(p)\n", " print(prompt[\"Input\"])\n", " tool_calls = [\n", " {\"id\": \"call_id\", \"type\": \"function\", \"function\": 
prompt[\"Input\"]}\n", " ]\n", " training_examples.append(\n", " {\n", " \"messages\": [\n", " {\"role\": \"system\", \"content\": DRONE_SYSTEM_PROMPT},\n", " {\"role\": \"user\", \"content\": p},\n", " {\"role\": \"assistant\", \"tool_calls\": tool_calls},\n", " ],\n", " \"parallel_tool_calls\": False,\n", " \"tools\": modified_function_list,\n", " }\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, back to the rejection function. Let's generate some prompts that are _nearly_ possible, but should result in the `reject_request` function being called. To do so, we queried `gpt-4o` asking for requests that are related to, but not quite possible with, the given list of functions.\n" ] }, { "cell_type": "code", "execution_count": 262, "metadata": {}, "outputs": [], "source": [ "reject_list = [\n", " \"Translate broadcast message to another language\",\n", " \"Automatically capture photos when face is detected\",\n", " \"Detect nearby drones\",\n", " \"Measure wind resistance\",\n", " \"Capture slow motion video\",\n", " \"Move the drone forward and backward by same distance at the same time.\",\n", " \"Adjust drone's altitude to ground level changes\",\n", " \"Display custom message on LED display\",\n", " \"Sync drone's time with smartphone\",\n", " \"Alert when drone travels out of designated area\",\n", " \"Calibrate sensors and land simultaneously\",\n", " \"Detect moisture levels\",\n", " \"Automatically follow GPS tagged object\",\n", " \"Toggle night vision mode\",\n", " \"Maintain current altitude when battery is low\",\n", " \"Decide best landing spot using AI\",\n", " \"Program drone's route based on wind direction\",\n", "]" ] }, { "cell_type": "code", "execution_count": 263, "metadata": {}, "outputs": [], "source": [ "reject_training_list = []\n", "for prompt in reject_list:\n", " # Adjust formatting\n", " tool_calls = [\n", " {\n", " \"id\": \"call_id\",\n", " \"type\": \"function\",\n", " \"function\": {\"name\": 
\"reject_request\", \"arguments\": \"{}\"},\n", " }\n", " ]\n", " reject_training_list.append(\n", " {\n", " \"messages\": [\n", " {\"role\": \"system\", \"content\": DRONE_SYSTEM_PROMPT},\n", " {\"role\": \"user\", \"content\": prompt},\n", " {\"role\": \"assistant\", \"tool_calls\": tool_calls},\n", " ],\n", " \"parallel_tool_calls\": False,\n", " \"tools\": modified_function_list,\n", " }\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now combine all the training examples together\n" ] }, { "cell_type": "code", "execution_count": 264, "metadata": {}, "outputs": [], "source": [ "training_list_total = training_examples + reject_training_list" ] }, { "cell_type": "code", "execution_count": 265, "metadata": {}, "outputs": [], "source": [ "training_file = \"data/drone_training.jsonl\"\n", "with open(training_file, \"w\") as f:\n", " for item in training_list_total:\n", " json_str = json.dumps(item)\n", " f.write(f\"{json_str}\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fine tuning\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can kick off the fine-tuning job\n" ] }, { "cell_type": "code", "execution_count": 200, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FileID: file-blg0IytwIivZQzc9mbfnS8Pm\n", "Fine-tuning job created: FineTuningJob(id='ftjob-84PQg97hoIAKf21IPnhiNlU1', created_at=1718580285, error=Error(code=None, message=None, param=None), fine_tuned_model=None, finished_at=None, hyperparameters=Hyperparameters(n_epochs='auto', batch_size='auto', learning_rate_multiplier='auto'), model='gpt-3.5-turbo-0125', object='fine_tuning.job', organization_id='org-lb41cclBdkq5pm6BgDhx8DHP', result_files=[], seed=1513865891, status='validating_files', trained_tokens=None, training_file='file-blg0IytwIivZQzc9mbfnS8Pm', validation_file=None, estimated_finish=None, integrations=[], user_provided_suffix='drone')\n" ] } ], "source": [ "# Upload the training 
file\n", "with open(training_file, \"rb\") as f:\n", "    file = client.files.create(\n", "        file=f,\n", "        purpose=\"fine-tune\",\n", "    )\n", "file_id = file.id\n", "print(f\"FileID: {file_id}\")\n", "\n", "# Create a fine-tuning job\n", "\n", "ft = client.fine_tuning.jobs.create(\n", "    model=\"gpt-3.5-turbo\",\n", "    training_file=file_id,\n", "    suffix=\"drone\",\n", ")\n", "\n", "print(f\"Fine-tuning job created: {ft}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to creating a fine-tuning job, you can also list existing jobs, retrieve the status of a job, or cancel a job.\n" ] }, { "cell_type": "code", "execution_count": 224, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "FineTuningJob(id='ftjob-84PQg97hoIAKf21IPnhiNlU1', created_at=1718580285, error=Error(code=None, message=None, param=None), fine_tuned_model='ft:gpt-3.5-turbo-0125:openai-gtm:drone:9atiPjeC', finished_at=1718581004, hyperparameters=Hyperparameters(n_epochs=3, batch_size=1, learning_rate_multiplier=2), model='gpt-3.5-turbo-0125', object='fine_tuning.job', organization_id='org-lb41cclBdkq5pm6BgDhx8DHP', result_files=['file-F6XPJFLVG9f3mR04KBmwUI9H'], seed=1513865891, status='succeeded', trained_tokens=145983, training_file='file-blg0IytwIivZQzc9mbfnS8Pm', validation_file=None, estimated_finish=None, integrations=[], user_provided_suffix='drone')" ] }, "execution_count": 224, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ID of the job created above (the same value as ft.id from that run)\n", "ftjob_id = \"ftjob-84PQg97hoIAKf21IPnhiNlU1\"\n", "# List 10 fine-tuning jobs\n", "# client.fine_tuning.jobs.list(limit=10)\n", "\n", "# Retrieve the state of a fine-tuning job\n", "client.fine_tuning.jobs.retrieve(ftjob_id)\n", "\n", "# Cancel a job\n", "# client.fine_tuning.jobs.cancel(\"ftjob-abc123\")\n", "\n", "# List up to 10 events from a fine-tuning job\n", "# client.fine_tuning.jobs.list_events(fine_tuning_job_id=\"ftjob-abc123\", limit=10)\n", "\n", "# Delete a fine-tuned model (must be an owner of the org the model was 
created in)\n", "# client.models.delete(\"ft:gpt-3.5-turbo:abc:suffix:abc123\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After a fine-tuning job has finished, you can also inspect how training went by querying the job, extracting a file ID from `result_files`, and retrieving that file's content. Each results CSV file has the following columns: `step`, `train_loss`, `train_accuracy`, `valid_loss`, and `valid_mean_token_accuracy`. While metrics can be helpful, evaluating samples from the fine-tuned model provides the most relevant sense of model quality.\n" ] }, { "cell_type": "code", "execution_count": 227, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "step,train_loss,train_accuracy,valid_loss,valid_mean_token_accuracy\n", "1,3.63265,0.5,,\n", "2,2.45992,0.80952,,\n", "3,2.77939,0.80952,,\n", "4,3.53073,0.65,,\n", "5,2.61654,0.8,,\n", "6,2.16,0.85714,,\n", "7,2.73706,0.8,,\n", "8,2.56944,0.625,,\n", "9,2.06096,0.78947,,\n", "10,1.69598,0.8,,\n", "11,1.94268,0.77778,,\n", "12,1.61752,0.86667,,\n", "13,1.2442,0.8,,\n", "14,0.73411,0.875,,\n", "15,0.34285,0.875,,\n", "16,0.22229,0.95238,,\n", "17,0.04635,0.95,,\n", "18,0.00626,1.0,,\n", "19,0.60888,0.90909,,\n", "20,0.00092,1.0,,\n", "21,0.8001,0.95,,\n", "22,0.04982,1.0,,\n", "23,0.35494,0.92857,,\n", "24,0.00023,1.0,,\n", "25,0.00034,1.0,,\n", "26,0.0029,1.0,,\n", "27,0.58017,0.875,,\n", "28,0.13018,0.9375,,\n", "29,0.00109,1.0,,\n", "30,6e-05,1.0,,\n", "31,0.61665,0.95,,\n", "32,3e-05,1.0,,\n", "33,0.23598,0.95,,\n", "34,3e-05,1.0,,\n", "35,0.03566,1.0,,\n", "36,1e-05,1.0,,\n", "37,1e-05,1.0,,\n", "38,2e-05,1.0,,\n", "39,2e-05,1.0,,\n", "40,0.00034,1.0,,\n", "41,0.0,1.0,,\n", "42,0.0,1.0,,\n", "43,0.0,1.0,,\n", "44,0.0,1.0,,\n", "45,0.0,1.0,,\n", "46,0.91896,0.95,,\n", "47,0.0,1.0,,\n", "48,0.12006,0.95,,\n", "49,0.0,1.0,,\n", "50,3.92872,0.75,,\n", "51,0.0,1.0,,\n", "52,0.98277,0.90476,,\n", "53,0.0,1.0,,\n", "54,0.0,1.0,,\n", 
"55,1e-05,1.0,,\n", "56,0.00401,1.0,,\n", "57,0.07366,1.0,,\n", "58,0.0,1.0,,\n", "59,0.0,1.0,,\n", "60,0.0,1.0,,\n", "61,0.0,1.0,,\n", "62,0.10347,0.875,,\n", "63,0.0,1.0,,\n", "64,0.0,1.0,,\n", "65,1e-05,1.0,,\n", "66,2.97112,0.85714,,\n", "67,1.12396,0.875,,\n", "68,2e-05,1.0,,\n", "69,0.00067,1.0,,\n", "70,0.0,1.0,,\n", "71,0.0,1.0,,\n", "72,0.0,1.0,,\n", "73,0.0,1.0,,\n", "74,0.0,1.0,,\n", "75,0.02064,1.0,,\n", "76,0.5146,0.86667,,\n", "77,0.18756,0.95,,\n", "78,6e-05,1.0,,\n", "79,0.0,1.0,,\n", "80,0.21298,0.93333,,\n", "81,0.0,1.0,,\n", "82,0.0,1.0,,\n", "83,0.0,1.0,,\n", "84,0.00139,1.0,,\n", "85,0.0,1.0,,\n", "86,0.85297,0.875,,\n", "87,0.0,1.0,,\n", "88,0.0,1.0,,\n", "89,1.45164,0.875,,\n", "90,0.0,1.0,,\n", "91,0.05329,0.92857,,\n", "92,0.55506,0.93333,,\n", "93,0.42187,0.92857,,\n", "94,0.0,1.0,,\n", "95,0.0,1.0,,\n", "96,0.0,1.0,,\n", "97,0.0,1.0,,\n", "98,0.0,1.0,,\n", "99,0.0,1.0,,\n", "100,0.0,1.0,,\n", "101,0.0,1.0,,\n", "102,0.0,1.0,,\n", "103,0.09194,0.95455,,\n", "104,0.0,1.0,,\n", "105,0.0,1.0,,\n", "106,0.05531,0.95,,\n", "107,0.0,1.0,,\n", "108,0.39621,0.95238,,\n", "109,0.0,1.0,,\n", "110,0.8449,0.95,,\n", "111,0.01258,1.0,,\n", "112,0.0,1.0,,\n", "113,0.0,1.0,,\n", "114,0.0,1.0,,\n", "115,0.00355,1.0,,\n", "116,0.0,1.0,,\n", "117,0.3954,0.94118,,\n", "118,0.00259,1.0,,\n", "119,0.0,1.0,,\n", "120,0.0,1.0,,\n", "121,0.35876,0.95,,\n", "122,0.0,1.0,,\n", "123,0.0,1.0,,\n", "124,5e-05,1.0,,\n", "125,0.0,1.0,,\n", "126,0.0,1.0,,\n", "127,0.0,1.0,,\n", "128,0.0,1.0,,\n", "129,0.0,1.0,,\n", "130,0.01336,1.0,,\n", "131,0.0,1.0,,\n", "132,0.23362,0.95,,\n", "133,0.00157,1.0,,\n", "134,0.0,1.0,,\n", "135,0.00031,1.0,,\n", "136,0.0,1.0,,\n", "137,0.08313,0.92857,,\n", "138,0.0,1.0,,\n", "139,0.0,1.0,,\n", "140,0.0,1.0,,\n", "141,0.43608,0.95,,\n", "142,0.0,1.0,,\n", "143,0.0,1.0,,\n", "144,0.0,1.0,,\n", "145,2e-05,1.0,,\n", "146,1.20409,0.85714,,\n", "147,0.0,1.0,,\n", "148,0.0,1.0,,\n", "149,0.0,1.0,,\n", "150,0.0,1.0,,\n", "151,0.0,1.0,,\n", 
"152,0.0,1.0,,\n", "153,0.0,1.0,,\n", "154,0.00063,1.0,,\n", "155,0.0,1.0,,\n", "156,0.0,1.0,,\n", "157,0.0,1.0,,\n", "158,6e-05,1.0,,\n", "159,0.0,1.0,,\n", "160,0.0,1.0,,\n", "161,0.0,1.0,,\n", "162,0.0,1.0,,\n", "163,0.0,1.0,,\n", "164,0.0,1.0,,\n", "165,0.0,1.0,,\n", "166,0.0,1.0,,\n", "167,0.0,1.0,,\n", "168,0.0,1.0,,\n", "169,0.0,1.0,,\n", "170,0.0,1.0,,\n", "171,0.0,1.0,,\n", "172,0.0,1.0,,\n", "173,0.0,1.0,,\n", "174,0.00783,1.0,,\n", "175,0.0,1.0,,\n", "176,0.0,1.0,,\n", "177,0.0,1.0,,\n", "178,0.0,1.0,,\n", "179,0.0,1.0,,\n", "180,0.0,1.0,,\n", "181,0.0,1.0,,\n", "182,0.00028,1.0,,\n", "183,0.0,1.0,,\n", "184,0.0,1.0,,\n", "185,0.0003,1.0,,\n", "186,0.0,1.0,,\n", "187,0.0,1.0,,\n", "188,0.0,1.0,,\n", "189,0.0,1.0,,\n", "190,0.0,1.0,,\n", "191,0.0,1.0,,\n", "192,0.0,1.0,,\n", "193,0.00013,1.0,,\n", "194,0.86198,0.875,,\n", "195,0.0,1.0,,\n", "196,0.0,1.0,,\n", "197,0.0,1.0,,\n", "198,0.0,1.0,,\n", "199,0.0,1.0,,\n", "200,0.0,1.0,,\n", "201,0.0,1.0,,\n", "202,0.0,1.0,,\n", "203,0.0,1.0,,\n", "204,0.09954,0.95455,,\n", "205,0.0,1.0,,\n", "206,0.0,1.0,,\n", "207,0.0,1.0,,\n", "208,1.9616,0.9375,,\n", "209,0.0,1.0,,\n", "210,0.0,1.0,,\n", "211,0.0,1.0,,\n", "212,0.0,1.0,,\n", "213,0.0,1.0,,\n", "214,0.0,1.0,,\n", "215,0.0,1.0,,\n", "216,0.0,1.0,,\n", "217,0.0,1.0,,\n", "218,0.0,1.0,,\n", "219,0.0,1.0,,\n", "220,0.0,1.0,,\n", "221,0.0,1.0,,\n", "222,0.0,1.0,,\n", "223,0.0,1.0,,\n", "224,0.0,1.0,,\n", "225,0.0,1.0,,\n", "226,0.00174,1.0,,\n", "227,0.0,1.0,,\n", "228,2e-05,1.0,,\n", "229,0.0,1.0,,\n", "230,0.0,1.0,,\n", "231,0.0,1.0,,\n", "232,0.0,1.0,,\n", "233,0.0,1.0,,\n", "234,0.61895,0.95,,\n", "235,0.0,1.0,,\n", "236,0.0,1.0,,\n", "237,0.0,1.0,,\n", "238,0.0,1.0,,\n", "239,0.54945,0.95,,\n", "240,0.0,1.0,,\n", "241,0.0,1.0,,\n", "242,1.52953,0.9375,,\n", "243,1.19938,0.85714,,\n", "244,0.0,1.0,,\n", "245,0.0,1.0,,\n", "246,0.0,1.0,,\n", "247,0.0,1.0,,\n", "248,8e-05,1.0,,\n", "249,0.0,1.0,,\n", "250,0.0,1.0,,\n", "251,0.0,1.0,,\n", "252,0.0,1.0,,\n", 
"253,0.0,1.0,,\n", "254,0.0,1.0,,\n", "255,0.0,1.0,,\n", "256,0.0,1.0,,\n", "257,0.0,1.0,,\n", "258,0.0,1.0,,\n", "259,0.0,1.0,,\n", "260,0.0,1.0,,\n", "261,0.0,1.0,,\n", "262,0.0,1.0,,\n", "263,0.0,1.0,,\n", "264,0.0,1.0,,\n", "265,0.0,1.0,,\n", "266,0.0,1.0,,\n", "267,0.88984,0.95,,\n", "268,0.0,1.0,,\n", "269,0.0,1.0,,\n", "270,0.0,1.0,,\n", "271,0.0,1.0,,\n", "272,0.0,1.0,,\n", "273,0.0,1.0,,\n", "274,0.0,1.0,,\n", "275,0.00013,1.0,,\n", "276,0.0,1.0,,\n", "277,0.89825,0.92857,,\n", "278,0.0,1.0,,\n", "279,0.00017,1.0,,\n", "280,0.0,1.0,,\n", "281,0.0,1.0,,\n", "282,0.0,1.0,,\n", "283,0.65667,0.95,,\n", "284,0.0,1.0,,\n", "285,0.0,1.0,,\n", "286,0.0,1.0,,\n", "287,0.0,1.0,,\n", "288,0.0,1.0,,\n", "289,0.0,1.0,,\n", "290,0.0,1.0,,\n", "291,0.0,1.0,,\n", "292,0.28626,0.95238,,\n", "293,0.0,1.0,,\n", "294,0.0,1.0,,\n", "295,0.0,1.0,,\n", "296,0.0,1.0,,\n", "297,0.0,1.0,,\n", "298,0.0,1.0,,\n", "299,0.0,1.0,,\n", "300,0.0,1.0,,\n", "301,0.0,1.0,,\n", "302,0.0,1.0,,\n", "303,0.0,1.0,,\n", "304,0.0,1.0,,\n", "305,0.0,1.0,,\n", "306,0.0,1.0,,\n", "307,0.0,1.0,,\n", "308,0.0,1.0,,\n", "309,0.0,1.0,,\n", "\n" ] } ], "source": [ "fine_tune_results = client.fine_tuning.jobs.retrieve(ftjob_id).result_files\n", "result_file_id = client.files.retrieve(fine_tune_results[0]).id\n", "\n", "# Retrieve the result file\n", "result_file = client.files.content(file_id=result_file_id)\n", "decoded_content = base64.b64decode(result_file.read()).decode(\"utf-8\")\n", "print(decoded_content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Evaluations\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great! We trained a fine-tuned model for function calling. 
Let's see how it does on our evaluation set for prompts that the drone assistant\n", "should automatically reject.\n" ] }, { "cell_type": "code", "execution_count": 226, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Evaluating fine-tuned model with challenging prompts: ft:gpt-3.5-turbo-0125:openai-gtm:drone:9atiPjeC\n" ] }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
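<table>
<thead><tr><th></th><th>Prompt</th><th>Actual</th><th>Expected</th><th>Match</th></tr></thead>
<tbody>
<tr><th>0</th><td>Play pre-recorded audio message</td><td>reject_request</td><td>reject_request</td><td>Yes</td></tr>
<tr><th>1</th><td>Initiate following on social media</td><td>reject_request</td><td>reject_request</td><td>Yes</td></tr>
<tr><th>2</th><td>Scan environment for heat signatures</td><td>reject_request</td><td>reject_request</td><td>Yes</td></tr>
<tr><th>3</th><td>Bump into obstacles</td><td>reject_request</td><td>reject_request</td><td>Yes</td></tr>
<tr><th>4</th><td>Change drone's paint job color</td><td>reject_request</td><td>reject_request</td><td>Yes</td></tr>
<tr><th>5</th><td>Coordinate with nearby drones</td><td>reject_request</td><td>reject_request</td><td>Yes</td></tr>
<tr><th>6</th><td>Change speed to negative 120 km/h</td><td>reject_request</td><td>reject_request</td><td>Yes</td></tr>
<tr><th>7</th><td>Detect a person</td><td>reject_request</td><td>reject_request</td><td>Yes</td></tr>
<tr><th>8</th><td>Please enable night vision</td><td>reject_request</td><td>reject_request</td><td>Yes</td></tr>
<tr><th>9</th><td>Report on humidity levels around you</td><td>reject_request</td><td>reject_request</td><td>Yes</td></tr>
</tbody>
</table>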
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Number of matches: 10 out of 10 (100.00%)\n", "Average latency per request: 3519.17 ms\n", "Average tokens used per request: 457.20\n", "\n", "Evaluating base model with challenging prompts: gpt-3.5-turbo\n" ] }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
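<table>
<thead><tr><th></th><th>Prompt</th><th>Actual</th><th>Expected</th><th>Match</th></tr></thead>
<tbody>
<tr><th>0</th><td>Play pre-recorded audio message</td><td>reject_request</td><td>reject_request</td><td>Yes</td></tr>
<tr><th>1</th><td>Initiate following on social media</td><td>set_follow_me_mode</td><td>reject_request</td><td>No</td></tr>
<tr><th>2</th><td>Scan environment for heat signatures</td><td>reject_request</td><td>reject_request</td><td>Yes</td></tr>
<tr><th>3</th><td>Bump into obstacles</td><td>set_obstacle_avoidance</td><td>reject_request</td><td>No</td></tr>
<tr><th>4</th><td>Change drone's paint job color</td><td>reject_request</td><td>reject_request</td><td>Yes</td></tr>
<tr><th>5</th><td>Coordinate with nearby drones</td><td>reject_request</td><td>reject_request</td><td>Yes</td></tr>
<tr><th>6</th><td>Change speed to negative 120 km/h</td><td>set_drone_speed</td><td>reject_request</td><td>No</td></tr>
<tr><th>7</th><td>Detect a person</td><td>reject_request</td><td>reject_request</td><td>Yes</td></tr>
<tr><th>8</th><td>Please enable night vision</td><td>set_drone_lighting</td><td>reject_request</td><td>No</td></tr>
<tr><th>9</th><td>Report on humidity levels around you</td><td>reject_request</td><td>reject_request</td><td>Yes</td></tr>
</tbody>
</table>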
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Number of matches: 6 out of 10 (60.00%)\n", "Average latency per request: 647.58 ms\n", "Average tokens used per request: 791.90\n" ] } ], "source": [ "ft_model = \"ft:gpt-3.5-turbo-0125:openai-gtm:drone:9atiPjeC\"\n", "base_model = \"gpt-3.5-turbo\"\n", "\n", "print(f\"\\nEvaluating fine-tuned model with challenging prompts: {ft_model}\")\n", "eval(\n", " model=ft_model,\n", " function_list=modified_function_list,\n", " system_prompt=DRONE_SYSTEM_PROMPT,\n", " prompts_to_expected_tool_name=challenging_prompts_to_expected,\n", ")\n", "\n", "print(f\"\\nEvaluating base model with challenging prompts: {base_model}\")\n", "eval(\n", " model=base_model,\n", " function_list=function_list,\n", " system_prompt=DRONE_SYSTEM_PROMPT,\n", " prompts_to_expected_tool_name=challenging_prompts_to_expected,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great! While the base model rejected only 60% of these requests, the fine-tuned model rejected 100% of them and used fewer tokens per request (457.20 vs. 791.90 on average). Note that its average latency in this run was higher (3519.17 ms vs. 647.58 ms).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conclusion\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Congratulations! You are now ready to fine-tune your model for function calling. We can't wait to see what you build.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.8" } }, "nbformat": 4, "nbformat_minor": 2 }