{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to format inputs to ChatGPT models\n",
    "\n",
    "ChatGPT is powered by `gpt-3.5-turbo` and `gpt-4`, OpenAI's most advanced models.\n",
    "\n",
    "You can build your own applications with `gpt-3.5-turbo` or `gpt-4` using the OpenAI API.\n",
    "\n",
    "Chat models take a series of messages as input, and return an AI-written message as output.\n",
    "\n",
    "This guide illustrates the chat format with a few example API calls."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Import the openai library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# if needed, install and/or upgrade to the latest version of the OpenAI Python library\n",
    "%pip install --upgrade openai\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the OpenAI Python library for calling the OpenAI API\n",
    "import openai\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. An example chat API call\n",
    "\n",
    "A chat API call has two required inputs:\n",
    "- `model`: the name of the model you want to use (e.g., `gpt-3.5-turbo`, `gpt-4`, `gpt-4-0314`)\n",
    "- `messages`: a list of message objects, where each object has two required fields:\n",
    "    - `role`: the role of the messenger (either `system`, `user`, or `assistant`)\n",
    "    - `content`: the content of the message (e.g., `Write me a beautiful poem`)\n",
    "\n",
    "Messages can also contain an optional `name` field, which give the messenger a name. E.g., `example-user`, `Alice`, `BlackbeardBot`. Names may not contain spaces.\n",
    "\n",
    "Typically, a conversation will start with a system message that tells the assistant how to behave, followed by alternating user and assistant messages, but you are not required to follow this format.\n",
    "\n",
    "Let's look at an example chat API calls to see how the chat format works in practice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<OpenAIObject chat.completion id=chatcmpl-6xpmlDodtW6RwiaMaC1zhLsR8Y1D3 at 0x10dccc900> JSON: {\n",
       "  \"choices\": [\n",
       "    {\n",
       "      \"finish_reason\": \"stop\",\n",
       "      \"index\": 0,\n",
       "      \"message\": {\n",
       "        \"content\": \"Orange who?\",\n",
       "        \"role\": \"assistant\"\n",
       "      }\n",
       "    }\n",
       "  ],\n",
       "  \"created\": 1679718435,\n",
       "  \"id\": \"chatcmpl-6xpmlDodtW6RwiaMaC1zhLsR8Y1D3\",\n",
       "  \"model\": \"gpt-3.5-turbo-0301\",\n",
       "  \"object\": \"chat.completion\",\n",
       "  \"usage\": {\n",
       "    \"completion_tokens\": 3,\n",
       "    \"prompt_tokens\": 39,\n",
       "    \"total_tokens\": 42\n",
       "  }\n",
       "}"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Example OpenAI Python library request\n",
    "MODEL = \"gpt-3.5-turbo\"\n",
    "response = openai.ChatCompletion.create(\n",
    "    model=MODEL,\n",
    "    messages=[\n",
    "        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
    "        {\"role\": \"user\", \"content\": \"Knock knock.\"},\n",
    "        {\"role\": \"assistant\", \"content\": \"Who's there?\"},\n",
    "        {\"role\": \"user\", \"content\": \"Orange.\"},\n",
    "    ],\n",
    "    temperature=0,\n",
    ")\n",
    "\n",
    "response\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the response object has a few fields:\n",
    "- `id`: the ID of the request\n",
    "- `object`: the type of object returned (e.g., `chat.completion`)\n",
    "- `created`: the timestamp of the request\n",
    "- `model`: the full name of the model used to generate the response\n",
    "- `usage`: the number of tokens used to generate the replies, counting prompt, completion, and total\n",
    "- `choices`: a list of completion objects (only one, unless you set `n` greater than 1)\n",
    "    - `message`: the message object generated by the model, with `role` and `content`\n",
    "    - `finish_reason`: the reason the model stopped generating text (either `stop`, or `length` if `max_tokens` limit was reached)\n",
    "    - `index`: the index of the completion in the list of choices"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Extract just the reply with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Orange who?'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "response['choices'][0]['message']['content']\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Even non-conversation-based tasks can fit into the chat format, by placing the instruction in the first user message.\n",
    "\n",
    "For example, to ask the model to explain asynchronous programming in the style of the pirate Blackbeard, we can structure conversation as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Ahoy matey! Let me tell ye about asynchronous programming, arrr! It be like havin' a crew of sailors workin' on different tasks at the same time. Each sailor be doin' their own job, but they don't wait for the others to finish before movin' on to the next task. They be workin' independently, but still makin' progress towards the same goal.\n",
      "\n",
      "In programming, it be the same. Instead of waitin' for one task to finish before startin' the next, we can have multiple tasks runnin' at the same time. This be especially useful when we be dealin' with slow or unpredictable tasks, like fetchin' data from a server or readin' from a file. We don't want our program to be stuck waitin' for these tasks to finish, so we can use asynchronous programming to keep things movin' along.\n",
      "\n",
      "So, me hearty, asynchronous programming be like havin' a crew of sailors workin' independently towards the same goal. It be a powerful tool in the programmer's arsenal, and one that can help us build faster and more efficient programs. Arrr!\n"
     ]
    }
   ],
   "source": [
    "# example with a system message\n",
    "response = openai.ChatCompletion.create(\n",
    "    model=MODEL,\n",
    "    messages=[\n",
    "        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
    "        {\"role\": \"user\", \"content\": \"Explain asynchronous programming in the style of the pirate Blackbeard.\"},\n",
    "    ],\n",
    "    temperature=0,\n",
    ")\n",
    "\n",
    "print(response['choices'][0]['message']['content'])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Ahoy mateys! Let me tell ye about asynchronous programming, arrr! \n",
      "\n",
      "Ye see, in the world of programming, sometimes we need to wait for things to happen before we can move on to the next task. But with asynchronous programming, we can keep working on other tasks while we wait for those things to happen. \n",
      "\n",
      "It's like when we're sailing the high seas and we need to wait for the wind to change direction. We don't just sit there twiddling our thumbs, no sir! We keep busy with other tasks like repairing the ship or checking the maps. \n",
      "\n",
      "In programming, we use something called callbacks or promises to keep track of those things we're waiting for. And while we wait for those callbacks or promises to be fulfilled, we can keep working on other parts of our code. \n",
      "\n",
      "So, me hearties, asynchronous programming is like being a pirate on the high seas - always busy with tasks and never wasting a moment! Arrr!\n"
     ]
    }
   ],
   "source": [
    "# example without a system message\n",
    "response = openai.ChatCompletion.create(\n",
    "    model=MODEL,\n",
    "    messages=[\n",
    "        {\"role\": \"user\", \"content\": \"Explain asynchronous programming in the style of the pirate Blackbeard.\"},\n",
    "    ],\n",
    "    temperature=0,\n",
    ")\n",
    "\n",
    "print(response['choices'][0]['message']['content'])\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Tips for instructing gpt-3.5-turbo-0301\n",
    "\n",
    "Best practices for instructing models may change from model version to model version. The advice that follows applies to `gpt-3.5-turbo-0301` and may not apply to future models."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### System messages\n",
    "\n",
    "The system message can be used to prime the assistant with different personalities or behaviors.\n",
    "\n",
    "Be aware that `gpt-3.5-turbo-0301` does not generally pay as much attention to the system message as `gpt-4-0314`. Therefore, for `gpt-3.5-turbo-0301`, we recommend placing important instructions in the user message instead. Some developers have found success in continually moving the system message near the end of the conversation to keep the model's attention from drifting away as conversations get longer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sure! Fractions are a way of representing a part of a whole. The top number of a fraction is called the numerator, and it represents how many parts of the whole we are talking about. The bottom number is called the denominator, and it represents how many equal parts the whole is divided into.\n",
      "\n",
      "For example, if we have a pizza that is divided into 8 equal slices, and we have eaten 3 of those slices, we can represent that as the fraction 3/8. The numerator is 3 because we have eaten 3 slices, and the denominator is 8 because the pizza is divided into 8 slices.\n",
      "\n",
      "To add or subtract fractions, we need to have a common denominator. This means that we need to find a number that both denominators can divide into evenly. For example, if we want to add 1/4 and 2/3, we need to find a common denominator. We can do this by multiplying the denominators together, which gives us 12. Then, we can convert both fractions to have a denominator of 12. To do this, we multiply the numerator and denominator of 1/4 by 3, which gives us 3/12. We multiply the numerator and denominator of 2/3 by 4, which gives us 8/12. Now we can add the two fractions together, which gives us 11/12.\n",
      "\n",
      "Does that make sense? Do you have any questions?\n"
     ]
    }
   ],
   "source": [
    "# An example of a system message that primes the assistant to explain concepts in great depth\n",
    "response = openai.ChatCompletion.create(\n",
    "    model=MODEL,\n",
    "    messages=[\n",
    "        {\"role\": \"system\", \"content\": \"You are a friendly and helpful teaching assistant. You explain concepts in great depth using simple terms, and you give examples to help people learn. At the end of each explanation, you ask a question to check for understanding\"},\n",
    "        {\"role\": \"user\", \"content\": \"Can you explain how fractions work?\"},\n",
    "    ],\n",
    "    temperature=0,\n",
    ")\n",
    "\n",
    "print(response[\"choices\"][0][\"message\"][\"content\"])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fractions represent a part of a whole. They consist of a numerator (top number) and a denominator (bottom number) separated by a line. The numerator represents how many parts of the whole are being considered, while the denominator represents the total number of equal parts that make up the whole.\n"
     ]
    }
   ],
   "source": [
    "# An example of a system message that primes the assistant to give brief, to-the-point answers\n",
    "response = openai.ChatCompletion.create(\n",
    "    model=MODEL,\n",
    "    messages=[\n",
    "        {\"role\": \"system\", \"content\": \"You are a laconic assistant. You reply with brief, to-the-point answers with no elaboration.\"},\n",
    "        {\"role\": \"user\", \"content\": \"Can you explain how fractions work?\"},\n",
    "    ],\n",
    "    temperature=0,\n",
    ")\n",
    "\n",
    "print(response[\"choices\"][0][\"message\"][\"content\"])\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Few-shot prompting\n",
    "\n",
    "In some cases, it's easier to show the model what you want rather than tell the model what you want.\n",
    "\n",
    "One way to show the model what you want is with faked example messages.\n",
    "\n",
    "For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "We don't have enough time to complete the entire project perfectly.\n"
     ]
    }
   ],
   "source": [
    "# An example of a faked few-shot conversation to prime the model into translating business jargon to simpler speech\n",
    "response = openai.ChatCompletion.create(\n",
    "    model=MODEL,\n",
    "    messages=[\n",
    "        {\"role\": \"system\", \"content\": \"You are a helpful, pattern-following assistant.\"},\n",
    "        {\"role\": \"user\", \"content\": \"Help me translate the following corporate jargon into plain English.\"},\n",
    "        {\"role\": \"assistant\", \"content\": \"Sure, I'd be happy to!\"},\n",
    "        {\"role\": \"user\", \"content\": \"New synergies will help drive top-line growth.\"},\n",
    "        {\"role\": \"assistant\", \"content\": \"Things working well together will increase revenue.\"},\n",
    "        {\"role\": \"user\", \"content\": \"Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.\"},\n",
    "        {\"role\": \"assistant\", \"content\": \"Let's talk later when we're less busy about how to do better.\"},\n",
    "        {\"role\": \"user\", \"content\": \"This late pivot means we don't have time to boil the ocean for the client deliverable.\"},\n",
    "    ],\n",
    "    temperature=0,\n",
    ")\n",
    "\n",
    "print(response[\"choices\"][0][\"message\"][\"content\"])\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To help clarify that the example messages are not part of a real conversation, and shouldn't be referred back to by the model, you can try setting the `name` field of `system` messages to `example_user` and `example_assistant`.\n",
    "\n",
    "Transforming the few-shot example above, we could write:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This sudden change in plans means we don't have enough time to do everything for the client's project.\n"
     ]
    }
   ],
   "source": [
    "# The business jargon translation example, but with example names for the example messages\n",
    "response = openai.ChatCompletion.create(\n",
    "    model=MODEL,\n",
    "    messages=[\n",
    "        {\"role\": \"system\", \"content\": \"You are a helpful, pattern-following assistant that translates corporate jargon into plain English.\"},\n",
    "        {\"role\": \"system\", \"name\":\"example_user\", \"content\": \"New synergies will help drive top-line growth.\"},\n",
    "        {\"role\": \"system\", \"name\": \"example_assistant\", \"content\": \"Things working well together will increase revenue.\"},\n",
    "        {\"role\": \"system\", \"name\":\"example_user\", \"content\": \"Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.\"},\n",
    "        {\"role\": \"system\", \"name\": \"example_assistant\", \"content\": \"Let's talk later when we're less busy about how to do better.\"},\n",
    "        {\"role\": \"user\", \"content\": \"This late pivot means we don't have time to boil the ocean for the client deliverable.\"},\n",
    "    ],\n",
    "    temperature=0,\n",
    ")\n",
    "\n",
    "print(response[\"choices\"][0][\"message\"][\"content\"])\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Not every attempt at engineering conversations will succeed at first.\n",
    "\n",
    "If your first attempts fail, don't be afraid to experiment with different ways of priming or conditioning the model.\n",
    "\n",
    "As an example, one developer discovered an increase in accuracy when they inserted a user message that said \"Great job so far, these have been perfect\" to help condition the model into providing higher quality responses.\n",
    "\n",
    "For more ideas on how to lift the reliability of the models, consider reading our guide on [techniques to increase reliability](../techniques_to_improve_reliability.md). It was written for non-chat models, but many of its principles still apply."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Counting tokens\n",
    "\n",
    "When you submit your request, the API transforms the messages into a sequence of tokens.\n",
    "\n",
    "The number of tokens used affects:\n",
    "- the cost of the request\n",
    "- the time it takes to generate the response\n",
    "- when the reply gets cut off from hitting the maximum token limit (4,096 for `gpt-3.5-turbo` or 8,192 for `gpt-4`)\n",
    "\n",
    "You can use the following function to count the number of tokens that a list of messages will use.\n",
    "\n",
    "Note that the exact way that messages are converted into tokens may change from model to model, and may even change over time for the same model. Therefore, the counts returned by the function below should be considered an estimate, not a guarantee.\n",
    "\n",
    "Read more about counting tokens in [How to count tokens with tiktoken](How_to_count_tokens_with_tiktoken.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tiktoken\n",
    "\n",
    "\n",
    "def num_tokens_from_messages(messages, model=\"gpt-3.5-turbo-0301\"):\n",
    "    \"\"\"Returns the number of tokens used by a list of messages.\"\"\"\n",
    "    try:\n",
    "        encoding = tiktoken.encoding_for_model(model)\n",
    "    except KeyError:\n",
    "        print(\"Warning: model not found. Using cl100k_base encoding.\")\n",
    "        encoding = tiktoken.get_encoding(\"cl100k_base\")\n",
    "    if model == \"gpt-3.5-turbo\":\n",
    "        print(\"Warning: gpt-3.5-turbo may change over time. Returning num tokens assuming gpt-3.5-turbo-0301.\")\n",
    "        return num_tokens_from_messages(messages, model=\"gpt-3.5-turbo-0301\")\n",
    "    elif model == \"gpt-4\":\n",
    "        print(\"Warning: gpt-4 may change over time. Returning num tokens assuming gpt-4-0314.\")\n",
    "        return num_tokens_from_messages(messages, model=\"gpt-4-0314\")\n",
    "    elif model == \"gpt-3.5-turbo-0301\":\n",
    "        tokens_per_message = 4  # every message follows <|start|>{role/name}\\n{content}<|end|>\\n\n",
    "        tokens_per_name = -1  # if there's a name, the role is omitted\n",
    "    elif model == \"gpt-4-0314\":\n",
    "        tokens_per_message = 3\n",
    "        tokens_per_name = 1\n",
    "    else:\n",
    "        raise NotImplementedError(f\"\"\"num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.\"\"\")\n",
    "    num_tokens = 0\n",
    "    for message in messages:\n",
    "        num_tokens += tokens_per_message\n",
    "        for key, value in message.items():\n",
    "            num_tokens += len(encoding.encode(value))\n",
    "            if key == \"name\":\n",
    "                num_tokens += tokens_per_name\n",
    "    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>\n",
    "    return num_tokens\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "gpt-3.5-turbo-0301\n",
      "127 prompt tokens counted by num_tokens_from_messages().\n",
      "127 prompt tokens counted by the OpenAI API.\n",
      "\n",
      "gpt-4-0314\n",
      "129 prompt tokens counted by num_tokens_from_messages().\n",
      "129 prompt tokens counted by the OpenAI API.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# let's verify the function above matches the OpenAI API response\n",
    "\n",
    "example_messages = [\n",
    "    {\n",
    "        \"role\": \"system\",\n",
    "        \"content\": \"You are a helpful, pattern-following assistant that translates corporate jargon into plain English.\",\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"system\",\n",
    "        \"name\": \"example_user\",\n",
    "        \"content\": \"New synergies will help drive top-line growth.\",\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"system\",\n",
    "        \"name\": \"example_assistant\",\n",
    "        \"content\": \"Things working well together will increase revenue.\",\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"system\",\n",
    "        \"name\": \"example_user\",\n",
    "        \"content\": \"Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.\",\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"system\",\n",
    "        \"name\": \"example_assistant\",\n",
    "        \"content\": \"Let's talk later when we're less busy about how to do better.\",\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"user\",\n",
    "        \"content\": \"This late pivot means we don't have time to boil the ocean for the client deliverable.\",\n",
    "    },\n",
    "]\n",
    "\n",
    "for model in [\"gpt-3.5-turbo-0301\", \"gpt-4-0314\"]:\n",
    "    print(model)\n",
    "    # example token count from the function defined above\n",
    "    print(f\"{num_tokens_from_messages(example_messages, model)} prompt tokens counted by num_tokens_from_messages().\")\n",
    "    # example token count from the OpenAI API\n",
    "    response = openai.ChatCompletion.create(\n",
    "        model=model,\n",
    "        messages=example_messages,\n",
    "        temperature=0,\n",
    "        max_tokens=1  # we're only counting input tokens here, so let's not waste tokens on the output\n",
    "    )\n",
    "    print(f'{response[\"usage\"][\"prompt_tokens\"]} prompt tokens counted by the OpenAI API.')\n",
    "    print()\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "openai",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.9"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}