adds examples of counting tokens for the ChatGPT API

pull/170/head
Ted Sanders 1 year ago
parent 73a64ff7da
commit 8cd6b0f53b

@ -9,23 +9,37 @@
"\n",
"[`tiktoken`](https://github.com/openai/tiktoken/blob/main/README.md) is a fast open-source tokenizer by OpenAI.\n",
"\n",
"Given a text string (e.g., `\"tiktoken is great!\"`) and an encoding (e.g., `\"gpt2\"`), a tokenizer can split the text string into a list of tokens (e.g., `[\"t\", \"ik\", \"token\", \" is\", \" great\", \"!\"]`).\n",
"Given a text string (e.g., `\"tiktoken is great!\"`) and an encoding (e.g., `\"cl100k_base\"`), a tokenizer can split the text string into a list of tokens (e.g., `[\"t\", \"ik\", \"token\", \" is\", \" great\", \"!\"]`).\n",
"\n",
"Splitting text strings into tokens is useful because models like GPT-3 see text in the form of tokens. Knowing how many tokens are in a text string can tell you (a) whether the string is too long for a text model to process and (b) how much an OpenAI API call costs (as usage is priced by token). Different models use different encodings.\n",
"Splitting text strings into tokens is useful because GPT models see text in the form of tokens. Knowing how many tokens are in a text string can tell you (a) whether the string is too long for a text model to process and (b) how much an OpenAI API call costs (as usage is priced by token). Different models use different encodings.\n",
"\n",
"\n",
"## Encodings\n",
"\n",
"Encodings specify how text is converted into tokens. Different models use different encodings.\n",
"\n",
"`tiktoken` supports three encodings used by OpenAI models:\n",
"\n",
"| Encoding name | OpenAI models |\n",
"|-------------------------|-----------------------------------------------------|\n",
"| `gpt2` (or `r50k_base`) | Most GPT-3 models |\n",
"| `cl100k_base` | ChatGPT models, `text-embedding-ada-002` |\n",
"| `p50k_base` | Code models, `text-davinci-002`, `text-davinci-003` |\n",
"| `cl100k_base` | `text-embedding-ada-002` |\n",
"| `r50k_base` (or `gpt2`) | GPT-3 models like `davinci` |\n",
"\n",
"You can retrieve the encoding for a model using `tiktoken.encoding_for_model()` as follows:\n",
"```python\n",
"encoding = tiktoken.encoding_for_model('gpt-3.5-turbo')\n",
"```\n",
"\n",
"`p50k_base` overlaps substantially with `gpt2`, and for non-code applications, they will usually give the same tokens.\n",
"`p50k_base` overlaps substantially with `r50k_base`, and for non-code applications, they will usually give the same tokens.\n",
"\n",
"## Tokenizer libraries and languages\n",
"\n",
"For `gpt2` encodings, tokenizers are available in many languages.\n",
"## Tokenizer libraries by language\n",
"\n",
"For `cl100k_base` and `p50k_base` encodings, `tiktoken` is the only tokenizer available as of March 2023.\n",
"- Python: [tiktoken](https://github.com/openai/tiktoken/blob/main/README.md)\n",
"\n",
"For `r50k_base` (`gpt2`) encodings, tokenizers are available in many languages.\n",
"- Python: [tiktoken](https://github.com/openai/tiktoken/blob/main/README.md) (or alternatively [GPT2TokenizerFast](https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2TokenizerFast))\n",
"- JavaScript: [gpt-3-encoder](https://www.npmjs.com/package/gpt-3-encoder)\n",
"- .NET / C#: [GPT Tokenizer](https://github.com/dluc/openai-tools)\n",
@ -34,8 +48,6 @@
"\n",
"(OpenAI makes no endorsements or guarantees of third-party libraries.)\n",
"\n",
"For `p50k_base` and `cl100k_base` encodings, `tiktoken` is the only tokenizer available as of January 2023.\n",
"- Python: [tiktoken](https://github.com/openai/tiktoken/blob/main/README.md)\n",
"\n",
"## How strings are typically tokenized\n",
"\n",
@ -49,11 +61,16 @@
"source": [
"## 0. Install `tiktoken`\n",
"\n",
"In your terminal, install `tiktoken` with `pip`:\n",
"\n",
"```bash\n",
"pip install tiktoken\n",
"```"
"Install `tiktoken` with `pip`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade tiktoken"
]
},
{
@ -66,7 +83,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
@ -87,11 +104,28 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"encoding = tiktoken.get_encoding(\"cl100k_base\")\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Use `tiktoken.encoding_for_model()` to automatically load the correct encoding for a given model name."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"encoding = tiktoken.get_encoding(\"gpt2\")\n"
"encoding = tiktoken.encoding_for_model(\"gpt-3.5-turbo\")"
]
},
{
@ -113,16 +147,16 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[83, 1134, 30001, 318, 1049, 0]"
"[83, 1609, 5963, 374, 2294, 0]"
]
},
"execution_count": 3,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@ -141,7 +175,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
@ -154,7 +188,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 7,
"metadata": {},
"outputs": [
{
@ -163,13 +197,13 @@
"6"
]
},
"execution_count": 5,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"num_tokens_from_string(\"tiktoken is great!\", \"gpt2\")\n"
"num_tokens_from_string(\"tiktoken is great!\", \"cl100k_base\")\n"
]
},
{
@ -190,7 +224,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 8,
"metadata": {},
"outputs": [
{
@ -199,13 +233,13 @@
"'tiktoken is great!'"
]
},
"execution_count": 6,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"encoding.decode([83, 1134, 30001, 318, 1049, 0])\n"
"encoding.decode([83, 1609, 5963, 374, 2294, 0])\n"
]
},
{
@ -226,7 +260,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 9,
"metadata": {},
"outputs": [
{
@ -235,13 +269,13 @@
"[b't', b'ik', b'token', b' is', b' great', b'!']"
]
},
"execution_count": 7,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[encoding.decode_single_token_bytes(token) for token in [83, 1134, 30001, 318, 1049, 0]]\n"
"[encoding.decode_single_token_bytes(token) for token in [83, 1609, 5963, 374, 2294, 0]]\n"
]
},
{
@ -264,7 +298,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
@ -287,7 +321,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 11,
"metadata": {},
"outputs": [
{
@ -317,7 +351,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 12,
"metadata": {},
"outputs": [
{
@ -347,7 +381,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 13,
"metadata": {},
"outputs": [
{
@ -374,6 +408,117 @@
"source": [
"compare_encodings(\"お誕生日おめでとう\")\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Counting tokens for chat API calls\n",
"\n",
"ChatGPT models like `gpt-3.5-turbo` use tokens in the same way as other models, but because of their message-based formatting, it's more difficult to count how many tokens will be used by a conversation.\n",
"\n",
"Below is an example function for counting tokens for messages passed to `gpt-3.5-turbo-0301`.\n",
"\n",
"The exact way that messages are converted into tokens may change from model to model. So when future model versions are released, the answers returned by this function may be only approximate. The [ChatML documentation](https://github.com/openai/openai-python/blob/main/chatml.md) explains how messages are converted into tokens by the OpenAI API, and may be useful for writing your own function."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def num_tokens_from_messages(messages, model=\"gpt-3.5-turbo-0301\"):\n",
" \"\"\"Returns the number of tokens used by a list of messages.\"\"\"\n",
" try:\n",
" encoding = tiktoken.encoding_for_model(model)\n",
" except KeyError:\n",
" encoding = tiktoken.get_encoding(\"cl100k_base\")\n",
" if model == \"gpt-3.5-turbo-0301\": # note: future models may deviate from this\n",
" num_tokens = 0\n",
" for message in messages:\n",
" num_tokens += 4 # every message follows <im_start>{role/name}\\n{content}<im_end>\\n\n",
" for key, value in message.items():\n",
" num_tokens += len(encoding.encode(value))\n",
" if key == \"name\": # if there's a name, the role is omitted\n",
" num_tokens += -1 # role is always required and always 1 token\n",
" num_tokens += 2 # every reply is primed with <im_start>assistant\n",
" return num_tokens\n",
" else:\n",
" raise NotImplementedError(f\"\"\"num_tokens_from_messages() is not presently implemented for model {model}.\n",
"See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.\"\"\")\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"messages = [\n",
" {\"role\": \"system\", \"content\": \"You are a helpful, pattern-following assistant that translates corporate jargon into plain English.\"},\n",
" {\"role\": \"system\", \"name\":\"example_user\", \"content\": \"New synergies will help drive top-line growth.\"},\n",
" {\"role\": \"system\", \"name\": \"example_assistant\", \"content\": \"Things working well together will increase revenue.\"},\n",
" {\"role\": \"system\", \"name\":\"example_user\", \"content\": \"Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.\"},\n",
" {\"role\": \"system\", \"name\": \"example_assistant\", \"content\": \"Let's talk later when we're less busy about how to do better.\"},\n",
" {\"role\": \"user\", \"content\": \"This late pivot means we don't have time to boil the ocean for the client deliverable.\"},\n",
"]\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"126 prompt tokens counted.\n"
]
}
],
"source": [
"# example token count from the function defined above\n",
"model = \"gpt-3.5-turbo-0301\"\n",
"\n",
"print(f\"{num_tokens_from_messages(messages, model)} prompt tokens counted.\")\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"126 prompt tokens used.\n"
]
}
],
"source": [
"# example token count from the OpenAI API\n",
"import openai\n",
"\n",
"\n",
"response = openai.ChatCompletion.create(\n",
" model=model,\n",
" messages=messages,\n",
" temperature=0,\n",
")\n",
"\n",
"print(f'{response[\"usage\"][\"prompt_tokens\"]} prompt tokens used.')\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {

@ -31,17 +31,17 @@
"outputs": [],
"source": [
"# if needed, install and/or upgrade to the latest version of the OpenAI Python library\n",
"%pip install --upgrade openai"
"%pip install --upgrade openai\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# import the OpenAI Python library for calling the OpenAI API\n",
"import openai"
"import openai\n"
]
},
{
@ -64,13 +64,13 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<OpenAIObject chat.completion id=chatcmpl-6pKvHxoFPQnkVzGfIaEqdCrudunUl at 0x110a9f360> JSON: {\n",
"<OpenAIObject chat.completion id=chatcmpl-6pjrV9CvZ2ivOSxzZrBdEidUB6xfs at 0x13362cf90> JSON: {\n",
" \"choices\": [\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
@ -81,8 +81,8 @@
" }\n",
" }\n",
" ],\n",
" \"created\": 1677693175,\n",
" \"id\": \"chatcmpl-6pKvHxoFPQnkVzGfIaEqdCrudunUl\",\n",
" \"created\": 1677789041,\n",
" \"id\": \"chatcmpl-6pjrV9CvZ2ivOSxzZrBdEidUB6xfs\",\n",
" \"model\": \"gpt-3.5-turbo-0301\",\n",
" \"object\": \"chat.completion\",\n",
" \"usage\": {\n",
@ -93,7 +93,7 @@
"}"
]
},
"execution_count": 2,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@ -142,7 +142,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"outputs": [
{
@ -151,13 +151,13 @@
"'Orange who?'"
]
},
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"response['choices'][0]['message']['content']"
"response['choices'][0]['message']['content']\n"
]
},
{
@ -172,18 +172,14 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Ahoy matey! Let me tell ye about asynchronous programming, arrr! It be like havin' a crew of sailors workin' on different tasks at the same time. Each sailor be doin' their own job, but they don't wait for the others to finish before movin' on to the next task. They be workin' independently, but still makin' progress towards the same goal.\n",
"\n",
"In programming, it be the same. Instead of waitin' for one task to finish before startin' the next, we can have multiple tasks runnin' at the same time. This be especially useful when we be dealin' with slow or unpredictable tasks, like fetchin' data from a server or readin' from a file. We don't want our program to be stuck waitin' for these tasks to finish, so we can use asynchronous programming to keep things movin' along.\n",
"\n",
"So, me hearty, asynchronous programming be like havin' a crew of sailors workin' independently towards the same goal. It be a powerful tool in the programmer's arsenal, and one that can help us build faster and more efficient programs. Arrr!\n"
"Ahoy matey! Asynchronous programming be like havin' a crew o' pirates workin' on different tasks at the same time. Ye see, instead o' waitin' for one task to be completed before startin' the next, we can have multiple tasks runnin' at once. It be like havin' me crew hoistin' the sails while others be swabbin' the deck and loadin' the cannons. Each task be workin' independently, but they all be contributin' to the overall success o' the ship. And just like how me crew communicates with each other to make sure everything be runnin' smoothly, asynchronous programming uses callbacks and promises to coordinate the different tasks and make sure they all be finished in the right order. Arrr, it be a powerful tool for any programmer lookin' to optimize their code and make it run faster.\n"
]
}
],
@ -203,7 +199,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"metadata": {},
"outputs": [
{
@ -261,7 +257,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 7,
"metadata": {},
"outputs": [
{
@ -270,11 +266,13 @@
"text": [
"Sure! Fractions are a way of representing a part of a whole. The top number of a fraction is called the numerator, and it represents how many parts of the whole we are talking about. The bottom number is called the denominator, and it represents how many equal parts the whole is divided into.\n",
"\n",
"For example, if we have a pizza that is divided into 8 equal slices, and we have eaten 3 of those slices, we can represent that as a fraction: 3/8. The numerator is 3 because we have eaten 3 slices, and the denominator is 8 because the pizza is divided into 8 slices.\n",
"For example, if we have a pizza that is divided into 8 equal slices, and we take 3 slices, we can represent that as the fraction 3/8. The numerator is 3 because we took 3 slices, and the denominator is 8 because the pizza was divided into 8 slices.\n",
"\n",
"To add or subtract fractions, we need to have a common denominator. This means that the denominators of the fractions need to be the same. To do this, we can find the least common multiple (LCM) of the denominators and then convert each fraction to an equivalent fraction with the LCM as the denominator.\n",
"\n",
"To add or subtract fractions, we need to have a common denominator. That means we need to find a number that both denominators can divide into evenly. For example, if we want to add 1/4 and 2/3, we need to find a common denominator. One way to do that is to multiply the denominators together: 4 x 3 = 12. Then we can convert both fractions to have a denominator of 12: 1/4 becomes 3/12 (multiply the numerator and denominator by 3), and 2/3 becomes 8/12 (multiply the numerator and denominator by 4). Now we can add the two fractions: 3/12 + 8/12 = 11/12.\n",
"To multiply fractions, we simply multiply the numerators together and the denominators together. To divide fractions, we multiply the first fraction by the reciprocal of the second fraction (flip the second fraction upside down).\n",
"\n",
"Does that make sense? Do you have any questions?\n"
"Now, here's a question to check for understanding: If we have a pizza that is divided into 12 equal slices, and we take 4 slices, what is the fraction that represents how much of the pizza we took?\n"
]
}
],
@ -294,7 +292,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 8,
"metadata": {},
"outputs": [
{
@ -335,7 +333,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 9,
"metadata": {},
"outputs": [
{
@ -378,7 +376,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 10,
"metadata": {},
"outputs": [
{
@ -420,6 +418,118 @@
"\n",
"For more ideas on how to lift the reliability of the models, consider reading our guide on [techniques to increase reliability](../techniques_to_improve_reliability.md). It was written for non-chat models, but many of its principles still apply."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Counting tokens\n",
"\n",
"When you submit your request, the API transforms the messages into a sequence of tokens.\n",
"\n",
"The number of tokens used affects:\n",
"- the cost of the request\n",
"- the time it takes to generate the response\n",
"- when the reply gets cut off from hitting the maximum token limit (4096 for `gpt-3.5-turbo`)\n",
"\n",
"As of Mar 01, 2023, you can use the following function to count the number of tokens that a list of messages will use."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"import tiktoken\n",
"\n",
"\n",
"def num_tokens_from_messages(messages, model=\"gpt-3.5-turbo-0301\"):\n",
" \"\"\"Returns the number of tokens used by a list of messages.\"\"\"\n",
" try:\n",
" encoding = tiktoken.encoding_for_model(model)\n",
" except KeyError:\n",
" encoding = tiktoken.get_encoding(\"cl100k_base\")\n",
" if model == \"gpt-3.5-turbo-0301\": # note: future models may deviate from this\n",
" num_tokens = 0\n",
" for message in messages:\n",
" num_tokens += 4 # every message follows <im_start>{role/name}\\n{content}<im_end>\\n\n",
" for key, value in message.items():\n",
" num_tokens += len(encoding.encode(value))\n",
" if key == \"name\": # if there's a name, the role is omitted\n",
" num_tokens += -1 # role is always required and always 1 token\n",
" num_tokens += 2 # every reply is primed with <im_start>assistant\n",
" return num_tokens\n",
" else:\n",
" raise NotImplementedError(f\"\"\"num_tokens_from_messages() is not presently implemented for model {model}.\n",
"See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.\"\"\")\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"messages = [\n",
" {\"role\": \"system\", \"content\": \"You are a helpful, pattern-following assistant that translates corporate jargon into plain English.\"},\n",
" {\"role\": \"system\", \"name\":\"example_user\", \"content\": \"New synergies will help drive top-line growth.\"},\n",
" {\"role\": \"system\", \"name\": \"example_assistant\", \"content\": \"Things working well together will increase revenue.\"},\n",
" {\"role\": \"system\", \"name\":\"example_user\", \"content\": \"Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.\"},\n",
" {\"role\": \"system\", \"name\": \"example_assistant\", \"content\": \"Let's talk later when we're less busy about how to do better.\"},\n",
" {\"role\": \"user\", \"content\": \"This late pivot means we don't have time to boil the ocean for the client deliverable.\"},\n",
"]\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"126 prompt tokens counted.\n"
]
}
],
"source": [
"# example token count from the function defined above\n",
"print(f\"{num_tokens_from_messages(messages)} prompt tokens counted.\")\n"
]
},
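{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch, the count above can be used to budget how many tokens remain for the reply (4,096 is assumed here as the context window for `gpt-3.5-turbo`; check the model documentation for the actual limit)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# a rough budgeting sketch; 4096 is an assumed context limit for gpt-3.5-turbo\n",
"MAX_CONTEXT_TOKENS = 4096\n",
"prompt_tokens = num_tokens_from_messages(messages)\n",
"print(f\"{MAX_CONTEXT_TOKENS - prompt_tokens} tokens remain for the reply.\")"
]
},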
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"126 prompt tokens used.\n"
]
}
],
"source": [
"# example token count from the OpenAI API\n",
"response = openai.ChatCompletion.create(\n",
" model=MODEL,\n",
" messages=messages,\n",
" temperature=0,\n",
")\n",
"\n",
"print(f'{response[\"usage\"][\"prompt_tokens\"]} prompt tokens used.')\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
