pull/1071/head
Ankur Rastogi 3 months ago
parent 44738133c3
commit 3ec50b6338

@@ -32,8 +32,8 @@
     "4. Token highlighting and outputting bytes\n",
     "* Users can easily create a token highlighter using the built-in tokenization that comes with enabling `logprobs`. Additionally, the `bytes` parameter includes the ASCII encoding of each output character, which is particularly useful for reproducing emojis and special characters.\n",
     "\n",
-    "4. Calculating perplexity\n",
-    "* `logprobs1 can be used to help us assess the model's overall confidence in a result and help us compare the confidence of results from different prompts."
+    "5. Calculating perplexity\n",
+    "* `logprobs` can be used to help us assess the model's overall confidence in a result and help us compare the confidence of results from different prompts."
    ]
   },
   {
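As a rough illustration of point 4, here is a minimal sketch of reconstructing output text from the `bytes` field, assuming the current `openai` Python client and a chat completion requested with `logprobs=True` (the model choice and prompt are placeholders, not taken from this commit):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder; any chat model that supports logprobs
    messages=[{"role": "user", "content": "Output the blue heart emoji and its name."}],
    logprobs=True,
)

# Each token object carries `token`, `logprob`, and `bytes` (the raw byte
# values of that token). Joining the bytes across tokens and decoding as
# UTF-8 reproduces emojis and special characters exactly.
tokens = response.choices[0].logprobs.content
raw_bytes = [b for t in tokens for b in (t.bytes or [])]
print(bytes(raw_bytes).decode("utf-8"))
```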
@@ -45,7 +45,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 2,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -763,20 +763,9 @@
    "Additionally, we can get the joint probability of the entire completion, which is the exponentiated sum of each token's log probability. This tells us how `likely` this completion is given the prompt. Since our prompt is quite directive (asking for a certain emoji and its name), the joint probability of this output is high! If we ask for a random output, however, we'll see a much lower joint probability. This can also be a good tactic for developers during prompt engineering. "
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## 5. Calculating perplexity\n",
-    "\n",
-    "When looking to assess the model's confidence in a result, it can be useful to calculate perplexity, which is a measure of the uncertainty. Perplexity can be calculated by exponentiating the negative of the average of the logprobs. Generally, a higher perplexity indicates a more uncertain result, and a lower perplexity indicates a more confident result. As such, perplexity can be used to both assess the result of an individual model run and also to helpfully compare the relative confidence of results between model runs. While a high confidence doesn't guarantee result accuracy, it can be a helpful signal that can be paired with other evaluation metrics to build a better understanding of your prompt's behavior.\n",
-    "\n",
-    "For example, let's say that I want to use `gpt-3.5-turbo` to learn more about artificial intelligence. I could ask a question about recent history and a question about the future:"
-   ]
-  },
   {
    "cell_type": "code",
-   "execution_count": 17,
+   "execution_count": 4,
    "metadata": {},
    "outputs": [
     {
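To make the joint-probability arithmetic above concrete, here is a minimal sketch with made-up per-token logprobs (the values are illustrative and not from the notebook's actual run):

```python
import numpy as np

# Hypothetical per-token log probabilities for a short completion.
logprobs = [-0.01, -0.02, -0.00, -0.05]

# The joint probability of the whole completion is exp(sum of logprobs),
# which is the same as multiplying the per-token probabilities together.
joint_probability = np.exp(np.sum(logprobs))
print(f"{joint_probability:.4f}")  # 0.9231
```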
@@ -786,16 +775,16 @@
     "Prompt: In a short sentence, has artificial intelligence grown in the last decade?\n",
     "Response: Yes, artificial intelligence has grown significantly in the last decade. \n",
     "\n",
-    "Tokens: Yes , artificial intelligence has grown significantly in the last decade .\n",
-    "Logprobs: -0.00 -0.00 -0.00 -0.00 -0.00 -0.29 -0.07 -0.00 -0.00 -0.01 -0.00 -0.00\n",
-    "Perplexity: 1.031933208572389 \n",
+    "Tokens: Yes , artificial intelligence has grown significantly in the last decade .\n",
+    "Logprobs: -0.00 -0.00 -0.00 -0.00 -0.00 -0.53 -0.11 -0.00 -0.00 -0.01 -0.00 -0.00\n",
+    "Perplexity: 1.0564125277713383 \n",
     "\n",
     "Prompt: In a short sentence, what are your thoughts on the future of artificial intelligence?\n",
-    "Response: The future of artificial intelligence holds great potential for advancing technology and improving various aspects of our lives. \n",
+    "Response: The future of artificial intelligence holds great potential for transforming industries and improving efficiency, but also raises ethical and societal concerns that must be carefully addressed. \n",
     "\n",
-    "Tokens: The future of artificial intelligence holds great potential for advancing technology and improving various aspects of our lives .\n",
-    "Logprobs: -0.22 -0.03 -0.00 -0.00 -0.00 -0.25 -0.48 -0.24 -0.03 -1.51 -0.04 -0.02 -0.94 -0.32 -0.27 -0.00 -0.66 -0.29 -0.09\n",
-    "Perplexity: 1.3281962888326724 \n",
+    "Tokens: The future of artificial intelligence holds great potential for transforming industries and improving efficiency , but also raises ethical and societal concerns that must be carefully addressed .\n",
+    "Logprobs: -0.19 -0.03 -0.00 -0.00 -0.00 -0.30 -0.51 -0.24 -0.03 -1.45 -0.23 -0.03 -0.22 -0.83 -0.48 -0.01 -0.38 -0.07 -0.47 -0.63 -0.18 -0.26 -0.01 -0.14 -0.00 -0.59 -0.55 -0.00\n",
+    "Perplexity: 1.3220795252314004 \n",
     "\n"
    ]
   }
@@ -831,6 +820,17 @@
    " print(\"Perplexity:\".ljust(max_starter_length), perplexity_score, \"\\n\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Calculating perplexity\n",
+    "\n",
+    "When looking to assess the model's confidence in a result, it can be useful to calculate perplexity, which is a measure of uncertainty. Perplexity is calculated by exponentiating the negative of the average of the logprobs. Generally, a higher perplexity indicates a more uncertain result, and a lower perplexity indicates a more confident result. As such, perplexity can be used both to assess the result of an individual model run and to compare the relative confidence of results between model runs. While high confidence doesn't guarantee result accuracy, it can be a helpful signal that can be paired with other evaluation metrics to build a better understanding of your prompt's behavior.\n",
+    "\n",
+    "For example, let's say that I want to use `gpt-3.5-turbo` to learn more about artificial intelligence. I could ask a question about recent history and a question about the future:"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -874,7 +874,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "openai-env",
+   "display_name": "openai",
    "language": "python",
    "name": "python3"
   },
