"Note that using `stream=True` in a production application makes it more difficult to moderate the content of the completions, as partial completions may be more difficult to evaluate. This may have implications for [approved usage](https://beta.openai.com/docs/usage-guidelines).\n",
"\n",
"Another small drawback of streaming responses is that the response no longer includes the `usage` field to tell you how many tokens were consumed. After receiving and combining all of the responses, you can calculate this yourself using [`tiktoken`](How_to_count_tokens_with_tiktoken.ipynb).\n",
"\n",
"## Example code\n",
"\n",
"Below, this notebook shows:\n",
"1. What a typical chat completion response looks like\n",
"2. What a streaming chat completion response looks like\n",
"3. How much time is saved by streaming a chat completion"
"3. How much time is saved by streaming a chat completion\n",
"4. How to get token usage data for streamed chat completion response"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. How to get token usage data for streamed chat completion response\n",
"\n",
"You can get token usage statistics for your streamed response by setting `stream_options={\"include_usage\": True}`. When you do so, an extra chunk will be streamed as the final chunk. You can access the usage data for the entire request via the `usage` field on this chunk. A few important notes when you set `stream_options={\"include_usage\": True}`:\n",
"* The value for the `usage` field on all chunks except for the last one will be null.\n",
"* The `usage` field on the last chunk contains token usage statistics for the entire request.\n",
"* The `choices` field on the last chunk will always be an empty array `[]`.\n",