diff --git a/examples/How_to_stream_completions.ipynb b/examples/How_to_stream_completions.ipynb
index 57f311ad..a1d9098c 100644
--- a/examples/How_to_stream_completions.ipynb
+++ b/examples/How_to_stream_completions.ipynb
@@ -19,14 +19,13 @@
     "\n",
     "Note that using `stream=True` in a production application makes it more difficult to moderate the content of the completions, as partial completions may be more difficult to evaluate. This may have implications for [approved usage](https://beta.openai.com/docs/usage-guidelines).\n",
     "\n",
-    "Another small drawback of streaming responses is that the response no longer includes the `usage` field to tell you how many tokens were consumed. After receiving and combining all of the responses, you can calculate this yourself using [`tiktoken`](How_to_count_tokens_with_tiktoken.ipynb).\n",
-    "\n",
     "## Example code\n",
     "\n",
     "Below, this notebook shows:\n",
     "1. What a typical chat completion response looks like\n",
     "2. What a streaming chat completion response looks like\n",
-    "3. How much time is saved by streaming a chat completion"
+    "3. How much time is saved by streaming a chat completion\n",
+    "4. How to get token usage data for a streamed chat completion response"
    ]
   },
   {
@@ -572,6 +571,65 @@
   {
    "cell_type": "markdown",
    "metadata": {},
+   "source": [
+    "### 4. How to get token usage data for a streamed chat completion response\n",
+    "\n",
+    "You can get token usage statistics for your streamed response by setting `stream_options={\"include_usage\": True}`. When you do so, an extra chunk will be streamed as the final chunk, and you can access the usage data for the entire request via the `usage` field on that chunk. A few important notes when you set `stream_options={\"include_usage\": True}`:\n",
+    "* The `usage` field on every chunk except the last will be null.\n",
+    "* The `usage` field on the last chunk contains token usage statistics for the entire request.\n",
+    "* The `choices` field on the last chunk will always be an empty array `[]`.\n",
+    "\n",
+    "Let's see how it works using the example from section 2."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "choices: [Choice(delta=ChoiceDelta(content='', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)]\n",
+      "usage: None\n",
+      "****************\n",
+      "choices: [Choice(delta=ChoiceDelta(content='2', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)]\n",
+      "usage: None\n",
+      "****************\n",
+      "choices: [Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)]\n",
+      "usage: None\n",
+      "****************\n",
+      "choices: []\n",
+      "usage: CompletionUsage(completion_tokens=1, prompt_tokens=19, total_tokens=20)\n",
+      "****************\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Example of an OpenAI ChatCompletion request with stream=True and stream_options={\"include_usage\": True}\n",
+    "\n",
+    "# a ChatCompletion request\n",
+    "response = client.chat.completions.create(\n",
+    "    model='gpt-3.5-turbo',\n",
+    "    messages=[\n",
+    "        {'role': 'user', 'content': \"What's 1+1? Answer in one word.\"}\n",
+    "    ],\n",
+    "    temperature=0,\n",
+    "    stream=True,\n",
+    "    stream_options={\"include_usage\": True},  # retrieve token usage for the streamed response\n",
+    ")\n",
+    "\n",
+    "for chunk in response:\n",
+    "    print(f\"choices: {chunk.choices}\\nusage: {chunk.usage}\")\n",
+    "    print(\"****************\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": []
   }
  ],
diff --git a/registry.yaml b/registry.yaml
index dcf48bd0..017b3979 100644
--- a/registry.yaml
+++ b/registry.yaml
@@ -207,7 +207,6 @@
     - ted-at-openai
   tags:
     - completions
-    - tiktoken
 
 - title: Multiclass Classification for Transactions
   path: examples/Multiclass_classification_for_transactions.ipynb
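For readers skimming the patch: the new cell only prints each raw chunk. A common follow-up is to reassemble the streamed reply while still capturing the usage totals from the final chunk. Below is a minimal sketch of that pattern (not part of the diff); it assumes the same `client` object (`openai.OpenAI()`) that the notebook constructs in an earlier cell, and the v1 Python SDK used throughout.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What's 1+1? Answer in one word."}],
    temperature=0,
    stream=True,
    stream_options={"include_usage": True},
)

collected = []
usage = None
for chunk in response:
    # Content chunks carry deltas; the final usage chunk arrives with choices == [].
    if chunk.choices and chunk.choices[0].delta.content is not None:
        collected.append(chunk.choices[0].delta.content)
    # `usage` is None on every chunk except the last one.
    if chunk.usage is not None:
        usage = chunk.usage

print("reply:", "".join(collected))
if usage is not None:
    print(f"prompt_tokens={usage.prompt_tokens}, "
          f"completion_tokens={usage.completion_tokens}, "
          f"total_tokens={usage.total_tokens}")
```

Guarding on `chunk.choices` matters here: as the markdown cell notes, the last chunk has an empty `choices` array, so indexing `chunk.choices[0]` unconditionally would raise an `IndexError`.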