diff --git a/examples/How_to_stream_completions.ipynb b/examples/How_to_stream_completions.ipynb index 5dfa964f..86c3fca5 100644 --- a/examples/How_to_stream_completions.ipynb +++ b/examples/How_to_stream_completions.ipynb @@ -7,23 +7,27 @@ "source": [ "# How to stream completions\n", "\n", - "By default, when you send a prompt to the OpenAI Completions endpoint, it computes the entire completion and sends it back in a single response.\n", + "By default, when you request a completion from the OpenAI API, the entire completion is generated before being sent back in a single response.\n", "\n", - "If you're generating very long completions from a davinci-level model, waiting for the response can take many seconds. As of Aug 2022, responses from `text-davinci-002` typically take something like ~1 second plus ~2 seconds per 100 completion tokens.\n", + "If you're generating long completions, waiting for the response can take many seconds.\n", "\n", - "If you want to get the response faster, you can 'stream' the completion as it's being generated. This allows you to start printing or otherwise processing the beginning of the completion before the entire completion is finished.\n", + "To get responses sooner, you can 'stream' the completion as it's being generated. This allows you to start printing or processing the beginning of the completion before the full completion is finished.\n", "\n", - "To stream completions, set `stream=True` when calling the Completions endpoint. This will return an object that streams back text as [data-only server-sent events](https://app.mode.com/openai/reports/4fce5ba22b5b/runs/f518a0be4495).\n", + "To stream completions, set `stream=True` when calling the chat completions or completions endpoints. This will return an object that streams back the response as [data-only server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#event_stream_format). 
Extract chunks from the `delta` field rather than the `message` field.\n", "\n", "## Downsides\n", "\n", - "Note that using `stream=True` in a production application makes it more difficult to moderate the content of the completions, which has implications for [approved usage](https://beta.openai.com/docs/usage-guidelines).\n", + "Note that using `stream=True` in a production application makes it more difficult to moderate the content of the completions, as partial completions may be more difficult to evaluate. This has implications for [approved usage](https://beta.openai.com/docs/usage-guidelines).\n", "\n", "Another small drawback of streaming responses is that the response no longer includes the `usage` field to tell you how many tokens were consumed. After receiving and combining all of the responses, you can calculate this yourself using [`tiktoken`](How_to_count_tokens_with_tiktoken.ipynb).\n", "\n", "## Example code\n", "\n", - "Below is a Python code example of how to receive streaming completions." + "Below, this notebook shows:\n", + "1. What a typical chat completion response looks like\n", + "2. What a streaming chat completion response looks like\n", + "3. How much time is saved by streaming a chat completion\n", + "4. How to stream non-chat completions (used by older models like `text-davinci-003`)" ] }, { @@ -34,7 +38,7 @@ "source": [ "# imports\n", "import openai # for OpenAI API calls\n", - "import time # for measuring time savings" + "import time # for measuring time duration of API calls" ] }, { @@ -42,9 +46,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### A typical completion request\n", + "### 1. What a typical chat completion response looks like\n", "\n", - "With a typical Completions API call, the text is first computed and then returned all at once." + "With a typical ChatCompletions API call, the response is first computed and then returned all at once." 
] }, { @@ -56,7 +60,1210 @@ "name": "stdout", "output_type": "stream", "text": [ - "Full response received 7.32 seconds after request\n", + "Full response received 3.03 seconds after request\n", + "Full response received:\n", + "{\n", + " \"choices\": [\n", + " {\n", + " \"finish_reason\": \"stop\",\n", + " \"index\": 0,\n", + " \"message\": {\n", + " \"content\": \"\\n\\n1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.\",\n", + " \"role\": \"assistant\"\n", + " }\n", + " }\n", + " ],\n", + " \"created\": 1677825456,\n", + " \"id\": \"chatcmpl-6ptKqrhgRoVchm58Bby0UvJzq2ZuQ\",\n", + " \"model\": \"gpt-3.5-turbo-0301\",\n", + " \"object\": \"chat.completion\",\n", + " \"usage\": {\n", + " \"completion_tokens\": 301,\n", + " \"prompt_tokens\": 36,\n", + " \"total_tokens\": 337\n", + " }\n", + "}\n" + ] + } + ], + "source": [ + "# Example of an OpenAI ChatCompletion request\n", + "# https://platform.openai.com/docs/guides/chat\n", + "\n", + "# record the time before the request is sent\n", + "start_time = time.time()\n", + "\n", + "# send a ChatCompletion request to count to 100\n", + "response = openai.ChatCompletion.create(\n", + " model='gpt-3.5-turbo',\n", + " messages=[\n", + " {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. 
E.g., 1, 2, 3, ...'}\n", + " ],\n", + " temperature=0,\n", + ")\n", + "\n", + "# calculate the time it took to receive the response\n", + "response_time = time.time() - start_time\n", + "\n", + "# print the time delay and text received\n", + "print(f\"Full response received {response_time:.2f} seconds after request\")\n", + "print(f\"Full response received:\\n{response}\")\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The reply can be extracted with `response['choices'][0]['message']`.\n", + "\n", + "The content of the reply can be extracted with `response['choices'][0]['message']['content']`." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Extracted reply: \n", + "{\n", + " \"content\": \"\\n\\n1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.\",\n", + " \"role\": \"assistant\"\n", + "}\n", + "Extracted content: \n", + "\n", + "\n", + "1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.\n" + ] + } + ], + "source": [ + "reply = response['choices'][0]['message']\n", + "print(f\"Extracted reply: \\n{reply}\")\n", + "\n", + "reply_content = response['choices'][0]['message']['content']\n", + "print(f\"Extracted content: 
\\n{reply_content}\")\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2. How to stream a chat completion\n", + "\n", + "With a streaming API call, the response is sent back incrementally in chunks via an [event stream](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#event_stream_format). In Python, you can iterate over these events with a `for` loop.\n", + "\n", + "Let's see what it looks like:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"choices\": [\n", + " {\n", + " \"delta\": {\n", + " \"role\": \"assistant\"\n", + " },\n", + " \"finish_reason\": null,\n", + " \"index\": 0\n", + " }\n", + " ],\n", + " \"created\": 1677825464,\n", + " \"id\": \"chatcmpl-6ptKyqKOGXZT6iQnqiXAH8adNLUzD\",\n", + " \"model\": \"gpt-3.5-turbo-0301\",\n", + " \"object\": \"chat.completion.chunk\"\n", + "}\n", + "{\n", + " \"choices\": [\n", + " {\n", + " \"delta\": {\n", + " \"content\": \"\\n\\n\"\n", + " },\n", + " \"finish_reason\": null,\n", + " \"index\": 0\n", + " }\n", + " ],\n", + " \"created\": 1677825464,\n", + " \"id\": \"chatcmpl-6ptKyqKOGXZT6iQnqiXAH8adNLUzD\",\n", + " \"model\": \"gpt-3.5-turbo-0301\",\n", + " \"object\": \"chat.completion.chunk\"\n", + "}\n", + "{\n", + " \"choices\": [\n", + " {\n", + " \"delta\": {\n", + " \"content\": \"2\"\n", + " },\n", + " \"finish_reason\": null,\n", + " \"index\": 0\n", + " }\n", + " ],\n", + " \"created\": 1677825464,\n", + " \"id\": \"chatcmpl-6ptKyqKOGXZT6iQnqiXAH8adNLUzD\",\n", + " \"model\": \"gpt-3.5-turbo-0301\",\n", + " \"object\": \"chat.completion.chunk\"\n", + "}\n", + "{\n", + " \"choices\": [\n", + " {\n", + " \"delta\": {},\n", + " \"finish_reason\": \"stop\",\n", + " \"index\": 0\n", + " }\n", + " ],\n", + " \"created\": 1677825464,\n", + " \"id\": 
\"chatcmpl-6ptKyqKOGXZT6iQnqiXAH8adNLUzD\",\n", + " \"model\": \"gpt-3.5-turbo-0301\",\n", + " \"object\": \"chat.completion.chunk\"\n", + "}\n" + ] + } + ], + "source": [ + "# Example of an OpenAI ChatCompletion request with stream=True\n", + "# https://platform.openai.com/docs/guides/chat\n", + "\n", + "# a ChatCompletion request\n", + "response = openai.ChatCompletion.create(\n", + " model='gpt-3.5-turbo',\n", + " messages=[\n", + " {'role': 'user', 'content': \"What's 1+1? Answer in one word.\"}\n", + " ],\n", + " temperature=0,\n", + " stream=True # this time, we set stream=True\n", + ")\n", + "\n", + "for chunk in response:\n", + " print(chunk)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see above, streaming responses have a `delta` field rather than a `message` field. `delta` can hold things like:\n", + "- a role token (e.g., `{\"role\": \"assistant\"}`)\n", + "- a content token (e.g., `{\"content\": \"\\n\\n\"}`)\n", + "- nothing (e.g., `{}`), when the stream is over" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3. How much time is saved by streaming a chat completion\n", + "\n", + "Now let's ask `gpt-3.5-turbo` to count to 100 again, and see how long it takes." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Message received 0.10 seconds after request: {\n", + " \"role\": \"assistant\"\n", + "}\n", + "Message received 0.10 seconds after request: {\n", + " \"content\": \"\\n\\n\"\n", + "}\n", + "Message received 0.10 seconds after request: {\n", + " \"content\": \"1\"\n", + "}\n", + "Message received 0.11 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.12 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.13 seconds after request: {\n", + " \"content\": \"2\"\n", + "}\n", + "Message received 0.14 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.15 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.16 seconds after request: {\n", + " \"content\": \"3\"\n", + "}\n", + "Message received 0.17 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.18 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.19 seconds after request: {\n", + " \"content\": \"4\"\n", + "}\n", + "Message received 0.20 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.21 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.22 seconds after request: {\n", + " \"content\": \"5\"\n", + "}\n", + "Message received 0.23 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.24 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.25 seconds after request: {\n", + " \"content\": \"6\"\n", + "}\n", + "Message received 0.26 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.27 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.28 seconds 
after request: {\n", + " \"content\": \"7\"\n", + "}\n", + "Message received 0.29 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.30 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.30 seconds after request: {\n", + " \"content\": \"8\"\n", + "}\n", + "Message received 0.31 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.32 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.33 seconds after request: {\n", + " \"content\": \"9\"\n", + "}\n", + "Message received 0.34 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.35 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.37 seconds after request: {\n", + " \"content\": \"10\"\n", + "}\n", + "Message received 0.40 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.43 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.43 seconds after request: {\n", + " \"content\": \"11\"\n", + "}\n", + "Message received 0.43 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.43 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.43 seconds after request: {\n", + " \"content\": \"12\"\n", + "}\n", + "Message received 0.43 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.44 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.45 seconds after request: {\n", + " \"content\": \"13\"\n", + "}\n", + "Message received 0.46 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.47 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.48 seconds after request: {\n", + " \"content\": \"14\"\n", + "}\n", + "Message received 0.49 seconds after request: 
{\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.50 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.51 seconds after request: {\n", + " \"content\": \"15\"\n", + "}\n", + "Message received 0.52 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.53 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.53 seconds after request: {\n", + " \"content\": \"16\"\n", + "}\n", + "Message received 0.55 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.55 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.56 seconds after request: {\n", + " \"content\": \"17\"\n", + "}\n", + "Message received 0.57 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.58 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.59 seconds after request: {\n", + " \"content\": \"18\"\n", + "}\n", + "Message received 0.60 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.61 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.62 seconds after request: {\n", + " \"content\": \"19\"\n", + "}\n", + "Message received 0.63 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.64 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.65 seconds after request: {\n", + " \"content\": \"20\"\n", + "}\n", + "Message received 0.66 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.67 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.68 seconds after request: {\n", + " \"content\": \"21\"\n", + "}\n", + "Message received 0.69 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.70 seconds after request: {\n", + " 
\"content\": \" \"\n", + "}\n", + "Message received 0.71 seconds after request: {\n", + " \"content\": \"22\"\n", + "}\n", + "Message received 0.72 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.73 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.74 seconds after request: {\n", + " \"content\": \"23\"\n", + "}\n", + "Message received 0.75 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.75 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.76 seconds after request: {\n", + " \"content\": \"24\"\n", + "}\n", + "Message received 0.79 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.79 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.79 seconds after request: {\n", + " \"content\": \"25\"\n", + "}\n", + "Message received 0.80 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.81 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.82 seconds after request: {\n", + " \"content\": \"26\"\n", + "}\n", + "Message received 0.83 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.84 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.85 seconds after request: {\n", + " \"content\": \"27\"\n", + "}\n", + "Message received 0.86 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.87 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.88 seconds after request: {\n", + " \"content\": \"28\"\n", + "}\n", + "Message received 0.89 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.90 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.92 seconds after request: {\n", + " \"content\": 
\"29\"\n", + "}\n", + "Message received 0.92 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.93 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.94 seconds after request: {\n", + " \"content\": \"30\"\n", + "}\n", + "Message received 0.95 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.96 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 0.97 seconds after request: {\n", + " \"content\": \"31\"\n", + "}\n", + "Message received 0.98 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 0.99 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 1.00 seconds after request: {\n", + " \"content\": \"32\"\n", + "}\n", + "Message received 1.01 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 1.02 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 1.03 seconds after request: {\n", + " \"content\": \"33\"\n", + "}\n", + "Message received 1.04 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 1.05 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 1.06 seconds after request: {\n", + " \"content\": \"34\"\n", + "}\n", + "Message received 1.07 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 1.08 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 1.09 seconds after request: {\n", + " \"content\": \"35\"\n", + "}\n", + "Message received 1.10 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 1.11 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 1.12 seconds after request: {\n", + " \"content\": \"36\"\n", + "}\n", + "Message received 1.13 seconds after request: {\n", + " \"content\": \",\"\n", + 
"}\n", + "Message received 1.13 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 1.14 seconds after request: {\n", + " \"content\": \"37\"\n", + "}\n", + "Message received 1.15 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 1.17 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 1.18 seconds after request: {\n", + " \"content\": \"38\"\n", + "}\n", + "Message received 1.19 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 1.19 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 1.20 seconds after request: {\n", + " \"content\": \"39\"\n", + "}\n", + "Message received 1.21 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 1.22 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 1.23 seconds after request: {\n", + " \"content\": \"40\"\n", + "}\n", + "Message received 1.24 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 1.25 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 1.26 seconds after request: {\n", + " \"content\": \"41\"\n", + "}\n", + "Message received 1.27 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 1.28 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 1.29 seconds after request: {\n", + " \"content\": \"42\"\n", + "}\n", + "Message received 1.30 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 1.31 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 1.32 seconds after request: {\n", + " \"content\": \"43\"\n", + "}\n", + "Message received 1.33 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 1.34 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + 
"Message received 1.37 seconds after request: {\n", + " \"content\": \"44\"\n", + "}\n", + "Message received 1.37 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 1.37 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 1.37 seconds after request: {\n", + " \"content\": \"45\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \"46\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \"47\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \"48\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \"49\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \"50\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \"51\"\n", + "}\n", + "Message 
received 2.10 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \"52\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \"53\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.10 seconds after request: {\n", + " \"content\": \"54\"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \"55\"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \"56\"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \"57\"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \"58\"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 
2.24 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \"59\"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \"60\"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \"61\"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \"62\"\n", + "}\n", + "Message received 2.24 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \"63\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \"64\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \"65\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.25 seconds 
after request: {\n", + " \"content\": \"66\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \"67\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \"68\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \"69\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \"70\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \"71\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \"72\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \"73\"\n", + "}\n", + "Message received 2.25 seconds after 
request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \"74\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.25 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.26 seconds after request: {\n", + " \"content\": \"75\"\n", + "}\n", + "Message received 2.26 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.26 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.27 seconds after request: {\n", + " \"content\": \"76\"\n", + "}\n", + "Message received 2.28 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.29 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.29 seconds after request: {\n", + " \"content\": \"77\"\n", + "}\n", + "Message received 2.31 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.32 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.32 seconds after request: {\n", + " \"content\": \"78\"\n", + "}\n", + "Message received 2.33 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.35 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.35 seconds after request: {\n", + " \"content\": \"79\"\n", + "}\n", + "Message received 2.36 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.37 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.38 seconds after request: {\n", + " \"content\": \"80\"\n", + "}\n", + "Message received 2.39 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.40 seconds after request: 
{\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.41 seconds after request: {\n", + " \"content\": \"81\"\n", + "}\n", + "Message received 2.42 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.43 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.44 seconds after request: {\n", + " \"content\": \"82\"\n", + "}\n", + "Message received 2.45 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.46 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.47 seconds after request: {\n", + " \"content\": \"83\"\n", + "}\n", + "Message received 2.48 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.49 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.50 seconds after request: {\n", + " \"content\": \"84\"\n", + "}\n", + "Message received 2.51 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.52 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.53 seconds after request: {\n", + " \"content\": \"85\"\n", + "}\n", + "Message received 2.54 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.55 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.56 seconds after request: {\n", + " \"content\": \"86\"\n", + "}\n", + "Message received 2.57 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.58 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.59 seconds after request: {\n", + " \"content\": \"87\"\n", + "}\n", + "Message received 2.60 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.60 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.62 seconds after request: {\n", + " 
\"content\": \"88\"\n", + "}\n", + "Message received 2.63 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.63 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.64 seconds after request: {\n", + " \"content\": \"89\"\n", + "}\n", + "Message received 2.66 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.66 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.68 seconds after request: {\n", + " \"content\": \"90\"\n", + "}\n", + "Message received 2.68 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.69 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.70 seconds after request: {\n", + " \"content\": \"91\"\n", + "}\n", + "Message received 2.71 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.72 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.73 seconds after request: {\n", + " \"content\": \"92\"\n", + "}\n", + "Message received 2.74 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.75 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.76 seconds after request: {\n", + " \"content\": \"93\"\n", + "}\n", + "Message received 2.77 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.78 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.79 seconds after request: {\n", + " \"content\": \"94\"\n", + "}\n", + "Message received 2.80 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.81 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.81 seconds after request: {\n", + " \"content\": \"95\"\n", + "}\n", + "Message received 2.82 seconds after request: {\n", + " 
\"content\": \",\"\n", + "}\n", + "Message received 2.83 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.84 seconds after request: {\n", + " \"content\": \"96\"\n", + "}\n", + "Message received 2.85 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.86 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.87 seconds after request: {\n", + " \"content\": \"97\"\n", + "}\n", + "Message received 2.88 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.88 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.89 seconds after request: {\n", + " \"content\": \"98\"\n", + "}\n", + "Message received 2.90 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.91 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.92 seconds after request: {\n", + " \"content\": \"99\"\n", + "}\n", + "Message received 2.93 seconds after request: {\n", + " \"content\": \",\"\n", + "}\n", + "Message received 2.93 seconds after request: {\n", + " \"content\": \" \"\n", + "}\n", + "Message received 2.94 seconds after request: {\n", + " \"content\": \"100\"\n", + "}\n", + "Message received 2.95 seconds after request: {\n", + " \"content\": \".\"\n", + "}\n", + "Message received 2.97 seconds after request: {}\n", + "Full response received 2.97 seconds after request\n", + "Full conversation received: \n", + "\n", + "1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.\n" + ] + } + ], + "source": [ + "# Example of an OpenAI 
ChatCompletion request with stream=True\n", + "# https://platform.openai.com/docs/guides/chat\n", + "\n", + "# record the time before the request is sent\n", + "start_time = time.time()\n", + "\n", + "# send a ChatCompletion request to count to 100\n", + "response = openai.ChatCompletion.create(\n", + " model='gpt-3.5-turbo',\n", + " messages=[\n", + " {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}\n", + " ],\n", + " temperature=0,\n", + " stream=True # again, we set stream=True\n", + ")\n", + "\n", + "# create variables to collect the stream of chunks\n", + "collected_chunks = []\n", + "collected_messages = []\n", + "# iterate through the stream of events\n", + "for chunk in response:\n", + " chunk_time = time.time() - start_time # calculate the time delay of the chunk\n", + " collected_chunks.append(chunk) # save the event response\n", + " chunk_message = chunk['choices'][0]['delta'] # extract the message\n", + " collected_messages.append(chunk_message) # save the message\n", + " print(f\"Message received {chunk_time:.2f} seconds after request: {chunk_message}\") # print the delay and text\n", + "\n", + "# print the time delay and text received\n", + "print(f\"Full response received {chunk_time:.2f} seconds after request\")\n", + "full_reply_content = ''.join([m.get('content', '') for m in collected_messages])\n", + "print(f\"Full conversation received: {full_reply_content}\")\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Time comparison\n", + "\n", + "In the example above, both requests took about 3 seconds to fully complete. Request times will vary depending on load and other stochastic factors.\n", + "\n", + "However, with the streaming request, we received the first token after 0.1 seconds, and subsequent tokens every ~0.01-0.02 seconds." 
+ ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 4. How to stream non-chat completions (used by older models like `text-davinci-003`)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### A typical completion request\n", + "\n", + "With a typical Completions API call, the text is first computed and then returned all at once." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Full response received 3.43 seconds after request\n", "Full text received: 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100\n" ] } @@ -92,214 +1299,214 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### A streaming completion request\n", + "#### A streaming completion request\n", "\n", "With a streaming Completions API call, the text is sent back via a series of events. In Python, you can iterate over these events with a `for` loop." 
] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Text received: 4 (0.16 seconds after request)\n", + "Text received: 4 (0.18 seconds after request)\n", "Text received: , (0.19 seconds after request)\n", "Text received: 5 (0.21 seconds after request)\n", - "Text received: , (0.24 seconds after request)\n", - "Text received: 6 (0.27 seconds after request)\n", + "Text received: , (0.23 seconds after request)\n", + "Text received: 6 (0.25 seconds after request)\n", + "Text received: , (0.26 seconds after request)\n", + "Text received: 7 (0.28 seconds after request)\n", "Text received: , (0.29 seconds after request)\n", - "Text received: 7 (0.32 seconds after request)\n", - "Text received: , (0.35 seconds after request)\n", - "Text received: 8 (0.37 seconds after request)\n", - "Text received: , (0.40 seconds after request)\n", - "Text received: 9 (0.43 seconds after request)\n", - "Text received: , (0.46 seconds after request)\n", - "Text received: 10 (0.48 seconds after request)\n", + "Text received: 8 (0.31 seconds after request)\n", + "Text received: , (0.33 seconds after request)\n", + "Text received: 9 (0.35 seconds after request)\n", + "Text received: , (0.36 seconds after request)\n", + "Text received: 10 (0.38 seconds after request)\n", + "Text received: , (0.39 seconds after request)\n", + "Text received: 11 (0.41 seconds after request)\n", + "Text received: , (0.42 seconds after request)\n", + "Text received: 12 (0.44 seconds after request)\n", + "Text received: , (0.45 seconds after request)\n", + "Text received: 13 (0.47 seconds after request)\n", + "Text received: , (0.48 seconds after request)\n", + "Text received: 14 (0.50 seconds after request)\n", "Text received: , (0.51 seconds after request)\n", - "Text received: 11 (0.54 seconds after request)\n", - "Text received: , (0.56 seconds after request)\n", - "Text received: 12 (0.59 seconds 
after request)\n", + "Text received: 15 (0.53 seconds after request)\n", + "Text received: , (0.54 seconds after request)\n", + "Text received: 16 (0.56 seconds after request)\n", + "Text received: , (0.57 seconds after request)\n", + "Text received: 17 (0.59 seconds after request)\n", "Text received: , (0.62 seconds after request)\n", - "Text received: 13 (0.64 seconds after request)\n", - "Text received: , (0.67 seconds after request)\n", - "Text received: 14 (0.70 seconds after request)\n", - "Text received: , (0.72 seconds after request)\n", - "Text received: 15 (0.75 seconds after request)\n", - "Text received: , (0.78 seconds after request)\n", - "Text received: 16 (0.84 seconds after request)\n", - "Text received: , (0.84 seconds after request)\n", - "Text received: 17 (0.86 seconds after request)\n", - "Text received: , (0.89 seconds after request)\n", - "Text received: 18 (0.91 seconds after request)\n", + "Text received: 18 (0.62 seconds after request)\n", + "Text received: , (0.63 seconds after request)\n", + "Text received: 19 (0.64 seconds after request)\n", + "Text received: , (0.66 seconds after request)\n", + "Text received: 20 (0.67 seconds after request)\n", + "Text received: , (0.68 seconds after request)\n", + "Text received: 21 (0.70 seconds after request)\n", + "Text received: , (0.71 seconds after request)\n", + "Text received: 22 (0.73 seconds after request)\n", + "Text received: , (0.74 seconds after request)\n", + "Text received: 23 (0.76 seconds after request)\n", + "Text received: , (0.77 seconds after request)\n", + "Text received: 24 (0.78 seconds after request)\n", + "Text received: , (0.80 seconds after request)\n", + "Text received: 25 (0.81 seconds after request)\n", + "Text received: , (0.82 seconds after request)\n", + "Text received: 26 (0.84 seconds after request)\n", + "Text received: , (0.85 seconds after request)\n", + "Text received: 27 (0.89 seconds after request)\n", + "Text received: , (0.90 seconds after request)\n", + 
"Text received: 28 (0.90 seconds after request)\n", + "Text received: , (0.91 seconds after request)\n", + "Text received: 29 (0.92 seconds after request)\n", "Text received: , (0.94 seconds after request)\n", - "Text received: 19 (1.41 seconds after request)\n", - "Text received: , (1.41 seconds after request)\n", - "Text received: 20 (1.41 seconds after request)\n", - "Text received: , (1.41 seconds after request)\n", - "Text received: 21 (1.41 seconds after request)\n", - "Text received: , (1.41 seconds after request)\n", - "Text received: 22 (1.41 seconds after request)\n", - "Text received: , (1.41 seconds after request)\n", - "Text received: 23 (1.41 seconds after request)\n", - "Text received: , (1.41 seconds after request)\n", - "Text received: 24 (1.46 seconds after request)\n", - "Text received: , (1.46 seconds after request)\n", - "Text received: 25 (1.46 seconds after request)\n", + "Text received: 30 (0.95 seconds after request)\n", + "Text received: , (0.96 seconds after request)\n", + "Text received: 31 (0.97 seconds after request)\n", + "Text received: , (0.99 seconds after request)\n", + "Text received: 32 (1.00 seconds after request)\n", + "Text received: , (1.01 seconds after request)\n", + "Text received: 33 (1.03 seconds after request)\n", + "Text received: , (1.04 seconds after request)\n", + "Text received: 34 (1.05 seconds after request)\n", + "Text received: , (1.07 seconds after request)\n", + "Text received: 35 (1.08 seconds after request)\n", + "Text received: , (1.10 seconds after request)\n", + "Text received: 36 (1.11 seconds after request)\n", + "Text received: , (1.12 seconds after request)\n", + "Text received: 37 (1.13 seconds after request)\n", + "Text received: , (1.15 seconds after request)\n", + "Text received: 38 (1.16 seconds after request)\n", + "Text received: , (1.18 seconds after request)\n", + "Text received: 39 (1.19 seconds after request)\n", + "Text received: , (1.20 seconds after request)\n", + "Text received: 40 
(1.22 seconds after request)\n", + "Text received: , (1.24 seconds after request)\n", + "Text received: 41 (1.25 seconds after request)\n", + "Text received: , (1.26 seconds after request)\n", + "Text received: 42 (1.27 seconds after request)\n", + "Text received: , (1.29 seconds after request)\n", + "Text received: 43 (1.30 seconds after request)\n", + "Text received: , (1.31 seconds after request)\n", + "Text received: 44 (1.32 seconds after request)\n", + "Text received: , (1.34 seconds after request)\n", + "Text received: 45 (1.35 seconds after request)\n", + "Text received: , (1.36 seconds after request)\n", + "Text received: 46 (1.38 seconds after request)\n", + "Text received: , (1.39 seconds after request)\n", + "Text received: 47 (1.40 seconds after request)\n", + "Text received: , (1.42 seconds after request)\n", + "Text received: 48 (1.43 seconds after request)\n", + "Text received: , (1.45 seconds after request)\n", + "Text received: 49 (1.47 seconds after request)\n", + "Text received: , (1.47 seconds after request)\n", + "Text received: 50 (1.49 seconds after request)\n", + "Text received: , (1.50 seconds after request)\n", + "Text received: 51 (1.51 seconds after request)\n", + "Text received: , (1.53 seconds after request)\n", + "Text received: 52 (1.54 seconds after request)\n", "Text received: , (1.55 seconds after request)\n", - "Text received: 26 (1.61 seconds after request)\n", - "Text received: , (1.65 seconds after request)\n", - "Text received: 27 (1.66 seconds after request)\n", - "Text received: , (1.70 seconds after request)\n", - "Text received: 28 (1.72 seconds after request)\n", - "Text received: , (1.75 seconds after request)\n", - "Text received: 29 (1.78 seconds after request)\n", - "Text received: , (2.05 seconds after request)\n", - "Text received: 30 (2.08 seconds after request)\n", - "Text received: , (2.13 seconds after request)\n", - "Text received: 31 (2.16 seconds after request)\n", + "Text received: 53 (1.57 seconds after 
request)\n", + "Text received: , (1.58 seconds after request)\n", + "Text received: 54 (1.59 seconds after request)\n", + "Text received: , (1.61 seconds after request)\n", + "Text received: 55 (1.62 seconds after request)\n", + "Text received: , (1.64 seconds after request)\n", + "Text received: 56 (1.65 seconds after request)\n", + "Text received: , (1.66 seconds after request)\n", + "Text received: 57 (1.69 seconds after request)\n", + "Text received: , (1.69 seconds after request)\n", + "Text received: 58 (1.70 seconds after request)\n", + "Text received: , (1.72 seconds after request)\n", + "Text received: 59 (1.73 seconds after request)\n", + "Text received: , (1.74 seconds after request)\n", + "Text received: 60 (1.76 seconds after request)\n", + "Text received: , (1.77 seconds after request)\n", + "Text received: 61 (1.78 seconds after request)\n", + "Text received: , (1.80 seconds after request)\n", + "Text received: 62 (1.81 seconds after request)\n", + "Text received: , (1.83 seconds after request)\n", + "Text received: 63 (1.84 seconds after request)\n", + "Text received: , (1.85 seconds after request)\n", + "Text received: 64 (1.86 seconds after request)\n", + "Text received: , (1.88 seconds after request)\n", + "Text received: 65 (1.89 seconds after request)\n", + "Text received: , (1.90 seconds after request)\n", + "Text received: 66 (1.92 seconds after request)\n", + "Text received: , (1.93 seconds after request)\n", + "Text received: 67 (1.95 seconds after request)\n", + "Text received: , (1.96 seconds after request)\n", + "Text received: 68 (1.99 seconds after request)\n", + "Text received: , (1.99 seconds after request)\n", + "Text received: 69 (2.00 seconds after request)\n", + "Text received: , (2.01 seconds after request)\n", + "Text received: 70 (2.03 seconds after request)\n", + "Text received: , (2.04 seconds after request)\n", + "Text received: 71 (2.05 seconds after request)\n", + "Text received: , (2.07 seconds after request)\n", + "Text 
received: 72 (2.08 seconds after request)\n", + "Text received: , (2.09 seconds after request)\n", + "Text received: 73 (2.11 seconds after request)\n", + "Text received: , (2.12 seconds after request)\n", + "Text received: 74 (2.13 seconds after request)\n", + "Text received: , (2.15 seconds after request)\n", + "Text received: 75 (2.16 seconds after request)\n", + "Text received: , (2.17 seconds after request)\n", + "Text received: 76 (2.18 seconds after request)\n", "Text received: , (2.20 seconds after request)\n", - "Text received: 32 (2.26 seconds after request)\n", + "Text received: 77 (2.22 seconds after request)\n", + "Text received: , (2.23 seconds after request)\n", + "Text received: 78 (2.24 seconds after request)\n", + "Text received: , (2.25 seconds after request)\n", + "Text received: 79 (2.26 seconds after request)\n", "Text received: , (2.28 seconds after request)\n", - "Text received: 33 (2.31 seconds after request)\n", - "Text received: , (2.35 seconds after request)\n", - "Text received: 34 (2.38 seconds after request)\n", + "Text received: 80 (2.28 seconds after request)\n", + "Text received: , (2.29 seconds after request)\n", + "Text received: 81 (2.30 seconds after request)\n", + "Text received: , (2.31 seconds after request)\n", + "Text received: 82 (2.33 seconds after request)\n", + "Text received: , (2.34 seconds after request)\n", + "Text received: 83 (2.35 seconds after request)\n", + "Text received: , (2.36 seconds after request)\n", + "Text received: 84 (2.37 seconds after request)\n", + "Text received: , (2.39 seconds after request)\n", + "Text received: 85 (2.39 seconds after request)\n", + "Text received: , (2.40 seconds after request)\n", + "Text received: 86 (2.43 seconds after request)\n", + "Text received: , (2.43 seconds after request)\n", + "Text received: 87 (2.44 seconds after request)\n", + "Text received: , (2.45 seconds after request)\n", + "Text received: 88 (2.46 seconds after request)\n", + "Text received: , (2.47 
seconds after request)\n", + "Text received: 89 (2.48 seconds after request)\n", + "Text received: , (2.49 seconds after request)\n", + "Text received: 90 (2.50 seconds after request)\n", + "Text received: , (2.51 seconds after request)\n", + "Text received: 91 (2.52 seconds after request)\n", "Text received: , (2.54 seconds after request)\n", - "Text received: 35 (2.55 seconds after request)\n", - "Text received: , (2.59 seconds after request)\n", - "Text received: 36 (2.61 seconds after request)\n", - "Text received: , (2.64 seconds after request)\n", - "Text received: 37 (2.67 seconds after request)\n", - "Text received: , (2.71 seconds after request)\n", - "Text received: 38 (2.86 seconds after request)\n", - "Text received: , (2.89 seconds after request)\n", - "Text received: 39 (2.92 seconds after request)\n", - "Text received: , (2.95 seconds after request)\n", - "Text received: 40 (2.99 seconds after request)\n", - "Text received: , (3.01 seconds after request)\n", - "Text received: 41 (3.04 seconds after request)\n", - "Text received: , (3.08 seconds after request)\n", - "Text received: 42 (3.15 seconds after request)\n", - "Text received: , (3.33 seconds after request)\n", - "Text received: 43 (3.36 seconds after request)\n", - "Text received: , (3.43 seconds after request)\n", - "Text received: 44 (3.47 seconds after request)\n", - "Text received: , (3.50 seconds after request)\n", - "Text received: 45 (3.53 seconds after request)\n", - "Text received: , (3.56 seconds after request)\n", - "Text received: 46 (3.59 seconds after request)\n", - "Text received: , (3.63 seconds after request)\n", - "Text received: 47 (3.65 seconds after request)\n", - "Text received: , (3.68 seconds after request)\n", - "Text received: 48 (3.71 seconds after request)\n", - "Text received: , (3.77 seconds after request)\n", - "Text received: 49 (3.77 seconds after request)\n", - "Text received: , (3.79 seconds after request)\n", - "Text received: 50 (3.82 seconds after 
request)\n", - "Text received: , (3.85 seconds after request)\n", - "Text received: 51 (3.89 seconds after request)\n", - "Text received: , (3.91 seconds after request)\n", - "Text received: 52 (3.93 seconds after request)\n", - "Text received: , (3.96 seconds after request)\n", - "Text received: 53 (3.98 seconds after request)\n", - "Text received: , (4.04 seconds after request)\n", - "Text received: 54 (4.05 seconds after request)\n", - "Text received: , (4.07 seconds after request)\n", - "Text received: 55 (4.10 seconds after request)\n", - "Text received: , (4.13 seconds after request)\n", - "Text received: 56 (4.19 seconds after request)\n", - "Text received: , (4.20 seconds after request)\n", - "Text received: 57 (4.20 seconds after request)\n", - "Text received: , (4.23 seconds after request)\n", - "Text received: 58 (4.26 seconds after request)\n", - "Text received: , (4.30 seconds after request)\n", - "Text received: 59 (4.31 seconds after request)\n", - "Text received: , (4.59 seconds after request)\n", - "Text received: 60 (4.61 seconds after request)\n", - "Text received: , (4.64 seconds after request)\n", - "Text received: 61 (4.67 seconds after request)\n", - "Text received: , (4.72 seconds after request)\n", - "Text received: 62 (4.73 seconds after request)\n", - "Text received: , (4.76 seconds after request)\n", - "Text received: 63 (4.80 seconds after request)\n", - "Text received: , (4.83 seconds after request)\n", - "Text received: 64 (4.86 seconds after request)\n", - "Text received: , (4.89 seconds after request)\n", - "Text received: 65 (4.92 seconds after request)\n", - "Text received: , (4.94 seconds after request)\n", - "Text received: 66 (4.97 seconds after request)\n", - "Text received: , (5.00 seconds after request)\n", - "Text received: 67 (5.03 seconds after request)\n", - "Text received: , (5.06 seconds after request)\n", - "Text received: 68 (5.09 seconds after request)\n", - "Text received: , (5.14 seconds after request)\n", - "Text 
received: 69 (5.16 seconds after request)\n", - "Text received: , (5.19 seconds after request)\n", - "Text received: 70 (5.22 seconds after request)\n", - "Text received: , (5.28 seconds after request)\n", - "Text received: 71 (5.30 seconds after request)\n", - "Text received: , (5.33 seconds after request)\n", - "Text received: 72 (5.36 seconds after request)\n", - "Text received: , (5.38 seconds after request)\n", - "Text received: 73 (5.41 seconds after request)\n", - "Text received: , (5.44 seconds after request)\n", - "Text received: 74 (5.48 seconds after request)\n", - "Text received: , (5.51 seconds after request)\n", - "Text received: 75 (5.53 seconds after request)\n", - "Text received: , (5.56 seconds after request)\n", - "Text received: 76 (5.60 seconds after request)\n", - "Text received: , (5.62 seconds after request)\n", - "Text received: 77 (5.65 seconds after request)\n", - "Text received: , (5.68 seconds after request)\n", - "Text received: 78 (5.71 seconds after request)\n", - "Text received: , (5.77 seconds after request)\n", - "Text received: 79 (5.77 seconds after request)\n", - "Text received: , (5.79 seconds after request)\n", - "Text received: 80 (5.82 seconds after request)\n", - "Text received: , (5.85 seconds after request)\n", - "Text received: 81 (5.88 seconds after request)\n", - "Text received: , (5.92 seconds after request)\n", - "Text received: 82 (5.93 seconds after request)\n", - "Text received: , (5.97 seconds after request)\n", - "Text received: 83 (5.98 seconds after request)\n", - "Text received: , (6.01 seconds after request)\n", - "Text received: 84 (6.04 seconds after request)\n", - "Text received: , (6.07 seconds after request)\n", - "Text received: 85 (6.09 seconds after request)\n", - "Text received: , (6.11 seconds after request)\n", - "Text received: 86 (6.14 seconds after request)\n", - "Text received: , (6.17 seconds after request)\n", - "Text received: 87 (6.19 seconds after request)\n", - "Text received: , (6.22 
seconds after request)\n", - "Text received: 88 (6.24 seconds after request)\n", - "Text received: , (6.27 seconds after request)\n", - "Text received: 89 (6.30 seconds after request)\n", - "Text received: , (6.31 seconds after request)\n", - "Text received: 90 (6.35 seconds after request)\n", - "Text received: , (6.36 seconds after request)\n", - "Text received: 91 (6.40 seconds after request)\n", - "Text received: , (6.44 seconds after request)\n", - "Text received: 92 (6.46 seconds after request)\n", - "Text received: , (6.49 seconds after request)\n", - "Text received: 93 (6.51 seconds after request)\n", - "Text received: , (6.54 seconds after request)\n", - "Text received: 94 (6.56 seconds after request)\n", - "Text received: , (6.59 seconds after request)\n", - "Text received: 95 (6.62 seconds after request)\n", - "Text received: , (6.64 seconds after request)\n", - "Text received: 96 (6.68 seconds after request)\n", - "Text received: , (6.68 seconds after request)\n", - "Text received: 97 (6.70 seconds after request)\n", - "Text received: , (6.73 seconds after request)\n", - "Text received: 98 (6.75 seconds after request)\n", - "Text received: , (6.78 seconds after request)\n", - "Text received: 99 (6.90 seconds after request)\n", - "Text received: , (6.92 seconds after request)\n", - "Text received: 100 (7.25 seconds after request)\n", - "Full response received 7.25 seconds after request\n", + "Text received: 92 (2.55 seconds after request)\n", + "Text received: , (2.57 seconds after request)\n", + "Text received: 93 (2.57 seconds after request)\n", + "Text received: , (2.58 seconds after request)\n", + "Text received: 94 (2.59 seconds after request)\n", + "Text received: , (2.60 seconds after request)\n", + "Text received: 95 (2.62 seconds after request)\n", + "Text received: , (2.62 seconds after request)\n", + "Text received: 96 (2.64 seconds after request)\n", + "Text received: , (2.65 seconds after request)\n", + "Text received: 97 (2.66 seconds after 
request)\n", + "Text received: , (2.67 seconds after request)\n", + "Text received: 98 (2.68 seconds after request)\n", + "Text received: , (2.69 seconds after request)\n", + "Text received: 99 (2.71 seconds after request)\n", + "Text received: , (2.72 seconds after request)\n", + "Text received: 100 (2.73 seconds after request)\n", + "Full response received 2.73 seconds after request\n", "Full text received: 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100\n" ] } @@ -341,12 +1548,17 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Time comparison\n", + "#### Time comparison\n", "\n", - "In the example above, both requests took about 7 seconds to fully complete.\n", + "In the example above, both requests took about 3 seconds to fully complete. Request times will vary depending on load and other stochastic factors.\n", "\n", - "However, with the streaming request, you would have received the first token after 0.16 seconds, and subsequent tokens after about ~0.035 seconds each." + "However, with the streaming request, we received the first token after 0.18 seconds, and subsequent tokens every ~0.01-0.02 seconds." ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] } ], "metadata": { @@ -365,7 +1577,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.9 (main, Dec 7 2021, 18:04:56) \n[Clang 13.0.0 (clang-1300.0.29.3)]" + "version": "3.9.9" }, "orig_nbformat": 4, "vscode": {