openai-cookbook/examples/How_to_stream_completions.ipynb
2022-09-02 12:15:34 -07:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to stream completions\n",
"\n",
"By default, when you send a prompt to the OpenAI Completions endpoint, it computes the entire completion and sends it back in a single response.\n",
"\n",
"If you're generating very long completions from a davinci-level model, waiting for the response can take many seconds. As of Aug 2022, responses from `text-davinci-002` typically take something like ~1 second plus ~2 seconds per 100 completion tokens.\n",
"\n",
"If you want to get the response faster, you can 'stream' the completion as it's being generated. This allows you to start printing or otherwise processing the beginning of the completion before the entire completion is finished.\n",
"\n",
"To stream completions, set `stream=True` when calling the Completions endpoint. This will return an object that streams back text as [data-only server-sent events](https://app.mode.com/openai/reports/4fce5ba22b5b/runs/f518a0be4495).\n",
"\n",
"Note that using `stream=True` in a production application makes it more difficult to moderate the content of the completions, which has implications for [approved usage](https://beta.openai.com/docs/usage-guidelines).\n",
"\n",
"Below is a Python code example of how to receive streaming completions."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"import openai # for OpenAI API calls\n",
"import time # for measuring time savings"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A typical completion request\n",
"\n",
"With a typical Completions API call, the text is first computed and then returned all at once."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Full response received 6.93 seconds after request\n",
"Full text received: 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100\n"
]
}
],
"source": [
"# Example of an OpenAI Completion request\n",
"# https://beta.openai.com/docs/api-reference/completions/create\n",
"\n",
"# record the time before the request is sent\n",
"start_time = time.time()\n",
"\n",
"# send a Completion request to count to 100\n",
"response = openai.Completion.create(\n",
" model='text-davinci-002',\n",
" prompt='1,2,3,',\n",
" max_tokens=193,\n",
" temperature=0,\n",
")\n",
"\n",
"# calculate the time it took to receive the response\n",
"response_time = time.time() - start_time\n",
"\n",
"# extract the text from the response\n",
"completion_text = response['choices'][0]['text']\n",
"\n",
"# print the time delay and text received\n",
"print(f\"Full response received {response_time:.2f} seconds after request\")\n",
"print(f\"Full text received: {completion_text}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A streaming completion request\n",
"\n",
"With a streaming Completions API call, the text is sent back via a series of events, which you can iterate over with a `for` loop."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Event received 0.19 seconds after request\n",
"Text received: 4\n",
"\n",
"Event received 0.22 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 0.26 seconds after request\n",
"Text received: 5\n",
"\n",
"Event received 0.28 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 0.32 seconds after request\n",
"Text received: 6\n",
"\n",
"Event received 0.35 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 0.38 seconds after request\n",
"Text received: 7\n",
"\n",
"Event received 0.41 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 0.44 seconds after request\n",
"Text received: 8\n",
"\n",
"Event received 0.52 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 0.55 seconds after request\n",
"Text received: 9\n",
"\n",
"Event received 0.58 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 0.64 seconds after request\n",
"Text received: 10\n",
"\n",
"Event received 0.67 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 0.70 seconds after request\n",
"Text received: 11\n",
"\n",
"Event received 0.73 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 0.76 seconds after request\n",
"Text received: 12\n",
"\n",
"Event received 0.79 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 0.82 seconds after request\n",
"Text received: 13\n",
"\n",
"Event received 0.85 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 0.89 seconds after request\n",
"Text received: 14\n",
"\n",
"Event received 0.92 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 0.96 seconds after request\n",
"Text received: 15\n",
"\n",
"Event received 1.00 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 1.03 seconds after request\n",
"Text received: 16\n",
"\n",
"Event received 1.11 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 1.24 seconds after request\n",
"Text received: 17\n",
"\n",
"Event received 1.25 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 1.28 seconds after request\n",
"Text received: 18\n",
"\n",
"Event received 1.31 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 1.34 seconds after request\n",
"Text received: 19\n",
"\n",
"Event received 1.38 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 1.41 seconds after request\n",
"Text received: 20\n",
"\n",
"Event received 1.44 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 1.47 seconds after request\n",
"Text received: 21\n",
"\n",
"Event received 1.50 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 1.53 seconds after request\n",
"Text received: 22\n",
"\n",
"Event received 1.56 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 1.60 seconds after request\n",
"Text received: 23\n",
"\n",
"Event received 1.63 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 1.66 seconds after request\n",
"Text received: 24\n",
"\n",
"Event received 1.73 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 1.76 seconds after request\n",
"Text received: 25\n",
"\n",
"Event received 1.79 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 1.82 seconds after request\n",
"Text received: 26\n",
"\n",
"Event received 1.85 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 1.88 seconds after request\n",
"Text received: 27\n",
"\n",
"Event received 1.91 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 1.94 seconds after request\n",
"Text received: 28\n",
"\n",
"Event received 2.00 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 2.03 seconds after request\n",
"Text received: 29\n",
"\n",
"Event received 2.06 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 2.09 seconds after request\n",
"Text received: 30\n",
"\n",
"Event received 2.12 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 2.15 seconds after request\n",
"Text received: 31\n",
"\n",
"Event received 2.18 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 2.21 seconds after request\n",
"Text received: 32\n",
"\n",
"Event received 2.26 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 2.28 seconds after request\n",
"Text received: 33\n",
"\n",
"Event received 2.31 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 2.34 seconds after request\n",
"Text received: 34\n",
"\n",
"Event received 2.37 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 2.43 seconds after request\n",
"Text received: 35\n",
"\n",
"Event received 2.47 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 2.50 seconds after request\n",
"Text received: 36\n",
"\n",
"Event received 2.53 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 2.56 seconds after request\n",
"Text received: 37\n",
"\n",
"Event received 2.62 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 2.67 seconds after request\n",
"Text received: 38\n",
"\n",
"Event received 2.70 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 2.73 seconds after request\n",
"Text received: 39\n",
"\n",
"Event received 2.76 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 2.79 seconds after request\n",
"Text received: 40\n",
"\n",
"Event received 2.82 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 2.85 seconds after request\n",
"Text received: 41\n",
"\n",
"Event received 2.88 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 2.91 seconds after request\n",
"Text received: 42\n",
"\n",
"Event received 2.94 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 2.97 seconds after request\n",
"Text received: 43\n",
"\n",
"Event received 3.00 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 3.03 seconds after request\n",
"Text received: 44\n",
"\n",
"Event received 3.05 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 3.08 seconds after request\n",
"Text received: 45\n",
"\n",
"Event received 3.11 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 3.16 seconds after request\n",
"Text received: 46\n",
"\n",
"Event received 3.20 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 3.23 seconds after request\n",
"Text received: 47\n",
"\n",
"Event received 3.29 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 3.32 seconds after request\n",
"Text received: 48\n",
"\n",
"Event received 3.39 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 3.45 seconds after request\n",
"Text received: 49\n",
"\n",
"Event received 3.48 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 3.51 seconds after request\n",
"Text received: 50\n",
"\n",
"Event received 3.54 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 3.57 seconds after request\n",
"Text received: 51\n",
"\n",
"Event received 3.60 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 3.63 seconds after request\n",
"Text received: 52\n",
"\n",
"Event received 3.66 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 3.69 seconds after request\n",
"Text received: 53\n",
"\n",
"Event received 3.72 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 3.75 seconds after request\n",
"Text received: 54\n",
"\n",
"Event received 3.78 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 3.81 seconds after request\n",
"Text received: 55\n",
"\n",
"Event received 3.90 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 3.93 seconds after request\n",
"Text received: 56\n",
"\n",
"Event received 3.96 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 3.99 seconds after request\n",
"Text received: 57\n",
"\n",
"Event received 4.02 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 4.05 seconds after request\n",
"Text received: 58\n",
"\n",
"Event received 4.08 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 4.14 seconds after request\n",
"Text received: 59\n",
"\n",
"Event received 4.17 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 4.17 seconds after request\n",
"Text received: 60\n",
"\n",
"Event received 4.20 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 4.23 seconds after request\n",
"Text received: 61\n",
"\n",
"Event received 4.26 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 4.29 seconds after request\n",
"Text received: 62\n",
"\n",
"Event received 4.32 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 4.35 seconds after request\n",
"Text received: 63\n",
"\n",
"Event received 4.38 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 4.41 seconds after request\n",
"Text received: 64\n",
"\n",
"Event received 4.44 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 4.47 seconds after request\n",
"Text received: 65\n",
"\n",
"Event received 4.50 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 4.58 seconds after request\n",
"Text received: 66\n",
"\n",
"Event received 4.62 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 4.64 seconds after request\n",
"Text received: 67\n",
"\n",
"Event received 4.67 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 4.71 seconds after request\n",
"Text received: 68\n",
"\n",
"Event received 4.74 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 4.84 seconds after request\n",
"Text received: 69\n",
"\n",
"Event received 4.87 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 4.92 seconds after request\n",
"Text received: 70\n",
"\n",
"Event received 4.95 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 4.98 seconds after request\n",
"Text received: 71\n",
"\n",
"Event received 5.03 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 5.12 seconds after request\n",
"Text received: 72\n",
"\n",
"Event received 5.15 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 5.18 seconds after request\n",
"Text received: 73\n",
"\n",
"Event received 5.21 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 5.24 seconds after request\n",
"Text received: 74\n",
"\n",
"Event received 5.27 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 5.30 seconds after request\n",
"Text received: 75\n",
"\n",
"Event received 5.33 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 5.36 seconds after request\n",
"Text received: 76\n",
"\n",
"Event received 5.39 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 5.42 seconds after request\n",
"Text received: 77\n",
"\n",
"Event received 5.45 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 5.48 seconds after request\n",
"Text received: 78\n",
"\n",
"Event received 5.51 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 5.54 seconds after request\n",
"Text received: 79\n",
"\n",
"Event received 5.57 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 5.61 seconds after request\n",
"Text received: 80\n",
"\n",
"Event received 5.65 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 5.68 seconds after request\n",
"Text received: 81\n",
"\n",
"Event received 5.71 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 5.74 seconds after request\n",
"Text received: 82\n",
"\n",
"Event received 5.81 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 5.81 seconds after request\n",
"Text received: 83\n",
"\n",
"Event received 5.83 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 5.86 seconds after request\n",
"Text received: 84\n",
"\n",
"Event received 5.89 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 5.92 seconds after request\n",
"Text received: 85\n",
"\n",
"Event received 5.95 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 6.14 seconds after request\n",
"Text received: 86\n",
"\n",
"Event received 6.18 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 6.21 seconds after request\n",
"Text received: 87\n",
"\n",
"Event received 6.24 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 6.27 seconds after request\n",
"Text received: 88\n",
"\n",
"Event received 6.30 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 6.33 seconds after request\n",
"Text received: 89\n",
"\n",
"Event received 6.36 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 6.38 seconds after request\n",
"Text received: 90\n",
"\n",
"Event received 6.41 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 6.44 seconds after request\n",
"Text received: 91\n",
"\n",
"Event received 6.47 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 6.50 seconds after request\n",
"Text received: 92\n",
"\n",
"Event received 6.53 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 6.55 seconds after request\n",
"Text received: 93\n",
"\n",
"Event received 6.58 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 6.61 seconds after request\n",
"Text received: 94\n",
"\n",
"Event received 6.64 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 6.67 seconds after request\n",
"Text received: 95\n",
"\n",
"Event received 6.70 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 6.72 seconds after request\n",
"Text received: 96\n",
"\n",
"Event received 6.75 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 6.78 seconds after request\n",
"Text received: 97\n",
"\n",
"Event received 6.81 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 6.84 seconds after request\n",
"Text received: 98\n",
"\n",
"Event received 6.86 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 6.89 seconds after request\n",
"Text received: 99\n",
"\n",
"Event received 6.92 seconds after request\n",
"Text received: ,\n",
"\n",
"Event received 6.95 seconds after request\n",
"Text received: 100\n",
"\n",
"Full response received 6.93 seconds after request\n",
"Full text received: 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100\n"
]
}
],
"source": [
"# Example of an OpenAI Completion request, using the stream=True option\n",
"# https://beta.openai.com/docs/api-reference/completions/create\n",
"\n",
"# record the time before the request is sent\n",
"start_time = time.time()\n",
"\n",
"# send a Completion request to count to 100\n",
"response = openai.Completion.create(\n",
" model='text-davinci-002',\n",
" prompt='1,2,3,',\n",
" max_tokens=193,\n",
" temperature=0,\n",
" stream=True, # this time, we set stream=True\n",
")\n",
"\n",
"# create variables to collect the stream of events\n",
"collected_events = []\n",
"completion_text = ''\n",
"# iterate through the stream of events\n",
"for event in response:\n",
" event_time = time.time() - start_time # calculate the time delay of the event\n",
" print(f\"Event received {event_time:.2f} seconds after request\") # print the time delay\n",
" collected_events.append(event) # save the event response\n",
" event_text = event['choices'][0]['text'] # extract the text\n",
" completion_text += event_text # append the text\n",
" print(f\"Text received: {event_text}\\n\") # print the text\n",
"\n",
"# print the time delay and text received\n",
"print(f\"Full response received {response_time:.2f} seconds after request\")\n",
"print(f\"Full text received: {completion_text}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Time comparison\n",
"\n",
"In the example above, both requests took 6.93 seconds to fully complete.\n",
"\n",
"However, with the streaming request, you would have received the first token after 0.19 seconds, and subsequent tokens after about ~0.035 seconds each."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.9 ('openai')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}