openai-cookbook/examples/How_to_handle_rate_limits.ipynb

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to handle rate limits\n",
    "\n",
    "When you call the OpenAI API repeatedly, you may encounter error messages that say `429: 'Too Many Requests'` or `RateLimitError`. These error messages come from exceeding the API's rate limits.\n",
    "\n",
    "This guide shares tips for avoiding and handling rate limit errors.\n",
    "\n",
    "To see an example script for throttling parallel requests to avoid rate limit errors, see [api_request_parallel_processor.py](api_request_parallel_processor.py).\n",
    "\n",
    "## Why rate limits exist\n",
    "\n",
    "Rate limits are a common practice for APIs, and they're put in place for a few different reasons.\n",
    "\n",
    "- First, they help protect against abuse or misuse of the API. For example, a malicious actor could flood the API with requests in an attempt to overload it or cause disruptions in service. By setting rate limits, OpenAI can prevent this kind of activity.\n",
    "- Second, rate limits help ensure that everyone has fair access to the API. If one person or organization makes an excessive number of requests, it could bog down the API for everyone else. By throttling the number of requests that a single user can make, OpenAI ensures that everyone has an opportunity to use the API without experiencing slowdowns.\n",
    "- Lastly, rate limits can help OpenAI manage the aggregate load on its infrastructure. If requests to the API increase dramatically, it could tax the servers and cause performance issues. By setting rate limits, OpenAI can help maintain a smooth and consistent experience for all users.\n",
    "\n",
    "Although hitting rate limits can be frustrating, rate limits exist to protect the reliable operation of the API for its users."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Default rate limits\n",
    "\n",
    "As of Jan 2023, the default rate limits are:\n",
    "\n",
    "<table>\n",
    "<thead>\n",
    "  <tr>\n",
    "    <th></th>\n",
    "    <th>Text Completion &amp; Embedding endpoints</th>\n",
    "    <th>Code &amp; Edit endpoints</th>\n",
    "  </tr>\n",
    "</thead>\n",
    "<tbody>\n",
    "  <tr>\n",
    "    <td>Free trial users</td>\n",
    "    <td>\n",
    "        <ul>\n",
    "            <li>20 requests / minute</li>\n",
    "            <li>150,000 tokens / minute</li>\n",
    "        </ul>\n",
    "    </td>\n",
    "    <td>\n",
    "        <ul>\n",
    "            <li>20 requests / minute</li>\n",
    "            <li>150,000 tokens / minute</li>\n",
    "        </ul>\n",
    "    </td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>Pay-as-you-go users (in your first 48 hours)</td>\n",
    "    <td>\n",
    "        <ul>\n",
    "            <li>60 requests / minute</li>\n",
    "            <li>250,000 davinci tokens / minute (and proportionally more for cheaper models)</li>\n",
    "        </ul>\n",
    "    </td>\n",
    "    <td>\n",
    "        <ul>\n",
    "            <li>20 requests / minute</li>\n",
    "            <li>150,000 tokens / minute</li>\n",
    "        </ul>\n",
    "    </td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>Pay-as-you-go users (after your first 48 hours)</td>\n",
    "    <td>\n",
    "        <ul>\n",
    "            <li>3,000 requests / minute</li>\n",
    "            <li>250,000 davinci tokens / minute (and proportionally more for cheaper models)</li>\n",
    "        </ul>\n",
    "    </td>\n",
    "    <td>\n",
    "        <ul>\n",
    "            <li>20 requests / minute</li>\n",
    "            <li>150,000 tokens / minute</li>\n",
    "        </ul>\n",
    "    </td>\n",
    "  </tr>\n",
    "</tbody>\n",
    "</table>\n",
    "\n",
    "For reference, 1,000 tokens is roughly a page of text.\n",
    "\n",
    "### Other rate limit resources\n",
    "\n",
    "Read more about OpenAI's rate limits in these other resources:\n",
    "\n",
    "- [Guide: Rate limits](https://beta.openai.com/docs/guides/rate-limits/overview)\n",
    "- [Help Center: Is API usage subject to any rate limits?](https://help.openai.com/en/articles/5955598-is-api-usage-subject-to-any-rate-limits)\n",
    "- [Help Center: How can I solve 429: 'Too Many Requests' errors?](https://help.openai.com/en/articles/5955604-how-can-i-solve-429-too-many-requests-errors)\n",
    "\n",
    "### Requesting a rate limit increase\n",
    "\n",
    "If you'd like your organization's rate limit increased, please fill out the following form:\n",
    "\n",
    "- [OpenAI Rate Limit Increase Request form](https://forms.gle/56ZrwXXoxAN1yt6i9)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example rate limit error\n",
    "\n",
    "A rate limit error will occur when API requests are sent too quickly. If using the OpenAI Python library, they will look something like:\n",
    "\n",
    "```\n",
    "RateLimitError: Rate limit reached for default-codex in organization org-{id} on requests per min. Limit: 20.000000 / min. Current: 24.000000 / min. Contact support@openai.com if you continue to have issues or if you’d like to request an increase.\n",
    "```\n",
    "\n",
    "Below is example code for triggering a rate limit error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import openai  # for making OpenAI API requests\n",
    "\n",
    "# request a bunch of completions in a loop\n",
    "for _ in range(100):\n",
    "    openai.ChatCompletion.create(\n",
    "        model=\"gpt-3.5-turbo\",\n",
    "        messages=[{\"role\": \"user\", \"content\": \"Hello\"}],\n",
    "        max_tokens=10,\n",
    "    )\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How to avoid rate limit errors\n",
    "\n",
    "### Retrying with exponential backoff\n",
    "\n",
    "One easy way to avoid rate limit errors is to automatically retry requests with a random exponential backoff. Retrying with exponential backoff means performing a short sleep when a rate limit error is hit, then retrying the unsuccessful request. If the request is still unsuccessful, the sleep length is increased and the process is repeated. This continues until the request is successful or until a maximum number of retries is reached.\n",
    "\n",
    "This approach has many benefits:\n",
    "\n",
    "- Automatic retries means you can recover from rate limit errors without crashes or missing data\n",
    "- Exponential backoff means that your first retries can be tried quickly, while still benefiting from longer delays if your first few retries fail\n",
    "- Adding random jitter to the delay helps retries from all hitting at the same time\n",
    "\n",
    "Note that unsuccessful requests contribute to your per-minute limit, so continuously resending a request won’t work.\n",
    "\n",
    "Below are a few example solutions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example #1: Using the Tenacity library\n",
    "\n",
    "[Tenacity](https://tenacity.readthedocs.io/en/latest/) is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just about anything.\n",
    "\n",
    "To add exponential backoff to your requests, you can use the `tenacity.retry` [decorator](https://peps.python.org/pep-0318/). The following example uses the `tenacity.wait_random_exponential` function to add random exponential backoff to a request.\n",
    "\n",
    "Note that the Tenacity library is a third-party tool, and OpenAI makes no guarantees about its reliability or security."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<OpenAIObject text_completion id=cmpl-5oowO391reUW8RGVfFyzBM1uBs4A5 at 0x10d8cae00> JSON: {\n",
       "  \"choices\": [\n",
       "    {\n",
       "      \"finish_reason\": \"length\",\n",
       "      \"index\": 0,\n",
       "      \"logprobs\": null,\n",
       "      \"text\": \" a little girl dreamed of becoming a model.\\n\\nNowadays, that dream\"\n",
       "    }\n",
       "  ],\n",
       "  \"created\": 1662793900,\n",
       "  \"id\": \"cmpl-5oowO391reUW8RGVfFyzBM1uBs4A5\",\n",
       "  \"model\": \"text-davinci-002\",\n",
       "  \"object\": \"text_completion\",\n",
       "  \"usage\": {\n",
       "    \"completion_tokens\": 16,\n",
       "    \"prompt_tokens\": 5,\n",
       "    \"total_tokens\": 21\n",
       "  }\n",
       "}"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import openai  # for OpenAI API calls\n",
    "from tenacity import (\n",
    "    retry,\n",
    "    stop_after_attempt,\n",
    "    wait_random_exponential,\n",
    ")  # for exponential backoff\n",
    "\n",
    "\n",
    "@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))\n",
    "def completion_with_backoff(**kwargs):\n",
    "    return openai.Completion.create(**kwargs)\n",
    "\n",
    "\n",
    "completion_with_backoff(model=\"text-davinci-002\", prompt=\"Once upon a time,\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example #2: Using the backoff library\n",
    "\n",
    "Another library that provides function decorators for backoff and retry is [backoff](https://pypi.org/project/backoff/).\n",
    "\n",
    "Like Tenacity, the backoff library is a third-party tool, and OpenAI makes no guarantees about its reliability or security."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<OpenAIObject text_completion id=cmpl-5oowPhIdUvshEsF1rBhhwE9KFfI3M at 0x111043680> JSON: {\n",
       "  \"choices\": [\n",
       "    {\n",
       "      \"finish_reason\": \"length\",\n",
       "      \"index\": 0,\n",
       "      \"logprobs\": null,\n",
       "      \"text\": \" two children lived in a poor country village. In the winter, the temperature would\"\n",
       "    }\n",
       "  ],\n",
       "  \"created\": 1662793901,\n",
       "  \"id\": \"cmpl-5oowPhIdUvshEsF1rBhhwE9KFfI3M\",\n",
       "  \"model\": \"text-davinci-002\",\n",
       "  \"object\": \"text_completion\",\n",
       "  \"usage\": {\n",
       "    \"completion_tokens\": 16,\n",
       "    \"prompt_tokens\": 5,\n",
       "    \"total_tokens\": 21\n",
       "  }\n",
       "}"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import backoff  # for exponential backoff\n",
    "import openai  # for OpenAI API calls\n",
    "\n",
    "\n",
    "@backoff.on_exception(backoff.expo, openai.error.RateLimitError)\n",
    "def completions_with_backoff(**kwargs):\n",
    "    return openai.Completion.create(**kwargs)\n",
    "\n",
    "\n",
    "completions_with_backoff(model=\"text-davinci-002\", prompt=\"Once upon a time,\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example 3: Manual backoff implementation\n",
    "\n",
    "If you don't want to use third-party libraries, you can implement your own backoff logic."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<OpenAIObject text_completion id=cmpl-5oowRsCXv3AkUgVJyyo3TQrVq7hIT at 0x111024220> JSON: {\n",
       "  \"choices\": [\n",
       "    {\n",
       "      \"finish_reason\": \"length\",\n",
       "      \"index\": 0,\n",
       "      \"logprobs\": null,\n",
       "      \"text\": \" a man decided to greatly improve his karma by turning his life around.\\n\\n\"\n",
       "    }\n",
       "  ],\n",
       "  \"created\": 1662793903,\n",
       "  \"id\": \"cmpl-5oowRsCXv3AkUgVJyyo3TQrVq7hIT\",\n",
       "  \"model\": \"text-davinci-002\",\n",
       "  \"object\": \"text_completion\",\n",
       "  \"usage\": {\n",
       "    \"completion_tokens\": 16,\n",
       "    \"prompt_tokens\": 5,\n",
       "    \"total_tokens\": 21\n",
       "  }\n",
       "}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# imports\n",
    "import random\n",
    "import time\n",
    "\n",
    "import openai\n",
    "\n",
    "# define a retry decorator\n",
    "def retry_with_exponential_backoff(\n",
    "    func,\n",
    "    initial_delay: float = 1,\n",
    "    exponential_base: float = 2,\n",
    "    jitter: bool = True,\n",
    "    max_retries: int = 10,\n",
    "    errors: tuple = (openai.error.RateLimitError,),\n",
    "):\n",
    "    \"\"\"Retry a function with exponential backoff.\"\"\"\n",
    "\n",
    "    def wrapper(*args, **kwargs):\n",
    "        # Initialize variables\n",
    "        num_retries = 0\n",
    "        delay = initial_delay\n",
    "\n",
    "        # Loop until a successful response or max_retries is hit or an exception is raised\n",
    "        while True:\n",
    "            try:\n",
    "                return func(*args, **kwargs)\n",
    "\n",
    "            # Retry on specified errors\n",
    "            except errors as e:\n",
    "                # Increment retries\n",
    "                num_retries += 1\n",
    "\n",
    "                # Check if max retries has been reached\n",
    "                if num_retries > max_retries:\n",
    "                    raise Exception(\n",
    "                        f\"Maximum number of retries ({max_retries}) exceeded.\"\n",
    "                    )\n",
    "\n",
    "                # Increment the delay\n",
    "                delay *= exponential_base * (1 + jitter * random.random())\n",
    "\n",
    "                # Sleep for the delay\n",
    "                time.sleep(delay)\n",
    "\n",
    "            # Raise exceptions for any errors not specified\n",
    "            except Exception as e:\n",
    "                raise e\n",
    "\n",
    "    return wrapper\n",
    "\n",
    "\n",
    "@retry_with_exponential_backoff\n",
    "def completions_with_backoff(**kwargs):\n",
    "    return openai.Completion.create(**kwargs)\n",
    "\n",
    "\n",
    "completions_with_backoff(model=\"text-davinci-002\", prompt=\"Once upon a time,\")\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How to maximize throughput of batch processing given rate limits\n",
    "\n",
    "If you're processing real-time requests from users, backoff and retry is a great strategy to minimize latency while avoiding rate limit errors.\n",
    "\n",
    "However, if you're processing large volumes of batch data, where throughput matters more than latency, there are a few other things you can do in addition to backoff and retry.\n",
    "\n",
    "### Proactively adding delay between requests\n",
    "\n",
    "If you are constantly hitting the rate limit, then backing off, then hitting the rate limit again, then backing off again, it's possible that a good fraction of your request budget will be 'wasted' on requests that need to be retried. This limits your processing throughput, given a fixed rate limit.\n",
    "\n",
    "Here, one potential solution is to calculate your rate limit and add a delay equal to its reciprocal (e.g., if your rate limit 20 requests per minute, add a delay of 3–6 seconds to each request). This can help you operate near the rate limit ceiling without hitting it and incurring wasted requests.\n",
    "\n",
    "#### Example of adding delay to a request"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<OpenAIObject text_completion id=cmpl-5oowVVZnAzdCPtUJ0rifeamtLcZRp at 0x11b2c7680> JSON: {\n",
       "  \"choices\": [\n",
       "    {\n",
       "      \"finish_reason\": \"length\",\n",
       "      \"index\": 0,\n",
       "      \"logprobs\": null,\n",
       "      \"text\": \" there was an idyllic little farm that sat by a babbling brook\"\n",
       "    }\n",
       "  ],\n",
       "  \"created\": 1662793907,\n",
       "  \"id\": \"cmpl-5oowVVZnAzdCPtUJ0rifeamtLcZRp\",\n",
       "  \"model\": \"text-davinci-002\",\n",
       "  \"object\": \"text_completion\",\n",
       "  \"usage\": {\n",
       "    \"completion_tokens\": 16,\n",
       "    \"prompt_tokens\": 5,\n",
       "    \"total_tokens\": 21\n",
       "  }\n",
       "}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# imports\n",
    "import time\n",
    "import openai\n",
    "\n",
    "# Define a function that adds a delay to a Completion API call\n",
    "def delayed_completion(delay_in_seconds: float = 1, **kwargs):\n",
    "    \"\"\"Delay a completion by a specified amount of time.\"\"\"\n",
    "\n",
    "    # Sleep for the delay\n",
    "    time.sleep(delay_in_seconds)\n",
    "\n",
    "    # Call the Completion API and return the result\n",
    "    return openai.Completion.create(**kwargs)\n",
    "\n",
    "\n",
    "# Calculate the delay based on your rate limit\n",
    "rate_limit_per_minute = 20\n",
    "delay = 60.0 / rate_limit_per_minute\n",
    "\n",
    "delayed_completion(\n",
    "    delay_in_seconds=delay,\n",
    "    model=\"text-davinci-002\",\n",
    "    prompt=\"Once upon a time,\"\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "### Batching requests\n",
    "\n",
    "The OpenAI API has separate limits for requests per minute and tokens per minute.\n",
    "\n",
    "If you're hitting the limit on requests per minute, but have headroom on tokens per minute, you can increase your throughput by batching multiple tasks into each request. This will allow you to process more tokens per minute, especially with the smaller models.\n",
    "\n",
    "Sending in a batch of prompts works exactly the same as a normal API call, except that pass in a list of strings to `prompt` parameter instead of a single string.\n",
    "\n",
    "**Warning:** the response object may not return completions in the order of the prompts, so always remember to match responses back to prompts using the `index` field.\n",
    "\n",
    "#### Example without batching"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Once upon a time, before there were grandiloquent tales of the massacre at Fort Mims, there were stories of\n",
      "Once upon a time, a full-sized search and rescue was created. However, CIDIs are the addition of requiring\n",
      "Once upon a time, Schubert was hot with the films. “Schubert sings of honey, flowers,\n",
      "Once upon a time, you could watch these films on your VCR, sometimes years after their initial theatrical release, and there\n",
      "Once upon a time, there was a forest. In that forest, the forest animals ruled. The forest animals had their homes\n",
      "Once upon a time, there were two programs that complained about false positive scans. Peacock and Midnight Manager alike, only\n",
      "Once upon a time, a long, long time ago, tragedy struck. it was the darkest of nights, and there was\n",
      "Once upon a time, when Adam was a perfect little gentleman, he was presented at Court as a guarantee of good character.\n",
      "Once upon a time, Adam and Eve made a mistake. They ate the fruit from the tree of immortality and split the consequences\n",
      "Once upon a time, there was a set of programming fundamental principles known as the \"X model.\" This is a set of\n"
     ]
    }
   ],
   "source": [
    "import openai  # for making OpenAI API requests\n",
    "\n",
    "\n",
    "num_stories = 10\n",
    "prompt = \"Once upon a time,\"\n",
    "\n",
    "# serial example, with one story completion per request\n",
    "for _ in range(num_stories):\n",
    "    response = openai.Completion.create(\n",
    "        model=\"curie\",\n",
    "        prompt=prompt,\n",
    "        max_tokens=20,\n",
    "    )\n",
    "\n",
    "    # print story\n",
    "    print(prompt + response.choices[0].text)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example with batching"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Once upon a time, there were two sisters, Eliza Pickering and Ariana 'Ari' Lucas. When these lovely\n",
      "Once upon a time, Keene was stung by a worm — actually, probably a python — snaking through his leg\n",
      "Once upon a time, there was a professor of physics during the depression. It was difficult, during this time, to get\n",
      "Once upon a time, before you got sick, you told stories to all and sundry, and your listeners believed in you\n",
      "Once upon a time, there was one very old nice donkey. He was incredibly smart, in a very old, kind of\n",
      "Once upon a time, the property of a common lodging house was a common cup for all the inhabitants. Betimes a constant\n",
      "Once upon a time, in an unspecified country, there was a witch who had an illegal product. It was highly effective,\n",
      "Once upon a time, a long time ago, I turned 13, my beautiful dog Duncan swept me up into his jaws like\n",
      "Once upon a time, as a thoroughly reformed creature from an army of Nazis, he took On Judgement Day myself and his\n",
      "Once upon a time, Capcom made a game for the Atari VCS called Missile Command. While it was innovative at the time\n"
     ]
    }
   ],
   "source": [
    "import openai  # for making OpenAI API requests\n",
    "\n",
    "\n",
    "num_stories = 10\n",
    "prompts = [\"Once upon a time,\"] * num_stories\n",
    "\n",
    "# batched example, with 10 stories completions per request\n",
    "response = openai.Completion.create(\n",
    "    model=\"curie\",\n",
    "    prompt=prompts,\n",
    "    max_tokens=20,\n",
    ")\n",
    "\n",
    "# match completions to prompts by index\n",
    "stories = [\"\"] * len(prompts)\n",
    "for choice in response.choices:\n",
    "    stories[choice.index] = prompts[choice.index] + choice.text\n",
    "\n",
    "# print stories\n",
    "for story in stories:\n",
    "    print(story)\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example parallel processing script\n",
    "\n",
    "We've written an example script for parallel processing large quantities of API requests: [api_request_parallel_processor.py](https://github.com/openai/openai-cookbook/blob/main/examples/api_request_parallel_processor.py).\n",
    "\n",
    "The script combines some handy features:\n",
    "- Streams requests from file, to avoid running out of memory for giant jobs\n",
    "- Makes requests concurrently, to maximize throughput\n",
    "- Throttles both request and token usage, to stay under rate limits\n",
    "- Retries failed requests, to avoid missing data\n",
    "- Logs errors, to diagnose problems with requests\n",
    "\n",
    "Feel free to use it as is or modify it to suit your needs."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.9.9 ('openai')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.9 (main, Dec  7 2021, 18:04:56) \n[Clang 13.0.0 (clang-1300.0.29.3)]"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}