openai-cookbook/examples/batch_processing.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "51c7a1f9",
   "metadata": {},
   "source": [
    "# Batch processing with the Batch API\n",
    "\n",
    "The new Batch API allows to **create async batch jobs for a lower price and with higher rate limits**.\n",
    "\n",
    "Batches will be completed within 24h, but may be processed sooner depending on global usage. \n",
    "\n",
    "Ideal use cases for the Batch API include:\n",
    "\n",
    "- Tagging, captioning, or enriching content on a marketplace or blog\n",
    "- Categorizing and suggesting answers for support tickets\n",
    "- Performing sentiment analysis on large datasets of customer feedback\n",
    "- Generating summaries or translations for collections of documents or articles\n",
    "\n",
    "and much more!\n",
    "\n",
    "This cookbook will walk you through how to use the Batch API with a couple of practical examples.\n",
    "\n",
    "We will start with an example to categorize movies using `gpt-3.5-turbo`, and then cover how we can use the vision capabilities of `gpt-4-turbo` to caption images.\n",
    "\n",
    "Please note that multiple models are available through the Batch API, and that you can use the same parameters in your Batch API calls as with the Chat Completions endpoint."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85ae3c4e",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "02e22580",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make sure you have the latest version of the SDK available to use the Batch API\n",
    "%pip install openai --upgrade"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "726bacba",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from openai import OpenAI\n",
    "import pandas as pd\n",
    "from IPython.display import Image, display"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "4ac0c9a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initializing OpenAI client - see https://platform.openai.com/docs/quickstart?context=python\n",
    "client = OpenAI()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5fec950f",
   "metadata": {},
   "source": [
    "## First example: Categorizing movies\n",
    "\n",
    "In this example, we will use `gpt-3.5-turbo` to extract movie categories from a description of the movie. We will also extract a 1-sentence summary from this description. \n",
    "\n",
    "We will use [JSON mode](https://platform.openai.com/docs/guides/text-generation/json-mode) to extract categories as an array of strings and the 1-sentence summary in a structured format. \n",
    "\n",
    "For each movie, we want to get a result that looks like this:\n",
    "\n",
    "```\n",
    "{\n",
    "    categories: ['category1', 'category2', 'category3'],\n",
    "    summary: '1-sentence summary'\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9fc6dcd6",
   "metadata": {},
   "source": [
    "### Loading data\n",
    "\n",
    "We will use the IMDB top 1000 movies dataset for this example. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "1b721a0d",
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Poster_Link</th>\n",
       "      <th>Series_Title</th>\n",
       "      <th>Released_Year</th>\n",
       "      <th>Certificate</th>\n",
       "      <th>Runtime</th>\n",
       "      <th>Genre</th>\n",
       "      <th>IMDB_Rating</th>\n",
       "      <th>Overview</th>\n",
       "      <th>Meta_score</th>\n",
       "      <th>Director</th>\n",
       "      <th>Star1</th>\n",
       "      <th>Star2</th>\n",
       "      <th>Star3</th>\n",
       "      <th>Star4</th>\n",
       "      <th>No_of_Votes</th>\n",
       "      <th>Gross</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>https://m.media-amazon.com/images/M/MV5BMDFkYT...</td>\n",
       "      <td>The Shawshank Redemption</td>\n",
       "      <td>1994</td>\n",
       "      <td>A</td>\n",
       "      <td>142 min</td>\n",
       "      <td>Drama</td>\n",
       "      <td>9.3</td>\n",
       "      <td>Two imprisoned men bond over a number of years...</td>\n",
       "      <td>80.0</td>\n",
       "      <td>Frank Darabont</td>\n",
       "      <td>Tim Robbins</td>\n",
       "      <td>Morgan Freeman</td>\n",
       "      <td>Bob Gunton</td>\n",
       "      <td>William Sadler</td>\n",
       "      <td>2343110</td>\n",
       "      <td>28,341,469</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>https://m.media-amazon.com/images/M/MV5BM2MyNj...</td>\n",
       "      <td>The Godfather</td>\n",
       "      <td>1972</td>\n",
       "      <td>A</td>\n",
       "      <td>175 min</td>\n",
       "      <td>Crime, Drama</td>\n",
       "      <td>9.2</td>\n",
       "      <td>An organized crime dynasty's aging patriarch t...</td>\n",
       "      <td>100.0</td>\n",
       "      <td>Francis Ford Coppola</td>\n",
       "      <td>Marlon Brando</td>\n",
       "      <td>Al Pacino</td>\n",
       "      <td>James Caan</td>\n",
       "      <td>Diane Keaton</td>\n",
       "      <td>1620367</td>\n",
       "      <td>134,966,411</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>https://m.media-amazon.com/images/M/MV5BMTMxNT...</td>\n",
       "      <td>The Dark Knight</td>\n",
       "      <td>2008</td>\n",
       "      <td>UA</td>\n",
       "      <td>152 min</td>\n",
       "      <td>Action, Crime, Drama</td>\n",
       "      <td>9.0</td>\n",
       "      <td>When the menace known as the Joker wreaks havo...</td>\n",
       "      <td>84.0</td>\n",
       "      <td>Christopher Nolan</td>\n",
       "      <td>Christian Bale</td>\n",
       "      <td>Heath Ledger</td>\n",
       "      <td>Aaron Eckhart</td>\n",
       "      <td>Michael Caine</td>\n",
       "      <td>2303232</td>\n",
       "      <td>534,858,444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>https://m.media-amazon.com/images/M/MV5BMWMwMG...</td>\n",
       "      <td>The Godfather: Part II</td>\n",
       "      <td>1974</td>\n",
       "      <td>A</td>\n",
       "      <td>202 min</td>\n",
       "      <td>Crime, Drama</td>\n",
       "      <td>9.0</td>\n",
       "      <td>The early life and career of Vito Corleone in ...</td>\n",
       "      <td>90.0</td>\n",
       "      <td>Francis Ford Coppola</td>\n",
       "      <td>Al Pacino</td>\n",
       "      <td>Robert De Niro</td>\n",
       "      <td>Robert Duvall</td>\n",
       "      <td>Diane Keaton</td>\n",
       "      <td>1129952</td>\n",
       "      <td>57,300,000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>https://m.media-amazon.com/images/M/MV5BMWU4N2...</td>\n",
       "      <td>12 Angry Men</td>\n",
       "      <td>1957</td>\n",
       "      <td>U</td>\n",
       "      <td>96 min</td>\n",
       "      <td>Crime, Drama</td>\n",
       "      <td>9.0</td>\n",
       "      <td>A jury holdout attempts to prevent a miscarria...</td>\n",
       "      <td>96.0</td>\n",
       "      <td>Sidney Lumet</td>\n",
       "      <td>Henry Fonda</td>\n",
       "      <td>Lee J. Cobb</td>\n",
       "      <td>Martin Balsam</td>\n",
       "      <td>John Fiedler</td>\n",
       "      <td>689845</td>\n",
       "      <td>4,360,000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                         Poster_Link  \\\n",
       "0  https://m.media-amazon.com/images/M/MV5BMDFkYT...   \n",
       "1  https://m.media-amazon.com/images/M/MV5BM2MyNj...   \n",
       "2  https://m.media-amazon.com/images/M/MV5BMTMxNT...   \n",
       "3  https://m.media-amazon.com/images/M/MV5BMWMwMG...   \n",
       "4  https://m.media-amazon.com/images/M/MV5BMWU4N2...   \n",
       "\n",
       "               Series_Title Released_Year Certificate  Runtime  \\\n",
       "0  The Shawshank Redemption          1994           A  142 min   \n",
       "1             The Godfather          1972           A  175 min   \n",
       "2           The Dark Knight          2008          UA  152 min   \n",
       "3    The Godfather: Part II          1974           A  202 min   \n",
       "4              12 Angry Men          1957           U   96 min   \n",
       "\n",
       "                  Genre  IMDB_Rating  \\\n",
       "0                 Drama          9.3   \n",
       "1          Crime, Drama          9.2   \n",
       "2  Action, Crime, Drama          9.0   \n",
       "3          Crime, Drama          9.0   \n",
       "4          Crime, Drama          9.0   \n",
       "\n",
       "                                            Overview  Meta_score  \\\n",
       "0  Two imprisoned men bond over a number of years...        80.0   \n",
       "1  An organized crime dynasty's aging patriarch t...       100.0   \n",
       "2  When the menace known as the Joker wreaks havo...        84.0   \n",
       "3  The early life and career of Vito Corleone in ...        90.0   \n",
       "4  A jury holdout attempts to prevent a miscarria...        96.0   \n",
       "\n",
       "               Director           Star1           Star2          Star3  \\\n",
       "0        Frank Darabont     Tim Robbins  Morgan Freeman     Bob Gunton   \n",
       "1  Francis Ford Coppola   Marlon Brando       Al Pacino     James Caan   \n",
       "2     Christopher Nolan  Christian Bale    Heath Ledger  Aaron Eckhart   \n",
       "3  Francis Ford Coppola       Al Pacino  Robert De Niro  Robert Duvall   \n",
       "4          Sidney Lumet     Henry Fonda     Lee J. Cobb  Martin Balsam   \n",
       "\n",
       "            Star4  No_of_Votes        Gross  \n",
       "0  William Sadler      2343110   28,341,469  \n",
       "1    Diane Keaton      1620367  134,966,411  \n",
       "2   Michael Caine      2303232  534,858,444  \n",
       "3    Diane Keaton      1129952   57,300,000  \n",
       "4    John Fiedler       689845    4,360,000  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataset_path = \"data/imdb_top_1000.csv\"\n",
    "\n",
    "df = pd.read_csv(dataset_path)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01396a47",
   "metadata": {},
   "source": [
    "### Processing step \n",
    "\n",
    "Here, we will prepare our requests by first trying them out with the Chat Completions endpoint.\n",
    "\n",
    "Once we're happy with the results, we can move on to creating the batch file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "806e768c",
   "metadata": {},
   "outputs": [],
   "source": [
    "categorize_system_prompt = '''\n",
    "Your goal is to extract movie categories from movie descriptions, as well as a 1-sentence summary for these movies.\n",
    "You will be provided with a movie description, and you will output a json object containing the following information:\n",
    "\n",
    "{\n",
    "    categories: string[] // Array of categories based on the movie description,\n",
    "    summary: string // 1-sentence summary of the movie based on the movie description\n",
    "}\n",
    "\n",
    "Categories refer to the genre or type of the movie, like \"action\", \"romance\", \"comedy\", etc. Keep category names simple and use only lower case letters.\n",
    "Movies can have several categories, but try to keep it under 3-4. Only mention the categories that are the most obvious based on the description.\n",
    "'''\n",
    "\n",
    "def get_categories(description):\n",
    "    response = client.chat.completions.create(\n",
    "    model=\"gpt-3.5-turbo\",\n",
    "    temperature=0.1,\n",
    "    # This is to enable JSON mode, making sure responses are valid json objects\n",
    "    response_format={ \n",
    "        \"type\": \"json_object\"\n",
    "    },\n",
    "    messages=[\n",
    "        {\n",
    "            \"role\": \"system\",\n",
    "            \"content\": categorize_system_prompt\n",
    "        },\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": description\n",
    "        }\n",
    "    ],\n",
    "    )\n",
    "\n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "4d079c56",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TITLE: The Shawshank Redemption\n",
      "OVERVIEW: Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.\n",
      "\n",
      "RESULT: {\n",
      "    \"categories\": [\"drama\"],\n",
      "    \"summary\": \"Two imprisoned men bond over the years and find redemption through acts of common decency.\"\n",
      "}\n",
      "\n",
      "\n",
      "----------------------------\n",
      "\n",
      "\n",
      "TITLE: The Godfather\n",
      "OVERVIEW: An organized crime dynasty's aging patriarch transfers control of his clandestine empire to his reluctant son.\n",
      "\n",
      "RESULT: {\n",
      "    \"categories\": [\"crime\", \"drama\"],\n",
      "    \"summary\": \"A crime drama about an aging patriarch passing on his empire to his son.\"\n",
      "}\n",
      "\n",
      "\n",
      "----------------------------\n",
      "\n",
      "\n",
      "TITLE: The Dark Knight\n",
      "OVERVIEW: When the menace known as the Joker wreaks havoc and chaos on the people of Gotham, Batman must accept one of the greatest psychological and physical tests of his ability to fight injustice.\n",
      "\n",
      "RESULT: {\n",
      "    \"categories\": [\"action\", \"thriller\"],\n",
      "    \"summary\": \"A thrilling action movie where Batman faces the chaotic Joker in a battle of justice.\"\n",
      "}\n",
      "\n",
      "\n",
      "----------------------------\n",
      "\n",
      "\n",
      "TITLE: The Godfather: Part II\n",
      "OVERVIEW: The early life and career of Vito Corleone in 1920s New York City is portrayed, while his son, Michael, expands and tightens his grip on the family crime syndicate.\n",
      "\n",
      "RESULT: {\n",
      "    \"categories\": [\"crime\", \"drama\"],\n",
      "    \"summary\": \"A portrayal of Vito Corleone's early life and career in 1920s New York City, as his son Michael expands the family crime syndicate.\"\n",
      "}\n",
      "\n",
      "\n",
      "----------------------------\n",
      "\n",
      "\n",
      "TITLE: 12 Angry Men\n",
      "OVERVIEW: A jury holdout attempts to prevent a miscarriage of justice by forcing his colleagues to reconsider the evidence.\n",
      "\n",
      "RESULT: {\n",
      "    \"categories\": [\"drama\"],\n",
      "    \"summary\": \"A gripping drama about a jury holdout trying to prevent a miscarriage of justice by challenging his colleagues to reconsider the evidence.\"\n",
      "}\n",
      "\n",
      "\n",
      "----------------------------\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Testing on a few examples\n",
    "for _, row in df[:5].iterrows():\n",
    "    description = row['Overview']\n",
    "    title = row['Series_Title']\n",
    "    result = get_categories(description)\n",
    "    print(f\"TITLE: {title}\\nOVERVIEW: {description}\\n\\nRESULT: {result}\")\n",
    "    print(\"\\n\\n----------------------------\\n\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a89a6709",
   "metadata": {},
   "source": [
    "### Creating the batch file\n",
    "\n",
    "The batch file, in the `jsonl` format, should contain one line (json object) per request.\n",
    "Each request is defined as such:\n",
    "\n",
    "```\n",
    "{\n",
    "    \"custom_id\": <REQUEST_ID>,\n",
    "    \"method\": \"POST\",\n",
    "    \"url\": \"/v1/chat/completions\",\n",
    "    \"body\": {\n",
    "        \"model\": <MODEL>,\n",
    "        \"messages\": <MESSAGES>,\n",
    "        // other parameters\n",
    "    }\n",
    "}\n",
    "```\n",
    "\n",
    "Note: the request ID should be unique per batch. This is what you can use to match results to the initial input files, as requests will not be returned in the same order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "81e37981",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Creating an array of json tasks\n",
    "\n",
    "tasks = []\n",
    "\n",
    "for index, row in df.iterrows():\n",
    "    \n",
    "    description = row['Overview']\n",
    "    \n",
    "    task = {\n",
    "        \"custom_id\": f\"task-{index}\",\n",
    "        \"method\": \"POST\",\n",
    "        \"url\": \"/v1/chat/completions\",\n",
    "        \"body\": {\n",
    "            # This is what you would have in your Chat Completions API call\n",
    "            \"model\": \"gpt-3.5-turbo\",\n",
    "            \"temperature\": 0.1,\n",
    "            \"response_format\": { \n",
    "                \"type\": \"json_object\"\n",
    "            },\n",
    "            \"messages\": [\n",
    "                {\n",
    "                    \"role\": \"system\",\n",
    "                    \"content\": categorize_system_prompt\n",
    "                },\n",
    "                {\n",
    "                    \"role\": \"user\",\n",
    "                    \"content\": description\n",
    "                }\n",
    "            ],\n",
    "        }\n",
    "    }\n",
    "    \n",
    "    tasks.append(task)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "257e2eda",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Creating the file\n",
    "\n",
    "file_name = \"data/batch_tasks_movies.jsonl\"\n",
    "\n",
    "with open(file_name, 'w') as file:\n",
    "    for obj in tasks:\n",
    "        file.write(json.dumps(obj) + '\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6b490cd",
   "metadata": {},
   "source": [
    "### Uploading the file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "40ea90dd",
   "metadata": {},
   "outputs": [],
   "source": [
    "batch_file = client.files.create(\n",
    "  file=open(file_name, \"rb\"),\n",
    "  purpose=\"batch\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "081f602f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "FileObject(id='file-nG1JDPSMRMinN8FOdaL30kVD', bytes=1127310, created_at=1714045723, filename='batch_tasks_movies.jsonl', object='file', purpose='batch', status='processed', status_details=None)\n"
     ]
    }
   ],
   "source": [
    "print(batch_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8ef8ab5",
   "metadata": {},
   "source": [
    "### Creating the batch job"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "4db403a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "batch_job = client.batches.create(\n",
    "  input_file_id=batch_file.id,\n",
    "  endpoint=\"/v1/chat/completions\",\n",
    "  completion_window=\"24h\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7ca66c6",
   "metadata": {},
   "source": [
    "### Checking batch status\n",
    "\n",
    "Note: this can take up to 24h, but it will usually be completed faster.\n",
    "\n",
    "You can continue checking until the status is 'completed'."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "6105d809",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Batch(id='batch_xU74ytOBYUpaUQE3Cwi8SCbA', completion_window='24h', created_at=1714049780, endpoint='/v1/chat/completions', input_file_id='file-6y0JPmkHU42qtaEK8x8ZYzkp', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1714049914, error_file_id=None, errors=None, expired_at=None, expires_at=1714136180, failed_at=None, finalizing_at=1714049896, in_progress_at=1714049821, metadata=None, output_file_id='file-XPfkEFZSaM4Avps7mcD3i8BY', request_counts=BatchRequestCounts(completed=312, failed=0, total=312))\n"
     ]
    }
   ],
   "source": [
    "batch_job = client.batches.retrieve(batch_job.id)\n",
    "print(batch_job)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6988fb64",
   "metadata": {},
   "source": [
    "### Retrieving results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "682c38d5",
   "metadata": {},
   "outputs": [],
   "source": [
    "result_file_id = batch_job.output_file_id\n",
    "result = client.files.content(result_file_id).content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "52840df9",
   "metadata": {},
   "outputs": [],
   "source": [
    "result_file_name = \"data/batch_job_results_movies.jsonl\"\n",
    "\n",
    "with open(result_file_name, 'wb') as file:\n",
    "    file.write(result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "f11b7d19",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Loading data from saved file\n",
    "results = []\n",
    "with open(result_file_name, 'r') as file:\n",
    "    for line in file:\n",
    "        # Parsing the JSON string into a dict and appending to the list of results\n",
    "        json_object = json.loads(line.strip())\n",
    "        results.append(json_object)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2bafff8",
   "metadata": {},
   "source": [
    "### Reading results\n",
    "Reminder: the results are not in the same order as in the input file.\n",
    "Make sure to check the custom_id to match the results against the input requests"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "004c12d3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TITLE: American Psycho\n",
      "OVERVIEW: A wealthy New York City investment banking executive, Patrick Bateman, hides his alternate psychopathic ego from his co-workers and friends as he delves deeper into his violent, hedonistic fantasies.\n",
      "\n",
      "RESULT: {\n",
      "    \"categories\": [\"thriller\", \"psychological\", \"drama\"],\n",
      "    \"summary\": \"A wealthy investment banker in New York City conceals his psychopathic alter ego while indulging in violent and hedonistic fantasies.\"\n",
      "}\n",
      "\n",
      "\n",
      "----------------------------\n",
      "\n",
      "\n",
      "TITLE: Lethal Weapon\n",
      "OVERVIEW: Two newly paired cops who are complete opposites must put aside their differences in order to catch a gang of drug smugglers.\n",
      "\n",
      "RESULT: {\n",
      "    \"categories\": [\"action\", \"comedy\", \"crime\"],\n",
      "    \"summary\": \"An action-packed comedy about two mismatched cops teaming up to take down a drug smuggling gang.\"\n",
      "}\n",
      "\n",
      "\n",
      "----------------------------\n",
      "\n",
      "\n",
      "TITLE: A Star Is Born\n",
      "OVERVIEW: A musician helps a young singer find fame as age and alcoholism send his own career into a downward spiral.\n",
      "\n",
      "RESULT: {\n",
      "    \"categories\": [\"drama\", \"music\"],\n",
      "    \"summary\": \"A musician's career spirals downward as he helps a young singer find fame amidst struggles with age and alcoholism.\"\n",
      "}\n",
      "\n",
      "\n",
      "----------------------------\n",
      "\n",
      "\n",
      "TITLE: From Here to Eternity\n",
      "OVERVIEW: In Hawaii in 1941, a private is cruelly punished for not boxing on his unit's team, while his captain's wife and second-in-command are falling in love.\n",
      "\n",
      "RESULT: {\n",
      "    \"categories\": [\"drama\", \"romance\", \"war\"],\n",
      "    \"summary\": \"A drama set in Hawaii in 1941, where a private faces punishment for not boxing on his unit's team, amidst a forbidden love affair between his captain's wife and second-in-command.\"\n",
      "}\n",
      "\n",
      "\n",
      "----------------------------\n",
      "\n",
      "\n",
      "TITLE: The Jungle Book\n",
      "OVERVIEW: Bagheera the Panther and Baloo the Bear have a difficult time trying to convince a boy to leave the jungle for human civilization.\n",
      "\n",
      "RESULT: {\n",
      "    \"categories\": [\"adventure\", \"animation\", \"family\"],\n",
      "    \"summary\": \"An adventure-filled animated movie about a panther and a bear trying to persuade a boy to leave the jungle for human civilization.\"\n",
      "}\n",
      "\n",
      "\n",
      "----------------------------\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Reading only the first results\n",
    "for res in results[:5]:\n",
    "    task_id = res['custom_id']\n",
    "    # Getting index from task id\n",
    "    index = task_id.split('-')[-1]\n",
    "    result = res['response']['body']['choices'][0]['message']['content']\n",
    "    movie = df.iloc[int(index)]\n",
    "    description = movie['Overview']\n",
    "    title = movie['Series_Title']\n",
    "    print(f\"TITLE: {title}\\nOVERVIEW: {description}\\n\\nRESULT: {result}\")\n",
    "    print(\"\\n\\n----------------------------\\n\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da4238f5",
   "metadata": {},
   "source": [
    "## Second example: Captioning images\n",
    "\n",
    "In this example, we will use `gpt-4-turbo` to caption images of furniture items. \n",
    "\n",
    "We will use the vision capabilities of the model to analyze the images and generate the captions."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d20e638",
   "metadata": {},
   "source": [
    "### Loading data\n",
    "\n",
    "We will use the Amazon furniture dataset for this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "079d21e7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>asin</th>\n",
       "      <th>url</th>\n",
       "      <th>title</th>\n",
       "      <th>brand</th>\n",
       "      <th>price</th>\n",
       "      <th>availability</th>\n",
       "      <th>categories</th>\n",
       "      <th>primary_image</th>\n",
       "      <th>images</th>\n",
       "      <th>upc</th>\n",
       "      <th>...</th>\n",
       "      <th>color</th>\n",
       "      <th>material</th>\n",
       "      <th>style</th>\n",
       "      <th>important_information</th>\n",
       "      <th>product_overview</th>\n",
       "      <th>about_item</th>\n",
       "      <th>description</th>\n",
       "      <th>specifications</th>\n",
       "      <th>uniq_id</th>\n",
       "      <th>scraped_at</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>B0CJHKVG6P</td>\n",
       "      <td>https://www.amazon.com/dp/B0CJHKVG6P</td>\n",
       "      <td>GOYMFK 1pc Free Standing Shoe Rack, Multi-laye...</td>\n",
       "      <td>GOYMFK</td>\n",
       "      <td>$24.99</td>\n",
       "      <td>Only 13 left in stock - order soon.</td>\n",
       "      <td>['Home &amp; Kitchen', 'Storage &amp; Organization', '...</td>\n",
       "      <td>https://m.media-amazon.com/images/I/416WaLx10j...</td>\n",
       "      <td>['https://m.media-amazon.com/images/I/416WaLx1...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>White</td>\n",
       "      <td>Metal</td>\n",
       "      <td>Modern</td>\n",
       "      <td>[]</td>\n",
       "      <td>[{'Brand': ' GOYMFK '}, {'Color': ' White '}, ...</td>\n",
       "      <td>['Multiple layers: Provides ample storage spac...</td>\n",
       "      <td>multiple shoes, coats, hats, and other items E...</td>\n",
       "      <td>['Brand: GOYMFK', 'Color: White', 'Material: M...</td>\n",
       "      <td>02593e81-5c09-5069-8516-b0b29f439ded</td>\n",
       "      <td>2024-02-02 15:15:08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>B0B66QHB23</td>\n",
       "      <td>https://www.amazon.com/dp/B0B66QHB23</td>\n",
       "      <td>subrtex Leather ding Room, Dining Chairs Set o...</td>\n",
       "      <td>subrtex</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>['Home &amp; Kitchen', 'Furniture', 'Dining Room F...</td>\n",
       "      <td>https://m.media-amazon.com/images/I/31SejUEWY7...</td>\n",
       "      <td>['https://m.media-amazon.com/images/I/31SejUEW...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>Black</td>\n",
       "      <td>Sponge</td>\n",
       "      <td>Black Rubber Wood</td>\n",
       "      <td>[]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>['【Easy Assembly】: Set of 2 dining room chairs...</td>\n",
       "      <td>subrtex Dining chairs Set of 2</td>\n",
       "      <td>['Brand: subrtex', 'Color: Black', 'Product Di...</td>\n",
       "      <td>5938d217-b8c5-5d3e-b1cf-e28e340f292e</td>\n",
       "      <td>2024-02-02 15:15:09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>B0BXRTWLYK</td>\n",
       "      <td>https://www.amazon.com/dp/B0BXRTWLYK</td>\n",
       "      <td>Plant Repotting Mat MUYETOL Waterproof Transpl...</td>\n",
       "      <td>MUYETOL</td>\n",
       "      <td>$5.98</td>\n",
       "      <td>In Stock</td>\n",
       "      <td>['Patio, Lawn &amp; Garden', 'Outdoor Décor', 'Doo...</td>\n",
       "      <td>https://m.media-amazon.com/images/I/41RgefVq70...</td>\n",
       "      <td>['https://m.media-amazon.com/images/I/41RgefVq...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>Green</td>\n",
       "      <td>Polyethylene</td>\n",
       "      <td>Modern</td>\n",
       "      <td>[]</td>\n",
       "      <td>[{'Brand': ' MUYETOL '}, {'Size': ' 26.8*26.8 ...</td>\n",
       "      <td>['PLANT REPOTTING MAT SIZE: 26.8\" x 26.8\", squ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>['Brand: MUYETOL', 'Size: 26.8*26.8', 'Item We...</td>\n",
       "      <td>b2ede786-3f51-5a45-9a5b-bcf856958cd8</td>\n",
       "      <td>2024-02-02 15:15:09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>B0C1MRB2M8</td>\n",
       "      <td>https://www.amazon.com/dp/B0C1MRB2M8</td>\n",
       "      <td>Pickleball Doormat, Welcome Doormat Absorbent ...</td>\n",
       "      <td>VEWETOL</td>\n",
       "      <td>$13.99</td>\n",
       "      <td>Only 10 left in stock - order soon.</td>\n",
       "      <td>['Patio, Lawn &amp; Garden', 'Outdoor Décor', 'Doo...</td>\n",
       "      <td>https://m.media-amazon.com/images/I/61vz1Igler...</td>\n",
       "      <td>['https://m.media-amazon.com/images/I/61vz1Igl...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>A5589</td>\n",
       "      <td>Rubber</td>\n",
       "      <td>Modern</td>\n",
       "      <td>[]</td>\n",
       "      <td>[{'Brand': ' VEWETOL '}, {'Size': ' 16*24INCH ...</td>\n",
       "      <td>['Specifications: 16x24 Inch ', \" High-Quality...</td>\n",
       "      <td>The decorative doormat features a subtle textu...</td>\n",
       "      <td>['Brand: VEWETOL', 'Size: 16*24INCH', 'Materia...</td>\n",
       "      <td>8fd9377b-cfa6-5f10-835c-6b8eca2816b5</td>\n",
       "      <td>2024-02-02 15:15:10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>B0CG1N9QRC</td>\n",
       "      <td>https://www.amazon.com/dp/B0CG1N9QRC</td>\n",
       "      <td>JOIN IRON Foldable TV Trays for Eating Set of ...</td>\n",
       "      <td>JOIN IRON Store</td>\n",
       "      <td>$89.99</td>\n",
       "      <td>Usually ships within 5 to 6 weeks</td>\n",
       "      <td>['Home &amp; Kitchen', 'Furniture', 'Game &amp; Recrea...</td>\n",
       "      <td>https://m.media-amazon.com/images/I/41p4d4VJnN...</td>\n",
       "      <td>['https://m.media-amazon.com/images/I/41p4d4VJ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>Grey Set of 4</td>\n",
       "      <td>Iron</td>\n",
       "      <td>X Classic Style</td>\n",
       "      <td>[]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>['Includes 4 Folding Tv Tray Tables And one Co...</td>\n",
       "      <td>Set of Four Folding Trays With Matching Storag...</td>\n",
       "      <td>['Brand: JOIN IRON', 'Shape: Rectangular', 'In...</td>\n",
       "      <td>bdc9aa30-9439-50dc-8e89-213ea211d66a</td>\n",
       "      <td>2024-02-02 15:15:11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 25 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         asin                                   url  \\\n",
       "0  B0CJHKVG6P  https://www.amazon.com/dp/B0CJHKVG6P   \n",
       "1  B0B66QHB23  https://www.amazon.com/dp/B0B66QHB23   \n",
       "2  B0BXRTWLYK  https://www.amazon.com/dp/B0BXRTWLYK   \n",
       "3  B0C1MRB2M8  https://www.amazon.com/dp/B0C1MRB2M8   \n",
       "4  B0CG1N9QRC  https://www.amazon.com/dp/B0CG1N9QRC   \n",
       "\n",
       "                                               title            brand   price  \\\n",
       "0  GOYMFK 1pc Free Standing Shoe Rack, Multi-laye...           GOYMFK  $24.99   \n",
       "1  subrtex Leather ding Room, Dining Chairs Set o...          subrtex     NaN   \n",
       "2  Plant Repotting Mat MUYETOL Waterproof Transpl...          MUYETOL   $5.98   \n",
       "3  Pickleball Doormat, Welcome Doormat Absorbent ...          VEWETOL  $13.99   \n",
       "4  JOIN IRON Foldable TV Trays for Eating Set of ...  JOIN IRON Store  $89.99   \n",
       "\n",
       "                          availability  \\\n",
       "0  Only 13 left in stock - order soon.   \n",
       "1                                  NaN   \n",
       "2                             In Stock   \n",
       "3  Only 10 left in stock - order soon.   \n",
       "4    Usually ships within 5 to 6 weeks   \n",
       "\n",
       "                                          categories  \\\n",
       "0  ['Home & Kitchen', 'Storage & Organization', '...   \n",
       "1  ['Home & Kitchen', 'Furniture', 'Dining Room F...   \n",
       "2  ['Patio, Lawn & Garden', 'Outdoor Décor', 'Doo...   \n",
       "3  ['Patio, Lawn & Garden', 'Outdoor Décor', 'Doo...   \n",
       "4  ['Home & Kitchen', 'Furniture', 'Game & Recrea...   \n",
       "\n",
       "                                       primary_image  \\\n",
       "0  https://m.media-amazon.com/images/I/416WaLx10j...   \n",
       "1  https://m.media-amazon.com/images/I/31SejUEWY7...   \n",
       "2  https://m.media-amazon.com/images/I/41RgefVq70...   \n",
       "3  https://m.media-amazon.com/images/I/61vz1Igler...   \n",
       "4  https://m.media-amazon.com/images/I/41p4d4VJnN...   \n",
       "\n",
       "                                              images  upc  ...          color  \\\n",
       "0  ['https://m.media-amazon.com/images/I/416WaLx1...  NaN  ...          White   \n",
       "1  ['https://m.media-amazon.com/images/I/31SejUEW...  NaN  ...          Black   \n",
       "2  ['https://m.media-amazon.com/images/I/41RgefVq...  NaN  ...          Green   \n",
       "3  ['https://m.media-amazon.com/images/I/61vz1Igl...  NaN  ...          A5589   \n",
       "4  ['https://m.media-amazon.com/images/I/41p4d4VJ...  NaN  ...  Grey Set of 4   \n",
       "\n",
       "       material              style important_information  \\\n",
       "0         Metal             Modern                    []   \n",
       "1        Sponge  Black Rubber Wood                    []   \n",
       "2  Polyethylene             Modern                    []   \n",
       "3        Rubber             Modern                    []   \n",
       "4          Iron    X Classic Style                    []   \n",
       "\n",
       "                                    product_overview  \\\n",
       "0  [{'Brand': ' GOYMFK '}, {'Color': ' White '}, ...   \n",
       "1                                                NaN   \n",
       "2  [{'Brand': ' MUYETOL '}, {'Size': ' 26.8*26.8 ...   \n",
       "3  [{'Brand': ' VEWETOL '}, {'Size': ' 16*24INCH ...   \n",
       "4                                                NaN   \n",
       "\n",
       "                                          about_item  \\\n",
       "0  ['Multiple layers: Provides ample storage spac...   \n",
       "1  ['【Easy Assembly】: Set of 2 dining room chairs...   \n",
       "2  ['PLANT REPOTTING MAT SIZE: 26.8\" x 26.8\", squ...   \n",
       "3  ['Specifications: 16x24 Inch ', \" High-Quality...   \n",
       "4  ['Includes 4 Folding Tv Tray Tables And one Co...   \n",
       "\n",
       "                                         description  \\\n",
       "0  multiple shoes, coats, hats, and other items E...   \n",
       "1                     subrtex Dining chairs Set of 2   \n",
       "2                                                NaN   \n",
       "3  The decorative doormat features a subtle textu...   \n",
       "4  Set of Four Folding Trays With Matching Storag...   \n",
       "\n",
       "                                      specifications  \\\n",
       "0  ['Brand: GOYMFK', 'Color: White', 'Material: M...   \n",
       "1  ['Brand: subrtex', 'Color: Black', 'Product Di...   \n",
       "2  ['Brand: MUYETOL', 'Size: 26.8*26.8', 'Item We...   \n",
       "3  ['Brand: VEWETOL', 'Size: 16*24INCH', 'Materia...   \n",
       "4  ['Brand: JOIN IRON', 'Shape: Rectangular', 'In...   \n",
       "\n",
       "                                uniq_id           scraped_at  \n",
       "0  02593e81-5c09-5069-8516-b0b29f439ded  2024-02-02 15:15:08  \n",
       "1  5938d217-b8c5-5d3e-b1cf-e28e340f292e  2024-02-02 15:15:09  \n",
       "2  b2ede786-3f51-5a45-9a5b-bcf856958cd8  2024-02-02 15:15:09  \n",
       "3  8fd9377b-cfa6-5f10-835c-6b8eca2816b5  2024-02-02 15:15:10  \n",
       "4  bdc9aa30-9439-50dc-8e89-213ea211d66a  2024-02-02 15:15:11  \n",
       "\n",
       "[5 rows x 25 columns]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataset_path = \"data/amazon_furniture_dataset.csv\"\n",
    "df = pd.read_csv(dataset_path)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f48ba962",
   "metadata": {},
   "source": [
    "### Processing step \n",
    "\n",
    "Again, we will first prepare our requests with the Chat Completions endpoint, and create the batch file afterwards."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "918fd79f",
   "metadata": {},
   "outputs": [],
   "source": [
    "caption_system_prompt = '''\n",
    "Your goal is to generate short, descriptive captions for images of items.\n",
    "You will be provided with an item image and the name of that item and you will output a caption that captures the most important information about the item.\n",
    "If there are multiple items depicted, refer to the name provided to understand which item you should describe.\n",
    "Your generated caption should be short (1 sentence), and include only the most important information about the item.\n",
    "The most important information could be: the type of item, the style (if mentioned), the material or color if especially relevant and/or any distinctive features.\n",
    "Keep it short and to the point.\n",
    "'''\n",
    "\n",
    "def get_caption(img_url, title):\n",
    "    response = client.chat.completions.create(\n",
    "    model=\"gpt-4-turbo\",\n",
    "    temperature=0.2,\n",
    "    max_tokens=300,\n",
    "    messages=[\n",
    "        {\n",
    "            \"role\": \"system\",\n",
    "            \"content\": caption_system_prompt\n",
    "        },\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\n",
    "                    \"type\": \"text\",\n",
    "                    \"text\": title\n",
    "                },\n",
    "                # The content type should be \"image_url\" to use gpt-4-turbo's vision capabilities\n",
    "                {\n",
    "                    \"type\": \"image_url\",\n",
    "                    \"image_url\": {\n",
    "                        \"url\": img_url\n",
    "                    }\n",
    "                },\n",
    "            ],\n",
    "        }\n",
    "    ]\n",
    "    )\n",
    "\n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "1daac93d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"https://m.media-amazon.com/images/I/416WaLx10jL._SS522_.jpg\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CAPTION: White multi-layer metal shoe rack featuring eight double hooks for hanging accessories, ideal for organizing footwear and small items in living spaces.\n",
      "\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<img src=\"https://m.media-amazon.com/images/I/31SejUEWY7L._SS522_.jpg\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CAPTION: A set of two elegant black leather dining chairs with a sleek design and vertical stitching detail on the backrest.\n",
      "\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<img src=\"https://m.media-amazon.com/images/I/41RgefVq70L._SS522_.jpg\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CAPTION: A green, waterproof, square, foldable repotting mat designed for indoor gardening, featuring raised edges and displayed with gardening tools and small potted plants.\n",
      "\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<img src=\"https://m.media-amazon.com/images/I/61vz1IglerL._SS522_.jpg\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CAPTION: A brown, absorbent non-slip doormat featuring the phrase \"It's a good day to play PICKLEBALL\" with a pickleball paddle graphic, ideal for sports enthusiasts.\n",
      "\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<img src=\"https://m.media-amazon.com/images/I/41p4d4VJnNL._SS522_.jpg\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CAPTION: Set of four foldable grey TV trays with a stand, featuring a sleek, space-saving design suitable for small areas.\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Testing on a few images\n",
    "for _, row in df[:5].iterrows():\n",
    "    img_url = row['primary_image']\n",
    "    caption = get_caption(img_url, row['title'])\n",
    "    img = Image(url=img_url)\n",
    "    display(img)\n",
    "    print(f\"CAPTION: {caption}\\n\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1e75078",
   "metadata": {},
   "source": [
    "### Creating the batch job\n",
    "\n",
    "As with the first example, we will create an array of json tasks to generate a `jsonl` file and use it to create the batch job."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "48e59bb1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Creating an array of json tasks\n",
    "\n",
    "tasks = []\n",
    "\n",
    "for index, row in df.iterrows():\n",
    "    \n",
    "    title = row['title']\n",
    "    img_url = row['primary_image']\n",
    "    \n",
    "    task = {\n",
    "        \"custom_id\": f\"task-{index}\",\n",
    "        \"method\": \"POST\",\n",
    "        \"url\": \"/v1/chat/completions\",\n",
    "        \"body\": {\n",
    "            # This is what you would have in your Chat Completions API call\n",
    "            \"model\": \"gpt-4-turbo\",\n",
    "            \"temperature\": 0.2,\n",
    "            \"max_tokens\": 300,\n",
    "            \"messages\": [\n",
    "                {\n",
    "                    \"role\": \"system\",\n",
    "                    \"content\": caption_system_prompt\n",
    "                },\n",
    "                {\n",
    "                    \"role\": \"user\",\n",
    "                    \"content\": [\n",
    "                        {\n",
    "                            \"type\": \"text\",\n",
    "                            \"text\": title\n",
    "                        },\n",
    "                        {\n",
    "                            \"type\": \"image_url\",\n",
    "                            \"image_url\": {\n",
    "                                \"url\": img_url\n",
    "                            }\n",
    "                        },\n",
    "                    ],\n",
    "                }\n",
    "            ]            \n",
    "        }\n",
    "    }\n",
    "    \n",
    "    tasks.append(task)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "e75193f2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Creating the file\n",
    "\n",
    "file_name = \"data/batch_tasks_furniture.jsonl\"\n",
    "\n",
    "with open(file_name, 'w') as file:\n",
    "    for obj in tasks:\n",
    "        file.write(json.dumps(obj) + '\\n')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "f2bc166a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Uploading the file \n",
    "\n",
    "batch_file = client.files.create(\n",
    "  file=open(file_name, \"rb\"),\n",
    "  purpose=\"batch\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "0d7d7ec9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Creating the job\n",
    "\n",
    "batch_job = client.batches.create(\n",
    "  input_file_id=batch_file.id,\n",
    "  endpoint=\"/v1/chat/completions\",\n",
    "  completion_window=\"24h\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "53456a08",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Batch(id='batch_xU74ytOBYUpaUQE3Cwi8SCbA', completion_window='24h', created_at=1714049780, endpoint='/v1/chat/completions', input_file_id='file-6y0JPmkHU42qtaEK8x8ZYzkp', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1714049914, error_file_id=None, errors=None, expired_at=None, expires_at=1714136180, failed_at=None, finalizing_at=1714049896, in_progress_at=1714049821, metadata=None, output_file_id='file-XPfkEFZSaM4Avps7mcD3i8BY', request_counts=BatchRequestCounts(completed=312, failed=0, total=312))\n"
     ]
    }
   ],
   "source": [
    "batch_job = client.batches.retrieve(batch_job.id)\n",
    "print(batch_job)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45ce15d7",
   "metadata": {},
   "source": [
    "### Getting results\n",
    "\n",
    "As with the first example, we can retrieve results once the batch job is done.\n",
    "\n",
    "Reminder: the results are not in the same order as in the input file.\n",
    "Make sure to check the custom_id to match the results against the input requests"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "05db39f3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Retrieving result file\n",
    "\n",
    "result_file_id = batch_job.output_file_id\n",
    "result = client.files.content(result_file_id).content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "a15fbb54",
   "metadata": {},
   "outputs": [],
   "source": [
    "result_file_name = \"data/batch_job_results_furniture.jsonl\"\n",
    "\n",
    "with open(result_file_name, 'wb') as file:\n",
    "    file.write(result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "beabfdcd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Loading data from saved file\n",
    "\n",
    "results = []\n",
    "with open(result_file_name, 'r') as file:\n",
    "    for line in file:\n",
    "        # Parsing the JSON string into a dict and appending to the list of results\n",
    "        json_object = json.loads(line.strip())\n",
    "        results.append(json_object)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "ad54ffee",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"https://m.media-amazon.com/images/I/31FOa-k+EtL._SS522_.jpg\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CAPTION: Brushed brass pedestal towel rack with a sleek, modern design, featuring multiple bars for hanging towels, measuring 25.75 x 14.44 x 32 inches.\n",
      "\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<img src=\"https://m.media-amazon.com/images/I/41z8YktAkGL._SS522_.jpg\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CAPTION: Black round end table featuring a tempered glass top and a metal frame, with a lower shelf for additional storage.\n",
      "\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<img src=\"https://m.media-amazon.com/images/I/511N0PuE9EL._SS522_.jpg\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CAPTION: Black collapsible and height-adjustable telescoping stool, portable and designed for makeup artists and hairstylists, shown in various stages of folding for easy transport.\n",
      "\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<img src=\"https://m.media-amazon.com/images/I/31dCSKQ14YL._SS522_.jpg\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CAPTION: Ergonomic pink gaming chair featuring breathable fabric, adjustable height, lumbar support, a footrest, and a swivel recliner function.\n",
      "\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<img src=\"https://m.media-amazon.com/images/I/51OPfpn9ovL._SS522_.jpg\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CAPTION: A set of two Glitzhome adjustable bar stools featuring a mid-century modern design with swivel seats, PU leather upholstery, and wooden backrests.\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Reading only the first results\n",
    "for res in results[:5]:\n",
    "    task_id = res['custom_id']\n",
    "    # Getting index from task id\n",
    "    index = task_id.split('-')[-1]\n",
    "    result = res['response']['body']['choices'][0]['message']['content']\n",
    "    item = df.iloc[int(index)]\n",
    "    img_url = item['primary_image']\n",
    "    img = Image(url=img_url)\n",
    "    display(img)\n",
    "    print(f\"CAPTION: {result}\\n\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6603e8f",
   "metadata": {},
   "source": [
    "## Wrapping up\n",
    "\n",
    "In this cookbook, we have seen two examples of how to use the new Batch API, but keep in mind that the Batch API works the same way as the Chat Completions endpoint, supporting the same parameters and most of the recent models (gpt-3.5-turbo, gpt-4, gpt-4-vision-preview, gpt-4-turbo...).\n",
    "\n",
    "By using this API, you can significantly reduce costs, so we recommend switching every workload that can happen async to a batch job with this new API."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}