From c0d415427a2e96da67fb40c313236e9e8454d6cc Mon Sep 17 00:00:00 2001 From: Katia Gil Guzman Date: Thu, 25 Apr 2024 14:12:53 +0100 Subject: [PATCH] updated notebook --- examples/batch_processing.ipynb | 1349 ++++++++++++++++++++----------- 1 file changed, 895 insertions(+), 454 deletions(-) diff --git a/examples/batch_processing.ipynb b/examples/batch_processing.ipynb index ca384cb8..23975ed9 100644 --- a/examples/batch_processing.ipynb +++ b/examples/batch_processing.ipynb @@ -9,13 +9,22 @@ "\n", "The new Batch API allows to **create async batch jobs for a lower price and with higher rate limits**.\n", "\n", - "Jobs will be completed within 24h, but can be completed faster depending on global usage. \n", + "Jobs will be completed within 24h, but may be processed sooner depending on global usage. \n", "\n", - "This notebook covers how to use the Batch API with a practical example.\n", + "Ideal use cases for the Batch API include:\n", "\n", - "As an example, we will caption images using the Amazon furniture dataset, using the `gpt-4-vision-preview` model. \n", + "- Tagging, captioning, or enriching content on a marketplace or blog\n", + "- Categorizing and suggesting answers for support tickets\n", + "- Performing sentiment analysis on large datasets of customer feedback\n", + "- Generating summaries or translations for collections of documents or articles\n", "\n", - "Please note that multiple models are available through the Batch API, and that you can use several parameters in your batch API calls, as you would with the Chat Completions endpoint." 
+ "and much more!\n", + "\n", + "This cookbook will walk you through how to use the Batch API with a couple of practical examples.\n", + "\n", + "We will start with an example to categorize movies using `gpt-3.5-turbo`, and then cover how we can use the vision capabilities of `gpt-4-turbo` to caption images.\n", + "\n", + "Please note that multiple models are available through the Batch API, and that you can use the same parameters in your Batch API calls as with the Chat Completions endpoint." ] }, { @@ -26,71 +35,760 @@ "## Setup" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "02e22580", + "metadata": {}, + "outputs": [], + "source": [ + "# Make sure you have the latest version of the SDK available to use the Batch API\n", + "%pip install openai --upgrade" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "726bacba", + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "from openai import OpenAI\n", + "import pandas as pd\n", + "from IPython.display import Image, display" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "4ac0c9a7", + "metadata": {}, + "outputs": [], + "source": [ + "# Initializing OpenAI client - see https://platform.openai.com/docs/quickstart?context=python\n", + "client = OpenAI()" + ] + }, + { + "cell_type": "markdown", + "id": "5fec950f", + "metadata": {}, + "source": [ + "## First example: Categorizing movies\n", + "\n", + "In this example, we will use `gpt-3.5-turbo` to extract movie categories from a description of the movie. We will also extract a 1-sentence summary from this description. \n", + "\n", + "We will use [JSON mode](https://platform.openai.com/docs/guides/text-generation/json-mode) to extract categories as an array of strings and the 1-sentence summary in a structured format. 
\n", + "\n", + "For each movie, we want to get a result that looks like this:\n", + "\n", + "```\n", + "{\n", + " categories: ['category1', 'category2', 'category3'],\n", + " summary: '1-sentence summary'\n", + "}\n", + "```" + ] + }, { "cell_type": "markdown", - "id": "0646f81c", + "id": "9fc6dcd6", "metadata": {}, "source": [ - "### Imports" + "### Loading data\n", + "\n", + "We will use the IMDB top 1000 movies dataset for this example. " ] }, { "cell_type": "code", - "execution_count": null, - "id": "02e22580", + "execution_count": 3, + "id": "1b721a0d", + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
|  | Poster_Link | Series_Title | Released_Year | Certificate | Runtime | Genre | IMDB_Rating | Overview | Meta_score | Director | Star1 | Star2 | Star3 | Star4 | No_of_Votes | Gross |\n", + "
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", + "
| 0 | https://m.media-amazon.com/images/M/MV5BMDFkYT... | The Shawshank Redemption | 1994 | A | 142 min | Drama | 9.3 | Two imprisoned men bond over a number of years... | 80.0 | Frank Darabont | Tim Robbins | Morgan Freeman | Bob Gunton | William Sadler | 2343110 | 28,341,469 |\n", + "
| 1 | https://m.media-amazon.com/images/M/MV5BM2MyNj... | The Godfather | 1972 | A | 175 min | Crime, Drama | 9.2 | An organized crime dynasty's aging patriarch t... | 100.0 | Francis Ford Coppola | Marlon Brando | Al Pacino | James Caan | Diane Keaton | 1620367 | 134,966,411 |\n", + "
| 2 | https://m.media-amazon.com/images/M/MV5BMTMxNT... | The Dark Knight | 2008 | UA | 152 min | Action, Crime, Drama | 9.0 | When the menace known as the Joker wreaks havo... | 84.0 | Christopher Nolan | Christian Bale | Heath Ledger | Aaron Eckhart | Michael Caine | 2303232 | 534,858,444 |\n", + "
| 3 | https://m.media-amazon.com/images/M/MV5BMWMwMG... | The Godfather: Part II | 1974 | A | 202 min | Crime, Drama | 9.0 | The early life and career of Vito Corleone in ... | 90.0 | Francis Ford Coppola | Al Pacino | Robert De Niro | Robert Duvall | Diane Keaton | 1129952 | 57,300,000 |\n", + "
| 4 | https://m.media-amazon.com/images/M/MV5BMWU4N2... | 12 Angry Men | 1957 | U | 96 min | Crime, Drama | 9.0 | A jury holdout attempts to prevent a miscarria... | 96.0 | Sidney Lumet | Henry Fonda | Lee J. Cobb | Martin Balsam | John Fiedler | 689845 | 4,360,000 |
\n", + "
" + ], + "text/plain": [ + " Poster_Link \\\n", + "0 https://m.media-amazon.com/images/M/MV5BMDFkYT... \n", + "1 https://m.media-amazon.com/images/M/MV5BM2MyNj... \n", + "2 https://m.media-amazon.com/images/M/MV5BMTMxNT... \n", + "3 https://m.media-amazon.com/images/M/MV5BMWMwMG... \n", + "4 https://m.media-amazon.com/images/M/MV5BMWU4N2... \n", + "\n", + " Series_Title Released_Year Certificate Runtime \\\n", + "0 The Shawshank Redemption 1994 A 142 min \n", + "1 The Godfather 1972 A 175 min \n", + "2 The Dark Knight 2008 UA 152 min \n", + "3 The Godfather: Part II 1974 A 202 min \n", + "4 12 Angry Men 1957 U 96 min \n", + "\n", + " Genre IMDB_Rating \\\n", + "0 Drama 9.3 \n", + "1 Crime, Drama 9.2 \n", + "2 Action, Crime, Drama 9.0 \n", + "3 Crime, Drama 9.0 \n", + "4 Crime, Drama 9.0 \n", + "\n", + " Overview Meta_score \\\n", + "0 Two imprisoned men bond over a number of years... 80.0 \n", + "1 An organized crime dynasty's aging patriarch t... 100.0 \n", + "2 When the menace known as the Joker wreaks havo... 84.0 \n", + "3 The early life and career of Vito Corleone in ... 90.0 \n", + "4 A jury holdout attempts to prevent a miscarria... 96.0 \n", + "\n", + " Director Star1 Star2 Star3 \\\n", + "0 Frank Darabont Tim Robbins Morgan Freeman Bob Gunton \n", + "1 Francis Ford Coppola Marlon Brando Al Pacino James Caan \n", + "2 Christopher Nolan Christian Bale Heath Ledger Aaron Eckhart \n", + "3 Francis Ford Coppola Al Pacino Robert De Niro Robert Duvall \n", + "4 Sidney Lumet Henry Fonda Lee J. 
Cobb Martin Balsam \n", + "\n", + " Star4 No_of_Votes Gross \n", + "0 William Sadler 2343110 28,341,469 \n", + "1 Diane Keaton 1620367 134,966,411 \n", + "2 Michael Caine 2303232 534,858,444 \n", + "3 Diane Keaton 1129952 57,300,000 \n", + "4 John Fiedler 689845 4,360,000 " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dataset_path = \"data/imdb_top_1000.csv\"\n", + "\n", + "df = pd.read_csv(dataset_path)\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "id": "01396a47", + "metadata": {}, + "source": [ + "### Processing step \n", + "\n", + "Here, we will prepare our tasks by first trying them out with the Chat Completions endpoint.\n", + "\n", + "Once we're happy with the results, we can move on to creating the batch job file." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "806e768c", + "metadata": {}, + "outputs": [], + "source": [ + "categorize_system_prompt = '''\n", + "Your goal is to extract movie categories from movie descriptions, as well as a 1-sentence summary for these movies.\n", + "You will be provided with a movie description, and you will output a json object containing the following information:\n", + "\n", + "{\n", + " categories: string[] // Array of categories based on the movie description,\n", + " summary: string // 1-sentence summary of the movie based on the movie description\n", + "}\n", + "\n", + "Categories refer to the genre or type of the movie, like \"action\", \"romance\", \"comedy\", etc. Keep category names simple and use only lower case letters.\n", + "Movies can have several categories, but try to keep it under 3-4. 
Only mention the categories that are the most obvious based on the description.\n", + "'''\n", + "\n", + "def get_categories(description):\n", + " response = client.chat.completions.create(\n", + " model=\"gpt-3.5-turbo\",\n", + " temperature=0.1,\n", + " # This is to enable JSON mode, making sure responses are valid json objects\n", + " response_format={ \n", + " \"type\": \"json_object\"\n", + " },\n", + " messages=[\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": categorize_system_prompt\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": description\n", + " }\n", + " ],\n", + " )\n", + "\n", + " return response.choices[0].message.content" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "4d079c56", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "TITLE: The Shawshank Redemption\n", + "OVERVIEW: Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.\n", + "\n", + "RESULT: {\n", + " \"categories\": [\"drama\"],\n", + " \"summary\": \"Two imprisoned men bond over the years and find redemption through acts of common decency.\"\n", + "}\n", + "\n", + "\n", + "----------------------------\n", + "\n", + "\n", + "TITLE: The Godfather\n", + "OVERVIEW: An organized crime dynasty's aging patriarch transfers control of his clandestine empire to his reluctant son.\n", + "\n", + "RESULT: {\n", + " \"categories\": [\"crime\", \"drama\"],\n", + " \"summary\": \"A crime drama about an aging patriarch passing on his empire to his son.\"\n", + "}\n", + "\n", + "\n", + "----------------------------\n", + "\n", + "\n", + "TITLE: The Dark Knight\n", + "OVERVIEW: When the menace known as the Joker wreaks havoc and chaos on the people of Gotham, Batman must accept one of the greatest psychological and physical tests of his ability to fight injustice.\n", + "\n", + "RESULT: {\n", + " \"categories\": [\"action\", 
\"thriller\"],\n", + " \"summary\": \"A thrilling action movie where Batman faces the chaotic Joker in a battle of justice.\"\n", + "}\n", + "\n", + "\n", + "----------------------------\n", + "\n", + "\n", + "TITLE: The Godfather: Part II\n", + "OVERVIEW: The early life and career of Vito Corleone in 1920s New York City is portrayed, while his son, Michael, expands and tightens his grip on the family crime syndicate.\n", + "\n", + "RESULT: {\n", + " \"categories\": [\"crime\", \"drama\"],\n", + " \"summary\": \"A portrayal of Vito Corleone's early life and career in 1920s New York City, as his son Michael expands the family crime syndicate.\"\n", + "}\n", + "\n", + "\n", + "----------------------------\n", + "\n", + "\n", + "TITLE: 12 Angry Men\n", + "OVERVIEW: A jury holdout attempts to prevent a miscarriage of justice by forcing his colleagues to reconsider the evidence.\n", + "\n", + "RESULT: {\n", + " \"categories\": [\"drama\"],\n", + " \"summary\": \"A gripping drama about a jury holdout trying to prevent a miscarriage of justice by challenging his colleagues to reconsider the evidence.\"\n", + "}\n", + "\n", + "\n", + "----------------------------\n", + "\n", + "\n" + ] + } + ], + "source": [ + "# Testing on a few examples\n", + "for _, row in df[:5].iterrows():\n", + " description = row['Overview']\n", + " title = row['Series_Title']\n", + " result = get_categories(description)\n", + " print(f\"TITLE: {title}\\nOVERVIEW: {description}\\n\\nRESULT: {result}\")\n", + " print(\"\\n\\n----------------------------\\n\\n\")" + ] + }, + { + "cell_type": "markdown", + "id": "a89a6709", + "metadata": {}, + "source": [ + "### Creating the batch file\n", + "\n", + "The batch file, in the `jsonl` format, should contain one line (json object) per task.\n", + "Each task is defined as such:\n", + "\n", + "```\n", + "{\n", + " \"custom_id\": ,\n", + " \"method\": \"POST\",\n", + " \"url\": \"/v1/chat/completions\",\n", + " \"body\": {\n", + " \"model\": ,\n", + " 
\"messages\": ,\n", + " // other parameters\n", + " }\n", + "}\n", + "```\n", + "\n", + "Note: the task ID should be unique per batch job. This is what you can use to match results to the initial input files, as tasks will not be returned in the same order." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "81e37981", + "metadata": {}, + "outputs": [], + "source": [ + "# Creating an array of json tasks\n", + "\n", + "tasks = []\n", + "\n", + "for index, row in df.iterrows():\n", + " \n", + " description = row['Overview']\n", + " \n", + " task = {\n", + " \"custom_id\": f\"task-{index}\",\n", + " \"method\": \"POST\",\n", + " \"url\": \"/v1/chat/completions\",\n", + " \"body\": {\n", + " # This is what you would have in your Chat Completions API call\n", + " \"model\": \"gpt-3.5-turbo\",\n", + " \"temperature\": 0.1,\n", + " \"response_format\": { \n", + " \"type\": \"json_object\"\n", + " },\n", + " \"messages\": [\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": categorize_system_prompt\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": description\n", + " }\n", + " ],\n", + " }\n", + " }\n", + " \n", + " tasks.append(task)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "257e2eda", + "metadata": {}, + "outputs": [], + "source": [ + "# Creating the file\n", + "\n", + "file_name = \"data/batch_tasks_movies.jsonl\"\n", + "\n", + "with open(file_name, 'w') as file:\n", + " for obj in tasks:\n", + " file.write(json.dumps(obj) + '\\n')" + ] + }, + { + "cell_type": "markdown", + "id": "c6b490cd", + "metadata": {}, + "source": [ + "### Uploading the file" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "40ea90dd", + "metadata": {}, + "outputs": [], + "source": [ + "batch_file = client.files.create(\n", + " file=open(file_name, \"rb\"),\n", + " purpose=\"batch\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "081f602f", + "metadata": {}, + "outputs": [ 
+ { + "name": "stdout", + "output_type": "stream", + "text": [ + "FileObject(id='file-nG1JDPSMRMinN8FOdaL30kVD', bytes=1127310, created_at=1714045723, filename='batch_tasks_movies.jsonl', object='file', purpose='batch', status='processed', status_details=None)\n" + ] + } + ], + "source": [ + "print(batch_file)" + ] + }, + { + "cell_type": "markdown", + "id": "f8ef8ab5", + "metadata": {}, + "source": [ + "### Creating the batch job" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "4db403a3", + "metadata": {}, + "outputs": [], + "source": [ + "batch_job = client.batches.create(\n", + " input_file_id=batch_file.id,\n", + " endpoint=\"/v1/chat/completions\",\n", + " completion_window=\"24h\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "f7ca66c6", + "metadata": {}, + "source": [ + "### Checking job status\n", + "\n", + "Note: this can take up to 24h, but it will usually be completed faster.\n", + "\n", + "You can continue checking until the status is 'completed'." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "6105d809", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Batch(id='batch_EXl9tn7dTiJxw8YQJf69e2PM', completion_window='24h', created_at=1714045729, endpoint='/v1/chat/completions', input_file_id='file-nG1JDPSMRMinN8FOdaL30kVD', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1714048437, error_file_id=None, errors=None, expired_at=None, expires_at=1714132129, failed_at=None, finalizing_at=1714048381, in_progress_at=1714045863, metadata=None, output_file_id='file-hHjrZXf0Vo8n3tV9vhVgbHGY', request_counts=BatchRequestCounts(completed=1000, failed=0, total=1000))\n" + ] + } + ], + "source": [ + "batch_job = client.batches.retrieve(batch_job.id)\n", + "print(batch_job)" + ] + }, + { + "cell_type": "markdown", + "id": "6988fb64", + "metadata": {}, + "source": [ + "### Retrieving results" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "682c38d5", + "metadata": {}, + "outputs": [], + "source": [ + "result_file_id = batch_job.output_file_id\n", + "result = client.files.content(result_file_id).content" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "52840df9", + "metadata": {}, + "outputs": [], + "source": [ + "result_file_name = \"data/batch_job_results_movies.jsonl\"\n", + "\n", + "with open(result_file_name, 'wb') as file:\n", + " file.write(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "f11b7d19", "metadata": {}, "outputs": [], "source": [ - "# Make sure you have the latest version of the SDK available to use the Batch API\n", - "%pip install openai --upgrade" + "# Loading data from saved file\n", + "results = []\n", + "with open(result_file_name, 'r') as file:\n", + " for line in file:\n", + " # Parsing the JSON string into a dict and appending to the list of results\n", + " json_object = json.loads(line.strip())\n", + " 
results.append(json_object)" ] }, { - "cell_type": "code", - "execution_count": 1, - "id": "726bacba", + "cell_type": "markdown", + "id": "a2bafff8", "metadata": {}, - "outputs": [], "source": [ - "import json\n", - "from openai import OpenAI\n", - "import pandas as pd\n", - "from IPython.display import Image, display" + "### Reading results\n", + "Reminder: the results are not in the same order as in the input file.\n", + "Make sure to check the custom_id to match the results against the input tasks" ] }, { "cell_type": "code", - "execution_count": 2, - "id": "4ac0c9a7", + "execution_count": 16, + "id": "004c12d3", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "TITLE: American Psycho\n", + "OVERVIEW: A wealthy New York City investment banking executive, Patrick Bateman, hides his alternate psychopathic ego from his co-workers and friends as he delves deeper into his violent, hedonistic fantasies.\n", + "\n", + "RESULT: {\n", + " \"categories\": [\"thriller\", \"psychological\", \"drama\"],\n", + " \"summary\": \"A wealthy investment banker in New York City conceals his psychopathic alter ego while indulging in violent and hedonistic fantasies.\"\n", + "}\n", + "\n", + "\n", + "----------------------------\n", + "\n", + "\n", + "TITLE: Lethal Weapon\n", + "OVERVIEW: Two newly paired cops who are complete opposites must put aside their differences in order to catch a gang of drug smugglers.\n", + "\n", + "RESULT: {\n", + " \"categories\": [\"action\", \"comedy\", \"crime\"],\n", + " \"summary\": \"An action-packed comedy about two mismatched cops teaming up to take down a drug smuggling gang.\"\n", + "}\n", + "\n", + "\n", + "----------------------------\n", + "\n", + "\n", + "TITLE: A Star Is Born\n", + "OVERVIEW: A musician helps a young singer find fame as age and alcoholism send his own career into a downward spiral.\n", + "\n", + "RESULT: {\n", + " \"categories\": [\"drama\", \"music\"],\n", + " 
\"summary\": \"A musician's career spirals downward as he helps a young singer find fame amidst struggles with age and alcoholism.\"\n", + "}\n", + "\n", + "\n", + "----------------------------\n", + "\n", + "\n", + "TITLE: From Here to Eternity\n", + "OVERVIEW: In Hawaii in 1941, a private is cruelly punished for not boxing on his unit's team, while his captain's wife and second-in-command are falling in love.\n", + "\n", + "RESULT: {\n", + " \"categories\": [\"drama\", \"romance\", \"war\"],\n", + " \"summary\": \"A drama set in Hawaii in 1941, where a private faces punishment for not boxing on his unit's team, amidst a forbidden love affair between his captain's wife and second-in-command.\"\n", + "}\n", + "\n", + "\n", + "----------------------------\n", + "\n", + "\n", + "TITLE: The Jungle Book\n", + "OVERVIEW: Bagheera the Panther and Baloo the Bear have a difficult time trying to convince a boy to leave the jungle for human civilization.\n", + "\n", + "RESULT: {\n", + " \"categories\": [\"adventure\", \"animation\", \"family\"],\n", + " \"summary\": \"An adventure-filled animated movie about a panther and a bear trying to persuade a boy to leave the jungle for human civilization.\"\n", + "}\n", + "\n", + "\n", + "----------------------------\n", + "\n", + "\n" + ] + } + ], "source": [ - "# Initializing OpenAI client - see https://platform.openai.com/docs/quickstart?context=python\n", - "client = OpenAI()" + "# Reading only the first results\n", + "for res in results[:5]:\n", + " task_id = res['custom_id']\n", + " # Getting index from task id\n", + " index = task_id.split('-')[-1]\n", + " result = res['response']['body']['choices'][0]['message']['content']\n", + " movie = df.iloc[int(index)]\n", + " description = movie['Overview']\n", + " title = movie['Series_Title']\n", + " print(f\"TITLE: {title}\\nOVERVIEW: {description}\\n\\nRESULT: {result}\")\n", + " print(\"\\n\\n----------------------------\\n\\n\")" ] }, { "cell_type": "markdown", - "id": 
"5d20e638", + "id": "da4238f5", "metadata": {}, "source": [ - "### Loading data" + "## Second example: Captioning images\n", + "\n", + "In this example, we will use `gpt-4-turbo` to caption images of furniture items. \n", + "\n", + "We will use the vision capabilities of the model to analyze the images and generate the captions." ] }, { - "cell_type": "code", - "execution_count": 3, - "id": "079d21e7", + "cell_type": "markdown", + "id": "5d20e638", "metadata": {}, - "outputs": [], "source": [ - "dataset_path = \"data/amazon_furniture_dataset.csv\"" + "### Loading data\n", + "\n", + "We will use the Amazon furniture dataset for this example." ] }, { "cell_type": "code", - "execution_count": 4, - "id": "9c3912b8", + "execution_count": 17, + "id": "079d21e7", "metadata": {}, "outputs": [ { @@ -258,323 +956,108 @@ " bdc9aa30-9439-50dc-8e89-213ea211d66a\n", " 2024-02-02 15:15:11\n", " \n", - " \n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " \n", - " \n", - " 307\n", - " B08SLPBC36\n", - " https://www.amazon.com/dp/B08SLPBC36\n", - " Lexicon Victoria Saddle Wood Bar Stools (Set o...\n", - " Lexicon\n", - " $58.99\n", - " Only 7 left in stock (more on the way).\n", - " ['Home & Kitchen', 'Furniture', 'Game & Recrea...\n", - " https://m.media-amazon.com/images/I/41CPL03Y-W...\n", - " ['https://m.media-amazon.com/images/I/41CPL03Y...\n", - " NaN\n", - " ...\n", - " Black Sand\n", - " Wood\n", - " Contemporary\n", - " []\n", - " NaN\n", - " ['Frame Material: Wood ', ' Set includes two (...\n", - " With a country flair and a deep black sand fin...\n", - " ['Product Dimensions: 18\"D x 15.5\"W x 29\"H', '...\n", - " d3b681ac-6195-5c9b-8125-98b6425829f4\n", - " 2024-02-02 18:56:50\n", - " \n", - " \n", - " 308\n", - " B09KN5ZTXC\n", - " 
https://www.amazon.com/dp/B09KN5ZTXC\n", - " ANZORG Behind Door Hanging Kids Shoes Organize...\n", - " ANZORG Store\n", - " $9.99\n", - " Only 14 left in stock - order soon.\n", - " ['Home & Kitchen', 'Storage & Organization', '...\n", - " https://m.media-amazon.com/images/I/31qQ2tZPv-...\n", - " ['https://m.media-amazon.com/images/I/31qQ2tZP...\n", - " NaN\n", - " ...\n", - " 12 Pockets\n", - " Non Woven Fabric\n", - " NaN\n", - " []\n", - " NaN\n", - " ['Non Woven Fabric ', \" Hanging organizer with...\n", - " NaN\n", - " ['Specific Uses For Product: 鞋子', 'Material: N...\n", - " 07e5e60e-953d-5512-aab8-cf83193de252\n", - " 2024-02-02 18:56:51\n", - " \n", - " \n", - " 309\n", - " B0BN7T57NK\n", - " https://www.amazon.com/dp/B0BN7T57NK\n", - " Pipishell Full-Motion TV Wall Mount for Most 3...\n", - " Pipishell Store\n", - " $35.99\n", - " In Stock\n", - " ['Electronics', 'Television & Video', 'Accesso...\n", - " https://m.media-amazon.com/images/I/41TkLI3K2-...\n", - " ['https://m.media-amazon.com/images/I/41TkLI3K...\n", - " NaN\n", - " ...\n", - " Black\n", - " NaN\n", - " NaN\n", - " []\n", - " [{'Mounting Type': \" Wall Mount for 16'' Wood ...\n", - " ['Solid & Stable Support: This swivel wall mou...\n", - " NaN\n", - " ['Brand Name: Pipishell', 'Item Weight: 10.83 ...\n", - " cb66eee5-7113-5568-9713-9e08e2b48a26\n", - " 2024-02-02 18:56:52\n", - " \n", - " \n", - " 310\n", - " B097FC9C27\n", - " https://www.amazon.com/dp/B097FC9C27\n", - " Noori Rug Home - Lux Collection Modern Ava Rou...\n", - " NOORI RUG\n", - " $67.60\n", - " In Stock\n", - " ['Home & Kitchen', 'Furniture', 'Living Room F...\n", - " https://m.media-amazon.com/images/I/21Uq9uJEE5...\n", - " ['https://m.media-amazon.com/images/I/21Uq9uJE...\n", - " NaN\n", - " ...\n", - " Ivory/Gold Ava\n", - " Engineered Wood\n", - " Glam\n", - " []\n", - " NaN\n", - " ['Velvet ', ' Both functional and decorative, ...\n", - " Both functional and decorative, this storage s...\n", - " ['Product Dimensions: 
13\"D x 13\"W x 15\"H', 'Co...\n", - " 0a3805a8-8249-55e1-a9de-ccf6c602f167\n", - " 2024-02-02 18:56:53\n", - " \n", - " \n", - " 311\n", - " B00SMM4H98\n", - " https://www.amazon.com/dp/B00SMM4H98\n", - " Modway Parcel Upholstered Fabric Parsons Dinin...\n", - " Modway Store\n", - " NaN\n", - " NaN\n", - " ['Home & Kitchen', 'Furniture', 'Dining Room F...\n", - " https://m.media-amazon.com/images/I/41f8WNXejU...\n", - " ['https://m.media-amazon.com/images/I/41f8WNXe...\n", - " NaN\n", - " ...\n", - " Beige\n", - " Foam\n", - " Modern\n", - " []\n", - " NaN\n", - " ['CHIC DETAIL - Introduce sophistication with ...\n", - " Wrap up moments of quality seating with the Pa...\n", - " ['Brand: Modway', 'Color: Beige', 'Product Dim...\n", - " ca0aa529-6ef0-56d0-81a7-042becdefd4d\n", - " 2024-02-02 18:56:53\n", - " \n", " \n", "\n", - "

312 rows × 25 columns\n", + "5 rows × 25 columns

\n", "" ], "text/plain": [ - " asin url \\\n", - "0 B0CJHKVG6P https://www.amazon.com/dp/B0CJHKVG6P \n", - "1 B0B66QHB23 https://www.amazon.com/dp/B0B66QHB23 \n", - "2 B0BXRTWLYK https://www.amazon.com/dp/B0BXRTWLYK \n", - "3 B0C1MRB2M8 https://www.amazon.com/dp/B0C1MRB2M8 \n", - "4 B0CG1N9QRC https://www.amazon.com/dp/B0CG1N9QRC \n", - ".. ... ... \n", - "307 B08SLPBC36 https://www.amazon.com/dp/B08SLPBC36 \n", - "308 B09KN5ZTXC https://www.amazon.com/dp/B09KN5ZTXC \n", - "309 B0BN7T57NK https://www.amazon.com/dp/B0BN7T57NK \n", - "310 B097FC9C27 https://www.amazon.com/dp/B097FC9C27 \n", - "311 B00SMM4H98 https://www.amazon.com/dp/B00SMM4H98 \n", + " asin url \\\n", + "0 B0CJHKVG6P https://www.amazon.com/dp/B0CJHKVG6P \n", + "1 B0B66QHB23 https://www.amazon.com/dp/B0B66QHB23 \n", + "2 B0BXRTWLYK https://www.amazon.com/dp/B0BXRTWLYK \n", + "3 B0C1MRB2M8 https://www.amazon.com/dp/B0C1MRB2M8 \n", + "4 B0CG1N9QRC https://www.amazon.com/dp/B0CG1N9QRC \n", "\n", - " title brand \\\n", - "0 GOYMFK 1pc Free Standing Shoe Rack, Multi-laye... GOYMFK \n", - "1 subrtex Leather ding Room, Dining Chairs Set o... subrtex \n", - "2 Plant Repotting Mat MUYETOL Waterproof Transpl... MUYETOL \n", - "3 Pickleball Doormat, Welcome Doormat Absorbent ... VEWETOL \n", - "4 JOIN IRON Foldable TV Trays for Eating Set of ... JOIN IRON Store \n", - ".. ... ... \n", - "307 Lexicon Victoria Saddle Wood Bar Stools (Set o... Lexicon \n", - "308 ANZORG Behind Door Hanging Kids Shoes Organize... ANZORG Store \n", - "309 Pipishell Full-Motion TV Wall Mount for Most 3... Pipishell Store \n", - "310 Noori Rug Home - Lux Collection Modern Ava Rou... NOORI RUG \n", - "311 Modway Parcel Upholstered Fabric Parsons Dinin... Modway Store \n", + " title brand price \\\n", + "0 GOYMFK 1pc Free Standing Shoe Rack, Multi-laye... GOYMFK $24.99 \n", + "1 subrtex Leather ding Room, Dining Chairs Set o... subrtex NaN \n", + "2 Plant Repotting Mat MUYETOL Waterproof Transpl... 
MUYETOL $5.98 \n", + "3 Pickleball Doormat, Welcome Doormat Absorbent ... VEWETOL $13.99 \n", + "4 JOIN IRON Foldable TV Trays for Eating Set of ... JOIN IRON Store $89.99 \n", "\n", - " price availability \\\n", - "0 $24.99 Only 13 left in stock - order soon. \n", - "1 NaN NaN \n", - "2 $5.98 In Stock \n", - "3 $13.99 Only 10 left in stock - order soon. \n", - "4 $89.99 Usually ships within 5 to 6 weeks \n", - ".. ... ... \n", - "307 $58.99 Only 7 left in stock (more on the way). \n", - "308 $9.99 Only 14 left in stock - order soon. \n", - "309 $35.99 In Stock \n", - "310 $67.60 In Stock \n", - "311 NaN NaN \n", + " availability \\\n", + "0 Only 13 left in stock - order soon. \n", + "1 NaN \n", + "2 In Stock \n", + "3 Only 10 left in stock - order soon. \n", + "4 Usually ships within 5 to 6 weeks \n", "\n", - " categories \\\n", - "0 ['Home & Kitchen', 'Storage & Organization', '... \n", - "1 ['Home & Kitchen', 'Furniture', 'Dining Room F... \n", - "2 ['Patio, Lawn & Garden', 'Outdoor Décor', 'Doo... \n", - "3 ['Patio, Lawn & Garden', 'Outdoor Décor', 'Doo... \n", - "4 ['Home & Kitchen', 'Furniture', 'Game & Recrea... \n", - ".. ... \n", - "307 ['Home & Kitchen', 'Furniture', 'Game & Recrea... \n", - "308 ['Home & Kitchen', 'Storage & Organization', '... \n", - "309 ['Electronics', 'Television & Video', 'Accesso... \n", - "310 ['Home & Kitchen', 'Furniture', 'Living Room F... \n", - "311 ['Home & Kitchen', 'Furniture', 'Dining Room F... \n", + " categories \\\n", + "0 ['Home & Kitchen', 'Storage & Organization', '... \n", + "1 ['Home & Kitchen', 'Furniture', 'Dining Room F... \n", + "2 ['Patio, Lawn & Garden', 'Outdoor Décor', 'Doo... \n", + "3 ['Patio, Lawn & Garden', 'Outdoor Décor', 'Doo... \n", + "4 ['Home & Kitchen', 'Furniture', 'Game & Recrea... \n", "\n", - " primary_image \\\n", - "0 https://m.media-amazon.com/images/I/416WaLx10j... \n", - "1 https://m.media-amazon.com/images/I/31SejUEWY7... \n", - "2 https://m.media-amazon.com/images/I/41RgefVq70... 
\n", - "3 https://m.media-amazon.com/images/I/61vz1Igler... \n", - "4 https://m.media-amazon.com/images/I/41p4d4VJnN... \n", - ".. ... \n", - "307 https://m.media-amazon.com/images/I/41CPL03Y-W... \n", - "308 https://m.media-amazon.com/images/I/31qQ2tZPv-... \n", - "309 https://m.media-amazon.com/images/I/41TkLI3K2-... \n", - "310 https://m.media-amazon.com/images/I/21Uq9uJEE5... \n", - "311 https://m.media-amazon.com/images/I/41f8WNXejU... \n", + " primary_image \\\n", + "0 https://m.media-amazon.com/images/I/416WaLx10j... \n", + "1 https://m.media-amazon.com/images/I/31SejUEWY7... \n", + "2 https://m.media-amazon.com/images/I/41RgefVq70... \n", + "3 https://m.media-amazon.com/images/I/61vz1Igler... \n", + "4 https://m.media-amazon.com/images/I/41p4d4VJnN... \n", "\n", - " images upc ... \\\n", - "0 ['https://m.media-amazon.com/images/I/416WaLx1... NaN ... \n", - "1 ['https://m.media-amazon.com/images/I/31SejUEW... NaN ... \n", - "2 ['https://m.media-amazon.com/images/I/41RgefVq... NaN ... \n", - "3 ['https://m.media-amazon.com/images/I/61vz1Igl... NaN ... \n", - "4 ['https://m.media-amazon.com/images/I/41p4d4VJ... NaN ... \n", - ".. ... ... ... \n", - "307 ['https://m.media-amazon.com/images/I/41CPL03Y... NaN ... \n", - "308 ['https://m.media-amazon.com/images/I/31qQ2tZP... NaN ... \n", - "309 ['https://m.media-amazon.com/images/I/41TkLI3K... NaN ... \n", - "310 ['https://m.media-amazon.com/images/I/21Uq9uJE... NaN ... \n", - "311 ['https://m.media-amazon.com/images/I/41f8WNXe... NaN ... \n", + " images upc ... color \\\n", + "0 ['https://m.media-amazon.com/images/I/416WaLx1... NaN ... White \n", + "1 ['https://m.media-amazon.com/images/I/31SejUEW... NaN ... Black \n", + "2 ['https://m.media-amazon.com/images/I/41RgefVq... NaN ... Green \n", + "3 ['https://m.media-amazon.com/images/I/61vz1Igl... NaN ... A5589 \n", + "4 ['https://m.media-amazon.com/images/I/41p4d4VJ... NaN ... 
Grey Set of 4 \n", "\n", - " color material style \\\n", - "0 White Metal Modern \n", - "1 Black Sponge Black Rubber Wood \n", - "2 Green Polyethylene Modern \n", - "3 A5589 Rubber Modern \n", - "4 Grey Set of 4 Iron X Classic Style \n", - ".. ... ... ... \n", - "307 Black Sand Wood Contemporary \n", - "308 12 Pockets Non Woven Fabric NaN \n", - "309 Black NaN NaN \n", - "310 Ivory/Gold Ava Engineered Wood Glam \n", - "311 Beige Foam Modern \n", + " material style important_information \\\n", + "0 Metal Modern [] \n", + "1 Sponge Black Rubber Wood [] \n", + "2 Polyethylene Modern [] \n", + "3 Rubber Modern [] \n", + "4 Iron X Classic Style [] \n", "\n", - " important_information product_overview \\\n", - "0 [] [{'Brand': ' GOYMFK '}, {'Color': ' White '}, ... \n", - "1 [] NaN \n", - "2 [] [{'Brand': ' MUYETOL '}, {'Size': ' 26.8*26.8 ... \n", - "3 [] [{'Brand': ' VEWETOL '}, {'Size': ' 16*24INCH ... \n", - "4 [] NaN \n", - ".. ... ... \n", - "307 [] NaN \n", - "308 [] NaN \n", - "309 [] [{'Mounting Type': \" Wall Mount for 16'' Wood ... \n", - "310 [] NaN \n", - "311 [] NaN \n", + " product_overview \\\n", + "0 [{'Brand': ' GOYMFK '}, {'Color': ' White '}, ... \n", + "1 NaN \n", + "2 [{'Brand': ' MUYETOL '}, {'Size': ' 26.8*26.8 ... \n", + "3 [{'Brand': ' VEWETOL '}, {'Size': ' 16*24INCH ... \n", + "4 NaN \n", "\n", - " about_item \\\n", - "0 ['Multiple layers: Provides ample storage spac... \n", - "1 ['【Easy Assembly】: Set of 2 dining room chairs... \n", - "2 ['PLANT REPOTTING MAT SIZE: 26.8\" x 26.8\", squ... \n", - "3 ['Specifications: 16x24 Inch ', \" High-Quality... \n", - "4 ['Includes 4 Folding Tv Tray Tables And one Co... \n", - ".. ... \n", - "307 ['Frame Material: Wood ', ' Set includes two (... \n", - "308 ['Non Woven Fabric ', \" Hanging organizer with... \n", - "309 ['Solid & Stable Support: This swivel wall mou... \n", - "310 ['Velvet ', ' Both functional and decorative, ... \n", - "311 ['CHIC DETAIL - Introduce sophistication with ... 
\n", + " about_item \\\n", + "0 ['Multiple layers: Provides ample storage spac... \n", + "1 ['【Easy Assembly】: Set of 2 dining room chairs... \n", + "2 ['PLANT REPOTTING MAT SIZE: 26.8\" x 26.8\", squ... \n", + "3 ['Specifications: 16x24 Inch ', \" High-Quality... \n", + "4 ['Includes 4 Folding Tv Tray Tables And one Co... \n", "\n", - " description \\\n", - "0 multiple shoes, coats, hats, and other items E... \n", - "1 subrtex Dining chairs Set of 2 \n", - "2 NaN \n", - "3 The decorative doormat features a subtle textu... \n", - "4 Set of Four Folding Trays With Matching Storag... \n", - ".. ... \n", - "307 With a country flair and a deep black sand fin... \n", - "308 NaN \n", - "309 NaN \n", - "310 Both functional and decorative, this storage s... \n", - "311 Wrap up moments of quality seating with the Pa... \n", + " description \\\n", + "0 multiple shoes, coats, hats, and other items E... \n", + "1 subrtex Dining chairs Set of 2 \n", + "2 NaN \n", + "3 The decorative doormat features a subtle textu... \n", + "4 Set of Four Folding Trays With Matching Storag... \n", "\n", - " specifications \\\n", - "0 ['Brand: GOYMFK', 'Color: White', 'Material: M... \n", - "1 ['Brand: subrtex', 'Color: Black', 'Product Di... \n", - "2 ['Brand: MUYETOL', 'Size: 26.8*26.8', 'Item We... \n", - "3 ['Brand: VEWETOL', 'Size: 16*24INCH', 'Materia... \n", - "4 ['Brand: JOIN IRON', 'Shape: Rectangular', 'In... \n", - ".. ... \n", - "307 ['Product Dimensions: 18\"D x 15.5\"W x 29\"H', '... \n", - "308 ['Specific Uses For Product: 鞋子', 'Material: N... \n", - "309 ['Brand Name: Pipishell', 'Item Weight: 10.83 ... \n", - "310 ['Product Dimensions: 13\"D x 13\"W x 15\"H', 'Co... \n", - "311 ['Brand: Modway', 'Color: Beige', 'Product Dim... \n", + " specifications \\\n", + "0 ['Brand: GOYMFK', 'Color: White', 'Material: M... \n", + "1 ['Brand: subrtex', 'Color: Black', 'Product Di... \n", + "2 ['Brand: MUYETOL', 'Size: 26.8*26.8', 'Item We... 
\n", + "3 ['Brand: VEWETOL', 'Size: 16*24INCH', 'Materia... \n", + "4 ['Brand: JOIN IRON', 'Shape: Rectangular', 'In... \n", "\n", - " uniq_id scraped_at \n", - "0 02593e81-5c09-5069-8516-b0b29f439ded 2024-02-02 15:15:08 \n", - "1 5938d217-b8c5-5d3e-b1cf-e28e340f292e 2024-02-02 15:15:09 \n", - "2 b2ede786-3f51-5a45-9a5b-bcf856958cd8 2024-02-02 15:15:09 \n", - "3 8fd9377b-cfa6-5f10-835c-6b8eca2816b5 2024-02-02 15:15:10 \n", - "4 bdc9aa30-9439-50dc-8e89-213ea211d66a 2024-02-02 15:15:11 \n", - ".. ... ... \n", - "307 d3b681ac-6195-5c9b-8125-98b6425829f4 2024-02-02 18:56:50 \n", - "308 07e5e60e-953d-5512-aab8-cf83193de252 2024-02-02 18:56:51 \n", - "309 cb66eee5-7113-5568-9713-9e08e2b48a26 2024-02-02 18:56:52 \n", - "310 0a3805a8-8249-55e1-a9de-ccf6c602f167 2024-02-02 18:56:53 \n", - "311 ca0aa529-6ef0-56d0-81a7-042becdefd4d 2024-02-02 18:56:53 \n", + " uniq_id scraped_at \n", + "0 02593e81-5c09-5069-8516-b0b29f439ded 2024-02-02 15:15:08 \n", + "1 5938d217-b8c5-5d3e-b1cf-e28e340f292e 2024-02-02 15:15:09 \n", + "2 b2ede786-3f51-5a45-9a5b-bcf856958cd8 2024-02-02 15:15:09 \n", + "3 8fd9377b-cfa6-5f10-835c-6b8eca2816b5 2024-02-02 15:15:10 \n", + "4 bdc9aa30-9439-50dc-8e89-213ea211d66a 2024-02-02 15:15:11 \n", "\n", - "[312 rows x 25 columns]" + "[5 rows x 25 columns]" ] }, - "execution_count": 4, + "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ + "dataset_path = \"data/amazon_furniture_dataset.csv\"\n", "df = pd.read_csv(dataset_path)\n", - "df" + "df.head()" ] }, { @@ -584,34 +1067,34 @@ "source": [ "### Processing step \n", "\n", - "Here, we will prepare our tasks by first trying them out with the Chat Completions endpoint.\n", - "\n", - "Once you're happy with the results you have using regular chat completions, you can move on to creating your batch job files." + "Again, we will first prepare our tasks with the Chat Completions endpoint, and create the batch job file afterwards." 
] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 20, "id": "918fd79f", "metadata": {}, "outputs": [], "source": [ - "system_prompt = '''\n", + "caption_system_prompt = '''\n", "Your goal is to generate short, descriptive captions for images of items.\n", "You will be provided with an item image and the name of that item and you will output a caption that captures the most important information about the item.\n", "If there are multiple items depicted, refer to the name provided to understand which item you should describe.\n", - "Your generated caption should be short (1 sentence), and include the most relevant information about the item.\n", - "The most important information could be: the type of item, the style (if mentioned), the material or color if especially relevant and any distinctive features.\n", + "Your generated caption should be short (1 sentence), and include only the most important information about the item.\n", + "The most important information could be: the type of item, the style (if mentioned), the material or color if especially relevant and/or any distinctive features.\n", + "Keep it short and to the point.\n", "'''\n", "\n", "def get_caption(img_url, title):\n", " response = client.chat.completions.create(\n", - " model=\"gpt-4-vision-preview\",\n", + " model=\"gpt-4-turbo\",\n", " temperature=0.2,\n", + " max_tokens=300,\n", " messages=[\n", " {\n", " \"role\": \"system\",\n", - " \"content\": system_prompt\n", + " \"content\": caption_system_prompt\n", " },\n", " {\n", " \"role\": \"user\",\n", @@ -620,14 +1103,16 @@ " \"type\": \"text\",\n", " \"text\": title\n", " },\n", + " # The content type should be \"image_url\" to use gpt-4-turbo's vision capabilities\n", " {\n", " \"type\": \"image_url\",\n", - " \"image_url\": img_url,\n", + " \"image_url\": {\n", + " \"url\": img_url\n", + " }\n", " },\n", " ],\n", " }\n", - " ],\n", - " max_tokens=300,\n", + " ]\n", " )\n", "\n", " return 
response.choices[0].message.content" @@ -635,7 +1120,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 22, "id": "1daac93d", "metadata": {}, "outputs": [ @@ -655,7 +1140,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "CAPTION: A white multi-layer metal shoe rack with eight double hooks, featuring shoes and accessories storage, placed against a wall next to a door.\n", + "CAPTION: White multi-layer metal shoe rack featuring eight double hooks for hanging accessories, ideal for organizing footwear and small items in living spaces.\n", "\n", "\n" ] @@ -676,7 +1161,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "CAPTION: A set of two black leather upholstered dining chairs with a simple, contemporary design.\n", + "CAPTION: A set of two elegant black leather dining chairs with a sleek design and vertical stitching detail on the backrest.\n", "\n", "\n" ] @@ -697,7 +1182,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "CAPTION: A green, waterproof, square plant repotting mat with raised corners and gardening tools displayed on it.\n", + "CAPTION: A green, waterproof, square, foldable repotting mat designed for indoor gardening, featuring raised edges and displayed with gardening tools and small potted plants.\n", "\n", "\n" ] @@ -718,7 +1203,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "CAPTION: A brown absorbent non-slip doormat with the phrase \"it's a good day to play PICKLEBALL\" and pickleball paddle graphics.\n", + "CAPTION: A brown, absorbent non-slip doormat featuring the phrase \"It's a good day to play PICKLEBALL\" with a pickleball paddle graphic, ideal for sports enthusiasts.\n", "\n", "\n" ] @@ -739,7 +1224,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "CAPTION: A set of four grey foldable TV trays with a matching stand, featuring a sleek and space-saving design for convenient snacking or dining.\n", + "CAPTION: Set of four foldable grey TV trays with a stand, 
featuring a sleek, space-saving design suitable for small areas.\n", "\n", "\n" ] @@ -757,33 +1242,17 @@ }, { "cell_type": "markdown", - "id": "a89a6709", + "id": "c1e75078", "metadata": {}, "source": [ - "## Creating the batch file\n", - "\n", - "The batch file (jsonl) should contain one line per task.\n", - "Each task is defined as such:\n", - "\n", - "```\n", - "{\n", - " \"custom_id\": ,\n", - " \"method\": \"POST\",\n", - " \"url\": \"/v1/chat/completions\",\n", - " \"body\": {\n", - " \"model\": ,\n", - " \"messages\": ,\n", - " // other parameters\n", - " }\n", - "}\n", - "```\n", + "### Creating the batch job\n", "\n", - "Note: the task ID should be unique per batch job. This is what you can use to match results to the initial input files, are tasks will not be returned in the same order." + "As with the first example, we will create an array of JSON tasks to generate a `jsonl` file and use it to create the batch job." ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 35, "id": "48e59bb1", "metadata": {}, "outputs": [], @@ -803,11 +1272,13 @@ " \"url\": \"/v1/chat/completions\",\n", " \"body\": {\n", " # This is what you would have in your Chat Completions API call\n", - " \"model\": \"gpt-4-vision-preview\",\n", + " \"model\": \"gpt-4-turbo\",\n", + " \"temperature\": 0.2,\n", + " \"max_tokens\": 300,\n", " \"messages\": [\n", " {\n", " \"role\": \"system\",\n", - " \"content\": system_prompt\n", + " \"content\": caption_system_prompt\n", " },\n", " {\n", " \"role\": \"user\",\n", @@ -818,13 +1289,13 @@ " },\n", " {\n", " \"type\": \"image_url\",\n", - " \"image_url\": img_url,\n", + " \"image_url\": {\n", + " \"url\": img_url\n", + " }\n", " },\n", " ],\n", " }\n", - " ],\n", - " \"temperature\": 0.2,\n", - " \"max_tokens\": 300\n", + " ] \n", " }\n", " }\n", " \n", @@ -833,35 +1304,29 @@ }, { "cell_type": "code", - "execution_count": 8, - "id": "257e2eda", + "execution_count": 36, + "id": "e75193f2", "metadata": {}, "outputs": [],
"source": [ "# Creating the file\n", "\n", - "file_name = \"data/batch_tasks.jsonl\"\n", + "file_name = \"data/batch_tasks_furniture.jsonl\"\n", "\n", "with open(file_name, 'w') as file:\n", " for obj in tasks:\n", " file.write(json.dumps(obj) + '\\n')" ] }, - { - "cell_type": "markdown", - "id": "c6b490cd", - "metadata": {}, - "source": [ - "### Uploading the file" - ] - }, { "cell_type": "code", - "execution_count": 9, - "id": "40ea90dd", + "execution_count": 37, + "id": "f2bc166a", "metadata": {}, "outputs": [], "source": [ + "# Uploading the file \n", + "\n", "batch_file = client.files.create(\n", " file=open(file_name, \"rb\"),\n", " purpose=\"batch\"\n", @@ -870,37 +1335,13 @@ }, { "cell_type": "code", - "execution_count": 10, - "id": "081f602f", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "FileObject(id='file-kqHmhAcZM1nRcewdvT4V9Htr', bytes=350626, created_at=1713979938, filename='batch_tasks.jsonl', object='file', purpose='batch', status='processed', status_details=None)\n" - ] - } - ], - "source": [ - "print(batch_file)" - ] - }, - { - "cell_type": "markdown", - "id": "f8ef8ab5", - "metadata": {}, - "source": [ - "## Creating the batch job" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "4db403a3", + "execution_count": 38, + "id": "0d7d7ec9", "metadata": {}, "outputs": [], "source": [ + "# Creating the job\n", + "\n", "batch_job = client.batches.create(\n", " input_file_id=batch_file.id,\n", " endpoint=\"/v1/chat/completions\",\n", @@ -908,29 +1349,19 @@ ")" ] }, - { - "cell_type": "markdown", - "id": "f7ca66c6", - "metadata": {}, - "source": [ - "### Checking job status\n", - "\n", - "Note: this can take up to 24h, but it will usually be completed faster.\n", - "\n", - "You can continue checking until the status is 'completed'." 
- ] - }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 40, "id": "53456a08", - "metadata": {}, + "metadata": { + "scrolled": true + }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Batch(id='batch_8lRuRwDxKKdXm4A8S1u577zi', completion_window='24h', created_at=1713952714, endpoint='/v1/chat/completions', input_file_id='file-N9FrpHjYftSlW4zC1WkLv81a', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1713952907, error_file_id=None, errors=None, expired_at=None, expires_at=1714039114, failed_at=None, finalizing_at=1713952892, in_progress_at=1713952750, metadata=None, output_file_id='file-EuZcIJXoWgMWSnCypOEipVLQ', request_counts=BatchRequestCounts(completed=312, failed=0, total=312))\n" + "Batch(id='batch_xU74ytOBYUpaUQE3Cwi8SCbA', completion_window='24h', created_at=1714049780, endpoint='/v1/chat/completions', input_file_id='file-6y0JPmkHU42qtaEK8x8ZYzkp', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1714049914, error_file_id=None, errors=None, expired_at=None, expires_at=1714136180, failed_at=None, finalizing_at=1714049896, in_progress_at=1714049821, metadata=None, output_file_id='file-XPfkEFZSaM4Avps7mcD3i8BY', request_counts=BatchRequestCounts(completed=312, failed=0, total=312))\n" ] } ], @@ -941,31 +1372,38 @@ }, { "cell_type": "markdown", - "id": "6988fb64", + "id": "45ce15d7", "metadata": {}, "source": [ - "## Retrieving results" + "### Getting results\n", + "\n", + "As with the first example, we can retrieve results once the batch job is done.\n", + "\n", + "Reminder: the results are not in the same order as in the input file.\n", + "Make sure to use the `custom_id` to match each result to its input task." ] }, { "cell_type": "code", - "execution_count": 29, - "id": "682c38d5", + "execution_count": 41, + "id": "05db39f3", "metadata": {}, "outputs": [], "source": [ + "# Retrieving result file\n", + "\n", + "result_file_id = 
batch_job.output_file_id\n", "result = client.files.content(result_file_id).content" ] }, { "cell_type": "code", - "execution_count": 33, - "id": "52840df9", + "execution_count": 42, + "id": "a15fbb54", "metadata": {}, "outputs": [], "source": [ - "result_file_name = \"data/batch_job_results.jsonl\"\n", + "result_file_name = \"data/batch_job_results_furniture.jsonl\"\n", "\n", "with open(result_file_name, 'wb') as file:\n", " file.write(result)" @@ -973,12 +1411,13 @@ }, { "cell_type": "code", - "execution_count": 39, - "id": "f11b7d19", + "execution_count": 43, + "id": "beabfdcd", "metadata": {}, "outputs": [], "source": [ "# Loading data from saved file\n", + "\n", "results = []\n", "with open(result_file_name, 'r') as file:\n", " for line in file:\n", @@ -987,26 +1426,16 @@ " results.append(json_object)" ] }, - { - "cell_type": "markdown", - "id": "a2bafff8", - "metadata": {}, - "source": [ - "### Reading results\n", - "Reminder: the results are not in the same order as in the input file.\n", - "Make sure to check the custom_id to match the results against the input tasks" - ] - }, { "cell_type": "code", - "execution_count": 47, + "execution_count": 44, "id": "ad54ffee", "metadata": {}, "outputs": [ { "data": { "text/html": [ - "" + "" ], "text/plain": [ "" @@ -1019,7 +1448,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "CAPTION: A brown absorbent non-slip doormat with the phrase \"it's a good day to play PICKLEBALL\" and pickleball paddle graphics.\n", + "CAPTION: Brushed brass pedestal towel rack with a sleek, modern design, featuring multiple bars for hanging towels, measuring 25.75 x 14.44 x 32 inches.\n", "\n", "\n" ] @@ -1027,7 +1456,7 @@ { "data": { "text/html": [ - "" + "" ], "text/plain": [ "" @@ -1040,7 +1469,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "CAPTION: A 30-inch LOVMOR bathroom vanity sink base cabinet with three drawers on the left and a single door, finished in a warm brown wood tone.\n", + "CAPTION: Black round 
end table featuring a tempered glass top and a metal frame, with a lower shelf for additional storage.\n", "\n", "\n" ] @@ -1048,7 +1477,7 @@ { "data": { "text/html": [ - "" + "" ], "text/plain": [ "" @@ -1061,7 +1490,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "CAPTION: A black 4-tier freestanding bathroom organizer with adjustable shelves and baskets, designed to fit over a toilet.\n", + "CAPTION: Black collapsible and height-adjustable telescoping stool, portable and designed for makeup artists and hairstylists, shown in various stages of folding for easy transport.\n", "\n", "\n" ] @@ -1069,7 +1498,7 @@ { "data": { "text/html": [ - "" + "" ], "text/plain": [ "" @@ -1082,7 +1511,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "CAPTION: Black full-motion TV wall mount with dual articulating arms for 37–75 inch TVs, capable of swivel and tilt, supporting up to 100 lbs and fitting 16\" wood studs.\n", + "CAPTION: Ergonomic pink gaming chair featuring breathable fabric, adjustable height, lumbar support, a footrest, and a swivel recliner function.\n", "\n", "\n" ] @@ -1090,7 +1519,7 @@ { "data": { "text/html": [ - "" + "" ], "text/plain": [ "" @@ -1103,7 +1532,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "CAPTION: A colorful modular kids play couch set with a galaxy-themed pattern that glows in the dark, designed for creative play and seating in a child's playroom.\n", + "CAPTION: A set of two Glitzhome adjustable bar stools featuring a mid-century modern design with swivel seats, PU leather upholstery, and wooden backrests.\n", "\n", "\n" ] @@ -1122,6 +1551,18 @@ " display(img)\n", " print(f\"CAPTION: {result}\\n\\n\")" ] + }, + { + "cell_type": "markdown", + "id": "f6603e8f", + "metadata": {}, + "source": [ + "## Wrapping up\n", + "\n", + "In this cookbook, we have seen two examples of how to use the new Batch API, but keep in mind that the Batch API works the same way as the Chat Completions endpoint, supporting the same 
parameters and most of the recent models (gpt-3.5-turbo, gpt-4, gpt-4-vision-preview, gpt-4-turbo...).\n", + "\n", + "By using this API, you can significantly reduce costs, so we recommend moving every workload that can run asynchronously to a batch job." + ] } ], "metadata": {