openai-cookbook/examples/dalle/Image_generations_edits_and...

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to use the DALL·E API\n",
    "\n",
    "This notebook shows how to use OpenAI's DALL·E image API endpoints.\n",
    "\n",
    "There are three API endpoints:\n",
    "- **Generations:** generates an image or images based on an input caption\n",
    "- **Edits:** edits or extends an existing image\n",
    "- **Variations:** generates variations of an input image"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "- Import the packages you'll need\n",
    "- Import your OpenAI API key: You can do this by running \\``export OPENAI_API_KEY=\"your API key\"`\\` in your terminal.\n",
    "- Set a directory to save images to"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "from openai import OpenAI  # OpenAI Python library to make API calls\n",
    "import requests  # used to download images\n",
    "import os  # used to access filepaths\n",
    "from PIL import Image  # used to print and edit images\n",
    "\n",
    "# initialize OpenAI client\n",
    "client = OpenAI(api_key=os.environ.get(\"OPENAI_API_KEY\", \"<your OpenAI API key if not set as env var>\"))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "image_dir='./images'\n"
     ]
    }
   ],
   "source": [
    "# set a directory to save DALL·E images to\n",
    "image_dir_name = \"images\"\n",
    "image_dir = os.path.join(os.curdir, image_dir_name)\n",
    "\n",
    "# create the directory if it doesn't yet exist\n",
    "if not os.path.isdir(image_dir):\n",
    "    os.mkdir(image_dir)\n",
    "\n",
    "# print the directory to save to\n",
    "print(f\"{image_dir=}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generations\n",
    "\n",
    "The generation API endpoint creates an image based on a text prompt. [API Reference](https://platform.openai.com/docs/api-reference/images/create)\n",
    "\n",
    "**Required inputs:**\n",
    "- `prompt` (str): A text description of the desired image(s). The maximum length is 1000 characters for dall-e-2 and 4000 characters for dall-e-3.\n",
    "\n",
    "**Optional inputs:**\n",
    "- `model` (str): The model to use for image generation. Defaults to dall-e-2\n",
    "- `n` (int): The number of images to generate. Must be between 1 and 10. Defaults to 1.\n",
    "- `quality` (str): The quality of the image that will be generated. hd creates images with finer details and greater consistency across the image. This param is only supported for dall-e-3.\n",
    "- `response_format` (str): The format in which the generated images are returned. Must be one of \"url\" or \"b64_json\". Defaults to \"url\".\n",
    "- `size` (str): The size of the generated images. Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2. Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models. Defaults to \"1024x1024\".\n",
    "- `style`(str | null): The style of the generated images. Must be one of vivid or natural. Vivid causes the model to lean towards generating hyper-real and dramatic images. Natural causes the model to produce more natural, less hyper-real looking images. This param is only supported for dall-e-3.\n",
    "- `user` (str): A unique identifier representing your end-user, which will help OpenAI to monitor and detect abuse. [Learn more.](https://beta.openai.com/docs/usage-policies/end-user-ids)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ImagesResponse(created=1701994117, data=[Image(b64_json=None, revised_prompt=None, url='https://oaidalleapiprodscus.blob.core.windows.net/private/org-9HXYFy8ux4r6aboFyec2OLRf/user-8OA8IvMYkfdAcUZXgzAXHS7d/img-ced13hkOk3lXkccQgW1fAQjm.png?st=2023-12-07T23%3A08%3A37Z&se=2023-12-08T01%3A08%3A37Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-12-07T16%3A41%3A48Z&ske=2023-12-08T16%3A41%3A48Z&sks=b&skv=2021-08-06&sig=tcD0iyU0ABOvWAKsY89gp5hLVIYkoSXQnrcmH%2Brkric%3D')])\n"
     ]
    }
   ],
   "source": [
    "# create an image\n",
    "\n",
    "# set the prompt\n",
    "prompt = \"A cyberpunk monkey hacker dreaming of a beautiful bunch of bananas, digital art\"\n",
    "\n",
    "# call the OpenAI API\n",
    "generation_response = client.images.generate(\n",
    "    model = \"dall-e-3\",\n",
    "    prompt=prompt,\n",
    "    n=1,\n",
    "    size=\"1024x1024\",\n",
    "    response_format=\"url\",\n",
    ")\n",
    "\n",
    "# print response\n",
    "print(generation_response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# save the image\n",
    "generated_image_name = \"generated_image.png\"  # any name you like; the filetype should be .png\n",
    "generated_image_filepath = os.path.join(image_dir, generated_image_name)\n",
    "generated_image_url = generation_response.data[0].url  # extract image URL from response\n",
    "generated_image = requests.get(generated_image_url).content  # download the image\n",
    "\n",
    "with open(generated_image_filepath, \"wb\") as image_file:\n",
    "    image_file.write(generated_image)  # write the image to the file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# print the image\n",
    "print(generated_image_filepath)\n",
    "display(Image.open(generated_image_filepath))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Variations\n",
    "\n",
    "The variations endpoint generates new images (variations) similar to an input image. [API Reference](https://platform.openai.com/docs/api-reference/images/createVariation)\n",
    "\n",
    "Here we'll generate variations of the image generated above.\n",
    "\n",
    "**Required inputs:**\n",
    "- `image` (str): The image to use as the basis for the variation(s). Must be a valid PNG file, less than 4MB, and square.\n",
    "\n",
    "**Optional inputs:**\n",
    "- `model` (str): The model to use for image variations. Only dall-e-2 is supported at this time.\n",
    "- `n` (int): The number of images to generate. Must be between 1 and 10. Defaults to 1.\n",
    "- `size` (str): The size of the generated images. Must be one of \"256x256\", \"512x512\", or \"1024x1024\". Smaller images are faster. Defaults to \"1024x1024\".\n",
    "- `response_format` (str): The format in which the generated images are returned. Must be one of \"url\" or \"b64_json\". Defaults to \"url\".\n",
    "- `user` (str): A unique identifier representing your end-user, which will help OpenAI to monitor and detect abuse. [Learn more.](https://beta.openai.com/docs/usage-policies/end-user-ids)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ImagesResponse(created=1701994139, data=[Image(b64_json=None, revised_prompt=None, url='https://oaidalleapiprodscus.blob.core.windows.net/private/org-9HXYFy8ux4r6aboFyec2OLRf/user-8OA8IvMYkfdAcUZXgzAXHS7d/img-noNRGgwaaotRGIe6Y2GVeSpr.png?st=2023-12-07T23%3A08%3A59Z&se=2023-12-08T01%3A08%3A59Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-12-07T16%3A39%3A11Z&ske=2023-12-08T16%3A39%3A11Z&sks=b&skv=2021-08-06&sig=ER5RUglhtIk9LWJXw1DsolorT4bnEmFostfnUjY21ns%3D'), Image(b64_json=None, revised_prompt=None, url='https://oaidalleapiprodscus.blob.core.windows.net/private/org-9HXYFy8ux4r6aboFyec2OLRf/user-8OA8IvMYkfdAcUZXgzAXHS7d/img-oz952tL11FFhf9iXXJVIRUZX.png?st=2023-12-07T23%3A08%3A59Z&se=2023-12-08T01%3A08%3A59Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-12-07T16%3A39%3A11Z&ske=2023-12-08T16%3A39%3A11Z&sks=b&skv=2021-08-06&sig=99rJOQwDKsfIeerlMXMHholhAhrHfYaQRFJBF8FKv74%3D')])\n"
     ]
    }
   ],
   "source": [
    "# create variations\n",
    "\n",
    "# call the OpenAI API, using `create_variation` rather than `create`\n",
    "variation_response = client.images.create_variation(\n",
    "    image=generated_image,  # generated_image is the image generated above\n",
    "    n=2,\n",
    "    size=\"1024x1024\",\n",
    "    response_format=\"url\",\n",
    ")\n",
    "\n",
    "# print response\n",
    "print(variation_response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "# save the images\n",
    "variation_urls = [datum.url for datum in variation_response.data]  # extract URLs\n",
    "variation_images = [requests.get(url).content for url in variation_urls]  # download images\n",
    "variation_image_names = [f\"variation_image_{i}.png\" for i in range(len(variation_images))]  # create names\n",
    "variation_image_filepaths = [os.path.join(image_dir, name) for name in variation_image_names]  # create filepaths\n",
    "for image, filepath in zip(variation_images, variation_image_filepaths):  # loop through the variations\n",
    "    with open(filepath, \"wb\") as image_file:  # open the file\n",
    "        image_file.write(image)  # write the image to the file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# print the original image\n",
    "print(generated_image_filepath)\n",
    "display(Image.open(generated_image_filepath))\n",
    "\n",
    "# print the new variations\n",
    "for variation_image_filepaths in variation_image_filepaths:\n",
    "    print(variation_image_filepaths)\n",
    "    display(Image.open(variation_image_filepaths))\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Edits\n",
    "\n",
    "The edit endpoint uses DALL·E to generate a specified portion of an existing image. Three inputs are needed: the image to edit, a mask specifying the portion to be regenerated, and a prompt describing the desired image. [API Reference](https://platform.openai.com/docs/api-reference/images/createEdit)\n",
    "\n",
    "**Required inputs:** \n",
    "- `image` (str): The image to edit. Must be a valid PNG file, less than 4MB, and square. If mask is not provided, image must have transparency, which will be used as the mask.\n",
    "- `prompt` (str): A text description of the desired image(s). The maximum length is 1000 characters.\n",
    "\n",
    "**Optional inputs:**\n",
    "- `mask` (file): An additional image whose fully transparent areas (e.g. where alpha is zero) indicate where image should be edited. Must be a valid PNG file, less than 4MB, and have the same dimensions as image.\n",
    "- `model` (str): The model to use for edit image. Only dall-e-2 is supported at this time.\n",
    "- `n` (int): The number of images to generate. Must be between 1 and 10. Defaults to 1.\n",
    "- `size` (str): The size of the generated images. Must be one of \"256x256\", \"512x512\", or \"1024x1024\". Smaller images are faster. Defaults to \"1024x1024\".\n",
    "- `response_format` (str): The format in which the generated images are returned. Must be one of \"url\" or \"b64_json\". Defaults to \"url\".\n",
    "- `user` (str): A unique identifier representing your end-user, which will help OpenAI to monitor and detect abuse. [Learn more.](https://beta.openai.com/docs/usage-policies/end-user-ids)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set Edit Area\n",
    "\n",
    "An edit requires a \"mask\" to specify which portion of the image to regenerate. Any pixel with an alpha of 0 (transparent) will be regenerated. The code below creates a 1024x1024 mask where the bottom half is transparent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a mask\n",
    "width = 1024\n",
    "height = 1024\n",
    "mask = Image.new(\"RGBA\", (width, height), (0, 0, 0, 1))  # create an opaque image mask\n",
    "\n",
    "# set the bottom half to be transparent\n",
    "for x in range(width):\n",
    "    for y in range(height // 2, height):  # only loop over the bottom half of the mask\n",
    "        # set alpha (A) to zero to turn pixel transparent\n",
    "        alpha = 0\n",
    "        mask.putpixel((x, y), (0, 0, 0, alpha))\n",
    "\n",
    "# save the mask\n",
    "mask_name = \"bottom_half_mask.png\"\n",
    "mask_filepath = os.path.join(image_dir, mask_name)\n",
    "mask.save(mask_filepath)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Perform Edit\n",
    "\n",
    "Now we supply our image, caption and mask to the API to get 5 examples of edits to our image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ImagesResponse(created=1701994167, data=[Image(b64_json=None, revised_prompt=None, url='https://oaidalleapiprodscus.blob.core.windows.net/private/org-9HXYFy8ux4r6aboFyec2OLRf/user-8OA8IvMYkfdAcUZXgzAXHS7d/img-9UOVGC7wB8MS2Q7Rwgj0fFBq.png?st=2023-12-07T23%3A09%3A27Z&se=2023-12-08T01%3A09%3A27Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-12-07T16%3A40%3A37Z&ske=2023-12-08T16%3A40%3A37Z&sks=b&skv=2021-08-06&sig=MsRMZ1rN434bVdWr%2B9kIoqu9CIrvZypZBfkQPTOhCl4%3D')])\n"
     ]
    }
   ],
   "source": [
    "# edit an image\n",
    "\n",
    "# call the OpenAI API\n",
    "edit_response = client.images.edit(\n",
    "    image=open(generated_image_filepath, \"rb\"),  # from the generation section\n",
    "    mask=open(mask_filepath, \"rb\"),  # from right above\n",
    "    prompt=prompt,  # from the generation section\n",
    "    n=1,\n",
    "    size=\"1024x1024\",\n",
    "    response_format=\"url\",\n",
    ")\n",
    "\n",
    "# print response\n",
    "print(edit_response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "# save the image\n",
    "edited_image_name = \"edited_image.png\"  # any name you like; the filetype should be .png\n",
    "edited_image_filepath = os.path.join(image_dir, edited_image_name)\n",
    "edited_image_url = edit_response.data[0].url  # extract image URL from response\n",
    "edited_image = requests.get(edited_image_url).content  # download the image\n",
    "\n",
    "with open(edited_image_filepath, \"wb\") as image_file:\n",
    "    image_file.write(edited_image)  # write the image to the file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# print the original image\n",
    "print(generated_image_filepath)\n",
    "display(Image.open(generated_image_filepath))\n",
    "\n",
    "# print edited image\n",
    "print(edited_image_filepath)\n",
    "display(Image.open(edited_image_filepath))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.9.9 ('openai')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.3"
  },
  "vscode": {
   "interpreter": {
    "hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}