You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
openai-cookbook/examples/dalle/Image_generations_edits_and...

385 lines
16 KiB
Plaintext

{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to use the DALL·E API\n",
"\n",
"This notebook shows how to use OpenAI's DALL·E image API endpoints.\n",
"\n",
"There are three API endpoints:\n",
"- **Generations:** generates an image or images based on an input caption\n",
"- **Edits:** edits or extends an existing image\n",
"- **Variations:** generates variations of an input image"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"- Import the packages you'll need\n",
"- Import your OpenAI API key: You can do this by running \\``export OPENAI_API_KEY=\"your API key\"`\\` in your terminal.\n",
"- Set a directory to save images to"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"from openai import OpenAI # OpenAI Python library to make API calls\n",
"import requests # used to download images\n",
"import os # used to access filepaths\n",
"from PIL import Image # used to print and edit images\n",
"\n",
"# initialize OpenAI client\n",
"client = OpenAI(api_key=os.environ.get(\"OPENAI_API_KEY\", \"<your OpenAI API key if not set as env var>\"))\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"image_dir='./images'\n"
]
}
],
"source": [
"# set a directory to save DALL·E images to\n",
"image_dir_name = \"images\"\n",
"image_dir = os.path.join(os.curdir, image_dir_name)\n",
"\n",
"# create the directory if it doesn't yet exist\n",
"if not os.path.isdir(image_dir):\n",
" os.mkdir(image_dir)\n",
"\n",
"# print the directory to save to\n",
"print(f\"{image_dir=}\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generations\n",
"\n",
"The generation API endpoint creates an image based on a text prompt. [API Reference](https://platform.openai.com/docs/api-reference/images/create)\n",
"\n",
"**Required inputs:**\n",
"- `prompt` (str): A text description of the desired image(s). The maximum length is 1000 characters for dall-e-2 and 4000 characters for dall-e-3.\n",
"\n",
"**Optional inputs:**\n",
"- `model` (str): The model to use for image generation. Defaults to dall-e-2\n",
"- `n` (int): The number of images to generate. Must be between 1 and 10. Defaults to 1.\n",
"- `quality` (str): The quality of the image that will be generated. hd creates images with finer details and greater consistency across the image. This param is only supported for dall-e-3.\n",
"- `response_format` (str): The format in which the generated images are returned. Must be one of \"url\" or \"b64_json\". Defaults to \"url\".\n",
"- `size` (str): The size of the generated images. Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2. Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models. Defaults to \"1024x1024\".\n",
"- `style`(str | null): The style of the generated images. Must be one of vivid or natural. Vivid causes the model to lean towards generating hyper-real and dramatic images. Natural causes the model to produce more natural, less hyper-real looking images. This param is only supported for dall-e-3.\n",
"- `user` (str): A unique identifier representing your end-user, which will help OpenAI to monitor and detect abuse. [Learn more.](https://beta.openai.com/docs/usage-policies/end-user-ids)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ImagesResponse(created=1701994117, data=[Image(b64_json=None, revised_prompt=None, url='https://oaidalleapiprodscus.blob.core.windows.net/private/org-9HXYFy8ux4r6aboFyec2OLRf/user-8OA8IvMYkfdAcUZXgzAXHS7d/img-ced13hkOk3lXkccQgW1fAQjm.png?st=2023-12-07T23%3A08%3A37Z&se=2023-12-08T01%3A08%3A37Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-12-07T16%3A41%3A48Z&ske=2023-12-08T16%3A41%3A48Z&sks=b&skv=2021-08-06&sig=tcD0iyU0ABOvWAKsY89gp5hLVIYkoSXQnrcmH%2Brkric%3D')])\n"
]
}
],
"source": [
"# create an image\n",
"\n",
"# set the prompt\n",
"prompt = \"A cyberpunk monkey hacker dreaming of a beautiful bunch of bananas, digital art\"\n",
"\n",
"# call the OpenAI API\n",
"generation_response = client.images.generate(\n",
" model = \"dall-e-3\",\n",
" prompt=prompt,\n",
" n=1,\n",
" size=\"1024x1024\",\n",
" response_format=\"url\",\n",
")\n",
"\n",
"# print response\n",
"print(generation_response)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# save the image\n",
"generated_image_name = \"generated_image.png\" # any name you like; the filetype should be .png\n",
"generated_image_filepath = os.path.join(image_dir, generated_image_name)\n",
"generated_image_url = generation_response.data[0].url # extract image URL from response\n",
"generated_image = requests.get(generated_image_url).content # download the image\n",
"\n",
"with open(generated_image_filepath, \"wb\") as image_file:\n",
" image_file.write(generated_image) # write the image to the file"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# print the image\n",
"print(generated_image_filepath)\n",
"display(Image.open(generated_image_filepath))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Variations\n",
"\n",
"The variations endpoint generates new images (variations) similar to an input image. [API Reference](https://platform.openai.com/docs/api-reference/images/createVariation)\n",
"\n",
"Here we'll generate variations of the image generated above.\n",
"\n",
"**Required inputs:**\n",
"- `image` (str): The image to use as the basis for the variation(s). Must be a valid PNG file, less than 4MB, and square.\n",
"\n",
"**Optional inputs:**\n",
"- `model` (str): The model to use for image variations. Only dall-e-2 is supported at this time.\n",
"- `n` (int): The number of images to generate. Must be between 1 and 10. Defaults to 1.\n",
"- `size` (str): The size of the generated images. Must be one of \"256x256\", \"512x512\", or \"1024x1024\". Smaller images are faster. Defaults to \"1024x1024\".\n",
"- `response_format` (str): The format in which the generated images are returned. Must be one of \"url\" or \"b64_json\". Defaults to \"url\".\n",
"- `user` (str): A unique identifier representing your end-user, which will help OpenAI to monitor and detect abuse. [Learn more.](https://beta.openai.com/docs/usage-policies/end-user-ids)\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ImagesResponse(created=1701994139, data=[Image(b64_json=None, revised_prompt=None, url='https://oaidalleapiprodscus.blob.core.windows.net/private/org-9HXYFy8ux4r6aboFyec2OLRf/user-8OA8IvMYkfdAcUZXgzAXHS7d/img-noNRGgwaaotRGIe6Y2GVeSpr.png?st=2023-12-07T23%3A08%3A59Z&se=2023-12-08T01%3A08%3A59Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-12-07T16%3A39%3A11Z&ske=2023-12-08T16%3A39%3A11Z&sks=b&skv=2021-08-06&sig=ER5RUglhtIk9LWJXw1DsolorT4bnEmFostfnUjY21ns%3D'), Image(b64_json=None, revised_prompt=None, url='https://oaidalleapiprodscus.blob.core.windows.net/private/org-9HXYFy8ux4r6aboFyec2OLRf/user-8OA8IvMYkfdAcUZXgzAXHS7d/img-oz952tL11FFhf9iXXJVIRUZX.png?st=2023-12-07T23%3A08%3A59Z&se=2023-12-08T01%3A08%3A59Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-12-07T16%3A39%3A11Z&ske=2023-12-08T16%3A39%3A11Z&sks=b&skv=2021-08-06&sig=99rJOQwDKsfIeerlMXMHholhAhrHfYaQRFJBF8FKv74%3D')])\n"
]
}
],
"source": [
"# create variations\n",
"\n",
"# call the OpenAI API, using `create_variation` rather than `create`\n",
"variation_response = client.images.create_variation(\n",
" image=generated_image, # generated_image is the image generated above\n",
" n=2,\n",
" size=\"1024x1024\",\n",
" response_format=\"url\",\n",
")\n",
"\n",
"# print response\n",
"print(variation_response)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"# save the images\n",
"variation_urls = [datum.url for datum in variation_response.data] # extract URLs\n",
"variation_images = [requests.get(url).content for url in variation_urls] # download images\n",
"variation_image_names = [f\"variation_image_{i}.png\" for i in range(len(variation_images))] # create names\n",
"variation_image_filepaths = [os.path.join(image_dir, name) for name in variation_image_names] # create filepaths\n",
"for image, filepath in zip(variation_images, variation_image_filepaths): # loop through the variations\n",
" with open(filepath, \"wb\") as image_file: # open the file\n",
" image_file.write(image) # write the image to the file"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# print the original image\n",
"print(generated_image_filepath)\n",
"display(Image.open(generated_image_filepath))\n",
"\n",
"# print the new variations\n",
"for variation_image_filepaths in variation_image_filepaths:\n",
" print(variation_image_filepaths)\n",
" display(Image.open(variation_image_filepaths))\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Edits\n",
"\n",
"The edit endpoint uses DALL·E to generate a specified portion of an existing image. Three inputs are needed: the image to edit, a mask specifying the portion to be regenerated, and a prompt describing the desired image. [API Reference](https://platform.openai.com/docs/api-reference/images/createEdit)\n",
"\n",
"**Required inputs:** \n",
"- `image` (str): The image to edit. Must be a valid PNG file, less than 4MB, and square. If mask is not provided, image must have transparency, which will be used as the mask.\n",
"- `prompt` (str): A text description of the desired image(s). The maximum length is 1000 characters.\n",
"\n",
"**Optional inputs:**\n",
"- `mask` (file): An additional image whose fully transparent areas (e.g. where alpha is zero) indicate where image should be edited. Must be a valid PNG file, less than 4MB, and have the same dimensions as image.\n",
"- `model` (str): The model to use for edit image. Only dall-e-2 is supported at this time.\n",
"- `n` (int): The number of images to generate. Must be between 1 and 10. Defaults to 1.\n",
"- `size` (str): The size of the generated images. Must be one of \"256x256\", \"512x512\", or \"1024x1024\". Smaller images are faster. Defaults to \"1024x1024\".\n",
"- `response_format` (str): The format in which the generated images are returned. Must be one of \"url\" or \"b64_json\". Defaults to \"url\".\n",
"- `user` (str): A unique identifier representing your end-user, which will help OpenAI to monitor and detect abuse. [Learn more.](https://beta.openai.com/docs/usage-policies/end-user-ids)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set Edit Area\n",
"\n",
"An edit requires a \"mask\" to specify which portion of the image to regenerate. Any pixel with an alpha of 0 (transparent) will be regenerated. The code below creates a 1024x1024 mask where the bottom half is transparent."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"# create a mask\n",
"width = 1024\n",
"height = 1024\n",
"mask = Image.new(\"RGBA\", (width, height), (0, 0, 0, 1)) # create an opaque image mask\n",
"\n",
"# set the bottom half to be transparent\n",
"for x in range(width):\n",
" for y in range(height // 2, height): # only loop over the bottom half of the mask\n",
" # set alpha (A) to zero to turn pixel transparent\n",
" alpha = 0\n",
" mask.putpixel((x, y), (0, 0, 0, alpha))\n",
"\n",
"# save the mask\n",
"mask_name = \"bottom_half_mask.png\"\n",
"mask_filepath = os.path.join(image_dir, mask_name)\n",
"mask.save(mask_filepath)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Perform Edit\n",
"\n",
"Now we supply our image, caption and mask to the API to get 5 examples of edits to our image"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ImagesResponse(created=1701994167, data=[Image(b64_json=None, revised_prompt=None, url='https://oaidalleapiprodscus.blob.core.windows.net/private/org-9HXYFy8ux4r6aboFyec2OLRf/user-8OA8IvMYkfdAcUZXgzAXHS7d/img-9UOVGC7wB8MS2Q7Rwgj0fFBq.png?st=2023-12-07T23%3A09%3A27Z&se=2023-12-08T01%3A09%3A27Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-12-07T16%3A40%3A37Z&ske=2023-12-08T16%3A40%3A37Z&sks=b&skv=2021-08-06&sig=MsRMZ1rN434bVdWr%2B9kIoqu9CIrvZypZBfkQPTOhCl4%3D')])\n"
]
}
],
"source": [
"# edit an image\n",
"\n",
"# call the OpenAI API\n",
"edit_response = client.images.edit(\n",
" image=open(generated_image_filepath, \"rb\"), # from the generation section\n",
" mask=open(mask_filepath, \"rb\"), # from right above\n",
" prompt=prompt, # from the generation section\n",
" n=1,\n",
" size=\"1024x1024\",\n",
" response_format=\"url\",\n",
")\n",
"\n",
"# print response\n",
"print(edit_response)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"# save the image\n",
"edited_image_name = \"edited_image.png\" # any name you like; the filetype should be .png\n",
"edited_image_filepath = os.path.join(image_dir, edited_image_name)\n",
"edited_image_url = edit_response.data[0].url # extract image URL from response\n",
"edited_image = requests.get(edited_image_url).content # download the image\n",
"\n",
"with open(edited_image_filepath, \"wb\") as image_file:\n",
" image_file.write(edited_image) # write the image to the file"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# print the original image\n",
"print(generated_image_filepath)\n",
"display(Image.open(generated_image_filepath))\n",
"\n",
"# print edited image\n",
"print(edited_image_filepath)\n",
"display(Image.open(edited_image_filepath))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.9 ('openai')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
},
"vscode": {
"interpreter": {
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}