diff --git a/articles/what_is_new_with_dalle_3.mdx b/articles/what_is_new_with_dalle_3.mdx new file mode 100644 index 0000000..98d6c3d --- /dev/null +++ b/articles/what_is_new_with_dalle_3.mdx @@ -0,0 +1,202 @@ +# What’s new with DALL·E-3? + +DALL·E-3 is the latest version of our DALL·E text-to-image generation models. As the current state of the art in text-to-image generation, DALL·E is capable of generating high-quality images across a wide variety of domains. If you're interested in more technical details of how DALL·E-3 was built, you can read more about it in our [research paper](https://cdn.openai.com/papers/dall-e-3.pdf). I'll be going over some of the new features and capabilities of DALL·E-3 in this article, as well as some examples of what new products you can build with the API. + +As a reminder, the Image generation API hasn't changed and maintains the same endpoints and formatting as with DALL·E-2. If you're looking for a guide on how to use the Image API, see [the Cookbook article](https://cookbook.openai.com/examples/dalle/image_generations_edits_and_variations_with_dall-e) on the subject. + +The only API endpoint available for use with DALL·E-3 right now is **Generations** (/v1/images/generations). We don’t support variations or inpainting yet, though the Edits and Variations endpoints are available for use with DALL·E-2. + +## Generations + +The generation API endpoint creates an image based on a text prompt. We've added a few new parameters to enhance what you can create with our models. Here’s a quick overview of the options: + +### New parameters: + +- **model** (‘dall-e-2’ or ‘dall-e-3’): This is the model you’re generating with. Be careful to set it to ‘dall-e-3’ as it defaults to ‘dall-e-2’ if empty. +- **style** (‘natural’ or ‘vivid’): The style of the generated images. Must be one of vivid or natural. Vivid causes the model to lean towards generating hyper-real and dramatic images. 
Natural causes the model to produce more natural, less hyper-real looking images. Defaults to ‘vivid’. +- **quality** (‘standard’ or ‘hd’): The quality of the image that will be generated. ‘hd’ creates images with finer details and greater consistency across the image. Defaults to ‘standard’. + +### Other parameters: + +- **prompt** (str): A text description of the desired image(s). The maximum length is 1000 characters for DALL·E-2 and 4000 characters for DALL·E-3. Required field. +- **n** (int): The number of images to generate. Must be between 1 and 10. Defaults to 1. For dall-e-3, only n=1 is supported. +- **size** (...): The size of the generated images. Must be one of 256x256, 512x512, or 1024x1024 for DALL·E-2 models. Must be one of 1024x1024, 1792x1024, or 1024x1792 for DALL·E-3 models. +- **response_format** ('url' or 'b64_json'): The format in which the generated images are returned. Must be one of "url" or "b64_json". Defaults to "url". +- **user** (str): A unique identifier representing your end-user, which will help OpenAI to monitor and detect abuse. Learn more. + +## New Features + +Our launch of DALL·E-3 comes with lots of new features and capabilities to help you generate the images you want. Here’s a quick overview of what’s new: + +### Prompt Rewriting + +A new feature in the latest DALL·E-3 API is prompt rewriting, where we use GPT-4 to optimize all of your prompts before they’re passed to DALL·E. In our research, we’ve seen that using very detailed prompts gives significantly better results. You can read more about our captioning, prompting, and safety mitigations in the [DALL·E-3 research paper](https://cdn.openai.com/papers/dall-e-3.pdf). 
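As a rough illustration of the parameters and prompt rewriting described above, here is a minimal Python sketch. The helper names `build_generation_request` and `extract_revised_prompt` are hypothetical and not part of the OpenAI SDK, and the sketch assumes the JSON response exposes the rewritten prompt in a `revised_prompt` field on each image entry. It makes no network call; it only validates the options listed in this article and shows where the rewritten prompt would be read from.

```python
# Hypothetical helpers for illustration only; they are not part of the OpenAI SDK.
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}
DALLE3_QUALITIES = {"standard", "hd"}
DALLE3_STYLES = {"vivid", "natural"}


def build_generation_request(prompt, size="1024x1024", quality="standard", style="vivid"):
    """Validate DALL·E-3 options and return keyword arguments for a generations call."""
    if size not in DALLE3_SIZES:
        raise ValueError(f"unsupported size for dall-e-3: {size}")
    if quality not in DALLE3_QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")
    if style not in DALLE3_STYLES:
        raise ValueError(f"unsupported style: {style}")
    return {
        "model": "dall-e-3",  # set explicitly, since the default is dall-e-2
        "prompt": prompt,
        "n": 1,  # dall-e-3 only supports n=1
        "size": size,
        "quality": quality,
        "style": style,
    }


def extract_revised_prompt(response):
    """Return the rewritten prompt from a response dict, or None if it is absent.

    Assumes a response shaped like {"data": [{"revised_prompt": "..."}]}.
    """
    data = response.get("data") or []
    if not data:
        return None
    return data[0].get("revised_prompt")


request = build_generation_request(
    "An infinite, uniform grid of tessellated cubes.",
    size="1792x1024",
    quality="hd",
)
print(request)
```

The returned dictionary could then be passed as keyword arguments to whichever client call you use for `/v1/images/generations`, and comparing your original prompt with the extracted rewritten prompt is one way to see how much detail the rewriting step added.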
+ +_Keep in mind that this feature can’t be disabled at the moment, though you can achieve a high level of fidelity by simply giving instructions to the relabeler in your prompt, as I'll show below with examples._ + +![Prompt Rewriting](/images/dalle_3/dalle_3_improved_prompts.png) + +### Standard vs HD Quality + +DALL·E-3 introduces a new 'quality' parameter that allows you to adjust the level of detail and organization in all of your generations. The 'standard' quality generations are the DALL·E-3 you're familiar with, with 'hd' generations bringing a new level of attention to detail and adherence to your prompt. Keep in mind that setting your generation quality to ‘hd’ increases the cost per image and often adds roughly 10 seconds to the generation time. + +For example, here we have two different icons in 'hd' and 'standard' quality. The choice between the two is often a matter of taste, but 'hd' tends to win when the task calls for capturing details and textures or composing a scene well. + +![Icons](/images/dalle_3/icons.jpg) + +Here's another example, this time with a prompt of 'An infinite, uniform grid of tessellated cubes.', which DALL·E conveniently rewrites as _"An infinite, uniform grid of tessellated cubes painted carefully in an isometric perspective. The cubes are meticulously arranged in such a way that they seem to stretch endlessly into the distance. Each cube is identical to the next, with light reflecting consistently across all surfaces, underscoring their uniformity. This is a digitally rendered image."_: + +![Cubes](/images/dalle_3/cubes.jpg) + +### New Sizes + +DALL·E-3 accepts three different image sizes: 1024px by 1024px, 1792px by 1024px, and 1024px by 1792px. Beyond giving more flexibility in terms of aspect ratio, these sizes can have significant effects on the style and context of your generated image. 
For example, vertical images might work better when you’re looking for an image that looks like it was taken by a cellphone camera, while horizontal images may work better for landscape paintings or digital designs. + +To demonstrate this difference, here are multiple variations on the same input prompt, each with a different aspect ratio. In this case, my prompt was: “Professional photoshoot of a Chemex brewer in the process of brewing coffee.” (For reference, this is a photo of [a real Chemex brewer](https://m.media-amazon.com/images/I/61lrld81vxL.jpg)). + +Here is the generation in square form (in both HD and standard qualities): + +![square_coffee](/images/dalle_3/square_coffee.jpg) + +You can see how these images are framed closely to the item and seem to be taken in a more closed space with various surrounding items nearby. + +Here are the results for the same prompt with a wider aspect ratio: + +![wide_coffee](/images/dalle_3/wide_coffee.jpg) + +Compared to the previous generations, these come in the form of close-ups. The background is blurred, with greater focus on the item itself, more like professionally organized photoshoots than quick snaps. + +Lastly, we have the vertical aspect ratio: + +![tall_coffee](/images/dalle_3/tall_coffee.jpg) + +These feel more akin to cellphone images, with a more candid appearance. There’s more action involved: the slowly dripping coffee or the active pour from the pot. + +### New Styles + +DALL·E-3 introduces two new styles: natural and vivid. The natural style is more similar to the DALL·E-2 style in its 'blander' realism, while the vivid style is a new style that leans towards generating hyper-real and cinematic images. For reference, all DALL·E generations in ChatGPT are generated in the 'vivid' style. + +The natural style is specifically useful in cases where DALL·E-3 over-exaggerates or confuses a subject that's supposed to be simpler, more subdued, or more realistic. 
I've often used it for logo generation, stock photos, or other cases where I'm trying to match a real-world object. + +Here's an example of the same prompt as above in the vivid style. The vivid style is far more cinematic (and looks great), but might pop too much if you're not looking for that. + +![vivid_coffee](/images/dalle_3/vivid_coffee.jpg) + +There are many cases in which I prefer the natural style, such as this example of a painting in the style of Thomas Cole's 'Desolation': + +![thomas_cole](/images/dalle_3/thomas_cole.jpg) + +## Examples and Prompts + +To help you get started building with DALL·E-3, I've come up with a few examples of products you could build with the API, and collected some styles and capabilities that seem to be unique to DALL·E-3 at the moment. I've also listed some subjects that I'm struggling to prompt DALL·E-3 to generate in case you want to try your hand at it. + +### Icon Generation + +Have you ever struggled to find the perfect icon for your website or app? It would be awesome to see a custom icon generator app that lets you pick the style, size, and subject of your icon, and then generates a custom SVG from the DALL·E generation. Here are some examples of helpful website icons I generated with DALL·E-3: + +![icon_set](/images/dalle_3/icon_set.jpg) + +In this case, I used Potrace, which you can download [here](http://potrace.sourceforge.net/), to convert the images to SVGs. This is what I used to convert the images: + +```bash +potrace -s cat.jpg -o cat.svg +``` + +You might need to boost the brightness and contrast of the image before converting it to an SVG. I used the following command to do so: + +```bash +convert cat.jpg -brightness-contrast 50x50 cat.jpg +``` + +### Logo Generation + +DALL·E-3 is great at jumpstarting the logo creation process for your company or product. 
By prompting DALL·E to create 'Vector logo design of a Greek statue, minimalistic, with a white background' I achieved the following: + +![logo_greece](/images/dalle_3/logo_greece.jpg) + +Here's another logo I created, this time for an Arabian coffee shop: + +![logo_arabia](/images/dalle_3/logo_arabia.jpg) + +To iterate on an existing logo, I took OpenAI's logo, asked GPT-4V to describe it, and then asked DALL·E to generate variations on the logo: + +![iteration](/images/dalle_3/iteration.jpg) + +### Custom Tattoos + +DALL·E-3 is great at generating line art, which might be useful for generating custom tattoos. Here is some line art I generated with DALL·E-3: + +![tattoos](/images/dalle_3/tattoos.jpg) + +### Die-Cut Stickers & T-Shirts + +What if you could generate custom die-cut stickers and t-shirts with DALL·E-3, integrating with a print-on-demand service like Printful or Stickermule? You could have a custom sticker or t-shirt in minutes, with no design experience required. Here are some examples of stickers I generated with DALL·E-3: + +![stickers](/images/dalle_3/stickers.jpg) + +### Minecraft Skins + +With some difficulty, I managed to prompt DALL·E-3 to generate Minecraft skins. I'm sure with some clever prompting you could get DALL·E-3 to reliably generate incredible Minecraft skins. Using the word 'Minecraft' directly can be tricky, since DALL·E might think you are trying to generate content from the game itself. Instead, you can communicate the idea differently: "Flat player skin texture of a ninja skin, compatible with Minecraftskins.com or Planet Minecraft." + +Here's what I managed to create. They might need some work, but I think they're a good start: + +![minecraft](/images/dalle_3/minecraft.jpg) + +### And much more... + +Here are some ideas I've had that I haven't had time to try yet: + +- Custom emojis or Twitch emotes? +- Vector illustrations? +- Personalized Bitmoji-style avatars? +- Album art? +- Custom greeting cards? 
+- Poster/flyer 'pair-programming' with DALL·E? + +## Showcase + +We're really just starting to figure out what DALL·E-3 is capable of. Here are some of the best styles, generations, and prompts I've seen so far. I've been unable to locate the original authors of some of these images, so if you know who created them, please let me know! + +![collage](/images/dalle_3/collage.jpg) + +Sources: + +[@scharan79 on Reddit](https://www.reddit.com/r/dalle2/comments/170ce1r/dalle_3_is_pretty_good_at_drawing/) +[@TalentedJuli on Reddit](https://www.reddit.com/r/dalle2/comments/1712x7a/60s_pulp_magazine_illustration_is_the_best_style/) +[@Wild-Culture-5068 on Reddit](https://www.reddit.com/r/dalle2/comments/17dwp0s/soviet_blade_runner/) +[@popsicle_pope on Reddit](https://www.reddit.com/r/dalle2/comments/170lx1z/%F0%9D%94%AA%F0%9D%94%A2%F0%9D%94%B1%F0%9D%94%9E%F0%9D%94%AA%F0%9D%94%AC%F0%9D%94%AF%F0%9D%94%AD%F0%9D%94%A5%F0%9D%94%AC%F0%9D%94%B0%F0%9D%94%A6%F0%9D%94%B0/) +[@gopatrik on Twitter](https://twitter.com/gopatrik/status/1717579802205626619) +[@ARTiV3RSE on Twitter](https://twitter.com/ARTiV3RSE/status/1720202013638599040) +[@willdepue on Twitter](https://twitter.com/willdepue/status/1705677997150445941) +Various OpenAI employees + +## Challenges + +DALL·E-3 is still very new and there are still a lot of things it struggles with (or maybe I just haven't figured out how to prompt it correctly yet). Here are some challenges you might want to try your hand at: + +### Web Design + +DALL·E really struggles to generate realistic-looking websites, apps, etc. and often generates what looks like a portfolio page of a web designer. Here's the best I've gotten so far: + +![websites](/images/dalle_3/websites.jpg) + +### Seamless Textures + +It feels like DALL·E-3 is so close to being able to generate seamless textures. Often they come out great, just slightly cut off or with a few artifacts. 
See examples below: + +![seamless](/images/dalle_3/seamless.jpg) + +### Fonts + +Using DALL·E to generate custom fonts or iterate on letter designs could be really cool, but I haven't been able to get it to work yet. Here's the best I've gotten so far: + +![fonts](/images/dalle_3/fonts.jpg) + +## More Resources + +Thanks for reading! If you're looking for more resources on DALL·E-3, here are some related links: + +- [DALL·E-3 Blog Post](https://openai.com/dall-e-3) +- [DALL·E-3 Research Paper](https://cdn.openai.com/papers/dall-e-3.pdf) +- [Image API Documentation](https://platform.openai.com/docs/api-reference/images) +- [Image API Cookbook](https://cookbook.openai.com/examples/dalle/image_generations_edits_and_variations_with_dall-e) diff --git a/examples/Deterministic_outputs_with_the_seed_parameter.ipynb b/examples/Deterministic_outputs_with_the_seed_parameter.ipynb new file mode 100644 index 0000000..ddac247 --- /dev/null +++ b/examples/Deterministic_outputs_with_the_seed_parameter.ipynb @@ -0,0 +1 @@ +{"cells":[{"cell_type":"markdown","metadata":{"cell_id":"67bb097e130b41099c9d257dc06a4054","deepnote_cell_type":"markdown"},"source":["# How to make your completion outputs consistent with the new seed parameter\n","\n","**TLDR**: Developers can now specify the `seed` parameter in a Chat Completions request for consistent completions. We always include a `system_fingerprint` in the response that helps developers understand changes in our system that will affect determinism.\n","\n","### Context\n","\n","Determinism has always been a big request from user communities when using our APIs. 
For instance, with deterministic numerical results, users can unlock quite a few use cases that are sensitive to numerical changes.\n","\n","#### Model level features for consistent outputs\n","\n","The Chat Completions and Completions APIs are non-deterministic by default (which means model outputs may differ from request to request), but now offer some control towards deterministic outputs using a few model level controls.\n","\n","This unlocks consistent completions, which gives you more control over model behavior for anything built on top of the APIs. It is also quite useful for reproducing results and testing, since you get peace of mind from knowing what to expect.\n","\n","#### Implementing consistent outputs\n","\n","To receive _mostly_ deterministic outputs across API calls:\n","\n","- Set the `seed` parameter to any integer of your choice, but use the same value across requests. For example, `12345`.\n","- Set all other parameters (prompt, temperature, top_p, etc.) to the same values across requests.\n","- In the response, check the `system_fingerprint` field. The system fingerprint is an identifier for the current combination of model weights, infrastructure, and other configuration options used by OpenAI servers to generate the completion. It changes whenever you change request parameters, or OpenAI updates the numerical configuration of the infrastructure serving our models (which may happen a few times a year).\n","\n","If the `seed`, request parameters, and `system_fingerprint` all match across your requests, then model outputs will mostly be identical. 
There is a small chance that responses differ even when request parameters and `system_fingerprint` match, due to the inherent non-determinism of computers.\n"]},{"cell_type":"markdown","metadata":{"cell_id":"f49611fa59af4303883d76c491095fea","deepnote_cell_type":"markdown"},"source":["### Model level controls for consistent outputs - `seed` and `system_fingerprint`\n","\n","##### `seed`\n","\n","If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend.\n","\n","##### `system_fingerprint`\n","\n","This fingerprint represents the backend configuration that the model runs with. It can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism. This is the indicator of whether users should expect \"almost always the same result\".\n"]},{"cell_type":"markdown","metadata":{"cell_id":"cc6cd37b9a2243aaa4688ef8832512eb","deepnote_cell_type":"markdown"},"source":["## Example: Generating a consistent short story with a fixed seed\n","\n","In this example, we will demonstrate how to generate a consistent short story using a fixed seed. 
This can be particularly useful in scenarios where you need to reproduce the same results for testing, debugging, or for applications that require consistent outputs.\n"]},{"cell_type":"code","execution_count":null,"metadata":{"cell_id":"48fd2d4c95ad465090ef97254a4a10d2","deepnote_cell_type":"code"},"outputs":[],"source":["import asyncio\n","import openai\n","import pprint\n","import difflib\n","from IPython.display import display, HTML\n","\n","GPT_MODEL = \"gpt-3.5-turbo-1106\""]},{"cell_type":"code","execution_count":null,"metadata":{"cell_id":"e54e0958be3746d39b6e4c16c59b395a","deepnote_cell_type":"code","deepnote_to_be_reexecuted":false,"execution_millis":5,"execution_start":1699034108287,"source_hash":null},"outputs":[],"source":["async def get_chat_response(system_message: str, user_request: str, seed: int = None):\n"," try:\n"," messages = [\n"," {\"role\": \"system\", \"content\": system_message},\n"," {\"role\": \"user\", \"content\": user_request},\n"," ]\n","\n"," response = openai.ChatCompletion.create(\n"," model=GPT_MODEL,\n"," messages=messages,\n"," seed=seed,\n"," max_tokens=200,\n"," temperature=0.7,\n"," )\n","\n"," response_content = response[\"choices\"][0][\"message\"][\"content\"]\n"," system_fingerprint = response[\"system_fingerprint\"]\n"," prompt_tokens = response[\"usage\"][\"prompt_tokens\"]\n"," completion_tokens = (\n"," response[\"usage\"][\"total_tokens\"] - response[\"usage\"][\"prompt_tokens\"]\n"," )\n","\n"," table = f\"\"\"\n"," \n"," \n"," \n"," \n"," \n","
Response{response_content}
System Fingerprint{system_fingerprint}
Number of prompt tokens{prompt_tokens}
Number of completion tokens{completion_tokens}
\n"," \"\"\"\n"," display(HTML(table))\n","\n"," return response_content\n"," except Exception as e:\n"," print(f\"An error occurred: {e}\")\n"," return None\n","\n","\n","# This function compares two responses and displays the differences in a table.\n","# Deletions are highlighted in red and additions are highlighted in green.\n","# If no differences are found, it prints \"No differences found.\"\n","\n","\n","def compare_responses(previous_response: str, response: str):\n"," d = difflib.Differ()\n"," diff = d.compare(previous_response.splitlines(), response.splitlines())\n","\n"," diff_table = \"\"\n"," diff_exists = False\n","\n"," for line in diff:\n"," if line.startswith(\"- \"):\n"," diff_table += f\"\"\n"," diff_exists = True\n"," elif line.startswith(\"+ \"):\n"," diff_table += f\"\"\n"," diff_exists = True\n"," else:\n"," diff_table += f\"\"\n","\n"," diff_table += \"
{line}
{line}
{line}
\"\n","\n"," if diff_exists:\n"," display(HTML(diff_table))\n"," else:\n"," print(\"No differences found.\")"]},{"cell_type":"markdown","metadata":{"cell_id":"dfa39a438aa948cc910a46254df937af","deepnote_cell_type":"text-cell-p","formattedRanges":[]},"source":["First, let's try generating a short story about \"a journey to Mars\" without the `seed` parameter. This is the default behavior:\n"]},{"cell_type":"code","execution_count":null,"metadata":{"cell_id":"9d09f63309c449e4929364caccfd7065","deepnote_cell_type":"code","deepnote_to_be_reexecuted":false,"execution_millis":964,"execution_start":1699034108745,"source_hash":null},"outputs":[{"data":{"text/html":["\n"," \n"," \n"," \n"," \n"," \n","
ResponseIn the year 2050, a team of courageous astronauts embarked on a groundbreaking mission to Mars. The journey was filled with uncertainty and danger, but the crew was undeterred by the challenges that lay ahead.\n","\n","As their spacecraft hurtled through the vast expanse of space, the astronauts marveled at the beauty of the stars and the distant planets. They passed the time by conducting experiments, training for the mission ahead, and bonding with one another.\n","\n","After months of travel, the red planet finally came into view. The crew prepared for the landing, their hearts pounding with a mix of excitement and nervous anticipation. As the spacecraft touched down on the Martian surface, cheers erupted in the control room back on Earth.\n","\n","The astronauts stepped out onto the alien terrain, taking in the breathtaking landscape of rusty red rocks and dusty plains. They set up their base camp and began their scientific research, collecting samples and conducting experiments to better understand the planet's composition and potential for sustaining life.\n","\n","Despite the challenges of living
System Fingerprintfp_fefa7b2153
Number of prompt tokens31
Number of completion tokens200
\n"," "],"text/plain":[""]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["\n"," \n"," \n"," \n"," \n"," \n","
ResponseIn the year 2050, a team of astronauts set out on a groundbreaking mission to Mars. The journey was long and arduous, but the crew was determined to make history. As they approached the red planet, they marveled at its otherworldly beauty and the sense of awe and wonder filled their hearts.\n","\n","Upon landing, the astronauts began to explore the alien landscape, conducting scientific experiments and collecting samples. They were amazed by the vast canyons, towering mountains, and the eerie silence that surrounded them. Each step they took was a giant leap for humankind, and they felt a profound sense of accomplishment.\n","\n","As they prepared to return to Earth, the astronauts reflected on the significance of their journey. They knew that their discoveries would pave the way for future generations to explore and inhabit Mars. With their mission complete, they boarded their spacecraft and set their sights on the distant blue planet in the sky, knowing that they had left their mark on the history of space exploration.
System Fingerprintfp_fefa7b2153
Number of prompt tokens31
Number of completion tokens198
\n"," "],"text/plain":[""]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["
- In the year 2050, a team of courageous astronauts embarked on a groundbreaking mission to Mars. The journey was filled with uncertainty and danger, but the crew was undeterred by the challenges that lay ahead.
+ In the year 2050, a team of astronauts set out on a groundbreaking mission to Mars. The journey was long and arduous, but the crew was determined to make history. As they approached the red planet, they marveled at its otherworldly beauty and the sense of awe and wonder filled their hearts.
- As their spacecraft hurtled through the vast expanse of space, the astronauts marveled at the beauty of the stars and the distant planets. They passed the time by conducting experiments, training for the mission ahead, and bonding with one another.
+ Upon landing, the astronauts began to explore the alien landscape, conducting scientific experiments and collecting samples. They were amazed by the vast canyons, towering mountains, and the eerie silence that surrounded them. Each step they took was a giant leap for humankind, and they felt a profound sense of accomplishment.
+ As they prepared to return to Earth, the astronauts reflected on the significance of their journey. They knew that their discoveries would pave the way for future generations to explore and inhabit Mars. With their mission complete, they boarded their spacecraft and set their sights on the distant blue planet in the sky, knowing that they had left their mark on the history of space exploration.
- After months of travel, the red planet finally came into view. The crew prepared for the landing, their hearts pounding with a mix of excitement and nervous anticipation. As the spacecraft touched down on the Martian surface, cheers erupted in the control room back on Earth.
-
- The astronauts stepped out onto the alien terrain, taking in the breathtaking landscape of rusty red rocks and dusty plains. They set up their base camp and began their scientific research, collecting samples and conducting experiments to better understand the planet's composition and potential for sustaining life.
-
- Despite the challenges of living
"],"text/plain":[""]},"metadata":{},"output_type":"display_data"}],"source":["topic = \"a journey to Mars\"\n","system_message = \"You are a helpful assistant that generates short stories.\"\n","user_request = f\"Generate a short story about {topic}.\"\n","\n","previous_response = await get_chat_response(\n"," system_message=system_message, user_request=user_request\n",")\n","\n","response = await get_chat_response(\n"," system_message=system_message, user_request=user_request\n",")\n","\n","# The function compare_responses is then called with the two responses as arguments.\n","# This function will compare the two responses and display the differences in a table.\n","# If no differences are found, it will print \"No differences found.\"\n","compare_responses(previous_response, response)"]},{"cell_type":"markdown","metadata":{"cell_id":"e7eaf30e13ac4841b11dcffc505379c1","deepnote_cell_type":"markdown"},"source":["Now, let's try to generate the short story with the same topic (a journey to Mars) with a constant `seed` of 123 and compare the responses and `system_fingerprint`.\n"]},{"cell_type":"code","execution_count":null,"metadata":{"cell_id":"a5754b8ef4074cf7adb479d44bebd97b","deepnote_cell_type":"code"},"outputs":[{"data":{"text/html":["\n"," \n"," \n"," \n"," \n"," \n","
ResponseIn the not-so-distant future, a team of brave astronauts embarked on a groundbreaking journey to Mars. The spacecraft, named \"Odyssey,\" soared through the vast expanse of space, leaving Earth behind as they ventured toward the mysterious red planet.\n","\n","As the crew navigated through the cosmos, they encountered a series of challenges and obstacles, from intense solar flares to treacherous asteroid fields. However, their unwavering determination and spirit of camaraderie propelled them forward, overcoming each hurdle with courage and resilience.\n","\n","Upon reaching Mars, the astronauts were greeted by a breathtaking landscape of rust-colored deserts and towering canyons. They marveled at the alien terrain, conducting scientific experiments and collecting samples to better understand the planet's enigmatic history.\n","\n","Amidst their exploration, the crew faced unexpected setbacks, including a sudden dust storm that threatened their safety. Yet, they stood united, devising ingenious solutions and supporting each other through the adversity.\n","\n","After a successful mission on Mars, the
System Fingerprintfp_fefa7b2153
Number of prompt tokens31
Number of completion tokens200
\n"," "],"text/plain":[""]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["\n"," \n"," \n"," \n"," \n"," \n","
ResponseIn the not-so-distant future, a team of brave astronauts embarked on a groundbreaking journey to Mars. The spacecraft, named \"Odyssey,\" soared through the vast expanse of space, leaving Earth behind as they ventured toward the mysterious red planet.\n","\n","As the crew navigated through the cosmos, they encountered a series of challenges and obstacles, from intense solar flares to treacherous asteroid fields. However, their unwavering determination and spirit of camaraderie propelled them forward, overcoming each hurdle with courage and resilience.\n","\n","Upon reaching Mars, the astronauts were greeted by a breathtaking landscape of rust-colored deserts and towering canyons. They marveled at the alien terrain, conducting scientific experiments and collecting samples to better understand the planet's enigmatic history.\n","\n","Amidst their exploration, the crew faced unexpected setbacks, including a sudden dust storm that threatened their safety. Yet, they stood united, devising ingenious solutions and supporting each other through the adversity.\n","\n","After a successful mission on Mars, the
System Fingerprintfp_fefa7b2153
Number of prompt tokens31
Number of completion tokens200
\n"," "],"text/plain":[""]},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":["No differences found.\n"]}],"source":["SEED = 123\n","response = await get_chat_response(\n"," system_message=system_message, seed=SEED, user_request=user_request\n",")\n","previous_response = response\n","response = await get_chat_response(\n"," system_message=system_message, seed=SEED, user_request=user_request\n",")\n","\n","compare_responses(previous_response, response)"]},{"cell_type":"markdown","metadata":{"cell_id":"f6c8ae9a6e29451baaeb52b7203fbea8","deepnote_cell_type":"markdown"},"source":["## Conclusion\n","\n","We demonstrated how to use a fixed integer `seed` to generate consistent outputs from our model. This is particularly useful in scenarios where reproducibility is important. However, it's important to note that while the `seed` helps with consistency, it does not guarantee the quality of the output. For instance, in the example provided, we used the same seed to generate a short story about a journey to Mars. Despite querying the model multiple times, the output remained consistent, demonstrating the effectiveness of using this model level control for reproducibility. 
Another great extension of this could be to use a consistent `seed` when benchmarking/evaluating the performance of different prompts or models, to ensure that each version is evaluated under the same conditions, making the comparisons fair and the results reliable.\n"]}],"metadata":{"deepnote":{},"deepnote_execution_queue":[],"deepnote_notebook_id":"90ee66ed8ee74f0dad849c869f1da806","kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.9.13"}},"nbformat":4,"nbformat_minor":0} diff --git a/examples/GPT_with_vision_for_video_understanding.ipynb b/examples/GPT_with_vision_for_video_understanding.ipynb new file mode 100644 index 0000000..73ed16b --- /dev/null +++ b/examples/GPT_with_vision_for_video_understanding.ipynb @@ -0,0 +1,277 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Processing and narrating a video with GPT's visual capabilities and the TTS API\n", + "\n", + "This notebook demonstrates how to use GPT's visual capabilities with a video. GPT-4 doesn't take videos as input directly, but we can use vision and the new 128K context window to describe the static frames of a whole video at once. We'll walk through two examples:\n", + "\n", + "1. Using GPT-4 to get a description of a video\n", + "2. 
Generating a voiceover for a video with GPT-4 and the TTS API\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import display, Image, Audio\n", + "\n", + "import cv2 # We're using OpenCV to read video\n", + "import base64\n", + "import time\n", + "import openai\n", + "import os\n", + "import requests" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Using GPT's visual capabilities to get a description of a video\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First we use OpenCV to extract frames from a nature video containing bisons and wolves:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "618 frames read.\n" + ] + } + ], + "source": [ + "video = cv2.VideoCapture(\"bison.mp4\")\n", + "\n", + "base64Frames = []\n", + "while video.isOpened():\n", + " success, frame = video.read()\n", + " if not success:\n", + " break\n", + " _, buffer = cv2.imencode(\".jpg\", frame)\n", + " base64Frames.append(base64.b64encode(buffer).decode(\"utf-8\"))\n", + "\n", + "video.release()\n", + "print(len(base64Frames), \"frames read.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Display frames to make sure we've read them in correctly:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "image/jpeg": "", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "display_handle = display(None, display_id=True)\n", + "for img in base64Frames:\n", + " display_handle.update(Image(data=base64.b64decode(img.encode(\"utf-8\"))))\n", + " time.sleep(0.025)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once we have the video frames we craft our prompt and send a 
request to GPT (Note that we don't need to send every frame for GPT to understand what's going on):\n" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Title: Survival Instincts: The Epic Standoff between Bison and Wolves in the Frigid Wild\n", + "\n", + "Description: \n", + "Witness the raw and riveting drama unfold on the snowy plains, where the formidable bison faces off against a determined pack of wolves. Set against a dazzling winter backdrop, this video captures the harrowing and breathtaking interaction between predator and prey. With survival at stake, each move is a dance of life and death in nature's great theater. Watch as these majestic creatures engage in a timeless struggle, showcasing the power, resilience, and indomitable spirit that define the wild. Join us for an unforgettable journey into the heart of nature's resilience – the ultimate testament to the cycle of life in the animal kingdom. #Wildlife #Nature #Survival #BisonVsWolves\n" + ] + } + ], + "source": [ + "PROMPT_MESSAGES = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " \"These are frames from a video that I want to upload. Generate a compelling description that I can upload along with the video.\",\n", + " *map(lambda x: {\"image\": x, \"resize\": 768}, base64Frames[0::10]),\n", + " ],\n", + " },\n", + "]\n", + "params = {\n", + " \"model\": \"gpt-4-vision-preview\",\n", + " \"messages\": PROMPT_MESSAGES,\n", + " \"api_key\": os.environ[\"OPENAI_API_KEY\"],\n", + " \"headers\": {\"Openai-Version\": \"2020-11-07\"},\n", + " \"max_tokens\": 200,\n", + "}\n", + "\n", + "result = openai.ChatCompletion.create(**params)\n", + "print(result.choices[0].message.content)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. 
Generating a voiceover for a video with GPT-4 and the TTS API\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's create a voiceover for this video in the style of David Attenborough. Using the same video frames we prompt GPT to give us a short script:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "In the vast, white expanse of the winter landscape, a drama unfolds that is as timeless as it is raw. Here, in the cradle of nature's harshest trials, a pack of grey wolves has singled out a bison from the herd—a desperate struggle for life and sustenance is about to begin.\n", + "\n", + "In a carefully orchestrated assault, the pack encircles their quarry, each wolf keenly aware of its role. Muscles tense and breaths visible in the frigid air, they inch closer, probing for a weakness. The bison, a formidable giant, stands its ground, backed by the survival instincts honed over millennia. Its hulking form casts a solitary shadow against the snow's blinding canvas.\n", + "\n", + "The dance of predator and prey plays out as a symphony of survival—each movement, each feint, holds the weight of life itself. The wolves take turns attacking, conserving strength while wearing down their target. The herd, once the bison's allies, scatter into the distance, a stark reminder that in these wild territories, the law of survival supersedes the bonds of kinship.\n", + "\n", + "A burst of activity—the wolves close in. The bison, though mighty, is tiring, its breaths labored, its movements sluggish. The wolves sense the turning tide. With relentless determination, they press their advantage, a testament to the brutal beauty of the natural order.\n", + "\n", + "As the struggle reaches its inevitable conclusion, we are reminded of the delicate balance that governs these wild spaces. 
Life, death, struggle, and survival—the cycle continues, each chapter written in the snow, for as long as the wolf roams and the bison roves these frozen plains.\n" + ] + } + ], + "source": [ + "PROMPT_MESSAGES = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " \"These are frames of a video. Create a short voiceover script in the style of David Attenborough. Only include the narration.\",\n", + " *map(lambda x: {\"image\": x, \"resize\": 768}, base64Frames[0::10]),\n", + " ],\n", + " },\n", + "]\n", + "params = {\n", + " \"model\": \"gpt-4-vision-preview\",\n", + " \"messages\": PROMPT_MESSAGES,\n", + " \"api_key\": os.environ[\"OPENAI_API_KEY\"],\n", + " \"headers\": {\"Openai-Version\": \"2020-11-07\"},\n", + " \"max_tokens\": 500,\n", + "}\n", + "\n", + "result = openai.ChatCompletion.create(**params)\n", + "print(result.choices[0].message.content)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can pass the script to the TTS API, which will generate an MP3 of the voiceover:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "response = requests.post(\n", + " \"https://api.openai.com/v1/audio/speech\",\n", + " headers={\n", + " \"Authorization\": f\"Bearer {os.environ['OPENAI_API_KEY']}\",\n", + " },\n", + " json={\n", + " \"model\": \"tts-1\",\n", + " \"input\": result.choices[0].message.content,\n", + " \"voice\": \"onyx\",\n", + " },\n", + ")\n", + "\n", + "audio = b\"\"\n", + "for chunk in response.iter_content(chunk_size=1024 * 1024):\n", + " audio += chunk\n", + "Audio(audio, autoplay=True)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "openai", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + 
"name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/images/bison.mp4 b/images/bison.mp4 new file mode 100644 index 0000000..9f1700a Binary files /dev/null and b/images/bison.mp4 differ diff --git a/registry.yaml b/registry.yaml index 1bb9080..1db834d 100644 --- a/registry.yaml +++ b/registry.yaml @@ -1069,3 +1069,29 @@ tags: - completions - functions + +- title: Processing and narrating a video with GPT's visual capabilities and the TTS API + path: examples/GPT_with_vision_for_video_understanding.ipynb + date: 2023-11-06 + authors: + - cathykc + tags: + - completions + - vision + - speech + +- title: What's new with DALL·E-3? + path: articles/what_is_new_with_dalle_3.mdx + date: 2023-11-06 + authors: + - 0hq + tags: + - dall-e + +- title: What's new with DALL·E-3? + path: examples/Deterministic_outputs_with_the_seed_parameter.ipynb + date: 2023-11-06 + authors: + - shyamal-anadkat + tags: + - completions