{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Processing and narrating a video with GPT's visual capabilities and the TTS API\n",
"\n",
"This notebook demonstrates how to use GPT's visual capabilities with a video. GPT-4 doesn't take videos as input directly, but we can use vision and the new 128K context window to describe the static frames of a whole video at once. We'll walk through two examples:\n",
"\n",
"1. Using GPT-4 to get a description of a video\n",
"2. Generating a voiceover for a video with GPT-4 and the TTS API\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import display, Image, Audio\n",
"\n",
"import cv2  # OpenCV reads the video frames; install it with: pip install opencv-python\n",
"import base64\n",
"import time\n",
"from openai import OpenAI\n",
"import os\n",
"import requests\n",
"\n",
"client = OpenAI()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Using GPT's visual capabilities to get a description of a video\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we use OpenCV to extract frames from a nature [video](https://www.youtube.com/watch?v=kQ_7GtE529M) containing bisons and wolves:\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"618 frames read.\n"
]
}
],
"source": [
"video = cv2.VideoCapture(\"data/bison.mp4\")\n",
"\n",
"base64Frames = []\n",
"while video.isOpened():\n",
"    success, frame = video.read()\n",
"    if not success:\n",
"        break\n",
"    _, buffer = cv2.imencode(\".jpg\", frame)\n",
"    base64Frames.append(base64.b64encode(buffer).decode(\"utf-8\"))\n",
"\n",
"video.release()\n",
"print(len(base64Frames), \"frames read.\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Display frames to make sure we've read them in correctly:\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display_handle = display(None, display_id=True)\n",
"for img in base64Frames:\n",
"    display_handle.update(Image(data=base64.b64decode(img.encode(\"utf-8\"))))\n",
"    time.sleep(0.025)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once we have the video frames, we craft our prompt and send a request to GPT. We don't need to send every frame for GPT to understand what's going on, so the request includes only every 50th frame:\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Title: \"Survival in the Snow: The Thrilling Hunt of Wolves vs Bison\"\n",
"\n",
"Description: \"Immerse yourself in the raw beauty and sheer force of nature with this gripping video showcasing a pack of wolves on the hunt in a snow-covered wilderness. Witness the intense struggle for survival as these cunning predators work together to challenge a formidable bison. This rare footage captures the heart-pounding dynamics between hunter and prey amidst a stunning winter landscape. Every frame is a testament to the resilience and instinct that drives the wild. Watch the dance of life and death unfold in this captivating display of wildlife interaction. Don't forget to like, share, and subscribe for more breathtaking nature encounters!\"\n",
"\n",
"Remember to respect wildlife and consider the sensitivity of some viewers when sharing content like this. It's important to provide educational or scientific context to help viewers understand the natural behaviors of wildlife within their ecosystems.\n"
]
}
],
"source": [
"PROMPT_MESSAGES = [\n",
"    {\n",
"        \"role\": \"user\",\n",
"        \"content\": [\n",
"            \"These are frames from a video that I want to upload. Generate a compelling description that I can upload along with the video.\",\n",
"            *map(lambda x: {\"image\": x, \"resize\": 768}, base64Frames[0::50]),\n",
"        ],\n",
"    },\n",
"]\n",
"params = {\n",
"    \"model\": \"gpt-4-vision-preview\",\n",
"    \"messages\": PROMPT_MESSAGES,\n",
"    \"max_tokens\": 200,\n",
"}\n",
"\n",
"result = client.chat.completions.create(**params)\n",
"print(result.choices[0].message.content)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Generating a voiceover for a video with GPT-4 and the TTS API\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's create a voiceover for this video in the style of David Attenborough. Using the same video frames, we prompt GPT to give us a short script:\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In the vast expanse of the snow-covered plains, drama unfolds as the circle of life pits two remarkable species in a struggle for survival. Here we see a pack of wolves, these cunning predators have isolated a bison from the herd, their strategy is as brutal as it is effective.\n",
"\n",
"The wolves work in unison, each member knows its role. Harassing and tiring their prey is their game. As the bison fights to fend off the attackers from every direction, its strength and stamina dwindle. The relentless wolves, embodying resilience and perseverance, continue their assault. Such encounters can last for many exhausting hours under the cold gaze of the winter sun.\n",
"\n",
"As the struggle continues, the snow beneath them is churned into a blizzard of powder. The bison, desperate to protect its own life, swings its massive head, trying to keep the wolves at bay. But the pack senses victory - every snap of their jaws is measured, every darting movement calculated.\n",
"\n",
"Despite the bison's formidable size and power, the wolves' tireless tactics begin to take their toll. Inevitably, nature's harsh judgement descends, and the balance between life and death tips.\n",
"\n",
"As the bison finally succumbs, the wolves secure their meal - essential nourishment that will keep the pack alive in this frigid wilderness. This is a raw display of nature's rule: only the fittest will survive the relentless churn of the seasons.\n",
"\n",
"As the pack claims their hard-won prize, we are reminded of the grueling reality for wildlife in these frozen lands. Life here is a relentless quest for survival, and for these wolves, today's success is but a momentary respite in the eternal struggle of the wild.\n"
]
}
],
"source": [
"PROMPT_MESSAGES = [\n",
"    {\n",
"        \"role\": \"user\",\n",
"        \"content\": [\n",
"            \"These are frames of a video. Create a short voiceover script in the style of David Attenborough. Only include the narration.\",\n",
"            *map(lambda x: {\"image\": x, \"resize\": 768}, base64Frames[0::60]),\n",
"        ],\n",
"    },\n",
"]\n",
"params = {\n",
"    \"model\": \"gpt-4-vision-preview\",\n",
"    \"messages\": PROMPT_MESSAGES,\n",
"    \"max_tokens\": 500,\n",
"}\n",
"\n",
"result = client.chat.completions.create(**params)\n",
"print(result.choices[0].message.content)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can pass the script to the TTS API, which generates an MP3 of the voiceover:\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
],
"source": [
"response = requests.post(\n",
"    \"https://api.openai.com/v1/audio/speech\",\n",
"    headers={\n",
"        \"Authorization\": f\"Bearer {os.environ['OPENAI_API_KEY']}\",\n",
"    },\n",
"    json={\n",
"        \"model\": \"tts-1-1106\",\n",
"        \"input\": result.choices[0].message.content,\n",
"        \"voice\": \"onyx\",\n",
"    },\n",
")\n",
"# Fail fast on an HTTP error (such as an invalid API key) so that an error\n",
"# body is never mistaken for audio data.\n",
"response.raise_for_status()\n",
"\n",
"audio = b\"\"\n",
"for chunk in response.iter_content(chunk_size=1024 * 1024):\n",
"    audio += chunk\n",
"Audio(audio)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "openai",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 2
}