diff --git a/examples/Structured_output_with_function_calling.ipynb b/examples/Structured_output_with_function_calling.ipynb new file mode 100644 index 00000000..44458c30 --- /dev/null +++ b/examples/Structured_output_with_function_calling.ipynb @@ -0,0 +1,381 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Structured Output with Function Calling\n", + "\n", + "As developers, we often want our models to return structured output (as opposed to raw text) so that it can interface with other systems. There's a range of [interesting](https://x.com/goodside/status/1657396491676164096?s=20) ways of doing this. This notebook will specifically look at the **function calling** approach, but we'll briefly go over alternatives at the end." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Motivation\n", + "\n", + "First off, what does calling functions have to do with JSON output?\n", + "\n", + "Strictly speaking, nothing! The key idea is that both involve _structure_ (keys, values, and types), so we can leverage the **inherent structure in the typed arguments of a function** to model the **JSON we want to output**." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's look at an example:\n", + "\n", + "Say we want to extract the `name`, `date`, `event_type`, and `attendees` of an event given some natural language input.\n", + "\n", + "Input:\n", + "```\n", + "Next tuesday Jennifer is hosting a house warming at her place, and so far she's invited Steven and Julian. 
Alex told me he's going with Jessica, but Samantha can't make it.\n", + "```\n", + "\n", + "Desired Output:\n", + "```javascript\n", + "{\n", + " \"name\": \"Jennifer's Housewarming\",\n", + " \"date\": \"Next Tuesday\", // (to be computed later)\n", + " \"event_type\": \"SOCIAL\",\n", + " \"attendees\": [\"Steven\", \"Julian\", \"Alex\", \"Jessica\"]\n", + "}\n", + "```\n", + "\n", + "We could hypothetically capture this structure in a function's arguments by defining it like so:\n", + "\n", + "```python\n", + "def define_event(name: str, date: str, event_type: str, attendees: List[str]):\n", + " pass\n", + "```" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We don't actually have to define this function in Python – we just define its interface to pass to our model in the `functions` param! We can optionally use descriptions to help the model understand what an argument is meant to represent." + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [], + "source": [ + "functions = [\n", + " {\n", + " \"name\": \"define_event\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"name\": {\n", + " \"type\": \"string\",\n", + " },\n", + " \"date\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"Can be either absolute or relative, e.g. next week.\",\n", + " },\n", + " \"event_type\": {\"type\": \"string\", \"enum\": [\"SOCIAL\", \"WORK\", \"OTHER\"]},\n", + " \"attendees\": {\"type\": \"array\", \"items\": {\"type\": \"string\"}},\n", + " },\n", + " },\n", + " }\n", + "]" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we can make the call to OpenAI by providing the function in the `functions` param. We'll also set the `function_call` param to `{\"name\": \"define_event\"}` to force the model to call that function." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + " JSON: {\n", + " \"id\": \"chatcmpl-8HMQCDtgR9NDjRL2G1FRFAV8CLrfe\",\n", + " \"object\": \"chat.completion\",\n", + " \"created\": 1699148456,\n", + " \"model\": \"gpt-4-0613\",\n", + " \"choices\": [\n", + " {\n", + " \"index\": 0,\n", + " \"message\": {\n", + " \"role\": \"assistant\",\n", + " \"content\": null,\n", + " \"function_call\": {\n", + " \"name\": \"define_event\",\n", + " \"arguments\": \"{\\n \\\"name\\\": \\\"Jennifer's House Warming\\\",\\n \\\"date\\\": \\\"Next Tuesday\\\",\\n \\\"event_type\\\": \\\"SOCIAL\\\",\\n \\\"attendees\\\": [\\\"Steven\\\", \\\"Julian\\\", \\\"Alex\\\", \\\"Jessica\\\"]\\n}\"\n", + " }\n", + " },\n", + " \"finish_reason\": \"stop\"\n", + " }\n", + " ],\n", + " \"usage\": {\n", + " \"prompt_tokens\": 125,\n", + " \"completion_tokens\": 48,\n", + " \"total_tokens\": 173\n", + " }\n", + "}" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import openai\n", + "\n", + "messages = [\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": \"Extract the event details.\",\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": \"Next tuesday Jennifer is hosting a 
house warming at her place, and so far she's invited Steven and Julian. Alex told me he's going with Jessica, but Samantha can't make it.\",\n", + " },\n", + "]\n", + "\n", + "response = openai.ChatCompletion.create(\n", + " model=\"gpt-4\",\n", + " messages=messages,\n", + " functions=functions,\n", + " function_call={\"name\": \"define_event\"},\n", + ")\n", + "response" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, we extract the function call provided by the model. Note the arguments are JSON encoded, so we'll need to decode those as well." + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'name': \"Jennifer's House Warming\",\n", + " 'date': 'Next Tuesday',\n", + " 'event_type': 'SOCIAL',\n", + " 'attendees': ['Steven', 'Julian', 'Alex', 'Jessica']}" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import json\n", + "\n", + "event_details = json.loads(response.choices[0].message.function_call.arguments)\n", + "event_details" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Awesome! 
Now, let's put it all together in a single function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import openai\n", + "\n", + "def extract_event_details(user_input):\n", + " messages = [\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": \"Extract the event details.\",\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": user_input,\n", + " },\n", + " ]\n", + " response = openai.ChatCompletion.create(\n", + " model=\"gpt-4\",\n", + " messages=messages,\n", + " functions=functions,\n", + " function_call={\"name\": \"define_event\"},\n", + " )\n", + " return response\n", + "\n", + "response = extract_event_details(\"Next tuesday Jennifer is hosting a house warming at her place, and so far she's invited Steven and Julian. Alex told me he's going with Jessica, but Samantha can't make it.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "event_details = json.loads(response.choices[0].message.function_call.arguments)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can generalize this pattern into a reusable helper that extracts JSON matching any schema we pass in:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import openai, json\n", + "\n", + "# An example of the kind of object we want back:\n", + "SAMPLE_JSON = {\n", + " \"date\": \"2021-10-10\",\n", + " \"event_type\": \"social\",\n", + "}\n", + "\n", + "JSON_SCHEMA = {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"date\": {\n", + " \"type\": \"string\",\n", + " },\n", + " \"event_type\": {\"type\": \"string\", \"enum\": [\"social\", \"work\"]},\n", + " },\n", + "}\n", + "\n", + "\n", + "def extract_json(input_text, json_schema):\n", + " extract_function_name = \"extract_json\"\n", + " functions = [{\"name\": extract_function_name, \"parameters\": json_schema}]\n", + "\n", + " messages = [\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": f\"Extract the relevant fields by using the {extract_function_name} function.\",\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": input_text,\n", + " 
},\n", + " ]\n", + "\n", + " response = openai.ChatCompletion.create(\n", + " model=\"gpt-4\",\n", + " messages=messages,\n", + " functions=functions,\n", + " function_call={\"name\":extract_function_name},\n", + " )\n", + " return json.loads(response.choices[0].message.function_call.arguments)\n", + "\n", + "\n", + "print(extract_json(\"The date is 2021-10-10 and the event type is social.\", JSON_SCHEMA))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "import os\n", + "\n", + "def maybe_json():\n", + " headers = {\n", + " \"Content-Type\": \"application/json\",\n", + " \"Authorization\": f\"Bearer {os.getenv('OPENAI_API_KEY')}\",\n", + " }\n", + " data = {\n", + " \"model\": \"gpt-3.5-turbo-1106\",\n", + " # \"response_format\": { \"type\": \"json_object\" },\n", + " \"max_tokens\": 200,\n", + " \"messages\": [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": \"Tell me a two sentence story about JSON.\"\n", + " }\n", + " ]\n", + " }\n", + " response = requests.post(\"https://api.openai.com/v1/chat/completions\", headers=headers, json=data)\n", + " return response.json()\n", + "\n", + "print(json.dumps(maybe_json(), indent=4))\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "openai", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.13" + }, + "orig_nbformat": 4 + }, + "nbformat": 4, + "nbformat_minor": 2 +}