Mirror of https://github.com/openai/openai-cookbook (synced 2024-11-11 13:11:02 +00:00)

Updates to Structured Outputs cookbook (#1362)

This commit is contained in:
parent 3089a1b6a5
commit 940f9c3619
@@ -7,15 +7,15 @@
   "source": [
    "# Introduction to Structured Outputs\n",
    "\n",
-   "This cookbook introduces Structured Outputs, a new capability in the Chat Completions API and Assistants API that allows to get outputs to follow a strict schema, and illustrates this capability with a few examples.\n",
+   "Structured Outputs is a new capability in the Chat Completions API and Assistants API that guarantees the model will always generate responses that adhere to your supplied JSON Schema. In this cookbook, we will illustrate this capability with a few examples.\n",
    "\n",
-   "Structured outputs can be enabled by setting the parameter `strict: true` in an API call with either a defined response format or function calls.\n",
+   "Structured Outputs can be enabled by setting the parameter `strict: true` in an API call with either a defined response format or function definitions.\n",
    "\n",
    "## Response format usage\n",
    "\n",
-   "Previously, the `response_format` parameter was only available to specify that the model should return a valid json.\n",
+   "Previously, the `response_format` parameter was only available to specify that the model should return a valid JSON.\n",
    "\n",
-   "In addition to this, we are introducing a new way of specifying which json schema to follow.\n",
+   "In addition to this, we are introducing a new way of specifying which JSON schema to follow.\n",
    "\n",
    "\n",
    "## Function call usage\n",
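To make the response-format flavor above concrete, here is a minimal sketch of a Chat Completions call that supplies a JSON Schema with `strict: true`. The schema, model, and prompt are illustrative assumptions, not the notebook's exact code:

```python
from openai import OpenAI

client = OpenAI()

# A response_format carrying a strict JSON Schema: the model's output is
# constrained to conform to this shape.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "math_response",
        "schema": {
            "type": "object",
            "properties": {
                "final_answer": {"type": "string"},
            },
            "required": ["final_answer"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}

completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Solve 8x + 7 = -23"}],
    response_format=response_format,
)
print(completion.choices[0].message.content)  # a JSON string matching the schema
```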
@@ -25,7 +25,7 @@
    "\n",
    "## Examples \n",
    "\n",
-   "There are many ways Structured Outputs can be useful, as you can rely on the outputs following a constrained schema needed for your application.\n",
+   "Structured Outputs can be useful in many ways, as you can rely on the outputs following a constrained schema.\n",
    "\n",
    "If you used JSON mode or function calls before, you can think of Structured Outputs as a foolproof version of this.\n",
    "\n",
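For the function-calling flavor mentioned above, the same `strict` flag goes on the function definition itself. A minimal sketch, mirroring the shape of the `product_search` tool that appears later in this diff:

```python
# A strict function definition: with "strict": True, the generated
# arguments are constrained to this parameters schema.
tools = [{
    "type": "function",
    "function": {
        "name": "product_search",
        "description": "Search for a match in the product database",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "category": {"type": "string", "enum": ["shoes", "jackets", "tops", "bottoms"]},
                "color": {"type": "string"},
            },
            "required": ["category", "color"],
            "additionalProperties": False,
        },
    },
}]
```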
@@ -33,9 +33,9 @@
    "\n",
    "Example use cases include:\n",
    "\n",
-   "- Getting structured answers to display them in a specific way in a UI (cf example 1 in this cookbook)\n",
-   "- Populating a database with extracted content from documents or scrapped web pages (cf example 2 in this cookbook)\n",
-   "- Extracting entities from a user input to call tools with defined parameters (cf example 3 in this cookbook)\n",
+   "- Getting structured answers to display them in a specific way in a UI (example 1 in this cookbook)\n",
+   "- Populating a database with extracted content from documents (example 2 in this cookbook)\n",
+   "- Extracting entities from a user input to call tools with defined parameters (example 3 in this cookbook)\n",
    "\n",
    "More generally, anything that requires fetching data, taking action, or that builds upon complex workflows could benefit from using Structured Outputs."
   ]
@@ -50,19 +50,30 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 2,
+  "execution_count": null,
   "id": "e96e875d",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install openai -U"
   ]
  },
+ {
+  "cell_type": "code",
+  "execution_count": 1,
+  "id": "21972f02",
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "import json\n",
+   "from textwrap import dedent\n",
+   "from openai import OpenAI\n",
+   "client = OpenAI()"
+  ]
+ },
  {
   "cell_type": "code",
-  "execution_count": 54,
+  "execution_count": 2,
   "id": "ae451fb7",
   "metadata": {},
   "outputs": [],
@@ -84,7 +95,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 4,
+  "execution_count": 3,
   "id": "b5f6a7b7",
   "metadata": {},
   "outputs": [],
@@ -101,7 +112,7 @@
   " messages=[\n",
   " {\n",
   " \"role\": \"system\", \n",
-  " \"content\": math_tutor_prompt\n",
+  " \"content\": dedent(math_tutor_prompt)\n",
   " },\n",
   " {\n",
   " \"role\": \"user\", \n",
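The switch to `dedent` in this hunk matters because the prompts in this notebook are indented triple-quoted strings; `textwrap.dedent` strips the common leading whitespace before the text reaches the model. A quick illustration (the prompt text is illustrative):

```python
from textwrap import dedent

prompt = '''
    You are a helpful math tutor.
    You will be provided with a math problem.
'''
# Prints the same text without the four leading spaces on each line.
print(dedent(prompt))
```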
@@ -142,7 +153,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 5,
+  "execution_count": 4,
   "id": "c6c97ba9",
   "metadata": {
    "scrolled": true
@@ -152,7 +163,7 @@
   "name": "stdout",
   "output_type": "stream",
   "text": [
-   "{\"steps\":[{\"explanation\":\"Start by isolating the term with the variable. We have the equation 8x + 7 = -23. To isolate 8x, we need to subtract 7 from both sides of the equation.\",\"output\":\"8x + 7 - 7 = -23 - 7\"},{\"explanation\":\"By simplifying both sides, we cancel out the +7 on the left side, which results in 8x on the left and -30 on the right.\",\"output\":\"8x = -30\"},{\"explanation\":\"Next, solve for x by dividing both sides by 8. This helps to isolate x on one side of the equation.\",\"output\":\"x = -30 / 8\"},{\"explanation\":\"To simplify the fraction -30/8, divide the numerator and the denominator by their greatest common divisor, which is 2.\",\"output\":\"x = -15/4\"}],\"final_answer\":\"-15/4\"}\n"
+   "{\"steps\":[{\"explanation\":\"Start by isolating the term with the variable. Subtract 7 from both sides to do this.\",\"output\":\"8x + 7 - 7 = -23 - 7\"},{\"explanation\":\"Simplify both sides. On the left side, 7 - 7 cancels out, and on the right side, -23 - 7 equals -30.\",\"output\":\"8x = -30\"},{\"explanation\":\"Next, solve for x by dividing both sides by 8, which will leave x by itself on the left side.\",\"output\":\"8x/8 = -30/8\"},{\"explanation\":\"Simplify the fraction on the right side by dividing both the numerator and the denominator by their greatest common divisor, which is 2.\",\"output\":\"x = -15/4\"}],\"final_answer\":\"x = -15/4\"}\n"
   ]
  }
 ],
@@ -167,7 +178,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 6,
+  "execution_count": 5,
   "id": "507c307b",
   "metadata": {},
   "outputs": [],
@@ -189,7 +200,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 7,
+  "execution_count": 6,
   "id": "1ba987c4",
   "metadata": {},
   "outputs": [
@@ -197,7 +208,7 @@
   "name": "stdout",
   "output_type": "stream",
   "text": [
-   "Step 1: Start by isolating the term with the variable. We have the equation 8x + 7 = -23. To isolate 8x, we need to subtract 7 from both sides of the equation.\n",
+   "Step 1: Start by isolating the term with the variable. Subtract 7 from both sides to do this.\n",
    "\n"
   ]
  },
@@ -219,7 +230,7 @@
   "text": [
    "\n",
    "\n",
-   "Step 2: By simplifying both sides, we cancel out the +7 on the left side, which results in 8x on the left and -30 on the right.\n",
+   "Step 2: Simplify both sides. On the left side, 7 - 7 cancels out, and on the right side, -23 - 7 equals -30.\n",
    "\n"
   ]
  },
@@ -241,14 +252,14 @@
   "text": [
    "\n",
    "\n",
-   "Step 3: Next, solve for x by dividing both sides by 8. This helps to isolate x on one side of the equation.\n",
+   "Step 3: Next, solve for x by dividing both sides by 8, which will leave x by itself on the left side.\n",
    "\n"
   ]
  },
  {
   "data": {
    "text/latex": [
-    "$\\displaystyle x = -30 / 8$"
+    "$\\displaystyle 8x/8 = -30/8$"
    ],
    "text/plain": [
     "<IPython.core.display.Math object>"
@@ -263,7 +274,7 @@
   "text": [
    "\n",
    "\n",
-   "Step 4: To simplify the fraction -30/8, divide the numerator and the denominator by their greatest common divisor, which is 2.\n",
+   "Step 4: Simplify the fraction on the right side by dividing both the numerator and the denominator by their greatest common divisor, which is 2.\n",
    "\n"
   ]
  },
@@ -293,7 +304,7 @@
  {
   "data": {
    "text/latex": [
-    "$\\displaystyle -15/4$"
+    "$\\displaystyle x = -15/4$"
    ],
    "text/plain": [
     "<IPython.core.display.Math object>"
@@ -314,12 +325,12 @@
   "source": [
    "## Using the SDK `parse` helper\n",
    "\n",
-   "The new version of the SDK introduces a `parse` helper to provide your own Pydantic model instead of having to define the json schema. We recommend using this method if possible."
+   "The new version of the SDK introduces a `parse` helper to provide your own Pydantic model instead of having to define the JSON schema. We recommend using this method if possible."
   ]
  },
  {
   "cell_type": "code",
-  "execution_count": 14,
+  "execution_count": 7,
   "id": "eef4d9be",
   "metadata": {},
   "outputs": [],
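As a minimal sketch of the `parse` helper described above (model name, prompt, and classes follow the notebook's own example but are illustrative), note that the validated Pydantic object comes back on `message.parsed`:

```python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class Step(BaseModel):
    explanation: str
    output: str

class MathReasoning(BaseModel):
    steps: list[Step]
    final_answer: str

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor."},
        {"role": "user", "content": "Solve 8x + 7 = -23"},
    ],
    response_format=MathReasoning,  # a Pydantic model instead of a hand-written JSON schema
)

math_reasoning = completion.choices[0].message.parsed  # a MathReasoning instance
print(math_reasoning.final_answer)
```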
@@ -338,7 +349,7 @@
   " completion = client.beta.chat.completions.parse(\n",
   " model=MODEL,\n",
   " messages=[\n",
-  " {\"role\": \"system\", \"content\": math_tutor_prompt},\n",
+  " {\"role\": \"system\", \"content\": dedent(math_tutor_prompt)},\n",
   " {\"role\": \"user\", \"content\": question},\n",
   " ],\n",
   " response_format=MathReasoning,\n",
@@ -349,7 +360,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 15,
+  "execution_count": 8,
   "id": "4caa3049",
   "metadata": {},
   "outputs": [],
@@ -359,7 +370,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 16,
+  "execution_count": 9,
   "id": "8f2ac4a5",
   "metadata": {},
   "outputs": [
@@ -367,7 +378,7 @@
   "name": "stdout",
   "output_type": "stream",
   "text": [
-   "[Step(explanation='To isolate the term with the variable, we need to get rid of the constant on the left side of the equation by subtracting it from both sides.', output='8x + 7 - 7 = -23 - 7'), Step(explanation='Simplifying both sides gives us an equation with the variable term on one side and a constant on the other side.', output='8x = -30'), Step(explanation='To solve for x, we need to divide both sides by the coefficient of x, which is 8.', output='x = -30/8'), Step(explanation='Simplifying the fraction by dividing both the numerator and the denominator by their greatest common divisor, which is 2.', output='x = -15/4')]\n",
+   "[Step(explanation='The first step in solving the equation is to isolate the term with the variable. We start by subtracting 7 from both sides of the equation to move the constant to the right side.', output='8x + 7 - 7 = -23 - 7'), Step(explanation='Simplifying both sides, we get the equation with the variable term on the left and the constants on the right.', output='8x = -30'), Step(explanation='Now, to solve for x, we need x to be by itself. We do this by dividing both sides of the equation by 8, the coefficient of x.', output='x = -30 / 8'), Step(explanation='Simplifying the division, we find the value of x. -30 divided by 8 simplifies to the fraction -15/4 or in decimal form, -3.75.', output='x = -15/4')]\n",
    "Final answer:\n",
    "x = -15/4\n"
   ]
@@ -395,7 +406,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 19,
+  "execution_count": 13,
   "id": "a7e0c6a4",
   "metadata": {},
   "outputs": [
@@ -403,16 +414,16 @@
   "name": "stdout",
   "output_type": "stream",
   "text": [
-   "ParsedChatCompletionMessage[MathReasoning](refusal=\"I'm sorry, I cannot assist with that request.\", content=None, role='assistant', function_call=None, tool_calls=[], parsed=None)\n"
+   "I'm sorry, I can't assist with that request.\n"
   ]
  }
 ],
 "source": [
  "refusal_question = \"how can I build a bomb?\"\n",
  "\n",
- "refusal_result = get_math_solution(refusal_question) \n",
+ "result = get_math_solution(refusal_question) \n",
  "\n",
- "print(refusal_result)"
+ "print(result.refusal)"
 ]
},
{
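When using the `parse` helper, a safety-based refusal surfaces on the message's `refusal` field rather than in `parsed`, so callers can branch on it. A hedged sketch, reusing the notebook's `get_math_solution` (which returns the parsed message):

```python
result = get_math_solution("how can I build a bomb?")

if result.refusal is not None:
    # The model declined; `parsed` is None in this case.
    print(result.refusal)
else:
    print(result.parsed.final_answer)
```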
@@ -426,62 +437,36 @@
   "\n",
   "This could be useful if you need to transform text or visual content into a structured object, for example to display it in a certain way or to populate database.\n",
   "\n",
-  "We will take web scraping as an example, using Wikipedia articles discussing inventions."
  ]
 },
 {
  "cell_type": "markdown",
  "id": "c284c7c2",
  "metadata": {},
  "source": [
   "### Data preparation\n",
   "\n",
-  "We will start by scraping content from multiple articles."
+  "We will take AI-generated articles discussing inventions as an example."
  ]
 },
 {
  "cell_type": "code",
- "execution_count": 21,
+ "execution_count": 20,
  "id": "7dfc7cd1",
  "metadata": {},
  "outputs": [],
  "source": [
-  "import requests\n",
-  "from bs4 import BeautifulSoup "
  ]
 },
 {
  "cell_type": "code",
  "execution_count": 22,
  "id": "736a9e24",
  "metadata": {},
  "outputs": [],
  "source": [
-  "def get_article_content(url):\n",
-  " response = requests.get(url)\n",
-  " soup = BeautifulSoup(response.content, \"html.parser\")\n",
-  " html_content = soup.find(\"div\", class_=\"mw-parser-output\")\n",
-  " content = \"\\n\".join(p.text for p in html_content.find_all(\"p\"))\n",
-  " return content"
+  "articles = [\n",
+  " \"./data/structured_outputs_articles/cnns.md\",\n",
+  " \"./data/structured_outputs_articles/llms.md\",\n",
+  " \"./data/structured_outputs_articles/moe.md\"\n",
+  "]"
  ]
 },
 {
  "cell_type": "code",
- "execution_count": 23,
- "id": "604db8bd",
+ "id": "736a9e24",
  "metadata": {},
  "outputs": [],
  "source": [
-  "urls = [\n",
-  " # Article on CNNs\n",
-  " \"https://en.wikipedia.org/wiki/Convolutional_neural_network\",\n",
-  " # Article on LLMs\n",
-  " \"https://wikipedia.org/wiki/Large_language_model\",\n",
-  " # Article on MoE\n",
-  " \"https://en.wikipedia.org/wiki/Mixture_of_experts\"\n",
-  "]\n",
-  "\n",
-  "content = [get_article_content(url) for url in urls]"
+  "def get_article_content(path):\n",
+  " with open(path, 'r') as f:\n",
+  " content = f.read()\n",
+  " return content\n",
+  " \n",
+  "content = [get_article_content(path) for path in articles]"
  ]
 },
 {
@@ -496,7 +481,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 29,
+  "execution_count": 25,
   "id": "5eae3aea",
   "metadata": {},
   "outputs": [],
@@ -529,7 +514,7 @@
   " model=MODEL,\n",
   " temperature=0.2,\n",
   " messages=[\n",
-  " {\"role\": \"system\", \"content\": summarization_prompt},\n",
+  " {\"role\": \"system\", \"content\": dedent(summarization_prompt)},\n",
   " {\"role\": \"user\", \"content\": text}\n",
   " ],\n",
   " response_format=ArticleSummary,\n",
@@ -540,7 +525,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 30,
+  "execution_count": 26,
   "id": "bb9787fd",
   "metadata": {},
   "outputs": [
@@ -568,7 +553,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 31,
+  "execution_count": 27,
   "id": "afca0de1",
   "metadata": {},
   "outputs": [],
@@ -587,7 +572,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 32,
+  "execution_count": 28,
   "id": "14634a72",
   "metadata": {},
   "outputs": [
@@ -597,21 +582,24 @@
   "text": [
    "ARTICLE 0\n",
    "\n",
-   "Invented year: 1980\n",
+   "Invented year: 1989\n",
    "\n",
-   "Summary: A convolutional neural network (CNN) is a type of neural network designed to process data with grid-like topology, such as images, by using convolutional layers to automatically learn spatial hierarchies of features.\n",
+   "Summary: Convolutional Neural Networks (CNNs) are deep neural networks used for processing structured grid data like images, revolutionizing computer vision.\n",
    "\n",
    "Inventors:\n",
-   "- Fukushima\n",
+   "- Yann LeCun\n",
+   "- Léon Bottou\n",
+   "- Yoshua Bengio\n",
+   "- Patrick Haffner\n",
    "\n",
    "Concepts:\n",
-   "- Convolutional Layers: These layers apply a convolution operation to the input, passing the result to the next layer. They are designed to automatically and adaptively learn spatial hierarchies of features from input images.\n",
-   "- Pooling Layers: Pooling layers reduce the dimensions of data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer, which helps to control overfitting and reduce computational cost.\n",
-   "- ReLU Activation Function: The rectified linear unit (ReLU) is a non-linear activation function used in CNNs that introduces non-linearity to the decision function and overall network without affecting the receptive fields of the convolution layers.\n",
-   "- Weight Sharing: A key feature of CNNs where many neurons can share the same filter, reducing the memory footprint and allowing the network to learn more efficiently.\n",
-   "- Applications: CNNs are widely used in image and video recognition, recommender systems, image classification, medical image analysis, and natural language processing.\n",
+   "- Convolutional Layers: These layers apply learnable filters to input data to produce feature maps that detect specific features like edges and patterns.\n",
+   "- Pooling Layers: Also known as subsampling layers, they reduce the spatial dimensions of feature maps, commonly using max pooling to retain important features while reducing size.\n",
+   "- Fully Connected Layers: These layers connect every neuron in one layer to every neuron in the next, performing the final classification or regression task.\n",
+   "- Training: CNNs are trained using backpropagation and gradient descent to learn optimal filter values that minimize the loss function.\n",
+   "- Applications: CNNs are used in image classification, object detection, medical image analysis, and image segmentation, forming the basis of many state-of-the-art computer vision systems.\n",
    "\n",
-   "Description: Convolutional neural networks (CNNs) are a class of deep neural networks primarily used for analyzing visual imagery. They are inspired by biological processes and are designed to automatically and adaptively learn spatial hierarchies of features from input images through backpropagation, using a structure that includes convolutional layers, pooling layers, and fully connected layers.\n",
+   "Description: Convolutional Neural Networks (CNNs) are a type of deep learning model designed to process structured grid data, such as images, by using layers of convolutional, pooling, and fully connected layers to extract and classify features.\n",
    "\n",
    "\n",
    "\n",
@@ -619,26 +607,24 @@
    "\n",
    "Invented year: 2017\n",
    "\n",
-   "Summary: A large language model (LLM) is a computational model designed for general-purpose language generation and natural language processing tasks.\n",
+   "Summary: Large Language Models (LLMs) are AI models designed to understand and generate human language using transformer architecture.\n",
    "\n",
    "Inventors:\n",
-   "- Vaswani\n",
-   "- Shazeer\n",
-   "- Parmar\n",
-   "- Uszkoreit\n",
-   "- Jones\n",
-   "- Gomez\n",
-   "- Kaiser\n",
-   "- Polosukhin\n",
+   "- Ashish Vaswani\n",
+   "- Noam Shazeer\n",
+   "- Niki Parmar\n",
+   "- Jakob Uszkoreit\n",
+   "- Llion Jones\n",
+   "- Aidan N. Gomez\n",
+   "- Łukasz Kaiser\n",
+   "- Illia Polosukhin\n",
    "\n",
    "Concepts:\n",
-   "- Transformer Architecture: Introduced in 2017, this architecture is the foundation of LLMs, enabling efficient processing and generation of large-scale text data through attention mechanisms.\n",
-   "- Prompt Engineering: A technique used to guide LLMs in generating desired outputs by crafting specific input prompts, reducing the need for traditional fine-tuning.\n",
-   "- Tokenization: The process of converting text into numerical tokens that LLMs can process, often using methods like byte-pair encoding to handle large vocabularies.\n",
-   "- Reinforcement Learning from Human Feedback (RLHF): A method to fine-tune LLMs by using human preferences to guide the model's learning process, enhancing its performance on specific tasks.\n",
-   "- Emergent Abilities: Unexpected capabilities that arise in LLMs as they scale, such as in-context learning, which allows models to learn from examples within a single conversation.\n",
+   "- Transformer Architecture: A neural network architecture that allows for highly parallelized processing and generation of text, featuring components like embeddings, transformer blocks, attention mechanisms, and decoders.\n",
+   "- Pre-training and Fine-tuning: The two-stage training process for LLMs, where models are first trained on large text corpora to learn language patterns, followed by task-specific training on labeled datasets.\n",
+   "- Applications of LLMs: LLMs are used in text generation, machine translation, summarization, sentiment analysis, and conversational agents, enhancing human-machine interactions.\n",
    "\n",
-   "Description: Large language models (LLMs) are advanced computational models that utilize the transformer architecture to perform a variety of natural language processing tasks, including text generation, classification, and more, by learning from vast datasets.\n",
+   "Description: Large Language Models (LLMs) leverage transformer architecture to process and generate human language, significantly advancing natural language processing applications such as translation, summarization, and conversational agents.\n",
    "\n",
    "\n",
    "\n",
@@ -646,21 +632,20 @@
    "\n",
    "Invented year: 1991\n",
    "\n",
-   "Summary: Mixture of Experts (MoE) is a machine learning technique that uses multiple expert networks to handle different parts of a problem space, optimizing computational efficiency by activating only relevant experts for each input.\n",
+   "Summary: Mixture of Experts (MoE) is a machine learning technique that improves model performance by combining predictions from multiple specialized models.\n",
    "\n",
    "Inventors:\n",
-   "- Hampshire\n",
-   "- Waibel\n",
+   "- Michael I. Jordan\n",
+   "- Robert A. Jacobs\n",
    "\n",
    "Concepts:\n",
-   "- Expert Networks: In MoE, expert networks are specialized models that handle specific regions of the problem space, activated based on input relevance.\n",
-   "- Gating Function: A mechanism in MoE that determines which experts to activate for a given input, often using a softmax function to assign probabilities.\n",
-   "- Gradient Descent: A method used to train both the experts and the gating function in MoE by minimizing a loss function.\n",
-   "- Hierarchical MoE: An extension of MoE that uses multiple levels of gating, similar to decision trees, to manage complex problem spaces.\n",
-   "- Sparsely-Gated MoE: A variant of MoE where only a subset of experts are activated, reducing computational cost and improving efficiency.\n",
-   "- Load Balancing: A challenge in MoE where the gating function must distribute queries evenly among experts to prevent some from being overworked while others are underutilized.\n",
+   "- Experts: Individual models trained to specialize in different parts of the input space or specific aspects of the task.\n",
+   "- Gating Network: A network responsible for dynamically selecting and weighting the outputs of experts for a given input.\n",
+   "- Combiner: Aggregates the outputs from selected experts, weighted by the gating network, to produce the final model output.\n",
+   "- Training: Involves training each expert on specific data subsets and training the gating network to optimally combine expert outputs.\n",
+   "- Applications: MoE models are used in natural language processing, computer vision, speech recognition, and recommendation systems to improve accuracy and efficiency.\n",
    "\n",
-   "Description: Mixture of Experts (MoE) is a machine learning framework where multiple expert networks are employed to divide a problem space into distinct regions, with only relevant experts activated for each input, enhancing computational efficiency and specialization.\n",
+   "Description: Mixture of Experts (MoE) is a machine learning framework that enhances model performance by integrating the outputs of multiple specialized models, known as experts, through a gating network that dynamically selects and weights their contributions to the final prediction.\n",
    "\n",
    "\n",
    "\n"
@@ -688,11 +673,15 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 38,
+  "execution_count": 29,
   "id": "ee802699",
   "metadata": {},
   "outputs": [],
   "source": [
+   "from enum import Enum\n",
+   "from typing import Union\n",
+   "import openai\n",
+   "\n",
    "product_search_prompt = '''\n",
    " You are a clothes recommendation agent, specialized in finding the perfect match for a user.\n",
    " You will be provided with a user input and additional context such as user gender and age group, and season.\n",
@@ -708,34 +697,16 @@
   " There are a wide range of colors available, but try to stick to regular color names.\n",
   "'''\n",
   "\n",
-  "product_search_function = {\n",
-  " \"type\": \"function\",\n",
-  " \"function\": {\n",
-  " \"name\": \"product_search\",\n",
-  " \"description\": \"Search for a match in the product database\",\n",
-  " \"parameters\": {\n",
-  " \"type\": \"object\",\n",
-  " \"properties\": {\n",
-  " \"category\": {\n",
-  " \"type\": \"string\",\n",
-  " \"description\": \"The broad category of the product\",\n",
-  " \"enum\": [\"shoes\", \"jackets\", \"tops\", \"bottoms\"]\n",
-  " },\n",
-  " \"subcategory\": {\n",
-  " \"type\": \"string\",\n",
-  " \"description\": \"The sub category of the product, within the broader category\",\n",
-  " },\n",
-  " \"color\": {\n",
-  " \"type\": \"string\",\n",
-  " \"description\": \"The color of the product\",\n",
-  " }, \n",
-  " },\n",
-  " \"required\": [\"category\", \"subcategory\", \"color\"],\n",
-  " \"additionalProperties\": False,\n",
-  " }\n",
-  " },\n",
-  " \"strict\": True\n",
-  "}\n",
+  "class Category(str, Enum):\n",
+  " shoes = \"shoes\"\n",
+  " jackets = \"jackets\"\n",
+  " tops = \"tops\"\n",
+  " bottoms = \"bottoms\"\n",
+  "\n",
+  "class ProductSearchParameters(BaseModel):\n",
+  " category: Category\n",
+  " subcategory: str\n",
+  " color: str\n",
   "\n",
   "def get_response(user_input, context):\n",
   " response = client.chat.completions.create(\n",
@@ -744,14 +715,16 @@
   " messages=[\n",
   " {\n",
   " \"role\": \"system\",\n",
-  " \"content\": product_search_prompt\n",
+  " \"content\": dedent(product_search_prompt)\n",
   " },\n",
   " {\n",
   " \"role\": \"user\",\n",
   " \"content\": f\"CONTEXT: {context}\\n USER INPUT: {user_input}\"\n",
   " }\n",
   " ],\n",
-  " tools=[product_search_function]\n",
+  " tools=[\n",
+  " openai.pydantic_function_tool(ProductSearchParameters, name=\"product_search\", description=\"Search for a match in the product database\")\n",
+  " ]\n",
   " )\n",
   "\n",
   " return response.choices[0].message.tool_calls"
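With `openai.pydantic_function_tool`, the tool's JSON schema is generated from the `ProductSearchParameters` model and strict mode is enabled for you. The returned tool call still carries its arguments as a JSON string, which can be loaded as usual. A sketch reusing the notebook's `get_response` (the inputs are illustrative):

```python
import json

tool_calls = get_response(
    "I'm looking for a new coat, something warm.",
    "Gender: female, Age group: 40-50",
)

tool_call = tool_calls[0]
print(tool_call.function.name)                   # "product_search"
args = json.loads(tool_call.function.arguments)  # with strict mode, these conform to the schema
print(args["category"], args["subcategory"], args["color"])
```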
@@ -759,7 +732,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 28,
+  "execution_count": 30,
   "id": "65ebeb16",
   "metadata": {},
   "outputs": [],
@@ -794,7 +767,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 41,
+  "execution_count": 31,
   "id": "f84b02b0",
   "metadata": {},
   "outputs": [],
@@ -810,7 +783,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 39,
+  "execution_count": null,
   "id": "9b5e2bc4",
   "metadata": {},
   "outputs": [],
@@ -821,83 +794,10 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 42,
+  "execution_count": null,
   "id": "85292b30",
   "metadata": {},
-  "outputs": [
-   {
-    "name": "stdout",
-    "output_type": "stream",
-    "text": [
-     "Input: I'm looking for a new coat. I'm always cold so please something warm! Ideally something that matches my eyes.\n",
-     "\n",
-     "Context: Gender: female, Age group: 40-50, Physical appearance: blue eyes\n",
-     "\n",
-     "Product search arguments:\n",
-     "category: 'jackets'\n",
-     "subcategory: 'winter coats'\n",
-     "color: 'blue'\n",
-     "\n",
-     "\n",
-     "\n",
-     "Input: I'm going on a trail in Scotland this summer. It's goind to be rainy. Help me find something.\n",
-     "\n",
-     "Context: Gender: male, Age group: 30-40\n",
-     "\n",
-     "Product search arguments:\n",
-     "category: 'jackets'\n",
-     "subcategory: 'rain jackets'\n",
-     "color: 'black'\n",
-     "\n",
-     "\n",
-     "\n",
-     "Input: I'm trying to complete a rock look. I'm missing shoes. Any suggestions?\n",
-     "\n",
-     "Context: Gender: female, Age group: 20-30\n",
-     "\n",
-     "Product search arguments:\n",
-     "category: 'shoes'\n",
-     "subcategory: 'boots'\n",
-     "color: 'black'\n",
-     "\n",
-     "\n",
-     "\n",
-     "Input: Help me find something very simple for my first day at work next week. Something casual and neutral.\n",
-     "\n",
-     "Context: Gender: male, Season: summer\n",
-     "\n",
-     "Product search arguments:\n",
-     "category: 'tops'\n",
-     "subcategory: 'shirts'\n",
-     "color: 'white'\n",
-     "\n",
-     "\n",
-     "\n",
-     "Input: Help me find something very simple for my first day at work next week. Something casual and neutral.\n",
-     "\n",
-     "Context: Gender: male, Season: winter\n",
-     "\n",
-     "Product search arguments:\n",
-     "category: 'tops'\n",
-     "subcategory: 'sweaters'\n",
-     "color: 'gray'\n",
-     "\n",
-     "\n",
-     "\n",
-     "Input: Can you help me find a dress for a Barbie-themed party in July?\n",
-     "\n",
-     "Context: Gender: female, Age group: 20-30\n",
-     "\n",
-     "Product search arguments:\n",
-     "category: 'tops'\n",
-     "subcategory: 'blouses'\n",
-     "color: 'pink'\n",
-     "\n",
-     "\n",
-     "\n"
-    ]
-   }
-  ],
+  "outputs": [],
   "source": [
    "for ex in example_inputs:\n",
    " print_tool_call(ex['user_input'], ex['context'], ex['result'])"
@@ -912,9 +812,9 @@
   "\n",
   "In this cookbook, we've explored the new Structured Outputs capability through multiple examples.\n",
   "\n",
-  "Whether you've used JSON Mode or function calling before and you want more robustness in your application, or you're just starting out with structured formats, we hope you will be able to apply the different concepts introduced here to your own use case!\n",
+  "Whether you've used JSON mode or function calling before and you want more robustness in your application, or you're just starting out with structured formats, we hope you will be able to apply the different concepts introduced here to your own use case!\n",
   "\n",
-  "Please note that Structured Outputs are only available with the `gpt-4o-mini` and `gpt-4o-2024-08-06` models."
+  "Structured Outputs is only available with `gpt-4o-mini` , `gpt-4o-2024-08-06`, and future models."
  ]
 }
],
examples/data/structured_outputs_articles/cnns.md — new file (31 lines)
@@ -0,0 +1,31 @@
### Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs) are a class of deep neural networks primarily used for processing structured grid data like images. They were invented by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner in 1989. CNNs have revolutionized the field of computer vision, enabling advancements in areas such as image classification, object detection, and image segmentation.

#### Architecture

A typical CNN architecture consists of multiple layers, including convolutional layers, pooling layers, and fully connected layers.

- **Convolutional Layers:** These layers apply a set of learnable filters (kernels) to the input data. Each filter convolves across the input data, producing a feature map that detects specific features such as edges, textures, and patterns. The parameters of these filters are learned during training.

- **Pooling Layers:** Also known as subsampling or downsampling layers, pooling layers reduce the spatial dimensions of the feature maps. The most common type is max pooling, which selects the maximum value from each region of the feature map, thereby reducing its size and computation requirements while retaining important features.

- **Fully Connected Layers:** After several convolutional and pooling layers, the output is flattened and fed into one or more fully connected layers, which perform the final classification or regression task. These layers resemble those in traditional neural networks, connecting every neuron in one layer to every neuron in the next.

#### Training

CNNs are trained using backpropagation and gradient descent. During training, the network learns the optimal filter values that minimize the loss function, typically a measure of the difference between the predicted and actual labels. The training process involves adjusting the weights of the filters and the fully connected layers through iterative updates.

#### Applications

CNNs have become the cornerstone of many state-of-the-art systems in computer vision. Some notable applications include:

- **Image Classification:** CNNs can classify images into various categories with high accuracy. They have been used in systems like Google Photos and Facebook's automatic tagging.

- **Object Detection:** CNNs can detect and localize objects within an image, which is essential for tasks like autonomous driving and facial recognition.

- **Medical Image Analysis:** CNNs assist in diagnosing diseases by analyzing medical images such as X-rays, MRIs, and CT scans.

- **Image Segmentation:** CNNs are used to partition an image into meaningful segments, useful in applications such as scene understanding and medical image analysis.

Overall, CNNs have significantly advanced the field of artificial intelligence, particularly in tasks that involve visual data, and continue to be an area of active research and development. The pioneering work of LeCun and his colleagues laid the foundation for these transformative technologies, which have since become integral to modern computer vision systems.
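The pipeline this file describes — convolution, pooling, then a fully connected head — can be sketched in a few lines. PyTorch is an assumption here (the article names no framework), and the layer sizes are illustrative only:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Minimal CNN: two conv/pool stages followed by a fully connected classifier head."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # learnable filters -> feature maps
            nn.ReLU(),
            nn.MaxPool2d(2),                             # max pooling halves spatial dims
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)  # fully connected head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)               # (batch, 32, 7, 7) for 28x28 inputs
        return self.classifier(x.flatten(1))

logits = SimpleCNN()(torch.randn(8, 1, 28, 28))  # e.g. a batch of 28x28 grayscale images
print(logits.shape)  # torch.Size([8, 10])
```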
examples/data/structured_outputs_articles/llms.md — new file (39 lines)
@@ -0,0 +1,39 @@
### Large Language Models (LLMs)

Large Language Models (LLMs) are a type of artificial intelligence model designed to understand and generate human language. The development of LLMs has been driven by advances in deep learning and natural language processing. A significant milestone in their evolution was the introduction of the transformer architecture by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin in 2017. LLMs have significantly advanced the fields of natural language processing (NLP) and understanding, enabling applications such as machine translation, text summarization, and conversational agents.

#### Architecture

LLMs are typically based on transformer architecture, which allows them to process and generate text in a highly parallelized manner. Key components of LLM architecture include:

- **Embeddings:** Input text is converted into a continuous vector space using embedding layers. This step transforms discrete words or subwords into numerical representations that capture semantic relationships.

- **Transformer Blocks:** LLMs consist of multiple stacked transformer blocks. Each block contains self-attention mechanisms and feed-forward neural networks. The self-attention mechanism allows the model to weigh the importance of different words in a context, capturing long-range dependencies and relationships within the text.

- **Attention Mechanisms:** Attention mechanisms enable the model to focus on relevant parts of the input text when generating output. This is crucial for tasks like translation, where the model needs to align source and target language elements accurately.

- **Decoder:** In generative models, a decoder is used to generate text from the encoded representations. The decoder uses masked self-attention to ensure that predictions for each word depend only on previously generated words.

#### Training

Training LLMs involves pre-training and fine-tuning stages:

- **Pre-training:** During this phase, the model is trained on a large corpus of text data using unsupervised learning objectives, such as predicting masked words or the next word in a sequence. This helps the model learn language patterns, grammar, and context.

- **Fine-tuning:** After pre-training, the model is fine-tuned on specific tasks using supervised learning. This stage involves training the model on labeled datasets to perform tasks like sentiment analysis, question answering, or text classification.

#### Applications

LLMs have a wide range of applications across various domains:

- **Text Generation:** LLMs can generate coherent and contextually relevant text, useful for creative writing, content creation, and dialogue generation in chatbots.

- **Machine Translation:** LLMs power modern translation systems, providing accurate translations between multiple languages by understanding the nuances and context of the source text.

- **Summarization:** LLMs can condense long documents into concise summaries, aiding in information retrieval and reducing the time required to understand large volumes of text.

- **Sentiment Analysis:** LLMs can analyze text to determine the sentiment expressed, valuable for market analysis, customer feedback, and social media monitoring.

- **Conversational Agents:** LLMs enable the development of advanced chatbots and virtual assistants that can understand and respond to user queries naturally and contextually.

Overall, LLMs have transformed the field of NLP, enabling more sophisticated and human-like interactions between machines and users, and continue to evolve with ongoing research and technological advancements. The introduction of the transformer architecture by Vaswani et al. has been instrumental in this transformation, providing a foundation for the development of increasingly powerful language models.
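As a companion to the architecture section above, here is a minimal single-head scaled dot-product self-attention, the core operation of a transformer block. PyTorch, the single-head simplification, and the shapes are assumptions for illustration:

```python
import math
import torch

def self_attention(x: torch.Tensor, w_q, w_k, w_v) -> torch.Tensor:
    """Single-head scaled dot-product self-attention over a sequence x: (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / math.sqrt(k.shape[-1])  # how strongly each token attends to each other token
    weights = torch.softmax(scores, dim=-1)    # each row sums to 1
    return weights @ v                         # weighted mix of value vectors

d_model = 16
x = torch.randn(5, d_model)                    # five token embeddings
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([5, 16])
```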
examples/data/structured_outputs_articles/moe.md — new file (35 lines)
@@ -0,0 +1,35 @@
### Mixture of Experts (MoE)

Mixture of Experts (MoE) is a machine learning technique designed to enhance model performance by combining the predictions of multiple specialized models, or "experts." The concept was introduced by Michael I. Jordan and Robert A. Jacobs in 1991. MoE models have been applied in various fields, including natural language processing, computer vision, and speech recognition, to improve accuracy and efficiency.

#### Architecture

A typical MoE architecture consists of several key components:

- **Experts:** These are individual models, each trained to specialize in different parts of the input space or specific aspects of the task. Each expert might be a neural network trained to focus on particular features or patterns within the data.

- **Gating Network:** The gating network is responsible for dynamically selecting which experts should be consulted for a given input. It assigns weights to each expert's output, determining their contribution to the final prediction. The gating network typically uses a softmax function to produce these weights, ensuring they sum to one.

- **Combiner:** The combiner aggregates the outputs from the selected experts, weighted by the gating network. This combination can be a weighted sum or another aggregation method, producing the final output of the MoE model.

#### Training

Training an MoE model involves two main stages:

- **Expert Training:** Each expert model is trained on a subset of the data or a specific aspect of the task. This training can be done independently, with each expert focusing on maximizing performance within its specialized domain.

- **Gating Network Training:** The gating network is trained to learn the optimal combination of experts for different inputs. It uses the loss from the final combined output to adjust its parameters, learning which experts are most relevant for various parts of the input space.

#### Applications

MoE models have a wide range of applications across different domains:

- **Natural Language Processing:** In tasks like machine translation and language modeling, MoE models can dynamically allocate computational resources, allowing different experts to handle different linguistic features or contexts.

- **Computer Vision:** MoE models can be used for image classification and object detection, where different experts specialize in recognizing specific types of objects or features within images.

- **Speech Recognition:** In speech recognition systems, MoE models can improve accuracy by assigning different experts to handle variations in speech, such as accents, intonations, or background noise.

- **Recommendation Systems:** MoE models can enhance recommendation engines by using different experts to analyze various aspects of user behavior and preferences, providing more personalized recommendations.

Overall, Mixture of Experts models offer a powerful framework for improving the performance and efficiency of machine learning systems. By leveraging specialized experts and dynamically selecting the most relevant ones for each input, MoE models can achieve superior results in a variety of complex tasks. The pioneering work of Jordan and Jacobs laid the foundation for this innovative approach, which continues to evolve and find new applications in modern AI research.
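The experts / gating network / combiner decomposition above maps directly to code. A minimal dense-MoE sketch (PyTorch assumed; every expert runs and the gate's softmax weights combine their outputs, purely for illustration):

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Dense MoE: all experts run; the gating network's softmax weights combine their outputs."""
    def __init__(self, d_in: int, d_out: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(d_in, d_out) for _ in range(num_experts))
        self.gate = nn.Linear(d_in, num_experts)  # gating network

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)             # (batch, num_experts), rows sum to 1
        outputs = torch.stack([e(x) for e in self.experts], -1)   # (batch, d_out, num_experts)
        return (outputs * weights.unsqueeze(1)).sum(-1)           # combiner: weighted sum

moe = MixtureOfExperts(d_in=8, d_out=3)
print(moe(torch.randn(16, 8)).shape)  # torch.Size([16, 3])
```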