langchain/docs/extras/modules/chains/how_to/openai_functions.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "54ccb772",
   "metadata": {},
   "source": [
    "# Using OpenAI functions\n",
    "This walkthrough demonstrates how to incorporate OpenAI function-calling API's in a chain. We'll go over: \n",
    "1. How to use functions to get structured outputs from ChatOpenAI\n",
    "2. How to create a generic chain that uses (multiple) functions\n",
    "3. How to create a chain that actually executes the chosen function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "767ac575",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Optional\n",
    "\n",
    "from langchain.chains.openai_functions import (\n",
    "    create_openai_fn_chain,\n",
    "    create_structured_output_chain,\n",
    ")\n",
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate\n",
    "from langchain.schema import HumanMessage, SystemMessage"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "976b6496",
   "metadata": {},
   "source": [
    "## Getting structured outputs\n",
    "We can take advantage of OpenAI functions to try and force the model to return a particular kind of structured output. We'll use the `create_structured_output_chain` to create our chain, which takes the desired structured output either as a Pydantic class or as JsonSchema.\n",
    "\n",
    "See here for relevant [reference docs](https://api.python.langchain.com/en/latest/chains/langchain.chains.openai_functions.base.create_structured_output_chain.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e052faae",
   "metadata": {},
   "source": [
    "### Using Pydantic classes\n",
    "When passing in Pydantic classes to structure our text, we need to make sure to have a docstring description for the class. It also helps to have descriptions for each of the classes attributes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "0e085c99",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pydantic import BaseModel, Field\n",
    "\n",
    "\n",
    "class Person(BaseModel):\n",
    "    \"\"\"Identifying information about a person.\"\"\"\n",
    "\n",
    "    name: str = Field(..., description=\"The person's name\")\n",
    "    age: int = Field(..., description=\"The person's age\")\n",
    "    fav_food: Optional[str] = Field(None, description=\"The person's favorite food\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "b459a33e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a world class algorithm for extracting information in structured formats.\n",
      "Human: Use the given format to extract information from the following input:\n",
      "Human: Sally is 13\n",
      "Human: Tips: Make sure to answer in the correct format\u001b[0m\n",
      " {'function_call': {'name': '_OutputFormatter', 'arguments': '{\\n  \"output\": {\\n    \"name\": \"Sally\",\\n    \"age\": 13,\\n    \"fav_food\": \"Unknown\"\\n  }\\n}'}}\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Person(name='Sally', age=13, fav_food='Unknown')"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# If we pass in a model explicitly, we need to make sure it supports the OpenAI function-calling API.\n",
    "llm = ChatOpenAI(model=\"gpt-4\", temperature=0)\n",
    "\n",
    "prompt_msgs = [\n",
    "    SystemMessage(\n",
    "        content=\"You are a world class algorithm for extracting information in structured formats.\"\n",
    "    ),\n",
    "    HumanMessage(\n",
    "        content=\"Use the given format to extract information from the following input:\"\n",
    "    ),\n",
    "    HumanMessagePromptTemplate.from_template(\"{input}\"),\n",
    "    HumanMessage(content=\"Tips: Make sure to answer in the correct format\"),\n",
    "]\n",
    "prompt = ChatPromptTemplate(messages=prompt_msgs)\n",
    "\n",
    "chain = create_structured_output_chain(Person, llm, prompt, verbose=True)\n",
    "chain.run(\"Sally is 13\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e3539936",
   "metadata": {},
   "source": [
    "To extract arbitrarily many structured outputs of a given format, we can just create a wrapper Pydantic class  that takes a sequence of the original class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4d8ea815",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a world class algorithm for extracting information in structured formats.\n",
      "Human: Use the given format to extract information from the following input:\n",
      "Human: Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally, so she's 23.\n",
      "Human: Tips: Make sure to answer in the correct format\u001b[0m\n",
      " {'function_call': {'name': '_OutputFormatter', 'arguments': '{\\n  \"output\": {\\n    \"people\": [\\n      {\\n        \"name\": \"Sally\",\\n        \"age\": 13,\\n        \"fav_food\": \"\"\\n      },\\n      {\\n        \"name\": \"Joey\",\\n        \"age\": 12,\\n        \"fav_food\": \"spinach\"\\n      },\\n      {\\n        \"name\": \"Caroline\",\\n        \"age\": 23,\\n        \"fav_food\": \"\"\\n      }\\n    ]\\n  }\\n}'}}\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "People(people=[Person(name='Sally', age=13, fav_food=''), Person(name='Joey', age=12, fav_food='spinach'), Person(name='Caroline', age=23, fav_food='')])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from typing import Sequence\n",
    "\n",
    "\n",
    "class People(BaseModel):\n",
    "    \"\"\"Identifying information about all people in a text.\"\"\"\n",
    "\n",
    "    people: Sequence[Person] = Field(..., description=\"The people in the text\")\n",
    "\n",
    "\n",
    "chain = create_structured_output_chain(People, llm, prompt, verbose=True)\n",
    "chain.run(\n",
    "    \"Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally, so she's 23.\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea66e10e",
   "metadata": {},
   "source": [
    "### Using JsonSchema\n",
    "\n",
    "We can also pass in JsonSchema instead of Pydantic classes to specify the desired structure. When we do this, our chain will output json corresponding to the properties described in the JsonSchema, instead of a Pydantic class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "3484415e",
   "metadata": {},
   "outputs": [],
   "source": [
    "json_schema = {\n",
    "    \"title\": \"Person\",\n",
    "    \"description\": \"Identifying information about a person.\",\n",
    "    \"type\": \"object\",\n",
    "    \"properties\": {\n",
    "        \"name\": {\"title\": \"Name\", \"description\": \"The person's name\", \"type\": \"string\"},\n",
    "        \"age\": {\"title\": \"Age\", \"description\": \"The person's age\", \"type\": \"integer\"},\n",
    "        \"fav_food\": {\n",
    "            \"title\": \"Fav Food\",\n",
    "            \"description\": \"The person's favorite food\",\n",
    "            \"type\": \"string\",\n",
    "        },\n",
    "    },\n",
    "    \"required\": [\"name\", \"age\"],\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "be9b76b3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a world class algorithm for extracting information in structured formats.\n",
      "Human: Use the given format to extract information from the following input:\n",
      "Human: Sally is 13\n",
      "Human: Tips: Make sure to answer in the correct format\u001b[0m\n",
      " {'function_call': {'name': 'output_formatter', 'arguments': '{\\n  \"name\": \"Sally\",\\n  \"age\": 13\\n}'}}\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'name': 'Sally', 'age': 13}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain = create_structured_output_chain(json_schema, llm, prompt, verbose=True)\n",
    "chain.run(\"Sally is 13\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12394696",
   "metadata": {},
   "source": [
    "## Creating a generic OpenAI functions chain\n",
    "To create a generic OpenAI functions chain, we can use the `create_openai_fn_chain` method. This is the same as `create_structured_output_chain` except that instead of taking a single output schema, it takes a sequence of function definitions.\n",
    "\n",
    "Functions can be passed in as:\n",
    "- dicts conforming to OpenAI functions spec,\n",
    "- Pydantic classes, in which case they should have docstring descriptions of the function they represent and descriptions for each of the parameters,\n",
    "- Python functions, in which case they should have docstring descriptions of the function and args, along with type hints.\n",
    "\n",
    "See here for relevant [reference docs](https://api.python.langchain.com/en/latest/chains/langchain.chains.openai_functions.base.create_openai_fn_chain.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff19be25",
   "metadata": {},
   "source": [
    "### Using Pydantic classes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "17f52508",
   "metadata": {},
   "outputs": [],
   "source": [
    "class RecordPerson(BaseModel):\n",
    "    \"\"\"Record some identifying information about a pe.\"\"\"\n",
    "\n",
    "    name: str = Field(..., description=\"The person's name\")\n",
    "    age: int = Field(..., description=\"The person's age\")\n",
    "    fav_food: Optional[str] = Field(None, description=\"The person's favorite food\")\n",
    "\n",
    "\n",
    "class RecordDog(BaseModel):\n",
    "    \"\"\"Record some identifying information about a dog.\"\"\"\n",
    "\n",
    "    name: str = Field(..., description=\"The dog's name\")\n",
    "    color: str = Field(..., description=\"The dog's color\")\n",
    "    fav_food: Optional[str] = Field(None, description=\"The dog's favorite food\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "a4658ad8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a world class algorithm for recording entities\n",
      "Human: Make calls to the relevant function to record the entities in the following input:\n",
      "Human: Harry was a chubby brown beagle who loved chicken\n",
      "Human: Tips: Make sure to answer in the correct format\u001b[0m\n",
      " {'function_call': {'name': 'RecordDog', 'arguments': '{\\n  \"name\": \"Harry\",\\n  \"color\": \"brown\",\\n  \"fav_food\": \"chicken\"\\n}'}}\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "RecordDog(name='Harry', color='brown', fav_food='chicken')"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prompt_msgs = [\n",
    "    SystemMessage(content=\"You are a world class algorithm for recording entities\"),\n",
    "    HumanMessage(\n",
    "        content=\"Make calls to the relevant function to record the entities in the following input:\"\n",
    "    ),\n",
    "    HumanMessagePromptTemplate.from_template(\"{input}\"),\n",
    "    HumanMessage(content=\"Tips: Make sure to answer in the correct format\"),\n",
    "]\n",
    "prompt = ChatPromptTemplate(messages=prompt_msgs)\n",
    "\n",
    "chain = create_openai_fn_chain([RecordPerson, RecordDog], llm, prompt, verbose=True)\n",
    "chain.run(\"Harry was a chubby brown beagle who loved chicken\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df6d9147",
   "metadata": {},
   "source": [
    "### Using Python functions\n",
    "We can pass in functions as Pydantic classes, directly as OpenAI function dicts, or Python functions. To pass Python function in directly, we'll want to make sure our parameters have type hints, we have a docstring, and we use [Google Python style docstrings](https://google.github.io/styleguide/pyguide.html#doc-function-args) to describe the parameters.\n",
    "\n",
    "**NOTE**: To use Python functions, make sure the function arguments are of primitive types (str, float, int, bool) or that they are Pydantic objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "95ac5825",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a world class algorithm for recording entities\n",
      "Human: Make calls to the relevant function to record the entities in the following input:\n",
      "Human: The most important thing to remember about Tommy, my 12 year old, is that he'll do anything for apple pie.\n",
      "Human: Tips: Make sure to answer in the correct format\u001b[0m\n",
      " {'function_call': {'name': 'record_person', 'arguments': '{\\n  \"name\": \"Tommy\",\\n  \"age\": 12,\\n  \"fav_food\": {\\n    \"food\": \"apple pie\"\\n  }\\n}'}}\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'name': 'Tommy', 'age': 12, 'fav_food': {'food': 'apple pie'}}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "class OptionalFavFood(BaseModel):\n",
    "    \"\"\"Either a food or null.\"\"\"\n",
    "\n",
    "    food: Optional[str] = Field(\n",
    "        None,\n",
    "        description=\"Either the name of a food or null. Should be null if the food isn't known.\",\n",
    "    )\n",
    "\n",
    "\n",
    "def record_person(name: str, age: int, fav_food: OptionalFavFood) -> str:\n",
    "    \"\"\"Record some basic identifying information about a person.\n",
    "\n",
    "    Args:\n",
    "        name: The person's name.\n",
    "        age: The person's age in years.\n",
    "        fav_food: An OptionalFavFood object that either contains the person's favorite food or a null value. Food should be null if it's not known.\n",
    "    \"\"\"\n",
    "    return f\"Recording person {name} of age {age} with favorite food {fav_food.food}!\"\n",
    "\n",
    "\n",
    "chain = create_openai_fn_chain([record_person], llm, prompt, verbose=True)\n",
    "chain.run(\n",
    "    \"The most important thing to remember about Tommy, my 12 year old, is that he'll do anything for apple pie.\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "403ea5dd",
   "metadata": {},
   "source": [
    "If we pass in multiple Python functions or OpenAI functions, then the returned output will be of the form\n",
    "```python\n",
    "{\"name\": \"<<function_name>>\", \"arguments\": {<<function_arguments>>}}\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "8b0d11de",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mSystem: You are a world class algorithm for recording entities\n",
      "Human: Make calls to the relevant function to record the entities in the following input:\n",
      "Human: I can't find my dog Henry anywhere, he's a small brown beagle. Could you send a message about him?\n",
      "Human: Tips: Make sure to answer in the correct format\u001b[0m\n",
      " {'function_call': {'name': 'record_dog', 'arguments': '{\\n  \"name\": \"Henry\",\\n  \"color\": \"brown\",\\n  \"fav_food\": {\\n    \"food\": null\\n  }\\n}'}}\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'name': 'record_dog',\n",
       " 'arguments': {'name': 'Henry', 'color': 'brown', 'fav_food': {'food': None}}}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def record_dog(name: str, color: str, fav_food: OptionalFavFood) -> str:\n",
    "    \"\"\"Record some basic identifying information about a dog.\n",
    "\n",
    "    Args:\n",
    "        name: The dog's name.\n",
    "        color: The dog's color.\n",
    "        fav_food: An OptionalFavFood object that either contains the dog's favorite food or a null value. Food should be null if it's not known.\n",
    "    \"\"\"\n",
    "    return f\"Recording dog {name} of color {color} with favorite food {fav_food}!\"\n",
    "\n",
    "\n",
    "chain = create_openai_fn_chain([record_person, record_dog], llm, prompt, verbose=True)\n",
    "chain.run(\n",
    "    \"I can't find my dog Henry anywhere, he's a small brown beagle. Could you send a message about him?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f93686b",
   "metadata": {},
   "source": [
    "## Other Chains using OpenAI  functions\n",
    "\n",
    "There are a number of more specific chains that use OpenAI functions.\n",
    "- [Extraction](/docs/modules/chains/additional/extraction): very similar to structured output chain, intended for information/entity extraction specifically.\n",
    "- [Tagging](/docs/use_cases/tagging): tag inputs.\n",
    "- [OpenAPI](/docs/use_cases/apis/openapi_openai): take an OpenAPI spec and create + execute valid requests against the API, using OpenAI functions under the hood.\n",
    "- [QA with citations](/docs/use_cases/question_answering/how_to/qa_citations): use OpenAI functions ability to extract citations from text."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "venv"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}