update extraction use-case docs (#17979)

Update extraction use-case docs to showcase and explain all modes of
`create_structured_output_runnable`.
ccurme 4 months ago committed by GitHub
parent 8a81fcd5d3
commit 9bf58ec7dd

@ -19,9 +19,7 @@
"\n",
"## Use case\n",
"\n",
"Getting structured output from raw LLM generations is hard.\n",
"\n",
"For example, suppose you need the model output formatted with a specific schema for:\n",
"LLMs can be used to generate text that is structured according to a specific schema. This can be useful in a number of scenarios, including:\n",
"\n",
"- Extracting a structured row to insert into a database \n",
"- Extracting API parameters\n",
@ -43,17 +41,23 @@
"source": [
"## Overview \n",
"\n",
"There are two primary approaches for this:\n",
"\n",
"- `Functions`: Some LLMs can call [functions](https://openai.com/blog/function-calling-and-other-api-updates) to extract arbitrary entities from LLM responses.\n",
"There are two broad approaches for this:\n",
"\n",
"- `Parsing`: [Output parsers](/docs/modules/model_io/output_parsers/) are classes that structure LLM responses. \n",
"- `Tools and JSON mode`: Some LLMs specifically support structured output generation in certain contexts. Examples include OpenAI's [function and tool calling](https://platform.openai.com/docs/guides/function-calling) or [JSON mode](https://platform.openai.com/docs/guides/text-generation/json-mode).\n",
"\n",
"Only some LLMs support functions (e.g., OpenAI), and they are more general than parsers. \n",
"- `Parsing`: LLMs can often be instructed to output their response in a dseired format. [Output parsers](/docs/modules/model_io/output_parsers/) will parse text generations into a structured form.\n",
"\n",
"Parsers extract precisely what is enumerated in a provided schema (e.g., specific attributes of a person).\n",
"\n",
"Functions can infer things beyond a provided schema (e.g., attributes about a person that you did not ask for)."
"Functions and tools can infer things beyond a provided schema (e.g., attributes about a person that you did not ask for)."
]
},
{
"cell_type": "markdown",
"id": "fbea06b5-66b6-4958-936d-23212061e4c8",
"metadata": {},
"source": [
"## Option 1: Leveraging tools and JSON mode"
]
},
{
@ -61,13 +65,16 @@
"id": "25d89f21",
"metadata": {},
"source": [
"## Quickstart\n",
"### Quickstart\n",
"\n",
"OpenAI functions are one way to get started with extraction.\n",
"`create_structured_output_runnable` will create Runnables to support structured data extraction via OpenAI tool use and JSON mode.\n",
"\n",
"Define a schema that specifies the properties we want to extract from the LLM output.\n",
"The desired output schema can be expressed either via a Pydantic model or a Python dict representing valid [JsonSchema](https://json-schema.org/).\n",
"\n",
"Then, we can use `create_extraction_chain` to extract our desired schema using an OpenAI function call."
"This function supports three modes for structured data extraction:\n",
"- `\"openai-functions\"` will define OpenAI functions and bind them to the given LLM;\n",
"- `\"openai-tools\"` will define OpenAI tools and bind them to the given LLM;\n",
"- `\"openai-json\"` will bind `response_format={\"type\": \"json_object\"}` to the given LLM.\n"
]
},
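To make the three modes concrete, here is a small, self-contained sketch (plain Python, not langchain internals) of the extra chat-completions parameters each mode binds to the model. The `schema` payload and `bind_kwargs` helper are hypothetical, for illustration only:

```python
import json

# Illustrative only: the three modes differ mainly in which OpenAI
# chat-completions parameters carry the output schema.
schema = {
    "name": "Person",
    "parameters": {
        "type": "object",
        "properties": {"person_name": {"type": "string"}},
        "required": ["person_name"],
    },
}


def bind_kwargs(mode: str) -> dict:
    """Return the extra request kwargs each extraction mode would bind."""
    if mode == "openai-functions":
        # Deprecated `functions` parameter; force the function call.
        return {"functions": [schema], "function_call": {"name": schema["name"]}}
    if mode == "openai-tools":
        # Current `tools` parameter; functions are one tool type.
        return {"tools": [{"type": "function", "function": schema}]}
    if mode == "openai-json":
        # JSON mode: no schema in the request, only a JSON-output constraint.
        return {"response_format": {"type": "json_object"}}
    raise ValueError(f"unknown mode: {mode}")


print(json.dumps(bind_kwargs("openai-json")))
```

Note that in JSON mode the schema is not part of the request at all, which is why that mode relies on prompting (covered below).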
{
@ -86,197 +93,184 @@
},
{
"cell_type": "code",
"execution_count": 12,
"id": "3e017ba0",
"execution_count": 1,
"id": "4c2bc413-eacd-44bd-9fcb-bbbe1f97ca6c",
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional\n",
"\n",
"from langchain.chains import create_structured_output_runnable\n",
"from langchain_core.pydantic_v1 import BaseModel\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"\n",
"class Person(BaseModel):\n",
" person_name: str\n",
" person_height: int\n",
" person_hair_color: str\n",
" dog_breed: Optional[str]\n",
" dog_name: Optional[str]\n",
"\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-4-0125-preview\", temperature=0)\n",
"runnable = create_structured_output_runnable(Person, llm)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "de8c9d7b-bb7b-45bc-9794-a355ed0d1508",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},\n",
" {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]"
"Person(person_name='Alex', person_height=60, person_hair_color='blond', dog_breed=None, dog_name=None)"
]
},
"execution_count": 12,
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains import create_extraction_chain\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"# Schema\n",
"schema = {\n",
" \"properties\": {\n",
" \"name\": {\"type\": \"string\"},\n",
" \"height\": {\"type\": \"integer\"},\n",
" \"hair_color\": {\"type\": \"string\"},\n",
" },\n",
" \"required\": [\"name\", \"height\"],\n",
"}\n",
"\n",
"# Input\n",
"inp = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\"\"\"\n",
"\n",
"# Run chain\n",
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo\")\n",
"chain = create_extraction_chain(schema, llm)\n",
"chain.run(inp)"
"inp = \"Alex is 5 feet tall and has blond hair.\"\n",
"runnable.invoke(inp)"
]
},
{
"cell_type": "markdown",
"id": "6f7eb826",
"id": "02fd21ff-27a8-4890-bb18-fc852cafb18a",
"metadata": {},
"source": [
"## Option 1: OpenAI functions\n",
"\n",
"### Looking under the hood\n",
"\n",
"Let's dig into what is happening when we call `create_extraction_chain`.\n",
"\n",
"The [LangSmith trace](https://smith.langchain.com/public/72bc3205-7743-4ca6-929a-966a9d4c2a77/r) shows that we call the function `information_extraction` on the input string, `inp`.\n",
"\n",
"![Image description](../../static/img/extraction_trace_function.png)\n",
"\n",
"This `information_extraction` function is defined [here](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/openai_functions/extraction.py) and returns a dict.\n",
"\n",
"We can see the `dict` in the model output:\n",
"```\n",
" {\n",
" \"info\": [\n",
" {\n",
" \"name\": \"Alex\",\n",
" \"height\": 5,\n",
" \"hair_color\": \"blonde\"\n",
" },\n",
" {\n",
" \"name\": \"Claudia\",\n",
" \"height\": 6,\n",
" \"hair_color\": \"brunette\"\n",
" }\n",
" ]\n",
" }\n",
"```\n",
"\n",
"The `create_extraction_chain` then parses the raw LLM output for us using [`JsonKeyOutputFunctionsParser`](https://github.com/langchain-ai/langchain/blob/f81e613086d211327b67b0fb591fd4d5f9a85860/libs/langchain/langchain/chains/openai_functions/extraction.py#L62).\n",
"\n",
"This results in the list of JSON objects returned by the chain above:\n",
"```\n",
"[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},\n",
" {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]\n",
" ```"
"### Specifying schemas"
]
},
{
"cell_type": "markdown",
"id": "dcb03138",
"id": "a5a74f3e-92aa-4ac7-96f2-ea89b8740ba8",
"metadata": {},
"source": [
"A convenient way to express output schemas is via Pydantic. The above example specified the desired schema via \`Person\`, a Pydantic model. Such schemas can easily be combined to generate richer output formats:"
]
},
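As a standalone illustration of how such nested models compose (using plain `pydantic` here rather than `langchain_core.pydantic_v1`, so the sketch runs on its own), a dict matching the schema validates directly — roughly the shape of data the runnable hands back:

```python
from typing import Optional, Sequence

from pydantic import BaseModel  # assumes pydantic is installed


class Person(BaseModel):
    person_name: str
    person_height: int
    person_hair_color: str
    dog_breed: Optional[str] = None
    dog_name: Optional[str] = None


class People(BaseModel):
    """Identifying information about all people in a text."""

    people: Sequence[Person]


# A dict matching the nested schema validates directly into typed objects.
data = {
    "people": [
        {"person_name": "Alex", "person_height": 60, "person_hair_color": "blond"}
    ]
}
people = People(**data)
print(people.people[0].person_name)  # Alex
```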
{
"cell_type": "code",
"execution_count": 3,
"id": "c1c8fe71-0ae4-466a-b32f-001c59b62bb3",
"metadata": {},
"outputs": [],
"source": [
"### Multiple entity types\n",
"from typing import Sequence\n",
"\n",
"\n",
"class People(BaseModel):\n",
" \"\"\"Identifying information about all people in a text.\"\"\"\n",
"\n",
"We can extend this further.\n",
" people: Sequence[Person]\n",
"\n",
"Let's say we want to differentiate between dogs and people.\n",
"\n",
"We can add `person_` and `dog_` prefixes for each property"
"runnable = create_structured_output_runnable(People, llm)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "01eae733",
"execution_count": 4,
"id": "c5aa9e43-9202-4b2d-a767-e596296b3a81",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex',\n",
" 'person_height': 5,\n",
" 'person_hair_color': 'blonde',\n",
" 'dog_name': 'Frosty',\n",
" 'dog_breed': 'labrador'},\n",
" {'person_name': 'Claudia',\n",
" 'person_height': 6,\n",
" 'person_hair_color': 'brunette'}]"
"People(people=[Person(person_name='Alex', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None), Person(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed='beagle', dog_name='Harry')])"
]
},
"execution_count": 8,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inp = \"\"\"Alex is 5 feet tall and has blond hair.\n",
"Claudia is 1 foot taller than Alex and jumps higher than him.\n",
"Claudia is a brunette and has a beagle named Harry.\"\"\"\n",
"\n",
"runnable.invoke(inp)"
]
},
{
"cell_type": "markdown",
"id": "53e316ea-b74a-4512-a9ab-c5d01ff583fe",
"metadata": {},
"source": [
"Note that `dog_breed` and `dog_name` are optional attributes, such that here they are extracted for Claudia and not for Alex.\n",
"\n",
"One can also specify the desired output format with a Python dict representing valid JsonSchema:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "3e017ba0",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"person_name\": {\"type\": \"string\"},\n",
" \"person_height\": {\"type\": \"integer\"},\n",
" \"person_hair_color\": {\"type\": \"string\"},\n",
" \"dog_name\": {\"type\": \"string\"},\n",
" \"dog_breed\": {\"type\": \"string\"},\n",
" \"name\": {\"type\": \"string\"},\n",
" \"height\": {\"type\": \"integer\"},\n",
" \"hair_color\": {\"type\": \"string\"},\n",
" },\n",
" \"required\": [\"person_name\", \"person_height\"],\n",
" \"required\": [\"name\", \"height\"],\n",
"}\n",
"\n",
"chain = create_extraction_chain(schema, llm)\n",
"\n",
"inp = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"Alex's dog Frosty is a labrador and likes to play hide and seek.\"\"\"\n",
"\n",
"chain.run(inp)"
"runnable = create_structured_output_runnable(schema, llm)"
]
},
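A minimal sketch of what validating model output against such a dict schema involves — a hand-rolled check of the `required` keys, standing in for a full JSON Schema validator such as the `jsonschema` package (the `check_required` helper is hypothetical):

```python
import json

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "height": {"type": "integer"},
        "hair_color": {"type": "string"},
    },
    "required": ["name", "height"],
}


def check_required(raw: str, schema: dict) -> dict:
    """Parse a model's JSON output and confirm all required keys are present."""
    obj = json.loads(raw)
    missing = [key for key in schema["required"] if key not in obj]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return obj


print(check_required('{"name": "Alex", "height": 60}', schema))
```

Optional properties such as `hair_color` pass the check when absent; missing required keys raise an error.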
{
"cell_type": "markdown",
"id": "f205905c",
"cell_type": "code",
"execution_count": 6,
"id": "fb525991-643d-4d47-9111-a3d4364c03d7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'name': 'Alex', 'height': 60}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### Unrelated entities\n",
"\n",
"If we use `required: []`, we allow the model to return **only** person attributes or **only** dog attributes for a single entity (person or dog)."
"inp = \"Alex is 5 feet tall. I don't know his hair color.\"\n",
"runnable.invoke(inp)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "6ff4ac7e",
"execution_count": 7,
"id": "a3d3f0d2-c9d4-4ab8-9a5a-1ddda62db6ec",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
" {'person_name': 'Claudia',\n",
" 'person_height': 6,\n",
" 'person_hair_color': 'brunette'},\n",
" {'dog_name': 'Willow', 'dog_breed': 'German Shepherd'},\n",
" {'dog_name': 'Milo', 'dog_breed': 'border collie'}]"
"{'name': 'Alex', 'height': 60, 'hair_color': 'blond'}"
]
},
"execution_count": 14,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"person_name\": {\"type\": \"string\"},\n",
" \"person_height\": {\"type\": \"integer\"},\n",
" \"person_hair_color\": {\"type\": \"string\"},\n",
" \"dog_name\": {\"type\": \"string\"},\n",
" \"dog_breed\": {\"type\": \"string\"},\n",
" },\n",
" \"required\": [],\n",
"}\n",
"\n",
"chain = create_extraction_chain(schema, llm)\n",
"\n",
"inp = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"Willow is a German Shepherd that likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.\"\"\"\n",
"\n",
"chain.run(inp)"
"inp = \"Alex is 5 feet tall. He is blond.\"\n",
"runnable.invoke(inp)"
]
},
{
@ -284,11 +278,9 @@
"id": "34f3b958",
"metadata": {},
"source": [
"### Extra information\n",
"#### Extra information\n",
"\n",
"The power of functions (relative to using parsers alone) lies in the ability to perform semantic extraction.\n",
"\n",
"In particular, `we can ask for things that are not explicitly enumerated in the schema`.\n",
"Runnables constructed via \`create_structured_output_runnable\` are generally capable of semantic extraction: they can populate information that is not explicitly enumerated in the schema.\n",
"\n",
"Suppose we want unspecified additional information about dogs. \n",
"\n",
@ -297,44 +289,53 @@
},
{
"cell_type": "code",
"execution_count": 10,
"id": "40c7b26f",
"execution_count": 8,
"id": "0ed3b5e6-a7f3-453e-be61-d94fc665c16b",
"metadata": {},
"outputs": [],
"source": [
"inp = \"\"\"Alex is 5 feet tall and has blond hair.\n",
"Claudia is 1 foot taller than Alex and jumps higher than him.\n",
"Claudia is a brunette and has a beagle named Harry.\n",
"Harry likes to play with other dogs and can always be found\n",
"playing with Milo, a border collie that lives close by.\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "be07928a-8022-4963-a15e-eb3097beef9f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
" {'person_name': 'Claudia',\n",
" 'person_height': 6,\n",
" 'person_hair_color': 'brunette'},\n",
" {'dog_name': 'Willow',\n",
" 'dog_breed': 'German Shepherd',\n",
" 'dog_extra_info': 'likes to play with other dogs'},\n",
" {'dog_name': 'Milo',\n",
" 'dog_breed': 'border collie',\n",
" 'dog_extra_info': 'lives close by'}]"
"People(people=[Person(person_name='Alex', person_height=60, person_hair_color='blond', dog_breed=None, dog_name=None, dog_extra_info=None), Person(person_name='Claudia', person_height=72, person_hair_color='brunette', dog_breed='beagle', dog_name='Harry', dog_extra_info='likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.')])"
]
},
"execution_count": 10,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"person_name\": {\"type\": \"string\"},\n",
" \"person_height\": {\"type\": \"integer\"},\n",
" \"person_hair_color\": {\"type\": \"string\"},\n",
" \"dog_name\": {\"type\": \"string\"},\n",
" \"dog_breed\": {\"type\": \"string\"},\n",
" \"dog_extra_info\": {\"type\": \"string\"},\n",
" },\n",
"}\n",
"class Person(BaseModel):\n",
" person_name: str\n",
" person_height: int\n",
" person_hair_color: str\n",
" dog_breed: Optional[str]\n",
" dog_name: Optional[str]\n",
" dog_extra_info: Optional[str]\n",
"\n",
"\n",
"chain = create_extraction_chain(schema, llm)\n",
"chain.run(inp)"
"class People(BaseModel):\n",
" \"\"\"Identifying information about all people in a text.\"\"\"\n",
"\n",
" people: Sequence[Person]\n",
"\n",
"\n",
"runnable = create_structured_output_runnable(People, llm)\n",
"runnable.invoke(inp)"
]
},
{
@ -347,112 +348,320 @@
},
{
"cell_type": "markdown",
"id": "bf71ddce",
"id": "97ed9f5e-33be-4667-aa82-af49cc874e1d",
"metadata": {},
"source": [
"### Pydantic \n",
"### Specifying extraction mode\n",
"\n",
"Pydantic is a data validation and settings management library for Python. \n",
"`create_structured_output_runnable` supports varying implementations of the underlying extraction under the hood, which are configured via the `mode` parameter. This parameter can be one of `\"openai-functions\"`, `\"openai-tools\"`, or `\"openai-json\"`."
]
},
{
"cell_type": "markdown",
"id": "7c8e0b00-d6e6-432d-b9b0-8d0a3c0c6572",
"metadata": {},
"source": [
"#### OpenAI Functions and Tools"
]
},
{
"cell_type": "markdown",
"id": "07ccdbb1-cbe5-45af-87e4-dde42baee5eb",
"metadata": {},
"source": [
"Some LLMs are fine-tuned to support the invocation of functions or tools. If they are given an input schema for a tool and recognize an occasion to use it, they may emit JSON output conforming to that schema. We can leverage this to drive structured data extraction from natural language.\n",
"\n",
"It allows you to create data classes with attributes that are automatically validated when you instantiate an object.\n",
"OpenAI originally released this via a [`functions` parameter in its chat completions API](https://openai.com/blog/function-calling-and-other-api-updates). This has since been deprecated in favor of a [`tools` parameter](https://platform.openai.com/docs/guides/function-calling), which can include (multiple) functions."
]
},
{
"cell_type": "markdown",
"id": "e6b02442-2884-4b45-a5a0-4fdac729fdb3",
"metadata": {},
"source": [
"Using OpenAI Functions:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "7b1c2266-b04b-4a23-83a9-da3cd2f88137",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Person(person_name='Alex', person_height=60, person_hair_color='blond', dog_breed=None, dog_name=None, dog_extra_info=None)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"runnable = create_structured_output_runnable(Person, llm, mode=\"openai-functions\")\n",
"\n",
"Lets define a class with attributes annotated with types."
"inp = \"Alex is 5 feet tall and has blond hair.\"\n",
"runnable.invoke(inp)"
]
},
{
"cell_type": "markdown",
"id": "1c07427b-a582-4489-a486-4c24a6c3165f",
"metadata": {},
"source": [
"Using OpenAI Tools:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d36a743b",
"execution_count": 18,
"id": "0b1ca93a-ffd9-4d37-8baa-377757405357",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Properties(person_name='Alex', person_height=5, person_hair_color='blonde', dog_breed=None, dog_name=None),\n",
" Properties(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)]"
"Person(person_name='Alex', person_height=152, person_hair_color='blond', dog_breed=None, dog_name=None)"
]
},
"execution_count": 4,
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from typing import Optional\n",
"runnable = create_structured_output_runnable(Person, llm, mode=\"openai-tools\")\n",
"\n",
"from langchain.chains import create_extraction_chain_pydantic\n",
"from langchain_core.pydantic_v1 import BaseModel\n",
"runnable.invoke(inp)"
]
},
{
"cell_type": "markdown",
"id": "4018a8fc-1799-4c9d-b655-a66f618204b3",
"metadata": {},
"source": [
"The corresponding [LangSmith trace](https://smith.langchain.com/public/04cc37a7-7a1c-4bae-b972-1cb1a642568c/r) illustrates the tool call that generated our structured output.\n",
"\n",
"![Image description](../../static/img/extraction_trace_tool.png)"
]
},
{
"cell_type": "markdown",
"id": "fb2662d5-9492-4acc-935b-eb8fccebbe0f",
"metadata": {},
"source": [
"#### JSON Mode"
]
},
{
"cell_type": "markdown",
"id": "c0fd98ba-c887-4c30-8c9e-896ae90ac56a",
"metadata": {},
"source": [
"Some LLMs support generating JSON more generally. OpenAI implements this via a [`response_format` parameter](https://platform.openai.com/docs/guides/text-generation/json-mode) in its chat completions API.\n",
"\n",
"Note that this method may require explicit prompting (e.g., OpenAI requires that input messages contain the word \"json\" in some form when using this parameter)."
]
},
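A tiny, hypothetical guard illustrating that requirement — it simply checks that "json" appears somewhere in the input messages before JSON mode would be enabled:

```python
def can_use_json_mode(messages: list[dict]) -> bool:
    """OpenAI's JSON mode requires 'json' to appear in the input messages."""
    return any("json" in m.get("content", "").lower() for m in messages)


messages = [
    {"role": "system", "content": "Return a valid JSON blob matching the schema."},
    {"role": "user", "content": "Alex is 5 feet tall."},
]
print(can_use_json_mode(messages))  # True
```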
{
"cell_type": "code",
"execution_count": 13,
"id": "6b3e4679-eadc-42c8-b882-92a600083f2f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Person(person_name='Alex', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None, dog_extra_info=None)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"# Pydantic data class\n",
"class Properties(BaseModel):\n",
" person_name: str\n",
" person_height: int\n",
" person_hair_color: str\n",
" dog_breed: Optional[str]\n",
" dog_name: Optional[str]\n",
"system_prompt = \"\"\"You extract information in structured JSON formats.\n",
"\n",
"Extract a valid JSON blob from the user input that matches the following JSON Schema:\n",
"\n",
"# Extraction\n",
"chain = create_extraction_chain_pydantic(pydantic_schema=Properties, llm=llm)\n",
"{output_schema}\"\"\"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", system_prompt),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"runnable = create_structured_output_runnable(\n",
" Person,\n",
" llm,\n",
" mode=\"openai-json\",\n",
" prompt=prompt,\n",
" enforce_function_usage=False,\n",
")\n",
"\n",
"# Run\n",
"inp = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\"\"\"\n",
"chain.run(inp)"
"runnable.invoke({\"input\": inp})"
]
},
{
"cell_type": "markdown",
"id": "07a0351a",
"id": "b22d8262-a9b8-415c-a142-d0ee4db7ec2b",
"metadata": {},
"source": [
"As we can see from the [trace](https://smith.langchain.com/public/fed50ae6-26bb-4235-a254-e0b7a229d10f/r), we use the function `information_extraction`, as above, with the Pydantic schema. "
"### Few-shot examples"
]
},
{
"cell_type": "markdown",
"id": "cbd9f121",
"id": "a01c75f6-99d7-4d7b-a58f-b0ea7e8f338a",
"metadata": {},
"source": [
"## Option 2: Parsing\n",
"\n",
"[Output parsers](/docs/modules/model_io/output_parsers/) are classes that help structure language model responses. \n",
"Suppose we want to tune the behavior of our extractor. There are a few options available. For example, if we want to redact names but retain other information, we could adjust the system prompt:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "c5d16ad6-824e-434a-906a-d94e78259d4f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Person(person_name='REDACTED', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"system_prompt = \"\"\"You extract information in structured JSON formats.\n",
"\n",
"As shown above, they are used to parse the output of the OpenAI function calls in `create_extraction_chain`.\n",
"Extract a valid JSON blob from the user input that matches the following JSON Schema:\n",
"\n",
"But, they can be used independent of functions.\n",
"{output_schema}\n",
"\n",
"### Pydantic\n",
"Redact all names.\n",
"\"\"\"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [(\"system\", system_prompt), (\"human\", \"{input}\")]\n",
")\n",
"runnable = create_structured_output_runnable(\n",
" Person,\n",
" llm,\n",
" mode=\"openai-json\",\n",
" prompt=prompt,\n",
" enforce_function_usage=False,\n",
")\n",
"\n",
"Just as a above, let's parse a generation based on a Pydantic data class."
"runnable.invoke({\"input\": inp})"
]
},
{
"cell_type": "markdown",
"id": "be611688-1224-4d5a-9e34-a158b3c04296",
"metadata": {},
"source": [
"Few-shot examples are another effective way to illustrate intended behavior. For instance, if we want to redact names with a specific character string, a one-shot example will convey this. We can use a \`FewShotChatMessagePromptTemplate\` to accommodate both a fixed set of examples and the dynamic selection of examples based on the input."
]
},
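For intuition, the template's behavior can be sketched in plain Python: each example contributes a (human, ai) message pair ahead of the real input. This is a simplified model of what `FewShotChatMessagePromptTemplate` produces, not its actual implementation, and the `few_shot_messages` helper is hypothetical:

```python
def few_shot_messages(examples: list[dict], user_input: str) -> list[tuple]:
    """Interleave (human, ai) example pairs ahead of the real input."""
    msgs = []
    for ex in examples:
        msgs.append(("human", ex["input"]))
        msgs.append(("ai", ex["output"]))
    msgs.append(("human", user_input))
    return msgs


# One-shot example demonstrating name redaction, followed by the real query.
examples = [{"input": "Samus is 6 ft tall.", "output": '{"person_name": "#####"}'}]
print(few_shot_messages(examples, "Alex is 5 feet tall."))
```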
{
"cell_type": "code",
"execution_count": null,
"id": "64650362",
"execution_count": 20,
"id": "0aeee951-7f73-4e24-9033-c81a08af14dc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"People(people=[Person(person_name='Alex', person_height=5, person_hair_color='blonde', dog_breed=None, dog_name=None), Person(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)])"
"Person(person_name='#####', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None)"
]
},
"execution_count": 10,
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.prompts import FewShotChatMessagePromptTemplate\n",
"\n",
"examples = [\n",
" {\n",
" \"input\": \"Samus is 6 ft tall and blonde.\",\n",
" \"output\": Person(\n",
" person_name=\"######\",\n",
" person_height=6,\n",
" person_hair_color=\"blonde\",\n",
" ).dict(),\n",
" }\n",
"]\n",
"\n",
"example_prompt = ChatPromptTemplate.from_messages(\n",
" [(\"human\", \"{input}\"), (\"ai\", \"{output}\")]\n",
")\n",
"few_shot_prompt = FewShotChatMessagePromptTemplate(\n",
" examples=examples,\n",
" example_prompt=example_prompt,\n",
")\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [(\"system\", system_prompt), few_shot_prompt, (\"human\", \"{input}\")]\n",
")\n",
"runnable = create_structured_output_runnable(\n",
" Person,\n",
" llm,\n",
" mode=\"openai-json\",\n",
" prompt=prompt,\n",
" enforce_function_usage=False,\n",
")\n",
"\n",
"runnable.invoke({\"input\": inp})"
]
},
{
"cell_type": "markdown",
"id": "51846211-e86b-4807-9348-eb263999f7f7",
"metadata": {},
"source": [
"Here, the [LangSmith trace](https://smith.langchain.com/public/6fe5e694-9c04-48f7-83ff-e541da764781/r) for the chat model call shows how the one-shot example is formatted into the prompt.\n",
"\n",
"![Image description](../../static/img/extraction_trace_few_shot.png)"
]
},
{
"cell_type": "markdown",
"id": "cbd9f121",
"metadata": {},
"source": [
"## Option 2: Parsing\n",
"\n",
"[Output parsers](/docs/modules/model_io/output_parsers/) are classes that help structure language model responses. \n",
"\n",
"As shown above, they are used to parse the output of the runnable created by `create_structured_output_runnable`.\n",
"\n",
"They can also be used more generally, if an LLM is instructed to emit its output in a certain format. Parsers include convenience methods for generating formatting instructions for use in prompts.\n",
"\n",
"Below we implement an example."
]
},
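To make the parser pattern concrete, here is a toy, stdlib-only stand-in (not LangChain's `PydanticOutputParser`; the `MiniJsonParser` class is hypothetical): it emits format instructions for inclusion in the prompt and parses the model's text back into a dict:

```python
import json


class MiniJsonParser:
    """Toy output parser: format instructions out, structured data back in."""

    def __init__(self, schema: dict):
        self.schema = schema

    def get_format_instructions(self) -> str:
        # Injected into the prompt so the LLM knows the expected format.
        return (
            "Respond with a JSON blob matching this schema:\n"
            + json.dumps(self.schema)
        )

    def parse(self, text: str) -> dict:
        # Convert the raw text generation into a structured object.
        return json.loads(text)


parser = MiniJsonParser({"properties": {"setup": {}, "punchline": {}}})
print(parser.parse('{"setup": "a", "punchline": "b"}'))
```

Real parsers add schema validation on top of this round trip; the shape of the workflow (instructions into the prompt, raw text parsed out) is the same.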
{
"cell_type": "code",
"execution_count": 16,
"id": "64650362",
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional, Sequence\n",
"\n",
"from langchain.output_parsers import PydanticOutputParser\n",
"from langchain.prompts import (\n",
" PromptTemplate,\n",
")\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain_core.pydantic_v1 import BaseModel, Field, validator\n",
"from langchain_openai import OpenAI\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"\n",
"class Person(BaseModel):\n",
@ -470,7 +679,7 @@
"\n",
"\n",
"# Run\n",
"query = \"\"\"Alex is 5 feet tall. Claudia is 1 foot taller than Alex and jumps higher than him. Claudia is a brunette and Alex is blond.\"\"\"\n",
"query = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blond.\"\"\"\n",
"\n",
"# Set up a parser + inject instructions into the prompt template.\n",
"parser = PydanticOutputParser(pydantic_object=People)\n",
@ -484,9 +693,30 @@
"\n",
"# Run\n",
"_input = prompt.format_prompt(query=query)\n",
"model = OpenAI(temperature=0)\n",
"output = model(_input.to_string())\n",
"parser.parse(output)"
"model = ChatOpenAI()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "727f3bf2-31b1-4b07-94f5-9568acf3ffdf",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"People(people=[Person(person_name='Alex', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None), Person(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)])"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"output = model.invoke(_input.to_string())\n",
"\n",
"parser.parse(output.content)"
]
},
{
@ -494,46 +724,31 @@
"id": "826899df",
"metadata": {},
"source": [
"We can see from the [LangSmith trace](https://smith.langchain.com/public/8e3aa858-467e-46a5-aa49-5db65f0a2b9a/r) that we get the same output as above.\n",
"We can see from the [LangSmith trace](https://smith.langchain.com/public/aec42dd3-d471-4d34-801b-20dd88444931/r) that we get the same output as above.\n",
"\n",
"![Image description](../../static/img/extraction_trace_function_2.png)\n",
"![Image description](../../static/img/extraction_trace_parsing.png)\n",
"\n",
"We can see that we provide a two-shot prompt in order to instruct the LLM to output in our desired format.\n",
"\n",
"And, we need to do a bit more work:\n",
"\n",
"* Define a class that holds multiple instances of `Person`\n",
"* Explicitly parse the output of the LLM to the Pydantic class\n",
"\n",
"We can see this for other cases, too."
"We can see that we provide a two-shot prompt in order to instruct the LLM to output in our desired format."
]
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 21,
"id": "837c350e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')"
"Joke(setup=\"Why couldn't the bicycle find its way home?\", punchline='Because it lost its bearings!')"
]
},
"execution_count": 11,
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.output_parsers import PydanticOutputParser\n",
"from langchain.prompts import (\n",
" PromptTemplate,\n",
")\n",
"from langchain_core.pydantic_v1 import BaseModel, Field, validator\n",
"from langchain_openai import OpenAI\n",
"\n",
"\n",
"# Define your desired data structure.\n",
"class Joke(BaseModel):\n",
" setup: str = Field(description=\"question to set up a joke\")\n",
@ -562,9 +777,9 @@
"\n",
"# Run\n",
"_input = prompt.format_prompt(query=joke_query)\n",
"model = OpenAI(temperature=0)\n",
"output = model(_input.to_string())\n",
"parser.parse(output)"
"model = ChatOpenAI(temperature=0)\n",
"output = model.invoke(_input.to_string())\n",
"parser.parse(output.content)"
]
},
{
@ -574,9 +789,7 @@
"source": [
"As we can see, we get an output of the `Joke` class, which respects our originally desired schema: 'setup' and 'punchline'.\n",
"\n",
"We can look at the [LangSmith trace](https://smith.langchain.com/public/69f11d41-41be-4319-93b0-6d0eda66e969/r) to see exactly what is going on under the hood.\n",
"\n",
"![Image description](../../static/img/extraction_trace_joke.png)\n",
"We can look at the [LangSmith trace](https://smith.langchain.com/public/557ad630-af35-43e9-b043-93800539025f/r) to see exactly what is going on under the hood.\n",
"\n",
"### Going deeper\n",
"\n",
@ -610,7 +823,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.4"
}
},
"nbformat": 4,
