mirror of
https://github.com/hwchase17/langchain
synced 2024-11-06 03:20:49 +00:00
84c1ad7eaa
598 lines
19 KiB
Plaintext
{
"cells": [
{
"cell_type": "markdown",
"id": "b84edb4e",
"metadata": {},
"source": [
"# Extraction\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/extraction.ipynb)\n",
"\n",
"## Use case\n",
"\n",
"Getting structured output from raw LLM generations is hard.\n",
"\n",
"For example, suppose you need the model output formatted with a specific schema for:\n",
"\n",
"- Extracting a structured row to insert into a database\n",
"- Extracting API parameters\n",
"- Extracting different parts of a user query (e.g., for semantic vs. keyword search)\n"
]
},
{
"cell_type": "markdown",
"id": "178dbc59",
"metadata": {},
"source": [
"![Image description](/img/extraction.png)"
]
},
{
"cell_type": "markdown",
"id": "97f474d4",
"metadata": {},
"source": [
"## Overview\n",
"\n",
"There are two primary approaches for this:\n",
"\n",
"- `Functions`: Some LLMs can call [functions](https://openai.com/blog/function-calling-and-other-api-updates) to extract arbitrary entities from text as structured arguments.\n",
"\n",
"- `Parsing`: [Output parsers](/docs/modules/model_io/output_parsers/) are classes that structure LLM responses.\n",
"\n",
"Only some LLMs support functions (e.g., OpenAI models), but functions are more general than parsers.\n",
"\n",
"Parsers extract precisely what is enumerated in a provided schema (e.g., specific attributes of a person).\n",
"\n",
"Functions can infer things beyond a provided schema (e.g., attributes of a person that you did not ask for)."
]
},
{
"cell_type": "markdown",
"id": "25d89f21",
"metadata": {},
"source": [
"## Quickstart\n",
"\n",
"OpenAI functions are one way to get started with extraction.\n",
"\n",
"Define a schema that specifies the properties we want the LLM to extract.\n",
"\n",
"Then, we can use `create_extraction_chain` to extract data matching that schema using an OpenAI function call."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f5ec7a3",
"metadata": {},
"outputs": [],
"source": [
"%pip install langchain openai\n",
"\n",
"# Set the OPENAI_API_KEY environment variable or load it from a .env file:\n",
"# import dotenv\n",
"# dotenv.load_dotenv()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "3e017ba0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},\n",
" {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains import create_extraction_chain\n",
"\n",
"# Schema\n",
"schema = {\n",
"    \"properties\": {\n",
"        \"name\": {\"type\": \"string\"},\n",
"        \"height\": {\"type\": \"integer\"},\n",
"        \"hair_color\": {\"type\": \"string\"},\n",
"    },\n",
"    \"required\": [\"name\", \"height\"],\n",
"}\n",
"\n",
"# Input\n",
"inp = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\"\"\"\n",
"\n",
"# Run chain\n",
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo\")\n",
"chain = create_extraction_chain(schema, llm)\n",
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "6f7eb826",
"metadata": {},
"source": [
"## Option 1: OpenAI functions\n",
"\n",
"### Looking under the hood\n",
"\n",
"Let's dig into what is happening when we call `create_extraction_chain`.\n",
"\n",
"The [LangSmith trace](https://smith.langchain.com/public/72bc3205-7743-4ca6-929a-966a9d4c2a77/r) shows that the model calls the function `information_extraction` on the input string, `inp`.\n",
"\n",
"![Image description](/img/extraction_trace_function.png)\n",
"\n",
"The `information_extraction` function is defined [here](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/openai_functions/extraction.py), and the model's call to it produces a `dict` of arguments.\n",
"\n",
"We can see the `dict` in the model output:\n",
"```json\n",
"{\n",
"  \"info\": [\n",
"    {\n",
"      \"name\": \"Alex\",\n",
"      \"height\": 5,\n",
"      \"hair_color\": \"blonde\"\n",
"    },\n",
"    {\n",
"      \"name\": \"Claudia\",\n",
"      \"height\": 6,\n",
"      \"hair_color\": \"brunette\"\n",
"    }\n",
"  ]\n",
"}\n",
"```\n",
"\n",
"`create_extraction_chain` then parses the raw LLM output for us using [`JsonKeyOutputFunctionsParser`](https://github.com/langchain-ai/langchain/blob/f81e613086d211327b67b0fb591fd4d5f9a85860/libs/langchain/langchain/chains/openai_functions/extraction.py#L62), which returns the value stored under the `info` key.\n",
"\n",
"This yields the list of dictionaries returned by the chain above:\n",
"```\n",
"[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},\n",
" {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]\n",
"```"
]
},
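{
"cell_type": "markdown",
"id": "7d2e4a51",
"metadata": {},
"source": [
"To make these two steps concrete, here is a minimal, LangChain-free sketch. It assumes, based on the linked `extraction.py`, that the chain wraps the dict schema as the item type of an `info` array inside a function named `information_extraction`, and that the parser simply reads the `info` key from the function-call arguments (a JSON string). The helpers `build_extraction_function` and `parse_info` are hypothetical illustrations, not LangChain APIs.\n",
"\n",
"```python\n",
"import json\n",
"\n",
"\n",
"def build_extraction_function(entity_schema):\n",
"    # Hypothetical: wrap one entity schema as the item type of an 'info' array,\n",
"    # so a single function call can return many entities.\n",
"    return {\n",
"        'name': 'information_extraction',\n",
"        'parameters': {\n",
"            'type': 'object',\n",
"            'properties': {\n",
"                'info': {'type': 'array', 'items': {'type': 'object', **entity_schema}},\n",
"            },\n",
"            'required': ['info'],\n",
"        },\n",
"    }\n",
"\n",
"\n",
"def parse_info(arguments_json, key_name='info'):\n",
"    # Hypothetical: decode the function-call arguments and keep one key,\n",
"    # mirroring what a key-based JSON output parser does.\n",
"    return json.loads(arguments_json)[key_name]\n",
"\n",
"\n",
"schema = {\n",
"    'properties': {\n",
"        'name': {'type': 'string'},\n",
"        'height': {'type': 'integer'},\n",
"        'hair_color': {'type': 'string'},\n",
"    },\n",
"    'required': ['name', 'height'],\n",
"}\n",
"function_def = build_extraction_function(schema)\n",
"\n",
"# Arguments shaped like the model output shown above\n",
"arguments = '{\"info\": [{\"name\": \"Alex\", \"height\": 5, \"hair_color\": \"blonde\"}, {\"name\": \"Claudia\", \"height\": 6, \"hair_color\": \"brunette\"}]}'\n",
"print(parse_info(arguments))  # the same two dicts the chain returned above\n",
"```\n",
"\n",
"In this sketch, wrapping the schema in an array is what lets a single function call return several people at once."
]
},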
{
"cell_type": "markdown",
"id": "dcb03138",
"metadata": {},
"source": [
"### Multiple entity types\n",
"\n",
"We can extend this further.\n",
"\n",
"Let's say we want to differentiate between dogs and people.\n",
"\n",
"We can add `person_` and `dog_` prefixes to each property."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "01eae733",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex',\n",
"  'person_height': 5,\n",
"  'person_hair_color': 'blonde',\n",
"  'dog_name': 'Frosty',\n",
"  'dog_breed': 'labrador'},\n",
" {'person_name': 'Claudia',\n",
"  'person_height': 6,\n",
"  'person_hair_color': 'brunette'}]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"schema = {\n",
"    \"properties\": {\n",
"        \"person_name\": {\"type\": \"string\"},\n",
"        \"person_height\": {\"type\": \"integer\"},\n",
"        \"person_hair_color\": {\"type\": \"string\"},\n",
"        \"dog_name\": {\"type\": \"string\"},\n",
"        \"dog_breed\": {\"type\": \"string\"},\n",
"    },\n",
"    \"required\": [\"person_name\", \"person_height\"],\n",
"}\n",
"\n",
"chain = create_extraction_chain(schema, llm)\n",
"\n",
"inp = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"Alex's dog Frosty is a labrador and likes to play hide and seek.\"\"\"\n",
"\n",
"chain.run(inp)"
]
},
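{
"cell_type": "markdown",
"id": "a9c3f6b2",
"metadata": {},
"source": [
"Because the schema is flat, each returned record mixes person and dog fields. If you need them separated, a minimal post-processing sketch (assuming the `person_` / `dog_` naming convention above; `split_by_prefix` is a hypothetical helper, not part of LangChain) could regroup a record by prefix:\n",
"\n",
"```python\n",
"def split_by_prefix(record, prefixes=('person_', 'dog_')):\n",
"    # Group the flat keys of one extracted record by entity prefix,\n",
"    # dropping the prefix from each key. Keys without a known prefix are ignored.\n",
"    grouped = {}\n",
"    for key, value in record.items():\n",
"        for prefix in prefixes:\n",
"            if key.startswith(prefix):\n",
"                entity = prefix.rstrip('_')\n",
"                grouped.setdefault(entity, {})[key[len(prefix):]] = value\n",
"                break\n",
"    return grouped\n",
"\n",
"\n",
"record = {\n",
"    'person_name': 'Alex',\n",
"    'person_height': 5,\n",
"    'person_hair_color': 'blonde',\n",
"    'dog_name': 'Frosty',\n",
"    'dog_breed': 'labrador',\n",
"}\n",
"print(split_by_prefix(record))\n",
"# {'person': {'name': 'Alex', 'height': 5, 'hair_color': 'blonde'}, 'dog': {'name': 'Frosty', 'breed': 'labrador'}}\n",
"```"
]
},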
{
"cell_type": "markdown",
"id": "f205905c",
"metadata": {},
"source": [
"### Unrelated entities\n",
"\n",
"If we set `\"required\": []`, no property is mandatory, so the model can return records that contain **only** person attributes or **only** dog attributes, each describing a single entity (a person or a dog)."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "6ff4ac7e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
" {'person_name': 'Claudia',\n",
"  'person_height': 6,\n",
"  'person_hair_color': 'brunette'},\n",
" {'dog_name': 'Willow', 'dog_breed': 'German Shepherd'},\n",
" {'dog_name': 'Milo', 'dog_breed': 'border collie'}]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"schema = {\n",
"    \"properties\": {\n",
"        \"person_name\": {\"type\": \"string\"},\n",
"        \"person_height\": {\"type\": \"integer\"},\n",
"        \"person_hair_color\": {\"type\": \"string\"},\n",
"        \"dog_name\": {\"type\": \"string\"},\n",
"        \"dog_breed\": {\"type\": \"string\"},\n",
"    },\n",
"    \"required\": [],\n",
"}\n",
"\n",
"chain = create_extraction_chain(schema, llm)\n",
"\n",
"inp = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"Willow is a German Shepherd that likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.\"\"\"\n",
"\n",
"chain.run(inp)"
]
},
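{
"cell_type": "markdown",
"id": "e4b81d07",
"metadata": {},
"source": [
"In JSON Schema, `required` lists the properties every object must contain. A minimal sketch of the difference between the two schemas, checking key presence only (the `missing_required` helper is hypothetical and ignores types): a dog-only record fails the earlier `required` list but satisfies `\"required\": []`.\n",
"\n",
"```python\n",
"def missing_required(record, schema):\n",
"    # Return the required property names absent from one extracted record.\n",
"    return [name for name in schema.get('required', []) if name not in record]\n",
"\n",
"\n",
"dog_only = {'dog_name': 'Willow', 'dog_breed': 'German Shepherd'}\n",
"\n",
"# Only the 'required' part of each schema matters for this check\n",
"strict = {'required': ['person_name', 'person_height']}\n",
"relaxed = {'required': []}\n",
"\n",
"print(missing_required(dog_only, strict))   # ['person_name', 'person_height']\n",
"print(missing_required(dog_only, relaxed))  # []\n",
"```"
]
},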
{
"cell_type": "markdown",
"id": "34f3b958",
"metadata": {},
"source": [
"### Extra information\n",
"\n",
"The power of functions (relative to using parsers alone) lies in the ability to perform semantic extraction.\n",
"\n",
"In particular, **we can ask for things that are not explicitly enumerated in the schema**.\n",
"\n",
"Suppose we want unspecified additional information about dogs.\n",
"\n",
"We can add a placeholder property for unstructured extraction, `dog_extra_info`."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "40c7b26f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
" {'person_name': 'Claudia',\n",
"  'person_height': 6,\n",
"  'person_hair_color': 'brunette'},\n",
" {'dog_name': 'Willow',\n",
"  'dog_breed': 'German Shepherd',\n",
"  'dog_extra_info': 'likes to play with other dogs'},\n",
" {'dog_name': 'Milo',\n",
"  'dog_breed': 'border collie',\n",
"  'dog_extra_info': 'lives close by'}]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"schema = {\n",
"    \"properties\": {\n",
"        \"person_name\": {\"type\": \"string\"},\n",
"        \"person_height\": {\"type\": \"integer\"},\n",
"        \"person_hair_color\": {\"type\": \"string\"},\n",
"        \"dog_name\": {\"type\": \"string\"},\n",
"        \"dog_breed\": {\"type\": \"string\"},\n",
"        \"dog_extra_info\": {\"type\": \"string\"},\n",
"    },\n",
"}\n",
"\n",
"chain = create_extraction_chain(schema, llm)\n",
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "3a949c60",
"metadata": {},
"source": [
"This gives us additional, free-form information about the dogs in `dog_extra_info`, even though the schema never says what that information should be."
]
},
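{
"cell_type": "markdown",
"id": "3f6a9c28",
"metadata": {},
"source": [
"If you want a free-form placeholder for several entity types, you can add it to an existing schema programmatically instead of editing the dict by hand. A minimal sketch, assuming the `<prefix>_extra_info` naming used above; `with_extra_info` is a hypothetical helper, not part of LangChain:\n",
"\n",
"```python\n",
"import copy\n",
"\n",
"\n",
"def with_extra_info(schema, prefixes):\n",
"    # Return a copy of the schema with one free-form string property\n",
"    # per entity prefix, e.g. 'dog_extra_info'. The input is left unchanged.\n",
"    new_schema = copy.deepcopy(schema)\n",
"    for prefix in prefixes:\n",
"        new_schema['properties'][f'{prefix}_extra_info'] = {'type': 'string'}\n",
"    return new_schema\n",
"\n",
"\n",
"base = {\n",
"    'properties': {\n",
"        'person_name': {'type': 'string'},\n",
"        'dog_name': {'type': 'string'},\n",
"    },\n",
"}\n",
"extended = with_extra_info(base, ['person', 'dog'])\n",
"print(sorted(extended['properties']))\n",
"# ['dog_extra_info', 'dog_name', 'person_extra_info', 'person_name']\n",
"```"
]
},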
{
"cell_type": "markdown",
"id": "bf71ddce",
"metadata": {},
"source": [
"### Pydantic\n",
"\n",
"Pydantic is a data validation and settings management library for Python.\n",
"\n",
"It allows you to create data classes with attributes that are automatically validated when you instantiate an object.\n",
"\n",
"Let's define a class whose attributes are annotated with types and pass it to `create_extraction_chain_pydantic`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d36a743b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Properties(person_name='Alex', person_height=5, person_hair_color='blonde', dog_breed=None, dog_name=None),\n",
" Properties(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from typing import Optional\n",
"from pydantic import BaseModel\n",
"from langchain.chains import create_extraction_chain_pydantic\n",
"\n",
"# Pydantic data class; optional fields default to None\n",
"class Properties(BaseModel):\n",
"    person_name: str\n",
"    person_height: int\n",
"    person_hair_color: str\n",
"    dog_breed: Optional[str] = None\n",
"    dog_name: Optional[str] = None\n",
"\n",
"# Extraction\n",
"chain = create_extraction_chain_pydantic(pydantic_schema=Properties, llm=llm)\n",
"\n",
"# Run\n",
"inp = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\"\"\"\n",
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "07a0351a",
"metadata": {},
"source": [
"As the [trace](https://smith.langchain.com/public/fed50ae6-26bb-4235-a254-e0b7a229d10f/r) shows, the chain calls the same `information_extraction` function as above, this time with a schema derived from the Pydantic class."
]
},
{
"cell_type": "markdown",
"id": "cbd9f121",
"metadata": {},
"source": [
"## Option 2: Parsing\n",
"\n",
"[Output parsers](/docs/modules/model_io/output_parsers/) are classes that help structure language model responses.\n",
"\n",
"As shown above, they are used to parse the output of the OpenAI function calls in `create_extraction_chain`.\n",
"\n",
"But they can also be used independently of functions.\n",
"\n",
"### Pydantic\n",
"\n",
"Just as above, let's parse a generation based on a Pydantic data class."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "64650362",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"People(people=[Person(person_name='Alex', person_height=5, person_hair_color='blonde', dog_breed=None, dog_name=None), Person(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from typing import Optional, Sequence\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.llms import OpenAI\n",
"from pydantic import BaseModel\n",
"from langchain.output_parsers import PydanticOutputParser\n",
"\n",
"class Person(BaseModel):\n",
"    person_name: str\n",
"    person_height: int\n",
"    person_hair_color: str\n",
"    dog_breed: Optional[str] = None\n",
"    dog_name: Optional[str] = None\n",
"\n",
"class People(BaseModel):\n",
"    \"\"\"Identifying information about all people in a text.\"\"\"\n",
"    people: Sequence[Person]\n",
"\n",
"# Input\n",
"query = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\"\"\"\n",
"\n",
"# Set up a parser + inject instructions into the prompt template.\n",
"parser = PydanticOutputParser(pydantic_object=People)\n",
"\n",
"# Prompt\n",
"prompt = PromptTemplate(\n",
"    template=\"Answer the user query.\\n{format_instructions}\\n{query}\\n\",\n",
"    input_variables=[\"query\"],\n",
"    partial_variables={\"format_instructions\": parser.get_format_instructions()},\n",
")\n",
"\n",
"# Run\n",
"_input = prompt.format_prompt(query=query)\n",
"model = OpenAI(temperature=0)\n",
"output = model(_input.to_string())\n",
"parser.parse(output)"
]
},
{
"cell_type": "markdown",
"id": "826899df",
"metadata": {},
"source": [
"We can see from the [LangSmith trace](https://smith.langchain.com/public/8e3aa858-467e-46a5-aa49-5db65f0a2b9a/r) that we get the same output as above.\n",
"\n",
"![Image description](/img/extraction_trace_function_2.png)\n",
"\n",
"We can see that we provide a two-shot prompt to instruct the LLM to produce output in our desired format.\n",
"\n",
"We also need to do a bit more work:\n",
"\n",
"* Define a class that holds multiple instances of `Person`\n",
"* Explicitly parse the LLM output into the Pydantic class\n",
"\n",
"The same pattern works for other schemas, too."
]
},
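{
"cell_type": "markdown",
"id": "c05e7b93",
"metadata": {},
"source": [
"Both extra steps can be illustrated without LangChain or Pydantic. This minimal sketch swaps in standard-library dataclasses with hand-written type checks in place of Pydantic's validation, and assumes the model replies with a JSON object keyed by `people`; `parse_people` and the sample reply are hypothetical stand-ins for what `PydanticOutputParser` handles for you.\n",
"\n",
"```python\n",
"import json\n",
"from dataclasses import dataclass\n",
"from typing import List, Optional\n",
"\n",
"\n",
"@dataclass\n",
"class Person:\n",
"    person_name: str\n",
"    person_height: int\n",
"    person_hair_color: str\n",
"    dog_breed: Optional[str] = None\n",
"    dog_name: Optional[str] = None\n",
"\n",
"\n",
"@dataclass\n",
"class People:\n",
"    # Wrapper so one JSON reply can carry several Person records\n",
"    people: List[Person]\n",
"\n",
"\n",
"# Required fields and their expected Python types, checked by hand here\n",
"# (Pydantic would validate these automatically)\n",
"REQUIRED_TYPES = {'person_name': str, 'person_height': int, 'person_hair_color': str}\n",
"\n",
"\n",
"def parse_people(text):\n",
"    # Explicit parse step: JSON text -> dicts -> typed objects\n",
"    data = json.loads(text)\n",
"    people = []\n",
"    for item in data['people']:\n",
"        for field, expected in REQUIRED_TYPES.items():\n",
"            if not isinstance(item.get(field), expected):\n",
"                raise ValueError(f'bad or missing field: {field}')\n",
"        people.append(Person(**item))\n",
"    return People(people=people)\n",
"\n",
"\n",
"# Hypothetical model reply in the requested format\n",
"reply = '{\"people\": [{\"person_name\": \"Alex\", \"person_height\": 5, \"person_hair_color\": \"blonde\"}, {\"person_name\": \"Claudia\", \"person_height\": 6, \"person_hair_color\": \"brunette\"}]}'\n",
"result = parse_people(reply)\n",
"print([p.person_name for p in result.people])  # ['Alex', 'Claudia']\n",
"```"
]
},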
{
"cell_type": "code",
"execution_count": 11,
"id": "837c350e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.prompts import PromptTemplate\n",
"from langchain.llms import OpenAI\n",
"from pydantic import BaseModel, Field, validator\n",
"from langchain.output_parsers import PydanticOutputParser\n",
"\n",
"# Define your desired data structure.\n",
"class Joke(BaseModel):\n",
"    setup: str = Field(description=\"question to set up a joke\")\n",
"    punchline: str = Field(description=\"answer to resolve the joke\")\n",
"\n",
"    # You can add custom validation logic easily with Pydantic.\n",
"    @validator(\"setup\")\n",
"    def question_ends_with_question_mark(cls, field):\n",
"        if not field.endswith(\"?\"):\n",
"            raise ValueError(\"Badly formed question!\")\n",
"        return field\n",
"\n",
"# And a query intended to prompt a language model to populate the data structure.\n",
"joke_query = \"Tell me a joke.\"\n",
"\n",
"# Set up a parser + inject instructions into the prompt template.\n",
"parser = PydanticOutputParser(pydantic_object=Joke)\n",
"\n",
"# Prompt\n",
"prompt = PromptTemplate(\n",
"    template=\"Answer the user query.\\n{format_instructions}\\n{query}\\n\",\n",
"    input_variables=[\"query\"],\n",
"    partial_variables={\"format_instructions\": parser.get_format_instructions()},\n",
")\n",
"\n",
"# Run\n",
"_input = prompt.format_prompt(query=joke_query)\n",
"model = OpenAI(temperature=0)\n",
"output = model(_input.to_string())\n",
"parser.parse(output)"
]
},
{
"cell_type": "markdown",
"id": "d3601bde",
"metadata": {},
"source": [
"As we can see, we get an instance of the `Joke` class, which respects our originally desired schema: `setup` and `punchline`.\n",
"\n",
"We can look at the [LangSmith trace](https://smith.langchain.com/public/69f11d41-41be-4319-93b0-6d0eda66e969/r) to see exactly what is going on under the hood.\n",
"\n",
"![Image description](/img/extraction_trace_joke.png)\n",
"\n",
"### Going deeper\n",
"\n",
"* The [output parser](/docs/modules/model_io/output_parsers/) documentation includes various parser examples for specific types (e.g., lists, datetime, enum, etc.).\n",
"* [JSONFormer](/docs/integrations/llms/jsonformer_experimental) offers another way for structured decoding of a subset of the JSON Schema.\n",
"* [Kor](https://eyurtsev.github.io/kor/) is another library for extraction where schema and examples can be provided to the LLM."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}