"* Large chunk overlap may cause the same information to be extracted twice, so be prepared to de-duplicate!\n",
"* LLMs can make up data. If looking for a single fact across a large text and using a brute force approach, you may end up getting more made up data."
"System: Answer the user query. Output your answer as JSON that matches the given schema: ```json\n",
"System: Answer the user query. Output your answer as JSON that matches the given schema: ```json\n",
"{'title': 'People', 'description': 'Identifying information about all people in a text.', 'type': 'object', 'properties': {'people': {'title': 'People', 'type': 'array', 'items': {'$ref': '#/definitions/Person'}}}, 'required': ['people'], 'definitions': {'Person': {'title': 'Person', 'description': 'Information about a person.', 'type': 'object', 'properties': {'name': {'title': 'Name', 'description': 'The name of the person', 'type': 'string'}, 'height_in_meters': {'title': 'Height In Meters', 'description': 'The height of the person expressed in meters.', 'type': 'number'}}, 'required': ['name', 'height_in_meters']}}}\n",
"```. Make sure to wrap the answer in ```json and ``` tags\n",
"Human: Anna is 23 years old and she is 6 feet tall\n"
"- **Tool/Function Calling** Mode: Some LLMs support a *tool or function calling* mode. These LLMs can structure output according to a given **schema**. Generally, this approach is the easiest to work with and is expected to yield good results.\n",
"\n",
"- **JSON Mode**: Some LLMs are can be forced to output valid JSON. This is similar to **tool/function Calling** approach, except that the schema is provided as part of the prompt. Generally, our intuition is that this performs worse than a **tool/function calling** approach.\n",
"- **JSON Mode**: Some LLMs are can be forced to output valid JSON. This is similar to **tool/function Calling** approach, except that the schema is provided as part of the prompt. Generally, our intuition is that this performs worse than a **tool/function calling** approach, but don't trust us and verify for your own use case!\n",
"\n",
"- **Prompting Based**: LLMs that can follow instructions well can be instructed to generate text in a desired format. The generated text can be parsed downstream using existing [Output Parsers](/docs/modules/model_io/output_parsers/) or using [custom parsers](/docs/modules/model_io/output_parsers/custom) into a structured format like JSON. This approach can be used with LLMs that **do not support** JSON mode or tool/function calling modes. This approach is more broadly applicable, though may yield worse results than models that have been fine-tuned for extraction or function calling.\n",
"\n",
"* [OpenAI's function and tool calling](https://platform.openai.com/docs/guides/function-calling)\n",
"* For example, see [OpenAI's JSON mode](https://platform.openai.com/docs/guides/text-generation/json-mode)."
" # Having a good description can help improve extraction results.\n",
" name: Optional[str] = Field(..., description=\"The name of the person\")\n",
" hair_color: Optional[str] = Field(\n",
" ..., description=\"The color of the peron's eyes if known\"\n",
" ..., description=\"The color of the peron's hair if known\"\n",
" )\n",
"    height_in_meters: Optional[str] = Field(..., description=\"Height in meters\")\n",
"\n",
"\n",
"class Data(BaseModel):\n",
" people: List[Person]"
]
},
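A quick way to sanity-check a schema like the one above is to instantiate it directly. A minimal sketch using plain `pydantic` (the notebook itself may import `BaseModel` from a LangChain compatibility module instead):

```python
from typing import List, Optional

from pydantic import BaseModel, Field

class Person(BaseModel):
    """Information about a person."""

    name: Optional[str] = Field(None, description="The name of the person")
    hair_color: Optional[str] = Field(
        None, description="The color of the person's hair if known"
    )
    height_in_meters: Optional[str] = Field(None, description="Height in meters")

class Data(BaseModel):
    """Extracted data about people."""

    people: List[Person]

# Fields left as None signal that the model could not find the value,
# which is usually preferable to forcing it to invent one.
data = Data(people=[Person(name="Jeff", hair_color="black")])
```

Every attribute here defaults to `None` so the model can decline to answer, one of the guidelines this use case keeps returning to.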
{
"cell_type": "markdown",
"id": "5f5cda33-fd7b-481e-956a-703f45e40e1d",
"metadata": {},
"source": [
":::{.callout-important}\n",
"Extraction might not be perfect here. Please continue to see how to use **Reference Examples** to improve the quality of extraction, and see the **guidelines** section!\n",
"text = \"My name is Jeff and I am 2 meters. I have black hair. Anna has the same color hair as me.\"\n",
"text = \"My name is Jeff, my hair is black and i am 6 feet tall. Anna has the same color hair as me.\"\n",
"runnable.invoke({\"text\": text})"
]
},
"- [Use a Parsing Approach](/docs/use_cases/extraction/how_to/parse): Use a prompt based approach to extract with models that do not support **tool/function calling**.\n",
"- [Guidelines](/docs/use_cases/extraction/guidelines): Guidelines for getting good performance on extraction tasks."