mirror of https://github.com/hwchase17/langchain (synced 2024-11-08 07:10:35 +00:00)
tagging docs refactor (#8722)
refactor of tagging use case according to new format

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
parent 01ef786e7e
commit 9249d305af
BIN docs/docs_skeleton/static/img/tagging.png (normal file; binary file not shown; size: 111 KiB)
BIN docs/docs_skeleton/static/img/tagging_trace.png (normal file; binary file not shown; size: 130 KiB)
@@ -2,97 +2,90 @@
|
|||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"id": "a13ea924",
|
"id": "a0507a4b",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Tagging\n",
|
"# Tagging\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The tagging chain uses the OpenAI `functions` parameter to specify a schema to tag a document with. This helps us make sure that the model outputs exactly the tags that we want, with their appropriate types.\n",
|
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/tagging.ipynb)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The tagging chain is to be used when we want to tag a passage with a specific attribute (e.g. what is the sentiment of this message?)"
|
"## Use case\n",
|
||||||
|
"\n",
|
||||||
|
"Tagging means labeling a document with classes such as:\n",
|
||||||
|
"\n",
|
||||||
|
"- sentiment\n",
|
||||||
|
"- language\n",
|
||||||
|
"- style (formal, informal etc.)\n",
|
||||||
|
"- covered topics\n",
|
||||||
|
"- political tendency\n",
|
||||||
|
"\n",
|
||||||
|
"![Image description](/img/tagging.png)\n",
|
||||||
|
"\n",
|
||||||
|
"## Overview\n",
|
||||||
|
"\n",
|
||||||
|
"Tagging has a few components:\n",
|
||||||
|
"\n",
|
||||||
|
"* `function`: Like [extraction](/docs/use_cases/extraction), tagging uses [functions](https://openai.com/blog/function-calling-and-other-api-updates) to specify how the model should tag a document\n",
|
||||||
|
"* `schema`: defines how we want to tag the document\n",
|
||||||
|
"\n",
|
||||||
|
"## Quickstart\n",
|
||||||
|
"\n",
|
||||||
|
"Let's see a very straightforward example of how we can use OpenAI functions for tagging in LangChain."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": 1,
|
"execution_count": null,
|
||||||
"id": "bafb496a",
|
"id": "dc5cbb6f",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [
|
"outputs": [],
|
||||||
{
|
|
||||||
"name": "stderr",
|
|
||||||
"output_type": "stream",
|
|
||||||
"text": [
|
|
||||||
"/Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.6.4) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.\n",
|
|
||||||
" warnings.warn(\n"
|
|
||||||
]
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"source": [
|
"source": [
|
||||||
"from langchain.chat_models import ChatOpenAI\n",
|
"!pip install langchain openai \n",
|
||||||
"from langchain.chains import create_tagging_chain, create_tagging_chain_pydantic\n",
|
"\n",
|
||||||
"from langchain.prompts import ChatPromptTemplate"
|
"# Set env var OPENAI_API_KEY or load from a .env file:\n",
|
||||||
|
"# import dotenv\n",
|
||||||
|
"# dotenv.load_dotenv()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": 2,
|
"execution_count": 2,
|
||||||
"id": "39f3ce3e",
|
"id": "bafb496a",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")"
|
"from langchain.chat_models import ChatOpenAI\n",
|
||||||
|
"from langchain.prompts import ChatPromptTemplate\n",
|
||||||
|
"from langchain.chains import create_tagging_chain, create_tagging_chain_pydantic"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"id": "832ddcd9",
|
"id": "b8ca3f93",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Simplest approach, only specifying type"
|
"We specify a few properties with their expected type in our schema."
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"id": "4fc8d766",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"We can start by specifying a few properties with their expected type in our schema"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": 3,
|
"execution_count": 4,
|
||||||
"id": "8329f943",
|
"id": "39f3ce3e",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
|
"# Schema\n",
|
||||||
"schema = {\n",
|
"schema = {\n",
|
||||||
" \"properties\": {\n",
|
" \"properties\": {\n",
|
||||||
" \"sentiment\": {\"type\": \"string\"},\n",
|
" \"sentiment\": {\"type\": \"string\"},\n",
|
||||||
" \"aggressiveness\": {\"type\": \"integer\"},\n",
|
" \"aggressiveness\": {\"type\": \"integer\"},\n",
|
||||||
" \"language\": {\"type\": \"string\"},\n",
|
" \"language\": {\"type\": \"string\"},\n",
|
||||||
" }\n",
|
" }\n",
|
||||||
"}"
|
"}\n",
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": 4,
|
|
||||||
"id": "6146ae70",
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"chain = create_tagging_chain(schema, llm)"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"id": "9e306ca3",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"As we can see in the examples, it correctly interprets what we want but the results vary so that we get, for example, sentiments in different languages ('positive', 'enojado' etc.).\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"We will see how to control these results in the next section."
|
"# LLM\n",
|
||||||
|
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
|
||||||
|
"chain = create_tagging_chain(schema, llm)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -126,7 +119,7 @@
|
|||||||
{
|
{
|
||||||
"data": {
|
"data": {
|
||||||
"text/plain": [
|
"text/plain": [
|
||||||
"{'sentiment': 'enojado', 'aggressiveness': 1, 'language': 'Spanish'}"
|
"{'sentiment': 'enojado', 'aggressiveness': 1, 'language': 'es'}"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"execution_count": 6,
|
"execution_count": 6,
|
||||||
@@ -140,25 +133,15 @@
|
|||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "markdown",
|
||||||
"execution_count": 7,
|
"id": "d921bb53",
|
||||||
"id": "aae85b27",
|
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [
|
|
||||||
{
|
|
||||||
"data": {
|
|
||||||
"text/plain": [
|
|
||||||
"{'sentiment': 'positive', 'aggressiveness': 0, 'language': 'English'}"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"execution_count": 7,
|
|
||||||
"metadata": {},
|
|
||||||
"output_type": "execute_result"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"source": [
|
"source": [
|
||||||
"inp = \"Weather is ok here, I can go outside without much more than a coat\"\n",
|
"As we can see in the examples, it correctly interprets what we want.\n",
|
||||||
"chain.run(inp)"
|
"\n",
|
||||||
|
"However, the results vary: for example, we get sentiments in different languages ('positive', 'enojado', etc.).\n",
|
||||||
|
"\n",
|
||||||
|
"We will see how to control these results in the next section."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -166,9 +149,11 @@
|
|||||||
"id": "bebb2f83",
|
"id": "bebb2f83",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## More control\n",
|
"## Finer control\n",
|
||||||
"\n",
|
"\n",
|
||||||
"By being smart about how we define our schema we can have more control over the model's output. Specifically we can define:\n",
|
"Careful schema definition gives us more control over the model's output. \n",
|
||||||
|
"\n",
|
||||||
|
"Specifically, we can define:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"- possible values for each property\n",
|
"- possible values for each property\n",
|
||||||
"- description to make sure that the model understands the property\n",
|
"- description to make sure that the model understands the property\n",
|
||||||
@@ -180,7 +165,7 @@
|
|||||||
"id": "69ef0b9a",
|
"id": "69ef0b9a",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Following is an example of how we can use _enum_, _description_ and _required_ to control for each of the previously mentioned aspects:"
|
"Here is an example of how we can use `enum`, `description`, and `required` to control for each of the previously mentioned aspects:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -192,7 +177,6 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"schema = {\n",
|
"schema = {\n",
|
||||||
" \"properties\": {\n",
|
" \"properties\": {\n",
|
||||||
" \"sentiment\": {\"type\": \"string\", \"enum\": [\"happy\", \"neutral\", \"sad\"]},\n",
|
|
||||||
" \"aggressiveness\": {\n",
|
" \"aggressiveness\": {\n",
|
||||||
" \"type\": \"integer\",\n",
|
" \"type\": \"integer\",\n",
|
||||||
" \"enum\": [1, 2, 3, 4, 5],\n",
|
" \"enum\": [1, 2, 3, 4, 5],\n",
|
||||||
@@ -234,7 +218,7 @@
|
|||||||
{
|
{
|
||||||
"data": {
|
"data": {
|
||||||
"text/plain": [
|
"text/plain": [
|
||||||
"{'sentiment': 'happy', 'aggressiveness': 0, 'language': 'spanish'}"
|
"{'aggressiveness': 0, 'language': 'spanish'}"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"execution_count": 10,
|
"execution_count": 10,
|
||||||
@@ -256,7 +240,7 @@
|
|||||||
{
|
{
|
||||||
"data": {
|
"data": {
|
||||||
"text/plain": [
|
"text/plain": [
|
||||||
"{'sentiment': 'sad', 'aggressiveness': 10, 'language': 'spanish'}"
|
"{'aggressiveness': 5, 'language': 'spanish'}"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"execution_count": 11,
|
"execution_count": 11,
|
||||||
@@ -278,7 +262,7 @@
|
|||||||
{
|
{
|
||||||
"data": {
|
"data": {
|
||||||
"text/plain": [
|
"text/plain": [
|
||||||
"{'sentiment': 'neutral', 'aggressiveness': 0, 'language': 'english'}"
|
"{'aggressiveness': 0, 'language': 'english'}"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"execution_count": 12,
|
"execution_count": 12,
|
||||||
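The outputs in the hunks above include an `aggressiveness` of 0 even though the schema's `enum` is `[1, 2, 3, 4, 5]`, which suggests the model does not always honor the constraint. A minimal sketch of a post-hoc check, using only the standard library, might look like the following. `find_violations` is a hypothetical helper, not part of LangChain, and the `required` list is an assumption because it is not visible in this diff.

```python
from typing import Any, Dict, List

# Schema mirroring the properties visible in the diff. The "required" list
# is an assumption added for illustration.
schema: Dict[str, Any] = {
    "properties": {
        "aggressiveness": {"type": "integer", "enum": [1, 2, 3, 4, 5]},
        "language": {"type": "string"},
    },
    "required": ["language", "aggressiveness"],
}


def find_violations(result: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: list the ways `result` departs from `schema`."""
    problems: List[str] = []
    # Every required property must be present in the result.
    for name in schema.get("required", []):
        if name not in result:
            problems.append(f"missing required property: {name}")
    # Every returned value must be a known property and respect its enum.
    for name, value in result.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected property: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name}={value!r} not in enum {spec['enum']}")
    return problems


# One of the outputs shown in the diff.
print(find_violations({"aggressiveness": 0, "language": "spanish"}, schema))
# prints ['aggressiveness=0 not in enum [1, 2, 3, 4, 5]']
```

A check like this could run on the dictionary returned by `chain.run(inp)` before the tags are stored or acted upon.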
@@ -291,12 +275,25 @@
|
|||||||
"chain.run(inp)"
|
"chain.run(inp)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "cf6b7389",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"The [LangSmith trace](https://smith.langchain.com/public/311e663a-bbe8-4053-843e-5735055c032d/r) lets us peek under the hood:\n",
|
||||||
|
"\n",
|
||||||
|
"* As with [extraction](/docs/use_cases/extraction), we call the `information_extraction` function [here](https://github.com/langchain-ai/langchain/blob/269f85b7b7ffd74b38cd422d4164fc033388c3d0/libs/langchain/langchain/chains/openai_functions/extraction.py#L20) on the input string.\n",
|
||||||
|
"* This OpenAI function extracts information based upon the provided schema.\n",
|
||||||
|
"\n",
|
||||||
|
"![Image description](/img/tagging_trace.png)"
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"id": "e68ad17e",
|
"id": "e68ad17e",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Specifying schema with Pydantic"
|
"## Pydantic"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -304,11 +301,11 @@
|
|||||||
"id": "2f5970ec",
|
"id": "2f5970ec",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"We can also use a Pydantic schema to specify the required properties and types. We can also send other arguments, such as 'enum' or 'description' as can be seen in the example below.\n",
|
"We can also use a Pydantic schema to specify the required properties and types. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"By using the `create_tagging_chain_pydantic` function, we can send a Pydantic schema as input and the output will be an instantiated object that respects our desired schema. \n",
|
"We can also send other arguments, such as `enum` or `description`, to each field.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this way, we can specify our schema in the same manner that we would a new class or function in Python - with purely Pythonic types."
|
"This lets us specify our schema the same way we would define a new class or function in Python, using purely Pythonic types."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -371,7 +368,7 @@
|
|||||||
{
|
{
|
||||||
"data": {
|
"data": {
|
||||||
"text/plain": [
|
"text/plain": [
|
||||||
"Tags(sentiment='sad', aggressiveness=10, language='spanish')"
|
"Tags(sentiment='sad', aggressiveness=5, language='spanish')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"execution_count": 17,
|
"execution_count": 17,
|
||||||
@@ -382,6 +379,17 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"res"
|
"res"
|
||||||
]
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "29346d09",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Going deeper\n",
|
||||||
|
"\n",
|
||||||
|
"* You can use the [metadata tagger](https://python.langchain.com/docs/integrations/document_transformers/openai_metadata_tagger) document transformer to extract metadata from a LangChain `Document`. \n",
|
||||||
|
"* This covers the same basic functionality as the tagging chain, only applied to a LangChain `Document`."
|
||||||
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
@@ -400,7 +408,7 @@
|
|||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.9.1"
|
"version": "3.9.16"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
|
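The notebook text in this diff says the chain passes the schema to an OpenAI function named `information_extraction`. As a rough sketch of what such a function definition could look like in the OpenAI function-calling format, the snippet below wraps the quickstart schema by hand. `build_function_spec` is a hypothetical helper and the description string is placeholder wording; LangChain's actual implementation may differ.

```python
import json
from typing import Any, Dict


def build_function_spec(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: wrap a tagging schema as an OpenAI function definition."""
    return {
        # Function name cited in the notebook's trace note.
        "name": "information_extraction",
        # Placeholder wording, not taken from LangChain.
        "description": "Tag the passage with the requested properties.",
        "parameters": {
            "type": "object",
            "properties": schema["properties"],
            "required": schema.get("required", []),
        },
    }


# Quickstart schema from the notebook.
schema = {
    "properties": {
        "sentiment": {"type": "string"},
        "aggressiveness": {"type": "integer"},
        "language": {"type": "string"},
    }
}

spec = build_function_spec(schema)
print(json.dumps(spec, indent=2))
```

Seeing the schema in this shape makes it clearer why `enum` and `description` entries on each property influence the model: they are sent to it as part of the function's parameter definition.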