tagging docs refactor (#8722)

refactor of tagging use case according to new format

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
Author: Francisco Ingham, 2023-08-11 12:06:07 -03:00 (committed by GitHub)
Parent: 01ef786e7e
Commit: 9249d305af
3 changed files with 91 additions and 83 deletions

[Binary image file added, 111 KiB (not shown)]

[Binary image file added, 130 KiB (not shown)]


@@ -2,97 +2,90 @@
 "cells": [
 {
 "cell_type": "markdown",
-"id": "a13ea924",
+"id": "a0507a4b",
 "metadata": {},
 "source": [
 "# Tagging\n",
 "\n",
-"The tagging chain uses the OpenAI `functions` parameter to specify a schema to tag a document with. This helps us make sure that the model outputs exactly tags that we want, with their appropriate types.\n",
+"[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/tagging.ipynb)\n",
 "\n",
-"The tagging chain is to be used when we want to tag a passage with a specific attribute (i.e. what is the sentiment of this message?)"
+"## Use case\n",
+"\n",
+"Tagging means labeling a document with classes such as:\n",
+"\n",
+"- sentiment\n",
+"- language\n",
+"- style (formal, informal etc.)\n",
+"- covered topics\n",
+"- political tendency\n",
+"\n",
+"![Image description](/img/tagging.png)\n",
+"\n",
+"## Overview\n",
+"\n",
+"Tagging has a few components:\n",
+"\n",
+"* `function`: Like [extraction](/docs/use_cases/extraction), tagging uses [functions](https://openai.com/blog/function-calling-and-other-api-updates) to specify how the model should tag a document\n",
+"* `schema`: defines how we want to tag the document\n",
+"\n",
+"## Quickstart\n",
+"\n",
+"Let's see a very straightforward example of how we can use OpenAI functions for tagging in LangChain."
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 1,
-"id": "bafb496a",
+"execution_count": null,
+"id": "dc5cbb6f",
 "metadata": {},
-"outputs": [
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"/Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.6.4) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.\n",
-" warnings.warn(\n"
-]
-}
-],
+"outputs": [],
 "source": [
-"from langchain.chat_models import ChatOpenAI\n",
-"from langchain.chains import create_tagging_chain, create_tagging_chain_pydantic\n",
-"from langchain.prompts import ChatPromptTemplate"
+"!pip install langchain openai \n",
+"\n",
+"# Set env var OPENAI_API_KEY or load from a .env file:\n",
+"# import dotenv\n",
+"# dotenv.load_env()"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": 2,
-"id": "39f3ce3e",
+"id": "bafb496a",
 "metadata": {},
 "outputs": [],
 "source": [
-"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")"
+"from langchain.chat_models import ChatOpenAI\n",
+"from langchain.prompts import ChatPromptTemplate\n",
+"from langchain.chains import create_tagging_chain, create_tagging_chain_pydantic"
 ]
 },
 {
 "cell_type": "markdown",
-"id": "832ddcd9",
+"id": "b8ca3f93",
 "metadata": {},
 "source": [
-"## Simplest approach, only specifying type"
-]
-},
-{
-"cell_type": "markdown",
-"id": "4fc8d766",
-"metadata": {},
-"source": [
-"We can start by specifying a few properties with their expected type in our schema"
+"We specify a few properties with their expected type in our schema."
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 3,
-"id": "8329f943",
+"execution_count": 4,
+"id": "39f3ce3e",
 "metadata": {},
 "outputs": [],
 "source": [
+"# Schema\n",
 "schema = {\n",
 " \"properties\": {\n",
 " \"sentiment\": {\"type\": \"string\"},\n",
 " \"aggressiveness\": {\"type\": \"integer\"},\n",
 " \"language\": {\"type\": \"string\"},\n",
 " }\n",
-"}"
-]
-},
-{
-"cell_type": "code",
-"execution_count": 4,
-"id": "6146ae70",
-"metadata": {},
-"outputs": [],
-"source": [
-"chain = create_tagging_chain(schema, llm)"
-]
-},
-{
-"cell_type": "markdown",
-"id": "9e306ca3",
-"metadata": {},
-"source": [
-"As we can see in the examples, it correctly interprets what we want but the results vary so that we get, for example, sentiments in different languages ('positive', 'enojado' etc.).\n",
+"}\n",
 "\n",
-"We will see how to control these results in the next section."
+"# LLM\n",
+"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
+"chain = create_tagging_chain(schema, llm)"
 ]
 },
 {
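For reference, the quickstart cells in the hunk above compose into the following runnable sketch. It assumes `OPENAI_API_KEY` is set in the environment; the input string and the result shown in the comment are illustrative rather than taken from the notebook.

from langchain.chains import create_tagging_chain
from langchain.chat_models import ChatOpenAI

# Schema: each property only declares a JSON-schema type; no enums or descriptions yet.
schema = {
    "properties": {
        "sentiment": {"type": "string"},
        "aggressiveness": {"type": "integer"},
        "language": {"type": "string"},
    }
}

# LLM: a function-calling capable OpenAI chat model, as used in the notebook.
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
chain = create_tagging_chain(schema, llm)

# Hypothetical input; the chain returns a dict keyed by the schema's properties,
# e.g. {'sentiment': 'positive', 'aggressiveness': 0, 'language': 'Spanish'}.
inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
print(chain.run(inp))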
@@ -126,7 +119,7 @@
 {
 "data": {
 "text/plain": [
-"{'sentiment': 'enojado', 'aggressiveness': 1, 'language': 'Spanish'}"
+"{'sentiment': 'enojado', 'aggressiveness': 1, 'language': 'es'}"
 ]
 },
 "execution_count": 6,
@@ -140,25 +133,15 @@
 ]
 },
 {
-"cell_type": "code",
-"execution_count": 7,
-"id": "aae85b27",
+"cell_type": "markdown",
+"id": "d921bb53",
 "metadata": {},
-"outputs": [
-{
-"data": {
-"text/plain": [
-"{'sentiment': 'positive', 'aggressiveness': 0, 'language': 'English'}"
-]
-},
-"execution_count": 7,
-"metadata": {},
-"output_type": "execute_result"
-}
-],
 "source": [
-"inp = \"Weather is ok here, I can go outside without much more than a coat\"\n",
-"chain.run(inp)"
+"As we can see in the examples, it correctly interprets what we want.\n",
+"\n",
+"The results vary so that we get, for example, sentiments in different languages ('positive', 'enojado' etc.).\n",
+"\n",
+"We will see how to control these results in the next section."
 ]
 },
 {
@@ -166,9 +149,11 @@
 "id": "bebb2f83",
 "metadata": {},
 "source": [
-"## More control\n",
+"## Finer control\n",
 "\n",
-"By being smart about how we define our schema we can have more control over the model's output. Specifically we can define:\n",
+"Careful schema definition gives us more control over the model's output. \n",
+"\n",
+"Specifically, we can define:\n",
 "\n",
 "- possible values for each property\n",
 "- description to make sure that the model understands the property\n",
@@ -180,7 +165,7 @@
 "id": "69ef0b9a",
 "metadata": {},
 "source": [
-"Following is an example of how we can use _enum_, _description_ and _required_ to control for each of the previously mentioned aspects:"
+"Here is an example of how we can use `_enum_`, `_description_`, and `_required_` to control for each of the previously mentioned aspects:"
 ]
 },
 {
@@ -192,7 +177,6 @@
 "source": [
 "schema = {\n",
 " \"properties\": {\n",
-" \"sentiment\": {\"type\": \"string\", \"enum\": [\"happy\", \"neutral\", \"sad\"]},\n",
 " \"aggressiveness\": {\n",
 " \"type\": \"integer\",\n",
 " \"enum\": [1, 2, 3, 4, 5],\n",
@@ -234,7 +218,7 @@
 {
 "data": {
 "text/plain": [
-"{'sentiment': 'happy', 'aggressiveness': 0, 'language': 'spanish'}"
+"{'aggressiveness': 0, 'language': 'spanish'}"
 ]
 },
 "execution_count": 10,
@@ -256,7 +240,7 @@
 {
 "data": {
 "text/plain": [
-"{'sentiment': 'sad', 'aggressiveness': 10, 'language': 'spanish'}"
+"{'aggressiveness': 5, 'language': 'spanish'}"
 ]
 },
 "execution_count": 11,
@@ -278,7 +262,7 @@
 {
 "data": {
 "text/plain": [
-"{'sentiment': 'neutral', 'aggressiveness': 0, 'language': 'english'}"
+"{'aggressiveness': 0, 'language': 'english'}"
 ]
 },
 "execution_count": 12,
@@ -291,12 +275,25 @@
 "chain.run(inp)"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "cf6b7389",
+"metadata": {},
+"source": [
+"The [LangSmith trace](https://smith.langchain.com/public/311e663a-bbe8-4053-843e-5735055c032d/r) lets us peek under the hood:\n",
+"\n",
+"* As with [extraction](/docs/use_cases/extraction), we call the `information_extraction` function [here](https://github.com/langchain-ai/langchain/blob/269f85b7b7ffd74b38cd422d4164fc033388c3d0/libs/langchain/langchain/chains/openai_functions/extraction.py#L20) on the input string.\n",
+"* This OpenAI function extracts information based upon the provided schema.\n",
+"\n",
+"![Image description](/img/tagging_trace.png)"
+]
+},
 {
 "cell_type": "markdown",
 "id": "e68ad17e",
 "metadata": {},
 "source": [
-"## Specifying schema with Pydantic"
+"## Pydantic"
 ]
 },
 {
@@ -304,11 +301,11 @@
 "id": "2f5970ec",
 "metadata": {},
 "source": [
-"We can also use a Pydantic schema to specify the required properties and types. We can also send other arguments, such as 'enum' or 'description' as can be seen in the example below.\n",
+"We can also use a Pydantic schema to specify the required properties and types. \n",
 "\n",
-"By using the `create_tagging_chain_pydantic` function, we can send a Pydantic schema as input and the output will be an instantiated object that respects our desired schema. \n",
+"We can also send other arguments, such as `enum` or `description`, to each field.\n",
 "\n",
-"In this way, we can specify our schema in the same manner that we would a new class or function in Python - with purely Pythonic types."
+"This lets us specify our schema in the same manner that we would a new class or function in Python with purely Pythonic types."
 ]
 },
 {
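A sketch of the Pydantic variant described above. The field names mirror the earlier JSON schema; the `enum` values, description, and input string are illustrative assumptions, while the result repr in the comment matches the output cell shown further down in this diff.

from pydantic import BaseModel, Field
from langchain.chains import create_tagging_chain_pydantic


class Tags(BaseModel):
    # Field extras such as `enum` and `description` are passed through to the
    # JSON schema that backs the OpenAI function call. Values here are assumed.
    sentiment: str = Field(..., enum=["happy", "neutral", "sad"])
    aggressiveness: int = Field(
        ..., description="describes how aggressive the statement is", enum=[1, 2, 3, 4, 5]
    )
    language: str = Field(..., enum=["spanish", "english", "french", "german", "italian"])


chain = create_tagging_chain_pydantic(Tags, llm)  # llm from the quickstart
res = chain.run("Estoy muy enojado con vos! Te voy a dar tu merecido!")  # illustrative input
# res is an instantiated Tags object, e.g. Tags(sentiment='sad', aggressiveness=5, language='spanish')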
@@ -371,7 +368,7 @@
 {
 "data": {
 "text/plain": [
-"Tags(sentiment='sad', aggressiveness=10, language='spanish')"
+"Tags(sentiment='sad', aggressiveness=5, language='spanish')"
 ]
 },
 "execution_count": 17,
@@ -382,6 +379,17 @@
 "source": [
 "res"
 ]
+},
+{
+"cell_type": "markdown",
+"id": "29346d09",
+"metadata": {},
+"source": [
+"### Going deeper\n",
+"\n",
+"* You can use the [metadata tagger](https://python.langchain.com/docs/integrations/document_transformers/openai_metadata_tagger) document transformer to extract metadata from a LangChain `Document`. \n",
+"* This covers the same basic functionality as the tagging chain, only applied to a LangChain `Document`."
+]
 }
 ],
 "metadata": {
@@ -400,7 +408,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.1"
+"version": "3.9.16"
 }
 },
 "nbformat": 4,