forked from Archives/langchain
You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
409 lines
10 KiB
Plaintext
409 lines
10 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "a13ea924",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Tagging\n",
|
|
"\n",
|
|
"The tagging chain uses the OpenAI `functions` parameter to specify a schema to tag a document with. This helps us make sure that the model outputs exactly tags that we want, with their appropriate types.\n",
|
|
"\n",
|
|
"The tagging chain is to be used when we want to tag a passage with a specific attribute (i.e. what is the sentiment of this message?)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"id": "bafb496a",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"/Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.6.4) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.\n",
|
|
" warnings.warn(\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"from langchain.chat_models import ChatOpenAI\n",
|
|
"from langchain.chains import create_tagging_chain, create_tagging_chain_pydantic\n",
|
|
"from langchain.prompts import ChatPromptTemplate"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"id": "39f3ce3e",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "832ddcd9",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Simplest approach, only specifying type"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "4fc8d766",
|
|
"metadata": {},
|
|
"source": [
|
|
"We can start by specifying a few properties with their expected type in our schema"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"id": "8329f943",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"schema = {\n",
|
|
" \"properties\": {\n",
|
|
" \"sentiment\": {\"type\": \"string\"},\n",
|
|
" \"aggressiveness\": {\"type\": \"integer\"},\n",
|
|
" \"language\": {\"type\": \"string\"},\n",
|
|
" }\n",
|
|
"}"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"id": "6146ae70",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"chain = create_tagging_chain(schema, llm)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "9e306ca3",
|
|
"metadata": {},
|
|
"source": [
|
|
"As we can see in the examples, it correctly interprets what we want but the results vary so that we get, for example, sentiments in different languages ('positive', 'enojado' etc.).\n",
|
|
"\n",
|
|
"We will see how to control these results in the next section."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"id": "5509b6a6",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"{'sentiment': 'positive', 'language': 'Spanish'}"
|
|
]
|
|
},
|
|
"execution_count": 5,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"inp = \"Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!\"\n",
|
|
"chain.run(inp)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"id": "9154474c",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"{'sentiment': 'enojado', 'aggressiveness': 1, 'language': 'Spanish'}"
|
|
]
|
|
},
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"inp = \"Estoy muy enojado con vos! Te voy a dar tu merecido!\"\n",
|
|
"chain.run(inp)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"id": "aae85b27",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"{'sentiment': 'positive', 'aggressiveness': 0, 'language': 'English'}"
|
|
]
|
|
},
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"inp = \"Weather is ok here, I can go outside without much more than a coat\"\n",
|
|
"chain.run(inp)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "bebb2f83",
|
|
"metadata": {},
|
|
"source": [
|
|
"## More control\n",
|
|
"\n",
|
|
"By being smart about how we define our schema we can have more control over the model's output. Specifically we can define:\n",
|
|
"\n",
|
|
"- possible values for each property\n",
|
|
"- description to make sure that the model understands the property\n",
|
|
"- required properties to be returned"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "69ef0b9a",
|
|
"metadata": {},
|
|
"source": [
|
|
"Following is an example of how we can use _enum_, _description_ and _required_ to control for each of the previously mentioned aspects:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"id": "6a5f7961",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"schema = {\n",
|
|
" \"properties\": {\n",
|
|
" \"sentiment\": {\"type\": \"string\", \"enum\": [\"happy\", \"neutral\", \"sad\"]},\n",
|
|
" \"aggressiveness\": {\n",
|
|
" \"type\": \"integer\",\n",
|
|
" \"enum\": [1, 2, 3, 4, 5],\n",
|
|
" \"description\": \"describes how aggressive the statement is, the higher the number the more aggressive\",\n",
|
|
" },\n",
|
|
" \"language\": {\n",
|
|
" \"type\": \"string\",\n",
|
|
" \"enum\": [\"spanish\", \"english\", \"french\", \"german\", \"italian\"],\n",
|
|
" },\n",
|
|
" },\n",
|
|
" \"required\": [\"language\", \"sentiment\", \"aggressiveness\"],\n",
|
|
"}"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 9,
|
|
"id": "e5a5881f",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"chain = create_tagging_chain(schema, llm)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "5ded2332",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now the answers are much better!"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"id": "d9b9d53d",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"{'sentiment': 'happy', 'aggressiveness': 0, 'language': 'spanish'}"
|
|
]
|
|
},
|
|
"execution_count": 10,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"inp = \"Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!\"\n",
|
|
"chain.run(inp)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 11,
|
|
"id": "1c12fa00",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"{'sentiment': 'sad', 'aggressiveness': 10, 'language': 'spanish'}"
|
|
]
|
|
},
|
|
"execution_count": 11,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"inp = \"Estoy muy enojado con vos! Te voy a dar tu merecido!\"\n",
|
|
"chain.run(inp)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"id": "0bdfcb05",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"{'sentiment': 'neutral', 'aggressiveness': 0, 'language': 'english'}"
|
|
]
|
|
},
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"inp = \"Weather is ok here, I can go outside without much more than a coat\"\n",
|
|
"chain.run(inp)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "e68ad17e",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Specifying schema with Pydantic"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "2f5970ec",
|
|
"metadata": {},
|
|
"source": [
|
|
"We can also use a Pydantic schema to specify the required properties and types. We can also send other arguments, such as 'enum' or 'description' as can be seen in the example below.\n",
|
|
"\n",
|
|
"By using the `create_tagging_chain_pydantic` function, we can send a Pydantic schema as input and the output will be an instantiated object that respects our desired schema. \n",
|
|
"\n",
|
|
"In this way, we can specify our schema in the same manner that we would a new class or function in Python - with purely Pythonic types."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 13,
|
|
"id": "bf1f367e",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from enum import Enum\n",
|
|
"from pydantic import BaseModel, Field"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 14,
|
|
"id": "83a2e826",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"class Tags(BaseModel):\n",
|
|
" sentiment: str = Field(..., enum=[\"happy\", \"neutral\", \"sad\"])\n",
|
|
" aggressiveness: int = Field(\n",
|
|
" ...,\n",
|
|
" description=\"describes how aggressive the statement is, the higher the number the more aggressive\",\n",
|
|
" enum=[1, 2, 3, 4, 5],\n",
|
|
" )\n",
|
|
" language: str = Field(\n",
|
|
" ..., enum=[\"spanish\", \"english\", \"french\", \"german\", \"italian\"]\n",
|
|
" )"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 15,
|
|
"id": "6e404892",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"chain = create_tagging_chain_pydantic(Tags, llm)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 16,
|
|
"id": "b5fc43c4",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"inp = \"Estoy muy enojado con vos! Te voy a dar tu merecido!\"\n",
|
|
"res = chain.run(inp)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 17,
|
|
"id": "5074bcc3",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"Tags(sentiment='sad', aggressiveness=10, language='spanish')"
|
|
]
|
|
},
|
|
"execution_count": 17,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"res"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.9.1"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|