diff --git a/docs/docs_skeleton/static/img/tagging.png b/docs/docs_skeleton/static/img/tagging.png new file mode 100644 index 0000000000..cd4443be2b Binary files /dev/null and b/docs/docs_skeleton/static/img/tagging.png differ diff --git a/docs/docs_skeleton/static/img/tagging_trace.png b/docs/docs_skeleton/static/img/tagging_trace.png new file mode 100644 index 0000000000..3cc1231d86 Binary files /dev/null and b/docs/docs_skeleton/static/img/tagging_trace.png differ diff --git a/docs/extras/use_cases/tagging.ipynb b/docs/extras/use_cases/tagging.ipynb index b51e3f6d5e..d8c3b16631 100644 --- a/docs/extras/use_cases/tagging.ipynb +++ b/docs/extras/use_cases/tagging.ipynb @@ -2,97 +2,90 @@ "cells": [ { "cell_type": "markdown", - "id": "a13ea924", + "id": "a0507a4b", "metadata": {}, "source": [ "# Tagging\n", "\n", - "The tagging chain uses the OpenAI `functions` parameter to specify a schema to tag a document with. This helps us make sure that the model outputs exactly tags that we want, with their appropriate types.\n", + "[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/tagging.ipynb)\n", "\n", - "The tagging chain is to be used when we want to tag a passage with a specific attribute (i.e. what is the sentiment of this message?)" + "## Use case\n", + "\n", + "Tagging means labeling a document with classes such as:\n", + "\n", + "- sentiment\n", + "- language\n", + "- style (formal, informal etc.)\n", + "- covered topics\n", + "- political tendency\n", + "\n", + "![Image description](/img/tagging.png)\n", + "\n", + "## Overview\n", + "\n", + "Tagging has a few components:\n", + "\n", + "* `function`: Like [extraction](/docs/use_cases/extraction), tagging uses [functions](https://openai.com/blog/function-calling-and-other-api-updates) to specify how the model should tag a document\n", + "* `schema`: defines how we want to tag the document\n", + "\n", + "## Quickstart\n", + "\n", + "Let's see a very straightforward example of how we can use OpenAI functions for tagging in LangChain." ] }, { "cell_type": "code", - "execution_count": 1, - "id": "bafb496a", + "execution_count": null, + "id": "dc5cbb6f", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.6.4) is available. 
It's recommended that you update to the latest version using `pip install -U deeplake`.\n", - " warnings.warn(\n" - ] - } - ], + "outputs": [], "source": [ - "from langchain.chat_models import ChatOpenAI\n", - "from langchain.chains import create_tagging_chain, create_tagging_chain_pydantic\n", - "from langchain.prompts import ChatPromptTemplate" + "!pip install langchain openai\n", + "\n", + "# Set env var OPENAI_API_KEY or load from a .env file:\n", + "# import dotenv\n", + "# dotenv.load_dotenv()" ] }, { "cell_type": "code", "execution_count": 2, - "id": "39f3ce3e", + "id": "bafb496a", "metadata": {}, "outputs": [], "source": [ - "llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")" - ] - }, - { - "cell_type": "markdown", - "id": "832ddcd9", - "metadata": {}, - "source": [ - "## Simplest approach, only specifying type" + "from langchain.chat_models import ChatOpenAI\n", + "from langchain.prompts import ChatPromptTemplate\n", + "from langchain.chains import create_tagging_chain, create_tagging_chain_pydantic" ] }, { "cell_type": "markdown", - "id": "4fc8d766", + "id": "b8ca3f93", "metadata": {}, "source": [ - "We can start by specifying a few properties with their expected type in our schema" + "We specify a few properties with their expected type in our schema." ] }, { "cell_type": "code", - "execution_count": 3, - "id": "8329f943", + "execution_count": 4, + "id": "39f3ce3e", "metadata": {}, "outputs": [], "source": [ + "# Schema\n", "schema = {\n", " \"properties\": {\n", " \"sentiment\": {\"type\": \"string\"},\n", " \"aggressiveness\": {\"type\": \"integer\"},\n", " \"language\": {\"type\": \"string\"},\n", " }\n", - "}" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "6146ae70", - "metadata": {}, - "outputs": [], - "source": [ - "chain = create_tagging_chain(schema, llm)" - ] - }, - { - "cell_type": "markdown", - "id": "9e306ca3", - "metadata": {}, - "source": [ - "As we can see in the examples, it correctly interprets what we want but the results vary so that we get, for example, sentiments in different languages ('positive', 'enojado' etc.).\n", "\n", - "We will see how to control these results in the next section." + "}\n", "\n", + "# LLM\n", + "llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n", + "chain = create_tagging_chain(schema, llm)" ] }, { @@ -126,7 +119,7 @@ { "data": { "text/plain": [ - "{'sentiment': 'enojado', 'aggressiveness': 1, 'language': 'Spanish'}" + "{'sentiment': 'enojado', 'aggressiveness': 1, 'language': 'es'}" ] }, "execution_count": 6, @@ -140,25 +133,15 @@ ] }, { - "cell_type": "code", - "execution_count": 7, - "id": "aae85b27", + "cell_type": "markdown", + "id": "d921bb53", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'sentiment': 'positive', 'aggressiveness': 0, 'language': 'English'}" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], "source": [ - "inp = \"Weather is ok here, I can go outside without much more than a coat\"\n", - "chain.run(inp)" + "As we can see in the examples, it correctly interprets what we want.\n", + "\n", + "However, the results vary, so we may get, for example, sentiments in different languages ('positive', 'enojado', etc.).\n", + "\n", + "We will see how to control these results in the next section." ] },
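To make the relationship between the `schema` and the OpenAI `functions` parameter concrete, here is a rough, illustrative sketch of the kind of function definition the tagging chain builds from the schema above. The function name matches the `information_extraction` function mentioned in the trace notes further down; the `description` text and exact layout are assumptions, not copied from the library.

```python
# Illustrative only: an approximation of the function definition that the tagging chain
# passes to the OpenAI API. The schema's properties become the function's parameters,
# and the model is asked to "call" this function with the tags as arguments.
# The description string here is a placeholder, not LangChain's exact wording.
tagging_function = {
    "name": "information_extraction",
    "description": "Extracts the relevant information from the passage.",
    "parameters": {
        "type": "object",
        "properties": {
            "sentiment": {"type": "string"},
            "aggressiveness": {"type": "integer"},
            "language": {"type": "string"},
        },
    },
}

# The function-call arguments returned by the model are parsed back into a plain dict,
# which is what chain.run(inp) returns.
```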
{ @@ -166,9 +149,11 @@ "id": "bebb2f83", "metadata": {}, "source": [ - "## More control\n", + "## Finer control\n", "\n", - "By being smart about how we define our schema we can have more control over the model's output. Specifically we can define:\n", + "Careful schema definition gives us more control over the model's output. \n", + "\n", + "Specifically, we can define:\n", "\n", "- possible values for each property\n", "- description to make sure that the model understands the property\n", @@ -180,7 +165,7 @@ "id": "69ef0b9a", "metadata": {}, "source": [ - "Following is an example of how we can use _enum_, _description_ and _required_ to control for each of the previously mentioned aspects:" + "Here is an example of how we can use `enum`, `description`, and `required` to control each of the previously mentioned aspects:" ] }, { @@ -192,7 +177,6 @@ "source": [ "schema = {\n", " \"properties\": {\n", - " \"sentiment\": {\"type\": \"string\", \"enum\": [\"happy\", \"neutral\", \"sad\"]},\n", " \"aggressiveness\": {\n", " \"type\": \"integer\",\n", " \"enum\": [1, 2, 3, 4, 5],\n", @@ -234,7 +218,7 @@ { "data": { "text/plain": [ - "{'sentiment': 'happy', 'aggressiveness': 0, 'language': 'spanish'}" + "{'aggressiveness': 0, 'language': 'spanish'}" ] }, "execution_count": 10, @@ -256,7 +240,7 @@ { "data": { "text/plain": [ - "{'sentiment': 'sad', 'aggressiveness': 10, 'language': 'spanish'}" + "{'aggressiveness': 5, 'language': 'spanish'}" ] }, "execution_count": 11, @@ -278,7 +262,7 @@ { "data": { "text/plain": [ - "{'sentiment': 'neutral', 'aggressiveness': 0, 'language': 'english'}" + "{'aggressiveness': 0, 'language': 'english'}" ] }, "execution_count": 12, @@ -291,12 +275,25 @@ "chain.run(inp)" ] }, + { + "cell_type": "markdown", + "id": "cf6b7389", + "metadata": {}, + "source": [ + "The [LangSmith trace](https://smith.langchain.com/public/311e663a-bbe8-4053-843e-5735055c032d/r) lets us peek under the hood:\n", + "\n", + "* As with [extraction](/docs/use_cases/extraction), we call the `information_extraction` function [here](https://github.com/langchain-ai/langchain/blob/269f85b7b7ffd74b38cd422d4164fc033388c3d0/libs/langchain/langchain/chains/openai_functions/extraction.py#L20) on the input string.\n", + "* This OpenAI function extracts information based on the provided schema.\n", + "\n", + "![Image description](/img/tagging_trace.png)" ] }, { "cell_type": "markdown", "id": "e68ad17e", "metadata": {}, "source": [ - "## Specifying schema with Pydantic" + "## Pydantic" ] }, { @@ -304,11 +301,11 @@ "id": "2f5970ec", "metadata": {}, "source": [ - "We can also use a Pydantic schema to specify the required properties and types. We can also send other arguments, such as 'enum' or 'description' as can be seen in the example below.\n", + "We can also use a Pydantic schema to specify the required properties and types. \n", "\n", - "By using the `create_tagging_chain_pydantic` function, we can send a Pydantic schema as input and the output will be an instantiated object that respects our desired schema. \n", + "We can also send other arguments, such as `enum` or `description`, to each field.\n", "\n", - "In this way, we can specify our schema in the same manner that we would a new class or function in Python - with purely Pythonic types." + "This lets us specify our schema in the same manner that we would define a new class or function in Python, with purely Pythonic types."
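The Pydantic schema cell itself is unchanged context and therefore not visible in this diff; below is a minimal sketch of what such a schema can look like. The field names and enum values are chosen to line up with the `Tags(...)` output shown further down, and the description text and example input are illustrative, not copied from the notebook.

```python
# A minimal sketch, assuming pydantic v1-style models (extra Field kwargs such as
# `enum` end up in the generated JSON schema). The actual notebook cell may differ.
from langchain.chains import create_tagging_chain_pydantic
from langchain.chat_models import ChatOpenAI
from pydantic import BaseModel, Field


class Tags(BaseModel):
    sentiment: str = Field(..., enum=["happy", "neutral", "sad"])
    aggressiveness: int = Field(
        ...,
        enum=[1, 2, 3, 4, 5],
        description="describes how aggressive the statement is (illustrative wording)",
    )
    language: str = Field(
        ..., enum=["spanish", "english", "french", "german", "italian"]
    )


llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
chain = create_tagging_chain_pydantic(Tags, llm)

# Returns an instantiated Tags object, e.g. Tags(sentiment='sad', aggressiveness=5, language='spanish')
res = chain.run("Estoy muy enojado con vos! Te voy a dar tu merecido!")  # illustrative input
```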
] }, { @@ -371,7 +368,7 @@ { "data": { "text/plain": [ - "Tags(sentiment='sad', aggressiveness=10, language='spanish')" + "Tags(sentiment='sad', aggressiveness=5, language='spanish')" ] }, "execution_count": 17, @@ -382,6 +379,17 @@ "source": [ "res" ] + }, + { + "cell_type": "markdown", + "id": "29346d09", + "metadata": {}, + "source": [ + "### Going deeper\n", + "\n", + "* You can use the [metadata tagger](https://python.langchain.com/docs/integrations/document_transformers/openai_metadata_tagger) document transformer to extract metadata from a LangChain `Document`. \n", + "* This covers the same basic functionality as the tagging chain, only applied to a LangChain `Document`." + ] } ], "metadata": { @@ -400,7 +408,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.1" + "version": "3.9.16" } }, "nbformat": 4,
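To make the "Going deeper" pointer concrete, here is a rough sketch of the metadata tagger document transformer mentioned above; the `create_metadata_tagger` import path and signature follow the linked integration page and should be checked against the installed LangChain version.

```python
# A rough sketch, assuming the OpenAI metadata tagger described in the linked
# integration docs; verify the import path and signature for your LangChain version.
from langchain.chat_models import ChatOpenAI
from langchain.document_transformers.openai_functions import create_metadata_tagger
from langchain.schema import Document

schema = {
    "properties": {
        "sentiment": {"type": "string"},
        "language": {"type": "string"},
    },
    "required": ["sentiment", "language"],
}

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
document_transformer = create_metadata_tagger(metadata_schema=schema, llm=llm)

docs = [Document(page_content="A sample passage to tag.")]  # illustrative input

# Returns new Documents with .metadata populated according to the schema,
# leaving .page_content untouched.
tagged_docs = document_transformer.transform_documents(docs)
```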