mirror of
https://github.com/hwchase17/langchain
synced 2024-10-29 17:07:25 +00:00
383 lines
11 KiB
Plaintext
383 lines
11 KiB
Plaintext
|
{
|
||
|
"cells": [
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"collapsed": true,
|
||
|
"pycharm": {
|
||
|
"name": "#%% md\n"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"# Label Studio\n",
|
||
|
"\n",
|
||
|
"<div>\n",
|
||
|
"<img src=\"https://labelstudio-pub.s3.amazonaws.com/lc/open-source-data-labeling-platform.png\" width=\"400\"/>\n",
|
||
|
"</div>\n",
|
||
|
"\n",
|
||
|
"Label Studio is an open-source data labeling platform that provides LangChain with flexibility when it comes to labeling data for fine-tuning large language models (LLMs). It also enables the preparation of custom training data and the collection and evaluation of responses through human feedback.\n",
|
||
|
"\n",
|
||
|
"In this guide, you will learn how to connect a LangChain pipeline to Label Studio to:\n",
|
||
|
"\n",
|
||
|
"- Aggregate all input prompts, conversations, and responses in a single LabelStudio project. This consolidates all the data in one place for easier labeling and analysis.\n",
|
||
|
"- Refine prompts and responses to create a dataset for supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) scenarios. The labeled data can be used to further train the LLM to improve its performance.\n",
|
||
|
"- Evaluate model responses through human feedback. LabelStudio provides an interface for humans to review and provide feedback on model responses, allowing evaluation and iteration."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%% md\n"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## Installation and setup"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%% md\n"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"First install latest versions of Label Studio and Label Studio API client:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%%\n"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"!pip install -U label-studio label-studio-sdk openai"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%% md\n"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Next, run `label-studio` on the command line to start the local LabelStudio instance at `http://localhost:8080`. See the [Label Studio installation guide](https://labelstud.io/guide/install) for more options."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%% md\n"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"You'll need a token to make API calls.\n",
|
||
|
"\n",
|
||
|
"Open your LabelStudio instance in your browser, go to `Account & Settings > Access Token` and copy the key.\n",
|
||
|
"\n",
|
||
|
"Set environment variables with your LabelStudio URL, API key and OpenAI API key:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%%\n"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"import os\n",
|
||
|
"\n",
|
||
|
"os.environ['LABEL_STUDIO_URL'] = '<YOUR-LABEL-STUDIO-URL>' # e.g. http://localhost:8080\n",
|
||
|
"os.environ['LABEL_STUDIO_API_KEY'] = '<YOUR-LABEL-STUDIO-API-KEY>'\n",
|
||
|
"os.environ['OPENAI_API_KEY'] = '<YOUR-OPENAI-API-KEY>'"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%% md\n"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## Collecting LLMs prompts and responses"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"The data used for labeling is stored in projects within Label Studio. Every project is identified by an XML configuration that details the specifications for input and output data. \n",
|
||
|
"\n",
|
||
|
"Create a project that takes human input in text format and outputs an editable LLM response in a text area:\n",
|
||
|
"\n",
|
||
|
"```xml\n",
|
||
|
"<View>\n",
|
||
|
"<Style>\n",
|
||
|
" .prompt-box {\n",
|
||
|
" background-color: white;\n",
|
||
|
" border-radius: 10px;\n",
|
||
|
" box-shadow: 0px 4px 6px rgba(0, 0, 0, 0.1);\n",
|
||
|
" padding: 20px;\n",
|
||
|
" }\n",
|
||
|
"</Style>\n",
|
||
|
"<View className=\"root\">\n",
|
||
|
" <View className=\"prompt-box\">\n",
|
||
|
" <Text name=\"prompt\" value=\"$prompt\"/>\n",
|
||
|
" </View>\n",
|
||
|
" <TextArea name=\"response\" toName=\"prompt\"\n",
|
||
|
" maxSubmissions=\"1\" editable=\"true\"\n",
|
||
|
" required=\"true\"/>\n",
|
||
|
"</View>\n",
|
||
|
"<Header value=\"Rate the response:\"/>\n",
|
||
|
"<Rating name=\"rating\" toName=\"prompt\"/>\n",
|
||
|
"</View>\n",
|
||
|
"```\n",
|
||
|
"\n",
|
||
|
"1. To create a project in Label Studio, click on the \"Create\" button. \n",
|
||
|
"2. Enter a name for your project in the \"Project Name\" field, such as `My Project`.\n",
|
||
|
"3. Navigate to `Labeling Setup > Custom Template` and paste the XML configuration provided above."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%% md\n"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"You can collect input LLM prompts and output responses in a LabelStudio project, connecting it via `LabelStudioCallbackHandler`:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%%\n"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"from langchain.llms import OpenAI\n",
|
||
|
"from langchain.callbacks import LabelStudioCallbackHandler\n",
|
||
|
"\n",
|
||
|
"llm = OpenAI(\n",
|
||
|
" temperature=0,\n",
|
||
|
" callbacks=[\n",
|
||
|
" LabelStudioCallbackHandler(\n",
|
||
|
" project_name=\"My Project\"\n",
|
||
|
" )]\n",
|
||
|
")\n",
|
||
|
"print(llm(\"Tell me a joke\"))"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%% md\n"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"In the Label Studio, open `My Project`. You will see the prompts, responses, and metadata like the model name. "
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%% md\n"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## Collecting Chat model Dialogues"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"You can also track and display full chat dialogues in LabelStudio, with the ability to rate and modify the last response:\n",
|
||
|
"\n",
|
||
|
"1. Open Label Studio and click on the \"Create\" button.\n",
|
||
|
"2. Enter a name for your project in the \"Project Name\" field, such as `New Project with Chat`.\n",
|
||
|
"3. Navigate to Labeling Setup > Custom Template and paste the following XML configuration:\n",
|
||
|
"\n",
|
||
|
"```xml\n",
|
||
|
"<View>\n",
|
||
|
"<View className=\"root\">\n",
|
||
|
" <Paragraphs name=\"dialogue\"\n",
|
||
|
" value=\"$prompt\"\n",
|
||
|
" layout=\"dialogue\"\n",
|
||
|
" textKey=\"content\"\n",
|
||
|
" nameKey=\"role\"\n",
|
||
|
" granularity=\"sentence\"/>\n",
|
||
|
" <Header value=\"Final response:\"/>\n",
|
||
|
" <TextArea name=\"response\" toName=\"dialogue\"\n",
|
||
|
" maxSubmissions=\"1\" editable=\"true\"\n",
|
||
|
" required=\"true\"/>\n",
|
||
|
"</View>\n",
|
||
|
"<Header value=\"Rate the response:\"/>\n",
|
||
|
"<Rating name=\"rating\" toName=\"dialogue\"/>\n",
|
||
|
"</View>\n",
|
||
|
"```"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%%\n"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"from langchain.chat_models import ChatOpenAI\n",
|
||
|
"from langchain.schema import HumanMessage, SystemMessage\n",
|
||
|
"from langchain.callbacks import LabelStudioCallbackHandler\n",
|
||
|
"\n",
|
||
|
"chat_llm = ChatOpenAI(callbacks=[\n",
|
||
|
" LabelStudioCallbackHandler(\n",
|
||
|
" mode=\"chat\",\n",
|
||
|
" project_name=\"New Project with Chat\",\n",
|
||
|
" )\n",
|
||
|
"])\n",
|
||
|
"llm_results = chat_llm([\n",
|
||
|
" SystemMessage(content=\"Always use a lot of emojis\"),\n",
|
||
|
" HumanMessage(content=\"Tell me a joke\")\n",
|
||
|
"])"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"In Label Studio, open \"New Project with Chat\". Click on a created task to view dialog history and edit/annotate responses."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%% md\n"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## Custom Labeling Configuration"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%% md\n"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"You can modify the default labeling configuration in LabelStudio to add more target labels like response sentiment, relevance, and many [other types annotator's feedback](https://labelstud.io/tags/).\n",
|
||
|
"\n",
|
||
|
"New labeling configuration can be added from UI: go to `Settings > Labeling Interface` and set up a custom configuration with additional tags like `Choices` for sentiment or `Rating` for relevance. Keep in mind that [`TextArea` tag](https://labelstud.io/tags/textarea) should be presented in any configuration to display the LLM responses.\n",
|
||
|
"\n",
|
||
|
"Alternatively, you can specify the labeling configuration on the initial call before project creation:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%%\n"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"ls = LabelStudioCallbackHandler(project_config='''\n",
|
||
|
"<View>\n",
|
||
|
"<Text name=\"prompt\" value=\"$prompt\"/>\n",
|
||
|
"<TextArea name=\"response\" toName=\"prompt\"/>\n",
|
||
|
"<TextArea name=\"user_feedback\" toName=\"prompt\"/>\n",
|
||
|
"<Rating name=\"rating\" toName=\"prompt\"/>\n",
|
||
|
"<Choices name=\"sentiment\" toName=\"prompt\">\n",
|
||
|
" <Choice value=\"Positive\"/>\n",
|
||
|
" <Choice value=\"Negative\"/>\n",
|
||
|
"</Choices>\n",
|
||
|
"</View>\n",
|
||
|
"''')"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Note that if the project doesn't exist, it will be created with the specified labeling configuration."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%% md\n"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## Other parameters"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"pycharm": {
|
||
|
"name": "#%% md\n"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"The `LabelStudioCallbackHandler` accepts several optional parameters:\n",
|
||
|
"\n",
|
||
|
"- **api_key** - Label Studio API key. Overrides environmental variable `LABEL_STUDIO_API_KEY`.\n",
|
||
|
"- **url** - Label Studio URL. Overrides `LABEL_STUDIO_URL`, default `http://localhost:8080`.\n",
|
||
|
"- **project_id** - Existing Label Studio project ID. Overrides `LABEL_STUDIO_PROJECT_ID`. Stores data in this project.\n",
|
||
|
"- **project_name** - Project name if project ID not specified. Creates a new project. Default is `\"LangChain-%Y-%m-%d\"` formatted with the current date.\n",
|
||
|
"- **project_config** - [custom labeling configuration](#custom-labeling-configuration)\n",
|
||
|
"- **mode**: use this shortcut to create target configuration from scratch:\n",
|
||
|
" - `\"prompt\"` - Single prompt, single response. Default.\n",
|
||
|
" - `\"chat\"` - Multi-turn chat mode.\n",
|
||
|
"\n"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"metadata": {
|
||
|
"kernelspec": {
|
||
|
"display_name": "labelops",
|
||
|
"language": "python",
|
||
|
"name": "labelops"
|
||
|
},
|
||
|
"language_info": {
|
||
|
"codemirror_mode": {
|
||
|
"name": "ipython",
|
||
|
"version": 3
|
||
|
},
|
||
|
"file_extension": ".py",
|
||
|
"mimetype": "text/x-python",
|
||
|
"name": "python",
|
||
|
"nbconvert_exporter": "python",
|
||
|
"pygments_lexer": "ipython3",
|
||
|
"version": "3.9.16"
|
||
|
}
|
||
|
},
|
||
|
"nbformat": 4,
|
||
|
"nbformat_minor": 1
|
||
|
}
|