{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true, "pycharm": { "name": "#%% md\n" } }, "source": [ "# Label Studio\n", "\n",
"Label Studio is an open-source data labeling platform that gives LangChain users flexibility in labeling data for fine-tuning large language models (LLMs). It also supports preparing custom training data and collecting and evaluating responses through human feedback.\n", "\n", "In this guide, you will learn how to connect a LangChain pipeline to Label Studio to:\n", "\n", "- Aggregate all input prompts, conversations, and responses in a single Label Studio project, so that all the data is in one place for labeling and analysis.\n", "- Refine prompts and responses to create a dataset for supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). The labeled data can then be used to further train the LLM and improve its performance.\n", "- Evaluate model responses through human feedback. Label Studio provides an interface where humans can review model responses and give feedback on them, supporting evaluation and iteration." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## Installation and setup" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "First, install the latest versions of Label Studio, the Label Studio API client (`label-studio-sdk`), and the OpenAI Python library:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "%pip install -U label-studio label-studio-sdk openai" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Next, run `label-studio` on the command line to start a local Label Studio instance at `http://localhost:8080`. See the [Label Studio installation guide](https://labelstud.io/guide/install) for more options."
] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "You'll need an access token to make API calls.\n", "\n", "Open your Label Studio instance in your browser, go to `Account & Settings > Access Token`, and copy the key.\n", "\n", "Set environment variables with your Label Studio URL, Label Studio API key, and OpenAI API key:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "import os\n", "\n", "os.environ['LABEL_STUDIO_URL'] = ''  # e.g. http://localhost:8080\n", "os.environ['LABEL_STUDIO_API_KEY'] = ''  # from Account & Settings > Access Token\n", "os.environ['OPENAI_API_KEY'] = ''" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## Collecting LLM prompts and responses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data used for labeling is stored in projects within Label Studio. Each project is defined by an XML labeling configuration that specifies its input and output data.\n", "\n", "Create a project that takes human input as text and outputs an editable LLM response in a text area:\n", "\n", "```xml\n", "\n", "\n", "\n", " \n", " \n", " \n", "