diff --git a/docs/extras/integrations/callbacks/labelstudio.ipynb b/docs/extras/integrations/callbacks/labelstudio.ipynb new file mode 100644 index 0000000000..927db2d639 --- /dev/null +++ b/docs/extras/integrations/callbacks/labelstudio.ipynb @@ -0,0 +1,382 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "collapsed": true, + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "# Label Studio\n", + "\n", + "
\n", + "\n", + "
\n", + "\n", + "Label Studio is an open-source data labeling platform that gives LangChain users flexibility in labeling data for fine-tuning large language models (LLMs). It also enables the preparation of custom training data and the collection and evaluation of responses through human feedback.\n", + "\n", + "In this guide, you will learn how to connect a LangChain pipeline to Label Studio to:\n", + "\n", + "- Aggregate all input prompts, conversations, and responses in a single Label Studio project. This consolidates all the data in one place for easier labeling and analysis.\n", + "- Refine prompts and responses to create a dataset for supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) scenarios. The labeled data can be used to further train the LLM to improve its performance.\n", + "- Evaluate model responses through human feedback. Label Studio provides an interface for humans to review and provide feedback on model responses, allowing evaluation and iteration." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Installation and setup" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "First, install the latest versions of Label Studio and the Label Studio API client, along with the `openai` package:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "!pip install -U label-studio label-studio-sdk openai" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "Next, run `label-studio` on the command line to start the local Label Studio instance at `http://localhost:8080`. See the [Label Studio installation guide](https://labelstud.io/guide/install) for more options."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "You'll need a token to make API calls.\n", + "\n", + "Open your Label Studio instance in your browser, go to `Account & Settings > Access Token`, and copy the key.\n", + "\n", + "Set environment variables with your Label Studio URL, Label Studio API key, and OpenAI API key:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "os.environ['LABEL_STUDIO_URL'] = '' # e.g. http://localhost:8080\n", + "os.environ['LABEL_STUDIO_API_KEY'] = ''\n", + "os.environ['OPENAI_API_KEY'] = ''" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Collecting LLM prompts and responses" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The data used for labeling is stored in projects within Label Studio. Each project is defined by an XML configuration that specifies its input and output data.\n", + "\n", + "Create a project that takes human input in text format and outputs an editable LLM response in a text area:\n", + "\n", + "```xml\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "