From 16af5f86905705096552507f8739b5cfcaa77aa4 Mon Sep 17 00:00:00 2001
From: niklub
Date: Fri, 11 Aug 2023 19:24:10 +0100
Subject: [PATCH] Add LabelStudio integration (#8880)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This PR introduces [Label Studio](https://labelstud.io/) integration with LangChain via `LabelStudioCallbackHandler`:

- sending data to the Label Studio instance
- labeling dataset for supervised LLM finetuning
- rating model responses
- tracking and displaying chat history
- support for custom data labeling workflow

### Example

```
chat_llm = ChatOpenAI(callbacks=[LabelStudioCallbackHandler(mode="chat")])
chat_llm([
    SystemMessage(content="Always use emojis in your responses."),
    HumanMessage(content="Hey AI, how's your day going?"),
    AIMessage(content="🤖 I don't have feelings, but I'm running smoothly! How can I help you today?"),
    HumanMessage(content="I'm feeling a bit down. Any advice?"),
    AIMessage(content="🤗 I'm sorry to hear that. Remember, it's okay to seek help or talk to someone if you need to. 💬"),
    HumanMessage(content="Can you tell me a joke to lighten the mood?"),
    AIMessage(content="Of course! 🎭 Why did the scarecrow win an award? Because he was outstanding in his field! 🌾"),
    HumanMessage(content="Haha, that was a good one! Thanks for cheering me up."),
    AIMessage(content="Always here to help! 😊 If you need anything else, just let me know."),
    HumanMessage(content="Will do! By the way, can you recommend a good movie?"),
])
```

### Dependencies

- [label-studio](https://pypi.org/project/label-studio/)
- [label-studio-sdk](https://pypi.org/project/label-studio-sdk/)

https://twitter.com/labelstudiohq

---------

Co-authored-by: nik
---
 .../integrations/callbacks/labelstudio.ipynb  | 382 +++++++++++++++++
 .../langchain/langchain/callbacks/__init__.py |   2 +
 .../callbacks/labelstudio_callback.py         | 392 ++++++++++++++++++
 3 files changed, 776 insertions(+)
 create mode 100644 docs/extras/integrations/callbacks/labelstudio.ipynb
 create mode 100644 libs/langchain/langchain/callbacks/labelstudio_callback.py

diff --git a/docs/extras/integrations/callbacks/labelstudio.ipynb b/docs/extras/integrations/callbacks/labelstudio.ipynb
new file mode 100644
index 0000000000..927db2d639
--- /dev/null
+++ b/docs/extras/integrations/callbacks/labelstudio.ipynb
@@ -0,0 +1,382 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true,
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   },
+   "source": [
+    "# Label Studio\n",
+    "\n",
+    "
\n",
+    "\n",
+    "\n",
+    "\n",
+    "Label Studio is an open-source data labeling platform that gives LangChain users flexibility in labeling data for fine-tuning large language models (LLMs). It also supports preparing custom training data and collecting and evaluating responses through human feedback.\n",
+    "\n",
+    "In this guide, you will learn how to connect a LangChain pipeline to Label Studio to:\n",
+    "\n",
+    "- Aggregate all input prompts, conversations, and responses in a single Label Studio project. This consolidates all the data in one place for easier labeling and analysis.\n",
+    "- Refine prompts and responses to create a dataset for supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) scenarios. The labeled data can be used to further train the LLM to improve its performance.\n",
+    "- Evaluate model responses through human feedback. Label Studio provides an interface for humans to review and provide feedback on model responses, allowing evaluation and iteration."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   },
+   "source": [
+    "## Installation and setup"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   },
+   "source": [
+    "First, install the latest versions of Label Studio and the Label Studio API client:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "pycharm": {
+     "name": "#%%\n"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "!pip install -U label-studio label-studio-sdk openai"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   },
+   "source": [
+    "Next, run `label-studio` on the command line to start a local Label Studio instance at `http://localhost:8080`. See the [Label Studio installation guide](https://labelstud.io/guide/install) for more options."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   },
+   "source": [
+    "You'll need a token to make API calls.\n",
+    "\n",
+    "Open your Label Studio instance in your browser, go to `Account & Settings > Access Token`, and copy the key.\n",
+    "\n",
+    "Set environment variables with your Label Studio URL, Label Studio API key, and OpenAI API key:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "pycharm": {
+     "name": "#%%\n"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "os.environ['LABEL_STUDIO_URL'] = ''  # e.g. http://localhost:8080\n",
+    "os.environ['LABEL_STUDIO_API_KEY'] = ''\n",
+    "os.environ['OPENAI_API_KEY'] = ''"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   },
+   "source": [
+    "## Collecting LLM prompts and responses"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The data used for labeling is stored in projects within Label Studio. Each project is defined by an XML configuration that specifies its input and output data.\n",
+    "\n",
+    "Create a project that takes human input in text format and outputs an editable LLM response in a text area:\n",
+    "\n",
+    "```xml\n",
+    "\n",
+    "\n",
+    "\n",
+    " \n",
+    " \n",
+    " \n",
+    "