{
"cells": [
{
"cell_type": "markdown",
"id": "43fb16cb",
"metadata": {},
"source": [
"# Prompt Management\n",
"\n",
"Managing your prompts is annoying and tedious, with everyone writing their own slightly different variants of the same ideas. But it shouldn't be this way.\n",
"\n",
"LangChain provides a standard and flexible way for specifying and managing all your prompts, as well as clear and specific terminology around them. This notebook goes through the core components of working with prompts, showing how to use them as well as explaining what they do.\n",
"\n",
"This notebook covers how to work with prompts in Python. If you are interested in how to work with serialized versions of prompts and load them from disk, see [this notebook](prompt_serialization.ipynb)."
]
},
{
"cell_type": "markdown",
"id": "890aad4d",
"metadata": {},
"source": [
"### The BasePromptTemplate Interface\n",
"\n",
"A prompt template is a mechanism for constructing a prompt to pass to the language model given some user input. Below is the interface that all different types of prompt templates should expose.\n",
"\n",
"```python\n",
"class BasePromptTemplate(ABC):\n",
"\n",
"    input_variables: List[str]\n",
"    \"\"\"A list of the names of the variables the prompt template expects.\"\"\"\n",
"\n",
"    @abstractmethod\n",
"    def format(self, **kwargs: Any) -> str:\n",
"        \"\"\"Format the prompt with the inputs.\n",
"\n",
"        Args:\n",
"            kwargs: Any arguments to be passed to the prompt template.\n",
"\n",
"        Returns:\n",
"            A formatted string.\n",
"\n",
"        Example:\n",
"\n",
"        .. code-block:: python\n",
"\n",
"            prompt.format(variable1=\"foo\")\n",
"        \"\"\"\n",
"```\n",
"\n",
"The only two things that define a prompt template are:\n",
"\n",
"1. `input_variables`: The user-provided variables that are needed to format the prompt.\n",
"2. `format`: A method which takes in keyword arguments and returns a formatted prompt. The keys are expected to be the input variables.\n",
"\n",
"The rest of the logic of how the prompt is constructed is left up to different implementations. Let's take a look at some below."
]
},
{
"cell_type": "markdown",
"id": "cddb465e",
"metadata": {},
"source": [
"### PromptTemplate\n",
"\n",
"This is the simplest type of prompt template, consisting of a string template that takes any number of input variables. The template should be formatted as a Python f-string, although we will support other formats (Jinja, Mako, etc.) in the future.\n",
"\n",
"If you just want to use a hardcoded prompt template, you should use this implementation.\n",
"\n",
"Let's walk through a few examples."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "094229f4",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ab46bd2a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Tell me a joke.'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# An example prompt with no input variables\n",
"no_input_prompt = PromptTemplate(input_variables=[], template=\"Tell me a joke.\")\n",
"no_input_prompt.format()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c3ad0fa8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Tell me a funny joke.'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# An example prompt with one input variable\n",
"one_input_prompt = PromptTemplate(input_variables=[\"adjective\"], template=\"Tell me a {adjective} joke.\")\n",
"one_input_prompt.format(adjective=\"funny\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ba577dcf",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Tell me a funny joke about chickens.'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# An example prompt with multiple input variables\n",
"multiple_input_prompt = PromptTemplate(\n",
"    input_variables=[\"adjective\", \"content\"],\n",
"    template=\"Tell me a {adjective} joke about {content}.\"\n",
")\n",
"multiple_input_prompt.format(adjective=\"funny\", content=\"chickens\")"
]
},
{
"cell_type": "markdown",
"id": "1492b49d",
"metadata": {},
"source": [
"### Few Shot Prompts\n",
"\n",
"A FewShotPromptTemplate is a prompt template that includes some examples. If you have collected some examples of how the task should be done, you can insert them into the prompt using this class.\n",
"\n",
"Examples are datapoints that can be included in the prompt in order to give the model more context about what to do. Examples are represented as a dictionary of key-value pairs, with the key being the input (or label) name, and the value being the input (or label) value.\n",
"\n",
"In addition to the examples, we also need to specify how each example should be formatted when it is inserted into the prompt. We can do this using the above `PromptTemplate`!"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "3eb36972",
"metadata": {},
"outputs": [],
"source": [
"# These are some examples of a pretend task of creating antonyms.\n",
"examples = [\n",
"    {\"input\": \"happy\", \"output\": \"sad\"},\n",
"    {\"input\": \"tall\", \"output\": \"short\"},\n",
"]\n",
"# This is how we specify how each example should be formatted.\n",
"example_prompt = PromptTemplate(\n",
"    input_variables=[\"input\", \"output\"],\n",
"    template=\"Input: {input}\\nOutput: {output}\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "80a91d96",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts.few_shot import FewShotPromptTemplate"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "7931e5f2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Give the antonym of every input\n",
"\n",
"Input: happy\n",
"Output: sad\n",
"\n",
"Input: tall\n",
"Output: short\n",
"\n",
"Input: big\n",
"Output:\n"
]
}
],
"source": [
"prompt_from_string_examples = FewShotPromptTemplate(\n",
"    # These are the examples we want to insert into the prompt.\n",
"    examples=examples,\n",
"    # This is how we want to format the examples when we insert them into the prompt.\n",
"    example_prompt=example_prompt,\n",
"    # The prefix is some text that goes before the examples in the prompt.\n",
"    # Usually, this consists of instructions.\n",
"    prefix=\"Give the antonym of every input\",\n",
"    # The suffix is some text that goes after the examples in the prompt.\n",
"    # Usually, this is where the user input will go.\n",
"    suffix=\"Input: {adjective}\\nOutput:\",\n",
"    # The input variables are the variables that the overall prompt expects.\n",
"    input_variables=[\"adjective\"],\n",
"    # The example_separator is the string we will use to join the prefix, examples, and suffix together.\n",
"    example_separator=\"\\n\\n\",\n",
")\n",
"print(prompt_from_string_examples.format(adjective=\"big\"))"
]
},
{
"cell_type": "markdown",
"id": "bf038596",
"metadata": {},
"source": [
"### ExampleSelector\n",
"\n",
"If you have a large number of examples, you may need to select which ones to include in the prompt. The ExampleSelector is the class responsible for doing so. The base interface is defined below.\n",
"\n",
"```python\n",
"class BaseExampleSelector(ABC):\n",
"    \"\"\"Interface for selecting examples to include in prompts.\"\"\"\n",
"\n",
"    @abstractmethod\n",
"    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:\n",
"        \"\"\"Select which examples to use based on the inputs.\"\"\"\n",
"```\n",
"\n",
"The only method it needs to expose is a `select_examples` method. This takes in the input variables and then returns a list of examples. It is up to each specific implementation as to how those examples are selected. Let's take a look at some below."
]
},
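{
"cell_type": "markdown",
"id": "custom-example-selector-note",
"metadata": {},
"source": [
"Before looking at the built-in selectors, here is a minimal sketch of what an implementation of this interface could look like. This is not a LangChain class, just an illustration: it ignores the inputs and always returns the first `k` examples. A selector you actually pass to a `FewShotPromptTemplate` would subclass `BaseExampleSelector` rather than being a standalone class like this."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "custom-example-selector-sketch",
"metadata": {},
"outputs": [],
"source": [
"from typing import Dict, List\n",
"\n",
"\n",
"class FirstKExampleSelector:\n",
"    \"\"\"Toy selector sketch: always returns the first k examples, ignoring the inputs.\"\"\"\n",
"\n",
"    def __init__(self, examples: List[dict], k: int = 1):\n",
"        self.examples = examples\n",
"        self.k = k\n",
"\n",
"    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:\n",
"        # A smarter selector would inspect input_variables here (e.g. by length or similarity).\n",
"        return self.examples[: self.k]\n",
"\n",
"\n",
"FirstKExampleSelector(examples, k=1).select_examples({\"adjective\": \"big\"})"
]
},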
{
"cell_type": "markdown",
"id": "861a4d1f",
"metadata": {},
"source": [
"### LengthBased ExampleSelector\n",
"\n",
"This ExampleSelector selects which examples to use based on length. This is useful when you are worried about constructing a prompt that will go over the length of the context window. For longer inputs, it will select fewer examples to include, while for shorter inputs it will select more.\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "7c469c95",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts.example_selector.length_based import LengthBasedExampleSelector"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "0ec6d950",
"metadata": {},
"outputs": [],
"source": [
"# These are a lot of examples of a pretend task of creating antonyms.\n",
"examples = [\n",
"    {\"input\": \"happy\", \"output\": \"sad\"},\n",
"    {\"input\": \"tall\", \"output\": \"short\"},\n",
"    {\"input\": \"energetic\", \"output\": \"lethargic\"},\n",
"    {\"input\": \"sunny\", \"output\": \"gloomy\"},\n",
"    {\"input\": \"windy\", \"output\": \"calm\"},\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "207e55f7",
"metadata": {},
"outputs": [],
"source": [
"example_selector = LengthBasedExampleSelector(\n",
"    # These are the examples it has available to choose from.\n",
"    examples=examples,\n",
"    # This is the PromptTemplate being used to format the examples.\n",
"    example_prompt=example_prompt,\n",
"    # This is the maximum length that the formatted examples should be.\n",
"    # Length is measured by the get_text_length function below.\n",
"    max_length=18,\n",
"    # This is the function used to get the length of a string, which is used\n",
"    # to determine which examples to include. It is commented out because\n",
"    # it is provided as a default value if none is specified.\n",
"    # get_text_length: Callable[[str], int] = lambda x: len(re.split(\"\\n| \", x))\n",
")\n",
"dynamic_prompt = FewShotPromptTemplate(\n",
"    # We provide an ExampleSelector instead of examples.\n",
"    example_selector=example_selector,\n",
"    example_prompt=example_prompt,\n",
"    prefix=\"Give the antonym of every input\",\n",
"    suffix=\"Input: {adjective}\\nOutput:\",\n",
"    input_variables=[\"adjective\"],\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "d00b4385",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Give the antonym of every input\n",
"\n",
"Input: happy\n",
"Output: sad\n",
"\n",
"Input: tall\n",
"Output: short\n",
"\n",
"Input: energetic\n",
"Output: lethargic\n",
"\n",
"Input: sunny\n",
"Output: gloomy\n",
"\n",
"Input: windy\n",
"Output: calm\n",
"\n",
"Input: big\n",
"Output:\n"
]
}
],
"source": [
"# An example with small input, so it selects all examples.\n",
"print(dynamic_prompt.format(adjective=\"big\"))"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "878bcde9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Give the antonym of every input\n",
"\n",
"Input: happy\n",
"Output: sad\n",
"\n",
"Input: big and huge and massive and large and gigantic and tall and bigger than everything else\n",
"Output:\n"
]
}
],
"source": [
"# An example with long input, so it selects only one example.\n",
"long_string = \"big and huge and massive and large and gigantic and tall and bigger than everything else\"\n",
"print(dynamic_prompt.format(adjective=long_string))"
]
},
{
"cell_type": "markdown",
"id": "2d007b0a",
"metadata": {},
"source": [
"### Similarity ExampleSelector\n",
"\n",
"The SemanticSimilarityExampleSelector selects examples based on which examples are most similar to the inputs. It does this by finding the examples with the embeddings that have the greatest cosine similarity with the inputs.\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "241bfe80",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts.example_selector.semantic_similarity import SemanticSimilarityExampleSelector\n",
"from langchain.vectorstores import FAISS\n",
"from langchain.embeddings import OpenAIEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "50d0a701",
"metadata": {},
"outputs": [],
"source": [
"example_selector = SemanticSimilarityExampleSelector.from_examples(\n",
"    # This is the list of examples available to select from.\n",
"    examples,\n",
"    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.\n",
"    OpenAIEmbeddings(),\n",
"    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.\n",
"    FAISS,\n",
"    # This is the number of examples to produce.\n",
"    k=1\n",
")\n",
"similar_prompt = FewShotPromptTemplate(\n",
"    # We provide an ExampleSelector instead of examples.\n",
"    example_selector=example_selector,\n",
"    example_prompt=example_prompt,\n",
"    prefix=\"Give the antonym of every input\",\n",
"    suffix=\"Input: {adjective}\\nOutput:\",\n",
"    input_variables=[\"adjective\"],\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "4c8fdf45",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Give the antonym of every input\n",
"\n",
"Input: happy\n",
"Output: sad\n",
"\n",
"Input: worried\n",
"Output:\n"
]
}
],
"source": [
"# Input is a feeling, so it should select the happy/sad example.\n",
"print(similar_prompt.format(adjective=\"worried\"))"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "829af21a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Give the antonym of every input\n",
"\n",
"Input: tall\n",
"Output: short\n",
"\n",
"Input: fat\n",
"Output:\n"
]
}
],
"source": [
"# Input is a measurement, so it should select the tall/short example.\n",
"print(similar_prompt.format(adjective=\"fat\"))"
]
},
{
"cell_type": "markdown",
"id": "dbc32551",
"metadata": {},
"source": [
"### Serialization\n",
"\n",
"PromptTemplates and examples can be serialized and loaded from disk, making it easy to share and store prompts. For a detailed walkthrough on how to do that, see [this notebook](prompt_serialization.ipynb)."
]
},
{
"cell_type": "markdown",
"id": "1e1e13c6",
"metadata": {},
"source": [
"### Customizability\n",
"\n",
"The above covers all the ways currently supported in LangChain to represent prompts and example selectors. However, due to the simple interface that the base classes (`BasePromptTemplate`, `BaseExampleSelector`) expose, it should be easy to subclass them and write your own implementation in your own codebase. And of course, if you'd like to contribute that back to LangChain, we'd love that :)"
]
},
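{
"cell_type": "markdown",
"id": "custom-prompt-template-note",
"metadata": {},
"source": [
"As a rough illustration of that idea, here is a minimal sketch (not part of LangChain) of a custom prompt template that follows the `BasePromptTemplate` interface from the top of this notebook: it exposes `input_variables` and a `format` method, but builds the prompt with its own logic instead of a single format string. A version you actually wire into LangChain would subclass `BasePromptTemplate` itself."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "custom-prompt-template-sketch",
"metadata": {},
"outputs": [],
"source": [
"from typing import Any, List\n",
"\n",
"\n",
"class BulletedQuestionPromptTemplate:\n",
"    \"\"\"Toy template sketch: turns a list of questions into a bulleted prompt.\"\"\"\n",
"\n",
"    input_variables: List[str] = [\"questions\"]\n",
"\n",
"    def format(self, **kwargs: Any) -> str:\n",
"        bullets = \"\\n\".join(f\"- {q}\" for q in kwargs[\"questions\"])\n",
"        return f\"Answer each of the following questions:\\n{bullets}\"\n",
"\n",
"\n",
"print(BulletedQuestionPromptTemplate().format(questions=[\"What is an antonym?\", \"What is a synonym?\"]))"
]
},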
{
"cell_type": "code",
"execution_count": null,
"id": "c746d6f4",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}