{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "bf038596",
   "metadata": {},
   "source": [
    "# Example Selectors\n",
    "If you have a large number of examples, you may need to select which ones to include in the prompt. The ExampleSelector is the class responsible for doing so. The base interface is defined as below.\n",
    "\n",
    "```python\n",
    "class BaseExampleSelector(ABC):\n",
    "    \"\"\"Interface for selecting examples to include in prompts.\"\"\"\n",
    "\n",
    "    @abstractmethod\n",
    "    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:\n",
    "        \"\"\"Select which examples to use based on the inputs.\"\"\"\n",
    "\n",
    "```\n",
    "\n",
    "The only method it needs to expose is a `select_examples` method. This takes in the input variables and then returns a list of examples. It is up to each specific implementation as to how those examples are selected. Let's take a look at some below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8244ff60",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.prompts import FewShotPromptTemplate"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "861a4d1f",
   "metadata": {},
   "source": [
    "## LengthBased ExampleSelector\n",
    "\n",
    "This ExampleSelector selects which examples to use based on length. This is useful when you are worried about constructing a prompt that will go over the length of the context window. For longer inputs, it will select fewer examples to include, while for shorter inputs it will select more.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "7c469c95",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.prompts.example_selector import LengthBasedExampleSelector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "0ec6d950",
   "metadata": {},
   "outputs": [],
   "source": [
    "# These are a lot of examples of a pretend task of creating antonyms.\n",
    "examples = [\n",
    "    {\"input\": \"happy\", \"output\": \"sad\"},\n",
    "    {\"input\": \"tall\", \"output\": \"short\"},\n",
    "    {\"input\": \"energetic\", \"output\": \"lethargic\"},\n",
    "    {\"input\": \"sunny\", \"output\": \"gloomy\"},\n",
    "    {\"input\": \"windy\", \"output\": \"calm\"},\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "207e55f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "example_selector = LengthBasedExampleSelector(\n",
    "    # These are the examples is has available to choose from.\n",
    "    examples=examples, \n",
    "    # This is the PromptTemplate being used to format the examples.\n",
    "    example_prompt=example_prompt, \n",
    "    # This is the maximum length that the formatted examples should be.\n",
    "    # Length is measured by the get_text_length function below.\n",
    "    max_length=25,\n",
    "    # This is the function used to get the length of a string, which is used\n",
    "    # to determine which examples to include. It is commented out because\n",
    "    # it is provided as a default value if none is specified.\n",
    "    # get_text_length: Callable[[str], int] = lambda x: len(re.split(\"\\n| \", x))\n",
    ")\n",
    "dynamic_prompt = FewShotPromptTemplate(\n",
    "    # We provide an ExampleSelector instead of examples.\n",
    "    example_selector=example_selector,\n",
    "    example_prompt=example_prompt,\n",
    "    prefix=\"Give the antonym of every input\",\n",
    "    suffix=\"Input: {adjective}\\nOutput:\", \n",
    "    input_variables=[\"adjective\"],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "d00b4385",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Give the antonym of every input\n",
      "\n",
      "Input: happy\n",
      "Output: sad\n",
      "\n",
      "Input: tall\n",
      "Output: short\n",
      "\n",
      "Input: energetic\n",
      "Output: lethargic\n",
      "\n",
      "Input: sunny\n",
      "Output: gloomy\n",
      "\n",
      "Input: windy\n",
      "Output: calm\n",
      "\n",
      "Input: big\n",
      "Output:\n"
     ]
    }
   ],
   "source": [
    "# An example with small input, so it selects all examples.\n",
    "print(dynamic_prompt.format(adjective=\"big\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "878bcde9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Give the antonym of every input\n",
      "\n",
      "Input: happy\n",
      "Output: sad\n",
      "\n",
      "Input: big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else\n",
      "Output:\n"
     ]
    }
   ],
   "source": [
    "# An example with long input, so it selects only one example.\n",
    "long_string = \"big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else\"\n",
    "print(dynamic_prompt.format(adjective=long_string))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "e4bebcd9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Give the antonym of every input\n",
      "\n",
      "Input: happy\n",
      "Output: sad\n",
      "\n",
      "Input: tall\n",
      "Output: short\n",
      "\n",
      "Input: energetic\n",
      "Output: lethargic\n",
      "\n",
      "Input: sunny\n",
      "Output: gloomy\n",
      "\n",
      "Input: windy\n",
      "Output: calm\n",
      "\n",
      "Input: big\n",
      "Output: small\n",
      "\n",
      "Input: enthusiastic\n",
      "Output:\n"
     ]
    }
   ],
   "source": [
    "# You can add an example to an example selector as well.\n",
    "new_example = {\"input\": \"big\", \"output\": \"small\"}\n",
    "dynamic_prompt.example_selector.add_example(new_example)\n",
    "print(dynamic_prompt.format(adjective=\"enthusiastic\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d007b0a",
   "metadata": {},
   "source": [
    "## Similarity ExampleSelector\n",
    "\n",
    "The SemanticSimilarityExampleSelector selects examples based on which examples are most similar to the inputs. It does this by finding the examples with the embeddings that have the greatest cosine similarity with the inputs.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "241bfe80",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.prompts.example_selector import SemanticSimilarityExampleSelector\n",
    "from langchain.vectorstores import FAISS\n",
    "from langchain.embeddings import OpenAIEmbeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "50d0a701",
   "metadata": {},
   "outputs": [],
   "source": [
    "example_selector = SemanticSimilarityExampleSelector.from_examples(\n",
    "    # This is the list of examples available to select from.\n",
    "    examples, \n",
    "    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.\n",
    "    OpenAIEmbeddings(), \n",
    "    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.\n",
    "    FAISS, \n",
    "    # This is the number of examples to produce.\n",
    "    k=1\n",
    ")\n",
    "similar_prompt = FewShotPromptTemplate(\n",
    "    # We provide an ExampleSelector instead of examples.\n",
    "    example_selector=example_selector,\n",
    "    example_prompt=example_prompt,\n",
    "    prefix=\"Give the antonym of every input\",\n",
    "    suffix=\"Input: {adjective}\\nOutput:\", \n",
    "    input_variables=[\"adjective\"],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "4c8fdf45",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Give the antonym of every input\n",
      "\n",
      "Input: happy\n",
      "Output: sad\n",
      "\n",
      "Input: worried\n",
      "Output:\n"
     ]
    }
   ],
   "source": [
    "# Input is a feeling, so should select the happy/sad example\n",
    "print(similar_prompt.format(adjective=\"worried\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "829af21a",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Give the antonym of every input\n",
      "\n",
      "Input: happy\n",
      "Output: sad\n",
      "\n",
      "Input: fat\n",
      "Output:\n"
     ]
    }
   ],
   "source": [
    "# Input is a measurement, so should select the tall/short example\n",
    "print(similar_prompt.format(adjective=\"fat\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "3c16fe23",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Give the antonym of every input\n",
      "\n",
      "Input: happy\n",
      "Output: sad\n",
      "\n",
      "Input: joyful\n",
      "Output:\n"
     ]
    }
   ],
   "source": [
    "# You can add new examples to the SemanticSimilarityExampleSelector as well\n",
    "similar_prompt.example_selector.add_example({\"input\": \"enthusiastic\", \"output\": \"apathetic\"})\n",
    "print(similar_prompt.format(adjective=\"joyful\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc35afd0",
   "metadata": {},
   "source": [
    "## Maximal Marginal Relevance ExampleSelector\n",
    "\n",
    "The MaxMarginalRelevanceExampleSelector selects examples based on a combination of which examples are most similar to the inputs, while also optimizing for diversity. It does this by finding the examples with the embeddings that have the greatest cosine similarity with the inputs, and then iteratively adding them while penalizing them for closeness to already selected examples.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "ac95c968",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.prompts.example_selector import MaxMarginalRelevanceExampleSelector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "db579bea",
   "metadata": {},
   "outputs": [],
   "source": [
    "example_selector = MaxMarginalRelevanceExampleSelector.from_examples(\n",
    "    # This is the list of examples available to select from.\n",
    "    examples, \n",
    "    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.\n",
    "    OpenAIEmbeddings(), \n",
    "    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.\n",
    "    FAISS, \n",
    "    # This is the number of examples to produce.\n",
    "    k=2\n",
    ")\n",
    "mmr_prompt = FewShotPromptTemplate(\n",
    "    # We provide an ExampleSelector instead of examples.\n",
    "    example_selector=example_selector,\n",
    "    example_prompt=example_prompt,\n",
    "    prefix=\"Give the antonym of every input\",\n",
    "    suffix=\"Input: {adjective}\\nOutput:\", \n",
    "    input_variables=[\"adjective\"],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "cd76e344",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Give the antonym of every input\n",
      "\n",
      "Input: happy\n",
      "Output: sad\n",
      "\n",
      "Input: windy\n",
      "Output: calm\n",
      "\n",
      "Input: worried\n",
      "Output:\n"
     ]
    }
   ],
   "source": [
    "# Input is a feeling, so should select the happy/sad example as the first one\n",
    "print(mmr_prompt.format(adjective=\"worried\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "cf82956b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Give the antonym of every input\n",
      "\n",
      "Input: happy\n",
      "Output: sad\n",
      "\n",
      "Input: enthusiastic\n",
      "Output: apathetic\n",
      "\n",
      "Input: worried\n",
      "Output:\n"
     ]
    }
   ],
   "source": [
    "# Let's compare this to what we would just get if we went solely off of similarity\n",
    "similar_prompt.example_selector.k = 2\n",
    "print(similar_prompt.format(adjective=\"worried\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c746d6f4",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}