cr

Merge branch 'master' into harrison/prompts_take_2
update prompts
2022-11-14 22:54:12 -08:00 · 2022-11-14 22:43:06 -08:00 · 2022-11-14 20:58:34 -08:00
4 changed files with 520 additions and 6 deletions
--- a/docs/examples/prompts/walkthrough.ipynb
+++ b/docs/examples/prompts/walkthrough.ipynb
@@ -0,0 +1,410 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "43fb16cb",
+   "metadata": {},
+   "source": [
+    "# Prompt Walkthrough\n",
+    "\n",
+    "An overview of the different types of prompts in LangChain and how to use them"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cddb465e",
+   "metadata": {},
+   "source": [
+    "### Basic Prompt\n",
+    "\n",
+    "The simplest type of prompt: a string template that takes any number of input variables. The template should be written using Python f-string syntax, with each input variable in curly braces."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "094229f4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.prompts import Prompt"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "ab46bd2a",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'Tell me a joke.'"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# An example prompt with no input variables\n",
+    "no_input_prompt = Prompt(input_variables=[], template=\"Tell me a joke.\")\n",
+    "no_input_prompt.format()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "c3ad0fa8",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'Tell me a funny joke.'"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# An example prompt with one input variable\n",
+    "one_input_prompt = Prompt(input_variables=[\"adjective\"], template=\"Tell me a {adjective} joke.\")\n",
+    "one_input_prompt.format(adjective=\"funny\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "ba577dcf",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'Tell me a funny joke about chickens.'"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# An example prompt with multiple input variables\n",
+    "multiple_input_prompt = Prompt(input_variables=[\"adjective\", \"content\"], template=\"Tell me a {adjective} joke about {content}.\")\n",
+    "multiple_input_prompt.format(adjective=\"funny\", content=\"chickens\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d27b1824",
+   "metadata": {},
+   "source": [
+    "### Examples\n",
+    "Examples are data points that show the model how to produce results. They can be either strings, or dictionaries that are turned into strings by an example prompt."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "2c00e965",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "string_examples = [\"Input: happy\\nOutput: sad\", \"Input: tall\\nOutput: short\"]\n",
+    "dict_examples = [{\"input\": \"happy\", \"output\": \"sad\"}, {\"input\": \"tall\", \"output\": \"short\"}]\n",
+    "example_prompt = Prompt(input_variables=[\"input\",\"output\"], template=\"Input: {input}\\nOutput: {output}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1492b49d",
+   "metadata": {},
+   "source": [
+    "### Simple Prompt with examples\n",
+    "\n",
+    "We can then use these examples to construct prompts."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "1a5a686d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Give the antonym of every input\n",
+      "\n",
+      "Input: happy\n",
+      "Output: sad\n",
+      "\n",
+      "Input: tall\n",
+      "Output: short\n",
+      "\n",
+      "Input: big\n",
+      "Output:\n"
+     ]
+    }
+   ],
+   "source": [
+    "prompt_from_string_examples = Prompt.from_examples(\n",
+    "    string_examples, \n",
+    "    prefix=\"Give the antonym of every input\",\n",
+    "    suffix=\"Input: {adjective}\\nOutput:\", \n",
+    "    input_variables=[\"adjective\"],\n",
+    ")\n",
+    "print(prompt_from_string_examples.format(adjective=\"big\"))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "7931e5f2",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Give the antonym of every input\n",
+      "\n",
+      "Input: happy\n",
+      "Output: sad\n",
+      "\n",
+      "Input: tall\n",
+      "Output: short\n",
+      "\n",
+      "Input: big\n",
+      "Output:\n"
+     ]
+    }
+   ],
+   "source": [
+    "prompt_from_structured_examples = Prompt.from_structured_examples(\n",
+    "    dict_examples,\n",
+    "    example_prompt,\n",
+    "    prefix=\"Give the antonym of every input\",\n",
+    "    suffix=\"Input: {adjective}\\nOutput:\", \n",
+    "    input_variables=[\"adjective\"],\n",
+    ")\n",
+    "print(prompt_from_structured_examples.format(adjective=\"big\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "861a4d1f",
+   "metadata": {},
+   "source": [
+    "### Dynamic Prompt\n",
+    "\n",
+    "We can also do more clever things with prompts, for example selecting only a certain number of examples in order to limit the size of the text passed in. The number of examples selected varies with the length of the input text."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "7c469c95",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.prompts import DynamicPrompt"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "207e55f7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dynamic_prompt = DynamicPrompt.from_structured_examples(\n",
+    "    dict_examples,\n",
+    "    example_prompt,\n",
+    "    prefix=\"Give the antonym of every input\",\n",
+    "    suffix=\"Input: {adjective}\\nOutput:\", \n",
+    "    input_variables=[\"adjective\"],\n",
+    "    max_length=20,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "id": "d00b4385",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Give the antonym of every input\n",
+      "\n",
+      "Input: happy\n",
+      "Output: sad\n",
+      "\n",
+      "Input: tall\n",
+      "Output: short\n",
+      "\n",
+      "Input: big\n",
+      "Output:\n"
+     ]
+    }
+   ],
+   "source": [
+    "# An example with small input, so it selects both examples.\n",
+    "print(dynamic_prompt.format(adjective=\"big\"))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "id": "878bcde9",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Give the antonym of every input\n",
+      "\n",
+      "Input: happy\n",
+      "Output: sad\n",
+      "\n",
+      "Input: big and huge and massive\n",
+      "Output:\n"
+     ]
+    }
+   ],
+   "source": [
+    "# An example with long input, so it selects only one example.\n",
+    "print(dynamic_prompt.format(adjective=\"big and huge and massive\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2d007b0a",
+   "metadata": {},
+   "source": [
+    "### Optimized Prompt\n",
+    "\n",
+    "Besides selecting a variable number of examples to show, we can also select the examples that most closely match the user input. This is done by creating an embedding of the user input and comparing it to the embeddings of the examples."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "id": "241bfe80",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.prompts.optimized import OptimizedPrompt\n",
+    "from langchain.vectorstores import FAISS\n",
+    "from langchain.embeddings import OpenAIEmbeddings"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "id": "50d0a701",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "optimized_prompt = OptimizedPrompt.from_structured_examples(\n",
+    "    dict_examples,\n",
+    "    example_prompt,\n",
+    "    prefix=\"Give the antonym of every input\",\n",
+    "    suffix=\"Input: {adjective}\\nOutput:\", \n",
+    "    input_variables=[\"adjective\"],\n",
+    "    embeddings=OpenAIEmbeddings(),\n",
+    "    vectorstore_cls=FAISS\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "id": "4c8fdf45",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Give the antonym of every input\n",
+      "\n",
+      "Input: happy\n",
+      "Output: sad\n",
+      "\n",
+      "Input: worried\n",
+      "Output:\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Input is a feeling, so should select the happy/sad example\n",
+    "print(optimized_prompt.format(adjective=\"worried\", k=1))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "id": "829af21a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Give the antonym of every input\n",
+      "\n",
+      "Input: tall\n",
+      "Output: short\n",
+      "\n",
+      "Input: fat\n",
+      "Output:\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Input is a measurement, so should select the tall/short example\n",
+    "print(optimized_prompt.format(adjective=\"fat\", k=1))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "76a1065d",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
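The length-based selection shown in the "Dynamic Prompt" cells can be approximated with a short standalone sketch. This is not the `DynamicPrompt` implementation: the function name `format_with_length_limit`, the whitespace-based `word_count`, and the "drop trailing examples until the text fits" loop are assumptions made for illustration, and `str.format` stands in for the library's formatter.

```python
from typing import Callable, List


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed length measure)."""
    return len(text.split())


def format_with_length_limit(
    examples: List[str],
    prefix: str,
    suffix: str,
    max_length: int,
    example_separator: str = "\n\n",
    get_text_length: Callable[[str], int] = word_count,
    **kwargs: str,
) -> str:
    """Drop trailing examples until the formatted prompt fits max_length."""
    selected = list(examples)
    while True:
        # Rebuild the full prompt with the examples still selected.
        template = example_separator.join([prefix, *selected, suffix])
        text = template.format(**kwargs)
        # Stop once the prompt fits, or when there is nothing left to drop.
        if get_text_length(text) <= max_length or not selected:
            return text
        selected.pop()
```

With the notebook's two string examples and `max_length=20`, a short input such as `"big"` keeps both examples under this sketch, while `"big and huge and massive"` pushes the word count past the limit and drops the last one.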
--- a/langchain/prompts/dynamic.py
+++ b/langchain/prompts/dynamic.py
@@ -5,6 +5,7 @@ from typing import Any, Callable, Dict, List
 from pydantic import BaseModel, Extra, root_validator

 from langchain.prompts.base import DEFAULT_FORMATTER_MAPPING, BasePrompt
+from langchain.prompts.prompt import Prompt


 class DynamicPrompt(BaseModel, BasePrompt):
@@ -18,7 +19,7 @@ class DynamicPrompt(BaseModel, BasePrompt):
                examples=["Say hi. Hi", "Say ho. Ho"],
                example_separator="\n\n",
                prefix="",
-                suffix="\n\nSay {foo}"
+                suffix="Say {foo}"
                input_variables=["foo"],
                max_length=200,
                get_text_length=word_count
@@ -110,3 +111,20 @@ class DynamicPrompt(BaseModel, BasePrompt):
        except KeyError:
            raise ValueError("Invalid prompt schema.")
        return values
+
+    @classmethod
+    def from_structured_examples(
+        cls, examples: List[dict], example_prompt: Prompt, **kwargs: Any
+    ) -> "DynamicPrompt":
+        """Create prompt from structured examples.
+
+        Args:
+            examples: List of structured examples to use in the prompt.
+            example_prompt: Prompt used to format the examples.
+            **kwargs: Keyword arguments to be passed through to init.
+
+        Returns:
+            The final prompt generated.
+        """
+        string_examples = [example_prompt.format(**example) for example in examples]
+        return cls(examples=string_examples, **kwargs)
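The `from_structured_examples` pattern above (format each dict through an example prompt, then hand the resulting strings to the string-based constructor) can be sketched without the library. In this sketch `str.format` stands in for `Prompt.format`, and the helper names `format_structured_examples` and `build_few_shot_template` are hypothetical; the join order follows the `example_separator.join([prefix, *examples, suffix])` line visible in `prompt.py`.

```python
from typing import Dict, List


def format_structured_examples(
    examples: List[Dict[str, str]], example_template: str
) -> List[str]:
    """Render each example dict through the template (stand-in for Prompt.format)."""
    return [example_template.format(**example) for example in examples]


def build_few_shot_template(
    string_examples: List[str],
    suffix: str,
    prefix: str = "",
    example_separator: str = "\n\n",
) -> str:
    """Join prefix, rendered examples, and suffix into one template string."""
    return example_separator.join([prefix, *string_examples, suffix])


dict_examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
]
string_examples = format_structured_examples(
    dict_examples, "Input: {input}\nOutput: {output}"
)
template = build_few_shot_template(
    string_examples,
    suffix="Input: {adjective}\nOutput:",
    prefix="Give the antonym of every input",
)
prompt_text = template.format(adjective="big")
```

One caveat of formatting examples first and templating second: an example value containing literal curly braces would be interpreted as a placeholder in the final `format` call, so such values would need escaping.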
--- a/langchain/prompts/optimized.py
+++ b/langchain/prompts/optimized.py
@@ -1,11 +1,12 @@
 """Optimized prompt schema definition."""
 import re
-from typing import Any, Callable, Dict, List
+from typing import Any, Callable, Dict, List, Optional

 from pydantic import BaseModel, Extra, root_validator

 from langchain.embeddings.base import Embeddings
 from langchain.prompts.base import DEFAULT_FORMATTER_MAPPING
+from langchain.prompts.prompt import Prompt
 from langchain.vectorstores.base import VectorStore


@@ -28,6 +29,9 @@ class OptimizedPrompt(BaseModel):
            )
    """

+    vectorstore: VectorStore
+    """Vectorstore to use for storing the embeddings."""
+
    example_separator: str = "\n\n"
    """Example separator, e.g. \n\n, for the dynamic prompt creation."""

@@ -49,9 +53,6 @@ class OptimizedPrompt(BaseModel):
    max_length: int = 2048
    """Max length for the prompt, beyond which examples are cut."""

-    vectorstore: VectorStore
-    """Vectorstore to use for storing the embeddings."""
-
    class Config:
        """Configuration for this pydantic object."""

@@ -154,8 +155,65 @@ class OptimizedPrompt(BaseModel):
        Returns:
            The OptimizedPrompt instantiated, backed by a vector store.
        """
+        dict_examples = [{"text": example} for example in examples]
+        example_prompt = Prompt(input_variables=["text"], template="{text}")
+        return cls.from_structured_examples(
+            dict_examples,
+            example_prompt,
+            suffix,
+            input_variables,
+            embeddings,
+            vectorstore_cls=vectorstore_cls,
+            example_separator=example_separator,
+            prefix=prefix,
+            **vectorstore_cls_kwargs,
+        )
+
+    @classmethod
+    def from_structured_examples(
+        cls,
+        examples: List[dict],
+        example_prompt: Prompt,
+        suffix: str,
+        input_variables: List[str],
+        embeddings: Embeddings,
+        vectorstore_cls: VectorStore,
+        example_separator: str = "\n\n",
+        prefix: str = "",
+        example_key: Optional[str] = None,
+        **vectorstore_cls_kwargs: Any,
+    ) -> "OptimizedPrompt":
+        """Create k-shot prompt optimizer using example list and embeddings.
+
+        Reshuffles examples for the prompt dynamically based on query similarity.
+
+        Args:
+            examples: List of structured examples to use in the prompt.
+            example_prompt: Prompt used to format the examples.
+            suffix: String to go after the list of examples. Should generally
+                set up the user's input.
+            input_variables: A list of variable names the final prompt template
+                will expect.
+            embeddings: An initialized embedding API interface, e.g. OpenAIEmbeddings().
+            vectorstore_cls: A vector store DB interface class, e.g. FAISS.
+            example_separator: The separator to use in between examples. Defaults
+                to two new line characters.
+            prefix: String that should go before any examples. Generally includes
+                instructions. Defaults to an empty string.
+            example_key: Optional string pointing to the key in the example to
+                be vectorized. If None, will format the example in the example_prompt,
+                and then vectorize that whole string.
+            vectorstore_cls_kwargs: optional kwargs containing url for vector store
+
+        Returns:
+            The OptimizedPrompt instantiated, backed by a vector store.
+        """
+        if example_key is None:
+            string_examples = [example_prompt.format(**example) for example in examples]
+        else:
+            string_examples = [example[example_key] for example in examples]
        vectorstore = vectorstore_cls.from_texts(
-            examples, embeddings, **vectorstore_cls_kwargs
+            string_examples, embeddings, **vectorstore_cls_kwargs
        )
        return cls(
            suffix=suffix,
--- a/langchain/prompts/prompt.py
+++ b/langchain/prompts/prompt.py
@@ -97,6 +97,34 @@ class Prompt(BaseModel, BasePrompt):
        template = example_separator.join([prefix, *examples, suffix])
        return cls(input_variables=input_variables, template=template)

+    @classmethod
+    def from_structured_examples(
+        cls,
+        examples: List[dict],
+        example_prompt: "Prompt",
+        suffix: str,
+        input_variables: List[str],
+        **kwargs: Any,
+    ) -> "Prompt":
+        """Take examples in list format with prefix and suffix to create a prompt.
+
+        Intended to be used as a way to dynamically create a prompt from examples.
+
+        Args:
+            examples: List of structured examples to use in the prompt.
+            example_prompt: Prompt used to format each example.
+            suffix: String to go after the list of examples. Should generally
+                set up the user's input.
+            input_variables: A list of variable names the final prompt template
+                will expect.
+            **kwargs: Keyword arguments to be passed through to init.
+
+        Returns:
+            The final prompt generated.
+        """
+        string_examples = [example_prompt.format(**example) for example in examples]
+        return cls.from_examples(string_examples, suffix, input_variables, **kwargs)
+
    @classmethod
    def from_file(cls, template_file: str, input_variables: List[str]) -> "Prompt":
        """Load a prompt from a file.
Author	SHA1	Message	Date
Harrison Chase	0d0d3f122a	cr	2022-11-14 22:54:12 -08:00
Harrison Chase	bf3a9973f0	Merge branch 'master' into harrison/prompts_take_2	2022-11-14 22:43:06 -08:00
Harrison Chase	c28d5ec3ba	update prompts	2022-11-14 20:58:34 -08:00
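
The `example_key` branch added to `OptimizedPrompt.from_structured_examples` decides which text is embedded for each example. A minimal sketch of that choice follows, with `str.format` standing in for `Prompt.format`; the function name `texts_to_embed` is hypothetical and no vector store is involved.

```python
from typing import Dict, List, Optional


def texts_to_embed(
    examples: List[Dict[str, str]],
    example_template: str,
    example_key: Optional[str] = None,
) -> List[str]:
    """Pick the text to vectorize for each structured example."""
    if example_key is None:
        # No key given: embed the whole formatted example string.
        return [example_template.format(**example) for example in examples]
    # Key given: embed only that field of each example.
    return [example[example_key] for example in examples]


dict_examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
]
whole = texts_to_embed(dict_examples, "Input: {input}\nOutput: {output}")
inputs_only = texts_to_embed(
    dict_examples, "Input: {input}\nOutput: {output}", example_key="input"
)
```

Embedding only the `input` field may match user queries more closely, since the query itself has no output yet, but that is a design judgment rather than something the diff states.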