fixes in notebook

pull/10242/head
olgavrou 1 year ago
parent 5c2069890f
commit 5b6ebbc825

@@ -12,12 +12,12 @@
"\n",
"The example laid out below is trivial and a strong llm could make a good variable selection and injection without the intervention of this chain, but it is perfect for showcasing the chain's usage. Advanced options and explanations are provided at the end.\n",
"\n",
"The goal of the below scenario is for the chain to select a meal based on the user declared preferences, and inject the meal into the prompt template. The final prompt will then be sent to the llm of choice and the llm output will be returned to the user."
"The goal of this example scenario is for the chain to select a meal based on the user-declared preferences, and inject the meal into the prompt template. The final prompt will then be sent to the llm of choice, and the llm output will be returned to the user."
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
@@ -33,7 +33,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 37,
"metadata": {},
"outputs": [
{
@@ -42,7 +42,7 @@
"\"\\n\\nYes, I'm ready.\""
]
},
"execution_count": 2,
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
@@ -68,7 +68,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
@@ -81,13 +81,15 @@
"\n",
"Embed the meal into the given text: \"{text_to_personalize}\".\n",
"\n",
"Prepend a personalized message including the user's name {user} and their preference {preference}.\n",
"Prepend a personalized message including the user's name \"{user}\" \n",
" and their preference \"{preference}\".\n",
"\n",
"Make it sound good.\n",
"\"\"\"\n",
"\n",
"PROMPT = PromptTemplate(\n",
" input_variables=[\"meal\", \"text_to_personalize\", \"user\", \"preference\"], template=PROMPT_TEMPLATE\n",
" input_variables=[\"meal\", \"text_to_personalize\", \"user\", \"preference\"], \n",
" template=PROMPT_TEMPLATE\n",
")"
]
},
@@ -100,7 +102,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
@@ -118,7 +120,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
@@ -126,20 +128,21 @@
" meal = rl_chain.ToSelectFrom(meals),\n",
" user = rl_chain.BasedOn(\"Tom\"),\n",
" preference = rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
"    text_to_personalize = \"This is the week's specialty dish, our master chefs \\\n",
" believe you will love it!\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 41,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hey Tom! Our chefs have put together something special for you this week! We know you're a vegetarian who is ok with regular dairy, so they've crafted a delicious and unique Italian-Mexican fusion dish: Chicken Flatbreads with red sauce. We think you'll absolutely love it!\n"
"Hey Tom! We have an amazing special dish for you this week - veggie sweet potato quesadillas with vegan cheese, which we're sure you'll love as a vegetarian who's ok with regular dairy. Enjoy!\n"
]
}
],
@@ -169,18 +172,23 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 42,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tom, our chefs have crafted a delicious fusion dish that we think you'll love - Beef Enchiladas with Feta cheese - a Mexican-Greek fusion, and as a vegetarian who can tolerate regular dairy, it's the perfect treat for you!\n",
"Hey Tom! Our master chefs have outdone themselves this week with an amazing dish that you're sure to love. Our specialty dish is a Mexican-Greek fusion of Beef Enchiladas with Feta cheese, and it's perfectly suited for your Vegetarian preferences with regular dairy being ok. Enjoy!\n",
"Hey Tom, we have a special treat for you this week - veggie sweet potato quesadillas with vegan cheese! Our master chefs have put together something delicious and just perfect for your vegetarian preferences, with regular dairy ok as well. We hope you love it!\n",
"Hey Tom, we've got something special for you this week! Our master chefs have crafted delicious veggie sweet potato quesadillas with vegan cheese for our vegetarian friends, but regular dairy is ok too! Enjoy!\n",
"Hey Tom, we've got something special for you this week! Our master chefs have created a delicious veggie sweet potato quesadilla with vegan cheese - perfect for your vegetarian diet, with regular dairy also OK. Enjoy!\n"
"\"Hey Tom, our master chefs have prepared something special for you this week - a Mexican-Greek fusion of Beef Enchiladas with Feta cheese that is sure to tantalize your taste buds. Don't worry, we've got you covered with a vegetarian option and regular dairy is ok - so you can enjoy the delicious flavors without any worries!\"\n",
"\n",
"\"Hey Tom! Our master chefs have created a truly unique dish this week, perfect for you! Beef Enchiladas with Feta cheese - a delicious Mexican-Greek fusion - and made with vegetarian ingredients and regular dairy. We know you'll love it!\"\n",
"\n",
"Hey Tom, we have something special for you this week - our veggie sweet potato quesadillas with vegan cheese! We know you like vegetarian dishes and don't mind regular dairy, so we think you'll love this delicious meal.\n",
"\n",
"Hey Tom, we have the perfect dish for you this week! Our master chefs have crafted delicious veggie sweet potato quesadillas with vegan cheese, perfect for vegetarians and those who are okay with regular dairy. We guarantee that you will love it!\n",
"\n",
"Hey Tom! Our master chefs have crafted a delicious Veggie Sweet Potato Quesadillas with vegan cheese, specially designed with your Vegetarian preference in mind - they're sure you will love it! Enjoy this weeks specialty dish!\n",
"\n"
]
}
],
@@ -195,7 +203,8 @@
" )\n",
" except Exception as e:\n",
" print(e)\n",
" print(response[\"response\"])"
" print(response[\"response\"])\n",
" print()"
]
},
{
@@ -495,18 +504,18 @@
"source": [
"| Section | Description | Example / Usage |\n",
"|---------|-------------|-----------------|\n",
"| [**Set Chain Logging Level**](#set-chain-logging-level) | Set up the logging level for the RL chain. | `logger.setLevel(logging.INFO)` |\n",
"| [**Change Chain Logging Level**](#change-chain-logging-level) | Change the logging level for the RL chain. | `logger.setLevel(logging.INFO)` |\n",
"| [**Featurization**](#featurization) | Adjusts the input to the RL chain. Can set auto-embeddings ON for more complex embeddings. | `chain = rl_chain.PickBest.from_llm(auto_embed=True, [...])` |\n",
"| [**Learned Policy to Learn Asynchronously**](#learned-policy-to-learn-asynchronously) | Score asynchronously if user input is needed for scoring. | `chain.update_with_delayed_score(score=<the score>, chain_response=response)` |\n",
"| [**Store Progress of Learned Policy**](#store-progress-of-learned-policy) | Option to store the progress of the variable injection learned policy. | `chain.save_progress()` |\n",
"| [**Stop Learning of Learned Policy**](#stop-learning-of-learned-policy) | Toggle the RL chain's learned policy updates ON/OFF. | `chain.deactivate_selection_scorer()` |\n",
"| [**Set a Different Policy**](#set-a-different-policy) | Choose between different policies: default, random, or custom. | Custom policy creation at chain creation time. |\n",
"| [**Different Exploration Algorithms for Default Learned Policy**](#different-exploration-algorithms-for-the-default-learned-policy) | Set different exploration algorithms and hyperparameters for `VwPolicy`. | `vw_cmd = [\"--cb_explore_adf\", \"--quiet\", \"--squarecb\", \"--interactions=::\"]` |\n",
"| [**Learn Policy's Data Logs**](#learn-policys-data-logs) | Store and examine `VwPolicy`'s data logs. | `chain = rl_chain.PickBest.from_llm(vw_logs=<path to log FILE>, [...])` |\n",
"| [**Different Exploration Algorithms and Options for Default Learned Policy**](#different-exploration-algorithms-and-options-for-the-default-learned-policy) | Set different exploration algorithms and hyperparameters for `VwPolicy`. | `vw_cmd = [\"--cb_explore_adf\", \"--quiet\", \"--squarecb\", \"--interactions=::\"]` |\n",
"| [**Learn Policy's Data Logs**](#learned-policys-data-logs) | Store and examine `VwPolicy`'s data logs. | `chain = rl_chain.PickBest.from_llm(vw_logs=<path to log FILE>, [...])` |\n",
"| [**Other Advanced Featurization Options**](#other-advanced-featurization-options) | Specify advanced featurization options for the RL chain. | `age = rl_chain.BasedOn(\"age:32\")` |\n",
"| [**More Info on Auto or Custom SelectionScorer**](#more-info-on-auto-or-custom-selectionscorer) | Dive deeper into how selection scoring is determined. | `selection_scorer=rl_chain.AutoSelectionScorer(llm=llm, scoring_criteria_template_str=scoring_criteria_template)` |\n",
"\n",
"### set chain logging level\n",
"### change chain logging level\n",
"\n",
"```\n",
"import logging\n",
@@ -516,13 +525,15 @@
"\n",
"### featurization\n",
"\n",
"#### auto_embed\n",
"\n",
"By default, the input to the rl chain (`ToSelectFrom`, `BasedOn`) is not tampered with. This might not be sufficient featurization, so, depending on how complex the scenario is, you can set auto-embeddings to ON:\n",
"\n",
"`chain = rl_chain.PickBest.from_llm(auto_embed=True, [...])`\n",
"\n",
"This will produce more complex embeddings and featurizations of the inputs, likely accelerating RL chain learning, albeit at the cost of increased runtime..\n",
"This will produce more complex embeddings and featurizations of the inputs, likely accelerating RL chain learning, albeit at the cost of increased runtime.\n",
"\n",
"By default, [sbert.net's sentence_transformers's ](https://www.sbert.net/docs/pretrained_models.html#model-overview) `all-mpnet-base-v2` model will be used for these embeddings but you can set a different model by initializing the chain with it, or set an entirely different encoding object as long as it has an `encode` function that returns a list of the encodings:\n",
"By default, [sbert.net's sentence_transformers](https://www.sbert.net/docs/pretrained_models.html#model-overview) `all-mpnet-base-v2` model will be used for these embeddings, but you can set a different embeddings model by initializing the chain with it, as shown in this example. You could also set an entirely different embeddings encoding object, as long as it has an `encode()` function that returns a list of the encodings.\n",
"\n",
"```\n",
"from sentence_transformers import SentenceTransformer\n",
@@ -536,11 +547,15 @@
")\n",
"```\n",
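"For illustration, a minimal sketch of such an encoding object. `ToyHashEncoder` is hypothetical and its character-trigram hashing is a toy featurization, not a real embedding model; the only contract assumed from the text above is an `encode()` function that returns a list of encodings:\n",
"\n",
"```python\n",
"import hashlib\n",
"\n",
"class ToyHashEncoder:\n",
"    # Hypothetical stand-in for a sentence_transformers model: any object\n",
"    # with an encode() method returning one encoding per input string.\n",
"    def __init__(self, dims: int = 8):\n",
"        self.dims = dims\n",
"\n",
"    def encode(self, sentences):\n",
"        # Deterministic toy featurization: bucket character trigrams into\n",
"        # a fixed-length count vector per sentence (not a real embedding).\n",
"        encodings = []\n",
"        for s in sentences:\n",
"            vec = [0.0] * self.dims\n",
"            for i in range(len(s) - 2):\n",
"                h = int(hashlib.md5(s[i:i + 3].encode()).hexdigest(), 16)\n",
"                vec[h % self.dims] += 1.0\n",
"            encodings.append(vec)\n",
"        return encodings\n",
"\n",
"encoder = ToyHashEncoder()\n",
"vectors = encoder.encode([\"Beef Enchiladas\", \"Veggie sweet potato quesadillas\"])\n",
"```\n",
"\n",
"In this sketch the `ToyHashEncoder()` instance would be passed to the chain at creation time in place of the `SentenceTransformer` model shown above.\n",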
"\n",
"#### explicitly defined embeddings\n",
"\n",
"Another option is to define what inputs you think should be embedded manually:\n",
"- `auto_embed = False`\n",
"- Can wrap individual variables in `rl_chain.Embed()` or `rl_chain.EmbedAndKeep()` e.g. `user = rl_chain.BasedOn(rl_chain.Embed(\"Tom\"))`\n",
"\n",
"Final option is to define and set your own feature embedder that returns a valid input for the learned policy.\n",
"#### custom featurization\n",
"\n",
"A final option is to define and set a custom featurization/embedder class that returns a valid input for the learned policy.\n",
"\n",
"## learned policy to learn asynchronously\n",
"\n",
@@ -581,15 +596,15 @@
"\n",
"- custom policies: a custom policy could be created and set at chain creation time\n",
"\n",
"### different exploration algorithms for the default learned policy\n",
"### different exploration algorithms and options for the default learned policy\n",
"\n",
"The default `VwPolicy` is initialized with some default arguments. The default exploration algorithm is [SquareCB](https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Contextual-Bandit-Exploration-with-SquareCB) but other Contextual Bandit exploration algorithms can be set, and other hyper parameters can be set also:\n",
"The default `VwPolicy` is initialized with some default arguments. The default exploration algorithm is [SquareCB](https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Contextual-Bandit-Exploration-with-SquareCB) but other Contextual Bandit exploration algorithms can be set, and other hyperparameters can be tuned (see [here](https://vowpalwabbit.org/docs/vowpal_wabbit/python/9.6.0/command_line_args.html) for the available options).\n",
"\n",
"`vw_cmd = [\"--cb_explore_adf\", \"--quiet\", \"--squarecb\", \"--interactions=::\"]`\n",
"\n",
"`chain = rl_chain.PickBest.from_llm(vw_cmd = vw_cmd, [...])`\n",
"\n",
"### learn policy's data logs\n",
"### learned policy's data logs\n",
"\n",
"The `VwPolicy`'s data files can be stored and examined, or used to do [off policy evaluation](https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/tutorials/off_policy_evaluation.html) for hyperparameter tuning.\n",
"\n",
@@ -636,7 +651,7 @@
"\n",
"### More info on Auto or Custom SelectionScorer\n",
"\n",
"The selection scorer is very important to get right since the policy uses it to learn. It determines what is called the reward in reinforcement learning, and more specifically in our Contextual Bandits setting.\n",
"It is very important to get the selection scorer right since the policy uses it to learn. It determines what is called the reward in reinforcement learning, and more specifically in our Contextual Bandits setting.\n",
"\n",
"The general advice is to keep the score between [0, 1], 0 being the worst selection and 1 being the best selection from the available `ToSelectFrom` variables, based on the `BasedOn` variables, but it can be adjusted if the need arises.\n",
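"As a concrete (hypothetical) illustration of a score in [0, 1], here is a standalone helper. `keyword_score` and its keyword matching are assumptions for the sketch, not part of the rl_chain API; a real `SelectionScorer` would wrap logic like this:\n",
"\n",
"```python\n",
"def keyword_score(selected_meal: str, preferences: list) -> float:\n",
"    # Toy scorer: fraction of preference keywords found in the selected meal.\n",
"    # Returns 0.0 when no preference keyword matches the selection,\n",
"    # 1.0 when they all do.\n",
"    if not preferences:\n",
"        return 0.0\n",
"    meal = selected_meal.lower()\n",
"    hits = sum(1 for p in preferences if p.lower() in meal)\n",
"    return hits / len(preferences)\n",
"\n",
"score = keyword_score(\n",
"    \"Veggie sweet potato quesadillas with vegan cheese\",\n",
"    [\"veggie\", \"vegan\"],\n",
")\n",
"# score is 1.0 here: both keywords appear in the selection\n",
"```\n",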
"\n",
@@ -761,7 +776,13 @@
"import langchain\n",
"langchain.debug = True\n",
"\n",
"REWARD_PROMPT_TEMPLATE = \"\"\"Given {preference} rank how good or bad this selection is {meal}, IMPORANT: you MUST return a single number between -1 and 1, -1 being bad, 1 being good\"\"\"\n",
"REWARD_PROMPT_TEMPLATE = \"\"\"\n",
"\n",
"Given {preference} rank how good or bad this selection is {meal}\n",
"\n",
"IMPORTANT: you MUST return a single number between -1 and 1, -1 being bad, 1 being good\n",
"\n",
"\"\"\"\n",
"\n",
"\n",
"REWARD_PROMPT = PromptTemplate(\n",
