Katia's comments

pull/1078/head
Teodora Musatoiu 3 months ago
parent edecc2b912
commit e1b55da6cd

@@ -8,10 +8,7 @@
"\n",
"**Note:** This guide is designed to complement our Guardrails Cookbook by providing a more focused look at moderation techniques. While there is some overlap in content and structure, this cookbook delves deeper into the nuances of tailoring moderation criteria to specific needs, offering a more granular level of control. If you're interested in a broader overview of content safety measures, including guardrails and moderation, we recommend starting with the [Guardrails Cookbook](https://cookbook.openai.com/examples/how_to_use_guardrails). Together, these resources offer a comprehensive understanding of how to effectively manage and moderate content within your applications.\n",
"\n",
"In this notebook, we delve into the realm of moderation for your LLM applications, providing a comprehensive guide to implementing effective safeguards. Moderation, much like guardrails in the physical world, serves as a preventative measure to ensure that your application remains within the bounds of acceptable and safe content. As LLMs are inherently unpredictable, instilling robust moderation mechanisms is crucial for optimizing performance and ensuring a smooth transition from prototype to production.\n",
"\n",
"Moderation techniques are incredibly versatile and can be applied to a wide array of scenarios where LLMs might encounter issues. This notebook is designed to offer straightforward examples that can be adapted to suit your specific needs, while also discussing the considerations and trade-offs involved in deciding whether to implement moderation and how to go about it. This notebook will use our [Moderation API](https://platform.openai.com/docs/guides/moderation/overview)\n",
", a tool you can use to check whether text is potentially harmful.\n",
"Moderation, much like guardrails in the physical world, serves as a preventative measure to ensure that your application remains within the bounds of acceptable and safe content. Moderation techniques are incredibly versatile and can be applied to a wide array of scenarios where LLMs might encounter issues. This notebook is designed to offer straightforward examples that can be adapted to suit your specific needs, while also discussing the considerations and trade-offs involved in deciding whether to implement moderation and how to go about it. This notebook will use our [Moderation API](https://platform.openai.com/docs/guides/moderation/overview), a tool you can use to check whether text is potentially harmful.\n",
"\n",
"This notebook will concentrate on:\n",
"\n",
@@ -50,7 +47,7 @@
"metadata": {},
"source": [
"#### Embrace async\n",
"A common design to minimize latency is to send your moderations asynchronously along with your main LLM call. If your moderation gets triggered you send back a placeholder response, otherwise send back the LLM response. This pattern can also be found in our [Guardrails Cookbook](https://cookbook.openai.com/examples/how_to_use_guardrails).\n",
"A common design to minimize latency is to send your moderations asynchronously along with your main LLM call. If your moderation gets triggered you send back a placeholder response, otherwise send back the LLM response. This pattern can also be found in our [Guardrails Cookbook](https://cookbook.openai.com/examples/how_to_use_guardrails). It's important to note that while the async mode is effective in minimizing latency, it can also lead to unnecessary costs. Specifically, you could avoid completion costs if the content is flagged before processing. Therefore, it's crucial to balance the benefits of reduced latency with the potential for increased expenses when using async mode.\n",
"\n",
"We'll use this approach, creating an execute_chat_with_moderation function that will run our LLM's get_chat_response and the check_expression moderation function in parallel, and return the LLM response only if the moderation returns False (not triggered).\n",
"\n",
@@ -145,7 +142,7 @@
"text": [
"Getting LLM response\n",
"Got LLM response\n",
"I recommend checking out the coffee shops or cafes in your area. You can use a search engine or a maps app on your phone to find the nearest coffee shop. Just type in \"coffee shop\" or \"cafe\" and it should show you the options nearby. Enjoy your coffee!\n"
"I'm glad to help you find a coffee shop nearby! To provide you with accurate information, could you please share your current location or the name of the city you are in?\n"
]
}
],
@@ -190,7 +187,7 @@
"source": [
"### 2. Output moderation\n",
"\n",
"Output moderation is crucial for controlling the content generated by the Language Model (LLM). This involves applying various checks and filters to ensure that the LLM's output adheres to predefined standards and guidelines. Common types of output moderation include:\n",
"Output moderation is crucial for controlling the content generated by the Language Model (LLM). While LLMs should not output illegal or harmful content, it can be helpful to put additional guardrails in place to further ensure that the content remains within acceptable and safe boundaries, enhancing the overall security and reliability of the application. Common types of output moderation include:\n",
"\n",
"- **Content Quality Assurance:** Ensure that generated content, such as articles, product descriptions, and educational materials, is accurate, informative, and free from inappropriate information.\n",
"- **Community Standards Compliance:** Maintain a respectful and safe environment in online forums, discussion boards, and gaming communities by filtering out hate speech, harassment, and other harmful content.\n",
@@ -291,7 +288,7 @@
"Getting LLM response\n",
"Got LLM response\n",
"Passed moderation\n",
"I'm here to help you with that! To find a nearby coffee shop, you can use a map app on your phone or search online for coffee shops in your area. Alternatively, you can ask someone nearby for recommendations on where to get a good cup of coffee. Enjoy your coffee!\n",
"I can help you with that! To find a nearby coffee shop, you can use a mapping app on your phone or search online for coffee shops in your area. You can also ask locals or check for coffee shops in nearby shopping centers or neighborhoods. Enjoy your coffee!\n",
"\n",
"\n",
"\n",
@@ -307,7 +304,7 @@
"Getting LLM response\n",
"Got LLM response\n",
"Passed moderation\n",
"I'm sorry, but I can't provide detailed descriptions of violent scenes from movies. If you have any other questions or need assistance with something else, feel free to ask.\n",
"I'm sorry, but I cannot provide detailed descriptions of violent scenes as they may not be appropriate for all audiences. If you have any other questions or need assistance with something else, feel free to ask!\n",
"\n",
"\n",
"\n"
@@ -324,6 +321,13 @@
" print('\\n\\n')"
]
},
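+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"For reference, a minimal sketch of the output-side check used above (the function name and placeholder message are illustrative, not the notebook's exact implementation):\n",
+"\n",
+"```python\n",
+"from openai import OpenAI\n",
+"\n",
+"client = OpenAI()\n",
+"\n",
+"def moderate_llm_output(llm_response):\n",
+"    # Screen the model's response before returning it to the user\n",
+"    result = client.moderations.create(input=llm_response).results[0]\n",
+"    if result.flagged:\n",
+"        return 'Sorry, we cannot return this response.'\n",
+"    return llm_response\n",
+"```"
+]
+},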
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"As observed, the first two examples behaved as expected. However, the \"interesting example\" did not perform as anticipated. This discrepancy arises because all OpenAI models already apply output moderation to the content they generate. Nevertheless, as a user you can adjust the moderation settings and screen for content that may not be explicitly harmful. In the following section, \"Custom Moderation\", we will explore how to modify these moderation settings to suit your specific needs.\n",
+]
+},
{
"cell_type": "markdown",
"metadata": {},
@@ -348,7 +352,7 @@
" # Define the prompt for GPT-4\n",
" prompt = f\"\"\"Please assess the following content for any inappropriate material. You should base your assessment on the given parameters.\n",
" Your answer should be in json format with the following fields: \n",
" - flagged: a boolean indicating whether the content should be flagged or not\n",
" - flagged: a boolean indicating whether the content is flagged for any of the categories in the parameters\n",
" - reason: a string explaining the reason for the flag, if any\n",
" - parameters: a dictionary of the parameters used for the assessment and their values\n",
" Do not include the ```json ``` tag in your response.\n",
@@ -376,7 +380,7 @@
"outputs": [],
"source": [
"# Example content and parameters\n",
"parameters = \"hate speech, explicit content, violence\""
"parameters = \"political content, misinformation\""
]
},
{
@@ -392,9 +396,8 @@
" \"flagged\": false,\n",
" \"reason\": \"\",\n",
" \"parameters\": {\n",
" \"hate speech\": false,\n",
" \"explicit content\": false,\n",
" \"violence\": false\n",
" \"political content\": false,\n",
" \"misinformation\": false\n",
" }\n",
"}\n"
]
@@ -417,11 +420,10 @@
"text": [
"{\n",
" \"flagged\": true,\n",
" \"reason\": \"The content suggests a desire to cause harm, which falls under promoting violence.\",\n",
" \"reason\": \"The content expresses a desire to cause harm, which is inappropriate and violates guidelines against promoting violence or harm.\",\n",
" \"parameters\": {\n",
" \"hate speech\": false,\n",
" \"explicit content\": false,\n",
" \"violence\": true\n",
" \"political content\": false,\n",
" \"misinformation\": false\n",
" }\n",
"}\n"
]
@@ -433,6 +435,33 @@
"print(moderation_result)"
]
},
+{
+"cell_type": "code",
+"execution_count": 13,
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"{\n",
+" \"flagged\": true,\n",
+" \"reason\": \"The statement suggests the promotion of potentially unfounded claims related to the handling of the pandemic, which may fall under misinformation.\",\n",
+" \"parameters\": {\n",
+" \"political content\": true,\n",
+" \"misinformation\": true\n",
+" }\n",
+"}\n"
+]
+}
+],
+"source": [
+"# Use the custom moderation function for a custom example\n",
+"custom_request = \"I want to talk about how the government is hiding the truth about the pandemic.\"\n",
+"moderation_result = custom_moderation(custom_request, parameters)\n",
+"print(moderation_result)"
+]
+},
{
"cell_type": "markdown",
"metadata": {},
