openai-cookbook/examples/How_to_use_guardrails.ipynb
2023-10-19 09:49:19 +01:00

463 lines
24 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"id": "8ea66173",
"metadata": {},
"source": [
"# How to use guardrails\n",
"\n",
"In this notebook we share examples of how to implement guardrails for your LLM applications. A guardrail is a generic term for **detective controls** that aim to steer your application. Greater steerability is a common requirement given the inherent randomness of LLMs, and so creating effective guardrails has become one of the most common areas of performance optimization when pushing an LLM from prototype to production. \n",
"\n",
"Guardrails are incredibly [diverse](https://github.com/NVIDIA/NeMo-Guardrails/blob/main/examples/README.md) and can be deployed to virtually any context you can imagine something going wrong with LLMs. This notebook aims to give simple examples that can be extended to meet your unique use case, as well as outlining the trade-offs to consider when deciding whether to implement a guardrail, and how to do it.\n",
"\n",
"This notebook will focus on:\n",
"1. **Input guardrails** that flag inappropriate content before it gets to your LLM\n",
"2. **Output guardrails** that validate what your LLM has produced before it gets to the customer\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "ef059e71",
"metadata": {},
"outputs": [],
"source": [
"import openai\n",
"\n",
"GPT_MODEL = 'gpt-3.5-turbo'"
]
},
{
"cell_type": "markdown",
"id": "63d917f0",
"metadata": {},
"source": [
"## 1. Input guardrails\n",
"\n",
"Input guardrails aim to prevent inappropriate content getting to the LLM in the first place - some common use cases are:\n",
"- **Topical guardrails:** Identify when a user asks an off-topic question and give them advice on what topics the LLM can help them with.\n",
"- **Jailbreaking:** Detect when a user is trying to hijack the LLM and override its prompting.\n",
"- **Prompt injection:** Pick up instances of prompt injection where users try to hide malicious code that will be executed in any downstream functions the LLM executes. \n",
"\n",
"In all of these they act as a preventative control, running either before or in parallel with the LLM, and triggering your application to behave differently if one of these criteria are met.\n",
"\n",
"### Designing a guardrail\n",
"\n",
"When designing guardrails it is important to consider the trade-off between **accuracy**, **latency** and **cost**, where you try to achieve maximum accuracy for the least impact to your bottom line and the user's experience. \n",
"\n",
"We'll begin with a simple **topical guardrail** which aims to detect off-topic questions and prevent the LLM from answering if triggered. This guardrail consists of a simple prompt and uses `gpt-3.5-turbo`, maximising latency/cost over accuracy, but if we wanted to optimize further we could consider:\n",
"- **Accuracy:** You could consider using a fine-tuned model or few-shot examples to increase the accuracy. RAG can also be effective if you have a corpus of information that can help determine whether a piece of content is allowed or not.\n",
"- **Latency/Cost:** You could try fine-tuning smaller models, such as `babbage-002` or open-source offerings like Llama, which can perform quite well when given enough training examples. When using open-source offerings you can also tune the machines you are using for inference to maximize either cost or latency reduction.\n",
"\n",
"This simple guardrail aims to ensure the LLM only answers to a predefined set of topics, and responds to out-of-bounds queries with a canned message.\n",
"\n",
"### Embrace async\n",
"\n",
"A common design to minimize latency is to send your guardrails asynchronously along with your main LLM call. If your guardrails get triggered you send back their response, otherwise send back the LLM response.\n",
"\n",
"We'll use this approach, creating an `execute_chat_with_guardrails` function that will run our LLM's `get_chat_response` and the `topical_guardrail` guardrail in parallel, and return the LLM response only if the guardrail returns `allowed`."
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "e95efc89",
"metadata": {},
"outputs": [],
"source": [
"system_prompt = \"You are a helpful assistant.\"\n",
"\n",
"bad_request = \"I want to talk about horses\"\n",
"good_request = \"What are the best breeds of dog for people that like cats?\""
]
},
{
"cell_type": "code",
"execution_count": 69,
"id": "fee948e2",
"metadata": {},
"outputs": [],
"source": [
"import asyncio\n",
"\n",
"\n",
"async def get_chat_response(user_request):\n",
" print(\"Getting LLM response\")\n",
" messages = [\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": user_request},\n",
" ]\n",
" response = openai.ChatCompletion.create(\n",
" model=GPT_MODEL, messages=messages, temperature=0.5\n",
" )\n",
" print(\"Got LLM response\")\n",
" return response[\"choices\"][0][\"message\"][\"content\"]\n",
"\n",
"\n",
"async def topical_guardrail(user_request):\n",
" print(\"Checking topical guardrail\")\n",
" messages = [\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"Your role is to assess whether the user question is allowed or not. The allowed topics are cats and dogs. If the topic is allowed, say 'allowed' otherwise say 'not_allowed'\",\n",
" },\n",
" {\"role\": \"user\", \"content\": user_request},\n",
" ]\n",
" response = openai.ChatCompletion.create(\n",
" model=GPT_MODEL, messages=messages, temperature=0\n",
" )\n",
" print(\"Got guardrail response\")\n",
" return response[\"choices\"][0][\"message\"][\"content\"]\n",
"\n",
"\n",
"async def execute_chat_with_guardrail(user_request):\n",
" topical_guardrail_task = asyncio.create_task(topical_guardrail(user_request))\n",
" chat_task = asyncio.create_task(get_chat_response(user_request))\n",
"\n",
" while True:\n",
" done, _ = await asyncio.wait(\n",
" [topical_guardrail_task, chat_task], return_when=asyncio.FIRST_COMPLETED\n",
" )\n",
" for task in done:\n",
" if task == topical_guardrail_task:\n",
" guardrail_response = topical_guardrail_task.result()\n",
" if guardrail_response == \"not_allowed\":\n",
" chat_task.cancel()\n",
" print(\"Topical guardrail triggered\")\n",
" return \"I can only talk about cats and dogs, the best animals that ever lived.\"\n",
" elif task == chat_task:\n",
" chat_response = chat_task.result()\n",
" if (\n",
" not topical_guardrail_task.done()\n",
" or topical_guardrail_task.result() != \"not_allowed\"\n",
" ):\n",
" return chat_response"
]
},
{
"cell_type": "code",
"execution_count": 70,
"id": "eba51754",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Checking topical guardrail\n",
"Got guardrail response\n",
"Getting LLM response\n",
"Got LLM response\n",
"If you are a cat lover considering getting a dog, there are certain dog breeds known for their cat-like characteristics that may be a good fit for you. Here are some dog breeds that tend to share traits with cats:\n",
"\n",
"1. Shiba Inu: Known for their independent nature and cleanliness, Shiba Inus are often compared to cats. They are typically reserved and have a strong sense of personal space.\n",
"\n",
"2. Basenji: These dogs are often referred to as \"barkless\" dogs and are known for their cat-like grooming habits and independent nature. They are clean, intelligent, and tend to be more aloof.\n",
"\n",
"3. Greyhound: Despite their large size, Greyhounds are gentle and low-energy dogs. They are known for their calm and quiet nature, which is reminiscent of cats. They also have a strong prey drive, similar to a cat's hunting instinct.\n",
"\n",
"4. Bichon Frise: These small, fluffy dogs are often described as cat-like due to their independent and intelligent nature. Bichon Frises are known for being relatively low-maintenance and are typically good with other pets.\n",
"\n",
"5. Cavalier King Charles Spaniel: These affectionate dogs are often described as having a cat-like personality due to their calm and gentle nature. They are typically good with cats and other pets.\n",
"\n",
"Remember, individual dogs within a breed can vary in personality, so it's important to spend time with the specific dog you are considering to ensure they are a good match for your cat-loving lifestyle.\n"
]
}
],
"source": [
"# Call the main function with the good request - this should go through\n",
"response = await execute_chat_with_guardrail(good_request)\n",
"print(response)"
]
},
{
"cell_type": "code",
"execution_count": 71,
"id": "c7d88b57",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Checking topical guardrail\n",
"Got guardrail response\n",
"Getting LLM response\n",
"Got LLM response\n",
"Topical guardrail triggered\n",
"I can only talk about cats and dogs, the best animals that ever lived.\n"
]
}
],
"source": [
"# Call the main function with the good request - this should get blocked\n",
"response = await execute_chat_with_guardrail(bad_request)\n",
"print(response)"
]
},
{
"cell_type": "markdown",
"id": "060b408e",
"metadata": {},
"source": [
"Looks like our guardrail worked - the first question was allowed through, but the second was blocked for being off-topic. Now we'll extend this concept to moderate the response we get from the LLM as well."
]
},
{
"cell_type": "markdown",
"id": "07af0154",
"metadata": {},
"source": [
"## 2. Output guardrails\n",
"\n",
"Output guardrails govern what the LLM comes back with. These can take many forms, with some of the most common being:\n",
"- **Hallucination/fact-checking guardrails:** Using a corpus of ground truth information or a training set of hallucinated responses to block hallucinated responses.\n",
"- **Moderation guardrails:** Applying brand and corporate guidelines to moderate the LLM's results, and either blocking or rewriting its response if it breaches them.\n",
"- **Syntax checks:** Structured outputs from LLMs can be returned corrupt or unable to be parsed - these guardrails detect those and either retry or fail gracefully, preventing failures in downstream applications.\n",
" - This is a common control to apply with function calling, ensuring that the expected schema is returned in the `arguments` when the LLM returns a `function_call`.\n",
" \n",
"### Moderation guardrail\n",
"\n",
"Here we implement a **moderation guardrail** that uses a version of the [G-Eval](https://arxiv.org/abs/2303.16634) evaluation method to score the presence of unwanted content in the LLM's response. This method is demonstrated in more detail in of our other [cookbooks](https://github.com/openai/openai-cookbook/blob/main/examples/evaluation/How_to_eval_abstractive_summarization.ipynb).\n",
"\n",
"To accomplish this we will make an extensible framework for moderating content that takes in a `domain` and applies `criteria` to a piece of `content` using a set of `steps`:\n",
"1. We set a domain name, which describes the type of content we're going to moderate.\n",
"2. We provide criteria, which outline clearly what the content should and should not contain.\n",
"3. Step-by-step instructions are provided for the LLM to grade the content.\n",
"4. The LLM returns a discrete score from 1-5.\n",
"\n",
"### Setting guardrail thresholds\n",
"\n",
"Our output guardrail will assess the LLM's response and block anything scoring a 3 or higher. Setting this threshold is a common area for optimization - we recommend building an evaluation set and grading the results using a confusion matrix to set the right tolerance for your guardrail. The trade-off here is generally:\n",
"- More false positives leads to a fractured user experience, where customers get annoyed and the assistant seems less helpful.\n",
"- More false negatives can cause lasting harm to your business, as people get the assistant to answer inappropriate questions, or prompt inject/jailbreak it.\n",
"\n",
"For example, for jailbreaking you may want to have a very low threshold, as the risk to your business if your LLM is hijacked and used to produce dangerous content that ends up on social media is very high. However, for our use case we're willing to accept a few false negatives, as the worst that could happen is someone ends up with a Bichon Frise who might have been better suited to a Labrador, which though sad will probably not cause lasting damage to our business (we hope)."
]
},
{
"cell_type": "code",
"execution_count": 66,
"id": "6c7b459f",
"metadata": {},
"outputs": [],
"source": [
"domain = \"animal breed recommendation\"\n",
"\n",
"animal_advice_criteria = \"\"\"\n",
"Assess the presence of explicit recommendation of cat or dog breeds in the content.\n",
"The content should contain only general advice about cats and dogs, not specific breeds to purchase.\"\"\"\n",
"\n",
"animal_advice_steps = \"\"\"\n",
"1. Read the content and the criteria carefully.\n",
"2. Assess how much explicit recommendation of cat or dog breeds is contained in the content.\n",
"3. Assign an animal advice score from 1 to 5, with 1 being no explicit cat or dog breed advice, and 5 being multiple named cat or dog breeds.\n",
"\"\"\"\n",
"\n",
"moderation_system_prompt = \"\"\"\n",
"You are a moderation assistant. Your role is to detect content about {domain} in the text provided, and mark the severity of that content.\n",
"\n",
"## {domain}\n",
"\n",
"### Criteria\n",
"\n",
"{scoring_criteria}\n",
"\n",
"### Instructions\n",
"\n",
"{scoring_steps}\n",
"\n",
"### Content\n",
"\n",
"{content}\n",
"\n",
"### Evaluation (score only!)\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 72,
"id": "43e3fd36",
"metadata": {},
"outputs": [],
"source": [
"async def moderation_guardrail(chat_response):\n",
" print(\"Checking moderation guardrail\")\n",
" mod_messages = [\n",
" {\"role\": \"user\", \"content\": moderation_system_prompt.format(\n",
" domain=domain,\n",
" scoring_criteria=animal_advice_criteria,\n",
" scoring_steps=animal_advice_steps,\n",
" content=chat_response\n",
" )},\n",
" ]\n",
" response = openai.ChatCompletion.create(\n",
" model=GPT_MODEL, messages=mod_messages, temperature=0\n",
" )\n",
" print(\"Got moderation response\")\n",
" return response[\"choices\"][0][\"message\"][\"content\"]\n",
" \n",
" \n",
"async def execute_all_guardrails(user_request):\n",
" topical_guardrail_task = asyncio.create_task(topical_guardrail(user_request))\n",
" chat_task = asyncio.create_task(get_chat_response(user_request))\n",
"\n",
" while True:\n",
" done, _ = await asyncio.wait(\n",
" [topical_guardrail_task, chat_task], return_when=asyncio.FIRST_COMPLETED\n",
" )\n",
" for task in done:\n",
" if task == topical_guardrail_task:\n",
" guardrail_response = topical_guardrail_task.result()\n",
" if guardrail_response == \"not_allowed\":\n",
" chat_task.cancel()\n",
" print(\"Topical guardrail triggered\")\n",
" return \"I can only talk about cats and dogs, the best animals that ever lived.\"\n",
" elif task == chat_task:\n",
" chat_response = chat_task.result()\n",
" if (\n",
" not topical_guardrail_task.done()\n",
" or topical_guardrail_task.result() != \"not_allowed\"\n",
" ):\n",
" moderation_response = await moderation_guardrail(chat_response)\n",
"\n",
" if int(moderation_response) >= 3:\n",
" print(f\"Moderation guardrail flagged with a score of {int(moderation_response)}\")\n",
" return \"Sorry, we're not permitted to give animal breed advice. I can help you with any general queries you might have.\"\n",
"\n",
" else:\n",
" print('Passed moderation')\n",
" return chat_response"
]
},
{
"cell_type": "code",
"execution_count": 76,
"id": "beea1305",
"metadata": {},
"outputs": [],
"source": [
"# Adding a request that should pass both our topical guardrail and our moderation guardrail\n",
"great_request = 'What is some advice you can give to a new dog owner?'"
]
},
{
"cell_type": "code",
"execution_count": 77,
"id": "1c582b4d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Checking topical guardrail\n",
"Got guardrail response\n",
"Getting LLM response\n",
"Got LLM response\n",
"Checking moderation guardrail\n",
"Got moderation response\n",
"Moderation guardrail flagged with a score of 4\n",
"Sorry, we're not permitted to give animal breed advice. I can help you with any general queries you might have.\n",
"\n",
"\n",
"\n",
"Checking topical guardrail\n",
"Got guardrail response\n",
"Getting LLM response\n",
"Got LLM response\n",
"Topical guardrail triggered\n",
"I can only talk about cats and dogs, the best animals that ever lived.\n",
"\n",
"\n",
"\n",
"Checking topical guardrail\n",
"Got guardrail response\n",
"Getting LLM response\n",
"Got LLM response\n",
"Checking moderation guardrail\n",
"Got moderation response\n",
"Passed moderation\n",
"As a new dog owner, here are some valuable pieces of advice:\n",
"\n",
"1. Research and choose the right breed for your lifestyle: Different dog breeds have different temperaments, energy levels, and care requirements. Make sure to select a breed that matches your living situation, activity level, and the time you can dedicate to training and grooming.\n",
"\n",
"2. Prepare your home: Create a safe and comfortable environment for your new pup. Remove any potential hazards, secure loose wires, and keep toxic substances out of their reach. Consider crate training for a designated resting space.\n",
"\n",
"3. Establish a routine: Dogs thrive on routine, so establish a consistent daily schedule for feeding, exercise, playtime, and potty breaks. This will help your dog feel secure and reduce anxiety.\n",
"\n",
"4. Socialize your dog: Introduce your dog to new people, animals, and environments gradually and positively. Socialization is crucial for their development and helps prevent behavioral issues. Enroll in puppy socialization classes or seek guidance from a professional trainer.\n",
"\n",
"5. Provide proper nutrition: Consult with a veterinarian to determine the best diet for your dog's age, size, and health condition. Feed them high-quality dog food and avoid feeding them harmful human foods.\n",
"\n",
"6. Regular exercise and mental stimulation: Dogs need physical exercise to stay healthy and mentally stimulated. Take them for daily walks, play fetch, or engage in other activities that suit their breed and energy level. Mental stimulation through puzzle toys, training sessions, and interactive games is equally important.\n",
"\n",
"7. Establish consistent rules and boundaries: Dogs thrive when they understand their place in the family and have clear boundaries. Be consistent with training, use positive reinforcement techniques, and set rules that are fair and consistent.\n",
"\n",
"8. Regular vet check-ups and vaccinations: Schedule regular visits to the veterinarian for vaccinations, check-ups, and preventive care. This will help ensure your dog's overall health and well-being.\n",
"\n",
"9. Be patient and understanding: Remember that dogs need time to adjust to their new surroundings and learn new behaviors. Be patient, consistent, and understanding during the training process. Reward good behavior and avoid punishment-based training methods.\n",
"\n",
"10. Show love and affection: Dogs are social animals that thrive on love and attention. Spend quality time with your dog, show them affection, and reinforce the bond between you. Regular grooming, such as brushing and bathing, can also be a great bonding activity.\n",
"\n",
"Remember, being a responsible dog owner requires time, effort, and commitment. Enjoy the journey and cherish the unconditional love and companionship your dog will bring into your life.\n",
"\n",
"\n",
"\n"
]
}
],
"source": [
"tests = [good_request,bad_request,great_request]\n",
"\n",
"for test in tests:\n",
" result = await execute_all_guardrails(test)\n",
" print(result)\n",
" print('\\n\\n')\n",
" "
]
},
{
"cell_type": "markdown",
"id": "4763dd2d",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"Guardrails are a vibrant and evolving topic in LLMs, and we hope this notebook has given you an effective introduction to the core concepts around guardrails. To recap:\n",
"- Guardrails are detective controls that aim to prevent harmful content getting to your applications and your users, and add steerability to your LLM in production.\n",
"- They can take the form of input guardrails, which target content before it gets to the LLM, and output guardrails, which control the LLM's response.\n",
"- Designing guardrails and setting their thresholds is a trade-off between accuracy, latency, and cost. Your decision should be based on clear evaluations of the performance of your guardrails, and an understanding of what the cost of a false negative and false positive are for your business.\n",
"- By embracing asynchronous design principles, you can scale guardrails horizontally to minimize the impact to the user as your guardrails increase in number and scope.\n",
"\n",
"We look forward to seeing how you take this forward, and how thinking on guardrails evolves as the ecosystem matures. If you'd like to read more on this topic, there are a few different sources we'd recommend:\n",
"- [NeMo Guardrails](https://github.com/NVIDIA/NeMo-Guardrails/tree/main)\n",
"- [Guardrails AI](https://github.com/ShreyaR/guardrails)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "openai_test",
"language": "python",
"name": "openai_test"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}