updates unit test writing example to use gpt-3.5-turbo

pull/437/head
Ted Sanders 1 year ago
parent f9d1934708
commit f71e6a96a1
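
For readers skimming this diff, the final validation step (pull the code out of the chat reply, then check that it parses with `ast`) can be sketched on its own. This is a minimal standalone sketch, not the notebook's code: `extract_python_block` and `is_valid_python` are hypothetical helper names introduced here for illustration. Unlike the `split`-based one-liner in the diff, the sketch assumes it is preferable to return `None` when the reply has no fenced python block rather than raise.

```python
import ast
from typing import Optional

FENCE = "`" * 3  # three backticks, built this way so the snippet can sit inside a fenced block


def extract_python_block(reply: str) -> Optional[str]:
    """Return the contents of the first fenced python block in reply, or None."""
    opener = FENCE + "python"
    start = reply.find(opener)
    if start == -1:
        return None  # the model replied without a fenced python block
    start += len(opener)
    end = reply.find(FENCE, start)
    if end == -1:
        end = len(reply)  # unterminated fence: take the rest of the reply
    return reply[start:end].strip()


def is_valid_python(code: str) -> bool:
    """Return True if code parses; this checks syntax only, not behavior."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


# hypothetical assistant reply, used only to exercise the helpers
reply = f"Here are the tests:\n{FENCE}python\nassert 1 + 1 == 2\n{FENCE}\nDone."
code = extract_python_block(reply)
print(code)
print(is_valid_python(code))
```

A caller could treat a `None` result the same way as a syntax error and trigger a rerun. Whether that fits the notebook's retry logic is a design choice left to the author.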

@@ -1,64 +1,27 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Unit test writing using a multi-step prompt\n",
"\n",
"Complex tasks, such as writing unit tests, can benefit from multi-step prompts. In contrast to a single prompt, a multi-step prompt generates text from GPT-3 and then feeds that text back into subsequent prompts. This can help in cases where you want GPT-3 to explain its reasoning before answering, or brainstorm a plan before executing it.\n",
"Complex tasks, such as writing unit tests, can benefit from multi-step prompts. In contrast to a single prompt, a multi-step prompt generates text from GPT and then feeds that output text back into subsequent prompts. This can help in cases where you want GPT to reason things out before answering, or brainstorm a plan before executing it.\n",
"\n",
"In this notebook, we use a 3-step prompt to write unit tests in Python using the following steps:\n",
"\n",
"1. Given a Python function, we first prompt GPT-3 to explain what the function is doing.\n",
"2. Second, we prompt GPT-3 to plan a set of unit tests for the function.\n",
" - If the plan is too short, we ask GPT-3 to elaborate with more ideas for unit tests.\n",
"3. Finally, we prompt GPT-3 to write the unit tests.\n",
"1. **Explain**: Given a Python function, we ask GPT to explain what the function is doing and why.\n",
"2. **Plan**: We ask GPT to plan a set of unit tests for the function.\n",
" - If the plan is too short, we ask GPT to elaborate with more ideas for unit tests.\n",
"3. **Execute**: Finally, we instruct GPT to write unit tests that cover the planned cases.\n",
"\n",
"The code example illustrates a few optional embellishments on the chained, multi-step prompt:\n",
"The code example illustrates a few embellishments on the chained, multi-step prompt:\n",
"\n",
"- Conditional branching (e.g., only asking for elaboration if the first plan is too short)\n",
"- Different models for different steps (e.g., `text-davinci-002` for the text planning steps and `code-davinci-002` for the code writing step)\n",
"- Conditional branching (e.g., asking for elaboration only if the first plan is too short)\n",
"- The choice of different models for different steps\n",
"- A check that re-runs the function if the output is unsatisfactory (e.g., if the output code cannot be parsed by Python's `ast` module)\n",
"- Streaming output so that you can start reading the output before it's fully generated (useful for long, multi-step outputs)\n",
"\n",
"The full 3-step prompt looks like this (using as an example `pytest` for the unit test framework and `is_palindrome` as the function):\n",
"\n",
" # How to write great unit tests with pytest\n",
"\n",
" In this advanced tutorial for experts, we'll use Python 3.9 and `pytest` to write a suite of unit tests to verify the behavior of the following function.\n",
" ```python\n",
" def is_palindrome(s):\n",
" return s == s[::-1]\n",
" ```\n",
"\n",
" Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.\n",
" - First,{GENERATED IN STEP 1}\n",
" \n",
" A good unit test suite should aim to:\n",
" - Test the function's behavior for a wide range of possible inputs\n",
" - Test edge cases that the author may not have foreseen\n",
" - Take advantage of the features of `pytest` to make the tests easy to write and maintain\n",
" - Be easy to read and understand, with clean code and descriptive names\n",
" - Be deterministic, so that the tests always pass or fail in the same way\n",
"\n",
" `pytest` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.\n",
"\n",
" For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):\n",
" -{GENERATED IN STEP 2}\n",
"\n",
" [OPTIONALLY APPENDED]In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):\n",
" -{GENERATED IN STEP 2B}\n",
"\n",
" Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.\n",
" ```python\n",
" import pytest # used for our unit tests\n",
"\n",
" def is_palindrome(s):\n",
" return s == s[::-1]\n",
"\n",
" #Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator\n",
" {GENERATED IN STEP 3}"
"- Streaming output so that you can start reading the output before it's fully generated (handy for long, multi-step outputs)"
]
},
{
@@ -71,204 +34,233 @@
"import ast # used for detecting whether generated Python code is valid\n",
"import openai # used for calling the OpenAI API\n",
"\n",
"color_prefix_by_role = {\n",
" \"system\": \"\\033[0m\", # gray\n",
" \"user\": \"\\033[0m\", # gray\n",
" \"assistant\": \"\\033[92m\", # green\n",
"}\n",
"\n",
"\n",
"def print_messages(messages, color_prefix_by_role=color_prefix_by_role) -> None:\n",
" \"\"\"Prints messages sent to or from GPT.\"\"\"\n",
" for message in messages:\n",
" role = message[\"role\"]\n",
" color_prefix = color_prefix_by_role[role]\n",
" content = message[\"content\"]\n",
" print(f\"{color_prefix}\\n[{role}]\\n{content}\")\n",
"\n",
"\n",
"def print_message_delta(delta, color_prefix_by_role=color_prefix_by_role) -> None:\n",
" \"\"\"Prints a chunk of messages streamed back from GPT.\"\"\"\n",
" if \"role\" in delta:\n",
" role = delta[\"role\"]\n",
" color_prefix = color_prefix_by_role[role]\n",
" print(f\"{color_prefix}\\n[{role}]\\n\", end=\"\")\n",
" elif \"content\" in delta:\n",
" content = delta[\"content\"]\n",
" print(content, end=\"\")\n",
" else:\n",
" pass\n",
"\n",
"\n",
"# example of a function that uses a multi-step prompt to write unit tests\n",
"def unit_test_from_function(\n",
"def unit_tests_from_function(\n",
" function_to_test: str, # Python function to test, as a string\n",
" unit_test_package: str = \"pytest\", # unit testing package; use the name as it appears in the import statement\n",
" approx_min_cases_to_cover: int = 7, # minimum number of test case categories to cover (approximate)\n",
" print_text: bool = False, # optionally prints text; helpful for understanding the function & debugging\n",
" text_model: str = \"text-davinci-002\", # model used to generate text plans in steps 1, 2, and 2b\n",
" code_model: str = \"code-davinci-002\", # if you don't have access to code models, you can use text models here instead\n",
" max_tokens: int = 1000, # can set this high, as generations should be stopped earlier by stop sequences\n",
" explain_model: str = \"gpt-3.5-turbo\", # model used to generate the explanation in step 1\n",
" plan_model: str = \"gpt-3.5-turbo\", # model used to generate text plans in steps 2 and 2b\n",
" execute_model: str = \"gpt-3.5-turbo\", # model used to generate code in step 3\n",
" temperature: float = 0.4, # temperature = 0 can sometimes get stuck in repetitive loops, so we use 0.4\n",
" reruns_if_fail: int = 1, # if the output code cannot be parsed, this will re-run the function up to N times\n",
") -> str:\n",
" \"\"\"Outputs a unit test for a given Python function, using a 3-step GPT-3 prompt.\"\"\"\n",
" \"\"\"Returns a unit test for a given Python function, using a 3-step GPT prompt.\"\"\"\n",
"\n",
" # Step 1: Generate an explanation of the function\n",
"\n",
" # create a markdown-formatted prompt that asks GPT-3 to complete an explanation of the function, formatted as a bullet list\n",
" prompt_to_explain_the_function = f\"\"\"# How to write great unit tests with {unit_test_package}\n",
" # create a markdown-formatted message that asks GPT to explain the function, formatted as a bullet list\n",
" explain_system_message = {\n",
" \"role\": \"system\",\n",
" \"content\": \"You are a world-class Python developer with an eagle eye for unintended bugs and edge cases. You carefully explain code with great detail and accuracy. You organize your explanations in markdown-formatted, bulleted lists.\",\n",
" }\n",
" explain_user_message = {\n",
" \"role\": \"user\",\n",
" \"content\": f\"\"\"Please explain the following Python function. Review what each element of the function is doing precisely and what the author's intentions may have been. Organize your explanation as a markdown-formatted, bulleted list.\n",
"\n",
"In this advanced tutorial for experts, we'll use Python 3.9 and `{unit_test_package}` to write a suite of unit tests to verify the behavior of the following function.\n",
"```python\n",
"{function_to_test}\n",
"```\n",
"\n",
"Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.\n",
"- First,\"\"\"\n",
"```\"\"\",\n",
" }\n",
" explain_messages = [explain_system_message, explain_user_message]\n",
" if print_text:\n",
" text_color_prefix = \"\\033[30m\" # black; if you read against a dark background \\033[97m is white\n",
" print(text_color_prefix + prompt_to_explain_the_function, end=\"\") # end='' prevents a newline from being printed\n",
" print_messages(explain_messages)\n",
"\n",
" # send the prompt to the API, using \\n\\n as a stop sequence to stop at the end of the bullet list\n",
" explanation_response = openai.Completion.create(\n",
" model=text_model,\n",
" prompt=prompt_to_explain_the_function,\n",
" stop=[\"\\n\\n\", \"\\n\\t\\n\", \"\\n \\n\"],\n",
" max_tokens=max_tokens,\n",
" explanation_response = openai.ChatCompletion.create(\n",
" model=explain_model,\n",
" messages=explain_messages,\n",
" temperature=temperature,\n",
" stream=True,\n",
" )\n",
" explanation_completion = \"\"\n",
" if print_text:\n",
" completion_color_prefix = \"\\033[92m\" # green\n",
" print(completion_color_prefix, end=\"\")\n",
" for event in explanation_response:\n",
" event_text = event[\"choices\"][0][\"text\"]\n",
" explanation_completion += event_text\n",
" explanation = \"\"\n",
" for chunk in explanation_response:\n",
" delta = chunk[\"choices\"][0][\"delta\"]\n",
" if print_text:\n",
" print(event_text, end=\"\")\n",
" print_message_delta(delta)\n",
" if \"content\" in delta:\n",
" explanation += delta[\"content\"]\n",
" explain_assistant_message = {\"role\": \"assistant\", \"content\": explanation}\n",
"\n",
" # Step 2: Generate a plan to write a unit test\n",
"\n",
" # create a markdown-formatted prompt that asks GPT-3 to complete a plan for writing unit tests, formatted as a bullet list\n",
" prompt_to_explain_a_plan = f\"\"\"\n",
" \n",
"A good unit test suite should aim to:\n",
" # Asks GPT to plan out cases the unit tests should cover, formatted as a bullet list\n",
" plan_user_message = {\n",
" \"role\": \"user\",\n",
" \"content\": f\"\"\"A good unit test suite should aim to:\n",
"- Test the function's behavior for a wide range of possible inputs\n",
"- Test edge cases that the author may not have foreseen\n",
"- Take advantage of the features of `{unit_test_package}` to make the tests easy to write and maintain\n",
"- Be easy to read and understand, with clean code and descriptive names\n",
"- Be deterministic, so that the tests always pass or fail in the same way\n",
"\n",
"`{unit_test_package}` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.\n",
"\n",
"For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):\n",
"-\"\"\"\n",
"To help unit test the function above, list diverse scenarios that the function should be able to handle (and under each scenario, include a few examples as sub-bullets).\"\"\",\n",
" }\n",
" plan_messages = [\n",
" explain_system_message,\n",
" explain_user_message,\n",
" explain_assistant_message,\n",
" plan_user_message,\n",
" ]\n",
" if print_text:\n",
" print(text_color_prefix + prompt_to_explain_a_plan, end=\"\")\n",
"\n",
" # append this planning prompt to the results from step 1\n",
" prior_text = prompt_to_explain_the_function + explanation_completion\n",
" full_plan_prompt = prior_text + prompt_to_explain_a_plan\n",
"\n",
" # send the prompt to the API, using \\n\\n as a stop sequence to stop at the end of the bullet list\n",
" plan_response = openai.Completion.create(\n",
" model=text_model,\n",
" prompt=full_plan_prompt,\n",
" stop=[\"\\n\\n\", \"\\n\\t\\n\", \"\\n \\n\"],\n",
" max_tokens=max_tokens,\n",
" print_messages([plan_user_message])\n",
" plan_response = openai.ChatCompletion.create(\n",
" model=plan_model,\n",
" messages=plan_messages,\n",
" temperature=temperature,\n",
" stream=True,\n",
" )\n",
" plan_completion = \"\"\n",
" if print_text:\n",
" print(completion_color_prefix, end=\"\")\n",
" for event in plan_response:\n",
" event_text = event[\"choices\"][0][\"text\"]\n",
" plan_completion += event_text\n",
" plan = \"\"\n",
" for chunk in plan_response:\n",
" delta = chunk[\"choices\"][0][\"delta\"]\n",
" if print_text:\n",
" print(event_text, end=\"\")\n",
" print_message_delta(delta)\n",
" if \"content\" in delta:\n",
" plan += delta[\"content\"]\n",
" plan_assistant_message = {\"role\": \"assistant\", \"content\": plan}\n",
"\n",
" # Step 2b: If the plan is short, ask GPT-3 to elaborate further\n",
" # Step 2b: If the plan is short, ask GPT to elaborate further\n",
" # this counts top-level bullets (e.g., categories), but not sub-bullets (e.g., test cases)\n",
" elaboration_needed = plan_completion.count(\"\\n-\") +1 < approx_min_cases_to_cover # adds 1 because the first bullet is not counted\n",
" num_bullets = max(plan.count(\"\\n-\"), plan.count(\"\\n*\"))\n",
" elaboration_needed = num_bullets < approx_min_cases_to_cover\n",
" if elaboration_needed:\n",
" prompt_to_elaborate_on_the_plan = f\"\"\"\n",
"\n",
"In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):\n",
"-\"\"\"\n",
" elaboration_user_message = {\n",
" \"role\": \"user\",\n",
" \"content\": f\"\"\"In addition to those scenarios above, list a few rare or unexpected edge cases (and as before, under each edge case, include a few examples as sub-bullets).\"\"\",\n",
" }\n",
" elaboration_messages = [\n",
" explain_system_message,\n",
" explain_user_message,\n",
" explain_assistant_message,\n",
" plan_user_message,\n",
" plan_assistant_message,\n",
" elaboration_user_message,\n",
" ]\n",
" if print_text:\n",
" print(text_color_prefix + prompt_to_elaborate_on_the_plan, end=\"\")\n",
"\n",
" # append this elaboration prompt to the results from step 2\n",
" prior_text = full_plan_prompt + plan_completion\n",
" full_elaboration_prompt = prior_text + prompt_to_elaborate_on_the_plan\n",
"\n",
" # send the prompt to the API, using \\n\\n as a stop sequence to stop at the end of the bullet list\n",
" elaboration_response = openai.Completion.create(\n",
" model=text_model,\n",
" prompt=full_elaboration_prompt,\n",
" stop=[\"\\n\\n\", \"\\n\\t\\n\", \"\\n \\n\"],\n",
" max_tokens=max_tokens,\n",
" print_messages([elaboration_user_message])\n",
" elaboration_response = openai.ChatCompletion.create(\n",
" model=plan_model,\n",
" messages=elaboration_messages,\n",
" temperature=temperature,\n",
" stream=True,\n",
" )\n",
" elaboration_completion = \"\"\n",
" if print_text:\n",
" print(completion_color_prefix, end=\"\")\n",
" for event in elaboration_response:\n",
" event_text = event[\"choices\"][0][\"text\"]\n",
" elaboration_completion += event_text\n",
" elaboration = \"\"\n",
" for chunk in elaboration_response:\n",
" delta = chunk[\"choices\"][0][\"delta\"]\n",
" if print_text:\n",
" print(event_text, end=\"\")\n",
" print_message_delta(delta)\n",
" if \"content\" in delta:\n",
" elaboration += delta[\"content\"]\n",
" elaboration_assistant_message = {\"role\": \"assistant\", \"content\": elaboration}\n",
"\n",
" # Step 3: Generate the unit test\n",
"\n",
" # create a markdown-formatted prompt that asks GPT-3 to complete a unit test\n",
" starter_comment = \"\"\n",
" # create a markdown-formatted message that asks GPT to write the unit tests\n",
" package_comment = \"\"\n",
" if unit_test_package == \"pytest\":\n",
" starter_comment = \"Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator\"\n",
" prompt_to_generate_the_unit_test = f\"\"\"\n",
" package_comment = \"# below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator\"\n",
" execute_system_message = {\n",
" \"role\": \"system\",\n",
" \"content\": \"You are a world-class Python developer with an eagle eye for unintended bugs and edge cases. You write careful, accurate unit tests. When asked to reply only with code, you write all of your code in a single block.\",\n",
" }\n",
" execute_user_message = {\n",
" \"role\": \"user\",\n",
" \"content\": f\"\"\"Using Python and the `{unit_test_package}` package, write a suite of unit tests for the function, following the cases above. Include helpful comments to explain each line. Reply only with code, formatted as follows:\n",
"\n",
"Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.\n",
"```python\n",
"# imports\n",
"import {unit_test_package} # used for our unit tests\n",
"{{insert other imports as needed}}\n",
"\n",
"# function to test\n",
"{function_to_test}\n",
"\n",
"#{starter_comment}\"\"\"\n",
" if print_text:\n",
" print(text_color_prefix + prompt_to_generate_the_unit_test, end=\"\")\n",
"\n",
" # append this unit test prompt to the results from step 3\n",
"# unit tests\n",
"{package_comment}\n",
"{{insert unit test code here}}\n",
"```\"\"\",\n",
" }\n",
" execute_messages = [\n",
" execute_system_message,\n",
" explain_user_message,\n",
" explain_assistant_message,\n",
" plan_user_message,\n",
" plan_assistant_message,\n",
" ]\n",
" if elaboration_needed:\n",
" prior_text = full_elaboration_prompt + elaboration_completion\n",
" else:\n",
" prior_text = full_plan_prompt + plan_completion\n",
" full_unit_test_prompt = prior_text + prompt_to_generate_the_unit_test\n",
" execute_messages += [elaboration_user_message, elaboration_assistant_message]\n",
" execute_messages += [execute_user_message]\n",
" if print_text:\n",
" print_messages([execute_system_message, execute_user_message])\n",
"\n",
" # send the prompt to the API, using ``` as a stop sequence to stop at the end of the code block\n",
" unit_test_response = openai.Completion.create(\n",
" model=code_model,\n",
" prompt=full_unit_test_prompt,\n",
" stop=\"```\",\n",
" max_tokens=max_tokens,\n",
" execute_response = openai.ChatCompletion.create(\n",
" model=execute_model,\n",
" messages=execute_messages,\n",
" temperature=temperature,\n",
" stream=True\n",
" stream=True,\n",
" )\n",
" unit_test_completion = \"\"\n",
" if print_text:\n",
" print(completion_color_prefix, end=\"\")\n",
" for event in unit_test_response:\n",
" event_text = event[\"choices\"][0][\"text\"]\n",
" unit_test_completion += event_text\n",
" execution = \"\"\n",
" for chunk in execute_response:\n",
" delta = chunk[\"choices\"][0][\"delta\"]\n",
" if print_text:\n",
" print(event_text, end=\"\")\n",
" print_message_delta(delta)\n",
" if \"content\" in delta:\n",
" execution += delta[\"content\"]\n",
"\n",
" # check the output for errors\n",
" code_start_index = prompt_to_generate_the_unit_test.find(\"```python\\n\") + len(\"```python\\n\")\n",
" code_output = prompt_to_generate_the_unit_test[code_start_index:] + unit_test_completion\n",
" code = execution.split(\"```python\")[1].split(\"```\")[0].strip()\n",
" try:\n",
" ast.parse(code_output)\n",
" ast.parse(code)\n",
" except SyntaxError as e:\n",
" print(f\"Syntax error in generated code: {e}\")\n",
" if reruns_if_fail > 0:\n",
" print(\"Rerunning...\")\n",
" return unit_test_from_function(\n",
" return unit_tests_from_function(\n",
" function_to_test=function_to_test,\n",
" unit_test_package=unit_test_package,\n",
" approx_min_cases_to_cover=approx_min_cases_to_cover,\n",
" print_text=print_text,\n",
" text_model=text_model,\n",
" code_model=code_model,\n",
" max_tokens=max_tokens,\n",
" explain_model=explain_model,\n",
" plan_model=plan_model,\n",
" execute_model=execute_model,\n",
" temperature=temperature,\n",
" reruns_if_fail=reruns_if_fail-1, # decrement rerun counter when calling again\n",
" reruns_if_fail=reruns_if_fail\n",
" - 1, # decrement rerun counter when calling again\n",
" )\n",
"\n",
" # return the unit test as a string\n",
" return unit_test_completion\n"
" return code\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 2,
@@ -278,19 +270,46 @@
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[30m# How to write great unit tests with pytest\n",
"\u001b[0m\n",
"[system]\n",
"You are a world-class Python developer with an eagle eye for unintended bugs and edge cases. You carefully explain code with great detail and accuracy. You organize your explanations in markdown-formatted, bulleted lists.\n",
"\u001b[0m\n",
"[user]\n",
"Please explain the following Python function. Review what each element of the function is doing precisely and what the author's intentions may have been. Organize your explanation as a markdown-formatted, bulleted list.\n",
"\n",
"In this advanced tutorial for experts, we'll use Python 3.9 and `pytest` to write a suite of unit tests to verify the behavior of the following function.\n",
"```python\n",
"def is_palindrome(s):\n",
" return s == s[::-1]\n",
"def pig_latin(text):\n",
" def translate(word):\n",
" vowels = 'aeiou'\n",
" if word[0] in vowels:\n",
" return word + 'way'\n",
" else:\n",
" consonants = ''\n",
" for letter in word:\n",
" if letter not in vowels:\n",
" consonants += letter\n",
" else:\n",
" break\n",
" return word[len(consonants):] + consonants + 'ay'\n",
"\n",
" words = text.lower().split()\n",
" translated_words = [translate(word) for word in words]\n",
" return ' '.join(translated_words)\n",
"\n",
"```\n",
"\u001b[92m\n",
"[assistant]\n",
"The `pig_latin` function takes a string of text and returns the text translated into pig latin. Here's how it works:\n",
"\n",
"Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.\n",
"- First,\u001b[92m we have a function definition. This is where we give the function a name, `is_palindrome`, and specify the arguments that the function accepts. In this case, the function accepts a single string argument, `s`.\n",
"- Next, we have a return statement. This is where we specify the value that the function returns. In this case, the function returns `s == s[::-1]`.\n",
"- Finally, we have a function call. This is where we actually call the function with a specific set of arguments. In this case, we're calling the function with the string `\"racecar\"`.\u001b[30m\n",
" \n",
"* The function defines a nested function called `translate` that takes a single word as input and returns the word translated into pig latin.\n",
"* The `translate` function first defines a string of vowels.\n",
"* If the first letter of the input word is a vowel, the function adds \"way\" to the end of the word and returns the result.\n",
"* If the first letter of the input word is a consonant, the function loops through the word's letters until it finds a vowel.\n",
"* The function then takes the consonants at the beginning of the word and moves them to the end of the word, adding \"ay\" to the end of the word.\n",
"* The `pig_latin` function lowercases the input text and splits it into a list of words.\n",
"* The function then applies the `translate` function to each word in the list using a list comprehension.\n",
"* Finally, the function joins the translated words back together into a single string with spaces between each word and returns the result.\u001b[0m\n",
"[user]\n",
"A good unit test suite should aim to:\n",
"- Test the function's behavior for a wide range of possible inputs\n",
"- Test edge cases that the author may not have foreseen\n",
@@ -298,127 +317,270 @@
"- Be easy to read and understand, with clean code and descriptive names\n",
"- Be deterministic, so that the tests always pass or fail in the same way\n",
"\n",
"`pytest` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.\n",
"To help unit test the function above, list diverse scenarios that the function should be able to handle (and under each scenario, include a few examples as sub-bullets).\n",
"\u001b[92m\n",
"[assistant]\n",
"Here are some scenarios that the `pig_latin` function should be able to handle, along with examples:\n",
"\n",
"For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):\n",
"-\u001b[92m The input is a palindrome\n",
" - `\"racecar\"`\n",
" - `\"madam\"`\n",
" - `\"anna\"`\n",
"- The input is not a palindrome\n",
" - `\"python\"`\n",
" - `\"test\"`\n",
" - `\"1234\"`\n",
"- The input is an empty string\n",
" - `\"\"`\n",
"- The input is `None`\n",
"- The input is not a string\n",
" - `1`\n",
" - `1.0`\n",
" - `True`\n",
" - `False`\n",
" - `[]`\n",
" - `{}`\u001b[30m\n",
"* Words that start with a vowel:\n",
" * \"apple\" -> \"appleway\"\n",
" * \"elephant\" -> \"elephantway\"\n",
"* Words that start with a single consonant:\n",
" * \"pig\" -> \"igpay\"\n",
" * \"latin\" -> \"atinlay\"\n",
"* Words that start with multiple consonants:\n",
" * \"string\" -> \"ingstray\"\n",
" * \"glove\" -> \"oveglay\"\n",
"* Words that contain numbers or special characters:\n",
" * \"hello!\" -> \"ellohay!\"\n",
" * \"world123\" -> \"orldway123\"\n",
"* Sentences with multiple words:\n",
" * \"hello world\" -> \"ellohay orldway\"\n",
" * \"the quick brown fox\" -> \"hetay ickquay ownbray oxfay\"\n",
"* Sentences with punctuation:\n",
" * \"Hello, world!\" -> \"Ellohay, orldway!\"\n",
" * \"The quick brown fox...\" -> \"Hetay ickquay ownbray oxfay...\" \n",
"* Empty strings:\n",
" * \"\" -> \"\"\u001b[0m\n",
"[user]\n",
"In addition to those scenarios above, list a few rare or unexpected edge cases (and as before, under each edge case, include a few examples as sub-bullets).\n",
"\u001b[92m\n",
"[assistant]\n",
"Here are some rare or unexpected edge cases that the `pig_latin` function should be able to handle, along with examples:\n",
"\n",
"In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):\n",
"-\u001b[92m The input is a palindrome with spaces\n",
" - `\"race car\"`\n",
" - `\" madam \"`\n",
" - `\" anna \"`\n",
"- The input is not a palindrome with spaces\n",
" - `\" python \"`\n",
" - `\" test \"`\n",
" - `\" 1234 \"`\n",
"- The input is a palindrome with punctuation\n",
" - `\"racecar!\"`\n",
" - `\"Madam, I'm Adam.\"`\n",
" - `\"Anna's\"`\n",
"- The input is not a palindrome with punctuation\n",
" - `\"python!\"`\n",
" - `\"test.\"`\n",
" - `\"1234!\"`\n",
"- The input is a palindrome with mixed case\n",
" - `\"Racecar\"`\n",
" - `\"Madam\"`\n",
" - `\"Anna\"`\n",
"- The input is not a palindrome with mixed case\n",
" - `\"Python\"`\n",
" - `\"Test\"`\n",
" - `\"1234\"`\u001b[30m\n",
"* Words that consist entirely of consonants:\n",
" * \"xyz\" -> \"xyzay\"\n",
" * \"rhythm\" -> \"ythmrhay\"\n",
"* Words that consist entirely of vowels:\n",
" * \"aeiou\" -> \"aeiouway\"\n",
" * \"ouiea\" -> \"ouieaway\"\n",
"* Words that contain mixed case:\n",
" * \"PyThOn\" -> \"ythonpay\"\n",
" * \"eLePhAnT\" -> \"elephantway\"\n",
"* Strings that contain only whitespace:\n",
" * \" \" -> \"\"\n",
"* Non-string inputs:\n",
" * None -> TypeError\n",
" * 42 -> AttributeError\u001b[0m\n",
"[system]\n",
"You are a world-class Python developer with an eagle eye for unintended bugs and edge cases. You write careful, accurate unit tests. When asked to reply only with code, you write all of your code in a single block.\n",
"\u001b[0m\n",
"[user]\n",
"Using Python and the `pytest` package, write a suite of unit tests for the function, following the cases above. Include helpful comments to explain each line. Reply only with code, formatted as follows:\n",
"\n",
"Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.\n",
"```python\n",
"# imports\n",
"import pytest # used for our unit tests\n",
"{insert other imports as needed}\n",
"\n",
"# function to test\n",
"def pig_latin(text):\n",
" def translate(word):\n",
" vowels = 'aeiou'\n",
" if word[0] in vowels:\n",
" return word + 'way'\n",
" else:\n",
" consonants = ''\n",
" for letter in word:\n",
" if letter not in vowels:\n",
" consonants += letter\n",
" else:\n",
" break\n",
" return word[len(consonants):] + consonants + 'ay'\n",
"\n",
" words = text.lower().split()\n",
" translated_words = [translate(word) for word in words]\n",
" return ' '.join(translated_words)\n",
"\n",
"\n",
"def is_palindrome(s):\n",
" return s == s[::-1]\n",
"# unit tests\n",
"# below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator\n",
"{insert unit test code here}\n",
"```\n",
"\u001b[92m\n",
"[assistant]\n",
"```python\n",
"# imports\n",
"import pytest\n",
"\n",
"# function to test\n",
"def pig_latin(text):\n",
" def translate(word):\n",
" vowels = 'aeiou'\n",
" if word[0] in vowels:\n",
" return word + 'way'\n",
" else:\n",
" consonants = ''\n",
" for letter in word:\n",
" if letter not in vowels:\n",
" consonants += letter\n",
" else:\n",
" break\n",
" return word[len(consonants):] + consonants + 'ay'\n",
"\n",
" words = text.lower().split()\n",
" translated_words = [translate(word) for word in words]\n",
" return ' '.join(translated_words)\n",
"\n",
"\n",
"# unit tests\n",
"# below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator\n",
"\n",
"# Tests for normal cases\n",
"@pytest.mark.parametrize('text, expected', [\n",
" ('apple', 'appleway'),\n",
" ('elephant', 'elephantway'),\n",
" ('pig', 'igpay'),\n",
" ('latin', 'atinlay'),\n",
" ('string', 'ingstray'),\n",
" ('glove', 'oveglay'),\n",
" ('hello world', 'ellohay orldway'),\n",
" ('the quick brown fox', 'hetay ickquay ownbray oxfay'),\n",
" ('Hello, world!', 'Ellohay, orldway!'),\n",
" ('The quick brown fox...', 'Hetay ickquay ownbray oxfay...'),\n",
" ('', ''),\n",
"])\n",
"\n",
"def test_pig_latin_normal_cases(text, expected):\n",
" assert pig_latin(text) == expected\n",
"\n",
"\n",
"# Tests for edge cases\n",
"@pytest.mark.parametrize('text, expected', [\n",
" ('xyz', 'xyzay'),\n",
" ('rhythm', 'ythmrhay'),\n",
" ('aeiou', 'aeiouway'),\n",
" ('ouiea', 'ouieaway'),\n",
" ('PyThOn', 'ythonpay'),\n",
" ('eLePhAnT', 'elephantway'),\n",
" (' ', ''),\n",
" (None, TypeError),\n",
" (42, AttributeError)\n",
"])\n",
"\n",
"#Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator\u001b[92m.\n",
"#The first element of the tuple is a name for the test case, and the second element is a list of arguments for the test case.\n",
"#The @pytest.mark.parametrize decorator will generate a separate test function for each test case.\n",
"#The generated test function will be named test_is_palindrome_<name> where <name> is the name of the test case.\n",
"#The generated test function will be given the arguments specified in the list of arguments for the test case.\n",
"#The generated test function will be given the fixture specified in the decorator, in this case the function itself.\n",
"#The generated test function will call the function with the arguments and assert that the result is equal to the expected value.\n",
"@pytest.mark.parametrize(\n",
" \"name,args,expected\",\n",
" [\n",
" # Test the function's behavior for a wide range of possible inputs\n",
" (\"palindrome\", [\"racecar\"], True),\n",
" (\"palindrome\", [\"madam\"], True),\n",
" (\"palindrome\", [\"anna\"], True),\n",
" (\"non-palindrome\", [\"python\"], False),\n",
" (\"non-palindrome\", [\"test\"], False),\n",
" (\"non-palindrome\", [\"1234\"], False),\n",
" (\"empty string\", [\"\"], True),\n",
" (\"None\", [None], False),\n",
" (\"non-string\", [1], False),\n",
" (\"non-string\", [1.0], False),\n",
" (\"non-string\", [True], False),\n",
" (\"non-string\", [False], False),\n",
" (\"non-string\", [[]], False),\n",
" (\"non-string\", [{}], False),\n",
" # Test edge cases that the author may not have foreseen\n",
" (\"palindrome with spaces\", [\"race car\"], True),\n",
" (\"palindrome with spaces\", [\" madam \"], True),\n",
" (\"palindrome with spaces\", [\" anna \"], True),\n",
" (\"non-palindrome with spaces\", [\" python \"], False),\n",
" (\"non-palindrome with spaces\", [\" test \"], False),\n",
" (\"non-palindrome with spaces\", [\" 1234 \"], False),\n",
" (\"palindrome with punctuation\", [\"racecar!\"], True),\n",
" (\"palindrome with punctuation\", [\"Madam, I'm Adam.\"], True),\n",
" (\"palindrome with punctuation\", [\"Anna's\"], True),\n",
" (\"non-palindrome with punctuation\", [\"python!\"], False),\n",
" (\"non-palindrome with punctuation\", [\"test.\"], False),\n",
" (\"non-palindrome with punctuation\", [\"1234!\"], False),\n",
" (\"palindrome with mixed case\", [\"Racecar\"], True),\n",
" (\"palindrome with mixed case\", [\"Madam\"], True),\n",
" (\"palindrome with mixed case\", [\"Anna\"], True),\n",
" (\"non-palindrome with mixed case\", [\"Python\"], False),\n",
" (\"non-palindrome with mixed case\", [\"Test\"], False),\n",
" (\"non-palindrome with mixed case\", [\"1234\"], False),\n",
" ],\n",
")\n",
"def test_is_palindrome(is_palindrome, args, expected):\n",
" assert is_palindrome(*args) == expected\n"
"def test_pig_latin_edge_cases(text, expected):\n",
" if type(expected) == type:\n",
" with pytest.raises(expected):\n",
" pig_latin(text)\n",
" else:\n",
" assert pig_latin(text) == expected\n",
"```"
]
},
}
],
"source": [
"example_function = \"\"\"def pig_latin(text):\n",
" def translate(word):\n",
" vowels = 'aeiou'\n",
" if word[0] in vowels:\n",
" return word + 'way'\n",
" else:\n",
" consonants = ''\n",
" for letter in word:\n",
" if letter not in vowels:\n",
" consonants += letter\n",
" else:\n",
" break\n",
" return word[len(consonants):] + consonants + 'ay'\n",
"\n",
" words = text.lower().split()\n",
" translated_words = [translate(word) for word in words]\n",
" return ' '.join(translated_words)\n",
"\"\"\"\n",
"\n",
"unit_tests = unit_tests_from_function(\n",
" example_function,\n",
" approx_min_cases_to_cover=10,\n",
" print_text=True\n",
")\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'.\\n#The first element of the tuple is a name for the test case, and the second element is a list of arguments for the test case.\\n#The @pytest.mark.parametrize decorator will generate a separate test function for each test case.\\n#The generated test function will be named test_is_palindrome_<name> where <name> is the name of the test case.\\n#The generated test function will be given the arguments specified in the list of arguments for the test case.\\n#The generated test function will be given the fixture specified in the decorator, in this case the function itself.\\n#The generated test function will call the function with the arguments and assert that the result is equal to the expected value.\\n@pytest.mark.parametrize(\\n \"name,args,expected\",\\n [\\n # Test the function\\'s behavior for a wide range of possible inputs\\n (\"palindrome\", [\"racecar\"], True),\\n (\"palindrome\", [\"madam\"], True),\\n (\"palindrome\", [\"anna\"], True),\\n (\"non-palindrome\", [\"python\"], False),\\n (\"non-palindrome\", [\"test\"], False),\\n (\"non-palindrome\", [\"1234\"], False),\\n (\"empty string\", [\"\"], True),\\n (\"None\", [None], False),\\n (\"non-string\", [1], False),\\n (\"non-string\", [1.0], False),\\n (\"non-string\", [True], False),\\n (\"non-string\", [False], False),\\n (\"non-string\", [[]], False),\\n (\"non-string\", [{}], False),\\n # Test edge cases that the author may not have foreseen\\n (\"palindrome with spaces\", [\"race car\"], True),\\n (\"palindrome with spaces\", [\" madam \"], True),\\n (\"palindrome with spaces\", [\" anna \"], True),\\n (\"non-palindrome with spaces\", [\" python \"], False),\\n (\"non-palindrome with spaces\", [\" test \"], False),\\n (\"non-palindrome with spaces\", [\" 1234 \"], False),\\n (\"palindrome with punctuation\", [\"racecar!\"], True),\\n (\"palindrome with punctuation\", [\"Madam, I\\'m Adam.\"], True),\\n (\"palindrome with punctuation\", [\"Anna\\'s\"], True),\\n (\"non-palindrome with 
punctuation\", [\"python!\"], False),\\n (\"non-palindrome with punctuation\", [\"test.\"], False),\\n (\"non-palindrome with punctuation\", [\"1234!\"], False),\\n (\"palindrome with mixed case\", [\"Racecar\"], True),\\n (\"palindrome with mixed case\", [\"Madam\"], True),\\n (\"palindrome with mixed case\", [\"Anna\"], True),\\n (\"non-palindrome with mixed case\", [\"Python\"], False),\\n (\"non-palindrome with mixed case\", [\"Test\"], False),\\n (\"non-palindrome with mixed case\", [\"1234\"], False),\\n ],\\n)\\ndef test_is_palindrome(is_palindrome, args, expected):\\n assert is_palindrome(*args) == expected\\n'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"# imports\n",
"import pytest\n",
"\n",
"# function to test\n",
"def pig_latin(text):\n",
" def translate(word):\n",
" vowels = 'aeiou'\n",
" if word[0] in vowels:\n",
" return word + 'way'\n",
" else:\n",
" consonants = ''\n",
" for letter in word:\n",
" if letter not in vowels:\n",
" consonants += letter\n",
" else:\n",
" break\n",
" return word[len(consonants):] + consonants + 'ay'\n",
"\n",
" words = text.lower().split()\n",
" translated_words = [translate(word) for word in words]\n",
" return ' '.join(translated_words)\n",
"\n",
"\n",
"# unit tests\n",
"# below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator\n",
"\n",
"# Tests for normal cases\n",
"@pytest.mark.parametrize('text, expected', [\n",
" ('apple', 'appleway'),\n",
" ('elephant', 'elephantway'),\n",
" ('pig', 'igpay'),\n",
" ('latin', 'atinlay'),\n",
" ('string', 'ingstray'),\n",
" ('glove', 'oveglay'),\n",
" ('hello world', 'ellohay orldway'),\n",
" ('the quick brown fox', 'hetay ickquay ownbray oxfay'),\n",
" ('Hello, world!', 'Ellohay, orldway!'),\n",
" ('The quick brown fox...', 'Hetay ickquay ownbray oxfay...'),\n",
" ('', ''),\n",
"])\n",
"\n",
"def test_pig_latin_normal_cases(text, expected):\n",
" assert pig_latin(text) == expected\n",
"\n",
"\n",
"# Tests for edge cases\n",
"@pytest.mark.parametrize('text, expected', [\n",
" ('xyz', 'xyzay'),\n",
" ('rhythm', 'ythmrhay'),\n",
" ('aeiou', 'aeiouway'),\n",
" ('ouiea', 'ouieaway'),\n",
" ('PyThOn', 'ythonpay'),\n",
" ('eLePhAnT', 'elephantway'),\n",
" (' ', ''),\n",
" (None, TypeError),\n",
" (42, AttributeError)\n",
"])\n",
"\n",
"def test_pig_latin_edge_cases(text, expected):\n",
" if type(expected) == type:\n",
" with pytest.raises(expected):\n",
" pig_latin(text)\n",
" else:\n",
" assert pig_latin(text) == expected\n"
]
}
],
"source": [
"example_function = \"\"\"def is_palindrome(s):\n",
" return s == s[::-1]\"\"\"\n",
"\n",
"unit_test_from_function(example_function, print_text=True)"
"print(unit_tests)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Make sure to check any code before using it, as GPT makes plenty of mistakes (especially on character-based tasks like this one). For best results, use the most powerful model (GPT-4, as of May 2023)."
]
}
],
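As a rough illustration of why the generated tests should be checked before use, the sketch below runs `pig_latin` (copied from the cell above) against a handful of the `(text, expected)` pairs that GPT produced. The names `generated_cases` and `find_mismatches` are hypothetical helpers introduced here for illustration; they are not part of the notebook.

```python
def pig_latin(text):
    def translate(word):
        vowels = 'aeiou'
        if word[0] in vowels:
            return word + 'way'
        else:
            consonants = ''
            for letter in word:
                if letter not in vowels:
                    consonants += letter
                else:
                    break
            return word[len(consonants):] + consonants + 'ay'

    words = text.lower().split()
    translated_words = [translate(word) for word in words]
    return ' '.join(translated_words)


# A sample of (text, expected) pairs copied from the generated test suite above.
generated_cases = [
    ('apple', 'appleway'),
    ('string', 'ingstray'),
    ('Hello, world!', 'Ellohay, orldway!'),
    ('rhythm', 'ythmrhay'),
    ('the quick brown fox', 'hetay ickquay ownbray oxfay'),
]


def find_mismatches(cases):
    """Hypothetical helper: return (text, expected, actual) where the function disagrees."""
    mismatches = []
    for text, expected in cases:
        actual = pig_latin(text)
        if actual != expected:
            mismatches.append((text, expected, actual))
    return mismatches


mismatches = find_mismatches(generated_cases)
for text, expected, actual in mismatches:
    print(f"{text!r}: generated test expects {expected!r}, function returns {actual!r}")
```

On this sample, three of the five expectations disagree with the function: it lowercases its input, keeps punctuation attached to the word, and does not treat `y` as a vowel, so an expectation such as `'Ellohay, orldway!'` cannot pass without changing either the test or the function.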

@ -0,0 +1,452 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Unit test writing using a multi-step prompt\n",
"\n",
"Complex tasks, such as writing unit tests, can benefit from multi-step prompts. In contrast to a single prompt, a multi-step prompt generates text from GPT-3 and then feeds that text back into subsequent prompts. This can help in cases where you want GPT-3 to explain its reasoning before answering, or brainstorm a plan before executing it.\n",
"\n",
"In this notebook, we use a 3-step prompt to write unit tests in Python using the following steps:\n",
"\n",
"1. Given a Python function, we first prompt GPT-3 to explain what the function is doing.\n",
"2. Second, we prompt GPT-3 to plan a set of unit tests for the function.\n",
" - If the plan is too short, we ask GPT-3 to elaborate with more ideas for unit tests.\n",
"3. Finally, we prompt GPT-3 to write the unit tests.\n",
"\n",
"The code example illustrates a few optional embellishments on the chained, multi-step prompt:\n",
"\n",
"- Conditional branching (e.g., only asking for elaboration if the first plan is too short)\n",
"- Different models for different steps (e.g., `text-davinci-002` for the text planning steps and `code-davinci-002` for the code writing step)\n",
"- A check that re-runs the function if the output is unsatisfactory (e.g., if the output code cannot be parsed by Python's `ast` module)\n",
"- Streaming output so that you can start reading the output before it's fully generated (useful for long, multi-step outputs)\n",
"\n",
"The full 3-step prompt looks like this (using as an example `pytest` for the unit test framework and `is_palindrome` as the function):\n",
"\n",
" # How to write great unit tests with pytest\n",
"\n",
" In this advanced tutorial for experts, we'll use Python 3.9 and `pytest` to write a suite of unit tests to verify the behavior of the following function.\n",
" ```python\n",
" def is_palindrome(s):\n",
" return s == s[::-1]\n",
" ```\n",
"\n",
" Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.\n",
" - First,{GENERATED IN STEP 1}\n",
" \n",
" A good unit test suite should aim to:\n",
" - Test the function's behavior for a wide range of possible inputs\n",
" - Test edge cases that the author may not have foreseen\n",
" - Take advantage of the features of `pytest` to make the tests easy to write and maintain\n",
" - Be easy to read and understand, with clean code and descriptive names\n",
" - Be deterministic, so that the tests always pass or fail in the same way\n",
"\n",
" `pytest` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.\n",
"\n",
" For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):\n",
" -{GENERATED IN STEP 2}\n",
"\n",
" [OPTIONALLY APPENDED]In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):\n",
" -{GENERATED IN STEP 2B}\n",
"\n",
" Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.\n",
" ```python\n",
" import pytest # used for our unit tests\n",
"\n",
" def is_palindrome(s):\n",
" return s == s[::-1]\n",
"\n",
" #Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator\n",
" {GENERATED IN STEP 3}"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# imports needed to run the code in this notebook\n",
"import ast # used for detecting whether generated Python code is valid\n",
"import openai # used for calling the OpenAI API\n",
"\n",
"# example of a function that uses a multi-step prompt to write unit tests\n",
"def unit_test_from_function(\n",
" function_to_test: str, # Python function to test, as a string\n",
" unit_test_package: str = \"pytest\", # unit testing package; use the name as it appears in the import statement\n",
" approx_min_cases_to_cover: int = 7, # minimum number of test case categories to cover (approximate)\n",
" print_text: bool = False, # optionally prints text; helpful for understanding the function & debugging\n",
" text_model: str = \"text-davinci-002\", # model used to generate text plans in steps 1, 2, and 2b\n",
" code_model: str = \"code-davinci-002\", # if you don't have access to code models, you can use text models here instead\n",
" max_tokens: int = 1000, # can set this high, as generations should be stopped earlier by stop sequences\n",
" temperature: float = 0.4, # temperature = 0 can sometimes get stuck in repetitive loops, so we use 0.4\n",
" reruns_if_fail: int = 1, # if the output code cannot be parsed, this will re-run the function up to N times\n",
") -> str:\n",
" \"\"\"Outputs a unit test for a given Python function, using a 3-step GPT-3 prompt.\"\"\"\n",
"\n",
" # Step 1: Generate an explanation of the function\n",
"\n",
" # create a markdown-formatted prompt that asks GPT-3 to complete an explanation of the function, formatted as a bullet list\n",
" prompt_to_explain_the_function = f\"\"\"# How to write great unit tests with {unit_test_package}\n",
"\n",
"In this advanced tutorial for experts, we'll use Python 3.9 and `{unit_test_package}` to write a suite of unit tests to verify the behavior of the following function.\n",
"```python\n",
"{function_to_test}\n",
"```\n",
"\n",
"Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.\n",
"- First,\"\"\"\n",
" if print_text:\n",
" text_color_prefix = \"\\033[30m\" # black; if you read against a dark background \\033[97m is white\n",
" print(text_color_prefix + prompt_to_explain_the_function, end=\"\") # end='' prevents a newline from being printed\n",
"\n",
" # send the prompt to the API, using \\n\\n as a stop sequence to stop at the end of the bullet list\n",
" explanation_response = openai.Completion.create(\n",
" model=text_model,\n",
" prompt=prompt_to_explain_the_function,\n",
" stop=[\"\\n\\n\", \"\\n\\t\\n\", \"\\n \\n\"],\n",
" max_tokens=max_tokens,\n",
" temperature=temperature,\n",
" stream=True,\n",
" )\n",
" explanation_completion = \"\"\n",
" if print_text:\n",
" completion_color_prefix = \"\\033[92m\" # green\n",
" print(completion_color_prefix, end=\"\")\n",
" for event in explanation_response:\n",
" event_text = event[\"choices\"][0][\"text\"]\n",
" explanation_completion += event_text\n",
" if print_text:\n",
" print(event_text, end=\"\")\n",
"\n",
" # Step 2: Generate a plan to write a unit test\n",
"\n",
" # create a markdown-formatted prompt that asks GPT-3 to complete a plan for writing unit tests, formatted as a bullet list\n",
" prompt_to_explain_a_plan = f\"\"\"\n",
" \n",
"A good unit test suite should aim to:\n",
"- Test the function's behavior for a wide range of possible inputs\n",
"- Test edge cases that the author may not have foreseen\n",
"- Take advantage of the features of `{unit_test_package}` to make the tests easy to write and maintain\n",
"- Be easy to read and understand, with clean code and descriptive names\n",
"- Be deterministic, so that the tests always pass or fail in the same way\n",
"\n",
"`{unit_test_package}` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.\n",
"\n",
"For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):\n",
"-\"\"\"\n",
" if print_text:\n",
" print(text_color_prefix + prompt_to_explain_a_plan, end=\"\")\n",
"\n",
" # append this planning prompt to the results from step 1\n",
" prior_text = prompt_to_explain_the_function + explanation_completion\n",
" full_plan_prompt = prior_text + prompt_to_explain_a_plan\n",
"\n",
" # send the prompt to the API, using \\n\\n as a stop sequence to stop at the end of the bullet list\n",
" plan_response = openai.Completion.create(\n",
" model=text_model,\n",
" prompt=full_plan_prompt,\n",
" stop=[\"\\n\\n\", \"\\n\\t\\n\", \"\\n \\n\"],\n",
" max_tokens=max_tokens,\n",
" temperature=temperature,\n",
" stream=True,\n",
" )\n",
" plan_completion = \"\"\n",
" if print_text:\n",
" print(completion_color_prefix, end=\"\")\n",
" for event in plan_response:\n",
" event_text = event[\"choices\"][0][\"text\"]\n",
" plan_completion += event_text\n",
" if print_text:\n",
" print(event_text, end=\"\")\n",
"\n",
" # Step 2b: If the plan is short, ask GPT-3 to elaborate further\n",
" # this counts top-level bullets (e.g., categories), but not sub-bullets (e.g., test cases)\n",
" elaboration_needed = plan_completion.count(\"\\n-\") +1 < approx_min_cases_to_cover # adds 1 because the first bullet is not counted\n",
" if elaboration_needed:\n",
" prompt_to_elaborate_on_the_plan = f\"\"\"\n",
"\n",
"In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):\n",
"-\"\"\"\n",
" if print_text:\n",
" print(text_color_prefix + prompt_to_elaborate_on_the_plan, end=\"\")\n",
"\n",
" # append this elaboration prompt to the results from step 2\n",
" prior_text = full_plan_prompt + plan_completion\n",
" full_elaboration_prompt = prior_text + prompt_to_elaborate_on_the_plan\n",
"\n",
" # send the prompt to the API, using \\n\\n as a stop sequence to stop at the end of the bullet list\n",
" elaboration_response = openai.Completion.create(\n",
" model=text_model,\n",
" prompt=full_elaboration_prompt,\n",
" stop=[\"\\n\\n\", \"\\n\\t\\n\", \"\\n \\n\"],\n",
" max_tokens=max_tokens,\n",
" temperature=temperature,\n",
" stream=True,\n",
" )\n",
" elaboration_completion = \"\"\n",
" if print_text:\n",
" print(completion_color_prefix, end=\"\")\n",
" for event in elaboration_response:\n",
" event_text = event[\"choices\"][0][\"text\"]\n",
" elaboration_completion += event_text\n",
" if print_text:\n",
" print(event_text, end=\"\")\n",
"\n",
" # Step 3: Generate the unit test\n",
"\n",
" # create a markdown-formatted prompt that asks GPT-3 to complete a unit test\n",
" starter_comment = \"\"\n",
" if unit_test_package == \"pytest\":\n",
" starter_comment = \"Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator\"\n",
" prompt_to_generate_the_unit_test = f\"\"\"\n",
"\n",
"Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.\n",
"```python\n",
"import {unit_test_package} # used for our unit tests\n",
"\n",
"{function_to_test}\n",
"\n",
"#{starter_comment}\"\"\"\n",
" if print_text:\n",
" print(text_color_prefix + prompt_to_generate_the_unit_test, end=\"\")\n",
"\n",
" # append this unit test prompt to the results from step 3\n",
" if elaboration_needed:\n",
" prior_text = full_elaboration_prompt + elaboration_completion\n",
" else:\n",
" prior_text = full_plan_prompt + plan_completion\n",
" full_unit_test_prompt = prior_text + prompt_to_generate_the_unit_test\n",
"\n",
" # send the prompt to the API, using ``` as a stop sequence to stop at the end of the code block\n",
" unit_test_response = openai.Completion.create(\n",
" model=code_model,\n",
" prompt=full_unit_test_prompt,\n",
" stop=\"```\",\n",
" max_tokens=max_tokens,\n",
" temperature=temperature,\n",
" stream=True\n",
" )\n",
" unit_test_completion = \"\"\n",
" if print_text:\n",
" print(completion_color_prefix, end=\"\")\n",
" for event in unit_test_response:\n",
" event_text = event[\"choices\"][0][\"text\"]\n",
" unit_test_completion += event_text\n",
" if print_text:\n",
" print(event_text, end=\"\")\n",
"\n",
" # check the output for errors\n",
" code_start_index = prompt_to_generate_the_unit_test.find(\"```python\\n\") + len(\"```python\\n\")\n",
" code_output = prompt_to_generate_the_unit_test[code_start_index:] + unit_test_completion\n",
" try:\n",
" ast.parse(code_output)\n",
" except SyntaxError as e:\n",
" print(f\"Syntax error in generated code: {e}\")\n",
" if reruns_if_fail > 0:\n",
" print(\"Rerunning...\")\n",
" return unit_test_from_function(\n",
" function_to_test=function_to_test,\n",
" unit_test_package=unit_test_package,\n",
" approx_min_cases_to_cover=approx_min_cases_to_cover,\n",
" print_text=print_text,\n",
" text_model=text_model,\n",
" code_model=code_model,\n",
" max_tokens=max_tokens,\n",
" temperature=temperature,\n",
" reruns_if_fail=reruns_if_fail-1, # decrement rerun counter when calling again\n",
" )\n",
"\n",
" # return the unit test as a string\n",
" return unit_test_completion\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[30m# How to write great unit tests with pytest\n",
"\n",
"In this advanced tutorial for experts, we'll use Python 3.9 and `pytest` to write a suite of unit tests to verify the behavior of the following function.\n",
"```python\n",
"def is_palindrome(s):\n",
" return s == s[::-1]\n",
"```\n",
"\n",
"Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.\n",
"- First,\u001b[92m we have a function definition. This is where we give the function a name, `is_palindrome`, and specify the arguments that the function accepts. In this case, the function accepts a single string argument, `s`.\n",
"- Next, we have a return statement. This is where we specify the value that the function returns. In this case, the function returns `s == s[::-1]`.\n",
"- Finally, we have a function call. This is where we actually call the function with a specific set of arguments. In this case, we're calling the function with the string `\"racecar\"`.\u001b[30m\n",
" \n",
"A good unit test suite should aim to:\n",
"- Test the function's behavior for a wide range of possible inputs\n",
"- Test edge cases that the author may not have foreseen\n",
"- Take advantage of the features of `pytest` to make the tests easy to write and maintain\n",
"- Be easy to read and understand, with clean code and descriptive names\n",
"- Be deterministic, so that the tests always pass or fail in the same way\n",
"\n",
"`pytest` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.\n",
"\n",
"For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):\n",
"-\u001b[92m The input is a palindrome\n",
" - `\"racecar\"`\n",
" - `\"madam\"`\n",
" - `\"anna\"`\n",
"- The input is not a palindrome\n",
" - `\"python\"`\n",
" - `\"test\"`\n",
" - `\"1234\"`\n",
"- The input is an empty string\n",
" - `\"\"`\n",
"- The input is `None`\n",
"- The input is not a string\n",
" - `1`\n",
" - `1.0`\n",
" - `True`\n",
" - `False`\n",
" - `[]`\n",
" - `{}`\u001b[30m\n",
"\n",
"In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):\n",
"-\u001b[92m The input is a palindrome with spaces\n",
" - `\"race car\"`\n",
" - `\" madam \"`\n",
" - `\" anna \"`\n",
"- The input is not a palindrome with spaces\n",
" - `\" python \"`\n",
" - `\" test \"`\n",
" - `\" 1234 \"`\n",
"- The input is a palindrome with punctuation\n",
" - `\"racecar!\"`\n",
" - `\"Madam, I'm Adam.\"`\n",
" - `\"Anna's\"`\n",
"- The input is not a palindrome with punctuation\n",
" - `\"python!\"`\n",
" - `\"test.\"`\n",
" - `\"1234!\"`\n",
"- The input is a palindrome with mixed case\n",
" - `\"Racecar\"`\n",
" - `\"Madam\"`\n",
" - `\"Anna\"`\n",
"- The input is not a palindrome with mixed case\n",
" - `\"Python\"`\n",
" - `\"Test\"`\n",
" - `\"1234\"`\u001b[30m\n",
"\n",
"Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.\n",
"```python\n",
"import pytest # used for our unit tests\n",
"\n",
"def is_palindrome(s):\n",
" return s == s[::-1]\n",
"\n",
"#Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator\u001b[92m.\n",
"#The first element of the tuple is a name for the test case, and the second element is a list of arguments for the test case.\n",
"#The @pytest.mark.parametrize decorator will generate a separate test function for each test case.\n",
"#The generated test function will be named test_is_palindrome_<name> where <name> is the name of the test case.\n",
"#The generated test function will be given the arguments specified in the list of arguments for the test case.\n",
"#The generated test function will be given the fixture specified in the decorator, in this case the function itself.\n",
"#The generated test function will call the function with the arguments and assert that the result is equal to the expected value.\n",
"@pytest.mark.parametrize(\n",
" \"name,args,expected\",\n",
" [\n",
" # Test the function's behavior for a wide range of possible inputs\n",
" (\"palindrome\", [\"racecar\"], True),\n",
" (\"palindrome\", [\"madam\"], True),\n",
" (\"palindrome\", [\"anna\"], True),\n",
" (\"non-palindrome\", [\"python\"], False),\n",
" (\"non-palindrome\", [\"test\"], False),\n",
" (\"non-palindrome\", [\"1234\"], False),\n",
" (\"empty string\", [\"\"], True),\n",
" (\"None\", [None], False),\n",
" (\"non-string\", [1], False),\n",
" (\"non-string\", [1.0], False),\n",
" (\"non-string\", [True], False),\n",
" (\"non-string\", [False], False),\n",
" (\"non-string\", [[]], False),\n",
" (\"non-string\", [{}], False),\n",
" # Test edge cases that the author may not have foreseen\n",
" (\"palindrome with spaces\", [\"race car\"], True),\n",
" (\"palindrome with spaces\", [\" madam \"], True),\n",
" (\"palindrome with spaces\", [\" anna \"], True),\n",
" (\"non-palindrome with spaces\", [\" python \"], False),\n",
" (\"non-palindrome with spaces\", [\" test \"], False),\n",
" (\"non-palindrome with spaces\", [\" 1234 \"], False),\n",
" (\"palindrome with punctuation\", [\"racecar!\"], True),\n",
" (\"palindrome with punctuation\", [\"Madam, I'm Adam.\"], True),\n",
" (\"palindrome with punctuation\", [\"Anna's\"], True),\n",
" (\"non-palindrome with punctuation\", [\"python!\"], False),\n",
" (\"non-palindrome with punctuation\", [\"test.\"], False),\n",
" (\"non-palindrome with punctuation\", [\"1234!\"], False),\n",
" (\"palindrome with mixed case\", [\"Racecar\"], True),\n",
" (\"palindrome with mixed case\", [\"Madam\"], True),\n",
" (\"palindrome with mixed case\", [\"Anna\"], True),\n",
" (\"non-palindrome with mixed case\", [\"Python\"], False),\n",
" (\"non-palindrome with mixed case\", [\"Test\"], False),\n",
" (\"non-palindrome with mixed case\", [\"1234\"], False),\n",
" ],\n",
")\n",
"def test_is_palindrome(is_palindrome, args, expected):\n",
" assert is_palindrome(*args) == expected\n"
]
},
{
"data": {
"text/plain": [
"'.\\n#The first element of the tuple is a name for the test case, and the second element is a list of arguments for the test case.\\n#The @pytest.mark.parametrize decorator will generate a separate test function for each test case.\\n#The generated test function will be named test_is_palindrome_<name> where <name> is the name of the test case.\\n#The generated test function will be given the arguments specified in the list of arguments for the test case.\\n#The generated test function will be given the fixture specified in the decorator, in this case the function itself.\\n#The generated test function will call the function with the arguments and assert that the result is equal to the expected value.\\n@pytest.mark.parametrize(\\n \"name,args,expected\",\\n [\\n # Test the function\\'s behavior for a wide range of possible inputs\\n (\"palindrome\", [\"racecar\"], True),\\n (\"palindrome\", [\"madam\"], True),\\n (\"palindrome\", [\"anna\"], True),\\n (\"non-palindrome\", [\"python\"], False),\\n (\"non-palindrome\", [\"test\"], False),\\n (\"non-palindrome\", [\"1234\"], False),\\n (\"empty string\", [\"\"], True),\\n (\"None\", [None], False),\\n (\"non-string\", [1], False),\\n (\"non-string\", [1.0], False),\\n (\"non-string\", [True], False),\\n (\"non-string\", [False], False),\\n (\"non-string\", [[]], False),\\n (\"non-string\", [{}], False),\\n # Test edge cases that the author may not have foreseen\\n (\"palindrome with spaces\", [\"race car\"], True),\\n (\"palindrome with spaces\", [\" madam \"], True),\\n (\"palindrome with spaces\", [\" anna \"], True),\\n (\"non-palindrome with spaces\", [\" python \"], False),\\n (\"non-palindrome with spaces\", [\" test \"], False),\\n (\"non-palindrome with spaces\", [\" 1234 \"], False),\\n (\"palindrome with punctuation\", [\"racecar!\"], True),\\n (\"palindrome with punctuation\", [\"Madam, I\\'m Adam.\"], True),\\n (\"palindrome with punctuation\", [\"Anna\\'s\"], True),\\n (\"non-palindrome with 
punctuation\", [\"python!\"], False),\\n (\"non-palindrome with punctuation\", [\"test.\"], False),\\n (\"non-palindrome with punctuation\", [\"1234!\"], False),\\n (\"palindrome with mixed case\", [\"Racecar\"], True),\\n (\"palindrome with mixed case\", [\"Madam\"], True),\\n (\"palindrome with mixed case\", [\"Anna\"], True),\\n (\"non-palindrome with mixed case\", [\"Python\"], False),\\n (\"non-palindrome with mixed case\", [\"Test\"], False),\\n (\"non-palindrome with mixed case\", [\"1234\"], False),\\n ],\\n)\\ndef test_is_palindrome(is_palindrome, args, expected):\\n assert is_palindrome(*args) == expected\\n'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"example_function = \"\"\"def is_palindrome(s):\n",
" return s == s[::-1]\"\"\"\n",
"\n",
"unit_test_from_function(example_function, print_text=True)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.9 ('openai')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
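The two checks that drive the branching and rerun logic in `unit_test_from_function` can be tried in isolation, without calling the API. The sketch below is a minimal illustration under stated assumptions: `count_top_level_bullets`, `parses_as_python`, and `example_plan` are hypothetical names introduced here, and the four-space sub-bullet indentation in `example_plan` is assumed rather than taken from a real completion.

```python
import ast


def count_top_level_bullets(plan_completion: str) -> int:
    """Hypothetical helper mirroring the step 2b check in the notebook."""
    # The prompt already ends with "-", so the completion only contains "\n-"
    # from the second top-level bullet onward; add 1 for the first bullet.
    # Indented sub-bullets have whitespace between the newline and the dash,
    # so they are not matched.
    return plan_completion.count("\n-") + 1


def parses_as_python(code: str) -> bool:
    """Hypothetical helper mirroring the ast.parse check that triggers a rerun."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


# A made-up completion shaped like the step 2 output (indentation is assumed).
example_plan = (
    " The input is a palindrome\n"
    "    - `\"racecar\"`\n"
    "- The input is not a palindrome\n"
    "    - `\"python\"`\n"
    "- The input is an empty string\n"
    "    - `\"\"`"
)

bullet_count = count_top_level_bullets(example_plan)
elaboration_needed = bullet_count < 7  # 7 is the notebook's default approx_min_cases_to_cover

print(bullet_count, elaboration_needed)
print(parses_as_python("def f(x):\n    return x"))
print(parses_as_python("def f(x:\n    return x"))
```

Note that `ast.parse` only confirms the generated text is syntactically valid Python; it says nothing about whether the tests assert the right values, so a passing check still leaves the tests to be reviewed by hand.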