diff --git a/examples/Unit_test_writing_using_a_multi-step_prompt.ipynb b/examples/Unit_test_writing_using_a_multi-step_prompt.ipynb new file mode 100644 index 00000000..2e83eece --- /dev/null +++ b/examples/Unit_test_writing_using_a_multi-step_prompt.ipynb @@ -0,0 +1,452 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Unit test writing using a multi-step prompt\n", + "\n", + "Complex tasks, such as writing unit tests, can benefit from multi-step prompts. In contrast to a single prompt, a multi-step prompt generates text from GPT-3 and then feeds that text back into subsequent prompts. This can help in cases where you want GPT-3 to explain its reasoning before answering, or brainstorm a plan before executing it.\n", + "\n", + "In this notebook, we use a 3-step prompt to write unit tests in Python using the following steps:\n", + "\n", + "1. Given a Python function, we first prompt GPT-3 to explain what the function is doing.\n", + "2. Second, we prompt GPT-3 to plan a set of unit tests for the function.\n", + " - If the plan is too short, we ask GPT-3 to elaborate with more ideas for unit tests.\n", + "3. Finally, we prompt GPT-3 to write the unit tests.\n", + "\n", + "The code example illustrates a few optional embellishments on the chained, multi-step prompt:\n", + "\n", + "- Conditional branching (e.g., only asking for elaboration if the first plan is too short)\n", + "- Different models for different steps (e.g., `text-davinci-002` for the text planning steps and `code-davinci-002` for the code writing step)\n", + "- A check that re-runs the function if the output is unsatisfactory (e.g., if the output code cannot be parsed by Python's `ast` module)\n", + "- Streaming output so that you can start reading the output before it's fully generated (useful for long, multi-step outputs)\n", + "\n", + "The full 3-step prompt looks like this (using as an example `pytest` for the unit test framework and `is_palindrome` as the function):\n", + "\n", + " # How to write great unit tests with pytest\n", + "\n", + " In this advanced tutorial for experts, we'll use Python 3.9 and `pytest` to write a suite of unit tests to verify the behavior of the following function.\n", + " ```python\n", + " def is_palindrome(s):\n", + " return s == s[::-1]\n", + " ```\n", + "\n", + " Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.\n", + " - First,{GENERATED IN STEP 1}\n", + " \n", + " A good unit test suite should aim to:\n", + " - Test the function's behavior for a wide range of possible inputs\n", + " - Test edge cases that the author may not have foreseen\n", + " - Take advantage of the features of `pytest` to make the tests easy to write and maintain\n", + " - Be easy to read and understand, with clean code and descriptive names\n", + " - Be deterministic, so that the tests always pass or fail in the same way\n", + "\n", + " `pytest` has many convenient features that make it easy to write and maintain unit tests. 
We'll use them to write unit tests for the function above.\n", + "\n", + " For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):\n", + " -{GENERATED IN STEP 2}\n", + "\n", + " [OPTIONALLY APPENDED]In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):\n", + " -{GENERATED IN STEP 2B}\n", + "\n", + " Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.\n", + " ```python\n", + " import pytest # used for our unit tests\n", + "\n", + " def is_palindrome(s):\n", + " return s == s[::-1]\n", + "\n", + " #Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator\n", + " {GENERATED IN STEP 3}" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# imports needed to run the code in this notebook\n", + "import ast # used for detecting whether generated Python code is valid\n", + "import openai # used for calling the OpenAI API\n", + "\n", + "# example of a function that uses a multi-step prompt to write unit tests\n", + "def unit_test_from_function(\n", + " function_to_test: str, # Python function to test, as a string\n", + " unit_test_package: str = \"pytest\", # unit testing package; use the name as it appears in the import statement\n", + " approx_min_cases_to_cover: int = 7, # minimum number of test case categories to cover (approximate)\n", + " print_text: bool = False, # optionally prints text; helpful for understanding the function & debugging\n", + " text_model: str = \"text-davinci-002\", # model used to generate text plans in steps 1, 2, and 2b\n", + " code_model: str = \"code-davinci-002\", # if you don't have access to code models, you can use text models here instead\n", + " max_tokens: int = 1000, # can set this high, as generations should be stopped earlier by stop sequences\n", + " temperature: float = 0.4, # temperature = 0 can sometimes get stuck in repetitive loops, so we use 0.4\n", + " reruns_if_fail: int = 1, # if the output code cannot be parsed, this will re-run the function up to N times\n", + ") -> str:\n", + " \"\"\"Outputs a unit test for a given Python function, using a 3-step GPT-3 prompt.\"\"\"\n", + "\n", + " # Step 1: Generate an explanation of the function\n", + "\n", + " # create a markdown-formatted prompt that asks GPT-3 to complete an explanation of the function, formatted as a bullet list\n", + " prompt_to_explain_the_function = f\"\"\"# How to write great unit tests with {unit_test_package}\n", + "\n", + "In this advanced tutorial for experts, we'll use Python 3.9 and `{unit_test_package}` to write a suite of unit tests to verify the behavior of the following function.\n", + "```python\n", + "{function_to_test}\n", + "```\n", + "\n", + "Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.\n", + "- First,\"\"\"\n", + " if print_text:\n", + " text_color_prefix = \"\\033[30m\" # black; if you read against a dark background \\033[97m is white\n", + " print(text_color_prefix + prompt_to_explain_the_function, end=\"\") # end='' prevents a newline from being printed\n", + "\n", + " # send the prompt to the API, using 
\\n\\n as a stop sequence to stop at the end of the bullet list\n", + " explanation_response = openai.Completion.create(\n", + " model=text_model,\n", + " prompt=prompt_to_explain_the_function,\n", + " stop=[\"\\n\\n\", \"\\n\\t\\n\", \"\\n \\n\"],\n", + " max_tokens=max_tokens,\n", + " temperature=temperature,\n", + " stream=True,\n", + " )\n", + " explanation_completion = \"\"\n", + " if print_text:\n", + " completion_color_prefix = \"\\033[92m\" # green\n", + " print(completion_color_prefix, end=\"\")\n", + " for event in explanation_response:\n", + " event_text = event[\"choices\"][0][\"text\"]\n", + " explanation_completion += event_text\n", + " if print_text:\n", + " print(event_text, end=\"\")\n", + "\n", + " # Step 2: Generate a plan to write a unit test\n", + "\n", + " # create a markdown-formatted prompt that asks GPT-3 to complete a plan for writing unit tests, formatted as a bullet list\n", + " prompt_to_explain_a_plan = f\"\"\"\n", + " \n", + "A good unit test suite should aim to:\n", + "- Test the function's behavior for a wide range of possible inputs\n", + "- Test edge cases that the author may not have foreseen\n", + "- Take advantage of the features of `{unit_test_package}` to make the tests easy to write and maintain\n", + "- Be easy to read and understand, with clean code and descriptive names\n", + "- Be deterministic, so that the tests always pass or fail in the same way\n", + "\n", + "`{unit_test_package}` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.\n", + "\n", + "For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):\n", + "-\"\"\"\n", + " if print_text:\n", + " print(text_color_prefix + prompt_to_explain_a_plan, end=\"\")\n", + "\n", + " # append this planning prompt to the results from step 1\n", + " prior_text = prompt_to_explain_the_function + explanation_completion\n", + " full_plan_prompt = prior_text + prompt_to_explain_a_plan\n", + "\n", + " # send the prompt to the API, using \\n\\n as a stop sequence to stop at the end of the bullet list\n", + " plan_response = openai.Completion.create(\n", + " model=text_model,\n", + " prompt=full_plan_prompt,\n", + " stop=[\"\\n\\n\", \"\\n\\t\\n\", \"\\n \\n\"],\n", + " max_tokens=max_tokens,\n", + " temperature=temperature,\n", + " stream=True,\n", + " )\n", + " plan_completion = \"\"\n", + " if print_text:\n", + " print(completion_color_prefix, end=\"\")\n", + " for event in plan_response:\n", + " event_text = event[\"choices\"][0][\"text\"]\n", + " plan_completion += event_text\n", + " if print_text:\n", + " print(event_text, end=\"\")\n", + "\n", + " # Step 2b: If the plan is short, ask GPT-3 to elaborate further\n", + " # this counts top-level bullets (e.g., categories), but not sub-bullets (e.g., test cases)\n", + " elaboration_needed = plan_completion.count(\"\\n-\") +1 < approx_min_cases_to_cover # adds 1 because the first bullet is not counted\n", + " if elaboration_needed:\n", + " prompt_to_elaborate_on_the_plan = f\"\"\"\n", + "\n", + "In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):\n", + "-\"\"\"\n", + " if print_text:\n", + " print(text_color_prefix + prompt_to_elaborate_on_the_plan, end=\"\")\n", + "\n", + " # append this elaboration prompt to the results from 
step 2\n", + " prior_text = full_plan_prompt + plan_completion\n", + " full_elaboration_prompt = prior_text + prompt_to_elaborate_on_the_plan\n", + "\n", + " # send the prompt to the API, using \\n\\n as a stop sequence to stop at the end of the bullet list\n", + " elaboration_response = openai.Completion.create(\n", + " model=text_model,\n", + " prompt=full_elaboration_prompt,\n", + " stop=[\"\\n\\n\", \"\\n\\t\\n\", \"\\n \\n\"],\n", + " max_tokens=max_tokens,\n", + " temperature=temperature,\n", + " stream=True,\n", + " )\n", + " elaboration_completion = \"\"\n", + " if print_text:\n", + " print(completion_color_prefix, end=\"\")\n", + " for event in elaboration_response:\n", + " event_text = event[\"choices\"][0][\"text\"]\n", + " elaboration_completion += event_text\n", + " if print_text:\n", + " print(event_text, end=\"\")\n", + "\n", + " # Step 3: Generate the unit test\n", + "\n", + " # create a markdown-formatted prompt that asks GPT-3 to complete a unit test\n", + " starter_comment = \"\"\n", + " if unit_test_package == \"pytest\":\n", + " starter_comment = \"Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator\"\n", + " prompt_to_generate_the_unit_test = f\"\"\"\n", + "\n", + "Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.\n", + "```python\n", + "import {unit_test_package} # used for our unit tests\n", + "\n", + "{function_to_test}\n", + "\n", + "#{starter_comment}\"\"\"\n", + " if print_text:\n", + " print(text_color_prefix + prompt_to_generate_the_unit_test, end=\"\")\n", + "\n", + " # append this unit test prompt to the results from step 3\n", + " if elaboration_needed:\n", + " prior_text = full_elaboration_prompt + elaboration_completion\n", + " else:\n", + " prior_text = full_plan_prompt + plan_completion\n", + " full_unit_test_prompt = prior_text + prompt_to_generate_the_unit_test\n", + "\n", + " # send the prompt to the API, using ``` as a stop sequence to stop at the end of the code block\n", + " unit_test_response = openai.Completion.create(\n", + " model=code_model,\n", + " prompt=full_unit_test_prompt,\n", + " stop=\"```\",\n", + " max_tokens=max_tokens,\n", + " temperature=temperature,\n", + " stream=True\n", + " )\n", + " unit_test_completion = \"\"\n", + " if print_text:\n", + " print(completion_color_prefix, end=\"\")\n", + " for event in unit_test_response:\n", + " event_text = event[\"choices\"][0][\"text\"]\n", + " unit_test_completion += event_text\n", + " if print_text:\n", + " print(event_text, end=\"\")\n", + "\n", + " # check the output for errors\n", + " code_start_index = prompt_to_generate_the_unit_test.find(\"```python\\n\") + len(\"```python\\n\")\n", + " code_output = prompt_to_generate_the_unit_test[code_start_index:] + unit_test_completion\n", + " try:\n", + " ast.parse(code_output)\n", + " except SyntaxError as e:\n", + " print(f\"Syntax error in generated code: {e}\")\n", + " if reruns_if_fail > 0:\n", + " print(\"Rerunning...\")\n", + " return unit_test_from_function(\n", + " function_to_test=function_to_test,\n", + " unit_test_package=unit_test_package,\n", + " approx_min_cases_to_cover=approx_min_cases_to_cover,\n", + " print_text=print_text,\n", + " text_model=text_model,\n", + " code_model=code_model,\n", + " max_tokens=max_tokens,\n", + " temperature=temperature,\n", + " reruns_if_fail=reruns_if_fail-1, # decrement rerun counter when calling again\n", + " )\n", + "\n", + 
" # return the unit test as a string\n", + " return unit_test_completion\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[30m# How to write great unit tests with pytest\n", + "\n", + "In this advanced tutorial for experts, we'll use Python 3.9 and `pytest` to write a suite of unit tests to verify the behavior of the following function.\n", + "```python\n", + "def is_palindrome(s):\n", + " return s == s[::-1]\n", + "```\n", + "\n", + "Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.\n", + "- First,\u001b[92m we have a function definition. This is where we give the function a name, `is_palindrome`, and specify the arguments that the function accepts. In this case, the function accepts a single string argument, `s`.\n", + "- Next, we have a return statement. This is where we specify the value that the function returns. In this case, the function returns `s == s[::-1]`.\n", + "- Finally, we have a function call. This is where we actually call the function with a specific set of arguments. In this case, we're calling the function with the string `\"racecar\"`.\u001b[30m\n", + " \n", + "A good unit test suite should aim to:\n", + "- Test the function's behavior for a wide range of possible inputs\n", + "- Test edge cases that the author may not have foreseen\n", + "- Take advantage of the features of `pytest` to make the tests easy to write and maintain\n", + "- Be easy to read and understand, with clean code and descriptive names\n", + "- Be deterministic, so that the tests always pass or fail in the same way\n", + "\n", + "`pytest` has many convenient features that make it easy to write and maintain unit tests. 
We'll use them to write unit tests for the function above.\n", + "\n", + "For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):\n", + "-\u001b[92m The input is a palindrome\n", + " - `\"racecar\"`\n", + " - `\"madam\"`\n", + " - `\"anna\"`\n", + "- The input is not a palindrome\n", + " - `\"python\"`\n", + " - `\"test\"`\n", + " - `\"1234\"`\n", + "- The input is an empty string\n", + " - `\"\"`\n", + "- The input is `None`\n", + "- The input is not a string\n", + " - `1`\n", + " - `1.0`\n", + " - `True`\n", + " - `False`\n", + " - `[]`\n", + " - `{}`\u001b[30m\n", + "\n", + "In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):\n", + "-\u001b[92m The input is a palindrome with spaces\n", + " - `\"race car\"`\n", + " - `\" madam \"`\n", + " - `\" anna \"`\n", + "- The input is not a palindrome with spaces\n", + " - `\" python \"`\n", + " - `\" test \"`\n", + " - `\" 1234 \"`\n", + "- The input is a palindrome with punctuation\n", + " - `\"racecar!\"`\n", + " - `\"Madam, I'm Adam.\"`\n", + " - `\"Anna's\"`\n", + "- The input is not a palindrome with punctuation\n", + " - `\"python!\"`\n", + " - `\"test.\"`\n", + " - `\"1234!\"`\n", + "- The input is a palindrome with mixed case\n", + " - `\"Racecar\"`\n", + " - `\"Madam\"`\n", + " - `\"Anna\"`\n", + "- The input is not a palindrome with mixed case\n", + " - `\"Python\"`\n", + " - `\"Test\"`\n", + " - `\"1234\"`\u001b[30m\n", + "\n", + "Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.\n", + "```python\n", + "import pytest # used for our unit tests\n", + "\n", + "def is_palindrome(s):\n", + " return s == s[::-1]\n", + "\n", + "#Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator\u001b[92m.\n", + "#The first element of the tuple is a name for the test case, and the second element is a list of arguments for the test case.\n", + "#The @pytest.mark.parametrize decorator will generate a separate test function for each test case.\n", + "#The generated test function will be named test_is_palindrome_ where is the name of the test case.\n", + "#The generated test function will be given the arguments specified in the list of arguments for the test case.\n", + "#The generated test function will be given the fixture specified in the decorator, in this case the function itself.\n", + "#The generated test function will call the function with the arguments and assert that the result is equal to the expected value.\n", + "@pytest.mark.parametrize(\n", + " \"name,args,expected\",\n", + " [\n", + " # Test the function's behavior for a wide range of possible inputs\n", + " (\"palindrome\", [\"racecar\"], True),\n", + " (\"palindrome\", [\"madam\"], True),\n", + " (\"palindrome\", [\"anna\"], True),\n", + " (\"non-palindrome\", [\"python\"], False),\n", + " (\"non-palindrome\", [\"test\"], False),\n", + " (\"non-palindrome\", [\"1234\"], False),\n", + " (\"empty string\", [\"\"], True),\n", + " (\"None\", [None], False),\n", + " (\"non-string\", [1], False),\n", + " (\"non-string\", [1.0], False),\n", + " (\"non-string\", [True], False),\n", + " (\"non-string\", [False], False),\n", + " (\"non-string\", [[]], False),\n", + " (\"non-string\", [{}], 
False),\n", + " # Test edge cases that the author may not have foreseen\n", + " (\"palindrome with spaces\", [\"race car\"], True),\n", + " (\"palindrome with spaces\", [\" madam \"], True),\n", + " (\"palindrome with spaces\", [\" anna \"], True),\n", + " (\"non-palindrome with spaces\", [\" python \"], False),\n", + " (\"non-palindrome with spaces\", [\" test \"], False),\n", + " (\"non-palindrome with spaces\", [\" 1234 \"], False),\n", + " (\"palindrome with punctuation\", [\"racecar!\"], True),\n", + " (\"palindrome with punctuation\", [\"Madam, I'm Adam.\"], True),\n", + " (\"palindrome with punctuation\", [\"Anna's\"], True),\n", + " (\"non-palindrome with punctuation\", [\"python!\"], False),\n", + " (\"non-palindrome with punctuation\", [\"test.\"], False),\n", + " (\"non-palindrome with punctuation\", [\"1234!\"], False),\n", + " (\"palindrome with mixed case\", [\"Racecar\"], True),\n", + " (\"palindrome with mixed case\", [\"Madam\"], True),\n", + " (\"palindrome with mixed case\", [\"Anna\"], True),\n", + " (\"non-palindrome with mixed case\", [\"Python\"], False),\n", + " (\"non-palindrome with mixed case\", [\"Test\"], False),\n", + " (\"non-palindrome with mixed case\", [\"1234\"], False),\n", + " ],\n", + ")\n", + "def test_is_palindrome(is_palindrome, args, expected):\n", + " assert is_palindrome(*args) == expected\n" + ] + }, + { + "data": { + "text/plain": [ + "'.\\n#The first element of the tuple is a name for the test case, and the second element is a list of arguments for the test case.\\n#The @pytest.mark.parametrize decorator will generate a separate test function for each test case.\\n#The generated test function will be named test_is_palindrome_ where is the name of the test case.\\n#The generated test function will be given the arguments specified in the list of arguments for the test case.\\n#The generated test function will be given the fixture specified in the decorator, in this case the function itself.\\n#The generated test function will call the function with the arguments and assert that the result is equal to the expected value.\\n@pytest.mark.parametrize(\\n \"name,args,expected\",\\n [\\n # Test the function\\'s behavior for a wide range of possible inputs\\n (\"palindrome\", [\"racecar\"], True),\\n (\"palindrome\", [\"madam\"], True),\\n (\"palindrome\", [\"anna\"], True),\\n (\"non-palindrome\", [\"python\"], False),\\n (\"non-palindrome\", [\"test\"], False),\\n (\"non-palindrome\", [\"1234\"], False),\\n (\"empty string\", [\"\"], True),\\n (\"None\", [None], False),\\n (\"non-string\", [1], False),\\n (\"non-string\", [1.0], False),\\n (\"non-string\", [True], False),\\n (\"non-string\", [False], False),\\n (\"non-string\", [[]], False),\\n (\"non-string\", [{}], False),\\n # Test edge cases that the author may not have foreseen\\n (\"palindrome with spaces\", [\"race car\"], True),\\n (\"palindrome with spaces\", [\" madam \"], True),\\n (\"palindrome with spaces\", [\" anna \"], True),\\n (\"non-palindrome with spaces\", [\" python \"], False),\\n (\"non-palindrome with spaces\", [\" test \"], False),\\n (\"non-palindrome with spaces\", [\" 1234 \"], False),\\n (\"palindrome with punctuation\", [\"racecar!\"], True),\\n (\"palindrome with punctuation\", [\"Madam, I\\'m Adam.\"], True),\\n (\"palindrome with punctuation\", [\"Anna\\'s\"], True),\\n (\"non-palindrome with punctuation\", [\"python!\"], False),\\n (\"non-palindrome with punctuation\", [\"test.\"], False),\\n (\"non-palindrome with punctuation\", [\"1234!\"], False),\\n 
(\"palindrome with mixed case\", [\"Racecar\"], True),\\n (\"palindrome with mixed case\", [\"Madam\"], True),\\n (\"palindrome with mixed case\", [\"Anna\"], True),\\n (\"non-palindrome with mixed case\", [\"Python\"], False),\\n (\"non-palindrome with mixed case\", [\"Test\"], False),\\n (\"non-palindrome with mixed case\", [\"1234\"], False),\\n ],\\n)\\ndef test_is_palindrome(is_palindrome, args, expected):\\n assert is_palindrome(*args) == expected\\n'" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "example_function = \"\"\"def is_palindrome(s):\n", + " return s == s[::-1]\"\"\"\n", + "\n", + "unit_test_from_function(example_function, print_text=True)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.9.9 ('openai')", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.9" + }, + "orig_nbformat": 4, + "vscode": { + "interpreter": { + "hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97" + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}