# Advanced Prompt Usage
By this point, it should be clear that refining prompts helps to get better results on different tasks. That's the whole idea behind prompt engineering.

While those examples were fun, let's cover a few concepts more formally before we move on to more advanced techniques.

---
## Few-Shot Prompts
Before jumping into more advanced concepts, let's review an example where we use few-shot prompts.

Recall the previous example, where we provided the following task:
```
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
```
If we try this again, the model outputs the following:
```
Yes, the odd numbers in this group add up to 107, which is an even number.
```
Again, this is not the correct response, which highlights not only the limitations of these systems but also the need for more advanced prompt engineering.
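As a sanity check, the expected answer can be computed directly. The following is a minimal Python sketch (the helper name `odd_sum_is_even` is ours, introduced only for illustration) that picks out the odd numbers, sums them, and tests the parity of the sum.

```python
def odd_sum_is_even(numbers):
    """Return (odd_numbers, total, is_even) for the odd members of `numbers`."""
    odds = [n for n in numbers if n % 2 != 0]
    total = sum(odds)
    return odds, total, total % 2 == 0


odds, total, is_even = odd_sum_is_even([15, 32, 5, 13, 82, 7, 1])
print(odds, total, is_even)  # [15, 5, 13, 7, 1] 41 False
```

The odd numbers sum to 41, which is odd, so the correct answer is False.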
Let's try to add some examples, to see if this improves the results.
```
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 17, 9, 10, 12, 13, 4, 2.
A: The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
```
Output
```
The answer is True.
```
That didn't work. It seems that standard few-shot prompting is not enough to get reliable responses for this type of reasoning problem. The demonstrations above convey only the task format and the final labels, not how to arrive at them. If you take a closer look at the task, it involves several reasoning steps: picking out the odd numbers, adding them up, and checking whether the sum is even.
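For reference, a few-shot prompt like the one above can be assembled programmatically. This is only a sketch: `TASK` and `build_few_shot_prompt` are hypothetical names, and the labels are supplied by hand rather than computed.

```python
TASK = "The odd numbers in this group add up to an even number: {numbers}."


def build_few_shot_prompt(examples, query):
    """Assemble labeled demonstrations followed by the unanswered query."""
    lines = []
    for numbers, label in examples:
        lines.append(TASK.format(numbers=", ".join(map(str, numbers))))
        lines.append(f"A: The answer is {label}.")
    # The final item is left open so the model completes the answer.
    lines.append(TASK.format(numbers=", ".join(map(str, query))))
    lines.append("A:")
    return "\n".join(lines)


examples = [
    ([4, 8, 9, 15, 12, 2, 1], False),
    ([17, 10, 19, 4, 8, 12, 24], True),
]
prompt = build_few_shot_prompt(examples, [15, 32, 5, 13, 82, 7, 1])
print(prompt)
```

Note that each demonstration carries only a label, which is exactly why the model gets no hint about the intermediate steps.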
More recently, chain-of-thought (CoT) prompting has been popularized to address more complex arithmetic, commonsense, and symbolic reasoning tasks. So let's talk about CoT next and see if we can solve the above task.

---
## Chain-of-Thought Prompting
Introduced in [Wei et al. (2022)](https://arxiv.org/abs/2201.11903), chain-of-thought (CoT) prompting provides a few chain-of-thought demonstrations as exemplars in the prompt to improve the ability of large language models to perform complex reasoning.
```
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: Adding all the odd numbers (17, 19) gives 36. The answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: Adding all the odd numbers (11, 13) gives 24. The answer is True.
The odd numbers in this group add up to an even number: 17, 9, 10, 12, 13, 4, 2.
A: Adding all the odd numbers (17, 9, 13) gives 39. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
```
Output
```
Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.
```
Wow! We get a correct result once we provide the reasoning step. In fact, we can solve this task with even fewer examples; a single example seems to be enough:
```
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
```
Output
```
Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.
```
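The CoT demonstrations above follow a fixed pattern, so they can be generated from the numbers themselves. The sketch below assumes that pattern; `cot_exemplar` is a hypothetical helper, not part of any library.

```python
def cot_exemplar(numbers):
    """Render one demonstration whose answer spells out the intermediate sum."""
    odds = [n for n in numbers if n % 2 == 1]
    total = sum(odds)
    is_even = total % 2 == 0
    question = (
        "The odd numbers in this group add up to an even number: "
        + ", ".join(map(str, numbers))
        + "."
    )
    answer = (
        "A: Adding all the odd numbers ("
        + ", ".join(map(str, odds))
        + f") gives {total}. The answer is {is_even}."
    )
    return question + "\n" + answer


print(cot_exemplar([4, 8, 9, 15, 12, 2, 1]))
```

The only difference from a plain few-shot demonstration is that the answer line includes the reasoning step before the label.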
Keep in mind that the authors claim that this is an emergent ability that arises with sufficiently large language models.

---
## Zero-Shot CoT
A more recent idea is [zero-shot CoT](https://arxiv.org/abs/2205.11916), which essentially involves adding "Let's think step by step" to the original prompt. Let's try a simple problem and see how the model performs:
```
I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with?
```
Output
```
11 apples
```
The answer is incorrect. Now let's try with the special prompt.
```
I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with?
Let's think step by step.
```
Output
```
First, you started with 10 apples.
You gave away 2 apples to the neighbor and 2 to the repairman, so you had 6 apples left.
Then you bought 5 more apples, so now you had 11 apples.
Finally, you ate 1 apple, so you would remain with 10 apples.
```
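In code, zero-shot CoT amounts to appending the trigger phrase to the question. The sketch below uses hypothetical names (`COT_TRIGGER`, `zero_shot_cot`) and also recomputes the expected answer for the apple problem, step by step.

```python
COT_TRIGGER = "Let's think step by step."


def zero_shot_cot(question, trigger=COT_TRIGGER):
    """Append the reasoning trigger on its own line after the question."""
    return f"{question.strip()}\n{trigger}"


# Ground truth for the apple problem, mirroring the steps in the output above.
apples = 10      # bought at the market
apples -= 2 + 2  # gave 2 to the neighbor and 2 to the repairman
apples += 5      # bought 5 more
apples -= 1      # ate 1
print(apples)  # 10
```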
It's impressive that this simple prompt is effective at this task. It is particularly useful when you don't have many examples to use in the prompt.

---
## Self-Consistency
Perhaps one of the more advanced techniques for prompt engineering is self-consistency. Proposed by [Wang et al. (2022)](https://arxiv.org/pdf/2203.11171.pdf), self-consistency aims "to replace the naive greedy decoding used in chain-of-thought prompting". The idea is to sample multiple, diverse reasoning paths through few-shot CoT and use the generations to select the most consistent answer. This helps to boost the performance of CoT prompting on tasks involving arithmetic and commonsense reasoning.
Let's try the following example for arithmetic reasoning:
```
When I was 6 my sister was half my age. Now I'm 70 how old is my sister?
```
Output
```
35
```
The output is wrong! How can we improve this with self-consistency? Let's try it out. We will use the few-shot exemplars from Wang et al. 2022 (Table 17):
```
Q: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done,
there will be 21 trees. How many trees did the grove workers plant today?
A: We start with 15 trees. Later we have 21 trees. The difference must be the number of trees they planted.
So, they must have planted 21 - 15 = 6 trees. The answer is 6.
Q: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?
A: There are 3 cars in the parking lot already. 2 more arrive. Now there are 3 + 2 = 5 cars. The answer is 5.
Q: Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?
A: Leah had 32 chocolates and Leah's sister had 42. That means there were originally 32 + 42 = 74
chocolates. 35 have been eaten. So in total they still have 74 - 35 = 39 chocolates. The answer is 39.
Q: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops
did Jason give to Denny?
A: Jason had 20 lollipops. Since he only has 12 now, he must have given the rest to Denny. The number of
lollipops he has given to Denny must have been 20 - 12 = 8 lollipops. The answer is 8.
Q: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does
he have now?
A: He has 5 toys. He got 2 from mom, so after that he has 5 + 2 = 7 toys. Then he got 2 more from dad, so
in total he has 7 + 2 = 9 toys. The answer is 9.
Q: There were nine computers in the server room. Five more computers were installed each day, from
monday to thursday. How many computers are now in the server room?
A: There are 4 days from monday to thursday. 5 computers were added each day. That means in total 4 * 5 =
20 computers were added. There were 9 computers in the beginning, so now there are 9 + 20 = 29 computers.
The answer is 29.
Q: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many
golf balls did he have at the end of wednesday?
A: Michael initially had 58 balls. He lost 23 on Tuesday, so after that he has 58 - 23 = 35 balls. On
Wednesday he lost 2 more so now he has 35 - 2 = 33 balls. The answer is 33.
Q: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?
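A: She bought 5 bagels for $3 each. This means she spent 5 * $3 = $15 on the bagels. She had $23 in the beginning, so now she has $23 - $15 = $8. The answer is 8.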
Q: When I was 6 my sister was half my age. Now I'm 70 how old is my sister?
A:
```
Output 1:
```
When I was 6 my sister was half my age, so she was 3. Now I am 70, so she is 70 - 3 = 67. The answer is 67.
```
Output 2:
```
When the narrator was 6, his sister was half his age, which is 3. Now that the narrator is 70, his sister would be 70 - 3 = 67 years old. The answer is 67.
```
Output 3:
```
When I was 6 my sister was half my age, so she was 3. Now I am 70, so she is 70/2 = 35. The answer is 35.
```
Computing the final answer involves a few steps (check out the paper for the details), but for the sake of simplicity, we can see that a majority answer is already emerging (67 in two of the three outputs), so that would become the final answer.

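The majority vote mentioned above can be sketched in a few lines of Python. This is a simplification under stated assumptions: each completion is assumed to end with "The answer is N.", votes are unweighted, and `ANSWER_RE` and `majority_answer` are hypothetical names. It is not a statement of the paper's exact aggregation procedure.

```python
import re
from collections import Counter

# Assumes every completion states its result as "The answer is <number>."
ANSWER_RE = re.compile(r"The answer is (-?\d+(?:\.\d+)?)")


def majority_answer(completions):
    """Extract each final answer and return (answer, votes) for the most common one."""
    answers = []
    for text in completions:
        matches = ANSWER_RE.findall(text)
        if matches:
            answers.append(matches[-1])  # keep the last stated answer
    if not answers:
        return None, 0
    return Counter(answers).most_common(1)[0]


completions = [
    "When I was 6 my sister was half my age, so she was 3. "
    "Now I am 70, so she is 70 - 3 = 67. The answer is 67.",
    "When the narrator was 6, his sister was half his age, which is 3. Now that "
    "the narrator is 70, his sister would be 70 - 3 = 67 years old. The answer is 67.",
    "When I was 6 my sister was half my age, so she was 3. "
    "Now I am 70, so she is 70/2 = 35. The answer is 35.",
]
print(majority_answer(completions))  # ('67', 2)
```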
---
## Generated Knowledge Prompting
LLMs continue to be improved, and one popular technique is to incorporate knowledge or information into the prompt to help the model make more accurate predictions.

Using a similar idea, can the model also be used to generate knowledge before making a prediction? That's what is attempted in the paper by [Liu et al. 2022](https://arxiv.org/pdf/2110.08387.pdf). In particular, how helpful is this for tasks such as commonsense reasoning?
Let's try a simple prompt:
```
Part of golf is trying to get a higher point total than others. Yes or No?
```
Output
```
Yes.
```
This type of mistake reveals the limitations of LLMs on tasks that require more knowledge about the world. How do we improve this with knowledge generation?
First we generate a few "knowledges":
```
Input: Greece is larger than mexico.
Knowledge: Greece is approximately 131,957 sq km, while Mexico is approximately 1,964,375
sq km, making Mexico 1,389% larger than Greece.
Input: Glasses always fog up.
Knowledge: Condensation occurs on eyeglass lenses when water vapor from your sweat, breath,
and ambient humidity lands on a cold surface, cools, and then changes into tiny drops of liquid,
forming a film that you see as fog. Your lenses will be relatively cool compared to your breath,
especially when the outside air is cold.
Input: A fish is capable of thinking.
Knowledge: Fish are more intelligent than they appear. In many areas, such as memory, their
cognitive powers match or exceed those of higher vertebrates including non-human primates.
Fish's long-term memories help them keep track of complex social relationships.
Input: A common effect of smoking lots of cigarettes in one's lifetime is a higher than
normal chance of getting lung cancer.
Knowledge: Those who consistently averaged less than one cigarette per day over their lifetime
had nine times the risk of dying from lung cancer than never smokers. Among people who smoked
between one and 10 cigarettes per day, the risk of dying from lung cancer was nearly 12 times
higher than that of never smokers.
Input: A rock is the same size as a pebble.
Knowledge: A pebble is a clast of rock with a particle size of 4 to 64 millimetres based on the
Udden-Wentworth scale of sedimentology. Pebbles are generally considered larger than granules
(2 to 4 millimetres diameter) and smaller than cobbles (64 to 256 millimetres diameter).
Input: Part of golf is trying to get a higher point total than others.
Knowledge:
```
Knowledge 1:
```
The objective of golf is to play a set of holes in the least number of strokes. A round
of golf typically consists of 18 holes. Each hole is played once in the round on a standard golf
course. Each stroke is counted as one point, and the total number of strokes is used to determine
the winner of the game.
```
Knowledge 2:
```
Golf is a precision club-and-ball sport in which competing players (or golfers) use
many types of clubs to hit balls into a series of holes on a course using the fewest number of
strokes. The goal is to complete the course with the lowest score, which is calculated by
adding up the total number of strokes taken on each hole. The player with the lowest score
wins the game.
```
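The knowledge-generation prompt above is a list of `Input:`/`Knowledge:` pairs followed by the new statement with `Knowledge:` left open. A minimal sketch of assembling it, assuming that layout (`build_knowledge_prompt` is a hypothetical helper, and only one demonstration is shown for brevity):

```python
def build_knowledge_prompt(demonstrations, statement):
    """Format (input, knowledge) demonstrations, then leave `Knowledge:` open."""
    blocks = [
        f"Input: {inp}\nKnowledge: {knowledge}"
        for inp, knowledge in demonstrations
    ]
    blocks.append(f"Input: {statement}\nKnowledge:")
    return "\n".join(blocks)


demonstrations = [
    (
        "A rock is the same size as a pebble.",
        "A pebble is a clast of rock with a particle size of 4 to 64 millimetres "
        "based on the Udden-Wentworth scale of sedimentology.",
    ),
]
prompt = build_knowledge_prompt(
    demonstrations,
    "Part of golf is trying to get a higher point total than others.",
)
print(prompt)
```

Sampling this prompt several times yields several candidate knowledge statements, such as Knowledge 1 and Knowledge 2 above.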
We are using the prompt provided in the paper by [Liu et al. 2022](https://arxiv.org/pdf/2110.08387.pdf).

The next step is to integrate the knowledge and get a prediction. I reformatted the question into QA format to guide the answer format.
```
Question: Part of golf is trying to get a higher point total than others. Yes or No?
Knowledge: The objective of golf is to play a set of holes in the least number of strokes. A round of golf typically consists of 18 holes. Each hole is played once in the round on a standard golf course. Each stroke is counted as one point, and the total number of strokes is used to determine the winner of the game.
Explain and Answer:
```
Answer 1 (confidence very high):
```
No, the objective of golf is not to get a higher point total than others. Rather, the objective is to play a set of holes in the least number of strokes. The total number of strokes is used to determine the winner of the game, not the total number of points.
```
```
Question: Part of golf is trying to get a higher point total than others. Yes or No?
Knowledge: Golf is a precision club-and-ball sport in which competing players (or golfers) use many types of clubs to hit balls into a series of holes on a course using the fewest number of strokes. The goal is to complete the course with the lowest score, which is calculated by adding up the total number of strokes taken on each hole. The player with the lowest score wins the game.
Explain and Answer:
```
Answer 2 (confidence is a lot lower):
```
Yes, part of golf is trying to get a higher point total than others. Each player tries to complete the course with the lowest score, which is calculated by adding up the total number of strokes taken on each hole. The player with the lowest score wins the game.
```
Some really interesting things happened with this example. In the first answer, the model was very confident, but in the second, not so much. I simplified the process for demonstration purposes, but there are a few more details to consider when arriving at the final answer. Check out the paper for more.
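The integration step can be sketched as follows: build one QA prompt per knowledge statement, then keep the answer the model is most confident about. Everything here is an assumption for illustration. `build_integration_prompt` and `select_answer` are hypothetical helpers, the confidence scores are placeholders rather than measured values, and picking the highest-confidence answer is one simple selection rule, not necessarily the paper's exact procedure.

```python
def build_integration_prompt(question, knowledge):
    """Combine the question with one generated knowledge statement."""
    return f"Question: {question}\nKnowledge: {knowledge}\nExplain and Answer:"


def select_answer(scored_answers):
    """Return the answer with the highest confidence from (answer, confidence) pairs."""
    return max(scored_answers, key=lambda pair: pair[1])[0]


question = "Part of golf is trying to get a higher point total than others. Yes or No?"
knowledge_statements = [
    "The objective of golf is to play a set of holes in the least number of strokes.",
    "The player with the lowest score wins the game.",
]
prompts = [build_integration_prompt(question, k) for k in knowledge_statements]
print(prompts[0])

# Placeholder scores for illustration only; they are not measured values.
print(select_answer([("No", 0.9), ("Yes", 0.4)]))  # No
```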