add pages
parent d40cba80e5
commit cda35c7769
@ -0,0 +1,22 @@
import cn from 'clsx'
import Image from 'next/image'

export function Screenshot({ src, alt, full }) {
  return (
    <div
      className={cn(
        'mt-6 -mb-4 flex justify-center overflow-hidden rounded-xl border dark:border-zinc-800',
        full ? 'bg-white' : 'bg-zinc-100'
      )}
    >
      <Image
        src={src}
        alt={alt}
        className={cn(
          'w-auto select-none bg-white',
          full ? '' : 'ring-1 ring-gray-200'
        )}
      />
    </div>
  )
}
@ -1,3 +0,0 @@
# Satori

Satori (悟り) is a Japanese Buddhist term for awakening, "comprehension; understanding".
@ -1,31 +0,0 @@
# Another Page

```js filename="demo.js" {3} copy
let a = 1;

console.log(a);
```

## Component 1

import { useState } from 'react'

{/* Import CSS modules */}
import styles from '../components/counters.module.css'

export const Counter = () => {
  const [count, setCount] = useState(0);
  return (
    <div>
      <button onClick={() => setCount(count + 1)} className={styles.counter}>Clicked {count} times</button>
    </div>
  );
};

<Counter/>

## External Component

import Counters from '../components/counters'

<Counters />
@ -0,0 +1,5 @@
{
  "introduction": "Prompting Applications",
  "generating": "Generating Data",
  "pal": "Program-Aided Language Models"
}
@ -0,0 +1,44 @@
# Generating Data

LLMs have strong capabilities to generate text. Using effective prompt strategies can steer the model to produce better, more consistent, and more factual responses. LLMs are also especially useful for generating data, which is really useful for running all sorts of experiments. For example, we can use an LLM to generate quick samples for a sentiment classifier like so:

*Prompt:*
```
Produce 10 exemplars for sentiment analysis. Examples are categorized as either positive or negative. Produce 2 negative examples and 8 positive examples. Use this format for the examples:
Q: <sentence>
A: <sentiment>
```

*Output:*
```
Q: I just got the best news ever!
A: Positive

Q: We just got a raise at work!
A: Positive

Q: I'm so proud of what I accomplished today.
A: Positive

Q: I'm having the best day ever!
A: Positive

Q: I'm really looking forward to the weekend.
A: Positive

Q: I just got the best present ever!
A: Positive

Q: I'm so happy right now.
A: Positive

Q: I'm so blessed to have such an amazing family.
A: Positive

Q: The weather outside is so gloomy.
A: Negative

Q: I just got some terrible news.
A: Negative
```

This is very useful. We actually use this example for a different test in another section of the guides.
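
To run this kind of data generation programmatically, here is a minimal sketch, assuming the pre-1.0 `openai` Python package, an `OPENAI_API_KEY` environment variable, and an ad-hoc parsing loop of our own:

```python
# Minimal sketch: generate labeled sentiment examples with the Completion API.
import openai

PROMPT = """Produce 10 exemplars for sentiment analysis. Examples are categorized as either positive or negative. Produce 2 negative examples and 8 positive examples. Use this format for the examples:
Q: <sentence>
A: <sentiment>"""

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=PROMPT,
    temperature=0.7,
    max_tokens=256,
)

# Parse the "Q: ... / A: ..." pairs into (sentence, label) tuples.
examples = []
lines = [l.strip() for l in response["choices"][0]["text"].splitlines() if l.strip()]
for q, a in zip(lines[::2], lines[1::2]):
    if q.startswith("Q:") and a.startswith("A:"):
        examples.append((q[2:].strip(), a[2:].strip()))

print(examples)
```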
@ -0,0 +1,3 @@
In this guide, we will cover some advanced and interesting ways we can use prompt engineering to perform useful and more advanced tasks.

**Note that this section is under heavy development.**
@ -0,0 +1,10 @@
{
  "introduction": "Introduction",
  "basics": "Basic Prompts",
  "settings": "A Word on LLM Settings",
  "standard": "Standard Prompts",
  "elements": "Prompt Elements",
  "tips": "General Tips for Designing Prompts",
  "examples": "Examples of Prompts"
}
@ -0,0 +1,40 @@
# Basic Prompts

You can already achieve a lot with prompts, but the quality of results depends on how much information you provide. A prompt can contain information like the `instruction` or `question` you are passing to the model, as well as other details such as `inputs` or `examples`.

Here is a basic example of a simple prompt:

*Prompt:*
```
The sky is
```

*Output:*
```
blue

The sky is blue on a clear day. On a cloudy day, the sky may be gray or white.
```

As you can see, the language model outputs a continuation of strings that makes sense given the context `"The sky is"`. The output might be unexpected or far from the task we want to accomplish.

This basic example also highlights the necessity of providing more context or instructions on what specifically we want to achieve.

Let's try to improve it a bit:

*Prompt:*
```
Complete the sentence:

The sky is
```

*Output:*
```
so beautiful today.
```

Is that better? Well, we told the model to complete the sentence, so the result looks a lot better as it follows exactly what we told it to do ("complete the sentence"). This approach of designing optimal prompts to instruct the model to perform a task is what's referred to as **prompt engineering**.

The example above is a basic illustration of what's possible with LLMs today. Today's LLMs are able to perform all kinds of advanced tasks that range from text summarization to mathematical reasoning to code generation.
@ -0,0 +1,15 @@
# Elements of a Prompt

As we cover more and more examples and applications that are possible with prompt engineering, you will notice that there are certain elements that make up a prompt.

A prompt can contain any of the following components:

**Instruction** - a specific task or instruction you want the model to perform

**Context** - external information or additional context that can steer the model to better responses

**Input Data** - the input or question that we are interested in finding a response for

**Output Indicator** - the type or format of the output

Not all components are required for a prompt, and the format depends on the task at hand. We will touch on more concrete examples in upcoming guides.
@ -0,0 +1,7 @@
# Introduction

Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs). Researchers use prompt engineering to improve the capacity of LLMs on a wide range of common and complex tasks such as question answering and arithmetic reasoning. Developers use prompt engineering to design robust and effective prompting techniques that interface with LLMs and other tools.

This guide covers the basics of standard prompts to provide a rough idea of how to use prompts to interact with and instruct large language models (LLMs).

All examples are tested with `text-davinci-003` (using OpenAI's playground) unless otherwise specified, with the default configurations, i.e., `temperature=0.7` and `top-p=1`.
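
For reference, here is a minimal sketch of that default setup as an API call (assuming the pre-1.0 `openai` Python package and an `OPENAI_API_KEY` environment variable; the playground itself needs no code):

```python
# Minimal sketch of the default setup used for the examples in this guide.
import openai

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="The sky is",
    temperature=0.7,  # default sampling temperature
    top_p=1,          # default nucleus sampling value
)
print(response["choices"][0]["text"])
```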
@ -0,0 +1,11 @@
# A Word on LLM Settings

When working with prompts, you will be interacting with the LLM via an API or directly. You can configure a few parameters to get different results for your prompts.

**Temperature** - In short, the lower the temperature, the more deterministic the results, in the sense that the highest probable next token is always picked. Increasing the temperature could lead to more randomness, encouraging more diverse or creative outputs. We are essentially increasing the weights of the other possible tokens. In terms of application, we might want to use a lower temperature for something like fact-based QA to encourage more factual and concise responses. For poem generation or other creative tasks, it might be beneficial to increase the temperature.

**Top_p** - Similarly, with top_p, a sampling technique with temperature called nucleus sampling, you can control how deterministic the model is at generating a response. If you are looking for exact and factual answers, keep this low. If you are looking for more diverse responses, increase it to a higher value.

The general recommendation is to alter one, not both.
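
To make this concrete, here is a minimal sketch contrasting the two kinds of settings, assuming the pre-1.0 `openai` Python package; the exact values are illustrative, not prescriptive:

```python
# Sketch: contrasting settings for a factual task vs. a creative task.
import openai

factual = openai.Completion.create(
    model="text-davinci-003",
    prompt="Q: What is the capital of France?\nA:",
    temperature=0,   # low temperature -> more deterministic, factual
    top_p=1,
)

creative = openai.Completion.create(
    model="text-davinci-003",
    prompt="Write a short poem about the sea.",
    temperature=0.9, # higher temperature -> more diverse, creative
    top_p=1,
)
```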

Before starting with some basic examples, keep in mind that your results may vary depending on the version of the LLM you are using.
@ -0,0 +1,64 @@
# Standard Prompts

We have tried a very simple prompt above. A standard prompt has the following format:

```
<Question>?
```

This can be formatted into a QA format, which is standard in a lot of QA datasets, as follows:

```
Q: <Question>?
A:
```

Given the standard format above, one popular and effective technique for prompting is referred to as few-shot prompting, where we provide exemplars. Few-shot prompts can be formatted as follows:

```
<Question>?
<Answer>

<Question>?
<Answer>

<Question>?
<Answer>

<Question>?
```

And you can already guess that its QA format version would look like this:

```
Q: <Question>?
A: <Answer>

Q: <Question>?
A: <Answer>

Q: <Question>?
A: <Answer>

Q: <Question>?
A:
```

Keep in mind that it's not required to use the QA format. The format depends on the task at hand. For instance, you can perform a simple classification task and give exemplars that demonstrate the task as follows:

*Prompt:*
```
This is awesome! // Positive
This is bad! // Negative
Wow that movie was rad! // Positive
What a horrible show! //
```

*Output:*
```
Negative
```

Few-shot prompts enable in-context learning, which is the ability of language models to learn tasks given only a few examples. We will see more of this in action in the upcoming guides.
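
To make the format concrete, here is a minimal sketch of assembling a few-shot QA prompt programmatically; the `exemplars` list and `build_prompt` helper are our own illustrations, not part of any library:

```python
# Sketch: assemble a few-shot QA prompt from a list of exemplars.
exemplars = [
    ("This is awesome!", "Positive"),
    ("This is bad!", "Negative"),
    ("Wow that movie was rad!", "Positive"),
]

def build_prompt(exemplars, query):
    # Each exemplar becomes a "Q: ... / A: ..." pair; the query gets an empty A.
    blocks = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)

print(build_prompt(exemplars, "What a horrible show!"))
```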
@ -0,0 +1,108 @@
# General Tips for Designing Prompts

Here are some tips to keep in mind while you are designing your prompts:

### Start Simple
As you get started with designing prompts, you should keep in mind that it is really an iterative process that requires a lot of experimentation to get optimal results. Using a simple playground like OpenAI's or Cohere's is a good starting point.

You can start with simple prompts and keep adding more elements and context as you aim for better results. Versioning your prompt along the way is vital for this reason. As you read this guide, you will see many examples where specificity, simplicity, and conciseness will often give you better results.

When you have a big task that involves many different subtasks, you can try to break down the task into simpler subtasks and keep building up as you get better results. This avoids adding too much complexity to the prompt design process at the beginning.
### The Instruction
|
||||
You can design effective prompts for various simple tasks by using commands to instruct the model what you want to achieve such as "Write", "Classify", "Summarize", "Translate", "Order", etc.
|
||||
|
||||
Keep in mind that you also need to experiment a lot so see what works best. Trying different instructions with different keywords, context, and data and see what works best for your particular use case and task. Usually, the more specific and relevant the context is to the task you are trying to perform, the better. We will touch on the importance of sampling and adding more context in the upcoming guides.
|
||||
|
||||
Others recommend that instructions are placed at the beginning of the prompt. It's also recommended that some clear separator like "###" is used to separate the instruction and context.
|
||||
|
||||
For instance:
|
||||
|
||||
*Prompt:*
|
||||
```
|
||||
### Instruction ###
|
||||
Translate the text below to Spanish:
|
||||
|
||||
Text: "hello!"
|
||||
```
|
||||
|
||||
*Output:*
|
||||
```
|
||||
¡Hola!
|
||||
```
|
||||
|
||||
### Specificity
|
||||
Be very specific about the instruction and task you want the model to perform. The more descriptive and detailed the prompt is, the better the results. This is particularly important when you have a desired outcome or style of generation you are seeking. There aren't specific tokens or keywords that lead to better results. It's more important to have a good format and descriptive prompt. In fact, providing examples in the prompt is very effective to get desired output in specific formats.
|
||||
|
||||
When designing prompts you should also keep in mind the length of the prompt as there are limitations regarding how long this can be. Thinking about how specific and detailed you should be is something to consider. Too many unnecessary details is not necessarily a good approach. The details should be relevant and contribute to the task at hand. This is something you will need to experiment with a lot. We encourage a lot of experimentation and iteration to optimize prompts for your applications.
|
||||
|
||||
As an example, let's try a simple prompt to extract specific information from a piece of text.
|
||||
|
||||
*Prompt:*
|
||||
```
|
||||
Extract the name of places in the following text.
|
||||
|
||||
Desired format:
|
||||
Place: <comma_separated_list_of_company_names>
|
||||
|
||||
Input: "Although these developments are encouraging to researchers, much is still a mystery. “We often have a black box between the brain and the effect we see in the periphery,” says Henrique Veiga-Fernandes, a neuroimmunologist at the Champalimaud Centre for the Unknown in Lisbon. “If we want to use it in the therapeutic context, we actually need to understand the mechanism.""
|
||||
```
|
||||
|
||||
*Output:*
|
||||
```
|
||||
Place: Champalimaud Centre for the Unknown, Lisbon
|
||||
```
|
||||
|
||||
Input text is obtained from [this Nature article](https://www.nature.com/articles/d41586-023-00509-z).
|
||||
|
||||
### Avoid Impreciseness
|
||||
|
||||
Given the tips above about being detailed and improving format, it's easy to fall into the trap of wanting to be too clever about prompts and potentially creating imprecise descriptions. It's often better to be specific and direct. The analogy here is very similar to effective communication -- the more direct, the more effective the message gets across.
|
||||
|
||||
For example, you might be interested in learning the concept of prompt engineering. You might try something like:
|
||||
|
||||
```
|
||||
Explain the concept prompt engineering. Keep the explanation short, only a few sentences, and don't be too descriptive.
|
||||
```
|
||||
|
||||
It's not clear from the prompt above how many sentences to use and what style. You might still somewhat get good responses with the above prompts but the better prompt would be one that is very specific, concise, and to the point. Something like:
|
||||
|
||||
```
|
||||
Use 2-3 sentences to explain the concept of prompt engineering to a high school student.
|
||||
```
|
||||
|
||||
### To do or not to do?
|
||||
Another common tip when designing prompts is to avoid saying what not to do but say what to do instead. This encourages more specificity and focus on the details that lead to good responses from the model.
|
||||
|
||||
Here is an example of movie recommendation chatbot failing at exactly what I don't want it to do because of how I wrote the instruction -- focusing on what not to do.
|
||||
|
||||
*Prompt:*
|
||||
```
|
||||
The following is an agent that recommends movies to a customer. DO NOT ASK FOR INTERESTS. DO NOT ASK FOR PERSONAL INFORMATION.
|
||||
|
||||
Customer: Please recommend a movie based on my interests.
|
||||
Agent:
|
||||
```
|
||||
|
||||
*Output:*
|
||||
```
|
||||
Sure, I can recommend a movie based on your interests. What kind of movie would you like to watch? Do you prefer action, comedy, romance, or something else?
|
||||
```
|
||||
|
||||
Here is a better prompt:
|
||||
|
||||
*Prompt:*
|
||||
```
|
||||
The following is an agent that recommends movies to a customer. The agent is responsible to recommend a movie from the top global trending movies. It should refrain from asking users for their preferences and avoid asking for personal information. If the agent doesn't have a movie to recommend, it should respond "Sorry, couldn't find a movie to recommend today.".
|
||||
|
||||
Customer: Please recommend a movie based on my interests.
|
||||
Agent:
|
||||
```
|
||||
|
||||
*Output:*
|
||||
```
|
||||
Sorry, I don't have any information about your interests. However, here's a list of the top global trending movies right now: [list of movies]. I hope you find something you like!
|
||||
```
|
||||
|
||||
Some of the examples above were adopted from the ["Best practices for prompt engineering with OpenAI API" article.](https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api)
|
@ -0,0 +1,5 @@
{
  "introduction": "Introduction",
  "chatgpt": "ChatGPT"
}
@ -0,0 +1,184 @@
# ChatGPT Prompt Engineering

import { Callout, FileTree } from 'nextra-theme-docs'
import {Screenshot} from 'components/screenshot'
import CHATGPT1 from '../../img/chatgpt-1.png'
import CHATGPTCLASSIC from '../../img/chatgpt-classic.png'

In this section, we cover the latest prompt engineering techniques for ChatGPT, including tips, applications, limitations, papers, and additional reading materials.

**Note that this section is under heavy development.**

Topics:
- [ChatGPT Introduction](#chatgpt-introduction)
- [Reviewing The Conversation Task](#reviewing-the-conversation-task)
- [Conversations with ChatGPT](#conversations-with-chatgpt)

---
## ChatGPT Introduction

ChatGPT is a new model [trained by OpenAI](https://openai.com/blog/chatgpt) that has the capability to interact in a conversational way. This model is trained to follow instructions in a prompt to provide appropriate responses in the context of a dialogue. ChatGPT can help with answering questions, suggesting recipes, writing lyrics in a certain style, generating code, and much more.

ChatGPT is trained using Reinforcement Learning from Human Feedback (RLHF). While this model is a lot more capable than previous GPT iterations (and also trained to reduce harmful and untruthful outputs), it still comes with limitations. Let's cover some of the capabilities and limitations with concrete examples.

You can use the research preview of ChatGPT [here](https://chat.openai.com), but for the examples below we will use the `Chat` mode on the OpenAI Playground.

---
## Reviewing The Conversation Task

In one of the previous guides, we covered a bit about conversation capabilities and role prompting. We covered how to instruct the LLM to have a conversation in a specific style, with a specific intent, behavior, and identity.

Let's review our previous basic example where we created a conversational system that's able to generate more technical and scientific responses to questions.

*Prompt:*
```
The following is a conversation with an AI research assistant. The assistant tone is technical and scientific.

Human: Hello, who are you?
AI: Greeting! I am an AI research assistant. How can I help you today?
Human: Can you tell me about the creation of black holes?
AI:
```

From the example above, you can see two important components:
- the **intent** or explanation of what the chatbot is
- the **identity** which instructs the style or tone the chatbot will use to respond

The simple example above works well with the text completion APIs that use `text-davinci-003`. More recently, OpenAI [announced the ChatGPT APIs](https://openai.com/blog/introducing-chatgpt-and-whisper-apis); the more powerful and cheaper model behind them, called `gpt-3.5-turbo`, was specifically built for this type of functionality (chat completions). In fact, OpenAI recommends it as their best model even for non-chat use cases. Other benefits of using the ChatGPT APIs are the significant cost reduction (90%) and efficiency.

Big companies like Snap Inc. and Instacart are already integrating conversational features powered by ChatGPT into their products, ranging from personalized recommendations to open-ended shopping goals.

---
## Conversations with ChatGPT

### Multi-turn Conversations

To begin demonstrating the capabilities of ChatGPT, we will use the chatbot assistant example above and discuss the results. Compared to `text-davinci-003`, the `gpt-3.5-turbo` model that powers ChatGPT uses a chat format as input. The model expects a series of messages as input and uses those to generate a response.

*Input:*
```
SYSTEM: You are an AI research assistant. You use a tone that is technical and scientific.
USER: Hello, who are you?
ASSISTANT: Greeting! I am an AI research assistant. How can I help you today?
USER: Can you tell me about the creation of black holes?
```

*Output:*
```
ASSISTANT: Of course! Black holes are created when a massive star runs out of fuel and collapses under its own gravity. This collapse causes the star to become incredibly dense, with a gravitational field so strong that not even light can escape it. This point of no return is known as the event horizon. Once a star crosses the event horizon, it is considered a black hole...
```

Note that in the example above, I have simplified the input and output, but the ChatGPT chat completion API requires messages to be in a specific format. I have added a snapshot below of how this example would look using the `Chat Mode` in the OpenAI Playground:

<Screenshot src={CHATGPT1} alt="CHATGPT1" />

The more formal API call for our example would look something like the example below:

```python
import openai

openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an AI research assistant. You use a tone that is technical and scientific."},
        {"role": "user", "content": "Hello, who are you?"},
        {"role": "assistant", "content": "Greeting! I am an AI research assistant. How can I help you today?"},
        {"role": "user", "content": "Can you tell me about the creation of black holes?"}
    ]
)
```
In fact, the way developers interact with ChatGPT in the future is expected to be done via the [Chat Markup Language](https://github.com/openai/openai-python/blob/main/chatml.md) (ChatML for short).

### Single-turn tasks

The chat format enables multi-turn conversations, but it also supports single-turn tasks similar to what we used with `text-davinci-003`. This means we can use ChatGPT to perform tasks similar to those we have demonstrated for the original GPT models. For example, let's try to perform the following question answering task using ChatGPT:

*Input:*
```
USER: Answer the question based on the context below. Keep the answer short and concise. Respond "Unsure about answer" if not sure about the answer.

Context: Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical. There, scientists generated an early version of the antibody, dubbed OKT3. Originally sourced from mice, the molecule was able to bind to the surface of T cells and limit their cell-killing potential. In 1986, it was approved to help prevent organ rejection after kidney transplants, making it the first therapeutic antibody allowed for human use.

Question: What was OKT3 originally sourced from?

Answer:
```

*Output:*
```
ASSISTANT: Mice.
```

Keep in mind that I am adding the `USER` and `ASSISTANT` labels to better demonstrate how the task can be performed using ChatGPT. Here is the example using the Playground:

<Screenshot src={CHATGPTCLASSIC} alt="CHATGPTCLASSIC" />

More formally, this is the API call (I've only included the message component of the request):

```python
CONTENT = """Answer the question based on the context below. Keep the answer short and concise. Respond \"Unsure about answer\" if not sure about the answer.

Context: Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical. There, scientists generated an early version of the antibody, dubbed OKT3. Originally sourced from mice, the molecule was able to bind to the surface of T cells and limit their cell-killing potential. In 1986, it was approved to help prevent organ rejection after kidney transplants, making it the first therapeutic antibody allowed for human use.

Question: What was OKT3 originally sourced from?

Answer:
"""

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": CONTENT},
    ],
    temperature=0,
)
```

### Instructing Chat Models

According to the official OpenAI docs, snapshots of the `gpt-3.5-turbo` model will also be made available. For example, we can access the snapshot from March 1, `gpt-3.5-turbo-0301`. This allows developers to opt for specific model versions. It also means that the best practices for instructing models may change from version to version.

The current recommendation for `gpt-3.5-turbo-0301` is to add instructions in the `user` message as opposed to the available `system` message.
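
As an illustration of that recommendation, here is a minimal sketch (using the same pre-1.0 `openai` package as the snippets above) that folds the instruction into the `user` message:

```python
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0301",
    messages=[
        # The instruction lives in the user message, not the system message.
        {"role": "user", "content": "You are an AI research assistant with a technical, scientific tone. Can you tell me about the creation of black holes?"},
    ],
)
print(response["choices"][0]["message"]["content"])
```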

---
## References

- [Seeing ChatGPT Through Students' Eyes: An Analysis of TikTok Data](https://arxiv.org/abs/2303.05349) (Mar 2023)
- [Extracting Accurate Materials Data from Research Papers with Conversational Language Models and Prompt Engineering -- Example of ChatGPT](https://arxiv.org/abs/2303.05352) (Mar 2023)
- [ChatGPT is on the horizon: Could a large language model be all we need for Intelligent Transportation?](https://arxiv.org/abs/2303.05382) (Mar 2023)
- [Making a Computational Attorney](https://arxiv.org/abs/2303.05383) (Mar 2023)
- [Does Synthetic Data Generation of LLMs Help Clinical Text Mining?](https://arxiv.org/abs/2303.04360) (Mar 2023)
- [MenuCraft: Interactive Menu System Design with Large Language Models](https://arxiv.org/abs/2303.04496) (Mar 2023)
- [A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT](https://arxiv.org/abs/2303.04226) (Mar 2023)
- [Exploring the Feasibility of ChatGPT for Event Extraction](https://arxiv.org/abs/2303.03836) (Mar 2023)
- [ChatGPT: Beginning of an End of Manual Annotation? Use Case of Automatic Genre Identification](https://arxiv.org/abs/2303.03953) (Mar 2023)
- [Is ChatGPT a Good NLG Evaluator? A Preliminary Study](https://arxiv.org/abs/2303.04048) (Mar 2023)
- [Will Affective Computing Emerge from Foundation Models and General AI? A First Evaluation on ChatGPT](https://arxiv.org/abs/2303.03186) (Mar 2023)
- [UZH_CLyp at SemEval-2023 Task 9: Head-First Fine-Tuning and ChatGPT Data Generation for Cross-Lingual Learning in Tweet Intimacy Prediction](https://arxiv.org/abs/2303.01194) (Mar 2023)
- [How to format inputs to ChatGPT models](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb) (Mar 2023)
- [Can ChatGPT Assess Human Personalities? A General Evaluation Framework](https://arxiv.org/abs/2303.01248) (Mar 2023)
- [Cross-Lingual Summarization via ChatGPT](https://arxiv.org/abs/2302.14229) (Feb 2023)
- [ChatAug: Leveraging ChatGPT for Text Data Augmentation](https://arxiv.org/abs/2302.13007) (Feb 2023)
- [Dr ChatGPT, tell me what I want to hear: How prompt knowledge impacts health answer correctness](https://arxiv.org/abs/2302.13793) (Feb 2023)
- [An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP)](https://arxiv.org/abs/2302.13814) (Feb 2023)
- [ChatGPT: A Meta-Analysis after 2.5 Months](https://arxiv.org/abs/2302.13795) (Feb 2023)
- [Let's have a chat! A Conversation with ChatGPT: Technology, Applications, and Limitations](https://arxiv.org/abs/2302.13817) (Feb 2023)
- [Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback](https://arxiv.org/abs/2302.12813) (Feb 2023)
- [On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective](https://arxiv.org/abs/2302.12095) (Feb 2023)
- [How Generative AI models such as ChatGPT can be (Mis)Used in SPC Practice, Education, and Research? An Exploratory Study](https://arxiv.org/abs/2302.10916) (Feb 2023)
- [Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT](https://arxiv.org/abs/2302.10198) (Feb 2023)
- [A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT](https://arxiv.org/abs/2302.11382) (Feb 2023)
- [Zero-Shot Information Extraction via Chatting with ChatGPT](https://arxiv.org/abs/2302.10205) (Feb 2023)
- [ChatGPT: Jack of all trades, master of none](https://arxiv.org/abs/2302.10724) (Feb 2023)
- [A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning](https://arxiv.org/abs/2302.09068) (Feb 2023)
- [Netizens, Academicians, and Information Professionals' Opinions About AI With Special Reference To ChatGPT](https://arxiv.org/abs/2302.07136) (Feb 2023)
- [Linguistic ambiguity analysis in ChatGPT](https://arxiv.org/abs/2302.06426) (Feb 2023)
- [ChatGPT versus Traditional Question Answering for Knowledge Graphs: Current Status and Future Directions Towards Knowledge Graph Chatbots](https://arxiv.org/abs/2302.06466) (Feb 2023)
- [What ChatGPT and generative AI mean for science](https://www.nature.com/articles/d41586-023-00340-6) (Feb 2023)
- [Applying BERT and ChatGPT for Sentiment Analysis of Lyme Disease in Scientific Literature](https://arxiv.org/abs/2302.06474) (Feb 2023)
- [Exploring AI Ethics of ChatGPT: A Diagnostic Analysis](https://arxiv.org/abs/2301.12867) (Jan 2023)
- [ChatGPT for Good? On Opportunities and Challenges of Large Language Models for Education](https://www.edu.sot.tum.de/fileadmin/w00bed/hctl/_my_direct_uploads/ChatGPT_for_Good_.pdf) (Jan 2023)
- [The political ideology of conversational AI: Converging evidence on ChatGPT's pro-environmental, left-libertarian orientation](https://arxiv.org/abs/2301.01768) (Jan 2023)
- [Techniques to improve reliability - OpenAI Cookbook](https://github.com/openai/openai-cookbook/blob/main/techniques_to_improve_reliability.md)
- [Awesome ChatGPT Prompts](https://github.com/f/awesome-chatgpt-prompts)
- [Introducing ChatGPT](https://openai.com/blog/chatgpt) (Nov 2022)
@ -0,0 +1,3 @@
# Models

In this section, we will cover some of the capabilities of language models by applying the latest and most advanced prompt engineering techniques.
@ -0,0 +1,7 @@
{
  "introduction": "Introduction",
  "adversarial": "Adversarial Prompting",
  "factuality": "Factuality",
  "biases": "Biases"
}
@ -0,0 +1,98 @@
# Biases

LLMs can produce problematic generations that can potentially be harmful and display biases that could deteriorate the performance of the model on downstream tasks. Some of these can be mitigated through effective prompting strategies, but others might require more advanced solutions like moderation and filtering.

### Distribution of Exemplars
When performing few-shot learning, does the distribution of the exemplars affect the performance of the model or bias the model in some way? We can perform a simple test here.

*Prompt:*
```
Q: I just got the best news ever!
A: Positive

Q: We just got a raise at work!
A: Positive

Q: I'm so proud of what I accomplished today.
A: Positive

Q: I'm having the best day ever!
A: Positive

Q: I'm really looking forward to the weekend.
A: Positive

Q: I just got the best present ever!
A: Positive

Q: I'm so happy right now.
A: Positive

Q: I'm so blessed to have such an amazing family.
A: Positive

Q: The weather outside is so gloomy.
A: Negative

Q: I just got some terrible news.
A: Negative

Q: That left a sour taste.
A:
```

*Output:*
```
Negative
```

In the example above, it seems that the distribution of exemplars doesn't bias the model. This is good. Let's try another example with a harder text to classify and see how the model does:

*Prompt:*
```
Q: The food here is delicious!
A: Positive

Q: I'm so tired of this coursework.
A: Negative

Q: I can't believe I failed the exam.
A: Negative

Q: I had a great day today!
A: Positive

Q: I hate this job.
A: Negative

Q: The service here is terrible.
A: Negative

Q: I'm so frustrated with my life.
A: Negative

Q: I never get a break.
A: Negative

Q: This meal tastes awful.
A: Negative

Q: I can't stand my boss.
A: Negative

Q: I feel something.
A:
```

*Output:*
```
Negative
```

While that last sentence is somewhat subjective, I flipped the distribution and instead used 8 positive examples and 2 negative examples, and then tried the same exact sentence again. Guess what the model responded? It responded "Positive". The model might have a lot of knowledge about sentiment classification, so it will be hard to get it to display bias for this problem. The advice here is to avoid skewing the distribution and instead provide a more balanced number of examples for each label. For harder tasks that the model doesn't have much knowledge of, it will likely struggle more.

### Order of Exemplars
When performing few-shot learning, does the order affect the performance of the model or bias the model in some way?

You can try the above exemplars and see if you can get the model to be biased towards a label by changing the order. The advice is to randomly order exemplars. For example, avoid having all the positive examples first and then the negative examples last. This issue is further amplified if the distribution of labels is skewed. Always make sure to experiment a lot to reduce this type of bias.
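
A minimal sketch of that advice in code -- balancing the label counts and shuffling the exemplar order before building the prompt; the variable names and exemplars are our own:

```python
# Sketch: balance and randomly order exemplars before building a few-shot prompt.
import random

positive = [("I had a great day today!", "Positive"), ("The food here is delicious!", "Positive")]
negative = [("I hate this job.", "Negative"), ("The service here is terrible.", "Negative")]

# Use the same number of exemplars per label, then shuffle the order.
k = min(len(positive), len(negative))
exemplars = positive[:k] + negative[:k]
random.shuffle(exemplars)

prompt = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in exemplars)
prompt += "\n\nQ: I feel something.\nA:"
print(prompt)
```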
@ -0,0 +1,34 @@
# Factuality

LLMs have a tendency to generate responses that sound coherent and convincing but can sometimes be made up. Improving prompts can help the model generate more accurate/factual responses and reduce the likelihood of inconsistent and made-up responses.

Some solutions might include:
- provide ground truth (e.g., a related article paragraph or Wikipedia entry) as part of the context to reduce the likelihood of the model producing made-up text.
- configure the model to produce less diverse responses by decreasing the probability parameters and instructing it to admit (e.g., "I don't know") when it doesn't know the answer.
- provide in the prompt a combination of examples of questions and responses that it might know about and not know about

Let's look at a simple example:

*Prompt:*
```
Q: What is an atom?
A: An atom is a tiny particle that makes up everything.

Q: Who is Alvan Muntz?
A: ?

Q: What is Kozar-09?
A: ?

Q: How many moons does Mars have?
A: Two, Phobos and Deimos.

Q: Who is Neto Beto Roberto?
```

*Output:*
```
A: ?
```

I made up the name "Neto Beto Roberto", so the model is correct in this instance. Try to change the question a bit and see if you can get it to work. There are different ways you can improve this further based on all that you have learned so far.
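
If you want to reproduce this test via the API, here is a minimal sketch following the advice above to decrease the probability parameters (assuming the pre-1.0 `openai` package):

```python
# Sketch: run the factuality prompt above with a low temperature.
import openai

PROMPT = """Q: What is an atom?
A: An atom is a tiny particle that makes up everything.

Q: Who is Alvan Muntz?
A: ?

Q: What is Kozar-09?
A: ?

Q: How many moons does Mars have?
A: Two, Phobos and Deimos.

Q: Who is Neto Beto Roberto?
A:"""

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=PROMPT,
    temperature=0,  # less diverse, more factual
)
print(response["choices"][0]["text"].strip())  # expected: "?"
```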
@ -0,0 +1,3 @@
# Risks & Misuses

We have already seen how effective well-crafted prompts can be for various tasks using techniques like few-shot learning. As we think about building real-world applications on top of LLMs, it becomes crucial to think about the misuses, risks, and safety involved with language models. This section focuses on highlighting some of the risks and misuses of LLMs via techniques like prompt injections. It also highlights harmful behaviors and how to mitigate them via effective prompting techniques. Other topics of interest include generalizability, calibration, biases, social biases, and factuality, to name a few.
@ -0,0 +1,16 @@
{
  "introduction": "Introduction",
  "zeroshot": "Zero-shot Prompting",
  "fewshot": "Few-shot Prompting",
  "cot": "Chain-of-Thought Prompting",
  "zerocot": "Zero-shot CoT",
  "consistency": "Self-Consistency",
  "knowledge": "Generate Knowledge Prompting",
  "ape": "Automatic Prompt Engineer",
  "activeprompt": "Active-Prompt",
  "dsp": "Directional Stimulus Prompting",
  "react": "ReAct",
  "multimodalcot": "Multimodal CoT",
  "graph": "Graph Prompting"
}
@ -0,0 +1,11 @@
# Active-Prompt

import { Callout, FileTree } from 'nextra-theme-docs'
import {Screenshot} from 'components/screenshot'
import ACTIVE from '../../img/active-prompt.png'

Chain-of-thought (CoT) methods rely on a fixed set of human-annotated exemplars. The problem with this is that the exemplars might not be the most effective examples for the different tasks. To address this, [Diao et al., (2023)](https://arxiv.org/pdf/2302.12246.pdf) recently proposed a new prompting approach called Active-Prompt to adapt LLMs to different task-specific example prompts (annotated with human-designed CoT reasoning).

Below is an illustration of the approach. The first step is to query the LLM with or without a few CoT examples. *k* possible answers are generated for a set of training questions. An uncertainty metric is calculated based on the *k* answers (disagreement is used). The most uncertain questions are selected for annotation by humans. The new annotated exemplars are then used to infer each question.

<Screenshot src={ACTIVE} alt="ACTIVE" />
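
To make the uncertainty step concrete, here is a minimal sketch of a disagreement-based metric; this is our own simplification for illustration, not the authors' implementation:

```python
# Sketch: disagreement-based uncertainty over k sampled answers per question.
# The question whose k answers disagree the most is selected for human annotation.
def disagreement(answers: list[str]) -> float:
    # Number of unique answers divided by k; higher means more uncertain.
    return len(set(answers)) / len(answers)

# k answers sampled from the LLM for two training questions (made-up values).
q1_answers = ["12", "12", "12", "12", "12"]
q2_answers = ["7", "9", "7", "12", "3"]

scores = {"q1": disagreement(q1_answers), "q2": disagreement(q2_answers)}
most_uncertain = max(scores, key=scores.get)
print(most_uncertain)  # "q2" would be sent to humans for CoT annotation
```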
@ -0,0 +1,24 @@
# Automatic Prompt Engineer (APE)

import { Callout, FileTree } from 'nextra-theme-docs'
import {Screenshot} from 'components/screenshot'
import APE from '../../img/APE.png'
import APECOT from '../../img/ape-zero-shot-cot.png'

<Screenshot src={APE} alt="APE" />

[Zhou et al., (2022)](https://arxiv.org/abs/2211.01910) propose automatic prompt engineer (APE), a framework for automatic instruction generation and selection. The instruction generation problem is framed as natural language synthesis and addressed as a black-box optimization problem, using LLMs to generate and search over candidate solutions.

The first step involves a large language model (as an inference model) that is given output demonstrations to generate instruction candidates for a task. These candidate solutions guide the search procedure. The instructions are executed using a target model, and then the most appropriate instruction is selected based on computed evaluation scores.
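
A minimal sketch of this generate-and-select loop, heavily simplified from the paper (the candidate instructions, dev set, and scoring here are our own illustrations, and we assume the pre-1.0 `openai` package):

```python
import openai

candidates = [
    "Let's think step by step.",
    "Let's work this out in a step by step way to be sure we have the right answer.",
]
dev_set = [("Q: What is 13 + 29?", "42")]  # (question, gold answer) -- made up

def score(instruction: str) -> float:
    # Execute each candidate instruction with the target model and
    # measure accuracy on a small dev set.
    correct = 0
    for question, gold in dev_set:
        text = openai.Completion.create(
            model="text-davinci-003",
            prompt=f"{question}\n{instruction}\n",
            temperature=0,
            max_tokens=64,
        )["choices"][0]["text"]
        correct += gold in text
    return correct / len(dev_set)

best = max(candidates, key=score)  # the selected instruction
print(best)
```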

APE discovers a better zero-shot CoT prompt than the human-engineered "Let's think step by step" prompt (Kojima et al., 2022).

The prompt "Let's work this out in a step by step way to be sure we have the right answer." elicits chain-of-thought reasoning and improves performance on the MultiArith and GSM8K benchmarks:

<Screenshot src={APECOT} alt="APECOT" />

This paper touches on an important topic related to prompt engineering, which is the idea of automatically optimizing prompts. While we don't go deep into this topic in this guide, here are a few key papers if you are interested in the topic:

- [AutoPrompt](https://arxiv.org/abs/2010.15980) - proposes an approach to automatically create prompts for a diverse set of tasks based on gradient-guided search.
- [Prefix Tuning](https://arxiv.org/abs/2101.00190) - a lightweight alternative to fine-tuning that prepends a trainable continuous prefix for NLG tasks.
- [Prompt Tuning](https://arxiv.org/abs/2104.08691) - proposes a mechanism for learning soft prompts through backpropagation.
@ -0,0 +1,44 @@
# Chain-of-Thought Prompting

Introduced in [Wei et al. (2022)](https://arxiv.org/abs/2201.11903), chain-of-thought (CoT) prompting enables complex reasoning capabilities through intermediate reasoning steps. You can combine it with few-shot prompting to get better results on more complex tasks that require reasoning before responding.

*Prompt:*
```
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.

The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: Adding all the odd numbers (17, 19) gives 36. The answer is True.

The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: Adding all the odd numbers (11, 13) gives 24. The answer is True.

The odd numbers in this group add up to an even number: 17, 9, 10, 12, 13, 4, 2.
A: Adding all the odd numbers (17, 9, 13) gives 39. The answer is False.

The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
```

*Output:*
```
Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.
```

Wow! We can see a perfect result when we provided the reasoning step. In fact, we can solve this task by providing even fewer examples, i.e., just one example seems enough:

*Prompt:*
```
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.

The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
```

*Output:*
```
Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.
```

Keep in mind that the authors claim that this is an emergent ability that arises with sufficiently large language models.
@ -0,0 +1,15 @@
# Directional Stimulus Prompting

import { Callout, FileTree } from 'nextra-theme-docs'
import {Screenshot} from 'components/screenshot'
import DSP from '../../img/dsp.jpeg'

[Li et al., (2023)](https://arxiv.org/abs/2302.11520) propose a new prompting technique to better guide the LLM in generating the desired summary.

A tuneable policy LM is trained to generate the stimulus/hint. We are seeing more use of RL to optimize LLMs.

The figure below shows how Directional Stimulus Prompting compares with standard prompting. The policy LM can be small and optimized to generate the hints that guide a black-box frozen LLM.

<Screenshot src={DSP} alt="DSP" />

Full example coming soon!
@ -0,0 +1,107 @@
# Few-Shot Prompting

While large language models already demonstrate remarkable zero-shot capabilities, they still fall short on more complex tasks when using the zero-shot setting. To improve on this, few-shot prompting is used as a technique to enable in-context learning, where we provide demonstrations in the prompt to steer the model to better performance. The demonstrations serve as conditioning for subsequent examples where we would like the model to generate a response.

Let's demonstrate few-shot prompting via an example that was presented in [Brown et al. 2020](https://arxiv.org/abs/2005.14165). In the example, the task is to correctly use a new word in a sentence.

*Prompt:*
```
A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses
the word whatpu is:
We were traveling in Africa and we saw these very cute whatpus.
To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses
the word farduddle is:
```

*Output:*
```
When we won the game, we all started to farduddle in celebration.
```

We can clearly observe that the model has somehow learned how to perform the task by providing it just one example (i.e., 1-shot). For more difficult tasks, we can experiment with increasing the demonstrations (e.g., 3-shot, 5-shot, 10-shot, etc.).

Following the findings from [Min et al. (2022)](https://arxiv.org/abs/2202.12837), here are a few more tips about demonstrations/exemplars when doing few-shot:

- "the label space and the distribution of the input text specified by the demonstrations are both key (regardless of whether the labels are correct for individual inputs)"
- the format you use also plays a key role in performance; even if you just use random labels, this is much better than no labels at all.
- additional results show that selecting random labels from a true distribution of labels (instead of a uniform distribution) also helps.

Let's try out a few examples. Let's first try an example with random labels (meaning the labels Negative and Positive are randomly assigned to the inputs):

*Prompt:*
```
This is awesome! // Negative
This is bad! // Positive
Wow that movie was rad! // Positive
What a horrible show! //
```

*Output:*
```
Negative
```

We still get the correct answer, even though the labels have been randomized. Note that we also kept the format, which helps too. In fact, with further experimentation, it seems the newer GPT models we are experimenting with are becoming more robust even to random formats. Example:

*Prompt:*
```
Positive This is awesome!
This is bad! Negative
Wow that movie was rad!
Positive
What a horrible show! --
```

*Output:*
```
Negative
```

There is no consistency in the format above, but the model still predicted the correct label. We have to conduct a more thorough analysis to confirm if this holds true for different and more complex tasks, including different variations of prompts.
### Limitations of Few-shot Prompting

Standard few-shot prompting works well for many tasks but is still not a perfect technique, especially when dealing with more complex reasoning tasks. Let's demonstrate why this is the case. Do you recall the previous example where we provided the following task:

```
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.

A:
```

If we try this again, the model outputs the following:

```
Yes, the odd numbers in this group add up to 107, which is an even number.
```

This is not the correct response, which not only highlights the limitations of these systems but also shows that there is a need for more advanced prompt engineering.

Let's try to add some examples to see if few-shot prompting improves the results.

*Prompt:*
```
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.

The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: The answer is True.

The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: The answer is True.

The odd numbers in this group add up to an even number: 17, 9, 10, 12, 13, 4, 2.
A: The answer is False.

The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
```

*Output:*
```
The answer is True.
```

That didn't work. It seems like few-shot prompting is not enough to get reliable responses for this type of reasoning problem. The example above provides basic information on the task. If you take a closer look, the type of task we have introduced involves a few more reasoning steps. In other words, it might help if we break the problem down into steps and demonstrate that to the model. More recently, [chain-of-thought (CoT) prompting](https://arxiv.org/abs/2201.11903) has been popularized to address more complex arithmetic, commonsense, and symbolic reasoning tasks.

Overall, it seems that providing examples is useful for solving some tasks. When zero-shot prompting and few-shot prompting are not sufficient, it might mean that whatever the model learned isn't enough to do well at the task. From here, it is recommended to start thinking about fine-tuning your own models or experimenting with more advanced prompting techniques. Up next, we talk about chain-of-thought prompting, which has gained a lot of popularity.
@ -0,0 +1,5 @@
# GraphPrompts

[Liu et al., 2023](https://arxiv.org/abs/2302.08043) introduce GraphPrompt, a new prompting framework for graphs to improve performance on downstream tasks.

More coming soon!
@ -0,0 +1,5 @@
# Prompting Techniques

By this point, it should be obvious that it helps to improve prompts to get better results on different tasks. That's the whole idea behind prompt engineering.

While those examples were fun, let's cover a few concepts more formally before we jump into more advanced concepts.
@ -0,0 +1,14 @@
# Multimodal CoT Prompting

import { Callout, FileTree } from 'nextra-theme-docs'
import {Screenshot} from 'components/screenshot'
import MCOT from '../../img/multimodal-cot.png'

[Zhang et al. (2023)](https://arxiv.org/abs/2302.00923) recently proposed a multimodal chain-of-thought prompting approach. Traditional CoT focuses on the language modality. In contrast, Multimodal CoT incorporates text and vision into a two-stage framework. The first step involves rationale generation based on multimodal information. This is followed by the second phase, answer inference, which leverages the informative generated rationales.

The multimodal CoT model (1B) outperforms GPT-3.5 on the ScienceQA benchmark.

<Screenshot src={MCOT} alt="MCOT" />

Further reading:
- [Language Is Not All You Need: Aligning Perception with Language Models](https://arxiv.org/abs/2302.14045) (Feb 2023)
@ -0,0 +1,13 @@
# ReAct

import { Callout, FileTree } from 'nextra-theme-docs'
import {Screenshot} from 'components/screenshot'
import REACT from '../../img/react.png'

[Yao et al., 2022](https://arxiv.org/abs/2210.03629) introduced a framework where LLMs are used to generate both reasoning traces and task-specific actions in an interleaved manner. Generating reasoning traces allows the model to induce, track, and update action plans, and even handle exceptions. The action step allows the model to interface with and gather information from external sources such as knowledge bases or environments.

The ReAct framework can allow LLMs to interact with external tools to retrieve additional information that leads to more reliable and factual responses.

<Screenshot src={REACT} alt="REACT" />

Full example coming soon!
@ -0,0 +1,32 @@
# Zero-Shot CoT

One recent idea is [zero-shot CoT](https://arxiv.org/abs/2205.11916) (Kojima et al. 2022), which essentially involves adding "Let's think step by step" to the original prompt. Let's try a simple problem and see how the model performs:

*Prompt:*
```
I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with?
```

*Output:*
```
11 apples
```

The answer is incorrect! Now let's try with the special prompt.

*Prompt:*
```
I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with?

Let's think step by step.
```

*Output:*
```
First, you started with 10 apples.
You gave away 2 apples to the neighbor and 2 to the repairman, so you had 6 apples left.
Then you bought 5 more apples, so now you had 11 apples.
Finally, you ate 1 apple, so you would remain with 10 apples.
```

It's actually impressive that this simple prompt is effective at this task. This is particularly useful when you don't have too many examples to use in the prompt.
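
Programmatically, zero-shot CoT is just string concatenation; here is a minimal sketch assuming the pre-1.0 `openai` package:

```python
# Sketch: zero-shot CoT by appending the trigger phrase to any prompt.
import openai

question = (
    "I went to the market and bought 10 apples. I gave 2 apples to the "
    "neighbor and 2 to the repairman. I then went and bought 5 more apples "
    "and ate 1. How many apples did I remain with?"
)

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=question + "\n\nLet's think step by step.",
    temperature=0.7,
)
print(response["choices"][0]["text"])
```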
@ -0,0 +1,17 @@
# Zero-Shot Prompting

LLMs today, trained on large amounts of data and tuned to follow instructions, are capable of performing tasks zero-shot. We actually tried a few zero-shot examples in the previous section. Here is one of the examples we used:

*Prompt:*
```
Classify the text into neutral, negative or positive.

Text: I think the vacation is okay.
Sentiment:
```

*Output:*
```
Neutral
```

Note that in the prompt above we didn't provide the model with any examples -- that's the zero-shot capability at work. When zero-shot doesn't work, it's recommended to provide demonstrations or examples in the prompt. Below we discuss the approach known as few-shot prompting.