New changes

```diff
@@ -1,4 +1,6 @@
 {
-  "chatgpt": "ChatGPT"
+  "chatgpt": "ChatGPT",
+  "flan": "Flan",
+  "gpt-4": "GPT-4"
 }
```

@@ -0,0 +1,83 @@

# Scaling Instruction-Finetuned Language Models

import {Screenshot} from 'components/screenshot'
import FLAN1 from '../../img/flan-1.png'
import FLAN2 from '../../img/flan-2.png'
import FLAN3 from '../../img/flan-3.png'
import FLAN4 from '../../img/flan-4.png'
import FLAN5 from '../../img/flan-5.png'
import FLAN6 from '../../img/flan-6.png'
import FLAN7 from '../../img/flan-7.png'
import FLAN8 from '../../img/flan-8.png'
import FLAN9 from '../../img/flan-9.png'
import FLAN10 from '../../img/flan-10.png'
import FLAN11 from '../../img/flan-11.png'

## What's new?

<Screenshot src={FLAN1} alt="FLAN1" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

This paper explores the benefits of scaling instruction finetuning and how it improves performance across a variety of models (PaLM, T5), prompting setups (zero-shot, few-shot, CoT), and benchmarks (MMLU, TyDiQA). Three aspects are explored: scaling the number of tasks (1.8K tasks), scaling model size, and finetuning on chain-of-thought (CoT) data (9 datasets used).

**Finetuning procedure:**
- 1.8K tasks were phrased as instructions and used to finetune the model
- The mixture includes prompts both with and without exemplars, and both with and without CoT; the sketch below illustrates the four resulting formats
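
The paper's exact templates are not reproduced here; as an illustration only, the four input formats combine an instruction, optional exemplars, and an optional CoT trigger roughly like this:

```python
# Hypothetical templates illustrating the four finetuning prompt formats;
# the actual Flan templates are phrased differently.
question = "The cafeteria had 23 apples. They used 20 and bought 6 more. How many are left?"

exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3. How many does he have?\n"
    "A: 11"
)
exemplar_cot = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3. How many does he have?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 is 6 balls. 5 + 6 = 11. The answer is 11."
)

zero_shot     = f"Answer the following question.\n\nQ: {question}\nA:"
zero_shot_cot = f"Answer the following question by reasoning step-by-step.\n\nQ: {question}\nA:"
few_shot      = f"{exemplar}\n\nQ: {question}\nA:"
few_shot_cot  = f"{exemplar_cot}\n\nQ: {question}\nA:"
```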

The finetuning tasks and held-out tasks are shown below:

<Screenshot src={FLAN11} alt="FLAN11" />

## Capabilities & Key Results

- Instruction finetuning scales well with both the number of tasks and the size of the model, which suggests that scaling both further should keep improving performance
- Adding CoT datasets into the finetuning mixture enables good performance on reasoning tasks
- Flan-PaLM has improved multilingual abilities: a 14.9% improvement on one-shot TyDiQA and an 8.1% improvement on arithmetic reasoning in under-represented languages
- Flan-PaLM also performs well on open-ended generation questions, which is a good indicator of improved usability
- Performance improves across responsible AI (RAI) benchmarks
- Flan-T5 instruction-tuned models demonstrate strong few-shot capabilities and outperform public checkpoints such as T5

**Results when scaling the number of finetuning tasks and model size:** scaling both the size of the model and the number of finetuning tasks is expected to continue improving performance, although scaling the number of tasks shows diminishing returns.

<Screenshot src={FLAN2} alt="FLAN2" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

**Results when finetuning with non-CoT and CoT data:** jointly finetuning on non-CoT and CoT data improves performance on both kinds of evaluations, compared to finetuning on only one or the other.

<Screenshot src={FLAN3} alt="FLAN3" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

In addition, self-consistency combined with CoT achieves SoTA results on several benchmarks. CoT + self-consistency also significantly improves results on benchmarks involving math problems (e.g., MGSM, GSM8K); a sketch of the decoding procedure follows.
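
Self-consistency is simple to sketch: sample several chain-of-thought completions at a nonzero temperature and take a majority vote over the final answers. A minimal illustration, assuming a hypothetical `generate(prompt, temperature)` function that returns one model completion:

```python
import re
from collections import Counter

def generate(prompt: str, temperature: float) -> str:
    """Hypothetical call to a language model; plug in your own."""
    raise NotImplementedError

def self_consistency(prompt: str, n_samples: int = 5) -> str:
    # Sample several diverse chain-of-thought completions...
    completions = [generate(prompt, temperature=0.7) for _ in range(n_samples)]
    # ...extract the final numeric answer from each completion...
    answers = []
    for text in completions:
        numbers = re.findall(r"-?\d+\.?\d*", text)
        if numbers:
            answers.append(numbers[-1])  # treat the last number as the answer
    # ...and return the most frequent answer (majority vote).
    return Counter(answers).most_common(1)[0][0]
```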

<Screenshot src={FLAN4} alt="FLAN4" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

CoT finetuning unlocks zero-shot reasoning, activated by the phrase "let's think step-by-step", on BIG-Bench tasks. In general, zero-shot CoT Flan-PaLM outperforms zero-shot CoT PaLM without finetuning.
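
For reference, a zero-shot CoT prompt simply appends the trigger phrase to the question; the example below is illustrative and not taken from the paper:

```
Q: A juggler has 16 balls. Half of the balls are golf balls, and half
of the golf balls are blue. How many blue golf balls are there?
A: Let's think step-by-step.
```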

<Screenshot src={FLAN6} alt="FLAN6" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

Below are some demonstrations of zero-shot CoT for PaLM and Flan-PaLM on unseen tasks.

<Screenshot src={FLAN5} alt="FLAN5" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

Below are more examples of zero-shot prompting. They show how the PaLM model struggles with repetition and with failing to follow instructions in the zero-shot setting, where Flan-PaLM performs well. Few-shot exemplars can mitigate these errors.

<Screenshot src={FLAN7} alt="FLAN7" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

Below are some examples demonstrating more zero-shot capabilities of the Flan-PaLM model on several different types of challenging open-ended questions:

<Screenshot src={FLAN8} alt="FLAN8" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

<Screenshot src={FLAN9} alt="FLAN9" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

<Screenshot src={FLAN10} alt="FLAN10" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

You can try [Flan-T5 models on the Hugging Face Hub](https://huggingface.co/google/flan-t5-xxl).
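
A minimal sketch of querying a Flan-T5 checkpoint with the `transformers` library (the small checkpoint is used here so it runs on modest hardware; substitute `google/flan-t5-xxl` if you have the memory):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

# Flan-T5 follows plain natural-language instructions without any exemplars.
prompt = "Translate to German: How old are you?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```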

@@ -0,0 +1,11 @@

# Prompt Engineering Notebooks

This section contains a collection of notebooks we have designed to help you get started with prompt engineering. More will be added soon!

| Description | Notebook |
| :------------ | :---------: |
|Learn how to perform many different types of common tasks using the `openai` and `LangChain` libraries|[Getting Started with Prompt Engineering](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/pe-lecture.ipynb)|
|Learn how to use code as reasoning for solving common tasks using the Python interpreter in combination with the language model.|[Program-Aided Language Model](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/pe-pal.ipynb)|
|Learn more about how to make calls to the ChatGPT APIs using the `openai` library (a minimal call is sketched after this table).|[ChatGPT API Intro](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/pe-chatgpt-intro.ipynb)|
|Learn how to use ChatGPT features using the `LangChain` library.|[ChatGPT API with LangChain](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/pe-chatgpt-langchain.ipynb)|
|Learn about adversarial prompting, including defensive measures.|[Adversarial Prompt Engineering](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/pe-chatgpt-adversarial.ipynb)|
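
As a taste of what the ChatGPT intro notebook covers, a minimal chat completion call with the pre-1.0 `openai` Python SDK looks roughly like this (the model name and messages are just examples):

```python
import openai

openai.api_key = "YOUR_API_KEY"  # replace with your own key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain prompt engineering in one sentence."},
    ],
    temperature=0,
)
print(response["choices"][0]["message"]["content"])
```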