diff --git a/img/flan-1.png b/img/flan-1.png
new file mode 100644
index 0000000..c702532
Binary files /dev/null and b/img/flan-1.png differ
diff --git a/img/flan-10.png b/img/flan-10.png
new file mode 100644
index 0000000..a710bed
Binary files /dev/null and b/img/flan-10.png differ
diff --git a/img/flan-11.png b/img/flan-11.png
new file mode 100644
index 0000000..7e93066
Binary files /dev/null and b/img/flan-11.png differ
diff --git a/img/flan-2.png b/img/flan-2.png
new file mode 100644
index 0000000..2ff3eef
Binary files /dev/null and b/img/flan-2.png differ
diff --git a/img/flan-3.png b/img/flan-3.png
new file mode 100644
index 0000000..5b3d8a0
Binary files /dev/null and b/img/flan-3.png differ
diff --git a/img/flan-4.png b/img/flan-4.png
new file mode 100644
index 0000000..5713310
Binary files /dev/null and b/img/flan-4.png differ
diff --git a/img/flan-5.png b/img/flan-5.png
new file mode 100644
index 0000000..1ddb2b9
Binary files /dev/null and b/img/flan-5.png differ
diff --git a/img/flan-6.png b/img/flan-6.png
new file mode 100644
index 0000000..65c232b
Binary files /dev/null and b/img/flan-6.png differ
diff --git a/img/flan-7.png b/img/flan-7.png
new file mode 100644
index 0000000..878ef38
Binary files /dev/null and b/img/flan-7.png differ
diff --git a/img/flan-8.png b/img/flan-8.png
new file mode 100644
index 0000000..a59ddcc
Binary files /dev/null and b/img/flan-8.png differ
diff --git a/img/flan-9.png b/img/flan-9.png
new file mode 100644
index 0000000..d3628b1
Binary files /dev/null and b/img/flan-9.png differ
diff --git a/pages/_meta.json b/pages/_meta.json
index f4f2c4f..5da1eb5 100644
--- a/pages/_meta.json
+++ b/pages/_meta.json
@@ -7,6 +7,7 @@
   "risks": "Risks & Misuses",
   "papers": "Papers",
   "tools": "Tools",
+  "notebooks": "Notebooks",
   "datasets": "Datasets",
   "readings": "Additional Readings",
   "about": {
diff --git a/pages/index.mdx b/pages/index.mdx
index 02e98f0..b442611 100644
--- a/pages/index.mdx
+++ b/pages/index.mdx
@@ -1,7 +1,9 @@
 # Prompt Engineering Guide
 
-Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs).
+Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs). Researchers use prompt engineering to improve the capabilities of LLMs on a wide range of common and complex tasks such as question answering and arithmetic reasoning. Developers use prompt engineering to design robust and effective prompting techniques that interface with LLMs and other tools.
 
-Motivated by the high interest in developing with LLMs, we have created this new prompt engineering guide that contains all the latest papers, learning guides, lectures, references, and tools related to prompt engineering.
\ No newline at end of file
+Prompt engineering is not just about designing and developing prompts. It encompasses a wide range of skills and techniques that are useful for interacting and developing with LLMs. It is an important skill for interfacing with, building with, and understanding the capabilities of LLMs. You can use prompt engineering to improve the safety of LLMs and build new capabilities like augmenting LLMs with domain knowledge and external tools.
+
+Motivated by the high interest in developing with LLMs, we have created this new prompt engineering guide that contains all the latest papers, learning guides, models, lectures, references, new LLM capabilities, and tools related to prompt engineering.
\ No newline at end of file
diff --git a/pages/models.mdx b/pages/models.mdx
index 0a23bba..657acf1 100644
--- a/pages/models.mdx
+++ b/pages/models.mdx
@@ -2,7 +2,7 @@
 
 import { Callout } from 'nextra-theme-docs'
 
-In this section, we will cover capabilities of some of the recent language models by applying the latest and most advanced prompting engineering techniques.
+In this section, we will cover some of the recent language models and how they successfully apply the latest and most advanced prompt engineering techniques. In addition, we cover capabilities of these models on a range of tasks and prompting setups like few-shot prompting, zero-shot prompting, and chain-of-thought prompting. Understanding these capabilities is important for understanding the limitations of these models and how to use them effectively.
 
 This section is under heavy development.
 
diff --git a/pages/models/_meta.json b/pages/models/_meta.json
index 6d981f4..371a906 100644
--- a/pages/models/_meta.json
+++ b/pages/models/_meta.json
@@ -1,4 +1,6 @@
 {
-  "chatgpt": "ChatGPT"
+  "chatgpt": "ChatGPT",
+  "flan": "Flan",
+  "gpt-4": "GPT-4"
 }
\ No newline at end of file
diff --git a/pages/models/chatgpt.mdx b/pages/models/chatgpt.mdx
index 4e302c6..76e521b 100644
--- a/pages/models/chatgpt.mdx
+++ b/pages/models/chatgpt.mdx
@@ -145,6 +145,9 @@ The current recommendation for `gpt-3.5-turbo-0301` is to add instructions in th
 
 ---
 ## References
+- [Consistency Analysis of ChatGPT](https://arxiv.org/abs/2303.06273) (Mar 2023)
+- [Algorithmic Ghost in the Research Shell: Large Language Models and Academic Knowledge Creation in Management Research](https://arxiv.org/abs/2303.07304) (Mar 2023)
+- [Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification](https://arxiv.org/abs/2303.07142) (Mar 2023)
 - [Seeing ChatGPT Through Students' Eyes: An Analysis of TikTok Data](https://arxiv.org/abs/2303.05349) (March 2023)
 - [Extracting Accurate Materials Data from Research Papers with Conversational Language Models and Prompt Engineering -- Example of ChatGPT](https://arxiv.org/abs/2303.05352) (Mar 2023)
 - [ChatGPT is on the horizon: Could a large language model be all we need for Intelligent Transportation?](https://arxiv.org/abs/2303.05382) (Mar 2023)
diff --git a/pages/models/flan.mdx b/pages/models/flan.mdx
new file mode 100644
index 0000000..524473f
--- /dev/null
+++ b/pages/models/flan.mdx
@@ -0,0 +1,83 @@
+# Scaling Instruction-Finetuned Language Models
+
+import {Screenshot} from 'components/screenshot'
+import FLAN1 from '../../img/flan-1.png'
+import FLAN2 from '../../img/flan-2.png'
+import FLAN3 from '../../img/flan-3.png'
+import FLAN4 from '../../img/flan-4.png'
+import FLAN5 from '../../img/flan-5.png'
+import FLAN6 from '../../img/flan-6.png'
+import FLAN7 from '../../img/flan-7.png'
+import FLAN8 from '../../img/flan-8.png'
+import FLAN9 from '../../img/flan-9.png'
+import FLAN10 from '../../img/flan-10.png'
+import FLAN11 from '../../img/flan-11.png'
+
+## What's new?
+
+<Screenshot src={FLAN1} alt="FLAN1" />
+Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)
+
+This paper explores the benefits of scaling instruction finetuning and how it improves performance on a variety of models (PaLM, T5), prompting setups (zero-shot, few-shot, CoT), and benchmarks (MMLU, TyDiQA). The following aspects are explored: scaling the number of tasks (1.8K tasks), scaling model size, and finetuning on chain-of-thought data (9 datasets used).
+
+**Finetuning procedure:**
+- 1.8K tasks were phrased as instructions and used to finetune the model
+- Finetuning uses setups both with and without exemplars, and with and without CoT
+
+Finetuning tasks and held-out tasks are shown below:
+
+<Screenshot src={FLAN2} alt="FLAN2" />
+
+## Capabilities & Key Results
+
+- Instruction finetuning scales well with the number of tasks and the size of the model; this suggests the need for scaling the number of tasks and the size of the model further
+- Adding CoT datasets into the finetuning mixture enables good performance on reasoning tasks
+- Flan-PaLM has improved multilingual abilities; 14.9% improvement on one-shot TyDiQA; 8.1% improvement on arithmetic reasoning in under-represented languages
+- Flan-PaLM also performs well on open-ended generation questions, which is a good indicator for improved usability
+- Improves performance across responsible AI (RAI) benchmarks
+- Flan-T5 instruction-tuned models demonstrate strong few-shot capabilities and outperform public checkpoints such as T5
+
+**The results when scaling the number of finetuning tasks and model size:** scaling both the size of the model and the number of finetuning tasks is expected to continue improving performance, although scaling the number of tasks has diminishing returns.
+
+<Screenshot src={FLAN3} alt="FLAN3" />
+Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)
+
+**The results when finetuning with non-CoT and CoT data:** Jointly finetuning on non-CoT and CoT data improves performance on both evaluations, compared to finetuning on just one or the other.
+
+<Screenshot src={FLAN4} alt="FLAN4" />
+Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)
+
+In addition, self-consistency combined with CoT achieves SoTA results on several benchmarks. CoT + self-consistency also significantly improves results on benchmarks involving math problems (e.g., MGSM, GSM8K).
+
+<Screenshot src={FLAN5} alt="FLAN5" />
+Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)
+
+CoT finetuning unlocks zero-shot reasoning, activated by the phrase "let's think step-by-step", on BIG-Bench tasks. In general, zero-shot CoT Flan-PaLM outperforms zero-shot CoT PaLM without finetuning.
+
+<Screenshot src={FLAN6} alt="FLAN6" />
+Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)
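+
+To make the zero-shot CoT setup concrete, the prompt simply appends a reasoning trigger phrase to the task. The example below is our own illustration and is not taken from the paper:
+
+```
+Q: A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?
+
+A: Let's think step-by-step.
+```
+
+An instruction-finetuned model like Flan-PaLM is expected to continue with intermediate reasoning steps before giving a final answer, which is the behavior the BIG-Bench evaluation above relies on.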
+
+Below are some demonstrations of zero-shot CoT for PaLM and Flan-PaLM on unseen tasks.
+
+<Screenshot src={FLAN7} alt="FLAN7" />
+Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)
+
+Below are more examples of zero-shot prompting. They show how the PaLM model struggles with repetitions and with not replying to instructions in the zero-shot setting, whereas Flan-PaLM is able to perform well. Few-shot exemplars can mitigate these errors.
+
+<Screenshot src={FLAN8} alt="FLAN8" />
+Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)
+
+Below are some examples demonstrating more zero-shot capabilities of the Flan-PaLM model on several different types of challenging open-ended questions:
+
+<Screenshot src={FLAN9} alt="FLAN9" />
+Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)
+
+<Screenshot src={FLAN10} alt="FLAN10" />
+Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)
+
+<Screenshot src={FLAN11} alt="FLAN11" />
+Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)
+
+You can try [Flan-T5 models on the Hugging Face Hub](https://huggingface.co/google/flan-t5-xxl).
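+
+As a starting point, here is a minimal sketch (our own illustration, not from the paper) of loading a Flan-T5 checkpoint with the Hugging Face `transformers` library and running a zero-shot instruction prompt; the checkpoint name and generation settings are illustrative, and smaller checkpoints such as `google/flan-t5-base` follow the same pattern.
+
+```python
+# Minimal sketch: zero-shot instruction prompting with Flan-T5 via transformers.
+# Assumes `transformers` and `torch` are installed; flan-t5-xxl is large, so a
+# smaller checkpoint like google/flan-t5-base can be swapped in for testing.
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+model_name = "google/flan-t5-xxl"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
+
+prompt = "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have? Let's think step by step."
+
+# Encode the instruction, generate a continuation, and decode it back to text.
+inputs = tokenizer(prompt, return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=100)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```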
\ No newline at end of file
diff --git a/pages/models/gpt-4.mdx b/pages/models/gpt-4.mdx
new file mode 100644
index 0000000..e69de29
diff --git a/pages/notebooks.mdx b/pages/notebooks.mdx
new file mode 100644
index 0000000..7433a5c
--- /dev/null
+++ b/pages/notebooks.mdx
@@ -0,0 +1,11 @@
+# Prompt Engineering Notebooks
+
+This section contains a collection of notebooks we have designed to help you get started with prompt engineering. More to be added soon!
+
+| Description | Notebook |
+| :------------ | :---------: |
+|Learn how to perform many different types of common tasks using the `openai` and `LangChain` libraries|[Getting Started with Prompt Engineering](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/pe-lecture.ipynb)|
+|Learn how to use code as reasoning for solving common tasks using the Python interpreter in combination with the language model.|[Program-Aided Language Model](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/pe-pal.ipynb)|
+|Learn more about how to make calls to the ChatGPT APIs using the `openai` library.|[ChatGPT API Intro](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/pe-chatgpt-intro.ipynb)|
+|Learn how to use ChatGPT features using the `LangChain` library.|[ChatGPT API with LangChain](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/pe-chatgpt-langchain.ipynb)|
+|Learn about adversarial prompting, including defensive measures.|[Adversarial Prompt Engineering](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/pe-chatgpt-adversarial.ipynb)|
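+
+For a quick preview of what the notebooks cover, below is a minimal sketch (illustrative, not taken from any of the notebooks) of a single ChatGPT API call with the `openai` Python library as of the `gpt-3.5-turbo` release (pre-v1 SDK); it assumes the `OPENAI_API_KEY` environment variable is set.
+
+```python
+# Minimal sketch: one chat completion request with the openai library (pre-v1 SDK).
+import os
+import openai
+
+openai.api_key = os.getenv("OPENAI_API_KEY")
+
+response = openai.ChatCompletion.create(
+    model="gpt-3.5-turbo",
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Explain prompt engineering in one sentence."},
+    ],
+    temperature=0.7,
+)
+
+# The assistant's reply is in the first choice's message content.
+print(response["choices"][0]["message"]["content"])
+```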
diff --git a/pages/papers.mdx b/pages/papers.mdx
index 788d1ed..4d476f3 100644
--- a/pages/papers.mdx
+++ b/pages/papers.mdx
@@ -13,6 +13,7 @@ The following are the latest papers (sorted by release date) on prompt engineeri
   - [Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing](https://arxiv.org/abs/2107.13586) (Jul 2021)
 
 - Approaches/Techniques:
+  - [Model-tuning Via Prompts Makes NLP Models Adversarially Robust](https://arxiv.org/abs/2303.07320) (Mar 2023)
   - [Structure Pretraining and Prompt Tuning for Knowledge Graph Transfer](https://arxiv.org/abs/2303.03922) (March 2023)
   - [CoTEVer: Chain of Thought Prompting Annotation Toolkit for Explanation Verification](https://arxiv.org/abs/2303.03628) (March 2023)
   - [Larger language models do in-context learning differently](https://arxiv.org/abs/2303.03846) (March 2023)
diff --git a/pages/risks/adversarial.mdx b/pages/risks/adversarial.mdx
index 5b1a9b0..d059902 100644
--- a/pages/risks/adversarial.mdx
+++ b/pages/risks/adversarial.mdx
@@ -257,6 +257,7 @@ More recently, ChatGPT came into the scene. For many of the attacks that we trie
 
 ## References
 
+- [Model-tuning Via Prompts Makes NLP Models Adversarially Robust](https://arxiv.org/abs/2303.07320) (Mar 2023)
 - [Can AI really be protected from text-based attacks?](https://techcrunch.com/2023/02/24/can-language-models-really-be-protected-from-text-based-attacks/) (Feb 2023)
 - [Hands-on with Bing’s new ChatGPT-like features](https://techcrunch.com/2023/02/08/hands-on-with-the-new-bing/) (Feb 2023)
 - [Using GPT-Eliezer against ChatGPT Jailbreaking](https://www.alignmentforum.org/posts/pNcFYZnPdXyL2RfgA/using-gpt-eliezer-against-chatgpt-jailbreaking) (Dec 2022)