New changes

```diff
@@ -1,4 +1,6 @@
 {
-  "chatgpt": "ChatGPT"
+  "chatgpt": "ChatGPT",
+  "flan": "Flan",
+  "gpt-4": "GPT-4"
 }
```

@@ -0,0 +1,83 @@

# Scaling Instruction-Finetuned Language Models

import {Screenshot} from 'components/screenshot'
import FLAN1 from '../../img/flan-1.png'
import FLAN2 from '../../img/flan-2.png'
import FLAN3 from '../../img/flan-3.png'
import FLAN4 from '../../img/flan-4.png'
import FLAN5 from '../../img/flan-5.png'
import FLAN6 from '../../img/flan-6.png'
import FLAN7 from '../../img/flan-7.png'
import FLAN8 from '../../img/flan-8.png'
import FLAN9 from '../../img/flan-9.png'
import FLAN10 from '../../img/flan-10.png'
import FLAN11 from '../../img/flan-11.png'

## What's new?

<Screenshot src={FLAN1} alt="FLAN1" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

This paper explores the benefits of scaling instruction finetuning and how it improves performance across a variety of models (PaLM, T5), prompting setups (zero-shot, few-shot, CoT), and benchmarks (MMLU, TyDiQA). Three aspects are explored: scaling the number of tasks (1.8K tasks), scaling model size, and finetuning on chain-of-thought (CoT) data (9 datasets used).

**Finetuning procedure:**
- 1.8K tasks were phrased as instructions and used to finetune the model
- The mixture includes prompts both with and without exemplars, and both with and without CoT; the sketch below illustrates the four resulting formats
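
The paper's exact templates are not reproduced here; as an illustration only, the four input formats combine an instruction, optional exemplars, and an optional CoT trigger roughly like this:

```python
# Hypothetical templates illustrating the four finetuning prompt formats;
# the actual Flan templates are phrased differently.
question = "The cafeteria had 23 apples. They used 20 and bought 6 more. How many are left?"

exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3. How many does he have?\n"
    "A: 11"
)
exemplar_cot = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3. How many does he have?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 is 6 balls. 5 + 6 = 11. The answer is 11."
)

zero_shot     = f"Answer the following question.\n\nQ: {question}\nA:"
zero_shot_cot = f"Answer the following question by reasoning step-by-step.\n\nQ: {question}\nA:"
few_shot      = f"{exemplar}\n\nQ: {question}\nA:"
few_shot_cot  = f"{exemplar_cot}\n\nQ: {question}\nA:"
```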

The finetuning tasks and held-out tasks are shown below:

<Screenshot src={FLAN11} alt="FLAN11" />

## Capabilities & Key Results

- Instruction finetuning scales well with both the number of tasks and the size of the model, which suggests that scaling both further should keep improving performance
- Adding CoT datasets into the finetuning mixture enables good performance on reasoning tasks
- Flan-PaLM has improved multilingual abilities: a 14.9% improvement on one-shot TyDiQA and an 8.1% improvement on arithmetic reasoning in under-represented languages
- Flan-PaLM also performs well on open-ended generation questions, which is a good indicator of improved usability
- Performance improves across responsible AI (RAI) benchmarks
- Flan-T5 instruction-tuned models demonstrate strong few-shot capabilities and outperform public checkpoints such as T5

**Results when scaling the number of finetuning tasks and model size:** scaling both the size of the model and the number of finetuning tasks is expected to continue improving performance, although scaling the number of tasks shows diminishing returns.

<Screenshot src={FLAN2} alt="FLAN2" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

**Results when finetuning with non-CoT and CoT data:** jointly finetuning on non-CoT and CoT data improves performance on both kinds of evaluations, compared to finetuning on only one or the other.

<Screenshot src={FLAN3} alt="FLAN3" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

In addition, self-consistency combined with CoT achieves SoTA results on several benchmarks. CoT + self-consistency also significantly improves results on benchmarks involving math problems (e.g., MGSM, GSM8K); a sketch of the decoding procedure follows.
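
Self-consistency is simple to sketch: sample several chain-of-thought completions at a nonzero temperature and take a majority vote over the final answers. A minimal illustration, assuming a hypothetical `generate(prompt, temperature)` function that returns one model completion:

```python
import re
from collections import Counter

def generate(prompt: str, temperature: float) -> str:
    """Hypothetical call to a language model; plug in your own."""
    raise NotImplementedError

def self_consistency(prompt: str, n_samples: int = 5) -> str:
    # Sample several diverse chain-of-thought completions...
    completions = [generate(prompt, temperature=0.7) for _ in range(n_samples)]
    # ...extract the final numeric answer from each completion...
    answers = []
    for text in completions:
        numbers = re.findall(r"-?\d+\.?\d*", text)
        if numbers:
            answers.append(numbers[-1])  # treat the last number as the answer
    # ...and return the most frequent answer (majority vote).
    return Counter(answers).most_common(1)[0][0]
```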

<Screenshot src={FLAN4} alt="FLAN4" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

CoT finetuning unlocks zero-shot reasoning, activated by the phrase "let's think step-by-step", on BIG-Bench tasks. In general, zero-shot CoT Flan-PaLM outperforms zero-shot CoT PaLM without finetuning.
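
For reference, a zero-shot CoT prompt simply appends the trigger phrase to the question; the example below is illustrative and not taken from the paper:

```
Q: A juggler has 16 balls. Half of the balls are golf balls, and half
of the golf balls are blue. How many blue golf balls are there?
A: Let's think step-by-step.
```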

<Screenshot src={FLAN6} alt="FLAN6" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

Below are some demonstrations of zero-shot CoT for PaLM and Flan-PaLM on unseen tasks.

<Screenshot src={FLAN5} alt="FLAN5" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

Below are more examples of zero-shot prompting. They show how the PaLM model struggles with repetition and with failing to follow instructions in the zero-shot setting, where Flan-PaLM performs well. Few-shot exemplars can mitigate these errors.

<Screenshot src={FLAN7} alt="FLAN7" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

Below are some examples demonstrating more zero-shot capabilities of the Flan-PaLM model on several different types of challenging open-ended questions:

<Screenshot src={FLAN8} alt="FLAN8" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

<Screenshot src={FLAN9} alt="FLAN9" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

<Screenshot src={FLAN10} alt="FLAN10" />
Image Source: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)

You can try [Flan-T5 models on the Hugging Face Hub](https://huggingface.co/google/flan-t5-xxl).
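
A minimal sketch of querying a Flan-T5 checkpoint with the `transformers` library (the small checkpoint is used here so it runs on modest hardware; substitute `google/flan-t5-xxl` if you have the memory):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

# Flan-T5 follows plain natural-language instructions without any exemplars.
prompt = "Translate to German: How old are you?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```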

@@ -0,0 +1,11 @@

# Prompt Engineering Notebooks

This section contains a collection of notebooks we have designed to help you get started with prompt engineering. More will be added soon!

| Description | Notebook |
| :------------ | :---------: |
|Learn how to perform many different types of common tasks using the `openai` and `LangChain` libraries|[Getting Started with Prompt Engineering](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/pe-lecture.ipynb)|
|Learn how to use code as reasoning for solving common tasks using the Python interpreter in combination with the language model.|[Program-Aided Language Model](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/pe-pal.ipynb)|
|Learn more about how to make calls to the ChatGPT APIs using the `openai` library (a minimal call is sketched after this table).|[ChatGPT API Intro](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/pe-chatgpt-intro.ipynb)|
|Learn how to use ChatGPT features using the `LangChain` library.|[ChatGPT API with LangChain](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/pe-chatgpt-langchain.ipynb)|
|Learn about adversarial prompting, including defensive measures.|[Adversarial Prompt Engineering](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/pe-chatgpt-adversarial.ipynb)|
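
As a taste of what the ChatGPT intro notebook covers, a minimal chat completion call with the pre-1.0 `openai` Python SDK looks roughly like this (the model name and messages are just examples):

```python
import openai

openai.api_key = "YOUR_API_KEY"  # replace with your own key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain prompt engineering in one sentence."},
    ],
    temperature=0,
)
print(response["choices"][0]["message"]["content"])
```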