From efc2edb156b92e786e87f7f2213cb08a6c60e8b3 Mon Sep 17 00:00:00 2001 From: Elvis Saravia Date: Mon, 3 Apr 2023 22:28:20 -0600 Subject: [PATCH] added collection of models --- pages/models/collection.en.mdx | 68 ++++++++++++++++++++++++++-------- pages/models/collection.zh.mdx | 63 ++++++++++++++++++++++++++++++- pages/models/llama.en.mdx | 3 ++ pages/papers.en.mdx | 1 + pages/readings.en.mdx | 1 + 5 files changed, 120 insertions(+), 16 deletions(-) diff --git a/pages/models/collection.en.mdx b/pages/models/collection.en.mdx index a32bdfd..62818ff 100644 --- a/pages/models/collection.en.mdx +++ b/pages/models/collection.en.mdx @@ -6,22 +6,60 @@ import { Callout, FileTree } from 'nextra-theme-docs' This section is under heavy development. -This section consists of a collection and summary of notable and foundational LLMs. - +This section consists of a collection and summary of notable and foundational LLMs. (Data adopted from [Papers with Code](https://paperswithcode.com/methods/category/language-models) and the recent work by [Zhao et al. (2023)](https://arxiv.org/pdf/2303.18223.pdf). ## Models -| Model | Description | -| --- | --- | -| [BERT](https://arxiv.org/abs/1810.04805) | Bidirectional Encoder Representations from Transformers | -| [RoBERTa](https://arxiv.org/abs/1907.11692) | A Robustly Optimized BERT Pretraining Approach | -| [ALBERT](https://arxiv.org/abs/1909.11942) | A Lite BERT for Self-supervised Learning of Language Representations | -| [XLNet](https://arxiv.org/abs/1906.08237) | Generalized Autoregressive Pretraining for Language Understanding and Generation | -| [GPT](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf) | Improving Language Understanding by Generative Pre-Training | -| [GPT-2](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) | Language Models are Unsupervised Multitask Learners | -| [GPT-3](https://arxiv.org/abs/2005.14165) | Language Models are Few-Shot Learners | -| [T5](https://arxiv.org/abs/1910.10683) | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | -| [CTRL](https://arxiv.org/abs/1909.05858) | CTRL: A Conditional Transformer Language Model for Controllable Generation | -| [BART](https://arxiv.org/abs/1910.13461) | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | -| [Chinchilla](https://arxiv.org/abs/2203.15556)(Hoffman et al. 2022) | Shows that for a compute budget, the best performances are not achieved by the largest models but by smaller models trained on more data. | +| Model | Release Date | Description | +| --- | --- | --- | +| [BERT](https://arxiv.org/abs/1810.04805)| 2018 | Bidirectional Encoder Representations from Transformers | +| [GPT](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf) | 2018 | Improving Language Understanding by Generative Pre-Training | +| [RoBERTa](https://arxiv.org/abs/1907.11692) | 2019 | A Robustly Optimized BERT Pretraining Approach | +| [GPT-2](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) | 2019 | Language Models are Unsupervised Multitask Learners | +| [T5](https://arxiv.org/abs/1910.10683) | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | +| [BART](https://arxiv.org/abs/1910.13461) | 2019 | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | +| [ALBERT](https://arxiv.org/abs/1909.11942) |2019 | A Lite BERT for Self-supervised Learning of Language Representations | +| [XLNet](https://arxiv.org/abs/1906.08237) | 2019 | Generalized Autoregressive Pretraining for Language Understanding and Generation | +| [CTRL](https://arxiv.org/abs/1909.05858) |2019 | CTRL: A Conditional Transformer Language Model for Controllable Generation | +| [ERNIE](https://arxiv.org/abs/1904.09223v1) | 2019| ERNIE: Enhanced Representation through Knowledge Integration | +| [GShard](https://arxiv.org/abs/2006.16668v1) | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding | +| [GPT-3](https://arxiv.org/abs/2005.14165) | 2020 | Language Models are Few-Shot Learners | +| [LaMDA](https://arxiv.org/abs/2201.08239v3) | 2021 | LaMDA: Language Models for Dialog Applications | +| [PanGu-α](https://arxiv.org/abs/2104.12369v1) | 2021 | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation | +| [mT5](https://arxiv.org/abs/2010.11934v3) | 2021 | mT5: A massively multilingual pre-trained text-to-text transformer | +| [CPM-2](https://arxiv.org/abs/2106.10715v3) | 2021 | CPM-2: Large-scale Cost-effective Pre-trained Language Models | +| [T0](https://arxiv.org/abs/2110.08207) |2021 |Multitask Prompted Training Enables Zero-Shot Task Generalization | +| [HyperCLOVA](https://arxiv.org/abs/2109.04650) | 2021 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers | +| [Codex](https://arxiv.org/abs/2107.03374v2) |2021 |Evaluating Large Language Models Trained on Code | +| [ERNIE 3.0](https://arxiv.org/abs/2107.02137v1) | 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation| +| [Jurassic-1](https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf) | 2021 | Jurassic-1: Technical Details and Evaluation | +| [FLAN](https://arxiv.org/abs/2109.01652v5) | 2021 | Finetuned Language Models Are Zero-Shot Learners | +| [MT-NLG](https://arxiv.org/abs/2201.11990v3) | 2021 | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model| +| [Yuan 1.0](https://arxiv.org/abs/2110.04725v2) | 2021| Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning | +| [WebGPT](https://arxiv.org/abs/2112.09332v3) | 2021 | WebGPT: Browser-assisted question-answering with human feedback | +| [Gopher](https://arxiv.org/abs/2112.11446v2) |2021 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | +| [ERNIE 3.0 Titan](https://arxiv.org/abs/2112.12731v1) |2021 | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation | +| [GLaM](https://arxiv.org/abs/2112.06905) | 2021 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | +| [InstructGPT](https://arxiv.org/abs/2203.02155v1) | 2022 | Training language models to follow instructions with human feedback | +| [GPT-NeoX-20B](https://arxiv.org/abs/2204.06745v1) | 2022 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model | +| [AlphaCode](https://arxiv.org/abs/2203.07814v1) | 2022 | Competition-Level Code Generation with AlphaCode | +| [CodeGen](https://arxiv.org/abs/2203.13474v5) | 2022 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis | +| [Chinchilla](https://arxiv.org/abs/2203.15556) | 2022 | Shows that for a compute budget, the best performances are not achieved by the largest models but by smaller models trained on more data. | +| [Tk-Instruct](https://arxiv.org/abs/2204.07705v3) | 2022 | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks | +| [UL2](https://arxiv.org/abs/2205.05131v3) | 2022 | UL2: Unifying Language Learning Paradigms | +| [PaLM](https://arxiv.org/abs/2204.02311v5) |2022| PaLM: Scaling Language Modeling with Pathways | +| [OPT](https://arxiv.org/abs/2205.01068) | 2022 | OPT: Open Pre-trained Transformer Language Models | +| [BLOOM](https://arxiv.org/abs/2211.05100v3) | 2022 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | +| [GLM-130B](https://arxiv.org/abs/2210.02414v1) | 2022 | GLM-130B: An Open Bilingual Pre-trained Model | +| [AlexaTM](https://arxiv.org/abs/2208.01448v2) | 2022 | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | +| [Flan-T5](https://arxiv.org/abs/2210.11416v5) | 2022 | Scaling Instruction-Finetuned Language Models | +| [Sparrow](https://arxiv.org/abs/2209.14375) | 2022 | Improving alignment of dialogue agents via targeted human judgements | +| [U-PaLM](https://arxiv.org/abs/2210.11399v2) | 2022 | Transcending Scaling Laws with 0.1% Extra Compute | +| [mT0](https://arxiv.org/abs/2211.01786v1) | 2022 | Crosslingual Generalization through Multitask Finetuning | +| [Galactica](https://arxiv.org/abs/2211.09085v1) | 2022 | Galactica: A Large Language Model for Science | +| [OPT-IML](https://arxiv.org/abs/2212.12017v3) | 2022 | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | +| [LLaMA](https://arxiv.org/abs/2302.13971v1) | 2023 | LLaMA: Open and Efficient Foundation Language Models | +| [GPT-4](https://arxiv.org/abs/2303.08774v3) | 2023 |GPT-4 Technical Report | +| [PanGu-Σ](https://arxiv.org/abs/2303.10845v1) | 2023 | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing | +| [BloombergGPT](https://arxiv.org/abs/2303.17564v1)| 2023 |BloombergGPT: A Large Language Model for Finance| \ No newline at end of file diff --git a/pages/models/collection.zh.mdx b/pages/models/collection.zh.mdx index 9aaa9a1..0e5b99d 100644 --- a/pages/models/collection.zh.mdx +++ b/pages/models/collection.zh.mdx @@ -1,3 +1,64 @@ # Model Collection -Needs translation! Feel free to contribute a translating by clicking the `Edit this page` button on the right side. \ No newline at end of file +import { Callout, FileTree } from 'nextra-theme-docs' + + + This section is under heavy development. + + +This section consists of a collection and summary of notable and foundational LLMs. (Data adopted from [Papers with Code](https://paperswithcode.com/methods/category/language-models) and the recent work by [Zhao et al. (2023)](https://arxiv.org/pdf/2303.18223.pdf). + +## Models + +| Model | Release Date | Description | +| --- | --- | --- | +| [BERT](https://arxiv.org/abs/1810.04805)| 2018 | Bidirectional Encoder Representations from Transformers | +| [GPT](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf) | 2018 | Improving Language Understanding by Generative Pre-Training | +| [RoBERTa](https://arxiv.org/abs/1907.11692) | 2019 | A Robustly Optimized BERT Pretraining Approach | +| [GPT-2](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) | 2019 | Language Models are Unsupervised Multitask Learners | +| [T5](https://arxiv.org/abs/1910.10683) | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | +| [BART](https://arxiv.org/abs/1910.13461) | 2019 | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | +| [ALBERT](https://arxiv.org/abs/1909.11942) |2019 | A Lite BERT for Self-supervised Learning of Language Representations | +| [XLNet](https://arxiv.org/abs/1906.08237) | 2019 | Generalized Autoregressive Pretraining for Language Understanding and Generation | +| [CTRL](https://arxiv.org/abs/1909.05858) |2019 | CTRL: A Conditional Transformer Language Model for Controllable Generation | +| [ERNIE](https://arxiv.org/abs/1904.09223v1) | 2019| ERNIE: Enhanced Representation through Knowledge Integration | +| [GShard](https://arxiv.org/abs/2006.16668v1) | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding | +| [GPT-3](https://arxiv.org/abs/2005.14165) | 2020 | Language Models are Few-Shot Learners | +| [LaMDA](https://arxiv.org/abs/2201.08239v3) | 2021 | LaMDA: Language Models for Dialog Applications | +| [PanGu-α](https://arxiv.org/abs/2104.12369v1) | 2021 | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation | +| [mT5](https://arxiv.org/abs/2010.11934v3) | 2021 | mT5: A massively multilingual pre-trained text-to-text transformer | +| [CPM-2](https://arxiv.org/abs/2106.10715v3) | 2021 | CPM-2: Large-scale Cost-effective Pre-trained Language Models | +| [T0](https://arxiv.org/abs/2110.08207) |2021 |Multitask Prompted Training Enables Zero-Shot Task Generalization | +| [HyperCLOVA](https://arxiv.org/abs/2109.04650) | 2021 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers | +| [Codex](https://arxiv.org/abs/2107.03374v2) |2021 |Evaluating Large Language Models Trained on Code | +| [ERNIE 3.0](https://arxiv.org/abs/2107.02137v1) | 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation| +| [Jurassic-1](https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf) | 2021 | Jurassic-1: Technical Details and Evaluation | +| [FLAN](https://arxiv.org/abs/2109.01652v5) | 2021 | Finetuned Language Models Are Zero-Shot Learners | +| [MT-NLG](https://arxiv.org/abs/2201.11990v3) | 2021 | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model| +| [Yuan 1.0](https://arxiv.org/abs/2110.04725v2) | 2021| Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning | +| [WebGPT](https://arxiv.org/abs/2112.09332v3) | 2021 | WebGPT: Browser-assisted question-answering with human feedback | +| [Gopher](https://arxiv.org/abs/2112.11446v2) |2021 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | +| [ERNIE 3.0 Titan](https://arxiv.org/abs/2112.12731v1) |2021 | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation | +| [GLaM](https://arxiv.org/abs/2112.06905) | 2021 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | +| [InstructGPT](https://arxiv.org/abs/2203.02155v1) | 2022 | Training language models to follow instructions with human feedback | +| [GPT-NeoX-20B](https://arxiv.org/abs/2204.06745v1) | 2022 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model | +| [AlphaCode](https://arxiv.org/abs/2203.07814v1) | 2022 | Competition-Level Code Generation with AlphaCode | +| [CodeGen](https://arxiv.org/abs/2203.13474v5) | 2022 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis | +| [Chinchilla](https://arxiv.org/abs/2203.15556) | 2022 | Shows that for a compute budget, the best performances are not achieved by the largest models but by smaller models trained on more data. | +| [Tk-Instruct](https://arxiv.org/abs/2204.07705v3) | 2022 | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks | +| [UL2](https://arxiv.org/abs/2205.05131v3) | 2022 | UL2: Unifying Language Learning Paradigms | +| [PaLM](https://arxiv.org/abs/2204.02311v5) |2022| PaLM: Scaling Language Modeling with Pathways | +| [OPT](https://arxiv.org/abs/2205.01068) | 2022 | OPT: Open Pre-trained Transformer Language Models | +| [BLOOM](https://arxiv.org/abs/2211.05100v3) | 2022 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | +| [GLM-130B](https://arxiv.org/abs/2210.02414v1) | 2022 | GLM-130B: An Open Bilingual Pre-trained Model | +| [AlexaTM](https://arxiv.org/abs/2208.01448v2) | 2022 | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | +| [Flan-T5](https://arxiv.org/abs/2210.11416v5) | 2022 | Scaling Instruction-Finetuned Language Models | +| [Sparrow](https://arxiv.org/abs/2209.14375) | 2022 | Improving alignment of dialogue agents via targeted human judgements | +| [U-PaLM](https://arxiv.org/abs/2210.11399v2) | 2022 | Transcending Scaling Laws with 0.1% Extra Compute | +| [mT0](https://arxiv.org/abs/2211.01786v1) | 2022 | Crosslingual Generalization through Multitask Finetuning | +| [Galactica](https://arxiv.org/abs/2211.09085v1) | 2022 | Galactica: A Large Language Model for Science | +| [OPT-IML](https://arxiv.org/abs/2212.12017v3) | 2022 | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | +| [LLaMA](https://arxiv.org/abs/2302.13971v1) | 2023 | LLaMA: Open and Efficient Foundation Language Models | +| [GPT-4](https://arxiv.org/abs/2303.08774v3) | 2023 |GPT-4 Technical Report | +| [PanGu-Σ](https://arxiv.org/abs/2303.10845v1) | 2023 | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing | +| [BloombergGPT](https://arxiv.org/abs/2303.17564v1)| 2023 |BloombergGPT: A Large Language Model for Finance| \ No newline at end of file diff --git a/pages/models/llama.en.mdx b/pages/models/llama.en.mdx index ac89811..7074e61 100644 --- a/pages/models/llama.en.mdx +++ b/pages/models/llama.en.mdx @@ -34,6 +34,9 @@ Overall, LLaMA-13B outperform GPT-3(175B) on many benchmarks despite being 10x s ## References +- [Koala: A Dialogue Model for Academic Research](https://bair.berkeley.edu/blog/2023/04/03/koala/) (April 2023) +- [Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data](https://arxiv.org/abs/2304.01196) (April 2023) +- [Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality](https://vicuna.lmsys.org/) (March 2023) - [LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention](https://arxiv.org/abs/2303.16199) (March 2023) - [GPT4All](https://github.com/nomic-ai/gpt4all) (March 2023) - [ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge](https://arxiv.org/abs/2303.14070) (March 2023) diff --git a/pages/papers.en.mdx b/pages/papers.en.mdx index 8c75a41..18fc9ab 100644 --- a/pages/papers.en.mdx +++ b/pages/papers.en.mdx @@ -15,6 +15,7 @@ The following are the latest papers (sorted by release date) on prompt engineeri ## Approaches + - [Self-Refine: Iterative Refinement with Self-Feedback](https://arxiv.org/abs/2303.17651v1) (Mar 2023) - [kNN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference](https://arxiv.org/abs/2303.13824) (Mar 2023) - [Visual-Language Prompt Tuning with Knowledge-guided Context Optimization](https://arxiv.org/abs/2303.13283) (Mar 2023) - [Fairness-guided Few-shot Prompting for Large Language Models](https://arxiv.org/abs/2303.13217) (Mar 2023) diff --git a/pages/readings.en.mdx b/pages/readings.en.mdx index 9f92bb2..fc8b81e 100644 --- a/pages/readings.en.mdx +++ b/pages/readings.en.mdx @@ -1,6 +1,7 @@ # Additional Readings #### (Sorted by Name) +- [2023 AI Index Report](https://aiindex.stanford.edu/report/) - [【徹底解説】これからのエンジニアの必携スキル、プロンプトエンジニアリングの手引「Prompt Engineering Guide」を読んでまとめてみた](https://dev.classmethod.jp/articles/how-to-design-prompt-engineering/) - [3 Principles for prompt engineering with GPT-3](https://www.linkedin.com/pulse/3-principles-prompt-engineering-gpt-3-ben-whately) - [A beginner-friendly guide to generative language models - LaMBDA guide](https://aitestkitchen.withgoogle.com/how-lamda-works)