# Model Collection
import { Callout, FileTree } from 'nextra-theme-docs'
<Callout emoji="⚠️">
This section is under heavy development.
</Callout>
This section provides a collection and summary of notable and foundational LLMs (compiled from [Papers with Code](https://paperswithcode.com/methods/category/language-models) and the recent survey by [Zhao et al. (2023)](https://arxiv.org/pdf/2303.18223.pdf)).
## Models
| Model | Release Date | Description |
| --- | --- | --- |
| [BERT](https://arxiv.org/abs/1810.04805) | 2018 | Bidirectional Encoder Representations from Transformers |
| [GPT](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf) | 2018 | Improved natural language understanding with generative pre-training |
| [RoBERTa](https://arxiv.org/abs/1907.11692) | 2019 | A Robustly Optimized BERT Pretraining Approach |
| [GPT-2](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) | 2019 | Demonstrated that language models are unsupervised multitask learners |
| [T5](https://arxiv.org/abs/1910.10683) | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| [BART](https://arxiv.org/abs/1910.13461) | 2019 | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
| [ALBERT](https://arxiv.org/abs/1909.11942) | 2019 | A Lite BERT for Self-supervised Learning of Language Representations |
| [XLNet](https://arxiv.org/abs/1906.08237) | 2019 | Generalized Autoregressive Pretraining for Language Understanding |
| [CTRL](https://arxiv.org/abs/1909.05858) | 2019 | CTRL: A Conditional Transformer Language Model for Controllable Generation |
| [ERNIE](https://arxiv.org/abs/1904.09223v1) | 2019 | ERNIE: Enhanced Representation through Knowledge Integration |
| [GShard](https://arxiv.org/abs/2006.16668v1) | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
| [GPT-3](https://arxiv.org/abs/2005.14165) | 2020 | Demonstrated that language models are few-shot learners |
| [LaMDA](https://arxiv.org/abs/2201.08239v3) | 2021 | LaMDA: Language Models for Dialog Applications |
| [PanGu-α](https://arxiv.org/abs/2104.12369v1) | 2021 | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
| [mT5](https://arxiv.org/abs/2010.11934v3) | 2021 | mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer |
| [CPM-2](https://arxiv.org/abs/2106.10715v3) | 2021 | CPM-2: Large-scale Cost-effective Pre-trained Language Models |
| [T0](https://arxiv.org/abs/2110.08207) | 2021 | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| [HyperCLOVA](https://arxiv.org/abs/2109.04650) | 2021 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers |
| [Codex](https://arxiv.org/abs/2107.03374v2) | 2021 | Evaluating Large Language Models Trained on Code |
| [ERNIE 3.0](https://arxiv.org/abs/2107.02137v1) | 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| [Jurassic-1](https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf) | 2021 | Jurassic-1: Technical Details and Evaluation |
| [FLAN](https://arxiv.org/abs/2109.01652v5) | 2021 | Finetuned Language Models Are Zero-Shot Learners |
| [MT-NLG](https://arxiv.org/abs/2201.11990v3) | 2021 | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| [Yuan 1.0](https://arxiv.org/abs/2110.04725v2) | 2021 | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning |
| [WebGPT](https://arxiv.org/abs/2112.09332v3) | 2021 | WebGPT: Browser-assisted question-answering with human feedback |
| [Gopher](https://arxiv.org/abs/2112.11446v2) | 2021 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| [ERNIE 3.0 Titan](https://arxiv.org/abs/2112.12731v1) | 2021 | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| [GLaM](https://arxiv.org/abs/2112.06905) | 2021 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| [InstructGPT](https://arxiv.org/abs/2203.02155v1) | 2022 | Training language models to follow instructions with human feedback |
| [GPT-NeoX-20B](https://arxiv.org/abs/2204.06745v1) | 2022 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
| [AlphaCode](https://arxiv.org/abs/2203.07814v1) | 2022 | Competition-Level Code Generation with AlphaCode |
| [CodeGen](https://arxiv.org/abs/2203.13474v5) | 2022 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
| [Chinchilla](https://arxiv.org/abs/2203.15556) | 2022 | Shows that, for a given compute budget, the best performance is achieved not by the largest models but by smaller models trained on more data (see the sketch below the table). |
| [Tk-Instruct](https://arxiv.org/abs/2204.07705v3) | 2022 | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks |
| [UL2](https://arxiv.org/abs/2205.05131v3) | 2022 | UL2: Unifying Language Learning Paradigms |
| [PaLM](https://arxiv.org/abs/2204.02311v5) | 2022 | PaLM: Scaling Language Modeling with Pathways |
| [OPT](https://arxiv.org/abs/2205.01068) | 2022 | OPT: Open Pre-trained Transformer Language Models |
| [BLOOM](https://arxiv.org/abs/2211.05100v3) | 2022 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| [GLM-130B](https://arxiv.org/abs/2210.02414v1) | 2022 | GLM-130B: An Open Bilingual Pre-trained Model |
| [AlexaTM](https://arxiv.org/abs/2208.01448v2) | 2022 | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model |
| [Flan-T5](https://arxiv.org/abs/2210.11416v5) | 2022 | Scaling Instruction-Finetuned Language Models |
| [Sparrow](https://arxiv.org/abs/2209.14375) | 2022 | Improving alignment of dialogue agents via targeted human judgements |
| [U-PaLM](https://arxiv.org/abs/2210.11399v2) | 2022 | Transcending Scaling Laws with 0.1% Extra Compute |
| [mT0](https://arxiv.org/abs/2211.01786v1) | 2022 | Crosslingual Generalization through Multitask Finetuning |
| [Galactica](https://arxiv.org/abs/2211.09085v1) | 2022 | Galactica: A Large Language Model for Science |
| [OPT-IML](https://arxiv.org/abs/2212.12017v3) | 2022 | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
| [LLaMA](https://arxiv.org/abs/2302.13971v1) | 2023 | LLaMA: Open and Efficient Foundation Language Models |
| [GPT-4](https://arxiv.org/abs/2303.08774v3) | 2023 | GPT-4 Technical Report |
| [PanGu-Σ](https://arxiv.org/abs/2303.10845v1) | 2023 | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing |
| [BloombergGPT](https://arxiv.org/abs/2303.17564v1) | 2023 | BloombergGPT: A Large Language Model for Finance |
| [PaLM 2](https://ai.google/static/documents/palm2techreport.pdf) | 2023 | A Language Model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. |
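
For the Chinchilla entry, a rough back-of-the-envelope sketch can make the compute/data trade-off concrete. The heuristics used below (training compute of roughly C ≈ 6·N·D FLOPs and a compute-optimal ratio of roughly 20 tokens per parameter) and the Gopher/Chinchilla figures are approximations drawn from Hoffmann et al. (2022), not from this table, so treat the numbers as illustrative only:

```python
# Illustrative sketch of the Chinchilla compute-optimal intuition.
# Assumptions (approximations from Hoffmann et al. 2022, not from this page):
#   - training compute C ≈ 6 * N * D FLOPs (N = parameters, D = training tokens)
#   - compute-optimal training uses roughly D ≈ 20 * N tokens


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute in FLOPs, using C ≈ 6 * N * D."""
    return 6.0 * params * tokens


def compute_optimal_tokens(params: float) -> float:
    """Rough compute-optimal number of training tokens for a given model size."""
    return 20.0 * params


if __name__ == "__main__":
    # Gopher-like: 280B parameters on ~300B tokens.
    # Chinchilla-like: 70B parameters on ~1.4T tokens.
    # Similar compute budgets, but the smaller model sees far more data.
    for name, n_params, n_tokens in [
        ("Gopher-like", 280e9, 300e9),
        ("Chinchilla-like", 70e9, 1.4e12),
    ]:
        print(
            f"{name}: ~{training_flops(n_params, n_tokens):.2e} training FLOPs, "
            f"compute-optimal tokens ≈ {compute_optimal_tokens(n_params):.2e}"
        )
```

Under these assumptions, both runs cost a similar amount of compute, yet the 280B-parameter configuration would need roughly 5.6T tokens to be compute-optimal, while the 70B-parameter configuration is already near its optimum at about 1.4T tokens.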