# LLM Collection
This section contains a collection and summary of notable and foundational LLMs.
## Models
| Model | Release Date | Size (B) | Checkpoints | Description |
| --- | --- | --- | --- | --- |
| [Falcon LLM]( | May 2023 | 7, 40 | [Falcon-7B](, [Falcon-40B]( | Falcon LLM is a foundational large language model (LLM) from TII with 40 billion parameters, trained on one trillion tokens. |
| [PaLM 2]( | May 2023 | - | - | A Language Model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. |
| [Med-PaLM 2]( | May 2023 | - | - | Towards Expert-Level Medical Question Answering with Large Language Models |
| [Gorilla]( | May 2023 | 7 | [Gorilla]( | Gorilla: Large Language Model Connected with Massive APIs |
| [RedPajama-INCITE]( | May 2023 | 3, 7 | [RedPajama-INCITE]( | A family of models including base, instruction-tuned & chat models. |
| [LIMA]( | May 2023 | 65 | - | A 65B parameter LLaMA language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. |
| [Replit Code]( | May 2023 | 3 | [Replit Code]( | replit-code-v1-3b model is a 2.7B LLM trained on 20 languages from the Stack Dedup v1.2 dataset. |
| [h2oGPT]( | May 2023 | 12 | [h2oGPT]( | h2oGPT is a large language model (LLM) fine-tuning framework and chatbot UI with document question-answering capabilities. |
| [CodeGen2]( | May 2023 | 1, 3, 7, 16 | [CodeGen2]( | Code models for program synthesis. |
| [CodeT5 and CodeT5+]( | May 2023 | 16 | [CodeT5]( | CodeT5 and CodeT5+ models for Code Understanding and Generation from Salesforce Research. |
| [StarCoder]( | May 2023 | 15 | [StarCoder]( | StarCoder: A State-of-the-Art LLM for Code |
| [MPT-7B]( | May 2023 | 7 | [MPT-7B]( | MPT-7B is a GPT-style model, and the first in the MosaicML Foundation Series of models. |
| [DLite]( | May 2023 | 0.124 - 1.5 | [DLite-v2-1.5B]( | Lightweight instruction following models which exhibit ChatGPT-like interactivity. |
| [Dolly]( | April 2023 | 3, 7, 12 | [Dolly]( | An instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use. |
| [StableLM]( | April 2023 | 3, 7 | [StableLM-Alpha]( | Stability AI's StableLM series of language models |
| [Pythia]( | April 2023 | 0.070 - 12 | [Pythia]( | A suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. |
| [Open Assistant (Pythia Family)]( | March 2023 | 12 | [Open Assistant]( | OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. |
| [Cerebras-GPT]( | March 2023 | 0.111 - 13 | [Cerebras-GPT]( | Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster |
| [BloombergGPT]( | March 2023 | 50 | - | BloombergGPT: A Large Language Model for Finance |
| [PanGu-Σ]( | March 2023 | 1085 | - | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing |
| [GPT-4]( | March 2023 | - | - | GPT-4 Technical Report |
| [LLaMA]( | Feb 2023 | 7, 13, 33, 65 | [LLaMA]( | LLaMA: Open and Efficient Foundation Language Models |
| [ChatGPT]( | Nov 2022 | - | - | A model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. |
| [Galactica]( | Nov 2022 | 0.125 - 120 | [Galactica]( | Galactica: A Large Language Model for Science |
| [mT0]( | Nov 2022 | 13 | [mT0-xxl]( | Crosslingual Generalization through Multitask Finetuning |
| [BLOOM]( | Nov 2022 | 176 | [BLOOM]( | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| [U-PaLM]( | Oct 2022 | 540 | - | Transcending Scaling Laws with 0.1% Extra Compute |
| [UL2]( | Oct 2022 | 20 | [UL2, Flan-UL2]( | UL2: Unifying Language Learning Paradigms |
| [Sparrow]( | Sep 2022 | 70 | - | Improving alignment of dialogue agents via targeted human judgements |
| [Flan-T5]( | Oct 2022 | 11 | [Flan-T5-xxl]( | Scaling Instruction-Finetuned Language Models |
| [AlexaTM]( | Aug 2022 | 20 | - | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model |
| [GLM-130B]( | Oct 2022 | 130 | [GLM-130B]( | GLM-130B: An Open Bilingual Pre-trained Model |
| [OPT-IML]( | Dec 2022 | 30, 175 | [OPT-IML]( | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
| [OPT]( | May 2022 | 175 | [OPT-13B](, [OPT-66B]( | OPT: Open Pre-trained Transformer Language Models |
| [PaLM]( | April 2022 | 540 | - | PaLM: Scaling Language Modeling with Pathways |
| [Tk-Instruct]( | April 2022 | 11 | [Tk-Instruct-11B]( | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks |
| [GPT-NeoX-20B]( | April 2022 | 20 | [GPT-NeoX-20B]( | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
| [Chinchilla]( | Mar 2022 | 70 | - | Shows that for a given compute budget, the best performance is achieved not by the largest models but by smaller models trained on more data. |
| [InstructGPT]( | Mar 2022 | 175 | - | Training language models to follow instructions with human feedback |
| [CodeGen]( | Mar 2022 | 0.350 - 16 | [CodeGen]( | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
| [AlphaCode]( | Feb 2022 | 41 | - | Competition-Level Code Generation with AlphaCode |
| [MT-NLG]( | Jan 2022 | 530 | - | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| [LaMDA]( | Jan 2022 | 137 | - | LaMDA: Language Models for Dialog Applications |
| [GLaM]( | Dec 2021 | 1200 | - | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| [Gopher]( | Dec 2021 | 280 | - | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| [WebGPT]( | Dec 2021 | 175 | - | WebGPT: Browser-assisted question-answering with human feedback |
| [Yuan 1.0]( | Oct 2021 | 245 | - | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning |
| [T0]( | Oct 2021 | 11 | [T0]( | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| [FLAN]( | Sep 2021 | 137 | - | Finetuned Language Models Are Zero-Shot Learners |
| [HyperCLOVA]( | Sep 2021 | 82 | - | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers |
| [ERNIE 3.0 Titan]( | Dec 2021 | 260 | - | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| [Jurassic-1]( | Aug 2021 | 178 | - | Jurassic-1: Technical Details and Evaluation |
| [ERNIE 3.0]( | July 2021 | 10 | - | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| [Codex]( | July 2021 | 12 | - | Evaluating Large Language Models Trained on Code |
| [GPT-J-6B]( | June 2021 | 6 | [GPT-J-6B]( | A 6 billion parameter, autoregressive text generation model trained on The Pile. |
| [CPM-2]( | Jun 2021 | 198 | [CPM]( | CPM-2: Large-scale Cost-effective Pre-trained Language Models |
| [PanGu-α]( | April 2021 | 13 | [PanGu-α]( | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
| [mT5]( | Oct 2020 | 13 | [mT5]( | mT5: A massively multilingual pre-trained text-to-text transformer |
| [BART]( | Jul 2020 | - | [BART]( | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
| [GShard]( | Jun 2020 | 600 | - | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
| [GPT-3]( | May 2020 | 175 | - | Language Models are Few-Shot Learners |
| [CTRL]( | Sep 2019 | 1.63 | [CTRL]( | CTRL: A Conditional Transformer Language Model for Controllable Generation |
| [ALBERT]( | Sep 2019 | 0.235 | [ALBERT]( | A Lite BERT for Self-supervised Learning of Language Representations |
| [XLNet]( | Jun 2019 | - | [XLNet]( | XLNet: Generalized Autoregressive Pretraining for Language Understanding |
| [T5]( | Oct 2019 | 0.06 - 11 | [Flan-T5]( | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| [GPT-2]( | Nov 2019 | 1.5 | [GPT-2]( | Language Models are Unsupervised Multitask Learners |
| [RoBERTa]( | July 2019 | 0.125 - 0.355 | [RoBERTa]( | A Robustly Optimized BERT Pretraining Approach |
| [BERT]( | Oct 2018 | - | [BERT]( | Bidirectional Encoder Representations from Transformers |
| [GPT]( | June 2018 | - | [GPT]( | Improving Language Understanding by Generative Pre-Training |
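
The Size (B) column mixes three shapes: comma-separated lists (`7, 40`), ranges (`0.124 - 1.5`), and a lone dash for undisclosed sizes. As a minimal sketch of how those cells could be read programmatically, the helper names below (`parse_sizes`, `fp16_weight_gib`) are hypothetical and not part of any library, and the memory estimate assumes weights stored at 2 bytes per parameter (fp16 or bf16), ignoring activations, optimizer state, and KV cache.

```python
from typing import List

# Assumption: weights are stored as 16-bit floats (2 bytes per parameter).
BYTES_PER_PARAM_FP16 = 2


def parse_sizes(cell: str) -> List[float]:
    """Parse a 'Size (B)' cell into sizes in billions of parameters.

    A comma-separated list yields every listed size, a range yields its
    two endpoints, and a lone dash (undisclosed size) yields an empty list.
    """
    cell = cell.strip()
    if cell in ("", "-"):
        return []
    # Ranges are written with " - " between two numbers; lists use commas.
    separator = " - " if " - " in cell else ","
    return [float(part) for part in cell.split(separator)]


def fp16_weight_gib(size_billions: float) -> float:
    """Rough memory for the weights alone, in GiB, at 2 bytes per parameter."""
    return size_billions * 1e9 * BYTES_PER_PARAM_FP16 / 2**30


if __name__ == "__main__":
    for cell in ["7, 40", "0.124 - 1.5", "175", "-"]:
        sizes = parse_sizes(cell)
        estimates = [round(fp16_weight_gib(s), 1) for s in sizes]
        print(f"{cell!r}: sizes={sizes} fp16 GiB={estimates}")
```

Under these assumptions a 7B model needs roughly 13 GiB for its weights alone, which gives a quick sense of which listed checkpoints fit on a single accelerator.
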
<Callout emoji="⚠️">
This section is under development.
Data adopted from [Papers with Code]( and the recent work by [Zhao et al. (2023)](.