Add link to Medium article explaining the differences between causal and masked language modeling

pull/24/head
Raigon Kunnath Augustin 6 months ago
parent f409eea77a
commit 736318352a

@@ -174,7 +174,7 @@ Pre-training is a very long and costly process, which is why this is not the foc
* [BLOOM](https://bigscience.notion.site/BLOOM-BigScience-176B-Model-ad073ca07cdf479398d5f95d88e218c4) by BigScience: Notion pages that describe how the BLOOM model was built, with a lot of useful information about the engineering side and the problems that were encountered.
* [OPT-175 Logbook](https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/chronicles/OPT175B_Logbook.pdf) by Meta: Research logs showing what went wrong and what went right. Useful if you're planning to pre-train a very large language model (in this case, 175B parameters).
* [LLM 360](https://www.llm360.ai/): A framework for open-source LLMs with training and data preparation code, data, metrics, and models.
* [Causal language modeling vs Masked language modeling](https://medium.com/@tom_21755/understanding-causal-llms-masked-llm-s-and-seq2seq-a-guide-to-language-model-training-d4457bbd07fa) by Tomas Vykruta: Explains the differences between causal language modeling, masked language modeling, and Seq2Seq training (a minimal sketch contrasting the first two follows this list).
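
Since the linked article contrasts the two pre-training objectives, here is a minimal sketch of the difference using Hugging Face `transformers` pipelines. The model choices (`gpt2` for causal, `bert-base-uncased` for masked) and the prompts are illustrative assumptions, not taken from the article:

```python
# Minimal sketch: causal vs. masked language modeling.
# Model names and prompts are illustrative assumptions.
from transformers import pipeline

# Causal LM (GPT-style): predicts the next token from left context only.
causal_lm = pipeline("text-generation", model="gpt2")
out = causal_lm("Pre-training a large language model requires", max_new_tokens=10)
print(out[0]["generated_text"])

# Masked LM (BERT-style): fills in a masked token using context on both sides.
masked_lm = pipeline("fill-mask", model="bert-base-uncased")
out = masked_lm("Pre-training a large language [MASK] requires a lot of data.")
print(out[0]["sequence"])  # highest-scoring completion
```

Seq2Seq models (e.g., T5) differ from both: an encoder reads the full input bidirectionally while a decoder generates the output causally.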
---
### 4. Supervised Fine-Tuning
