Update TRAINING_LOG.md

Zach Nussbaum 2 years ago committed by GitHub

We additionally train a full model
| Weight decay | 0 |
| Warmup Steps | 100 |
Taking inspiration from [the Alpaca Repo](https://github.com/tatsu-lab/stanford_alpaca), we roughly scale the learning rate by `sqrt(k)`, where `k` is the factor by which the batch size increases relative to Alpaca's setup (a batch size of 128 with a learning rate of 2e-5).
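The scaling rule above can be sketched as a small helper; the function name and the example batch size are illustrative, not from the training code:

```python
import math

def scaled_lr(batch_size, base_batch_size=128, base_lr=2e-5):
    """Scale the Alpaca learning rate by sqrt(k), where k is the
    increase in batch size over Alpaca's batch size of 128."""
    k = batch_size / base_batch_size
    return base_lr * math.sqrt(k)

# Quadrupling the batch size (128 -> 512) doubles the learning rate.
print(scaled_lr(512))  # 4e-05
```

This is the usual square-root scaling heuristic: variance of the gradient estimate shrinks with batch size, so the step size can grow sublinearly.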
Comparing our LoRA model to the [Alpaca LoRA](https://huggingface.co/tloen/alpaca-lora-7b), ours achieves lower perplexity. Training for 3 epochs performed best, both on perplexity and on qualitative examples.
We also tried training a full model using the parameters above, but found that it overfit during the second epoch.
