Commit Graph

13 Commits (4dd5df1b6fd0104b46fa5e85e5300507007de44a)

Author SHA1 Message Date
Zach Nussbaum a3485c4b32 Merge: main into gptj 1 year ago
Zach Nussbaum 8a94a8c068 fix: multi-turn data breaks 1 year ago
Zach Nussbaum be3f528810 fix: tokenization error 1 year ago
Zach 0bd6acb4dd fix: drop uneven batch size 1 year ago
Zach 1b14b1f723 fix: data for inference 1 year ago
Zach 7751f39432 fix: data processing 1 year ago
Zach 65ec606f21 fix: prompt len for larger 1 year ago
Zach Nussbaum 5c5f41ba36 fix: clean up data, pad at end 1 year ago
Zach Nussbaum 7e468f2199 Update data.py 1 year ago
Zach Nussbaum 1a95f68494 fix: just read from watermark file 1 year ago
Zach Nussbaum bb28929305 fix: eos conditional, watermark 1 year ago
Zach Nussbaum eac7734cbf fix: add eos 1 year ago
Zach Nussbaum 723a50bdf1 feat: train and clean data 1 year ago