diff --git a/gpt4all-training/TRAINING_LOG.md b/gpt4all-training/TRAINING_LOG.md
index f86838c2..2433175c 100644
--- a/gpt4all-training/TRAINING_LOG.md
+++ b/gpt4all-training/TRAINING_LOG.md
@@ -247,7 +247,7 @@ We trained multiple [GPT-J models](https://huggingface.co/EleutherAI/gpt-j-6b) w
 We release the checkpoint after epoch 1.
-Using Atlas, we extracted the embeddings of each point in the dataset and calculated the loss per sequence. We then uploaded [this to Atlas](https://atlas.nomic.ai/map/gpt4all-j-post-epoch-1-embeddings) and noticed that the higher loss items seem to cluster. On further inspection, the highest density clusters seemded to be of prompt/response pairs that asked for creative-like generations such as `Generate a story about ...` ![](figs/clustering_overfit.png)
+Using Atlas, we extracted the embeddings of each point in the dataset and calculated the loss per sequence. We then uploaded [this to Atlas](https://atlas.nomic.ai/map/gpt4all-j-post-epoch-1-embeddings) and noticed that the higher loss items seem to cluster. On further inspection, the highest density clusters seemed to be of prompt/response pairs that asked for creative-like generations such as `Generate a story about ...` ![](figs/clustering_overfit.png)