"2. **Number of Messages Per Example**: Summarizes the distribution of the number of messages in each conversation, providing insight into dialogue complexity.\n",
"3. **Total Tokens Per Example**: Calculates and summarizes the distribution of the total number of tokens in each conversation. Important for understanding fine-tuning costs.\n",
"4. **Tokens in Assistant's Messages**: Calculates the number of tokens in the assistant's messages per conversation and summarizes this distribution. Useful for understanding the assistant's verbosity.\n",
-"5. **Token Limit Warnings**: Checks if any examples exceed the maximum token limit (4096 tokens), as such examples will be truncated during fine-tuning, potentially resulting in data loss.\n"
+"5. **Token Limit Warnings**: Checks if any examples exceed the maximum token limit (16,385 tokens), as such examples will be truncated during fine-tuning, potentially resulting in data loss.\n"
]
},
{
@@ -240,7 +240,7 @@
"mean / median: 1610.2, 10.0\n",
"p5 / p95: 6.0, 4811.200000000001\n",
"\n",
-"1 examples may be over the 4096 token limit, they will be truncated during fine-tuning\n"
+"0 examples may be over the 16,385 token limit, they will be truncated during fine-tuning\n"