Update gpt-3.5 token limit in Chat_finetuning_data_prep.ipynb

The token limit in Chat_finetuning_data_prep.ipynb is out of date with the current gpt-3.5-turbo context window. It should be 16,385 tokens, as stated in https://platform.openai.com/docs/models/gpt-3-5-turbo
pull/1178/head
Sean Diacono 4 weeks ago committed by GitHub
parent dc0e64aedf
commit 4220188df4

@@ -207,7 +207,7 @@
 "2. **Number of Messages Per Example**: Summarizes the distribution of the number of messages in each conversation, providing insight into dialogue complexity.\n",
 "3. **Total Tokens Per Example**: Calculates and summarizes the distribution of the total number of tokens in each conversation. Important for understanding fine-tuning costs.\n",
 "4. **Tokens in Assistant's Messages**: Calculates the number of tokens in the assistant's messages per conversation and summarizes this distribution. Useful for understanding the assistant's verbosity.\n",
-"5. **Token Limit Warnings**: Checks if any examples exceed the maximum token limit (4096 tokens), as such examples will be truncated during fine-tuning, potentially resulting in data loss.\n"
+"5. **Token Limit Warnings**: Checks if any examples exceed the maximum token limit (16,385 tokens), as such examples will be truncated during fine-tuning, potentially resulting in data loss.\n"
 ]
},
{
@@ -240,7 +240,7 @@
 "mean / median: 1610.2, 10.0\n",
 "p5 / p95: 6.0, 4811.200000000001\n",
 "\n",
-"1 examples may be over the 4096 token limit, they will be truncated during fine-tuning\n"
+"0 examples may be over the 16,385 token limit, they will be truncated during fine-tuning\n"
 ]
}
],
@@ -267,8 +267,8 @@
 "print_distribution(n_messages, \"num_messages_per_example\")\n",
 "print_distribution(convo_lens, \"num_total_tokens_per_example\")\n",
 "print_distribution(assistant_message_lens, \"num_assistant_tokens_per_example\")\n",
-"n_too_long = sum(l > 4096 for l in convo_lens)\n",
-"print(f\"\\n{n_too_long} examples may be over the 4096 token limit, they will be truncated during fine-tuning\")"
+"n_too_long = sum(l > 16385 for l in convo_lens)\n",
+"print(f\"\\n{n_too_long} examples may be over the 16,385 token limit, they will be truncated during fine-tuning\")"
 ]
},
{
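The updated check can be sketched standalone, assuming `convo_lens` holds the total token count of each training conversation (the sample counts below are hypothetical):

```python
# Minimal sketch of the updated token-limit check.
# Note: in Python the limit must be written 16385, not 16,385 --
# a comma would turn the comparison into a tuple.
MAX_TOKENS_PER_EXAMPLE = 16385  # gpt-3.5-turbo context window

# Hypothetical per-conversation token totals.
convo_lens = [120, 4500, 17000, 16385]

n_too_long = sum(l > MAX_TOKENS_PER_EXAMPLE for l in convo_lens)
print(f"\n{n_too_long} examples may be over the {MAX_TOKENS_PER_EXAMPLE} "
      "token limit, they will be truncated during fine-tuning")
```

Only counts strictly above the limit are flagged, so an example of exactly 16,385 tokens passes.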
