"# HuggingFace Length Function\n",
"Most LLMs limit the number of tokens you can pass in, which is not the same as the number of characters. To get a more accurate estimate, we can use a Hugging Face tokenizer to count the length of the text in tokens.\n",
"\n",
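"The gap between the two counts can be sketched with a small length function. This is a minimal illustration, not the library's implementation: `token_length` is a hypothetical helper, and `str.split` stands in for a real tokenizer so the example runs without downloading a model. With the `transformers` package installed, a tokenizer method such as `GPT2TokenizerFast.from_pretrained('gpt2').encode` could be passed instead.\n",
"\n",
"```python\n",
"from typing import Callable, Sequence\n",
"\n",
"\n",
"def token_length(text: str, tokenize: Callable[[str], Sequence]) -> int:\n",
"    # Hypothetical helper: length is the number of tokens, not characters.\n",
"    return len(tokenize(text))\n",
"\n",
"\n",
"text = 'Most LLMs are constrained by tokens.'\n",
"# Stand-in tokenizer (assumption): whitespace split instead of a real model tokenizer.\n",
"n_tokens = token_length(text, str.split)\n",
"n_chars = len(text)\n",
"print(n_tokens, n_chars)\n",
"```\n",
"\n",
"A real subword tokenizer will usually report a different token count than a whitespace split, which is why the tokenizer should match the model you plan to call.\n",
"\n",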
"1. How the text is split: by the character passed in\n",
"2. How the chunk size is measured: by the Hugging Face tokenizer"