Add handling for last chunk in split_into_sentences function

I added handling for the last chunk in the split_into_sentences function. Previously, the function never appended the final chunk once the loop over sentences ended, so the sentences it held could be missing from the output.

To fix this, I added a check after the loop: if the last chunk is non-empty, its sentences are joined with ". " and appended to the list of chunks with a closing period, so the last sentence is complete.
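
The effect of the check can be illustrated with a minimal, self-contained sketch. This is not the repository's code: the whitespace-based `count_tokens` default is a hypothetical stand-in for the real tokenizer, the small `max_tokens` default is chosen only to keep the example short, and stripping trailing periods before joining is a simplification made for this sketch.

```python
def split_into_many(text, max_tokens=6, count_tokens=lambda s: len(s.split())):
    """Group sentences into chunks of roughly max_tokens tokens each.

    count_tokens is an assumed, whitespace-based stand-in for a real tokenizer.
    """
    # Split on ". " and strip trailing periods so the join below does not double them.
    sentences = [s.strip().rstrip(".") for s in text.split(". ")]
    sentences = [s for s in sentences if s]

    chunks = []
    chunk = []
    tokens_so_far = 0

    for sentence in sentences:
        token = count_tokens(sentence)

        # Skip sentences that could never fit in a chunk on their own.
        if token > max_tokens:
            continue

        # Flush the current chunk when adding this sentence would overflow it.
        if chunk and tokens_so_far + token > max_tokens:
            chunks.append(". ".join(chunk) + ".")
            chunk = []
            tokens_so_far = 0

        chunk.append(sentence)
        tokens_so_far += token + 1

    # Add the last chunk to the list of chunks (the step this commit introduces).
    if chunk:
        chunks.append(". ".join(chunk) + ".")

    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five. Six seven eight nine. Ten"
    for piece in split_into_many(sample):
        print(piece)
```

Without the final `if chunk:` block, the sample above would yield only the first chunk, and the last two sentences would be dropped.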

This change improves the accuracy of the split_into_sentences function and ensures that all sentences in the input text are properly segmented. Please review and let me know if you have any feedback or concerns.
Sung Kim 2023-02-19 11:00:27 +09:00 committed by GitHub
parent e030442a09
commit 58b63dfb0e


@@ -249,6 +249,10 @@ def split_into_many(text, max_tokens = max_tokens):
         chunk.append(sentence)
         tokens_so_far += token + 1
+    # Add the last chunk to the list of chunks
+    if chunk:
+        chunks.append(". ".join(chunk) + ".")
     return chunks