Add handling for the last chunk in split_into_many

I have added handling for the last chunk in the split_into_many function. Previously, the function returned as soon as the loop over the sentences ended, so any sentences still held in the final chunk were never appended to the list of chunks and the end of the input text was missing from the output.

To fix this, I added a check after the loop: if the last chunk is non-empty, its sentences are joined with ". " and appended to the list of chunks with a trailing period, so that the last sentence is terminated.
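
The behavior can be illustrated with a minimal, self-contained sketch. It is not the exact code in the repository: count_tokens is a hypothetical stand-in for the real tokenizer (it simply counts whitespace-separated words), and the loop body above the changed lines is reconstructed as an assumption so that the example runs on its own.

```python
def count_tokens(sentence):
    # Hypothetical stand-in for the real tokenizer: counts whitespace-separated words.
    return len(sentence.split())


def split_into_many(text, max_tokens=5):
    # Split the text into sentences and count the tokens in each one.
    sentences = text.split(". ")
    n_tokens = [count_tokens(sentence) for sentence in sentences]

    chunks = []
    tokens_so_far = 0
    chunk = []

    for sentence, token in zip(sentences, n_tokens):
        # Assumed loop logic: flush the current chunk when adding the next
        # sentence would exceed the limit. The `chunk and` guard is an
        # addition of this sketch so that an empty chunk is never flushed.
        if chunk and tokens_so_far + token > max_tokens:
            chunks.append(". ".join(chunk) + ".")
            chunk = []
            tokens_so_far = 0

        # Skip sentences that are too long to fit in any chunk.
        if token > max_tokens:
            continue

        # Otherwise, add the sentence to the chunk and update the running total.
        chunk.append(sentence)
        tokens_so_far += token + 1

    # The change in this commit: flush whatever is left after the loop.
    if chunk:
        chunks.append(". ".join(chunk) + ".")
    return chunks
```

Under these assumptions, split_into_many("One two three. Four five six. Seven eight", max_tokens=5) returns ["One two three.", "Four five six.", "Seven eight."]. Without the final check, "Seven eight." would be missing from the result.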

With this change, split_into_many no longer drops the final chunk, so sentences at the end of the input text are included in the output. Please review and let me know if you have any feedback or concerns.
Sung Kim 2023-02-19 11:00:27 +09:00 committed by GitHub
parent 3826607431
commit 3210b38e35

@@ -248,6 +248,10 @@ def split_into_many(text, max_tokens = max_tokens):
         # Otherwise, add the sentence to the chunk and add the number of tokens to the total
         chunk.append(sentence)
         tokens_so_far += token + 1
+    # Add the last chunk to the list of chunks
+    if chunk:
+        chunks.append(". ".join(chunk) + ".")
     return chunks