Add handling for the last chunk in split_into_many

This commit adds handling for the last chunk in the split_into_many function. Previously, the function returned without appending the chunk that was still being built when the loop ended, so the final sentences of the input could be missing from the output.

To fix this, I added a conditional after the loop that checks whether the last chunk is non-empty. If it is, its sentences are joined with ". " and appended to the list of chunks with a trailing period, so the last sentence is complete.

This change ensures that every sentence in the input text ends up in a chunk. Please review and let me know if you have any feedback or concerns.
pull/146/head
Sung Kim 1 year ago committed by GitHub
parent 3826607431
commit 3210b38e35
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@@ -248,6 +248,10 @@ def split_into_many(text, max_tokens = max_tokens):
         # Otherwise, add the sentence to the chunk and add the number of tokens to the total
         chunk.append(sentence)
         tokens_so_far += token + 1
+    # Add the last chunk to the list of chunks
+    if chunk:
+        chunks.append(". ".join(chunk) + ".")
     return chunks
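
To see the effect of the change in isolation, here is a minimal, self-contained sketch of the chunking loop. It is not the repository's actual function. The whitespace word count stands in for the real tokenizer, the `flush_last` parameter is hypothetical and exists only to compare the old and new behaviour, and the sentence splitting on ". " is an assumption about the surrounding code.

```python
def split_into_many(text, max_tokens=8, flush_last=True):
    """Group sentences into chunks of at most max_tokens tokens (illustrative sketch)."""
    # Assumption: sentences are separated by ". "; a trailing period is stripped
    parts = (p.strip().rstrip(".") for p in text.split(". "))
    sentences = [s for s in parts if s]
    # Hypothetical token counter: whitespace-separated words, not a real tokenizer
    n_tokens = [len(s.split()) for s in sentences]

    chunks, chunk, tokens_so_far = [], [], 0
    for sentence, token in zip(sentences, n_tokens):
        # Close the current chunk if this sentence would push it over the limit
        if chunk and tokens_so_far + token > max_tokens:
            chunks.append(". ".join(chunk) + ".")
            chunk, tokens_so_far = [], 0
        # Skip sentences that are too long to fit in any chunk
        if token > max_tokens:
            continue
        # Otherwise, add the sentence to the chunk and update the running total
        chunk.append(sentence)
        tokens_so_far += token + 1

    # The fix from this commit: add the last chunk to the list of chunks
    if flush_last and chunk:
        chunks.append(". ".join(chunk) + ".")
    return chunks


text = "One two three. Four five six. Seven eight nine."
print(split_into_many(text, flush_last=False))  # old behaviour: last chunk is lost
print(split_into_many(text))                    # new behaviour: last chunk is kept
```

With `flush_last=False` the sentence "Seven eight nine" never reaches the output, because it is still sitting in `chunk` when the loop ends. With the added `if chunk:` block it is emitted as its own final chunk.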
