8 Commits (9642ca5e1784d788a2ca1f5fe7929791e1c21f34)

Author SHA1 Message Date
Darshan Panchal cea88fe1b1 Update requirements.txt
removed html since it was not required
1 year ago
Alexander Khapaev ca9ffd2ae8 Updated the get_domain_hyperlinks function to include handling of tel: links in addition to mailto: links, to exclude them from the clean links list. 1 year ago
fabiofranco85 8faf594773 Improve regex 1 year ago
William Buck f86eaea9ca remove duplicate import of `distances_from_embeddings` 2 years ago
Sung Kim 58b63dfb0e Add handling for last chunk in split_into_sentences function
I have added handling for the last chunk in the split_into_sentences function. Previously, the function did not account for the last chunk, which could lead to incomplete sentences in the output.

To solve this, I added a conditional statement to check if the last chunk is non-empty. If it is, I append it to the list of chunks with a period to ensure the last sentence is complete.

This change improves the accuracy of the split_into_sentences function and ensures that all sentences in the input text are properly segmented. Please review and let me know if you have any feedback or concerns.
2 years ago
Logan Kilpatrick e030442a09 Add comment on where to learn about rate limits 2 years ago
Daniel Zhukovsky 215f15795d Redefinition of unused 'pd' 2 years ago
isafulf f453d0154a rename web crawl q and a 2 years ago
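
Two of the commits above describe behavior changes that are easier to follow with code: 58b63dfb0e (flush the last chunk when splitting text) and ca9ffd2ae8 (exclude tel: and mailto: links from the clean links list). The sketch below is only an illustration of those two ideas, not the repository's code. The function names `split_into_chunks` and `drop_non_web_links`, the `max_words` parameter, and the word-count chunking rule are all assumptions made for this example.

```python
def split_into_chunks(text, max_words=50):
    """Group sentences into chunks of at most max_words words.

    Hypothetical helper: the word-based limit and the ". " sentence
    split are assumptions, not the repository's actual logic.
    """
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    chunks = []
    current = []
    words_so_far = 0

    for sentence in sentences:
        n_words = len(sentence.split())
        # Close the current chunk once adding this sentence would exceed the limit.
        if current and words_so_far + n_words > max_words:
            chunks.append(". ".join(current) + ".")
            current = []
            words_so_far = 0
        current.append(sentence.rstrip("."))
        words_so_far += n_words

    # Last-chunk handling, as described in commit 58b63dfb0e: without this
    # check, the sentences still held in `current` would be silently dropped.
    if current:
        chunks.append(". ".join(current) + ".")

    return chunks


def drop_non_web_links(links):
    """Return links with mailto: and tel: entries removed.

    Hypothetical helper mirroring the idea in commit ca9ffd2ae8.
    """
    skipped_schemes = ("mailto:", "tel:")
    return [
        link
        for link in links
        if link and not link.strip().lower().startswith(skipped_schemes)
    ]


if __name__ == "__main__":
    print(split_into_chunks("One two three. Four five six. Seven eight.", max_words=6))
    print(drop_non_web_links(["https://example.com/a", "mailto:a@example.com", "tel:+15551234"]))
```

Removing the final `if current:` check in `split_into_chunks` reproduces the problem the commit message describes: the trailing sentences never reach the output.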