Commit Graph

7 Commits (ee9b6268d450bc62c275e095cce2d0d275a92cbd)

Author SHA1 Message Date
Alexander Khapaev ee9b6268d4 Updated the get_domain_hyperlinks function to include handling of tel: links in addition to mailto: links, to exclude them from the clean links list. 1 year ago
fabiofranco85 5a80ef2571 Improve regex 1 year ago
William Buck ca9b9d485d Remove duplicate import of `distances_from_embeddings` 1 year ago
Sung Kim 3210b38e35 Add handling for last chunk in split_into_sentences function
I have added handling for the last chunk in the split_into_sentences function. Previously, the function did not account for the last chunk, which could lead to incomplete sentences in the output.

To solve this, I added a conditional statement to check if the last chunk is non-empty. If it is, I append it to the list of chunks with a period to ensure the last sentence is complete.

This change improves the accuracy of the split_into_sentences function and ensures that all sentences in the input text are properly segmented. Please review and let me know if you have any feedback or concerns.
1 year ago
Logan Kilpatrick 3826607431 Add comment on where to learn about rate limits 1 year ago
Daniel Zhukovsky be9877edbf Redefinition of unused 'pd' 1 year ago
isafulf daf8e0d011 rename web crawl q and a 1 year ago
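
The last-chunk handling described in the split_into_sentences commit message (3210b38e35) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function name split_into_chunks, the max_words parameter, and the use of word counts as a stand-in for token counts are all assumptions made for the example.

```python
from typing import List


def split_into_chunks(text: str, max_words: int = 8) -> List[str]:
    """Group sentences into chunks of at most max_words words.

    Hypothetical helper: word counts stand in for token counts here.
    """
    # Split on ". " and drop the trailing period left on the final sentence.
    sentences = [s.strip().rstrip(".") for s in text.split(". ") if s.strip()]

    chunks: List[str] = []
    current: List[str] = []
    words_so_far = 0

    for sentence in sentences:
        n_words = len(sentence.split())
        if n_words > max_words:
            # A single sentence over the limit cannot fit in any chunk; skip it.
            continue
        if current and words_so_far + n_words > max_words:
            # The current chunk is full: close it with a period and start a new one.
            chunks.append(". ".join(current) + ".")
            current = []
            words_so_far = 0
        current.append(sentence)
        words_so_far += n_words

    # The step the commit describes: if anything is left over after the loop,
    # append it as the final chunk, terminated with a period.
    if current:
        chunks.append(". ".join(current) + ".")

    return chunks


if __name__ == "__main__":
    print(split_into_chunks("One two three. Four five six. Seven eight.", max_words=6))
```

Without the final `if current:` check, the sentences accumulated after the last flush inside the loop would never reach the output, which matches the incomplete-output problem the commit message reports.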