8 Commits

Author SHA1 Message Date
Darshan Panchal
cea88fe1b1 Update requirements.txt
removed html since it was not required
2023-05-11 09:21:34 +05:30
Alexander Khapaev
ca9ffd2ae8 Updated the get_domain_hyperlinks function to include handling of tel: links in addition to mailto: links, to exclude them from the clean links list. 2023-04-07 18:28:44 +03:00
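
The link filtering described in ca9ffd2ae8 can be illustrated with a minimal sketch. This is not the repository's actual get_domain_hyperlinks code: the helper name clean_links, the NON_PAGE_SCHEMES tuple, and the https:// prefix used for relative links are assumptions made for illustration. It shows only the idea that mailto: and tel: links are skipped before links are added to the clean list.

```python
from urllib.parse import urlparse

# Hypothetical names for illustration; not taken from the repository.
NON_PAGE_SCHEMES = ("mailto:", "tel:")


def clean_links(links, local_domain):
    """Return same-domain links, skipping anchors and mailto:/tel: links."""
    cleaned = []
    for link in links:
        # Skip fragment-only anchors and non-page schemes (mailto:, tel:).
        if link.startswith("#") or link.lower().startswith(NON_PAGE_SCHEMES):
            continue
        parsed = urlparse(link)
        if parsed.netloc:
            # Absolute URL: keep it only if it points at the local domain.
            if parsed.netloc != local_domain:
                continue
            cleaned.append(link)
        else:
            # Relative URL: resolve it against the local domain (assumes https).
            cleaned.append("https://" + local_domain + "/" + link.lstrip("/"))
    # Remove duplicates while preserving the original order.
    return list(dict.fromkeys(cleaned))
```

Checking the scheme prefix before parsing keeps the rule easy to extend: another non-page scheme could be excluded by adding it to the tuple.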
fabiofranco85
8faf594773 Improve regex 2023-03-27 07:38:35 -03:00
William Buck
f86eaea9ca remove duplicate import of distances_from_embeddings 2023-03-20 13:02:37 -07:00
Sung Kim
58b63dfb0e Add handling for last chunk in split_into_sentences function
I have added handling for the last chunk in the split_into_sentences function. Previously, the function did not account for the last chunk, which could lead to incomplete sentences in the output.

To solve this, I added a conditional statement to check if the last chunk is non-empty. If it is, I append it to the list of chunks with a period to ensure the last sentence is complete.

This change improves the accuracy of the split_into_sentences function and ensures that all sentences in the input text are properly segmented. Please review and let me know if you have any feedback or concerns.
2023-02-19 11:00:27 +09:00
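
The last-chunk handling described in 58b63dfb0e can be sketched as follows. This is a simplified illustration under stated assumptions, not the committed code: the function name split_into_chunks is hypothetical, and a word count stands in for whatever size measure the real function uses. The part that mirrors the commit message is the final conditional, which appends the remaining non-empty chunk with a closing period.

```python
def split_into_chunks(text, max_words=6):
    """Group sentences into chunks of at most max_words words each.

    Hypothetical helper: word counts stand in for the real size measure.
    """
    # Split on ". " and drop a trailing period so it is not doubled later.
    sentences = [s.strip().rstrip(".") for s in text.split(". ") if s.strip()]

    chunks = []
    chunk = []
    words_so_far = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Flush the current chunk when adding this sentence would exceed the limit.
        if chunk and words_so_far + n_words > max_words:
            chunks.append(". ".join(chunk) + ".")
            chunk = []
            words_so_far = 0
        chunk.append(sentence)
        words_so_far += n_words

    # Without this check, the sentences collected after the last flush are lost.
    if chunk:
        chunks.append(". ".join(chunk) + ".")
    return chunks
```

Without the final check, any sentences gathered after the last flush inside the loop would be dropped from the output, which matches the problem the commit message describes.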
Logan Kilpatrick
e030442a09 Add comment on where to learn about rate limits 2023-02-17 06:16:14 -06:00
Daniel Zhukovsky
215f15795d Redefinition of unused 'pd' 2023-02-16 15:05:04 +00:00
isafulf
f453d0154a rename web crawl q and a 2023-02-11 16:37:29 -08:00