10 Commits (1e4927a1d23108c4a9374952f2f5661df76387f0)

Author SHA1 Message Date
Smit Shah a87a2aacaa
[Minor Fix] Fix spacy TextSplitter init (#606)
1 year ago
Harrison Chase 1511606799
Harrison/fix splitting (#563)
fix issue where text splitting could possibly create empty docs
1 year ago
Harrison Chase 1192cc0767
smart text splitter (#530)
smart text splitter that iteratively tries different separators until it
works!
1 year ago
Harrison Chase c104d507bf
Harrison/improve data augmented generation docs (#390)
Co-authored-by: cameronccohen <cameron.c.cohen@gmail.com>
Co-authored-by: Cameron Cohen <cameron.cohen@quantco.com>
2 years ago
Harrison Chase e7b625fe03
fix text splitter (#375)
2 years ago
Harrison Chase 2dd895d98c
add openai tokenizer (#355)
2 years ago
Xupeng (Tony) Tong bb4bf9d6d0
chore: minor clean up / formatting (#233)
to get familiar with the project
2 years ago
Harrison Chase d87e73ddb1
huggingface tokenizer (#75)
2 years ago
Delip Rao 3ee6e332dd
Implements NLTK and Spacy-based TextSplitters (#103)
This PR is for Issue #88 

- [x] `make format`
- [x] `make lint`
- [x] `make tests`
2 years ago
Harrison Chase 160af4ba6b
Harrison/map reduce (#36)
2 years ago