mirror of
https://github.com/hwchase17/langchain
synced 2024-10-31 15:20:26 +00:00
ea331f3136
- **Description:** - Add a break case to `text_splitter.py::split_text_on_tokens()` to avoid unwanted item at the end of result. - Add a testcase to enforce the behavior. - **Issue:** - #14649 - #5897 - **Dependencies:** n/a, --- **Quick illustration of change:** ``` text = "foo bar baz 123" tokenizer = Tokenizer( chunk_overlap=3, tokens_per_chunk=7 ) output = split_text_on_tokens(text=text, tokenizer=tokenizer) ``` output before change: `["foo bar", "bar baz", "baz 123", "123"]` output after change: `["foo bar", "bar baz", "baz 123"]` |
||
---|---|---|
.. | ||
cli | ||
community | ||
core | ||
experimental | ||
langchain | ||
partners |