This does not involve a separator: it naively chunks the input text at the appropriate boundaries in token space. This is helpful when we have strict token-length limits that the chunk size must follow exactly, and we can't use aggressive separators like spaces to guarantee the absence of long strings. CharacterTextSplitter lets such strings through without splitting them, which can cause overflow errors downstream. Splitting at arbitrary token boundaries is not ideal, but this is hopefully mitigated by a decent overlap quantity. It also produces chunks with exactly the desired number of tokens, instead of sometimes overcounting when shorter strings are concatenated. Potentially also helps with #528.
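For illustration, here is a minimal sketch of token-boundary splitting with overlap, assuming the tiktoken package for tokenization; the function name `split_text_on_tokens` and its parameters are illustrative, not the exact API introduced by this commit:

```python
import tiktoken


def split_text_on_tokens(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> list[str]:
    """Sketch: chunk text at token boundaries with a fixed overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    enc = tiktoken.get_encoding("gpt2")
    input_ids = enc.encode(text)
    chunks = []
    start = 0
    while start < len(input_ids):
        end = min(start + chunk_size, len(input_ids))
        # Decode the token slice back to text; every chunk holds exactly
        # chunk_size tokens except possibly the last one.
        chunks.append(enc.decode(input_ids[start:end]))
        if end == len(input_ids):
            break
        # Advance by chunk_size - chunk_overlap so consecutive chunks share
        # chunk_overlap tokens, softening the arbitrary cut points.
        start += chunk_size - chunk_overlap
    return chunks
```

Because the limit is enforced in token space rather than by concatenating character-delimited splits, downstream token budgets are respected by construction.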
chains/
embeddings/
llms/
vectorstores/
__init__.py
test_googlesearch_api.py
test_ngram_overlap_example_selector.py
test_nlp_text_splitters.py
test_serpapi.py
test_text_splitter.py
test_wolfram_alpha_api.py