langchain/libs/experimental/langchain_experimental
matt haigh a4896da2a0
Experimental: Add other threshold types to SemanticChunker (#16807)
**Description**
Adds different threshold types to the semantic chunker. I’ve had much
better and more predictable performance when using standard deviations
instead of percentiles.


![image](https://github.com/langchain-ai/langchain/assets/44395485/066e84a8-460e-4da5-9fa1-4ff79a1941c5)

For all the documents I’ve tried, the distribution of distances looks
similar to the one above: roughly normal with a positive skew. Every skew
I’ve seen is less than 1, which explains why standard deviations
perform well, but I’ve included IQR in case anyone wants something more
robust.
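
As a rough illustration of how the three threshold types differ, here is a minimal standard-library sketch. The function name `breakpoint_threshold`, the `kind` strings, the default amounts, and the exact IQR rule (mean plus a multiple of the IQR) are assumptions made for this example and are not taken from the commit text; the actual code in `text_splitter.py` may differ.

```python
import statistics


def breakpoint_threshold(distances, kind="percentile", amount=None):
    """Return the distance above which a gap is treated as a chunk boundary.

    `distances` holds the embedding distances between consecutive sentences.
    All names and defaults here are hypothetical, for illustration only.
    """
    if kind == "percentile":
        amount = 95 if amount is None else int(amount)
        if not 1 <= amount <= 99:
            raise ValueError("percentile amount must be between 1 and 99")
        # n=100 yields 99 cut points; cut point k-1 is the k-th percentile.
        cuts = statistics.quantiles(distances, n=100, method="inclusive")
        return cuts[amount - 1]
    if kind == "standard_deviation":
        amount = 3 if amount is None else amount
        # Mean plus `amount` population standard deviations.
        return statistics.mean(distances) + amount * statistics.pstdev(distances)
    if kind == "interquartile":
        amount = 1.5 if amount is None else amount
        q1, _, q3 = statistics.quantiles(distances, n=4, method="inclusive")
        # Assumed rule: mean plus `amount` times the interquartile range.
        return statistics.mean(distances) + amount * (q3 - q1)
    raise ValueError(f"unknown threshold kind: {kind!r}")


if __name__ == "__main__":
    gaps = [0.10, 0.12, 0.11, 0.45, 0.13, 0.09, 0.50, 0.10]
    for kind in ("percentile", "standard_deviation", "interquartile"):
        print(kind, round(breakpoint_threshold(gaps, kind), 4))
```

The percentile rule always flags a fixed share of gaps regardless of their size, whereas the standard deviation and IQR rules only flag gaps that are unusually large relative to the spread, which is one plausible reason they behave more predictably on mildly skewed data.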

Also, by using the percentile method in reverse, you can declare the number
of clusters and let semantic chunking produce an ‘optimal’ split.
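
One way to picture the "number of clusters" idea: asking for `n` chunks is equivalent to cutting at the `n - 1` largest gaps. The sketch below does this directly rather than converting the count to a percentile first, so it is a simplified stand-in and not the commit's implementation; `split_into_n_chunks` and its parameters are hypothetical names.

```python
def split_into_n_chunks(sentences, distances, n_chunks):
    """Split `sentences` into at most `n_chunks` groups at the largest gaps.

    `distances[i]` is the gap between `sentences[i]` and `sentences[i + 1]`.
    Hypothetical helper, shown only to illustrate the idea.
    """
    if len(distances) != len(sentences) - 1:
        raise ValueError("need exactly one distance per adjacent sentence pair")
    # n chunks require n - 1 breakpoints, capped by the number of gaps.
    n_breaks = max(0, min(n_chunks - 1, len(distances)))
    # Indices of the largest gaps; ties are resolved by document order.
    largest = sorted(
        range(len(distances)), key=lambda i: distances[i], reverse=True
    )[:n_breaks]

    chunks, start = [], 0
    for i in sorted(largest):
        # A break after sentence i closes the current chunk.
        chunks.append(" ".join(sentences[start : i + 1]))
        start = i + 1
    chunks.append(" ".join(sentences[start:]))
    return chunks


if __name__ == "__main__":
    sents = ["Cats purr.", "Cats nap.", "Rust is fast.", "Rust is safe."]
    gaps = [0.1, 0.9, 0.2]
    print(split_into_n_chunks(sents, gaps, 2))
```

Because the breakpoints are the largest semantic gaps, the resulting split is "optimal" only in the sense of that distance measure; chunk lengths can still be very uneven.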

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-02-26 13:50:48 -08:00
agents experimental[patch]: fix zero-shot pandas agent (#17442) 2024-02-12 21:58:35 -08:00
autonomous_agents experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
chat_models experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
comprehend_moderation experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
cpal experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
data_anonymizer experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
fallacy_removal experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
generative_agents experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
graph_transformers experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
llm_bash experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
llm_symbolic_math experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
llms experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
open_clip experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
openai_assistant Move OAI assistants to langchain and add callbacks (#13236) 2023-11-13 17:42:07 -08:00
pal_chain experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
plan_and_execute experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
prompt_injection_identifier experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
prompts experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
pydantic_v1 poetry lock the experimental package. (#9478) 2023-08-22 14:09:35 -04:00
recommenders experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
retrievers experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
rl_chain experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
smart_llm experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
sql experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
synthetic_data experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
tabular_synthetic_data experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
tools experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
tot experimental: docstrings update (#18048) 2024-02-23 21:24:16 -05:00
utilities Clean up deprecated agents and update __init__ in experimental (#12231) 2023-10-27 13:52:50 -04:00
__init__.py Add version to langchain_experimental (#11613) 2023-10-10 11:17:41 -04:00
py.typed Add py.typed file to langchain-experimental. (#9557) 2023-08-21 15:37:16 -04:00
text_splitter.py Experimental: Add other threshold types to SemanticChunker (#16807) 2024-02-26 13:50:48 -08:00