langchain/libs/experimental/langchain_experimental
matt haigh a4896da2a0
Experimental: Add other threshold types to SemanticChunker (#16807)
**Description**
Adds further threshold types to the semantic chunker. I’ve had much
better and more predictable performance when using standard deviations
instead of percentiles.


![image](https://github.com/langchain-ai/langchain/assets/44395485/066e84a8-460e-4da5-9fa1-4ff79a1941c5)

For all the documents I’ve tried, the distribution of distances looks
similar to the one above: roughly normal with a positive skew. Every
skew I’ve seen is less than 1, which explains why standard deviations
perform well, but I’ve included IQR in case anyone wants something more
robust.
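
To make the three threshold types concrete, here is a minimal standard-library sketch of how a breakpoint threshold could be derived from a list of sentence-to-sentence distances. It is not the code from this PR: the function names, the `amount` parameter, and the use of a Tukey-style upper fence (`Q3 + amount * IQR`) for the interquartile case are all assumptions made for illustration.

```python
import statistics


def percentile(values, q):
    """Linearly interpolated q-th percentile, with 0 <= q <= 100."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def breakpoint_threshold(distances, kind, amount):
    """Return the distance above which a gap counts as a chunk boundary.

    `kind` and `amount` are hypothetical names for this sketch.
    """
    if kind == "percentile":
        # amount is a percentile, e.g. 95
        return percentile(distances, amount)
    if kind == "standard_deviation":
        # amount is the number of standard deviations above the mean
        return statistics.mean(distances) + amount * statistics.pstdev(distances)
    if kind == "interquartile":
        # amount scales the IQR; assumed Tukey-style upper fence
        q1, q3 = percentile(distances, 25), percentile(distances, 75)
        return q3 + amount * (q3 - q1)
    raise ValueError(f"unknown threshold kind: {kind}")


def breakpoints(distances, threshold):
    """Indices of the gaps whose distance exceeds the threshold."""
    return [i for i, d in enumerate(distances) if d > threshold]
```

For example, `breakpoints(d, breakpoint_threshold(d, "standard_deviation", 1))` with `d = [0.1, 0.2, 0.3, 0.4, 1.0]` flags only the last gap, since the outlier sits more than one standard deviation above the mean.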

Also, by running the percentile method in reverse, you can specify the
number of clusters and let semantic chunking find an ‘optimal’ split.
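
The idea of fixing the number of chunks up front can be sketched as follows. This is an illustrative simplification rather than the PR's implementation: instead of mapping the chunk count to a percentile, it ranks the gaps directly and cuts at the `number_of_chunks - 1` largest distances, which selects the same cut points whenever the distances are distinct. All names here are hypothetical, and `distances[i]` is assumed to be the distance between sentence `i` and sentence `i + 1`.

```python
def breakpoints_for_chunk_count(distances, number_of_chunks):
    """Pick the gaps to cut at so that roughly `number_of_chunks` chunks result."""
    # n chunks need n - 1 cuts; clamp to what the data allows
    wanted = max(0, min(number_of_chunks - 1, len(distances)))
    # rank gap indices from largest distance to smallest
    ranked = sorted(range(len(distances)), key=lambda i: distances[i], reverse=True)
    return sorted(ranked[:wanted])


def split_at(sentences, cut_indices):
    """Split sentences after each index in cut_indices (sorted ascending)."""
    chunks, start = [], 0
    for i in cut_indices:
        chunks.append(sentences[start:i + 1])
        start = i + 1
    chunks.append(sentences[start:])
    return chunks
```

With six sentences and distances `[0.1, 0.9, 0.2, 0.8, 0.3]`, asking for three chunks cuts after the second and fourth sentences, where the two largest gaps are.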

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
7 months ago
| Name | Last commit | Updated |
| --- | --- | --- |
| agents | experimental[patch]: fix zero-shot pandas agent (#17442) | 7 months ago |
| autonomous_agents | experimental: docstrings update (#18048) | 7 months ago |
| chat_models | experimental: docstrings update (#18048) | 7 months ago |
| comprehend_moderation | experimental: docstrings update (#18048) | 7 months ago |
| cpal | experimental: docstrings update (#18048) | 7 months ago |
| data_anonymizer | experimental: docstrings update (#18048) | 7 months ago |
| fallacy_removal | experimental: docstrings update (#18048) | 7 months ago |
| generative_agents | experimental: docstrings update (#18048) | 7 months ago |
| graph_transformers | experimental: docstrings update (#18048) | 7 months ago |
| llm_bash | experimental: docstrings update (#18048) | 7 months ago |
| llm_symbolic_math | experimental: docstrings update (#18048) | 7 months ago |
| llms | experimental: docstrings update (#18048) | 7 months ago |
| open_clip | experimental: docstrings update (#18048) | 7 months ago |
| openai_assistant | Move OAI assistants to langchain and add callbacks (#13236) | 10 months ago |
| pal_chain | experimental: docstrings update (#18048) | 7 months ago |
| plan_and_execute | experimental: docstrings update (#18048) | 7 months ago |
| prompt_injection_identifier | experimental: docstrings update (#18048) | 7 months ago |
| prompts | experimental: docstrings update (#18048) | 7 months ago |
| pydantic_v1 | `poetry lock` the experimental package. (#9478) | 1 year ago |
| recommenders | experimental: docstrings update (#18048) | 7 months ago |
| retrievers | experimental: docstrings update (#18048) | 7 months ago |
| rl_chain | experimental: docstrings update (#18048) | 7 months ago |
| smart_llm | experimental: docstrings update (#18048) | 7 months ago |
| sql | experimental: docstrings update (#18048) | 7 months ago |
| synthetic_data | experimental: docstrings update (#18048) | 7 months ago |
| tabular_synthetic_data | experimental: docstrings update (#18048) | 7 months ago |
| tools | experimental: docstrings update (#18048) | 7 months ago |
| tot | experimental: docstrings update (#18048) | 7 months ago |
| utilities | Clean up deprecated agents and update __init__ in experimental (#12231) | 11 months ago |
| __init__.py | Add version to langchain_experimental (#11613) | 12 months ago |
| py.typed | Add `py.typed` file to `langchain-experimental`. (#9557) | 1 year ago |
| text_splitter.py | Experimental: Add other threshold types to SemanticChunker (#16807) | 7 months ago |