langchain/libs/experimental
matt haigh a4896da2a0
Experimental: Add other threshold types to SemanticChunker (#16807)
**Description**
This PR adds more threshold types to the semantic chunker. I’ve had much
better and more predictable performance when using standard deviations
instead of percentiles.


![image](https://github.com/langchain-ai/langchain/assets/44395485/066e84a8-460e-4da5-9fa1-4ff79a1941c5)

For all the documents I’ve tried, the distribution of distances looks
similar to the above: roughly normal with a positive skew. All the skews
I’ve seen are less than 1, which explains why standard deviations
perform well, but I’ve included IQR in case anyone wants something more
robust.
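
As an illustration of how the three threshold types differ, here is a minimal standard-library sketch. It assumes each threshold is computed from the list of distances between consecutive sentences, and that a split happens wherever a distance exceeds the threshold. The function names, the `kind` strings, and the sample values are hypothetical and are not taken from the package itself.

```python
import statistics


def percentile(values, q):
    """Return the q-th percentile (0 <= q <= 100) using linear interpolation."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must be non-empty")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def breakpoint_threshold(distances, kind, amount):
    """Turn a list of distances into a single cut-off value.

    Assumed (illustrative) definitions:
      percentile         -> the `amount`-th percentile of the distances
      standard_deviation -> mean + amount * population standard deviation
      interquartile      -> mean + amount * (Q3 - Q1)
    """
    if kind == "percentile":
        return percentile(distances, amount)
    if kind == "standard_deviation":
        return statistics.fmean(distances) + amount * statistics.pstdev(distances)
    if kind == "interquartile":
        iqr = percentile(distances, 75) - percentile(distances, 25)
        return statistics.fmean(distances) + amount * iqr
    raise ValueError(f"unknown threshold type: {kind!r}")


def split_indices(distances, threshold):
    """Indices of the gaps whose distance is strictly above the threshold."""
    return [i for i, d in enumerate(distances) if d > threshold]


# Made-up distances, for illustration only.
sample = [0.12, 0.18, 0.15, 0.71, 0.20, 0.16, 0.64, 0.14]
for kind, amount in [("percentile", 95), ("standard_deviation", 1), ("interquartile", 1.5)]:
    t = breakpoint_threshold(sample, kind, amount)
    print(kind, round(t, 3), split_indices(sample, t))
```

The percentile rule always cuts a fixed share of the gaps, whereas the standard-deviation and IQR rules cut only where a distance is unusually large relative to the spread of the data, which may be why they behave more predictably on skewed distributions.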

Also, by using the percentile method backwards, you can declare the
number of clusters and use semantic chunking to get an ‘optimal’ splitting.
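
One way to read "using the percentile method backwards" is sketched below: map the requested number of chunks to a percentile, then split at every distance above that percentile. The linear mapping used here (one chunk maps to the 100th percentile, as many chunks as there are distances maps to the 0th) is an assumption made for illustration, and the helper names are hypothetical.

```python
def percentile(values, q):
    """Return the q-th percentile (0 <= q <= 100) using linear interpolation."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must be non-empty")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def percentile_for_chunks(n_distances, number_of_chunks):
    """Assumed linear mapping from a requested chunk count to a percentile."""
    if n_distances < 1:
        raise ValueError("need at least one distance")
    if n_distances == 1:
        # Only one gap exists: either keep it whole or cut at it.
        return 100.0 if number_of_chunks <= 1 else 0.0
    x1, y1 = float(n_distances), 0.0   # many chunks -> low percentile
    x2, y2 = 1.0, 100.0                # one chunk   -> highest percentile
    x = max(min(float(number_of_chunks), x1), x2)  # clamp the request
    y = y1 + (y2 - y1) / (x2 - x1) * (x - x1)
    return min(max(y, 0.0), 100.0)


def split_for_chunk_count(distances, number_of_chunks):
    """Indices of the gaps to cut at so that roughly `number_of_chunks` remain."""
    q = percentile_for_chunks(len(distances), number_of_chunks)
    threshold = percentile(distances, q)
    return [i for i, d in enumerate(distances) if d > threshold]


# Made-up distances, for illustration only.
sample = [0.05, 0.30, 0.10, 0.80, 0.15, 0.60, 0.20, 0.90, 0.25, 0.70, 0.35]
cuts = split_for_chunk_count(sample, 6)
print(cuts, "->", len(cuts) + 1, "chunks")
```

The resulting chunk count is only approximate in general: ties among the distances, or a request larger than the number of gaps, can yield fewer chunks than asked for.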

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
7 months ago
| Name | Last commit | Updated |
| --- | --- | --- |
| langchain_experimental | Experimental: Add other threshold types to SemanticChunker (#16807) | 7 months ago |
| scripts | infra: add print rule to ruff (#16221) | 7 months ago |
| tests | docs, templates: update schema imports to core (#17885) | 7 months ago |
| LICENSE | Library Licenses (#13300) | 10 months ago |
| Makefile | create mypy cache dir if it doesn't exist (#14579) | 9 months ago |
| README.md | CONTRIBUTING.md Quick Start: focus on langchain core; clarify docs and experimental are separate (#10906) | 12 months ago |
| poetry.lock | experimental[patch]: Release 0.0.52 (#17763) | 7 months ago |
| poetry.toml | Harrison/move experimental (#8084) | 1 year ago |
| pyproject.toml | experimental[patch]: Release 0.0.52 (#17763) | 7 months ago |

README.md

🦜🧪 LangChain Experimental

This package holds experimental LangChain code, intended for research and experimental uses.

> [!WARNING]
> Portions of the code in this package may be dangerous if not properly deployed in a sandboxed environment. Please be wary of deploying experimental code to production unless you've taken appropriate precautions and have already discussed it with your security team.

Some of the code here may be marked with security notices. However, given the exploratory and experimental nature of this package, the absence of a security notice on a piece of code does not mean that the code is safe to use without additional security considerations.