langchain/libs/experimental/langchain_experimental
Raviraj 858ce264ef
SemanticChunker : Feature Addition ("Semantic Splitting with gradient") (#22895)
```SemanticChunker``` currently provides three methods to split text semantically:
- percentile
- standard_deviation
- interquartile

I propose a new method, ```gradient```. It computes the gradient of the distance array and then applies the percentile method to that gradient to decide where to split chunks. This is useful when chunks are highly correlated with each other or specific to a domain, e.g. legal or medical. The idea is to apply anomaly detection to the gradient array, so that the distribution becomes wider and boundaries are easier to identify in data whose sentences are semantically close to one another.
I have tested this change on a set of 10 domain-specific documents (mostly legal).
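
The idea described above can be illustrated with a minimal, dependency-free sketch. It is not the code added to `text_splitter.py`. The helper names (`gradient`, `percentile`, `gradient_breakpoints`), the default of 95 and the sample distances are all assumptions made for illustration. It takes the distances between consecutive sentence embeddings, differentiates them numerically, and marks as breakpoints the positions whose gradient exceeds a chosen percentile of the gradient array.

```python
from typing import List, Sequence


def gradient(values: Sequence[float]) -> List[float]:
    """Numerical gradient: central differences inside, one-sided at both ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two distances")
    grads = [values[1] - values[0]]
    for i in range(1, n - 1):
        grads.append((values[i + 1] - values[i - 1]) / 2.0)
    grads.append(values[-1] - values[-2])
    return grads


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100), linearly interpolated between sorted ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def gradient_breakpoints(
    distances: Sequence[float], threshold_percentile: float = 95.0
) -> List[int]:
    """Indices whose distance gradient exceeds the percentile threshold.

    Hypothetical helper: the threshold default and the strict ">" comparison
    are assumptions, not values taken from the commit.
    """
    grads = gradient(distances)
    threshold = percentile(grads, threshold_percentile)
    return [i for i, g in enumerate(grads) if g > threshold]


# Made-up distances between consecutive sentences; one sharp jump at index 4.
sample_distances = [0.10, 0.12, 0.11, 0.13, 0.60, 0.12, 0.11]
breakpoints = gradient_breakpoints(sample_distances)
print(breakpoints)
```

Because the threshold is taken over the gradient rather than the raw distances, a run of uniformly small distances (as in closely related legal clauses) still yields a clear outlier where the distance starts to rise. Note that with central differences the flagged index is the position just before the jump in the sample data.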

Details:
    - **Issue:** Improvement
    - **Dependencies:** NA
    - **Twitter handle:** [x.com/prajapat_ravi](https://x.com/prajapat_ravi)


@hwchase17

---------

Co-authored-by: Raviraj Prajapat <raviraj.prajapat@sirionlabs.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2024-06-17 21:01:08 -07:00
agents experimental[patch]/docs[patch]: Update links to security docs (#22864) 2024-06-13 20:29:34 +00:00
autonomous_agents experimental[patch]: return from HuggingGPT task executor task.run() exception (#20219) 2024-04-25 20:16:39 +00:00
chat_models infra: rm unused # noqa violations (#22049) 2024-05-22 15:21:08 -07:00
comprehend_moderation langchain: callbacks imports fix (#20348) 2024-04-12 20:13:14 +00:00
cpal Fix: lint errors and update Field alias in models.py and AutoSelectionScorer initialization (#22846) 2024-06-13 18:18:00 -07:00
data_anonymizer experimental[patch]: update module doc strings (#19539) 2024-03-26 10:38:10 -04:00
fallacy_removal experimental[patch]: prompts import fix (#20534) 2024-04-18 16:09:11 -04:00
generative_agents patch: deprecate (a)get_relevant_documents (#20477) 2024-04-22 11:14:53 -04:00
graph_transformers Improve llm graph transformer docstring (#22939) 2024-06-15 15:33:26 -04:00
llm_bash infra: rm unused # noqa violations (#22049) 2024-05-22 15:21:08 -07:00
llm_symbolic_math experimental[patch]: prompts import fix (#20534) 2024-04-18 16:09:11 -04:00
llms [experimental][llms][OllamaFunctions] tool calling related fixes (#22339) 2024-06-12 16:34:43 -04:00
open_clip experimental[patch]: update module doc strings (#19539) 2024-03-26 10:38:10 -04:00
openai_assistant Move OAI assistants to langchain and add callbacks (#13236) 2023-11-13 17:42:07 -08:00
pal_chain community[major], experimental[patch]: Remove Python REPL from community (#22904) 2024-06-14 17:53:29 +00:00
plan_and_execute experimental[patch]: prompts import fix (#20534) 2024-04-18 16:09:11 -04:00
prompt_injection_identifier experimental[minor]: upgrade the prompt injection model (#20783) 2024-04-23 10:23:39 -04:00
prompts experimental[patch]: prompts import fix (#20534) 2024-04-18 16:09:11 -04:00
pydantic_v1 poetry lock the experimental package. (#9478) 2023-08-22 14:09:35 -04:00
recommenders infra: rm unused # noqa violations (#22049) 2024-05-22 15:21:08 -07:00
retrievers langchain: callbacks imports fix (#20348) 2024-04-12 20:13:14 +00:00
rl_chain infra: rm unused # noqa violations (#22049) 2024-05-22 15:21:08 -07:00
smart_llm experimental[patch]: prompts import fix (#20534) 2024-04-18 16:09:11 -04:00
sql experimental[patch], docs: refine notebook for MyScale SelfQueryRetriever (#22016) 2024-05-22 21:49:01 +00:00
synthetic_data experimental[patch]: prompts import fix (#20534) 2024-04-18 16:09:11 -04:00
tabular_synthetic_data experimental[patch]: prompts import fix (#20534) 2024-04-18 16:09:11 -04:00
tools langchain: callbacks imports fix (#20348) 2024-04-12 20:13:14 +00:00
tot experimental[patch]: prompts import fix (#20534) 2024-04-18 16:09:11 -04:00
utilities experimental: clean python repl input(experimental:Added code for PythonREPL) (#20930) 2024-05-01 05:19:09 +00:00
video_captioning langchain: callbacks imports fix (#20348) 2024-04-12 20:13:14 +00:00
__init__.py Add version to langchain_experimental (#11613) 2023-10-10 11:17:41 -04:00
py.typed Add py.typed file to langchain-experimental. (#9557) 2023-08-21 15:37:16 -04:00
text_splitter.py SemanticChunker : Feature Addition ("Semantic Splitting with gradient") (#22895) 2024-06-17 21:01:08 -07:00