langchain/docs
mziru 9e3c1d4463
add HTMLHeaderTextSplitter (#11039)
Description: Similar in concept to the `MarkdownHeaderTextSplitter`, the
`HTMLHeaderTextSplitter` is a "structure-aware" chunker that splits text
at the element level and attaches, as metadata, each header that is
"relevant" to (in scope for) a given chunk. It can return chunks element
by element or combine elements that share the same metadata, with the
goals of (a) keeping related text grouped (roughly) by meaning and
(b) preserving the context encoded in the document's structure. It can
be combined with other text splitters as part of a chunking pipeline.
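To make the behaviour described above concrete, here is a minimal sketch of header-aware HTML chunking built only on Python's standard-library `html.parser`. It is an illustration of the idea, not the `HTMLHeaderTextSplitter` implementation (which depends on `lxml`). Everything in the sketch is hypothetical: the names `split_html`, `split_long_chunks`, `Chunk`, and `BLOCK_TAGS`, the `return_each_element` flag, and the choice of which tags count as element boundaries. The `(tag, metadata key)` pairs are an assumed convention loosely modeled on `MarkdownHeaderTextSplitter`. In this sketch, header text is stored only as metadata and never appears in a chunk body.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Tags treated as element boundaries in this sketch (an assumption, not lxml's rules).
BLOCK_TAGS = {"body", "div", "p", "li", "ul", "ol", "td", "tr", "table", "pre",
              "blockquote", "section", "article", "h1", "h2", "h3", "h4", "h5", "h6"}


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


class _HeaderAwareParser(HTMLParser):
    """Emits one Chunk per text-bearing element, tagged with the headers in scope."""

    def __init__(self, headers_to_split_on):
        super().__init__()
        self.header_keys = dict(headers_to_split_on)  # e.g. {"h1": "Header 1"}
        self.levels = {tag: int(tag[1:]) for tag in self.header_keys}  # "h2" -> 2
        self.active = {}      # header tag -> text of the most recent such header
        self.elements = []    # collected Chunks, in document order
        self._in_header = None
        self._header_buf = []
        self._text_buf = []

    def _flush_text(self):
        # Collapse whitespace; emit a Chunk only if the element held real text.
        text = " ".join("".join(self._text_buf).split())
        self._text_buf = []
        if text:
            ordered = sorted(self.active.items(), key=lambda kv: self.levels[kv[0]])
            meta = {self.header_keys[tag]: value for tag, value in ordered}
            self.elements.append(Chunk(text, meta))

    def handle_starttag(self, tag, attrs):
        if tag in self.header_keys:
            self._flush_text()
            self._in_header, self._header_buf = tag, []
        elif tag in BLOCK_TAGS:
            self._flush_text()

    def handle_endtag(self, tag):
        if tag == self._in_header:
            level = self.levels[tag]
            # A header ends the scope of same-level and deeper headers (h2 clears h3).
            self.active = {t: v for t, v in self.active.items() if self.levels[t] < level}
            self.active[tag] = " ".join("".join(self._header_buf).split())
            self._in_header = None
        elif tag in BLOCK_TAGS:
            self._flush_text()

    def handle_data(self, data):
        (self._header_buf if self._in_header else self._text_buf).append(data)


def split_html(html, headers_to_split_on, return_each_element=False):
    """Split HTML into header-tagged chunks (hypothetical helper)."""
    parser = _HeaderAwareParser(headers_to_split_on)
    parser.feed(html)
    parser.close()
    parser._flush_text()  # emit any trailing text outside a closed block
    if return_each_element:
        return parser.elements
    # Otherwise merge consecutive elements whose header metadata is identical.
    chunks = []
    for el in parser.elements:
        if chunks and chunks[-1].metadata == el.metadata:
            chunks[-1].text += "\n" + el.text
        else:
            chunks.append(Chunk(el.text, dict(el.metadata)))
    return chunks


def split_long_chunks(chunks, max_chars):
    """Naive second pipeline stage: cut long chunks, copying their header metadata."""
    out = []
    for c in chunks:
        for start in range(0, len(c.text), max_chars):
            out.append(Chunk(c.text[start:start + max_chars], dict(c.metadata)))
    return out
```

With `return_each_element=True`, each paragraph comes back on its own with the headers in scope as metadata. The default instead merges neighbouring paragraphs under the same headers into one chunk. `split_long_chunks` is a deliberately crude stand-in for a size-based splitter such as LangChain's `RecursiveCharacterTextSplitter`. It shows only the pipeline idea: a later stage can cut oversized chunks while each piece keeps the header metadata from the structure-aware stage.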

Dependency: `lxml` Python package

Maintainer: @hwchase17

Twitter handle: @MartinZirulnik

---------

Co-authored-by: PresidioVantage <github@presidiovantage.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
12 months ago
| Name | Last commit | Updated |
|---|---|---|
| _scripts | llm feat table revision (#10947) | 12 months ago |
| api_reference | add model feat table (#10921) | 12 months ago |
| docs_skeleton | Use term keyword according to the official python doc glossary (#11338) | 12 months ago |
| extras | add HTMLHeaderTextSplitter (#11039) | 12 months ago |
| snippets | Docs: improve similarity search examples (#11298) | 12 months ago |
| .local_build.sh | Update local script for docs build (#8377) | 1 year ago |
| vercel_requirements.txt | Add api cross ref linking (#8275) | 1 year ago |