langchain/libs/community/langchain_community/document_loaders/parsers
Florian MOREL 4b7969efc5
community[minor]: New documents loader for visio files (with extension .vsdx) (#16171)
**Description**: New document loader for Visio files (with extension
.vsdx)

A [Visio file](https://fr.wikipedia.org/wiki/Microsoft_Visio) (with
extension .vsdx) is associated with Microsoft Visio, a diagram creation
software. It stores information about the structure, layout, and
graphical elements of a diagram. This format facilitates the creation
and sharing of visualizations in areas such as business, engineering,
and computer science.

A Visio file can contain multiple pages. Some of them may serve as
backgrounds for others, possibly across multiple layers. This loader
extracts the textual content of each page together with that of its
background pages, so that all text visible on a page is captured,
much as an OCR pass over the rendered page would do.
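
The page-plus-background extraction described above can be illustrated with a minimal sketch. This is not the loader's actual code. It swaps `xmltodict` for the standard library's `xml.etree.ElementTree` to stay dependency-free. It also assumes the usual .vsdx package layout: a ZIP archive with `visio/pages/pages.xml`, a `visio/pages/_rels/pages.xml.rels` part whose targets are relative file names, and `Background` / `BackPage` attributes on each `Page` element. The function names `page_text` and `extract_vsdx_text` are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces assumed from the .vsdx (Open Packaging Conventions) format.
VISIO_NS = "http://schemas.microsoft.com/office/visio/2012/main"
REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def page_text(archive, part_name):
    """Return the non-empty text of every <Text> element in one page part."""
    root = ET.fromstring(archive.read(part_name))
    texts = []
    for node in root.iter(f"{{{VISIO_NS}}}Text"):
        text = "".join(node.itertext()).strip()
        if text:
            texts.append(text)
    return texts


def extract_vsdx_text(source):
    """Map each foreground page name to its text plus its backgrounds' text."""
    with zipfile.ZipFile(source) as archive:
        # Relationship id -> page part, e.g. "rId1" -> "visio/pages/page1.xml"
        # (assumes relative targets in the .rels part).
        rels = ET.fromstring(archive.read("visio/pages/_rels/pages.xml.rels"))
        targets = {
            rel.get("Id"): "visio/pages/" + rel.get("Target")
            for rel in rels.iter(f"{{{PKG_REL_NS}}}Relationship")
        }
        index = ET.fromstring(archive.read("visio/pages/pages.xml"))
        pages = {}
        for page in index.iter(f"{{{VISIO_NS}}}Page"):
            rel = page.find(f"{{{VISIO_NS}}}Rel")
            part = targets[rel.get(f"{{{REL_NS}}}id")]
            pages[page.get("ID")] = {
                "name": page.get("Name"),
                "is_background": page.get("Background") == "1",
                "back_page": page.get("BackPage"),
                "texts": page_text(archive, part),
            }

    result = {}
    for page in pages.values():
        if page["is_background"]:
            continue  # background pages are only reported via their foregrounds
        texts, seen, current = [], set(), page
        # Follow the chain of background pages, guarding against cycles.
        while current is not None and id(current) not in seen:
            seen.add(id(current))
            texts.extend(current["texts"])
            current = pages.get(current["back_page"])
        result[page["name"]] = texts
    return result
```

Calling `extract_vsdx_text("diagram.vsdx")` (a hypothetical file name) would return a dictionary keyed by page name. Following the `BackPage` chain is what lets a foreground page pick up text that sits on a background several layers down.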

**Dependencies**: `xmltodict` package
8 months ago
..
html
language
__init__.py community[minor]: New documents loader for visio files (with extension .vsdx) (#16171) 8 months ago
audio.py
doc_intelligence.py
docai.py
generic.py
grobid.py community[patch]: Update grobid.py (#16298) 8 months ago
msword.py
pdf.py
registry.py
txt.py
vsdx.py community[minor]: New documents loader for visio files (with extension .vsdx) (#16171) 8 months ago