langchain/tests/integration_tests/document_loaders/test_xml.py
qued e4224a396b
feat: Add UnstructuredXMLLoader for .xml files (#5955)
# Unstructured XML Loader
Adds an `UnstructuredXMLLoader` class for .xml files. Works with
unstructured>=0.6.7. A plain text representation of the text with the
XML tags will be available under the `page_content` attribute in the
doc.

### Testing
```python
from langchain.document_loaders import UnstructuredXMLLoader

loader = UnstructuredXMLLoader(
    "example_data/factbook.xml",
)
docs = loader.load()
```


## Who can review?

@hwchase17 
@eyurtsev
2023-06-10 16:24:42 -07:00

16 lines
421 B
Python

import os
from pathlib import Path
from langchain.document_loaders import UnstructuredXMLLoader
EXAMPLE_DIRECTORY = file_path = Path(__file__).parent.parent / "examples"
def test_unstructured_xml_loader() -> None:
"""Test unstructured loader."""
file_path = os.path.join(EXAMPLE_DIRECTORY, "factbook.xml")
loader = UnstructuredXMLLoader(str(file_path))
docs = loader.load()
assert len(docs) == 1