
# Beautiful Soup

>[Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/) is a Python package for parsing
> HTML and XML documents, including documents with malformed markup such as unclosed tags (the name comes from "tag soup").
> It creates a parse tree for parsed pages that can be used to extract data from HTML, which
> is useful for web scraping.

## Installation and Setup

```bash
pip install beautifulsoup4
```
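
To check the installation, a minimal sketch like the one below parses a small HTML string and reads values from the resulting parse tree. The HTML snippet and the `example.com` URL are made up for illustration, and the `<p>` tag is deliberately left unclosed to show that malformed markup is tolerated. The sketch uses Python's built-in `html.parser` backend so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; note the <p> tag is never closed.
html = (
    "<html><body>"
    "<h1>Release notes</h1>"
    '<p>See the <a href="https://example.com/docs">docs</a> for details'
    "</body></html>"
)

# Build the parse tree with the standard-library HTML parser backend.
soup = BeautifulSoup(html, "html.parser")

# Navigate the tree: first <h1>, first <p>, and every <a> element.
title = soup.h1.get_text()
paragraph = soup.p.get_text()
links = [a["href"] for a in soup.find_all("a")]

print(title)
print(paragraph)
print(links)
```

Other parser backends (for example `lxml`) may repair malformed markup differently, so the exact tree shape can vary between backends.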

## Document Transformer

See a [usage example](/docs/integrations/document_transformers/beautiful_soup).

```python
from langchain.document_transformers import BeautifulSoupTransformer
```