mirror of
https://github.com/hwchase17/langchain
synced 2024-10-29 17:07:25 +00:00
cb84f612c9
- Updated `document_transformers` examples: titles, descriptions, links - Added `integrations/providers` for missed document_transformers
21 lines
606 B
Plaintext
# Beautiful Soup

>[Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/) is a Python package for parsing
> HTML and XML documents, including documents with malformed markup such as unclosed tags (the package is named after "tag soup").
> It builds a parse tree from each parsed page that can be used to extract data from HTML, which
> is useful for web scraping.
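As a quick illustration of the parse-tree idea described above, the sketch below feeds deliberately malformed HTML (an unclosed `<div>` and an unclosed `<b>`) to Beautiful Soup and then reads data back out of the tree. It assumes `beautifulsoup4` is installed and uses Python's built-in `"html.parser"` backend; the markup and variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Deliberately malformed markup: <div> and <b> are never closed.
html = '<div><p>Hello <b>world</p><a href="https://example.com">Example</a>'

# "html.parser" ships with Python, so no extra parser package is required.
soup = BeautifulSoup(html, "html.parser")

# The repaired parse tree still exposes the elements, so data can be pulled out.
paragraph_text = soup.find("p").get_text()
link_urls = [a["href"] for a in soup.find_all("a")]

print(paragraph_text)  # Hello world
print(link_urls)  # ['https://example.com']
```

Other backends such as `lxml` or `html5lib` may repair invalid markup differently, so the exact tree depends on the parser chosen.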

## Installation and Setup

```bash
pip install beautifulsoup4
```

## Document Transformer

See a [usage example](/docs/integrations/document_transformers/beautiful_soup).

```python
from langchain.document_transformers import BeautifulSoupTransformer
```