langchain/docs/extras/ecosystem/integrations/diffbot.mdx
Davis Chase 87e502c6bc
Doc refactor (#6300)
Co-authored-by: jacoblee93 <jacoblee93@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-16 11:52:56 -07:00

19 lines
837 B
Plaintext

# Diffbot
>[Diffbot](https://docs.diffbot.com/docs) is a service to read web pages. Unlike traditional web scraping tools,
> `Diffbot` doesn't require any rules to read the content on a page.
>It starts with computer vision, which classifies a page into one of 20 possible types. Content is then interpreted by a machine learning model trained to identify the key attributes on a page based on its type.
>The result is a website transformed into clean-structured data (like JSON or CSV), ready for your application.
## Installation and Setup
Read [instructions](https://docs.diffbot.com/reference/authentication) how to get the Diffbot API Token.
## Document Loader
See a [usage example](/docs/modules/data_connection/document_loaders/integrations/diffbot.html).
```python
from langchain.document_loaders import DiffbotLoader
```