Diffbot
Diffbot is a service for reading web pages. Unlike traditional web scraping tools, Diffbot doesn't require any rules to read the content on a page. It starts with computer vision, which classifies a page into one of 20 possible types. The content is then interpreted by a machine learning model trained to identify the key attributes on a page based on its type. The result is a website transformed into clean, structured data (such as JSON or CSV), ready for your application.
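To make the "page in, structured data out" idea concrete, here is a minimal sketch of how a request to Diffbot's v3 Article API could be assembled and how the extracted text might be read from the JSON response. The helper names `build_article_request_url` and `extract_text` are hypothetical, and the response shape (an `objects` list whose first entry carries a `text` field) is an assumption to check against Diffbot's own API documentation.

```python
from urllib.parse import urlencode

# Diffbot v3 Article API endpoint.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_article_request_url(api_token: str, page_url: str) -> str:
    """Return the GET URL that asks Diffbot to extract `page_url`.

    Hypothetical helper: it only builds the URL and sends nothing.
    """
    query = urlencode({"token": api_token, "url": page_url})
    return f"{ARTICLE_ENDPOINT}?{query}"


def extract_text(response: dict) -> str:
    """Pull the extracted body text out of a parsed JSON response.

    Assumes the response has an "objects" list whose first entry has a
    "text" field; returns an empty string when that is not the case.
    """
    objects = response.get("objects") or []
    return objects[0].get("text", "") if objects else ""
```

The URL can be fetched with any HTTP client, and the parsed JSON body passed to `extract_text`. No scraping rules or selectors appear anywhere in the request, only the token and the page address.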
Installation and Setup
Read the instructions on how to get a Diffbot API token.
Document Loader
See a usage example.
from langchain.document_loaders import DiffbotLoader
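A minimal sketch of how the loader might be used, assuming the token is kept in an environment variable. The variable name `DIFFBOT_API_TOKEN` and the helpers `get_diffbot_token` and `load_pages` are illustrative choices, not part of LangChain. `DiffbotLoader` takes the token and a list of URLs, and `load()` returns one document per page.

```python
import os


def get_diffbot_token(env_var: str = "DIFFBOT_API_TOKEN") -> str:
    """Read the API token from the environment.

    The variable name is an assumption; use whatever your deployment sets.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to your Diffbot API token")
    return token


def load_pages(urls, api_token=None):
    """Fetch each URL through Diffbot and return LangChain documents."""
    # Imported lazily so this module can be imported without langchain.
    from langchain.document_loaders import DiffbotLoader

    loader = DiffbotLoader(
        api_token=api_token or get_diffbot_token(),
        urls=list(urls),
    )
    return loader.load()


# Example call (needs network access and a valid token):
# docs = load_pages(["https://python.langchain.com/"])
# print(docs[0].metadata, len(docs[0].page_content))
```

Keeping the token out of source code and reading it at call time is a design choice of this sketch; passing `api_token` explicitly works just as well.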