docs: Clean up Diffbot docs (#21781)

The Diffbot DocumentLoader page doesn't actually run for a number of
reasons. This PR fixes it along with some light details on the Graph
Transformer and Provider pages.

## Full Changelog

[Document Loader
Page](https://python.langchain.com/v0.1/docs/integrations/document_loaders/diffbot/)
* Fixed the notebook so that it actually runs (missing required modules,
env variables, etc..)
* Added "open in colab" button like the Graph Transformer page

[Graph Transformer
Page](https://python.langchain.com/v0.2/docs/integrations/graphs/diffbot/)
* Fixed broken colab link
* Moved "open in colab" button to below description so the description
in the [Graphs category
page](https://python.langchain.com/v0.2/docs/integrations/graphs/) shows
up correctly

[Provider
Page](https://python.langchain.com/v0.2/docs/integrations/providers/diffbot/)
* Clarified explanations of Diffbot products
* Added section and link to LangChain Graph Transformer page

---------

Co-authored-by: jeromechoo <hello@jeromechoo.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
pull/21735/head
Jerome Choo 4 months ago committed by GitHub
parent d8a101074f
commit 2316635add
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

File diff suppressed because one or more lines are too long

@ -7,24 +7,21 @@
"source": [
"# Diffbot\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/graph/diffbot_graphtransformer.ipynb)\n",
"\n",
">[Diffbot](https://docs.diffbot.com/docs/getting-started-with-diffbot) is a suite of products that make it easy to integrate and research data on the web.\n",
">[Diffbot](https://docs.diffbot.com/docs/getting-started-with-diffbot) is a suite of ML-based products that make it easy to structure web data.\n",
">\n",
">[The Diffbot Knowledge Graph](https://docs.diffbot.com/docs/getting-started-with-diffbot-knowledge-graph) is a self-updating graph database of the public web.\n",
">Diffbot's [Natural Language Processing API](https://www.diffbot.com/products/natural-language/) allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/integrations/graphs/diffbot.ipynb)\n",
"\n",
"## Use case\n",
"\n",
"Text data often contain rich relationships and insights used for various analytics, recommendation engines, or knowledge management applications.\n",
"\n",
"`Diffbot's NLP API` allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.\n",
"\n",
"By coupling `Diffbot's NLP API` with `Neo4j`, a graph database, you can create powerful, dynamic graph structures based on the information extracted from text. These graph structures are fully queryable and can be integrated into various applications.\n",
"\n",
"This combination allows for use cases such as:\n",
"\n",
"* Building knowledge graphs from textual documents, websites, or social media feeds.\n",
"* Building knowledge graphs (like [Diffbot's Knowledge Graph](https://www.diffbot.com/products/knowledge-graph/)) from textual documents, websites, or social media feeds.\n",
"* Generating recommendations based on semantic relationships in the data.\n",
"* Creating advanced search features that understand the relationships between entities.\n",
"* Building analytics dashboards that allow users to explore the hidden relationships in data.\n",
@ -57,11 +54,11 @@
"id": "77718977-629e-46c2-b091-f9191b9ec569",
"metadata": {},
"source": [
"### Diffbot NLP Service\n",
"### Diffbot NLP API\n",
"\n",
"`Diffbot's NLP` service is a tool for extracting entities, relationships, and semantic context from unstructured text data.\n",
"`Diffbot's NLP API` is a tool for extracting entities, relationships, and semantic context from unstructured text data.\n",
"This extracted information can be used to construct a knowledge graph.\n",
"To use their service, you'll need to obtain an API key from [Diffbot](https://www.diffbot.com/products/natural-language/)."
"To use the API, you'll need to obtain a [free API token from Diffbot](https://app.diffbot.com/get-started/)."
]
},
{
@ -73,8 +70,8 @@
"source": [
"from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer\n",
"\n",
"diffbot_api_key = \"DIFFBOT_API_KEY\"\n",
"diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)"
"diffbot_api_token = \"DIFFBOT_API_TOKEN\"\n",
"diffbot_nlp = DiffbotGraphTransformer(diffbot_api_token=diffbot_api_token)"
]
},
{

@ -1,18 +1,29 @@
# Diffbot
>[Diffbot](https://docs.diffbot.com/docs) is a service to read web pages. Unlike traditional web scraping tools,
> `Diffbot` doesn't require any rules to read the content on a page.
>It starts with computer vision, which classifies a page into one of 20 possible types. Content is then interpreted by a machine learning model trained to identify the key attributes on a page based on its type.
>The result is a website transformed into clean-structured data (like JSON or CSV), ready for your application.
> [Diffbot](https://docs.diffbot.com/docs) is a suite of ML-based products that make it easy to structure and integrate web data.
## Installation and Setup
Read [instructions](https://docs.diffbot.com/reference/authentication) how to get the Diffbot API Token.
[Get a free Diffbot API token](https://app.diffbot.com/get-started/) and [follow these instructions](https://docs.diffbot.com/reference/authentication) to authenticate your requests.
## Document Loader
Diffbot's [Extract API](https://docs.diffbot.com/reference/extract-introduction) is a service that structures and normalizes data from web pages.
Unlike traditional web scraping tools, `Diffbot Extract` doesn't require any rules to read the content on a page. It uses a computer vision model to classify a page into one of 20 possible types, and then transforms raw HTML markup into JSON. The resulting structured JSON follows a consistent [type-based ontology](https://docs.diffbot.com/docs/ontology), which makes it easy to extract data from multiple different web sources with the same schema.
See a [usage example](/docs/integrations/document_loaders/diffbot).
```python
from langchain_community.document_loaders import DiffbotLoader
```
## Graphs
Diffbot's [Natural Language Processing API](https://www.diffbot.com/products/natural-language/) allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.
See a [usage example](/docs/integrations/graphs/diffbot).
```python
from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer
```

Loading…
Cancel
Save