mirror of https://github.com/hwchase17/langchain
You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
308 lines
10 KiB
Plaintext
308 lines
10 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "7f0b0c06-ee70-468c-8bf5-b023f9e5e0a2",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Diffbot\n",
|
|
"\n",
|
|
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/graph/diffbot_graphtransformer.ipynb)\n",
|
|
"\n",
|
|
">[Diffbot](https://docs.diffbot.com/docs/getting-started-with-diffbot) is a suite of products that make it easy to integrate and research data on the web.\n",
|
|
">\n",
|
|
">[The Diffbot Knowledge Graph](https://docs.diffbot.com/docs/getting-started-with-diffbot-knowledge-graph) is a self-updating graph database of the public web.\n",
|
|
"\n",
|
|
"\n",
|
|
"## Use case\n",
|
|
"\n",
|
|
"Text data often contain rich relationships and insights used for various analytics, recommendation engines, or knowledge management applications.\n",
|
|
"\n",
|
|
"`Diffbot's NLP API` allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.\n",
|
|
"\n",
|
|
"By coupling `Diffbot's NLP API` with `Neo4j`, a graph database, you can create powerful, dynamic graph structures based on the information extracted from text. These graph structures are fully queryable and can be integrated into various applications.\n",
|
|
"\n",
|
|
"This combination allows for use cases such as:\n",
|
|
"\n",
|
|
"* Building knowledge graphs from textual documents, websites, or social media feeds.\n",
|
|
"* Generating recommendations based on semantic relationships in the data.\n",
|
|
"* Creating advanced search features that understand the relationships between entities.\n",
|
|
"* Building analytics dashboards that allow users to explore the hidden relationships in data.\n",
|
|
"\n",
|
|
"## Overview\n",
|
|
"\n",
|
|
"LangChain provides tools to interact with Graph Databases:\n",
|
|
"\n",
|
|
"1. `Construct knowledge graphs from text` using graph transformer and store integrations \n",
|
|
"2. `Query a graph database` using chains for query creation and execution\n",
|
|
"3. `Interact with a graph database` using agents for robust and flexible querying \n",
|
|
"\n",
|
|
"## Setting up\n",
|
|
"\n",
|
|
"First, get required packages and set environment variables:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "975648da-b24f-4164-a671-6772179e12df",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%pip install --upgrade --quiet langchain langchain-experimental langchain-openai neo4j wikipedia"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "77718977-629e-46c2-b091-f9191b9ec569",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Diffbot NLP Service\n",
|
|
"\n",
|
|
"`Diffbot's NLP` service is a tool for extracting entities, relationships, and semantic context from unstructured text data.\n",
|
|
"This extracted information can be used to construct a knowledge graph.\n",
|
|
"To use their service, you'll need to obtain an API key from [Diffbot](https://www.diffbot.com/products/natural-language/)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"id": "2cbf97d0-3682-439b-8750-b695ff726789",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer\n",
|
|
"\n",
|
|
"diffbot_api_key = \"DIFFBOT_API_KEY\"\n",
|
|
"diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "5e3b894a-e3ee-46c7-8116-f8377f8f0159",
|
|
"metadata": {},
|
|
"source": [
|
|
"This code fetches Wikipedia articles about \"Warren Buffett\" and then uses `DiffbotGraphTransformer` to extract entities and relationships.\n",
|
|
"The `DiffbotGraphTransformer` outputs a structured data `GraphDocument`, which can be used to populate a graph database.\n",
|
|
"Note that text chunking is avoided due to Diffbot's [character limit per API request](https://docs.diffbot.com/reference/introduction-to-natural-language-api)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"id": "53f8df86-47a1-44a1-9a0f-6725b90703bc",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_community.document_loaders import WikipediaLoader\n",
|
|
"\n",
|
|
"query = \"Warren Buffett\"\n",
|
|
"raw_documents = WikipediaLoader(query=query).load()\n",
|
|
"graph_documents = diffbot_nlp.convert_to_graph_documents(raw_documents)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "31bb851a-aab4-4b97-a6b7-fce397d32b47",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Loading the data into a knowledge graph\n",
|
|
"\n",
|
|
"You will need to have a running Neo4j instance. One option is to create a [free Neo4j database instance in their Aura cloud service](https://neo4j.com/cloud/platform/aura-graph-database/). You can also run the database locally using the [Neo4j Desktop application](https://neo4j.com/download/), or running a docker container. You can run a local docker container by running the executing the following script:\n",
|
|
"```\n",
|
|
"docker run \\\n",
|
|
" --name neo4j \\\n",
|
|
" -p 7474:7474 -p 7687:7687 \\\n",
|
|
" -d \\\n",
|
|
" -e NEO4J_AUTH=neo4j/pleaseletmein \\\n",
|
|
" -e NEO4J_PLUGINS=\\[\\\"apoc\\\"\\] \\\n",
|
|
" neo4j:latest\n",
|
|
"``` \n",
|
|
"If you are using the docker container, you need to wait a couple of second for the database to start."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"id": "0b2b6641-5a5d-467c-b148-e6aad5e4baa7",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_community.graphs import Neo4jGraph\n",
|
|
"\n",
|
|
"url = \"bolt://localhost:7687\"\n",
|
|
"username = \"neo4j\"\n",
|
|
"password = \"pleaseletmein\"\n",
|
|
"\n",
|
|
"graph = Neo4jGraph(url=url, username=username, password=password)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "0b15e840-fe6f-45db-9193-1b4e2df5c12c",
|
|
"metadata": {},
|
|
"source": [
|
|
"The `GraphDocuments` can be loaded into a knowledge graph using the `add_graph_documents` method."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"id": "1a67c4a8-955c-42a2-9c5d-de3ac0e640ec",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"graph.add_graph_documents(graph_documents)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ed411e05-2b03-460d-997e-938482774f40",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Refresh graph schema information\n",
|
|
"If the schema of database changes, you can refresh the schema information needed to generate Cypher statements"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"id": "904c9ee3-787c-403f-857d-459ce5ad5a1b",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"graph.refresh_schema()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "f19d1387-5899-4258-8c94-8ef5fa7db464",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Querying the graph\n",
|
|
"We can now use the graph cypher QA chain to ask question of the graph. It is advisable to use **gpt-4** to construct Cypher queries to get the best experience."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"id": "9393b732-67c8-45c1-9ec2-089f49c62448",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain.chains import GraphCypherQAChain\n",
|
|
"from langchain_openai import ChatOpenAI\n",
|
|
"\n",
|
|
"chain = GraphCypherQAChain.from_llm(\n",
|
|
" cypher_llm=ChatOpenAI(temperature=0, model_name=\"gpt-4\"),\n",
|
|
" qa_llm=ChatOpenAI(temperature=0, model_name=\"gpt-3.5-turbo\"),\n",
|
|
" graph=graph,\n",
|
|
" verbose=True,\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"id": "1a9b3652-b436-404d-aa25-5fb576f23dc0",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"\n",
|
|
"\n",
|
|
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
|
|
"Generated Cypher:\n",
|
|
"\u001b[32;1m\u001b[1;3mMATCH (p:Person {name: \"Warren Buffett\"})-[:EDUCATED_AT]->(o:Organization)\n",
|
|
"RETURN o.name\u001b[0m\n",
|
|
"Full Context:\n",
|
|
"\u001b[32;1m\u001b[1;3m[{'o.name': 'New York Institute of Finance'}, {'o.name': 'Alice Deal Junior High School'}, {'o.name': 'Woodrow Wilson High School'}, {'o.name': 'University of Nebraska'}]\u001b[0m\n",
|
|
"\n",
|
|
"\u001b[1m> Finished chain.\u001b[0m\n"
|
|
]
|
|
},
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"'Warren Buffett attended the University of Nebraska.'"
|
|
]
|
|
},
|
|
"execution_count": 8,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"chain.run(\"Which university did Warren Buffett attend?\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 9,
|
|
"id": "adc0ba0f-a62c-4875-89ce-da717f3ab148",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"\n",
|
|
"\n",
|
|
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
|
|
"Generated Cypher:\n",
|
|
"\u001b[32;1m\u001b[1;3mMATCH (p:Person)-[r:EMPLOYEE_OR_MEMBER_OF]->(o:Organization) WHERE o.name = 'Berkshire Hathaway' RETURN p.name\u001b[0m\n",
|
|
"Full Context:\n",
|
|
"\u001b[32;1m\u001b[1;3m[{'p.name': 'Charlie Munger'}, {'p.name': 'Oliver Chace'}, {'p.name': 'Howard Buffett'}, {'p.name': 'Howard'}, {'p.name': 'Susan Buffett'}, {'p.name': 'Warren Buffett'}]\u001b[0m\n",
|
|
"\n",
|
|
"\u001b[1m> Finished chain.\u001b[0m\n"
|
|
]
|
|
},
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"'Charlie Munger, Oliver Chace, Howard Buffett, Susan Buffett, and Warren Buffett are or were working at Berkshire Hathaway.'"
|
|
]
|
|
},
|
|
"execution_count": 9,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"chain.run(\"Who is or was working at Berkshire Hathaway?\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "d636954b-d967-4e96-9489-92e11c74af35",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.12"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|