mirror of https://github.com/hwchase17/langchain
Diffbot Graph Transformer / Neo4j Graph document ingestion (#9979)
Co-authored-by: Bagatur <baskaryan@gmail.com>
branch: pull/10149/head
parent ccb9e3ee2d
commit db73c9d5b5
@ -0,0 +1,307 @@
# Diffbot Graph Transformer

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/more/graph/diffbot_transformer.ipynb)

## Use case

Text data often contains rich relationships and insights that can be useful for various analytics, recommendation engines, or knowledge management applications.

Diffbot's NLP API allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.

By coupling Diffbot's NLP API with Neo4j, a graph database, you can create powerful, dynamic graph structures based on the information extracted from text. These graph structures are fully queryable and can be integrated into various applications.

This combination allows for use cases such as:

* Building knowledge graphs from textual documents, websites, or social media feeds.
* Generating recommendations based on semantic relationships in the data.
* Creating advanced search features that understand the relationships between entities.
* Building analytics dashboards that allow users to explore the hidden relationships in data.

## Overview

LangChain provides tools to interact with graph databases:

1. `Construct knowledge graphs from text` using graph transformer and store integrations
2. `Query a graph database` using chains for query creation and execution
3. `Interact with a graph database` using agents for robust and flexible querying

## Quickstart

First, install the required packages and set environment variables:

```python
!pip install langchain langchain-experimental openai neo4j wikipedia
```

## Diffbot NLP Service

Diffbot's NLP service is a tool for extracting entities, relationships, and semantic context from unstructured text data.
This extracted information can be used to construct a knowledge graph.
To use the service, you'll need to obtain an API key from [Diffbot](https://www.diffbot.com/products/natural-language/).

```python
from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer

diffbot_api_key = "DIFFBOT_API_KEY"
diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)
```

This code fetches Wikipedia articles about "Warren Buffett" and then uses `DiffbotGraphTransformer` to extract entities and relationships.
The `DiffbotGraphTransformer` outputs structured data as `GraphDocument` objects, which can be used to populate a graph database.
Note that text chunking is avoided due to Diffbot's [character limit per API request](https://docs.diffbot.com/reference/introduction-to-natural-language-api).

```python
from langchain.document_loaders import WikipediaLoader

query = "Warren Buffett"
raw_documents = WikipediaLoader(query=query).load()
graph_documents = diffbot_nlp.convert_to_graph_documents(raw_documents)
```

## Loading the data into a knowledge graph

You will need a running Neo4j instance. One option is to create a [free Neo4j database instance in their Aura cloud service](https://neo4j.com/cloud/platform/aura-graph-database/). You can also run the database locally using the [Neo4j Desktop application](https://neo4j.com/download/), or in a Docker container. You can start a local Docker container by executing the following script:

```
docker run \
    --name neo4j \
    -p 7474:7474 -p 7687:7687 \
    -d \
    -e NEO4J_AUTH=neo4j/pleaseletmein \
    -e NEO4J_PLUGINS=\["apoc"\] \
    neo4j:latest
```

If you are using the Docker container, you need to wait a couple of seconds for the database to start.

```python
from langchain.graphs import Neo4jGraph

url = "bolt://localhost:7687"
username = "neo4j"
password = "pleaseletmein"

graph = Neo4jGraph(
    url=url,
    username=username,
    password=password
)
```

The `GraphDocument` objects can be loaded into a knowledge graph using the `add_graph_documents` method.

```python
graph.add_graph_documents(graph_documents)
```

## Refresh graph schema information

If the database schema changes, you can refresh the schema information needed to generate Cypher statements.

```python
graph.refresh_schema()
```

## Querying the graph

We can now use the graph Cypher QA chain to ask questions of the graph. It is advisable to use **gpt-4** to construct Cypher queries for the best experience.

```python
from langchain.chains import GraphCypherQAChain
from langchain.chat_models import ChatOpenAI

chain = GraphCypherQAChain.from_llm(
    cypher_llm=ChatOpenAI(temperature=0, model_name="gpt-4"),
    qa_llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
    graph=graph,
    verbose=True,
)
```

```python
chain.run("Which university did Warren Buffett attend?")
```

```
> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: "Warren Buffett"})-[:EDUCATED_AT]->(o:Organization)
RETURN o.name
Full Context:
[{'o.name': 'New York Institute of Finance'}, {'o.name': 'Alice Deal Junior High School'}, {'o.name': 'Woodrow Wilson High School'}, {'o.name': 'University of Nebraska'}]

> Finished chain.

'Warren Buffett attended the University of Nebraska.'
```

```python
chain.run("Who is or was working at Berkshire Hathaway?")
```

```
> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[r:EMPLOYEE_OR_MEMBER_OF]->(o:Organization) WHERE o.name = 'Berkshire Hathaway' RETURN p.name
Full Context:
[{'p.name': 'Charlie Munger'}, {'p.name': 'Oliver Chace'}, {'p.name': 'Howard Buffett'}, {'p.name': 'Howard'}, {'p.name': 'Susan Buffett'}, {'p.name': 'Warren Buffett'}]

> Finished chain.

'Charlie Munger, Oliver Chace, Howard Buffett, Susan Buffett, and Warren Buffett are or were working at Berkshire Hathaway.'
```
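Before loading the extracted `graph_documents` into Neo4j, it can be useful to inspect what the transformer actually produced. The sketch below counts node labels and relationship types in an extracted graph; the `Node` and `Relationship` stand-ins are plain dataclasses that only mirror the shape of LangChain's graph document classes (an assumption for self-containment), so the same loop works unchanged on real `GraphDocument.nodes` and `GraphDocument.relationships`.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Union


# Stand-ins mirroring langchain.graphs.graph_document (assumption:
# real objects expose .type on nodes and relationships the same way).
@dataclass
class Node:
    id: Union[str, int]
    type: str = "Node"
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source: Node
    target: Node
    type: str
    properties: dict = field(default_factory=dict)


def summarize(nodes: List[Node], relationships: List[Relationship]) -> dict:
    """Count node labels and relationship types in one extracted graph."""
    return {
        "node_labels": dict(Counter(n.type for n in nodes)),
        "rel_types": dict(Counter(r.type for r in relationships)),
    }


# Tiny example in the spirit of the notebook's Warren Buffett graph
buffett = Node(id="http://www.wikidata.org/entity/Q47213", type="Person")
berkshire = Node(id="http://www.wikidata.org/entity/Q217583", type="Organization")
rels = [Relationship(source=buffett, target=berkshire, type="EMPLOYEE_OR_MEMBER_OF")]

print(summarize([buffett, berkshire], rels))
```

A skew in this summary (for example, far more `Skill` nodes than expected) is often the quickest signal that the confidence threshold or simplified schema options need adjusting.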
@ -0,0 +1,5 @@
from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer

__all__ = [
    "DiffbotGraphTransformer",
]
@ -0,0 +1,316 @@
from typing import Any, Dict, List, Optional, Sequence, Tuple, Union

import requests
from langchain.graphs.graph_document import GraphDocument, Node, Relationship
from langchain.schema import Document
from langchain.utils import get_from_env


def format_property_key(s: str) -> str:
    """Format a string as a lowerCamelCase property key, e.g. 'Job title' -> 'jobTitle'."""
    words = s.split()
    if not words:
        return s
    first_word = words[0].lower()
    capitalized_words = [word.capitalize() for word in words[1:]]
    return "".join([first_word] + capitalized_words)


class NodesList:
    """
    Manages a list of nodes with associated properties.

    Attributes:
        nodes (Dict[Tuple, Any]): Stores nodes as keys and their properties as values.
            Each key is a tuple where the first element is the
            node ID and the second is the node type.
    """

    def __init__(self) -> None:
        self.nodes: Dict[Tuple[Union[str, int], str], Any] = dict()

    def add_node_property(
        self, node: Tuple[Union[str, int], str], properties: Dict[str, Any]
    ) -> None:
        """
        Adds or updates node properties.

        If the node does not exist in the list, it's added along with its properties.
        If the node already exists, its properties are updated with the new values.

        Args:
            node (Tuple): A tuple containing the node ID and node type.
            properties (Dict): A dictionary of properties to add or update for the node.
        """
        if node not in self.nodes:
            self.nodes[node] = properties
        else:
            self.nodes[node].update(properties)

    def return_node_list(self) -> List[Node]:
        """
        Returns the nodes as a list of Node objects.

        Each Node object will have its ID, type, and properties populated.

        Returns:
            List[Node]: A list of Node objects.
        """
        nodes = [
            Node(id=key[0], type=key[1], properties=self.nodes[key])
            for key in self.nodes
        ]
        return nodes


# Properties that should be treated as node properties instead of relationships
FACT_TO_PROPERTY_TYPE = [
    "Date",
    "Number",
    "Job title",
    "Cause of death",
    "Organization type",
    "Academic title",
]


schema_mapping = [
    ("HEADQUARTERS", "ORGANIZATION_LOCATIONS"),
    ("RESIDENCE", "PERSON_LOCATION"),
    ("ALL_PERSON_LOCATIONS", "PERSON_LOCATION"),
    ("CHILD", "HAS_CHILD"),
    ("PARENT", "HAS_PARENT"),
    ("CUSTOMERS", "HAS_CUSTOMER"),
    ("SKILLED_AT", "INTERESTED_IN"),
]


class SimplifiedSchema:
    """
    Provides functionality for working with a simplified schema mapping.

    Attributes:
        schema (Dict): A dictionary containing the mapping to simplified schema types.
    """

    def __init__(self) -> None:
        """Initializes the schema dictionary based on the predefined list."""
        self.schema = dict()
        for row in schema_mapping:
            self.schema[row[0]] = row[1]

    def get_type(self, type: str) -> str:
        """
        Retrieves the simplified schema type for a given original type.

        Args:
            type (str): The original schema type to find the simplified type for.

        Returns:
            str: The simplified schema type if it exists;
                otherwise, returns the original type.
        """
        try:
            return self.schema[type]
        except KeyError:
            return type


class DiffbotGraphTransformer:
    """Transforms documents into graph documents using Diffbot's NLP API.

    A graph document transformation system takes a sequence of Documents and returns a
    sequence of Graph Documents.

    Example:
        .. code-block:: python

            from langchain_experimental.graph_transformers.diffbot import (
                DiffbotGraphTransformer,
            )

            diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key="DIFFBOT_API_KEY")
            graph_documents = diffbot_nlp.convert_to_graph_documents(documents)
    """

    def __init__(
        self,
        diffbot_api_key: Optional[str] = None,
        fact_confidence_threshold: float = 0.7,
        include_qualifiers: bool = True,
        include_evidence: bool = True,
        simplified_schema: bool = True,
    ) -> None:
        """
        Initialize the graph transformer with various options.

        Args:
            diffbot_api_key (str):
                The API key for Diffbot's NLP services.
            fact_confidence_threshold (float):
                Minimum confidence level for facts to be included.
            include_qualifiers (bool):
                Whether to include qualifiers in the relationships.
            include_evidence (bool):
                Whether to include evidence for the relationships.
            simplified_schema (bool):
                Whether to use a simplified schema for relationships.
        """
        self.diffbot_api_key = diffbot_api_key or get_from_env(
            "diffbot_api_key", "DIFFBOT_API_KEY"
        )
        self.fact_threshold_confidence = fact_confidence_threshold
        self.include_qualifiers = include_qualifiers
        self.include_evidence = include_evidence
        self.simplified_schema = None
        if simplified_schema:
            self.simplified_schema = SimplifiedSchema()

    def nlp_request(self, text: str) -> Dict[str, Any]:
        """
        Make an API request to the Diffbot NLP endpoint.

        Args:
            text (str): The text to be processed.

        Returns:
            Dict[str, Any]: The JSON response from the API.
        """

        # Relationship extraction only works for English
        payload = {
            "content": text,
            "lang": "en",
        }

        FIELDS = "facts"
        HOST = "nl.diffbot.com"
        url = (
            f"https://{HOST}/v1/?fields={FIELDS}&"
            f"token={self.diffbot_api_key}&language=en"
        )
        result = requests.post(url, data=payload)
        return result.json()

    def process_response(
        self, payload: Dict[str, Any], document: Document
    ) -> GraphDocument:
        """
        Transform the Diffbot NLP response into a GraphDocument.

        Args:
            payload (Dict[str, Any]): The JSON response from Diffbot's NLP API.
            document (Document): The original document.

        Returns:
            GraphDocument: The transformed document as a graph.
        """

        # Return empty result if there are no facts
        if "facts" not in payload or not payload["facts"]:
            return GraphDocument(nodes=[], relationships=[], source=document)

        # Nodes are a custom class because we need to deduplicate
        nodes_list = NodesList()
        # Relationships are a plain list because we don't deduplicate them
        relationships = list()
        for record in payload["facts"]:
            # Skip facts below the confidence threshold
            if record["confidence"] < self.fact_threshold_confidence:
                continue

            # TODO: A fact whose value has no type should probably be
            # treated as a node property
            if not record["value"]["allTypes"]:
                continue

            # Define source node
            source_id = (
                record["entity"]["allUris"][0]
                if record["entity"]["allUris"]
                else record["entity"]["name"]
            )
            source_label = record["entity"]["allTypes"][0]["name"].capitalize()
            source_name = record["entity"]["name"]
            source_node = Node(id=source_id, type=source_label)
            nodes_list.add_node_property(
                (source_id, source_label), {"name": source_name}
            )

            # Define target node
            target_id = (
                record["value"]["allUris"][0]
                if record["value"]["allUris"]
                else record["value"]["name"]
            )
            target_label = record["value"]["allTypes"][0]["name"].capitalize()
            target_name = record["value"]["name"]
            # Some facts are better suited as node properties
            if target_label in FACT_TO_PROPERTY_TYPE:
                nodes_list.add_node_property(
                    (source_id, source_label),
                    {format_property_key(record["property"]["name"]): target_name},
                )
            else:  # Define relationship
                # Define target node object
                target_node = Node(id=target_id, type=target_label)
                nodes_list.add_node_property(
                    (target_id, target_label), {"name": target_name}
                )
                # Define relationship type
                rel_type = record["property"]["name"].replace(" ", "_").upper()
                if self.simplified_schema:
                    rel_type = self.simplified_schema.get_type(rel_type)

                # Relationship qualifiers/properties
                rel_properties = dict()
                relationship_evidence = record["evidence"][0]["passage"]
                if self.include_evidence:
                    rel_properties.update({"evidence": relationship_evidence})
                if self.include_qualifiers and record.get("qualifiers"):
                    for property in record["qualifiers"]:
                        prop_key = format_property_key(property["property"]["name"])
                        rel_properties[prop_key] = property["value"]["name"]

                relationship = Relationship(
                    source=source_node,
                    target=target_node,
                    type=rel_type,
                    properties=rel_properties,
                )
                relationships.append(relationship)

        return GraphDocument(
            nodes=nodes_list.return_node_list(),
            relationships=relationships,
            source=document,
        )

    def convert_to_graph_documents(
        self, documents: Sequence[Document]
    ) -> List[GraphDocument]:
        """Convert a sequence of documents into graph documents.

        Args:
            documents (Sequence[Document]): The original documents.

        Returns:
            List[GraphDocument]: The transformed documents as graphs.
        """
        results = []
        for document in documents:
            raw_results = self.nlp_request(document.page_content)
            graph_document = self.process_response(raw_results, document)
            results.append(graph_document)
        return results
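The property-key formatting above turns Diffbot's human-readable fact names into lowerCamelCase graph property keys. A quick self-contained check of that behavior (the function body is copied verbatim from `format_property_key` above so the snippet runs on its own):

```python
def format_property_key(s: str) -> str:
    """Copy of the module helper: lowercases the first word and
    capitalizes the rest, joining without spaces."""
    words = s.split()
    if not words:
        return s
    first_word = words[0].lower()
    capitalized_words = [word.capitalize() for word in words[1:]]
    return "".join([first_word] + capitalized_words)


assert format_property_key("Job title") == "jobTitle"
assert format_property_key("Cause of death") == "causeOfDeath"
assert format_property_key("age") == "age"
assert format_property_key("") == ""  # empty input passes through unchanged
```

This is why a `"Job title"` fact on a `Person` node appears as a `jobTitle` property in the resulting graph rather than as a relationship.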
@ -0,0 +1,51 @@
from __future__ import annotations

from typing import List, Union

from langchain.load.serializable import Serializable
from langchain.pydantic_v1 import Field
from langchain.schema import Document


class Node(Serializable):
    """Represents a node in a graph with associated properties.

    Attributes:
        id (Union[str, int]): A unique identifier for the node.
        type (str): The type or label of the node, default is "Node".
        properties (dict): Additional properties and metadata associated with the node.
    """

    id: Union[str, int]
    type: str = "Node"
    properties: dict = Field(default_factory=dict)


class Relationship(Serializable):
    """Represents a directed relationship between two nodes in a graph.

    Attributes:
        source (Node): The source node of the relationship.
        target (Node): The target node of the relationship.
        type (str): The type of the relationship.
        properties (dict): Additional properties associated with the relationship.
    """

    source: Node
    target: Node
    type: str
    properties: dict = Field(default_factory=dict)


class GraphDocument(Serializable):
    """Represents a graph document consisting of nodes and relationships.

    Attributes:
        nodes (List[Node]): A list of nodes in the graph.
        relationships (List[Relationship]): A list of relationships in the graph.
        source (Document): The document from which the graph information is derived.
    """

    nodes: List[Node]
    relationships: List[Relationship]
    source: Document
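`Neo4jGraph.add_graph_documents` ultimately persists these objects with Cypher; the exact statements it generates are internal to LangChain, but a minimal sketch of the idea is to `MERGE` on the node identifier so that repeated loads stay idempotent. The helper names and the string-only property handling below are simplifying assumptions for illustration, not the library's implementation:

```python
def node_to_cypher(node_id: str, node_type: str, properties: dict) -> str:
    """Build an idempotent MERGE statement for one node.
    Sketch only: assumes the label is a safe identifier and
    all property values are strings (no parameterization)."""
    props = ", ".join(f'n.{k} = "{v}"' for k, v in properties.items())
    stmt = f'MERGE (n:{node_type} {{id: "{node_id}"}})'
    if props:
        stmt += f" SET {props}"
    return stmt


def rel_to_cypher(source_id: str, target_id: str, rel_type: str) -> str:
    """MERGE a relationship between two already-merged nodes."""
    return (
        f'MATCH (a {{id: "{source_id}"}}), (b {{id: "{target_id}"}}) '
        f"MERGE (a)-[:{rel_type}]->(b)"
    )


print(node_to_cypher("Q47213", "Person", {"name": "Warren Buffett"}))
# → MERGE (n:Person {id: "Q47213"}) SET n.name = "Warren Buffett"
```

A production loader would use query parameters and batched `UNWIND` statements instead of string interpolation, but the MERGE-on-id pattern is why loading the same `GraphDocument` twice does not duplicate nodes.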