Diffbot Graph Transformer / Neo4j Graph document ingestion (#9979)

Co-authored-by: Bagatur <baskaryan@gmail.com>
pull/10149/head
Tomaz Bratanic 1 year ago committed by GitHub
parent ccb9e3ee2d
commit db73c9d5b5

@@ -0,0 +1,307 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "7f0b0c06-ee70-468c-8bf5-b023f9e5e0a2",
"metadata": {},
"source": [
"# Diffbot Graph Transformer\n",
"\n",
"[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/more/graph/diffbot_transformer.ipynb)\n",
"\n",
"## Use case\n",
"\n",
"Text data often contain rich relationships and insights that can be useful for various analytics, recommendation engines, or knowledge management applications.\n",
"\n",
"Diffbot's NLP API allows for the extraction of entities, relationships, and semantic meaning from unstructured text data.\n",
"\n",
"By coupling Diffbot's NLP API with Neo4j, a graph database, you can create powerful, dynamic graph structures based on the information extracted from text. These graph structures are fully queryable and can be integrated into various applications.\n",
"\n",
"This combination allows for use cases such as:\n",
"\n",
"* Building knowledge graphs from textual documents, websites, or social media feeds.\n",
"* Generating recommendations based on semantic relationships in the data.\n",
"* Creating advanced search features that understand the relationships between entities.\n",
"* Building analytics dashboards that allow users to explore the hidden relationships in data.\n",
"\n",
"## Overview\n",
"\n",
"LangChain provides tools to interact with Graph Databases:\n",
"\n",
"1. `Construct knowledge graphs from text` using graph transformer and store integrations \n",
"2. `Query a graph database` using chains for query creation and execution\n",
"3. `Interact with a graph database` using agents for robust and flexible querying \n",
"\n",
"## Quickstart\n",
"\n",
"First, get required packages and set environment variables:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "975648da-b24f-4164-a671-6772179e12df",
"metadata": {},
"outputs": [],
"source": [
"!pip install langchain langchain-experimental openai neo4j wikipedia"
]
},
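{
"cell_type": "markdown",
"id": "c1b6e7a2-4f16-4b6e-9a1c-1f3a2d9e5b10",
"metadata": {},
"source": [
"The question-answering chain later in this notebook uses OpenAI chat models, so an OpenAI API key needs to be available as well. A minimal sketch, assuming you supply the key interactively (any other secret-management approach works too):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2c7f8b3-5a27-4c7f-8b2d-2e4b3eaf6c21",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API key: \")"
]
},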
{
"cell_type": "markdown",
"id": "77718977-629e-46c2-b091-f9191b9ec569",
"metadata": {},
"source": [
"## Diffbot NLP Service\n",
"\n",
"Diffbot's NLP service is a tool for extracting entities, relationships, and semantic context from unstructured text data.\n",
"This extracted information can be used to construct a knowledge graph.\n",
"To use their service, you'll need to obtain an API key from [Diffbot](https://www.diffbot.com/products/natural-language/)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2cbf97d0-3682-439b-8750-b695ff726789",
"metadata": {},
"outputs": [],
"source": [
"from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer\n",
"\n",
"diffbot_api_key = \"DIFFBOT_API_KEY\"\n",
"diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)"
]
},
{
"cell_type": "markdown",
"id": "5e3b894a-e3ee-46c7-8116-f8377f8f0159",
"metadata": {},
"source": [
"This code fetches Wikipedia articles about \"Baldur's Gate 3\" and then uses `DiffbotGraphTransformer` to extract entities and relationships.\n",
"The `DiffbotGraphTransformer` outputs a structured data `GraphDocument`, which can be used to populate a graph database.\n",
"Note that text chunking is avoided due to Diffbot's [character limit per API request](https://docs.diffbot.com/reference/introduction-to-natural-language-api)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "53f8df86-47a1-44a1-9a0f-6725b90703bc",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import WikipediaLoader\n",
"\n",
"query = \"Warren Buffett\"\n",
"raw_documents = WikipediaLoader(query=query).load()\n",
"graph_documents = diffbot_nlp.convert_to_graph_documents(raw_documents)"
]
},
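{
"cell_type": "markdown",
"id": "a3d8e9c4-6b38-4d80-9c3e-3f5c4fb07d32",
"metadata": {},
"source": [
"To get a feel for the extracted structure before loading it into Neo4j, you can inspect the first `GraphDocument`. A quick sketch, assuming at least one Wikipedia page was returned:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4e9fad5-7c49-4e91-8d4f-4a6d5fc18e43",
"metadata": {},
"outputs": [],
"source": [
"first = graph_documents[0]\n",
"print(first.nodes[:3])\n",
"print(first.relationships[:3])"
]
},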
{
"cell_type": "markdown",
"id": "31bb851a-aab4-4b97-a6b7-fce397d32b47",
"metadata": {},
"source": [
"## Loading the data into a knowledge graph\n",
"\n",
"You will need to have a running Neo4j instance. One option is to create a [free Neo4j database instance in their Aura cloud service](https://neo4j.com/cloud/platform/aura-graph-database/). You can also run the database locally using the [Neo4j Desktop application](https://neo4j.com/download/), or running a docker container. You can run a local docker container by running the executing the following script:\n",
"```\n",
"docker run \\\n",
" --name neo4j \\\n",
" -p 7474:7474 -p 7687:7687 \\\n",
" -d \\\n",
" -e NEO4J_AUTH=neo4j/pleaseletmein \\\n",
" -e NEO4J_PLUGINS=\\[\\\"apoc\\\"\\] \\\n",
" neo4j:latest\n",
"``` \n",
"If you are using the docker container, you need to wait a couple of second for the database to start."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0b2b6641-5a5d-467c-b148-e6aad5e4baa7",
"metadata": {},
"outputs": [],
"source": [
"from langchain.graphs import Neo4jGraph\n",
"\n",
"url=\"bolt://localhost:7687\"\n",
"username=\"neo4j\"\n",
"password=\"pleaseletmein\"\n",
"\n",
"graph = Neo4jGraph(\n",
" url=url,\n",
" username=username, \n",
" password=password\n",
")"
]
},
{
"cell_type": "markdown",
"id": "0b15e840-fe6f-45db-9193-1b4e2df5c12c",
"metadata": {},
"source": [
"The `GraphDocuments` can be loaded into a knowledge graph using the `add_graph_documents` method."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "1a67c4a8-955c-42a2-9c5d-de3ac0e640ec",
"metadata": {},
"outputs": [],
"source": [
"graph.add_graph_documents(graph_documents)"
]
},
{
"cell_type": "markdown",
"id": "ed411e05-2b03-460d-997e-938482774f40",
"metadata": {},
"source": [
"## Refresh graph schema information\n",
"If the schema of database changes, you can refresh the schema information needed to generate Cypher statements"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "904c9ee3-787c-403f-857d-459ce5ad5a1b",
"metadata": {},
"outputs": [],
"source": [
"graph.refresh_schema()"
]
},
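{
"cell_type": "markdown",
"id": "c5fa0be6-8d5a-4fa2-9e50-5b7e6fd29f54",
"metadata": {},
"source": [
"To see what the Cypher-generating model will be given, you can print the stored schema description. A sketch, assuming the `schema` attribute that `refresh_schema` populates on the graph object:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d60b1cf7-9e6b-40b3-8f61-6c8f70d3a065",
"metadata": {},
"outputs": [],
"source": [
"print(graph.schema)"
]
},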
{
"cell_type": "markdown",
"id": "f19d1387-5899-4258-8c94-8ef5fa7db464",
"metadata": {},
"source": [
"## Querying the graph\n",
"We can now use the graph cypher QA chain to ask question of the graph. It is advisable to use **gpt-4** to construct Cypher queries to get the best experience."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "9393b732-67c8-45c1-9ec2-089f49c62448",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import GraphCypherQAChain\n",
"from langchain.chat_models import ChatOpenAI\n",
"\n",
"chain = GraphCypherQAChain.from_llm(\n",
" cypher_llm=ChatOpenAI(temperature=0, model_name=\"gpt-4\"),\n",
" qa_llm=ChatOpenAI(temperature=0, model_name=\"gpt-3.5-turbo\"),\n",
" graph=graph, verbose=True,\n",
" \n",
")\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "1a9b3652-b436-404d-aa25-5fb576f23dc0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (p:Person {name: \"Warren Buffett\"})-[:EDUCATED_AT]->(o:Organization)\n",
"RETURN o.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'o.name': 'New York Institute of Finance'}, {'o.name': 'Alice Deal Junior High School'}, {'o.name': 'Woodrow Wilson High School'}, {'o.name': 'University of Nebraska'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Warren Buffett attended the University of Nebraska.'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"Which university did Warren Buffett attend?\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "adc0ba0f-a62c-4875-89ce-da717f3ab148",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (p:Person)-[r:EMPLOYEE_OR_MEMBER_OF]->(o:Organization) WHERE o.name = 'Berkshire Hathaway' RETURN p.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'p.name': 'Charlie Munger'}, {'p.name': 'Oliver Chace'}, {'p.name': 'Howard Buffett'}, {'p.name': 'Howard'}, {'p.name': 'Susan Buffett'}, {'p.name': 'Warren Buffett'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Charlie Munger, Oliver Chace, Howard Buffett, Susan Buffett, and Warren Buffett are or were working at Berkshire Hathaway.'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"Who is or was working at Berkshire Hathaway?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d636954b-d967-4e96-9489-92e11c74af35",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@@ -0,0 +1,5 @@
from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer

__all__ = [
    "DiffbotGraphTransformer",
]

@@ -0,0 +1,316 @@
from typing import Any, Dict, List, Optional, Sequence, Tuple, Union

import requests
from langchain.graphs.graph_document import GraphDocument, Node, Relationship
from langchain.schema import Document
from langchain.utils import get_from_env


def format_property_key(s: str) -> str:
    """Convert a whitespace-separated string to lowerCamelCase.

    For example, "Job title" becomes "jobTitle".
    """
    words = s.split()
    if not words:
        return s
    first_word = words[0].lower()
    capitalized_words = [word.capitalize() for word in words[1:]]
    return "".join([first_word] + capitalized_words)


class NodesList:
    """
    Manages a list of nodes with associated properties.

    Attributes:
        nodes (Dict[Tuple, Any]): Stores nodes as keys and their properties as
            values. Each key is a tuple where the first element is the node ID
            and the second is the node type.
    """

    def __init__(self) -> None:
        self.nodes: Dict[Tuple[Union[str, int], str], Any] = dict()

    def add_node_property(
        self, node: Tuple[Union[str, int], str], properties: Dict[str, Any]
    ) -> None:
        """
        Adds or updates node properties.

        If the node does not exist in the list, it's added along with its
        properties. If the node already exists, its properties are updated
        with the new values.

        Args:
            node (Tuple): A tuple containing the node ID and node type.
            properties (Dict): A dictionary of properties to add or update
                for the node.
        """
        if node not in self.nodes:
            self.nodes[node] = properties
        else:
            self.nodes[node].update(properties)

    def return_node_list(self) -> List[Node]:
        """
        Returns the nodes as a list of Node objects.

        Each Node object will have its ID, type, and properties populated.

        Returns:
            List[Node]: A list of Node objects.
        """
        nodes = [
            Node(id=key[0], type=key[1], properties=self.nodes[key])
            for key in self.nodes
        ]
        return nodes


# Properties that should be treated as node properties instead of relationships
FACT_TO_PROPERTY_TYPE = [
    "Date",
    "Number",
    "Job title",
    "Cause of death",
    "Organization type",
    "Academic title",
]


schema_mapping = [
    ("HEADQUARTERS", "ORGANIZATION_LOCATIONS"),
    ("RESIDENCE", "PERSON_LOCATION"),
    ("ALL_PERSON_LOCATIONS", "PERSON_LOCATION"),
    ("CHILD", "HAS_CHILD"),
    ("PARENT", "HAS_PARENT"),
    ("CUSTOMERS", "HAS_CUSTOMER"),
    ("SKILLED_AT", "INTERESTED_IN"),
]


class SimplifiedSchema:
    """
    Provides functionality for working with a simplified schema mapping.

    Attributes:
        schema (Dict): A dictionary containing the mapping to simplified schema types.
    """

    def __init__(self) -> None:
        """Initializes the schema dictionary based on the predefined list."""
        self.schema = dict()
        for row in schema_mapping:
            self.schema[row[0]] = row[1]

    def get_type(self, type: str) -> str:
        """
        Retrieves the simplified schema type for a given original type.

        Args:
            type (str): The original schema type to find the simplified type for.

        Returns:
            str: The simplified schema type if it exists;
                otherwise, returns the original type.
        """
        try:
            return self.schema[type]
        except KeyError:
            return type


class DiffbotGraphTransformer:
    """Transforms documents into graph documents using Diffbot's NLP API.

    A graph document transformation system takes a sequence of Documents and
    returns a sequence of Graph Documents.

    Example:
        .. code-block:: python

            class DiffbotGraphTransformer(BaseGraphDocumentTransformer):

                def transform_documents(
                    self, documents: Sequence[Document], **kwargs: Any
                ) -> Sequence[GraphDocument]:
                    results = []
                    for document in documents:
                        raw_results = self.nlp_request(document.page_content)
                        graph_document = self.process_response(raw_results, document)
                        results.append(graph_document)
                    return results

                async def atransform_documents(
                    self, documents: Sequence[Document], **kwargs: Any
                ) -> Sequence[Document]:
                    raise NotImplementedError
    """

    def __init__(
        self,
        diffbot_api_key: Optional[str] = None,
        fact_confidence_threshold: float = 0.7,
        include_qualifiers: bool = True,
        include_evidence: bool = True,
        simplified_schema: bool = True,
    ) -> None:
        """
        Initialize the graph transformer with various options.

        Args:
            diffbot_api_key (str):
                The API key for Diffbot's NLP services.
            fact_confidence_threshold (float):
                Minimum confidence level for facts to be included.
            include_qualifiers (bool):
                Whether to include qualifiers in the relationships.
            include_evidence (bool):
                Whether to include evidence for the relationships.
            simplified_schema (bool):
                Whether to use a simplified schema for relationships.
        """
        self.diffbot_api_key = diffbot_api_key or get_from_env(
            "diffbot_api_key", "DIFFBOT_API_KEY"
        )
        self.fact_threshold_confidence = fact_confidence_threshold
        self.include_qualifiers = include_qualifiers
        self.include_evidence = include_evidence
        self.simplified_schema = None
        if simplified_schema:
            self.simplified_schema = SimplifiedSchema()

    def nlp_request(self, text: str) -> Dict[str, Any]:
        """
        Make an API request to the Diffbot NLP endpoint.

        Args:
            text (str): The text to be processed.

        Returns:
            Dict[str, Any]: The JSON response from the API.
        """
        # Relationship extraction only works for English
        payload = {
            "content": text,
            "lang": "en",
        }

        FIELDS = "facts"
        HOST = "nl.diffbot.com"
        url = (
            f"https://{HOST}/v1/?fields={FIELDS}&"
            f"token={self.diffbot_api_key}&language=en"
        )
        result = requests.post(url, data=payload)
        return result.json()

    def process_response(
        self, payload: Dict[str, Any], document: Document
    ) -> GraphDocument:
        """
        Transform the Diffbot NLP response into a GraphDocument.

        Args:
            payload (Dict[str, Any]): The JSON response from Diffbot's NLP API.
            document (Document): The original document.

        Returns:
            GraphDocument: The transformed document as a graph.
        """
        # Return empty result if there are no facts
        if "facts" not in payload or not payload["facts"]:
            return GraphDocument(nodes=[], relationships=[], source=document)

        # Nodes are a custom class because we need to deduplicate
        nodes_list = NodesList()
        # Relationships are a list because we don't deduplicate nor anything else
        relationships = list()
        for record in payload["facts"]:
            # Skip if the fact is below the threshold confidence
            if record["confidence"] < self.fact_threshold_confidence:
                continue

            # TODO: It should probably be treated as a node property
            if not record["value"]["allTypes"]:
                continue

            # Define source node
            source_id = (
                record["entity"]["allUris"][0]
                if record["entity"]["allUris"]
                else record["entity"]["name"]
            )
            source_label = record["entity"]["allTypes"][0]["name"].capitalize()
            source_name = record["entity"]["name"]
            source_node = Node(id=source_id, type=source_label)
            nodes_list.add_node_property(
                (source_id, source_label), {"name": source_name}
            )

            # Define target node
            target_id = (
                record["value"]["allUris"][0]
                if record["value"]["allUris"]
                else record["value"]["name"]
            )
            target_label = record["value"]["allTypes"][0]["name"].capitalize()
            target_name = record["value"]["name"]
            # Some facts are better suited as node properties
            if target_label in FACT_TO_PROPERTY_TYPE:
                nodes_list.add_node_property(
                    (source_id, source_label),
                    {format_property_key(record["property"]["name"]): target_name},
                )
            else:  # Define relationship
                # Define target node object
                target_node = Node(id=target_id, type=target_label)
                nodes_list.add_node_property(
                    (target_id, target_label), {"name": target_name}
                )
                # Define relationship type
                rel_type = record["property"]["name"].replace(" ", "_").upper()
                if self.simplified_schema:
                    rel_type = self.simplified_schema.get_type(rel_type)

                # Relationship qualifiers/properties
                rel_properties = dict()
                if self.include_evidence and record.get("evidence"):
                    # Use the first supporting passage as evidence
                    rel_properties.update(
                        {"evidence": record["evidence"][0]["passage"]}
                    )
                if self.include_qualifiers and record.get("qualifiers"):
                    for qualifier in record["qualifiers"]:
                        prop_key = format_property_key(qualifier["property"]["name"])
                        rel_properties[prop_key] = qualifier["value"]["name"]

                relationship = Relationship(
                    source=source_node,
                    target=target_node,
                    type=rel_type,
                    properties=rel_properties,
                )
                relationships.append(relationship)

        return GraphDocument(
            nodes=nodes_list.return_node_list(),
            relationships=relationships,
            source=document,
        )

    def convert_to_graph_documents(
        self, documents: Sequence[Document]
    ) -> List[GraphDocument]:
        """Convert a sequence of documents into graph documents.

        Args:
            documents (Sequence[Document]): The original documents.

        Returns:
            List[GraphDocument]: The transformed documents as graphs.
        """
        results = []
        for document in documents:
            raw_results = self.nlp_request(document.page_content)
            graph_document = self.process_response(raw_results, document)
            results.append(graph_document)
        return results

@@ -3752,6 +3752,31 @@ files = [
{file = "types_PyYAML-6.0.12.11-py3-none-any.whl", hash = "sha256:a461508f3096d1d5810ec5ab95d7eeecb651f3a15b71959999988942063bf01d"},
]
[[package]]
name = "types-requests"
version = "2.31.0.2"
description = "Typing stubs for requests"
optional = false
python-versions = "*"
files = [
{file = "types-requests-2.31.0.2.tar.gz", hash = "sha256:6aa3f7faf0ea52d728bb18c0a0d1522d9bfd8c72d26ff6f61bfc3d06a411cf40"},
{file = "types_requests-2.31.0.2-py3-none-any.whl", hash = "sha256:56d181c85b5925cbc59f4489a57e72a8b2166f18273fd8ba7b6fe0c0b986f12a"},
]
[package.dependencies]
types-urllib3 = "*"
[[package]]
name = "types-urllib3"
version = "1.26.25.14"
description = "Typing stubs for urllib3"
optional = false
python-versions = "*"
files = [
{file = "types-urllib3-1.26.25.14.tar.gz", hash = "sha256:229b7f577c951b8c1b92c1bc2b2fdb0b49847bd2af6d1cc2a2e3dd340f3bda8f"},
{file = "types_urllib3-1.26.25.14-py3-none-any.whl", hash = "sha256:9683bbb7fb72e32bfe9d2be6e04875fbe1b3eeec3cbb4ea231435aa7fd6b4f0e"},
]
[[package]]
name = "typing-extensions"
version = "4.7.1"
@@ -3995,4 +4020,4 @@ extended-testing = ["faker", "presidio-analyzer", "presidio-anonymizer"]
[metadata]
lock-version = "2.0"
python-versions = ">=3.8.1,<4.0"
content-hash = "66ac482bd05eb74414210ac28fc1e8dae1a9928a4a1314e1326fada3551aa8ad"
content-hash = "443e88f690572715cf58671e4480a006574c7141a1258dff0a0818b954184901"

@@ -23,6 +23,7 @@ black = "^23.1.0"
[tool.poetry.group.typing.dependencies]
mypy = "^0.991"
types-pyyaml = "^6.0.12.2"
types-requests = "^2.28.11.5"
[tool.poetry.group.dev.dependencies]
jupyter = "^1.0.0"

@@ -0,0 +1,51 @@
from __future__ import annotations

from typing import List, Union

from langchain.load.serializable import Serializable
from langchain.pydantic_v1 import Field
from langchain.schema import Document


class Node(Serializable):
    """Represents a node in a graph with associated properties.

    Attributes:
        id (Union[str, int]): A unique identifier for the node.
        type (str): The type or label of the node, default is "Node".
        properties (dict): Additional properties and metadata associated with the node.
    """

    id: Union[str, int]
    type: str = "Node"
    properties: dict = Field(default_factory=dict)


class Relationship(Serializable):
    """Represents a directed relationship between two nodes in a graph.

    Attributes:
        source (Node): The source node of the relationship.
        target (Node): The target node of the relationship.
        type (str): The type of the relationship.
        properties (dict): Additional properties associated with the relationship.
    """

    source: Node
    target: Node
    type: str
    properties: dict = Field(default_factory=dict)


class GraphDocument(Serializable):
    """Represents a graph document consisting of nodes and relationships.

    Attributes:
        nodes (List[Node]): A list of nodes in the graph.
        relationships (List[Relationship]): A list of relationships in the graph.
        source (Document): The document from which the graph information is derived.
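
    Example (a hypothetical, hand-built graph; in practice these objects are
    produced by a transformer such as ``DiffbotGraphTransformer``):
        .. code-block:: python

            alice = Node(id="alice", type="Person", properties={"name": "Alice"})
            acme = Node(id="acme", type="Organization", properties={"name": "Acme"})
            doc = GraphDocument(
                nodes=[alice, acme],
                relationships=[
                    Relationship(
                        source=alice, target=acme, type="EMPLOYEE_OR_MEMBER_OF"
                    )
                ],
                source=Document(page_content="Alice works at Acme."),
            )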
"""
nodes: List[Node]
relationships: List[Relationship]
source: Document

@@ -1,5 +1,7 @@
from typing import Any, Dict, List
from langchain.graphs.graph_document import GraphDocument
node_properties_query = """
CALL apoc.meta.data()
YIELD label, other, elementType, type, property
@@ -99,3 +101,56 @@ class Neo4jGraph:
        The relationships are the following:
        {[el['output'] for el in relationships]}
        """

    def add_graph_documents(
        self, graph_documents: List[GraphDocument], include_source: bool = False
    ) -> None:
        """
        Take a list of GraphDocuments and use them to construct the graph.

        If include_source is True, each source document is also created as a
        Document node and linked to the nodes it mentions.
        """
        for document in graph_documents:
            include_docs_query = (
                "CREATE (d:Document) "
                "SET d.text = $document.page_content "
                "SET d += $document.metadata "
                "WITH d "
            )
            # Import nodes
            self.query(
                (
                    f"{include_docs_query if include_source else ''}"
                    "UNWIND $data AS row "
                    "CALL apoc.merge.node([row.type], {id: row.id}, "
                    "row.properties, {}) YIELD node "
                    f"{'MERGE (d)-[:MENTIONS]->(node) ' if include_source else ''}"
                    "RETURN distinct 'done' AS result"
                ),
                {
                    "data": [el.__dict__ for el in document.nodes],
                    "document": document.source.__dict__,
                },
            )
            # Import relationships
            self.query(
                "UNWIND $data AS row "
                "CALL apoc.merge.node([row.source_label], {id: row.source},"
                "{}, {}) YIELD node as source "
                "CALL apoc.merge.node([row.target_label], {id: row.target},"
                "{}, {}) YIELD node as target "
                "CALL apoc.merge.relationship(source, row.type, "
                "{}, row.properties, target) YIELD rel "
                "RETURN distinct 'done'",
                {
                    "data": [
                        {
                            "source": el.source.id,
                            "source_label": el.source.type,
                            "target": el.target.id,
                            "target_label": el.target.type,
                            "type": el.type.replace(" ", "_").upper(),
                            "properties": el.properties,
                        }
                        for el in document.relationships
                    ]
                },
            )
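
# A minimal usage sketch (hypothetical credentials), assuming a running Neo4j
# instance with the APOC plugin installed and graph documents produced by a
# graph transformer such as DiffbotGraphTransformer:
#
#     graph = Neo4jGraph(
#         url="bolt://localhost:7687", username="neo4j", password="pleaseletmein"
#     )
#     graph.add_graph_documents(graph_documents, include_source=True)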
