langchain/docs/docs/integrations/graphs/memgraph.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "311b3061",
   "metadata": {},
   "source": [
    "# Memgraph\n",
    "\n",
    ">[Memgraph](https://github.com/memgraph/memgraph) is the open-source graph database, compatible with `Neo4j`.\n",
    ">The database is using the `Cypher` graph query language, \n",
    ">\n",
    ">[Cypher](https://en.wikipedia.org/wiki/Cypher_(query_language)) is a declarative graph query language that allows for expressive and efficient data querying in a property graph.\n",
    "\n",
    "This notebook shows how to use LLMs to provide a natural language interface to a [Memgraph](https://github.com/memgraph/memgraph) database.\n",
    "\n",
    "\n",
    "## Setting up\n",
    "\n",
    "To complete this tutorial, you will need [Docker](https://www.docker.com/get-started/) and [Python 3.x](https://www.python.org/) installed.\n",
    "\n",
    "Ensure you have a running `Memgraph` instance. You can download and run it in a local Docker container by executing the following script:\n",
    "```\n",
    "docker run \\\n",
    "    -it \\\n",
    "    -p 7687:7687 \\\n",
    "    -p 7444:7444 \\\n",
    "    -p 3000:3000 \\\n",
    "    -e MEMGRAPH=\"--bolt-server-name-for-init=Neo4j/\" \\\n",
    "    -v mg_lib:/var/lib/memgraph memgraph/memgraph-platform\n",
    "```\n",
    "\n",
    "You will need to wait a few seconds for the database to start. If the process is completed successfully, you should see something like this:\n",
    "```\n",
    "mgconsole X.X\n",
    "Connected to 'memgraph://127.0.0.1:7687'\n",
    "Type :help for shell usage\n",
    "Quit the shell by typing Ctrl-D(eof) or :quit\n",
    "memgraph>\n",
    "```\n",
    "\n",
    "Now you can start playing with `Memgraph`!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45ee105e",
   "metadata": {},
   "source": [
    "Begin by installing and importing all the necessary packages. We'll use the package manager called [pip](https://pip.pypa.io/en/stable/installation/), along with the `--user` flag, to ensure proper permissions. If you've installed Python 3.4 or a later version, pip is included by default. You can install all the required packages using the following command:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fd6b9672",
   "metadata": {},
   "outputs": [],
   "source": [
    "pip install langchain langchain-openai neo4j gqlalchemy --user"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec969a02",
   "metadata": {},
   "source": [
    "You can either run the provided code blocks in this notebook or use a separate Python file to experiment with Memgraph and LangChain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8206f90d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from gqlalchemy import Memgraph\n",
    "from langchain.chains import GraphCypherQAChain\n",
    "from langchain.prompts import PromptTemplate\n",
    "from langchain_community.graphs import MemgraphGraph\n",
    "from langchain_openai import ChatOpenAI"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "95ba37a4",
   "metadata": {},
   "source": [
    "We're utilizing the Python library [GQLAlchemy](https://github.com/memgraph/gqlalchemy) to establish a connection between our Memgraph database and Python script. To execute queries, we can set up a Memgraph instance as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b90c9cf8",
   "metadata": {},
   "outputs": [],
   "source": [
    "memgraph = Memgraph(host=\"127.0.0.1\", port=7687)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c379d16",
   "metadata": {},
   "source": [
    "## Populating the database\n",
    "You can effortlessly populate your new, empty database using the Cypher query language. Don't worry if you don't grasp every line just yet, you can learn Cypher from the documentation [here](https://memgraph.com/docs/cypher-manual/). Running the following script will execute a seeding query on the database, giving us data about a video game, including details like the publisher, available platforms, and genres. This data will serve as a basis for our work."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11922bdf",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Creating and executing the seeding query\n",
    "query = \"\"\"\n",
    "    MERGE (g:Game {name: \"Baldur's Gate 3\"})\n",
    "    WITH g, [\"PlayStation 5\", \"Mac OS\", \"Windows\", \"Xbox Series X/S\"] AS platforms,\n",
    "            [\"Adventure\", \"Role-Playing Game\", \"Strategy\"] AS genres\n",
    "    FOREACH (platform IN platforms |\n",
    "        MERGE (p:Platform {name: platform})\n",
    "        MERGE (g)-[:AVAILABLE_ON]->(p)\n",
    "    )\n",
    "    FOREACH (genre IN genres |\n",
    "        MERGE (gn:Genre {name: genre})\n",
    "        MERGE (g)-[:HAS_GENRE]->(gn)\n",
    "    )\n",
    "    MERGE (p:Publisher {name: \"Larian Studios\"})\n",
    "    MERGE (g)-[:PUBLISHED_BY]->(p);\n",
    "\"\"\"\n",
    "\n",
    "memgraph.execute(query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "378db965",
   "metadata": {},
   "source": [
    "## Refresh graph schema"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6b37df3",
   "metadata": {},
   "source": [
    "You're all set to instantiate the Memgraph-LangChain graph using the following script. This interface will allow us to query our database using LangChain, automatically creating the required graph schema for generating Cypher queries through LLM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f38bbe83",
   "metadata": {},
   "outputs": [],
   "source": [
    "graph = MemgraphGraph(url=\"bolt://localhost:7687\", username=\"\", password=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "846c32a8",
   "metadata": {},
   "source": [
    "If necessary, you can manually refresh the graph schema as follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b561026e",
   "metadata": {},
   "outputs": [],
   "source": [
    "graph.refresh_schema()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c51b7948",
   "metadata": {},
   "source": [
    "To familiarize yourself with the data and verify the updated graph schema, you can print it using the following statement."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2e0ec3e",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(graph.schema)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0c2a556",
   "metadata": {},
   "source": [
    "```\n",
    "Node properties are the following:\n",
    "Node name: 'Game', Node properties: [{'property': 'name', 'type': 'str'}]\n",
    "Node name: 'Platform', Node properties: [{'property': 'name', 'type': 'str'}]\n",
    "Node name: 'Genre', Node properties: [{'property': 'name', 'type': 'str'}]\n",
    "Node name: 'Publisher', Node properties: [{'property': 'name', 'type': 'str'}]\n",
    "\n",
    "Relationship properties are the following:\n",
    "\n",
    "The relationships are the following:\n",
    "['(:Game)-[:AVAILABLE_ON]->(:Platform)']\n",
    "['(:Game)-[:HAS_GENRE]->(:Genre)']\n",
    "['(:Game)-[:PUBLISHED_BY]->(:Publisher)']\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44d3a1da",
   "metadata": {},
   "source": [
    "## Querying the database"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8aedfd63",
   "metadata": {},
   "source": [
    "To interact with the OpenAI API, you must configure your API key as an environment variable using the Python [os](https://docs.python.org/3/library/os.html) package. This ensures proper authorization for your requests. You can find more information on obtaining your API key [here](https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b8385c72",
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ[\"OPENAI_API_KEY\"] = \"your-key-here\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a74565a",
   "metadata": {},
   "source": [
    "You should create the graph chain using the following script, which will be utilized in the question-answering process based on your graph data. While it defaults to GPT-3.5-turbo, you might also consider experimenting with other models like [GPT-4](https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4) for notably improved Cypher queries and outcomes. We'll utilize the OpenAI chat, utilizing the key you previously configured. We'll set the temperature to zero, ensuring predictable and consistent answers. Additionally, we'll use our Memgraph-LangChain graph and set the verbose parameter, which defaults to False, to True to receive more detailed messages regarding query generation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a3a5f2e",
   "metadata": {},
   "outputs": [],
   "source": [
    "chain = GraphCypherQAChain.from_llm(\n",
    "    ChatOpenAI(temperature=0), graph=graph, verbose=True, model_name=\"gpt-3.5-turbo\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "949de4f3",
   "metadata": {},
   "source": [
    "Now you can start asking questions!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b7aea263",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = chain.run(\"Which platforms is Baldur's Gate 3 available on?\")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a06a8164",
   "metadata": {},
   "source": [
    "```\n",
    "> Entering new GraphCypherQAChain chain...\n",
    "Generated Cypher:\n",
    "MATCH (g:Game {name: 'Baldur\\'s Gate 3'})-[:AVAILABLE_ON]->(p:Platform)\n",
    "RETURN p.name\n",
    "Full Context:\n",
    "[{'p.name': 'PlayStation 5'}, {'p.name': 'Mac OS'}, {'p.name': 'Windows'}, {'p.name': 'Xbox Series X/S'}]\n",
    "\n",
    "> Finished chain.\n",
    "Baldur's Gate 3 is available on PlayStation 5, Mac OS, Windows, and Xbox Series X/S.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "59d298d5",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = chain.run(\"Is Baldur's Gate 3 available on Windows?\")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99dd783c",
   "metadata": {},
   "source": [
    "```\n",
    "> Entering new GraphCypherQAChain chain...\n",
    "Generated Cypher:\n",
    "MATCH (:Game {name: 'Baldur\\'s Gate 3'})-[:AVAILABLE_ON]->(:Platform {name: 'Windows'})\n",
    "RETURN true\n",
    "Full Context:\n",
    "[{'true': True}]\n",
    "\n",
    "> Finished chain.\n",
    "Yes, Baldur's Gate 3 is available on Windows.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08620465",
   "metadata": {},
   "source": [
    "## Chain modifiers"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6603e6c8",
   "metadata": {},
   "source": [
    "To modify the behavior of your chain and obtain more context or additional information, you can modify the chain's parameters."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8d187a83",
   "metadata": {},
   "source": [
    "#### Return direct query results\n",
    "The `return_direct` modifier specifies whether to return the direct results of the executed Cypher query or the processed natural language response."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0533847d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Return the result of querying the graph directly\n",
    "chain = GraphCypherQAChain.from_llm(\n",
    "    ChatOpenAI(temperature=0), graph=graph, verbose=True, return_direct=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "afbe96fb",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = chain.run(\"Which studio published Baldur's Gate 3?\")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "94b32b6e",
   "metadata": {},
   "source": [
    "```\n",
    "> Entering new GraphCypherQAChain chain...\n",
    "Generated Cypher:\n",
    "MATCH (:Game {name: 'Baldur\\'s Gate 3'})-[:PUBLISHED_BY]->(p:Publisher)\n",
    "RETURN p.name\n",
    "\n",
    "> Finished chain.\n",
    "[{'p.name': 'Larian Studios'}]\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c97ab3a",
   "metadata": {},
   "source": [
    "#### Return query intermediate steps\n",
    "The `return_intermediate_steps` chain modifier enhances the returned response by including the intermediate steps of the query in addition to the initial query result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82f673c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Return all the intermediate steps of query execution\n",
    "chain = GraphCypherQAChain.from_llm(\n",
    "    ChatOpenAI(temperature=0), graph=graph, verbose=True, return_intermediate_steps=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d87e0976",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = chain(\"Is Baldur's Gate 3 an Adventure game?\")\n",
    "print(f\"Intermediate steps: {response['intermediate_steps']}\")\n",
    "print(f\"Final response: {response['result']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df12b3da",
   "metadata": {},
   "source": [
    "```\n",
    "> Entering new GraphCypherQAChain chain...\n",
    "Generated Cypher:\n",
    "MATCH (g:Game {name: 'Baldur\\'s Gate 3'})-[:HAS_GENRE]->(genre:Genre {name: 'Adventure'})\n",
    "RETURN g, genre\n",
    "Full Context:\n",
    "[{'g': {'name': \"Baldur's Gate 3\"}, 'genre': {'name': 'Adventure'}}]\n",
    "\n",
    "> Finished chain.\n",
    "Intermediate steps: [{'query': \"MATCH (g:Game {name: 'Baldur\\\\'s Gate 3'})-[:HAS_GENRE]->(genre:Genre {name: 'Adventure'})\\nRETURN g, genre\"}, {'context': [{'g': {'name': \"Baldur's Gate 3\"}, 'genre': {'name': 'Adventure'}}]}]\n",
    "Final response: Yes, Baldur's Gate 3 is an Adventure game.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41124485",
   "metadata": {},
   "source": [
    "#### Limit the number of query results\n",
    "The `top_k` modifier can be used when you want to restrict the maximum number of query results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7340fc87",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Limit the maximum number of results returned by query\n",
    "chain = GraphCypherQAChain.from_llm(\n",
    "    ChatOpenAI(temperature=0), graph=graph, verbose=True, top_k=2\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3a17cdc6",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = chain.run(\"What genres are associated with Baldur's Gate 3?\")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dcff33ed",
   "metadata": {},
   "source": [
    "```\n",
    "> Entering new GraphCypherQAChain chain...\n",
    "Generated Cypher:\n",
    "MATCH (:Game {name: 'Baldur\\'s Gate 3'})-[:HAS_GENRE]->(g:Genre)\n",
    "RETURN g.name\n",
    "Full Context:\n",
    "[{'g.name': 'Adventure'}, {'g.name': 'Role-Playing Game'}]\n",
    "\n",
    "> Finished chain.\n",
    "Baldur's Gate 3 is associated with the genres Adventure and Role-Playing Game.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2eb524a1",
   "metadata": {},
   "source": [
    "# Advanced querying"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "113be997",
   "metadata": {},
   "source": [
    "As the complexity of your solution grows, you might encounter different use-cases that require careful handling. Ensuring your application's scalability is essential to maintain a smooth user flow without any hitches."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e0b2db17",
   "metadata": {},
   "source": [
    "Let's instantiate our chain once again and attempt to ask some questions that users might potentially ask."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fc544d0b",
   "metadata": {},
   "outputs": [],
   "source": [
    "chain = GraphCypherQAChain.from_llm(\n",
    "    ChatOpenAI(temperature=0), graph=graph, verbose=True, model_name=\"gpt-3.5-turbo\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e2abde2d",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = chain.run(\"Is Baldur's Gate 3 available on PS5?\")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf22dc48",
   "metadata": {},
   "source": [
    "```\n",
    "> Entering new GraphCypherQAChain chain...\n",
    "Generated Cypher:\n",
    "MATCH (g:Game {name: 'Baldur\\'s Gate 3'})-[:AVAILABLE_ON]->(p:Platform {name: 'PS5'})\n",
    "RETURN g.name, p.name\n",
    "Full Context:\n",
    "[]\n",
    "\n",
    "> Finished chain.\n",
    "I'm sorry, but I don't have the information to answer your question.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "293aa1c9",
   "metadata": {},
   "source": [
    "The generated Cypher query looks fine, but we didn't receive any information in response. This illustrates a common challenge when working with LLMs - the misalignment between how users phrase queries and how data is stored. In this case, the difference between user perception and the actual data storage can cause mismatches. Prompt refinement, the process of honing the model's prompts to better grasp these distinctions, is an efficient solution that tackles this issue. Through prompt refinement, the model gains increased proficiency in generating precise and pertinent queries, leading to the successful retrieval of the desired data."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a87b2f1b",
   "metadata": {},
   "source": [
    "### Prompt refinement"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8edb9976",
   "metadata": {},
   "source": [
    "To address this, we can adjust the initial Cypher prompt of the QA chain. This involves adding guidance to the LLM on how users can refer to specific platforms, such as PS5 in our case. We achieve this using the LangChain [PromptTemplate](/docs/modules/model_io/prompts/), creating a modified initial prompt. This modified prompt is then supplied as an argument to our refined Memgraph-LangChain instance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "312dad05",
   "metadata": {},
   "outputs": [],
   "source": [
    "CYPHER_GENERATION_TEMPLATE = \"\"\"\n",
    "Task:Generate Cypher statement to query a graph database.\n",
    "Instructions:\n",
    "Use only the provided relationship types and properties in the schema.\n",
    "Do not use any other relationship types or properties that are not provided.\n",
    "Schema:\n",
    "{schema}\n",
    "Note: Do not include any explanations or apologies in your responses.\n",
    "Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.\n",
    "Do not include any text except the generated Cypher statement.\n",
    "If the user asks about PS5, Play Station 5 or PS 5, that is the platform called PlayStation 5.\n",
    "\n",
    "The question is:\n",
    "{question}\n",
    "\"\"\"\n",
    "\n",
    "CYPHER_GENERATION_PROMPT = PromptTemplate(\n",
    "    input_variables=[\"schema\", \"question\"], template=CYPHER_GENERATION_TEMPLATE\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2c297245",
   "metadata": {},
   "outputs": [],
   "source": [
    "chain = GraphCypherQAChain.from_llm(\n",
    "    ChatOpenAI(temperature=0),\n",
    "    cypher_prompt=CYPHER_GENERATION_PROMPT,\n",
    "    graph=graph,\n",
    "    verbose=True,\n",
    "    model_name=\"gpt-3.5-turbo\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7efb11a0",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = chain.run(\"Is Baldur's Gate 3 available on PS5?\")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "289db07f",
   "metadata": {},
   "source": [
    "```\n",
    "> Entering new GraphCypherQAChain chain...\n",
    "Generated Cypher:\n",
    "MATCH (g:Game {name: 'Baldur\\'s Gate 3'})-[:AVAILABLE_ON]->(p:Platform {name: 'PlayStation 5'})\n",
    "RETURN g.name, p.name\n",
    "Full Context:\n",
    "[{'g.name': \"Baldur's Gate 3\", 'p.name': 'PlayStation 5'}]\n",
    "\n",
    "> Finished chain.\n",
    "Yes, Baldur's Gate 3 is available on PlayStation 5.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84b5f6af",
   "metadata": {},
   "source": [
    "Now, with the revised initial Cypher prompt that includes guidance on platform naming, we are obtaining accurate and relevant results that align more closely with user queries. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a21108ad",
   "metadata": {},
   "source": [
    "This approach allows for further improvement of your QA chain. You can effortlessly integrate extra prompt refinement data into your chain, thereby enhancing the overall user experience of your app."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}