docs: Add main graph documentation (#18021)

Co-authored-by: Bagatur <baskaryan@gmail.com>
pull/18278/head
Tomaz Bratanic 4 months ago committed by GitHub
parent 7c8c4e5743
commit c0bdd4d45b
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

@ -0,0 +1,76 @@
{
"cells": [
{
"cell_type": "raw",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 0.4\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Graphs\n",
"\n",
"One of the common types of databases that we can build Q&A systems for are graph databases. LangChain comes with a number of built-in chains and agents that are compatible with graph query language dialects like Cypher, SparQL, and others (e.g., Neo4j, MemGraph, Amazon Neptune, OntoText, Tigegraph). They enable use cases such as:\n",
"\n",
"* Generating queries that will be run based on natural language questions,\n",
"* Creating chatbots that can answer questions based on database data,\n",
"* Building custom dashboards based on insights a user wants to analyze,\n",
"\n",
"and much more.\n",
"\n",
"## ⚠️ Security note ⚠️\n",
"\n",
"Building Q&A systems of graph databases might require executing model-generated database queries. There are inherent risks in doing this. Make sure that your database connection permissions are always scoped as narrowly as possible for your chain/agent's needs. This will mitigate though not eliminate the risks of building a model-driven system. For more on general security best practices, [see here](/docs/security).\n",
"\n",
"![graphgrag_usecase.png](../../../static/img/graph_usecase.png)\n",
"\n",
"> Employing database query templates within a semantic layer provides the advantage of bypassing the need for database query generation. This approach effectively eradicates security vulnerabilities linked to the generation of database queries.\n",
"\n",
"## Quickstart\n",
"\n",
"Head to the **[Quickstart](/docs/use_cases/graph/quickstart)** page to get started.\n",
"\n",
"## Advanced\n",
"\n",
"Once you've familiarized yourself with the basics, you can head to the advanced guides:\n",
"\n",
"* [Prompting strategies](/docs/use_cases/graph/prompting): Advanced prompt engineering techniques.\n",
"* [Mapping values](/docs/use_cases/graph/mapping): Techniques for mapping values from questions to database.\n",
"* [Semantic layer](/docs/use_cases/graph/semantic_layer): Techniques for working implementing semantic layers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

@ -1,11 +0,0 @@
---
sidebar-position: 1
---
# Graph querying
Graph databases give us a powerful way to represent and query real-world relationships. There are a number of chains that make it easy to use LLMs to interact with various graph DBs.
import DocCardList from "@theme/DocCardList";
<DocCardList />

@ -0,0 +1,2 @@
label: 'Integrations'
position: 3

@ -0,0 +1,474 @@
{
"cells": [
{
"cell_type": "raw",
"id": "5e61b0f2-15b9-4241-9ab5-ff0f3f732232",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 1\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "846ef4f4-ee38-4a42-a7d3-1a23826e4830",
"metadata": {},
"source": [
"# Mapping values to database\n",
"\n",
"In this guide we'll go over strategies to improve graph database query generation by mapping values from user inputs to database.\n",
"When using the built-in graph chains, the LLM is aware of the graph schema, but has no information about the values of properties stored in the database.\n",
"Therefore, we can introduce a new step in graph database QA system to accurately map values.\n",
"\n",
"## Setup\n",
"\n",
"First, get required packages and set environment variables:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "18294435-182d-48da-bcab-5b8945b6d9cf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install --upgrade --quiet langchain langchain-community langchain-openai neo4j"
]
},
{
"cell_type": "markdown",
"id": "d86dd771-4001-4a34-8680-22e9b50e1e88",
"metadata": {},
"source": [
"We default to OpenAI models in this guide, but you can swap them out for the model provider of your choice."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9346f8e9-78bf-4667-b3d3-72807a73b718",
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
"\n",
"# Uncomment the below to use LangSmith. Not required.\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()\n",
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\""
]
},
{
"cell_type": "markdown",
"id": "271c8a23-e51c-4ead-a76e-cf21107db47e",
"metadata": {},
"source": [
"Next, we need to define Neo4j credentials.\n",
"Follow [these installation steps](https://neo4j.com/docs/operations-manual/current/installation/) to set up a Neo4j database."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a2a3bb65-05c7-4daf-bac2-b25ae7fe2751",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"NEO4J_URI\"] = \"bolt://localhost:7687\"\n",
"os.environ[\"NEO4J_USERNAME\"] = \"neo4j\"\n",
"os.environ[\"NEO4J_PASSWORD\"] = \"password\""
]
},
{
"cell_type": "markdown",
"id": "50fa4510-29b7-49b6-8496-5e86f694e81f",
"metadata": {},
"source": [
"The below example will create a connection with a Neo4j database and will populate it with example data about movies and their actors."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4ee9ef7a-eef9-4289-b9fd-8fbc31041688",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_community.graphs import Neo4jGraph\n",
"\n",
"graph = Neo4jGraph()\n",
"\n",
"# Import movie information\n",
"\n",
"movies_query = \"\"\"\n",
"LOAD CSV WITH HEADERS FROM \n",
"'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'\n",
"AS row\n",
"MERGE (m:Movie {id:row.movieId})\n",
"SET m.released = date(row.released),\n",
" m.title = row.title,\n",
" m.imdbRating = toFloat(row.imdbRating)\n",
"FOREACH (director in split(row.director, '|') | \n",
" MERGE (p:Person {name:trim(director)})\n",
" MERGE (p)-[:DIRECTED]->(m))\n",
"FOREACH (actor in split(row.actors, '|') | \n",
" MERGE (p:Person {name:trim(actor)})\n",
" MERGE (p)-[:ACTED_IN]->(m))\n",
"FOREACH (genre in split(row.genres, '|') | \n",
" MERGE (g:Genre {name:trim(genre)})\n",
" MERGE (m)-[:IN_GENRE]->(g))\n",
"\"\"\"\n",
"\n",
"graph.query(movies_query)"
]
},
{
"cell_type": "markdown",
"id": "0cb0ea30-ca55-4f35-aad6-beb57453de66",
"metadata": {},
"source": [
"## Detecting entities in the user input\n",
"We have to extract the types of entities/values we want to map to a graph database. In this example, we are dealing with a movie graph, so we can map movies and people to the database."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e1a19424-6046-40c2-81d1-f3b88193a293",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/tomazbratanic/anaconda3/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The function `create_structured_output_chain` was deprecated in LangChain 0.1.1 and will be removed in 0.2.0. Use create_structured_output_runnable instead.\n",
" warn_deprecated(\n"
]
}
],
"source": [
"from typing import List, Optional\n",
"\n",
"from langchain.chains.openai_functions import create_structured_output_chain\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.pydantic_v1 import BaseModel, Field\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
"\n",
"\n",
"class Entities(BaseModel):\n",
" \"\"\"Identifying information about entities.\"\"\"\n",
"\n",
" names: List[str] = Field(\n",
" ...,\n",
" description=\"All the person or movies appearing in the text\",\n",
" )\n",
"\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are extracting person and movies from the text.\",\n",
" ),\n",
" (\n",
" \"human\",\n",
" \"Use the given format to extract information from the following \"\n",
" \"input: {question}\",\n",
" ),\n",
" ]\n",
")\n",
"\n",
"\n",
"entity_chain = create_structured_output_chain(Entities, llm, prompt)"
]
},
{
"cell_type": "markdown",
"id": "9c14084c-37a7-4a9c-a026-74e12961c781",
"metadata": {},
"source": [
"We can test the entity extraction chain."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "bbfe0d8f-982e-46e6-88fb-8a4f0d850b07",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'Who played in Casino movie?',\n",
" 'function': Entities(names=['Casino'])}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"entities = entity_chain.invoke({\"question\": \"Who played in Casino movie?\"})\n",
"entities"
]
},
{
"cell_type": "markdown",
"id": "a8afbf13-05d0-4383-8050-f88b8c2f6fab",
"metadata": {},
"source": [
"We will utilize a simple `CONTAINS` clause to match entities to database. In practice, you might want to use a fuzzy search or a fulltext index to allow for minor misspellings."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "6f92929f-74fb-4db2-b7e1-eb1e9d386a67",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Casino maps to Casino Movie in database\\n'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"match_query = \"\"\"MATCH (p:Person|Movie)\n",
"WHERE p.name CONTAINS $value OR p.title CONTAINS $value\n",
"RETURN coalesce(p.name, p.title) AS result, labels(p)[0] AS type\n",
"LIMIT 1\n",
"\"\"\"\n",
"\n",
"\n",
"def map_to_database(values):\n",
" result = \"\"\n",
" for entity in values.names:\n",
" response = graph.query(match_query, {\"value\": entity})\n",
" try:\n",
" result += f\"{entity} maps to {response[0]['result']} {response[0]['type']} in database\\n\"\n",
" except IndexError:\n",
" pass\n",
" return result\n",
"\n",
"\n",
"map_to_database(entities[\"function\"])"
]
},
{
"cell_type": "markdown",
"id": "f66c6756-6efb-4b1e-9b5d-87ed914a5212",
"metadata": {},
"source": [
"## Custom Cypher generating chain\n",
"\n",
"We need to define a custom Cypher prompt that takes the entity mapping information along with the schema and the user question to construct a Cypher statement.\n",
"We will be using the LangChain expression language to accomplish that."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "8ef3e21d-f1c2-45e2-9511-4920d1cf6e7e",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"\n",
"# Generate Cypher statement based on natural language input\n",
"cypher_template = \"\"\"Based on the Neo4j graph schema below, write a Cypher query that would answer the user's question:\n",
"{schema}\n",
"Entities in the question map to the following database values:\n",
"{entities_list}\n",
"Question: {question}\n",
"Cypher query:\"\"\" # noqa: E501\n",
"\n",
"cypher_prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"Given an input question, convert it to a Cypher query. No pre-amble.\",\n",
" ),\n",
" (\"human\", cypher_template),\n",
" ]\n",
")\n",
"\n",
"cypher_response = (\n",
" RunnablePassthrough.assign(names=entity_chain)\n",
" | RunnablePassthrough.assign(\n",
" entities_list=lambda x: map_to_database(x[\"names\"][\"function\"]),\n",
" schema=lambda _: graph.get_schema,\n",
" )\n",
" | cypher_prompt\n",
" | llm.bind(stop=[\"\\nCypherResult:\"])\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "1f0011e3-9660-4975-af2a-486b1bc3b954",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'MATCH (:Movie {title: \"Casino\"})<-[:ACTED_IN]-(actor)\\nRETURN actor.name'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cypher = cypher_response.invoke({\"question\": \"Who played in Casino movie?\"})\n",
"cypher"
]
},
{
"cell_type": "markdown",
"id": "38095678-611f-4847-a4de-e51ef7ef727c",
"metadata": {},
"source": [
"## Generating answers based on database results\n",
"\n",
"Now that we have a chain that generates the Cypher statement, we need to execute the Cypher statement against the database and send the database results back to an LLM to generate the final answer.\n",
"Again, we will be using LCEL."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d1fa97c0-1c9c-41d3-9ee1-5f1905d17434",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.graph_qa.cypher_utils import CypherQueryCorrector, Schema\n",
"\n",
"# Cypher validation tool for relationship directions\n",
"corrector_schema = [\n",
" Schema(el[\"start\"], el[\"type\"], el[\"end\"])\n",
" for el in graph.structured_schema.get(\"relationships\")\n",
"]\n",
"cypher_validation = CypherQueryCorrector(corrector_schema)\n",
"\n",
"# Generate natural language response based on database results\n",
"response_template = \"\"\"Based on the the question, Cypher query, and Cypher response, write a natural language response:\n",
"Question: {question}\n",
"Cypher query: {query}\n",
"Cypher Response: {response}\"\"\" # noqa: E501\n",
"\n",
"response_prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"Given an input question and Cypher response, convert it to a natural\"\n",
" \" language answer. No pre-amble.\",\n",
" ),\n",
" (\"human\", response_template),\n",
" ]\n",
")\n",
"\n",
"chain = (\n",
" RunnablePassthrough.assign(query=cypher_response)\n",
" | RunnablePassthrough.assign(\n",
" response=lambda x: graph.query(cypher_validation(x[\"query\"])),\n",
" )\n",
" | response_prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "918146e5-7918-46d2-a774-53f9547d8fcb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Joe Pesci, Robert De Niro, Sharon Stone, and James Woods played in the movie \"Casino\".'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke({\"question\": \"Who played in Casino movie?\"})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7ba75cd-8399-4e54-a6f8-8a411f159f56",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -0,0 +1,540 @@
{
"cells": [
{
"cell_type": "raw",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 2\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Prompting strategies\n",
"\n",
"In this guide we'll go over prompting strategies to improve graph database query generation. We'll largely focus on methods for getting relevant database-specific information in your prompt.\n",
"\n",
"## Setup\n",
"\n",
"First, get required packages and set environment variables:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install --upgrade --quiet langchain langchain-community langchain-openai neo4j"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We default to OpenAI models in this guide, but you can swap them out for the model provider of your choice."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
"\n",
"# Uncomment the below to use LangSmith. Not required.\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()\n",
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we need to define Neo4j credentials.\n",
"Follow [these installation steps](https://neo4j.com/docs/operations-manual/current/installation/) to set up a Neo4j database."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"NEO4J_URI\"] = \"bolt://localhost:7687\"\n",
"os.environ[\"NEO4J_USERNAME\"] = \"neo4j\"\n",
"os.environ[\"NEO4J_PASSWORD\"] = \"password\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The below example will create a connection with a Neo4j database and will populate it with example data about movies and their actors."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_community.graphs import Neo4jGraph\n",
"\n",
"graph = Neo4jGraph()\n",
"\n",
"# Import movie information\n",
"\n",
"movies_query = \"\"\"\n",
"LOAD CSV WITH HEADERS FROM \n",
"'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'\n",
"AS row\n",
"MERGE (m:Movie {id:row.movieId})\n",
"SET m.released = date(row.released),\n",
" m.title = row.title,\n",
" m.imdbRating = toFloat(row.imdbRating)\n",
"FOREACH (director in split(row.director, '|') | \n",
" MERGE (p:Person {name:trim(director)})\n",
" MERGE (p)-[:DIRECTED]->(m))\n",
"FOREACH (actor in split(row.actors, '|') | \n",
" MERGE (p:Person {name:trim(actor)})\n",
" MERGE (p)-[:ACTED_IN]->(m))\n",
"FOREACH (genre in split(row.genres, '|') | \n",
" MERGE (g:Genre {name:trim(genre)})\n",
" MERGE (m)-[:IN_GENRE]->(g))\n",
"\"\"\"\n",
"\n",
"graph.query(movies_query)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Filtering graph schema\n",
"\n",
"At times, you may need to focus on a specific subset of the graph schema while generating Cypher statements.\n",
"Let's say we are dealing with the following graph schema:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Node properties are the following:\n",
"Movie {imdbRating: FLOAT, id: STRING, released: DATE, title: STRING},Person {name: STRING},Genre {name: STRING}\n",
"Relationship properties are the following:\n",
"\n",
"The relationships are the following:\n",
"(:Movie)-[:IN_GENRE]->(:Genre),(:Person)-[:DIRECTED]->(:Movie),(:Person)-[:ACTED_IN]->(:Movie)\n"
]
}
],
"source": [
"graph.refresh_schema()\n",
"print(graph.schema)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's say we want to exclude the _Genre_ node from the schema representation we pass to an LLM.\n",
"We can achieve that using the `exclude` parameter of the GraphCypherQAChain chain."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import GraphCypherQAChain\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
"chain = GraphCypherQAChain.from_llm(\n",
" graph=graph, llm=llm, exclude_types=[\"Genre\"], verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Node properties are the following:\n",
"Movie {imdbRating: FLOAT, id: STRING, released: DATE, title: STRING},Person {name: STRING}\n",
"Relationship properties are the following:\n",
"\n",
"The relationships are the following:\n",
"(:Person)-[:DIRECTED]->(:Movie),(:Person)-[:ACTED_IN]->(:Movie)\n"
]
}
],
"source": [
"print(chain.graph_schema)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Few-shot examples\n",
"\n",
"Including examples of natural language questions being converted to valid Cypher queries against our database in the prompt will often improve model performance, especially for complex queries.\n",
"\n",
"Let's say we have the following examples:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"examples = [\n",
" {\n",
" \"question\": \"How many artists are there?\",\n",
" \"query\": \"MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)\",\n",
" },\n",
" {\n",
" \"question\": \"Which actors played in the movie Casino?\",\n",
" \"query\": \"MATCH (m:Movie {{title: 'Casino'}})<-[:ACTED_IN]-(a) RETURN a.name\",\n",
" },\n",
" {\n",
" \"question\": \"How many movies has Tom Hanks acted in?\",\n",
" \"query\": \"MATCH (a:Person {{name: 'Tom Hanks'}})-[:ACTED_IN]->(m:Movie) RETURN count(m)\",\n",
" },\n",
" {\n",
" \"question\": \"List all the genres of the movie Schindler's List\",\n",
" \"query\": \"MATCH (m:Movie {{title: 'Schindler\\\\'s List'}})-[:IN_GENRE]->(g:Genre) RETURN g.name\",\n",
" },\n",
" {\n",
" \"question\": \"Which actors have worked in movies from both the comedy and action genres?\",\n",
" \"query\": \"MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name\",\n",
" },\n",
" {\n",
" \"question\": \"Which directors have made movies with at least three different actors named 'John'?\",\n",
" \"query\": \"MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name\",\n",
" },\n",
" {\n",
" \"question\": \"Identify movies where directors also played a role in the film.\",\n",
" \"query\": \"MATCH (p:Person)-[:DIRECTED]->(m:Movie), (p)-[:ACTED_IN]->(m) RETURN m.title, p.name\",\n",
" },\n",
" {\n",
" \"question\": \"Find the actor with the highest number of movies in the database.\",\n",
" \"query\": \"MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1\",\n",
" },\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can create a few-shot prompt with them like so:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate\n",
"\n",
"example_prompt = PromptTemplate.from_template(\n",
" \"User input: {question}\\nCypher query: {query}\"\n",
")\n",
"prompt = FewShotPromptTemplate(\n",
" examples=examples[:5],\n",
" example_prompt=example_prompt,\n",
" prefix=\"You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\\n\\nHere is the schema information\\n{schema}.\\n\\nBelow are a number of examples of questions and their corresponding Cypher queries.\",\n",
" suffix=\"User input: {question}\\nCypher query: \",\n",
" input_variables=[\"question\", \"schema\"],\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\n",
"\n",
"Here is the schema information\n",
"foo.\n",
"\n",
"Below are a number of examples of questions and their corresponding Cypher queries.\n",
"\n",
"User input: How many artists are there?\n",
"Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)\n",
"\n",
"User input: Which actors played in the movie Casino?\n",
"Cypher query: MATCH (m:Movie {title: 'Casino'})<-[:ACTED_IN]-(a) RETURN a.name\n",
"\n",
"User input: How many movies has Tom Hanks acted in?\n",
"Cypher query: MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)\n",
"\n",
"User input: List all the genres of the movie Schindler's List\n",
"Cypher query: MATCH (m:Movie {title: 'Schindler\\'s List'})-[:IN_GENRE]->(g:Genre) RETURN g.name\n",
"\n",
"User input: Which actors have worked in movies from both the comedy and action genres?\n",
"Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name\n",
"\n",
"User input: How many artists are there?\n",
"Cypher query: \n"
]
}
],
"source": [
"print(prompt.format(question=\"How many artists are there?\", schema=\"foo\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dynamic few-shot examples\n",
"\n",
"If we have enough examples, we may want to only include the most relevant ones in the prompt, either because they don't fit in the model's context window or because the long tail of examples distracts the model. And specifically, given any input we want to include the examples most relevant to that input.\n",
"\n",
"We can do just this using an ExampleSelector. In this case we'll use a [SemanticSimilarityExampleSelector](https://api.python.langchain.com/en/latest/example_selectors/langchain_core.example_selectors.semantic_similarity.SemanticSimilarityExampleSelector.html), which will store the examples in the vector database of our choosing. At runtime it will perform a similarity search between the input and our examples, and return the most semantically similar ones: "
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.vectorstores import Neo4jVector\n",
"from langchain_core.example_selectors import SemanticSimilarityExampleSelector\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"example_selector = SemanticSimilarityExampleSelector.from_examples(\n",
" examples,\n",
" OpenAIEmbeddings(),\n",
" Neo4jVector,\n",
" k=5,\n",
" input_keys=[\"question\"],\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'query': 'MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)',\n",
" 'question': 'How many artists are there?'},\n",
" {'query': \"MATCH (a:Person {{name: 'Tom Hanks'}})-[:ACTED_IN]->(m:Movie) RETURN count(m)\",\n",
" 'question': 'How many movies has Tom Hanks acted in?'},\n",
" {'query': \"MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name\",\n",
" 'question': 'Which actors have worked in movies from both the comedy and action genres?'},\n",
" {'query': \"MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name\",\n",
" 'question': \"Which directors have made movies with at least three different actors named 'John'?\"},\n",
" {'query': 'MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1',\n",
" 'question': 'Find the actor with the highest number of movies in the database.'}]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"example_selector.select_examples({\"question\": \"how many artists are there?\"})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To use it, we can pass the ExampleSelector directly in to our FewShotPromptTemplate:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"prompt = FewShotPromptTemplate(\n",
" example_selector=example_selector,\n",
" example_prompt=example_prompt,\n",
" prefix=\"You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\\n\\nHere is the schema information\\n{schema}.\\n\\nBelow are a number of examples of questions and their corresponding Cypher queries.\",\n",
" suffix=\"User input: {question}\\nCypher query: \",\n",
" input_variables=[\"question\", \"schema\"],\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\n",
"\n",
"Here is the schema information\n",
"foo.\n",
"\n",
"Below are a number of examples of questions and their corresponding Cypher queries.\n",
"\n",
"User input: How many artists are there?\n",
"Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)\n",
"\n",
"User input: How many movies has Tom Hanks acted in?\n",
"Cypher query: MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)\n",
"\n",
"User input: Which actors have worked in movies from both the comedy and action genres?\n",
"Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name\n",
"\n",
"User input: Which directors have made movies with at least three different actors named 'John'?\n",
"Cypher query: MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name\n",
"\n",
"User input: Find the actor with the highest number of movies in the database.\n",
"Cypher query: MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1\n",
"\n",
"User input: how many artists are there?\n",
"Cypher query: \n"
]
}
],
"source": [
"print(prompt.format(question=\"how many artists are there?\", schema=\"foo\"))"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
"chain = GraphCypherQAChain.from_llm(\n",
" graph=graph, llm=llm, cypher_prompt=prompt, verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'count(DISTINCT a)': 967}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'query': 'How many actors are in the graph?',\n",
" 'result': 'There are 967 actors in the graph.'}"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke(\"How many actors are in the graph?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

@ -0,0 +1,336 @@
{
"cells": [
{
"cell_type": "raw",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 0\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Quickstart\n",
"\n",
"In this guide we'll go over the basic ways to create a Q&A chain over a graph database. These systems will allow us to ask a question about the data in a graph database and get back a natural language answer.\n",
"\n",
"## ⚠️ Security note ⚠️\n",
"\n",
"Building Q&A systems of graph databases requires executing model-generated graph queries. There are inherent risks in doing this. Make sure that your database connection permissions are always scoped as narrowly as possible for your chain/agent's needs. This will mitigate though not eliminate the risks of building a model-driven system. For more on general security best practices, [see here](/docs/security).\n",
"\n",
"\n",
"## Architecture\n",
"\n",
"At a high-level, the steps of most graph chains are:\n",
"\n",
"1. **Convert question to a graph database query**: Model converts user input to a graph database query (e.g. Cypher).\n",
"2. **Execute graph database query**: Execute the graph database query.\n",
"3. **Answer the question**: Model responds to user input using the query results.\n",
"\n",
"\n",
"![sql_usecase.png](../../../static/img/graph_usecase.png)\n",
"\n",
"## Setup\n",
"\n",
"First, get required packages and set environment variables.\n",
"In this example, we will be using Neo4j graph database."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain langchain-community langchain-openai neo4j"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We default to OpenAI models in this guide."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
"\n",
"# Uncomment the below to use LangSmith. Not required.\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()\n",
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we need to define Neo4j credentials.\n",
"Follow [these installation steps](https://neo4j.com/docs/operations-manual/current/installation/) to set up a Neo4j database."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"NEO4J_URI\"] = \"bolt://localhost:7687\"\n",
"os.environ[\"NEO4J_USERNAME\"] = \"neo4j\"\n",
"os.environ[\"NEO4J_PASSWORD\"] = \"password\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The below example will create a connection with a Neo4j database and will populate it with example data about movies and their actors."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_community.graphs import Neo4jGraph\n",
"\n",
"graph = Neo4jGraph()\n",
"\n",
"# Import movie information\n",
"\n",
"movies_query = \"\"\"\n",
"LOAD CSV WITH HEADERS FROM \n",
"'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'\n",
"AS row\n",
"MERGE (m:Movie {id:row.movieId})\n",
"SET m.released = date(row.released),\n",
" m.title = row.title,\n",
" m.imdbRating = toFloat(row.imdbRating)\n",
"FOREACH (director in split(row.director, '|') | \n",
" MERGE (p:Person {name:trim(director)})\n",
" MERGE (p)-[:DIRECTED]->(m))\n",
"FOREACH (actor in split(row.actors, '|') | \n",
" MERGE (p:Person {name:trim(actor)})\n",
" MERGE (p)-[:ACTED_IN]->(m))\n",
"FOREACH (genre in split(row.genres, '|') | \n",
" MERGE (g:Genre {name:trim(genre)})\n",
" MERGE (m)-[:IN_GENRE]->(g))\n",
"\"\"\"\n",
"\n",
"graph.query(movies_query)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Graph schema\n",
"\n",
"In order for an LLM to be able to generate a Cypher statement, it needs information about the graph schema. When you instantiate a graph object, it retrieves the information about the graph schema. If you later make any changes to the graph, you can run the `refresh_schema` method to refresh the schema information."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Node properties are the following:\n",
"Movie {imdbRating: FLOAT, id: STRING, released: DATE, title: STRING},Person {name: STRING},Genre {name: STRING},Chunk {id: STRING, question: STRING, query: STRING, text: STRING, embedding: LIST}\n",
"Relationship properties are the following:\n",
"\n",
"The relationships are the following:\n",
"(:Movie)-[:IN_GENRE]->(:Genre),(:Person)-[:DIRECTED]->(:Movie),(:Person)-[:ACTED_IN]->(:Movie)\n"
]
}
],
"source": [
"graph.refresh_schema()\n",
"print(graph.schema)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great! We've got a graph database that we can query. Now let's try hooking it up to an LLM.\n",
"\n",
"## Chain\n",
"\n",
"Let's use a simple chain that takes a question, turns it into a Cypher query, executes the query, and uses the result to answer the original question.\n",
"\n",
"![graph_chain.webp](../../../static/img/graph_chain.webp)\n",
"\n",
"\n",
"LangChain comes with a built-in chain for this workflow that is designed to work with Neo4j: [GraphCypherQAChain](https://python.langchain.com/docs/use_cases/graph/graph_cypher_qa)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (:Movie {title: \"Casino\"})<-[:ACTED_IN]-(actor:Person)\n",
"RETURN actor.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'actor.name': 'Joe Pesci'}, {'actor.name': 'Robert De Niro'}, {'actor.name': 'Sharon Stone'}, {'actor.name': 'James Woods'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'query': 'What was the cast of the Casino?',\n",
" 'result': 'The cast of Casino included Joe Pesci, Robert De Niro, Sharon Stone, and James Woods.'}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains import GraphCypherQAChain\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
"chain = GraphCypherQAChain.from_llm(graph=graph, llm=llm, verbose=True)\n",
"response = chain.invoke({\"query\": \"What was the cast of the Casino?\"})\n",
"response"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Validating relationship direction\n",
"\n",
"LLMs can struggle with relationship directions in generated Cypher statement. Since the graph schema is predefined, we can validate and optionally correct relationship directions in the generated Cypher statements by using the `validate_cypher` parameter."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (:Movie {title: \"Casino\"})<-[:ACTED_IN]-(actor:Person)\n",
"RETURN actor.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'actor.name': 'Joe Pesci'}, {'actor.name': 'Robert De Niro'}, {'actor.name': 'Sharon Stone'}, {'actor.name': 'James Woods'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'query': 'What was the cast of the Casino?',\n",
" 'result': 'The cast of Casino included Joe Pesci, Robert De Niro, Sharon Stone, and James Woods.'}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain = GraphCypherQAChain.from_llm(\n",
" graph=graph, llm=llm, verbose=True, validate_cypher=True\n",
")\n",
"response = chain.invoke({\"query\": \"What was the cast of the Casino?\"})\n",
"response"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Next steps\n",
"\n",
"For more complex query-generation, we may want to create few-shot prompts or add query-checking steps. For advanced techniques like this and more check out:\n",
"\n",
"* [Prompting strategies](/docs/use_cases/graph/prompting): Advanced prompt engineering techniques.\n",
"* [Mapping values](/docs/use_cases/graph/mapping): Techniques for mapping values from questions to database.\n",
"* [Semantic layer](/docs/use_cases/graph/semantic_layer): Techniques for working implementing semantic layers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

@ -0,0 +1,415 @@
{
"cells": [
{
"cell_type": "raw",
"id": "19cc5b11-3822-454b-afb3-7bebd7f17b5c",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 1\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "2e17a273-bcfc-433f-8d42-2ba9533feeb8",
"metadata": {},
"source": [
"# Semantic layer over graph database\n",
"\n",
"You can use database queries to retrieve information from a graph database like Neo4j.\n",
"One option is to use LLMs to generate Cypher statements.\n",
"While that option provides excellent flexibility, the solution could be brittle and not consistently generating precise Cypher statements.\n",
"Instead of generating Cypher statements, we can implement Cypher templates as tools in a semantic layer that an LLM agent can interact with.\n",
"\n",
"![graph_semantic.png](../../../static/img/graph_semantic.png)\n",
"\n",
"## Setup\n",
"\n",
"First, get required packages and set environment variables:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "ffdd48f6-bd05-4e5c-b846-d41183398a55",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install --upgrade --quiet langchain langchain-community langchain-openai neo4j"
]
},
{
"cell_type": "markdown",
"id": "4575b174-01e6-4061-aebf-f81e718de777",
"metadata": {},
"source": [
"We default to OpenAI models in this guide, but you can swap them out for the model provider of your choice."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "eb11c4a8-c00c-4c2d-9309-74a6acfff91c",
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
"\n",
"# Uncomment the below to use LangSmith. Not required.\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()\n",
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\""
]
},
{
"cell_type": "markdown",
"id": "76bb62ba-0060-41a2-a7b9-1f9c1faf571a",
"metadata": {},
"source": [
"Next, we need to define Neo4j credentials.\n",
"Follow [these installation steps](https://neo4j.com/docs/operations-manual/current/installation/) to set up a Neo4j database."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "ef59a3af-31a8-4ad8-8eb9-132aca66956e",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"NEO4J_URI\"] = \"bolt://localhost:7687\"\n",
"os.environ[\"NEO4J_USERNAME\"] = \"neo4j\"\n",
"os.environ[\"NEO4J_PASSWORD\"] = \"password\""
]
},
{
"cell_type": "markdown",
"id": "1e8fbc2c-b8e8-4c53-8fce-243cf99d3c1c",
"metadata": {},
"source": [
"The below example will create a connection with a Neo4j database and will populate it with example data about movies and their actors."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c84b1449-6fcd-4140-b591-cb45e8dce207",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_community.graphs import Neo4jGraph\n",
"\n",
"graph = Neo4jGraph()\n",
"\n",
"# Import movie information\n",
"\n",
"movies_query = \"\"\"\n",
"LOAD CSV WITH HEADERS FROM \n",
"'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'\n",
"AS row\n",
"MERGE (m:Movie {id:row.movieId})\n",
"SET m.released = date(row.released),\n",
" m.title = row.title,\n",
" m.imdbRating = toFloat(row.imdbRating)\n",
"FOREACH (director in split(row.director, '|') | \n",
" MERGE (p:Person {name:trim(director)})\n",
" MERGE (p)-[:DIRECTED]->(m))\n",
"FOREACH (actor in split(row.actors, '|') | \n",
" MERGE (p:Person {name:trim(actor)})\n",
" MERGE (p)-[:ACTED_IN]->(m))\n",
"FOREACH (genre in split(row.genres, '|') | \n",
" MERGE (g:Genre {name:trim(genre)})\n",
" MERGE (m)-[:IN_GENRE]->(g))\n",
"\"\"\"\n",
"\n",
"graph.query(movies_query)"
]
},
{
"cell_type": "markdown",
"id": "403b9acd-aa0d-4157-b9de-6ec426835c43",
"metadata": {},
"source": [
"## Custom tools with Cypher templates\n",
"\n",
"A semantic layer consists of various tools exposed to an LLM that it can use to interact with a knowledge graph.\n",
"They can be of various complexity. You can think of each tool in a semantic layer as a function.\n",
"\n",
"The function we will implement is to retrieve information about movies or their cast."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d1dc1c8c-f343-4024-924b-a8a86cf5f1af",
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional, Type\n",
"\n",
"from langchain.callbacks.manager import (\n",
" AsyncCallbackManagerForToolRun,\n",
" CallbackManagerForToolRun,\n",
")\n",
"\n",
"# Import things that are needed generically\n",
"from langchain.pydantic_v1 import BaseModel, Field\n",
"from langchain.tools import BaseTool\n",
"\n",
"description_query = \"\"\"\n",
"MATCH (m:Movie|Person)\n",
"WHERE m.title CONTAINS $candidate OR m.name CONTAINS $candidate\n",
"MATCH (m)-[r:ACTED_IN|HAS_GENRE]-(t)\n",
"WITH m, type(r) as type, collect(coalesce(t.name, t.title)) as names\n",
"WITH m, type+\": \"+reduce(s=\"\", n IN names | s + n + \", \") as types\n",
"WITH m, collect(types) as contexts\n",
"WITH m, \"type:\" + labels(m)[0] + \"\\ntitle: \"+ coalesce(m.title, m.name) \n",
" + \"\\nyear: \"+coalesce(m.released,\"\") +\"\\n\" +\n",
" reduce(s=\"\", c in contexts | s + substring(c, 0, size(c)-2) +\"\\n\") as context\n",
"RETURN context LIMIT 1\n",
"\"\"\"\n",
"\n",
"\n",
"def get_information(entity: str) -> str:\n",
" try:\n",
" data = graph.query(description_query, params={\"candidate\": entity})\n",
" return data[0][\"context\"]\n",
" except IndexError:\n",
" return \"No information was found\""
]
},
{
"cell_type": "markdown",
"id": "bdecc24b-8065-4755-98cc-9c6d093d4897",
"metadata": {},
"source": [
"You can observe that we have defined the Cypher statement used to retrieve information.\n",
"Therefore, we can avoid generating Cypher statements and use the LLM agent to only populate the input parameters.\n",
"To provide additional information to an LLM agent about when to use the tool and their input parameters, we wrap the function as a tool."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f4cde772-0d05-475d-a2f0-b53e1669bd13",
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional, Type\n",
"\n",
"from langchain.callbacks.manager import (\n",
" AsyncCallbackManagerForToolRun,\n",
" CallbackManagerForToolRun,\n",
")\n",
"\n",
"# Import things that are needed generically\n",
"from langchain.pydantic_v1 import BaseModel, Field\n",
"from langchain.tools import BaseTool\n",
"\n",
"\n",
"class InformationInput(BaseModel):\n",
" entity: str = Field(description=\"movie or a person mentioned in the question\")\n",
"\n",
"\n",
"class InformationTool(BaseTool):\n",
" name = \"Information\"\n",
" description = (\n",
" \"useful for when you need to answer questions about various actors or movies\"\n",
" )\n",
" args_schema: Type[BaseModel] = InformationInput\n",
"\n",
" def _run(\n",
" self,\n",
" entity: str,\n",
" run_manager: Optional[CallbackManagerForToolRun] = None,\n",
" ) -> str:\n",
" \"\"\"Use the tool.\"\"\"\n",
" return get_information(entity)\n",
"\n",
" async def _arun(\n",
" self,\n",
" entity: str,\n",
" run_manager: Optional[AsyncCallbackManagerForToolRun] = None,\n",
" ) -> str:\n",
" \"\"\"Use the tool asynchronously.\"\"\"\n",
" return get_information(entity)"
]
},
{
"cell_type": "markdown",
"id": "ff4820aa-2b57-4558-901f-6d984b326738",
"metadata": {},
"source": [
"## OpenAI Agent\n",
"\n",
"LangChain expression language makes it very convenient to define an agent to interact with a graph database over the semantic layer."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "6e959ac2-537d-4358-a43b-e3a47f68e1d6",
"metadata": {},
"outputs": [],
"source": [
"from typing import List, Tuple\n",
"\n",
"from langchain.agents import AgentExecutor\n",
"from langchain.agents.format_scratchpad import format_to_openai_function_messages\n",
"from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser\n",
"from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_core.messages import AIMessage, HumanMessage\n",
"from langchain_core.utils.function_calling import convert_to_openai_function\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
"tools = [InformationTool()]\n",
"\n",
"llm_with_tools = llm.bind(functions=[convert_to_openai_function(t) for t in tools])\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that finds information about movies \"\n",
" \" and recommends them. If tools require follow up questions, \"\n",
" \"make sure to ask the user for clarification. Make sure to include any \"\n",
" \"available options that need to be clarified in the follow up questions \"\n",
" \"Do only the things the user specifically requested. \",\n",
" ),\n",
" MessagesPlaceholder(variable_name=\"chat_history\"),\n",
" (\"user\", \"{input}\"),\n",
" MessagesPlaceholder(variable_name=\"agent_scratchpad\"),\n",
" ]\n",
")\n",
"\n",
"\n",
"def _format_chat_history(chat_history: List[Tuple[str, str]]):\n",
" buffer = []\n",
" for human, ai in chat_history:\n",
" buffer.append(HumanMessage(content=human))\n",
" buffer.append(AIMessage(content=ai))\n",
" return buffer\n",
"\n",
"\n",
"agent = (\n",
" {\n",
" \"input\": lambda x: x[\"input\"],\n",
" \"chat_history\": lambda x: _format_chat_history(x[\"chat_history\"])\n",
" if x.get(\"chat_history\")\n",
" else [],\n",
" \"agent_scratchpad\": lambda x: format_to_openai_function_messages(\n",
" x[\"intermediate_steps\"]\n",
" ),\n",
" }\n",
" | prompt\n",
" | llm_with_tools\n",
" | OpenAIFunctionsAgentOutputParser()\n",
")\n",
"\n",
"agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "b0459833-fe84-4ebc-9823-a3a3ffd929e9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\n",
"Invoking: `Information` with `{'entity': 'Casino'}`\n",
"\n",
"\n",
"\u001b[0m\u001b[36;1m\u001b[1;3mtype:Movie\n",
"title: Casino\n",
"year: 1995-11-22\n",
"ACTED_IN: Joe Pesci, Robert De Niro, Sharon Stone, James Woods\n",
"\u001b[0m\u001b[32;1m\u001b[1;3mThe movie \"Casino\" starred Joe Pesci, Robert De Niro, Sharon Stone, and James Woods.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'input': 'Who played in Casino?',\n",
" 'output': 'The movie \"Casino\" starred Joe Pesci, Robert De Niro, Sharon Stone, and James Woods.'}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.invoke({\"input\": \"Who played in Casino?\"})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c2759973-de8a-4624-8930-c90a21d6caa3",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

Binary file not shown.

After

Width:  |  Height:  |  Size: 13 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 74 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 47 KiB

@ -1,5 +1,61 @@
{
"redirects": [
{
"source": "/docs/use_cases/graph/diffbot_graphtransformer",
"destination": "/docs/use_cases/graph/integrations/diffbot_graphtransformer"
},
{
"source": "/docs/use_cases/graph/graph_arangodb_qa",
"destination": "/docs/use_cases/graph/integrations/graph_arangodb_qa"
},
{
"source": "/docs/use_cases/graph/graph_cypher_qa",
"destination": "/docs/use_cases/graph/integrations/graph_cypher_qa"
},
{
"source": "/docs/use_cases/graph/graph_falkordb_qa",
"destination": "/docs/use_cases/graph/integrations/graph_falkordb_qa"
},
{
"source": "/docs/use_cases/graph/graph_gremlin_cosmosdb_qa",
"destination": "/docs/use_cases/graph/integrations/graph_gremlin_cosmosdb_qa"
},
{
"source": "/docs/use_cases/graph/graph_hugegraph_qa",
"destination": "/docs/use_cases/graph/integrations/graph_hugegraph_qa"
},
{
"source": "/docs/use_cases/graph/graph_kuzu_qa",
"destination": "/docs/use_cases/graph/integrations/graph_kuzu_qa"
},
{
"source": "/docs/use_cases/graph/graph_memgraph_qa",
"destination": "/docs/use_cases/graph/integrations/graph_memgraph_qa"
},
{
"source": "/docs/use_cases/graph/graph_nebula_qa",
"destination": "/docs/use_cases/graph/integrations/graph_nebula_qa"
},
{
"source": "/docs/use_cases/graph/graph_networkx_qa",
"destination": "/docs/use_cases/graph/integrations/graph_networkx_qa"
},
{
"source": "/docs/use_cases/graph/graph_ontotext_graphdb_qa",
"destination": "/docs/use_cases/graph/integrations/graph_ontotext_graphdb_qa"
},
{
"source": "/docs/use_cases/graph/graph_sparql_qa",
"destination": "/docs/use_cases/graph/integrations/graph_sparql_qa"
},
{
"source": "/docs/use_cases/graph/neptune_cypher_qa.ipynb",
"destination": "/docs/use_cases/graph/integrations/neptune_cypher_qa.ipynb"
},
{
"source": "/docs/use_cases/graph/neptune_sparql_qa",
"destination": "/docs/use_cases/graph/integrations/neptune_sparql_qa"
},
{
"source": "/docs/integrations/llms/huggingface_textgen_inference",
"destination": "/docs/integrations/llms/huggingface_endpoint"

Loading…
Cancel
Save