You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
langchain/docs/docs/use_cases/graph/quickstart.ipynb

338 lines
10 KiB
Plaintext

{
"cells": [
{
"cell_type": "raw",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 0\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Quickstart\n",
"\n",
"In this guide we'll go over the basic ways to create a Q&A chain over a graph database. These systems will allow us to ask a question about the data in a graph database and get back a natural language answer.\n",
"\n",
"## ⚠️ Security note ⚠️\n",
"\n",
"Building Q&A systems of graph databases requires executing model-generated graph queries. There are inherent risks in doing this. Make sure that your database connection permissions are always scoped as narrowly as possible for your chain/agent's needs. This will mitigate though not eliminate the risks of building a model-driven system. For more on general security best practices, [see here](/docs/security).\n",
"\n",
"\n",
"## Architecture\n",
"\n",
"At a high-level, the steps of most graph chains are:\n",
"\n",
"1. **Convert question to a graph database query**: Model converts user input to a graph database query (e.g. Cypher).\n",
"2. **Execute graph database query**: Execute the graph database query.\n",
"3. **Answer the question**: Model responds to user input using the query results.\n",
"\n",
"\n",
"![sql_usecase.png](../../../static/img/graph_usecase.png)\n",
"\n",
"## Setup\n",
"\n",
"First, get required packages and set environment variables.\n",
"In this example, we will be using Neo4j graph database."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain langchain-community langchain-openai neo4j"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We default to OpenAI models in this guide."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n",
"\n",
"# Uncomment the below to use LangSmith. Not required.\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()\n",
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we need to define Neo4j credentials.\n",
"Follow [these installation steps](https://neo4j.com/docs/operations-manual/current/installation/) to set up a Neo4j database."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"NEO4J_URI\"] = \"bolt://localhost:7687\"\n",
"os.environ[\"NEO4J_USERNAME\"] = \"neo4j\"\n",
"os.environ[\"NEO4J_PASSWORD\"] = \"password\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The below example will create a connection with a Neo4j database and will populate it with example data about movies and their actors."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_community.graphs import Neo4jGraph\n",
"\n",
"graph = Neo4jGraph()\n",
"\n",
"# Import movie information\n",
"\n",
"movies_query = \"\"\"\n",
"LOAD CSV WITH HEADERS FROM \n",
"'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'\n",
"AS row\n",
"MERGE (m:Movie {id:row.movieId})\n",
"SET m.released = date(row.released),\n",
" m.title = row.title,\n",
" m.imdbRating = toFloat(row.imdbRating)\n",
"FOREACH (director in split(row.director, '|') | \n",
" MERGE (p:Person {name:trim(director)})\n",
" MERGE (p)-[:DIRECTED]->(m))\n",
"FOREACH (actor in split(row.actors, '|') | \n",
" MERGE (p:Person {name:trim(actor)})\n",
" MERGE (p)-[:ACTED_IN]->(m))\n",
"FOREACH (genre in split(row.genres, '|') | \n",
" MERGE (g:Genre {name:trim(genre)})\n",
" MERGE (m)-[:IN_GENRE]->(g))\n",
"\"\"\"\n",
"\n",
"graph.query(movies_query)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Graph schema\n",
"\n",
"In order for an LLM to be able to generate a Cypher statement, it needs information about the graph schema. When you instantiate a graph object, it retrieves the information about the graph schema. If you later make any changes to the graph, you can run the `refresh_schema` method to refresh the schema information."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Node properties are the following:\n",
"Movie {imdbRating: FLOAT, id: STRING, released: DATE, title: STRING},Person {name: STRING},Genre {name: STRING},Chunk {id: STRING, question: STRING, query: STRING, text: STRING, embedding: LIST}\n",
"Relationship properties are the following:\n",
"\n",
"The relationships are the following:\n",
"(:Movie)-[:IN_GENRE]->(:Genre),(:Person)-[:DIRECTED]->(:Movie),(:Person)-[:ACTED_IN]->(:Movie)\n"
]
}
],
"source": [
"graph.refresh_schema()\n",
"print(graph.schema)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great! We've got a graph database that we can query. Now let's try hooking it up to an LLM.\n",
"\n",
"## Chain\n",
"\n",
"Let's use a simple chain that takes a question, turns it into a Cypher query, executes the query, and uses the result to answer the original question.\n",
"\n",
"![graph_chain.webp](../../../static/img/graph_chain.webp)\n",
"\n",
"\n",
"LangChain comes with a built-in chain for this workflow that is designed to work with Neo4j: [GraphCypherQAChain](/docs/integrations/graphs/neo4j_cypher)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (:Movie {title: \"Casino\"})<-[:ACTED_IN]-(actor:Person)\n",
"RETURN actor.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'actor.name': 'Joe Pesci'}, {'actor.name': 'Robert De Niro'}, {'actor.name': 'Sharon Stone'}, {'actor.name': 'James Woods'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'query': 'What was the cast of the Casino?',\n",
" 'result': 'The cast of Casino included Joe Pesci, Robert De Niro, Sharon Stone, and James Woods.'}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains import GraphCypherQAChain\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
"chain = GraphCypherQAChain.from_llm(graph=graph, llm=llm, verbose=True)\n",
"response = chain.invoke({\"query\": \"What was the cast of the Casino?\"})\n",
"response"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Validating relationship direction\n",
"\n",
"LLMs can struggle with relationship directions in generated Cypher statement. Since the graph schema is predefined, we can validate and optionally correct relationship directions in the generated Cypher statements by using the `validate_cypher` parameter."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (:Movie {title: \"Casino\"})<-[:ACTED_IN]-(actor:Person)\n",
"RETURN actor.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'actor.name': 'Joe Pesci'}, {'actor.name': 'Robert De Niro'}, {'actor.name': 'Sharon Stone'}, {'actor.name': 'James Woods'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'query': 'What was the cast of the Casino?',\n",
" 'result': 'The cast of Casino included Joe Pesci, Robert De Niro, Sharon Stone, and James Woods.'}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain = GraphCypherQAChain.from_llm(\n",
" graph=graph, llm=llm, verbose=True, validate_cypher=True\n",
")\n",
"response = chain.invoke({\"query\": \"What was the cast of the Casino?\"})\n",
"response"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Next steps\n",
"\n",
"For more complex query-generation, we may want to create few-shot prompts or add query-checking steps. For advanced techniques like this and more check out:\n",
"\n",
"* [Prompting strategies](/docs/use_cases/graph/prompting): Advanced prompt engineering techniques.\n",
"* [Mapping values](/docs/use_cases/graph/mapping): Techniques for mapping values from questions to database.\n",
"* [Semantic layer](/docs/use_cases/graph/semantic): Techniques for implementing semantic layers.\n",
"* [Constructing graphs](/docs/use_cases/graph/constructing): Techniques for constructing knowledge graphs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}