mirror of https://github.com/hwchase17/langchain
Support for SPARQL (#7165)
# [SPARQL](https://www.w3.org/TR/rdf-sparql-query/) for [LangChain](https://github.com/hwchase17/langchain)

## Description

LangChain support for knowledge graphs relying on W3C standards using RDFlib: SPARQL / RDF(S) / OWL, with a special focus on RDF.

* Works with local files, files from the web, and SPARQL endpoints
* Supports both SELECT and UPDATE queries
* Includes both a Jupyter notebook with an example and integration tests

## Contribution compared to related PRs and discussions

* [Wikibase agent](https://github.com/hwchase17/langchain/pull/2690) - uses SPARQL, but specifically for Wikibase querying
* [Cypher QA](https://github.com/hwchase17/langchain/pull/5078) - graph DB question answering for Neo4j via Cypher
* [PR 6050](https://github.com/hwchase17/langchain/pull/6050) - tries something similar, but does not cover UPDATE queries and supports only RDF
* Discussions on the [W3C mailing list](mailto:semantic-web@w3.org) related to the combination of LLMs (specifically ChatGPT) and knowledge graphs

## Dependencies

* [RDFlib](https://github.com/RDFLib/rdflib)

## Tag maintainer

Graph database related to memory -> @hwchase17
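The SELECT/UPDATE flow this PR implements can be sketched in plain Python. This is a minimal sketch only: the callables below are stand-in stubs for the LLM chains and the `RdfGraph` wrapper, and the function name `run_sparql_qa` is illustrative, not part of the package's API.

```python
from typing import Callable, Dict, List


def run_sparql_qa(
    prompt: str,
    classify_intent: Callable[[str], str],        # stub for the intent LLM chain
    generators: Dict[str, Callable[[str], str]],  # per-intent SPARQL generators
    graph_query: Callable[[str], List[tuple]],    # stub for RdfGraph.query
    graph_update: Callable[[str], None],          # stub for RdfGraph.update
    answer: Callable[[str, List[tuple]], str],    # stub for the QA LLM chain
) -> str:
    """Classify the intent, generate SPARQL, then query or update the graph."""
    intent = classify_intent(prompt).strip()
    if intent not in generators:
        raise ValueError(f"Unsupported SPARQL query type: {intent}")
    # Generate SPARQL with the prompt matching the identified intent
    sparql = generators[intent](prompt)
    if intent == "SELECT":
        # Answer the question from the retrieved query results
        return answer(prompt, graph_query(sparql))
    # UPDATE path: apply the generated statement to the graph
    graph_update(sparql)
    return "Successfully inserted triples into the graph."
```

The real chain follows the same two-way dispatch, with the intent classification, SPARQL generation, and answering each delegated to an `LLMChain`.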
parent
7cd0936b1c
commit
db98c44f8f
@ -0,0 +1,300 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c94240f5",
   "metadata": {},
   "source": [
    "# GraphSparqlQAChain\n",
    "\n",
    "Graph databases are an excellent choice for applications based on network-like models. To standardize the syntax and semantics of such graphs, the W3C recommends Semantic Web Technologies, see [Semantic Web](https://www.w3.org/standards/semanticweb/). [SPARQL](https://www.w3.org/TR/sparql11-query/) serves as a query language analogously to SQL or Cypher for these graphs. This notebook demonstrates the application of LLMs as a natural language interface to a graph database by generating SPARQL.\\\n",
    "Disclaimer: To date, SPARQL query generation via LLMs is still a bit unstable. Be especially careful with UPDATE queries, which alter the graph."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dbc0ee68",
   "metadata": {},
   "source": [
    "There are several sources you can run queries against, including files on the web, files you have available locally, SPARQL endpoints, e.g., [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page), and [triple stores](https://www.w3.org/wiki/LargeTripleStores)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "62812aad",
   "metadata": {
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.chains import GraphSparqlQAChain\n",
    "from langchain.graphs import RdfGraph"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "0928915d",
   "metadata": {
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [],
   "source": [
    "graph = RdfGraph(\n",
    "    source_file=\"http://www.w3.org/People/Berners-Lee/card\",\n",
    "    standard=\"rdf\",\n",
    "    local_copy=\"test.ttl\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "Note that providing a `local_copy` is necessary for storing changes locally if the source is read-only."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "id": "58c1a8ea",
   "metadata": {},
   "source": [
    "## Refresh graph schema information\n",
    "If the schema of the database changes, you can refresh the schema information needed to generate SPARQL queries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "4e3de44f",
   "metadata": {
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [],
   "source": [
    "graph.load_schema()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "1fe76ccd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "In the following, each IRI is followed by the local name and optionally its description in parentheses. \n",
      "The RDF graph supports the following node types:\n",
      "<http://xmlns.com/foaf/0.1/PersonalProfileDocument> (PersonalProfileDocument, None), <http://www.w3.org/ns/auth/cert#RSAPublicKey> (RSAPublicKey, None), <http://www.w3.org/2000/10/swap/pim/contact#Male> (Male, None), <http://xmlns.com/foaf/0.1/Person> (Person, None), <http://www.w3.org/2006/vcard/ns#Work> (Work, None)\n",
      "The RDF graph supports the following relationships:\n",
      "<http://www.w3.org/2000/01/rdf-schema#seeAlso> (seeAlso, None), <http://purl.org/dc/elements/1.1/title> (title, None), <http://xmlns.com/foaf/0.1/mbox_sha1sum> (mbox_sha1sum, None), <http://xmlns.com/foaf/0.1/maker> (maker, None), <http://www.w3.org/ns/solid/terms#oidcIssuer> (oidcIssuer, None), <http://www.w3.org/2000/10/swap/pim/contact#publicHomePage> (publicHomePage, None), <http://xmlns.com/foaf/0.1/openid> (openid, None), <http://www.w3.org/ns/pim/space#storage> (storage, None), <http://xmlns.com/foaf/0.1/name> (name, None), <http://www.w3.org/2000/10/swap/pim/contact#country> (country, None), <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> (type, None), <http://www.w3.org/ns/solid/terms#profileHighlightColor> (profileHighlightColor, None), <http://www.w3.org/ns/pim/space#preferencesFile> (preferencesFile, None), <http://www.w3.org/2000/01/rdf-schema#label> (label, None), <http://www.w3.org/ns/auth/cert#modulus> (modulus, None), <http://www.w3.org/2000/10/swap/pim/contact#participant> (participant, None), <http://www.w3.org/2000/10/swap/pim/contact#street2> (street2, None), <http://www.w3.org/2006/vcard/ns#locality> (locality, None), <http://xmlns.com/foaf/0.1/nick> (nick, None), <http://xmlns.com/foaf/0.1/homepage> (homepage, None), <http://creativecommons.org/ns#license> (license, None), <http://xmlns.com/foaf/0.1/givenname> (givenname, None), <http://www.w3.org/2006/vcard/ns#street-address> (street-address, None), <http://www.w3.org/2006/vcard/ns#postal-code> (postal-code, None), <http://www.w3.org/2000/10/swap/pim/contact#street> (street, None), <http://www.w3.org/2003/01/geo/wgs84_pos#lat> (lat, None), <http://xmlns.com/foaf/0.1/primaryTopic> (primaryTopic, None), <http://www.w3.org/2006/vcard/ns#fn> (fn, None), <http://www.w3.org/2003/01/geo/wgs84_pos#location> (location, None), <http://usefulinc.com/ns/doap#developer> (developer, None), <http://www.w3.org/2000/10/swap/pim/contact#city> (city, None), <http://www.w3.org/2006/vcard/ns#region> (region, None), <http://xmlns.com/foaf/0.1/member> (member, None), <http://www.w3.org/2003/01/geo/wgs84_pos#long> (long, None), <http://www.w3.org/2000/10/swap/pim/contact#address> (address, None), <http://xmlns.com/foaf/0.1/family_name> (family_name, None), <http://xmlns.com/foaf/0.1/account> (account, None), <http://xmlns.com/foaf/0.1/workplaceHomepage> (workplaceHomepage, None), <http://purl.org/dc/terms/title> (title, None), <http://www.w3.org/ns/solid/terms#publicTypeIndex> (publicTypeIndex, None), <http://www.w3.org/2000/10/swap/pim/contact#office> (office, None), <http://www.w3.org/2000/10/swap/pim/contact#homePage> (homePage, None), <http://xmlns.com/foaf/0.1/mbox> (mbox, None), <http://www.w3.org/2000/10/swap/pim/contact#preferredURI> (preferredURI, None), <http://www.w3.org/ns/solid/terms#profileBackgroundColor> (profileBackgroundColor, None), <http://schema.org/owns> (owns, None), <http://xmlns.com/foaf/0.1/based_near> (based_near, None), <http://www.w3.org/2006/vcard/ns#hasAddress> (hasAddress, None), <http://xmlns.com/foaf/0.1/img> (img, None), <http://www.w3.org/2000/10/swap/pim/contact#assistant> (assistant, None), <http://xmlns.com/foaf/0.1/title> (title, None), <http://www.w3.org/ns/auth/cert#key> (key, None), <http://www.w3.org/ns/ldp#inbox> (inbox, None), <http://www.w3.org/ns/solid/terms#editableProfile> (editableProfile, None), <http://www.w3.org/2000/10/swap/pim/contact#postalCode> (postalCode, None), <http://xmlns.com/foaf/0.1/weblog> (weblog, None), <http://www.w3.org/ns/auth/cert#exponent> (exponent, None), <http://rdfs.org/sioc/ns#avatar> (avatar, None)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "graph.get_schema"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68a3c677",
   "metadata": {},
   "source": [
    "## Querying the graph\n",
    "\n",
    "Now, you can use the graph SPARQL QA chain to ask questions about the graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "7476ce98",
   "metadata": {
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [],
   "source": [
    "chain = GraphSparqlQAChain.from_llm(\n",
    "    ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "ef8ee27b",
   "metadata": {
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001B[1m> Entering new GraphSparqlQAChain chain...\u001B[0m\n",
      "Identified intent:\n",
      "\u001B[32;1m\u001B[1;3mSELECT\u001B[0m\n",
      "Generated SPARQL:\n",
      "\u001B[32;1m\u001B[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
      "SELECT ?homepage\n",
      "WHERE {\n",
      "  ?person foaf:name \"Tim Berners-Lee\" .\n",
      "  ?person foaf:workplaceHomepage ?homepage .\n",
      "}\u001B[0m\n",
      "Full Context:\n",
      "\u001B[32;1m\u001B[1;3m[]\u001B[0m\n",
      "\n",
      "\u001B[1m> Finished chain.\u001B[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "\"Tim Berners-Lee's work homepage is http://www.w3.org/People/Berners-Lee/.\""
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.run(\"What is Tim Berners-Lee's work homepage?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af4b3294",
   "metadata": {},
   "source": [
    "## Updating the graph\n",
    "\n",
    "Analogously, you can update the graph, i.e., insert triples, using natural language."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "fdf38841",
   "metadata": {
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001B[1m> Entering new GraphSparqlQAChain chain...\u001B[0m\n",
      "Identified intent:\n",
      "\u001B[32;1m\u001B[1;3mUPDATE\u001B[0m\n",
      "Generated SPARQL:\n",
      "\u001B[32;1m\u001B[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
      "INSERT {\n",
      "  ?person foaf:workplaceHomepage <http://www.w3.org/foo/bar/> .\n",
      "}\n",
      "WHERE {\n",
      "  ?person foaf:name \"Timothy Berners-Lee\" .\n",
      "}\u001B[0m\n",
      "\n",
      "\u001B[1m> Finished chain.\u001B[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'Successfully inserted triples into the graph.'"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.run(\"Save that the person with the name 'Timothy Berners-Lee' has a work homepage at 'http://www.w3.org/foo/bar/'\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e0f7fc1",
   "metadata": {},
   "source": [
    "Let's verify the results:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "f874171b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(rdflib.term.URIRef('https://www.w3.org/'),),\n",
       " (rdflib.term.URIRef('http://www.w3.org/foo/bar/'),)]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = (\n",
    "    \"\"\"PREFIX foaf: <http://xmlns.com/foaf/0.1/>\\n\"\"\"\n",
    "    \"\"\"SELECT ?hp\\n\"\"\"\n",
    "    \"\"\"WHERE {\\n\"\"\"\n",
    "    \"\"\"    ?person foaf:name \"Timothy Berners-Lee\" . \\n\"\"\"\n",
    "    \"\"\"    ?person foaf:workplaceHomepage ?hp .\\n\"\"\"\n",
    "    \"\"\"}\"\"\"\n",
    ")\n",
    "graph.query(query)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "lc",
   "language": "python",
   "name": "lc"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
@ -0,0 +1,127 @@
"""
Question answering over an RDF or OWL graph using SPARQL.
"""
from __future__ import annotations

from typing import Any, Dict, List, Optional

from pydantic import Field

from langchain.base_language import BaseLanguageModel
from langchain.callbacks.manager import CallbackManagerForChainRun
from langchain.chains.base import Chain
from langchain.chains.graph_qa.prompts import (
    SPARQL_GENERATION_SELECT_PROMPT,
    SPARQL_GENERATION_UPDATE_PROMPT,
    SPARQL_INTENT_PROMPT,
    SPARQL_QA_PROMPT,
)
from langchain.chains.llm import LLMChain
from langchain.graphs.rdf_graph import RdfGraph
from langchain.prompts.base import BasePromptTemplate


class GraphSparqlQAChain(Chain):
    """
    Chain for question-answering against an RDF or OWL graph by generating
    SPARQL statements.
    """

    graph: RdfGraph = Field(exclude=True)
    sparql_generation_select_chain: LLMChain
    sparql_generation_update_chain: LLMChain
    sparql_intent_chain: LLMChain
    qa_chain: LLMChain
    input_key: str = "query"  #: :meta private:
    output_key: str = "result"  #: :meta private:

    @property
    def input_keys(self) -> List[str]:
        return [self.input_key]

    @property
    def output_keys(self) -> List[str]:
        _output_keys = [self.output_key]
        return _output_keys

    @classmethod
    def from_llm(
        cls,
        llm: BaseLanguageModel,
        *,
        qa_prompt: BasePromptTemplate = SPARQL_QA_PROMPT,
        sparql_select_prompt: BasePromptTemplate = SPARQL_GENERATION_SELECT_PROMPT,
        sparql_update_prompt: BasePromptTemplate = SPARQL_GENERATION_UPDATE_PROMPT,
        sparql_intent_prompt: BasePromptTemplate = SPARQL_INTENT_PROMPT,
        **kwargs: Any,
    ) -> GraphSparqlQAChain:
        """Initialize from LLM."""
        qa_chain = LLMChain(llm=llm, prompt=qa_prompt)
        sparql_generation_select_chain = LLMChain(llm=llm, prompt=sparql_select_prompt)
        sparql_generation_update_chain = LLMChain(llm=llm, prompt=sparql_update_prompt)
        sparql_intent_chain = LLMChain(llm=llm, prompt=sparql_intent_prompt)

        return cls(
            qa_chain=qa_chain,
            sparql_generation_select_chain=sparql_generation_select_chain,
            sparql_generation_update_chain=sparql_generation_update_chain,
            sparql_intent_chain=sparql_intent_chain,
            **kwargs,
        )

    def _call(
        self,
        inputs: Dict[str, Any],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, str]:
        """
        Generate a SPARQL query, use it to retrieve a response from the graph
        database, and answer the question.
        """
        _run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
        callbacks = _run_manager.get_child()
        prompt = inputs[self.input_key]

        _intent = self.sparql_intent_chain.run({"prompt": prompt}, callbacks=callbacks)
        intent = _intent.strip()

        if intent == "SELECT":
            sparql_generation_chain = self.sparql_generation_select_chain
        elif intent == "UPDATE":
            sparql_generation_chain = self.sparql_generation_update_chain
        else:
            raise ValueError(
                "I am sorry, but this prompt seems to fit none of the currently "
                "supported SPARQL query types, i.e., SELECT and UPDATE."
            )

        _run_manager.on_text("Identified intent:", end="\n", verbose=self.verbose)
        _run_manager.on_text(intent, color="green", end="\n", verbose=self.verbose)

        generated_sparql = sparql_generation_chain.run(
            {"prompt": prompt, "schema": self.graph.get_schema}, callbacks=callbacks
        )

        _run_manager.on_text("Generated SPARQL:", end="\n", verbose=self.verbose)
        _run_manager.on_text(
            generated_sparql, color="green", end="\n", verbose=self.verbose
        )

        if intent == "SELECT":
            context = self.graph.query(generated_sparql)

            _run_manager.on_text("Full Context:", end="\n", verbose=self.verbose)
            _run_manager.on_text(
                str(context), color="green", end="\n", verbose=self.verbose
            )
            result = self.qa_chain(
                {"prompt": prompt, "context": context},
                callbacks=callbacks,
            )
            res = result[self.qa_chain.output_key]
        elif intent == "UPDATE":
            self.graph.update(generated_sparql)
            res = "Successfully inserted triples into the graph."
        else:
            raise ValueError("Unsupported SPARQL query type.")
        return {self.output_key: res}
@ -0,0 +1,279 @@
from __future__ import annotations

from typing import (
    TYPE_CHECKING,
    List,
    Optional,
)

if TYPE_CHECKING:
    import rdflib

prefixes = {
    "owl": """PREFIX owl: <http://www.w3.org/2002/07/owl#>\n""",
    "rdf": """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n""",
    "rdfs": """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n""",
    "xsd": """PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n""",
}

cls_query_rdf = prefixes["rdfs"] + (
    """SELECT DISTINCT ?cls ?com\n"""
    """WHERE { \n"""
    """    ?instance a ?cls . \n"""
    """    OPTIONAL { ?cls rdfs:comment ?com } \n"""
    """}"""
)

cls_query_rdfs = prefixes["rdfs"] + (
    """SELECT DISTINCT ?cls ?com\n"""
    """WHERE { \n"""
    """    ?instance a/rdfs:subClassOf* ?cls . \n"""
    """    OPTIONAL { ?cls rdfs:comment ?com } \n"""
    """}"""
)

cls_query_owl = cls_query_rdfs

rel_query_rdf = prefixes["rdfs"] + (
    """SELECT DISTINCT ?rel ?com\n"""
    """WHERE { \n"""
    """    ?subj ?rel ?obj . \n"""
    """    OPTIONAL { ?rel rdfs:comment ?com } \n"""
    """}"""
)

rel_query_rdfs = (
    prefixes["rdf"]
    + prefixes["rdfs"]
    + (
        """SELECT DISTINCT ?rel ?com\n"""
        """WHERE { \n"""
        """    ?rel a/rdfs:subPropertyOf* rdf:Property . \n"""
        """    OPTIONAL { ?rel rdfs:comment ?com } \n"""
        """}"""
    )
)

op_query_owl = (
    prefixes["rdfs"]
    + prefixes["owl"]
    + (
        """SELECT DISTINCT ?op ?com\n"""
        """WHERE { \n"""
        """    ?op a/rdfs:subPropertyOf* owl:ObjectProperty . \n"""
        """    OPTIONAL { ?op rdfs:comment ?com } \n"""
        """}"""
    )
)

dp_query_owl = (
    prefixes["rdfs"]
    + prefixes["owl"]
    + (
        """SELECT DISTINCT ?dp ?com\n"""
        """WHERE { \n"""
        """    ?dp a/rdfs:subPropertyOf* owl:DatatypeProperty . \n"""
        """    OPTIONAL { ?dp rdfs:comment ?com } \n"""
        """}"""
    )
)


class RdfGraph:
    """
    RDFlib wrapper for graph operations.
    Modes:
    * local: Local file - can be queried and changed
    * online: Online file - can only be queried, changes can be stored locally
    * store: Triple store - can be queried and changed if update_endpoint available
    Together with a source file, the serialization should be specified.
    """

    def __init__(
        self,
        source_file: Optional[str] = None,
        serialization: Optional[str] = "ttl",
        query_endpoint: Optional[str] = None,
        update_endpoint: Optional[str] = None,
        standard: Optional[str] = "rdf",
        local_copy: Optional[str] = None,
    ) -> None:
        """
        Set up the RDFlib graph.

        :param source_file: either a path for a local file or a URL
        :param serialization: serialization of the input
        :param query_endpoint: SPARQL endpoint for queries, read access
        :param update_endpoint: SPARQL endpoint for UPDATE queries, write access
        :param standard: RDF, RDFS, or OWL
        :param local_copy: new local copy for storing changes
        """
        self.source_file = source_file
        self.serialization = serialization
        self.query_endpoint = query_endpoint
        self.update_endpoint = update_endpoint
        self.standard = standard
        self.local_copy = local_copy

        try:
            import rdflib
            from rdflib.graph import DATASET_DEFAULT_GRAPH_ID as default
            from rdflib.plugins.stores import sparqlstore
        except ImportError:
            raise ValueError(
                "Could not import rdflib python package. "
                "Please install it with `pip install rdflib`."
            )
        if self.standard not in (supported_standards := ("rdf", "rdfs", "owl")):
            raise ValueError(
                f"Invalid standard. Supported standards are: {supported_standards}."
            )

        if (
            not source_file
            and not query_endpoint
            or source_file
            and (query_endpoint or update_endpoint)
        ):
            raise ValueError(
                "Could not unambiguously initialize the graph wrapper. "
                "Specify either a file (local or online) via the source_file "
                "or a triple store via the endpoints."
            )

        if source_file:
            if source_file.startswith("http"):
                self.mode = "online"
            else:
                self.mode = "local"
                if self.local_copy is None:
                    self.local_copy = self.source_file
            self.graph = rdflib.Graph()
            self.graph.parse(source_file, format=self.serialization)

        if query_endpoint:
            self.mode = "store"
            if not update_endpoint:
                self._store = sparqlstore.SPARQLStore()
                self._store.open(query_endpoint)
            else:
                self._store = sparqlstore.SPARQLUpdateStore()
                self._store.open((query_endpoint, update_endpoint))
            self.graph = rdflib.Graph(self._store, identifier=default)

        # Verify that the graph was loaded
        if not len(self.graph):
            raise AssertionError("The graph is empty.")

        # Set schema
        self.schema = ""
        self.load_schema()

    @property
    def get_schema(self) -> str:
        """
        Returns the schema of the graph database.
        """
        return self.schema

    def query(
        self,
        query: str,
    ) -> List[rdflib.query.ResultRow]:
        """
        Query the graph.
        """
        from rdflib.exceptions import ParserError
        from rdflib.query import ResultRow

        try:
            res = self.graph.query(query)
        except ParserError as e:
            raise ValueError("Generated SPARQL statement is invalid\n" f"{e}")
        return [r for r in res if isinstance(r, ResultRow)]

    def update(
        self,
        query: str,
    ) -> None:
        """
        Update the graph.
        """
        from rdflib.exceptions import ParserError

        try:
            self.graph.update(query)
        except ParserError as e:
            raise ValueError("Generated SPARQL statement is invalid\n" f"{e}")
        if self.local_copy:
            self.graph.serialize(
                destination=self.local_copy, format=self.local_copy.split(".")[-1]
            )
        else:
            raise ValueError("No target file specified for saving the updated file.")

    @staticmethod
    def _get_local_name(iri: str) -> str:
        if "#" in iri:
            local_name = iri.split("#")[-1]
        elif "/" in iri:
            local_name = iri.split("/")[-1]
        else:
            raise ValueError(f"Unexpected IRI '{iri}', contains neither '#' nor '/'.")
        return local_name

    def _res_to_str(self, res: rdflib.query.ResultRow, var: str) -> str:
        return (
            "<"
            + res[var]
            + "> ("
            + self._get_local_name(res[var])
            + ", "
            + str(res["com"])
            + ")"
        )

    def load_schema(self) -> None:
        """
        Load the graph schema information.
        """

        def _rdf_s_schema(
            classes: List[rdflib.query.ResultRow],
            relationships: List[rdflib.query.ResultRow],
        ) -> str:
            return (
                f"In the following, each IRI is followed by the local name and "
                f"optionally its description in parentheses. \n"
                f"The RDF graph supports the following node types:\n"
                f'{", ".join([self._res_to_str(r, "cls") for r in classes])}\n'
                f"The RDF graph supports the following relationships:\n"
                f'{", ".join([self._res_to_str(r, "rel") for r in relationships])}\n'
            )

        if self.standard == "rdf":
            clss = self.query(cls_query_rdf)
            rels = self.query(rel_query_rdf)
            self.schema = _rdf_s_schema(clss, rels)
        elif self.standard == "rdfs":
            clss = self.query(cls_query_rdfs)
            rels = self.query(rel_query_rdfs)
            self.schema = _rdf_s_schema(clss, rels)
        elif self.standard == "owl":
            clss = self.query(cls_query_owl)
            ops = self.query(op_query_owl)
            dps = self.query(dp_query_owl)
            self.schema = (
                f"In the following, each IRI is followed by the local name and "
                f"optionally its description in parentheses. \n"
                f"The OWL graph supports the following node types:\n"
                f'{", ".join([self._res_to_str(r, "cls") for r in clss])}\n'
                f"The OWL graph supports the following object properties, "
                f"i.e., relationships between objects:\n"
                f'{", ".join([self._res_to_str(r, "op") for r in ops])}\n'
                f"The OWL graph supports the following data properties, "
                f"i.e., relationships between objects and literals:\n"
                f'{", ".join([self._res_to_str(r, "dp") for r in dps])}\n'
            )
        else:
            raise ValueError(f"Mode '{self.standard}' is currently not supported.")
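The schema strings above abbreviate each IRI with its local name. A minimal standalone sketch of that extraction (mirroring `_get_local_name`, pulled out of the class purely for illustration):

```python
def local_name(iri: str) -> str:
    """Return the fragment after '#', or the last path segment otherwise."""
    if "#" in iri:
        return iri.split("#")[-1]
    if "/" in iri:
        return iri.split("/")[-1]
    raise ValueError(f"Unexpected IRI '{iri}', contains neither '#' nor '/'.")


print(local_name("http://xmlns.com/foaf/0.1/name"))              # → name
print(local_name("http://www.w3.org/2000/01/rdf-schema#label"))  # → label
```

Checking `#` before `/` matters: hash IRIs such as `rdf-schema#label` also contain slashes, so the fragment split must win.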
@ -0,0 +1,79 @@
"""Test RDF/SPARQL Graph Database Chain."""
import os

from langchain.chains.graph_qa.sparql import GraphSparqlQAChain
from langchain.graphs import RdfGraph
from langchain.llms.openai import OpenAI


def test_connect_file_rdf() -> None:
    """
    Test loading online resource.
    """
    berners_lee_card = "http://www.w3.org/People/Berners-Lee/card"

    graph = RdfGraph(
        source_file=berners_lee_card,
        standard="rdf",
    )

    query = """SELECT ?s ?p ?o\n""" """WHERE { ?s ?p ?o }"""

    output = graph.query(query)
    assert len(output) == 86


def test_sparql_select() -> None:
    """
    Test for generating and executing simple SPARQL SELECT query.
    """
    berners_lee_card = "http://www.w3.org/People/Berners-Lee/card"

    graph = RdfGraph(
        source_file=berners_lee_card,
        standard="rdf",
    )

    chain = GraphSparqlQAChain.from_llm(OpenAI(temperature=0), graph=graph)
    output = chain.run("What is Tim Berners-Lee's work homepage?")
    expected_output = (
        " The work homepage of Tim Berners-Lee is "
        "http://www.w3.org/People/Berners-Lee/."
    )
    assert output == expected_output


def test_sparql_insert() -> None:
    """
    Test for generating and executing simple SPARQL INSERT query.
    """
    berners_lee_card = "http://www.w3.org/People/Berners-Lee/card"
    _local_copy = "test.ttl"

    graph = RdfGraph(
        source_file=berners_lee_card,
        standard="rdf",
        local_copy=_local_copy,
    )

    chain = GraphSparqlQAChain.from_llm(OpenAI(temperature=0), graph=graph)
    chain.run(
        "Save that the person with the name 'Timothy Berners-Lee' "
        "has a work homepage at 'http://www.w3.org/foo/bar/'"
    )
    query = (
        """PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"""
        """SELECT ?hp\n"""
        """WHERE {\n"""
        """    ?person foaf:name "Timothy Berners-Lee" . \n"""
        """    ?person foaf:workplaceHomepage ?hp .\n"""
        """}"""
    )
    output = graph.query(query)
    assert len(output) == 2

    # clean up
    try:
        os.remove(_local_copy)
    except OSError:
        pass