mirror of https://github.com/hwchase17/langchain
You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
423 lines
15 KiB
Plaintext
423 lines
15 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "c94240f5",
|
|
"metadata": {},
|
|
"source": [
|
|
"# RDFLib\n",
|
|
"\n",
|
|
">[RDFLib](https://rdflib.readthedocs.io/) is a pure Python package for working with [RDF](https://en.wikipedia.org/wiki/Resource_Description_Framework). `RDFLib` contains most things you need to work with `RDF`, including:\n",
|
|
">- parsers and serializers for RDF/XML, N3, NTriples, N-Quads, Turtle, TriX, Trig and JSON-LD\n",
|
|
">- a Graph interface which can be backed by any one of a number of Store implementations\n",
|
|
">- store implementations for in-memory, persistent on disk (Berkeley DB) and remote SPARQL endpoints\n",
|
|
">- a SPARQL 1.1 implementation - supporting SPARQL 1.1 Queries and Update statements\n",
|
|
">- SPARQL function extension mechanisms\n",
|
|
"\n",
|
|
"Graph databases are an excellent choice for applications based on network-like models. To standardize the syntax and semantics of such graphs, the W3C recommends `Semantic Web Technologies`, cp. [Semantic Web](https://www.w3.org/standards/semanticweb/). \n",
|
|
"\n",
|
|
"[SPARQL](https://www.w3.org/TR/sparql11-query/) serves as a query language analogously to `SQL` or `Cypher` for these graphs. This notebook demonstrates the application of LLMs as a natural language interface to a graph database by generating `SPARQL`.\n",
|
|
"\n",
|
|
"**Disclaimer:** To date, `SPARQL` query generation via LLMs is still a bit unstable. Be especially careful with `UPDATE` queries, which alter the graph."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "dbc0ee68",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Setting up\n",
|
|
"\n",
|
|
"We have to install a python library:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "f66923ed-ba21-4584-88ac-1ad0de310889",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"!pip install rdflib"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "a2d2ac86-39b3-421f-a7ce-1104d2bff707",
|
|
"metadata": {},
|
|
"source": [
|
|
"There are several sources you can run queries against, including files on the web, files you have available locally, SPARQL endpoints, e.g., [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page), and [triple stores](https://www.w3.org/wiki/LargeTripleStores)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"id": "62812aad",
|
|
"metadata": {
|
|
"pycharm": {
|
|
"is_executing": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain.chains import GraphSparqlQAChain\n",
|
|
"from langchain_community.graphs import RdfGraph\n",
|
|
"from langchain_openai import ChatOpenAI"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"id": "0928915d",
|
|
"metadata": {
|
|
"pycharm": {
|
|
"is_executing": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"graph = RdfGraph(\n",
|
|
" source_file=\"http://www.w3.org/People/Berners-Lee/card\",\n",
|
|
" standard=\"rdf\",\n",
|
|
" local_copy=\"test.ttl\",\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "7af596b5",
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"source": [
|
|
"Note that providing a `local_file` is necessary for storing changes locally if the source is read-only."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "58c1a8ea",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Refresh graph schema information\n",
|
|
"If the schema of the database changes, you can refresh the schema information needed to generate SPARQL queries."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"id": "4e3de44f",
|
|
"metadata": {
|
|
"pycharm": {
|
|
"is_executing": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"graph.load_schema()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"id": "1fe76ccd",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"In the following, each IRI is followed by the local name and optionally its description in parentheses. \n",
|
|
"The RDF graph supports the following node types:\n",
|
|
"<http://xmlns.com/foaf/0.1/PersonalProfileDocument> (PersonalProfileDocument, None), <http://www.w3.org/ns/auth/cert#RSAPublicKey> (RSAPublicKey, None), <http://www.w3.org/2000/10/swap/pim/contact#Male> (Male, None), <http://xmlns.com/foaf/0.1/Person> (Person, None), <http://www.w3.org/2006/vcard/ns#Work> (Work, None)\n",
|
|
"The RDF graph supports the following relationships:\n",
|
|
"<http://www.w3.org/2000/01/rdf-schema#seeAlso> (seeAlso, None), <http://purl.org/dc/elements/1.1/title> (title, None), <http://xmlns.com/foaf/0.1/mbox_sha1sum> (mbox_sha1sum, None), <http://xmlns.com/foaf/0.1/maker> (maker, None), <http://www.w3.org/ns/solid/terms#oidcIssuer> (oidcIssuer, None), <http://www.w3.org/2000/10/swap/pim/contact#publicHomePage> (publicHomePage, None), <http://xmlns.com/foaf/0.1/openid> (openid, None), <http://www.w3.org/ns/pim/space#storage> (storage, None), <http://xmlns.com/foaf/0.1/name> (name, None), <http://www.w3.org/2000/10/swap/pim/contact#country> (country, None), <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> (type, None), <http://www.w3.org/ns/solid/terms#profileHighlightColor> (profileHighlightColor, None), <http://www.w3.org/ns/pim/space#preferencesFile> (preferencesFile, None), <http://www.w3.org/2000/01/rdf-schema#label> (label, None), <http://www.w3.org/ns/auth/cert#modulus> (modulus, None), <http://www.w3.org/2000/10/swap/pim/contact#participant> (participant, None), <http://www.w3.org/2000/10/swap/pim/contact#street2> (street2, None), <http://www.w3.org/2006/vcard/ns#locality> (locality, None), <http://xmlns.com/foaf/0.1/nick> (nick, None), <http://xmlns.com/foaf/0.1/homepage> (homepage, None), <http://creativecommons.org/ns#license> (license, None), <http://xmlns.com/foaf/0.1/givenname> (givenname, None), <http://www.w3.org/2006/vcard/ns#street-address> (street-address, None), <http://www.w3.org/2006/vcard/ns#postal-code> (postal-code, None), <http://www.w3.org/2000/10/swap/pim/contact#street> (street, None), <http://www.w3.org/2003/01/geo/wgs84_pos#lat> (lat, None), <http://xmlns.com/foaf/0.1/primaryTopic> (primaryTopic, None), <http://www.w3.org/2006/vcard/ns#fn> (fn, None), <http://www.w3.org/2003/01/geo/wgs84_pos#location> (location, None), <http://usefulinc.com/ns/doap#developer> (developer, None), <http://www.w3.org/2000/10/swap/pim/contact#city> (city, None), <http://www.w3.org/2006/vcard/ns#region> (region, None), <http://xmlns.com/foaf/0.1/member> (member, None), <http://www.w3.org/2003/01/geo/wgs84_pos#long> (long, None), <http://www.w3.org/2000/10/swap/pim/contact#address> (address, None), <http://xmlns.com/foaf/0.1/family_name> (family_name, None), <http://xmlns.com/foaf/0.1/account> (account, None), <http://xmlns.com/foaf/0.1/workplaceHomepage> (workplaceHomepage, None), <http://purl.org/dc/terms/title> (title, None), <http://www.w3.org/ns/solid/terms#publicTypeIndex> (publicTypeIndex, None), <http://www.w3.org/2000/10/swap/pim/contact#office> (office, None), <http://www.w3.org/2000/10/swap/pim/contact#homePage> (homePage, None), <http://xmlns.com/foaf/0.1/mbox> (mbox, None), <http://www.w3.org/2000/10/swap/pim/contact#preferredURI> (preferredURI, None), <http://www.w3.org/ns/solid/terms#profileBackgroundColor> (profileBackgroundColor, None), <http://schema.org/owns> (owns, None), <http://xmlns.com/foaf/0.1/based_near> (based_near, None), <http://www.w3.org/2006/vcard/ns#hasAddress> (hasAddress, None), <http://xmlns.com/foaf/0.1/img> (img, None), <http://www.w3.org/2000/10/swap/pim/contact#assistant> (assistant, None), <http://xmlns.com/foaf/0.1/title> (title, None), <http://www.w3.org/ns/auth/cert#key> (key, None), <http://www.w3.org/ns/ldp#inbox> (inbox, None), <http://www.w3.org/ns/solid/terms#editableProfile> (editableProfile, None), <http://www.w3.org/2000/10/swap/pim/contact#postalCode> (postalCode, None), <http://xmlns.com/foaf/0.1/weblog> (weblog, None), <http://www.w3.org/ns/auth/cert#exponent> (exponent, None), <http://rdfs.org/sioc/ns#avatar> (avatar, None)\n",
|
|
"\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"graph.get_schema"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "68a3c677",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Querying the graph\n",
|
|
"\n",
|
|
"Now, you can use the graph SPARQL QA chain to ask questions about the graph."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "7476ce98",
|
|
"metadata": {
|
|
"pycharm": {
|
|
"is_executing": true
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"chain = GraphSparqlQAChain.from_llm(\n",
|
|
" ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"id": "ef8ee27b",
|
|
"metadata": {
|
|
"pycharm": {
|
|
"is_executing": true
|
|
}
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"\n",
|
|
"\n",
|
|
"\u001b[1m> Entering new GraphSparqlQAChain chain...\u001b[0m\n",
|
|
"Identified intent:\n",
|
|
"\u001b[32;1m\u001b[1;3mSELECT\u001b[0m\n",
|
|
"Generated SPARQL:\n",
|
|
"\u001b[32;1m\u001b[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
|
|
"SELECT ?homepage\n",
|
|
"WHERE {\n",
|
|
" ?person foaf:name \"Tim Berners-Lee\" .\n",
|
|
" ?person foaf:workplaceHomepage ?homepage .\n",
|
|
"}\u001b[0m\n",
|
|
"Full Context:\n",
|
|
"\u001b[32;1m\u001b[1;3m[]\u001b[0m\n",
|
|
"\n",
|
|
"\u001b[1m> Finished chain.\u001b[0m\n"
|
|
]
|
|
},
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"\"Tim Berners-Lee's work homepage is http://www.w3.org/People/Berners-Lee/.\""
|
|
]
|
|
},
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"chain.run(\"What is Tim Berners-Lee's work homepage?\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "af4b3294",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Updating the graph\n",
|
|
"\n",
|
|
"Analogously, you can update the graph, i.e., insert triples, using natural language."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 14,
|
|
"id": "fdf38841",
|
|
"metadata": {
|
|
"pycharm": {
|
|
"is_executing": true
|
|
}
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"\n",
|
|
"\n",
|
|
"\u001b[1m> Entering new GraphSparqlQAChain chain...\u001b[0m\n",
|
|
"Identified intent:\n",
|
|
"\u001b[32;1m\u001b[1;3mUPDATE\u001b[0m\n",
|
|
"Generated SPARQL:\n",
|
|
"\u001b[32;1m\u001b[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
|
|
"INSERT {\n",
|
|
" ?person foaf:workplaceHomepage <http://www.w3.org/foo/bar/> .\n",
|
|
"}\n",
|
|
"WHERE {\n",
|
|
" ?person foaf:name \"Timothy Berners-Lee\" .\n",
|
|
"}\u001b[0m\n",
|
|
"\n",
|
|
"\u001b[1m> Finished chain.\u001b[0m\n"
|
|
]
|
|
},
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"'Successfully inserted triples into the graph.'"
|
|
]
|
|
},
|
|
"execution_count": 14,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"chain.run(\n",
|
|
" \"Save that the person with the name 'Timothy Berners-Lee' has a work homepage at 'http://www.w3.org/foo/bar/'\"\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "5e0f7fc1",
|
|
"metadata": {},
|
|
"source": [
|
|
"Let's verify the results:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 27,
|
|
"id": "f874171b",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"[(rdflib.term.URIRef('https://www.w3.org/'),),\n",
|
|
" (rdflib.term.URIRef('http://www.w3.org/foo/bar/'),)]"
|
|
]
|
|
},
|
|
"execution_count": 15,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"query = (\n",
|
|
" \"\"\"PREFIX foaf: <http://xmlns.com/foaf/0.1/>\\n\"\"\"\n",
|
|
" \"\"\"SELECT ?hp\\n\"\"\"\n",
|
|
" \"\"\"WHERE {\\n\"\"\"\n",
|
|
" \"\"\" ?person foaf:name \"Timothy Berners-Lee\" . \\n\"\"\"\n",
|
|
" \"\"\" ?person foaf:workplaceHomepage ?hp .\\n\"\"\"\n",
|
|
" \"\"\"}\"\"\"\n",
|
|
")\n",
|
|
"graph.query(query)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "eb00a625-a6c9-4766-b3f0-eaed024851c9",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Return SQARQL query\n",
|
|
"You can return the SPARQL query step from the Sparql QA Chain using the `return_sparql_query` parameter"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 20,
|
|
"id": "f13e2865-176a-4417-95e6-db818b214d08",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"chain = GraphSparqlQAChain.from_llm(\n",
|
|
" ChatOpenAI(temperature=0), graph=graph, verbose=True, return_sparql_query=True\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 29,
|
|
"id": "4f4d47b6-4202-4e74-8c88-aeaac5344c04",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"\n",
|
|
"\n",
|
|
"\u001b[1m> Entering new GraphSparqlQAChain chain...\u001b[0m\n",
|
|
"Identified intent:\n",
|
|
"\u001b[32;1m\u001b[1;3mSELECT\u001b[0m\n",
|
|
"Generated SPARQL:\n",
|
|
"\u001b[32;1m\u001b[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
|
|
"SELECT ?workHomepage\n",
|
|
"WHERE {\n",
|
|
" ?person foaf:name \"Tim Berners-Lee\" .\n",
|
|
" ?person foaf:workplaceHomepage ?workHomepage .\n",
|
|
"}\u001b[0m\n",
|
|
"Full Context:\n",
|
|
"\u001b[32;1m\u001b[1;3m[]\u001b[0m\n",
|
|
"\n",
|
|
"\u001b[1m> Finished chain.\u001b[0m\n",
|
|
"SQARQL query: PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
|
|
"SELECT ?workHomepage\n",
|
|
"WHERE {\n",
|
|
" ?person foaf:name \"Tim Berners-Lee\" .\n",
|
|
" ?person foaf:workplaceHomepage ?workHomepage .\n",
|
|
"}\n",
|
|
"Final answer: Tim Berners-Lee's work homepage is http://www.w3.org/People/Berners-Lee/.\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"result = chain(\"What is Tim Berners-Lee's work homepage?\")\n",
|
|
"print(f\"SQARQL query: {result['sparql_query']}\")\n",
|
|
"print(f\"Final answer: {result['result']}\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 30,
|
|
"id": "be3d9ff7-dc00-47d0-857d-fd40437a3f22",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
|
|
"SELECT ?workHomepage\n",
|
|
"WHERE {\n",
|
|
" ?person foaf:name \"Tim Berners-Lee\" .\n",
|
|
" ?person foaf:workplaceHomepage ?workHomepage .\n",
|
|
"}\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"print(result[\"sparql_query\"])"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.12"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|