{ "cells": [ { "cell_type": "markdown", "id": "c94240f5", "metadata": {}, "source": [ "# GraphSparqlQAChain\n", "\n", "Graph databases are an excellent choice for applications based on network-like models. To standardize the syntax and semantics of such graphs, the W3C recommends Semantic Web technologies; see [Semantic Web](https://www.w3.org/standards/semanticweb/). [SPARQL](https://www.w3.org/TR/sparql11-query/) serves as the query language for these graphs, analogous to SQL or Cypher. This notebook demonstrates how to use LLMs as a natural language interface to a graph database by generating SPARQL queries.\\\n", "Disclaimer: SPARQL query generation via LLMs is still somewhat unreliable. Be especially careful with UPDATE queries, which alter the graph." ] }, { "cell_type": "markdown", "id": "dbc0ee68", "metadata": {}, "source": [ "You can run queries against several kinds of sources, including files on the web, local files, SPARQL endpoints (e.g., [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page)), and [triple stores](https://www.w3.org/wiki/LargeTripleStores)." ] }, { "cell_type": "code", "execution_count": 1, "id": "62812aad", "metadata": { "pycharm": { "is_executing": true } }, "outputs": [], "source": [ "from langchain.chat_models import ChatOpenAI\n", "from langchain.chains import GraphSparqlQAChain\n", "from langchain.graphs import RdfGraph" ] }, { "cell_type": "code", "execution_count": 8, "id": "0928915d", "metadata": { "pycharm": { "is_executing": true } }, "outputs": [], "source": [ "graph = RdfGraph(\n", " source_file=\"http://www.w3.org/People/Berners-Lee/card\",\n", " standard=\"rdf\",\n", " local_copy=\"test.ttl\",\n", ")" ] }, { "cell_type": "markdown", "source": [ "Note that providing a `local_copy` is necessary for storing changes locally if the source is read-only." 
], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "id": "58c1a8ea", "metadata": {}, "source": [ "## Refresh graph schema information\n", "If the schema of the database changes, you can refresh the schema information needed to generate SPARQL queries." ] }, { "cell_type": "code", "execution_count": 9, "id": "4e3de44f", "metadata": { "pycharm": { "is_executing": true } }, "outputs": [], "source": [ "graph.load_schema()" ] }, { "cell_type": "code", "execution_count": 10, "id": "1fe76ccd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "In the following, each IRI is followed by the local name and optionally its description in parentheses. \n", "The RDF graph supports the following node types:\n", " (PersonalProfileDocument, None), (RSAPublicKey, None), (Male, None), (Person, None), (Work, None)\n", "The RDF graph supports the following relationships:\n", " (seeAlso, None), (title, None), (mbox_sha1sum, None), (maker, None), (oidcIssuer, None), (publicHomePage, None), (openid, None), (storage, None), (name, None), (country, None), (type, None), (profileHighlightColor, None), (preferencesFile, None), (label, None), (modulus, None), (participant, None), (street2, None), (locality, None), (nick, None), (homepage, None), (license, None), (givenname, None), (street-address, None), (postal-code, None), (street, None), (lat, None), (primaryTopic, None), (fn, None), (location, None), (developer, None), (city, None), (region, None), (member, None), (long, None), (address, None), (family_name, None), (account, None), (workplaceHomepage, None), (title, None), (publicTypeIndex, None), (office, None), (homePage, None), (mbox, None), (preferredURI, None), (profileBackgroundColor, None), (owns, None), (based_near, None), (hasAddress, None), (img, None), (assistant, None), (title, None), (key, None), (inbox, None), (editableProfile, None), (postalCode, None), (weblog, None), (exponent, None), (avatar, None)\n", "\n" ] } ], 
"source": [ "graph.get_schema" ] }, { "cell_type": "markdown", "id": "68a3c677", "metadata": {}, "source": [ "## Querying the graph\n", "\n", "Now, you can use the graph SPARQL QA chain to ask questions about the graph." ] }, { "cell_type": "code", "execution_count": 11, "id": "7476ce98", "metadata": { "pycharm": { "is_executing": true } }, "outputs": [], "source": [ "chain = GraphSparqlQAChain.from_llm(\n", " ChatOpenAI(temperature=0), graph=graph, verbose=True\n", ")" ] }, { "cell_type": "code", "execution_count": 12, "id": "ef8ee27b", "metadata": { "pycharm": { "is_executing": true } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\u001B[1m> Entering new GraphSparqlQAChain chain...\u001B[0m\n", "Identified intent:\n", "\u001B[32;1m\u001B[1;3mSELECT\u001B[0m\n", "Generated SPARQL:\n", "\u001B[32;1m\u001B[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n", "SELECT ?homepage\n", "WHERE {\n", " ?person foaf:name \"Tim Berners-Lee\" .\n", " ?person foaf:workplaceHomepage ?homepage .\n", "}\u001B[0m\n", "Full Context:\n", "\u001B[32;1m\u001B[1;3m[]\u001B[0m\n", "\n", "\u001B[1m> Finished chain.\u001B[0m\n" ] }, { "data": { "text/plain": [ "\"Tim Berners-Lee's work homepage is http://www.w3.org/People/Berners-Lee/.\"" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chain.run(\"What is Tim Berners-Lee's work homepage?\")" ] }, { "cell_type": "markdown", "id": "af4b3294", "metadata": {}, "source": [ "## Updating the graph\n", "\n", "Analogously, you can update the graph, i.e., insert triples, using natural language." 
] }, { "cell_type": "code", "execution_count": 14, "id": "fdf38841", "metadata": { "pycharm": { "is_executing": true } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\u001B[1m> Entering new GraphSparqlQAChain chain...\u001B[0m\n", "Identified intent:\n", "\u001B[32;1m\u001B[1;3mUPDATE\u001B[0m\n", "Generated SPARQL:\n", "\u001B[32;1m\u001B[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n", "INSERT {\n", " ?person foaf:workplaceHomepage <http://www.w3.org/foo/bar/> .\n", "}\n", "WHERE {\n", " ?person foaf:name \"Timothy Berners-Lee\" .\n", "}\u001B[0m\n", "\n", "\u001B[1m> Finished chain.\u001B[0m\n" ] }, { "data": { "text/plain": [ "'Successfully inserted triples into the graph.'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chain.run(\"Save that the person with the name 'Timothy Berners-Lee' has a work homepage at 'http://www.w3.org/foo/bar/'\")" ] }, { "cell_type": "markdown", "id": "5e0f7fc1", "metadata": {}, "source": [ "Let's verify the results:" ] }, { "cell_type": "code", "execution_count": 15, "id": "f874171b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(rdflib.term.URIRef('https://www.w3.org/'),),\n", " (rdflib.term.URIRef('http://www.w3.org/foo/bar/'),)]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query = (\n", " \"\"\"PREFIX foaf: <http://xmlns.com/foaf/0.1/>\\n\"\"\"\n", " \"\"\"SELECT ?hp\\n\"\"\"\n", " \"\"\"WHERE {\\n\"\"\"\n", " \"\"\" ?person foaf:name \"Timothy Berners-Lee\" . \\n\"\"\"\n", " \"\"\" ?person foaf:workplaceHomepage ?hp .\\n\"\"\"\n", " \"\"\"}\"\"\"\n", ")\n", "graph.query(query)" ] } ], "metadata": { "kernelspec": { "display_name": "lc", "language": "python", "name": "lc" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 5 }