mirror of https://github.com/hwchase17/langchain
langchain[minor], community[minor]: Implement Ontotext GraphDB QA Chain (#16019)
- **Description:** Implement Ontotext GraphDB QA Chain
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** @OntotextGraphDB

pull/16740/head
parent a08f9a7ff9
commit c95facc293
File diff suppressed because one or more lines are too long
@ -0,0 +1,21 @@
# Ontotext GraphDB

>[Ontotext GraphDB](https://graphdb.ontotext.com/) is a graph database and knowledge discovery tool compliant with RDF and SPARQL.

## Dependencies

Install the [rdflib](https://github.com/RDFLib/rdflib) package with

```bash
pip install rdflib==7.0.0
```

## Graph QA Chain

Connect your GraphDB database to a chat model to get insights into your data.

See the notebook example [here](/docs/use_cases/graph/graph_ontotext_graphdb_qa).

```python
from langchain_community.graphs import OntotextGraphDBGraph
from langchain.chains import OntotextGraphDBQAChain
```
@ -0,0 +1,543 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "922a7a98-7d73-4a1a-8860-76a33451d1be",
   "metadata": {
    "id": "922a7a98-7d73-4a1a-8860-76a33451d1be"
   },
   "source": [
    "# Ontotext GraphDB QA Chain\n",
    "\n",
    "This notebook shows how to use LLMs to provide natural language querying (NLQ to SPARQL, also called text2sparql) for [Ontotext GraphDB](https://graphdb.ontotext.com/). Ontotext GraphDB is a graph database and knowledge discovery tool compliant with [RDF](https://www.w3.org/RDF/) and [SPARQL](https://www.w3.org/TR/sparql11-query/).\n",
    "\n",
    "## GraphDB LLM Functionalities\n",
    "\n",
    "GraphDB supports some LLM integration functionalities as described in [https://github.com/w3c/sparql-dev/issues/193](https://github.com/w3c/sparql-dev/issues/193):\n",
    "\n",
    "[gpt-queries](https://graphdb.ontotext.com/documentation/10.5/gpt-queries.html)\n",
    "\n",
    "* magic predicates to ask an LLM for text, list or table using data from your knowledge graph (KG)\n",
    "* query explanation\n",
    "* result explanation, summarization, rephrasing, translation\n",
    "\n",
    "[retrieval-graphdb-connector](https://graphdb.ontotext.com/documentation/10.5/retrieval-graphdb-connector.html)\n",
    "\n",
    "* Indexing of KG entities in a vector database\n",
    "* Supports any text embedding algorithm and vector database\n",
    "* Uses the same powerful connector (indexing) language that GraphDB uses for Elastic, Solr, Lucene\n",
    "* Automatic synchronization of changes in RDF data to the KG entity index\n",
    "* Supports nested objects (no UI support in GraphDB version 10.5)\n",
    "* Serializes KG entities to text like this (e.g. for a Wines dataset):\n",
    "\n",
    "```\n",
    "Franvino:\n",
    "- is a RedWine.\n",
    "- made from grape Merlo.\n",
    "- made from grape Cabernet Franc.\n",
    "- has sugar dry.\n",
    "- has year 2012.\n",
    "```\n",
    "\n",
    "[talk-to-graph](https://graphdb.ontotext.com/documentation/10.5/talk-to-graph.html)\n",
    "\n",
    "* A simple chatbot using a defined KG entity index\n",
    "\n",
    "## Querying the GraphDB Database\n",
    "\n",
    "For this tutorial, we won't use the GraphDB LLM integration, but SPARQL generation from NLQ. We'll use the Star Wars API (SWAPI) ontology and dataset that you can examine [here](https://drive.google.com/file/d/1wQ2K4uZp4eq3wlJ6_F_TxkOolaiczdYp/view?usp=drive_link).\n",
    "\n",
    "You will need to have a running GraphDB instance. This tutorial shows how to run the database locally using the [GraphDB Docker image](https://hub.docker.com/r/ontotext/graphdb). It provides a Docker compose set-up, which populates GraphDB with the Star Wars dataset. All necessary files including this notebook can be downloaded from GDrive.\n",
    "\n",
    "### Set-up\n",
    "\n",
    "* Install [Docker](https://docs.docker.com/get-docker/). This tutorial is created using Docker version `24.0.7` which bundles [Docker Compose](https://docs.docker.com/compose/). For earlier Docker versions you may need to install Docker Compose separately.\n",
    "* Download all files from [GDrive](https://drive.google.com/drive/folders/18dN7WQxfGu26Z9C9HUU5jBwDuPnVTLbl) to a local folder on your machine.\n",
    "* Start GraphDB with the following script executed from this folder:\n",
    "  ```\n",
    "  docker build --tag graphdb .\n",
    "  docker compose up -d graphdb\n",
    "  ```\n",
    "  You need to wait a couple of seconds for the database to start on `http://localhost:7200/`. The Star Wars dataset `starwars-data.trig` is automatically loaded into the `langchain` repository. The local SPARQL endpoint `http://localhost:7200/repositories/langchain` can be used to run queries against. You can also open the GraphDB Workbench in your favourite web browser at `http://localhost:7200/sparql`, where you can make queries interactively.\n",
    "* Set up the working environment\n",
    "\n",
    "If you use `conda`, create and activate a new conda env (e.g. `conda create -n graph_ontotext_graphdb_qa python=3.9.18`).\n",
    "Install the following libraries:\n",
    "\n",
    "```\n",
    "pip install jupyter==1.0.0\n",
    "pip install openai==1.6.1\n",
    "pip install rdflib==7.0.0\n",
    "pip install langchain-openai==0.0.2\n",
    "pip install langchain\n",
    "```\n",
    "\n",
    "Run Jupyter with\n",
    "```\n",
    "jupyter notebook\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e51b397c-2fdc-4b99-9fed-1ab2b6ef7547",
   "metadata": {
    "id": "e51b397c-2fdc-4b99-9fed-1ab2b6ef7547"
   },
   "source": [
    "### Specifying the Ontology\n",
    "\n",
    "In order for the LLM to be able to generate SPARQL, it needs to know the knowledge graph schema (the ontology). It can be provided using one of two parameters on the `OntotextGraphDBGraph` class:\n",
    "\n",
    "* `query_ontology`: a `CONSTRUCT` query that is executed on the SPARQL endpoint and returns the KG schema statements. We recommend that you store the ontology in its own named graph, which will make it easier to get only the relevant statements (as in the example below). `DESCRIBE` queries are not supported, because `DESCRIBE` returns the Symmetric Concise Bounded Description (SCBD), i.e. also the incoming class links. In case of large graphs with millions of instances, this is not efficient. Check https://github.com/eclipse-rdf4j/rdf4j/issues/4857\n",
    "* `local_file`: a local RDF ontology file. Supported RDF formats are `Turtle`, `RDF/XML`, `JSON-LD`, `N-Triples`, `Notation-3`, `Trig`, `Trix`, `N-Quads`.\n",
    "\n",
    "In either case, the ontology dump should:\n",
    "\n",
    "* Include enough information about classes, properties, property attachment to classes (using rdfs:domain, schema:domainIncludes or OWL restrictions), and taxonomies (important individuals).\n",
    "* Not include overly verbose and irrelevant definitions and examples that do not help SPARQL construction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "dc8792e0-acfb-4310-b5fa-8f649e448870",
   "metadata": {
    "id": "dc8792e0-acfb-4310-b5fa-8f649e448870"
   },
   "outputs": [],
   "source": [
    "from langchain_community.graphs import OntotextGraphDBGraph\n",
    "\n",
    "# feeding the schema using a user construct query\n",
    "\n",
    "graph = OntotextGraphDBGraph(\n",
    "    query_endpoint=\"http://localhost:7200/repositories/langchain\",\n",
    "    query_ontology=\"CONSTRUCT {?s ?p ?o} FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a08b8d8c-af01-4401-8069-5f2cd022a6df",
   "metadata": {
    "id": "a08b8d8c-af01-4401-8069-5f2cd022a6df"
   },
   "outputs": [],
   "source": [
    "# feeding the schema using a local RDF file\n",
    "\n",
    "graph = OntotextGraphDBGraph(\n",
    "    query_endpoint=\"http://localhost:7200/repositories/langchain\",\n",
    "    local_file=\"/path/to/langchain_graphdb_tutorial/starwars-ontology.nt\",  # change the path here\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "583b26ce-fb0d-4e9c-b5cd-9ec0e3be8922",
   "metadata": {
    "id": "583b26ce-fb0d-4e9c-b5cd-9ec0e3be8922"
   },
   "source": [
    "Either way, the ontology (schema) is fed to the LLM as `Turtle`, since `Turtle` with appropriate prefixes is most compact and easiest for the LLM to remember.\n",
    "\n",
    "The Star Wars ontology is a bit unusual in that it includes a lot of specific triples about classes, e.g. that the species `:Aleena` live on `<planet/38>`, are a subclass of `:Reptile`, have certain typical characteristics (average height, average lifespan, skinColor), and that specific individuals (characters) are representatives of that class:\n",
    "\n",
    "```\n",
    "@prefix : <https://swapi.co/vocabulary/> .\n",
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n",
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n",
    "\n",
    ":Aleena a owl:Class, :Species ;\n",
    "    rdfs:label \"Aleena\" ;\n",
    "    rdfs:isDefinedBy <https://swapi.co/ontology/> ;\n",
    "    rdfs:subClassOf :Reptile, :Sentient ;\n",
    "    :averageHeight 80.0 ;\n",
    "    :averageLifespan \"79\" ;\n",
    "    :character <https://swapi.co/resource/aleena/47> ;\n",
    "    :film <https://swapi.co/resource/film/4> ;\n",
    "    :language \"Aleena\" ;\n",
    "    :planet <https://swapi.co/resource/planet/38> ;\n",
    "    :skinColor \"blue\", \"gray\" .\n",
    "\n",
    "...\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6277d911-b0f6-4aeb-9aa5-96416b668468",
   "metadata": {
    "id": "6277d911-b0f6-4aeb-9aa5-96416b668468"
   },
   "source": [
    "To keep this tutorial simple, we use an unsecured GraphDB instance. If GraphDB is secured, you should set the environment variables 'GRAPHDB_USERNAME' and 'GRAPHDB_PASSWORD' before the initialization of `OntotextGraphDBGraph`.\n",
    "\n",
    "```python\n",
    "os.environ[\"GRAPHDB_USERNAME\"] = \"graphdb-user\"\n",
    "os.environ[\"GRAPHDB_PASSWORD\"] = \"graphdb-password\"\n",
    "\n",
    "graph = OntotextGraphDBGraph(\n",
    "    query_endpoint=...,\n",
    "    query_ontology=...,\n",
    ")\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "446d8a00-c98f-43b8-9e84-77b244f7bb24",
   "metadata": {
    "id": "446d8a00-c98f-43b8-9e84-77b244f7bb24"
   },
   "source": [
    "### Question Answering against the Star Wars Dataset\n",
    "\n",
    "We can now use the `OntotextGraphDBQAChain` to ask some questions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "fab63d88-511d-4049-9bf0-ca8748f1fbff",
   "metadata": {
    "id": "fab63d88-511d-4049-9bf0-ca8748f1fbff"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from langchain.chains import OntotextGraphDBQAChain\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "# We'll be using an OpenAI model which requires an OpenAI API Key.\n",
    "# However, other models are available as well:\n",
    "# https://python.langchain.com/docs/integrations/chat/\n",
    "\n",
    "# Set the environment variable `OPENAI_API_KEY` to your OpenAI API key\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-***\"\n",
    "\n",
    "# Any available OpenAI model can be used here.\n",
    "# We use 'gpt-4-1106-preview' because of its bigger context window.\n",
    "# The 'gpt-4-1106-preview' model_name may be deprecated in the future and renamed to 'gpt-4-turbo' or similar,\n",
    "# so be sure to consult the OpenAI API https://platform.openai.com/docs/models for the correct naming.\n",
    "\n",
    "chain = OntotextGraphDBQAChain.from_llm(\n",
    "    ChatOpenAI(temperature=0, model_name=\"gpt-4-1106-preview\"),\n",
    "    graph=graph,\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "64de8463-35b1-4c65-91e4-387daf4dd7d4",
   "metadata": {},
   "source": [
    "Let's ask a simple one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "f1dc4bea-b0f1-48f7-99a6-351a31acac7b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new OntotextGraphDBQAChain chain...\u001b[0m\n",
      "Generated SPARQL:\n",
      "\u001b[32;1m\u001b[1;3mPREFIX : <https://swapi.co/vocabulary/>\n",
      "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
      "\n",
      "SELECT ?climate\n",
      "WHERE {\n",
      "  ?planet rdfs:label \"Tatooine\" ;\n",
      "          :climate ?climate .\n",
      "}\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'The climate on Tatooine is arid.'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke({chain.input_key: \"What is the climate on Tatooine?\"})[chain.output_key]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d3a37f4-5c56-4b3e-b6ae-3eb030ffcc8f",
   "metadata": {},
   "source": [
    "And a slightly more complicated one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4dde8b18-4329-4a86-abfb-26d3e77034b7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new OntotextGraphDBQAChain chain...\u001b[0m\n",
      "Generated SPARQL:\n",
      "\u001b[32;1m\u001b[1;3mPREFIX : <https://swapi.co/vocabulary/>\n",
      "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n",
      "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
      "\n",
      "SELECT ?climate\n",
      "WHERE {\n",
      "  ?character rdfs:label \"Luke Skywalker\" .\n",
      "  ?character :homeworld ?planet .\n",
      "  ?planet :climate ?climate .\n",
      "}\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "\"The climate on Luke Skywalker's home planet is arid.\""
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke({chain.input_key: \"What is the climate on Luke Skywalker's home planet?\"})[\n",
    "    chain.output_key\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "51d3ce3e-9528-4a65-8f3e-2281de08cbf1",
   "metadata": {},
   "source": [
    "We can also ask more complicated questions like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "ab6f55f1-a3e0-4615-abd2-3cb26619c8d9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new OntotextGraphDBQAChain chain...\u001b[0m\n",
      "Generated SPARQL:\n",
      "\u001b[32;1m\u001b[1;3mPREFIX : <https://swapi.co/vocabulary/>\n",
      "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n",
      "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n",
      "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n",
      "\n",
      "SELECT (AVG(?boxOffice) AS ?averageBoxOffice)\n",
      "WHERE {\n",
      "  ?film a :Film .\n",
      "  ?film :boxOffice ?boxOfficeValue .\n",
      "  BIND(xsd:decimal(?boxOfficeValue) AS ?boxOffice)\n",
      "}\n",
      "\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'The average box office revenue for all the Star Wars movies is approximately 754.1 million dollars.'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke(\n",
    "    {\n",
    "        chain.input_key: \"What is the average box office revenue for all the Star Wars movies?\"\n",
    "    }\n",
    ")[chain.output_key]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11511345-8436-4634-92c6-36f2c0dd44db",
   "metadata": {
    "id": "11511345-8436-4634-92c6-36f2c0dd44db"
   },
   "source": [
    "### Chain Modifiers\n",
    "\n",
    "The Ontotext GraphDB QA chain allows prompt refinement for further improvement of your QA chain and enhancing the overall user experience of your app.\n",
    "\n",
    "#### \"SPARQL Generation\" Prompt\n",
    "\n",
    "The prompt is used for the SPARQL query generation based on the user question and the KG schema.\n",
    "\n",
    "- `sparql_generation_prompt`\n",
    "\n",
    "  Default value:\n",
    "  ````python\n",
    "  GRAPHDB_SPARQL_GENERATION_TEMPLATE = \"\"\"\n",
    "  Write a SPARQL SELECT query for querying a graph database.\n",
    "  The ontology schema delimited by triple backticks in Turtle format is:\n",
    "  ```\n",
    "  {schema}\n",
    "  ```\n",
    "  Use only the classes and properties provided in the schema to construct the SPARQL query.\n",
    "  Do not use any classes or properties that are not explicitly provided in the SPARQL query.\n",
    "  Include all necessary prefixes.\n",
    "  Do not include any explanations or apologies in your responses.\n",
    "  Do not wrap the query in backticks.\n",
    "  Do not include any text except the SPARQL query generated.\n",
    "  The question delimited by triple backticks is:\n",
    "  ```\n",
    "  {prompt}\n",
    "  ```\n",
    "  \"\"\"\n",
    "  GRAPHDB_SPARQL_GENERATION_PROMPT = PromptTemplate(\n",
    "      input_variables=[\"schema\", \"prompt\"],\n",
    "      template=GRAPHDB_SPARQL_GENERATION_TEMPLATE,\n",
    "  )\n",
    "  ````\n",
    "\n",
    "#### \"SPARQL Fix\" Prompt\n",
    "\n",
    "Sometimes, the LLM may generate a SPARQL query with syntactic errors or missing prefixes, etc. The chain will try to amend this by prompting the LLM to correct it a certain number of times.\n",
    "\n",
    "- `sparql_fix_prompt`\n",
    "\n",
    "  Default value:\n",
    "  ````python\n",
    "  GRAPHDB_SPARQL_FIX_TEMPLATE = \"\"\"\n",
    "  This following SPARQL query delimited by triple backticks\n",
    "  ```\n",
    "  {generated_sparql}\n",
    "  ```\n",
    "  is not valid.\n",
    "  The error delimited by triple backticks is\n",
    "  ```\n",
    "  {error_message}\n",
    "  ```\n",
    "  Give me a correct version of the SPARQL query.\n",
    "  Do not change the logic of the query.\n",
    "  Do not include any explanations or apologies in your responses.\n",
    "  Do not wrap the query in backticks.\n",
    "  Do not include any text except the SPARQL query generated.\n",
    "  The ontology schema delimited by triple backticks in Turtle format is:\n",
    "  ```\n",
    "  {schema}\n",
    "  ```\n",
    "  \"\"\"\n",
    "\n",
    "  GRAPHDB_SPARQL_FIX_PROMPT = PromptTemplate(\n",
    "      input_variables=[\"error_message\", \"generated_sparql\", \"schema\"],\n",
    "      template=GRAPHDB_SPARQL_FIX_TEMPLATE,\n",
    "  )\n",
    "  ````\n",
    "\n",
    "- `max_fix_retries`\n",
    "\n",
    "  Default value: `5`\n",
    "\n",
    "#### \"Answering\" Prompt\n",
    "\n",
    "The prompt is used for answering the question based on the results returned from the database and the initial user question. By default, the LLM is instructed to only use the information from the returned result(s). If the result set is empty, the LLM should inform the user that it can't answer the question.\n",
    "\n",
    "- `qa_prompt`\n",
    "\n",
    "  Default value:\n",
    "  ````python\n",
    "  GRAPHDB_QA_TEMPLATE = \"\"\"Task: Generate a natural language response from the results of a SPARQL query.\n",
    "  You are an assistant that creates well-written and human understandable answers.\n",
    "  The information part contains the information provided, which you can use to construct an answer.\n",
    "  The information provided is authoritative, you must never doubt it or try to use your internal knowledge to correct it.\n",
    "  Make your response sound like the information is coming from an AI assistant, but don't add any information.\n",
    "  Don't use internal knowledge to answer the question, just say you don't know if no information is available.\n",
    "  Information:\n",
    "  {context}\n",
    "\n",
    "  Question: {prompt}\n",
    "  Helpful Answer:\"\"\"\n",
    "  GRAPHDB_QA_PROMPT = PromptTemplate(\n",
    "      input_variables=[\"context\", \"prompt\"], template=GRAPHDB_QA_TEMPLATE\n",
    "  )\n",
    "  ````"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ef8c073-003d-44ab-8a7b-cf45c50f6370",
   "metadata": {
    "id": "2ef8c073-003d-44ab-8a7b-cf45c50f6370"
   },
   "source": [
    "Once you're finished playing with QA against GraphDB, you can shut down the Docker environment by running\n",
    "```\n",
    "docker compose down -v --remove-orphans\n",
    "```\n",
    "from the directory with the Docker compose file."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
@ -0,0 +1,213 @@
from __future__ import annotations

import os
from typing import (
    TYPE_CHECKING,
    List,
    Optional,
    Union,
)

if TYPE_CHECKING:
    import rdflib


class OntotextGraphDBGraph:
    """Ontotext GraphDB https://graphdb.ontotext.com/ wrapper for graph operations.

    *Security note*: Make sure that the database connection uses credentials
    that are narrowly-scoped to only include necessary permissions.
    Failure to do so may result in data corruption or loss, since the calling
    code may attempt commands that would result in deletion, mutation
    of data if appropriately prompted or reading sensitive data if such
    data is present in the database.
    The best way to guard against such negative outcomes is to (as appropriate)
    limit the permissions granted to the credentials used with this tool.

    See https://python.langchain.com/docs/security for more information.
    """

    def __init__(
        self,
        query_endpoint: str,
        query_ontology: Optional[str] = None,
        local_file: Optional[str] = None,
        local_file_format: Optional[str] = None,
    ) -> None:
        """
        Set up the GraphDB wrapper

        :param query_endpoint: SPARQL endpoint for queries, read access

            If GraphDB is secured,
            set the environment variables 'GRAPHDB_USERNAME' and 'GRAPHDB_PASSWORD'.

        :param query_ontology: a `CONSTRUCT` query that is executed
            on the SPARQL endpoint and returns the KG schema statements
            Example:
            'CONSTRUCT {?s ?p ?o} FROM <https://example.com/ontology/> WHERE {?s ?p ?o}'
            Currently, DESCRIBE queries like
            'PREFIX onto: <https://example.com/ontology/>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            DESCRIBE ?term WHERE {
                ?term rdfs:isDefinedBy onto:
            }'
            are not supported, because DESCRIBE returns
            the Symmetric Concise Bounded Description (SCBD),
            i.e. also the incoming class links.
            In case of large graphs with millions of instances, this is not efficient.
            Check https://github.com/eclipse-rdf4j/rdf4j/issues/4857

        :param local_file: a local RDF ontology file.
            Supported RDF formats:
            Turtle, RDF/XML, JSON-LD, N-Triples, Notation-3, Trig, Trix, N-Quads.
            If the RDF format can't be determined from the file extension,
            pass the RDF format explicitly in the `local_file_format` param.

        :param local_file_format: Used if the RDF format can't be determined
            from the local file extension.
            One of "json-ld", "xml", "n3", "turtle", "nt", "trig", "nquads", "trix"

        Either `query_ontology` or `local_file` should be passed.
        """

        if query_ontology and local_file:
            raise ValueError("Both file and query provided. Only one is allowed.")

        if not query_ontology and not local_file:
            raise ValueError("Neither file nor query provided. One is required.")

        try:
            import rdflib
            from rdflib.plugins.stores import sparqlstore
        except ImportError:
            raise ValueError(
                "Could not import rdflib python package. "
                "Please install it with `pip install rdflib`."
            )

        auth = self._get_auth()
        store = sparqlstore.SPARQLStore(auth=auth)
        store.open(query_endpoint)

        self.graph = rdflib.Graph(store, identifier=None, bind_namespaces="none")
        self._check_connectivity()

        if local_file:
            ontology_schema_graph = self._load_ontology_schema_from_file(
                local_file, local_file_format
            )
        else:
            self._validate_user_query(query_ontology)
            ontology_schema_graph = self._load_ontology_schema_with_query(
                query_ontology
            )
        self.schema = ontology_schema_graph.serialize(format="turtle")

    @staticmethod
    def _get_auth() -> Union[tuple, None]:
        """
        Returns the basic authentication configuration
        """
        username = os.environ.get("GRAPHDB_USERNAME", None)
        password = os.environ.get("GRAPHDB_PASSWORD", None)

        if username:
            if not password:
                raise ValueError(
                    "Environment variable 'GRAPHDB_USERNAME' is set, "
                    "but 'GRAPHDB_PASSWORD' is not set."
                )
            else:
                return username, password
        return None

    def _check_connectivity(self) -> None:
        """
        Executes a simple `ASK` query to check connectivity
        """
        try:
            self.graph.query("ASK { ?s ?p ?o }")
        except ValueError:
            raise ValueError(
                "Could not query the provided endpoint. "
                "Please check if the value of the provided "
                "query_endpoint points to the right repository. "
                "If GraphDB is secured, please "
                "make sure that the environment variables "
                "'GRAPHDB_USERNAME' and 'GRAPHDB_PASSWORD' are set."
            )

    @staticmethod
    def _load_ontology_schema_from_file(
        local_file: str, local_file_format: Optional[str] = None
    ):
        """
        Parse the ontology schema statements from the provided file
        """
        import rdflib

        if not os.path.exists(local_file):
            raise FileNotFoundError(f"File {local_file} does not exist.")
        if not os.access(local_file, os.R_OK):
            raise PermissionError(f"Read permission for {local_file} is restricted")
        graph = rdflib.ConjunctiveGraph()
        try:
            graph.parse(local_file, format=local_file_format)
        except Exception as e:
            raise ValueError(f"Invalid file format for {local_file}: {e}")
        return graph

    @staticmethod
    def _validate_user_query(query_ontology: str) -> None:
        """
        Validate that the query is a valid SPARQL CONSTRUCT query
        """
        from pyparsing import ParseException
        from rdflib.plugins.sparql import prepareQuery

        if not isinstance(query_ontology, str):
            raise TypeError("Ontology query must be provided as string.")
        try:
            parsed_query = prepareQuery(query_ontology)
        except ParseException as e:
            raise ValueError(f"Ontology query is not a valid SPARQL query: {e}")

        if parsed_query.algebra.name != "ConstructQuery":
            raise ValueError(
                "Invalid query type. Only CONSTRUCT queries are supported."
            )

    def _load_ontology_schema_with_query(self, query: str):
        """
        Execute the query for collecting the ontology schema statements
        """
        from rdflib.exceptions import ParserError

        try:
            results = self.graph.query(query)
        except ParserError as e:
            raise ValueError(f"Generated SPARQL statement is invalid\n{e}")

        return results.graph

    @property
    def get_schema(self) -> str:
        """
        Returns the schema of the graph database in turtle format
        """
        return self.schema

    def query(
        self,
        query: str,
    ) -> List[rdflib.query.ResultRow]:
        """
        Query the graph.
        """
        from rdflib.exceptions import ParserError
        from rdflib.query import ResultRow

        try:
            res = self.graph.query(query)
        except ParserError as e:
            raise ValueError(f"Generated SPARQL statement is invalid\n{e}")
        return [r for r in res if isinstance(r, ResultRow)]
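The environment-variable credential handling in `_get_auth` above can be exercised standalone. This is a sketch mirroring the class's behaviour (the function name is ours, not the library's): a tuple when both variables are set, `None` when neither is, and an error on a username without a password.

```python
import os
from typing import Optional, Tuple


def get_basic_auth() -> Optional[Tuple[str, str]]:
    """Resolve GraphDB basic-auth credentials from the environment."""
    username = os.environ.get("GRAPHDB_USERNAME")
    password = os.environ.get("GRAPHDB_PASSWORD")
    if username and not password:
        raise ValueError(
            "'GRAPHDB_USERNAME' is set, but 'GRAPHDB_PASSWORD' is not set."
        )
    if username:
        return username, password
    return None


os.environ["GRAPHDB_USERNAME"] = "graphdb-user"
os.environ["GRAPHDB_PASSWORD"] = "graphdb-password"
print(get_basic_auth())  # ('graphdb-user', 'graphdb-password')
```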
@ -0,0 +1,6 @@
FROM ontotext/graphdb:10.5.1
RUN mkdir -p /opt/graphdb/dist/data/repositories/langchain
COPY config.ttl /opt/graphdb/dist/data/repositories/langchain/
COPY starwars-data.trig /
COPY graphdb_create.sh /run.sh
ENTRYPOINT bash /run.sh
@ -0,0 +1,46 @@
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix graphdb: <http://www.ontotext.com/config/graphdb#>.

[] a rep:Repository ;
    rep:repositoryID "langchain" ;
    rdfs:label "" ;
    rep:repositoryImpl [
        rep:repositoryType "graphdb:SailRepository" ;
        sr:sailImpl [
            sail:sailType "graphdb:Sail" ;

            graphdb:read-only "false" ;

            # Inference and Validation
            graphdb:ruleset "empty" ;
            graphdb:disable-sameAs "true" ;
            graphdb:check-for-inconsistencies "false" ;

            # Indexing
            graphdb:entity-id-size "32" ;
            graphdb:enable-context-index "false" ;
            graphdb:enablePredicateList "true" ;
            graphdb:enable-fts-index "false" ;
            graphdb:fts-indexes ("default" "iri") ;
            graphdb:fts-string-literals-index "default" ;
            graphdb:fts-iris-index "none" ;

            # Queries and Updates
            graphdb:query-timeout "0" ;
            graphdb:throw-QueryEvaluationException-on-timeout "false" ;
            graphdb:query-limit-results "0" ;

            # Settable in the file but otherwise hidden in the UI and in the RDF4J console
            graphdb:base-URL "http://example.org/owlim#" ;
            graphdb:defaultNS "" ;
            graphdb:imports "" ;
            graphdb:repository-type "file-repository" ;
            graphdb:storage-folder "storage" ;
            graphdb:entity-index-size "10000000" ;
            graphdb:in-memory-literal-properties "true" ;
            graphdb:enable-literal-index "true" ;
        ]
    ].
@@ -0,0 +1,9 @@
version: '3.7'

services:

  graphdb:
    image: graphdb
    container_name: graphdb
    ports:
      - "7200:7200"
@@ -0,0 +1,33 @@
#! /bin/bash
REPOSITORY_ID="langchain"
GRAPHDB_URI="http://localhost:7200/"

echo -e "\nUsing GraphDB: ${GRAPHDB_URI}"

function startGraphDB {
  echo -e "\nStarting GraphDB..."
  exec /opt/graphdb/dist/bin/graphdb
}

function waitGraphDBStart {
  echo -e "\nWaiting GraphDB to start..."
  for _ in $(seq 1 5); do
    CHECK_RES=$(curl --silent --write-out '%{http_code}' --output /dev/null ${GRAPHDB_URI}/rest/repositories)
    if [ "${CHECK_RES}" = '200' ]; then
      echo -e "\nUp and running"
      break
    fi
    sleep 30s
    echo "CHECK_RES: ${CHECK_RES}"
  done
}

function loadData {
  echo -e "\nImporting starwars-data.trig"
  curl -X POST -H "Content-Type: application/x-trig" -T /starwars-data.trig ${GRAPHDB_URI}/repositories/${REPOSITORY_ID}/statements
}

startGraphDB &
waitGraphDBStart
loadData
wait
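The `waitGraphDBStart` function above implements a simple bounded polling loop: probe the REST endpoint, and sleep between attempts until it answers or the attempts run out. A rough, stdlib-only Python sketch of the same idea — `check`, `attempts`, and `delay` are illustrative names, not taken from the script — might look like:

```python
import time


def wait_until_ready(check, attempts=5, delay=0.01):
    """Return True as soon as check() succeeds, False after all attempts.

    Mirrors the shell loop: probe, and sleep between failed probes.
    """
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False


calls = {"n": 0}


def check():
    calls["n"] += 1
    return calls["n"] >= 3  # the "server" becomes healthy on the third probe


print(wait_until_ready(check))  # True, after three probes
```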
@@ -0,0 +1,5 @@
set -ex

docker compose down -v --remove-orphans
docker build --tag graphdb .
docker compose up -d graphdb
@@ -0,0 +1,43 @@
@base <https://swapi.co/resource/>.
@prefix voc: <https://swapi.co/vocabulary/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

{

    <besalisk/71>
        a voc:Besalisk , voc:Character ;
        rdfs:label "Dexter Jettster" ;
        voc:eyeColor "yellow" ;
        voc:gender "male" ;
        voc:height 198.0 ;
        voc:mass 102.0 ;
        voc:skinColor "brown" .

}

<https://swapi.co/ontology/> {

    voc:Character a owl:Class .
    voc:Species a owl:Class .

    voc:Besalisk a voc:Species;
        rdfs:label "Besalisk";
        voc:averageHeight 178.0;
        voc:averageLifespan "75";
        voc:character <https://swapi.co/resource/besalisk/71>;
        voc:language "besalisk";
        voc:skinColor "brown";
        voc:eyeColor "yellow" .

    voc:averageHeight a owl:DatatypeProperty .
    voc:averageLifespan a owl:DatatypeProperty .
    voc:character a owl:ObjectProperty .
    voc:language a owl:DatatypeProperty .
    voc:skinColor a owl:DatatypeProperty .
    voc:eyeColor a owl:DatatypeProperty .
    voc:gender a owl:DatatypeProperty .
    voc:height a owl:DatatypeProperty .
    voc:mass a owl:DatatypeProperty .

}
@@ -0,0 +1,181 @@
from pathlib import Path

import pytest

from langchain_community.graphs import OntotextGraphDBGraph

"""
cd libs/community/tests/integration_tests/graphs/docker-compose-ontotext-graphdb
./start.sh
"""


def test_query() -> None:
    graph = OntotextGraphDBGraph(
        query_endpoint="http://localhost:7200/repositories/langchain",
        query_ontology="CONSTRUCT {?s ?p ?o}"
        "FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
    )

    query_results = graph.query(
        "PREFIX voc: <https://swapi.co/vocabulary/> "
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
        "SELECT ?eyeColor "
        "WHERE {"
        ' ?besalisk rdfs:label "Dexter Jettster" ; '
        " voc:eyeColor ?eyeColor ."
        "}"
    )
    assert len(query_results) == 1
    assert len(query_results[0]) == 1
    assert str(query_results[0][0]) == "yellow"


def test_get_schema_with_query() -> None:
    graph = OntotextGraphDBGraph(
        query_endpoint="http://localhost:7200/repositories/langchain",
        query_ontology="CONSTRUCT {?s ?p ?o}"
        "FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
    )

    from rdflib import Graph

    assert len(Graph().parse(data=graph.get_schema, format="turtle")) == 19


@pytest.mark.parametrize(
    "rdf_format, file_extension",
    [
        ("json-ld", "json"),
        ("json-ld", "jsonld"),
        ("json-ld", "json-ld"),
        ("xml", "rdf"),
        ("xml", "xml"),
        ("xml", "owl"),
        ("pretty-xml", "xml"),
        ("n3", "n3"),
        ("turtle", "ttl"),
        ("nt", "nt"),
        ("trig", "trig"),
        ("nquads", "nq"),
        ("nquads", "nquads"),
        ("trix", "trix"),
    ],
)
def test_get_schema_from_file(
    tmp_path: Path, rdf_format: str, file_extension: str
) -> None:
    expected_number_of_ontology_statements = 19

    graph = OntotextGraphDBGraph(
        query_endpoint="http://localhost:7200/repositories/langchain",
        query_ontology="CONSTRUCT {?s ?p ?o}"
        "FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
    )

    from rdflib import ConjunctiveGraph, Graph

    assert (
        len(Graph().parse(data=graph.get_schema, format="turtle"))
        == expected_number_of_ontology_statements
    )

    # serialize the ontology schema loaded with the query in a local file
    # in various rdf formats and check that this results
    # in the same number of statements
    conjunctive_graph = ConjunctiveGraph()
    ontology_context = conjunctive_graph.get_context("https://swapi.co/ontology/")
    ontology_context.parse(data=graph.get_schema, format="turtle")

    assert len(ontology_context) == expected_number_of_ontology_statements
    assert len(conjunctive_graph) == expected_number_of_ontology_statements

    local_file = tmp_path / ("starwars-ontology." + file_extension)
    conjunctive_graph.serialize(local_file, format=rdf_format)

    graph = OntotextGraphDBGraph(
        query_endpoint="http://localhost:7200/repositories/langchain",
        local_file=str(local_file),
    )
    assert (
        len(Graph().parse(data=graph.get_schema, format="turtle"))
        == expected_number_of_ontology_statements
    )


@pytest.mark.parametrize(
    "rdf_format", ["json-ld", "xml", "n3", "turtle", "nt", "trig", "nquads", "trix"]
)
def test_get_schema_from_file_with_explicit_rdf_format(
    tmp_path: Path, rdf_format: str
) -> None:
    expected_number_of_ontology_statements = 19

    graph = OntotextGraphDBGraph(
        query_endpoint="http://localhost:7200/repositories/langchain",
        query_ontology="CONSTRUCT {?s ?p ?o}"
        "FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
    )

    from rdflib import ConjunctiveGraph, Graph

    assert (
        len(Graph().parse(data=graph.get_schema, format="turtle"))
        == expected_number_of_ontology_statements
    )

    # serialize the ontology schema loaded with the query in a local file
    # in various rdf formats and check that this results
    # in the same number of statements
    conjunctive_graph = ConjunctiveGraph()
    ontology_context = conjunctive_graph.get_context("https://swapi.co/ontology/")
    ontology_context.parse(data=graph.get_schema, format="turtle")

    assert len(ontology_context) == expected_number_of_ontology_statements
    assert len(conjunctive_graph) == expected_number_of_ontology_statements

    local_file = tmp_path / "starwars-ontology.txt"
    conjunctive_graph.serialize(local_file, format=rdf_format)

    graph = OntotextGraphDBGraph(
        query_endpoint="http://localhost:7200/repositories/langchain",
        local_file=str(local_file),
        local_file_format=rdf_format,
    )
    assert (
        len(Graph().parse(data=graph.get_schema, format="turtle"))
        == expected_number_of_ontology_statements
    )


def test_get_schema_from_file_with_wrong_extension(tmp_path: Path) -> None:
    expected_number_of_ontology_statements = 19

    graph = OntotextGraphDBGraph(
        query_endpoint="http://localhost:7200/repositories/langchain",
        query_ontology="CONSTRUCT {?s ?p ?o}"
        "FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
    )

    from rdflib import ConjunctiveGraph, Graph

    assert (
        len(Graph().parse(data=graph.get_schema, format="turtle"))
        == expected_number_of_ontology_statements
    )

    conjunctive_graph = ConjunctiveGraph()
    ontology_context = conjunctive_graph.get_context("https://swapi.co/ontology/")
    ontology_context.parse(data=graph.get_schema, format="turtle")

    assert len(ontology_context) == expected_number_of_ontology_statements
    assert len(conjunctive_graph) == expected_number_of_ontology_statements

    local_file = tmp_path / "starwars-ontology.trig"
    conjunctive_graph.serialize(local_file, format="nquads")

    with pytest.raises(ValueError):
        OntotextGraphDBGraph(
            query_endpoint="http://localhost:7200/repositories/langchain",
            local_file=str(local_file),
        )
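The parametrized tests above pair file extensions with rdflib format names and expect a `ValueError` when the extension is unrecognized. A hypothetical helper sketching that extension-to-format lookup — the dict and function name below are illustrative, not part of `OntotextGraphDBGraph`'s API — could look like:

```python
# Mapping inferred from the test parametrization above (assumption: this is
# how an extension-based format guess could work, not the library's code).
EXTENSION_TO_FORMAT = {
    "json": "json-ld", "jsonld": "json-ld", "json-ld": "json-ld",
    "rdf": "xml", "xml": "xml", "owl": "xml",
    "n3": "n3", "ttl": "turtle", "nt": "nt",
    "trig": "trig", "nq": "nquads", "nquads": "nquads", "trix": "trix",
}


def guess_rdf_format(file_name: str) -> str:
    """Guess an rdflib parse format from a file name's extension."""
    extension = file_name.rsplit(".", 1)[-1].lower()
    try:
        return EXTENSION_TO_FORMAT[extension]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {extension}") from None


print(guess_rdf_format("starwars-ontology.ttl"))  # turtle
```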
@@ -0,0 +1,176 @@
import os
import tempfile
import unittest

import pytest


class TestOntotextGraphDBGraph(unittest.TestCase):
    def test_import(self) -> None:
        from langchain_community.graphs import OntotextGraphDBGraph  # noqa: F401

    @pytest.mark.requires("rdflib")
    def test_validate_user_query_wrong_type(self) -> None:
        from langchain_community.graphs import OntotextGraphDBGraph

        with self.assertRaises(TypeError) as e:
            OntotextGraphDBGraph._validate_user_query(
                [
                    "PREFIX starwars: <https://swapi.co/ontology/> "
                    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
                    "DESCRIBE starwars: ?term "
                    "WHERE {?term rdfs:isDefinedBy starwars: }"
                ]
            )
        self.assertEqual("Ontology query must be provided as string.", str(e.exception))

    @pytest.mark.requires("rdflib")
    def test_validate_user_query_invalid_sparql_syntax(self) -> None:
        from langchain_community.graphs import OntotextGraphDBGraph

        with self.assertRaises(ValueError) as e:
            OntotextGraphDBGraph._validate_user_query(
                "CONSTRUCT {?s ?p ?o} FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o"
            )
        self.assertEqual(
            "('Ontology query is not a valid SPARQL query.', "
            "Expected ConstructQuery, "
            "found end of text (at char 70), (line:1, col:71))",
            str(e.exception),
        )

    @pytest.mark.requires("rdflib")
    def test_validate_user_query_invalid_query_type_select(self) -> None:
        from langchain_community.graphs import OntotextGraphDBGraph

        with self.assertRaises(ValueError) as e:
            OntotextGraphDBGraph._validate_user_query("SELECT * { ?s ?p ?o }")
        self.assertEqual(
            "Invalid query type. Only CONSTRUCT queries are supported.",
            str(e.exception),
        )

    @pytest.mark.requires("rdflib")
    def test_validate_user_query_invalid_query_type_ask(self) -> None:
        from langchain_community.graphs import OntotextGraphDBGraph

        with self.assertRaises(ValueError) as e:
            OntotextGraphDBGraph._validate_user_query("ASK { ?s ?p ?o }")
        self.assertEqual(
            "Invalid query type. Only CONSTRUCT queries are supported.",
            str(e.exception),
        )

    @pytest.mark.requires("rdflib")
    def test_validate_user_query_invalid_query_type_describe(self) -> None:
        from langchain_community.graphs import OntotextGraphDBGraph

        with self.assertRaises(ValueError) as e:
            OntotextGraphDBGraph._validate_user_query(
                "PREFIX swapi: <https://swapi.co/ontology/> "
                "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
                "DESCRIBE ?term WHERE { ?term rdfs:isDefinedBy swapi: }"
            )
        self.assertEqual(
            "Invalid query type. Only CONSTRUCT queries are supported.",
            str(e.exception),
        )

    @pytest.mark.requires("rdflib")
    def test_validate_user_query_construct(self) -> None:
        from langchain_community.graphs import OntotextGraphDBGraph

        OntotextGraphDBGraph._validate_user_query(
            "CONSTRUCT {?s ?p ?o} FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}"
        )

    @pytest.mark.requires("rdflib")
    def test_check_connectivity(self) -> None:
        from langchain_community.graphs import OntotextGraphDBGraph

        with self.assertRaises(ValueError) as e:
            OntotextGraphDBGraph(
                query_endpoint="http://localhost:7200/repositories/non-existing-repository",
                query_ontology="PREFIX swapi: <https://swapi.co/ontology/> "
                "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
                "DESCRIBE ?term WHERE {?term rdfs:isDefinedBy swapi: }",
            )
        self.assertEqual(
            "Could not query the provided endpoint. "
            "Please, check, if the value of the provided "
            "query_endpoint points to the right repository. "
            "If GraphDB is secured, please, make sure that the environment variables "
            "'GRAPHDB_USERNAME' and 'GRAPHDB_PASSWORD' are set.",
            str(e.exception),
        )

    @pytest.mark.requires("rdflib")
    def test_local_file_does_not_exist(self) -> None:
        from langchain_community.graphs import OntotextGraphDBGraph

        non_existing_file = os.path.join("non", "existing", "path", "to", "file.ttl")
        with self.assertRaises(FileNotFoundError) as e:
            OntotextGraphDBGraph._load_ontology_schema_from_file(non_existing_file)
        self.assertEqual(f"File {non_existing_file} does not exist.", str(e.exception))

    @pytest.mark.requires("rdflib")
    def test_local_file_no_access(self) -> None:
        from langchain_community.graphs import OntotextGraphDBGraph

        with tempfile.NamedTemporaryFile() as tmp_file:
            tmp_file_name = tmp_file.name

            # Set file permissions to write and execute only
            os.chmod(tmp_file_name, 0o300)

            with self.assertRaises(PermissionError) as e:
                OntotextGraphDBGraph._load_ontology_schema_from_file(tmp_file_name)

            self.assertEqual(
                f"Read permission for {tmp_file_name} is restricted", str(e.exception)
            )

    @pytest.mark.requires("rdflib")
    def test_local_file_bad_syntax(self) -> None:
        from langchain_community.graphs import OntotextGraphDBGraph

        with tempfile.TemporaryDirectory() as tempdir:
            tmp_file_path = os.path.join(tempdir, "starwars-ontology.trig")
            with open(tmp_file_path, "w") as tmp_file:
                tmp_file.write("invalid trig")

            with self.assertRaises(ValueError) as e:
                OntotextGraphDBGraph._load_ontology_schema_from_file(tmp_file_path)
            self.assertEqual(
                f"('Invalid file format for {tmp_file_path} : '"
                ", BadSyntax('', 0, 'invalid trig', 0, "
                "'expected directive or statement'))",
                str(e.exception),
            )

    @pytest.mark.requires("rdflib")
    def test_both_query_and_local_file_provided(self) -> None:
        from langchain_community.graphs import OntotextGraphDBGraph

        with self.assertRaises(ValueError) as e:
            OntotextGraphDBGraph(
                query_endpoint="http://localhost:7200/repositories/non-existing-repository",
                query_ontology="CONSTRUCT {?s ?p ?o}"
                "FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
                local_file="starwars-ontology-wrong.trig",
            )
        self.assertEqual(
            "Both file and query provided. Only one is allowed.", str(e.exception)
        )

    @pytest.mark.requires("rdflib")
    def test_nor_query_nor_local_file_provided(self) -> None:
        from langchain_community.graphs import OntotextGraphDBGraph

        with self.assertRaises(ValueError) as e:
            OntotextGraphDBGraph(
                query_endpoint="http://localhost:7200/repositories/non-existing-repository",
            )
        self.assertEqual(
            "Neither file nor query provided. One is required.", str(e.exception)
        )
@@ -0,0 +1,182 @@
"""Question answering over a graph."""
from __future__ import annotations

from typing import Any, Dict, List, Optional

from langchain_community.graphs import OntotextGraphDBGraph
from langchain_core.callbacks.manager import CallbackManager
from langchain_core.language_models import BaseLanguageModel
from langchain_core.prompts.base import BasePromptTemplate
from langchain_core.pydantic_v1 import Field

from langchain.callbacks.manager import CallbackManagerForChainRun
from langchain.chains.base import Chain
from langchain.chains.graph_qa.prompts import (
    GRAPHDB_QA_PROMPT,
    GRAPHDB_SPARQL_FIX_PROMPT,
    GRAPHDB_SPARQL_GENERATION_PROMPT,
)
from langchain.chains.llm import LLMChain


class OntotextGraphDBQAChain(Chain):
    """Question-answering against Ontotext GraphDB
    https://graphdb.ontotext.com/ by generating SPARQL queries.

    *Security note*: Make sure that the database connection uses credentials
    that are narrowly-scoped to only include necessary permissions.
    Failure to do so may result in data corruption or loss, since the calling
    code may attempt commands that would result in deletion, mutation
    of data if appropriately prompted or reading sensitive data if such
    data is present in the database.
    The best way to guard against such negative outcomes is to (as appropriate)
    limit the permissions granted to the credentials used with this tool.

    See https://python.langchain.com/docs/security for more information.
    """

    graph: OntotextGraphDBGraph = Field(exclude=True)
    sparql_generation_chain: LLMChain
    sparql_fix_chain: LLMChain
    max_fix_retries: int
    qa_chain: LLMChain
    input_key: str = "query"  #: :meta private:
    output_key: str = "result"  #: :meta private:

    @property
    def input_keys(self) -> List[str]:
        return [self.input_key]

    @property
    def output_keys(self) -> List[str]:
        _output_keys = [self.output_key]
        return _output_keys

    @classmethod
    def from_llm(
        cls,
        llm: BaseLanguageModel,
        *,
        sparql_generation_prompt: BasePromptTemplate = GRAPHDB_SPARQL_GENERATION_PROMPT,
        sparql_fix_prompt: BasePromptTemplate = GRAPHDB_SPARQL_FIX_PROMPT,
        max_fix_retries: int = 5,
        qa_prompt: BasePromptTemplate = GRAPHDB_QA_PROMPT,
        **kwargs: Any,
    ) -> OntotextGraphDBQAChain:
        """Initialize from LLM."""
        sparql_generation_chain = LLMChain(llm=llm, prompt=sparql_generation_prompt)
        sparql_fix_chain = LLMChain(llm=llm, prompt=sparql_fix_prompt)
        max_fix_retries = max_fix_retries
        qa_chain = LLMChain(llm=llm, prompt=qa_prompt)
        return cls(
            qa_chain=qa_chain,
            sparql_generation_chain=sparql_generation_chain,
            sparql_fix_chain=sparql_fix_chain,
            max_fix_retries=max_fix_retries,
            **kwargs,
        )

    def _call(
        self,
        inputs: Dict[str, Any],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, str]:
        """
        Generate a SPARQL query, use it to retrieve a response from GraphDB and answer
        the question.
        """
        _run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
        callbacks = _run_manager.get_child()
        prompt = inputs[self.input_key]
        ontology_schema = self.graph.get_schema

        sparql_generation_chain_result = self.sparql_generation_chain.invoke(
            {"prompt": prompt, "schema": ontology_schema}, callbacks=callbacks
        )
        generated_sparql = sparql_generation_chain_result[
            self.sparql_generation_chain.output_key
        ]

        generated_sparql = self._get_valid_sparql_query(
            _run_manager, callbacks, generated_sparql, ontology_schema
        )
        query_results = self.graph.query(generated_sparql)

        qa_chain_result = self.qa_chain.invoke(
            {"prompt": prompt, "context": query_results}, callbacks=callbacks
        )
        result = qa_chain_result[self.qa_chain.output_key]
        return {self.output_key: result}

    def _get_valid_sparql_query(
        self,
        _run_manager: CallbackManagerForChainRun,
        callbacks: CallbackManager,
        generated_sparql: str,
        ontology_schema: str,
    ) -> str:
        try:
            return self._prepare_sparql_query(_run_manager, generated_sparql)
        except Exception as e:
            retries = 0
            error_message = str(e)
            self._log_invalid_sparql_query(
                _run_manager, generated_sparql, error_message
            )

            while retries < self.max_fix_retries:
                try:
                    sparql_fix_chain_result = self.sparql_fix_chain.invoke(
                        {
                            "error_message": error_message,
                            "generated_sparql": generated_sparql,
                            "schema": ontology_schema,
                        },
                        callbacks=callbacks,
                    )
                    generated_sparql = sparql_fix_chain_result[
                        self.sparql_fix_chain.output_key
                    ]
                    return self._prepare_sparql_query(_run_manager, generated_sparql)
                except Exception as e:
                    retries += 1
                    parse_exception = str(e)
                    self._log_invalid_sparql_query(
                        _run_manager, generated_sparql, parse_exception
                    )

        raise ValueError("The generated SPARQL query is invalid.")

    def _prepare_sparql_query(
        self, _run_manager: CallbackManagerForChainRun, generated_sparql: str
    ) -> str:
        from rdflib.plugins.sparql import prepareQuery

        prepareQuery(generated_sparql)
        self._log_valid_sparql_query(_run_manager, generated_sparql)
        return generated_sparql

    def _log_valid_sparql_query(
        self, _run_manager: CallbackManagerForChainRun, generated_query: str
    ) -> None:
        _run_manager.on_text("Generated SPARQL:", end="\n", verbose=self.verbose)
        _run_manager.on_text(
            generated_query, color="green", end="\n", verbose=self.verbose
        )

    def _log_invalid_sparql_query(
        self,
        _run_manager: CallbackManagerForChainRun,
        generated_query: str,
        error_message: str,
    ) -> None:
        _run_manager.on_text("Invalid SPARQL query: ", end="\n", verbose=self.verbose)
        _run_manager.on_text(
            generated_query, color="red", end="\n", verbose=self.verbose
        )
        _run_manager.on_text(
            "SPARQL Query Parse Error: ", end="\n", verbose=self.verbose
        )
        _run_manager.on_text(
            error_message, color="red", end="\n\n", verbose=self.verbose
        )
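The core of `_get_valid_sparql_query` above is a validate-then-fix retry loop: try to parse the candidate query, and on failure ask the fix chain for a new candidate, up to `max_fix_retries` times. A simplified, framework-free sketch of that control flow — `validate` and `fix` below are illustrative stand-ins for `prepareQuery` and the `sparql_fix_chain`, not the real APIs — looks like:

```python
def get_valid_query(candidate, validate, fix, max_fix_retries=5):
    """Return a candidate that passes validate(), retrying via fix()."""
    try:
        validate(candidate)
        return candidate
    except Exception:
        retries = 0
        while retries < max_fix_retries:
            candidate = fix(candidate)  # ask the "fixer" for a new attempt
            try:
                validate(candidate)
                return candidate
            except Exception:
                retries += 1
    raise ValueError("The generated SPARQL query is invalid.")


def validate(q):
    # Toy validator standing in for rdflib's prepareQuery().
    if not q.endswith("}"):
        raise ValueError("unbalanced braces")


fixed = get_valid_query(
    "SELECT * WHERE {?s ?p ?o",  # broken candidate
    validate,
    fix=lambda q: q + "}",       # trivial "fixer"
)
print(fixed)  # SELECT * WHERE {?s ?p ?o}
```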
@@ -0,0 +1,6 @@
FROM ontotext/graphdb:10.5.1
RUN mkdir -p /opt/graphdb/dist/data/repositories/starwars
COPY config.ttl /opt/graphdb/dist/data/repositories/starwars/
COPY starwars-data.trig /
COPY graphdb_create.sh /run.sh
ENTRYPOINT bash /run.sh
@@ -0,0 +1,46 @@
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix graphdb: <http://www.ontotext.com/config/graphdb#>.

[] a rep:Repository ;
    rep:repositoryID "starwars" ;
    rdfs:label "" ;
    rep:repositoryImpl [
        rep:repositoryType "graphdb:SailRepository" ;
        sr:sailImpl [
            sail:sailType "graphdb:Sail" ;

            graphdb:read-only "false" ;

            # Inference and Validation
            graphdb:ruleset "empty" ;
            graphdb:disable-sameAs "true" ;
            graphdb:check-for-inconsistencies "false" ;

            # Indexing
            graphdb:entity-id-size "32" ;
            graphdb:enable-context-index "false" ;
            graphdb:enablePredicateList "true" ;
            graphdb:enable-fts-index "false" ;
            graphdb:fts-indexes ("default" "iri") ;
            graphdb:fts-string-literals-index "default" ;
            graphdb:fts-iris-index "none" ;

            # Queries and Updates
            graphdb:query-timeout "0" ;
            graphdb:throw-QueryEvaluationException-on-timeout "false" ;
            graphdb:query-limit-results "0" ;

            # Settable in the file but otherwise hidden in the UI and in the RDF4J console
            graphdb:base-URL "http://example.org/owlim#" ;
            graphdb:defaultNS "" ;
            graphdb:imports "" ;
            graphdb:repository-type "file-repository" ;
            graphdb:storage-folder "storage" ;
            graphdb:entity-index-size "10000000" ;
            graphdb:in-memory-literal-properties "true" ;
            graphdb:enable-literal-index "true" ;
        ]
    ].
@@ -0,0 +1,9 @@
version: '3.7'

services:

  graphdb:
    image: graphdb
    container_name: graphdb
    ports:
      - "7200:7200"
@@ -0,0 +1,33 @@
#! /bin/bash
REPOSITORY_ID="starwars"
GRAPHDB_URI="http://localhost:7200/"

echo -e "\nUsing GraphDB: ${GRAPHDB_URI}"

function startGraphDB {
  echo -e "\nStarting GraphDB..."
  exec /opt/graphdb/dist/bin/graphdb
}

function waitGraphDBStart {
  echo -e "\nWaiting GraphDB to start..."
  for _ in $(seq 1 5); do
    CHECK_RES=$(curl --silent --write-out '%{http_code}' --output /dev/null ${GRAPHDB_URI}/rest/repositories)
    if [ "${CHECK_RES}" = '200' ]; then
      echo -e "\nUp and running"
      break
    fi
    sleep 30s
    echo "CHECK_RES: ${CHECK_RES}"
  done
}

function loadData {
  echo -e "\nImporting starwars-data.trig"
  curl -X POST -H "Content-Type: application/x-trig" -T /starwars-data.trig ${GRAPHDB_URI}/repositories/${REPOSITORY_ID}/statements
}

startGraphDB &
waitGraphDBStart
loadData
wait
@@ -0,0 +1,5 @@
set -ex

docker compose down -v --remove-orphans
docker build --tag graphdb .
docker compose up -d graphdb
@@ -0,0 +1,160 @@
@base <https://swapi.co/resource/>.
@prefix voc: <https://swapi.co/vocabulary/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

{

    <human/11>
        a voc:Character , voc:Human ;
        rdfs:label "Anakin Skywalker", "Darth Vader" ;
        voc:birthYear "41.9BBY" ;
        voc:eyeColor "blue" ;
        voc:gender "male" ;
        voc:hairColor "blond" ;
        voc:height 188.0 ;
        voc:homeworld <planet/1> ;
        voc:mass 84.0 ;
        voc:skinColor "fair" ;
        voc:cybernetics "Cybernetic right arm" .

    <human/1>
        a voc:Character , voc:Human ;
        rdfs:label "Luke Skywalker" ;
        voc:birthYear "19BBY" ;
        voc:eyeColor "blue" ;
        voc:gender "male" ;
        voc:hairColor "blond" ;
        voc:height 172.0 ;
        voc:homeworld <planet/1> ;
        voc:mass 77.0 ;
        voc:skinColor "fair" .

    <human/35>
        a voc:Character , voc:Human ;
        rdfs:label "Padmé Amidala" ;
        voc:birthYear "46BBY" ;
        voc:eyeColor "brown" ;
        voc:gender "female" ;
        voc:hairColor "brown" ;
        voc:height 165.0 ;
        voc:homeworld <planet/8> ;
        voc:mass 45.0 ;
        voc:skinColor "light" .

    <planet/1>
        a voc:Planet ;
        rdfs:label "Tatooine" ;
        voc:climate "arid" ;
        voc:diameter 10465 ;
        voc:gravity "1 standard" ;
        voc:orbitalPeriod 304 ;
        voc:population 200000 ;
        voc:resident <human/1> , <human/11> ;
        voc:rotationPeriod 23 ;
        voc:surfaceWater 1 ;
        voc:terrain "desert" .

    <planet/8>
        a voc:Planet ;
        rdfs:label "Naboo" ;
        voc:climate "temperate" ;
        voc:diameter 12120 ;
        voc:gravity "1 standard" ;
        voc:orbitalPeriod 312 ;
        voc:population 4500000000 ;
        voc:resident <human/35> ;
        voc:rotationPeriod 26 ;
        voc:surfaceWater 12 ;
        voc:terrain "grassy hills, swamps, forests, mountains" .

    <planet/14>
        a voc:Planet ;
        rdfs:label "Kashyyyk" ;
        voc:climate "tropical" ;
        voc:diameter 12765 ;
        voc:gravity "1 standard" ;
        voc:orbitalPeriod 381 ;
        voc:population 45000000 ;
        voc:resident <wookiee/13> , <wookiee/80> ;
        voc:rotationPeriod 26 ;
        voc:surfaceWater 60 ;
        voc:terrain "jungle, forests, lakes, rivers" .

    <wookiee/13>
        a voc:Character , voc:Wookiee ;
        rdfs:label "Chewbacca" ;
        voc:birthYear "200BBY" ;
        voc:eyeColor "blue" ;
        voc:gender "male" ;
        voc:hairColor "brown" ;
        voc:height 228.0 ;
        voc:homeworld <planet/14> ;
        voc:mass 112.0 .

    <wookiee/80>
        a voc:Character , voc:Wookiee ;
        rdfs:label "Tarfful" ;
        voc:eyeColor "blue" ;
        voc:gender "male" ;
        voc:hairColor "brown" ;
        voc:height 234.0 ;
        voc:homeworld <planet/14> ;
        voc:mass 136.0 ;
        voc:skinColor "brown" .
}

<https://swapi.co/ontology/> {

    voc:Character a owl:Class .
    voc:Species a owl:Class .

    voc:Human a voc:Species;
        rdfs:label "Human";
        voc:averageHeight 180.0;
        voc:averageLifespan "120";
        voc:character <https://swapi.co/resource/human/1>, <https://swapi.co/resource/human/35>,
            <https://swapi.co/resource/human/11>;
        voc:language "Galactic Basic";
        voc:skinColor "black", "caucasian", "asian", "hispanic";
        voc:eyeColor "blue", "brown", "hazel", "green", "grey", "amber";
        voc:hairColor "brown", "red", "black", "blonde" .

    voc:Planet a owl:Class .

    voc:Wookiee a voc:Species;
        rdfs:label "Wookiee";
        voc:averageHeight 210.0;
        voc:averageLifespan "400";
        voc:character <https://swapi.co/resource/wookiee/13>, <https://swapi.co/resource/wookiee/80>;
        voc:language "Shyriiwook";
        voc:planet <https://swapi.co/resource/planet/14>;
        voc:skinColor "gray";
        voc:eyeColor "blue", "yellow", "brown", "red", "green", "golden";
        voc:hairColor "brown", "black" .

    voc:birthYear a owl:DatatypeProperty .
    voc:eyeColor a owl:DatatypeProperty .
    voc:gender a owl:DatatypeProperty .
    voc:hairColor a owl:DatatypeProperty .
    voc:height a owl:DatatypeProperty .
    voc:homeworld a owl:ObjectProperty .
    voc:mass a owl:DatatypeProperty .
    voc:skinColor a owl:DatatypeProperty .
    voc:cybernetics a owl:DatatypeProperty .
    voc:climate a owl:DatatypeProperty .
    voc:diameter a owl:DatatypeProperty .
    voc:gravity a owl:DatatypeProperty .
    voc:orbitalPeriod a owl:DatatypeProperty .
    voc:population a owl:DatatypeProperty .
    voc:resident a owl:ObjectProperty .
    voc:rotationPeriod a owl:DatatypeProperty .
    voc:surfaceWater a owl:DatatypeProperty .
    voc:terrain a owl:DatatypeProperty .
    voc:averageHeight a owl:DatatypeProperty .
    voc:averageLifespan a owl:DatatypeProperty .
    voc:character a owl:ObjectProperty .
    voc:language a owl:DatatypeProperty .
    voc:planet a owl:ObjectProperty .

}
@ -0,0 +1,323 @@
from unittest.mock import MagicMock, Mock

import pytest
from langchain_community.graphs import OntotextGraphDBGraph

from langchain.chains import LLMChain, OntotextGraphDBQAChain

"""
cd libs/langchain/tests/integration_tests/chains/docker-compose-ontotext-graphdb
./start.sh
"""

@pytest.mark.requires("langchain_openai", "rdflib")
@pytest.mark.parametrize("max_fix_retries", [-2, -1, 0, 1, 2])
def test_valid_sparql(max_fix_retries: int) -> None:
    from langchain_openai import ChatOpenAI

    question = "What is Luke Skywalker's home planet?"
    answer = "Tatooine"

    graph = OntotextGraphDBGraph(
        query_endpoint="http://localhost:7200/repositories/starwars",
        query_ontology="CONSTRUCT {?s ?p ?o} "
        "FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
    )
    chain = OntotextGraphDBQAChain.from_llm(
        Mock(ChatOpenAI),
        graph=graph,
        max_fix_retries=max_fix_retries,
    )
    chain.sparql_generation_chain = Mock(LLMChain)
    chain.sparql_fix_chain = Mock(LLMChain)
    chain.qa_chain = Mock(LLMChain)

    chain.sparql_generation_chain.output_key = "text"
    chain.sparql_generation_chain.invoke = MagicMock(
        return_value={
            "text": "SELECT * {?s ?p ?o} LIMIT 1",
            "prompt": question,
            "schema": "",
        }
    )
    chain.sparql_fix_chain.output_key = "text"
    chain.sparql_fix_chain.invoke = MagicMock()
    chain.qa_chain.output_key = "text"
    chain.qa_chain.invoke = MagicMock(
        return_value={
            "text": answer,
            "prompt": question,
            "context": [],
        }
    )

    result = chain.invoke({chain.input_key: question})

    assert chain.sparql_generation_chain.invoke.call_count == 1
    assert chain.sparql_fix_chain.invoke.call_count == 0
    assert chain.qa_chain.invoke.call_count == 1
    assert result == {chain.output_key: answer, chain.input_key: question}

@pytest.mark.requires("langchain_openai", "rdflib")
@pytest.mark.parametrize("max_fix_retries", [-2, -1, 0])
def test_invalid_sparql_non_positive_max_fix_retries(
    max_fix_retries: int,
) -> None:
    from langchain_openai import ChatOpenAI

    question = "What is Luke Skywalker's home planet?"

    graph = OntotextGraphDBGraph(
        query_endpoint="http://localhost:7200/repositories/starwars",
        query_ontology="CONSTRUCT {?s ?p ?o} "
        "FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
    )
    chain = OntotextGraphDBQAChain.from_llm(
        Mock(ChatOpenAI),
        graph=graph,
        max_fix_retries=max_fix_retries,
    )
    chain.sparql_generation_chain = Mock(LLMChain)
    chain.sparql_fix_chain = Mock(LLMChain)
    chain.qa_chain = Mock(LLMChain)

    chain.sparql_generation_chain.output_key = "text"
    chain.sparql_generation_chain.invoke = MagicMock(
        return_value={
            "text": "```sparql SELECT * {?s ?p ?o} LIMIT 1```",
            "prompt": question,
            "schema": "",
        }
    )
    chain.sparql_fix_chain.output_key = "text"
    chain.sparql_fix_chain.invoke = MagicMock()
    chain.qa_chain.output_key = "text"
    chain.qa_chain.invoke = MagicMock()

    with pytest.raises(ValueError) as e:
        chain.invoke({chain.input_key: question})

    assert str(e.value) == "The generated SPARQL query is invalid."

    assert chain.sparql_generation_chain.invoke.call_count == 1
    assert chain.sparql_fix_chain.invoke.call_count == 0
    assert chain.qa_chain.invoke.call_count == 0

@pytest.mark.requires("langchain_openai", "rdflib")
@pytest.mark.parametrize("max_fix_retries", [1, 2, 3])
def test_valid_sparql_after_first_retry(max_fix_retries: int) -> None:
    from langchain_openai import ChatOpenAI

    question = "What is Luke Skywalker's home planet?"
    answer = "Tatooine"
    generated_invalid_sparql = "```sparql SELECT * {?s ?p ?o} LIMIT 1```"

    graph = OntotextGraphDBGraph(
        query_endpoint="http://localhost:7200/repositories/starwars",
        query_ontology="CONSTRUCT {?s ?p ?o} "
        "FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
    )
    chain = OntotextGraphDBQAChain.from_llm(
        Mock(ChatOpenAI),
        graph=graph,
        max_fix_retries=max_fix_retries,
    )
    chain.sparql_generation_chain = Mock(LLMChain)
    chain.sparql_fix_chain = Mock(LLMChain)
    chain.qa_chain = Mock(LLMChain)

    chain.sparql_generation_chain.output_key = "text"
    chain.sparql_generation_chain.invoke = MagicMock(
        return_value={
            "text": generated_invalid_sparql,
            "prompt": question,
            "schema": "",
        }
    )
    chain.sparql_fix_chain.output_key = "text"
    chain.sparql_fix_chain.invoke = MagicMock(
        return_value={
            "text": "SELECT * {?s ?p ?o} LIMIT 1",
            "error_message": "pyparsing.exceptions.ParseException: "
            "Expected {SelectQuery | ConstructQuery | DescribeQuery | AskQuery}, "
            "found '`' (at char 0), (line:1, col:1)",
            "generated_sparql": generated_invalid_sparql,
            "schema": "",
        }
    )
    chain.qa_chain.output_key = "text"
    chain.qa_chain.invoke = MagicMock(
        return_value={
            "text": answer,
            "prompt": question,
            "context": [],
        }
    )

    result = chain.invoke({chain.input_key: question})

    assert chain.sparql_generation_chain.invoke.call_count == 1
    assert chain.sparql_fix_chain.invoke.call_count == 1
    assert chain.qa_chain.invoke.call_count == 1
    assert result == {chain.output_key: answer, chain.input_key: question}

@pytest.mark.requires("langchain_openai", "rdflib")
@pytest.mark.parametrize("max_fix_retries", [1, 2, 3])
def test_invalid_sparql_after_all_retries(max_fix_retries: int) -> None:
    from langchain_openai import ChatOpenAI

    question = "What is Luke Skywalker's home planet?"
    generated_invalid_sparql = "```sparql SELECT * {?s ?p ?o} LIMIT 1```"

    graph = OntotextGraphDBGraph(
        query_endpoint="http://localhost:7200/repositories/starwars",
        query_ontology="CONSTRUCT {?s ?p ?o} "
        "FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
    )
    chain = OntotextGraphDBQAChain.from_llm(
        Mock(ChatOpenAI),
        graph=graph,
        max_fix_retries=max_fix_retries,
    )
    chain.sparql_generation_chain = Mock(LLMChain)
    chain.sparql_fix_chain = Mock(LLMChain)
    chain.qa_chain = Mock(LLMChain)

    chain.sparql_generation_chain.output_key = "text"
    chain.sparql_generation_chain.invoke = MagicMock(
        return_value={
            "text": generated_invalid_sparql,
            "prompt": question,
            "schema": "",
        }
    )
    chain.sparql_fix_chain.output_key = "text"
    chain.sparql_fix_chain.invoke = MagicMock(
        return_value={
            "text": generated_invalid_sparql,
            "error_message": "pyparsing.exceptions.ParseException: "
            "Expected {SelectQuery | ConstructQuery | DescribeQuery | AskQuery}, "
            "found '`' (at char 0), (line:1, col:1)",
            "generated_sparql": generated_invalid_sparql,
            "schema": "",
        }
    )
    chain.qa_chain.output_key = "text"
    chain.qa_chain.invoke = MagicMock()

    with pytest.raises(ValueError) as e:
        chain.invoke({chain.input_key: question})

    assert str(e.value) == "The generated SPARQL query is invalid."

    assert chain.sparql_generation_chain.invoke.call_count == 1
    assert chain.sparql_fix_chain.invoke.call_count == max_fix_retries
    assert chain.qa_chain.invoke.call_count == 0

@pytest.mark.requires("langchain_openai", "rdflib")
@pytest.mark.parametrize(
    "max_fix_retries,number_of_invalid_responses",
    [(1, 0), (2, 0), (2, 1), (10, 6)],
)
def test_valid_sparql_after_some_retries(
    max_fix_retries: int, number_of_invalid_responses: int
) -> None:
    from langchain_openai import ChatOpenAI

    question = "What is Luke Skywalker's home planet?"
    answer = "Tatooine"
    generated_invalid_sparql = "```sparql SELECT * {?s ?p ?o} LIMIT 1```"
    generated_valid_sparql_query = "SELECT * {?s ?p ?o} LIMIT 1"

    graph = OntotextGraphDBGraph(
        query_endpoint="http://localhost:7200/repositories/starwars",
        query_ontology="CONSTRUCT {?s ?p ?o} "
        "FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
    )
    chain = OntotextGraphDBQAChain.from_llm(
        Mock(ChatOpenAI),
        graph=graph,
        max_fix_retries=max_fix_retries,
    )
    chain.sparql_generation_chain = Mock(LLMChain)
    chain.sparql_fix_chain = Mock(LLMChain)
    chain.qa_chain = Mock(LLMChain)

    chain.sparql_generation_chain.output_key = "text"
    chain.sparql_generation_chain.invoke = MagicMock(
        return_value={
            "text": generated_invalid_sparql,
            "prompt": question,
            "schema": "",
        }
    )
    chain.sparql_fix_chain.output_key = "text"
    chain.sparql_fix_chain.invoke = Mock()
    chain.sparql_fix_chain.invoke.side_effect = [
        {
            "text": generated_invalid_sparql,
            "error_message": "pyparsing.exceptions.ParseException: "
            "Expected {SelectQuery | ConstructQuery | DescribeQuery | AskQuery}, "
            "found '`' (at char 0), (line:1, col:1)",
            "generated_sparql": generated_invalid_sparql,
            "schema": "",
        }
    ] * number_of_invalid_responses + [
        {
            "text": generated_valid_sparql_query,
            "error_message": "pyparsing.exceptions.ParseException: "
            "Expected {SelectQuery | ConstructQuery | DescribeQuery | AskQuery}, "
            "found '`' (at char 0), (line:1, col:1)",
            "generated_sparql": generated_invalid_sparql,
            "schema": "",
        }
    ]
    chain.qa_chain.output_key = "text"
    chain.qa_chain.invoke = MagicMock(
        return_value={
            "text": answer,
            "prompt": question,
            "context": [],
        }
    )

    result = chain.invoke({chain.input_key: question})

    assert chain.sparql_generation_chain.invoke.call_count == 1
    assert chain.sparql_fix_chain.invoke.call_count == number_of_invalid_responses + 1
    assert chain.qa_chain.invoke.call_count == 1
    assert result == {chain.output_key: answer, chain.input_key: question}

@pytest.mark.requires("langchain_openai", "rdflib")
@pytest.mark.parametrize(
    "model_name,question",
    [
        ("gpt-3.5-turbo-1106", "What is the average height of the Wookiees?"),
        ("gpt-3.5-turbo-1106", "What is the climate on Tatooine?"),
        ("gpt-3.5-turbo-1106", "What is Luke Skywalker's home planet?"),
        ("gpt-4-1106-preview", "What is the average height of the Wookiees?"),
        ("gpt-4-1106-preview", "What is the climate on Tatooine?"),
        ("gpt-4-1106-preview", "What is Luke Skywalker's home planet?"),
    ],
)
def test_chain(model_name: str, question: str) -> None:
    from langchain_openai import ChatOpenAI

    graph = OntotextGraphDBGraph(
        query_endpoint="http://localhost:7200/repositories/starwars",
        query_ontology="CONSTRUCT {?s ?p ?o} "
        "FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
    )
    chain = OntotextGraphDBQAChain.from_llm(
        ChatOpenAI(temperature=0, model_name=model_name), graph=graph, verbose=True
    )
    try:
        chain.invoke({chain.input_key: question})
    except ValueError:
        pass
@ -0,0 +1,2 @@
def test_import() -> None:
    from langchain.chains import OntotextGraphDBQAChain  # noqa: F401