langchain[minor], community[minor]: Implement Ontotext GraphDB QA Chain (#16019)

- **Description:** Implement Ontotext GraphDB QA Chain
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** @OntotextGraphDB
Neli Hateva 4 months ago committed by GitHub
parent a08f9a7ff9
commit c95facc293


@@ -0,0 +1,21 @@
# Ontotext GraphDB
>[Ontotext GraphDB](https://graphdb.ontotext.com/) is a graph database and knowledge discovery tool compliant with RDF and SPARQL.
## Dependencies
Install the [rdflib](https://github.com/RDFLib/rdflib) package with
```bash
pip install rdflib==7.0.0
```
## Graph QA Chain
Connect your GraphDB database to a chat model to get insights into your data.
See the notebook example [here](/docs/use_cases/graph/graph_ontotext_graphdb_qa).
```python
from langchain_community.graphs import OntotextGraphDBGraph
from langchain.chains import OntotextGraphDBQAChain
```

@@ -0,0 +1,543 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "922a7a98-7d73-4a1a-8860-76a33451d1be",
"metadata": {
"id": "922a7a98-7d73-4a1a-8860-76a33451d1be"
},
"source": [
"# Ontotext GraphDB QA Chain\n",
"\n",
"This notebook shows how to use LLMs to provide natural language querying (NLQ to SPARQL, also called text2sparql) for [Ontotext GraphDB](https://graphdb.ontotext.com/). Ontotext GraphDB is a graph database and knowledge discovery tool compliant with [RDF](https://www.w3.org/RDF/) and [SPARQL](https://www.w3.org/TR/sparql11-query/).\n",
"\n",
"## GraphDB LLM Functionalities\n",
"\n",
"GraphDB supports some LLM integration functionalities as described in [https://github.com/w3c/sparql-dev/issues/193](https://github.com/w3c/sparql-dev/issues/193):\n",
"\n",
"[gpt-queries](https://graphdb.ontotext.com/documentation/10.5/gpt-queries.html)\n",
"\n",
"* magic predicates to ask an LLM for text, list or table using data from your knowledge graph (KG)\n",
"* query explanation\n",
"* result explanation, summarization, rephrasing, translation\n",
"\n",
"[retrieval-graphdb-connector](https://graphdb.ontotext.com/documentation/10.5/retrieval-graphdb-connector.html)\n",
"\n",
"* Indexing of KG entities in a vector database\n",
"* Supports any text embedding algorithm and vector database\n",
"* Uses the same powerful connector (indexing) language that GraphDB uses for Elastic, Solr, Lucene\n",
"* Automatic synchronization of changes in RDF data to the KG entity index\n",
"* Supports nested objects (no UI support in GraphDB version 10.5)\n",
"* Serializes KG entities to text like this (e.g. for a Wines dataset):\n",
"\n",
"```\n",
"Franvino:\n",
"- is a RedWine.\n",
"- made from grape Merlo.\n",
"- made from grape Cabernet Franc.\n",
"- has sugar dry.\n",
"- has year 2012.\n",
"```\n",
"\n",
"[talk-to-graph](https://graphdb.ontotext.com/documentation/10.5/talk-to-graph.html)\n",
"\n",
"* A simple chatbot using a defined KG entity index\n",
"\n",
"## Querying the GraphDB Database\n",
"\n",
"For this tutorial, we won't use the GraphDB LLM integration, but SPARQL generation from NLQ. We'll use the Star Wars API (SWAPI) ontology and dataset that you can examine [here](https://drive.google.com/file/d/1wQ2K4uZp4eq3wlJ6_F_TxkOolaiczdYp/view?usp=drive_link).\n",
"\n",
"You will need a running GraphDB instance. This tutorial shows how to run the database locally using the [GraphDB Docker image](https://hub.docker.com/r/ontotext/graphdb). It provides a Docker Compose set-up, which populates GraphDB with the Star Wars dataset. All necessary files, including this notebook, can be downloaded from GDrive.\n",
"\n",
"### Set-up\n",
"\n",
"* Install [Docker](https://docs.docker.com/get-docker/). This tutorial was created using Docker version `24.0.7`, which bundles [Docker Compose](https://docs.docker.com/compose/). For earlier Docker versions, you may need to install Docker Compose separately.\n",
"* Download all files from [GDrive](https://drive.google.com/drive/folders/18dN7WQxfGu26Z9C9HUU5jBwDuPnVTLbl) in a local folder on your machine.\n",
"* Start GraphDB with the following script executed from this folder\n",
" ```\n",
" docker build --tag graphdb .\n",
" docker compose up -d graphdb\n",
" ```\n",
" You need to wait a couple of seconds for the database to start on `http://localhost:7200/`. The Star Wars dataset `starwars-data.trig` is automatically loaded into the `langchain` repository. You can run queries against the local SPARQL endpoint `http://localhost:7200/repositories/langchain`. You can also open the GraphDB Workbench at `http://localhost:7200/sparql` in your browser and run queries interactively.\n",
"* Working environment\n",
"\n",
"If you use `conda`, create and activate a new conda env (e.g. `conda create -n graph_ontotext_graphdb_qa python=3.9.18`).\n",
"Install the following libraries:\n",
"\n",
"```\n",
"pip install jupyter==1.0.0\n",
"pip install openai==1.6.1\n",
"pip install rdflib==7.0.0\n",
"pip install langchain-openai==0.0.2\n",
"pip install langchain\n",
"```\n",
"\n",
"Run Jupyter with\n",
"```\n",
"jupyter notebook\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "e51b397c-2fdc-4b99-9fed-1ab2b6ef7547",
"metadata": {
"id": "e51b397c-2fdc-4b99-9fed-1ab2b6ef7547"
},
"source": [
"### Specifying the Ontology\n",
"\n",
"In order for the LLM to be able to generate SPARQL, it needs to know the knowledge graph schema (the ontology). It can be provided using one of two parameters on the `OntotextGraphDBGraph` class:\n",
"\n",
"* `query_ontology`: a `CONSTRUCT` query that is executed on the SPARQL endpoint and returns the KG schema statements. We recommend that you store the ontology in its own named graph, which makes it easier to get only the relevant statements (as in the example below). `DESCRIBE` queries are not supported, because `DESCRIBE` returns the Symmetric Concise Bounded Description (SCBD), i.e. also the incoming class links. For large graphs with millions of instances, this is not efficient. Check https://github.com/eclipse-rdf4j/rdf4j/issues/4857\n",
"* `local_file`: a local RDF ontology file. Supported RDF formats are `Turtle`, `RDF/XML`, `JSON-LD`, `N-Triples`, `Notation-3`, `Trig`, `Trix`, `N-Quads`.\n",
"\n",
"In either case, the ontology dump should:\n",
"\n",
"* Include enough information about classes, properties, property attachment to classes (using rdfs:domain, schema:domainIncludes or OWL restrictions), and taxonomies (important individuals).\n",
"* Not include overly verbose and irrelevant definitions and examples that do not help SPARQL construction."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "dc8792e0-acfb-4310-b5fa-8f649e448870",
"metadata": {
"id": "dc8792e0-acfb-4310-b5fa-8f649e448870"
},
"outputs": [],
"source": [
"from langchain_community.graphs import OntotextGraphDBGraph\n",
"\n",
"# feeding the schema using a user construct query\n",
"\n",
"graph = OntotextGraphDBGraph(\n",
" query_endpoint=\"http://localhost:7200/repositories/langchain\",\n",
" query_ontology=\"CONSTRUCT {?s ?p ?o} FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a08b8d8c-af01-4401-8069-5f2cd022a6df",
"metadata": {
"id": "a08b8d8c-af01-4401-8069-5f2cd022a6df"
},
"outputs": [],
"source": [
"# feeding the schema using a local RDF file\n",
"\n",
"graph = OntotextGraphDBGraph(\n",
" query_endpoint=\"http://localhost:7200/repositories/langchain\",\n",
" local_file=\"/path/to/langchain_graphdb_tutorial/starwars-ontology.nt\", # change the path here\n",
")"
]
},
{
"cell_type": "markdown",
"id": "583b26ce-fb0d-4e9c-b5cd-9ec0e3be8922",
"metadata": {
"id": "583b26ce-fb0d-4e9c-b5cd-9ec0e3be8922"
},
"source": [
"Either way, the ontology (schema) is fed to the LLM as `Turtle` since `Turtle` with appropriate prefixes is most compact and easiest for the LLM to remember.\n",
"\n",
"The Star Wars ontology is a bit unusual in that it includes a lot of specific triples about classes, e.g. that the species `:Aleena` live on `<planet/38>`, they are a subclass of `:Reptile`, have certain typical characteristics (average height, average lifespan, skinColor), and specific individuals (characters) are representatives of that class:\n",
"\n",
"\n",
"```\n",
"@prefix : <https://swapi.co/vocabulary/> .\n",
"@prefix owl: <http://www.w3.org/2002/07/owl#> .\n",
"@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n",
"@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n",
"\n",
":Aleena a owl:Class, :Species ;\n",
" rdfs:label \"Aleena\" ;\n",
" rdfs:isDefinedBy <https://swapi.co/ontology/> ;\n",
" rdfs:subClassOf :Reptile, :Sentient ;\n",
" :averageHeight 80.0 ;\n",
" :averageLifespan \"79\" ;\n",
" :character <https://swapi.co/resource/aleena/47> ;\n",
" :film <https://swapi.co/resource/film/4> ;\n",
" :language \"Aleena\" ;\n",
" :planet <https://swapi.co/resource/planet/38> ;\n",
" :skinColor \"blue\", \"gray\" .\n",
"\n",
" ...\n",
"\n",
" ```\n"
]
},
{
"cell_type": "markdown",
"id": "6277d911-b0f6-4aeb-9aa5-96416b668468",
"metadata": {
"id": "6277d911-b0f6-4aeb-9aa5-96416b668468"
},
"source": [
"To keep this tutorial simple, we use an unsecured GraphDB instance. If GraphDB is secured, you should set the environment variables 'GRAPHDB_USERNAME' and 'GRAPHDB_PASSWORD' before initializing `OntotextGraphDBGraph`.\n",
"\n",
"```python\n",
"os.environ[\"GRAPHDB_USERNAME\"] = \"graphdb-user\"\n",
"os.environ[\"GRAPHDB_PASSWORD\"] = \"graphdb-password\"\n",
"\n",
"graph = OntotextGraphDBGraph(\n",
" query_endpoint=...,\n",
" query_ontology=...\n",
")\n",
"```\n"
]
},
{
"cell_type": "markdown",
"id": "446d8a00-c98f-43b8-9e84-77b244f7bb24",
"metadata": {
"id": "446d8a00-c98f-43b8-9e84-77b244f7bb24"
},
"source": [
"### Question Answering against the StarWars Dataset\n",
"\n",
"We can now use the `OntotextGraphDBQAChain` to ask some questions."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "fab63d88-511d-4049-9bf0-ca8748f1fbff",
"metadata": {
"id": "fab63d88-511d-4049-9bf0-ca8748f1fbff"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain.chains import OntotextGraphDBQAChain\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"# We'll be using an OpenAI model which requires an OpenAI API Key.\n",
"# However, other models are available as well:\n",
"# https://python.langchain.com/docs/integrations/chat/\n",
"\n",
"# Set the environment variable `OPENAI_API_KEY` to your OpenAI API key\n",
"os.environ[\"OPENAI_API_KEY\"] = \"sk-***\"\n",
"\n",
"# Any available OpenAI model can be used here.\n",
"# We use 'gpt-4-1106-preview' because of the bigger context window.\n",
"# The 'gpt-4-1106-preview' model name may be deprecated in the future in favour of 'gpt-4-turbo' or similar,\n",
"# so consult the OpenAI API docs https://platform.openai.com/docs/models for the current naming.\n",
"\n",
"chain = OntotextGraphDBQAChain.from_llm(\n",
" ChatOpenAI(temperature=0, model_name=\"gpt-4-1106-preview\"),\n",
" graph=graph,\n",
" verbose=True,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "64de8463-35b1-4c65-91e4-387daf4dd7d4",
"metadata": {},
"source": [
"Let's ask a simple one."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f1dc4bea-b0f1-48f7-99a6-351a31acac7b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new OntotextGraphDBQAChain chain...\u001b[0m\n",
"Generated SPARQL:\n",
"\u001b[32;1m\u001b[1;3mPREFIX : <https://swapi.co/vocabulary/>\n",
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
"\n",
"SELECT ?climate\n",
"WHERE {\n",
" ?planet rdfs:label \"Tatooine\" ;\n",
" :climate ?climate .\n",
"}\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'The climate on Tatooine is arid.'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke({chain.input_key: \"What is the climate on Tatooine?\"})[chain.output_key]"
]
},
{
"cell_type": "markdown",
"id": "6d3a37f4-5c56-4b3e-b6ae-3eb030ffcc8f",
"metadata": {},
"source": [
"And a bit more complicated one."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4dde8b18-4329-4a86-abfb-26d3e77034b7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new OntotextGraphDBQAChain chain...\u001b[0m\n",
"Generated SPARQL:\n",
"\u001b[32;1m\u001b[1;3mPREFIX : <https://swapi.co/vocabulary/>\n",
"PREFIX owl: <http://www.w3.org/2002/07/owl#>\n",
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
"\n",
"SELECT ?climate\n",
"WHERE {\n",
" ?character rdfs:label \"Luke Skywalker\" .\n",
" ?character :homeworld ?planet .\n",
" ?planet :climate ?climate .\n",
"}\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"The climate on Luke Skywalker's home planet is arid.\""
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke({chain.input_key: \"What is the climate on Luke Skywalker's home planet?\"})[\n",
" chain.output_key\n",
"]"
]
},
{
"cell_type": "markdown",
"id": "51d3ce3e-9528-4a65-8f3e-2281de08cbf1",
"metadata": {},
"source": [
"We can also ask more complicated questions like"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ab6f55f1-a3e0-4615-abd2-3cb26619c8d9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new OntotextGraphDBQAChain chain...\u001b[0m\n",
"Generated SPARQL:\n",
"\u001b[32;1m\u001b[1;3mPREFIX : <https://swapi.co/vocabulary/>\n",
"PREFIX owl: <http://www.w3.org/2002/07/owl#>\n",
"PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n",
"PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n",
"\n",
"SELECT (AVG(?boxOffice) AS ?averageBoxOffice)\n",
"WHERE {\n",
" ?film a :Film .\n",
" ?film :boxOffice ?boxOfficeValue .\n",
" BIND(xsd:decimal(?boxOfficeValue) AS ?boxOffice)\n",
"}\n",
"\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'The average box office revenue for all the Star Wars movies is approximately 754.1 million dollars.'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke(\n",
" {\n",
" chain.input_key: \"What is the average box office revenue for all the Star Wars movies?\"\n",
" }\n",
")[chain.output_key]"
]
},
{
"cell_type": "markdown",
"id": "11511345-8436-4634-92c6-36f2c0dd44db",
"metadata": {
"id": "11511345-8436-4634-92c6-36f2c0dd44db"
},
"source": [
"### Chain Modifiers\n",
"\n",
"The Ontotext GraphDB QA chain lets you refine the prompts to further improve your QA chain and enhance the overall user experience of your app.\n",
"\n",
"\n",
"#### \"SPARQL Generation\" Prompt\n",
"\n",
"The prompt is used for the SPARQL query generation based on the user question and the KG schema.\n",
"\n",
"- `sparql_generation_prompt`\n",
"\n",
" Default value:\n",
" ````python\n",
" GRAPHDB_SPARQL_GENERATION_TEMPLATE = \"\"\"\n",
" Write a SPARQL SELECT query for querying a graph database.\n",
" The ontology schema delimited by triple backticks in Turtle format is:\n",
" ```\n",
" {schema}\n",
" ```\n",
" Use only the classes and properties provided in the schema to construct the SPARQL query.\n",
" Do not use any classes or properties that are not explicitly provided in the schema.\n",
" Include all necessary prefixes.\n",
" Do not include any explanations or apologies in your responses.\n",
" Do not wrap the query in backticks.\n",
" Do not include any text except the SPARQL query generated.\n",
" The question delimited by triple backticks is:\n",
" ```\n",
" {prompt}\n",
" ```\n",
" \"\"\"\n",
" GRAPHDB_SPARQL_GENERATION_PROMPT = PromptTemplate(\n",
" input_variables=[\"schema\", \"prompt\"],\n",
" template=GRAPHDB_SPARQL_GENERATION_TEMPLATE,\n",
" )\n",
" ````\n",
"\n",
"#### \"SPARQL Fix\" Prompt\n",
"\n",
"Sometimes, the LLM may generate a SPARQL query with syntactic errors or missing prefixes. The chain tries to amend this by prompting the LLM to correct the query up to `max_fix_retries` times.\n",
"\n",
"- `sparql_fix_prompt`\n",
"\n",
" Default value:\n",
" ````python\n",
" GRAPHDB_SPARQL_FIX_TEMPLATE = \"\"\"\n",
" The following SPARQL query delimited by triple backticks\n",
" ```\n",
" {generated_sparql}\n",
" ```\n",
" is not valid.\n",
" The error delimited by triple backticks is\n",
" ```\n",
" {error_message}\n",
" ```\n",
" Give me a correct version of the SPARQL query.\n",
" Do not change the logic of the query.\n",
" Do not include any explanations or apologies in your responses.\n",
" Do not wrap the query in backticks.\n",
" Do not include any text except the SPARQL query generated.\n",
" The ontology schema delimited by triple backticks in Turtle format is:\n",
" ```\n",
" {schema}\n",
" ```\n",
" \"\"\"\n",
" \n",
" GRAPHDB_SPARQL_FIX_PROMPT = PromptTemplate(\n",
" input_variables=[\"error_message\", \"generated_sparql\", \"schema\"],\n",
" template=GRAPHDB_SPARQL_FIX_TEMPLATE,\n",
" )\n",
" ````\n",
"\n",
"- `max_fix_retries`\n",
" \n",
" Default value: `5`\n",
"\n",
"#### \"Answering\" Prompt\n",
"\n",
"The prompt is used for answering the question based on the results returned from the database and the initial user question. By default, the LLM is instructed to only use the information from the returned result(s). If the result set is empty, the LLM is instructed to say that it can't answer the question.\n",
"\n",
"- `qa_prompt`\n",
" \n",
" Default value:\n",
" ````python\n",
" GRAPHDB_QA_TEMPLATE = \"\"\"Task: Generate a natural language response from the results of a SPARQL query.\n",
" You are an assistant that creates well-written and human understandable answers.\n",
" The information part contains the information provided, which you can use to construct an answer.\n",
" The information provided is authoritative, you must never doubt it or try to use your internal knowledge to correct it.\n",
" Make your response sound like the information is coming from an AI assistant, but don't add any information.\n",
" Don't use internal knowledge to answer the question, just say you don't know if no information is available.\n",
" Information:\n",
" {context}\n",
" \n",
" Question: {prompt}\n",
" Helpful Answer:\"\"\"\n",
" GRAPHDB_QA_PROMPT = PromptTemplate(\n",
" input_variables=[\"context\", \"prompt\"], template=GRAPHDB_QA_TEMPLATE\n",
" )\n",
" ````"
]
},
{
"cell_type": "markdown",
"id": "2ef8c073-003d-44ab-8a7b-cf45c50f6370",
"metadata": {
"id": "2ef8c073-003d-44ab-8a7b-cf45c50f6370"
},
"source": [
"Once you're done experimenting with QA against GraphDB, you can shut down the Docker environment by running\n",
"```\n",
"docker compose down -v --remove-orphans\n",
"```\n",
"from the directory with the Docker Compose file."
]
}
],
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@@ -9,6 +9,7 @@ from langchain_community.graphs.nebula_graph import NebulaGraph
from langchain_community.graphs.neo4j_graph import Neo4jGraph
from langchain_community.graphs.neptune_graph import NeptuneGraph
from langchain_community.graphs.networkx_graph import NetworkxEntityGraph
from langchain_community.graphs.ontotext_graphdb_graph import OntotextGraphDBGraph
from langchain_community.graphs.rdf_graph import RdfGraph
from langchain_community.graphs.tigergraph_graph import TigerGraph
@@ -24,4 +25,5 @@ __all__ = [
"ArangoGraph",
"FalkorDBGraph",
"TigerGraph",
"OntotextGraphDBGraph",
]

@@ -0,0 +1,213 @@
from __future__ import annotations
import os
from typing import (
TYPE_CHECKING,
List,
Optional,
Union,
)
if TYPE_CHECKING:
import rdflib
class OntotextGraphDBGraph:
"""Ontotext GraphDB https://graphdb.ontotext.com/ wrapper for graph operations.
*Security note*: Make sure that the database connection uses credentials
that are narrowly-scoped to only include necessary permissions.
Failure to do so may result in data corruption or loss, since the calling
code may attempt commands that would result in deletion, mutation
of data if appropriately prompted or reading sensitive data if such
data is present in the database.
The best way to guard against such negative outcomes is to (as appropriate)
limit the permissions granted to the credentials used with this tool.
See https://python.langchain.com/docs/security for more information.
"""
def __init__(
self,
query_endpoint: str,
query_ontology: Optional[str] = None,
local_file: Optional[str] = None,
local_file_format: Optional[str] = None,
) -> None:
"""
Set up the GraphDB wrapper
:param query_endpoint: SPARQL endpoint for queries, read access
If GraphDB is secured,
set the environment variables 'GRAPHDB_USERNAME' and 'GRAPHDB_PASSWORD'.
:param query_ontology: a `CONSTRUCT` query that is executed
on the SPARQL endpoint and returns the KG schema statements
Example:
'CONSTRUCT {?s ?p ?o} FROM <https://example.com/ontology/> WHERE {?s ?p ?o}'
Currently, DESCRIBE queries like
'PREFIX onto: <https://example.com/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
DESCRIBE ?term WHERE {
?term rdfs:isDefinedBy onto:
}'
are not supported, because DESCRIBE returns
the Symmetric Concise Bounded Description (SCBD),
i.e. also the incoming class links.
For large graphs with millions of instances, this is not efficient.
Check https://github.com/eclipse-rdf4j/rdf4j/issues/4857
:param local_file: a local RDF ontology file.
Supported RDF formats:
Turtle, RDF/XML, JSON-LD, N-Triples, Notation-3, Trig, Trix, N-Quads.
If the RDF format can't be determined from the file extension,
pass the RDF format explicitly via the `local_file_format` param.
:param local_file_format: Used if the RDF format can't be determined
from the local file extension.
One of "json-ld", "xml", "n3", "turtle", "nt", "trig", "nquads", "trix"
Either `query_ontology` or `local_file` should be passed.
"""
if query_ontology and local_file:
raise ValueError("Both file and query provided. Only one is allowed.")
if not query_ontology and not local_file:
raise ValueError("Neither file nor query provided. One is required.")
try:
import rdflib
from rdflib.plugins.stores import sparqlstore
except ImportError:
raise ValueError(
"Could not import rdflib python package. "
"Please install it with `pip install rdflib`."
)
auth = self._get_auth()
store = sparqlstore.SPARQLStore(auth=auth)
store.open(query_endpoint)
self.graph = rdflib.Graph(store, identifier=None, bind_namespaces="none")
self._check_connectivity()
if local_file:
ontology_schema_graph = self._load_ontology_schema_from_file(
local_file, local_file_format
)
else:
self._validate_user_query(query_ontology)
ontology_schema_graph = self._load_ontology_schema_with_query(
query_ontology
)
self.schema = ontology_schema_graph.serialize(format="turtle")
@staticmethod
def _get_auth() -> Union[tuple, None]:
"""
Returns the basic authentication configuration
"""
username = os.environ.get("GRAPHDB_USERNAME", None)
password = os.environ.get("GRAPHDB_PASSWORD", None)
if username:
if not password:
raise ValueError(
"Environment variable 'GRAPHDB_USERNAME' is set, "
"but 'GRAPHDB_PASSWORD' is not set."
)
else:
return username, password
return None
def _check_connectivity(self) -> None:
"""
Executes a simple `ASK` query to check connectivity
"""
try:
self.graph.query("ASK { ?s ?p ?o }")
except ValueError:
raise ValueError(
"Could not query the provided endpoint. "
"Please check that the provided "
"query_endpoint points to the right repository. "
"If GraphDB is secured, "
"make sure that the environment variables "
"'GRAPHDB_USERNAME' and 'GRAPHDB_PASSWORD' are set."
)
@staticmethod
def _load_ontology_schema_from_file(local_file: str, local_file_format: Optional[str] = None):
"""
Parse the ontology schema statements from the provided file
"""
import rdflib
if not os.path.exists(local_file):
raise FileNotFoundError(f"File {local_file} does not exist.")
if not os.access(local_file, os.R_OK):
raise PermissionError(f"Read permission for {local_file} is restricted")
graph = rdflib.ConjunctiveGraph()
try:
graph.parse(local_file, format=local_file_format)
except Exception as e:
raise ValueError(f"Invalid file format for {local_file}: {e}")
return graph
@staticmethod
def _validate_user_query(query_ontology: str) -> None:
"""
Validate the query is a valid SPARQL CONSTRUCT query
"""
from pyparsing import ParseException
from rdflib.plugins.sparql import prepareQuery
if not isinstance(query_ontology, str):
raise TypeError("Ontology query must be provided as string.")
try:
parsed_query = prepareQuery(query_ontology)
except ParseException as e:
raise ValueError("Ontology query is not a valid SPARQL query.", e)
if parsed_query.algebra.name != "ConstructQuery":
raise ValueError(
"Invalid query type. Only CONSTRUCT queries are supported."
)
def _load_ontology_schema_with_query(self, query: str):
"""
Execute the query for collecting the ontology schema statements
"""
from rdflib.exceptions import ParserError
try:
results = self.graph.query(query)
except ParserError as e:
raise ValueError(f"Generated SPARQL statement is invalid\n{e}")
return results.graph
@property
def get_schema(self) -> str:
"""
Returns the schema of the graph database in turtle format
"""
return self.schema
def query(
self,
query: str,
) -> List[rdflib.query.ResultRow]:
"""
Query the graph.
"""
from rdflib.exceptions import ParserError
from rdflib.query import ResultRow
try:
res = self.graph.query(query)
except ParserError as e:
raise ValueError(f"Generated SPARQL statement is invalid\n{e}")
return [r for r in res if isinstance(r, ResultRow)]
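The credential resolution implemented by `_get_auth` above can be sketched in isolation. This standalone helper (the function name is illustrative, not part of the module) mirrors its behaviour: a username without a matching password is a configuration error, and no username means anonymous access.

```python
import os
from typing import Optional, Tuple


def resolve_graphdb_auth() -> Optional[Tuple[str, str]]:
    # Mirrors OntotextGraphDBGraph._get_auth: GRAPHDB_USERNAME without
    # GRAPHDB_PASSWORD is treated as a configuration error.
    username = os.environ.get("GRAPHDB_USERNAME", None)
    password = os.environ.get("GRAPHDB_PASSWORD", None)
    if username:
        if not password:
            raise ValueError(
                "Environment variable 'GRAPHDB_USERNAME' is set, "
                "but 'GRAPHDB_PASSWORD' is not set."
            )
        return username, password
    return None


# No credentials set -> anonymous access
os.environ.pop("GRAPHDB_USERNAME", None)
os.environ.pop("GRAPHDB_PASSWORD", None)
print(resolve_graphdb_auth())  # None

os.environ["GRAPHDB_USERNAME"] = "graphdb-user"
os.environ["GRAPHDB_PASSWORD"] = "graphdb-password"
print(resolve_graphdb_auth())  # ('graphdb-user', 'graphdb-password')
```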

@@ -3433,7 +3433,6 @@ files = [
{file = "jq-1.6.0-cp37-cp37m-musllinux_1_1_i686.whl", hash = "sha256:227b178b22a7f91ae88525810441791b1ca1fc71c86f03190911793be15cec3d"},
{file = "jq-1.6.0-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:780eb6383fbae12afa819ef676fc93e1548ae4b076c004a393af26a04b460742"},
{file = "jq-1.6.0-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:08ded6467f4ef89fec35b2bf310f210f8cd13fbd9d80e521500889edf8d22441"},
{file = "jq-1.6.0-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:49e44ed677713f4115bd5bf2dbae23baa4cd503be350e12a1c1f506b0687848f"},
{file = "jq-1.6.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:984f33862af285ad3e41e23179ac4795f1701822473e1a26bf87ff023e5a89ea"},
{file = "jq-1.6.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f42264fafc6166efb5611b5d4cb01058887d050a6c19334f6a3f8a13bb369df5"},
{file = "jq-1.6.0-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:a67154f150aaf76cc1294032ed588436eb002097dd4fd1e283824bf753a05080"},
@@ -6223,6 +6222,7 @@ files = [
{file = "pymongo-4.6.1-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:b8729dbf25eb32ad0dc0b9bd5e6a0d0b7e5c2dc8ec06ad171088e1896b522a74"},
{file = "pymongo-4.6.1-cp312-cp312-win32.whl", hash = "sha256:3177f783ae7e08aaf7b2802e0df4e4b13903520e8380915e6337cdc7a6ff01d8"},
{file = "pymongo-4.6.1-cp312-cp312-win_amd64.whl", hash = "sha256:00c199e1c593e2c8b033136d7a08f0c376452bac8a896c923fcd6f419e07bdd2"},
{file = "pymongo-4.6.1-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:6dcc95f4bb9ed793714b43f4f23a7b0c57e4ef47414162297d6f650213512c19"},
{file = "pymongo-4.6.1-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:13552ca505366df74e3e2f0a4f27c363928f3dff0eef9f281eb81af7f29bc3c5"},
{file = "pymongo-4.6.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:77e0df59b1a4994ad30c6d746992ae887f9756a43fc25dec2db515d94cf0222d"},
{file = "pymongo-4.6.1-cp37-cp37m-manylinux2014_aarch64.whl", hash = "sha256:3a7f02a58a0c2912734105e05dedbee4f7507e6f1bd132ebad520be0b11d46fd"},
@@ -7093,6 +7093,27 @@ PyYAML = "*"
Shapely = ">=1.7.1"
six = ">=1.15.0"
[[package]]
name = "rdflib"
version = "7.0.0"
description = "RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information."
optional = true
python-versions = ">=3.8.1,<4.0.0"
files = [
{file = "rdflib-7.0.0-py3-none-any.whl", hash = "sha256:0438920912a642c866a513de6fe8a0001bd86ef975057d6962c79ce4771687cd"},
{file = "rdflib-7.0.0.tar.gz", hash = "sha256:9995eb8569428059b8c1affd26b25eac510d64f5043d9ce8c84e0d0036e995ae"},
]
[package.dependencies]
isodate = ">=0.6.0,<0.7.0"
pyparsing = ">=2.1.0,<4"
[package.extras]
berkeleydb = ["berkeleydb (>=18.1.0,<19.0.0)"]
html = ["html5lib (>=1.0,<2.0)"]
lxml = ["lxml (>=4.3.0,<5.0.0)"]
networkx = ["networkx (>=2.0.0,<3.0.0)"]
[[package]]
name = "referencing"
version = "0.31.1"
@@ -9226,9 +9247,9 @@ testing = ["big-O", "jaraco.functools", "jaraco.itertools", "more-itertools", "p
[extras]
cli = ["typer"]
extended-testing = ["aiosqlite", "aleph-alpha-client", "anthropic", "arxiv", "assemblyai", "atlassian-python-api", "azure-ai-documentintelligence", "beautifulsoup4", "bibtexparser", "cassio", "chardet", "cohere", "dashvector", "databricks-vectorsearch", "datasets", "dgml-utils", "elasticsearch", "esprima", "faiss-cpu", "feedparser", "fireworks-ai", "geopandas", "gitpython", "google-cloud-documentai", "gql", "gradientai", "hdbcli", "hologres-vector", "html2text", "javelin-sdk", "jinja2", "jq", "jsonschema", "lxml", "markdownify", "motor", "msal", "mwparserfromhell", "mwxml", "newspaper3k", "numexpr", "oci", "openai", "openapi-pydantic", "oracle-ads", "pandas", "pdfminer-six", "pgvector", "praw", "psychicapi", "py-trello", "pymupdf", "pypdf", "pypdfium2", "pyspark", "rank-bm25", "rapidfuzz", "rapidocr-onnxruntime", "requests-toolbelt", "rspace_client", "scikit-learn", "sqlite-vss", "streamlit", "sympy", "telethon", "timescale-vector", "tqdm", "upstash-redis", "xata", "xmltodict", "zhipuai"]
extended-testing = ["aiosqlite", "aleph-alpha-client", "anthropic", "arxiv", "assemblyai", "atlassian-python-api", "azure-ai-documentintelligence", "beautifulsoup4", "bibtexparser", "cassio", "chardet", "cohere", "dashvector", "databricks-vectorsearch", "datasets", "dgml-utils", "elasticsearch", "esprima", "faiss-cpu", "feedparser", "fireworks-ai", "geopandas", "gitpython", "google-cloud-documentai", "gql", "gradientai", "hdbcli", "hologres-vector", "html2text", "javelin-sdk", "jinja2", "jq", "jsonschema", "lxml", "markdownify", "motor", "msal", "mwparserfromhell", "mwxml", "newspaper3k", "numexpr", "oci", "openai", "openapi-pydantic", "oracle-ads", "pandas", "pdfminer-six", "pgvector", "praw", "psychicapi", "py-trello", "pymupdf", "pypdf", "pypdfium2", "pyspark", "rank-bm25", "rapidfuzz", "rapidocr-onnxruntime", "rdflib", "requests-toolbelt", "rspace_client", "scikit-learn", "sqlite-vss", "streamlit", "sympy", "telethon", "timescale-vector", "tqdm", "upstash-redis", "xata", "xmltodict", "zhipuai"]
[metadata]
lock-version = "2.0"
python-versions = ">=3.8.1,<4.0"
content-hash = "064816bab088c1f6ff9902cb998291581b66a6d7762f965ff805b4e0b9b2e7e9"
content-hash = "42d012441d7b42d273e11708b7e12308fc56b169d4d56c4c2511e7469743a983"

@@ -90,6 +90,7 @@ zhipuai = {version = "^1.0.7", optional = true}
elasticsearch = {version = "^8.12.0", optional = true}
hdbcli = {version = "^2.19.21", optional = true}
oci = {version = "^2.119.1", optional = true}
rdflib = {version = "7.0.0", optional = true}
[tool.poetry.group.test]
optional = true
@@ -254,7 +255,8 @@ extended_testing = [
"zhipuai",
"elasticsearch",
"hdbcli",
"oci"
"oci",
"rdflib",
]
[tool.ruff]
@@ -303,7 +305,7 @@ markers = [
asyncio_mode = "auto"
[tool.codespell]
skip = '.git,*.pdf,*.svg,*.pdf,*.yaml,*.ipynb,poetry.lock,*.min.js,*.css,package-lock.json,example_data,_dist,examples'
skip = '.git,*.pdf,*.svg,*.pdf,*.yaml,*.ipynb,poetry.lock,*.min.js,*.css,package-lock.json,example_data,_dist,examples,*.trig'
# Ignore latin etc
ignore-regex = '.*(Stati Uniti|Tense=Pres).*'
# whats is a typo but used frequently in queries so kept as is

@ -0,0 +1,6 @@
FROM ontotext/graphdb:10.5.1
RUN mkdir -p /opt/graphdb/dist/data/repositories/langchain
COPY config.ttl /opt/graphdb/dist/data/repositories/langchain/
COPY starwars-data.trig /
COPY graphdb_create.sh /run.sh
ENTRYPOINT bash /run.sh

@ -0,0 +1,46 @@
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix graphdb: <http://www.ontotext.com/config/graphdb#>.
[] a rep:Repository ;
rep:repositoryID "langchain" ;
rdfs:label "" ;
rep:repositoryImpl [
rep:repositoryType "graphdb:SailRepository" ;
sr:sailImpl [
sail:sailType "graphdb:Sail" ;
graphdb:read-only "false" ;
# Inference and Validation
graphdb:ruleset "empty" ;
graphdb:disable-sameAs "true" ;
graphdb:check-for-inconsistencies "false" ;
# Indexing
graphdb:entity-id-size "32" ;
graphdb:enable-context-index "false" ;
graphdb:enablePredicateList "true" ;
graphdb:enable-fts-index "false" ;
graphdb:fts-indexes ("default" "iri") ;
graphdb:fts-string-literals-index "default" ;
graphdb:fts-iris-index "none" ;
# Queries and Updates
graphdb:query-timeout "0" ;
graphdb:throw-QueryEvaluationException-on-timeout "false" ;
graphdb:query-limit-results "0" ;
# Settable in the file but otherwise hidden in the UI and in the RDF4J console
graphdb:base-URL "http://example.org/owlim#" ;
graphdb:defaultNS "" ;
graphdb:imports "" ;
graphdb:repository-type "file-repository" ;
graphdb:storage-folder "storage" ;
graphdb:entity-index-size "10000000" ;
graphdb:in-memory-literal-properties "true" ;
graphdb:enable-literal-index "true" ;
]
].

@ -0,0 +1,9 @@
version: '3.7'
services:
graphdb:
image: graphdb
container_name: graphdb
ports:
- "7200:7200"

@ -0,0 +1,33 @@
#!/bin/bash
REPOSITORY_ID="langchain"
GRAPHDB_URI="http://localhost:7200"
echo -e "\nUsing GraphDB: ${GRAPHDB_URI}"
function startGraphDB {
echo -e "\nStarting GraphDB..."
exec /opt/graphdb/dist/bin/graphdb
}
function waitGraphDBStart {
echo -e "\nWaiting for GraphDB to start..."
for _ in $(seq 1 5); do
CHECK_RES=$(curl --silent --write-out '%{http_code}' --output /dev/null ${GRAPHDB_URI}/rest/repositories)
if [ "${CHECK_RES}" = '200' ]; then
echo -e "\nUp and running"
break
fi
sleep 30s
echo "CHECK_RES: ${CHECK_RES}"
done
}
function loadData {
echo -e "\nImporting starwars-data.trig"
curl -X POST -H "Content-Type: application/x-trig" -T /starwars-data.trig ${GRAPHDB_URI}/repositories/${REPOSITORY_ID}/statements
}
startGraphDB &
waitGraphDBStart
loadData
wait

@ -0,0 +1,5 @@
#!/bin/bash
set -ex
docker compose down -v --remove-orphans
docker build --tag graphdb .
docker compose up -d graphdb

@ -0,0 +1,43 @@
@base <https://swapi.co/resource/>.
@prefix voc: <https://swapi.co/vocabulary/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
{
<besalisk/71>
a voc:Besalisk , voc:Character ;
rdfs:label "Dexter Jettster" ;
voc:eyeColor "yellow" ;
voc:gender "male" ;
voc:height 198.0 ;
voc:mass 102.0 ;
voc:skinColor "brown" .
}
<https://swapi.co/ontology/> {
voc:Character a owl:Class .
voc:Species a owl:Class .
voc:Besalisk a voc:Species;
rdfs:label "Besalisk";
voc:averageHeight 178.0;
voc:averageLifespan "75";
voc:character <https://swapi.co/resource/besalisk/71>;
voc:language "besalisk";
voc:skinColor "brown";
voc:eyeColor "yellow" .
voc:averageHeight a owl:DatatypeProperty .
voc:averageLifespan a owl:DatatypeProperty .
voc:character a owl:ObjectProperty .
voc:language a owl:DatatypeProperty .
voc:skinColor a owl:DatatypeProperty .
voc:eyeColor a owl:DatatypeProperty .
voc:gender a owl:DatatypeProperty .
voc:height a owl:DatatypeProperty .
voc:mass a owl:DatatypeProperty .
}
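
The sample data above can be queried directly with SPARQL. As a sketch, the query below retrieves the eye color of the character labelled "Dexter Jettster" (the same pattern the integration test `test_query` uses):

```sparql
PREFIX voc: <https://swapi.co/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?eyeColor
WHERE {
  ?character rdfs:label "Dexter Jettster" ;
             voc:eyeColor ?eyeColor .
}
```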

@ -0,0 +1,181 @@
from pathlib import Path
import pytest
from langchain_community.graphs import OntotextGraphDBGraph
"""
cd libs/community/tests/integration_tests/graphs/docker-compose-ontotext-graphdb
./start.sh
"""
def test_query() -> None:
graph = OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/langchain",
query_ontology="CONSTRUCT {?s ?p ?o}"
"FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
)
query_results = graph.query(
"PREFIX voc: <https://swapi.co/vocabulary/> "
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
"SELECT ?eyeColor "
"WHERE {"
' ?besalisk rdfs:label "Dexter Jettster" ; '
" voc:eyeColor ?eyeColor ."
"}"
)
assert len(query_results) == 1
assert len(query_results[0]) == 1
assert str(query_results[0][0]) == "yellow"
def test_get_schema_with_query() -> None:
graph = OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/langchain",
query_ontology="CONSTRUCT {?s ?p ?o}"
"FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
)
from rdflib import Graph
assert len(Graph().parse(data=graph.get_schema, format="turtle")) == 19
@pytest.mark.parametrize(
"rdf_format, file_extension",
[
("json-ld", "json"),
("json-ld", "jsonld"),
("json-ld", "json-ld"),
("xml", "rdf"),
("xml", "xml"),
("xml", "owl"),
("pretty-xml", "xml"),
("n3", "n3"),
("turtle", "ttl"),
("nt", "nt"),
("trig", "trig"),
("nquads", "nq"),
("nquads", "nquads"),
("trix", "trix"),
],
)
def test_get_schema_from_file(
tmp_path: Path, rdf_format: str, file_extension: str
) -> None:
expected_number_of_ontology_statements = 19
graph = OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/langchain",
query_ontology="CONSTRUCT {?s ?p ?o}"
"FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
)
from rdflib import ConjunctiveGraph, Graph
assert (
len(Graph().parse(data=graph.get_schema, format="turtle"))
== expected_number_of_ontology_statements
)
# serialize the ontology schema loaded with the query in a local file
# in various rdf formats and check that this results
# in the same number of statements
conjunctive_graph = ConjunctiveGraph()
ontology_context = conjunctive_graph.get_context("https://swapi.co/ontology/")
ontology_context.parse(data=graph.get_schema, format="turtle")
assert len(ontology_context) == expected_number_of_ontology_statements
assert len(conjunctive_graph) == expected_number_of_ontology_statements
local_file = tmp_path / ("starwars-ontology." + file_extension)
conjunctive_graph.serialize(local_file, format=rdf_format)
graph = OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/langchain",
local_file=str(local_file),
)
assert (
len(Graph().parse(data=graph.get_schema, format="turtle"))
== expected_number_of_ontology_statements
)
@pytest.mark.parametrize(
"rdf_format", ["json-ld", "xml", "n3", "turtle", "nt", "trig", "nquads", "trix"]
)
def test_get_schema_from_file_with_explicit_rdf_format(
tmp_path: Path, rdf_format: str
) -> None:
expected_number_of_ontology_statements = 19
graph = OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/langchain",
query_ontology="CONSTRUCT {?s ?p ?o}"
"FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
)
from rdflib import ConjunctiveGraph, Graph
assert (
len(Graph().parse(data=graph.get_schema, format="turtle"))
== expected_number_of_ontology_statements
)
# serialize the ontology schema loaded with the query in a local file
# in various rdf formats and check that this results
# in the same number of statements
conjunctive_graph = ConjunctiveGraph()
ontology_context = conjunctive_graph.get_context("https://swapi.co/ontology/")
ontology_context.parse(data=graph.get_schema, format="turtle")
assert len(ontology_context) == expected_number_of_ontology_statements
assert len(conjunctive_graph) == expected_number_of_ontology_statements
local_file = tmp_path / "starwars-ontology.txt"
conjunctive_graph.serialize(local_file, format=rdf_format)
graph = OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/langchain",
local_file=str(local_file),
local_file_format=rdf_format,
)
assert (
len(Graph().parse(data=graph.get_schema, format="turtle"))
== expected_number_of_ontology_statements
)
def test_get_schema_from_file_with_wrong_extension(tmp_path: Path) -> None:
expected_number_of_ontology_statements = 19
graph = OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/langchain",
query_ontology="CONSTRUCT {?s ?p ?o}"
"FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
)
from rdflib import ConjunctiveGraph, Graph
assert (
len(Graph().parse(data=graph.get_schema, format="turtle"))
== expected_number_of_ontology_statements
)
conjunctive_graph = ConjunctiveGraph()
ontology_context = conjunctive_graph.get_context("https://swapi.co/ontology/")
ontology_context.parse(data=graph.get_schema, format="turtle")
assert len(ontology_context) == expected_number_of_ontology_statements
assert len(conjunctive_graph) == expected_number_of_ontology_statements
local_file = tmp_path / "starwars-ontology.trig"
conjunctive_graph.serialize(local_file, format="nquads")
with pytest.raises(ValueError):
OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/langchain",
local_file=str(local_file),
)

@ -12,6 +12,7 @@ EXPECTED_ALL = [
"ArangoGraph",
"FalkorDBGraph",
"TigerGraph",
"OntotextGraphDBGraph",
]

@ -0,0 +1,176 @@
import os
import tempfile
import unittest
import pytest
class TestOntotextGraphDBGraph(unittest.TestCase):
def test_import(self) -> None:
from langchain_community.graphs import OntotextGraphDBGraph # noqa: F401
@pytest.mark.requires("rdflib")
def test_validate_user_query_wrong_type(self) -> None:
from langchain_community.graphs import OntotextGraphDBGraph
with self.assertRaises(TypeError) as e:
OntotextGraphDBGraph._validate_user_query(
[
"PREFIX starwars: <https://swapi.co/ontology/> "
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
"DESCRIBE starwars: ?term "
"WHERE {?term rdfs:isDefinedBy starwars: }"
]
)
self.assertEqual("Ontology query must be provided as string.", str(e.exception))
@pytest.mark.requires("rdflib")
def test_validate_user_query_invalid_sparql_syntax(self) -> None:
from langchain_community.graphs import OntotextGraphDBGraph
with self.assertRaises(ValueError) as e:
OntotextGraphDBGraph._validate_user_query(
"CONSTRUCT {?s ?p ?o} FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o"
)
self.assertEqual(
"('Ontology query is not a valid SPARQL query.', "
"Expected ConstructQuery, "
"found end of text (at char 70), (line:1, col:71))",
str(e.exception),
)
@pytest.mark.requires("rdflib")
def test_validate_user_query_invalid_query_type_select(self) -> None:
from langchain_community.graphs import OntotextGraphDBGraph
with self.assertRaises(ValueError) as e:
OntotextGraphDBGraph._validate_user_query("SELECT * { ?s ?p ?o }")
self.assertEqual(
"Invalid query type. Only CONSTRUCT queries are supported.",
str(e.exception),
)
@pytest.mark.requires("rdflib")
def test_validate_user_query_invalid_query_type_ask(self) -> None:
from langchain_community.graphs import OntotextGraphDBGraph
with self.assertRaises(ValueError) as e:
OntotextGraphDBGraph._validate_user_query("ASK { ?s ?p ?o }")
self.assertEqual(
"Invalid query type. Only CONSTRUCT queries are supported.",
str(e.exception),
)
@pytest.mark.requires("rdflib")
def test_validate_user_query_invalid_query_type_describe(self) -> None:
from langchain_community.graphs import OntotextGraphDBGraph
with self.assertRaises(ValueError) as e:
OntotextGraphDBGraph._validate_user_query(
"PREFIX swapi: <https://swapi.co/ontology/> "
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
"DESCRIBE ?term WHERE { ?term rdfs:isDefinedBy swapi: }"
)
self.assertEqual(
"Invalid query type. Only CONSTRUCT queries are supported.",
str(e.exception),
)
@pytest.mark.requires("rdflib")
def test_validate_user_query_construct(self) -> None:
from langchain_community.graphs import OntotextGraphDBGraph
OntotextGraphDBGraph._validate_user_query(
"CONSTRUCT {?s ?p ?o} FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}"
)
@pytest.mark.requires("rdflib")
def test_check_connectivity(self) -> None:
from langchain_community.graphs import OntotextGraphDBGraph
with self.assertRaises(ValueError) as e:
OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/non-existing-repository",
query_ontology="PREFIX swapi: <https://swapi.co/ontology/> "
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
"DESCRIBE ?term WHERE {?term rdfs:isDefinedBy swapi: }",
)
self.assertEqual(
"Could not query the provided endpoint. "
"Please, check, if the value of the provided "
"query_endpoint points to the right repository. "
"If GraphDB is secured, please, make sure that the environment variables "
"'GRAPHDB_USERNAME' and 'GRAPHDB_PASSWORD' are set.",
str(e.exception),
)
@pytest.mark.requires("rdflib")
def test_local_file_does_not_exist(self) -> None:
from langchain_community.graphs import OntotextGraphDBGraph
non_existing_file = os.path.join("non", "existing", "path", "to", "file.ttl")
with self.assertRaises(FileNotFoundError) as e:
OntotextGraphDBGraph._load_ontology_schema_from_file(non_existing_file)
self.assertEqual(f"File {non_existing_file} does not exist.", str(e.exception))
@pytest.mark.requires("rdflib")
def test_local_file_no_access(self) -> None:
from langchain_community.graphs import OntotextGraphDBGraph
with tempfile.NamedTemporaryFile() as tmp_file:
tmp_file_name = tmp_file.name
# Set file permissions to write and execute only
os.chmod(tmp_file_name, 0o300)
with self.assertRaises(PermissionError) as e:
OntotextGraphDBGraph._load_ontology_schema_from_file(tmp_file_name)
self.assertEqual(
f"Read permission for {tmp_file_name} is restricted", str(e.exception)
)
@pytest.mark.requires("rdflib")
def test_local_file_bad_syntax(self) -> None:
from langchain_community.graphs import OntotextGraphDBGraph
with tempfile.TemporaryDirectory() as tempdir:
tmp_file_path = os.path.join(tempdir, "starwars-ontology.trig")
with open(tmp_file_path, "w") as tmp_file:
tmp_file.write("invalid trig")
with self.assertRaises(ValueError) as e:
OntotextGraphDBGraph._load_ontology_schema_from_file(tmp_file_path)
self.assertEqual(
f"('Invalid file format for {tmp_file_path} : '"
", BadSyntax('', 0, 'invalid trig', 0, "
"'expected directive or statement'))",
str(e.exception),
)
@pytest.mark.requires("rdflib")
def test_both_query_and_local_file_provided(self) -> None:
from langchain_community.graphs import OntotextGraphDBGraph
with self.assertRaises(ValueError) as e:
OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/non-existing-repository",
query_ontology="CONSTRUCT {?s ?p ?o}"
"FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
local_file="starwars-ontology-wrong.trig",
)
self.assertEqual(
"Both file and query provided. Only one is allowed.", str(e.exception)
)
@pytest.mark.requires("rdflib")
def test_nor_query_nor_local_file_provided(self) -> None:
from langchain_community.graphs import OntotextGraphDBGraph
with self.assertRaises(ValueError) as e:
OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/non-existing-repository",
)
self.assertEqual(
"Neither file nor query provided. One is required.", str(e.exception)
)

@ -41,6 +41,7 @@ from langchain.chains.graph_qa.hugegraph import HugeGraphQAChain
from langchain.chains.graph_qa.kuzu import KuzuQAChain
from langchain.chains.graph_qa.nebulagraph import NebulaGraphQAChain
from langchain.chains.graph_qa.neptune_cypher import NeptuneOpenCypherQAChain
from langchain.chains.graph_qa.ontotext_graphdb import OntotextGraphDBQAChain
from langchain.chains.graph_qa.sparql import GraphSparqlQAChain
from langchain.chains.history_aware_retriever import create_history_aware_retriever
from langchain.chains.hyde.base import HypotheticalDocumentEmbedder
@ -96,6 +97,7 @@ __all__ = [
"GraphCypherQAChain",
"GraphQAChain",
"GraphSparqlQAChain",
"OntotextGraphDBQAChain",
"HugeGraphQAChain",
"HypotheticalDocumentEmbedder",
"KuzuQAChain",

@ -0,0 +1,182 @@
"""Question answering over a graph."""
from __future__ import annotations
from typing import Any, Dict, List, Optional
from langchain_community.graphs import OntotextGraphDBGraph
from langchain_core.callbacks.manager import CallbackManager
from langchain_core.language_models import BaseLanguageModel
from langchain_core.prompts.base import BasePromptTemplate
from langchain_core.pydantic_v1 import Field
from langchain.callbacks.manager import CallbackManagerForChainRun
from langchain.chains.base import Chain
from langchain.chains.graph_qa.prompts import (
GRAPHDB_QA_PROMPT,
GRAPHDB_SPARQL_FIX_PROMPT,
GRAPHDB_SPARQL_GENERATION_PROMPT,
)
from langchain.chains.llm import LLMChain
class OntotextGraphDBQAChain(Chain):
"""Question-answering against Ontotext GraphDB
https://graphdb.ontotext.com/ by generating SPARQL queries.
*Security note*: Make sure that the database connection uses credentials
that are narrowly-scoped to only include necessary permissions.
Failure to do so may result in data corruption or loss, since the calling
code may attempt commands that would result in deletion, mutation
of data if appropriately prompted or reading sensitive data if such
data is present in the database.
The best way to guard against such negative outcomes is to (as appropriate)
limit the permissions granted to the credentials used with this tool.
See https://python.langchain.com/docs/security for more information.
"""
graph: OntotextGraphDBGraph = Field(exclude=True)
sparql_generation_chain: LLMChain
sparql_fix_chain: LLMChain
max_fix_retries: int
qa_chain: LLMChain
input_key: str = "query" #: :meta private:
output_key: str = "result" #: :meta private:
@property
def input_keys(self) -> List[str]:
return [self.input_key]
@property
def output_keys(self) -> List[str]:
_output_keys = [self.output_key]
return _output_keys
@classmethod
def from_llm(
cls,
llm: BaseLanguageModel,
*,
sparql_generation_prompt: BasePromptTemplate = GRAPHDB_SPARQL_GENERATION_PROMPT,
sparql_fix_prompt: BasePromptTemplate = GRAPHDB_SPARQL_FIX_PROMPT,
max_fix_retries: int = 5,
qa_prompt: BasePromptTemplate = GRAPHDB_QA_PROMPT,
**kwargs: Any,
) -> OntotextGraphDBQAChain:
"""Initialize from LLM."""
sparql_generation_chain = LLMChain(llm=llm, prompt=sparql_generation_prompt)
sparql_fix_chain = LLMChain(llm=llm, prompt=sparql_fix_prompt)
qa_chain = LLMChain(llm=llm, prompt=qa_prompt)
return cls(
qa_chain=qa_chain,
sparql_generation_chain=sparql_generation_chain,
sparql_fix_chain=sparql_fix_chain,
max_fix_retries=max_fix_retries,
**kwargs,
)
def _call(
self,
inputs: Dict[str, Any],
run_manager: Optional[CallbackManagerForChainRun] = None,
) -> Dict[str, str]:
"""
Generate a SPARQL query, use it to retrieve a response from GraphDB and answer
the question.
"""
_run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
callbacks = _run_manager.get_child()
prompt = inputs[self.input_key]
ontology_schema = self.graph.get_schema
sparql_generation_chain_result = self.sparql_generation_chain.invoke(
{"prompt": prompt, "schema": ontology_schema}, callbacks=callbacks
)
generated_sparql = sparql_generation_chain_result[
self.sparql_generation_chain.output_key
]
generated_sparql = self._get_valid_sparql_query(
_run_manager, callbacks, generated_sparql, ontology_schema
)
query_results = self.graph.query(generated_sparql)
qa_chain_result = self.qa_chain.invoke(
{"prompt": prompt, "context": query_results}, callbacks=callbacks
)
result = qa_chain_result[self.qa_chain.output_key]
return {self.output_key: result}
def _get_valid_sparql_query(
self,
_run_manager: CallbackManagerForChainRun,
callbacks: CallbackManager,
generated_sparql: str,
ontology_schema: str,
) -> str:
try:
return self._prepare_sparql_query(_run_manager, generated_sparql)
except Exception as e:
retries = 0
error_message = str(e)
self._log_invalid_sparql_query(
_run_manager, generated_sparql, error_message
)
while retries < self.max_fix_retries:
try:
sparql_fix_chain_result = self.sparql_fix_chain.invoke(
{
"error_message": error_message,
"generated_sparql": generated_sparql,
"schema": ontology_schema,
},
callbacks=callbacks,
)
generated_sparql = sparql_fix_chain_result[
self.sparql_fix_chain.output_key
]
return self._prepare_sparql_query(_run_manager, generated_sparql)
except Exception as e:
retries += 1
parse_exception = str(e)
self._log_invalid_sparql_query(
_run_manager, generated_sparql, parse_exception
)
raise ValueError("The generated SPARQL query is invalid.")
def _prepare_sparql_query(
self, _run_manager: CallbackManagerForChainRun, generated_sparql: str
) -> str:
from rdflib.plugins.sparql import prepareQuery
prepareQuery(generated_sparql)
self._log_valid_sparql_query(_run_manager, generated_sparql)
return generated_sparql
def _log_valid_sparql_query(
self, _run_manager: CallbackManagerForChainRun, generated_query: str
) -> None:
_run_manager.on_text("Generated SPARQL:", end="\n", verbose=self.verbose)
_run_manager.on_text(
generated_query, color="green", end="\n", verbose=self.verbose
)
def _log_invalid_sparql_query(
self,
_run_manager: CallbackManagerForChainRun,
generated_query: str,
error_message: str,
) -> None:
_run_manager.on_text("Invalid SPARQL query: ", end="\n", verbose=self.verbose)
_run_manager.on_text(
generated_query, color="red", end="\n", verbose=self.verbose
)
_run_manager.on_text(
"SPARQL Query Parse Error: ", end="\n", verbose=self.verbose
)
_run_manager.on_text(
error_message, color="red", end="\n\n", verbose=self.verbose
)
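
The chain's `_get_valid_sparql_query` method validates the generated query with rdflib's `prepareQuery` and, on failure, asks the fix chain to repair it, up to `max_fix_retries` times. A minimal, self-contained sketch of that validate-then-repair loop, with stub `validate` and `fix` callables standing in for rdflib parsing and the LLM fix chain (both names are illustrative, not part of the chain's API):

```python
from typing import Callable


def get_valid_query(
    generated: str,
    validate: Callable[[str], None],  # raises on an invalid query
    fix: Callable[[str, str], str],   # (query, error message) -> repaired query
    max_fix_retries: int = 5,
) -> str:
    """Mirror of the chain's validate-then-repair loop (stub version)."""
    try:
        validate(generated)
        return generated
    except Exception as e:
        error = str(e)
    retries = 0
    while retries < max_fix_retries:
        try:
            # Each retry feeds the last error back to the "fix" step,
            # then re-validates the repaired query.
            generated = fix(generated, error)
            validate(generated)
            return generated
        except Exception as e:
            retries += 1
            error = str(e)
    raise ValueError("The generated SPARQL query is invalid.")


# Toy usage: 'validate' rejects queries missing a closing brace,
# 'fix' appends one.
def validate(q: str) -> None:
    if not q.rstrip().endswith("}"):
        raise ValueError("expected '}'")


repaired = get_valid_query(
    "SELECT * WHERE {?s ?p ?o",
    validate,
    fix=lambda q, err: q + "}",
)
print(repaired)  # SELECT * WHERE {?s ?p ?o}
```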

@ -197,6 +197,68 @@ SPARQL_QA_PROMPT = PromptTemplate(
input_variables=["context", "prompt"], template=SPARQL_QA_TEMPLATE
)
GRAPHDB_SPARQL_GENERATION_TEMPLATE = """
Write a SPARQL SELECT query for querying a graph database.
The ontology schema delimited by triple backticks in Turtle format is:
```
{schema}
```
Use only the classes and properties provided in the schema to construct the SPARQL query.
Do not use any classes or properties that are not explicitly provided in the schema.
Include all necessary prefixes.
Do not include any explanations or apologies in your responses.
Do not wrap the query in backticks.
Do not include any text except the SPARQL query generated.
The question delimited by triple backticks is:
```
{prompt}
```
"""
GRAPHDB_SPARQL_GENERATION_PROMPT = PromptTemplate(
input_variables=["schema", "prompt"],
template=GRAPHDB_SPARQL_GENERATION_TEMPLATE,
)
GRAPHDB_SPARQL_FIX_TEMPLATE = """
The following SPARQL query delimited by triple backticks
```
{generated_sparql}
```
is not valid.
The error delimited by triple backticks is
```
{error_message}
```
Give me a correct version of the SPARQL query.
Do not change the logic of the query.
Do not include any explanations or apologies in your responses.
Do not wrap the query in backticks.
Do not include any text except the SPARQL query generated.
The ontology schema delimited by triple backticks in Turtle format is:
```
{schema}
```
"""
GRAPHDB_SPARQL_FIX_PROMPT = PromptTemplate(
input_variables=["error_message", "generated_sparql", "schema"],
template=GRAPHDB_SPARQL_FIX_TEMPLATE,
)
GRAPHDB_QA_TEMPLATE = """Task: Generate a natural language response from the results of a SPARQL query.
You are an assistant that creates well-written and human understandable answers.
The information part contains the information provided, which you can use to construct an answer.
The information provided is authoritative; you must never doubt it or try to use your internal knowledge to correct it.
Make your response sound like the information is coming from an AI assistant, but don't add any information.
Don't use internal knowledge to answer the question, just say you don't know if no information is available.
Information:
{context}
Question: {prompt}
Helpful Answer:"""
GRAPHDB_QA_PROMPT = PromptTemplate(
input_variables=["context", "prompt"], template=GRAPHDB_QA_TEMPLATE
)
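
Each `PromptTemplate` above substitutes its `input_variables` into the template text; with the default f-string template format, this is equivalent to plain `str.format`. A short sketch using an abbreviated stand-in for `GRAPHDB_QA_TEMPLATE` (the shortened template string and sample values are illustrative):

```python
# Abbreviated stand-in for GRAPHDB_QA_TEMPLATE; substitution via str.format
# matches what PromptTemplate does for simple "f-string" templates.
qa_template = (
    "Task: Generate a natural language response from the results "
    "of a SPARQL query.\n"
    "Information:\n"
    "{context}\n"
    "Question: {prompt}\n"
    "Helpful Answer:"
)

rendered = qa_template.format(
    context="[('yellow',)]",  # example SPARQL query results
    prompt="What is the eye color of Dexter Jettster?",
)
print(rendered)
```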
AQL_GENERATION_TEMPLATE = """Task: Generate an ArangoDB Query Language (AQL) query from a User Input.

@ -3049,7 +3049,6 @@ files = [
{file = "jq-1.6.0-cp37-cp37m-musllinux_1_1_i686.whl", hash = "sha256:227b178b22a7f91ae88525810441791b1ca1fc71c86f03190911793be15cec3d"},
{file = "jq-1.6.0-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:780eb6383fbae12afa819ef676fc93e1548ae4b076c004a393af26a04b460742"},
{file = "jq-1.6.0-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:08ded6467f4ef89fec35b2bf310f210f8cd13fbd9d80e521500889edf8d22441"},
{file = "jq-1.6.0-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:49e44ed677713f4115bd5bf2dbae23baa4cd503be350e12a1c1f506b0687848f"},
{file = "jq-1.6.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:984f33862af285ad3e41e23179ac4795f1701822473e1a26bf87ff023e5a89ea"},
{file = "jq-1.6.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f42264fafc6166efb5611b5d4cb01058887d050a6c19334f6a3f8a13bb369df5"},
{file = "jq-1.6.0-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:a67154f150aaf76cc1294032ed588436eb002097dd4fd1e283824bf753a05080"},
@ -3447,7 +3446,7 @@ files = [
[[package]]
name = "langchain-community"
version = "0.0.15"
version = "0.0.16"
description = "Community contributed LangChain integrations."
optional = false
python-versions = ">=3.8.1,<4.0"
@ -3457,7 +3456,7 @@ develop = true
[package.dependencies]
aiohttp = "^3.8.3"
dataclasses-json = ">= 0.5.7, < 0.7"
langchain-core = ">=0.1.14,<0.2"
langchain-core = ">=0.1.16,<0.2"
langsmith = ">=0.0.83,<0.1"
numpy = "^1"
PyYAML = ">=5.3"
@ -3467,7 +3466,7 @@ tenacity = "^8.1.0"
[package.extras]
cli = ["typer (>=0.9.0,<0.10.0)"]
extended-testing = ["aiosqlite (>=0.19.0,<0.20.0)", "aleph-alpha-client (>=2.15.0,<3.0.0)", "anthropic (>=0.3.11,<0.4.0)", "arxiv (>=1.4,<2.0)", "assemblyai (>=0.17.0,<0.18.0)", "atlassian-python-api (>=3.36.0,<4.0.0)", "azure-ai-documentintelligence (>=1.0.0b1,<2.0.0)", "beautifulsoup4 (>=4,<5)", "bibtexparser (>=1.4.0,<2.0.0)", "cassio (>=0.1.0,<0.2.0)", "chardet (>=5.1.0,<6.0.0)", "cohere (>=4,<5)", "dashvector (>=1.0.1,<2.0.0)", "databricks-vectorsearch (>=0.21,<0.22)", "datasets (>=2.15.0,<3.0.0)", "dgml-utils (>=0.3.0,<0.4.0)", "elasticsearch (>=8.12.0,<9.0.0)", "esprima (>=4.0.1,<5.0.0)", "faiss-cpu (>=1,<2)", "feedparser (>=6.0.10,<7.0.0)", "fireworks-ai (>=0.9.0,<0.10.0)", "geopandas (>=0.13.1,<0.14.0)", "gitpython (>=3.1.32,<4.0.0)", "google-cloud-documentai (>=2.20.1,<3.0.0)", "gql (>=3.4.1,<4.0.0)", "gradientai (>=1.4.0,<2.0.0)", "hdbcli (>=2.19.21,<3.0.0)", "hologres-vector (>=0.0.6,<0.0.7)", "html2text (>=2020.1.16,<2021.0.0)", "javelin-sdk (>=0.1.8,<0.2.0)", "jinja2 (>=3,<4)", "jq (>=1.4.1,<2.0.0)", "jsonschema (>1)", "lxml (>=4.9.2,<5.0.0)", "markdownify (>=0.11.6,<0.12.0)", "motor (>=3.3.1,<4.0.0)", "msal (>=1.25.0,<2.0.0)", "mwparserfromhell (>=0.6.4,<0.7.0)", "mwxml (>=0.3.3,<0.4.0)", "newspaper3k (>=0.2.8,<0.3.0)", "numexpr (>=2.8.6,<3.0.0)", "oci (>=2.119.1,<3.0.0)", "openai (<2)", "openapi-pydantic (>=0.3.2,<0.4.0)", "oracle-ads (>=2.9.1,<3.0.0)", "pandas (>=2.0.1,<3.0.0)", "pdfminer-six (>=20221105,<20221106)", "pgvector (>=0.1.6,<0.2.0)", "praw (>=7.7.1,<8.0.0)", "psychicapi (>=0.8.0,<0.9.0)", "py-trello (>=0.19.0,<0.20.0)", "pymupdf (>=1.22.3,<2.0.0)", "pypdf (>=3.4.0,<4.0.0)", "pypdfium2 (>=4.10.0,<5.0.0)", "pyspark (>=3.4.0,<4.0.0)", "rank-bm25 (>=0.2.2,<0.3.0)", "rapidfuzz (>=3.1.1,<4.0.0)", "rapidocr-onnxruntime (>=1.3.2,<2.0.0)", "requests-toolbelt (>=1.0.0,<2.0.0)", "rspace_client (>=2.5.0,<3.0.0)", "scikit-learn (>=1.2.2,<2.0.0)", "sqlite-vss (>=0.1.2,<0.2.0)", "streamlit (>=1.18.0,<2.0.0)", "sympy (>=1.12,<2.0)", "telethon 
(>=1.28.5,<2.0.0)", "timescale-vector (>=0.0.1,<0.0.2)", "tqdm (>=4.48.0)", "upstash-redis (>=0.15.0,<0.16.0)", "xata (>=1.0.0a7,<2.0.0)", "xmltodict (>=0.13.0,<0.14.0)", "zhipuai (>=1.0.7,<2.0.0)"]
extended-testing = ["aiosqlite (>=0.19.0,<0.20.0)", "aleph-alpha-client (>=2.15.0,<3.0.0)", "anthropic (>=0.3.11,<0.4.0)", "arxiv (>=1.4,<2.0)", "assemblyai (>=0.17.0,<0.18.0)", "atlassian-python-api (>=3.36.0,<4.0.0)", "azure-ai-documentintelligence (>=1.0.0b1,<2.0.0)", "beautifulsoup4 (>=4,<5)", "bibtexparser (>=1.4.0,<2.0.0)", "cassio (>=0.1.0,<0.2.0)", "chardet (>=5.1.0,<6.0.0)", "cohere (>=4,<5)", "dashvector (>=1.0.1,<2.0.0)", "databricks-vectorsearch (>=0.21,<0.22)", "datasets (>=2.15.0,<3.0.0)", "dgml-utils (>=0.3.0,<0.4.0)", "elasticsearch (>=8.12.0,<9.0.0)", "esprima (>=4.0.1,<5.0.0)", "faiss-cpu (>=1,<2)", "feedparser (>=6.0.10,<7.0.0)", "fireworks-ai (>=0.9.0,<0.10.0)", "geopandas (>=0.13.1,<0.14.0)", "gitpython (>=3.1.32,<4.0.0)", "google-cloud-documentai (>=2.20.1,<3.0.0)", "gql (>=3.4.1,<4.0.0)", "gradientai (>=1.4.0,<2.0.0)", "hdbcli (>=2.19.21,<3.0.0)", "hologres-vector (>=0.0.6,<0.0.7)", "html2text (>=2020.1.16,<2021.0.0)", "javelin-sdk (>=0.1.8,<0.2.0)", "jinja2 (>=3,<4)", "jq (>=1.4.1,<2.0.0)", "jsonschema (>1)", "lxml (>=4.9.2,<5.0.0)", "markdownify (>=0.11.6,<0.12.0)", "motor (>=3.3.1,<4.0.0)", "msal (>=1.25.0,<2.0.0)", "mwparserfromhell (>=0.6.4,<0.7.0)", "mwxml (>=0.3.3,<0.4.0)", "newspaper3k (>=0.2.8,<0.3.0)", "numexpr (>=2.8.6,<3.0.0)", "oci (>=2.119.1,<3.0.0)", "openai (<2)", "openapi-pydantic (>=0.3.2,<0.4.0)", "oracle-ads (>=2.9.1,<3.0.0)", "pandas (>=2.0.1,<3.0.0)", "pdfminer-six (>=20221105,<20221106)", "pgvector (>=0.1.6,<0.2.0)", "praw (>=7.7.1,<8.0.0)", "psychicapi (>=0.8.0,<0.9.0)", "py-trello (>=0.19.0,<0.20.0)", "pymupdf (>=1.22.3,<2.0.0)", "pypdf (>=3.4.0,<4.0.0)", "pypdfium2 (>=4.10.0,<5.0.0)", "pyspark (>=3.4.0,<4.0.0)", "rank-bm25 (>=0.2.2,<0.3.0)", "rapidfuzz (>=3.1.1,<4.0.0)", "rapidocr-onnxruntime (>=1.3.2,<2.0.0)", "rdflib (==7.0.0)", "requests-toolbelt (>=1.0.0,<2.0.0)", "rspace_client (>=2.5.0,<3.0.0)", "scikit-learn (>=1.2.2,<2.0.0)", "sqlite-vss (>=0.1.2,<0.2.0)", "streamlit (>=1.18.0,<2.0.0)", "sympy (>=1.12,<2.0)", 
"telethon (>=1.28.5,<2.0.0)", "timescale-vector (>=0.0.1,<0.0.2)", "tqdm (>=4.48.0)", "upstash-redis (>=0.15.0,<0.16.0)", "xata (>=1.0.0a7,<2.0.0)", "xmltodict (>=0.13.0,<0.14.0)", "zhipuai (>=1.0.7,<2.0.0)"]
[package.source]
type = "directory"
@ -5807,6 +5806,7 @@ files = [
{file = "pymongo-4.5.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6422b6763b016f2ef2beedded0e546d6aa6ba87910f9244d86e0ac7690f75c96"},
{file = "pymongo-4.5.0-cp312-cp312-win32.whl", hash = "sha256:77cfff95c1fafd09e940b3fdcb7b65f11442662fad611d0e69b4dd5d17a81c60"},
{file = "pymongo-4.5.0-cp312-cp312-win_amd64.whl", hash = "sha256:e57d859b972c75ee44ea2ef4758f12821243e99de814030f69a3decb2aa86807"},
{file = "pymongo-4.5.0-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:8443f3a8ab2d929efa761c6ebce39a6c1dca1c9ac186ebf11b62c8fe1aef53f4"},
{file = "pymongo-4.5.0-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:2b0176f9233a5927084c79ff80b51bd70bfd57e4f3d564f50f80238e797f0c8a"},
{file = "pymongo-4.5.0-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:89b3f2da57a27913d15d2a07d58482f33d0a5b28abd20b8e643ab4d625e36257"},
{file = "pymongo-4.5.0-cp37-cp37m-manylinux2014_aarch64.whl", hash = "sha256:5caee7bd08c3d36ec54617832b44985bd70c4cbd77c5b313de6f7fce0bb34f93"},
@ -6698,6 +6698,27 @@ PyYAML = "*"
Shapely = ">=1.7.1"
six = ">=1.15.0"
[[package]]
name = "rdflib"
version = "7.0.0"
description = "RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information."
optional = true
python-versions = ">=3.8.1,<4.0.0"
files = [
{file = "rdflib-7.0.0-py3-none-any.whl", hash = "sha256:0438920912a642c866a513de6fe8a0001bd86ef975057d6962c79ce4771687cd"},
{file = "rdflib-7.0.0.tar.gz", hash = "sha256:9995eb8569428059b8c1affd26b25eac510d64f5043d9ce8c84e0d0036e995ae"},
]
[package.dependencies]
isodate = ">=0.6.0,<0.7.0"
pyparsing = ">=2.1.0,<4"
[package.extras]
berkeleydb = ["berkeleydb (>=18.1.0,<19.0.0)"]
html = ["html5lib (>=1.0,<2.0)"]
lxml = ["lxml (>=4.3.0,<5.0.0)"]
networkx = ["networkx (>=2.0.0,<3.0.0)"]
[[package]]
name = "redis"
version = "4.6.0"
@ -9118,7 +9139,7 @@ cli = ["typer"]
cohere = ["cohere"]
docarray = ["docarray"]
embeddings = ["sentence-transformers"]
extended-testing = ["aiosqlite", "aleph-alpha-client", "anthropic", "arxiv", "assemblyai", "atlassian-python-api", "beautifulsoup4", "bibtexparser", "cassio", "chardet", "cohere", "couchbase", "dashvector", "databricks-vectorsearch", "datasets", "dgml-utils", "esprima", "faiss-cpu", "feedparser", "fireworks-ai", "geopandas", "gitpython", "google-cloud-documentai", "gql", "hologres-vector", "html2text", "javelin-sdk", "jinja2", "jq", "jsonschema", "langchain-openai", "lxml", "markdownify", "motor", "msal", "mwparserfromhell", "mwxml", "newspaper3k", "numexpr", "openai", "openai", "openapi-pydantic", "pandas", "pdfminer-six", "pgvector", "praw", "psychicapi", "py-trello", "pymupdf", "pypdf", "pypdfium2", "pyspark", "rank-bm25", "rapidfuzz", "rapidocr-onnxruntime", "requests-toolbelt", "rspace_client", "scikit-learn", "sqlite-vss", "streamlit", "sympy", "telethon", "timescale-vector", "tqdm", "upstash-redis", "xata", "xmltodict"]
extended-testing = ["aiosqlite", "aleph-alpha-client", "anthropic", "arxiv", "assemblyai", "atlassian-python-api", "beautifulsoup4", "bibtexparser", "cassio", "chardet", "cohere", "couchbase", "dashvector", "databricks-vectorsearch", "datasets", "dgml-utils", "esprima", "faiss-cpu", "feedparser", "fireworks-ai", "geopandas", "gitpython", "google-cloud-documentai", "gql", "hologres-vector", "html2text", "javelin-sdk", "jinja2", "jq", "jsonschema", "langchain-openai", "lxml", "markdownify", "motor", "msal", "mwparserfromhell", "mwxml", "newspaper3k", "numexpr", "openai", "openai", "openapi-pydantic", "pandas", "pdfminer-six", "pgvector", "praw", "psychicapi", "py-trello", "pymupdf", "pypdf", "pypdfium2", "pyspark", "rank-bm25", "rapidfuzz", "rapidocr-onnxruntime", "rdflib", "requests-toolbelt", "rspace_client", "scikit-learn", "sqlite-vss", "streamlit", "sympy", "telethon", "timescale-vector", "tqdm", "upstash-redis", "xata", "xmltodict"]
javascript = ["esprima"]
llms = ["clarifai", "cohere", "huggingface_hub", "manifest-ml", "nlpcloud", "openai", "openlm", "torch", "transformers"]
openai = ["openai", "tiktoken"]
@@ -9128,4 +9149,4 @@ text-helpers = ["chardet"]
[metadata]
lock-version = "2.0"
python-versions = ">=3.8.1,<4.0"
content-hash = "3cabbf56e60340c9e95892a50c281ca9e0859a5bee76a9f45fd54f6a047e6673"
content-hash = "1f0d8707a249814b4b96af0d775e5f1484f332af232a8b3e855c263efbb19fc2"

@@ -111,6 +111,7 @@ couchbase = {version = "^4.1.9", optional = true}
dgml-utils = {version = "^0.3.0", optional = true}
datasets = {version = "^2.15.0", optional = true}
langchain-openai = {version = ">=0.0.2,<0.1", optional = true}
rdflib = {version = "7.0.0", optional = true}
[tool.poetry.group.test]
optional = true
@@ -296,6 +297,7 @@ extended_testing = [
"dgml-utils",
"cohere",
"langchain-openai",
"rdflib",
]
[tool.ruff]
@@ -343,7 +345,7 @@ markers = [
asyncio_mode = "auto"
[tool.codespell]
skip = '.git,*.pdf,*.svg,*.pdf,*.yaml,*.ipynb,poetry.lock,*.min.js,*.css,package-lock.json,example_data,_dist,examples'
skip = '.git,*.pdf,*.svg,*.pdf,*.yaml,*.ipynb,poetry.lock,*.min.js,*.css,package-lock.json,example_data,_dist,examples,*.trig'
# Ignore latin etc
ignore-regex = '.*(Stati Uniti|Tense=Pres).*'
# whats is a typo but used frequently in queries so kept as is

@@ -0,0 +1,6 @@
FROM ontotext/graphdb:10.5.1
RUN mkdir -p /opt/graphdb/dist/data/repositories/starwars
COPY config.ttl /opt/graphdb/dist/data/repositories/starwars/
COPY starwars-data.trig /
COPY graphdb_create.sh /run.sh
ENTRYPOINT bash /run.sh

@@ -0,0 +1,46 @@
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix graphdb: <http://www.ontotext.com/config/graphdb#>.
[] a rep:Repository ;
rep:repositoryID "starwars" ;
rdfs:label "" ;
rep:repositoryImpl [
rep:repositoryType "graphdb:SailRepository" ;
sr:sailImpl [
sail:sailType "graphdb:Sail" ;
graphdb:read-only "false" ;
# Inference and Validation
graphdb:ruleset "empty" ;
graphdb:disable-sameAs "true" ;
graphdb:check-for-inconsistencies "false" ;
# Indexing
graphdb:entity-id-size "32" ;
graphdb:enable-context-index "false" ;
graphdb:enablePredicateList "true" ;
graphdb:enable-fts-index "false" ;
graphdb:fts-indexes ("default" "iri") ;
graphdb:fts-string-literals-index "default" ;
graphdb:fts-iris-index "none" ;
# Queries and Updates
graphdb:query-timeout "0" ;
graphdb:throw-QueryEvaluationException-on-timeout "false" ;
graphdb:query-limit-results "0" ;
# Settable in the file but otherwise hidden in the UI and in the RDF4J console
graphdb:base-URL "http://example.org/owlim#" ;
graphdb:defaultNS "" ;
graphdb:imports "" ;
graphdb:repository-type "file-repository" ;
graphdb:storage-folder "storage" ;
graphdb:entity-index-size "10000000" ;
graphdb:in-memory-literal-properties "true" ;
graphdb:enable-literal-index "true" ;
]
].

@@ -0,0 +1,9 @@
version: '3.7'
services:
graphdb:
image: graphdb
container_name: graphdb
ports:
- "7200:7200"

@@ -0,0 +1,33 @@
#! /bin/bash
REPOSITORY_ID="starwars"
GRAPHDB_URI="http://localhost:7200"
echo -e "\nUsing GraphDB: ${GRAPHDB_URI}"
function startGraphDB {
echo -e "\nStarting GraphDB..."
exec /opt/graphdb/dist/bin/graphdb
}
function waitGraphDBStart {
echo -e "\nWaiting for GraphDB to start..."
for _ in $(seq 1 5); do
CHECK_RES=$(curl --silent --write-out '%{http_code}' --output /dev/null ${GRAPHDB_URI}/rest/repositories)
if [ "${CHECK_RES}" = '200' ]; then
echo -e "\nUp and running"
break
fi
sleep 30s
echo "CHECK_RES: ${CHECK_RES}"
done
}
function loadData {
echo -e "\nImporting starwars-data.trig"
curl -X POST -H "Content-Type: application/x-trig" -T /starwars-data.trig ${GRAPHDB_URI}/repositories/${REPOSITORY_ID}/statements
}
startGraphDB &
waitGraphDBStart
loadData
wait

@@ -0,0 +1,5 @@
set -ex
docker compose down -v --remove-orphans
docker build --tag graphdb .
docker compose up -d graphdb

@@ -0,0 +1,160 @@
@base <https://swapi.co/resource/>.
@prefix voc: <https://swapi.co/vocabulary/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
{
<human/11>
a voc:Character , voc:Human ;
rdfs:label "Anakin Skywalker", "Darth Vader" ;
voc:birthYear "41.9BBY" ;
voc:eyeColor "blue" ;
voc:gender "male" ;
voc:hairColor "blond" ;
voc:height 188.0 ;
voc:homeworld <planet/1> ;
voc:mass 84.0 ;
voc:skinColor "fair" ;
voc:cybernetics "Cybernetic right arm" .
<human/1>
a voc:Character , voc:Human ;
rdfs:label "Luke Skywalker" ;
voc:birthYear "19BBY" ;
voc:eyeColor "blue" ;
voc:gender "male" ;
voc:hairColor "blond" ;
voc:height 172.0 ;
voc:homeworld <planet/1> ;
voc:mass 77.0 ;
voc:skinColor "fair" .
<human/35>
a voc:Character , voc:Human ;
rdfs:label "Padmé Amidala" ;
voc:birthYear "46BBY" ;
voc:eyeColor "brown" ;
voc:gender "female" ;
voc:hairColor "brown" ;
voc:height 165.0 ;
voc:homeworld <planet/8> ;
voc:mass 45.0 ;
voc:skinColor "light" .
<planet/1>
a voc:Planet ;
rdfs:label "Tatooine" ;
voc:climate "arid" ;
voc:diameter 10465 ;
voc:gravity "1 standard" ;
voc:orbitalPeriod 304 ;
voc:population 200000 ;
voc:resident <human/1> , <human/11> ;
voc:rotationPeriod 23 ;
voc:surfaceWater 1 ;
voc:terrain "desert" .
<planet/8>
a voc:Planet ;
rdfs:label "Naboo" ;
voc:climate "temperate" ;
voc:diameter 12120 ;
voc:gravity "1 standard" ;
voc:orbitalPeriod 312 ;
voc:population 4500000000 ;
voc:resident <human/35> ;
voc:rotationPeriod 26 ;
voc:surfaceWater 12 ;
voc:terrain "grassy hills, swamps, forests, mountains" .
<planet/14>
a voc:Planet ;
rdfs:label "Kashyyyk" ;
voc:climate "tropical" ;
voc:diameter 12765 ;
voc:gravity "1 standard" ;
voc:orbitalPeriod 381 ;
voc:population 45000000 ;
voc:resident <wookiee/13> , <wookiee/80> ;
voc:rotationPeriod 26 ;
voc:surfaceWater 60 ;
voc:terrain "jungle, forests, lakes, rivers" .
<wookiee/13>
a voc:Character , voc:Wookiee ;
rdfs:label "Chewbacca" ;
voc:birthYear "200BBY" ;
voc:eyeColor "blue" ;
voc:gender "male" ;
voc:hairColor "brown" ;
voc:height 228.0 ;
voc:homeworld <planet/14> ;
voc:mass 112.0 .
<wookiee/80>
a voc:Character , voc:Wookiee ;
rdfs:label "Tarfful" ;
voc:eyeColor "blue" ;
voc:gender "male" ;
voc:hairColor "brown" ;
voc:height 234.0 ;
voc:homeworld <planet/14> ;
voc:mass 136.0 ;
voc:skinColor "brown" .
}
<https://swapi.co/ontology/> {
voc:Character a owl:Class .
voc:Species a owl:Class .
voc:Human a voc:Species;
rdfs:label "Human";
voc:averageHeight 180.0;
voc:averageLifespan "120";
voc:character <https://swapi.co/resource/human/1>, <https://swapi.co/resource/human/35>,
<https://swapi.co/resource/human/11>;
voc:language "Galactic Basic";
voc:skinColor "black", "caucasian", "asian", "hispanic";
voc:eyeColor "blue", "brown", "hazel", "green", "grey", "amber";
voc:hairColor "brown", "red", "black", "blonde" .
voc:Planet a owl:Class .
voc:Wookiee a voc:Species;
rdfs:label "Wookiee";
voc:averageHeight 210.0;
voc:averageLifespan "400";
voc:character <https://swapi.co/resource/wookiee/13>, <https://swapi.co/resource/wookiee/80>;
voc:language "Shyriiwook";
voc:planet <https://swapi.co/resource/planet/14>;
voc:skinColor "gray";
voc:eyeColor "blue", "yellow", "brown", "red", "green", "golden";
voc:hairColor "brown", "black" .
voc:birthYear a owl:DatatypeProperty .
voc:eyeColor a owl:DatatypeProperty .
voc:gender a owl:DatatypeProperty .
voc:hairColor a owl:DatatypeProperty .
voc:height a owl:DatatypeProperty .
voc:homeworld a owl:ObjectProperty .
voc:mass a owl:DatatypeProperty .
voc:skinColor a owl:DatatypeProperty .
voc:cybernetics a owl:DatatypeProperty .
voc:climate a owl:DatatypeProperty .
voc:diameter a owl:DatatypeProperty .
voc:gravity a owl:DatatypeProperty .
voc:orbitalPeriod a owl:DatatypeProperty .
voc:population a owl:DatatypeProperty .
voc:resident a owl:ObjectProperty .
voc:rotationPeriod a owl:DatatypeProperty .
voc:surfaceWater a owl:DatatypeProperty .
voc:terrain a owl:DatatypeProperty .
voc:averageHeight a owl:DatatypeProperty .
voc:averageLifespan a owl:DatatypeProperty .
voc:character a owl:ObjectProperty .
voc:language a owl:DatatypeProperty .
voc:planet a owl:ObjectProperty .
}

@@ -0,0 +1,323 @@
from unittest.mock import MagicMock, Mock
import pytest
from langchain_community.graphs import OntotextGraphDBGraph
from langchain.chains import LLMChain, OntotextGraphDBQAChain
"""
To run these integration tests, start the GraphDB test instance first:

cd libs/langchain/tests/integration_tests/chains/docker-compose-ontotext-graphdb
./start.sh
"""
@pytest.mark.requires("langchain_openai", "rdflib")
@pytest.mark.parametrize("max_fix_retries", [-2, -1, 0, 1, 2])
def test_valid_sparql(max_fix_retries: int) -> None:
from langchain_openai import ChatOpenAI
question = "What is Luke Skywalker's home planet?"
answer = "Tatooine"
graph = OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/starwars",
query_ontology="CONSTRUCT {?s ?p ?o} "
"FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
)
chain = OntotextGraphDBQAChain.from_llm(
Mock(ChatOpenAI),
graph=graph,
max_fix_retries=max_fix_retries,
)
chain.sparql_generation_chain = Mock(LLMChain)
chain.sparql_fix_chain = Mock(LLMChain)
chain.qa_chain = Mock(LLMChain)
chain.sparql_generation_chain.output_key = "text"
chain.sparql_generation_chain.invoke = MagicMock(
return_value={
"text": "SELECT * {?s ?p ?o} LIMIT 1",
"prompt": question,
"schema": "",
}
)
chain.sparql_fix_chain.output_key = "text"
chain.sparql_fix_chain.invoke = MagicMock()
chain.qa_chain.output_key = "text"
chain.qa_chain.invoke = MagicMock(
return_value={
"text": answer,
"prompt": question,
"context": [],
}
)
result = chain.invoke({chain.input_key: question})
assert chain.sparql_generation_chain.invoke.call_count == 1
assert chain.sparql_fix_chain.invoke.call_count == 0
assert chain.qa_chain.invoke.call_count == 1
assert result == {chain.output_key: answer, chain.input_key: question}
@pytest.mark.requires("langchain_openai", "rdflib")
@pytest.mark.parametrize("max_fix_retries", [-2, -1, 0])
def test_invalid_sparql_non_positive_max_fix_retries(
max_fix_retries: int,
) -> None:
from langchain_openai import ChatOpenAI
question = "What is Luke Skywalker's home planet?"
graph = OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/starwars",
query_ontology="CONSTRUCT {?s ?p ?o} "
"FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
)
chain = OntotextGraphDBQAChain.from_llm(
Mock(ChatOpenAI),
graph=graph,
max_fix_retries=max_fix_retries,
)
chain.sparql_generation_chain = Mock(LLMChain)
chain.sparql_fix_chain = Mock(LLMChain)
chain.qa_chain = Mock(LLMChain)
chain.sparql_generation_chain.output_key = "text"
chain.sparql_generation_chain.invoke = MagicMock(
return_value={
"text": "```sparql SELECT * {?s ?p ?o} LIMIT 1```",
"prompt": question,
"schema": "",
}
)
chain.sparql_fix_chain.output_key = "text"
chain.sparql_fix_chain.invoke = MagicMock()
chain.qa_chain.output_key = "text"
chain.qa_chain.invoke = MagicMock()
with pytest.raises(ValueError) as e:
chain.invoke({chain.input_key: question})
assert str(e.value) == "The generated SPARQL query is invalid."
assert chain.sparql_generation_chain.invoke.call_count == 1
assert chain.sparql_fix_chain.invoke.call_count == 0
assert chain.qa_chain.invoke.call_count == 0
@pytest.mark.requires("langchain_openai", "rdflib")
@pytest.mark.parametrize("max_fix_retries", [1, 2, 3])
def test_valid_sparql_after_first_retry(max_fix_retries: int) -> None:
from langchain_openai import ChatOpenAI
question = "What is Luke Skywalker's home planet?"
answer = "Tatooine"
generated_invalid_sparql = "```sparql SELECT * {?s ?p ?o} LIMIT 1```"
graph = OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/starwars",
query_ontology="CONSTRUCT {?s ?p ?o} "
"FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
)
chain = OntotextGraphDBQAChain.from_llm(
Mock(ChatOpenAI),
graph=graph,
max_fix_retries=max_fix_retries,
)
chain.sparql_generation_chain = Mock(LLMChain)
chain.sparql_fix_chain = Mock(LLMChain)
chain.qa_chain = Mock(LLMChain)
chain.sparql_generation_chain.output_key = "text"
chain.sparql_generation_chain.invoke = MagicMock(
return_value={
"text": generated_invalid_sparql,
"prompt": question,
"schema": "",
}
)
chain.sparql_fix_chain.output_key = "text"
chain.sparql_fix_chain.invoke = MagicMock(
return_value={
"text": "SELECT * {?s ?p ?o} LIMIT 1",
"error_message": "pyparsing.exceptions.ParseException: "
"Expected {SelectQuery | ConstructQuery | DescribeQuery | AskQuery}, "
"found '`' (at char 0), (line:1, col:1)",
"generated_sparql": generated_invalid_sparql,
"schema": "",
}
)
chain.qa_chain.output_key = "text"
chain.qa_chain.invoke = MagicMock(
return_value={
"text": answer,
"prompt": question,
"context": [],
}
)
result = chain.invoke({chain.input_key: question})
assert chain.sparql_generation_chain.invoke.call_count == 1
assert chain.sparql_fix_chain.invoke.call_count == 1
assert chain.qa_chain.invoke.call_count == 1
assert result == {chain.output_key: answer, chain.input_key: question}
@pytest.mark.requires("langchain_openai", "rdflib")
@pytest.mark.parametrize("max_fix_retries", [1, 2, 3])
def test_invalid_sparql_after_all_retries(max_fix_retries: int) -> None:
from langchain_openai import ChatOpenAI
question = "What is Luke Skywalker's home planet?"
generated_invalid_sparql = "```sparql SELECT * {?s ?p ?o} LIMIT 1```"
graph = OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/starwars",
query_ontology="CONSTRUCT {?s ?p ?o} "
"FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
)
chain = OntotextGraphDBQAChain.from_llm(
Mock(ChatOpenAI),
graph=graph,
max_fix_retries=max_fix_retries,
)
chain.sparql_generation_chain = Mock(LLMChain)
chain.sparql_fix_chain = Mock(LLMChain)
chain.qa_chain = Mock(LLMChain)
chain.sparql_generation_chain.output_key = "text"
chain.sparql_generation_chain.invoke = MagicMock(
return_value={
"text": generated_invalid_sparql,
"prompt": question,
"schema": "",
}
)
chain.sparql_fix_chain.output_key = "text"
chain.sparql_fix_chain.invoke = MagicMock(
return_value={
"text": generated_invalid_sparql,
"error_message": "pyparsing.exceptions.ParseException: "
"Expected {SelectQuery | ConstructQuery | DescribeQuery | AskQuery}, "
"found '`' (at char 0), (line:1, col:1)",
"generated_sparql": generated_invalid_sparql,
"schema": "",
}
)
chain.qa_chain.output_key = "text"
chain.qa_chain.invoke = MagicMock()
with pytest.raises(ValueError) as e:
chain.invoke({chain.input_key: question})
assert str(e.value) == "The generated SPARQL query is invalid."
assert chain.sparql_generation_chain.invoke.call_count == 1
assert chain.sparql_fix_chain.invoke.call_count == max_fix_retries
assert chain.qa_chain.invoke.call_count == 0
@pytest.mark.requires("langchain_openai", "rdflib")
@pytest.mark.parametrize(
"max_fix_retries,number_of_invalid_responses",
[(1, 0), (2, 0), (2, 1), (10, 6)],
)
def test_valid_sparql_after_some_retries(
max_fix_retries: int, number_of_invalid_responses: int
) -> None:
from langchain_openai import ChatOpenAI
question = "What is Luke Skywalker's home planet?"
answer = "Tatooine"
generated_invalid_sparql = "```sparql SELECT * {?s ?p ?o} LIMIT 1```"
generated_valid_sparql_query = "SELECT * {?s ?p ?o} LIMIT 1"
graph = OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/starwars",
query_ontology="CONSTRUCT {?s ?p ?o} "
"FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
)
chain = OntotextGraphDBQAChain.from_llm(
Mock(ChatOpenAI),
graph=graph,
max_fix_retries=max_fix_retries,
)
chain.sparql_generation_chain = Mock(LLMChain)
chain.sparql_fix_chain = Mock(LLMChain)
chain.qa_chain = Mock(LLMChain)
chain.sparql_generation_chain.output_key = "text"
chain.sparql_generation_chain.invoke = MagicMock(
return_value={
"text": generated_invalid_sparql,
"prompt": question,
"schema": "",
}
)
chain.sparql_fix_chain.output_key = "text"
chain.sparql_fix_chain.invoke = Mock()
chain.sparql_fix_chain.invoke.side_effect = [
{
"text": generated_invalid_sparql,
"error_message": "pyparsing.exceptions.ParseException: "
"Expected {SelectQuery | ConstructQuery | DescribeQuery | AskQuery}, "
"found '`' (at char 0), (line:1, col:1)",
"generated_sparql": generated_invalid_sparql,
"schema": "",
}
] * number_of_invalid_responses + [
{
"text": generated_valid_sparql_query,
"error_message": "pyparsing.exceptions.ParseException: "
"Expected {SelectQuery | ConstructQuery | DescribeQuery | AskQuery}, "
"found '`' (at char 0), (line:1, col:1)",
"generated_sparql": generated_invalid_sparql,
"schema": "",
}
]
chain.qa_chain.output_key = "text"
chain.qa_chain.invoke = MagicMock(
return_value={
"text": answer,
"prompt": question,
"context": [],
}
)
result = chain.invoke({chain.input_key: question})
assert chain.sparql_generation_chain.invoke.call_count == 1
assert chain.sparql_fix_chain.invoke.call_count == number_of_invalid_responses + 1
assert chain.qa_chain.invoke.call_count == 1
assert result == {chain.output_key: answer, chain.input_key: question}
@pytest.mark.requires("langchain_openai", "rdflib")
@pytest.mark.parametrize(
"model_name,question",
[
("gpt-3.5-turbo-1106", "What is the average height of the Wookiees?"),
("gpt-3.5-turbo-1106", "What is the climate on Tatooine?"),
("gpt-3.5-turbo-1106", "What is Luke Skywalker's home planet?"),
("gpt-4-1106-preview", "What is the average height of the Wookiees?"),
("gpt-4-1106-preview", "What is the climate on Tatooine?"),
("gpt-4-1106-preview", "What is Luke Skywalker's home planet?"),
],
)
def test_chain(model_name: str, question: str) -> None:
from langchain_openai import ChatOpenAI
graph = OntotextGraphDBGraph(
query_endpoint="http://localhost:7200/repositories/starwars",
query_ontology="CONSTRUCT {?s ?p ?o} "
"FROM <https://swapi.co/ontology/> WHERE {?s ?p ?o}",
)
chain = OntotextGraphDBQAChain.from_llm(
ChatOpenAI(temperature=0, model_name=model_name), graph=graph, verbose=True
)
try:
chain.invoke({chain.input_key: question})
except ValueError:
pass

@@ -14,6 +14,7 @@ EXPECTED_ALL = [
"GraphCypherQAChain",
"GraphQAChain",
"GraphSparqlQAChain",
"OntotextGraphDBQAChain",
"HugeGraphQAChain",
"HypotheticalDocumentEmbedder",
"KuzuQAChain",

@@ -0,0 +1,2 @@
def test_import() -> None:
from langchain.chains import OntotextGraphDBQAChain # noqa: F401

@@ -53,7 +53,7 @@ langchain-openai = { path = "libs/partners/openai", develop = true }
[tool.poetry.group.typing.dependencies]
[tool.codespell]
skip = '.git,*.pdf,*.svg,*.pdf,*.yaml,*.ipynb,poetry.lock,*.min.js,*.css,package-lock.json,example_data,_dist,examples,templates'
skip = '.git,*.pdf,*.svg,*.pdf,*.yaml,*.ipynb,poetry.lock,*.min.js,*.css,package-lock.json,example_data,_dist,examples,templates,*.trig'
# Ignore latin etc
ignore-regex = '.*(Stati Uniti|Tense=Pres).*'
# whats is a typo but used frequently in queries so kept as is
