
A Comparison of Query Rewriting Techniques for DL-Lite

Héctor Pérez-Urbina, Boris Motik, and Ian Horrocks

Oxford University Computing Laboratory, Oxford, England
{hector.perez-urbina,boris.motik,ian.horrocks}@comlab.ox.ac.uk

1 Introduction

An incomplete database is defined by a set of constraints and a partial database instance [1]. Answering conjunctive queries over incomplete databases is an important computational task that lies at the core of many problems, such as information integration [12], data exchange [9], and data warehousing [17]. Given a query and an incomplete database, the task is to compute the so-called certain answers: the tuples that satisfy the query in every database instance that conforms to the partial instance and satisfies the constraints [16]. It is well known that answering queries under general constraints is undecidable; therefore, the expressivity of the constraint languages considered is typically restricted in order to achieve decidability.
Description Logics (DLs) [3] can be viewed as very expressive yet decidable fragments of first-order logic, which makes them natural candidates for constraint languages. DLs are a family of knowledge representation formalisms that represent a given domain in terms of concepts (unary predicates), roles (binary predicates), and individuals (constants). A DL Knowledge Base (KB) K = ⟨T, A⟩ consists of a terminological component T called the TBox, and an assertional component A called the ABox. In analogy to incomplete databases, the TBox can be seen as a conceptual schema containing the set of constraints, and the ABox as a partial instance of the schema. The use of DLs as constraint languages has already proven useful in a variety of scenarios, such as ontology-based information integration [7] and the Semantic Web [10]. Various DLs exist for which it is possible to transform a conjunctive query Q and a TBox T into a datalog query Q′, a so-called rewriting of Q w.r.t. T, such that for every ABox A, the answers to Q over T and A coincide with the answers to Q′ over A. Thus, the problem of answering Q over K = ⟨T, A⟩ for a specific A can be reduced to evaluating the datalog query Q′ over A. A motivation for answering queries via query rewriting is the prospect of using existing deductive database systems to evaluate the computed rewritings, thus taking advantage of the query optimization strategies provided by such systems.
The problem of conjunctive query rewriting for DLs has been considered by Rosati [15] for the EL family of languages [2], and by Calvanese et al. for the DL-Lite family of languages [6]. The rewriting algorithm proposed by Calvanese et al., which we refer to as the CGLLR algorithm, takes as input a conjunctive query Q and a DL-LiteR TBox T and computes a union of conjunctive queries that is a rewriting of Q w.r.t. T. The CGLLR algorithm lies at the core of reasoners such as QuOnto¹ and Owlgres². The aforementioned approaches are closely related; however, they have been developed for two different DL families. Motivated by the prospect of developing a unified algorithm, in our previous work [13] we described a novel rewriting algorithm for the DL ELHI [2]. The algorithm takes as input a conjunctive query Q and an ELHI TBox T, and produces a rewriting Q′ of Q w.r.t. T. A distinguishing feature of our algorithm is that it exhibits "pay-as-you-go" behavior: if T is in ELHI, ELH, or EL, then Q′ is a datalog query, as in [15]; if T is in DL-Lite+ [13], then Q′ consists of a union of conjunctive queries and a linear datalog query; and if T is in DL-LiteR or DL-Litecore, then Q′ is a union of conjunctive queries, as in [6]. Hence, our approach not only handles all the aforementioned sublanguages of ELHI, but is also worst-case optimal w.r.t. data complexity for all of them, which makes it a generalization and an extension of the approaches of [15] and [6].
Our algorithm and the CGLLR algorithm mainly differ in the way existential quantification is handled. In our algorithm, functional terms are used to keep track of role successors. In contrast, the CGLLR algorithm deals with existential quantification by relying on a so-called reduction step. Moreover, unlike our algorithm, the CGLLR algorithm does not handle qualified existential quantification axioms natively; instead, a preliminary step is required in which every such axiom is replaced with a suitable encoding that introduces a fresh role. Our algorithm does not derive the queries produced by the reduction step and it handles qualified existential quantification natively; therefore, we conjecture that our algorithm will often produce smaller rewritings than the CGLLR algorithm.
In order to gather empirical evidence to support our conjecture, we implemented both algorithms in their original form; that is, we did not implement any optimization to reduce the size of the rewritings. We then compared the two implementations using a benchmark suite containing different DL-LiteR ontologies with TBoxes of varying complexity, and a set of test queries derived from applications that use these ontologies. Our results clearly indicate that the reduction step of the CGLLR algorithm can lead to a significant increase in the size of the rewritings. The implementation of our algorithm, called REQUIEM (REsolution-based QUery rewrIting for Expressive Models), not only produced significantly smaller rewritings, but was also significantly faster in most cases. Finally, in order to determine the level of redundancy in the rewritings produced by our algorithm, we implemented an optimized version of REQUIEM, called REQUIEM-Full, that uses forward subsumption [5] and performs an a posteriori query subsumption check on the rewritings. REQUIEM produced the same rewritings as REQUIEM-Full for 63% of the queries. Therefore, our empirical evaluation suggests that REQUIEM alone can be effectively used in practice in various scenarios.
¹ http://www.dis.uniroma1.it/~quonto/
² http://pellet.owldl.com/owlgres/

2 Preliminaries

For P an atomic role, a basic role has the form P or P⁻. For A an atomic concept and R a basic role, an antecedent concept has the form A or ∃R, and a consequent concept has the form A, ∃R, or ∃R.A. A DL-LiteR TBox is a set of inclusion assertions or inclusion axioms of the form A ⊑ C or R1 ⊑ R2, where A is an antecedent concept, C is a consequent concept, and R1 and R2 are basic roles. The former inclusion assertion is called a concept inclusion, and the latter a role inclusion. An ABox is a set of membership assertions of the form A(a) or P(a, b), where A is an atomic concept, P is an atomic role, and a and b are constants. A DL-LiteR Knowledge Base (KB) K is a tuple ⟨T, A⟩, where T is a DL-LiteR TBox, and A is an ABox. The semantics of K is defined as usual [3].
We use the well-known definitions of constants, variables, function symbols, terms, and atoms of first-order logic [8]. A Horn clause C is an expression of the form D0 ← D1 ∧ ... ∧ Dn, where each Di is an atom. The atom D0 is called the head, and the set {D1, ..., Dn} is called the body. Variables that occur in the body at most once and do not occur in the head are called unbound variables and may be denoted with the symbol _; all other variables are called bound variables. A Horn clause is safe if every variable occurring in the head also occurs in the body. The depth of a term t is defined as depth(t) = 0 if t is a constant or a variable, and depth(f(s1, ..., sn)) = 1 + max(depth(si)) for 1 ≤ i ≤ n if t is a functional term f(s1, ..., sn). The notion of depth is extended to atoms and Horn clauses in the natural way. An atom Di occurring in a Horn clause C is said to be deepest in C if depth(Di) ≥ depth(Dj) for every body atom Dj of C. A conjunctive query over a DL KB K is a safe Horn clause whose head predicate does not occur in K, and whose body is a set of atoms whose predicates are concept and role names occurring in K. A union of conjunctive queries over K is a set of conjunctive queries over K with the same head up to variable renaming [4]. A tuple of constants ~a is a certain answer to a union of conjunctive queries Q over K if and only if K ∪ Q |= QP(~a), where QP is the head predicate of Q, and Q is considered to be a set of universally quantified implications with the usual first-order semantics [8]. The set of all certain answers to Q over K is denoted by ans(Q, K). Given a conjunctive query Q and a TBox T, a query Q′ is said to be a rewriting of Q w.r.t. T if ans(Q, ⟨T, A⟩) = ans(Q′, A) for every ABox A.
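To fix intuitions, here is a small Python sketch of ours (not part of the paper) of the depth function and of the "deepest atom" notion defined above; atoms are encoded as tuples whose first element is the predicate name, functional terms as nested tuples such as ("f", "x"), and all names are illustrative assumptions (this encoding is reused in the later sketches).

# A term is a string (a variable or constant) or a nested tuple ("f", s1, ..., sn)
# for a functional term; an atom is a tuple (predicate, term1, ..., termn);
# a Horn clause is a pair (head_atom, body_atoms).

def depth(term):
    # depth(t) = 0 for constants and variables,
    # depth(f(s1, ..., sn)) = 1 + max depth(si) for functional terms.
    if isinstance(term, tuple):
        return 1 + max(depth(s) for s in term[1:])
    return 0

def atom_depth(atom):
    return max((depth(t) for t in atom[1:]), default=0)

def deepest_body_atoms(clause):
    # The body atoms Di with depth(Di) >= depth(Dj) for every body atom Dj.
    head, body = clause
    m = max(atom_depth(a) for a in body)
    return [a for a in body if atom_depth(a) == m]

# Example: in Q(x) <- teaches(x, f(x)) ∧ Professor(x), the atom teaches(x, f(x))
# is deepest (depth 1).
clause = (("Q", "x"), (("teaches", "x", ("f", "x")), ("Professor", "x")))
assert deepest_body_atoms(clause) == [("teaches", "x", ("f", "x"))]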

3 Query Rewriting Algorithms

In this section we briefly present the two rewriting algorithms we consider in our empirical evaluation: the CGLLR algorithm and our resolution-based algorithm. Both algorithms transform a conjunctive query Q and a DL-LiteR TBox T into a union of conjunctive queries Q′ that is a rewriting of Q w.r.t. T.
The CGLLR Algorithm This algorithm uses the axioms of T as rewriting
rules and applies them to the body atoms of Q. The rewriting rules are based
on a partial function ref(D, α) that takes as input an axiom α ∈ T and a body
atom D of Q. We define ref(D, α) as follows:
– If D = A(x), then we have that (i) if α = B ⊑ A, then ref(D, α) = B(x); (ii) if α = ∃P ⊑ A, then ref(D, α) = P(x, _); and (iii) if α = ∃P⁻ ⊑ A, then ref(D, α) = P(_, x).
– If D = P(x, _), then we have that (i) if α = A ⊑ ∃P, then ref(D, α) = A(x); (ii) if α = ∃S ⊑ ∃P, then ref(D, α) = S(x, _); and (iii) if α = ∃S⁻ ⊑ ∃P, then ref(D, α) = S(_, x).
– If D = P(_, x), then we have that (i) if α = A ⊑ ∃P⁻, then ref(D, α) = A(x); (ii) if α = ∃S ⊑ ∃P⁻, then ref(D, α) = S(x, _); and (iii) if α = ∃S⁻ ⊑ ∃P⁻, then ref(D, α) = S(_, x).
– If D = P(x, y), then we have that (i) if either α = S ⊑ P or α = S⁻ ⊑ P⁻, then ref(D, α) = S(x, y); and (ii) if either α = S ⊑ P⁻ or α = S⁻ ⊑ P, then ref(D, α) = S(y, x).

If ref(D, α) is defined for α and D, we say that α is applicable to D.

Input: Conjunctive query Q, DL-LiteR TBox T

R = {Q};
repeat
    foreach query Q′ ∈ R do
        (reformulation) foreach atom D in Q′ do
            foreach axiom α ∈ T do
                if α is applicable to D then
                    R = R ∪ {Q′[D/ref(D, α)]};
                end
            end
        end
        (reduction) forall atoms D1, D2 in Q′ do
            if D1 and D2 unify then
                σ = MGU(D1, D2);
                R = R ∪ {λ(Q′σ)};
            end
        end
    end
until no query unique up to variable renaming can be added to R;
return R;

Algorithm 1: The CGLLR algorithm
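To make the rewriting rules and the reformulation step concrete, the following is a minimal Python sketch of ours (not the authors' implementation); it encodes atoms as tuples, covers only the concept-atom case of ref, shows a single reformulation pass, and omits the reduction step, so the axiom encoding and all names are illustrative assumptions.

# Atoms are tuples: ("A", "x") for A(x) and ("P", "x", "y") for P(x, y);
# the string "_" stands for an unbound variable. Axioms are encoded as
# ("isa", B, A)            for  B ⊑ A,
# ("exists-isa", P, A)     for  ∃P ⊑ A,
# ("exists-inv-isa", P, A) for  ∃P⁻ ⊑ A.

def ref(atom, axiom):
    """The concept-atom case of ref(D, α); returns None if α is not applicable."""
    if len(atom) != 2:            # role atoms are handled analogously in the paper
        return None
    a, x = atom
    kind, lhs, rhs = axiom
    if rhs != a:
        return None
    if kind == "isa":             # B ⊑ A rewrites A(x) to B(x)
        return (lhs, x)
    if kind == "exists-isa":      # ∃P ⊑ A rewrites A(x) to P(x, _)
        return (lhs, x, "_")
    if kind == "exists-inv-isa":  # ∃P⁻ ⊑ A rewrites A(x) to P(_, x)
        return (lhs, "_", x)
    return None

def reformulation_pass(queries, tbox):
    """One pass of the reformulation step over a set of queries.
    A query is a pair (head, body) with body a frozenset of atoms."""
    new = set(queries)
    for head, body in queries:
        for atom in body:
            for axiom in tbox:
                rewritten = ref(atom, axiom)
                if rewritten is not None:
                    new.add((head, (body - {atom}) | {rewritten}))
    return new

# The axiom ∃teaches⁻ ⊑ Student from the worked example later in this section,
# applied to Student(y), yields teaches(_, y).
tbox = [("exists-inv-isa", "teaches", "Student")]
q3 = (("Q", "x"), frozenset({("teaches", "x", "y"), ("Student", "y")}))
q5 = (("Q", "x"), frozenset({("teaches", "x", "y"), ("teaches", "_", "y")}))
assert q5 in reformulation_pass({q3}, tbox)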


The CGLLR algorithm is shown in Algorithm 1. Starting with the original query Q, the algorithm keeps producing queries in two steps until no further query that is unique up to variable renaming can be produced. In the reformulation step, the algorithm rewrites the body atoms of a given query Q′ by using applicable TBox axioms as rewriting rules, generating a new query for every atom reformulation. The expression Q[D/D′] denotes the conjunctive query obtained from Q by replacing the body atom D with a new atom D′. Then, in the reduction step, the algorithm produces a new query λ(Q′σ) for each pair of body atoms of Q′ that unify. The function MGU takes as input two atoms and returns their most general unifier [4]. The function λ takes as input a conjunctive query Q and returns a new conjunctive query obtained by replacing each occurrence of an unbound variable in Q with the symbol _. The reduction step is necessary to achieve completeness, since an axiom that is not applicable to any body atom of Q′ may be applicable to some body atom of λ(Q′σ).

As a final remark, we point out that the CGLLR algorithm does not handle qualified existential quantification natively; that is, there is no rewriting rule for axioms of the form A ⊑ ∃R.B. Instead, w.l.o.g. every axiom of this form occurring in T is replaced with the set of axioms {A ⊑ ∃P1, ∃P1⁻ ⊑ B, P1 ⊑ R}, where P1 is a fresh atomic role not occurring in T.

Input: Conjunctive query Q, DL-LiteR TBox T

R = Ξ(T) ∪ {Q};
repeat
    (saturation) forall clauses C1, C2 in R do
        R = R ∪ resolve(C1, C2);
    end
until no clause unique up to variable renaming can be added to R;
return {C | C ∈ unfold(ff(R)) and C has the same head predicate as Q};

Algorithm 2: Our resolution-based algorithm
Resolution-based Algorithm Our algorithm takes as input a conjunctive
query Q and an ELHI TBox T , and produces a datalog query that is a rewriting
of Q w.r.t. T . We have shown that if T is in DL-LiteR , then the rewriting is
a union of conjunctive queries. The algorithm first transforms Q and T into
clauses, and then computes the rewriting by using a resolution-based calculus to
derive new clauses from the initial set.
Our algorithm is presented in Algorithm 2, where we show only the parts of the original algorithm that are relevant to DL-LiteR. The rewriting is computed in four steps: clausification, saturation, unfolding, and pruning. The algorithm starts by transforming Q and T into a set of clauses Ξ(T) ∪ {Q}. The expression Ξ(T) denotes the set of clauses obtained from T according to Table 1. Then, the algorithm keeps producing clauses in the saturation step until no other clause unique up to variable renaming can be produced. The function resolve takes two clauses C1 and C2, and returns a set containing every clause CR that can be obtained by combining the atoms of C1 and C2 according to the inference templates shown in Table 2. A template with premises P1 and P2 and conclusion R denotes that, if C1 is a clause of the form of P1 and C2 is a clause of the form of P2, then resolve(C1, C2) contains all clauses of the form of R that can be constructed from C1 and C2; otherwise, resolve(C1, C2) = ∅. After the saturation step, the resulting function-free clauses are unfolded. The function ff takes a set of clauses N and returns the set of function-free clauses of N. The function unfold takes a set of clauses N and returns the set obtained by unfolding every clause in N; for example, if N = {QP(x) ← A(x), A(x) ← B(x)}, then unfold(N) = N ∪ {QP(x) ← B(x)}, which results from unfolding A(x) ← B(x) into QP(x) ← A(x). A formal description of the unfolding step can be found in [13]. In the last step, every clause that does not have the same head predicate as Q is dropped.
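Assuming the tuple encoding from the earlier sketches, the following is a rough illustration of ours of the ff and pruning steps; the unfolding step, which would additionally produce clause (14) of the worked example, is omitted for brevity, and this is a sketch rather than REQUIEM's actual code.

def has_function_term(atom):
    return any(isinstance(t, tuple) for t in atom[1:])

def ff(clauses):
    # ff(N): the function-free clauses of N.
    return {(h, b) for (h, b) in clauses
            if not has_function_term(h) and not any(has_function_term(a) for a in b)}

def prune(clauses, head_predicate):
    # Final step: keep only clauses whose head predicate is that of Q.
    return {(h, b) for (h, b) in clauses if h[0] == head_predicate}

# Clauses (3), (9), (10), and (13) of the worked example later in this section:
q3  = (("Q", "x"), (("teaches", "x", "y"), ("Student", "y")))
c9  = (("Student", "x"), (("teaches", "y", "x"),))
c10 = (("Student", ("f", "x")), (("Professor", "x"),))
c13 = (("Q", "x"), (("Professor", "x"),))

saturated = {q3, c9, c10, c13}
assert c10 not in ff(saturated)                  # clauses with functional terms are dropped
assert prune(ff(saturated), "Q") == {q3, c13}    # (9) is pruned: its head predicate is Student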
Table 1. Translating T into a set of clauses Ξ(T)

    DL-LiteR axiom          Corresponding clause(s)
    A ⊑ B                   B(x) ← A(x)
    A ⊑ ∃P.B                P(x, f(x)) ← A(x)   and   B(f(x)) ← A(x)
    A ⊑ ∃P⁻.B               P(f(x), x) ← A(x)   and   B(f(x)) ← A(x)
    ∃P ⊑ A                  A(x) ← P(x, y)
    ∃P⁻ ⊑ A                 A(x) ← P(y, x)
    P ⊑ S,  P⁻ ⊑ S⁻         S(x, y) ← P(x, y)
    P ⊑ S⁻,  P⁻ ⊑ S         S(x, y) ← P(y, x)

Note 1. Each axiom of the form A ⊑ ∃R.B is uniquely associated with a function symbol f.
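The following is a minimal Python sketch of ours of the clausification step Ξ(T) for two representative rows of Table 1; the axiom encoding is our own assumption, clauses and atoms are encoded as in the earlier sketches, and each qualified existential axiom receives its own fresh function symbol as required by Note 1.

# ("isa", A, B) encodes A ⊑ B; ("isa-exists", A, P, B) encodes A ⊑ ∃P.B.
# The remaining axiom forms of Table 1 are handled analogously.

def clausify(tbox):
    clauses = set()
    fresh = 0
    for axiom in tbox:
        if axiom[0] == "isa":                        # A ⊑ B  yields  B(x) <- A(x)
            _, a, b = axiom
            clauses.add(((b, "x"), ((a, "x"),)))
        elif axiom[0] == "isa-exists":               # A ⊑ ∃P.B  yields two clauses
            _, a, p, b = axiom
            fresh += 1
            f = ("f%d" % fresh, "x")                 # unique function symbol per axiom (Note 1)
            clauses.add(((p, "x", f), ((a, "x"),)))  # P(x, f(x)) <- A(x)
            clauses.add(((b, f), ((a, "x"),)))       # B(f(x)) <- A(x)
    return clauses

# Professor ⊑ ∃teaches (axiom (1) below) is written here with a dummy filler TOP,
# giving clause (8) of the worked example, teaches(x, f(x)) <- Professor(x).
tbox = [("isa-exists", "Professor", "teaches", "TOP")]
assert (("teaches", "x", ("f1", "x")), (("Professor", "x"),)) in clausify(tbox)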

The Main Difference The presented techniques mainly differ in their handling of existential quantification: while our algorithm deals with axioms containing existential quantifiers on the right-hand side by introducing functional terms, the CGLLR algorithm does so by restricting the applicability of such axioms and relying on the reduction step. We explore this difference by means of an example (taken from [6]). Consider a DL-LiteR TBox T that consists of the following axioms:

Professor ⊑ ∃teaches,    (1)
∃teaches⁻ ⊑ Student.    (2)

The TBox T states that a professor teaches at least someone, and that someone who is taught is a student. Consider the query

Q(x) ← teaches(x, y) ∧ Student(y).    (3)

We first analyze the execution of the CGLLR algorithm. In the first iteration,
the axiom (1) is not applicable to teaches(x, y) because teaches(x, y) has more
than one bound variable. The reason why the applicability of (1) has to be
restricted in this case is that the CGLLR algorithm does not keep track of
information about role successors. In fact, naively allowing axioms of the form
of (1) to be applicable in such a case would result in the loss of soundness.
To illustrate this point, suppose that (1) were applicable to teaches(x, y) and
ref(teaches(x, y), (1)) = Professor(x); the algorithm would then obtain

Q(x) ← Professor(x) ∧ Student(y). (4)

Note that the relation between x and y is lost; that is, the fact that the individual represented by y must be a teaches-successor of the individual represented
by x is not captured by query (4). In particular, if T were to contain axiom (1)
only, then query (4) would clearly produce wrong answers.
Table 2. Inference templates for the function resolve

  From C(x) ← B(x) and B(f(x)) ← A(x), derive C(f(x)) ← A(x).
  From B(x) ← P(x, y) and P(x, f(x)) ← A(x), derive B(x) ← A(x).
  From B(x) ← P(x, y) and P(f(x), x) ← A(x), derive B(f(x)) ← A(x).
  From B(x) ← P(y, x) and P(x, f(x)) ← A(x), derive B(f(x)) ← A(x).
  From B(x) ← P(y, x) and P(f(x), x) ← A(x), derive B(x) ← A(x).
  From S(x, y) ← P(x, y) and P(x, f(x)) ← A(x), derive S(x, f(x)) ← A(x).
  From S(x, y) ← P(x, y) and P(f(x), x) ← A(x), derive S(f(x), x) ← A(x).
  From S(x, y) ← P(y, x) and P(x, f(x)) ← A(x), derive S(f(x), x) ← A(x).
  From S(x, y) ← P(y, x) and P(f(x), x) ← A(x), derive S(x, f(x)) ← A(x).
  From QP(~u) ← B(t) ∧ ⋀ Di(~ti) and B(f(x)) ← A(x), derive QP(~u)σ ← A(x)σ ∧ ⋀ Di(~ti)σ,
    where σ = MGU(B(t), B(f(x))) and B(t) is deepest in its clause.
  From QP(~u) ← P(s, t) ∧ ⋀ Di(~ti) and P(x, f(x)) ← A(x), derive QP(~u)σ ← A(x)σ ∧ ⋀ Di(~ti)σ,
    where σ = MGU(P(s, t), P(x, f(x))) and P(s, t) is deepest in its clause.
  From QP(~u) ← P(s, t) ∧ ⋀ Di(~ti) and P(f(x), x) ← A(x), derive QP(~u)σ ← A(x)σ ∧ ⋀ Di(~ti)σ,
    where σ = MGU(P(s, t), P(f(x), x)) and P(s, t) is deepest in its clause.

Although the applicability of (1) is restricted, the axiom (2) is applicable to Student(y) in (3), producing

Q(x) ← teaches(x, y) ∧ teaches(_, y).    (5)

In the next iteration, neither (1) nor (2) are applicable to any body atom of (5),
so no query is added in the reformulation step. In the reduction step, however,
the algorithm produces

Q(x) ← teaches(x, _),    (6)

by unifying the body atoms of (5). In the following iteration, the axiom (1) is
applicable to the only body atom of (6), producing

Q(x) ← Professor(x). (7)

Note that without the reduction step, the algorithm would not have produced
query (7). It can easily be verified that no more queries unique up to variable
renaming can be produced; thus, the algorithm returns {(3), (5), (6), (7)}.

We now analyze the execution of our algorithm. The axioms (1) and (2) are translated into the following clauses:

teaches(x, f(x)) ← Professor(x),    (8)
Student(x) ← teaches(y, x).    (9)

In the saturation step the algorithm produces

Student(f(x)) ← Professor(x),    (resolve((8), (9)))    (10)
Q(x) ← Professor(x) ∧ Student(f(x)),    (resolve((3), (8)))    (11)
Q(x) ← teaches(x, f(x)) ∧ Professor(x),    (resolve((3), (10)))    (12)
Q(x) ← Professor(x).    (resolve((8), (12)))    (13)

Note the difference between queries (4) and (11). Since the function symbol f
is uniquely associated with clause (8), unlike query (4), query (11) captures the
fact that the individual represented by f (x) must be a teaches-successor of the
individual represented by x. It can easily be verified that no other clause is
produced in the first step.
Clearly, ff(R) = {(3), (9), (13)}; therefore, we have that unfold(ff(R)) consists
of ff(R) and the query

Q(x) ← teaches(x, y) ∧ teaches(z, y), (14)

which results from unfolding (9) into (3). Finally, since the clause (9) does not
have the same head predicate as query (3), our algorithm returns {(3), (13), (14)}.
The use of functional terms is what mainly distinguishes our algorithm from the CGLLR algorithm. This distinctive feature makes our approach more goal-oriented, in the sense that it does not need to derive the queries produced by the reduction step of the CGLLR algorithm in order to be complete. Moreover, we have shown that every clause containing functional terms produced in the saturation step of our algorithm can be safely dropped. Furthermore, our algorithm handles qualified existential quantification natively, so it does not need to introduce auxiliary roles. These are the main reasons why we conjecture that our algorithm will often produce smaller rewritings than the CGLLR algorithm.

4 Evaluation
The main goal of the evaluation is to compare the algorithms described in Section 3 w.r.t. the size of the rewritings they produce. Since a rewriting containing fewer queries than another is not necessarily less complex, we take the size of a rewriting to be the number of symbols needed to represent it in standard datalog notation. In order to give an indication of likely performance, we also present the running time of both implementations; we point out, however, that no special care was taken to obtain time-efficient implementations and that the times reported correspond to the computation of the rewritings only (we did not evaluate the computed rewritings over any data).
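The paper does not spell out the symbol count beyond this sentence; purely as an illustration, and assuming the tuple encoding of the earlier sketches, one plausible reading is the following (our guess, not the measure actually implemented for the evaluation).

def rewriting_size(clauses):
    # Count every predicate, function-symbol, variable, and constant occurrence.
    def term_symbols(t):
        if isinstance(t, tuple):                     # functional term f(s1, ..., sn)
            return 1 + sum(term_symbols(s) for s in t[1:])
        return 1
    def atom_symbols(a):
        return 1 + sum(term_symbols(t) for t in a[1:])
    return sum(atom_symbols(head) + sum(atom_symbols(a) for a in body)
               for head, body in clauses)

# Q(x) <- Professor(x) contributes 4 symbol occurrences under this count.
assert rewriting_size({(("Q", "x"), (("Professor", "x"),))}) == 4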

The implementation of our algorithm is called REQUIEM, and we refer to our implementation of the CGLLR algorithm as C. Both implementations are available at REQUIEM's Web site³. All tests were performed on a desktop computer with a 2.59 GHz Intel processor and 1.87 GB of RAM, running Windows XP. We used Java 1.6.0 Update 7 with a maximum heap size of 256 MB.
Test Ontologies and Queries We selected test DL-LiteR ontologies with varying numbers of axioms containing existential quantification. We considered ontologies that were developed in the context of real projects. The queries we used were selected based on canonical examples used in the corresponding project.
VICODI is an ontology developed in the EU-funded project of the same name,⁴ which captures information about European history contextualized w.r.t. location, time, and subject. STOCKEXCHANGE is an ontology developed for ontology-based data access [14] that represents the domain of financial institutions of the European Union. UNIVERSITY is a DL-LiteR version of LUBM⁵, a benchmark developed at Lehigh University for testing the performance of ontology management and reasoning systems; the ontology describes the organizational structure of universities. ADOLENA (Abilities and Disabilities OntoLogy for ENhancing Accessibility) is an ontology developed for ontology-based data access for the South African National Accessibility Portal [11]; the ontology describes abilities, disabilities, and devices. We decided to include two synthetic ontologies in our tests in order to have a controlled scenario to help us understand the impact of the reduction step of the CGLLR algorithm. PATH1 and PATH5 model information about graphs: nodes are represented by individuals, and edges are ABox assertions of the form edge(a, b). The ontology PATH5 contains concepts representing paths of length 1 to 5, while PATH1 contains a concept representing paths of length 1 only. All the ontologies and queries are available at REQUIEM's Web site.
Results Figure 1 shows the size of the rewritings produced by REQUIEM and C for the ontologies and queries considered. Figure 2 shows the time it took each implementation to compute the rewritings.
As can be seen in Figure 1, the rewritings produced by REQUIEM were never larger than those produced by C and were often significantly smaller. The most extreme case was the fifth query over ADOLENA, for which REQUIEM produced 624 queries, as opposed to the more than 78,500 produced by C. We compared the actual rewritings in order to verify whether, for every query QR produced by REQUIEM, there was a query QP produced by C that is equivalent to QR up to variable renaming. It turned out that this was the case for all of our tests.
A closer inspection of the rewritings produced by C revealed that most of them contained a considerable number of queries produced in the reduction step of the algorithm. The impact of the reduction step can be clearly seen in the tests involving PATH1 and PATH5. The queries we used with these two ontologies are of the form Qn(x0) ← edge(x0, x1) ∧ ... ∧ edge(xn, xn+1), for 0 ≤ n ≤ 4.
³ http://www.comlab.ox.ac.uk/projects/requiem/
⁴ http://www.vicodi.org/
⁵ http://swat.cse.lehigh.edu/projects/lubm/

Fig. 1. Size of the rewritings (number of symbols)

                    Q0       Q1       Q2       Q3         Q4
  VICODI
    REQUIEM        454      762    6,525   11,911      3,761
    C              454      812    6,525   11,911     16,255
  PATH1
    REQUIEM         42       70       98      126        154
    C               42       92      262      734      1,824
  PATH5
    REQUIEM        122      286      486      708        938
    C              298    2,814   23,740  200,696  1,690,902
  STOCKEXCHANGE
    REQUIEM        158   11,422   56,536  111,092    466,896
    C              158   13,680  121,674  173,088  1,602,203
  UNIVERSITY
    REQUIEM        118   10,378   29,376  113,270    279,266
    C              286   18,496  151,848  348,782    822,279
  ADOLENA
    REQUIEM     21,933    7,122   10,108   33,454     70,320
    C           39,593  116,137  413,760  461,549  7,747,561

Clearly, every pair of body atoms in queries of this form unify; therefore, for every Q′ with m body atoms, C will produce m(m−1)/2 new queries (one for each pair of body atoms) in the reduction step. The most extreme case for these ontologies was the fifth query over PATH5, in which REQUIEM produced 16 queries, as opposed to the more than 25,000 produced by C. Note that for every query Q′′ produced from a query Q′ in the reduction step, there is a substitution σ such that Q′σ = Q′′, in which case we say that Q′ subsumes Q′′. It is well known that every such query Q′′ can be dropped from the rewriting without sacrificing completeness [8]. Identifying such queries, however, is not straightforward, since the CGLLR algorithm does not keep track of which queries were produced in the reduction step. Notably, several of the rewritings produced by C contained a considerable number of queries containing the auxiliary roles introduced to simulate axioms of the form A ⊑ ∃R.B (see Section 3). We believe that the CGLLR algorithm could benefit if such axioms were handled natively. To the best of our knowledge, however, there does not exist a version of the CGLLR algorithm that does so.
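As a back-of-the-envelope illustration of this quadratic effect (ours, not a figure from the paper):

from math import comb

# Qn has m = n + 1 body atoms edge(x0, x1), ..., edge(xn, xn+1), so a single
# reduction pass over Qn produces m(m-1)/2 new queries, each of which is then
# reformulated and reduced further in later iterations.
for n in range(5):
    m = n + 1
    print("Q%d:" % n, comb(m, 2), "queries from one reduction pass")  # Q4: 10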
The results shown in Figure 2 suggest that REQUIEM will be significantly faster than C when the queries are relatively complex and/or the ontologies contain a relatively large number of existential quantification axioms. The most extreme case was again the fifth query over ADOLENA, for which REQUIEM took a little over half a second, as opposed to the almost 20 minutes taken by C.

Finally, in order to determine the level of redundancy in the rewritings produced by our algorithm, we implemented REQUIEM-Full, an optimized version of REQUIEM that uses forward subsumption [5] and performs an a posteriori query subsumption check on the rewritings.
Fig. 2. Running time (milliseconds)

                    Q0       Q1       Q2       Q3         Q4
  VICODI
    REQUIEM        250      266      297      359        282
    C                1        1       31       47         78
  PATH1
    REQUIEM          1        1        1       16         32
    C                1       16       16       16         31
  PATH5
    REQUIEM         15       15       94      922     24,172
    C                1       15      141    3,546    265,875
  STOCKEXCHANGE
    REQUIEM         16       93      625    1,438     21,704
    C                1       78      922    1,109     80,031
  UNIVERSITY
    REQUIEM         32       78      156    1,828      7,594
    C                1       62      531    4,610     16,219
  ADOLENA
    REQUIEM        203       79      109      281        657
    C              157      687    4,187    8,000  1,119,000

For every clause C produced in the saturation step, the forward subsumption optimization verifies whether there is a previously produced clause C′ such that C′ subsumes C. If this is the case, then C is not added to the set of produced clauses. The query subsumption check takes the rewriting after the pruning step and removes every clause C for which there is another clause C′ such that C′ subsumes C. Note that in the case of the CGLLR algorithm, there always is a previously produced query that subsumes every query produced in the reduction step; therefore, forward subsumption would discard every query produced in the reduction step on the fly, which would compromise completeness. Moreover, given the size of the rewritings, a straightforward optimization based on an a posteriori query subsumption check may be impractical. REQUIEM produced the same rewritings as REQUIEM-Full for 63% of the queries. This suggests that REQUIEM alone can be used effectively in practice in various scenarios.
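To illustrate the subsumption test underlying both optimizations, here is a small Python sketch of ours of the standard check (C′ subsumes C if some substitution σ maps the head and every body atom of C′ onto the head and body atoms of C); it treats every argument as a variable, uses naive backtracking, and is not REQUIEM-Full's actual code.

def match(atom, target, sigma):
    # Extend the substitution sigma so that atom maps onto target, if possible.
    if atom[0] != target[0] or len(atom) != len(target):
        return None
    sigma = dict(sigma)
    for s, t in zip(atom[1:], target[1:]):
        if s in sigma and sigma[s] != t:
            return None
        sigma[s] = t
    return sigma

def subsumes(c1, c2):
    """True if clause c1 subsumes clause c2. Every argument term is treated as
    a variable here, which suffices for the queries discussed above."""
    head1, body1 = c1
    head2, body2 = c2
    def search(atoms, sigma):
        if not atoms:
            return True
        first, rest = atoms[0], atoms[1:]
        return any(s2 is not None and search(rest, s2)
                   for s2 in (match(first, b, sigma) for b in body2))
    sigma0 = match(head1, head2, {})
    return sigma0 is not None and search(list(body1), sigma0)

# Clause (13), Q(x) <- Professor(x), subsumes query (4) from Section 3,
# Q(x) <- Professor(x) ∧ Student(y), but not the other way around.
c13 = (("Q", "x"), (("Professor", "x"),))
q4  = (("Q", "x"), (("Professor", "x"), ("Student", "y")))
assert subsumes(c13, q4) and not subsumes(q4, c13)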

5 Future Work

We plan to develop optimization techniques for REQUIEM in order to reduce or, when possible, eliminate recursion in the computed rewritings. Additionally, we plan to use REQUIEM in an ontology-based data access system based on existing database technology in order to evaluate the practicality of our approach for query answering. Finally, we plan to extend REQUIEM into a reasoner for OWL 2 QL.

References
1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley,
1995.
2. F. Baader, S. Brandt, and C. Lutz. Pushing the EL Envelope. In International
Joint Conferences on Artificial Intelligence (IJCAI-05), 2005.
3. F. Baader and W. Nutt. Basic Description Logics, chapter 2, pages 47–100. Cambridge University Press, 2003.
4. F. Baader and W. Snyder. Unification Theory. In A. Robinson and A. Voronkov,
editors, Handbook of Automated Reasoning, volume I, chapter 8, pages 445–532.
Elsevier Science, 2001.
5. L. Bachmair and H. Ganzinger. Resolution Theorem Proving. In A. Robinson
and A. Voronkov, editors, Handbook of Automated Reasoning, volume 1, chapter 2,
pages 19–100. North Holland, 2001.
6. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable Reasoning and Efficient Query Answering in Description Logics: The DL-Lite Family. J. of Automated Reasoning, 2007.
7. D. Calvanese, G. De Giacomo, M. Lenzerini, D. Nardi, and R. Rosati. Description Logic Framework for Information Integration. In Principles of Knowledge Representation and Reasoning, pages 2–13, 1998.
8. C.-L. Chang and R. C.-T. Lee. Symbolic Logic and Mechanical Theorem Proving.
Academic Press, Inc., Orlando, FL, USA, 1997.
9. R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data Exchange: Semantics and
Query Answering. In ICDT, pages 207–224, 2003.
10. J. Heflin and J. Hendler. A portrait of the semantic web in action. IEEE Intelligent
Systems, 16(2):54–59, 2001.
11. C. M. Keet, R. Alberts, A. Gerber, and G. Chimamiwa. Enhancing web portals with ontology-based data access: The case study of South Africa's accessibility portal for people with disabilities. In OWLED, 2008.
12. M. Lenzerini. Data Integration: a theoretical perspective. In PODS ’02: Proceedings
of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of
database systems, pages 233–246, New York, NY, USA, 2002. ACM Press.
13. H. Pérez-Urbina, B. Motik, and I. Horrocks. Rewriting Conjunctive Queries under
Description Logic Constraints. In Proceedings of the International Workshop on
Logics in Databases, May 2008.
14. M. Rodriguez-Muro, L. Lubyte, and D. Calvanese. Realizing ontology based data access: A plug-in for Protégé. In Proc. of the Workshop on Information Integration Methods, Architectures, and Systems (IIMAS 2008), pages 286–289. IEEE Computer Society Press, 2008.
15. R. Rosati. On conjunctive query answering in EL. In Proceedings of the 2007
International Workshop on Description Logics (DL2007), CEUR-WS, 2007.
16. R. van der Meyden. Logical Approaches to Incomplete Information: A Survey. In
J. Chomicki and G. Saake, editors, Logics for Databases and Information Systems,
pages 307–356. Kluwer, 1998.
17. J. Widom. Research Problems in Data Warehousing. In 4th International Conference on Information and Knowledge Management, pages 25–30, Baltimore, Maryland, 1995.
