A Comparison of Query Rewriting Techniques For DL-Lite
1 Introduction
2 Preliminaries
For P an atomic role, a basic role has the form P or P⁻. For A an atomic
concept and R a basic role, an antecedent concept has the form A or ∃R, and a
consequent concept has the form A, ∃R, or ∃R.A. A DL-LiteR TBox is a set of
inclusion assertions (or inclusion axioms) of the form A ⊑ C or R1 ⊑ R2, where
A is an antecedent concept, C is a consequent concept, and R1 and R2 are basic
roles. The former inclusion assertion is called a concept inclusion, and the latter
a role inclusion. An ABox is a set of membership assertions of the form A(a)
or P(a, b), where A is an atomic concept, P is an atomic role, and a and b are
constants. A DL-LiteR Knowledge Base (KB) K is a tuple ⟨T, A⟩, where T is a
DL-LiteR TBox, and A is an ABox. The semantics of K is defined as usual [3].
We use the well-known definitions of constants, variables, function symbols,
terms, and atoms of first-order logic [8]. A Horn clause C is an expression of the
form D0 ← D1 ∧ ... ∧ Dn, where each Di is an atom. The atom D0 is called the
head, and the set {D1, ..., Dn} is called the body. Variables that occur in the body
at most once and do not occur in the head are called unbound variables and may
be denoted with the symbol _; all other variables are called bound variables.
A Horn clause is safe if every variable occurring in the head also occurs in the
body. The depth of a term t is defined as depth(t) = 0 if t is a constant or a
variable, and depth(f(s1, ..., sn)) = 1 + max(depth(si)) for 1 ≤ i ≤ n if t is a
functional term f(s1, ..., sn). The notion of depth for atoms and Horn clauses is
extended in the natural way. An atom Di occurring in a Horn clause C is said
to be deepest in C if depth(Di) ≥ depth(Dj) for every body atom Dj of C. A
conjunctive query over a DL KB K is a safe Horn clause whose head predicate
does not occur in K, and whose body is a set of atoms whose predicates are
concept and role names occurring in K. A union of conjunctive queries over K is
a set of conjunctive queries over K with the same head up to variable renaming
[4]. A tuple of constants a⃗ is a certain answer to a union of conjunctive queries
Q over K if and only if K ∪ Q ⊨ Q_P(a⃗), where Q_P is the head predicate of Q,
and Q is considered to be a set of universally quantified implications with the
usual first-order semantics [8]. The set of all answers to Q over K is denoted by
ans(Q, K). Given a conjunctive query Q and a TBox T, a query Q′ is said to be
a rewriting of Q w.r.t. T if ans(Q, ⟨T, A⟩) = ans(Q′, A) for every ABox A.
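The definitions of term depth, safety, and unbound variables given above can be made concrete with a small sketch. The class and function names below (Var, Const, Func, Atom, Clause, term_depth, and so on) are illustrative assumptions and are not taken from any implementation discussed in this paper; the depth of an atom is taken here to be the maximum depth of its arguments, which is one natural reading of "extended in the natural way".

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Func:
    symbol: str
    args: Tuple["Term", ...]


Term = Union[Var, Const, Func]


@dataclass(frozen=True)
class Atom:
    predicate: str
    args: Tuple[Term, ...]


@dataclass(frozen=True)
class Clause:
    head: Atom
    body: Tuple[Atom, ...]


def term_depth(t: Term) -> int:
    """depth(t) = 0 for constants and variables, 1 + max argument depth otherwise."""
    if isinstance(t, Func):
        return 1 + max((term_depth(s) for s in t.args), default=0)
    return 0


def atom_depth(a: Atom) -> int:
    """Assumed extension to atoms: the maximum depth among the arguments."""
    return max((term_depth(t) for t in a.args), default=0)


def term_vars(t: Term):
    """Yield every variable occurrence in a term (with repetitions)."""
    if isinstance(t, Var):
        yield t
    elif isinstance(t, Func):
        for s in t.args:
            yield from term_vars(s)


def atom_vars(a: Atom):
    """Yield every variable occurrence in an atom (with repetitions)."""
    for t in a.args:
        yield from term_vars(t)


def is_safe(c: Clause) -> bool:
    """A clause is safe if every head variable also occurs in the body."""
    body_vars = {v for a in c.body for v in atom_vars(a)}
    return set(atom_vars(c.head)) <= body_vars


def unbound_variables(c: Clause):
    """Variables occurring exactly once in the body and not in the head."""
    counts = Counter(v for a in c.body for v in atom_vars(a))
    head_vars = set(atom_vars(c.head))
    return {v for v, n in counts.items() if n == 1 and v not in head_vars}


# Example clause: Q(x) <- teaches(x, z). Here z is unbound and could be written "_".
x, z = Var("x"), Var("z")
example = Clause(Atom("Q", (x,)), (Atom("teaches", (x, z)),))
```

In this sketch unbound variables are computed rather than stored, so the symbol _ is only a display convention.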
In this section we briefly present the two rewriting algorithms we consider in our
empirical evaluation: the CGLLR algorithm and our resolution-based algorithm.
Both algorithms transform a conjunctive query Q and a DL-LiteR TBox T into
a union of conjunctive queries Q′ that is a rewriting of Q w.r.t. T.
The CGLLR Algorithm This algorithm uses the axioms of T as rewriting
rules and applies them to the body atoms of Q. The rewriting rules are based
on a partial function ref(D, α) that takes as input an axiom α ∈ T and a body
atom D of Q. We define ref(D, α) as follows:
4 Héctor Pérez-Urbina, Boris Motik, and Ian Horrocks
Note 1. Each axiom of the form A ⊑ ∃R.B is uniquely associated with a function
symbol f.
The Main Difference The presented techniques mainly differ in their handling
of existential quantification: while our algorithm deals with axioms containing
existential quantifiers on the right-hand side by introducing functional
terms, the CGLLR algorithm does so by restricting the applicability of such
axioms and relying on the reduction step. We explore this difference by means of
an example (taken from [6]). Consider a DL-LiteR TBox T that consists of the
following axioms:

    Professor ⊑ ∃teaches        (1)
    ∃teaches⁻ ⊑ Student         (2)

The TBox T states that every professor teaches at least one individual, and that
every individual who is taught is a student. Consider the query

    Q(x) ← teaches(x, y) ∧ Student(y)        (3)

We first analyze the execution of the CGLLR algorithm. In the first iteration,
the axiom (1) is not applicable to teaches(x, y) because teaches(x, y) has more
than one bound variable. The reason why the applicability of (1) has to be
restricted in this case is that the CGLLR algorithm does not keep track of
information about role successors. In fact, naively allowing axioms of the form
of (1) to be applicable in such a case would result in the loss of soundness.
To illustrate this point, suppose that (1) were applicable to teaches(x, y) and
ref(teaches(x, y), (1)) = Professor(x); the algorithm would then obtain

    Q(x) ← Professor(x) ∧ Student(y)        (4)

Note that the relation between x and y is lost; that is, the fact that the
individual represented by y must be a teaches-successor of the individual
represented by x is not captured by query (4). In particular, if T were to
contain axiom (1) only, then query (4) would clearly produce wrong answers.
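The loss of soundness can be checked on a tiny instance. The sketch below assumes that query (3) is Q(x) ← teaches(x, y) ∧ Student(y) and that query (4) is Q(x) ← Professor(x) ∧ Student(y); the ABox, the constants p and s, the fresh element n0, and the helper evaluate are all hypothetical and serve only as an illustration. Query (4) is evaluated directly over the ABox, whereas query (3) is evaluated over one particular model of the KB whose TBox contains axiom (1) only.

```python
def evaluate(head_vars, body, facts):
    """Answers of a conjunctive query over a finite set of facts.

    body is a list of (predicate, variables) pairs; every argument is a
    variable name. facts is a set of (predicate, constants) pairs.
    """
    answers = set()

    def extend(i, binding):
        if i == len(body):
            answers.add(tuple(binding[v] for v in head_vars))
            return
        pred, args = body[i]
        for fact_pred, fact_args in facts:
            if fact_pred != pred or len(fact_args) != len(args):
                continue
            new_binding = dict(binding)
            # Bind each variable, rejecting the fact on a conflicting binding.
            if all(new_binding.setdefault(v, c) == c
                   for v, c in zip(args, fact_args)):
                extend(i + 1, new_binding)

    extend(0, {})
    return answers


# Hypothetical ABox: p is a professor, s is an unrelated student.
abox = {("Professor", ("p",)), ("Student", ("s",))}

q3_body = [("teaches", ("x", "y")), ("Student", ("y",))]   # assumed query (3)
q4_body = [("Professor", ("x",)), ("Student", ("y",))]     # assumed query (4)

# Query (4) evaluated over the ABox alone returns p.
answers_q4 = evaluate(("x",), q4_body, abox)

# One model of the KB with axiom (1) only: p teaches a fresh element n0
# that is not a student. Query (3) has no answer in this model, so p is
# not a certain answer to (3).
model = abox | {("teaches", ("p", "n0"))}
answers_q3_in_model = evaluate(("x",), q3_body, model)
```

Because p is returned by (4) but is not a certain answer to (3) over this KB, a rewriting containing (4) would be unsound.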
In the next iteration, neither (1) nor (2) is applicable to any body atom of (5),
so no query is added in the reformulation step. In the reduction step, however,
the algorithm produces

    Q(x) ← teaches(x, _)        (6)

by unifying the body atoms of (5). In the following iteration, the axiom (1) is
applicable to the only body atom of (6), producing

    Q(x) ← Professor(x)        (7)

Note that without the reduction step, the algorithm would not have produced
query (7). It can easily be verified that no more queries unique up to variable
renaming can be produced; thus, the algorithm returns {(3), (5), (6), (7)}.
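The reduction step used in this run can be sketched as the unification of two body atoms followed by the removal of duplicate atoms. The sketch assumes that query (5) is Q(x) ← teaches(x, y) ∧ teaches(_, y), with the unbound variable written here as z, and it handles only function-free atoms whose arguments are all variables. The names unify, apply_subst, and reduce_query are hypothetical and are not taken from the CGLLR implementation.

```python
def resolve(subst, v):
    """Follow the substitution chain for variable v to its representative."""
    while v in subst:
        v = subst[v]
    return v


def unify(a, b):
    """Most general unifier of two atoms whose arguments are all variables.

    Returns a dict mapping variables to variables, or None if the atoms
    differ in predicate or arity.
    """
    (p, args1), (q, args2) = a, b
    if p != q or len(args1) != len(args2):
        return None
    subst = {}
    for s, t in zip(args1, args2):
        s, t = resolve(subst, s), resolve(subst, t)
        if s != t:
            subst[t] = s  # map the second atom's variable onto the first's
    return subst


def apply_subst(subst, atom):
    pred, args = atom
    return (pred, tuple(resolve(subst, v) for v in args))


def reduce_query(head, body, i, j):
    """Unify body atoms i and j, apply the unifier, and drop duplicate atoms."""
    subst = unify(body[i], body[j])
    if subst is None:
        return None
    new_body = []
    for atom in body:
        atom = apply_subst(subst, atom)
        if atom not in new_body:
            new_body.append(atom)
    return apply_subst(subst, head), new_body


# Assumed query (5): Q(x) <- teaches(x, y), teaches(z, y), with z unbound.
head5 = ("Q", ("x",))
body5 = [("teaches", ("x", "y")), ("teaches", ("z", "y"))]
head6, body6 = reduce_query(head5, body5, 0, 1)
```

In the result, y occurs only once in the body and not in the head, so it has become unbound; this is what makes axiom (1) applicable in the following iteration.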
We now analyze the execution of our algorithm. The axioms (1) and (2) are
translated into the following clauses:

    teaches(x, f(x)) ← Professor(x)        (8)
    Student(x) ← teaches(y, x)             (9)

Note the difference between queries (4) and (11). Since the function symbol f
is uniquely associated with clause (8), unlike query (4), query (11) captures the
fact that the individual represented by f (x) must be a teaches-successor of the
individual represented by x. It can easily be verified that no other clause is
produced in the first step.
Clearly, ff(R) = {(3), (9), (13)}; therefore, we have that unfold(ff(R)) consists
of ff(R) and the query

    Q(x) ← teaches(x, y) ∧ teaches(z, y)        (14)

which results from unfolding (9) into (3). Finally, since the clause (9) does not
have the same head predicate as query (3), our algorithm returns {(3), (13), (14)}.
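The two filters applied at the end of this run, namely dropping clauses that contain functional terms and keeping only clauses with the head predicate of the query, can be sketched as follows. The clause contents attached to the labels (3), (8), (9), (11), (13), and (14) are assumptions consistent with the discussion above rather than verbatim copies, the set R is deliberately partial, and the unfolding step itself is not implemented: its result (14) is supplied as data. Variables are plain strings and a functional term such as f(x) is written as the tuple ("f", "x").

```python
def has_functional_term(atom):
    """True if some argument of the atom is a functional term (a tuple)."""
    _, args = atom
    return any(isinstance(t, tuple) for t in args)


def function_free(clauses):
    """Keep only the clauses in which no atom contains a functional term."""
    return {
        label: (head, body)
        for label, (head, body) in clauses.items()
        if not has_functional_term(head)
        and not any(has_functional_term(a) for a in body)
    }


def with_head_predicate(clauses, predicate):
    """Keep only the clauses whose head uses the given predicate."""
    return {label: c for label, c in clauses.items() if c[0][0] == predicate}


# Assumed (partial) contents of the saturated clause set R.
R = {
    "(3)": (("Q", ("x",)), [("teaches", ("x", "y")), ("Student", ("y",))]),
    "(8)": (("teaches", ("x", ("f", "x"))), [("Professor", ("x",))]),
    "(9)": (("Student", ("x",)), [("teaches", ("y", "x"))]),
    "(11)": (("Q", ("x",)), [("Professor", ("x",)), ("Student", (("f", "x"),))]),
    "(13)": (("Q", ("x",)), [("Professor", ("x",))]),
}

ff = function_free(R)

# Result of unfolding (9) into (3), supplied by hand in this sketch.
unfolded = dict(ff)
unfolded["(14)"] = (("Q", ("x",)), [("teaches", ("x", "y")), ("teaches", ("z", "y"))])

rewriting = with_head_predicate(unfolded, "Q")
```

Under these assumptions, ff keeps the clauses labelled (3), (9), and (13), and the final filter removes (9) because its head predicate is Student rather than Q.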
The use of functional terms is what mainly distinguishes our algorithm from
the CGLLR algorithm. This distinctive feature makes our approach more goal-
oriented, in the sense that it does not need to derive the queries produced by
the reduction step of the CGLLR algorithm in order to be complete. Moreover,
we have shown that every clause containing functional terms produced in the
saturation step of our algorithm can be safely dropped. Furthermore, our
algorithm handles qualified existential quantification natively, so it does not
need to introduce auxiliary roles. These are the main reasons why we conjecture that our
algorithm will often produce smaller rewritings than the CGLLR algorithm.
4 Evaluation
The main goal of the evaluation is to compare the algorithms described in Section
3 w.r.t. the size of the rewritings they produce. Since a rewriting containing fewer
queries than another is not necessarily less complex, we consider the size of a
rewriting as being the number of symbols needed to represent it in the standard
datalog notation. In order to give an indication of likely performance, we also
present the running time of both implementations; we point out, however, that
no special care was taken to obtain time-efficient implementations and that the
times reported correspond to the computation of the rewritings only (we did not
evaluate the computed rewritings).
Size of the rewritings (number of symbols), per ontology and query

Ontology              System     Q0       Q1        Q2        Q3        Q4
(panel title lost)    REQUIEM    454      762       6,525     11,911    3,761
                      C          454      812       6,525     11,911    16,255
(panel title lost)    REQUIEM    42       70        98        126       154
                      C          42       92        262       734       1,824
PATH5                 REQUIEM    122      286       486       708       938
                      C          298      2,814     23,740    200,696   1,690,902
STOCKEXCHANGE         REQUIEM    158      11,422    56,536    111,092   466,896
                      C          158      13,680    121,674   173,088   1,602,203
UNIVERSITY            REQUIEM    118      10,378    29,376    113,270   279,266
                      C          286      18,496    151,848   348,782   822,279
ADOLENA               REQUIEM    21,933   7,122     10,108    33,454    70,320
                      C          39,593   116,137   413,760   461,549   7,747,561

Running time (milliseconds), per ontology and query

Ontology              System     Q0       Q1        Q2        Q3        Q4
(panel title lost)    REQUIEM    250      266       297       359       282
                      C          1        1         31        47        78
(panel title lost)    REQUIEM    1        1         1         16        32
                      C          1        16        16        16        31
PATH5                 REQUIEM    15       15        94        922       24,172
                      C          1        15        141       3,546     265,875
STOCKEXCHANGE         REQUIEM    16       93        625       1,438     21,704
                      C          1        78        922       1,109     80,031
UNIVERSITY            REQUIEM    32       78        156       1,828     7,594
                      C          1        62        531       4,610     16,219
ADOLENA               REQUIEM    203      79        109       281       657
                      C          157      687       4,187     8,000     1,119,000

query subsumption check on the rewritings. For every clause C produced in the
saturation step, the forward subsumption optimization verifies whether there is
a previously produced clause C′ such that C′ subsumes C. If this is the case,
then C is not added to the set of produced clauses. The query subsumption
check takes the rewriting after the pruning step and gets rid of every clause C
for which there is another clause C′ such that C′ subsumes C. Note that in
the case of the CGLLR algorithm, there always is a previously produced query
that subsumes every query produced in the reduction step; therefore, forward
subsumption would get rid of every query produced in the reduction step on
the fly, which would compromise completeness. Moreover, given the size of the
rewritings, a straightforward optimization based on an a posteriori query
subsumption check may be impractical. REQUIEM produced the same rewritings as
REQUIEM-Full for 63% of the queries. This suggests that REQUIEM alone can
be used effectively in practice in various scenarios.
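The subsumption test underlying both optimizations can be sketched for function-free clauses as a search for a substitution that maps the subsuming clause into the subsumed one. The sketch assumes that all arguments are variables, treats the variables of the candidate clause as fixed symbols, and uses hypothetical names (match, subsumes, forward_add); the two queries q5 and q6 are assumed stand-ins for a query and its reduction, not verbatim copies.

```python
def match(pattern, target, theta):
    """Extend theta so that pattern under theta equals target, or return None.

    Only the variables of pattern are instantiated (one-way matching).
    """
    (p, pattern_args), (q, target_args) = pattern, target
    if p != q or len(pattern_args) != len(target_args):
        return None
    theta = dict(theta)
    for s, t in zip(pattern_args, target_args):
        if theta.setdefault(s, t) != t:
            return None
    return theta


def subsumes(c1, c2):
    """True if some substitution maps c1's head to c2's head and every body
    atom of c1 to a body atom of c2."""
    (head1, body1), (head2, body2) = c1, c2
    theta = match(head1, head2, {})
    if theta is None:
        return False

    def search(i, theta):
        if i == len(body1):
            return True
        for atom in body2:
            extended = match(body1[i], atom, theta)
            if extended is not None and search(i + 1, extended):
                return True
        return False

    return search(0, theta)


def forward_add(produced, clause):
    """Forward subsumption: add clause unless an earlier clause subsumes it."""
    if any(subsumes(old, clause) for old in produced):
        return False
    produced.append(clause)
    return True


# Assumed stand-ins for a query and the query obtained from it by reduction.
q5 = (("Q", ("x",)), [("teaches", ("x", "y")), ("teaches", ("z", "y"))])
q6 = (("Q", ("x",)), [("teaches", ("x", "y"))])

produced = [q5]
added = forward_add(produced, q6)
```

Here q5 subsumes q6, so forward_add rejects q6; this is the behaviour that would discard the queries produced by the reduction step of the CGLLR algorithm on the fly.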
5 Future Work
References
1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley,
1995.
2. F. Baader, S. Brandt, and C. Lutz. Pushing the EL Envelope. In International
Joint Conferences on Artificial Intelligence (IJCAI-05), 2005.
3. F. Baader and W. Nutt. Basic Description Logics, chapter 2, pages 47–100.
Cambridge University Press, 2003.
4. F. Baader and W. Snyder. Unification Theory. In A. Robinson and A. Voronkov,
editors, Handbook of Automated Reasoning, volume I, chapter 8, pages 445–532.
Elsevier Science, 2001.
5. L. Bachmair and H. Ganzinger. Resolution Theorem Proving. In A. Robinson
and A. Voronkov, editors, Handbook of Automated Reasoning, volume 1, chapter 2,
pages 19–100. North Holland, 2001.
6. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable
Reasoning and Efficient Query Answering in Description Logics: The DL-Lite
Family. J. of Automated Reasoning, 2007.
7. D. Calvanese, G. De Giacomo, M. Lenzerini, D. Nardi, and R. Rosati.
Description Logic Framework for Information Integration. In Principles of
Knowledge Representation and Reasoning, pages 2–13, 1998.
8. C.-L. Chang and R. C.-T. Lee. Symbolic Logic and Mechanical Theorem Proving.
Academic Press, Inc., Orlando, FL, USA, 1997.
9. R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data Exchange: Semantics and
Query Answering. In ICDT, pages 207–224, 2003.
10. J. Heflin and J. Hendler. A portrait of the semantic web in action. IEEE Intelligent
Systems, 16(2):54–59, 2001.
11. C. M. Keet, R. Alberts, A. Gerber, and G. Chimamiwa. Enhancing web portals
with ontology-based data access: The case study of South Africa’s accessibility
portal for people with disabilities. In OWLED, 2008.
12. M. Lenzerini. Data Integration: a theoretical perspective. In PODS ’02: Proceedings
of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of
database systems, pages 233–246, New York, NY, USA, 2002. ACM Press.
13. H. Pérez-Urbina, B. Motik, and I. Horrocks. Rewriting Conjunctive Queries under
Description Logic Constraints. In Proceedings of the International Workshop on
Logics in Databases, May 2008.
14. M. Rodriguez-Muro, L. Lubyte, and D. Calvanese. Realizing ontology based data
access: A plug-in for Protégé. In Proc. of the Workshop on Information Integration
Methods, Architectures, and Systems (IIMAS 2008), pages 286–289. IEEE Computer
Society Press, 2008.
15. R. Rosati. On conjunctive query answering in EL. In Proceedings of the 2007
International Workshop on Description Logics (DL2007), CEUR-WS, 2007.
16. R. van der Meyden. Logical Approaches to Incomplete Information: A Survey. In
J. Chomicki and G. Saake, editors, Logics for Databases and Information Systems,
pages 307–356. Kluwer, 1998.
17. J. Widom. Research Problems in Data Warehousing. In 4th International
Conference on Information and Knowledge Management, pages 25–30, Baltimore,
Maryland, 1995.