Volume 5
DOV M. GABBAY
C. J. HOGGER
J. A. ROBINSON
Editors
CLARENDON PRESS
HANDBOOK OF LOGIC
IN ARTIFICIAL INTELLIGENCE
AND LOGIC PROGRAMMING
Editors
Executive Editor
Dov M. Gabbay
Administrator
Jane Spurr
Edited by
DOV M. GABBAY
and
C. J. HOGGER
Imperial College of Science, Technology and Medicine
London
and
J. A. ROBINSON
Syracuse University, New York
CLARENDON PRESS • OXFORD
1998
Oxford University Press, Great Clarendon Street, Oxford OX2 6DP
Oxford New York
Athens Auckland Bangkok Bogota Bombay
Buenos Aires Calcutta Cape Town Dar es Salaam
Delhi Florence Hong Kong Istanbul Karachi
Kuala Lumpur Madras Madrid Melbourne
Mexico City Nairobi Paris Singapore
Taipei Tokyo Toronto Warsaw
and associated companies in
Berlin Ibadan
A catalogue record for this book is available from the British Library
ISBN 0 19 853792 1
The Handbooks
The Handbook of Logic in Artificial Intelligence and Logic Programming
and its companion, the Handbook of Logic in Computer Science, have been
created in response to a growing need for an in-depth survey of the appli-
cation of logic in AI and computer science.
We see the creation of the Handbook as a combination of authoritative
exposition, comprehensive survey, and fundamental research exploring the
underlying unifying themes in the various areas. The intended audience is
graduate students and researchers in the areas of computing and logic, as
well as other people interested in the subject. We assume as background
some mathematical sophistication. Much of the material will also be of
interest to logicians and mathematicians.
The tables of contents of the volumes were finalized after extensive dis-
cussions between Handbook authors and second readers. The first two
volumes present the background logic and mathematics extensively used
in artificial intelligence and logic programming. The point of view is ap-
plication oriented. The other volumes present major areas in which the
methods are used. These include: Volume 1—Logical foundations; Volume
2—Deduction methodologies; Volume 3—Nonmonotonic reasoning and un-
certain reasoning; Volume 4—Epistemic and temporal reasoning.
The chapters, which in many cases are of monographic length and scope,
are written with emphasis on possible unifying themes. The chapters have
an overview, introduction, and main body. A final part is dedicated to
London D. M. Gabbay
July 1997
Contents
List of contributors xiv
Introduction: Logic and Logic Programming Languages
Michael J. O'Donnell 1
1 Introduction 1
1.1 Motivation 1
1.2 A notational apology 3
2 Specifying logic programming languages 7
2.1 Semantic systems and semantic consequences 7
2.2 Query Systems, questions and answers 11
2.3 Examples of logic programming languages 15
3 Implementing logic programming languages 37
3.1 Proof systems 37
3.2 Soundness and completeness of proof systems 40
3.3 Programming systems 44
3.4 Soundness and completeness of programming systems 49
3.5 Proof-theoretic foundations for logic programming 56
4 The uses of semantics 57
4.1 Logical semantics vs. denotational semantics 57
4.2 Logical semantics vs. initial/final-algebra and Herbrand semantics 58
Index 789
Contributors
J. Gallagher Department of Computer Science, University of Bristol,
University Walk, Bristol BS8 3PN.
P. M. Hill School of Computer Studies, The University of Leeds, Leeds
LS2 9JT.
J. Jaffar Department of Information Systems and Computer Science,
National University of Singapore, Kent Ridge, Singapore 0511.
A. C. Kakas Department of Computer Science, University of Cyprus,
PO Box 537, CY-1678 Nicosia, Cyprus.
R. A. Kowalski Department of Computing, Imperial College of Science,
Technology and Medicine, 180 Queen's Gate, London SW7 2BZ.
J. Lobo Department of Computer Science, University of Illinois at
Chicago Circle, Chicago, Illinois, USA.
D. W. Loveland Computer Science Department, Box 91029, Duke Uni-
versity, Durham, NC 27708-0129, USA.
M. J. Maher IBM Thomas J. Watson Research Center, PO Box 704,
Yorktown Heights, NY 10598, USA.
D. Miller Computer and Information Science, University of Pennsylva-
nia, Philadelphia, PA 19104-6389, USA.
J. Minker Department of Computer Science and Institute for Advanced
Computer Studies, University of Maryland, College Park, Maryland 20742,
USA.
G. Nadathur Department of Computer Science, University of Chicago,
1100 East 58th Street, Chicago, Illinois 60637, USA.
M. J. O'Donnell Department of Computer Science, University of Chicago,
1100 East 58th Street, Chicago, Illinois 60637, USA.
A. Pettorossi Electronics Department, University of Rome II, Via della
Ricerca Scientifica, I-00133 Roma, Italy.
M. Proietti Viale Manzoni 30, I-00185 Roma, Italy.
A. Rajasekar San Diego Supercomputer Center, La Jolla, California
92093, USA.
J. Shepherdson Department of Mathematics, University of Bristol,
University Walk, Bristol BS8 3PN.
1 Introduction
1.1 Motivation
Logic, according to Webster's dictionary [Webster, 1987], is 'a science that
deals with the principles and criteria of validity of inference and demon-
stration: the science of the formal principles of reasoning.' Such 'principles
and criteria' are always described in terms of a language in which infer-
ence, demonstration, and reasoning may be expressed. One of the most
useful accomplishments of logic for mathematics is the design of a particu-
lar formal language, the First Order Predicate Calculus (FOPC). FOPC is
so successful at expressing the assertions arising in mathematical discourse
that mathematicians and computer scientists often identify logic with clas-
sical logic expressed in FOPC. In order to explore a range of possible uses of
logic in the design of programming languages, we discard the conventional
identification of logic with FOPC, and formalize a general schema for a vari-
ety of logical systems, based on the dictionary meaning of the word. Then,
we show how logic programming languages may be designed systematically
for any sufficiently effective logic, and explain how to view Prolog, Datalog,
λProlog, Equational Logic Programming, and similar programming
languages, as instances of the general schema of logic programming. Other
generalizations of logic programming have been proposed independently by
Meseguer [Meseguer, 1989], Miller, Nadathur, Pfenning and Scedrov [Miller
et al., 1991], and Goguen and Burstall [Goguen and Burstall, 1992].
The purpose of this chapter is to introduce a set of basic concepts for
understanding logic programming, not in terms of its historical develop-
ment, but in a systematic way based on retrospective insights. In order to
achieve a systematic treatment, we need to review a number of elementary
definitions from logic and theoretical computer science and adapt them to
the needs of logic programming. The result is a slightly modified logical
notation, which should be recognizable to those who know the traditional
notation. Conventional logical notation is also extended to new and anal-
ogous concepts, designed to make the similarities and differences between
logical relations and computational relations as transparent as possible.
Computational notation is revised radically to make it look similar to log-
ical notation. The chapter is self-contained, but it includes references to
the logic and theoretical computer science literature for those who wish to
explore connections.
There are a number of possible motivations for developing, studying,
and using logic programming languages. Many people are attracted to
Prolog, the best known logic programming language, simply for the spe-
cial programming tools based on unification and backtracking search that
it provides. This chapter is not concerned with the utility of particular
logic programming languages as programming tools, but with the value
of concepts from logic, particularly semantic concepts, in the design, im-
plementation, and use of programming languages. In particular, while
denotational and algebraic semantics provide excellent tools to describe
important aspects of programming systems, and often to prove correct-
ness of implementations, we will see that logical semantics can exploit the
strong traditional consensus about the meanings of certain logical notations
to prescribe the behavior of programming systems. Logical semantics also
provides a natural approach, through proof systems, to verifiably correct
implementations, that is sometimes simpler than the denotational and al-
gebraic approaches. A comparison of the three styles of semantics will show
that denotational and algebraic semantics provide descriptive tools, logical
semantics provides prescriptive tools, and the methods of algebraic seman-
Q ?- T | D - F

means that F is an answer to the question Q, and D is a proof of F from the
hypotheses T. The corresponding computational notation

I ⊳ P □ C → O

means that in response to the input I, the program P may perform the
computation C, yielding output O. The correspondence between the arguments
Q and I, T and P, D and C, and F and O displays the crucial correspondence
between logic and computation that is at the heart of logic programming.
There are two closely related trinary notations.

Q ?- T |- F

means that F is an answer to Q that is formally derivable from T, where

T |- F

means that there exists a proof D such that T | D - F—that is, F is formally
derivable from T. Corresponding to the relation |- of formal derivability is
the relation |= of semantic entailment.

T |= F

means that F is semantically entailed by T, and

Q ?- T |= F

means that F is an answer to Q semantically entailed by T (Q ?- F and
T |= F), in analogy to Q ?- T |- F. The mathematical definition of semantic
entailment involves one more semantic relation. Let M be a model, and F
a formula.

M |= F

means that F is true in M.
Table 1 displays all of the special notations for semantic, proof-theoretic,
and computational relations. The precise meanings and applications of
these notations are developed at length in subsequent sections. The no-
tation described above is subscripted when necessary to distinguish the
logical and computational relations of different systems.
Logic, semantics:   M |= F;   T |= F;   Q ?- T |= F
Logic, proof:       T |- F;   T | D - F;   Q ?- T |- F;   Q ?- T | D - F
Query systems:      Q ?- F
Computation:        C → O;   I ⊳ P □ C;   I ⊳ P □→ O;   I ⊳ P □ C → O

Table 1. Special notations for logical and computational relations
truth. For example, well-known sets of formulae often come with a syntactic
operator to construct, from two formulae A and B, their logical conjunction
A ∧ B. The semantics for conjunctions is defined structurally, by the rule
M |= A ∧ B if and only if M |= A and M |= B. The formal analysis of this
chapter deals only with the abstract relation of a model to a formula that
holds in the state of the world represented by that model, not the internal
structure of that relation, because we are interested here in the use of
semantics for understanding logic programming, rather than the deeper
structure of semantics itself. Goguen's and Burstall's institutions [Goguen
and Burstall, 1992] are similar to semantic systems, but they capture in
addition the structural connection between syntax and semantics through
category theory, and show that the functions Models and Theory form a
Galois connection.
Notice that the sets F of formulae and M of models are not required to
be given effectively. In well-known semantic systems, the set of formulae is
normally computable, since formulae are normally finite syntactic objects,
and it is easy to determine mechanically whether a given object is a formula
or not. Infinite formulae, however, have important uses, and they can be
given practical computational interpretations, so we do not add any formal
requirement of computability. The set of models, on the other hand, is
typically quite complex, because models represent conceivable states of an
external world, rather than finite constructions of our own minds. In fact,
for many semantic systems there are technical set-theoretic problems even
in regarding the collection of models in the system as a set, but those
problems do not affect any of the results of this chapter.
In this chapter, basic concepts are illustrated through a running ex-
ample based on the shallow implicational calculus (SIC), designed to be
almost trivial, but just complex enough to make an interesting example.
More realistic examples are treated toward the end of the chapter.
Example 2.1.2. Let At be a set of atomic propositional formulae. The
set Fsh of formulae in the shallow implicational calculus is the smallest set
such that:
1. At ⊆ Fsh
2. If a, b ∈ At, then (a => b) ∈ Fsh
The set Msh of models in SIC is defined by
Msh = 2^At
The semantic-consequence relation |=sh has a simple behavior:
1. for atomic formulae a ∈ At, T |=sh a if and only if there is a finite
sequence (a0, ..., am) of atomic formulae such that a0 ∈ T, and
(ai => ai+1) ∈ T for all i < m, and am = a
2. T |=sh (a => b) if and only if there is a finite sequence (a0, ..., am) of
atomic formulae such that a0 ∈ T ∪ {a}, and (ai => ai+1) ∈ T for all
i < m, and am = b
We may think of the implications in T as directed edges in a graph whose
vertices are atomic formulae. Atomic formulae in T are marked true. An
atomic formula a is a semantic consequence of T precisely if there is a
directed path from some atomic formula in T to a. Similarly, an implication
a => b is a semantic consequence of T precisely if there is a directed path
from a, or from an atomic formula in T, to b. Notice that SIC satisfies the
deduction property [Andrews, 1986; Kleene, 1952; Gallier, 1986]:
T ∪ {a} |=sh b if and only if T |=sh (a => b).
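The graph reading above translates almost directly into Prolog. The following sketch is purely illustrative: the predicate names (atom_true/1, implies/2, reach/2, consequence/1), the term imp(A, B) standing for an implication (a => b), and the sample facts are assumptions of mine rather than part of the chapter's formal development, and the sketch assumes an acyclic implication graph.

% Sample explicit knowledge T: atomic formulae in T are atom_true/1
% facts; an implication (a => b) in T is an implies/2 fact.
atom_true(a).
implies(a, b).
implies(b, c).

% reach(X, Z): Z is reachable from X by zero or more implications in T.
% (Assumes an acyclic graph; a robust version would track visited atoms.)
reach(X, X).
reach(X, Z) :- implies(X, Y), reach(Y, Z).

% consequence(F): T |=sh F, following clauses 1 and 2 above.
consequence(B) :-                 % clause 1: atomic formula b
    atom_true(A), reach(A, B).
consequence(imp(A, B)) :-         % clause 2: the chain may start at a ...
    reach(A, B).
consequence(imp(_, B)) :-         % ... or at an atomic formula already in T
    atom_true(A0), reach(A0, B).

% ?- consequence(imp(a, c)).   succeeds: there is a path a -> b -> c.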
plays the part of the speaker above, the processor plays the part of the
auditor, the program is the utterance, and the logical meaning of the pro-
gram is the resulting state of knowledge produced in the auditor/processor.
Inputs to, computations of, and outputs from logic programs are treated
later.
Notice that this style of semantic analysis of communication does not
give either speaker or auditor direct access to inspect or modify the models
constituting the other's state of implicit knowledge. Rather, all such access
is mediated by the utterance of explicit logical formulae. Also, notice that
there is no attempt to construct a unique model to represent a state of
knowledge, or the information communicated by an utterance. Rather, an
increase in implicit knowledge is represented by a reduction in the variabil-
ity of members of a set of models, any one of which might represent the
real state of the world. Unless the semantic-consequence relation of a se-
mantic system is very easy to compute—which it seldom is—the difference
between implicit knowledge and effectively utterable explicit knowledge can
be quite significant. The proof systems of Section 3.1 help describe a way
in which implicit knowledge is made explicit, and yield a rough description
of the computations of logic programs.
The preceding scheme for representing communication of knowledge
deals naturally with a sequence of utterances, by iterating the process of
shrinking the auditor's set of models. There is no provision, however, for
analyzing any sort of interactive dialogue, other than as a pair of formally
unrelated monologues. The query systems of the next section introduce a
primitive sort of interactive question-answering dialogue.
Questions, like formulae, are normally finite syntactic objects, and the set
of all questions is normally computable, but we allow exceptions to the
normal case.
Q ?- F is intended to mean that F is an answer to Q. ?- is intended
only to determine the acceptable form for an answer to a question, not
to carry any information about correctness of an answer. For example, it
is reasonable to say that '2 + 2 = 5' is an incorrect answer to 'what is
2 + 2?', while '2 + 2 = 22' is correct, but not an answer. The correctness or
incorrectness of an answer is evaluated semantically with respect to explicit
knowledge.
Definition 2.2.2. Let Q = (FQ, Q, ?-) be a query system, and let
S = (FS, M, |=) be a semantic system with FQ ⊆ FS.
Q ?- T |= F means that F ∈ FQ is a semantically correct answer to
Q ∈ Q for explicit knowledge T ⊆ FS, defined by: Q ?- T |= F if and only if
Q ?- F and T |= F.
answer to questions imp (a) for all atomic formulae a. This problem may
be avoided by considering states of knowledge in which only implications are
known, or it may be addressed by changing the underlying semantic system
to one with a relevant interpretation of implication [Anderson and Belnap
Jr., 1975]. Second, (a => a) is a correct answer to the question imp(a).
(a => a) is a tautology, that is, it holds in all models, so it cannot give
any information about a state of knowledge. We could define a new query
system, in which only nontautologies are considered to be answers. Since,
for most useful logics, the detection of tautologies ranges from intractable to
impossible, such a technique is generally unsatisfying. A better approach
is to let a question present a set of atomic formulae that must not be
used in an answer, since the questioner considers them to be insufficiently
informative. We may find later that certain nontautological formulae are
uninformative for various reasons, and this technique reserves the flexibility
to handle those cases.
Example 2.2.4. Let rest-imp be a new formal symbol, and let
Qsi = {rest-imp(a, A) : a ∈ At and A ⊆ At}
rest-imp(a, A) ?-si (a => b) if and only if b ∈ At − A
Now (Fsh, Qsi, ?-si) is a query system representing the conceivable answers
to questions of the form 'what atomic formula not in A does a imply?'
The new query system of Example 2.2.4 may be used very flexibly
to guide answers toward the most informative implications of an atomic
formula a. If the explicit knowledge available to the auditor to answer
questions is finite, then there are only a finite number of atomic formulae
that can appear in an answer, so the sets of prohibited formulae may simply
be listed. In more sophisticated languages than SIC, we need some sort
of finite notation for describing large and even infinite sets of prohibited
answers.
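Continuing the Prolog sketch from Section 2.1 (reusing the assumed reach/2 and atom_true/1 predicates and the imp/2 representation of implications, with the prohibited set passed as a list), an answer to a rest-imp question is just a reachable atomic formula filtered against the prohibited set:

% rest_imp_answer(+A, +Prohibited, -Answer): Answer is a semantically
% correct answer imp(A, B) to rest-imp(A, Prohibited): B is reachable
% from A or from an atomic formula in T, and B is not prohibited.
rest_imp_answer(A, Prohibited, imp(A, B)) :-
    ( reach(A, B) ; atom_true(A0), reach(A0, B) ),
    \+ member(B, Prohibited).

% ?- rest_imp_answer(a, [a], Answer).
% Answer = imp(a, b) ;
% Answer = imp(a, c) ;
% ...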
Query systems allow a further enrichment of the analysis of communi-
cation. Once a speaker has communicated some implicit knowledge K to
an auditor by uttering formulae, a questioner (sometimes, but not always,
identical with the speaker) may ask a question Q, which the auditor tries
to answer by discovering a formula F such that Q ?- F (F is an answer
to the question Q), and F ∈ Theory(K) (F is correct according to the
implicit knowledge K).
So, given a set T of formulae expressing the knowledge Models(T), a
question Q provides an additional constraint on the search for a formula F
such that T |= F, to ensure that Q ?- F as well. In many cases, there is
more than one correct answer F such that Q ?- T |= F. Depending on the
Consequentially strongest answers are not necessarily unique, but all con-
sequentially strongest answers are semantically equivalent. Notice that the
comparison of strength for two answers F and G is done without taking
into account the knowledge T. That is, we require {F} |= G, rather than
T ∪ {F} |= G. This makes sense because T is known to the auditor, but
not necessarily to the questioner. Even if the questioner knows T, he may
not be able to derive its consequences. The whole purpose of the communi-
cation between questioner and auditor is to give the questioner the benefit
of the auditor's knowledge and inferential power. So, the value of an an-
swer to the questioner must be determined independently of the knowledge
used by the auditor in its construction (the alternative form T ∪ {F} |= G
holds trivially by monotonicity, so it carries no information anyway).
In order to illustrate the use of consequentially strongest answers, we
extend SIC to deal with conjunctions of implications.
Example 2.2.6. Expand the formulae of SIC to the set
conj-imp(c) ?-sc (a1 => b1) ∧ ... ∧ (am => bm) if and only if
ai = c for all i ≤ m
3. if f ∈ Funi for some i > 0 and t1, ..., ti ∈ Tp, then f(t1, ..., ti) ∈ Tp
Terms are intended to represent objects in some universe of discourse.
f(t1, ..., ti) is intended to represent the result of applying the function
denoted by f to the objects represented by t1, ..., ti. We often take the
liberty of writing binary function application in infix notation. For example,
if + ∈ Fun2 we may write (t1 + t2) for +(t1, t2). A ground term is a
term containing no variables.
The set Fp of formulae in FOPC is defined inductively as the least set
such that:
1. True, False ∈ Fp
2. if P ∈ Pred0, then P ∈ Fp
3. if P ∈ Predi for some i > 0 and t1, ..., ti ∈ Tp, then P(t1, ..., ti) ∈ Fp
4. if A, B ∈ Fp, then (A ∧ B), (A ∨ B), (A => B), (¬A) ∈ Fp
5. if A ∈ Fp and x ∈ V, then (∃x : A), (∀x : A) ∈ Fp
Formulae in FOPC are intended to represent assertions about the objects
represented by their component terms. True and False are the trivially
true and false assertions, respectively. P(t1, ..., ti) represents the assertion
that the relation denoted by P holds between t1, ..., ti. (A ∧ B), (A ∨ B),
(A => B), (¬A) represent the usual conjunction, disjunction, implication,
and negation. (∃x : A) represents 'there exists x such that A,' and (∀x : A)
represents 'for all x, A.' Parentheses may be dropped when they can be inferred
from normal precedence rules.
In a more general setting, it is best to understand Funi and Predi as
parameters giving a signature for first-order logic, and let them vary to
produce an infinite class of predicate calculi. For this chapter, we may take
Funi and Predi to be arbitrary but fixed. In many texts on mathematical
logic, the language of FOPC includes a special binary predicate symbol '='
for equality. We follow the Prolog tradition in using the pure first-order
predicate calculus, without equality, and referring to it simply as FOPC.
The intended meanings of FOPC formulae sketched above are formal-
ized by the traditional semantic system defined below. First, we need a set
of models for FOPC.
Definition 2.3.2 ([Andrews, 1986; Kleene, 1952; Gallier, 1986]).
Let 𝒰 be an infinite set, called the universe. Let U ⊆ 𝒰.
A variable assignment over U is a function ν : V → U.
A predicate assignment over U is a function
ρ : ∪{Predi : i ≥ 0} → ∪{2^(U^i) : i ≥ 0}
such that P ∈ Predi implies ρ(P) ⊆ U^i for all i ≥ 0.
A function assignment over U is a function
τ : ∪{Funi : i ≥ 0} → ∪{U^(U^i) : i ≥ 0}
such that f ∈ Funi implies τ(f) : U^i → U for all i ≥ 0.
If U ⊆ 𝒰, τ is a function assignment over U, and ρ is a predicate
Qp = {(what x1, ..., xi : F) : F ∈ Fp}
(what x1, ..., xi : F) ?-p G
if and only if
G = F[t1, ..., ti / x1, ..., xi] for some terms t1, ..., ti ∈ Tp
Now (Fp, Qp, ?-p) is a query system representing the conceivable single
answers to questions of the form 'for what terms t1, ..., ti does
F[t1, ..., ti / x1, ..., xi] hold?'
The query system above has a crucial role in the profound theoretical
connections between logic and computation. For each finite (or even
semicomputable) set T ⊆ Fp of formulae, and each question
Q = (what x1, ..., xi : F), the set
for some formula F ∈ Fp. Similarly, every partial computable function φ
may be defined by choosing an appropriate formula F ∈ Fp, and letting
φ(i) be the unique number j such that
Notice that the FOPC questions (what x1, ..., xi : F) do not allow trivial
tautological answers, such as the correct answer (a => a) to the question
imp(a) ('what atomic formula does a imply?', Example 2.2.3, Section 2.2).
In fact, (what x1, ..., xi : F) has a tautological answer if and only if F is
a tautology. FOPC questions avoid this problem through the distinction
between predicate symbols and function symbols. When we try to find an
answer F[t1, ..., ti / x1, ..., xi] to the question (what x1, ..., xi : F), the
information in the question (what x1, ..., xi : F) is carried largely through
predicate symbols, while the information in the answer F[t1, ..., ti / x1, ...,
xi] is carried entirely by the function symbols in t1, ..., ti, since the
predicate symbols in the formula F are already fixed by the question. It is
the syntactic incompatibility between the formula given in the question
and the terms substituted in by the answer that prevents tautological
answers. Suppose that FOPC were extended with a symbol choose, where
(choose x : F) is a term such that (∃x : F) implies F[(choose x : F)/x].
Then F[(choose x : F)/x] would be a trivially (but not quite tautologically)
correct answer to (what x : F) except when no correct answer exists.
The absence of trivial tautological answers does not mean that all an-
swers to FOPC questions are equally useful. In some cases a question
has a most general semantically correct answer. This provides a nice syn-
tactic way to recognize certain consequentially strongest answers, and a
useful characterization of all answers even when there is no consequentially
strongest answer.
Proposition 2.3.7. If G' is an instance of G, then G' is a semantic
consequence of G (G |=p G'). It follows immediately that if G is a
semantically correct answer to Q ∈ Qp (Q ?-p T |=p G), then G' is also a
semantically correct answer (Q ?-p T |=p G').
If G' is a variable renaming of G, then they are semantically equivalent.
2. for all G0 ∈ Fp, if G is an instance of G0 and Q ?-p T |=p G0, then
G0 is a variable renaming of G.
A set A ⊆ Fp of formulae is a most general set of correct answers to Q
for T if and only if
1. each formula F ∈ A is a most general answer to Q for T
2. for all formulae G ∈ Fp, if Q ?-p T |=p G, then G is an instance of
some F ∈ A
3. for all formulae F1, F2 ∈ A, if F2 is an instance of F1, then F2 = F1
It is easy to see that for each question Q ∈ Qp and set of formulae T ⊆ Fp
there is a most general set of correct answers (possibly empty or infinite).
Furthermore, the most general set of correct answers is unique up to
variable renaming of its members.
Notice that it is very easy to generate all correct answers to a question
in Qp from the most general set— they are precisely the instances of its
members. If Q has a consequentially strongest answer, then it has a conse-
quentially strongest answer that is also most general. If the most general
set of correct answers is the singleton set {F}, then F is a consequentially
strongest answer.
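In Prolog, reported answer substitutions play this role: every correct answer is an instance of some reported answer. The one-clause program and query below are my own illustration, not an example from the chapter.

% A hypothetical program consisting of a single clause.
p(f(X)).

% The query ?- p(Y). corresponds to the question (what y : p(y)), and
% Prolog reports a most general answer as a substitution:
%
%   ?- p(Y).
%   Y = f(_A).        % _A is a fresh variable; its name varies by system
%
% Every correct answer p(f(a)), p(f(g(b))), ... is an instance of p(f(X)).
% Here the most general set of correct answers is the singleton {p(f(X))},
% so p(f(X)) is also a consequentially strongest answer.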
Example 2.3.9. Let G ∈ Pred2 be a binary predicate symbol, where
G(t1, t2) is intended to assert that t1 is strictly larger than t2. Suppose that
objects in the universe have left and right portions, and that l, r ∈ Fun1
denote the operations that produce those portions. A minimal natural
state of knowledge about such a situation is
if and only if
Since a question always requests substitutions for all free variables, the
header 'what x1, ..., xi' is omitted and the question is abbreviated in the
form
completely determines the answer, actual Prolog output presents only the
substitution, in the form
RANGE X P
RANGE Y Q
which declare the variable X to range over tuples of the relation P and Y
to range over tuples of Q, the expression
∀X∃Y(X.D = Y.G)
systems the notations are equivalent. Another way of viewing this semantic
difference is to suppose that each database implicitly contains a closed
world assumption [Reiter, 1978] expressing the fact that only the objects
mentioned in the database exist. In FOPC with equality, the closed world
assumption may be expressed by the formula ∀x : x = c0 ∨ ... ∨ x = cn,
where c0, ..., cn are all of the constant symbols appearing in the database.
Without equality, we can only express the fact that every object acts just
like one of c0, ..., cn (i.e., it satisfies exactly the same formulae), and even
that requires an infinite number of formulae.
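Prolog's negation as failure gives one operational reading of such a closed world assumption: whatever the database cannot derive is treated as false. The tiny database below is my own illustration, unrelated to the DSL ALPHA example above.

% Hypothetical database; the only individuals mentioned are two cities.
city(boise).
city(chicago).

% Under the closed world assumption an unmentioned individual is assumed
% not to satisfy the predicate, which negation as failure computes:
%
%   ?- city(paris).
%   false.
%
%   ?- \+ city(paris).
%   true.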
Given the translation of DSL ALPHA expressions to FOPC formulae
suggested above, we may translate DSL ALPHA queries into FOPC ques-
tions. DSL ALPHA queries have the general form
GET
where P1, ..., Pi are relations in the database, Q is a new relational symbol
not occurring in the database, D1, ..., Di are domain names, and E is an
expression. Appropriate declarations of the ranges of variables in E must
be given before the query. Let F be a FOPC formula equivalent to E, using
the variables x1, ..., xi for the values P1.D1, ..., Pi.Di. Then, the effect of
the DSL ALPHA query above is to assign to Q the relation (set of tuples)
answering the question
That is,
tion, which looks even more distant from FOPC than the relational cal-
culus notation, but is still easily translatable into FOPC. See [Date, 1986;
Codd, 1972] for a description of relational algebra notation for database
queries, and a translation to the relational calculus notation.
2.3.4 Programming in equational logic
Another natural logical system in which to program is equational logic.
A large number of programming languages may be viewed essentially as
different restrictions of programming in equational logic.
Definition 2.3.13. Let the set V of variables, the sets Fun0, Fun1, ... of
constant and function symbols, and the set Tp of terms be the same as in
FOPC (see Definition 2.3.1). Let ≐ be a new formal symbol (we add the
dot to distinguish between the formal symbol for equality in our language
and the meaningful equality symbol = used in discussing the language).
The set F≐ of equational formulae (or simply equations) is
F≐ = {(t1 ≐ t2) : t1, t2 ∈ Tp}
Models for equational logic are the same as models for FOPC, omitting
the predicate assignments. Although ≐ behaves like a special binary
predicate symbol, it is given a fixed meaning (as are ∧, ∨, ¬, =>, ∃, ∀),
so it need not be specified in each model. An equational formula t1 ≐ t2
holds in a model precisely when t1 denotes the same object as t2 for every
variable assignment.
Definition 2.3.14. Let the infinite universe 𝒰 and the set of function
assignments be the same as in FOPC (Definition 2.3.4). If U ⊆ 𝒰 and τ
is a function assignment over U, then (U, τ) is a model of equational logic.
M≐ is the set of all models of equational logic.
Let the set of variable assignments be the same as in FOPC (Definition 2.3.2),
as well as the definition of a term valuation τν from a function
assignment τ and variable assignment ν (Definition 2.3.3). The classical
semantic system for equational logic is (F≐, M≐, |=≐), where
(U, τ) |=≐ (t1 ≐ t2) if and only if τν(t1) = τν(t2) for all variable assignments ν over U.
Models of equational logic are essentially algebras [Cohn, 1965; Gratzer,
1968; Mac Lane and Birkhoff, 1967]. The only difference is that algebras
are restricted to signatures—subsets, usually finite, of the set of constant
and function symbols. Such restriction does not affect any of the
properties discussed in this chapter. If T is a finite subset of F≐, then
the set of algebras Models(T) (restricted, of course, to an appropriate
signature) is called a variety. For example, the monoids are the models
of {m(x, m(y, z)) ≐ m(m(x, y), z), m(x, e) ≐ x, m(e, x) ≐ x} restricted to
the signature with one constant symbol e and one binary function symbol
m.
where tla is fixed by the language design, tpr, ..., tpr are determined by
the program, and tin, ..., tin are determined by the input. In principle,
other forms are possible, but it seems most natural to view the language
design as providing an operation to be applied to the program, producing
an operation to be applied to the input. For example, in a pure Lisp
eval interpreter, tla = eval(x1, nil). The program to be evaluated is tpr
(the second argument nil to the eval function indicates an empty list of
definitions under which to evaluate the program). Pure Lisp has no input—
conceptual input may be encoded into the program, or extralogical features
parameters, taken from the lambda calculus [Church, 1941; Stenlund, 1972;
Barendregt, 1984] . Since the equations for manipulating bindings were
never formalized precisely in the early days of Lisp, implementors may ar-
gue that their work is correct with respect to an unconventional definition
of substitution. Early Lispers seem to have been unaware of the logical
literature on variable substitution, and referred to the dynamic binding
problem as the 'funarg' problem.
Essentially all functional programming languages before the 1980s fail
to find certain semantically correct answers, due to infinite evaluation of
irrelevant portions of a term. In conventional Lisp implementations, for
example, the defining equation car(cons(x, y)) = x is not applied to a term
car(cons(t1, t2)) until both t1 and t2 have been converted to normal forms.
If the attempt to normalize t2 fails due to infinite computation, then the
computation as a whole fails, even though a semantically correct answer
might have been derived using only t1. Systems that fail to find a normal
form for car(cons(t1, t2)) unless both of t1 and t2 have normal forms are said
to have strict cons functions. The discovery of lazy evaluation [Friedman
and Wise, 1976; Henderson and Morris, 1976; O'Donnell, 1977] showed
how to avoid imposing unnecessary strictness on cons and other functions,
and many recent implementations of functional programming languages are
guaranteed to find all semantically correct answers. Of course, it is always
possible to modify defining equations so that the strict interpretation of a
function is semantically complete.
Example 2.3.19. Consider Lisp, with the standard equations for car and cdr
replaced by
car(cons(x, y)) = test(strict(y), x) and cdr(cons(x, y)) = test(strict(x), y)
Lazy evaluation with the new set of equations has the same effect as strict
evaluation with the old set.
Some definitions of functional programming language specify strictness
explicitly. One might argue that the strict version of cons was intended in
the original definition of Lisp [McCarthy, 1960], but strictness was never
stated explicitly there.
1. Fi ∈ T
2. Fi = (a => a) for some atomic formula a ∈ At
3. Fi = b, and there exist j, k < i such that Fj = a and Fk = (a => b)
for some atomic formulae a, b ∈ At
4. Fi = (a => c), and there exist j, k < i such that Fj = (a => b) and
Fk = (b => c) for some atomic formulae a, b, c ∈ At
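These four cases are easy to check mechanically. The Prolog sketch below is my own illustration, with an assumed representation: hypotheses as hyp/1 facts, atomic formulae as Prolog atoms, an implication (a => b) as the term imp(a, b), and a linear proof as a list of formulae.

% check_linear(+Proof): Proof is a linear SIC proof from the hypotheses
% given by the hyp/1 facts.
check_linear(Proof) :- check_lines(Proof, []).

check_lines([], _).
check_lines([F|Rest], Earlier) :-
    line_ok(F, Earlier),
    check_lines(Rest, [F|Earlier]).

line_ok(F, _) :- hyp(F).                       % case 1: hypothesis
line_ok(imp(A, A), _).                         % case 2: reflexivity
line_ok(B, Earlier) :-                         % case 3: modus ponens
    member(imp(A, B), Earlier), member(A, Earlier).
line_ok(imp(A, C), Earlier) :-                 % case 4: transitivity
    member(imp(A, B), Earlier), member(imp(B, C), Earlier).

% With hyp(a). hyp(imp(a, b)). hyp(imp(b, c)). asserted:
%   ?- check_linear([a, imp(a, b), b, imp(b, c), imp(a, c)]).
%   true.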
The set Psn of natural deduction proofs in SIC and the proof relation
|-Sn ⊆ 2^Fsh × Psn × Fsh are defined by simultaneous inductive definition
to be the least set and relation such that:
D is sound for S if and only if, for all T ⊆ FD and F ∈ FD,
T |- F implies T |= F
D is complete for S if and only if, for all T ⊆ FD and F ∈ FD,
T |= F implies T |- F
Each of the proposed proof systems for SIC is sound and complete
for the semantic system of SIC. The following proofs of completeness for
SIC are similar in form to completeness proofs in general, but unusu-
ally simple. Given a set of hypotheses T, and a formula F that is not
provable from T, we construct a model M satisfying exactly the set of
provable consequences of T within some sublanguage FF containing F
(Theory(M) ∩ FF ⊇ Theory(Models(T)) ∩ FF). In our example below,
FF is just the set of all shallow implicational formulae (Fsh), and the model
construction is particularly simple.
Example 3.2.2. Each of the proof systems (Fsh, PSn, |-Sn) of Example 3.1.3
and (Fsh, PSl, |-Sl) of Example 3.1.2 is sound and complete for
the semantic system (Fsh, Msh, |=sh) of Example 2.1.2.
The proofs of soundness involve elementary inductions on the size of
proofs. For the natural deduction system, the semantic correctness of a
proof follows from the correctness of its components; for the linear system,
correctness of a proof (F0, ..., Fm+1) follows from the correctness of the
prefix (F0, ..., Fm).
The proofs of completeness require construction, for each set T of formulae,
of a model M = {a ∈ At : T |-Sn a} (or T |-Sl a). So M |=sh a for all
atomic formulae a ∈ T ∩ At trivially. It is easy to show that M |=sh (a => b)
for all implications (a => b) ∈ T as well, since either a ∉ M, or b follows by
modus ponens from a and a => b, so b ∈ M. Finally, it is easy to show that
M |=sh F if and only if T |-Sn F (or T |-Sl F).
In richer languages than SIC, containing for example disjunctions or
negations, things may be more complicated. For example, if the disjunction
(A ∨ B) is in the set of hypotheses T, each model satisfying T must satisfy
one of A or B, yet neither may be a logical consequence of T, so one of them
must be omitted from FF. Similarly, in a language allowing negation, we
often require that every model satisfy either A or ¬A. Such considerations
complicate the construction of a model substantially.
Notice that the formal definitions above do not restrict the nature of
semantic systems and proof systems significantly. All sorts of nonsensical
formal systems fit the definitions. Rather, the relationships of soundness
and completeness provide us with conceptual tools for evaluating the be-
havior of a proof system, with respect to a semantic system that we have
Similarly,
Example 3.3.4. Let F=> = {(a => b) : a, b ∈ At} be the set of implicational
SIC formulae (Fsh − At, with Fsh and At defined in Example 2.1.2).
The set of implicational logic programs (P=>) is the set of finite subsets of
of (C0 => Cm) in the proof system of Example 3.1.2. The first line is an
instance of the reflexive rule, and subsequent lines alternate between
implications in T, and applications of transitivity.
In order to avoid uninformative outputs, such as (a => a), we need a
programming system with a slightly more sophisticated notion of when to
stop a computation.
Example 3.3.5. Let fail be a new formal symbol, and let the set of naive
implicational computations with failure be
(the set of finite and infinite sequences of atomic formulae, possibly ending
in the special object fail).
1. C0 = {a}
2. if Ci C A and Ci = 0, then there is an i + 1st element, and
a given question has more than one correct answer, completeness criteria
vary depending on the way in which we expect an output answer to be
chosen.
Definition 3.4.1. Let P = (P, I, C, O, □, →) be a programming system,
let S = (Fs, M, |=) be a semantic system, and let Q = (FQ, Q, ?-) be a
query system, with P ⊆ 2^Fs, I ⊆ Q, and O ⊆ Fs ∩ FQ.
P is sound for S and Q if and only if, for all P ∈ P and I ∈ I,
I ⊳ P □→ O implies I ?- P |= O
P is weakly complete for S and Q if and only if, for all P ∈ P and I ∈ I
such that I is semantically answerable for P (Definition 2.2.2), and for all
computations C ∈ C such that I ⊳ P □ C, there exists O ∈ O such that
C → O and I ?- P |= O
(O is unique because → is a partial function).
P is consequentially complete for S and Q if and only if P is weakly
complete and, in addition, O above is a consequentially strongest correct
answer ({O} |= N for all N ∈ FQ such that I ?- P |= N).
So, a programming system is sound if all of its outputs are correct an-
swers to input questions, based on the knowledge represented explicitly
by programs. A system is weakly complete if, whenever a correct answer
exists, every computation outputs some correct answer. A system is conse-
quentially complete if, whenever a correct answer exists, every computation
outputs a consequentially strongest correct answer. Notice that, for conse-
quential completeness, the strength of the output answer is judged against
all possible answers in the query system, not just those that are possible
outputs in the programming system, so we cannot achieve consequential
completeness by the trickery of disallowing the truly strongest answers.
A programming system provides another approach to analyzing a simple
form of communication. While semantic systems, proof systems, and query
systems yield insight into the meaning of communication and criteria for
evaluating the behavior of communicating agents, programming systems
merely describe that behavior. A programmer provides a program P to a
processor. A user (sometimes, but not always, identical with the program-
mer) provides an input I, and the processor performs a computation C such
that I ⊳ P □ C, from which the output O, if any, such that C → O, may be
extracted. We allow the mapping from computation to output to depend
on purely conventional rules that are adopted by the three agents. What
aspects of a computation are taken to be significant to the output is really
a matter of convention, not necessity. Often, only the string of symbols
displayed on some printing device is taken to be the output, but in various
contexts the temporal order in which they are displayed (which may be
different from the printed order if the device can backspace), the temporal
or spatial interleaving of input and output, the speed with which output
occurs, the color in which the symbols are displayed, which of several de-
vices is used for display, all may be taken as significant. Also, convention
determines the treatment of infinite computation as having no output, null
output, or some nontrivial and possibly infinite output produced incremen-
tally. The presentation of input to a computation is similarly a matter of
accepted convention, rather than formal computation.
In logic programming, where the programmer acts as speaker, the pro-
cessor as auditor, and the user as questioner, soundness of the program-
ming system guarantees that all outputs constitute correct answers. Vari-
ous forms of completeness guarantee that answers will always be produced
when they exist. In this sense, soundness and completeness mean that a
programming system provides a correct and powerful implementation of
the auditor in the speaker-auditor-questioner scenario of Section 2.2.
There is a close formal correspondence between programming systems
and pairs of proof and query systems: inputs correspond to questions,
programs correspond to sets of hypotheses, computations to proofs, and
outputs to theorems (for a different correspondence, in which programs
in the form of lambda terms correspond to natural deduction proofs, see
[Howard, 1980; Tait, 1967; Constable et al., 1986]—compare this to the in-
terpretation of formulae as queries and proofs as answers [Meseguer, 1989]).
Notice that both quaternary relations Q ?- T | D - F and I ⊳ P □ C → O
are typically computable, while both of the trinary relations Q ?- T |- F
and I ⊳ P □→ O are typically semicomputable. Furthermore, the defini-
tions of the trinary relations from the corresponding quaternary relations
are analogous. In both cases we quantify existentially over the third argu-
ment, which is variously a proof or a computation.
There is an important difference, however, between the forms of defi-
nition of the provable-answer relation Q ?- T | D - F, and of the
computational relation I ⊳ P □ C → O, reflecting the difference in intended uses
of these relations. This difference has only a minor impact on the rela-
tions definable in each form, but a substantial impact on the efficiency of
straightforward implementations based on the definitions. In the query-
proof domain, we relate formulae giving explicit knowledge (program) to
the proofs (computations) that can be constructed from that knowledge,
yielding formulae (outputs) that are provable consequences of the given
knowledge in the relation T | D - F. We independently relate questions
(inputs) to the answers (outputs) in the relation Q ?- F, and then take
the conjunction of the two. There is no formal provision for the question
(input) to interact with the knowledge formulae (program) to guide the
construction of the proof (computation)—the question (input) is only used
to select a completed proof. In the computational domain, we relate inputs
(questions) directly to programs (knowledge formulae) to determine com-
putations (proofs) that they can produce in the relation I ⊳ P □ C. Then,
we extract outputs (answer formulae) from computations (proofs) in the
relation C → O. The relation I ⊳ P □ C provides a formal concept that
may be used to represent the interaction of input (question) with program
(knowledge) to guide the construction of the computation (proof).
Given a proof system and a query system, we can construct a pro-
gramming system with essentially the same behavior. This translation is
intended as an exercise in understanding the formal correspondence be-
tween proofs and computations. Since our requirements for proof systems
and programming systems are quite different, this construction does not
normally lead to useful implementations.
Proposition 3.4.2. Let D = (F, D, |-) be a proof system, and let Q =
(F, Q, ?-) be a query system. Define the programming system
where
1. Q ⊳ T □ (P, F) if and only if Q ?- T | P - F
2. Q ⊳ T □ fail if and only if there are no P and F such that
Q ?- T | P - F
3. (P, F) → G if and only if F = G
4. fail → F is false for all F ∈ F
Then,
Q ?- T | P - F if and only if Q ⊳ T □ (P, F) → F
Therefore,
Q ?- T |- F if and only if Q ⊳ T □→ F
question 'what is the capital city of Idaho?' the abbreviated answer 'Boise'
is generally accepted as equivalent to the full answer 'the capital city of
Idaho is Boise.'
Proposition 3.4.3. Let P = (P, I, C, O, □, →) be a programming system.
Define the proof system D = (P ∪ (I × O), C, |-), where
1. T | C - (I, O) if and only if I ⊳ P □ C → O for some P ∈ T
2. T | C - P is false for all P ∈ P
Also define the query system Q = (I × O, I, ?-), where
1. I ?- (J, O) if and only if I = J.
Then,
Therefore,
Given the input question imp(a) ('what logical formula does a imply?'), a
possible computation is the infinite sequence
(a, b, a, b, ...)
Given the input question rest-imp(a, {a, b, c}) ('what logical formula not
in {a, b, c} does a imply?'), two possible computations with no output are
There is a correct answer, a => d, which is found by the computation (a, d).
The programming system of backtracking implicational computations
in Example 3.3.6 avoids the finite failure of the naive computations with
failure, but is still not weakly complete because of infinite computations. It
succeeds on the program and question above, with the unique computation
and the question rest-imp(a, {a, b, c, d}). There is no finite failing
computation, and the correct answer a => e is output by the computation
({a}, {b, c, d}, {b, d}, {a, d}, {a, e}, print(e))
({a}, {b, c, d}, {b, c, e}, {b, a, e}, {b, c, d, e}, {b, a, d, e}, {b, c, d, e}, ...)
proofs. Cut elimination theorems are usually associated with the predicate
calculus and its fragments, variants, and extensions. The impact of cut
elimination on proof strategies has been studied very thoroughly, leading
to an excellent characterization of the sequent-style proof systems that
are susceptible to generalizations of the simple goal-directed proof strategy
used in Prolog [Miller et al., 1991]. These proof-theoretic methods have
been applied successfully in some novel logics where the model-theoretic
semantics are not yet properly understood.
In the term rewriting literature, there are similar results on the nor-
malization of equational proofs. Many of these come from confluence (also
called Church-Rosser) results. A system of equations
car(cons(x, y)) = x
cdr(cons(x, y)) = y

s = cdr(car(cons(cons(a, s), b)))
  = cdr(car(cons(a, b)))
  = cdr(car(cons(cons(a, t), b)))
  = t
Acknowledgements
I am very grateful for the detailed comments that I received from the
readers, Bharat Jayaraman, Jose Meseguer, Dale Miller, Gopalan Nadathur
and Ed Wimmers. All of the readers made substantial contributions to the
correctness and readability of the chapter.
References
[Anderson and Belnap Jr., 1975] Alan Ross Anderson and Nuel D. Belnap
Jr. Entailment—the Logic of Relevance and Necessity, volume 1. Prince-
ton University Press, Princeton, NJ, 1975.
[Andrews, 1986] Peter B. Andrews. An Introduction to Mathematical Logic
and Type Theory: To Truth Through Proof. Computer Science and Ap-
plied Mathematics. Academic Press, New York, NY, 1986.
[Ashcroft and Wadge, 1977] E. Ashcroft and W. Wadge. Lucid: A non-
procedural language with iteration. Communications of the ACM,
20(7):519-526, 1977.
[Backus, 1974] John Backus. Programming language semantics and closed
applicative languages. In Proceedings of the 1st ACM Symposium on
Principles of Programming Languages, pages 71-86. ACM, 1974.
[Backus, 1978] John Backus. Can programming be liberated from the von
Neumann style? a functional style and its algebra of programs. Com-
munications of the ACM, 21(8):613-641, 1978.
[Barendregt, 1984] Hendrik Peter Barendregt. The Lambda Calculus: Its
Syntax and Semantics. North-Holland, Amsterdam, 1984.
[Belnap Jr. and Steel, 1976] Nuel D. Belnap Jr. and T. B. Steel. The Logic
of Questions and Answers. Yale University Press, New Haven, CT, 1976.
[Brooks et al., 1982] R. A. Brooks, R. P. Gabriel, and Guy L. Steele. An
optimizing compiler for lexically scoped Lisp. In Proceedings of the 1982
ACM Compiler Construction Conference, June 1982.
[Church, 1941] A. Church. The Calculi of Lambda-Conversion. Princeton
University Press, Princeton, New Jersey, 1941.
[Codd, 1970] E. F. Codd. A relational model of data for large shared data
banks. Communications of the ACM, 13(6), June 1970.
[Codd, 1971] E. F. Codd. A data base sublanguage founded on the rela-
tional calculus. In Proceedings of the 1971 ACM SIGFIDET Workshop
on Data Description, Access and Control, 1971.
[Codd, 1972] E. F. Codd. Relational completeness of data base sublan-
guages. In Data Base Systems, volume 6 of Courant Computer Science
Symposia. Prentice-Hall, Englewood Cliffs, NJ, 1972.
[Cohn, 1965] P. M. Cohn. Universal Algebra. Harper and Row, New York,
NY, 1965.
[Constable et al., 1986] Robert L. Constable, S. F. Allen, H. M. Brom-
ley, W. R. Cleaveland, J. F. Cremer, R. W. Harper, D. J. Howe,
Todd B. Knoblock, N. P. Mendler, Prakash Panangaden, J. T. Sasaki,
and Scott F. Smith. Implementing Mathematics with the Nuprl Proof
Development System. Prentice-Hall, Englewood Cliffs, NJ, 1986.
[Curry and Feys, 1958] H. B. Curry and R. Feys. Combinatory Logic, vol-
ume 1. North-Holland, Amsterdam, 1958.
[Date, 1986] C. J. Date. An Introduction to Database Systems. Systems
Programming. Addison-Wesley, Reading, MA, 4 edition, 1986.
[Dijkstra, 1976] Edsger W. Dijkstra. A Discipline of Programming.
Prentice-Hall, Englewood Cliffs, NJ, 1976.
[Dwelly, 1988] Andrew Dwelly. Synchronizing the I/O behavior of func-
tional programs with feedback. Information Processing Letters, 28, 1988.
[Fagin et al., 1984] Ronald Fagin, Joseph Y. Halpern, and Moshe Y. Vardi.
A model-theoretic analysis of knowledge. In Proceedings of the 25th
Annual IEEE Symposium on Foundations of Computer Science, pages
268-278, 1984.
[Friedman and Wise, 1976] Daniel Friedman and David S. Wise. Cons
should not evaluate its arguments. In 3rd International Colloquium on
[Machtey and Young, 1978] Michael Machtey and Paul Young. An Intro-
duction to the General Theory of Algorithms. Theory of Computation.
North-Holland, New York, NY, 1978.
[Maier and Warren, 1988] David Maier and David S. Warren. Comput-
ing with Logic—Logic Programming with Prolog. Benjamin Cummings,
Menlo Park, CA, 1988.
[Markowsky, 1976] G. Markowsky. Chain-complete posets and directed sets
with applications. Algebra Universalis, 6:53-68, 1976.
[McCarthy, 1960] John McCarthy. Recursive functions of symbolic expres-
sions and their computation by machine, part I. Communications of the
ACM, 3(4):184-195, 1960.
[Meseguer and Goguen, 1985] Jose Meseguer and Joseph A. Goguen. Ini-
tiality, induction, and computability. In Maurice Nivat and John
Reynolds, editors, Algebraic Methods in Semantics, pages 459-541. Cam-
bridge University Press, 1985.
[Meseguer, 1989] Jose Meseguer. General logics. In H.-D. Ebbinghaus et
al., editor, Logic Colloquium '87: Proceedings of the Colloquium held in
Granada, Spain July 20-25, 1987, Amsterdam, 1989. Elsevier North-
Holland.
[Meseguer, 1992] Jose Meseguer. Multiparadigm logic programming. In
H. Kirchner and G. Levi, editors, Proceedings of the 3rd Interna-
tional Conference on Algebraic and Logic Programming, Volterra, Italy,
September 1992, Lecture Notes in Computer Science. Springer-Verlag,
1992.
[Miller et al., 1991] Dale Miller, Gopalan Nadathur, Frank Pfenning, and
Andre Scedrov. Uniform proofs as a foundation for logic programming.
Annals of Pure and Applied Logic, 51:125-157, 1991.
[Moses, 1970] Joel Moses. The function of FUNCTION in LISP, or why
the FUNARG problem should be called the environment problem. ACM
SIGSAM Bulletin, 15, 1970.
[Mostowski et al., 1953] Andrzej Mostowski, Raphael M. Robinson, and
Alfred Tarski. Undecidability and Essential Undecidability in Arithmetic,
chapter II, pages 37-74. Studies in Logic and the Foundations of Mathe-
matics. North-Holland, Amsterdam, 1953. Book author: Alfred Tarski in
collaboration with Andrzej Mostowski and Raphael M. Robinson. Series
editors: L. E. J. Brouwer, E. W. Beth, A. Heyting.
[Muchnick and Pleban, 1980] Steven S. Muchnick and Uwe F. Pleban. A
semantic comparison of Lisp and Scheme. In Proceedings of the 1980
Lisp Conference, pages 56-64, 1980. Stanford University.
[Nadathur and Miller, 1988] Gopalan Nadathur and Dale Miller. An
overview of λProlog. In Proceedings of the 5th International Confer-
ence on Logic Programming, pages 810-827, Cambridge, MA, 1988. MIT
Press.
[Nadathur and Miller, 1990] Gopalan Nadathur and Dale Miller. Higher-
order Horn clauses. Journal of the ACM, 37(4):777-814, October 1990.
[O'Donnell, 1977] Michael James O'Donnell. Computing in Systems De-
scribed by Equations, volume 58 of Lecture Notes in Computer Science.
Springer-Verlag, 1977.
[O'Donnell, 1985] Michael James O'Donnell. Equational Logic as a Pro-
gramming Language. Foundations of Computing. MIT Press, Cambridge,
MA, 1985.
[Perry, 1991] Nigel Perry. The Implementation of Practical Functional Pro-
gramming Languages. PhD thesis, Imperial College of Science, Technol-
ogy and Medicine, University of London, 1991.
[Prawitz, 1965] Dag Prawitz. Natural Deduction—a Proof-Theoretic Study.
Almqvist and Wiksell, Stockholm, 1965.
[Rebelsky, 1992] Samuel A. Rebelsky. I/O trees and interactive lazy func-
tional programming. In Maurice Bruynooghe and Martin Wirsing, ed-
itors, Proceedings of the Fourth International Symposium on Program-
ming Language Implementation and Logic Programming, volume 631 of
Lecture Notes in Computer Science, pages 458-472. Springer-Verlag, Au-
gust 1992.
[Rebelsky, 1993] Samuel A. Rebelsky. Tours, a System for Lazy Term-
Based Communication. PhD thesis, The University of Chicago, June
1993.
[Rees and Clinger, 1986] The Revised³ report on the algorithmic language
Scheme. ACM SIGPLAN Notices, 21(12):37-79, 1986.
[Reiter, 1978] Raymond Reiter. On closed world databases. In Herve Gal-
laire and Jack Minker, editors, Logic and Databases, pages 149-178.
Plenum Press, 1978. also appeared as [Reiter, 1981].
[Reiter, 1981] Raymond Reiter. On closed world databases. In Bon-
nie Lynn Webber and Nils J. Nilsson, editors, Readings in Artificial
Intelligence, pages 119-140. Tioga, Palo Alto, CA, 1981.
[Reiter, 1984] Raymond Reiter. Towards a logical reconstruction of rela-
tional database theory. In Michael L. Brodie, John Mylopoulos, and
Joachim W. Schmidt, editors, On Conceptual Modelling—Perspectives
from Artificial Intelligence, Databases, and Programming Languages,
Topics in Information Systems, pages 191-233. Springer-Verlag, 1984.
[Reynolds, 1973] John C. Reynolds. On the interpretation of Scott's do-
mains. In Proceedings of Convegno d'Informatica Teorica, Rome, Italy,
February 1973. Istituto Nazionale di Alta Matematica (Citta Universi-
taria).
[Schmidt, 1986] David A. Schmidt. Denotational Semantics: A Methodol-
ogy for Language Development. Allyn and Bacon, 1986.
Contents
1 Introduction to equational logic programming 69
1.1 Survey of prerequisites 69
1.2 Motivation for programming with equations 71
1.3 Outline of the chapter 74
2 Proof systems for equational logic 75
2.1 Inferential proofs 75
2.2 Term rewriting proofs 78
2.3 The confluence property and the completeness of term
rewriting 81
3 Term rewriting proof strategies 96
3.1 Complete and outermost complete rewriting sequences 97
3.2 Sequentiality analysis and optimal rewriting 100
4 Algorithms and data structures to implement
equational languages 111
4.1 Data structures to represent terms 111
4.2 Pattern-matching and sequencing methods 120
4.3 Driving procedures for term rewriting 129
5 Compiling efficient code from equations 137
6 Parallel implementation 139
7 Extensions to equational logic programming 141
7.1 Incremental infinite input and output 141
7.2 Solving equations 147
7.3 Indeterminate evaluation in subset logic 149
7.4 Relational rewriting 151
(norm t1, ..., ti : t) ?- T |= (t = s)
(t = s is a semantically correct answer to the question (norm t1, ..., ti : t)
for knowledge T, see Definition 2.2.2) means that s is a normal form—a
term containing no instances of t1, ..., ti—whose equality to t is a seman-
tic consequence of the equations in T. Equational logic programming lan-
guages in use today all take sets T of equations, prohibited forms t1, ..., ti,
and terms t to normalize, and they compute normal forms s satisfying the
relation above.
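To make the relation above concrete, the following small Haskell sketch (not from this chapter; the term datatype, the naive outermost rewriting strategy, and the parallel-or equations used as data are all illustrative assumptions) computes a normal form by repeatedly rewriting with equations oriented from left to right.

import qualified Data.Map as M

data Term = Var String | Fun String [Term] deriving (Eq, Show)

type Subst = M.Map String Term
type Rule  = (Term, Term)              -- (left-hand side, right-hand side)

-- Match a left-hand side against a term, producing a substitution if possible.
match :: Term -> Term -> Maybe Subst
match pat term = go pat term M.empty
  where
    go (Var x) s sub = case M.lookup x sub of
      Nothing -> Just (M.insert x s sub)
      Just s' -> if s == s' then Just sub else Nothing
    go (Fun f ps) (Fun g ts) sub
      | f == g && length ps == length ts = goList (zip ps ts) sub
    go _ _ _ = Nothing
    goList [] sub             = Just sub
    goList ((p, s) : ps') sub = go p s sub >>= goList ps'

applySubst :: Subst -> Term -> Term
applySubst sub (Var x)    = M.findWithDefault (Var x) x sub
applySubst sub (Fun f ts) = Fun f (map (applySubst sub) ts)

-- One rewriting step at the outermost possible position, if any redex exists.
step :: [Rule] -> Term -> Maybe Term
step rules t =
  case [applySubst sub r | (l, r) <- rules, Just sub <- [match l t]] of
    (t' : _) -> Just t'
    []       -> case t of
      Fun f ts -> Fun f <$> stepArgs ts
      _        -> Nothing
  where
    stepArgs []       = Nothing
    stepArgs (u : us) = case step rules u of
      Just u' -> Just (u' : us)
      Nothing -> (u :) <$> stepArgs us

-- Rewrite until no redex remains; diverges on nonterminating systems.
normalize :: [Rule] -> Term -> Term
normalize rules t = maybe t (normalize rules) (step rules t)

-- Illustrative equations: or(true, x) = true, or(false, x) = x.
orRules :: [Rule]
orRules =
  [ (Fun "or" [Fun "true" [], Var "x"],  Fun "true" [])
  , (Fun "or" [Fun "false" [], Var "x"], Var "x") ]

main :: IO ()
main = print (normalize orRules
          (Fun "or" [Fun "false" [], Fun "or" [Fun "true" [], Fun "false" []]]))
  -- prints Fun "true" []

Real implementations replace the naive search for a redex here by the pattern-matching, sequencing, and sharing techniques discussed later in this chapter.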
Section 2.3.5 of the 'Introduction ...' chapter explains how different
equational languages variously determine T, t1, ..., ti, and t from the
language design, the program being executed, and the input. An alter-
nate style of equational logic programming, using questions of the form
(solve x1, ..., xi : t1 = t2) that ask for substitutions for x1, ..., xi solving
the equation (t1 = t2), is very attractive for its expressive power, but much
harder to implement efficiently (see Section 7.2).
There is a lot of terminological confusion about equational logic pro-
gramming. First, many in the Prolog community use 'logic' to mean the
first-order predicate calculus (FOPC), while I stick closer to the dictionary
meaning of logic, in which FOPC is one of an infinity of possible logical
systems. Those who identify logic with FOPC often use the phrase 'equa-
tional logic programming' to mean some sort of extension of Prolog using
equations, such as logic programming in FOPC with equality. In this chap-
ter, 'equational logic programming' means the logic programming of pure
equational logic.
is defined by

T |- <F0, ..., Fm> |-R F if and only if Fm = F and,
for all i ≤ m, one of the following cases holds:
1. Fi ∈ T
2. there exist j1, ..., jn < i such that {Fj1, ..., Fjn} R Fi

Now, when R is the union of any of the rules presented above, (F=, P=,
|-R) is a compact proof system (Definition 2.1.3, Section 2.1).
The rules above are somewhat redundant. Every proof system using a
subset of these rules is sound, and those using the Reflexive, Symmetric,
Transitive and Instantiation rules, and at least one of Substitution and
Congruence, are also complete.
Proposition 2.1.3. Let R be the union of any of the rules in Defini-
tion 2.1.2. Then (F=, P=, |-R) is a sound proof system for the standard
semantic system of Definition 2.3.14, Section 2.3.4 of the chapter 'Introduc-
tion: Logic and Logic Programming Languages.' That is, T |-R (t1 = t2)
implies T |= (t1 = t2).
The proof of soundness is an elementary induction on the number of
steps in a formal equational proof, using the fact that each of the rules of
inference proposed above preserves truth.
Proposition 2.1.4. Let R be the union of the Reflexive, Symmetric,
Transitive, and Instantiation rules, and at least one of the Substitution
and Congruence rules. Then (F=, P=, |-R) is a complete proof system.
That is, T |= (t1 = t2) implies T |-R (t1 = t2).
To prove completeness, we construct for each set T of equations, a
term model MT such that Theory ({MT}) contains exactly the semantic
consequences of T. For each term t ∈ TP,
whose universe is
Steps 5, 10, and 12 above are redundant (they reproduce the results already
obtained in steps 4, 9, 2), but the systematic procedure in the induction of
Proposition 2.2.3 includes them for uniformity.
So, a term rewriting proof is a convenient and natural shorthand for an
inferential proof.
Not every inferential proof corresponds to a term rewriting proof. First,
the proofs corresponding to term rewriting sequences do not use the Sym-
metric rule. This represents a serious incompleteness in term rewriting
proof. Section 2.3 shows how restrictions on equational hypotheses can
avoid the need for the Symmetric rule, and render term rewriting complete
for answering certain normal form questions.
Example 2.2.5. Let T = {a = b, c = b, c = d}. T |= (a = d), and
T |-inf (a = d), by one application of the Symmetric rule and two appli-
cations of the Transitive rule. But, there is no term rewriting sequence
from a to d, nor from d to a, nor from a and d to a common form equal to
both.
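A tiny Haskell check of this example (illustrative only; it treats the three equations as ground rewrite rules oriented left to right) confirms that no term reachable from a by rewriting is also reachable from d, so the two have no common form:

import qualified Data.Set as S

rules :: [(String, String)]
rules = [("a", "b"), ("c", "b"), ("c", "d")]

stepTo :: String -> [String]
stepTo s = [r | (l, r) <- rules, l == s]

reachable :: String -> S.Set String
reachable start = go (S.singleton start) [start]
  where
    go seen []       = seen
    go seen (x : xs) =
      let new = filter (`S.notMember` seen) (stepTo x)
      in go (foldr S.insert seen new) (new ++ xs)

main :: IO ()
main = do
  let fromA = reachable "a"                     -- {a, b}
      fromD = reachable "d"                     -- {d}
  print (S.intersection fromA fromD)            -- fromList []: no common form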
Second, term rewriting proofs limit the order in which the Instantiation,
Substitution, and Transitive rules are applied. This second limitation does
not affect the deductive power of the proof system.
Proposition 2.2.6. Let T = {l1 = r1, ..., ln = rn} be a set of equations.
Let TR = {r1 = l1, ..., rn = ln}. TR is the same as T except that the left
and right sides of equations are interchanged—equivalently, T contains
Two similar properties that are very important in the literature are
local confluence, which is weaker than confluence, and one-step confluence,
which is stronger than confluence.
Definition 2.3.2 ([Newman, 1942]). Let -> be a binary relation, and
->* be its reflexive-transitive closure. -> is locally confluent if and only if,
for all s, t1, t2 in its domain such that s -> t1 and s -> t2, there exists a u
such that t1 ->* u and t2 ->* u (see Figure 1 A).
While confluence guarantees that divergent term rewriting sequences
may always be rewritten further to a common form, local confluence guar-
antees this only for single step term rewriting sequences.
Definition 2.3.3 ([Newman, 1942]). Let -> be a binary relation. -> is
one-step confluent if and only if, for all s, t1, t2 in its domain such that s -> t1
and s -> t2, there exists a u such that t1 -> u and t2 -> u (see Figure 1 C).
While confluence guarantees that divergent term rewriting sequences
may always be rewritten further to a common form, one-step confluence
guarantees that for single step divergences, there is a single-step conver-
gence.
Proposition 2.3.4. One-step confluence implies confluence implies local
confluence.
The first implication is a straightforward induction on the number of
steps in the diverging rewrite sequences. The second is trivial.
and
Definition 2.3.9 above, and the symbols of the system partition into two
disjoint sets—the set C of constructor symbols, and the set D of defined
symbols, satisfying
4″ (Symbol-Nonambiguous)
• Every left-hand side of an equation in T has the form f(t1, ...,
tm), where f ∈ D is a defined symbol, and t1, ..., tm contain
only variables and constructor symbols in C.
• Let li and lj be left-hand sides of equations in T. If there exists
a common instance s of li and lj, then i = j.
In most of the term-rewriting literature, 'orthogonal' and 'regular' both
mean rule-orthogonal. It is easy to see that constructor orthogonality im-
plies rule-orthogonality, which implies rewrite-orthogonality. Most func-
tional programming languages have restrictions equivalent or very similar
to constructor-orthogonality.
Orthogonal systems of all varieties are confluent.
Proposition 2.3.14. Let T be a constructor-, rule-, or rewrite-orthogonal
set of equations. Then the term rewriting relation -> is confluent.
Let -> be the rewrite relation that is to be proved confluent. The essen-
tial idea of these, and many other, proofs of confluence is to choose another
relation ->' with the one-step confluence property (Definition 2.3.3), whose
transitive closure is the same as the transitive closure of ->. Since conflu-
ence is defined entirely in terms of the transitive closure, -> is confluent
if and only if ->' is confluent. ->' is confluent because one-step confluence
implies confluence. To prove confluence of orthogonal systems of equations,
the appropriate ->' allows simultaneous rewriting of any number of disjoint
subterms.
Theorem 10.1.3 of the chapter 'Equational Reasoning and Term-
Rewriting Systems' in Section 10.1 of Volume 1 of this handbook is the
rule-orthogonal portion of this proposition, which is also proved in [Huet
and Levy, 1991; Klop, 1991]. The proof for rewrite-orthogonal systems has
never been published, but it is a straightforward generalization. [O'Donnell,
1977] proves a version intermediate between rule-orthogonality and rewrite-
orthogonality.
In fact, for nontrivial, rule-like, left-linear systems, rule-nonambiguity
captures precisely the cases of confluence that depend only on the left-hand
sides of equations.
Proposition 2.3.15. A nontrivial, rule-like, left-linear set T of equations
is rule-nonambiguous if and only if, for every set of equations T' left-
similar to T, ->T' is confluent.
(=>) is a direct consequence of Propositions 2.3.14 and 2.3.12. (<=) is
straightforward. In a rule-ambiguous system, simply fill in each right-hand
side with a different constant symbol, not appearing on any left-hand side,
to get a nonconfluent system.
In the rest of this chapter, we use the term 'orthogonal' in assertions
that hold for both rewrite- and rule-orthogonality. To get a general un-
derstanding of orthogonality, and its connection to confluence, it is best
to consider examples of nonorthogonal systems and investigate why they
are not confluent, as well as a few examples of systems that are not rule
orthogonal, but are rewrite orthogonal, and therefore confluent.
Example 2.3.16. The first example, due to Klop [Klop, 1980], shows
the subtle way in which non-left-linear systems may fail to be confluent.
Let
and also
The system is not confluent, because the attempt to rewrite f(true) to true
yields an infinite regress with f(true) -> eq(true, f(true)). Notice that ->
has unique normal forms. The failure of confluence involves a term with
a normal form, and an infinite term rewriting sequence from which that
normal form cannot be reached. Non-left-linear systems that satisfy the
other requirements of rule-orthogonality always have unique normal forms,
even when they fail to be confluent [Chew, 1981]. I conjecture that this
holds for rewrite-orthogonality as well.
A typical rewrite-ambiguous set of equations is
but
Although
and w is not an instance of or(x, false), the substitution above unifies the
corresponding right-hand sides as well:
Tor- is rewrite-orthogonal, so ->or- is confluent.
Another type of rewrite-ambiguous set of equations is
T4 is rewrite-orthogonal, so ->T4 is confluent.
Condition (4) may also be violated by a single self-overlapping equation,
such as
The left-hand side f(f(x)) overlaps itself in f(f(f(x))), with the second
instance of the symbol f participating in two different instances of f(f(x)).
Condition (4) is violated, because
but f(y) is not an instance of f(f(x)). ->T5 is not confluent, as f(f(f(a))) ->
g(f(a)) and f(f(f(a))) -> f(g(a)), but both g(f(a)) and f(g(a)) are in
normal form.
A final example of overlapping left-hand sides is
The left-hand sides of the two equations overlap in f(g(a, b), y), with the
symbol g participating in instances of the left-hand sides of both equations.
Condition (4) is violated, because
with the intuitive forms of equations, and because I believe that improved
equational logic programming languages of the future will deal with even
more general sets of equations, so I prefer to discourage dependence on the
special properties of constructor systems.
Knuth-Bendix Methods. Although overlapping left-hand sides of equa-
tions may destroy the confluence property, there are many useful equa-
tional programs that are confluent in spite of overlaps. In particular, the
equation expressing the associative property has a self-overlap, and equa-
tions expressing distributive or homomorphic properties often overlap with
those expressing identity, idempotence, cancellation, or other properties
that collapse a term. These overlaps are usually benign, and many useful
equational programs containing similar overlaps are in fact confluent.
Example 2.3.17. Consider the singleton set
expressing the associative law for the operator g. This equation has a self-
overlap, violating condition (4) of rewrite-orthogonality (Definition 2.3.9)
because
and
expressing the distribution of f over g, and the fact that i is a left identity
for g and a fixed point for f. The first and second equations overlap,
violating condition (4) of rewrite-orthogonality, because
by the second equation, the first result rewrites to the second, which is in
normal form, by
Notice that T8 = T3 ∪ {f(i) = i}, and that confluence failed for T3 (Ex-
ample 2.3.16).
Experience with equational logic programming suggests that most
naively written programs contain a small number of benign overlaps, which
are almost always similar to the examples above. An efficient test for con-
fluence in the presence of such overlaps would be extremely valuable.
The only known approach to proving confluence in spite of overlaps is
based on the Knuth-Bendix procedure [Knuth and Bendix, 1970]. This
procedure relies on the fact that local confluence (Definition 2.3.2) is often
easier to verify than confluence, and that local confluence plus termination
imply confluence.
Proposition 2.3.18 ([Newman, 1942]). If -> is locally confluent, and
there is no infinite sequence s0 -> s1 -> ..., then -> is confluent.
The proof is a simple induction on the number of steps to normal form.
Unfortunately, a system with nonterminating rewriting sequences may
be locally confluent, but not confluent.
Example 2.3.19. T1 of Example 2.3.16 is locally confluent, but not
confluent.
-> is locally confluent, but not confluent. Notice how confluence fails due
to the two-step rewritings a -> b -> d and b -> a -> c (see Figure 2).
-> is locally confluent, but not confluent. Again, confluence fails due
to the two-step rewritings f(x) -> g(h(x)) -> d and g(x) -> f(h(x)) -> c
(see Figure 3).
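The failure just described can be checked mechanically. The following Haskell sketch (illustrative; the four ground rewriting steps a -> b, a -> c, b -> a, b -> d are taken from the two-step rewritings quoted above) verifies that the relation is locally confluent but not confluent:

import qualified Data.Set as S

rules :: [(Char, Char)]
rules = [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'd')]

stepTo :: Char -> [Char]
stepTo x = [r | (l, r) <- rules, l == x]

reach :: Char -> S.Set Char                  -- reflexive-transitive closure
reach x = go (S.singleton x) [x]
  where
    go seen []       = seen
    go seen (y : ys) =
      let new = filter (`S.notMember` seen) (stepTo y)
      in go (foldr S.insert seen new) (new ++ ys)

joinable :: Char -> Char -> Bool
joinable s t = not (S.null (S.intersection (reach s) (reach t)))

locallyConfluent, confluent :: Bool
locallyConfluent =
  and [joinable t1 t2 | x <- "abcd", t1 <- stepTo x, t2 <- stepTo x]
confluent =
  and [joinable t1 t2 | x <- "abcd",
       t1 <- S.toList (reach x), t2 <- S.toList (reach x)]

main :: IO ()
main = print (locallyConfluent, confluent)   -- (True, False)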
but s is not an instance of l2. For each s, l1 and l2, use the smallest
t1, ..., tm and t'1, ..., t'n that satisfy this equation. The results of rewrit-
ing the instance of s above in two different ways, according to the over-
lapping instances of equations, are c1 = s[r1[t1, ..., tm/x1, ..., xm]/y] and
c2 = r2[t'1, ..., t'n/x'1, ..., x'n]. The pair (c1, c2) is called a critical pair. A
finite set of equations generates a finite set of critical pairs, since only a
finite number of ss can be compatible with some l2, but not an instance
of l2. The procedure checks all critical pairs to see if they rewrite to a
common normal form. If so, the system is locally confluent.
Proposition 2.3.20 ([Huet, 1980]). Let T be a set of equations. If for
every critical pair (c1, c2) of T there is a term d such that c1 ->* d and
c2 ->* d, then -> is locally confluent.
This proposition, and the Knuth-Bendix method, apply even to non-
left-linear sets of equations. For example, the local confluence of T1 in
Example 2.3.16 may be proved by inspecting all critical pairs.
When some critical pair cannot be rewritten to a common form, the
Knuth-Bendix procedure tries to add an equation to repair that failure
of local confluence. For equational logic programming, we would like
to use just the part of the procedure that checks local confluence, and
leave it to the programmer to decide how to repair a failure. Although,
in principle, the search for a common form for a critical pair might go
on forever, in practice a very shallow search suffices. I have never ob-
served a natural case in which more than two rewriting steps were in-
volved. Unfortunately, many useful equational programs have nontermi-
nating term rewriting sequences, so local confluence is not enough. The
design of a variant of the Knuth-Bendix procedure that is practically
useful for equational logic programming is an open topic of research—
some exploratory steps are described in [Chen and O'Donnell, 1991]. A
number of methods for proving termination are known [Dershowitz, 1987;
Guttag et al., 1983], which might be applied to portions of an equational
program even if the whole program is not terminating, but we have no
experience with the practical applicability of these methods. If the rewrit-
ing of the terms c1 and c2 in a critical pair to a common form d (see
Proposition 2.3.20) takes no more than one rewriting step (this is one-step
confluence, Definition 2.3.3), then we get confluence and not just local con-
fluence. Rewrite-orthogonal systems are those whose critical pairs are all
trivial—the members of the pair are equal, and so the reduction to a com-
mon form takes zero steps. Unfortunately, all of the important examples so
far of confluent but not rewrite-orthogonal equational programs have the
basic structure of associativity or distributivity (see Example 2.3.17) and
require two rewriting steps to resolve their critical pairs.
The sets of equations in Example 2.3.17 pass the Knuth-Bendix test
for local confluence, and a number of well-known techniques can be used to
prove that there is no infinite term rewriting sequence in these systems.
But, we need to recognize many variations on these example systems,
when they are embedded in much larger sets of equations which gener-
ate some infinite term rewriting sequences, and no completely automated
method has yet shown practical success at that problem (although there are
special treatments of commutativity and associativity [Baird et al., 1989;
Dershowitz et al., 1983]). On the other hand, in practice naturally con-
structed systems of equations that are locally confluent are almost always
confluent. Surely someone will find a useful and efficient formal criterion
to distinguish the natural constructions from the pathological ones of Ex-
ample 2.3.19.
The first equation is the usual definition of the car (left projection) function
of Lisp, the second is a silly example of an equation leading to infinite
term rewriting. T11 is orthogonal, so -> is confluent. But car(cons(b, a))
rewrites to the normal form b, and also in the infinite rewriting sequence
car(cons(b, a)) -> car(cons(b, f(a))) ->
Notice that, due to confluence, no matter how far we go down the infi-
nite term rewriting sequence car(cons(b, a)) -> car(cons(b, f(a))) -> ...,
one application of the first equation leads to the normal form b. Nonethe-
less, a naive strategy might fail to find that normal form by making an infi-
The term g(f(f(h(a))), f(h(a))) has five redexes: two occurrences each of
the terms a and f(h(a)), and one occurrence of f(f(h(a))). The latter
two are both instances of the left-hand side f(x) of the first equation.
The leftmost occurrence of f(h(a)) is nested inside f(f(h(a))), so it is not
outermost. Each occurrence of a is nested inside an occurrence of f(h(a)),
so neither is outermost. The rightmost occurrence of f(h(a)), and the sole
occurrence of f(f(h(a))), are both outermost redexes. In the rewriting
These equations are confluent, but not rewrite-orthogonal, since the left-
hand sides of the first and second equations overlap in f(g(x, b)), but the
corresponding right-hand sides yield b ≠ f(g(f(x), b)). The natural out-
ermost complete rewriting sequence starting with f(g(b, a)) is the infinite
one
In a term of the form cond(r, s, t), there will usually be outermost redexes
in all three of r, s, and t. But, once r rewrites to either true or false, one of
s and t will be thrown away, and any rewriting in the discarded subterm
will be wasted.
outermost redex immediately causes another to be nonoutermost sounds
tempting, but it probably introduces more overhead in detecting such cases
than it saves in avoiding unnecessary steps. Notice that it will help the cond
example only when r rewrites to true or false in one step. So, we need
some further analysis to choose which among several outermost redexes to
rewrite.
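The notion of an outermost redex can be made concrete with a little code. The following Haskell sketch (illustrative; the left-hand sides a and f(x) are assumptions reconstructed from the example with g(f(f(h(a))), f(h(a))) earlier in this section) reports the occurrences of the outermost redexes of a term:

import Data.Maybe (isJust)
import qualified Data.Map as M

data Term = Var String | Fun String [Term] deriving (Eq, Show)

match :: Term -> Term -> Maybe (M.Map String Term)
match p0 t0 = go p0 t0 M.empty
  where
    go (Var x) s sub = case M.lookup x sub of
      Nothing -> Just (M.insert x s sub)
      Just s' -> if s == s' then Just sub else Nothing
    go (Fun f ps) (Fun g ts) sub
      | f == g && length ps == length ts = goList (zip ps ts) sub
    go _ _ _ = Nothing
    goList [] sub              = Just sub
    goList ((p, s) : rest) sub = go p s sub >>= goList rest

isRedex :: [Term] -> Term -> Bool
isRedex lhss t = any (\l -> isJust (match l t)) lhss

-- Occurrences are sequences of argument positions, as in the text.
outermostRedexes :: [Term] -> Term -> [[Int]]
outermostRedexes lhss = go []
  where
    go occ t
      | isRedex lhss t = [reverse occ]
      | otherwise      = case t of
          Fun _ ts -> concat (zipWith (\i u -> go (i : occ) u) [1 ..] ts)
          _        -> []

a, fha, exampleTerm :: Term
a           = Fun "a" []
fha         = Fun "f" [Fun "h" [a]]
exampleTerm = Fun "g" [Fun "f" [fha], fha]

main :: IO ()
main = print (outermostRedexes [a, Fun "f" [Var "x"]] exampleTerm)
  -- [[1],[2]]: the occurrence of f(f(h(a))) and the rightmost f(h(a))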
and consider the system Tc ∪ T14. Given a term of the form f(r, s, t), we
cannot decide whether to rewrite redexes in r, in s, or in t without risking
wasted work, because we cannot separate computably the three cases
• r ->* a and s ->* b
• r ->* b and t ->* a
• s ->* a and t ->* b
Unlike the parallel-or example, it is impossible for more than one of
these three cases to hold. There is always a mathematically well-defined
redex that must be rewritten in order to reach a normal form, and the
problem is entirely one of choosing such a redex effectively. In fact, for sets
T of equations such that T ∪ T14 is terminating (every term has a normal
form), the choice of whether to rewrite r, s, or t in f(r, s, t) is effective, but
usually unacceptably inefficient.
So, some further analysis of the form of equations beyond checking for
orthogonality is required in order to choose a good redex to rewrite next
in a term rewriting sequence. Analysis of equations in order to determine
a good choice of redex is called sequentiality analysis.
has two redexes, neither of which is needed, since either can be elimi-
nated by rewriting the other, then rewriting the whole term to true. Rule-
orthogonality guarantees weak sequentiality.
Proposition 3.2.5 ([Huet and Levy, 1991]). A nontrivial, rule-like,
and left-linear set of equations (Definition 2.3.9) is weakly sequential if
and only if it is rule-orthogonal.
The proof of (<=) is in [Huet and Levy, 1991]. It involves a search
through all rewriting sequences (including infinite ones), and does not yield
an effective procedure. (=>) is straightforward, since when two redexes over-
lap neither is needed.
Proposition 3.2.5 above shows that no analysis based on weak sequen-
tiality can completely sequentialize systems whose confluence derives from
rewrite-orthogonality, or from a Knuth-Bendix analysis. Section 3.2.4 dis-
cusses possible extensions of sequentiality beyond rule-orthogonal systems.
The system Tc ∪ T14 of Example 3.2.2 is rule-orthogonal, and therefore
weakly sequential. For example, in a term of the form f(r, s, t), where
r ->* a and s ->* b, both r and s contain needed redexes. The subsystem
T14, without the general-purpose programming system Tc, is effectively
weakly sequential, but only because it is terminating. I conjecture that
effective weak sequentiality is undecidable for rule-orthogonal systems.
3.2.2 Strongly needed redexes and strong sequentiality
The uncomputability of needed redexes and the weak sequential property
are addressed analogously to the uncomputability of confluence: by finding
efficiently computable sufficient conditions for a redex to be needed, and
for a system to be effectively weakly sequential. A natural approach is
to ignore right-hand sides of equations, and detect those cases of needed
redexes and effectively weakly sequential systems that are guaranteed by
the structure of the left-hand sides. To this end we define ω-rewriting, in
which a redex is replaced by an arbitrary term.
Definition 3.2.6 ([Huet and Levy, 1991]). Let T = {l1 = r1, ...,
ln = rn} be a rule-orthogonal set of equations.
In the term f(g(a, a), h(a, a)), the first and last occurrences of a are strongly
needed, but the second and third are not.
Notice that ω-rewriting allows different redexes that are occurrences
of instances of the same left-hand side to be rewritten inconsistently in
different ω-rewriting steps. Such inconsistency is critical to the example
above, where in one case f(a, b, c) ω-rewrites to a, and in another case it
ω-rewrites to b.
Strong sequentiality is independent of the right-hand sides of equa-
tions.
Proposition 3.2.10. If T1 and T2 are left-similar (see Definition 2.3.11),
and T1 is strongly sequential, then so is T2.
The proof is straightforward, since T1 and T2 clearly have the same
ω-rewriting relations (Definition 3.2.6).
It is not true, however, that a system is strongly sequential whenever
all left-similar systems are weakly sequential. The system of Example 3.2.2
and all left-similar systems are weakly sequential, but not strongly sequen-
tial. But in that case, no given redex is needed in all of the left-similar
systems. An interesting open question is whether a redex that is needed in
all left-similar systems must be strongly needed.
For finite rule-orthogonal sets of equations, strong sequentiality is de-
cidable.
Proposition 3.2.11 ([Huet and Levy, 1991; Klop and Middeldorp,
1991]). Given a finite rule-orthogonal set T of equations, it is decidable
whether T is strongly sequential.
The details of the proof are quite tricky, but the essential idea is that
only a finite set of terms, with sizes limited by a function of the sizes of
left-hand sides of equations, need to be checked for strongly needed redexes.
In developing the concept of strongly needed redexes and connecting it
to the concept of weakly needed redexes, Huet and Levy define the inter-
mediately powerful concept of an index. Roughly, an index is a needed
redex that can be distinguished from other redexes just by their relative
positions in a term, without knowing the forms of the redexes themselves
[Huet and Levy, 1991]. Every index is a weakly needed redex, but not vice
versa. Strong indexes are equivalent to strongly needed redexes. A system
in which every term not in normal form has at least one index is called
sequential. The precise relation between sequentiality in all left-similar
systems, and strong sequentiality, is an interesting open problem.
All of the sequentiality theory discussed in this section deals with se-
quentializing the process of rewriting a term to normal form. Example 7.1.6
in Section 7.1 shows that even strongly sequential systems may require a
parallel evaluation strategy for other purposes, such as a complete proce-
dure for rewriting to head-normal form.
given the initial term f(f(a)), both redexes, f(f(a)) and f(a), are needed,
but only the outermost one is strongly needed. By rewriting a strongly
needed redex at each step, we get the 3-step sequence
which does not rewrite the unique strongly needed redex in the first step.
It is easy to construct further examples in which the number of steps
wasted by rewriting strongly needed redexes is arbitrarily large.
Proposition 3.2.13. Given an arbitrary strongly sequential system of
equations, and a term, there is no effective procedure to choose a redex at
each step so as to minimize the length of the rewriting sequence to normal
form.
I am not aware of a treatment of this point in the literature. The basic
idea is to use the equations
Notice that, when there is a normal form, breadth-first search over all
rewriting sequences yields a very expensive computation of a minimal se-
quence. But, no effective procedure can choose some redex in all cases (even
in the absence of a normal form), and minimize the number of rewriting
steps when there is a normal form.
The uncomputability of minimal-length rewriting strategies in Propo-
sition 3.2.13 sounds discouraging. The number of rewriting steps is not,
however, a good practical measure of the efficiency of a sequencing strat-
egy. Given equations, such as f(x) = g(x, x) in Example 3.2.12, with more
than one occurrence of the same variable x on the right-hand side, normal
sensible implementations do not make multiple copies of the subterm sub-
stituted for that variable. Rather, they use multiple pointers to a single
copy. Then, only one actual computing step is required to rewrite all of
the apparent multiple copies of a redex within that substituted subterm.
So, in Example 3.2.12, the strategy of choosing a strongly needed redex
actually leads to only two steps, from f(f(a)) to g(f(a), f(a)), and then
directly to g(g(a, a), g(a, a)). The normal form is represented in practice
with only one copy of the subterm g(a, a), and two pointers to it for the
two arguments of the outermost g. If we charge only one step for rewriting
a whole set of shared redexes, then rewriting strongly needed redexes is
optimal.
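The following Haskell sketch (illustrative; the heap layout and node numbering are assumptions) replays the shared-copy counting for Example 3.2.12: the single heap node holding f(a) is rewritten once, and both pointers from the surrounding g see the change, so two steps suffice.

import qualified Data.Map as M

type Param = Int
data Sig = Sig String [Param] deriving Show
type Heap = M.Map Param Sig

-- Heap for f(f(a)): node 0 = f(1), node 1 = f(2), node 2 = a.
heap0 :: Heap
heap0 = M.fromList [(0, Sig "f" [1]), (1, Sig "f" [2]), (2, Sig "a" [])]

-- Rewrite node p, assumed to hold a redex f(q), using f(x) = g(x, x):
-- the right-hand side is built with two pointers to the single node q
-- substituted for x, so no subterm is copied.
rewriteF :: Param -> Heap -> Heap
rewriteF p heap = case M.lookup p heap of
  Just (Sig "f" [q]) -> M.insert p (Sig "g" [q, q]) heap
  _                  -> heap

-- Read the tree-shaped term back out of the heap.
readTerm :: Heap -> Param -> String
readTerm heap p = case M.lookup p heap of
  Just (Sig f [])   -> f
  Just (Sig f args) -> f ++ "(" ++ concatMap sub (zip [1 ..] args) ++ ")"
    where sub (i, q) = (if i > (1 :: Int) then "," else "") ++ readTerm heap q
  Nothing           -> "?"

main :: IO ()
main = do
  let heap1 = rewriteF 0 heap0    -- f(f(a)) -> g(f(a), f(a)), arguments shared
      heap2 = rewriteF 1 heap1    -- one step rewrites both shared copies
  putStrLn (readTerm heap1 0)     -- g(f(a),f(a))
  putStrLn (readTerm heap2 0)     -- g(g(a,a),g(a,a)) after only two steps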
Proposition 3.2.14. Consider multiple-rewriting sequences, in which in
one step all of the shared copies of a redex are rewritten simultaneously.
Given a strongly sequential set of equations and a term, the strategy of
rewriting at each step a strongly needed redex and all of its shared copies
leads to normal form in a minimal number of steps.
This proposition has never been completely proved in print. I claimed
a proof [O'Donnell, 1977], but it had a fatal error [Berry and Levy, 1979;
O'Donnell, 1979]. The hard part of the proposition—that the rewriting
of a strongly needed redex is never a wasted step—was proved by Huet
and Levy [Huet and Levy, 1991]. The remaining point—that rewriting a
strongly needed redex never causes additional rewriting work later in the
sequence—seems obvious, but has never been treated formally in general.
Levy [Levy, 1978] treated a similar situation in the lambda calculus, but
in that case there is no known efficient implementation technique for the
sequences used in the optimality proof. Although the formal literature on
optimal rewriting is still incomplete, and extensions of optimality theory to
systems (such as the lambda calculus) with bound variables are extremely
subtle, for most practical purposes Huet's and Levy's work justifies the
strategy of rewriting all shared copies of a strongly needed redex at each
step. Optimality aside, rewriting strategies that always choose a strongly
needed redex are examples of one-step normalizing strategies, which pro-
vide interesting theoretical problems in combinatory logic and the lambda
which shows that the rightmost occurrence of g(a, a) is not needed and
The result is still a type 3 overlap, but the system is weakly rewrite-
sequential, since the redexes that are not needed immediately in the cre-
ation of an outer redex are preserved for later rewriting.
The positive parallel-or equations in Tor+ of Examples 2.3.16 and 3.2.1
give another example of a type 3 overlap where weak rewrite-sequentiality
fails. On the other hand, the negative parallel-or equations of Tor- in
Example 2.3.16 have type 2 overlap, but they are sequential. In a term
of the form or(s, t) where s ->* false and t ->* false, it is safe to rewrite
either s or t first, since the redexes in the other, unrewritten, subterm are
preserved for later rewriting.
Theories extending sequentiality analysis, through concepts such as
weak rewrite-sequentiality, are open topics for research. I conjecture that
heap node, and assume that such optimizations are applied at a lower level
of implementation.
4.1.1 A conceptual model for term data structures
Some useful techniques for implementing equational logic programming
require more than the linked heap structures representing terms. For ex-
ample, it is sometimes better to represent the rewriting of s to t by a link
from the head node of the representation of s pointing to the head node
of t, rather than by an actual replacement of s by t. This representation
still uses a heap, but the heap now represents a portion of the infinite
rewriting graph for a starting term, rather than just a single term at some
intermediate stage in rewriting to normal form. Other techniques involve
the memoing of intermediate steps to avoid recomputation—these require
more efficient table lookup than may be achieved with a linked heap. For-
tunately, there is a single abstract data structure that subsumes all of the
major proposals as special cases, and which allows a nice logical interpre-
tation [Sherman, 1990]. This data structure is best understood in terms of
three tables representing three special sorts of functions.
Definition 4.1.1. For each i ≥ 0 let Funi be a countably infinite set of
function symbols of arity i. The 0-ary function symbols in Fun0 are called
constant symbols. T°P is the set of ground terms (terms without variables)
constructed from the given function symbols (see Definition 2.3.1 of the
chapter 'Introduction: Logic and Logic Programming Languages'). Let P
be a countably infinite set. Members of P are called parameters, and are
written α, β, ..., sometimes with subscripts. Formally, parameters behave
just like the variables of Definition 2.3.1, but their close association with
heap addresses later on makes us think of them somewhat differently.
An i-ary signature is a member of Funi × P^i. The signature (f, (α1, ...,
αi)) is normally denoted by f(α1, ..., αi). Sig denotes the set of signatures
of all arities.
Let nil be a symbol distinct from all function symbols, parameters, and
signatures.
A parameter valuation is a function val : P -> Sig ∪ {nil}.
A parameter replacement is a function repl : P -> P ∪ {nil}.
A signature index is a function ind : Sig -> P ∪ {nil}.
A parameter valuation, parameter replacement, or signature index is
finitely based if and only if its value is nil for all but a finite number of
arguments.
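As a minimal illustration (not the chapter's implementation), the three finitely based functions can be represented directly as finite maps in Haskell, with Nothing playing the role of nil; valStar below corresponds roughly to reading the ordinary term val*(α) out of a conventional heap.

import Data.List (intercalate)
import qualified Data.Map as M

type Param = Int                       -- parameters: heap addresses
data Sig = Sig String [Param]          -- a signature f(a1, ..., ai)
  deriving (Eq, Ord, Show)

type Valuation   = M.Map Param Sig     -- val  : P   -> Sig union {nil}
type Replacement = M.Map Param Param   -- repl : P   -> P   union {nil}
type SigIndex    = M.Map Sig Param     -- ind  : Sig -> P   union {nil}

val :: Valuation -> Param -> Maybe Sig
val = flip M.lookup

repl :: Replacement -> Param -> Maybe Param
repl = flip M.lookup

ind :: SigIndex -> Sig -> Maybe Param
ind = flip M.lookup

-- Read the ground term rooted at a parameter, following val only
-- (Nothing if nil is reached; a circular heap makes the recursion diverge).
valStar :: Valuation -> Param -> Maybe String
valStar v a = do
  Sig f args <- val v a
  ts <- mapM (valStar v) args
  pure (if null ts then f else f ++ "(" ++ intercalate "," ts ++ ")")

-- The conventional heap representation of f(a, b).
exampleVal :: Valuation
exampleVal = M.fromList [(0, Sig "f" [1, 2]), (1, Sig "a" []), (2, Sig "b" [])]

main :: IO ()
main = print (valStar exampleVal 0)    -- Just "f(a,b)"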
The conventional representation of a term by a linked structure in a
heap may be understood naturally as a table representing a finitely based
parameter valuation. The parameters are the heap addresses, and the
signatures are the possible values for data nodes. val(α) is the signature
stored at address α. But, we may also think of parameters as additional
when repl(α) = nil and val(α) = f(α1, ..., αi). As with val*, val*repl is
undefined if nil is encountered as a value of val, or if the induction fails
to terminate because of a loop.
Example 4.1.6. Consider the finitely based parameter valuation val
and parameter replacement repl given by Table 2. All values of val and
repl not shown in the tables are nil. These tables represent some of the
consequences of the equations
when used to rewrite the term f(f(a, b), b). The linked heap structure as-
sociated with val and repl is shown in Figure 6. The rightmost link in
each node α points to repl(α). By following links from α0 in the table we
can construct the six ground terms in val*repl(α0): val*(α0) = f(f(a, b), b),
f(f(a, b), c), f(f(a, c), b), f(f(a, c), c), g(a, b), and valrepl*(α0) = g(a, c).
Every equality between these terms is a logical consequence of T22, and
all of these equalities may be read immediately from the data structure by
following links from α0.
A prime weakness of data structures based on parameter valuations
and replacements is that both functions require a parameter as argument.
Given a newly constructed signature, there is no direct way, other than
Example 4.1.8. Consider the finitely based parameter valuation val and
parameter replacement repl discussed in Example 4.1.6, and shown in Ta-
ble 2 and Figure 6. If α0 is the parameter used for the root of the input
and output, then the logical interpretation of val and repl is
with occurrences Λ (the root), 1 (leftmost b), 2 (f(f(b, b), f(b, f(f(b, b), b)))),
2.1 (leftmost f(b, b)), 2.1.1, 2.1.2 (second and third bs), 2.2 (f(b, f(f(b, b), b))),
2.2.1 (fourth b), 2.2.2 (f(f(b, b), b)), 2.2.2.1 (rightmost f(b, b)), 2.2.2.1.1,
2.2.2.1.2, 2.2.2.2 (fifth, sixth and seventh bs). The pattern f(Ω, f(f(Ω, a), a)),
and the term t, are shown pictorially in Figure 7. The occurrence of Ω cho-
sen for expansion at each step is underlined, and the subterm replacing it
is shown in a box.
• Start at the root of t, with sΛ = Ω.
* There is only one choice, so read the f at the root, and expand
to sΛ = f(Ω, Ω).
* Only the rightmost Ω is strongly needed, so read the correspond-
ing symbol, and expand to sΛ = f(Ω, f(Ω, Ω)).
and continue at 2.2, using the last value for s2.2 before reading 2.2.2.2:
• Continue at 2 with the last Ω-term before reading 2.2: s2 = f(Ω, Ω).
* Expand the rightmost Ω again, but this time we read an a in-
stead of an f, yielding s2 = f(Ω, a).
* The new Ω-term is incompatible with the patterns, and there is
no compatible subterm, so we move back toward the root.
• Continue at Λ with the last Ω-term before reading 2.2:
according to the ways that it does and does not match portions of those
left-hand sides.
Definition 4.2.5. Let T be a set of equations. UT is the set of all sub-
terms of members of patt(T).
A subpattern set for T is a subset B ⊆ UT.
Subpattern sets may be used to present information about pattern-
matching and sequencing in several ways:
1. A match set is a subpattern set containing all of the subpatterns
known to match at a particular node in a term.
2. A possibility set is a subpattern set containing all of the subpatterns
that might match at a particular node in a term, as the result of
ω-rewriting the proper subterms of that node.
3. A search set is a subpattern set containing subpatterns to search for
at a particular node, in order to contribute to a redex.
Notice that match sets and possibility sets always contain Ω (except in the
unusual case when T has no occurrence of a variable), because everything
matches Ω. Search sets never contain Ω, since it is pointless to search for
something that is everywhere; they always contain patt(T), since finding
an entire redex occurrence is always useful.
In order to control pattern-matching and sequencing with subpattern
sets, associate a match set Ma and a possibility set Pa with each node in
a term t. Initially, Ma = {Ω} and Pa = UT for all a. At all times the
subterm at a will match every subpattern in Ma, but no subpattern that
is not in Pa. That is, [M, P] is an interval in the lattice of subpattern sets
in which the true current description of the subterm at a always resides.
At every visit to a node a, update Ma and Pa based on the symbol f at a
and the information at the children a.1, ..., a.n of a as follows:
P'a represents the set of all subpatterns that may match at node a, as
the result of ω-rewriting at proper descendants of a. Notice the similarity
between the calculation of Pa and the process of melting—UT above plays
the role of Ω in melting.
Similarly, keep a search set Sa at each node that is visited in the traver-
sal, entering at the root with SΛ = patt(T). When moving from a node a
to the ith child a.i of a, update the search set at a.i by
and search for a strongly needed redex in the term f(g(c, c), c) with oc-
currences Λ (the root), 1 (the subterm g(c, c)), 1.1, 1.2, and 2 (the three
occurrences of c, from left to right).
• Initially, we search at Λ, with
Update M1 and P1 to
Notice that information from 2 has led to a smaller search set, not
containing g(Ω, a). Update M1 and P1 to
This time it is safe to move to 1.1, but not to 1.2, so we choose the
former.
• Search at 1.1 with
Procedure eval(t)
  Let t = f(t1, ..., tn);          (1)
  For i := 1, ..., n do            (2)
    vi := eval(ti)                 (3)
  end for;
  Return f(v1, ..., vn)            (4)
end procedure eval
above to find the normal form of its input. But, the structure of the proce-
dure is heavily biased toward strict evaluation. Even the conditional func-
tion cond requires a special test between lines (1) and (2) to avoid evaluat-
ing both branches. Lazy functional languages have been implemented with
conventional recursive evaluation, by adding new values, called suspensions,
thunks, or closures to encode unevaluated subterms [Peyton Jones, 1987;
Bloss et al., 1988]. The overhead of naive implementations of suspen-
sions led to early skepticism about the performance of lazy evaluation.
Clever optimizations of suspensions have improved this overhead substan-
tially, but it is still arguably better to start with an evaluation schema that
deals with lazy evaluation directly.
For orthogonal systems of equations, a function symbol that contributes
to a redex occurrence at a proper ancestor cannot itself be at the root of a
redex. So, a term may be normalized by a recursive procedure that rewrites
its argument only until the root symbol can never be rewritten.
Definition 4.3.1. Let T be a set of equations, and t be a term. t is a
(weak) head-normal form for T if and only if there is no redex u such that
t ->* u.
t is a strong head-normal form for T if and only if there is no redex u
such that t ω->* u.
Head normality is undecidable, but strong head normality is easy to
detect by melting.
Proposition 4.3.2. Let t be a term. t is a strong head-normal form for
T if and only if meltT(t) ≠ Ω.
Let t be an Ω-term. The following are equivalent:
• u is a strong head-normal form for T, for all u ⊒ t
• meltT(t) ≠ Ω.
The procedure head-eval in Figure 9 below rewrites its argument only to
strong head-normal form, instead of all the way to normal form. The
tests whether t is in strong head-normal form (1), whether t is a redex
(2), and the choice of a safe child i (5), may be accomplished by any
Procedure head-eval(t)
  While t is not in strong head-normal form do     (1)
    If t is a redex then                           (2)
      t := right-side(t)                           (3)
    else
      Let t = f(t1, ..., tn);                      (4)
      Choose i ∈ [1, n] that is safe to process;   (5)
      ti := head-eval(ti);                         (6)
      t := f(t1, ..., tn)                          (7)
    end if
  end while;
  Return t                                         (8)
end procedure head-eval
Procedure norm(t)
  t := head-eval(t);          (1)
  Let t = f(t1, ..., tn);     (2)
  For i := 1, ..., n do       (3)
    ti := norm(ti)            (4)
  end for;
  t := f(t1, ..., tn);        (5)
  Return t                    (6)
end procedure norm
enough to be well worth the effort. The cost of detecting identical subterms
of right-hand sides using the tree isomorphism algorithm [Aho et al., 1974]
is quite modest, and can be borne entirely at compile time.
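A minimal Haskell sketch of that compile-time detection (illustrative; it uses a finite map keyed by subterms rather than the linear-time tree isomorphism algorithm cited above):

import qualified Data.Map as M

data Term = Var String | Fun String [Term] deriving (Eq, Ord, Show)

subterms :: Term -> [Term]
subterms t@(Var _)    = [t]
subterms t@(Fun _ ts) = t : concatMap subterms ts

-- Subterms that occur more than once, so the compiled right-hand side
-- can build one shared copy of each.
repeatedSubterms :: Term -> [Term]
repeatedSubterms t =
  [ s | (s, n) <- M.toList (M.fromListWith (+) [(s, 1 :: Int) | s <- subterms t])
      , n > 1 ]

-- Right-hand side g(x, x) of the equation f(x) = g(x, x) from Example 3.2.12.
rhs :: Term
rhs = Fun "g" [Var "x", Var "x"]

main :: IO ()
main = print (repeatedSubterms rhs)     -- [Var "x"]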
4.3.3 Dynamic exploitation of sharing and memoing of ground
terms
There are several ways that the signature index ind may be used to im-
prove sharing dynamically and opportunistically at run time (these are
examples of transformations (6) and (7) of Section 4.1.2). A further so-
phistication of right-side builds the structure of the instance of r from
the leaves and variable instances up, and for each constructed signature
f(α1, ..., αn) it checks ind(f(α1, ..., αn)). If the result is nil, right-side
builds a new heap node β containing signature f(α1, ..., αn), and updates
ind by ind(f(α1, ..., αn)) := β. If the result is γ ≠ nil, it shares the pre-
existing representation rooted at γ. This technique is essentially the same
as the hashed cons optimization in Lisp implementations [Spitzen et al.,
1978]. There are more places where it may be valuable to check the
signature index for opportunistic sharing. Immediately after line (6) of
head-eval, the signature at the root α of the heap representation of t may
have changed, due to rewriting of ti and path-compression. A sophisticated
implementation may check ind(val(α)), and if the result is some β distinct
from nil and from α, reassign repl(α) := β, so that further path compression
will replace links pointing to α by links to β, thus increasing sharing. If
ind(val(α)) = nil, then reassign ind(val(α)) := α. Finally, when head-eval
is called recursively on a subterm rooted at the heap address α, the signature
at α may be modified as the result of rewriting of a shared descendant of α
that was visited previously by a path not passing through α. So, the same
checking of ind and updating of repl or ind may be performed in step (1) of
head-eval, after any implicit path compression, but before actually testing
for strong head-normal form.
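A minimal Haskell sketch of the bottom-up construction with sharing (illustrative; it shows only the hashed-cons use of the signature index described above, not path compression or the repl updates):

import qualified Data.Map as M

type Param = Int
data Sig = Sig String [Param] deriving (Eq, Ord, Show)

data Heap = Heap
  { val     :: M.Map Param Sig     -- parameter valuation
  , ind     :: M.Map Sig Param     -- signature index
  , nextPar :: Param               -- next unused parameter
  } deriving Show

emptyHeap :: Heap
emptyHeap = Heap M.empty M.empty 0

-- Allocate, or share, a node with signature f(a1, ..., an).
node :: String -> [Param] -> Heap -> (Param, Heap)
node f args h =
  let s = Sig f args
  in case M.lookup s (ind h) of
       Just p  -> (p, h)                       -- share the existing node
       Nothing ->
         let p = nextPar h
         in ( p
            , h { val     = M.insert p s (val h)
                , ind     = M.insert s p (ind h)
                , nextPar = p + 1 } )

main :: IO ()
main = do
  -- Build g(f(a), f(a)) bottom-up: the two occurrences of f(a) share a node.
  let (pa,  h1) = node "a" []         emptyHeap
      (pf1, h2) = node "f" [pa]       h1
      (pf2, h3) = node "f" [pa]       h2      -- found in ind: pf2 == pf1
      (_,   h4) = node "g" [pf1, pf2] h3
  print (pf1 == pf2)                           -- True
  print (M.toList (val h4))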
The opportunistic sharing techniques described above have a nontriv-
ial cost in overhead added to each computation step, but in some cases
they reduce the number of computation steps by an exponentially growing
amount. See [Sherman, 1990] for more details on opportunistic sharing,
and discussion of the tradeoff between overhead on each step and number
of steps. The method that checks new opportunities for sharing at every
node visit as described above is called lazy directed congruence closure. An
even more aggressive strategy, called directed congruence closure [Chew,
1980], makes extra node visits wherever there is a chance that a change in
signature has created new sharing opportunities, even though the pattern-
matcher/sequencer has not generated a need to visit all such nodes (i.e.,
when the signature at a shared node changes, directed congruence closure
visits all parents of the node, many of which might never be visited again
by head-eval). Directed congruence closure may reduce the number of
app(w, cons(x, cons(y, z))) = cons(x, cons(y, app(w, z))), which moves the
appended element w past the first two elements in the list. The new equa-
tion paramodulates again with app(x, cons(y,z)) = cons(y, app(x, z)) to
move the appended element past three list elements, but paramodulat-
ing the new equation with itself does even better, moving the appended
element past four list elements. It is straightforward to produce a sequence
of equations, each by paramodulating the previous one with itself, so that
the ith equation in the sequence moves an appended element past 2^i list
elements. So, paramodulation reduces the number of steps involved in nor-
malizing app(α, β), where β is a list of i elements, from i + 1 to O(log i).
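As a concrete illustration (an assumption-laden sketch, not the chapter's implementation), the original append equation and the equation produced by one self-paramodulation step can be written as Haskell clauses that count their own rewriting steps; the base equation app(w, nil) = w is an assumption added to make the functions total.

-- original equation:        app(x, cons(y, z)) = cons(y, app(x, z))
-- after one paramodulation: app(w, cons(x, cons(y, z))) = cons(x, cons(y, app(w, z)))

appSteps :: [a] -> [a] -> ([a], Int)
appSteps w []      = (w, 1)                   -- assumed base equation app(w, nil) = w
appSteps w (y : z) = let (r, n) = appSteps w z in (y : r, n + 1)

appSteps2 :: [a] -> [a] -> ([a], Int)
appSteps2 w (x : y : z) = let (r, n) = appSteps2 w z in (x : y : r, n + 1)
appSteps2 w ys          = appSteps w ys       -- fall back near the end of the list

main :: IO ()
main = do
  let long = [1 .. 10 :: Int]
  print (snd (appSteps  [0] long))            -- 11 steps: one per element, plus the base case
  print (snd (appSteps2 [0] long))            -- 6 steps with the doubled equation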
The improvement in normalizing rev(α) is less dramatic, but perhaps
more useful in practice. If α has i elements, then rewriting requires Ω(i²)
steps to normalize rev(α), because it involves appending to the end of a
sequence of lists of lengths 0, 1, ..., i. But, we may produce a sequence of
equations, each by paramodulating the previous one with
6 Parallel implementation
One of the primary early motivations for studying functional programming
languages was the apparent opportunities for parallel evaluation [Backus,
1978]. So, it is ironic that most of the effort in functional and equational
logic programming to date involves sequential implementation. A variety
of proposals for parallel implementation may be found in [Peyton Jones,
1987].
Both strictness analysis and sequentiality analysis are used primarily
to choose a sequential order of computation that avoids wasted steps. An
important open topic for research is the extension of sequentiality analysis
to support parallel computation. Huet and Levy's sequentiality analysis
is already capable of identifying more than one strongly needed redex in
a term. A parallel implementation might allocate processors first to the
strongly needed redexes, and then to other more speculative efforts. It ap-
pears that sequentiality analysis can be generalized rather easily (although
this has not been done in print) to identify strongly needed sets of re-
dexes, where no individual redex is certain to be needed, but at least one
member of the set is needed. For example, with the positive parallel-or
equations in Tor+ of Examples 2.3.16 and 3.2.1, it is intuitively clear that
if α is needed in t, and β is needed in u, then at least one of α and β must
be rewritten in or(t, u) (although neither is needed according to the for-
mal Definition 3.2.3). Further research is required on the practical impact
of heuristic strategies for allocating parallel processors to the members of
strongly needed sets. It is natural to give priority to the singleton sets
(that is, to the strongly needed redexes), but it is not clear whether a set
of size 2 should be preferred to one of size 3—perhaps other factors than
the size of the set should be considered. Strongly needed redex sets are
essentially disjunctive assertions about the need for redexes—more general
sorts of boolean relations may be useful (e.g., either all of α1, ..., αm or
all of β1, ..., βn are needed).
Unfortunately, since strongly needed redexes are all outermost, sequen-
tiality analysis as known today can only help with parallelism between
different arguments to a function. But, one of the most useful qualities of
lazy programming is that it simulates a parallel producer-consumer rela-
tionship between a function and its arguments. It seems likely that much
of the useful parallelism to be exploited in equational logic programming
involves parallel rewriting of nested redexes. An analysis of nonoutermost
needed redexes appears to require the sort of abstract interpretation that is
used in strictness analysis [Mycroft, 1980; Hughes, 1985b]—it certainly will
depend on right-hand sides as well as left-hand sides of equations. Unfor-
tunately, most of the proposed intermediate languages for compiling equa-
tional programs are inherently sequential, and a lot of work is required to
convert current sequential compiling ideas to a parallel environment. The
idea of compiling everything into combinators may not be useful for parallel
implementation. The known translations of lambda terms into combinators
eliminate some apparent parallelism between application of a function and
rewriting of the definition of the function (although no direct implemen-
tation is known to support all of this apparent parallelism either). Only
very preliminary information is available on the inherent parallel power of
rewriting systems: even the correct definition of such power is problematic
[O'Donnell, 1985].
At a more concrete level, there are a lot of problems involved in par-
allelizing the heap-based execution of evaluation/rewriting sequences. A
data structure, possibly distributed amongst several processors, is needed
to keep track of the multiple locations at which work is proceeding. If any
speculative work is done on redexes that are not known to be needed, there
must be a way to kill off processes that are no longer useful, and reallocate
their processing to useful processes (although aggressive sharing through
the signature index confuses the question of usefulness of processes in the
same way that it confuses the usefulness of heap nodes).
Sharing presents another challenge. Several different processors may
reach a shared node by different paths. It is important for data integrity
that they do not make simultaneous incompatible updates, and important
for efficiency that they do not repeat work. But, it is incorrect for the
first process to lock all others completely out of its work area. Suppose we
are applying a system of equations including car(cons(x, y)) = x. There
may be a node α in the heap containing the signature cons(β1, β2) shared
between two parents, one of them containing cons(α, δ) and the other con-
taining car(α). A process might enter first through the cons parent, and
perform an infinite loop rewriting inside β2. If a second process enters
through the car node, it is crucial to allow it to see the cons at α, to link
to β1, and depending on the context above possibly to continue rewriting
at β1 and below.
Abstractly, we want to lock node/state pairs, and allow multiple pro-
cesses to inspect the same node, as long as they do so in different states,
but when a process reaches a node that is already being processed in the
same state it should wait, and allow its processor to be reassigned, since
any work that it tries to do will merely repeat that of its predecessor. It is
not at all clear how to implement such a notion of locking with acceptable
overhead. Perhaps a radically different approach for assigning work to pro-
cessors is called for, that does not follow the structure of current sequential
implementations so closely. For example, instead of the shared-memory ap-
proach to the heap in the preceding discussion, perhaps different sections
of the heap should be assigned to particular processors for a relatively long
period, and demands for evaluation should be passed as messages between
processors when they cross the boundaries of heap allocation.
The natural infinite output to expect from input a is f(c, f(c, ...)). b pro-
duces the same infinite output. So, if in our semantic system f(c, f(c, ...))
is a generalized term, with a definite value (no matter what that value is),
then T27 |= (a = b). But, according to the semantics of Definition 2.3.14
in Section 2.3.4 of the chapter 'Introduction: Logic and Logic Program-
ming Languages', T27 ⊭ (a = b), because there are certainly models in
which the equation x = f(c, x) has more than one solution. For example,
interpret f as integer addition, and c as the identity element zero. Every
integer n satisfies n = 0 + n.
Even if we replace the second equation with the exact substitution of b
for a in the first, that is b = f(c, b), it does not follow by standard equational
semantics that a = b. With some difficulty we may define a new semantic
system, with a restricted set of models in which all functions have sufficient
continuity properties to guarantee unique values for infinite terms. But it is
easy to see that the logical consequence relation for such a semantic system
is not semicomputable (recursively enumerable). It is always suspect to say
that we understand a system of logic according to a definition of meaning
that we cannot apply effectively.
Instead, I propose to interpret outputs involving infinite terms as ab-
breviations for infinite conjunctions of formulae in the first-order predicate
calculus (FOPC) (such infinite conjunctions, and infinite disjunctions as
well, are studied in the formal system called Lω1,ω [Karp, 1964]).
Definition 7.1.2. The first-order predicate calculus with equality (FOPC=)
is the result of including the equality symbol = in the set Pred2 of binary
predicate symbols, and combining the semantic rules for FOPC (Defini-
tions 2.3.1-2.3.4, Section 2.3.1 of the chapter 'Introduction: Logic and
Logic Programming Languages') and Equational Logic (Definitions 2.3.13-
2.3.14, Section 2.3.4 of the 'Introduction ...' chapter) in the natural way
(a model for FOPC= satisfies the restrictions of a model for FOPC and the
restrictions of a model for equational logic).
Lω1,ω= is the extension of FOPC= to allow countably infinite conjunc-
tions and disjunctions in formulae. Let Fω1,ω= be the set of formulae
in Lω1,ω=. Extend the semantic system for FOPC= to a semantic system
for Lω1,ω= by the following additional rules for infinite conjunctions and
disjunctions:
1. pT,ν(∧{A1, A2, ...}) = 1 if and only if pT,ν(Ai) = 1 for all i ≥ 1
2. pT,ν(∨{A1, A2, ...}) = 1 if and only if pT,ν(Ai) = 1 for some i ≥ 1
That is, an infinite conjunction is true precisely when all of its conjuncts
are true; and an infinite disjunction is true precisely when at least one of
its disjuncts is true.
A set T of terms is a directed set if and only if, for every two terms
t1, t2 ∈ T, there is a term t3 ∈ T such that t3 is an instance of t1 and also
an instance of t2.
Let U be a countable directed set of linear terms. For each u ∈ U, let
yu be a list of the variables occurring in u. Let t be a finite ground term.
The conjunctive equation of t and U is the infinite conjunction
The term limit of a directed set U (written lim(U)) is the possibly infi-
nite term resulting from overlaying all of the terms u ∈ U, and substituting
the new symbol ⊥ for any variables that are not overlaid by nonvariable
symbols. Since members of U are pairwise consistent, every location in the
limit gets a unique symbol, or by default comes out as ⊥. To see more
rigorously that the term limit of U is well defined, construct a (possibly
transfinite) chain (t0 ⊑ t1 ⊑ ...). First, t0 = ⊥. Given tα, choose (axiom
of choice) an s ∈ U such that s ⋢ tα, and let tα+1 be the overlay of s with
tα. At limit ordinals λ, let tλ be the limit of the chain of terms preceding
λ. With a lot of transfinite induction, we get to a tβ such that, for all
s ∈ U, s ⊑ tβ, at which point the chain is finished. lim(U) is the limit of
that chain.
The canonical set of an infinite term t (written approx(t)) is the set of
all finite linear terms t' (using some arbitrary canonical scheme for naming
the variables) such that t is an instance of t'. approx(t) is a directed set,
and lim(approx(t)) = t.
A careful formal definition of infinite terms and limits is in [Kennaway
et al., 1991; Dershowitz et al., 1991], but an intuitive appreciation suffices
for this section.
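As an illustration (a sketch under obvious assumptions, relying on Haskell's own laziness rather than the formal machinery above), the infinite output f(c, f(c, ...)) mentioned at the start of this section can be written as a cyclically defined value, and its depth-n cuts play the role of the finite approximations in approx(t), with Bottom standing for the unexplored part:

data Term = Fun String [Term] | Bottom deriving Show

-- The infinite term f(c, f(c, f(c, ...))).
infiniteOutput :: Term
infiniteOutput = Fun "f" [Fun "c" [], infiniteOutput]

-- Cut a (possibly infinite) term off at depth n.
approxDepth :: Int -> Term -> Term
approxDepth 0 _          = Bottom
approxDepth _ Bottom     = Bottom
approxDepth n (Fun f ts) = Fun f (map (approxDepth (n - 1)) ts)

main :: IO ()
main = mapM_ (print . flip approxDepth infiniteOutput) [1, 2]
  -- Fun "f" [Bottom,Bottom]
  -- Fun "f" [Fun "c" [],Fun "f" [Bottom,Bottom]]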
When the finite input t produces the infinite output lim(U), instead
of interpreting the output as the equation t = lim(U), interpret it as the
conjunctive equation of t and U. Notice that a chain (a sequence u1, u2, ...
such that ui+1 is an instance of ui) is a special case of a directed set.
The infinite outputs for finite inputs may be expressed by chains, but the
generality of directed sets is needed with infinite outputs for infinite inputs
below.
Infinite inputs require a more complex construction, since they intro-
duce universal quantification that must be nested appropriately with the
existential quantification associated with infinite output. Also, different
orders in which a consumer explores the output may yield different se-
quences of demands for input, and this flexibility needs to be supported by
the semantics.
Definition 7.1.3. Let T and U be directed sets of linear terms such that
no variable occurs in both t ∈ T and u ∈ U. For each term t ∈ T and u ∈ U,
let x_t and y_u be lists of the variables occurring in t and u, respectively. Let
f : T → 2^U be a function from terms in T to directed subsets of U, such
that when t2 is an instance of t1, every member of f(t2) is an instance of
every member of f(t1). The conjunctive equation of T and U by f is the
infinite conjunction
if and only if there exists a directed set U and a monotonic (in the instance
ordering ⊑) function f : approx(t_w) → 2^U such that
1. A is the conjunctive equation of approx(t_w) and U by f;
2. u is in incremental normal form, for all u ∈ U.
Now (F_{ω₁ω,⊥}, Q, ?−) is a query system representing the answers to
questions of the form 'what set of incremental normal forms for s1, ..., si
is conjunctively equal to t?' In an implementation of equational logic pro-
gramming based on such questions, the consumer of output generates demands
causing the computation of some u ∈ U. Some of the symbols in u are
demanded directly by the consumer, others may be demanded by the sys-
tem itself in order to get an incremental normal form. The implementation
demands enough input to construct t ∈ approx(t_w) such that u ∈ f(t). By
modelling the input and output as directed sets, rather than sequences, we
allow enough flexibility to model all possible orders in which the consumer
might demand output. The function f is used to model the partial synchro-
nization of input with output required by the semantics of equational logic.
Unfortunately, the trivial output y, y, ..., with term limit ⊥, always satisfies
1-2 above. The most useful implementation would provide consequentially
strongest answers (Definition 2.2.5, Section 2.2 of the chapter 'Introduction:
Logic and Logic Programming Languages')—that is, they would demand
the minimum amount of input semantically required to produce the de-
manded output. If consequentially strongest answers appear too difficult
to implement, a more modest requirement would be that lim(U) is max-
imal among all correct answers—that is, all semantically correct output
is produced, but not necessarily from the minimal input. The techniques
outlined above do not represent sharing information, either within the in-
put, within the output, or between input and output. Further research is
needed into semantic interpretations of sharing information.
Example 7.1.5. Consider the system
where a<j is the root of the heap representation of the term being rewritten).
Heuristically, it appears that the majority of the individual steps in
solving an equation are unlikely to create such parallelism. But, EqL's uni-
form requirement of finding a solution in normal form is likely to introduce
the same huge inefficiencies in some cases as strict evaluation (notice that
the individual evaluations of s', t' to normal form are lazy, but the equation
solution is not lazy, as it may take steps that are required to reach normal
form, but not to solve the equation). If these heuristic observations are ac-
curate, even an implementation with rather high overhead for parallelism
may be very valuable, as long as the cost of parallelism is introduced only
in proportion to the actual degree of parallelism, rather than as a uniform
overhead on sequential evaluation as well.
Among the theoretical literature on term rewriting, the most useful
material for equation solving will probably be that on narrowing [Fay,
1979] (see the chapter 'Equational Reasoning and Term Rewriting Systems'
in Volume 1), which is a careful formalization of observation (2) above.
Information on sequencing narrowing steps seems crucial to a truly efficient
implementation: some new results are in [Antoy et al., 1994].
7.3 Indeterminate evaluation in subset logic
Until now, I have considered only quantified conjunctions of equations as
formulae, since otherwise I could not claim to be using equational logic. The discipline
of equational logic constrains the use of term rewriting systems, and leads
us to insist on confluent systems with lazy evaluation. Lazy evaluation
appears to be mostly a benefit, but the restriction to confluent systems
is annoying in many ways, and it particularly weakens the modularity of
equational logic programming languages, since orthogonality depends on
the full textual structure of left-hand sides of equations, and not just on an
abstract notion of their meanings. As well as improving the techniques for
guaranteeing confluence, we should investigate nonconfluent term rewrit-
ing. Since, without confluence, term rewriting cannot be semantically com-
plete for equational logic, we need to consider other logical interpretations
of term rewriting rules.
A natural alternative to equational logic is subset logic [O'Donnell,
1987]. Subset logic has the same terms as equational logic, and its formu-
lae are the same except that they use the backward subset relation symbol
⊇ instead of = (Definition 2.3.13, Section 2.3.4 of the chapter 'Introduction:
Logic and Logic Programming Languages'). Subset logic semantics are the
same as equational (Definition 2.3.14, Section 2.3.4 of the 'Introduction
...' chapter), except that every term represents a subset of the universe of
values instead of a single value, and function symbols represent functions from
subsets to subsets that are extended pointwise from functions on individual
values (i.e., f(S) = ⋃{ f({x}) : x ∈ S }). Subset logic is complete with the
reflexive, transitive, and substitution rules of equality, omitting the sym-
metric rule (Definition 2.1.2, Section 2.1). So, term rewriting from left to
right is complete for subset logic, with no restrictions on the rules. Tech-
nically, it doesn't matter whether the left side of a rule is a subset of the
right, or vice versa, as long as the direction is always the same. Intuitively,
it seems more natural to think of rewriting as producing a subset of the
input term, since then a term may be thought of as denoting a set of pos-
sible answers. Thus, a single line of a subset logic program looks like l ⊇ r.
Note that normal forms do not necessarily denote singleton sets, although
it is always possible to construct models in which they do.
Subset logic programming naturally supports programs with indetermi-
nate answers. When a given input s rewrites to two different normal forms
t and u, subset logic merely requires that s ⊇ t and s ⊇ u are true, from
which it does not follow that t = u is true, nor t ⊇ u, nor u ⊇ t. While
equational logic programming extends naturally to infinite inputs and out-
puts, without changing its application to finite terms, such extension of
subset logic programming is more subtle. If only finite terms are allowed,
then infinite computations may be regarded as a sort of failure, and a finite
normal form must be found whenever one exists. If incremental output
of possibly infinite normal forms is desired, then there is no effective way
to give precedence to the finite forms when they exist. The most natural
idea seems to be to follow all possible rewriting paths, until one of them
produces a stable head symbol (that is, the head symbol of a strong head-
normal form) for output. Whenever such a symbol is output, all rewriting
paths producing different symbols at the same location are dropped. Only
a reduction path that has already agreed with all of the symbols that have
been output is allowed to generate further output. The details work out es-
sentially the same as with infinite outputs for equational programs, merely
substituting ⊇ for =. It is not at all clear whether this logically natu-
ral notion of commitment to a symbol, rather than a computation path,
is useful. I cannot find a natural semantic scheme to support the more
conventional sort of commitment to a computation path instead of an in-
termediate term, although a user may program in such a way that multiple
consistent computation paths never occur and the two sorts of commitment
are equivalent.
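As a concrete illustration of such indeterminate answers, here is a naive sketch (Python; the ground rule set, the encodings, and all names are hypothetical, not from the chapter) that follows every rewriting path of a term under nonconfluent rules l ⊇ r and collects all reachable normal forms; the next paragraph explains why this brute-force exploration is unacceptable as an implementation.

# Ground terms as nested tuples: (symbol, arg1, ..., argn).
# Rules are ground pairs (lhs, rhs), read as the subset-logic line  lhs ⊇ rhs.
RULES = [
    (('amb', ('a',), ('b',)), ('a',)),     # amb(a,b) may rewrite to its first ...
    (('amb', ('a',), ('b',)), ('b',)),     # ... or to its second argument
    (('f', ('a',)), ('c',)),
]

def rewrite_once(t):
    """All terms reachable from t by one rewrite at some position."""
    results = [rhs for lhs, rhs in RULES if lhs == t]
    for i in range(1, len(t)):
        for s in rewrite_once(t[i]):
            results.append(t[:i] + (s,) + t[i + 1:])
    return results

def normal_forms(t, seen=None):
    """Naively follow every rewriting path and collect the normal forms."""
    seen = set() if seen is None else seen
    if t in seen:
        return set()
    seen.add(t)
    successors = rewrite_once(t)
    if not successors:
        return {t}                        # t is a normal form
    return set().union(*(normal_forms(s, seen) for s in successors))

print(normal_forms(('f', ('amb', ('a',), ('b',)))))
# {('c',), ('f', ('b',))}: two incomparable answers, as subset logic permits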
Efficient implementation of nonconfluent rewriting presents a very in-
teresting challenge. It is obviously unacceptable to explore naively the
exponentially growing set of rewriting sequences. A satisfying implemen-
tation should take advantage of partial confluent behavior to prune the
search space down to a much smaller set of rewriting sequences that is still
capable of producing all of the possible outputs. The correct definition
of the right sort of partial confluence property is not even known. It is
not merely confluence for a subset of rules, since two rules that do not
interfere with one another may interfere differently with a third, so that
the order in which the two noninterfering rules are applied may still make
than two nodes) with nodes standing for variables and hyperedges (edges
touching any number of nodes, not necessarily two) representing predicate
symbols. Two fundamental problems stand in the way of a useful imple-
mentation of relational logic programming through hypergraph rewriting.
1. We need an efficient implementation of hypergraph rewriting. The
pattern-matching problem alone is highly challenging, and no practi-
cal solution is yet known.
2. We need a semantic system that makes intuitive sense, and also sup-
ports computationally desirable methods of hypergraph rewriting.
A first cut at (2) might use the semantic system for FOPC, and express
rewrite rules as formulae of the form
∀x : ∃y : (A1 ∧ ... ∧ Am) ⇐ (B1 ∧ ... ∧ Bn).
These clauses differ from the Horn clauses of Prolog in two ways. The
quantification above is ∀x : ∃y :, whereas clauses are universally quantified.
And, the consequence of the implication above is a conjunction, where
Horn clauses allow at most one atomic formula in the consequence, and
even general clauses allow a disjunction, rather than a conjunction. No-
tice that, with Prolog-style all-universal quantification, the implication dis-
tributes over conjunctions in the consequence ((A1 ∧ A2) ⇐ B is equivalent
to (A1 ⇐ B) ∧ (A2 ⇐ B)). But, if existentially quantified variables from y
appear in A1, ..., Am, the distribution property does not hold.
Since the consequence and hypothesis of our formula are both con-
junctions of atomic formulae, they may be represented by hypergraphs,
and it is natural to read the formula as a hypergraph rewrite rule. A
natural sort of hypergraph rewriting will be sound for deriving univer-
sally quantified implications from ∀∃ implications, as long as the exis-
tentially quantified variables y correspond to nodes in the graph that
participate only in the hyperedges representing A1, ..., Am. The uni-
versally quantified variables in x correspond to nodes that may connect
in other ways as well as by hyperedges representing A1, ..., Am, so they
are the interface nodes to the unrewritten part of the hypergraph. In-
terestingly, the final result of a computation on input I is an implica-
tion ∀x : I ⇐ O. This implication does not give a solution to the goal
I, as in Prolog. Rather, it rewrites the goal I into a (presumably sim-
pler or more transparent) goal O, such that every solution to O is also
a solution to I. This is formally similar to functional and equational
logic programming, in that the class of legitimate outputs is a subclass
of the legitimate inputs—in Prolog inputs are formulae and outputs are
substitutions. Constraint logic programming [Jaffar and Lassez, 1987;
Lassez, 1991] is perhaps heading in this direction.
The semantic treatment suggested above is not very satisfactory, as it
rules out lazy evaluation, and even storage reclamation (an intermediate
part of a hypergraph that becomes disconnected from the input and out-
put cannot be thrown away, even though it cannot affect the nature of
a solution, since if any part of the hypergraph is unsolvable, the whole
conjunction is unsolvable). Also, the logically sound notion of unification
of the left-hand side of a rule with a portion of a hypergraph in order to
rewrite it seems far too liberal with this semantic system—without the
single-valued quality of functions, additional relations can be postulated
to hold anywhere. I conjecture that the first problem can be solved by a
semantic system in which every unquantified formula is solvable, so that a
disconnected portion of the hypergraph may be discarded as semantically
irrelevant. One way to achieve this would be to use some sort of measured
universe [Cohn, 1980], and require every atomic predicate to hold for all but
a subset of measure 0—then every finite intersection of atomic predicates
(equivalently, the value of every finite conjunction of atomic formulae) has
a solution. Such a semantic system is not as weird as it first seems, if we
understand the universe as a space of concepts, rather than real objects,
and think of each predicate symbol as giving a tiny bit of information about
a concept. We can conceive of every combination of information (such as a
green unicorn with a prime number of feet that is a large power of 2), even
though most such combinations never enter our notion of reality. Although
the question of solvability becomes trivial in this semantic system, impli-
cations between solutions still have interesting content. The problem of
limiting unifications in a useful way appears more difficult. I suggest that
nonclassical 'relevant' interpretations of implication [Anderson and Belnap
Jr., 1975] are likely to be helpful.
Acknowledgements
I am very grateful for the detailed comments that I received from the
readers, Bharat Jayaraman, Donald W. Loveland, Gopalan Nadathur and
David J. Sherman.
References
[Abramsky and Hankin, 1987] S. Abramsky and C. Hankin. Abstract In-
terpretation of Declarative Languages. Ellis Horwood, Chichester, UK,
1987.
[Aho et al., 1974] A. V. Aho, John E. Hopcroft, and J. D. Ullman. The
Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.
[Anderson and Belnap Jr., 1975] Alan Ross Anderson and Nuel D. Belnap
Jr. Entailment—the Logic of Relevance and Necessity, volume 1. Prince-
ton University Press, Princeton, NJ, 1975.
[Antoy et al., 1994] Sergio Antoy, Rachid Echahed, and Michael Hanus. A
needed narrowing strategy. In Proceedings of the 21st ACM Symposium
[Jaffar and Lassez, 1987] Joxan Jaffar and Jean-Louis Lassez. Constraint
logic programming. In 14th Annual ACM Symposium on Principles of
Programming Languages, pages 111-119, 1987.
[Jayaraman, 1985] Bharat Jayaraman. Equational programming: A uni-
fying approach to functional and logic programming. Technical Report
85-030, The University of North Carolina, 1985.
[Jayaraman, 1992] Bharat Jayaraman. Implementation of subset-
equational programming. The Journal of Logic Programming, 12(4),
April 1992.
[Johnsson, 1984] Thomas Johnsson. Efficient compilation of lazy evalua-
tion. In Proceedings of the ACM SIGPLAN'84 Symposium on Compiler
Construction, 1984. SIGPLAN Notices 19(6) June, 1984.
[Kahn and Plotkin, 1978] Gilles Kahn and Gordon Plotkin. Domaines con-
crets. Technical report, IRIA Laboria, LeChesnay, France, 1978.
[Kapur et al., 1982] Deepak Kapur, M. S. Krishnamoorthy, and P. Naren-
dran. A new linear algorithm for unification. Technical Report 82CRD-
100, General Electric, 1982.
[Karlsson, 1981] K. Karlsson. Nebula, a functional operating system. Tech-
nical report, Chalmers University, 1981.
[Karp, 1964] Carol R. Karp. Languages with Expressions of Infinite Length.
North-Holland, Amsterdam, 1964.
[Kathail, 1984] Arvind and Vinod Kumar Kathail. Sharing of computation
in functional language implementations. In Proceedings of the Interna-
tional Workshop on High-Level Computer Architecture, 1984.
[Keller and Sleep, 1986] R. M. Keller and M. R. Sleep. Applicative caching.
ACM Transactions on Programming Languages and Systems, 8(1):88-
108, 1986.
[Kenneway et al., 1991] J. R. Kenneway, Jan Willem Klop, M. R. Sleep,
and F. J. de Vries. Transfinite reductions in orthogonal term rewriting
systems. In Proceedings of the 4th International Conference on Rewriting
Techniques and Applications, volume 488 of Lecture Notes in Computer
Science. Springer-Verlag, 1991.
[Klop, 1980] Jan Willem Klop. Combinatory Reduction Systems. PhD
thesis, Mathematisch Centrum, Amsterdam, 1980.
[Klop, 1991] Jan Willem Klop. Term rewriting systems. In S. Abramsky,
Dov M. Gabbay, and T. S. E. Maibaum, editors, Handbook of Logic in
Computer Science, volume 1, chapter 6. Oxford University Press, Oxford,
1991.
[Klop and Middeldorp, 1991] Jan Willem Klop and A. Middeldorp. Se-
quentiality in orthogonal term rewriting systems. Journal of Symbolic
Computation, 12:161-195, 1991.
Proof Procedures for Logic Programming
Donald W. Loveland and Gopalan Nadathur
Contents
1 Building the framework: the resolution procedure . . . . 163
1.1 The resolution procedure 164
1.2 Linear resolution refinements 175
2 The logic programming paradigm 186
2.1 Horn clause logic programming 186
2.2 A framework for logic programming 190
2.3 Abstract logic programming languages 198
3 Extending the logic programming paradigm 212
3.1 A language for hypothetical reasoning 213
3.2 Near-Horn Prolog 219
4 Conclusion 229
namely the proof procedures which are clearly ancestors of the first proof
procedure associated with logic programming, SLD-resolution. Extensive
treatment of proof procedures for automated theorem proving appear in
Bibel [Bibel, 1982], Chang and Lee [Chang and Lee, 1973] and Loveland
[Loveland, 1978].
1.1 The resolution procedure
Although the consideration of proof procedures for automated theorem
proving began about 1958 we begin our overview with the introduction of
the resolution proof procedure by Robinson in 1965. We then review the
linear resolution procedures, model elimination and SL-resolution proce-
dures. Our exclusion of other proof procedures from consideration here
is due to our focus, not because other procedures are less important his-
torically or for general use within automated or semi-automated theorem
proving.
After a review of the general resolution proof procedure, we consider
the linear refinement for resolution and then further restrict the proce-
dure format to linear input resolution. Here we are no longer capable of
treating full first-order logic, but have forced ourselves to address a smaller
domain, in essence the renameable Horn clause formulas. By leaving the
resolution format, indeed leaving traditional formula representation, we see
there exists a linear input procedure for all of first-order logic. This is the
model elimination (ME) procedure, of which a modification known as the
SL-resolution procedure was the direct inspiration for the SLD-resolution
procedure which provided the inference engine for the logic programming
language Prolog. The ME-SL-resolution linear input format can be trans-
lated into a very strict resolution restriction, linear but not an input re-
striction, as we will observe.
The resolution procedure invented by Alan Robinson and published in
1965 (see [Robinson, 1965]) is studied in depth in the chapter on resolution-
based deduction in Volume 1 of this handbook, and so quickly reviewed
here. The resolution procedure is a refutation procedure, which means
we establish unsatisfiability rather than validity. Of course, no loss of
generality occurs because a formula is valid if and only if (iff) its negation
is unsatisfiable. The formula to be tested for unsatisfiability is "converted
to conjunctive normal form with no existential quantifiers present". By this
we mean that with a given formula F to be tested for unsatisfiability, we
associate a logically equivalent formula F' which is in conjunctive normal
form (a conjunction of clauses, each clause a disjunction of atomic formulas
or negated atomic formulas, called literals) with no existential quantifiers.
F' may be written without any quantifiers, since we regard F, and therefore
F', as a closed formula, so the universal quantifiers are implicitly assumed
to be present preceding each clause. (Thus we are free to rename variables
of each clause so that no variable occurs in more than one clause.) The
where g(x) is a Skolem function introduced for the y that occurs in the
intermediate formula
has
a notation change in the middle of the paper, but this reflects common
sense and historical precedent, so we forsake uniform notation.
Resolution procedures are based on the resolution inference rule

    C1 ∨ a        ¬a ∨ C2
    ----------------------
           C1 ∨ C2

where C1 ∨ a and ¬a ∨ C2 are two known clauses, with C1 and C2 repre-
senting arbitrary disjunctions of literals. The literals a and ¬a are called the
resolving literals. The derived clause is called the resolvent. The similarity
of the inference rule to Gentzen's cut rule is immediately clear and the rule
can be seen as a generalization of modus ponens. The resolvent is true in
any model for the two given clauses, so the inference rule preserves validity.
The resolution inference rule just given is the propositional rule, also
called the ground resolution rule for a reason given later. We postpone
discussion of the very similar first-order inference rule to delay some com-
plications.
Because the Skolem conjunctive format is so uniform in style, it is con-
venient to simplify notation when using it. We drop the OR symbol in
clauses and instead simply concatenate literals. Clauses themselves are ei-
ther written one per line or separated by commas. When the non-logical
symbols are all single letters it is also convenient to drop the parentheses as-
sociated with predicate letters and the commas between arguments. Thus
the Skolem conjunctive form formula (P(a) ∨ P(b)) ∧ (¬P(a)) ∧ (¬P(b)) is
shortened to PaPb, ¬Pa, ¬Pb by use of the simplifying notation.
Using the just-introduced shorthand notation for a formula in this form,
often called a clause set, we present a refutation of a clause set. We use
the term input clause to designate a clause from the input clause set given
by the user to distinguish such clauses from derived clauses of any type.
1. PaPb input clause
2. Pa¬Pb input clause
3. ¬PaPb input clause
4. ¬Pa¬Pb input clause
5. Pa resolvent of clauses 1, 2
6. ¬Pa resolvent of clauses 3, 4
7. contradiction Pa and ¬Pa cannot both hold
Note that in line 5 the two identical literals are merged.
Since each resolvent is true in any model satisfying the two parent
clauses, a satisfiable clause set could not yield two contradictory unit
clauses. Thus Pa, ¬Pa signals an unsatisfiable input clause set. The
resolution inference rule applied to Pa and ¬Pa yields the empty clause,
denoted □. This is customarily entered instead of the word "contradiction",
as we do hereafter.
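The small example above can be checked mechanically. The following sketch (Python; the clause encoding and the names are mine, not the authors') represents literals as strings with a leading '-' for negation and clauses as frozensets, so that identical literals merge automatically as in line 5, and saturates the clause set under the resolution rule until the empty clause appears or no new clauses can be added.

from itertools import combinations

def complement(lit):
    return lit[1:] if lit.startswith('-') else '-' + lit

def resolvents(c1, c2):
    """All resolvents of two ground clauses (frozensets of literals)."""
    out = []
    for lit in c1:
        if complement(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {complement(lit)}))
    return out

def refute(clauses):
    """Level-saturation search: True iff the empty clause is derivable."""
    clauses = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolvents(c1, c2):
                if not r:
                    return True          # the empty clause: input set unsatisfiable
                if r not in clauses:
                    new.add(r)
        if not new:
            return False                 # saturation without the empty clause
        clauses |= new

# The clause set refuted above: PaPb, Pa-Pb, -PaPb, -Pa-Pb
print(refute([{'Pa', 'Pb'}, {'Pa', '-Pb'}, {'-Pa', 'Pb'}, {'-Pa', '-Pb'}]))  # True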
Except for merging, these unit clause resolutions must be the resolutions
that occur near the end of refutations, so it makes sense to "look ahead"
in this limited way.
Having treated the level-saturation resolution proof procedure, which is
a breadth-first search ordering, it is natural to ask about depth-first search
procedures. This is of particular interest to logic programmers who know
that this is the search order for Prolog. We want to do this in a manner
that preserves completeness. (Why it is desirable to preserve completeness
is a non-trivial question, particularly at the first-order level where searches
need not terminate and the combinatorial explosion of resolvent produc-
tion is the dominant problem. Two points speak strongly for considering
completeness: 1) these inference systems have such a fine-grained inference
step that incompleteness in the inference proof procedures leads to some
very simple problems being beyond reach and 2) it is best to understand
what one has to do to maintain completeness to know what price must
be paid if completeness is sacrificed. We will make some further specific
comments on this issue later.)
Before presenting a depth-first proof procedure let us observe that at the
propositional, or ground, level there are only a finite number of resolvents
possible beginning with any given (finite) clause set. This follows directly
from two facts: 1) no new atoms are created by the resolution inference
rule, and 2) a literal appears at most once in a clause, so there is an upper
bound on the number of literals in any resolvent.
A straightforward depth-first proof procedure proceeds as follows. If
there are n input clauses then we temporarily designate clause n as the focus
clause. In general the focus clause is the last resolvent still a candidate for
inclusion in a resolution refutation (except for the first clause assignment,
to clause n). We begin by resolving focus clause n against clause 1, then
clause 2, etc. until a new non-subsumed non-tautological clause is created
as a resolvent. This resolvent is labeled clause n+1 and becomes the new
focus clause. It is resolved against clause 1, clause 2, etc. until a new clause
is created that is retained. This becomes the new focus clause and the
pattern continues. If □ is obtained the refutation is successful. Otherwise,
some focus clause m creates no new surviving clause and backtracking is
begun. The clause m − 1 is relabeled the focus clause (but clause m is
retained) and clause m − 1 is resolved against those clauses not previously
tried, beginning with clause j + 1 where clause j is the clause that paired
with clause m − 1 to produce clause m. The first retained resolvent is
labeled clause m + 1. Clause m + 1 now becomes the focus clause and the
process continues as before. This backtracking is close to the backtracking
employed by Prolog.
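A much simplified rendering of this focus-clause procedure may help fix the bookkeeping. The sketch below (Python; ground clauses only, with duplicate and tautology tests standing in for the subsumption check, and all names my own) keeps the numbered clause list, the index of the current focus clause, and, for each clause, the index of the next partner to try, so that backtracking is just a matter of moving the focus back one clause and resuming where that clause left off.

def complement(lit):
    return lit[1:] if lit.startswith('-') else '-' + lit

def resolvents(c1, c2):
    return [(c1 - {l}) | (c2 - {complement(l)}) for l in c1 if complement(l) in c2]

def tautology(c):
    return any(complement(l) in c for l in c)

def depth_first_refute(input_clauses):
    clauses = [frozenset(c) for c in input_clauses]   # clauses[i] is clause i+1
    tried = [0] * len(clauses)   # per clause: index of the next partner to try
    focus = len(clauses) - 1     # the last input clause is the first focus clause
    while focus >= 0:
        produced = False
        while tried[focus] < focus and not produced:
            partner = tried[focus]
            tried[focus] += 1
            for r in resolvents(clauses[focus], clauses[partner]):
                if not r:
                    return True                       # the empty clause
                if not tautology(r) and r not in clauses:
                    clauses.append(r)                 # retained: new focus clause
                    tried.append(0)
                    focus = len(clauses) - 1
                    produced = True
                    break
        if not produced:
            focus -= 1           # backtrack: the previous clause becomes the focus
    return False                 # no retained resolvent anywhere: give up

print(depth_first_refute([{'Pa', 'Pb'}, {'Pa', '-Pb'},
                          {'-Pa', 'Pb'}, {'-Pa', '-Pb'}]))   # True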
The above depth-first procedure does differ from that used in Prolog,
however. The primary difference is that after one backtracks from a clause
that yields no new resolvent, the clause is not removed. If we are interested
A linear refutation:
1. PaPb input clause
2. Pa¬Pb input clause
3. ¬PaPb input clause
4. ¬Pa¬Pb input clause
5. ¬Pb resolvent of 2, 4
6. Pa resolvent of 1, 5
7. Pb resolvent of 3, 6
8. □ resolvent of 5, 7
The last step in the above refutation involved two resolvents. This is
true of every refutation of this clause set, since no input clause is a one-
literal clause and □ must have two one-literal parents. This provides an
example that not every clause set has a linear refutation where the far
parent is always an input clause. Clauses 4, 5 and 6 are the ancestors of
clause 7 in the above example. This definition of ancestor omits clauses 1,
2 and 3 as ancestors of clause 7 but our intent is to capture the derived
clauses used in the derivation of clause 7.
The linear restriction was independently discovered by Loveland (see
[Loveland, 1970], where the name linear was introduced and a stronger re-
striction s-linear resolution also was introduced) and by Luckham (where
the name ancestor filter was used; see [Luckham, 1970].) A proof of com-
pleteness of this restriction also is given in Chang and Lee [Chang and Lee,
1973] and Loveland [Loveland, 1978].
With this result we can organize our depth-first search to discard any
clause when we backtrack due to failure of that clause to lead to a proof.
It follows that we need only try a given resolvent with clauses of lower
index than the resolvent itself and that all retained resolvents are elements
of the same (linear) proof attempt. Use of this restriction permits a great
saving in space since only the current proof attempt clauses are retained.
However, a large cost in search effectiveness may be paid because of much
duplication of resolvent computation. In a proof search the same clause (or
close variant) is often derived many times through different proof histories.
In a breadth-first style search the redundant occurrences can be eliminated
by subsumption check. This is a check not available when resolved clauses
are eliminated upon backtracking as usually done in the depth-first linear
resolution procedures. Each recreation of a clause occurrence usually means
a search to try to eliminate the occurrence. This produces the high search
price associated with depth-first search procedures.
The redundant search problem suffered by depth-first linear resolution
procedures may make this approach unwise for proving deep mathematical
theorems, where much computation is needed and few heuristic rules exist
to guide the search. Depth-first linear resolution is often just the right pro-
cedure when there is a fairly strong guidance mechanism suggesting which
clauses are most likely to be useful at certain points in the search. This
principle is at work in the success of Prolog, which implements a depth-first
linear strategy. When using Prolog, the user generally uses many predi-
cate names, which means that relatively few literals are possible matches.
This trims the branching rate. Secondly, the user orders the clauses with
knowledge of how he expects the computation to proceed. Clearly, when
some information exists to lead one down essentially correct paths, then
there is a big win over developing and retaining all possible deductions to
a certain depth. Besides logic programming applications, the depth-first
approach is justified within artificial intelligence (AI) reasoning system ap-
plications where the search may be fairly restricted with strong heuristic
guidance. When such conditions hold, depth-first search can have an ad-
ditional advantage in that very sophisticated implementation architectures
exist (based on the Warren Abstract Machine; see [Ait-Kaci, 1990]) allow-
ing much higher inference rates (essentially, the number of resolution opera-
tions attempted) than is realized by other procedure designs. This speed is
realized for a restricted linear form called linear input procedures, which we
discuss later, which is usually implemented using depth-first search mode.
It is important to realize that the linear resolution restriction need not
be implemented in a depth-first manner. For example, one might begin with
several input clauses "simultaneously" (either with truly parallel compu-
tation on a parallel computer or by interleaving sequential time segments
on a sequential machine) and compute all depth n + 1 resolvents using
all depth n parents before proceeding to depth n + 2 resolvents. This
way subsumption can be used on the retained clauses (without endanger-
ing completeness) yet many resolution rule applications are avoided, such
as between two input clauses not designated as start clauses for a linear
refutation search.
The mode of search described above is very close to the set-of-support
resolution restriction introduced by Wos et al. [1964; 1965]. A refutation
of a clause set S with set-of-support T ⊆ S is a resolution refutation where
every resolvent has at least one parent a resolvent or a member of T. Thus
two members of S − T cannot be resolved together. To insure completeness
it suffices to determine that S − T is a satisfiable set. The primary difference
between this restriction and the quasi-breadth-first search for linear resolu-
tion described in the preceding paragraph is that for the linear refutation
search of the previous paragraph one could choose to segregate clauses by
the deduction responsible for its creation (labeled by initial parents), and
not resolve clauses from different deductions. This reduces the branching
factor in the search tree. If this is done, then any subsuming clause must
acquire the deduction label of the clause it subsumes. The resulting refu-
tation will not be linear if a subsuming clause is utilized, but usually the
refutation existence, and not the style of refutation, is what matters. One
could call such refutations locally linear refutations. Loveland [Loveland,
Binary factoring. Given clause C ∨ l1 ∨ l2, where l1 and l2 are literals and
C is a disjunction of literals, we deduce the factor (C ∨ l)σ, where σ is the mgu
of l1 and l2, i.e. l1σ = l = l2σ; in summary:

    C ∨ l1 ∨ l2
    -----------        (σ the mgu of l1 and l2, l = l1σ = l2σ)
     (C ∨ l)σ
Because the resolution rule requires the two parent clauses to be variable
disjoint, clauses may have their variables renamed prior to the application
of the resolution rule. This is done without explicit reference; we say
that clause Px and clause ¬PxRf(x) have resolvent Rf(x) when the actual
resolution rule application would first rename Px as (say) Py and then use
Since a set of resolvents generated from a finite set of clauses can be infinite,
pure depth-first search may not terminate on some search paths, since it
always utilizes the last resolvent successfully generated and never backtracks. (This
is well-known to Prolog users.) To insure that a refutation can be found
when one exists, a variation of pure depth-first called iterative deepening is
sometimes used. Iterative deepening calls for repeated depth-first search to
bounds d + nr, for d > 0, r > 0 and n = 0, 1, 2, ..., where the user sets the
parameters d and r prior to the computation. Frequently, r = 1 is used.
The advantage of small storage use, and speed for input linear search, is
retained and yet some similarity to breadth-first search is introduced. The
cost is recomputation of earlier levels. However, if one recalls that for
complete binary trees (no one-child nodes and all leaves at the same level)
there are as many leaves as interior nodes, and the proportion of leaves
grows as the branching factor grows, one sees that recomputation is not
as frightening as it first appears. Perhaps the more important downside to
iterative deepening is that if the search is structured so as to find the solution
while sweeping only a small portion of the entire search space when given a sufficient
depth bound, then a depth bound just too low makes the procedure sweep
the entire search space before failing and incrementing the bound. This is a primary
reason why Prolog employs pure depth-first search. (Also, of course, be-
cause very frequently Prolog does not have infinite search branches, at least
under the clause ordering chosen by the user.)
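The control structure of iterative deepening is easy to state separately from any particular inference system. The following generic sketch (Python; expand, the goal test, and the toy usage example are placeholders of mine, not the authors') repeats depth-first search to the bounds d + nr described above, recomputing the earlier levels on each round.

def depth_first(node, bound, expand, is_goal):
    """Depth-first search from node, refusing to go deeper than bound."""
    if is_goal(node):
        return node
    if bound == 0:
        return None
    for child in expand(node):
        found = depth_first(child, bound - 1, expand, is_goal)
        if found is not None:
            return found
    return None

def iterative_deepening(start, expand, is_goal, d=1, r=1, max_rounds=50):
    """Repeated depth-first search to bounds d + nr for n = 0, 1, 2, ...
    Earlier levels are recomputed on every round, as discussed in the text."""
    for n in range(max_rounds):
        found = depth_first(start, d + n * r, expand, is_goal)
        if found is not None:
            return found
    return None

# Toy usage: search the infinite binary tree of bit-strings for '1011'.
print(iterative_deepening('', lambda s: [s + '0', s + '1'],
                          lambda s: s == '1011'))      # '1011'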
Seeking linear refutations at the first-order level introduces a techni-
cal issue regarding factoring. The complication arises because we wish to
enforce the notion of linearity strongly. In particular, if a factor of a far
parent is needed that was not produced when that clause was a near par-
ent, then the factor would have to be entered as a line in the deduction,
violating the condition that the near parent always participates in creation
of the next entry (by resolution or factoring). To avoid this violation we
will agree that for linear resolution, unless otherwise noted, a resolution
operation can include use of a factor of the far parent ancestor or input
clause. Actually, for most versions of linear resolution, including the gen-
eral form introduced already, this caveat is not needed provided that the
given set has all its factors explicitly given also. (That is, the given set
is closed under the factoring inference rule.) A fuller treatment of this is-
sue, and much of what follows regarding linear resolution refinements and
variants, is found in Loveland [Loveland, 1978].
1.2 Linear resolution refinements
There are several restrictions of linear resolution, and several combinations
that multiply the possibilities for variations of linear resolutions. We will
settle for one variant, among the most restricted of possibilities. Our inter-
est in the particular form is that it is a resolution equivalent to the model
elimination (SL-resolution) procedure that we study next.
Example 1.2.2.
Near parent clause: PxQaRx
Far parent clause: Qb¬Rf(x)
No s-resolvent exists
The first example illustrates the reason we must allow s-resolvent C
to θ-subsume only an instance of near parent clause C1. In this example,
the standard resolvent contains Pf(x), created by the resolution operation
from parent literal Px. Clearly, there is no instance of the resolvent that
will θ-subsume near parent clause PxQaRx, but instance Pf(x)QaRf(x)
is θ-subsumed by Pf(x)Qa, here a factor of the standard resolvent.
The s-resolution operation can be quite costly to perform. The direct
approach involves computing the resolvent and checking the θ-subsumption
condition. The latter would most often cause rejection, so a priori tests on
the parents would be desirable. We do not pursue this further because we
finesse this issue later.
In pursuit of the strongly restricted form of linear resolution, we intro-
duce the notion of ordered clause, with the intention of restricting the choice
of resolving literal. By decreasing the choice we decrease the branching fac-
tor of the search tree. We set the convention that the rightmost literal of a
clause is to be the resolving literal; however, we merge leftward, so that in
the resolution operation the substitution instance demanded by the unifier
of the resolution operation may create a merge left situation where the
intended resolving literal disappears from the rightmost position! Then we
say that such a resolution operation application fails.
The class of orderings for the restricted linear resolution we define per-
mits any ordering the user chooses for input clauses except that each literal
of each input clause must be rightmost in some ordered clause. Thus a 3-
literal input clause must have at least three ordered clauses derived from
it. A resolvent-ordered clause has leftmost the surviving descendents of
the literals of the near parent ordered clause, in the order determined by
the near parent ordered clause, while the surviving descendents of the far
parent can be ordered in any manner the user wishes, i.e. is determined by
the particular ordering the user has chosen.
The ordering applied to the input clauses is superfluous, except to the
input clause chosen as first near parent clause (called the top clause), be-
cause all other uses of input clauses are as far parent clauses in a resolution
operation. By definition of resolvent ordered clause, the literals descendent
from the far parent can be ordered as desired when determining the resol-
vent. Of course, it is important that any ordering of input clauses provide
that every literal of every input clause be rightmost, and thus accessible
as a resolving literal. The notion of "top clause" is often used with arbi-
trary linear deductions, but is needed for ordered clause linear deductions
because the first resolution step is not symmetrical in the parent clauses;
1. PxPy input
2. ¬Px¬Py input, top clause
3. ¬PxPy resolvent of 1, 2
4. ¬Px s-resolvent of 2, 3
5. Px resolvent of 1, 4
6. □ s-resolvent of 4, 5
considered anyway, because if the resolvent has every literal simply a vari-
ant of its parent literal no instantiation has been eliminated by this action.
That is, the weak subsumption rule can be extended to force s-resolution
if the s-resolvent does not instantiate the literals inherited from the near
parent clause.
The second example has a more dramatic instance of s-resolution and
illustrates how regular factoring of resolvents can then be dropped. How-
ever, it is seen from this example that there is a possible price to pay for
banning factoring in that a longer proof is required. It is controversial
whether factoring is better included or excluded in linear deductions. It is
a trade-off of a reduced branching factor versus shorter proofs and hence
shorter proof trees. (Recall that Prolog avoids factoring; factoring within
linear resolution is optional in the Horn clause domain.)
The clause order is determined by P < Q < R in the following example.
Example 1.2.3.
1. Qx input
2. PxRx input
3. ¬PxRf(y) input
4. ¬Px¬Rx input
5. Px¬Qy¬Rz input, top clause
6. Px¬QyPz resolvent of 2, 5
7. Px¬QyRf(z) resolvent of 3, 6
8. Px¬Qy¬Pf(z) resolvent of 4, 7
9. Px¬Qy s-resolvent of 6, 8
10. Px resolvent of 1, 9
11. Rf(x) resolvent of 3, 10
12. ¬Pf(x) resolvent of 4, 11
13. □ s-resolvent of 10, 12
clauses, mimicking the ground case, we call this a merge factor and the
pertinent literal(s) merge literal(s).
An MTOSS deduction (a TOSS deduction with merging) is a TOSS
deduction where s-resolution occurs only when the rightmost literal of the
far parent is a descendent of a merge literal.
Although the MTOSS restriction is as constraining a principle as is
known regarding limiting resolution inferences when both parent clauses
are derived, the merge condition has not received much attention in im-
plementation. Actually, the TOSS and even the s-resolution restriction
have not received much attention regarding implementations explicitly. As
suggested earlier, a strongly related format has received much attention.
The focus of the preceding consideration of linear restrictions is to re-
duce the need to resolve two derived clauses together. Why is this interest-
ing? For several (related) reasons. One of the problems in proof search is
the branching factor, how many alternative inferences could be pursued at
a given point in a partial deduction. Since all input clauses are deemed al-
ways eligible (at least for linear resolution) the branching factor is reduced
by tightening conditions for resolving two derived clauses. A second rea-
son is that the input clauses can be viewed as given operators. In a linear
deduction the far parent clause can be seen as modifying the near parent
clause, transforming the near parent clause to a new but related clause.
S-resolution is pleasing in this regard because it says that one needs to
deviate from applying the given "operators" only when the near parent is
modified by removal of the literal receiving current attention, the resolving
literal. (The ordering previously specified for linear deductions is useful in
this viewpoint because it creates a focus on a given literal and its descen-
dents before considering another literal of the clause.) An implementation
consideration for limiting derived-clause resolution is that one can precon-
dition the input clauses to optimize the resolution operation when an input
clause is used in the resolution. Indeed, this is what Prolog does; compilers
exist based on this principle.
To the extent that the above points have validity then the ideal linear
resolution is the linear input format, where every resolvent has an input
clause as far parent clause. (The word "linear" in "linear input" is redun-
dant as use of an input clause as one parent forces a deduction to be linear,
but the term is common and makes explicit that input resolution is a linear
resolution restriction.)
Linear input resolution has been shown earlier not to be complete for
first-order logic, but it is complete over a very important subset. A formula
is a Horn formula if it is in prenex normal form with its quantifier-free part
(matrix) in conjunctive normal form, where each clause has at most one
positive literal. A Horn (clause) set is a clause set where each clause has
at most one positive literal. That every unsatisfiable Horn set has a linear
input refutation is immediate from the completeness of linear resolution
is not Horn, because the first clause has two positive literals, but does have
a linear input refutation, as the reader can easily check. However, the set is
convertible to a Horn set by interchanging Pa, Pb, Pc with their negations.
This theorem that ground clause sets with linear input refutations are
renamable Horn sets need not hold at the general level. This is because
a literal within a clause C has several ground instances, some instances
requiring renaming and others not, and without finding the refutation (or
at least the instances needed) it is not possible to decide whether or not to
rename the literal occurrence within C.
Horn sets are considered further when we focus on logic programming.
We move on with our review of related theorem-proving procedures.
Apparently the linear input format is desirable, and sophisticated and
effective architectures for Prolog implementations have indeed affirmed its
desirability, but we seem condemned by the theorem just considered to
clause sets whose ground image is a renamable Horn set. Actually, the
manner of overcoming this problem was discovered before linear resolution
was understood as such. The means of having linear input format complete
for all of first-order logic is to alter the underlying resolution structure.
The model elimination (ME) procedure (see [Loveland, 1968; Loveland,
1969; Loveland, 1978]), introduced almost simultaneously with the resolu-
tion procedure, uses the notion of chain, analogous to an ordered clause but
with two classes of literals, A-literals and B-literals. The set of B-literals
in a chain can be regarded as a derived clause as resolution would obtain.
SL-resolution [Kowalski and Kuehner, 1971], also a linear input procedure,
uses the ME device of two classes of literals to achieve completeness for
Although possibly not the optimal procedural format for proving math-
ematical theorems, the linear input format is excellent for situations where
strong search guidance exists, as previously noted. One of the situations
where such guidance is often possible is in the logic programming domain.
We now explore this domain.
knows that these are not practically definable within first-order Horn clause
logic. Someone wishing further amplification of these qualities of Prolog
should consult some of the other chapters of this volume.) To better capture
the direct notion of proof and computation important to the presentation
of Prolog we need to adopt another view of Horn clause logic. The new
viewpoint says that we are not interested simply in whether or not a set
of clauses is unsatisfiable. We now are interested in answers to a query
made in the context of a database, to use one vocabulary common in logic
programming. In a more compatible terminology, we seek a constructive
proof of an existentially quantified theorem where the existential variables
are instantiated so as to make the theorem true. (It happens that only one
instantiation of these existential variables is needed for any Horn clause
problem; this is a consequence of the need for only one negative clause
in any (minimal) unsatisfiable Horn clause set, ground sets in particular.)
Along with a notion of answer we will also emphasize a direct proof system,
meaning that conceptually we will consider proofs from axioms with the
theorem as the last line of the deduction. This is important even though
our customary proof search will be a backward chaining system that begins
with the proposed theorem (query).
The alternate viewpoint begins with a different representation for a
Horn clause. The definite clauses have the generic form A1 ∨ ¬A2 ∨ ¬A3 ∨
... ∨ ¬An, which we now choose to represent in the (classically) logically
equivalent form A1 ← A2, A3, ..., An, where the latter expression is the
implication A2 ∧ A3 ∧ ... ∧ An ⊃ A1 written with the consequent on the left
to facilitate the clause presentation when backchaining in the proof search
mode, and with the comma replacing AND. Negative clause ¬A1 ∨ ¬A2 ∨
¬A3 ∨ ... ∨ ¬An is written ← A1, A2, A3, ..., An, which as an implication is
read A1 ∧ A2 ∧ A3 ∧ ... ∧ An ⊃ FALSE (i.e., that the positive conjunction
is contradictory). In particular, we note that the negative literals appear
as atoms on the right of ← (conjoined together) and the positive literal
appears on the left.
We now consider the problem presentation. Theorem proving problems
are often of the form A ⊃ B, where A is a set of axioms and B is the
proposed theorem. The validation of A ⊃ B is equivalent to the refutability
of its negation, A ∧ ¬B. To assure that A ∧ ¬B is a Horn clause set we can
start with Horn clauses as axioms and let B have form ∃x(B1 ∧ B2 ∧ ... ∧ Br),
where ∃x means the existential closure of all variables in the expression it
quantifies. This form works for B because ¬∃x(B1 ∧ B2 ∧ ... ∧ Br) is logically
equivalent to a negative Horn clause. In Horn clause logic programming
(i.e. for Prolog) the problem format is explicitly of the form P ⊃ Q, where
P, the program, is a set of definite Horn clauses and Q, the query, is of
form ∃x(B1 ∧ B2 ∧ ... ∧ Br). The notation ← B1, ..., Br that we use for
this formula is suggestive of a query if the ← is read here as a query mark.
Indeed, ?- is the prefix of a query for most Prolog implementations. (The
Program:
1. p(a, a, b)
2. p(c, c, b)
3. p(x, f(u, w), z) ← p(x, u, y), p(y, w, z)
4. p(x, g(s), y) ← p(y, s, x)
Query:
5. ← p(a, t, c)
Derivation:
6. ← p(a, u, y), p(y, w, c) using clause 3
Substitution: t/f(u, w)
7. ← p(b, w, c) using clause 1
Substitution: u/a
8. ← p(c, s, b) using clause 4
Substitution: w/g(s)
9. success using clause 2
Substitution: s/c
Answer: t = f(a, g(c))
That the deduction form is that of ordered clause input linear resolution
is clear by comparing the above to the resolution deduction that follows.
Given clauses:
1. p(a, a, b)
2. p(c, c, b)
3. p(x, f(u, w), z), ¬p(x, u, y), ¬p(y, w, z)
4. p(x, g(s), y), ¬p(y, s, x)
5. ¬p(a, t, c)
Derivation:
6. ¬p(a, u, y), ¬p(y, w, c) resolvent of 3, 5
Substitution: t/f(u, w)
7. ¬p(b, w, c) resolvent of 1, 6
Substitution: u/a
8. ¬p(c, s, b) resolvent of 4, 7
Substitution: w/g(s)
9. □ resolvent of 2, 8
Substitution: s/c
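For readers who want to experiment with this example, here is a minimal backchaining interpreter sketch (Python; the term encoding and every name in it are my own, not the authors'). Note that pure depth-first search with the textual clause order can fall into the infinite branch opened by clause 3 at the goal ← p(b, w, c), so the sketch bounds the number of resolution steps and retries with increasing bounds, in the spirit of the iterative deepening discussed earlier; it reports the same answer t = f(a, g(c)).

import itertools

# Variables are strings beginning with an upper-case letter; every other term
# is a tuple (functor, arg1, ..., argn).  A clause is a pair (head, body).

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Return an extended substitution or None (occurs check omitted)."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def rename(t, suffix):
    return t + suffix if is_var(t) else (t[0],) + tuple(rename(x, suffix) for x in t[1:])

fresh = itertools.count()

def solve(goals, program, s, steps):
    """Backchaining over definite clauses, bounded by a number of resolution steps."""
    if not goals:
        yield s
        return
    if steps == 0:
        return
    first, rest = goals[0], goals[1:]
    for head, body in program:
        suffix = '_' + str(next(fresh))                 # fresh variables per clause use
        s1 = unify(first, rename(head, suffix), s)
        if s1 is not None:
            yield from solve([rename(b, suffix) for b in body] + rest,
                             program, s1, steps - 1)

def instantiate(t, s):
    t = walk(t, s)
    return t if is_var(t) else (t[0],) + tuple(instantiate(x, s) for x in t[1:])

PROGRAM = [
    (('p', ('a',), ('a',), ('b',)), []),                                    # 1
    (('p', ('c',), ('c',), ('b',)), []),                                    # 2
    (('p', 'X', ('f', 'U', 'W'), 'Z'),
     [('p', 'X', 'U', 'Y'), ('p', 'Y', 'W', 'Z')]),                         # 3
    (('p', 'X', ('g', 'S'), 'Y'), [('p', 'Y', 'S', 'X')]),                  # 4
]

query = ('p', ('a',), 'T', ('c',))                      # <- p(a, t, c)
for bound in range(1, 10):                              # increasing step bounds
    answers = list(solve([query], PROGRAM, {}, bound))
    if answers:
        print(instantiate('T', answers[0]))             # ('f', ('a',), ('g', ('c',)))
        break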
restricted form given by Horn clause logic the only meaningful logic for
logic programming? Do more powerful (and useful) logics exist? The idea
of using arbitrary clauses as opposed to Horn clauses has been championed
by [Kowalski, 1979]. In a similar fashion, the use of full first-order logic
— as opposed to logic in clausal form — has been advocated and explored
[Bowen, 1982]. In a more conservative direction, extending the structure
of Horn clauses by limited uses of connectives and quantifiers has been
suggested. The best known extension of pure Horn clause logic within the
logic programming paradigm permits negation in goals, using the notion of
negation-as-failure. However, the idea of using implications and universal
quantifiers and, in fact, arbitrary logical connectives in goals has also been
advocated [Gabbay and Reyle, 1984a; Lloyd and Topor, 1984; McCarty,
1988a; McCarty, 1988b; Miller, 1989b; Miller et al., 1991].
There is a wide spectrum of logical languages between those given by
Horn clause logic and full quantificational logic, especially if the derivability
relation to be used is also open to choice. An obvious question that arises
in this situation is whether some of these languages provide more natural
bases for logic programming than do others. We argue for an affirmative
answer to this question in this subsection. In particular, we describe a
criterion for determining whether or not a given predicate logic provides
an adequate basis for logic programming. The principle underlying this
criterion is that a logic program in a suitable logical language must satisfy
dual needs: it should function as a specification of a problem while serving
at the same time to describe a programming task. It is primarily due to the
specification requirement that an emphasis is placed on the use of symbolic
logic for providing the syntax and the metatheory of logic programming.
The programming task description requirement, although less well recog-
nized, appears on reflection to be of equal importance. The viewpoint we
adopt here is that the programming character of logic programming arises
from thinking of a program as a description of a search whose structure is
determined by interpreting the logical connectives and quantifiers as fixed
search instructions. From this perspective, the connectives and quantifiers
appearing in programs must exhibit a duality between a search-related in-
terpretation and a logical meaning. Such a duality in meaning cannot be
attributed to these symbols in every logical language. The criterion that we
describe below is, in essence, one for identifying those languages in which
this can be done.
To provide some concreteness to the abstract discussion above, let us
reconsider briefly the notions of programs and queries in the context of
Horn clause logic. As has previously been noted, one view of a query in
this setting is as a formula of the form ∃x (p1(x) ∧ ... ∧ pn(x)); x is used here
to denote a sequence of variables and pi(x) represents an atomic formula in
which some of these variables appear free. The SLD-resolution procedure
described in the previous subsection tries to find a proof for such a query
by looking for specific substitutions for the variables in x that make each
of the formulas pi(x) follow from the program. This procedure can, thus,
be thought to be the result of assigning particular search interpretations
to existential quantifiers and conjunctions, the former being interpreted as
specifications of (infinite) OR branches with the branches being parame-
terized by substitutions and the latter being interpreted as specifications of
AND branches. This view of the logical symbols is of obvious importance
from the perspective of programming in Horn clause logic. Now, it is a
nontrivial property of Horn clause logic that a query has a constructive
proof of the kind described above if it has any proof at all from a given
program. It is this property that permits the search-related interpretation
of the logical symbols appearing in queries to co-exist with their logical
or declarative semantics and that eventually underlies the programming
utility of Horn clauses.
Our desire is to turn the observations made in the context of Horn clause
logic into a broad criterion for recognizing logical languages that can be
used in a similar fashion in programming. However, the restricted syntax
of logic in clausal form is an impediment to this enterprise and we therefore
enrich our logical vocabulary before proceeding further. In particular, we
shall allow the logical symbols ∧, ∨, ⊃, ∀, ∃, ⊤ and ⊥ to appear in the
formulas that we consider. The first five of these symbols do not need an
explanation. The symbols ⊤ and ⊥ are intended to correspond to truth
and falsity, respectively. We shall also use the following syntactic variables
with the corresponding general connotations:
𝒟 A set of formulas, finite subsets of which serve as possible
programs of some logic programming language.
𝒢 A set of formulas, each member of which serves as a possi-
ble query or goal for this programming language.
A An atomic formula excluding ⊤ and ⊥.
D A member of 𝒟, referred to as a program clause.
G A member of 𝒢, referred to as a goal or query.
P A finite set of formulas from 𝒟, referred to as a (logic)
program.
Using the current vocabulary, computation in logic programming can be
viewed as the process of constructing a derivation for a query G from a
program P. The question that needs to be addressed, then, is that of the
restrictions that must be placed on 𝒟 and 𝒢 and the notion of derivation
to make this a useful viewpoint.
Towards answering this question, we shall describe a proof-theoretic cri-
terion that captures the idea of computation-as-search. The first step in this
direction is to define the search-related semantics that is to be attributed to
the logical symbols. This may be done by outlining the structure of a sim-
ple nondeterministic interpreter for programs and goals. This interpreter
either succeeds or does not succeed, depending on the program P and the
goal G that it is given in its initial state. We shall write P ⊢o G to indicate
that the interpreter succeeds on G given P; the subscript in ⊢o signifies
that this relation is to be thought of as the "operational" semantics of an
(idealized) interpreter. The behavior of this interpreter is characterized by
the following search instructions corresponding to the various logical sym-
bols; the notation [x/t]G is used here to denote the result of substituting t
for all free occurrences of x in G:
SUCCESS P ⊢o ⊤.
AND P ⊢o G1 ∧ G2 only if P ⊢o G1 and P ⊢o G2.
OR P ⊢o G1 ∨ G2 only if P ⊢o G1 or P ⊢o G2.
INSTANCE P ⊢o ∃x G only if there is some term t such that P ⊢o
[x/t]G.
AUGMENT P ⊢o D ⊃ G only if P ∪ {D} ⊢o G.
GENERIC P ⊢o ∀x G only if P ⊢o [x/c]G, where c is a constant that
does not appear in P or in G.
disjunctive search are meaningful ones, and some of the programming util-
ity of the AUGMENT and GENERIC search operations will be discussed
later in this section.
The second point to note is that we have addressed only the issue of the
success/failure semantics for the various logical symbols through the pre-
sentation of the idealized interpreter. In particular, we have not described
the notion of the result of a computation. There is, of course, a simple way
in which this notion can be elaborated: existentially quantified goals are
to be solved by finding instantiations for the quantifiers that yield solvable
goals, and the instantiations of this kind that are found for the top-level
existential quantifiers in the original goal can be provided as the outcome
of the computation. However, in the interest of developing as broad a
framework as possible, we have not built either this or any other notion of
a result into our operational semantics. We note also that our interpreter
has been specified in a manner completely independent of any notion of
unification. Free variables that appear in goals are not placeholders in
the sense that substitutions can be made for them; substitutions are
made only in the course of instantiating quantifiers. Of course, a practical
interpreter for a particular language whose operational semantics satisfies
the conditions presented here might utilize a relevant form of unification
as well as a notion of variables that can be instantiated. We indicate how
such an interpreter might be designed in the next section by considering
this issue in the context of specific languages described there.
In a sense related to that of the above discussion, we observe that our
search instructions only partially specify the behavior to be exhibited by
an interpreter in the course of solving a goal. In particular, they do not
describe the action to be taken when an atomic goal needs to be solved.
A natural choice from this perspective turns out to be the operation of
backchaining that was used in the context of Horn clause logic. Thus, an
instruction of the following sort may often be included for dealing with
atomic goals:
ATOMIC  P ⊢_O A only if A is an instance of a clause in P or P ⊢_O G
        for an instance G ⊃ A of a clause in P.
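In the ground setting of the earlier Python sketch, ATOMIC can be realized by replacing the placeholder case for atomic goals with a backchaining step; the representation of a program clause G ⊃ A as a tuple ("clause", G, A), and the restriction to ground clauses (so that "instance" degenerates to equality), are again assumptions of the illustration only.

def provable_atomic(program, atom, terms):
    """ATOMIC, in the ground setting of the sketch above: the atomic goal A
    succeeds if A itself occurs in P, or if P contains a clause ("clause", G, A)
    whose head is A and whose body G is in turn provable (backchaining)."""
    if atom in program:
        return True
    return any(clause[2] == atom and provable(program, clause[1], terms)
               for clause in program if clause[0] == "clause")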
There are two reasons for not including an instruction of this kind in the
prescribed operational semantics. First, we are interested at this juncture
only in describing the manner in which the connectives and quantifiers in a
goal should affect a search for a solution and the particular use that is to be
made of program clauses is, from this perspective, of secondary importance.
Second, such an instruction requires program clauses to conform to a fairly
rigid structure and as such runs counter to the desire for generality in the
view of logic programming that we are developing.
A notable omission from the logical connectives that we are considering
is that of ¬. The interpretation of negation in most logical systems corre-
The structure of a uniform proof thus reflects the search instructions asso-
ciated with the logical symbols. We can, in fact, define ⊢_O by saying that
P ⊢_O G, i.e., the interpreter succeeds on the goal G given the program
P, if and only if there is a uniform proof of the sequent P ⟶ G. We
observe now that the logical symbols exhibit the desired duality between a
logical and a search-related meaning in exactly those situations where the
existence of a proof within a given logical system ensures the existence of a
uniform proof. We use this observation to define our criterion for establish-
ing the suitability of a logical language as the basis for logic programming:
letting ⊢ denote a chosen proof relation, we say that a triple ⟨𝒟, 𝒢, ⊢⟩ is an
abstract logic programming language just in case, for all finite subsets P of
𝒟 and all formulas G of 𝒢, P ⊢ G if and only if P ⟶ G has a uniform
proof.
There is, however, no uniform proof for ∃x q(x) from the program being
considered; such a proof would require q(t) to be derivable from {p ⊃
q(a), ¬p ⊃ q(b)} for some particular t, and this clearly does not hold. Thus,
the language under consideration fails to satisfy the defining criterion for
abstract logic programming languages. Another non-example along these
lines is obtained by taking 𝒟 to be the collection of (the universal closures
of) positive and negative Horn clauses, 𝒢 to consist of the existential closure
of a conjunction of literals containing at most one negative literal and ⊢ to
be classical provability [Gallier and Raatz, 1987]. This triple fails to be an
abstract logic programming language since
¬p(a) ∨ ¬p(b) ⊢_C ∃x ¬p(x)
although no particular instance of the existentially quantified goal can be
proved. For a final non-example, let 𝒟 and 𝒢 consist of arbitrary formulas
and let ⊢ be provability in classical, intuitionistic or minimal logic. This
triple, once again, does not constitute an abstract logic programming lan-
guage, since, for instance,
p(a) ∨ p(b) ⊢ ∃x p(x)
regardless of whether ⊢ is interpreted as ⊢_C, ⊢_I or ⊢_M, whereas there is no
term t such that p(t) is derivable even in classical logic from p(a) ∨ p(b).
To conclude that our criterion really makes distinctions, it is necessary
to also exhibit positive examples of abstract logic programming languages.
We provide examples of this kind now and the syntactic richness of these
examples will simultaneously demonstrate a genuine utility for our crite-
rion.
The first example that we consider is of a logical language that is
slightly richer than Horn clause logic. This language is given by the triple
⟨𝒟₁, 𝒢₁, ⊢_C⟩ where 𝒢₁ is the collection of first-order formulas defined by the
syntax rule
G ::= ⊤ | A | G ∧ G | G ∨ G | ∃x G
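In the tuple encoding used in the earlier Python sketches, membership in 𝒢₁ can be checked by a direct transcription of this syntax rule; the fragment below is only an illustration of the class of goals involved.

def is_g1_goal(g):
    """Membership in the goal class G1: T, atoms, conjunctions, disjunctions
    and existential quantifications, but no implications or universal
    quantifiers (tuple representation of the earlier sketches)."""
    tag = g[0]
    if tag in ("top", "atom"):
        return True
    if tag in ("and", "or"):
        return is_g1_goal(g[1]) and is_g1_goal(g[2])
    if tag == "exists":
        return is_g1_goal(g[2])
    return False

print(is_g1_goal(("exists", "x", ("and", ("atom", "p", ("x",)), ("top",)))))  # True
print(is_g1_goal(("implies", ("atom", "d", ()), ("atom", "p", ()))))          # False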
possibilities for the last inference figure. The argument is trivial when
this is one of ∧-R, ∨-R, ∃-R, ⊃-R, and ∀-R. From the previous claim, this
figure cannot be ⊥-R. Given the structure of the formulas in Δ, the only
remaining cases are ∀-L, ∧-L and ⊃-L. Consider the case for ∀-L, i.e., when
the last figure is of the form
The argument in this case depends on the structure of G'. For instance, let
G' = ∀x G₁. The upper sequent of the above figure is of a kind to which
the inductive hypothesis is applicable. Hence, there is a constant c that
does not appear in any of the formulas in Θ ∪ {[t/x]P, ∀x G₁} for which
[t/x]P, Θ ⟶ [c/x]G₁ has an I-proof of size less than s. Adding below
this derivation a ∀-L inference figure, we obtain an I-proof of size less than
s + 1 for ∀x P, Θ ⟶ [c/x]G₁. Observing now that c cannot appear in
∀x P if it does not appear in [t/x]P, the claim is verified in this case.
The analysis for the cases when G' has a different structure follows an
analogous pattern. Further, similar arguments can be provided when the
last inference figure is an ∧-L or a ⊃-L.
Now let P ⟶ G have an I-proof. It follows from the second claim
that it must then have a uniform proof. The proof of the claim in fact
outlines a mechanism for moving the inference figure that introduces the
top-level logical connective in G to the end of the I-proof. A repeated use
of this observation yields a method for transforming an I-proof of P ⟶ G
into a uniform proof for the same sequent. ∎
The proof of Proposition 2.3.3 reveals a relationship between deriv-
ability in intuitionistic and minimal logic in the context of interest. In
particular, let P be a finite subset of 𝒟₂ and let G be a formula in 𝒢₂.
We have observed, then, that an I-proof of the sequent P ⟶ G cannot
contain uses of the inference figure ⊥-R in it. Thus any I-proof of such a
sequent must also be an M-proof. In other words, these two notions of
provability are indistinguishable from the perspective of existence of
derivations for sequents of the form P ⟶ G. It follows from this that
⟨𝒟₂, 𝒢₂, ⊢_I⟩ and ⟨𝒟₂, 𝒢₂, ⊢_M⟩ constitute the same abstract logic program-
ming languages. We note that the introduction of implication together
with its desired search interpretation leads to a distinction between classi-
cal provability on the one hand and intuitionistic and minimal provability
on the other. It may be asked whether a similar sort of distinction needs to
be made between intuitionistic and minimal provability. It turns out that
a treatment of negation and an interpretation of the idea of a contradiction
requires these two notions to be differentiated. We do not discuss this issue
any further here, but the interested reader may refer to [Miller, 1989b] for
some thoughts in this direction.
We have raised the issue previously of what the behavior of the (ideal-
ized) interpreter should be when an atomic goal is encountered. We have
also suggested that, in the case of the language ⟨𝒟₂, 𝒢₂, ⊢_M⟩, the instruc-
tion ATOMIC might be used at such a point. Following this course is
sound with respect to the defined operational semantics as the following
proposition shows.
Proposition 2.3.4. Let P be a finite subset of 𝒟₂ and let A be an atomic
formula. If A is an instance of a clause in P or if there is an instance
(G ⊃ A) of a clause in P such that P ⟶ G has a uniform proof, then
P ⟶ A has a uniform proof.
Proof. If A is an instance of a formula in P, we can obtain a uniform
proof of P ⟶ A by appending below an initial sequent some number of
∀-L inference figures. Suppose (G ⊃ A) is an instance of a clause in P and
that P ⟶ G has a uniform proof. We can then obtain one for P ⟶ A
by using a ⊃-L inference figure followed by some number of ∀-L figures below
the given derivation and the initial sequent A, P ⟶ A. ∎
Using ATOMIC as a means for solving atomic goals in conjunction with
the instructions for solving the other kinds of goals also yields a complete
strategy as we now observe.
Proposition 2.3.5. Let P be a finite subset of 𝒟₂ and let A be an atomic
formula. If the sequent P ⟶ A has a uniform proof containing l se-
quents, then either A is an instance of a clause in P or there is an in-
stance (G ⊃ A) of a clause in P such that P ⟶ G has a uniform proof
containing fewer than l sequents.
Proof. The following may be added as a sixth item to the second claim in
the proof of Proposition 2.3.3 and established by the same induction: the
sequent P ⟶ A has an I-proof containing l sequents only if either A is
an instance of a clause in P or there is an instance (G ⊃ A) of a clause
in P such that P ⟶ G has an I-proof containing fewer than l sequents.
The claim when embellished in this way easily yields the proposition. ∎
The abstract logic programming language ⟨𝒟₂, 𝒢₂, ⊢_I⟩ incorporates each
of the search primitives discussed in the previous subsection into the syn-
tax of its goals. It provides the basis for an actual language that contains
two new search operations, AUGMENT and GENERIC, in addition to
those already present in Prolog. At least one use for these operations in
a practical setting is that of realizing scoping mechanisms with regard to
program clauses and data objects. Prolog provides a means for augmenting
a program through the nonlogical predicate called assert and for deleting
clauses through a similar predicate called retract. One problem with these
predicates is that their effects are rather global: an assert makes a new
clause available in the search for a proof for every goal and a retract re-
moves it from consideration in every derivation. The AUGMENT operation
it can be used in the course of solving G(x). However, the only way it can
be referred to by name, and hence directly manipulated, in this context is
by using one of the clauses in D(x,y).
Although the above discussion provides the intuition guiding a realiza-
tion of information hiding and of abstract data types in logic programming,
a complete realization of these notions requires an ability to quantify over
function symbols as well. This kind of a "higher-order" ability can be incor-
porated into the language we have presented without much difficulty. How-
ever, we do not do this here and refer the interested reader to [Miller et al.,
1991] and [Nadathur and Miller, 1988] instead. We also note that a more
detailed discussion of the scoping mechanism provided by the GENERIC
operation appears in [Miller, 1989a] and the language described there also
incorporates a higher-order ability.
We consider now the issue of designing an actual interpreter or proof
procedure for the abstract logic programming languages discussed in this
subsection. Let us examine first the language ⟨𝒟₁, 𝒢₁, ⊢_C⟩. The notion of a
uniform proof determines, to a large extent, the structure of an interpreter
for this language. However, some elaboration is required of the method
for choosing terms in the context of the INSTANCE instruction and of the
action to be taken when atomic goals are encountered. Propositions 2.3.4
and 2.3.5 and the discussion of SLD-resolution suggest a course that might
be taken in each of these cases. Thus, if a goal of the form ∃x G(x) is
encountered at a certain point in a search, a possible strategy is to delay
the choice of instantiation for the quantifier till such time that this choice
can be made in an educated fashion. Such a delaying strategy can be real-
ized by instantiating the existential quantifier with a "placeholder" whose
value may be determined at a later stage through the use of unification.
Placeholders of this kind are what are referred to as logic variables in logic
programming parlance. Thus, using the convention of representing place-
holders by capital letters, the goal ∃x G(x) may be transformed into one of
the form G(X) where X is a new logic variable. The use of such variables
is to be "cashed out" when employing ATOMIC in solving atomic goals. In
particular, given a goal A that possibly contains logic variables, the strat-
egy will be to look for a program clause of the form ∀y A' or ∀y (G ⊃ A')
such that A unifies with the atom that results from A' by replacing all the
universally quantified variables in it with logic variables. Finding such a
clause results, in the first case, in an immediate success or, in the second
case, in an attempt to solve the resulting instance of G. The interpreter
that is obtained by incorporating these mechanisms into a search for a uni-
form proof is still nondeterministic: a choice has to be made of a disjunct
in the context of the OR instruction and of the program clause to use with
respect to the ATOMIC instruction. This nondeterminism is, however,
of a more tolerable kind than that encountered in the context of the IN-
STANCE instruction. Moreover, a deterministic but incomplete procedure
Using the first clause in trying to solve g(X) produces the subgoals q(X)
and p(b). The first subgoal can be successfully solved by using the second
clause and the added clause p(X). However, as a result of this solution to
the first subgoal, the added clause must be changed to p(a). This clause
can, therefore, not be used to solve the second subgoal, and this ultimately
leads to a failure in the attempt to solve the overall goal.
The problem with logic variables in program clauses occurs again in the
system N-Prolog that is presented in the next section and is discussed in
more detail there. An actual interpreter for the language ⟨𝒟₂, 𝒢₂, ⊢_I⟩ that
includes a treatment of several of the issues outlined here is described in
[Nadathur, 1993]. The question of implementing this interpreter efficiently
has also been considered and we refer the interested reader to [Nadathur et
al., 1993] for a detailed discussion of this aspect. For the present purposes,
it suffices to note that a procedure whose structure is similar to the one
for constructing SLD-refutations can be used for finding uniform proofs
for the extended language and that a satisfactory implementation of this
procedure can be provided along the lines of usual Prolog implementations.
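The essential ingredient of such an implementation is first-order unification over terms containing logic variables. The fragment below is a minimal Python sketch of this mechanism, not the implementation described in [Nadathur, 1993] or [Nadathur et al., 1993]; the string representation of terms and the convention that capitalized names are logic variables are assumptions of the illustration.

def is_var(t):
    # following the convention mentioned above: placeholders (logic variables)
    # are written with an initial capital letter
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    """Dereference a term through the substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(t1, t2, s):
    """Extend substitution s so that t1 and t2 become equal, or return None.
    Terms are constants/variables (strings) or compound terms written as
    tuples (functor, arg1, ..., argn); the occurs-check is omitted, as it is
    in most Prolog implementations."""
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if is_var(t1):
        return {**s, t1: t2}
    if is_var(t2):
        return {**s, t2: t1}
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and len(t1) == len(t2) and t1[0] == t2[0]):
        for a, b in zip(t1[1:], t2[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

# An existential goal is transformed into one containing a fresh logic variable X;
# ATOMIC later "cashes out" X by unification against a clause head.
print(unify(("p", "X"), ("p", "a"), {}))   # {'X': 'a'}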
We have focused here on using the criterion developed in the last sub-
section in describing abstract logic programming languages in the context
of first-order logic. However, the framework that has been described is
quite broad and can be utilized in the context of other logics as well. It
can, for instance, be used in the context of a higher-order logic to yield
a higher-order version of the language ⟨𝒟₂, 𝒢₂, ⊢_I⟩. Such a language has,
in fact, been described [Miller et al., 1991] and has provided the basis for
a higher-order extension to Prolog that is called λProlog [Nadathur and
Miller, 1988]. A discussion of some aspects of this extension also appears
in the chapter on higher-order logic programming in this volume of the
handbook. More generally, the framework can be utilized with a differ-
ent set of logical symbols or with a different search interpretation given
to the logical symbols considered here. Interpreted in this sense, our crite-
rion has been used in conjunction with linear logic [Harland and Pym, 1991;
Hodas and Miller, 1994] and a calculus of dependent types [Pfenning, 1989]
to describe logic programming languages that have interesting applications.
(1) Existential variables may be shared among program clauses and goals,
reflecting the view that the existential quantifiers associated with the
variables are exterior to the entire expression (i.e. ∃X(P ? Q));
(2) Only existential variables occur in goals.
(d4) A person is a bad friend if there is someone who would become neu-
rotic if this someone were befriended by the person.
The query is "Is there someone who is a bad friend?"
The database and query are formalized as follows:
(d1) [f(U1, U2) ∧ cr(U1, U2) ⊃ d(U2)] ⊃ n(U2)
(d2) cr(s, m) ⊃ d(m)
(d3) cr(s, U1)
(d4) [f(U1, U2) ⊃ n(U2)] ⊃ b(U1)
(query) b(Y)
The query uses an existential variable, and all database clauses use
universal variables.
We now follow [Gabbay and Reyle, 1984b] in presenting one possible
computation sequence for this database and query.
1. d1,d2,d3,d4 ? b(Y)
2. d1,d2,d3,d4 ? f(Y, X) ⊃ n(X)                    using d4
3. d1,d2,d3,d4,f(Y,X) ? n(X)                       using rule for implication
4. d1,d2,d3,d4,f(Y,X) ? f(Z,X) ∧ cr(Z,X) ⊃ d(X)
                                                   using d1
5. d1,d2,d3,d4,f(Y,X),f(Z,X),cr(Z,X) ? d(X)
                                                   using rule for implication
6. d1,d2,d3,d4,f(Y,m),f(Z,m),cr(Z,m) ? cr(s,m)
                                                   using d2
7. d1,d2,d3,d4,f(Y,m),f(Z,m),cr(Z,m) ? cr(s,m)
                                                   using d3.
In connection with this computation, it is interesting to note that in step
2 and in step 4 the universal variables U2 and U1 that appear in clauses
d4 and d1 respectively are not instantiated during unification with the
previous goal and have therefore to be renamed to an existential variable
in the new goal. Note also that additions are made to the program in
the course of computation that contain existential variables in them. An
instantiation is determined for one of these variables in step 6 and this
actually has the effect of changing the program itself.
Observe that the answer variable is uninstantiated; the bad friend exists,
by the computation success, but is unnamed. The neurotic friend is seen
to be Mary by inspection of the computation.
The derivation given by Gabbay and Reyle uses clause d3 at Step 7.
Perhaps a more interesting derivation makes use of the augmented database
at Step 7 and shows that clause d3 is unneeded. The intuition behind this
alternative is that d1 says that for some instance of U2 one can establish
n(U2) by satisfying (an appropriate instance of) an implication, and d2
provides an appropriate implication for such an instance.
for the new block, chosen by a user supplied selection function from the
deferred head list of the final line of the preceding block. (For a standard
implementation it will be the leftmost deferred head.) The deferred head
list is the deferred head list of the final line of the preceding block, minus
the now distinguished active head. For other nH-Prologs this defines the
active and deferred head lists of the new block but InH-Prolog has also as
active heads the active heads of the block in which the new distinguished
active head was deferred. If one represents the block structure in tree form
with a child block for each deferred head in a block, then the active heads
in the new child block include the parent block deferred head as distin-
guished active head and the active heads of the parent block as the other
active heads. No deferred head list exists, as the heads are immediately
recorded in the children blocks. (The possibility that multiple heads from
a clause can share variables complicates the tree format by requiring that
binding information be passed between blocks. In the sequential format
we are outlining, acquiring heads from the previous block handles shared
bindings. However, special recording mechanisms are needed in InH-Prolog
to identify the correct active heads in the preceding block.)
In the start block where no active head list exists, the separator symbol
# is omitted until deferred heads are introduced.
The deduction terminates successfully when no new block can be formed,
which occurs when the deferred head list is empty.
We distinguish the newest active head primarily because of the key
cancellation requirement discussed in the next paragraph, but also because
nH-Prolog deductions must have one restart block per deferred head, and
the distinguished active head is the appropriate head atom to identify with
the block. The active heads serve as conditional facts in the case-analysis
view of the nH-Prologs.
The InH-Prolog procedure has added requirements that relate to search.
A non-obvious property whose absence would make the procedure imprac-
tical is the cancellation requirement. This requirement states that a restart
block is to be accepted in a deduction only if a cancellation has occurred
using the distinguished active head. (Cancellation with other active heads
may also occur, of course, but does not satisfy the requirement.) This de-
mand keeps blocks relevant, and prunes away many false search branches.
The underlying proof procedure is a complete proof procedure with this
feature. We return to the importance of this later.
Another important search feature is progressive search. Although too
complex to treat fully here we outline the feature. Recall that a restart
block in InH-Prolog always begins with the initial query FALSE. (This is
not true of the Unit nH-Prolog variant, originally called Progressive nH-
Prolog [Loveland, 1987; Loveland, 1991] because it first incorporated this
important search feature. However, this feature is useful for all variants.)
If the program clauses were processed exactly as in its "parent" block, then
it would duplicate the computation with no progress made, unless the new
distinguished active head can invoke a cancellation. Progressive search
starts the search of the restart block where the "parent" block finished
(roughly), so that a new search space is swept. More precisely, the initial
portion of the restart block deduction duplicates the initial portion of the
block where the current distinguished active head was deferred, the initial
portion ending at that point of deferral. (The path to this point can be
copied with no search time accrued.) In effect we insert a FAIL at this
point and let normal Prolog backtracking advance the search through the
remainder of the search space. If this search fails then the search continues
"from the top", as a regular search pattern would begin the block. The
search halts in failure when the initial path to the deferred head point
is again reached. For a more precise description see [Loveland, 1991]. An
interpreter for Unit nH-Prolog has proven the worth of this search strategy.
We present two examples of InH-Prolog deductions. The first has very
short blocks but has multiple non-Horn clauses and an indefinite answer.
The second example has less trivial blocks and yields a definite answer.
An answer is the disjunction of the different instantiations of the query
clause which begin InH-Prolog blocks. Not every block begins with a query
clause use; negative clauses also have head atom FALSE and may begin
blocks. However, negative clauses are from the program so represent known
information. Only uses of the query clause yield new information and war-
rant inclusion in the answer to a query. Nor does each use of a query clause
yield a different instantiation; identical instantiations of answer literals are
merged.
The deduction below shows a successful computation without any in-
dication of the search needed to find it, as is true of any displayed de-
duction. It is natural to wonder how much search occurs before the query
"men_req(X, Cond)" calls the correct clause in the successive restart blocks.
(Following Prolog, strings beginning with capital letters are variables.) At
the completion of the deduction we show that the cancellation requirement
kills the attempt to repeat the start block and thus forces the system to
backtrack immediately upon block failure to move to the next (and correct)
clause to match the query.
Example 3.2.1 (The manpower requirement example).
Program:
men_req(8,normal) :- cond(normal).
men_req(20,wind_storm) :- cond(wind_storm).
men_req(30,snow_removal) :- cond(snow_removal).
cond(snow_removal); cond(wind_storm) :- cond(abnormal).
cond(normal) ; cond(abnormal).
Query:
?- men_req(X,Cond).
Query clause:
FALSE :- men_req(X,Cond).
Deduction:
?- FALSE
:- men_req(X,Cond)
:- cond(normal)                        % X=8, Cond=normal
:-                                     # [cond(abnormal)]
restart
?- FALSE                               # cond(abnormal)
:- men_req(X1,Cond1)                   # cond(abnormal)
:- cond(wind_storm)                    # cond(abnormal)
                                       % X1=20, Cond1=wind_storm
:- cond(abnormal)                      # cond(abnormal) [cond(snow_removal)]
:-                                     # cond(abnormal) [cond(snow_removal)]
                                       % cancellation with distinguished head
restart
?- FALSE                               # cond(snow_removal),cond(abnormal)
                                       % The new distinguished head is placed
                                       % leftmost in the active head list
:- men_req(X2,Cond2)                   # cond(snow_removal),cond(abnormal)
:- cond(snow_removal)                  # cond(snow_removal),cond(abnormal)
                                       % X2=30, Cond2=snow_removal
:-                                     # cond(snow_removal),cond(abnormal)
                                       % cancellation with distinguished head
restart
?- FALSE                               # cond(abnormal)
:- men_req(X1,Cond1)                   # cond(abnormal)
:- cond(normal)                        # cond(abnormal)
:-                                     # cond(abnormal) [cond(abnormal)]
Query:
?- goes_to_mtg(X).
Query clause:
FALSE :- goes_to_mtg(X)
Deduction:
?- FALSE
:- goes_to_mtg(X)
:- gets_to(X,office_bldg)
:- parks(X,office_bldg)
:- drives(X,office_bldg)               # [lot_full(office_bldg)]
:- car(X)                              # [lot_full(office_bldg)]
:-                                     # [lot_full(office_bldg)]
                                       % X=a1
restart
?- FALSE                               # lot_full(office_bldg)
:- goes_to_mtg(X1)                     # lot_full(office_bldg)
:- gets_to(X1,office_bldg)             # lot_full(office_bldg)
:- finds(X1,office_bldg)               # lot_full(office_bldg)
:- drives(X1,office_bldg), parks(X1,commercial)
                                       # lot_full(office_bldg)
:- car(X1), parks(X1,commercial)       # lot_full(office_bldg)
:- parks(a1,commercial)                # lot_full(office_bldg)
                                       % X1=a1
:- car(a1), lot_full(office_bldg)      # lot_full(office_bldg)
:- lot_full(office_bldg)               # lot_full(office_bldg)
                                       % cancellation
:-                                     # lot_full(office_bldg)
Query answer:
goes_to_mtg(a1)
Observe that the query answer above is the merge of the disjunction of
two identical atoms, each corresponding to a block.
Unit nH-Prolog differs from InH-Prolog primarily by the use of only one
active head atom, the distinguished active head, and by a possibly different
restart block initial goal clause. The use of only one active head means
a "constant time" inner loop speed (when a no-occurs-check unification
is used, as for Prolog) rather than an inner-loop speed that varies with
the length of the active list to be checked. (Pre-processing by a compiler
usually makes the possibly longer active head list of InH-Prolog also process
in near-constant time. This is not always possible, such as when multiple
entries to the active head list share the same predicate letter.) The different
restart initial goals can allow a shorter deduction to be written (and yield a
shorter search time) but represent another branch point in the search and
may in fact lengthen search time. (The latter is much less costly than first
considerations indicate.) The search time tradeoff here is hard to quantify.
Evidence at present suggests that InH-Prolog is superior when compilation
is possible. For interpreters, Unit nH-Prolog seems to have the edge on
speed, but is somewhat more complex to code.
We now consider how to represent the underlying inference system of
InH-Prolog in a uniform proof format. One way is to rewrite non-Horn
clauses in a classically equivalent form so that only rules acceptable to the
uniform proof concept are used. We illustrate this approach by example,
writing clauses as traditional implications (not Prolog notation), using ⊃
for implies and interpreting the atom FALSE as an alternative notation
for ⊥. Given the non-Horn clause A ⊃ (B ∨ C), we may rewrite this in
the classically equivalent forms (A ∧ (B ⊃ FALSE)) ⊃ C and (A ∧ (C ⊃
FALSE)) ⊃ B, with both clauses needed to replace A ⊃ (B ∨ C). With use
of clauses of this form, InH-Prolog derivations can indeed be described as
uniform proofs.
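To illustrate, the disjunctive clause cond(normal) ∨ cond(abnormal) of the manpower requirement example would, under this rewriting, be replaced by the pair of clauses (cond(abnormal) ⊃ FALSE) ⊃ cond(normal) and (cond(normal) ⊃ FALSE) ⊃ cond(abnormal), and the clause cond(abnormal) ⊃ (cond(snow_removal) ∨ cond(wind_storm)) by (cond(abnormal) ∧ (cond(wind_storm) ⊃ FALSE)) ⊃ cond(snow_removal) together with (cond(abnormal) ∧ (cond(snow_removal) ⊃ FALSE)) ⊃ cond(wind_storm).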
We choose a different approach for our major illustration, one that
does not require us to rewrite the non-Horn clauses. We obtain the effect
of case analysis and block initiation used by InH-Prolog by use of a derived
rule meeting the uniform proof conditions. We justify the derived restart
inference figure
There is a symmetric rule for P, A ∨ B ⟶ A and this obviously gener-
alizes for different size disjunctive clauses. The similarity in effect between
the restart rule and the rewriting of the non-Horn clauses suggested earlier
is obvious.
We give a uniform proof for the manpower requirement example. To
shorten the proof tree we will combine the CI (Clause Incorporation) and
BC (backchain) into one inference figure. Single symbol abbreviations are
used for predicate and constant names and P denotes the formulas consti-
tuting the program for the example. We again use "existential" variables in
the CI inference figure to signify the creation of one instance of a clause
when the proof is read from the bottom upward.
We conclude this section with another example, one where the capabil-
ity of the InH-Prolog procedure might seem to extend beyond the ability to
The sequent p(a) ∨ p(b) ⟶ ∃X p(X) has no uniform proof, so it is in-
structive to see the uniform proof for this InH-Prolog computation. By
convention, the query is replaced by query clause FALSE :- p(X) and
query ?- FALSE. This transformation is justifiable in classical logic; in
particular, the sequent (∀X(p(X) ⊃ FALSE)) ⊃ FALSE ⟶ ∃X p(X) has a
C-proof. The InH-Prolog computation of query ?- FALSE given program
p(a) ∨ p(b), p(X) ⊃ FALSE succeeds, giving answer p(a) ∨ p(b). Using the
various conventions described earlier and denoting the set {∀X(p(X) ⊃
FALSE), p(a) ∨ p(b)} by P, the associated uniform proof is given below.
4 Conclusion
The material presented in this paper may be read in two ways. The first
way is to consider the procedures as primary, starting with the historical
introduction of procedures. This leads to the SLD-resolution procedure,
which is the inference foundation for Prolog and anchors the traditional
view of logic programming. Two extensions of the traditional notion of
logic programming also appear. Included is an abstraction of the concept
of logic programming that may be regarded as a derivative attempt to en-
code the essence of these procedures. An alternative view is to take the
abstraction as primary and as setting the standard for acceptable logic pro-
gramming languages. If this viewpoint is adopted, the abstraction provides
guidance in constructing goal-directed proof procedures for logic program-
ming — such procedures will essentially be ones that search for uniform
proofs in the relevant context. Horn clause logic and the associated SLD-
resolution procedure are illustrations of this viewpoint. However, richer
exploitations of the general framework are also possible and this issue is
discussed at length. The two procedures that conclude the paper serve as
additional novel illustrations of the notion of searching for a uniform proof.
N-Prolog is an excellent example of an extension of SLD-resolution that
fits the uniform proof structure. The second example, near-Horn Prolog,
also fits the uniform proof structure when we specify carefully the version
(InH-Prolog) and the input format (use of query clause). That InH-Prolog
is in fact a proof procedure for full (classical) first-order logic makes the
fit within a uniform proof structure particularly interesting. However, re-
call that conversion of a given valid formula into the positive implication
refutation logic format used by InH-Prolog may utilize several equivalences
valid only classically.
Although the second way of reading the paper most likely is not the
viewpoint the reader first adopts, it is a viewpoint to seriously consider.
If the reader agrees with the paradigm captured by the structure of uni-
form proof, then he/she has a framework to judge not just the procedures
presented here but future procedures that seek to extend our capabilities
within the logic programming arena.
Acknowledgements
We wish to thank Robert Kowalski for suggestions that led to an improved
paper.
Loveland's work on this paper was partially supported by NSF Grants
CCR-89-00383 and CCR-91-16203 and Nadathur's work was similarly sup-
ported by NSF Grants CCR-89-05825 and CCR-92-08465.
References
[Ait-Kaci, 1990] H. Aït-Kaci. Warren's Abstract Machine: a Tutorial Re-
construction. MIT Press, Cambridge, MA, 1990.
[Andrews, 1976] P. Andrews. Refutations by Matings. IEEE Transactions
on Computers, C-25:801-807, 1976.
[Astrachan and Loveland, 1991] O. L. Astrachan and D. W. Loveland.
METEORs: High performance theorem provers for Model Elimination.
In R.S. Boyer, editor, Automated Reasoning: Essays in Honor of Woody
Bledsoe. Kluwer Academic Publishers, Dordrecht, 1991.
[Astrachan and Stickel, 1992] O. L. Astrachan and M. E. Stickel. Caching
and lemmaizing in Model Elimination theorem provers. In W. Bibel
and R. Kowalski, editors, Proceedings of the Eleventh Conference on
Automated Deduction, Lecture Notes in Artificial Intelligence 607, pages
224-238. Springer-Verlag, Berlin, June 1992.
[Bibel, 1982] W. Bibel. Automated Theorem Proving. Vieweg Verlag,
Braunschweig, 1982.
Contents
1 Introduction 236
1.1 Abduction in logic 237
1.2 Integrity constraints 241
1.3 Applications 243
2 Knowledge assimilation 244
3 Default reasoning viewed as abduction 249
4 Negation as failure as abduction 254
4.1 Logic programs as abductive frameworks 255
4.2 An abductive proof procedure for LP 257
4.3 An argumentation-theoretic interpretation 263
4.4 An argumentation-theoretic interpretation of the abduc-
tive proof procedure 267
5 Abductive logic programming 269
5.1 Generalised stable model semantics 270
5.2 An abductive proof procedure for ALP 273
5.3 An argumentation-theoretic interpretation of the abduc-
tive proof procedure for ALP 277
5.4 Computation of abduction through TMS 279
5.5 Simulation of abduction 279
5.6 Abduction through deduction from the completion 285
5.7 Abduction and constraint logic programming . . . 286
6 Extended logic programming 288
6.1 Answer set semantics 289
6.2 Restoring consistency of answer sets 290
6.3 Rules and exceptions in LP 293
6.4 (Extended) Logic Programming without Negation as Fail-
ure 295
6.5 An argumentation-theoretic approach to ELP . . . 297
1 Introduction
This paper extends and updates our earlier survey and analysis of work on
the extension of logic programming to perform abductive reasoning [Kakas
et al., 1993]. The purpose of the paper is to provide a critical overview of
some of the main research results, in order to develop a common frame-
work for evaluating these results, to identify the main unresolved prob-
lems, and to indicate directions for future work. The emphasis is not on
technical details but on relationships and common features of different ap-
proaches. Some of the main issues we will consider are the contributions
that abduction can make to the problems of reasoning with incomplete
or negative information, the evolution of knowledge, and the semantics of
logic programming and its extensions. We also discuss recent work on the
argumentation-theoretic interpretation of abduction, which was introduced
in the earlier version of this paper.
grass-is-wet ← rained-last-night
grass-is-wet ← sprinkler-was-on
shoes-are-wet ← grass-is-wet.
If we observe that our shoes are wet, and we want to know why this is so,
{rained-last-night} is a possible explanation, i.e. a set of hypotheses that
together with the explicit knowledge in T implies the given observation.
{sprinkler-was-on} is another alternative explanation.
Abduction consists in computing such explanations for observations.
It is a form of non-monotonic reasoning, because explanations which are
consistent with one state of a knowledge base may become inconsistent with
new information. In the example above the explanation rained-last-night
may turn out to be false, and the alternative explanation sprinkler-was-on
may be the true cause for the given observation. The existence of multiple
explanations is a general characteristic of abductive reasoning, and the
selection of "preferred" explanations is an important problem.
Of the candidate sets of hypotheses
{grass-is-wet}
{rained-last-night}
{sprinkler-was-on}
only the last two are basic, since grass-is-wet can itself be explained further
within T. Similarly, of the explanations
{rained-last-night, sprinkler-was-on}
{rained-last-night}
{sprinkler-was-on}
only the last two are minimal.
So far we have presented a semantic characterisation of abduction and
discussed some heuristics to deal with the multiple explanation problem,
but we have not described any proof procedures for computing abduction.
Various authors have suggested the use of top-down, goal-oriented compu-
tation, based on the use of deduction to drive the generation of abductive
hypotheses. Cox and Pietrzykowski [Cox and Pietrzykowski, 1992] con-
struct hypotheses from the "dead ends" of linear resolution proofs. Finger
and Genesereth [Finger and Genesereth, 1985] generate "deductive solu-
tions to design problems" using the "residue" left behind in resolution
proofs. Poole, Goebel and Aleliunas [Poole et al., 1987] also use linear
resolution to generate hypotheses.
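As a rough indication of how such a top-down computation proceeds, the following Python fragment computes explanations for the propositional wet-shoes theory above by backward chaining, assuming an abducible subgoal whenever one is reached; the rule encoding and the fixed set of abducibles are assumptions of this sketch, which is not any of the cited procedures.

# The theory T of the wet-shoes example, written as (head, body) rules,
# together with the candidate abducible hypotheses.
RULES = [
    ("grass-is-wet", ["rained-last-night"]),
    ("grass-is-wet", ["sprinkler-was-on"]),
    ("shoes-are-wet", ["grass-is-wet"]),
]
ABDUCIBLES = {"rained-last-night", "sprinkler-was-on"}

def explain(goal, hypotheses=frozenset()):
    """Yield sets of abducibles that, together with RULES, imply the goal:
    backward chaining in which an abducible subgoal is simply assumed."""
    if goal in ABDUCIBLES:
        yield hypotheses | {goal}
        return
    for head, body in RULES:
        if head == goal:
            partial = [hypotheses]
            for subgoal in body:
                partial = [h2 for h in partial for h2 in explain(subgoal, h)]
            yield from partial

print([sorted(d) for d in explain("shoes-are-wet")])
# [['rained-last-night'], ['sprinkler-was-on']] -- the two alternative explanations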
In contrast, the ATMS [de Kleer, 1986] computes abductive explana-
tions bottom-up. The ATMS can be regarded as a form of hyper-resolution,
augmented with subsumption, for propositional logic programs [Reiter and
De Kleer, 1987]. Lamma and Mello [Lamma and Mello, 1992] have devel-
oped an extension of the ATMS for the non-propositional case. Resolution-
based techniques for computing abduction have also been developed by De-
molombe and Farinas del Cerro [Demolombe and Farinas del Cerro, 1991]
and Gaifman and Shapiro [Gaifman and Shapiro, 1989].
Abduction can also be applied to logic programming (LP). A (general)
logic program is a set of Horn clauses extended by negation as failure
[Clark, 1978], i.e. clauses of the form:
Kakas and Mancarella, 1990a; Kakas and Mancarella, 1990d; Denecker and
De Schreye, 1992b; Teusink, 1993]. Instead of failing in a proof when a
selected subgoal fails to unify with the head of any rule, the subgoal can
be viewed as a hypothesis. This is similar to viewing abducibles as "ask-
able" conditions which are treated as qualifications to answers to queries
[Sergot, 1983]. In the same way that it is useful to distinguish a subset of
all predicates as "askable", it is useful to distinguish certain predicates as
abducible. In fact, it is generally convenient to choose, as abducible predi-
cates, ones which are not conclusions of any clause. As we shall remark at
the beginning of Section 5, this restriction can be imposed without loss of
generality, and has the added advantage of ensuring that all explanations
will be basic.
Abductive explanations computed in LP are guaranteed to be minimal,
unless the program itself encodes non-minimal explanations. For example,
in the propositional logic program
p ← q
p ← q, r
both the minimal explanation {q} and the non-minimal explanation {q, r}
are computed for the observation p.
The abductive task for the logic-based approach has been proved to
be highly intractable: it is NP-hard even if T is a set of acyclic [Apt and
Bezem, 1990] propositional definite clauses [Selman and Levesque, 1990;
Eiter and Gottlob, 1993], and is even harder if T is a set of any propositional
clauses [Eiter and Gottlob, 1993]. These complexity results hold even if
explanations are not required to be minimal. However, the abductive task
is tractable for certain more restricted classes of logic programs (see for
example [Eshghi, 1993]).
There are other formalisations of abduction. We mention them for
completeness, but in the sequel we will concentrate on the logic-based view
previously described.
• Allemand, Tanner, Bylander and Josephson [Allemand et al., 1991]
and Reggia [Reggia, 1983] present a mathematical characterisation,
where abduction is defined over sets of observations and hypotheses,
in terms of coverings and parsimony.
• Levesque [Levesque, 1989] gives an account of abduction at the "knowl-
edge level". He characterises abduction in terms of a (modal) logic
of beliefs, and shows how the logic-based approach to abduction can
be understood in terms of a particular kind of belief.
In the previous discussion we have briefly described both semantics and
proof procedures for abduction. The relationship between semantics and
proof procedures can be understood as a special case of the relationship
KB satisfies o iff KB ⊨ o.
These definitions have been proposed in the case where the theory is a logic
program P by Kowalski and Sadri [Sadri and Kowalski, 1987] and Lloyd
and Topor [Lloyd and Topor, 1985] respectively, where KB is the Clark
completion [Clark, 1978] of P.
Another view of integrity constraints [Kakas, 1991; Kakas and Man-
carella, 1990; Kowalski, 1990; Reiter, 1988; Reiter, 1990] regards these as
epistemic or metalevel statements about the content of the database. In
this case the integrity constraints are understood as statements at a differ-
ent level from those in the knowledge base. They specify what must be true
about the knowledge base rather than what is true about the world mod-
elled by the knowledge base. When later we consider abduction in LP (see
Sections 4,5), integrity satisfaction will be understood in a sense which is
stronger than consistency, weaker than theoremhood, and arguably similar
to the epistemic or metalevel view.
For each such semantics, we have a specification of the integrity checking
problem. Although the different views of integrity satisfaction are concep-
tually very different, the integrity checking procedures based upon these
views are not very different in practice (e.g. [Decker, 1986; Sadri and
Kowalski, 1987; Lloyd and Topor, 1985]). They are mainly concerned
with avoiding the inefficiency which arises if all the integrity constraints
are retested after each update. A common idea of all these procedures is
to render integrity checking more efficient by exploiting the assumption
that the database before the update satisfies the integrity constraints, and
therefore if integrity constraints are violated after the update, this viola-
tion should depend upon the update itself: In [Sadri and Kowalski, 1987]
2
Here and in the rest of this paper we will use the same symbol A to indicate both
the set of abducible predicates and the set of all their variable-free instances.
Another important application which can be understood in terms of a
"non-causal" use of abduction is default reasoning. Default reasoning
concerns the use of general rules to derive information in the absence of
contradictions. In the application of abduction to default reasoning, conclu-
sions are viewed as observations to be explained by means of assumptions
which hold by default unless a contradiction can be shown [Eshghi and
Kowalski, 1988; Poole, 1988]. As Poole [Poole, 1988] argues, the use of ab-
duction avoids the need to develop a non-classical, non-monotonic logic for
default reasoning. In Section 3 we will discuss the use of abduction for
default reasoning in greater detail. Because negation as failure in LP is
a form of default reasoning, its interpretation by means of abduction will
be discussed in section 4.
Some authors (e.g. Pearl [Pearl, 1988]) advocate the use of probabil-
ity theory as an alternative approach to common sense reasoning in gen-
eral, and to many of the applications listed above in particular. However,
Poole [Poole, 1993] shows how abduction can be used to simulate (dis-
crete) Bayesian networks in probability theory. He proposes the language
of probabilistic Horn abduction: in this language an abductive framework
is a triple (T, A, I), where T is a set of Horn clauses, A is a set of abducibles
without definitions in T (without loss of generality, see Section 5), and I
is a set of integrity constraints in the form of denials of abducibles only.
In addition, for each integrity constraint, a probability value is assigned to
each abducible, so that the sum of all the values of all the abducibles in
each integrity constraint is 1. If the abductive framework satisfies certain
assumptions, e.g. T is acyclic [Apt and Bezem, 1990], the bodies of all the
clauses defining each non-abducible atom are mutually exclusive and these
clauses are "covering", and abducibles in A are "probabilistically indepen-
dent", then such a probabilistic Horn abduction theory can be mapped
onto a (discrete) Bayesian network and vice versa.
2 Knowledge assimilation
Abduction takes place in the context of assimilating new knowledge (infor-
mation, belief or data) into a theory (or knowledge base). There are four
possible deductive relationships between the current knowledge base (KB),
the new information, and the new KB which arises as a result [Kowalski,
1979; Kowalski, 1994].
1. The new information is already deducible from the current KB. The
new KB, as a result, is identical with the current one.
2. The current KB = KB1 ∪ KB2 can be decomposed into two parts.
One part KB1 together with the new information can be used to
deduce the other part KB2. The new KB is KB1 together with the
new information.
3. The new information violates the integrity of the current KB. In-
tegrity can be restored by modifying or rejecting one or more of the
assumptions which lead to the contradiction.
4. The new information is independent from the current KB. The new
KB is obtained by adding the new information to the current KB.
¬[persists(T1, P, T2) ∧ happens(E, T) ∧ terminates(E, P) ∧ T1 < T < T2].
happens(takes_book(mary), t0)
initiates(takes_book(X), has_book(X))
terminates(gives_book(X, Y), has_book(X))
initiates(gives_book(X, Y), has_book(Y))
Then, given t0 < t1 < t2, the persistence axiom predicts holds_at(has_book
(mary), t1) by assuming persists(t0, has_book(mary), t1), and holds_at
(has_book(mary), t2) by assuming persists(t0, has_book(mary), t2). Both
these assumptions are consistent with the integrity constraint. Suppose
now that the new information holds_at(has_book(john), t2) is added to KB.
This conflicts with the prediction holds_at(has_book(mary), t2). However,
the new information can be assimilated by adding to KB the hypotheses
happens(gives_book(mary, john), t1) and persists(t1, has_book(john), t2)
and by retracting the hypothesis persists(t0, has_book(mary), t2). There-
fore, the earlier prediction holds_at(has_book(mary), t2) can no longer be
derived from the new KB.
Note that in this example the hypothesis happens(gives_book(mary,
john), t1) can be added to KB since it does not violate the further integrity
constraint
the new information. For instance we may prefer a set of hypotheses which
entails information already in the KB, i.e. hypotheses that render the KB
as "compact" as possible.
Example 2.0.2. Suppose the current KB contains
where sibling and parent are view predicates, father and mother are ex-
tensional, and =, ≠ are "built-in" predicates such that
X = X and
s ≠ t for all distinct variable-free terms s and t.
is given. This can be translated into either of the two minimal updates
on the extensional part of the KB. Both of these updates satisfy the in-
tegrity constraints. However, only the first update satisfies the integrity
constraints if we are given the further update
which expresses, for all variable-free instances t of X,4 that r(t) can be de-
rived if a(t) holds and each of Bi(t) is consistent, where a(X), Bi(X), r(X)
are first-order formulae. Default rules provide a way of extending an un-
derlying incomplete theory. Different applications of the defaults can yield
different extensions.
As already mentioned in Section 1, Poole, Goebel and Aleliunas [Poole
et al., 1987] and Poole [Poole, 1988] propose an alternative formalisation
of default reasoning in terms of abduction. Like Reiter, Poole also distin-
guishes two kinds of beliefs:
• beliefs that belong to a consistent set of first order sentences F rep-
resenting "facts", and
• beliefs that belong to a set of first order formulae D representing
defaults.
4
We use the notation X to indicate a tuple of variables X1, . . . , Xn and t to represent
a tuple of terms t1, . . . , tn.
Perhaps the most important difference between Poole's and Reiter's for-
malisations is that Poole uses sentences (and formulae) of classical first
order logic to express defaults, while Reiter uses rules of inference. Given a
Theorist framework (F, D), default reasoning can be thought of as theory
formation. A new theory is formed by extending the existing theory F with
a set Δ of sentences which are variable-free instances of formulae in D. The
new theory F ∪ Δ should be consistent. This process of theory formation is
a form of abduction, where variable-free instances of defaults in D are the
candidate abducibles. Poole [Poole, 1988] shows that the semantics of the
theory formation framework (F, D) is equivalent to that of an abductive
framework (F′, A, ∅) (see Section 1.2) where the default formulae are all
atomic. The set of abducibles A consists of a new predicate, one for each
default formula in D, with the same free variables X. The new predicate
is said to "name" the default. The set F′ is the set F augmented with a sentence
The priority of the rule that penguins do not fly over the rule that birds
fly is obtained by regarding the first rule as a fact and the second rule as a
default. The atom fly(john) is a default conclusion which holds in F ∪ Δ
with
5
Here, we use the conventional notation of first-order logic, rather than LP form.
We use → for the usual implication symbol for first-order logic in contrast with ← for
LP. However, as in LP notation, variables occurring in formulae of F are assumed to
be universally quantified. Formulae of D, on the other hand, should be understood as
schemata standing for the set of all their variable-free instances.
Δ = { bird(john) → fly(john) }.
We obtain the same conclusion by naming the default (3.1) by means of a
predicate birds-fly(X), adding to F the new "fact"
and extending the resulting augmented set of facts F′ with the set of hy-
potheses
Δ′ = { birds-fly(john) }.
On the other hand, the conclusion fly(tweety) cannot be derived, because
the extension
Δ = { bird(tweety) → fly(tweety) }
is inconsistent with F, and similarly the extension
where MB_i is a new predicate, and MB_i(X) is a default formula (abducible),
for all i = 1, . . . , n. Integrity constraints
are needed to link the new predicates MB_i appropriately with the predicates
Bi, for all i = 1, . . . , n. A further integrity constraint
Hanks and McDermott showed, in effect, that the default theory, whose
facts consist of
has two extensions: one in which ¬r, and therefore q holds; and one in
which ¬q, and therefore p holds. The second extension is intuitively in-
correct under the intended interpretation. Hanks and McDermott showed
that many other approaches to default reasoning give similarly incorrect
results. However, Morris [Morris, 1988] showed that the default theory
which has no facts but contains the two non-normal defaults
yields only one extension, containing q, which is the correct result. In con-
trast, all natural representations of the problem in Theorist give incorrect
results.
As Eshghi and Kowalski [Eshghi and Kowalski, 1988], Evans [Evans,
1989] and Apt and Bezem [Apt and Bezem, 1990] observe, the Yale shooting
problem has the form of a logic program, and interpreting negation in the
problem as negation as failure yields only the correct result. This is the case
for both the semantics and the proof theory of LP. Moreover, [Eshghi and
Kowalski, 1988] and [Kakas and Mancarella, 1989] show how to retain the
correct result when negation as failure is interpreted as a form of abduction.
On the other hand, the Theorist framework does overcome the problem
that some default theories do not have extensions and hence cannot be
given any meaning within Reiter's default logic. In the next section we will
see that this problem also occurs in LP, but that it can also be overcome
by an abductive treatment of negation as failure. We will also see that the
resulting abductive interpretation of negation as failure allows us to regard
LP as a hybrid which treats defaults as abducibles in Theorist but treats
clauses as inference rules in default logic.
To see how NAF can be used for default reasoning, we return to the flying-
birds example.
Example 4.0.1. The NAF formulation differs from the logic program
with abduction presented in the last section (Example 3.0.2) by employing
a negative condition
~ abnormal-bird(X)
instead of a positive abducible condition
birds-fly(X)
The semantics of the abductive framework (P*, A*, I*), in terms of ex-
tensions7 P* ∪ Δ of P*, where Δ ⊆ A*, gives a semantics for the original
program P. A conclusion Q holds with respect to P if and only if the query
Q*, obtained by replacing each negative literal ~ p(t) in Q by p*(t), has an
abductive explanation in the framework (P*, A*, I*). This transformation
of P into (P*, A*, I*) is an example of the method, described at the end
of Section 1.1, of giving a semantics to a language by translating it into
another language whose semantics is already known.
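The syntactic part of this transformation is easily mechanised. The Python fragment below is a sketch under assumed encodings (clauses as head/body pairs, literals carrying an explicit sign); it only rewrites bodies and collects the new starred predicates as abducibles, leaving the generation of the integrity constraints in I* to one side.

def positive_form(literal):
    """~ p(t) becomes the new positive atom p*(t); positive literals unchanged.
    A literal is (sign, predicate, args) with sign "+" or "~"."""
    sign, pred, args = literal
    return (pred + "*", args) if sign == "~" else (pred, args)

def transform(program):
    """P -> (P*, A*): rewrite clause bodies and collect the starred predicates
    that now occur, which become the abducibles.  Clauses are (head, body)."""
    p_star, abducibles = [], set()
    for head, body in program:
        new_body = [positive_form(l) for l in body]
        abducibles |= {pred for pred, _ in new_body if pred.endswith("*")}
        p_star.append((head, new_body))
    return p_star, abducibles

# p(X) <- q(X), ~ r(X)   becomes   p(X) <- q(X), r*(X)   with r* abducible.
# I* would additionally contain, for every ground atom p(t), the denial of
# p(t) and p*(t) holding together and the disjunction of p(t) with p*(t).
P = [(("p", ("X",)), [("+", "q", ("X",)), ("~", "r", ("X",))])]
print(transform(P))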
The integrity constraints in I* play a crucial role in capturing the mean-
ing of NAF. The denials express that the newly introduced symbols p* are
the negations of the corresponding p. They prevent an assumption p*(t) if
p(t) holds. On the other hand the disjunctive integrity constraints force a
hypothesis p*(t) whenever p(t) does not hold.
Hence we define the meaning of the integrity constraints I* as follows:
An extension P* ∪ Δ (which is a Horn theory) of P* satisfies I* if and
only if for every variable-free atom p,
P* ∪ Δ ⊭ p ∧ p*, and
P* ∪ Δ ⊨ p or P* ∪ Δ ⊨ p*.
Eshghi and Kowalski [Eshghi and Kowalski, 1989] show that there is a one
to one correspondence between stable models [Gelfond and Lifschitz, 1988]
of P and abductive extensions of P*. We recall the definition of stable
model:
Let P be a general logic program, and assume that all the clauses in
P are variable-free.8 For any set M of variable-free atoms, let P^M be the
Horn program obtained by deleting from P:
• every clause that contains a negative literal ~ p in its body with p ∈ M, and
• all negative literals in the bodies of the remaining clauses.
M is a stable model of P if and only if M is the minimal model of P^M.
6
In the original paper the disjunctive integrity constraints were written in the form
Demo(P* ∪ Δ, p(t)) ∨ Demo(P* ∪ Δ, p*(t)),
where t is any variable-free term. This formulation makes explicit a particular (meta-
level) interpretation of the disjunctive integrity constraint. The simpler form
is neutral with respect to the interpretation of integrity constraints and allows the meta-
level interpretation as a special case.
7
This use of the term "extension" is different from other uses. For example, in
default logic an extension is formally defined to be the deductive closure of a theory
"extended" by means of the conclusions of default rules. In this paper we also use the
term "extension" informally (as in Example 3.0.1) to refer to A alone.
8
If P is not variable-free, then it is replaced by the set of all its variable-free instances.
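The construction just recalled can be written out as a small propositional program. The following Python sketch, which is not taken from the paper and assumes a particular clause encoding, computes the reduct P^M and tests stability by comparing M with the least model of the reduct.

def reduct(program, M):
    """The Horn program P^M: clauses are (head, positive_body, negative_body);
    delete every clause whose negative body meets M, and drop the negative
    literals from the remaining clauses."""
    return [(h, pos) for h, pos, neg in program if not set(neg) & M]

def least_model(horn):
    """Least model of a definite propositional program, by forward chaining."""
    M, changed = set(), True
    while changed:
        changed = False
        for head, body in horn:
            if set(body) <= M and head not in M:
                M.add(head)
                changed = True
    return M

def is_stable(program, M):
    return least_model(reduct(program, M)) == M

# p <- ~q and q <- ~p: two stable models, {p} and {q}, but not {p, q}.
P = [("p", [], ["q"]), ("q", [], ["p"])]
print(is_stable(P, {"p"}), is_stable(P, {"q"}), is_stable(P, {"p", "q"}))  # True True False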
The query <— s succeeds with answer A = {p*, r*}. The computation
is shown in Figure 1. Parts of the search space enclosed by a double box
show the incremental integrity checking of the latest abducible added to the
explanation A. For example, the outer double box shows the integrity check
for the abducible p*. For this we start from «- p = ->p (resulting from the
resolution of p* with the integrity constraint -> (p A p*) = -ip V ->p*) and
resolve backwards in SLD-fashion to show that all branches end in failure,
depicted here by a black box. During this consistency phase for p* a new
abductive phase (shown in the single box) is generated when q* is selected,
since the disjunctive integrity constraint q* ∨ q implies that failure of q* is
allowed only provided that q is provable. The SLD proof of q requires the
⁹ As noticed by Dung [Dung, 1991], the procedure presented in [Eshghi and Kowalski,
1989] contains a mistake, which is not present, however, in the earlier unpublished
version of the paper.
¹⁰ We use the term "consistency phase" for historical reasons. However, in view of the
precise definition of integrity constraint satisfaction, some other term might be more
appropriate.
The last two clauses in this program give rise to a two-step loop via
NAF, in the sense that p (and, similarly, q) "depends" negatively on itself
through two applications of NAF. This causes the SLDNF proof procedure,
executing the query ← s, to go into an infinite loop. Therefore, the query
has no SLDNF refutation. However, in the corresponding abductive frame-
work the query has two answers, A = {p*} and A = {q*}, corresponding
to the two stable models of the program. The computation for the first
answer is shown in Figure 2. The outer abductive phase generates the hy-
pothesis p* and triggers the consistency phase for p* shown in the double
box. In general, whenever a hypothesis is tested for integrity, we can add
the hypothesis to A either at the beginning or at the end of the consistency
phase. When this addition is done at the beginning (as originally defined
in [Eshghi and Kowalski, 1989]) this extra information can be used in any
subordinate abductive phase. In this example, the hypothesis p* is used in
the subordinate abductive proof of q to justify the failure of q* and con-
sequently to render p* acceptable. In other words, the acceptability of p*
as a hypothesis is proved under the assumption of p*. The same abductive
proof procedure, but where each new hypothesis is added to A only at
Note that the first clause of this program gives rise to a one-step loop
via NAF, in the sense that r "depends" negatively on itself through one
application of NAF. The abductive proof procedure succeeds with the ex-
planation {q*}, but the only set of hypotheses which satisfies the integrity
constraints is {p*}.
So, as Eshghi and Kowalski [Eshghi and Kowalski, 1989] show by means
of this example, the abductive proof procedure is not always sound with
respect to the above abductive semantics of NAF. In fact, following the
result in [Dung, 1991], it can be proved that the proof procedure is sound
for the class of order-consistent logic programs defined by Sato [Sato, 1990].
Intuitively, this is the class of programs which do not contain clauses giving
rise to odd-step loops via NAF.
For the overall class of general logic programs, moreover, it is possible
to argue that it is the semantics and not the proof procedure that is at
fault. Indeed, Sacca and Zaniolo [Sacca and Zaniolo, 1990], Przymusinski
[Przymusinski, 1990] and others have argued that the totality requirement
of stable models is too strong. They relax this requirement and consider
partial or three-valued stable models instead. In the context of the abduc-
tive semantics of NAF this is an argument against the disjunctive integrity
constraints.
An abductive semantics of NAF without disjunctive integrity constraints
has been proposed by Dung [Dung, 1991] (see Section 4.3 below). The ab-
ductive proof procedure is sound with respect to this improved semantics.
An alternative abductive semantics of NAF without disjunctive in-
tegrity constraints has been proposed by Brewka [Brewka, 1993], follow-
ing ideas presented in [Konolige, 1992]. He suggests that the set which
includes both accepted and refuted NAF hypotheses be maximised. For
each such set of hypotheses, the logic program admits a "model" which is
the union of the sets of accepted hypotheses together with the "comple-
ment" of the refuted hypotheses. For Example 4.2.3 the only "model" is
{p*,q,r}. Therefore, the abductive proof procedure is still unsound with
respect to this semantics. Moreover, this semantics has other undesirable
consequences. For example, the program
p ← p, ~ q
Similarly to Example 4.2.3, the last clause gives rise to a one-step loop
The last three clauses give rise to a three-step loop via NAF, since p (and,
similarly, q and r) "depends" negatively on itself through three applications
of NAF. In the corresponding abductive framework, the only attack against
the hypothesis s* is E = {p*}. But although P* ∪ {s*} ∪ E does not attack
E, E is not a valid attack because it is not stable (or admissible) according
to the definition above.
To generalise the reasoning in this example so that it gives an intuitively
correct semantics to any program with clauses giving rise to an odd-step
loop via NAF, we need to liberalise further the conditions for defeating
E. Kakas and Mancarella suggest a recursive definition in which a set of
hypotheses is deemed acceptable if no attack against it is acceptable. More
precisely, given an initial set of hypotheses A0, a set of hypotheses A is
acceptable to A0 iff
for every attack E against A − A0, E is not acceptable to A ∪ A0.
The semantics of a program P can be identified with any A which is maxi-
mally acceptable to the empty set of hypotheses ∅. As before with weak sta-
bility and stable theories, the consideration of attacks only against A − A0
ensures that attacks and counterattacks are genuine, i.e. they attack the
new part of A that does not contain A0.
Notice that, as a special case, we obtain a basis for the definition:
A is acceptable to A0 if A ⊆ A0.
Therefore, if A is acceptable to ∅ then A is consistent.
Notice, too, that applying the recursive definition twice, and starting with
the base case, we obtain an approximation to the recursive definition
A is acceptable to A0 if for every attack E against A − A0,
E ∪ A ∪ A0 attacks E − (A ∪ A0).
Thus, the stable theories are those which are maximally acceptable to ∅,
where acceptability is defined by this approximation to the recursive defi-
nition.
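A naive sketch of this recursive definition for a finite propositional framework is given below; the attacks relation is assumed to be supplied by the underlying abductive framework, and the brute-force enumeration of attacking sets is purely illustrative.

from itertools import chain, combinations

def subsets(s):
    s = list(s)
    return map(set, chain.from_iterable(combinations(s, r) for r in range(len(s) + 1)))

def acceptable(delta, delta0, hypotheses, attacks):
    """attacks(E, D): does the set of hypotheses E attack the set D?"""
    if delta <= delta0:          # base case: A is acceptable to A0 if A is contained in A0
        return True
    new = delta - delta0
    for e in subsets(hypotheses):
        if attacks(e, new) and acceptable(e, delta | delta0, hypotheses, attacks):
            return False         # some attack against A - A0 is itself acceptable
    return True

The recursion terminates because the second argument grows strictly along any call that does not hit the base case, and it is bounded by the finite set of hypotheses.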
A related argumentation-theoretic interpretation for the semantics of
NAF in LP has also been developed by Geffner [Geffner, 1991]. This inter-
pretation is equivalent to the well-founded semantics [Dung, 1993]. Based
upon Geffner's notion of argumentation, Torres [Torres, 1993] has proposed
an argumentation-theoretic semantics for NAF that is equivalent to Kakas
and Mancarella's stable theory semantics [Kakas and Mancarella, 1991;
Kakas and Mancarella, 1991a], but is formulated in terms of the following
notion of attack: E attacks A (relative to P*) if P* ∪ E ∪ A ⊢ p for
some p* ∈ A.
Alferes and Pereira [Alferes and Pereira, 1994] apply the argumentation-
theoretic interpretation introduced in [Kakas et al., 1993] to expand the
well-founded model of normal and extended logic programs (see Section 5).
In the case of normal logic programming, their semantics gives the same
result as the acceptability semantics in Example 4.3.3.
Simari and Loui [Simari and Loui, 1992] define an argumentation-theoretic
framework for default reasoning in general. They combine a notion of ac-
ceptability with Poole's notion of "most specific" explanation [Poole, 1985],
to deal with hierarchies of defaults.
In Section 7 we will present an abstract argumentation-theoretic frame-
work which is based upon the framework for LP but unifies many other
approaches to default reasoning.
Fig. 3. Computation for Example 4.3.2 with respect to the revisited proof
procedure
The interpretations M(A1) = {a, p} and M(A2) = {a, b, p, q} are gen-
eralised stable models of (P, A, I). Consequently, both A1 = {a} and
A2 = {a, b} are abductive explanations of p. On the other hand, the in-
terpretation {b, q}, corresponding to the set of abducibles {b}, is not a
generalised stable model of (P, A, I), because it is not a model of I as it
does not contain p. Moreover, the interpretation {b, q, p}, although it is a
model of P ∪ I and therefore satisfies I according to the consistency view
of constraint satisfaction, is not a generalised stable model of (P, A, I),
because it is not a stable model of P. This shows that the notion of in-
tegrity satisfaction for ALP is stronger than the consistency view. It is also
possible to show that it is weaker than the theoremhood view and to argue
that it is similar to the metalevel or epistemic view.
An alternative, and perhaps more fundamental way of understanding
the generalised stable model semantics is by using abduction both for hy-
pothetical reasoning and for NAF. The negative literals in (P, A, I) can be
viewed as further abducibles according to the transformation described in
Section 4. The set of abducible predicates then becomes A ∪ A*, where A*
is the set of negative abducibles introduced by the transformation. This
results in a new abductive framework (P*, A ∪ A*, I ∪ I*), where I* is the
set of special integrity constraints introduced by the transformation of Sec-
tion 4.¹⁴ The semantics of the abductive framework (P*, A ∪ A*, I ∪ I*)
can then be given by the sets of hypotheses drawn from A ∪ A* which
satisfy the integrity constraints I ∪ I*.
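For the propositional case, this semantics can be illustrated by brute force, as in the following sketch (ours): enumerate candidate hypothesis sets drawn from A ∪ A*, and keep those whose least model satisfies both the domain-specific denials in I and the special constraints in I*. All encodings are assumptions.

from itertools import combinations

def least_model(horn, facts):
    model, changed = set(facts), True
    while changed:
        changed = False
        for head, body in horn:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

def admissible_hypotheses(p_star, abducibles, denials, atoms):
    """abducibles: A union A*; denials: I, each a set of atoms that must not all hold."""
    abducibles = sorted(abducibles)
    for r in range(len(abducibles) + 1):
        for delta in map(set, combinations(abducibles, r)):
            m = least_model(p_star, delta)
            if any(set(d) <= m for d in denials):
                continue                                   # violates I
            if any((p in m) == (p + "*" in m) for p in atoms):
                continue                                   # violates I* for some atom
            yield delta, m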
Example 5.1.2. Consider P
q* is acceptable.
An important feature of the abductive proof procedures is that they
avoid performing a full general-purpose integrity check (such as the for-
ward reasoning procedure of [Kowalski and Sadri, 1988]). In the case of a
negative hypothesis, q* for example, a general-purpose forward reasoning
integrity check would have to use rules in the program such as p ← q* to
derive p. The optimised integrity check in the abductive proof procedure
avoids this inference and only reasons forward one step with the integrity
constraint ¬(q ∧ q*), deriving the resolvent ← q, and then reasoning back-
ward from the resolvent.
Similarly, the integrity check for a positive hypothesis, r for example,
avoids reasoning forward with any rules which might have r in the body.
Indeed, in a case, such as Example 5.2.1 above, where there are no domain
specific integrity constraints, the integrity check for a positive abducible,
such as r, simply consists in checking that its complement, in our example
r*, does not belong to A.
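The following is a much-simplified sketch (ours, propositional, ignoring nested abductive phases and assuming a loop-free program) of this optimised check: for a negative hypothesis q*, resolve once with the denial ¬(q ∧ q*) and verify by backward reasoning that the resolvent ← q fails; for a positive abducible with no domain-specific constraints, simply check that its complement is not in A.

def provable(atom, program, delta, abducibles):
    """Naive backward reasoning (assumes a loop-free program); abducibles hold iff in delta."""
    if atom in abducibles:
        return atom in delta
    return any(all(provable(b, program, delta, abducibles) for b in body)
               for head, body in program if head == atom)

def integrity_check(hyp, program, delta, abducibles):
    if hyp.endswith("*"):                    # negative hypothesis q*: resolvent <- q must fail
        return not provable(hyp[:-1], program, delta | {hyp}, abducibles)
    return hyp + "*" not in delta            # positive abducible r: its complement must be absent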
To ensure that this optimised form of integrity check is correct, the
proof procedure is extended to record those positive abducibles it needs to
assume absent to show the integrity of other abducibles in A. So whenever a
positive abducible, which is not in A, is selected in a branch of a consistency
phase, the procedure fails on that branch and at the same time records
that this abducible needs to be absent. This extension is illustrated by the
following example.
Example 5.2.2. Consider the program
where r is abducible and the query is ← p (see Figure 5). The acceptability
of q* requires the absence of the abducible r. The simplest way to ensure
this is by adding r* to A. This, then, prevents the abduction of r and the
computation fails. Notice that the proof procedure does not reason forward
from r to test its integrity. This test has been performed backwards in the
earlier consistency phase for q*, and the addition of r* to A ensures that
it is not necessary to repeat it.
The way in which the absence of abducibles is recorded depends on how
the negation of abducibles is interpreted. Under the stable and generalised
stable model semantics, as we have assumed in Example 5.2.2 above, the
required failure of a positive abducible is recorded by adding its complement
to A. However, in general it is not always appropriate to assume that the
absence of an abducible implies its negation. On the contrary, it may be
appropriate to treat abducibles as open rather than closed (see Section 6.1),
and correspondingly to treat the negation of abducible predicates as open.
¹⁶ Recall that the abductive proof procedure for ALP employs the restriction that each
integrity constraint contains at least one literal with an abducible predicate.
and the set of integrity constraints I = {¬p}. P has two stable models
M1 = {p, q} and M2 = {r}, but only M2 satisfies I. The proof procedure
of [Satoh and Iwayama, 1991] deterministically computes only the intended
model M2, without first computing and then rejecting M1.
In Section 8 we will see more generally that truth maintenance systems
can be regarded as a form of ALP.
P' has two generalised stable models that satisfy the integrity constraints,
namely M'1 = M(A1) ∪ {b'} = {a, p, b'}, and M'2 = M(A2) = {a, b, p, q},
where M(A1) and M(A2) are the generalised stable models seen in Exam-
ple 5.1.1.
An alternative way of viewing abduction, which emphasises the defea-
sibility of abducibles, is retractability [Goebel et al., 1986]. Instead of
regarding abducibles as atoms to be consistently added to a theory, they can
be considered as assertions in the theory to be retracted in the presence of
contradictions until consistency (or integrity) is restored (c.f. Section 6.2).
One approach to this understanding of abduction is presented in [Kowal-
ski and Sadri, 1988]. Here, Kowalski and Sadri present a transformation
from a general logic program P with integrity constraints I, together with
some indication of how to restore consistency, to a new general logic pro-
gram P' without integrity constraints. Restoration of consistency is indi-
cated by nominating one atom as retractable in each integrity constraint.18
Integrity constraints are represented as denials, and the atom to be re-
tracted must occur positively in the integrity constraint. The (informally
specified) semantics is that whenever an integrity constraint of the form
¹⁷ Satoh and Iwayama use the notation p* instead of p' and explicitly consider only
propositional programs.
¹⁸ Many different atoms can be retractable in the same integrity constraint. Alterna-
tive ways of nominating retractable atoms correspond to alternative ways of restoring
consistency in P.
has been violated, where the atom p has been nominated as retractable,
then consistency should be restored by retracting the instance of the clause
of the form
by
where, as before, a' stands for the complement of a. The first clause in P' is
obtained by replacing the positive condition a in the clause in P by the NAF
literal ~ a'. The second clause replaces the integrity constraint in I. Note
that this replaces "a should be retracted" if the integrity constraint ¬[a ∧ g]
is violated by "the complement a' of a should be asserted". Finally, the
that, together with the third clause, expresses that neither a nor a' need
hold, even if no integrity constraint is violated. Note that the last two
clauses in P' are those used by Satoh and Iwayama [Satoh and Iwayama,
1991] to simulate non-default abduction by means of NAF.
Toni and Kowalski [Toni and Kowalski, 1995] prove that the transfor-
mation is correct and complete in the sense that there is a one-to-one cor-
respondence between attacks in the framework (P, A, I) and in the frame-
work corresponding to the transformed program P'. Thus, for any seman-
tics that can be defined argumentation-theoretically there is a one-to-one
correspondence between the semantics for an abductive logic program and
the semantics of the transformed program. As a consequence, any proof
procedure for LP which is correct for one of these semantics provides a
correct proof procedure for ALP for the analogous semantics (and, less
interestingly, vice versa).
In addition to the transformations from ALP to general LP, discussed
above, transformations between ALP and disjunctive logic programming
(DLP) have also been investigated. Inoue et al. [Inoue et al., 1992a],19 in
particular, translate ALP clauses of the form
where a' is a new atom that stands for the complement of a, as expressed
by the integrity constraint
¬(a ∧ a').     (5.1)
Console, Dupre and Torasso extend this approach to deal with propo-
sitional abductive logic programs with integrity constraints / in the form
of denials of abducibles and of clauses expressing taxonomic relationships
among abducibles. An explanation formula for an observation O is now de-
fined to be the most specific formula F, formulated in terms of abducible
predicates only, such that
The proof procedure is extended by using the denial and taxonomic in-
tegrity constraints to simplify F.
In the more general case of non-propositional abductive logic programs,
the Clark equality theory CET [Clark, 1978] is used; the notion that F
is more specific than F' requires that F → F' be a logical consequence
of CET and that F' → F not be a consequence of CET. The explana-
tion formula is unique up to equivalence with respect to CET. The proof
procedure is extended to take into account the equality theory CET.
Denecker and De Schreye [Denecker and De Schreye, 1992a] compare
the search space obtained by reasoning backward using the if-half of the
if-and-only-if form of a definite program with that obtained by reasoning
forward using the only-if half. They show an equivalence between the search
space for SLD-resolution extended with abduction and the search space for
model generation with SATCHMO [Manthey and Bry, 1988] augmented
with term rewriting to simulate unification.
Clauses with explicit negation in their conclusions can also serve a sim-
ilar function to integrity constraints with retractibles. For example, the
integrity constraint
where n ≥ m ≥ 0 and each Li is either an atom (A) or the explicit nega-
tion of an atom (¬A). This negation, denoted by "¬", is called "classical
negation" in [Gelfond and Lifschitz, 1990]. However, as we will see be-
low, because the contrapositives of extended clauses do not hold, the term
"classical negation" can be regarded as inappropriate. For this reason we
use the term "explicit negation" instead.
A similar notion has been investigated by Pearce and Wagner [Pearce
and Wagner, 1991], who develop an extension of definite programs by means
of Nelson's strong negation. They also suggest the possibility of combining
strong negation with NAF. Akama [Akama, 1992] argues that the seman-
tics of this combination of strong negation with NAF is equivalent to the
answer set semantics for extended logic programs developed by Gelfond
and Lifschitz.
The semantics of an extended program is given by its answer sets, which
are like stable models but consist of both positive and (explicit) negative lit-
erals. Perhaps the easiest way to understand the semantics is to transform
the extended program P into a general logic program P' without explicit
negation, and to apply the stable model semantics to the resulting general
logic program. The transformation consists in replacing every occurrence
of explicit negation ¬p(t) by a new (positive) atom p'(t). The stable mod-
els of P' which do not contain a contradiction of the form p(t) and p'(t)
correspond to the consistent answer sets of P. The corresponding an-
swer sets of P contain explicit negative literals ¬p(t) wherever the stable
models contain p'(t). In [Gelfond and Lifschitz, 1990] the answer sets are
defined directly on the extended program by modifying the definition of the
stable model semantics. The consistent answer sets of P also correspond to
the generalised stable models (see Section 5.1) of P' with a set of integrity
constraints ∀X ¬[p(X) ∧ p'(X)], for every predicate p.
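The transformational reading can be sketched as follows (our encoding, with explicitly negated atoms written as ("-", atom) and renamed by appending "_prime"); the stable models of the renamed program can then be computed with any stable model procedure, such as the reduct-based test sketched earlier.

def rename(lit):
    """Explicit negation ("-", p) becomes the new positive atom p_prime."""
    return lit[1] + "_prime" if isinstance(lit, tuple) and lit[0] == "-" else lit

def rename_program(extended_program):
    return [(rename(h), [rename(b) for b in pos], [rename(b) for b in neg])
            for h, pos, neg in extended_program]

def consistent(model):
    """A stable model of P' corresponds to a consistent answer set iff no pair p, p_prime occurs."""
    return not any(a.endswith("_prime") and a[:-len("_prime")] in model for a in model)

# flies <- bird, ~ -flies     and     -flies <- penguin
P = [("flies", ["bird"], [("-", "flies")]), (("-", "flies"), ["penguin"], []), ("bird", [], [])]
print(rename_program(P))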
In the general case a stable model of P' might contain a contradiction
of the form p(t) and p'(t). In this case the corresponding inconsistent
answer set is defined to be the set of all the variable-free (positive and
explicit negative) literals. It is in this sense that explicit negation can
be said to be "classical". The same effect can be obtained by explicitly
augmenting P' by the clauses
for all predicate symbols q and p in P'. Then the answer sets of P simply
correspond to the stable models of the augmented set of clauses. If these
clauses are not added, then the resulting treatment of explicit negation
gives rise to a paraconsistent logic, i.e. one in which contradictions are
tolerated.
Notice that, although Gelfond and Lifschitz define the answer set se-
mantics directly without transforming the program and then applying the
stable model semantics, the transformation can also be used with any
other semantics for the resulting transformed program. Thus Przymusin-
ski [Przymusinski, 1990] for example applies the well-founded semantics
to extended logic programs. Similarly any other semantics can also be
applied. As we have seen before, this is one of the main advantages of
transformational semantics in general.
An important problem for the practical use of extended programs is how
to distinguish whether a negative condition is to be interpreted as explicit
negation or as NAF. This problem will be addressed in Sections 6.4 and 9.
to prevent contradictions.
Extended preferred extensions are then defined by modifying the defini-
tion of preferred extensions in Section 4 for the resulting abductive frame-
work with this new set I* of integrity constraints. The new integrity con-
straints in I* have the effect of removing a NAF hypothesis when it leads
to a contradiction. Clearly, any other semantics for logic programs with
integrity constraints could also be applied to this framework.
Pereira, Aparicio and Alferes [Pereira et al., 1991a] employ a similar
approach within the context of Przymusinski's extended stable models
[Przymusinski, 1990]. It consists in identifying explicitly all the possible
sets of NAF hypotheses which lead to an inconsistency and then restoring
consistency by removing at least one hypothesis from each such set. This
method can be viewed as a form of belief revision, where if inconsistency
can be attributed to an abducible hypothesis or a retractable atom (see
Section 5.5), then we can reject the hypothesis to restore consistency. In
fact Pereira, Aparicio and Alferes have also used this method to study
counterfactual reasoning [Pereira et al., 1991c]. Alferes and Pereira [Alferes
and Pereira, 1993] have shown that this method of restoring consistency
can also be viewed in terms of inconsistency avoidance.
This method [Pereira et al., 1991a] is not able to restore consistency in
all cases, as illustrated by the following example.
Example 6.2.2.
given the extended logic program
(0, P).
For Example 6.2.3 this will give two alternative semantics, {p} or {¬p}.
A similar approach to restoring consistency follows also from the work
in [Kakas, 1992; Kakas et al., 1994] (see Section 7), where argumentation-
based semantics can be used to select acceptable (and hence consistent)
subsets of an inconsistent extended logic program.
Kowalski and Sadri [Kowalski and Sadri, 1990] also present a transfor-
mation, which preserves the new semantics, and is arguably a more elegant
form of the transformation presented in [Kowalski and Sadri, 1988] (see
Section 5.5). In the case of the flying-birds example described above the
new transformation gives the clause
into every clause with a positive conclusion p(t). The condition is vacuous
if there are no exceptions with ¬p in the conclusion. The answer set
semantics of the new program is equivalent to the modified answer set
semantics of the original program, and both are consistent. Moreover,
the transformed program can be further transformed into a general logic
program by renaming explicit negations ¬p by new positive predicates p'.
Because of this renaming, positive and negative predicates can be handled
symmetrically, and therefore, in effect, clauses with positive conclusions can
represent exceptions to rules with (renamed) negative conclusions. Thus,
for example, a negative rule such as
together with an ordering that indicates that the second clause has priority
over the first. In general, the extended clauses
are ordered so that rj > r for 1 ≤ j ≤ k. In [Kakas et al., 1994], the result-
ing prioritised clauses are formulated in an ELP framework (with explicit
negation) without NAF but with an ordering relation on the clauses of the
given program.
This new framework for ELP is proposed in [Kakas et al., 1994] as an exam-
ple of a general theory of the acceptability semantics (see Section 4.3) devel-
oped within the argumentation-theoretic framework introduced in [Kakas
et al., 1993] (see Section 7). Its semantics is based upon an appropriate no-
tion of attack between subtheories consisting of partially ordered extended
clauses in a theory T. Informally, for any subsets E and A of T such that
E ∪ A have a contradictory consequence, E attacks A if and only if either
E does not contain a clause which is lower than some clause in A or if E
does contain such a clause, it also contains some clause which is higher than
a clause in A. Thus, the priority ordering is used to break the symmetry
between the incompatible sets E and A. Hence in the example above, if
we have a bird that walks, then the subtheory which, in addition to these
two facts, consists of the second clause
and the same two facts, but not vice versa; so, the first subtheory is ac-
ceptable whereas the second one is not.
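The following sketch (ours) spells out this notion of attack between subtheories, given a priority relation lower(c, d) between clauses and a test derives_contradiction supplied by the underlying ELP framework; both names are assumptions.

def attacks(e, a, lower, derives_contradiction):
    """e, a: sets of (identifiers of) extended clauses in the theory T."""
    if not derives_contradiction(e | a):
        return False                                     # E and A are compatible
    if not any(lower(c, d) for c in e for d in a):
        return True                                      # E contains nothing lower than A
    return any(lower(d, c) for c in e for d in a)        # otherwise E must also contain something higher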
Kakas, Mancarella and Dung show that, with this notion of attack in
the new framework with explicit negation but without NAF, it is possible
to capture exactly the semantics of NAF in LP. This shows that, if LP is
extended with explicit negation, then NAF can be simulated by introducing
a priority ordering between clauses. Moreover, the new framework of ELP
is more general than conventional ELP as it allows any ordering relation
on the clauses of extended logic programs.
In the extended logic program which results from the transformation de-
scribed above, if ¬p holds then ~ p holds in the corresponding general logic
program, for any atom p. We can argue, therefore, that the transformed
extended logic program satisfies the coherence principle, proposed by
Pereira and Alferes [Pereira and Alferes, 1992], namely that whenever ¬p
holds then ~ p must also hold. They consider the satisfaction of this prin-
ciple to be a desirable property of any semantics for ELP, as illustrated by
the following example, taken from [Alferes et al., 1993].
Example 6.4.1. Given the extended logic program
²⁰ Note that, for simplicity, here we use NAF literals directly as hypotheses, without
renaming them as positive atoms.
²¹ P' stands for the extended logic program P where all explicitly negated literals of
the form ¬p(t) are rewritten as atoms p'(t).
where the condition r represents the conditions under which the exception
holds. In the flying-birds example, the second clause of
expresses that the default named birds-fly does not apply for penguins.
The possibility of expressing both exceptions to rules as well as ex-
ceptions to predicates is useful for representing hierarchies of exceptions.
Suppose we want to change (6.3) to the default rule "penguins usually don't
fly". This can be done by replacing (6.3) by
Pereira, Aparicio and Alferes [Pereira et al., 1991] use the well-founded se-
mantics extended with explicit negation to give a semantics for this method-
ology for default reasoning. However it is worth noting that any other se-
mantics of extended logic programs could also be used. For example Inoue
[Inoue, 1994; Inoue, 1991] uses an extension of the answer set semantics
(see Section 6.2), but for a slightly different transformation.
This more general notion is useful for capturing the semantics of ALP.
To cater for the semantics of LP, T is a general logic program, ⊢ is
modus ponens and A is the set of all negative literals. The contrary of ~ p
is p.
For default logic, default rules are rewritten as sentences of the form
(similarly to Poole's simulation of default logic, Section 3), where the un-
derlying language is first-order logic augmented with a new symbol "M"
which creates a sentence from a sentence not containing M, and with a
new implication symbol <— in addition to the usual implication symbol for
first-order logic. The theory T is F ∪ D, where F is the set of "facts" and
D is the set of defaults written as sentences; ⊢ is ordinary provability for
classical logic augmented with modus ponens for the new implication sym-
bol. (This is different from Poole's simulation, which treats <— as ordinary
implication.) The set A is the set of all sentences of the form Mα. The
contrary of Mα is ¬α.
For autoepistemic logic, the theory T is any set of sentences written in
modal logic. However, ⊢ is provability in classical (non-modal) logic. The
set A is the set of all sentences of the form ¬Lφ or Lφ. The contrary of
¬Lφ is φ, whereas the contrary of Lφ is ¬Lφ.
For non-monotonic logic II, T is any set of sentences of modal logic,
as in the case of autoepistemic logic, but ⊢ is provability in modal logic
(including the inference rule of necessitation, which derives Lφ from φ).
The set A is the set of all sentences of the form ¬Lφ. The contrary of ¬Lφ
is φ.
Given any theory T in any monotonic logic, candidate assumptions A
and notion of the "contrary" of an assumption, a set of assumptions A is
stable iff
a justification for r, changing its label from OUT to IN. Notice that this
is similar to the behaviour of the extended abductive proof procedure de-
scribed in Example 5.2.1, Section 5.2.
Several authors have observed that the JTMS can be given a seman-
tics corresponding to the semantics of logic programs, by interpreting jus-
tifications as propositional logic program clauses, and interpreting ~ pi
as NAF of pi. The papers [Elkan, 1990; Giordano and Martelli, 1990;
Kakas and Mancarella, 1990b; Pimentel and Cuadrado, 1989], in particu-
lar, show that a well-founded labelling for a JTMS corresponds to a stable
model of the corresponding logic program. Several authors [Elkan, 1990;
Fujiwara and Honiden, 1989; Kakas and Mancarella, 1990b; Reinfrank and
Dessler, 1989], exploiting the interpretation of stable models as autoepis-
temic expansions [Gelfond and Lifschitz, 1988], have shown a correspon-
dence between well-founded labellings and stable expansions of the set of
justifications viewed as autoepistemic theories.
The JTMS can also be understood in terms of abduction using the
abductive approach to the semantics of NAF, as shown in [Dung, 1991a;
Giordano and Martelli, 1990; Kakas and Mancarella, 1990b]. This has the
advantage that the nogoods of the JTMS can be interpreted as integrity
constraints of the abductive framework. The correspondence between ab-
duction and the JTMS is reinforced by [Satoh and Iwayama, 1991], which
gives a proof procedure to compute generalised stable models using the
JTMS (see Section 5.4).
However, whereas the JTMS maintains only one implicit context of as-
sumptions at a time, the ATMS explicitly records with every proposition
the different sets of assumptions which provide the foundations for its be-
lief. In ATMS, assumptions are propositions that have been pre-specified
as assumable. Each record of assumptions that supports a proposition p
can also be expressed in Horn clause form
The dependence
is not recorded because its assumptions violate the nogood. The depen-
dence
Reiter and deKleer [Reiter and De Kleer, 1987] show that, given a set
of justifications, nogoods, and candidate assumptions, the ATMS can be
understood as computing minimal and consistent abductive explanations
in the propositional case (where assumptions are interpreted as abductive
hypotheses). This abductive interpretation of ATMS has been developed
further by Inoue [Inoue, 1990], who gives an abductive proof procedure for
the ATMS.
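A brute-force sketch (ours) of this abductive reading of the ATMS label computation is given below: an environment supports a node if the node follows from the justifications together with the environment, it is consistent if no nogood follows, and the label keeps only the minimal such environments. All encodings are assumptions.

from itertools import combinations

def closure(justifications, env):
    """Propagate Horn justifications (node, antecedents) from an environment."""
    derived, changed = set(env), True
    while changed:
        changed = False
        for node, antecedents in justifications:
            if node not in derived and all(a in derived for a in antecedents):
                derived.add(node)
                changed = True
    return derived

def label(node, justifications, assumptions, nogoods):
    envs = []
    for r in range(len(assumptions) + 1):                # smallest environments first
        for env in map(set, combinations(sorted(assumptions), r)):
            c = closure(justifications, env)
            if node in c and not any(set(ng) <= c for ng in nogoods):
                if not any(e <= env for e in envs):      # keep only minimal environments
                    envs.append(env)
    return envs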
Given an abductive logic program P and goal G, the explicit construc-
tion in ALP of a set of hypotheses A, which together with P implies G
and together with P satisfies any integrity constraints /, is similar to the
record
and of defining a semantics that helps to unify the different forms of ab-
duction, NAF, and default reasoning within a common framework. We
have seen, in particular, that a proof procedure which is incorrect under
one semantics (e.g. [Eshghi and Kowalski, 1989]) can be correct under an-
other improved semantics (e.g. [Dung, 1991]). We have also introduced
an argumentation-theoretic interpretation for the semantics of abduction
applied to NAF, and we have seen that this interpretation can help to
understand the relationships between different semantics.
The argumentation-theoretic interpretation of NAF has been abstracted
and shown to unify and simplify the semantics of such different formalisms
for default reasoning as default logic, autoepistemic logic and non-mono-
tonic logic. In each case the standard semantics of these formalisms has
been shown to be a special instance of a single abstract notion that a
set of assumptions is a (stable) semantics if it does not attack itself but
does attack all other assumptions it does not contain. The stable model
semantics, generalised stable model semantics and answer set semantics
are other special cases. We have seen that stable model semantics and its
extensions have deficiencies which are avoided with admissibility, preferred
extension, complete scenaria, weak stability, stable theory and acceptability
semantics. Because these more refined semantics for LP can be defined
abstractly for any argumentation-based framework, they automatically and
uniformly provide improvements for the semantics of other formalisms for
default reasoning.
Despite the many advances in the application of abduction to LP and
to non-monotonic reasoning more generally, there is still much scope for
further development. Important problems in semantics still need to be
resolved. These include the problem of clarifying the role of integrity con-
straints in providing attacks and counterattacks in ALP.
The further development, clarification and simplification of the abstract
argumentation-theoretic framework and its applications both to existing
formalisms and to new formalisms for non-monotonic reasoning is another
important direction for future research. Of special importance is the prob-
lem of relating circumscription and the if-and-only-if completion semantics
to the argumentation-theoretic approach. An important step in this direc-
tion may be the "common sense" axiomatisation of NAF [Van Gelder and
Schlipf, 1993] by Van Gelder and Schlipf, which augments the if-and-only-if
completion with axioms of induction. The inclusion of induction axioms
relates this approach to circumscription, whereas the rewriting of negative
literals by new positive literals relates it to the abductive interpretation of
NAF.
The development of systems that combine ALP and CLP is another
important area that is still in its infancy. Among the results that might be
expected from this development are more powerful systems that combine
constructive abduction and constructive negation, and systems in which
Acknowledgements
This research was supported by Fujitsu Research Laboratories and by the
Esprit Basic Research Action Compulog II. The authors are grateful to
Katsumi Inoue and Ken Satoh for helpful comments on an earlier draft,
and to Jose Julio Alferes, Phan Minh Dung, Paolo Mancarella and Luis
Moniz Pereira for many helpful discussions.
References
[Akama, 1992] S. Akama. Answer set semantics and constructive logic with
strong negation. Technical Report, 1992.
[Alferes et al., 1993] J. J. Alferes, P. M. Dung and L. M. Pereira. Scenario
semantics of extended logic programs. Proc 2nd International Work-
shop on Logic Programming and Nonmonotonic Reasoning, Lisbon. L.
M. Pereira and A. Nerode, eds. pp. 334-348, MIT Press, Cambridge,
MA, 1993.
[Alferes and Pereira, 1994] J. J. Alferes and L. M. Pereira. An argument-
ation-theoretic semantics based on non-refutable falsity. Proc. 4th Int.
Workshop on Non-monotonic Extensions of Logic Programming, Santa
Margherita Ligure, Italy. J. Dix, L. M. Pereira and T. Przymusinski eds.
1994.
[Alferes and Pereira, 1993] J. J. Alferes and L. M. Pereira. Contradiction
in logic programs: when avoidance equals removal, Parts I and II. Proc.
4th Int. Workshop on Extensions of Logic Programming, R. Dyckhoff ed.
pp. 7-26. Lecture Notes in AI 798, Springer-Verlag, Berlin, 1993.
[Alferes et al., 1994] J. J. Alferes, C. V. Damasio and L. M. Pereira. Top-
down query evaluation for well-founded semantics with explicit negation.
mantics for Eshghi and Kowalski's abductive procedure. Proc. 10th In-
ternational Conference on Logic Programming, Budapest, 586-600, MIT
Press, Cambridge, MA, 1993.
[Goebel et al., 1986] R. Goebel, K. Furukawa and D. Poole. Using definite
clauses and integrity constraints as the basis for a theory formation ap-
proach to diagnostic reasoning. Proc. 3rd International Conference on
Logic Programming, London, pp. 211-222. Vol 225 of Lecture Notes in
Computer Science, Springer Verlag, Berlin, 1986.
[Hanks and McDermott, 1986] S. Hanks and D. McDermott. Default rea-
soning, non-monotonic logics, and the frame problem. Proc. 8th AAAI
'86, Philadelphia, 328-333, 1986.
[Hanks and McDermott, 1987] S. Hanks and D. McDermott. Non-
monotonic logics and temporal projection. Artificial Intelligence, 33,
1987.
[Hasegawa and Fujita, 1992] R. Hasegawa and M. Fujita. Parallel theorem
provers and their applications. Proc. International Conference on Fifth
Generation Computer Systems, Tokyo, 132-154, 1992.
[Hobbs, 1990] J. R. Hobbs. An integrated abductive framework for dis-
course interpretation. Proc. AAAI Symposium on Automated Abduction,
Stanford, 10-12, 1990.
[Hobbs et al, 1990] J. R. Hobbs, M. Stickel, D. Appelt and P. Martin.
Interpretation as abduction. Technical Report, 499, Artificial Intelligence
Center, Computing and Engineering Sciences Division, Menlo Park, CA,
1990.
[Inoue, 1990] K. Inoue. An abductive procedure for the CMS/ATMS. Proc.
European Conference on Artificial Intelligence, ECAI '90, International
Workshop on Truth Maintenance, Stockholm, Springer Verlag Lecture
notes in Computer Science, 1990.
[Inoue, 1991] K. Inoue. Extended logic programs with default assumptions.
Proc. 8th International Conference on Logic Programming, Paris, 490-
504, MIT Press, Cambridge, MA, 1991.
[Inoue, 1994] K. Inoue. Hypothetical reasoning in logic programs. Journal
of Logic Programming, 18, 191-227, 1994.
[Inoue et al., 1992] K. Inoue, M. Koshimura and R. Hasegawa. Embedding
negation as failure into a model generation theorem prover. Proc. 11th
International Conference on Automated Deduction, CADE '92, Saratoga
Springs, NY, 1992.
[Inoue et al., 1992a] K. Inoue, Y. Ohta, R. Hasegawa and M. Nakashima.
Hypothetical reasoning systems on the MGTP. Technical Report, ICOT,
Tokyo (in Japanese), 1992.
[Junker, 1989] U. Junker. A correct non-monotonic ATMS. Proc. 11th
International Joint Conference on Artificial Intelligence, Detroit, MI,
1049-1054, 1989.
[Kakas, 1991] A. C. Kakas. Deductive databases as theories of belief. Tech-
nical Report, Logic Programming Group, Imperial College, London,
1991.
[Kakas, 1991a] A. C. Kakas. On the evolution of databases. Technical Re-
port, Logic Programming Group, Imperial College, London, 1991.
[Kakas, 1992] A. C. Kakas. Default reasoning via negation as failure. Proc.
ECAI-92 workshop on "Foundations of Knowledge Representation and
Reasoning", G. Lakemeyer and B. Nebel, eds. Vol 810 of Lecture Notes
in AI, Springer Verlag, Berlin, 1992.
[Kakas and Mancarella, 1989] A. C. Kakas and P. Mancarella. Anomalous
models and abduction. Proc. 2nd International Symposium on Artificial
Intelligence, Monterrey, Mexico, 1989.
[Kakas and Mancarella, 1990] A. C. Kakas and P. Mancarella. Generalized
Stable Models: a Semantics for Abduction. Proc. 9th European Confer-
ence on Artificial Intelligence, ECAI '90, Stockholm, 385-391, 1990.
[Kakas and Mancarella, 1990a] A. C. Kakas and P. Mancarella. Database
updates through abduction. Proc. 16th International Conference on Very
Large Databases, VLDB'90, Brisbane, Australia, 1990.
[Kakas and Mancarella, 1990b] A. C. Kakas and P. Mancarella. On the
relation of truth maintenance and abduction. Proc. of the 1st Pacific Rim
International Conference on Artificial Intelligence, PRICAI'90, Nagoya,
Japan, 1990.
[Kakas and Mancarella, 1990c] A. C. Kakas and P. Mancarella. Abductive
LP. Proc. NACLP '90, Workshop on Non-Monotonic Reasoning and
Logic Programming, Austin, Texas, 1990.
[Kakas and Mancarella, 1990d] A. C. Kakas and P. Mancarella. Knowledge
assimilation and abduction. Proc. European Conference on Artificial
Intelligence, ECAI '90, International Workshop on Truth Maintenance,
Stockholm, Springer Verlag Lecture notes in Computer Science, 1990.
[Kakas and Mancarella, 1991] A. C. Kakas and P. Mancarella. Negation as
stable hypotheses. Proc. 1st International Workshop on Logic Program-
ming and Nonmonotonic Reasoning, Washington, DC. A. Nerode, V.
Marek and V. Subrahmanian eds., 1991.
[Kakas and Mancarella, 1991a] A. C. Kakas and P. Mancarella. Stable the-
ories for logic programs. Proc. ISLP '91, San Diego, 1991.
[Kakas and Mancarella, 1993a] A. C. Kakas and P. Mancarella. Construc-
tive abduction in logic programming. Technical Report, Dipartimento di
Informatica, Universita di Pisa, 1993.
[Kakas and Mancarella, 1993] A. C. Kakas and P. Mancarella. Preferred
extensions are partial stable models. Journal of Logic Programming, 14,
341-348, 1993.
Contents
1 Introduction 325
2 Positive consequences in logic programs 327
2.1 Definite logic programming 328
2.2 Disjunctive logic programming 330
3 Negation in logic programs 337
3.1 Negation in definite logic programs 337
3.2 Negation in disjunctive logic programs 338
4 Normal or general disjunctive logic programs 340
4.1 Stratified definite logic programs 341
4.2 Stratified disjunctive logic programs 343
4.3 Well-founded and generalized well-founded logic programs 346
4.4 Generalized disjunctive well-founded semantics . . 346
5 Summary 347
6 Addendum 349
1 Introduction
During the past 20 years, logic programming has grown from a new disci-
pline to a mature field. Logic programming is a direct outgrowth of work
that started in automated theorem proving. The first programs based on
logic were developed by Colmerauer and his students [Colmerauer et al.,
1973] at the University of Marseilles in 1972 where the logic programming
language PROLOG was developed. Kowalski [1974] published the first pa-
per that formally described logic as a programming language in 1974. Alain
Colmerauer and Robert Kowalski are considered the founders of the field of
logic programming. van Emden and Kowalski [van Emden and Kowalski,
1976] laid down the theoretical foundation for logic programming. In the
past decade the field has witnessed rapid progress with the publication of
several theoretical results which have provided a strong foundation for logic
where the Ai's and the Bj's are atoms. The atoms on the left of the
implication sign form a disjunction, called the head of the formula,
and those on the right form a conjunction, called the body of the
formula. The formula is read as "A1 or A2 or ... or Am if B1 and B2
and ... and Bn." There are several forms of the formula that one usually
distinguishes. If the body of the formula is empty, and the head is not,
the formula is referred to as a fact. If both are not empty the formula is
referred to as a procedure. A procedure or a fact is also referred to as a logic
program clause. If both head and body of a formula are empty, then the
formula is referred to as the halt statement. Finally, a query is a formula
where the head of the formula is empty and the body is not empty. A finite
set of such logic program clauses is said to constitute a disjunctive logic
program. If the head of a logic program clause consists of a single atom,
then it is called a Horn or definite logic program clause. A finite set of
such definite logic program clauses is said to constitute a Horn or definite
logic program. A definite logic program is a special case of disjunctive logic
program. We shall also consider clauses of the form (1), where the Bj may
be literals. By a literal we mean either an atom or the negation of an atom.
A clause of the form (1) which contains literals in the body is referred to as
a normal (when the head is an atom) or general disjunctive logic program
clause. Similarly we also deal with queries which can have literals and we
refer to them as general queries.
This article is divided into several sections. In the section Positive con-
sequences in logic programs, we describe the theory developed to character-
ize logical consequences from definite and disjunctive logic programs whose
forms are as in (1). In the section Negation in logic programs, we describe
the theories developed to handle general queries in definite and disjunctive
logic programs. In the section Normal or general disjunctive logic programs
we discuss several topics: stratified logic programs, well-founded and gener-
alized well-founded logic programs and generalized disjunctive well-founded
logic programs. In the subsection Stratified logic programs, the theory for
general logic programs with no recursion through negative literals in the
body of a logic program clause is described. Such logic programs are called
We can see that these are the only ground atoms which are logical con-
sequences of P.
A second semantics that can be associated with a logic program is
a procedural semantics. Godel showed that one obtains the same results
with proof theory as one does from model theory. Van Emden and Kowalski
[1976] showed that if one uses a proof procedure called linear resolution with
lattice is the null set ∅. The following mapping, defined by van Emden and
Kowalski [1976] is a continuous mapping.
Definition 2.1.4. [van Emden and Kowalski, 1976] Let P be a defi-
nite logic program, and let I be a Herbrand interpretation. Then
Tp(I) = {A ∈ HBp | A ← B1, ..., Bn is a ground instance of a clause in P
and {B1, ..., Bn} ⊆ I}.
The least fixpoint contains all the ground atoms which are logical con-
sequences of P
The major result is that the model theoretic, the procedural and fixpoint
semantics all capture the same meaning to a logic program: the set of
ground atoms that are logical consequences of the logic program.
Theorem 2.1.6 ([van Emden and Kowalski, 1976]
(Horn Characterization—Positive)). Let P be a definite logic program
and A ∈ HBp. Then the following are equivalent:
(a) A is in Mp
(b) A is in the least fixpoint of Tp
(c) A is in SLD(P)
(d) A is a logical consequence of P.
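For a ground definite program, the fixpoint semantics in part (b) can be illustrated by the following sketch (ours), which iterates Tp from the empty interpretation up to its least fixpoint; the clause encoding (head, body) is an assumption.

def t_p(program, interpretation):
    """One application of Tp to a Herbrand interpretation; clauses are (head, body)."""
    return {head for head, body in program if all(b in interpretation for b in body)}

def least_fixpoint(program):
    i = set()
    while True:
        nxt = t_p(program, i)
        if nxt == i:
            return i
        i = nxt

P = [("edge(a,b)", []), ("path(a,b)", ["edge(a,b)"])]
print(least_fixpoint(P))      # both atoms are logical consequences of P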
a)}. Neither model contains the other. Furthermore, the intersection of the
two minimal Herbrand models is not a model of the logic program. Hence,
disjunctive logic programs do not have the Herbrand model intersection
property of definite logic programs.
Although there is no unique minimal Herbrand model in a disjunctive
logic program, there is a set of minimal Herbrand models. In 1982, Minker
[1982] developed a suitable model theoretic semantics for disjunctive logic
programs in terms of the minimal models. He showed that a ground clause
is a logical consequence of a disjunctive logic program P if and only if it is
true in every minimal Herbrand model of P. This semantics extends the
unique minimal model theory of definite logic programs to disjunctive logic
programs.
Example 2.2.1. Consider the following disjunctive logic program:
P = {(1) path(X,Y),unconnected(X, Y) <- point(X),point(Y);
(2) point(a) ;
(3) point(b)}.
The minimal Herbrand models of P are given by
M_P^1 = {path(a,a), path(a,b), path(b,a), path(b,b), point(a), point(b)}
M_P^2 = {unconnected(a,a), path(a,b), path(b,a), path(b,b), point(a), point(b)}
M_P^3 = {path(a,a), unconnected(a,b), path(b,a), path(b,b), point(a), point(b)}
M_P^4 = {path(a,a), path(a,b), unconnected(b,a), path(b,b), point(a), point(b)}
M_P^5 = {path(a,a), path(a,b), path(b,a), unconnected(b,b), point(a), point(b)}
M_P^6 = {unconnected(a,a), unconnected(a,b), path(b,a), path(b,b), point(a), point(b)}
M_P^7 = {unconnected(a,a), path(a,b), unconnected(b,a), path(b,b), point(a), point(b)}
M_P^8 = {unconnected(a,a), path(a,b), path(b,a), unconnected(b,b), point(a), point(b)}
M_P^9 = {path(a,a), unconnected(a,b), unconnected(b,a), path(b,b), point(a), point(b)}
M_P^10 = {path(a,a), unconnected(a,b), path(b,a), unconnected(b,b), point(a), point(b)}
M_P^11 = {path(a,a), path(a,b), unconnected(b,a), unconnected(b,b), point(a), point(b)}
M_P^12 = {unconnected(a,a), unconnected(a,b), unconnected(b,a), path(b,b), point(a), point(b)}
M_P^13 = {unconnected(a,a), unconnected(a,b), path(b,a), unconnected(b,b), point(a), point(b)}
We can see that every clause in MSp is true in every minimal Herbrand
model of P enumerated in Example 2.2.1.
is a t-clause representation of
As the proof theoretic semantics one should take the set of all positive
ground clauses (that is, disjunctive clauses made of ground atoms from
the Herbrand base) that one can derive from the logic program using SLI
resolution. We call this set the SLI-success set, (SLI(P)), of P. The proof
theoretic and the model theoretic semantics yield the same results.
Theorem 2.2.9 ([Minker and Zanon, 1979]). Consider a disjunctive
logic program P and a ground positive disjunctive clause C. Then,
C ∈ SLI(P) if and only if C is a logical consequence of P.
To obtain the fixpoint semantics of a disjunctive logic program, Minker
and Rajasekar [1987a] modified the van Emden-Kowalski fixpoint opera-
tor Tp. When working with disjunctive logic programs, it is not possible
to map Herbrand interpretations to Herbrand interpretations. The natu-
ral mapping with a disjunctive theory is to map a set of positive ground
disjuncts to positive ground disjuncts. Minker and Rajasekar used the dis-
junctive Herbrand base, DHBp, and its subsets to define a lattice for the
mapping. Subsets of the DHBp under the partial order ⊆ form a lattice.
Minker and Rajasekar defined their fixpoint operator to be:
Definition 2.2.10 ([Minker and Rajasekar, 1987a]). Let P be a dis-
junctive logic program, and let S be a state. Then
Tp(S) = {C ∈ DHB(P) | C' ← B1, ..., Bn is a ground instance of a
clause in P,
B1 ∨ C1, ..., Bn ∨ Cn are in S, and C is the smallest factor of the
clause C' ∨ C1 ∨ ... ∨ Cn, where the Ci, 1 ≤ i ≤ n, are positive clauses}.
The smallest factor of a ground clause C is another clause C' such that C' has
only distinct ground atoms and is a subset of C. Minker and Rajasekar
[1987a] showed that the operator Tp is continuous and the least fixpoint of
Tp captures all the minimal ground clauses which are SLI-derivable from
a disjunctive logic program P. By a minimal ground clause which is SLI-
derivable we mean that no sub-clause of it is in the set SLI(P).
Example 2.2.11. Consider the disjunctive logic program P given in Ex-
ample 2.2.1. The least fixpoint of Tp is computed as follows:
Tp(∅) = {point(a), point(b)}
Tp({point(a),point(b)})
= {path(a, a) V unconnected(a, a),path(a, b) V unconnected(a, b),
path(b, a) V unconnected(b, a),path(b, b) V unconnected(b, b),point(a),
point(b)}
This is the least fixpoint and contains all the minimal ground clauses which
are derivable from P.
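The state operator can be sketched for ground programs as follows (our encoding: a clause is a set of atoms read as a disjunction, and a state is a set of such clauses); factoring is implicit because duplicate atoms collapse in a set, and subsumed clauses are not removed.

from itertools import product

def t_p_state(program, state):
    """program: list of (head_atoms, body_atoms); state: set of frozensets (positive clauses)."""
    derived = set()
    for head, body in program:
        choices = [[c for c in state if b in c] for b in body]  # a supporting clause in S for each body atom
        for picked in product(*choices):
            residue = set()
            for c, b in zip(picked, body):
                residue |= (c - {b})                             # the Ci part of Bi v Ci
            derived.add(frozenset(set(head) | residue))          # smallest factor of C' v C1 v ... v Cn
    return derived

def least_state(program):
    s = set()
    while True:
        nxt = t_p_state(program, s)
        if nxt == s:
            return s
        s = nxt

P = [({"path(a,b)", "unconnected(a,b)"}, ["point(a)", "point(b)"]),
     ({"point(a)"}, []), ({"point(b)"}, [])]
print(least_state(P))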
The major result in disjunctive logic programming is that the model
semantics based on Herbrand models and model-states, the proof seman-
tics and the fixpoint semantics yield the same semantics and capture the
set of minimal positive clauses that are logical consequences of the logic
program. Moreover, each of these semantics reduces to the corresponding se-
mantics for definite logic programs discussed in the previous section when
Data Bases. To answer negated queries, Reiter defined the closed world as-
sumption (CWA). According to the CWA, one can assume a negated atom
to be true if one cannot prove the atom from the logic program. Reiter
showed that the union of the theory and the negated atoms proved by the
CWA is consistent. Clark viewed negation of an atom as a lack of sufficient
condition for provability of the atom. Clark argued that the logic program
clauses in a definite logic program should be viewed as definitions of the
atoms in the Herbrand base of the program. Hence they are necessary and
sufficient to provide a proof for the atoms. That is, what one should do is
consider the logic clauses as definitions to imply if-and-only-if conditions
instead of only if conditions. To do so, one effectively reverses the if con-
dition in the logic program clauses of a definite logic program P to be an
only-if condition and then takes the union of this set of clauses with the
original logic program clauses. The union of these two sets, augmented by
Clark's equality theory (CET) [Clark, 1978] is referred to as the Clark com-
pletion of the program, and written comp(P). Clark shows that by using
what is called the negation as finite failure (NAF) rule on the if defini-
tions, one can conclude the negation of a ground atom if it fails finitely to
prove the atom. He augmented SLD-resolution with this rule, called it
SLDNF-resolution, and showed that it is sound and complete with respect
to the semantics defined by comp(P).
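The CWA itself is easy to illustrate for a ground definite program (a sketch of ours, approximating provability by the least fixpoint of Tp):

def cwa_negations(program, herbrand_base):
    """Atoms whose negation may be assumed under the CWA: those not derivable from P."""
    def t_p(i):
        return {h for h, body in program if all(b in i for b in body)}
    lfp = set()
    while t_p(lfp) != lfp:
        lfp = t_p(lfp)
    return {a for a in herbrand_base if a not in lfp}

P = [("q(a)", []), ("p(a)", ["q(a)"])]
print(cwa_negations(P, {"p(a)", "q(a)", "r(a)"}))   # only r(a) may be assumed false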
Shepherdson [1984; 1985; 1987] showed a relationship between answers
found using the CWA and the comp(P) theories of negation. He provides
conditions under which they are the same and under which they may differ.
Proof-theoretic Definition:
WGCWA(P) = {¬A | A ∈ HBp and there is no positive (possibly empty)
ground clause K such that P derives A ∨ K}.
We can consider the concept of derivability from a program P as equiv-
alent to membership in the least fixpoint of Tp.
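Taking the derivable positive clauses to be the least fixpoint of the state operator sketched earlier, the WGCWA can then be illustrated as follows (ours):

def wgcwa_negations(derivable_clauses, herbrand_base):
    """derivable_clauses: the positive ground clauses derivable from P, each a frozenset of atoms."""
    appearing = set().union(*derivable_clauses) if derivable_clauses else set()
    return {a for a in herbrand_base if a not in appearing}

# An atom occurring only in a derivable disjunction is not assumed false,
# but an atom occurring in no derivable clause is.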
Example 3.2.4. Consider the disjunctive logic program P given in Exam-
ple 3.2.2. The derivable consequences of P (as given by the least fixpoint
of Tp)
clauses with atomic heads) or normal disjunctive logic programs. One can,
of course, write an equivalent formula for a disjunctive logic program which
does not contain negated atoms, by moving the negated atom to the head
of the logic program clause to achieve a disjunction of atoms in the head of
the clause. This, however, has a different connotation than that intended
by a negated atom in the body of a clause. It is intended in this case that
the negated atom be considered solved by a default rule, such as the CWA
or negation as finite failure.
As noted by Apt, Blair and Walker [Apt et al., 1987], and also by Van
Gelder [1987], by Naqvi [1986] and by Chandra and Harel [1985], problems
arise with the intended meaning of a normal logic program in some in-
stances. For example, the logic program P = {p(a) ← ¬q(a), q(a) ← p(a)}
raises questions as to its meaning. We describe alternative ways to han-
dle normal definite logic programs and the corresponding approach with
normal disjunctive logic programs.
Note that the atoms in the negative literals appearing in the body of a
clause in stratum Pi appear only in the heads of clauses in stratum Pj with
j < i. That is, the atoms appearing in these literals are defined in a stratum
below them. Hence, there can be no recursion through these literals. In
the case of positive literals appearing in the body of a clause in stratum
Pi it can be seen that the atoms forming these literals are in the heads of
clauses in stratum Pj with j ≤ i. Again, this implies that these atoms are
defined in the same stratum or in strata below. Recursion through these
literals is allowed. The intended model is calculated iteratively as follows:
Mp = Mp3 is the intended meaning of the program using the Apt, Blair
and Walker semantics.
The model semantics achieved by the theory developed by Apt, Blair
and Walker [1987] is independent of the manner in which the logic program
is stratified. Gelfond and Lifschitz [1988] show that Mp is also a stable
model. Przymusinski [l988a] has defined the concept of a perfect model
and has shown that every stratified logic program has exactly one perfect
model. It is identical to the model obtained by Apt, Blair and Walker.
Theorem 4.1.2. Let P be a stratified normal program. Then
1. Tp(Mp) = Mp,
2. Mp is a minimal Herbrand model of P,
3. Mp is a supported model of P,
4. Mp is a stable model of P,
5. Mp is a perfect model of P.
In the procedural interpretation of normal disjunctive programs prob-
lems arise when a negated literal has to be evaluated and the literal contains
a variable. In this case, the logic program is said to flounder. Chan [1988]
has defined constructive negation, that will find correct answers even in the
case of negated literals that contain variables. The underlying idea behind
constructive negation is to answer queries using formulas involving only
equality predicates. The following example illustrates the concept behind
constructive negation.
Example 4.1.3. Consider the following normal program:
unconnected(b, Y)
This query cannot be answered using the NAF rule [Clark, 1978] which
requires that a negative literal be ground before it is selected in an SLDNF-
derivation. Chan [1988] developed constructive negation which provides
answers using inequality. An answer to the above query in his theory
would be the inequality {Y ≠ a}.
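The idea can be made concrete with a toy sketch (an illustration only, not Chan's actual procedure; the predicate name edge and its facts are hypothetical stand-ins for the program of the example): when a predicate is defined solely by ground facts, a non-ground negative goal can be answered by returning disequality constraints instead of floundering.

```python
def negate_over_facts(facts, first_arg):
    """facts: set of (x, y) pairs defining a binary predicate by ground facts only.
    Returns the constraints on Y under which the goal ¬pred(first_arg, Y) holds."""
    return [f"Y != {y}" for (x, y) in sorted(facts) if x == first_arg]

edge = {('b', 'a')}                  # hypothetical fact: edge(b, a)
print(negate_over_facts(edge, 'b'))  # ['Y != a'], i.e. the answer {Y ≠ a}
```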
1. Tcp(MSp) = MSp,
2. MSp is a supported state of P,
3. MSp is a stable state of P.
Van Gelder, Ross and Schlipf [Van Gelder et al., 1988] define the concept
of well-founded semantics to handle such logic programs. Przymusinski
[1988b] presents the ideas of well-founded semantics in terms of two sets
of atoms T and F. Atoms in T are assumed to be true and those in F are
assumed to be false. If an atom is neither true nor false, it is assumed to
be unknown. Thus, in the logic program
P = {p ← a, p ← b, a ← ¬b, b ← ¬a},
we can conclude that p, a, and b are all unknown. The atom p is considered
unknown since it is defined using a and b, both of which are unknown.
Van Gelder, Ross and Schlipf [Van Gelder et al., 1988] develop fixpoint
and model theoretic semantics for such logic programs. Ross [l989b] and
Przymusinski [1988b] develop procedural semantics. The three different
semantics are equivalent.
If one analyzes the above logic program, another meaning to the logic
program is also possible. In particular, the last two logic program clauses
state that a is true if b is not true, and b is true if a is not true. Hence,
if a is true then p must be true and if b is true p must be true. Thus,
although we may not be able to conclude which of a or b is true, we can
surely conclude that p must be true. Baral, Lobo and Minker [Baral et
al., 1989a] develop model theoretic, fixpoint and procedural semantics to
capture the meaning of logic programs such as given above. They term
this generalized well-founded semantics (GWFS). The fixpoint definition is
similar to the definitions of well-founded semantics of Przymusinski. Every
atom proved to be true in the well-founded semantics is also true in the
generalized well-founded semantics. However, some additional atoms may
be proved true in the GWFS.
Baral, Lobo and Minker [Baral et al., 1989b] describe a semantics which
solves the above problem in GWFS. The semantics is based on disjunctions
and conjunctions of atoms instead of only sets of atoms. The disjuncts
are assumed to be true and the conjuncts are assumed to be false. This
allows the representation of indefinite information. That is, ¬a ∨ ¬b can be
assumed to be true (by taking a ∧ b to be false) without knowing if either ¬a
or ¬b or both are true. The semantics is also general enough to extend the
well-founded semantics to normal disjunctive programs. They also present
a procedural semantics for the extended semantics and show how to restrict
the procedure to compute the generalized well-founded semantics. Since
the procedure handles not only atomic but also disjunctive information,
factoring and bookkeeping for ancestry resolution are needed. In addition,
the deduction of negative information is obtained using the GCWA which
was proved to be more complex than the closed world assumption [Chomicki
and Subrahmanian, 1989].
There have been other extensions of the well-founded theory to disjunc-
tive logic programs. Ross [1989a] developed a strong well-founded semantics
and Przymusinski [1990] developed what is called a stationary semantics.
This work extends well-founded semantics to disjunctive logic programs.
Table 4 summarizes the results that have been obtained in well-founded
semantics for definite and disjunctive logic programs.
5 Summary
We have described the foundational theory that exists for definite normal
logic programs and the extensions that have been made to that theory
and to disjunctive normal logic programs. The results are summarized in
tables 1-4. The theory of definite and disjunctive logic programs applies
equally to deductive databases where one typically assumes that the rules
are function-free. A firm foundation now exists both for definite normal
and disjunctive normal logic programs for deductive databases and logic
programming.
Although we have developed model theoretic, proof theoretic and fix-
point semantics for disjunctive logic programs, efficient techniques will be
required for computing answers to queries in disjunctive deductive databases
and logic programs. Some preliminary work has been reported by Minker
and Grant [l986b], Liu and Sunderraman [1990], and by Henschen and
his students [Yahya and Henschen, 1985; Henschen and Park, 1986; Chi
and Henschen, 1988]. However, a great deal of additional work is required
regarding theoretical, implementational and applicative aspects of disjunc-
tive logic programming. Questions regarding negation, efficient proof pro-
cedures and answer extraction are important areas which need to be looked
into. Implementing a language based on disjunctive logic has open prob-
lems which need to be solved, such as efficient data structures, subsumption
algorithms, control strategies and extra-logical features. Applications rang-
ing from knowledge based systems, to common sense reasoning, to natural
language processing seem to be appropriate domains for applying disjunc-
tive logic programming. These areas and others need to be explored.
6 Addendum
Disjunctive logic programming has made significant progress since the pa-
per was submitted in 1989.¹
Semantics of disjunctive logic programming have been extended to in-
clude literals both in the head and in the body of disjunctive clauses and
default negation in the body of clauses. In [Lobo et al., 1992] the theoretical
foundations of disjunctive logic programs including theories of negation, the
semantics and the view update problem in disjunctive deductive databases
are given. See [Minker, 1994] for work up to 1994; [Minker and Ruiz, 1996]
for literature on disjunctive theories that contain literals in clauses and
default negation in the body of clauses, and on theories that contain both
literals and multiple default rules in the body of clauses.
Nonmonotonic reasoning and logic programming are closely related.
Nonmonotonic theories such as circumscription, default reasoning and au-
toepistemic logic can be transformed to disjunctive logic programs. See
Minker [1993; 1996] for references. Thus, disjunctive logic programs can
serve as computational vehicles for nonmonotonic reasoning.
Complexity and properties of disjunctive logic programs have been stud-
ied extensively. Complexity results are known for most theories of extended
disjunctive logic programs, see [Minker, 1996]. For references to work on
properties of the consequence relations defined by the different semantics of
disjunctive logic programs, see [Minker, 1996]. Based on properties of pro-
grams and their complexity, users may select the semantics of disjunctive
logic programs of interest to them.
Disjunctive deductive databases are a subset of disjunctive logic pro-
gramming. Such databases are function-free and hence one is dealing with fi-
nite theories. For theories and algorithms of disjunctive deductive databases,
see [Fernandez and Minker, 1995]. For work and references on the view up-
date problem, see [Fernandez, Grant and Minker, 1996]. For references to
the role of disjunctive databases in knowledge bases, see [Minker, 1996].
Acknowledgements
We wish to express our appreciation to the National Science Foundation
for their support of our work under grant number IRI-86-09170 and the
Army Research Office under grant number DAAG-29-85-K-0-177.
References
[Apt et al., 1987] K. R. Apt, H. A. Blair, and A. Walker. Towards a theory
of declarative knowledge. In J. Minker, editor, Foundations of Deductive
Databases and Logic Programming, pages 89-148. Morgan Kaufmann, 1988.
¹This paper was written in 1989. The addendum specifies where references to work
through 1996 may be found. Space did not permit citing all relevant work.
Negation as Failure, Completion and Stratification
J. C. Shepherdson

Contents
1 Overview/introduction 356
1.1 Negation as failure, the closed world assumption and the
Clark completion 356
1.2 Incompleteness of NF for comp(P) 359
1.3 Floundering, an irremovable source of incompleteness 359
1.4 Cases where SLDNF-resolution is complete for comp(P) 361
1.5 Semantics for negation via special classes of model 362
1.6 Semantics for negation using non-classical logics . . 363
1.7 Constructive negation: an extension of negation as failure 364
1.8 Concluding remarks 365
2 Main body 365
2.1 Negation in logic programming 365
2.2 Negation as failure; SLDNF-resolution 367
2.3 The closed world assumption, CWA(P) 370
2.4 The Clark completion, comp(P) 374
2.5 Definite Horn clause programs 384
2.6 Three-valued logic 385
2.7 Cases where SLDNF-resolution is complete for comp(P): hierarchical, stratified and call-consistent programs 391
2.8 Semantics for negation in terms of special classes of models 393
2.9 Constructive negation; an extension of negation as failure 402
2.10 Modal and autoepistemic logic 406
1 Overview/introduction
1.1 Negation as failure, the closed world assumption
and the Clark completion
The usual way of introducing negation into Horn clause logic programming
is by 'negation as failure': if A is a ground atom
the goal ¬A succeeds if A fails,
the goal ¬A fails if A succeeds.
This is obviously not classical negation, at least not relative to the given
program P; the fact that A fails from P does not mean that you can prove
¬A from P, e.g. if P is
a ← ¬b
then b fails from P, but ¬b is not a logical consequence of P.
Negation as failure (NF) is sound for the closed world assumption for
both success and failure, i.e.
if ?- Q succeeds from P with answer θ using NF then CWA(P) ⊨_H Qθ,
if ?- Q fails from P using NF then CWA(P) ⊨_H ¬Q
where T ⊨_H S means 'S is true in all Herbrand models of T'.
The closed world assumption seems appropriate for programs represent-
ing the simpler kinds of database, but its more general use is limited by
two facts:
1. If P implies indefinite information about ground literals then
CWA(P) is inconsistent.
e.g. if P is the program above consisting of the single indefinite clause,
a ← ¬b, then neither a nor b is a consequence of P so in forming
CWA(P) both ¬a and ¬b are added, and the original clause, which
is equivalent to a ∨ b, is inconsistent with these.
2. Even if P consists of definite Horn clauses, NF may be incomplete
for CWA(P), i.e.
'if CWA(P) ⊨_H Qθ then ?- Q succeeds from P with answer including
θ using NF' fails to hold for some programs P.
Indeed there may be no automatic proof procedure which is both
sound and complete for CWA(P), because there are P such that the
set of negative ground literals which are consequences of CWA(P)
may be non-recursively enumerable.
A more widely applicable declarative semantics for NF was given by
Clark [1978]. This is now usually called the (Clark) completion, comp(P),
of the original program P. It is based on 'the implied iff', the idea that
when in a logic program you write
even(0) ←
even(s(s(x))) ← even(x),
what you really mean is the iff statement
∀x (even(x) ↔ x = 0 ∨ ∃y (x = s(s(y)) ∧ even(y))).
In the general case to form comp(P) you treat each predicate symbol p
like this, rewriting the clauses of P in which p appears in the head in the
form of an iff definition of p. Since this introduces the equality predicate
= it is necessary to add axioms for that. These take the form of the usual
equality axioms together with 'freeness axioms', which say that two terms
are equal iff they are forced to be by the equality axioms; for example if
f and g are different unary function symbols one of the freeness axioms is
f(x) ≠ g(y).
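As a rough sketch of the construction just described (assuming one-argument predicates and clauses supplied in a pre-parsed form; the real construction must also rename clause variables apart and add the equality and freeness axioms of CET), the completed definition can be assembled mechanically:

```python
def completed_definition(pred, clauses):
    """clauses: list of (head_term, body_literals, clause_variables) for pred/1.
    Returns the completed ('iff') definition as a string; CET is not included."""
    disjuncts = []
    for head_term, body, vars_ in clauses:
        conj = " & ".join([f"x = {head_term}"] + body)
        if vars_:                            # clause variables are existentially quantified
            conj = f"exists {','.join(vars_)} ({conj})"
        disjuncts.append(conj)
    return f"forall x ({pred}(x) <-> " + " | ".join(disjuncts) + ")"

# The even-number program used above:
clauses = [("0", [], []),
           ("s(s(y))", ["even(y)"], ["y"])]
print(completed_definition("even", clauses))
# forall x (even(x) <-> x = 0 | exists y (x = s(s(y)) & even(y)))
```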
The basic result of [Clark, 1978] is that NF is sound for comp(P) for
both success and failure. Many logic programmers regard this as a justifi-
cation for taking comp(P) to be the declarative meaning of a logic program,
indeed some of them take it so much for granted they do not feel the need
to say that this is what they are doing. Quite apart from the lack of com-
pleteness which we discuss below, I think this is unsatisfactory. Although
in everyday language we may often use 'if' when we mean 'iff', confus-
ing them is responsible for many of the logical errors made by beginning
students of mathematics. Since one of the merits of logic programming is
supposed to be making a rapprochement between the declarative and pro-
cedural interpretation of a program, in the interests of Wysiwym— What
you say is what you mean—logic programming, I think that if you mean
'iff' you should write 'iff'; if you want to derive consequences of comp(P)
you should write comp(P), and if in order to carry out this derivation it is
necessary to go via P then this should be done automatically. A practical
reason for doing this would be that although in simple examples like the
one given it is easy to understand the meaning of comp(P) given P, this is
no longer true when P contains 'recursive' clauses with the same predicate
symbol occurring on both sides of the implication sign, or clauses displaying
mutual recursion. Unfortunately writing comp(P) instead of P would not
solve all our problems because the fact that NF is usually incomplete for
comp(P) means that two programs with the same completion can behave
differently with respect to NF. For example the program
has completion
but the query ? - a succeeds with respect to the latter but not with respect
to the former.
Despite these shortcomings of the closed world assumption and the
completion, they dominate the current view of negation as failure to such
an extent that a large part of this chapter will be taken up with their study.
It must be admitted that the notion of the 'implied iff' is implicit in the use
of negation as failure together with unification, and that the completion,
although not as transparent as one would wish, is from the logical point of
view one of the simplest declarative semantics which have been proposed
for negation as failure.
and when a positive literal Li is selected from the current goal the compu-
tation tree proceeds in the same way as the SLD-resolution used for definite
Horn clauses ; i.e. if Li unifies with the head of a program clause
resulting from its removal. If the goal ?- B succeeds then ¬B fails so the
main derivation fails at this point. If neither of these happens the main
derivation has a dead-end here. This can arise because the derivation tree
for ?- B has infinite branches but no successful ones, or if it, or some
subsidiary derivation, flounders or has dead-ends.
The fact that we cannot deal with non-ground negative literals means
that we can only hope to get completeness of SLDNF-resolution, for any
semantics, for queries which do not flounder. In general the problem of
deciding whether a query flounders is recursively unsolvable ([Borger, 1987];
a simpler proof is in [Apt, 1990]), so a strong overall condition on both the
program and the query is often used which is sufficient to prevent this. A
query is said to be allowed if every variable which occurs in it occurs in a
positive literal of it; a program clause A ← L1, ..., Ln is allowed if every
variable which occurs in it occurs in a positive literal of its body L1, ..., Ln,
and a program is allowed if all of its clauses are allowed. It is easy to show
that if the program and the query are both allowed then the query cannot
flounder, because the variables occurring in negative literals are eventually
grounded by the positive literals containing them.
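A small sketch of this allowedness check follows (the literal representation is an assumption made for the sketch: a literal is a (sign, predicate, arguments) triple and variables are the argument names beginning with an upper-case letter).

```python
def _vars(literals):
    return {a for (_, _, args) in literals for a in args if a[:1].isupper()}

def _pos_vars(literals):
    return {a for (s, _, args) in literals if s == '+' for a in args if a[:1].isupper()}

def query_allowed(query):
    # every variable of the query must occur in a positive literal of it
    return _vars(query) <= _pos_vars(query)

def clause_allowed(head, body):
    # every variable of the clause must occur in a positive literal of its body
    return _vars([head] + body) <= _pos_vars(body)

print(query_allowed([('+', 'p', ['X']), ('-', 'q', ['X'])]))   # True:  ?- p(X), ¬q(X)
print(query_allowed([('-', 'q', ['X'])]))                      # False: ?- ¬q(X) can flounder
```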
and
Horn clause programs, this least fixpoint semantics coincides with the se-
mantics based on CWA(P) because the least fixpoint model is a model,
and the only model, of CWA(P). It is also almost identical, for the usual
positive queries, with the semantics based on considering all models of P.
This is because the least fixpoint model is a generic model, and if Q is a
conjunction of atoms then ∃Q is true in the least fixpoint model of P iff it is
true in all models of P. [However it is not true that an answer substitution
which is correct for the least fixpoint model is correct in all models, e.g. in
the example above the query ?even(y) has the correct answer y = s(s(x))
in the least fixpoint model, but not in all models.] For such positive queries
?Q the least fixpoint semantics also agrees with the semantics based on all
models of comp(P). This is because comp(P) ⊨ ∃Q iff P ⊨ ∃Q. However
this agreement no longer holds in general for queries containing negation.
For example for the program P:
the negative ground literal ¬num(s(0)) is true in the least fixpoint model
of P but not in all models of comp(P) or of P.
And when negative literals are allowed in the bodies of program clauses
the least fixpoint model may no longer exist. A natural alternative then is
to consider (as in Minker [1982]) a semantics based on the class of minimal
Herbrand models of P. Apt et al. [1988] advocate a semantics based on the
class of minimal Herbrand models of P which are also models of comp(P),
and Przymusinski [1988b] proposes an even more restricted class of perfect
models.
We discuss these model-theoretic semantics in Section 2.8, but only
briefly, because they are considered in more detail in Chapters 2.6 and 3.3
of this volume. They are not all directly related to negation as failure.
For example negation as failure allows ?- q to succeed from the program
P: q ← ¬p, but q is not true in all minimal models of P. So negation as
failure is not sound for the semantics based on all minimal models of P.
It is sound for the semantics based on all minimal models of P which are
also models of comp(P), but since this is in general a proper subset of the
set of all models of comp(P), negation as failure must be expected to be
even more incomplete for this semantics than for the usual one based on
all models of comp(P).
where ∃ quantifies all variables except x1, ..., xn. When the SLDNF-tree
for a query Q is finite, with answers A1,..., Ak then
of hierarchic program, although it may well be true for many programs and
queries in practice.
1.8 Concluding remarks
Neither the closed world assumption nor the Clark completion provides a
satisfactory declarative semantics for negation as failure since although
it is sound for both of them it is not in general complete for either of
them. And it is even more incomplete for the semantics based on minimal,
perfect, well-founded or stable models of comp(P). There are sound and
complete semantics expressible in modal or linear logic but these seem too
complicated to serve as a practical guide to the meaning of a program. I
believe that this is inevitable; that the use of negation as failure is only
justifiable in general by some very contorted logic, and that it is one of
the impure features of present day logic programming which should only
be used with great caution.
Those who are wedded to comp(P) as the 'right' semantics for nega-
tion as failure must accept the incompleteness of negation as failure for
this semantics, or confine themselves to programs and queries for which
completeness has been proved for comp(P), e.g. the allowed programs and
queries which are definite, hierarchical or semi-strict (call-consistent) as
described in Section 2.7. If they are prepared to think in terms of 3-valued
logic then allowedness is enough. However, although comp(P) is one of the
simplest semantics proposed for negation as failure, it is often not easy to
read off its meaning from P.
Comprehensive surveys of recent work can be found in the special issue
(Vol 17, nos 2, 3, 4, November 1993) of the Journal of Logic Programming,
devoted to Non-monotonic Reasoning and Logic Programming and in [Apt,
1994].
I am grateful to K. R. Apt and K. Fine for reading earlier drafts very
carefully and making several corrections and many improvements.
2 Main body
2.1 Negation in logic programming
Before taking up our main topic of negation as failure we mention briefly
in this section some other kinds of negation used in logic programming.
The obvious treatment of negation—allowing the full use of classical
negation in both program clauses and queries and using a theorem prover
which is sound and complete for full first order logic—is generally believed
to be infeasible because it leads to a combinatorial explosion. However
there have been some attempts to do this, e.g. [Stickel, 1986; Naish, 1986;
Poole and Goebel, 1986; Loveland, 1988; Sakai and Miyachi, 1986].
In some cases the use of negation can be avoided by renaming. If there
are no occurrences in the program or query of p(t1,..., tn) for any terms
1. (true, ε) ∈ R,
2. if Q is Q1 ∧ A ∧ Q2 where A is a positive literal, if A' ← Q' is a clause
of P, if θ = mgu(A, A') and ((Q1 ∧ Q' ∧ Q2)θ, η) ∈ R, then (Q, θη) ∈ R.
the query ? - r succeeds if the 'last literal' rule is used but not if the Prolog
'first literal' rule is used. {The reason for this discrepancy is that whether
a query fails using SLD-resolution may depend on the computation rule.}
So it is hard to imagine a feasible way of implementing SLDNF-resolution,
since to determine whether a query succeeds requires a search through all
possible derivation trees, using all possible selections of literals. It is shown
in Shepherdson [1985] that there are maximal computation rules Rm such
that if a query succeeds with answer θ using any computation rule then it
does so under Rm, and if it fails using any rule then it fails using Rm, but
in Shepherdson [1991] a program is given for which there is no maximal
recursive rule. What causes the difficulty is that in SLDNF-resolution once
having chosen a ground negative literal ¬A in a goal G you are committed
to waiting possibly forever for the result of the query A before proceeding
with the main derivation. What you need to do is to keep coming back and
trying other choices of literal in G to see whether any of them fail, since
when one of these exists it is not always possible to determine it in advance.
This would be very complicated to implement. This is presumably why
SLDNF-resolution insists that the evaluation of a negative call is pursued
to completion before attending to siblings.
It is also important not to allow the use of negation as failure in the
present form on non-ground negative literals. A query ?- p(x) is taken to
mean ?- ∃x p(x), and a query ?- ¬p(x) to mean ?- ∃x ¬p(x). It is possible
that both of these are true so it would be unsound to fail ?- ∃x ¬p(x) just
because ?- ∃x p(x) succeeded. For example for the program
p(a)
?- p(x) should succeed because ?- p(a) succeeds, and ?- ¬p(x) should also
succeed, because ?- ¬p(b) succeeds (using negation as failure legitimately
on the ground negative literal ¬p(b)). Most Prolog implementations are
unsound because they allow the use of negation as failure on non-ground
negative literals, or even on more general goals. They are, of course also
incomplete even for SLD-resolution because of their depth-first search, so
it would appear to be very difficult to give a simple declarative semantics
for negation as treated in Prolog.
This inability to handle non-ground negative literals means that if we
reach a flounder, a goal containing only non-ground negative literals, we
can, in SLDNF-resolution, proceed no further. This situation is a source of
incompleteness in SLDNF-resolution. For the program p(x) ← ¬r(x), the
query ?- p(a) succeeds but ?- p(x) (meaning ?- ∃x p(x)) does not. For
the program
p(x) ← ¬q(x)
r(a)
the query ?- p(x), r(x) succeeds but the query ?- p(x) flounders. So the
set of queries which succeed from a program P using SLDNF-resolution is
not closed under either of the rules: from φ(a) infer ∃x φ(x), and from
A ∧ B infer A.
where T ⊨_H S means 'S is true in all Herbrand models of T'. This is be-
cause CWA(P) ∪ {¬Q} consists of universal sentences not containing equality
so, by the usual Herbrand-Skolem argument, if it has a model it has a Her-
brand model. In particular if CWA(P) is consistent it has a Herbrand
model.}
(i.e. CWA(P) is consistent) if the Herbrand universe, i.e. the set of ground
terms, is the usual one {a, b} determined by the terms appearing in P, and
it continues to do so if any more atoms from the corresponding Herbrand
base {p(a), p(b), q(a), q(b)} are added to the program. But if the Herbrand
universe is enlarged to {a, b, c} the closed world assumption becomes in-
consistent. This ambiguity does not arise if the program is a definite Horn
clause program, for the argument above shows that the closed world as-
sumption for such a program is consistent whatever Herbrand universe is
used.
Makowsky [1986] observed that the consistency of the closed world as-
sumption is also equivalent to an important model-theoretic property:
If P is a set of first order sentences then a term structure M is
a model for CWA(P) iff it is a generic model of P, i.e. for all
ground atoms A, M ⊨ A iff P ⊨ A.
{We use the words 'term structure' rather than 'Herbrand interpreta-
tion' here because in this more general setting where the sentences of P
may not be all universal, the word 'Herbrand' would be more appropriately
used for the language extended to include the Skolem functions needed to
express these sentences in universal form.} This notion of a generic model
is like that of a free algebraic structure; just as a free group is one in which
an equation is true only if it is true in all groups, so a generic model of P
is one in which a ground atom is true only if it is true in all models of P,
i.e. is a consequence of P. So it is a unique most economical model of P
in which a ground atom is true iff it has to be. If we identify a term model
in the usual way with the subset of the base (set of ground atoms) which
are true in it, then it is literally the smallest term model. It is easy to see
that the genericity of a generic term model extends from ground atoms to
existential quantifications of conjunctions of ground atoms:
If M is a generic model of P, and Q is a conjunction of atoms,
then M ⊨ ∃Q iff P ⊨ ∃Q.
So a positive query is true in M iff it is a consequence of P, and one
can behave almost as though one was dealing with a theory which had a
unique model. This is one of the attractive features of definite Horn clause
logic programming, which, as the results above show, does not extend much
beyond it. But notice that if one is interested in answer substitutions then
one cannot restrict consideration to a generic model, e.g. if P is:
p(s(x))
For negative queries not only is this genericity property lost, but as
Apt et al. [1988] have pointed out, the set of queries which are true under
the closed world assumption, i.e. true in the generic model, may not even
be recursively enumerable, that is to say there may be no computable
procedure for generating the set of queries which ought to succeed under
the closed world assumption. To show this take a non-recursive recursively
enumerable set W and a definite Horn clause program P with constant 0,
unary function symbol s and unary predicate symbol p, such that
P ⊢ p(s^n(0)) iff n ∈ W.
[This is possible since every partial recursive function can be computed
by a definite Horn clause program ([Andreka and Nemeti, 1978]).] Now
¬p(s^n(0)) is true under the closed world assumption iff n ∉ W. However,
this situation does not arise under the conditions under which Reiter origi-
nally suggested the use of the closed world assumption, namely when there
are no function symbols in the language. Then the Herbrand base is finite
and so is the model determined by the closed world assumption, hence the
set of true queries is recursive.
The relevance of the closed world assumption to this chapter is that
negation as failure is sound for the closed world assumption for both success
and failure, i.e.
if ?- Q succeeds from P with answer θ using SLDNF-resolution
then CWA(P) ⊨_H Qθ,
if ?- Q fails from P using SLDNF-resolution then CWA(P) ⊨_H
¬Q.
For a proof see [Shepherdson, 1984]. However, SLDNF-resolution is
usually incomplete for the closed world assumption. This is bound to be
the case when the closed world assumption is inconsistent. Even for defi-
nite Horn clause programs, where the closed world assumption will be con-
sistent, and even when there are no function symbols, SLDNF-resolution
can be incomplete for the closed world assumption, e.g. for the program
p(a) ← p(a), and the query ?- ¬p(a), since this query ends in a dead-end.
{As noted above, the restriction to Herbrand models makes no difference
in the first of the soundness statements above, i.e. it remains true when ⊨_H
is replaced by ⊨. This is not true of the second, where we are dealing
with the negation of a query. For example if P consists of the single clause
p(a) ← p(b) and Q is p(x) then Q fails from P but CWA(P) ⊨ ¬Q is not
true since there are non-Herbrand models of CWA(P) containing elements
c with p(c) true.}
It is possible that one might want to apply the closed world assumption
to some predicates but to protect from it other predicates where it was
known that the information about them was incomplete. For reference to
this notion of protected data see [Minker and Perlis, 1985; Jäger, 1988].
general forms of all these clauses (we assume there are only finitely many)
are:
p(X1, ..., Xn) ← E1
comp(P) ⊨ P.
The basic result of [Clark, 1978] is that negation as failure is sound for
comp(P) for both success and failure:
if ?- Q succeeds from P with answer θ using SLDNF-resolution
then comp(P) ⊨ Qθ,
if ?- Q fails from P using SLDNF-resolution then comp(P) ⊨
¬Q.
For a proof see [Lloyd, 1987, pp. 92, 93]. The key step in the proof is
to show that comp(P) implies that in an SLDNF-derivation tree a goal is
equivalent to the conjunction of its child goals.
This soundness result can be strengthened by replacing the derivabil-
ity relation ⊨ by ⊨_3 (truth in all 3-valued models, for a specific 3-valued
logic - [Kunen, 1987, Section 2.7]) or by ⊢_I, the intuitionistic derivability
relation ([Shepherdson, 1985]) or by a relation ⊢_3I, which admits only rules
which are sound both for classical 3-valued and intuitionistic 2-valued logic
([Shepherdson, 1985]). This helps to explain why SLDNF-resolution is usu-
ally incomplete for comp(P) with respect to the usual 2-valued derivability
relation; in order to succeed a query must not only be true in all 2-valued
models of comp(P), it must be true in all 3-valued models, and must be
derivable using only intuitionistically acceptable steps. For example, if P
is the program p ← ¬p, then comp(P) contains p ↔ ¬p and is inconsistent
in the 2-valued sense, so p is a 2-valued consequence of it, and if SLDNF-
resolution were complete for comp(P) then ?- p should succeed. In fact it
dead-ends. This can be explained by noting that there is a 3-valued model
of comp(P) in which p is not true but undefined. Similarly if P is the
program
where A is an atom and L1, ..., Lm are literals, the same definition could
be applied when P is a set of sentences of the more general form
A ← W,
where W is a first order formula. This allows much more general statements
to be made, but as Lloyd and Topor [1984] have observed, if one takes the
view that the intended meaning of a program P is not P but comp(P), then
there is no gain in generality. Indeed they show how to transform such a
program P into a normal program P' such that comp(P') is essentially
equivalent to comp(P) (the precise sense of equivalence is explained later).
It must be emphasised that the validity of this transformation depends
crucially on the assumption that the meaning of the program is given by
comp(P). Their transformation makes comp(P') equivalent to comp(P); it
does not make P' equivalent to P. If the intended meaning of the program
was the program P actually written, then the appropriate way to obtain
an equivalent normal program would be to introduce Skolem functions to
get rid of the quantifiers. That would give a program P' essentially equiv-
alent to P - but then comp(P') would not be equivalent to comp(P). So
both methods of transformation change the relationship between P and
comp(P). Granted that the intended meaning is comp(P), the transforma-
tion of Lloyd and Topor is useful since it enables one to express comp(P)
using a P expressed in a form closer to natural language, and then convert
P into a normal program P' on which one can use SLDNF-resolution to
derive consequences of comp(P'), i.e. of comp(P). One can also allow the
query to be an arbitrary first order formula W, since if W has free variables
x1,...,x n , the effect of asking ? - W for the program P is the same as ask-
ing ?- answer(x1, ..., xn) for the program P* obtained by adding the clause
answer(x1, ..., xn) ← W to P. More precisely, Wθ is a logical consequence
of comp(P) iff answer(x1, ..., xn)θ is a logical consequence of comp(P*),
and ¬W is a logical consequence of comp(P) iff ¬answer(x1, ..., xn) is a
logical consequence of comp(P*). For a proof of this, and of the validity of
the transformation of P into P' given below, see [Lloyd, 1987, Ch. 4].
The easiest way of effecting a transformation of a general program P
into a normal program P' which sends comp(P) into comp(P') is to use
the fact that every first order formula can be built up using ¬, ∨, and ∃.
Each such connective occurring in the body of a clause can then be elim-
inated in turn by introducing a new predicate p, whose arguments are the
free variables of the subformula concerned, together with clauses to define p.
If we now take each statement of the original general program
P, rewrite its body in terms of ¬, ∨, and ∃, and then, starting with the
outermost connective of its body, successively apply these transformations
we end up with a normal program P' such that comp(P') is a conservative
extension of comp(P), so that in particular any query expressed in the
original language is a consequence of comp(P') iff it is a consequence of
comp(P).
The transformation just described is very uneconomical, and goes fur-
ther than it needs. The clauses of the resulting normal program P' are
of the very simple forms A ← B, A ← ¬B, where A and B are atoms.
{In fact by replacing B by ¬¬B, you can take them all in the latter form!
This demonstrates that it is not always easy to predict the consequences of
forming the completion of a program even if its individual clauses are very
simple.} The body of a normal program clause is allowed to be a conjunc-
tion of literals, so there is no need for us to replace A ∧ B by ¬(¬A ∨ ¬B)
and apply four transformation steps. The transformations given by Lloyd
and Topor take account of this and are framed in terms of transforming if
statements whose bodies are conjunctions, e.g. the rule for eliminating ∨
from such a body.
for each n-ary predicate symbol in σ (this is to ensure that forming the
completion does not add any statements about these predicates). Apart
from the equality and freeness axioms comp(P) is equivalent to
comp(P) in the same way when they do, and it is possible to transform P
into a normal program P' such that comp(P') is a conservative extension
of comp(P). To do this we deal with equations t1 = t2 as follows: use
the equality and freeness axioms to reduce them successively either to false
or to conjunctions of equations of the form x = t, where x is a variable,
then replace all occurrences of x by t and discard the equation. However
programs containing = are not usually considered in this way because the
fact that forming the completion entails adding the freeness axioms means
that = is severely constrained, and the meaning a statement involving it
will have when the completion is made is not easy to predict. In fact
for programs, normal or otherwise, involving =, the freeness axioms are
inappropriate and a different treatment is called for; see [Jaffar et al., 1984;
Jaffar et al., 1986a] for the case of SLD-resolution, [Shepherdson, 1992b]
for SLDNF-resolution.
definite programs and queries'; see [Lloyd, 1987, Chs 2, 3]. Let us see to
what extent these results can be extended to normal queries, i.e. those of
the form A1, ..., Ar, ¬B1, ..., ¬Bs ([Apt, 1990, Section 6] contains a more
thorough discussion of this). In this case the negation as failure rule is
invoked only at the end of the computation, to declare a grounded ¬Bjθ
a success or failure according to whether Bjθ fails or succeeds using SLD-
resolution. The soundness of SLDNF-resolution with respect to comp(P)
implies that the 'only if' halves of (1), (2), (3) are true for all such queries,
but the 'if' halves need additional conditions.
For normal queries Q containing negative literals,
1. above holds if the program P and query Q are allowed,
2. holds if the query Q is ground,
3. holds if Q is ground and contains no positive literals.
These results follow easily from the results above for definite queries.
(1) depends on the fact that if the program and query are allowed then
any computed answer to the query must be ground, because a variable in
the query can only be removed by grounding. (2) is the same as (1) if
Q is ground. (3) uses the fact that if comp(P) implies a disjunction of
ground atoms then it implies one of them, which follows from the existence
of the least fixed point model. The following counter examples show that
(1), (2), (3) do not hold if the conditions on Q are removed. The program
p(x) and allowed query p(x), ¬q(x) violate (1). The allowed program
p(a), p(f(a)) ← p(f(a)) and allowed query p(x), ¬p(f(x)) violate (2)
because comp(P) implies ¬p(f(f(a))), hence that either x = a or x = f(a)
is a correct answer to the query, but it does not give a definite answer,
and SLDNF-resolution can only give definite answers. (2) would, like (1),
be true for all allowed programs and queries if the classical consequence
relation ⊨ was replaced by the intuitionistic derivability relation ⊢_I, because
∃Q is an intuitionistic consequence of comp(P) iff some Qθ is. Finally the
program p ← p and query p, ¬p violate (3).
For definite programs the relation between CWA(P) and comp(P) is
clear. According to CWA(P) all ground atoms not in the least fixed point
model T_P ↑ ω are deemed false, but comp(P) implies the falsity only of the
subset of these which fail under SLD-resolution. It can be shown ([Apt and
van Emden, 1982]) that this latter set is the complement with respect to the
Herbrand base B_P of the set T_P ↓ ω obtained by starting with T_P ↓ 0 = B_P
and defining T_P ↓ (α + 1) = T_P(T_P ↓ α), T_P ↓ α = ⋂_{β<α} T_P ↓ β for α a
limit ordinal.
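For a finite ground definite program the two sets just contrasted are easy to compute; the following minimal sketch (an illustration under that finiteness assumption, not the general transfinite construction) shows both, using the one-clause program p ← p discussed earlier.

```python
def tp(program, interp):
    """One application of T_P to a set of atoms; program is a list of ground
    clauses (head, body)."""
    return {h for (h, body) in program if all(b in interp for b in body)}

def lfp_up(program):                 # T_P ↑ ω, starting from the empty set
    s = set()
    while tp(program, s) != s:
        s = tp(program, s)
    return s

def down_omega(program, base):       # T_P ↓ ω; for a finite ground program the
    s = set(base)                    # downward sequence stabilises after finitely
    while tp(program, s) != s:       # many steps
        s = tp(program, s)
    return s

P = [('p', ['p'])]                   # the program  p ← p
print(lfp_up(P))                     # set():  p is false under CWA(P)
print(down_omega(P, {'p'}))          # {'p'}:  p does not finitely fail, so comp(P)
                                     # does not imply ¬p
```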
some things are true, some things are false, but about other things we do
not know whether they are true or false. Kleene [1952] introduced such
a logic to deal with partial recursive functions and predicates. The three
truth values are t, true, f, false and u, undefined or unknown. A connective
has the value t or f if it has that value in ordinary 2-valued logic for all
possible replacements of u's by t or f, otherwise it has the value u. For
example p → q gets the truth value t if p is f or q is t but the value u if
p, q are both u. (So p → p is not a tautology, since it has the value u if p
has value u.) The universal quantifier is treated as an infinite conjunction
so ∀x φ(x) is t if φ(a) is t for all a, f if φ(a) is f for some a, otherwise u
(so it is the glb of the truth values of the φ(a) in the partial ordering given by
f < u < t). Similarly ∃x φ(x) is t if some φ(a) is t, f if all φ(a) are f,
otherwise u.
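These truth tables are small enough to write out directly; the sketch below (truth values represented as the strings 't', 'f' and 'u') implements the strong Kleene connectives just described.

```python
def neg(p):        return {'t': 'f', 'f': 't', 'u': 'u'}[p]
def conj(p, q):    return 'f' if 'f' in (p, q) else ('t' if (p, q) == ('t', 't') else 'u')
def disj(p, q):    return 't' if 't' in (p, q) else ('f' if (p, q) == ('f', 'f') else 'u')
def implies(p, q): return disj(neg(p), q)          # Kleene's p → q, i.e. ¬p ∨ q

print(implies('f', 'u'))   # t : forced true because p is f
print(implies('u', 'u'))   # u : p → p is not a tautology in this logic
```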
This logic has been used in connection with logic programming by My-
croft [1983] and Lassez and Maher [1985]. Recent work of Fitting [1985]
and Kunen [1987; 1989] provides an explanation of the incompleteness of
negation as failure for 2-valued models of comp(P). It turns out that nega-
tion as failure is also sound for comp(P) in 3-valued logic, so it can only
derive those consequences that are true not only in all 2-valued models but
also in all 3-valued models.
Use of the third truth value avoids the usual difficulty with a non-Horn
clause program P that the associated operator Tp which corresponds to
one application of ground instances of the clauses regarded as rules, is no
longer monotonic. To avoid asymmetric associations of T with true, let
us call the corresponding operator for 3-valued logic Φ_P. This operates on
pairs (T0, F0) of disjoint subsets of B_P, the Herbrand base of P, to produce
a new pair (T1, F1) = Φ_P(T0, F0), the idea being that if the elements of
T0 are known to be true and the elements of F0 are known to be false,
then one application of the rules of P to ground instances shows that the
elements of T1 are true and the elements of F1 are false.
Formally, for a ground atom A, we put A in T1 iff some ground instance
of a clause of P has head A and a body made true by (T0, F0), and we put
A in F1 iff all ground instances of clauses of P with head A have body made
false by (T0, F0). In particular if A does not match the head of any clause
of P it is put into F1, which is in accordance with the default reasoning
behind negation as failure.
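For a finite ground normal program one application of Φ_P can be written down directly from this description; the following is a minimal sketch under that assumption (a clause is a (head, positive body, negative body) triple and an interpretation is a pair of disjoint sets of atoms).

```python
def phi(program, base, T0, F0):
    """One application of the 3-valued operator: returns (T1, F1)."""
    def body_true(pos, neg):
        return all(a in T0 for a in pos) and all(a in F0 for a in neg)
    def body_false(pos, neg):
        return any(a in F0 for a in pos) or any(a in T0 for a in neg)
    T1 = {h for (h, pos, neg) in program if body_true(pos, neg)}
    F1 = {a for a in base
          if all(body_false(pos, neg) for (h, pos, neg) in program if h == a)}
    return T1, F1

# p ← ¬p, with q in the Herbrand base but not the head of any clause:
# q is immediately false, while p stays undefined.
P = [('p', [], ['p'])]
print(phi(P, base={'p', 'q'}, T0=set(), F0=set()))   # (set(), {'q'})
```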
A little care is needed in defining the notion of a 3-valued model and the
notion of comp(P). A 3-valued model of a set of sentences S is a non-empty
domain together with interpretations of the various function symbols in the
same way as in the 2-valued case. The equality relation '=' is interpreted
as identity (hence is 2-valued), the sentences in S must all evaluate to t,
and so must what Kunen calls CET or 'Clark's equational theory', i.e. the
equality and freeness axioms listed above as part of comp(P). If we were
now to write the rest of comp(P), the completed definitions of predicates,
in the form
∀x1 ... xn (p(x1, ..., xn) ↔ E1 ∨ ... ∨ Ek),
using the Kleene truth table for ↔, then we should be committing ourselves
to 2-valued models, for p ↔ p is not t but u when p is u. Kunen therefore
replaces this ↔ with Kleene's weak equivalence ≡, which gives p ≡ q the
value t if p, q have the same truth value, f otherwise. (Note: our notation
here agrees with Fitting but not with Kunen, who uses ↔ instead of our ≡
and = instead of our ↔.) This saves comp(P) from the inconsistency it can
have under 2-valued logic. For example if P is p ← ¬p then the 2-valued
comp(P) is p ↔ ¬p, which is inconsistent; what Kunen uses is p ≡ ¬p,
which has a model with p having the value u. Having done this, if we want
comp(P) ⊨_3 P to hold, i.e. 3-valued models of comp(P) also to be models
of P, then we must replace the ← in the clauses of P by ⊃, where p ⊃ q is
the 2-valued 'if p is true then q is true', which has the value t except when
p is t and q is f or u, when it has the value f. (Actually this would still
be true with the slightly stronger 2-valued connective which also requires
'and if q is f then p is f'.) comp(P) ⊨_3 P does not hold if → is used in P
because the program p ← p has completion p ≡ p, which has a model where
p is u, which gives p → p the value u not t.
We may identify a pair (T, F) of disjoint subsets of the Herbrand base
with the three valued structure which gives all elements of T the value t,
all elements of F the value f, and all other elements of the Herbrand base
the value u. The pairs (T, F) of disjoint subsets of B_P form a complete
lattice with the natural ordering ⊆.
We define Φ_P ↑ α as we defined T_P ↑ α, i.e. Φ_P ↑ 0 = (∅, ∅), Φ_P ↑
(α + 1) = Φ_P(Φ_P ↑ α), Φ_P ↑ α = ⋃_{β<α} Φ_P ↑ β for α a limit ordinal. We
now have the analogue for Φ_P of the 2-valued properties of T_P.
1. Φ_P is monotonic.
2. Φ_P has a least fixed point given by Φ_P ↑ α for some ordinal α.
3. If T, F are disjoint then (T, F) is a 3-valued Herbrand model of
comp(P) iff it is a fixed point of Φ_P.
In addition:
4. comp(P) is always consistent in 3-valued logic.
For a proof see [Fitting, 1985]. The monotonicity is obvious and (2)
is a well known consequence of that. (3) is also easily proved, as in the
2-valued case, and (4) follows from (2) and (3).
However, the operator Φ_P is not in general continuous and so the closure
ordinal α, i.e. the α such that Φ_P ↑ α is the least fixed point, may be greater
than ω. For example if P is:
then it is easy to check that the closure ordinal is ω + 1. Fitting shows that
the closure ordinal can be as high as Church-Kleene ω_1, the first non-recur-
sive ordinal. Both Fitting and Kunen show also that a semantics based on
this least fixed point as the sole model suffers from the same disadvantage
as the closed world assumption, namely that the set of sentences, indeed
even the set of ground atoms, that are true in this model may be non-
recursively enumerable (as high as Π¹₁-complete in fact). The same is true
of a semantics based on all 3-valued Herbrand models, for the programs in
the examples constructed by Fitting and Kunen can be taken to have only
one 3-valued Herbrand model.
Kunen proposes a very interesting and natural way of avoiding this non-
computable semantics: Why should we consider only Herbrand models;
why not consider all 3-valued models? In other words, ask whether the
query Q is true in all 3-valued models of comp(P). He shows that an
equivalent way of obtaining the same semantics is by using the operator
Φ_P but chopping it off at ω:
A sentence φ has value t in all 3-valued models of comp(P) iff
it has value t in Φ_P ↑ n for some finite n.
There are three peculiar features of this result. First, in determining
the truth value of φ in Φ_P ↑ n, quantifiers are interpreted as ranging
over the Herbrand universe, yet we get equivalence to truth in all 3-valued
models. This may be partly explained by the fact that the truth of this
result depends on Φ_P and comp(P) being formed using a language with
infinitely many function symbols. (Where constants are treated as 0-ary
functions. Actually Kunen uses a language with infinitely many function
symbols of all arities, but the weaker condition above is sufficient.) For
example if P is simply p(a), then the usual language L_P associated with
P has just one constant a, the Herbrand base is {p(a)}, and if Φ_P is
evaluated wrt this, then ∀x p(x) has value t in Φ_P ↑ 1. But ∀x p(x) is
not true in all 3-valued (or even all 2-valued) models of comp(P), and it
does not have value t in any Φ_P ↑ n if Φ_P is evaluated wrt a language
with infinitely many function symbols, because if b is a new constant (or
the result f(a, ..., a) of applying a new function symbol to a) then p(b)
has value f in Φ_P ↑ 1. Second, truth in some Φ_P ↑ n is not the same
as truth in Φ_P ↑ ω, which is not usually a model of comp(P). Third,
the result holds only for sentences built up from the Kleene connectives
∧, ∨, ¬, ←, →, ∀, ∃, which have the property that if they have the value
t or f and one of their arguments changes from u to t or f then their
value doesn't change. The weak equivalence ≡ used in the sentences giving
the completed definitions of the predicates in comp(P) does not have this
property, so the sentences of comp(P)—which obviously have value t in all
3-valued models of comp(P)—may not evaluate to t in any Φ_P ↑ n. For
example if P is
This has value f in each Φ_P ↑ n for finite n, because for y = s^n(0) the
left hand side is u but the right hand side is t.
If Φ_P and comp(P) are formed using a language L with only finitely
many function symbols then [Shepherdson, 1988b] the corresponding result
is
A sentence φ has value t in all 3-valued models of comp(P) ∪
{DCA} iff it has value t in Φ_P ↑ n for some finite n.
Here DCA is the domain closure axiom for L,
∀x ⋁_{f∈L} ∃y1 ... ∃y_rf (x = f(y1, ..., y_rf))
(where rf is the arity of f), which states that every element is the value of
a function in L (so it is satisfied in Herbrand models formed using L).
It is worth noting here the way in which comp(P) depends on the lan-
guage L. As mentioned above this is due to comp(P) containing the freeness
axioms for L. Let us denote by comp_L(P) the completion of P formed us-
ing the language L, by G^k_L the sentence expressing the fact that there
are greater than or equal to k distinct elements which are not values of
a function in L, and by G_L the set of sentences expressing the fact that
there are infinitely many such elements. Let L(P) denote the language of
symbols occurring in the program. Then [Shepherdson, 1988a]:
Let L2 ⊇ L1 ⊇ L(P). Then comp_L2(P) is a conservative extension of
1. comp_L1(P) if L1 has infinitely many function symbols,
2. comp_L1(P) ∪ G_L1 if L1 has finitely many function symbols and L2 −
L1 contains a function symbol of positive arity or infinitely many
constants,
3. comp_L1(P) ∪ {G^k_L1} if L1 has finitely many function symbols and
L2 − L1 contains no function symbol of positive arity but exactly k
constants.
These results are true in both 2- and 3-valued logic since only the equal-
ity predicate is involved, and this is always taken to be 2-valued.
Kunen shows that whether φ has value t in Φ_P ↑ n is decidable, so
when a language with infinitely many function symbols is used:
The set of Q such that comp(P) ⊨_3 Q is recursively enumerable.
An alternative proof could be obtained by giving a complete and con-
sistent deductive system for 3-valued logic, as Ebbinghaus [1969] has done
for a very similar system of 3-valued logic. This alternative proof shows
that the result holds for any language. For definite Horn clause programs
3-valued logic gives results which are in good agreement with those of 2-
valued logic.
If P is a definite Horn clause program then
call-consistent above.
Following Apt, Blair and Walker he defined a program to be strict iff
there are no p, q such that p ≥⁺ q and p ≥⁻ q. Finally if Q is a query, let
us define Q ≥⁺ p iff either q ≥⁺ p for some q occurring positively in Q, or
q ≥⁻ p for some q occurring negatively in Q, and define Q ≥⁻ p similarly
with the signs interchanged. Then a program P is said to be strict with
respect to Q iff for no predicate letter p do we have Q ≥⁺ p
and Q ≥⁻ p. [This condition excludes queries Q such as p, ¬p; here ¬Q is
a tautology so is a consequence of any comp(P) but Q does not fail unless
p succeeds or fails.]
Kunen [1987] proved:
If P is a call-consistent program which is strict wrt the query
Q then
comp(P) ⊨_2 ∀Q implies comp(P) ⊨_3 ∀Q,
comp(P) ⊨_2 ¬∃Q implies comp(P) ⊨_3 ¬∃Q.
[Actually the second part is not explicitly stated but follows by a similar
argument]. Combining this with the 3-valued completeness results given in
Section 2.6 above gives:
If P is an allowed, call-consistent normal program, Q is an
allowed normal query, and P is strict with respect to Q, then
SLDNF-resolution is complete with respect to comp(P) in 2-
valued logic for the query Q, i.e.
if comp(P) ⊨ Qθ then Q succeeds with answer θ,
if comp(P) ⊨ ¬∃Q then Q fails.
This does not include the result of Clark, that if call-consistent here is
strengthened to hierarchical then the condition that P is strict with respect
to Q is not needed. Kunen gives a version that includes this as follows. Let
S be a set of predicate symbols of P which is downward closed, i.e.
if the predicate symbol in the head of a clause of P belongs to S then so do
all predicate symbols in the body of the clause. Suppose that P restricted
to S is hierarchical. Then the condition that P is strict with respect to
Q can be weakened to: for no predicate letter p outside of S do we have
Q ≥⁺ p and Q ≥⁻ p.
These results extend to general programs (Section 2.4) where arbitrary
first order formulae are allowed in the bodies of program clauses, if the notions
of allowed, call-consistent and strict are defined in the appropriate way; see
[Cavedon, 1988] for details.
There may be more than one model satisfying these conditions, e.g. if
P is the stratified program p ← p, q ← ¬p there are two such models {p}
and {q}. They show that there is one such model Mp which is defined in
a natural way and propose that it be taken as defining the semantics for
the program P, i.e., that an ideal query evaluation procedure should make
a query Q succeed if Q is true in Mp and fail if Q is false in Mp. They
give two equivalent ways of defining Mp. A stratified program P can be
partitioned
Notice that like comp(P), Mp depends not only on the logical content
of P but on the way it is written, for p ← ¬q and q ← ¬p give different
Mp.
Since SLDNF-resolution is sound for comp(P) and Mp is a model of
comp(P), it is certainly sound for Mp but, since more sentences will be
true in Mp than in all models of comp(P), SLDNF-resolution will be even
more incomplete for Mp than for comp(P). For example, with the program
above, where q is true in Mp but not in all models of comp(P), there is no
chance of proving q by SLDNF-resolution.
In general there may be no sound and complete computational proof
procedure for the semantics based on Mp. This is shown by the example
in Section 2.3 which shows this for CWA(P), since for definite programs Mp is the
least Herbrand model, so the semantics based on Mp coincides with that
based on CWA(P). However Apt, Blair and Walker do give an interpreter
which is sound, and which is complete when there are no function symbols.
Przymusinski [1988b; 1988a] proposes an even more restricted class of
models than the minimal, supported models, namely the class of perfect
models. The argument for a semantics based on this class is that if one
writes p V q, then one intends p, q to be treated equally; but, if one writes
p ← ¬q there is a presupposition that in the absence of contrary evidence
q is false and hence p is true. He allows 'disjunctive databases,' i.e., clauses
with more than one atom in the head, e.g.
C1 ∨ ... ∨ Cp ← A1, ..., Am, ¬B1, ..., ¬Bn,
and his basic notion of priority is that the Cs here should have lower pri-
ority than the Bs and no higher priority than the As. To obtain greater
generality, he defines this notion for ground atoms rather than predicates,
i.e., if the above clause is a ground instance of a program clause he says
that Ci < Bj and Ci ≤ Ak. Taking the transitive closure of these relations
establishes a relation on the ground atoms that is transitive (but may not
be asymmetric and irreflexive if the program is not stratified).
His basic philosophy is
... if we have a model M of DB and if another model N is obtained
by possibly adding some ground atoms to M and removing some
other ground atoms from M, then we should consider the new
model N to be preferable to M only if the addition of a lower
priority atom A to N is justified by the simultaneous removal
from M of a higher priority atom B (i.e. such that B > A).
This reflects the general principle that we are willing to min-
imize higher priority predicates, even at the cost of enlarging
predicates of lower priority, in an attempt to minimize high
priority predicates as much as possible. A model M will be
considered perfect if there are no models preferable to it. More
formally:
[Definition 2.] Suppose that M and N are two different models
of a disjunctive database DB. We say that N is preferable to
M (briefly, N < M) if for every ground atom A in N — M there
is a ground atom B in M - N, such that B > A. We say that
a model M of DB is perfect if there are no models preferable
to M.
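Definition 2 translates directly into a check; in the sketch below the priority relation is supplied as a map from each ground atom to the set of atoms of strictly higher priority (the map and the atoms are illustrative assumptions).

```python
def preferable(N, M, higher):
    """Is model N preferable to model M?  higher[a] = set of atoms B with B > a."""
    if N == M:
        return False
    return all(higher.get(a, set()) & (M - N) for a in N - M)

# For the single clause  p ← ¬q  the priority relation gives q > p,
# so {p} is preferable to {q} (and {p} is the perfect model).
higher = {'p': {'q'}}
print(preferable({'p'}, {'q'}, higher))   # True
print(preferable({'q'}, {'p'}, higher))   # False
```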
He extends the notion of stratifiability to disjunctive databases by re-
quiring that in a clause
C1 ∨ ... ∨ Cp ← A1, ..., Am, ¬B1, ..., ¬Bn
the predicates in C1, ..., Cp should all be of the same level i greater than
that of the predicates in B1, ..., Bn and greater than or equal to those of
the predicates in A1, ..., Am. He then weakens this to local stratifiability by
applying it to ground atoms and instances of program clauses instead of to
predicates and program clauses. (The number of levels is then allowed to
be infinite.) It is equivalent to the nonexistence of increasing sequences in
the above relation < between ground atoms.
He proves:
Every locally stratified disjunctive database has a perfect model.
Moreover every stratified logic program P (i.e. where the head
of each clause is a single atom) has exactly one perfect model,
and it coincides with the model Mp of Apt, Blair, and Walker.
He also shows that every perfect model is minimal and supported, if the
program is positive disjunctive, then a model is perfect iff it is minimal,
and a model is perfect if there are no minimal models preferable to it.
(Positive disjunctive means the clauses are of the form C1 ∨ ... ∨ Cp ←
A1 ∧ ... ∧ Am.) He also establishes a relation between perfect models and
the concept of prioritized circumscription introduced by McCarthy [1984]
and further developed by Lifschitz [1985].
Van Gelder et al. [1988], building on an idea of Ross and Topor [1987]
defined a semantics based on well-founded models. These are Herbrand
models which are supported in a stronger sense than that defined above.
It is explained roughly by the following example. Suppose that p ← q and
q ← p are the only clauses in the program with p or q in the head. Then p
needs q to support it and q needs p to support it, so the set {p, q} gets no
external support and in a well-founded model all its members will be taken
to be false. In order to deal with all programs they worked with partial
interpretations and models. A partial interpretation I of a program P is
a set of literals which is consistent, i.e. does not contain both p and ¬p
for any ground atom p (element of the Herbrand base Bp). If p belongs
to I then p is true in I, if ¬p belongs to I then p is false in I, otherwise
p is undefined in I. It is called a total interpretation if it contains either
p or ¬p for each ground atom p. A total interpretation I is a total model
of P if every instantiated clause of P is satisfied in I. A partial model is
a partial interpretation that can be extended to a total model. A subset
A of the Herbrand base Bp is an unfounded set of P with respect to the
partial interpretation I if each atom p ∈ A satisfies the following condition:
For each instantiated clause C of P whose head is p, at least one of the
following holds:
1. Some literal in the body of C is false in I.
2. Some positive literal in the body of C is in A.
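Conditions (1) and (2) translate directly into an executable check. The following Prolog sketch is ours, not from the chapter: a ground program is a list of rule(Head, Body) terms with body literals written pos(P) and neg(P), the partial interpretation I is a list of such literals, and the predicate verifies that a set A of ground atoms is unfounded with respect to I.

    % unfounded_set(+A, +Rules, +I): every atom in A satisfies (1) or (2)
    % for each of its rules, so A is an unfounded set of Rules w.r.t. I.
    unfounded_set(A, Rules, I) :-
        forall(member(P, A),
               forall(member(rule(P, Body), Rules),
                      (   member(L, Body), false_in(L, I)      % condition (1)
                      ;   member(pos(Q), Body), member(Q, A)   % condition (2)
                      ))).

    false_in(pos(P), I) :- member(neg(P), I).
    false_in(neg(P), I) :- member(pos(P), I).

With Rules = [rule(p, [pos(q)]), rule(q, [pos(p)])] and I = [], the call unfounded_set([p, q], Rules, []) succeeds, reflecting the example above in which {p, q} receives no external support.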
The well-founded semantics uses conditions (1) and (2) to draw negative
conclusions. Essentially it simultaneously infers all atoms in A to be false,
on the grounds that there is no one atom in A that can be first established
as true by the clauses of P, starting from 'knowing' I, so that if we choose
to infer that all atoms in A are false there is no way we would later have
to infer one as true. The usual notion of supported uses (1) only. The
closed sets of Ross and Topor [1987] use (2) only. It is easily shown that
the union of all unfounded sets with respect to I is an unfounded set, the
greatest unfounded set of P with respect to I, denoted by Up(I). Now for
each partial interpretation I an extended partial interpretation Wp(I) is
obtained by adding to I all those positive literals p such that there is an
instantiated clause of P whose body is true in I (this part is like the familiar
Tp operator) and all those negative literals ¬p such that p ∈ Up(I). It is
routine to show that Wp is monotonic and so has a least fixed point, reached
after iteration to some countable ordinal. This is denoted by Mwp(P) and
called the well-founded partial model of P. The well-founded semantics of P
is based on Mwp(P). In general Mwp(P) will be a partial model, giving rise
to a 3-valued semantics. Using the 3-valued logic of Section 2.6, Mwp(P)
is a 3-valued model of comp(P), but in general it is not the same as the
Fitting model defined in Section 2.6 as the least fixed point of Fitting's
operator. Since the Fitting model is the least 3-valued model of comp(P)
it is a subset of Mwp(P).
which leads to no gaps and no glut, and a felicitous model is one where
the hypothesis that the false statements are precisely those which are false
in the model, is a happy hypothesis. Fine shows that the restriction to
felicitous models can be viewed as a kind of self-referential closed world
assumption.
To sum up. Plausible arguments have been given for each of the seman-
tics discussed in this section. The minimal models of comp(P) of Apt,
Blair and Walker, the perfect models of Przymusinski, the well-founded
models of van Gelder, Ross and Schlipf, and the stable models of Gelfond
and Lifschitz are all models of comp(P), so SLDNF-resolution is sound for
them. So they all offer plausible semantics for negation as failure, in general
different from that based on comp(P) because they are based on a subset
of the models of comp(P). The fact that for the important class of locally
stratified programs they all coincide, giving a unique model Mp, which
often appears to be 'the' natural model, adds support to their claim to be
chosen as the intended semantics. However, SLDNF-resolution will be even
more incomplete for them than it is for the semantics based on comp(P)
and, as noted above, there is, even for the class of definite programs, demon-
strably no way of extending SLDNF-resolution to give a computable proof
procedure which is both sound and complete for them.
where ∃ quantifies the variables on the right hand side other than x1, ..., xn.
If there are a finite number of answers to the query Q the procedure can
be applied; for example, if ?q(x) has the answer x = a, then ?¬q(x) is given
the answer x ≠ a and ?p the answer ∃x(x ≠ a).
If there are infinitely many answers to Q this procedure is not applica-
ble, for example if P is
then the only computed answer to ?p(x) is x = a, but the truth of p(b)
is not determined by comp(P), so p(x) ↔ x = a is not a consequence of
comp(P).
When the procedure does work it returns as answer to a query Q an
equality formula E (i.e. a formula with = as the only predicate symbol)
such that
where each ti, si is either a non-variable term or one of the free variables
of this formula distinct from xi, x'i respectively, where the ∀ in ∀(x'i ≠ si)
universally quantifies some (perhaps none) of the variables in si, where the
y1, ..., ys are distinct from x1, ..., xm, x'1, ..., x'm, and each yi occurs in
at least one of the terms t1, ..., tm. Chan does not describe his normal
form explicitly but from his examples and his reduction algorithm it would
appear that he achieves the further restriction, corresponding to the use of
idempotent mgus, that the occurrence of xi on the left hand side of xi = ti
is the only occurrence of xi in the formula. But in order to achieve this
he has to admit quantified inequations of the form ∀(yj ≠ rj) as well.
The reason why the assumption that L is infinite simplifies the normal
forms for equality formulas is that CETL is then a complete theory. Normal
forms for the case of finite L have been given in Shepherdson [1988a]. The
only difference is that closed formulae of the form GkL, ¬GkL as defined in
Section 2.6 may occur in the normal form. The formula GkL expresses
the fact that there are at least k distinct elements which are not values of a
function in L. So if one is working with models for which the number
of such elements is known, these can be replaced by true or false and
the same normal forms as above are achievable, although the reduction
algorithm will be more complicated than those of Chan and Przymusinski.
One special case of interest is where one only wishes to consider models
satisfying the domain closure axiom, DCA, of Section 2.6, which includes all
the usual Herbrand models. Then all GkL would be replaced by false. Other
normal forms for equality formulae have been given by Malcev [1971] and
Maher [1988]. The normal forms of Chan and Przymusinski are probably
the most intelligible and easily obtainable ones, but Maher's is also fairly
simple—just a Boolean combination of basic formulae. These are of the
form
    ∃y1 ... ∃ys (x1 = t1 ∧ ... ∧ xm = tm)
where y1, ..., ys, x1, ..., xm are distinct variables and x1, ..., xm do not oc-
cur in t1, ..., tm; so they are the answer formulae corresponding to the usual
answer substitutions given by SLD- or SLDNF-resolution using idempotent
mgus.
Chan's constructive negation is a procedure to translate a given query
Q into an equality formula E which is equivalent to it with respect to the
semantics comp(P) i.e. such that
comp(P) ⊢ ∀(Q ↔ E).
As we have seen above it does not always succeed. Clearly it can only
succeed when Q is equality definable with respect to comp(P) i.e. when
there is such an equivalent equality formula. Let us say P is equality defin-
able with respect to comp(P) when all first order formulae Q are equality
definable with respect to comp(P). Przymusinski [l989a] considers equality
definability with respect to different semantics. Since all perfect models of
P are models of comp(P), and all models of comp(P) are models of P it is
clear that:
If Q is equality definable with respect to P then it is equality
definable with respect to comp(P); if it is equality definable with
respect to comp(P) then it is equality definable with respect to
the perfect model semantics.
He shows that a program P is equality definable with respect to the
perfect model semantics if it has no function symbols or more generally
if it has the bounded term property defined by Van Gelder [Van Gelder,
1989]. The analogous result is not true for the semantics based on P or on
comp(P). Indeed Shepherdson [1988a] showed that P is equality definable
with respect to comp(P) iff there is a 2-level hierarchic program which is
where A is an atom and α = mgu(A, A'). In all these rules A' ← Z is, as
usual, supposed to be a variant of the program clause, standardised apart
so as to have no variables in common with A, X. There is also a rule
allowing you to pass from 'succeeds with answer θ' to 'succeeds', and rules
for negation as failure, which carry the side condition that A is ground.
If we take SLDNF-resolution to be defined in the familiar way in terms
of trees, as in Lloyd [1987], then the statement that Kunen's definition given
in Section 1.3 above is equivalent to it amounts to the statement:
A goal X succeeds under SLDNF-resolution from P iff (X) is
derivable in this calculus; it succeeds with answer substitution
θ iff (X; θ) is derivable in the calculus; it fails iff ¬(X) is
derivable in the calculus.
The proof is routine, by induction on the length of the derivation for
the 'if' halves, and by induction on the number of nodes in the success
or failure tree for the 'only if' halves. For a survey of these calculi see
Shepherdson [1992a], which also gives the obvious extension of this calculus
to deal with the extended negation as failure rules described in Section 1.6
above:
if A fails then ¬A succeeds with answer ε;
if A succeeds with answer ε then ¬A fails.
References
[Andreka and Nemeti, 1978] H. Andreka and I. Nemeti. The generalised
completeness of Horn predicate logic as a programming language. Acta
Cybernetica, 4:3-10, 1978.
[Apt, 1990] K. R. Apt. Introduction to logic programming. In J. van
Leeuwen, editor, Handbook of Theoretical Computer Science, Vol B,
Chapter 10. Elsevier Science, North Holland, Amsterdam, 1990.
[Apt and Bezem, 1991] K. R. Apt and M. Bezem. Acyclic programs. New
Generation Computing, 9:335-363, 1991.
[Apt and Blair, 1988] K. R. Apt and H. A. Blair. Arithmetic classification
of perfect models of stratified programs. Technical Report TR-88-09,
University of Texas at Austin, 1988.
[Apt and Bol, 1994] K. R. Apt and R. Bol. Logic programming and negation.
Journal of Logic Programming, 1994.
[Apt and Pugin, 1987] K. R. Apt and J-M. Pugin. Management of strat-
ified databases. Technical Report TR-87-41, University of Texas at
Austin, 1987.
[Apt and van Emden, 1982] K. R. Apt and M. H. van Emden. Contribu-
tions to the theory of logic programming. Journal of the ACM, 29:841-
863, 1982.
[Apt et al., 1988] K. R. Apt, H. A. Blair, and A. Walker. Towards a theory
of declarative knowledge. In J. Minker, editor, Foundations of Deductive
Databases and Logic Programming, pp. 89-148. Morgan Kaufmann, Los
Altos, CA, 1988.
[Barbuti and Martelli, 1986] R. Barbuti and M. Martelli. Completeness of
SLDNF-resolution for structured programs. Preprint, 1986.
[Blair, 1982] H. A. Blair. The recursion theoretic complexity of the se-
mantics of predicate logic as a programming language. Information and
Control, 54, 24-47, 1982.
[Borger, 1987] E. Borger. Unsolvable decision problems for Prolog pro-
grams. In E. Borger, editor, Computer Theory and Logic. Lecture Notes
in Computer Science, Springer-Verlag, 1987.
[Brass and Lipeck, 1989] S. Brass and U. W. Lipeck. Specifying closed
world assumptions for logic databases. In Proc. Second Symposium on
Mathematical Fundamentals of Database Systems (MFDBS89), 1989.
[Jaffar et al., 1984] J. Jaffar, J. L. Lassez, and M. J. Maher. A theory of
complete logic programs with equality. Journal of Logic Programming,
1, 211-223, 1984.
[Jaffar et al., 1986a] J. Jaffar, J. L. Lassez, and M. J. Maher. Comments
on general failure of logic programs. Journal of Logic Programming, 3,
115-118, 1986.
[Jaffar et al., 1986b] J. Jaffar, J. L. Lassez, and M. J. Maher. Some issues
and trends in the semantics of logic programs. In Proceedings Third
International Conference on Logic Programming, pp. 223-241. Springer,
1986.
[Jager, 1988] G. Jager. Annotations on the consistency of the closed world
assumption. Technical report, Computer Science Dept., Technische
Hochschule, Zürich, 1988.
[Jager, 1993] G. Jager. Some proof-theoretic aspects of logic programming.
In F. L. Bauer, W. Brauer, and H. Schwichtenberg, editors, Logic and
Algebra of Specification, pp. 113-142. Springer-Verlag, 1993. Proceed-
ings of the NATO Advanced Studies Institute on Logic and Algebra of
Specification, Marktoberdorf, Germany, 1991.
[Kleene, 1952] S. C. Kleene. Introduction to Metamathematics. van Nos-
trand, New York, 1952.
[Kowalski, 1979] R. A. Kowalski. Logic for Problem Solving. North Hol-
land, New York, 1979.
[Kunen, 1987] K. Kunen. Negation in logic programming. Journal of Logic
Programming, 4, 289-308, 1987.
[Kunen, 1989] K. Kunen. Signed data dependencies in logic programs.
Journal of Logic Programming, 7, 231-245, 1989.
[Lassez and Maher, 1985] J. L. Lassez and M. J. Maher. Optimal fixed-
points of logic programs. Theoretical Computer Science, 15-25, 1985.
[Lewis, 1978] H. Lewis. Renaming a set of clauses as a Horn set. Journal
of the ACM, 25, 134-135, 1978.
[Lifschitz, 1985] V. Lifschitz. Computing circumscription. In Proceedings
IJCAI-85, pp. 121-127, 1985.
[Lifschitz, 1988] V. Lifschitz. On the declarative semantics of logic pro-
grams with negation. In J. Minker, editor, Foundations of Deductive
Databases and Logic Programming, pp. 177-192. Morgan Kaufmann, Los
Altos, CA, 1988.
[Lloyd, 1987] J. W. Lloyd. Foundations of Logic Programming. Springer,
Berlin., second edition, 1987.
[Lloyd and Topor, 1984] J. W. Lloyd and R. W. Topor. Making Prolog
more expressive. Journal of Logic Programming, 1, 225-240, 1984.
[Lloyd and Topor, 1985] J. W. Lloyd and R. W. Topor. A basis for de-
ductive data base systems, II. Journal of Logic Programming, 3, 55-68,
1985.
[Loveland, 1988] D. W. Loveland. Near-Horn Prolog. In J.-L. Lassez, editor,
Proc. ICLP'87. MIT Press, Cambridge, MA, 1988.
[McCarthy, 1984] J. McCarthy. Applications of circumscription to formal-
izing common sense knowledge. In AAAI Workshop on Non-Monotonic
Reasoning, pp. 295-323, 1984.
[Maher, 1988] M. J. Maher. Complete axiomatization of the algebras of fi-
nite, infinite and rational trees. In Proc. of the Third Annual Symposium
on Logic in Computer Science, Edinburgh, pp. 345-357, 1988.
[Mahr and Makowsky, 1983] B. Mahr and J. A. Makowsky. Characteriz-
ing specification languages which admit initial semantics. In Proc. 8th
CAAP, pp. 300-316. Lecture Notes in Computer Science 159, Springer-
Verlag, 1983.
[Makowsky, 1986] J. A. Makowsky. Why Horn formulas matter in com-
puter science: Initial structures and generic examples (extended ab-
stract). Technical Report 329, Technion Haifa, 1986. Also in Mathemat-
ical Foundations of Software Development, Proceedings of the Interna-
tional Joint Conference on Theory and Practice of Software Development
(TAPSOFT) (H. Ehrig et al., Eds.), Lecture Notes in Computer Science
185, pp. 374-387, Springer, 1985. (Revised version May 15, 1986, 1-28,
preprint.) The references in the text are to this most recent version.
[Malcev, 1971] A. Malcev. Axiomatizable classes of locally free algebras of
various types. In The Metamathematics of Algebraic Systems: Collected
Papers, chapter 23, pp. 262-281. North-Holland, Amsterdam, 1971.
[Mancarella et al., 1988] P. Mancarella, S. Martini, and D. Pedreschi.
Complete logic programs with domain closure axiom. Journal of Logic
Programming, 5, 263-276, 1988.
[Meltzer, 1983] B. Meltzer. Theorem-proving for computers: Some results
on resolution and renaming. In J. Siekmann and G. Wrightson, editors,
Automation of Reasoning, pp. 493-495. Springer, Berlin, 1983.
[Minker, 1982] J. Minker. On indefinite data bases and the closed world
assumption. In Proc. 6th Conference on Automated Deduction, pp. 292-
308. Lecture Notes in Computer Science 138, Springer-Verlag, 1982.
[Minker and Perlis, 1985] J. Minker and D. Perlis. Computing protected
circumscription. Journal of Logic Programming, 2, 1-24, 1985.
[Mints, 1986] G. Mints. Complete calculus for pure Prolog (Russian). Proc.
Acad. Sci. Estonian SSR, 35, 367-380, 1986.
[Moore, 1985] R. C. Moore. Semantic considerations on non-monotonic
logic. Artificial Intelligence, 25, 75-94, 1985.
Meta-Programming in Logic Programming
P. M. Hill and J. Gallagher
Contents
1 Introduction 422
1.1 Theoretical foundations 423
1.2 Applications 425
1.3 Efficiency improvements 426
1.4 Preliminaries 427
2 The non-ground representation 429
2.1 The representation 431
2.2 Reflective predicates 434
2.3 Meta-programming in Prolog 439
3 The ground representation 440
3.1 The representation 442
3.2 Reflective predicates 448
3.3 The language Godel and meta-programming . . . . 453
4 Self-applicability 459
4.1 Separated meta-programming 460
4.2 Amalgamated meta-programming 461
4.3 Ambivalent logic 467
5 Dynamic meta-programming 468
5.1 Constructing programs 468
5.2 Updating programs 471
5.3 The three wise men problem 473
5.4 Transforming and specializing programs 478
6 Specialization of meta-programs 481
6.1 Logic program specialization 481
6.2 Specialization and compilation 487
6.3 Self-applicable program specializers 488
6.4 Applications of meta-program specialization . . . . 489
1 Introduction
A meta-program, regardless of the nature of the programming language,
is a program whose data denotes another (object) program. The impor-
tance of meta-programming can be gauged from its large number of ap-
plications. These include compilers, interpreters, program analysers, and
program transformers. Furthermore, if the object program is a logic or
functional program formalizing some knowledge, then the meta-program
may be regarded as a meta-reasoner for reasoning about this knowledge.
In this chapter, the meta-program is assumed to be a logic program. The
object program does not have to be a logic program although much of the
work in this chapter assumes this.
We have identified three major topics for consideration. These are the
theoretical foundations of meta-programming, the suitability of the alterna-
tive meta-programming techniques for different applications, and methods
for improving the efficiency of meta-programs. As with logic programs
generally, meta-programs have declarative and procedural semantics. The
theoretical study of meta-programming shows that both aspects of the
semantics depend crucially on the manner in which object programs are
represented as data in a meta-program. The second theme of the pa-
per is the problem of designing and choosing appropriate ways of spec-
ifying important meta-programming problems, including dynamic meta-
programming and problems involving self-application. The third theme
concerns efficient implementation of meta-programs. Meta-programming
systems require representations with facilities that minimize the overhead
of interpreting the object program. In addition, efficiency can be gained
by transforming the meta-program, specializing it for the particular object
program it is reasoning about. This chapter, which concentrates on these
aspects of meta-programming, is not intended to be a survey of the field.
A more complete survey of meta-programming for logic programming can
be found in [Barklund, 1995].
Many issues in meta-programming have their roots in problems in logic
which have been studied for several decades. This chapter emphasizes
meta-programming solutions. It is not intended to give a full treatment of
the underlying logical problems, though we try to indicate some connections
to wider topics in meta-logic.
The meta-programs in this chapter are logic programs based on first
order logic. An alternative approach, which we do not describe here, is
to extend logic programming with features based on higher-order logic.
Higher-order logic programming is an ongoing subject of research and is
discussed by Miller and Nadathur [1995]. The higher-order logic program-
ming language λProlog and its derivatives have been shown to be useful for
many meta-programming applications, particularly when the object pro-
gram is a functional program [Miller and Nadathur, 1987], [Hannan and
1.2 Applications
We identify two important application requirements in this chapter. One is
for meta-programs that can be applied to (representations of) themselves
and the other is for meta-programs that need to reason about object pro-
grams that can change. We call the first, self-applicable, and the second,
dynamic meta-programming.
There are many programming tools for which self-application is impor-
tant. In particular, interpreters, compilers, program analysers, program
transformers, program debuggers, and program specializers can be usefully
applied to themselves. Self-applicable meta-programming is discussed in
Section 4 and the use of self-applicable program specializers is discussed in
Section 6.
Cat(Tom)
Mouse(Jerry)
Chase(x, y) ← Cat(x) ∧ Mouse(y)
Fig. 1. The Chase program
1.4 Preliminaries
The two principal representations, non-ground and ground, discussed in
this chapter are supported by the programming systems, Prolog and Godel,
respectively. Since Prolog and Godel have different syntax, we have, for
uniformity, adopted a syntax similar to Godel. Thus, for example, vari-
ables begin with a lower-case letter, non-variable symbols are either non-
alphabetic or begin with an upper-case letter. The logical connectives are
standard. Thus ∧, ¬, ←, and ∃ denote conjunction, negation, left implica-
tion and existential quantification, respectively. The exception to the use
of this syntax is where we quote from particular programming systems. In
these cases, we adopt the appropriate notation.
Figures 1 and 2 contain two simple examples of logic programs (in
this syntax) that will be used as object programs to illustrate the meta-
programming concepts in later sections. There are two syntactic forms for
the definition of Member in Figure 2. One uses the standard list notation
[...|...] and the other uses the constant Nil and function Cons/2 to construct
a list. For illustrating the representations it is usually more informative
to use the latter form although the former is simpler. The language of
the program in Figure 1 is assumed to be defined using just the symbols
Cat/1, Mouse/1, Chase/2, Tom, and Jerry occurring in the program. The
language of the program in Figure 2 is assumed to include the non-logical
symbols in the program in Figure 1, the predicate Member/2, the function
Cons/2, the constant Nil, together with the natural numbers.
We summarize here the main logic programming concepts required for
this chapter. Our terminology is based on that of [Lloyd, 1987] and the
reader is referred to this book for more details. A logic program contains
a set of program statements. A program statement which is a formula in
first order logic is written as either
H
or
H ← B
where H is an atom (the head) and B is the body. A goal is written ← B,
denoting the formula ¬B, where B is an arbitrary formula called the body
of the goal. If B is a conjunction of literals (respectively, atoms), then
the goal is normal (respectively, definite). As in Prolog and Godel, an '_'
is used to denote a unique variable existentially quantified at the front of
the atom in which it occurs. It is assumed that all other free variables are
universally quantified at the front of the statement or goal.
The usual procedural meaning given to a definite logic program is SLD-
resolution. However, this is inadequate to deal with more general types of
program statements. In logic programming, a (ground) negative literal -A
in the body of a normal clause or goal is usually implemented by 'negation
as failure':
the goal ¬A succeeds if A fails;
the goal ¬A fails if A succeeds.
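In standard Prolog, negation as failure is written \+; for example (an illustration of ours, using the predicates of Figure 1 in Prolog syntax):

    cat(tom).
    mouse(jerry).

    % not_a_cat(X) succeeds exactly when the goal cat(X) fails.
    not_a_cat(X) :- \+ cat(X).

    % ?- not_a_cat(jerry).   succeeds: cat(jerry) finitely fails
    % ?- not_a_cat(tom).     fails:    cat(tom) succeeds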
Using the program as the theory, negation, as defined in classical logic, can-
not provide a semantics for such a procedure. However, it has been shown
that negation as failure is a satisfactory implementation of logical negation
provided it is assumed that the theory of a program is not just the set of
statements in the program, but is its completion [Clark, 1978]. A clear and
detailed account of negation as failure and a definition of the completion
of a normal program is given in [Shepherdson, 1994]. This definition can
easily be extended to programs with arbitrary program statements. More-
over, it is shown in [Lloyd, 1987] that any program can be transformed
to an equivalent normal program. Thus, in this chapter, we only consider
object programs that are normal and, unless otherwise stated, that the
semantics of a logic program is defined by its completion. The completion
of a program P is denoted by comp(P).
In meta-programming, we are concerned, not only with the actual for-
mulas in an object theory but also the language in which these formulas
are written. This language which we call the object language may either be
inferred from the symbols appearing in the object theory, or be explicitly
declared. When the object theory is a Prolog program, then the language
is inferred. However, when the object program is a Godel program the lan-
guage is declared. These declarations also declare the types of the symbols
so that the semantics of the Godel system is based on many-sorted logic.
The following notation is used in this chapter.
1. The symbols L and M denote languages. Usually L denotes an object
language while M is the language of some meta-program.
2. The notation E_L means that E is an expression in a language L. The
subscript is omitted when the language is either obvious or irrelevant.
3. The representation of an expression E in another language M is
written [E]_M. The actual representation intended by this notation
will depend on the context in which it is used. The subscript is
omitted when the language is either obvious or irrelevant.
4. If L is a language, then the set of representations of expressions of L
in language M is written [L]_M.
We discuss a number of reflective predicates, but the key examples will
realise adaptations of one or both of the following reflective principles. For
all finite sets of sentences A and single sentences B of an object language L,
meta-program.
Therefore, in this section, a non-ground representation is presented
where variables are represented as variables and the naming relation for
the non-logical symbols is summarized as follows.
Object symbol Meta symbol
Constant Constant
Function of arity n Function of arity n
Proposition Constant
Predicate of arity n Function of arity n
The representation of formulas is then defined as follows.
• If Q is of the form ¬R, then [Q] is Not [R].
• If Q is of the form R ∧ S, then [Q] is [R] And [S].
• If Q is of the form R ← S, then [Q] is [R] If [S].
Continuing the above example using this naming relation, the formula
Cat(Tom) ∧ Mouse(Jerry)
is represented by the term
Cat'(Tom') And Mouse'(Jerry').
In this example, the name of an object symbol is distinct from the name
of the symbol that represents it. However, there is no requirement that this
should be the case. Thus, the names of the object symbol and the symbol
that represents it can be the same. For example, with this representation,
the atomic formula Cat(Tom) is represented by the term Cat(Tom). This
is the trivial naming relation, used in Prolog. It does not in itself cause
any amalgamation of the object language and meta-language; it is just a
syntactic convenience for the programmer. We adopt this trivial naming
relation together with the above representation of the connectives for the
rest of this section.
A logic program is a set of normal clauses. It is clearly necessary that
if the meta-program is to reason about the object program, there must be
a way of identifying the clauses of a program. In Prolog, each clause in the
program is represented as a fact (that is, a clause with the empty body).
This has the advantage that the variables in the fact are automatically
standardized apart each time the fact is used. We adopt this representation
here. Thus it is assumed that there is a distinguished constant True and
distinguished predicate Clause/2 in the meta-program defined so that each
clause in the object program of the form
h ← b.
is represented in the meta-program as a fact
Clause([h], [b]).
and each fact
h.
is represented in the meta-program as a fact
Clause([h], True).
Thus, in Figure 1,
Chase(x, y) ← Cat(x) ∧ Mouse(y).
is represented by the fact
Clause(Chase(x, y), Cat(x) And Mouse(y)).
The program in Figure 2 would be represented by the two facts.
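The two facts themselves fall on a page not reproduced here; assuming Figure 2 gives the usual definition of Member/2 in the Cons/Nil form, namely Member(x, Cons(x, y)) and Member(x, Cons(y, z)) ← Member(x, z), they would be:

    Clause(Member(x, Cons(x, y)), True).
    Clause(Member(x, Cons(y, z)), Member(x, z)).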
← Unify(Member(x, Cons(x, y)),
        Member(1, Cons(z, Cons(2, Cons(3, Nil)))))
will succeed with
x = 1, y = Cons(2, Cons(3, Nil)), z = 1.
The ability to use the underlying unification mechanism for both the
object program and its representation makes it easy to define reflective
predicates whose semantics in the meta-program correspond to the seman-
tics of the object program.
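Indeed, with the trivial naming relation the object-level unification can simply be delegated to the meta-level unification. In Prolog the whole definition collapses to a single fact (a sketch of ours, ignoring the occurs check):

    % unify_rep(?X, ?Y): X and Y, terms of the non-ground representation,
    % are unifiable; Prolog's own unification does all the work.
    unify_rep(X, X).

For instance, unify_rep(member(X, cons(X, Y)), member(1, cons(Z, cons(2, cons(3, nil))))) yields X = 1, Z = 1, Y = cons(2, cons(3, nil)), mirroring the example above.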
The meta-interpreter V in Figure 3 assumes that the object program is
a normal logic program and defines the reflective predicate Solve/1. Given
a normal object program P, let Vp denote V together with the facts defining
Clause/2 that represent P.
Solve(True)
Solve(a And b) ←
    Solve(a) ∧
    Solve(b)
Solve(Not a) ←
    ¬Solve(a)
Solve(a) ←
    Clause(a, b) ∧
    Solve(b)
Fig. 3. The Vanilla meta-interpreter V
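In Prolog syntax the same interpreter, together with the representation of the program in Figure 1, can be written as follows (a sketch; solve/1 and clause_/2 are our names, the latter chosen so as not to collide with the built-in clause/2, and negation as failure is written \+):

    % Object program of Figure 1 represented as clause_/2 facts.
    clause_(cat(tom), true).
    clause_(mouse(jerry), true).
    clause_(chase(X, Y), (cat(X), mouse(Y))).

    % The Vanilla interpreter.
    solve(true).
    solve((A, B)) :- solve(A), solve(B).
    solve(\+ A)   :- \+ solve(A).
    solve(A)      :- clause_(A, B), solve(B).

The query ?- solve(chase(X, Y)). then returns X = tom, Y = jerry, exactly as the object-level query ?- chase(X, Y). would; this is the content of Theorem 2.2.2 below.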
Vp is intended to satisfy the reflective principle
    P ⊢_L Q   iff   Vp ⊢_M Solve([Q]).
In addition, the predicates Solve/1 and Clause/2 have each of their argu-
ments of type u. The following result is proved in [Hill and Lloyd, 1989].
Theorem 2.2.1. Let P be a normal program and ← Q a normal goal. Let
Vp be the program defined above. Then the following hold:
1. comp(P) is consistent iff comp(Vp) is consistent.
2. θ is a correct answer for comp(P) ∪ {← Q} iff θ is a correct answer
for comp(Vp) ∪ {← Solve(Q)}.
3. ¬Q is a logical consequence of comp(P) iff ¬Solve(Q) is a logical
consequence of comp(Vp).
Theorem 2.2.2. Let P be a normal program and ← Q a normal goal. Let
Vp be the program defined above. Then the following hold:
1. θ is a computed answer for P ∪ {← Q} iff θ is a computed answer for
Vp ∪ {← Solve(Q)}.
2. P ∪ {← Q} has a finitely failed SLDNF-tree iff Vp ∪ {← Solve(Q)}
has a finitely failed SLDNF-tree.
It is important to note here that, although the declarative semantics
of Vp requires a model which is typed, the procedural semantics for the
program P and for Vp is the same and that, apart from checking that the
given goal is correctly typed, no run-time type checking is involved.
Martens and De Schreye have provided an alternative solution to the
semantics of Vp that does not require the use of types. We now give a
summary of the semantics that they have proposed. Their work requires
perfect Herbrand model of Vp iff p(t1, ..., tn) is not true in the perfect
Herbrand model of P.
The main issue distinguishing the typed and language independent ap-
proaches is the criterion that is used in determining the language of a
program. Either the language of a program is inferred from the symbols it
contains or the language can be defined explicitly by declaring the symbols.
As the language of the program Vp must include a representation of all
the symbols of the object program, it is clear that if we require the lan-
guage of a program to be explicitly declared, the meta-program will need
to distinguish between the terms that represent object terms and those
that represent object formulas. This of course leads naturally to a typed
interpretation. On the other hand, if the language of a program is fixed by
the symbols actually appearing in that program, then we need to ensure
the interpretation is unchanged when the program is extended with new
symbols. This leads us to look at the concept of language independence
which is the basis of the second approach.
The advantage of using a typed interpretation is that no extra condi-
tions are placed on the object program. The usual procedural semantics
for logic programming can be used for both typed and untyped programs.
Thus the type information can be omitted from the program although it
seems desirable that the intended types of the symbols be indicated at least
as a comment to the program code. A possible disadvantage is that we must
use many-sorted logic to explain the semantics of the meta-program instead
of the better known unsorted logic. The alternative of coding the type in-
formation explicitly as part of the program is not desirable since this would
create an unnecessary overhead and adversely affect the efficiency of the
meta-interpreter.
The advantage of the language independence approach is that for def-
inite language independent object programs, the semantics of the V pro-
gram is based on that of unsorted first order logic. However, many common
programs such as the program in Figure 2 are not language independent.
Moreover, as soon as we allow negation in the bodies of clauses, any ad-
vantage is lost. Not only do we still require the language independence
condition for the object program, but we can only compare the weakly
perfect model of the meta-program with the perfect model of the object
program.
We conclude this subsection by presenting in Figure 4 an extended form
of the program V in Figure 3. This program, which is a typical example of
what the non-ground representation can be used for, is adapted from [Ster-
ling and Shapiro, 1986]. The program W defines a predicate PSolve/2. The
first argument of PSolve/2 corresponds to the single argument of Solve but
the second argument must be bound to the program's proof of the first
argument.
PSolve(True, True)
PSolve(x And y, xproof And yproof) ←
    PSolve(x, xproof) ∧
    PSolve(y, yproof)
PSolve(Not x, True) ←
    ¬PSolve(x, _)
PSolve(x, x If yproof) ←
    Clause(x, y) ∧
    PSolve(y, yproof)
Fig. 4. The Proof-Tree meta-interpreter W
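Using the clause_/2 facts and Prolog syntax of the earlier sketch, the proof-tree interpreter can be rendered as follows (again our names, not part of the chapter):

    psolve(true, true).
    psolve((A, B), (ProofA, ProofB)) :-
        psolve(A, ProofA),
        psolve(B, ProofB).
    psolve(\+ A, true) :-
        \+ psolve(A, _).
    psolve(A, (A :- ProofB)) :-
        clause_(A, B),
        psolve(B, ProofB).

The query ?- psolve(chase(X, Y), Proof). binds Proof to a term recording the clause instances used, here (chase(tom, jerry) :- ((cat(tom) :- true), (mouse(jerry) :- true))).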
arg(1, cat(X), X).
arg(1, mouse(X), X).
arg(1, chase(X,Y), X).
arg(2, chase(X,Y), Y).
The predicate =.. is a binary infix predicate which is true when the left
hand argument is a term of the form f or f(t1, ..., tn) and the right hand
argument is the list [f, t1, ..., tn]. This predicate is not really necessary
and can be defined in terms of functor and arg.
    Term =.. [Function|Args] :-
        functor(Term, Function, Arity),
        findargs(Arity, Args, Term).
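The definition of findargs/3 falls on a page not reproduced here; a definition along the following lines (ours) completes the snippet for the decomposition direction, in which Term is already instantiated:

    % findargs(+Arity, -Args, +Term): Args is the list of the Arity
    % arguments of Term, collected with arg/3.
    findargs(Arity, Args, Term) :-
        findargs_(1, Arity, Term, Args).

    findargs_(I, N, _, []) :-
        I > N.
    findargs_(I, N, Term, [A|As]) :-
        I =< N,
        arg(I, Term, A),
        I1 is I + 1,
        findargs_(I1, N, Term, As).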
first order logic given by the Godel numbering has a different number for
a constant when viewed as a symbol to that when it is viewed as a term.
The actual structural levels that may be represented depend on the
tasks the meta-program wishes to perform. In particular, if we wish to
define a reflective predicate that has an intended interpretation of SLD-
resolution, we need to define unification. For this, a representation of the
symbols is required. If we wish to be able to change the object program
using new symbols created dynamically, then a representation of the char-
acters would be needed.
A representation is in fact a coding and, as is required for any cipher, a
representation should be invertible (injective) so that results at the meta-
level can be interpreted at the object-level3. Such an inverse mapping
we call here a re-representation. To ensure that not only the syntax and
structural level of the object element can be recovered but also the kind
of unit at that level (for example, if it is a language element in a logic
program, whether it is a term, atom, or formula) that is represented, the
representation has to include this information as part of the coding.
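By way of illustration, such a coding and its inverse for ground object terms can be written as a pair of Prolog predicates (a sketch; the constructors c/1 and t/2 and the predicate names are ours, and constants are taken to be atomic):

    % represent_term(+Term, -Rep): encode a ground object term, naming a
    % constant a by c(a) and a compound f(t1,...,tn) by t(f, [encoded args]).
    represent_term(Name, c(Name)) :-
        atomic(Name).
    represent_term(Term, t(F, RepArgs)) :-
        compound(Term),
        Term =.. [F|Args],
        maplist(represent_term, Args, RepArgs).

    % re_represent(+Rep, -Term): the inverse mapping, recovering the term.
    re_represent(c(Name), Name).
    re_represent(t(F, RepArgs), Term) :-
        maplist(re_represent, RepArgs, Args),
        Term =.. [F|Args].

Because the coding keeps the kind of each unit explicit (c/1 versus t/2), it is injective: re_represent recovers exactly the term that was encoded, e.g. represent_term(cons(1, nil), R) gives R = t(cons, [c(1), c(nil)]).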
We give, as an example, a simple ground representation for a normal
program. This is used in the next subsection by the Instance-Demo pro-
gram in Figure 5 and the SLD-Demo program in Figure 6. First we de-
fine the naming relation for each symbol in the language and then define
the representation for the terms and formulas (using just the connectives
A, <—, -i) that can be constructed from them. Individual characters are not
represented. For this representation, it is assumed that the language of
the meta-program includes three types: the type s for terms representing
object symbols; the type o for terms representing object terms; and the
type u for terms representing object formulas. It is also assumed that the
language includes the type List(a) for any type a together with the usual
list constructor Cons and constant Nil. The language must also include the
functions Term/2, Atom/2, V/1, C/1, F/1, P/1, And/2, If/2, and Not/1,
whose intended types are as follows.
³ There is, of course, an alternative explanation of why a representation must be
injective: it is the inverse of denotation. Since denotation must be functional (any term
has (at most) one denotation), the inverse of denotation must be injective.
Function    Type
Term        s * List(o) -> o
Atom        s * List(o) -> u
V           Integer -> o
C           Integer -> s
F           Integer -> s
P           Integer -> s
And         u * u -> u
If          u * u -> u
Not         u -> u
Term(F(0), [ Term(C(1), []),
             Term(F(0), [ Term(C(2), []), Term(C(0), []) ]) ]).
As the string representation carries no structural information, this rep-
resentation is computationally very inefficient. Thus, it is desirable for a
representation such as a string as well as one that is more structurally
descriptive to be available. The meta-programming system should then
provide a means of converting from one representation to the other.
Most researchers on meta-programming define a specific representation
suitable for their purposes without explaining why the particular repre-
sentation was chosen. However, Van Harmelen [1992] has considered the
many different ways in which a representation may be defined and how this
may assist in reasoning about the object theory. We conclude this subsec-
tion with interesting examples of non-standard representations based on
examples in his paper.
In the first example, the object theory encodes graphs as facts of the
form Edge(Ni,Nj) together with a rule defining the transitive closure.
Connected(n1, n2) ← Edge(n1, n3) ∧ Connected(n3, n2)
The representation assigns different terms to the object formulas, depend-
ing on their degree of instantiation.
There are two basic styles of meta-interpreter that use the ground rep-
resentation and have been discussed in the literature. The first style is
derived from an idea proposed in [Kowalski, 1990] and has a similar form
to the program V in Figure 3 but uses the ground rather than the non-
ground representation. In this subsection, we present, in Figure 5, an
interpreter I based on this proposal. This style of meta-interpreter and
similar meta-programs are being used in several programs although a com-
plete version and a discussion of its semantics has not previously been
published. In the second style the procedural semantics of the object pro-
gram is intended to be a model of the meta-interpreter. For example, the
meta-interpreter outlined in [Bowen and Kowalski, 1982] is intended to
define SLD-resolution. An extension of this meta-interpreter for SLDNF-
resolution in shown in [Hill and Lloyd, 1989] to be correct with respect to
its intended interpretation. An outline of such a meta-interpreter J is given
in Figure 6. The Godel program SLD-Demo, also based on this style, is
presented in the next subsection in Figure 9.
Both the programs I and J make use of the ground representation de-
fined in the previous subsection. These programs require an additional
type a to be used for the bindings in a substitution. The function Bind/2
is used to construct such a binding. This has domain type Integer * o and
range type a.
The meta-interpreter I in Figure 5 defines the predicate IDemo/3. This
program is intended to satisfy the reflective principles
P ⊢_L Qθ   iff   I ⊢_M IDemo([P], [Q], [Qθ])
comp(P) ⊨_L Qθ   iff   comp(I) ⊨_M IDemo([P], [Q], [Qθ]).
Here, Q is a conjunction of literals and θ a substitution that grounds Q.
These reflective principles, which are adaptations of those given in Sub-
section 1.4, ensure that provability and logical consequence for the object
program are defined in the meta-program.
The types of the predicates in Program I are as follows.
Predicate      Type
IDemo          List(u) * u * u
IDemo1         List(u) * u
InstanceOf     u * u
InstFormula    u * u * List(a) * List(a)
InstTerm       o * o * List(a) * List(a)
InstArgs       List(o) * List(o) * List(a) * List(a)
IDemo(p, x, y) ←
    InstanceOf(x, y) ∧
    IDemo1(p, y)
IDemo1(_, True)
IDemo1(p, And(x, y)) ←
    IDemo1(p, x) ∧
    IDemo1(p, y)
IDemo1(p, Not(x)) ←
    ¬IDemo1(p, x)
IDemo1(p, Atom(q, xs)) ←
    Member(z, p) ∧ InstanceOf(z, If(Atom(q, xs), b)) ∧
    IDemo1(p, b)
InstTerm(V(n), x, [], [Bind(n, x)])
InstTerm(V(n), x, [Bind(n, x)|s], [Bind(n, x)|s])
InstTerm(V(n), x, [Bind(m, y)|s], [Bind(m, y)|s1]) ←
    n ≠ m ∧
    InstTerm(V(n), x, s, s1)
InstTerm(Term(f, xs), Term(f, ys), s, s1) ←
    InstArgs(xs, ys, s, s1)
InstArgs([], [], s, s)
InstArgs([x|xs], [y|ys], s, s2) ←
    InstTerm(x, y, s, s1) ∧
    InstArgs(xs, ys, s1, s2)
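The instantiation check at the heart of Program I can be transcribed into Prolog almost literally (a sketch, with lower-case constructors v/1, term/2 and bind/2 and predicate names of our own):

    % inst_term(+T, +U, +S0, -S): U is the instance of the represented term T
    % determined by extending the binding list S0 to S.
    inst_term(v(N), X, [], [bind(N, X)]).
    inst_term(v(N), X, [bind(N, X)|S], [bind(N, X)|S]).
    inst_term(v(N), X, [bind(M, Y)|S], [bind(M, Y)|S1]) :-
        N \== M,
        inst_term(v(N), X, S, S1).
    inst_term(term(F, Xs), term(F, Ys), S, S1) :-
        inst_args(Xs, Ys, S, S1).

    inst_args([], [], S, S).
    inst_args([X|Xs], [Y|Ys], S, S2) :-
        inst_term(X, Y, S, S1),
        inst_args(Xs, Ys, S1, S2).

For example, inst_term(term(f, [v(0), v(0)]), term(f, [term(a, []), term(a, [])]), [], S) succeeds with S = [bind(0, term(a, []))], while a second argument term(f, [term(a, []), term(b, [])]) is rejected because the two occurrences of v(0) would need different bindings.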
JDemo(p, x, y) ←
    MaxForm(x, n) ∧
    Derivation(p, x, True, [], s, n) ∧
    ApplyToForm(s, x, y)
Derivation(_, x, x, s, s, _)
Derivation(p, x, z, s, t, n) ←
    SelectLit(Atom(q, xs), x) ∧
    Member(If(Atom(q, ys), ls), p) ∧
    Resolve(x, Atom(q, xs), If(Atom(q, ys), ls), s, s1, z1, n, n1) ∧
    Derivation(p, z1, z, s1, t, n1)
Derivation(p, x, z, s, s, _) ←
    SelectLit(Not(a), x) ∧
    ApplyToForm(s, a, a1) ∧
    Ground(a1) ∧
    ¬Derivation(p, a1, True, [], _, 0) ∧
    ReplaceConj(x, Not(a), True, z)
Goal
Demo(
    [ If(Atom(P(0), [V(0), Term(F(0), [V(0), V(1)])]), True),
      If(Atom(P(0), [V(0), Term(F(0), [V(1), V(2)])]),
         Atom(P(0), [V(0), V(2)])) ],
    Atom(P(0), [ V(0),
                 Term(F(0), [ Term(C(1), []),
                              Term(F(0), [Term(C(2), []), Term(C(0), [])]) ]) ]),
    y)
Computed answers
y = Atom(P(0), [ Term(C(1), []),
                 Term(F(0), [ Term(C(1), []),
                              Term(F(0), [Term(C(2), []), Term(C(0), [])]) ]) ])
y = Atom(P(0), [ Term(C(2), []),
                 Term(F(0), [ Term(C(1), []),
                              Term(F(0), [Term(C(2), []), Term(C(0), [])]) ]) ])
The provision is mainly focussed on the case where the object program is
another Godel program, although there are also some basic facilities for
representing and manipulating expressions in any structured language.
One of the first problems encountered when using the ground repre-
sentation is how to obtain a representation of a given object program and
goals for that program. It can be seen from the example in Figure 7,
that a ground representation of even a simple formula can be quite a large
expression. Hence constructing, changing, or even just querying such an
expression can require a large amount of program code. The Godel sys-
tem provides considerable system support for the constructing and query-
ing of terms representing object Godel expressions. By employing an ab-
stract data type approach so that the representation is not made explicit,
Godel allows a user to ignore the details of the representation. Such an
abstract data type approach makes the design and maintenance of the
meta-programming facilities easier. Abstract data types are facilitated in
Godel by its type and module systems. Thus, in order to describe the
meta-programming facilities of Godel, a brief account of these systems is
given.
Each constant, function, predicate, and proposition in a Godel program
must be specified by a language declaration. The type of a variable is
not declared but inferred from its context within a particular program
statement. To illustrate the type system, we give the language declarations
that would be required for the program in Figure 1:
BASE Name.
CONSTANT Tom, Jerry : Name.
PREDICATE Chase : Name * Name;
Cat, Mouse : Name.
Note that the declaration beginning BASE indicates that Name is a base
type. In the statement
Chase(x,y) <- Cat(x) & Mouse(y) .
the variables x and y are inferred to be of type Name.
Polymorphic types can also be defined in Godel. They are constructed
from the base types, type variables called parameters, and type construc-
tors. Each constructor has an arity ≥ 1 attached to it. As an example,
we give the language declarations for the non-logical symbols used in the
(second variant) of the program in Figure 2.
CONSTRUCTOR List/1.
CONSTANT Nil : List(a).
FUNCTION Cons : a * List(a) -> List(a).
PREDICATE Member : a * List(a).
Here List is declared to be a type constructor of arity 1. The type List(a)
is a polymorphic type that can be used generically. Godel provides the usual
syntax for lists so that [] denotes Nil and [x|y] denotes Cons(x, y). Thus,
if 1 and 2 have type Integer, [1, 2] is a term of type List(Integer).
The Godel module system is based on traditional software engineering
ideas. A program is a collection of modules. Each module has at most
two parts, an export part and a local part. The kind and name of each
part are indicated by its first line. The statements can
occur only in the local part. Symbols that are declared or imported into the
export part of the module are available for use in both parts of the module
and other modules that import it. Symbols that are declared or imported
into the local part (but not into the export part) of the module can only
be used in the local part. There are a number of module conditions that
prevent accidental interference between different modules and facilitate the
definition of an abstract data type.
An example of an abstract data type is illustrated in the Godel program
in Figure 8 which consists of two modules, UseADT and ADT. In UseADT, the
MODULE UseADT.
IMPORT ADT.
EXPORT ADT.
BASE H,K.
CONSTANT C,D : H.
PREDICATE P : H * K * K;
Q : H * H.
LOCAL ADT.
CONSTANT E : K.
FUNCTION F : H * K -> K.
P(u,F(u,E),E).
Q(C,D).
type K, which is imported from ADT, is an abstract data type. If the query
<- Q(x,D) & P(x,y,z).
was given to UseADT, then the displayed answer would be:
x = C,
y = <K>,
z = <K>
The system modules include general purpose modules such as Integers,
Lists, and Strings as well as the modules that give explicit support for
meta-programming. Integers provides the integers and the usual arith-
metic operations on the integers. Lists provides the standard list notation
explained earlier in this subsection as well as the usual list processing predi-
cates such as Member and Append. The module Strings makes available the
standard double quote notation for sequences of (ascii) characters. There
is no type for an individual ascii character except as a string of length one.
Godel provides an abstract data type Unit defined in the module Units.
A Unit is intended to represent a term-like data structure. The module
Flocks imports the module Units and provides an abstract data type
Flock which is an ordered collection of terms of type Unit. Since Flocks
and Units do not provide any reflective predicates, they cannot be regarded
as complete meta-programming modules. However, Flocks are useful tools
for the manipulation of any object language whose syntax can be viewed
as a sequence of units. Thus Flocks can be used as the basis of a meta-
program that can choose the object programming system and its semantics.
The four system modules for meta-programming are Syntax, Programs,
Scripts, and Theories. The modules Programs, Scripts, and Theories
support the ground representation of Godel programs, scripts, and theo-
ries, respectively. A script is a special form of a program where the module
structure is collapsed. Since program transformations frequently violate
the module structure, Scripts is mainly intended for meta-programs that
perform program transformations. A theory is assumed to be defined using
an extension of the Godel syntax to allow for arbitrary first order formulas
as the axioms of the theory. The fourth meta-programming module Syntax
is imported by Programs, Scripts, and Theories and facilitates the ma-
nipulation of expressions in the object language. We describe briefly here
the modules Syntax and Programs. The modules Scripts and Theories
are similar to Programs and details of all the meta-programming modules
can be found in [Hill and Lloyd, 1994].
The module Syntax defines abstract data types including Name, Type,
Term, Formula, TypeSubst, TermSubst, and VarTyping which are the types
of the representations of, respectively, the name of a symbol, a type, a term,
a formula, a type substitution, a term substitution, and a variable typing.
(A variable typing is a set of bindings where each binding consists of a
variable together with the type assigned to that variable.)
The module Syntax provides a large number of predicates that support
these abstract data types. Many of these are concerned with the represen-
tation and can be used to identify and construct representations of object
expressions. For example, And is a predicate with three arguments of type
Formula and is true if the third argument is a representation of the con-
junction of the formulas represented in the first two arguments. Variable
has a single argument of type Term and is true if its argument is the rep-
resentation of a variable.
The predicate Derive is an example of a reflective predicate in Syntax.
Derive has the declaration
EXPORT SLDDemo.
IMPORT Programs.
PREDICATE SLDDemo : Program * Formula * Formula.
LOCAL SLDDemo.
MODULE TestSLDDemo.
4 Self-applicability
There are a number of meta-programs that can be applied to (copies of)
themselves. In this section we review the motivation for this form of meta-
programming and discuss the various degrees of self-applicability that can
be achieved by these programs.
The usefulness of self-applicability was demonstrated by Godel in [1931]
where the natural numbers represent the axioms of arithmetic and some
fundamental theorems about logic are derived using the properties of arith-
metic. Perlis and Subrahmanian [1994] give a description of many of
the general issues surrounding self-reference in logic and artificial intel-
ligence together with a comprehensive bibliography. The concept of self-
applicability has been used to construct many well-known logical para-
doxes, one of the most famous being the liar paradox:
This sentence is false.
Here, we are concerned with programming applications of self-applicability
and how particular programming languages support such applications.
These or similar rules are necessary for communication between the object
program and its representation in the amalgamated meta-program.
To facilitate these linking rules, a means of computing the re-represent-
ation of the terms representing the terms of the object language must be
provided. Also, a method for finding the representation of an object term
is required. This reflective requirement, which concerns only the terms of
the object language, may be realized by means of inference rules, functions,
or relations. In each case, we consider how the predicate Demo/2 whose
semantics is given by the above linking rules may be defined for the atomic
formulas. The definition of Demo/2 for the non-atomic formulas is the
same in every case. Thus, for formulas that are conjunctions of literals, the
definition of Demo/2 would include the following clauses.
Demo(p, a ∧' b) ← Demo(p, a) ∧ Demo(p, b)
Demo(p, ¬'a) ← ¬Demo(p, a)
where ∧' represents ∧ and ¬' represents ¬.
If the reflective requirement is realized by means of inference rules, then
these must be built into the programming system. Thus the representation
must also be fixed by the programming system. The inference rule that
determines t from a term [t] must first check that [t] is ground and that
it represents an object level term and secondly, if the object language is
typed, that t is correctly typed in this language. The set of statements of
the form
Demo(ThisProgram, P'([x1], ..., [xn])) ← P(x1, ..., xn)
for each predicate P/n in the object language represented by a function
P'/n in the language of the meta-program will provide a definition of
Demo/2 for the atomic formulas. Note that the xi are universally quanti-
fied variables, quantified over the terms of the object language.
Note that, as the trivial naming relation is built into Prolog, the repre-
sentation is trivially determined by means of an inference rule. However,
there is no check that [t] is ground before applying the inference rule. Re-
flective Prolog [Costantini and Lanzarone, 1989] (see below), is an example
of a language with a non-trivial naming relation, but where the represen-
tation and re-representation are determined by inference rules.
If a re-representation was defined functionally, the meta-program would
require a function such as ReRepresent/1. Thus, for each ground term t in
the object language, the equality theory for the meta-program must satisfy
ReRepresent([t]) = t
As this is part of a logic programming system where the equality theory is
normally fixed by the unification procedure and constraint handling mech-
anisms, the evaluation method for this function would be built-in so that
the representation would again be fixed by the programming system. Using
this function, the definition of Demo/2 for the atomic formulas is given by
a set of statements of the form
Demo(ThisProgram, P'(y1, ..., yn)) ←
    P(ReRepresent(y1), ..., ReRepresent(yn))
for each predicate P/n in the object program.
For logic programming, the most flexible way in which a representation
may be defined is as a relation, say Represent/2, where the first argument
is an object term and the second its representation.
Represent(t, [t]).
Then the definition of Demo/2, for each predicate P/n in the object lan-
guage, would consist of a statement of the form
Demo(ThisProgram, P'(y1, ..., yn)) ←
    Represent(x1, y1) ∧ ... ∧ Represent(xn, yn) ∧ P(x1, ..., xn)
At the next (third) meta-level, not only would the representation of Nil",
Cons"/2, and Member"/2 have to be defined by Represent/2 but also the
representation of the function Demo'/2.
Each meta-level in a program could contain the representations of sev-
eral programs at the previous level. The relationships between the different
meta-levels in a program are sometimes called its meta-level architecture.
The usual architectures in which each meta-level reasons about only one
object program at the next lower level might be said to be "linear".
As the representation has to be explicitly defined using Demo/2 and
Represent/2, there is always a top-most meta-level. This will contain sym-
bols with no representation. Hence, without any 'higher-order' extensions,
logic programming cannot be used for the strong form of amalgamation.
One of the problems of this amalgamation is that the representation
needs to be made explicit. In the previous discussion, a representation of
the symbols is given and then a representation of the terms and formulas
is constructed in the standard way. However, it is often more convenient to
hide the details of the representation from the programmer. Quine [1951]
introduced the 'quasi-quotes' already used in this chapter to indicate a rep-
resentation of some unspecified object expression. This (or similar syntax)
can be used instead of an explicit representation. For example,
[Cons(x, Nil)]
would correspond (using the above representation) to the term
Cons'(V(0), Nil')
where V(0) is the (ground) representation of x.
This syntax is not constructive. That is, there is no direct means of con-
structing larger expressions from their components. Note that, in the Godel
programming system, the use of abstract data types for meta-programming
has a similar problem.
The main reason that terms representing object-level expressions have
to be constructed dynamically is because the structure of components
of these expressions may not be fully specified. The unspecified sub-
expressions are defined by variables that range not over the object terms
but over the representations of arbitrary object expressions. Such variables
are called meta-variables. Thus, instead of a programming system provid-
ing predicates for constructing terms representing object expressions, the
syntax may distinguish between meta-variables and variables ranging over
the object level terms. The partially specified object expression can then
be enclosed in the quasi-quotes but those meta-variables that occur within
their scope must be syntactically identifiable using some escape notation.
For example, if the escape notation is an overline, x̄ indicates that in the
context of [. . .] x is a meta-variable. For example,
[Cons(x̄, Nil)]
with a semantics. Chen et al. [1993] have developed the logic HiLog. This
is intended to give a logical basis for Prolog's ambivalent syntax. However,
although HiLog does not distinguish between functions, predicates, terms,
and atoms, it does not (as in Prolog) allow variables to range over arbitrary
expressions in the language. HiLog is intended to provide a basis for a new
logic programming system similar to Prolog but based on the logic of HiLog.
Richards [1974] has defined an ambivalent logic that allows formulas to be
treated as terms but not vice-versa. Another ambivalent logic is developed
by Jiang [1994]. This employs features from both Richards' logic and Hilog.
In Jiang's logic, there is no syntactic distinction between functions, predi-
cates, terms, and formulas. Moreover, the semantics distinguishes between
substitution (which is purely syntactic) and equality. The main purpose of
this logic is to provide an expressive syntax for self-reference together with
a suitable extension of first order logic for its semantics. However, it has
also been used to show that the Vanilla program Vp in Section 2 (with-
out the third clause that interprets negative formulas) has the intended
semantics.
5 Dynamic meta-programming
We consider three forms of dynamic meta-programming: constructing a
program using predefined components, updating a program, and trans-
forming or synthesizing a program.
P1 ∪ P2 represents the module consisting of all clauses in the modules
represented by P1 and P2 together.
P1 ∩ P2 represents the module consisting of all clauses defined in the fol-
lowing way:
If p(t1, . . ., tn) <- B1 is in the module represented by P1,
p(u1, . . ., un) <- B2 is in the module represented by P2,
and θ is an mgu for {p(t1, . . ., tn), p(u1, . . ., un)},
then p(t1, . . ., tn)θ <- B1θ, B2θ is in the module represented by P1 ∩ P2.⁶
In the non-ground representation described in Section 2, the object pro-
gram is represented in the meta-program by the definition of the predicate
Clause/2. With this representation, a declarative meta-program cannot
modify the (representation of) the object program. However, by adding an
extra argument with type Module and replacing Clause/2 by OClause/3,
we can include in the representation of a clause the name of the module
in which it occurs. Any expression formed from module names and the
operators ∪ and ∩ is called a program term. For example, given the facts:
OClause(P, Cat(Tom), True)
OClause(P, Mouse(Jerry), True)
OClause(Q, Chase(x, y), Cat(x) And Mouse(y))
for modules named P and Q, the program term P ∪ Q represents the Chase
program in Figure 1. Moreover, with the set of facts:
OClause(P1, Cat(Tom), True)
OClause(P1, Mouse(Jerry), True)
OClause(P1, Chase(x, y), Cat(x))
OClause(Q1, Cat(Tom), True)
OClause(Q1, Mouse(Jerry), True)
OClause(Q1, Chase(x, y), Mouse(y))
for modules named P1 and Q1, the program term P1 ∩ Q1 also represents
the Chase program.
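Although the chapter develops these operators in Gödel syntax below, the
example can also be tried directly in ordinary Prolog. The following is a
rough sketch under obvious assumptions (o_clause/3 and o_solve/2 are ad
hoc names for OClause/3 and OSolve/2, and u/2 and i/2 stand for ∪ and ∩):

% Named modules as ground facts; 'true' marks an empty body.
o_clause(p, cat(tom), true).
o_clause(p, mouse(jerry), true).
o_clause(q, chase(X, Y), (cat(X), mouse(Y))).
o_clause(p1, cat(tom), true).
o_clause(p1, mouse(jerry), true).
o_clause(p1, chase(X, Y), cat(X)).
o_clause(q1, cat(tom), true).
o_clause(q1, mouse(jerry), true).
o_clause(q1, chase(X, Y), mouse(Y)).

% Program terms built from the union and intersection operators.
o_clause(u(P, _), A, B) :- o_clause(P, A, B).
o_clause(u(_, Q), A, B) :- o_clause(Q, A, B).
o_clause(i(P, Q), A, (B1, B2)) :-
    o_clause(P, A, B1),
    o_clause(Q, A, B2).

% The extended vanilla interpreter.
o_solve(_, true).
o_solve(Prog, (G1, G2)) :- o_solve(Prog, G1), o_solve(Prog, G2).
o_solve(Prog, A) :- o_clause(Prog, A, B), o_solve(Prog, B).

Both ?- o_solve(u(p, q), chase(X, Y)) and ?- o_solve(i(p1, q1), chase(X, Y))
then compute X = tom, Y = jerry, matching the claim that both program terms
represent the Chase program.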
The operators ∪ and ∩ can be defined by extending program V in
Figure 3 to give the Operator-Vanilla program O in Figure 11. As the
operators have only been defined here in the case of definite programs,
O does not have a clause for interpreting negative literals. Given a set
of modules R, the program O_R consists of the program O together with
the set of facts extending the definition of OClause and representing the
modules in R.
The following result was proved in [Brogi et al., 1992].
Theorem 5.1.1. Let P and Q be object programs. Then, for any ground
atom A in the object language,
• O_{P,Q} ⊢ OSolve(P ∪ Q, A) iff A is a logical consequence of P ∪ Q
⁶ We assume that the statements are standardized apart so that they have no variables
in common.
OSolve(p, True)
OSolve(p, (b And c)) <-
    OSolve(p, b) ∧
    OSolve(p, c)
OSolve(p, a) <-
    OClause(p, a, b) ∧
    OSolve(p, b)
OClause(p ∪ q, a, b) <-
    OClause(p, a, b)
OClause(p ∪ q, a, b) <-
    OClause(q, a, b)
OClause(p ∩ q, a, (b And c)) <-
    OClause(p, a, b) ∧
    OClause(q, a, c)
Assimilate(kb, s, kb) <-
    Demo(kb, s, _)
Assimilate(kb, s, newkb) <-
    Remove(If(a, True), kb, kb1) ∧
    Demo([If(s, True)|kb1], a, _) ∧
    Assimilate(kb1, s, newkb)
Assimilate(kb, s, [If(s, True)|kb]) <-
    ¬Demo(kb, s, _) ∧
    ¬(∃a ∃kb1 (Remove(If(a, True), kb, kb1) ∧
        Demo([If(s, True)|kb1], a, _)))
Predicate        Type
Assimilate       List(u) * u * List(u)
Remove           u * List(u) * List(u)
Remove/3 is true if the first argument is an element of the list in the second
argument and the third argument is obtained by removing this element.
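For readers who want to experiment, the assimilation cases can be sketched
in ordinary Prolog as follows (an approximation only: demo/2 here is a
two-argument vanilla provability test rather than the chapter's Demo/3,
select/3 plays the role of Remove/3, and cuts replace the explicit negations
of Figure 12):

% Provability from a list of if(Head, Body) clauses.
demo(_, true).
demo(Kb, (A, B)) :- demo(Kb, A), demo(Kb, B).
demo(Kb, not(A)) :- \+ demo(Kb, A).          % negation as failure
demo(Kb, A) :- member(if(A, B), Kb), demo(Kb, B).

% Case 1: S is already provable, so the knowledge base is unchanged.
assimilate(Kb, S, Kb) :-
    demo(Kb, S), !.
% Case 2: an existing fact A becomes derivable once S is assumed; drop A.
assimilate(Kb, S, NewKb) :-
    select(if(A, true), Kb, Kb1),
    demo([if(S, true)|Kb1], A), !,
    assimilate(Kb1, S, NewKb).
% Case 3: otherwise record S as a new fact.
assimilate(Kb, S, [if(S, true)|Kb]).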
It is natural to require certain integrity constraints to hold when facts
are added to or removed from a knowledge base. These constraints are
formulas that should be logical consequences of (the completion of) the
updated knowledge base. A set of integrity constraints may be represented
as a list of terms, each term representing an integrity constraint from the
set. The assimilation with integrity constraint checking is illustrated in
Figure 13. This defines the predicate AssimilateWithIC/4.
Predicate           Type
AssimilateWithIC    List(u) * List(u) * u * List(u)
The first argument of AssimilateWithIC/4 is the representation of the set
of integrity constraints. If the knowledge base, consisting of the initial
knowledge base (represented by the second argument) together with the
fact which is to be assimilated (represented in the third argument), sat-
isfies the integrity constraints (represented in the first argument), then
the Assimilate predicate, defined in Figure 12, is called to update the
knowledge base. The fourth argument of AssimilateWithIC/4 contains the
representation of the updated knowledge base.
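Continuing the Prolog sketch above (again only an approximation of
Figure 13; a constraint is deemed to hold here if it is provable from the
knowledge base extended with the candidate fact):

% Every represented constraint must be provable from Kb plus the new fact.
assimilate_with_ic(ICs, Kb, S, NewKb) :-
    \+ ( member(IC, ICs), \+ demo([if(S, true)|Kb], IC) ),
    assimilate(Kb, S, NewKb).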
As indicated in Section 3, the first logic programming system to provide
declarative facilities for updating logic programs based on a ground repre-
The predicates used for the king's and wise men's knowledge bases, together
with their representations, are as follows.
Object symbol      Representation
Black              C(0)
White              C(1)
W1                 C(11)
W2                 C(12)
W3                 C(13)
Hat/2              P(1)
DontKnow/1         P(2)
Hear/2             P(3)
See/2              P(4)
DiffColor/2        P(5)
Same/2             P(6)
The predicate Demo is defined as either IDemo in Figure 5 or JDemo in
Figure 6. AssimilateWithIC is defined in Figure 13. The knowledge of the
king and his wise men consists of sets of normal clauses represented by a
list of representations of these clauses in some order. There are two initial
knowledge bases. One contains knowledge common to all the men.
Hat(W3, White) <- Hat(W2, Black) ∧ Hat(W1, Black)
Hat(W2, White) <- Hat(W1, Black) ∧ Hat(W3, Black)
Hat(W1, White) <- Hat(W3, Black) ∧ Hat(W2, Black)
Hear(W3, W2)
Hear(W3, W1)
Hear(W2, W1)
See(x, y) <- ¬Same(x, y)
DiffColor(White, Black)
DiffColor(Black, White)
Same(x, x)
This is represented by the single fact defining the predicate CommonKb/1.
CommonKb([
    If(Atom(P(1), [C(13), C(1)]),
        And(Atom(P(1), [C(12), C(0)]), Atom(P(1), [C(11), C(0)]))),
    If(Atom(P(1), [C(12), C(1)]),
        And(Atom(P(1), [C(11), C(0)]), Atom(P(1), [C(13), C(0)]))),
    If(Atom(P(1), [C(11), C(1)]),
        And(Atom(P(1), [C(13), C(0)]), Atom(P(1), [C(12), C(0)]))),
    If(Atom(P(3), [C(13), C(12)]), True),
    If(Atom(P(3), [C(13), C(11)]), True),
    If(Atom(P(3), [C(12), C(11)]), True),
    If(Atom(P(4), [V(1), V(2)]), Not(Atom(P(6), [V(1), V(2)]))),
    If(Atom(P(5), [C(1), C(0)]), True),
    If(Atom(P(5), [C(0), C(1)]), True),
    If(Atom(P(6), [V(1), V(1)]), True)
])
In addition to the above common knowledge base, the men will have certain
commonly held constraints on what combination of knowledge is accept-
able. As an example of such a constraint, we assume that all men know
that a hat cannot be both black and white.
¬(Hat(w, Black) ∧ Hat(w, White))
This can be represented in the fact defining the predicate CommonIC/1.
CommonIC([
    Not(And(
        Atom(P(1), [V(11), C(0)]),
        Atom(P(1), [V(11), C(1)])))
])
The other knowledge base contains a list of facts describing the king's
knowledge about the state of the world.
Hat(W1, White)
Hat(W2, White)
Hat(W3, White)
DontKnow(W1)
DontKnow(W2)
This is represented by the fact defining the predicate KingKb/1.
KingKb([
    If(Atom(P(1), [C(11), C(1)]), True),
    If(Atom(P(1), [C(12), C(1)]), True),
    If(Atom(P(1), [C(13), C(1)]), True),
    If(Atom(P(2), [C(12)]), True),
    If(Atom(P(2), [C(11)]), True)
])
The king and the wise men can use only their own knowledge when sim-
ulating other men's reasoning. The key to their reasoning is defined by the
predicate LocalKb/3. Given the name of a wise man and a current knowl-
edge base in the first and second arguments, respectively, then LocalKb/3
will be true if the third argument is bound to the part of the knowledge
base of the wise man which is contained in the current knowledge base.
LocalKb(w, kb, localkb) <- LocalKb1(w, kb, kb, localkb)
The predicate LocalKb/3 calls LocalKb1/4. This predicate requires an ex-
tra copy of the initial knowledge base so that it can process all clauses
represented by elements of the list and determine which should be included
in the new knowledge base.
LocalKb1(_, _, [], [])
LocalKb1(w, ikb, [k|kb], [k|lkb]) <-
    CommonKb(ckb) ∧
    Member(k, ckb) ∧
    LocalKb1(w, ikb, kb, lkb)
LocalKb1(w, ikb, [If(Atom(P(1), [w1, c]), True)|kb],
        [If(Atom(P(1), [w1, c]), True)|lkb]) <-
    Demo(ikb, Atom(P(4), [w, w1]), _) ∧
    LocalKb1(w, ikb, kb, lkb)
LocalKb1(w, ikb, [If(Atom(P(1), [w1, _]), True)|kb], lkb) <-
    ¬Demo(ikb, Atom(P(4), [w, w1]), _) ∧
    LocalKb1(w, ikb, kb, lkb)
LocalKb1(w, ikb, [If(Atom(P(2), [w1]), True)|kb],
        [If(Atom(P(2), [w1]), True)|lkb]) <-
    Demo(ikb, Atom(P(3), [w, w1]), _) ∧
    LocalKb1(w, ikb, kb, lkb)
LocalKb1(w, ikb, [If(Atom(P(2), [w1]), True)|kb], lkb) <-
    ¬Demo(ikb, Atom(P(3), [w, w1]), _) ∧
    LocalKb1(w, ikb, kb, lkb)
There are six statements defining LocalKb1/4. The first is the base case.
The second ensures that common knowledge is included in the wise man's
knowledge. The third (resp., fourth) deals with the case where the wise
man can (resp., cannot) see a hat and the colour is known. The fifth
(resp., sixth) deals with the case where the wise man can (resp., cannot)
hear another man and that man says he doesn't know.
The predicate Reason/3 simulates the reasoning of the men. Either the
man in the first argument reasons that the colour of his hat is white because
he believes the other two hats are black, or, if he has heard another wise
man say "I do not know the colour of my hat" , he hypothesises a colour for
his own hat and by simulating the other man's reasoning, tries to obtain
a contradiction. The predicate AssimilateWithIC checks that no integrity
constraints are violated by the extra hypothesis.
Reason(kb, w, c) <-
    Demo(kb, Atom(P(1), [w, V(1)]), Atom(P(1), [w, c]))
Reason(kb, w, c1) <-
    Demo(kb, Atom(P(2), [V(1)]), Atom(P(2), [w1])) ∧
    LocalKb(w1, kb, newkb) ∧
    Demo(kb, Atom(P(5), [V(1), V(2)]), Atom(P(5), [c1, c2])) ∧
    CommonIC(ics) ∧
    AssimilateWithIC(ics, newkb, Atom(P(1), [w, c2]), newkb1) ∧
    Reason(newkb1, w1, _)
Finally, the predicate King/2 models the king's own reasoning. The king
will deduce that the man in the first argument should be able to reason
that the colour of his hat is the colour given in the second argument.
King(w, c) <-
    KingKb(kkb) ∧
    CommonKb(ckb) ∧
    Append(kkb, ckb, kb) ∧
    LocalKb(w, kb, lkb) ∧
    Reason(lkb, w, c)
With the program consisting of these definitions together with the pro-
grams in Figures 5, 12, and 13 and the usual definitions of Append/3 and
Member/2, the goal
<- King(C(13), c)
has just the computed answer {c/C(1)}; that is, the king can deduce that
the third wise man is able to reason that his hat is White.⁷
⁷ The wise men program with these goals has been machine-checked using the Gödel
system.
DeriveAll([], _, _, [], _)
DeriveAll([pc|pcs], If(h, b), a, [c|cs], m) <-
    Resolve(b, a, pc, [], s, b1, m, n) ∧
    ApplyToForm(s, If(h, b1), c) ∧
    DeriveAll(pcs, If(h, b), a, cs, n)
DeriveAll([pc|pcs], If(h, b), a, cs, m) <-
    ¬Resolve(b, a, pc, [], _, _, m, _) ∧
    DeriveAll(pcs, If(h, b), a, cs, m)
by Seki [1989] and Gardner and Shepherdson [1991] for normal programs.
Kanamori and Horiuchi [1987] showed that it preserves computed answers.
Figure 14 contains the top part of a program that performs an unfold-
ing step, defined by the predicate Unfold/4. The ground representation
given in Subsection 2.1 is assumed. The types of Unfold/4, SelectClause/2,
DeriveAll/5, and Replace/4 are as follows.
Predicate        Type
Unfold           List(u) * u * u * List(u)
SelectClause     u * List(u)
DeriveAll        List(u) * u * u * List(u) * Integer
Replace          List(u) * u * List(u) * List(u)
The predicate SelectClause/2 selects a clause from a program; DeriveAll/5
attempts a derivation step for every clause in the program; and Replace/4
removes an element from a list and inserts a sublist of elements in its place.
The remaining predicates are described in Subsection 3.2.
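As a point of comparison, here is what a single unfolding step looks like
when Prolog's own (non-ground) terms are used instead of the ground
representation, so that renaming and unification come for free (a sketch
with ad hoc names; clause bodies are lists of atoms):

% Resolve the selected body atom Sel of a clause Head <- Pre ++ [Sel] ++ Post
% against every clause of Prog (a list of cl(H, B) terms), collecting one
% resolvent per matching program clause.
unfold_step(cl(Head, Pre, Sel, Post), Prog, Resolvents) :-
    findall(cl(Head, NewBody),
            ( member(C, Prog),
              copy_term(C, cl(Sel, Body)),   % standardise apart, unify heads
              append(Pre, Body, Front),
              append(Front, Post, NewBody) ),
            Resolvents).

% ?- unfold_step(cl(anc(X, Z), [], par(X, Y), [anc(Y, Z)]),
%                [cl(par(tom, bob), []), cl(par(bob, ann), [])], Rs).
% Rs = [cl(anc(tom, Z1), [anc(bob, Z1)]), cl(anc(bob, Z2), [anc(ann, Z2)])]
% (variable names up to renaming).

The ground representation assumed in Figure 14 has to make each of these
steps — renaming, unification and substitution application — explicit, which
is precisely where predicates such as Resolve and ApplyToForm come in.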
The unfold procedure requires certain basic steps: standardising apart
the variables used in the program's clauses from the variables in the clause
selected for unfolding, computing a unifier, applying substitutions, and
so on. Moreover, at the heart of such a procedure is the need to construct
6 Specialization of meta-programs
Meta-level computations involve an overhead for interpreting the repre-
sentation of an object program. The more complex and expressive the
representation, the greater the overhead is likely to be. The ground repre-
sentation, in particular, is associated by many with inefficiency. It has been
shown that the overhead can be "compiled away" for a meta-program that
operates on a given object program. The method for doing this is based on
a program transformation technique called program specialization which
is a large topic in its own right and not limited to meta-programs in its
application. However, the combination of meta-programming and program
specialization appears a particularly fruitful one, and it is this aspect to-
gether with its applications that we discuss in this section.
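The effect is easiest to see on the simplest, non-ground vanilla interpreter
(a hand-worked sketch, not one of the chapter's figures; solve/1 and
object_clause/2 are ad hoc names):

% A vanilla interpreter over a fixed object program.
solve(true).
solve((A, B)) :- solve(A), solve(B).
solve(A) :- object_clause(A, B), solve(B).

% The fixed object program, here a membership predicate.
object_clause(mem(X, cons(X, _)), true).
object_clause(mem(X, cons(_, Xs)), mem(X, Xs)).

% A specializer given this object program can unfold solve/1 and
% object_clause/2 away, leaving (up to renaming) the object program itself:
%   mem1(X, cons(X, _)).
%   mem1(X, cons(_, Xs)) :- mem1(X, Xs).

The ground representation incurs a much larger interpretive overhead than
this, but, as the rest of the section explains, the same specialization
techniques can remove it.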
IDemo1(_, True)
IDemo1(p, And(x, y)) <-
    IDemo1(p, x) ∧
    IDemo1(p, y)
IDemo1(p, Not(x)) <-
    ¬IDemo1(p, x)
IDemo1(p, Atom(P(n), xs)) <-
    Member(z, p) ∧
    InstanceOf(z, If(Atom(P(n), xs), b)) ∧
    IDemo1(p, b)
IDemo(p, x, y) <-
    InstanceOf(x, y) ∧
    IDemo1(p, y)

IDemo1(Atom(P(0), [x, Term(F(1), [x, z])]))
IDemo1(Atom(P(0), [x, Term(F(1), [y, z])])) <-
    IDemo1(Atom(P(0), [x, z]))
Fig. 16. The specialized Instance-Demo interpreter
<- IDemo([
       If(Atom(P(0), [Var(0), Term(F(1), [Var(0), Var(1)])]), True),
       If(Atom(P(0), [Var(0), Term(F(1), [Var(1), Var(2)])]),
           Atom(P(0), [Var(0), Var(2)]))],
       x, y)
A suitable partial evaluation of I with respect to this goal gives the result
shown in Figure 16.
(We assume that D is the language of input and output for programs in
L.) That is, I_M takes the representations of a program p_L and some data
x_D for p_L and computes the representation of the output, say y_D.
The analogy between compilation and interpreter specialization was
first identified by Futamura [Futamura, 1971]. In the following we formulate
the so-called Futamura projections in such a way as to emphasize meta-
programming aspects.
Definition 6.2.3. First Futamura projection
Let PS_K be a program specializer, written in K, for programs in M. Let
I_M be an interpreter for programs in L. Then specialization of the
interpreter with respect to a given program p_L is expressed by
Note that the second argument is meta2. The result, namely [PS_{I_M}]_K,
is the representation of a program which, when given a program p_L, pro-
duces [(I_M)_{p_L}]_K. This is the representation of a compiled version of p_L,
as established by the first Futamura projection. Thus the second Futamura
projection expresses the production of a compiler from an interpreter for L.
Definition 6.3.2. Third Futamura projection
The specialization of PS with respect to itself and (a representation of)
itself is expressed by
Again, the second argument is meta2, and is not the same as the first
argument. The result [PS_{PS}]_K is a program which, when given [I_M]_K,
produces [PS_{I_M}]_K. In other words, it returns the representation of a com-
piler of programs in L, as established by the second Futamura projection.
The third Futamura projection thus expresses the production of a compiler
generator (that produces a compiler from a given interpreter).
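Since the displayed formulas of the three projections have not survived in
this text, the following informal reconstruction, standard in the partial
evaluation literature rather than a quotation of the chapter, may help.
Writing spec for the specializer, int for the interpreter of L, and p for an
object program (all representations left implicit):

    target   = spec(int, p)        first projection:  target(d) = int(p, d)
    compiler = spec(spec, int)     second projection: compiler(p) = spec(int, p)
    cogen    = spec(spec, spec)    third projection:  cogen(int) = spec(spec, int)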
The second and third projections provide a way of achieving the first
projection in stages. This is useful where the same meta-program is to be
executed with many different object programs. The "compiler" associated
with that meta-program can be obtained using the second projection. If
compilers for different meta-programs are to be produced, then the third
projection is useful since it shows how to obtain a "compiler-generator"
from a partial evaluator.
The effective implementation of the Futamura projections is the subject
of current research. The effectiveness of the first Futamura projection is
critical to the usefulness of the second and third, which are simply means
to achieve the first projection (compilation) by stages. Interpreters, or
indeed any meta-programs, do not appear to be any less complex than
programs in general from the point of view of specialization. Therefore
an effective specializer for the first projection should also be an effective
general purpose specializer.
In order to perform the second and third projections the specializer
should be effectively self-applicable. This requirement, added to the re-
quirement of being a good general purpose specializer, has proved very
difficult to meet. Program analysis methods based on abstract interpre-
tation have been employed to complement partial evaluation and add to
its effectiveness [Sestoft and Jones, 1988], [Mogensen and Bondorf, 1993],
[Gurr, 1993]. Another approach to self-application is to use two or more
versions of a specializer [Ruf and Weise, 1993], [Fujita and Furukawa, 1988].
A simple specializer can be applied to a complex one, or vice versa. In such
a method, extra run-time computation in the compiled program or in the
compiler produced by the projections is traded for less computation during
the second and third Futamura projections respectively.
6.4 Applications of meta-program specialization
The uses of specialization with meta-programming are many, and we finish
this section by indicating some areas of current research in which special-
ization of logic programs is relevant.
Implementing other languages and logics. First order logic, or fragments of
it such as definite clauses or normal clauses, can be used as a meta-language
for defining the semantics of other languages and logics. The Futamura
projections then offer a general compilation mechanism to improve the
computational efficiency of the semantics.
A proof system for a logic L can be constructed from a set of (abstract)
syntax rules for defining expressions of L, together with a set of inference
rules of the form
    α0  . . .  αk
    -------------
         β
where α0, . . ., αk, β are expressions in L, and β is inferred from α0, . . .,
and αk. Clearly these rules can be encoded as definite clauses, and the
Acknowledgements
We are particularly indebted to John Lloyd and Frank Van Harmelen who
have made many suggestions that have improved this chapter. We greatly
appreciate their help. We also thank all those who, through discussion
and by commenting on earlier drafts contributed to this work. These in-
clude Jonas Barklund, Tony Bowers, Henning Christiansen, Yuejun Jiang,
Bob Kowalski, Bern Martens, Dale Miller, Alberto Pettorossi, Danny De
Schreye, Sten-Ake Tarnlund, and Jiwei Wang.
Work on this chapter has been carried out while the first author was
supported by a SERC grant GR/H/79862. In addition, the ESPRIT Basic
Research Action 3012 (Compulog) and Project 6810 (Compulog 2) have
provided opportunities to develop our ideas and understanding of this area
by supporting many workshops and meetings and providing the funds to
attend.
References
[Abramson and Rogers, 1989] H. Abramson and M. Rogers, eds. Meta-
Programming in Logic Programming, MIT Press, 1989. Proceedings of
the Meta88 Workshop.
[Aiello et al., 1988] L. C. Aiello, D. Nardi and M. Schaerf. Reasoning about
knowledge and ignorance, in 'Proceedings of the FGCS', pp. 618-627,
1988.
[Barklund, 1995] J. Barklund. Metaprogramming in logic, Technical Re-
port UPMAIL 80, Department of Computer Science, University of Upp-
sala, Sweden. Also in Encyclopedia of Computer Science and Technology,
Vol. 33, A. Kent and J. G. Williams (eds.), Marcel Dekker, 1995.
[Barklund and Hamfelt, 1994] J. Barklund and A. Hamfelt. Hierarchical
representation of legal knowledge with meta-programming in logic, Jour-
nal of Logic Programming 18(1), 55-80, 1994.
[Barklund et al., 1995] J. Barklund, K. Boberg, P. Dell'Aqua and M.
Veanes. Meta-programming with theory systems. In K. Apt and
F. Turini, eds, Meta-programming in Logic Programming, MIT Press,
1995.
[Bowen and Kowalski, 1982] K. Bowen and R. A. Kowalski. Amalgamating
language and metalanguage in logic programming. In K. Clark & S.-A.
Tarnlund, eds, Logic Programming, pp. 153-172. Academic Press, 1982.
[Bowen and Weinberg, 1985] K. Bowen and T. Weinberg. A meta-level ex-
tension of Prolog. In Proceedings of 2nd IEEE Symposium on Logic Pro-
gramming, Boston, pp. 669-675. Computer Society Press, 1985.
[Brogi et al., 1990] A. Brogi, P. Mancarella, D. Pedreschi and F. Turini.
Composition operators for logic theories. In J. W. Lloyd, ed., Computa-
tional Logic, Springer-Verlag, pp. 117-134, 1990.
[Brogi et al., 1992] A. Brogi, P. Mancarella, D. Pedreschi and F. Turini.
Meta for modularising logic programming. In A. Pettorossi, ed., Meta-
Programming in Logic, Proceedings of the 3rd International Workshop,
Meta-92, Uppsala, Sweden, pp. 105-119. Springer-Verlag, 1992.
[Bruynooghe et al., 1989] M. Bruynooghe, D. De Schreye and B. Krekels.
Compiling control, Journal of Logic Programming 6, 135-162, 1989.
[Chen et al., 1993] W. Chen, M. Kifer and D. S. Warren. HiLog: A founda-
tion for higher-order logic programming, Journal of Logic Programming
15, 187-230, 1993.
[Christiansen, 1994] H. Christiansen. On proof predicates in logic program-
ming, In A. Momigliano and M. Ornaghi, eds, Proof-Theoretical Exten-
sions of Logic Programming, CMU, Pittsburgh, PA 15213-3890, USA,
1994. Proceedings of an ICLP-94 Post-Conference Workshop.
[Clark, 1978] K. L. Clark. Negation as failure. In H. Gallaire and J. Minker,
eds, Logic and Data Bases, Plenum Press, pp. 293-322, 1978.
Contents
1 Introduction 500
2 A motivation for higher-order features 502
3 A higher-order logic 510
3.1 The language 510
3.2 Equality between terms 513
3.3 The notion of derivation 517
3.4 A notion of models 519
3.5 Predicate variables and the subformula property . 522
4 Higher-order Horn clauses 523
5 The meaning of computations 528
5.1 Restriction to positive terms 529
5.2 Provability and operational semantics 534
6 Towards a practical realization 537
6.1 The higher-order unification problem 538
6.2 P-derivations 541
6.3 Designing an actual interpreter 546
7 Examples of higher-order programming 549
7.1 A concrete syntax for programs 549
7.2 Some simple higher-order programs 552
7.3 Implementing tactics and tacticals 556
7.4 A comparison with functional programming . . . . 560
8 Using λ-terms as data structures 561
8.1 Implementing an interpreter for Horn clauses . . . 563
8.2 Dealing with functional programs as data 565
8.3 A limitation of higher-order Horn clauses 572
9 Hereditary Harrop formulas 574
9.1 Universal quantifiers and implications in goals . . . 574
9.2 Recursion over structures with binding 577
10 Conclusion 584
1 Introduction
Modern programming languages such as Lisp, Scheme and ML permit pro-
cedures to be encapsulated within data in such a way that they can subse-
quently be retrieved and used to guide computations. The languages that
provide this kind of an ability are usually based on the functional pro-
gramming paradigm, and the procedures that can be encapsulated in them
correspond to functions. The objects that are encapsulated are, therefore,
of higher-order type and so also are the functions that manipulate them.
For this reason, these languages are said to allow for higher-order pro-
gramming. This form of programming is popular among the users of these
languages and its theory is well developed.
The success of this style of encapsulation in functional programming
makes it natural to ask if similar ideas can be supported within the logic
programming setting. Noting that procedures are implemented by predi-
cates in logic programming, higher-order programming in this setting would
correspond to mechanisms for encapsulating predicate expressions within
terms and for later retrieving and invoking such stored predicates. At least
some devices supporting such an ability have been seen to be useful in prac-
tice. Attempts have therefore been made to integrate such features into
Prolog (see, for example, [Warren, 1982]), and many existing implemen-
tations of Prolog provide for some aspects of higher-order programming.
These attempts, however, are unsatisfactory in two respects. First, they
have relied on the use of ad hoc mechanisms that are at variance with the
declarative foundations of logic programming. Second, they have largely
imported the notion of higher-order programming as it is understood within
functional programming and have not examined a notion that is intrinsic
to logic programming.
In this chapter, we develop the idea of higher-order logic programming
by utilizing a higher-order logic as the basis for computing. There are, of
course, many choices for the higher-order logic that might be used in such
a study. If the desire is only to emulate the higher-order features found in
functional programming languages, it is possible to adopt a "minimalist"
approach, i.e., to consider extending the logic of first-order Horn clauses—
the logical basis of Prolog—in as small a way as possible to realize the
additional functionality. The approach that we adopt here, however, is
to enunciate a notion of higher-order logic programming by describing an
analogue of Horn clauses within a rich higher-order logic, namely, Church's
simple theory of types [Church, 1940]. Following this course has a number
of pleasant outcomes. First, the extension to Horn clause logic that results
from it supports, in a natural fashion, the usual higher-order program-
ming capabilities that are present in functional programming languages.
Second, the particular logic that is used provides the elegant mechanism
of λ-terms as a means for constructing descriptions of predicates and this
logics that do not use λ-terms and in which the equality of expressions
continues to be based on the identity relation. One such proposal appears
in [Wadge, 1991]. Conversely, a logic that is higher-order in the third
sense may well not permit a quantification over predicates and, thus, may
not be higher-order in the second sense. An example of this kind is the
specification logic that underlies the Isabelle proof system [Paulson, 1990].
Developing a theory of logic programming within higher-order logic has
another fortunate outcome. The clearest and strongest presentations of
the central results regarding higher-order Horn clauses require the use of
the sequent calculus: resolution and model theory based methods that are
traditionally used in the analysis of first-order logic programming are either
not useful or not available in the higher-order context. It turns out that
the sequent calculus is an apt tool for characterizing the intrinsic role of
logical connectives within logic programming and a study such as the one
undertaken here illuminates this fact. This observation is developed in a
more complete fashion in [Miller et al., 1991] and [Miller, 1994] and in the
chapter by Loveland and Nadathur in this volume of the Handbook.
These formulas are related to Horn clauses in the following sense: within
the framework of classical logic, the negation of a G-formula is equivalent
to a set of negative Horn clauses and, similarly, a D-formula is equivalent
to a set of positive Horn clauses. We refer to the D-formulas as definite
clauses, an alternative name in the literature for positive Horn clauses, and
to the G-formulas as goal formulas because they function as goals within
the programming paradigm of interest. These names in fact motivate the
symbol chosen to denote members of the respective classes of formulas.
The programming interpretation of these formulas is dependent on
treating a collection of closed definite clauses as a program and a closed
goal formula as a query to be evaluated; definite clauses and goal formulas
are, for this reason, also called program clauses and queries, respectively.
The syntactic structures of goal formulas and definite clauses are relevant
to their being interpreted in this fashion. The matrix of a definite clause
is a formula of the form A or G ⊃ A, and this is intended to correspond to
(part of) the definition of a procedure whose name is the predicate head
of A. Thus, an atomic goal formula that "matches" with A may be solved
immediately or by solving the corresponding instance of G, depending on
the case of the definite clause. The outermost existential quantifiers in a
goal formula (which are usually left implicit) are interpreted as a request
to find values for the quantified variables that produce a solvable instance
of the goal formula. The connectives A and V that may appear in goal
formulas typically have search interpretations in the programming context.
The first connective specifies an AND branch in a search, i.e. it is a request
to solve both components of a conjoined goal. The goal formula G1 V G2
represents a request to solve either G1 or G2 independently, and V is thus
a primitive for specifying OR branches in a search. This primitive is not
provided for in the traditional description of Horn clauses, but it is nev-
ertheless present in most implementations of Horn clause logic. Finally, a
search related interpretation can also be accorded to the existential quan-
tifier. This quantifier can be construed as a primitive for specifying an
infinite OR branch in a search, where each branch is parameterized by the
in the context of the append program. The clauses above involve a quantifi-
cation over P which is evidently a predicate variable. This variable can be
instantiated with the name of an actual procedure (or predicate) and would
lead to the invocation of that procedure (with appropriate arguments) in
the course of evaluating a mappred query. To provide a particular example,
let us assume that our program also contains a list of clauses defining the
ages of various individuals, such as the following:
age(bob, 24),
age(sue, 23).
The procedure mappred can then be invoked with age and a list of indi-
viduals as arguments and may be expected to return a corresponding list
of ages. For instance, the query
∃L mappred(age, cons(bob, cons(sue, nil)), L)
should produce as an answer the substitution cons(24, cons(23, nil)) for L.
Tracing a successful solution path for this query reveals that, in the course
of producing an answer, queries of the form age(bob, Y1) and age(sue, Y2)
have to be solved with suitable instantiations for Y1 and Y2.
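A rough flavour of this behaviour can be obtained even in ordinary Prolog,
where call/3 plays the role of applying the predicate variable (only an
approximation: the chapter's language is a typed higher-order logic, and
mappred here is an ad hoc first-order rendering):

% mappred(P, Xs, Ys): Ys is obtained by applying the binary predicate P
% to the corresponding elements of Xs.
mappred(_, [], []).
mappred(P, [X|Xs], [Y|Ys]) :-
    call(P, X, Y),
    mappred(P, Xs, Ys).

age(bob, 24).
age(sue, 23).

% ?- mappred(age, [bob, sue], L).   yields L = [24, 23].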
The above example involves an instantiation of a simple kind for pred-
icate variables—the substitution of a name of a predicate. A question to
consider is if it is useful to permit the instantiation of such variables with
more complex predicate terms. One ability that seems worth while to sup-
port is that of creating new relations by changing the order of arguments
of given relations or by projecting onto some of their arguments. There are
several programming situations where it is necessary to have "views" of a
given relation that are obtained in this fashion, and it would be useful to
have a device that permits the generation of such views without extensive
additions to the program. The operations of abstraction and application
that are formalized by the y-calculus provide for such a device. Consider,
for example, a relation that is like the age relation above, except that it has
its arguments reversed. Such a relation can be represented by the predicate
term yX yY age(Y, X). As another example, the expression yX age(X, 24)
creates from age a predicate that represents the set of individuals whose
age is 24.
An argument can thus be made for enhancing the structure of terms
in the language by including the operations of abstraction and application.
The general rationale is that it is worth while to couple the ability to
treat predicates as values with devices for creating new predicate valued
terms. Now, there are mechanisms for combining predicate terms, namely
the logical connectives and quantifiers, and the same argument may be
advanced for including these as well. To provide a concrete example, let us
assume that our program contains the following set of definite clauses that
define the "parent" relation between individuals:
goals is not a feasible possibility. To see this, suppose that we are given
the query
3P mappred(P, cons(bob, cons(sue, nil)),cons(24, cons(23, nil))).
It might appear that a suitable answer can be provided to this query and
that this might, in fact, be the value age for P. A little thought, however,
indicates that the query is an ill-posed one. There are too many predicate
terms that hold of bob and 24 and of sue and 23—consider, for example,
the myriad ways for stating the relation that holds of any two objects—and
enumerating these does not seem to be a meaningful computational task.
The above discussion brings out the distinction between quantification
over only predicates and quantification over both predicates and functions.
This does not in itself, however, address the question of usefulness of per-
mitting quantification over functions. This question is considered in detail
in Sections 8 and 9 and so we provide only a glimpse of an answer to it at
this point. For this purpose, we return to the two mapfun queries above. In
both queries the new ability obtained from function variables and λ-terms
is that of analyzing the process of substitution. In the first query, the com-
putation involved is that of performing substitutions into a given structure,
namely h(1, X). The second query involves finding a structure from which
two different structures can be obtained by substitutions; in particular, a
structure which yields h(1, 1) when 1 is substituted into it and h(1, 2) when
2 is used instead. Now, there are several situations where this ability to
analyze the structures of terms via the operation of substitution is impor-
tant. Furthermore, in many of these contexts, objects that incorporate the
notion of binding will have to be treated. The terms of a y-calculus and
the accompanying conversion rules are, as we shall see, appropriate tools
for correctly formalizing the idea of substitution in the presence of bound
variables. Using function variables and y-terms can, therefore, provide for
programming primitives that are useful in these contexts.
We have already argued, using the mappred example, for the provision
of predicate variables in the arguments of (atomic) goal formulas. This
argument can be strengthened in light of the fact that predicates are but
functions of a special kind. When predicate variables appear as the heads
of atomic goals, i.e., in "extensional" positions, they can be instantiated,
thereby leading to the computation of new goals. However, as we have
just observed, it is not meaningful to contemplate finding values for such
variables. When predicate variables appear in the arguments of goals,
i.e. in "intensional" positions, values can be found for them by structural
analyses in much the same way as for other function variables. These
two kinds of occurrences of predicate variables can thus be combined to
advantage: an intensional occurrence of a predicate variable can be used
to form a query whose solution is then sought via an extensional occurrence
3 A higher-order logic
A principled development of a logic programming language that incorpo-
rates the features outlined in the previous section must be based on a
higher-order logic. The logic that we use for this purpose is derived from
Church's formulation of the simple theory of types [Church, 1940] princi-
pally by the exclusion of the axioms concerning infinity, extensionality for
propositions, choice and description. Church's logic is particularly suited
to our purposes since it is obtained by superimposing logical notions over
the calculus of y-conversion. Our omission of certain axioms is based on a
desire for a logic that generalizes first-order logic by providing a stronger
notion of variable and term, but that, at the same time, encompasses only
the most primitive logical notions that are relevant in this context; only
these notions appear to be of consequence from the perspective of com-
putational applications. Our logic is closely related to that of [Andrews,
1971], the only real differences being the inclusion of η-conversion as a rule
of inference and the incorporation of a larger number of propositional con-
nectives and quantifiers as primitives. In the subsections that follow, we
describe the language of this logic and clarify the intended meanings of
expressions by the presentation of a deductive calculus as well as a notion
of models. There are several similarities between this logic and first-order
logic, especially in terms of proof-theoretic properties. However, the richer
syntax of the higher-order logic makes the interpretation and usefulness of
these properties in the two contexts different. We dwell on this aspect in
Subsection 3.5.
set of "function" terms whose domains and ranges are given by a and T
respectively. In keeping with this intuition, we refer to the types obtained
by virtue of (i) and (ii) as atomic types and to those obtained by virtue of
(iii) as function types.
We will employ certain notational conventions with respect to types.
To begin with, we will use the letters σ and τ, perhaps with subscripts, as
metalanguage variables for types. Further, the use of parentheses will be
minimized by assuming that -> associates to the right. Using this conven-
tion, every type can be written in the form (σ1 -> ... -> σn -> τ) where τ
is an atomic type. We will refer to σ1, ..., σn as the argument types and
to τ as the target type of the type when it is written in this form. This
terminology is extended to atomic types by permitting the argument types
to be an empty sequence.
The class of terms is obtained by the operations of abstraction and
application from given sets of constants and variables. We assume that
the constants and variables are each specified with a type and that these
collections meet the following additional conditions: there is at least one
constant and a denumerable number of variables of each type and the
variables of each type are distinct from the constants and the variables of
any other type. The terms or formulas are then specified together with an
associated type in the following fashion:
between ∃ and ∀ and the quantifiers familiar from first-order logic may be
understood from the following: the existential and universal quantification
of x over P is written as (∃ (λx P)) and (∀ (λx P)) respectively. Under this
representation, the dual aspects of binding and predication that accompany
the usual notions of quantification are handled separately by abstractions
and constants that are propositional functions of propositional functions.
The constants that are used must, of course, be accorded a suitable in-
terpretation for this representation to be a satisfactory one. For example,
the meaning assigned to ∀ must be such that (∀ (λx P)) holds just in case
(λx P) corresponds to the set of all objects of the type of x.
Certain conventions and terminology are motivated by the intended
interpretations of the type o and the logical constants. In order to per-
mit a distinction between arbitrary terms and those of type o, we re-
serve the word "formula" exclusively for terms of type o. Terms of type
o1 -> . . . -> on -> o correspond to n-ary relations on terms and, for this
reason, we will also refer to them as predicates of n-arguments. In writing
formulas, we will adopt an infix notation for the symbols A, V and D; e.g.,
we will write (A F G) as (F^G). In a similar vein, the expressions (3x F)
and (Vx F) will be used as abbreviations for (3 (yx F)) and (V (yx F)) Par-
allel to the convention for abstractions, we will sometimes write the expres-
sions 3x1 . . . 3xn F and Vx1 . . . Vxn F as 3x 1 , . . . , xn F and Vx1, . . . , xn F
respectively. In several cases it will only be important to specify that the
"prefix" contains a sequence of some length. In such cases, we will use x
an abbreviation for a sequence of variables and write yx F, 3x F or Vic F,
as the case might be.
Our language at this point has the ability to represent functions as well
as logical relations between functions. However, the sense in which it can
represent these notions is still informal and needs to be made precise. We
do this in the next two subsections, first by describing a formal system that
clarifies the meaning of abstraction and application and then by presenting
a sequent calculus that bestows upon the various logical symbols their
intended meanings.
where A is a constant or variable, and, for 1 ≤ i ≤ m, Ti also has the
same structure. We refer to the sequence x1, . . ., xn as the binder, to A as
the head and to T1, . . ., Tm as the arguments of such a term; in particular
instances, the binder may be empty, and the term may also have no argu-
ments. Such a term is said to be rigid if its head, i.e. A, is either a constant
or a variable that appears in the binder, and flexible otherwise.
In identifying the canonical members of the equivalence classes of terms
with respect to λ-conversion, there are two different approaches that might
be followed. Under one approach, we say that a term is in η-normal form
if it has no subterm of the form (λx (A x)) in which x is not free in A,
and we say a term is in λ-normal form if it is in both β- and η-normal
form. The other alternative is to say that a term is in λ-normal form if
it can be written in the form λx̄ (A T1 . . . Tm) where A is a constant
or variable of type σ1 -> . . . -> σm -> τ with τ being an atomic type
and, further, each Ti can also be written in a similar form. (Note that
the term must be in β-normal form for this to be possible.) In either
case we say that T is a λ-normal form for S if S λ-conv T and T is in
λ-normal form. Regardless of which definition is chosen, it is known that
a λ-normal form exists for every term in our typed language and that
this form is unique up to a renaming of bound variables (see [Barendregt,
1981] and also the discussion in [Nadathur, 1987]). We find it convenient
to use the latter definition in this chapter and we will write λnorm(T)
to denote a λ-normal form for T under it. To obtain such a form for
any given term, the following procedure may be used: First convert the
term into β-normal form by repeatedly replacing every subterm of the form
((λx A) B) by [B/x]A preceded, perhaps, by some α-conversion steps. Now,
if the resulting term is of the form λx1, . . ., xm (A T1 . . . Tn) where A is of
type σ1 -> . . . -> σn -> σn+1 -> . . . -> σn+r -> τ, then replace it by the term
λx1, . . ., xm, y1, . . ., yr (A T1 . . . Tn y1 . . . yr)
where y1, . . ., yr are distinct variables of appropriate types that are not
contained in {x1, . . ., xm}. Finally repeat this operation of "fluffing-up" of
the arguments on the terms T1, . . ., Tn, y1, . . ., yr.
A λ-normal form of a term T as we have defined it here is unique only
up to a renaming of bound variables (i.e., up to α-conversions). While this
is sufficient for most purposes, we will occasionally need to talk of a unique
normal form. In such cases, we use p(F) to designate what we call the
principal normal form of F. Determining this form essentially requires a
naming convention for bound variables and a convention such as the one
in [Andrews, 1971] suffices for our purposes.
The existence of a λ-normal form for each term is useful for two reasons.
First, it provides a mechanism for determining whether two terms are equal
by virtue of the λ-conversion rules. Second, it permits the properties of
terms to be discussed by using a representative from each of the equivalence
It can be seen that this definition is independent of the order in which the
pairs are taken from θ and that it formalizes the idea of replacing the free
occurrences of x1, . . ., xn in S simultaneously by the terms T1, . . ., Tn. We
often have to deal with substitutions that are given by singleton sets and
we introduce a special notation for the application of such substitutions: if
θ is {(x, T)}, then θ(S) may also be written as [T/x]S.
Certain terminology pertaining to substitutions will be used later in
the chapter. Given two terms T1 and T2, we say T1 is an instance of T2
if it results from applying a substitution to T2. The composition of two
substitutions θ1 and θ2, written as θ1 ∘ θ2, is precisely the composition of
θ1 and θ2 when these are viewed as mappings: θ1 ∘ θ2(G) = θ1(θ2(G)). The
restriction of a substitution θ to a set of variables V, denoted by θ ↑ V, is
given as follows:
from our calculus. These inference figures are referred to as the Cut in-
ference figures and they occupy a celebrated position within logic. Their
omission from our calculus is justified by a rather deep result for the higher-
order logic under consideration, the so-called cut-elimination theorem. This
theorem asserts that the same set of sequents have proofs in the calculi with
and without the Cut inference figures. A proof of this theorem for our logic
can be modelled on the one in [Andrews, 1971] (see [Nadathur, 1987] for
details). The only other significant difference between our calculus and
the one in [Gentzen, 1969] is that our formulas have a richer syntax and
the λ inference figures have been included to manipulate this syntax. This
difference, as we shall observe presently, has a major impact on the process
of searching for proofs.
the last two are actually a family of mappings, parameterized by types. Let
C be the set containing ⊤_L and these various mappings. Then the tuple
⟨L, {D_σ}_σ, I, C⟩ is said to be a pre-structure or pre-interpretation for our
language.
A mapping φ on the variables is an assignment with respect to a pre-
structure ⟨L, {D_σ}_σ, I, C⟩ just in case φ maps each variable of type σ to
D_σ. We wish to extend φ to a "valuation" function V_φ on all terms. The
desired behavior of V_φ on a term H is given by induction over the structure
of H:
(1) H is a variable or a constant. In this case
    (i) if H is a variable then V_φ(H) = φ(H),
    (ii) if H is a parameter then V_φ(H) = I(H),
    (iii) if H is ⊤ then V_φ(H) = ⟨⊤_L, T⟩,
    (iv) if H is ¬ then V_φ(H)(⟨l, p⟩) = ⟨¬_L(l), q⟩, where q is F if p is T and
         T otherwise,
    (v) if H is ∨ then V_φ(H)(⟨l1, p⟩)(⟨l2, q⟩) = ⟨∨_L(l1)(l2), r⟩, where r is T
         if either p or q is T and F otherwise,
    (vi) if H is ∧ then V_φ(H)(⟨l1, p⟩)(⟨l2, q⟩) = ⟨∧_L(l1)(l2), r⟩, where r is F
         if either p or q is F and T otherwise,
    (vii) if H is ⊃ then V_φ(H)(⟨l1, p⟩)(⟨l2, q⟩) = ⟨⊃_L(l1)(l2), r⟩, where r is T
         if either p is F or q is T and F otherwise,
    (viii) if H is ∃ of type ((σ -> o) -> o) then, for any p ∈ D_{σ->o},
         V_φ(H)(p) = ⟨∃_L(p), q⟩, where q is T if there is some t ∈ D_σ such
         that p(t) = ⟨l, T⟩ for some l ∈ L, and q is F otherwise, and
    (ix) if H is ∀ of type ((σ -> o) -> o) then, for any p ∈ D_{σ->o},
         V_φ(H)(p) = ⟨∀_L(p), q⟩, where q is T if for every t ∈ D_σ there is
         some l ∈ L such that p(t) = ⟨l, T⟩ and q is F otherwise.
(2) H is (H1 H2). In this case, V_φ(H) = V_φ(H1)(V_φ(H2)).
(3) H is (λx H1). Let x be of type σ and, for any t ∈ D_σ, let φ(x := t)
    be the assignment that is identical to φ except that it maps x to t.
Theorem 3.4.1. Let Γ, Θ be finite sets of formulas. Then Γ → Θ has
a C-proof if and only if Γ ⊨ (∨Θ).
3.5 Predicate variables and the subformula property
As noted in Subsection 3.3, the proof systems for first-order logic and our
higher-order logic look similar: the only real differences are, in fact, in the
presence of the λ-conversion rules and the richer syntax of formulas. The
impact of these differences is, however, nontrivial. An important property
of formulas in first-order logic is that performing substitutions into them
preserves their logical structure — the resulting formula is in a certain
precise sense a subformula of the original formula (see [Gentzen, 1969]).
A similar observation can unfortunately not be made about formulas in
our higher-order logic. As an example, consider the formula F = ((p a) ⊃
(Y a)); we assume that p and a are parameters of suitable type here and
that Y is a variable. Now let θ be the substitution
ant under substitutions, the search for a proof can be based on this struc-
ture and the substitution (or, more appropriately, unification) process may
be reduced to a constraint on the search. However, the situation is different
in a logic in which substitutions can change the propositional structure of
formulas. In such logics, the construction of a proof often involves finding
the "right" way in which to change the propositional structure as well. As
might be imagined, this problem is a difficult one to solve in general, and
no good method that is also complete has yet been described for deter-
mining these kinds of substitutions in our higher-order logic. The existing
theorem-provers for this logic either sacrifice completeness [Bledsoe, 1979;
Andrews et al., 1984] or are quite intractable for this reason [Huet, 1973a;
Andrews, 1989].
In the next section we describe a certain class of formulas from our
higher-order logic. Our primary interest in these formulas is that they pro-
vide a logical basis for higher-order features in logic programming. There is
an auxiliary interest, however, in these formulas in the light of the above ob-
servations. The special structure of these formulas enables us to obtain use-
ful information about derivations concerning them from the cut-elimination
theorem for higher-order logic. This information, in turn, enables the de-
scription of a proof procedure that is complete and that at the same time
finds substitutions for predicate variables almost entirely through unifica-
tion. Our study of these formulas thus also demonstrates the utility of the
cut-elimination theorem even in the context of a higher-order logic.
If 1, 2, and h are parameters of type int, int, and int -> int -> int
respectively and L, X and F are variables of suitable types, the following
are queries:
The explanations of these two aspects are, in a certain sense, related. The
typical scenario in logic programming is one where a goal formula with some
free variables is to be solved relative to some program. The calculation that
is intended in this situation is that of solving the existential closure of the
goal formula from the program. If this calculation is successful, the result
that is expected is a set of substitutions for the free variables in the given
query that make it so. We observe that the behavior of our abstract inter-
preter accords well with this view of the outcome of a computation: the
success of the interpreter on an existential query entails its success on a
particular instance of the query and so it is reasonable to require a specific
substitution to be returned as an "answer."
Example 4.0.6. Suppose that our language includes all the parameters
and variables described in Example 4.0.3. Further, suppose that our pro-
gram consists of the definite clauses defining mappred in that example and
the following in which 24 and 23 are parameters of type int.
(age bob 24),
(age sue 23).
Then, the query
(mappred age (cons bob (cons sue nil)) L)
in which L is a free variable actually requires the goal formula
∃L (mappred age (cons bob (cons sue nil)) L).
to be solved. There is a solution to this goal formula that, in accordance
with the description of the abstract interpreter, involves solving the follow-
ing "subgoals" in sequence:
(mappred age (cons bob (cons sue nil)) (cons' 24 (cons' 23 nil')))
(age bob 24) A (mappred age (cons sue nil) (cons' 23 nil'))
(age bob 24)
(mappred age (cons sue nil) (cons' 23 nil'))
(age sue 23) A (mappred age nil nil')
(age sue 23)
(mappred age nil nil').
The answer to the original query that is obtained from this solution is the
substitution (cons' 24 (cons' 23 nil')) for L.
As another example, assume that our program consists of the clauses
for mapfun in Example 4.0.3 and that the query is now the goal formula
(mapfun F (cons 1 (cons 2 nil)) (cons (h 1 1) (cons (h 1 2) nil)))
in which F is a free variable. Once again we can construct the goal formula
whose solution is implicitly called for by this query. Further, a successful
solution path may be traced to show that an answer to the query is the
value λx (h 1 x) for F.
involve the derivation of (*), or of sequents similar to (*), then the idea of
proving ∃Y (p Y) would diverge from the idea of solving it, at least in the
context where the program consists of the formula ∀X (X ⊃ (p a)).
We show in this section that problems of the sort described in the previ-
ous paragraph do not arise, and that the notions of success and provability
in the context of our definite clauses and goal formulas coincide. The
method that we adopt for demonstrating this is the following. We first
identify a C'-proof as a C-proof in which each occurrence of ∀-L and ∃-R
constitutes a generalization upon a closed term from H+. In other words,
in each appearance of figures of the forms
Definition 5.1.1. Let x and y be variables of type o and, for each σ, let
z_σ be a variable of type σ -> o. Then the function pos on terms is defined
as follows:
(i) If T is a constant or a variable
        if T is ¬
        if T is ⊃
        if T is ∀ of type (σ -> o) -> o
        otherwise.
where r, l1 , . . . ,lr > 0, we shall use the notation r+ to denote the set
[(Simi) + /yi mi ] . . . [(Si1)+/yi1]Gi must also be T. In the other case, for some
i, j such that 1 ≤ i ≤ s and 1 ≤ j ≤ t, we have that
and, further, that these are atomic formulas. From the last observation, it
follows that Dj and Gi are atomic formulas. Using Lemma 5.1.4 again, it
follows that
Our analysis breaks up into two parts depending on the structure of Gi:
(1) If Gi is an atomic formula, we obtain from Lemma 5.1.4 that
Now B and D can be written as [B/y]y and [D/y]y, respectively. From the
hypothesis it thus follows that
A⁺ → (Θ')⁺, B⁺ and A⁺ → (Θ')⁺, D⁺
have C'-proofs. Using an ∧-R inference figure in conjunction with these,
we obtain a C'-proof for A⁺ → Θ⁺.
(2) If Gi is not an atomic formula then it must be of the form G1i ∧ G2i.
But then B = [Si,mi/yi,mi] . . . [Si,1/yi,1]G1i and D = [Si,mi/yi,mi] . . . [Si,1/yi,1]G2i. It
follows from the hypothesis that C'-proofs exist for
A+ ->
and
A+ -
A proof for A⁺ → Θ⁺ can be obtained from these by using an ∧-R
inference figure.
An analogous argument can be provided when the last figure is ∨-R.
For the case of ⊃-L, we observe first that if the result of performing a
sequence of substitutions into a definite clause D is a formula of the form
B ⊃ C, then D must be of the form G ⊃ Ar where G is a goal formula and
Ar is a rigid positive atom. An analysis similar to that used for ∧-R now
verifies the claim.
Consider now the case when the last inference figure is a ¬-R, i.e., of
the form
We see in this case that for some suitable i, [Si,mi/yi,mi] . . . [Si,1/yi,1]Gi = ¬B.
But then Gi must be an atomic goal formula and by Lemma 5.1.4
Noting that
Notice that there is no ambiguity about the answer substitution that should
be extracted from this derivation for the existentially quantified variable
Y.
not permitted. It holds, for instance, in the case when we are dealing with
only first-order formulas. Showing that the observation also applies in the
case of our higher-order formulas requires much work, as we have already
seen.
Definition 5.2.3. The length of a derivation E is defined by recursion on
its height as follows:
(i) It is 1 if E consists of only an initial sequent.
(ii) It is l + 1 if the last inference figure in E has a single upper sequent
whose derivation has length l.
(iii) It is l1 + l2 + 1 if the last inference figure in E has two upper sequents
whose derivations are of length l1 and l2 respectively.
for enumerating some of them. This procedure utilizes the fact that there
are certain disagreement sets for which at least one unifier can easily be
provided and, similarly, there are other disagreement sets for which it is
easily manifest that no unifiers can exist. Given an arbitrary disagreement
set, the procedure attempts to reduce it to a disagreement set of one of
these two kinds. This reduction proceeds by an iterative use of two sim-
plifying functions, called SIMPL and MATCH, on disagreement sets. The
basis for the first of these functions is provided by the following lemma
whose proof may be found in [Huet, 1975]. In this lemma, and in the rest
of this section, we use the notation U(D) to denote the set of unifiers for a
disagreement set D.
Lemma 6.1.1. Let T1 = λx (H1 A1 ... Ar) and T2 = λx (H2 B1 ... Bs)
be two rigid terms of the same type that are in λ-normal form. Then
θ ∈ U({⟨T1, T2⟩}) if and only if
(i) H1 = H2 (and, therefore, r = s), and
(ii) θ ∈ U({⟨λx Ai, λx Bi⟩ | 1 ≤ i ≤ r}).
Let us call a finite set of goal formulas a goal set, and a disagreement set
that is F or consists solely of pairs of positive terms a positive disagreement
set. If G1 is a goal set and D1 is a positive disagreement set then it is clear,
from Definitions 6.1.2, 6.1.4 and 6.2.1 and the fact that a positive term
remains a positive term under a positive substitution, that G2 is a goal
set and D2 a positive disagreement set for any tuple ⟨G2, D2, θ2, V2⟩ that is
P-derivable from ⟨G1, D1, θ1, V1⟩.
Definition 6.2.2. Let G be a goal set. Then we say that a sequence of the
form ⟨Gi, Di, θi, Vi⟩, for 1 ≤ i ≤ n, is a P-derivation sequence for G just in case
G1 = G, V1 = F(G1), D1 = ∅, θ1 = ∅, and, for 1 ≤ i < n, ⟨Gi+1, Di+1, θi+1, Vi+1⟩
is P-derivable from ⟨Gi, Di, θi, Vi⟩.
From our earlier observations it follows easily that, in a P-
derivation sequence for a goal set G, each Gi is a goal set and each Di
is a positive disagreement set. We make implicit use of this fact in our
discussions below. In particular, we intend unqualified uses of the sym-
bols G and D to be read as syntactic variables for goal sets and positive
disagreement sets, respectively.
Definition 6.2.3. A P-derivation sequence ⟨Gi, Di, θi, Vi⟩, for 1 ≤ i ≤ n, terminates,
i.e., is not contained in a longer sequence, if
(a) Gn is either empty or is a goal set consisting solely of flexible atoms
and Dn is either empty or consists solely of flexible-flexible pairs, or
(b) Dn = F.
In the former case we say that it is a successfully terminated sequence. If
this sequence also happens to be a P-derivation sequence for G, then we call
From (iv) it follows that the sequence must be finite. From (iii) and Lem-
mas 6.1.3 and 6.1.5 it is evident, then, that it must be a successfully ter-
minated sequence, i.e. a P-derivation of G. Suppose that the length of the
sequence is n. From (i), (ii), Lemma 6.2.9 and an induction on n, it can
be seen that φ ≈V1 θn ∘ · · · ∘ θ1. But F(G) = V1 and θn ∘ · · · ∘ θ1 is
the answer substitution for the sequence.
If the substitutions provided by PROJs are chosen first at each stage, then
these unifiers will be produced in the order that they appear above, perhaps
with the second and third interchanged. On the other hand, choosing the
substitution provided by IMIT first results in these unifiers being produced
in the reverse order. Now, the above unification problem may arise in a
programming context out of the following kind of desire: We wish to unify
the function variable F with the result of "abstracting" out all occurrences
of a particular constant, which is 2 in this case, from a given data structure,
which is an integer list here. If this is the desire, then it is clearly preferable
to choose the PROJs substitutions before the substitution provided by
IMIT. In a slightly more elaborate scheme, the user may be given a means
for switching between these possibilities.
In attempting to solve a goal set, a nonflexible goal formula from the
set may be picked arbitrarily. If the goal formula is either conjunctive or
existential, then there is only one way in which it can be simplified. If the
goal formula picked is G1 ∨ G2 and the remaining goal set is G, then, for
the sake of completeness, the interpreter should try to solve {G1} ∪ G and
{G2} ∪ G simultaneously. In practice, the interpreter may attempt to solve
{G1} ∪ G first, returning to {G2} ∪ G only in case of failure. This approach,
as the reader might notice, is in keeping with the one used in Prolog. In
the case that the goal formula picked is atomic, a backchaining step must
be used. In performing such a step, it is enough to consider only those
definite clauses in the program of the form ∀x A or ∀x (G ⊃ A) where the
head of A is identical to the head of the goal formula being solved, since all
other cases will cause the disagreement set to be reduced to F by SIMPL.
For completeness, it is necessary to use each of these definite clauses in a
breadth-first fashion in attempting to solve the goal formula. Here again
the scheme that is used by standard Prolog interpreters might be adopted:
the first appropriate clause might be picked based on some predetermined
ordering and others may be returned to only if this choice leads to failure.
The above discussion indicates how an interpreter can be constructed
based on the notion of P-derivations. An interpreter based on only these
ideas would still be a fairly simplistic one. Several specific improvements
(such as recognizing and handling special kinds of unification problems) and
enhancements (such as those for dealing with polymorphism, a necessary
practical addition) can be described to this interpreter. An examination
of these aspects is beyond the scope of this chapter. For the interested
reader, we mention that a further discussion of some of these aspects may
(forevery P nil).
(forevery P (X::L)) :- (P X), (forevery P L).
(trans R X Y) :- (R X Y).
(trans R X Z) :- (R X Y), (trans R Y Z).
is invoked relative to the same set of definite clauses, two substitutions for
K can be returned as answers: (sue::bob::nil) and (ned::bob::nil).
Notice that within the form of higher-order programming being considered,
non-determinism is supported naturally. Support is also provided in this
context for the use of "partially determined" predicates, i.e., predicate
expressions that contain variables whose values are to be determined in
the course of computation. The query
?- (forevery x\(age x A) (ned::bob::sue::nil)).
illustrates this feature. Solving this query requires determining if the pred-
icate x\(age x A) is true of ned, bob and sue. Notice that this predicate
has a variable A appearing in it and a binding for A will be determined in the
course of computation, causing the predicate to become further specified.
The given query will fail relative to the clauses in Figures 2 and 3 because
the three individuals in the list do not have a common age. However, the
query
?- (forevery x\(age x A) (ned::sue::nil)).
will succeed and will result in A being bound to 23. The last two queries
are to be contrasted with the query
?- (forevery x\(sigma Y\(age x Y)) (ned::bob::sue::nil)).
in which the predicate x\(sigma Y\(age x Y)) is completely determined.
This query will succeed relative to the clauses in Figures 2 and 3 since all
the individuals in the list (ned::bob::sue::nil) have an age defined for
them.
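For concreteness, a set of age facts consistent with the behaviour described above
would be the following (a hypothetical reconstruction used only for illustration, not a
quotation of Figure 3):
(age bob 24).
(age sue 23).
(age ned 23).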
An interpreter for our language that is based on the ideas in Section 6
solves a query by using a succession of goal reduction, backchaining and
unification steps. None of these steps result in a flexible goal formula being
selected. Flexible goal formulas remain in the goal set until substitutions
performed on them in the course of computation make them rigid. This
may never happen and the computation may terminate with such goal
formulas being left in the goal set. Thus, consider the query
?- (P sue 23).
relative to the clauses in Figure 3. Our interpreter will succeed on this
immediately because the only goal formula in its initial goal set is a flex-
ible one. It is sensible to claim this to be a successful computation be-
cause there is at least one substitution — in particular, the substitution
x\y\true for P — that makes this query a solvable one. It might be argued
that there are meaningful answers for this query relative to the given pro-
gram and that these should be provided by the interpreter. For example,
it may appear that the binding of P to the term x\y\(age x y) (that is
equal by virtue of η-conversion to age) should be returned as an answer
to this query. However, many other similarly suitable terms can be of-
fered as bindings for P; for example, consider the terms x\y\(age ned 23)
and x\y\((age x 23), (age ned y)). There are, in fact, far too many
"suitable" answer substitutions for this kind of a query for any reasonable
interpreter to attempt to generate. It is for this reason that flexible goal
formulas are never selected in the course of constructing a P-derivation and
that the ones that persist at the end are presented as such to the user along
with the answer substitution and any remaining flexible-flexible pairs.
Despite these observations, flexible goal formulas can have a meaningful
role to play in programs because the range of acceptable substitutions for
predicate variables can be restricted by other clauses in the program. For
example, while it is not sensible to ask for the substitutions for R that make
the query
?- (R john mary).
a solvable one, a programmer can first describe a restricted collection of
predicate terms and then ask if any of these predicates can be substituted
for R to make the given query a satisfiable one. Thus, suppose that our
program contains the definite clauses that are presented in Figure 4. Then,
the query
?- (rel R), (R john mary).
is a meaningful one and is solvable only if the term
x\y\(sigma Z\((wife x Z), (mother Z y)))
is substituted for R. The second-order predicate rel specifies the collection
of predicate terms that are relevant to consider as substitutions in this
situation.
Our discussions pertaining to flexible queries have been based on a
certain logical view of predicates and the structure of predicate terms.
However, this is not the only tenable view. There is, in fact, an alternative
viewpoint under which a query such as
?- (P sue 23).
can be considered meaningful and for which the only legitimate answer is
(primrel mother).
(primrel wife).
(rel R) :- (primrel R).
(rel x\y\(sigma Z\((R x Z), (S Z y)))) :-
(primrel R), (primrel S).
(mother jane mary).
(wife john jane).
the substitution of age for P. We refer the reader to [Chen et al., 1993] for
a presentation of a logic that justifies this viewpoint and for the description
of a programming language that is based on this logic.
encoding of compound goals. For instance, we will use below the constant
truegoal of type g to denote the trivially satisfiable goal and the constant
andgoal of type g -> g -> g to denote a goal formed out of the conjunc-
tion of two other goals. Other combinations such as the disjunction of two
goals are also possible and can be encoded in a similar way.
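For concreteness, these declarations can be written in the same style as the other
signature declarations in this chapter (a small sketch; only the types stated in the
text are assumed):
kind g type.
type truegoal g.
type andgoal g -> g -> g.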
A tactic in our view is a binary relation between a primitive goal and
another goal, either compound or primitive. Thus tactics can be encoded
by predicates of type g -> g -> o. Abstractly, if a tactic R holds of G1 and
G2, i.e., if (R G1 G2) is solvable from a presentation of primitive tactics as
a set of definite clauses, then satisfying the goal G2 in the object-language
should suffice to satisfy goal G1.
An illustration of these ideas can be provided by considering the task
of implementing a proof procedure for propositional Horn clauses. For
simplicity of presentation, we restrict the propositional goal formulas that
will be considered to be conjunctions of propositions. The objective will, of
course, be to prove such formulas. Each primitive object-level goal therefore
corresponds to showing some atomic proposition to be true, and such a goal
might be encoded by a constant of type g whose name is identical to that
of the proposition. Now, if p and q are two atomic propositions, then
the goal of showing that their conjunction is true can be encoded in the
object-level goal (andgoal p q). The primitive method for reducing such
goals is that of backchaining on (propositional) definite clauses. Thus, the
tactics of interest will have the form (R H G), where H represents the head
of a clause and G is the goal corresponding to its body. The declarations
in Figure 5 use these ideas to provide a tactic-style encoding of the four
propositional clauses
p :- r,s.
q :- r.
s :- r,q.
r.
The tactics c11tac, c12tac, c13tac and c14tac correspond to each of
these clauses respectively.
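A plausible rendering of these tactics, in which each clause head is paired with the
encoding of its body (a sketch consistent with the description above rather than a
quotation of Figure 5), is the following:
(c11tac p (andgoal r s)).
(c12tac q r).
(c13tac s (andgoal r q)).
(c14tac r truegoal).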
The declarations in Figure 6 serve to implement several general tacti-
cals. Notice that tacticals are higher-order predicates in our context since
they take tactics that are themselves predicates as arguments. The tacticals
in Figure 6 are to be understood as follows. The orelse tactical is one that
succeeds if either of the two tactics it is given can be used to successfully
reduce the given goal. The try tactical forms the reflexive closure of a given
tactic: if R is a tactic then (try R) is itself a tactic and, in fact, one that
always succeeds, returning the original goal if it is unable to reduce it by
means of R. This tactical uses an auxiliary tactic called idtac whose mean-
ing is self-evident. The then tactical specifies the natural join of the two
relations that are its arguments and is used to compose tactics: if R1 and
type p g.
type q g.
type r g.
type s g.
R2 are closed terms denoting tactics, and G1 and G2 are closed terms repre-
senting (object-level) goals, then the query (then R1 R2 G1 G2) succeeds
just in case the application of R1 to G1 produces G3 and the application of
R2 to G3 yields the goal G2. The maptac tactical is used in carrying out
the application of the second tactic in this process since G3 may not
be a primitive object-level goal: maptac maps a given tactic over all the
primitive goals in a compound goal. The then tactical plays a fundamental
role in combining the results of step-by-step goal reduction. The repeat
tactical is defined recursively using then, orelse, and idtac and it repeat-
edly applies the tactic it is given until this tactic is no longer applicable.
Finally, the complete tactical succeeds if the tactic it is given completely
solves the goal it is given. The completely solved goal can be written as
truegoal and (andgoal truegoal truegoal) and in several other ways,
and so the auxiliary predicate goalreduce is needed to reduce all of these
to truegoal. Although the complete tactical is the only one that uses the
predicate goalreduce, the other tacticals can be modified so that they also
use it to simplify the structure of the goal they produce whenever this is
possible.
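As an illustrative sketch of how the tacticals just described might be written (the
clause forms below are assumptions consistent with the prose description, not a
quotation of Figure 6), consider:
(orelse R1 R2 G1 G2) :- (R1 G1 G2).
(orelse R1 R2 G1 G2) :- (R2 G1 G2).
(try R G1 G2) :- (orelse R idtac G1 G2).
(then R1 R2 G1 G2) :- (R1 G1 G3), (maptac R2 G3 G2).
(maptac R truegoal truegoal).
(maptac R (andgoal G1 G2) (andgoal G3 G4)) :- (maptac R G1 G3), (maptac R G2 G4).
(maptac R G1 G2) :- (R G1 G2).
(repeat R G1 G2) :- (orelse (then R (repeat R)) idtac G1 G2).
Here the last clause for maptac applies the tactic directly when the goal is primitive,
and repeat is obtained from then, orelse and idtac exactly as described above.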
Tacticals, as mentioned earlier, can be used to combine tactics to pro-
duce large scale problem solvers. As an illustration of this, consider the
following definite clause
(depthfirst G) :-
(complete (repeat (orelse c11tac
(orelse c12tac (orelse c13tac c14tac))))
G truegoal).
in conjunction with the declarations in Figures 5 and 6. Assuming an
(idtac G G).
(complete R G1 truegoal) :-
(R G1 G2), (goalreduce G2 truegoal).
interpreter for our language of the kind described in Section 6, this clause
defines a procedure that attempts to solve an object-level goal by a depth-
first search using the given propositional Horn clauses. The query
?- (depthfirst p).
has a successful derivation and it follows from this that the proposition p
is a logical consequence of the given Horn clauses.
objects in our language. For this purpose, we consider the predicate mapfun
that is defined by the following declarations.
type mapfun (A -> B) -> (list A) -> (list B) -> o.
type a term.
type b term.
type c term.
type f term -> term.
type path term -> term -> form.
type adj term -> term -> form.
type prog list form -> o.
by writing the expression ((x\T) S); this term is equal, via β-conversion,
to the term [S/x]T. The upshot of these observations is that a programmer
using our language does not need to explicitly implement procedures for
testing for alphabetic equivalence, for performing substitution or for car-
rying out other similar logical operations on formulas. The benefits of this
are substantial since implementing such operations correctly is a non-trivial
task.
We provide a specific illustration of the aspects discussed above by
considering the problem of implementing an interpreter for the logic of
first-order Horn clauses. The first problem to address here is the represen-
tation of object-level formulas and terms. We introduce the sorts form and
term for this purpose; λ-terms of these two types will represent the objects
in question. Object-level logical and nonlogical constants, functions, and
predicates will be represented by relevant meta-level nonlogical constants.
Thus, suppose we wish to represent the following list of first-order definite
clauses:
adj(a,b),
adj(b,c),
adj(c,f(c)),
∀x∀y (adj(x,y) ⊃ path(x,y)) and
∀x∀y∀z (adj(x,y) ∧ path(y,z) ⊃ path(x,z)).
We might do this by using the declarations in Figure 7. In this program, a
representation of a list of these clauses is eventually "stored" by using the
(meta-level) predicate prog. Notice that universal and existential quantifi-
cation at the object-level are encoded by using, respectively, the constants
all and some of second-order type and that variable binding at the object-
level is handled as expected by meta-level abstraction.
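For concreteness, the quantifier constants and the stored clause list might be declared
and written roughly as follows (an illustrative sketch; the constant names all, some,
imp and and are those used later in this section, but the exact declarations of Figure 7
may differ):
type all  (term -> form) -> form.
type some (term -> form) -> form.
type and  form -> form -> form.
type imp  form -> form -> form.
(prog ((adj a b)::(adj b c)::(adj c (f c))::
       (all x\(all y\(imp (adj x y) (path x y))))::
       (all x\(all y\(all z\(imp (and (adj x y) (path y z))
                                 (path x z)))))::nil)).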
The clauses in Figure 8 implement an interpreter for the logic of first-
order Horn clauses assuming the representation just described. Thus, given
the clauses in Figures 7 and 8, the query
?- (prog Cs), (interp Cs (path a X)).
is solvable and produces three distinct answer substitutions, binding X to
b, c and (f c), respectively. Notice that object-level quantifiers need to be
instantiated at two places in an interpreter of the kind we desire: in dealing
with an existentially quantified goal formula and in generating an instance
of a universally quantified definite clause. Both kinds of instantiation are
realized in the clauses in Figure 8 through (meta-level) β-conversion and
the specific instantiation is delayed in both cases through the use of a logic
variable.
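As an illustrative sketch of such an interpreter (the predicate names interp, instcl,
memb and atom and the exact clause forms are assumptions, not a quotation of
Figure 8), consider:
(interp Cs truth).
(interp Cs (and B C)) :- (interp Cs B), (interp Cs C).
(interp Cs (some B)) :- (interp Cs (B T)).
(interp Cs A) :- (atom A), (memb D Cs), (instcl Cs D A).
(instcl Cs (all D) A) :- (instcl Cs (D T) A).
(instcl Cs (imp G A) A) :- (interp Cs G).
(instcl Cs A A).
(memb X (X::L)).
(memb X (Y::L)) :- (memb X L).
In the clauses for existential goals and universally quantified clauses, the term (B T)
(respectively (D T)) is a meta-level application whose β-reduction produces the needed
object-level instance, with the logic variable T standing for the delayed instantiation.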
In understanding the advantages of our language, it is useful to consider
the task of implementing an interpreter for first-order Horn clause logic in
a pure first-order logic programming language. This task is a much more
involved one since object-level quantification will have to be represented in
kind tm type.
that application and β-conversion at the meta-level are used in the second
clause for eval to realize the needed substitution at the object-level of an
actual argument for a formal one of a functional expression.
The underlying λ-calculus is enhanced in the typical functional pro-
gramming language by including a collection of predefined constants and
functions. In encoding such constants a corresponding set of nonlogical
constants might be used. For the purposes of the discussions below, we
shall assume the collection presented in Figure 10 whose purpose is under-
stood as follows:
(1) The constants fixpt and cond encode the fixed-point and conditional
operators of the object-language being considered; these operators
play an essential role in realizing recursive schemes.
(2) The constants truth and false represent the boolean values, and the
constant and represents the binary operation of conjunction on booleans.
(3) The constant c when applied to an integer yields the encoding of that
integer; thus (c 0) encodes the integer 0. The constants +, *, -, <
and = represent the obvious binary operators on integers. Finally, the
constant intp encodes a function that recognizes integers: (intp e)
encodes an expression that evaluates to true if e is an encoding of an
integer and to false otherwise.
(4) The constants cons and nill encode list constructors, consp and
null encode recognizers for nonempty and empty lists, respectively,
and the constants car and cdr represent the usual list destructors.
(5) The constant pair represents the pair constructor, pairp represents
the pair recognizer, and first and second represent the obvious
destructors of pairs.
(6) The constant error encodes the error value.
Each of the constants and functions whose encoding is described above
has a predefined meaning that is used in the evaluation of expressions
in the object-language. These meanings are usually clarified by a set of
equations that are satisfied by them. For instance, if fixpt and cond are
the fixed-point and conditional operators of the functional programming
language being encoded and true and false are the boolean values, then
the meanings of these operators might be given by the following equations:
Vx ((fixpt x) = (x (fixpt x))),
VxVy ((cond true x y) = x), and
VxVy ((cond false x y) = y).
The effect of such equations on evaluation can be captured in our encoding
by augmenting the set of definite clauses for eval contained in Figure 9.
For example, the definite clause
(eval (fixpt F) V) :- (eval (F (fixpt F)) V).
can be added to this collection to realize the effect of the first of the equa-
tions above. We do not describe a complete set of equations or an encoding
of it here, but the interested reader can find extended discussions of this
and other related issues in [Hannan and Miller, 1992] and [Hannan, 1993].
A point that should be noted about an encoding of the kind described here
is that it reflects the effect of the object-language equations only in the
evaluation relation and does not affect the equality relation of the meta-
language. In particular, the notions of equality and unification for our
typed λ-terms, even those containing the new nonlogical constants, are
still governed only by the λ-conversion rules.
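For instance, the conditional equations might similarly be reflected by clauses such as
the following (a sketch; only the clause for fixpt shown above is actually given in the
text):
(eval (cond C L R) V) :- (eval C truth), (eval L V).
(eval (cond C L R) V) :- (eval C false), (eval R V).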
The set of nonlogical constants that we have described above suffices for
representing several recursive functional programs as λ-terms. For example,
consider a tail-recursive version of the factorial function that might be
defined in a functional programming language as follows:
fact n m = (cond (n = 0) m (fact (n - 1) (n * m))).
(The factorial of a non-negative integer n is to be obtained by evaluating
the expression (fact n 1).) The function fact that is defined in this manner
can be represented by the λ-term
(fixpt fact\(abs n\(abs m\
(cond (= n (c 0)) m
(app (app fact (- n (c 1))) (* n m)))))).
We assume below a representation of this kind for functional programs and
we describe manipulations on these programs in terms of manipulations on
their representation in our language.
As an example of manipulating programs, let us suppose that we wish
to transform a recursive program that expects a single pair argument into
a corresponding curried version. Thus, we would like to transform the
program given by the term
(fixpt fact\(abs p\
(cond (and (pairp p) (= (first p) (c 0)))
(second p)
(cond (pairp p)
(app fact (pair (- (first p) (c 1))
(* (first p) (second p))))
error))))
into the factorial program presented earlier. Let the argument of the given
function be p as in the case above. If the desired transformer is imple-
mented in a language such as Lisp, ML, or Prolog then it would have to
be a recursive program that descends through the structure of the term
representing the functional program, making sure that the occurrences of
the bound variable p in it are within expressions of the form (pairp p),
(first p), or (second p) and, in this case, replacing these expressions
respectively by a true condition, and the first and second arguments of the
version of the program being constructed. Although this does not happen
with the program term displayed above, it is possible for this descent to
enter a context in which the variable p is bound locally, and care must be
exercised to ensure that occurrences of p in this context are not confused
with the argument of the function. It is somewhat unfortunate that the
names of bound variables have to be considered explicitly in this process
since the choice of these names has no relevance to the meanings of pro-
grams. However, this concern is unavoidable if our program transformer
is to be implemented in one of the languages under consideration since a
proper understanding of bound variables is simply not embedded in them.
The availability of λ-terms and of higher-order unification in our lan-
guage permits a rather different kind of solution to the given problem. In
fact, the following (atomic) definite clause provides a concise description
of the desired relationship between terms representing the curried and un-
curried versions of recursive programs:
(curry (fixpt q1\(abs x\(A (first x) (second x) (pairp x)
                         (r\s\(app q1 (pair r s))))))
       (fixpt q2\(abs y\(abs z\(A y z truth
                         (r\s\(app (app q2 r) s))))))).
The first argument of the predicate curry in this clause constitutes a "tem-
plate" for programs of the type we wish to transform: For a term represent-
ing a functional program to unify with this term, it must have one argument
that corresponds to x, every occurrence of this argument in the body of
the program must be within an expression of the form (first x), (second
x) or (pairp x), and every recursive call in the program must involve the
formation of a pair argument. (The expression r\s\(app q1 (pair r s))
represents a function of two arguments that applies q1 to the pair formed
from its arguments.) The recognition of the representation of a functional
program as being of this form results in the higher-order variable A being
bound to the result of extracting the expressions (first x), (second x),
(pairp x), and r\s\(app q1 (pair r s)) from the term corresponding
to the body of this program. The desired transformation can be effected
merely by replacing the extracted expressions with the two arguments of
the curried version, truth and a recursive call, respectively. Such a replace-
ment is achieved by means of application and β-conversion in the second
argument of curry in the clause above.
To illustrate the computation described abstractly above, let us consider
solving the query
?- (curry (fixpt fact\(abs p\
(cond (and (pairp p) (= (first p) (c 0)))
(second p)
(cond (pairp p)
(app fact
(pair (- (first p) (c 1))
(* (first p) (second p))))
error))))
NewProg).
using the given clause for curry. The described interpreter for our language
would, first of all, instantiate the variable A with the term
u\v\p\q\(cond (and p (= u (c 0))) v
         (cond p (q (- u (c 1)) (* u v)) error)).
(This instantiation for A is unique.) The variable NewProg will then be set
to a term that is equal via λ-conversion to
(fixpt q2\(abs y\(abs z\(cond (and truth (= y (c 0))) z
    (cond truth (app (app q2 (- y (c 1))) (* y z)) error))))).
Although this is not identical to the term we wished to produce, it can be
reduced to that form by using simple identities pertaining to the boolean
constants and operations of the functional programming language. A pro-
gram transformer that uses these identities to simplify functional programs
can be written relatively easily.
As a more complex example of manipulating functional programs, we
consider the task of recognizing programs that are tail-recursive; such a
recognition step might be a prelude to transforming a given program into
one in iterative form. The curried version of the factorial program is an
example of a tail-recursive program as also is the program for summing two
non-negative integers that is represented by the following λ-term:
(fixpt sum\(abs n\(abs m\
(cond (= n (c 0)) m
(app (app sum (- n (c 1))) (+ m (c 1)))))))).
Now, the tail-recursiveness of these programs can easily be recognized by
using higher-order unification in conjunction with their indicated represen-
tations. Both are, in fact, instances of the term
(fixpt f\(abs x\(abs y\
(cond (C x y) (H x y)
(app (app f (Fl x y)) (F2 x y))))))
Further, the representations of only tail-recursive programs are instances
of the last term: Any closed term that unifies with the given "second-
order template" must be a representation of a recursive program of two
arguments whose body is a conditional and in which the only recursive
call appears in the second branch of the conditional and, that too, as the
head of the expression constituting that branch. Clearly any functional
program that has such a structure must be tail-recursive. Notice that all
the structural analysis that has to be performed in this recognition process
perform the instantiation? The ideal solution is to use a new constant, say
dummy, of type term; thus, (all B) would be recognized as a definite clause
if (B dummy) is a definite clause. The problem is that our language does not
provide a logical way to generate such a new constant. An alternative is to
use just any constant. This strategy will work in the present situation, but
there is some arbitrariness to it and, in any case, there are other contexts
in which the newness of the constant is critical.
A predicate that recognizes tail-recursive functional programs of two
arguments was presented in Subsection 8.2. It is natural to ask if it is
possible to recognize tail-recursive programs that have other arities. One
apparent answer to this question is that we mimic the clauses in Figure 11
for each arity. Thus, we might add the clauses
(tailrec (fixpt f\(abs x\(H x ) ) ) ) .
(tailrec (fixpt f\(abs x\(app (app f (h x)) (G x))))))).
(tailrec (fixpt f\(abs x\(cond (C x) (H1 f x) (H2 f x))))) :-
(tailrec (fixpt f\(abs x\(H1 f x)))),
(tailrec (fixpt f\(abs x\(H2 f x)))).
to the earlier program to obtain one that also recognizes tail-recursive func-
tional programs of arity 1. However, this is not really a solution to our
problem. What we desire is a finite set of clauses that can be used to
recognize tail-recursive programs of arbitrary arity. If we maintain our en-
coding of functional programs, a solution to this problem seems to require
that we descend through the abstractions corresponding to the arguments
of a program in the term representing the program discharging each of
these abstractions with a new constant, and that we examine the structure
that results from this for tail-recursiveness. Once again we notice that our
language does not provide us with a principled mechanism for creating the
new constants that are needed in implementing this approach.
The examples above indicate a problem in defining predicates in our
language for recognizing terms of certain kinds. This problem becomes
more severe when we consider defining relationships between terms. For
example, suppose we wish to define a binary predicate called prenex that
relates the representations of two first-order formulas only if the second is
a prenex normal form of the first. One observation useful in defining this
predicate is that a formula of the form Vx B has Vx C as a prenex normal
form just in case B has C as a prenex normal form. In implementing this
observation, it is necessary, once again, to consider dropping a quantifier.
Using the technique of substitution with a dummy constant to simulate
this, we might translate our observation into the definite clause
(prenex (all B) (all C)) :- (prenex (B dummy) (C dummy)).
Unfortunately this clause does not capture our observation satisfactorily
and describes a relation on formulas that has little to do with prenex normal
forms. Thus, consider the query
Harrop formulas and the corresponding goal formulas relative to Σ are the
D- and G-formulas defined by the following mutually recursive syntax rules:
D ::= Ar | G ⊃ Ar | ∀x D | D ∧ D
G ::= ⊤ | A | G ∧ G | G ∨ G | ∀x G | ∃x G | D ⊃ G.
Quantification in these formulas, as in the context of higher-order Horn
clauses, may be over function and predicate variables. When we use the
formulas described here in programming, we shall think of a closed D-
formula as a program clause relative to the signature Σ, a collection of such
clauses as a program, and a G-formula as a query. There are considerations
that determine exactly the definitions of the D- and G-formulas that are
given here, and these are described in [Miller, 1990; Miller et al., 1991].
An approach similar to that in Section 4 can be employed here as well in
explaining what it means to solve a query relative to a program and what
the result of a solution is to be. The main difference is that, in explaining
how a closed goal formula is to be solved, we now have to deal with the
additional possibilities for such formulas to contain universal quantifiers
and implications. The attempt to solve a universally quantified goal for-
mula can result in the addition of new nonlogical constants to an existing
signature. It will therefore be necessary to consider generating instances
of program clauses relative to different signatures. The following definition
provides a method for doing this.
Definition 9.1.1. Let Σ be a set of nonlogical constants and let a closed
positive Σ-substitution be one whose range is a set of closed terms contained
in HΣ+. Now, if D is a program clause, then the collection of its closed
positive Σ-instances is denoted by [D]Σ and is given as follows:
(i) if D is of the form A or G ⊃ A, then it is {D},
(ii) if D is of the form D' ∧ D'' then it is [D']Σ ∪ [D'']Σ, and
(iii) if D is of the form ∀x D' then it is ∪{[φ(D')]Σ | φ is a closed positive
Σ-substitution for x}.
This notation is extended to programs as follows: if P is a program,
(term a).
(term b).
(term c).
(term (f X)) :- (term X).
a global context. Each goal formula must, therefore, carry a relevant sig-
nature and program. Second, the permitted substitutions for each logic
variable that is introduced in the course of solving an existential query
or instantiating a program clause are determined by the signature that
is in existence at the time of introduction of the logic variable. It is
therefore necessary to encode this signature in some fashion in the logic
variable and to use it within the unification process to ensure that in-
stantiations of the variable contain only the permitted nonlogical con-
stants. Several different methods can be used to realize this requirement
in practice, and some of these are described in [Miller, 1989a; Miller, 1992;
Nadathur, 1993]).
(goal truth).
(goal (and B C)) :- (goal B), (goal C).
(goal (or B C)) :- (goal B), (goal C).
(goal (some C)) :- (pi x\((term x) => (goal (C x)))).
(goal A) :- (atom A).
Figure 13 in defining the predicates defcl and goal that are intended to
be recognizers of encodings of object-language definite clauses and goal
formulas respectively. We use the symbol => to represent implications in
(meta-level) goal formulas in the program clauses that appear in this figure.
Recall that the symbol pi represents the universal quantifier.
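As an illustrative sketch, and consistent with the discussion that follows (though not
necessarily identical to Figure 13), the clauses for defcl might be:
(defcl (all C)) :- (pi x\((term x) => (defcl (C x)))).
(defcl (imp G A)) :- (goal G), (atom A).
(defcl A) :- (atom A).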
In understanding the definitions presented in Figure 13, it is useful to
focus on the first clause for defcl that appears in it. This clause is not
an acceptable one in the Horn clause setting because an implication and a
universal quantifier are used in its "body". An attempt to solve the query
?- (defcl (all x\(all y\(all z\
(imp (and (adj x y) (path y z)) (path x z)))))).
will result in this clause being used. This will, in turn, result in the variable
C that appears in the clause being instantiated to the term
x\(all y\(all z\(imp (and (adj x y) (path y z))
(path x z)))).
The way in which this λ-term is to be processed can be described as follows.
First, a new constant must be picked to play the role of a name for the
bound variable x. This constant must be added to the signature of the
object-level logic, in turn requiring that the definition of the predicate term
be extended. Finally, the λ-term must be applied to the new constant and
it must be checked if the resulting term represents a Horn clause. Thus,
if the new constant that is picked is d, then an attempt must be made to
solve the goal formula
(defcl (all y\(all z\(imp (and (adj d y) (path y z))
(path d z)))))
after the program has been augmented with the clause (term d). From
the operational interpretation of implications and universal quantifiers de-
(quantfree false).
(quantfree truth).
(quantfree A) :- (atom A).
(quantfree (and B C)) :- ((quantfree B), (quantfree C)).
(quantfree (or B C)) :- ((quantfree B), (quantfree C)).
(quantfree (imp B C)) :- ((quantfree B), (quantfree C)).
scribed in Definition 9.1.2, it is easily seen that this is exactly the compu-
tation that is performed with the λ-term under consideration.
One of the virtues of our extended language is that it is relatively
straightforward to verify formal properties of programs written in it. For
example, consider the following property of the definition of defcl: If the
query (defcl (all B)) has a solution and if T is an object-level term (that
is, (term T) is solvable), then the query (defcl (B T)) has a solution,
i.e., the property of being a Horn clause is maintained under first-order
universal instantiation. This property can be seen to hold by the following
argument: If the query (defcl (all B)) is solvable, then it must be the
case that the query pi x\((term x) => (defcl (B x))) is also solvable.
Since (term T) is provable and since universal instantiation and modus
ponens hold for intuitionistic logic and since the operational semantics of
our language coincides with intuitionistic provability, we can conclude that
the query (defcl (B T)) has a solution. The reader will easily appreci-
ate the fact that proofs of similar properties of programs written in other
programming languages would be rather more involved than this.
Ideas similar to those above are used in Figures 15 and 16 in defining
the predicate prenex. The declarations in these figures are assumed to
build on those in Figures 7 and 12 that formalize the object-language and
those in Figure 14 that define a predicate called quantfree that recognizes
quantifier free formulas. The predicate merge that is defined in Figure 15
serves to raise the scopes of quantifiers over the binary connectives. This
predicate is used in the definition of prenex in Figure 16. An interpreter
for our language should succeed on the query
?- (prenex (or (all x\(and (adj x x) (and (all y\(path x y))
(adj (f x) c))))
(adj a b))
Pnf).
relative to a program consisting of these various clauses and should produce
the term
all x\(all y\(or (and (adj x x)
Fig. 16. Relating first-order formulas and their prenex normal forms
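As an illustrative sketch (not the actual contents of Figures 15 and 16), the clause of
prenex that treats object-level universal quantification, together with the base case for
quantifier-free formulas, might be written as:
(prenex B B) :- (quantfree B).
(prenex (all B) (all C)) :- (pi x\((term x) => (prenex (B x) (C x)))).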
kind ty type.
part, self explanatory. We draw the attention of the reader to the first two
clauses in this collection that are the only ones that deal with abstractions
in the representation of functional programs. Use is made in both cases of
a combination of universal quantification and implication that is familiar
by now in order to move the term-level abstraction into the meta-level. We
observe, once again, that the manner of definition of the typing predicate
in our language makes it easy to establish formal properties concerning it.
For example, the following property can be shown to be true of typeof by
using arguments similar to those used in the case of defcl: if the queries
(typeof (abs M) (arrow A B)) and (typeof N A) have solutions, then
the query (typeof (M N) B) also has a solution.
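As an illustrative sketch of the two clauses being referred to (their exact form in the
figure may differ), the typing of abstractions and applications might be rendered as:
(typeof (abs M) (arrow A B)) :- (pi x\((typeof x A) => (typeof (M x) B))).
(typeof (app M N) B) :- (typeof M (arrow A B)), (typeof N A).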
We refer the reader to [Felty, 1993; Pareschi and Miller, 1990; Hannan
and Miller, 1992; Miller, 1991] for other, more extensive, illustrations of
the value of a language based on higher-order hereditary Harrop formulas
from the perspective of meta-programming. It is worth noting that all the
example programs presented in this section as well as several others that
are described in the literature fall within a sublanguage of this language
called Lλ. This sublanguage, which is described in detail in [Miller, 1991],
has the computationally pleasant property that higher-order unification is
decidable in its context and admits of most general unifiers.
10 Conclusion
We have attempted to develop the notion of higher-order programming
within logic programming in this chapter. A central concern in this en-
deavour has been to preserve the declarative style that is a hallmark of
logic programming. Our approach has therefore been to identify an ana-
logue of first-order Horn clauses in the context of a higher-order logic; this
analogue must, of course, preserve the logical properties of the first-order
formulas that are essential to their computational use while incorporating
desirable higher-order features. This approach has led to the description of
the so-called higher-order Horn clauses in the Simple Theory of Types, a
higher-order logic that is based on a typed version of the lambda calculus.
An actual use of these formulas in programming requires that a practically
acceptable proof procedure exist for them. We have exhibited such a proce-
dure by utilizing the logical properties of the formulas in conjunction with
a procedure for unifying terms of the relevant typed λ-calculus. We have
then examined the applications for a programming language that is based
on these formulas. As initially desired, this language provides for the usual
higher-order programming features within logic programming. This lan-
guage also supports some unusual forms of higher-order programming: it
permits λ-terms to be used in constructing the descriptions of syntactic ob-
jects such as programs and quantified formulas, and it allows computations
to be performed on these descriptions by means of the λ-conversion rules
and higher-order unification. These novel features have interesting uses in
the realm of meta-programming and we have illustrated this fact in this
chapter. A complete realization of these meta-programming capabilities,
however, requires a language with a larger set of logical primitives than
that obtained by using Horn clauses. These additional primitives are in-
corporated into the logic of hereditary Harrop formulas. We have described
this logic here and have also outlined some of the several applications that
a programming language based on this logic has in areas such as theo-
rem proving, type inference, program transformation, and computational
linguistics.
The discussions in this chapter reveal a considerable richness to the
notion of higher-order logic programming. We note also that these dis-
cussions are not exhaustive. Work on this topic continues along several
dimensions such as refinement, modification and extension of the language,
implementation, and exploration of applications, especially in the realm of
meta-programming.
Acknowledgements
We are grateful to Gilles Dowek for his comments on this chapter. Miller's
work has been supported in part by the following grants: ARO DAAL03-89-
0031, ONR N00014-93-1-1324, NSF CCR91-02753, and NSF CCR92-09224.
Nadathur has similarly received support from the NSF grants CCR-89-
05825 and CCR-92-08465.
References
[Andrews, 1971] Peter B. Andrews. Resolution in type theory. Journal of
Symbolic Logic, 36:414-432, 1971.
[Andrews, 1989] Peter B. Andrews. On connections and higher-order logic.
Journal of Automated Reasoning, 5(3):257-291, 1989.
[Andrews et al., 1984] Peter B. Andrews, Eve Longini Cohen, Dale Miller,
and Frank Pfenning. Automating higher order logic. In Automated The-
orem Proving: After 25 Years, pages 169-192. American Mathematical
Society, Providence, RI, 1984.
[Apt and van Emden, 1982] K. R. Apt and M. H. van Emden. Contribu-
tions to the theory of logic programming. Journal of the ACM, 29(3):841-
862, 1982.
[Barendregt, 1981] H. P. Barendregt. The Lambda Calculus: Its Syntax
and Semantics. North-Holland, 1981.
[Bledsoe, 1979] W. W. Bledsoe. A maximal method for set variables in
automatic theorem-proving. In Machine Intelligence 9, pages 53-100.
John Wiley, 1979.
[Brisset and Ridoux, 1991] Pascal Brisset and Olivier Ridoux. Naive re-
verse can be linear. In Eighth International Logic Programming Confer-
ence, Paris, France, June 1991. MIT Press.
[Brisset and Ridoux, 1992] Pascal Brisset and Olivier Ridoux. The archi-
tecture of an implementation of λProlog: Prolog/MALI. In Dale Miller,
editor, Proceedings of the 1992 λProlog Workshop, 1992.
[Chen et al., 1993] Weidong Chen, Michael Kifer, and David S. Warren.
HiLog: A foundation for higher-order logic programming. Journal of
Logic Programming, 15(3):187-230, February 1993.
[Church, 1940] Alonzo Church. A formulation of the simple theory of types.
Journal of Symbolic Logic, 5:56-68, 1940.
[Damas and Milner, 1982] Luis Damas and Robin Milner. Principal type
schemes for functional programs. In Proceedings of the Ninth ACM Sym-
posium on Principles of Programming Languages, pages 207-212. ACM
Press, 1982.
[Elliott and Pfenning, 1991] Conal Elliott and Frank Pfenning. A semi-
functional implementation of a higher-order logic programming language.
In Peter Lee, editor, Topics in Advanced Language Implementation,
pages 289-325. MIT Press, 1991.
[Felty, 1993] Amy Felty. Implementing tactics and tacticals in a higher-
order logic programming language. Journal of Automated Reasoning,
11(1):43-81, August 1993.
[Felty and Miller, 1988] Amy Felty and Dale Miller. Specifying theorem
provers in a higher-order logic programming language. In Ninth Interna-
tional Conference on Automated Deduction, pages 61-80, Argonne, IL,
May 1988. Springer-Verlag.
[Gentzen, 1969] Gerhard Gentzen. Investigations into logical deduction.
In M. E. Szabo, editor, The Collected Papers of Gerhard Gentzen, pages
68-131. North-Holland, 1969.
[Girard et al., 1989] Jean-Yves Girard, Paul Taylor, and Yves Lafont.
Proofs and Types. Cambridge University Press, 1989.
[Goldfarb, 1981] Warren Goldfarb. The undecidability of the second-order
unification problem. Theoretical Computer Science, 13:225-230, 1981.
Contents
1 Introduction 592
1.1 Constraint languages 593
1.2 Logic Programming 595
1.3 CLP languages 596
1.4 Synopsis 598
1.5 Notation and terminology 599
2 Constraint domains 601
3 Logical semantics 608
4 Fixedpoint semantics 609
5 Top-down execution 611
6 Soundness and completeness results 615
7 Bottom-up execution 617
8 Concurrent constraint logic programming 619
9 Linguistic extensions 621
9.1 Shrinking the computation tree 621
9.2 Complex constraints 623
9.3 User-defined constraints 624
9.4 Negation 625
9.5 Preferred solutions 626
10 Algorithms for constraint solving 628
10.1 Incrementality 628
10.2 Satisfiability (non-incremental) 630
10.3 Satisfiability (incremental) 633
10.4 Entailment 637
10.5 Projection 640
10.6 Backtracking 643
11 Inference engine 645
11.1 Delaying/wakeup of goals and constraints .... 645
11.2 Abstract machine 651
1 Introduction
Constraint Logic Programming (CLP) began as a natural merger of two
declarative paradigms: constraint solving and logic programming. This
combination helps make CLP programs both expressive and flexible, and
in some cases, more efficient than other kinds of programs. Though a
relatively new field, CLP has progressed in several and quite different di-
rections. In particular, the early fundamental concepts have been adapted
to better serve in different areas of applications. In this survey of CLP,
a primary goal is to give a systematic description of the major trends in
terms of common fundamental concepts.
Consider first an example program in order to identify some crucial CLP
concepts. The program below defines the relation sumto(n, 1 + 2 + · · · + n)
for natural numbers n.
sumto(0, 0).
sumto(N, S) :- N >= 1, N <= S, sumto(N - 1, S - N).
The query S <= 3, sumto(N, S) gives rise to three answers (N = 0, S = 0),
(N = 1, S = 1), and (N = 2, S = 3), and terminates. The computation
sequence of states for the third answer, for example, is
S <= 3, sumto(N, S)
N = N1, S = S1, S <= 3, N1 >= 1, N1 <= S1, sumto(N1 - 1, S1 - N1)
N = N1, S = S1, S <= 3, N1 >= 1, N1 <= S1, N1 - 1 = N2, S1 - N1 = S2,
    N2 >= 1, N2 <= S2, sumto(N2 - 1, S2 - N2)
N = N1, S = S1, S <= 3, N1 >= 1, N1 <= S1, N1 - 1 = N2, S1 - N1 = S2,
    N2 >= 1, N2 <= S2, N2 - 1 = 0, S2 - N2 = 0
Prolog can be said to be a CLP language where the constraints are equa-
tions over the algebra of terms (also called the algebra of finite trees, or the
Herbrand domain). The equations are implicit in the use of unification2.
Almost every language we discuss incorporates Prolog-like terms in ad-
dition to other terms and constraints, so we will not discuss this aspect
further. Prolog II [Colmerauer, 1982a] employs equations and disequations
(≠) over rational trees (an extension of the finite trees of Prolog to cyclic
structures). It was the first logic language explicitly described as using
constraints [Colmerauer, 1983].
CLP(R) [Jaffar et al., 1992a] has linear arithmetic constraints and com-
putes over the real numbers. Nonlinear constraints are ignored (delayed)
until they become effectively linear. CHIP [Dincbas et al., 1988a] and Pro-
log III [Colmerauer, 1988] compute over several domains. Both compute
over Boolean domains: Prolog III over the well-known 2-valued Boolean
algebra, and CHIP over a larger Boolean algebra that contains symbolic
values. Both CHIP and Prolog III perform linear arithmetic over the ratio-
nal numbers. Separately (domains cannot be mixed), CHIP also performs
linear arithmetic over bounded subsets of the integers (known as "finite
domains"). Prolog III also computes over a domain of strings. There are
now several languages which compute over finite domains in the manner
of CHIP, including clp(FD) [Diaz and Codognet, 1993], Echidna [Havens et
al., 1992], and Flang [Mantsivoda, 1993]. cc(FD) [van Hentenryck et al.,
1993] is essentially a second-generation CHIP system.
LOGIN [Aït-Kaci and Nasr, 1986] and LIFE [Aït-Kaci and Podelski,
1993a] compute over an order-sorted domain of feature trees. This domain
provides a limited notion of object (in the object-oriented sense). The
languages support a term syntax which is not first-order, although every
term can be interpreted through first-order constraints. Unlike other CLP
languages/domains, Prolog-like trees are essentially part of this domain,
instead of being built on top of the domain. CIL [Mukai, 1987] computes
over a domain similar to feature trees.
BNR-Prolog [Older and Benhamou, 1993] computes over three domains:
the 2-valued Boolean algebra, finite domains, and arithmetic over the real
numbers. In contrast to other CLP languages over arithmetic domains,
it computes solutions numerically, instead of symbolically. Trilogy [Voda,
1988a; Voda, 1988b] computes over strings, integers, and real numbers.
Although its syntax is closer to that of C, 2LP [McAloon and Tretkoff,
1989] can be considered to be a CLP language permitting only a subset
of Horn clauses. It computes with linear constraints over integers and real
numbers.
CAL [Aiba et al., 1988] computes over two domains: the real num-
2. The language Absys [Elcock, 1990], which was very similar to Prolog, used equations
explicitly, making it more obviously a CLP language.
1.4 Synopsis
The remainder of this paper is organized into three main parts. In part
I, we provide a formal framework for CLP. Particular attention will be
paid to operational semantics and operational models. As we have seen
in examples, it is the operational interpretation of constraints, rather than
the declarative interpretation, which distinguishes CLP from LP. In part
II, algorithm and data structure considerations are discussed. A crucial
property of any CLP implementation is that its constraint handling algo-
rithms are incremental. In this light, we review several important solvers
and their algorithms for the satisfiability, entailment, and delaying of con-
straints. We will also discuss the requirements of an inference engine for
CLP. In part III, we consider CLP applications. In particular, we discuss
two rather different programming paradigms, one suited for the modelling
of complex problems, and one for the solution of combinatorial problems.
In this survey, we concentrate on the issues raised by the introduction
of constraints to LP. Consequently, we will ignore, or pass over quickly,
those issues inherent in LP. We assume the reader is somewhat familiar
with LP and basic first-order logic. Appropriate background can be ob-
tained from [Lloyd, 1987] for LP and [Shoenfield, 1967] for logic. For
introductory papers on constraint logic programming and CLP languages
we refer the reader to [Colmerauer, 1987; Colmerauer, 1990; Lassez, 1987;
Frühwirth et al., 1992]. For further reading on CLP, we suggest other
surveys [Cohen, 1990; van Hentenryck, 1991; van Hentenryck, 1992], some
collections of papers [Benhamou and Colmerauer, 1993; Kanellakis et al.,
to appear; van Hentenryck, 1993], and some books [van Hentenryck, 1989a;
Saraswat, 1989]. More generally, papers on CLP appear in various jour-
nals and conference proceedings devoted to computational logic, constraint
processing, or symbolic computation.
2 Constraint domains
For any signature Σ, let D be a Σ-structure (the domain of computation)
and L be a class of Σ-formulas (the constraints). We call the pair (D, L) a
constraint domain. In a slight abuse of notation we will sometimes denote
the constraint domain by D. We will make several assumptions, none of
which is strictly necessary, to simplify the exposition. We assume
• The terms and constraints in L come from a first-order language4.
• The binary predicate symbol = is contained in Σ and is interpreted
as identity in D5.
• There are constraints in L which are, respectively, identically true
and identically false in D.
• The class of constraints L is closed under variable renaming, conjunc-
tion and existential quantification.
We will denote the smallest set of constraints which satisfies these assump-
tions and contains all primitive constraints - the constraints generated by
the primitive constraints - by LΣ. In general, L may be strictly larger than
LΣ since, for example, universal quantifiers or disjunction are permitted in
L; it also may be smaller, as in Example 2.0.7 of Section 2 below. However,
we will usually take L = LΣ. On occasion we will consider an extension of
Σ and L, to Σ* and L* respectively, so that there is a constant in Σ* for
every element of D.
We now present some example constraint domains. In practice, these
are not always fully implemented, but we leave discussion of that until
later. Most general purpose CLP languages incorporate some arithmetic
domain, including BNR-Prolog [Older and Benhamou, 1993], CAL [Aiba
et al., 1988], CHIP [Dincbas et al., 1988a], CLP(R) [Jaffar et al., 1992a],
Prolog III [Colmerauer, 1988], RISC-CLP(Real) [Hong, 1993].
Example 2.0.1. Let Σ contain the constants 0 and 1, the binary function
symbols + and *, and the binary predicate symbols =, < and ≤. Let D
be the set of real numbers and let D interpret the symbols of Σ as usual
(i.e. + is interpreted as addition, etc). Let L be the constraints generated
by the primitive constraints. Then ℜ = (D, L) is the constraint domain
of arithmetic over the real numbers. If we omit from Σ the symbol *
then the corresponding constraint domain ℜLin = (D', L') is the constraint
domain of linear arithmetic over the real numbers. If the domain is further
4. Without this assumption, some of the results we cite are not applicable, since there
can be no appropriate first-order theory T. The remaining assumptions can be omitted,
at the expense of a messier reformulation of definitions and results.
5
This assumption is unnecessary when terms have a most general unifier in D, as
occurs in Prolog. Otherwise = is needed to express parameter passing.
⁸ A variant of this domain, with a slightly different signature, is used in [Smolka and
Treinen, 1992].
⁹ Only finitely many constants are used in any one program, so it can be argued
that a finite Boolean algebra is a more appropriate domain of computation. However,
the two alternatives agree on satisfiability and constraint entailment (although not if
an expanded language of constraints is permitted), and it is preferable to view the
constraint domain as independent of the program. Currently, it is not clear whether the
alternatives agree on other constraint operations.
¹⁰ It is also closely related to the model-theoretic properties that led to an interest in
Horn formulas [McKinsey, 1943; Horn, 1951].
3 Logical semantics
There are two common logical semantics of CLP programs over a constraint
domain (D, L). The first interprets a rule p(x) ← B as the formula
∀x ∀y (B → p(x))
where x ∪ y is the set of all free variables in the rule. The collection of all
such formulas corresponding to rules of P gives a theory also denoted by
P.
The second logical semantics associates a logic formula with each pred-
icate in Π. If the set of all rules of P with p in the head is
p(x) ← B1
p(x) ← B2
...
p(x) ← Bn
then the formula associated with p is
∀x ( p(x) ↔ ∃y1 B1 ∨ ··· ∨ ∃yn Bn )
where yi are the variables of Bi not occurring in x.
The collection of all such formulas is called the Clark completion of P, and
is denoted by P*.
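As a small illustration (our own example, not from the text), suppose p is defined over an arithmetic constraint domain by the two rules p(x) ← x ≥ 0 and p(x) ← x = y + 1, p(y). The formula contributed to P* is
∀x ( p(x) ↔ ( x ≥ 0 ∨ ∃y (x = y + 1 ∧ p(y)) ) ).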
A valuation v is a mapping from variables to D, and the natural ex-
tension which maps terms to D and formulas to closed L*-formulas. If
X is a set of facts then [X]_D = {v(a) | (a ← c) ∈ X, D ⊨ v(c)}. A
D-interpretation of a formula is an interpretation of the formula with the
same domain as D and the same interpretation for the symbols in Σ as D.
It can be represented as a subset of B_D, where B_D = {p(d) | p ∈ Π, d ∈ D^k}.
A D-model of a closed formula is a D-interpretation which is a model of
the formula.
Let T denote a satisfaction complete theory for (D, L). The usual
logical semantics are based on the D-models of P and the models of P*, T.
The least D-model of a formula Q under the subset ordering is denoted by
lm(Q, D), and the greatest is denoted by gm(Q, D). A solution to a query
G is a valuation v such that v(G) ⊆ lm(P, D).
4 Fixedpoint semantics
The fixedpoint semantics we present are based on one-step consequence
functions T_P^D and S_P^D, and the closure operator [P] generated by T_P^D.
The functions T_P^D and [P] map over D-interpretations. The set of D-
interpretations forms a complete lattice under the subset ordering, and
these functions are continuous on this lattice. The ordinal powers of a
continuous function f on this lattice are defined as usual:
f ↑ 0 = ∅
f ↑ (α + 1) = f(f ↑ α)
f ↑ β = ⋃_{α < β} f ↑ α    if β is a limit ordinal
[P](Q) = T_{P∪Q}^D ↑ ω
• D ⊨ P1 ↔ P2 iff [P1] = [P2]
We will need the following terminology later. P is said to be (D, L)-
canonical iff gfp(T_P^D) = T_P^D ↓ ω. Canonical logic programs, but not
constraint logic programs, were first studied in [Jaffar and Stuckey, 1986],
which showed that every logic program is equivalent (with respect to the
success and finite failure sets) to a canonical logic program. That proof was
not constructive, but subsequently [Wallace, 1989] provided an algorithm to
generate the canonical logic program.¹² Like many other kinds of results
in traditional logic programming, these results are likely to extend to CLP
in a straightforward way.
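To make the ordinal powers used above concrete, here is a small worked case (our own illustration, using the standard one-step consequence function over an arithmetic domain). For the program P consisting of p(0) ← and p(x) ← x = y + 1, p(y):
T_P^D ↑ 0 = ∅
T_P^D ↑ 1 = {p(0)}
T_P^D ↑ 2 = {p(0), p(1)}
...
T_P^D ↑ ω = {p(n) | n a natural number} = lfp(T_P^D) = lm(P, D).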
5 Top-down execution
The phrase "top-down execution" covers a multitude of operational mod-
els. We will present a fairly general framework for operational semantics
in which we can describe the operational semantics of some major CLP
systems.
We will present the operational semantics as a transition system on
states: tuples (A, C, S) where A is a multiset of atoms and constraints, and
C and S are multisets of constraints. The constraints C and S are referred
to as the constraint store and, in implementations, are acted upon by a
constraint solver. Intuitively, A is a collection of as-yet-unseen atoms and
constraints, C is the collection of constraints which are playing an active
role (or are awake), and S is a collection of constraints playing a passive
role (or are asleep). There is one other state, denoted by fail. To express
more details of an operational semantics, it can be necessary to represent
the collections of atoms and constraints more precisely. For example, to
express the left-to-right Prolog execution order we might use a sequence
of atoms rather than a multiset. However, we will not be concerned with
such details here.
We will assume as given a computation rule which selects a transition
type and an appropriate element of A (if necessary) for each state.¹³ The
transition system is also parameterized by a predicate consistent and a
function infer, which we will discuss later. An initial goal G for execution
is represented as a state by (G, ∅, ∅).
The transitions in the transition system are:
(A ∪ a, C, S) →_r (A ∪ B, C, S ∪ (a = h))
if a is selected by the computation rule, a is an atom, and h ← B is a rule
of P, renamed to new variables, such that h and a have the same predicate
symbol.
(A ∪ a, C, S) →_r fail
if a is selected by the computation rule, a is an atom and, for every rule
h ← B of P, h and a have different predicate symbols.
(A, C, S) →_s (A, C, S)
if consistent(C).
¹⁴ For example, in CLP(R), where linear constraints are active and non-linear con-
straints are passive, if S is y = x * x then we can take c to be y ≥ 2Kx − K², for any K.
There is no finite collection C′ of active constraints which implies all these constraints
and is not stronger than S.
(A, C, S) →_g (A, C′, ∅)
where C′ is a set of equations in L* such that D ⊨ C′ → (C ∧ S) and, for
every variable x occurring in C or S, C′ contains an equation x = d for
some constant d. Thus →_g grounds all variables in the constraint solver.
We also have the transitions
etc., most results (and even their proofs) lift from LP to CLP. The lifting
is discussed in greater detail in [Maher, 1993b]. Furthermore, most opera-
tional aspects of LP (and Prolog) can be interpreted as logical operations,
and consequently these operations (although not their implementations)
also lift to CLP. One early example is matching, which is used in various
LP systems (e.g. GHC, NU-Prolog) as a basis for affecting the computation
rule; the corresponding operation in CLP is constraint entailment [Maher,
1987].
The philosophy of the CLP Scheme [Jaffar and Lassez, 1987] gives pri-
macy to the structure D over which computation is performed, and less
prominence to the theory T. We have followed this approach. However,
it is also possible to start with a satisfaction complete theory T (see, for
example [Maher, 1987]) without reference to a structure. We can arbitrar-
ily choose a model of T as the structure D, and the same results apply.
Another variation [Höhfeld and Smolka, 1988] considers a collection 𝒟 of
structures and defines consistent(C) to hold iff for some structure D ∈ 𝒟
we have D ⊨ ∃ C. Weaker forms of the soundness and completeness of
successful derivations apply in this case.
7 Bottom-up execution
Bottom-up execution has its main use in database applications. The set-
at-a-time processing limits the number of accesses to secondary storage in
comparison to tuple-at-a-time processing (as in top-down execution), and
the simple semantics gives great scope for query optimization.
Bottom-up execution is also formalized as a transition system. For
every rule r of the form h ← c, b1, …, bn in P and every set A of facts
there is a transition
A ⇝ A ∪ {h ← c′ | ai ← ci, i = 1, …, n, are elements of A, and
D ⊨ c′ ↔ c ∧ ⋀_{i=1}^{n} (ci ∧ bi = ai)}
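For instance (our own illustration of a single transition), if P contains the rule p(x) ← x ≥ y, q(y) and A contains the fact q(z) ← z ≥ 0, then one transition adds to A the fact p(x) ← c′ where D ⊨ c′ ↔ (x ≥ y ∧ z ≥ 0 ∧ y = z); up to projection onto the head variable x, this new fact is equivalent to p(x) ← x ≥ 0.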
In brief, then, we have A ⇝ A ∪ S_r^D(A), for every set A and every rule r in P
(S_P^D was defined in Section 4). An execution is a sequence of transitions. It
is fair if each rule is applied infinitely often. The limit of ground instances
of sets generated by fair executions is independent of the order in which
transitions are applied, and is regarded as the result of the bottom-up
execution. If Q is an initial set of facts and P is a program, and A is the
result of a fair bottom-up execution then A = SS(P ∪ Q) = [P](Q) and
A ⇝ A ∪ reduce(S_r^D(A), A)
h ← ask : tell | B
¹⁵ Concurrent Prolog [Shapiro, 1983a] is not an ask-tell language, but [Saraswat, 1988]
shows how it can be fitted inside the CCLP framework.
9 Linguistic extensions
We discuss in this section some additional linguistic features for top-down
CLP languages.
9.1 Shrinking the computation tree
The aim of →_i transitions is to extract as much information as is reasonable
from the passive constraints, so that the branching of —>r transitions is
reduced. There are several other techniques, used or proposed, for achieving
this result.
In [Le Provost and Wallace, 1993] it is suggested that information can
also be extracted from the atoms in a state. The constraint extracted would
be an approximation of the answers to the atom. This operation can be
expressed by an additional transition rule:
the domain of each variable and the other (less accurately) approximat-
ing each constraint using the interval constraints for each variable. The
disjunction of these approximations is easily approximated by an active
constraint. For linear arithmetic [De Backer and Beringer, 1993] suggests
the use of the convex hull of the regions defined by the two constraints as
the approximation. Note that the constructive disjunction behavior could
be obtained from the clauses for p using the methods of [Le Provost and
Wallace, 1993].
In the second category, we mention two constructs used with the fi-
nite domain solver of CHIP. element(X, L, T) expresses that T is the
X-th element in the list L. Operationally, it allows constraints on ei-
ther the index X or element T of the list to be reflected by constraints
on the other. For example, if X is constrained so that X ∈ {1,3,5} then
element(X, [1,1,2,3,5,8], T) can constrain T so that T ∈ {1,2,5} and,
similarly, if T is constrained so that T ∈ {1,3,5} then X is constrained
so that X ∈ {1,2,4,5}. Declaratively, the cumulative constraint of [Ag-
goun and Beldiceanu, 1992] expresses a collection of linear inequalities on
its arguments. Several problems that can be expressed as integer pro-
gramming problems can be expressed with cumulative. Operationally, it
behaves somewhat differently from the way CHIP would usually treat the
inequalities.
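The index/element propagation described above can be reproduced, approximately, with the finite domain library of a modern Prolog system; the sketch below is ours, using SWI-Prolog's library(clpfd) syntax rather than CHIP's, and is intended only as an executable illustration.

:- use_module(library(clpfd)).

% Constrain the index X to {1,3,5}; element/3 should then restrict T
% to the values at those positions of [1,1,2,3,5,8], namely 1, 2 and 5.
index_element_demo(X, T) :-
    X in 1 \/ 3 \/ 5,
    element(X, [1,1,2,3,5,8], T).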
9.4 Negation
Treatments of negation in logic programming lift readily to constraint logic
programming, with only minor adjustments necessary. Indeed many of
the semantics for programs with negation are essentially propositional, be-
ing based upon the collection of ground instances of program rules. The
perfect model [Przymusinski, 1988; Apt et al., 1988; van Gelder, 1988],
well-founded model [van Gelder et al., 1988], stable model [Gelfond and
Lifschitz, 1988] and Fitting fixedpoint semantics [Fitting, 1986], to name
but a few, fall into this category. The grounding of variables in CLP rules
by all elements of the domain (i.e. by all terms in Σ*) and the deletion of
all grounded rules whose constraints evaluate to false produces the desired
propositional rules (see, for example, [Maher, 1993a]).
Other declarative semantics, based on Clark's completion P* of the
¹⁸ This approach has been called a 'glass-box' approach.
the above logical formulation [Fages, 1993; Marriott and Stuckey, 1993b],
although a special-purpose implementation is more efficient. [Marriott and
Stuckey, 1993b] gives a completeness result for such an implementation,
based on Kunen's semantics for negation.
A second approach is to admit constraints which are not required to
be satisfied by a solution, but express a preference for solutions which do
satisfy them. Such constraints are sometimes called soft constraints. The
most developed use of this approach is in hierarchical constraint logic pro-
gramming (HCLP) [Borning et al., 1989; Wilson and Borning, 1993]. In
HCLP, soft constraints have different strengths and the constraints accu-
mulated during a derivation form a constraint hierarchy based on these
strengths. There are many possible ways to compare solutions using these
constraint hierarchies [Borning et al., 1989; Maher and Stuckey, 1989;
Wilson and Borning, 1993], different methods being suitable for different
problems. The hierarchy dictates that any number of weak constraints
can be over-ruled by a stronger constraint. Thus, for example, default be-
havior can be expressed in a program by weak constraints, which will be
over-ruled by stronger constraints when non-default behavior is required.
The restriction to best solutions of a constraint hierarchy can be viewed as
a form of circumscription [Satoh and Aiba, 1993].
Each of the above approaches has some programming advantages over
the other, in certain applications, but both have problems as general-
purpose methods. While the first approach works well when there is a
natural choice of objective function suggested by the problem, in general
there is no natural choice. The second approach provides a higher-level
expression of preference but it cannot be so easily 'fine-tuned' and it can
produce an exponential number of best answers if not used carefully. The
approaches have the advantages and disadvantages of explicit (respectively,
implicit) representations of preference. In the first approach, it can be dif-
ficult to reflect intended preferences. In the second approach it is easier
to reflect intended preferences, but harder to detect inconsistency in these
preferences. It is also possible to 'weight' soft constraints, which provides
a combination of both approaches.
Implementation issues
The main innovation required to implement a CLP system is clearly in the
manipulation of constraints. Thus the main focus in this part of the survey
is on constraint solver operations, described in Section 10. Section 11 then
considers the problem of extending the LP inference engine to deal with
constraints. Here the discussion is not tied down to a particular constraint
domain.
It is important to note that the algorithms and data structures in this
part are presented in view of their use in top-down systems and, in partic-
ular, systems with backtracking. At the present, there is little experience
10.1 Incrementality
According to the folklore of CLP, algorithms for CLP implementations must
be incremental in order to be practical. However, this prescription is not
totally satisfactory, since the term incremental can be used in two different
senses. On one hand, incrementality is used to refer to the nature of the
algorithm. That is, an algorithm is incremental if it accumulates an internal
state and a new input is processed in combination with the internal state.
Such algorithms are sometimes called on-line algorithms. On the other
hand, incrementality is sometimes used to refer to the performance of the
algorithm. This section serves to clarify the latter notion of incrementality
as a prelude to our discussion of algorithms in the following subsections.
We do not, however, offer a formal definition of incrementality.
We begin by abstracting away the inference engine from the operational
semantics, to leave simply the constraint solver and its operations. We
consider the state of the constraint solver to consist of the constraint store
C, a collection of constraints G that are to be entailed, and some backtrack
points. In the initial state, denoted by ∅, there are no constraints and no
backtrack points. The constraint solver reacts to a sequence of operations,
and results in (a) a new state, and (b) a response.
Recall that the operations in CLP languages are:
• augment C with c to obtain a new store, determine whether the new
store is satisfiable, and if so, determine which constraints in G are
implied by the new store;
• add a new constraint to G;
• set a backtrack point (and associate with it the current state of the
system);
• backtrack to the previous backtrack point (i.e. return the state of the
system to that associated with the backtrack point);
• project C onto a fixed set of variables.
Only the first and last of these operations can produce a response from the
constraint solver.
Consider the application of a sequence of operations o1, …, ok on a
state A; denote the updated state by F(A, o1 … ok), and the sequence of
responses to the operations by Q(o1 … ok). In what follows we shall be
concerned with the average cost of computing F and Q. Using standard
definitions, this cost is parameterized by the distribution of (sequences of)
operations (see, for example, [Vitter and Flajolet, 1990]). We use average
cost assuming the true distribution, the distribution that reflects what
occurs in practice. Even though this distribution is almost always not
known, we often have some hypotheses about it. For example, one can
identify typical and often occurring operation sequences and hence can
approximate the true distribution accordingly. The informal definitions
below therefore are intended to be a guide, as opposed to a formal tool for
cost analysis.
For an expression exp(o) denoting a function of a sequence of operations o,
define AV[exp(o)] to be the average value of exp(o), over all sequences of
operations o. Note that the definition of average here is also dependent on
the distribution of the o. For example, let cost(o) denote the cost of
computing F(∅, o) by some algorithm, for each fixed sequence o. Then
AV[cost(o)] denotes the average cost of computing F(∅, o) over all o.
Let A be shorthand for F(∅, o1 … ok−1). Let 𝒜 denote an algorithm
which applies a sequence of operations on the initial state, giving the same
response as does the constraint solver, but not necessarily computing the
new state. That is, 𝒜 is the batch (or off-line) version of our constraint
solver. In what follows we discuss what it means for an algorithm to be
incremental relative to some algorithm 𝒜. Intuitively 𝒜 represents the best
available batch algorithm for the operations.
At one extreme, we consider that an algorithm for F and Q is 'non-
incremental' relative to 𝒜 if the average cost of applying an extra operation
ok to A is no better than the cost of the straightforward approach using 𝒜
on o1 … ok. We express this as
AV[cost(∅, o1 … ok−1) + cost(A, ok)] ≈ AV[cost_𝒜(o1 … ok)] + extra_cost(o1 … ok)
where the additional term extra_cost(o1 … ok) denotes the extra cost in-
curred by the on-line algorithm over the best batch algorithm. Therefore,
one possible "definition" of an incremental algorithm, good enough for use
in a CLP system, is simply that its extra-cost factor is negligible.
In what follows, we shall tacitly bear in mind this expression to ob-
tain a rough definition of incrementality.²¹ Although we have defined in-
crementality for a collection of operations, we will review the operations
individually, and discuss incrementality in isolation. This can sometimes
be an oversimplification; for example, [Mannila and Ukkonen, 1986] has
shown that the standard unification problem does not remain linear when
backtracking is considered. In general, however, it is simply too complex,
in a survey article, to do otherwise.
complexity. For the more general domain ℜ_Lin, polynomial time algo-
rithms are also known [Khachian, 1979], but these algorithms are not used
in practical CLP systems. Instead, the Simplex algorithm (see e.g. [Chvatal,
1983]), despite its exponential time worst case complexity [Klee and Minty,
1972], is used as a basis for the algorithm. However, since the Simplex
algorithm works over non-negative numbers and non-strict inequalities, it
must be extended for use in CLP systems. While such an extension is
straightforward in principle, implementations must be carefully engineered
to avoid significant overhead. The main differences between the Simplex-
based solvers in CLP systems are in the specific realization of this basic
algorithm. For example, the CLP(R) system uses a floating-point repre-
sentation of numbers, whereas the solvers of CHIP and Prolog III use exact
precision rational number arithmetic. As another example, in the CLP(R)
system a major design decision was to separately deal with equations and
inequalities, enjoying a faster (Gaussian-elimination based) algorithm for
equations, but enduring a cost for communication between the two kinds
of algorithms [Jaffar et al., 1992a]. Some elements of the CHIP solver are
described in [van Hentenryck and Graf, 1991]. Disequality constraints can
be handled using entailment of the corresponding equation (discussed in
Section 10.4) since an independence of negative constraints holds [Lassez
and McAloon, 1992].
For the domain of word equations WE, an algorithm is known [Makanin,
1977] but no efficient algorithm is known. In fact, the general problem,
though easily proved to be NP-hard, is not known to be in NP. The
most efficient algorithm known still has the basic structure of the Makanin
algorithm but uses a far better bound for termination [Koscielski and Pa-
cholski, 1992]. Systems using word equations, Prolog III for example, thus
resort to partial constraint solving using a standard delay technique on the
lengths of word variables. Rajasekar's 'string logic programs' [Rajasekar,
1993] also use a partial solution of word equations. First, solutions are
found for equations over the lengths of the word variables appearing in the
constraint; only then is the word equation solved.
As with word equations, the satisfiability problem in finite domains
such as FD is almost always NP-hard. Partial constraint solving is once
again required, and here is a typical approach. Attach to each variable x a
data structure representing dom(x), its current possible values.²³ Clearly
dom(x) should be a superset of the projection space w.r.t. x. Define min(x)
and max(x) to be the smallest and largest numbers in dom(x) respectively.
Now, assume that every constraint is written so that each inequality is
of the form x < y or x ≤ y, each disequality is of the form x ≠ y, and each
equation is of the form x = n, x = y or x = y + z, where x, y, z are variables
²³ The choice of such a data structure should depend on the size of the finite domains.
For example, with small domains a characteristic vector is appropriate.
²⁴ In this case, simply remove from dom(x) all elements bigger than max(y), and
remove from dom(y) all elements smaller than min(x). We omit the details of similar
operations in the following discussion.
²⁵ If both are singletons, clearly the constraint reduces to either true or false.
The first two kinds of solved form above are also examples of solution
forms, that is, a format in which the set of all solutions of the constraints
is evident. Here, any instance of the variables y determines values for x
and thus gives one solution. The set of all such instances gives the set of
all solutions. The Simplex format, however, is not in solution form: each
choice of basis variables depicts just one particular solution.
An important property of solution forms (and sometimes of just solved
forms) is that they define a convenient representation of the projection of
the solution space with respect to any set of variables. More specifically,
each variable can be equated with a substitution expression containing only
parametric variables, that is, variables whose projections are the entire
space. This property, in turn, aids incrementality as we now show via our
sample domains.
In each of the following examples, let C be a (satisfiable) constraint in
solved form and let c be the new constraint at hand. For FT, the substitu-
tion expression for a variable x is simply x if x is not eliminable; otherwise
it is the expression equated to x in the solved form C. This mapping is
generalized to terms in the obvious way. Similarly we can define a map-
ping of linear expressions by replacing the eliminable variables therein with
their substitution expressions, and then collecting like terms. For the do-
main ℜ_Lin, in which case C is in Simplex form, the substitution expression
for a variable x is simply x if x is not basic; otherwise it is the expression
obtained by writing the (unique) equation in C containing x with x as the
subject. Once again, this mapping can be generalized to any linear expres-
sion in an obvious way. In summary, a solution form defines a mapping
θ which can be used to map any expression t into an equivalent form tθ
which is free of eliminable variables.
The basic step of a satisfiability algorithm using a solution form is
essentially this.
Algorithm 10.3.1. Given C: (a) Replace the newly considered constraint
c by cθ, where θ is the substitution defined by C. (b) Then write cθ into
equations of the form x = …; this involves choosing the x and rear-
ranging terms. Unsatisfiability is detected at this stage. (c) If the previous
step succeeds, use the new equations to substitute out all occurrences of x
in C. (d) Finally, simply add the new equations to C, to obtain a solution
form for C ∧ c.
Note that the nonappearance of eliminable variables in substitution
expressions is needed in (b) to ensure that the new equations themselves
are in solved form, and in (c) to ensure that C, augmented with the new
equations, remains in solution form.
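As a concrete instance of these steps (our own worked example, in ℜ_LinEqn): let the solution form be C = {x = z + 3, y = z + 2}, with x and y eliminable and z parametric, and let the new constraint be c : x + y = 9. Step (a) gives cθ : (z + 3) + (z + 2) = 9, that is, 2z + 5 = 9; step (b) rewrites this as the new equation z = 2 (no unsatisfiability arises); step (c) substitutes z out of C, giving x = 5 and y = 4; and step (d) adds the new equation, so that {x = 5, y = 4, z = 2} is a solution form for C ∧ c.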
The belief that this methodology leads to an incremental algorithm is
based upon believing that the cost of dealing with c is more closely related
to the size of c (which is small on average) than that of C (which is very
xn ∧ g(x1, …, xn−1) ⊕ h(x1, …, xn−1) = 0
so that the problem for t = 0 can be reduced to that of
¬g(x1, …, xn−1) ∧ h(x1, …, xn−1) = 0. Then
xn = h(x1, …, xn−1) ⊕ ¬g(x1, …, xn−1) ∧ yn,
where yn is a new variable, describes all the possible solutions for xn.
This reduction clearly can be repeatedly applied until we are left with the
straightforward problem of deciding the satisfiability of equations of the
form t ∧ x ⊕ u = 0 where t and u are ground. The unifier desired is given
simply by collecting (and substituting out all assigned variables in) the
assignments, such as that for xn above.
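As a tiny worked case of this reduction (our own illustration), consider solving x ∧ ¬y = 0, which in ⊕-form is (y ∧ x) ⊕ x = 0; taking y as the variable to be eliminated gives g = x and h = x. The reduced problem ¬g ∧ h = ¬x ∧ x = 0 holds identically, so the equation is satisfiable for every x, and
y = x ⊕ (¬x ∧ y′),
with y′ a new variable, describes all solutions: if x = 1 then y = 1, while if x = 0 then y is arbitrary.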
The key efficiency problem here is, of course, that the variable elimina-
tion process gives rise to larger expressions, an increase which is exponential
in the number of eliminated variables, in the worst case. So even though
this algorithm satisfies the structure of Algorithm 10.3.1, it does not satisfy
our assumption about the size of expressions obtained after substitution,
and hence our general argument for incrementality does not apply here.
Despite this, and the fact that Boole's work dates far back, this method is
still used, for example, in CHIP [Büttner and Simonis, 1987].
Another unification algorithm is due to Löwenheim, and we adapt the
presentation of [Martin and Nipkow, 1989] here. Let f(x1, …, xn) = 0 be
the equation considered. Let a denote a solution. The unifier is then simply
given by
10.4 Entailment
Given satisfiable C, guard constraints G such that no constraint therein is
entailed by C, and a new constraint c, the problem at hand is to determine
the subset G1 of G of constraints entailed by C ∧ c. We will also consider
the problem of detecting groundness which is not, strictly speaking, an
entailment problem. However, it is essentially the same as the problem of
detecting groundness to a specific value, which is an entailment problem.
In what follows the distinction is unimportant.
²⁶ As mentioned above, this discussion will essentially apply to guard constraints of
the form ground(x).
tions and disequations (≠) are considered constraints. This algorithm has a
rather different basis than those discussed above; it involves memorization
of pairs of terms (entailments and disequations) and the use of a reduction
of disequation entailment to several equation entailments.
For ℜ_LinEqn, let G contain arbitrary equations e. Add to the algorithm
which constructs the solved form a representation of each such equation e in
which all eliminable variables are substituted out. Note, however, that even
though these equations are stored with the other equations in the constraint
store, they are considered as a distinct collection, and they play no direct
role in the question of satisfiability of the current store. For example, a
constraint store containing x = z + 3, y = z + 2 would cause the guard
equation y + z = 4 to be represented as z = 1. It is easy to show that
a guard equation e is entailed iff its representation reduces to the trivial
form 0 = 0, and similarly, the equation is refuted if its representation is of
the form 0 = n where n is a nonzero number. (In our example, the guard
equation is entailed or refuted just in case z becomes ground.) In order to
have incrementality we must argue that the substitution operation is often
applied only to very few of the guard constraints. This is tantamount to
the second assumption made to argue the incrementality of Algorithm 10.3.1.
Hence we believe our algorithm is incremental.
We move now to the domain ℜ_Lin, but allow only equations in the guard
constraints G. Here we can proceed as in the above discussion for ℜ_LinEqn
to obtain an incremental algorithm, but we will have the further require-
ment that the constraint store contains all implicit equalities²⁷ explicitly
represented as equations. It is then still easy to show that the entailment
of a guard equation e need be checked only when the representation of e
is trivial. The argument for incrementality given above for ℜ_LinEqn essen-
tially holds here, provided that the cost of computing implicit equalities is
sufficiently low.
There are two main works on the detection of implicit equalities in CLP
systems over ℜ_Lin. In [Stuckey, 1989], the existence of implicit equalities is
detected by the appearance of an equation of a special kind in the Simplex
tableau at the end of the satisfiability checking process. Such an equation
indicates some of the implicit equalities, but more pivoting (which, in turn,
can give rise to more special equations) is generally required to find all of
them. An important characteristic of this algorithm is that the extra cost
incurred is proportional to the number of implicit equalities. This method
is used in CLP(R) and Prolog III. CHIP uses a method based on [van
Hentenryck and Graf, 1991]. In this method, a solved form which is more
restrictive than the usual Simplex solved form is used. An equation in this
²⁷ These are equalities which are entailed by the store because of the presence of
inequalities. For example, the constraint store x + y ≤ 3, x + y ≥ 3 entails the implicit
equality x + y = 3.
form does not contribute to any implicit equality, and a whole tableau in
this solved form implies that there are no implicit equalities. The basic
idea is then to maintain the constraints in the solved form and when a
new constraint is encountered, the search for implicit equalities can be first
limited to variables in the new constraint. One added feature of this solved
form is that it directly accommodates strict inequalities and disequations.
Next still consider the domain ℜ_Lin, but now allow inequalities to be
in G. Here it is not clear how to represent a guard inequality, say x > 5, in
such a way that its entailment or refutation is detectable by some simple
format in its representation. Using the Simplex tableau format as a solved
form as discussed above, and using the same intuition as in the discussion
of guard equations, we could substitute out x in x > 5 in case x is basic.
However, it is not clear to which format(s) we should limit the resulting
expression in order to avoid explicitly checking whether x > 5 is entailed.²⁸
Thus an incremental algorithm for checking the entailment of inequalities
is yet to be found.
For BOOL there seems to be a similar problem in detecting the entail-
ment of Boolean constraints. However, in the case of groundness entailment
some of the algorithms we have previously discussed are potentially incre-
mental. The Prolog III algorithm, in fact, is designed with the detection of
groundness as a criterion. The algorithm represents explicitly all variables
that are grounded by the constraints. The Groebner basis algorithm will
also contain in its basis an explicit representation of grounded variables.
Finally, for the unification algorithms, the issue is clearly the form of the
unifier. If the unifier is in fully simplified form then every ground variable
will be associated with a ground value.
In summary for this subsection, the problem of detecting entailment is
not limited just to the cost of determining if a particular constraint is en-
tailed. Incrementality is crucial, and this property can be defined roughly as
limiting the cost to depend on the number of guard constraints affected by
each change to the store. In particular, dealing (even briefly) with the entire
collection of guard constraints each time the store changes is unacceptable.
Below, in Section 11.1, an issue related to entailment is taken up. Here
we have focused on how to adapt the underlying satisfiability algorithm
to be incremental for determining entailment. There we will consider the
generic problem, independent of the constraint domain, of managing de-
layed goals which awake when certain constraints become entailed.
10.5 Projection
The problem at hand is to obtain a useful representation of the projection
of constraints C w.r.t. a given set of variables. More formally, the problem
²⁸ And this can, of course, be done, perhaps even efficiently, but the crucial point is,
once again, that we cannot afford to do this every time the store changes.
is: given target variables x and constraints C(x, y) involving variables from
x and y, express ∃y C(x, y) in the most usable form. While we cannot de-
fine usability formally, it typically means both conciseness and readability.
An important area of use is the output phase of a CLP system: the desired
output from running a goal is the projection of the answer constraints with
respect to the goal variables. Here it is often useful to have only the tar-
get variables output (though, depending on the domain, this is not always
possible). For example, the output of x = z + 1, y = z + 2 w.r.t. x and
y should be x = y − 1 or some rearrangement of this, but it should not
involve any other variable. Another area of use is in meta-programming
where a description of the current store may be wanted for further manip-
ulation. For example, projecting ℜ_Lin constraints onto a single variable x
can show if x is bounded, and if so, this bound can be used in the pro-
gram. Projection also provides the logical basis for eliminating variables
from the accumulated set of constraints, once it is known that they will
not be referred to again.
There are few general principles that guide the design of projection
algorithms across the various constraint domains. The primary reason is,
of course, that these algorithms have to be intimately related to the do-
main at hand. We therefore will simply resort to briefly mentioning existing
approaches for some of our sample domains.
The projection problem is particularly simple for the domain FT: the
result of projection is x = xθ where θ is the substitution obtained from the
solved form of C. Now, we have described above that this solved form is
simply the mgu of C, that is, equations whose r.h.s. does not contain any
variable on the l.h.s. For example, x = f(y), y = f(z) would have the
solved form x = f(f(z)), y = f(z). However, the equations x = f(y), y =
f(z) are more efficiently stored internally as they are (and this is done
in actual implementations). The solved form for x therefore is obtained
only when needed (during unification for example) by fully dereferencing
y in the term f(y). A direct representation of the projection of C on a
variable x, as required in a printout for example, can be exponential in
the size of C. This happens, for example, if C is of the form x = f(x1, x1),
x1 = f(x2, x2), …, xn = f(a, a), because xθ would contain 2^(n+1) occurrences
of the constant a. A solution would be to present the several equations
equivalent to x = xθ, such as the n + 1 equations in this example. This
however is a less explicit representation of the projection; for example, it
would not always be obvious if a variable were ground.
Projection in the domain RT can be done by simply presenting those
equations whose l.h.s. is a target variable and, recursively, all equations
whose l.h.s. appears in anything already presented. Such a straightforward
presentation is in general not the most compact. For example, the equation
x = f(f(x, x), f(x, x)) is best presented as x = f(x, x). In general, the
problem of finding the most compact representation is roughly equivalent to
For ℜ_LinEqn the problem is only slightly more complicated. Recall that
equations are maintained in parametric form, with eliminable and para-
metric variables. A relatively simple algorithm can be obtained by using a
form of Gaussian elimination, and is informally described in Figure 1. It
assumes there is some ordering on variables, and ensures that lower pri-
ority variables are represented in terms of higher priority variables. This
ordering is arbitrary, except for the fact that the target variables should
be of higher priority than other variables. We remark that a crucial point
for efficiency is that the main loop in Figure 1 iterates n times, and this
number (the number of target variables) is often far smaller than the total
number of variables in the system. More details on this algorithm can be
found in [Jaffar et al., 1993].
For ℜ_Lin, there is a relatively simple projection algorithm. Assume all
inequalities are written in a standard form … ≤ 0. Let C⁺ (C⁻) denote the
subset of constraints of C in which x has only positive (negative) coefficients.
Let C⁰ denote those inequalities in C not containing x at all. We can now
describe an algorithm, due to Fourier [Fourier, 1824], which eliminates a
variable x from a given C. If constraints c and c′ have a positive and a nega-
tive coefficient of x, we can define elim_x(c, c′) to be a linear combination of
c and c′ which does not contain x.²⁹ A Fourier step eliminates x from a set
of constraints C by computing F_x(C) = C⁰ ∪ {elim_x(c, c′) : c ∈ C⁺, c′ ∈ C⁻}.
It is easy to show that ∃x C ↔ F_x(C). Clearly repeated applications of F
²⁹ Obtained, for example, by multiplying c by 1/m and c′ by (−1/m′), where m and m′
are the coefficients of x in c and c′ respectively, and then adding the resulting inequalities
together.
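As a small worked instance (our own illustration of a single Fourier step), take C = {x − y ≤ 0, −x + z ≤ 0, y − 4 ≤ 0} and eliminate x. Then C⁺ = {x − y ≤ 0}, C⁻ = {−x + z ≤ 0} and C⁰ = {y − 4 ≤ 0}; the only combination is elim_x(x − y ≤ 0, −x + z ≤ 0) = z − y ≤ 0, so F_x(C) = {y − 4 ≤ 0, z − y ≤ 0}. Indeed a value for x exists exactly when z ≤ y, since any x with z ≤ x ≤ y will do.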
10.6 Backtracking
The issue here is to restore the state of the constraint solver to a previ-
ous state (or, at least, an equivalent state). The most common technique,
following Prolog, is the trailing of constraints when they are modified by
the constraint solver and the restoration of these constraints upon back-
tracking. In Prolog, constraints are equations between terms, represented
internally as bindings of variables. Since variables are implemented as point-
ers to their bound values,³¹ backtracking can be facilitated by the simple
mechanism of an untagged trail [Warren, 1983; Aït-Kaci, 1991]. This iden-
tifies the set of variables which have been bound since the last choice point.
Upon backtracking, these variables are simply reset to become unbound.
³⁰ A constraint c ∈ C is redundant in C if C ↔ C − {c}.
³¹ Recall that this means that eliminable variables are not explicitly dereferenced on
the r.h.s. of the equations in the solved form.
11 Inference engine
This section deals with extensions to the basic inference engine for logic pro-
gramming needed because of constraints. What follows contains two main
sections. In the first, we consider the problem of an incremental algorithm
to manage a collection of delayed goals and constraints. This problem, dis-
cussed independently of the particular constraint domain at hand, reduces
to the problem of determining which of a given set of guard constraints (cf.
Section 9) are affected as a result of change to the constraint store. The
next section discusses extensions to the WAM, in both the design of the
instruction set as well as in the main elements of the runtime structure.
Finally, we give a brief discussion of work on parallel implementations.
³⁵ These are templates for the guard constraints.
pow(X, Y, Z) :-
    Y = 1 | X = 1.
pow(X, Y, Z) :-
    ground(X), X ≠ 0, ground(Y), Y ≠ 1 | Z = log(X)/log(Y).
pow(X, Y, Z) :-
    ground(X), X ≠ 0, ground(Z) | Y = X^(1/Z).
pow(X, Y, Z) :-
    ground(Y), Y ≠ 1, ground(Z) | X = Y^Z.
pow(X, Y, Z) :-
    X = 0 | Y = 0, Z ≠ 0.
pow(X, Y, Z) :-
    Z = 0 | X = 1.
pow(X, Y, Z) :-
    Z = 1 | X = Y.
This program could be compiled into the wakeup system in Figure 2, where
the three intermediate nodes reflect subexpressions in the guards that might
be entailed without the entire guard being entailed. (More precisely, several
woken nodes would be used, one for each clause body.) Thus wakeup sys-
tems express a central part of the implementation of (flat) guarded clauses.
Since a guarded atom can be viewed as a one-clause guarded clause pro-
gram for an anonymous predicate, wakeup systems are also applicable to
implementing these constructs.
11.1.2 Runtime structure
Here we present an implementational framework in the context of a given
wakeup system. There are three major operations with delayed goals or
delayed constraints which correspond to the actions of delaying, awakening
and backtracking:
1. adding a goal or delayed constraint to the current collection;
2. awakening a delayed goal or delayed constraint as the result of in-
putting a new (active) constraint, and
3. restoring the entire runtime structure to a previous state, that is,
restoring the collection of delayed goals and delayed constraints to
some earlier collection, and restoring all auxiliary structures accord-
ingly.
In what follows, we concentrate on delayed constraints; as mentioned above,
the constraint solver operations to handle delayed goals and guarded clauses
are essentially the same.
The first of our two major structures is a stack containing the delayed
constraints. Thus implementing operation 1 simply requires a push opera-
tion. Additionally, the stack contains constraints which are newer forms of
(a) Pop the stack, and let C denote the constraint just popped. (b)
Delete all occurrence nodes pointed to by C. If there is no pointer
from C (and so it was a constraint that was newly delayed) to another
constraint deeper in the stack, then nothing more need be done. (c)
If there is a pointer from C to another constraint C′ (and so C is
the reduced form of C′), then perform the modifications to the access
structure as though C′ were being pushed onto the stack. These mod-
ifications, described above, involve computing the guards pertinent to
C′, inserting occurrence nodes, and setting up reverse pointers.
Note that the index structure obtained in backtracking may not be
structurally the same as that of the previous state. What is important,
however, is that it depicts the same logical structure as that of the
previous state.
Figure 3 illustrates the entire runtime structure after the two constraints
pow(x,y,z) and pow(y,x,y) are stored, in this order. Figure 4 illustrates
the structure after a new input constraint makes x = 5 entailed.
[Figures 3 and 4 are not reproduced here; they show the delayed constraints pow(x,y,z) and pow(y,x,y) on the stack, with occurrence nodes linking them to guard nodes such as x = #, y = #, z = #, x = 0, y = 1, z = 0, z = 1 and # ≠ 0.]
initpf 5              accumulator: 5
addpf 1, X            accumulator: 5 + X
addpf -1, Y           accumulator: 1.86 + X - Z
solve_eq0             solve: 1.86 + X - Z = 0

initpf -5             accumulator: -5
addpf -1, Y           accumulator: -1.86 + Z
solve_no_fail_eq X    add: X = -1.86 + Z
In summary for this special case, for CLP systems in general, we often en-
counter constraints which can be organized into a form such that its consis-
tency with the store is obvious. This typically happens when a new variable
appears in an equation, for example, and new variables are often created
in CLP systems. Thus the instructions of the form solve_no_fail_xxx are
justified.
Next consider the special case 2, and the following example CLP(R)
program.
sum(0, 0).
sum(N, X) :-
    N >= 1,
    N1 = N - 1,
    X1 = X - N,
    sum(N1, X1).
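As a quick check (a hypothetical query, assuming the usual CLP(R) arithmetic), the program adds the first N natural numbers, so that, for instance,

?- sum(3, X).
X = 6

since 6 = 3 + 2 + 1 + 0.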
solve_no_fail_eq X1
(2) initpf -6
    addpf_and_delete 1, X1
    solve_no_fail_eq X1'
(3) initpf -5
    addpf_and_delete 1, X1'
    solve_no_fail_eq X1''
Note that a different set of instructions is required for the first equation
from that required for the remaining equations. Hence the first iteration
needs to be unrolled to produce the most efficient code. The main challenge
for this special case is, as in special case 2, the detection of the special
constraints. We now address this issue.
11.2.2 Techniques for CLP program analysis
The kinds of program analysis required to utilize the specialized instruc-
tions include those techniques developed for Prolog, most prominently, de-
tecting special cases of unification and deterministic predicates. Algorithms
for such analysis have become familiar; see [Debray, 1989a; Debray, 1989b]
for example. See [Garcia and Hermenegildo, 1993], for example, for a de-
scription of how to extend the general techniques of abstract interpretation
applicable in LP to CLP. Our considerations above, however, require rather
specific kinds of analyses.
Detecting redundant variables and future redundant constraints can in
fact be done without dataflow analysis. One simple method involves unfold-
ing the predicate definition (and typically once is enough), and then, in the
case of detecting redundant variables, simply inspecting where variables
occur last in the unfolded definitions. For detecting a future redundant
constraint, the essential step is determining whether the constraints in an
unfolded predicate definition imply the constraint being analyzed.
An early work describing these kinds of optimizations is [Jorgensen et
al., 1991], and some further discussion can also be found in [Jaffar et al.,
1992b]. The latter first described the abstract machine CLAM for CLP(R),
and the former first defined and examined the problem of our special case
2, that of detecting and exploiting the existence of future redundant con-
straints in CLP(R). More recently, [McDonald et al., 1993] reported new
algorithms for the problem of special case 3, that of detecting redundant
variables in CLP(R). The work [Marriott and Stuckey, 1993a] describes,
in a more general setting, a collection of techniques (entitled refinement,
removal and reordering) for optimization in CLP systems. See also [Mar-
riott et al., 1994] for an overview of the status of CLP(R) optimization and
[Michaylov, 1992; Yap, 1994] for detailed empirical results.
Despite the potential of optimization as reported in these works, the
lack of (full) implementations leaves open the practicality of using these and
other sophisticated optimization techniques for CLP systems in general.
Solver identifiers
It is often necessary to have a way to index from a variable to the
constraints it is involved in. Since the WAM structure provides stack
locations for the dynamically created variables, it remains just to have
a tag and value structure to respectively (a) identify the variable as a
solver variable, and (b) access the constraint(s) associated with this
variable. Note that the basic unification algorithm, assuming functors
are used in the constraint system, needs to be augmented to deal with
this new type.
Tagged trail
As mentioned in Section 10.6, the trail in the WAM merely consists
of a stack of addresses to be reset on backtracking. In CLP systems
in general, the trail is also used to store changes to constraints. Hence
a tagged value trail is required. The tags specify what operation is
to be reversed, and the value component, if present, contains any old
data to be restored.
Time-stamped data structures
Time stamps have been briefly discussed in Section 10.6. The basic
idea here is that the data structure representing a constraint may go
through several changes without there being a new choice point en-
countered during this activity. Clearly only one state of the structure
need be trailed for each choice point.
Constraint accumulator
A constraint is typically built up using a basic instruction repeatedly,
for example, the addpf instruction in CLP(R). During this process,
the partially constructed constraint is represented in an accumulator.
One of the solve instructions then passes the constraint to the solver.
We can think of this linear form accumulator as a generalization of
the accumulator in classical computer architectures, accumulating a
partially constructed constraint instead of a number.
circuit(resistor(R), V, I) :- V = I * R.
circuit(series(N1, N2), V, I) :-
    I = I1,
    I = I2,
    V = V1 + V2,
    circuit(N1, V1, I1),
    circuit(N2, V2, I2).
circuit(parallel(N1, N2), V, I) :-
    V = V1, V = V2,
    I = I1 + I2,
    circuit(N1, V1, I1),
    circuit(N2, V2, I2).
For example, the query
?- circuit(series(series(resistor(R), resistor(R)),
                  resistor(R)), V, 5)
asks for the voltage value if a current value of 5 is flowing through a net-
work containing just three identical resistors in series. (The answer is R =
0.0666667*V.) Additional rules can be added for other devices. For exam-
ple, the piece-wise linear model of a diode described by the voltage-current
relationship
I = 10V + 1000     if V < -100
    0.001V         if -100 ≤ V ≤ 0.6
    100V - 60      if V > 0.6
is captured by the rules:
circuit(diode, V, 10 * V + 1000) :- V < -100.
circuit(diode, V, 0.0001 * V) :- -100 <= V, V <= 0.6.
circuit(diode, V, 100 * V - 60) :- V > 0.6.
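For instance (a hypothetical query against the rules above), a forward-biased diode at 1 volt:

?- circuit(diode, 1, I).
I = 40

using the third rule, since 1 > 0.6 and 100*1 - 60 = 40.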
This basic idea can be extended to model AC networks. For example, sup-
pose we wish to reason about an RLC network in steady-state. First, we
dispense with complex numbers by representing X + iY as a CLP(R) term
c(X, Y), and use:
c_equal(c(Re, Im), c(Re, Im)).
c_add(c(Re1, Im1), c(Re2, Im2), c(Re1 + Re2, Im1 + Im2)).
c_mult(c(Re1, Im1), c(Re2, Im2), c(Re3, Im3)) :-
    Re3 = Re1 * Re2 - Im1 * Im2,
    Im3 = Re1 * Im2 + Re2 * Im1.
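As a quick check of this encoding (a hypothetical query, not from the text), multiplying i by i should give -1:

?- c_mult(c(0, 1), c(0, 1), Z).
Z = c(-1, 0)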
costing $800 which gives the right to purchase 100 shares at $50 per share
within some period of time. This call option can be sold at the current
market price, or exercised at a cost of $5000. Now if the price of the share
is $60, then the option may be exercised to obtain a profit of $10 per share;
taking the price of the option into account, the net gain is $200. After
the specified period, the call option, if not exercised, becomes worthless.
Figure 5 shows payoff diagrams which are a simple model of the relationship
between the value of a call option and the share price. Sell options have
similar diagrams. Note that c denotes the cost of the option and x the
exercise price.
h(x, y) = 0 if x > y, 1 otherwise,   and   r(x, y) = 0 if x > y, y − x otherwise.
The payoff function for call and put options can now be described by the
following matrix product which creates a linear piecewise function:
payoff = [h1, h2, r1, r2] × [h(b1, s), h(b2, s), r(b1, s), r(b2, s)]ᵀ
where s is the share price, bi is either the strike price or 0, and hi and ri are
multipliers of the Heaviside and ramp functions. In the following program,
the variables S, X, R respectively denote the stock price, the exercise price
and the interest rate.
h(X, Y, Z) :- Y < X, Z = 0.
h(X, Y, Z) :- Y >= X, Z = 1.
r(X, Y, Z) :- Y < X, Z = 0.
r(X, Y, Z) :- Y >= X, Z = Y - X.
value(Type, Buy_or_Sell, S, C, P, R, X, B, Payoff) :-
sign(Buy_or_Sell, Sign),
    data(Type, S, C, P, R, X, B, B1, B2, H1, H2, R1, R2),
    h(B1, S, T1), h(B2, S, T2), r(B1, S, T3),
    r(B2, S, T4),
    Payoff = Sign*(H1*T1 + H2*T2 + R1*T3 + R2*T4).
The parameters for the piecewise functions can be expressed symbolically
in the following tables, implemented simply as CLP facts.
sign(buy, -1).
sign(sell, 1).
data(stock, S, C, P, R, X, B, 0, 0, S*R, 0, -1, 0).
data(call, S, C, P, R, X, B, 0, X, C*R, 0, 0, -1).
data(put, S, C, P, R, X, B, 0, X, P*R-X, 0, 1, -1).
data(bond, S, C, P, R, X, B, 0, 0, B*R, 0, 0, 0).
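To see the encoding at work, here is a hypothetical query of our own (the figures are illustrative only): buying a call option with strike price 50 and cost 8, with interest factor 1.05, when the share price is 60.

?- value(call, buy, 60, 8, _, 1.05, 50, _, Payoff).
Payoff = 1.6

The option pays r(50, 60) = 10 while its cost term is 8 * 1.05 = 8.4, so the net payoff is 1.6.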
This program forms the basis for evaluating option combinations. The fol-
lowing direct query evaluates the sale of a call option that expires in-the-
money:³⁸
³⁸ That is, when the strike price is less than the share price.
The above is just a brief overview of the core ideas behind the work in
[Lassez et al., 1987]. Among the important aspects that are omitted are
consideration of option pricing models, and details of implementing the de-
³⁹ We will use ';' to separate different sets of answer constraints in the output.
cision support system OTAS [Huynh and Lassez, 1988]. As in the circuit
modelling application described above, the advantages of using CLP(R)
here are that the program is concise and that the query language is expres-
sive.
⁴⁰ We assume that time is modelled by the integers.
[Baudinet et al., 1993] surveys work in this area using an integer model of
time.
∀i ∃j : A1 ··· Ai = D1 ··· Dj and
∀i ∃j : B1 ··· Bi = D1 ··· Dj, and conversely.
Let ai denote the length of Ai, and similarly for bi and di. Let āi denote the
subsequence (a1, …, ai), 1 ≤ i ≤ N; similarly define b̄i and d̄i. The prob-
lem at hand now can be stated as: given the multisets a = {a1, …, aN},
b = {b1, …, bM} and d = {d1, …, dK}, construct the sequences āN =
(a1, …, aN), b̄M = (b1, …, bM) and d̄K = (d1, …, dK).
Our basic algorithm generates d1, d2, … in order and extends the par-
titions for a and b using the following invariant property, which can be
obtained from the problem definition above. Either
• dk is aligned with ai, that is, d1 + ··· + dk = a1 + ··· + ai, or
• dk is aligned with bj (but not with ai⁴¹), that is, d1 + ··· + dk =
b1 + ··· + bj.
In the program below, the main procedure solve takes as input three
lists representing a, 6 and d in the first three arguments, and outputs in
the remaining three arguments. Enumeration is done by choosing, at each
recursive step of the rsm procedure, one of two cases mentioned above.
Hence the two rules for rsm. Note that the three middle arguments of
rsm maintain the length of the subsequences found so far, and in all calls,
either lenA = lenD < lenB or lenB = lenD < lenA holds; the procedure
choose_initial chooses the first fragment, and the first call to rsm is
made with this invariant holding. Finally, the procedure choose deletes
some element from the given list and returns the resultant list. Note that
one more rule for rsm is needed in case the A and B fragments do align
⁴¹ For simplicity we assume that we never have all three partitions aligned except at
the beginning and at the end.
anywhere except at the extreme ends; we have omitted this possibility for
simplicity.
This application of CLP is due to Yap [Yap, 1991; Yap, 1993] and it is im-
portant to note that the above program is a considerable simplification of
Yap's program. A major omission is the consideration of errors in the frag-
ment lengths (because these lengths are obtained from experimentation). A
major point in Yap's approach is that it gives a robust and uniform treat-
ment of the experimental errors inherent in the data as compared with
many of the approaches in the literature. Furthermore, [Yap, 1993] shows
how the simple two enzyme problem can be extended to a number of other
problem variations. Because a map solution is just a set of answer con-
straints returned by the algorithm, it is easy to combine this with other
maps, compare maps, verify maps, etc. This kind of flexibility is impor-
tant as the computational problem of just computing a consistent map is
intractable, and hence when dealing with any substantial amount of data,
any algorithm would have to take into account data from many varieties of
mapping experiments, as well as other information specific to the molecule
in question.
13.3 Scheduling
In this class of problems, we are given a number of tasks, and for each task, a
task duration. Each task also requires other resources to be performed, and
there are constraints on precedences of task performance, and on resource
usage. The problem is to schedule the tasks so that the resources are most
efficiently used (for example, perform the tasks so that all are done as soon
as possible).
Consider now a basic job-shop scheduling problem in which we are given a
number m of machines, j sequences of tasks, the task durations and the
machine assigned to each task. The precedence constraints are that the
tasks in each sequence (called a job) are performed in the sequence order.
The resource constraints are that each machine performs at most one task
at any given time.
In the program below, precedences sets up the precedence constraints
for one job, and is called with two equally long lists. The first contains the
task variables, whose values are the start times. The second list contains
the durations of the tasks. Thus precedences is called once for each job.
The procedure resources is called repeatedly, once for each pair of tasks
T1 and T2 which must be performed without overlapping; their durations
are given by D1 and D2.
precedences([T1, T2 | Tail], [D1, D2 | Tail2]) :-
    T1 + D1 <= T2,
    precedences([T2 | Tail], [D2 | Tail2]).
precedences([T], [D]).
precedences([], []).
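For instance (our illustration), for a job with task variables [T1, T2, T3] and durations [5, 4, 6], the call precedences([T1, T2, T3], [5, 4, 6]) imposes the constraints T1 + 5 <= T2 and T2 + 4 <= T3.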
    precedences( ... ),
    define_cost(T1, T2, ..., Tn, Cost),
    enumerate(T1, T2, ..., Tn),
    generate_start_times(T1, T2, ..., Tn).

enumerate(T1, ..., Tn) :-
    resources( ... ),        % one per pair of tasks assigned to the same machine
    resources( ... ).
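The resources constraint itself is not spelled out above; a minimal sketch, assuming the two task variables and their durations are passed as arguments (the argument order is our assumption), is:

resources(T1, D1, T2, D2) :- T1 + D1 <= T2.     % T1 finishes before T2 starts
resources(T1, D1, T2, D2) :- T2 + D2 <= T1.     % T2 finishes before T1 starts

Choosing one of the two clauses for every pair of tasks assigned to the same machine is precisely what makes the enumeration combinatorial.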
    formula_of(R2, (C2, H2, N2, O2)),
    formula_of(P1, (C3, H3, N3, O3)),
    formula_of(P2, (C4, H4, N4, O4)),
    C1 + C2 = C3 + C4,
    H1 + H2 = H3 + H4,
    N1 + N2 = N3 + N4,
    O1 + O2 = O3 + O4.

pathway_step_consistency(R1, R2, P1, P2) :-
    arith_var_of(R1, N1), arith_var_of(R2, N2),
    arith_var_of(P1, N3), arith_var_of(P2, N4),
    N1 + N2 = N3 + N4.

formation_dependencies(R1, R2, P1, P2) :-
    bool_var_a_of(R1, A1), bool_var_b_of(R1, B1),
    bool_var_a_of(R2, A2), bool_var_b_of(R2, B2),
    bool_var_a_of(P1, A3), bool_var_b_of(P1, B3),
    bool_var_a_of(P2, A4), bool_var_b_of(P2, B4),
    A1 ∧ A2 => A3 ∧ A4,
    B1 ∧ B2 => B3 ∧ B4.

both_reagents_needed(R1, R2, T) :-
    bool_var_a_of(R1, A1), bool_var_b_of(R2, B1),
    bool_var_a_of(T, A3), bool_var_b_of(T, B3),
    ¬(A1 => A3),
    ¬(B1 => B3).
13.5 Propositional solver
As mentioned above in the discussion about the Boolean constraint domain,
one approach to solving Boolean equations is to use clp(FD), representing
the input formulas in a straightforward way using variables constrained to
be 0 or 1. See section 3.3.2 of [Simonis and Dincbas, 1993] and [Codognet
and Diaz, 1993] for example. What follows is from [Codognet and Diaz,
1993].
Assuming, without loss of generality, that the input is a conjunction of
equations of the form Z = X ∧ Y, Z = X ∨ Y or X = ¬Y, the basic
algorithm is simply to represent each equation

Z = X ∧ Y   by the FD constraints   Z = X × Y
                                    Z ≤ X ≤ Z × Y + 1 − Y
                                    Z ≤ Y ≤ Z × X + 1 − X

Z = X ∨ Y   by                      Z = X + Y − X × Y
                                    Z × (1 − Y) ≤ X ≤ Z
                                    Z × (1 − X) ≤ Y ≤ Z

X = ¬Y      by                      X = 1 − Y
                                    Y = 1 − X
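As a quick check (our illustration): with X = 1 and Y = 0 the first group forces Z = X × Y = 0, and the interval constraints Z ≤ X ≤ Z × Y + 1 − Y and Z ≤ Y ≤ Z × X + 1 − X become 0 ≤ 1 ≤ 1 and 0 ≤ 0 ≤ 0, which hold; conversely, setting Z = 1 forces X = 1 and Y = 1 through the same constraints.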
The following is a clp(FD) program fragment which realizes these represen-
tations. What is not shown is a procedure which takes the input equation
and calls the and, or and not procedures appropriately, and an enumeration
procedure (over the values 0 and 1) for all variables. In this program val(X)
delays execution of an FD constraint containing it until X is ground, at
which time val(X) denotes the value of X. The meanings of min(X) and
max(X) are, respectively, the current lower and upper bounds on X main-
tained by the constraint solver, as discussed in Section 9.3. A constraint
X in s..t expresses that s and t are, respectively, lower and upper bounds
for X.
and(X, Y, Z) :-
Z in min(X)*min(Y) .. max(X)*max(Y),
X in min(Z) .. max(Z)*max(Y) + 1 - min(Y),
Y in min(Z) .. max(Z)*max(X) + 1 - min(X).
or(X, Y, Z) :-
Z in min(X) + min(Y) - min(X)*min(Y) ..
max(X) + max(Y) - max(X)*max(Y),
X in min(Z)*(1 - max(Y)) .. max(Z),
Y in min(Z)*(1 - max(X)) .. max(Z).
not(X, Y) :-
X in 1 - val(Y),
Y in 1 - val(X).
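The procedure that dispatches on the form of each input equation and the enumeration procedure are not shown; a minimal hypothetical sketch (the names solve_eqs, solve_eq and label are ours, and each equation is assumed to be represented as a term eq(Z, and(X, Y)), eq(Z, or(X, Y)) or eq(X, not(Y))) could be:

solve_eqs([]).
solve_eqs([E | Es]) :- solve_eq(E), solve_eqs(Es).

solve_eq(eq(Z, and(X, Y))) :- and(X, Y, Z).
solve_eq(eq(Z, or(X, Y)))  :- or(X, Y, Z).
solve_eq(eq(X, not(Y)))    :- not(X, Y).

label([]).                                   % enumerate all variables over 0 and 1
label([V | Vs]) :- (V = 0 ; V = 1), label(Vs).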
We conclude here by mentioning the authors' claim that this approach has
great efficiency. In particular, it is several times faster than each of two
Boolean solvers deployed in CHIP, and some special-purpose stand-alone
solvers.
14 Further applications
The applications discussed in the previous two sections are but a sample
of CLP applications. Here we briefly mention some others to indicate the
breadth of problems that have been addressed using CLP languages and
techniques.
We have exemplified the use of CLP to model analog circuits above.
A considerable amount of work has also been done on digital circuits, in
particular on verification [Simonis, 1989a; Simonis and Dincbas, 1987a;
Simonis et al., 1988; Simonis and Le Provost, 1989], diagnosis [Simonis
and Dincbas, 1987b], synthesis [Simonis and Graf, 1990] and test-pattern
generation [Simonis, 1989b]. Many of these works used the CHIP system.
See also [Filkorn et al., 1991] for a description of a large application. In
civil engineering, [Lakmazaheri and Rasdorf, 1989] used CLP(R) for the
analysis and partial synthesis of truss structures. As with electrical cir-
cuits, the constraints implement physical modelling and are used to verify
Acknowledgements
We would like to thank the following people for their comments on drafts
of this paper and/or help in other ways: M. Bruynooghe, N. Heintze, P.
van Hentenryck, A. Herold, J-L. Lassez, S. Michaylov, C. Palamidessi, K.
Shuerman, P. Stuckey, M. Wallace, R. Yap. We also thank the anonymous
referees for their careful reading and helpful comments.
References
[Abadi and Manna, 1989] M. Abadi and Z. Manna. Temporal Logic Pro-
gramming, Journal of Symbolic Computation, 8, 277-295, 1989.
[Aggoun and Beldiceanu, 1992] A. Aggoun and N. Beldiceanu. Extend-
ing CHIP to Solve Complex Scheduling and Packing Problems, In
Journees Francophones De Programmation Logique, Lille, France,
1992.
[Aggoun and Beldiceanu, 1993] A. Aggoun and N. Beldiceanu. Overview
of the CHIP Compiler System. In Constraint Logic Programming:
Selected Research, F. Benhamou and A. Colmerauer, eds. pp. 421-
435. MIT Press, 1993.
[Aiba et al., 1988] A. Aiba, K. Sakai, Y. Sato, D. Hawley and R. Hasegawa.
Constraint Logic Programming Language CAL, Proc. International
Conference on Fifth Generation Computer Systems 1988, 263-276,
1988.
[Aït-Kaci, 1986] H. Aït-Kaci. An Algebraic Semantics Approach to the Ef-
fective Resolution of Type Equations, Theoretical Computer Sci-
ence, 45, 293-351, 1986.
[Aït-Kaci, 1991] H. Aït-Kaci. Warren's Abstract Machine: A Tutorial Re-
construction, MIT Press, 1991.
[Aït-Kaci and Nasr, 1986] H. Aït-Kaci and R. Nasr. LOGIN: A Logic Pro-
gramming Language with Built-in Inheritance, Journal of Logic Pro-
gramming, 3, 185-215, 1986.
[Aït-Kaci and Nasr, 1987] H. Aït-Kaci, P. Lincoln and R. Nasr. Le Fun:
Logic Equations and Functions, Proc. Symposium on Logic Program-
ming, 17-23, 1987.
[Aït-Kaci and Podelski, 1993a] H. Aït-Kaci and A. Podelski. Towards a
Meaning of LIFE, Journal of Logic Programming, 16, 195-234, 1993.
[Aït-Kaci and Podelski, 1993b] H. Aït-Kaci and A. Podelski. Entailment
and Disentailment of Order-Sorted Feature Constraints, manuscript,
1993.
[Aït-Kaci and Podelski, 1993c] H. Aït-Kaci and A. Podelski. A General
Residuation Framework, manuscript, 1993.
[Aït-Kaci et al., 1992] H. Aït-Kaci, A. Podelski and G. Smolka. A Feature-
based Constraint System for Logic Programming with Entailment,
Theoretical Computer Science, to appear. Also in: Proc. Interna-
tional Conference on Fifth Generation Computer Systems 1992, Vol.
2, 1992, 1012-1021.
[Albert et al., 1993] L. Albert, R. Casas and F. Fages. Average-case Anal-
ysis of Unification Algorithms, Theoretical Computer Science 113,
3-34, 1993.
[Apt et al., 1988] K. Apt, H. Blair and A. Walker. Towards a theory of
declarative knowledge. In Foundations of Deductive Databases and
Logic Programming, J. Minker, ed. Morgan Kaufmann, 1988.
[Burg et al., 1992] J. Burg, C. Hughes and S.D. Lang. Parallel Execution
of CLP(R) Programs, Technical Report TR-CS-92-20, University of
Central Florida, 1992.
[Buttner and Simonis, 1987] W. Buttner and H. Simonis. Embedding
Boolean Expressions into Logic Programming, Journal of Symbolic
Computation, 4, 191-205, 1987.
[Carlsson, 1987] M. Carlsson. Freeze, Indexing and other Implementation
Issues in the WAM, Proc. 4th International Conference on Logic
Programming, 40-58, 1987.
[Carlsson and Grindal, 1993] M. Carlsson and M. Grindal. Automatic Fre-
quency Assignment for Cellular Telephones Using Constraint Sat-
isfaction Techniques, Proc. 10th International Conference on Logic
Programming, 647-665, 1993.
[Cernikov, 1963] S.N. Cernikov. Contraction of Finite Systems of Linear
Inequalities (In Russian), Doklady Akademiia Nauk SSSR, 152, No.
5, 1075-1078,1963. (English translation in Soviet Mathematics Dok-
lady, 4, No. 5, 1520-1524, 1963.)
[Chadra et al., 1992] R. Chadra, O. Cockings and S. Narain. Interoperabil-
ity Analysis by Symbolic Simulation, Proc. JICSLP Workshop on
Constraint Logic Programming, 55-58, 1992.
[Chamard et al., 1992] A. Chamard, F. Deces and A. Fischler. Applying
CHIP to a Complex Scheduling Problem, draft manuscript, Dassault
Aviation, Department of Artificial Intelligence, 1992.
[Chan, 1988] D. Chan. Constructive Negation based on Completed
Database, Proc. 5th International Conference on Logic Program-
ming, 111-125, 1988.
[Chandru, 1993] V. Chandru. Variable Elimination in Linear Constraints,
The Computer Journal, 36(5), 463-472, 1993.
[Chandru, 1991] V. Chandru and J.N. Hooker. Extended Horn Sets in
Propositional Logic, Journal of the ACM, 38, 205-221, 1991.
[Chvatal, 1983] V. Chvatal. Linear Programming, W.H. Freeman, New
York, 1983.
[Clark, 1978] K.L. Clark. Negation as Failure. In Logic and Databases, H.
Gallaire and J. Minker, eds. pp. 293-322. Plenum Press, New York,
1978.
[Codognet and Diaz, 1993] P. Codognet and D. Diaz. Boolean Constraint
Solving using clp(FD), Proc. International Logic Programming
Symposium, pp. 525-539, 1993.
[Codognet et al., 1992] P. Codognet, F. Fages, J. Jourdan, R. Lissajoux
and T. Sola. On the Design of Meta(F) and its Applications in
Air Traffic Control, Proc. JICSLP Workshop on Constraint Logic
Programming, pp. 28-35, 1992.
[de Boer and Palamidessi, 1990] F.S. de Boer and C. Palamidessi. A Fully
Abstract Model for Concurrent Constraint Programming, Proc. of
TAPSOFT/CAAP, LNCS 493, 296-319, 1991.
[de Boer and Palamidessi, 1991] F.S. de Boer and C. Palamidessi. Embed-
ding as a Tool for Language Comparison, Information and Compu-
tation 108, 128-157, 1994.
[de Boer and Palamidessi, 1993] F.S. de Boer and C. Palamidessi. From
Concurrent Logic Programming to Concurrent Constraint Program-
ming, in: Advances in Logic Programming Theory, Oxford University
Press, to appear.
[de Boer et al., 1993] F. de Boer, J. Kok, C. Palamidessi and J. Rutten.
Non-monotonic Concurrent Constraint Programming, Proc. Inter-
national Logic Programming Symposium, 315-334, 1993.
[Debray, 1989a] S. K. Debray. Static Inference of Modes and Data De-
pendencies in Logic Programs, ACM Transactions on Programming
Languages and Systems 11 (3), 418-450, 1989.
[Debray, 1989b] S. K. Debray and D.S. Warren. Functional Computations
in Logic Programs, ACM Transactions on Programming Languages
and Systems 11 (3), 451-481, 1989.
[de Kleer and Sussman, 1980] J. de Kleer and G.J. Sussman. Propagation
of Constraints Applied to Circuit Synthesis, Circuit Theory and Ap-
plications 8, 127-144, 1980.
[Diaz and Codognet, 1993] D. Diaz and P. Codognet. A Minimal Extension
of the WAM for clp(FD), Proc. 10th International Conference on
Logic Programming, 774-790, 1993.
[Dincbas et al., 1988a] M. Dincbas, P. Van Hentenryck, H. Simonis, and
A. Aggoun. The Constraint Logic Programming Language CHIP,
Proceedings of the 2nd International Conference on Fifth Generation
Computer Systems, 249-264, 1988.
[Dincbas et al., 1988b] M. Dincbas, H. Simonis and P. van Hentenryck.
Solving a Cutting-stock Problem in CLP, Proceedings 5th Interna-
tional Conference on Logic Programming, MIT Press, 42-58, 1988.
[Dincbas et al., 1990] M. Dincbas, H. Simonis and P. Van Hentenryck.
Solving Large Combinatorial Problems in Logic Programming, Jour-
nal of Logic Programming 8 (1 and 2), 75-93, 1990.
[Dovier and Rossi, 1993] A. Dovier and G. Rossi. Embedding Extensional
Finite Sets in CLP, Proc. International Logic Programming Sympo-
sium, 540-556, 1993.
[Duisburg, 1986] R. Duisburg. Constraint-based Animation: Temporal
Constraints in the Animus System, Technical Report CR-86-37, Tek-
tronix Laboratories, August 1986.
[Ege et al., 1987] R. Ege, D. Maier and A. Borning. The Filter Browser:
[Horn, 1951] A. Horn. On sentences which are true of direct unions of al-
gebras. Journal of Symbolic Logic, 16, 14-21, 1951.
[Huynh and Lassez, 1988] T. Huynh and C. Lassez. A CLP(R) Options
Trading Analysis System, Proceedings 5th International Conference
on Logic Programming, pp. 59-69, 1988.
[Huynh et al., 1990] T. Huynh, C. Lassez and J-L. Lassez. Practical issues
on the projection of polyhedral sets. Annals of Mathematics and
Artificial Intelligence, 6, 295-315, 1992.
[Imbert, 1993a] J.-L. Imbert. Variable elimination for disequations in gen-
eralized linear constraint systems. The Computer Journal, 36, 473-
484, 1993.
[Imbert, 1993b] J.-L. Imbert. Fourier's Elimination: which to choose? Proc.
Workshop on Principles and Practice of Constraint Programming,
Newport, pp. 119-131, April 1993.
[Jaffar, 1984] J. Jaffar. Efficient unification over infinite terms. New Gen-
eration Computing, 2, 207-219, 1984.
[Jaffar, 1990] J. Jaffar. Minimal and Complete Word Unification, Journal
of the ACM, 37, 47-85, 1990.
[Jaffar and Lassez, 1986] J. Jaffar and J.-L. Lassez. Constraint Logic Pro-
gramming, Technical Report 86/73, Department of Computer Sci-
ence, Monash University, 1986.
[Jaffar and Lassez, 1987] J. Jaffar and J.-L. Lassez. Constraint Logic Pro-
gramming, Proc. 14th ACM Symposium on Principles of Program-
ming Languages, Munich (January 1987), pp. 111-119.
[Jaffar and Stuckey, 1986] J. Jaffar and P. Stuckey. Canonical Logic Pro-
grams, Journal of Logic Programming, 3, 143-155, 1986.
[Jaffar et al., 1984] J. Jaffar, J.-L. Lassez and M.J. Maher. A theory of
complete logic programs with equality. Journal of Logic Program-
ming, 1, 211-223, 1984.
[Jaffar et al., 1986] J. Jaffar, J.-L. Lassez and M.J. Maher. A logic pro-
gramming language scheme. In Logic Programming: Relations,
Functions and Equations, D. DeGroot and G. Lindstrom, eds. pp.
441-467. Prentice-Hall, 1986.
[Jaffar et al., 1991] J. Jaffar, S. Michaylov and R.H.C. Yap. A Methodol-
ogy for Managing Hard Constraints in CLP Systems, Proc. ACM-
SIGPLAN Conference on Programming Language Design and Im-
plementation, pp. 306-316, 1991.
[Jaffar et al., 1992a] J. Jaffar, S. Michaylov, P. Stuckey and R.H.C. Yap.
The CLP(R) language and system. ACM Transactions on Program-
ming Languages, 14, 339-395, 1992.
[Jaffar et al., 1992b] J. Jaffar, S. Michaylov, P. Stuckey and R.H.C. Yap.
An Abstract Machine for CLP(R), Proceedings ACM-SIGPLAN
Contents
1 Introduction 697
2 A preliminary example 701
3 Transformation rules for logic programs 704
3.1 Syntax of logic programs 704
3.2 Semantics of logic programs 706
3.3 Unfold/fold rules 707
4 Correctness of the transformation rules 715
4.1 Reversible transformations 716
4.2 A derived goal replacement rule 719
4.3 The unfold/fold proof method 721
4.4 Correctness results for definite programs 723
4.5 Correctness results for normal programs 736
5 Strategies for transforming logic programs 742
5.1 Basic strategies 745
5.2 Techniques which use basic strategies 747
5.3 Overview of other techniques 760
6 Partial evaluation and program specialization 764
7 Related methodologies for program development 771
1 Introduction
Program transformation is a methodology for deriving correct and efficient
programs from specifications.
In this chapter, we will look at the so called 'rules + strategies' ap-
proach, and we will report on the main techniques which have been intro-
duced in the literature for that approach, in the case of logic programs. We
will also present some examples of program transformation, and we hope
that through those examples the reader may acquire some familiarity with
the techniques we will describe.
The program transformation approach to the development of programs
was first advocated in the case of functional languages by Burstall and
Darlington [Burstall and Darlington, 1977].
Fig. 1. The program transformation idea: from program P0 we derive
program Pn preserving the semantic value V.
most logic programs which produce a set of answers for any input query,
we may allow transformation steps which are partially correct, but not
totally correct, in the sense that for 0 < i < n and for every input query
Q, SEM[Pi, Q] ⊆ SEM[Pi+1, Q]. (Here, and in what follows, the semantic
function SEM is assumed to depend both on the program and the input
query.)
As already mentioned, during the program transformation process one
is interested in reducing the complexity of the derived program w.r.t. the
initial program. This means that for the sequence P0 , . . . , Pn of programs
there exists a cost function C which measures the computational complexity
of the programs, such that C(P0) > C(Pn).
Notice that we may allow ourselves to derive a program, say Pi, for
some i > 0, such that C(P0) < C(Pi), because subsequent transformations
may lead us to a program whose cost is smaller than that of P0. Un-
fortunately, there is no general theory of program transformations which
deals with this situation in a satisfactory way in all possible circumstances.
The efficiency improvement from program P0 to program Pn is not
ensured by an undisciplined use of the transformation rules. This is the
reason why we need to introduce the so-called transformation strategies,
that is, meta-rules which prescribe suitable sequences of applications of the
transformation rules.
In logic programming there are many notions of efficiency which have
been used. They are related either to the size of the proofs or to the
machine model which is assumed for the execution of programs. In what
follows we will briefly explain how the strategies which have been proposed
in the literature may improve program efficiency, and we will refer to the
original papers for more details on these issues.
So far we have indicated two major objectives of the program trans-
formation approach, namely the preservation of the semantic value of the
initial program and the reduction of the computational complexity of the
derived program w.r.t. the initial one.
There is a third important objective which is often given less attention:
the formalization of the program transformation process itself. The need
for this formalization derives from the desire of making the program trans-
2 A preliminary example
The 'rules + strategies' approach to program transformation as it was
first introduced in [Burstall and Darlington, 1977] for recursive equation
programs, is based on the use of two elementary transformation rules: the
unfolding rule and the folding rule.
The unfolding rule consists in replacing an instance of the left hand side
of a recursive equation by the corresponding instance of the right hand side.
This rule corresponds to the 'replacement rule' used in [Kleene, 1971] for
the computation of recursively defined functions. The application of the
unfolding rule can also be viewed as a symbolic computation step.
The folding rule consists in replacing an instance of the right hand side
of a recursive equation by the corresponding instance of the left hand side.
Folding can be viewed as the inverse of unfolding, in the sense that, if we
perform an unfolding step followed by a folding step, we get back the initial
expression. Vice versa, unfolding can be viewed as the inverse of folding.
The reader who is not familiar with the transformation methodology,
may wonder about the usefulness of performing a folding step, that is,
of inverting a symbolic computation step, when one desires to improve
program efficiency. However, as we will see in some examples below, the
folding rule allows us to modify the recursive structure of the programs to
be transformed and, by doing so, we will often be able to achieve substantial
efficiency improvements.
Program derivation techniques following the 'rules + strategies' ap-
proach, have been presented in the context of logic programming in [Clark
and Sickel, 1977; Hogger, 1981], where the basic derivation rules consist of
the substitution of a formula by an equivalent formula.
Tamaki and Sato [1984] have adapted the unfolding and folding rules to
the case of logic programs. Following the basic ideas relative to functional
programs, they take an application of the unfolding rule to be equivalent
to a computation step, that is, an application of SLD-resolution, and the
folding rule to be the inverse of unfolding.
As already mentioned, during the transformation process we want to
keep unchanged, at least in a weak sense, the semantic value of the pro-
grams which are derived, and in particular, we want the final program to
be partially correct w.r.t. the initial program.
If from a program P0 we derive by unfold/fold transformations a pro-
gram P1, then the least Herbrand model of P1, as defined in [van Emden and
Kowalski, 1976], is contained in the least Herbrand model of P0 [Tamaki
and Sato, 1984]. Thus, the unfold/fold transformations are partially correct
w.r.t. the least Herbrand model semantics.
In general, unfold/fold transformations are not totally correct w.r.t. the
least Herbrand model semantics, that is, the least Herbrand model of P0
may not be contained in the one of P1. In order to get total correctness
one has to comply with some extra conditions [Tamaki and Sato, 1984].
The study of the various semantics which are preserved when using the
unfold/fold transformation rules will be the objective of Section 4.
Let us now consider a preliminary example of program transformation
where we will see in action some of the rules and strategies for transforming
logic programs. In this example, together with the unfolding and folding
rules, we will also see the use of two other transformation rules, called
definition rule and goal replacement rule, and the use of a transformation
strategy, called tupling strategy.
As already mentioned, the need for strategies which drive the appli-
cation of the transformation rules and improve efficiency comes from the
fact that folding is the inverse of unfolding, and thus we may construct
a useless transformation sequence where the final program is equal to the
initial program.
Let us consider the following logic program PO for testing whether or
not a given list is a palindrome:
1. pal([ ]) <-
2. pal([H]) <-
3. pal([H|T]) <- append(Y, [H], T), pal(Y)
4. append([ ], Y, Y) <-
5. append([H|X], Y, [H|Z]) <- append(X, Y, Z)
We have that, given the lists X, Y, and Z, append(X, Y, Z) holds in the
least Herbrand model of P0 iff Z is the concatenation of X and Y.
Both pal(Y) and append(Y,[H],T) visit the same list Y and we may
avoid this double visit by applying the tupling strategy which suggests the
introduction of the following clause for the new predicate newp:
6. newp(L, T) <- append(Y, L, T), pal(Y)
Actually, clause 6 has been obtained by a simultaneous application of
the tupling strategy and the so-called generalization strategy, in the sense
that in the body of clause 3 the argument [H] has been generalized to the
variable L. In Section 5, we will consider the tupling and the generaliza-
tion strategies and we will indicate in what cases they may be useful for
improving program efficiency.
By adding clause 6 to P0 we get a new program P1 which is equivalent to
P0 w.r.t. all predicates occurring in the initial program P0, in the sense that
each ground atom q(. . .), where q is a predicate occurring in P0, belongs
to the least Herbrand model of P0 iff q(. . .) belongs to the least Herbrand
model of P1.
In order to avoid the double occurrence of the list Y in the body
of clause 3, we now fold that clause using clause 6, that is, we replace
'append(Y, [H], T), pal(Y)', which is an instance of the body of clause 6, by the atom newp([H], T).
10. newp(L, L) <-
11. newp(L, [H|L]) <-
13f. newp(L, [H|U]) <- newp([H|L], U)
In this final program no double visits of lists are performed, and the
time complexity is improved from O(n²) to O(n), where n is the size of the
input list. The initial and final programs have the same least Herbrand
model semantics w.r.t. the predicates pal and append.
Notice that if we are interested in the computation of the predicate
pal only, in the final program we can discard clauses 4 and 5, which are
unnecessary.
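Written out in plain Prolog syntax, and adding the folded version of clause 3 (our reconstruction of the folding step mentioned above), the final program reads:

pal([]).
pal([_]).
pal([H | T]) :- newp([H], T).          % folded clause 3 (our reconstruction)

newp(L, L).                            % clause 10
newp(L, [_ | L]).                      % clause 11
newp(L, [H | U]) :- newp([H | L], U).  % clause 13f

For example, the query ?- pal([a, b, a]). succeeds, while ?- pal([a, b]). fails, and each query traverses its input list only once.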
The crucial step in the above program transformation, which improves
the program performance, is the introduction of clause 6 defining the new
predicate newp. In the literature that step is referred to as a eureka step
and the predicate newp is also called a eureka predicate.
It can easily be seen that eureka steps cannot, in general, be mechani-
cally performed, because they require a certain degree of ingenuity. There
are, however, many cases in which the synthesis of eureka predicates can
be performed in an automatic way, and this is the reason why in practice
the use of the program transformation methodology is very powerful.
In Section 5, we will consider the problem of inventing the eureka pred-
icates and we will see that it can often be solved on the basis of syntactical
properties of the program to be transformed by applying suitable transfor-
mation strategies.
1989], where the clauses C1, . . . , Cn are viewed as the son-nodes of C. This
tree-based representation will be useful for describing the transformation
strategies (see Section 5).
The transformation rules we will present in this chapter are collectively
called unfold/fold rules and they are a generalization of those introduced
by [Tamaki and Sato, 1984]. Several special cases of these rules will be
introduced in the subsequent sections, when discussing the correctness of
the transformation rules w.r.t. different semantics of logic programs.
In the presentation of the rules we will refer to the transformation se-
quence P0, . . . , Pk and we will assume that the variables of the clauses which
are involved in each transformation rule are suitably renamed so that they
do not have variables in common.
Rule R1. Unfolding. Let Pk be the program E1, . . . , Er, C, Er+1, . . . , Es
and C be the clause H <- F, A, G, where A is a positive literal and F and
G are (possibly empty) goals. Suppose that
1. D1, . . . , Dn, with n > 0, is the subsequence of all clauses of a pro-
gram Pj, for some j, with 0 ≤ j ≤ k, such that A is unifiable
with hd(D1), . . . , hd(Dn), with most general unifiers θ1, . . . , θn, re-
spectively, and
2. Ci is the clause (H <- F, bd(Di), G)θi, for i = 1, . . . , n.
If we unfold C w.r.t. A using Pj we derive the clauses C1, . . . , Cn and we
get the new program Pk+1 = E1, . . . , Er, C1, . . . , Cn, Er+1, . . . , Es.
The unfolding rule corresponds to the application of a resolution step
to clause C with the selection of the positive literal A and the input clauses
D1, . . . , Dn.
Example 3.3.1. Let Pk be the following program:
p(X) <- q(t(X)), r(X), r(b)
q(a) <-
q(t(b)) <-
q(X) <- r(X)
Then, by unfolding p(X) <- q(t(X)), r(X), r(b) w.r.t. q(t(X)) using Pk
itself we derive the following program Pk+1:
p(b) <- r(b), r(b)
p(X) <- r(t(X)), r(X), r(b)
q(a) <-
q(t(b)) <-
q(X) <- r(X)
Remark 3.3.2. There are two main differences between the unfolding rule
in the case of logic programs and the unfolding rule in the case of functional
programs.
The above definition of goal equivalence does not take into account the
clause where the goal replacement occurs. As a result, many substitutions
of goals by new goals which produce from a program Pk a new program
Pk+1, cannot be viewed as applications of our rule R5, even though Pk and
Pk+1 are equivalent programs.
In Example 4.3.1 below we will see that rule R5.1 overcomes the above
mentioned limitation of rule R5.
In the next section we will show that the clausal goal replacement rule
R5.1 can be viewed as a derived rule, because its application can be mim-
icked by suitable applications of the transformation rules Rl, R2, R3, R4,
and R5 defined above. Thus, without loss of generality, we may consider
rule R5 as the only goal replacement rule, when also rules Rl, R2, R3, and
R4 are available.
Various notions of goal equivalence and goal replacement have been in-
troduced in the literature [Tamaki and Sato, 1984; Maher, 1987; Gardner
and Shepherdson, 1991; Bossi et al., 1992b; Bossi et al., 1992a; Cook and
Gallagher, 1994]. Each of these notions has been defined in terms of a par-
ticular semantics, while in our presentation we introduced a notion which
has the advantage of being parametric w.r.t. the given semantics SEM.
We finally present a class of transformation rules which will collectively
be called clause replacement rules and referred to as rule R6.
Rule R6. Clause replacement. From program Pk we get program Pk+1
by applying one of the following four rules.
Rule R6.1 Clause rearrangement. We get Pk+1 by replacing in Pk
the sequence C, D of two clauses by D, C.
This clause rearrangement rule is implicitly used by many authors who
consider a program as a set or a multiset of clauses.
Rule R6.2 Deletion of subsumed clauses. A clause C is subsumed
by a clause D iff there exists a substitution θ and a (possibly empty) goal G
such that hd(C) = hd(D)θ and bd(C) = bd(D)θ, G. We may get program
Pk+1 by deleting from Pk a clause which is subsumed by another clause in
Pk.
In particular, the rule for the deletion of subsumed clauses allows us to
remove duplicate clauses.
Rule R6.3 Deletion of clauses with finitely failed body. Let C be
a clause in program Pk of the form: H <- A1, . . . , Am, L, B1, . . . , Bn with
m, n > 0. If literal L has a finitely failed SLDNF-tree in Pk, then we say
that C has a finitely failed body in Pk and we get program Pk+1 by deleting
C from Pk.
The rules for the deletion of subsumed clauses and the deletion of clauses
with finitely failed body are instances of the clause deletion rule introduced
by Tamaki and Sato [1984] for definite programs. Other rules for deleting
clauses from a given program, or adding clauses to a given program are
studied in [Tamaki and Sato, 1984; Gardner and Shepherdson, 1991; Bossi
and Cocco, 1993; Maher, 1993]. The correctness of those rules strictly
depends on the semantics considered. For further details the reader may
look at the original papers.
Rule R6.4 Generalization + equality introduction. Let us assume
that the equality predicate '=' (written in infix notation) is defined by
the clause X = X <- in every program of the transformation sequence
P0, . . . , Pk. Let us also consider a clause
C. H <- A1, . . . , Am
in Pk, a substitution θ = {X/t}, with X not occurring in t, and a clause
GenC. GenH <- GenA1, . . . , GenAm
such that C = GenC θ.
By generalization + equality introduction we derive the clause
D. GenH <- X = t, GenA1, . . . , GenAm
and we get Pk+1 by replacing C by D in Pk.
This transformation rule was formalized in [Proietti and Pettorossi,
1990].
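For instance (our illustration), given the clause C. p(f(a)) <- q(a), the substitution θ = {X/a} and the clause GenC. p(f(X)) <- q(X), we have C = GenC θ, and the rule derives D. p(f(X)) <- X = a, q(X).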
correct) w.r.t. SEM iff for every query Q in Q, containing only predicate
symbols which occur in Pk, we have that SEM[Pk+1, Q] ≤ SEM[Pk, Q]
(or SEM[Pk+1, Q] = SEM[Pk, Q]).
Notice that, if a transformation sequence is constructed by performing
a sequence of partially correct transformation steps, then it is partially
correct. However, it may be the case that not all transformation steps
realizing a partially correct transformation sequence are partially correct.
Also, the application of a partially correct transformation rule may generate
a transformation step which is not partially correct.
Similar remarks also hold for total correctness, instead of partial cor-
rectness.
Obviously, if P0 , . . . , Pk and Pk, Pk+1, . . . , Pn are partially correct (or
totally correct) transformation sequences, also their 'concatenation' P0, . . . ,
Pk,Pk+1, . . . ,Pn is partially correct (or totally correct). In what follows,
by 'correctness' we will mean 'total correctness'.
The following lemma establishes a correctness result for the definition
introduction and definition elimination rules assuming that SEM is rele-
vant. In the subsequent sections we will give some more correctness results
which hold with different assumptions on SEM.
Lemma 4.0.2 (Relevance). The rules of definition introduction and def-
inition elimination are totally correct w.r.t. any relevant semantics.
Other instances of the in-situ folding rule have been proposed in [Maher,
1987; Gardner and Shepherdson, 1991].
We will see in Section 4.4 that the reversibility property of in-situ folding
allows us to establish in a straightforward way some total correctness results
for this rule. However, in-situ folding has limited power, in the sense that,
as we will see in Section 5, most transformation strategies for improving
program efficiency make use of folding steps which are not in-situ foldings.
Rule R5.2 Persistent goal replacement. Let C be a clause in Pk and
goal G1 be equivalent to goal G2 w.r.t. a semantics SEM and program Pk.
The goal replacement of G1 by G2 in C is said to be persistent iff G1 and
G2 are equivalent w.r.t. SEM and the derived program Pk+1.
Any persistent goal replacement step which replaces G1 by G2 is re-
versible by a goal replacement step which performs the inverse replacement
of G2 by G1. Thus, if the goal replacement rule is partially correct w.r.t.
SEM, then any persistent goal replacement step is totally correct w.r.t.
SEM.
In the following definition we introduce a variant of the goal replacement
rule which is reversible if one considers relevant semantics.
Rule R5.3 Independent goal replacement. Let C be a clause in a
program P and goal G1 be equivalent to goal G2 w.r.t. a semantics SEM
and program P. The replacement of G1 by G2 in C is said to be independent
iff C belongs to neither Prel(G1) nor Prel(G2) (that is, neither G1 nor G2
depends on C).
Lemma 4.1.4 (Reversibility of independent goal replacement). Let
SEM be a relevant semantics and goal G1 be equivalent to goal G2 w.r.t.
SEM and program P. Any independent goal replacement of G1 by G2 in
a clause of P is reversible by performing an independent goal replacement
step.
Proof. Let Q be the program obtained from P by replacing the goal G1
by the goal G2 in a clause C of P such that neither G1 nor G2 depends on
C. We first show that this independent goal replacement step is persistent
by proving that G1 and G2 are equivalent w.r.t. SEM and program Q,
that is, SEM[Q, <- G1] = SEM[Q, <- G2]. Indeed, we have that
SEM[Q, <- G1] (by relevance of SEM)
= SEM[Qrel(G1), <- G1] (since C ∉ Prel(G1))
= SEM[Prel(G1), <- G1] (by relevance of SEM)
= SEM[P, <- G1] (since G1 is equivalent to G2 w.r.t.
SEM and P)
= SEM[P, <- G2] (by relevance of SEM)
= SEM[Prel(G2), <- G2] (since C ∉ Prel(G2))
Step 3. The goals newp1(X1, . . . , Xm) and newp2(X1, . . . , Xm) are equiv-
alent w.r.t. SEM and Q. Indeed, we have
This proof method was introduced by Kott for recursive equation pro-
grams [Kott, 1982] and its application to logic programs is described in
[Boulanger and Bruynooghe, 1993; Proietti and Pettorossi, 1994a; Proietti
and Pettorossi, 1994b],
The unfold/fold proof method can be described as follows. Given a
program P, a semantics function SEM, and a replacement law G1 =_V G2,
with V = {X1, . . . , Xm}, we consider the clauses
where the equivalence w.r.t. the set {X, Y} is justified by the fact that
E2. newp1(X, Y) <- append(T, X, U), append([H|U], Z, Y)
and we get the program R1 = C, A1, A2, E1, E2.
Then, by unfolding clause E2 w.r.t. append([H|U], Z, Y) we get
E3. newp1(X, [H|V]) <- append(T, X, U), append(U, Z, V)
and we get the program R2 = C, A1, A2, E1, E3.
Finally, by folding clause E3 using clause D1 in R0 we derive
E4. newp1(X, [H|V]) <- newp1(X, V)
and we get the program R3 = C, A1, A2, E1, E4.
2. Transformation sequence starting from S0.
By unfolding clause D2 in S0 w.r.t. append(K, J, Y) we derive two
clauses
F1. newp2(X, Y) <- append(X, L, Y)
F2. newp2(X, [H|U]) <- append(X, L, J), append(T, J, U)
and we get the program S1 = C, A1, A2, F1, F2.
By folding clause F2 using clause D2 in S0 we get the clause
F3. newp2(X, [H|U]) <- newp2(X, U)
and we derive the final program of this transformation sequence which
ing steps have been performed before folding, so that 'going backward in
the computation' (as folding does) does not prevail over 'going forward in
the computation' (as unfolding does). This idea is the basis for various
techniques in which total correctness is ensured by counting the number of
unfolding and folding steps performed during the transformation sequence
[Kott, 1978; Kanamori and Fujita, 1986; Bossi et al., 1992a].
An alternative approach is based on the verification that some ter-
mination properties are preserved by the transformation process, thereby
avoiding the introduction of infinite computations [Amtoft, 1992; Bossi and
Etalle, 1994b; Bossi and Cocco, 1994; Cook and Gallagher, 1994].
The following definition introduces the version of the folding rule we
have promised above. This version is a special case of rule R2 for n = 1.
P0: p(X) <- q(X), r(X)     q(a) <-     r(b) <- r(b)
(by unfolding the first clause w.r.t. r(X))
P1: p(b) <- q(b), r(b)     q(a) <-     r(b) <- r(b)
(by applying single-folding to the first clause)
P2: p(b) <- p(b)           q(a) <-     r(b) <- r(b)
This transformation sequence satisfies the conditions stated in the sec-
ond correctness theorem w.r.t. both LHM and CAS, but P0 finitely fails
for the query <- p(b), while P2 does not.
As we have shown in the above Example 4.4.10, if we allow folding
steps which are not in-situ foldings, it may be the case that the derived
transformation sequence is not correct w.r.t. FF.
The fact that such folding steps are not totally correct w.r.t. FF is
related to the notion of fair SLD-derivation [Lloyd, 1987].
A possibly infinite SLD-derivation is fair iff it is either failed or for every
occurrence of an atom A in the SLD-derivation, that occurrence of A or
an instance (possibly via the identity substitution) of A which is derived
from that occurrence, is selected for SLD-resolution within a finite number
of steps.
Fairness of SLD-derivations is a sufficient condition for the completeness
of SLD-resolution w.r.t. FF.
Let us consider a program P1 and a query Q, and let us apply a folding
step which replaces a goal B by an atom H, thereby obtaining a program,
say P2. We may view every SLD-derivation δ2 of Q in the derived program
P2 as 'simulating' an SLD-derivation δ1 of Q in P1. Indeed, the simulated
SLD-derivation δ1 can be obtained by replacing in δ2 the instances of H
introduced by folding steps, with the corresponding instances of B.
By applying a folding step (which is not an in-situ folding) to a clause
in P1, we may derive a program P2 such that a fair SLD-derivation for Q
using P2 simulates an unfair SLD-derivation for Q using P1, as shown by
the following example.
Example 4.4.11. Let us consider the program P2 of Example 4.4.10 and
the infinite sequence of queries
<- p(b)    <- p(b)    <- p(b)    . . .
which constitutes a fair SLD-derivation for the program P2 and the query
<- p(b). The folding step which produces P2 from P1 replaces the goal
(q(b), r(b)) by the goal p(b). The above SLD-derivation can be viewed as a
simulation of the following SLD-derivation for P1:
<- p(b)    <- q(b), r(b)    <- q(b), r(b)    . . .
which is unfair, because it has been obtained by always selecting the atom
r(b) for performing an SLD-resolution step.
The Theorem 4.4.12 below is the analogue for the FF semantics of the
second correctness theorems w.r.t. LHM and CAS. Its proof is based on
where P+ and Q+ are the sets of definite programs and definite queries,
respectively. (SubstSeq, ≤) is the set of finite or infinite sequences of
defined substitutions, and finite sequences of defined substitutions followed
by the undefined substitution ⊥. Similar approaches to the semantics of
Prolog can be found in [Jones and Mycroft, 1984; Debray and Mishra, 1988;
Deville, 1990; Baudinet, 1992].
The sequence consisting of the substitutions θ1, θ2, . . . is denoted by
⟨θ1, θ2, . . .⟩, and the concatenation of two sequences S1 and S2 in SubstSeq
is denoted by S1@S2 and it is defined as the usual monoidal concatenation
of finite or infinite sequences, with the extra property: ⟨⊥⟩@S = ⟨⊥⟩, for
any sequence S.
Example 4.4.13. Consider the following three programs:
P1: p(a) <-     p(b) <-         p(a) <-
P2: p(a) <-     p(X) <- p(X)    p(b) <-
P3: p(a) <-     p(b) <- p(b)    p(a) <-
We have that
Prolog[P1, <- p(X)] = ⟨{X/a}, {X/b}, {X/a}⟩
Prolog[P2, <- p(X)] = ⟨{X/a}, {X/a}, . . .⟩
Prolog[P3, <- p(X)] = ⟨{X/a}, ⊥⟩.
The order ≤ over SubstSeq expresses a 'less defined than or equal to'
relation between sequences which can be introduced as follows.
For any two sequences of substitutions S1 and S2, the relation S1 ≤ S2
holds iff either S1 = S2 or S1 = S3@⟨⊥⟩ and S2 = S3@S4, for some S3 and
S4 in SubstSeq. For instance, we have that: (i) for all substitutions η1
and η2 with η1 ≠ ⊥ and η2 ≠ ⊥, ⟨η1, ⊥⟩ ≤ ⟨η1, η2⟩, and the sequences ⟨η1⟩
and ⟨η1, η2⟩ are not comparable w.r.t. ≤, and (ii) for any (possibly empty)
sequence S, ⟨⊥⟩ ≤ S.
Unfortunately, most transformation rules presented in the previous sec-
tions are not even partially correct w.r.t. Prolog. Indeed, it is easy to
see that a clause rearrangement may affect the 'generation order' or the
'computability in finite time' of the answer substitutions, and the deletion
of a duplicate clause may affect their multiplicity.
An unfolding step may affect the order of the computed answer substi-
tutions as well as the termination of a program, as is shown by the following
examples.
Example 4.4.14. By unfolding w.r.t. r(Y) the first clause of the following
program:
we get
we get
such that
COMP[P, <- G] = {θ | Comp(P) |= ∀(Gθ)}
where ∀(C) denotes the universal closure of a conjunction C of literals, and
similarly to the case of LHM, we identify any program P and any goal G
with the corresponding logical formulas.
As already mentioned, COMP is not a relevant semantics. Thus, we
cannot use the relevance lemma (page 716), and indeed, the definition
introduction rule and the definition elimination rule are not totally cor-
rect w.r.t. COMP. To see this, let us consider the case where we add
to a program P1 whose completion is consistent, a new clause of the form
newp(X) <- ¬newp(X). We get a new program, say P2, whose completion
contains the formula newp(X) ↔ ¬newp(X) and it is inconsistent. Thus,
COMP[P1, Q] ≠ COMP[P2, Q] for Q = <- newp(X).
However, it can be shown that if the definition introductions are only
non-recursive definition introductions then any step of non-recursive defini-
tion introduction or definition elimination is totally correct w.r.t. COMP.
The partial correctness w.r.t. COMP of the unfolding rule Rl and the
folding rule R2 can easily be established, as illustrated by the following
example.
Example 4.5.2. Let us consider the program
whose completion is
whose completion is
whose completion is
together with the axioms of Clark's equality theory [Lloyd, 1987; Apt,
1990]. By unfolding the last clause of P0 we get
page 726), definition introduction (rule R3, page 711), definition elim-
ination (rule R4, page 712), independent goal replacement (rule R5.3,
page 718), goal rearrangement (rule R5.4, page 725), deletion of duplicate
goals (rule R5.5, page 725), basic goal replacement (rule R5.6, page 727),
and clause replacement (rule R6, page 714).
The correctness of those rules w.r.t. LHM is ensured by the first and
the second correctness theorems w.r.t. LHM (pages 725 and 727).
In order to simplify our presentation, sometimes we will not mention
the use of goal rearrangement and deletion of duplicate goals.
As already pointed out, from the correctness of the goal rearrangement,
deletion of duplicate goals, clause rearrangement (rule R6.1, page 714),
and deletion of duplicate clauses (rule derived from rule R6.2, page 714) it
follows that the concatenation of sequences of literals and the concatena-
tion of sequences of clauses are associative, commutative, and idempotent.
Therefore, when dealing with goals or programs, we will feel free to use
set-theoretic notations, such as {...} and U, instead of sequence-theoretic
notations.
Before giving the technical details concerning the transformation strate-
gies we would like to present, we now informally explain the main ideas
which justify their use.
Suppose that we are given an initial program and we want to apply the
transformation rules to improve its efficiency. In order to do so, we usually
need a preliminary analysis of the initial program by which we discover that
the evaluation of a goal, say A1, . . . , An, in the body of a program clause,
say C, is inefficient, because it generates some redundant computations.
For example, by analysing the initial program P0 given in the palindrome
example of Section 2 (page 702), we may discover that the evaluation of
the body of the clause
3. pal([H|T]) <- append(Y, [H], T), pal(Y)
is inefficient because it determines multiple traversals of the list Y.
In order to improve the performance of P0, we can apply the technique
which consists in introducing a new predicate, say newp, by means of a
clause, say N, with body A1, . . . , An.
This initial transformation step can be formalized as an application
of the so-called tupling strategy (page 746). Sometimes we also need a
simultaneous application of the generalization strategy (page 747). Then
we fold clause C w.r.t. the goal A1, . . . , An by using clause N, and we
unfold clause N one or more times, thereby generating some new clauses.
This process can be viewed as a symbolic evaluation of a query which
is an instance of A1, . . . , An. This unfolding gives us the opportunity of
improving our program, because, for instance, we may delete some clauses
with finitely failed body, thereby avoiding failures at run-time, and we may
delete duplicate atoms, thereby avoiding repeated computations.
proofs and inductive synthesis are driven by the need for applying suitable
inductive hypotheses.
5.1 Basic strategies
We now describe some of the basic strategies which have been introduced
in the literature for transforming logic programs. They are: tupling, loop
absorption, and generalization. The basic ideas underlying these strategies
come from the early days of program transformation and they were already
present in [Burstall and Darlington, 1977].
The tupling strategy was formally defined in [Pettorossi, 1977] where it
is used for tupling together different function calls which require common
subcomputations or visit the same data structure.
The name 'loop absorption' was introduced in [Proietti and Pettorossi,
1990] for indicating a strategy which derives a new predicate definition
when a goal is recurrently generated during program transformation. This
strategy is present in various forms in a number of different transforma-
tion techniques, such as the above mentioned tupling, supercompilation
[Turchin, 1986], compiling control [Bruynooghe et al., 1989], as well as
various techniques for partial evaluation (see Section 6).
Finally, the generalization strategy has its origin in the automated the-
orem proving context [Boyer and Moore, 1975], where it is used to generate
a new generalized conjecture allowing the application of an inductive hy-
pothesis.
The tupling, loop absorption, and generalization strategies are used
in this chapter as building blocks to describe a number of more complex
transformation techniques.
For a formal description of the strategies and their possible mechaniza-
tion we now introduce the notion of unfolding tree. It represents the process
of transforming a given clause by performing unfolding and basic goal re-
placement steps. This notion is also related to the one of symbolic trace tree
of [Bruynooghe et al., 1989], where, however, the basic goal replacement
rule is not taken into account.
Definition 5.1.1 (Unfolding tree). Let P be a program and C a clause.
An unfolding tree for (P, C) is a (finite or infinite) labelled tree such that
• the root is labelled by the clause C,
• if M is a node labelled by a clause D then
either M has no sons,
or M has n (≥ 1) sons labelled by the clauses D1, . . . , Dn obtained
by unfolding D w.r.t. an atom of its body using P,
or M has one son labelled by a clause obtained by basic goal replace-
ment from D.
In an unfolding tree we also have the usual relations of 'descendant
node' (or clause) and 'ancestor node' (or clause).
We now have to look for the recursive definition of the predicate new-
csub. Starting from clause 11, we perform the unfolding step corresponding
to the one which leads from clause 6 to clauses 9 and 10. We get the clauses
12. newcsub(A, X, Y, [A|Z]) <- subseq(X, Y), subseq(X, Z)
13. newcsub(A, X, Y, [B|Z]) <- subseq(X, Y), subseq([A|X], Z)
and by folding we get
12f. newcsub(A, X, Y, [A|Z]) <- csub(X, Y, Z)
13f. newcsub(A, X, Y, [B|Z]) <- newcsub(A, X, Y, Z)
The final program is made out of the following clauses:
8. csub([ ], Y, Z) <-
6f. csub([A|X], [A|Y], Z) <- newcsub(A, X, Y, Z)
7f. csub([A|X], [B|Y], Z) <- csub([A|X], Y, Z)
12f. newcsub(A, X, Y, [A|Z]) <- csub(X, Y, Z)
13f. newcsub(A, X, Y, [B|Z]) <- newcsub(A, X, Y, Z)
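For reference, the final program can be run directly in plain Prolog syntax; the definition of subseq below is our assumption, with the usual reading that subseq(X, Y) holds iff X is a subsequence of Y:

subseq([], _).
subseq([A | Xs], [A | Ys]) :- subseq(Xs, Ys).
subseq(Xs, [_ | Ys]) :- subseq(Xs, Ys).

csub([], _, _).                                     % clause 8
csub([A | X], [A | Y], Z) :- newcsub(A, X, Y, Z).   % clause 6f
csub([A | X], [_ | Y], Z) :- csub([A | X], Y, Z).   % clause 7f
newcsub(A, X, Y, [A | Z]) :- csub(X, Y, Z).         % clause 12f
newcsub(A, X, Y, [_ | Z]) :- newcsub(A, X, Y, Z).   % clause 13f

For example, the query ?- csub(X, [a, b, c], [b, c, d]). has, among its answers, X = [b, c].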
The correctness of the above transformation can easily be proved by
applying the second correctness theorem w.r.t. LHM with the assumption
that newcsub and csub are top predicates and subseq is an intermediate
predicate. In particular, the single-folding step which generates clause 6f
from clause 6 using clause 11, satisfies the conditions of that theorem, be-
cause: (i) clause 11 has been introduced by the definition rule, (ii) the head
of clause 11 has a top predicate, and (iii) clause 6 has been derived from
clause 1 by unfolding w.r.t. the intermediate atom subseq(X, Y). Similar
conditions ensure the correctness of the other single-folding steps.
Let us now compare the SLD-tree, say T1, for Csub, a query of the form
<- csub(X, s, t) in Q, and the computation rule C, with the SLD-tree, say
T2, for the final program, the query <- csub(X, s, t), and the left-to-right
computation rule.
As the reader may verify, the tree T2 can be obtained from the tree
T1 by first replacing every query of the form '<- subseq(x, b), subseq(x, c)'
by the query '<- csub(x, b, c)' and every query of the form '<- subseq(x, b),
subseq([a|x], c)' by the query '<- newcsub(a, x, b, c)', and then by perform-
ing on the derived tree the rewritings shown in Fig. 3 for any unbound
variable X and ground lists u and v.
5.2.2 Composing programs
A popular style of programming, which can be called compositional, con-
sists in decomposing a given goal into easier subgoals, then writing program
modules which solve these subgoals, and finally, composing the various pro-
gram modules together. The compositional style of programming is often
helpful for writing programs which can easily be understood and proved
correct w.r.t. their specifications.
However, this programming style often produces inefficient programs,
because the composition of the various subgoals does not take into account
the interactions which may occur among the evaluations of these subgoals.
For instance, let us consider a logic program with a clause of the form
p(X)<-q(X,Y),r(Y)
where in order to solve the goal p(X) we are required to solve q(X, Y) and
r(Y). The binding of the variable Y is not explicitly needed because it does
not occur in the head of the clause. If the construction, the memorization,
and the destruction of that binding are expensive, then our program is
likely to be inefficient.
Similar problems occur when the compositional style of programming
is applied for writing programs in other programming languages, different
from logic. In imperative languages, for instance, one may construct several
procedures which are then combined together by using various kinds of
sequential or parallel composition operators. In functional languages, the
small subtasks in which a given task is decomposed are solved by means of
individual functions which are then combined together by using function
application or tupling.
There are various papers in the literature which present techniques for
improving the efficiency of the evaluation of programs written according to
some techniques which have been first introduced in the case of functional
programs and are referred to as program annotations [Schwarz, 1982].
In the case of Prolog, a typical technique which produces annotated
programs consists in adding a cut operator '!' in a point where the execution
of the program can be performed in a deterministic way. For instance, the
following Prolog program fragment:
p(X) <- C, BodyA
p(X) <- not(C), BodyB
can be transformed (if C has no side-effects) into a cut-annotated version.
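A minimal sketch of that cut-annotated version (our rendering of the standard transformation) is:

p(X) <- C, !, BodyA
p(X) <- BodyB

Here the test C is evaluated only once, and the cut commits to the first clause when C succeeds.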
The resulting program may be more efficient than the initial program
because, by using the partially known input, it is possible to perform at
compile time some run-time computations.
Partial evaluation can be viewed as a particular case of program special-
ization [Scherlis, 1981], which is aimed at transforming a given program by
exploiting the knowledge of the context where that program is used. This
knowledge can be expressed as a precondition which is satisfied by the
values of the input to the program.
Not much work has been done in the area of logic program specializa-
tion, apart from the particular case of partial deduction. However, some
results are reported in [Bossi et al., 1990] and in various papers by Gal-
lagher and others [Gallagher et al., 1988; Gallagher and Bruynooghe, 1991;
de Waal and Gallagher, 1992]. In the latter papers the use of the abstract
interpretation methodology plays a crucial role. Using this methodology
one can represent and manipulate a possibly infinite set of input values
which satisfies a given precondition, by considering, instead, an element of
a finite abstract domain.
Abstract interpretations can be used before and after the application
of program specialization, that is, during the so-called preprocessing phase
and postprocessing phase. During the preprocessing phase, by using ab-
stract interpretations we may collect information depending on the control
flow, such as groundness of arguments and determinacy of predicates. This
information can then be exploited for directing the specialization process.
Examples of this preprocessing are the binding time analysis performed by
the Logimix partial evaluator of [Mogensen and Bondorf, 1993] and the
determinacy analysis performed by Mixtus [Sahlin, 1993].
During the postprocessing phase, abstract interpretations may be used
for improving the program obtained by the specialization process, as indi-
cated, for instance, in [Gallagher, 1993] where it is shown how one can get
rid of the so-called useless clauses.
The idea of partial evaluation of logic programs can be presented as
follows [Lloyd and Shepherdson, 1991]. Let us consider a normal program
P and a query «— A, where A is an atom. We construct a finite portion of
an SLDNF-tree for Pu {«- .4} containing at least one non-root node. For
this construction we use an unfolding strategy U which tells us the atoms
which should be unfolded and when to terminate the construction of that
tree. The notion of unfolding strategy is analogous to the one of u-selection
rule (page 746), but it applies to goals, instead of clauses. The design of
unfolding strategies which eventually terminate, thereby producing a finite
SLDNF-tree, can be done within general frameworks like the ones described
in [Bruynooghe et al., 1992; Bol, 1993].
We then construct the set of clauses {Aθi <- Gi | i = 1, . . . , n}, called
resultants, obtained by collecting from each non-failed leaf of the SLDNF-
tree the query <- Gi and the corresponding computed answer substitution
and the atom A = p(X,a). Let us use the unfolding strategy U which
performs unfolding steps starting from the query <- p(X, a) until each leaf
of the SLDNF-tree is either a success or a failure or it is an atom with
predicate p. We get the tree depicted in Fig. 5.
By collecting the goals and the substitutions corresponding to the leaves
of that tree we have the following set of resultants:
w.r.t. the atom p(a). We can derive the resultant p(a) ← p(b). Let A be
{p(a)}. Thus, a partial evaluation of P w.r.t. p(a) is the program P_A:
p(a) ← p(b)
obtained by replacing the definition of p in P (that is, the whole program
P) by the resultant p(a) ← p(b). P_A is not {p(a)}-closed and we have that
CASNF[P_A, ← p(a)] = {}, whereas CASNF[P, ← p(a)] = {{}}.
Example 6.0.7. Let us consider the following program P:
and the set S of atoms {p, q(X), q(a)}, which is not independent. A partial
evaluation of P w.r.t. S is the following program P_S:
These two equalities hold if, for instance, we derive P_G from P ∪ {D} by
using the definition, unfolding, and folding rules according to the restric-
tions of the second correctness theorems w.r.t. CASNF (page 738) and
FF (page 732), respectively.
Let us now briefly compare the two approaches to partial evaluation we
have mentioned above, that is, the one based on Lloyd and Shepherdson's
theorem and the one based on the unfold/fold rules.
In the approach based on Lloyd and Shepherdson's theorem, the ef-
ficiency gains are obtained by constructing SLDNF-trees and extracting
resultants. This process corresponds to the application of some unfolding
steps, and since efficiency gains are obtained without using the folding rule,
it may seem that this is an exception to the 'need for folding' meta-strategy
described in Section 5. However, in order to guarantee the correctness of
the partial evaluation of a given program P w.r.t. a set of atoms S, for
each element of S we are required to find an SLDNF-tree whose leaves
contain instances of atoms in S (see the closedness condition), and as the
reader may easily verify, this requirement exactly corresponds to the 'need
for folding'.
Conversely, the approach based on the unfold/fold rules does not require
us to find the set S with the closedness and independence properties, but as
we show in Example 6.0.8 below, we often need to introduce some auxiliary
clauses by the definition rule and we also need to perform some final folding
steps using those clauses.
Example 6.0.8 below also shows that in the partial evaluation approach
based on the unfold/fold rules, the use of the renaming technique for struc-
ture specialization [Benkerimi and Lloyd, 1990; Gallagher and Bruynooghe,
1990; Gallagher, 1993; Benkerimi and Hill, 1993] which is often required in
the first approach, is not needed. For other issues concerning the use of
folding during partial evaluation the reader may refer to [Owen, 1989].
We now present an example of derivation of a partial evaluation of a
program by applying the unfold/fold transformation rules and the loop
absorption strategy.
Example 6.0.8. [String matching] [Sahlin, 1991; Gallagher, 1993]. Let us
consider the following program Match for string matching:
1. match(P,S) ← aux(P,S,P,S)
2. aux([],X,Y,Z) ←
3. aux([A|Ps],[A|Ss],P,S) ← aux(Ps,Ss,P,S)
4. aux([A|Ps],[B|Ss],P,[C|S]) ← ¬(A = B), aux(P,S,P,S)
where the pattern P and the string S are represented as lists, and the
relation match(P,S) holds iff the pattern P occurs in the string S. For
instance, the pattern [a,b] occurs in the string [c,a,b], but it does not occur
in the string [a,c,b].
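In plain Prolog syntax (a direct transcription of clauses 1-4, writing the negated equality test as \+ A = B), the Match program can be run directly:

match(P, S) :- aux(P, S, P, S).
aux([], _X, _Y, _Z).
aux([A|Ps], [A|Ss], P, S) :- aux(Ps, Ss, P, S).
aux([A|_Ps], [B|_Ss], P, [_C|S]) :- \+ A = B, aux(P, S, P, S).

% ?- match([a,b], [c,a,b]).     % succeeds
% ?- match([a,b], [a,c,b]).     % fails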
Let us now partially evaluate the given program Match w.r.t. the atom
match([a,a,b],X). In order to do so we first introduce the following defi-
nition:
5. newp(X) ← match([a,a,b],X)
whose body is the atom w.r.t. which the partial evaluation should be per-
formed. As usual when applying the definition rule, the name of the head
predicate is a new symbol, newp in our case. Then we construct the un-
folding tree for (Match, clause 5) using the u-selection rule which
i) unfolds a clause w.r.t. any atom of the form either match(...) or
aux(...), and
ii) does not unfold a clause for which we can apply the loop absorption
strategy, that is, a clause in whose body there is an instance of an
atom which occurs in the body of a clause in an ancestor node.
We get the tree depicted in Fig. 6. In clause 8 of Fig. 6, the atom
aux([a,a,b],S,[a,a,b],S) is an instance of the body of clause 6 via the
substitution {X/S}.
Analogously, in clause 10 the atom aux([a,a,b],[H|S],[a,a,b],[H|S]),
and in clause 12 the atom aux([a,a,b],[a,H|S],[a,a,b],[a,H|S]) are in-
stances of the body of clause 6.
Thus, we can apply the loop absorption strategy and we introduce the
new definition:
14. newq(S) ← aux([a,a,b],S,[a,a,b],S)
Conclusions
We have looked at the theoretical foundations of the so-called 'rules +
strategies' approach to logic program transformation. We have established
a unified framework for presenting and comparing the various rules which
have been proposed in the literature. That framework is parametric with
respect to the semantics which is preserved during transformation.
We have presented various sets of transformation rules and the cor-
responding correctness results w.r.t. different semantics of definite logic
programs, such as: the least Herbrand model, the computed answer sub-
stitutions, the finite failure, and the pure Prolog semantics.
We have also considered the case of normal programs, and using the
proposed framework, we have presented the rules which preserve computed
answer substitutions, finite failure, and Clark's completion semantics. We
have briefly mentioned the results concerning the rules which preserve other
semantics for normal programs.
We have also presented a unified framework in which it is possible to
describe some of the most significant techniques for guiding the application
of the transformation rules with the aim of improving program efficiency.
We have singled out a few basic strategies, such as tupling, loop absorption,
and generalization, and we have shown that various methods for compiling
control, program composition, change of data representations, and partial
evaluation, can be viewed as suitable applications of those strategies.
An area of further investigation is the characterization of the power of
the transformation rules and strategies, both in the 'completeness' sense,
that is, their capability of deriving all programs which are equivalent to the
given initial program, and in the 'complexity' sense, that is, their capability
of deriving programs which are more efficient than the initial program. No
conclusive results are available in either direction.
A line of research that can be pursued in the future, is the integration
of tools, like abstract interpretations, proofs of properties, and program
synthesis, within the 'rules + strategies' approach to program transforma-
tion.
Unfortunately, in the practice of logic programming the transformational
methodology has gained only moderate attention in the past. However, it
is recognized that the automation of transformation techniques and their
integrated use is of crucial importance for building advanced software de-
velopment systems.
There is a growing interest in the mechanization of transformation
strategies and the production of interactive tools for implementing pro-
gram transformers. Moreover, some existing optimizing compilers already
make use of various transformation techniques.
The importance of the transformation methodology will substantially
increase if we extend its theory and applications to the case of complex
logic languages which manipulate constraints and support both concur-
rency and object-orientation.
Acknowledgements
We thank M. Bruynooghe, J. P. Gallagher, M. Leuschel, M. Maher, and
H. Seki for their helpful comments and advice on many issues concerning
the transformation of logic programs. Our thanks go also to O. Aioni, P.
Dell'Acqua, M. Gaspari and M. Kalsbeek for reading a preliminary version
of this chapter.
References
[Alexandre et al., 1992] F. Alexandre, K. Bsaïes, J. P. Finance, and
A. Quéré. Spes: A system for logic program transformation. In Pro-
ceedings of the International Conference on Logic Programming and Au-
tomated Reasoning, LPAR '92, Lecture Notes in Computer Science 624,
pages 445-447, 1992.
[Amtoft, 1992] T. Amtoft. Unfold/fold transformations preserving termi-
nation properties. In Proc. PLILP '92, Leuven, Belgium, Lecture Notes
in Computer Science 631, pages 187-201. Springer-Verlag, 1992.
[Apt, 1990] K. R. Apt. Introduction to logic programming. In J. van
Leeuwen, editor, Handbook of Theoretical Computer Science, pages 493-
576. Elsevier, 1990.
[Aravindan and Dung, 1995] C. Aravindan and P. M. Dung. On the cor-
rectness of unfold/fold transformation of normal and extended logic pro-
grams. Journal of Logic Programming, 24(3):201-217, 1995.
[Arsac and Kodratoff, 1982] J. Arsac and Y. Kodratoff. Some techniques
for recursion removal from recursive functions. ACM Transactions on
Programming Languages and Systems, 4(2):295-322, 1982.
[Azibi, 1987] N. Azibi. TREQUASI: Un système pour la transformation
automatique de programmes Prolog récursifs en quasi-itératifs. PhD the-
sis, Université de Paris-Sud, Centre d'Orsay, France, 1987.
[Baudinet, 1992] M. Baudinet. Proving termination properties of Prolog
programs: A semantic approach. Journal of Logic Programming, 14:1-
29, 1992.
[Benkerimi and Hill, 1993] K. Benkerimi and P. M. Hill. Supporting trans-
formations for the partial evaluation of logic programs. Journal of Logic
and Computation, 3(5):469-486, 1993.
[Benkerimi and Lloyd, 1990] K. Benkerimi and J. W. Lloyd. A partial eval-
uation procedure for logic programs. In S. Debray and M. Hermenegildo,
editors, Logic Programming: Proceedings of the 1990 North American
Conference, Austin, TX, USA, pages 343-358. The MIT Press, 1990.
[Bensaou and Guessarian, 1994] N. Bensaou and I. Guessarian. Trans-
forming constraint logic programs. In 11th Symp. on Theoretical Aspects
of Computer Science, STACS '94, Lecture Notes in Computer Science
775, pages 33-46. Springer-Verlag, 1994.
[Bird, 1984] R. S. Bird. The promotion and accumulation strategies in
transformational programming. ACM Toplas, 6(4):487-504, 1984.
[Bjørner et al., 1988] D. Bjørner, A. P. Ershov, and N. D. Jones, editors.
Partial Evaluation and Mixed Computation. North-Holland, 1988. IFIP
TC2 Workshop on Partial and Mixed Computation, Gammel Avernaes,
Denmark, 1987.
[Flener and Deville, 1993] P. Flener and Y. Deville. Logic program syn-
thesis from incomplete specifications. Journal of Symbolic Computation,
15:775-805, 1993.
[Fuchs and Fromherz, 1992] N. E. Fuchs and M. P. J. Fromherz. Schema-
based transformations of logic programs. In T. Clement and K.-K. Lau,
editors, Logic Program Synthesis and Transformation, Proceedings LOP-
STR '91, Manchester, UK, pages 111-125. Springer-Verlag, 1992.
[Fujita, 1987] H. Fujita. An algorithm for partial evaluation with con-
straints. Technical Memorandum TM-0367, ICOT, Tokyo, Japan, 1987.
[Fujita and Furukawa, 1988] H. Fujita and K. Furukawa. A self-applicable
partial evaluator and its use in incremental compilation. New Generation
Computing, 6(2&3):91-118, 1988.
[Fuller and Abramsky, 1988] D. A. Fuller and S. Abramsky. Mixed compu-
tation of Prolog programs. New Generation Computing, 6(2&3): 119-141,
1988.
[Futamura, 1971] Y. Futamura. Partial evaluation of computation
process—an approach to a compiler-compiler. Systems, Computers,
Controls, 2(5):45-50, 1971.
[Gallagher, 1986] J. P. Gallagher. Transforming programs by specializing
interpreters. In Proceedings Seventh European Conference on Artificial
Intelligence, ECAI '86, pages 109-122, 1986.
[Gallagher, 1991] J. P. Gallagher. A system for specializing logic programs.
Technical Report TR-91-32, University of Bristol, Bristol, U.K., 1991.
[Gallagher, 1993] J. P. Gallagher. Tutorial on specialization of logic pro-
grams. In Proceedings of ACM SIGPLAN Symposium on Partial Evalu-
ation and Semantics Based Program Manipulation, PEPM '93, Copen-
hagen, Denmark, pages 88-98. ACM Press, 1993.
[Gallagher and Bruynooghe, 1990] J. P. Gallagher and M. Bruynooghe.
Some low-level source transformations for logic programs. In M. Bruy-
nooghe, editor, Proceedings of the Second Workshop on Meta-Pro-
gramming in Logic, Leuven, Belgium, pages 229-246. Department of
Computer Science, KU Leuven (Belgium), April 1990.
[Gallagher and Bruynooghe, 1991] J. P. Gallagher and M. Bruynooghe.
The derivation of an algorithm for program specialisation. New Gen-
eration Computing, 6(2):305-333, 1991.
[Gallagher et al., 1988] J. P. Gallagher, M. Codish, and E. Shapiro. Spe-
cialization of Prolog and FCP programs using abstract interpretation.
New Generation Computing, 6(2&3):159-186, 1988.
[Gardner and Shepherdson, 1991] P. A. Gardner and J. C. Shepherdson.
Unfold/fold transformations of logic programs. In J.-L. Lassez and
G. Plotkin, editors, Computational Logic, Essays in Honor of Alan
Robinson, pages 565-583. The MIT Press, 1991.