
Logic Programming,

Volume 5

DOV M. GABBAY
C. J. HOGGER
J. A. ROBINSON,
Editors

CLARENDON PRESS
HANDBOOK OF LOGIC
IN ARTIFICIAL INTELLIGENCE
AND LOGIC PROGRAMMING

Editors

Dov M. Gabbay, C. J. Hogger, and J. A. Robinson


HANDBOOKS OF LOGIC IN COMPUTER SCIENCE
and
ARTIFICIAL INTELLIGENCE AND LOGIC
PROGRAMMING

Executive Editor
Dov M. Gabbay

Administrator
Jane Spurr

Handbook of Logic in Computer Science


Volume 1 Background: Mathematical structures
Volume 2 Background: Computational structures
Volume 3 Semantic structures
Volume 4 Semantic modelling
Volume 5 Theoretical methods in specification and verification

Handbook of Logic in Artificial Intelligence and


Logic Programming
Volume 1 Logical foundations
Volume 2 Deduction methodologies
Volume 3 Nonmonotonic reasoning and uncertain reasoning
Volume 4 Epistemic and temporal reasoning
Volume 5 Logic programming
HANDBOOK OF LOGIC IN
ARTIFICIAL INTELLIGENCE
AND LOGIC PROGRAMMING
Volume 5
Logic Programming

Edited by
DOV M. GABBAY
and
C. J. HOGGER
Imperial College of Science, Technology and Medicine
London
and
J. A. ROBINSON
Syracuse University, New York

CLARENDON PRESS • OXFORD
1998
Oxford University Press, Great Clarendon Street, Oxford OX2 6DP
Oxford New York
Athens Auckland Bangkok Bogota Bombay
Buenos Aires Calcutta Cape Town Dar es Salaam
Delhi Florence Hong Kong Istanbul Karachi
Kuala Lumpur Madras Madrid Melbourne
Mexico City Nairobi Paris Singapore
Taipei Tokyo Toronto Warsaw
and associated companies in
Berlin Ibadan

Oxford is a trade mark of Oxford University Press

Published in the United States by


Oxford University Press Inc., New York

© The contributors listed on pp. xiv-xv, 1998

'Constraint logic programming: a survey' by J. Jaffar and M. J. Maher was


previously published in the Journal of Logic
Programming 19/20, 503-82. It is reproduced with
permission from Elsevier

All rights reserved. No part of this publication may be


reproduced, stored in a retrieval system, or transmitted, in any
form or by any means, without the prior permission in writing of Oxford
University Press. Within the UK, exceptions are allowed in respect of any
fair dealing for the purpose of research or private study, or criticism or
review, as permitted under the Copyright, Designs and Patents Act, 1988, or
in the case of reprographic reproduction in accordance with the terms of
licences issued by the Copyright Licensing Agency. Enquiries concerning
reproduction outside those terms and in other countries should be sent to
the Rights Department, Oxford University Press, at the address above.

This book is sold subject to the condition that it shall not,


by way of trade or otherwise, be lent, re-sold, hired out, or otherwise
circulated without the publisher's prior consent in any form of binding
or cover other than that in which it is published and without a similar
condition including this condition being imposed
on the subsequent purchaser.

A catalogue record for this book is available from the British Library

Library of Congress Cataloging in Publication Data


(Data available)

ISBN 0 19 853792 1

Typeset by the authors using LaTeX

Printed in Great Britain by


Bookcraft (Bath) Ltd
Midsomer Norton, Avon
Preface
I am very happy to present to the community the fifth and last volume of
our Handbook series, covering the area of Logic and Logic Programming.
The ideas and methods of logic programming gave a substantial push to
the development of logic itself. Ideas like negation as failure, goal directed
presentation of a logic, metalevel features in the object level are applicable
to any logical system and not only to the classical Horn clause fragment.
The central role and success of these ideas in logic programming pro-
vided an example to follow for research into similar developments for gen-
eral logics.
Logic programming is also a central tool in the new and wide area of
non-monotonic logic and artificial intelligence. The methods of abduction,
the use of constraints and higher order features have all interacted and
supported the new systems of logic designed to cope with practical common
sense reasoning.

The Handbooks
The Handbook of Logic in Artificial Intelligence and Logic Programming
and its companion, the Handbook of Logic in Computer Science, have been
created in response to a growing need for an in-depth survey of the appli-
cation of logic in AI and computer science.
We see the creation of the Handbook as a combination of authoritative
exposition, comprehensive survey, and fundamental research exploring the
underlying unifying themes in the various areas. The intended audience is
graduate students and researchers in the areas of computing and logic, as
well as other people interested in the subject. We assume as background
some mathematical sophistication. Much of the material will also be of
interest to logicians and mathematicians.
The tables of contents of the volumes were finalized after extensive dis-
cussions between Handbook authors and second readers. The first two
volumes present the background logic and mathematics extensively used
in artificial intelligence and logic programming. The point of view is ap-
plication oriented. The other volumes present major areas in which the
methods are used. These include: Volume 1—Logical foundations; Volume
2—Deduction methodologies; Volume 3—Nonmonotonic reasoning and un-
certain reasoning; Volume 4—Epistemic and temporal reasoning.
The chapters, which in many cases are of monographic length and scope,
are written with emphasis on possible unifying themes. The chapters have
an overview, introduction, and main body. A final part is dedicated to
more specialized topics.


Chapters are written by internationally renowned researchers in their
respective areas. The chapters are co-ordinated and their contents were dis-
cussed in joint meetings. Each chapter has been written using the following
procedures:
1. A very detailed table of contents was discussed and co-ordinated at
several meetings between authors and editors of related chapters.
The discussion was in the form of a series of lectures by the authors.
Once an agreement was reached on the detailed table of contents, the
authors wrote a draft and sent it to the editors and to other related
authors. For each chapter there is a second reader (the first reader is
the author) whose job it has been to scrutinize the chapter together
with the editors. The second reader's role is very important and has
required effort and serious involvement with the authors.
Second readers for this volume include (in alphabetical order) K. Apt,
M. Bruynooghe, G. Dowek, K. Fine, J. P. Gallagher, F. van Harmelen,
K. Inoue, B. Jayaraman, P. Kanellakis, R. Kowalski, J-L. Lassez, J.
Lloyd, M. Leuschel, D. W. Loveland, M. Maher, J. Meseguer, D.
Miller, G. Nadathur, T. Przymusinski, K. Satoh, D. J. Sherman, and
E. Wimmers.
2. Once this process was completed (i.e. drafts seen and read by a large
enough group of authors), there were other meetings on several chap-
ters in which authors lectured on their chapters and faced the criti-
cism of the editors and audience. The final drafts were prepared after
these meetings.
3. We attached great importance to group effort and co-ordination in the
writing of chapters. The first two parts of each chapter, namely the
introduction-overview and main body are not completely under the
discretion of the author, as he/she had to face the general criticism
of all the other authors. Only the third part of the chapter is entirely
for the authors' own personal contribution.
The Handbook meetings were generously financed by OUP, by SERC
contract SO/809/86, by the Department of Computing at Imperial Col-
lege, and by several anonymous private donations. We would like to thank
our colleagues, authors, second readers, and students for their effort and
professionalism in producing the manuscripts for the Handbook. We would
particularly like to thank the staff of OUP for their continued and enthusi-
astic support, Mrs L. Rivlin for help with design, and Mrs Jane Spurr, our
OUP Administrator, for her dedication and efficiency.

London D. M. Gabbay
July 1997

Contents
List of contributors xiv
Introduction: Logic and Logic Programming
Languages
Michael J. O'Donnell 1
1 Introduction 1
1.1 Motivation 1
1.2 A notational apology 3
2 Specifying logic programming languages 7
2.1 Semantic systems and semantic consequences ... 7
2.2 Query Systems, questions and answers 11
2.3 Examples of logic programming languages 15
3 Implementing logic programming languages 37
3.1 Proof systems 37
3.2 Soundness and completeness of proof systems ... 40
3.3 Programming systems 44
3.4 Soundness and completeness of programming
systems 49
3.5 Proof-theoretic foundations for logic programming 56
4 The uses of semantics 57
4.1 Logical semantics vs. denotational semantics . . . . 57
4.2 Logical semantics vs. initial/final-algebra and
Herbrand semantics 58

Equational Logic Programming


Michael J. O'Donnell 69
1 Introduction to equational logic programming 69
1.1 Survey of prerequisites 69
1.2 Motivation for programming with equations . . . . 71
1.3 Outline of the chapter 74
2 Proof systems for equational logic 75
2.1 Inferential proofs 75
2.2 Term rewriting proofs 78
2.3 The confluence property and the completeness of
term rewriting 81
3 Term rewriting proof strategies 96
3.1 Complete and outermost complete rewriting sequences 97
3.2 Sequentiality analysis and optimal rewriting . . . . 100
4 Algorithms and data structures to implement
equational languages 111
4.1 Data structures to represent terms 111
4.2 Pattern-matching and sequencing methods 120
4.3 Driving procedures for term rewriting 129
5 Compiling efficient code from equations 137
6 Parallel implementation 139
7 Extensions to equational logic programming 141
7.1 Incremental infinite input and output 141
7.2 Solving equations 147
7.3 Indeterminate evaluation in subset logic 149
7.4 Relational rewriting 151

Proof Procedures for Logic Programming


Donald W. Loveland and Gopalan Nadathur 163
1 Building the framework: the resolution procedure . . . . 163
1.1 The resolution procedure 164
1.2 Linear resolution refinements 175
2 The logic programming paradigm 186
2.1 Horn clause logic programming 186
2.2 A framework for logic programming 190
2.3 Abstract logic programming languages 198
3 Extending the logic programming paradigm 212
3.1 A language for hypothetical reasoning 213
3.2 Near-Horn Prolog 219
4 Conclusion 229

The Role of Abduction in Logic Programming


A. C. Kakas, R. A. Kowalski and F. Toni 235
1 Introduction 236
1.1 Abduction in logic 237
1.2 Integrity constraints 241
1.3 Applications 243
2 Knowledge assimilation 244
3 Default reasoning viewed as abduction 249
4 Negation as failure as abduction 254
4.1 Logic programs as abductive frameworks 255
4.2 An abductive proof procedure for LP 257
4.3 An argumentation-theoretic interpretation 263
4.4 An argumentation-theoretic interpretation of the abductive proof procedure 267
5 Abductive logic programming 269
5.1 Generalized stable model semantics 270
5.2 An abductive proof procedure for ALP 273
5.3 An argumentation-theoretic interpretation of the abduc-
tive proof procedure for ALP 277
5.4 Computation of abduction through TMS 279
5.5 Simulation of abduction 279
5.6 Abduction through deduction from the completion 285
5.7 Abduction and constraint logic programming . . . 286
6 Extended logic programming 288
6.1 Answer set semantics 289
6.2 Restoring consistency of answer sets 290
6.3 Rules and exceptions in LP 293
6.4 (Extended) Logic Programming without Negation as Fail-
ure 295
6.5 An argumentation-theoretic approach to ELP . . . 297
6.6 A methodology for default reasoning with explicit nega-
tion 299
6.7 ELP with abduction 300
7 An abstract argumentation-based framework for default reason-
ing 300
8 Abduction and truth maintenance 303
8.1 Justification-based truth maintenance 304
8.2 Assumption-based truth maintenance 305
9 Conclusions and future work 307

Semantics for Disjunctive and Normal


Disjunctive Logic Programs
Jorge Lobo, Jack Minker and Arcot Rajasekar 325
1 Introduction 325
2 Positive consequences in logic programs 327
2.1 Definite logic programming 328
2.2 Disjunctive logic programming 330
3 Negation in logic programs 337
3.1 Negation in definite logic programs 337
3.2 Negation in disjunctive logic programs 338
4 Normal or general disjunctive logic programs 340
4.1 Stratified definite logic programs 341
4.2 Stratified disjunctive logic programs 343
4.3 Well-founded and generalized well-founded
logic programs 346
4.4 Generalized disjunctive well-founded semantics . . 346


5 Summary 347
6 Addendum 349

Negation as Failure, Completion and


Stratification
J. C. Shepherdson 356
1 Overview/introduction 356
1.1 Negation as failure, the closed world assumption
and the Clark completion 356
1.2 Incompleteness of NF for comp(P) 359
1.3 Floundering, an irremovable source of
incompleteness 359
1.4 Cases where SLDNF-resolution is complete for
comp(P) 361
1.5 Semantics for negation via special classes of model 362
1.6 Semantics for negation using non-classical logics . . 363
1.7 Constructive negation: an extension of negation as fail-
ure 364
1.8 Concluding remarks 365
2 Main body 365
2.1 Negation in logic programming 365
2.2 Negation as failure; SLDNF-resolution 367
2.3 The closed world assumption, CWA(P) 370
2.4 The Clark completion, comp(P) 374
2.5 Definite Horn clause programs 384
2.6 Three-valued logic 385
2.7 Cases where SLDNF-resolution is complete for comp(P):
hierarchical, stratified and call-consistent programs. 391
2.8 Semantics for negation in terms of special classes
of models 393
2.9 Constructive negation; an extension of negation as
failure 402
2.10 Modal and autoepistemic logic 406
2.11 Deductive calculi for negation as failure 409

Meta-Programming in Logic Programming


P. M. Hill and J. Gallagher 421
1 Introduction 422
1.1 Theoretical foundations 423
1.2 Applications 425
1.3 Efficiency improvements 426
1.4 Preliminaries 427
2 The non-ground representation 429


2.1 The representation 431
2.2 Reflective predicates 434
2.3 Meta-programming in Prolog 439
3 The ground representation 440
3.1 The representation 442
3.2 Reflective predicates 448
3.3 The language Godel and meta-programming . . . . 453
4 Self-applicability 459
4.1 Separated meta-programming 460
4.2 Amalgamated meta-programming 461
4.3 Ambivalent logic 467
5 Dynamic meta-programming 468
5.1 Constructing programs 468
5.2 Updating programs 471
5.3 The three wise men problem 473
5.4 Transforming and specializing programs 478
6 Specialization of meta-programs 481
6.1 Logic program specialization 481
6.2 Specialization and compilation 487
6.3 Self-applicable program specializers 488
6.4 Applications of meta-program specialization . . . . 489

Higher-Order Logic Programming


Gopalan Nadathur and Dale Miller 499
1 Introduction 500
2 A motivation for higher-order features 502
3 A higher-order logic 510
3.1 The language 510
3.2 Equality between terms 513
3.3 The notion of derivation 517
3.4 A notion of models 519
3.5 Predicate variables and the subformula property . 522
4 Higher-order Horn clauses 523
5 The meaning of computations 528
5.1 Restriction to positive terms 529
5.2 Provability and operational semantics 534
6 Towards a practical realization 537
6.1 The higher-order unification problem 538
6.2 P derivations 541
6.3 Designing an actual interpreter 546
7 Examples of higher-order programming 549
7.1 A concrete syntax for programs 549
7.2 Some simple higher-order programs 552


7.3 Implementing tactics and tacticals 556
7.4 A comparison with functional programming . . . . 560
8 Using λ-terms as data structures 561
8.1 Implementing an interpreter for Horn clauses . . . 563
8.2 Dealing with functional programs as data 565
8.3 A limitation of higher-order Horn clauses 572
9 Hereditary Harrop formulas 574
9.1 Universal quantifiers and implications in goals . . . 574
9.2 Recursion over structures with binding 577
10 Conclusion 584

Constraint Logic Programming: A Survey


Joxan Jaffar and Michael J. Maher 591
1 Introduction 592
1.1 Constraint languages 593
1.2 Logic Programming 595
1.3 CLP languages 596
1.4 Synopsis 598
1.5 Notation and terminology 599
2 Constraint domains 601
3 Logical semantics 608
4 Fixedpoint semantics 609
5 Top-down execution 611
6 Soundness and completeness results 615
7 Bottom-up execution 617
8 Concurrent constraint logic programming 619
9 Linguistic extensions 621
9.1 Shrinking the computation tree 621
9.2 Complex constraints 623
9.3 User-defined constraints 624
9.4 Negation 625
9.5 Preferred solutions 626
10 Algorithms for constraint solving 628
10.1 Incrementality 628
10.2 Satisfiability (non-incremental) 630
10.3 Satisfiability (incremental) 633
10.4 Entailment 637
10.5 Projection 640
10.6 Backtracking 643
11 Inference engine 645
11.1 Delaying/wakeup of goals and constraints .... 645
11.2 Abstract machine 651
11.3 Parallel implementations 657


12 Modelling of complex problems 658
12.1 Analysis and synthesis of analog circuits 658
12.2 Options trading analysis 660
12.3 Temporal reasoning 664
13 Combinatorial search problems 665
13.1 Cutting stock 666
13.2 DNA sequencing 668
13.3 Scheduling 670
13.4 Chemical hypothetical reasoning 671
13.5 Propositional solver 674
14 Further applications 675

Transformation of Logic Programs


Alberto Pettorossi and Maurizio Proietti 697
1 Introduction 697
2 A preliminary example 701
3 Transformation rules for logic programs 704
3.1 Syntax of logic programs 704
3.2 Semantics of logic programs 706
3.3 Unfold/fold rules 707
4 Correctness of the transformation rules 715
4.1 Reversible transformations 716
4.2 A derived goal replacement rule 719
4.3 The unfold/fold proof method 721
4.4 Correctness results for definite programs 723
4.5 Correctness results for normal programs . . . . . . 736
5 Strategies for transforming logic programs 742
5.1 Basic strategies 745
5.2 Techniques which use basic strategies 747
5.3 Overview of other techniques 760
6 Partial evaluation and program specialization 764
7 Related methodologies for program development 771

Index 789
Contributors
J. Gallagher Department of Computer Science, University of Bristol,
University Walk, Bristol BS8 3PN.
P. M. Hill School of Computer Studies, The University of Leeds, Leeds
LS2 9JT.
J. Jaffar Department of Information Systems and Computer Science,
National University of Singapore, Kent Ridge, Singapore 0511.
A. C. Kakas Department of Computer Science, University of Cyprus,
PO Box 537, CY-1678 Nicosia, Cyprus.
R. A. Kowalski Department of Computing, Imperial College of Science,
Technology and Medicine, 180 Queen's Gate, London SW7 2BZ.
J. Lobo Department of Computer Science, University of Illinois at
Chicago Circle, Chicago, Illinois, USA.
D. W. Loveland Computer Science Department, Box 91029, Duke Uni-
versity, Durham, NC 27708-0129, USA.
M. J. Maher IBM Thomas J. Watson Research Center, PO Box 704,
Yorktown Heights, NY 10598, USA.
D. Miller Computer and Information Science, University of Pennsylva-
nia, Philadelphia, PA 19104-6389, USA.
J. Minker Department of Computer Science and Institute for Advanced
Computer Studies, University of Maryland, College Park, Maryland 20742,
USA.
G. Nadathur Department of Computer Science, University of Chicago,
1100 East 58th Street, Chicago, Illinois 60637, USA.
M. J. O'Donnell Department of Computer Science, University of Chicago,
1100 East 58th Street, Chicago, Illinois 60637, USA.
A. Pettorossi Electronics Department, University of Rome II, Via della
Ricerca Scientifica, I-00133 Roma, Italy.
M. Proietti Viale Manzoni 30, I-00185 Roma, Italy.
A. Rajasekar San Diego Supercomputer Center, La Jolla, California
92093, USA.
J. Shepherdson Department of Mathematics, University of Bristol,
University Walk, Bristol BS8 3PN.
F. Toni Department of Computing, Imperial College of Science, Technology and Medicine, 180 Queen's Gate, London SW7 2BZ.

Introduction: Logic and Logic
Programming Languages
Michael J. O'Donnell

Contents
1 Introduction 1
1.1 Motivation 1
1.2 A notational apology 3
2 Specifying logic programming languages 7
2.1 Semantic systems and semantic consequences ... 7
2.2 Query Systems, questions and answers 11
2.3 Examples of logic programming languages 15
3 Implementing logic programming languages 37
3.1 Proof systems 37
3.2 Soundness and completeness of proof systems ... 40
3.3 Programming systems 44
3.4 Soundness and completeness of programming systems 49
3.5 Proof-theoretic foundations for logic programming 56
4 The uses of semantics 57
4.1 Logical semantics vs. denotational semantics . . . . 57
4.2 Logical semantics vs. initial/final-algebra and
Herbrand semantics 58

1 Introduction
1.1 Motivation
Logic, according to Webster's dictionary [Webster, 1987], is 'a science that
deals with the principles and criteria of validity of inference and demon-
stration: the science of the formal principles of reasoning.' Such 'principles
and criteria' are always described in terms of a language in which infer-
ence, demonstration, and reasoning may be expressed. One of the most
useful accomplishments of logic for mathematics is the design of a particu-
lar formal language, the First Order Predicate Calculus (FOPC). FOPC is
so successful at expressing the assertions arising in mathematical discourse
that mathematicians and computer scientists often identify logic with clas-
sical logic expressed in FOPC. In order to explore a range of possible uses of
logic in the design of programming languages, we discard the conventional
identification of logic with FOPC, and formalize a general schema for a vari-
ety of logical systems, based on the dictionary meaning of the word. Then,
we show how logic programming languages may be designed systematically
for any sufficiently effective logic, and explain how to view Prolog, Dat-
alog, λProlog, Equational Logic Programming, and similar programming
languages, as instances of the general schema of logic programming. Other
generalizations of logic programming have been proposed independently by
Meseguer [Meseguer, 1989], Miller, Nadathur, Pfenning and Scedrov [Miller
et al., 1991], Goguen and Burstall [Goguen and Burstall, 1992].
The purpose of this chapter is to introduce a set of basic concepts for
understanding logic programming, not in terms of its historical develop-
ment, but in a systematic way based on retrospective insights. In order to
achieve a systematic treatment, we need to review a number of elementary
definitions from logic and theoretical computer science and adapt them to
the needs of logic programming. The result is a slightly modified logical
notation, which should be recognizable to those who know the traditional
notation. Conventional logical notation is also extended to new and anal-
ogous concepts, designed to make the similarities and differences between
logical relations and computational relations as transparent as possible.
Computational notation is revised radically to make it look similar to log-
ical notation. The chapter is self-contained, but it includes references to
the logic and theoretical computer science literature for those who wish to
explore connections.
There are a number of possible motivations for developing, studying,
and using logic programming languages. Many people are attracted to
Prolog, the best known logic programming language, simply for the spe-
cial programming tools based on unification and backtracking search that
it provides. This chapter is not concerned with the utility of particular
logic programming languages as programming tools, but with the value
of concepts from logic, particularly semantic concepts, in the design, im-
plementation, and use of programming languages. In particular, while
denotational and algebraic semantics provide excellent tools to describe
important aspects of programming systems, and often to prove correct-
ness of implementations, we will see that logical semantics can exploit the
strong traditional consensus about the meanings of certain logical notations
to prescribe the behavior of programming systems. Logical semantics also
provides a natural approach, through proof systems, to verifiably correct
implementations, that is sometimes simpler than the denotational and al-
gebraic approaches. A comparison of the three styles of semantics will show
that denotational and algebraic semantics provide descriptive tools, logical
semantics provides prescriptive tools, and the methods of algebraic seman-
tics may be used to translate logical semantics into denotational/algebraic
semantics.
In this chapter, a relation is called computable if and only if its char-
acteristic function is total recursive, and a relation is semicomputable if
and only if the set of ordered pairs in the relation is recursively enumer-
able. Recursion theorists and theoretical computer scientists often refer to
computable sets as decidable sets, but logicians sometimes call a theory
decidable when every formula is either provable or refutable in the theory.
The two meanings of 'decidable' are closely connected, but not identical,
and we avoid confusion by choosing a different word. When some com-
ponent of a relation is a finite set, the set is assumed to be represented
by a list of its members for the purpose of discussing computability and
semicomputability.

1.2 A notational apology


In order to understand logic programming rigorously in terms of formal con-
cepts from mathematical logic, and at the same time intuitively, we need
to look closely at the details of several formal relations from logic and from
theory of computation. We must come to understand the formal similarities
and differences between these relations, and how those formal properties
arise from the intuitive similarities and differences in our intended applica-
tions of these relations. Unfortunately, the conventional notations for logic
and computation look radically different, and take advantage of different
simplifying assumptions, which obscures those connections that are essen-
tial to intuitive applications of the corresponding concepts. So, we will
make visually small variations on conventional logical notation, extending
it to deal with questions and their answers as well as the traditional asser-
tions and their semantic interpretations. Then, we will radically redesign
conventional recursion-theoretic notation in order to display visually the
connections between computational relations and logical relations. In or-
der to be prepared for the strange look of the notations, we need to review
them all briefly in advance, although the precise definitions for the concepts
that they denote will be introduced gradually through Sections 2-3.
The important domains of conventional logical objects for our study are
the sets of
• logical assertions, or formulae F
• sets of formulae, or theories T ∈ 2^F
• semantic interpretations, or models M
• sets of models, representing knowledge K ∈ 2^M
• proofs, or derivations D
We add the unconventional domain of
• questions Q
Answers to questions are particular formulae, so no additional domain is
required for them. The domains of conventional computational objects are
the sets of
• programs P
• inputs I
• computations C
• outputs O
In recursion-theoretic treatments of computation, programs, inputs, and
outputs are all integers, but our analysis is more convenient when they are
allowed to be different domains. We will find strong intuitive analogies and
formal connections between
• programs and sets of formulae
• inputs and questions
• computations and proofs
• outputs and formulae (intended as answers to questions)
In order to understand the analogies and formal connections thoroughly,
we must investigate a number of relations between domains with varying
arities from two to four. In all cases, we will use multiple infix notation.
That is, each n-ary relation will be denoted by n - 1 symbols separating
its arguments. With some care in the geometrical design of the separator
symbols, we get a reasonably mnemonic notation.
There are two quaternary relational notations from which all the other
notations may be derived. Let Q be a question, T a set of formulae, D a
proof, and F a formula. The notation

Q ?- T | D - F

means that in response to the question Q, given the postulates in T, we
may discover the proof D of the answer F. Similarly, let I be an input, P
a program, C a computation, and O an output. The notation

I ▷ P ▯ C → O

means that in response to the input I, the program P may perform the
computation C, yielding output O. The correspondence between the ar-
guments Q and I, T and P, D and C, F and O displays the crucial cor-
respondence between logic and computation that is at the heart of logic
programming.
There are two closely related trinary notations.

Q ?- T |- F

means that there exists a proof D such that Q ?- T | D - F, and

I ▷ P ▯→ O

means that there exists a computation C such that I ▷ P ▯ C → O.
The symbol |- in Q ?- T |- F is the conventional symbol for proof in
mathematical logic; we take the liberty of decomposing it into the two
symbols | and - for the quaternary notation. The conventional recursion-
theoretic notation for our I ▷ P ▯→ O is φ_P(I) = O. The computational
symbol ▯→ and its components ▯ and → are designed to have similar shapes
to |-, |, and -.
Other relations from logic do not correspond directly to computational
relations, but can be understood by their connections to the quaternary
form, in which the logic/computation correspondence is direct and trans-
parent. In Section 3.2 I define Q ?- T | D - F to hold exactly when
both
Q ?- F and T | D - F
where Q ?- F means that F is an answer (not necessarily a correct one)
to the question Q, and T | D - F means that D is a proof of F, using
postulates in the set T. T | D - F is a conventional concept from math-
ematical logic (often written T, D |- F or T |-_D F). The question-answer
relation ?- is not conventional. Notice that each separating symbol in the
quaternary notation Q ?- T | D - F is used exactly once in the binary
and trinary forms from which it is defined, so the notational conjunction
of symbols suggests the logical conjunction of the denoted relations. Un-
fortunately, while the symbol ?- appears between the question Q and the
answer formula F in the binary notation Q ?- F, it is not adjacent to F
in the quaternary notation Q ?- T | D - F. The dash component - of the
symbol ?- mimics the - symbol at the end of the quaternary notation, and
the similar component of the |- symbol from the trinary notation above, as
a reminder that the ?- symbol is expressing a relation to the final answer
formula F, rather than to the set T of postulated formulae.
The quaternary computational relation is also defined as the conjunc-
tion of a binary and a trinary relation, but the arguments involved in these
relations do not correspond to the arguments of the binary and trinary
relations from logic. In Section 3.3 I define I ▷ P ▯ C → O to hold exactly
when both
I ▷ P ▯ C and C → O
where I ▷ P ▯ C means that the program P on input I may perform the
computation C, and C → O means that the computation C yields output
O. In this case, the mnemonic suggestion of the conjunction of the trinary
and binary relations in the quaternary notation works out perfectly, as all
argument positions are adjacent to the appropriate separator symbols.
A few other derived notations are useful for denoting relations from
logic. These all agree with conventional notation in mathematical logic.

T |- F

means that there exists a proof D such that T | D - F—that is, F is for-
mally derivable from T. Corresponding to the relation |- of formal deriv-
ability is the relation |= of semantic entailment.

T |= F

means that F is semantically entailed by T. Similarly,

Q ?- T |= F

means that F is an answer to Q semantically entailed by T (Q ?- F and
T |= F) in analogy to Q ?- T |- F. The mathematical definition of semantic
entailment involves one more semantic relation. Let M be a model, and F
a formula.

M |= F
means that F is true in M.
Table 1 displays all of the special notations for semantic, proof-theoretic,
and computational relations. The precise meanings and applications of
these notations are developed at length in subsequent sections. The no-
tation described above is subscripted when necessary to distinguish the
logical and computational relations of different systems.

     Logic                                     Computation
     Semantics            Proof
     Q ?- T | D - F                            I ▷ P ▯ C → O
     Q ?- T |= F          Q ?- T |- F          I ▷ P ▯→ O
                          T | D - F            I ▷ P ▯ C
     Q ?- F                                    C → O
     T |= F               T |- F
     M |= F

Table 1. Special notations for logical and computational relations

2 Specifying logic programming languages


Logic typically develops its 'principles and criteria of validity of inference'
by studying the relations between notations for assertions, the meanings
of those assertions, and derivations of notations expressing true assertions.
Mathematical logic formalizes these concepts using logical formulae as no-
tations, sets of models to analyze meanings and characterize truth, and
demonstrations or proofs as derivations of true formulae and inferences.
The structure of formulae alone is syntax, their relation to models is se-
mantics, and their relation to proofs is proof theory. Syntax is not relevant
to the present discussion. We must examine formal systems of semantics,
and augment them with formal concepts of questions and their answers, in
order to understand the specification of a logic programming language. In
Section 3 we see how formal systems of proof provide a natural approach
to the implementation of computations for logic programming.

2.1 Semantic systems and semantic consequences


A semantic system relates a set F of logical formulae to a set M of formal
models, each representing a conceivable state of the world in enough detail
to determine when a given formula represents a true assertion in that state
of the world.
Definition 2.1.1. A semantic system is a system S = (F, M, |=), where
1. F is a set of logical formulae
2. M is a set of models
3. |= is a relation on M × F
Let K ⊆ M. Theory(K) = {F ∈ F : M |= F for all M ∈ K}.
Let T ⊆ F. Models(T) = {M ∈ M : M |= F for all F ∈ T}.
Intuitively, M |= F is intended to mean that formula F holds in, or is
valid in, or is satisfied by, model M. Theory(K) is the fullest possible
description of K using a set of formulae in the language of the system.
Models(T) represents the state of knowledge given implicitly by the for-
mulae in T—knowing T we know that reality corresponds to one of the
models in Models(T), but we do not know which one. Notice the anti-
monotone relation between F and M:
T1 ⊆ T2 implies Models(T1) ⊇ Models(T2)
K1 ⊆ K2 implies Theory(K1) ⊇ Theory(K2)
Models(T1 ∪ T2) = Models(T1) ∩ Models(T2)
Models(T1 ∩ T2) ⊇ Models(T1) ∪ Models(T2)
Theory(K1 ∪ K2) = Theory(K1) ∩ Theory(K2)
Theory(K1 ∩ K2) ⊇ Theory(K1) ∪ Theory(K2)

In order to provide satisfactory intuitive insight, a semantic system
must relate the syntactic structure of formulae to the determination of
truth. For example, well-known sets of formulae often come with a syntactic
operator to construct, from two formulae A and B, their logical conjunction
A ∧ B. The semantics for conjunctions is defined structurally, by the rule
M |= A ∧ B if and only if M |= A and M |= B. The formal analysis of this
chapter deals only with the abstract relation of a model to a formula that
holds in the state of the world represented by that model, not the internal
structure of that relation, because we are interested here in the use of
semantics for understanding logic programming, rather than the deeper
structure of semantics itself. Goguen's and Burstall's institutions [Goguen
and Burstall, 1992] are similar to semantic systems, but they capture in
addition the structural connection between syntax and semantics through
category theory, and show that the functions Models and Theory form a
Galois connection.
Notice that the sets F of formulae and M of models are not required to
be given effectively. In well-known semantic systems, the set of formulae is
normally computable, since formulae are normally finite syntactic objects,
and it is easy to determine mechanically whether a given object is a formula
or not. Infinite formulae, however, have important uses, and they can be
given practical computational interpretations, so we do not add any formal
requirement of computability. The set of models, on the other hand, is
typically quite complex, because models represent conceivable states of an
external world, rather than finite constructions of our own minds. In fact,
for many semantic systems there are technical set-theoretic problems even
in regarding the collection of models in the system as a set, but those
problems do not affect any of the results of this chapter.
In this chapter, basic concepts are illustrated through a running ex-
ample based on the shallow implicational calculus (SIC), designed to be
almost trivial, but just complex enough to make an interesting example.
More realistic examples are treated toward the end of the chapter.
Example 2.1.2. Let At be a set of atomic propositional formulae. The
set Fsh of formulae in the shallow implicational calculus is the smallest set
such that:
1. At ⊆ Fsh
2. If a, b ∈ At, then (a => b) ∈ Fsh
The set Msh of models in SIC is defined by

Msh = 2^At

The semantic relation |=sh ⊆ Msh × Fsh is defined by:

1. For a ∈ At, M |=sh a if and only if a ∈ M
2. M |=sh (a => b) if and only if either b ∈ M or a ∉ M
Now (Fsh, Msh, |=sh) is a semantic system, representing the classical con-
cept of meaning for the implicational formulae of SIC.
SIC is just the restriction of the classical propositional calculus [An-
drews, 1986; Kleene, 1952; Gallier, 1986] to atomic propositional formu-
lae, and implications between atomic propositional formulae. It is called
'shallow' because no nesting of implications is allowed. Since the truth of
formulae in SIC (as in the propositional calculus) is determined entirely by
the truth of atomic formulae, a model merely specifies the set of atomic
formulae that are true in a given conceivable state of the world. Following
the tradition of material implication in classical logic, an implication is true
precisely when its conclusion is true, or its hypothesis is false.
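This semantic relation is simple enough to execute directly. The following sketch (in Python, which is not part of the original chapter; the class and function names are illustrative) represents a model as a set of atomic formulae and implements the two clauses of |=sh from Example 2.1.2.

```python
from dataclasses import dataclass

# A SIC formula is either an atomic formula (here a string) or an
# implication (a => b) between two atoms, as in Example 2.1.2.
@dataclass(frozen=True)
class Imp:
    hyp: str    # the atomic hypothesis a
    concl: str  # the atomic conclusion b

# A model M is a set of atomic formulae, i.e. an element of 2^At.
def satisfies(model: frozenset, formula) -> bool:
    """M |=sh F, following the two clauses of the semantic relation."""
    if isinstance(formula, Imp):
        # M |=sh (a => b) iff b is in M or a is not in M (material implication)
        return formula.concl in model or formula.hyp not in model
    return formula in model  # M |=sh a iff a is in M

M = frozenset({"p"})
assert satisfies(M, "p")
assert not satisfies(M, Imp("p", "q"))  # p true, q false: implication fails
assert satisfies(M, Imp("q", "r"))      # hypothesis q is false in M
```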
For the formal definition of a logic programming language, the impor-
tant thing about a semantic system is the semantic-consequence relation
that it defines, determining when the truth of a set of formulae justifies
inferring the truth of an additional formula.
Definition 2.1.3 ([Andrews, 1986; Gallier, 1986]). Let S = (F, M,
|=) be a semantic system. The semantic-consequence relation defined by S
is |= ⊆ 2^F × F, where T |= F if and only if M |= F for all M ∈ Models(T).
The semantic-consequence relation |= is compact if and only if, for all
T ⊆ F and F ∈ F, whenever T |= F there exists a finite subset Tf ⊆ T
such that Tf |= F.
Intuitively, T |= F means that F is a semantic consequence of T, since
F must be true whenever all formulae in T are true. Semantic consequences
are often called logical consequences; our terminology is chosen to highlight
the contrast between semantic consequences and the provable consequences
of Definition 3.1.4. Notice that Theory(Models(T)) is the set of semantic
consequences of T. It is easy to show that an arbitrary relation |= on 2^F × F
is the semantic-consequence relation of some semantic system if and only
if it is
1. reflexive: F ∈ T implies that T |= F
2. monotone: T |= F and T ⊆ U imply that U |= F
3. transitive: T |= F and T ∪ {F} |= G imply that T |= G
In order for a semantic-consequence relation to be useful for logic program-
ming, or for rigorous formal reasoning, it must be sufficiently effective.
Well-known semantic systems normally define semantic-consequence rela-
tions that are compact—their behavior on arbitrary sets is determined by
their behavior on finite sets. Normal semantic-consequence relations are
semicomputable, but not necessarily computable, when restricted to finite
sets of formulae in the first component. Fortunately, semicomputability is
enough for logic programming.
Example 2.1.4. The semantic-consequence relation |=sh of the shallow
implicational calculus of Example 2.1.2 is compact, and has a particularly
simple behavior:
1. for atomic formulae a ∈ At, T |=sh a if and only if there is a fi-
nite sequence (a_0, ..., a_m) of atomic formulae such that a_0 ∈ T, and
(a_i => a_{i+1}) ∈ T for all i < m, and a_m = a
2. T |=sh (a => b) if and only if there is a finite sequence (a_0, ..., a_m) of
atomic formulae such that a_0 ∈ T ∪ {a}, and (a_i => a_{i+1}) ∈ T for all
i < m, and a_m = b
We may think of the implications in T as directed edges in a graph whose
vertices are atomic formulae. Atomic formulae in T are marked true. An
atomic formula a is a semantic consequence of T precisely if there is a
directed path from some atomic formula in T to a. Similarly, an implication
a => b is a semantic consequence of T precisely if there is a directed path
from a, or from an atomic formula in T, to b. Notice that SIC satisfies the
deduction property [Andrews, 1986; Kleene, 1952; Gallier, 1986]:

T |=sh (a => b) if and only if T ∪ {a} |=sh b
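
For finite T, the graph reading above yields an immediate decision procedure. The sketch below (Python; the encoding of T as a set of atoms plus (hypothesis, conclusion) pairs is an illustrative choice, not the chapter's notation) decides T |=sh a by reachability and reduces T |=sh (a => b) to it through the deduction property.

```python
from collections import deque

def entails_atom(atoms: set, imps: set, goal: str) -> bool:
    """T |=sh goal for an atomic goal: goal must be reachable from some
    atom of T along the directed edges given by the implications in T."""
    frontier = deque(atoms)
    reached = set(atoms)
    while frontier:
        a = frontier.popleft()
        for hyp, concl in imps:
            if hyp == a and concl not in reached:
                reached.add(concl)
                frontier.append(concl)
    return goal in reached

def entails_imp(atoms: set, imps: set, a: str, b: str) -> bool:
    """T |=sh (a => b), using the deduction property:
    T |=sh (a => b) iff T ∪ {a} |=sh b."""
    return entails_atom(atoms | {a}, imps, b)

# T = {p, p => q, q => r}
atoms, imps = {"p"}, {("p", "q"), ("q", "r")}
assert entails_atom(atoms, imps, "r")        # directed path p -> q -> r
assert entails_imp(atoms, imps, "s", "s")    # trivially, s reaches s
assert not entails_atom(atoms, imps, "s")    # s is not reachable from T
```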

A semantic system provides a conceptual tool for analyzing a primitive sort


of communication in a monologue. A state of implicit knowledge is natu-
rally represented by the set of models corresponding to conceivable states of
the world that are consistent with that knowledge. Notice that larger sets
of models represent smaller amounts of knowledge. For a general discussion
of knowledge as sets of models, the shortcomings of such representations,
and problems and paradoxes that arise when subtle sorts of knowledge are
considered, see [Fagin et al., 1984]. The knowledge involved in formal anal-
ysis of the examples of logic programming in this chapter is simple enough
to be represented by sets of models without presenting the problems that
arise in a more general setting. Explicit knowledge is naturally represented
by a set of formulae. Models(T) is the implicit knowledge given explicitly
by T. Similarly, Theory(K) is the strongest explicit representation of the
implicit knowledge K that is expressible in a given language, but there
is no guarantee that an agent with implicit knowledge K can effectively
produce all of the explicit knowledge Theory (K).
Consider a speaker, whose state of knowledge is represented by Ks, and
an auditor with initial knowledge K0. The speaker wishes to communi-
cate some of her knowledge to the auditor, so she utters a set of formulae
T ⊆ Theory(Ks). The impact of the speaker's utterance on the auditor's
state of knowledge is to remove from the auditor's set of models those that
do not satisfy T. That is, K0 is replaced by K1 = K0 ∩ Models(T). No-
tice that, if the auditor's initial knowledge is minimal, that is if K0 is the
set of all models in the semantic system, then K1 = Models(T), so the
formulae implied by the new knowledge, Theory(K1), are exactly the se-
mantic consequences of T. In logic programming systems, the programmer
plays the part of the speaker above, the processor plays the part of the
auditor, the program is the utterance, and the logical meaning of the pro-
gram is the resulting state of knowledge produced in the auditor/processor.
Inputs to, computations of, and outputs from logic programs are treated
later.
Notice that this style of semantic analysis of communication does not
give either speaker or auditor direct access to inspect or modify the models
constituting the other's state of implicit knowledge. Rather, all such access
is mediated by the utterance of explicit logical formulae. Also, notice that
there is no attempt to construct a unique model to represent a state of
knowledge, or the information communicated by an utterance. Rather, an
increase in implicit knowledge is represented by a reduction in the variabil-
ity of members of a set of models, any one of which might represent the
real state of the world. Unless the semantic-consequence relation of a se-
mantic system is very easy to compute—which it seldom is—the difference
between implicit knowledge and effectively utterable explicit knowledge can
be quite significant. The proof systems of Section 3.1 help describe a way
in which implicit knowledge is made explicit, and yield a rough description
of the computations of logic programs.
The preceding scheme for representing communication of knowledge
deals naturally with a sequence of utterances, by iterating the process of
shrinking the auditor's set of models. There is no provision, however, for
analyzing any sort of interactive dialogue, other than as a pair of formally
unrelated monologues. The query systems of the next section introduce a
primitive sort of interactive question-answering dialogue.

2.2 Query Systems, questions and answers


Semantic systems and semantic-consequence relations are conventional sub-
jects for logical investigation. They suffice for discussions of the truth of a
formula and the validity of the inference of a new formula from a given set
of formulae. In order to analyze the relation between input to a logic pro-
gram and the corresponding output, we need a formal basis for discussing
questions and their answers. Mathematical logicians have given very little
attention to this branch of logic—one exception is the formal treatment
by Belnap and Steel [Belnap Jr. and Steel, 1976]. Query systems are an
abstraction of the common formal schema from a number of instances of
question-answer domains defined by Belnap and Steel.
Definition 2.2.1. A query system is a system Q = (F, Q, ?-), where

1. F is a set of logical formulae


2. Q is a set of questions
3. ?- is a relation on Q × F
Questions, like formulae, are normally finite syntactic objects, and the set
of all questions is normally computable, but we allow exceptions to the
normal case.
Q ?- F is intended to mean that F is an answer to Q. ?- is intended
only to determine the acceptable form for an answer to a question, not
to carry any information about correctness of an answer. For example, it
is reasonable to say that '2 + 2 = 5' is an incorrect answer to 'what is
2 + 2?', while '2 + 2 = 2 + 2' is correct, but not an answer. The correctness or
incorrectness of an answer is evaluated semantically with respect to explicit
knowledge.
Definition 2.2.2. Let Q = (FQ, Q, ?-) be a query system, and let
S = (FS, M, |=) be a semantic system with FQ ⊆ FS.
Q ?- T |= F means that F ∈ FQ is a semantically correct answer to
Q ∈ Q for explicit knowledge T ⊆ FS, defined by

Q ?- T |= F if and only if Q ?- F and T |= F

A question Q ∈ Q is semantically answerable for explicit knowledge
T ⊆ FS if and only if there exists a formula F ∈ FQ such that F is a
semantically correct answer to Q in T.
Meseguer [Meseguer, 1989; Meseguer, 1992] proposes a different notion
of question answering, in which a question is a formula, and an answer is a
proof (in an abstract notation omitting many details) of the formula. (This
is an interesting twist on the formulae as types concept [Howard, 1980;
Tait, 1967], which is more usually applied by letting a program specification
be a formula, and a program be a proof of the formula [Constable et al.,
1986].)
Several interesting query systems may be defined for the shallow impli-
cational calculus.
Example 2.2.3. Let imp be a new formal symbol, and let Fsh be the
set of formulae in SIC defined in Example 2.1.2. Let

Qs1 = {imp(F) : F ∈ At}

Define the relation ?-s1 ⊆ Qs1 × Fsh by

imp(c) ?-s1 (a => b) if and only if a = c

Now (Fsh, Qs1, ?-s1) is a query system representing the conceivable an-
swers to questions of the form 'what atomic formula does a imply?'
The query system of Example 2.2.3 above is susceptible to two sorts of
answers that may be intuitively unsatisfying. First, in a state of knowledge
in which an atomic formula b is known to be true, (a => b) is a correct
answer to questions imp(a) for all atomic formulae a. This problem may
be avoided by considering states of knowledge in which only implications are
known, or it may be addressed by changing the underlying semantic system
to one with a relevant interpretation of implication [Anderson and Belnap
Jr., 1975]. Second, (a => a) is a correct answer to the question imp(a).
(a => a) is a tautology, that is, it holds in all models, so it cannot give
any information about a state of knowledge. We could define a new query
system, in which only nontautologies are considered to be answers. Since,
for most useful logics, the detection of tautologies ranges from intractable to
impossible, such a technique is generally unsatisfying. A better approach
is to let a question present a set of atomic formulae that must not be
used in an answer, since the questioner considers them to be insufficiently
informative. We may find later that certain nontautological formulae are
uninformative for various reasons, and this technique reserves the flexibility
to handle those cases.
Example 2.2.4. Let rest-imp be a new formal symbol, and let

Qs2 = {rest-imp(a, A) : a ∈ At and A ⊆ At}

Define the relation ?-s2 ⊆ Qs2 × Fsh by

rest-imp(c, C) ?-s2 (a => b) if and only if a = c and b ∉ C

Now (Fsh, Qs2, ?-s2) is a query system representing the conceivable an-
swers to questions of the form 'what atomic formula not in A does a imply?'
The new query system of Example 2.2.4 may be used very flexibly
to guide answers toward the most informative implications of an atomic
formula a. If the explicit knowledge available to the auditor to answer
questions is finite, then there are only a finite number of atomic formulae
that can appear in an answer, so the sets of prohibited formulae may simply
be listed. In more sophisticated languages than SIC, we need some sort
of finite notation for describing large and even infinite sets of prohibited
answers.
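
As a small illustration of Definition 2.2.2, the sketch below (Python; the tuple encodings of questions and implications are assumptions made for the example, not notation from the chapter) checks the answer-form relation Q ?- F for the imp and rest-imp questions, and combines it with any implementation of |=sh to test semantic correctness.

```python
# An implication (a => b) is encoded as the pair (a, b); a question is
# ("imp", a) or ("rest-imp", a, prohibited_atoms).

def is_answer(question, formula) -> bool:
    """Q ?- F: F has the form required by Q (no correctness judgement)."""
    if not (isinstance(formula, tuple) and len(formula) == 2):
        return False                      # only implications answer these
    hyp, concl = formula
    if question[0] == "imp":
        return hyp == question[1]
    if question[0] == "rest-imp":
        _, a, prohibited = question
        return hyp == a and concl not in prohibited
    return False

def correct_answer(question, formula, entails) -> bool:
    """Q ?- T |= F: an answer that is also semantically entailed by T.
    `entails` stands for any implementation of T |=sh, for example the
    reachability sketch given after Example 2.1.4."""
    return is_answer(question, formula) and entails(formula)

# (p => p) answers imp(p), but rest-imp(p, {p}) rules it out.
assert is_answer(("imp", "p"), ("p", "p"))
assert not is_answer(("rest-imp", "p", {"p"}), ("p", "p"))
```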
Query systems allow a further enrichment of the analysis of communi-
cation. Once a speaker has communicated some implicit knowledge K to
an auditor by uttering formulae, a questioner (sometimes, but not always,
identical with the speaker) may ask a question Q, which the auditor tries
to answer by discovering a formula F such that Q ?- F (F is an answer
to the question Q), and F ∈ Theory(K) (F is correct according to the
implicit knowledge K).
So, given a set T of formulae expressing the knowledge Models(T), a
question Q provides an additional constraint on the search for a formula F
such that T |= F, to ensure that Q ?- F as well. In many cases, there is
more than one correct answer F such that Q ?- T |= F. Depending on the
context, the questioner may want a single answer chosen nondeterministi-


cally from the set of correct answers, or a best answer under some criterion.
The case where the questioner wants a list of all answers may be modelled
by representing that list by a single formula giving the conjunction of all
the list elements. A particularly useful criterion for best answer uses the
logical consequence relation.
Definition 2.2.5. Let Q = (FQ, Q, ?-) be a query system, and let
S = (FS, M, |=) be a semantic system with FQ ⊆ FS. F is a consequen-
tially strongest correct answer to the question Q for explicit knowledge T
if and only if

1. Q ?- T |= F
2. for all G ∈ FQ, whenever Q ?- T |= G, then {F} |= G

Consequentially strongest answers are not necessarily unique, but all con-
sequentially strongest answers are semantically equivalent. Notice that the
comparison of strength for two answers F and G is done without taking
into account the knowledge T. That is, we require {F} |= G, rather than
T ∪ {F} |= G. This makes sense because T is known to the auditor, but
not necessarily to the questioner. Even if the questioner knows T, he may
not be able to derive its consequences. The whole purpose of the communi-
cation between questioner and auditor is to give the questioner the benefit
of the auditor's knowledge and inferential power. So, the value of an an-
swer to the questioner must be determined independently of the knowledge
used by the auditor in its construction (the alternative form T ∪ {F} |= G
holds trivially by monotonicity, so it carries no information anyway).
In order to illustrate the use of consequentially strongest answers, we
extend SIC to deal with conjunctions of implications.
Example 2.2.6. Expand the formulae of SIC to the set

Fsc = Fsh ∪ {F1 ∧ ... ∧ Fm : F1, ..., Fm ∈ Fsh}

of formulae in the shallow implicational-conjunctive calculus (SICC). The
semantic systems and proof systems of Examples 2.1.2, 3.1.3, 3.1.2 extend
in the natural way to deal with conjunctive formulae. Let conj-imp be a
new formal symbol, and let

Qsc = {conj-imp(a) : a ∈ At}

Define the relation ?-sc ⊆ Qsc × Fsc by

conj-imp(c) ?-sc (a1 => b1) ∧ ... ∧ (am => bm) if and only if
ai = c for all i ≤ m
Now (Fsc, Qsc, ?-sc) is a query system representing the conceivable an-
swers to questions of the form 'what are some atomic formulae implied by
a?' A consequentially strongest answer to conj-imp(a) is a conjunction
of all of the implications with hypothesis a that hold in a given state of
knowledge.
The concept of consequentially strongest answers is particularly helpful
in systems where a potentially infinite answer is produced incrementally.
The entire infinite answer may often be read as an infinite conjunction of
finite formulae, and the requirement of consequentially strongest answers
guarantees that the incremental production of the answer does not stop
prematurely, before all available information is expressed.
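For a finite SICC theory, the observation above can be made computational. The following sketch (Python; the encoding of T and all names are illustrative assumptions, not part of the chapter) enumerates the conjuncts of one consequentially strongest answer to conj-imp(a).

```python
def reachable(start: set, imps: set) -> set:
    """All atoms reachable from `start` along implication edges (a, b)."""
    reached = set(start)
    changed = True
    while changed:
        changed = False
        for hyp, concl in imps:
            if hyp in reached and concl not in reached:
                reached.add(concl)
                changed = True
    return reached

def strongest_conj_imp(atoms: set, imps: set, a: str) -> list:
    """Conclusions b with T |=sh (a => b), for T given by `atoms` and
    `imps`; the conjunction of the implications (a => b) over this list
    is a consequentially strongest answer to conj-imp(a)."""
    # By the deduction property, T |=sh (a => b) iff T ∪ {a} |=sh b,
    # i.e. iff b is reachable from T ∪ {a}.
    return sorted(reachable(atoms | {a}, imps))

# T = {p, p => q, q => r}: every atom true in all models of T ∪ {q}
# contributes a conjunct, so the answer to conj-imp(q) is
# (q => p) ∧ (q => q) ∧ (q => r).
print(strongest_conj_imp({"p"}, {("p", "q"), ("q", "r")}, "q"))
# -> ['p', 'q', 'r']
```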
In logic programming systems, the user of a program plays the part of
the questioner. The input is the question, and the output is the answer, if
any, discovered and proved by the processor/auditor. This scenario allows
the knowledge resources of a programmer/speaker to be combined with the
deductive powers of a processor/auditor, in order to answer questions from
the user/questioner.
2.3 Examples of logic programming languages
Now we can design a wide variety of logic programming languages, by
defining appropriate semantic systems and query systems.
2.3.1 Programming in first-order predicate calculus
Several logic programming systems, particularly Prolog and Relational
Databases, are essentially sublanguages of a general language for logic pro-
gramming in FOPC.
Definition 2.3.1 ([Andrews, 1986; Kleene, 1952; Gallier, 1986]).
Let V be a countably infinite set. Members of V are called variables, and
are written u,v,w, x, y,z, sometimes with subscripts.
Let Funi be a countably infinite set for each i ≥ 0, with Funi and Funj
disjoint when i ≠ j. Members of Funi are called function symbols of arity
i, and are written f, g, h, sometimes with subscripts. A function symbol of
arity 0 in Fun0 is called a constant, and may be written a, b, c, d, e.
Let Predi be a countably infinite set for each i ≥ 0, with Predi and
Predj disjoint when i ≠ j, Predi and Funj disjoint for all i and j. Mem-
bers of Predi are called predicate or relation symbols of arity i, and are
written P, Q, R, sometimes with subscripts. A predicate symbol of arity 0
in Pred0 is called a propositional symbol, and is closely analogous to an
atomic propositional formula in At as used in Example 2.1.2.
The set TP of terms in FOPC is defined inductively as the least set
such that:
1. if x ∈ V then x ∈ TP
2. if a ∈ Fun0 then a ∈ TP
16 Michael J. O'Donnell

3. if f 6EFun; for some i > 0 and ti,... ,t, € TP, then /(ti,... ,ti) 6 Tp
Terms are intended to represent objects in some universe of discourse.
/ ( < i , . . . , t { ) is intended to represent the result of applying the function
denoted by / to the objects represented by ti,...,ti. We often take the
liberty of writing binary function application in infix notation. For exam-
ple, if + € Fun2 we may write (ti + i?) for +(^1,^2)- A ground term is a
term containing no variables.
The set Fp of formulae in FOPC is defined inductively as the least set
such that:
1. True, False ∈ FP
2. if P ∈ Pred0, then P ∈ FP
3. if P ∈ Predi for some i > 0 and t1, …, ti ∈ TP, then P(t1, …, ti) ∈ FP
4. if A, B ∈ FP, then (A ∧ B), (A ∨ B), (A ⇒ B), (¬A) ∈ FP
5. if A ∈ FP and x ∈ V, then (∃x : A), (∀x : A) ∈ FP
Formulae in FOPC are intended to represent assertions about the objects
represented by their component terms. True and False are the trivially
true and false assertions, respectively. P(t1, …, ti) represents the assertion
that the relation denoted by P holds between t1, …, ti. (A ∧ B), (A ∨ B),
(A ⇒ B), (¬A) represent the usual conjunction, disjunction, implication,
and negation. (∃x : A) represents 'there exists x such that A,' and (∀x : A)
represents 'for all x, A.' Parentheses may be dropped when they can be inferred
from normal precedence rules.
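As a concrete (and purely illustrative) aside, the inductive definitions of TP and FP translate directly into data structures. The Python sketch below uses invented class names; constants are represented as function applications with an empty argument list, and it makes no claim about how any actual system represents formulae.

from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Var:                     # a variable x in V
    name: str

@dataclass(frozen=True)
class App:                     # f(t1, ..., ti); a constant c is App('c', ())
    fun: str
    args: tuple = ()

Term = Union[Var, App]

@dataclass(frozen=True)
class Atom:                    # P(t1, ..., ti), including propositional symbols
    pred: str
    args: tuple = ()

@dataclass(frozen=True)
class Not:                     # (not A)
    arg: object

@dataclass(frozen=True)
class BinOp:                   # op is '&', '|' or '=>'
    op: str
    left: object
    right: object

@dataclass(frozen=True)
class Quant:                   # kind is 'forall' or 'exists'
    kind: str
    var: Var
    body: object

# (forall x : P(x) => Q(f(x), c)) as a concrete object:
x = Var('x')
example = Quant('forall', x,
                BinOp('=>', Atom('P', (x,)),
                            Atom('Q', (App('f', (x,)), App('c')))))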
In a more general setting, it is best to understand Funi and Predi as
parameters giving a signature for first-order logic, and let them vary to
produce an infinite class of predicate calculi. For this chapter, we may take
Funi and Predi to be arbitrary but fixed. In many texts on mathematical
logic, the language of FOPC includes a special binary predicate symbol '='
for equality. We follow the Prolog tradition in using the pure first-order
predicate calculus, without equality, and referring to it simply as FOPC.
The intended meanings of FOPC formulae sketched above are formal-
ized by the traditional semantic system defined below. First, we need a set
of models for FOPC.
Definition 2.3.2 ([Andrews, 1986; Kleene, 1952; Gallier, 1986]).
Let 𝒰 be an infinite set, called the universe. Let U ⊆ 𝒰.
A variable assignment over U is a function ν : V → U.
A predicate assignment over U is a function

ρ : ∪{Predi : i ≥ 0} → ∪{2^(U^i) : i ≥ 0}

such that P ∈ Predi implies ρ(P) ⊆ U^i for all i ≥ 0.
A function assignment over U is a function

τ : ∪{Funi : i ≥ 0} → ∪{U^(U^i) : i ≥ 0}

such that f ∈ Funi implies τ(f) : U^i → U for all i ≥ 0.
If U ⊆ 𝒰, τ is a function assignment over U, and ρ is a predicate
assignment over U, then ⟨U, τ, ρ⟩ is a model of FOPC. MP is the set of all
models of FOPC.
Notice that FOPC models assign values to function and predicate sym-
bols, but not to variables. The exclusion of variable assignments from
models is the only technically significant distinction between variables and
constant symbols. A function assignment and a variable assignment to-
gether determine a valuation of terms. With a predicate assignment, they
also define a valuation of formulae.
Definition 2.3.3 ([Andrews, 1986; Kleene, 1952; Gallier, 1986]).
Let τ be a function assignment over U, ν a variable assignment over U.
The term valuation τν : TP → U is defined inductively by
1. if x ∈ V, then τν(x) = ν(x)
2. if f ∈ Funi, and t1, …, ti ∈ TP, then
τν(f(t1, …, ti)) = τ(f)(τν(t1), …, τν(ti))
In addition, let ρ be a predicate assignment over U. The formula valuation
ρτ,ν : FP → {0,1} is defined inductively by
1. ρτ,ν(False) = 0
2. ρτ,ν(True) = 1
3. if P ∈ Predi, and t1, …, ti ∈ TP, then ρτ,ν(P(t1, …, ti)) = 1 if and
only if ⟨τν(t1), …, τν(ti)⟩ ∈ ρ(P)
4. ρτ,ν(A ∧ B) = 1 if and only if ρτ,ν(A) = 1 and ρτ,ν(B) = 1
5. ρτ,ν(A ∨ B) = 1 if and only if ρτ,ν(A) = 1 or ρτ,ν(B) = 1
6. ρτ,ν(A ⇒ B) = 1 if and only if ρτ,ν(A) = 0 or ρτ,ν(B) = 1
7. ρτ,ν(¬A) = 1 if and only if ρτ,ν(A) = 0
8. ρτ,ν(∃x : A) = 1 if and only if ρτ,ν′(A) = 1 for some ν′ such that y ≠ x
implies ν(y) = ν′(y)
9. ρτ,ν(∀x : A) = 1 if and only if ρτ,ν′(A) = 1 for all ν′ such that y ≠ x
implies ν(y) = ν′(y)
Now, we may define an appropriate semantic system for FOPC.
Definition 2.3.4. The classical semantic system for FOPC is ⟨FP, MP, ⊨P⟩,
where ⟨U, τ, ρ⟩ ⊨P F if and only if ρτ,ν(F) = 1 for all variable assignments
ν over U.
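For a finite subset U of the universe these clauses can be executed directly: a model supplies τ and ρ, and ⊨P is checked by trying every variable assignment over U. The Python sketch below reuses the invented Var/App/Atom/Not/BinOp/Quant classes and the sentence `example` from the previous fragment; it is an illustration of Definitions 2.3.3 and 2.3.4 on finite models, not an implementation technique used by any of the systems discussed later.

from itertools import product

def term_val(t, tau, nu):
    # tau maps function symbols to Python functions; nu maps variable names to U.
    if isinstance(t, Var):
        return nu[t.name]
    return tau[t.fun](*(term_val(a, tau, nu) for a in t.args))

def holds(f, U, tau, rho, nu):
    # The formula valuation of Definition 2.3.3, with 1 rendered as True.
    if isinstance(f, Atom):
        return tuple(term_val(a, tau, nu) for a in f.args) in rho[f.pred]
    if isinstance(f, Not):
        return not holds(f.arg, U, tau, rho, nu)
    if isinstance(f, BinOp):
        l, r = holds(f.left, U, tau, rho, nu), holds(f.right, U, tau, rho, nu)
        return {'&': l and r, '|': l or r, '=>': (not l) or r}[f.op]
    if isinstance(f, Quant):
        vals = (holds(f.body, U, tau, rho, dict(nu, **{f.var.name: d})) for d in U)
        return all(vals) if f.kind == 'forall' else any(vals)

def models(f, U, tau, rho, free_vars):
    # <U, tau, rho> |=P f : f is true under every variable assignment over U.
    return all(holds(f, U, tau, rho, dict(zip(free_vars, vs)))
               for vs in product(U, repeat=len(free_vars)))

# A two-element model in which P holds of 0 only and f is the identity:
U = [0, 1]
tau = {'f': lambda d: d, 'c': lambda: 0}
rho = {'P': {(0,)}, 'Q': {(0, 0), (1, 0)}}
print(models(example, U, tau, rho, []))    # True: the example sentence holds here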
FOPC is particularly well suited to defining relations, letting variables
stand for the parameters in the relations. For such purposes, it is important
to distinguish between uses of variables that are bound by the quantifiers
3 and V, and those that are free to be used as relational parameters.
Definition 2.3.5 ([Andrews, 1986; Kleene, 1952; Gallier, 1986]).
An occurrence of a variable x in a formula is bound if and only if it is
located within a subformula of the form (∃x : F) or the form (∀x : F). An
occurrence of a variable x in a formula is free if and only if it is not bound.

A sentence is a formula with no free occurrences of variables. If F ∈ FP
is a formula, and all free occurrences of variables in F are among x1, …, xi,
then the sentence (∀x1 : ⋯ ∀xi : F) is a closure of F. It is easy to see that
F is semantically equivalent to each of its closures.
Let x1, …, xi be a list of variables with no repetitions, and let t1, …, ti
∈ TP. F[t1, …, ti/x1, …, xi] is the formula that results from substituting
the term tj for every free occurrence of the variable xj in the formula F,
for each j, 1 ≤ j ≤ i, renaming bound variables of F when necessary
so that the variables of t1, …, ti are free in the result [Andrews, 1986;
Kleene, 1952; Gallier, 1986]. When G = F[t1, …, ti/x1, …, xi], we say
that G is an instance of F, and that F is more general than G. These
relations apply naturally to terms as well as formulae.
G is a variable renaming of F if and only if G = F[y1, …, yi/x1, …, xi],
for some list of variables y1, …, yi with no repetitions (equivalently, G is an
instance of F and F is an instance of G, since we get F = G[x1, …, xi/y1,
…, yi]).
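The substitution and instance relations also have a direct computational reading. The Python sketch below (again reusing the invented classes from the earlier fragments) handles terms and quantifier-free formulae only; the renaming of bound variables required by the full definition is deliberately omitted, so this is a simplified illustration rather than a complete implementation.

def subst_term(t, sigma):
    # Apply a substitution sigma (a dict from variable names to terms) to a term.
    if isinstance(t, Var):
        return sigma.get(t.name, t)
    return App(t.fun, tuple(subst_term(a, sigma) for a in t.args))

def subst(f, sigma):
    # F[t1,...,ti/x1,...,xi] for quantifier-free F; quantified formulae would
    # additionally require renaming bound variables, which this sketch omits.
    if isinstance(f, Atom):
        return Atom(f.pred, tuple(subst_term(a, sigma) for a in f.args))
    if isinstance(f, Not):
        return Not(subst(f.arg, sigma))
    if isinstance(f, BinOp):
        return BinOp(f.op, subst(f.left, sigma), subst(f.right, sigma))
    raise ValueError('quantified formulae are not handled in this sketch')

def is_instance_of(g, f):
    # Check that g = f[sigma] for some substitution sigma (no bound variables).
    sigma = {}
    def match(pat, t):
        if isinstance(pat, Var):
            if pat.name in sigma:
                return sigma[pat.name] == t
            sigma[pat.name] = t
            return True
        if type(pat) is not type(t) or len(pat.args) != len(t.args):
            return False
        same_head = pat.fun == t.fun if isinstance(pat, App) else pat.pred == t.pred
        return same_head and all(match(p, a) for p, a in zip(pat.args, t.args))
    return match(f, g)

# G(x, l(x)) is more general than G(r(x), l(r(x))):
general  = Atom('G', (Var('x'), App('l', (Var('x'),))))
instance = Atom('G', (App('r', (Var('x'),)), App('l', (App('r', (Var('x'),)),))))
print(is_instance_of(instance, general))    # True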
It is very natural to think of a query system in which we may ask what
substitutions for the free variables of a formula make it true.
Definition 2.3.6 ([Belnap Jr. and Steel, 1976]). Let what be a
new formal symbol. Let

QP = {(what x1, …, xi : F) : F ∈ FP}

Define the relation ?−P ⊆ QP × FP by

(what x1, …, xi : F) ?−P G

if and only if

G = F[t1, …, ti/x1, …, xi] for some t1, …, ti ∈ TP

Now ⟨FP, QP, ?−P⟩ is a query system representing the conceivable single
answers to questions of the form 'for what terms t1, …, ti does
F[t1, …, ti/x1, …, xi] hold?'

The query system above has a crucial role in the profound theoretical
connections between logic and computation. For each finite (or even semi-
computable) set T ⊆ FP of formulae, and each question Q = (what x1, …,
xi : F), the set

{⟨t1, …, ti⟩ : Q ?−P T ⊨P F[t1, …, ti/x1, …, xi]}

is semicomputable. If we use some constant symbol c ∈ Fun0 to represent
the number 0, some unary function symbol f ∈ Fun1 to represent the
successor function, and some binary predicate symbol E to represent the


equality relation, then we may define all semicomputable sets of integers by
formulae in a simple and natural way. We let R ⊆ FP be a finite set of pos-
tulates for Robinson's arithmetic (system Q of [Mostowski et al., 1953])—a
primitive theory sufficient for deriving answers to all addition and multi-
plication problems. Then every semicomputable set is of the form

{j : (what x : F) ?−P R ⊨P F[f^j(c)/x]}

for some formula F ∈ FP. Similarly, every partial computable function φ
may be defined by choosing an appropriate formula F ∈ FP, and letting
φ(i) be the unique number j such that

(what y : F[f^i(c)/x]) ?−P R ⊨P F[f^i(c), f^j(c)/x, y]

Notice that the FOPC questions (what x1, …, xi : F) do not allow trivial
tautological answers, such as the correct answer (a ⇒ a) to the question
imp(a) ('what atomic formula does a imply?', Example 2.2.3, Section 2.2).
In fact, (what x1, …, xi : F) has a tautological answer if and only if F is
a tautology. FOPC questions avoid this problem through the distinction
between predicate symbols and function symbols. When we try to find an
answer F[t1, …, ti/x1, …, xi] to the question (what x1, …, xi : F), the
information in the question (what x1, …, xi : F) is carried largely through
predicate symbols, while the information in the answer F[t1, …, ti/x1, …,
xi] is carried entirely by the function symbols in t1, …, ti, since the predicate
symbols in the formula F are already fixed by the question. It is
the syntactic incompatibility between the formula given in the question
and the terms substituted in by the answer that prevents tautological answers.
Suppose that FOPC were extended with a symbol choose, where
(choose x : F) is a term such that (∃x : F) implies F[(choose x : F)/x].
Then F[(choose x : F)/x] would be a trivially (but not quite tautologically)
correct answer to (what x : F) except when no correct answer exists.
The absence of trivial tautological answers does not mean that all an-
swers to FOPC questions are equally useful. In some cases a question
has a most general semantically correct answer. This provides a nice syn-
tactic way to recognize certain consequentially strongest answers, and a
useful characterization of all answers even when there is no consequentially
strongest answer.
Proposition 2.3.7. If G' is an instance of G, then G' is a semantic
consequence of G (G ⊨P G'). It follows immediately that if G is a
semantically correct answer to Q ∈ QP (Q ?−P T ⊨P G), then G' is also a
semantically correct answer (Q ?−P T ⊨P G').
If G' is a variable renaming of G, then they are semantically equivalent.

Definition 2.3.8. Let T ⊆ FP be a set of formulae, and let Q = (what x1,
…, xi : F) ∈ QP be a question. G is a most general answer to Q for explicit
knowledge T if and only if
1. Q ?−P T ⊨P G
2. for all G0 ∈ FP, if G is an instance of G0 and Q ?−P T ⊨P G0, then
G0 is a variable renaming of G.
A set A ⊆ FP of formulae is a most general set of correct answers to Q
for T if and only if
1. each formula F ∈ A is a most general answer to Q for T
2. for all formulae G ∈ FP, if Q ?−P T ⊨P G, then G is an instance of
some F ∈ A
3. for all formulae F1, F2 ∈ A, if F2 is an instance of F1, then F2 = F1
It is easy to see that for each question Q ∈ QP and set of formulae T ⊆ FP
there is a most general set of correct answers (possibly empty or infinite).
Furthermore, the most general set of correct answers is unique up to vari-
able renaming of its members.
Notice that it is very easy to generate all correct answers to a question
in Qp from the most general set— they are precisely the instances of its
members. If Q has a consequentially strongest answer, then it has a conse-
quentially strongest answer that is also most general. If the most general
set of correct answers is the singleton set {F}, then F is a consequentially
strongest answer.
Example 2.3.9. Let G ∈ Pred2 be a binary predicate symbol, where
G(t1, t2) is intended to assert that t1 is strictly larger than t2. Suppose that
objects in the universe have left and right portions, and that l, r ∈ Fun1
denote the operations that produce those portions. A minimal natural
state of knowledge about such a situation is

T0 = {∀x : G(x, l(x)), ∀x : G(x, r(x))}

A natural question is Q = (what x, y : G(x, y)). The most general set of
answers is

A0 = {G(x, l(x)), G(x, r(x))}
Other answers include G(l(x),r(l(x))), G(r(x),l(r(x))), etc.
T0 has only the specific knowledge that a whole is larger than its portions,
but not the general knowledge that the relation G is a strict ordering
relation. Let

T1 = T0 ∪ {∀x, y, z : (G(x, y) ∧ G(y, z)) ⇒ G(x, z), ∀x, y : ¬(G(x, y) ∧ G(y, x))}

For this extended knowledge, the most general set of answers to Q is the
infinite set

A1 = A0 ∪ {G(x, l(l(x))), G(x, l(r(x))), G(x, r(r(x))), G(x, l(l(l(x)))), …}

The first formula added to T1 above, which expresses the transitivity of


ordering, leads to the additional answers.
In some cases, it is convenient to allow several answers to questions
in QP to be conjoined into a single answer.
Definition 2.3.10. Let conj-what be a new formal symbol. Let

QP∧ = {(conj-what x1, …, xi : F) : F ∈ FP}

Define the relation ?−P∧ ⊆ QP∧ × FP by

(conj-what x1, …, xi : F) ?−P∧ G

if and only if

G = F[t1^1, …, ti^1/x1, …, xi] ∧ ⋯ ∧ F[t1^m, …, ti^m/x1, …, xi] for some
t1^1, …, ti^m ∈ TP

Now ⟨FP, QP∧, ?−P∧⟩ is a query system representing the conceivable
conjunctive answers to questions of the form 'for what terms t1, …, ti does
F[t1, …, ti/x1, …, xi] hold?'

Answers to (conj-what x1, …, xi : F) are precisely conjunctions of
answers to (what x1, …, xi : F). (conj-what x1, …, xi : F) may have
a consequentially strongest answer even though (what x1, …, xi : F) does
not. In particular, whenever the most general set of answers to (what x1,
…, xi : F) is finite, the conjunction of those answers is a consequentially
strongest answer to (conj-what x1, …, xi : F).
2.3.2 Prolog
The most famous programming language associated with logic program-
ming, and the one that instigated scientists to study logic programming as
a specialty within computer science, is Prolog [Kowalski, 1974; van Emden
and Kowalski, 1976], the creation of Kowalski and Colmerauer. The name
'Prolog' is usually associated with a group of very similar programming
languages based on logic programming in Horn clauses— a particular sub-
language of the first-order predicate calculus. Prolog as it is actually used
deviates from pure logic programming in several ways: it fails to produce
some logically entailed answers; in rare cases it produces logically incor-
rect answers; and it contains constructs that are not definable naturally
in FOPC, and which are normally understood in conventional imperative
ways. Furthermore, the criteria by which Prolog chooses one of several
possible answers cannot be explained naturally in terms of the semantics

of FOPC. The discrepancy between Prolog and Horn-clause logic program-


ming is closely comparable to the discrepancy between Lisp and the purely
functional language based on the lambda calculus that is sometimes called
'pure lazy Lisp'. In spite of the discrepancies, the best way to understand
Prolog is to conceive of it as an approximation to, and extension of, a
Horn-clause logic programming language.
The essential idea behind Prolog is to find correct answers to predi-
cate calculus questions in QP of Section 2.3.1 above. In principle, all such
answers are computable. Currently known implementation techniques re-
quire fairly stringent conditions on the sets of formulae that may be used as
explicit knowledge for question answering, and on the questions that may
be asked, in order to search for and generate proofs in a relatively simple
and disciplined way. Prolog is based on the restriction of programs to sets
of Horn clauses, and questions to simple conjunctions of positive atomic
formulae.
Definition 2.3.11. A formula F € Fp is a Horn clause if and only if it
is in one of the forms
1. F = (R1(t1,1, …, t1,i1) ∧ ⋯ ∧ Rm(tm,1, …, tm,im) ⇒ P(u1, …, uj))
2. F = (R1(t1,1, …, t1,i1) ∧ ⋯ ∧ Rm(tm,1, …, tm,im) ⇒ False)
3. F = (True ⇒ P(u1, …, uj))
4. F = (True ⇒ False)
where R1, …, Rm, P are predicate symbols, and t1,1, …, tm,im, u1, …, uj
are terms.
A pure Prolog program is a finite set of Horn clauses.
A pure Prolog input is a question of the form

(what x1, …, xi : P1(t1,1, …, t1,i1) ∧ ⋯ ∧ Pm(tm,1, …, tm,im))

where x1, …, xi are precisely the free variables of P1(t1,1, …, t1,i1) ∧ ⋯ ∧
Pm(tm,1, …, tm,im).

In order to promote readability of programs, within the limitations of


older keyboards and printers, typical Prolog implementations replace the
conventional logical symbol '∧' by ',', they write the arguments to implications
in the reverse of the usual order, and replace the symbol '⇐' by ':-'.
Also, they denote the symbol 'False' in the conclusion of an implication,
and 'True' in the hypothesis, by the empty string. Predicate symbols are
written in lower case, and variables are written in upper case. So, the four
forms of Horn clause in actual Prolog programs look like

p(u1, ..., uj) :- r1(t1,1, ..., t1,i1), ..., rm(tm,1, ..., tm,im).
:- r1(t1,1, ..., t1,i1), ..., rm(tm,1, ..., tm,im).
p(u1, ..., uj).
:- .
Since a question always requests substitutions for all free variables, the
header 'what x1, …, xi' is omitted and the question is abbreviated in the
form

?- p1(t1,1, ..., t1,i1), ..., pm(tm,1, ..., tm,im).
Since the substitution in an answer of the form

(P1(t1,1, …, t1,i1) ∧ ⋯ ∧ Pm(tm,1, …, tm,im))[s1, …, si/x1, …, xi]

completely determines the answer, actual Prolog output presents only the
substitution, in the form

x1 = s1, ..., xi = si
These notational variations and abbreviations have a substantial impact on


the way Prolog programs, inputs, and outputs look, but they are completely
transparent to the logical meaning.
When a pure Prolog input is presented to a pure Prolog program, all
possible answers may be derived systematically by treating each clause of
form 1 as a recursive part of a definition of the procedure P in terms of calls
to the procedures R1, …, Rm. Because the same predicate symbol P may
appear in the conclusion of more than one clause, each clause normally pro-
vides only a part of the definition of P. Clauses of form 3 are nonrecursive
parts of the definition of a procedure P. Clauses of form 2 are somewhat
peculiar: they act like extra hardcoded parts of the input. Clauses of form 4
are useless in programs, but allowed by the formal definition. Different im-
plementations may or may not prohibit the degenerate forms. Prolog tries
to choose more general answers instead of their instances, but it does not
guarantee that all answers produced are most general. An understanding
of the precise correspondence of Prolog to the answering of pure Prolog
input questions using pure Prolog programs requires a lot of detail that
must wait until the chapter 'Horn clause logic programming'.
A notable variation on Prolog is λProlog [Nadathur and Miller, 1990;
Nadathur and Miller, 1988]. This language extends predicate calculus logic
programming into the omega-order predicate calculus, also called type the-
ory [Andrews, 1986]. Higher-order predicate calculi add variables ranging
over predicates, quantification over such variables, and predicates that ap-
ply to other predicates. λProlog generalizes the Horn clauses of FOPC
to the hereditary Harrop formulae of the omega-order predicate calculus
[Miller et al., 1991].
2.3.3 Relational databases and Datalog
Relational databases [Date, 1986; Codd, 1970] were invented by Codd,
completely independently of the development of Prolog and logic program-
ming in the programming language community. Nonetheless, relational

databases and their queries may be understood very naturally in terms of


logic programming concepts. This point has been noted by the Prolog com-
munity, leading to the definition of Datalog [Maier and Warren, 1988], a
variant of relational databases in the style of Prolog. Gallaire, Minker, and
Nicolas have developed the concept of deductive databases [Gallaire et al.,
1984] to capture the logical content of relational databases and their vari-
ants. Reiter [Reiter, 1984] has shown how a logical view of databases has
advantages of robustness under several useful generalizations of database
functionality. My proposed approach to logic programming applies logic
to programming languages in essentially the same way that Reiter applies
logic to databases.
Like Prolog, relational database systems find correct answers to pred-
icate calculus questions in QP of Section 2.3.1. The different natural as-
sumptions and performance requirements of the database world lead to far
more stringent restrictions on the sets of formulae that may be used for
question answering. The questions, which correspond to database queries,
are essentially unrestricted. Because of the stringent restrictions on knowl-
edge formulae, which limit answers to a predetermined finite set of pos-
sibilities, actual relational database implementations do not deviate from
pure logic programming, except by offering constructs that go beyond the
logical definitions. On purely relational queries, they produce precisely all
of the semantically correct answers.
Definition 2.3.12. A pure relational database is a finite set of formu-
lae of the form P(c1, …, ci), where P ∈ Predi is a predicate symbol and
c1, …, ci ∈ Fun0 are constant symbols.
A pure relational database as defined above is the natural logical view
of the contents of a relational database system at any given time. The
constant symbols are the objects that may appear in fields of relations in
the database. Each predicate symbol represents one of the relations in the
database. Each formula P(c1, …, ci) represents the presence of the tuple
⟨c1, …, ci⟩ in the relation P.
Pure relational database query languages are equivalent to QP—all
queries of the form (what x1, …, xi : F) are allowed. Because of the sim-
plicity of the formulae in the database, restrictions on the queries are not re-
quired for tractable implementation. Notice the complementarity of Prolog
and relational database query languages. Prolog allows relatively powerful
Horn clauses as knowledge formulae, but restricts queries to conjunctions of
atomic formulae. Relational database query languages restrict knowledge
formulae to the very simple form of predicate symbols applied to constant
symbols, but allow unrestricted FOPC what queries.
The simplicity of the formulae in a pure relational database guaran-
tees that the set of answers to (what x1, …, xi : F) is finite, and relational
database query systems actually produce all the semantically correct an-

swers. Equivalently, we may think of a relational database query system


as producing the consequentially strongest answer (unique up to the or-
der of the conjuncts) to (conj-what x1, …, xi : F). The consequentially
strongest answer to the conj-what form of the question is simply the
conjunction of all the answers to the what form.
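Since all knowledge formulae are ground atomic facts over constant symbols, a what question can in principle be answered by brute force: try every assignment of database constants to the free variables and keep those that make F true when the database is read as a finite model. The Python sketch below is purely illustrative (relation names and data are invented, and real relational systems do not evaluate queries this way), but it shows why the answer set is finite and how the closed world assumption enters, since values are drawn only from constants occurring in the database.

from itertools import product

# A pure relational database: each predicate symbol names a finite set of tuples.
db = {
    'employee': {('ann', 'sales'), ('bob', 'sales'), ('cal', 'lab')},
    'manages':  {('ann', 'sales'), ('cal', 'lab')},
}
constants = {c for rows in db.values() for row in rows for c in row}

def answers(free_vars, formula):
    # All semantically correct answers to (what free_vars : formula), where
    # `formula` is a Python predicate over substitutions standing in for an
    # arbitrary FOPC query over the database relations.
    for values in product(constants, repeat=len(free_vars)):
        s = dict(zip(free_vars, values))
        if formula(s):
            yield tuple(s[v] for v in free_vars)

# (what x, y : employee(x, y) & manages(x, y)):
query = lambda s: (s['x'], s['y']) in db['employee'] and (s['x'], s['y']) in db['manages']
print(sorted(answers(('x', 'y'), query)))    # [('ann', 'sales'), ('cal', 'lab')]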
Datalog restricts queries to the Prolog form, but allows Horn clauses
with no function symbols (no 'functors' in Prolog jargon) to be added to the
formulae of the database to provide additional knowledge for the purpose
of answering a given query. The Horn clauses are thought of as defining
new relations to be used in the query, rather than as adding information
to the database.
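Because Datalog clauses contain no function symbols, the set of derivable atomic facts is finite, and all answers can be computed bottom-up by applying every clause to the facts derived so far until nothing new appears. The Python sketch below illustrates this naive bottom-up strategy on an invented reachability example; the encoding of clauses is made up for the illustration and is not Datalog syntax, and real deductive database systems use considerably more refined evaluation methods.

facts = {('edge', 'a', 'b'), ('edge', 'b', 'c'), ('edge', 'c', 'd')}

# path(X, Y) :- edge(X, Y).      path(X, Z) :- edge(X, Y), path(Y, Z).
rules = [
    (('path', 'X', 'Y'), [('edge', 'X', 'Y')]),
    (('path', 'X', 'Z'), [('edge', 'X', 'Y'), ('path', 'Y', 'Z')]),
]

def is_var(symbol):
    return symbol[:1].isupper()

def match(pattern, fact, subst):
    # Extend subst so that pattern instantiates to fact, or return None.
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    s = dict(subst)
    for p, c in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if s.setdefault(p, c) != c:
                return None
        elif p != c:
            return None
    return s

def consequences(facts, rules):
    # The least set of facts closed under the rules (naive bottom-up evaluation).
    known = set(facts)
    while True:
        new = set()
        for head, body in rules:
            substs = [{}]
            for atom in body:
                substs = [s2 for s in substs for f in known
                          if (s2 := match(atom, f, s)) is not None]
            for s in substs:
                new.add(tuple(s.get(t, t) for t in head))
        if new <= known:
            return known
        known |= new

# Answers to the Prolog-style question ?- path(a, Z):
closure = consequences(facts, rules)
print(sorted(f[2] for f in closure if f[0] == 'path' and f[1] == 'a'))   # ['b', 'c', 'd']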
A variety of notations have been used for the expression of FOPC
queries in relational database systems. These variant notations may look
very different from FOPC at first glance, but in fact they are equivalent in
querying power. The complete translations between notations are clumsy
to define formally, so we consider the general principles behind the nota-
tion, and illustrate the translation with examples. I use Codd's language
DSL ALPHA [Date, 1986; Codd, 1971], often called relational calculus, for
the database notation.
Instead of referring directly to objects in the universe U, relational
database languages generally refer to tuples in relations, because they cor-
respond to records in a conventional file. Instead of the positional notation
P(t1, …, ti), they give domain names D1, …, Di to the parameter posi-
tions. The name of a relation (predicate symbol) is treated as a variable
ranging over the set of tuples in that relation. The value of the domain D
in an arbitrary tuple of the relation P is denoted by the record notation
P.D. P.D = c means that the value in the domain D of an unknown tuple
in the relation P is c, and P.D = Q.E means that the value in the domain
D of some tuple in P is the same as that in domain E of some tuple in
Q. Because of the absence of function symbols with arity greater than 0,
this use of equality does not introduce the general capabilities of FOPC
with equality, it merely captures the limited sort of equality information
that is represented in pure FOPC by multiple occurrences of constants and
variables, and by binding variables to constants. In principle, the equation
x = c can be understood as Pc(x), where Pc is a special predicate postulated
to hold on c, and to not hold on all other constant symbols. For example,
if P and Q are binary predicates and D,E,F,G are appropriate domain
names, then the DSL ALPHA expression

P.D = c ∧ P.E = Q.F ∧ Q.G = d

is equivalent to the FOPC formula

∃x2 : (P(c, x2) ∧ Q(x2, d))

Additional variables may be declared in DSL ALPHA to range over the


tuples of a relation in the database, and these variables may be quantified
with ∀ and ∃. So, in the presence of the declarations

RANGE X P
RANGE Y Q

which declare the variable X to range over tuples of the relation P and Y
to range over tuples of Q, the expression

∀X(X.D = c ∨ ∃Y(X.E = Y.F ∧ Y.G = d))

is equivalent to the FOPC formula

∀x1 : ∀x2 : (P(x1, x2) ⇒ (x1 = c ∨ Q(x2, d))).


Notice that the existential quantifier in the DSL ALPHA expression is left
out of the FOPC formula, because one of its components is bound to the
constant d, and the other to the E component of the variable X, whose
quantification occurs before that of Y. In general, a FOPC quantifier is
required for each domain position of a quantified tuple in a DSL ALPHA
expression that is not bound by equality either to a constant or to a com-
ponent of a variable that has already been quantified. With the same
RANGE declarations above, the DSL ALPHA expression

∀X∃Y(X.D = Y.G)

is equivalent to the FOPC formula

∀x1 : ∀x2 : (P(x1, x2) ⇒ ∃y1 : Q(y1, x1))
So, there are some syntactic subtleties in translating quantification from


DSL ALPHA into FOPC, but they are all solvable with sufficient care.
There is one semantic problem that prevents DSL ALPHA from express-
ing everything that can be expressed by FOPC formulae with only 0-ary
function symbols. That is the limitation of the range of quantified variables
in DSL ALPHA to the set of tuples actually occurring in a relation in the
database, while FOPC may quantify over an abstract universe U. There is
no expression in DSL ALPHA semantically equivalent to ∀x : P(x), since
DSL ALPHA can only express the fact that every object in a tuple of the
database satisfies the predicate P. Because these objects are exactly those
mentioned by the knowledge formulae, however, the restricted quantifica-
tion of DSL ALPHA yields the same answers to queries as the unrestricted
quantification of FOPC, so in terms of the behavior of query-answering

systems the notations are equivalent. Another way of viewing this seman-
tic difference is to suppose that each database implicitly contains a closed
world assumption [Reiter, 1978] expressing the fact that only the objects
mentioned in the database exist. In FOPC with equality, the closed world
assumption may be expressed by the formula ∀x : x = c0 ∨ ⋯ ∨ x = cn,
where c0, …, cn are all of the constant symbols appearing in the database.
Without equality, we can only express the fact that every object acts just
like one of c0, …, cn (i.e., it satisfies exactly the same formulae), and even
that requires an infinite number of formulae.
Given the translation of DSL ALPHA expressions to FOPC formulae
suggested above, we may translate DSL ALPHA queries into FOPC ques-
tions. DSL ALPHA queries have the general form

GET Q (P1.D1, …, Pi.Di) : E
where P1, …, Pi are relations in the database, Q is a new relational symbol
not occurring in the database, D1, …, Di are domain names, and E is an
expression. Appropriate declarations of the ranges of variables in E must
be given before the query. Let F be a FOPC formula equivalent to E, using
the variables x1, …, xi for the values P1.D1, …, Pi.Di. Then, the effect of
the DSL ALPHA query above is to assign to Q the relation (set of tuples)
answering the question

(what x1, …, xi : F)

That is,

Q = {⟨t1, …, ti⟩ : (what x1, …, xi : F) ?−P D ⊨P F[t1, …, ti/x1, …, xi]}
where D is the set of formulae in the database. Equivalently, the value of


Q may be thought of as an abbreviation for the consequentially strongest
answer to (conj-what x1, …, xi : F) for D, which is just the conjunction of
all the answers to (what x1, …, xi : F).
Another type of notation for relational database queries avoids ex-
plicit quantification entirely, and uses the relational operations of pro-
jection, join, etc. to define new relations from those in the database.
Notation in the style of DSL ALPHA above is called relational calcu-
lus, because of the similarity to predicate calculus in the use of explicit
quantification. The alternate approach through operations on relations
is called relational algebra, because the equations that express the prop-
erties of the relational operations resemble well-known definitions of al-
gebras. In fact, each notation has its own algebra and its own calculus,
and the difference is just the way in which relations are denoted. Most
recent work on relational databases refers to the relational algebra nota-

tion, which looks even more distant from FOPC than the relational cal-
culus notation, but is still easily translatable into FOPC. See [Date, 1986;
Codd, 1972] for a description of relational algebra notation for database
queries, and a translation to the relational calculus notation.
2.3.4 Programming in equational logic
Another natural logical system in which to program is equational logic.
A large number of programming languages may be viewed essentially as
different restrictions of programming in equational logic.
Definition 2.3.13. Let the set V of variables, the sets Fun0, Fun1, … of
constant and function symbols, and the set TP of terms be the same as in
FOPC (see Definition 2.3.1). Let ≐ be a new formal symbol (we add the
dot to distinguish between the formal symbol for equality in our language
and the meaningful equality symbol = used in discussing the language).
The set F≐ of equational formulae (or simply equations) is

F≐ = {(t1 ≐ t2) : t1, t2 ∈ TP}
Models for equational logic are the same as models for FOPC, omit-
ting the predicate assignments. Although ≐ behaves like a special binary
predicate symbol, it is given a fixed meaning (as are ∧, ∨, ¬, ⇒, ∃, ∀),
so it need not be specified in each model. An equational formula t1 ≐ t2
holds in a model precisely when t1 denotes the same object as t2 for every
variable assignment.
Definition 2.3.14. Let the infinite universe 𝒰 and the set of function
assignments be the same as in FOPC (Definition 2.3.4). If U ⊆ 𝒰 and τ
is a function assignment over U, then ⟨U, τ⟩ is a model of equational logic.
M≐ is the set of all models of equational logic.
Let the set of variable assignments be the same as in FOPC (Definition
2.3.2), as well as the definition of a term valuation τν from a function
assignment τ and variable assignment ν (Definition 2.3.3). The classical
semantic system for equational logic is ⟨F≐, M≐, ⊨≐⟩, where ⟨U, τ⟩ ⊨≐ t1
≐ t2 if and only if τν(t1) = τν(t2) for all variable assignments ν over U.
Models of equational logic are essentially algebras [Cohn, 1965; Gratzer,
1968; Mac Lane and Birkhoff, 1967] . The only difference is that alge-
bras are restricted to signatures—subsets, usually finite, of the set of con-
stant and function symbols. Such restriction does not affect any of the
properties discussed in this chapter. If T is a finite subset of F≐, then
the set of algebras Models(T) (restricted, of course, to an appropriate
signature) is called a variety. For example, the monoids are the models
of {m(x, m(y, z)) ≐ m(m(x, y), z), m(x, e) ≐ x, m(e, x) ≐ x} restricted to
the signature with one constant symbol e and one binary function symbol
m.

Perhaps the simplest sort of question that is naturally answered by an


equation is 'what is t0?' for a term t0. For each term t1, the equation t0 ≐ t1
is an answer to this question. The trouble with such a primitive question
is that it admits too many answers. For example, the tautology t0 ≐ t0 is
always a correct answer to 'what is t0?'. This problem is closely analogous
to the problem of the tautological answer (a ⇒ a) to the question imp(a)
('what atomic formula does a imply?', Example 2.2.3, Section 2.2). For
the shallow implicational calculus, we avoided undesired answers simply
by listing a set A of them, in the form rest-imp(a, A) ('what atomic
formula not in A does a imply?', Example 2.2.4). Since the number of
terms t1 making t0 ≐ t1 true is generally infinite, we need a finite notation
for describing the prohibited answers. A particularly useful way is to give a
finite set of terms with variables, and prohibit all instances of those terms
from appearing as subterms in an answer term.
Definition 2.3.15. Let x1, …, xi be a list of variables with no repetitions,
and let t, t1, …, ti ∈ TP. t[t1, …, ti/x1, …, xi] is the term that
results from substituting the term tj for every occurrence of the variable
xj in the term t, for each j, 1 ≤ j ≤ i. When s = t[t1, …, ti/x1, …, xi], we
say that s is an instance of t, and that t is more general than s.
The concepts of substitution, instance, and generality for terms are
analogous to the corresponding concepts defined for formulae in Defini-
tion 2.3.5, simplified because all occurrences of variables in terms are
free.
Definition 2.3.16. Let t1, …, ti ∈ TP be terms. A term s is a normal
form for {t1, …, ti} if and only if no subterm of s is an instance of a term
in {t1, …, ti}.
Let norm be a new formal symbol. Let

Q≐ = {(norm t1, …, ti : t) : t, t1, …, ti ∈ TP}

Define the relation ?−≐ ⊆ Q≐ × F≐ by

(norm t1, …, ti : t) ?−≐ (s1 ≐ s2)

if and only if s1 = t and s2 is a normal form for {t1, …, ti}.

Now ⟨F≐, Q≐, ?−≐⟩ is a query system representing the answers to questions
of the form 'what normal form for t1, …, ti is equal to t?'
For every semicomputable set T ⊆ F≐, the set of equations (t ≐ s) such
that (norm t1, …, ti : t) ?−≐ T ⊨≐ (t ≐ s) is semicomputable. It is easy to
define a query system with conjunctive equational answers, similar to the
conjunctive FOPC answers of Definition 2.3.10. Such a step is most useful
when infinite conjunctions are allowed, so it is reserved for Section 7.1 of the

chapter 'Equational Logic Programming.'


Example 2.3.17. Let a ∈ Fun0 be a constant symbol representing zero,
let s ∈ Fun1 be a unary function symbol representing the successor function,
and let p ∈ Fun2 be a binary function symbol representing addition.
A natural state of knowledge about these symbols is

T = {p(a, x) ≐ x, p(s(x), y) ≐ s(p(x, y))}

A natural question is (norm p(x, y) : p(s(s(a)), s(s(s(a))))). The unique
correct answer is (p(s(s(a)), s(s(s(a)))) ≐ s(s(s(s(s(a)))))). That is, the
answer to the question 'what is a term for 2 plus 3, not using the plus
operator?' is '2 plus 3 equals 5.'
Another question, peculiar but formally legitimate, is (norm s(s(x)),
s(p(x, y)) : s(s(a))). Correct answers include (s(s(a)) ≐ p(s(a), s(a))),
(s(s(a)) ≐ p(s(a), p(s(a), a))), (s(s(a)) ≐ p(p(s(a), a), s(a))), etc. The an-
swers to this question express 2 as sums of 1 and 0. The simplest form is
1 plus 1; all other forms simply add extraneous 0s.
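The first question in Example 2.3.17 can be answered mechanically by reading the two equations of T as left-to-right rewrite rules and rewriting until neither applies, which produces a normal form for {p(x, y)}. The Python sketch below does exactly that for this particular T; the term encoding is invented for the illustration, and the sketch says nothing about the general proof-search strategies discussed in the chapter 'Equational Logic Programming.'

def rewrite(term):
    # One rewrite step with p(a, x) = x and p(s(x), y) = s(p(x, y)), read left
    # to right; returns None when no rule applies anywhere in the term.
    op, *args = term
    if op == 'p':
        x, y = args
        if x == ('a',):                      # p(a, x)    -> x
            return y
        if x[0] == 's':                      # p(s(x), y) -> s(p(x, y))
            return ('s', ('p', x[1], y))
    for i, sub in enumerate(args):           # otherwise rewrite inside a subterm
        r = rewrite(sub)
        if r is not None:
            return (op, *args[:i], r, *args[i + 1:])
    return None

def normal_form(term):
    while (step := rewrite(term)) is not None:
        term = step
    return term

def num(n):                                  # the numeral s(s(...s(a)...))
    return ('a',) if n == 0 else ('s', num(n - 1))

# (norm p(x, y) : p(s(s(a)), s(s(s(a))))): the normal form is s(s(s(s(s(a))))).
print(normal_form(('p', num(2), num(3))) == num(5))     # True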
Every partial computable function φ may be defined similarly by letting
φ(i) be the unique j such that

(norm t1, …, ti : f(s^i(a))) ?−≐ T ⊨≐ (f(s^i(a)) ≐ s^j(a))

for appropriately chosen t1, …, ti and finite set T of equations defining
f. In principle, we might ask for most general or consequentially strongest
answers to questions in Q≐. In practice, multiple answers to such questions
are usually incomparable.
A more powerful form of equational question answering involves the
solution of equations.
Definition 2.3.18. Let solve be a new formal symbol. Let

Qs≐ = {(solve x1, …, xi : t1 ≐ t2) : t1, t2 ∈ TP}

Define the relation ?−s≐ by

(solve x1, …, xi : t1 ≐ t2) ?−s≐ (s1 ≐ s2)

if and only if there are terms u1, …, ui ∈ TP such that s1 = t1[u1, …, ui/
x1, …, xi] and s2 = t2[u1, …, ui/x1, …, xi].

Now ⟨F≐, Qs≐, ?−s≐⟩ is a query system representing the answers to questions
of the form 'what values of x1, …, xi solve the equation t1 ≐ t2?'
Notice the close analogy between the question (solve x1, …, xi : t1 ≐ t2)
above, and the FOPC question (what x1, …, xi : F) of Definition 2.3.6,
Section 2.3.1. Syntactically, the equational question is merely the spe-
cial case of the FOPC question where the formula F is restricted to be an

equation. The semantics of equational logic lead, however, to very different


typical uses for, and implementations of, equation solving.
2.3.5 Functional and equational programming languages
A wide variety of nonprocedural programming languages have been inspired
by Backus' proposal of functional programming languages [Backus, 1974;
Backus, 1978] defined by equations. The previous development of Lisp by
McCarthy [McCarthy, 1960], although not originally conceived in terms of
functional programming, fits in retrospect into the functional approach,
and its success has boosted the interest in functional languages substan-
tially. Languages for the algebraic description of abstract data types [Gut-
tag and Horning, 1978; Wand, 1976; Futatsugi et al., 1985] use equations
in individual programs, rather than in the language design, and one exper-
imental language is defined explicitly in terms of equational logic program-
ming [Hoffmann and O'Donnell, 1982; Hoffmann et al., 1985; O'Donnell,
1985]. The essential idea behind functional, algebraic, and equational programming
languages is to find correct answers to normalization questions
in Q≐ of Section 2.3.4 above. A number of different styles are used to
specify these languages, often obscuring the logic programming content.
In this section, 'functional programming languages' include all program-
ming languages that can be described naturally as answering normalization
questions, and we view them as a form of equational logic programming,
whether or not they are conventionally thought of in that way. Actual
functional languages differ widely on a number of dimensions:
• the notation in which terms are written;
• the way in which the knowledge formulae are determined by the lan-
guage design and the program;
• the way in which questions are determined by the language design,
the program, and the input;
• deviations from pure equational logic programming.
Because of the complexity of these variations, the discussion in this section
is organized around the various decisions involved in designing a functional
programming language, rather than around a survey of the most important
languages.
The style in which many functional programming languages are spec-
ified creates an impression that there are fundamental logical differences
between functional programming and equational logic programming. This
impression is false—functional programming and equational logic program-
ming are two different ways of describing the same behaviors. The different
styles of description may encourage different choices in language design, but
they do not introduce any fundamental logical deviations. In particular,
'higher order' functional programming languages are not higher order in
any fundamental logical sense, and may be described very naturally by

first-order equational logic [Goguen, 1990]. The chapter 'Equational Logic


Programming,' Section 1.2, provides more discussion of the connection be-
tween functional and equational programming ideologies.
Determining equational knowledge from language design and program. The
equational formulae used as explicit knowledge for answering normaliza-
tion questions in functional programming languages are derived from the
language design itself and from the program that is being executed. In
principle, they could also be given in the input, but this possibility has
not been exploited explicitly. Many functional languages are processed by
interactive interpreters, which blur the distinction between program and
input, and many functional languages have mechanisms, such as lambda
abstraction [McCarthy, 1960] or the let construct, that simulate the intro-
duction of certain equations within an input term. Interactive interpretive
processing, and the simulation of additional equations within terms, pro-
vide implicitly a lot of the power of explicit equations in the input.
Most functional languages are designed around substantial sets of prim-
itive operations, defined by equations. For example, the primitives cons
(construct ordered pair), car (first component of pair), and cdr (second
component of pair) in Lisp are defined by the two equations (car(cons(x, y))
= x) and (cdr (cons(x,y)) = y) [McCarthy, 1960]. Primitive operators that
manipulate term structure, or provide program control structures, are usu-
ally defined by explicitly given small finite sets of equations. Primitive
operators for basic mathematical operations are defined by large or infinite
sets of equations that must be described rather than listed. For exam-
ple, the conventional arithmetic operation of addition is defined by the
equations add(0,0) = 0, add(0,1) = 1, ..., add(1,0) = 1, add(1,1) = 2, ...,
add(2,0) = 2, add(2,1) = 3, add(2,2) = 4,....
Most functional programming languages have rich enough primitive sets
of operators defined by equations in the language design that it is not nec-
essary to introduce new equations in a program—the goals of the program
may be accomplished by appropriate combinations of primitive operations.
In particular, many functional languages have operators that simulate the
introduction of additional local equations within a term. Even in lan-
guages with powerful primitives, it is often convenient to introduce explicit
equations defining new operators in a given program. A few functional
languages have weak primitives, and depend upon equations in programs
for their expressive power.
Most functional languages impose restrictions on the equations that
may be introduced by programs, in order to allow simple and efficient
proof searches in the implementation. Typical restrictions are surveyed
in the chapter 'Equational Logic Programming,' Section 2.3.2. In most
languages, the restrictions on equations in programs allow each equation
f(t1, …, ti) = t in a program to be treated as a part of the definition of a

procedure to compute /. Note the similarity to the treatment of clauses in


Prolog programs (Section 2.3.2).
Determining a question from language design, program, and input. Re-
call that the form of a normalization question is (norm t1 , . . . , ti : t) . An
answer to such a question gives a term equal to t that does not contain
a subterm of any of the forms t1, . . . , ti. The determination of a question
divides naturally into the determination of the prohibited forms t1 , . . . ,
ti and the term t to normalize.
In most functional programming languages, the prohibited forms t 1 , . . . ,
ti are determined by partitioning the set of symbols into constructors, prim-
itive functions, and defined functions. Constructors are intended to be com-
putationally inert—they are treated as elements of a data structure. The
use of constructors in equations is highly restricted to ensure this inertness.
The binary symbol cons in Lisp is the archetypal constructor. In languages
that distinguish constructors, the prohibited forms in normalization ques-
tions are precisely the terms f(x1, …, xi), where f is a primitive function
or a defined function of arity i. That is, the normal forms for questions in
constructor systems are precisely the constructor expressions—terms com-
posed entirely of variables and constructors. The set of constructors may
be fixed by the language design, or a program may be allowed to introduce
new constructors, in explicit declarations of the symbols or in recursive
type definitions.
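In a constructor-based language, then, recognizing a normal form reduces to checking that no primitive or defined function symbol occurs in the term. The Python sketch below illustrates this check; the particular division of symbols into constructors and defined functions is invented for the example.

CONSTRUCTORS = {'cons', 'nil', 'zero', 'succ'}     # inert data-building symbols
DEFINED      = {'car', 'cdr', 'append', 'add'}     # symbols defined by equations

def is_constructor_normal_form(term):
    # A term is a normal form for the questions of a constructor system exactly
    # when it is built only from variables and constructors.
    if isinstance(term, str):                      # a variable
        return True
    op, *args = term
    return op in CONSTRUCTORS and all(is_constructor_normal_form(a) for a in args)

print(is_constructor_normal_form(('cons', ('zero',), ('nil',))))              # True
print(is_constructor_normal_form(('car', ('cons', 'X', ('nil',)))))           # False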
In a functional language without constructors, the prohibited forms
may be implicit in the equations of a program. Most functional languages
infer from the presentation of the equation t1 = t2 that the left-hand side
t1 is to be transformed into the right-hand side t2. Such an inference is
not justified directly by the semantics of equational logic, but it is often
justified by restrictions on the form of equations imposed by the language.
So, the prohibited forms in questions are often taken to be precisely the
terms on the left-hand sides of equations in the program.
The term to be normalized is typically in the form

tla[tpr1, …, tprm, tin1, …, tinn / x1, …, xm, y1, …, yn]

where tla is fixed by the language design, tpr1, …, tprm are determined by
the program, and tin1, …, tinn are determined by the input. In principle,
other forms are possible, but it seems most natural to view the language
design as providing an operation to be applied to the program, producing
an operation to be applied to the input. For example, in a pure Lisp
eval interpreter, tla = eval(x 1 , nil). The program to be evaluated is tpr
(the second argument nil to the eval function indicates an empty list of
definitions under which to evaluate the program). Pure Lisp has no input—
conceptual input may be encoded into the program, or extralogical features

may be used to accomplish input. A natural extension of pure Lisp to


allow definitions of symbols in a program yields tla = eval(x1, x2), tpr1 is
the expression given by the program to be interpreted, and tpr2 is a list of
bindings representing the definitions given in the program. In a language,
unlike Lisp, where a program designates a particular defined function f
as the main procedure to be executed on the input, we can have tla = x1
(the language design does not contribute to the term to be normalized),
tpr1 = f(y1) where f is the designated main function, and tin1 is the input.
In yet other languages, the term to be normalized is given entirely by the
input.
Notational variations and abbreviations. Functional programming lang-
uages vary widely in the notations used for terms, using all sorts of prefix,
infix, postfix, and mixfix forms. A survey of the possibilities is pointless. All
current functional languages determine the prohibited forms in questions
implicitly, either from the language design or from the left-hand sides of
equations in a program. Questions are presented by specifying some or all
of the term to be normalized—other parts of the term may be implicit as
described above. Outputs always present only the normal form s rather
than the equation t = s, since the question term t is already given explicitly
or implicitly.
Deviations from pure equational logic programming. Implementations of
functional programming languages have generally come closer to the ideal
of pure equational logic programming than Prolog systems have to pure
Horn-clause logic programming, largely because of a simpler correspon-
dence between logical and procedural concepts in functional languages than
in Prolog. Many functional languages extend pure logic programming with
side-effect-producing operations, similar to those in Prolog. These exten-
sions are usually used to deal with input and output. Some functional
languages avoid such extensions by modelling the entire input and output
as an expression, called a stream, that lists the atomic elements of the in-
put and output [Karlsson, 1981; Thompson, 1990; Hudak and Sundaresh,
1988; Hudak, 1992; Gordon, 1992; Dwelly, 1988]; others use a functional
representation of input and output based on continuations [Perry, 1991;
Hudak and Sundaresh, 1988; Hudak, 1992]. Also see [Williams and Wim-
mers, 1988] for an implicit approach to functional I/O. A protocol that
decouples the temporal order of input and output from its representa-
tion in a functional program has been proposed as well [Rebelsky, 1993;
Rebelsky, 1992] .
The purely functional subsets of functional programming languages usu-
ally avoid giving incorrect answers. Implementations of Lisp before the
1980s are arguably incorrect in that their use of dynamic scope [Stark, 1990;
Moses, 1970] for parameter bindings gives answers that are incorrect ac-
cording to the conventional logical equations for substitution of terms for

parameters, taken from the lambda calculus [Church, 1941; Stenlund, 1972;
Barendregt, 1984] . Since the equations for manipulating bindings were
never formalized precisely in the early days of Lisp, implementors may ar-
gue that their work is correct with respect to an unconventional definition
of substitution. Early Lispers seem to have been unaware of the logical
literature on variable substitution, and referred to the dynamic binding
problem as the 'funarg' problem.
Essentially all functional programming languages before the 1980s fail
to find certain semantically correct answers, due to infinite evaluation of
irrelevant portions of a term. In conventional Lisp implementations, for
example, the defining equation car(cons(x, y)) = x is not applied to a term
car(cons(t1, t2)) until both t1 and t2 have been converted to normal forms.
If the attempt to normalize t2 fails due to infinite computation, then the
computation as a whole fails, even though a semantically correct answer
might have been derived using only t1. Systems that fail to find a normal
form for car(cons(t1, t2)) unless both of t1 and t2 have normal forms are said
to have strict cons functions. The discovery of lazy evaluation [Friedman
and Wise, 1976; Henderson and Morris, 1976; O'Donnell, 1977] showed
how to avoid imposing unnecessary strictness on cons and other functions,
and many recent implementations of functional programming languages are
guaranteed to find all semantically correct answers. Of course, it is always
possible to modify defining equations so that the strict interpretation of a
function is semantically complete.
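The operational difference between a strict and a lazy cons can be imitated by delaying evaluation of the arguments. In the Python sketch below (a toy illustration using zero-argument functions as thunks, not a description of how lazy functional implementations actually work), car(cons(t1, t2)) returns the value of t1 even though evaluating t2 would never terminate; a strict cons forces both components first and so would loop on the same arguments.

def cons(x_thunk, y_thunk):
    # A lazy pair: the components are not evaluated until they are selected.
    return ('cons', x_thunk, y_thunk)

def car(pair):
    return pair[1]()              # force only the first component

def cdr(pair):
    return pair[2]()              # force only the second component

def diverge():
    while True:                   # stands for a term with no normal form
        pass

lazy_pair = cons(lambda: 42, lambda: diverge())
print(car(lazy_pair))             # 42: the divergent second component is never forced

def strict_cons(x_thunk, y_thunk):
    # A strict cons forces both components, so car(strict_cons(lambda: 42,
    # lambda: diverge())) would never return.
    x, y = x_thunk(), y_thunk()
    return ('cons', lambda: x, lambda: y)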
Example 2.3.19. Consider Lisp, with the standard equations

car(cons(x, y)) = x and cdr(cons(x, y)) = y

To enforce strict evaluation of lists, even in a lazily evaluated implemen-


tation of equational logic programming, add new function symbols test,
strict, and, and a new constant true, with the equations

test(true, x) = x, and(true, true) = true,


strict(cons(x, y)) = and(strict(x), strict(y)), strict(a) = true

for each atomic symbol a. Then, redefine car and cdr by

car(cons(x, y)) = test(strict(y), x) and cdr(cons(x, y)) = test(strict(x), y)

Lazy evaluation with the new set of equations has the same effect as strict
evaluation with the old set.
Some definitions of functional programming language specify strictness
explicitly. One might argue that the strict version of cons was intended in
the original definition of Lisp [McCarthy, 1960], but strictness was never
stated explicitly there.

2.3.6 Equation solving and predicate calculus with equality


Given successful applications of logic programming in the pure first-order
predicate calculus without equality, and in pure equational logic, it is
very tempting to develop languages for logic programming in the first-
order predicate calculus with equality (FOPC=). Unfortunately, there
is a mismatch between the style of questions used in the two sorts of
logic programming. FOPC logic programming uses questions of the form
(what x1, …, xi : F) (Definition 2.3.6), whose answers supply substitu-
tions for the variables x1, . . . , xi satisfying F. Equational logic program-
ming uses questions of the form (norm t1, . . . ,ti : t) (Definition 2.3.16),
whose answers supply normal forms equal to t, not containing the forms
t1, …, ti. The implementation techniques for answering these two sorts of
questions do not appear to combine well.
The most natural idea for achieving logic programming in FOPC= is
to use the FOPC style of question, and extend it to deal with equations.
In principle that is feasible, because the formal equality symbol = may be
treated as another binary predicate symbol in Pred2, and Horn clauses ex-
pressing its crucial properties are easy to find. Unfortunately, the natural
Horn clauses for the equality predicate, when treated by current imple-
mentation techniques for FOPC logic programming, yield unacceptably
inefficient results. In practice, a satisfactory realization of logic program-
ming in FOPC= will require new techniques for solving equations—that
is, for answering questions of the form (solve x1, …, xi : t1 ≐ t2) (Defini-
tion 2.3.18). An interesting experimental language incorporating signifi-
cant steps toward logic programming in FOPC= is EqL [Jayaraman, 1985],
but this language only finds solutions consisting of constructor expressions.
The problem of finding nonconstructor solutions is much more difficult.
It is also possible to define FOPC logic programming in terms of a
normalization-style query. Consider a question 'what acceptable formula
implies F0', which presents a goal formula F0, and some description of
which answer formulae are acceptable and which are prohibited (the problem
of describing such prohibited formulae is more complex in FOPC than
in equational logic because of the structural properties of the logical connectives
∧, ∨, …). An answer to such a question is an implication of the
form F1 ⇒ F0, where F1 is an acceptable formula. The conventional FOPC
questions (what x1, …, xi : F0) may be understood as a variant of the
'what implies F0' questions, where the acceptable formulae are precisely
those of the form x1 = t1 ∧ ⋯ ∧ xi = ti. The proposed new style of FOPC
question may be seen as presenting a set of constraints expressed by FO,
and requesting a normalized expression of constraints F1 such that every
solution to the constraints expressed by F1 is also a solution to the con-
straints of F0. Constraint Logic Programming [Jaffar and Lassez, 1987;
Lassez, 1991] has taken some steps in this direction, although Constraint

Logic Programming generally views the normalized constraints expressed


by F1 not as a final result to be output as an answer, but as something to
be solved by techniques outside of FOPC, such as numerical techniques for
solving linear systems of equations.

3 Implementing logic programming languages


Semantic systems and query systems are convenient tools for specifying
logic programming languages: we require a language to provide semanti-
cally correct answers to questions. In order to implement logic program-
ming languages, we first develop sound (correct) and complete (powerful)
proof systems to provide effective certificates of the semantic correctness of
inferences. Then, we convert proof systems to programming systems that
process inputs and programs efficiently to produce outputs, by introduc-
ing strategies for choosing incrementally which proofs to construct. In this
chapter we consider only the abstract forms of implementations, far above
the level of actual code for real machines. Truly practical implementations
do, however, follow these abstract forms quite closely.

3.1 Proof systems and provable consequences


A semantic-consequence relation determines in principle whether it is cor-
rect to infer one logical formula from others. In order to give a formal
justification for an inference, we need a notation for proofs. We reserve the
initial P for programs, so D is used to stand for proofs, which might also
be called demonstrations or derivations.
Definition 3.1.1. A proof system is a system D = ⟨F, D, ⊢−⟩, where
1. F is a set of formulae
2. D is a set of proofs
3. ⊢− is a relation on 2^F × D × F (when ⟨T, D, F⟩ are in the relation
⊢−, we write T ⊢ D − F)
4. ⊢− is monotone (if T ⊢ D − F and T ⊆ U, then U ⊢ D − F)
The proof system D is compact if and only if, for all T ⊆ F, D ∈ D,
and F ∈ F, whenever T ⊢ D − F there exists a finite subset Tf ⊆ T such
that Tf ⊢ D − F.
T ⊢ D − F is intended to mean that D is a proof of F which is allowed
to use hypotheses in T. The fact that D is not required to use all hypothe-
ses leads to monotonicity (4). There are a number of proposals for systems
of 'nonmonotonic logic,' but they may be regarded as studies of different
relations between proofs and formulae than the notion of derivability rep-
resented by ⊢− above, rather than as arguments about the properties of ⊢−.
The controversy about nonmonotonic logic is not relevant to the discussion
in this chapter, and fans of nonmonotonic relations between proofs and
formulae may translate into their own notation if they like.

In well-known proof systems, proofs as well as formulae are finite syntac-


tic objects, and the set D of all proofs is computable. There are important
uses, however, for infinite proofs of infinite formulae, so we do not add
a formal requirement of computability. Typically, a proof D determines
uniquely the conclusion formula F and minimum hypothesis set T such
that T ⊢ D − F, but there is no need to require such a property. Meseguer
[Meseguer, 1989] proposed a similar general notion of proof calculus.
It is straightforward to design a proof system for SIC. The following
proof system follows the conventional style of textbooks in logic. Proofs
are sequences of formulae, each one either a hypothesis, a postulate, or a
consequence of earlier formulae in the proof.
Example 3.1.2. Let FSh be the set of formulae in SIC. The set of linear
proofs in SIC is PSl = FSh+, the set of nonempty finite sequences of formulae.
The proof relation ⊢Sl is defined by
⟨F0, ..., Fm⟩ : T ⊢Sl F if and only if Fm = F and,
for all i ≤ m, one of the following cases holds:

1. Fi ∈ T
2. Fi = (a ⇒ a) for some atomic formula a ∈ At
3. Fi = b, and there exist j, k < i such that Fj = a and Fk = (a ⇒ b)
for some atomic formulae a, b ∈ At
4. Fi = (a ⇒ c), and there exist j, k < i such that Fj = (a ⇒ b) and
Fk = (b ⇒ c) for some atomic formulae a, b, c ∈ At

Now ⟨FSh, PSl, ⊢Sl⟩ is a compact proof system, representing the Hilbert,
or linear, style of proof for implications.
Intuitively, a linear or Hilbert-style proof is a finite sequence of formu-
lae, each one being either a hypothesis (case 1 above), a postulate (case 2
above, expressing the postulate scheme of reflexivity of implication), or
the consequence of previous formulae by a rule of inference (case 3 above,
expressing the rule of modus ponens, and case 4, expressing the rule of
transitivity of implication). The conclusion of a linear proof is the last
formula in the list.
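The linear system lends itself to a simple mechanical proof checker. The following sketch in Python is not part of the formal development above; the encoding of formulae (strings for atoms, pairs for implications) and the name check_linear_proof are invented for illustration, but the four cases correspond directly to those of Example 3.1.2.

def is_atomic(formula):
    return isinstance(formula, str)

def check_linear_proof(hypotheses, proof, conclusion):
    """Check a linear proof: a nonempty list of formulae whose last element
    is the conclusion, each line justified by one of the four cases of
    Example 3.1.2 (hypothesis, reflexivity, modus ponens, transitivity)."""
    if not proof or proof[-1] != conclusion:
        return False
    for i, f in enumerate(proof):
        earlier = proof[:i]
        if f in hypotheses:                                   # case 1
            continue
        if not is_atomic(f) and f[0] == f[1]:                 # case 2: (a => a)
            continue
        if is_atomic(f):                                      # case 3: modus ponens
            antecedents = {g[0] for g in earlier if not is_atomic(g)}
            if any(a in earlier and (a, f) in earlier for a in antecedents):
                continue
            return False
        a, c = f                                              # case 4: transitivity
        middles = {g[1] for g in earlier if not is_atomic(g)}
        if any((a, b) in earlier and (b, c) in earlier for b in middles):
            continue
        return False
    return True

# From T = {a => b, b => c}, the three-line sequence below proves a => c.
T = {("a", "b"), ("b", "c")}
print(check_linear_proof(T, [("a", "b"), ("b", "c"), ("a", "c")], ("a", "c")))  # True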
An alternative proof system for SIC, less conventional but more con-
venient for some purposes, follows the natural deduction style of proof
[Prawitz, 1965]. Natural deduction proofs are trees, rather than sequences,
to display the actual logical dependencies of formulae. They also allow the
introduction and discharging of temporary assumptions in proofs, to mimic
the informal style in which we prove a ⇒ b by assuming a and proving b.
Example 3.1.3. Let FSh be the set of formulae in the shallow implica-
tional calculus defined in Example 2.1.2. Let assume, modus-ponens,
and deduction-rule be new formal symbols.

The set PSn of natural deduction proofs in SIC and the proof relation
⊢Sn ⊆ 2^FSh × PSn × FSh are defined by simultaneous inductive definition
to be the least set and relation such that:

1. for each atomic formula a ∈ At, and each set of formulae T ⊆ FSh,

assume(a) ∈ PSn

and

assume(a) : T ∪ {a} ⊢Sn a

2. if α, β ∈ PSn, and α : T ⊢Sn a for some atomic formula a ∈ At, and
β : U ⊢Sn (a ⇒ b), then

modus-ponens(α, β) ∈ PSn and modus-ponens(α, β) : T ∪ U ⊢Sn b

3. if β ∈ PSn and β : T ∪ {a} ⊢Sn b for some atomic formula a ∈ At,
then

deduction-rule(a, β) ∈ PSn and deduction-rule(a, β) : T ⊢Sn (a ⇒ b)

Now ⟨FSh, PSn, ⊢Sn⟩ is a compact proof system, representing the natural
deduction style of proof for implications.
Intuitively, assume(a) is a trivial proof of a from hypothesis a. modus-
ponens(α, β) is the result of using a proof α of some atomic formula
a, a proof β of an implication (a ⇒ b), and combining the results along
with the rule of modus ponens to conclude b. The set of hypotheses for
the resulting proof includes all hypotheses of α and all hypotheses of β.
deduction-rule(a, β) is the result of taking a proof β of b from hypotheses
including a, and discharging some (possibly 0) of the assumptions of a from
the proof, to get a proof of (a ⇒ b) by the deduction rule. In clause 3 of the
inductive definition above, notice that the hypothesis set T may contain
a, in which case T ∪ {a} = T. It is this case that allows for the possibility
that one or more assumptions of a remain undischarged in an application
of the deduction rule.
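The natural deduction system can be rendered just as directly. The sketch below, again in Python and not part of the formal development, represents proofs as nested terms and computes, for each proof, the smallest hypothesis set and the conclusion it establishes; the term encoding and the name analyze are invented for illustration.

def assume(a):
    return ("assume", a)

def modus_ponens(alpha, beta):
    return ("mp", alpha, beta)

def deduction_rule(a, beta):
    return ("ded", a, beta)

def analyze(proof):
    """Return (hypotheses, conclusion) established by a proof term, using
    the clauses of Example 3.1.3 with the smallest possible hypothesis set
    (monotonicity then allows any larger set)."""
    tag = proof[0]
    if tag == "assume":                      # clause 1: a proves a from {a}
        return frozenset([proof[1]]), proof[1]
    if tag == "mp":                          # clause 2: from a and a => b infer b
        hyp1, a = analyze(proof[1])
        hyp2, imp = analyze(proof[2])
        assert isinstance(imp, tuple) and imp[0] == a, "second premise must prove a => b"
        return hyp1 | hyp2, imp[1]
    if tag == "ded":                         # clause 3: discharge a, infer a => b
        a = proof[1]
        hyp, b = analyze(proof[2])
        return hyp - {a}, (a, b)
    raise ValueError("unknown proof term")

print(analyze(deduction_rule("a", assume("a"))))  # (frozenset(), ('a', 'a'))
print(analyze(deduction_rule("a", assume("b"))))  # hypothesis b stays undischarged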
The style of proof formalized in Example 3.1.3 is called natural deduc-
tion, since it mimics one popular informal style of proof in which an implica-
tion is proved by assuming its hypothesis and deriving its conclusion. Nat-
ural deduction style [Prawitz, 1965], and the similar style of sequent deriva-
tion [Gentzen, 1935; Kleene, 1952], both due to Gentzen, are the styles of
proof most commonly treated by research in proof theory [Stenlund, 1972;
Prawitz, 1965; Takeuti, 1975; Schutte, 1977; Girard et al., 1989]. In proof
theory, natural deduction rules are expressed very naturally as terms in a
typed lambda calculus, where the type of a lambda term is the formula
that it proves [Howard, 1980; Tait, 1967].

In many cases, we are interested only in the provability of an inference,
and not the proof itself. So, we let each proof system define a provable-
consequence relation, analogous to the semantic-consequence relation asso-
ciated with a semantic system.
Definition 3.1.4. Let D = ⟨F, D, ⊢⟩ be a proof system. The provable-
consequence relation defined by D is ⊢ ⊆ 2^F × F, where

T ⊢ F if and only if there exists a proof D ∈ D such that D : T ⊢ F.

The provable-consequence relation ⊢ is compact if and only if, for all
T ⊆ F and F ∈ F, whenever T ⊢ F then there exists a finite subset Tf ⊆ T
such that Tf ⊢ F.
Intuitively, T ⊢ F means that F is provable from hypotheses in T. It
is easy to show that an arbitrary relation ⊢ on 2^F × F is the provable-
consequence relation of some proof system if and only if it is monotone
(T ⊢ F and T ⊆ U imply that U ⊢ F). See [Meseguer, 1989] for another
abstract definition of provable consequence relations, with more stringent
requirements.
Most well-known semantic/provable-consequence relations are compact,
and semicomputable on the finite sets of formulae. The trinary proof rela-
tions of proof systems (restricted to finite sets of hypotheses) are normally
computable. That is, in a reasonable proof system we can determine defi-
nitely and mechanically whether or not a supposed proof is in fact a proof of
a given conclusion from a given finite set of hypotheses. It is easy to see that
a proof system with semicomputable set D of proofs and semicomputable
trinary proof relation ⊢ also has a semicomputable provable-consequence
relation, and that compactness of the proof system implies compactness of
the provable-consequence relation. In fact, every semicomputable provable-
consequence relation is defined by some proof system with computable D
and trinary proof relation ⊢. In this respect the trinary proof relation acts
as a Gödel T-predicate [Kleene, 1952] to the binary provable-consequence
relation.

3.2 Soundness and completeness of proof systems


The behavior of a proof system may be evaluated in a natural way with re-
spect to a semantic system with the same or larger set of formulae. We say
that the proof system is sound for the semantic system when every prov-
able consequence is a semantic consequence, and that the proof system is
complete when every semantic consequence is provable. Roughly, sound-
ness means correctness, and completeness means maximal power within the
constraints imposed by the set of formulae available in the proof system.
Definition 3.2.1. Let D = ⟨FD, D, ⊢⟩ be a proof system, and let S =
⟨FS, M, ⊨⟩ be a semantic system, with FD ⊆ FS.

D is sound for S if and only if, for all T ⊆ FD and F ∈ FD,

T ⊢ F implies T ⊨ F

D is complete for S if and only if, for all T ⊆ FD and F ∈ FD,

T ⊨ F implies T ⊢ F

Each of the proposed proof systems for SIC is sound and complete
for the semantic system of SIC. The following proofs of completeness for
SIC are similar in form to completeness proofs in general, but unusu-
ally simple. Given a set of hypotheses T, and a formula F that is not
provable from T, we construct a model M satisfying exactly the set of
provable consequences of T within some sublanguage FF containing F
(Theory(M) ∩ FF ⊇ Theory(Models(T)) ∩ FF). In our example below,
FF is just the set of all shallow implicational formulae (FSh), and the model
construction is particularly simple.
Example 3.2.2. Each of the proof systems ⟨FSh, PSn, ⊢Sn⟩ of Exam-
ple 3.1.3 and ⟨FSh, PSl, ⊢Sl⟩ of Example 3.1.2 is sound and complete for
the semantic system ⟨FSh, MSh, ⊨Sh⟩ of Example 2.1.2.
The proofs of soundness involve elementary inductions on the size of
proofs. For the natural deduction system, the semantic correctness of a
proof follows from the correctness of its components; for the linear system
correctness of a proof ⟨F0, ..., Fm+1⟩ follows from the correctness of the
prefix ⟨F0, ..., Fm⟩.
The proofs of completeness require construction, for each set T of for-
mulae, of a model M = {a ∈ At : T ⊢Sn a} (or T ⊢Sl a). So M ⊨Sh a for all
atomic formulae a ∈ T ∩ At trivially. It is easy to show that M ⊨Sh (a ⇒ b)
for all implications (a ⇒ b) ∈ T as well, since either a ∉ M, or b follows by
modus ponens from a and a ⇒ b, so b ∈ M. Finally, it is easy to show that
M ⊨Sh F if and only if T ⊢Sn F (or T ⊢Sl F).
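For finite T the model construction is effective: M is just the closure of the atomic formulae of T under the implications of T. A small illustrative sketch in Python follows; the names model_of and satisfies are invented, and formulae are encoded as strings (atoms) and pairs (implications).

def model_of(hypotheses):
    """M = {atomic a : T |- a}: the atoms of T closed under modus ponens
    with the implications of T (atoms are strings, implications are pairs)."""
    m = {f for f in hypotheses if isinstance(f, str)}
    imps = {f for f in hypotheses if not isinstance(f, str)}
    changed = True
    while changed:
        changed = False
        for a, b in imps:
            if a in m and b not in m:
                m.add(b)
                changed = True
    return m

def satisfies(m, formula):
    """Satisfaction as in SIC: an atom holds iff it is in M, and (a => b)
    holds iff a is not in M or b is in M."""
    if isinstance(formula, str):
        return formula in m
    a, b = formula
    return a not in m or b in m

T = {"a", ("a", "b"), ("c", "d")}
M = model_of(T)
print(M)                                     # {'a', 'b'}
print(all(satisfies(M, f) for f in T))       # True: M is a model of T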
In richer languages than SIC, containing for example disjunctions or
negations, things may be more complicated. For example, if the disjunction
(A ∨ B) is in the set of hypotheses T, each model satisfying T must satisfy
one of A or B, yet neither may be a logical consequence of T, so one of them
must be omitted from FF. Similarly, in a language allowing negation, we
often require that every model satisfy either A or ¬A. Such considerations
complicate the construction of a model substantially.
Notice that the formal definitions above do not restrict the nature of
semantic systems and proof systems significantly. All sorts of nonsensical
formal systems fit the definitions. Rather, the relationships of soundness
and completeness provide us with conceptual tools for evaluating the be-
havior of a proof system, with respect to a semantic system that we have
already accepted as appropriate. Logic is distinguished from other tech-
nical sciences, not by the formal definition of the systems that it studies,
but rather by the use of logical concepts to evaluate these systems. The
distinction between logic programming and other forms of programming is
similarly based on the approach to evaluating the systems, rather than the
formal qualities of the systems.
While formal studies may reveal pleasing or disturbing properties of
semantic systems, there is also an unavoidable intuitive component in the
evaluation of a semantic system. A semantic system is reasonable only if
it accurately captures enough of the structure of the mental meaning that
we want to associate with a formula to allow a sensible determination of
the correctness or incorrectness of steps of reasoning. Since different people
have different intuitive notions about the proper meanings of formulae, and
the same person may find different intuitions useful for different purposes,
we should be open minded about considering a broad variety of semantic
systems. But, the mere satisfaction of a given form of definition, whether
the form in Definition 2.1.1 above, or one of the popular 'denotational'
forms using lattices or chain-complete partial orderings and fixpoints, does
not make a 'semantics' meaningful. A number of additional mathemati-
cal and philosophical dissertations are needed to give practical aid in the
evaluation and selection of semantic proposals. The best one-sentence ad-
vice that I can offer is to always ask of a proposed semantic system, 'for
a given formula F, what does the semantic system tell about the informa-
tion regarding the world that is asserted by F.' For this chapter, I use
only systems of semantics based on first-order classical forms of logic that
have been shaken down very thoroughly over the years. In these systems,
the individual models clearly present enough alleged facts about a possible
state of the world to determine the truth or falsehood of each formula. So,
the information asserted by a formula is that the world under discussion
is one of the ones satisfying the formula in the sense of ⊨. There are
many reasons to prefer nonclassical logics for programming and for other
purposes. But, we must never rest satisfied with 'semantic' treatments of
these logics until they have been connected convincingly to an intuitive
notion of meaning.
A semantic system and a sound proof system may be used to analyze
the process by which implicit knowledge is made explicit—we are particu-
larly interested in the derivation of explicit knowledge in order to answer
a question. Consider an agent with implicit knowledge given by the set
K of models consistent with that knowledge, and represent the agent's ex-
plicit knowledge by a set T of formulae that he can utter effectively. The
correctness of the explicit knowledge requires that T ⊆ Theory(K). Sup-
pose that the agent knows that the proof system is sound, and suppose
that he can recognize at least some cases when the relation D : T ⊢ F
holds—often this capability results from computing the appropriate deci-
sion procedure for a computable proof system (or enumeration procedure
for a semicomputable proof system), with an appropriate finite subset of T.
Then, whenever he finds a formula F and a proof D such that D : T ⊢ F,
the agent may add the formula F to his explicit knowledge. The soundness
of the proof system guarantees that F ∈ Theory(K). Notice that sound
proofs can never extend explicit knowledge beyond the bound determined
by implicit knowledge, which is Theory(K).
Definition 3.2.3. Let Q = ⟨FQ, Q, ?-⟩ be a query system, and let D =
⟨FD, D, ⊢⟩ be a proof system with FQ ⊆ FD.
Q ?- T ⊢ F means that F ∈ FQ is a provably correct answer to Q ∈ Q
for explicit knowledge T ⊆ FD, defined by

Q ?- T ⊢ F if and only if Q ?- F and T ⊢ F

Similarly,

Q ?- D : T ⊢ F if and only if Q ?- F and D : T ⊢ F

If D is sound, then provable correctness implies semantic correctness
(Definition 2.2.2). If D is complete, then semantic correctness implies prov-
able correctness.
Going back to the communication analysis of previous sections, let Ks
be the speaker's implicit knowledge, let K0 be the auditor's initial im-
plicit knowledge, and let T0 be the auditor's initial explicit knowledge.
When the speaker utters a set of formulae Tu, consistent with her im-
plicit knowledge, the auditor's implicit knowledge improves as before to
K1 = K0 ∩ Models(Tu), and the auditor's explicit knowledge improves to
T1 = T0 ∪ Tu. Let a questioner ask a question Q of the auditor. Without
further communication from the speaker, the auditor may improve his ex-
plicit knowledge by proving new formulae from hypotheses in T1 in order to
answer the question Q. If the auditor's initial explicit knowledge is empty,
then T1 = Tu, so the formulae derivable in this way are exactly the prov-
able consequences of Tu, and the answers that may be found are exactly
the provably correct answers to Q for Tu. If the proof system used by the
auditor is sound, then all such answers are semantically correct; if the proof
system is complete then all semantically correct answers are provable. Now
let the speaker be a programmer who utters a set of formulae constituting
a logic program, let the auditor be a processor, and let the questioner be a
user whose question is given as input to the program. Then, computations
performed by the processor take the form of search for and construction of
proofs deriving the explicit knowledge needed to produce an output answer,
from the explicit knowledge given in the program.

3.3 Programming systems


A programming system represents the computational behavior of a proces-
sor. In order to understand logic programming, we consider an arbitrary,
possibly ineffective and nondeterministic, programming system, and then
show how to evaluate its behavior logically with respect to a given semantic
system and query system. We choose an unconventional formal notation
for programming systems in order to expose the close analogy of sets of for-
mulae to programs, questions to inputs, proofs to computations, answers
to outputs.
Definition 3.3.1. A programming system is a system P = ⟨P, I, C, O,
▷, →⟩, where
1. P is a set of programs
2. I is a set of inputs
3. C is a set of computations
4. O is a set of outputs
5. ▷ is a relation on I × P × C (when ⟨I, P, C⟩ is in the relation ▷,
we write I ▷P C)
6. For each I ∈ I and P ∈ P, there is at least one C ∈ C with I ▷P C
7. → is a relation on C × O
8. For each C ∈ C, there is at most one O ∈ O with C → O (that is, →
is a partial function from C to O)
We define the relation ▷→ ⊆ I × P × C × O by

I ▷P C → O if and only if I ▷P C and C → O.
I ▷P C is intended to mean that, when input I is presented to pro-
gram P, one possible resulting computation is C. C → O is intended to
mean that the computation C produces output O. Multiple computa-
tions for a given P and I are allowed, indicating nondeterminism, but each
computation produces at most one output. The intended meaning of a
nondeterministic computation relation is that we do not know which of
the several possible computations will occur for a given program and input
[Dijkstra, 1976]. The choice may be determined by unknown and time-
dependent factors, or it may be random. In order to guarantee some
property of the result of computation, we must ensure that it holds for
all possible nondeterministic choices.
In well-known programming systems from theory textbooks, programs,
inputs, and outputs (like formulae, proofs, and questions) are finite syntac-
tic objects, and the sets P, I, and O are computable. Infinite computations,
in the form of infinite sequences of finite memory-state descriptions, are al-
lowed, but in theory textbooks the infinite computations normally have
no output. On the other hand, the straightforward abstractions of well-
known programming systems from real life (abstracted only by ignoring
all bounds on time and space resources in the computation) allow infinite
computations to consume infinite inputs and produce infinite outputs.
In many cases, we are interested only in the input-output behavior of
a program, and not in the computations themselves. So, we let each pro-
gramming system define a trinary relation determining the possible outputs
for a given program and input.
Definition 3.3.2. Let P = ⟨P, I, C, O, ▷, →⟩ be a programming system.
The computed-output relation defined by P is ▷→ ⊆ P × I × O, where

I ▷P→ O if and only if there exists a computation C ∈ C
such that I ▷P C → O.

For a programming system to be useful, the computed-output relation
▷→ must be sufficiently effective to allow a mechanical implementation.
Computed-output relations in theory textbooks, like provable-consequence
relations, are normally semicomputable; computation and output relations
▷ and →, like trinary proof relations ⊢, are normally computable (and
even primitive recursive). If C, ▷, and → are all semicomputable, then
so is ▷→. In fact, every semicomputable computed-output relation ▷→
is defined by some programming system with computable C, ▷, and →.
Infinite inputs and/or infinite outputs require more liberal, and less con-
ventional, notions of effectiveness.
The programming systems of Definition 3.3.1 are not required to be de-
terministic, or effective. They are a simple generalization of the program-
ming systems, also called indexings and Gödel numberings, of recursion
theory [Machtey and Young, 1978; Kleene, 1952]. Our I ▷P→ O corre-
sponds to the recursion-theoretic notation φP(I) = O. Recursion theory
normally considers only determinate programming systems.
Definition 3.3.3. A programming system P = ⟨P, I, C, O, ▷, →⟩ is de-
terminate if and only if, for each program P and input I, there is at most
one output O such that I ▷P→ O. That is, ▷→ is a partial function
from P × I to O.
A programming system P = ⟨P, I, C, O, ▷, →⟩ is deterministic if and
only if, for each program P and input I, there is a unique computation C
such that I ▷P C. That is, ▷ is a partial function from P × I to C.
Determinism implies determinacy, but not the converse—a nondeter-
ministic programming system may provide many computations that yield
the same determinate output.
A number of different programming systems may be defined to answer
questions in the shallow implicational calculus, depending on the type of
question and the stringency of requirements for the answer.

Example 3.3.4. Let F⇒ = {(a ⇒ b) : a, b ∈ At} be the set of implica-
tional SIC formulae (F⇒ = FSh − At, with FSh and At defined in Exam-
ple 2.1.2). The set of implicational logic programs (P⇒) is the set of finite
subsets of F⇒.
Let the set of inputs to implicational logic programs be QS1—the set of
questions of the form imp(a) ('what atomic formula does a imply?') de-
fined in Example 2.2.3.
The set of naive implicational computations (CS1) is the set of nonempty
finite and infinite sequences of atomic formulae.
The computation relation ▷S1 is defined by imp(a) ▷T ⟨c0, c1, ...⟩ if
and only if
1. c0 = a
2. (ci−1 ⇒ ci) ∈ T for all i > 0 in the (finite or infinite) range of the
sequence ⟨c0, ...⟩
The output relation →S1 is defined by

⟨c0, ..., cm⟩ →S1 (c0 ⇒ cm)

Infinite computations have no output.
Now ⟨P⇒, QS1, CS1, F⇒, ▷S1, →S1⟩ is a programming system, computing
answers to questions of the form 'what atomic formula does a imply?'
The programming system above behaves nondeterministically and in-
determinately in proving some implication of a. Its computations may halt
at any point and output the latest atomic conclusion found. Loops in the
graph of implications lead to infinite computations with no output. Notice
that each finite computation ⟨c0, ..., ci, ..., cm⟩, with output (c0 ⇒ cm),
translates very easily into the linear proof

⟨(c0 ⇒ c0), (c0 ⇒ c1), (c0 ⇒ c1), (c1 ⇒ c2), (c0 ⇒ c2), ..., (cm−1 ⇒ cm), (c0 ⇒ cm)⟩

of (c0 ⇒ cm) in the proof system of Example 3.1.2. The first line is an
instance of the reflexive rule, and subsequent lines alternate between im-
plications in T, and applications of transitivity.
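The naive system is easy to simulate. The Python sketch below is only illustrative—the encoding of programs as sets of pairs and the use of random to stand in for the unknown nondeterministic choices are assumptions of the sketch, not part of the definition above.

import random

def naive_computation(program, a, max_steps=20):
    """One possible computation for imp(a): the atoms c0, c1, ... of a walk
    through the implication graph, halting at an arbitrary point.  The step
    bound stands in for possibly infinite computations."""
    c = a
    trace = [c]
    for _ in range(max_steps):
        successors = [y for (x, y) in program if x == c]
        if not successors or random.random() < 0.3:    # nondeterministic halt
            break
        c = random.choice(successors)                   # nondeterministic step
        trace.append(c)
    return trace

def output(trace):
    """The output relation of Example 3.3.4: c0, ..., cm outputs c0 => cm."""
    return (trace[0], trace[-1])

T = {("a", "b"), ("b", "a"), ("b", "c")}
trace = naive_computation(T, "a")
print(trace, "outputs", output(trace))       # e.g. ['a', 'b', 'c'] outputs ('a', 'c')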
In order to avoid uninformative outputs, such as (a ⇒ a), we need a
programming system with a slightly more sophisticated notion of when to
stop a computation.
Example 3.3.5. Let fail be a new formal symbol, and let the set of naive
implicational computations with failure be CS2, the set of finite and infinite
sequences of atomic formulae, possibly ending in the special object fail.

Let the set of inputs to implicational logic programs be QS2—the set of
questions of the form rest-imp(a, A) ('what atomic formula not in A does
a imply?').
The computation relation ▷S2 is defined by rest-imp(a, A) ▷T ⟨c0, ...⟩
if and only if
1. c0 = a
2. if ci ∈ A, and there is a d ∈ At such that (ci ⇒ d) ∈ T, then the
sequence has an i + 1st element ci+1, and (ci ⇒ ci+1) ∈ T
3. if ci ∈ A, and there is no d ∈ At such that (ci ⇒ d) ∈ T, then the
sequence has an i + 1st element, and ci+1 = fail
4. if ci ∈ At − A, then either there is no i + 1st element, or (ci ⇒ ci+1) ∈ T
5. if ci = fail, then there is no i + 1st element
So, ▷S2 allows computations to terminate only when an atomic formula
outside of A has been found, or a dead end in the implication graph has
been reached.
The output relation →S2 is defined by

⟨c0, ..., cm⟩ →S2 (c0 ⇒ cm) when cm ∈ At

Infinite computations and finite computations ending in fail have no out-
put.
Now ⟨P⇒, QS2, CS2, F⇒, ▷S2, →S2⟩ is a programming system, computing
answers to questions of the form 'what atomic formula not in A does a
imply?'
The programming system of Example 3.3.5 is nondeterministic and in-
determinate. It avoids useless answers, but it still may fall into infinite or
finite failing computations, even when a legitimate answer exists. It also
may find an answer, but fail to output it and proceed instead into a failure
or an infinite computation. Successful computations translate into proofs
as in Example 3.3.4.
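A corresponding sketch of the computations with failure, under the same illustrative assumptions (programs as sets of pairs, random standing in for nondeterminism, and a step bound standing in for possibly infinite computations), makes the fail behavior concrete.

import random

def rest_imp_computation(program, a, avoid, max_steps=20):
    """One possible computation for rest-imp(a, avoid): keep walking while the
    current atom is in avoid; stop on an atom outside avoid, or fail at a
    dead end.  The step bound stands in for possibly infinite computations."""
    trace = [a]
    for _ in range(max_steps):
        c = trace[-1]
        if c not in avoid:
            return trace                        # c answers the question
        successors = [y for (x, y) in program if x == c]
        if not successors:
            trace.append("fail")                # dead end in the implication graph
            return trace
        trace.append(random.choice(successors)) # nondeterministic step
    return trace                                # cut off: a stand-in for an infinite run

T = {("a", "b"), ("a", "c"), ("c", "a"), ("a", "d")}
print(rest_imp_computation(T, "a", {"a", "b", "c"}))
# e.g. ['a', 'd'] (output a => d), ['a', 'b', 'fail'], or a cut-off loop ['a', 'c', 'a', ...]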
We may further strengthen the behavior of a programming system by
letting it back up and try new proofs after finite failures, and by insisting
that answers be output as soon as they are found.
Example 3.3.6. Let the set of inputs to implicational logic programs
again be QS2—the set of questions of the form rest-imp (a, A) ('what
atomic formula not in A does a imply?').
Let print be a new formal symbol. The set of backtracking implicational
computations (Cs3) is the set of nonempty finite and infinite sequences of
finite sets of atomic formulae, possibly ending in the special form print (a)
where a is an atomic formula (fail of Example 3.3.5 is represented now by
the empty set).
The computation relation ▷S3 is defined by rest-imp(a, A) ▷T ⟨C0, ...⟩
if and only if
1. C0 = {a}
2. if Ci ⊆ A and Ci ≠ ∅, then there is an i + 1st element, and

Ci+1 = (Ci − {c}) ∪ {d ∈ At : (c ⇒ d) ∈ T}

for some atomic formula c ∈ Ci
3. if Ci ⊆ At, and Ci − A ≠ ∅, then there is an i + 1st element, and
Ci+1 = print(c) for some c ∈ Ci − A
4. if Ci = ∅, or Ci = print(c), then there is no i + 1st element
So, ▷S3 allows a computation to replace any atomic formula that has
already been proved with the set of atomic formulae that it implies directly
in T. A computation halts precisely when it chooses a unique atomic
formula not in A to output, or when it fails by producing the empty set.
The output relation →S3 is defined by

⟨{a}, C1, ..., Cm−1, print(c)⟩ →S3 (a ⇒ c)

Infinite computations, and finite computations ending in ∅, have no output.
Now ⟨P⇒, QS2, CS3, F⇒, ▷S3, →S3⟩ is another programming system, com-
puting answers to questions of the form 'what atomic formula not in A does
a imply?'
The programming system of Example 3.3.6 is nondeterministic and in-
determinate. It is less susceptible to missing answers than that of Exam-
ple 3.3.5. The new system does not get stuck with a failure when a single
path in the implication graph leads to a dead end: a computation ends in
∅ only when all paths have been followed to a dead end. When there is a
finite path to an answer, and also a cycle, the nondeterministic choice of
which formula to replace at each step determines which path is followed in
the computation, and so determines success or infinite computation. The
translation of a computation in the latest system to a proof is not quite
as transparent as in Examples 3.3.4 and 3.3.5, but it is still simple. Each
successful computation ⟨{c0}, C1, ..., Cm−1, print(cm)⟩ must contain a se-
quence of atomic formulae ⟨c0, c1, ..., cm−1, cm⟩, where ci ∈ Ci for i < m,
and for adjacent pairs ci, ci+1, either ci = ci+1, or (ci ⇒ ci+1) ∈ T. This
sequence of atomic formulae transforms to a linear proof as before.
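The set-replacement steps of the backtracking system can be simulated in the same illustrative style; the function below is a sketch, not the definition, and again uses random for the nondeterministic choices and a step bound in place of genuinely infinite computations.

import random

def backtracking_computation(program, a, avoid, max_steps=20):
    """One possible computation for rest-imp(a, avoid) in the backtracking
    system: states are sets of atoms; a chosen atom is replaced by its direct
    successors until an atom outside avoid appears (print it) or the state
    becomes empty (failure).  The step bound stands in for infinite runs."""
    state = {a}
    trace = [set(state)]
    for _ in range(max_steps):
        outside = state - avoid
        if outside:
            trace.append(("print", random.choice(sorted(outside))))
            return trace
        if not state:
            return trace                         # every path reached a dead end
        c = random.choice(sorted(state))         # nondeterministic choice of atom
        state = (state - {c}) | {y for (x, y) in program if x == c}
        trace.append(set(state))
    return trace                                 # cut off: a stand-in for an infinite run

T = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "a"), ("d", "e")}
print(backtracking_computation(T, "a", {"a", "b", "c", "d"}))
# e.g. [{'a'}, {'b','c','d'}, {'b','d'}, {'a','d'}, {'a','e'}, ('print', 'e')]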
A final example of a programming system illustrates the use of incre-
mental output from possibly infinite computations to produce consequen-
tially strong answers.
Example 3.3.7. Let the set of inputs to implicational logic programs be
QSc—the set of questions of the form conj-imp(a) ('what are some atomic
formulae that a implies?') from Example 2.2.6.
The set of conjunctive implicational computations (Cs4) is the set of non-
empty finite or infinite sequences of finite sets of atomic formulae (the same
as CS3, without the final elements print(a)).
The computation relation ▷S4 is defined by conj-imp(a) ▷T ⟨C0, ...⟩
if and only if
1. C0 = {a}
2. if Ci ≠ ∅, then there is an i + 1st element, and

Ci+1 = (Ci − {c}) ∪ {d ∈ At : (c ⇒ d) ∈ T}

for some atomic formula c ∈ Ci
3. if Ci = ∅, then there is no i + 1st element
The computations above are the same as those of Example 3.3.6, except
that we never choose a single atomic formula to output. ∅ is no longer
regarded as a failure.
The output relation →S4 is defined by ⟨C0, ...⟩ →S4 (a ⇒ b1) ∧ ... ∧
(a ⇒ bm) if and only if
1. C0 = {a}
2. {b1, ..., bm} = C0 ∪ C1 ∪ ..., and b1, ..., bm are given in the order of
first appearance in the sequence C0, ..., with ties broken by some
arbitrary ordering of atomic formulae
Notice that even infinite computations have output.
Now ⟨P⇒, QSc, CS4, F⇒, ▷S4, →S4⟩ is a programming system, computing
answers to questions of the form 'what are some atomic formulae that a
implies?'
The programming system above should be thought of as producing its
output incrementally at each computation step. It is nondeterministic and
indeterminate. Even though the computation may be infinite, it never
fails to produce an answer, although the output may be the trivial formula
a ⇒ a. The strength of the answer produced depends on the nondetermin-
istic choices of the atomic formula replaced in each computation step.
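The incremental character of the conjunctive system is easiest to see in a sketch that yields conjuncts one at a time as atoms are first reached. As before, the encoding and the use of random and a step bound are assumptions of the illustration, not part of the definition.

import random

def conj_imp(program, a, max_steps=20):
    """Yield the conjuncts (a, b) of one possible answer to conj-imp(a),
    incrementally, in order of first appearance of the atoms b; the step
    bound stands in for possibly infinite computations."""
    state, seen = {a}, []
    steps = 0
    while state and steps < max_steps:
        steps += 1
        for b in sorted(state):                  # sorted = the arbitrary tie-break
            if b not in seen:
                seen.append(b)
                yield (a, b)                     # incremental output of a conjunct
        c = random.choice(sorted(state))         # nondeterministic choice of atom
        state = (state - {c}) | {y for (x, y) in program if x == c}

T = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "a"), ("d", "e")}
print(list(conj_imp(T, "a")))
# e.g. [('a','a'), ('a','b'), ('a','c'), ('a','d'), ('a','e')] -- or without ('a','e')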

3.4 Soundness and completeness of programming systems
A query system determining what constitutes an answer, and the semantic-
consequence relation of a semantic system determining the correctness of
an answer, yield criteria for evaluating the behavior of a programming
system, similar to the criteria for evaluating a proof system in Section 3.1.
We define soundness and completeness of programming systems, in analogy
to the soundness and completeness of proof systems. Logic programming
is distinguished from other sorts of programming by the use of such logical
concepts to evaluate programming systems.
There is only one sensible concept of soundness for a programming
system: every output is a correct answer to the input program. When
a given question has more than one correct answer, completeness criteria
vary depending on the way in which we expect an output answer to be
chosen.
Definition 3.4.1. Let P = ⟨P, I, C, O, ▷, →⟩ be a programming system,
let S = ⟨FS, M, ⊨⟩ be a semantic system, and let Q = ⟨FQ, Q, ?-⟩ be a
query system, with P ⊆ 2^FS, I ⊆ Q, and O ⊆ FS ∩ FQ.
P is sound for S and Q if and only if, for all P ∈ P and I ∈ I,
I ▷P→ O implies I ?- P ⊨ O
P is weakly complete for S and Q if and only if, for all P ∈ P and I ∈ I
such that I is semantically answerable for P (Definition 2.2.2), and for all
computations C ∈ C such that I ▷P C, there exists O ∈ O such that
C → O and I ?- P ⊨ O
(O is unique because → is a partial function).
P is consequentially complete for S and Q if and only if P is weakly
complete and, in addition, O above is a consequentially strongest correct
answer ({O} ⊨ N for all N ∈ FQ such that I ?- P ⊨ N).
So, a programming system is sound if all of its outputs are correct an-
swers to input questions, based on the knowledge represented explicitly
by programs. A system is weakly complete if, whenever a correct answer
exists, every computation outputs some correct answer. A system is conse-
quentially complete if, whenever a correct answer exists, every computation
outputs a consequentially strongest correct answer. Notice that, for conse-
quential completeness, the strength of the output answer is judged against
all possible answers in the query system, not just those that are possible
outputs in the programming system, so we cannot achieve consequential
completeness by the trickery of disallowing the truly strongest answers.
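For the implicational systems of Examples 3.3.4–3.3.7, the soundness condition is easy to check mechanically, because for programs consisting only of implications, semantic consequence of a single implication reduces to reachability in the implication graph. The following sketch (with invented names reachable and sound_output) illustrates the check for a single output.

def reachable(program, a):
    """Atoms b with T |= (a => b), for a program T of implications only:
    b = a or b is reachable from a through the pairs of the program."""
    found, frontier = {a}, [a]
    while frontier:
        x = frontier.pop()
        for (u, v) in program:
            if u == x and v not in found:
                found.add(v)
                frontier.append(v)
    return found

def sound_output(program, question_atom, output_pair):
    """The soundness condition for a single implicational output (a, b):
    it must answer the question and be a semantic consequence of the program."""
    a, b = output_pair
    return a == question_atom and b in reachable(program, a)

T = {("a", "b"), ("b", "c")}
print(sound_output(T, "a", ("a", "c")))   # True
print(sound_output(T, "a", ("a", "d")))   # False: not a semantic consequence of T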
A programming system provides another approach to analyzing a simple
form of communication. While semantic systems, proof systems, and query
systems yield insight into the meaning of communication and criteria for
evaluating the behavior of communicating agents, programming systems
merely describe that behavior. A programmer provides a program P to a
processor. A user (sometimes, but not always, identical with the program-
mer) provides an input I, and the processor performs a computation C such
that I ▷P C, from which the output O, if any, such that C → O, may be
extracted. We allow the mapping from computation to output to depend
on purely conventional rules that are adopted by the three agents. What
aspects of a computation are taken to be significant to the output is really
a matter of convention, not necessity. Often, only the string of symbols
displayed on some printing device is taken to be the output, but in various
contexts the temporal order in which they are displayed (which may be
different from the printed order if the device can backspace), the temporal
or spatial interleaving of input and output, the speed with which output
occurs, the color in which the symbols are displayed, which of several de-
vices is used for display, all may be taken as significant. Also, convention
determines the treatment of infinite computation as having no output, null
output, or some nontrivial and possibly infinite output produced incremen-
tally. The presentation of input to a computation is similarly a matter of
accepted convention, rather than formal computation.
In logic programming, where the programmer acts as speaker, the pro-
cessor as auditor, and the user as questioner, soundness of the program-
ming system guarantees that all outputs constitute correct answers. Vari-
ous forms of completeness guarantee that answers will always be produced
when they exist. In this sense, soundness and completeness mean that a
programming system provides a correct and powerful implementation of
the auditor in the speaker-auditor-questioner scenario of Section 2.2.
There is a close formal correspondence between programming systems
and pairs of proof and query systems: inputs correspond to questions,
programs correspond to sets of hypotheses, computations to proofs, and
outputs to theorems (for a different correspondence, in which programs
in the form of lambda terms correspond to natural deduction proofs, see
[Howard, 1980; Tait, 1967; Constable et al., 1986]—compare this to the in-
terpretation of formulae as queries and proofs as answers [Meseguer, 1989]).
Notice that both quaternary relations Q ?- D : T ⊢ F and I ▷P C → O
are typically computable, while both of the trinary relations Q ?- T ⊢ F
and I ▷P→ O are typically semicomputable. Furthermore, the defini-
tions of the trinary relations from the corresponding quaternary relations
are analogous. In both cases we quantify existentially over the third argu-
ment, which is variously a proof or a computation.
There is an important difference, however, between the forms of defi-
nition of the provable-answer relation Q ?- D : T ⊢ F, and of the compu-
tational relation I ▷P C → O, reflecting the difference in intended uses
of these relations. This difference has only a minor impact on the rela-
tions definable in each form, but a substantial impact on the efficiency of
straightforward implementations based on the definitions. In the query-
proof domain, we relate formulae giving explicit knowledge (program) to
the proofs (computations) that can be constructed from that knowledge,
yielding formulae (outputs) that are provable consequences of the given
knowledge in the relation D : T ⊢ F. We independently relate questions
(inputs) to the answers (outputs) in the relation Q ?- F, and then take
the conjunction of the two. There is no formal provision for the question
(input) to interact with the knowledge formulae (program) to guide the
construction of the proof (computation)—the question (input) is only used
to select a completed proof. In the computational domain, we relate inputs
(questions) directly to programs (knowledge formulae) to determine com-
putations (proofs) that they can produce in the relation I ▷P C. Then,
we extract outputs (answer formulae) from computations (proofs) in the
relation C → O. The relation I ▷P C provides a formal concept that
may be used to represent the interaction of input (question) with program
(knowledge) to guide the construction of the computation (proof).
Given a proof system and a query system, we can construct a pro-
gramming system with essentially the same behavior. This translation is
intended as an exercise in understanding the formal correspondence be-
tween proofs and computations. Since our requirements for proof systems
and programming systems are quite different, this construction does not
normally lead to useful implementations.
Proposition 3.4.2. Let D = ⟨F, D, ⊢⟩ be a proof system, and let Q =
⟨F, Q, ?-⟩ be a query system. Define the programming system

P = ⟨2^F, Q, (D × F) ∪ {fail}, F, ▷, →⟩

where
1. Q ▷T ⟨D, F⟩ if and only if Q ?- D : T ⊢ F
2. Q ▷T fail if and only if there are no D and F such that
Q ?- D : T ⊢ F
3. ⟨D, F⟩ → G if and only if F = G
4. fail → F is false for all F ∈ F
Then,
Q ?- D : T ⊢ F if and only if Q ▷T ⟨D, F⟩ → F
Therefore,
Q ?- T ⊢ F if and only if Q ▷T→ F

If D above is sound for some semantic system S, then P is sound for
S and Q, but the converse fails because some formulae may never occur
as answers to questions. Proposition 3.4.2 shows that a proof may be
interpreted as a nondeterministically chosen computation that outputs a
theorem.
Because hypotheses to proofs are sets of formulae, rather than single for-
mulae, and the proof relation must be monotone with respect to the subset
relation, there are computational relations ▷→ defined by programming
systems that are not the same as the relation ?- ⊢ of any proof and query
systems. Intuitively, in order to mimic an arbitrary programming system
with a proof system and a query system, we must augment the output O
resulting from input I into a formula asserting that input I produces out-
put O. This augmentation is almost trivial, in the sense that echoed input
may just as well be regarded as an implicit part of the output. Informal
question answering uses such implicit augmentation: in response to the
question 'what is the capital city of Idaho?' the abbreviated answer 'Boise'
is generally accepted as equivalent to the full answer 'the capital city of
Idaho is Boise.'
Proposition 3.4.3. Let P = ⟨P, I, C, O, ▷, →⟩ be a programming system. De-
fine the proof system D = ⟨P ∪ (I × O), C, ⊢⟩, where
1. C : T ⊢ ⟨I, O⟩ if and only if I ▷P C → O for some P ∈ T
2. C : T ⊢ P is false for all P ∈ P
Also define the query system Q = ⟨I × O, I, ?-⟩, where
1. I ?- ⟨J, O⟩ if and only if I = J.
Then,

I ?- C : T ⊢ ⟨I, O⟩ if and only if I ▷P C → O for some P ∈ T

Therefore,

I ?- T ⊢ ⟨I, O⟩ if and only if I ▷P→ O for some P ∈ T
As in the construction of Proposition 3.4.2, soundness of D implies
soundness of P, but the converse fails, and completeness does not transfer
either way. Proposition 3.4.3 shows that a computation may be interpreted
as a proof that a given program and input produce a certain output.
Example 3.4.4. The programming systems of Examples 3.3.4, 3.3.5, 3.3.6,
and 3.3.7 are all sound for their appropriate semantic and query systems.
The proof of soundness is easy—every computation can be transformed
easily into a proof in a sound proof system.
The programming system of naive implicational computations in Ex-
ample 3.3.4 is not weakly complete. Consider the program

{(a ⇒ b), (b ⇒ a), (b ⇒ c)}

Given the input question imp(a) ('what logical formula does a imply?'), a
possible computation is the infinite sequence

⟨a, b, a, b, ...⟩

which has no output. There are three correct answers,

(a ⇒ a), (a ⇒ b), and (a ⇒ c)

each of which is found by a short computation.


The programming system of naive implicational computations with fail-
ure in Example 3.3.5 is not weakly complete. Consider the program

{(a ⇒ b), (a ⇒ c), (c ⇒ a), (a ⇒ d)}

Given the input question rest-imp(a, {a, b, c}) ('what logical formula not
in {a, b, c} does a imply?'), two possible computations with no output are

⟨a, b, fail⟩ and ⟨a, c, a, c, ...⟩

There is a correct answer, a ⇒ d, which is found by the computation ⟨a, d⟩.
The programming system of backtracking implicational computations
in Example 3.3.6 avoids the finite failure of the naive computations with
failure, but is still not weakly complete because of infinite computations. It
succeeds on the program and question above, with the unique computation

⟨{a}, {b, c, d}, print(d)⟩

but fails on a slightly trickier case. Consider the program

{(a ⇒ b), (a ⇒ c), (a ⇒ d), (b ⇒ a), (d ⇒ e)}
and the question rest-imp(a, {a, b, c, d}). There is no finite failing compu-
tation, and the correct answer a ⇒ e is output by the computation

⟨{a}, {b, c, d}, {b, d}, {a, d}, {a, e}, print(e)⟩

But there is still an infinite computation that misses the output:

⟨{a}, {b, c, d}, {a, c, d}, {b, c, d}, {a, c, d}, ...⟩
The programming system of conjunctive implicational computations in
Example 3.3.7 is weakly complete, simply because every computation out-
puts some correct answer of the form (a ⇒ a) ∧ ..., where in the worst case
only the first conjunct is given. This system was clearly aimed, however,
toward producing consequentially strong answers. It is not consequentially
complete. Consider again the program

{(a ⇒ b), (a ⇒ c), (a ⇒ d), (b ⇒ a), (d ⇒ e)}
and the new question conj-imp(a). The computation

⟨{a}, {b, c, d}, {b, c, e}, {a, c, e}, {b, c, d, e}, {a, c, d, e}, {b, c, d, e}, ...⟩

outputs the consequentially strongest answer

(a ⇒ a) ∧ (a ⇒ b) ∧ (a ⇒ c) ∧ (a ⇒ d) ∧ (a ⇒ e)

But the computation

⟨{a}, {b, c, d}, {a, c, d}, {b, c, d}, {a, c, d}, ...⟩
outputs only the weaker answer

(a ⇒ a) ∧ (a ⇒ b) ∧ (a ⇒ c) ∧ (a ⇒ d)

missing the conjunct a ⇒ e.


In each case above, the failure of completeness results from the possi-
bility of unfortunate choices for the next computational step.
Most practical implementations of logic programming languages are
not complete, and many are not even sound. Nonetheless, soundness and
completeness are useful standards against which to judge implementations.
Most implementations are sound and complete for well-characterized sub-
sets of their possible programs and inputs. The cases where soundness
and/or completeness fail are typically considered at least peculiar, and
sometimes pathological, and they are the topics of much discussion and
debate. The history of programming languages gives some hope for a
trend toward stricter adherence at least to soundness criteria. For ex-
ample, early Lisp processors employed dynamic scoping, which is essen-
tially an unsound implementation of logical substitution. Modern Lisp
processors are usually statically scoped, and provide sound implementa-
tions of substitution [Muchnick and Pleban, 1980; Brooks et al., 1982;
Rees and Clinger, 1986]. As compiler technology matured, the logically cor-
rect static scoping was found to be more efficient than dynamic scoping,
although early work assumed the contrary.
In spite of the close formal correspondence outlined above between
proof and computation, our natural requirements for proof systems and
programming systems differ significantly. The requirements for correct-
ness, formalized as soundness, are essentially the same—everything that is
proved/computed must be a logical consequence of given information. But,
the requirements for power, formalized as completeness, vary substantially.
Proofs are thought of as things to search for, using any available tools
whether formal, intuitive, or inspirational, and we only demand formal or
mechanical verification of the correctness of a proof, not mechanical dis-
covery of the proof. So, proofs are quantified existentially in the definition
of completeness of a proof system, and we are satisfied with the mere ex-
istence of a proof of a given true formula. Computations, on the other
hand, are thought of as being generated on demand by a computing agent
in order to satisfy the requirement of a user. So, we require that a complete
programming system be guaranteed to produce a sufficiently strong correct
answer whenever a correct answer exists. Since we do not know which of
several possible computations will be generated nondeterministically by the
computing agent, we quantify universally over computations.
Because of the requirement that all computations in a complete pro-
gramming system yield correct answers, merely mimicking the relational
behavior of a proof system, as in Proposition 3.4.2, is not sufficient for use-
ful implementation. Practical implementations must use complete strate-
gies for choosing proofs in order to provide programming systems with the
desired guaranteed results.

3.5 Proof-theoretic foundations for logic programming


Given a suitable proof system, a practical implementation of a logic pro-
gramming language still must solve the difficult problem of searching the set
of proofs for one that provides an answer to a given input question. Meth-
ods for choosing and generating proofs are called proof strategies. While
logical semantics provides the conceptual tools for specifying logic program-
ming languages, proof theory [Stenlund, 1972; Prawitz, 1965; Takeuti, 1975;
Schutte, 1977; Girard et al., 1989] provides the tools for developing proof
strategies. Once a proof strategy has been defined, the remaining prob-
lems in implementation are the invention of appropriate algorithms and
data structures for the strategy, and the details of code generation or in-
terpreting. The organization of logic programming into the application of
semantics to specification, and the application of proof theory to imple-
mentation, does not mean, however, that the former precedes the latter in
the design of a logic programming language. Design normally requires the
simultaneous consideration of specification and implementation, and the
designer must search the two spaces of semantic specifications and proof-
theoretic strategies in parallel for a compatible pair of ideas. In different
circumstances either topic can be the primary driver of design decisions.
In writing this chapter, I have not been able to develop the proof theoretic
side of logic programming design as thoroughly as the semantic side, merely
because I ran out of time, pages, and energy. In this section, I will only
outline the issues involved in applying proof theory to logic programming.
The choice of a proof strategy affects both the power and the complexity
of an implementation, but not the soundness. Given a sound proof system,
a proof strategy can only choose (or fail to find) a correct proof, it cannot
expand the class of proofs. But, a given proof strategy may be incapable of
discovering certain provably correct answers, so it may yield an incomplete
computing system, even when starting with a complete proof system. So,
there is great value in proof theoretic theorems demonstrating that, when-
ever a formula F is provable, there is a proof in some usefully restricted
form. Even when these do not lead to complete implementations, they can
improve the power/complexity tradeoff dramatically. Fortunately, proof
theorists have concentrated a lot of attention on such results, particularly
in the form of proof normalization theorems, which show that all proofs
may be reduced to a normal form with special structure. Many normal-
ization results are expressed as cut elimination theorems, showing that a
particular version of modus ponens called the cut rule may be removed from
proofs. Cut elimination theorems are usually associated with the predicate
calculus and its fragments, variants, and extensions. The impact of cut
elimination on proof strategies has been studied very thoroughly, leading
to an excellent characterization of the sequent-style proof systems that
are susceptible to generalizations of the simple goal-directed proof strategy
used in Prolog [Miller et al., 1991]. These proof-theoretic methods have
been applied successfully in some novel logics where the model-theoretic
semantics are not yet properly understood.
In the term rewriting literature, there are similar results on the nor-
malization of equational proofs. Many of these come from confluence (also
called Church-Rosser) results. A system of equations

l1 = r1, . . . , ln = rn

presented with each equation oriented in a particular left-right order, is
confluent precisely if every proof of an equation s = t may be transformed
into a rewriting of each of s and t to a common form u. Rewriting means
here that an equational hypothesis li = ri may only be used from left to
right, to replace instances of li by corresponding instances of ri but not the
reverse.
strategies that are much simpler and more efficient than those that search
through all equational proofs. See Sections 2.2 and 2.3 of the chapter
'Equational Logic Programming' in this volume, as well as [Klop, 1991],
for more on the application of term rewriting to equational proofs.
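As a concrete illustration of why confluence matters computationally, the following Python sketch decides equations by rewriting both sides to normal form; the term encoding, the particular rules, and the assumption of termination are all part of the illustration, not of the theory sketched above.

RULES = [                                     # invented rules, oriented left to right
    (("first", ("cons", "x", "y")), ("first", "x")),
    (("more",  ("cons", "x", "y")), "y"),
]

def match(pattern, term, env):
    """Match a pattern (variables "x", "y") against a term, extending env."""
    if isinstance(pattern, str):
        if pattern in ("x", "y"):
            return env if env.setdefault(pattern, term) == term else None
        return env if pattern == term else None
    if isinstance(term, str) or len(pattern) != len(term) or pattern[0] != term[0]:
        return None
    for p, t in zip(pattern[1:], term[1:]):
        env = match(p, t, env)
        if env is None:
            return None
    return env

def substitute(term, env):
    if isinstance(term, str):
        return env.get(term, term)
    return (term[0],) + tuple(substitute(t, env) for t in term[1:])

def normalize(term):
    """Rewrite subterms first, then the whole term, until no rule applies
    (this assumes the rules are terminating)."""
    if not isinstance(term, str):
        term = (term[0],) + tuple(normalize(t) for t in term[1:])
    for lhs, rhs in RULES:
        env = match(lhs, term, {})
        if env is not None:
            return normalize(substitute(rhs, env))
    return term

s = ("more", ("cons", "a", ("cons", "b", "c")))
t = ("cons", "b", "c")
print(normalize(s) == normalize(t))     # True: s = t is decided by rewriting both sides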

4 The uses of semantics


The development of logic programming systems from logics, given above,
provides a particular flavor of semantics, called logical semantics, for logic
programming languages. Logical semantics, rather than competing directly
with other flavors of programming language semantics, provides different
insights, and is useful for different purposes. The careful comparison of dif-
ferent styles of semantics is a wide-open area for further research. In this
section, I sketch the sort of relations that I believe should be explored be-
tween logical semantics, denotational semantics, and algebraic semantics.
Meseguer proposes two sorts of logic programming— 'weak' logic program-
ming uses essentially the same notion of logical semantics as mine, while
'strong' logic programming uses the theory of a single model, such as a
model derived by algebraic semantics [Meseguer, 1989].

4.1 Logical semantics vs. denotational semantics


Roughly, denotational semantics [Scott, 1970; Scott and Strachey, 1971;
Stoy, 1977] takes the meaning of a program to be an abstract descrip-
tion of its input/output behavior, where inputs and outputs are uninter-
preted tokens. Denotational semantics assigns to each program a unique
value carrying that meaning. One problem of denotational semantics is
how to deal with observable computational behavior, such as nontermi-
nation, that does not produce output tokens in the concrete sense. This
problem was solved by expanding the domains of input and output, as
well as the domains of program meanings, to partially ordered sets (usu-
ally chain-complete partial orderings [Markowsky, 1976] or lattices [Scott,
1976]) containing objects representing abstract computational behaviors,
not all of which produce tokens as output [Reynolds, 1973; Scott, 1982]. In
practice, the definition of appropriate domains is often the most challeng-
ing task in creating a denotational semantic description of a programming
language, and domain theory has become a definite specialty in theoretical
computer science [Schmidt, 1986; Stoy, 1977; Zhang, 1991; Gunter, 1992;
Winskel, 1993].
The denotational approach provides a useful tool for characterizing
what a particular type of implementation actually does, but it does not
give any intuitive basis for discussing what an implementation ought to
do. Logical semantics, on the other hand, begins with an interpretation
of input and output. It does not directly address techniques for analyz-
ing the behavior of programs—that is left to a metalanguage. But it does
provide an intuitive basis for distinguishing logically reasonable behaviors
from other behaviors.
For example, denotational semantics for functional languages was ini-
tially defined using eager evaluation [Backus, 1978]. The domains that were
used to define eager evaluation are not rich enough to represent lazy evalu-
ation. In fact the definition of domains for lazy evaluation [Winskel, 1993]
posed difficult technical problems, causing resistance to the use of lazy
evaluation. Denotational semantics for lazy evaluation matured long after
the idea had been implemented, and informal evidence for its utility had
been presented [Friedman and Wise, 1976; Henderson and Morris, 1976].
Logical semantics for equational programming, on the other hand, requires
lazy evaluation for completeness, and the demand for lazy evaluation from
this point of view precedes its invention as a programming tool—at latest
it goes back to [O'Donnell, 1977] and the essential roots are already there
in work on combinatory logic and the lambda calculus [Curry and Feys,
1958]. Once lazy evaluation was explained denotationally, that explanation
became a very useful tool for analysis and for deriving implementations. In
general logical semantics predicts and prescribes useful techniques, while
denotational semantics explains and analyzes them.

4.2 Logical semantics vs. initial/final-algebra and Herbrand semantics
Semantics that use initial or final algebras or Herbrand models [Guttag and
Horning, 1978; Goguen et al., 1978; Meseguer and Goguen, 1985] to repre-
sent the meanings of programs provide systematic techniques for deriving
denotational-like semantics from logical semantics. Logical semantics de-
termines a large class of models consistent with a given program. Algebraic
semantic techniques construct a single model, depending on the language
in which output is expressed as well as the given program, whose output
theory is the same as that of the class of models given by logical semantics.
This single model can be used for the same sorts of analysis as denotational
semantics (although it is not always based on a lattice or chain-complete
partial ordering). Such single-model semantics must be reconsidered when-
ever the output language expands, since in the larger language the theory
of the single model may not be the same as the theory of the class of models
consistent with the program.
For example, consider a language (based on ideas from Lucid [Ashcroft
and Wadge, 1977] ) with symbols cons, first, more, a, b, satisfying the
equations
first(cons(x, y)) = first(x)
first(a) = a
first(b) = b
more(cons(x, y)) = y
more(a) = a
more(b) = b
Assume that only the symbols a and b are allowed as output—that
is, we are only interested in deriving equations of the forms s = a and
t = b, where s and t are arbitrary input terms. Algebraic semantic tech-
niques typically interpret this system over a universe of infinite flat (i.e.,
not nested) lists with elements from the set {a, b}, where after some fi-
nite prefix, all elements of the list are the same. cons(s,t) is interpreted
as the list beginning with the first element of s, followed by all the ele-
ments of t. In this algebraic interpretation, cons(cons(a, b), b) = cons(a, b)
and cons(b, cons(a,a)) = cons(b,a) hold, although neither is a semantic
consequence of the given equations. If, however, we add the conventional
symbols car and cdr, and define them by the equations

car(cons(x, y)) = x
cdr(cons(x, y)) = y

then we must expand the universe of the algebraic interpretation to the
universe of binary trees with leaves marked a and b. There is no way to
define the functions car and cdr in the flat list model so that they satisfy
the new equations. If we take the full equational theory of the flat list
model, and add the defining equations for car and cdr, then the resulting
theory trivializes. Every two terms s and t are equal by the derivation

s = cdr(car(cons(cons(a, s), b)))
  = cdr(car(cons(a, b)))
  = cdr(car(cons(cons(a, t), b)))
  = t

Of course, nobody would apply algebraic semantics in this way—taking the
model for a smaller language and trying to interpret new function symbols
in the same universe. But, what the example shows is that an algebraic
model of a given system of equations may not preserve all of the relevant
information about the behavior of those equations in extended languages.
The set of models associated with a system of equations by logical semantics
is much more robust, and carries enough information to perform extensions
such as the example above.
In general, algebraic semantic techniques, based on initial models, final
models, and Herbrand universes, provide useful tools for determining, in a
given program, the minimum amount of information that a data structure
must carry in order to support the computational needs of that program.
They do not, and are not intended to, represent the inherent information
given by the formulae in the program, independently of a particular def-
inition of the computational inputs and outputs that the program may
operate on.

Acknowledgements
I am very grateful for the detailed comments that I received from the
readers, Bharat Jayaraman, Jose Meseguer, Dale Miller, Gopalan Nadathur
and Ed Wimmers. All of the readers made substantial contributions to the
correctness and readability of the chapter.

References
[Anderson and Belnap Jr., 1975] Alan Ross Anderson and Nuel D. Belnap
Jr. Entailment—the Logic of Relevance and Necessity, volume 1. Prince-
ton University Press, Princeton, NJ, 1975.
[Andrews, 1986] Peter B. Andrews. An Introduction to Mathematical Logic
and Type Theory: To Truth Through Proof. Computer Science and Ap-
plied Mathematics. Academic Press, New York, NY, 1986.
[Ashcroft and Wadge, 1977] E. Ashcroft and W. Wadge. Lucid: A non-
procedural language with iteration. Communications of the ACM,
20(7):519-526, 1977.
[Backus, 1974] John Backus. Programming language semantics and closed
applicative languages. In Proceedings of the 1st ACM Symposium on
Principles of Programming Languages, pages 71-86. ACM, 1974.
[Backus, 1978] John Backus. Can programming be liberated from the von
Neumann style? a functional style and its algebra of programs. Com-
munications of the ACM, 21(8):613-641, 1978.
[Barendregt, 1984] Hendrik Peter Barendregt. The Lambda Calculus: Its
Syntax and Semantics. North-Holland, Amsterdam, 1984.
[Belnap Jr. and Steel, 1976] Nuel D. Belnap Jr. and T. B. Steel. The Logic
of Questions and Answers. Yale University Press, New Haven, CT, 1976.
[Brooks et al., 1982] R. A. Brooks, R. P. Gabriel, and Guy L. Steele. An
optimizing compiler for lexically scoped Lisp. In Proceedings of the 1982
ACM Compiler Construction Conference, June 1982.
[Church, 1941] A. Church. The Calculi of Lambda-Conversion. Princeton
University Press, Princeton, New Jersey, 1941.
[Codd, 1970] E. F. Codd. A relational model of data for large shared data
banks. Communications of the ACM, 13(6), June 1970.
[Codd, 1971] E. F. Codd. A data base sublanguage founded on the rela-
tional calculus. In Proceedings of the 1971 ACM SIGFIDET Workshop
on Data Description, Access and Control, 1971.
[Codd, 1972] E. F. Codd. Relational completeness of data base sublan-
guages. In Data Base Systems, volume 6 of Courant Computer Science
Symposia. Prentice-Hall, Englewood Cliffs, NJ, 1972.
[Cohn, 1965] P. M. Cohn. Universal Algebra. Harper and Row, New York,
NY, 1965.
[Constable et al, 1986] Robert L. Constable, S. F. Allen, H. M. Brom-
ley, W. R. Cleaveland, J. F. Cremer, R. W. Harper, D. J. Howe,
Todd B. Knoblock, N. P. Mendler, Prakash Panangaden, J. T. Sasaki,
and Scott F. Smith. Implementing Mathematics with the Nuprl Proof
Development System. Prentice-Hall, Englewood Cliffs, NJ, 1986.
[Curry and Feys, 1958] H. B. Curry and R. Feys. Combinatory Logic, vol-
ume 1. North-Holland, Amsterdam, 1958.
[Date, 1986] C. J. Date. An Introduction to Database Systems. Systems
Programming. Addison-Wesley, Reading, MA, 4 edition, 1986.
[Dijkstra, 1976] Edsger W. Dijkstra. A Discipline of Programming.
Prentice-Hall, Englewood Cliffs, NJ, 1976.
[Dwelly, 1988] Andrew Dwelly. Synchronizing the I/O behavior of func-
tional programs with feedback. Information Processing Letters, 28, 1988.
[Fagin et al., 1984] Ronald Fagin, Joseph Y. Halpern, and Moshe Y. Vardi.
A model-theoretic analysis of knowledge. In Proceedings of the 25th
Annual IEEE Symposium on Foundations of Computer Science, pages
268-278, 1984.
[Friedman and Wise, 1976] Daniel Friedman and David S. Wise. Cons
should not evaluate its arguments. In 3rd International Colloquium on

Automata, Languages and Programming, pages 257-284. Edinburgh Uni-


versity Press, 1976.
[Futatsugi et al., 1985] K. Futatsugi, Joseph A. Goguen, J.-P. Jouannaud,
and Jose Meseguer. Principles of OBJ2. In 12th Annual Symposium on
Principles of Programming Languages, pages 52-66. ACM, 1985.
[Gallaire et al., 1984] Herve Gallaire, Jack Minker, and J. M. Nicolas.
Databases: A deductive approach. ACM Computing Surveys, 16(2),
June 1984.
[Gallier, 1986] Jean H. Gallier. Logic for Computer Science—Foundations
of Automatic Theorem Proving. Harper & Row, New York, NY, 1986.
[Gentzen, 1935] Gerhard Gentzen. Untersuchungen über das logische
Schließen. Mathematische Zeitschrift, 39:176-210, 405-431, 1935. En-
glish translation in [Gentzen, 1969].
[Gentzen, 1969] Gerhard Gentzen. Investigations into logical deductions,
1935. In M. E. Szabo, editor, The Collected Works of Gerhard Gentzen,
pages 68-131. North-Holland, Amsterdam, 1969.
[Girard et al., 1989] Jean-Yves Girard, Yves Lafont, and Paul Taylor.
Proofs and Types. Cambridge Tracts in Theoretical Computer Science.
Cambridge University Press, Cambridge, UK, 1989.
[Goguen and Burstall, 1992] Joseph A. Goguen and Rod M. Burstall. In-
stitutions: Abstract model theory for specification and programming.
Journal of the ACM, 39(1):95-146, January 1992.
[Goguen et al., 1978] Joseph A. Goguen, James Thatcher, and Eric Wag-
ner. An initial algebra approach to the specification, correctness and
implementation of abstract data types. In Raymond Yeh, editor, Cur-
rent Trends in Programming Methodology, pages 80-149. Prentice-Hall,
1978.
[Goguen, 1990] Joseph A. Goguen. Higher order functions considered un-
necessary for higher order programming. In David A. Turner, editor,
Research Topics in Functional Programming, pages 309-351. Addison-
Wesley, 1990.
[Gordon, 1992] Andrew D. Gordon. Functional Programming and In-
put/Output. PhD thesis, University of Cambridge, 1992.
[Grätzer, 1968] G. Grätzer. Universal Algebra. Van Nostrand, Princeton,
NJ, 1968.
[Gunter, 1992] Carl A. Gunter. Semantics of Programming Languages:
Structures and Techniques. Foundations of Computing. MIT Press, Cam-
bridge, MA, 1992.
[Guttag and Horning, 1978] John V. Guttag and J. J. Horning. The alge-
braic specification of abstract data types. Acta Informatica, 10(1):1-26,
1978.

[Henderson and Morris, 1976] P. Henderson and J. H. Morris. A lazy eval-


uator. In 3rd Annual ACM Symposium on Principles of Programming
Languages, pages 95-103. SIGPLAN and SIGACT, 1976.
[Hoffmann and O'Donnell, 1982] C. M. Hoffmann and M. J. O'Donnell.
Programming with equations. ACM Transactions on Programming Lan-
guages and Systems, 4(1):83-112, 1982.
[Hoffmann et al., 1985] C. M. Hoffmann, M. J. O'Donnell, and R. I.
Strandh. Implementation of an interpreter for abstract equations. Soft-
ware — Practice and Experience, 15(12):1185-1203, 1985.
[Howard, 1980] William Howard. The formulas-as-types notion of con-
struction. In John P. Seldin and J. R. Hindley, editors, To H. B. Curry:
Essays on Combinatory Logic, Lambda-Calculus, and Formalism, pages
479-490. Academic Press, New York, NY, 1980.
[Hudak and Sundaresh, 1988] Paul Hudak and Raman S. Sundaresh. On
the expressiveness of purely functional I/O systems. Technical Report
YALEU/DCS/RR665, Yale University, New Haven, CT, December 1988.
[Hudak, 1992] Paul Hudak, Simon Peyton Jones, and Philip Wadler, editors. Report on the programming language Haskell, a non-strict,
purely functional language, version 1.2. ACM SIGPLAN Notices, 27(5),
May 1992.
[Jaffar and Lassez, 1987] Joxan Jaffar and Jean-Louis Lassez. Constraint
logic programming. In Fourteenth Annual ACM Symposium on Princi-
ples of Programming Languages, pages 111-119, 1987.
[Jayaraman, 1985] Bharat Jayaraman. Equational programming: A uni-
fying approach to functional and logic programming. Technical Report
85-030, The University of North Carolina, 1985.
[Karlsson, 1981] K. Karlsson. Nebula, a functional operating system. Tech-
nical report, Chalmers University, 1981.
[Kleene, 1952] Steven Cole Kleene. Introduction to Metamathematics, vol-
ume 1 of Bibliotheca Mathematica. North-Holland, Amsterdam, 1952.
[Klop, 1991] Jan Willem Klop. Term rewriting systems. In S. Abramsky,
Dov M. Gabbay, and T. S. E. Maibaum, editors, Handbook of Logic in
Computer Science, volume 1, chapter 6. Oxford University Press, Oxford,
1991.
[Kowalski, 1974] R. Kowalski. Predicate logic as a programming language.
In Information Processing 74, pages 569-574. North-Holland, 1974.
[Lassez, 1991] Jean-Louis Lassez. From LP to CLP: Programming with
constraints. In T. Ito and A. R. Meyer, editors, Theoretical Aspects of
Computer Software: International Conference, volume 526 of Lecture
Notes in Computer Science. Springer-Verlag, 1991.
[Mac Lane and Birkhoff, 1967] Saunders Mac Lane and G. Birkhoff. Alge-
bra. Macmillan, New York, NY, 1967.

[Machtey and Young, 1978] Michael Machtey and Paul Young. An Intro-
duction to the General Theory of Algorithms. Theory of Computation.
North-Holland, New York, NY, 1978.
[Maier and Warren, 1988] David Maier and David S. Warren. Comput-
ing with Logic—Logic Programming with Prolog. Benjamin Cummings,
Menlo Park, CA, 1988.
[Markowsky, 1976] G. Markowsky. Chain-complete posets and directed sets
with applications. Algebra Universalis, 6:53-68, 1976.
[McCarthy, 1960] John McCarthy. Recursive functions of symbolic expres-
sions and their computation by machine, part I. Communications of the
ACM, 3(4):184-195, 1960.
[Meseguer and Goguen, 1985] Jose Meseguer and Joseph A. Goguen. Ini-
tiality, induction, and computability. In Maurice Nivat and John
Reynolds, editors, Algebraic Methods in Semantics, pages 459-541. Cam-
bridge University Press, 1985.
[Meseguer, 1989] Jose Meseguer. General logics. In H.-D. Ebbinghaus et.
al., editor, Logic Colloquium '87: Proceedings of the Colloquium held in
Granada, Spain July 20-25, 1987, Amsterdam, 1989. Elsevier North-
Holland.
[Meseguer, 1992] Jose Meseguer. Multiparadigm logic programming. In
H. Kirchner and G. Levi, editors, Proceedings of the 3rd Interna-
tional Conference on Algebraic and Logic Programming, Volterra, Italy,
September 1992, Lecture Notes in Computer Science. Springer-Verlag,
1992.
[Miller et al., 1991] Dale Miller, Gopalan Nadathur, Frank Pfenning, and
Andre Scedrov. Uniform proofs as a foundation for logic programming.
Annals of Pure and Applied Logic, 51:125-157, 1991.
[Moses, 1970] Joel Moses. The function of FUNCTION in LISP, or why
the FUNARG problem should be called the environment problem. ACM
SIGSAM Bulletin, 15, 1970.
[Mostowski et al., 1953] Andrzej Mostowski, Raphael M. Robinson, and
Alfred Tarski. Undecidability and Essential Undecidability in Arithmetic,
chapter II, pages 37-74. Studies in Logic and the Foundations of Mathe-
matics. North-Holland, Amsterdam, 1953. Book author: Alfred Tarski in
collaboration with Andrzej Mostowski and Raphael M. Robinson. Series
editors: L. E. J. Brouwer, E. W. Beth, A. Heyting.
[Muchnick and Pleban, 1980] Steven S. Muchnick and Uwe F. Pleban. A
semantic comparison of Lisp and Scheme. In Proceedings of the 1980
Lisp Conference, pages 56-64, 1980. Stanford University.
[Nadathur and Miller, 1988] Gopalan Nadathur and Dale Miller. An
overview of λProlog. In Proceedings of the 5th International Confer-
ence on Logic Programming, pages 810-827, Cambridge, MA, 1988. MIT
Press.

[Nadathur and Miller, 1990] Gopalan Nadathur and Dale Miller. Higher-
order Horn clauses. Journal of the ACM, 37(4):777-814, October 1990.
[O'Donnell, 1977] Michael James O'Donnell. Computing in Systems De-
scribed by Equations, volume 58 of Lecture Notes in Computer Science.
Springer-Verlag, 1977.
[O'Donnell, 1985] Michael James O'Donnell. Equational Logic as a Pro-
gramming Language. Foundations of Computing. MIT Press, Cambridge,
MA, 1985.
[Perry, 1991] Nigel Perry. The Implementation of Practical Functional Pro-
gramming Languages. PhD thesis, Imperial College of Science, Technol-
ogy and Medicine, University of London, 1991.
[Prawitz, 1965] Dag Prawitz. Natural Deduction—a Proof-Theoretic Study.
Almqvist and Wiksell, Stockholm, 1965.
[Rebelsky, 1992] Samuel A. Rebelsky. I/O trees and interactive lazy func-
tional programming. In Maurice Bruynooghe and Martin Wirsing, ed-
itors, Proceedings of the Fourth International Symposium on Program-
ming Language Implementation and Logic Programming, volume 631 of
Lecture Notes in Computer Science, pages 458-472. Springer-Verlag, Au-
gust 1992.
[Rebelsky, 1993] Samuel A. Rebelsky. Tours, a System for Lazy Term-
Based Communication. PhD thesis, The University of Chicago, June
1993.
[Rees and Clinger, 1986] Jonathan Rees and William Clinger, editors. The Revised³ report on the algorithmic language
Scheme. ACM SIGPLAN Notices, 21(12):37-79, 1986.
[Reiter, 1978] Raymond Reiter. On closed world databases. In Herve Gal-
laire and Jack Minker, editors, Logic and Databases, pages 149-178.
Plenum Press, 1978. also appeared as [Reiter, 1981].
[Reiter, 1981] Raymond Reiter. On closed world databases. In Bon-
nie Lynn Webber and Nils J. Nilsson, editors, Readings in Artificial
Intelligence, pages 119-140. Tioga, Palo Alto, CA, 1981.
[Reiter, 1984] Raymond Reiter. Towards a logical reconstruction of rela-
tional database theory. In Michael L. Brodie, John Mylopoulos, and
Joachim W. Schmidt, editors, On Conceptual Modelling—Perspectives
from Artificial Intelligence, Databases, and Programming Languages,
Topics in Information Systems, pages 191-233. Springer-Verlag, 1984.
[Reynolds, 1973] John C. Reynolds. On the interpretation of Scott's do-
mains. In Proceedings of Convegno d'Informatica Teorica, Rome, Italy,
February 1973. Instituto Nazionale di Alta Matematica (Citta Universi-
taria).
[Schmidt, 1986] David A. Schmidt. Denotational Semantics: A Methodol-
ogy for Language Development. Allyn and Bacon, 1986.

[Schutte, 1977] Kurt Schutte. Proof Theory. Springer-Verlag, New York,


NY, 1977.
[Scott and Strachey, 1971] Dana Scott and Christopher Strachey. Toward
a mathematical semantics for computer languages. In Proceedings of
the Symposium on Computers and Automata, pages 19-46, Polytechnic
Institute of Brooklyn, 1971.
[Scott, 1970] Dana Scott. Outline of a Mathematical Theory of Compu-
tation, volume PRG-2 of Oxford Monographs. Oxford University Press,
Oxford, UK, 1970.
[Scott, 1976] Dana Scott. Data types as lattices. SIAM Journal on Com-
puting, 5(3), 1976.
[Scott, 1982] Dana Scott. Domains for denotational semantics. In
M. Nielsen and E. M. Schmidt, editors, Automata, Languages and
Programming—Ninth Colloquium, volume 140 of Lecture Notes in Com-
puter Science, pages 577-613. Springer-Verlag, Berlin, 1982.
[Stark, 1990] W. Richard Stark. LISP, Lore, and Logic—An Algebraic
View of LISP Programming, Foundations, and Applications. Springer-
Verlag, New York, NY, 1990.
[Stenlund, 1972] Sören Stenlund. Combinators, λ-Terms, and Proof The-
ory. D. Reidel, Dordrecht, Netherlands, 1972.
[Stoy, 1977] Joseph E. Stoy. Denotational Semantics: The Scott-Strachey
Approach to Programming Language Theory. MIT Press, Cambridge,
MA, 1977.
[Tait, 1967] William W. Tait. Intensional interpretation of functionals of
finite type. Journal of Symbolic Logic, 32(2):187-199, 1967.
[Takeuti, 1975] Gaisi Takeuti. Proof Theory. North-Holland, Amsterdam,
1975.
[Thompson, 1990] Simon Thompson. Interactive functional programs, a
method and a formal semantics. In David A. Turner, editor,
Research Topics in Functional Programming. Addison-Wesley, 1990.
[van Emden and Kowalski, 1976] M. H. van Emden and R. A. Kowalski.
The semantics of predicate logic as a programming language. Journal of
the ACM, 23(4):733-742, 1976.
[Wand, 1976] Mitchell Wand. First order identities as a defining language.
Acta Informatica, 14:336-357, 1976.
[Webster, 1987] Webster's Ninth New Collegiate Dictionary. Merriam-
Webster Inc., Springfield, MA, 1987.
[Williams and Wimmers, 1988] John H. Williams and Edward L. Wim-
mers. Sacrificing simplicity for convenience: Where do you draw the
line? In Proceedings of the Fifteenth Annual ACM Symposium on Prin-
ciples of Programming Languages, pages 169-179. ACM, 1988.

[Winskel, 1993] Glynn Winskel. The Formal Semantics of Programming


Languages—An Introduction. Foundations of Computing. MIT Press,
Cambridge, MA, 1993.
[Zhang, 1991] Guo-Qiang Zhang. Logic of Domains. Progress in Theoret-
ical Computer Science. Birkhauser, Boston, MA, 1991.
Equational Logic Programming
Michael J. O'Donnell

Contents
1 Introduction to equational logic programming 69
1.1 Survey of prerequisites 69
1.2 Motivation for programming with equations 71
1.3 Outline of the chapter 74
2 Proof systems for equational logic 75
2.1 Inferential proofs 75
2.2 Term rewriting proofs 78
2.3 The confluence property and the completeness of term
rewriting 81
3 Term rewriting proof strategies 96
3.1 Complete and outermost complete rewriting sequences 97
3.2 Sequentiality analysis and optimal rewriting 100
4 Algorithms and data structures to implement
equational languages 111
4.1 Data structures to represent terms 111
4.2 Pattern-matching and sequencing methods 120
4.3 Driving procedures for term rewriting 129
5 Compiling efficient code from equations 137
6 Parallel implementation 139
7 Extensions to equational logic programming 141
7.1 Incremental infinite input and output 141
7.2 Solving equations 147
7.3 Indeterminate evaluation in subset logic 149
7.4 Relational rewriting 151

1 Introduction to equational logic programming


1.1 Survey of prerequisites
Sections 2.3.4 and 2.3.5 of the chapter 'Introduction: Logic and Logic Pro-
gramming Languages' are crucial prerequisites to this chapter. I summarize

their relevance below, but do not repeat their content.


Logic programming languages in general are those that compute by
deriving semantic consequences of given formulae in order to answer ques-
tions. In equational logic programming languages, the formulae are all
equations expressing postulated properties of certain functions, and the
questions ask for equivalent normal forms for given terms. Section 2.3.4 of
the 'Introduction . . .' chapter gives definitions of the models of equational
logic, the semantic consequence relation

T |=_= (t1 = t2)

(t1 = t2 is a semantic consequence of the set T of equations, see Defini-


tion 2.3.14), and the question answering relation

(norm t1, . . . , ti : t) ?-_= (t = s)

(t = s asserts the equality of t to the normal form s, which contains no


instances of t1, . . . , ti, see Definition 2.3.16). Since this chapter is entirely
about Equational Logic, we drop the subscripts and write |= for |=_= and
?- for ?-_=. The composed relation

(norm t1, . . . , ti : t) ?- T |= (t = s)
(t = s is a semantically correct answer to the question (norm t1, . . . , ti : t)
for knowledge T, see Definition 2.2.2) means that s is a normal form—a
term containing no instances of t1, . . . , ti—whose equality to t is a seman-
tic consequence of the equations in T. Equational logic programming lan-
guages in use today all take sets T of equations, prohibited forms t1, . . . , ti,
and terms t to normalize, and they compute normal forms s satisfying the
relation above.
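As a small illustration (the equation set here is chosen just for this example), let T = {car(cons(x, y)) = x}. Then

    (norm car(cons(x, y)) : car(cons(a, b))) ?- T |= (car(cons(a, b)) = a)

holds: the answer term a contains no instance of the prohibited form car(cons(x, y)), and its equality to the question term car(cons(a, b)) is a semantic consequence of T.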
Section 2.3.5 of the 'Introduction . . .' chapter explains how different
equational languages variously determine T, t1, . . . ,ti, and t from the
language design, the program being executed, and the input. An alter-
nate style of equational logic programming, using questions of the form
(solve x1, . . . , xi :t1 = t2) that ask for substitutions for x1, . . . , xi solving
the equation (t1 = t2), is very attractive for its expressive power, but much
harder to implement efficiently (see Section 7.2).
There is a lot of terminological confusion about equational logic pro-
gramming. First, many in the Prolog community use 'logic' to mean the
first-order predicate calculus (FOPC), while I stick closer to the dictionary
meaning of logic, in which FOPC is one of an infinity of possible logical
systems. Those who identify logic with FOPC often use the phrase 'equa-
tional logic programming' to mean some sort of extension of Prolog using
equations, such as logic programming in FOPC with equality. In this chap-
ter, 'equational logic programming' means the logic programming of pure

equational logic, as described in the chapter 'Introduction: Logic and Logic


Programming Languages.'
A second source of confusion is that many equational logic programming
languages have been invented under different labels. Lisp [McCarthy, 1960],
APL [Iverson, 1962], Red languages [Backus, 1974], functional program-
ming languages [Backus, 1978; Hudak, 1992], many dataflow languages
[Ashcroft and Wadge, 1985; Pingali and Arvind, 1985; Pingali and Arvind,
1986], and languages for algebraic specification of abstract datatypes [Fu-
tatsugi et al, 1985; Guttag and Horning, 1978; Wand, 1976] are all forms
of equational logic programming languages, although they are seldom re-
ferred to as such. This chapter focuses on a generic notion of equational
logic programming, rather than surveying particular languages.
1.2 Motivation for programming with equations
From a programmer's point of view, an equational logic programming lan-
guage is the same thing as a functional programming language[Backus,
1978]. The advantages of functional programming languages are discussed
in [Hudak, 1989; Bird and Wadler, 1988; Field and Harrison, 1988]—
equational logic programming languages offer essentially the same advan-
tages to the programmer. Functional programming and equational logic
programming are different views of programming, which provide different
ways of designing and describing a language, but they yield essentially the
same class of possible languages. The different styles of design and descrip-
tion, while they allow the same range of possibilities, influence the sense
of naturalness of different languages, and therefore the relative importance
of certain features to the designer and implementer. The most important
impact of the equational logic programming view on language design is
the strong motivation that it gives to implement lazy, or demand-driven,
computation.
In the conventional view of functional programming, computation is
the evaluation of an input term in a unique model associated with the pro-
gramming language. This view makes it very natural to evaluate a term of
the form f(s1, . . . , sn) by first evaluating all of the arguments si, and then
applying the function denoted by f to the values of the arguments. If the
attempt to evaluate one of the arguments leads to infinite computation,
then the value of that argument in the model is said to be an object called
'undefined' (the word is used here as a noun, although the dictionary rec-
ognizes it only as an adjective), and typically denoted by the symbol ⊥.
But, since the value ⊥ is indicated by the behavior of infinite computation,
there is no chance to actually apply the function denoted by f to it, so that
every function is forced to map ⊥ to ⊥. Such functions are called strict
functions.
Early functional programming languages required all primitive func-
tions to be strict, except for the conditional function cond. The normal

way to evaluate a term of the form cond(s,t, u) is to evaluate s, then use


its value to determine which of t or u to evaluate, omitting the other. The
function denoted by cond is thus not strict, since for example the value
of cond(true, 0, ⊥) is 0 rather than ⊥. Only Backus seems to have been
annoyed by the inconsistency between the nonstrictness of the conditional
function and the strictness of all other primitives. He proposed a strict
conditional, recovering the selective behavior of the nonstrict conditional
through a higher-order coding trick [Backus, 1978]. In effect, he took ad-
vantage of the nearly universal unconscious acceptance of a nonstrict in-
terpretation of function application, even when the function to be applied
is strict.
In the equational logic programming view, computation is the deriva-
tion of an equivalent normal form for an input term using the information
given by a set of equations describing the symbols of the programming
language. The equivalence of input to output holds in all of the infinitely
many models of those equations. This view makes it very natural to ap-
ply equations involving f to derive an equivalent form for f(s1, . . . , sn) at
any time, possibly before all possible derivation has been performed on the
arguments si. The natural desire for completeness of an implementation re-
quires that infinite computation be avoided whenever possible. Notice that
the equational logic programming view does not assign a value ⊥ denoting
'undefined' (the noun) to a term with infinite computational behavior. In
fact, in each individual model all functions are total. Rather, we might ob-
serve that a term is undefined (the word is now an adjective, as approved by
the dictionary) if there is no equivalent term suitable for output, although
each model of the given equations assigns it some value. So, equational
logic programming leads naturally to computational behavior that is not
strict—in fact, a logically complete implementation of equational logic pro-
gramming must make functions as unstrict as possible. The preference for
nonstrictness comes from regarding undefinedness as our inability to dis-
cover the value of a function, rather than the inherent lack of a semantic
value.
The contrast between strict and nonstrict treatments of functions is
best understood by comparing the conventional implementation of cond,
true and false to that of cons, car and cdr in Lisp.
Example 1.2.1. The following equations define the relationship between
cond, true, and false:
Tcond = {(cond(true, x, y) = x), (cond(false, x, y) = y)}
Similarly the following equations define the relationship between cons, car,
and cdr:

Tcons = {(car(cons(x,y)) = x), (cdr(cons(x,y)) = y)}



These equations were given, without explicit restriction, in the earliest


published definition of Lisp [McCarthy, 1960].
Notice the formal similarity between Tcond and Tcons in Example 1.2.1
above. In both cases, two equations provide a way to select one of the two
subterms denoted by the variables x and y. In Tcond, the selection is de-
termined by the first argument to cond; in Tcons it is determined by the
function symbol applied to the term headed by cons. Yet in all early Lisp
implementations cons is evaluated strictly, while cond is not. The equation
(car(cons(0, s)) = 0) is a logical consequence of Tcons, even when s leads
to infinite computation, so a complete implementation of equational logic
programming must not treat cons strictly.
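The point can be made executable. The following small Haskell sketch is an illustration only (the names myCons, myCar and loop are assumptions of the sketch, not part of any equational program in this chapter); it shows car(cons(0, s)) evaluating to 0 even when s has no terminating evaluation, because Haskell's constructors are non-strict:

    -- A pairing function and its selector, mirroring cons and car.
    myCons :: a -> b -> (a, b)
    myCons x y = (x, y)

    myCar :: (a, b) -> a
    myCar (x, _) = x

    -- A term whose evaluation never terminates.
    loop :: Integer
    loop = loop

    -- Non-strict evaluation prints 0 instead of diverging.
    main :: IO ()
    main = print (myCar (myCons 0 loop))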
In the Lisp and functional programming communities, nonstrict evalu-
ation of functions other than the conditional is called lazy evaluation. The
power of lazy evaluation as a programming tool is discussed in [Friedman
and Wise, 1976; Henderson and Morris, 1976; Henderson, 1980; Hudak,
1989; Bird and Wadler, 1988; Field and Harrison, 1988; O'Donnell, 1985].
Lazy evaluation is demand-driven—computation is performed only as re-
quired to satisfy demands for output. So, the programmer may define large,
and even infinite, data structures as intermediate values, and depend on
the language implementation to compute only the relevant parts of those
structures. In particular, lazily computed lists behave as streams [Karlsson,
1981; Hudak and Sundaresh, 1988], allowing a straightforward encoding of
pipelined coroutines in a functional style.
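As a further hedged Haskell illustration of the stream style (again, the names are assumptions of this sketch), an infinite list can be defined by a single equation and consumed piecemeal, with only the demanded prefix ever being computed:

    -- An infinite list of natural numbers, defined equationally.
    nats :: [Integer]
    nats = 0 : map (+ 1) nats

    -- Only the first five elements are ever computed here.
    main :: IO ()
    main = print (take 5 nats)   -- prints [0,1,2,3,4]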
Many modern implementations of functional programming languages
offer some degree of lazy evaluation, and a few are now uniformly lazy. But,
in the functional programming view, lazy evaluation is an optional added
feature to make programming languages more powerful. The basic denota-
tional semantic approach to functional programming makes strictness very
natural to describe, while denotational semantics for lazy evaluation seems
to require rather sophisticated use of domain theory to construct models
with special values representing all of the relevant nonterminating and par-
tially terminating behaviors of terms [Winskel, 1993]. In the equational
logic programming view, lazy evaluation is required for logical complete-
ness, and strict evaluation is an arbitrary restriction on derivations that
prevents certain answers from being found.
The functional and equational views also diverge in their treatments
of certain terms that are viewed as pathological. From the functional
programming view, pathological terms seem to require specialized logical
techniques treating errors as values, and even new types of models called
error algebras [Goguen, 1977]. For example, in a language with stacks,
the term pop(empty) is generally given a value which is a token denoting
the erroneous attempt to pop an empty stack. Given a set of equations,
equational logic programming provides a conceptual framework, based on

well-understood traditional concepts from mathematical logic, for prescrib-


ing completely the computational behavior of terms. The judgement that
a particular term is pathological is left to the consumer of that answer,
which might be a human reader or another program. For example, the
term pop(empty) need not be evaluated to an error token: it may be out-
put as a normal form, and easily recognized as a pathological case by the
consumer. Or, an explicit equation pop(empty) = e may be added to the
program, where e gives as much or as little detailed information about the
particular error as desired.
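A hedged Haskell rendering of the stack example (the data types and names below are assumptions of this sketch) shows both options: pop applied to an empty stack can be returned as a recognizable symbolic form for the consumer to inspect, and an explicit error equation could replace that clause if more detail is wanted.

    data Stack = Empty | Push Integer Stack
      deriving Show

    -- The result of popping: either a stack or a symbolic pathological form.
    data PopResult = PopEmpty | Popped Stack
      deriving Show

    -- pop(push(x, s)) = s; pop(empty) is left as a recognizable normal form.
    pop :: Stack -> PopResult
    pop (Push _ s) = Popped s
    pop Empty      = PopEmpty   -- an explicit equation for the error case

    main :: IO ()
    main = do
      print (pop (Push 1 Empty))   -- Popped Empty
      print (pop Empty)            -- PopEmpty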
So, for the programmer there is nothing to choose between lazy func-
tional programming and equational logic programming—these are two styles
for describing the same programming languages, rather than two different
classes of programming languages. To the language designer or implemen-
tor, the functional programming view provides a connection to a large body
of previous work, and offers some sophisticated tools for the thorough de-
scription of the processing of erroneous programs and the use of varying
degrees of strictness or laziness. The equational logic programming view
offers a deeper explanation of the logical content of computations, a way
of defining correctness of the computation of answers independently of the
classification of programs as correct or erroneous, and a strong motivation
for uniformly lazy evaluation. It also connects equational/functional pro-
gramming to other sorts of logic programming in a coherent way, which
may prove useful to future designs that integrate equational/functional
programming with other styles.

1.3 Outline of the chapter


The next part of this chapter is primarily concerned with problems in
the implementation of equational logic programming and some interest-
ing variants of it. Those problems arise at four very different levels of
abstraction—logic, strategy, algorithm, and code. At the level of pure
logic, Section 2 discusses two different formal systems of proof for equa-
tional logic—inferential proof and term rewriting proof—and argues that
the latter is logically weaker in general, but more likely to provide efficient
computation for typical equational programs. The confluence property of
sets of equations is introduced, and shown to be a useful way of guarantee-
ing that term rewriting proof can succeed. Next, Section 3 treats high-level
strategic questions in the efficient search for a term rewriting proof to an-
swer a given question. The crucial problem is to choose the next rewriting
step out of, a number of possibilities, so as to guarantee that all correct an-
swers are found, and to avoid unnecessary steps. Then, Section 4 discusses
the design of efficient algorithms and data structures for finding and choos-
ing rewriting steps, and for representing the results of rewriting. Section 5
contains a brief description of the conventional machine code that a com-
piler can generate based on these algorithms and data structures. Section 6

discusses briefly some of the problems involved in parallel implementation


of equational logic programming. Finally, Section 7 treats several possible
extensions to the functionality of equational logic programming and the
problems that arise in their semantics and implementation.

2 Proof systems for equational logic


The basic idea in implementations of equational logic programming is to
search for a proof that provides a correct answer to a given question. The
basic idea behind proofs in equational logic is that the equation t1 = t2
allows t1 and t2 to be used interchangeably in other formulae. As in Def-
inition 3.1.1 of the chapter 'Introduction: Logic and Logic Programming
Languages,' T I D - F means that D is a correct proof of the formula F
from hypotheses in T. T I- F means that there exists a proof of F from
hypotheses in T. In this chapter, subscripts on the generic symbols I -
and |- are omitted whenever the particular proof system is clear from the
context.
In Sections 2.1 and 2.2, we consider two different styles of equational
proof. Inferential proofs derive equations step by step from other equa-
tions. Term rewriting proofs use equations to transform a given term into
a provably equivalent term by substituting equals for equals.

2.1 Inferential proofs


In order to explore a variety of approaches to proving equations, we first
define generic concepts of rules of inference and proofs using rules, and
then consider the power of various sets of rules.
Definition 2.1.1. Let the set V of variables, the sets Funi of i-ary func-
tion symbols, and the set Tp of terms, be the same as in Definition 2.3.1
of the 'Introduction . . .' chapter Section 2.3, and let the set of equational
formulae, or simply equations, be F= = {t1 = t2 : t1, t2 ∈ Tp}, as in Defi-
nition 2.3.13 in Section 2.3.4 of the chapter 'Introduction: Logic and Logic
Programming Languages.'
An equational rule of inference is a binary relation R ⊆ 2^(F=) × F=.
When T R F, we say that F follows from T by rule R. Members of T are


called hypotheses to the application of the rule, and F is the conclusion.
When ∅ R F, we call F a postulate. (It is popular now to call a postulated
formula F an axiom, although the dictionary says that an axiom must be
self-evident, not just postulated.) Rules of inference are usually presented
in the form

    H1   . . .   Hm
    ---------------
          C

where H1, . . . , Hm are schematic descriptions of the hypotheses, and C is


a schematic description of the conclusion of an arbitrary application of the
rule. Notice that the union of rules of inference is itself a rule.
The set of inferential equational proofs is P= = (F=)+, the set of nonempty
finite sequences of equations. Given a rule of inference R, the proof relation
is defined by:
T I <F0, . . . , Fm> -R F if and only if Fm = F and,
for all i ≤ m, one of the following cases holds:
1. Fi ∈ T
2. There exist j1, . . . , jn < i such that {Fj1, . . . , Fjn} R Fi

So, a proof of F from hypotheses in T using rule R is a sequence of equa-


tions, each one of which is either a hypothesis, or it follows from previous
equations by the rule R. The following are popular rules of inference for
proofs in equational logic.
Definition 2.1.2. The following are the standard rules of inference for
equational logic (t, t1, t2, t3, r, s and s1, . . . , sk range over terms, x over
variables, and f over k-ary function symbols):

• (Reflexive) From no hypotheses, infer t = t.
• (Symmetric) From t1 = t2, infer t2 = t1.
• (Transitive) From t1 = t2 and t2 = t3, infer t1 = t3.
• (Instantiation) From t1 = t2, infer t1[s/x] = t2[s/x], for any variable x
  and term s.
• (Substitution) From t1 = t2, infer r[t1/x] = r[t2/x], for any term r with
  exactly one occurrence of the variable x.
• (Congruence) From t1 = s1, . . . , tk = sk, infer f(t1, . . . , tk) = f(s1, . . . , sk).
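As a small added illustration of these rules, let T = {a = b, c = b}. The
following sequence of equations is an inferential proof of a = c from T:

    1. a = b      (hypothesis in T)
    2. c = b      (hypothesis in T)
    3. b = c      (Symmetric, from 2)
    4. a = c      (Transitive, from 1 and 3)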

Now, when R is the union of any of the rules presented above, (F=, P=,
I -R) is a compact proof system (Definition 2.1.3, Section 2.1).
The rules above are somewhat redundant. Every proof system using a
subset of these rules is sound, and those using the Reflexive, Symmetric,
Transitive and Instantiation rules, and at least one of Substitution and
Congruence, are also complete.
Proposition 2.1.3. Let R be the union of any of the rules in Defini-
tion 2.1.2. Then (F=, P=, I -R) is a sound proof system for the standard
semantic system of Definition 2.3.14, Section 2.3.4 of the chapter 'Introduc-
tion: Logic and Logic Programming Languages.' That is, T |-R (t1 = t2)
implies T |= (t1 = t2).
The proof of soundness is an elementary induction on the number of
steps in a formal equational proof, using the fact that each of the rules of
inference proposed above preserves truth.
Proposition 2.1.4. Let R be the union of the Reflexive, Symmetric,
Transitive, and Instantiation rules, and at least one of the Substitution
and Congruence rules. Then (F=, P=, I -R) is a complete proof system.
That is, T |= (t1 = t2) implies T |-R (t1 = t2).
To prove completeness, we construct for each set T of equations a
term model MT such that Theory({MT}) contains exactly the semantic
consequences of T. For each term t ∈ Tp, let

    |t|T = {s ∈ Tp : T |-R (s = t)}

Because R includes the Reflexive, Symmetric, and Transitive rules, prov-
able equality is an equivalence relation on terms, and |t|T is the equivalence
class containing t. Now, construct the model MT whose universe is

    {|t|T : t ∈ Tp}

and whose function assignment τT is defined by

    τT(f)(|t1|T, . . . , |tm|T) = |f(t1, . . . , tm)|T

Either of the rules Substitution and Congruence is sufficient to guaran-
tee that τT is well defined. Finally, the Instantiation rule guarantees that
T |- (s = t) if and only if s and t take the same value in MT under every
variable assignment, which by Definition 2.3.14 is equivalent to T |= (s = t).

Notice that each inference by the Congruence rule is derivable by k


applications of the Substitution rule, combined by the Transitive rule. In
effect, Congruence is just a special form of multiple simultaneous substi-
tution. Similarly, each inference by the Substitution rule is derivable by
repeated applications of the Congruence rule and additional instances of
the Reflexive rule (this can be proved easily by induction on the structure
of the term r on which substitution is performed in the Substitution rule).
In the rest of this chapter, the symbols I -inf and |-inf refer to a sound
and complete system of inferential equational proof, when the precise rules
of inference are not important.

2.2 Term rewriting proofs


The most commonly used methods for answering normal form questions
(norm t1, . . . , ti : t) all involve replacing subterms by equal subterms, using
the Substitution rule, to transform the term t into an equivalent normal
form. Substitution of subterms according to given rules is called term
rewriting, and is an interesting topic even when the rewriting rules are
not given by equations (see the chapter 'Equational Reasoning and Term
Rewriting Systems' in Volume 1). In this chapter, we are concerned only
with the use of term rewriting to generate equational proofs—this technique
is also called demodulation [Loveland, 1978] in the automated deduction
literature.
Definition 2.2.1. Let T = {l1 = r1, . . . , ln = rn} be a set of equations.
Recall that an instance of a formula or term is the result of substituting
terms for variables (Definition 2.3.5 in Section 2.3.1 of the chapter 'Intro-
duction: Logic and Logic Programming Languages').
A term s1 rewrites to s2 by T (written s1 —> s2) if and only if there is
a term t, a variable x with exactly one occurrence in t, and an instance
l'i = r'i of an equation li = ri in T, such that s1 = t[l'i/x] and s2 = t[r'i/x].
That is, s2 results from finding exactly one instance of a left-hand side of
an equation in T occurring as a subterm of s1, and replacing it with the
corresponding right-hand side instance.
A term rewriting sequence for T is a nonempty finite or infinite sequence
(u0, u1, . . .) such that, for each i, ui -> ui+1.
Term rewriting sequences formalize the natural intuitive process of re-
placing equals by equals to transform a term. A term rewriting sequence
may be viewed as a somewhat terse proof.
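The definitions above translate directly into a small executable sketch. The Haskell fragment below is an illustration only (the data types and function names are assumptions of the sketch, and it assumes left-linear rules): it represents terms, matches a left-hand side against a term, and performs one rewriting step at the outermost, leftmost redex.

    import qualified Data.Map as M

    -- First-order terms: variables and function applications.
    data Term = Var String | App String [Term]
      deriving (Eq, Show)

    type Equation = (Term, Term)       -- (left-hand side, right-hand side)
    type Subst    = M.Map String Term

    -- Match a left-hand side against a term, producing a substitution for
    -- the pattern's variables (left-linear patterns assumed).
    match :: Term -> Term -> Maybe Subst
    match (Var v)    t = Just (M.singleton v t)
    match (App f ps) (App g ts)
      | f == g && length ps == length ts =
          fmap M.unions (mapM (uncurry match) (zip ps ts))
    match _ _ = Nothing

    -- Apply a substitution to a term (used on right-hand sides).
    subst :: Subst -> Term -> Term
    subst s (Var v)    = M.findWithDefault (Var v) v s
    subst s (App f ts) = App f (map (subst s) ts)

    -- One term rewriting step: rewrite the outermost, leftmost redex, if any.
    rewrite1 :: [Equation] -> Term -> Maybe Term
    rewrite1 eqs t =
      case [ subst s r | (l, r) <- eqs, Just s <- [match l t] ] of
        (t' : _) -> Just t'
        []       -> case t of
                      Var _    -> Nothing
                      App f ts -> App f <$> rewriteArgs ts
      where
        rewriteArgs []       = Nothing
        rewriteArgs (u : us) = case rewrite1 eqs u of
                                 Just u' -> Just (u' : us)
                                 Nothing -> (u :) <$> rewriteArgs us

    -- Example rules: car(cons(x, y)) = x and cdr(cons(x, y)) = y.
    carCdr :: [Equation]
    carCdr =
      [ (App "car" [App "cons" [Var "x", Var "y"]], Var "x")
      , (App "cdr" [App "cons" [Var "x", Var "y"]], Var "y") ]

    main :: IO ()
    main = print (rewrite1 carCdr
                    (App "car" [App "cons" [App "a" [], App "b" []]]))
    -- prints Just (App "a" [])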
Definition 2.2.2. Let Tp+ be the set of nonempty finite sequences of terms
in Tp. The proof relation I -tr is defined by T I <u0, . . . , um> -tr (s = t) if
and only if u0 = s, um = t, and for each i < m, ui -> ui+1.
Then (F=, Tp+, I -tr) is a compact proof system, representing the term

rewriting style of equational proof.


A term rewriting proof for T represents an inferential proof from hy-
potheses in T in a natural way.
Proposition 2.2.3. If T |-tr (s = t), then T |-inf (s = t).
Let <u0, . . . , un> be the term rewriting sequence such that

    T I <u0, . . . , un> -tr (s = t)

In particular, u0 = s and un = t. The proof of the proposition is an ele-


mentary induction on n.
BASIS: For n = 0, s = u0 = un = t, so T I <u0 = u0> -inf (s = t), by
the Reflexive rule.
INDUCTION: For n > 0, since a nonempty prefix of a term rewrit-
ing proof is also a term rewriting proof, we have T I <u0, . . . , un-1> -tr
(s = un-1). By the induction hypothesis, there is a D such that T I D -inf
(s = un-1). It is easy to extend D to D' so that T I D' -inf (s = t), by
adding the following steps:

• the appropriate equation from T;


• a sequence of applications of the Instantiation rule to produce the
appropriate instance of the equation above;
• one application of the Substitution rule to produce un-1 = t;
• one application of the Transitive rule to produce s = t.

Since inferential proof is sound, it follows that term rewriting proof is


also sound.
Example 2.2.4. Let T = {f(a,f(x,y)) = f(y,x), g(x) = x}.

is a term rewriting proof of

from T. The corresponding inferential proof from the induction in Propo-


sition 2.2.3 is given below. Line numbers are added on the left, and rules
cited on the right, for clarity: formally the proof is just the sequence of
equations. The key occurrences of the terms in the term rewriting sequence
above are boxed to show the correspondence.

Steps 5, 10, and 12 above are redundant (they reproduce the results already
obtained in steps 4, 9, 2), but the systematic procedure in the induction of
Proposition 2.2.3 includes them for uniformity.
So, a term rewriting proof is a convenient and natural shorthand for an
inferential proof.
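For instance, one small added illustration over the same T is the sequence

    f(a, f(g(b), c))  ->  f(a, f(b, c))  ->  f(c, b)

The first step applies g(x) = x to the subterm g(b); the second applies
f(a, f(x, y)) = f(y, x) at the root, with x instantiated to b and y to c. The
sequence is therefore a term rewriting proof of f(a, f(g(b), c)) = f(c, b), and
the construction in Proposition 2.2.3 expands it step by step into an
inferential proof.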
Not every inferential proof corresponds to a term rewriting proof. First,
the proofs corresponding to term rewriting sequences do not use the Sym-
metric rule. This represents a serious incompleteness in term rewriting
proof. Section 2.3 shows how restrictions on equational hypotheses can
avoid the need for the Symmetric rule, and render term rewriting complete
for answering certain normal form questions.
Example 2.2.5. Let T = {a = b, c = b, c = d}. T |= (a = d), and
T |-inf (a = d), by one application of the Symmetric rule and two appli-
cations of the Transitive rule. But, there is no term rewriting sequence
from a to d, nor from d to a, nor from a and d to a common form equal to
both.
Second, term rewriting proofs limit the order in which the Instantiation,
Substitution, and Transitive rules are applied. This second limitation does
not affect the deductive power of the proof system.
Proposition 2.2.6. Let T = {l1 = r1, . . . , ln = rn} be a set of equations.
Let TR = {r1 = l1, . . . , rn = ln}. TR is the same as T except that the left
and right sides of equations are interchanged—equivalently, TR contains

the results of applying the Symmetric rule to the equations in T.


For all equations (s = t), if T |-inf (s = t) (equivalently, if T |= (s = t))
then T ∪ TR |-tr (s = t).
The proof of the proposition, given in more detail in [O'Donnell, 1977],
works by permuting the steps in an arbitrary inferential proof of s = t into
the form:
1. hypotheses;
2. applications of the Symmetric rule;
3. applications of the Instantiation rule;
4. applications of the Substitution rule;
5. applications of the Transitive rule.
The reflexive rule is only needed in the degenerate case when s = t (s
and t are the same term). In this form, it is easy to represent each of
the applications of the Transitive rule as concatenating two term rewriting
sequences. The crucial quality of the permuted form of the proof is that
all uses of the Instantiation rule come before any use of the Transitive and
Substitution rules.
The implementor of a logic programming system often faces a trade-
off between the cost of an individual proof, and the cost of the search for
that proof. The discipline of term rewriting can be very advantageous in
reducing the number of possible steps to consider in the search for a proof
to answer a question, but it increases the lengths of proofs in some cases.
Section 4.3.3 shows how clever uses of Instantiation sometimes reduce the
length of a proof substantially compared to term rewriting proofs. Effi-
cient implementations of programming languages have not yet succeeded
in controlling the costs of search for a proof with the more sophisticated
approaches to Instantiation, so term rewriting is the basis for almost all
implementations.

2.3 The confluence property and the completeness of


term rewriting
Term rewriting is often much more efficient than an undisciplined search
for an equational proof. But, for general sets T of equational hypotheses,
term rewriting is not complete, due to its failure to apply the Symmet-
ric rule. It is tempting, then, to use each equation in both directions,
and take advantage of the completeness result of Proposition 2.2.6. Un-
fortunately, known techniques for efficient term rewriting typically fail or
become inefficient when presented with the reversed forms of equations.
So, we find special restrictions on equations that imply the completeness of
term rewriting for the answering of particular normal form questions. The
confluence property, also called the Church-Rosser property, provides the
key to such restrictions.

Definition 2.3.1. Let -> be a binary relation, and ->* be its reflexive-
transitive closure. -> is confluent if and only if, for all s, t1, t2 in its
domain such that s ->* t1 and s ->* t2, there exists a u such that t1 ->* u
and t2 ->* u (see Figure 1 B).

A. Local confluence B. Confluence C. One-step confluence


The circle around u indicates that it is existentially quantified, the uncircled
s, t1, t2 are universally quantified.
Fig. 1. Confluence and related properties.

Two similar properties that are very important in the literature are
local confluence, which is weaker than confluence, and one-step confluence,
which is stronger than confluence.
Definition 2.3.2 ([Newman, 1942]). Let -> be a binary relation, and
->* be its reflexive-transitive closure. -> is locally confluent if and only if,
for all s, t1, t2 in its domain such that s -> t1 and s -> t2, there exists a u
such that t1 ->* u and t2 ->* u (see Figure 1 A).
While confluence guarantees that divergent term rewriting sequences
may always be rewritten further to a common form, local confluence guar-
antees this only for single step term rewriting sequences.
Definition 2.3.3 ([Newman, 1942]). Let -> be a binary relation. -> is
one-step confluent if and only if, for all s, t1, t2 in its domain such that s -> t1
and s -> t2, there exists a u such that t1 -> u and t2 -> u (see Figure 1 C).
While confluence guarantees that divergent term rewriting sequences
may always be rewritten further to a common form, one-step confluence
guarantees that for single step divergences, there is a single-step conver-
gence.
Proposition 2.3.4. One-step confluence implies confluence implies local
confluence.
The first implication is a straightforward induction on the number of
steps in the diverging rewrite sequences. The second is trivial.
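A small added illustration separates the first two properties. Take four
objects a, b, c, d with a -> b, b -> a, a -> c and b -> d. This relation is
locally confluent: the divergence from a to b and to c reconverges, since
b -> a -> c, and symmetrically for the divergence from b. It is not confluent,
however: a ->* c and a ->* d, but c and d are distinct normal forms with no
common reduct. (It is not one-step confluent either, since c and d admit no
further steps at all.)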

2.3.1 Consequences of confluence


When the term rewriting relation -> for a set T of equations has the con-
fluence property, term rewriting is sufficient for deriving all logical conse-
quences of T, in the sense that T |= (s = t) implies that s and t rewrite to
some common form u.
Proposition 2.3.5 ([Curry and Feys, 1958]). Let T be a set of equa-
tions, and let -> be the term rewriting relation for T (Definition 2.2.1). If
-> is confluent, then for all terms s and t such that T |= (s = t), there is
a term u such that T |-tr (s = u) and T |-tr (t = u).
The proof of the proposition is an elementary induction on the length
of an inferential proof D such that T I D -inf (s = t).
So, confluent term rewriting is nearly complete, in the sense that every
logical consequence s = t of a set of equations T may be derived by choosing
an appropriate term u, and finding two term rewriting proofs and a trivial
inferential proof as follows:
1. T |-tr (s = u)
2. T |-tr (t = u)
3. {s = u, t = u} |-inf (s = t) trivially, by one application of Symmetry
and one application of Transitivity.
The near-completeness of confluent term rewriting leads to its use in the-
orem proving [Knuth and Bendix, 1970; Loveland, 1978]. For equational
logic programming, term rewriting can answer all normal form queries in
a confluent system, when the prohibited terms in normal forms are all the
left-hand sides of equations.
Proposition 2.3.6. Let T = {l1 = r1, . . . ,lm = rm} be a set of equa-
tions, with confluent term rewriting relation ->, and let t be any term.
If
(norm l1, . . . , lm : t) ?- T |= (t = s)
then
(norm l1, . . . ,lm : t) ?- T |-tr (t = s)
The proof is elementary. By confluence, t and s rewrite to a common
form u. Since s is a normal form, it is not rewritable, and must be the
same as u.
So, for equations T with confluent rewriting relation, term rewriting
based on T is sufficient for answering all queries requesting normal forms
that prohibit left-hand sides of equations in T. From now on, a normal
form will mean a normal form for the left-hand sides of whatever set of
equations we are discussing (see Definition 2.3.16 in the chapter 'Introduc-
tion: Logic and Logic Programming Languages' for the general concept of
normal form).

The most famous consequence of the confluence property is uniqueness


of normal forms.
Proposition 2.3.7. Let T = {l1 = r1, . . . , lm = rm} be a set of equa-
tions, with confluent term rewriting relation ->. If

    (norm l1, . . . , lm : t) ?- T |-tr (t = s1)

and

    (norm l1, . . . , lm : t) ?- T |-tr (t = s2)

then s1 = s2 (s1 and s2 are the same term).


The proof is elementary. By confluence, s1 and s2 rewrite to a common
form u. Since s1 and s2 are normal forms, they are not rewritable, and
must be the same as u.
So, equational logic programs using confluent systems of equations have
uniquely defined outputs. This is an interesting property to note, but it is
not essential to the logic programming enterprise — logic programs in FOPC
are allowed to have indeterminate answers (Section 2.3.1 of the 'Introduc-
tion' chapter), and this freedom is often seen as an advantage. In efficient
equational logic programming, confluence is required for the completeness
of term rewriting, and uniqueness of answers is an accidental side-effect
that may be considered beneficial or annoying in different applications.
Confluence, in effect, guarantees that the order of applying rewrite steps
cannot affect the normal form. In Section 3 we see that the order of appli-
cation of rewrite rules can affect the efficiency with which a normal form is
found, and in some cases whether or not the unique normal form is found
at all.
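A minimal added illustration: the orthogonal (hence confluent) equations
k(x, y) = x and loop = loop give the term k(a, loop) the unique normal form
a. Rewriting the outermost redex finds that normal form in one step, while
a strategy that always rewrites the inner redex loop produces an infinite
rewriting sequence that never reaches it.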
2.3.2 Testing for confluence
Proposition 2.3.8. Confluence is an undecidable property of finite sets
of equations.
The proof is straightforward. Given an arbitrary Turing Machine M,
modify M so that, if it halts, it does so in the special configuration If.
Encode configurations (instantaneous descriptions) of M as terms (just let
the tape and state symbols be unary function symbols), and provide rewrit-
ing rules to simulate the computation of M. So far, we have a system
of equations in which an arbitrary encoding of an initial configuration I0
rewrites to If if and only if M halts on I0. Choose a new symbol a not
used in encoded configurations, and add two more equations: I0 = a and
If = a. The extended system is confluent if and only if M halts on I0.
For practical purposes in programming language implementations, we
need a sufficient condition for confluence that is efficient to test.

Orthogonality. A particularly useful sort of condition for guaranteeing


confluence is orthogonality, also called regularity (but not connected in
any sensible way to the regular languages). Orthogonality is a set of re-
strictions on rewrite rules insuring that they do not interfere with one
another in certain pathological ways. We consider three versions of or-
thogonality. Rewrite-orthogonality insures that the rewrites performed by
the rules do not interfere, while the stronger condition of rule-orthogonality
prohibits even the appearance of interference based on an inspection of the
left-hand sides of the rules, and ignoring the right-hand sides. Constructor-
orthogonality is an even stronger and simpler syntactic condition that guar-
antees rule-orthogonality. In other literature on term rewriting, 'orthogo-
nality' and 'regularity' refer to the stronger form, rule-orthogonality.
Definition 2.3.9. Let T = {l1 = r1, . . . , ln = rn} be a set of equations.
T is rewrite-orthogonal if and only if the following conditions hold:
1. (Nontrivial) No left-hand side li of an equation li = ri in T consists
entirely of a variable.
2. (Rule-like) Every variable in the right-hand side ri of an equation
li = ri in T occurs in the left-hand side li as well.
3. (Left-linear) No variable occurs more than once in the left-hand side
li of an equation li = ri in T.
4. (Rewrite-Nonambiguous) Let li and lj be left-hand sides of equations
in T, let s be a term with a single occurrence of a new variable
y (not occurring in any equation of T), let x1, . . . , xm be the variables
of li, and let z1, . . . , zk be the variables of lj. If

    s[li[t1, . . . , tm/x1, . . . , xm]/y] = lj[u1, . . . , uk/z1, . . . , zk]

then either s is an instance of lj, or

    s[ri[t1, . . . , tm/x1, . . . , xm]/y] = rj[u1, . . . , uk/z1, . . . , zk]

In clause 4 the nested substitution

    s[ri[t1, . . . , tm/x1, . . . , xm]/y]

is the result of substituting t1, . . . , tm for x1, . . . , xm in ri, to produce r'i =
ri[t1, . . . , tm/x1, . . . , xm], then substituting r'i for y in s. Clause 4 is best
understood by considering an example where it fails. The set of equations
{f(g(v, w), x) = a, g(h(y), z) = b} is rewrite-ambiguous because, in the
term f(g(h(c), d), e), there is an instance of f(g(v, w), x) and an instance
of g(h(y), z), and the two instances share the symbol g. Furthermore,
f(g(h(c), d), e) rewrites to a using the first equation, and to a different
result, f(b, e), using the second equation.
Nontriviality and the Rule-like property are required in order for the
interpretation of the equations as term rewriting rules to make much sense.

Left-linearity is of practical importance because the application of a rule


with repeated variables on the left-hand side requires a test for equality.
Non-left-linear systems also fail to be confluent in rather subtle ways, as
shown in Example 2.3.16 below. Rewrite-nonambiguity says that if two
rewriting steps may be applied to the same term, then they are either
completely independent (they apply to disjoint sets of symbols), or they
are equivalent (they produce the same result). Example 2.3.16 below shows
more cases of rewrite-ambiguity and its consequences.
One simple way to insure rewrite-nonambiguity is to prohibit all inter-
ference between left-hand sides of rules.
Definition 2.3.10 (Klop [1991; 1980]). Let T = {l1 = r1, . . . , ln = rn}
be a set of equations. T is rule-orthogonal if and only if T satisfies condi-
tions 1-3 of Definition 2.3.9 above, and also
4' (Rule-Nonambiguous) Let li and lj be left-hand sides of equations in
T, let s be a term with a single occurrence of a new variable y
(not occurring in any equation of T), and let x1, . . . , xm be the
variables of li. If

    s[li[t1, . . . , tm/x1, . . . , xm]/y] is an instance of lj

then either s is an instance of lj, or s = y and i = j.


Rule-nonambiguity says that if two rewriting steps may be applied to the
same term, then they are either completely independent, or they are iden-
tical (the same rule applied at the same place). Notice that rule nonambi-
guity depends only on the left-hand sides of equations, not the right-hand
sides. In fact, only the Rule-like condition of rule-orthogonality depends
on right-hand sides.
Definition 2.3.11. Two systems of equations are left-similar if the mul-
tisets of left-hand sides of equations are the same, except for renaming of
variables.
Proposition 2.3.12. A set T of equations is rule-orthogonal if and only
if
• T satisfies the rule-like restriction, and
• every rule-like set of equations left-similar to T is rewrite-orthogonal.
That is, rule-orthogonality holds precisely when rewrite-orthogonality can
be guaranteed by the forms of the left-hand sides alone, independently of
the right-hand sides.
An even simpler way to insure rule-nonambiguity is to use a constructor
system, in which symbols appearing leftmost in rules are not allowed to
appear at other locations in left-hand sides.
Definition 2.3.13. Let T = {l1 = r1, . . . , ln = rn} be a set of equations.
T is constructor-orthogonal if and only if T satisfies conditions 1-3 of

Definition 2.3.9 above, and the symbols of the system partition into two
disjoint sets—the set C of constructor symbols, and the set D of defined
symbols, satisfying
4" (Symbol-Nonambiguous)
• Every left-hand side of an equation in T has the form f(t1, . . . ,
tm), where f ∈ D is a defined symbol, and t1, . . . , tm contain
only variables and constructor symbols in C.
• Let li and lj be left-hand sides of equations in T. If there exists
a common instance s of li and lj, then i = j.
In most of the term-rewriting literature, 'orthogonal' and 'regular' both
mean rule-orthogonal. It is easy to see that constructor orthogonality im-
plies rule-orthogonality, which implies rewrite-orthogonality. Most func-
tional programming languages have restrictions equivalent or very similar
to constructor-orthogonality.
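For instance, the usual equations for addition on unary numerals,
add(zero, y) = y and add(succ(x), y) = succ(add(x, y)), form a constructor-
orthogonal system with zero and succ as constructors and add as the only
defined symbol. A hedged Haskell transcription (the names are illustrative)
makes the constructor discipline visible: every left-hand side is the defined
symbol applied to constructor patterns, and no two left-hand sides have a
common instance.

    -- Constructors.
    data Nat = Zero | Succ Nat
      deriving Show

    -- A defined symbol applied only to constructor patterns on the left.
    add :: Nat -> Nat -> Nat
    add Zero     y = y                  -- add(zero, y)    = y
    add (Succ x) y = Succ (add x y)     -- add(succ(x), y) = succ(add(x, y))

    main :: IO ()
    main = print (add (Succ (Succ Zero)) (Succ Zero))
    -- prints Succ (Succ (Succ Zero))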
Orthogonal systems of all varieties are confluent.
Proposition 2.3.14. Let T be a constructor-, rule- or rewrite-orthogonal
set of equations. Then the term rewriting relation -> is confluent.
Let -> be the rewrite relation that is to be proved confluent. The essen-
tial idea of these, and many other, proofs of confluence is to choose another
relation ->' with the one-step confluence property (Definition 2.3.3), whose
transitive closure is the same as the transitive closure of ->. Since conflu-
ence is defined entirely in terms of the transitive closure, -> is confluent
if and only if ->' is confluent. ->' is confluent because one-step confluence
implies confluence. To prove confluence of orthogonal systems of equations,
the appropriate ->' allows simultaneous rewriting of any number of disjoint
subterms.
Theorem 10.1.3 of the chapter 'Equational Reasoning and Term-
Rewriting Systems' in Section 10.1 of Volume 1 of this handbook is the
rule-orthogonal portion of this proposition, which is also proved in [Huet
and Levy, 1991; Klop, 1991]. The proof for rewrite-orthogonal systems has
never been published, but it is a straightforward generalization. [O'Donnell,
1977] proves a version intermediate between rule-orthogonality and rewrite-
orthogonality.
In fact, for nontrivial, rule-like, left-linear systems, rule-nonambiguity
captures precisely the cases of confluence that depend only on the left-hand
sides of equations.
Proposition 2.3.15. A nontrivial, rule-like, left-linear set T of equations
is rule-nonambiguous if and only if, for every set of equations T' left-
similar to T, the term rewriting relation of T' is confluent.
(⇒) is a direct consequence of Propositions 2.3.14 and 2.3.12. (⇐) is
straightforward. In a rule-ambiguous system, simply fill in each right-hand
side with a different constant symbol, not appearing on any left-hand side,
to get a nonconfluent system.
In the rest of this chapter, we use the term 'orthogonal' in assertions
that hold for both rewrite- and rule-orthogonality. To get a general un-
derstanding of orthogonality, and its connection to confluence, it is best
to consider examples of nonorthogonal systems and investigate why they
are not confluent, as well as a few examples of systems that are not rule
orthogonal, but are rewrite orthogonal, and therefore confluent.
Example 2.3.16. The first example, due to Klop [Klop, 1980], shows
the subtle way in which non-left-linear systems may fail to be confluent.
Let

eq represents an equality test, a very useful operation to define with a


non-left-linear equation. Now

and also

true is in normal form, and f(true) rewrites infinitely as

The system is not confluent, because the attempt to rewrite f(true) to true
yields an infinite regress with f(true) → eq(true, f(true)). Notice that →
has unique normal forms. The failure of confluence involves a term with
a normal form, and an infinite term rewriting sequence from which that
normal form cannot be reached. Non-left-linear systems that satisfy the
other requirements of rule-orthogonality always have unique normal forms,
even when they fail to be confluent [Chew, 1981]. I conjecture that this
holds for rewrite-orthogonality as well.
A typical rewrite-ambiguous set of equations is

c represents a primitive sort of nondeterministic choice operator. T2 vio-


lates condition (4') because

but
→ is not confluent, as c(a, b) → a by the first equation, and c(a, b) → b by


the second equation, but a and b are in normal form.
By contrast, consider the set

of equations defining the positive parallel-or operator. Although Tor+ is


rule-ambiguous, it is rewrite-nonambiguous:

and w is not an instance of or(x, true), but the corresponding right-hand


sides are both true:

Tor+ is rewrite-orthogonal, so → is confluent.


A more subtle example of a rewrite-orthogonal set of equations that is
rule-ambiguous is the negative parallel-or:

Although

and w is not an instance of or(x, false), the substitution above unifies the
corresponding right-hand sides as well:

Tor- is rewrite-orthogonal, so → is confluent.
Another type of rewrite-ambiguous set of equations is

These equations express the fact that f is a homomorphism for g (i.e., f


distributes over g), and that i is a left identity for g. The left-hand sides of
the two equations overlap in f(g(i, z)), with the symbol g participating in
instances of the left-hand sides of both equations. Condition (4) is violated,
because
but f(w) is not an instance of f(g(x, y)). → is not confluent, as f(g(i, i)) →


g(f(i), f(i)) by the first equation, and f(g(i, i)) → f(i) by the second equa-
tion, but both g(f(i), f(i)) and f(i) are in normal form. While the previous
examples of ambiguity involved two rules applying to precisely the same
term, the ambiguity in T3 comes from two overlapping applications of
rules to a term and one of its subterms. Some definitions of orthogonal-
ity/regularity treat these two forms of ambiguity separately.
By contrast, consider the set

Although T4 is rule-ambiguous, it is rewrite-nonambiguous:

and f(w) is not an instance of f(g(x, y)), but the corresponding right-hand


sides yield

T4 is rewrite-orthogonal, so → is confluent.
Condition (4) may also be violated by a single self-overlapping equation,
such as

The left-hand side f(f(x)) overlaps itself in f(f(f(x))), with the second
instance of the symbol f participating in two different instances of f(f(x)).
Condition (4) is violated, because

but f(y) is not an instance of f(f(x)). → is not confluent, as f(f(f(a))) →
g(f(a)) and f(f(f(a))) → f(g(a)), but both g(f(a)) and f(g(a)) are in
normal form.
A final example of overlapping left-hand sides is

The left-hand sides of the two equations overlap in f(g(a, b), y), with the
symbol g participating in instances of the left-hand sides of both equations.
Condition (4) is violated, because

but f(w, y) is not an instance of f(g(a, x), y). → is not confluent, as


f(g(a, b), c) → a by the first equation, and f(g(a, b), c) → f(b, c) by the
second equation, but both a and f(b, c) are in normal form.
The equations for combinatory logic

are rule-orthogonal, but not constructor-orthogonal, since the symbol @


(standing for application of a function to an argument) appears leftmost
and also in the interior of left-hand sides. In more familiar notation, @(α, β)
is written (αβ), and leftward parentheses are omitted, so the equations look
like

Many functional programming languages vary the definition of constructor-


orthogonality to allow pure applicative systems (the only symbol of arity
greater than zero is the apply symbol @) in which the zeroary symbols
(S and K in the example above) are partitioned into defined symbols and
constructors.
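
For instance, the S and K rules can be coded over a pure applicative term type. In the hedged Haskell sketch below (my encoding, not the chapter's), the application symbol appears both at the head and in the interior of the left-hand sides, which is precisely what the constructor discipline forbids, yet the two rules are rule-orthogonal.

    -- Pure applicative terms: zeroary S and K, binary application App (@).
    data CL = S | K | App CL CL deriving Show

    -- One rewriting step at the root, using the combinatory logic rules
    --   S x y z = (x z)(y z)   and   K x y = x,
    -- i.e. @(@(@(S,x),y),z) and @(@(K,x),y) in the chapter's prefix notation.
    step :: CL -> Maybe CL
    step (App (App (App S x) y) z) = Just (App (App x z) (App y z))
    step (App (App K x) y)         = Just x
    step _                         = Nothing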
The equations for addition of Horner-rule form polynomials in the sym-
bolic variable V (V is a variable in the polynomials, but is treated formally
as a constant symbol in the equations) are

This system is rule-orthogonal, but not constructor-orthogonal, because


the symbols + and * appear leftmost and also in the interior of left-hand
sides. In the more familiar infix form for + and *, the equations look like

No natural variation on the definition of constructor-orthogonality seems to


allow these equations. The only obvious way to simulate their behavior with
a constructor-orthogonal system is to use two different symbols for addition,
and two different symbols for multiplication, depending on whether the
operation is active in adding two polynomials, or is merely part of the
representation of a polynomial in Horner-rule form.
Although the polynomial example above shows that some natural sets of
equations are rule-orthogonal but not constructor-orthogonal, Thatte has
an automatic translation from rule-orthogonal to constructor-orthogonal
systems [Thatte, 1985] showing that in some sense the programming power
of the two classes of systems is the same. I still prefer to focus attention on
the more general forms of orthogonality, because they deal more directly
with the intuitive forms of equations, and because I believe that improved
equational logic programming languages of the future will deal with even
more general sets of equations, so I prefer to discourage dependence on the
special properties of constructor systems.
Knuth-Bendix Methods. Although overlapping left-hand sides of equa-
tions may destroy the confluence property, there are many useful equa-
tional programs that are confluent in spite of overlaps. In particular, the
equation expressing the associative property has a self-overlap, and equa-
tions expressing distributive or homomorphic properties often overlap with
those expressing identity, idempotence, cancellation, or other properties
that collapse a term. These overlaps are usually benign, and many useful
equational programs containing similar overlaps are in fact confluent.
Example 2.3.17. Consider the singleton set

expressing the associative law for the operator g. This equation has a self-
overlap, violating condition (4) of rewrite-orthogonality (Definition 2.3.9)
because

but the corresponding right-hand sides disagree:

by different applications of the equation, the two results rewrite to a com-


mon normal form by

and

Consider also the set


expressing the distribution of f over g, and the fact that i is a left identity
for g and a fixed point for f. The first and second equations overlap,
violating condition (4) of rewrite-orthogonality, because

but the corresponding right-hand sides disagree:

Nonetheless, → is confluent. For example, while

by the first equation and

by the second equation, the first result rewrites to the second, which is in
normal form, by

Notice that T8 = T3 ∪ {f(i) = i}, and that confluence failed for T3 (Ex-
ample 2.3.16).
Experience with equational logic programming suggests that most
naively written programs contain a small number of benign overlaps, which
are almost always similar to the examples above. An efficient test for con-
fluence in the presence of such overlaps would be extremely valuable.
The only known approach to proving confluence in spite of overlaps is
based on the Knuth-Bendix procedure [Knuth and Bendix, 1970]. This
procedure relies on the fact that local confluence (Definition 2.3.2) is often
easier to verify than confluence, and that local confluence plus termination
imply confluence.
Proposition 2.3.18 ([Newman, 1942]). If → is locally confluent, and
there is no infinite sequence s0 → s1 → ···, then → is confluent.
The proof is a simple induction on the number of steps to normal form.
Unfortunately, a system with nonterminating rewriting sequences may
be locally confluent, but not confluent.
Example 2.3.19. T1 of Example 2.3.16 is locally confluent, but not
confluent.
Consider also the set of equations

→ is locally confluent, but not confluent. Notice how confluence fails due
to the two-step rewritings a → b → d and b → a → c (see Figure 2).

Fig. 2. T9 is locally confluent, but not confluent.

Another example, without a rewriting loop, is the set of equations

→ is locally confluent, but not confluent. Again, confluence fails due
to the two-step rewritings f(x) → g(h(x)) → d and g(x) → f(h(x)) → c
(see Figure 3).

Fig. 3. T10 is locally confluent, but not confluent.


The Knuth-Bendix procedure examines overlaps one at a time to see


whether they destroy the local confluence property. Let a pair of equa-
tions l1 = r1 and l2 = r2 be such that their left-hand sides overlap—i.e.,
there is a term s ≠ y with one occurrence of the variable y such that

but s is not an instance of l2. For each s, l1 and l2, use the smallest
t1, ..., tm and t'1, ..., t'n that satisfy this equation. The results of rewrit-
ing the instance of s above in two different ways, according to the over-
lapping instances of equations, are c1 = s[r1[t1, ..., tm/x1, ..., xm]/y] and
c2 = r2[t'1, ..., t'n/x'1, ..., x'n]. The pair (c1, c2) is called a critical pair. A
finite set of equations generates a finite set of critical pairs, since only a
finite number of terms s can be compatible with some l2, but not an instance
of l2. The procedure checks all critical pairs to see if they rewrite to a
common normal form. If so, the system is locally confluent.
Proposition 2.3.20 ([Huet, 1980]). Let T be a set of equations. If for
every critical pair (c1, c2) of T there is a term d such that c1 →* d and
c2 →* d, then → is locally confluent.
This proposition, and the Knuth-Bendix method, apply even to non-
left-linear sets of equations. For example, the local confluence of T1 in
Example 2.3.16 may be proved by inspecting all critical pairs.
When some critical pair cannot be rewritten to a common form, the
Knuth-Bendix procedure tries to add an equation to repair that failure
of local confluence. For equational logic programming, we would like
to use just the part of the procedure that checks local confluence, and
leave it to the programmer to decide how to repair a failure. Although,
in principle, the search for a common form for a critical pair might go
on forever, in practice a very shallow search suffices. I have never ob-
served a natural case in which more than two rewriting steps were in-
volved. Unfortunately, many useful equational programs have nontermi-
nating term rewriting sequences, so local confluence is not enough. The
design of a variant of the Knuth-Bendix procedure that is practically
useful for equational logic programming is an open topic of research—
some exploratory steps are described in [Chen and O'Donnell, 1991]. A
number of methods for proving termination are known [Dershowitz, 1987;
Guttag et al., 1983], which might be applied to portions of an equational
program even if the whole program is not terminating, but we have no
experience with the practical applicability of these methods. If the rewrit-
ing of the terms c1 and c2 in a critical pair to a common form d (see
Proposition 2.3.20) takes no more than one rewriting step (this is one-step
confluence, Definition 2.3.3), then we get confluence and not just local con-
fluence. Rewrite-orthogonal systems are those whose critical pairs are all
trivial—the members of the pair are equal, and so the reduction to a com-
mon form takes zero steps. Unfortunately, all of the important examples so
far of confluent but not rewrite-orthogonal equational programs have the
basic structure of associativity or distributivity (see Example 2.3.17) and
require two rewriting steps to resolve their critical pairs.
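
As a worked instance of that claim (my calculation, assuming the associative law of Example 2.3.17 is used left to right as the rule g(g(x, y), z) → g(x, g(y, z))), the self-overlap at g(g(g(w, x), y), z) yields the critical pair c1 = g(g(w, x), g(y, z)) from the root rewrite and c2 = g(g(w, g(x, y)), z) from the inner rewrite, and the two sides meet after one and two further steps respectively:

    \[
    \begin{aligned}
      c_1 = g(g(w,x),\,g(y,z)) &\;\to\; g(w,\,g(x,\,g(y,z)))\\
      c_2 = g(g(w,\,g(x,y)),\,z) &\;\to\; g(w,\,g(g(x,y),\,z)) \;\to\; g(w,\,g(x,\,g(y,z)))
    \end{aligned}
    \]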
The sets of equations in Example 2.3.17 pass the Knuth-Bendix test
for local confluence, and a number of well-known techniques can be used to
prove that there is no infinite term rewriting sequence in these systems.
But, we need to recognize many variations on these example systems,
when they are embedded in much larger sets of equations which gener-
ate some infinite term rewriting sequences, and no completely automated
method has yet shown practical success at that problem (although there are
special treatments of commutativity and associativity [Baird et al., 1989;
Dershowitz et al., 1983]). On the other hand, in practice naturally con-
structed systems of equations that are locally confluent are almost always
confluent. Surely someone will find a useful and efficient formal criterion
to distinguish the natural constructions from the pathological ones of Ex-
ample 2.3.19.

3 Term rewriting proof strategies


Given an orthogonal set of equations T = {l1 = r1, ..., lm = rm}, or any set
with confluent term rewriting relation →, we may now answer all questions
of the form (norm l1, ..., lm : t) by exploring term rewriting sequences
starting with t. Confluence guarantees that if there is an answer, some
term rewriting sequence will find it (Proposition 2.3.6). Furthermore, con-
fluence guarantees that no finite number of term rewriting steps can be
catastrophic, in the sense that if s →* t and if s rewrites to a normal form,
then t rewrites to the same normal form. Confluence, however, does not
guarantee that no infinite term rewriting sequence can be catastrophic.
Example 3.0.1. Consider the set of equations

The first equation is the usual definition of the car (left projection) function
of Lisp, the second is a silly example of an equation leading to infinite
term rewriting. T11 is orthogonal, so → is confluent. But car(cons(b, a))
rewrites to the normal form b, and also in the infinite rewriting sequence
car(cons(b, a)) → car(cons(b, f(a))) → ···
Notice that, due to confluence, no matter how far we go down the infi-
nite term rewriting sequence car(cons(b, a)) → car(cons(b, f(a))) → ···,
one application of the first equation leads to the normal form b. Nonethe-
less, a naive strategy might fail to find that normal form by making an infi-
nite number of unfortunate rewrites. In fact, the usual recursive evaluation


techniques used in Lisp and other term-evaluating languages correspond to
term rewriting strategies that choose infinite sequences whenever possible.
A breadth-first search of all possible rewriting sequences is guaranteed to
find all normal forms, but at the cost of a lot of unnecessary work.
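
The contrast is easy to reproduce in a lazy functional language. The following Haskell fragment (my analogue of T11, not code from the chapter) finds the normal form because the outermost projection is applied before the nonterminating argument is ever touched; a strategy that insisted on rewriting inside the argument first would never get there.

    -- car is modelled by the first projection; the second equation of the
    -- example is modelled by a definition that stands for an infinite
    -- rewriting sequence.
    car :: (a, b) -> a
    car (x, _) = x

    looping :: Integer
    looping = looping + 1              -- no normal form

    main :: IO ()
    main = print (car (42, looping))   -- prints 42 under lazy evaluation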
For efficient implementation of equational logic programming, we need
strategies for choosing term rewriting steps so that
• a small number of term rewriting sequences is explored, preferably
only one;
• if there is a normal form, it is found, preferably by the shortest or
cheapest sequence possible.
Some theoretical work on sequencing in the lambda calculus has already
been explored under the title of one-step strategies [Barendregt, 1984].

3.1 Complete and outermost complete rewriting sequences

In orthogonal systems of equations, there are two useful results on strate-
gies that are guaranteed to find normal forms. The formal notation for
stating these results precisely is somewhat involved (see the chapter 'Equa-
tional Reasoning and Term Rewriting Systems' in Volume 1), so I only
give rough definitions here. The concepts in this section can be extended
to nonorthogonal systems, but in some cases there are very subtle problems
in the extensions, and they have never been treated in the literature.
Definition 3.1.1 ([Huet and Levy, 1991]). A redex is an occurrence
of an instance of a left-hand side of an equation in a term. An outermost
redex is one that is not nested inside any other redex. When a is a redex in
s, and s →* t, the residuals of a in t are the redexes in t that correspond in
the obvious way to a in s—they are essentially explicit copies of a, except
that some rewriting step may have rewritten a subterm of a, so that some
copies may be modified. All residuals of a are occurrences of instances of
the same left-hand side as a.
Example 3.1.2. Consider the rule-orthogonal set of equations

The term g(f(f(h(a))), f(h(a))) has five redexes: two occurrences each of
the terms a and f(h(a)), and one occurrence of f(f(h(a))). The latter
two are both instances of the left-hand side f(x) of the first equation.
The leftmost occurrence of f(h(a)) is nested inside f(f(h(a))), so it is not
outermost. Each occurrence of a is nested inside an occurrence of f(h(a)),
so neither is outermost. The rightmost occurrence of f(h(a)), and the sole
occurrence of f(f(h(a))), are both outermost redexes. In the rewriting
sequence below, the leftmost occurrence of f(h(a)), and its residuals in


each succeeding term, are boxed.

Notice how the leftmost occurrence of f(h(a)) in the first term of the


sequence is copied into two occurrences in the second, due to the rewriting
of a redex in which it is nested. The first of these is changed to f(h(b))
in the third term of the sequence, but it is still a residual of the original
leftmost f(h(a)). In the fourth term of the sequence, f(h(b)) is rewritten,
eliminating one of the residuals. In the sixth term, the remaining residual,
still in the form f(h(a)), is eliminated due to rewriting of a redex in which
it is nested. Another occurrence of f(h(a)) remains, but it is a residual of
the rightmost occurrence of that term in the first term of the sequence.
In general, a residual a of a redex is eliminated when a is rewritten (or,
in rewrite-orthogonal systems, when a redex overlapping a is rewritten). a
is copied zero, one, or more times (zero times eliminates the residual) when
another redex in which a is nested is rewritten. a remains the same when
another redex disjoint from a is rewritten. Finally, a is modified in form,
but remains an instance of the same left-hand side, when another redex
nested inside a is rewritten.
In orthogonal systems, the nontrivial, rule-like, and nonambiguous qual-
ities of equations (restrictions 1, 2, 4 or 4' of Definition 2.3.9 or 2.3.10) guar-
antee that a given redex may be rewritten in precisely one way. So, a term
rewriting strategy need only choose a redex to rewrite at each step. The
most obvious way to insure that all normal forms are found is to rewrite
every redex fairly.
Definition 3.1.3 ([O'Donnell, 1977]). A finite or infinite term rewriting
sequence t0 → t1 → ··· is complete (also called fair) if and only if, for every
i and every redex a in ti, there exists a j > i such that tj contains no
residual of a.
A complete term rewriting sequence is fair to all redexes, in the sense
that every redex a (or its residuals, which are essentially the later versions
of the redex) eventually gets eliminated, either by rewriting a (with rewrite-
orthogonality, a redex overlapping a), or by making zero copies of a while


rewriting another redex in which a is nested. Complete term rewriting
sequences are maximal, in the sense that they produce terms that are
rewritten further than every other sequence. Since nothing is rewritten
further than a normal form, complete sequences produce a normal form
whenever there is one.
Proposition 3.1.4 ([O'Donnell, 1977]). Let T be an orthogonal set of
equations, let t0 → t1 → ··· be a complete rewriting sequence, and let s
be any term such that t0 →* s. There exists an i such that s →* ti. In
particular, if s is in normal form, then ti = s.
Computing a single complete term rewriting sequence is generally
cheaper than searching breadth-first among a number of sequences, but
fair rewriting strategies (such as the strategy of adding new redexes to a
queue, and rewriting all residuals of the head redex in the queue) typically
perform a substantial number of superfluous rewriting steps, and can eas-
ily waste an exponentially growing amount of work in some cases. Since a
residual a of a redex may only be eliminated by rewriting a, or some redex
inside which a is nested, we need only be fair to the outermost redexes in
order to be sure of finding normal forms.
Definition 3.1.5 ([O'Donnell, 1977]). A finite or infinite term rewrit-
ing sequence t0 → t1 → ··· is outermost complete or outermost fair (called
eventually outermost in [O'Donnell, 1977]) if and only if, for every i and
every outermost redex a in ti, there exists a j > i such that the unique
residual of a in tj-1 is either eliminated by rewriting in tj, or is no longer
outermost in tj (equivalently, no residual of a is outermost in tj).
Since, for the least j satisfying the definition above, a remains outer-
most from ti through tj-1 and cannot be copied, there is no loss of gener-
ality in requiring the residual of a in tj-1 to be unique.
Proposition 3.1.6 ([O'Donnell, 1977]). Let T be a rule-orthogonal
set of equations, let t0 → t1 → ··· be an outermost complete rewriting se-
quence, and let s be a (unique) normal form for t0. There exists an i such
that ti = s.
[O'Donnell, 1977] proves this proposition for a form of orthogonality
intermediate between rule- and rewrite-orthogonality. I conjecture that the
proof generalizes to rewrite-orthogonality.
The requirement that T be orthogonal, and not just confluent, is essen-
tial to Proposition 3.1.6.
Example 3.1.7. Consider the set of equations
These equations are confluent, but not rewrite-orthogonal, since the left-
hand sides of the first and second equations overlap in f(g(x, b)), but the
corresponding right-hand sides yield b ≠ f(g(f(x), b)). The natural out-
ermost complete rewriting sequence starting with f(g(b, a)) is the infinite
one

But f(g(b, a)) rewrites to normal form by

The problem is that rewriting the nonoutermost redex a to b creates a


new outermost redex for the first equation above the previously outermost
one for the second equation. This leapfrogging effect allows an inner re-
dex to kill an outer one indirectly, by creating another redex even closer
to the root. There should be some interesting conditions, weaker than
rewrite-orthogonality, that prohibit this leapfrogging effect and guarantee
outermost termination for confluent systems.
The obvious way to generate outermost-complete rewriting sequences
is to alternate between finding all outermost redexes, and rewriting them
all. The order in which the outermost redexes are rewritten is irrelevant
since they are all disjoint and cannot cause copying or modification of one
another. Unfortunately, this strategy often generates a lot of wasted work.
For example, consider a system containing the equations Tcond for the
conditional function from Example 1.2.1

In a term of the form cond(r, s, t), there will usually be outermost redexes
in all three of r, s, and t. But, once r rewrites to either true or false, one of
s and t will be thrown away, and any rewriting in the discarded subterm
will be wasted. The ad hoc optimization of noticing when rewriting of one
outermost redex immediately causes another to be nonoutermost sounds
tempting, but it probably introduces more overhead in detecting such cases
than it saves in avoiding unnecessary steps. Notice that it will help the cond
example only when r rewrites to true or false in one step. So, we need
some further analysis to choose which among several outermost redexes to
rewrite.
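
A hedged Haskell rendering of the same point (my code, assuming the Tcond equations have the usual form cond(true, x, y) = x and cond(false, x, y) = y) shows why rewriting inside both branches can only waste work:

    cond :: Bool -> a -> a -> a
    cond True  x _ = x
    cond False _ y = y

    spin :: Integer
    spin = spin                    -- a subterm with no normal form

    example :: Integer
    example = cond True 1 spin     -- 1 under an outermost strategy; any
                                   -- rewriting performed inside the discarded
                                   -- branch 'spin' would have been wasted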

3.2 Sequentiality analysis and optimal rewriting


For rewrite-orthogonal systems of equations in general, it is impossible to
choose reliably a redex that must be rewritten in order to reach normal
form, so that there is no risk of wasted work.
Example 3.2.1. Let Tc be equations defining some general-purpose


programming system, such as Lisp. The forms of the particular equations
in Tc are not important to this example, merely the fact that they are
powerful enough for general-purpose programming. Assume that in the
system Tc there is an effective way to choose a redex that must be rewritten
to reach normal form (this is the case for typical definitions of Lisp). Now,
add the positive parallel-or equations

and consider the system Tc ∪ Tor+.


For an arbitrary given term or(s, t), we would like to choose either
s or t to rewrite first. If s rewrites to true, but t does not, then it is
crucial to choose s, else work (possibly infinitely much work) will be wasted.
Similarly, if t rewrites to true, but s does not, it is crucial to choose t. If
neither s nor t rewrites to true, then both must be rewritten to normal
form in order to normalize the whole term, so we may choose either. If both
s and t rewrite to true then, ideally, we would like to choose the one that
is cheapest to rewrite, but suppose that we are satisfied with either choice
in this case also.
Suppose that we have an effective way to choose s or t above. Then,
we have a recursive separation of the terms or(s, t) in which s rewrites
to true and t has no normal form from those in which t rewrites to true
and s has no normal form. Such a separation is known to be impossible.
(It would lead easily to a computable solution of the halting problem. See
[Machtey and Young, 1978] for a discussion of recursive inseparability.) So,
we cannot decide effectively whether to rewrite redexes in s or in t without
risking wasted work.
The case where both s and t rewrite to true poses special conceptual
problems for sequentiality theory. Although it is necessary to rewrite one
of s or t in order to reach the normal form true, it is neither necessary to
rewrite s, nor necessary to rewrite t. The criterion of choosing a redex that
must be rewritten fails to even define a next rewriting step mathematically,
and the question of computability does not even arise. Notice that this
latter case is problematic for Tor+ alone, without the addition of Tc.
The difficulty in Example 3.2.1 above appears to depend on the unifia-
bility of the left-hand sides of the two equations in Tor+, which is allowed
in rewrite-orthogonal systems, but not rule-orthogonal systems. A more
subtle example, due to Huet and Levy, shows that rule-orthogonality is
still not sufficient for effective sequencing.
Example 3.2.2 ([Huet and Levy, 1991]). Replace the parallel-or
equations of Example 3.2.1 by the following:
and consider the system Tc ∪ T14. Given a term of the form f(r, s, t), we
cannot decide whether to rewrite redexes in r, in s, or in t without risking
wasted work, because we cannot separate computably the three cases
• r →* a and s →* b
• r →* b and t →* a
• s →* a and t →* b
Unlike the parallel-or example, it is impossible for more than one of
these three cases to hold. There is always a mathematically well-defined
redex that must be rewritten in order to reach a normal form, and the
problem is entirely one of choosing such a redex effectively. In fact, for sets
T of equations such that T U T14 is terminating (every term has a normal
form), the choice of whether to rewrite r, s, or t in f(r, s, t) is effective, but
usually unacceptably inefficient.
So, some further analysis of the form of equations beyond checking for
orthogonality is required in order to choose a good redex to rewrite next
in a term rewriting sequence. Analysis of equations in order to determine
a good choice of redex is called sequentiality analysis.

3.2.1 Needed redexes and weak sequentiality


The essential ideas for sequentiality analysis in term rewriting are due to
Huet and Levy [Huet and Levy, 1991], based on a notion of sequential pred-
icate by Kahn [Kahn and Plotkin, 1978]. A redex that may be rewritten
without risk of wasted work is called a needed redex.
Definition 3.2.3 ([Huet and Levy, 1991]). Given an orthogonal set T
of equations and a term t0, a redex a in t0 is a needed redex if and only if,
for every term rewriting sequence t0 → t1 → ··· → tm, either
• there exists an i, 1 ≤ i ≤ m, such that a residual of a is rewritten in
the step ti-1 → ti, or
• a has at least one residual in tm.

A needed redex is a redex whose residuals can never be completely elim-


inated by rewriting other redexes. So, the rewriting of a needed redex is
not wasted work, since at least one of its residuals has to be rewritten in
order to reach normal form. Huet and Levy defined needed redexes only for
terms with normal forms, but the generalization above is trivial. A system
is weakly sequential if there is always a needed redex to rewrite.
Definition 3.2.4. A rewrite-orthogonal set of equations is weakly se-
quential if and only if every term that is not in normal form contains at
least one needed redex. A set of equations is effectively weakly sequential
if and only if there is an effective procedure that finds a needed redex in
each term not in normal form.
The word 'sequential' above is conventional, but may be misleading to


those interested in parallel computation. A weakly sequential system is not
required to be computed sequentially—typically there is great opportunity
for parallel evaluation. Rather, a weakly sequential system allows sequen-
tial computation without risk of wasted rewriting work. In this respect
'sequentializable' would be a more enlightening word than 'sequential.'
The parallel-or system Tor+ of Example 3.2.1 is rewrite-orthogonal,
but not weakly sequential, because the term

or(or(true, a), or(true, a))

has two redexes, neither of which is needed, since either can be elimi-
nated by rewriting the other, then rewriting the whole term to true. Rule-
orthogonality guarantees weak sequentiality.
Proposition 3.2.5 ([Huet and Levy, 1991]). A nontrivial, rule-like,
and left-linear set of equations (Definition 2.3.9) is weakly sequential if
and only if it is rule-orthogonal.
The proof of (⇐) is in [Huet and Levy, 1991]. It involves a search
through all rewriting sequences (including infinite ones), and does not yield
an effective procedure. (⇒) is straightforward, since when two redexes over-
lap neither is needed.
Proposition 3.2.5 above shows that no analysis based on weak sequen-
tiality can completely sequentialize systems whose confluence derives from
rewrite-orthogonality, or from a Knuth-Bendix analysis. Section 3.2.4 dis-
cusses possible extensions of sequentiality beyond rule-orthogonal systems.
The system Tc ∪ T14 of Example 3.2.2 is rule-orthogonal, and therefore
weakly sequential. For example, in a term of the form f(r, s, t), where
r →* a and s →* b, both r and s contain needed redexes. The subsystem
T14, without the general-purpose programming system Tc, is effectively
weakly sequential, but only because it is terminating. I conjecture that
effective weak sequentiality is undecidable for rule-orthogonal systems.
3.2.2 Strongly needed redexes and strong sequentiality
The uncomputability of needed redexes and the weak sequential property
are addressed analogously to the uncomputability of confluence: by finding
efficiently computable sufficient conditions for a redex to be needed, and
for a system to be effectively weakly sequential. A natural approach is
to ignore right-hand sides of equations, and detect those cases of needed
redexes and effectively weakly sequential systems that are guaranteed by
the structure of the left-hand sides. To this end we define w-rewriting, in
which a redex is replaced by an arbitrary term.
Definition 3.2.6 ([Huet and Levy, 1991]). Let T = {l1 = r1, ...,
ln = rn} be a rule-orthogonal set of equations.
A term s1 w-rewrites to s2 by T (written s1 →w s2) if and only if there


is a term t, a variable x with exactly one occurrence in t, an instance l'i of a
left-hand side li in T, and a term r such that s1 = t[l'i/x] and s2 = t[r/x].
That is, s2 results from finding exactly one instance of a left-hand side
of an equation in T occurring as a subterm of s1, and replacing it with
an arbitrary term. The definition of residual (Definition 3.1.1) generalizes
naturally to w-rewriting.
Now, a strongly needed redex is defined analogously to a needed redex,
using w-rewriting instead of rewriting.
Definition 3.2.7 ([Huet and Levy, 1991]). Given a rule-orthogonal set
T of equations and a term t0, a redex a in t0 is strongly needed if and only
if, for every w-rewriting sequence t0 →w t1 →w ··· →w tm, either
• there exists an i, 1 ≤ i ≤ m, such that a residual of a is rewritten in
the step ti-1 →w ti, or
• a has at least one residual in tm.

Because of rule-orthogonality, the property of being strongly needed de-


pends only on the location of a redex occurrence, and not on its internal
structure. So, we call an arbitrary occurrence in a term strongly needed if
and only if a redex substituted in at that location is strongly needed. [Huet
and Levy, 1991] defines strong indexes, and shows that they determine ex-
actly the strongly needed redexes. It is easy to see that every strongly
needed redex is needed, and outermost. And, it is easy to detect whether
a given redex is strongly needed (see Section 4 and [Huet and Levy, 1991]).
A system of equations is strongly sequential if there is always a strongly
needed redex to be rewritten, except in a normal form term.
Definition 3.2.8 ([Huet and Levy, 1991]). A rule-orthogonal set of
equations is strongly sequential if and only if every term that is not in
normal form contains at least one strongly needed redex.
It is obvious that every strongly sequential system is effectively weakly
sequential, but the converse does not hold.
Example 3.2.9. The system T14 of Example 3.2.2, although it is effec-
tively weakly sequential, is not strongly sequential. f(f(a, b, c), f(a, b, c),
f(a, b, c)) w-rewrites to f(f(a, b, c), a, b), which is a redex, so the first redex
f(a, b, c) is not strongly needed. Similarly, f(f(a, b, c), f(a, b, c), f(a, b, c))
w-rewrites to the redexes f(b, f(a, b, c), a) and f(a, b, f(a, b, c)), so the sec-
ond and third redexes are not strongly needed. All three redexes are weakly
needed.
By contrast, consider the strongly sequential system
In the term f(g(a, a), h(a, a)), the first and last occurrences of a are strongly
needed, but the second and third are not.
Notice that w-rewriting allows different redexes that are occurrences
of instances of the same left-hand side to be rewritten inconsistently in
different w-rewriting steps. Such inconsistency is critical to the example
above, where in one case f(a, b, c) w-rewrites to a, and in another case it
w-rewrites to b.
Strong sequentiality is independent of the right-hand sides of equa-
tions.
Proposition 3.2.10. If T1 and T2 are left-similar (see Definition 2.3.11),
and T1 is strongly sequential, then so is T2.
The proof is straightforward, since T1 and T2 clearly have the same
w-rewriting relations (Definition 3.2.6).
It is not true, however, that a system is strongly sequential whenever
all left-similar systems are weakly sequential. The system of Example 3.2.2
and all left-similar systems are weakly sequential, but not strongly sequen-
tial. But in that case, no given redex is needed in all of the left-similar
systems. An interesting open question is whether a redex that is needed in
all left-similar systems must be strongly needed.
For finite rule-orthogonal sets of equations, strong sequentiality is de-
cidable.
Proposition 3.2.11 ([Huet and Levy, 1991; Klop and Middeldorp,
1991]). Given a finite rule-orthogonal set T of equations, it is decidable
whether T is strongly sequential.
The details of the proof are quite tricky, but the essential idea is that
only a finite set of terms, with sizes limited by a function of the sizes of
left-hand sides of equations, need to be checked for strongly needed redexes.
In developing the concept of strongly needed redexes and connecting it
to the concept of weakly needed redexes, Huet and Levy define the inter-
mediately powerful concept of an index. Roughly, an index is a needed
redex that can be distinguished from other redexes just by their relative
positions in a term, without knowing the forms of the redexes themselves
[Huet and Levy, 1991]. Every index is a weakly needed redex, but not vice
versa. Strong indexes are equivalent to strongly needed redexes. A system
in which every term not in normal form has at least one index is called
sequential. The precise relation between sequentiality in all left-similar
systems, and strong sequentiality, is an interesting open problem.
All of the sequentiality theory discussed in this section deals with se-
quentializing the process of rewriting a term to normal form. Example 7.1.6
in Section 7.1 shows that even strongly sequential systems may require a
parallel evaluation strategy for other purposes, such as a complete proce-
dure for rewriting to head-normal form.
3.2.3 Optimal rewriting


From a naive point of view, the natural strategy of rewriting a strongly
needed redex at each step does not lead to minimal-length rewriting se-
quences ending in normal form. The problem is that the rewriting of a
strongly needed redex may cause another needed (but not strongly needed)
redex to be copied arbitrarily many times. Since strongly needed redexes
are always outermost, they are particularly likely to cause such copying.
Example 3.2.12. In the strongly sequential system of equations

given the initial term f(f(a)), both redexes, f(f(a)) and f(a), are needed,
but only the outermost one is strongly needed. By rewriting a strongly
needed redex at each step, we get the 3-step sequence

But, there is a 2-step sequence

which does not rewrite the unique strongly needed redex in the first step.
It is easy to construct further examples in which the number of steps
wasted by rewriting strongly needed redexes is arbitrarily large.
Proposition 3.2.13. Given an arbitrary strongly sequential system of
equations, and a term, there is no effective procedure to choose a redex at
each step so as to minimize the length of the rewriting sequence to normal
form.
I am not aware of a treatment of this point in the literature. The basic
idea is to use the equations

similar to T16 of Example 3.2.12, and add additional equations to define h


as the evaluator for Lisp or some other general-purpose computing system.
Now, start with a term of the form f(a, p), where p is an arbitrary program.
If evaluation of p halts with value 0 (that is, if h(p) rewrites to 0), then
an optimal rewriting sequence rewrites f(a, p) to g(a, a, h(p)) in the first
step, and never rewrites the occurrences of a. If evaluation of p halts with
any value other than 0, then an optimal sequence rewrites a to b in the
first step (else it must be rewritten twice later). An effective method for
choosing the first step would yield a recursive separation of the programs
that halt with output 0 from those that halt with output 1, which is known
to be impossible [Machtey and Young, 1978].
Notice that, when there is a normal form, breadth-first search over all
rewriting sequences yields a very expensive computation of a minimal se-
quence. But, no effective procedure can choose some redex in all cases (even
in the absence of a normal form), and minimize the number of rewriting
steps when there is a normal form.
The uncomputability of minimal-length rewriting strategies in Propo-
sition 3.2.13 sounds discouraging. The number of rewriting steps is not,
however, a good practical measure of the efficiency of a sequencing strat-
egy. Given equations, such as f(x) = g(x, x) in Example 3.2.12, with more
than one occurrence of the same variable x on the right-hand side, normal
sensible implementations do not make multiple copies of the subterm sub-
stituted for that variable. Rather, they use multiple pointers to a single
copy. Then, only one actual computing step is required to rewrite all of
the apparent multiple copies of a redex within that substituted subterm.
So, in Example 3.2.12, the strategy of choosing a strongly needed redex
actually leads to only two steps, from f(f(a)) to g(f(a), f(a)), and then
directly to g(g(a, a), g(a, a)). The normal form is represented in practice
with only one copy of the subterm g(a, a), and two pointers to it for the
two arguments of the outermost g. If we charge only one step for rewriting
a whole set of shared redexes, then rewriting strongly needed redexes is
optimal.
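
The effect of this sharing is the same one a lazy functional language obtains from a let-binding. In the small Haskell illustration below (mine, not the chapter's implementation), the argument duplicated by the right-hand side of f(x) = g(x, x) is represented once, so the work standing in for the inner redex is done at most once:

    -- g(x, x) is modelled by a pair; the duplicated argument is shared.
    g :: a -> a -> (a, a)
    g x y = (x, y)

    f :: a -> (a, a)
    f x = g x x

    shared :: (Integer, Integer)
    shared = let y = sum [1 .. 1000000]   -- stands for the inner redex
             in f y                       -- y is evaluated at most once; both
                                          -- components point to the same result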
Proposition 3.2.14. Consider multiple-rewriting sequences, in which in
one step all of the shared copies of a redex are rewritten simultaneously.
Given a strongly sequential set of equations and a term, the strategy of
rewriting at each step a strongly needed redex and all of its shared copies
leads to normal form in a minimal number of steps.
This proposition has never been completely proved in print. I claimed
a proof [O'Donnell, 1977] but had a fatal error [Berry and Levy, 1979;
O'Donnell, 1979]. The hard part of the proposition—that the rewriting
of a strongly needed redex is never a wasted step—was proved by Huet
and Levy [Huet and Levy, 1991]. The remaining point—that rewriting a
strongly needed redex never causes additional rewriting work later in the
sequence—seems obvious, but has never been treated formally in general.
Levy [Levy, 1978] treated a similar situation in the lambda calculus, but
in that case there is no known efficient implementation technique for the
sequences used in the optimality proof. Although the formal literature on
optimal rewriting is still incomplete, and extensions of optimality theory to
systems (such as the lambda calculus) with bound variables are extremely
subtle, for most practical purposes Huet's and Levy's work justifies the
strategy of rewriting all shared copies of a strongly needed redex at each
step. Optimality aside, rewriting strategies that always choose a strongly
needed redex are examples of one-step normalizing strategies, which pro-
vide interesting theoretical problems in combinatory logic and the lambda
calculus [Barendregt, 1984].


3.2.4 Extensions to sequentiality analysis
Proposition 3.2.5 seems to invalidate rewrite-orthogonal systems for effi-
cient or optimal sequential rewriting. A closer look shows that the defini-
tion of weakly needed redexes and weak sequentiality is inappropriate for
rewrite-orthogonal systems. When two redexes from a rewrite-orthogonal
system overlap, we get the same result by rewriting either one. So, there is
no need for a sequential strategy to choose between them, and we might as
well allow an arbitrary selection. This observation suggests a more liberal
concept of needed redex.
Definition 3.2.15. Given a rewrite-orthogonal set T of equations and a
term to, a redex a in t0 is a rewrite-needed redex if and only if, for every
term rewriting sequence t0 → t1 → ··· → tm, either
• there exists an i, 1 ≤ i ≤ m, such that a residual of a is rewritten in
the step ti-1 → ti, or
• there exists an i, 1 ≤ i ≤ m, such that a redex overlapping a residual
of a is rewritten in the step ti-1 → ti, or
• a has at least one residual in tm.
This is the same as Definition 3.2.3 of needed redex, except that when one
redex is reduced, we give credit to all redexes that overlap it. We generalize
Definition 3.2.4 with the new version of needed redexes.
Definition 3.2.16. An orthogonal set of equations is weakly rewrite-
sequential if and only if every term that is not in normal form contains
at least one rewrite-needed redex. A set of equations is effectively weakly
rewrite-sequential if and only if there is an effective procedure that finds a
rewrite-needed redex in each term not in normal form.
Three sorts of overlaps between left-hand sides of equations have dif-
ferent impacts on weak rewrite-sequentiality. Recall (Definition 2.3.9) that
the problematic overlaps occur when there is a term s, and left-hand sides
li and lj, such that

Rewrite-nonambiguity requires that either s is an instance of lj, or the


corresponding right-hand sides ri and rj satisfy

1. Sometimes the structure of the inner term li is entirely subsumed


by the structure of the outer term lj—that is, the substituted terms
t'1, ..., t'n are trivial, and
In this case, the equation lj = rj is redundant, since every possible


application of it can be accomplished by applying li = ri instead.
2. Sometimes the structure of the inner term li extends below the struc-
ture of the outer term lj—that is, the substituted terms t1, ..., tm
are trivial, and

Overlaps of this sort appear not to destroy weak rewrite-sequentiality.


3. Otherwise, neither set of substituted terms t1, ..., tm nor t'1, ..., t'n is
trivial. This is the interesting case. Weak rewrite-sequentiality may
hold or not, depending on the extent to which redexes in substituted
subterms are copied or eliminated by the right-hand sides.
Figure 4 illustrates the three types of overlap with suggestive pictures.

Example 3.2.17. Consider the set

The overlap here is of type 1, since

The first equation is redundant since it is essentially a special case of the


second.
Next, consider

T19 = {f(g(x), a) = f(x, a), g(h(x)) = h(x)}


The overlap here is of type 2, since

T19 is weakly rewrite-sequential. In a term of the form f(s, t), in which
s → g(h(s')) and also t → a, it is always safe to rewrite t first, since by

s -¥ g(h(s')) and also t -¥ a, it is always safe to rewrite t first, since by
rewriting s to g(h(s')) and then rewriting this to h(s'), we cannot eliminate
the redexes in t.
Now, consider

The overlap here is of type 3, since

is the smallest substitution showing the overlap, and neither substitution


is trivial. T20 is not weakly rewrite-sequential, since the term f(g(g(a, a),
g(a, a))) has the two rewrite sequences

which shows that the rightmost occurrence of g(a, a) is not needed and

which shows that the leftmost occurrence of g(a, a) is not needed.


Modify the previous example slightly, by changing the right-hand sides:

The result is still a type 3 overlap, but the system is weakly rewrite-
sequential, since the redexes that are not needed immediately in the cre-
ation of an outer redex are preserved for later rewriting.
The positive parallel-or equations in Tor+ of Examples 2.3.16 and 3.2.1
give another example of a type 3 overlap where weak rewrite-sequentiality
fails. On the other hand, the negative parallel-or equations of Tor- in
Example 2.3.16 have type 2 overlap, but they are sequential. In a term
of the form or(s, t) where s →* false and t →* false, it is safe to rewrite
either s or t first, since the redexes in the other, unrewritten, subterm are
preserved for later rewriting.
Theories extending sequentiality analysis, through concepts such as
weak rewrite-sequentiality, are open topics for research. I conjecture that
weak rewrite-sequentiality is an undecidable property of rewrite-orthogonal


systems, and that the natural concept of strong rewrite-sequentiality has
essentially the same properties as strong sequentiality, except for allow-
ing type 2 overlaps. Optimality is very subtle in these extensions, since
the amount of sharing may vary depending on which of two overlapping
redexes is reduced. More interesting and powerful extensions of sequential-
ity will require analysis of right-hand sides to deal with type 3 overlaps.
Such analysis should be related in interesting ways to strictness analysis in
functional programming languages [Mycroft, 1980; Hughes, 1985b] , which
detects partial strictness properties of defined functions. Abstract interpre-
tation [Abramsky and Hankin, 1987; Cousot and Cousot, 1977] provides a
promising approach to sequentiality analysis based on right-hand sides.
Extensions of useful sequentiality analysis to systems whose confluence
is established by variations on the Knuth-Bendix procedure will require
the concept of residual to be generalized so that the residual of a redex a
may be an arbitrarily long rewriting sequence used in resolving a critical
pair involving a. Variations on sequentiality analysis for incremental and
parallel implementations of equational logic programming are discussed in
Sections 6 and 7, respectively.

4 Algorithms and data structures to implement equational languages

The basic idea of implementing equational logic programming for strongly
sequential systems is straightforward. Represent terms as linked structures
with sharing, in the time-honored style of Lisp [McCarthy et al., 1965;
McCarthy, 1960]. At every step, find a strongly needed redex and rewrite
it, halting if and when the sequence ends with a normal form. A lot of work
is required to reduce these basic ideas to efficient practice. At the abstract
level of algorithm and data-structure design, the problem breaks naturally
into three components: a data structure to represent terms, a pattern-
matching and sequencing algorithm to find strongly needed redexes, and
a driving procedure to invoke the pattern-matcher/sequencer, perform the
chosen rewrites, and incorporate the results into the term data structure.
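
As a very small sketch of that decomposition (the names below are mine; the chapter fixes no programming interface), the driving procedure is simply a loop around the pattern-matcher/sequencer and the rewriting step:

    -- Hypothetical interface for the three components described above.
    data Term = Node String [Term] deriving Show

    type Position = [Int]                 -- path from the root to a subterm

    -- Pattern matching and sequencing: find a strongly needed redex, if any.
    findStronglyNeededRedex :: Term -> Maybe Position
    findStronglyNeededRedex = undefined   -- stands for the matcher/sequencer

    -- Perform one rewrite at the given position, preserving sharing.
    rewriteAt :: Position -> Term -> Term
    rewriteAt = undefined                 -- stands for the rewriting step

    -- Driving procedure: rewrite strongly needed redexes until none remain.
    normalize :: Term -> Term
    normalize t =
      case findStronglyNeededRedex t of
        Nothing  -> t                     -- t is in normal form
        Just pos -> normalize (rewriteAt pos t)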

4.1 Data structures to represent terms


The natural data structure for terms is a linked structure in a heap, with
sharing allowed. Each occurrence of a symbol f in a term is represented by
a node of storage containing f and pointers to its arguments. Sharing is
accomplished by allowing several different argument pointers to point to the
same node. There are a number of optimizations that coalesce small nodes,
or break large nodes into linked sequences, that have been explored in the
literature on Lisp compilers [Bobrow and Clark, 1979]. In this section, we
consider data structures at an abstract level with precisely one symbol per
heap node, and assume that such optimizations are applied at a lower level
of implementation.
4.1.1 A conceptual model for term data structures
Some useful techniques for implementing equational logic programming
require more than the linked heap structures representing terms. For ex-
ample, it is sometimes better to represent the rewriting of s to t by a link
from the head node of the representation of s pointing to the head node
of t, rather than by an actual replacement of s by t. This representation
still uses a heap, but the heap now represents a portion of the infinite
rewriting graph for a starting term, rather than just a single term at some
intermediate stage in rewriting to normal form. Other techniques involve
the memoing of intermediate steps to avoid recomputation—these require
more efficient table lookup than may be achieved with a linked heap. For-
tunately, there is a single abstract data structure that subsumes all of the
major proposals as special cases, and which allows a nice logical interpre-
tation [Sherman, 1990]. This data structure is best understood in terms of
three tables representing three special sorts of functions.
Definition 4.1.1. For each i ≥ 0 let Funi be a countably infinite set of
function symbols of arity i. The 0-ary function symbols in Fun0 are called
constant symbols. TP is the set of ground terms (terms without variables)
constructed from the given function symbols (see Definition 2.3.1 of the
chapter 'Introduction: Logic and Logic Programming Languages'). Let P
be a countably infinite set. Members of P are called parameters, and are
written a, /?,..., sometimes with subscripts. Formally, parameters behave
just like the variables of Definition 2.3.1, but their close association with
heap addresses later on makes us think of them somewhat differently.
An i-ary signature is a member of Funi × P^i. The signature (f, (α1, ...,
αi)) is normally denoted by f(α1, ..., αi). Sig denotes the set of signatures
of all arities.
Let nil be a symbol distinct from all function symbols, parameters, and
signatures.
A parameter valuation is a function val : P → Sig ∪ {nil}.
A parameter replacement is a function repl : P → P ∪ {nil}.
A signature index is a function ind : Sig → P ∪ {nil}.
A parameter valuation, parameter replacement, or signature index is
finitely based if and only if its value is nil for all but a finite number of
arguments.
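
In a concrete implementation these three functions are just finite tables. One possible Haskell rendering (my choice of types, with Int standing in for the set P of parameters) represents nil by absence from the table:

    import Data.Map (Map)

    type Param = Int                      -- parameters (heap addresses)

    -- An i-ary signature: a function symbol applied to i parameters.
    data Sig = Sig String [Param] deriving (Eq, Ord, Show)

    -- Finitely based functions; an argument missing from the map is one
    -- whose value is nil.
    type Valuation   = Map Param Sig      -- val  : P   -> Sig ∪ {nil}
    type Replacement = Map Param Param    -- repl : P   -> P   ∪ {nil}
    type SigIndex    = Map Sig  Param     -- ind  : Sig -> P   ∪ {nil}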
The conventional representation of a term by a linked structure in a
heap may be understood naturally as a table representing a finitely based
parameter valuation. The parameters are the heap addresses, and the
signatures are the possible values for data nodes. val(a) is the signature
stored at address a. But, we may also think of parameters as additional
Table 1. Parameter valuation representing f(g(a, f(a, a)), f(a, a)).

Fig. 5. Linked structure representing f(g(a, f(a, a)), f(a, a)).

0-ary symbols, of signatures as terms of height 1 built from parameters,


and of the function val as a set of formulae asserting equalities between
parameters and signatures. Each value val(α) = f(β1, ..., βi) ≠ nil of the
valuation function represents the formula α = f(β1, ..., βi). When val
represents the contents of a heap with the head symbol of a term t stored at
address a, then a = t is a logical consequence of the equations represented
by val.
Example 4.1.2. Consider the finitely based parameter valuation val given
by Table 1. All values of val not shown in the tables are nil. The linked
heap structure associated with val is shown in Figure 5. The set of equa-
tions represented by val is

Logical consequences of these equations include α2 = f(a, a), α1 = g(a,
f(a, a)), and α0 = f(g(a, f(a, a)), f(a, a)).
It is useful to have a notation for the term represented explicitly by a
parameter valuation at a particular parameter.
Definition 4.1.3. Let val be a parameter valuation. The partial function


val* : P → TP is defined inductively by

when val(α) = f(β1, ..., βi). Notice that the case where f is a constant


symbol, and therefore i = 0, provides a basis for the inductive definition. If
a value of nil is encountered anywhere in the induction, or if the induction
fails to terminate because of a loop, val* is undefined.
When val*(α) is well-defined, the equation α = val*(α) is always a
logical consequence of the equations represented by val.
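
Continuing the sketch (again my code, under the same hypothetical types as above), val* is the obvious recursive read-out of a term from the heap; Nothing plays the role of "undefined because nil was encountered", and a cyclic heap makes the recursion diverge, matching the nonterminating case of the definition:

    import qualified Data.Map as Map
    import Data.Map (Map)

    type Param = Int
    data Sig   = Sig String [Param] deriving (Eq, Ord, Show)
    type Valuation = Map Param Sig          -- nil = absent from the map

    data Term = Node String [Term] deriving Show

    -- val*: read off the ground term stored at a parameter, if possible.
    valStar :: Valuation -> Param -> Maybe Term
    valStar val alpha =
      case Map.lookup alpha val of
        Nothing            -> Nothing       -- val(alpha) = nil
        Just (Sig f betas) -> Node f <$> mapM (valStar val) betas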
Optimized implementations of equational logic programming languages
sometimes find it more efficient to link together nodes representing left-
and right-hand sides of equations, rather than to actually perform rewrit-
ing steps. Such linking can be implemented by an additional pointer in
each node of a heap structure. The information in these links is natu-
rally represented by a parameter replacement function. Given a set T
of equations, it first appears that we should think of the function repl
as a set of formulae asserting that one term rewrites to another—that is,
repl(α) = β ≠ nil represents the formula val*(α) →* val*(β). But further
rewriting steps on subterms of the term represented by a may invalidate
such a relation. There are also implementations that make efficient use
of data structures for which val* is ill defined. So, the proper logical in-
terpretation of repl(α) = β as a formula is merely α = β. An efficient
implementation manipulates val and repl so that β is in some way a bet-
ter starting point for further rewriting than α. The precise sense in which
it is better varies among different implementations. val and repl together
yield a set of terms for each parameter, all of which are known to be equal.
The set val*repl(α) defined below is the set of terms that may be read by
starting at α and following links in val and repl.
Definition 4.1.4. Let val be a parameter valuation and let repl be a
parameter replacement. The function val*repl : P → 2^(T_P) is defined so that
val*repl(α) is the least set satisfying

In the presence of loops, val*repl(α) may be infinite. Even without loops,
its size may be exponential in the size of the data structure representing
val and repl. The power of such data structures derives from this ability
to represent large sets of equivalent terms compactly. When val*(α) is
well defined, val*(α) ∈ val*repl(α). Another special member of val*repl(α)
is particularly interesting—the one reached by following repl links as much
as possible.

Table 2. Parameter valuation and replacement.
Definition 4.1.5. Let val be a parameter valuation and let repl be a
parameter replacement. The partial function valmaxrepl : P → T_P is defined
inductively by

    valmaxrepl(α) = valmaxrepl(repl(α))

when repl(α) ≠ nil, and by

    valmaxrepl(α) = f(valmaxrepl(β1), ..., valmaxrepl(βi))

when repl(α) = nil and val(α) = f(β1, ..., βi). As with val*, valmaxrepl is
undefined if nil is encountered as a value of val, or if the induction fails
to terminate because of a loop.
Example 4.1.6. Consider the finitely based parameter valuation val
and parameter replacement repl given by Table 2. All values of val and
repl not shown in the tables are nil. These tables represent some of the
consequences of the equations T22 when used to rewrite the term
f(f(a, b), b). The linked heap structure associated with val and repl is
shown in Figure 6. The rightmost link in each node α points to repl(α).
By following links from α0 in the table we can construct the six ground
terms in val*repl(α0): val*(α0) = f(f(a, b), b), f(f(a, b), c), f(f(a, c), b),
f(f(a, c), c), g(a, b), and valmaxrepl(α0) = g(a, c). Every equality between
these terms is a logical consequence of T22, and all of these equalities may
be read immediately from the data structure by following links from α0.

Fig. 6. Linked structure showing val and repl links.
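The following Python sketch adds a repl table to the valuation and follows
repl links as far as possible, in the spirit of valmaxrepl. The particular
entries are assumed for illustration (one heap layout consistent with the
terms listed in Example 4.1.6, since Table 2 is not reproduced), and
val_max_repl is a hypothetical helper name.

# Heap with val and repl links as dicts (absent key = nil).
val = {
    'a0': ('f', 'a1', 'a2'),
    'a1': ('f', 'a3', 'a2'),
    'a2': ('b',),
    'a3': ('a',),
    'a4': ('c',),
    'a5': ('g', 'a3', 'a2'),
}
repl = {
    'a2': 'a4',   # b was rewritten to c
    'a0': 'a5',   # f(f(a, b), b) was rewritten to g(a, b)
}

def val_max_repl(alpha):
    """Follow repl links as far as possible, then read the signature.
    Assumes the links contain no loop, as in the definition above."""
    while alpha in repl:
        alpha = repl[alpha]
    f, *args = val[alpha]
    return (f, *[val_max_repl(b) for b in args])

print(val_max_repl('a0'))   # ('g', ('a',), ('c',))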
A prime weakness of data structures based on parameter valuations
and replacements is that both functions require a parameter as argument.
Given a newly constructed signature, there is no direct way, other than
searching through the parameter valuation, to discover whether informa-
tion on that signature is already available. Signature indexes are intended
precisely to allow a newly constructed signature to be translated to an
equivalent parameter. While finitely based parameter valuations and re-
placements are normally implemented by direct memory access, using pa-
rameters as addresses, the number of possible signatures is generally too
great to allow such an implementation of a finitely based signature index.
General-purpose table-look-up methods are used instead, usually hash ta-
bles [Knuth, 1973]. A typical application of a hash-table representation of
a signature index is the hashed cons optimization in Lisp [Spitzen et al.,
1978], where every newly constructed node is looked up in a hash table to
see whether it already exists in the heap—if it does the existing node may be
shared instead of creating another copy in the heap. The most obvious use
of a signature index ind, such as the hashed cons application, requires that
whenever ind(f(β1, ..., βi)) = α ≠ nil, then val(α) = f(β1, ..., βi); that
is, val is a partial inverse to ind. It may be advantageous in some cases to
let ind(f(β1, ..., βi)) be a parameter known to be equal to f(β1, ..., βi).
The proper logical interpretation of ind(f(β1, ..., βi)) = α ≠ nil is merely
the formula f(β1, ..., βi) = α. So, ind provides the same type of logical in-
formation as val, but allows access to that information through a signature
argument, instead of a parameter argument.
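A minimal Python sketch of a signature index used in this hashed-cons
style: a dictionary keyed by signatures maps each signature to an existing
parameter, so that newly constructed nodes can be shared. The dictionaries
and the helper make_node are illustrative assumptions, not a published
interface.

# A signature index as a dict keyed by signature tuples.
val = {}
ind = {}
_counter = 0

def make_node(signature):
    """Reuse the parameter already indexed for this signature, if any;
    otherwise allocate a fresh parameter and record it in val and ind."""
    existing = ind.get(signature)
    if existing is not None:
        return existing            # share the pre-existing representation
    global _counter
    alpha = 'a%d' % _counter
    _counter += 1
    val[alpha] = signature
    ind[signature] = alpha
    return alpha

a  = make_node(('a',))
n1 = make_node(('f', a, a))
n2 = make_node(('f', a, a))
assert n1 == n2                    # the second construction is shared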
4.1.2 Logical interpretation of term data structures


Each point in the graph of a parameter valuation val, a parameter re-
placement repl, and a signature index ind represents an equation. An
entire data structure consisting of finitely based functions val, repl, and
ind represents a suitably quantified conjunction of these equations. For
definiteness, suppose that inputs and outputs are rooted at the parameter
α0. Then the logical meaning of val, repl, and ind is the conjunction of all
their equations, with all parameters except α0 existentially quantified.
Definition 4.1.7. Let val, repl, ind be a parameter valuation, a parame-
ter replacement, and a signature index, respectively, all three finitely based.
Let α1, ..., αn be all of the parameters β occurring in the finite basis of the
domain of val or of repl (val(β) ≠ nil or repl(β) ≠ nil), or in the range of
repl, or as a component of a signature in the finite basis of the domain of
ind or in the range of val, except for the input/output parameter α0. The
logical interpretation of val, repl, ind is the formula F_val,repl,ind defined
by

    F_val,repl,ind = ∃α1 ... ∃αn : G

where G is the conjunction of all the equations

    α = f(β1, ..., βi)    for each val(α) = f(β1, ..., βi) ≠ nil,
    α = β                 for each repl(α) = β ≠ nil,
    f(β1, ..., βi) = α    for each ind(f(β1, ..., βi)) = α ≠ nil.
Example 4.1.8. Consider the finitely based parameter valuation val and
parameter replacement repl discussed in Example 4.1.6, and shown in Ta-
ble 2 and Figure 6. If α0 is the parameter used for the root of the input
and output, then the logical interpretation of val and repl is

The interpretation of parameter valuations, parameter replacements,


and signature indexes as encodings of existentially quantified conjunctions
of equations makes it much easier to insure correctness of proposed algo-
rithms for manipulating data structures based on these functions. The
essential idea is that a transformation of a data structure from a state
representing val0, repl0, ind0 to one representing val1, repl1, ind1 in the
computation of an equational program T is logically permissible if and only
if the formula represented by val1, repl1, ind1 is a logical consequence of
T plus the formula represented by val0, repl0, ind0. So, an evaluation
algorithm may take input s, start in a state where valmaxrepl(α0) = s, and ap-
ply permissible transformations to reach a state where t ∈ valmaxrepl(α0), for
some normal form t. The permissibility of the transformations guarantees
that t is a correct answer.
The interpretation of the term data structures as sets of formulae applies
the concepts of logic programming to the implementation of a logic pro-
gramming language. A similar use of logical concepts within the implemen-
tation of a logic programming language occurs in recent presentations of
unification algorithms (used in the implementation of Prolog) as processes
that derive solutions to equations, where every intermediate step repre-
sents a new and simpler set of equations to be solved [Kapur et al., 1982;
Martelli and Montanari, 1976]. The view of unification as transformation
of equations to be solved clarifies the correctness of a number of clever and
efficient algorithmic techniques.
Similarly, the logical view of term data structures shows immediately
that a wide range of transformations of such data structures are correct,
leaving an implementer free to choose the ones that appear to be most
efficient. In order to represent new terms during a computation, we must
be careful not to introduce spurious information about the input/output
parameter a0, nor to postulate the solvability of nontrivial equations. The
concept of reachability, defined below, may be used to enforce both of these
conditions.
Definition 4.1.9. The set of parameters reachable from a given param-
eter α is defined inductively by
1. α is reachable from α
2. If β is reachable from α, and one of the following holds as well

then γ is reachable from α
Permissible transformations include

1. Building new terms: when α is not reachable from any of α0, β1, ..., βi,
reassign val(α) := f(β1, ..., βi). (This allows both bottom-up and
top-down construction of directed-acyclic representations of terms,
using names completely unconnected to α0. More liberal conditions
than the unreachability of α from α0, β1, ..., βi are possible, but they
are somewhat complex to define.)
2. Representing rewriting in repl: when some s ∈ val*repl(α) rewrites
to some t ∈ val*repl(β), reassign repl(α) := β.
3. Rewriting in val: when repl(α) ≠ nil,
reassign val(α) := val(repl(α)).
4. Compressing paths in repl: when repl(repl(α)) ≠ nil,
reassign repl(α) := repl(repl(α)).
5. Rewriting arguments in val: when val(α) = f(β1, ..., βi) and
repl(βj) ≠ nil, reassign val(α) := f(β1, ..., repl(βj), ..., βi).
6. Opportunistic sharing: when ind(val(α)) is neither nil nor α,
reassign repl(α) := ind(val(α)).
7. Indexing: when val(α) = f(β1, ..., βi),
reassign ind(f(β1, ..., βi)) := α.
8. Garbage collecting: reassign val(α) := nil, and/or repl(α) := nil,
and/or ind(f(...)) := nil. (This is always permissible, because it
only erases assertions. It is desirable only when the information re-
moved is not useful for the remainder of the computation.)
A straightforward evaluator scans valmaxrepl(α0) to find an instance of a left-
hand side s of an equation rooted at some node β, uses transformation 1
repeatedly to build a copy of the corresponding right-hand side t in free
nodes rooted at γ, then links β to γ by transformation 2. Transformations
3-5 allow the same term valmaxrepl(α0) to be scanned more efficiently, by
reducing the number of repl links that must be followed. Transformations
6 and 7 are used for optimizations such as hashed cons, congruence closure,
and memo functions to avoid re-evaluating the same subterm when it is
constructed repeatedly in a computation. Section 4.3 discusses several
optimizations using transformations 3-7.
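The following Python sketch shows two of the permissible transformations
listed above (4, path compression, and 6, opportunistic sharing) as operations
on val/repl/ind dictionaries. The dictionary representation and the helper
names are assumptions for illustration, not the chapter's notation.

def compress_path(alpha, repl):
    """Transformation 4: when repl(repl(alpha)) is not nil,
    reassign repl(alpha) := repl(repl(alpha))."""
    beta = repl.get(alpha)
    if beta is not None and repl.get(beta) is not None:
        repl[alpha] = repl[beta]

def opportunistic_share(alpha, val, repl, ind):
    """Transformation 6: when ind(val(alpha)) is a parameter other than
    alpha, link alpha to it."""
    beta = ind.get(val.get(alpha))
    if beta is not None and beta != alpha:
        repl[alpha] = beta

# tiny demonstrations
repl = {'a0': 'a1', 'a1': 'a2'}
compress_path('a0', repl)
assert repl['a0'] == 'a2'

val = {'b0': ('c',), 'b1': ('f', 'b0', 'b0'), 'b2': ('f', 'b0', 'b0')}
ind = {('f', 'b0', 'b0'): 'b1'}
links = {}
opportunistic_share('b2', val, links, ind)
assert links['b2'] == 'b1'          # b2 now shares b1's representation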
The existential quantification of all parameters other than α0 is re-
quired for the logical correctness of transformation 1 (term building), which
changes the meaning of some existentially quantified parameter, without
affecting any assertion about α0. 2-8 are permissible independent of the
quantification. Notice that only transformation 2 depends on the given
equational program, and only 2 adds information to the data structure. 1
is needed to build new terms required by 2, but by itself 1 does not change
the logical assertions about α0. 3-8 can only reduce the information given
by val and repl, but they may improve the efficiency of access to infor-
mation that is retained. 3-7 never change valmaxrepl(α0). Transformation 8
is normally applied only to nodes that are inaccessible from α0, in which
case 8 also preserves valmaxrepl(α0).
The logical interpretation introduced in Definition 4.1.7 seems to be
the most natural one for explaining currently known techniques for imple-
menting equational logic programming, but others are possible, and might
lead to useful extensions of, or variations on, the current sort of equa-
tional logic programming. Let α1, ..., αn be all of the parameters used in
val, repl, ind, other than the input/output parameter α0, and let G be
the conjunction of all the equations represented by points in the graphs
of val, repl, ind, just as in Definition 4.1.7. Two other interesting logical
interpretations worthy of study are

The first proposal above allows introduction of arbitrary structures not
connected to the input/output variable α0. The second one allows for
solution of equations, where every solution to the output is guaranteed to
be a solution to the input problem as well.

4.2 Pattern-matching and sequencing methods


Given a representation of a term, the implementation of an equational
logic program must perform pattern-matching to discover instances of left-
hand sides of equations that may be rewritten, and must apply sequencing
techniques to determine which of several such redexes to rewrite. These
two conceptual tasks appear to be inextricably connected, so it is best to
provide a single procedure to do both—that is, to find the next appropriate
redex to rewrite in a given term. In order to find and choose a redex
in a term, a procedure traverses the term until it has gathered enough
information about the symbols in the term to make its choice. The full
details of pattern-matching and sequencing methods are too long for this
chapter. So, I describe two basic approaches to the problem and illustrate
them by examples. The reader who needs more detail should consult [Huet
and Levy, 1991; Hoffmann and O'Donnell, 1979; Hoffmann et al., 1985;
O'Donnell, 1985; Klop, 1991; Klop and Middeldorp, 1991].
4.2.1 Representing sequencing information by Ω-terms
A natural way to represent partial information about a term is to present
the known structure as a term, with a special symbol (Ω) representing
unknown portions.
Definition 4.2.1. The set T_Ω of Ω-terms is defined in the same way as
the set T_P of terms (Definition 2.3.1 of the chapter 'Introduction: Logic
and Logic Programming Languages'), except the new constant Ω is added
to the set Fun_0 of symbols of arity 0.
An Ω-term s is less defined than an Ω-term t, written s ⊑ t, if and only
if s is the result of replacing zero or more subterms of t by Ω.
s ⊓ t denotes the greatest lower bound of s and t.
When Ω-terms s and t have a common upper bound (that is, when
there is a u such that s ⊑ u and t ⊑ u), we write s ↑ t.
The Ω in an Ω-term behaves formally much like a variable, except that
each occurrence of the same symbol Ω is treated as a unique variable,
occurring only at that location.
There is an elegant and simple procedure, called melting, for computing
all of the possible effects of Ω-rewriting on an Ω-term.
Definition 4.2.2. The function patt from terms to Ω-terms is defined by

    patt(t) = t[Ω/x1, ..., Ω/xm]

where x1, ..., xm are all of the variables occurring in t.
Extend patt to sets of equations T = {l1 = r1, ..., ln = rn} by

    patt(T) = {patt(l1), ..., patt(ln)}.
Given a set T of equations, and an Ω-term s ∈ T_Ω, s is transformed into
melt_T^Ω(s) by the following extremely simple polynomial-time procedure:
1. If s = t[u/x], where u ≠ Ω and u ↑ p for some p ∈ patt(T), then re-
place s by t[Ω/x].
2. Repeat (1) above until it is not applicable.
If s is an Ω-term representing current information about a term t that we
are rewriting to normal form, then melt_T^Ω(s) represents precisely the in-
formation in s that is guaranteed to hold for all t' such that t →* t'. By
marking an occurrence with a new inert symbol, and melting, we can deter-
mine whether Ω-rewriting may eliminate that occurrence without rewriting
it—that is, we determine whether the occurrence is strongly needed.
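A small Python sketch of melting, with Ω-terms represented as nested tuples
and Ω as a reserved string; the pattern set used below is an arbitrary small
example chosen for illustration, not one of the systems discussed in this
chapter, and the helper names are assumptions.

OMEGA = 'OMEGA'

def compatible(s, t):
    """s and t have a common upper bound (the relation written s ↑ t)."""
    if s == OMEGA or t == OMEGA:
        return True
    return (s[0] == t[0] and len(s) == len(t)
            and all(compatible(a, b) for a, b in zip(s[1:], t[1:])))

def melt(s, patterns):
    """Repeatedly replace any non-Ω subterm compatible with a pattern by Ω."""
    if s == OMEGA:
        return OMEGA
    # melt the arguments first, then see whether the result can itself melt
    s = (s[0],) + tuple(melt(arg, patterns) for arg in s[1:])
    if any(compatible(s, p) for p in patterns):
        return OMEGA
    return s

# patterns of a small example system: f(Ω, a) and b
patterns = [('f', OMEGA, ('a',)), ('b',)]
print(melt(('f', ('b',), OMEGA), patterns))
# OMEGA: the b melts, and then f(Ω, Ω) is compatible with f(Ω, a)

# Proposition-4.2.3-style check: mark an occurrence with a fresh constant
# and see whether it survives melting.
marked = ('f', ('BOX',), OMEGA)    # mark the first argument
print(melt(marked, patterns))
# OMEGA: the mark is eliminated, so that occurrence is not strongly needed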
Proposition 4.2.3. Let s be an Ω-term, and let α be an occurrence in
s. Let s' be the result of replacing α in s with a new constant symbol, •. α
is strongly needed if and only if melt_T^Ω(s') contains an occurrence of •.
Huet and Levy [Huet and Levy, 1991] propose a pattern-matching and
sequencing technique that accumulates an Ω-term s_α (initially just Ω),
representing information about the subterm at a particular node α in a
term t. At each step, they choose a strongly needed occurrence of Ω in
s_α, and extend s_α by reading the symbol at the corresponding subterm
occurrence in t. The information in s_α may be used to control pattern-
matching and sequencing as follows:
1. If there is a pattern p ∈ patt(T) such that p ⊑ s_α, then α is a redex
occurrence (in strongly sequential systems we always get s_α = p, but
it is better to think of the more general case).
2. If there is no pattern p ∈ patt(T) such that s_α ↑ p, then α is not
currently a redex occurrence.
3. If melt_T^Ω(s_α) = Ω, then α will never be a redex occurrence.
4. It is safe to query the symbol in t corresponding to a strongly needed
occurrence of Ω in s_α, and to rewrite any redex occurring there, since
it must be strongly needed.
In cases 2 and 3, Huet and Levy move to a proper descendant α' of α corre-
sponding to the largest subterm s' of s_α, containing the newly read symbol,
such that s' ↑ p for some p ∈ patt(T) (because of strong sequentiality, in
fact s' ⊑ p). Then they let s_α' = s' and proceed at α'. In case 1, the fact
that we have reached α by safe moves implies that the redex is strongly
needed. So, they rewrite, and continue processing at an appropriate ances-
tor of α, rereading whatever symbols have been changed by rewriting at α.

Example 4.2.4. Consider the strongly sequential system T23

and search for a strongly needed redex in the term

    t = f(b, f(f(b, b), f(b, f(f(b, b), b))))

with occurrences Λ (the root), 1 (leftmost b), 2 (f(f(b, b), f(b, f(f(b, b), b)))),
2.1 (leftmost f(b, b)), 2.1.1, 2.1.2 (second and third bs), 2.2 (f(b, f(f(b, b), b))),
2.2.1 (fourth b), 2.2.2 (f(f(b, b), b)), 2.2.2.1 (rightmost f(b, b)), 2.2.2.1.1,
2.2.2.1.2, 2.2.2.2 (fifth, sixth and seventh bs). The pattern f(Ω, f(f(Ω, a), a)),
and the term t, are shown pictorially in Figure 7. The occurrence of Ω cho-
sen for expansion at each step is underlined, and the subterm replacing it
is shown in a box.

Fig. 7. Pictures of terms for sequencing example.
• Start at the root of t, with s_Λ = Ω.
* There is only one choice, so read the f at the root, and expand
to s_Λ = f(Ω, Ω).
* Only the rightmost Ω is strongly needed, so read the correspond-
ing symbol, and expand to s_Λ = f(Ω, f(Ω, Ω)).
* Only the rightmost Ω is strongly needed, so expand to

* The new Ω-term is incompatible with left-hand side patterns
(the f conflicts with an a in the pattern), so we find the largest
subterm containing the box that is compatible with a pattern.
That subterm is the second principal subterm at 2,

• Now process the second principal subterm of t,

* Only the rightmost Ω is strongly needed, so expand to

* The new Ω-term is incompatible with the patterns (the f con-
flicts with an a in the pattern), so we find s' = f(Ω, f(Ω, Ω)),
initialize s_2.2 to s', and continue at 2.2.
* Only the rightmost Ω is strongly needed, so expand to

* Only the rightmost Ω is strongly needed, so expand to

* The new Ω-term is incompatible with the patterns (the b con-
flicts with an a in the pattern), so we find s' = b, initialize s_2.2.2.2
to s', and continue at 2.2.2.2.
• Now process the subterm t_2.2.2.2 = b at 2.2.2.2, with s_2.2.2.2 = b.
b ⊑ s_2.2.2.2, and b ∈ patt(T23), so this subterm is a strongly needed
redex. Rewrite it to a, yielding

and continue at 2.2, using the last value for s_2.2 before reading 2.2.2.2:

* Expand the fourth Ω again, but this time we read an a instead

* Only the rightmost Ω is strongly needed, so expand to

* Only the rightmost Ω is strongly needed, so expand to

* Again the new Ω-term is incompatible with the patterns, and
further processing at 2.2.2.1.2 rewrites the b to a, yielding
We continue at 2.2, with the last version of s_2.2 before reading
2.2.2.1.2. Extend s_2.2 = f(Ω, f(f(Ω, Ω), a)) again, this time to

* Now f(Ω, f(f(Ω, a), a)) ⊑ s_2.2, and f(Ω, f(f(Ω, a), a)) ∈
patt(T23), so we have a strongly needed redex, which we rewrite
to a, yielding

• Continue at 2 with the last Ω-term before reading 2.2: s_2 = f(Ω, Ω).
* Expand the rightmost Ω again, but this time we read an a in-
stead of an f, yielding s_2 = f(Ω, a).
* The new Ω-term is incompatible with the patterns, and there is
no compatible subterm, so we move back toward the root.
• Continue at Λ with the last Ω-term before reading 2.2:

* This time we read an a instead of an f, yielding:

* Only the rightmost Ω is strongly needed, so expand to

* Only the rightmost Ω is strongly needed, so expand to

* As before, the Ω-term is incompatible with the patterns; further
processing at 2.1.2 discovers that b is a redex, and rewrites it to
a, yielding

Then we continue at Λ with s_Λ = f(Ω, f(f(Ω, Ω), a)) again, this
time reading an a and extending to s_Λ = f(Ω, f(f(Ω, a), a)).
* Now we have a redex pattern f(Ω, f(f(Ω, a), a)) at the root. We
rewrite to a and start over at the root.
• In general, there might be more computation at the root, but in this
example we immediately get s_Λ = a, which is incompatible with the
patterns. Since there is nowhere left to go, we have a normal form.
Huet and Levy use Ω-terms in a pattern-matching and sequentializing
method that succeeds for every strongly sequential system of equations.
Several similar but less general methods have been proposed in order to sim-
plify the method in special cases [O'Donnell, 1985; Strandh, 1988; Strandh,
1989; Durand and Salinier, 1993; Durand, 1994; Ramakrishnan and Sekar,
1990].
4.2.2 Representing Sequencing Information by Sets of Subpatterns
Another natural approach to pattern-matching and sequencing is to focus
on the left-hand sides of equations, and represent information about a term
according to the ways that it does and does not match portions of those
left-hand sides.
Definition 4.2.5. Let T be a set of equations. U_T is the set of all sub-
terms of members of patt(T).
A subpattern set for T is a subset B ⊆ U_T.
Subpattern sets may be used to present information about pattern-
matching and sequencing in several ways:
1. A match set is a subpattern set containing all of the subpatterns
known to match at a particular node in a term.
2. A possibility set is a subpattern set containing all of the subpatterns
that might match at a particular node in a term, as the result of
Ω-rewriting the proper subterms of that node.
3. A search set is a subpattern set containing subpatterns to search for
at a particular node, in order to contribute to a redex.
Notice that match sets and possibility sets always contain Ω (except in the
unusual case when T has no occurrence of a variable), because everything
matches Ω. Search sets never contain Ω, since it is pointless to search for
something that is everywhere; they always contain patt(T), since finding
an entire redex occurrence is always useful.
In order to control pattern-matching and sequencing with subpattern
sets, associate a match set M_α and a possibility set P_α with each node in
a term t. Initially, M_α = {Ω} and P_α = U_T for all α. At all times the
subterm at α will match every subpattern in M_α, but no subpattern that
is not in P_α. That is, [M, P] is an interval in the lattice of subpattern sets
in which the true current description of the subterm at α always resides.
At every visit to a node α, update M_α and P_α based on the symbol f at α
and the information at the children α.1, ..., α.n of α as follows:

P'_α represents the set of all subpatterns that may match at node α, as
the result of Ω-rewriting at proper descendants of α. Notice the similarity
between the calculation of P_α and the process of melting—U_T above plays
the role of Ω in melting.
Similarly, keep a search set S_α at each node that is visited in the traver-
sal, entering at the root with S_Λ = patt(T). When moving from a node α
to the ith child α.i of α, update the search set at α.i by

    S_α.i := patt(T) ∪ {s : s is the ith principal subterm of a member of S_α ∩ P'_α}

The information in M_α, P_α, S_α, and the corresponding information at
the children α.1, ..., α.n of α, may be used to control pattern-matching and
sequencing as follows:
1. If M_α ∩ patt(T) ≠ ∅, then α is a redex occurrence.
2. If P_α ∩ patt(T) = ∅, then α will never be a redex occurrence.
3. If M_α ∩ S_α ≠ ∅, then α contributes to a possible redex at one of its
ancestors.
4. If P_α ∩ S_α = ∅, then the symbol at α will never contribute to a redex
occurrence.
5. If there is no Ω-term s ∈ S_α whose ith principal subterm is in M_α.i,
then it is safe to move to α.i and process it further—in particular,
any redex occurrence reached by safe moves is strongly needed.
Hoffmann and I proposed a pattern-matching and sequencing method based
on match/possibility/search sets. The method appears heuristically to suc-
ceed on most naturally defined orthogonal systems, but Jiefei Hong noticed
that there are some strongly sequential systems for which it fails.
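As an illustration of the match-set half of this bookkeeping, the following
Python sketch computes the match set of every node bottom-up for a ground
term over a small, arbitrary pattern set; the tuple representation and helper
names are assumptions, and possibility and search sets are omitted.

OMEGA = 'OMEGA'

def subpatterns(patterns):
    """U_T: all subterms of the patterns, including OMEGA when it occurs."""
    out = set()
    def walk(p):
        out.add(p)
        if p != OMEGA:
            for arg in p[1:]:
                walk(arg)
    for p in patterns:
        walk(p)
    return out

def match_set(term, U):
    """All members of U (plus OMEGA) matched by the ground term."""
    child_sets = [match_set(arg, U) for arg in term[1:]]
    m = {OMEGA}
    for p in U:
        if (p != OMEGA and p[0] == term[0] and len(p) == len(term)
                and all(q in cs for q, cs in zip(p[1:], child_sets))):
            m.add(p)
    return m

patterns = [('f', OMEGA, ('a',)), ('b',)]
U = subpatterns(patterns)
M = match_set(('f', ('b',), ('a',)), U)
print(('f', OMEGA, ('a',)) in M)   # True: the whole pattern matches, a redex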
Example 4.2.6. Consider the strongly sequential system T24

and search for a strongly needed redex in the term f(g(c, c), c) with oc-
currences Λ (the root), 1 (the subterm g(c, c)), 1.1, 1.2, and 2 (the three
occurrences of c, from left to right).
• Initially, we search at Λ, with

Update M_Λ and P_Λ based on M_1 = M_2 = {Ω}, P_1 = P_2 = U_T24, and
the symbol f at the root—in this case M_Λ and P_Λ do not change,
and

From condition (5) above, it is safe to move to either of 1 or 2.
Suppose that we choose arbitrarily to move to 1.
• Now, we search at 1, with

Update M_1 and P_1 to

The possibility set decreases in this case, because there is no left-
hand side with g at the root. Now by condition (5), it is not safe to
move to the first child of 1, 1.1, because g(Ω, a) ∈ S_1 has Ω as its first
principal subterm, and Ω ∈ M_1.1. Similarly, it is not safe to move to
1.2, because of g(a, Ω) ∈ S_1. The method suggested in [Hoffmann
and O'Donnell, 1979] is not capable of backing up to parents and
siblings of 1 in order to decide which of the children of 1 to visit first.
On the other hand, suppose we choose arbitrarily to move from Λ to 2
instead of 1.
• Now, we search at 2, with

Since c ∈ M_2 ∩ patt(T), there is a redex, which is strongly needed
since we found it by safe moves, and we rewrite c to a, yielding
f(g(c, c), a). Recalculate M_2 and P_2 to

a ∈ M_2 ∩ S_2, so continue back at Λ trying to complete a redex using
the a at 2.
• Search again at Λ; update M_Λ, P_Λ, and S_Λ again. M_Λ and P_Λ do not
change, but now

It is still safe to move to 1.
• Search at 1 with

Notice that information from 2 has led to a smaller search set, not
containing g(Ω, a). Update M_1 and P_1 to

This time it is safe to move to 1.1, but not to 1.2, so we choose the
former.
• Search at 1.1 with

The results are the same as in the search at 2 above—c rewrites to a
and we continue back at 1.
• Search again at 1, update M_1, P_1 again to

g(a, Ω) ∈ M_1 ∩ S_1, so continue back at Λ.


• Search one more time at Λ, recalculating M_Λ, P_Λ to

f(g(a, Ω), a) ∈ M_Λ ∩ patt(T), so there is a redex at Λ. Rewrite it,
yielding d. Update M_Λ, P_Λ to

P_Λ ∩ patt(T) = ∅, so the d at Λ will never change. Since Λ now has
no children, we are done, and d is a normal form.
T24 shows that for some strongly sequential systems, the success of the
subpattern set analysis depends on the choice of traversal order. In the
strongly sequential system T25

no order of traversal allows the subpattern set analysis to succeed. The
only way to sequence with T25 is to visit both of the children of an f
before descending to grandchildren. The Huet-Levy method allows this
breadth-first behavior, but the Hoffmann-O'Donnell method does not.
4.2.3 Applying the sequencing techniques


Neither Ω-terms nor subpattern sets should be computed explicitly at run
time. Rather, at compile time we compute the finite set of values that can
possibly occur while executing a given system of equations T, along with
tables of the operations required to update them at run time. It is natural
to think of Ω-terms and subpattern sets as components of the state of a
finite automaton, and the tables of operations as a representation of the
transition graph. A key problem is that the number of states may grow ex-
ponentially in the size of T. [Hoffmann and O'Donnell, 1982] analyzes the
number of match sets associated with a set of patterns. There is no pub-
lished analysis of the number of Ω-terms, nor of possibility and search sets,
but it appears that all three may grow at an exponential rate also. Most
implementations of equational logic programming appear to use methods
that are equivalent to highly restricted forms of the Ω-term method.
The Huet-Levy Ω-term method is the only published method that suc-
ceeds for all strongly sequential systems. The only published method using
subpatterns fails in cases, such as T25, where it is necessary to back up
before reaching the leaves of a pattern, in order to carry information about
sequencing to a sibling. I conjecture that a simple variation in the algorithm
of [Hoffmann and O'Donnell, 1979] will succeed on all strongly sequential
systems. Besides the superior generality of the Huet-Levy algorithm, Ω-
term methods have the advantage of a simple notation, making it relatively
easy to discover restrictions that control the number of possible values. On
the other hand, subpattern sets express the significance of information for
pattern matching purposes in a particularly transparent way. The subpat-
tern set method also has the advantage of separating clearly the information
that is passed up the tree (match and possibility sets) from the informa-
tion that is passed down the tree (search sets). Match and possibility sets
may be associated with nodes in a heap representation of a tree, taking
advantage of sharing in the heap. Search sets and Ω-terms depend on the
path by which a node was reached, so they must be stored on a traversal
stack, or otherwise marked so that they are not shared inappropriately.

4.3 Driving procedures for term rewriting


4.3.1 A recursive schema for lazy evaluation
It is natural to drive the conventional strict evaluation of a term by a recur-
sive procedure, eval, shown in Figure 8. For each function symbol / that
appears in a term, there is a predefined procedure / to compute the value
of that function on given arguments. The simplicity of recursive evaluation
is appealing, and it has a direct and transparent correspondence to the
semantic definition of the value of a term (Definition 2.3.3 of the chapter
'Introduction: Logic and Logic Programming Languages'). It is straightfor-
ward to let the value of a term be its normal form, and use the recursive eval
Procedure eval(t)
  Let t = f(t1, ..., tn);                 (1)
  For i := 1, ..., n do                   (2)
    vi := eval(ti)                        (3)
  end for;
  Return f(v1, ..., vn)                   (4)
end procedure eval

Fig. 8. Recursive schema for strict evaluation.

above to find the normal form of its input. But, the structure of the proce-
dure is heavily biased toward strict evaluation. Even the conditional func-
tion cond requires a special test between lines (1) and (2) to avoid evaluat-
ing both branches. Lazy functional languages have been implemented with
conventional recursive evaluation, by adding new values, called suspensions,
thunks, or closures to encode unevaluated subterms [Peyton Jones, 1987;
Bloss et al., 147-164]. The overhead of naive implementations of suspen-
sions led to early skepticism about the performance of lazy evaluation.
Clever optimizations of suspensions have improved this overhead substan-
tially, but it is still arguably better to start with an evaluation schema that
deals with lazy evaluation directly.
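As a minimal illustration of the suspension idea (not the chapter's own
implementation), the following Python sketch represents an unevaluated
argument by a zero-argument function with a memo slot, so that a
conditional forces only the branch it actually needs. The class and function
names are assumptions for illustration.

class Thunk:
    """A suspension: a delayed computation with a memoized result."""
    def __init__(self, compute):
        self.compute = compute
        self.value = None
        self.forced = False
    def force(self):
        if not self.forced:
            self.value = self.compute()
            self.forced = True
        return self.value

def cond(test, then_thunk, else_thunk):
    # only one branch is ever forced, mimicking a lazy conditional
    return then_thunk.force() if test else else_thunk.force()

print(cond(True, Thunk(lambda: 1), Thunk(lambda: 1 // 0)))   # 1, no error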
For orthogonal systems of equations, a function symbol that contributes
to a redex occurrence at a proper ancestor cannot itself be at the root of a
redex. So, a term may be normalized by a recursive procedure that rewrites
its argument only until the root symbol can never be rewritten.
Definition 4.3.1. Let T be a set of equations, and t be a term. t is a
(weak) head-normal form for T if and only if there is no redex u such that
t →* u.
t is a strong head-normal form for T if and only if there is no redex u
such that t Ω-rewrites to u.
Head normality is undecidable, but strong head normality is easy to
detect by melting.
Proposition 4.3.2. Let t be a term. t is a strong head-normal form for
T if and only if melt_T^Ω(t) ≠ Ω.
Let t be an Ω-term. The following are equivalent:
• u is a strong head-normal form for T, for all u ⊒ t
• melt_T^Ω(t) ≠ Ω.
The procedure head-eval in Figure 9 below rewrites its argument only to
strong head-normal form, instead of all the way to normal form. The
tests whether t is in strong head-normal form (1), whether t is a redex
(2), and the choice of a safe child i (5), may be accomplished by any
Procedure head-eval(t)
  While t is not in strong head-normal form do        (1)
    If t is a redex then                               (2)
      t := right-side(t)                               (3)
    else
      Let t = f(t1, ..., tn);                          (4)
      Choose i ∈ [1, n] that is safe to process;       (5)
      ti := head-eval(ti);                             (6)
      t := f(t1, ..., tn)                              (7)
    end if
  end while;
  Return t                                             (8)
end procedure head-eval

Fig. 9. Recursive schema to find head-normal form.

pattern-matching and sequencing method that succeeds on the given sys-


tem of equations. The procedure right-side called by head-eval above builds
and returns the right-hand side instance corresponding to a given redex.
right-side contains the information about function symbols analogous to
that contained in the collection of f's used by the strict evaluator eval. In
a detailed implementation, pattern-matching and sequencing require some
additional data structures, and possibly some additional values returned
by head-eval.
In order to produce a normal form for t, head-eval must be called recur-
sively on the subterms of t, as shown by the procedure norm in Figure 10
below. Section 7 shows how other schemes for calling head-eval, more so-

Procedure norm(t)
  t := head-eval(t);                      (1)
  Let t = f(t1, ..., tn);                 (2)
  For i := 1, ..., n do                   (3)
    ti := norm(ti)                        (4)
  end for;
  t := f(t1, ..., tn);                    (5)
  Return t                                (6)
end procedure norm

Fig. 10. Recursive schema to normalize a term lazily.

phisticated than norm, may be used to generalize the behavior of input


and output in equational logic programming.
4.3.2 Using the term data structure in rewriting


The basic recursive schema of head-eval may manipulate the tables val,
repl and ind in a variety of different ways, with various advantages and dis-
advantages in performance. In most cases, particularly in implementations
of orthogonal systems of equations, the pattern-matching and sequencing
method operates on valmaxrepl(α0), where α0 is a heap address that initially
represents the root of the input term. valmaxrepl(α0) is typically the closest
thing to a normal form for the original input that can be read easily from
val, repl and ind, so we achieve the best progress by rewriting it further.
It is also clearly correct to run the pattern-matcher/sequencer on another
term in val*repl(α0). I am not aware of any current implementation that
does this, but it is likely to be necessary in complete implementations of
nonconfluent systems, and equation solving systems (Section 7.2), which
will presumably need to explore several rewriting sequences in parallel. The
details of access to val and repl for pattern-matching/sequencing fit into
the implementation of the conditions tested in lines (1) and (2) of head-eval,
and in the choice of i in line (5). If the calculation of valmaxrepl(α0) follows
repl links, it is usually desirable to compress the paths using transforma-
tions (4) and (5) from Section 4.1.2, which speeds up future calculations of
valmaxrepl(α0). The benefits of transformations (4) and (5) are essentially the
same as the benefits of path compression in the UNION/FIND algorithm
[Aho et al., 1974] (think of repl as representing a partition of terms into
equivalence classes of terms that have been proved equal).
The implementation of right-side, called in line (3) of head-eval, is the
only place where updating of val, repl and ind is required. Suppose that
we are applying the equation l = r to rewrite the term t, represented in
the heap with root node α. The most obvious implementation of right-side
builds all of the nonvariable structure of r in newly allocated heap nodes in
val, using pointers to pre-existing structure for the variable substitutions
(this is essentially what a conventional Lisp implementation does). Let β be
the root node of the constructed representation of the appropriate instance
of r. Once the instance of r is built, β may be copied into α. Unfortunately,
when r consists of a single variable, as in car(cons(x, y)) = x, β may have
already been shared extensively, and this copying loses an obvious oppor-
tunity for further sharing [O'Donnell, 1977]. The natural alternative to
copying β is to assign repl(α) := β (represent the rewriting step by a link,
instead of rewriting in place). This avoids any loss of sharing, and if path
compression is applied in calculating valmaxrepl(α0), it appears to have an
acceptable cost in overhead for access to the heap. In many programs, a
more sophisticated implementation of right-side can save an exponentially
growing amount of wasted work by sharing identical subterms of r. Many
natural programs have such identical subterms of right-hand sides, and in
practice this makes the difference between efficiency and infeasibility often
enough to be well worth the effort. The cost of detecting identical subterms
of right-hand sides using the tree isomorphism algorithm [Aho et al., 1974]
is quite modest, and can be borne entirely at compile time.
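The following Python sketch illustrates the "link instead of copy" choice
discussed above for the rule car(cons(x, y)) = x mentioned in the text.
The dictionary representation and the helper name rewrite_car_cons are
assumptions for illustration, not part of any published implementation.

def rewrite_car_cons(alpha, val, repl):
    """If alpha is a car(cons(x, y)) redex, set repl(alpha) := x."""
    sig = val.get(alpha)
    if sig is None or sig[0] != 'car':
        return False
    arg = sig[1]
    while arg in repl:               # follow repl links to see the argument
        arg = repl[arg]
    arg_sig = val.get(arg)
    if arg_sig is None or arg_sig[0] != 'cons':
        return False
    repl[alpha] = arg_sig[1]         # link to x: no copying, sharing preserved
    return True

val  = {'a0': ('car', 'a1'), 'a1': ('cons', 'a2', 'a3'),
        'a2': ('a',), 'a3': ('nil',)}
repl = {}
rewrite_car_cons('a0', val, repl)
assert repl['a0'] == 'a2'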
4.3.3 Dynamic exploitation of sharing and memoing of ground
terms
There are several ways that the signature index ind may be used to im-
prove sharing dynamically and opportunistically at run time (these are
examples of transformations (6) and (7) of Section 4.1.2). A further so-
phistication of right-side builds the structure of the instance of r from
the leaves and variable instances up, and for each constructed signature
f(α1, ..., αn) it checks ind(f(α1, ..., αn)). If the result is nil, right-side
builds a new heap node β containing signature f(α1, ..., αn), and updates
ind by ind(f(α1, ..., αn)) := β. If the result is γ ≠ nil it shares the pre-
existing representation rooted at γ. This technique is essentially the same
as the hashed cons optimization in Lisp implementations [Spitzen et al.,
1978]. There are more places where it may be valuable to check the
signature index for opportunistic sharing. Immediately after line (6) of
head-eval, the signature at the root α of the heap representation of t may
have changed, due to rewriting of ti and path-compression. A sophisticated
implementation may check ind(val(α)), and if the result is a β other than
nil and α, reassign repl(α) := β, so that further path compression will
replace links pointing to α by links to β, thus increasing sharing. If
ind(val(α)) = nil, then reassign ind(val(α)) := α. Finally, when head-eval
is called recursively on a subterm rooted at the heap address α, the signature
at α may be modified as the result of rewriting of a shared descendant of α
that was visited previously by a path not passing through α. So, the same check-
ing of ind and updating of repl or ind may be performed in step (1) of
head-eval, after any implicit path compression, but before actually testing
for strong head-normal form.
The opportunistic sharing techniques described above have a nontriv-
ial cost in overhead added to each computation step, but in some cases
they reduce the number of computation steps by an exponentially growing
amount. See [Sherman, 1990] for more details on opportunistic sharing,
and discussion of the tradeoff between overhead on each step and number
of steps. The method that checks new opportunities for sharing at every
node visit as described above is called lazy directed congruence closure. An
even more aggressive strategy, called directed congruence closure [Chew,
1980], makes extra node visits wherever there is a chance that a change in
signature has created new sharing opportunities, even though the pattern-
matcher/sequencer has not generated a need to visit all such nodes (i.e.,
when the signature at a shared node changes, directed congruence closure
visits all parents of the node, many of which might never be visited again
by head-eval). Directed congruence closure may reduce the number of
steps in lazy directed congruence closure by an exponential amount, but
the added overhead appears heuristically large, and changes the structure
of the implementation significantly from the recursive schema of head-eval.
Opportunistic sharing strategies are generalizations of the idea of memo
functions [Keller and Sleep, 1986; Mitchie, 1968; Field and Harrison, 1988].
Most memoing techniques limit themselves to remembering equalities of
the form f(t1, ..., tn) = u, where the ti's and u are all in normal form.
Lazy directed congruence closure and its variants can remember partial
evaluations as well, as can the lazy memo functions of Hughes [Hughes,
1985a]. There is a nice theoretical characterization of the memoing power
of directed congruence closure.
Proposition 4.3.3 ([Chew, 1980]). At any step in the process of rewrit-
ing a term, there is a set G of ground instances of equations that have been
applied in rewriting. Using directed congruence closure, we never apply a
ground instance that is a semantic consequence of G.
Not only does directed congruence closure avoid ever doing the same
rewriting step twice in different contexts, it only performs a new rewriting
step when new substitution of terms for variables in the given equations is
absolutely necessary to derive that step.
4.3.4 Sharing and memoing nonground terms—paramodulation
The next natural step beyond directed congruence closure is sharing/
memoing nonground terms. Abstractly in theorem-proving language, this
amounts to applying paramodulation [Robinson and Wos, 1969; Loveland,
1978; Gallier, 1986] to derive new equations with variables, instead of using
demodulation to merely rewrite a given term. Paramodulation takes two
equations p = q[r/x] and s = t and substitutions σ_r, σ_s of terms for vari-
ables such that rσ_r = sσ_s, and derives the new equation pσ_r = qσ_r[tσ_s/x],
which follows from the first two by instantiation, substitution, and tran-
sitivity. Paramodulation sometimes improves the lengths of proofs by ex-
ponentially growing amounts compared to pure term rewriting [Loveland,
1978].
Example 4.3.4. Consider the set Trev of equations defining list reversal
(rev) and appending an element to the end of a list (app).

Trev = { rev(nil) = nil,
         rev(cons(x, y)) = app(x, rev(y)),
         app(x, nil) = x,
         app(x, cons(y, z)) = cons(y, app(x, z)) }

The number of rewriting steps required to append an element to a list
of i elements is i + 1. But, consider applying paramodulation instead.
One step of paramodulating the last equation in Trev with itself yields
app(w, cons(x, cons(y, z))) = cons(x, cons(y, app(w, z))), which moves the
appended element w past the first two elements in the list. The new equa-
tion paramodulates again with app(x, cons(y,z)) = cons(y, app(x, z)) to
move the appended element past three list elements, but paramodulat-
ing the new equation with itself does even better, moving the appended
element past four list elements. It is straightforward to produce a sequence
of equations, each by paramodulating the previous one with itself, so that
the ith equation in the sequence moves an appended element past 2^i list
elements. So, paramodulation reduces the number of steps involved in nor-
malizing app(α, β), where β is a list of i elements, from i + 1 to O(log i).
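One way to see how such a derived equation arises, worked out as a single
paramodulation step using only the equations of Trev: instantiate z to
cons(y', z') in app(x, cons(y, z)) = cons(y, app(x, z)), obtaining
app(x, cons(y, cons(y', z'))) = cons(y, app(x, cons(y', z'))); then replace the
inner occurrence of app(x, cons(y', z')) on the right-hand side by
cons(y', app(x, z')), using the same equation again. The result,
app(x, cons(y, cons(y', z'))) = cons(y, cons(y', app(x, z'))), is the
two-element equation quoted above, up to renaming of variables.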
The improvement in normalizing rev(α) is less dramatic, but perhaps
more useful in practice. If α has i elements, then rewriting requires Ω(i²)
steps to normalize rev(α), because it involves appending to the end of a
sequence of lists of lengths 0,1,..., i. But, we may produce a sequence of
equations, each by paramodulating the previous one with

so that the ith equation in the sequence appends an arbitrary element to a


list of length i. The whole sequence requires only i steps of paramodulation,
so a list of i elements may be reversed by O(i) paramodulation steps.
There are several problems in reducing the ideas on paramodulation
suggested in Example 4.3.4 to useful practice. First, we need a way to
control paramodulation—that is, to choose useful steps of paramodulation
to perform (equivalently, useful derived nonground equations to save and
use along with the originally given program equations for further evalua-
tion steps). Sequential strategies for lazy evaluation appear to solve this
problem in principle. Whenever a sequential rewriting process discovers
that a ground term s rewrites in one step to a ground term t, and existing
links in the repl table rewrite t further to u, ground sharing techniques
such as lazy directed congruence closure in effect save the derived ground
equation s = u for later use as a single step (this is accomplished by setting
repl(α) := γ where t = val*repl(α) and u = val*repl(γ)). A method for non-
ground sharing should save instead the nonground equation s' = u', where
s' is the generalization of s containing all of the symbol instances in s that
were scanned by the sequential rewriting process in order to rewrite s to u,
with unique variables replacing the unscanned portions of s. The appropri-
ate generalization u' of u is easy to derive from the same information that
determines s'. The equation s' = u' may be derived by paramodulation,
using the same equations that were used to rewrite s to u. This strategy
for driving paramodulation by the same sequentializing process that drives
rewriting accomplishes the improvement of list reversal from quadratic to
a linear number of inference steps in Example 4.3.4 above.
A second problem is to deal efficiently with the more general sets of
equations that we get by adding the results of paramodulation to the orig-
inal program equations. Notice that, even when the original program is
orthogonal, there are overlaps between the original equations and the re-
sults of paramodulation. When several different equations apply to the
same subterm, we probably should choose the most specific one, but that
may not always be uniquely determined (see Figure 4 in Section 3.2.4 for
the different sorts of overlap). At least in Example 4.3.4 above, choosing
the most specific applicable equation accomplishes the improvement of list
appending from linear to logarithmic.
Finally, we need an efficient data structure for representing sets of non-
ground equations, with a good way to add the new equations that result
from paramodulation. Suffix trees [Aho et al., 1974] appear to provide a
good basis for representing dynamically increasing sets of patterns and/or
equations [Strandh, 1984], but it is not at all clear how to generalize these
ideas efficiently to terms instead of strings (a good dissertation project:
implement string rewriting with as much sharing as possible at both ends
of a substring—this is essentially nonground sharing with only unary func-
tions). A more detailed examination of Example 4.3.4 above reveals that,
although the number of abstract steps required to reverse a list of length i
using paramodulation is only O(i), each abstract step involves an equation
with a larger left-hand side than the one before. So, while the number of
steps decreases, the cost of pattern-matching in each step increases, and
the total cost for reversal remains Ω(i²). The exponential improvement
in appending to the end of a list still yields a net win, but it is not clear
how often such exponential improvements will arise in practice. In order
to decrease the total cost of reversal to O(i), we need a way to make the
pattern matching process incremental, so that the work done to match a
smaller pattern may be reused in matching a larger pattern derived from
it. I cannot find a reason why such efficient incremental pattern matching
is impossible, but no known technique accomplishes it today.
Another starting point for research on nonground sharing is the work
on sharing in the lambda calculus [Staples, 1982; Lamping, 1990; Kathail,
1984; Gonthier et al., 1992]. Some surprisingly powerful data structures
have been discovered already, but the overhead of using them is still not
understood well enough to determine their practical impact.
4.3.5 Storage allocation
Essentially all of the known ideas for automatic storage allocation and
deallocation are applicable to the val/repl/ind heap [Cohen, 1981; Ap-
pel, 1991]. Unfortunately, one of the most attractive methods in current
practice—generational garbage collection [Lieberman and Hewitt, 1983]—
has been analyzed on the assumption that it is rare for old nodes in the
heap to point to much more recently allocated ones. The repl links in lazy
implementations of equational logic programming appear to violate that as-
sumption. I know of no experimental or theoretical study of the efficiency
of generational garbage collection with lazy evaluation. The aggressive
sharing techniques also call into question the normal definition of 'garbage'
nodes as those that cannot be reached starting from some set of directly
accessible root nodes. In the discussion above, the directly accessible nodes
are α0 and all of the nodes entered in the ind table. With the more ag-
gressive approaches to sharing, every node in the heap is usually accessible
from nodes in ind, so there is no garbage according to the strict defini-
tion. On the other hand, nodes that are inaccessible from α0 (the root of
the term being rewritten to normal form) contain information that may be
recalculated from the α0-accessible nodes and the equational program, so,
while they are not useless garbage, they are not essential to the calculation
of a final result. I am not aware of any published literature on deallocation
of useful but inessential nodes in a heap, which might be called 'rum-
mage sale' instead of 'garbage collection.' Phillip Wadler proposed that a
garbage-collection/rummage-sale procedure might also perform rewriting
steps that reduce the size of the given term. Such a technique appears
to provide the space-saving benefits of strictness analysis [Mycroft, 1980;
Hughes, 1985b], avoiding both the compile-time cost and the nonunifor-
mity in the run-time control. Research is needed on the practical impact
of various heuristic strategies for looking ahead in a rewriting sequence in
search of a smaller version of the current term.

5 Compiling efficient code from equations


Conventional approaches to compiling functional programs, following early
Lisp compilers, associate a block of machine code with each function sym-
bol f, and use the recursive schema for eval in Figure 8, with some rep-
resentation of suspensions to encode partially evaluated terms as a new
sort of value [Peyton Jones, 1987]. A natural and less explored alterna-
tive is to associate a block of machine code with each state in a finite
automaton whose states are derived from an analysis of the program using
Ω-terms or subpattern sets [Bondorf, 1989; Durand et al., 1991]. The state-
based approach has no explicit representation of suspensions, although in
effect terms that it must build in the heap are implicit suspensions. And,
a naive implementation of the state-based approach leads to more and
smaller recursive procedures than the symbol-based approach. Of course,
sophisticated symbol-based compilers may optimize by compiling multiple
specialized versions of a single function for different sorts of arguments,
and sophisticated state-based compilers may optimize by inlining some of
the recursive calls. The supercombinator method [Hughes, 1982] introduces
new symbols internal to the implementation, reducing the dependence of
symbol-based compiling on the actual symbols in the source program. Af-
ter such optimizations, it is not clear whether symbol-based compiling and
state-based compiling will actually generate different code. The function
symbols appear to be conceptually tied to the programmer's view of her
program, while the automaton states appear to be conceptually tied to the
computational requirements of the program. I conjecture that in the long
run the state-based approach will prove more convenient for achieving the
best run-time performance, while the symbol-based approach will maintain
a more intuitive connection between program structure and performance.
Both the function-based and state-based methods compile the informa-
tion in the set of equations constituting a program, but they both treat
the input term and all of the intermediate terms produced in rewriting to
normal form as static data structures, which must be interpreted by the
compiled equational code. The Tigre system [Koopman and Lee, 1989;
Koopman, 1990] by contrast compiles each function symbol in the term
being evaluated into a block of code—the entire term becomes a self mod-
ifying program that reduces itself to normal form. Tigre compiles only a
particular extension of the combinator calculus; an interesting generaliza-
tion would be a Tigre-like compiler for an arbitrary orthogonal system.
Several abstract machine languages have been proposed as intermediate
languages for equational compilers, or as real machine languages for spe-
cialized term-rewriting machines. Landin's SECD machine [Landin, 1965]
is designed to support evaluation of lambda terms. The G Machine [Johns-
son, 1984; Augustsson, 1984; Burn et al., 1988] is intended as a specialized
machine language or intermediate language for term rewriting in general.
Equational Machine (EM) code [Durand et al., 1991; Strandh, 1988] is
designed to support state-based compiling of equational programs, and
the optimizations that seem most important in that approach. Several
other abstract machine proposals are in [Cardelli, 1983; Cardelli, 1984;
Cousineau et al., 1985]. Turner proposes to compile equational systems
into the combinator calculus, using the well-known capability of the com-
binator calculus to encode substitution in the lambda calculus [Turner,
1979]. Then, an implementation of the combinator calculus, fine tuned
for performance, can support essentially all equational logic programming
languages. The impact of such translations of one rewriting system to an-
other is ill understood—in particular the effects on sharing and parallelism
(see Section 6) are quite subtle. The Warren Abstract Machine (WAM)
[Warren, 1983], intended as an intermediate language for Prolog compilers,
has also been used in theorem provers as a representation for terms—t is
represented by WAM code to unify an arbitrary input with t. I am not
aware of any published study of the applicability of such a WAM encoding
of terms to functional or equational logic programming.
6 Parallel implementation
One of the primary early motivations for studying functional programming
languages was the apparent opportunities for parallel evaluation [Backus,
1978]. So, it is ironic that most of the effort in functional and equational
logic programming to date involves sequential implementation. A variety
of proposals for parallel implementation may be found in [Peyton Jones,
1987].
Both strictness analysis and sequentiality analysis are used primarily
to choose a sequential order of computation that avoids wasted steps. An
important open topic for research is the extension of sequentiality analysis
to support parallel computation. Huet and Levy's sequentiality analysis
is already capable of identifying more than one strongly needed redex in
a term. A parallel implementation might allocate processors first to the
strongly needed redexes, and then to other more speculative efforts. It ap-
pears that sequentiality analysis can be generalized rather easily (although
this has not been done in print) to identify strongly needed sets of re-
dexes, where no individual redex is certain to be needed, but at least one
member of the set is needed. For example, with the positive parallel-or
equations in Tor+ of Examples 2.3.16 and 3.2.1, it is intuitively clear that
if α is needed in t, and β is needed in u, then at least one of α and β must
be rewritten in or(t, u) (although neither is needed according to the for-
mal Definition 3.2.3). Further research is required on the practical impact
of heuristic strategies for allocating parallel processors to the members of
strongly needed sets. It is natural to give priority to the singleton sets
(that is, to the strongly needed redexes), but it is not clear whether a set
of size 2 should be preferred to one of size 3—perhaps other factors than
the size of the set should be considered. Strongly needed redex sets are
essentially disjunctive assertions about the need for redexes—more general
sorts of boolean relations may be useful (e.g., either all of α1, ..., αm or
all of β1, ..., βn are needed).
Unfortunately, since strongly needed redexes are all outermost, sequen-
tiality analysis as known today can only help with parallelism between
different arguments to a function. But, one of the most useful qualities of
lazy programming is that it simulates a parallel producer-consumer rela-
tionship between a function and its arguments. It seems likely that much
of the useful parallelism to be exploited in equational logic programming
involves parallel rewriting of nested redexes. An analysis of nonoutermost
needed redexes appears to require the sort of abstract interpretation that is
used in strictness analysis [Mycroft, 1980; Hughes, 1985b]—it certainly will
depend on right-hand sides as well as left-hand sides of equations. Unfor-
tunately, most of the proposed intermediate languages for compiling equa-
tional programs are inherently sequential, and a lot of work is required to
convert current sequential compiling ideas to a parallel environment. The
idea of compiling everything into combinators may not be useful for parallel
implementation. The known translations of lambda terms into combinators
eliminate some apparent parallelism between application of a function and
rewriting of the definition of the function (although no direct implemen-
tation is known to support all of this apparent parallelism either). Only
very preliminary information is available on the inherent parallel power of
rewriting systems: even the correct definition of such power is problematic
[O'Donnell, 1985].
At a more concrete level, there are a lot of problems involved in par-
allelizing the heap-based execution of evaluation/rewriting sequences. A
data structure, possibly distributed amongst several processors, is needed
to keep track of the multiple locations at which work is proceeding. If any
speculative work is done on redexes that are not known to be needed, there
must be a way to kill off processes that are no longer useful, and reallocate
their processing to useful processes (although aggressive sharing through
the signature index confuses the question of usefulness of processes in the
same way that it confuses the usefulness of heap nodes).
Sharing presents another challenge. Several different processors may
reach a shared node by different paths. It is important for data integrity
that they do not make simultaneous incompatible updates, and important
for efficiency that they do not repeat work. But, it is incorrect for the
first process to lock all others completely out of its work area. Suppose we
are applying a system of equations including car(cons(x, y)) = x. There
may be a node α in the heap containing the signature cons(β₁, β₂), shared
between two parents, one of them containing cons(α, δ) and the other con-
taining car(α). A process might enter first through the cons parent, and
perform an infinite loop rewriting inside β₂. If a second process enters
through the car node, it is crucial to allow it to see the cons at α, to link
to β₁, and depending on the context above possibly to continue rewriting
at β₁ and below.
Abstractly, we want to lock node/state pairs, and allow multiple pro-
cesses to inspect the same node, as long as they do so in different states,
but when a process reaches a node that is already being processed in the
same state it should wait, and allow its processor to be reassigned, since
any work that it tries to do will merely repeat that of its predecessor. It is
not at all clear how to implement such a notion of locking with acceptable
overhead. Perhaps a radically different approach for assigning work to pro-
cessors is called for, that does not follow the structure of current sequential
implementations so closely. For example, instead of the shared-memory ap-
proach to the heap in the preceding discussion, perhaps different sections
of the heap should be assigned to particular processors for a relatively long
period, and demands for evaluation should be passed as messages between
processors when they cross the boundaries of heap allocation.
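Returning to the node/state locking idea above, the following is a minimal sketch, in Haskell for concreteness, of how the lock table itself might be organized. None of this comes from the text: NodeId, State, Locks, acquire and release are hypothetical names, the STM-based table is only one possible representation, and all of the actual rewriting machinery is omitted.

    import Control.Concurrent.STM
    import qualified Data.Map as M

    type NodeId = Int                  -- hypothetical heap-node identifier
    type State  = Int                  -- hypothetical pattern-matching state
    type Locks  = TVar (M.Map (NodeId, State) ())

    -- Claim the (node, state) pair.  A process that finds the pair already
    -- claimed blocks (so its processor can be reassigned) instead of
    -- repeating its predecessor's work; processes visiting the same node in
    -- different states proceed concurrently, as the car/cons example requires.
    acquire :: Locks -> NodeId -> State -> IO ()
    acquire locks n s = atomically $ do
      m <- readTVar locks
      if M.member (n, s) m
        then retry                     -- wait until the pair is released
        else writeTVar locks (M.insert (n, s) () m)

    release :: Locks -> NodeId -> State -> IO ()
    release locks n s = atomically $ modifyTVar' locks (M.delete (n, s))

    main :: IO ()
    main = do
      locks <- newTVarIO M.empty
      acquire locks 0 0                -- begin work on node 0 in state 0
      release locks 0 0

Whether bookkeeping of this kind can be made cheap enough is exactly the open question raised above; the sketch fixes only the abstract interface.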

7 Extensions to equational logic programming


7.1 Incremental infinite input and output
The logic programming approach to answering normal form queries leads
naturally to the use of lazy evaluation to guarantee completeness. But,
once we commit to lazy evaluation, the basic scheme of normal form queries
seems too limited. Inside an equational computation, it is natural to con-
ceive of procedures consuming and producing potentially infinite terms in-
crementally, in the manner of communicating coroutines. It is perfectly sen-
sible to define objects that correspond intuitively to infinite terms, as long
as only finite portions of these infinite terms are required to produce output.
It is annoying that the full flexibility of incremental demand-driven com-
putation in the internal computation is not available at the input/output
interface. Aside from the loss of input/output programming power per se,
such a discrepancy between internal and external communication tends to
encourage large, unwieldy, monolithic programs, and discourage the col-
lections of small, separate, modular programs that are more desirable in
many ways.
The desired abstract behavior of a demand-driven incremental I/O in-
terface is essentially clear. The consumer of output demands the symbol
at the root of the output. The equational program computes until it has
produced a head-normal form equivalent to the input, then it outputs the
root symbol of that head-normal form, and recursively makes the principal
subterms available to the consumer of output, who can demand symbols
down different paths of the output tree in any order. During the compu-
tation of a head-normal form, the equational program generates demands
for symbols in the input. The producer of input is responsible only for
providing those input symbols that are demanded. In this way, both input
and output terms are treated incrementally as needed, and each may be
infinite.
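The intended behaviour at the interface is the one already familiar from lazy functional languages, moved out to input and output. As a loose Haskell analogy only (not the equational language of this chapter, and with a list standing in for an infinite term):

    -- `stream` denotes an infinite structure, but only the symbols actually
    -- demanded by the consumer are ever produced.
    stream :: [Integer]
    stream = 0 : map (+ 1) stream     -- conceptually 0, 1, 2, 3, ...

    main :: IO ()
    main = print (take 5 stream)      -- demands five symbols; prints [0,1,2,3,4]

The point of this section is to make the same demand-driven discipline available for equational input and output terms, not just for data structures inside a program.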
A direct generalization of equational logic programming by extending
the set of terms to include infinite ones is semantically problematic.
Example 7.1.1. Consider the system of equations

The natural infinite output to expect from input a is f(c, f(c,...)). b pro-
duces the same infinite output. So, if in our semantic system f(c, f(c,...))
is a generalized term, with a definite value (no matter what that value is),
then T27 ⊨ (a = b). But, according to the semantics of Definition 2.3.14
in Section 2.3.4 of the chapter 'Introduction: Logic and Logic Program-
ming Languages', T27 ⊭ (a = b), because there are certainly models in
which the equation x = f(c, x) has more than one solution. For example,
interpret f as integer addition, and c as the identity element zero. Every
integer n satisfies n = 0 + n.
Even if we replace the second equation with the exact substitution of b
for a in the first, that is b = f(c, b), it does not follow by standard equational
semantics that a = b. With some difficulty we may define a new semantic
system, with a restricted set of models in which all functions have sufficient
continuity properties to guarantee unique values for infinite terms. But it is
easy to see that the logical consequence relation for such a semantic system
is not semicomputable (recursively enumerable). It is always suspect to say
that we understand a system of logic according to a definition of meaning
that we cannot apply effectively.
Instead, I propose to interpret outputs involving infinite terms as ab-
breviations for infinite conjunctions of formulae in the first-order predicate
calculus (FOPC) (such infinite conjunctions, and infinite disjunctions as
well, are studied in the formal system called Lω₁,ω [Karp, 1964]).
Definition 7.1.2. The first-order predicate calculus with equality (FOPC=)
is the result of including the equality symbol = in the set Pred₂ of binary
predicate symbols, and combining the semantic rules for FOPC (Defini-
tions 2.3.1-2.3.4, Section 2.3.1 of the chapter 'Introduction: Logic and
Logic Programming Languages') and Equational Logic (Definitions 2.3.13-
2.3.14, Section 2.3.4 of the 'Introduction ...' chapter) in the natural way
(a model for FOPC= satisfies the restrictions of a model for FOPC and the
restrictions of a model for equational logic).
Lω₁,ω= is the extension of FOPC= to allow countably infinite conjunc-
tions and disjunctions in formulae. Let Fω₁,ω= be the set of formulae
in Lω₁,ω=. Extend the semantic system for FOPC= to a semantic system
for Lω₁,ω= by the following additional rules for infinite conjunctions and
disjunctions:
1. ρT,ν(⋀{A₁, A₂, ...}) = 1 if and only if ρT,ν(Aᵢ) = 1 for all i ≥ 1
2. ρT,ν(⋁{A₁, A₂, ...}) = 1 if and only if ρT,ν(Aᵢ) = 1 for some i ≥ 1
That is, an infinite conjunction is true precisely when all of its conjuncts
are true; and an infinite disjunction is true precisely when at least one of
its disjuncts is true.
A set T of terms is a directed set if and only if, for every two terms
t₁, t₂ ∈ T, there is a term t₃ ∈ T such that t₃ is an instance of t₁ and also
an instance of t₂.
Let U be a countable directed set of linear terms. For each u ∈ U, let
y_u be a list of the variables occurring in u. Let t be a finite ground term.
The conjunctive equation of t and U is the infinite conjunction

⋀{ ∃y_u : t = u | u ∈ U }

T_P is the set of finite and infinite terms.



The term limit of a directed set U (written lim(U)) is the possibly infi-
nite term resulting from overlaying all of the terms u ∈ U, and substituting
the new symbol ⊥ for any variables that are not overlaid by nonvariable
symbols. Since members of U are pairwise consistent, every location in the
limit gets a unique symbol, or by default comes out as ⊥. To see more
rigorously that the term limit of U is well defined, construct a (possibly
transfinite) chain (t₀ ⊑ t₁ ⊑ ⋯). First, t₀ = ⊥. Given t_α, choose (axiom
of choice) an s ∈ U such that s ⋢ t_α, and let t_{α+1} be the overlay of s with
t_α. At limit ordinals λ, let t_λ be the limit of the chain of terms preceding
λ. With a lot of transfinite induction, we get to a t_β such that, for all
s ∈ U, s ⊑ t_β, at which point the chain is finished. lim(U) is the limit of
that chain.
The canonical set of an infinite term t (written approx(t)) is the set of
all finite linear terms t′ (using some arbitrary canonical scheme for naming
the variables) such that t is an instance of t′. approx(t) is a directed set,
and lim(approx(t)) = t.
A careful formal definition of infinite terms and limits is in [Kenneway
et al., 1991; Dershowitz et al., 1991], but an intuitive appreciation suffices
for this section.
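As a small worked example (not one from the text), consider the chain of linear terms

    U = { f(x₁, y₁), f(g(x₂), y₂), f(g(g(x₃)), y₃), ... }

Each term is an instance of all the earlier ones, so U is directed. Overlaying them fills in ever deeper occurrences of g in the first argument, while the second argument is never overlaid by a nonvariable symbol and therefore defaults to ⊥:

    lim(U) = f(g(g(g(...))), ⊥)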
When the finite input t produces the infinite output lim(U), instead
of interpreting the output as the equation t = lim(U), interpret it as the
conjunctive equation of t and U. Notice that a chain (a sequence u₁, u₂, ...
such that u_{i+1} is an instance of u_i) is a special case of a directed set.
The infinite outputs for finite inputs may be expressed by chains, but the
generality of directed sets is needed with infinite outputs for infinite inputs
below.
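For instance, on this reading the infinite output f(c, f(c, ...)) for the finite input a of Example 7.1.1 is not asserted as a single generalized equation a = f(c, f(c, ...)). It is given by a chain such as

    U = { f(c, y₁), f(c, f(c, y₂)), f(c, f(c, f(c, y₃))), ... }

and abbreviates the infinite conjunction

    (∃y₁ : a = f(c, y₁)) ∧ (∃y₂ : a = f(c, f(c, y₂))) ∧ ⋯

in which every conjunct is an ordinary finite formula, so no infinite term ever needs to be assigned a value of its own.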
Infinite inputs require a more complex construction, since they intro-
duce universal quantification that must be nested appropriately with the
existential quantification associated with infinite output. Also, different
orders in which a consumer explores the output may yield different se-
quences of demands for input, and this flexibility needs to be supported by
the semantics.
Definition 7.1.3. Let T and U be directed sets of linear terms such that
no variable occurs in both t ∈ T and u ∈ U. For each term t ∈ T and u ∈ U,
let x_t and y_u be lists of the variables occurring in t and u, respectively. Let
f : T → 2^U be a function from terms in T to directed subsets of U, such
that when t₂ is an instance of t₁, every member of f(t₂) is an instance of
every member of f(t₁). The conjunctive equation of T and U by f is the
infinite conjunction

⋀{ ∀x_t : ∃y_u : t = u | t ∈ T and u ∈ f(t) }
Definition 7.1.4. Let s₁, ..., sᵢ ∈ T_P be terms. A term u is an incre-
mental normal form for {s₁, ..., sᵢ} if and only if no nonvariable subterm
of u unifies with a term in {s₁, ..., sᵢ}. Equivalently, u is an incremen-
tal normal form if and only if u is a normal form and every substitution
of normal forms for variables leaves u in normal form. Equivalently, u is
an incremental normal form if and only if melting preserves u_Ω, where u_Ω
results from substituting Ω for each variable in u.
Let norm∞ be a new formal symbol. Let Q∞ be the set of queries of the
form (norm∞ s₁, ..., sᵢ · t_ω) with s₁, ..., sᵢ, t_ω ∈ T_P. Define the relation
?-∞ ⊆ Q∞ × Fω₁,ω= by

(norm∞ s₁, ..., sᵢ · t_ω) ?-∞ A
if and only if there exists a directed set U and a monotonic (in the instance
ordering ⊑) function f : approx(t_ω) → 2^U such that
1. A is the conjunctive equation of approx(t_ω) and U by f;
2. u is in incremental normal form, for all u ∈ U.
Now (Fω₁,ω=, Q∞, ?-∞) is a query system representing the answers to
questions of the form 'what set of incremental normal forms for s₁, ..., sᵢ
is conjunctively equal to t?' In an implementation of equational logic pro-
gramming based on norm∞, the consumer of output generates demands
causing the computation of some u ∈ U. Some of the symbols in u are
demanded directly by the consumer, others may be demanded by the sys-
tem itself in order to get an incremental normal form. The implementation
demands enough input to construct t ∈ approx(t_ω) such that u ∈ f(t). By
modelling the input and output as directed sets, rather than sequences, we
allow enough flexibility to model all possible orders in which the consumer
might demand output. The function f is used to model the partial synchro-
nization of input with output required by the semantics of equational logic.
Unfortunately, the trivial output y, y, ..., with term limit ⊥, always satisfies
1-2 above. The most useful implementation would provide consequentially
strongest answers (Definition 2.2.5, Section 2.2 of the chapter 'Introduction:
Logic and Logic Programming Languages')—that is, they would demand
the minimum amount of input semantically required to produce the de-
manded output. If consequentially strongest answers appear too difficult
to implement, a more modest requirement would be that lim(U) is max-
imal among all correct answers—that is, all semantically correct output
is produced, but not necessarily from the minimal input. The techniques
outlined above do not represent sharing information, either within the in-
put, within the output, or between input and output. Further research is
needed into semantic interpretations of sharing information.
Example 7.1.5. Consider the system

and the infinite input term

and the query

The natural desired answer A to this query is the conjunction of

A represents the infinite output t′_ω = h(h(g(t′_ω, t′_ω))). U does not contain
all approximations to t′_ω, but only the ones that have g's as the rootmost
non-⊥ symbols. The monotone function mapping partial inputs to the
portions of output that they determine is:

A natural approach to representing sharing information would be to use
∀x₁, x₂ : f(g(x₁, x₂)) = h(h(g(x₁, x₂))) instead of ∀x₁, x₂ : ∃y₁, y₂ :
f(g(x₁, x₂)) = h(h(g(y₁, y₂))), etc., but this idea has yet to be explored.
It is simple in principle to modify an implementation of equational logic
programming to deal incrementally with infinite input queries and output
answers in the query system Q∞. The practical details of such an implemen-
tation are interesting and challenging. The Tours protocol [Rebelsky, 1992;
Rebelsky, 1993] provides one design for incremental input and output of in-
finite terms. Implementations based on the recursive scheme for head-eval
in Figure 9 may be adapted relatively easily to incremental input and out-
put, simply by replacing the scheme for norm in Figure 10 by a scheme that
issues calls to head-eval only as symbols are demanded by the consumer
of output. Unfortunately, such adaptations are not normally complete, be-
cause of sequentializing problems. The trouble is that conjunctive behavior
(which is easy to sequentialize) in the application of a rule corresponds to
disjunctive behavior (which is impossible to sequentialize) in the produc-
tion of strong head-normal forms, and vice versa.
Example 7.1.6. Consider the following equation defining the negative
sequential-or:

or(false, false) = false
This equation causes no problem in sequentializing the rewriting of a term
to a finite normal form, since it requires both arguments to or. But, or(s, t)
is a strong head-normal form if and only if either of s or t is a strong head-
normal form ≠ false. So, it is not safe to rewrite s, because of the possibility
that s has no strong head-normal form, while t rewrites to strong head-
normal form u ≠ false. For symmetric reasons, it is not safe to rewrite t.
Only a parallel rewriting of both s and t is complete, and such parallel
rewriting is very likely to waste steps.
By contrast, the positive parallel-or equations T_or+ of Examples 2.3.16
and 3.2.1 prevent sequential computation of a finite normal form, but pose
no problem to sequential head-normalization of or(s, t) by rewriting s and
t—both arguments must rewrite to strong head-normal forms ≠ true in
order to head-normalize or(s,t).
So, to achieve complete implementations of equational logic program-
ming with incremental input and output, we must solve the sticky problems
of efficient implementation of parallel rewriting, even if the parallelism is
simulated by multitasking on a sequential processor. In the meantime, se-
quential and incomplete implementations are likely to be practically useful
for many problems—the incompleteness appears to be no more serious than
the incompleteness of Prolog, which is widely accepted.
Beyond the incremental evaluation of an input that increases over time,
there is an even more challenging problem to evaluate a changing input. In
principle, changing input may be reduced to increasing input by represent-
ing the entire edit history of a term as another term [Rebelsky, 1993]. In
practice, there is a lot of research required to achieve practical efficiency.
A highly efficient general scheme for re-evaluation after a change in in-
put will be extremely valuable. There are several techniques known for
re-evaluation, in special structures, such as attributed parse trees [Demers
et al., 1981; Pugh and Teitelbaum, 1989], but none approaches the gener-
ality or fine-grainedness of equational re-evaluation. A late result for the
λ-calculus that I have not had a chance to review is in [Field and Teit-
elbaum, 1990]. Powerful memo techniques seem to be crucial to efficient
re-evaluation of a changed input term. In some sense efficient re-evaluation
is precisely the intelligent reuse of partial results from the previous evalua-
tion. I conjecture that nonground memoing will be especially valuable for
this problem.
Finally, we should study implementations of equational logic program-
ming in which the set of equations is changing. Changing sets of equations
arise in rapid recompiling during program development, in applications
that use equations as inputs rather than as programs, and in methods
based on nonground memoing or paramodulation (Section 4.3.4). Very lit-
tle is known about this problem—all that I am aware of is a technique
for incremental maintenance of a finite automaton for pattern matching
[Strandh, 1984].

7.2 Solving equations


Queries of the form (solve x₁, ..., xₙ · s = t) (Definition 2.3.18, Section
2.3.4 of the chapter 'Introduction: Logic and Logic Programming Lan-
guages') require the solution of the equation s = t for the values of x₁, ..., xₙ.
This appears to be much more difficult than finding normal forms, and to
offer more programming power. In particular, it appears that solving equa-
tions is the essential problem in the most natural approach to combining
the useful qualities of Prolog and lazy functional programming. Notice that
solve queries are quite similar to the what queries processed by Prolog
(Section 2.3.1 of the 'Introduction . . . " chapter). A combination of the
methods for solving equations with the techniques of Prolog should pro-
vide a way of answering what queries in the first-order predicate calculus
with equality.
Much of the theory of term rewriting falls short of dealing with the
problems posed by equation solving. Confluence appears to help—if the
given set T of equations is confluent, then we may seek an instance s' of
s and t' of t, and a term u, such that s′ →* u and t′ →* u, ignoring the
symmetric rule and applying equations only from left to right. But, for
a complete implementation, it is not sufficient to rewrite needed redexes.
That is, it may be that there is a u such that s′ →* u and t′ →* u, but no
such u may be reached by rewriting only needed redexes (Definition 3.2.3,
Section 3.2.3). Outermost complete rewriting (Definition 3.1.5, Section 3.1)
is similarly insufficient. Complete rewriting sequences (Definition 3.1.3)
come a bit closer. If s' and t' have a common form, then a complete
rewriting sequence starting with s' is guaranteed eventually to produce
a term u such that t′ →* u. But, another complete rewriting sequence
starting with t' is not guaranteed to produce the same u. Finally, even if
we generate rewriting sequences from s' and t' that both contain u, it is
not clear how to synchronize the sequences so that u appears in both of
them at the same time.
No thoroughly satisfactory method is known for a complete implemen-
tation of equation solving in an arbitrary strongly sequential system. But,
enough is known to make two observations that are likely to be useful in
such an implementation.
1. Suppose we are solving f(s₁, ..., sₘ) = g(t₁, ..., tₙ), where f ≠ g. It
suffices to rewrite only needed redexes until the head symbols agree.
Unfortunately, it is hard to know whether to rewrite f(s₁, ..., sₘ)
or g(t₁, ..., tₙ) or both, but within each we can use the same sort of
sequentializer that rewrites a single term to normal form.
2. There are only two cases in which it is helpful to instantiate a variable:
(a) when the instantiation creates a redex at a proper ancestor of
the variable, which we immediately rewrite (so, in the situation
of (1) above, the redex must be needed);
(b) when the instantiation unifies corresponding subterms of s and
t.
In each case, we should substitute the most general (smallest) term
that achieves the redex or the unification.
Jayaraman noticed (1) [Jayaraman, 1985], and used it to implement a
system called EqL for solving equations s = t only in the case where in-
stances s' of s and t' of t reduce to a common form consisting entirely of
constructor symbols in a constructor-orthogonal system (Definition 2.3.13,
Section 2.3.2). For this purpose, it suffices to rewrite s' and t' to head-
normal forms. If the head symbols agree, recursively solve equations be-
tween the corresponding arguments, otherwise there is no solution. It
should be straightforward to generalize EqL to find solutions in normal
form in strongly sequential rewrite- and rule-orthogonal systems. If we
want all solutions, even those that are not in normal form, we must some-
how explore in parallel the attempt to rewrite the head symbol of s' and
the attempt to solve the equation without head rewriting one or both of
them (and a similar parallelism for t'). This introduces all of the problems
of parallelism discussed in Section 6, and it probably limits or eliminates
the ability to do path compression on repl links (Section 4.3), since it is not
sufficient to work only on the most rewritten form of a term (normally
reached by following repl links from α₀, where α₀ is the root of the heap
representation of the term being rewritten).
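The head-agreement loop just described can be sketched roughly as follows, in Haskell for concreteness. This is only an illustration of the control structure, not EqL itself: Term, Subst, headNormal and solve are hypothetical names, headNormal stands in for the lazy rewriting engine (here a stub), there is no occurs check, and none of the parallelism discussed above is attempted.

    import qualified Data.Map as M

    -- Hypothetical term representation: variables and applied symbols.
    data Term = Var String
              | App String [Term]
      deriving (Eq, Show)

    type Subst = M.Map String Term

    -- Stub for the rewriting engine: a real implementation would rewrite
    -- needed redexes lazily until a strong head-normal form is reached.
    headNormal :: Subst -> Term -> Term
    headNormal _ t = t

    -- Head-normalize both sides; if the head symbols agree, recurse on the
    -- corresponding arguments, otherwise bind a variable or report failure.
    solve :: Subst -> Term -> Term -> Maybe Subst
    solve sub s t =
      case (headNormal sub s, headNormal sub t) of
        (Var x, u) -> Just (M.insert x u sub)      -- no occurs check here
        (u, Var x) -> Just (M.insert x u sub)
        (App f ss, App g ts)
          | f == g && length ss == length ts -> solveAll sub (zip ss ts)
          | otherwise                        -> Nothing
      where
        solveAll acc []              = Just acc
        solveAll acc ((a, b) : rest) = solve acc a b >>= \acc' -> solveAll acc' rest

    main :: IO ()
    main = print (solve M.empty (App "cons" [Var "x", App "nil" []])
                                (App "cons" [App "0" [], Var "y"]))

With the stub this only decomposes constructor terms; the interesting cases arise precisely when headNormal must rewrite, and when neither the head-rewriting nor the argument-solving alternative can safely be abandoned, which is the parallelism discussed above.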
Heuristically, it appears that the majority of the individual steps in
solving an equation are unlikely to create such parallelism. But, EqL's uni-
form requirement of finding a solution in normal form is likely to introduce
the same huge inefficiencies in some cases as strict evaluation (notice that
the individual evaluations of s', t' to normal form are lazy, but the equation
solution is not lazy, as it may take steps that are required to reach normal
form, but not to solve the equation). If these heuristic observations are ac-
curate, even an implementation with rather high overhead for parallelism
may be very valuable, as long as the cost of parallelism is introduced only
in proportion to the actual degree of parallelism, rather than as a uniform
overhead on sequential evaluation as well.
Among the theoretical literature on term rewriting, the most useful
material for equation solving will probably be that on narrowing [Fay,
1979] (see the chapter 'Equational Reasoning and Term Rewriting Systems'
in Volume 1), which is a careful formalization of observation (2) above.
Information on sequencing narrowing steps seems crucial to a truly efficient
implementation: some new results are in [Antoy et al., 1994].
7.3 Indeterminate evaluation in subset logic
Until now, I have considered only quantified conjunctions of equations as
formulae, else I cannot claim to be using equational logic. The discipline
of equational logic constrains the use of term rewriting systems, and leads
us to insist on confluent systems with lazy evaluation. Lazy evaluation
appears to be mostly a benefit, but the restriction to confluent systems
is annoying in many ways, and it particularly weakens the modularity of
equational logic programming languages, since orthogonality depends on
the full textual structure of left-hand sides of equations, and not just on an
abstract notion of their meanings. As well as improving the techniques for
guaranteeing confluence, we should investigate nonconfluent term rewrit-
ing. Since, without confluence, term rewriting cannot be semantically com-
plete for equational logic, we need to consider other logical interpretations
of term rewriting rules.
A natural alternative to equational logic is subset logic [O'Donnell,
1987]. Subset logic has the same terms as equational logic, the formu-
lae are the same except that they use the backward subset relation symbol
⊇ instead of = (Definition 2.3.13, Section 2.3.4 of the chapter 'Introduction:
Logic and Logic Programming Languages'). Subset logic semantics are the
same as equational (Definition 2.3.14, Section 2.3.4 of the 'Introduction
...' chapter ), except that every term represents a subset of the universe of
values instead of a single value, function symbols represent functions from
subsets to subsets that are extended pointwise from functions of individual
values (i.e., f(S) = ⋃{ f({x}) : x ∈ S }). Subset logic is complete with the
reflexive, transitive, and substitution rules of equality, omitting the sym-
metric rule (Definition 2.1.2, Section 2.1). So, term rewriting from left to
right is complete for subset logic, with no restrictions on the rules. Tech-
nically, it doesn't matter whether the left side of a rule is a subset of the
right, or vice versa, as long as the direction is always the same. Intuitively,
it seems more natural to think of rewriting as producing a subset of the
input term, since then a term may be thought of as denoting a set of pos-
sible answers. Thus, a single line of a subset logic program looks like l ⊇ r.
Note that normal forms do not necessarily denote singleton sets, although
it is always possible to construct models in which they do.
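For instance, a made-up two-rule program (not one from the text)

    choose(x, y) ⊇ x
    choose(x, y) ⊇ y

says only that both arguments are among the possible answers denoted by choose(x, y); an input such as choose(0, 1) may then rewrite to either 0 or 1, and subset logic asserts choose(0, 1) ⊇ 0 and choose(0, 1) ⊇ 1 without forcing 0 = 1.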
Subset logic programming naturally supports programs with indetermi-
nate answers. When a given input s rewrites to two different normal forms
t and u, subset logic merely requires that s ⊇ t and s ⊇ u are true, from
which it does not follow that t = u is true, nor t ⊇ u, nor u ⊇ t. While
equational logic programming extends naturally to infinite inputs and out-
puts, without changing its application to finite terms, such extension of
subset logic programming is more subtle. If only finite terms are allowed,
then infinite computations may be regarded as a sort of failure, and a finite
normal form must be found whenever one exists. If incremental output
of possibly infinite normal forms is desired, then there is no effective way
to give precedence to the finite forms when they exist. The most natural
idea seems to be to follow all possible rewriting paths, until one of them
produces a stable head symbol (that is, the head symbol of a strong head-
normal form) for output. Whenever such a symbol is output, all rewriting
paths producing different symbols at the same location are dropped. Only
a reduction path that has already agreed with all of the symbols that have
been output is allowed to generate further output. The details work out es-
sentially the same as with infinite outputs for equational programs, merely
substituting ⊇ for =. It is not at all clear whether this logically natu-
ral notion of commitment to a symbol, rather than a computation path,
is useful. I cannot find a natural semantic scheme to support the more
conventional sort of commitment to a computation path instead of an in-
termediate term, although a user may program in such a way that multiple
consistent computation paths never occur and the two sorts of commitment
are equivalent.
Efficient implementation of nonconfluent rewriting presents a very in-
teresting challenge. It is obviously unacceptable to explore naively the
exponentially growing set of rewriting sequences. A satisfying implemen-
tation should take advantage of partial confluent behavior to prune the
search space down to a much smaller set of rewriting sequences that is still
capable of producing all of the possible outputs. The correct definition
of the right sort of partial confluence property is not even known. It is
not merely confluence for a subset of rules, since two rules that do not
interfere with one another may interfere differently with a third, so that
the order in which the two noninterfering rules are applied may still make
a difference to the outcome. Pattern-matching and sequencing techniques
must be generalized as well, and a good data structure designed to rep-
resent simultaneously the many different rewritings under consideration
as compactly as possible. A good complete implementation of noncon-
fluent rewriting will probably allow the prohibited terms in normal forms
(Definition 2.3.16, Section 2.3.4 of the chapter 'Introduction: Logic and
Logic Programming Languages') to be different from the left-hand sides of
rules. Sharing of equivalent subterms becomes problematic since it may be
necessary to reduce two different occurrences of the same subterm in two
different ways. The difficulties in discovering a complete implementation of
subset logic programming would be well rewarded, since this style of pro-
gramming can capture the useful indeterminate behavior of Prolog, while
avoiding the repeated generation of the same solution by slightly different
paths that slows some Prolog programs down substantially.
Even an incomplete implementation of nonconfluent rewriting might be
very valuable. In the spirit of logic programming, such an implementation
should depend on inherent structural properties of the rules in a program
to determine which of a set of possible rewriting steps is chosen when the
redex patterns overlap. From the point of view of implementation, the most
natural approach appears to be to select an outermost connected compo-
nent of overlapping redex patterns, but to rewrite within the component
from innermost out. It is not clear how to resolve the ambiguous case where
two redex patterns have the same root, nor has anyone proposed a sensi-
ble semantic explanation based on logic programming for any principle for
choosing between conflicting rules.
Jayaraman has a different concept of subset logic programming [Jayara-
man, 1992]. Meseguer's rewriting logic has the same rules as my subset
logic, with a different semantic justification [Meseguer, 1992].
7.4 Relational rewriting
Equation solving can lead to a combination of Prolog and functional pro-
gramming through logic programming in the first-order predicate calculus
with equality. Subset logic programming, implemented with nonconfluent
term rewriting, can lead to an implementation of predicate calculus pro-
gramming in a sort of indeterminate functional programming. Another
intriguing way to reconcile some of the good qualities of Prolog and func-
tional programming is relational logic programming. Relational logic pro-
gramming can be approached as a generalization of terms to systems of
relational constraints, or as a purification of Prolog by removing the func-
tion symbols (called 'functors' in Prolog).
A relational formula is a FOPC formula with no function symbols (this
prohibition includes zeroary functions, or constants). The natural analogue
to a term in relational logic is a conjunction of atomic formulae, which may
be represented as a hypergraph (like a graph, but edges may touch more
than two nodes) with nodes standing for variables and hyperedges (edges
touching any number of nodes, not necessarily two) representing predicate
symbols. Two fundamental problems stand in the way of a useful imple-
mentation of relational logic programming through hypergraph rewriting.
1. We need an efficient implementation of hypergraph rewriting. The
pattern-matching problem alone is highly challenging, and no practi-
cal solution is yet known.
2. We need a semantic system that makes intuitive sense, and also sup-
ports computationally desirable methods of hypergraph rewriting.
A first cut at (2) might use the semantic system for FOPC, and express
rewrite rules as formulae of the form

∀x : ∃y : ((A₁ ∧ ⋯ ∧ Aₘ) ⇐ (B₁ ∧ ⋯ ∧ Bₙ))
These clauses differ from the Horn clauses of Prolog in two ways. The
quantification above is ∀x : ∃y :, where clauses are universally quantified.
And, the consequence of the implication above is a conjunction, where
Horn clauses allow at most one atomic formula in the consequence, and
even general clauses allow a disjunction, rather than a conjunction. No-
tice that, with Prolog-style all-universal quantification, the implication dis-
tributes over conjunctions in the consequence ((A₁ ∧ A₂) ⇐ B is equivalent
to (A₁ ⇐ B) ∧ (A₂ ⇐ B)). But, if existentially quantified variables from y
appear in A₁, ..., Aₘ, the distribution property does not hold.
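As a made-up illustration of why distribution fails, consider the rule

    ∀x : ∃y : (person(y) ∧ motherOf(y, x)) ⇐ person(x)

Because the same witness y is shared by the two conjuncts of the consequence, this is not equivalent to the pair of implications ∀x : (∃y : person(y)) ⇐ person(x) and ∀x : (∃y : motherOf(y, x)) ⇐ person(x), which would allow the two witnesses to be different.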
Since the consequence and hypothesis of our formula are both con-
junctions of atomic formulae, they may be represented by hypergraphs,
and it is natural to read the formula as a hypergraph rewrite rule. A
natural sort of hypergraph rewriting will be sound for deriving univer-
sally quantified implications from V3 implications, as long as the exis-
tentially quantified variables y correspond to nodes in the graph that
participate only in the hyperedges representing Ai,...,Am. The uni-
versally quantified variables in x correspond to nodes that may connect
in other ways as well as by hyperedges representing A1,...,A m , so they
are the interface nodes to the unrewritten part of the hypergraph. In-
terestingly, the final result of a computation on input / is an implica-
tion VtZ? : I <= O. This implication does not give a solution to the goal
/, as in Prolog. Rather, it rewrites the goal / into a (presumably sim-
pler or more transparent) goal O, such that every solution to O is also
a solution to I. This is formally similar to functional and equational
logic programming, in that the class of legitimate outputs is a subclass
of the legitimate inputs—in Prolog inputs are formulae and outputs are
substitutions. Constraint logic programming [Jaffar and Lassez, 1987;
Lassez, 1991] is perhaps heading in this direction.
The semantic treatment suggested above is not very satisfactory, as it
rules out lazy evaluation, and even storage reclamation (an intermediate
part of a hypergraph that becomes disconnected from the input and out-
put cannot be thrown away, even though it cannot affect the nature of
a solution, since if any part of the hypergraph is unsolvable, the whole
conjunction is unsolvable). Also, the logically sound notion of unification
of the left-hand side of a rule with a portion of a hypergraph in order to
rewrite it seems far too liberal with this semantic system—without the
single-valued quality of functions, additional relations can be postulated
to hold anywhere. I conjecture that the first problem can be solved by a
semantic system in which every unquantified formula is solvable, so that a
disconnected portion of the hypergraph may be discarded as semantically
irrelevant. One way to achieve this would be to use some sort of measured
universe [Cohn, 1980], and require every atomic predicate to hold for all but
a subset of measure 0—then every finite intersection of atomic predicates
(equivalently, the value of every finite conjunction of atomic formulae) has
a solution. Such a semantic system is not as weird as it first seems, if we
understand the universe as a space of concepts, rather than real objects,
and think of each predicate symbol as giving a tiny bit of information about
a concept. We can conceive of every combination of information (such as a
green unicorn with a prime number of feet that is a large power of 2), even
though most such combinations never enter our notion of reality. Although
the question of solvability becomes trivial in this semantic system, impli-
cations between solutions still have interesting content. The problem of
limiting unifications in a useful way appears more difficult. I suggest that
nonclassical 'relevant' interpretations of implication [Anderson and Belnap
Jr., 1975] are likely to be helpful.

Acknowledgements
I am very grateful for the detailed comments that I received from the
readers, Bharat Jayaraman, Donald W. Loveland, Gopalan Nadathur and
David J. Sherman.

References
[Abramsky and Hankin, 1987] S. Abramsky and C. Hankin. Abstract In-
terpretation of Declarative Languages. Ellis Horwood, Chichester, UK,
1987.
[Aho et al., 1974] A. V. Aho, John E. Hopcroft, and J. D. Ullman. The
Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.
[Anderson and Belnap Jr., 1975] Alan Ross Anderson and Nuel D. Belnap
Jr. Entailment—the Logic of Relevance and Necessity, volume 1. Prince-
ton University Press, Princeton, NJ, 1975.
[Antoy et al., 1994] Sergio Antoy, Rachid Echahed, and Michael Hanus. A
needed narrowing strategy. In Proceedings of the 21st ACM Symposium
on Principles of Programming Languages, pages 268-279. ACM, January 1994.
[Appel, 1991] Andrew Appel. Garbage collection. In Peter Lee, editor,
Topics in Advanced Language Implementation Techniques. MIT Press,
1991.
[Ashcroft and Wadge, 1985] E. A. Ashcroft and W. W. Wadge. Lucid, the
Dataflow Programming Language. Academic Press, London, UK, 1985.
[Augustsson, 1984] Lennart Augustsson. A compiler for lazy ML. In ACM
Symposium on Lisp and Functional Programming, August 1984.
[Backus, 1974] John Backus. Programming language semantics and closed
applicative languages. In Proceedings of the 1st ACM Symposium on
Principles of Programming Languages, pages 71-86. ACM, 1974.
[Backus, 1978] John Backus. Can programming be liberated from the von
Neumann style? a functional style and its algebra of programs. Com-
munications of the ACM, 21(8):613-641, 1978.
[Baird et al, 1989] T. Baird, G. Peterson, and R. Wilkerson. Complete
sets of reductions modulo associativity, commutativity, and identity. In
Proceedings of the 3rd International Conference on Rewriting Techniques
and Applications, volume 355 of Lecture Notes in Computer Science,
pages 29-44, 1989.
[Barendregt, 1984] Hendrik Peter Barendregt. The Lambda Calculus: Its
Syntax and Semantics. North-Holland, Amsterdam, 1984.
[Berry and Levy, 1979] Gerard Berry and Jean-Jacques Levy. Letter to the
editor, SIGACT News, 11, 3-4, 1979.
[Bird and Wadler, 1988] R. Bird and P. Wadler. Introduction to Functional
Programming. Prentice-Hall, New York, NY, 1988.
[Bloss et al., 1988] A. Bloss, Paul Hudak, and J. Young. Code opti-
mizations for lazy evaluation. Lisp and Symbolic Computation: an In-
ternational Journal, 1:147-164, 1988.
[Bobrow and Clark, 1979] D. Bobrow and D. Clark. Compact encodings
of list structure. ACM Transactions on Programming Languages and
Systems, 1(2):266-286, 1979.
[Bondorf, 1989] Anders Bondorf. A self-applicable partial evaluator for
term-rewriting systems. In International Joint Conference on the Theory
and Practice of Software Development, volume 352 of Lecture Notes in
Computer Science. Springer-Verlag, 1989.
[Burn et al., 1988] G. L. Burn, Simon L. Peyton Jones, and J. D. Robson.
The spineless g-machine. In Proceedings of the 1988 ACM Conference
on Lisp and Functional Programming, pages 244-258, 1988.
[Cardelli, 1983] Luca Cardelli. The functional abstract machine. Polymor-
phism, 1(1), 1983.
[Cardelli, 1984] Luca Cardelli. Compiling a functional language. In Proceedings of the ACM Symposium on Lisp and Functional Programming,
August 1984.
[Chen and O'Donnell, 1991] Yiyun Chen and Michael James O'Donnell.
Testing confluence of nonterminating overlapping systems of rewrite
rules. In Conditional and Typed Rewriting Systems 2nd International
CTRS Workshop, Montreal, June 1990, volume 516 of Lecture Notes in
Computer Science, pages 127-136. Springer-Verlag, 1991.
[Chew, 1980] Leslie Paul Chew. An improved algorithm for computing
with equations. In 21st Annual Symposium on Foundations of Computer
Science, pages 108-117. IEEE, 1980.
[Chew, 1981] Leslie Paul Chew. Unique normal forms in term rewriting
systems with repeated variables. In 13th Annual ACM Symposium on
Theory of Computing, pages 7-18, 1981.
[Cohen, 1981] J. Cohen. Garbage collection of linked data structures. Com-
puting Surveys, 13(3), September 1981.
[Cohn, 1980] Donald L. Cohn. Measure Theory. Birkhauser, Boston, MA,
1980.
[Cousineau et al., 1985] G. Cousineau, P.-L. Curien, and M. Mauny. The
categorical abstract machine. In Symposium on Functional Programming
Languages and Computer Architecture, volume 201 of Lecture Notes in
Computer Science. Springer-Verlag, 1985.
[Cousot and Cousot, 1977] P. Cousot and R. Cousot. Abstract interpreta-
tion: A unified framework for static analysis of programs by construction
or approximation of fixpoints. In Fourth Annual ACM Symposium on
Principles of Programming Languages. ACM, 1977.
[Curry and Feys, 1958] H. B. Curry and R. Feys. Combinatory Logic, vol-
ume 1. North-Holland, Amsterdam, 1958.
[Demers et al., 1981] Alan Demers, Thomas Reps, and Tim Teitelbaum.
Incremental evaluation of attribute grammars with application to syntax-
directed editors. In Conference Record of the 13th Annual ACM Sym-
posium on Principles of Programming Languages, pages 105-116. ACM,
1981.
[Dershowitz, 1987] Nachum Dershowitz. Termination of rewriting. Journal
of Symbolic Computation, 3:69-116, 1987.
[Dershowitz et al., 1983] Nachum Dershowitz, Jieh Hsiang, N. Josephson,
and David A. Plaisted. Associative-commutative rewriting. In Proceed-
ings of the 8th International Joint Conference on Artificial Intelligence,
pages 940-944, August 1983.
[Dershowitz et al., 1991] Nachum Dershowitz, Simon Kaplan, and David
A. Plaisted. Rewrite, rewrite, rewrite, rewrite, rewrite. Theoretical Com-
puter Science, 83:71-96, 1991.
[Durand, 1994] Irene Durand. Bounded, strongly sequential, and forward-branching term-rewriting systems. Journal of Symbolic Computation, 18:319-352, 1994.
[Durand and Salinier, 1993] Irene Durand and Bruno Salinier. Constructor
equivalent term rewriting systems. Information Processing Letters, 47,
1993.
[Durand et al., 1991] Irene Durand, David J. Sherman, and Robert I.
Strandh. Optimization of equational programs using partial evaluation.
In Proceedings of the ACM/IFIP Symposium on Partial Evaluation and
Semantics-Based Program Manipulation, New Haven, CT, 1991.
[Fay, 1979] M. Fay. First order unification in equational theories. In Pro-
ceedings of the 4th Workshop on Automated Deduction, volume 87 of Lec-
ture Notes in Computer Science, pages 161-167. Springer-Verlag, 1979.
[Field and Harrison, 1988] A. J. Field and P. G. Harrison. Functional Pro-
gramming. Addison-Wesley, 1988.
[Field and Teitelbaum, 1990] John Field and Tim Teitelbaum. Incremen-
tal reduction in the lambda calculus. In Proceedings of the 1990 ACM
Conference on Lisp and Functional Programming, pages 307-322. ACM
Press, 1990.
[Friedman and Wise, 1976] Daniel Friedman and David S. Wise. Cons
should not evaluate its arguments. In 3rd International Colloquium on
Automata, Languages and Programming, pages 257-284. Edinburgh Uni-
versity Press, 1976.
[Futatsugi et al., 1985] K. Futatsugi, Joseph A. Goguen, J.-P. Jouannaud,
and Jose Meseguer. Principles of OBJ2. In 12th Annual Symposium on
Principles of Programming Languages, pages 52-66. ACM, 1985.
[Gallier, 1986] Jean H. Gallier. Logic for Computer Science—Foundations
of Automatic Theorem Proving. Harper & Row, New York, NY, 1986.
[Goguen, 1977] Joseph A. Goguen. Abstract errors for abstract data types.
In E. J. Neuhold, editor, Proceedings of IFIP Working Conference on
Formal Description of Program Concepts. North-Holland, 1977.
[Gonthier et al., 1992] Georges Gonthier, Martin Abadi, and Jean-Jacques
Levy. The geometry of optimal lambda reduction. In Conference Record
of the 19th Annual ACM Symposium on Principles of Programming Lan-
guages, pages 15-26. ACM, 1992.
[Guttag and Horning, 1978] John V. Guttag and J. J. Horning. The alge-
braic specification of abstract data types. Acta Informatica, 10(1):1-26,
1978.
[Guttag et al., 1983] John V. Guttag, Deepak Kapur, and David Musser.
On proving uniform termination and restricted termination of rewriting
systems. SIAM Journal on Computing, 12:189-214, 1983.
[Henderson, 1980] P. Henderson. Functional Programming—Application and Implementation. Prentice-Hall, 1980.
[Henderson and Morris, 1976] P. Henderson and J. H. Morris. A lazy eval-
uator. In 3rd Annual ACM Symposium on Principles of Programming
Languages, pages 95-103. SIGPLAN and SIGACT, 1976.
[Hoffmann and O'Donnell, 1979] C. M. Hoffmann and M. J. O'Donnell.
Interpreter generation using tree pattern matching. In 6th Annual Sym-
posium on Principles of Programming Languages, pages 169-179. SIG-
PLAN and SIGACT, 1979.
[Hoffmann and O'Donnell, 1982] C. M. Hoffmann and M. J. O'Donnell.
Pattern matching in trees. Journal of the ACM, 29(1):169-179, 1982.
[Hoffmann et al, 1985] C. M. Hoffmann, M. J. O'Donnell, and R. I.
Strandh. Implementation of an interpreter for abstract equations. Soft-
ware—Practice and Experience, 15(12):1185-1203, 1985.
[Hudak, 1989] Paul Hudak. Conception, evolution, and application of func-
tional programming languages. ACM Computing Surveys, 21(3):359-411,
1989.
[Hudak, 1992] Report on the programming language Haskell, a non-strict,
purely functional language, version 1.2. ACM SIGPLAN Notices, 27(5),
May 1992.
[Hudak and Sundaresh, 1988] Paul Hudak and Raman S. Sundaresh. On
the expressiveness of purely functional I/O systems. Technical Report
YALEU/DCS/RR665, Yale University, New Haven, CT, December 1988.
[Huet, 1980] G. Huet. Confluent reductions: Abstract properties and appli-
cations to term rewriting. Journal of the ACM, 27(4):797-821, October
1980.
[Huet and Levy, 1991] Gerard Huet and Jean-Jacques Levy. Computa-
tions in orthogonal rewriting systems. In Jean-Louis Lassez and Gordon
Plotkin, editors, Computational Logic—Essays in Honor of Alan Robin-
son, pages 395-443. MIT Press, Cambridge, MA, 1991.
[Hughes, 1982] R. J. M. Hughes. Super-combinators: A new implementa-
tion method for applicative languages. In ACM Symposium on Lisp and
Functional Programming, August 1982.
[Hughes, 1985a] J. Hughes. Lazy memo-functions. In Functional Program-
ming Languages and Computer Architecture, volume 201 of Lecture Notes
in Computer Science, pages 129-146. Springer-Verlag, 1985.
[Hughes, 1985b] R. J. M. Hughes. Strictness detection in non-flat domains.
In Neil Jones and Harald Ganzinger, editors, Workshop on Programs
as Data Objects, volume 217 of Lecture Notes in Computer Science.
Springer-Verlag, 1985.
[Iverson, 1962] K. E. Iverson. A Programming Language. John Wiley and
Sons, New York, NY, 1962.
[Jaffar and Lassez, 1987] Joxan Jaffar and Jean-Louis Lassez. Constraint
logic programming. In 14th Annual ACM Symposium on Principles of
Programming Languages, pages 111-119, 1987.
[Jayaraman, 1985] Bharat Jayaraman. Equational programming: A uni-
fying approach to functional and logic programming. Technical Report
85-030, The University of North Carolina, 1985.
[Jayaraman, 1992] Bharat Jayaraman. Implementation of subset-
equational programming. The Journal of Logic Programming, 12(4),
April 1992.
[Johnsson, 1984] Thomas Johnsson. Efficient compilation of lazy evalua-
tion. In Proceedings of the ACM SIGPLAN'84 Symposium on Compiler
Construction, 1984. SIGPLAN Notices 19(6) June, 1984.
[Kahn and Plotkin, 1978] Gilles Kahn and Gordon Plotkin. Domaines con-
crets. Technical report, IRIA Laboria, LeChesnay, France, 1978.
[Kapur et al, 1982] Deepak Kapur, M. S. Krishnamoorthy, and P. Naren-
dran. A new linear algorithm for unification. Technical Report 82CRD-
100, General Electric, 1982.
[Karlsson, 1981] K. Karlsson. Nebula, a functional operating system. Tech-
nical report, Chalmers University, 1981.
[Karp, 1964] Carol R. Karp. Languages with Expressions of Infinite Length.
North-Holland, Amsterdam, 1964.
[Kathail, 1984] Arvind and Vinod Kumar Kathail. Sharing of computation
in functional language implementations. In Proceedings of the Interna-
tional Workshop on High-Level Computer Architecture, 1984.
[Keller and Sleep, 1986] R. M. Keller and M. R. Sleep. Applicative caching.
ACM Transactions on Programming Languages and Systems, 8(1):88-
108, 1986.
[Kenneway et al., 1991] J. R. Kenneway, Jan Willem Klop, M. R. Sleep,
and F. J. de Vries. Transfinite reductions in orthogonal term rewriting
systems. In Proceedings of the 4th International Conference on Rewriting
Techniques and Applications, volume 488 of Lecture Notes in Computer
Science. Springer-Verlag, 1991.
[Klop, 1980] Jan Willem Klop. Combinatory Reduction Systems. PhD
thesis, Mathematisch Centrum, Amsterdam, 1980.
[Klop, 1991] Jan Willem Klop. Term rewriting systems. In S. Abramsky,
Dov M. Gabbay, and T. S. E. Maibaum, editors, Handbook of Logic in
Computer Science, volume 1, chapter 6. Oxford University Press, Oxford,
1991.
[Klop and Middeldorp, 1991] Jan Willem Klop and A. Middeldorp. Se-
quentiality in orthogonal term rewriting systems. Journal of Symbolic
Computation, 12:161-195, 1991.
[Knuth, 1973] Donald E. Knuth. The Art of Computer Programming—Sorting and Searching, volume 3. Addison-Wesley, Reading, MA, 1973.
[Knuth and Bendix, 1970] Donald E. Knuth and P. Bendix. Simple word
problems in universal algebras. In J. Leech, editor, Computational Prob-
lems in Abstract Algebra, pages 127-146. Pergamon Press, Oxford, 1970.
[Koopman, 1990] Philip J. Koopman. An Architecture for Combinator
Graph Reduction. Academic Press, Boston, MA, 1990.
[Koopman and Lee, 1989] Philip J. Koopman and Peter Lee. A fresh look
at combinator graph reduction. In Proceedings of the SIGPLAN'89 Con-
ference on Programming Language Design and Implementation, October
1989.
[Lamping, 1990] John Lamping. An algorithm for optimal lambda calculus
reduction. In Conference Record of the 17th Annual ACM Symposium
on Principles of Programming Languages, pages 16-30. ACM, 1990.
[Landin, 1965] P. J. Landin. A correspondence between ALGOL 60 and
Church's lambda-notation: Part I. Communications of the ACM,
8(2):89-101, 1965.
[Lassez, 1991] Jean-Louis Lassez. From LP to CLP: Programming with
constraints. In T. Ito and A. R. Meyer, editors, Theoretical Aspects of
Computer Software: International Conference, volume 526 of Lecture
Notes in Computer Science. Springer-Verlag, 1991.
[Levy, 1978] Jean-Jacques Levy. Reductions Correctes et Optimales dans
le Lambda-Calcul. PhD thesis, Universite Paris, January 1978.
[Lieberman and Hewitt, 1983] Henry Lieberman and Carl Hewitt. A real-
time garbage collector based on the lifetimes of objects. Communications
of the ACM, 26(6):419-429, June 1983.
[Loveland, 1978] Donald W. Loveland. Automated Theorem Proving: A
Logical Basis. Elsevier North-Holland, New York, NY, 1978.
[Machtey and Young, 1978] Michael Machtey and Paul Young. An Intro-
duction to the General Theory of Algorithms. Theory of Computation.
North-Holland, New York, NY, 1978.
[Martelli and Montanari, 1976] A. Martelli and U. Montanari. Unification
in linear time and space: A structured presentation. Technical Report
B76-16, Istituto di Elaborazione dell'Informazione, Consiglio Nazionale
delle Ricerche, Pisa, Italy, 1976.
[McCarthy, 1960] John McCarthy. Recursive functions of symbolic expres-
sions and their computation by machine, part I. Communications of the
ACM, 3(4):184-195, 1960.
[McCarthy et al., 1965] John McCarthy, Paul W. Abrahams, Daniel J. Ed-
wards, Timothy P. Hart, and Michael I. Levin. LISP 1.5 Programmer's
Manual. MIT Press, Cambridge, MA, 1965.
[Meseguer, 1992] Jose Meseguer. Multiparadigm logic programming. In
H. Kirchner and G. Levi, editors, Proceedings of the 3rd International Conference on Algebraic and Logic Programming, Volterra, Italy,
September 1992, Lecture Notes in Computer Science. Springer-Verlag,
1992.
[Mitchie, 1968] D. Mitchie. 'Memo' functions and machine learning. Na-
ture, 1968.
[Mycroft, 1980] Alan Mycroft. The theory and practice of transforming
call-by-need into call-by-value. In International Symposium on Program-
ming, volume 83 of Lecture Notes in Computer Science. Springer-Verlag,
1980.
[Newman, 1942] M. H. A. Newman. On theories with a combinatorial
definition of 'equivalence'. Annals of Mathematics, 43(2):223-243, 1942.
[O'Donnell, 1977] Michael James O'Donnell. Computing in Systems De-
scribed by Equations, volume 58 of Lecture Notes in Computer Science.
Springer-Verlag, 1977.
[O'Donnell, 1979] Michael James O'Donnell. Letter to the editor, SIGACT
News, 11, 2, 1979.
[O'Donnell, 1985] Michael James O'Donnell. Equational Logic as a Pro-
gramming Language. Foundations of Computing. MIT Press, Cambridge,
MA, 1985.
[O'Donnell, 1987] Michael James O'Donnell. Tree-rewriting implementa-
tion of equational logic programming. In Pierre Lescanne, editor, Rewrit-
ing Techniques and Applications — Bordeaux, France, May 1987 — Pro-
ceedings, volume 256 of Lecture Notes in Computer Science. Springer-
Verlag, 1987.
[Peyton Jones, 1987] Simon L. Peyton Jones. The Implementation of Func-
tional Programming Languages. Prentice-Hall, Englewood Cliffs, NJ,
1987.
[Pingali and Arvind, 1985] Keshav Pingali and Arvind. Efficient demand-
driven evaluation, part 1. ACM Transactions on Programming Lan-
guages and Systems, 7(2):311-333, April 1985.
[Pingali and Arvind, 1986] Keshav Pingali and Arvind. Efficient demand-
driven evaluation, part 2. ACM Transactions on Programming Lan-
guages and Systems, 8(1):109-139, January 1986.
[Pugh and Teitelbaum, 1989] William Pugh and Tim Teitelbaum. Incre-
mental computation via function caching. In Conference Record of the
Sixteenth Annual ACM Symposium on Principles of Programming Lan-
guages, pages 315-328. ACM, 1989.
[Ramakrishnan and Sekar, 1990] I. V. Ramakrishnan and R. C. Sekar. Pro-
gramming in equational logic: Beyond strong sequentiality. In Proceed-
ings of the IEEE Conference on Logic in Computer Science, 1990.
[Rebelsky, 1992] Samuel A. Rebelsky. I/O trees and interactive lazy func-
tional programming. In Maurice Bruynooghe and Martin Wirsing, editors, Proceedings of the 4th International Symposium on Programming
Language Implementation and Logic Programming, volume 631 of Lec-
ture Notes in Computer Science, pages 458-472. Springer-Verlag, August
1992.
[Rebelsky, 1993] Samuel A. Rebelsky. Tours, a System for Lazy Term-based
Communication. PhD thesis, The University of Chicago, June 1993.
[Robinson and Wos, 1969] G. A. Robinson and L. Wos. Paramodulation
and theorem-proving in first-order logic with equality. Machine Intelli-
gence, 4:135-150, 1969.
[Sherman, 1990] David J. Sherman. Lazy directed congruence closure.
Technical Report 90-028, The University of Chicago, 1990.
[Spitzen et al, 1978] Jay M. Spitzen, Karl N. Levitt, and Lawrence Robin-
son. An example of hierarchical design and proof. Communications of
the ACM, 21(12):1064-1075, December 1978.
[Staples, 1982] John Staples. Two-level expression representation for faster
evaluation. In Hartmut Ehrig, Manfred Nagl, and Grzegorz Rozenberg,
editors, Graph Grammars and their Application to Computer Science:
2nd International Workshop, volume 153 of Lecture Notes in Computer
Science. Springer-Verlag, 1982.
[Strandh, 1984] R. I. Strandh. Incremental suffix trees with multiple sub-
ject strings. Technical Report JHU/EECS-84/18, The Johns-Hopkins
University, 1984.
[Strandh, 1988] Robert I. Strandh. Compiling Equational Programs into
Efficient Machine Code. PhD thesis, The Johns Hopkins University,
Baltimore, MD, 1988.
[Strandh, 1989] Robert I. Strandh. Classes of equational programs that
compile into efficient machine code. In Proceedings of the 3rd Interna-
tional Conference on Rewrite Techniques and Applications, 1989.
[Thatte, 1985] Satish Thatte. On the correspondence between two classes
of reduction systems. Information Processing Letters, 1985.
[Turner, 1979] D. A. Turner. A new implementation technique for applica-
tive languages. Software—Practice and Experience, 9:31-49, 1979.
[Wand, 1976] Mitchell Wand. First order identities as a defining language.
Acta Informatica, 14:336-357, 1976.
[Warren, 1983] David H. D. Warren. An abstract Prolog instruction set.
Technical Report 309, Artificial Intelligence Center, SRI International,
Menlo Park, CA, October 1983.
[Winksel, 1993] Glynn Winksel. The Formal Semantics of Programming
Languages—An Introduction. Foundations of Computing. MIT Press,
Cambridge, MA, 1993.
Proof Procedures for Logic
Programming
Donald W. Loveland and Gopalan Nadathur

Contents
1 Building the framework: the resolution procedure . . . . 163
1.1 The resolution procedure 164
1.2 Linear resolution refinements 175
2 The logic programming paradigm 186
2.1 Horn clause logic programming 186
2.2 A framework for logic programming 190
2.3 Abstract logic programming languages 198
3 Extending the logic programming paradigm 212
3.1 A language for hypothetical reasoning 213
3.2 Near-Horn Prolog 219
4 Conclusion 229

1 Building the framework: the resolution procedure


A proof procedure is an algorithm (technically, a semi-decision procedure)
which identifies a formula as valid (or unsatisfiable) when appropriate, and
may not terminate when the formula is invalid (satisfiable). Since a proof
procedure concerns a logic the procedure takes a special form, superimpos-
ing a search strategy on an inference calculus. We will consider a certain
collection of proof procedures in the light of an inference calculus format
that abstracts the concept of logic programming. This formulation al-
lows us to look beyond SLD-resolution, the proof procedure that underlies
Prolog, to generalizations and extensions that retain an essence of logic
programming structure.
The inference structure used in the formulation of the logic program-
ming concept and first realization, Prolog, evolved from the work done in
the subdiscipline called automated theorem proving. While many proof
procedures have been developed within this subdiscipline, some of which
appear in Volume 1 of this handbook, we will present a narrow selection,
namely the proof procedures which are clearly ancestors of the first proof
procedure associated with logic programming, SLD-resolution. Extensive
treatment of proof procedures for automated theorem proving appear in
Bibel [Bibel, 1982], Chang and Lee [Chang and Lee, 1973] and Loveland
[Loveland, 1978].
1.1 The resolution procedure
Although the consideration of proof procedures for automated theorem
proving began about 1958, we begin our overview with the introduction of
the resolution proof procedure by Robinson in 1965. We then review the
linear resolution procedures, model elimination and SL-resolution proce-
dures. Our exclusion of other proof procedures from consideration here
is due to our focus, not because other procedures are less important his-
torically or for general use within automated or semi-automated theorem
proving.
After a review of the general resolution proof procedure, we consider
the linear refinement for resolution and then further restrict the proce-
dure format to linear input resolution. Here we are no longer capable of
treating full first-order logic, but have forced ourselves to address a smaller
domain, in essence the renameable Horn clause formulas. By leaving the
resolution format, indeed leaving traditional formula representation, we see
there exists a linear input procedure for all of first-order logic. This is the
model elimination (ME) procedure, of which a modification known as the
SL-resolution procedure was the direct inspiration for the SLD-resolution
procedure which provided the inference engine for the logic programming
language Prolog. The ME-SL-resolution linear input format can be trans-
lated into a very strict resolution restriction, linear but not an input re-
striction, as we will observe.
The resolution procedure invented by Alan Robinson and published in
1965 (see [Robinson, 1965]) is studied in depth in the chapter on resolution-
based deduction in Volume 1 of this handbook, and so is quickly reviewed
here. The resolution procedure is a refutation procedure, which means
we establish unsatisfiability rather than validity. Of course, no loss of
generality occurs because a formula is valid if and only if (iff) its negation
is unsatisfiable. The formula to be tested for unsatisfiability is "converted
to conjunctive normal form with no existential quantifiers present". By this
we mean that with a given formula F to be tested for unsatisfiability, we
associate a logically equivalent formula F' which is in conjunctive normal
form (a conjunction of clauses, each clause a disjunction of atomic formulas
or negated atomic formulas, called literals) with no existential quantifiers.
F' may be written without any quantifiers, since we regard F, and therefore
F', as a closed formula, so the universal quantifiers are implicitly assumed
to be present preceding each clause. (Thus we are free to rename variables
of each clause so that no variable occurs in more than one clause.) The
formula F' is derivable from F by standard logical equivalences plus the
use of Skolem functions to replace quantifiers. We shall say that F' is in
Skolem conjunctive form and that F' is the Skolem conjunctive form of F.
Precise algorithms for conversion are given in many textbooks, including
Chang and Lee [Chang and Lee, 1973] and Loveland [Loveland, 1978].
The formula

is the Skolem conjunctive form of

where g(x) is a Skolem function introduced for the y that occurs in the
intermediate formula

(Recall that an existentially quantified variable is replaced by a new func-


tion letter followed by arguments consisting of all universally quantified
variables where the universal quantifier contains the existential quantifier
within its scope.) The formula

has

as its Skolem conjunctive form formula, where a is a Skolem function having


0 arguments. Note that in this example immediate Skolemization, before
applying the distributive law and then importing the quantifiers, would
have introduced a Skolem function f(z) instead of the Skolem constant a
used above. For pragmatic reasons regarding the cost of search in auto-
mated proof procedures it is best to seek Skolem functions with the fewest
number of arguments. We have adopted the traditional logic notation of
capital letters for predicate letters, and non-capitals for variables, func-
tion and constant letters. Variables are represented by end of the alphabet
letters.
For those accustomed to logic programming notation, based on clauses
as implications, it is a natural question to ask: why use conjunctive normal
form? The resolution procedure (and hence its restrictions) does not dis-
tinguish between positive literals (atomic formula, or atoms) and negative
literals (negated atoms). Also, the general resolution procedure (and many
variants) requires symmetric calling access to all literals of a clause. Thus
the symmetry of OR as the connective in the clause is very suitable. Of
course, we could present the entire set of procedures using implicative form
but there is no insight gained and the presentation would differ from the
traditional presentations for no essential reason. Therefore, we will invoke
a notation change in the middle of the paper, but this reflects common
sense and historical precedent, so we forsake uniform notation.
Resolution procedures are based on the resolution inference rule

        C1 ∨ a     ¬a ∨ C2
        ------------------
             C1 ∨ C2

where C1 ∨ a and ¬a ∨ C2 are two known clauses, with C1 and C2
representing arbitrary disjuncts of literals. The literals a and ¬a are called the
resolving literals. The derived clause is called the resolvent. The similarity
of the inference rule to Gentzen's cut rule is immediately clear and the rule
can be seen as a generalization of modus ponens. The resolvent is true in
any model for the two given clauses, so the inference rule preserves validity.
The resolution inference rule just given is the propositional rule, also
called the ground resolution rule for a reason given later. We postpone
discussion of the very similar first-order inference rule to delay some com-
plications.
Because the Skolem conjunctive format is so uniform in style, it is con-
venient to simplify notation when using it. We drop the OR symbol in
clauses and instead simply concatenate literals. Clauses themselves are ei-
ther written one per line or separated by commas. When the non-logical
symbols are all single letters it is also convenient to drop the parentheses as-
sociated with predicate letters and the commas between arguments. Thus
the Skolem conjunctive form formula (P(a) ∨ P(b)) ∧ (¬P(a)) ∧ (¬P(b)) is
shortened to PaPb, -Pa, -Pb by use of the simplifying notation.
Using the just-introduced shorthand notation for a formula in this form,
often called a clause set, we present a refutation of a clause set. We use
the terms input clause to designate a clause from the input clause set given
by the user to distinguish such clauses from derived clauses of any type.
1. PaPb input clause
2. Pa-Pb input clause
3. -PaPb input clause
4. -Pa-Pb input clause
5. Pa resolvent of clauses 1, 2
6. -Pa resolvent of clauses 3, 4
7. contradiction Pa and -Pa cannot both hold
Note that in line 5 the two identical literals are merged.
Since each resolvent is true in any model satisfying the two parent
clauses, a satisfiable clause set could not yield two contradictory unit
clauses. Thus Pa, -Pa signals an unsatisfiable input clause set. The
resolution inference rule applied to Pa and -Pa yields the empty clause,
denoted D. This is customarily entered instead of the word "contradiction",
as we do hereafter.
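To make the ground inference step concrete, here is a minimal sketch in
Python, assuming a clause is represented as a frozenset of literal strings
with a leading '-' marking negation (the representation and names are ours,
chosen only for illustration).

    # A ground clause is a frozenset of literals; 'Pa' and '-Pa' are complementary.
    def complement(literal):
        return literal[1:] if literal.startswith('-') else '-' + literal

    def resolvents(c1, c2):
        """All ground resolvents of clauses c1 and c2; set union merges duplicates."""
        return [(c1 - {lit}) | (c2 - {complement(lit)})
                for lit in c1 if complement(lit) in c2]

    c1, c2 = frozenset(['Pa', 'Pb']), frozenset(['Pa', '-Pb'])
    c3, c4 = frozenset(['-Pa', 'Pb']), frozenset(['-Pa', '-Pb'])
    print(resolvents(c1, c2))                                 # [frozenset({'Pa'})], as in line 5
    print(resolvents(frozenset(['Pa']), frozenset(['-Pa'])))  # [frozenset()], the empty clause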
We have noted that the ground resolution inference system is sound,
i.e. will not deduce D if the original clause set is satisfiable. We now
note that this inference system is complete, i.e. for every propositional
unsatisfiable clause set there is a resolution refutation. A proof appears
in Chang and Lee [Chang and Lee, 1973] or Loveland [Loveland, 1978].
A proof by induction on the number of "excess" literal occurrences in the
clause set (the number of literal occurrences minus the number of clauses in
the set) is not difficult and the reader may enjoy trying the proof himself.
Above, we have given a proof of unsatisfiability, or a refutation. There
is no indication there as to how such a proof might be found. Descriptions
of search algorithms that employ the resolution inference rule comprise
proof procedures for resolution. Perhaps the most natural proof procedure
for resolution is the level saturation procedure, where all resolvents at a
certain level are determined before beginning the next level. Because we
enumerate all resolvents eventually, we have a complete proof procedure.
We define all the given, or input, clauses to be at level 0. A resolvent is
at level k + 1 if the parent clauses are at level k and j, for j < k. For the
preceding example we list the clauses by level:
Level 0: PaPb, Pa-Pb, -PaPb, -Pa-Pb
Level 1: Pa, Pb, -Pa, -Pb, -PaPa, -PbPb
Level 2: D , and all Level 0 and Level 1 clauses.
Level 2 resolvents include all previously obtained clauses because the tau-
tologies -PaPa and -PbPb as one parent clause will always produce the
other parent clause as the resolvent. Thus we can remove tautologies with-
out affecting completeness. This is true for almost all resolution restric-
tions also. (An exception can occur when clauses are ordered, such as in
the locking, or indexing, strategy. For example, see Loveland [1978].)
Other deletion rules can be advantageous, given that only clause D is
sought. Once Pa is produced, resolving thereafter with PaPb or Pa-Pb
is counterproductive because the resolvents again contain Pa. It is almost
immediate that, when trying to derive the (length 0) clause D, there is no
need to retain a clause that contains another existing clause as a subclause.
Eliminating a clause C1 ∨ C2 when C2 also exists, for clauses C1 and C2, is
known as subsumption elimination. This generalizes at the first-order level
(noted later) and can get reasonably technical, with various options. This
is studied in detail in Loveland [1978]. Also see Wos et al. [1991].
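The level saturation procedure, with the tautology and subsumption dele-
tions just described, can be sketched at the ground level as follows. This
is only an illustrative Python fragment under the same frozenset clause
representation used above; it returns the level at which the empty clause is
produced, or None if the set saturates without producing it.

    def complement(lit):
        return lit[1:] if lit.startswith('-') else '-' + lit

    def resolvents(c1, c2):
        return [(c1 - {l}) | (c2 - {complement(l)})
                for l in c1 if complement(l) in c2]

    def tautology(clause):
        return any(complement(l) in clause for l in clause)

    def saturate(clauses, max_level=25):
        """Ground level saturation with deletion of tautologies and subsumed resolvents."""
        kept = [frozenset(c) for c in clauses]               # the level 0 (input) clauses
        for level in range(1, max_level + 1):
            new = []
            for i in range(len(kept)):
                for j in range(i + 1, len(kept)):
                    for r in resolvents(kept[i], kept[j]):
                        if not r:
                            return level                     # empty clause derived at this level
                        if tautology(r):
                            continue
                        if any(c <= r for c in kept + new):  # r contains an existing clause
                            continue
                        new.append(r)
            if not new:
                return None                                  # saturated: the set is satisfiable
            kept.extend(new)
        return None

    print(saturate([['Pa', 'Pb'], ['Pa', '-Pb'], ['-Pa', 'Pb'], ['-Pa', '-Pb']]))   # 2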
A variant of the level saturation proof procedure that has proved useful
in practice employs a unit preference search strategy. (See Wos et al. [Wos
et al., 1964].) Here resolvents at higher levels than the current level being
"saturated" are computed whenever one of the parents is a unit clause
(one-literal clause). The intuition here is that the resolvent is always at
least one literal less in cardinality ("shorter") than the longer parent clause,
which is progress since the goal clause is the empty clause, having length 0.
Except for merging, these unit clause resolutions must be the resolutions
that occur near the end of refutations, so it makes sense to "look ahead"
in this limited way.
Having treated the level-saturation resolution proof procedure, which is
a breadth-first search ordering, it is natural to ask about depth-first search
procedures. This is of particular interest to logic programmers who know
that this is the search order for Prolog. We want to do this in a manner
that preserves completeness. (Why it is desirable to preserve completeness
is a non-trivial question, particularly at the first-order level where searches
need not terminate and the combinatorial explosion of resolvent produc-
tion is the dominant problem. Two points speak strongly for considering
completeness: 1) these inference systems have such a fine-grained inference
step that incompleteness in the inference proof procedures leads to some
very simple problems being beyond reach and 2) it is best to understand
what one has to do to maintain completeness to know what price must
be paid if completeness is sacrificed. We will make some further specific
comments on this issue later.)
Before presenting a depth-first proof procedure let us observe that at the
propositional, or ground, level there are only a finite number of resolvents
possible beginning with any given (finite) clause set. This follows directly
from two facts: 1) no new atoms are created by the resolution inference
rule, and 2) a literal appears at most once in a clause, so there is an upper
bound on the number of literals in any resolvent.
A straightforward depth-first proof procedure proceeds as follows. If
there are n input clauses then we temporarily designate clause n as the focus
clause. In general the focus clause is the last resolvent still a candidate for
inclusion in a resolution refutation (except for the first clause assignment,
to clause n). We begin by resolving focus clause n against clause 1, then
clause 2, etc. until a new non-subsumed non-tautological clause is created
as a resolvent. This resolvent is labeled clause n+1 and becomes the new
focus clause. It is resolved against clause 1, clause 2, etc. until a new clause
is created that is retained. This becomes the new focus clause and the
pattern continues. If D is obtained the refutation is successful. Otherwise,
some focus clause m creates no new surviving clause and backtracking is
begun. The clause m - 1 is relabeled the focus clause (but clause m is
retained) and clause m - 1 is resolved against those clauses not previously
tried, beginning with clause j + 1 where clause j is the clause that paired
with clause m - 1 to produce clause m. The first retained resolvent is
labeled clause m + 1. Clause m + 1 now becomes the focus clause and the
process continues as before. This backtracking is close to the backtracking
employed by Prolog.
The above depth-first procedure does differ from that used in Prolog,
however. The primary difference is that after one backtracks from a clause
that yields no new resolvent, the clause is not removed. If we are interested
in a complete procedure then we might need to consider using this "aban-
doned" clause C with clauses of higher index not yet defined, and hence not
yet tried with C. Suppose hypothetically that a four clause given set has
only the following resolution refutation (excluding renumbering): clause 3
and clause 4 create a resolvent (clause 5), clause 1 and clause 2 create a
resolvent (clause 6), and D is the resolvent of clause 5 and clause 6. The
depth-first search starts its search by creating clause 5 as before, failing
to create any new resolvent using clause 5, and backtracks by restarting
with clause 4. Clause 4 first pairs with itself, but automatically skips this
since no nontautologous clause resolves with itself, and fails when paired
with clause 5. We have exhausted our first chosen focus clause; when this
happens the backtracking continues backwards through all clauses. Thus,
clause 3 is the new focus clause, does create a resolvent when paired with
clause 4, but that is subsumed by (is identical to) clause 5. Clause 3 sub-
sequently fails as the focus clause and clause 2 becomes the focus clause,
and produces a resolvent when paired with clause 1. This new clause 6
immediately becomes the focus clause, is eventually paired with clause 5
and D is produced. The point of all this is that clause 5 had to be retained
to be used when clause 6 was finally created.
The reader may object to the supposition that clause 5 fails to produce
any resolvent, but the contention that it would always pair with a lower
indexed clause if it is ever used in a refutation would require a theorem
about behavior of resolution refutations. The depth-first search utilized
here is simply designed to sweep the same proof space as swept by the
breadth-first search, but in a different order.
It does happen that clauses that exhaust their pairings and force a back-
track do not need to be retained. (That is, clause 5 above would either yield
a resolvent under some pairing or could be removed upon backtracking.)
This is a consequence of the existence of linear resolution refinements. The
linear refinement asserts that if a clause set is unsatisfiable then there is a
resolution refutation such that each resolvent has the preceding clause as
one of the parent clauses and the other parent clause must appear earlier
in the deduction or be an input clause. The preceding parent clause is
called the near parent clause. The other clause, the far parent clause, can
be defined more precisely as follows: The far parent clause must be either
an input clause or an ancestor clause, where the set of ancestor clauses of a
clause C is the smallest set of clauses containing the near parent clause of
C and the ancestors of the near parent clause of C. Again note that this is
a condition on proof form, not on search. There exists a linear refutation
with any input clause as first near-parent clause if that clause is in a mini-
mally unsatisfiable subset of the input clauses. We call the first near-parent
clause the top clause of the linear deduction. We give a linear refutation of
the clause set given earlier. It is common to list the top clause as the last
input clause.

A linear refutation:
1. PaPb input clause
2. Pa-Pb input clause
3. -PaPb input clause
4. -Pa-Pb input clause
5. -Pb resolvent of 2, 4
6. Pa resolvent of 1,5
7. Pb resolvent of 3,6
8. D resolvent of 5,7
The last step in the above refutation involved two resolvents. This is
true of every refutation of this clause set, since no input clause is a one-
literal clause and D must have two one-literal parents. This provides an
example that not every clause set has a linear refutation where the far
parent is always an input clause. Clauses 4, 5 and 6 are the ancestors of
clause 7 in the above example. This definition of ancestor omits clauses 1,
2 and 3 as ancestors of clause 7 but our intent is to capture the derived
clauses used in the derivation of clause 7.
The linear restriction was independently discovered by Loveland (see
[Loveland, 1970], where the name linear was introduced and a stronger re-
striction s-linear resolution also was introduced) and by Luckham (where
the name ancestor filter was used; see [Luckham, 1970].) A proof of com-
pleteness of this restriction also is given in Chang and Lee [Chang and Lee,
1973] and Loveland [Loveland, 1978].
With this result we can organize our depth-first search to discard any
clause when we backtrack due to failure of that clause to lead to a proof.
It follows that we need only try a given resolvent with clauses of lower
index than the resolvent itself and that all retained resolvents are elements
of the same (linear) proof attempt. Use of this restriction permits a great
saving in space since only the current proof attempt clauses are retained.
However, a large cost in search effectiveness may be paid because of much
duplication of resolvent computation. In a proof search the same clause (or
close variant) is often derived many times through different proof histories.
In a breadth-first style search the redundant occurrences can be eliminated
by subsumption check. This is a check not available when resolved clauses
are eliminated upon backtracking as usually done in the depth-first linear
resolution procedures. Each recreation of a clause occurrence usually means
a search to try to eliminate the occurrence. This produces the high search
price associated with depth-first search procedures.
The redundant search problem suffered by depth-first linear resolution
procedures may make this approach unwise for proving deep mathematical
theorems, where much computation is needed and few heuristic rules exist
to guide the search. Depth-first linear resolution is often just the right pro-
cedure when there is a fairly strong guidance mechanism suggesting which
clauses are most likely to be useful at certain points in the search. This
cate names, which means that relatively few literals are possible matches.
This trims the branching rate. Secondly, the user orders the clauses with
knowledge of how he expects the computation to proceed. Clearly, when
some information exists to lead one down essentially correct paths, then
there is a big win over developing and retaining all possible deductions to
a certain depth. Besides logic programming applications, the depth-first
approach is justified within artificial intelligence (AI) reasoning system ap-
plications where the search may be fairly restricted with strong heuristic
guidance. When such conditions hold, depth-first search can have an ad-
ditional advantage in that very sophisticated implementation architectures
exist (based on the Warren Abstract Machine; see [Ait-Kaci, 1990]) allow-
ing much higher inference rates (essentially, the number of resolution opera-
tions attempted) than is realized by other procedure designs. This speed is
realized for a restricted linear form called linear input procedures, which we
discuss later, which is usually implemented using depth-first search mode.
It is important to realize that the linear resolution restriction need not
be implemented in a depth-first manner. For example, one might begin with
several input clauses "simultaneously" (either with truly parallel compu-
tation on a parallel computer or by interleaving sequential time segments
on a sequential machine) and compute all depth n + 1 resolvents using
all depth n parents before proceeding to depth n + 2 resolvents. This
way subsumption can be used on the retained clauses (without endanger-
ing completeness) yet many resolution rule applications are avoided, such
as between two input clauses not designated as start clauses for a linear
refutation search.
The mode of search described above is very close to the set-of-support
resolution restriction introduced by Wos et al. [1964; 1965]. A refutation
of a clause set S with set-of-support T ⊆ S is a resolution refutation where
every resolvent has at least one parent a resolvent or a member of T. Thus
two members of S - T cannot be resolved together. To insure completeness
it suffices to determine that S - T is a satisfiable set. The primary difference
between this restriction and the quasi-breadth-first search for linear resolu-
tion described in the preceding paragraph is that for the linear refutation
search of the previous paragraph one could choose to segregate clauses by
the deduction responsible for its creation (labeled by initial parents), and
not resolve clauses from different deductions. This reduces the branching
factor in the search tree. If this is done, then any subsuming clause must
acquire the deduction label of the clause it subsumes. The resulting refu-
tation will not be linear if a subsuming clause is utilized, but usually the
refutation existence, and not the style of refutation, is what matters. One
could call such refutations locally linear refutations. Loveland [Loveland,
1978] discusses linear restrictions with subsumption.


Before concluding this discussion of search strategies, we consider ori-
ented clause sets. So far, we have treated the clause set as non-oriented, in
that no clause or subset of clauses received special notice a priori. Such is
the situation when an arbitrary logical formula is tested for unsatisfiabil-
ity. However, often the clause set derives from a theorem of form A ⊃ B,
where A is a set of axioms of some theory and B is the theorem statement
believed to hold in the theory. Analogously, A may be viewed as a logic
programming program and B is the query. It is well-known that testing
the validity of A ⊃ B answers the question: Is B a logical consequence of
A?
We test A ⊃ B for validity by testing A ∧ ¬B for unsatisfiability. The
conversion to Skolem conjunctive form of A ∧ ¬B permits A and ¬B to
be converted separately and the two clause sets conjoined. The clause set
SA from A is called the theory base, database or axiom set, and the clause
set SB from ¬B is called the goal set, query set or theorem statement
(set). In practice, the cardinality of set SA is usually much larger than the
cardinality of SB. Also, there is often no assurance that all the clauses of
SA are needed in the proof, whereas usually all clauses of SB are needed.
Moreover, SA is usually a consistent clause set. Thus, the support set T is
usually a subset of SB, and linear refutations are started with one parent
from SB.
In this oriented clause set setting there are natural notions of direction of
search. A linear refutation with one parent from SB is called a backchain-
ing, goal-oriented, or top-down refutation and the associated search is a
backchaining or goal-oriented search. If the linear refutation begins with
both parents from SA then it is a forward chaining, forward-directed, or
bottom-up refutation, with the same labels applied to the associated search.
The same labels apply to the set-of-support refinement with support set
T ⊆ SB or T ⊆ SA. If top-down and bottom-up refutations are sought
simultaneously, then the search is said to be bidirectional. The idea of bidi-
rectionality is to share resolvents between the bottom-up and top-down
searches, in effect having search trees growing bottom-up and top-down
which meet and provide a complete proof path from data to goal. If this
occurs, the search trees involved are roughly half the depth of a search
tree strictly top-down or a tree strictly bottom-up. Since the number of
tree nodes (resolvents) of any level generally grows exponentially with the
depth of the level, a tremendous potential savings exists in search size. Au-
tomated theorem provers use bidirectional search sometimes, but use is by
no means universal. One problem is what clauses from SA are appropriate.
In back-chaining systems, often definitions are expanded forward several
steps, a case of forward chaining.
A study of bidirectional search in a more general setting than resolu-
tion inference was done by Pohl [Pohl, 1971]. A related study on search
strategies, explicitly considering resolution, is given in Kowalski [Kowalski,
1970].
Before proceeding to further restrictions of resolution we need to present
the first-order version of the resolution procedure. The more common pre-
sentation of first-order resolution uses two inference rules, binary resolution
and binary factoring. Binary resolution is the first-order counterpart of the
propositional resolution rule given earlier and factoring is the generaliza-
tion of the merging of literals that removes duplication in the propositional
case. ("Binary" refers to action limited to two literals; if factoring is in-
corporated within the resolution rule then several literals in one or both
parent clauses might be "removed" from the resolvent.)
To present the binary resolution and factoring rules of inference, unifi-
cation must be introduced. Substitution θ is a unifier of expressions l1 and
l2 iff l1θ = l2θ. For our use it suffices to consider l1 and l2 as (first-order)
atoms or first-order terms. Substitution σ is a most general unifier (mgu)
of l1 and l2 iff l1σ = l2σ and for any substitution γ such that l1γ = l2γ,
there is a λ such that γ = σλ. We defer to the chapter on unification in
Volume 1 of this handbook for further discussion of this topic. Also see the
chapter on resolution-based deduction in the same volume.
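For illustration, a most general unifier can be computed by the familiar
recursive descent with an occurs check. The Python sketch below assumes
a representation of our own: an atom or compound term is a tuple whose
first element is the predicate or function symbol, a constant is a one-element
tuple, and a variable is a lower-case string.

    def is_var(t):
        # our convention: variables are lower-case strings such as 'x', 'y'
        return isinstance(t, str) and t[:1].islower()

    def substitute(t, subst):
        if is_var(t):
            return substitute(subst[t], subst) if t in subst else t
        if isinstance(t, tuple):
            return (t[0],) + tuple(substitute(a, subst) for a in t[1:])
        return t

    def occurs(v, t, subst):
        t = substitute(t, subst)
        if t == v:
            return True
        return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])

    def unify(s, t, subst=None):
        """Return a most general unifier of s and t as a dict, or None."""
        subst = dict(subst or {})
        s, t = substitute(s, subst), substitute(t, subst)
        if s == t:
            return subst
        if is_var(s):
            if occurs(s, t, subst):
                return None              # occurs check: no unifier exists
            subst[s] = t
            return subst
        if is_var(t):
            return unify(t, s, subst)
        if (isinstance(s, tuple) and isinstance(t, tuple)
                and s[0] == t[0] and len(s) == len(t)):
            for a, b in zip(s[1:], t[1:]):
                subst = unify(a, b, subst)
                if subst is None:
                    return None
            return subst
        return None

    # P(x, f(x)) and P(a, y): the mgu binds x to a and y to f(a)
    print(unify(('P', 'x', ('f', 'x')), ('P', ('a',), 'y')))
    # {'x': ('a',), 'y': ('f', ('a',))}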
Two clauses are variable disjoint if they share no common variables.
We now present the inference rules for first-order resolution.
Binary resolution. Given variable disjoint clauses C1 ∨ a1 and C2 ∨ ¬a2,
where a1 and a2 are atoms and C1 and C2 are disjunctions of literals, we
deduce resolvent (C1 ∨ C2)σ from C1 ∨ a1 and C2 ∨ ¬a2, where σ is the
mgu of a1 and a2; in summary:

        C1 ∨ a1     C2 ∨ ¬a2
        --------------------
             (C1 ∨ C2)σ

Binary factoring. Given clause C ∨ l1 ∨ l2, where l1 and l2 are literals and
C is a disjunction of literals, we deduce factor (C ∨ l)σ where σ is the mgu
of l1 and l2, i.e. l1σ = l = l2σ; in summary:

        C ∨ l1 ∨ l2
        -----------
          (C ∨ l)σ

We illustrate the use of these rules by giving a refutation of the clause
set PxQx, Px-Qy, -PxQf(x), -Px-Qx.
Because the resolution rule requires the two parent clauses to be variable
disjoint, clauses may have their variables renamed prior to the application
of the resolution rule. This is done without explicit reference; we say
that clause Px and clause -PxRf(x) have resolvent Rf(x) when the actual
resolution rule application would first rename Px as (say) Py and then use
the mgu of Py and Px to obtain the resolvent.
We give a linear refutation:


1. PxQx input clause
2. Px-Qy input clause
3. -PxQf(x) input clause
4. -Px-Qx input clause
5. -Qx-Qy resolvent of 2, 4
6. -Qx factor of 5
7. Px resolvent of 1, 6
8. Qf(x) resolvent of 3,7
9. D resolvent of 6,8
The soundness of resolution procedures at the first-order level, that D
is derived only when the input clause set is unsatisfiable, is an immedi-
ate consequence of the fact that resolution and factoring preserve validity.
The completeness of resolution procedures, that D is derivable whenever
the input clause set is unsatisfiable, utilizes the Skolem-Herbrand-Gödel
theorem (a model-theoretic version of a very difficult proof-theoretic the-
orem of Herbrand). The Skolem-Herbrand-Gödel theorem asserts that a
first-order formula (resp., clause set) is unsatisfiable iff a finite conjunction
of ground formula (resp. clause set) instances is unsatisfiable. By a ground
formula instance we mean a copy of the formula with the variables replaced
by variable-free terms composed of symbols from the (non-variable) term
alphabet of the formula. The set of such terms is called the Herbrand uni-
verse. (If no constant individual appears in the formula, one is included
by decree.) The standard completeness proof for a first-order resolution
proof procedure has two parts: 1) a proof of the ground version of the
procedure and 2) a "lifting" argument that relates the first-order search
to a ground-level search by establishing that for any ground deduction of
that class of deductions defined by the proof procedure there is a first-order
deduction in that class with that ground deduction as a ground instance.
In particular, use of most general unifiers assures us that if clauses C1 and
C2 have ground instances C1θ and C2θ in the ground deduction then the
resolvent C3 of C1 and C2 has C3θ as a resolvent of C1θ and C2θ. One
can find more on the general nature of such arguments in Volume 1 of this
handbook. Proofs of completeness of various restrictions of resolution must
be argued separately; many such proofs appear in Chang and Lee [Chang
and Lee, 1973] and Loveland [Loveland, 1978].
The search component of first-order proof procedures is as discussed
for the propositional (or ground) procedures, with one notable exception.
Since a set of resolvents generated from a finite set of clauses can be infinite,
is well-known to Prolog users.) To insure that a refutation can be found
when one exists, a variation of pure depth-first called iterative deepening is
sometimes used. Iterative deepening calls for repeated depth-first search to
bounds d + nr, for d > 0, r > 0 and n = 0, 1, 2, ..., where the user sets the
parameters d and r prior to the computation. Frequently, r = 1 is used.
The advantage of small storage use, and speed for input linear search, is
retained and yet some similarity to breadth-first search is introduced. The
cost is recomputation of earlier levels. However, if one recalls that for
complete binary trees (no one-child nodes and all leaves at the same level)
there are as many leaves as interior nodes, and the proportion of leaves
grows as the branching factor grows, one sees that recomputation is not
as frightening as it first appears. Perhaps the more important downside to
iterative deepening is that if you are structured so as to find the solution
while sweeping a small portion of the entire search space with sufficient
depth bound then a depth bound just too low makes the procedure sweep
the entire search space before failing and incrementing. This is a primary
reason why Prolog employs pure depth-first search. (Also, of course, be-
cause very frequently Prolog does not have infinite search branches, at least
under the clause ordering chosen by the user.)
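The control regime itself is trivial to express; as a small illustration (the
function name and the caller-supplied depth-bounded search are our own
assumptions), iterative deepening is just a loop over the bounds d + nr:

    def iterative_deepening(depth_bounded_search, d=1, r=1, max_rounds=50):
        """Run depth_bounded_search(bound) for bound = d, d+r, d+2r, ...
        The supplied search should return a proof (any non-None value) when it
        succeeds within the bound, and None otherwise."""
        for n in range(max_rounds):
            result = depth_bounded_search(d + n * r)
            if result is not None:
                return result
        return None

With r = 1, the common choice mentioned above, the bound simply grows
by one after each failed sweep.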
Seeking linear refutations at the first-order level introduces a techni-
cal issue regarding factoring. The complication arises because we wish to
enforce the notion of linearity strongly. In particular, if a factor of a far
parent is needed that was not produced when that clause was a near par-
ent, then the factor would have to be entered as a line in the deduction,
violating the condition that the near parent always participates in creation
of the next entry (by resolution or factoring). To avoid this violation we
will agree that for linear resolution, unless otherwise noted, a resolution
operation can include use of a factor of the far parent ancestor or input
clause. Actually, for most versions of linear resolution, including the gen-
eral form introduced already, this caveat is not needed provided that the
given set has all its factors explicitly given also. (That is, the given set
is closed under the factoring inference rule.) A fuller treatment of this is-
sue, and much of what follows regarding linear resolution refinements and
variants, is found in Loveland [Loveland, 1978].
1.2 Linear resolution refinements
There are several restrictions of linear resolution, and several combinations
that multiply the possibilities for variations of linear resolutions. We will
settle for one variant, among the most restricted of possibilities. Our inter-
est in the particular form is that it is a resolution equivalent to the model
elimination (SL-resolution) procedure that we study next.
We need to be more precise about the notion of clause subsumption
before considering the next refinement. The idea of subsumption is to rec-
ognize when one formula makes a second formula redundant. This concept
of subsumption can be expressed as follows: formula C subsumes formula D
iff ∀xC implies ∀yD, where x (resp. y) denotes the free variables of C (resp.
D). For example, ∀x(P(x) ∨ ¬P(f(x))) implies ∀y(P(y) ∨ ¬P(f(f(y)))); here,
two instances of clause Px-Pf(x) are needed to infer ∀y(Py ∨ ¬Pf(f(y))),
namely, Px-Pf(x) and Pf(x)-Pf(f(x)). Since we need an expedient
test for subsumption, a form of subsumption used most often in resolution
provers is a more limited form we call θ-subsumption: clause C θ-subsumes
clause D if there exists a substitution θ such that the literals of Cθ are also
in D (we write this as Cθ ⊆ D, accepting the view of a clause as a set of
literals), and that the number of literals of C does not exceed the number
of literals of D. Without the literal count requirement every factor would
be subsumed by its parent, and discarding clauses based on subsumption
would generally not be possible. For many resolution strategies including
breadth-first, linear and set-of-support resolution, θ-subsumed clauses can
be removed from further consideration.
We do not claim that the check for θ-subsumption always carries an
acceptable cost; its use especially when C is a multiliteral clause has been
controversial. However, strong use of θ-subsumption has been well defended
(see [Wos et al., 1991]). Certainly the test is much faster than the general
definition because only one instance of C is involved.
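Because only one instance of C is involved, the θ-subsumption test reduces
to one-way matching of the literals of C into D under a single consistent
substitution. The Python sketch below is our own illustration (literals are
tuples headed by a possibly negated predicate symbol, variables are lower-
case strings, constants are one-element tuples); it performs the naive back-
tracking test, including the literal count requirement.

    def is_var(t):
        return isinstance(t, str) and t[:1].islower()

    def match(pattern, term, subst):
        """One-way matching: extend subst so that pattern instantiated by subst equals term."""
        if is_var(pattern):
            if pattern in subst:
                return subst if subst[pattern] == term else None
            s = dict(subst)
            s[pattern] = term
            return s
        if (isinstance(pattern, tuple) and isinstance(term, tuple)
                and pattern[0] == term[0] and len(pattern) == len(term)):
            for p, t in zip(pattern[1:], term[1:]):
                subst = match(p, t, subst)
                if subst is None:
                    return None
            return subst
        return subst if pattern == term else None

    def theta_subsumes(c, d):
        """Does clause c theta-subsume clause d?  Clauses are lists of literals."""
        if len(c) > len(d):                 # the literal count requirement
            return False
        def place(i, subst):
            if i == len(c):
                return True
            for lit in d:
                s = match(c[i], lit, subst)
                if s is not None and place(i + 1, s):
                    return True
            return False
        return place(0, {})

    # Px -Pf(x) theta-subsumes Pa -Pf(a) -Q with theta = {x -> a}
    C = [('P', 'x'), ('-P', ('f', 'x'))]
    D = [('P', ('a',)), ('-P', ('f', ('a',))), ('-Q',)]
    print(theta_subsumes(C, D))    # True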
We now introduce a small modification of the resolution inference rule
that utilizes the θ-subsumption criterion. We say that C is an s-resolvent
(for subsumption resolvent) of near parent clause C1 and far parent clause
C2 if C is (a factor of) a resolvent C3 where C θ-subsumes a substitu-
tion instance of near parent C1 - l, l the resolving literal. We amplify
our reference to "substitution instance" below. Sometimes a factor of the
standard resolvent is needed to satisfy the θ-subsumption restriction, and
at other times no s-resolvent exists. We will see that this highly constrains
the resolvent; moreover, we can limit any resolution where the far parent
is a non-input ancestor clause to s-resolution in the s-linear restriction we
define shortly. We give examples of s-resolution:
Example 1.2.1.
Near parent clause: PxQaRx
Far parent clause: Qy-Rf(x)
S-resolvent: Pf(x)Qa
Standard resolvent: Pf(x)QaQy

Example 1.2.2.
Near parent clause: PxQaRx
Far parent clause: Qb-Rf(x)
No s-resolvent exists
The first example illustrates the reason we must allow s-resolvent C
to θ-subsume only an instance of near parent clause C1. In this example,
the standard resolvent contains Pf(x), created by the resolution operation
from parent literal Px. Clearly, there is no instance of the resolvent that
will θ-subsume near parent clause PxQaRx, but instance Pf(x)QaRf(x)
is θ-subsumed by Pf(x)Qa, here a factor of the standard resolvent.
The s-resolution operation can be quite costly to perform. The direct
approach involves computing the resolvent and checking the θ-subsumption
condition. The latter would most often cause rejection, so a priori tests on
the parents would be desirable. We do not pursue this further because we
finesse this issue later.
In pursuit of the strongly restricted form of linear resolution, we intro-
duce the notion of ordered clause, with the intention of restricting the choice
of resolving literal. By decreasing the choice we decrease the branching fac-
tor of the search tree. We set the convention that the rightmost literal of a
clause is to be the resolving literal; however, we merge leftward, so that in
the resolution operation the substitution instance demanded by the unifier
of the resolution operation may create a merge left situation where the
intended resolving literal disappears from the rightmost position! Then we
say that such a resolution operation application fails.
The class of orderings for the restricted linear resolution we define per-
mits any ordering the user chooses for input clauses except that each literal
of each input clause must be rightmost in some ordered clause. Thus a 3-
literal input clause must have at least three ordered clauses derived from
it. A resolvent-ordered clause has leftmost the surviving descendents of
the literals of the near parent ordered clause, in the order determined by
the near parent ordered clause, while the surviving descendents of the far
parent can be ordered in any manner the user wishes, i.e. is determined by
the particular ordering the user has chosen.
The ordering applied to the input clauses is superfluous, except to the
input clause chosen as first near parent clause (called the top clause), be-
cause all other uses of input clauses are as far parent clauses in a resolution
operation. By definition of resolvent ordered clause, the literals descendent
from the far parent can be ordered as desired when determining the resol-
vent. Of course, it is important that any ordering of input clauses provide
that every literal of every input clause be rightmost, and thus accessible
as a resolving literal. The notion of "top clause" is often used with arbi-
trary linear deductions, but is needed for ordered clause linear deductions
because the first resolution step is not symmetrical in the parent clauses;
one clause provides the leftmost literals of the ordered resolvent.


An ordered s-linear deduction of clause A from a set S of clauses is
an ordered clause linear deduction of A from the fully factored set Sf of
S where any resolvent without an input clause as a parent clause is an s-
resolvent. If the fully factored input set is used then factors of ancestor
clauses need not be derived in the deduction. This is true because it can
be shown that factors of s-resolvents need not be computed.
Given a resolution refutation B1, ..., Bn, it seems intuitive that if Bi
θ-subsumes Bj, i < j, we have made no progress towards D in the interval
[i, j]. We call a deduction weakly tight if for no i < j is Bi ⊆ Bj. Full
tightness, that for no i < j does Bi θ-subsume Bj, is the ideal condition
to enforce, but it is not clear that it is enforceable. A careful definition
excludes clauses from θ-subsuming their factors but this hints at the care
needed. Also, since s-resolution results in a shorter clause than the near
parent (resolving literal of the near parent is removed at least) its use is a
win when available. A deduction satisfies the weak subsumption rule if an
s-resolvent is used whenever both parent clauses are derived clauses and
the near parent clause is variable-free. In particular this rule says that for
the propositional case no consideration to resolution with input clauses is
needed when an s-resolvent exists. The rule is called "weak" because the
unconstrained demand that s-resolution be used when available yields an
incomplete procedure. The reason is intuitive; s-resolution always works
when the right terms are at hand, but s-resolution where free variables
occur can lead to forcing a wrong instantiation.
An ordered s-linear deduction is a TOSS deduction (weakly tight or-
dered s-linear resolution with the weak subsumption rule) if the deduction
is weakly tight and satisfies the weak subsumption rule.
S-linear deductions exist that are neither weakly tight nor satisfy the
weak subsumption rule.
We give two examples of ordered s-linear (actually, TOSS) deductions.
The first example is one of the simplest examples where factoring is required
in almost all resolution formats. Any order of input clauses is permitted.

1. PxPy input
2. -Px-Py input, top clause
3. -PxPy resolvent of 1,2
4. -Px S-resolvent of 2,3
5. Px resolvent of 1,4
6. D S-resolvent of 4,5

This refutation uses an s-resolution with an input clause as one parent.


Here the far parent is the top clause. The top clause can be treated like
a derived clause, so that if derived clause 3 above had been ground, s-
resolution could be forced. Actually, in this case no alternative need be
considered anyway, because if the resolvent has every literal simply a vari-
if the s-resolvent does not instantiate the literals inherited from the near
parent clause.
The second example has a more dramatic instance of s-resolution and
illustrates how regular factoring of resolvents can then be dropped. How-
ever, it is seen from this example that there is a possible price to pay for
banning factoring in that a longer proof is required. It is controversial
whether factoring is better included or excluded in linear deductions. It is
a trade-off of a reduced branching factor versus shorter proofs and hence
shorter proof trees. (Recall that Prolog avoids factoring; factoring within
linear resolution is optional in the Horn clause domain.)
The clause order is determined by P < Q < R in the following example.
Example 1.2.3.
1. Qx input
2. PxRx input
3. -PxRf(y) input
4. -Px-Rx input
5. Px-Qy-Rz input, top clause
6. Px-QyPz resolvent of 2,5
7. Px-QyRf(z) resolvent of 3,6
8. Px-Qy-Pf(z) resolvent of 4,7
9. Px-Qy s-resolvent of 6,8
10. Px resolvent of 1,9
11. Rf(x) resolvent of 3,10
12. -Pf(x) resolvent of 4,11
13. D s-resolvent of 10,12

By factoring at step 6, steps 7 and 8 would have been omitted. In


essence, step 7 through step 9 are repeated at step 11 through step 13, each
sequence "removing" Px. These redundant proof segments are common
when factoring is not used.
The TOSS restriction is not the strongest restriction possible; one can
superimpose a restriction on the choice of far parent of s-resolutions. The
added restriction is based on a restriction introduced by Andrews [An-
drews, 1976], called resolution with merging. A resolvent may often con-
tain a literal with parenthood in both parent clauses, such as literal Pa
in resolvent PaPx with parent clauses PaQx and PaPx-Qx. This occurs
because identical literals are merged. At the general level, unifiable liter-
als may remain separate in the resolvent, but factoring then permits their
"merging". Thus, PxQy and Px-Qy yields resolvent PxPz with factor
Px. When such a factor contains a literal with parenthood in both parent
clauses, mimicking the ground case, we call this a merge factor and the
pertinent literal(s) merge literal(s).
An MTOSS deduction (a TOSS deduction with merging) is a TOSS
deduction where s-resolution occurs only when the rightmost literal of the
far parent is a descendent of a merge literal.
Although the MTOSS restriction is as constraining a principle as is
known regarding limiting resolution inferences when both parent clauses
are derived, the merge condition has not received much attention in im-
plementation. Actually, the TOSS and even the s-resolution restriction
have not received much attention regarding implementations explicitly. As
suggested earlier, a strongly related format has received much attention.
The focus of the preceding consideration of linear restrictions is to re-
duce the need to resolve two derived clauses together. Why is this interest-
ing? For several (related) reasons. One of the problems in proof search is
the branching factor, how many alternative inferences could be pursued at
a given point in a partial deduction. Since all input clauses are deemed al-
ways eligible (at least for linear resolution) the branching factor is reduced
by tightening conditions for resolving two derived clauses. A second rea-
son is that the input clauses can be viewed as given operators. In a linear
deduction the far parent clause can be seen as modifying the near parent
clause, transforming the near parent clause to a new but related clause.
S-resolution is pleasing in this regard because it says that one needs to
deviate from applying the given "operators" only when the near parent is
modified by removal of the literal receiving current attention, the resolving
literal. (The ordering previously specified for linear deductions is useful in
this viewpoint because it creates a focus on a given literal and its descen-
dents before considering another literal of the clause.) An implementation
consideration for limiting derived-clause resolution is that one can precon-
dition the input clauses to optimize the resolution operation when a input
clause is used in the resolution. Indeed, this is what Prolog does; compilers
exist based on this principle.
To the extent that the above points have validity then the ideal linear
resolution is the linear input format, where every resolvent has a input
clause as far parent clause. (The word "linear" in "linear input" is redun-
dant as use of a input clause as one parent forces a deduction to be linear,
but the term is common and makes explicit that input resolution is a linear
resolution restriction.)
Linear input resolution has been shown earlier not to be complete for
first-order logic, but it is complete over a very important subset. A formula
is a Horn formula if it is in prenex normal form with its quantifier-free part
(matrix) in conjunctive normal form, where each clause has at most one
positive literal. A Horn (clause) set is a clause set where each clause has
at most one positive literal. That every unsatisfiable Horn set has a linear
input refutation is immediate from the completeness of linear resolution
negative clause) as top clause then every derived clause is a negative clause
negative clause) as top clause then every derived clause is a negative clause
and resolution between derived clauses is impossible. A negative clause
must exist in each unsatisfiable set, for otherwise a model consisting of
positive literals exists. The completeness proof for linear resolution shows
that a refutation exists using as top clause any clause that is in a minimally
unsatisfiable subset of the input clause set.
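At the propositional level this linear input restriction on Horn sets is just
the goal-stack computation familiar from Prolog. The following Python
sketch (a schematic of ours, depth-bounded to sidestep the nontermination
issue discussed earlier) starts from an all-negative top clause and repeatedly
resolves its first goal against an input definite clause.

    def sld_refutable(goals, program, bound):
        """Propositional linear input (SLD-style) refutation search.
        goals: the atoms of the all-negative top clause, written positively;
        program: (head, body) pairs for the definite clauses, facts having body [];
        bound: a limit on the number of resolution steps."""
        if not goals:
            return True                    # the empty clause has been reached
        if bound == 0:
            return False
        first, rest = goals[0], goals[1:]
        for head, body in program:
            # linear input step: the far parent is always an input clause
            if head == first and sld_refutable(body + rest, program, bound - 1):
                return True
        return False

    # p :- q, r.   q.   r :- q.   with top clause (query) p
    program = [('p', ['q', 'r']), ('q', []), ('r', ['q'])]
    print(sld_refutable(['p'], program, bound=10))    # True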
How much wider a class than Horn sets can linear input resolution han-
dle? Essentially, no wider class. It can be shown (see Loveland [Loveland,
1978]) that if ground clause set S has a linear input refutation then there
is a possible renaming of literals such that S1, a renaming of S, is a Horn
set. A ground set S1 is a renaming of S if S and S1 differ only in that for
some atoms the positive and negated occurrences of that atom have been
interchanged. This is equivalent to renaming each atom of a selected set of
atoms by a new negated atom at every occurrence and replacing ¬¬A by
A. Thus the ground set

is not Horn, because the first clause has two positive literals, but does have
a linear input refutation, as the reader can easily check. However, the set is
convertible to a Horn set by interchanging Pa, Pb, Pc with their negations.
This theorem that ground clause sets with linear input refutations are
renamable Horn sets need not hold at the general level. This is because
a literal within a clause C has several ground instances, some instances
requiring renaming and others not, and without finding the refutation (or
at least the instances needed) it is not possible to decide whether or not to
rename the literal occurrence within C.
Horn sets are considered further when we focus on logic programming.
We move on with our review of related theorem-proving procedures.
Apparently the linear input format is desirable, and sophisticated and
effective architectures for Prolog implementations have indeed affirmed its
desirability, but we seem condemned by the theorem just considered to
clause sets whose ground image is a renamable Horn set. Actually, the
manner of overcoming this problem was discovered before linear resolution
was understood as such. The means of having linear input format complete
for all of first-order logic is to alter the underlying resolution structure.
The model elimination (ME) procedure (see [Loveland, 1968; Loveland,
1969; Loveland, 1978]), introduced almost simultaneously with the resolu-
tion procedure, uses the notion of chain, analogous to an ordered clause but
with two classes of literals, A-literals and B-literals. The set of B-literals
in a chain can be regarded as a derived clause as resolution would obtain.
SL-resolution [Kowalski and Kuehner, 1971], also a linear input procedure,
uses the ME device of two classes of literals to achieve completeness for
first-order logic. In this respect it is not strictly a resolution procedure in
spite of its name (it differs from ME primarily in introducing a factoring
operation) so does not conflict with the relationship between Horn clause
sets and linear input resolution as its label might suggest.
As is well-known to logic programming experts, the procedure support-
ing Prolog is a linear input procedure applied to Horn clauses, but adapted
pragmatically after experimentation with SL-resolution. ME owes its cur-
rent success to a series of implementations by Mark Stickel [Stickel, 1984;
Stickel, 1988]; the implementations are collected under the name Pro-
log Technology Theorem Prover. Recently, parallel implementations of
ME have appeared [Bose et al., 1991; Schumann and Letz, 1990; Astra-
chan and Loveland, 1991]. (Some impressive sequential implementations
accompany the parallel versions; e.g., see [Astrachan and Stickel, 1992;
Letz et al., 1991].) These implementations have used the Warren Abstract
Machine (WAM) architecture [Warren, 1983; Ait-Kaci, 1990] so successful
in implementing Prolog.
We present a summary of the model elimination and SL-resolution pro-
cedures. We present the ME procedure and then note the distinctions
introduced by SL-resolution.
There is great similarity between TOSS deductions and ME deductions;
for a proper variant of the TOSS procedure there is an isomorphism with
ME. (The variant allows no merging or factoring.) For that reason we
will present an example, first with a ground TOSS refutation and then
with a corresponding ME refutation. This will be helpful in following the
description of ME. Following the informal, simplified introduction to ME
and the examples we give a formal, precise definition of (one version of)
ME.
The model elimination procedure has two inference operations, exten-
sion and reduction. Extension corresponds closely to the resolution oper-
ation and reduction is related to factoring. Whereas the resolution opera-
tion deletes both resolving literals, the corresponding extension operation
retains the "resolving" literal of the "near parent" chain, the preceding
chain, but promotes the descendent of the literal from a B-literal to an
A-literal. A-literals capture and retain all the ancestor information that
is really needed, it turns out. (We will drop the reference to descendent
and identify the successor literal with the parent, even when instantiation
occurs. This will cause no confusion as it is natural to think dynami-
cally of literals passing to newly derived chains in a procedural way.) The
counterpart of s-resolution is the reduction operation (not related to "re-
duction" in logic programming parlance) where the rightmost B-literal is
removed because it is complementary (using unification in the general case)
to an A-literal to its left. Factoring is not needed in ME, as mentioned;
reduction can be seen to incorporate factoring and even merging, so merg-
ing of ground literals is not done either. (Factoring is explicitly done in
SL-resolution although not needed for completeness.) Chains are ordered,
like ordered clauses, and the rightmost B-literal is always the literal acted
upon. Any A-literal that becomes rightmost is dropped. The ME proce-
dure we describe is called weak ME in [Loveland, 1978], and was the first
ME procedure introduced [Loveland, 1968].
We now give our two related examples. We take our examples from
[Loveland, 1978], p. 170.
Example 1.2.4. We choose the clause ordering for resolvents that is the
reverse order of their presentation in the prefix. Recall that the ordering
only affects literals from the far parent; literals from the near parent have
the order of the near parent.
A TOSS refutation:
1. ¬Pa                  input
2. ¬Pc                  input
3. ¬Pb Pd               input
4. ¬Pb Pc ¬Pd           input
5. Pa Pb Pc             input, top clause
6. Pa Pb                resolvent of 2,5
7. Pa Pd                resolvent of 3,6
8. Pa Pc ¬Pb            resolvent of 4,7
9. Pa Pc                s-resolvent of 6,8
10. Pa                  resolvent of 2,9
11. □                   resolvent of 1,10

Now we give the ME counterpart.


Example 1.2.5. The chain-ordering for derived chains is as for the pre-
ceding example, namely to extend with literals in reverse order to their
appearance in the prefix, the input clause set.
An ME refutation:
1. ¬Pa                  input clause
2. ¬Pc                  input clause
3. ¬Pb Pd               input clause
4. ¬Pb Pc ¬Pd           input clause
5. Pa Pb Pc             input, top clause
6a. Pa Pb [Pc]          extend with 2
    (A-literal is then deleted as it is rightmost)
6b. Pa Pb
7. Pa [Pb] Pd           extend with 3
8. Pa [Pb] [Pd] Pc ¬Pb  extend with 4
9. Pa [Pb] [Pd] Pc      reduce
    (A-literal Pb allows the reduction)
10. Pa                  extend with 2
11. □                   extend with 1


We now present the specifications for the (weak) model elimination
procedure.
A chain is a sequence of literals, where each literal is in one of two
classes, A-literal or B-literal. The input clauses define elementary chains,
consisting of all B-literals. Each input clause must define at least one
chain for each literal of the input clause, with that literal rightmost. (This
provides that each literal of each input clause is accessible as a resolving
literal.)
Not every literal sequence is a useful chain. We will use only admissible
chains. A chain is admissible iff:
(1) complementary B-literals are separated by an A-literal;
(2) no B-literal is to the right of an identical A-literal;
(3) no two A-literals have the same atom; and
(4) the rightmost literal is a B-literal.
The reason for condition (1) is that complementary B-literals not sep-
arated by an A-literal originate in the same clause, and the offending non-
admissible chain would incorporate a tautologous clause. This can arise
even if the associated input clause that contributed those literals is not
tautologous because the derived chain undergoes instantiation. Reason (2)
comes from the procedure design, from one basic view of ME as using the
A-literals to attempt to define a model of the clause set, where each attempt
at a model is eliminated in turn in the process of obtaining a refutation.
No redundant entries in a model specification are needed, and the B-literal
could only become an A-literal later. Reason (3), that no identical or com-
plementary A-literals need occur, combines the just-mentioned relationship
to models and recognition that the reduction operation should have been
used when the right A-literal was a B-literal in an earlier chain. Reason
(4) reflects automatic removal of rightmost A-literals.
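To make the admissibility conditions concrete, the following is a minimal sketch in Prolog of a ground-chain representation together with a check of conditions (1)-(4). The representation (a list of terms a(L) and b(L) for A- and B-literals, with a negated atom written -Atom) and the predicate names are our own illustrative choices, not taken from the implementations cited above.

% A ground chain is a Prolog list, read left to right, whose elements are
% a(L) for an A-literal and b(L) for a B-literal; a literal L is an atom
% or -Atom.  (Representation assumed here for illustration only.)

complement(-A, A) :- !.
complement(A, -A).

atom_of(-A, A) :- !.
atom_of(A, A).

% Condition (1): complementary B-literals are separated by an A-literal.
violates1(Chain) :-
    append(_, [b(L) | Rest], Chain),
    complement(L, Lc),
    append(Mid, [b(Lc) | _], Rest),
    \+ member(a(_), Mid).

% Condition (2): no B-literal is to the right of an identical A-literal.
violates2(Chain) :-
    append(_, [a(L) | Rest], Chain),
    member(b(L), Rest).

% Condition (3): no two A-literals have the same atom.
violates3(Chain) :-
    append(_, [a(L1) | Rest], Chain),
    member(a(L2), Rest),
    atom_of(L1, At), atom_of(L2, At).

% Condition (4): the rightmost literal is a B-literal.
admissible(Chain) :-
    append(_, [b(_)], Chain),
    \+ violates1(Chain),
    \+ violates2(Chain),
    \+ violates3(Chain).

% For instance, chain 8 of Example 1.2.5 above, Pa [Pb] [Pd] Pc ¬Pb:
% ?- admissible([b(pa), a(pb), a(pd), b(pc), b(-pb)]).
% true.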
A (weak) model elimination refutation is a sequence of chains, beginning
with a top chain selected from the input elementary chains, ending with
□, where each successor chain is derived by extension or reduction defined
below.
Each inference rule takes a single chain as the parent chain. The ex-
tension operation also uses an auxiliary chain, for our purposes always an
input elementary chain.
Given an admissible parent chain K and a variable-disjoint auxiliary
elementary chain K1, chain K2 is formed by the extension rule if the right-
most literals of K and K1 can be made complementary literals by unifier
σ. The derived chain K2 is then formed by deleting the rightmost literal of
K1σ, promoting the rightmost literal of K to A-literal, and appending the
B-literals of K1σ to the right of Kσ in any order chosen by a user-given
If no literals of K1σ exist, all rightmost A-literals of Kσ are
dropped.
Given an admissible parent chain K, chain K2 is formed by the reduction
rule if there exists an A-literal which can be made complementary to the
rightmost literal by unifier σ. The derived chain K2 is then formed by
deleting the rightmost literal of Kσ and any A-literals that would then
become rightmost.
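Continuing the sketch begun above (same chain representation, ground case only, so unification reduces to identity of complements), the two inference operations can be transcribed as follows; complement/2 is the predicate defined in the previous sketch, and the names extend/3, reduce/2 and contract/2 are ours.

% extend(+Chain, +InputClause, -NewChain): the extension operation for
% ground chains.  InputClause is a list of literals.
extend(Chain, Clause, NewChain) :-
    append(Front, [b(L)], Chain),           % rightmost literal must be a B-literal
    complement(L, Lc),
    select(Lc, Clause, Rest),               % the input clause contributes Lc
    tag_b(Rest, RestB),
    append(Front, [a(L) | RestB], Chain1),  % promote the resolved literal to an A-literal
    contract(Chain1, NewChain).

% reduce(+Chain, -NewChain): the reduction operation; the rightmost
% B-literal is removed because a complementary A-literal occurs to its left.
reduce(Chain, NewChain) :-
    append(Front, [b(L)], Chain),
    complement(L, Lc),
    member(a(Lc), Front),
    contract(Front, NewChain).

tag_b([], []).
tag_b([L | Ls], [b(L) | Bs]) :- tag_b(Ls, Bs).

% contract(+Chain, -Chain1): drop any A-literals that have become rightmost.
contract(Chain, Contracted) :-
    (   append(Front, [a(_)], Chain)
    ->  contract(Front, Contracted)
    ;   Contracted = Chain
    ).

% Replaying two steps of Example 1.2.5:
% ?- reduce([b(pa), a(pb), a(pd), b(pc), b(-pb)], C).   % step 8 -> 9
% C = [b(pa), a(pb), a(pd), b(pc)].
% ?- extend([b(pa)], [-pa], C).                          % step 10 -> 11
% C = [].                                                % the empty chain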
The ordering rule that dictates the order of appended B-literals from
the auxiliary chain in extension is arbitrary. In SL-resolution this free-
dom to process in user-desired order is described by a selection function
that selects from the B-literals to the right of the rightmost A-literal the
literal to next use for extension. In ME this is captured by assuming an
oracle ordering rule that places the literals in order that the user chooses
for his/her selection sequence. Although B-literals between A-literals carry
their order forward under both inference rules, clearly this is a descriptive
and processing convenience only. The order can be changed at any time
by rationalizing that the ordering rule could have made this new selection
initially. No inference depends on nonrightmost B-literals to place any con-
straints on reordering. The selection function mechanism of SL-resolution
makes this ordering freedom clear.
The ME procedure also possesses a lemma mechanism that allows cer-
tain derived elementary chains to be added to the set of input chains. This
must be done with great selectivity because the auxiliary set can explode
in size very quickly. Also, the ability to use the derived chain in the man-
ner of the compiled input clauses seems to be lost. It seems that only unit
clause lemmas are worth retaining, and only a selected subset of those. The
lemma device has recently proved itself in the METEOR implementation
(see [Astrachan and Stickel, 1992]).
To prove substantial mathematical theorems using a linear input proof
procedure will take some substantial added devices. In spite of the very
high inference rate achievable due to the effective implementation tech-
niques that take advantage of the fixed set providing the far parent clauses
(chains), the tremendous number of redundant calculations overwhelms the
search. Some manner of recording computed search subtrees when suc-
cessful (lemmas) and when failed, is needed. Such a methodology, called
caching, is being studied and has had some success (see [Astrachan and
Stickel, 1992]). Success with devices like caching is needed if linear input
provers are to be useful in proving substantial mathematical theorems. Not
only do procedures that employ a breadth-first search style keep clauses for
subsumption use, but forward-chaining systems tend to create more ground
literals because of a frequent property of axioms. Axioms are often Horn
clauses and the positive literal often contains only variables shared with
another literal. By resolving the positive literal last it is often a ground
literal when resolved.
Although possibly not the optimal procedural format for proving math-
ematical theorems, the linear input format is excellent for situations where
strong search guidance exists, as previously noted. One of the situations
where such guidance is often possible is in the logic programming domain.
We now explore this domain.

2 The logic programming paradigm


Logic programming as a phrase has come to have two meanings. Many peo-
ple familiar only with Prolog regard logic programming in a very narrow
sense: the use of Horn clause logic in a manner implemented by Prolog.
In its broadest sense logic programming is a philosophy regarding com-
putation, asserting that deduction and computation are two sides of the
same concept. This viewpoint is built on the following general observa-
tion: an answer that can be computed using a problem-specific algorithm
can also be determined by a proof procedure presented with the axioms
of the appropriate theory and the problem as a theorem to be proven. In
its fullest sense, then, logic programming is concerned with the develop-
ment and use of logics whose proof procedures best fit our well-understood
notion of good computation procedures. The advantage of good compu-
tation models for the proof search is that the user then understands the
computational implications of the problem formulation as (s)he develops
the problem specification.
In this section we will look at the notion of logic programming in its
fullest sense, by a systematic presentation of the logical structures that
preserve good computation properties. This is done by preserving the im-
portant procedural qualities possessed by Horn clause logic that make
Prolog so successful. Thus it is appropriate to begin by a review of logic
programming in the narrow sense mentioned above, which will reflect the
historical development of logic programming as a field of study. This will
bring us back to the subject matter of the preceding section, for the infer-
ence engine that underlies Prolog is a restricted form of linear resolution
using a revised notation.

2.1 Horn clause logic programming


We first need to define Horn clause logic. If we recall that a clause is a
disjunction of literals, then a Horn clause is a clause with at most one
positive literal, or atom. Clauses with one atom are called definite clauses.
One formulation of Horn clause logic is as a refutation logic of the type we
have already studied with the clause set restricted to Horn clauses. Sim-
ple restrictions of refutation systems already considered give us complete
refutation procedures for this logic. One restriction is to linear resolution;
simply impose an input restriction on a standard linear resolution system.
That is, the far parent clause must be from the input clause set. A sim-
ple observation extends the completeness of linear resolution for first-order
logic to a completeness argument for linear input resolution over Horn logic.
That is done as follows: Start with a negative clause (all literals are nega-
tive; there always is such a clause if the clause set is unsatisfiable) and then
note that any linear deduction must also be an input deduction because
every derived clause is a negative clause. (Since no two negative clauses
can resolve with each other no two derived clauses can resolve together.)
One can show that if a refutation exists then one exists starting with a
negative clause. Thus, since in general an unsatisfiable clause set always
has a linear refutation, there must be a refutation among the linear input
deductions for each unsatisfiable Horn clause set. Finally, clauses can be
ordered arbitrarily and resolution done on only the rightmost, or only the
leftmost, literal of the current clause. This is clear because the subdeduc-
tion that "removes" that literal can only use input clauses, so is totally
independent of the other literals of the clause.
A second approach to structure a linear procedure for Horn clause logic
is to restrict the model elimination (SL-resolution) procedure by removing
the reduction rule. This leaves only extension as an inference rule which
makes no use of A-literals so the need for two types of literals has gone.
If the A-literals are removed then extension is simply the binary resolu-
tion inference rule. Thus ME also collapses to a linear input resolution
refutation procedure within the Horn clause domain. Again we note that
if we begin with a chain from a negative clause then reduction is never
called so that the above restriction is equivalent to the full ME procedure
on the class of Horn clause sets. Thus this restriction is complete for Horn
clause logic. This restriction clearly does not use factoring so we see that
linear input resolution without factoring is complete for the Horn clause
domain. In ME (SL-resolution) we can only order (select from) the most
recently added literals. However, with only the extension rule of infer-
ence, we can reorder (select from) the entire clause at each step. Extension
makes no reference to any literal within the current chain other than the
literal extended on, so any literal may be chosen first. This restriction
of ME (SL-resolution) to Horn clauses, with the changes noted, is called
SLD-resolution for SL-resolution for definite clauses. SLD-resolution was
so-named by Apt and van Emden for the reason outlined here (and, prob-
ably more importantly, because the development of Prolog's underlying
logic, SLD-resolution, followed historically from SL-resolution). However,
independent of the activity directly leading to Prolog, Robert Hill [Hill,
1974] defined and proved complete for Horn clause logic this ordered linear
input procedure, which he called LUSH resolution, for linear, unrestricted
selection resolution for Horn clauses.
Horn clause logic provides the mechanism that defines "pure Prolog",
the first-order logic part of Prolog. (We assume that the reader is familiar
with the search control aspects of Prolog and with negation-as-failure, and
knows that these are not practically definable within first-order Horn clause
logic. Someone wishing further amplification of these qualities of Prolog
should consult some of the other chapters of this volume.) To better capture
the direct notion of proof and computation important to the presentation
of Prolog we need to adopt another view of Horn clause logic. The new
viewpoint says that we are not interested simply in whether or not a set
of clauses is unsatisfiable. We now are interested in answers to a query
made in the context of a database, to use one vocabulary common in logic
programming. In a more compatible terminology, we seek a constructive
proof of an existentially quantified theorem where the existential variables
are instantiated so as to make the theorem true. (It happens that only one
instantiation of these existential variables is needed for any Horn clause
problem; this is a consequence of the need for only one negative clause
in any (minimal) unsatisfiable Horn clause set, ground sets in particular.)
Along with a notion of answer we will also emphasize a direct proof system,
meaning that conceptually we will consider proofs from axioms with the
theorem as the last line of the deduction. This is important even though
our customary proof search will be a backward chaining system that begins
with the proposed theorem (query).
The alternate viewpoint begins with a different representation for a
Horn clause. The definite clauses have the generic form A1 ∨ ¬A2 ∨ ¬A3 ∨
... ∨ ¬An, which we now choose to represent in the (classically) logically
equivalent form A1 ← A2, A3, ..., An, where the latter expression is the
implication A2 ∧ A3 ∧ ... ∧ An ⊃ A1 written with the consequent on the left
to facilitate the clause presentation when backchaining in the proof search
mode, and with the comma replacing AND. A negative clause ¬A1 ∨ ¬A2 ∨
¬A3 ∨ ... ∨ ¬An is written ← A1, A2, A3, ..., An, which as an implication is
read A1 ∧ A2 ∧ A3 ∧ ... ∧ An ⊃ FALSE (i.e., that the positive conjunction
is contradictory). In particular, we note that the negative literals appear
as atoms on the right of ← (conjoined together) and the positive literal
appears on the left.
We now consider the problem presentation. Theorem proving problems
are often of the form A ⊃ B, where A is a set of axioms and B is the
proposed theorem. The validation of A ⊃ B is equivalent to the refutability
of its negation, A ∧ ¬B. To assure that A ∧ ¬B is a Horn clause set we can
start with Horn clauses as axioms and let B have form ∃x(B1 ∧ B2 ∧ ... ∧ Br),
where ∃x means the existential closure of all variables in the expression it
quantifies. This form works for B because ¬∃x(B1 ∧ B2 ∧ ... ∧ Br) is logically
equivalent to a negative Horn clause. In Horn clause logic programming
(i.e. for Prolog) the problem format is explicitly of the form P ⊃ Q, where
P, the program, is a set of definite Horn clauses and Q, the query, is of
form ∃x(B1 ∧ B2 ∧ ... ∧ Br). The notation ← B1, ..., Br that we use for
this formula is suggestive of a query if the ← is read here as a query mark.
Indeed, ?- is the prefix of a query for most Prolog implementations. (The
existential quantifier is not notated.)
We have just seen that the Prolog query is an existentially quantified
conjunction of atoms. As already noted, it is central to logic programming
that the proof search yield a substitution instance for each existentially
quantified query variable and not just confirmation that the query instance
follows from the program clause set. We call this answer extraction. Answer
extraction is not original to logic programming; it goes back to work in
question-answering systems of Green and Raphael [Green and Raphael,
1968; Green, 1969] for natural language systems and robotics in particular.
Again, we note that for Horn clause sets only one query instance is needed;
every Horn clause program implies (at most) a single query instance, a
definite answer.
We emphasize that we have shifted from resolution theorem proving,
which focuses on refutability with a very symmetric input set (no dis-
tinguished literal or clauses) to a focus on logical consequence and input
asymmetry. Asymmetry occurs at the clause level with a designated con-
sequent atom and at the problem level with a designated query.
We now focus more closely on answer extraction. Given program P and
query Q a correct answer substitution θ for P, Q is a substitution over Q
such that Qθ is a logical consequence of P. For program P and query Q the
computed answer substitution θ is the substitution applied to all variables
of Q as determined at the successful termination of the SLD-resolution pro-
cedure. The key statements of soundness and completeness for Horn clause
logic programming are that for SLD-resolution every computed answer is cor-
rect and if a correct answer exists then there exists a finite SLD-resolution
deduction yielding a computed answer. Of course, this is a direct trans-
lation of results known to the resolution theory community. (We hasten
to note that many important results in logic programming are not direct
derivatives of work in the automated theorem proving field.)
Finally, we remark that the backchaining, problem reduction proof
search mode inherited from linear resolution is an artifact of practicality
only. The goal-oriented search has proved very advantageous, especially
since user search control (e.g. ordering of clauses in the program) means
that a small fraction of the search space is expanded in depth-first goal-
oriented search. Conceptually, it is also useful to consider proofs in a
forward direction, from facts through implied facts to the query instance.
We present a small example in the two formats of SLD-resolution to
illustrate the isomorphism between the logic programming notation and
the resolution notation for linear input deductions. We give the problem in
the logic programming style first. Note that s, t, u, v, w, x, y, z are variables
and a, b, c are constant terms. The substitutions that participate in the
computed answer are indicated. The format is x/t. where x is a variable
and t is the replacement term.
Program:
1. p(a,a,b)
2. p(c, c, b)
3. p(x, f(u, w), z) ← p(x, u, y), p(y, w, z)
4. p(x, g(s), y) ← p(y, s, x)
Query:
5. ← p(a, t, c)
Derivation:
6. ← p(a, u, y), p(y, w, c) using clause 3
Substitution: t/f(u,w)
7. ← p(b, w, c) using clause 1
Substitution: u/a
8. ← p(c, s, b) using clause 4
Substitution: w/g(s)
9. success using clause 2
Substitution: s/c
Answer: t/f(a, g(c))

That the deduction form is that of ordered clause input linear resolution
is clear by comparing the above to the resolution deduction that follows.

Given clauses:
1. p(a, a, b)
2. p(c, c, b)
3. p(x, f(u, w), z), ¬p(x, u, y), ¬p(y, w, z)
4. p(x, g(s), y), ¬p(y, s, x)
5. ¬p(a, t, c)
Derivation:
6. ¬p(a, u, y), ¬p(y, w, c) resolvent of 3,5
Substitution: t/f(u,w)
7. ¬p(b, w, c) resolvent of 1,6
Substitution: u/a
8. ¬p(c, s, b) resolvent of 4,7
Substitution: w/g(s)
9. □ resolvent of 2,8
Substitution: s/c
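For readers who wish to experiment, the program and query above transcribe directly into Prolog syntax (variables capitalized, ← written as :-). Note, however, that Prolog's fixed depth-first, leftmost-literal strategy does not terminate on this goal with the clause ordering shown, even though the SLD-refutation above exists; the small depth-bounded interpreter below — a sketch, using the standard clause/2 built-in over a dynamic predicate — recovers the computed answer within a bound of three.

:- dynamic p/3.

p(a, a, b).
p(c, c, b).
p(X, f(U, W), Z) :- p(X, U, Y), p(Y, W, Z).
p(X, g(S), Y)    :- p(Y, S, X).

% solve(+Goal, +Depth): SLD-resolution with a bound on the number of
% clause applications along any branch; sound, and complete for answers
% whose derivations fit within the bound.
solve(true, _) :- !.
solve((G1, G2), D) :- !, solve(G1, D), solve(G2, D).
solve(G, D) :-
    D > 0,
    D1 is D - 1,
    clause(G, Body),
    solve(Body, D1).

% ?- solve(p(a, T, c), 3).
% T = f(a, g(c))          % the computed answer of the derivation above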

2.2 A framework for logic programming


We have now completed our review of SLD-resolution and its role in Horn
clause logic programming. Horn clause logic programming provides the core
concepts underlying Prolog, the so-called pure Prolog. But is the quite
restricted form given by Horn clause logic the only meaningful logic for
logic programming? Do more powerful (and useful) logics exist? The idea
of using arbitrary clauses as opposed to Horn clauses has been championed
by [Kowalski, 1979]. In a similar fashion, the use of full first-order logic
— as opposed to logic in clausal form — has been advocated and explored
[Bowen, 1982]. In a more conservative direction, extending the structure
of Horn clauses by limited uses of connectives and quantifiers has been
suggested. The best known extension of pure Horn clause logic within the
logic programming paradigm permits negation in goals, using the notion of
negation-as-failure. However, the idea of using implications and universal
quantifiers and, in fact, arbitrary logical connectives in goals has also been
advocated [Gabbay and Reyle, 1984a; Lloyd and Topor, 1984; McCarty,
1988a; McCarty, 1988b; Miller, 1989b; Miller et al., 1991].
There is a wide spectrum of logical languages between those given by
Horn clause logic and full quantificational logic, especially if the derivability
relation to be used is also open to choice. An obvious question that arises
in this situation is whether some of these languages provide more natural
bases for logic programming than do others. We argue for an affirmative
answer to this question in this subsection. In particular, we describe a
criterion for determining whether or not a given predicate logic provides
an adequate basis for logic programming. The principle underlying this
criterion is that a logic program in a suitable logical language must satisfy
dual needs: it should function as a specification of a problem while serving
at the same time to describe a programming task. It is primarily due to the
specification requirement that an emphasis is placed on the use of symbolic
logic for providing the syntax and the metatheory of logic programming.
The programming task description requirement, although less well recog-
nized, appears on reflection to be of equal importance. The viewpoint we
adopt here is that the programming character of logic programming arises
from thinking of a program as a description of a search whose structure is
determined by interpreting the logical connectives and quantifiers as fixed
search instructions. From this perspective, the connectives and quantifiers
appearing in programs must exhibit a duality between a search-related in-
terpretation and a logical meaning. Such a duality in meaning cannot be
attributed to these symbols in every logical language. The criterion that we
describe below is, in essence, one for identifying those languages in which
this can be done.
To provide some concreteness to the abstract discussion above, let us
reconsider briefly the notions of programs and queries in the context of
Horn clause logic. As has previously been noted, one view of a query in
this setting is as a formula of the form 3x (p1 (x) A. . . Ap n (x)); x is used here
to denote a sequence of variables and Pi(x) represents an atomic formula in
which some of these variables appear free. The SLD-resolution procedure
described in the previous subsection tries to find a proof for such a query
by looking for specific substitutions for the variables in x that make each
of the formulas pi(x) follow from the program. This procedure can, thus,
be thought to be the result of assigning particular search interpretations
to existential quantifiers and conjunctions, the former being interpreted as
specifications of (infinite) OR branches with the branches being parame-
terized by substitutions and the latter being interpreted as specifications of
AND branches. This view of the logical symbols is of obvious importance
from the perspective of programming in Horn clause logic. Now, it is a
nontrivial property of Horn clause logic that a query has a constructive
proof of the kind described above if it has any proof at all from a given
program. It is this property that permits the search-related interpretation
of the logical symbols appearing in queries to co-exist with their logical
or declarative semantics and that eventually underlies the programming
utility of Horn clauses.
Our desire is to turn the observations made in the context of Horn clause
logic into a broad criterion for recognizing logical languages that can be
used in a similar fashion in programming. However, the restricted syntax
of logic in clausal form is an impediment to this enterprise and we therefore
enrich our logical vocabulary before proceeding further. In particular, we
shall allow the logical symbols ∧, ∨, ⊃, ∀, ∃, ⊤ and ⊥ to appear in the
formulas that we consider. The first five of these symbols do not need an
explanation. The symbols ⊤ and ⊥ are intended to correspond to truth
and falsity, respectively. We shall also use the following syntactic variables
with the corresponding general connotations:
𝒟    A set of formulas, finite subsets of which serve as possible
     programs of some logic programming language.
𝒢    A set of formulas, each member of which serves as a possi-
     ble query or goal for this programming language.
A    An atomic formula excluding ⊤ and ⊥.
D    A member of 𝒟, referred to as a program clause.
G    A member of 𝒢, referred to as a goal or query.
P    A finite set of formulas from 𝒟, referred to as a (logic)
     program.
Using the current vocabulary, computation in logic programming can be
viewed as the process of constructing a derivation for a query G from a
program P. The question that needs to be addressed, then, is that of the
restrictions that must be placed on 𝒟 and 𝒢 and the notion of derivation
to make this a useful viewpoint.
Towards answering this question, we shall describe a proof-theoretic cri-
terion that captures the idea of computation-as-search. The first step in this
direction is to define the search-related semantics that is to be attributed to
the logical symbols. This may be done by outlining the structure of a sim-
ple nondeterministic interpreter for programs and goals. This interpreter
either succeeds or does not succeed, depending on the program P and the
goal G that it is given in its initial state. We shall write P ⊢O G to indicate
that the interpreter succeeds on G given P; the subscript in ⊢O signifies
that this relation is to be thought of as the "operational" semantics of an
(idealized) interpreter. The behavior of this interpreter is characterized by
the following search instructions corresponding to the various logical sym-
bols; the notation [x/t]G is used here to denote the result of substituting t
for all free occurrences of x in G:

SUCCESS    P ⊢O ⊤.
AND        P ⊢O G1 ∧ G2 only if P ⊢O G1 and P ⊢O G2.
OR         P ⊢O G1 ∨ G2 only if P ⊢O G1 or P ⊢O G2.
INSTANCE   P ⊢O ∃x G only if there is some term t such that P ⊢O
           [x/t]G.
AUGMENT    P ⊢O D ⊃ G only if P ∪ {D} ⊢O G.
GENERIC    P ⊢O ∀x G only if P ⊢O [x/c]G, where c is a constant that
           does not appear in P or in G.

These instructions may be understood as follows: The logical constant ⊤
signifies a successfully completed search. The connectives ∧ and ∨ provide,
respectively, for the specification of an AND and a nondeterministic OR
node in the search space to be explored by the interpreter. The quantifier
∃ specifies an infinite nondeterministic OR node whose branches are pa-
rameterized by the set of all terms. Implication instructs the interpreter
to augment its program prior to searching for a solution to the consequent
of the implication. Finally, universal quantification is an instruction to in-
troduce a new constant and to try to solve the goal that results from the
given one by a (generic) instantiation with this constant.
Certain comments about the search instructions are in order before we
proceed further. First of all, it is necessary to counter a possible impression
that their structure is arbitrary. If ⊢O were interpreted as derivability in
almost any reasonable logical system, then the conditions pertaining to it
that are contained in the instructions AND, OR, INSTANCE, AUGMENT
and GENERIC would be true in the reverse direction. Thus, in order to
maintain a duality between a logical and a search-related reading of the
logical symbols, the operational interpretation of these symbols is forced
to satisfy the listed conditions in this direction. The instructions provided
above represent, in this sense, a strengthening of the logical character of the
relevant connectives and quantifiers in a way that permits a search-related
meaning to be extracted for each of them. Further, the search interpre-
tations that are so obtained are, quite evidently, of a natural kind. The
interpretations for 3 and A are, as we have observed, the ones accorded
to them within the framework of Horn clauses, the notions of success and
disjunctive search are meaningful ones, and some of the programming util-
ity of the AUGMENT and GENERIC search operations will be discussed
later in this section.
The second point to note is that we have addressed only the issue of the
success/failure semantics for the various logical symbols through the pre-
sentation of the idealized interpreter. In particular, we have not described
the notion of the result of a computation. There is, of course, a simple way
in which this notion can be elaborated: existentially quantified goals are
to be solved by finding instantiations for the quantifiers that yield solvable
goals, and the instantiations of this kind that are found for the top-level
existential quantifiers in the original goal can be provided as the outcome
of the computation. However, in the interest of developing as broad a
framework as possible, we have not built either this or any other notion of
a result into our operational semantics. We note also that our interpreter
has been specified in a manner completely independent of any notion of
unification. Free variables that appear in goals are not placeholders in
the sense that substitutions can be made for them and substitutions are
made only in the course of instantiating quantifiers. Of course, a practical
interpreter for a particular language whose operational semantics satisfies
the conditions presented here might utilize a relevant form of unification
as well as a notion of variables that can be instantiated. We indicate how
such an interpreter might be designed in the next section by considering
this issue in the context of specific languages described there.
In a sense related to that of the above discussion, we observe that our
search instructions only partially specify the behavior to be exhibited by
an interpreter in the course of solving a goal. In particular, they do not
describe the action to be taken when an atomic goal needs to be solved.
A natural choice from this perspective turns out to be the operation of
backchaining that was used in the context of Horn clause logic. Thus, an
instruction of the following sort may often be included for dealing with
atomic goals:
ATOMIC     P ⊢O A only if A is an instance of a clause in P or P ⊢O G
           for an instance G ⊃ A of a clause in P.
There are two reasons for not including an instruction of this kind in the
prescribed operational semantics. First, we are interested at this juncture
only in describing the manner in which the connectives and quantifiers in a
goal should affect a search for a solution and the particular use that is to be
made of program clauses is, from this perspective, of secondary importance.
Second, such an instruction requires program clauses to conform to a fairly
rigid structure and as such runs counter to the desire for generality in the
view of logic programming that we are developing.
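To fix intuitions, here is one way the idealized interpreter, together with the ATOMIC backchaining rule just discussed, might be realized as a Prolog meta-interpreter. This is only a sketch: the program is carried as a list of clauses; goals and clauses are built from the constructors tt, and/2, or/2, exists/2, forall/2, impl/2 and all/2, names chosen here for illustration, with impl(X, Y) standing for X ⊃ Y; program clauses are assumed to have the hereditary Harrop form introduced in the next subsection; existential goal variables are left to be instantiated by unification, as suggested above; and GENERIC is approximated with SWI-Prolog's gensym/2.

:- use_module(library(gensym)).

% prove(+P, +G): P is a list of program clauses, G a goal formula.
prove(_, tt).                                           % SUCCESS
prove(P, and(G1, G2)) :- prove(P, G1), prove(P, G2).    % AND
prove(P, or(G1, G2))  :- ( prove(P, G1) ; prove(P, G2) ).   % OR
prove(P, exists(_X, G)) :- prove(P, G).                 % INSTANCE: the quantified
                                                        % variable is left free, to be
                                                        % bound later by unification
prove(P, impl(D, G)) :- prove([D | P], G).              % AUGMENT
prove(P, forall(X, G)) :-                               % GENERIC: substitute a new
    gensym(gc, C),                                      % constant for the variable
    subst(G, X, C, G1),
    prove(P, G1).
prove(P, A) :-                                          % ATOMIC: backchain on P
    A \= tt, A \= and(_, _), A \= or(_, _), A \= exists(_, _),
    A \= impl(_, _), A \= forall(_, _),
    member(D, P),
    backchain(D, A, G),
    prove(P, G).

% backchain(+D, +A, -G): the atomic goal A matches the head of an
% instance of clause D whose body is G.
backchain(A, A, tt).
backchain(impl(G, A), A, G).
backchain(and(D1, D2), A, G) :- ( backchain(D1, A, G) ; backchain(D2, A, G) ).
backchain(all(X, D), A, G) :-
    subst(D, X, _Fresh, D1),        % rename only the quantified variable; free
    backchain(D1, A, G).            % variables of D are deliberately left untouched

% subst(+T, +Var, +Replacement, -T1): replace occurrences of Var in T.
subst(T, X, S, S) :- T == X, !.
subst(T, _, _, T) :- var(T), !.
subst(T, _, _, T) :- atomic(T), !.
subst(T, X, S, T1) :-
    T =.. [F | Args],
    subst_list(Args, X, S, Args1),
    T1 =.. [F | Args1].

subst_list([], _, _, []).
subst_list([A | As], X, S, [B | Bs]) :- subst(A, X, S, B), subst_list(As, X, S, Bs).

A sample program and query for this interpreter, taken from the sterile jar problem of Example 2.3.2, is given at the end of the next subsection.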
A notable omission from the logical connectives that we are considering
is that of ¬. The interpretation of negation in most logical systems corre-
sponds to an inconsistency between a proposition and a set of assumptions.
In contrast, the natural search-related meaning for this symbol seems to be
that of failure in finding a solution to a given goal. There is, thus, a consid-
erable variance between the two desired views of the symbol. Further, the
search-related view assumes that the abilities of the system of derivation
can be circumscribed in some fashion. It appears difficult to represent this
kind of an ability within a proof system and so we do not make an attempt
to do this here. It is easy to provide for an alternative view of negation
that is close to its logical interpretation as is done in [Miller, 1989b]. There
is, however, no need to add an explicit negation symbol for this purpose
since it can be treated as defined in terms of ⊃ and ⊥. As for the symbol
⊥ itself, there is a natural tendency to read it as failure. Once again, this
does not correspond to the usual interpretation of ⊥ within logical sys-
tems where it is considered to be a proposition that is "true" when there
is a contradiction in the assumptions. In including an interpretation of ⊥
that is close to its logical meaning, a choice has to be made between two
competing views of the consequences of discovering a contradiction. The
search instructions described above are compatible with either view. The
view that is ultimately adopted is dependent on the logical system that is
chosen to provide the declarative semantics; in particular, on which of the
notions of provability that are described below is used.
The declarative semantics to be associated with the various connectives
and quantifiers can be formalized by means of sequent-style proof systems.
We digress briefly to summarize the central notions pertaining to such sys-
tems. The basic unit of assertion within these systems is that of a sequent.
This is a pair of finite (possibly empty) sets of formulas ⟨Γ, Θ⟩ that is writ-
ten as Γ → Θ. The first element of this pair is commonly referred to
as the antecedent of the sequent and the second is called its succedent.
A sequent corresponds intuitively to the assertion that at least one of the
formulas in its succedent holds given as assumptions all those in its an-
tecedent. (Although we shall not be concerned with the situation where Θ
is empty, this usually corresponds to a contradiction in the assumptions.) A
proof for a sequent is a finite tree constructed using a given set of inference
figures and such that the root is labeled with the sequent in question and
the leaves are labeled with designated initial sequents. Particular sequent
systems are characterized by their choice of inference figures and initial
sequents.
Figure 1 contains the inference figure schemata of interest in this paper.
Actual inference figures are obtained from these by instantiating Γ, Θ and
Δ by sets of formulas, B, C, and P by formulas, t by a term and c by a
constant. There is, in addition, a proviso on the choice of constant for c:
it should not appear in any formula contained in the lower sequent in the
same figure. The notation B, Γ (Θ, B) that is used in the schemata is to
be read as an abbreviation for {B} ∪ Γ ({B} ∪ Θ) and a set of formulas
can be viewed as being of this form even if B ∈ Γ (B ∈ Θ). The initial
sequents in the proof systems that we consider are of the form Γ → Θ
where ⊤ ∈ Θ or Γ ∩ Θ contains either ⊥ or an atomic formula.
Sequent-style proof systems of the kind we have described generally
have three structural inference figure schemata, which we have not listed.
Two of these schemata, usually called interchange and contraction, ensure
that the order and multiplicity of formulas in sequents are unimportant
in a situation where the antecedents and succedents are taken to be lists
instead of sets. Our use of sets, and the interpretation of the notation B, T
(0, B) that is explained above, obviates these schemata. The third kind
of structural inference figure schema, commonly referred to as thinning or
weakening, permits the addition of a formula to the antecedent or succedent
of a sequent whose proof has already been constructed. Such inference
figures are required when the antecedent and the succedent of an initial
sequent are permitted to have at most one formula. Our choice of initial
sequents, once again, obviates these inference figures.
We shall call an arbitrary proof that is obtained using the schemata
in Figure 1 a C-proof. Placing additional restrictions on the use of the
schemata in Figure 1 results in alternative notions of derivations. One
such notion that is of interest to us in this paper is that of an I-proof.
A proof of this kind is a C-proof in which each sequent occurrence has a
singleton set as its succedent. As a further refinement, if we disallow the
use of the inference figure schema ⊥-R in an I-proof, we obtain the notion
of an M-proof. We shall need these three notions of derivability later
in this chapter and so we introduce special notation for them. We write
Γ ⊢C B, Γ ⊢I B, and Γ ⊢M B, if the sequent Γ → B has, respectively,
a C-proof, an I-proof, and an M-proof; if the set Γ is empty, it may be
omitted from the left side of these three relations. The three relations
that are thus defined correspond to provability in, respectively, classical,
intuitionistic and minimal logic. More detailed discussions of these kinds of
sequent proof systems and their relationship to other presentations of the
corresponding logics can be found in [Fitting, 1969; Gentzen, 1969; Prawitz,
1965; Troelstra, 1973]. It is relevant to note that in our presentation of
these three provability relations we have excluded a structural inference
figure schema called cut. As a result of this omission, the correspondence
between our presentation and the customary definitions of the respective
derivability relations depends on a fundamental theorem for the logics in
question that is referred to as the cut-elimination theorem.
The framework of sequent systems provides a convenient means for
formalizing the desired duality in interpretation for the logical connectives
and quantifiers. The first step in this direction is to formalize the behavior
of the idealized interpreter within such systems. We do this by identifying
the notion of a uniform proof. This is an I-proof in which any sequent whose
succedent contains a non-atomic formula occurs only as the lower sequent
of an inference figure that introduces the top-level logical symbol of that
formula. Suppose now that the sequent F —>• G appears in a uniform
proof. Then the following conditions must be satisfied with respect to this
sequent:
o If G is T, the sequent is initial.
o If G is B A C then the sequent is inferred by A-R from F —> B and
F —> C.
o If G is BvC then the sequent is inferred by V-R from either F —)• B
or F —> C.
o If G is 3z P then the sequent is inferred by 3-R from F —t [x/t]P
for some term t.
o If G is B D C then the sequent is inferred by D-R from B, F —> C.
o If G is Vx P then the sequent is inferred by V-R from F —> [x/c]P,
where c is a constant that does not occur in the given sequent.
The structure of a uniform proof thus reflects the search instructions asso-
ciated with the logical symbols. We can, in fact, define ⊢O by saying that
P ⊢O G, i.e., the interpreter succeeds on the goal G given the program
P, if and only if there is a uniform proof of the sequent P → G. We
observe now that the logical symbols exhibit the desired duality between a
logical and a search-related meaning in exactly those situations where the
existence of a proof within a given logical system ensures the existence of a
uniform proof. We use this observation to define our criterion for establish-
ing the suitability of a logical language as the basis for logic programming:
letting ⊢ denote a chosen proof relation, we say that a triple ⟨𝒟, 𝒢, ⊢⟩ is an
abstract logic programming language just in case, for all finite subsets P of
𝒟 and all formulas G of 𝒢, P ⊢ G if and only if P → G has a uniform
proof.

2.3 Abstract logic programming languages


Abstract logic programming languages as described in the previous sub-
section are parameterized by three components: a class of formulas each
member of which can serve as a program clause, another class of formu-
las that corresponds to possible queries and a derivability relation between
formulas. Within the framework envisaged, the purpose of the derivability
relation is to provide a declarative semantics for the various logical symbols
used. This purpose is realized if a well understood proof relation, such as
the relation \~c, I-/, or HM, is used. Once a particular declarative seman-
tics is chosen, it must be verified that this accords well with the desired
operational or search semantics for the logical symbols. In general, this will
be the case only when the permitted programs and queries are restricted
in some fashion. In determining if the desired duality is realized in any
given situation, it is necessary to confirm that the existence of a proof for
a goal from a program entails also the existence of a uniform proof. If
this is the case, then our logical language has a proof procedure that is
goal-directed and whose search behavior is substantially determined by the
logical symbols that appear in the goal being considered. Viewed differ-
ently, the defining criterion for an abstract logic programming language
is one that ensures that a procedure of this kind may be used for finding
proofs without an inherent loss of completeness.
Even though our criterion for identifying logical languages that can
serve as the bases for logic programming seems to have content at an in-
tuitive level, it is necessary to demonstrate that it actually serves a useful
purpose. An important requirement in this regard is that it have a definite
discriminatory effect. Towards showing that it does have such an effect, we
first consider some recently proposed extensions to Horn clause logic and
show that they fail to qualify as abstract logic programming languages.
These extensions utilize the negation symbol that may be introduced into
our setting by defining ¬P to be (P ⊃ ⊥). With this understanding, the
first non-example is one where 𝒟 consists of the universal closures of atoms
or formulas of the form (B1 ∧ ... ∧ Bn) ⊃ A where A is an atom and Bi is
a literal, 𝒢 consists of the existential closure of conjunctions of atoms and
⊢ corresponds to classical provability (see, for example, [Fitting, 1985]).
Within the language defined by this triple, the set {p ⊃ q(a), ¬p ⊃ q(b)}
constitutes a program and the formula ∃x q(x) corresponds to a query. This
query is provable from the program in question within classical logic and
the following is a C-proof for the relevant sequent:

There is, however, no uniform proof for ∃x q(x) from the program being
considered; such a proof would require q(t) to be derivable from {p ⊃
q(a), ¬p ⊃ q(b)} for some particular t, and this clearly does not hold. Thus,
the language under consideration fails to satisfy the defining criterion for
abstract logic programming languages. Another non-example along these
lines is obtained by taking 𝒟 to be the collection of (the universal closures
of) positive and negative Horn clauses, 𝒢 to consist of the existential closure
of a conjunction of literals containing at most one negative literal and ⊢ to
be classical provability [Gallier and Raatz, 1987]. This triple fails to be an
abstract logic programming language since
¬p(a) ∨ ¬p(b) ⊢C ∃x ¬p(x)
although no particular instance of the existentially quantified goal can be
proved. For a final non-example, let 𝒟 and 𝒢 consist of arbitrary formulas
and let ⊢ be provability in classical, intuitionistic or minimal logic. This
triple, once again, does not constitute an abstract logic programming lan-
guage, since, for instance,
p(a) ∨ p(b) ⊢ ∃x p(x)
regardless of whether ⊢ is interpreted as ⊢C, ⊢I or ⊢M whereas there is no
term t such that p(t) is derivable even in classical logic from p(a) ∨ p(b).
To conclude that our criterion really makes distinctions, it is necessary
to also exhibit positive examples of abstract logic programming languages.
We provide examples of this kind now and the syntactic richness of these
examples will simultaneously demonstrate a genuine utility for our crite-
rion.
The first example that we consider is of a logical language that is
slightly richer than Horn clause logic. This language is given by the triple
⟨𝒟₁, 𝒢₁, ⊢C⟩ where 𝒢₁ is the collection of first-order formulas defined by the
syntax rule
G ::= ⊤ | A | G ∧ G | G ∨ G | ∃x G
and 𝒟₁ is similarly defined by the rule
D ::= A | G ⊃ A | D ∧ D | ∀x D.
As observed in Subsection 2.1, a positive Horn clause is equivalent in clas-
sical logic to a formula of the form ∀x (B ⊃ A) where B is a conjunction
of atoms. This form is subsumed by the D-formulas defined above. The
queries within the Horn clause logic programming framework are of the
form ∃x B where B is a conjunction of atoms. Formulas of this sort are,
once again, contained in the set denoted by 𝒢₁. It is thus evident that the
paradigm of programming provided for by Horn clause logic is subsumed
by the triple ⟨𝒟₁, 𝒢₁, ⊢C⟩ if indeed this turns out to be an adequate basis
for logic programming. That this is the case is the content of the following
proposition:
Proposition 2.3.1. The triple ⟨𝒟₁, 𝒢₁, ⊢C⟩ constitutes an abstract logic
programming language.
The reader interested in a proof of this proposition is referred, amongst
other places, to [Miller et al., 1991]. Although we do not present a proof
here, it is instructive to consider the structure of such a proof. As noted
already, the nontrivial part consists of showing that the existence of a
classical proof guarantees also the existence of a uniform proof. There
are two observations that lead to this conclusion in the case of interest.
First, if P is a finite subset of 𝒟₁ and G is a member of 𝒢₁, then an
inductive argument on the heights of derivations shows that the sequent
P → G has a C-proof only if it has one in which (a) the inference figure
⊥-R does not appear and (b) there is only one formula in the succedent
of each sequent appearing in the proof. Second, in a derivation of the sort
engendered by the first observation, the inference figures ∧-L, ⊃-L and
∀-L can be moved above ∧-R, ∨-R and ∃-R if these immediately precede
them. Thus, the derivation of the restricted form determined by the first
observation can in fact be transformed into a uniform proof for the same
sequent.
The outline of the proof for Proposition 2.3.1 brings out certain addi-
tional aspects that should be noted. To begin with, a derivation of the kind
described in the first observation is obviously an I-proof and an M-proof.
The content of this observation may thus be rephrased in the following
fashion: the notions of provability in classical, intuitionistic and minimal
logic are indistinguishable from the perspective of sequents of the form
P → G, assuming that P is a finite subset of 𝒟₁ and G is an element
of 𝒢₁. The second point to note is that we are assured of the existence of
a uniform proof regardless of whether the sequent in question has a proof
in classical, intuitionistic or minimal logic. Thus, the triples ⟨𝒟₁, 𝒢₁, ⊢C⟩,
⟨𝒟₁, 𝒢₁, ⊢I⟩ and ⟨𝒟₁, 𝒢₁, ⊢M⟩ all determine the same abstract logic pro-
gramming language.
We have already observed that the programming paradigm based on
Horn clauses is subsumed by the abstract logic programming language
⟨𝒟₁, 𝒢₁, ⊢C⟩. There is a correspondence in the converse direction as well.
Using the equivalence of G ⊃ A and ¬G ∨ A, the operations of prenexing
and anti-prenexing, de Morgan's laws and the distributivity of ∨ over ∧,
it can be seen that each formula in 𝒟₁ is equivalent in classical logic to a
conjunction of positive Horn clauses. Each formula in 𝒢₁ can, in a similar
fashion, be seen to be classically equivalent to the negation of a conjunc-
tion of negative Horn clauses. Now, it is easily observed that the union
of a set S1 of positive Horn clauses and a set S2 of negative Horn clauses
has a refutation only if S1 augmented with some particular element of S2
also has a refutation. Thus, the idea of solving a goal from a program
in the abstract logic programming language ⟨𝒟₁, 𝒢₁, ⊢C⟩ can be reduced,
by a process of translation, to logic programming using Horn clause logic.
However, there are several advantages to not carrying out such a reduc-
tion and to utilizing the syntax for programs and goals embodied in the
triple (D1, g1,\-c) instead. First of all, preserving the richer syntax can
lead to a more compact notation, given that the size of the conjunctive
normal form of a formula can be exponentially larger than that of the for-
mula itself. Second, this syntax allows for the explicit use of the symbols ∨
and ∃ in programs and goals with their associated search interpretations.
These symbols provide useful programming primitives and also engender
a syntax that is closer to the one used in practice; for instance, Prolog
programmers often use disjunctions in the bodies of clauses with the in-
tention of signifying a disjunctive search. Finally, the reduction described
above depends significantly on the use of classical logic. The translation
to clausal form is questionable if the abstract logic programming language
under consideration is thought to be defined by the triple (D1, g1, ^i) or
the triple (D1, g 1 , ^ - M ) instead. This point is of particular relevance since
the enrichment to this abstract logic programming language that we con-
sider below is based on abandoning classical logic in favor of intuitionistic
or minimal logic.
The language ⟨𝒟₁, 𝒢₁, ⊢C⟩ does not utilize the framework developed in
the previous subsection fully. In particular, the symbols corresponding to
the search operations AUGMENT and GENERIC are excluded from the
syntax of goal formulas in this language. When we consider adding ⊃
to the logical symbols already permitted in goals, we see that the use of
classical logic to provide the declarative semantics does not accord well with
the intended search interpretation for the various symbols. Consider, for
example, the goal formula p ∨ (p ⊃ q). Given the desired search semantics
for ∨ and ⊃, we would expect an interpreter that attempts to solve this
goal in the context of an empty program to fail; the interpreter should
succeed only if either p is solvable from the empty program or q is solvable
from a program containing just p, and clearly neither is the case. This
expectation is manifest in the fact that the sequent → p ∨ (p ⊃ q) does
not have a uniform proof. There is, however, a C-proof for this sequent as
witnessed by the following derivation:

(The last inference figure in this derivation introduces a formula already
in the succedent which need not be written, given our treatment of an-
tecedents and succedents as sets.) The problem in this case arises from the
fact that the formulas p ∨ (p ⊃ q) and (p ⊃ p) ∨ (p ⊃ q) are equivalent
in classical logic. However, the search semantics of these two formulas are
incompatible: the former permits the clause p to be used only in finding a
solution for q, whereas the latter makes it available even in solving p. A
problem of a similar nature arises when we consider interactions between
∃ and ⊃. The "query" ∃x(p(x) ⊃ q), for instance, has a classical proof
from a program consisting of the clause ((p(a) ∧ p(b)) ⊃ q) although it has
no derivation consistent with the search interpretation of ∃. In general,
a derivability relation that is weaker than that of classical logic is needed
for determining the declarative semantics of a logic programming language
incorporating the AUGMENT search operation, and intuitionistic or min-
imal provability appear to be possible choices. It may appear somewhat
paradoxical that a weaker derivability relation should provide the basis for
a richer logic programming language. However, this apparent paradox dis-
appears when we consider the equivalence of ⟨𝒟₁, 𝒢₁, ⊢C⟩, ⟨𝒟₁, 𝒢₁, ⊢I⟩ and
⟨𝒟₁, 𝒢₁, ⊢M⟩; intuitionistic and minimal derivability provide, in a certain
sense, a tighter analysis of the declarative semantics of the same language.
We describe replacements for 𝒟₁ and 𝒢₁ that realize the richer syntax for
programs and goals considered above. In particular, let G- and D-formulas
be given now by the following mutually recursive syntax rules:
G ::= ⊤ | A | G ∧ G | G ∨ G | ∀x G | ∃x G | D ⊃ G,
D ::= A | G ⊃ A | ∀x D | D ∧ D.
We shall use 𝒢₂ and 𝒟₂ to refer to the classes of G- and D-formulas so
defined. There is a correspondence between these D-formulas and those
described by the logician Harrop [Harrop, 1960; Troelstra, 1973]. Assuming
that the symbol B represents arbitrary formulas, the so-called Harrop for-
mulas are equivalent in intuitionistic and minimal logic to the H-formulas
defined by the rule
H ::= A | B ⊃ A | ∀x H | H ∧ H.
An interesting property of Harrop formulas, proved in [Harrop, 1960], is
the following: if P is a finite set of Harrop formulas and C is a non-atomic
formula, then P ⊢I C only if there is an I-proof of P → C in which
the last inference rule introduces the logical connective of C. Thus, an
I-proof of a sequent whose antecedent is a set of Harrop formulas can be
made uniform "at the root." This observation does not in itself imply the
existence of a uniform proof for P → C whenever it has an I-proof: there
may be sequents in such a derivation whose antecedents contain formulas
that are not Harrop formulas and whose proofs can therefore not be made
uniform at the root. For example, consider the situation when P is empty
and C is (p ∨ q) ⊃ (q ∨ p). Following Harrop's observation, we see that a
proof of the resulting sequent can be obtained from one for p ∨ q → q ∨ p.
However, the antecedent of this sequent does not contain an H-formula
and it is easily seen that a derivation for the sequent in intuitionistic logic
cannot have the introduction of the disjunction in the succedent as its
last step. Now, by further restricting the syntax of the H-formulas and
of the formulas that we attempt to show follow from them, it is possible
to ensure the applicability of Harrop's observation at each sequent in a
derivation. This idea is, in fact, reflected in the definitions of the formulas
in 𝒢₂ and 𝒟₂ above. Viewed differently, every subformula of a formula
in 𝒟₂ that appears in a positive context has the top-level structure of a
Harrop formula. The members of 𝒟₂ are, for this reason, referred to as
hereditary Harrop formulas.
We shall show presently that the triple ⟨𝒟₂, 𝒢₂, ⊢I⟩ defines an abstract
logic programming language. Before we do this, we illustrate the nature of
programming in this language.
Example 2.3.2. The problem that we consider here is commonly referred
to as the sterile jar problem. Assume that the following facts are given to
us: a jar is sterile if every germ in it is dead, a germ in a heated jar is
dead and a particular jar has been heated. The task is to represent this
information so that the conclusion that there is some jar that is sterile can
be drawn.
There is an obvious division of labor for the representation task at hand:
the information that is provided is best represented by a program and the
conclusion that is to be drawn can be thought of as a goal. Let sterile be
a unary predicate corresponding to the sterility of jars, let germ(X) and
dead(X) represent the assertions that X is a germ and that (the germ) X
is dead, let heated(Y) represent the fact that (the jar) Y has been heated
and let in(X, Y) correspond to the assertion that (the germ) X is in (the
jar) Y. Letting j be a constant that denotes the given jar, the assumptions
of the problem can be represented by the following formulas:
∀y ((∀x (germ(x) ⊃ (in(x, y) ⊃ dead(x)))) ⊃ sterile(y)),
∀y ∀x ((heated(y) ∧ (in(x, y) ∧ germ(x))) ⊃ dead(x)), and
heated(j).
It is easily seen that each of these formulas is a member of D2 and so they
collectively constitute a program in the language under consideration. We
shall denote this program by P. An interesting aspect of these formulas is
the translation into logical notation of the defining characteristic of steril-
ity for jars. In determining that a given jar is sterile, it is necessary to
verify that any germ that is in the jar is dead. Two forms of hypothetical
reasoning arise in this context. First, we are interested exclusively in the
germs that are in the jar and we only need to ascertain that these are dead.
However, this property must hold of every germ hypothesized to be in the
jar, not just those whose existence is known of explicitly at some point.
These two requirements are reflected in the embedded universal quantifier
and implications that appear in the first program clause above. Further, as
the reader may observe from the discussion that follows, the programming
interpretation for these two logical symbols and the use of the instruction
ATOMIC described in the previous subsection yields exactly the desired
"procedure" for determining the sterility of a jar.
The conclusion that we wish to draw can be represented by the formula
∃x sterile(x).
This formula is obviously a member of G2 and, hence, a goal in our extended
language. We may think of reaching the desired conclusion by construct-
ing a uniform proof for this goal from the program P. Using the search
interpretation of the existential quantifier, such a proof may be obtained
by, first of all, constructing a proof for sterile(j) from the same program.
Note that we have exhibited clairvoyance in the choice of the instantiating
term. Such foresight can, of course, not be expected of an actual procedure
that looks for uniform proofs and we will discuss ways of dealing with this
issue in an actual implementation. Now, the search instructions do not
dictate the next step in finding a proof for sterile(j). However, we may
think of using the instruction ATOMIC for this purpose; as we shall see
presently, ATOMIC turns out to be a derived rule in the context of con-
structing uniform proofs. Using this instruction with respect to the first
program clause in the list above produces the (sub)goal
∀x (germ(x) ⊃ (in(x, j) ⊃ dead(x)))
with the program still being P. At this stage the instructions GENERIC
and (two instances of) AUGMENT will be used, giving rise to the goal
dead(c) and the program P ∪ {germ(c), in(c, j)}, where c is assumed to be
a new constant. The situation that has been produced reflects an attempt
to show that jar j is sterile by showing that any (generic) germ that is
assumed to be in it is dead. To proceed further, we may use ATOMIC
with respect to the second clause in the list above, yielding the goal
(heated(j) ∧ (in(c, j) ∧ germ(c)));
once again, some foresight is needed in the choice of instance for the formula
from the program. Invoking the instruction AND twice now results in the
atomic goals heated(j), in(c, j), and germ(c) that are to be solved from the
program P ∪ {germ(c), in(c, j)}. Each of these goals can be immediately
solved by using the ATOMIC instruction.
We could think of extracting a result from the uniform proof that has
been constructed. This result can be the instantiation for the existentially
quantified variable in the original goal that yields the uniform proof. Intu-
itively, this corresponds to exhibiting j as an example of a sterile jar.
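For concreteness, the program and goal just discussed can be transcribed into
a λProlog-like concrete syntax, in which pi x\ writes a universal quantifier in
a goal and => writes an implication in a goal; the transcription is offered only
as an illustrative sketch (the type declarations that an actual λProlog program
would also require are omitted):

    sterile Y :- pi x\ (germ x => in x Y => dead x).
    dead X :- heated Y, in X Y, germ X.
    heated j.

    ?- sterile X.        % succeeds, binding X to j

The body of the first clause is solved by exactly the GENERIC and AUGMENT
steps traced above.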
The following proposition shows that the classes D2 and G2 constitute
satisfactory replacements for D1 and G1 respectively. We present a proof
of this proposition here to illustrate the kind of analysis involved in estab-
lishing such a property.
Proposition 2.3.3. The triple (D2, G2, ⊢I) constitutes an abstract logic
programming language.
Proof. Let P be a finite set of D2 formulas and G be a G2 formula. Since
a uniform proof is also an I-proof, P ⊢O G only if P ⊢I G. Thus we only
need to show that the converse holds.
Let Γ be a finite subset of D2. We claim, then, that there cannot be an
I-proof for Γ → ⊥. This claim is established by an induction on the sizes
of proofs, where the size of a proof is the number of sequents appearing in
it. So suppose the claim is not true and let Γ be such that the derivation
of Γ → ⊥ is of least size. Now, the size of this derivation cannot be 1
since ⊥ cannot be a member of Γ. If the size is greater than 1, then the
possible structures for the formulas in D2 dictate that the last rule in the
derivation must be one of ⊃-L, ∀-L, or ∧-L. In each of these cases, there
must be a derivation of smaller size for a sequent Γ' → ⊥ where Γ' is a
set of D2 formulas. This contradicts our assumption about Γ.
Now let Δ be a finite subset of D2 and G' be a member of G2 and
assume that Δ → G' has an I-proof of size l. We claim then that
(1) if G' = G1 ∧ G2 then Δ → G1 and Δ → G2 have I-proofs of
size less than l,
(2) if G' = G1 ∨ G2 then either Δ → G1 or Δ → G2 has an I-proof
of size less than l,
(3) if G' = ∃x G1 then there is a term t such that Δ → [t/x]G1 has
an I-proof of size less than l,
(4) if G' = D ⊃ G1 then Δ ∪ {D} → G1 has an I-proof of size less
than l, and
(5) if G' = ∀x G1 then there is a constant c that does not appear in
any of the formulas in Δ ∪ {∀x G1} for which Δ → [c/x]G1 has an
I-proof of size less than l.
This claim is proved, once again, by induction on the size of the derivation
for Δ → G'. If this size is 1, G' must be atomic and so the claim
is vacuously true. For the case when the size is s + 1, we consider the
possibilities for the last inference figure. The argument is trivial when
this is one of ∧-R, ∨-R, ∃-R, ⊃-R, and ∀-R. From the previous claim, this
figure cannot be ⊥-R. Given the structure of the formulas in Δ, the only
remaining cases are ∀-L, ∧-L and ⊃-L. Consider the case for ∀-L, i.e., when
the last figure is of the form

        [t/x]P, Θ → G'
        ----------------
        ∀x P, Θ → G'

The argument in this case depends on the structure of G'. For instance, let
G' = ∀x G1. The upper sequent of the above figure is of a kind to which
the inductive hypothesis is applicable. Hence, there is a constant c that
does not appear in any of the formulas in Θ ∪ {[t/x]P, ∀x G1} for which
[t/x]P, Θ → [c/x]G1 has an I-proof of size less than s. Adding below
this derivation a ∀-L inference figure, we obtain an I-proof of size less than
s + 1 for ∀x P, Θ → [c/x]G1. Observing now that c cannot appear in
∀x P if it does not appear in [t/x]P, the claim is verified in this case.
The analysis for the cases when G' has a different structure follows an
analogous pattern. Further, similar arguments can be provided when the
last inference figure is an ∧-L or a ⊃-L.
Now let P → G have an I-proof. It follows from the second claim
that it must then have a uniform proof. The proof of the claim in fact
outlines a mechanism for moving the inference figure that introduces the
top-level logical connective in G to the end of the I-proof. A repeated use
of this observation yields a method for transforming an I-proof of P → G
into a uniform proof for the same sequent. ∎
The proof of Proposition 2.3.3 reveals a relationship between deriv-
ability in intuitionistic and minimal logic in the context of interest. In
particular, let P be a finite subset of D2 and let G be a formula in G2.
We have observed, then, that an I-proof of the sequent P → G can-
not contain uses of the inference figure ⊥-R in it. Thus any I-proof of
such a sequent must also be an M-proof. In other words, these two no-
tions of provability are indistinguishable from the perspective of existence
of derivations for sequents of the form P → G. It follows from this that
(D2, G2, ⊢I) and (D2, G2, ⊢M) constitute the same abstract logic program-
ming language. We note that the introduction of implication together
with its desired search interpretation leads to a distinction between classi-
cal provability on the one hand and intuitionistic and minimal provability
on the other. It may be asked whether a similar sort of distinction needs to
be made between intuitionistic and minimal provability. It turns out that
a treatment of negation and an interpretation of the idea of a contradiction
requires these two notions to be differentiated. We do not discuss this issue
any further here, but the interested reader may refer to [Miller, 1989b] for
some thoughts in this direction.
We have raised the issue previously of what the behavior of the (ideal-
ized) interpreter should be when an atomic goal is encountered. We have
also suggested that, in the case of the language (D2, G2, ⊢M), the instruc-
tion ATOMIC might be used at such a point. Following this course is
sound with respect to the defined operational semantics as the following
proposition shows.
Proposition 2.3.4. Let P be a finite subset of D2 and let A be an atomic
formula. If A is an instance of a clause in P or if there is an instance
(G ⊃ A) of a clause in P such that P → G has a uniform proof, then
P → A has a uniform proof.
Proof. If A is an instance of a formula in P, we can obtain a uniform
proof of P → A by appending below an initial sequent some number of
∀-L inference figures. Suppose (G ⊃ A) is an instance of a clause in P and
that P → G has a uniform proof. We can then obtain one for P → A
by using a ⊃-L inference figure followed by some number of ∀-L figures below
the given derivation and the initial sequent A, P → A. ∎
Using ATOMIC as a means for solving atomic goals in conjunction with
the instructions for solving the other kinds of goals also yields a complete
strategy as we now observe.
Proposition 2.3.5. Let P be a finite subset of D2 and let A be an atomic
formula. If the sequent P → A has a uniform proof containing l se-
quents, then either A is an instance of a clause in P or there is an in-
stance (G ⊃ A) of a clause in P such that P → G has a uniform proof
containing fewer than l sequents.
Proof. The following may be added as a sixth item to the second claim in
the proof of Proposition 2.3.3 and established by the same induction: the
sequent P → A has an I-proof containing l sequents only if either A is
an instance of a clause in P or there is an instance (G ⊃ A) of a clause
in P such that P → G has an I-proof containing fewer than l sequents.
The claim when embellished in this way easily yields the proposition. ∎
The abstract logic programming language (D2, G2, ⊢I) incorporates each
of the search primitives discussed in the previous subsection into the syn-
tax of its goals. It provides the basis for an actual language that contains
two new search operations, AUGMENT and GENERIC, in addition to
those already present in Prolog. At least one use for these operations in
a practical setting is that of realizing scoping mechanisms with regard to
program clauses and data objects. Prolog provides a means for augmenting
a program through the nonlogical predicate called assert and for deleting
clauses through a similar predicate called retract. One problem with these
predicates is that their effects are rather global: an assert makes a new
clause available in the search for a proof for every goal and a retract re-
moves it from consideration in every derivation. The AUGMENT operation
provides a more controlled means for augmenting programs: this operation
makes program clauses available only during the search for solutions to
particular goals. Consider, for instance, a goal whose structure is given
schematically by
(D1 ⊃ (G1 ∧ (D2 ⊃ G2))) ∧ G3,
where D1 and D2 represent formulas from D2 and G1, G2 and G3 represent
formulas from G2. Assume further that we are attempting to solve this goal
from a program given by P. The search interpretation for ⊃ requires such a
solution to be obtained by solving G1 from the program P ∪ {D1}, G2 from
the program P ∪ {D1, D2} and G3 from P. The AUGMENT operation thus
provides a means for making additions to a program in a well structured
fashion. This idea can be exploited to realize block structuring in logic
programming as well as to provide a (logically justified) notion of modules
in this programming paradigm. The reader is referred to [Miller, 1989b]
and [Miller, 1989a] for illustrative examples and for a general exploration
of these abilities.
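In a λProlog-like notation, with => written for the implication in goals (our
notation for invoking AUGMENT, used purely for illustration), the schematic
goal above would be posed as

    ?- (d1 => (g1, (d2 => g2))), g3.

so that d1 is in force while g1 and (d2 => g2) are solved, d2 is in force only
while g2 is solved, and g3 is solved from the original program alone.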
Just as the AUGMENT operation is useful in giving program clauses
a scope, the GENERIC operation is useful in controlling the availability
of objects. To understand this possibility, consider a goal that is given
schematically by ∃x ((∀y G1(x, y)) ∧ G2(x)). Solving this goal involves find-
ing a term t such that for some "new" constant c the goals G1(t, c) and
G2(t) are solvable. Note that the constant c is available only in the context
of solving the goal G1(t, c). In particular, it cannot appear within t and
thus cannot be transmitted into the context of solving G2(t) or "outwards"
as an answer substitution. Further, while it can appear in terms created
in the course of solving G1(t, c) and is, in this sense, available as an object
in this context, it cannot be directly manipulated by the clauses that are
utilized in solving this goal. The latter is a consequence of the fact that the
quantifier ∀y determines the lexical scope of c and thus controls the context
in which it can be referred to directly. Viewed differently, the GENERIC
operation provides a means for introducing a new constant whose lexical
scope is determined by the symbol representing the operation and whose
dynamic scope is the goal that is spawned as a result of the operation.
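Written analogously, with sigma and pi as the existential and universal
quantifiers in goals (again an illustrative λProlog-like notation, with g1 and
g2 as schematic predicate names), the goal just considered is

    ?- sigma x\ ((pi y\ g1 x y), g2 x).

The constant manufactured for pi y\ may occur in terms built while g1 is
solved, but it cannot find its way into the instantiation of x nor, consequently,
be carried into the solution of g2.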
The GENERIC and AUGMENT operations can be used in tandem
to realize the notion of abstract data types in logic programming. The
essential idea that needs to be captured in this context is that of limiting
direct access to particular objects to only those program clauses that are
supposed to implement operations on them, while allowing these objects
and the operations defined on them to be used in a larger context. To see
how this requirement might be realized in our context, consider a goal of
the form ∃x ∀y (D(x, y) ⊃ G(x)), where y does not appear in G(x). From
one perspective, the variable y is a constant whose name is visible only in
the program clauses contained in D(x, y). Once this constant is "created,"
it can be used in the course of solving G(x). However, the only way it can
be referred to by name, and hence directly manipulated, in this context is
by using one of the clauses in D(x,y).
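In the same illustrative notation, the goal schema for this kind of information
hiding reads

    ?- sigma x\ pi y\ (d x y => g x),

where d x y stands for the conjunction of clauses D(x, y); these are the only
clauses that mention the hidden constant by name and so the only ones that
can manipulate it directly while g x is solved.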
Although the above discussion provides the intuition guiding a realiza-
tion of information hiding and of abstract data types in logic programming,
a complete realization of these notions requires an ability to quantify over
function symbols as well. This kind of a "higher-order" ability can be incor-
porated into the language we have presented without much difficulty. How-
ever, we do not do this here and refer the interested reader to [Miller et al.,
1991] and [Nadathur and Miller, 1988] instead. We also note that a more
detailed discussion of the scoping mechanism provided by the GENERIC
operation appears in [Miller, 1989a] and the language described there also
incorporates a higher-order ability.
We consider now the issue of designing an actual interpreter or proof
procedure for the abstract logic programming languages discussed in this
subsection. Let us examine first the language (D1, G1, ⊢C). The notion of a
uniform proof determines, to a large extent, the structure of an interpreter
for this language. However, some elaboration is required of the method
for choosing terms in the context of the INSTANCE instruction and of the
action to be taken when atomic goals are encountered. Propositions 2.3.4
and 2.3.5 and the discussion of SLD-resolution suggest a course that might
be taken in each of these cases. Thus, if a goal of the form ∃x G(x) is
encountered at a certain point in a search, a possible strategy is to delay
the choice of instantiation for the quantifier till such time that this choice
can be made in an educated fashion. Such a delaying strategy can be real-
ized by instantiating the existential quantifier with a "placeholder" whose
value may be determined at a later stage through the use of unification.
Placeholders of this kind are what are referred to as logic variables in logic
programming parlance. Thus, using the convention of representing place-
holders by capital letters, the goal ∃x G(x) may be transformed into one of
the form G(X) where X is a new logic variable. The use of such variables
is to be "cashed out" when employing ATOMIC in solving atomic goals. In
particular, given a goal A that possibly contains logic variables, the strat-
egy will be to look for a program clause of the form ∀y A' or ∀y (G ⊃ A')
such that A unifies with the atom that results from A' by replacing all the
universally quantified variables in it with logic variables. Finding such a
clause results, in the first case, in an immediate success or, in the second
case, in an attempt to solve the resulting instance of G. The interpreter
that is obtained by incorporating these mechanisms into a search for a uni-
form proof is still nondeterministic: a choice has to be made of a disjunct
in the context of the OR instruction and of the program clause to use with
respect to the ATOMIC instruction. This nondeterminism is, however,
of a more tolerable kind than that encountered in the context of the IN-
STANCE instruction. Moreover, a deterministic but incomplete procedure
can be obtained by making these choices in a fixed manner as is done in
the case of Prolog.
In devising an actual interpreter for (D2, G2, ⊢I), we need also to think of
a treatment of the search primitives embodied in ⊃ and ∀. From inspecting
the instructions corresponding to these symbols, it appears that relatively
straightforward mechanisms can be used for this purpose. Thus, consider
the situation when a goal of the form D ⊃ G is encountered. We might
proceed in this case by enhancing the program with the clause D and
then attempting to solve G. Similarly, assume that we desire to solve the
goal ∀x G(x). We might think now of generating a new constant c and
of attempting to solve the goal G(c) that results from instantiating the
universal quantifier with this constant.
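As a concrete, if deliberately naive, illustration of how these mechanisms
might be assembled, the following sketch is written in ordinary Prolog. The
predicate prove/2, the operator ==> used for implication goals, the term
all(X, G) used for universally quantified goals, and the representation of a
program as a list of (Head :- Body) clauses (with facts written as (Head :-
true)) are conventions invented for this sketch and are not drawn from any
of the systems cited in this section. Existential goals are treated implicitly:
an unbound Prolog variable left in a goal plays the role of the placeholder, or
logic variable, described above. The handling of the universal quantifier is
exactly the naive one whose shortcoming is taken up in the next paragraph.

    :- use_module(library(lists)).     % for member/2
    :- op(700, xfy, ==>).              % illustration-only operator for implication goals

    prove(_P, true).
    prove(P, (G1, G2)) :- prove(P, G1), prove(P, G2).        % AND
    prove(P, (G1 ; G2)) :- ( prove(P, G1) ; prove(P, G2) ).  % OR
    prove(P, (D ==> G)) :- prove([D | P], G).                % AUGMENT: D is visible only for G
    prove(P, all(X, G)) :-                                   % GENERIC, naive treatment
        gensym(c, X),                  % SWI-Prolog's gensym/2 supplies a "new" constant
        prove(P, G).
    prove(P, A) :-                                           % ATOMIC: backchain on a program clause
        member(Clause, P),
        copy_term(Clause, (A :- Body)),  % a fresh copy; its variables act as logic variables
        prove(P, Body).

With this sketch, the query prove([(p(Z, Z) :- true)], all(Y, p(X, Y))) succeeds
and binds X to the manufactured constant; this is precisely the erroneous
solution for the goal ∃x ∀y p(x, y) from the program {∀x p(x, x)} that is
discussed next, and that the devices mentioned there are designed to rule out.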
The mechanisms outlined above can, in principle, be used to implement
the new search primitives. However, some care must be taken to ensure
that they mesh properly with the devices that are used in implementing the
language (D1, G1, ⊢C). One possible source of unwanted interaction is that
between the use of logic variables to deal with INSTANCE and of new con-
stants to realize GENERIC: since instantiations for existential quantifiers
may be determined only after the introduction of new constants for dealing
with universal quantifiers that appear within their scope, some method is
needed for precluding the appearance of such constants in the instantiating
terms. As a concrete example, consider solving the goal ∃x ∀y p(x, y) from
the program {∀x p(x, x)}; we assume that p is a predicate symbol here.
Following the strategy outlined up to this point in a naive fashion results
in a success: the initial goal would be reduced to one of the form p(X, c),
and this can be unified with the (only) program clause by instantiating X
to c. Notice, however, that this is an erroneous solution since the instanti-
ation determined for X leads to a violation of the condition of newness on
c at the time that it is generated.
To prevent incorrect solutions of the kind described above, it is nec-
essary to limit possible instantiations for logic variables in some relevant
manner. Some devices such as raising [Miller, 1992], lifting [Paulson, 1987]
and dynamic Skolemization [Fitting, 1990] have been proposed for this pur-
pose. (Static Skolemization, the device generally used in the setting of clas-
sical logic, does not work in the current context.) These devices preserve
the usual unification process but build the constraints that have to be sat-
isfied into the "logic variable" that is used in connection with INSTANCE
or the new "constant" that is introduced in connection with GENERIC.
An alternative approach that is described in [Nadathur, 1993] is to modify
the naive treatment of INSTANCE and GENERIC only to the extent of
including a numeric label with logic variables and constants and to con-
strain unification to respect the information contained in these labels. This
approach requires much less bookkeeping than the earlier mentioned ones
and is, in fact, simple enough to be incorporated into an abstract machine
for the language described in this subsection [Nadathur et al., 1993].
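To convey the flavour of the labelling idea — the precise bookkeeping is that
of [Nadathur, 1993], and the simple discipline sketched here is only our
paraphrase of it — suppose that every logic variable and every constant
introduced for GENERIC is tagged with the number of such constants in
existence when it was created, and that a variable tagged i may only be
instantiated by terms all of whose constants carry tags no greater than i.
For the problematic goal above:

    X is created for the existential quantifier with tag 0;
    c is created for the universal quantifier with tag 1;
    unifying p(X, c) with a fresh instance p(Z, Z) of the program clause
    would force X = c, and the tag comparison 1 =< 0 fails.

The erroneous solution is thus rejected, while ordinary first-order unification
is otherwise left untouched.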
An adequate implementation of AUGMENT must consider two addi-
tional aspects. One of these arises from the fact that different programs
might have to be considered at distinct points in a search. An efficient
means is therefore needed for realizing changing program contexts. The
additions to and the depletions from the program generally follow a stack
discipline and can, in principle, be implemented using such a device. A
complicating factor, however, is the imposition of determinism on the in-
terpreter that we have outlined and the consequent need for backtracking
when a failure is encountered along a solution path that is being explored.
To understand why this might be a problem, let us consider a goal of the
form ∃x ((D ⊃ G1(x)) ∧ G2(x)) in the context of a program P. Under
the scheme being considered, the program would first be augmented with
D and an attempt would be made to solve the goal G1(X), where X is a
logic variable. Suppose that this attempt succeeds with a certain binding
for X and also leaves behind an alternative (partially explored) solution
path for G1(X). An attempt would now be made to solve the appropriate
instance of G2(X) after removing D from the program. Assume, however,
that the attempt to solve this instance of G2(X) fails. It is necessary, at
this point, to continue the exploration of the alternative solution path for
G1(X). However, this exploration must be carried out in the context of
the appropriate program. In particular, all the clauses that were added in
the course of trying to solve G2(X) must be removed and the clause D and
others that might have been added along the alternative solution path for
G1(X) must be reinstated. The kind of context switching that is required
as a result of backtracking is, thus, fairly complex and some thought is
required in devising a method for realizing it efficiently.
The second new aspect that needs to be dealt with in a satisfactory
implementation of AUGMENT is that of program clauses containing logic
variables. To see how such clauses might be produced, and also to under-
stand what is involved in handling such clauses properly, let us consider
the goal ∃x (p(x) ⊃ g(x)); we assume that p and g are predicate symbols
here. Following the strategy already outlined, the attempt to solve this goal
leads to an attempt to solve the goal g(X) from a program that now also
includes the clause p(X). Notice that the symbol X that appears in these
formulas is a logic variable and that this variable actually appears in the
new program clause. The nature of this variable in the clause is different
from that of the variables that usually appear in program clauses. While
this variable can be instantiated, such an instantiation must be performed
in a consistent way across all copies of the clause and must also be tied to
the instantiation of X in the goal g(X). Thus, suppose that the program
from which we are attempting to solve the given goal contains the clauses
∀x (q(x) ∧ p(b) ⊃ g(x))
and
q(a).
Using the first clause in trying to solve g(X) produces the subgoals q(X)
and p(b). The first subgoal can be successfully solved by using the second
clause and the added clause p(X). However, as a result of this solution to
the first subgoal, the added clause must be changed to p(a). This clause
can, therefore, not be used to solve the second subgoal, and this ultimately
leads to a failure in the attempt to solve the overall goal.
The problem with logic variables in program clauses occurs again in the
system N-Prolog that is presented in the next section and is discussed in
more detail there. An actual interpreter for the language (D2, G2, ⊢I) that
includes a treatment of several of the issues outlined here is described in
[Nadathur, 1993]. The question of implementing this interpreter efficiently
has also been considered and we refer the interested reader to [Nadathur et
al., 1993] for a detailed discussion of this aspect. For the present purposes,
it suffices to note that a procedure whose structure is similar to the one
for constructing SLD-refutations can be used for finding uniform proofs
for the extended language and that a satisfactory implementation of this
procedure can be provided along the lines of usual Prolog implementations.
We have focused here on using the criterion developed in the last sub-
section in describing abstract logic programming languages in the context
of first-order logic. However, the framework that has been described is
quite broad and can be utilized in the context of other logics as well. It
can, for instance, be used in the context of a higher-order logic to yield
a higher-order version of the language (D2, G2, ⊢I). Such a language has,
in fact, been described [Miller et al., 1991] and has provided the basis for
a higher-order extension to Prolog that is called λProlog [Nadathur and
Miller, 1988]. A discussion of some aspects of this extension also appears
in the chapter on higher-order logic programming in this volume of the
handbook. More generally, the framework can be utilized with a differ-
ent set of logical symbols or with a different search interpretation given
to the logical symbols considered here. Interpreted in this sense, our crite-
rion has been used in conjunction with linear logic [Harland and Pym, 1991;
Hodas and Miller, 1994] and a calculus of dependent types [Pfenning, 1989]
to describe logic programming languages that have interesting applications.

3 Extending the logic programming paradigm


In the previous section we presented an abstract framework which attempts
to capture the essence of the logic programming paradigm. We have seen
that the Horn clause logic underlying Prolog fits this framework but that a
much richer class of program clauses can be used if intuitionistic logic is em-
ployed in providing the underlying declarative semantics. Abstracting and
formalizing the logic programming paradigm thus provides a mechanism to
propose new logic programming languages. However, it also allows us to
better understand logic programming languages developed independently
of the framework. We illustrate this in this section with the consideration
of two such existing logic programming languages.
One category of languages that we present, N-Prolog and the related
QN-Prolog, extends Horn clause logic by permitting implication in goals
and in the antecedent of program clauses. We shall see that this language is
based on an abstract logic programming language. The proof procedures for
N-Prolog and QN-Prolog may also be understood as ones that attempt to
construct uniform proofs, and we illustrate this aspect in our presentation.
The second language presents a different situation, a datapoint seem-
ingly outside the framework we have presented. The programming lan-
guage family, called near-Horn Prolog (nH-Prolog), permits disjunction in
the consequent of program clauses. This leads to some strikingly differ-
ent characteristics relative to other logic programming languages, such as
the possibility of disjunctive answers. (With a suitable encoding of clas-
sical negation one may view the logic as coextensive with full first-order
logic.) The proof mechanism of one variant of nH-Prolog fits the structure
of uniform proofs, a startling fact given the power of representation of the
underlying logic. We show by illustration the key reasons why the strong
proof restrictions hold for this variant.
The study of the nH-Prolog language we undertake here accomplishes
two points simultaneously. First, it supports the claim that the nH-Prolog
language is a very attractive way to approach the extended logic domain
with regard to the logic programming paradigm. Second, understanding
the relationship between the variant of nH-Prolog considered here and the
notion of uniform proof enhances our understanding of both entities.
We begin our consideration of existing languages with N-Prolog.

3.1 A language for hypothetical reasoning


The N-Prolog language, developed by Gabbay and Reyle [Gabbay and
Reyle, 1984b; Gabbay, 1985] introduces the hypothetical implication. This
mechanism allows exploration of the consequences of a hypothetical situa-
tion, where additions are temporarily made to the database, or program,
when certain queries are addressed. The basic system is sound and com-
plete for positive intuitionistic logic, and Gabbay has shown that it can be
extended to a complete procedure for positive classical logic by adding one
inference rule (a "restart" rule).
We first present the propositional form of N-Prolog. We follow closely
Gabbay and Reyle [Gabbay and Reyle, 1984b]. If P is a program and Q
is a query, then P ? Q expresses the problem of determining if query Q
follows from program P. P ? Q = 1 denotes success in processing Q;
P ? Q = 0 denotes finite failure. If we let P + A denote P U {A}, where
P is a program and A is a clause, then the key novel deductive step of
N-Prolog is to define P ? (A ⊃ B) as (P + A) ? B. That is, the conditional
A ⊃ B as query is interpreted as asking B in the context of P augmented
by the hypothetical fact A. We now formalize propositional N-Prolog.
An N-clause is defined inductively as an atom or as
A1 ∧ A2 ∧ ... ∧ An ⊃ B
where the Ai are N-clauses and B is an atom. An N-program or database
is a set of N-clauses. An N-goal is any conjunction of N-clauses. For
example, any Horn clause is an N-clause, as are (((a ⊃ b) ⊃ c) ⊃ d) and
((a ⊃ b) ∧ c) ⊃ d. The expression
(a ⊃ (b ⊃ c)) ⊃ ((a ⊃ b) ⊃ (a ⊃ c)),
an axiom in one formulation of intuitionistic logic, is not an N-clause, but
the expression
((a ∧ b ⊃ c) ∧ (a ⊃ b) ∧ a) ⊃ c
that is intuitionistically equivalent to it is an N-clause. In general, for-
mulas of form A ⊃ (B ⊃ C) need to be rewritten in the intuitionistically
equivalent form A ∧ B ⊃ C in order to qualify as N-clauses.
There are three rules for computation for propositional N-Prolog.
(1) The rule for atoms has two parts: if Q is an atom then P ? Q succeeds
iff
(a) Q ∈ P, or
(b) for some clause A1 ∧ ... ∧ Ak ⊃ Q in P we have that P ? (A1 ∧
... ∧ Ak) succeeds.
(2) The rule for conjunctions states that P ? (A1 ∧ ... ∧ Ak) succeeds iff
each P ? Ai succeeds.
(3) The rule for implication states that P ? (A ⊃ B) succeeds iff (P +
A) ? B succeeds, where P + A is the result of adding each conjunct
of A to P.
To illustrate a deduction in N-Prolog we show that
P ? ((a ∧ b ⊃ c) ∧ (a ⊃ b) ∧ a) ⊃ c
succeeds for P = ∅, the empty program.
Example 3.1.1. Show ∅ ? ((a ∧ b ⊃ c) ∧ (a ⊃ b) ∧ a) ⊃ c = 1.
1. ∅ ? ((a ∧ b ⊃ c) ∧ (a ⊃ b) ∧ a) ⊃ c
2. a ∧ b ⊃ c, a ⊃ b, a ? c
3. a ∧ b ⊃ c, a ⊃ b, a ? a ∧ b
4. Case 1: a ∧ b ⊃ c, a ⊃ b, a ? a
This case succeeds immediately.
5. Case 2: a ∧ b ⊃ c, a ⊃ b, a ? b
6. a ∧ b ⊃ c, a ⊃ b, a ? a
This case now succeeds.

N-Prolog with quantifiers, which we might also call general level N-
Prolog or first-order N-Prolog, addresses the extension of N-Prolog to the
first-order level. The major complication in moving to the general level that
is not shared by Prolog is the sharing of variables between program and
goal. The task P ? (a(X) ⊃ b(X)) succeeds if (P + a(X)) ? b(X) succeeds,
where variable X now is shared by program and goal. (We hereafter adopt
the Prolog convention of capital letters for variables.)
To handle the sharing of variables, N-Prolog with quantifiers, hereafter
QN-Prolog, uses two classes of variables, universal and existential. Here
universal variables are denoted by Ui, for i an integer; existential variables
use other letters of the alphabet, also possibly with subscripts. (Of course, a
commercial system might permit arbitrary words in place of the subscript.)
Atoms now are predicate letters with terms that may contain variables of
the two different kinds. QN-clauses, QN-goals and QN-programs generalize
from the propositional form in the obvious way except no universal variables
are permitted in QN-goals. Recall that in Prolog variables in goals behave
as existential variables, and as universal variables in the program, but now
that universal and existential variables co-exist in programs, an explicit
mechanism is necessary. Some explicit replacement of universal variables
by existential variables is necessary, as illustrated below.
For a formal definition of QN-Prolog the reader should consult [Gabbay
and Reyle, 1984b]. We will use an example to illustrate the use of the
two classes of variables. First, we emphasize the key attributes regarding
variables use:

(1) Existential variables may be shared among program clauses and goals,
reflecting the view that the existential quantifiers associated with the
variables are exterior to the entire expression (i.e. ∃X(P ? Q));
(2) Only existential variables occur in goals.

We adapt Example E2 (Section 4) from [Gabbay and Reyle, 1984b]
for our illustration. Although somewhat contrived so as to present an
interesting non-Horn structure in a compact example, the example does
suggest that real databases (programs) exist where hypothetical implication
occurs and thus processing can benefit from the use of QN-Prolog.
Example 3.1.2 (Example E2 in [Gabbay and Reyle, 1984b]).
Assume that we are given the following database:
(dl) A person is neurotic if he/she is greatly disturbed when criticized by
one of his/her friends.
(d2) Mary gets greatly disturbed when Sally criticizes her.
(d3) Sally criticizes everyone.
(d4) A person is a bad friend if there is someone who would become neu-
rotic if this someone were befriended by the person.
The query is "Is there someone who is a bad friend?"
The database and query are formalized as follows:
(d1) [f(U1, U2) ∧ cr(U1, U2) ⊃ d(U2)] ⊃ n(U2)
(d2) cr(s, m) ⊃ d(m)
(d3) cr(s, U1)
(d4) [f(U1, U2) ⊃ n(U2)] ⊃ b(U1)
(query) b(Y)
The query uses an existential variable, and all database clauses use
universal variables.
We now follow [Gabbay and Reyle, 1984b] in presenting one possible
computation sequence for this database and query.
1. d1,d2,d3,d4 ? b(Y)
2. d1,d2,d3,d4 ? f(Y,X) ⊃ n(X)                  using d4
3. d1,d2,d3,d4,f(Y,X) ? n(X)                    using rule for implication
4. d1,d2,d3,d4,f(Y,X) ? f(Z,X) ∧ cr(Z,X) ⊃ d(X)
                                                using d1
5. d1,d2,d3,d4,f(Y,X),f(Z,X),cr(Z,X) ? d(X)
                                                using rule for implication
6. d1,d2,d3,d4,f(Y,m),f(Z,m),cr(Z,m) ? cr(s,m)
                                                using d2
7. d1,d2,d3,d4,f(Y,m),f(Z,m),cr(Z,m) ? cr(s,m)
                                                using d3.
In connection with this computation, it is interesting to note that in step
2 and in step 4 the universal variables U2 and U1 that appear in clauses
d4 and d1 respectively are not instantiated during unification with the
previous goal and have therefore to be renamed to an existential variable
in the new goal. Note also that additions are made to the program in
the course of computation that contain existential variables in them. An
instantiation is determined for one of these variables in step 6 and this
actually has the effect of changing the program itself.
Observe that the answer variable is uninstantiated; the bad friend exists,
by the computation success, but is unnamed. The neurotic friend is seen
to be Mary by inspection of the computation.
The derivation given by Gabbay and Reyle uses clause d3 at Step 7.
Perhaps a more interesting derivation makes use of the augmented database
at Step 7 and shows that clause d3 is unneeded. The intuition behind this
alternative is that d1 says that for some instance of U2 one can establish
n(U2) by satisfying (an appropriate instance of) an implication, and d2
provides an appropriate implication for such an instance.
We now state the soundness and completeness result for QN-Prolog
given in Gabbay [Gabbay, 1985].
Proposition 3.1.3. P ? Q succeeds in QN-Prolog if and only if
⊢ ∃X[(∀U)P ⊃ Q]
holds in intuitionistic logic for program (database) P and query Q where
X denotes all the existential variables in P and Q and U denotes all the
universal variables in P.
The above proposition brings out the connection between QN-Prolog
and the discussion in the previous section. QN-Prolog is readily seen to
be based on an abstract logic programming language contained in the lan-
guage (D2, G2, ⊢I) that is described in Subsection 2.3. This language is in
fact created from the Horn clause language by permitting formulas of the
form D ⊃ A to appear in goals and in the antecedents of program clauses
where D is itself a conjunction of program clauses under this definition. To
ensure the existence of uniform proofs, i.e. proofs in which the succedent is
singleton and also atomic whenever an antecedent inference figure is used,
the derivability relation that needs to be adopted is ⊢I. To reinforce the
understanding of QN-Prolog as an abstract logic programming language
we present the uniform proof implicit in the previous example using the
sequent calculus described in the previous section.
In the presentation of the proof we adopt several suggestions made in
Subsection 2.3 to improve the abstract interpreter efficiency. We delay the
instantiations by use of logic variables as suggested for the abstract inter-
preter and as executed in the description of QN-Prolog. When the proof
is read from the bottom (conclusion) upward, existential variables replace
the instances that the ∀-L inference figures would normally introduce. The
use of these variables signifies that one, perhaps undetermined, instance
of the clause has been created. Other instances may similarly be created
from the explicitly quantified version of the clause that is present in the
antecedent of the sequent.
Several nonstandard inference figures are used to aid the presentation.
The inference figure CI (clause incorporation) contains a formula in the
antecedent of the upper sequent whose universal closure (zero or more ap-
plications of ∀-L) results in a clause already present in P and not explicitly
listed in the lower sequent. We have used EG (existential generalization)
simply to identify existential variables with their appropriate instantiation
— an artifact of our delayed instantiation during goal-oriented (backward)
proof development. We use the inference figure BC (backchain), suggested
in the previous section, to save a step and simplify our deduction. The
inference figure BC replaces D-L and eliminates an initial upper sequent.
(Also logic variables may be instantiated in the remaining upper sequent.)
For example, the lowest use of BC in the deduction replaces the inference
figure occurrence

where ⊃-L here is augmented by the instantiation of the logic variable V
by Y.
With the modified inference figures suggested by the abstract inter-
preter of the preceding section, we observe that the modified I-proof is
almost identical to that given in the QN-Prolog presentation.
Example 3.1.4 (A uniform proof of the Gabbay-Reyle example).
As already noted, P corresponds here to the (universal closure of the)
program presented earlier.

The deductive power of N-Prolog can be extended to a proof procedure
for classical logic by the introduction of a new rule of inference called a
restart rule. Gabbay [Gabbay, 1985] calls the resulting system NR-Prolog.
The propositional form of the restart rule states that at any point where
the current goal is atomic, this atom can be replaced by the original query,
and if this original query occurrence succeeds then the replaced goal is
deemed to succeed. The original query might succeed at this point but not
at the original occurrence because now the program has been enlarged by
use of the rule for implications. One should note that any ancestor goal
can replace a current goal because the ancestor goal could be regarded as
the original query. (Use of clauses augmenting the program but introduced
before the ancestor goal is not a problem, as we note below.)
The natural example to illustrate the use of the restart rule is Peirce's
law. This formula is not valid intuitionistically, thus not provable in N-
Prolog, as the reader can verify. We show that Peirce's law succeeds in
NR-Prolog.
Example 3.1.5 (Peirce's law).
1. ? ((a ⊃ b) ⊃ a) ⊃ a
2. (a ⊃ b) ⊃ a ? a                          rule for implication
3. (a ⊃ b) ⊃ a ? a ⊃ b                      using the only clause
4. (a ⊃ b) ⊃ a, a ? b                       rule for implication
5. (a ⊃ b) ⊃ a, a ? ((a ⊃ b) ⊃ a) ⊃ a       restart
6. (a ⊃ b) ⊃ a, a, (a ⊃ b) ⊃ a ? a          rule for implication
7. success
As can be seen above, the restart subdeduction succeeds. Thus b succeeds
from {(a ⊃ b) ⊃ a, a}. Thus the original query succeeds.
In the above example ancestor goal a could have been the restart goal.
Although (a ⊃ b) ⊃ a was added to the program before a became a goal,
it is re-derived in the restart, which can always reproduce the path to a.
Thus, any ancestor goal restart is always permissible in NR-Prolog.
The restart rule at the first-order level is a simple generalization of the
propositional case. The restart rule for QNR-Prolog, the extension of QN-
Prolog by addition of this rule, is precisely as for the propositional case
except that the reintroduced original goal must have all its (existential)
variables replaced by new existential variables. See [Gabbay, 1985] for
details.
The power of the restart rule is seen by comparing the following sound-
ness and completeness theorem with the earlier theorem for QN-Prolog:
Proposition 3.1.6. P ? Q succeeds in QNR-Prolog if and only if
⊢ ∃X[(∀U)P ⊃ Q]
holds in classical logic for program (database) P and query Q, where X
denotes all the existential variables in P and Q and U denotes all the uni-
versal variables in P.
3.2 Near-Horn Prolog
We now shift our attention from an extension of Prolog (SLD-resolution)
by hypothetical implication to an extension by disjunctive consequent. We
wish to handle clauses with an indefinite conclusion, such as "jump in lake
D sink V swim". This extension has received considerable attention re-
cently, identified as a subarea (called disjunctive logic programming) with
a first book devoted to the subarea (see [Lobo et al., 1992]). Our focus
is on proof procedures so we restrict ourselves to presenting (one version
of) the Near-Horn Prolog (nH-Prolog) procedure mentioned earlier. This
version of nH-Prolog was developed by Reed and Loveland [Reed, 1988;
Reed, 1991; Loveland and Reed, 1991] based on earlier versions devised by
Loveland [Loveland, 1987; Loveland, 1991].
The astute reader may recognize that the introduction of non-Horn
clauses such as above returns us to the full descriptive power of first-order
logic since every clause in a conjunctive normal form (cnf) presentation of


a formula can be written as an implication of atoms. Namely, negative
literals of the cnf clause appear (positively) in the antecedent of the im-
plication and positive literals of the cnf clause appear in the consequent.
Since SLD-resolution is a restriction of SL-resolution (or ME), the appro-
priate procedure family seems at hand. Unfortunately, these procedures
have properties not compatible with the character of logic programming.
A severe problem with the use of ME-based procedures is the need to access
any literal of the input clause in executing the extension operation. With
the clauses in positive implication format this means accessing antecedent
atoms as well as consequent atoms. This requires a notion of contrapos-
itive implications, which reverses the direction of implication. This can
strongly disrupt the user's sense of computation sequence, which is key to
severely limiting the branching that occurs in Prolog execution. Moreover,
the ability to catch errors in the formulation of the program is strongly de-
pendent on the procedural reading capability of the programmer, and the
disruption caused by the introduction of unintuitive implications is high.
This seems a fatal flaw to the notion of "program" in comparison to "ax-
iom clause set" that separates logic programming from automated theorem
proving. There certainly is reason to look to preserve implication direction
as SLD-resolution does.
There are several important properties enjoyed by SLD-resolution, and
its implementation as Prolog, that one would seek to preserve in extending
to a new domain. Prolog (SLD-resolution) is goal oriented, allows a declar-
ative and procedural reading capability, has a linear input format, uses
a positive implication format, and needs no contrapositive occurrences of
the implications. Moreover, the process is intuitionistically sound, which
insures a certain constructive nature to the inference structure. The at-
tractiveness of nH-Prolog is that it shares these properties with Prolog,
except that the linearity and the procedural reading properties are local
only. (A certain type of contrapositive is used, but not one that changes the
direction of implication.) The degree of locality depends on the number of
uses of non-Horn clauses in the computation. Low use of non-Horn clauses
yields more traditional computation sequences. This is a major reason why
near-Horn Prolog is so-named; the intended domain is clause sets with few
non-Horn clauses.
Like SLD-resolution, nH-Prolog is intuitionistically sound, although
the process of preparation of clauses uses transformations that are clas-
sically but not intuitionistically provable. We will consider a version of
nH-Prolog that meets the uniform proof criteria, somewhat unexpectedly,
given that the problem domain is all of first-order logic. (However, the
above-mentioned clause preparation accounts for some of the apparent gap
in these two statements. For example, the formula ¬¬A ⊃ A, not intu-
itionistically provable, simply transforms to query A and program A.)
The nH-Prolog procedure (in all variants) can be regarded as deduction
by case analysis. A deduction is divided into blocks, where each block is
essentially a Prolog deduction of one case. Information on deferred cases
is retained in a "deferred head" list within each deduction line. The case-
analysis approach is effective when the number of non-Horn clauses is low,
but is not as effective as, say, ME-type procedures when many non-Horn
clauses are present, such as in some logic puzzles and mathematical theo-
rems.
We present a variant known as Inheritance near-Horn Prolog, or InH-
Prolog, because it is relatively simple to describe, has an underlying com-
plete first-order proof procedure, yet also seems to be the preferred version
for implementation when building a compiler for the language. See [Love-
land and Reed, 1991] for a full discussion of related issues. See [Reed and
Loveland, 1992] for one comparison of this variant with N-Prolog and a
simplified problem reduction format of Plaisted. After presenting the pro-
cedure we give two example deductions; these illustrate the procedure and
also the role of disjunction in a database.
We consider the language format. The general clause form is
A1 ; ... ; An :- B1, ..., Bm
where the Ai, Bj are atoms. The semicolons denote "or", and, as usual,
the commas denote "and". Thus, the semantics for the above clause is the
implication
B1 ∧ ... ∧ Bm ⊃ A1 ∨ ... ∨ An.
A definite Horn clause is given by n = 1. A (possibly disjunctive) fact
is given by m = 0. The n = 0 case is not permitted as such, but is
transformed into a special-case Horn clause by use of the special atom
FALSE, which is given the semantics of "false" by its role in the InH-Prolog
procedure. (As we shall see later, this atom corresponds to the logical
symbol ⊥ introduced in Subsection 2.2.) Thus query clause ?- B1, ... ,
Bm is written FALSE :- B1, ..., Bm and we use a fixed query ?- FALSE.
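For instance, the "jump in the lake" clause mentioned at the start of this
subsection, together with a query asking whether swimming follows, would be
written as below (the propositional rendering is ours):

    sink ; swim :- jump_in_lake.     % jump in lake implies sink or swim
    FALSE :- swim.                   % query clause for the user query ?- swim
    ?- FALSE.                        % the fixed query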
The reason for the introduction of special atom FALSE involves the man-
ner of handling negative clauses. Negative clauses may appear in the pro-
gram itself, unlike Horn clause programs. A traditional query form, e.g.
?- q(X), is usable but then every negative program clause would have form
q(X) :- B1, ..., Bm instead of FALSE :- B1, ..., Bm. This makes the
program dependent on the query, a very awkward trait. Answer retrieval
is accomplished from the query clause FALSE :- Q, where Q is the user's
query. Although formally a program clause, this (single) clause would be
outside the compiled program and instead interpreted since it changes with
each query. See [Loveland and Reed, 1991] and [Smith and Loveland, 1988]
for further discussion.
Some terminology is needed before presenting the InH-Prolog proce-
dure. The head of a clause now possibly has multiple atoms, called head
atoms, sometimes just heads when context permits. (The term body atom,
for atoms in the clause body, is available but rarely used.) Goal clauses,
used in recording the deduction, are notated :- G1, ..., Gn except for the
initial goal clause, the query, which is notated ?- G, in our case ?- FALSE.
The summary we give applies to all the nH-Prolog variants except where
noted.
An InH-Prolog deduction may be regarded as blocks of Prolog deduc-
tions where each block differs only slightly from standard Prolog. The
basic Prolog reduction inference step is the primary inference step of the
InH-Prolog block, with the adjustment that if the called program clause is
non-Horn then the head atoms not called are put aside for use in another
block. We now amplify the form of the deduction.
For the presentation of an InH-Prolog deduction we use a line format
that displays the current goal clause, the active head list and the deferred
head list. (The various head lists are defined below.) The format is:
goal clause # active head list [deferred head list]
We first state the inference rules of InH-Prolog for a block.
Reduction — if Gs is the selected goal atom of the preceding goal clause
G (in usual implementations this will always be the leftmost goal atom)
and H1 ; ... ; Hn :- G1, ..., Gm is a program clause, where Gs and Hi are
unifiable, for some i, 1 ≤ i ≤ n, then (G1, ..., Gm)θ replaces Gsθ in Gθ,
where θ is the most general unifier of Gs and Hi. The Hjθ, j ≠ i, are added
to the deferred head list, with no variable renaming. Both the active and
deferred head list pass to the next line; changes from line to line include
only possible instantiation of variables shared with the goal clause and
additions to the deferred head list as just described. The deferred head list
is placed between brackets.
Cancellation — if Gs is the selected goal in goal clause G and Gs unifies
with an active head A (defined below), then the next goal clause is (G −
{Gs})θ, where θ is the unifier of Gs and A.
Variables in the head list atoms can be instantiated during reduction
through shared variables with goal clause atoms or during cancellation.
We now consider block structure, which is reflected in the initialization
of the block. Blocks serve as subdeductions, a single case consideration in
the analysis-by-cases characterization of the InH-procedure.
Block initiation — the first line of a block has the query ?- FALSE as goal
clause. For restart blocks (blocks other than the first, or start, block) lists
of active head atoms and deferred head atoms are declared that accompany
each line of the block. If a block terminates successfully by deriving the
empty clause as for Prolog, but with a nonempty deferred head list, then
a successor restart block is created. A distinguished active head is defined
for the new block, chosen by a user supplied selection function from the
deferred head list of the final line of the preceding block. (For a standard
implementation it will be the leftmost deferred head.) The deferred head
list is the deferred head list of the final line of the preceding block, minus
the now distinguished active head. For other nH-Prologs this defines the
active and deferred head lists of the new block but InH-Prolog has also as
active heads the active heads of the block in which the new distinguished
active head was deferred. If one represents the block structure in tree form
with a child block for each deferred head in a block, then the active heads
in the new child block include the parent block deferred head as distin-
guished active head and the active heads of the parent block as the other
active heads. No deferred head list exists, as the heads are immediately
recorded in the children blocks. (The possibility that multiple heads from
a clause can share variables complicates the tree format by requiring that
binding information be passed between blocks. In the sequential format
we are outlining, acquiring heads from the previous block handles shared
bindings. However, special recording mechanisms are needed in InH-Prolog
to identify the correct active heads in the preceding block.)
In the start block where no active head list exists, the separator symbol
# is omitted until deferred heads are introduced.
The deduction terminates successfully when no new block can be formed,
which occurs when the deferred head list is empty.
We distinguish the newest active head primarily because of the key
cancellation requirement discussed in the next paragraph, but also because
nH-Prolog deductions must have one restart block per deferred head, and
the distinguished active head is the appropriate head atom to identify with
the block. The active heads serve as conditional facts in the case-analysis
view of the nH-Prologs.
The InH-Prolog procedure has added requirements that relate to search.
A non-obvious property whose absence would make the procedure imprac-
tical is the cancellation requirement. This requirement states that a restart
block is to be accepted in a deduction only if a cancellation has occurred
using the distinguished active head. (Cancellation with other active heads
may occur also, of course, but do not satisfy the requirement.) This de-
mand keeps blocks relevant, and prunes away many false search branches.
The underlying proof procedure is a complete proof procedure with this
feature. We return to the importance of this later.
Another important search feature is progressive search. Although too
complex to treat fully here we outline the feature. Recall that a restart
block in InH-Prolog always begins with the initial query FALSE. (This is
not true of the Unit nH-Prolog variant, originally called Progressive nH-
Prolog [Loveland, 1987; Loveland, 1991] because it first incorporated this
important search feature. However, this feature is useful for all variants.)
If the program clauses were processed exactly as its "parent" block, then
it would duplicate the computation with no progress made, unless the new
distinguished active head can invoke a cancellation. Progressive search
starts the search of the restart block where the "parent" block finished
(roughly), so that a new search space is swept. More precisely, the initial
portion of the restart block deduction duplicates the initial portion of the
block where the current distinguished active head was deferred, the initial
portion ending at that point of deferral. (The path to this point can be
copied with no search time accrued.) In effect we insert a FAIL at this
point and let normal Prolog backtracking advance the search through the
remainder of the search space. If this search fails then the search continues
"from the top", as a regular search pattern would begin the block. The
search halts in failure when the initial path to the deferred head point
is again reached. For a more precise description see [Loveland, 1991]. An
interpreter for Unit nH-Prolog has proven the worth of this search strategy.
We present two examples of InH-Prolog deductions. The first has very
short blocks but has multiple non-Horn clauses and an indefinite answer.
The second example has less trivial blocks and yields a definite answer.
An answer is the disjunction of the different instantiations of the query
clause which begin InH-Prolog blocks. Not every block begins with a query
clause use; negative clauses also have head atom FALSE and may begin
blocks. However, negative clauses are from the program so represent known
information. Only uses of the query clause yield new information and war-
rant inclusion in the answer to a query. Nor does each use of a query clause
yield a different instantiation; identical instantiations of answer literals are
merged.
The deduction below shows a successful computation without any in-
dication of the search needed to find it, as is true of any displayed de-
duction. It is natural to wonder how much search occurs before the query
"men_req(X, Cond)" calls the correct clause in the successive restart blocks.
(Following Prolog, strings beginning with capital letters are variables.) At
the completion of the deduction we show that the cancellation requirement
kills the attempt to repeat the start block and thus forces the system to
backtrack immediately upon block failure to move to the next (and correct)
clause to match the query.
Example 3.2.1 (The manpower requirement example).
Program:
men_req(8,normal) :- cond(normal).
men_req(20,wind_storm) :- cond(wind_storm).
men_req(30,snow_removal) :- cond(snow_removal).
cond(snow_removal); cond(wind_storm) :- cond(abnormal).
cond(normal) ; cond(abnormal).

Query:
?- men_req(X,Cond).

Query clause:
FALSE :- men_req(X,Cond).

Deduction:
?- FALSE
:- men_req(X,Cond)
:- cond(normal)                      % X=8, Cond=normal
:-                  # [cond(abnormal)]
restart
?- FALSE            # cond(abnormal)
:- men_req(X1,Cond1) # cond(abnormal)
:- cond(wind_storm)  # cond(abnormal)
                                     % X1=20, Cond1=wind_storm
:- cond(abnormal)    # cond(abnormal) [cond(snow_removal)]
:-                   # cond(abnormal) [cond(snow_removal)]
                                     % cancellation with distinguished head
restart
?- FALSE            # cond(snow_removal),cond(abnormal)
                                     % The new distinguished head is placed
                                     % leftmost in the active head list
:- men_req(X2,Cond2)  # cond(snow_removal),cond(abnormal)
:- cond(snow_removal) # cond(snow_removal),cond(abnormal)
                                     % X2=30, Cond2=snow_removal
:-                    # cond(snow_removal),cond(abnormal)
                                     % cancellation with distinguished head

The query answer is the disjunction of the various instantiations of the
user query for successful blocks. Here the answer is:
men_req(8,normal) OR men_req(20,wind_storm) OR
men_req(30,snow_removal)
We now show that the cancellation requirement aids the search pro-
cess in the manner discussed before the deduction. A restart block with
standard search rules would recompute the computation of the start block.
However, if the following restart block were to follow the start block, and
not be eliminated, then an infinite loop would be entered. (This restart
block essentially repeats the start block.)

restart
?- FALSE             # cond(abnormal)
:- men_req(X1,Cond1) # cond(abnormal)
:- cond(normal)      # cond(abnormal)
:-                   # cond(abnormal) [cond(abnormal)]

Because the distinguished active head "cond(abnormal)" is not used in
cancellation this block would not be retained, normal backtracking would
occur and the block of the deduction would be obtained. The progressive
search feature discussed previously would also keep this block from being
computed directly after the start block; the cancellation requirement allows
a more standard search order to be used if desired. Also this illustrates
the cancellation requirement mechanism; this discard feature is used many
times other than repressing near-duplicate blocks.
The second example yields a definite answer. This is a commuting
problem with an alternate behavior depending on the status of the office
parking lot. If the problem were formulated to return a plan, then an indef-
inite answer would be returned [Loveland and Reed, 1991]. This example
illustrates what we believe is a common structure for disjunctive logic pro-
grams. The non-Horn clause records the expected case ("park in office lot")
and the error case ("lot is full"). Recovery possibilities are confirmed by
the completion of the computation.
Example 3.2.2 (The commute example).
Program:
car(al).
drives(X,office_bldg) :- car(X).
gets_to(X,office_bldg) :- parks(X,office_bldg).
goes_to_mtg(X) :- gets_to(X,office_bldg).
parks(X,commercial) :- car(X), lot_full(office_bldg).
finds(X,office_bldg) :- drives(X,office_bldg), parks(X,commercial).
gets_to(X,office_bldg) :- finds(X,office_bldg).
parks(X,office_bldg); lot_full(office_bldg) :- drives(X,office_bldg).

Query:
?- goes_to_mtg(X).

Query clause:
FALSE :- goes_to_mtg(X)

Deduction:
?- FALSE
:- goes_to_mtg(X)
:- gets_to(X,office_bldg)
:- parks(X,office_bldg)
:- drives(X,office_bldg)   # [lot_full(office_bldg)]
:- car(X)                  # [lot_full(office_bldg)]
:-                         # [lot_full(office_bldg)]
                                     % X=al
restart
?- FALSE                   # lot_full(office_bldg)
:- goes_to_mtg(X1)         # lot_full(office_bldg)
:- gets_to(X1,office_bldg) # lot_full(office_bldg)
:- finds(X1,office_bldg)   # lot_full(office_bldg)
:- drives(X1,office_bldg), parks(X1,commercial)
                           # lot_full(office_bldg)
:- car(X1), parks(X1,commercial) # lot_full(office_bldg)
:- parks(al,commercial)    # lot_full(office_bldg)
                                     % X1=al
:- car(al), lot_full(office_bldg) # lot_full(office_bldg)
:- lot_full(office_bldg)   # lot_full(office_bldg)
                                     % cancellation
:-                         # lot_full(office_bldg)

Query answer:
goes_to_mtg(al)

Observe that the query answer above is the merge of the disjunction of
two identical atoms, each corresponding to a block.
Unit nH-Prolog differs from InH-Prolog primarily by the use of only one
active head atom, the distinguished active head, and by a possibly different
restart block initial goal clause. The use of only one active head means
a "constant time" inner loop speed (when a no-occurs-check unification
is used, as for Prolog) rather than an inner-loop speed that varies with
the length of the active list to be checked. (Pre-processing by a compiler
usually makes the possibly longer active head list of InH-Prolog also process
in near-constant time. This is not always possible, such as when multiple
entries to the active head list share the same predicate letter.) The different
restart initial goals can allow a shorter deduction to be written (and yield a
shorter search time) but represent another branch point in the search and
may in fact lengthen search time. (The latter is much less costly than first
considerations indicate.) The search time tradeoff here is hard to quantify.
Evidence at present suggests that InH-Prolog is superior when compilation
is possible. For interpreters, Unit nH-Prolog seems to have the edge on
speed, but is somewhat more complex to code.
We now consider how to represent the underlying inference system of
InH-Prolog in a uniform proof format. One way is to rewrite non-Horn
clauses in a classically equivalent form so that only rules acceptable to the
uniform proof concept are used. We illustrate this approach by example,
writing clauses as traditional implications (not Prolog notation), using ⊃
for implies and interpreting the atom FALSE as an alternative notation
for ⊥. Given the non-Horn clause A ⊃ (B ∨ C), we may rewrite this in
the classically equivalent forms (A ∧ (B ⊃ FALSE)) ⊃ C and (A ∧ (C ⊃
FALSE)) ⊃ B, with both clauses needed to replace A ⊃ (B ∨ C). With use
of clauses of this form, InH-Prolog derivations can indeed be described as
uniform proofs.
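For instance, the non-Horn clause cond(snow_removal); cond(wind_storm)
:- cond(abnormal) of Example 3.2.1, written as the implication
cond(abnormal) ⊃ (cond(snow_removal) ∨ cond(wind_storm)), would be
replaced under this rewriting by the pair of clauses

    (cond(abnormal) ∧ (cond(snow_removal) ⊃ FALSE)) ⊃ cond(wind_storm)
    (cond(abnormal) ∧ (cond(wind_storm) ⊃ FALSE)) ⊃ cond(snow_removal)

each of which can be used by backchaining in the goal-directed fashion that
uniform proofs require.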
We choose a different approach for our major illustration, one that
does not require us to rewrite the non-Horn clauses. We obtain the effect
of case analysis and block initiation used by InH-Prolog by use of a derived
rule meeting the uniform proof conditions. We justify the derived restart
inference figure

by the following derivation:

There is a symmetric rule for P,AvB —> A and this obviously gener-
alizes for different size disjunctive clauses. The similarity in effect between
the restart rule and the rewriting of the non-Horn clauses suggested earlier
is obvious.
We give a uniform proof for the manpower requirement example. To
shorten the proof tree we will combine the CI (Clause Incorporation) and
BC (backchain) into one inference figure. Single symbol abbreviations are
used for predicate and constant names and P denotes the formulas consti-
tuting the program for the example. We again use dl,d2,d3,d4 ? f(Y, X) D
n(X) "existential" variables in the CI inference figure to signify the creation
of one instance of a clause when the proof is read from the bottom upward.

We conclude this section with another example, one where the capabil-
ity of the InH-Prolog procedure might seem to extend beyond the ability to
represent the computation by a uniform proof. Indeed, the ability to model
the computation by a uniform proof depends on the variant of InH-Prolog
used, because the role of the query clause is significant in the associated
uniform proof.
We are given the program consisting of the single clause p(a);p(b), and
ask the query
?- p(X).
The sequent p(a) ∨ p(b) ⟶ ∃X p(X) has no uniform proof, so it is in-
structive to see the uniform proof for this InH-Prolog computation. By
convention, the query is replaced by query clause FALSE :- p(X) and
query ?- FALSE. This transformation is justifiable in classical logic; in
particular, the sequent (∀X(p(X) ⊃ FALSE)) ⊃ FALSE ⟶ ∃X p(X) has a
C-proof. The InH-Prolog computation of query ?- FALSE given program
p(a) ∨ p(b), p(X) ⊃ FALSE succeeds, giving answer p(a) ∨ p(b). Using the
various conventions described earlier and denoting the set {∀X(p(X) ⊃
FALSE), p(a) ∨ p(b)} by P, the associated uniform proof is given below.

4 Conclusion
The material presented in this paper may be read in two ways. The first
way is to consider the procedures as primary, starting with the historical
introduction of procedures. This leads to the SLD-resolution procedure,
which is the inference foundation for Prolog and anchors the traditional
view of logic programming. Two extensions of the traditional notion of
logic programming also appear. Included is an abstraction of the concept
of logic programming that may be regarded as a derivative attempt to en-
code the essence of these procedures. An alternative view is to take the
abstraction as primary and as setting the standard for acceptable logic pro-
gramming languages. If this viewpoint is adopted, the abstraction provides
guidance in constructing goal-directed proof procedures for logic program-
ming — such procedures will essentially be ones that search for uniform
proofs in the relevant context. Horn clause logic and the associated SLD-
resolution procedure are illustrations of this viewpoint. However, richer
exploitations of the general framework are also possible and this issue is
discussed at length. The two procedures that conclude the paper serve as
additional novel illustrations of the notion of searching for a uniform proof.
N-Prolog is an excellent example of an extension of SLD-resolution that
fits the uniform proof structure. The second example, near-Horn Prolog,
also fits the uniform proof structure when we specify carefully the version
(InH-Prolog) and the input format (use of query clause). That InH-Prolog
is in fact a proof procedure for full (classical) first-order logic makes the
fit within a uniform proof structure particularly interesting. However, re-
call that conversion of a given valid formula into the positive implication
refutation logic format used by InH-Prolog may utilize several equivalences
valid only classically.
Although the second way of reading the paper most likely is not the
viewpoint the reader first adopts, it is a viewpoint to seriously consider.
If the reader agrees with the paradigm captured by the structure of uni-
form proof, then he/she has a framework to judge not just the procedures
presented here but future procedures that seek to extend our capabilities
within the logic programming arena.

Acknowledgements
We wish to thank Robert Kowalski for suggestions that led to an improved
paper.
Loveland's work on this paper was partially supported by NSF Grants
CCR-89-00383 and CCR-91-16203 and Nadathur's work was similarly sup-
ported by NSF Grants CCR-89-05825 and CCR-92-08465.

References
[Ait-Kaci, 1990] H. Aït-Kaci. Warren's Abstract Machine: a Tutorial Re-
construction. MIT Press, Cambridge, MA, 1990.
[Andrews, 1976] P. Andrews. Refutations by Matings. IEEE Transactions
on Computers, C-25:801-807, 1976.
[Astrachan and Loveland, 1991] O. L. Astrachan and D. W. Loveland.
METEORs: High performance theorem provers for Model Elimination.
In R.S. Boyer, editor, Automated Reasoning: Essays in Honor of Woody
Bledsoe. Kluwer Academic Publishers, Dordrecht, 1991.
[Astrachan and Stickel, 1992] O. L. Astrachan and M. E. Stickel. Caching
and lemmaizing in Model Elimination theorem provers. In W. Bibel
and R. Kowalski, editors, Proceedings of the Eleventh Conference on
Automated Deduction, Lecture Notes in Artificial Intelligence 607, pages
224-238. Springer-Verlag, Berlin, June 1992.
[Bibel, 1982] W. Bibel. Automated Theorem Proving. Vieweg Verlag,
Braunschweig, 1982.
[Bose et al., 1991] S. Bose, E. M. Clarke, D. E. Long and S. Michaylov.
PARTHENON: a parallel theorem prover for non-Horn clauses. Journal
of Automated Reasoning, pages 153-181, 1991.
[Bowen, 1982] K. A. Bowen. Programming with full first-order logic. In
J. E. Hayes, D. Michie and Y.-H. Pao, editors, Machine Intelligence 10,
pages 421-440. Halsted Press, 1982.
[Chang and Lee, 1973] C. Chang and R. C. Lee. Symbolic Logic and Me-
chanical Theorem Proving. Academic Press, New York, 1973.
[Fitting, 1969] M. Fitting. Intuitionistic Logic, Model Theory and Forcing.
North-Holland Publishing Company, 1969.
[Fitting, 1985] M. Fitting. A Kripke-Kleene semantics for logic program-
ming. Journal of Logic Programming, 2(4):295-312, 1985.
[Fitting, 1990] M. Fitting. First-order Logic and Automated Theorem
Proving. Springer-Verlag, 1990.
[Gabbay, 1985] D. M. Gabbay. N-Prolog: an extension of Prolog with
hypothetical implication, Part 2. Journal of Logic Programming, 4:251-
283, 1985.
[Gabbay and Reyle, 1984a] D. Gabbay and U. Reyle. N-Prolog: An ex-
tension to Prolog with hypothetical implications I. Journal of Logic
Programming, l(4):319-355, 1984.
[Gabbay and Reyle, 1984b] D. M. Gabbay and U. Reyle. N-Prolog: an
extension of Prolog with hypothetical implication, Part 1. Journal of
Logic Programming, 4:319-355, 1984.
[Gallier and Raatz, 1987] J. Gallier and S. Raatz. Hornlog: A graph-based
interpreter for general Horn clauses. Journal of Logic Programming,
4(2):119-156, 1987.
[Gentzen, 1969] G. Gentzen. Investigations into logical deduction. In M. E.
Szabo, editor, The Collected Papers of Gerhard Gentzen, pages 68-131.
North Holland, Amsterdam, 1969.
[Green, 1969] C. Green. Theorem-proving by resolution as a basis for
questions-answering systems. In B. Meltzer and D. Michie, editors, Ma-
chine Intelligence 4, pages 183-205. Edinburgh University Press, Edin-
burgh, 1969.
[Green and Raphael, 1968] C. Green and B. Raphael. The use of theorem-
proving techniques in systems. In Proceedings of the 23rd National
Conference of the Association of Computing Machinery, pages 169-181,
Princeton, 1968. Brandon Systems Press.
[Harland and Pym, 1991] J. Harland and D. Pym. The uniform proof-
theoretic foundation of linear logic programming (extended abstract). In
V. Saraswat and K. Ueda, editors, Proceedings of the 1991 International
Logic Programming Symposium, pages 304-318. MIT Press, 1991.
[Harrop, 1960] R. Harrop. Concerning formulas of the types A → B ∨
C, A → (∃x)B(x) in intuitionistic formal systems. Journal of Symbolic
Logic, 25:27-32, 1960.
[Hill, 1974] R. Hill. LUSH-resolution and its completeness. DCL Memo No.
78, School of Artificial Intelligence, University of Edinburgh, Edinburgh,
Scotland, August 1974.
[Hodas and Miller, 1994] J. Hodas and D. Miller. Logic programming in
a fragment of intuitionistic linear logic. Information and Computation,
110(2):327-365, May 1994.
[Kowalski, 1970] R. Kowalski. Search strategies for theorem proving. In B.
Meltzer and D. Michie, editors, Machine Intelligence 6, pages 181-201.
Edinburgh University Press, Edinburgh, 1970.
[Kowalski, 1979] R. Kowalski. Logic for Problem Solving. North-Holland,
1979.
[Kowalski and Kuehner, 1971] R. Kowalski and D. Kuehner. Linear res-
olution with selection function. Artificial Intelligence, pages 227-260,
1971.
[Letz et al., 1991] R. Letz, J. Schumann, S. Bayerl, and W. Bibel.
SETHEO: a high-performance theorem prover. Journal of Automated
Reasoning, pages 183-212, 1991.
[Lloyd and Topor, 1984] J. Lloyd and R. Topor. Making Prolog more ex-
pressive. Journal of Logic Programming, 1(3):225-240, 1984.
[Lobo et al., 1992] J. Lobo, J. Minker, and A. Rajasekar. Foundations of
Disjunctive Logic Programming. MIT Press, Cambridge, MA, 1992.
[Loveland, 1968] D. W. Loveland. Mechanical theorem proving by model
elimination. Journal of the Association for Computing Machinery, pages
236-251, 1968.
[Loveland, 1969] D. W. Loveland. A simplified format for the model elim-
ination procedure. Journal of the Association for Computing Machinery,
pages 349-363, 1969.
[Loveland, 1970] D. W. Loveland. A linear format for resolution. In Sympo-
sium on Automatic Demonstration, Lecture Notes in Mathematics 125,
pages 147-162. Springer-Verlag, Berlin, 1970.
[Loveland, 1978] D. W. Loveland. Automated Theorem Proving: a Logical
Basis. North-Holland, Amsterdam, 1978.
[Loveland, 1987] D. W. Loveland. Near-Horn Prolog. In Proceedings of the
Fourth International Conference and Symposium on Logic Programming,
pages 456-469. MIT Press, Cambridge, MA, 1987.
[Loveland, 1991] D. W. Loveland. Near-Horn Prolog and beyond. Journal
of Automated Reasoning, pages 1-26, 1991.
[Loveland and Reed, 1991] D. W. Loveland and D. W. Reed. A near-Horn
Prolog for compilation. In J. Lassez and G. Plotkin, editors, Computa-
tional Logic: Essays in Honor of Alan Robinson, pages 542-564. MIT
Press, Cambridge, MA, 1991.


[Luckham, 1970] D. Luckham. Refinement theorems in resolution theory.
In Symposium on Automatic Demonstration, Lecture Notes in Mathe-
matics 125, pages 163-190. Springer-Verlag, Berlin, 1970.
[McCarty, 1988a] L. T. McCarty. Clausal intuitionistic logic I. Fixed point
semantics. Journal of Logic Programming, 5(1):1-31, 1988.
[McCarty, 1988b] L. T. McCarty. Clausal intuitionistic logic II. Tableau
proof procedures. Journal of Logic Programming, 5(2):93-132, 1988.
[Miller, 1989a] D. Miller. Lexical scoping as universal quantification. In
G. Levi and M. Martelli, editors, Sixth International Logic Programming
Conference, pages 268-283, Lisbon, Portugal, June 1989. MIT Press,
Cambridge, MA.
[Miller, 1989b] D. Miller. A logical analysis of modules in logic program-
ming. Journal of Logic Programming, 6:79-108, 1989.
[Miller, 1992] D. Miller. Unification under a mixed prefix. Journal of Sym-
bolic Computation, 14:321-358, 1992.
[Miller et al, 1991] D. Miller, G. Nadathur, F. Pfenning and A. Scedrov.
Uniform proofs as a foundation for logic programming. Annals of Pure
and Applied Logic, 51:125-157, 1991.
[Nadathur, 1993] G. Nadathur. A proof procedure for the logic of heredi-
tary Harrop formulas. Journal of Automated Reasoning, 11(1):115-145,
August 1993.
[Nadathur and Miller, 1988] G. Nadathur and D. Miller. An overview of
λProlog. In K. A. Bowen and R. A. Kowalski, editors, Fifth Interna-
tional Logic Programming Conference, pages 810-827, Seattle, Washing-
ton, August 1988. MIT Press.
[Nadathur et al., 1993] G. Nadathur, B. Jayaraman and K. Kwon. Scoping
constructs in logic programming: Implementation problems and their so-
lution. Technical Report CS-1993-17, Department of Computer Science,
Duke University, July 1993.
[Paulson, 1987] L. R. Paulson. The representation of logics in higher-order
logic. Technical Report Number 113, University of Cambridge, Computer
Laboratory, August 1987.
[Pfenning, 1989] F. Pfenning. Elf: A language for logic definition and veri-
fied metaprogramming. In Proceedings of the Fourth Annual Symposium
on Logic in Computer Science, pages 313-322. IEEE Computer Society
Press, June 1989.
[Pohl, 1971] I. Pohl. Bi-directional search. In B. Meltzer and D. Michie,
editors, Machine Intelligence 6, pages 127-140. Edinburgh University
Press, Edinburgh, 1971.
[Prawitz, 1965] D. Prawitz. Natural Deduction: A Proof-Theoretical Study.
Almqvist & Wiksell, 1965.
[Reed, 1988] D. W. Reed. Near-Horn Prolog: A first-order extension to
Prolog. Master's thesis, Duke University, May 1988.
[Reed, 1991] D. W. Reed. A Case-analysis Approach to Disjunctive Logic
Programming. PhD thesis, Duke University, December 1991.
[Reed and Loveland, 1992] D. W. Reed and D. W. Loveland. A comparison
of three Prolog extensions. Journal of Logic Programming, 12:25-50,
1992.
[Robinson, 1965] J. A. Robinson. A machine-oriented logic based on the
resolution principle. Journal of the Association for Computing Machinery,
23-41, 1965.
[Schumann and Letz, 1990] J. Schumann and R. Letz. PARTHEO: a high
performance parallel theorem prover. In M. Stickel, editor, Proceedings
of the Tenth International Conference on Automated Deduction, Lec-
ture Notes in Artificial Intelligence 449, pages 40-56, Berlin, June 1990.
Springer-Verlag.
[Smith and Loveland, 1988] B. T. Smith and D. W. Loveland. A simple
near-Horn Prolog interpreter. In Proceedings of the Fifth International
Conference and Symposium on Logic Programming, Seattle, 1988.
[Stickel, 1984] M. E. Stickel. A Prolog technology theorem prover. New
Generation Computing, 371-383, 1984.
[Stickel, 1988] M. E. Stickel. A Prolog Technology Theorem Prover: im-
plementation by an extended Prolog compiler. Journal of Automated
Reasoning, 353-380, 1988.
[Troelstra, 1973] A. Troelstra. Metamathematical Investigation of Intu-
itionistic Arithmetic and Analysis. Number 344 in Lecture Notes in
Mathematics. Springer-Verlag, 1973.
[Warren, 1983] D. H. D. Warren. An abstract Prolog instruction set. Tech-
nical Note 309, SRI International, October 1983.
[Wos et al., 1964] L. Wos, D. Carson and G. Robinson. The unit preference
strategy in theorem proving. In AFIPS Conference Proceedings 26, pages
615-621, Washington, DC, 1964. Spartan Books.
[Wos et al., 1965] L. Wos, G. Robinson, and D. Carson. Efficiency and
completeness of the set of support strategy in theorem proving. Journal
of the Association for Computing Machinery, 536-541, 1965.
[Wos et al., 1991] L. Wos, R. Overbeek, and E. Lusk. Subsumption, a
sometimes undervalued procedure. In J. Lassez and G. Plotkin, editors,
Computational Logic: Essays in Honor of Alan Robinson, pages 3-40.
MIT Press, Cambridge, MA, 1991.
The Role of Abduction in Logic
Programming
A. C. Kakas, R. A. Kowalski and F. Toni

Contents
1 Introduction 236
1.1 Abduction in logic 237
1.2 Integrity constraints 241
1.3 Applications 243
2 Knowledge assimilation 244
3 Default reasoning viewed as abduction 249
4 Negation as failure as abduction 254
4.1 Logic programs as abductive frameworks 255
4.2 An abductive proof procedure for LP 257
4.3 An argumentation-theoretic interpretation 263
4.4 An argumentation-theoretic interpretation of the abduc-
tive proof procedure 267
5 Abductive logic programming 269
5.1 Generalised stable model semantics 270
5.2 An abductive proof procedure for ALP 273
5.3 An argumentation-theoretic interpretation of the abduc-
tive proof procedure for ALP 277
5.4 Computation of abduction through TMS 279
5.5 Simulation of abduction 279
5.6 Abduction through deduction from the completion 285
5.7 Abduction and constraint logic programming . . . 286
6 Extended logic programming 288
6.1 Answer set semantics 289
6.2 Restoring consistency of answer sets 290
6.3 Rules and exceptions in LP 293
6.4 (Extended) Logic Programming without Negation as Fail-
ure 295
6.5 An argumentation-theoretic approach to ELP . . . 297
6.6 A methodology for default reasoning with explicit negation 299
6.7 ELP with abduction 300
7 An abstract argumentation-based framework for default reason-
ing 300
8 Abduction and truth maintenance 303
8.1 Justification-based truth maintenance 304
8.2 Assumption-based truth maintenance 305
9 Conclusions and future work 307

1 Introduction
This paper extends and updates our earlier survey and analysis of work on
the extension of logic programming to perform abductive reasoning [Kakas
et al., 1993]. The purpose of the paper is to provide a critical overview of
some of the main research results, in order to develop a common frame-
work for evaluating these results, to identify the main unresolved prob-
lems, and to indicate directions for future work. The emphasis is not on
technical details but on relationships and common features of different ap-
proaches. Some of the main issues we will consider are the contributions
that abduction can make to the problems of reasoning with incomplete
or negative information, the evolution of knowledge, and the semantics of
logic programming and its extensions. We also discuss recent work on the
argumentation-theoretic interpretation of abduction, which was introduced
in the earlier version of this paper.

The philosopher Peirce first introduced the notion of abduction. In [Peirce,
1931-58] he identified three distinguished forms of reasoning.
Deduction, an analytic process based on the application of general rules
to particular cases, with the inference of a result.
Induction, synthetic reasoning which infers the rule from the case and
the result.
Abduction, another form of synthetic inference, but of the case from a
rule and a result.
Peirce further characterised abduction as the "probational adoption of a
hypothesis" as explanation for observed facts (results), according to known
laws. "It is however a weak kind of inference, because we cannot say that we
believe in the truth of the explanation, but only that it may be true" [Peirce,
1931-58].

Abduction is widely used in common-sense reasoning, for instance in di-
agnosis, to reason from effect to cause [Charniak and McDermott, 1985;
Pople, 1973]. We consider here an example drawn from [Pearl, 1987].


Example 1.0.1.
Consider the following theory T:

grass-is-wet ← rained-last-night
grass-is-wet ← sprinkler-was-on
shoes-are-wet ← grass-is-wet.

If we observe that our shoes are wet, and we want to know why this is so,
{rained-last-night} is a possible explanation, i.e. a set of hypotheses that
together with the explicit knowledge in T implies the given observation.
{sprinkler-was-on} is another alternative explanation.
Abduction consists in computing such explanations for observations.
It is a form of non-monotonic reasoning, because explanations which are
consistent with one state of a knowledge base may become inconsistent with
new information. In the example above the explanation rained-last-night
may turn out to be false, and the alternative explanation sprinkler-was-on
may be the true cause for the given observation. The existence of multiple
explanations is a general characteristic of abductive reasoning, and the
selection of "preferred" explanations is an important problem.

1.1 Abduction in logic


Given a set of sentences T (a theory presentation), and a sentence G (obser-
vation), to a first approximation, the abductive task can be characterised
as the problem of finding a set of sentences Δ (abductive explanation for
G) such that:
(1) T ∪ Δ ⊨ G,
(2) T ∪ Δ is consistent.
This characterisation of abduction is independent of the language in which
T, G and Δ are formulated. The logical implication sign ⊨ in (1) can
alternatively be replaced by a deduction operator ⊢. The consistency re-
quirement in (2) is not explicit in Peirce's more informal characterisation
of abduction, but it is a natural further requirement.
In fact, these two conditions (1) and (2) alone are too weak to capture
Peirce's notion. In particular, additional restrictions on Δ are needed to
distinguish abductive explanations from inductive generalisations [Console
and Saitta, 1992]. Moreover, we also need to restrict Δ so that it conveys
some reason why the observations hold, e.g. we do not want to explain one
effect in terms of another effect, but only in terms of some cause. For both
of these reasons, explanations are often restricted to belong to a special
pre-specified, domain-specific class of sentences called abducible. In this
paper we will assume that the class of abducibles is always given.
Additional criteria have also been proposed to restrict the number of
candidate explanations:
• Once we restrict the hypotheses to belong to a specified set of sen-
tences, we can further restrict, without loss of generality, the hy-
potheses to atoms (that "name" these sentences) which are predicates
explicitly indicated as abducible, as shown by Poole [Poole, 1988].
• In Section 1.2 we will discuss the use of integrity constraints to reduce
the number of possible explanations.
• Additional information can help to discriminate between different ex-
planations, by rendering some of them more appropriate or plausible
than others. For example Sattar and Goebel [Sattar and Goebel,
1989] use "crucial literals" to discriminate between two mutually in-
compatible explanations. When the crucial literals are tested, one of
the explanations is rejected. More generally Evans and Kakas [Evans
and Kakas, 1992] use the notion of corroboration to select explana-
tions. An explanation fails to be corroborated if some of its logical
consequences are not observed. A related technique is presented by
Sergot in [Sergot, 1983], where information is obtained from the user
during the process of query evaluation.
• Moreover various (domain specific) criteria of preference can be spec-
ified. They impose a (partial) order on the sets of hypotheses which
leads to the discrimination of explanations [Brewka, 1989; Charniak
and McDermott, 1985; Gabbay, 1991; Hobbs et al., 1990; Poole, 1985;
Poole, 1992; Stickel, 1989].
Cox and Pietrzykowski [Cox and Pietrzykowski, 1992] identify other
desirable properties of abductive explanations. For instance, an explana-
tion should be basic, i.e. should not be explainable in terms of other
explanations. For instance, in Example 1.0.1 the explanation

{grass-is-wet}

for the observation


shoes-are-wet
is not basic, whereas the alternative explanations

{rained-last-night}
{sprinkler-was-on}

are.

An explanation should also be minimal, i.e. not subsumed by another
one. For example, in Example 1.0.1 the explanation
{rained-last-night, sprinkler-was-on}

for the observation


shoes-are-wet
is not minimal, while the explanations

{ rained-last-night}
{sprinkler-was-on}

are.
So far we have presented a semantic characterisation of abduction and
discussed some heuristics to deal with the multiple explanation problem,
but we have not described any proof procedures for computing abduction.
Various authors have suggested the use of top-down, goal-oriented compu-
tation, based on the use of deduction to drive the generation of abductive
hypotheses. Cox and Pietrzykowski [Cox and Pietrzykowski, 1992] con-
struct hypotheses from the "dead ends" of linear resolution proofs. Finger
and Genesereth [Finger and Genesereth, 1985] generate "deductive solu-
tions to design problems" using the "residue" left behind in resolution
proofs. Poole, Goebel and Aleliunas [Poole et al, 1987] also use linear
resolution to generate hypotheses.
In contrast, the ATMS [de Kleer, 1986] computes abductive explana-
tions bottom-up. The ATMS can be regarded as a form of hyper-resolution,
augmented with subsumption, for propositional logic programs [Reiter and
De Kleer, 1987]. Lamma and Mello [Lamma and Mello, 1992] have devel-
oped an extension of the ATMS for the non-propositional case. Resolution-
based techniques for computing abduction have also been developed by De-
molombe and Farinas del Cerro [Demolombe and Farinas del Cerro, 1991]
and Gaifman and Shapiro [Gaifman and Shapiro, 1989].
Abduction can also be applied to logic programming (LP). A (general)
logic program is a set of Horn clauses extended by negation as failure
[Clark, 1978], i.e. clauses of the form:

    A ← L1, ..., Ln

where each Li is either an atom Ai or its negation ~Ai,1 A is an atom and
each variable occurring in the clause is implicitly universally quantified. A
is called the head and L1,..., Ln is called the body of the clause. A logic
program where each literal Li in the body of every clause is atomic is said
to be definite.
Abduction can be computed in LP by extending SLD and SLDNF [Chen
and Warren, 1989; Eshghi and Kowalski, 1988; Eshghi and Kowalski, 1989;
1 In the sequel we will represent negation as failure as ~.

Kakas and Mancarella, 1990a; Kakas and Mancarella, 1990d; Denecker and
De Schreye, 1992b; Teusink, 1993]. Instead of failing in a proof when a
selected subgoal fails to unify with the head of any rule, the subgoal can
be viewed as a hypothesis. This is similar to viewing abducibles as "ask-
able" conditions which are treated as qualifications to answers to queries
[Sergot, 1983]. In the same way that it is useful to distinguish a subset of
all predicates as "askable", it is useful to distinguish certain predicates as
abducible. In fact, it is generally convenient to choose, as abducible predi-
cates, ones which are not conclusions of any clause. As we shall remark at
the beginning of Section 5, this restriction can be imposed without loss of
generality, and has the added advantage of ensuring that all explanations
will be basic.
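As a rough illustration of this idea (our own sketch, not the procedure
of any of the systems cited above), abduction over definite clauses can be
obtained by extending a vanilla Prolog meta-interpreter so that an abducible
subgoal, instead of failing, is recorded as a hypothesis. The rule/2 and
abducible/1 representation below is assumed purely for the illustration;
integrity checking and negation as failure are omitted.

    % solve(Goals, Delta0, Delta): Goals is a list of atoms to be proved,
    % Delta0 the hypotheses collected so far, Delta the final explanation.
    solve(Goals, Delta) :- solve(Goals, [], Delta).

    solve([], Delta, Delta).
    solve([A|Rest], Delta0, Delta) :-        % abducible subgoal:
        abducible(A),                        % assume it rather than fail
        ( member(A, Delta0) -> Delta1 = Delta0 ; Delta1 = [A|Delta0] ),
        solve(Rest, Delta1, Delta).
    solve([A|Rest], Delta0, Delta) :-        % ordinary subgoal:
        rule(A, Body),                       % backchain as in SLD-resolution
        append(Body, Rest, Goals),
        solve(Goals, Delta0, Delta).

    % Example 1.0.1 in this (assumed) representation:
    rule(shoes_are_wet, [grass_is_wet]).
    rule(grass_is_wet,  [rained_last_night]).
    rule(grass_is_wet,  [sprinkler_was_on]).
    abducible(rained_last_night).
    abducible(sprinkler_was_on).

    % ?- solve([shoes_are_wet], Delta).
    % Delta = [rained_last_night] ;
    % Delta = [sprinkler_was_on]

The query returns the two alternative explanations of Example 1.0.1, one
on backtracking.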
Abductive explanations computed in LP are guaranteed to be minimal,
unless the program itself encodes non-minimal explanations. For example,
in the propositional logic program

    p ← q
    p ← q, r

both the minimal explanation {q} and the non-minimal explanation {q, r}
are computed for the observation p.
The abductive task for the logic-based approach has been proved to
be highly intractable: it is NP-hard even if T is a set of acyclic [Apt and
Bezem, 1990] propositional definite clauses [Selman and Levesque, 1990;
Eiter and Gottlob, 1993], and is even harder if T is a set of any propositional
clauses [Eiter and Gottlob, 1993]. These complexity results hold even if
explanations are not required to be minimal. However, the abductive task
is tractable for certain more restricted classes of logic programs (see for
example [Eshghi, 1993]).
There are other formalisations of abduction. We mention them for
completeness, but in the sequel we will concentrate on the logic-based view
previously described.
• Allemand, Tanner, Bylander and Josephson [Allemand et al., 1991]
and Reggia [Reggia, 1983] present a mathematical characterisation,
where abduction is defined over sets of observations and hypotheses,
in terms of coverings and parsimony.
• Levesque [Levesque, 1989] gives an account of abduction at the "knowl-
edge level". He characterises abduction in terms of a (modal) logic
of beliefs, and shows how the logic-based approach to abduction can
be understood in terms of a particular kind of belief.
In the previous discussion we have briefly described both semantics and
proof procedures for abduction. The relationship between semantics and
proof procedures can be understood as a special case of the relationship
between program specifications and programs. A program specification
characterises what is the intended result expected from the execution of the
program. In the same way semantics can be viewed as an abstract, possibly
non-constructive definition of what is to be computed by the proof proce-
dure. From this point of view, semantics is not so much concerned with
explicating meaning in terms of truth and falsity, as it is with providing
an abstract specification which "declaratively" expresses what we want to
compute. This specification view of semantics is effectively the one adopted
in most recent work on the semantics of LP, which restricts interpretations
to Herbrand interpretations. The restriction to Herbrand interpretations
means that interpretations are purely syntactic objects, which have no
bearing on the correspondence between language and "reality". A purely
syntactic view of semantics, based upon the notion of knowledge assimila-
tion described in Section 2 below, is developed in [Kowalski, 1994].

One important alternative way to specify the semantics of a language,
which will be used in the sequel, is through the translation of sentences
expressed in one language into sentences of another language whose se-
mantics is already well understood. For example if we have a sentence in
a typed logic language of the form "there exists an object of type t such
that the property p holds" we can translate this into a sentence of the form
∃x (p(x) ∧ t(x)), where t is a new predicate to represent the type t, whose
semantics is then given by the familiar semantics of first-order logic. Simi-
larly the typed logic sentence "for all objects of type t the property p holds"
becomes the sentence ∀x (p(x) ← t(x)). Hence instead of developing a new
semantics for the typed logic language, we apply the translation and use
the existing semantics of first-order logic.

1.2 Integrity constraints


Abduction as presented so far can be restricted by the use of integrity
constraints. Integrity constraints are useful to avoid unintended updates
to a database or knowledge base. They can also be used to represent desired
properties of a program [Lever, 1991].
The concept of integrity constraints first arose in the field of databases
and to a lesser extent in the field of AI knowledge representation. The basic
idea is that only certain knowledge base states are considered acceptable,
and an integrity constraint is meant to enforce these legal states. When
abduction is used to perform updates (see Section 2), we can use integrity
constraints to reject abductive explanations.
Given a set of integrity constraints, I, of first-order closed formulae,
the second condition (2) of the semantic definition of abduction (see Sec-
tion 1.1) can be replaced by:
(2′) T ∪ Δ satisfies I.
As previously mentioned, we also restrict Δ to consist of atoms drawn
from predicates explicitly indicated as abducible. Until the discussion in
Section 5.7, we further restrict Δ to consist of variable-free atomic sen-
tences.
In the sequel an abductive framework will be given as a triple (T, A, I),
where T is a theory, A is the set of abducible predicates, i.e. Δ ⊆ A,2 and
I is a set of integrity constraints.
There are several ways to define what it means for a knowledge base KB
(T ∪ Δ in our case) to satisfy an integrity constraint φ (in our framework
φ ∈ I). The consistency view requires that:

    KB satisfies φ iff KB ∪ φ is consistent.

Alternatively the theoremhood view requires that:

    KB satisfies φ iff KB ⊨ φ.

These definitions have been proposed in the case where the theory is a logic
program P by Kowalski and Sadri [Sadri and Kowalski, 1987] and Lloyd
and Topor [Lloyd and Topor, 1985] respectively, where KB is the Clark
completion [Clark, 1978] of P.
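For example (an illustration of ours, with KB taken simply as a first-order
theory rather than as the completion of a program), let KB consist of the
single sentence p ← q and let φ be the constraint p. Then KB satisfies φ
on the consistency view, since KB ∪ {p} is consistent, but not on the
theoremhood view, since KB ⊭ p.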
Another view of integrity constraints [Kakas, 1991; Kakas and Man-
carella, 1990; Kowalski, 1990; Reiter, 1988; Reiter, 1990] regards these as
epistemic or metalevel statements about the content of the database. In
this case the integrity constraints are understood as statements at a differ-
ent level from those in the knowledge base. They specify what must be true
about the knowledge base rather than what is true about the world mod-
elled by the knowledge base. When later we consider abduction in LP (see
Sections 4,5), integrity satisfaction will be understood in a sense which is
stronger than consistency, weaker than theoremhood, and arguably similar
to the epistemic or metalevel view.
For each such semantics, we have a specification of the integrity checking
problem. Although the different views of integrity satisfaction are concep-
tually very different, the integrity checking procedures based upon these
views are not very different in practice (e.g. [Decker, 1986; Sadri and
Kowalski, 1987; Lloyd and Topor, 1985]). They are mainly concerned
with avoiding the inefficiency which arises if all the integrity constraints
are retested after each update. A common idea of all these procedures is
to render integrity checking more efficient by exploiting the assumption
that the database before the update satisfies the integrity constraints, and
therefore if integrity constraints are violated after the update, this viola-
tion should depend upon the update itself: In [Sadri and Kowalski, 1987]
2 Here and in the rest of this paper we will use the same symbol A to indicate both
the set of abducible predicates and the set of all their variable-free instances.

this assumption is exploited by reasoning forward from the updates. This
idea is exploited for the purpose of checking the satisfaction of abductive
hypotheses in [Eshghi and Kowalski, 1989; Kakas and Mancarella, 1990c;
Kakas and Mancarella, 1990d]. Although this procedure was originally for-
mulated for the consistency view of constraint satisfaction, it has proved
equally appropriate for the semantics of integrity constraints in abductive
logic programming.
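Continuing the sketch given in Section 1.1 (again an illustration of our
own, and much cruder than the forward-reasoning procedures just cited),
denial integrity constraints over ground abducibles can be checked in a
generate-and-test fashion: a constraint of the form ¬[a1 ∧ ... ∧ an] is
represented by an assumed fact ic([a1,...,an]), and an explanation is
rejected whenever some denied conjunction becomes provable from the
program together with that explanation.

    % Reject an explanation Delta that violates some denial constraint.
    solve_with_ic(Goals, Delta) :-
        solve(Goals, Delta),
        \+ violated(Delta).

    violated(Delta) :-
        ic(Body),
        solve(Body, Delta, Delta).   % denied conjunction provable using only
                                     % the hypotheses already in Delta

    % An assumed constraint for Example 1.0.1; here both single-hypothesis
    % explanations still pass, since neither contains both abducibles.
    ic([rained_last_night, sprinkler_was_on]).

The call solve(Body, Delta, Delta) succeeds only if the denied conjunction
is provable without introducing any new hypotheses, so no further abduction
is performed during the check.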
1.3 Applications
In this section we briefly describe some of the applications of abduction
in AI. In general, abduction is appropriate for reasoning with incomplete
information. The generation of abducibles to solve a top-level goal can be
viewed as the addition of new information to make incomplete information
more complete.
Abduction can be used to generate causal explanations for fault diag-
nosis (see for example [Console et al, 1989; Preist and Eshghi, 1992]). In
medical diagnosis, for example, the candidate hypotheses are the possible
causes (diseases), and the observations are the symptoms to be explained
[Poole, 1988a; Reggia, 1983]. Abduction can also be used for model-based
diagnosis [Eshghi, 1990; Reiter, 1987]. In this case the theory describes the
"normal" behaviour of the system, and the task is to find a set of hypothe-
ses of the form "some component A is not normal" that explains why the
behaviour of the system is not normal.
Abduction can be used to perform high level vision [Cox and Pietrzy-
kowski, 1992]. The hypotheses are the objects to be recognised, and the
observations are partial descriptions of objects.
Abduction can be used in natural language understanding to in-
terpret ambiguous sentences [Charniak and McDermott, 1985; Gabbay and
Kempson, 1991; Hobbs, 1990; Stickel, 1988]. The abductive explanations
correspond to the various possible interpretations of such sentences.
In planning problems, plans can be viewed as explanations of the given
goal state to be reached [Eshghi, 1988; Shanahan, 1989].
These applications of abduction can all be understood as generating
hypotheses which are causes for observations which are effects. An ap-
plication that does not necessarily have a direct causal interpretation is
knowledge assimilation [Kakas and Mancarella, 1990d; Kowalski, 1979;
Kunifuji et al., 1986; Miyaki et al., 1984], described in greater detail below.
The assimilation of a new datum can be performed by adding to the theory
new hypotheses that are explanations for the datum. Knowledge assimila-
tion can also be viewed as the general context within which abduction takes
place. Database view updates [Bry, 1990; Kakas and Mancarella, 1990a;
Console et al, 1994] are an important special case of knowledge assimila-
tion. Update requests are interpreted as observations to be explained. The
explanations of the observations are transactions that satisfy the update
request.
Another important application which can be understood in terms of a
"non-causal" use of abduction is default reasoning. Default reasoning
concerns the use of general rules to derive information in the absence of
contradictions. In the application of abduction to default reasoning, conclu-
sions are viewed as observations to be explained by means of assumptions
which hold by default unless a contradiction can be shown [Eshghi and
Kowalski, 1988; Poole, 1988]. As Poole [Poole, 1988] argues, the use of ab-
duction avoids the need to develop a non-classical, non-monotonic logic for
default reasoning. In Section 3 we will further discuss the use of abduction
for default reasoning in greater detail. Because negation as failure in LP is
a form of default reasoning, its interpretation by means of abduction will
be discussed in section 4.
Some authors (e.g. Pearl [Pearl, 1988]) advocate the use of probabil-
ity theory as an alternative approach to common sense reasoning in gen-
eral, and to many of the applications listed above in particular. However,
Poole [Poole, 1993] shows how abduction can be used to simulate (dis-
crete) Bayesian networks in probability theory. He proposes the language
of probabilistic Horn abduction: in this language an abductive framework
is a triple (T, A, I), where T is a set of Horn clauses, A is a set of abducibles
without definitions in T (without loss of generality, see Section 5), and /
is a set of integrity constraints in the form of denials of abducibles only.
In addition, for each integrity constraint, a probability value is assigned to
each abducible, so that the sum of all the values of all the abducibles in
each integrity constraint is 1. If the abductive framework satisfies certain
assumptions, e.g. T is acyclic [Apt and Bezem, 1990], the bodies of all the
clauses defining each non-abducible atom are mutually exclusive and these
clauses are "covering", and abducibles in A are "probabilistically indepen-
dent", then such a probabilistic Horn abduction theory can be mapped
onto a (discrete) Bayesian network and vice versa.

2 Knowledge assimilation
Abduction takes place in the context of assimilating new knowledge (infor-
mation, belief or data) into a theory (or knowledge base). There are four
possible deductive relationships between the current knowledge base (KB),
the new information, and the new KB which arises as a result [Kowalski,
1979; Kowalski, 1994].
1. The new information is already deducible from the current KB. The
new KB, as a result, is identical with the current one.
2. The current KB = KB1 ∪ KB2 can be decomposed into two parts.
One part KB1 together with the new information can be used to
deduce the other part KB2. The new KB is KB1 together with the
new information.
3. The new information violates the integrity of the current KB. In-
tegrity can be restored by modifying or rejecting one or more of the
assumptions which lead to the contradiction.

4. The new information is independent from the current KB. The new
KB is obtained by adding the new information to the current KB.

In case (4) the KB can, alternatively, be augmented by an explanation for
the new datum [Kakas and Mancarella, 1990d; Kowalski, 1979; Kunifuji et
al., 1986]. In [Kunifuji et al., 1986] the authors have developed a system
for knowledge assimilation (KA) based on this use of abduction. They
have identified the basic issues associated with such a system and proposed
solutions for some of these.
Various motivations can be given for the addition of an abductive ex-
planation instead of the new datum in case (4) of the process of KA. For
example, in natural language understanding or in diagnosis, the assimila-
tion of information naturally demands an explanation. In other cases the
addition of an explanation as a way of assimilating new data is forced by the
particular way in which the knowledge is represented in the theory. This
is the case, for instance, for the formulation of temporal reasoning in the
Event Calculus [Kowalski and Sergot, 1986; Kowalski, 1992], as illustrated
by the following example.
Example 2.0.1. The simplified version of the event calculus we consider
contains an axiom that expresses the persistence of a property P from the
time T1 that it is initiated by an event E to a later time T2:

    holds_at(P, T2) ← happens(E, T1),
                      T1 < T2,
                      initiates(E, P),
                      persists(T1, P, T2).

New information that a property holds at a particular time point can
be assimilated by adding an explanation in terms of the happening of some
event that initiates this property at an earlier point of time together with
an appropriate assumption that the property persists from one time to
the other [Eshghi, 1988; Kakas and Mancarella, 1989; Shanahan, 1989;
Van Belleghem et al., 1994]. This has the additional effect that the new
KB will imply that the property holds until it is terminated in the future
by the happening of some event [Shanahan, 1989]. The fact that a property
P cannot persist from a time T1 to a later time T2 if an event E happens
at a time T between T1 and T2 such that E terminates P is expressed by
the following integrity constraint:

¬[persists(T1, P, T2) ∧ happens(E, T) ∧ terminates(E, P) ∧ T1 < T < T2].

Assimilating new information by adding explanations that satisfy the in-
tegrity constraints has the further effect of resolving conflicts between
the current KB and the new information [Kakas and Mancarella, 1989;
Shanahan, 1989]. For example, suppose that KB contains the facts 3

happens(takes_book(mary), t0)
initiates(takes_book(X), has-book(X))
terminates(gives_book(X, Y), has_book(X))
initiates(gives_book(X, Y), has_book(Y))

Then, given t0 < t1 < t2, the persistence axiom predicts
holds_at(has_book(mary), t1) by assuming persists(t0, has_book(mary), t1),
and holds_at(has_book(mary), t2) by assuming persists(t0, has_book(mary), t2).
Both these assumptions are consistent with the integrity constraint. Suppose
now that the new information holds_at(has_book(john), t2) is added to KB.
This conflicts with the prediction holds_at(has_book(mary), t2). However,
the new information can be assimilated by adding to KB the hypotheses
happens(gives_book(mary, john), t1) and persists(t1, has_book(john), t2)
and by retracting the hypothesis persists(t0, has_book(mary), t2). There-
fore, the earlier prediction holds_at(has_book(mary), t2) can no longer be
derived from the new KB.
Note that in this example the hypothesis happens(gives_book(mary,
john),t1) can be added to KB since it does not violate the further integrity
constraint

¬[happens(E, T) ∧ precondition(E, P) ∧ ~holds_at(P, T)]

expressing that an event E cannot happen at a time T if the preconditions
P of E do not hold at time T. In this example, we may assume that KB
also contains the fact

precondition(gives_book(X, Y), has_book(X)).

Once a hypothesis has been generated as an explanation for an external
datum, it itself needs to be assimilated into the KB. In the simplest situ-
ation, the explanation is just added to the KB, i.e. only case (4) applies
without further abduction. Case (1) doesn't apply, if abductive explana-
tions are required to be basic. However case (2) may apply, and can be
particularly useful for discriminating between alternative explanations for
3 Note that here KB contains a definition for the abducible predicate happens. In
Section 5 we will see that new predicates and clauses can be added to KB so that
abducible predicates have no definitions in the transformed KB.

the new information. For instance we may prefer a set of hypotheses which
entails information already in the KB, i.e. hypotheses that render the KB
as "compact" as possible.
Example 2.0.2. Suppose the current KB contains

    p ← q
    r ← q
    r ← s
    p

and r is the new datum to be assimilated. The explanation {q} is preferable
to the explanation {s}, because q implies both r and p, but s only implies
r. Namely, the explanation {q} is more relevant.
Notice however that the use of case (2) to remove redundant informa-
tion can cause problems later. If we need to retract previously inserted
information, entailed information which is no longer explicitly in the KB
might be lost.
It is interesting to note that case (3) can be used to check the integrity
of abductive hypotheses generated in case (4).
Any violation of integrity detected in case (3) can be remedied in several
ways [Kowalski, 1979]. The new input can be retracted as in conventional
databases. Alternatively the new input can be upheld and some other as-
sumptions can be withdrawn. This is the case with view updates. The
task of translating the update request on the view predicates to an equiv-
alent update on the extensional part (as in case (4) of KA) is achieved by
finding an abductive explanation for the update in terms of variable-free
instances of extensional predicates [Kakas and Mancarella, 1990a]. Any
violation of integrity is dealt with by changing the extensional part of the
database.

Example 2.0.3. Suppose the current KB consists of the clauses


sibling(X, Y) ← parent(Z, X), parent(Z, Y)
parent(X, Y) ← father(X, Y)
parent(X, Y) ← mother(X, Y)
father(john, mary)
mother(jane, mary)

together with the integrity constraints

X = Y ← father(X, Z), father(Y, Z)
X = Y ← mother(X, Z), mother(Y, Z)
X ≠ Y ← mother(X, Z), father(Y, W)

where sibling and parent are view predicates, father and mother are ex-
tensional, and =, ≠ are "built-in" predicates such that

X = X and
s ≠ t for all distinct variable-free terms s and t.

Suppose the view update

insert sibling(mary, bob)

is given. This can be translated into either of the two minimal updates

insert father(john, bob)


insert mother(jane, bob)

on the extensional part of the KB. Both of these updates satisfy the in-
tegrity constraints. However, only the first update satisfies the integrity
constraints if we are given the further update

insert mother(sue, bob).

The general problem of belief revision has been studied formally in
[Gardenfors, 1988; Nebel, 1989; Nebel, 1991; Doyle, 1991]. Gardenfors
proposes a set of axioms for rational belief revision containing such con-
straints on the new theory as "no change should occur to the theory when
trying to delete a fact that is not already present" and "the result of re-
vision should not depend on the syntactic form of the new data". These
axioms ensure that there is always a unique way of performing belief re-
vision. However Doyle [Doyle, 1991] argues that, for applications in AI,
this uniqueness property is too strong. He proposes instead the notion of
"economic rationality", in which the revised sets of beliefs are optimal, but
not necessarily unique, with respect to a set of preference criteria on the
possible beliefs states. This notion has been used to study the evolution
of databases by means of updates [Kakas, 1991a]. It should be noted that
the use of abduction to perform belief revision in the view update case
also allows results which are not unique, as illustrated in Example 2.0.3.
Aravindan and Dung [Aravindan and Dung, 1994] have given an abductive
characterisation of rational belief revision and have applied this result to
formulate belief revision postulates for the view update problem.
A logic-based theory of the assimilation of new information has also
been developed in the Relevance Theory of Sperber and Wilson [Sperber
and Wilson, 1986] with special attention to natural language understand-
ing. Gabbay, Kempson and Pitts [Gabbay et al., 1994] have investigated
how abductive reasoning and relevance theory can be integrated to choose
between different abductive interpretations of a natural language discourse.
KA and belief revision are also related to truth maintenance systems.
We will discuss truth maintenance and its relationship with abduction in
Section 8.

3 Default reasoning viewed as abduction


Default reasoning concerns the application of general rules to draw conclu-
sions provided the application of the rules does not result in contradictions.
Given, for example, the general rules "birds fly" and "penguins are birds
that do not fly" and the only fact about Tweety that Tweety is a bird,
we can derive the default conclusion that Tweety flies. However, if we are
now given the extra information that Tweety is a penguin, we can also
conclude that Tweety does not fly. In ordinary, common sense reasoning,
the rule that penguins do not fly has priority over the rule that birds fly,
and consequently this new conclusion that Tweety does not fly causes the
original conclusion to be withdrawn.
One of the most important formalisations of default reasoning is the
default logic of Reiter [Reiter, 1980]. Reiter separates beliefs into two kinds,
ordinary sentences used to express "facts" and default rules of inference
used to express general rules. A default rule is an inference rule of the
form

α(X) : β1(X), ..., βn(X) / γ(X)

which expresses, for all variable-free instances t of X,4 that γ(t) can be de-
rived if α(t) holds and each of βi(t) is consistent, where α(X), βi(X), γ(X)
are first-order formulae.
derlying incomplete theory. Different applications of the defaults can yield
different extensions.
As already mentioned in Section 1, Poole, Goebel and Aleliunas [Poole
et al., 1987] and Poole [Poole, 1988] propose an alternative formalisation
of default reasoning in terms of abduction. Like Reiter, Poole also distin-
guishes two kinds of beliefs:
• beliefs that belong to a consistent set of first order sentences F rep-
resenting "facts", and
• beliefs that belong to a set of first order formulae D representing
defaults.
4
We use the notation X to indicate a tuple of variables X1, ..., Xn and t to represent
a tuple of terms t1, ..., tn.

Perhaps the most important difference between Poole's and Reiter's for-
malisations is that Poole uses sentences (and formulae) of classical first
order logic to express defaults, while Reiter uses rules of inference. Given a
Theorist framework (F, D), default reasoning can be thought of as theory
formation. A new theory is formed by extending the existing theory F with
a set Δ of sentences which are variable-free instances of formulae in D. The
new theory F ∪ Δ should be consistent. This process of theory formation is
a form of abduction, where variable-free instances of defaults in D are the
candidate abducibles. Poole [Poole, 1988] shows that the semantics of the
theory formation framework (F, D) is equivalent to that of an abductive
framework (F', A, ∅) (see Section 1.2) where the default formulae are all
atomic. The set of abducibles A consists of a new predicate for each default
formula in D, with the same free variables. The new predicate is said to
"name" the default. The set F' is the set F augmented with a sentence, for
each default in D, stating that the naming predicate implies the default
formula it names.


The theory formation framework and its correspondence with the ab-
ductive framework can be illustrated by the flying-birds example.
Example 3.0.1. In this case, the framework (F, D) is 5

F = { penguin(X) → bird(X),
      penguin(X) → ¬ fly(X),
      penguin(tweety),
      bird(john) }
D = { bird(X) → fly(X) }.                                  (3.1)

The priority of the rule that penguins do not fly over the rule that birds
fly is obtained by regarding the first rule as a fact and the second rule as a
default. The atom fly(john) is a default conclusion which holds in F ∪ Δ
with
5
Here, we use the conventional notation of first-order logic, rather than LP form.
We use → for the usual implication symbol of first-order logic, in contrast with ← for
LP. However, as in LP notation, variables occurring in formulae of F are assumed to
be universally quantified. Formulae of D, on the other hand, should be understood as
schemata standing for the set of all their variable-free instances.

Δ = { bird(john) → fly(john) }.
We obtain the same conclusion by naming the default (3.1) by means of a
predicate birds-fly(X), adding to F the new "fact"

birds-fly(X) → [bird(X) → fly(X)]                          (3.2)

and extending the resulting augmented set of facts F' with the set of hy-
potheses
Δ' = { birds-fly(john) }.
On the other hand, the conclusion fly(tweety) cannot be derived, because
the extension

Δ = { bird(tweety) → fly(tweety) }

is inconsistent with F, and similarly the extension

Δ' = { birds-fly(tweety) }

is inconsistent with F'.


Poole shows that normal defaults without prerequisites in Reiter's de-
fault logic, i.e. defaults of the form

: β(X) / β(X),

can be simulated by Theorist (abduction) simply by making the predicates
β(X) abducible. He shows that the default logic extensions in this case are
equivalent to maximal sets of variable-free instances of the default formulae
β(X) that can consistently be added to the set of facts.
Maximality of abductive hypotheses is a natural requirement for default
reasoning, because we want to apply defaults whenever possible. However,
maximality is not appropriate for other uses of abductive reasoning. In
particular, in diagnosis we are generally interested in explanations which
are minimal. Later, in Section 5.1 we will distinguish between default and
non-default abducibles in the context of abductive logic programming.
In the attempt to use abduction to simulate more general default rules,
however, Poole needs to use integrity constraints. The new theory F ∪ Δ
should be consistent with these constraints. Default rules of the form

α(X) : β1(X), ..., βn(X) / γ(X)

are translated into "facts", which are implications

α(X) ∧ Mβ1(X) ∧ ... ∧ Mβn(X) → γ(X)

where Mβi is a new predicate, and Mβi(X) is a default formula (abducible),
for all i = 1, ..., n. Integrity constraints are needed to link the new
predicates Mβi appropriately with the predicates βi, for all i = 1, ..., n. A
further integrity constraint, for any i = 1, ..., n, is needed to prevent the
application of the contrapositive of the implication, in the attempt to make
the implication behave like an inference rule. This use of integrity
constraints is different from their
intended use in abductive frameworks as presented in Section 1.2.
Poole's attempted simulation of Reiter's general default rules is not
exact. He presents a number of examples where the two formulations differ
and argues that Reiter's default logic gives counterintuitive results. In fact,
many of these examples can be dealt with correctly in certain extensions
of default logic, such as cumulative default logic [Makinson, 1989], and it
is possible to dispute some of the other examples. But, more importantly,
there are still other examples where the Theorist approach arguably gives
the wrong result. The most important of these is the now notorious Yale
shooting problem of [Hanks and McDermott, 1986; Hanks and McDermott,
1987]. This can be reduced to the propositional logic program
alive-after-load-wait-shoot ← alive-after-load-wait, ~ abnormal-alive-shoot
loaded-after-load-wait ← loaded-after-load, ~ abnormal-loaded-wait
abnormal-alive-shoot ← loaded-after-load-wait
alive-after-load-wait
loaded-after-load.
As argued in [Morris, 1988], these clauses can be simplified further: First,
the facts alive-after-load-wait and loaded-after-load can be eliminated by
resolving them against the corresponding conditions of the first two clauses,
giving
alive-after-load-wait-shoot ← ~ abnormal-alive-shoot
loaded-after-load-wait ← ~ abnormal-loaded-wait
abnormal-alive-shoot ← loaded-after-load-wait.
Then the atom loaded-after-load-wait can be resolved away from the sec-
ond and third clauses leaving the two clauses
alive-after-load-wait-shoot ← ~ abnormal-alive-shoot

abnormal-alive-shoot ← ~ abnormal-loaded-wait.


The resulting clauses have the form

p ← ~ q
q ← ~ r.

Hanks and McDermott showed, in effect, that the default theory, whose
facts consist of

¬q → p,   ¬r → q

and whose defaults are the normal defaults

: ¬q / ¬q,   : ¬r / ¬r

has two extensions: one in which ¬r, and therefore q holds; and one in
which ¬q, and therefore p holds. The second extension is intuitively in-
correct under the intended interpretation. Hanks and McDermott showed
that many other approaches to default reasoning give similarly incorrect
results. However, Morris [Morris, 1988] showed that the default theory
which has no facts but contains the two non-normal defaults

: ¬q / p,   : ¬r / q
yields only one extension, containing q, which is the correct result. In con-
trast, all natural representations of the problem in Theorist give incorrect
results.
As Eshghi and Kowalski [Eshghi and Kowalski, 1988], Evans [Evans,
1989] and Apt and Bezem [Apt and Bezem, 1990] observe, the Yale shooting
problem has the form of a logic program, and interpreting negation in the
problem as negation as failure yields only the correct result. This is the case
for both the semantics and the proof theory of LP. Moreover, [Eshghi and
Kowalski, 1988] and [Kakas and Mancarella, 1989] show how to retain the
correct result when negation as failure is interpreted as a form of abduction.
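This observation can be checked directly by running the two-clause reduction of the problem under Prolog's negation as failure (the underscored names below simply abbreviate the hyphenated atoms used above):

    alive_after_load_wait_shoot :- \+ abnormal_alive_shoot.
    abnormal_alive_shoot        :- \+ abnormal_loaded_wait.

    % ?- abnormal_alive_shoot.              succeeds
    % ?- alive_after_load_wait_shoot.       fails

Only the intended conclusion is computed: the loading persists through the wait, and being alive does not persist through the shooting.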
On the other hand, the Theorist framework does overcome the problem
that some default theories do not have extensions and hence cannot be
given any meaning within Reiter's default logic. In the next section we will
see that this problem also occurs in LP, but that it can also be overcome
by an abductive treatment of negation as failure. We will also see that the
resulting abductive interpretation of negation as failure allows us to regard
LP as a hybrid which treats defaults as abducibles in Theorist but treats
clauses as inference rules in default logic.

The inference rule interpretation of logic programs makes LP extended


with abduction especially suitable for default reasoning. Integrity con-
straints can be used, not for preventing application of contrapositives, but
for representing negative information and exceptions to defaults.
Example 3.0.2. The default (3.1) in the flying-birds Example 3.0.1 can
be represented by the logic program

fly(X) ← bird(X), birds-fly(X)


with the abducible predicate birds-fly(X). Note that this clause is equiva-
lent to the "fact" (3.2) obtained by renaming the default (3.1) in Theorist.
The exception can be represented by an integrity constraint:

¬ fly(X) ← penguin(X).

The resulting logic program, extended by means of abduction and integrity


constraints, gives similar results to the Theorist formulation of Exam-
ple 3.0.1.

In Sections 4, 5 and 6 we will see other ways of performing default reason-


ing in LP. In Section 4 we will introduce negation as failure as a form of
abductive reasoning. In Section 5 we will discuss abductive logic program-
ming with default and non-default abducibles and domain-specific integrity
constraints. In Section 6 we will consider an extended LP framework that
contains clauses with negative conclusions and avoids the use of explicit
integrity constraints in many cases. In Section 7 we will present an ab-
stract argumentation-based framework for default reasoning which unifies
the treatment of abduction, default logic, LP and several other approaches
to default reasoning.

4 Negation as failure as abduction


We noted in the previous section that default reasoning can be performed
by means of abduction in LP by explicitly introducing abducibles into rules.
Default reasoning can also be performed with the use of negation as failure
(NAF) [Clark, 1978] in general logic programs. NAF provides a natural
and powerful mechanism for performing non-monotonic and default rea-
soning. As we have already mentioned, it provides a simple solution to the
Yale shooting problem. The abductive interpretation of NAF that we will
present below provides further evidence for the suitability of abduction for
default reasoning.

To see how NAF can be used for default reasoning, we return to the flying-
birds example.

Example 4.0.1. The NAF formulation differs from the logic program
with abduction presented in the last section (Example 3.0.2) by employing
a negative condition
~ abnormal-bird(X)
instead of a positive abducible condition

birds-fly(X)

and by employing a positive conclusion

abnormal-bird(X)

in an ordinary program clause, instead of a negative conclusion

¬ fly(X)

in an integrity constraint. The two predicates abnormal-bird and birds-fly


are opposite to one another. Thus in the NAF formulation the default is
expressed by the clause

fly(X) <- bird(X), ~ abnormal-bird(X)

and the exception by the clause

abnormal-bird(X) ← penguin(X).

In this example, both the abductive formulation with an integrity con-


straint and the NAF formulation give the same result. We will see later
in Section 5.5 that there exists a systematic transformation which replaces
positive abducibles by NAF and integrity constraints by ordinary clauses.
This example can be regarded as an instance of that transformation.
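The NAF formulation also runs directly under standard Prolog negation. The sketch below simply assembles the two clauses of this example with the facts of Example 3.0.1 (underscores replace the hyphens used in the text):

    fly(X)           :- bird(X), \+ abnormal_bird(X).
    abnormal_bird(X) :- penguin(X).
    bird(X)          :- penguin(X).
    penguin(tweety).
    bird(john).

    % ?- fly(john).        succeeds
    % ?- fly(tweety).      fails, since tweety is an abnormal bird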

4.1 Logic programs as abductive frameworks


The similarity between abduction and NAF can be used to give an abduc-
tive interpretation of NAF. This interpretation was presented in [Eshghi
and Kowalski, 1988] and [Eshghi and Kowalski, 1989], where negative lit-
erals are interpreted as abductive hypotheses that can be assumed to hold
provided that, together with the program, they satisfy a canonical set of
integrity constraints. A general logic program P is thereby transformed
into an abductive framework (P*, A*, I*) (see Section 1) in the following
way.
• A new predicate symbol p* (the opposite of p) is introduced for each
p in P, and A* is the set of all these predicates.

• P* is P where each negative literal ~ p(t) has been replaced by p*(t).


• I* is the set of all integrity constraints of the form:6

¬ [p(X) ∧ p*(X)]   and   p(X) ∨ p*(X).

The semantics of the abductive framework (P*, A*, I*), in terms of ex-
tensions7 P* ∪ Δ of P*, where Δ ⊆ A*, gives a semantics for the original
program P. A conclusion Q holds with respect to P if and only if the query
Q*, obtained by replacing each negative literal ~ p(t) in Q by p*(t), has an
abductive explanation in the framework (P*, A*, I*). This transformation
of P into (P*, A*, I*) is an example of the method, described at the end
of Section 1.1, of giving a semantics to a language by translating it into
another language whose semantics is already known.
The integrity constraints in I* play a crucial role in capturing the mean-
ing of NAF. The denials express that the newly introduced symbols p* are
the negations of the corresponding p. They prevent an assumption p* (t) if
p(t) holds. On the other hand the disjunctive integrity constraints force a
hypothesis p* (t) whenever p(t) does not hold.
Hence we define the meaning of the integrity constraints I* as follows:
An extension P* ∪ Δ (which is a Horn theory) of P* satisfies I* if and
only if for every variable-free atom p,

P* ∪ Δ ⊭ p ∧ p*, and
P* ∪ Δ ⊨ p or P* ∪ Δ ⊨ p*.

Eshghi and Kowalski [Eshghi and Kowalski, 1989] show that there is a one
to one correspondence between stable models [Gelfond and Lifschitz, 1988]
of P and abductive extensions of P*. We recall the definition of stable
model:
Let P be a general logic program, and assume that all the clauses in
P are variable-free.8 For any set M of variable-free atoms, let PM be the
Horn program obtained by deleting from P:
6
In the original paper the disjunctive integrity constraints were written in the form
Demo(P* ∪ Δ, p(t)) ∨ Demo(P* ∪ Δ, p*(t)),
where t is any variable-free term. This formulation makes explicit a particular (meta-
level) interpretation of the disjunctive integrity constraint. The simpler form

is neutral with respect to the interpretation of integrity constraints and allows the meta-
level interpretation as a special case.
7
This use of the term "extension" is different from other uses. For example, in
default logic an extension is formally defined to be the deductive closure of a theory
"extended" by means of the conclusions of default rules. In this paper we also use the
term "extension" informally (as in Example 3.0.1) to refer to Δ alone.
8
If P is not variable-free, then it is replaced by the set of all its variable-free instances.

i) each rule that contains a negative literal ~ A, with A ∈ M,


ii) all negative literals in the remaining rules.
If the minimal (Herbrand) model of PM coincides with M, then M is a
stable model for P.
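For ground programs this definition can be turned directly into a checker. The Prolog sketch below is merely illustrative (the rule/3 encoding and the predicate names are illustrative conventions): a clause p ← b1, ..., bm, ~ c1, ..., ~ cn is written rule(p, [b1,...,bm], [c1,...,cn]).

    :- use_module(library(lists)).

    % Gelfond-Lifschitz reduct of Rules with respect to a candidate set M.
    reduct(Rules, M, Reduct) :-
        findall(rule(H, Pos, []),
                ( member(rule(H, Pos, Neg), Rules),
                  \+ (member(C, Neg), member(C, M)) ),
                Reduct).

    % Minimal Herbrand model of a negation-free ground program, obtained
    % by iterating the immediate-consequence operator to its fixpoint.
    minimal_model(Rules, Model) :- tp(Rules, [], Model).

    tp(Rules, M0, M) :-
        findall(H, ( member(rule(H, Pos, []), Rules), subset(Pos, M0) ), Hs),
        append(M0, Hs, All), sort(All, M1),
        ( M1 == M0 -> M = M0 ; tp(Rules, M1, M) ).

    % M is a stable model of Rules iff it coincides with the minimal
    % model of the reduct.
    stable(Rules, M) :-
        reduct(Rules, M, Reduct),
        minimal_model(Reduct, MM),
        sort(M, MM).

    % ?- stable([rule(p,[],[q])], [p]).      succeeds: {p} is stable
    % ?- stable([rule(p,[],[p])], []).       fails
    % ?- stable([rule(p,[],[p])], [p]).      fails: p <- ~p has no stable model

The last two queries illustrate the totality problem mentioned below: the program p ← ~ p has no stable model at all.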
The correspondence between the stable model semantics of a program
P and abductive extensions of P* is given by:
• For any stable model M of P, the extension P* ∪ Δ satisfies I*, where
Δ = {p* | p is a variable-free atom, p ∉ M}.
• For any Δ such that P* ∪ Δ satisfies I*, there is a stable model M of
P, where M = {p | p is a variable-free atom, p* ∉ Δ}.
Notice that the disjunctive integrity constraints in the abductive frame-
work correspond to a totality requirement that every atom must be either
true or false in the stable model semantics. Several authors have argued
that this totality requirement is too strong, because it prevents us from
giving a semantics to some programs, for example p <- ~ p. We would
like to be able to assign a semantics to every program in order to have
modularity, as otherwise one part of the program can affect the meaning
of another unrelated part. We will see below that the disjunctive integrity
constraint also causes problems for the implementation of the abductive
framework for NAF.
Notice that the semantics of NAF in terms of abductive extensions is
syntactic rather than model-theoretic. It is a semantics in the sense that it
is a non-constructive specification. Similarly, the stable model semantics,
as is clear from its correspondence with abductive extensions, is a seman-
tics in the sense that it is a non-constructive specification of what should
be computed. The computation itself is performed by means of a proof
procedure.

4.2 An abductive proof procedure for LP


In addition to having a clear and simple semantics for abduction, it is
also important to have an effective method for computing abductive expla-
nations. Any such method will be very useful in practice in view of the
many diverse applications of abductive reasoning, including default reason-
ing. The Theorist framework of [Poole, 1988; Poole et al., 1987] provides
such an implementation of abduction by means of a resolution based proof
procedure.
In their study of NAF through abduction Eshghi and Kowalski [Eshghi
and Kowalski, 1989] have defined an abductive proof procedure for NAF in
logic programming. We will describe this procedure in some detail as it also
serves as the basis for computing abductive explanations more generally
within logic programming with other abducibles and integrity constraints
(see Section 5). In this section we will refer to the version of the abductive

proof procedure presented in [Dung, 1991].9


The abductive proof procedure interleaves two types of computation.
The first type, referred to as the abductive phase, is standard SLD-
resolution, which generates (negative) hypotheses and adds them to the
set of abducibles being generated, while the second type, referred to as the
consistency phase,10 incrementally checks that the hypotheses satisfy the
integrity constraints /* for NAF. Integrity checking of a hypothesis p*(t)
reasons forward one step using a denial integrity constraint to derive the
new denial -<p(t), which is then interpreted as the goal •<— p(t). Thereafter
it reasons backward in SLD-fashion in all possible ways. Integrity checking
succeeds if all the branches of the resulting search space fail finitely, in other
words, if the contrary of p*(t), namely p(t), finitely fails to hold. Whenever
the potential failure of a branch of the consistency phase search space is
due to the failure of a selected abducible, say q*(s), a new abductive phase
of SLD-resolution is triggered for the goal <- q(s), to ensure that the
disjunctive integrity constraint q*(s) ∨ q(s) is not violated by the failure
of both q*(s) and q(s). This attempt to show q(s) can require in turn the
addition of further abductive assumptions to the set of hypotheses which
is being generated.
To illustrate the procedure consider the following logic program, which
is a minor elaboration of the propositional form of the Yale shooting prob-
lem discussed in Section 3.
Example 4.2.1.

s ← ~ p
p ← ~ q
q ← ~ r

The query ← s succeeds with answer Δ = {p*, r*}. The computation
is shown in Figure 1. Parts of the search space enclosed by a double box
show the incremental integrity checking of the latest abducible added to the
explanation Δ. For example, the outer double box shows the integrity check
for the abducible p*. For this we start from ← p = ¬p (resulting from the
resolution of p* with the integrity constraint ¬(p ∧ p*) = ¬p ∨ ¬p*) and
resolve backwards in SLD-fashion to show that all branches end in failure,
depicted here by a black box. During this consistency phase for p* a new
abductive phase (shown in the single box) is generated when q* is selected
since the disjunctive integrity constraint q* ∨ q implies that failure of q* is
allowed only provided that q is provable. The SLD proof of q requires the
9
As noticed by Dung [Dung, 1991], the procedure presented in [Eshghi and Kowalski,
1989] contains a mistake, which is not present, however, in the earlier unpublished
version of the paper.
10
We use the term "consistency phase" for historical reasons. However, in view of the
precise definition of integrity constraint satisfaction, some other term might be more
appropriate.

Fig. 1. Computation for Example 4.2.1

addition of r* to Δ, which in turn generates a new consistency phase for r*
shown in the inner double box. The goal ← r fails trivially because there
are no rules for r and so r* and the enlarged explanation Δ = {p*, r*}
satisfy the integrity constraints. Tracing the computation backwards, we
see that q holds, therefore q* fails and, therefore, p* satisfies the integrity
constraints and the original query ← s succeeds.
In general, an abductive phase succeeds if and only if one of its branches
ends in a white box (indicating that no subgoals remain to be solved). It
fails finitely if and only if all branches end in a black box (indicating that
some subgoal cannot be solved). A consistency phase fails if and only if
one of its branches ends in a white box (indicating that integrity has been
violated). It succeeds finitely if and only if all branches end in a black box
(indicating that integrity has not been violated).
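The interleaving of the two phases can be made concrete by a small propositional meta-interpreter. The sketch below is a simplified reconstruction for illustration only, not the procedure of [Eshghi and Kowalski, 1989] itself: rules are stored as rule(Head, Body) facts, the negative literal ~ a and the hypothesis a* are both written not(a), and, as in the original procedure, a new hypothesis is added to Delta at the beginning of its consistency check.

    :- use_module(library(lists)).

    % The program of Example 4.2.1.
    rule(s, [not(p)]).
    rule(p, [not(q)]).
    rule(q, [not(r)]).

    % Abductive phase: demo(Goals, Delta0, Delta).
    demo([], Delta, Delta).
    demo([not(A)|Goals], Delta0, Delta) :-
        ( member(not(A), Delta0) -> Delta1 = Delta0
        ; fails(A, [not(A)|Delta0], Delta1)     % assume a*, then check integrity
        ),
        demo(Goals, Delta1, Delta).
    demo([A|Goals], Delta0, Delta) :-
        A \= not(_),
        rule(A, Body),
        append(Body, Goals, Goals1),
        demo(Goals1, Delta0, Delta).

    % Consistency phase: every way of deriving A must fail, possibly by
    % assuming further hypotheses.
    fails(A, Delta0, Delta) :-
        findall(Body, rule(A, Body), Bodies),
        fail_each(Bodies, Delta0, Delta).

    fail_each([], Delta, Delta).
    fail_each([Body|Bodies], Delta0, Delta) :-
        fail_body(Body, Delta0, Delta1),
        fail_each(Bodies, Delta1, Delta).

    % A body fails if some selected literal in it fails.
    fail_body(Body, Delta0, Delta) :-
        select(Literal, Body, _),
        fail_literal(Literal, Delta0, Delta).

    % An ordinary atom fails if all of its rules fail; a hypothesis not(A)
    % may fail only if A is provable (the disjunctive constraint a or a*),
    % which triggers a nested abductive phase.
    fail_literal(A, Delta0, Delta) :-
        A \= not(_),
        fails(A, Delta0, Delta).
    fail_literal(not(A), Delta0, Delta) :-
        \+ member(not(A), Delta0),
        demo([A], Delta0, Delta).

With the three rule/2 facts above, the query demo([s], [], Delta) succeeds with Delta = [not(r), not(p)], i.e. the explanation {p*, r*} computed in Figure 1.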
It is instructive to compare the computation space of the abductive
proof procedure with that of SLDNF. It is easy to see that these are closely
related. In particular, in both cases negative atoms need to be variable-
free before they are selected. On the other hand, the two proof procedures
have some important differences. A successful derivation of the abduc-
tive proof procedure will produce, together with the usual answer obtained
from SLDNF, additional information, namely the abductive explanation A.

This additional information can be useful in different ways, in particular to


avoid recomputation of negative subgoals. More importantly, as the next
example will show, this information will allow the procedure to handle non-
stratified programs and queries for which SLDNF is incomplete. In this
way the abductive proof procedure generalises SLDNF. Furthermore, the
abductive explanation A produced by the procedure can be recorded and
used in any subsequent revision of the beliefs held by the program, in a sim-
ilar fashion to truth maintenance systems [Kakas and Mancarella, 1990d].
In fact, this abductive treatment of NAF allows us to identify a close con-
nection between logic programming and truth maintenance systems in gen-
eral (see Section 8). Another important difference is the distinction that
the abductive proof procedure for NAF makes between the abductive and
consistency phases. This allows a natural extension of the procedure to
a more general framework where we have other hypotheses and integrity
constraints in addition to those for NAF [Kakas and Mancarella, 1990a;
Kakas and Mancarella, 1990b; Kakas and Mancarella, 1990c] (see Sec-
tion 5.2).
To see how the abductive proof procedure extends SLDNF, consider the
following program.
Example 4.2.2.

The last two clauses in this program give rise to a two-step loop via
NAF, in the sense that p (and, similarly, q) "depends" negatively on itself
through two applications of NAF. This causes the SLDNF proof procedure,
executing the query ← s, to go into an infinite loop. Therefore, the query
has no SLDNF refutation. However, in the corresponding abductive frame-
work the query has two answers, Δ = {p*} and Δ = {q*}, corresponding
to the two stable models of the program. The computation for the first
answer is shown in Figure 2. The outer abductive phase generates the hy-
pothesis p* and triggers the consistency phase for p* shown in the double
box. In general, whenever a hypothesis is tested for integrity, we can add
the hypothesis to A either at the beginning or at the end of the consistency
phase. When this addition is done at the beginning (as originally defined
in [Eshghi and Kowalski, 1989]) this extra information can be used in any
subordinate abductive phase. In this example, the hypothesis p* is used in
the subordinate abductive proof of q to justify the failure of q* and con-
sequently to render p* acceptable. In other words, the acceptability of p*
as a hypothesis is proved under the assumption of p*. The same abductive
proof procedure, but where each new hypothesis is added to A only at

Fig. 2. Computation for Example 4.2.2

the successful completion of its consistency phase, provides a sound proof


procedure for the well-founded semantics [Van Gelder et al., 1988].
Example 4.2.3. Consider the query ← p with respect to the abductive
framework corresponding to the following program:

Note that the first clause of this program gives rise to a one-step loop
via NAF, in the sense that r "depends" negatively on itself through one
application of NAF. The abductive proof procedure succeeds with the ex-
planation {q*}, but the only set of hypotheses which satisfies the integrity
constraints is {p*}.
So, as Eshghi and Kowalski [Eshghi and Kowalski, 1989] show by means
of this example, the abductive proof procedure is not always sound with
respect to the above abductive semantics of NAF. In fact, following the
result in [Dung, 1991], it can be proved that the proof procedure is sound
for the class of order-consistent logic programs defined by Sato [Sato, 1990].

Intuitively, this is the class of programs which do not contain clauses giving
rise to odd-step loops via NAF.
For the overall class of general logic programs, moreover, it is possible
to argue that it is the semantics and not the proof procedure that is at
fault. Indeed, Sacca and Zaniolo [Sacca and Zaniolo, 1990], Przymusinski
[Przymusinski, 1990] and others have argued that the totality requirement
of stable models is too strong. They relax this requirement and consider
partial or three-valued stable models instead. In the context of the abduc-
tive semantics of NAF this is an argument against the disjunctive integrity
constraints.
An abductive semantics of NAF without disjunctive integrity constraints
has been proposed by Dung [Dung, 1991] (see Section 4.3 below). The ab-
ductive proof procedure is sound with respect to this improved semantics.
An alternative abductive semantics of NAF without disjunctive in-
tegrity constraints has been proposed by Brewka [Brewka, 1993], follow-
ing ideas presented in [Konolige, 1992]. He suggests that the set which
includes both accepted and refuted NAF hypotheses be maximised. For
each such set of hypotheses, the logic program admits a "model" which is
the union of the sets of accepted hypotheses together with the "comple-
ment" of the refuted hypotheses. For Example 4.2.3 the only "model" is
{p*,q,r}. Therefore, the abductive proof procedure is still unsound with
respect to this semantics. Moreover, this semantics has other undesirable
consequences. For example, the program

p ← p, ~ q

admits both {~ q} and {~ p} as "models", while the only intuitively correct
"model" is {~ q}.
An alternative three-valued semantics for NAF has been proposed by
Giordano, Martelli and Sapino [Giordano et al., 1993]. According to their
semantics, given the program

p ← p

p and p* are both undefined. In contrast, p* holds in the semantics of


[Dung, 1991], as well as in the stable model [Gelfond and Lifschitz, 1988]
and well-founded semantics [Van Gelder et al., 1988]. Giordano, Martelli
and Sapino [Giordano et al, 1993] modify the abductive proof procedure so
that the modification is sound and complete with respect to their semantics.
Satoh and Iwayama [Satoh and Iwayama, 1992], on the other hand,
show how to extend the abductive proof procedure of [Eshghi and Kowalski,
1989] to deal correctly with the stable model semantics. Their extension
modifies the integrity checking method of [Sadri and Kowalski, 1987] and
deals more generally with arbitrary integrity constraints expressed in the
form of denials.

Casamayor and Decker [Casamayor and Decker, 1992] also develop an


abductive proof procedure for NAF. Their proposal combines features of
the Eshghi-Kowalski procedure with ancestor resolution.
Finally, we note that, to show that ~ p holds for programs such as p <-
p, it is possible to define a non-effective extension of the proof procedure
that allows infinite failure in the consistency phases.

4.3 An argumentation-theoretic interpretation


Dung [Dung, 1991] replaces the disjunctive integrity constraints by a weaker
requirement, similar to the requirement that the set of negative hy-
potheses Δ be a maximally consistent set. Unfortunately, simply replacing
the disjunctive integrity constraints by maximality does not work, as shown
in the following example.
Example 4.3.1. With this change the program

has two maximally consistent extensions Δ1 = {p*} and Δ2 = {q*}.


However, only the second extension is computed both by SLDNF and by
the abductive proof procedure. Moreover, for the same reason as in the
case of the propositional Yale shooting problem discussed before, only the
second extension is intuitively correct.
To avoid such problems Dung's notion of maximality is more subtle. He
associates with every logic program P an abductive framework (P* , A*, I*)
where I* contains only denials

¬ [p(X) ∧ p*(X)]

as integrity constraints. Then, given sets Δ, E of (negative) hypotheses, i.e.
Δ ⊆ A* and E ⊆ A*, E can be said to attack Δ (relative to (P*, A*, I*))
if P* ∪ E ⊢ p for some p* ∈ Δ.11 Dung calls an extension P* ∪ Δ of P*
preferred if
• P* ∪ Δ is consistent with I* and
• Δ is maximal with respect to the property that for every attack E
against Δ, Δ attacks E (i.e. Δ "counterattacks" E or "defends" itself
against E).
Thus a preferred extension can be thought of as a maximally consistent set
of hypotheses that contains its own defence against all attacks. In [Dung,
1991] a consistent set of hypotheses Δ (not necessarily maximal) satisfying
the property of containing its own defence against all attacks is said to be
11
Alternatively, instead of the symbol ⊢ we could use the symbol ⊨, here and elsewhere
in the paper where we define the notion of "attack".

admissible (to P*). In fact, Dung's definition is not formulated explicitly


in terms of the notions of attack and defence, but is equivalent to the one
just presented.
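On small ground programs these notions can be checked by brute force. The sketch below is again merely illustrative: a hypothesis p* is represented simply by the atom p, a rule h ← b1, ..., bm, ~ c1, ..., ~ cn by rule(h, [b1,...], [c1,...]), and attacks are enumerated from an explicitly given finite list of atoms, so this is a specification-level test rather than a proof procedure.

    :- use_module(library(lists)).

    % derivable(Rules, Hyps, A): A follows from P* together with the
    % hypotheses in Hyps (each ~ c is treated as true iff c is in Hyps).
    derivable(Rules, Hyps, A) :-
        closure(Rules, Hyps, [], M),
        member(A, M).

    closure(Rules, Hyps, M0, M) :-
        findall(H, ( member(rule(H, Pos, Neg), Rules),
                     subset(Pos, M0), subset(Neg, Hyps) ), Hs),
        append(M0, Hs, All), sort(All, M1),
        ( M1 == M0 -> M = M0 ; closure(Rules, Hyps, M1, M) ).

    % attacks(Rules, E, Delta): P* together with E derives the contrary
    % of some hypothesis in Delta.
    attacks(Rules, E, Delta) :-
        member(P, Delta),
        derivable(Rules, E, P).

    % admissible(Rules, Atoms, Delta): Delta does not attack itself (the
    % denials of I*) and counterattacks every attack drawn from the
    % finite hypothesis space over Atoms.
    admissible(Rules, Atoms, Delta) :-
        \+ attacks(Rules, Delta, Delta),
        forall( ( subseq(Atoms, E), attacks(Rules, E, Delta) ),
                attacks(Rules, Delta, E) ).

    % All sublists of a list.
    subseq([], []).
    subseq([X|Xs], [X|Ys]) :- subseq(Xs, Ys).
    subseq([_|Xs], Ys) :- subseq(Xs, Ys).

    % ?- admissible([rule(p,[],[q]), rule(q,[],[p])], [p,q], [p]).   succeeds
    % ?- admissible([rule(p,[],[q]), rule(q,[],[q])], [p,q], [p]).   fails: the only
    %    attack {q*} cannot be counterattacked

The first query shows that, for the two-step loop p ← ~ q, q ← ~ p, each of {p*}, {q*} and the empty set is admissible; the second anticipates the behaviour of one-step loops discussed below.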
Preferred extensions solve the problem with disjunctive integrity con-
straints in Example 4.2.3 and with maximal consistency semantics in Ex-
ample 4.3.1. In Example 4.2.3 the preferred extension semantics sanctions
the derivation of p by means of an abductive derivation with generated hy-
potheses { q* }. In fact, Dung proves that the abductive proof procedure is
sound with respect to the preferred extension semantics. In Example 4.3.1
the definition of preferred extension excludes the maximally consistent ex-
tension {p*}, because there is no defence against the attack q*.
The preferred extension semantics provides a unifying framework for
various approaches to the semantics of negation in LP. Kakas and Mancar-
ella [Kakas and Mancarella, 1993] show that it is equivalent to Sacca and
Zaniolo's partial stable model semantics [Sacca and Zaniolo, 1990]. Like
the partial stable model semantics, it includes the stable model semantics
as a special case.
Dung [Dung, 1991] also defines the notion of complete extension. An
extension P* ∪ Δ is complete if
• P* ∪ Δ is consistent with I* and
• Δ = {p* | for each attack E against {p*}, Δ attacks E}
(i.e. Δ is admissible and it contains all hypotheses it can defend
against all attacks).
Stationary expansions [Przymusinski, 1991] are equivalent to complete
extensions, as shown in [Brogi et al., 1992]. Moreover, Dung shows that the
well-founded model [Van Gelder et al, 1988] is the smallest complete
extension that can be constructed bottom-up from the empty set of negative
hypotheses, by adding incrementally all admissible hypotheses. Thus the
well-founded semantics is minimalist and sceptical, whereas the preferred
extension semantics is maximalist and credulous. The relationship between
these two semantics is further investigated in [Dung et al, 1992], where the
well-founded model and preferred extensions are shown to correspond to
the least fixed point and greatest fixed point, respectively, of the same
operator.
Kakas and Mancarella [Kakas and Mancarella, 1991; Kakas and Mancar-
ella, 1991a] propose an improvement of the preferred extension semantics.
Their proposal can be illustrated by the following example.
Example 4.3.2. Consider the program

p ← ~ q
q ← ~ q

Similarly to Example 4.2.3, the last clause gives rise to a one-step loop

via NAF, since q "depends" negatively on itself through one application of


NAF. In the abductive framework corresponding to this program consider
the set of hypotheses Δ = {p*}. The only attack against Δ is E = {q*}, and
the only attack against E is E itself. Thus Δ is not an admissible extension
of the program according to the preferred extension semantics, because
Δ cannot defend itself against E. The empty set is the only preferred
extension. However, intuitively Δ should be "admissible" because the only
attack E against Δ attacks itself, and therefore should not be regarded as
an "admissible" attack against Δ.
To deal with this kind of example, Kakas and Mancarella [Kakas and
Mancarella, 1991; Kakas and Mancarella, 1991a] modify Dung's seman-
tics, increasing the number of ways in which an attack E can be defeated.
Whereas Dung only allows Δ to defeat an attack E, they also allow E to
defeat itself. They call a set of hypotheses Δ weakly stable if
• for every attack E against Δ, E ∪ Δ attacks E − Δ.
Moreover, they call an extension P* ∪ Δ of P* a stable theory if Δ
is maximally weakly stable. Note that here the condition "P* ∪ Δ is
consistent with I*" of the definition of preferred extensions and admissible
sets of hypotheses is subsumed by the new condition. This is a consequence
of another difference between [Kakas and Mancarella, 1991; Kakas and
Mancarella, 1991a] and [Dung, 1991], namely that for each attack E against
Δ the counterattack is required to be against E − Δ rather than against E.
In other words, the defence of Δ must be a genuine attack that does not at
the same time also attack Δ. Therefore, if Δ is inconsistent, it contains as
a subset an attack E, which cannot be counterattacked because E − Δ is
empty.
these notions can also be used to extend the sceptical well-founded model
semantics. In Example 4.3.2 above this extension of the well-founded model
will contain the negation of p.
Like the original definition of admissible sets of hypotheses and preferred
extension, the definition of weakly stable sets of hypotheses and stable
theories was not originally formulated in terms of attack, but is equivalent
to the one presented here.
Kakas and Mancarella [Kakas and Mancarella, 1991a] argue that the
notion of defeating an attack needs to be liberalised further. They illustrate
their argument with the following example.
Example 4.3.3. Consider the program P

s ← ~ p
p ← ~ q
q ← ~ r
r ← ~ p

The last three clauses give rise to a three-step loop via NAF, since p (and,
similarly, q and r) "depends" negatively on itself through three applications
of NAF. In the corresponding abductive framework, the only attack against
the hypothesis s* is E = {p*}. But although P* ∪ {s*} ∪ E does not attack
E, E is not a valid attack because it is not stable (or admissible) according
to the definition above.
To generalise the reasoning in this example so that it gives an intuitively
correct semantics to any program with clauses giving rise to an odd-step
loop via NAF, we need to liberalise further the conditions for defeating
E. Kakas and Mancarella suggest a recursive definition in which a set of
hypotheses is deemed acceptable if no attack against it is acceptable. More
precisely, given an initial set of hypotheses Δ0, a set of hypotheses Δ is
acceptable to Δ0 iff
for every attack E against Δ − Δ0, E is not acceptable to Δ ∪ Δ0.
The semantics of a program P can be identified with any Δ which is maxi-
mally acceptable to the empty set of hypotheses ∅. As before with weak sta-
bility and stable theories, the consideration of attacks only against Δ − Δ0
ensures that attacks and counterattacks are genuine, i.e. they attack the
new part of Δ that does not contain Δ0.
Notice that, as a special case, we obtain a basis for the definition:
Δ is acceptable to Δ0 if Δ ⊆ Δ0.
Therefore, if Δ is acceptable to ∅ then Δ is consistent.

Notice, too, that applying the recursive definition twice, and starting with
the base case, we obtain an approximation to the recursive definition
Δ is acceptable to Δ0 if for every attack E against Δ − Δ0,
E ∪ Δ ∪ Δ0 attacks E − (Δ ∪ Δ0).
Thus, the stable theories are those which are maximally acceptable to ∅,
where acceptability is defined by this approximation to the recursive defi-
nition.
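The recursive definition can likewise be rendered directly, reusing attacks/3 and subseq/2 from the admissibility sketch above (again only an illustration over a given finite list of atoms; termination holds because Δ0 grows strictly at each level of the recursion):

    % acceptable(Rules, Atoms, Delta, Delta0)
    acceptable(_, _, Delta, Delta0) :-
        subset(Delta, Delta0), !.                 % base case: Delta is contained in Delta0
    acceptable(Rules, Atoms, Delta, Delta0) :-
        subtract(Delta, Delta0, New),
        union(Delta, Delta0, Delta1),
        forall( ( subseq(Atoms, E), attacks(Rules, E, New) ),
                \+ acceptable(Rules, Atoms, E, Delta1) ).

    % For the program of Example 4.3.3:
    % ?- Rules = [rule(s,[],[p]), rule(p,[],[q]), rule(q,[],[r]), rule(r,[],[p])],
    %    acceptable(Rules, [s,p,q,r], [s], []).
    %    succeeds: {s*} is acceptable to the empty set of hypotheses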
A related argumentation-theoretic interpretation for the semantics of
NAF in LP has also been developed by Geffner [Geffner, 1991]. This inter-
pretation is equivalent to the well-founded semantics [Dung, 1993]. Based
upon Geffner's notion of argumentation, Torres [Torres, 1993] has proposed
an argumentation-theoretic semantics for NAF that is equivalent to Kakas
and Mancarella's stable theory semantics [Kakas and Mancarella, 1991;
Kakas and Mancarella, 1991a], but is formulated in terms of the following
notion of attack: E attacks Δ (relative to P*) if P* ∪ E ∪ Δ ⊢ p for
some p* ∈ Δ.
Alferes and Pereira [Alferes and Pereira, 1994] apply the argumentation-
theoretic interpretation introduced in [Kakas et al., 1993] to expand the
well-founded model of normal and extended logic programs (see Section 5).

In the case of normal logic programming, their semantics gives the same
result as the acceptability semantics in Example 4.3.3.
Simari and Loui [Simari and Loui, 1992] define an argumentation-theoretic
framework for default reasoning in general. They combine a notion of ac-
ceptability with Poole's notion of "most specific" explanation [Poole, 1985],
to deal with hierarchies of defaults.
In Section 7 we will present an abstract argumentation-theoretic frame-
work which is based upon the framework for LP but unifies many other
approaches to default reasoning.

4.4 An argumentation-theoretic interpretation of the


abductive proof procedure
As mentioned above, the incorrectness (with respect to the stable model
semantics) of the abductive proof procedure can be remedied by adopting
the preferred extension, stable theory or acceptability semantics. This
reinterpretation of the original abductive proof procedure in terms of an
improved semantics, and the extension of the proof procedure to capture
further improvements in the semantics, is an interesting example of the
interaction that can arise between a program (proof procedure in this case)
and its specification (semantics).
To illustrate the argumentation-theoretic interpretation of the proof
procedure, consider again Figure 1 of Example 4.2.1. The consistency
phase for p*, shown in the outermost double box, can be understood as
searching for any attack against {p*}. The only attack, namely {q*}, is
counterattacked (thereby defending {p*}) by assuming the additional hy-
pothesis r*, as this implies q. Hence the set Δ = {p*, r*} is admissible,
i.e. it can defend itself against any attack, since all attacks against {p*}
are counterattacked by {r*} and there are no attacks against {r*}.
In general, the proof procedure constructs an admissible set of negative
hypotheses in two steps. First, it constructs a set of hypotheses which is
sufficient to solve the original goal. Then, it augments this set with the
hypotheses necessary to defend the first set against attack.
The argumentation-theoretic interpretation suggests how to extend the
proof procedure to capture more fully the stable theory semantics and more
generally the semantics given by the recursive definition for acceptability.
The extension, presented in [Toni and Kakas, 1995], involves temporarily
remembering a (selected) attack E and using E itself together with the sub-
set of A generated so far, to counterattack E, in the subordinate abductive
phase.
For Example 4.3.2 of Section 4.3, as shown in Figure 3, to defend against
the attack q* on p*, we need to temporarily remember q* and use it in the
subordinate abductive phase to prove q and therefore to attack q* itself.
In the original abductive proof procedure of [Eshghi and Kowalski,

Fig. 3. Computation for Example 4.3.2 with respect to the revisited proof
procedure

1989], hypotheses in defences are always added to Δ. However, in the
proof procedure for the acceptability semantics defences D cannot always
be added to Δ, because even though D might be acceptable to Δ, Δ ∪ D
might not be acceptable to ∅. This situation arises for the three-step loop
program of Example 4.3.3, where D = {q*} is used to defend Δ = {s*}
against the attack E = {p*}, but Δ ∪ D is not acceptable to ∅.
To cater for this characteristic of the acceptability semantics, the ex-
tended proof procedure non-deterministically considers two cases. For each
hypothesis in a defence D against an attack E against Δ, the hypothesis
either can be added to Δ or can be remembered temporarily to counter-
attack any attack E' against D, together with Δ and E. In general, a
sequence of consecutive attacks and defences E, D, E', D', ... can be gen-
erated before an acceptable abductive explanation Δ is found, and the same
non-deterministic consideration of cases is applied to D' and all successive
defences in the sequence.
The definitions of admissible, stable and acceptable sets Δ of hypothe-
ses all require that every attack against Δ be counterattacked. Although
every superset of an attack is also an attack, the abductive proof proce-
dure in [Eshghi and Kowalski, 1989] only considers those "minimal" attacks

generated by SLD, 12 without examining superset attacks. This is possible


because all supersets of an attack can be counterattacked in exactly the
same way as the attack itself, which is generated by SLD. For this reason,
the proof procedure of [Eshghi and Kowalski, 1989] is sound for the admissi-
bility semantics. Unfortunately, supersets of attacks need to be considered
to guarantee soundness of the proof procedure for the acceptability seman-
tics. In [Toni and Kakas, 1995], however, Toni and Kakas prove that only
certain supersets of "minimally generated" attacks need to be considered.
The additional features required for the proof procedure to capture more
fully the acceptability semantics render the proof procedure considerably
more complex and less efficient than proof procedures for simpler semantics.
However, this extra complexity is due to the treatment of any odd-step
loops via NAF and such programs seem to occur very rarely in practice.
Therefore, in most cases it is sufficient to consider the approximation of the
proof procedure which computes the preferred extension and stable theory
semantics. This approximation improves upon the Eshghi-Kowalski proof
procedure, since in the case of finite failure it terminates earlier, avoiding
unnecessary computation.

5 Abductive logic programming


Abductive logic programming (ALP), as understood in the remainder of
this paper, is the extension of LP to support abduction in general, and not
only the use of abduction for NAF. This extension was introduced earlier in
Section 1, as the special case of an abductive framework (T, A, I), where T
is a logic program. In this paper we will assume, without loss of generality,
that abducible predicates do not have definitions in T, i.e. do not appear
in the heads of clauses in the program T.13 This assumption has the
advantage that all explanations are thereby guaranteed to be basic.
Semantics and proof procedures for ALP have been proposed by Eshghi
and Kowalski [Eshghi and Kowalski, 1988], Kakas and Mancarella [Kakas
and Mancarella, 1990] and Chen and Warren [Chen and Warren, 1989].
Chen and Warren extend the perfect model semantics of Przymusinski
[Przymusinski, 1989] to include abducibles and integrity constraints over
12
As illustrated in Section 1, these attacks are genuinely minimal unless the logic
program encodes non-minimal explanations.
13
In the case in which abducible predicates have definitions in T, auxiliary predicates
can be introduced in such a way that the resulting program has no definitions for the
abducible predicates. This can be done by means of a transformation similar to the one
used to separate extensional and intensional predicates in deductive databases [Minker,
1982]. For example, for each abducible predicate a(X) in T we can introduce a new
predicate Sa(X) and add the clause

The predicate a(X) is no longer abducible, whereas Sa(X) is now abducible.



abducibles. Here we shall concentrate on the proposal of Kakas and Man-


carella, which extends the stable model semantics.

5.1 Generalised stable model semantics


Kakas and Mancarella [Kakas and Mancarella, 1990] develop a semantics
for ALP by generalising the stable model semantics for LP. Let (P, A, I)
be an abductive framework, where P is a general logic program, and let Δ
be a subset of A. M(Δ) is a generalised stable model of (P, A, I) iff
• M(Δ) is a stable model of P ∪ Δ, and
• M(Δ) ⊨ I.
Here the semantics of the integrity constraints I is defined by the second
condition in the definition above. Consequently, an abductive extension
P ∪ Δ of the program P satisfies I if and only if there exists a stable
model M(Δ) of P ∪ Δ such that I is true in M(Δ).
Note that in a similar manner, it is possible to generalise other model-
theoretic semantics for logic programs, by considering only those models
of P ∪ Δ (of the appropriate kind, e.g. partial stable models, well-founded
models etc.) in which the integrity constraints are all true.
Generalised stable models are defined independently from any query.
However, given a query Q, we can define an abductive explanation for Q
in (P, A, I) to be any subset Δ of A such that
• M(Δ) is a generalised stable model of (P, A, I), and
• M(Δ) ⊨ Q.
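For ground frameworks this definition too can be checked mechanically, by combining the stable/2 sketch of Section 4.1 with an integrity test. The fragment below assumes, purely for illustration, that the constraints are given as denials, i.e. as lists of literals (with not(A) for a negative literal) that must not all hold in the model:

    % generalised_stable(Rules, Denials, Delta, M): M is a stable model of
    % the program extended with the abduced facts in Delta, and no denial
    % body is fully satisfied in M.
    holds(M, not(A)) :- !, \+ member(A, M).
    holds(M, A)      :- member(A, M).

    generalised_stable(Rules, Denials, Delta, M) :-
        findall(rule(A, [], []), member(A, Delta), Facts),
        append(Rules, Facts, PDelta),
        stable(PDelta, M),
        \+ ( member(Body, Denials),
             forall(member(L, Body), holds(M, L)) ).

    % ?- generalised_stable([rule(p,[a],[]), rule(q,[b],[])],
    %                       [[not(p)]], [a], [a,p]).       succeeds
    % ?- generalised_stable([rule(p,[a],[]), rule(q,[b],[])],
    %                       [[not(p)]], [b], [b,q]).       fails: p does not hold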
Example 5.1.1. Consider the program P

p ← a
q ← b

with A = {a, b} and integrity constraint I

p.

The interpretations M(Δ1) = {a, p} and M(Δ2) = {a, b, p, q} are gen-
eralised stable models of (P, A, I). Consequently, both Δ1 = {a} and
Δ2 = {a, b} are abductive explanations of p. On the other hand, the in-
terpretation {b, q}, corresponding to the set of abducibles {b}, is not a
generalised stable model of (P, A, I), because it is not a model of I as it
does not contain p. Moreover, the interpretation {b, q, p}, although it is a
model of P ∪ I and therefore satisfies I according to the consistency view
of constraint satisfaction, is not a generalised stable model of (P, A, I),
because it is not a stable model of P. This shows that the notion of in-
tegrity satisfaction for ALP is stronger than the consistency view. It is also

possible to show that it is weaker than the theoremhood view and to argue
that it is similar to the metalevel or epistemic view.
An alternative, and perhaps more fundamental way of understanding
the generalised stable model semantics is by using abduction both for hy-
pothetical reasoning and for NAF. The negative literals in {P, A, I} can be
viewed as further abducibles according to the transformation described in
Section 4. The set of abducible predicates then becomes A U A*, where A*
is the set of negative abducibles introduced by the transformation. This
results in a new abductive framework (P*, ADA*, I U I*), where /* is the
set of special integrity constraints introduced by the transformation of Sec-
tion 4.14 The semantics of the abductive framework (P*, A U A*, I U I*)
can then be given by the sets A* of hypotheses drawn from A U A* which
satisfy the integrity constraints I U I*.
Example 5.1.2. Consider P

with A = {a, b} and I = ∅. If Q is ← p then Δ* = {a, q*, b*} is an


explanation for Q* = Q in (P*, A U A*, I*). Note that 6* is in A* because
/* contains the disjunctive integrity constraint b V b*.
Kakas and Mancarella show a one to one correspondence between the
generalised stable models of (P, A, I) and the sets of hypotheses Δ* that
satisfy the transformed framework (P*, A ∪ A*, I ∪ I*). Moreover they
show that for any abductive explanation Δ* for a query Q in (P*, A ∪
A*, I ∪ I*), Δ = Δ* ∩ A is an abductive explanation for Q in (P, A, I).
Example 5.1.3. Consider the framework (P, A, I) and the query Q of
Example 5.1.2. We have already seen that Δ* = {a, q*, b*} is an expla-
nation for Q* in (P*, A ∪ A*, I*). Accordingly the subset Δ = {a} is an
explanation for Q in (P, A, I).
Note that the generalised stable model semantics as defined above re-
quires that for each abducible a, either a or a* holds. This can be relaxed
by dropping the disjunctive integrity constraints a ∨ a* and defining the
set of abducible hypotheses A to include both a and a*. Such a relaxation
would be in the spirit of replacing stable model semantics by admissible or
preferred extensions in the case of ordinary LP.
Generalised stable models combine the use of abduction for default rea-
soning (in the form of NAF) with the use of abduction for other forms of
14
Note that the transformation described in Section 4 also needs to be applied to the
set I of integrity constraints. For notational convenience, however, we continue to use
the symbol I to represent the result of applying the transformation to I (otherwise we
would need to use the symbol I*, conflicting with the use of the symbol I* for the special
integrity constraints introduced in Section 4).

hypothetical reasoning. In the generalised stable model semantics, abduc-


tion for default reasoning is expressed solely by NAF. However, in the event
calculus persistence axiom presented in Section 2 the predicate persists
is a positive abducible that has a default nature. Therefore, instances of
persists should be abduced unless some integrity constraint is violated. In-
deed, in standard formulations of the persistence axiom the positive atom
persists(T1, P, T2) is replaced by a negative literal ~ clipped(T1, P, T2)
[Shanahan, 1989; Denecker and De Schreye, 1993]. In contrast, the abduc-
tion of happens is used for non-default hypothetical reasoning. The dis-
tinction between default reasoning and non-default abduction is also made
in Konolige's proposal [Konolige, 1990], which combines abduction for non-
default hypothetical reasoning with default logic [Reiter, 1980] for default
reasoning. This proposal is similar, therefore, to the way in which gener-
alised stable models combine abduction with NAF. Poole [Poole, 1989], on
the other hand, proposes an abductive framework where abducibles can be
specified either as default, like persists, or non-default, like happens. In
[Toni and Kowalski, 1995], Toni and Kowalski show how both default and
non-default abducibles can be reduced to NAF. This reduction is discussed
in Section 5.5 below.
The knowledge representation problem in ALP is complicated by the
need to decide whether information should be represented as part of the
program, as an integrity constraint, or as an observation to be explained, as
illustrated by the following example taken from [Baral and Gelfond, 1994].
Example 5.1.4.
fly(X) ← bird(X), ~ abnormal-bird(X)
abnormal-bird(X) ← penguin(X)
has-beak(X) ← bird(X).
Suppose that bird is abducible and consider the three cases in which
fly(tweety)
is either added to the program, added to the integrity constraints, or con-
sidered as the observation to be explained. In the first case, the abducible
bird(tweety) and, as a consequence, the atom has-beak(tweety) belong to
some, but not all, generalised stable models. Instead, in the second case ev-
ery generalised stable model contains bird(tweety) and has-beak(tweety).
In the last case, the observation is assimilated by adding the explanation
{bird(tweety)} to the program, and therefore has-beak(tweety) is derived
in the resulting generalised stable model. Thus, the last two alternatives
have similar effects. Denecker and DeSchreye [Denecker and De Schreye,
1993] argue that the second alternative is especially appropriate for knowl-
edge representation in the temporal reasoning domain.

Fig. 4. Extended proof procedure for Example 5.2.1

5.2 An abductive proof procedure for ALP


In [Kakas and Mancarella, 1990a; Kakas and Mancarella, 1990b; Kakas
and Mancarella, 1990c], a proof procedure is given to compute abductive
explanations in ALP. This extends the abductive proof procedure for NAF
[Eshghi and Kowalski, 1989] described in Section 4.2, retaining the basic
structure which interleaves an abductive phase that generates and collects
abductive hypotheses with a consistency phase that incrementally checks
these hypotheses for integrity. We will illustrate these extended proof pro-
cedures by means of examples.
Example 5.2.1. Consider again Example 4.2.1. The abductive proof
procedure for NAF fails on the query ← p. Ignoring, for the moment,
the construction of the set A, the computation is that shown inside the
outer double box of Figure 1 with the abductive and consistency phases
interchanged, i.e. the type of each box changed from a double box to a
single box and vice versa. Suppose now that we have the same program
and query but in an ALP setting where the predicate r is abducible. The
query will then succeed with the explanation Δ = {q*, r} as shown in
Figure 4. As before the computation arrives at a point where r needs to
be proved. Whereas this failed before, this succeeds now by abducing r.
Hence by adding the hypothesis r to the explanation we can ensure that

q* is acceptable.
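This behaviour can be mimicked on top of the propositional meta-interpreter sketched in Section 4.2 by two further clauses (again only an illustrative sketch, not the procedure of [Kakas and Mancarella, 1990a; Kakas and Mancarella, 1990b]): an abducible atom selected in an abductive phase is added to Delta unless its absence has already been recorded, and an abducible selected in a consistency phase fails by recording its absence.

    :- dynamic abducible/1.
    abducible(r).

    % (The earlier clause for failing an ordinary positive literal is
    %  assumed to be guarded with \+ abducible(A), so that abducibles are
    %  failed only by the clause below.)

    % Abductive phase: assume a selected abducible.
    demo([A|Goals], Delta0, Delta) :-
        abducible(A),
        ( member(A, Delta0) -> Delta1 = Delta0
        ; \+ member(not(A), Delta0), Delta1 = [A|Delta0]
        ),
        demo(Goals, Delta1, Delta).

    % Consistency phase: an abducible fails by recording its absence.
    fail_literal(A, Delta0, [not(A)|Delta0]) :-
        abducible(A),
        \+ member(A, Delta0).

    % ?- demo([p], [], Delta).       Delta = [r, not(q)], i.e. {q*, r}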
An important feature of the abductive proof procedures is that they
avoid performing a full general-purpose integrity check (such as the for-
ward reasoning procedure of [Kowalski and Sadri, 1988]). In the case of a
negative hypothesis, q* for example, a general-purpose forward reasoning
integrity check would have to use rules in the program such as p <- q* to
derive p. The optimised integrity check in the abductive proof procedure
avoids this inference and only reasons forward one step with the integrity
constraint -> (q A q*), deriving the resolvent 4— q, and then reasoning back-
ward from the resolvent.
Similarly, the integrity check for a positive hypothesis, r for example,
avoids reasoning forward with any rules which might have r in the body.
Indeed, in a case, such as Example 5.2.1 above, where there are no domain
specific integrity constraints, the integrity check for a positive abducible,
such as r, simply consists in checking that its complement, in our example
r*, does not belong to A.
To ensure that this optimised form of integrity check is correct, the
proof procedure is extended to record those positive abducibles it needs to
assume absent to show the integrity of other abducibles in A. So whenever a
positive abducible, which is not in A, is selected in a branch of a consistency
phase, the procedure fails on that branch and at the same time records
that this abducible needs to be absent. This extension is illustrated by the
following example.
Example 5.2.2. Consider the program

where r is abducible and the query is ← p (see Figure 5). The acceptability
of q* requires the absence of the abducible r. The simplest way to ensure
this is by adding r* to A. This, then, prevents the abduction of r and the
computation fails. Notice that the proof procedure does not reason forward
from r to test its integrity. This test has been performed backwards in the
earlier consistency phase for q*, and the addition of r* to A ensures that
it is not necessary to repeat it.
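For concreteness, one program that behaves in the way just described is, for instance (our illustration; the example's own clauses are not shown in this copy):

     p ← ~ q, r
     q ← r

With these clauses the query ← p first requires the acceptability of the hypothesis q*, whose consistency phase records that the abducible r must be absent (r* is added to A); the subsequent attempt to abduce r for the remaining condition of the first clause then fails.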
The way in which the absence of abducibles is recorded depends on how
the negation of abducibles is interpreted. Under the stable and generalised
stable model semantics, as we have assumed in Example 5.2.2 above, the
required failure of a positive abducible is recorded by adding its complement
to A. However, in general it is not always appropriate to assume that the
absence of an abducible implies its negation. On the contrary, it may be
appropriate to treat abducibles as open rather than closed (see Section 6.1),
and correspondingly to treat the negation of abducible predicates as open.

Fig. 5. Extended proof procedure for Example 5.2.2

As we shall argue later, this might be done by treating such a negation as
a form of explicit negation, which is also abducible. In this case recording
the absence of a positive abducible by adding its complement to A is too
strong, and we will use a separate (purely computational) data structure
to hold this information.
Integrity checking can also be optimised when there are domain specific
integrity constraints, provided the constraints can be formulated as de-
nials 15 containing at least one literal whose predicate is abducible. In this
case the abductive proof procedure needs only a minor extension [Kakas
and Mancarella, 1990b; Kakas and Mancarella, 1990c]: when a new hypoth-
esis is added to A, the proof procedure resolves the hypothesis against any
integrity constraint containing that hypothesis, and then reasons backward
from the resolvent. To illustrate this extension consider the following ex-
ample.
15
Notice that any integrity constraint can be transformed into a denial (possibly with
the introduction of new auxiliary predicates).

Fig. 6. Extended computation for Example 5.2.3

Example 5.2.3. Let the abductive framework be:

where a, b are abducible and the query is ← s (see Figure 6).
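An abductive framework giving rise to the computation described below is, for instance (our reconstruction; the framework itself is not shown in this copy):

     P:   s ← a
          p ← ~ q
          q ← b
     A = {a, b}
     I:   ¬[a ∧ p]
          ¬[a ∧ q]

With these clauses q* implies p and b implies q, which is what the discussion of the attacks below and in Section 5.3 relies upon.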


Assume that the integrity check for a is performed Prolog-style, by re-
solving first with the first integrity constraint and then with the second.
The first integrity constraint requires the additional hypothesis b as shown
in the innermost single box. The integrity check for b is trivial, as b appears
only in the integrity constraint ¬[b ∧ b*] in I*, and the goal ← b* trivially
fails, given A = {a, b} (innermost double box). But A = {a, b} violates
the integrity constraints, as can be seen by reasoning forward from b to q
and then resolving with the second integrity constraint ¬[a ∧ q]. However,
the proof procedure does not perform this forward reasoning and does not
detect this violation of integrity at this stage. Nevertheless the proof pro-
cedure is sound because the violation is found later by backward reasoning
when a is resolved with the second integrity constraint.
In summary, the overall effect of additional integrity constraints is to
increase the size of the search space during the consistency phase, with no
significant change to the basic structure of the backward reasoning proce-
dure.
Even if the absence of abducibles is not identified with the presence of
their complement, the abductive proof procedure [Kakas and Mancarella,
1990a; Kakas and Mancarella, 1990b; Kakas and Mancarella, 1990c] de-
scribed above suffers from the same soundness problem shown in Section 4
for the abductive proof procedure for NAF. This problem can be solved sim-
ilarly, by replacing stable models with any of the non-total semantics for
NAF mentioned in Section 4 (partial stable models, preferred extensions,
stable theories or acceptability semantics). Replacing the stable models
semantics by any of these semantics requires that the notion of integrity
satisfaction be revised appropriately. This is an interesting problem for
future work.
The soundness problem can also be addressed by providing an argument-
ation-theoretic semantics for ALP which treats integrity constraints and
NAF uniformly via an appropriately extended notion of attack. In Sec-
tion 5.3 we will see that this alternative approach arises naturally from an
argumentation-theoretic re-interpretation of the abductive proof procedure
for ALP.
The proof procedure can be also modified to provide a sound com-
putational mechanism for the generalised stable model semantics. This
approach has been followed by Satoh and Iwayama [Satoh and Iwayama,
1991], as we illustrate in Section 5.4.

5.3 An argumentation-theoretic interpretation of the


abductive proof procedure for ALP
Similarly to the LP case, the abductive proof procedure for ALP can be
reinterpreted in argumentation-theoretic terms. For the ALP procedure,
attacks can be provided as follows:
• via NAF:
Relative to (P*, A ∪ A*, I ∪ I*), E attacks A via NAF if
E attacks A as in Section 4.3, i.e. P* ∪ E ⊢ p for some p* ∈ A,
or
a* is in E, for some abducible a in A;
• via integrity constraints:
Relative to (P*, A ∪ A*, I ∪ I*), E attacks A via an integrity con-
straint ¬(L1 ∧ ... ∧ Ln) in I if P* ∪ E ⊢ L1, ..., Li-1, Li+1, ..., Ln,
for some Li in A. 16

16
Recall that the abductive proof procedure for ALP employs the restriction that each
integrity constraint contains at least one literal with an abducible predicate.
To illustrate the argumentation-theoretic interpretation of the proof
procedure for ALP, consider again Figure 6 of Example 5.2.3. The consis-
tency phase for a, shown in the outer double box, can be understood as
searching for attacks against {a}. There are two such attacks, {q*} and
{b}, shown by the two branches in the figure. {q*} attacks {a} via the in-
tegrity constraint ¬(a ∧ p) in I, since q* implies p. Analogously, {b} attacks
{a} via the integrity constraint ¬(a ∧ q) in I, since b implies q. The first
attack {q*} is counterattacked by {b}, via NAF (as in Section 4.3), since
this implies q. This is shown in the single box. The hypothesis b is added to
A since the attack {b*} against {b}, via NAF, is trivially counterattacked
by {b}, via NAF, as sketched in the inner double box. However, {b} attacks
{a}, as shown by the right branch in the outer double box. Therefore, A
attacks itself, and this causes failure of the proof procedure.
The analysis of the proof procedure in terms of attacks and counterat-
tacks suggests the following argumentation-theoretic semantics for ALP. A
set of hypotheses A is KM-admissible if
• for every attack E against A,
A attacks (E - A) via NAF alone.
In Section 6.5 we will see that the notion of KM-admissible set of hy-
potheses is similar to the notion of admissibility proposed by Dung [Dung,
1993b] for extended logic programming, in that only attacks via NAF are
allowed to counterattack.
The argumentation-theoretic interpretation of ALP suggests several
ways in which the semantics and proof procedure for ALP can be mod-
ified. Firstly, the notion of attack itself can be modified, e.g. following
Torres' equivalent formulation of the stable theory semantics [Torres, 1993]
(see Section 4.3). Secondly, the notion of admissibility can be changed to
allow counterattacks via integrity constraints, as well as via NAF. Finally,
as in the case of standard LP, the notion of admissibility can be replaced
by other semantic notions such as weak stability and acceptability (see
Section 4.3). The proof procedure for ALP can be modified appropriately
to reflect each of these modifications. Such modifications of the semantics
and the corresponding modifications of the proof procedure require further
investigation.
Using the definition of well-founded semantics given in Section 4.3, (non-
default) abducibles are always undefined, and consequently fulfill no func-
tion, in the well-founded semantics of ALP, as illustrated by the following
example.
Example 5.3.1. Consider the propositional abductive framework (P, A,
I) where P is

A = {a}, and I = ∅. The well-founded model of (P, A, I) is ∅.


In [Pereira et al., 1991], Pereira, Aparicio and Alferes define an alterna-
tive, generalised well-founded semantics for ALP where first programs are
extended by a set of abducibles as in the case of generalised stable models,
and then the well-founded semantics (rather than stable model semantics)
is applied to the extended programs. As a result, the well-founded mod-
els of an abductive framework are not unique. In the example above, ∅,
{p*,a*} and {p,a} are the generalised well-founded models of (P, A, I).
Note that in this application of the well-founded semantics, if an abducible
is not in a set of hypotheses A then its negation does not necessarily belong
to A. Thus the negation of an abducible is not interpreted as NAF. More-
over, since abducible predicates can be undefined some of the non-abducible
predicates can also be undefined.

5.4 Computation of abduction through TMS


Satoh and Iwayama [Satoh and Iwayama, 1991] present a method for com-
puting generalised stable models for logic programs with integrity con-
straints represented as denials. The method is a bottom-up computation
based upon the TMS procedure of [Doyle, 1979]. Although the computa-
tion is not goal-directed, goals (or queries) can be represented as denials
and be treated as integrity constraints.

Compared with other bottom-up procedures for computing generalised sta-
ble model semantics, which first generate stable models and then test the
integrity constraints, the method of Satoh and Iwayama dynamically uses
the integrity constraints during the process of generating the stable models,
in order to prune the search space more efficiently.
Example 5.4.1. Consider the program P

and the set of integrity constraints I = {¬p}. P has two stable models
M1 = {p, q} and M2 = {r}, but only M2 satisfies I. The proof procedure
of [Satoh and Iwayama, 1991] deterministically computes only the intended
model M2, without first computing and then rejecting M1.
In Section 8 we will see more generally that truth maintenance systems
can be regarded as a form of ALP.

5.5 Simulation of abduction


Satoh and Iwayama [Satoh and Iwayama, 1991] also show that an abductive
logic program can be transformed into a logic program without abducibles
but where the integrity constraints remain. For each abducible predicate
p in A, a new predicate p' is introduced, which intuitively represents the
complement of p, and a new pair of clauses 17

is added to the program. In effect abductive assumptions of the form
p(t) are thereby transformed into NAF assumptions of the form ~ p'(t).
Satoh and Iwayama apply the generalised stable model semantics to the
transformed program. However, the transformational semantics, which is
effectively employed by Satoh and Iwayama, has the advantage that any
semantics can be used for the resulting transformed program.
Example 5.5.1. Consider the abductive framework (P, A, I) of exam-
ple 5.1.1. The transformation generates a new theory P' with the additional
clauses

P' has two generalised stable models that satisfy the integrity constraints,
namely M1' = M(A1) ∪ {b'} = {a, p, b'}, and M2' = M(A2) = {a, b, p, q},
where M(A1) and M(A2) are the generalised stable models seen in Exam-
ple 5.1.1.
An alternative way of viewing abduction, which emphasises the defea-
sibility of abducibles, is retractability [Goebel et al., 1986]. Instead of
regarding abducibles as atoms to be consistently added to a theory, they can
be considered as assertions in the theory to be retracted in the presence of
contradictions until consistency (or integrity) is restored (c.f. Section 6.2).
One approach to this understanding of abduction is presented in [Kowal-
ski and Sadri, 1988]. Here, Kowalski and Sadri present a transformation
from a general logic program P with integrity constraints I, together with
some indication of how to restore consistency, to a new general logic pro-
gram P' without integrity constraints. Restoration of consistency is indi-
cated by nominating one atom as retractable in each integrity constraint.18
Integrity constraints are represented as denials, and the atom to be re-
tracted must occur positively in the integrity constraint. The (informally
specified) semantics is that whenever an integrity constraint of the form

17
Satoh and Iwayama use the notation p* instead of p' and explicitly consider only
propositional programs.
18
Many different atoms can be retractable in the same integrity constraint. Alterna-
tive ways of nominating retractable atoms correspond to alternative ways of restoring
consistency in P.

has been violated, where the atom p has been nominated as retractable,
then consistency should be restored by retracting the instance of the clause
of the form

which has been used to derive the inconsistency.


The transformation of [Kowalski and Sadri, 1988] replaces a program P
with integrity constraints I by a program P' without integrity constraints
which is always consistent with I; and if P is inconsistent with I, then P'
represents one possible way to restore consistency (relative to the choice of
the retractable atom).
Given an integrity constraint of the form

where p is retractable, the transformation replaces the integrity constraints


and every clause of the form

by

where the condition ~ q may need to be transformed further, if necessary,
into general logic program form, and where the transformation needs to be
repeated for every integrity constraint. Kowalski and Sadri show that if
P is a stratified program with appropriately stratified integrity constraints
I, so that the transformed program P' is stratified, then P' computes the
same consistent answers as P with I.
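Schematically (our sketch of the forms referred to above), in the simplest case the transformation takes an integrity constraint

     ¬[p ∧ q]     (with p retractable)

and replaces it, together with every clause of the form

     p ← r,

by the clause

     p ← r, ~ q

where, as noted above, the new condition ~ q may itself require further transformation into general logic program form.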
Notice that retracting abducible hypotheses is a special case where the
abducibility of a predicate a is represented by an assertion

The following example illustrates the behaviour of the transformation when
applied to ALP.
Example 5.5.2. Consider the simplified version of the event calculus
presented in example 2.0.1. If the integrity constraint

¬[persists(T1, P, T2) ∧ happens(E, T) ∧ terminates(E, P) ∧ T1 < T < T2]

is violated, then it is natural to restore integrity by retracting the instance
of persists(T1, P, T2) that has led to the violation. Thus, persists(T1, P, T2)
is the retractible in this integrity constraint. By applying the transforma-
tion sketched above, the integrity constraint and the use of abduction can
be replaced by the clauses obtained by further transforming

persists(T1, P, T2) ← ~ (happens(E, T), terminates(E, P), T1 < T < T2)

into general LP form.
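For instance, introducing an auxiliary predicate (the name clipped is ours, chosen purely for illustration), the result of this further transformation can be written as

     persists(T1, P, T2) ← ~ clipped(T1, P, T2)
     clipped(T1, P, T2) ← happens(E, T), terminates(E, P), T1 < T, T < T2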


One problem with the retractability semantics is that the equivalence of
the original program with the transformed program was proved only in the
case where the resulting transformed program is locally stratified. Moreover
the proof of equivalence was based on a tedious comparison of search spaces
for the two programs. This problem was addressed in a subsequent paper
[Kowalski and Sadri, 1990] in which integrity constraints are re-expressed
as extended clauses and the retractable atoms become explicitly negated
conclusions. This use of extended clauses in place of integrity constraints
with retractibles is discussed later in Section 6.3.
The transformation of [Kowalski and Sadri, 1988], applied to ALP,
treats all abducibles as default abducibles. In particular, abducibles which
do not occur as retractibles in integrity constraints are simply asserted
in the transformed program P'. Therefore, this transformation can only
be used to eliminate default abducibles together with their integrity con-
straints. A more complete transformation [Toni and Kowalski, 1995] can
be obtained by combining the use of retractibles to eliminate integrity con-
straints with the transformation of [Satoh and Iwayama, 1991] for reducing
non-default abducibles to NAF. The new transformation is defined for ab-
ductive frameworks where every integrity constraint has a retractible which
is either an abducible or the NAF of an abducible.
As an example, consider the propositional abductive logic program
(P, A, I) where P contains the clause

a is in A, and I contains the integrity constraint

where a is retractible. If a is a default abducible, the transformation gen-
erates the logic program P'

where, as before, a' stands for the complement of a. The first clause in P' is
obtained by replacing the positive condition a in the clause in P by the NAF
literal ~ a'. The second clause replaces the integrity constraint in I. Note
that this replaces "a should be retracted" if the integrity constraint ¬[a ∧ q]
is violated by "the complement a' of a should be asserted". Finally, the
last clause in P' expresses the nature of a as a default abducible. Namely,
a holds by default, unless some integrity constraint is violated. In this
example, a holds if q does not hold.
If a is a non-default abducible, then the logic program P' obtained
by transforming the same abductive program (P, A, I) also contains the
fourth clause

that, together with the third clause, expresses that neither a nor a' need
hold, even if no integrity constraint is violated. Note that the last two
clauses in P' are those used by Satoh and Iwayama [Satoh and Iwayama,
1991] to simulate non-default abduction by means of NAF.
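Putting the pieces together, if the clause in P is, say, p ← a (the head p is our choice of name for the undisplayed clause), then the transformed program P' sketched above consists of the clauses

     p ← ~ a'
     a' ← q
     a ← ~ a'

in the default case, together with the additional clause

     a' ← ~ a

when a is a non-default abducible.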
Toni and Kowalski [Toni and Kowalski, 1995] prove that the transfor-
mation is correct and complete in the sense that there is a one-to-one cor-
respondence between attacks in the framework (P, A, I) and in the frame-
work corresponding to the transformed program P'. Thus, for any seman-
tics that can be defined argumentation-theoretically there is a one-to-one
correspondence between the semantics for an abductive logic program and
the semantics of the transformed program. As a consequence, any proof
procedure for LP which is correct for one of these semantics provides a
correct proof procedure for ALP for the analogous semantics (and, less
interestingly, vice versa).
In addition to the transformations from ALP to general LP, discussed
above, transformations between ALP and disjunctive logic programming
(DLP) have also been investigated. Inoue et al. [Inoue et al., 1992a],19 in
particular, translate ALP clauses of the form

where a is abducible, into DLP clauses

where a' is a new atom that stands for the complement of a, as expressed
by the integrity constraint

(5.1)

A model generation theorem-prover (such as SATCHMO or MGTP [Fujita
and Hasegawa, 1991]) can then be applied to compute all the minimal
models that satisfy the integrity constraints (5.1). This transformation is
related to a similar transformation [Inoue et al., 1992] for eliminating NAF.
Elsewhere [Sakama and Inoue, 1994], Sakama and Inoue demonstrate a
19
A description of this work can also be found in [Hasegawa and Fujita, 1992].
one-to-one correspondence between generalised stable models for ALP and
possible models [Sakama and Inoue, 1993] for DLP. Consider, for example,
the abductive logic program (P, A, I) where P is

A = {a} and I is empty. M1 = ∅ and M2 = {a, p} are the generalised stable
models of (P, A, I). The program can be transformed into a disjunctive
logic program PD

PD has possible models M1' = {e}, M2' = {a, p} and M3' = {e, a, p}, such
that M1' − {e} = M1 and M2' − {e} = M3' − {e} = M2.
Conversely, [Sakama and Inoue, 1994] shows how to transform DLP
programs into ALP. For example, consider the disjunctive logic program
PD

whose possible models are M1 = {c,a}, M2 = {c,b} and M3 = {c,a,b}.


It can be transformed into an abductive logic program (P, A, I) where P
consists of

a' and b' are new atoms, A = {a',b'}, and I consists of

(P, A, I) has generalised stable models M1' = {c, a, a'}, M2' = {c, b, b'}
and M3' = {c, a, a', b, b'}, such that, if HB is the Herbrand base of PD,
Mi' ∩ HB = Mi, for each i = 1, 2, 3.
Whereas the transformation of [Sakama and Inoue, 1994] deals with
inclusive disjunction, Dung [Dung, 1992] presents a simpler transformation
that deals with exclusive disjunction, but works only for the case of acyclic
programs. For example, the clause

can be replaced by the two clauses

With this transformation, for acyclic programs, the Eshghi-Kowalski pro-
cedure presented in Section 4.2 is sound. For the more general case, Dung
[Dung, 1992a] represents disjunction explicitly and extends the Eshghi-
Kowalski procedure by using resolution-based techniques similar to those
employed in [Finger and Genesereth, 1985].
5.6 Abduction through deduction from the completion
In the approaches presented so far, hypotheses are generated by backward
reasoning with the clauses of logic programs used as inference rules. An
alternative approach is presented by Console, Dupre and Torasso [Console
et al., 1991]. Here clauses of programs are interpreted as if-halves of if-and-
only-if definitions that are obtained from the completion of the program
[Clark, 1978] restricted to non-abducible predicates. Abductive hypotheses
are generated deductively by replacing atoms by their definitions, starting
from the observation to be explained.
Given a propositional logic program P with abducible predicates A
without definitions in P, let PC denote the completion of the non-abducible
predicates in P. An explanation formula for an observation O is the most
specific formula F such that

where F is formulated in terms of abducible predicates only, and F is more
specific than F' iff ⊨ F → F' and ⊭ F' → F.
Based on this specification, a proof procedure that generates explana-
tion formulas is defined. This proof procedure replaces atoms by their
definitions in PC, starting from a given observation O. Termination and
soundness of the proof procedure are ensured for hierarchical programs.
The explanation formula resulting from the computation characterises all
the different abductive explanations for O, as exemplified in the following
example.
Example 5.6.1. Consider the following program P:
wobbly-wheel ← broken-spokes
wobbly-wheel ← flat-tyre
flat-tyre ← punctured-tube
flat-tyre ← leaky-valve,
where the predicates without definitions are considered to be abducible.
The completion PC is:
wobbly-wheel ↔ broken-spokes ∨ flat-tyre
flat-tyre ↔ punctured-tube ∨ leaky-valve.

If O is wobbly-wheel then the most specific explanation F is

broken-spokes ∨ punctured-tube ∨ leaky-valve,

corresponding to the abductive explanations {broken-spokes}, {punctured-tube} and {leaky-valve}.
Console, Dupre and Torasso extend this approach to deal with propo-
sitional abductive logic programs with integrity constraints I in the form
of denials of abducibles and of clauses expressing taxonomic relationships
among abducibles. An explanation formula for an observation O is now de-
fined to be the most specific formula F, formulated in terms of abducible
predicates only, such that

The proof procedure is extended by using the denial and taxonomic in-
tegrity constraints to simplify F.
In the more general case of non-propositional abductive logic programs,
the Clark equality theory CET [Clark, 1978] is used; the notion that F
is more specific than F' requires that F → F' be a logical consequence
of CET and that F' → F not be a consequence of CET. The explana-
tion formula is unique up to equivalence with respect to CET. The proof
procedure is extended to take into account the equality theory CET.
Denecker and De Schreye [Denecker and De Schreye, 1992a] compare
the search space obtained by reasoning backward using the if-half of the
if-and-only-if form of a definite program with that obtained by reasoning
forward using the only-if half. They show an equivalence between the search
space for SLD-resolution extended with abduction and the search space for
model generation with SATCHMO [Manthey and Bry, 1988] augmented
with term rewriting to simulate unification.

5.7 Abduction and constraint logic programming


ALP has many similarities with constraint logic programming (CLP). Recog-
nition of these similarities has motivated a number of recent proposals to
unify the two frameworks.
Both frameworks distinguish two kinds of predicates. The first kind is
defined by ordinary LP clauses, and is eliminated during query evaluation.
The second kind is "constrained" either by integrity constraints in the case
of ALP or by means of a built-in semantic domain in the case of CLP. In
both cases, an answer to a query is a "satisfiable" formula involving only
the second kind of predicate.
Certain predicates, such as inequality, can be treated either as abducible
or as constraint predicates. Treated as abducible, they are constrained by
explicitly formulated integrity constraints such as

Treated as constraint predicates, they are tested for satisfiability by us-
ing specialised algorithms which respect the semantics of the underlying
domain. Constraints can also be simplified, replacing, for example,

by

Such simplification is less common in abductive frameworks.
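For illustration (the examples are ours): treated as an abducible predicate, < might be constrained by explicitly formulated integrity constraints such as

     ¬[X < X]
     ¬[X < Y ∧ Y < X]

whereas, treated as a constraint predicate, a conjunction such as 3 < Y ∧ 5 < Y would simply be simplified by the constraint solver to 5 < Y.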


A number of proposals have been made recently to unify the treatment
of abducibles and constraints. Several of these, [Eshghi, 1988; Shanahan,
1989; Maim, 1992; Kakas and Michael, 1993] in particular, have investi-
gated the implementation of specialised constraint satisfaction and simpli-
fication algorithms of CLP (specifically for inequality) by means of general-
purpose integrity checking methods applied to domain-specific integrity
constraints as in the case of ALP.

Kowalski [Kowalski, 1992a] proposes a general framework which at-
tempts to unify ALP and CLP using if-and-only-if definitions for ordinary
LP predicates and using integrity constraints for abducible and constraint
predicates. Abduction is performed by means of deduction in the style of
[Console et al., 1991] (see Section 5.6). This framework has been developed
further by Fung [Fung, 1993] and has been applied to job-shop scheduling
by Toni [Toni, 1994]. A related proposal, to include user-defined constraint
handling rules within a CLP framework, has been made by Frühwirth
[Frühwirth, 1992].
Bürckert [Bürckert, 1994] and Bürckert and Nutt [Bürckert and Nutt,
1991], on the other hand, define a framework for general clausal resolution
and show how abduction without integrity constraints can be treated as a
special case of constrained resolution.
Another approach, which integrates both frameworks while preserv-
ing their identity, has been developed by Kakas and Michael [Kakas and
Michael, 1995]. In this approach, the central notions of the two frame-
works are combined, so that abduction and constraint handling cooperate
to solve a common goal. Typically, the goal is reduced first by abduction
to abducible hypotheses whose integrity checking reduces this further to a
set of constraints to be satisfied in CLP.
Constructive abduction is the generation of non-ground abductive ex-
planations, such as A = {∃X a(X)}. The integrity checking of such ab-
ducible hypotheses involves the introduction of equality assumptions, which
can naturally be understood in CLP terms. A procedure for performing
constructive abduction within a framework that treats equality as an ab-
ducible predicate and the Clark equality theory as a set of integrity con-
straints was first proposed by Eshghi [Eshghi, 1988]. Building upon this
proposal, Kakas and Mancarella [Kakas and Mancarella, 1993a] extend the
abductive proof procedure for LP in [Eshghi and Kowalski, 1989] (see Sec-
tion 4.2) to combine constructive negation with constructive abduction in
a uniform way, by reducing the former to the latter using the abductive
interpretation of NAF.
The problem of constructive abduction has also been studied within
the completion semantics. Denecker and De Schreye [1992b] define a proof
procedure for constructive abduction, SLDNFA, which they show is sound
and complete. Teusink [1993] extends Drabent's [1995] procedure, SLDNA,
for constructive negation to perform constructive abduction and uses three-
valued semantics to show soundness and completeness. In both proposals,
[Denecker and De Schreye, 1992b] and [Teusink, 1993], integrity constraints
are dealt with by means of a transformation, rather than explicitly.

6 Extended logic programming


Extended logic programming (ELP) extends general LP by allowing, in
addition to NAF, a second, explicit form of negation. Explicit negation
can be used, when the definition of a predicate is incomplete, to explicitly
define negative instances of the predicate, instead of having them inferred
implicitly using NAF.

Clauses with explicit negation in their conclusions can also serve a sim-
ilar function to integrity constraints with retractibles. For example, the
integrity constraint

¬[persists(T1, P, T2) ∧ happens(E, T) ∧ terminates(E, P) ∧ T1 < T < T2]

with persists(T1, P, T2) retractible can be reformulated as a clause with
explicit negation in the conclusion

¬persists(T1, P, T2) ← happens(E, T), terminates(E, P), T1 < T < T2.

6.1 Answer set semantics


In general logic programs, negative information is inferred by means of
NAF. This is appropriate when the closed world assumption [Reiter, 1978],
that the program gives a complete definition of the positive instances of a
predicate, can safely be applied. It is not appropriate when the definition of
a predicate is incomplete and therefore "open", as in the case of abducible
predicates.

For open predicates it is possible to extend logic programs to allow explicit
negation in the conclusions of clauses. In this section we will discuss the
extension proposed by Gelfond and Lifschitz [Gelfond and Lifschitz, 1990].
This extension is based on the stable model semantics, and can be under-
stood, therefore, in terms of abduction, as we have already seen.

Gelfond and Lifschitz define the notion of extended logic programs,
consisting of clauses of the form:

L0 ← L1, ..., Lm, ~ Lm+1, ..., ~ Ln
where n ≥ m ≥ 0 and each Li is either an atom (A) or the explicit nega-
tion of an atom (¬A). This negation, denoted by "¬", is called "classical
negation" in [Gelfond and Lifschitz, 1990]. However, as we will see be-
low, because the contrapositives of extended clauses do not hold, the term
"classical negation" can be regarded as inappropriate. For this reason we
use the term "explicit negation" instead.
A similar notion has been investigated by Pearce and Wagner [Pearce
and Wagner, 1991], who develop an extension of definite programs by means
of Nelson's strong negation. They also suggest the possibility of combining
strong negation with NAF. Akama [Akama, 1992] argues that the seman-
tics of this combination of strong negation with NAF is equivalent to the
answer set semantics for extended logic programs developed by Gelfond
and Lifschitz.
The semantics of an extended program is given by its answer sets, which
are like stable models but consist of both positive and (explicit) negative lit-
erals. Perhaps the easiest way to understand the semantics is to transform
the extended program P into a general logic program P' without explicit
negation, and to apply the stable model semantics to the resulting general
logic program. The transformation consists in replacing every occurrence
of explicit negation ¬p(t) by a new (positive) atom p'(t). The stable mod-
els of P' which do not contain a contradiction of the form p(t) and p'(t)
correspond to the consistent answer sets of P. The corresponding an-
swer sets of P contain explicit negative literals ¬p(t) wherever the stable
models contain p'(t). In [Gelfond and Lifschitz, 1990] the answer sets are
defined directly on the extended program by modifying the definition of the
stable model semantics. The consistent answer sets of P also correspond to
the generalised stable models (see Section 5.1) of P' with a set of integrity
constraints ∀X ¬[p(X) ∧ p'(X)], for every predicate p.
In the general case a stable model of P' might contain a contradiction
of the form p(t) and p'(t). In this case the corresponding inconsistent
answer set is defined to be the set of all the variable-free (positive and
explicit negative) literals. It is in this sense that explicit negation can
be said to be "classical". The same effect can be obtained by explicitly
augmenting P' by the clauses

for all predicate symbols q and p in P'. Then the answer sets of P simply
correspond to the stable models of the augmented set of clauses. If these
clauses are not added, then the resulting treatment of explicit negation
gives rise to a paraconsistent logic, i.e. one in which contradictions are
tolerated.
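The clauses by which P' would be augmented can be taken to be of the form

     q(Y1, ..., Yk) ← p(X1, ..., Xn), p'(X1, ..., Xn)

one for each pair of predicate symbols q and p in P' (our rendering of the schema referred to above): they make every variable-free atom derivable as soon as a contradictory pair p(t), p'(t) is derived.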
Notice that, although Gelfond and Lifschitz define the answer set se-
mantics directly without transforming the program and then applying the
stable model semantics, the transformation can also be used with any
other semantics for the resulting transformed program. Thus Przymusin-
ski [Przymusinski, 1990] for example applies the well-founded semantics
to extended logic programs. Similarly any other semantics can also be
applied. As we have seen before, this is one of the main advantages of
transformational semantics in general.
An important problem for the practical use of extended programs is how
to distinguish whether a negative condition is to be interpreted as explicit
negation or as NAF. This problem will be addressed in Sections 6.4 and 9.

6.2 Restoring consistency of answer sets


The answer sets of an extended program are not always consistent.
Example 6.2.1. The extended logic program:

has no consistent answer set.
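One extended logic program with this property, consistent with the discussion that follows (our reconstruction), is

     fly(X) ← bat(X)
     ¬fly(X) ← ~ bird(X)
     bat(tom)

Since bird(tom) is not derivable, the NAF assumption ~ bird(tom) yields ¬fly(tom), while the first clause yields fly(tom).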


As mentioned in Section 6.1, this problem can be dealt with by employ-
ing a paraconsistent semantics. Alternatively, in some cases it is possible
to restore consistency by removing some of the NAF assumptions implicit
in the answer set. In the example above we can restore consistency by
rejecting the NAF assumption ~ bird(tom) even though bird(tom) does
not hold. We then get the consistent set {bat(tom), fly(tom)}. This prob-
lem has been studied in [Dung and Ruamviboonsuk, 1991] and [Pereira et
al., 1991a]. Both of these studies are primarily concerned with the related
problem of inconsistency of the well-founded semantics when applied to
extended logic programs [Przymusinski, 1990].
To deal with the problem of inconsistency in extended logic programs,
Dung and Ruamviboonsuk [Dung and Ruamviboonsuk, 1991] apply the
preferred extension semantics to a new abductive framework derived from
an extended logic program. An extended logic program P is first trans-
formed into an ordinary general logic program P' by renaming explicitly
negated literals ¬p(t) by positive literals p'(t). The resulting program is
then further transformed into an abductive framework by renaming NAF
literals ~ q(t) by positive literals q* (t) and adding the integrity constraints

as described in Section 4.3. Thus if p' expresses the explicit negation of
p, the set A* will contain p'* as well as p*. Moreover Dung includes in I*
additional integrity constraints of the form

to prevent contradictions.
Extended preferred extensions are then defined by modifying the defini-
tion of preferred extensions in Section 4 for the resulting abductive frame-
work with this new set I* of integrity constraints. The new integrity con-
straints in /* have the effect of removing a NAF hypothesis when it leads
to a contradiction. Clearly, any other semantics for logic programs with
integrity constraints could also be applied to this framework.
Pereira, Aparicio and Alferes [Pereira et al., 1991a] employ a similar
approach within the context of Przymusinski's extended stable models
[Przymusinski, 1990]. It consists in identifying explicitly all the possible
sets of NAF hypotheses which lead to an inconsistency and then restoring
consistency by removing at least one hypothesis from each such set. This
method can be viewed as a form of belief revision, where if inconsistency
can be attributed to an abducible hypothesis or a retractable atom (see
Section 5.5), then we can reject the hypothesis to restore consistency. In
fact Pereira, Aparicio and Alferes have also used this method to study
counterfactual reasoning [Pereira et al., 1991c]. Alferes and Pereira [Alferes
and Pereira, 1993] have shown that this method of restoring consistency
can also be viewed in terms of inconsistency avoidance.
This method [Pereira et al., 1991a] is not able to restore consistency in
all cases, as illustrated by the following example.
Example 6.2.2. Given the extended logic program

the method of [Pereira et al., 1991a] is unable to restore consistency by
withdrawing the hypothesis p*.
In [Pereira et al., 1992] and [Pereira et al., 1993], Pereira and Alferes
present two different modifications of the method of [Pereira et al., 1991a]
to deal with this problem. For the program in Example 6.2.2, the method
in [Pereira et al., 1992] restores consistency by leaving p undefined, while
the method in [Pereira et al., 1993] restores consistency by assigning p the
value true. This second method is more suitable for diagnosis applications.
Both methods, [Dung and Ruamviboonsuk, 1991] and [Pereira et al.,
1991a; Pereira et al., 1992; Pereira et al., 1993], can deal only with in-
consistencies that can be attributed to NAF hypotheses, as shown by the
following example.
Example 6.2.3. It is not possible to restore consistency by removing
NAF hypotheses given the program:

However, Inoue [Inoue, 1994; Inoue, 1991] suggests a general method
for restoring consistency, which is applicable to this case. This method (see
also Section 6.3) is based on [Geffner, 1990] and [Poole, 1988] and consists
in isolating inconsistencies by finding maximally consistent subprograms.
In this approach a knowledge system is represented by a pair (P, H), where:
1. P and H are both extended logic programs,
2. P represents a set of facts,
3. H represents a set of assumptions.
The semantics is given using abduction as in [Poole, 1988] (see Section 3)
by means of theory extensions P ∪ A of P, with A ⊆ H maximal with
respect to set inclusion, such that P ∪ A has a consistent answer set.
In this approach, whenever the answer set of an extended logic program
P is inconsistent, it is possible to restore consistency by regarding it as a
knowledge system of the form

(∅, P).

For Example 6.2.3 this will give two alternative semantics, {p} or {¬p}.
A similar approach to restoring consistency follows also from the work
in [Kakas, 1992; Kakas et al., 1994] (see Section 7), where argumentation-
based semantics can be used to select acceptable (and hence consistent)
subsets of an inconsistent extended logic program.

6.3 Rules and exceptions in LP


Another way of restoring consistency of answer sets is presented in [Kowal-
ski and Sadri, 1990], where sentences with explicitly negated conclusions
are given priority over sentences with positive conclusions. In this ap-
proach, extended clauses with negative conclusions are similar to integrity
constraints with retractibles.
Example 6.3.1. Consider the program

and the integrity constraint

with fly(X) retractable. The integrity constraint is violated, because both
walk(john) and fly(john) hold. Following the approach presented in Sec-
tion 5.5, integrity can be restored by retracting the instance

of the first clause in the program. Alternatively, the integrity constraint
can be formulated as a clause with an explicit negative conclusion

In the new formulation it is natural to interpret clauses with negative
conclusions as exceptions, and clauses with positive conclusions as default
rules. In this example, the extended clause

can be interpreted as an exception to the "general" rule


To capture the intention that exceptions should override general rules,
Kowalski and Sadri [Kowalski and Sadri, 1990] modify the answer set se-
mantics, so that instances of clauses with positive conclusions are retracted
if they are contradicted by explicit negative information.

Kowalski and Sadri [Kowalski and Sadri, 1990] also present a transfor-
mation, which preserves the new semantics, and is arguably a more elegant
form of the transformation presented in [Kowalski and Sadri, 1988] (see
Section 5.5). In the case of the flying-birds example described above the
new transformation gives the clause

This can be further transformed by "macroprocessing" the call to ¬fly(X),
giving the result of the original transformation in [Kowalski and Sadri, 1988]

In general, the new transformation introduces a new condition ~ ¬p(t)
into every clause with a positive conclusion p(t). The condition is vacuous
if there are no exceptions with ¬p in the conclusion. The answer set
semantics of the new program is equivalent to the modified answer set
semantics of the original program, and both are consistent. Moreover,
the transformed program can be further transformed into a general logic
program by renaming explicit negations ->p by new positive predicates p'.
Because of this renaming, positive and negative predicates can be handled
symmetrically, and therefore, in effect, clauses with positive conclusions can
represent exceptions to rules with (renamed) negative conclusions. Thus,
for example, a negative rule such as

with a positive exception

can be transformed into a clause

and all occurrences of the negative literal ¬fly(X) can be renamed by a
new positive literal fly'(X). This is not entirely adequate for a proper
treatment of exceptions to exceptions. However, this approach can be
extended, as we shall see in Section 6.6.
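Schematically (our sketch of the clauses referred to in this paragraph, with penguin and superpenguin as illustrative predicate names): a negative rule with a positive exception, such as

     ¬fly(X) ← penguin(X)
     fly(X) ← superpenguin(X)

is transformed into the clause

     ¬fly(X) ← penguin(X), ~ fly(X)

and renaming ¬fly by the new positive predicate fly' then gives

     fly'(X) ← penguin(X), ~ fly(X).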
More direct approaches to the problem of treating positive and negative
predicates symmetrically in default reasoning are presented in [Inoue, 1994;
Inoue, 1991], following the methods of [Geffner, 1990] and [Poole, 1988] (see
Section 6.2 for a discussion), and in [Kakas, 1992; Kakas et al., 1994], based
on an argumentation-theoretic framework (see Sections 6.4 and 7).

6.4 (Extended) Logic Programming without Negation as Failure
Kakas, Mancarella and Dung [Kakas et al., 1994] show that the Kowalski-
Sadri transformation presented in Section 6.3 can be applied in the reverse
direction, to replace clauses with NAF by clauses with explicit negation
together with a priority ordering between extended clauses. Thus, for ex-
ample,

can be transformed "back" to

together with an ordering that indicates that the second clause has priority
over the first. In general, the extended clauses

generated by transforming the clause

are ordered so that rj > r for 1 ≤ j ≤ k. In [Kakas et al., 1994], the result-
ing prioritised clauses are formulated in an ELP framework (with explicit
negation) without NAF but with an ordering relation on the clauses of the
given program.
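In schematic form, the transformation just described replaces a general program clause

     p ← q1, ..., qn, ~ s1, ..., ~ sk

by the extended clauses

     r:   p ← q1, ..., qn
     rj:  ¬p ← sj          (1 ≤ j ≤ k)

with the ordering rj > r, so that each clause derived from a NAF condition has priority over the clause that retains the positive conditions. For example (our illustration), fly(X) ← bird(X), ~ walk(X) becomes fly(X) ← bird(X) and ¬fly(X) ← walk(X), with the second clause given priority over the first.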

This new framework for ELP is proposed in [Kakas et al., 1994] as an exam-
ple of a general theory of the acceptability semantics (see Section 4.3) devel-
oped within the argumentation-theoretic framework introduced in [Kakas
et al., 1993] (see Section 7). Its semantics is based upon an appropriate no-
tion of attack between subtheories consisting of partially ordered extended
clauses in a theory T. Informally, for any subsets E and A of T such that
E ∪ A has a contradictory consequence, E attacks A if and only if either
E does not contain a clause which is lower than some clause in A or if E
does contain such a clause, it also contains some clause which is higher than
a clause in A. Thus, the priority ordering is used to break the symmetry
between the incompatible sets E and A. Hence in the example above, if
we have a bird that walks, then the subtheory which, in addition to these
two facts, consists of the second clause

attacks the subtheory consisting of the clause

and the same two facts, but not vice versa; so, the first subtheory is ac-
ceptable whereas the second one is not.
Kakas, Mancarella and Dung show that, with this notion of attack in
the new framework with explicit negation but without NAF, it is possible
to capture exactly the semantics of NAF in LP. This shows that, if LP is
extended with explicit negation, then NAF can be simulated by introducing
a priority ordering between clauses. Moreover, the new framework of ELP
is more general than conventional ELP as it allows any ordering relation
on the clauses of extended logic programs.
In the extended logic program which results from the transformation de-
scribed above, if ¬p holds then ~ p holds in the corresponding general logic
program, for any atom p. We can argue, therefore, that the transformed
extended logic program satisfies the coherence principle, proposed by
Pereira and Alferes [Pereira and Alferes, 1992], namely that whenever ¬p
holds then ~ p must also hold. They consider the satisfaction of this prin-
ciple to be a desirable property of any semantics for ELP, as illustrated by
the following example, taken from [Alferes et al., 1993].
Example 6.4.1. Given the extended logic program

one should derive the conclusion take-bus.


The coherence principle automatically holds for the answer set seman-
tics. Pereira and Alferes [Pereira and Alferes, 1992] and Alferes, Dung
and Pereira [Alferes et al., 1993] have defined new semantics for ELP that
incorporates the coherence principle. These semantics are adaptations of
Przymusinski's extended stable model semantics [Przymusinski, 1990] and
Dung's preferred extension semantics [Dung, 1991], respectively, to ELP.
Alferes, Damasio and Pereira [Alferes et al., 1994] provide a sound and
complete proof procedure for the semantics in [Pereira and Alferes, 1992].
The proof procedure is implemented in Prolog by means of an appropriate
transformation from ELP to general LP.

6.5 An argumentation-theoretic approach to ELP


The Dung and Ruamviboonsuk semantics for ELP [Dung and Ruamvi-
boonsuk, 1991] in effect reduces ELP to ALP by renaming the explicit
negation ¬p of a predicate p to a new predicate p' and employing integrity
constraints

for all predicates p in the program. This reduction automatically pro-
vides us with an argumentation-theoretic interpretation of ELP, where at-
tacks via these integrity constraints become attacks via explicit nega-
tion. Such notions of attack via explicit negation have been defined by
Dung [Dung, 1993b] and Kakas, Mancarella and Dung [Kakas et al., 1994].
Dung's notion can be formulated as follows: a set of NAF literals E 20 at-
tacks another such set A via explicit negation (relative to a program P') 21
if

Kakas, Mancarella and Dung's notion can be formulated as follows: E
attacks a non-empty set A via explicit negation (relative to a program P')
if

where the complement is defined by p̄ = p' and p̄' = p.


Augmenting the notion of attack via NAF by either of these new notions
of attack via explicit negation, we can define admissibility, weak stability
and acceptability semantics similarly to the definitions in Section 4.3. How-
ever, the resulting semantics might give unwanted results, as illustrated by
the following example given in [Dung, 1993b].
Example 6.5.1. Given the extended logic program

20
Note that, for simplicity, here we use NAF literals directly as hypotheses, without
renaming them as positive atoms.
21
P' stands for the extended logic program P where all explicitly negated literals of
the form ¬p(t) are rewritten as atoms p'(t).
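A program giving rise to the attacks discussed below is, for instance (our reconstruction; the example's own clauses are not shown in this copy):

     fly(X) ← bird(X), ~ ab_bird(X)
     ¬fly(X) ← penguin(X), ~ ab_penguin(X)
     bird(X) ← penguin(X)
     penguin(tweety)
     ab_bird(X) ← penguin(X), ~ ab_penguin(X)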
{ab_penguin*(tweety)} attacks {ab_bird*(tweety)} via NAF. However,
{ab_bird*(tweety)} attacks {ab_penguin*(tweety)} via explicit negation
(and vice versa). Therefore, {ab_bird*(tweety)} counterattacks all attacks
against it, and is admissible. As a consequence, fly(tweety) holds in the
extension given by {ab_bird*(tweety)}. However, intuitively fly(tweety)
should hold in no extension.
To cope with this problem, Dung [Dung, 1993b] suggests the follow-
ing semantics, while keeping the definition of attack unchanged. A set of
hypotheses A is D-admissible if
• A does not attack itself, either via explicit negation or via NAF, and
• for every attack E against A, either via explicit negation or via NAF,
A attacks E via NAF.
Note that, if ELP is seen as a special instance of ALP, then D-admissibility
is very similar to KM-admissibility, presented in Section 5.3 for ALP, in
that the two notions share the feature that counterattacks can only be
provided by means of attacks via NAF.
It can be argued, however, that the problem in this example lies not so
much with the semantics but with the representation itself. The last clause

can be understood as attempting to assign a higher priority to the second


clause of the program over the first. This can be done, without this last
clause, explicitly in the ELP framework with priorities of [Kakas et al.,
1994] (Section 6.4) or in the rules and exceptions approach [Kowalski and
Sadri, 1990] (Section 6.3).
An argumentation-theoretic interpretation for ELP has also been pro-
posed by Bondarenko, Toni and Kowalski [Bondarenko et al., 1993]. Their
proposal, which requires that P' ∪ A be consistent with the integrity con-
straints
∀X ¬[p(X) ∧ p'(X)]
for each predicate p, instead of using a separate notion of attack via explicit
negation, has certain undesirable consequences, as shown in [Alferes and
Pereira, 1994]. For example, the program

admits both {~ q} and {~ p} as admissible extensions, while the only
intuitively correct extension is {~ q}.
Alferes and Pereira [1994] use argumentation-theoretic notions to ex-
tend the well-founded semantics for ELP in [Pereira and Alferes, 1992].
Kakas, Mancarella and Dung [Kakas et al., 1994] also define a well-founded
semantics for ELP based upon argumentation-theoretic notions.

6.6 A methodology for default reasoning with explicit negation
Compared with other authors, who primarily focus on extending or modi-
fying the semantics of LP to deal with default reasoning, Pereira, Aparicio
and Alferes [Pereira et al., 1991] develop a methodology for performing de-
fault reasoning with extended logic programs. Defaults of the form "nor-
mally if q then p" are represented by an extended clause

where the condition name_qp can be understood as a name given to the
default. The condition ~ ¬p deals with exceptions to the conclusion of
the rule, whilst the condition ~ ¬name_qp deals with exceptions to the
rule itself. An exception to the rule would be represented by an extended
clause of the form

where the condition r represents the conditions under which the exception
holds. In the flying-birds example, the second clause of

expresses that the default named birds-fly does not apply for penguins.
The possibility of expressing both exceptions to rules as well as ex-
ceptions to predicates is useful for representing hierarchies of exceptions.
Suppose we want to change (6.3) to the default rule "penguins usually don't
fly". This can be done by replacing (6.3) by

where penguins-don't-fly is the name assigned to the new rule. To give
preference to the more specific default represented by (6.4) over the more
general default (6.2), we add the additional clause

Then to express that superpenguins fly, we can add the rule:



Pereira, Aparicio and Alferes [Pereira et al., 1991] use the well-founded se-
mantics extended with explicit negation to give a semantics for this method-
ology for default reasoning. However it is worth noting that any other se-
mantics of extended logic programs could also be used. For example Inoue
[Inoue, 1994; Inoue, 1991] uses an extension of the answer set semantics
(see Section 6.2), but for a slightly different transformation.
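For concreteness, following the pattern just described, the clauses of the flying-birds hierarchy can be written as follows (our reconstruction of the numbered clauses (6.2)-(6.4) and of the two final additions; only the general pattern, not these particular clauses, is displayed explicitly above):

     fly(X) ← bird(X), ~ ¬fly(X), ~ ¬birds-fly(X)                      (6.2)
     ¬birds-fly(X) ← penguin(X)                                        (6.3)
     ¬fly(X) ← penguin(X), ~ fly(X), ~ ¬penguins-don't-fly(X)          (6.4)
     ¬birds-fly(X) ← penguin(X), ~ ¬penguins-don't-fly(X)
     ¬penguins-don't-fly(X) ← superpenguin(X)

where the fourth clause gives preference to (6.4) over (6.2) and the last expresses that superpenguins fly.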

6.7 ELP with abduction


Inoue [Inoue, 1991] (see also Section 6.3) and Pereira, Aparicio and Alferes
[Pereira et al., 1991] investigate extended logic programs with abducibles
but without integrity constraints. They transform such programs into ex-
tended logic programs without abduction by adding a new pair of clauses

for each abducible predicate p. Notice that the transformation is identi-
cal to that of Satoh and Iwayama [Satoh and Iwayama, 1991] presented in
Section 5.5, except for the use of explicit negation instead of new predi-
cates. Inoue [Inoue, 1991] and Pereira, Aparicio and Alferes [Pereira et al.,
1991] assign different semantics to the resulting program. Whereas Inoue
applies the answer set semantics, Pereira, Aparicio and Alferes apply the
extended stable model semantics of [Przymusinski, 1990]. Pereira, Aparicio
and Alferes [Pereira et al., 1991b] have also developed proof procedures for
this semantics.
As mentioned above, Pereira, Aparicio and Alferes [Pereira et al., 1991]
understand the transformed programs in terms of (three-valued) extended
stable models. This has the advantage that it gives a semantics to every
logic program and it does not force abducibles to be either believed or
disbelieved. But the advantage of the transformational approach, as we
have already remarked, is that the semantics of the transformed program
is independent of the transformation. Any semantics can be used for the
transformed program (including even a transformational one, e.g. replacing
explicitly negated atoms ¬p(t) by a new atom p'(t)).

7 An abstract argumentation-based framework for default reasoning
Following the argumentation-theoretic interpretation of NAF introduced
in [Kakas et al., 1993], Kakas [Kakas, 1992] generalised the interpretation
and showed how other logics for default reasoning can be based upon a
similar semantics. In particular, he showed how default logic can be un-
derstood in such terms and proposed a default reasoning framework based
on the argumentation-theoretic acceptability semantics (see Section 4.3) as
an alternative to default logic.
Dung [Dung, 1993a] proposed an abstraction of the argumentation-
theoretic interpretation of NAF introduced in [Kakas et al., 1993], where
arguments and the notion of one argument attacking another are treated as
primitive concepts which can be superimposed upon any monotonic logic
and can even be introduced into non-linguistic contexts. Stable, admissible,
preferred, and well-founded semantics can all be defined in terms of sets
of arguments that are able to attack or defend themselves against attack
by other arguments. Dung shows that many problems from such different
areas as AI, game theory and economics can be formulated and studied
within this argumentation-theoretic framework.
Bondarenko, Toni and Kowalski [Bondarenko et al., 1993] modified
Dung's notion of an abstract argumentation-theoretic framework by defin-
ing an argument to be a monotonic derivation from a set of abductive
assumptions. This new framework, like that of [Kakas, 1992], can be un-
derstood as a natural abstraction and extension of the Theorist framework
in two respects. First, the underlying logic can be any monotonic logic
and not just classical first-order logic. Second, the semantics of the non-
monotonic extension can be formulated in terms of any argumentation-
theoretic notion, and not just in terms of maximal consistency.
To give an idea of this framework, we show here how a simplified ver-
sion of the framework can be used to define an abstract notion of stable
semantics which includes as special cases stable models for logic programs,
extensions for default logic [Reiter, 1980], autoepistemic logic [Moore, 1985]
and non-monotonic logic II [McDermott, 1982]. We follow the approach of
Bondarenko, Dung, Kowalski and Toni [Bondarenko et al., 1997] (see also
[Kakas, 1992]).
Let T be a set of sentences in any monotonic logic, ⊢ the provability
operator for that logic and A a set of candidate abducible sentences. For
any a ∈ A, let ā be some sentence that represents the "contrary" of a.
Then, a set of assumptions E is said to attack a set of assumptions Δ iff
• T ∪ E ⊢ ā for some a ∈ Δ.
Note that the notion of a sentence a being the contrary of an assumption
a can be regarded as a special case of the more general notion that a is
retractible in an integrity constraint

This more general notion is useful for capturing the semantics of ALP.
To cater for the semantics of LP, T is a general logic program, h is
modus ponens and A is the set of all negative literals. The contrary of ~ p
is p.
For default logic, default rules are rewritten as sentences of the form

γ(x) ← α(x) ∧ Mβ1(x) ∧ ... ∧ Mβn(x)

(similarly to Poole's simulation of default logic, Section 3), where the
underlying language is first-order logic augmented with a new symbol "M",
which creates a sentence from a sentence not containing M, and with a
new implication symbol ← in addition to the usual implication symbol for
first-order logic. The theory T is F ∪ D, where F is the set of "facts" and
D is the set of defaults written as sentences. ⊢ is ordinary provability for
classical logic augmented with modus ponens for the new implication symbol.
(This is different from Poole's simulation, which treats ← as ordinary
implication.) The set A is the set of all sentences of the form Mα. The
contrary of Mα is ¬α.
For autoepistemic logic, the theory T is any set of sentences written in
modal logic. However, ⊢ is provability in classical (non-modal) logic. The
set A is the set of all sentences of the form ¬Lφ or Lφ. The contrary of
¬Lφ is φ, whereas the contrary of Lφ is ¬Lφ.
For non-monotonic logic II, T is any set of sentences of modal logic,
as in the case of autoepistemic logic, but ⊢ is provability in modal logic
(including the inference rule of necessitation, which derives Lφ from φ).
The set A is the set of all sentences of the form ¬Lφ. The contrary of ¬Lφ
is φ.
Given any theory T in any monotonic logic, candidate assumptions A
and notion of the "contrary" of an assumption, a set of assumptions Δ is
stable iff
• Δ does not attack itself and
• Δ attacks all {α} such that α ∈ A − Δ.
This notion of stability includes as special cases stable models in LP and
extensions in default logic, autoepistemic logic and non-monotonic logic II.
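As a concrete illustration, the following small Python sketch (ours, not part of the original text) implements this stability check for the LP instance described above: assumptions are NAF literals ~p, the contrary of ~p is p, and provability is modus ponens over the program clauses together with the chosen assumptions. The encoding of clauses as (head, positive body, NAF body) triples and all identifiers are assumptions of the sketch.

from itertools import combinations

def derive(program, assumptions):
    # Atoms provable by modus ponens from the program plus a set of NAF assumptions.
    facts, changed = set(), True
    while changed:
        changed = False
        for head, body, naf in program:
            if head not in facts and all(b in facts for b in body) \
                    and all(('~' + n) in assumptions for n in naf):
                facts.add(head)
                changed = True
    return facts

def attacks(program, d1, d2):
    # d1 attacks d2 iff d1, together with the program, derives the contrary
    # of some assumption in d2 (the contrary of '~p' is 'p').
    derived = derive(program, d1)
    return any(a[1:] in derived for a in d2)

def is_stable(program, candidates, delta):
    # delta is stable iff it does not attack itself and attacks every
    # singleton {a} with a a candidate assumption outside delta.
    return (not attacks(program, delta, delta) and
            all(attacks(program, delta, {a}) for a in candidates - delta))

# Example program:  p <- ~q,   q <- ~r.  Candidate assumptions: all NAF literals.
program = [('p', [], ['q']), ('q', [], ['r'])]
candidates = {'~p', '~q', '~r'}
for n in range(len(candidates) + 1):
    for delta in map(set, combinations(sorted(candidates), n)):
        if is_stable(program, candidates, delta):
            print(sorted(delta))    # ['~p', '~r']: the stable model {q}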
Based upon this abductive interpretation of default logic, Satoh [Satoh,
1994] proposes a sound and complete proof procedure for default logic, by
extending the proof procedure for ALP of [Satoh and Iwayama, 1992a].
At a similar level of abstraction, Kakas, Mancarella and Dung [Kakas et
al., 1994] also propose a general argumentation-theoretic framework based
primarily on the acceptability semantics. As with LP, other semantics such
as preferred extension and stable theory semantics can be obtained as ap-
proximations of the acceptability semantics. A sceptical form of semantics,
analogous to the well-founded semantics for LP, is also given in [Kakas et
al., 1994], based on a strong form of acceptability.
Kakas, Mancarella and Dung define a notion of attack between conflict-
ing sets of sentences, but these can be any subtheories of a given theory,
rather than being subtheories drawn from a pre-assigned set of assumption
sentences as in [Bondarenko et al., 1993; Bondarenko et al., 1997]. Also as
in the special case of LP (see Section 4.3) this notion of attack together
with the acceptability semantics ensures that defences are genuine counter-
attacks, i.e. that they do not at the same time attack the theory that we
are trying to construct.
Because this framework does not separate the theory into facts and
candidate assumptions, the attacking relation would be symmetric. To
avoid this, a priority relation can be given on the sentences of the theory.
As an example of this approach, Kakas, Mancarella and Dung propose a
framework for ELP where programs are accompanied by a priority ordering
on their clauses and show how in this framework NAF can be removed
from the object-level language (see also Section 6.4). More generally, this
approach provides a framework for default reasoning with priorities on
sentences of a theory, viewed as default rules. It also provides a framework
for restoring consistency in a theory T by using the acceptable subsets of
T (see Sections 6.2 and 6.3).
Brewka and Konolige [Brewka and Konolige, 1993] also propose an
abductive framework which unifies and provides new semantics for LP,
autoepistemic logic and default logic, but does not use argumentation-
theoretic notions. This semantics generalises the semantics for LP given in
[Brewka, 1993].

8 Abduction and truth maintenance


In this section we will consider the relationship between truth maintenance
(TM) and abduction. TM systems have historically been presented from a
procedural point of view. However, we will be concerned primarily with the
semantics of TM systems and the relationship to the semantics of abductive
logic programming.
A TM system is part of an overall reasoning system which consists
of two components: a domain dependent problem solver which performs
inferences and a domain independent TM system which records these in-
ferences. Inferences are communicated to the TM system by means of
justifications, which in the simplest case can be written in the form

p ← p1, ..., pn

expressing that the proposition p can be derived from the propositions
p1, ..., pn. Justifications include premises, in the case n = 0, representing
propositions which hold in all contexts. Propositions can depend upon
assumptions which vary from context to context.
TM systems can also record nogoods, which can be written in the form

← p1, ..., pn

meaning that the propositions p1, ..., pn are incompatible and therefore
cannot hold together.
Given a set of justifications and nogoods, the task of a TM system is
to determine which propositions can be derived on the basis of the justifi-
cations, without violating the nogoods.
For any such TM system there is a straightforward correspondence with
abductive logic programs:
• justifications correspond to propositional Horn clause programs,
• nogoods correspond to propositional integrity constraints,
• assumptions correspond to abducible hypotheses, and
• contexts correspond to acceptable sets of hypotheses.
The semantics of a TM system can accordingly be understood in terms
of the semantics of the corresponding propositional logic program with
abducibles and integrity constraints.
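As a small worked reading of this correspondence (hypothetical data, not drawn from the text), the justification set {p ← a, b;  q ← p} together with the nogood ← a, c is just a propositional Horn clause program with one integrity constraint, the abducibles being the pre-specified assumptions a, b and c; the context {a, b} is acceptable and supports both p and q, whereas any context containing both a and c violates the integrity constraint.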
The two most popular systems are the justification-based TM sys-
tem (JTMS) of Doyle [Doyle, 1979] and the assumption-based TM system
(ATMS) of de Kleer [de Kleer, 1986].

8.1 Justification-based truth maintenance


A justification in a JTMS can be written in the form

p ← p1, ..., pn, ~pn+1, ..., ~pm

expressing that p can be derived (i.e. is IN in the current set of beliefs) if
p1, ..., pn can be derived (are IN) and pn+1, ..., pm cannot be derived (are
OUT).
For each proposition occurring in a set of justifications, the JTMS de-
termines an IN or OUT label, taking care to avoid circular arguments and
thus ensuring that each proposition which is labelled IN has well-founded
support. The JTMS incrementally revises beliefs when a justification is
added or deleted.
The JTMS uses nogoods to record contradictions discovered by the
problem solver and to perform dependency-directed backtracking to
change assumptions in order to restore consistency. In the JTMS changing
an assumption is done by changing an OUT label to IN.
Suppose, for example, that we are given the justifications

corresponding to the propositional form of the Yale shooting problem. As
Morris [Morris, 1988] observes, these correctly determine that q is labelled
IN and that r and p are labelled OUT. If the JTMS is subsequently in-
formed that p is true, then dependency-directed backtracking will install
a justification for r, changing its label from OUT to IN. Notice that this
is similar to the behaviour of the extended abductive proof procedure de-
scribed in Example 5.2.1, Section 5.2.
Several authors have observed that the JTMS can be given a seman-
tics corresponding to the semantics of logic programs, by interpreting jus-
tifications as propositional logic program clauses, and interpreting ~pi
as NAF of pi. The papers [Elkan, 1990; Giordano and Martelli, 1990;
Kakas and Mancarella, 1990b; Pimentel and Cuadrado, 1989], in particu-
lar, show that a well-founded labelling for a JTMS corresponds to a stable
model of the corresponding logic program. Several authors [Elkan, 1990;
Fujiwara and Honiden, 1989; Kakas and Mancarella, 1990b; Reinfrank and
Dessler, 1989], exploiting the interpretation of stable models as autoepis-
temic expansions [Gelfond and Lifschitz, 1988], have shown a correspon-
dence between well-founded labellings and stable expansions of the set of
justifications viewed as autoepistemic theories.
The JTMS can also be understood in terms of abduction using the
abductive approach to the semantics of NAF, as shown in [Dung, 1991a;
Giordano and Martelli, 1990; Kakas and Mancarella, 1990b]. This has the
advantage that the nogoods of the JTMS can be interpreted as integrity
constraints of the abductive framework. The correspondence between ab-
duction and the JTMS is reinforced by [Satoh and Iwayama, 1991], which
gives a proof procedure to compute generalised stable models using the
JTMS (see Section 5.4).
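The correspondence can be checked mechanically on small examples. In the following Python sketch (ours; the encoding of a justification as a (node, IN-list, OUT-list) triple and the two-clause example are assumptions), each justification p ← p1, ..., pn, ~q1, ..., ~qm is read as a program clause, and a labelling is accepted exactly when its IN atoms form a stable model of the resulting program.

from itertools import chain, combinations

def consequences(clauses, model):
    # Least set of atoms derivable after deleting every clause whose OUT-list
    # meets `model` (the Gelfond-Lifschitz reduct), computed by forward chaining.
    reduct = [(h, ins) for h, ins, outs in clauses if not set(outs) & model]
    derived, changed = set(), True
    while changed:
        changed = False
        for h, ins in reduct:
            if h not in derived and set(ins) <= derived:
                derived.add(h)
                changed = True
    return derived

def stable_labellings(clauses):
    atoms = {h for h, _, _ in clauses} | {a for _, ins, outs in clauses for a in ins + outs}
    guesses = chain.from_iterable(combinations(sorted(atoms), n) for n in range(len(atoms) + 1))
    for guess in guesses:
        model = set(guess)
        if consequences(clauses, model) == model:       # stable model check
            yield {a: 'IN' if a in model else 'OUT' for a in sorted(atoms)}

# Two illustrative justifications:  p <- OUT: q   and   q <- OUT: r
clauses = [('p', [], ['q']), ('q', [], ['r'])]
for labelling in stable_labellings(clauses):
    print(labelling)    # {'p': 'OUT', 'q': 'IN', 'r': 'OUT'}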

8.2 Assumption-based truth maintenance


Justifications in ATMS have the more restricted Horn clause form

p ← p1, ..., pn

However, whereas the JTMS maintains only one implicit context of as-
sumptions at a time, the ATMS explicitly records with every proposition
the different sets of assumptions which provide the foundations for its be-
lief. In ATMS, assumptions are propositions that have been pre-specified
as assumable. Each record of assumptions that supports a proposition p
can also be expressed in Horn clause form

p ← a1, ..., am

and can be computed from the justifications, as we illustrate in the following
example.
Example 8.2.1. Suppose that the ATMS contains justifications

and the single nogood

where a, b, c, d, e are assumptions. Given the new justification

the ATMS computes explicit records of r's dependence on the assumptions:

The dependence

is not recorded because its assumptions violate the nogood. The depen-
dence

is not recorded because it is subsumed by the dependence

Reiter and de Kleer [Reiter and De Kleer, 1987] show that, given a set
of justifications, nogoods, and candidate assumptions, the ATMS can be
understood as computing minimal and consistent abductive explanations
in the propositional case (where assumptions are interpreted as abductive
hypotheses). This abductive interpretation of ATMS has been developed
further by Inoue [Inoue, 1990], who gives an abductive proof procedure for
the ATMS.
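As an illustration of this abductive reading, the following Python sketch (ours; the justifications, nogood and names are hypothetical and are not the data of Example 8.2.1) computes, for a given node, the minimal sets of assumptions that derive it from the Horn justifications without violating a nogood.

from itertools import chain, combinations

justifications = [('p', ['a', 'b']), ('p', ['c']), ('r', ['p', 'd'])]
nogoods = [{'a', 'd'}]
assumptions = ['a', 'b', 'c', 'd']

def derives(environment, node):
    # Forward chaining over the Horn justifications from a set of assumptions.
    derived, changed = set(environment), True
    while changed:
        changed = False
        for head, body in justifications:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return node in derived

def label(node):
    # Environments (sets of assumptions) that support `node` and violate no nogood ...
    candidates = [set(env)
                  for env in chain.from_iterable(combinations(assumptions, n)
                                                 for n in range(len(assumptions) + 1))
                  if derives(set(env), node) and not any(ng <= set(env) for ng in nogoods)]
    # ... restricted to the minimal ones, as in the ATMS.
    return [env for env in candidates if not any(other < env for other in candidates)]

print([sorted(env) for env in label('r')])   # [['c', 'd']]: {a, b, d} also derives r but violates the nogood {a, d}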
Given an abductive logic program P and goal G, the explicit construc-
tion in ALP of a set of hypotheses Δ, which together with P implies G
and together with P satisfies any integrity constraints I, is similar to the
record

computed by the ATMS. There are, however, some obvious differences.


Whereas ATMS deals only with propositional justifications, relying on a
separate problem solver to instantiate variables, ALP deals with general
clauses, combining the functionalities of both a problem solver and a TM
system.
The extension of the ATMS to the non-propositional case requires a
new notion of minimality of sets of assumptions. Minimality as subset
inclusion is not sufficient, but needs to be replaced by a notion of minimal
consequence from sets of not necessarily variable-free assumptions [Lamma
and Mello, 1992].
Ignoring the propositional nature of a TM system, ALP can be regarded
as a hybrid of JTMS and ATMS, combining the non-monotonic negative
assumptions of JTMS and the positive assumptions of ATMS, and allowing
both positive and negative conditions in both justifications and nogoods
[Kakas and Mancarella, 1990b]. Other non-monotonic extensions of ATMS
have been developed in [Junker, 1989; Rodi and Pimentel, 1991].
It should be noted that one difference between ATMS and ALP is the
requirement in ATMS that only minimal sets of assumptions be recorded.
This minimality of assumptions is essential for the computational efficiency
of the ATMS. However, it is not essential for ALP, but can be imposed as
an additional requirement when it is needed.

9 Conclusions and future work


In this paper we have surveyed a number of proposals for extending LP
to perform abductive reasoning. We have seen that such extensions are
closely linked with other extensions including NAF, integrity constraints,
explicit negation, default reasoning, belief revision and argumentation.
Perhaps the most important link, from the perspective of LP, is that
between default abduction and NAF. On the one hand, we have seen that
default abduction generalises NAF, to include not only negative but also
positive hypotheses, and to include general integrity constraints. On the
other hand, we have seen that logic programs with abduction and integrity
constraints can be transformed into logic programs with NAF without in-
tegrity constraints. We have also seen that, in the context of ELP with
explicit negation, NAF can be replaced by a priority ordering between
clauses. The link between abduction and NAF includes both their seman-
tics and their implementations.
The use of default abduction for NAF is a special case of abduction
in general. The distinction between default and non-default abduction
has been clarified. Semantics, proof procedures and transformations that
respect this distinction have all been defined. However, more work is needed
to study the integration of these two kinds of abduction in a common
framework. The argumentation-based approach seems to offer a promising
framework for such an integration.
We have seen the importance of clarifying the semantics of abduction
and of defining a semantics that helps to unify the different forms of ab-
duction, NAF, and default reasoning within a common framework. We
have seen, in particular, that a proof procedure which is incorrect under
one semantics (e.g. [Eshghi and Kowalski, 1989]) can be correct under an-
other improved semantics (e.g. [Dung, 1991]). We have also introduced
an argumentation-theoretic interpretation for the semantics of abduction
applied to NAF, and we have seen that this interpretation can help to
understand the relationships between different semantics.
The argumentation-theoretic interpretation of NAF has been abstracted
and shown to unify and simplify the semantics of such different formalisms
for default reasoning as default logic, autoepistemic logic and non-mono-
tonic logic. In each case the standard semantics of these formalisms has
been shown to be a special instance of a single abstract notion that a
set of assumptions is a (stable) semantics if it does not attack itself but
does attack all other assumptions it does not contain. The stable model
semantics, generalised stable model semantics and answer set semantics
are other special cases. We have seen that stable model semantics and its
extensions have deficiencies which are avoided with admissibility, preferred
extension, complete scenaria, weak stability, stable theory and acceptability
semantics. Because these more refined semantics for LP can be defined
abstractly for any argumentation-based framework, they automatically and
uniformly provide improvements for the semantics of other formalisms for
default reasoning.
Despite the many advances in the application of abduction to LP and
to non-monotonic reasoning more generally, there is still much scope for
further development. Important problems in semantics still need to be
resolved. These include the problem of clarifying the role of integrity con-
straints in providing attacks and counterattacks in ALP.
The further development, clarification and simplification of the abstract
argumentation-theoretic framework and its applications both to existing
formalisms and to new formalisms for non-monotonic reasoning is another
important direction for future research. Of special importance is the prob-
lem of relating circumscription and the if-and-only-if completion semantics
to the argumentation-theoretic approach. An important step in this direc-
tion may be the "common sense" axiomatisation of NAF [Van Gelder and
Schlipf, 1993] by Van Gelder and Schlipf, which augments the if-and-only-if
completion with axioms of induction. The inclusion of induction axioms
relates this approach to circumscription, whereas the rewriting of negative
literals by new positive literals relates it to the abductive interpretation of
NAF.
The development of systems that combine ALP and CLP is another
important area that is still in its infancy. Among the results that might be
expected from this development are more powerful systems that combine
constructive abduction and constructive negation, and systems in which
user-defined constraint handling rules might be formulated and executed
efficiently as integrity constraints.
It is an important feature of the abductive interpretation of NAF that
it possesses elegant and powerful proof procedures, which significantly ex-
tend SLDNF and which can be extended in turn to accommodate other
abducibles and other integrity constraints. Different semantics for NAF
require different proof procedures. It remains to be seen whether the inef-
ficiency of proof procedures for the acceptability semantics, in particular,
can somehow be avoided in practice.
We have seen that abductive proof procedures for LP can be extended
to ALP. We have also seen that ALP can be reduced to LP by transfor-
mations. The comparative efficiency of the two different approaches to the
implementation of ALP needs to be investigated further.
We have argued that the implementation of abduction needs to be con-
sidered within a broader framework of implementing knowledge assimila-
tion (KA). We have seen that abduction can be used to assist the process
of KA and that abductive hypotheses themselves need to be assimilated.
Moreover, the general process of checking for integrity in KA might be used
to check the acceptability of abductive hypotheses.
It seems that an efficient implementation of KA can be based upon
combining two processes: backward reasoning both to generate abductive
hypotheses and to test whether the input is redundant and forward reason-
ing both to test input for consistency and to test whether existing infor-
mation is redundant. Notice that the abductive proof procedure for ALP
already has this feature of interleaving backward and forward reasoning.
Such implementations of KA need to be integrated with improvements of
the abductive proof procedure considered in isolation.
We have seen that the process of belief revision also needs to be con-
sidered within a KA context. In particular, it could be useful to investi-
gate relationships between the belief revision frameworks of [Doyle, 1991;
Gardenfors, 1988; Nebel, 1989; Nebel, 1991] and various integrity constraint
checking and restoration procedures.
The extension of LP to include integrity constraints is useful both for
abductive LP and for deductive databases. We have seen, however, that
for many applications the use of integrity constraints with retractibles can
be replaced by clauses with explicitly negated conclusions with priorities.
Moreover, the use of explicit negation with priorities seems to have several
advantages, including the ability both to represent and derive negative
information, as well as to obtain the effect of NAF.
The relationship between integrity constraints with retractibles and ex-
plicit negation with priorities needs to be investigated further: To what
extent does this relationship, which holds for abduction and default rea-
soning, hold for other uses of integrity constraints, such as those employed
in deductive databases; and what are the implications of this relationship
on the semantics and implementation of integrity constraints?


We have remarked upon the close links between the semantics of LP
with abduction and the semantics of truth maintenance systems. The prac-
tical consequences of these links, both for building applications and for effi-
cient implementations, need further investigation. What is the significance,
for example, of the fact that conventional TMSs and ATMSs correspond
only to the propositional case of logic programs?
We have seen the rapid development of the abduction-based argument-
ation-theoretic approach to non-monotonic reasoning. But argumentation
has wider applications in areas such as law and practical reasoning more
generally. It would be useful to see to what extent the theory of argu-
mentation might be extended to encompass such applications. It would
be especially gratifying, in particular, if such an extended argumentation
theory might be used, not only to understand how one argument can defeat
another, but also to indicate how conflicting arguments might be reconciled.

Acknowledgements
This research was supported by Fujitsu Research Laboratories and by the
Esprit Basic Research Action Compulog II. The authors are grateful to
Katsumi Inoue and Ken Satoh for helpful comments on an earlier draft,
and to Jose Julio Alferes, Phan Minh Dung, Paolo Mancarella and Luis
Moniz Pereira for many helpful discussions.

References
[Akama, 1992] S. Akama. Answer set semantics and constructive logic with
strong negation. Technical Report, 1992.
[Alferes et al., 1993] J. J. Alferes, P. M. Dung and L. M. Pereira. Scenario
semantics of extended logic programs. Proc 2nd International Work-
shop on Logic Programming and Nonmonotonic Reasoning, Lisbon. L.
M. Pereira and A. Nerode, eds. pp. 334-348, MIT Press, Cambridge,
MA, 1993.
[Alferes and Pereira, 1994] J. J. Alferes and L. M. Pereira. An argument-
ation-theoretic semantics based on non-refutable falsity. Proc. 4th Int.
Workshop on Non-monotonic Extensions of Logic Programming, Santa
Margherita Ligure, Italy. J. Dix, L. M. Pereira and T. Przymusinski eds.
1994.
[Alferes and Pereira, 1993] J. J. Alferes and L. M. Pereira. Contradiction
in logic programs: when avoidance equals removal, Parts I and II. Proc.
4th Int. Workshop on Extensions of Logic Programming, R. Dyckhoff ed.
pp. 7-26. Lecture Notes in AI 798, Springer-Verlag, Berlin, 1993.
[Alferes et al., 1994] J. J. Alferes, C. V. Damasio and L. M. Pereira. Top-
down query evaluation for well-founded semantics with explicit negation.
Proc. European Conference on Artificial Intelligence, ECAI '94, John
Wiley, Amsterdam, 1994.
[Allemand et al., 1991] D. Allemand, M. Tanner, T. Bylander and J.
Josephson. The computational complexity of abduction. Artificial In-
telligence, 49, 25-60, 1991.
[Apt and Bezem, 1990] K. R. Apt and M. Bezem. Acyclic programs. Proc.
7th International Conference on Logic Programming, Jerusalem, pp. 579-
597. MIT Press, Cambridge, MA, 1990.
[Aravindan and Dung, 1994] C. Aravindan and P. M. Dung. Belief dynam-
ics, abduction and databases. Proc. 4th European Workshop on Logics
in AI. Lecture Notes in AI, Springer Verlag, Berlin 1994.
[Baral and Gelfond, 1994] C. Baral and M. Gelfond. Logic programming
and knowledge representation. Journal of Logic Programming, 19-20,
73-148, 1994.
[Barbuti et al., 1990] R. Barbuti, P. Mancarella, D. Pedreschi and F.
Turini. A transformational approach to negation in logic programming.
Journal of Logic Programming, 8, 201-228, 1990.
[Bondarenko et al., 1993] A. Bondarenko, F. Toni and R. A. Kowalski. An
assumption-based framework for non-monotonic reasoning, Proc. 2nd In-
ternational Workshop on Logic Programming and Nonmonotonic Rea-
soning Lisbon. L. M. Pereira and A. Nerode eds. 171-189. MIT Press,
Cambridge, MA, 1993.
[Bondarenko et al., 1997] A. Bondarenko, P. M. Dung, R. A. Kowalski and
F. Toni. An abstract, argumentation-theoretic framework for default rea-
soning. To appear in Artificial Intelligence, 1997.
[Brewka, 1989] G. Brewka. Preferred subtheories: an extended logical
framework for default reasoning. Proc. 11th International Joint Con-
ference on Artificial Intelligence, Detroit, MI, 1043-1048, 1989.
[Brewka, 1993] G. Brewka. An abductive framework for generalised logic
programs. Proc. 2nd International Workshop on Logic Programming and
Nonmonotonic Reasoning Lisbon. L. M. Pereira and A. Nerode eds. 349-
364. MIT Press, Cambridge, MA, 1993.
[Brewka and Konolige, 1993] G. Brewka and K. Konolige. An abductive
framework for general logic programs and other non-monotonic sys-
tems. Proc. 13th International Joint Conference on Artificial Intelli-
gence, Chambery, France, 9-15, 1993.
[Brogi et al., 1992] A. Brogi, E. Lamma, P. Mello, and P. Mancarella. Nor-
mal logic programs as open positive programs. Proc. ICSLP '92, 1992.
[Bry, 1990] F. Bry. Intensional updates: abduction via deduction. Proc.
7th International Conference on Logic Programming, Jerusalem, 561-
575, MIT Press, Cambridge, MA, 1990.
[Buchert, 1994] H.-J. Burchert. A resolution principle for constrained log-
ics. Artificial Intelligence, 66, 235-271, 1994.
[Buchert and Nutt, 1991] H.-J. Burchert and W. Nutt. On abduction
and answer generation through constraint resolution. Technical Report,
DFKI, Kaiserslautern, 1991.
[Casamayor and Decker, 1992] J. Casamayor and H. Decker. Some proof
procedures for computational first-order theories, with an abductive
flavour to them. Proc. 1st Compulog-Net Workshop on Logic Program-
ming in Artificial Intelligence, Imperial College, London, 1992.
[Chan, 1988] D. Chan. Constructive negation based on the completed
database. Proc. 5th International Conference and Symposium on Logic
Programming, Washington, Seattle, 111-125, 1988.
[Charniak and McDermott, 1985] E. Charniak and D. McDermott. Intro-
duction to artificial intelligence. Addison-Wesley, Menlo Park, CA, 1985.
[Chen and Warren, 1989] W. Chen and D. S. Warren. Abductive logic pro-
gramming. Technical Report, Dept. of Comp. Science, State Univ. of New
York at Stony Brook, 1989.
[Clark, 1978] K. L. Clark. Negation as failure. Logic and Data Bases, H.
Gallaire and J. Minker, eds. pp. 293-322, Plenum Press, NY, 1978.
[Console and Saitta, 1992] L. Console and L. Saitta. Abduction, induction
and inverse resolution. Proc. 1st Compulog-Net Workshop on Logic Pro-
gramming in Artificial Intelligence, Imperial College, London, 1992.
[Console et al., 1989] L. Console, D. Theseider Dupre and P. Torasso. A
theory of diagnosis for incomplete causal models. Proc. 11th Inter-
national Joint Conference on Artificial Intelligence, Detroit, MI, 1311-
1317, 1989.
[Console et al., 1991] L. Console, D. Theseider Dupre and P. Torasso. On
the relationship between abduction and deduction. Journal of Logic and
Computation, 2(5), 661-690, 1991.
[Console et al., 1994] L. Console, M. L. Sapino and D. Theseider Dupre.
The role of abduction in database view updating. Journal of Intelligent
Information Systems, 4, 261-280, 1995.
[Cox and Pietrzykowski, 1992] P. T. Cox and T. Pietrzykowski. Causes for
events: their computation and applications. CADE '86, 608-621, 1992.
[de Kleer, 1986] J. de Kleer. An assumption-based TMS. Artificial Intelli-
gence, 32, 1986.
[Decker, 1986] H. Decker. Integrity enforcement on deductive databases.
Proc. EDS '86, Charleston, SC, 271-285, 1986.
[Demolombe and Farinas del Cerro, 1991] R. Demolombe and L. Farinas
del Cerro. An inference rule for hypotheses generation. Proc. 12th In-
ternational Joint Conference on Artificial Intelligence, Sydney, 152-157,
1991.
[Denecker and De Schreye, 1992] M. Denecker and D. De Schreye. Tem-
poral reasoning with abductive event calculus. Proc. 1st Compulog-Net
Workshop on Logic Programming in Artificial Intelligence, Imperial Col-
lege, London,1992.
[Denecker and De Schreye, 1992a] M. Denecker and D. De Schreye. On the
duality of abduction and model generation. Proc. International Confer-
ence on Fifth Generation Computer Systems, Tokyo, 650-657, 1992.
[Denecker and De Schreye, 1992b] M. Denecker and D. De Schreye. SLD-
NFA: an abductive procedure for normal abductive programs. Proc. In-
ternational Conference and Symposium on Logic Programming, 686-700,
1992.
[Denecker and De Schreye, 1993] M. Denecker and D. De Schreye. Rep-
resenting incomplete knowledge in abductive logic programming. Proc.
ILPS '93, Vancouver, 1993.
[Doyle, 1979] J. Doyle. A truth maintenance system. Artificial Intelligence,
12, 231-272, 1979.
[Doyle, 1991] J. Doyle. Rational belief revision. Proc. 2nd International
Conference on Principles of Knowledge Representation and Reasoning,
Cambridge, MA, 163-174, 1991.
[Drabent, 1995] W. Drabent. What is failure? An approach to constructive
negation. Acta Informatica, 32, 27-60, 1995.
[Dung, 1991] P. M. Dung. Negation as hypothesis: an abductive founda-
tion for logic programming. Proc. 8th International Conference on Logic
Programming, Paris, 3-17, MIT Press, Cambridge, MA, 1991.
[Dung, 1991a] P. M. Dung. An abductive foundation for non-monotonic
truth maintenance. Proc. 1st World Conference on Fundamentals of Ar-
tificial Intelligence, Paris. M. de Glas ed. 1991.
[Dung, 1992] P. M. Dung. Acyclic disjunctive logic programs with abduc-
tive procedure as proof procedure. Proc. International Conference on
Fifth Generation Computer Systems, Tokyo, 555-561, 1992.
[Dung, 1992a] P. M. Dung. An abductive procedure for disjunctive logic
programming. Technical Report, Dept. of Computing, Asian Institute of
Technology, 1992.
[Dung, 1993] P. M. Dung. Personal communication, 1993.
[Dung, 1993a] P. M. Dung. On the acceptability of arguments and its fun-
damental role in nonmonotonic reasoning and logic programming. Ar-
tificial Intelligence, 77, 321-357, 1994. (Extended Abstract in Proc.
International Joint Conference on Artificial Intelligence, 852-859,1993.)
[Dung, 1993b] P. M. Dung. An argumentation semantics for logic program-
ming with explicit negation. Proc. 10th International Conference on
Logic Programming, Budapest, Hungary, MIT Press, Cambridge, MA,
1993.
[Dung and Ruamviboonsuk, 1991] P. M. Dung and P. Ruamviboonsuk.
Well-founded reasoning with classical negation. Proc. 1st International
Workshop on Logic Programming and Nonmonotonic Reasoning, Wash-
ington, DC. A. Nerode, V. Marek and V. Subrahmanian, eds. 120-135,
1991.
[Dung et al., 1992] P. M. Dung, A. C. Kakas and P. Mancarella. Negation
as failure revisited. Technical Report, 1992.
[Eiter and Gottlob, 1993] T. Eiter and G. Gottlob. The complexity of
logic-based abduction. Proc. 10th Symposium on Theoretical Aspects of
Computing (STACS-93). P. Enjalbert, A. Finkel and K. W. Wagner,
eds. pp. 70-79. Vol. 665 of Lecture Notes on Computer Science, Springer
Verlag, Berlin, 1993. (Extended paper to appear in Journal of the ACM)
[Elkan, 1990] C. Elkan. A rational reconstruction of non-monotonic truth
maintenance systems. Artificial Intelligence, 43, 219-234, 1990.
[Eshghi, 1988] K. Eshghi. Abductive planning with event calculus. Proc.
5th International Conference and Symposium on Logic Programming,
Washington, Seattle, 562-579, 1988.
[Eshghi, 1990] K. Eshghi. Diagnoses as stable models. Proc. 1st Interna-
tional Workshop on Principles of Diagnosis, Menlo Park, CA, 1990.
[Eshghi, 1993] K. Eshghi. A tractable set of abduction problems. Proc.
13th International Joint Conference on Artificial Intelligence, Chambery,
France, 3-8, 1993.
[Eshghi and Kowalski, 1988] K. Eshghi and R. A. Kowalski. Abduction
through deduction. Technical Report, Department of Computing, Im-
perial College, London, 1988.
[Eshghi and Kowalski, 1989] K. Eshghi and R. A. Kowalski. Abduction
compared with negation by failure. Proc. 6th International Conference
on Logic Programming, Lisbon, 234-255, MIT Press, Cambridge, MA,
1989.
[Evans, 1989] C. A. Evans. Negation as failure as an approach to the Hanks
and McDermott problem. Proc. 2nd International Symposium on Artifi-
cial intelligence, Monterrey, Mexico, 1989.
[Evans and Kakas, 1992] C. A. Evans and A. C. Kakas. Hypothetico-ded-
uctive reasoning. Proc. International Conference on Fifth Generation
Computer Systems, Tokyo, 546-554, 1992.
[Finger and Genesereth, 1985] J. J. Finger and M. R. Genesereth.
RESIDUE: a deductive approach to design synthesis. Technical Report,
no. CS-85-1035, Stanford University, 1985.
[Fruhwirth, 1992] T. Fruhwirth. Constraint simplification rules. Technical
Report ECRC-92-18, 1992.
[Fujiwara and Honiden, 1989] Y. Fujiwara and S. Honiden. Relating the
TMS to Autoepistemic Logic. Proc. 11th International Joint Conference
on Artificial Intelligence, Detroit, MI, 1199-1205, 1989.


[Fujita and Hasegawa, 1991] M. Fujita and R. Hasegawa. A model gener-
ation theorem prover in KL1 using a ramified-stack algorithm. Proc. 8th
International Conference on Logic Programming, Paris, 535-548, MIT
Press, Cambridge, MA, 1991.
[Fung, 1993] T. H. Fung. Theorem proving approach with constraint han-
dling and its applications on databases. MSc Thesis, Imperial College,
London, 1993.
[Gabbay, 1991] D. M. Gabbay. Abduction in labelled deductive systems. A
conceptual abstract. Proc. of the European Conference on Symbolic and
Quantitative Approaches to Uncertainty '91, R. Kruse and P. Siegel,
eds. pp. 3-12. Vol. 548 of Lecture Notes on Computer Science, Springer
Verlag, Berlin, 1991.
[Gabbay and Kempson, 1991] D. M. Gabbay and R. M. Kempson. La-
belled abduction and relevance reasoning. Workshop on Non-Standard
Queries and Non-Standard Answers, Toulouse, France, 1991.
[Gabbay et al., 1994] D. M. Gabbay, R. M. Kempson and J. Pitts. Labelled
abduction and relevance reasoning. Non-standard queries and answers,
R. Demolombe and T. Imielinski, eds. pp. 155-185. Oxford University
Press, 1994.
[Gaifman and Shapiro, 1989] H. Gaifman and E. Shapiro. Proof theory and
semantics of logic programming. Proc. LICS'89, pp. 50-62. IEEE Com-
puter Society Press, 1989.
[Gardenfors, 1988] P. Gardenfors. Knowledge in Flux: Modeling the Dy-
namics of Epistemic States. MIT Press, Cambridge, MA, 1988.
[Geffner, 1990] H. Geffner. Causal theories for non-monotonic reasoning.
Proc. AAAI '90, 1990.
[Geffner, 1991] H. Geffner. Beyond negation as failure. Proc. 2nd Interna-
tional Conference on Principles of Knowledge Representation and Rea-
soning, Cambridge, MA, 218-229, 1991.
[Gelfond and Lifschitz, 1988] M. Gelfond and V. Lifschitz. The Stable
model semantics for logic programs. Proc. 5th International Conference
and Symposium on Logic Programming, Washington, Seattle, 1070-1080,
1988.
[Gelfond and Lifschitz, 1990] M. Gelfond and V. Lifschitz. Logic programs
with classical negation. Proc. 7th International Conference on Logic Pro-
gramming, Jerusalem, 579-597, MIT Press, Cambridge, MA, 1990.
[Giordano and Martelli, 1990] L. Giordano and A. Martelli. Generalized
stable model semantics, truth maintenance and conflict resolution. Proc.
7th International Conference on Logic Programming, Jerusalem, MIT
Press, Cambridge, MA, 427-411, 1990.
[Giordano et al., 1993] L. Giordano, A. Martelli and M. L. Sapino. A se-
mantics for Eshghi and Kowalski's abductive procedure. Proc. 10th In-
ternational Conference on Logic Programming, Budapest, 586-600, MIT
Press, Cambridge, MA, 1993.
[Goebel et al., 1986] R. Goebel, K. Furukawa and D. Poole. Using definite
clauses and integrity constraints as the basis for a theory formation ap-
proach to diagnostic reasoning. Proc. 3rd International Conference on
Logic Programming, London, pp. 211-222. Vol. 225 of Lecture Notes in
Computer Science, Springer Verlag, Berlin, 1986.
[Hanks and McDermott, 1986] S. Hanks and D. McDermott. Default rea-
soning, non-monotonic logics, and the frame problem. Proc. 8th AAAI
'86, Philadelphia, 328-333,1986.
[Hanks and McDermott, 1987] S. Hanks and D. McDermott. Non-
monotonic logics and temporal projection. Artificial Intelligence, 33,
1987.
[Hasegawa and Fujita, 1992] R. Hasegawa and M. Fujita. Parallel theorem
provers and their applications. Proc. International Conference on Fifth
Generation Computer Systems, Tokyo, 132-154, 1992.
[Hobbs, 1990] J. R. Hobbs. An integrated abductive framework for dis-
course interpretation. Proc. AAAI Symposium on Automated Abduction,
Stanford, 10-12, 1990.
[Hobbs et al, 1990] J. R. Hobbs, M. Stickel, D. Appelt and P. Martin.
Interpretation as abduction. Technical Report, 499, Artificial Intelligence
Center, Computing and Engineering Sciences Division, Menlo Park, CA,
1990.
[Inoue, 1990] K. Inoue. An abductive procedure for the CMS/ATMS. Proc.
European Conference on Artificial Intelligence, ECAI '90, International
Workshop on Truth Maintenance, Stockholm, Springer Verlag Lecture
notes in Computer Science, 1990.
[Inoue, 1991] K. Inoue. Extended logic programs with default assumptions.
Proc. 8th International Conference on Logic Programming, Paris, 490-
504, MIT Press, Cambridge, MA, 1991.
[Inoue, 1994] K. Inoue. Hypothetical reasoning in logic programs. Journal
of Logic Programming, 18, 191-227, 1994.
[Inoue et al., 1992] K. Inoue, M. Koshimura and R. Hasegawa. Embedding
negation as failure into a model generation theorem prover. Proc. 11th
International Conference on Automated Deduction, CADE '92, Saratoga
Springs, NY, 1992.
[Inoue et al., 1992a] K. Inoue, Y. Ohta, R. Hasegawa and M. Nakashima.
Hypothetical reasoning systems on the MGTP. Technical Report, ICOT,
Tokyo (in Japanese), 1992.
[Junker, 1989] U. Junker. A correct non-monotonic ATMS. Proc. 11th
International Joint Conference on Artificial Intelligence, Detroit, MI,
1049-1054, 1989.
[Kakas, 1991] A. C. Kakas. Deductive databases as theories of belief. Tech-
nical Report, Logic Programming Group, Imperial College, London,
1991.
[Kakas, 1991a] A. C. Kakas. On the evolution of databases. Technical Re-
port, Logic Programming Group, Imperial College, London, 1991.
[Kakas, 1992] A. C. Kakas. Default reasoning via negation as failure. Proc.
ECAI-92 workshop on "Foundations of Knowledge Representation and
Reasoning", G. Lakemeyer and B. Nebel, eds. Vol 810 of Lecture Notes
in AI, Springer Verlag, Berlin, 1992.
[Kakas and Mancarella, 1989] A. C. Kakas and P. Mancarella. Anomalous
models and abduction. Proc. 2nd International Symposium on Artificial
intelligence, Monterrey, Mexico, 1989.
[Kakas and Mancarella, 1990] A. C. Kakas and P. Mancarella. Generalized
Stable Models: a Semantics for Abduction. Proc. 9th European Confer-
ence on Artificial Intelligence, ECAI '90, Stockholm, 385-391, 1990.
[Kakas and Mancarella, 1990a] A. C. Kakas and P. Mancarella. Database
updates through abduction. Proc. 16th International Conference on Very
Large Databases, VLDB'90, Brisbane, Australia, 1990.
[Kakas and Mancarella, 1990b] A. C. Kakas and P. Mancarella. On the
relation of truth maintenance and abduction. Proc. of the 1st Pacific Rim
International Conference on Artificial Intelligence, PRICAI'90, Nagoya,
Japan,1990.
[Kakas and Mancarella, 1990c] A. C. Kakas and P. Mancarella. Abductive
LP. Proc. NACLP '90, Workshop on Non-Monotonic Reasoning and
Logic Programming, Austin, Texas, 1990.
[Kakas and Mancarella, 1990d] A. C. Kakas and P. Mancarella. Knowledge
assimilation and abduction. Proc. European Conference on Artificial
Intelligence, ECAI '90, International Workshop on Truth Maintenance,
Stockholm, Springer Verlag Lecture notes in Computer Science, 1990.
[Kakas and Mancarella, 1991] A. C. Kakas and P. Mancarella. Negation as
stable hypotheses. Proc. 1st International Workshop on Logic Program-
ming and Nonmonotonic Reasoning, Washington, DC. A. Nerode, V.
Marek and V. Subrahmanian eds., 1991.
[Kakas and Mancarella, 1991a] A. C. Kakas and P. Mancarella. Stable the-
ories for logic programs. Proc. ISLP '91, San Diego, 1991.
[Kakas and Mancarella, 1993a] A. C. Kakas and P. Mancarella. Construc-
tive abduction in logic programming. Technical Report, Dipartimento di
Informatica, Universita di Pisa, 1993.
[Kakas and Mancarella, 1993] A. C. Kakas and P. Mancarella. Preferred
extensions are partial stable models. Journal of Logic Programming, 14,
341-348, 1993.
[Kakas and Michael, 1993] A. C. Kakas and A. Michael. Scheduling
through abduction. Proc. ICLP'93 Post Conference workshop on Ab-
ductive Reasoning, 1993.
[Kakas and Michael, 1995] A. C. Kakas and A. Michael. Integrating abduc-
tive and constraint logic programming. Proc. 12th International Logic
Programming Conference, MIT Press, Cambridge, MA, 399-413, 1995.
[Kakas et al., 1993] A. C. Kakas, R. A. Kowalski and F. Toni. Abductive
logic programming. Journal of Logic and Computation, 2, 719-770,1993.
[Kakas et al., 1994] A. C. Kakas, P. Mancarella and P. M. Dung. The ac-
ceptability semantics for logic programs. Proc. 11th International Con-
ference on Logic Programming, Santa Margherita Ligure, Italy, 504-519,
MIT Press, Cambridge, MA, 1994.
[Konolige, 1990] K. Konolige. A general theory of abduction. Spring Sym-
posium on Automated Abduction, Stanford University, pp. 62-66, 1990.
[Konolige, 1992] K. Konolige. Using default and causal reasoning in diag-
nosis. Proc. 3rd International Conference on Principles of Knowledge
Representation and Reasoning, Cambridge, 1992.
[Kowalski, 1979] R. A. Kowalski. Logic for problem solving. Elsevier, New
York, 1979.
[Kowalski, 1987] R. A. Kowalski. Belief revision without constraints. Com-
putational Intelligence, 3, 1987.
[Kowalski, 1990] R. A. Kowalski. Problems and promises of computational
logic. Proc. Symposium on Computational Logic, J. Lloyd ed. Springer
Verlag Lecture Notes in Computer Science, 1990.
[Kowalski, 1992] R. A. Kowalski. Database updates in the event calculus.
Journal of Logic Programming, 12, 121-146, 1992.
[Kowalski, 1992a] R. A. Kowalski. A dual form of logic programming. Lec-
ture Notes, Workshop in Honour of Jack Minker, University of Maryland,
November 1992.
[Kowalski, 1994] R. A. Kowalski. Logic without model theory. What is a
Logical System?, D. Gabbay, ed. Oxford University Press, 1994.
[Kowalski and Sadri, 1988] R. A. Kowalski and F. Sadri. Knowledge repre-
sentation without integrity constraints. Technical Report, Department
of Computing, Imperial College, London, 1988.
[Kowalski and Sadri, 1990] R. A. Kowalski and F. Sadri. Logic programs
with exceptions. Proc. 7th International Conference on Logic Program-
ming, Jerusalem, 598-613, MIT Press, Cambridge, MA, 1990.
[Kowalski and Sergot, 1986] R. A. Kowalski and M. Sergot. A logic-based
calculus of events. New Generation Computing, 4, 67-95, 1986.
[Kunifuji et al., 1986] S. Kunifuji, K. Tsurumaki and K. Furukawa. Con-
sideration of a hypothesis-based reasoning system. Journal of Japanese
Society for Artificial Intelligence, 1, 228-237, 1986.
[Lamma and Mello, 1992] E. Lamma and P. Mello. An assumption-based
truth maintenance system dealing with non ground justifications. Proc.
1st Compulog-Net Workshop on Logic Programming in Artificial Intelli-
gence, Imperial College, London, 1992.
[Lever, 1991] J. M. Lever. Combining induction with resolution in logic
programming. PhD Thesis, Department of Computing, Imperial College,
London, 1991.
[Levesque, 1989] H. J. Levesque. A knowledge-level account of abduction.
Proc. 11th International Joint Conference on Artificial Intelligence, De-
troit, MI, 1061-1067, 1989.
[Lloyd and Topor, 1985] J. W. Lloyd and R. W. Topor. A basis for deduc-
tive database systems. Journal of Logic Programming, 2, 93-109, 1985.
[McDermott, 1982] D. McDermott. Nonmonotonic logic II: nonmonotonic
modal theories. JACM, 29, 1982.
[Maim, 1992] E. Maim. Abduction and constraint logic programming.
Proc. European Conference on Artificial Intelligence, ECAI '92, Vienna,
Austria, 1992.
[Makinson, 1989] D. Makinson. General theory of cumulative inference.
Proc. 2nd International Workshop on Nonmonotonic Reasoning, Vol. 346
of Lecture Notes in Computer Science, Springer Verlag, Berlin, 1989.
[Manthey and Bry, 1988] R. Manthey and F. Bry. SATCHMO: a theorem
prover implemented in Prolog. Proc. 9th International Conference on
Automated Deduction, CADE '88, Argonne, IL, 415-434, 1988.
[Marek and Truszczynski, 1989] W. Marek and M. Truszczynski. Stable se-
mantics for logic programs and default theories. Proc. NACLP '89, 243-
256, 1989.
[Minker, 1982] J. Minker. On indefinite databases and the closed world as-
sumption. Proc. 6th International Conference on Automated Deduction,
CADE'82, New York. pp. 292-308. Vol 138 of Lecture Notes in Computer
Science, Springer-Verlag, Berlin, 1982.
[Miyaki et al., 1984] T. Miyaki, S. Kunifuji, H. Kitakami, K. Furukawa,
A. Takeuchi and H. Yokota. A knowledge assimilation method for logic
databases. International Symposium on Logic Programming, Atlantic
City, NJ, pp. 118-125, 1984.
[Moore, 1985] R. Moore. Semantical considerations on non-monotonic
logic. Artificial Intelligence, 25, 1985.
[Morris, 1988] P. H. Morris. The anomalous extension problem in default
reasoning. Artificial Intelligence, 35, 383-399, 1988.
[Nebel, 1989] B. Nebel. A knowledge level analysis of belief revision. Proc.
1st International Conference on Principles of Knowledge Representation
and Reasoning, Toronto, 301-311, 1989.
[Nebel, 1991] B. Nebel. Belief revision and default reasoning: syntax-based
approaches. Proc. 2nd International Conference on Principles of Knowl-
edge Representation and Reasoning, Cambridge, MA, 417-428, 1991.
[Pearce and Wagner, 1991] D. Pearce and G. Wagner. Logic programming
with strong negation. Proc. Workshop on Extensions of Logic Program-
ming, Lecture Notes in Computer Science, Springer Verlag, 1991.
[Pearl, 1987] J. Pearl. Embracing causality in formal reasoning. Proc.
AAAI '87,, Washington, Seattle, 360-373, 1987.
[Pearl, 1988] J. Pearl. Probabilistic reasoning in intelligent systems: Net-
works of plausible inference. Morgan Kaufmann, San Mateo, CA, 1988.
[Peirce, 1931-58] C. S. Peirce. Collected papers of Charles Sanders Peirce.
Vol.2, 1931-1958, C. Hartshorn and P. Weiss, eds. Harvard University
Press, 1933.
[Pereira and Alferes, 1992] L. M. Pereira and J. J. Alferes. Well-founded
semantics for logic programs with explicit negation. Proc. European
Conference on Artificial Intelligence, ECAI '92, Vienna, Austria, 102-106,
1992.
[Pereira et al., 1991] L. M. Pereira, J. N. Aparicio and J. J. Alferes. Non-
monotonic reasoning with well-founded semantics. Proc. 8th Interna-
tional Conference on Logic Programming, Paris, MIT Press, Cam-
bridge, MA, 1991.
[Pereira et al., 1991a] L. M. Pereira, J. N. Aparicio and J. J. Alferes. Con-
tradiction removal within well-founded semantics. Proc. 1st International
Workshop on Logic Programming and Nonmonotonic Reasoning, Wash-
ington, DC. A. Nerode, V. Marek and V. Subrahmanian eds., 1991.
[Pereira et al., 1991b] L. M. Pereira, J. N. Aparicio and J. J. Alferes.
Derivation procedures for extended stable models. Proc. 12th Interna-
tional Joint Conference on Artificial Intelligence, Sydney, 863-868, 1991.
[Pereira et al., 1991c] L. M. Pereira, J. N. Aparicio and J. J. Alferes. Coun-
terfactual reasoning based on revising assumptions. Proc. ISLP '91, San
Diego, 1991.
[Pereira et al., 1992] L. M. Pereira, J. J. Alferes and J. N. Aparicio. Con-
tradiction removal semantics with explicit negation. Proc. Applied Logic
Conference, Amsterdam, 1992.
[Pereira et al., 1993] L. M. Pereira, C. V. Damasio and J. J. Alferes. Diag-
nosis and debugging as contradiction removal. Proc. 2nd International
Workshop on Logic Programming and Nonmonotonic Reasoning, Lisbon.
L. M. Pereira and A. Nerode eds. 316-330. MIT Press, Cambridge, MA,
1993.
[Pimentel and Cuadrado, 1989] S. G. Pimentel and J. L. Cuadrado. A
truth maintenance system based on stable models. Proc. NACLP '89,
1989.
[Poole, 1985] D. Poole. On the comparison of theories: preferring the most
specific explanation. Proc. 9th International Joint Conference on Artifi-
cial Intelligence, Los Angeles, CA, 144-147, 1985.
[Poole, 1987] D. Poole. Variables in hypotheses. Proc. 10th International
Joint Conference on Artificial Intelligence, Milan, 905-908, 1987.
[Poole, 1988] D. Poole. A logical framework for default reasoning. Artificial
Intelligence, 36, 27-47, 1988.
[Poole, 1988a] D. Poole. Representing knowledge for logic-based diagnosis.
Proc. International Conference on Fifth Generation Computer Systems,
Tokyo, 1282-1290, 1988.
[Poole, 1989] D. Poole. Explanation and prediction: an architecture for
default and abductive reasoning. Computational Intelligence Journal, 5,
97-110, 1989.
[Poole, 1992] D. Poole. Logic programming, abduction and probability.
Proc. International Conference on Fifth Generation Computer Systems,
Tokyo, 530-538, 1992.
[Poole, 1993] D. Poole. Probabilistic Horn abduction and Bayesian net-
works. Artificial Intelligence, 64, 81-129, 1993.
[Poole et al., 1987] D. Poole, R. G. Goebel and T. Aleliunas. Theorist: a
logical reasoning system for default and diagnosis. The Knowledge Fron-
tier: Essays in the Representation of Knowledge, N. Cercone and G.
McCalla eds. pp. 331-352. Lecture Notes in Computer Science, Springer
Verlag, 1987.
[Pople, 1973] H. E. Pople Jr. On the mechanization of abductive logic.
Proc. 3rd International Joint Conference on Artificial Intelligence, 147-
152, 1973.
[Preist and Eshghi, 1992] C. Preist and K. Eshghi. Consistency-based and
abductive diagnoses as generalised stable models. Proc. International
Conference on Fifth Generation Computer Systems, Tokyo, 514-521,
1992.
[Przymusinski, 1989] T. C. Przymusinski. On the declarative and proce-
dural semantics of logic programs. Journal of Automated Reasoning, 5,
167-205, 1989.
[Przymusinski, 1990] T. C. Przymusinski. Extended stable semantics for
normal and disjunctive programs. Proc. 7th International Conference on
Logic Programming, Jerusalem, 459-477, MIT Press, Cambridge, MA,
1990.
[Przymusinski, 1991] T. C. Przymusinski. Semantics of disjunctive logic
programs and deductive databases. Proc. DOOD '91, 1991.
[Reggia, 1983] J. Reggia. Diagnostic expert systems based on a set-
covering model. International Journal of Man-Machine Studies, 19, 437-
460, 1983.
[Reinfrank and Dessler, 1989] M. Reinfrank and O. Dessler. On the rela-
tion between truth maintenance and non-monotonic logics. Proc. 11th
International Joint Conference on Artificial Intelligence, Detroit, MI,
1206-1212, 1989.
[Reiter, 1978] R. Reiter. On closed world data bases. Logic and Databases,
H. Gallaire and J. Minker eds. pp. 55-76. Plenum, New York, 1978.
[Reiter, 1980] R. Reiter. A Logic for default reasoning. Artificial Intelli-
gence, 13, 81-132, 1980.
[Reiter, 1987] R. Reiter. A theory of diagnosis from first principles. Artificial
Intelligence, 32, 1987.
[Reiter, 1988] R. Reiter. On integrity constraints. Proc. 2nd Conference on
Theoretical Aspects of Reasoning about Knowledge, Pacific Grove, CA,
M. Y. Vardi ed. 1988.
[Reiter, 1990] R. Reiter. On asking what a database knows. Proc. Sympo-
sium on Computational Logic, J. Lloyd ed. Lecture Notes in Computer
Science, Springer Verlag, 1990.
[Reiter and De Kleer, 1987] R. Reiter and J. de Kleer. Foundations of
assumption-based truth maintenance systems: preliminary report. Proc.
AAAI '87, Washington, Seattle, 183-188, 1987.
[Rodi and Pimentel, 1991] W. L. Rodi and S. G. Pimentel. A non-
monotonic ATMS using stable bases. Proc. 2nd International Conference
on Principles of Knowledge Representation and Reasoning, Cambridge,
MA, 1991.
[Sacca and Zaniolo, 1990] D. Sacca and C. Zaniolo. Stable models and non-
determinism for logic programs with negation. Proc. ACM SIGMOD-
SIGACT Symposium on Principles of Database Systems, pp. 205-217,
1990.
[Sadri and Kowalski, 1987] F. Sadri and R. A. Kowalski. An application of
general purpose theorem-proving to database integrity. Foundations of
Deductive Databases and Logic Programming, Minker ed. pp. 313-362.
Morgan Kaufmann Publishers, Palo Alto, CA, 1987.
[Sakama and Inoue, 1993] C. Sakama and K. Inoue. Negation in disjunc-
tive logic programs. Proc. 10th International Conference on Logic Pro-
gramming, Budapest, 703-719, MIT Press, Cambridge, MA, 1993.
[Sakama and Inoue, 1994] C. Sakama and K. Inoue. On the equivalence be-
tween disjunctive and abductive logic programs. Proc. llth International
Conference on Logic Programming, Santa Margherita Ligure, Italy, 489-
503, MIT Press, Cambridge, MA, 1994.
[Sato, 1990] T. Sato. Completed logic programs and their consistency.
Journal of Logic Programming, 9, 33-44, 1990.
[Satoh, 1994] K. Satoh. A top-down proof procedure for default logic
by using abduction. Proc. European Conference on Artificial Intelli-
gence, ECAI '94, Amsterdam, 1994.


[Satoh and Iwayama, 1991] K. Satoh and N. Iwayama. Computing abduc-
tion using the TMS. Proc. 8th International Conference on Logic Pro-
gramming, Paris, 505-518, MIT Press, Cambridge, MA, 1991.
[Satoh and Iwayama, 1992] K. Satoh and N. Iwayama. A correct top-down
proof procedure for general logic programs with integrity constraints.
Proc. 3rd International Workshop on Extensions of Logic Programming,
pp. 19-34. 1992.
[Satoh and Iwayama, 1992a] K. Satoh and N. Iwayama. A query evaluation
method for abductive logic programming. Proc. International Conference
and Symposium on Logic Programming, 671-685, 1992.
[Sattar and Goebel, 1989] A. Sattar and R. Goebel. Using crucial literals
to select better theories. Technical Report, Dept. of Computer Science,
University of Alberta, Canada, 1989.
[Selman and Levesque, 1990] B. Selman and H. J. Levesque. Abductive
and default reasoning: a computational core. Proc. AAAI 90, 343-348,
1990.
[Sergot, 1983] M. Sergot. A query-the-user facility for logic programming.
Integrated Interactive Computer Systems. P. Degano and E. Sandewall
eds. pp. 27-41. North Holland Press, 1983.
[Shanahan, 1989] M. Shanahan. Prediction is deduction but explanation is
abduction. Proc. 11th International Joint Conference on Artificial Intel-
ligence, Detroit, MI, 1055-1060, 1989.
[Simari and Loui, 1992] G. R. Simari and R. P. Loui. A mathematical
treatment of defeasible reasoning and its implementation. Artificial In-
telligence, 53, 125-157, 1992.
[Sperber and Wilson, 1986] D. Sperber and D. Wilson. Relevance: com-
munication and cognition. Blackwell, Oxford, UK, 1986.
[Stickel, 1988] M. E. Stickel. A prolog-like inference system for comput-
ing minimum-cost abductive explanations in natural-language interpre-
tation. Proc. International Computer Science Conference (Artificial In-
telligence: Theory and Applications), Hong Kong, J.-L. Lassez and Shiu-
Kai Chin eds. pp. 343-350, 1988.
[Stickel, 1989] M. E. Stickel. Rationale and methods for abductive rea-
soning in natural language interpretation. Proc. International Scientific
Symposium on Natural Language and Logic, Hamburg, Germany, pp.
233-252. Lecture Notes in Artificial Intelligence, Springer Verlag, 1989.
[Teusink, 1993] F. Teusink. Using SLDFA-resolution with abductive logic
programs. ILPS '93 post-conference workshop "Logic Programming with
Incomplete Information", 1993.
[Toni and Kakas, 1995] F. Toni and A. C. Kakas. Computing the ac-
ceptability semantics. Proc. International Workshop on Logic Program-
ming and Nonmonotonic Reasoning, V. W. Marek, A. Nerode and M.
Truszczynski, eds. LNAI 928, Springer Verlag, 401-415, 1995.
[Toni and Kowalski, 1995] F. Toni and R. A. Kowalski. Reduction of ab-
ductive logic programs to normal logic programs. Proc. 12th Interna-
tional Logic Programming Conference, MIT Press, Cambridge, MA, 367-
381, 1995.
[Toni, 1994] F. Toni. A theorem-proving approach to job-shop scheduling.
Technical Report, Imperial College, London, 1994.
[Torres, 1993] A. Torres. Negation as failure to support. Proc. 2nd Interna-
tional Workshop on Logic Programming and Nonmonotonic Reasoning
Lisbon. L. M. Pereira and A. Nerode eds. 223-243. MIT Press, Cam-
bridge, MA, 1993.
[Van Belleghem et al., 1994] K. Van Belleghem, M. Denecker and D. De
Schreye. Representing continuous change in the abductive event calcu-
lus. Proc. 11th International Conference on Logic Programming, Santa
Margherita Ligure, Italy, 225-239, MIT Press, Cambridge, MA, 1994.
[Van Gelder and Schlipf, 1993] A. Van Gelder and J. S. Schlipf. Common-
sense axiomatizations for logic programs. Journal of Logic Programming,
17, 161-195, 1993.
[Van Gelder et al, 1988] A. Van Gelder, K. A. Ross and J. S. Schlipf. Un-
founded sets and the well-founded semantics for general logic programs.
Proc. ACM SIGMOD-SIGACT, Symposium on Principles of Database
Systems, 1988.
[Wallace, 1987] M. Wallace. Negation by constraints: a sound and efficient
implementation of negation in deductive databases. Proc. 4th Symposium
on Logic Programming, San Francisco, 1987.
Semantics for Disjunctive and Normal
Disjunctive Logic Programs
Jorge Lobo, Jack Minker and Arcot Rajasekar

Contents
1 Introduction
2 Positive consequences in logic programs
2.1 Definite logic programming
2.2 Disjunctive logic programming
3 Negation in logic programs
3.1 Negation in definite logic programs
3.2 Negation in disjunctive logic programs
4 Normal or general disjunctive logic programs
4.1 Stratified definite logic programs
4.2 Stratified disjunctive logic programs
4.3 Well-founded and generalized well-founded logic programs
4.4 Generalized disjunctive well-founded semantics
5 Summary
6 Addendum

1 Introduction
During the past 20 years, logic programming has grown from a new disci-
pline to a mature field. Logic programming is a direct outgrowth of work
that started in automated theorem proving. The first programs based on
logic were developed by Colmerauer and his students [Colmerauer et al.,
1973] at the University of Marseilles in 1972 where the logic programming
language PROLOG was developed. Kowalski [1974] published the first pa-
per that formally described logic as a programming language in 1974. Alain
Colmerauer and Robert Kowalski are considered the founders of the field of
logic programming. van Emden and Kowalski [van Emden and Kowalski,
1976] laid down the theoretical foundation for logic programming. In the
past decade the field has witnessed rapid progress with the publication of
several theoretical results which have provided a strong foundation for logic
programming and extended the scope of logic as a programming language.
The objective of this article is to outline theoretical results that have been
developed in the field of logic programming with particular emphasis on
disjunctive logic programming. Disjunctive logic programming is an exten-
sion of logic programming and is useful in representing and reasoning with
indefinite information.
A disjunctive logic program consists of a finite set of implicitly quanti-
fied universal clauses of the form:

    A1 ∨ A2 ∨ ... ∨ Am ← B1, B2, ..., Bn                    (1)

where the Ai's and the Bj's are atoms. The atoms to the left of the
implication sign form a disjunction called the head of the formula,
and those on the right form a conjunction called the body of the
formula. The formula is read as "A1 or A2 or ... or Am if B1 and B2
and ... and Bn." There are several forms of the formula that one usually
distinguishes. If the body of the formula is empty, and the head is not,
the formula is referred to as a fact. If both are not empty the formula is
referred to as a procedure. A procedure or a fact is also referred to as a logic
program clause. If both head and body of a formula are empty, then the
formula is referred to as the halt statement. Finally, a query is a formula
where the head of the formula is empty and the body is not empty. A finite
set of such logic program clauses is said to constitute a disjunctive logic
program. If the head of a logic program clause consists of a single atom,
then it is called a Horn or definite logic program clause. A finite set of
such definite logic program clauses is said to constitute a Horn or definite
logic program. A definite logic program is a special case of disjunctive logic
program. We shall also consider clauses of the form (1), where the Bj may
be literals. By a literal we mean either an atom or the negation of an atom.
A clause of the form (1) which contains literals in the body is referred to as
a normal (when the head is an atom) or general disjunctive logic program
clause. Similarly we also deal with queries which can have literals and we
refer to them as general queries.
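The clause classification above can be made concrete with a small sketch. The following Python fragment is only illustrative and is not part of the chapter's formal development; the class and method names are hypothetical. It stores a clause as a head list and a body list and reports which of the four forms it has.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Clause:
    head: List[str] = field(default_factory=list)   # the disjunction A1, ..., Am
    body: List[str] = field(default_factory=list)   # the conjunction B1, ..., Bn

    def kind(self) -> str:
        # Classification used in the text: head only = fact, head and body =
        # procedure, body only = query, neither = halt statement.
        if self.head and not self.body:
            return "fact"
        if self.head and self.body:
            return "procedure"
        if self.body:
            return "query"
        return "halt"

# path(a,b), unconnected(b,a) <-  is a (disjunctive) fact:
print(Clause(head=["path(a,b)", "unconnected(b,a)"]).kind())   # prints: fact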
This article is divided into several sections. In the section Positive con-
sequences in logic programs, we describe the theory developed to character-
ize logical consequences from definite and disjunctive logic programs whose
forms are as in (1). In the section Negation in logic programs, we describe
the theories developed to handle general queries in definite and disjunctive
logic programs. In the section Normal or general disjunctive logic programs
we discuss several topics: stratified logic programs, well-founded and gener-
alized well-founded logic programs and generalized disjunctive well-founded
logic programs. In the subsection Stratified logic programs, the theory for
general logic programs with no recursion through negative literals in the
body of a logic program clause is described. Such logic programs are called
stratified logic programs. In the subsection Well-founded and generalized
well-founded logic programs, we describe extensions to the theory that han-
dle normal logic programs that are not stratifiable. In the subsection Gen-
eralized disjunctive well-founded logic programs, we extend the results to
normal disjunctive logic programs. This permits the full gamut of possible
logic programs to be discussed from a theoretical view. We do not treat
theories in which the left hand side of a logic program clause may contain
negated atoms.

2 Positive consequences in logic programs


A question that arises with any programming language is that of the se-
mantics of the program. What does a program written in that language
mean, and what programs can be written in the language? In logic pro-
gramming the fact that one is dealing with a set of logical statements
permits one to use concepts from classical logic to define a semantics for
a logic program. It is convenient and sufficient to focus on the Herbrand
domain of a logic program to capture the semantics of a logic program. By
the Herbrand domain of a logic program P, denoted as Up, we mean the
set of all terms formed from the set of all constants in the logic program
and recursively the set of all functions whose arguments are terms. If there
are no constants in the logic program, then an arbitrary constant is added
to the domain. Given the Herbrand domain, one can then consider the
Herbrand base, denoted as Bp, which is the set of all ground predicates
that can be constructed from the Herbrand domain. It is sufficient to de-
fine a semantics for the logic program over this domain [Lloyd, 1987]. We
use Herbrand interpretations, which are subsets of the Herbrand base, to
specify the semantics of logic programs. A Herbrand model (or model) of
a logic program is an interpretation that satisfies all clauses in the logic
program.
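As a minimal illustration of these notions, the following Python sketch (hypothetical helper names; ground clauses are written as pairs of atom sets, with the head read as a disjunction and the body as a conjunction) checks whether a given Herbrand interpretation satisfies every clause of a ground program and is therefore a Herbrand model.

from typing import FrozenSet, List, Set, Tuple

GroundClause = Tuple[FrozenSet[str], FrozenSet[str]]   # (head atoms, body atoms)

def satisfies(interp: Set[str], clause: GroundClause) -> bool:
    head, body = clause
    # The clause holds unless every body atom is true while no head atom is.
    return not body <= interp or bool(head & interp)

def is_herbrand_model(interp: Set[str], program: List[GroundClause]) -> bool:
    return all(satisfies(interp, clause) for clause in program)

program = [
    (frozenset({"edge(a,b)"}), frozenset()),
    (frozenset({"edge(b,c)"}), frozenset()),
    (frozenset({"path(a,b)"}), frozenset({"edge(a,b)"})),
]
print(is_herbrand_model({"edge(a,b)", "edge(b,c)", "path(a,b)"}, program))   # True
print(is_herbrand_model({"edge(a,b)"}, program))                             # False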
Example 2.0.1. Consider the following definite logic program:

P = { path(X, Y) ← edge(X, Y),
      path(X, Y) ← edge(X, Z), path(Z, Y),
      edge(a, b) ←,
      edge(b, c) ← }
are two Herbrand interpretations of P.

are two Herbrand models of P.

2.1 Definite logic programming


In their 1976 paper, van Emden and Kowalski [1976] defined different se-
mantics for a definite logic program. These are referred to as model the-
oretic, proof theoretic (or procedural), and fixpoint (or denotational) se-
mantics. Since we are dealing with logic, a natural semantics is to state
that the meaning of a definite logic program is given by a Herbrand model
of the theory. Hence, the meaning of the logic program is the set of atoms
that are in the model. However, this definition is too broad as there may
be atoms in the Herbrand model that one would not want to conclude to
be true. For example, in the logic program given in Example 2.0.1, M2 is
a Herbrand model and includes atoms edge(c,c), path(b,a), path(a,a). It
is clear that the logic program does not state that any of these atoms are
true, van Emden and Kowalski showed that for definite logic programs,
the intersection of all Herbrand models of a logic program is a Herbrand
model of the logic program. This property is called the Herbrand model
intersection property. The intersection of all Herbrand models is the least
Herbrand model as it is contained within all models. The least model cap-
tures all the ground atomic logical consequences of the logic program and
represents the least amount of information that can be specified as true.
The least Herbrand model of a logic program P is denoted as Mp.
Example 2.1.1. Consider the definite logic program P given in Exam-
ple 2.0.1. The least Herbrand model of P is given by

We can see that these are the only ground atoms which are logical con-
sequences of P.
A second semantics that can be associated with a logic program is
a procedural semantics. Godel showed that one obtains the same results
with proof theory as one does from model theory. Van Emden and Kowalski
[1976] showed that if one uses a proof procedure called linear resolution with
selection function for definite logic programs (SLD-resolution), the ground
atoms that are derivable using SLD from the logic program, forming the
SLD-success set, (SLD(P)) of the logic program, are exactly the same as
the atoms in the least Herbrand model, Mp. SLD-resolution is a reduction
type of processing and derives a sequence of queries, starting from the
query. When a halt statement is derived, the SLD-resolution succeeds and
the query is considered to be derivable using SLD-resolution. If no halt
statement is obtained, the query is said to have failed.
Definition 2.1.2 ([van Emden and Kowalski, 1976]
(SLD-derivation)). Let P be a definite logic program and G be a query
← A1, ..., Am, ..., An. An SLD-derivation from P with top query G
consists of a (finite or infinite) sequence of queries G0 = G, G1, ..., such
that for all i ≥ 0, Gi+1 is obtained from Gi as follows:
1. Am is an atom in Gi and is called the selected atom;
2. A ← B1, ..., Bq is a program clause in P (which is standardized apart
with respect to Gi);
3. Amθ = Aθ, where θ is a (most general) substitution;
4. Gi+1 is the query ← (A1, ..., Am-1, B1, ..., Bq, Am+1, ..., An)θ.
In the example given below, we show an SLD-derivation of the halt state-
ment.
Example 2.1.3. Consider the definite logic program P given in Exam-
ple 2.0.1. Let «- path(a,c) be the query to be solved.
Then we have the following SLD-derivation:
1. ← path(a, c)
2. ← edge(a, Z), path(Z, c)   from the clause path(X, Y) ← edge(X, Z), path(Z, Y)
3. ← path(b, c)               from the clause edge(a, b) ←
4. ← edge(b, c)               from the clause path(X, Y) ← edge(X, Y)
5. ←                          from the clause edge(b, c) ←
The query succeeds and path(a, c) is an element of the set SLD(P).
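The goal-reduction flavour of SLD-resolution can be sketched for the ground case as follows. The Python fragment below is only illustrative: it lists just the ground instances used in the derivation above, always selects the leftmost goal, and omits unification and any serious loop checking (a crude depth bound stands in for both); all names are hypothetical.

from typing import List, Tuple

# Ground instances of the clauses used in the derivation above.
clauses: List[Tuple[str, List[str]]] = [
    ("edge(a,b)", []),
    ("edge(b,c)", []),
    ("path(a,c)", ["edge(a,b)", "path(b,c)"]),
    ("path(b,c)", ["edge(b,c)"]),
]

def sld(goals: List[str], depth: int = 25) -> bool:
    """Reduce the leftmost goal against a matching clause, as in an SLD-derivation."""
    if not goals:
        return True            # the empty query (halt statement) has been derived
    if depth == 0:
        return False           # crude guard in place of a proper loop check
    selected, rest = goals[0], goals[1:]
    for head, body in clauses:
        if head == selected and sld(body + rest, depth - 1):
            return True
    return False

print(sld(["path(a,c)"]))      # True: the query <- path(a,c) succeeds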
A third semantics is obtained by defining a mapping, T, from Herbrand
interpretations to Herbrand interpretations. As in denotational semantics,
if the domain over which the mapping, T, is defined is a complete lattice
and the mapping is continuous, then the mapping, T, has a least fixpoint
(lfp(T)), which is taken to be the meaning of the logic program. By a
fixpoint of a mapping T, we mean an element I in the domain of T that
satisfies the equation T(I) = I. The lfp(T) is computed by iteratively
applying the operator T starting with the bottom element of the lattice
until a fixpoint is reached. The set of Herbrand interpretations forms a
complete lattice over the partial ordering ⊆. The bottom element of the
lattice is the null set ∅. The following mapping, defined by van Emden and
Kowalski [1976] is a continuous mapping.
Definition 2.1.4 ([van Emden and Kowalski, 1976]). Let P be a defi-
nite logic program, and let I be a Herbrand interpretation. Then

Tp(I) = {A ∈ HBp | A ← B1, ..., Bn is a ground instance of a clause in P,
and {B1, ..., Bn} ⊆ I}.

Example 2.1.5. Consider the definite logic program P given in Exam-


ple 2.0.1. The least fixpoint of Tp is computed as follows:

The least fixpoint contains all the ground atoms which are logical con-
sequences of P
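A direct way to see the fixpoint semantics at work is to iterate Tp on a ground program. The sketch below is a Python illustration with hypothetical names, listing only the ground instances relevant to this example; it computes the least fixpoint by iterating from the empty interpretation.

from typing import List, Set, Tuple

GroundClause = Tuple[str, List[str]]    # (head atom, body atoms)

def tp(program: List[GroundClause], interp: Set[str]) -> Set[str]:
    """One application of the Tp operator on a ground definite program."""
    return {head for head, body in program if all(b in interp for b in body)}

def lfp_tp(program: List[GroundClause]) -> Set[str]:
    interp: Set[str] = set()            # start from the bottom of the lattice
    while True:
        nxt = tp(program, interp)
        if nxt == interp:
            return interp               # Tp(I) = I: the least fixpoint
        interp = nxt

program = [
    ("edge(a,b)", []), ("edge(b,c)", []),
    ("path(a,b)", ["edge(a,b)"]), ("path(b,c)", ["edge(b,c)"]),
    ("path(a,c)", ["edge(a,b)", "path(b,c)"]),
]
print(sorted(lfp_tp(program)))
# ['edge(a,b)', 'edge(b,c)', 'path(a,b)', 'path(a,c)', 'path(b,c)']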
The major result is that the model theoretic, the procedural and fixpoint
semantics all capture the same meaning to a logic program: the set of
ground atoms that are logical consequences of the logic program.
Theorem 2.1.6 ([van Emden and Kowalski, 1976]
(Horn Characterization—Positive)). Let P be a definite logic program
and A ∈ HBp. Then the following are equivalent:
(a) A is in Mp
(b) A is in the least fixpoint of Tp
(c) A is in SLD(P)
(d) A is a logical consequence of P.

2.2 Disjunctive logic programming


In a disjunctive logic program, a minimal Herbrand model may not exist
uniquely. Consider the disjunctive logic program

path(a, b), unconnected(b, a) ←                                    (3)

This clause is equivalent to

path(a, b) ∨ unconnected(b, a)                                     (4)

and has two minimal Herbrand models, {path(a, b)}, and {unconnected(b, a)}.
Neither model contains the other. Furthermore, the intersection of the
two minimal Herbrand models is not a model of the logic program. Hence,
disjunctive logic programs do not have the Herbrand model intersection
property of definite logic programs.
Although there is no unique minimal Herbrand model in a disjunctive
logic program, there is a set of minimal Herbrand models. In 1982, Minker
[1982] developed a suitable model theoretic semantics for disjunctive logic
programs in terms of the minimal models. He showed that a ground clause
is a logical consequence of a disjunctive logic program P if and only if it is
true in every minimal Herbrand model of P. This semantics extends the
unique minimal model theory of definite logic programs to disjunctive logic
programs.
Example 2.2.1. Consider the following disjunctive logic program:
P = {(1) path(X,Y),unconnected(X, Y) <- point(X),point(Y);
(2) point(a) ;
(3) point(b)}.
The minimal Herbrand models of P are given by

M1p  = {path(a,a), path(a,b), path(b,a), path(b,b), point(a), point(b)}
M2p  = {unconnected(a,a), path(a,b), path(b,a), path(b,b), point(a), point(b)}
M3p  = {path(a,a), unconnected(a,b), path(b,a), path(b,b), point(a), point(b)}
M4p  = {path(a,a), path(a,b), unconnected(b,a), path(b,b), point(a), point(b)}
M5p  = {path(a,a), path(a,b), path(b,a), unconnected(b,b), point(a), point(b)}
M6p  = {unconnected(a,a), unconnected(a,b), path(b,a), path(b,b), point(a), point(b)}
M7p  = {unconnected(a,a), path(a,b), unconnected(b,a), path(b,b), point(a), point(b)}
M8p  = {unconnected(a,a), path(a,b), path(b,a), unconnected(b,b), point(a), point(b)}
M9p  = {path(a,a), unconnected(a,b), unconnected(b,a), path(b,b), point(a), point(b)}
M10p = {path(a,a), unconnected(a,b), path(b,a), unconnected(b,b), point(a), point(b)}
M11p = {path(a,a), path(a,b), unconnected(b,a), unconnected(b,b), point(a), point(b)}
M12p = {unconnected(a,a), unconnected(a,b), unconnected(b,a), path(b,b), point(a), point(b)}
M13p = {unconnected(a,a), unconnected(a,b), path(b,a), unconnected(b,b), point(a), point(b)}
M14p = {unconnected(a,a), path(a,b), unconnected(b,a), unconnected(b,b), point(a), point(b)}
M15p = {path(a,a), unconnected(a,b), unconnected(b,a), unconnected(b,b), point(a), point(b)}
M16p = {unconnected(a,a), unconnected(a,b), unconnected(b,a), unconnected(b,b), point(a), point(b)}
We can see that every ground clause which is a logical consequence of P is
true in every minimal Herbrand model of P
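For very small ground programs the minimal-model semantics can be checked by brute force. The following Python sketch (hypothetical names; it enumerates all subsets of the Herbrand base and is therefore only feasible for tiny examples) is run on the single clause path(a,b), unconnected(b,a) ← discussed at the start of this subsection.

from itertools import chain, combinations
from typing import FrozenSet, List, Set, Tuple

GroundClause = Tuple[FrozenSet[str], FrozenSet[str]]   # (head disjunction, body conjunction)

def is_model(interp: FrozenSet[str], program: List[GroundClause]) -> bool:
    return all(not body <= interp or bool(head & interp) for head, body in program)

def minimal_models(program: List[GroundClause], base: Set[str]) -> List[FrozenSet[str]]:
    """Enumerate all models over the Herbrand base and keep the minimal ones."""
    atoms = sorted(base)
    models = [frozenset(s)
              for s in chain.from_iterable(combinations(atoms, r) for r in range(len(atoms) + 1))
              if is_model(frozenset(s), program)]
    return [m for m in models if not any(other < m for other in models)]

program = [(frozenset({"path(a,b)", "unconnected(b,a)"}), frozenset())]
for m in minimal_models(program, {"path(a,b)", "unconnected(b,a)"}):
    print(set(m))      # {'path(a,b)'} and {'unconnected(b,a)'}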
There is a corresponding result with respect to disjunctive logic pro-
grams that is similar to the Herbrand model intersection property. In
contrast to definite logic programs, the ground logical consequences can-
not necessarily be characterized by a set of ground atoms. To capture the
logical consequences in a unique structure, the structure has to be defined
over disjunctive clauses, rather than a structure dealing with atoms such
as Herbrand base and interpretations. Minker and Rajasekar therefore de-
fined the disjunctive Herbrand base (DHBp) to consist of the set of all
positive ground disjunctive clauses that can be constructed using the Her-
brand base of a program. The DHBp is used to define concepts similar to
Herbrand interpretations and models of definite logic programs. Subsets
of the DHB are referred to as states. A state is similar to a Herbrand
interpretation. A structure corresponding to Herbrand models of definite
programs is defined as follows.
Definition 2.2.2 ([Lobo et al, 1989]). Consider a disjunctive logic pro-
gram P. A model-state is defined to be a state of P such that
1. every minimal model of P is a model of the set of clauses in the
model-state,
2. every minimal model of the model-state is a model of P.
The intersection of all model-states is a model-state and it is the least
model-state. Lobo, Minker and Rajasekar [Lobo et al., 1989] showed that
the least model-state captures all the ground clausal logical consequences of
the logic program and represents the least amount of information that can
be specified as true. The least model-state of a disjunctive logic program
P is denoted as MSp.
Example 2.2.3. Consider the disjunctive logic program P given in Ex-
ample 2.2.1. The least model-state of P is given by

MSp = {path(a,a) V unconnected(a,a),path(a,b) V unconnected(a,b),


path(b, a) V unconnected(b,a),path(b, b) V unconnected(b, b),
point(a),point(b)}

We can see that every clause in MSp is true in every minimal Herbrand
model of P enumerated in Example 2.2.1.
To obtain the proof theoretic semantics of a disjunctive logic program,
the inference system, linear resolution with selection function for indefinite
programs (SLI-resolution), developed by Minker and Zanon [1979] is used.
SLI-resolution is defined using trees as the basic representation. Each node
in the tree is a literal and there are two types of literals: marked liter-
als or A-literals and unmarked literals or B-literals. A non-terminal node
is always an A-literal whereas a terminal literal can be either an A- or a
B-literal. These tree structures are called t-clauses. A t-clause can also
be viewed as a pre-order representation of a resolution tree. It is a well-
parenthesized expression such that every open parenthesis is followed by a
marked literal. A t-clause is a special representation of a clause and embeds
the information about the ancestry of each literal during a derivation. A
literal is marked if it has been selected in an SLI-derivation. During deriva-
tion, an unmarked literal in the query t-clause is selected and marked. This
literal can be either positive or negative. The selected literal is unified with
a complementary literal in the program clause. The resolvent is attached
as a subtree to the literal in the query clause. The t-clause is then made
admissible and minimal by performing factoring, ancestry-resolution and
truncation. The notions of factoring, ancestry-resolution and truncation
are similar to that in SL-resolution [Kowalski and Kuehner, 1971]. Pro-
gram and query clauses are represented in the form (e* L1 ... Ln), where e
is a special symbol, * is a marking and the Li's are literals.
Example 2.2.4.

is a t-clause representation of

We next give a formal definition for an SLI-derivation. First, we define


two sets of literals which are used during resolution.

A t-clause is said to satisfy the admissibility condition (AC) if for every


occurrence of every B-literal L in it the following conditions hold:
(i) No two literals from γL and L have atoms which unify,
(ii) No two literals from δL and L have atoms which unify.
A t-clause is said to satisfy the minimality condition (MC) if there is no
A-literal which is a terminal node.
γL and δL are used while performing factoring and ancestry-resolution
respectively. AC and MC make sure that truncation, factoring and ancestry-
resolution are performed as soon as possible. Now we have the framework
for describing an SLI-resolution. We modify the definition given in [Minker
and Zanon, 1979] and provide a definition similar to that of SLD-resolution.
Definition 2.2.5 ([Minker and Rajasekar, 1987]). Consider a t-clause
C0. Then Cn is a tranfac-derivation (truncation, ancestry and factoring)
of C0 when there is a sequence of t-clauses C0, C1, ..., Cn such that for all
i, 0 ≤ i < n, Ci+1 is obtained from Ci by either t-factoring, t-ancestry, or
t-truncation.
Ci+1 is obtained from Ci by t-factoring iff
1. Ci is (α1 L α2 M α3) or Ci is (α1 M α2 L α3);
2. L and M have the same sign and unify with mgu θ;
3. L is in γM (i.e., L is at a higher level of the tree);
4. Ci+1 is (α1θ Lθ α2θ α3θ) or Ci+1 is (α1θ α2θ Lθ α3θ).
Ci+1 is obtained from Ci by t-ancestry iff
1. Ci is (α1 (L* α2 (α3 M α4) α5) α6);
2. L and M are complementary and unify with mgu θ;
3. L is in δM;
4. Ci+1 is (α1θ (Lθ* α2θ (α3θ α4θ) α5θ) α6θ).
Ci+1 is obtained from Ci by t-truncation iff
either Ci is (α (L*) β) and Ci+1 is (α β),
or Ci is (e*) and Ci+1 is the null clause [].
Definition 2.2.6 ([Minker and Rajasekar, 1987]). Consider a t-clause
Ci of the form (e* α1 L β1) and a program clause Bi+1 of the form
(e* α2 M β2) which is standardized apart with respect to Ci. Let L be an
arbitrary literal in Ci selected for expansion. Then C'i+1 is derived from
Ci and Bi+1 if the following conditions hold:
(a) L and M are complementary and unify with mgu θi+1 = θ;
(b) C'i+1 is (e* α1θ (Lθ* α2θ β2θ) β1θ);
(c) Ci+1 is a tranfac-derivation of C'i+1;
(d) Ci+1 satisfies the admissibility and minimality conditions.
Definition 2.2.7 ([Minker and Rajasekar, 1987]). An SLI-derivation
of a t-clause E from a disjunctive program P with top t-clause C is a
sequence of t-clauses (C1, ..., Cn) such that:
• C1 is a tranfac-derivation of C, and Cn is E;
• For all i, 1 ≤ i < n, Ci+1 is derived from Ci and a program clause
Bi+1 in P ∪ {C}.
An SLI-derivation succeeds when it derives a null clause []. The example
given next illustrates the SLI-derivation procedure.
Example 2.2.8. Consider the program P given in Example 2.2.1 aug-
mented with the clauses
{s(X, Y) ← path(X, Y),
s(X, Y) ← unconnected(X, Y)}
Let ← s(a, a) be the query to be solved. The t-clause representation of the
augmented program is as follows:
(1) (e* s(X, Y) ¬path(X, Y))
(2) (e* s(X, Y) ¬unconnected(X, Y))
(3) (e* path(X, Y) unconnected(X, Y) ¬point(X) ¬point(Y))
(4) (e* point(a))
(5) (e* point(b))

and the query clause is translated into

(e* ¬s(a, a))

We show that there is an SLI-refutation.

As the proof theoretic semantics one should take the set of all positive
ground clauses (that is, disjunctive clauses made of ground atoms from
the Herbrand base) that one can derive from the logic program using SLI
resolution. We call this set the SLI-success set, (SLI(P)), of P. The proof
theoretic and the model theoretic semantics yield the same results.
Theorem 2.2.9 ([Minker and Zanon, 1979]). Consider a disjunctive
logic program P and a ground positive disjunctive clause C. Then,
C ∈ SLI(P) if and only if C is a logical consequence of P.
To obtain the fixpoint semantics of a disjunctive logic program, Minker
and Rajasekar [1987a] modified the van Emden-Kowalski fixpoint opera-
tor Tp. When working with disjunctive logic programs, it is not possible
to map Herbrand interpretations to Herbrand interpretations. The natu-
ral mapping with a disjunctive theory is to map a set of positive ground
disjuncts to positive ground disjuncts. Minker and Rajasekar used the dis-
junctive Herbrand base, DHBp, and its subsets to define a lattice for the
mapping. Subsets of the DHBp under the partial order C form a lattice.
Minker and Rajasekar defined their fixpoint operator to be:
Definition 2.2.10 ([Minker and Rajasekar, 1987a]). Let P be a dis-
junctive logic program, and let S be a state. Then
TP(S) = {C ∈ DHB(P) | C' ← B1, ..., Bn is a ground instance of a
clause in P,
B1 ∨ C1, ..., Bn ∨ Cn are in S, and C is the smallest factor of the
clause C' ∨ C1 ∨ ... ∨ Cn, where the Ci, 1 ≤ i ≤ n, are positive clauses}.
The smallest factor of a ground clause C is another clause C' such that C' has
only distinct ground atoms and is a subset of C. Minker and Rajasekar
[l987a] showed that the operator Tp is continuous and the least fixpoint of
Tp captures all the minimal ground clauses which are SLI-derivable from
a disjunctive logic program P. By a minimal ground clause which is SLI-
derivable we mean that no sub-clause of it is in the set SLI(P).
Example 2.2.11. Consider the disjunctive logic program P given in Ex-
ample 2.2.1. The least fixpoint of Tp is computed as follows:
Tp(∅) = {point(a), point(b)}
Tp({point(a), point(b)})
= {path(a, a) ∨ unconnected(a, a), path(a, b) ∨ unconnected(a, b),
path(b, a) ∨ unconnected(b, a), path(b, b) ∨ unconnected(b, b), point(a),
point(b)}
This is the least fixpoint and contains all the minimal ground clauses which
are derivable from P.
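The state-based fixpoint construction can be sketched for ground programs as follows. The Python fragment below uses hypothetical names: positive clauses are frozensets of atoms, the smallest factor is obtained simply by set union, and derivable clauses are accumulated in the spirit of Definition 2.2.10 without the subsumption bookkeeping a real implementation would need.

from typing import FrozenSet, List, Set, Tuple

PosClause = FrozenSet[str]                      # positive ground disjunction
GroundRule = Tuple[PosClause, Tuple[str, ...]]  # (head disjunction, body atoms)

def tp_state(program: List[GroundRule], state: Set[PosClause]) -> Set[PosClause]:
    """One application of the state operator of Definition 2.2.10 (sketch)."""
    derived: Set[PosClause] = set()
    for head, body in program:
        # choose, for every body atom Bi, some clause Bi v Ci already in the state
        choices = [[c for c in state if b in c] for b in body]
        if any(not alternative for alternative in choices):
            continue
        def expand(i: int, extra: PosClause) -> None:
            if i == len(body):
                derived.add(head | extra)       # set union plays the role of the smallest factor
                return
            for c in choices[i]:
                expand(i + 1, extra | (c - {body[i]}))
        expand(0, frozenset())
    return derived

def lfp_state(program: List[GroundRule]) -> Set[PosClause]:
    state: Set[PosClause] = set()
    while True:
        nxt = state | tp_state(program, state)   # accumulate derived clauses
        if nxt == state:
            return state
        state = nxt

program = [
    (frozenset({"point(a)"}), ()), (frozenset({"point(b)"}), ()),
    (frozenset({"path(a,b)", "unconnected(a,b)"}), ("point(a)", "point(b)")),
]
for clause in sorted(lfp_state(program), key=sorted):
    print(set(clause))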
The major result in disjunctive logic programming is that the model
semantics based on Herbrand models and model-states, the proof seman-
tics and the fixpoint semantics yield the same semantics and capture the
set of minimal positive clauses that are logical consequences of the logic
program. Moreover, each of these semantics reduce to corresponding se-
mantics for definite logic programs discussed in the previous section when
Semantics            Definite                                 Disjunctive
                     Theory        Reference                  Theory           Reference
Positive consequences
Fixpoint semantics   Tp ↑ ω        van Emden and              Tp ↑ ω           Minker and
                                   Kowalski [1976]                             Rajasekar [1987a]
Model theory         Least model   van Emden and              Minimal models   Minker [1982]
                                   Kowalski [1976]            Model-state      Lobo et al. [1989]
Procedure            SLD           van Emden and              SLI              Minker and
                                   Kowalski [1976]                             Zanon [1979]
Table 1. Semantics for positive consequences for logic programs

applied to definite logic programs. Compare the following theorem with


Theorem 2.1.6.
Theorem 2.2.12 ([Minker and Rajasekar, 1987a; Lobo et al., 1989]
Disjunction Characterization—Positive). Let P be a disjunctive logic
program and C ∈ DHBp. Then the following are equivalent:
(a) C is true in every minimal Herbrand model of P
(a') C' is in MSp, where C' is a subclause of C
(b) C' is in the least fixpoint of Tp, where C' is a subclause of C
(c) C is in SLI(P)
(d) C is a logical consequence of P.
Table 1 summarizes the results discussed in this section.

3 Negation in logic programs


Given a definite or a disjunctive logic program, the only answers that may
be derived are positive. It is not possible to answer a general query. One
would have to permit negative information to be stored with the logic pro-
gram. However, adding negative data could overwhelm a system as there
is an unlimited amount of negative information that may apply. We de-
scribe several ways in which one may conclude negative information from
definite or disjunctive theories without having to add negative data to the
logic program. The theories of negation described below lead to nonmono-
tonic logics, important for commonsense reasoning. By a nonmonotonic
logic is meant one in which the addition of a new truth to a theory may
cause previous truths to become false. A logic program without negation
is monotonic.

3.1 Negation in definite logic programs


Reiter [1978] and independently Clark [1978] were the first to address nega-
tion in definite logic programs and deductive databases. Their results ap-
peared in a book edited by Gallaire and Minker [1978] entitled Logic and
Data Bases. To answer negated queries, Reiter defined the closed world as-
sumption (CWA). According to the CWA, one can assume a negated atom
to be true if one cannot prove the atom from the logic program. Reiter
showed that the union of the theory and the negated atoms proved by the
CWA is consistent. Clark viewed negation of an atom as a lack of sufficient
condition for provability of the atom. Clark argued that the logic program
clauses in a definite logic program should be viewed as definitions of the
atoms in the Herbrand base of the program. Hence they are necessary and
sufficient to provide a proof for the atoms. That is, what one should do is
consider the logic clauses as definitions to imply if-and-only-if conditions
instead of only if conditions. To do so, one effectively reverses the if con-
dition in the logic program clauses of a definite logic program P to be an
only-if condition and then takes the union of this set of clauses with the
original logic program clauses. The union of these two sets, augmented by
Clark's equality theory (CET) [Clark, 1978] is referred to as the Clark com-
pletion of the program, and written comp(P). Clark shows that by using
what is called the negation as finite failure (NAF) rule on the if defini-
tions, one can conclude the negation of a ground atom if it fails finitely to
prove the atom. He augmented SLD-resolution with this rule, and called it
SLDNF-resolution, and showed that it is sound and complete with respect
to the semantics defined by comp(P).
Shepherdson [1984; 1985; 1987] showed a relationship between answers
found using the CWA and the comp(P) theories of negation. He provides
conditions under which they are the same and under which they may differ.

3.2 Negation in disjunctive logic programs


In his discussion of the CWA in 1978, Reiter [1978] showed that the CWA
applied to disjunctive theories leads to inconsistencies. Consider the theory
{p(a) ∨ p(b)}. Since it is possible to prove neither p(a) nor p(b), by the CWA
one may assume ¬p(a) and ¬p(b). But the union is now inconsistent. That
is, {p(a) ∨ p(b), ¬p(a), ¬p(b)} is inconsistent.
To overcome this problem, Minker [1982] defined the generalized closed
world assumption (GCWA). There are two ways to characterize the GCWA:
model theoretic and proof theoretic.
Definition 3.2.1 ([Minker, 1982]). Let P be a disjunctive logic pro-
gram. The set of negative literals that can be assumed using the GCWA
is given by:
Model-theoretic definition:
GCWA(P) = {¬A | A ∈ HBP and A is not in any minimal Herbrand model of P}
Proof-theoretic definition:
GCWA(P) = {¬A | A ∈ HBP and, for every positive (possibly empty) ground
clause K, P ⊢ A ∨ K implies P ⊢ K}
Minker showed that these two definitions are equivalent.
Example 3.2.2. Consider the following disjunctive logic program:


P = { (1) path(X, Y), unconnected(X, Y) ← point(X), point(Y) ;
(2) path(X, X) ← point(X) ;
(3) point(a) ;
(4) point(b)}.
The minimal Herbrand models of P are given by

M1p = {path(a, a),path(a, b),path(b,a),path(b, b),point(a),point(b)}


M2p = {path(a, a), unconnected(a, b),path(b, a),path(b, b),point(a),
point(b)}
M3p = {path(a, a),path(a, b),unconnected(b, a),path(b, b),point(a),
point(b)}
M4p = {path(a, a), unconnected(a, b),unconnected(b, a),path(b, b),
point (a), point (b)}
GCWA(P) = {¬A | A ∈ HBP − (M1p ∪ M2p ∪ M3p ∪ M4p)}
= {¬unconnected(a, a), ¬unconnected(b, b)}

Answering negative queries with the GCWA is computationally difficult


as shown by Chomicki and Subrahmanian [1989].
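The model-theoretic characterization of the GCWA suggests a direct, if expensive, computation: enumerate the minimal Herbrand models and collect the atoms that appear in none of them. The following Python sketch (hypothetical names, brute force over subsets of the Herbrand base, so only feasible for tiny ground programs) does exactly that for the clause p(a) ∨ p(b) discussed above.

from itertools import chain, combinations
from typing import FrozenSet, List, Set, Tuple

GroundClause = Tuple[FrozenSet[str], FrozenSet[str]]

def is_model(interp: FrozenSet[str], program: List[GroundClause]) -> bool:
    return all(not body <= interp or bool(head & interp) for head, body in program)

def minimal_models(program: List[GroundClause], base: Set[str]) -> List[FrozenSet[str]]:
    candidates = [frozenset(s)
                  for s in chain.from_iterable(combinations(sorted(base), r)
                                               for r in range(len(base) + 1))
                  if is_model(frozenset(s), program)]
    return [m for m in candidates if not any(other < m for other in candidates)]

def gcwa(program: List[GroundClause], base: Set[str]) -> Set[str]:
    """Atoms whose negation may be assumed: those in no minimal Herbrand model."""
    mins = minimal_models(program, base)
    return {atom for atom in base if all(atom not in m for m in mins)}

program = [(frozenset({"p(a)", "p(b)"}), frozenset())]    # p(a) v p(b) <-
print(gcwa(program, {"p(a)", "p(b)"}))                    # set(): neither negation is assumed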
Lobo, Rajasekar and Minker [Lobo et al., 1988] show that one can
define a completion for a disjunctive logic program in a manner similar
to the Clark completion [Clark, 1978] of definite logic programs. They
refer to the completed theory for disjunctive logic programs as dcomp(P).
They showed that a rule of negation called negation as finite failure in
disjunctive logic programs can be defined, similar to the NAF-rule [Clark,
1978] of definite logic programs, and used to augment SLI-resolution to
answer negative ground queries in disjunctive logic programs. They showed
that this procedure, called SLINF-resolution, is sound and complete with
respect to the completion of the disjunctive logic program. Rajasekar, Lobo
and Minker [Rajasekar et al., 1987] also show that a theory of negation
similar to the closed world assumption called the weak generalized closed
world assumption (WGCWA), can be defined. They [Rajasekar et al., 1987]
showed that WGCWA is no more difficult to compute than negation in the
CWA theory [Reiter, 1978] associated with definite logic programs. Hence,
the WGCWA is computationally less complex than the GCWA [Minker,
1982]. In the WGCWA, we may assume the negation of an atom p(a)
if there is no positive ground clause that can be derived from the logic
program that contains p(a).
Definition 3.2.3 ([Rajasekar et al., 1987]). Let P be a disjunctive
logic program. The set of negative literals that can be assumed using the
WGCWA is given by:
Proof-theoretic definition:
WGCWA(P) = {¬A | A ∈ HBP and there is no positive (possibly empty)
ground clause K such that A ∨ K is derivable from P}.
We can consider the concept of derivability from a program P as equiv-
alent to membership in the least fixpoint of Tp.

Semantics            Horn                             Disjunctive
                     Theory    Reference              Theory        Reference
Negation
Theory of negation   CWA       [Reiter, 1978]         GCWA          Minker [1982],
                                                                    Minker and Rajasekar [1987a]
                                                      WGCWA         Ross and Topor [1987],
                                                                    Rajasekar et al. [1987]
Rule of negation     NAF       [Clark, 1978]          SN-rule       Minker and Rajasekar [1987]
                                                      NAFFD-rule    [Rajasekar et al., 1987]
Procedure            SLDNF     [Clark, 1978]          SLINF         Minker and
                                                                    Rajasekar [1987; 1987a]
Table 2. Semantics for Negative Consequences for Logic Programs
Example 3.2.4. Consider the disjunctive logic program P given in Exam-
ple 3.2.2. The derivable consequences of P (as given by the least fixpoint
of Tp)

= {path(a, a) ∨ unconnected(a, a), path(a, b) ∨ unconnected(a, b),
path(b, a) ∨ unconnected(b, a), path(b, b) ∨ unconnected(b, b),
path(a, a), path(b, b), point(a), point(b)}

WGCWA(P) = {} since every ground atom in HBp is in some clause


which is derivable from P.
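Following the proof-theoretic definition, the WGCWA can be computed by saturating the set of derivable positive ground clauses and then collecting the atoms that occur in none of them. The Python sketch below is illustrative only (hypothetical names, a naive saturation loop with no factoring beyond duplicate removal) and is run on a small two-clause program.

from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], Tuple[str, ...]]   # positive head disjunction, positive body atoms

def derivable_clauses(program: List[Rule]) -> Set[FrozenSet[str]]:
    """Saturate the set of derivable positive ground clauses (cf. Definition 2.2.10)."""
    state: Set[FrozenSet[str]] = set()
    changed = True
    while changed:
        changed = False
        for head, body in program:
            options = [[c for c in state if b in c] for b in body]
            if any(not o for o in options):
                continue
            stack = [(0, frozenset())]
            while stack:
                i, extra = stack.pop()
                if i == len(body):
                    clause = head | extra
                    if clause not in state:
                        state.add(clause)
                        changed = True
                else:
                    stack.extend((i + 1, extra | (c - {body[i]})) for c in options[i])
    return state

def wgcwa(program: List[Rule], base: Set[str]) -> Set[str]:
    derived = derivable_clauses(program)
    return {a for a in base if all(a not in c for c in derived)}

program = [
    (frozenset({"p(a)", "p(b)"}), ()),          # p(a) v p(b) <-
    (frozenset({"q(a)"}), ("p(a)",)),           # q(a) <- p(a)
]
print(wgcwa(program, {"p(a)", "p(b)", "q(a)", "q(b)"}))   # {'q(b)'}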
The WGCWA was discovered independently by Ross and Topor [1987]
who call it the disjunctive database rule (DDR). Rajasekar, Lobo and
Minker [Rajasekar et al., 1987] show that the GCWA implies the WGCWA.
Both the GCWA and the WGCWA compute answers to negated atoms
when they are ground formulae. When applied to Horn programs both the
WGCWA and the GCWA reduce to the CWA.
Table 2 summarizes the results in negation for definite and disjunctive
logic programs.

4 Normal or general disjunctive logic programs


Definite logic programs or disjunctive logic programs do not allow negated
atoms in the right hand side of a logic program clause. This restricts the
expressiveness that one may achieve in writing logic programs with negation
in the body of a rule. As noted earlier, logic programs with negated atoms
in the body of a logic program clause are referred to as normal (for program
clauses with atomic heads) or normal disjunctive logic programs. One can,
of course, write an equivalent formula for a disjunctive logic program which
does not contain negated atoms, by moving the negated atom to the head
of the logic program clause to achieve a disjunction of atoms in the head of
the clause. This, however, has a different connotation than that intended
by a negated atom in the body of a clause. It is intended in this case that
the negated atom be considered solved by a default rule, such as the CWA
or negation as finite failure.
As noted by Apt, Blair and Walker [Apt et al., 1987], and also by Van
Gelder [1987], by Naqvi [1986] and by Chandra and Harel [1985], problems
arise with the intended meaning of a normal logic program in some in-
stances. For example, the logic program {P = p(a) - -q(a), q(a) - p(a)}
raises questions as to its meaning. We describe alternative ways to han-
dle normal definite logic programs and the corresponding approach with
normal disjunctive logic programs.

4.1 Stratified definite logic programs


In a stratified logic program one allows normal logic programs which do
not permit recursion through negation. For example, the program P given
at the end of the previous section would not be permitted since there is
recursion through negation. That is, we have two rules which need recur-
sive application to solve and the application is through the negative literal
¬q(a). When one excludes these constructs, one can place logic program
clauses of the program into different strata, such that if a negative literal
-A occurs in the body of a program clause in some stratum S, then A
occurs positively in the head of a program clause only in a stratum below
S and if a positive literal A occurs in the body of a program clause in some
stratum S, then A occurs positively in the head of a program clause in
the same stratum or in a stratum below S. Stratified logic programs are a
simple generalization of a class of logic programs introduced in the context
of deductive databases by Chandra and Harel [1985].
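The stratification condition just described can be phrased as a simple graph test: build the predicate dependency graph and reject the program if some negative edge lies on a cycle. The following Python sketch performs this check; the rule representation and helper names are hypothetical and the rules are given at the predicate level.

from typing import Dict, List, Set, Tuple

# A predicate-level rule: (head predicate, positive body predicates, negative body predicates).
Rule = Tuple[str, Set[str], Set[str]]

def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    seen, stack = set(), [start]
    while stack:
        p = stack.pop()
        for q in graph.get(p, set()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def is_stratified(rules: List[Rule]) -> bool:
    """No cycle in the predicate dependency graph may pass through a negative edge."""
    graph: Dict[str, Set[str]] = {}
    neg_edges: List[Tuple[str, str]] = []
    for head, pos, neg in rules:
        graph.setdefault(head, set()).update(pos | neg)
        neg_edges.extend((head, q) for q in neg)
    # A negative edge head -> q lies on a cycle iff head is reachable from q.
    return all(head not in reachable(graph, q) for head, q in neg_edges)

rules = [("p", set(), {"q"}), ("q", {"p"}, set())]      # p <- not q ; q <- p
print(is_stratified(rules))                             # False: recursion through negation
rules2 = [("path", set(), {"unconnected"}),
          ("unconnected", {"point"}, {"edge"}),
          ("edge", {"edge"}, set())]
print(is_stratified(rules2))                            # True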
Apt, Blair and Walker [1987] showed that if one has a stratified normal
logic program, then one can find a fixpoint for the first stratum and then
use this fixpoint as the starting point, find the fixpoint of the next stratum
and continue until a fixpoint is obtained for the last stratum. This fixpoint
is taken as the meaning of the logic program. They show that the fixpoint
is a model and furthermore, is a minimal and supported model of the logic
program.
Example 4.1.1. Consider the following normal program given by:
P = { path(X, Y) ← ¬unconnected(X, Y),
      unconnected(X, Y) ← point(X), point(Y), ¬edge(X, Y),
      edge(X, Z) ← edge(X, Y), edge(Y, Z),
      edge(a, b), edge(b, c),
      edge(a, a), edge(b, b), edge(c, c),
      point(a), point(b), point(c)}

Then, the program can be stratified as follows:

P1 = { point(a), point(b), point(c),
       edge(a, b), edge(b, c),
       edge(a, a), edge(b, b), edge(c, c),
       edge(X, Z) ← edge(X, Y), edge(Y, Z)}
P2 = { unconnected(X, Y) ← point(X), point(Y), ¬edge(X, Y)}
P3 = { path(X, Y) ← ¬unconnected(X, Y)}

Note that the atoms in the negative literals appearing in the body of a
clause in stratum Pi appear only in the heads of clauses in stratum Pj with
j < i. That is, the atoms appearing in these literals are defined in a stratum
below them. Hence, there can be no recursion through these literals. In
the case of positive literals appearing in the body of a clause in stratum
Pi it can be seen that the atoms forming these literals are in the heads of
clauses in stratum Pj with j ≤ i. Again, this implies that these atoms are
defined in the same stratum or in strata below. Recursion through these
literals are allowed. The intended model is calculated iteratively as follows:

MP1 = least fixpoint of TP1(∅)
    = {point(a), point(b), point(c),
       edge(a, a), edge(b, b), edge(c, c),
       edge(a, b), edge(b, c), edge(a, c)}
MP2 = least fixpoint of TP2(MP1)
    = {unconnected(b, a), unconnected(c, b), unconnected(c, a)} ∪ MP1
MP3 = least fixpoint of TP3(MP2)
    = {path(a, a), path(b, b), path(c, c),
       path(a, b), path(b, c), path(a, c)} ∪ MP2

MP = MP3 is the intended meaning of the program using the Apt, Blair
and Walker semantics.
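The iterated fixpoint construction just illustrated can be sketched for ground programs as follows. In the Python fragment below (hypothetical names; only a small ground fragment of Example 4.1.1, over the constants a and b, is listed) each stratum is saturated in turn and negative literals are evaluated against the model obtained from the lower strata.

from typing import List, Set, Tuple

GroundRule = Tuple[str, Set[str], Set[str]]   # head atom, positive body atoms, negative body atoms

def stratum_fixpoint(stratum: List[GroundRule], lower: Set[str]) -> Set[str]:
    """Iterate Tp within one stratum; negative literals are checked against the lower strata."""
    model = set(lower)
    while True:
        new = {h for h, pos, neg in stratum
               if pos <= model and not (neg & lower)}
        if new <= model:
            return model
        model |= new

def iterated_fixpoint(strata: List[List[GroundRule]]) -> Set[str]:
    model: Set[str] = set()
    for stratum in strata:
        model = stratum_fixpoint(stratum, model)
    return model

# Ground fragment of Example 4.1.1 restricted to the constants a and b:
p1 = [("edge(a,b)", set(), set()), ("point(a)", set(), set()), ("point(b)", set(), set())]
p2 = [("unconnected(b,a)", {"point(b)", "point(a)"}, {"edge(b,a)"}),
      ("unconnected(a,b)", {"point(a)", "point(b)"}, {"edge(a,b)"})]
p3 = [("path(a,b)", set(), {"unconnected(a,b)"}),
      ("path(b,a)", set(), {"unconnected(b,a)"})]
print(sorted(iterated_fixpoint([p1, p2, p3])))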
The model semantics achieved by the theory developed by Apt, Blair
and Walker [1987] is independent of the manner in which the logic program
is stratified. Gelfond and Lifschitz [1988] show that Mp is also a stable
model. Przymusinski [1988a] has defined the concept of a perfect model
and has shown that every stratified logic program has exactly one perfect
model. It is identical to the model obtained by Apt, Blair and Walker.
Theorem 4.1.2. Let P be a stratified normal program. Then
1. Tp(MP) = MP,
2. Mp is a minimal Herbrand model of P,
3. Mp is a supported model of P,
4. Mp is a stable model of P,
5. Mp is a perfect model of P.
In the procedural interpretation of normal disjunctive programs prob-
lems arise when a negated literal has to be evaluated and the literal contains
a variable. In this case, the logic program is said to flounder. Chan [1988]
has defined constructive negation, that will find correct answers even in the
case of negated literals that contain variables. The underlying idea behind
constructive negation is to answer queries using formulas involving only
equality predicates. The following example illustrates the concept behind
constructive negation.
Example 4.1.3. Consider the following normal program:

P = { unconnected(X, Y) ← ¬path(X, Y),
      path(X, Y) ← edge(X, Y),
      path(X, Y) ← edge(X, Z), path(Z, Y),
      edge(a, b) ←,
      edge(b, c) ← }

and the query

← unconnected(b, Y)

This query cannot be answered using the NAF rule [Clark, 1978] which
requires that a negative literal be ground before it is selected in an SLDNF-
derivation. Chan [1988] developed constructive negation which provides
answers using inequality. An answer to the above query in his theory
would be the inequality {Y ≠ c}.

4.2 Stratified disjunctive logic programs


Rajasekar and Minker [1988] apply the nonmonotonic fixpoint semantics
developed by Apt, Blair and Walker to a closure operator Tc to develop a
fixpoint theory for stratified disjunctive logic programs. The operator Tc
is a modification of the operator Tp given in Definition 2.2.10 to handle
negative literals in the body of the disjunctive program clauses. The op-
erator uses the GCWA to handle negation. Given a set of positive ground
clauses S, the canonical set, can(S), is defined as a largest subset of S such
that no clause in can(S) is logically implied by another clause in can(S).
Definition 4.2.1 ([Rajasekar and Minker, 1988]). For a disjunctive
program P, a mapping Tcp : 2^DHB(P) → 2^DHB(P) is defined as follows:
Let S be a state of a program P; then

Tcp(S) = { C ∈ DHB(P) : C' ← B1, B2, ..., Bn, ¬A1, ¬A2, ..., ¬Am
is a ground instance of a clause in P,
∀i, 1 ≤ i ≤ n, Bi is an atom and Bi ∨ Ci ∈ S,
∀i, 1 ≤ i ≤ m, Ai is an atom and no clause in can(S)
contains Ai, and
C = s-fac(C' ∨ C1 ∨ ... ∨ Cn), where for each i, 1 ≤ i ≤ n, Ci
can be empty}

Example 4.2.2. Consider the disjunctive program P given by:


P = { path(X, Y) ← ¬unconnected(X, Y),
      unconnected(X, Y), edge(X, Y) ← point(X), point(Y),
      edge(X, Z) ← edge(X, Y), edge(Y, Z),
      edge(a, b), edge(b, c),
      edge(a, a), edge(b, b), edge(c, c),
      point(a), point(b), point(c)}.
P can be stratified as:
P1 = { point(a), point(b), point(c),
       edge(a, b), edge(b, c),
       edge(a, a), edge(b, b), edge(c, c),
       edge(X, Z) ← edge(X, Y), edge(Y, Z)}
P2 = { unconnected(X, Y), edge(X, Y) ← point(X), point(Y)}
P3 = { path(X, Y) ← ¬unconnected(X, Y)}
Note that there is no recursion through negation. The explanation given in
Example 4.1.1 also applies here. The intended meaning of P is computed
as follows:
MSP1 = least fixpoint of Tcp1(∅)
     = {point(a), point(b), point(c),
        edge(a, a), edge(b, b), edge(c, c),
        edge(a, b), edge(b, c), edge(a, c)}
MSP2 = least fixpoint of Tcp2(MSP1)
     = {unconnected(a, b) ∨ edge(a, b), unconnected(b, a) ∨ edge(b, a),
        unconnected(c, b) ∨ edge(c, b), unconnected(b, c) ∨ edge(b, c),
        unconnected(a, c) ∨ edge(a, c), unconnected(c, a) ∨ edge(c, a),
        unconnected(a, a) ∨ edge(a, a), unconnected(b, b) ∨ edge(b, b),
        unconnected(c, c) ∨ edge(c, c)} ∪ MSP1
MSP3 = least fixpoint of Tcp3(MSP2)
     = {path(a, a), path(b, b), path(c, c),
        path(a, b), path(b, c), path(a, c)} ∪ MSP2
MSP = MSP3 is the intended meaning of the program.

Semantics            Horn                               Disjunctive
                     Theory           Reference         Theory           Reference
Stratified programs
Fixpoint semantics   TP               Apt et al.        Tcp              Rajasekar and
                                      [1987]                             Minker [1988]
                                                        T1p              Ross and Topor [1987],
                                                                         Rajasekar and Minker [1988]
Model theory         Standard model   Apt et al.        Standard state   Rajasekar and
                                      [1987]            Stable state     Minker [1988]
                     Stable model     Gelfond and
                                      Lifschitz [1988]
Table 3. Stratification Semantics for Logic Programs


Rajasekar and Minker [1988] show that the iterated state MSP reached
by the stratified semantics is a logical fixpoint of Tcp. S is a logical fixpoint
of T if T(S) ⊨ S and S ⊨ T(S). They also show that MSP is a stable
state and supported state. A stable state is similar in concept to the stable
model of normal programs defined by Gelfond and Lifschitz [1988]. A sup-
ported state is defined similar to a supported Herbrand interpretation [Apt
et al., 1987]. In addition, Rajasekar and Minker [1988] develop an iterative
definition for negation, called the generalized closed world assumption for
stratified logic programs (GCWAS), and show that the semantics captures
this definition. A model-theoretic semantics is developed for stratified dis-
junctive logic programs which is shown to be the least state characterized
by the fixpoint semantics that corresponds to a stable-state defined in a
manner similar to the stable models of Gelfond and Lifschitz. A weaker
semantics is also developed for stratification based on the WGCWA, called
WGCWAS.
Theorem 4.2.3. Let P be a stratified disjunctive program. Then

1. Tcp(MSp) = MSp,
2. MSp is a supported state of P,
3. MSp is a stable state of P.

Lobo [1990] extends the concept of constructive negation, introduced by


Chan for stratified logic programs, to apply to stratified disjunctive logic
programs. The results include the theories of negation for disjunctive logic
programs: the GCWAS and the WGCWAS.
Table 3 summarizes the results that have been obtained in stratified
logic programs and stratified disjunctive logic programs.
4.3 Well-founded and generalized well-founded logic programs
There exist logic programs that are not stratifiable and yet we desire to
compute answers to queries over these theories. An example of a logic
program that is not stratifiable is given by the game tree program:

P = {win(X) ← move(X, Y), ¬win(Y)} ∪ { clauses defining move }.

Van Gelder, Ross and Schlipf [Van Gelder et al., 1988] define the concept
of well-founded semantics to handle such logic programs. Przymusinski
[1988b] presents the ideas of well-founded semantics in terms of two sets
of atoms T and F. Atoms in T are assumed to be true and those in F are
assumed to be false. If an atom is neither true nor false, it is assumed to
be unknown. Thus, in the logic program,

P = {p ← a, p ← b, a ← ¬b, b ← ¬a}.

We can conclude that p, a, and b are all unknown. The atom p is considered
unknown since it is defined using a and b, both of which are unknown.
Van Gelder, Ross and Schlipf [Van Gelder et al., 1988] develop fixpoint
and model theoretic semantics for such logic programs. Ross [1989b] and
Przymusinski [1988b] develop procedural semantics. The three different
semantics are equivalent.
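One convenient route to the well-founded semantics of a ground normal program is an alternating-fixpoint construction in the style of Van Gelder, built from the least model of a Gelfond-Lifschitz style reduct. The Python sketch below (hypothetical names, ground programs only) computes the true and false atoms this way; all remaining atoms are undefined. It is run on the four-clause program just discussed.

from typing import List, Set, Tuple

GroundRule = Tuple[str, Set[str], Set[str]]   # head, positive body, negative body

def least_model(definite: List[Tuple[str, Set[str]]]) -> Set[str]:
    m: Set[str] = set()
    while True:
        new = {h for h, body in definite if body <= m}
        if new <= m:
            return m
        m |= new

def gamma(program: List[GroundRule], interp: Set[str]) -> Set[str]:
    """Least model of the reduct of the program with respect to interp."""
    reduct = [(h, pos) for h, pos, neg in program if not (neg & interp)]
    return least_model(reduct)

def well_founded(program: List[GroundRule], base: Set[str]) -> Tuple[Set[str], Set[str]]:
    """True and false atoms of the well-founded model via the alternating fixpoint."""
    true, prev = set(), None
    while true != prev:                      # least fixpoint of gamma∘gamma, from the empty set
        prev, true = true, gamma(program, gamma(program, true))
    false = base - gamma(program, true)      # atoms outside the greatest fixpoint are false
    return true, false

program = [("p", {"a"}, set()), ("p", {"b"}, set()),
           ("a", set(), {"b"}), ("b", set(), {"a"})]
t, f = well_founded(program, {"p", "a", "b"})
print(t, f)    # set() set(): p, a and b are all undefined in the well-founded model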
If one analyzes the above logic program, another meaning to the logic
program is also possible. In particular, the last two logic program clauses
state that a is true if b is not true, and b is true if a is not true. Hence,
if a is true then p must be true and if b is true p must be true. Thus,
although we may not be able to conclude which of a or b is true, we can
surely conclude that p must be true. Baral, Lobo and Minker [Baral et
al., 1989a] develop model theoretic, fixpoint and procedural semantics to
capture the meaning of logic programs such as given above. They term
this generalized well-founded semantics (GWFS). The fixpoint definition is
similar to the definitions of well-founded semantics of Przymusinski. Every
atom proved to be true in the well-founded semantics is also true in the
generalized well-founded semantics. However, some additional atoms may
be proved true in the GWFS.

4.4 Generalized disjunctive well-founded semantics


Consider the following example:
Example 4.4.1. Let P be a disjunctive logic program:
c ∧ d is false in all minimal models of this program. Therefore, it is reason-
able to assume that a is false and b is true. But the GWFS discussed in
the previous sub-section is not able to infer a to be false and b to be true
for the above program. But, it is able to infer a to be false and b to be true
for a simple variant of the above program, given to be:

Baral, Lobo and Minker [Baral et al., 1989b] describe a semantics which
solves the above problem in GWFS. The semantics is based on disjunctions
and conjunctions of atoms instead of only sets of atoms. The disjuncts
are assumed to be true and the conjuncts are assumed to be false. This
allows the representation of indefinite information. That is, ¬a ∨ ¬b can be
assumed to be true (by having a ∧ b as false) without knowing if either ¬a
or ¬b or both are true. The semantics is also general enough to extend the
well-founded semantics to normal disjunctive programs. They also present
a procedural semantics for the extended semantics and show how to restrict
the procedure to compute the generalized well-founded semantics. Since
the procedure handles not only atomic but also disjunctive information,
factoring and bookkeeping for ancestry resolution are needed. In addition,
the deduction of negative information is obtained using the GCWA which
was proved to be more complex than the closed world assumption [Chomicki
and Subrahmanian, 1989].
There have been other extensions of the well-founded theory to disjunc-
tive logic programs. Ross [1989a] developed a strong well-founded semantics
and Przymusinski [1990] developed what is called a stationary semantics.
This work extends well-founded semantics to disjunctive logic programs.
Table 4 summarizes the results that have been obtained in well-founded
semantics for definite and disjunctive logic programs.

5 Summary
We have described the foundational theory that exists for definite normal
logic programs and the extensions that have been made to that theory
and to disjunctive normal logic programs. The results are summarized in
tables 1-4.

Semantics     Horn                                     Disjunctive (strong/weak well-founded, stationary)
              Theory       Reference                   Theory        Reference
Well-founded (normal programs)
Fixpoint      I∞           Van Gelder et al. [1988]
Model         MWF(P)       Van Gelder et al. [1988]    MS/WWF(P)     [Ross, 1989a]
                                                       MP            Przymusinski [1990]
Procedure     SLS          [Ross, 1989b],
                           Przymusinski [1989]
General well-founded (normal programs)                 General disjunctive well-founded
Fixpoint      JV           Baral et al. [1989a]        SED           Baral et al. [1989b; 1989d]
Model         MJ           Baral et al. [1989a]        MSEPD         Baral et al. [1989b; 1989d]
Procedure     SLIS         Baral et al. [1989d]        SLIS          Baral et al. [1989c; 1989d]
Table 4. Well-founded semantics for logic programs

The theory of definite and disjunctive logic programs applies
equally to deductive databases where one typically assumes that the rules
are function-free. A firm foundation now exists both for definite normal
and disjunctive normal logic programs for deductive databases and logic
programming.
Although we have developed model theoretic, proof theoretic and fix-
point semantics for disjunctive logic programs, efficient techniques will be
required for computing answers to queries in disjunctive deductive databases
and logic programs. Some preliminary work has been reported by Minker
and Grant [l986b], Liu and Sunderraman [1990], and by Henschen and
his students [Yahya and Henschen, 1985; Henschen and Park, 1986; Chi
and Henschen, 1988]. However, a great deal of additional work is required
regarding theoretical, implementational and applicative aspects of disjunc-
tive logic programming. Questions regarding negation, efficient proof pro-
cedures and answer extraction are important areas which need to be looked
into. Implementing a language based on disjunctive logic has open prob-
lems which need to be solved such as efficient data structure, subsumption
algorithm, control strategies and extra logical features. Applications rang-
ing from knowledge based systems, to common sense reasoning, to natural
language processing seem to be appropriate domains for applying disjunc-
tive logic programming. These areas and others need to be explored.
6 Addendum
Disjunctive logic programming has made significant progress since the pa-
per was submitted in 1989. (This addendum specifies where references to
work through 1996 may be found; space did not permit citing all relevant
work.)
Semantics of disjunctive logic programming have been extended to in-
clude literals both in the head and in the body of disjunctive clauses and
default negation in the body of clauses. In [Lobo et al., 1992] the theoretical
foundations of disjunctive logic programs including theories of negation, the
semantics and the view update problem in disjunctive deductive databases
are given. See [Minker, 1994] for work up to 1994; [Minker and Ruiz, 1996]
for literature on disjunctive theories that contain literals in clauses and
default negation in the body of clauses, and on theories that contain both
literals and multiple default rules in the body of clauses.
Nonmonotonic reasoning and logic programming are closely related.
Nonmonotonic theories such as circumscription, default reasoning and au-
toepistemic logic can be transformed to disjunctive logic programs. See
Minker [1993; 1996] for references. Thus, disjunctive logic programs can
serve as computational vehicles for nonmonotonic reasoning.
Complexity and properties of disjunctive logic programs have been stud-
ied extensively. Complexity results are known for most theories of extended
disjunctive logic programs, see [Minker, 1996]. For references to work on
properties of the consequence relations defined by the different semantics of
disjunctive logic programs, see [Minker, 1996]. Based on properties of pro-
grams and their complexity, users may select the semantics of disjunctive
logic programs of interest to them.
Disjunctive deductive databases are a subset of disjunctive logic pro-
gramming. Such databases are function-free and hence one is dealing with fi-
nite theories. For theories and algorithms of disjunctive deductive databases,
see [Fernandez and Minker, 1995]. For work and references on the view up-
date problem, see [Fernandez, Grant and Minker, 1996]. For references to
the role of disjunctive databases in knowledge bases, see [Minker, 1996].

Acknowledgements
We wish to express our appreciation to the National Science Foundation
for their support of our work under grant number IRI-86-09170 and the
Army Research Office under grant number DAAG-29-85-K-0-177.

References
[Apt et al., 1987] K. R. Apt, H. A. Blair, and A. Walker. Towards a theory
of declarative knowledge. In J. Minker, editor, Foundations of Deduc-
tive Databases and Logic Programming, pp. 89-148. Morgan Kaufmann,
Washington, D.C., 1988.
[Baral et al, 1989a] C. Baral, J. Lobo, and J. Minker. Generalized well-
founded semantics for logic programs. Technical report, Dept of Com-
puter Science, University of Maryland, College Park Md 20742, 1989.
[Baral et al., 1989b] C. Baral, J. Lobo, and J. Minker. Generalized disjunc-
tive well-founded semantics for logic programs : Declarative semantics.
In Z. W. Ras, M. Zemankova and M. L. Emrich, editors, Methodologies
for Intelligent Systems, 5, pp. 465-473. North-Holland, 1990.
[Baral et al., 1989c] C. Baral, J. Lobo, and J. Minker. Generalized disjunc-
tive well-founded semantics for logic programs : Procedural semantics.
In Z. W. Ras, M. Zemankova and M. L. Emrich, editors, Methodologies
for Intelligent Systems, 5, pp. 456-464. North-Holland, 1990.
[Baral et al., 1989d] C. Baral, J. Lobo, and J. Minker. Generalized dis-
junctive well-founded semantics for logic programs. Technical report,
Dept of Computer Science, University of Maryland, College Park Md
20742, 1989.
[Chan, 1988] D. Chan. Constructive negation based on the completed
databases. In R.A. Kowalski and K.A. Bowen, editors, Proc. 5th Interna-
tional Conference and Symposium on Logic Programming, pp. 111-125,
Seattle, Washington, August 15-19, 1988.
[Chandra and Harel, 1985] A. Chandra and D. Harel. Horn clause queries
and generalizations. Journal of Logic Programming, 2(1):1-15, April
1985.
[Chi and Henschen, 1988] S. Chi and L. Henschen. Recursive query an-
swering with non-Horn clauses. In E. Lusk and R. Overbeek, editors,
Proc. 9th International Conference on Automated Deduction, pp. 294-
312, Argonne, IL, May 23-26, 1988.
[Chomicki and Subrahmanian, 1989] J. Chomicki and V. S. Subrahma-
nian. Generalized closed world assumption is Π^0_2-complete. Information
Processing Letters, 34: 289-291, 1990.
[Clark, 1978] K. L. Clark. Negation as failure. In H. Gallaire and J. Minker,
editors, Logic and Data Bases, pp. 293-322. Plenum Press, New York,
1978.
[Colmerauer et al., 1973] A. Colmerauer, H. Kanoui, R. Pasero, and
P. Roussel. Un systeme de communication homme-machine en Francais.
Technical report, Groupe d'Intelligence Artificielle Universite de Aix-
Marseille II, Marseille, 1973.
[Fernandez and Minker, 1995] J. A. Fernandez and J. Minker. Bottom-up
computation of perfect models for disjunctive theories. Journal of Logic
Programming, 25: 33-51, 1995.
[Fernandez, Grant and Minker, 1996] J. A. Fernandez, J. Grant and J.
Minker. Model theoretic approach to view updates in deductive
databases. Journal of Automated Reasoning, 17:171-197, 1996.


[Gallaire and Minker, 1978] H. Gallaire and J. Minker, editors. Logic and
Databases. Plenum Press, New York, April 1978.
[Gelfond and Lifschitz, 1988] M. Gelfond and V. Lifschitz. The stable
model semantics for logic programming. In R. A. Kowalski and K. A.
Bowen, editors, Proc. 5th International Conference and Symposium on
Logic Programming, pp. 1070-1080, Seattle, Washington, August 15-19
1988.
[Henschen and Park, 1986] L. J. Henschen and H. Park. Compiling the
GCWA in indefinite databases. In J. Minker, editor, Foundations of De-
ductive Databases and Logic Programming, pp. 395-438. Morgan Kauf-
mann, Washington, DC, 1988.
[Kowalski, 1974] R. A. Kowalski. Predicate logic as a programming lan-
guage. Proc. IFIP 4, pp. 569-574, 1974.
[Kowalski and Kuehner, 1971] R. A. Kowalski and D. Kuehner. Linear
resolution with selection function. Artificial Intelligence, 2, 227-260,
1971.
[Liu and Sunderraman, 1990] K.C. Liu and R. Sunderraman. Indefinite
and maybe information in relational databases. ACM Transactions on
Database Systems, 15: 1-39, 1990.
[Lloyd, 1987] J.W. Lloyd. Foundations of Logic Programming. Springer-
Verlag, second edition, 1987.
[Lobo, 1990] J. Lobo. On constructive negation for disjunctive logic pro-
grams. Submitted to NACLP 90.
[Lobo et al., 1988] J. Lobo, A. Rajasekar, and J. Minker. Weak completion
theory for non-Horn programs. In R. A. Kowalski and K. A. Bowen, ed-
itors, Proc. 5th International Conference and Symposium on Logic Pro-
gramming, pp. 828-842, Seattle, Washington, August 15-19 1988.
[Lobo et al., 1989] J. Lobo, J. Minker, and A. Rajasekar. Extending the
semantics of logic programs to disjunctive logic programs. In G. Levi
and M. Martelli, editors, Proc. 6th International Conference on Logic
Programming, Lisbon, Portugal, June 19-23 1989.
[Lobo et al., 1992] J. Lobo, J. Minker and A. Rajasekar. Foundations of
Disjunctive Logic Programming. The MIT Press, Cambridge, MA, 1992.
[Minker, 1982] J. Minker. On indefinite databases and the closed world
assumption. In Vol. 138 of Lecture Notes in Computer Science, pp. 292-
308. Springer-Verlag, Berlin, 1982.
[Minker, 1993] J. Minker. An overview of nonmonotonic reasoning and
logic programming. Journal of Logic Programming, 17: 95-126, 1993.
[Minker, 1994] J. Minker. Overview of disjunctive logic programming.
Annals of Mathematics and Artificial Intelligence, 12: 1-24, 1994.
[Minker, 1996] J. Minker. Logic and databases: a 20 year retrospective.
In Logic in Databases, Proceedings of the International Workshop LID'96,
San Miniato, Italy, July 1996, pp. 5-57. Springer, Lecture Notes in
Computer Science, 1154. Invited Keynote Address.
[Minker and Grant, 1986b] J. Minker and J. Grant. Answering queries in
indefinite databases and the null value problem. In P. Kanellakis, editor,
Advances in Computing Research, pp. 247-267, 1986.
[Minker and Rajasekar, 1987] J. Minker and A. Rajasekar. Procedural in-
terpretation of non-Horn logic programs. In E. Lusk and R. Overbeek,
editors, Proc. 9th International Conference on Automated Deduction, pp.
278-293, Argonne, IL, 23-26, May 1988.
[Minker and Rajasekar, 1987a] J. Minker and A. Rajasekar. A fixpoint
semantics for disjunctive logic programs. Journal of Logic Programming,
9: 45-74, 1990.
[Minker and Ruiz, 1996] J. Minker and C. Ruiz. Mixing a default rule with
stable negation. In Proc. of the Fourth Int. Symp. on Art. Intell. and
Mathematics, pp. 122-125, 1996.
[Minker and Zanon, 1979] J. Minker and G. Zanon. An extension to linear
resolution with selection function. Information Processing Letters, 14,
191-194, 1982.
[Naqvi, 1986] S. A. Naqvi. A logic for negation in database systems.
In J. Minker, editor, Proc. Workshop on Foundations of Deductive
Databases and Logic Programming, pp. 378-387, Washington, DC, Au-
gust 18-22, 1986.
[Przymusinski, 1990] T. C. Przymusinski. Stationary semantics for dis-
junctive logic programs. In S. Debray and M. Hermenegildo, editors, Proc.
of the North American Conference on Logic Programming, Austin, TX,
pp. 40-62, 1990.
[Przymusinski, 1988a] T. C. Przymusinski. Perfect model semantics. In
R. A. Kowalski and K. A. Bowen, editors, Proc. 5th International Con-
ference and Symposium on Logic Programming, pp. 1081-1096, Seattle,
Washington, August 15-19 1988.
[Przymusinski, 1988b] T. C. Przymusinski. On constructive negation in
logic programming. In E. Lusk and R. Overbeek, editors, Proc. North
American Conference of Logic Programming, Cleveland, Ohio, October
16-20, 1989. Extended Abstract.
[Przymusinski, 1989] T. C. Przymusinski. Every logic program has a nat-
ural stratification and an iterated fixed point model. In Proceedings 8th
ACM SIGACT-SIGMOD-SIGART Symposium on Principle of Database
Systems, pp. 11-21, 1989.
[Rajasekar and Minker, 1988] A. Rajasekar and J. Minker. On stratified
disjunctive programs. Technical Report CS-TR-2168, UMIACS-TR-88-99,
Department of Computer Science, University of Maryland, College Park,
December 1988. In Annals of Mathematics and Artificial Intelligence,
Vol. 1, 1990.
[Rajasekar et al., 1987] A. Rajasekar, J. Lobo, and J. Minker. Weak gen-
eralized closed world assumption. Journal of Automated Reasoning, 5,
293-307, 1989.
[Reiter, 1978] R. Reiter. On closed world data bases. In H. Gallaire and
J. Minker, editors, Logic and Data Bases, pp. 55-76. Plenum Press, New
York, 1978.
[Ross, 1989a] K. Ross. Well-founded semantics for disjunctive logic pro-
grams. In Proc. 1st International Conference on Deductive and Object
Oriented Databases, Kyoto, Japan, December 4-6, 1989.
[Ross, 1989b] K. Ross. A procedural semantics for well founded negation in
logic programs. In Proc. 8th ACM SIGACT-SIGMOD-SIGART Sympo-
sium on Principle of Database Systems, Philadelphia, PA, March, 29-31,
1989.
[Ross and Topor, 1987] K. A. Ross and R. W. Topor. Inferring negative in-
formation from disjunctive databases. Journal of Automated Reasoning,
4, 397-424, 1988.
[Shepherdson, 1984] J. C. Shepherdson. Negation as finite failure: a com-
parison of Clark's completed database and Reiter's closed world assump-
tion. Journal of Logic Programming, 1, 51-79, 1984.
[Shepherdson, 1985] J. C. Shepherdson. Negation as failure: II. Journal of
Logic Programming, 2, 185-202, 1985.
[Shepherdson, 1987] J. C. Shepherdson. Negation in logic programming.
In J. Minker, editor, Foundations of Deductive Databases and Logic Pro-
gramming, pp. 19-88. Morgan Kaufmann, Washington, DC, 1988.
[van Emden and Kowalski, 1976] M. H. van Emden and R. A. Kowalski.
The semantics of predicate logic as a programming language. Journal of
the ACM, 23, 733-742, 1976.
[Van Gelder, 1987] A. Van Gelder. Negation as failure using tight deriva-
tions for general logic programs. In J. Minker, editor, Foundations of De-
ductive Databases and Logic Programming, pp. 149-176. Morgan Kauf-
mann, Washington, DC, 1988.
[Van Gelder et al., 1988] A. Van Gelder, K. Ross, and J. S. Schlipf. Un-
founded sets and well-founded semantics for general logic programs. In
Proc. 7th Symposium on Principles of Database Systems, pp. 221-230,
1988.
[Yahya and Henschen, 1985] A. Yahya and L. J. Henschen. Deduction in
non-Horn databases. Journal of Automated Reasoning, 1, 141-160, 1985.
Negation as Failure, Completion and
Stratification
J. C. Shepherdson

Contents
1 Overview/introduction 356
1.1 Negation as failure, the closed world assumption and the
Clark completion 356
1.2 Incompleteness of NF for comp(P) 359
1.3 Floundering, an irremovable source of incompleteness 359
1.4 Cases where SLDNF-resolution is complete for
comp(P) 361
1.5 Semantics for negation via special classes of model 362
1.6 Semantics for negation using non-classical logics 363
1.7 Constructive negation: an extension of negation as fail-
ure 364
1.8 Concluding remarks 365
2 Main body 365
2.1 Negation in logic programming 365
2.2 Negation as failure; SLDNF-resolution 367
2.3 The closed world assumption, CWA(P) 370
2.4 The Clark completion, comp(P) 374
2.5 Definite Horn clause programs 384
2.6 Three-valued logic 385
2.7 Cases where SLDNF-resolution is complete for
comp(P): hierarchical, stratified and call-consistent pro-
grams 391
2.8 Semantics for negation in terms of special classes of mod-
els 393
2.9 Constructive negation; an extension of negation as failure 402
2.10 Modal and autoepistemic logic 406

2.11 Deductive calculi for negation as failure 409

1 Overview/introduction
1.1 Negation as failure, the closed world assumption
and the Clark completion
The usual way of introducing negation into Horn clause logic programming
is by 'negation as failure': if A is a ground atom
the goal ¬A succeeds if A fails
the goal ¬A fails if A succeeds.
This is obviously not classical negation, at least not relative to the given
program P; the fact that A fails from P does not mean that you can prove
¬A from P, e.g. if P is

    a ← ¬b

then ? - b fails so, using negation as failure, ? - a succeeds, but a is not a
logical consequence of P.
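In Prolog syntax, where \+ is negation as failure, this example can be run directly (a minimal sketch; the comments describe the behaviour):

    a :- \+ b.
    % There is no clause for b, so the query ?- b. fails;
    % hence ?- a. succeeds by negation as failure, although a is not a
    % classical consequence of the clause read as a ← ¬b.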
You could deal with classical negation by using a form of resolution
which gave a complete proof procedure for full first order logic. To a
logician this would be the natural thing to do. Two reasons are commonly
given for why this is not done. The first is that it is believed by most, but
not all, practitioners, that this would be infeasible because it would lead
to a combinatorial explosion, whereas negation as failure does not, since
it is not introducing any radically new methods of inference, just turning
the old ones round. The second is that, in practical logic programming,
negation as failure is often more useful than classical negation. This is the
case when the program is a database, e.g. an airline timetable. You list
all the flights there are. If there is no listed flight from Zurich to London
at 12.31, then you conclude that there is no such flight. The implicit use
of negation as failure here saves us the enormous labour of listing all the
non-existent flights.
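A hypothetical timetable fragment of this kind, written as Prolog facts (the particular flights are invented for illustration), makes the point concrete:

    flight(zurich, london, 1100).
    flight(zurich, paris, 1405).
    % ?- flight(zurich, london, 1231).      fails: no such fact is listed.
    % ?- \+ flight(zurich, london, 1231).   succeeds by negation as failure,
    %    without any non-existent flight having to be listed explicitly.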
This implicit usage is made precise in the closed world assumption,
one of the two commonest declarative semantics given for negation as fail-
ure. This was introduced by Reiter [1978] and formalises the idea that
the database contains all the positive information about objects in the do-
main, that any positive ground literal which is not implied by the program
is assumed to be false. Formally we define

CWA(P) = P ∪ {¬A : A is a ground atom and P ⊬ A},

and we also restrict consideration to term or Herbrand models whose do-


main of individuals consists of the ground terms.

Negation as failure (NF) is sound for the closed world assumption for
both success and failure, i.e.
if ? - Q succeeds from P with answer θ using NF then CWA(P) ⊨H Qθ,
if ? - Q fails from P using NF then CWA(P) ⊨H ¬Q
where T ⊨H S means 'S is true in all Herbrand models of T'.
The closed world assumption seems appropriate for programs represent-
ing the simpler kinds of database, but its more general use is limited by
two facts:
1. If P implies indefinite information about ground literals then
CWA(P) is inconsistent.
e.g. if P is the program above consisting of the single indefinite clause,
a ← ¬b, then neither a nor b is a consequence of P so in forming
CWA(P) both ¬a and ¬b are added, and the original clause, which
is equivalent to a ∨ b, is inconsistent with these.
2. Even if P consists of definite Horn clauses, NF may be incomplete
for CWA(P), i.e.
'if CWA(P) ⊨H Qθ then ? - Q succeeds from P with an answer including
θ using NF' fails to hold for some programs P.
Indeed there may be no automatic proof procedure which is both
sound and complete for CWA(P), because there are P such that the
set of negative ground literals which are consequences of CWA(P)
may be non-recursively enumerable.
A more widely applicable declarative semantics for NF was given by
Clark [1978]. This is now usually called the (Clark) completion, comp(P),
of the original program P. It is based on 'the implied iff', the idea that
when in a logic program you write

even(0) ←
even(s(s(x))) ← even(x),

what you usually intend is to give a comprehensive definition of the pred-


icate even, that the clauses with even in their head are supposed to cover
all the cases in which even holds, i.e.

even(y) ↔ (y = 0 ∨ ∃x(y = s(s(x)) ∧ even(x))).
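In Prolog syntax the program and two queries illustrating this reading are as follows (a sketch; the comments relate the computed answers to the completed definition):

    even(0).
    even(s(s(X))) :- even(X).
    % ?- even(s(s(0))).    succeeds.
    % ?- \+ even(s(0)).    succeeds, since even(s(0)) fails finitely;
    % correspondingly ¬even(s(0)) follows from the completed definition
    % together with the freeness axioms, because s(0) is neither 0 nor of
    % the form s(s(x)).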

In the general case to form comp(P) you treat each predicate symbol p
like this, rewriting the clauses of P in which p appears in the head in the
form of an iff definition of p. Since this introduces the equality predicate
= it is necessary to add axioms for that. These take the form of the usual
equality axioms together with 'freeness axioms', which say that two terms
are equal iff they are forced to be by the equality axioms; for example if

f and g are different unary function symbols one of the freeness axioms is
f(x) ≠ g(y).
The basic result of [Clark, 1978] is that NF is sound for comp(P) for
both success and failure. Many logic programmers regard this as a justifi-
cation for taking comp(P) to be the declarative meaning of a logic program,
indeed some of them take it so much for granted they do not feel the need
to say that this is what they are doing. Quite apart from the lack of com-
pleteness which we discuss below, I think this is unsatisfactory. Although
in everyday language we may often use 'if' when we mean 'iff', confus-
ing them is responsible for many of the logical errors made by beginning
students of mathematics. Since one of the merits of logic programming is
supposed to be making a rapprochement between the declarative and pro-
cedural interpretation of a program, in the interests of Wysiwym— What
you say is what you mean—logic programming, I think that if you mean
'iff' you should write 'iff'; if you want to derive consequences of comp(P)
you should write comp(P), and if in order to carry out this derivation it is
necessary to go via P then this should be done automatically. A practical
reason for doing this would be that although in simple examples like the
one given it is easy to understand the meaning of comp(P) given P, this is
no longer true when P contains 'recursive' clauses with the same predicate
symbol occurring on both sides of the implication sign, or clauses displaying
mutual recursion. Unfortunately writing comp(P) instead of P would not
solve all our problems because the fact that NF is usually incomplete for
comp(P) means that two programs with the same completion can behave
differently with respect to NF. For example the program

has completion

which is equivalent to the completion of

but the query ? - a succeeds with respect to the latter but not with respect
to the former.
Despite these shortcomings of the closed world assumption and the
completion, they dominate the current view of negation as failure to such
an extent that a large part of this chapter will be taken up with their study.
It must be admitted that the notion of the 'implied iff' is implicit in the use
of negation as failure together with unification, and that the completion,
although not as transparent as one would wish, is from the logical point of

view one of the simplest declarative semantics which have been proposed
for negation as failure.

1.2 Incompleteness of NF for comp(P)


We have noted above that although NF is not sound for P it is sound
both for CWA(P) and comp(P). Although based on superficially simi-
lar considerations, CWA(P) and comp(P) can be very different; if P is
a ← ¬b then comp(P) is consistent but CWA(P) is not, if P is p ← ¬p then
CWA(P) is consistent but comp(P) is not, if P is a ← ¬b, b ← a, a ← a,
then both CWA(P) and comp(P) are consistent but they are incompatible.
We have seen above that there are many P for which NF is incomplete
for CWA(P), in particular those P for which CWA(P) is inconsistent. The
simple examples above show that NF is often incomplete for comp(P). This
incompleteness is partially explained by the fact that the soundness result
if ? - Q succeeds from P with answer θ using NF then comp(P) ⊨ Qθ,
if ? - Q fails from P using NF then comp(P) ⊨ ¬Q
holds for weaker consequence relations than ⊨, for example for intuition-
istic derivability ⊢I, and for 3-valued consequence ⊨3 (being true in all
3-valued models), as shown recently by Kunen [1987]. So we cannot have
completeness for comp(P) using the classical 2-valued consequence relation
unless these weaker consequence relations happen to coincide with the clas-
sical 2-valued one for the particular comp(P) in question. In fact we do
not in general have completeness for either of the two weaker consequence
relations or even for a kind of intersection of them, which allows only deriva-
tions which are sound for both intuitionistic 2-valued and classical 3-valued
logic.

1.3 Floundering, an irremovable source of incompleteness
One reason for the incompleteness of NF is its inability to deal with non-
ground negative literals. This is a price we have to pay for using a quantifier-
free system. A query ? - p(x) is taken to mean ? - ∃x p(x), and a query
? - ¬p(x) to mean ? - ∃x ¬p(x). It is possible that both of these are true so
it would be unsound to fail ? - ∃x ¬p(x) just because ? - ∃x p(x) succeeded.
That is why NF is only allowed for ground negative literals. [Prolog is un-
sound because it allows NF on any goal]. This means that we cannot deal
with queries of the form ? - ¬p(x), and in dealing with other queries we
may flounder, or be unable to proceed because we reach a goal containing
only non-ground negative literals.
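A small program of the kind in question (the predicate names are my own choice) shows the restriction at work:

    p(X) :- \+ q(X).
    q(b).
    % ?- p(a).   succeeds: the subsidiary goal q(a) is ground and fails finitely.
    % ?- p(X).   leads to a goal containing only the non-ground negative
    %            literal \+ q(X), so under a safe computation rule the
    %            derivation flounders. A system that selected \+ q(X) anyway
    %            would unsoundly fail, since ∃x ¬q(x) does hold (e.g. for x = a).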
It is time to be a little more precise about what we mean by NF. What is
meant here is negation as finite failure formalised as the SLDNF-resolution
of Lloyd [1987]. Program clauses are of the form

    A ← L1, ..., Ln

and queries are of the form

    ? - L1, ..., Ln

where A is an atom and L1, ..., Ln are positive or negative literals. The
negation of the query is written as a goal

    ← L1, ..., Ln

and when a positive literal Li is selected from the current goal the compu-
tation tree proceeds in the same way as the SLD-resolution used for definite
Horn clauses; i.e. if Li unifies with the head of a program clause

    A ← M1, ..., Mk

with mgu θ then there is a child goal

    ← (L1, ..., Li-1, M1, ..., Mk, Li+1, ..., Ln)θ.

When a ground negative literal ¬B is selected you carry out a subsidiary
computation on the goal ← B before continuing with the main computa-
tion. If this results in a finitely failed tree then ¬B succeeds and there is a
child goal

    ← L1, ..., Li-1, Li+1, ..., Ln

resulting from its removal. If the goal ← B succeeds then ¬B fails so the
main derivation fails at this point. If neither of these happens the main
derivation has a dead-end here. This can arise because the derivation tree
for ← B has infinite branches but no successful ones, or if it, or some
subsidiary derivation, flounders or has dead-ends.
The fact that we cannot deal with non-ground negative literals means
that we can only hope to get completeness of SLDNF-resolution, for any
semantics, for queries which do not flounder. In general the problem of
deciding whether a query flounders is recursively unsolvable ([Borger, 1987];
a simpler proof is in [Apt, 1990]), so a strong overall condition on both the
program and the query is often used which is sufficient to prevent this. A
query is said to be allowed if every variable which occurs in it occurs in a
positive literal of it; a program clause A ← L1, ..., Ln is allowed if every
variable which occurs in it occurs in a positive literal of its body L1, ..., Ln,
and a program is allowed if all of its clauses are allowed. It is easy to show
that if the program and the query are both allowed then the query cannot
flounder, because the variables occurring in negative literals are eventually
grounded by the positive literals containing them.

Allowedness is a very stringent condition which excludes many common
Prolog constructs, such as the definition of equality (equal(X, X)), and
both clauses in the standard definition of member(X, L),

    member(X, [X|L])
    member(X, [Y|L]) ← member(X, L).

1.4 Cases where SLDNF-resolution is complete for comp(P)
If we accept the restriction described in the last section to programs and
queries which are both allowed then there are some classes of programs for
which SLDNF-resolution is complete for comp(P). The simplest is the class
of definite programs, whose clauses contain no negative literals. Here the
use of NF is minimal; the only negative literals involved are those occurring
in the query, and NF is only used once on each of these. There is no nested
use of NF.
Another class is that of hierarchical programs introduced by Clark.
These are free of recursion, that is to say the predicate symbols can be
assigned to levels so that the predicate symbols occurring in the body of a
clause are of lower levels than that occurring in the head. A much larger
class has recently been given by Kunen [1989]. This is the class of programs
which are semi-strict (or call-consistent, as it is more usually called now).
Semi-strict means that no predicate symbol depends negatively on itself in
the way that p does in the program p ← ¬p, or similarly via any number of
intermediate clauses and predicate symbols, e.g. p ← ¬q, q ← p. Formally
this is defined as follows:
We say p ⊒+1 q iff there is a program clause with p occurring in the
head, and q occurring in a positive literal in the body. We say p ⊒-1 q iff
there is a clause with p occurring in the head, and q occurring in a negative
literal in the body. Let >+1 and >-1 be the least pair of relations on the
set of predicate symbols satisfying:

and

Then the program is semi-strict if we never have p >-1 p.


For semi-strict programs the completeness requires also a condition in-
volving the query, that the program is strict with respect to the query. This
means that there is no predicate symbol p on which the query depends
both positively and negatively, as does the query ? - a, b for the program
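(A program with this property, given here only as an illustrative sketch with p as an arbitrarily chosen predicate symbol, is

    a ← p
    b ← ¬p

since the query ? - a, b then depends on p positively through a and negatively through b.)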

Formally, if Q is a query, we say Q >i p iff either a >i p for some a


occurring positively in Q, or a >-i p for some a occurring negatively in
Q. The program is strict with respect to the query Q iff for no predicate
symbol p do we have both Q >+1 p and Q >-1 p.
The semi-strict programs include the stratified programs introduced by
Apt et al. [1988]; these are like the hierarchical programs except that
the condition that the level of every predicate symbol in the body be less
than the level of the head is maintained for predicate symbols appearing
negatively in the body, but for those appearing positively it is relaxed to
'less than or equal to'. The completeness result for stratified programs has
been proved independently by Cavedon and Lloyd [1989].
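A small example of a stratified (and allowed) program, with predicate names of my own choosing, is:

    node(a).   node(b).   node(c).
    edge(a, b).
    edge(b, c).
    reachable(X, Y) :- edge(X, Y).
    reachable(X, Y) :- edge(X, Z), reachable(Z, Y).
    unreachable(X, Y) :- node(X), node(Y), \+ reachable(X, Y).

Here node, edge and reachable can be assigned lower levels than unreachable; reachable depends on itself only positively, and unreachable depends negatively only on the lower-level reachable, so the program is stratified. Every variable of every clause occurs in a positive body literal (the facts are ground), so the program, and ground queries such as ? - unreachable(c, a), are also allowed.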
Kunen's work clarifies the role of the hypotheses of strictness and al-
lowedness. Allowedness gives completeness for the 3-valued semantics based
on comp(P), and strictness ensures that the 2-valued and 3-valued seman-
tics coincide.

1.5 Semantics for negation via special classes of model


So far we have considered declarative semantics for logic programs of what
we may call a purely logical kind. We have taken some set S(P) of sen-
tences e.g. CWA(P) or comp(P), determined by the program P, and taken
this to be the declarative meaning of the program in the sense that when
we ask a query ?Q we are asking whether Q (or, with the usual convention
that free variables in queries are existentially quantified, 3Q, the existential
quantification of Q) is a logical consequence of S ( P ) . An ideal correspond-
ing procedural semantics or automatic proof procedure would be sound
and complete with respect to this declarative semantics, i.e. a query would
succeed iff it was a consequence of S(P), i.e. true in all models of S(P). An
alternative, model-theoretic approach, leaves the set of sentences P com-
prising the original program alone, but modifies the notion of consequence,
by replacing 'true in all models of, by 'true in all models of a certain kind'.
Some set M(P) of one or more models of P, which are thought of as the
only intended models, is singled out, and when we ask a query ?Q we are
now considered to be asking whether Q is true in all models in M(P).
The simplest semantics of this kind is the well known 'least fixpoint
semantics' for a definite Horn clause program P, where M(P) is taken to
be the set consisting of the least fixpoint model of P. For example if P is
the program

    even(0) ←
    even(s(s(x))) ← even(x)

the least fixpoint model is the Herbrand model with domain


{0,s(0),s(s(0))...} and even(x) true for x = 0, s(s(0)),.... This is the
meaning one would attach to such a program if one regarded its clauses
as being recursive definitions of the predicates in their heads. For definite

Horn clause programs, this least fixpoint semantics coincides with the se-
mantics based on CWA(P) because the least fixpoint model is a model,
and the only model, of CWA(P). It is also almost identical, for the usual
positive queries, with the semantics based on considering all models of P.
This is because the least fixpoint model is a generic model, and if Q is a
conjunction of atoms then ∃Q is true in the least fixpoint model of P iff it is
true in all models of P. [However it is not true that an answer substitution
which is correct for the least fixpoint model is correct in all models, e.g. in
the example above the query ?even(y) has the correct answer y = s(s(x))
in the least fixpoint model, but not in all models.] For such positive queries
?Q the least fixpoint semantics also agrees with the semantics based on all
models of comp(P). This is because comp(P) ⊨ ∃Q iff P ⊨ ∃Q. However
this agreement no longer holds in general for queries containing negation.
For example for the program P:

the negative ground literal ¬num(s(0)) is true in the least fixpoint model
of P but not in all models of comp(P) or of P.
And when negative literals are allowed in the bodies of program clauses
the least fixpoint model may no longer exist. A natural alternative then is
to consider (as in Minker [1982]) a semantics based on the class of minimal
Herbrand models of P. Apt et al. [1988] advocate a semantics based on the
class of minimal Herbrand models of P which are also models of comp(P),
and Przymusinski [1988b] proposes an even more restricted class of perfect
models.
We discuss these model-theoretic semantics in Section 2.8, but only
briefly, because they are considered in more detail in Chapters 2.6 and 3.3
of this volume. They are not all directly related to negation as failure.
For example negation as failure allows ?q to succeed from the program
P: q ← ¬p, but q is not true in all minimal models of P. So negation as
failure is not sound for the semantics based on all minimal models of P.
It is sound for the semantics based on all minimal models of P which are
also models of comp(P), but since this is in general a proper subset of the
set of all models of comp(P), negation as failure must be expected to be
even more incomplete for this semantics than for the usual one based on
all models of comp(P).

1.6 Semantics for negation using non-classical logics


Gabbay [1986] (Section 2.10 below) shows how to obtain a semantics with
respect to which a version of negation as failure is both sound and complete
by using a modal logic with a provability operator □. It is based on the
idea that negation as failure treats ¬A as ¬provable(A). Gelfond [1987]

(Section 2.10 below) has related negation as failure to autoepistemic logic,


replacing ¬A by ¬(A is believed). Cerrito [1992; 1993] uses the linear logic
of Girard [1987] to give (for the propositional case) a declarative semantics
with respect to which negation as failure as used in Prolog is both sound
and complete.
Three-valued logic affords some of the best fitting semantics for negation
as failure. In Section 2.6 below we describe the work of Fitting [1985],
Kunen [1987; 1989] and Stark [1991; 1994; 1994] which shows that, in the
very natural three-valued logic of Kleene [1952], SLDNF-resolution is sound
with respect to comp(P), and complete for a wide class of programs.
These are very successful attempts to discover the underlying logic of
negation as failure. Their disadvantage is that the logics involved are more
complicated and less familiar than classical logic so that they are not likely
to help the naive programmer express his problem by means of a logic
program, or to check the correctness of a program.

1.7 Constructive negation: an extension of negation as failure
Chan [1988] (Section 2.9 below) gives a way of dealing with the floundering
problem of SLDNF-resolution by extending it so as to deal with non-ground
negative literals. This is done by returning the negation of answers to query
Q as answers to ¬Q. An answer substitution θ to a query Q with variables
x1, ..., xn can be written in equational form as

    ∃(x1 = t1 ∧ ... ∧ xn = tn)

where ∃ quantifies all variables except x1, ..., xn. When the SLDNF-tree
for a query Q is finite, with answers A1, ..., Ak then

    Q ↔ (A1 ∨ ... ∨ Ak)

is a consequence of comp(P) so if comp(P) is the intended semantics it is
legitimate to return

    ¬A1 ∧ ... ∧ ¬Ak

as the answer to ¬Q. Chan's procedure of SLD-CNF resolution incorpo-


rates an algorithm for reducing the equality formulae (formulae with = as
the sole predicate symbol) returned in this way to a normal form which is
not much more complicated than the equational form above of the usual
type of answer given by a substitution. It is a promising way of getting a
computational procedure which is more complete relative to the semantics
based on comp(P) than SLDNF-resolution. It is limited by the fact that
an answer to -Q can only be returned when Q has a finite derivation tree.
This is true for all queries only when P is equivalent to a rather simple type

of hierarchic program, although it may well be true for many programs and
queries in practice.
1.8 Concluding remarks
Neither the closed world assumption nor the Clark completion provide a
satisfactory declarative semantics for negation as failure since although
it is sound for both of them it is not in general complete for either of
them. And it is even more incomplete for the semantics based on minimal,
perfect, well-founded or stable models of comp(P). There are sound and
complete semantics expressible in modal or linear logic but these seem too
complicated to serve as a practical guide to the meaning of a program. I
believe that this is inevitable; that the use of negation as failure is only
justifiable in general by some very contorted logic, and that it is one of
the impure features of present day logic programming which should only
be used with great caution.
Those who are wedded to comp(P) as the 'right' semantics for nega-
tion as failure must accept the incompleteness of negation as failure for
this semantics, or confine themselves to programs and queries for which
completeness has been proved for comp(P), e.g. the allowed programs and
queries which are definite, hierarchical or semi-strict (call-consistent) as
described in Section 2.7. If they are prepared to think in terms of 3-valued
logic then allowedness is enough. However, although comp(P) is one of the
simplest semantics proposed for negation as failure, it is often not easy to
read off its meaning from P.
Comprehensive surveys of recent work can be found in the special issue
(Vol 17, nos 2, 3, 4, November 1993) of the Journal of Logic Programming,
devoted to Non-monotonic Reasoning and Logic Programming and in [Apt,
1994].
I am grateful to K. R. Apt and K. Fine for reading earlier drafts very
carefully and making several corrections and many improvements.

2 Main body
2.1 Negation in logic programming
Before taking up our main topic of negation as failure we mention briefly
in this section some other kinds of negation used in logic programming.
The obvious treatment of negation—allowing the full use of classical
negation in both program clauses and queries and using a theorem prover
which is sound and complete for full first order logic—is generally believed
to be infeasible because it leads to a combinatorial explosion. However
there have been some attempts to do this, e.g. [Stickel, 1986; Naish, 1986;
Poole and Goebel, 1986; Loveland, 1988; Sakai and Miyachi, 1986].
In some cases the use of negation can be avoided by renaming. If there
are no occurrences in the program or query of p(t1,..., tn) for any terms

t1, ..., tn then negative literals ¬p(u1, ..., un) can be made positive by
introducing a new predicate nonp(x1, ..., xn) for ¬p(x1, ..., xn). This was

introducing a new predicate nonp(x1,..., xn) for -(x1,..., xn). This was
considered by Meltzer [1983]. A well known example of it is [Kowalski,
1979] where the statements
Every fungus is a mushroom or a toadstool.
No boletus is a mushroom
are made into Horn clauses by introducing the predicate nonmushroom:
toadstool(x) ← fungus(x), nonmushroom(x)
nonmushroom(x) ← boletus(x).
This trick obviously works, for the usual SLD-resolution, also when there
are positive occurrences of p(t1,... ,tn) provided none of these can be uni-
fied with any of the negative ones.
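A runnable version of this example, with a couple of invented facts added so that a query can be posed, is:

    toadstool(X) :- fungus(X), nonmushroom(X).
    nonmushroom(X) :- boletus(X).
    fungus(specimen1).      % hypothetical facts, purely for illustration
    boletus(specimen1).
    % ?- toadstool(specimen1).   succeeds by ordinary SLD-resolution,
    %                            with no use of negation.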
There is no problem in allowing queries containing negation if the pro-
gram consists entirely of definite Horn clauses. The usual SLD-resolution
is trivially still complete, because a query containing a negative literal can-
not succeed, and it should not, because it is not a logical consequence of
the program, since that has a model in which all predicates are true of
everything.
There is also no problem in dealing with general Horn clause programs
and queries, i.e. in allowing programs to contain negative Horn clauses
as well as definite Horn clauses, and allowing goal clauses to be either
negative or definite Horn clauses (so the queries, i.e. negations of goals,
are conjunctions containing at most one negative literal). This depends on
the well-known fact that if a set of Horn clauses is inconsistent then so is
some subset containing just one negative clause. The query procedure is
as follows:
First check the program for consistency by taking its positive part, i.e.
the set of definite clauses in it, and querying this in turn with the negations
of the negative program clauses (i.e. take each of these negative clauses as
a goal). If any of these queries succeeds, i.e. if the empty clause can be
derived from any of these goals, then the program is inconsistent. If the
program is found to be consistent then proceed as follows:
If the query consists entirely of positive literals then discard any nega-
tive program clauses and test the query in the usual way with the positive
part of the program. If the query contains a (single) negative literal then its
negation can be written as a definite Horn clause G. If there are no negative
program clauses then fail the query immediately because, as above, a neg-
ative query cannot be a consequence of a positive program. Otherwise add
G to the positive part of the program and query it in turn with each of the
negations of the negative program clauses (i.e. use each of these negative
clauses as a goal). The only unusual feature is that answers to the query
may no longer be of the familiar definite form, i.e. expressible by a single
substitution. Instead every time G is used within one derivation of the

empty clause, the value substitution is stored as one of the disjuncts of an


indefinite answer substitution. For further details see [Sakai and Miyachi,
1983] or Gallier and Raatz [1987; 1989], where the treatment is extended
to include the equality relation.

2.2 Negation as failure; SLDNF-resolution


The basic principle of negation as failure is:
if A is a ground atom,
the goal ¬A succeeds if A fails
the goal ¬A fails if A succeeds.
This allows the usual SLD-resolution of Horn clause logic programming to
be extended to the case where queries and the bodies of program clauses
contain negative as well as positive literals. This is done, as described in
Section 1.3 above, firstly by admitting only computation rules (rules for
selecting a literal in the goal—sometimes called selection rules) which are
safe in the sense of Lloyd [1987], i.e. which only select a negative literal
if it is ground. When the ground negative literal -A is selected the next
step is to query A; if A succeeds then -A fails, so the main derivation path
fails at this point; if A fails on every evaluation path then -A succeeds
and the next goal is obtained by deleting -A from the current goal. If A
neither succeeds nor fails then the main derivation path ends here in an
inconclusive dead-end. Since we are assuming that the program consists
of a finite set of clauses, when all evaluation paths of A end in failure the
whole derivation tree of A is finite by Konig's lemma, so this is usually
described by saying that A fails finitely or has a finitely failed tree. Since
only ground negative literals can be selected, if we reach a goal containing
only non-ground negative literals we can proceed no further. Such a goal is
called a flounder and the original derivation and query are said to flounder.
We will call this procedure SLDNF-resolution, following Lloyd [1987],
which contains a very detailed recursive definition in terms of the depth
of nesting of negation as failure calls. We find it more convenient to talk
in terms of queries rather than goals. For us a query Q is a conjunction
L1 ∧ ... ∧ Ln of literals, often written ? - L1, ..., Ln, and the corresponding
goal is ← L1, ..., Ln, i.e. the negation ¬Q of the query. When Lloyd says
'P ∪ {← Q} has an SLDNF-refutation' we say 'Q succeeds from P using
SLDNF-resolution', and when he says 'P ∪ {← Q} has a finitely failed
SLDNF-tree' we say 'Q fails from P using SLDNF-resolution'.
Kunen [1989] gives a more succinct definition of these notions as follows.
Let P be the program, R the set of all pairs (Q, θ) such that query Q
succeeds with answer θ, and F the set of all queries which finitely fail.
Then R, F are defined by simultaneous recursion to be the least sets such
that, denoting the identity substitution by 1,
1. (true, 1) ∈ R.
2. If Q is Q1 ∧ A ∧ Q2 where A is a positive literal, if A' ← Q' is a
   clause of P, if θ = mgu(A, A') and ((Q1 ∧ Q' ∧ Q2)θ, σ) ∈ R, then
   (Q, (θσ)|Q) ∈ R.
3. If Q is Q1 ∧ ¬A ∧ Q2 where A is a positive ground literal, if A ∈ F
   and (Q1 ∧ Q2, σ) ∈ R then (Q, σ) ∈ R.
4. Suppose Q is Q1 ∧ A ∧ Q2 where A is a positive literal. Suppose
   that for each clause A' ← Q' of P, if A' is unifiable with A then
   (Q1 ∧ Q' ∧ Q2)mgu(A, A') ∈ F. Then Q ∈ F.
5. If Q is Q1 ∧ ¬A ∧ Q2 where A is a positive ground literal and (A, 1) ∈ R
   then Q ∈ F.
Here it is assumed that before computing an mgu the query clause and
program clause are renamed to have distinct variables. Also (θσ)|Q denotes
the restriction of the substitution θσ to the variables in Q.
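As a worked instance of this definition, take the one-clause program p(a) (with empty body, i.e. p(a) ← true), chosen purely for illustration, in a language that also has a predicate q with no clauses. By 1, (true, 1) ∈ R. For the query p(x), clause 2 applies with θ = mgu(p(x), p(a)) = {x/a} and σ = 1, giving (p(x), {x/a}) ∈ R. The query q(b) has no clause whose head unifies with it, so by 4 (vacuously) q(b) ∈ F; hence by 3, (¬q(b), 1) ∈ R, i.e. ¬q(b) succeeds with the identity answer. Finally, since (p(a), 1) ∈ R, clause 5 gives ¬p(a) ∈ F.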
One important difference between SLDNF-resolution and SLD-resolution
is that, for the latter, any computation rule can be used, i.e. if a query
succeeds with answer θ using one computation rule, then it does so using
any other computation rule. This is no longer true for SLDNF-resolution;
e.g. for the program

the query ? - r succeeds if the 'last literal' rule is used but not if the Prolog
'first literal' rule is used. {The reason for this discrepancy is that whether
a query fails using SLD-resolution may depend on the computation rule.}
So it is hard to imagine a feasible way of implementing SLDNF-resolution,
since to determine whether a query succeeds requires a search through all
possible derivation trees, using all possible selections of literals. It is shown
in Shepherdson [1985] that there are maximal computation rules Rm such
that if a query succeeds with answer θ using any computation rule then it
does so under Rm, and if it fails using any rule then it fails using Rm, but
in Shepherdson [1991] a program is given for which there is no maximal
recursive rule. What causes the difficulty is that in SLDNF-resolution once
having chosen a ground negative literal ¬A in a goal G you are committed
to waiting possibly forever for the result of the query A before proceeding
with the main derivation. What you need to do is to keep coming back and
trying other choices of literal in G to see whether any of them fail, since
when one of these exists it is not always possible to determine it in advance.
This would be very complicated to implement. This is presumably why
SLDNF-resolution insists that the evaluation of a negative call is pursued
to completion before attending to siblings.
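One program exhibiting the rule-dependence just described (my own illustration; the predicate names have no significance) is:

    r :- \+ s.
    s :- p, q.
    p :- p.
    % there are no clauses for q.
    % With the 'last literal' rule the subsidiary computation for s selects q,
    % which fails at once, so s fails finitely, \+ s succeeds and ?- r. succeeds.
    % With the 'first literal' rule the subsidiary computation selects p and
    % loops forever, so s neither succeeds nor fails and ?- r. dead-ends.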
It is also important not to allow the use of negation as failure in the
present form on non-ground negative literals. A query ? - p(x) is taken to
mean ? - ∃x p(x), and a query ? - ¬p(x) to mean ? - ∃x ¬p(x). It is possible
that both of these are true so it would be unsound to fail ? - ∃x ¬p(x) just
because ? - ∃x p(x) succeeded. For example for the program

p(a)

? - p(x) should succeed because ? - p(a) succeeds, and ? - ¬p(x) should also
succeed, because ? - ¬p(b) succeeds (using negation as failure legitimately
on the ground negative literal ¬p(b)). Most Prolog implementations are
unsound because they allow the use of negation as failure on non-ground
negative literals, or even on more general goals. They are, of course also
incomplete even for SLD-resolution because of their depth-first search, so
it would appear to be very difficult to give a simple declarative semantics
for negation as treated in Prolog.
This inability to handle non-ground negative literals means that if we
reach a flounder, a goal containing only non-ground negative literals, we
can, in SLDNF-resolution, proceed no further. This situation is a source of
incompleteness in SLDNF-resolution. For the program p(x) ← ¬r(x), the
query ? - p(a) succeeds but ? - p(x) (meaning ? - ∃x p(x)) does not. For
the program
p(x) ← ¬q(x)
r(a)
the query ? - p(x), r(x) succeeds but the query ? - p(x) flounders. So the
set of queries which succeed from a program P using SLDNF-resolution is
not closed under either of the rules:

    from φ(a) infer ∃x φ(x)        and        from A ∧ B infer A.

It is therefore impossible to find a declarative semantics for SLDNF-


resolution which is sound and complete for all queries, i.e. a set S(P) of
sentences such that
? - Q succeeds from P using SLDNF-resolution iff S(P) ⊢ ∃Q.
Since the two rules of inference above are valid for intuitionistic and 3-
valued logic, indeed for any logic which might conceivably be useful, it will
not be possible to find such a semantics even if we weaken the classical
derivability relation ⊢ to the intuitionistic or 3-valued or some other deriv-
ability relation. [For a further discussion of this point see [Fine, 1989], who
distinguishes two sources of incompleteness, one arising from the classical
and the other from the non-classical part of the logic.]
Borger [1987] ([Apt, 1990] has a simpler proof) has shown that the
problem of deciding whether a query flounders is recursively undecidable.
The usual way of dealing with the incompleteness due to floundering is to

restrict attention to programs and queries satisfying some condition which


prevents it. We discuss such conditions in Section 2.7. Since they are very
restrictive we consider in Section 3.1 an alternative approach which extends
SLDNF-resolution so that it can be applied, in some cases, to non-ground
negative literals. For this extension, SLDNFS-resolution, it is possible to
find a (rather complicated) semantics for which it is sound and complete
for all programs and queries.
The practical logic programmer may not be disturbed by the incom-
pleteness due to floundering because it may well be insignificant compared
with the incompleteness forced by the limitations of time and space im-
posed by any implementation on a real machine. An interesting way of
dealing with floundering has recently been proposed by Mancarella et al.
[1988].

2.3 The closed world assumption, CWA(P)


The closed world assumption is one of the commonly accepted declarative
semantics for negation in logic programming. It is particularly appropriate
for database applications, being founded on the idea that the program
(database) contains all the positive information about the objects in the
domain. Reiter [1978] gave this a precise formulation by saying that any
positive ground literal not implied by the program is taken to be false. He
axiomatised this by adjoining the negations of these literals to the program
P thus obtaining

CWA(P) = P ∪ {¬A : A is a ground atom and P ⊬ A}.

He also restricted consideration to what in logic programming are usually


called Herbrand models, i.e. models whose domain of individuals consists
of the ground terms. This makes the closed world assumption (when it is
consistent) categorical, i.e. if it has an Herbrand model that model must be
unique, because a ground atom A must be true in it if it is a consequence
of P and false if it is not a consequence of P.
{Actually in the usual logic programming situation, where P consists
of clauses (or, more generally, universal sentences), and the query Q is an
existential sentence, and neither P nor Q contains =, the restriction to
Herbrand models is irrelevant, i.e.

CWA(P) ⊨H Q iff CWA(P) ⊨ Q

where T ⊨H S means 'S is true in all Herbrand models of T'. This is be-
cause CWA(P) ∪ {¬Q} consists of universal sentences not containing equality
so, by the usual Herbrand-Skolem argument, if it has a model it has a Her-
brand model. In particular if CWA(P) is consistent it has a Herbrand
model.}

If P is a definite Horn clause program then CWA(P) is consistent,


because the Herbrand interpretation in which a ground atom A is true
iff P - A, satisfies P (since if B - A1,..., Ar is a ground instance of a
clause of P and P - A1, ..., P - Ar then P - B); and it clearly satisfies
the remaining axioms (-A if P - A) of CWA(P). {This is of course the
familiar least Herbrand model of P.}
The closed world assumption is a very strong presumption in favour
of negative information and is often inconsistent. This is the case if the
program implies indefinite information about ground atoms, e.g. if P is
p ← ¬q, then neither p nor q is a consequence of P so both ¬p and ¬q
belong to CWA(P), and since CWA(P) also contains P it is inconsistent.
This condition is actually necessary and sufficient for the inconsistency of
CWA(P):
CWA(P) is consistent iff for all ground atoms A1, ..., Ar,
P ⊢ (A1 ∨ ... ∨ Ar) implies P ⊢ Ai for some i = 1, ..., r.
The 'only if' part of this follows as in the example; the 'if' part follows
from the compactness theorem, for if CWA(P) is inconsistent so is some
finite subset of it, so there exist ¬A1, ..., ¬Ar in CWA(P) such that P ∪
{¬A1, ..., ¬Ar} is inconsistent, i.e. P ⊢ (A1 ∨ ... ∨ Ar).
Indefinite information about non-ground literals need not imply the
inconsistency of the closed world assumption, e.g. CWA(P) is consistent
for the program P:
p(x) ← ¬q(x)
p(a)
q(b).
However the consistency of the closed world assumption for certain
extensions of P does imply that P is equivalent to a definite Horn clause
program, and, by the remark above, conversely:
If P is a set of first order sentences, then P is equivalent to a set of
definite Horn clauses iff CWA(P ∪ S) is consistent for each set S of ground
atoms, possibly involving new constants.
[For a proof see [Makowsky, 1986] or [Shepherdson, 1988b]; the result
holds for first order logic with or without equality.]
This means that if you want to be able to apply the closed world as-
sumption consistently to your program, and to any subsequent extension
of it by positive facts, possibly involving new constants, then you are con-
fined to definite Horn clause programs. This may be appropriate when the
program is a simple kind of database but most logic programmers would
consider it too restrictive and would therefore look for some other form of
default reasoning for dealing with negation. Note that if the phrase 'possi-
bly involving new constants' is omitted the result above fails, also that the
consistency of the closed world assumption depends on the underlying lan-
guage. The non-Horn program above satisfies the closed world assumption

(i.e. CWA(P) is consistent) if the Herbrand universe, i.e. the set of ground
terms, is the usual one {a, b} determined by the terms appearing in P, and
it continues to do so if any more atoms from the corresponding Herbrand
base {p(a), p(b), q(a), q(b)} are added to the program. But if the Herbrand
universe is enlarged to {a, b, c} the closed world assumption becomes in-
consistent. This ambiguity does not arise if the program is a definite Horn
clause program, for the argument above shows that the closed world as-
sumption for such a program is consistent whatever Herbrand universe is
used.
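Spelling this out for the program above, written in Prolog syntax with \+ for negation as failure:

    p(X) :- \+ q(X).
    p(a).
    q(b).
    % With Herbrand universe {a, b}: p(a) and q(b) are consequences of P,
    % while q(a) and p(b) are not, so CWA(P) adds ¬q(a) and ¬p(b); the
    % result is consistent (take p(a), q(b) true and q(a), p(b) false).
    % With Herbrand universe {a, b, c}: neither p(c) nor q(c) is a consequence,
    % so CWA(P) adds both ¬p(c) and ¬q(c), contradicting the instance
    % p(c) ∨ q(c) of the first clause; CWA(P) is then inconsistent.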
Makowsky [1986] observed that the consistency of the closed world as-
sumption is also equivalent to an important model-theoretic property:
If P is a set of first order sentences then a term structure M is
a model for CWA(P) iff it is a generic model of P, i.e. for all
ground atoms A, M ⊨ A iff P ⊨ A.
{We use the words 'term structure' rather than 'Herbrand interpreta-
tion' here because in this more general setting where the sentences of P
may not be all universal, the word 'Herbrand' would be more appropriately
used for the language extended to include the Skolem functions needed to
express these sentences in universal form.} This notion of a generic model
is like that of a free algebraic structure; just as a free group is one in which
an equation is true only if it is true in all groups, so a generic model of P
is one in which a ground atom is true only if it is true in all models of P,
i.e. is a consequence of P. So it is a unique most economical model of P
in which a ground atom is true iff it has to be. If we identify a term model
in the usual way with the subset of the base (set of ground atoms) which
are true in it, then it is literally the smallest term model. It is easy to see
that the genericity of a generic term model extends from ground atoms to
existential quantifications of conjunctions of ground atoms:
If M is a generic model of P, and Q is a conjunction of atoms,
then M ⊨ ∃Q iff P ⊨ ∃Q.
So a positive query is true in M iff it is a consequence of P, and one
can behave almost as though one was dealing with a theory which had a
unique model. This is one of the attractive features of definite Horn clause
logic programming, which, as the results above show, does not extend much
beyond it. But notice that if one is interested in answer substitutions then
one cannot restrict consideration to a generic model, e.g. if P is:

p(s(x))

then the identity substitution is a correct answer to the query ? - p(x)


in the least Herbrand model, but is not a 'correct answer substitution',
because ∀x p(x) is not a consequence of P.

For negative queries not only is this genericity property lost, but as
Apt et al. [1988] have pointed out, the set of queries which are true under
the closed world assumption, i.e. true in the generic model, may not even
be recursively enumerable, that is to say there may be no computable
procedure for generating the set of queries which ought to succeed under
the closed world assumption. To show this take a non-recursive recursively
enumerable set W and a definite Horn clause program P with constant 0,
unary function symbol s and unary predicate symbol p, such that
P ⊢ p(s^n(0)) iff n ∈ W.
[This is possible since every partial recursive function can be computed
by a definite Horn clause program ([Andreka and Nemeti, 1978]).] Now
¬p(s^n(0)) is true under the closed world assumption iff n ∉ W. However,
this situation does not arise under the conditions under which Reiter origi-
nally suggested the use of the closed world assumption, namely when there
are no function symbols in the language. Then the Herbrand base is finite
and so is the model determined by the closed world assumption, hence the
set of true queries is recursive.
The relevance of the closed world assumption to this chapter is that
negation as failure is sound for the closed world assumption for both success
and failure, i.e.
if ? - Q succeeds from P with answer θ using SLDNF-resolution
then CWA(P) ⊨H Qθ,
if ? - Q fails from P using SLDNF-resolution then CWA(P) ⊨H ¬Q.
For a proof see [Shepherdson, 1984]. However, SLDNF-resolution is
usually incomplete for the closed world assumption. This is bound to be
the case when the closed world assumption is inconsistent. Even for defi-
nite Horn clause programs, where the closed world assumption will be con-
sistent, and even when there are no function symbols, SLDNF-resolution
can be incomplete for the closed world assumption, e.g. for the program
p(a) ← p(a), and the query ? - ¬p(a), since this query ends in a dead-end.
{As noted above, the restriction to Herbrand models makes no difference
in the first of the soundness statements above, i.e. it remains true when ⊨H
is replaced by ⊨. This is not true of the second, where we are dealing
with the negation of a query. For example if P consists of the single clause
p(a) ← p(b) and Q is p(x) then Q fails from P but CWA(P) ⊨ ¬Q is not
true since there are non-Herbrand models of CWA(P) containing elements
c with p(c) true.}
It is possible that one might want to apply the closed world assumption
to some predicates but to protect from it other predicates where it was
known that the information about them was incomplete. For reference to
this notion of protected data see [Minker and Perlis, 1985; Jager, 1988]

which study the model-theoretic aspects of this relativized closed world


assumption. For more details of the model-theory of the ordinary closed
world assumption see [Makowsky, 1986]. In Section 2.8 we discuss briefly
various model-theoretic semantics which can be regarded as weak forms of
the closed world assumption, e.g. the generalized closed world assumption
of [Minker, 1982], and the self-referential closed world assumption of [Fine,
1989]. However not all of these have an obvious relation to negation as
failure.
Summary: The closed world assumption is a natural and simple way of
dealing with negation for programs which represent simple databases, i.e.
definite Horn clause programs without function symbols. Its use outside
this range is limited: as soon as function symbols are introduced there may
not be any sound and complete computable proof procedure for it, and
if one goes beyond definite Horn clause programs it will be inconsistent,
either for the original program or for some extension of it by positive atoms.
We move on now to deal with the Clark completion of a program which
provides a more widely applicable semantics for negation as failure.

2.4 The Clark completion, comp(P)


The most widely accepted declarative semantics for negation as failure is
the 'completed database' introduced by Clark [1978]. This is now usually
called the completion or Clark completion of a program P and denoted by
comp(P). We shall define this initially only when P is what Lloyd [1987]
calls a normal program, i.e. a set of clauses (not containing the predicate
=) of the form

    A ← L1, ..., Lm

where A is an atom and L1, ..., Lm are literals. [Later we shall consider


the extension of this definition to the case where the body of the clause
is an arbitrary first order formula.] To avoid tedious repetition we shall
assume, throughout this section, unless otherwise stated, that all programs
referred to are normal. Similarly goals and queries will be assumed to be
normal, i.e. of the forms ← L1, ..., Lm and ? - L1, ..., Lm, respectively.
To form comp(P) you take each clause

    p(t1, ..., tn) ← L1, ..., Lm

of P in which the predicate symbol p appears in the head, rewrite it in
general form

    p(x1, ..., xn) ← ∃y1 ... ∃yp (x1 = t1 ∧ ... ∧ xn = tn ∧ L1 ∧ ... ∧ Lm)

where x1, ..., xn are new variables (i.e. not already occurring in any of
these clauses) and y1, ..., yp the variables of the original clause. If the

general forms of all these clauses (we assume there are only finitely many)
are:
    p(x1, ..., xn) ← E1
    ...
    p(x1, ..., xn) ← Ej

then the completed definition of p is

    ∀x1 ... ∀xn (p(x1, ..., xn) ↔ E1 ∨ ... ∨ Ej).
The empty disjunction is taken to be false, so if j = 0 i.e. there is no clause


with p in its head, then the completed definition of p is

    ∀x1 ... ∀xn ¬p(x1, ..., xn).

The completion, comp(P) of P is now defined to be the collection of com-


pleted definitions of each predicate symbol in P together with the equality
and freeness axioms below, which we shall refer to as CET (Clark's equa-
tional theory).
equality axioms
x = x
x = y → y = x
x = y ∧ y = z → x = z
x1 = y1 ∧ ... ∧ xn = yn → (p(x1, ..., xn) → p(y1, ..., yn)), for each predicate
symbol p
x1 = y1 ∧ ... ∧ xn = yn → f(x1, ..., xn) = f(y1, ..., yn), for each function
symbol f.
freeness axioms
f ( x 1 , . . . , x n ) = g(y1,...,y m ), for each pair of distinct function symbols
f, g ,
f ( x 1 , . . . , x n ) = f ( y 1 , . . . , y n ) - x1 = y1 A ... A xn = yn, for each function
symbol /,
t(x) = x, for each term t(x) different from x in which x occurs.
The equality axioms are needed because the completed definitions of
predicates contain the equality predicate. The freeness axioms are needed
because in SLDNF-resolution terms such as f(x1, ..., xn), g(y1, ..., ym) are
not unifiable.
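For instance, if P consists of the two clauses p(a) and p(f(x)) ← p(x), their
general forms are p(x1) ← x1 = a and p(x1) ← ∃x (x1 = f(x) ∧ p(x)), so
comp(P) consists of

    ∀x1 (p(x1) ↔ (x1 = a ∨ ∃x (x1 = f(x) ∧ p(x))))

together with the equality and freeness axioms.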
In stating these axioms constants are treated as 0-ary function symbols,
and the axioms are stated for all the function and predicate symbols of the
language. We assume that this language is given 'in advance of P' so
to speak, rather than, as is often assumed in logic programming, being
determined by the function and predicate symbols actually occurring in P.
Note that, because it contains the freeness axioms, comp(P), like CWA(P),
depends on the language as well as on P. For example if P is the program
p(a) and the language contains no function symbols and just one constant
a, then comp(P) consists of the equality axioms and p(x) ↔ x = a; but
if the language has another constant b then comp(P) contains the freeness
axiom a ≠ b. In the latter case comp(P) ⊨ ∃x ¬p(x), but not in the former
case. For a precise statement of the way in which comp(P) depends on the
language of function symbols see Section 2.6. Note also that, like CWA(P),
comp(P) is an extension of P:

comp(P) ⊨ P.

The basic result of [Clark, 1978] is that negation as failure is sound for
comp(P) for both success and failure:
if ? Q succeeds from P with answer θ using SLDNF-resolution
then comp(P) ⊨ Qθ,
if ? Q fails from P using SLDNF-resolution then comp(P) ⊨
¬Q.
For a proof see [Lloyd, 1987, pp. 92, 93]. The key step in the proof is
to show that comp(P) implies that in an SLDNF-derivation tree a goal is
equivalent to the conjunction of its child goals.
This soundness result can be strengthened by replacing the derivabil-
ity relation ⊨ by ⊨3 (truth in all 3-valued models, for a specific 3-valued
logic - [Kunen, 1987, Section 2.7]) or by ⊢I, the intuitionistic derivability
relation ([Shepherdson, 1985]) or by a relation ⊢3I, which admits only rules
which are sound both for classical 3-valued and intuitionistic 2-valued logic
([Shepherdson, 1985]). This helps to explain why SLDNF-resolution is usu-
ally incomplete for comp(P) with respect to the usual 2-valued derivability
relation; in order to succeed a query must not only be true in all 2-valued
models of comp(P), it must be true in all 3-valued models, and must be
derivable using only intuitionistically acceptable steps. For example, if P
is the program p ← ¬p, then comp(P) contains p ↔ ¬p and is inconsistent
in the 2-valued sense, so p is a 2-valued consequence of it, and if SLDNF-
resolution were complete for comp(P) then ? p should succeed. In fact it
dead-ends. This can be explained by noting that there is a 3-valued model
of comp(P) in which p is not true but undefined. Similarly if P is the
program

    p ← q
    p ← ¬q

whose completion contains p ↔ (q ∨ ¬q), the fact that ? p does not
succeed using SLDNF-resolution can be explained by observing that p is
not intuitionistically derivable from comp(P), since the law of the excluded
middle does not hold in intuitionistic logic.
Despite these kinds of incompleteness, and those due to floundering,
pointed out in Section 1.3, comp(P) is regarded by many logic program-
mers as the appropriate declarative semantics for the procedure of SLDNF-
resolution applied to a program P, i.e. as the meaning they had in mind
when they wrote down the program P. This is somewhat removed from the
clarity aimed at in ideal logic programming, where the declarative meaning
of a program should be apparent from the text of the program as written.
Although in simple cases comp(P) may be what most people have in mind
when they write P, it is not always easy when writing P to foresee the effect
of forming comp(P), particularly when P contains clauses involving mutual
recursion and negation. In fact we shall see later in this section that, using
only clauses of the form A ← ¬B, where A and B are atoms, it is possible
to construct a program P such that comp(P) is essentially equivalent to
any given formula of first order logic. However the same objection, that
it is often not easy to read off the declarative meaning from the program
as written, applies to the closed world assumption; indeed we saw that it
could be impossible to decide whether ¬A belonged to CWA(P).
Whereas CWA(P) depends only on the logical content of P, the com-
pletion, comp(P), depends on the way P is written; the completion of
p ← ¬q contains p ↔ ¬q and ¬q, which is equivalent to p ∧ ¬q, but the
completion of the logically equivalent program q ← ¬p has these reversed.
The completion also differs from the closed world assumption in not
implying a restriction to Herbrand models. At least that is the usual un-
derstanding. There are some people who impose this restriction on the
completion, i.e. who regard a query ? Q as a request to know whether
¬Q is true in all Herbrand models of comp(P). However, it should be
noted that if this is done it may lead to non-computable semantics, i.e.
[Shepherdson, 1988a]:
The set of negative ground literals ¬A which are true in all
Herbrand models of comp(P), for a definite program P, may
not be recursively enumerable.
[Unlike the set of sentences which are true in all models of comp(P),
which is recursively enumerable by the Gödel completeness theorem. We
saw above that the closed world assumption could give rise to non-comput-
able semantics for a different reason, namely the non-recursive enumerabil-
ity of the set of sentences CWA (P).]
We have followed the usual convention here of using 'Herbrand model
of comp(P)', to refer to a term model of comp(P) over the language L used
to express P, i.e. a model whose domain is the set of ground terms of L
(the Herbrand universe of L), with functions given the free interpretation
(e.g. the value of the function f applied to the term t is simply f(t)).
This use of the term 'Herbrand' is rather misleading since it suggests that
the Skolem-Herbrand theorem, that if a sentence has a model it has a
Herbrand model, should be applicable, and that restriction to Herbrand
models should make no difference. The reason it does make a difference
here is that for this theorem to apply the Herbrand universe must be large
enough to contain Skolem functions enabling the sentence to be written in
universal form. The appropriate Herbrand universe for comp(P) would be
one containing Skolem functions allowing the elimination of the existential
quantifiers occurring on the right hand sides of the completed definitions
of the predicate symbols of P. And even over that universe one could
not restrict to free interpretations because comp(P) contains the equality
predicate. Nevertheless the 'Herbrand models' of comp(P) are considered
to be of particular interest, presumably because the Herbrand universe of
the original language in which P is expressed is often the only domain of
individuals the programmer wants to consider. There is a very neat fixpoint
characterisation of these models which we now consider.
First we associate with the program P the operator TP which maps a
subset I of the Herbrand base (set of ground atoms) BL into the subset
TP(I) comprising all those ground atoms A for which there exists a ground
instance

    A ← L1, ..., Lm

of a clause of P with all of L1, ..., Lm in I. TP(I) is the set of immediate
consequences of I, i.e. those which can be obtained by applying a rule from
P once only. [Note that TP depends not only on the logical content of P
but also on the way it is written, e.g. adding the tautology p ← p changes
TP.] Clearly
    I is a model for P iff TP(I) ⊆ I,
i.e. iff I is a pre-fixpoint of TP. Following Apt et al. [1988] let us say an
interpretation I is supported if for each ground atom A which is true in I
there is a ground instance

    A ← L1, ..., Lm

of a clause of P such that L1, ..., Lm are true in I. The terminology comes
from the idea of negation as the default, that ground atoms are assumed
false unless they can be supported in some way by the program. Clearly
    I is supported iff TP(I) ⊇ I,
    I is a supported model of P iff it is a fixpoint of TP, i.e.
    TP(I) = I.
The notion of being supported is equivalent to satisfying the 'only if'
halves of the completed definitions of the predicate symbols in comp(P),
so it follows that:
    I is a Herbrand model of comp(P) iff it is a supported model of
    P.
    I is a Herbrand model of comp(P) iff it is a fixpoint of TP.
This fixpoint characterisation of Herbrand models of comp(P) (due to
Apt and van Emden [1982] for positive programs and to Apt et al. [1988]
for normal programs) can be extended to arbitrary models by replacing
the Herbrand base by the set of formal expressions p(d1, ..., dn) where p is
a predicate symbol and d1, ..., dn are elements of the domain; see [Lloyd,
1987, p. 81].
If P is composed of definite Horn clauses then TP is continuous and
has a least fixpoint lfp(TP) which is obtainable as TP ↑ ω, the result of
iterating TP ω times starting from the empty set. This is the least Herbrand
model of P and of comp(P) (so comp(P) is consistent). In general TP is
not continuous or even monotonic, so it may not have fixpoints, e.g. if P is
p ← ¬p.
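For ground (propositional) definite programs this iteration can be carried out
directly. The following Python sketch is purely illustrative; the representation
of clauses as (head, body) pairs is simply an assumption made for the example:

    # A ground definite program: each clause is (head, [body atoms]).
    program = [
        ("p", []),            # p.
        ("q", ["p"]),         # q <- p.
        ("r", ["q", "s"]),    # r <- q, s.
    ]

    def T_P(program, I):
        """One application of the immediate consequence operator."""
        return {h for h, body in program if all(b in I for b in body)}

    def least_model(program):
        """Iterate T_P from the empty set up to its least fixpoint."""
        I = set()
        while True:
            J = T_P(program, I)
            if J == I:
                return I
            I = J

    print(least_model(program))   # {'p', 'q'}: the least Herbrand model; r is never derived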
If P is a definite Horn clause program then comp(P) adds no new pos-
itive information:
If P is a definite Horn clause program and Q is a positive sen-
tence, then comp(P) ⊨ Q implies P ⊨ Q.
A positive sentence is one built up using only ∨, ∧, ∀, ∃. This is an imme-
diate consequence of the existence of the least fixpoint model in the case
where Q is a ground atom, and the argument is easily extended to cover
the case of a positive sentence [Shepherdson, 1988b].
The closed world assumption and the completion are superficially sim-
ilar ways of extending a program. They are both examples of reasoning
by default, assuming that if some positive piece of information cannot be
proved in a certain way from P then it is not true. But for the closed
world assumption the notion of proof involved is that of full first order
logic, whereas for the completion it is 'using one of the program clauses
whose head matches the given atom'. At first sight this is a narrower no-
tion of proof, so that one would expect that more ground atoms should
be false under the completion than under the closed world assumption, i.e.
that CWA(P) should be a consequence of comp(P). But it is not so simple
because comp(P) adds, in the 'only if' halves of the completed definition
of a predicate symbol p, new statements which can be used to prove things
about predicates other than p. Also the closed world assumption involves
a restriction to Herbrand models, which the completion, as usually defined
does not. In fact when they are compatible it must be the closed world
assumption which implies the completion, because the former is categor-
ical. As the simple examples of Section 1.2 showed, either of the closed
world assumption and the completion can be consistent and the other one
not, or both can be separately consistent but incompatible. For conditions
under which they are compatible, and the relation of CWA(P) U comp(P)
to CWA(comp(P)), see [Shepherdson, 1988b].
Although in defining comp(P) above we assumed that P was a normal
program, i.e. composed of clauses of the form

    A ← L1, ..., Lm

where A is an atom and L1, ..., Lm are literals, the same definition could
be applied when P is a set of sentences of the more general form

    A ← W
where W is a first order formula. This allows much more general statements
to be made, but as Lloyd and Topor [1984] have observed, if one takes the
view that the intended meaning of a program P is not P but comp(P), then
there is no gain in generality. Indeed they show how to transform such a
program P into a normal program P' such that comp(P') is essentially
equivalent to comp(P) (the precise sense of equivalence is explained later).
It must be emphasised that the validity of this transformation depends
crucially on the assumption that the meaning of the program is given by
comp(P). Their transformation makes comp(P') equivalent to comp(P); it
does not make P' equivalent to P. If the intended meaning of the program
was the program P actually written, then the appropriate way to obtain
an equivalent normal program would be to introduce Skolem functions to
get rid of the quantifiers. That would give a program P' essentially equiv-
alent to P, but then comp(P') would not be equivalent to comp(P). So
both methods of transformation change the relationship between P and
comp(P). Granted that the intended meaning is comp(P), the transforma-
tion of Lloyd and Topor is useful since it enables one to express comp(P)
using a P expressed in a form closer to natural language, and then convert
P into a normal program P' on which one can use SLDNF-resolution to
derive consequences of comp(P'), i.e. of comp(P). One can also allow the
query to be an arbitrary first order formula W, since if W has free variables
x1, ..., xn, the effect of asking ? W for the program P is the same as ask-
ing ? answer(x1, ..., xn) for the program P* obtained by adding the clause
answer(x1, ..., xn) ← W to P. More precisely, Wθ is a logical consequence
of comp(P) iff answer(x1, ..., xn)θ is a logical consequence of comp(P*),
and ¬W is a logical consequence of comp(P) iff ¬answer(x1, ..., xn) is a
logical consequence of comp(P*). For a proof of this, and of the validity of
the transformation of P into P' given below, see [Lloyd, 1987, Ch. 4].
The easiest way of effecting a transformation of a general program P
into a normal program P' which sends comp(P) into comp(P') is to use
the fact that every first order formula can be built up using ¬, ∨, and ∃.
Replacing

    A ← W1 ∨ W2

by

    A ← W1
    A ← W2

leaves the completion unaltered, as does replacing

    A ← ∃x W

by

    A ← W.

Replacing

    A ← ¬W

by

    A ← ¬p(x1, ..., xn)
    p(x1, ..., xn) ← W

where x1, ..., xn are the free variables of W, and p is a new predicate


symbol, does change the completion, because it introduces a new predicate
symbol, but the new completion, comp(P'), is a conservative extension
of the old one, comp(P), i.e. a formula not involving the new predicate
symbols is a consequence of comp(P') iff it is a consequence of comp(P).
{The reason for this is that if you leave out the new predicate symbols a
model for comp(P') is a model for comp(P); conversely a model for comp(P)
can be extended to a model for comp(P') by using the completed definition
of p in comp(P'), i.e.

    p(x1, ..., xn) ↔ W

to define p.} If we now take each statement of the original general program
P, rewrite its body in terms of ¬, ∨, and ∃, and then, starting with the
outermost connective of its body, successively apply these transformations
we end up with a normal program P' such that comp(P') is a conservative
extension of comp(P), so that in particular any query expressed in the
original language is a consequence of comp(P') iff it is a consequence of
comp(P).
The transformation just described is very uneconomical, and goes fur-
ther than it needs. The clauses of the resulting normal program P' are
of the very simple forms A - B, A - -B, where A and B are atoms.
{In fact by replacing B by --B, you can take them all in the latter form!
This demonstrates that it is not always easy to predict the consequences of
forming the completion of a program even if its individual clauses are very
simple.} The body of a normal program clause is allowed to be a conjunc-
tion of literals, so there is no need for us to replace A A B by -(-A V -B)
and apply four transformation steps. The transformations given by Lloyd
and Topor take account of this and are framed in terms of transforming if
statements whose bodies are conjunctions, e.g. the rule for eliminating ∨
is to replace

    A ← W1 ∧ ... ∧ (V ∨ W) ∧ ... ∧ Wm

by

    A ← W1 ∧ ... ∧ V ∧ ... ∧ Wm
    A ← W1 ∧ ... ∧ W ∧ ... ∧ Wm
They are also as sparing as possible in the use of transformations to elimi-


nate negation. The reason for this is that the whole point of the transfor-
mation of P into P' is to be able to use SLDNF-resolution on P' to find
out whether queries are consequences of the completion comp(P) of the
original program P. And use of the new clause

    A ← ¬p(x1, ..., xn)

resulting from our transformation of A ← ¬W introduces, unless n = 0, a
non-ground negative literal and is very likely to result in a flounder. Indeed
if W has free variables which are not in A, it is bound to do so, since these
variables can never be grounded. To avoid this as far as possible Lloyd and
Topor replace (in a conjunct of the body as above)

and, finally, a conjunct ¬∃x1 ... ∃xn W is dealt with as we dealt with negation
above, i.e.

    A ← U1 ∧ ¬∃x1 ... ∃xn W ∧ U2

is replaced by

    A ← U1 ∧ ¬p(y1, ..., yk) ∧ U2

and

    p(y1, ..., yk) ← W

where y1, ..., yk are the free variables of W other than x1, ..., xn and p is a new
predicate symbol. This last transformation is the only one which introduces
a new predicate symbol.
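As a simple illustration (with hypothetical predicate names), a statement

    allp(l) ← ∀x (member(x, l) → q(x))

whose body can be rewritten as ¬∃x (member(x, l) ∧ ¬q(x)), is transformed
into the two normal clauses

    allp(l) ← ¬p1(l)
    p1(l) ← member(x, l) ∧ ¬q(x)

where p1 is a new predicate symbol; the completion of the transformed
program is a conservative extension of the completion of the original one,
and the negative literal ¬q(x) in the body of the new clause does not
flounder provided x is grounded by member(x, l).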
Although this method of transformation reduces the chance of floun-
dering when using SLDNF-resolution on the transformed program it does
not eliminate it, for the elimination of a universal quantifier ∀x W(x) still
leads to a clause with ¬W(x) in its body. This may not give an inevitable
flounder, e.g. if W(x) is p(x) ← q(x) this is further transformed to a clause
with q(x) ∧ ¬p(x) in its body, which will not flounder if x is grounded by
q(x). But if W(x) is an atomic formula p(x), or a formula starting with
an existential quantifier it is bound to cause a flounder. In fact it is eas-
ily seen that the transformation of any formula containing an alternation
of quantifiers leads to an inevitable flounder. This limits the use of such
general programs in practice. It is not surprising that SLDNF-resolution,
based as it is on SLD-resolution, which is complete only for the definite
Horn clause fragment, should be incapable of dealing effectively with full
first order logic. In fact this transformation method throws some light on
the incompleteness of comp(P) for normal programs. We have
Given any sentence σ of first order logic, you can construct a
normal program P' such that comp(P') is a conservative exten-
sion of σ, together with the equality and freeness axioms.
To prove this take the general program P with statement

    p ← ¬p ∧ ¬σ

where p is a new 0-ary predicate symbol, together with

    q(x1, ..., xn) ← q(x1, ..., xn)

for each n-ary predicate symbol q in σ (this is to ensure that forming the
completion does not add any statements about these predicates). Apart
from the equality and freeness axioms comp(P) is equivalent to

    p ↔ (¬p ∧ ¬σ)

which is equivalent to ¬p ∧ σ. Now transform this into a normal program
P' such that comp(P') is a conservative extension of comp(P).
This means that if you regard comp(P) as the correct declarative mean-
ing of a normal program, so that when you ask a query you are asking
whether it is a consequence of comp(P), then you are dealing with the
consequence relation of full first order logic and you need a complete proof
procedure for that. It is only to be expected that SLDNF-resolution, which
is SLD-resolution with a little trick for dealing with ground negative literals
tacked on, should usually be very incomplete.
We have tacitly assumed in the above that the statements of the general
program P do not contain the equality predicate =. It is possible to define
comp(P) in the same way when they do, and it is possible to transform P
into a normal program P' such that comp(P') is a conservative extension
of comp(P). To do this we deal with equations t1 = t2 as follows: use
the equality and freeness axioms to reduce them successively either to false
or to conjunctions of equations of the form x = t, where x is a variable,
then replace all occurrences of x by t and discard the equation. However
programs containing = are not usually considered in this way because the
fact that forming the completion entails adding the freeness axioms means
that = is severely constrained, and the meaning a statement involving it
will have when the completion is made is not easy to predict. In fact
for programs, normal or otherwise, involving =, the freeness axioms are
inappropriate and a different treatment is called for; see [Jaffar et al, 1984;
Jaffar et al., 1986a] for the case of SLD-resolution, [Shepherdson, 1992b]
for SLDNF-resolution.

2.5 Definite Horn clause programs


For definite Horn clause programs there are some completeness results for
negation as failure with respect to comp(P). We start with definite queries,
where negation as failure is not involved and SLDNF-resolution coincides
with SLD-resolution (since the program is definite). Denote the existential
quantification of Q by ∃Q.
Let P be a definite Horn clause program and Q a definite query. Then
using SLDNF-resolution with program P,
1. whatever computation rule is used Q succeeds with answer including
θ iff comp(P) ⊨ Qθ,
2. whatever computation rule is used Q succeeds iff comp(P) ⊨ ∃Q,
3. if a fair computation rule is used Q fails iff comp(P) ⊨ ¬∃Q.
Here an 'answer including θ' means an answer θ' such that xθ = xθ'μ
for some substitution μ and all variables x in Q, and a fair computation
rule is one where on each infinite evaluation path (some further instantiated
version of) every literal in the goal is eventually chosen. The 'first (leftmost)
literal' rule of Prolog is not fair and the conclusion of (3) does not hold,
e.g. for the program p ← p the query p, q does not fail with the Prolog rule
although ¬(p ∧ q) is a consequence of comp(P). The Prolog rule could be
made fair either by cycling the literals around, last to first, before choosing
the first literal, or by putting the new literals introduced by unification at
the right hand end. Both of these rules have the effect that all the original
literals of a goal are chosen before any of their children. By considering the
program p ← p, q ← q and the query p, q it is easily seen that there is no fair
rule of the kind defined in Lloyd [1987], where the selected literal depends
only on the current goal, and permutation of literals is not allowed.
(1), (2), (3) are easy consequences of the completeness of SLD-resolution
for definite queries and the 'completeness of the negation as failure rule for
definite programs and queries'; see [Lloyd, 1987, Chs 2, 3]. Let us see to
what extent these results can be extended to normal queries, i.e. those of
the form A1, ..., Ar, ¬B1, ..., ¬Bs ([Apt, 1990, Section 6] contains a more
thorough discussion of this). In this case the negation as failure rule is
invoked only at the end of the computation, to declare a grounded ¬Bjθ
a success or failure according to whether Bjθ fails or succeeds using SLD-
resolution. The soundness of SLDNF-resolution with respect to comp(P)
implies that the 'only if' halves of (1), (2), (3) are true for all such queries,
but the 'if' halves need additional conditions.
For normal queries Q containing negative literals,
(1) above holds if the program P and query Q are allowed,
(2) holds if the query Q is ground,
(3) holds if Q is ground and contains no positive literals.
These results follow easily from the results above for definite queries.
(1) depends on the fact that if the program and query are allowed then
any computed answer to the query must be ground, because a variable in
the query can only be removed by grounding. (2) is the same as (1) if
Q is ground. (3) uses the fact that if comp(P) implies a disjunction of
ground atoms then it implies one of them, which follows from the existence
of the least fixed point model. The following counter examples show that
(1), (2), (3) do not hold if the conditions on Q are removed. The program
p(x) and allowed query p(x), ¬q(x) violate (1). The allowed program
p(a), p(f(a)) ← p(f(a)) and allowed query p(x), ¬p(f(x)) violate (2)
because comp(P) implies ¬p(f(f(a))), hence that either x = a or x = f(a)
is a correct answer to the query, but it does not give a definite answer,
and SLDNF-resolution can only give definite answers. (2) would, like (1),
be true for all allowed programs and queries if the classical consequence
relation ⊨ was replaced by the intuitionistic derivability relation ⊢I, because
∃Q is an intuitionistic consequence of comp(P) iff some Qθ is. Finally the
program p ← p and query p, ¬p violate (3).
For definite programs the relation between CWA(P) and comp(P) is
clear. According to CWA(P) all ground atoms not in the least fixed point
model TP ↑ ω are deemed false, but comp(P) implies the falsity only of the
subset of these which fail under SLD-resolution. It can be shown ([Apt and
van Emden, 1982]) that this latter set is the complement with respect to the
Herbrand base BP of the set TP ↓ ω obtained by starting with TP ↓ 0 = BP
and defining TP ↓ (α + 1) = TP(TP ↓ α), TP ↓ α = ∩β<α TP ↓ β for α a limit
ordinal.

2.6 Three-valued logic


Three-valued logic seems particularly apt for dealing with programs, since
they may either succeed or fail or go on forever giving no answer. And it
seems particularly apt for discussing database knowledge where we know
some things are true, some things are false, but about other things we do
not know whether they are true or false. Kleene [1952] introduced such
a logic to deal with partial recursive functions and predicates. The three
truth values are t, true, f, false and u, undefined or unknown. A connective
has the value t or f if it has that value in ordinary 2-valued logic for all
possible replacements of u's by t or f, otherwise it has the value u. For
example p → q gets the truth value t if p is f or q is t, but the value u if
p, q are both u. (So p → p is not a tautology, since it has the value u if p
has value u.) The universal quantifier is treated as an infinite conjunction,
so ∀x φ(x) is t if φ(a) is t for all a, f if φ(a) is f for some a, otherwise u
(so it is the glb of the truth values of the φ(a) in the partial ordering given
by f < u < t). Similarly ∃x φ(x) is t if some φ(a) is t, f if all φ(a) are f,
otherwise u.
This logic has been used in connection with logic programming by My-
croft [1983] and Lassez and Maher [1985]. Recent work of Fitting [1985]
and Kunen [1987; 1989] provides an explanation of the incompleteness of
negation as failure for 2-valued models of comp(P). It turns out that nega-
tion as failure is also sound for comp(P) in 3-valued logic, so it can only
derive those consequences that are true not only in all 2-valued models but
also in all 3-valued models.
Use of the third truth value avoids the usual difficulty with a non-Horn
clause program P that the associated operator Tp which corresponds to
one application of ground instances of the clauses regarded as rules, is no
longer monotonic. To avoid asymmetric associations of T with true, let
us call the corresponding operator for 3-valued logic ΦP. This operates on
pairs (T0, F0) of disjoint subsets of BP, the Herbrand base of P, to produce
a new pair (T1, F1) = ΦP(T0, F0), the idea being that if the elements of
T0 are known to be true and the elements of F0 are known to be false,
then one application of the rules of P to ground instances shows that the
elements of T1 are true and the elements of F1 are false.
Formally, for a ground atom A, we put A in T1 iff some ground instance
of a clause of P has head A and a body made true by (T0, F0), and we put
A in F1 iff all ground instances of clauses of P with head A have body made
false by (T0, F0). In particular if A does not match the head of any clause
of P it is put into F1, which is in accordance with the default reasoning
behind negation as failure.
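For a ground program one application of this operator can be computed
directly. The Python sketch below is purely illustrative; it assumes clauses
are given as (head, list-of-literals) pairs, with a negative literal written as
("not", atom):

    # A ground normal program: each clause is (head, [literals]); a literal
    # is an atom (a string) or ("not", atom).
    program = [
        ("p", [("not", "q")]),   # p <- not q.
        ("r", ["r"]),            # r <- r.
    ]

    def value(lit, T, F):
        """Kleene truth value ('t', 'f' or 'u') of a literal in (T, F)."""
        if isinstance(lit, tuple):                     # negative literal
            a = lit[1]
            return "f" if a in T else "t" if a in F else "u"
        return "t" if lit in T else "f" if lit in F else "u"

    def phi(program, T, F):
        """One application of the 3-valued operator: (T, F) |-> (T1, F1)."""
        atoms = {h for h, _ in program} | \
                {l if isinstance(l, str) else l[1] for _, body in program for l in body}
        T1 = {h for h, body in program
              if all(value(l, T, F) == "t" for l in body)}
        F1 = {a for a in atoms
              if all(any(value(l, T, F) == "f" for l in body)
                     for h, body in program if h == a)}
        return T1, F1

    print(phi(program, set(), set()))   # (set(), {'q'}): q matches no clause head, so q is false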
A little care is needed in defining the notion of a 3-valued model and the
notion of comp(P). A 3-valued model of a set of sentences S is a non-empty
domain together with interpretations of the various function symbols in the
same way as in the 2-valued case. The equality relation '=' is interpreted
as identity (hence is 2-valued), the sentences in S must all evaluate to t,
and so must what Kunen calls CET or 'Clark's equational theory', i.e. the
equality and freeness axioms listed above as part of comp(P). If we were
now to write the rest of comp(P), the completed definitions of predicates,
in the form

    ∀x1 ... ∀xn (p(x1, ..., xn) ↔ E1 ∨ ... ∨ Ej)

using the Kleene truth table for ↔ then we should be committing ourselves
to 2-valued models, for p ↔ p is not t but u when p is u. Kunen therefore
replaces this ↔ with Kleene's weak equivalence ⇔ which gives p ⇔ q the
value t if p, q have the same truth value, f otherwise. (Note: our notation
here agrees with Fitting but not with Kunen, who uses ↔ instead of our ⇔
and = instead of our ↔.) This saves comp(P) from the inconsistency it can
have under 2-valued logic. For example if P is p ← ¬p then the 2-valued
comp(P) is p ↔ ¬p, which is inconsistent; what Kunen uses is p ⇔ ¬p
which has a model with p having the value u. Having done this, if we want
comp(P) ⊨3 P to hold, i.e. 3-valued models of comp(P) also to be models
of P, then we must replace the ← in the clauses of P by ⊃, where p ⊃ q is
the 2-valued 'if p is true then q is true', which has the value t except when
p is t and q is f or u, when it has the value f. (Actually this would still
be true with the slightly stronger 2-valued connective which also requires
'and if q is f then p is f'.) comp(P) ⊨3 P does not hold if Kleene's → is
used in P because the program p ← p has completion p ⇔ p which has a
model where p is u which gives p ← p the value u not t.
We may identify a pair (T, F) of disjoint subsets of the Herbrand base
with the three valued structure which gives all elements of T the value t,
all elements of F the value f, and all other elements of the Herbrand base
the value u. The pairs (T, F) of disjoint subsets of BP form a complete
lattice with the natural ordering ⊆.
We define ΦP ↑ α as we defined TP ↑ α, i.e. ΦP ↑ 0 = (∅, ∅), ΦP ↑
(α + 1) = ΦP(ΦP ↑ α), ΦP ↑ α = ∪β<α ΦP ↑ β for α a limit ordinal. We
now have the analogue for ΦP of the 2-valued properties of TP.
1. ΦP is monotonic.
2. ΦP has a least fixed point given by ΦP ↑ α for some ordinal α.
3. If T, F are disjoint then (T, F) is a 3-valued Herbrand model of
comp(P) iff it is a fixed point of ΦP.
In addition:
4. comp(P) is always consistent in 3-valued logic.
For a proof see [Fitting, 1985]. The monotonicity is obvious and (2)
is a well known consequence of that. (3) is also easily proved, as in the
2-valued case, and (4) follows from (2) and (3).
However, the operator ΦP is not in general continuous and so the closure
ordinal α, i.e. the α such that ΦP ↑ α is the least fixed point, may be greater
than ω. For example if P is:

then it is easy to check that the closure ordinal is ω + 1. Fitting shows that
the closure ordinal can be as high as Church-Kleene ω1, the first non-recur-
sive ordinal. Both Fitting and Kunen show also that a semantics based on
this least fixed point as the sole model suffers from the same disadvantage
as the closed world assumption, namely that the set of sentences, indeed
even the set of ground atoms, that are true in this model may be non-
recursively enumerable (as high as Π11-complete in fact). The same is true
of a semantics based on all 3-valued Herbrand models, for the programs in
the examples constructed by Fitting and Kunen can be taken to have only
one 3-valued Herbrand model.
Kunen proposes a very interesting and natural way of avoiding this non-
computable semantics: Why should we consider only Herbrand models;
why not consider all 3-valued models? In other words, ask whether the
query Q is true in all 3-valued models of comp(P). He shows that an
equivalent way of obtaining the same semantics is by using the operator
ΦP but chopping it off at ω:
A sentence φ has value t in all 3-valued models of comp(P) iff
it has value t in ΦP ↑ n for some finite n.
There are three peculiar features of this result. First, in determining
the truth value of φ in ΦP ↑ n, quantifiers are interpreted as ranging
over the Herbrand universe, yet we get equivalence to truth in all 3-valued
models. This may be partly explained by the fact that the truth of this
result depends on ΦP and comp(P) being formed using a language L∞ with
infinitely many function symbols. (Where constants are treated as 0-ary
functions. Actually Kunen uses a language with infinitely many function
symbols of all arities, but the weaker condition above is sufficient.) For
example if P is simply p(a), then the usual language LP associated with
P has just one constant a, the Herbrand base is {p(a)}, and if ΦP is
evaluated wrt this, then ∀x p(x) has value t in ΦP ↑ 1. But ∀x p(x) is
not true in all 3-valued (or even all 2-valued) models of comp(P), and it
does not have value t in any ΦP ↑ n if ΦP is evaluated wrt a language
with infinitely many function symbols, because if b is a new constant (or
the result f(a, ..., a) of applying a new function symbol to a) then p(b)
has value f in ΦP ↑ 1. Second, truth in some ΦP ↑ n is not the same
as truth in ΦP ↑ ω, which is not usually a model of comp(P). Third
the result holds only for sentences built up from the Kleene connectives
∧, ∨, ¬, →, ↔, ∀, ∃, which have the property that if they have the value
t or f and one of their arguments changes from u to t or f then their
value doesn't change. The weak equivalence ⇔ used in the sentences giving
the completed definitions of the predicates in comp(P) does not have this
property, so the sentences of comp(P)—which obviously have value t in all
3-valued models of comp(P)—may not evaluate to t in any ΦP ↑ n. For
example if P is

    p(0)
    p(s(y)) ← p(y)

then the completed definition of p in comp(P) is

    ∀y (p(y) ⇔ (y = 0 ∨ ∃z (y = s(z) ∧ p(z)))).

This has value f in each ΦP ↑ n for finite n, because for y = s^n(0) the
left hand side is u but the right hand side is t.
If ΦP and comp(P) are formed using a language L with only finitely
many function symbols then [Shepherdson, 1988b] the corresponding result
is
A sentence φ has value t in all 3-valued models of comp(P) ∪
{DCA} iff it has value t in ΦP ↑ n for some finite n.
Here DCA is the domain closure axiom for L

    ∀x ∨f∈L ∃y1 ... ∃yrf x = f(y1, ..., yrf)

(where rf is the arity of f), which states that every element is the value of
a function in L (so it is satisfied in Herbrand models formed using L).
It is worth noting here the way in which comp(P) depends on the lan-
guage L. As mentioned above this is due to comp(P) containing the freeness
axioms for L. Let us denote by compL(P) the completion of P formed us-
ing the language L, by GkL the sentence expressing the fact that there
are greater than or equal to k distinct elements which are not values of
a function in L, and by GL the set of sentences expressing the fact that
there are infinitely many such elements. Let L(P) denote the language of
symbols occurring in the program. Then [Shepherdson, 1988a]:
Let L2 ⊇ L1 ⊇ L(P). Then compL2(P) is a conservative extension of
1. compL1(P) if L1 has infinitely many function symbols,
2. compL1(P) ∪ GL1 if L1 has finitely many function symbols and L2 −
L1 contains a function symbol of positive arity or infinitely many
constants,
3. compL1(P) ∪ {GkL1} if L1 has finitely many function symbols and
L2 − L1 contains no function symbol of positive arity but exactly k
constants.
These results are true in both 2- and 3-valued logic since only the equal-
ity predicate is involved, and this is always taken to be 2-valued.
Kunen shows that whether φ has value t in ΦP ↑ n is decidable, so
when a language with infinitely many function symbols is used:
The set of Q such that comp(P) ⊨3 Q is recursively enumerable.
An alternative proof could be obtained by giving a complete and con-
sistent deductive system for 3-valued logic, as Ebbinghaus [1969] has done
for a very similar system of 3-valued logic. This alternative proof shows
that the result holds for any language. For definite Horn clause programs
3-valued logic gives results which are in good agreement with those of 2-
valued logic.
If P is a definite Horn clause program then

    ΦP ↑ α = (TP ↑ α, BP − TP ↓ α).

Since the closure ordinal of TP ↑ for a definite Horn clause program P
is ω, this shows that the closure ordinal of ΦP ↑ is the same as that of
TP ↓.
For definite programs and definite queries, where the 2-valued semantics
of negation as failure in terms of comp(P) is satisfactory, the 3-valued
semantics agrees with it.
If P is a definite Horn clause program and Q is the existential closure
of a conjunction of atoms then the following eight statements are equiva-
lent: Q is true in all (2-valued, 3-valued) (models, Herbrand models) of
(comp(P), P).
It is to be understood here that, as above, the clauses of P are written
with ← replaced by ⊃. When this is done all 3-valued models of comp(P)
are models of P, so all the classes of models described are contained in
the class of 3-valued models of P. Since they all contain the class of 2-
valued Herbrand models of comp(P), the result amounts to saying that if
Q is true in all 2-valued Herbrand models of comp(P) then it is true in all
3-valued models of P, which follows easily by considering the least fixed
point model [Shepherdson, 1988b]. The result is true whatever language
(containing L(P)) is used. But if 'existential' is replaced by 'universal' here
then even the 2-valued parts of the result fail if the language used is L(P).
For example if P is p(0), p(s(x)) ← p(x) and Q is ∀x p(x) then Q is true in
all 2-valued Herbrand models of comp(P), but not in all 2-valued models
of comp(P). However
If the language of symbols contains infinitely many constants,
or a function symbol not in P, then the above equivalence result
holds for all positive sentences.
A positive sentence here means one built up from atomic sentences
using only ∧, ∨, ∀, ∃. The reason this holds is that new constants behave
like variables; see [Shepherdson, 1988b] for details.
The relevance of 3-valued logic to negation as failure is given by Kunen's
soundness and completeness results:
SLDNF-resolution is sound with respect to comp(P) in 3-valued
logic, for normal programs P and normal queries Q, i.e.
if Q succeeds with answer θ then comp(P) ⊨3 Qθ,
if Q fails then comp(P) ⊨3 ¬∃Q.
SLDNF-resolution is complete with respect to comp(P) in 3-
valued logic for normal allowed programs P and normal allowed
queries Q, i.e.
if comp(P) ⊨3 Qθ then Q succeeds with answer including θ,
if comp(P) ⊨3 ¬∃Q then Q fails.
Here a program is said to be allowed if every variable which occurs in
a clause of it occurs in at least one positive literal in the body of that
clause, a query is allowed if every variable in it occurs in at least one
positive literal of it. It is easily verified that if the program and query
are both allowed then the query cannot flounder, and this is what permits
the completeness result. So for such programs and queries comp(P) and
3-valued logic provide a perfectly satisfactory semantics. However, the
allowedness condition is very stringent and excludes many common Prolog
constructs, such as the definition of equality, equal(x, x), and both clauses
in the definition of member(X,L). But Stark [1994] has shown that the
allowedness condition can be considerably weakened to one based on modes
of predicates and data-flow analysis which is satisfied by most programs
written in practice. The first result above is stated in [Kunen, 1987] (for
a full proof see [Shepherdson, 1989]); the much deeper completeness result
is proved in [Kunen, 1989]. Neither of them requires the assumption used
above that L contains infinitely many function symbols; they are valid
for any L containing the function symbols of P (see [Shepherdson, 1988a;
Shepherdson, 1989]).
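Since allowedness is a purely syntactic condition it is easy to check
mechanically. The following Python sketch is illustrative only; it assumes
atoms are represented as nested tuples whose first component is the
predicate or function symbol, variables are strings beginning with an
upper-case letter, and negative literals are written as ("not", atom):

    def variables(term):
        """Variables (upper-case strings) occurring in a term, atom or literal."""
        if isinstance(term, str):
            return {term} if term[:1].isupper() else set()
        return set().union(set(), *(variables(t) for t in term[1:]))

    def positive(lit):
        return not (isinstance(lit, tuple) and lit[0] == "not")

    def clause_allowed(head, body):
        """Every variable of the clause occurs in a positive literal of the body."""
        pos_vars = set().union(set(), *(variables(l) for l in body if positive(l)))
        all_vars = variables(head) | set().union(set(), *(variables(l) for l in body))
        return all_vars <= pos_vars

    # Both clauses of the usual definition of member are excluded:
    print(clause_allowed(("member", "X", ("cons", "X", "L")), []))                      # False
    print(clause_allowed(("member", "X", ("cons", "Y", "L")), [("member", "X", "L")]))  # False
    # whereas p(X) <- q(X), not r(X) is allowed, since q(X) grounds X:
    print(clause_allowed(("p", "X"), [("q", "X"), ("not", ("r", "X"))]))                # True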
Stark [1991] gives a formal system for 3-valued logic, a fragment of
Gentzen's sequent calculus LK, in which one can derive exactly the 3-
valued consequences of comp(P). Stark [1994] shows that these 3-valued
consequences of comp(P) are the same as those in a certain 4-valued logic,
and by representing a 4-valued relation by two 2-valued relations, they can
be expressed as the classical 2-valued consequences of a set of formulae
called partial completion of P. A similar theory has been introduced by
Jager [1993].
Other applications of many-valued logic to logic programming are to be
found in the papers of Fitting [1986; 1987b; 1987a], Fitting and Ben-Jacob
[1988] and Przymusinski [1991].

2.7 Cases where SLDNF-resolution is complete for comp(P):
hierarchical, stratified and call-consistent programs
Clark [1978] proved completeness results for hierarchical programs. Apt et
al. [1988] introduced the important notion of stratified program, showed
that if P is stratified then comp(P) is consistent, and conjectured a com-
pleteness result for stratified programs satisfying an additional strictness
condition. This conjecture was proved by Cavedon and Lloyd [1989]. Sato
[1987] generalised the notion of stratifiability to call-consistency and showed
that if P is call-consistent then comp(P) is consistent. Finally Kunen
[1987], using his completeness results above for 3-valued logic, proved com-
pleteness results for call-consistent programs which include all these results.
Informally a hierarchical program is one in which there are no recur-
sive or mutually recursive definitions of predicates, a stratified one has no
recursion through negation, and a call-consistent one has no predicate de-
pending negatively on itself as p does in the program p ← ¬p or similarly
via intermediate clauses and predicate symbols, e.g. p ← ¬q, q ← p.
These notions can be defined formally in two ways. First in terms of
the dependency graph of a program P. This is a directed graph whose
nodes are the predicate symbols of P and which has an edge from p to q
iff there is a clause in P with p in the head and q in the body. The edge
is marked positive (resp. negative) if q occurs in a positive (resp. negative)
literal in the body (an edge may be both positive and negative). Then a
program is hierarchical iff its dependency graph contains no cycles, it is
stratified iff its dependency graph contains no cycles containing a negative
edge, it is call-consistent iff its dependency graph contains no cycle with
an odd number of negative edges. Equivalent definitions can be given in
terms of level mappings. A level mapping of a program is a mapping from
its set of predicate symbols to the non-negative integers. A program P is
hierarchical iff it has a level mapping such that in every clause of P the
level of a predicate symbol in the body is strictly less than the level of the
predicate symbol in the head. It is stratified iff the level of each predicate
symbol in a positive literal in the body is less than or equal to the level of
the predicate symbol in the head and each predicate symbol in a negative
literal in the body is strictly less than the level of the predicate symbol
in the head. Finally let us say a predicate symbol p depends positively
(resp. negatively) on a predicate symbol q, written p ≥+1 q (resp. p ≥−1 q),
when there is a path (of length ≥ 0) in the dependency graph from p to q
containing an even (resp. odd) number of negative edges. Then a program
is call-consistent iff there is a level mapping such that level(p) ≥ level(q) if
p depends positively or negatively on q, and level(p) > level(q) if p depends
both positively and negatively on q. Kunen gave an alternative definition
of p ≥+1 q, p ≥−1 q as follows. Define p ⊒+1 q (resp. p ⊒−1 q) when there
is a program clause with p in its head and q occurring in a positive (resp.
negative) literal in its body. Let ≥+1, ≥−1 be the least pair of relations on
the set of predicate symbols satisfying

    p ≥+1 p,
    p ≥+1 q if p ⊒+1 r and r ≥+1 q, or p ⊒−1 r and r ≥−1 q,
    p ≥−1 q if p ⊒+1 r and r ≥−1 q, or p ⊒−1 r and r ≥+1 q.

Kunen defined a program to be semi-strict iff there is no p such that
p ≥−1 p. This is clearly seen to be equivalent to the first definition of
call-consistent above.
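The three conditions are easily tested mechanically from the dependency
graph. The Python sketch below is illustrative only; the predicate-level
dependencies are assumed to be given as (head, [(body predicate, sign)])
pairs, and it simply looks for cycles, cycles through a negative edge, and
cycles with an odd number of negative edges:

    # Each clause contributes edges head -> body predicate, marked '+' or '-'.
    clauses = [
        ("p", [("q", "-")]),    # p <- not q.
        ("q", [("r", "+")]),    # q <- r.
        ("r", [("q", "+")]),    # r <- q.
    ]

    edges = {(h, q, s) for h, body in clauses for q, s in body}

    def close(rel):
        """Transitive closure of a set of (p, q) pairs."""
        rel = set(rel)
        while True:
            new = {(a, d) for (a, b) in rel for (c, d) in rel if b == c} - rel
            if not new:
                return rel
            rel |= new

    # Pairs connected by a path of length >= 1, ignoring signs.
    reach = close({(p, q) for p, q, _ in edges})

    # Pairs annotated with the parity ('+' even, '-' odd) of the negative
    # edges on some connecting path.
    signed = set(edges)
    while True:
        new = {(a, d, "+" if s1 == s2 else "-")
               for (a, b, s1) in signed for (c, d, s2) in signed if b == c} - signed
        if not new:
            break
        signed |= new

    hierarchical = not any(p == q for p, q in reach)                      # no cycles at all
    stratified = not any(s == "-" and (q == p or (q, p) in reach)
                         for p, q, s in edges)                            # no cycle through a negative edge
    call_consistent = not any(p == q and s == "-" for p, q, s in signed)  # no odd cycle

    print(hierarchical, stratified, call_consistent)                      # False True True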
Following Apt, Blair and Walker he defined a program to be strict iff
there are no p, q such that p ≥+1 q and p ≥−1 q. Finally if Q is a query, let
us define Q ≥+1 p (resp. Q ≥−1 p) iff either q ≥+1 p (resp. q ≥−1 p) for some
q occurring positively in Q, or q ≥−1 p (resp. q ≥+1 p) for some q occurring
negatively in Q. Then a program P is said to
be strict with respect to Q iff for no predicate letter p do we have Q ≥+1 p
and Q ≥−1 p. [This condition excludes queries Q such as p, ¬p; here ¬Q is
a tautology so is a consequence of any comp(P) but Q does not fail unless
p succeeds or fails.]
Kunen [1987] proved:
If P is a call-consistent program which is strict wrt the query
Q then
comp(P) ⊨2 ∀Q implies comp(P) ⊨3 ∀Q,
comp(P) ⊨2 ¬∃Q implies comp(P) ⊨3 ¬∃Q.
[Actually the second part is not explicitly stated but follows by a similar
argument]. Combining this with the 3-valued completeness results given in
Section 2.6 above gives:
If P is an allowed, call-consistent normal program, Q is an
allowed normal query, and P is strict with respect to Q, then
SLDNF-resolution is complete with respect to comp(P) in 2-
valued logic for the query Q, i.e.
if comp(P) ⊨ Qθ then Q succeeds with answer θ,
if comp(P) ⊨ ¬∃Q then Q fails.
This does not include the result of Clark, that if call-consistent here is
strengthened to hierarchical then the condition that P is strict with respect
to Q is not needed. Kunen gives a version that includes this as follows. Let
S be a subset of the predicate symbols of P which is downward closed, i.e.
if the predicate symbol in the head of a clause of P belongs to S then so do
all predicate symbols in the body of the clause. Suppose that P restricted
to S is hierarchical. Then the condition that P is strict with respect to
Q can be weakened to: for no predicate letter p outside of S do we have
Q ≥+1 p and Q ≥−1 p.
These results extend to general programs (Section 2.4) where arbitrary
first order formulae are allowed in the bodies of program clauses, if the notions
of allowed, call-consistent and strict are defined in the appropriate way; see
[Cavedon, 1988] for details.

2.8 Semantics for negation in terms of special classes of models
We continue here the discussion begun in Section 1.5 of semantics based on
the idea that the meaning of a program P is not a set of sentences (such
as P, comp(P) or CWA(P)), but a set M(P) of models of P, and that
when we ask a query ?Q we want to know whether Q is true in all models
in M(P). We have already discussed, in Section 1.5, the case where P
is a definite Horn clause program and M(P) is the singleton consisting of
its least Herbrand model, and in Section 2.3 we have shown its close con-
nection with the semantics based on CWA(P). When P contains negation
CWA(P) may be inconsistent and P may not have a least Herbrand model.
In attempting to find a weaker assumption which would be consistent for
all consistent P, Minker [1982] was led to suggest replacing least Herbrand
model by minimal Herbrand model, i.e. one not containing any proper sub-
model, where as usual we identify a Herbrand model with the subset of the
Herbrand base Bp which is true in it. So he considers M(P), the class of
intended models of P, to be the class of minimal Herbrand models of P. He
shows this semantics is closely related to the one based on his 'generalised
closed world assumption', GCWA(P), defined as follows:
GCWA(P) = P ∪ {¬A : A is a ground atom such that there
is no disjunction B of ground atoms such that P ⊢ A ∨ B but
P ⊬ B}.
Indeed the condition on A here is easily shown to be equivalent to:
A is a ground atom which is false in all minimal Herbrand mod-
els of P.
Since minimal models of P clearly satisfy GCWA(P) this implies:
If a first order sentence Q is a consequence of GCWA(P) then
it is true in all minimal Herbrand models of P.
However, the converse is not generally true, for there may be models of
GCWA(P) which are not minimal Herbrand models of P. For example if
P is p(a) ← ¬p(b), i.e. p(a) ∨ p(b), there are two minimal Herbrand models
of P, namely {p(a)}, {p(b)}, but GCWA(P) is the same as P and has
a non-minimal model {p(a), p(b)}. And the query ∃x ¬p(x) is true in all
minimal Herbrand models of P but is not a consequence of GCWA(P). So
GCWA(P) is an incomplete attempt to characterise the minimal Herbrand
models of P. The converse of the statement displayed above is true if Q is a
positive query or, more generally the existential closure of a positive matrix
(i.e. a formula built up from atoms using only ∧, ∨). But GCWA(P) is not
really involved then, because if such a Q is true in all minimal Herbrand
models of P it is actually a consequence of P alone. From the displayed
statement above it follows that if such a Q is a consequence of GCWA(P)
then it is a already a consequence of P so addition of the generalised closed
world assumption, like the closed world assumption, does not allow the
derivation of any more positive information of this kind (in particular of
ground atoms). This is usually thought to be a desirable feature, since
closed world assumptions are usually intended as devices for uncovering
implicit negative information, thus avoiding the need to state it explicitly,
without adding unconsciously to the positive information of the program.
Minker's aim of providing a version of the closed world assumption
which is consistent is achieved:
If P is consistent then so is GCWA(P)
This is true not only for normal programs P but for any program P con-
sisting of universal sentences because it is easy to show that such P has a
minimal Herbrand model.
GCWA(P) is a generalisation of the CWA(P) in the sense that it agrees
with that when P is definite. The generalised closed world assumption is
a kind of negation as failure in that ¬A is assumed when A fails to be true
in any minimal Herbrand model. However, negation as failure as defined
here, i.e. as SLDNF-resolution, is not sound with respect to it. This is
shown by the program q ← ¬p, where p fails but ¬p is not a consequence of
GCWA(P). Henschen and Park [1988] discuss computational proof proce-
dures appropriate to the GCWA(P) in the case of principal interest where
P is a database, i.e. without function symbols (which Minker's original
article restricted itself to). When function symbols are present there may
be no sound and complete computational proof procedure for GCWA(P).
This follows from our example in Section 2.3 of a definite clause program
for which the set of queries which are consequences of CWA(P) is not re-
cursively enumerable. The same is true for the set of queries true in all
minimal Herbrand models of P, since for definite P this does coincide with
the set of queries which are consequences of GCWA(P) (i.e. of CWA(P)).
For further results on the GCWA and other generalisations of the closed
world assumption see Gelfond et al. [1986; 1989], Lifschitz [1988], Shep-
herdson [1988b], Brass and Lipeck [1989] and Yahya and Henschen [1985].
Apt et al. [1988] propose a semantics for negation which combines this
approach with that of the Clark completion, i.e. applies both kinds of de-
fault reasoning. They suggest that the models of P which it is reasonable
to study are those Herbrand models which are not only minimal but sup-
ported, i.e. a ground atom A is true only if there is a ground instance of
a clause of P with head A and a body which is true. We saw in Section
2.4 that the supported models are the fixpoints of Tp, and the models of
comp(P). Apt, Blair and Walker say on p. 100
'... we are interested here in studying minimal and supported
models ... this simply means we are looking for the minimal
fixed points of the operator Tp.'
Since they go on to study the minimal fixed points of Tp it looks as
though they intended the latter definition i.e. minimal supported models
of P i.e. minimal models of comp(P). But the first phrase suggests a
stronger definition, i.e. models of comp(P) which are also minimal models
of P. To see the difference consider the program p ← q, q ← ¬p, q ← q.
The only, and hence the minimal, model of comp(P) is {p, q}; the only
minimal model of P is {p}. There is no supported model of P (i.e. model
of comp(P)) which is a minimal model of P. However, their main concern is
with stratified programs, for which they establish the existence of a model
satisfying the stronger first definition:

If P is a stratified program then there is a minimal model of P


which is also supported (i.e. a model of comp(P), so comp(P)
is consistent).

There may be more than one model satisfying these conditions, e.g. if
P is the stratified program p ← p, q ← ¬p there are two such models {p}
and {q}. They show that there is one such model Mp which is defined in
a natural way and propose that it be taken as defining the semantics for
the program P, i.e., that an ideal query evaluation procedure should make
a query Q succeed if Q is true in Mp and fail if Q is false in Mp. They
give two equivalent ways of defining MP. A stratified program P can be
partitioned

    P = P1 ∪ P2 ∪ ... ∪ Pn

so that if a predicate occurs positively in the body of a clause in Pi, all
clauses where it occurs in the head are in Pj with j ≤ i, and if a predicate
occurs negatively in the body of a clause in Pi, then all clauses where it
occurs in the head are in Pj with j < i. (So Pi consists of the clauses
defining ith level predicates.) Their first definition of MP is to start with
the empty set, iterate T'P1 ω times, then T'P2 ω times, ..., then T'Pn ω
times. (The operator T'P here is defined by T'P(I) = TP(I) ∪ I and is more
appropriate than TP when that is nonmonotonic.) The other definition
starts by defining M(P1) as the intersection of all Herbrand models of P1,
then M(P2) as the intersection of all Herbrand models of P2 whose
intersection with the Herbrand base of P1 is M(P1), then ... M(Pn) as the
intersection of all Herbrand models of Pn whose intersection with the
Herbrand base of Pn−1 is M(Pn−1). Finally define MP = M(Pn). For the program above,
this model is {q}, which does seem to be better in accordance with default
reasoning than the other minimal model of comp(P), namely {p}. There
is no reason to suppose p is true, so p is taken to be false, hence q to be
true. It seems natural to assign truth values to the predicates in the or-
der in which they are defined, which is the essence of the above method.
Moreover, a strong point in favour of the model Mp is that they show it
does not depend on the actual way a stratifiable program is stratified, i.e.,
divided into levels.
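For a ground stratified program the first definition of MP can be carried out
directly: iterate T'P1 to a fixpoint starting from the empty set, then T'P2
from the result, and so on. The following Python sketch is purely illustrative;
the strata are assumed to be given explicitly as lists of (head, [literals])
clauses:

    # Strata of a ground stratified program, lowest level first; a literal
    # is an atom (a string) or ("not", atom).
    strata = [
        [("p", ["p"])],                 # P1:  p <- p.
        [("q", [("not", "p")])],        # P2:  q <- not p.
    ]

    def holds(lit, I):
        """Truth of a literal in the 2-valued Herbrand interpretation I."""
        return lit[1] not in I if isinstance(lit, tuple) else lit in I

    def T_prime(stratum, I):
        """T'_P(I) = T_P(I) union I, restricted to one stratum."""
        return I | {h for h, body in stratum if all(holds(l, I) for l in body)}

    def M_P(strata):
        I = set()
        for stratum in strata:          # iterate each T'_Pi to a fixpoint in turn
            while True:
                J = T_prime(stratum, I)
                if J == I:
                    break
                I = J
        return I

    print(M_P(strata))   # {'q'}: p is never derived, so 'not p' holds and q is added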
Notice that like comp(P), MP depends not only on the logical content
of P but on the way it is written, for p ← ¬q and q ← ¬p give different
MP.
Since SLDNF-resolution is sound for comp(P) and Mp is a model of
comp(P), it is certainly sound for Mp but, since more sentences will be
true in Mp than in all models of comp(P), SLDNF-resolution will be even
more incomplete for Mp than for comp(P). For example, with the program
above, where q is true in Mp but not in all models of comp(P), there is no
chance of proving q by SLDNF-resolution.
In general there may be no sound and complete computational proof
procedure for the semantics based on Mp. This is shown by the example
in 2.3 which shows this for CWA(P), since for definite programs Mp is the
least Herbrand model, so the semantics based on Mp coincides with that
based on CWA(P). However Apt, Blair and Walker do give an interpreter
which is sound, and which is complete when there are no function symbols.
Przymusinski [1988b; 1988a] proposes an even more restricted class of
models than the minimal, supported models, namely the class of perfect
models. The argument for a semantics based on this class is that if one
writes p ∨ q, then one intends p, q to be treated equally; but, if one writes
p ← ¬q there is a presupposition that in the absence of contrary evidence
q is false and hence p is true. He allows 'disjunctive databases', i.e., clauses
with more than one atom in the head, e.g.

    C1 ∨ ... ∨ Cp ← A1 ∧ ... ∧ Am ∧ ¬B1 ∧ ... ∧ ¬Bn

and his basic notion of priority is that the Cs here should have lower pri-
ority than the Bs and no higher priority than the As. To obtain greater
generality, he defines this notion for ground atoms rather than predicates,
i.e., if the above clause is a ground instance of a program clause he says
that Ci < Bj, Ci ≤ Ak. Taking the transitive closure of these relations
establishes a relation on the ground atoms that is transitive (but may not
be asymmetric and irreflexive if the program is not stratified).
His basic philosophy is
... if we have a model M of DB and if another model N is obtained
by possibly adding some ground atoms to M and removing some
other ground atoms from M, then we should consider the new
model N to be preferable to M only if the addition of a lower
priority atom A to N is justified by the simultaneous removal
from M of a higher priority atom B (i.e. such that B > A).
This reflects the general principle that we are willing to min-
imize higher priority predicates, even at the cost of enlarging
predicates of lower priority, in an attempt to minimize high
priority predicates as much as possible. A model M will be
considered perfect if there are no models preferable to it. More

formally:
[Definition 2.] Suppose that M and N are two different models
of a disjunctive database DB. We say that N is preferable to
M (briefly, N < M) if for every ground atom A in N — M there
is a ground atom B in M - N, such that B > A. We say that
a model M of DB is perfect if there are no models preferable
to M.
He extends the notion of stratifiability to disjunctive databases by requiring
that in a clause

C1 ∨ ... ∨ Cp ← A1 ∧ ... ∧ Am ∧ ¬B1 ∧ ... ∧ ¬Bn

the predicates in C1, ..., Cp should all be of the same level i, greater than
that of the predicates in B1, ..., Bn and greater than or equal to those of
the predicates in A1, ..., Am. He then weakens this to local stratifiability by
applying it to ground atoms and instances of program clauses instead of to
predicates and program clauses. (The number of levels is then allowed to
be infinite.) It is equivalent to the nonexistence of infinite increasing sequences in
the above relation < between ground atoms.
He proves:
Every locally stratified disjunctive database has a perfect model.
Moreover every stratified logic program P (i.e. where the head
of each clause is a single atom) has exactly one perfect model,
and it coincides with the model Mp of Apt, Blair, and Walker.
He also shows that every perfect model is minimal and supported, that if the
program is positive disjunctive, then a model is perfect iff it is minimal,
and that a model is perfect if there are no minimal models preferable to it.
(Positive disjunctive means the clauses are of the form C1 ∨ ... ∨ Cp ←
A1 ∧ ... ∧ Am.) He also establishes a relation between perfect models and
the concept of prioritized circumscription introduced by McCarthy [1984]
and further developed by Lifschitz [1985]:

Let S1, ..., Sr be any decomposition of the set S of all predicates
of a database DB into disjoint sets. A model M of DB
is called a model of prioritized circumscription of DB with respect
to priorities S1 > S2 > ... > Sr, or, briefly, a model
of CIRC(DB, S1 > S2 > ... > Sr), if for every i = 1, ..., r
the extension in M of predicates from Si is minimal among
all models of DB in which the extension of predicates from
S1, S2, ..., Si−1 coincides with the extension of these predicates
in M.

He shows

Suppose that DB is a stratified disjunctive database and
{S1, S2, ..., Sr} is a stratification of DB. A model of DB is
perfect if and only if it is a model of prioritized circumscription
CIRC(DB, S1 > S2 > ... > Sr).

Przymusinska and Przymusinski [1988] extend the above results to a
wider class of weakly stratified programs and a corresponding wider class of
weakly perfect models. The definitions are rather complicated but are based
on the idea of removing 'irrelevant' predicate symbols in the dependency
graph of a logic program and substituting components of this graph for its
vertices in the definitions of stratification and perfect model.
Przymusinski [1988a] observed that the restriction to Herbrand models
gives rise to what he calls the universal query problem. This is illustrated
by the program P consisting of the single clause p(a). Using the language
defined by the program this has the sole Herbrand model {p(a)} so that
∀x p(x) is true in the least Herbrand model although it is not a consequence
of P. So the semantics based on the least Herbrand model implies new pos-
itive information, and also prevents standard unification based procedures
from being complete with respect to this semantics. One way of avoiding
this problem is to consider Herbrand models not with respect to the lan-
guage defined by the program but with respect to a language containing
infinitely many function symbols of all arities, as in 2.6. This seems very
cumbersome; given an interpretation for the symbols occurring in the pro-
gram, to extend this to a model you would have to concoct meanings for
all the infinitely many irrelevant constant and function symbols. A simpler
way is to consider all models, or, as Przymusinski did, all models satis-
fying the equational axioms CET of Section 2.4 instead of just Herbrand
models. He showed how to extend the notion of perfect model from Her-
brand models to all such models, and proposed a semantics based on the
class of all perfect models. He gave a 'procedural semantics' and showed it
to be sound and complete (for non-floundering queries), for stratified pro-
grams with respect to this new perfect model semantics. It is an extension
of the interpreter given by Apt, Blair and Walker. However, it is not a
computational procedure. It differs from SLDNF-resolution by considering
derivation trees to be failed not only when they are finitely failed, but also
when all their branches either end in failure or are infinite. This cannot
always be checked in a finite number of steps. Indeed there cannot be
any computational procedure which is sound and complete for the perfect
model semantics because for definite programs it coincides with the least
Herbrand model semantics, and the example in Section 2.3 shows that the
set of ground atoms false in this model may be non-recursively enumerable.
Further results on stratified programs and perfect models are found in Apt
and Pugin [1987], Apt and Blair [1988].

Van Gelder et al. [1988], building on an idea of Ross and Topor [1987]
defined a semantics based on well-founded models. These are Herbrand
models which are supported in a stronger sense than that defined above.
It is explained roughly by the following example. Suppose that p ← q and
q ← p are the only clauses in the program with p or q in the head. Then p
needs q to support it and q needs p to support it so the set {p, q} gets no
external support and in a well-founded model all its members will be taken
to be false. In order to deal with all programs they worked with partial
interpretations and models. A partial interpretation I of a program P is
a set of literals which is consistent, i.e. does not contain both p and ¬p
for any ground atom p (element of the Herbrand base BP). If p belongs
to I then p is true in I, if ¬p belongs to I then p is false in I, otherwise
p is undefined in I. It is called a total interpretation if it contains either
p or ¬p for each ground atom p. A total interpretation I is a total model
of P if every instantiated clause of P is satisfied in I. A partial model is
a partial interpretation that can be extended to a total model. A subset
A of the Herbrand base BP is an unfounded set of P with respect to the
partial interpretation I if each atom p ∈ A satisfies the following condition:
For each instantiated clause C of P whose head is p, at least one of the
following holds:
1. Some literal in the body of C is false in I
2. Some positive literal in the body of C is in A.
The well-founded semantics uses conditions (1) and (2) to draw negative
conclusions. Essentially it simultaneously infers all atoms in A to be false,
on the grounds that there is no one atom in A that can be first established
as true by the clauses of P, starting from 'knowing' I, so that if we choose
to infer that all atoms in A are false there is no way we would later have
to infer one as true. The usual notion of supported uses (1) only. The
closed sets of Ross and Topor [1987] use (2) only. It is easily shown that
the union of all unfounded sets with respect to I is an unfounded set, the
greatest unfounded set of P with respect to I, denoted by UP(I). Now for
each partial interpretation I an extended partial interpretation WP(I) is
obtained by adding to I all those positive literals p such that there is an
instantiated clause of P with head p whose body is true in I (this part is
like the familiar TP operator) and all those negative literals ¬p such that
p ∈ UP(I). It is routine to show that WP is monotonic and so has a least
fixed point reached after iteration to some countable ordinal. This is
denoted by Mwp(P) and called the well-founded partial model of P. The
well-founded semantics of P is based on Mwp(P). In general Mwp(P) will
be a partial model, giving rise to a 3-valued semantics. Using the 3-valued
logic of Section 2.6, Mwp(P) is a 3-valued model of comp(P), but in general
it is not the same as the Fitting model defined in Section 2.6 as the least
fixed point of the Fitting operator. Since
the Fitting model is the least 3-valued model of comp(P) it is a subset of

Mwp(P). For the program p ← p, in the Fitting model p is undefined,


but in Mwp(P) it is false (since the set {p} is unfounded). However, for
stratified programs this new approach agrees with the two previous ones:
If P is locally stratified then its well-founded model is total and
coincides with its unique perfect model, i.e. with the model Mp
of Apt, Blair and Walker.
Przymusinski [1988b] extends this result to weakly stratified programs and
weakly perfect models. He also shows how to extend SLS-resolution so
that it is sound and complete for all logic programs (for non-floundering
queries) with respect to the well-founded semantics. Ross [1989] gives a
similar procedure. Przymusinski also shows that if the well-founded model
is total the program is in a sense equivalent to a locally stratified program.
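A minimal Python sketch of this construction for finite propositional programs, under the same kind of assumed clause representation as in the earlier sketch (all names are illustrative): it computes the greatest unfounded set and iterates WP from the empty partial interpretation.

    # Illustrative sketch.  A clause is (head, body); body is a list of
    # (atom, sign) pairs, sign True for a positive and False for a negated
    # literal.  A partial interpretation is a pair of disjoint sets (true, false).

    def greatest_unfounded_set(clauses, atoms, true, false):
        """Largest A such that every clause for an atom of A has a body literal
        false in the interpretation or a positive body atom inside A."""
        u = set(atoms)
        changed = True
        while changed:
            changed = False
            for p in set(u):
                for head, body in clauses:
                    if head != p:
                        continue
                    # condition (1): some body literal is false in (true, false)
                    blocked = any((a in false) if sign else (a in true)
                                  for a, sign in body)
                    # condition (2): some positive body atom is still in A
                    blocked = blocked or any(sign and a in u for a, sign in body)
                    if not blocked:        # p has outside support: drop it
                        u.discard(p)
                        changed = True
                        break
        return u

    def well_founded_model(clauses):
        """Least fixed point of W_P; atoms in neither set are undefined."""
        atoms = {h for h, _ in clauses} | {a for _, b in clauses for a, _ in b}
        true, false = set(), set()
        while True:
            new_true = true | {head for head, body in clauses
                               if all((a in true) if sign else (a in false)
                                      for a, sign in body)}
            new_false = false | greatest_unfounded_set(clauses, atoms, true, false)
            if (new_true, new_false) == (true, false):
                return true, false
            true, false = new_true, new_false

    # p <- q and q <- p: the set {p, q} is unfounded, so both come out false.
    print(well_founded_model([("p", [("q", True)]), ("q", [("p", True)])]))
    # p <- p: p is false here, although it is undefined in the Fitting model.
    print(well_founded_model([("p", [("p", True)])]))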
A closely related notion of stable model was introduced by Gelfond and
Lifschitz [1988]. For a given logic program P they define a stability trans-
formation S from total interpretations to total interpretations. Given a
total interpretation I its transform S(I) is defined in three stages as fol-
lows. Start with the set of all instantiations of clauses of P. Discard those
whose bodies contain a negative literal which is false in I. From the bodies
of those remaining discard all negative literals. This results in a set of
definite clauses. Define S(I) to be its least Herbrand model.
This transformation S is a 'shrinking' transformation, i.e. the set of
positive literals true in S(I) is a subset of those true in I. If I is a model
of P the interpretation S(I) may not be a model of P; it may shrink too
much. However, the fixed points of S are always models of P. These are
defined to be the stable models of P. A stable model is minimal (in terms
of the set of positive literals) but not every minimal model is stable. Van
Gelder et al. show that for total interpretations being a fixed point of S
is the same as being a fixed point of their operator WP. Since the well-
founded model is the least fixed point of WP it is a subset of every stable
model of P. Furthermore
If P has a well-founded total model then that model is the unique
stable model.
The converse is not true.
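The stability transformation is easy to state operationally for finite propositional programs. A minimal Python sketch, again under an assumed clause representation (all names illustrative): it applies S and finds the stable models by checking every interpretation for the fixed point property.

    # Illustrative sketch.  A clause is (head, body), with body a list of
    # (atom, sign) pairs; an interpretation is the set of atoms taken true.
    from itertools import chain, combinations

    def gl_transform(clauses, interp):
        """S(I): discard clauses whose body has a negative literal false in I,
        delete the remaining negative literals, and return the least Herbrand
        model of the resulting definite program."""
        reduct = [(head, [a for a, sign in body if sign])
                  for head, body in clauses
                  if all(a not in interp for a, sign in body if not sign)]
        model = set()
        while True:
            new = model | {head for head, body in reduct
                           if all(a in model for a in body)}
            if new == model:
                return model
            model = new

    def stable_models(clauses):
        """The stable models are exactly the fixed points of S."""
        atoms = {h for h, _ in clauses} | {a for _, b in clauses for a, _ in b}
        candidates = chain.from_iterable(combinations(sorted(atoms), n)
                                         for n in range(len(atoms) + 1))
        return [set(c) for c in candidates
                if gl_transform(clauses, set(c)) == set(c)]

    # p <- not q, q <- not p has two stable models, {p} and {q};
    # p <- not p has none.
    print(stable_models([("p", [("q", False)]), ("q", [("p", False)])]))
    print(stable_models([("p", [("p", False)])]))

Run on the program q ← ¬p discussed earlier, the same check returns the single stable model {q}, in agreement with Mp.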
Fine [1989] independently arrived, from a slightly different point of view,
at a notion of felicitous model, which is equivalent to that of stable model.
A felicitous model is one such that the falsehoods of the model serve to
generate, via the program, exactly the truths of the model. His idea is
that if you make a hypothesis as to which statements are false, and use
the program to generate truths from this hypothesis then there are three
possible outcomes: some statement is neither a posited falsehood nor a
generated truth (a 'gap'); some statement is both a posited falsehood and a
generated truth (a 'glut'); the posited falsehoods are the exact complements
of the generated truths (no gap and no glut). A happy hypothesis is one

which leads to no gaps and no glut, and a felicitous model is one where
the hypothesis that the false statements are precisely those which are false
in the model, is a happy hypothesis. Fine shows that the restriction to
felicitous models can be viewed as a kind of self-referential closed world
assumption.
To sum up. Plausible arguments have been given for each of the seman-
tics discussed in this section. The minimal models of comp(P) of Apt,
Blair and Walker, the perfect models of Przymusinski, the well-founded
models of van Gelder, Ross and Schlipf, and the stable models of Gelfond
and Lifschitz are all models of comp(P), so SLDNF-resolution is sound for
them. So they all offer plausible semantics for negation as failure in general
different from that based on comp(P) because they are based on a subset
of the models of comp(P). The fact that for the important class of locally
stratified programs they all coincide, giving a unique model Mp, which
often appears to be 'the' natural model, adds support to their claim to be
chosen as the intended semantics. However SLDNF-resolution will be even
more incomplete for them than it is for the semantics based on comp(P)
and, as noted above there is, even for the class of definite programs, demon-
strably no way of extending SLDNF-resolution to give a computable proof
procedure which is both sound and complete for them.

2.9 Constructive negation; an extension of negation as failure
Chan [1988] gives a procedure, constructive negation, which extends nega-
tion as failure to deal with non-ground negative literals. This is done by
returning the negation of answers to a query Q as answers to ¬Q. If Q has
variables x1, ..., xn and succeeds with idempotent answer substitution θ
for the program P we can write the answer in equational form as

∃(x1 = x1θ ∧ ... ∧ xn = xnθ)

where ∃ quantifies the variables on the right hand side other than x1, ..., xn.
If there are a finite number of answers to the query Q, with equational forms
E1, ..., Ek, then

∀(Q ↔ E1 ∨ ... ∨ Ek)

is a consequence of comp(P), so if comp(P) is the intended semantics it is
legitimate to return

¬(E1 ∨ ... ∨ Ek)

as the answer to ¬Q. For example if P is

q(a)
p ← ¬q(x)

then ?q(x) has the answer x = a, so ?¬q(x) is given the answer x ≠ a and
?p the answer ∃x(x ≠ a).
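In this finite-answer case the procedure reduces to collecting the equational forms of the answers to Q and returning their negated disjunction as the answer to ¬Q. A minimal Python sketch, with substitutions as dictionaries and formulae built as strings (both conventions are assumed for illustration):

    # Illustrative sketch: answers are substitutions on the variables of Q,
    # and formulae are assembled as plain strings.

    def equational_form(variables, theta):
        """The conjunction x1 = x1θ & ... & xn = xnθ for one answer."""
        return " & ".join(f"{x} = {theta.get(x, x)}" for x in variables)

    def constructive_negation(variables, answers):
        """The answer returned for ~Q, given the finitely many answers to Q."""
        if not answers:
            return "true"          # Q fails finitely, so ~Q simply succeeds
        disjuncts = ["(" + equational_form(variables, theta) + ")"
                     for theta in answers]
        return "~(" + " | ".join(disjuncts) + ")"

    # ?q(x) against the program q(a) has the single answer {x: a}, so ?~q(x)
    # receives the answer ~(x = a), i.e. x /= a after normalization.
    print(constructive_negation(["x"], [{"x": "a"}]))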
If there are infinitely many answers to Q this procedure is not applicable,
for example if P is

p(0)
p(s(x)) ← p(x)

then ?p(x) has answers x = 0, x = s(0), x = s(s(0)), ... and there is no way


of describing all these by a first order formula. And it will not work when
there is an infinite branch in the derivation tree e.g. if P is

then the only computed answer to ?p(x) is x = a, but the truth of p(b)
is not determined by comp(P), so p(x) ↔ x = a is not a consequence of
comp(P).
When the procedure does work it returns as answer to a query Q an
equality formula E (i.e. a formula with = as the only predicate symbol)
such that

∀(Q ↔ E)

is a consequence of comp(P). However if the formula E is built up in the
way described above, then the alternations of existential quantification,
disjunction and negation have the effect that E can be an arbitrarily complex
first order equality formula. In order to make the answer E more intelligible
Chan gives an algorithm for reducing the answer E at each stage to a
fairly simple normal form NE to which, assuming CET (Clark's equational
theory, defined in Section 2.4 above), it is equivalent, i.e. such that

CET ⊢ ∀(E ↔ NE).
To allow this normal form NE to be taken as simple as possible he


assumes (as Kunen does in his 3-valued treatment given in Section 2.6
above) that the underlying language of constant and function symbols used
is infinite. This language affects CET and since comp(P) includes CET it
affects comp(P). Let us indicate this dependence on the language L by
writing CETL and compL(P). In the first example above, if L is infinite (or
indeed, if it contains any function symbol or any constant other than a)
then the answer ∃x(x ≠ a) can be replaced by true because CETL, and
hence compL(P), imply the existence of an x ≠ a, since CETL implies that
distinct ground terms are unequal. This is the result given in this case

by Chan's algorithm, which is incorporated into an extension of SLDNF-


resolution called SLD-CNF resolution. If, on the other hand, a is the only
constant or function symbol in L then the answer ∃x(x ≠ a) cannot be
simplified, because in some models of compL(P) there will be elements
other than a whereas in others there will not. Note that the assumption that L is
infinite excludes the usual Herbrand models based on the language defined
by the program. In this example ∃x(x ≠ a), which has been replaced by
true assuming L to be infinite, is false in the usual Herbrand model whose
domain is {a}. Chan's normal form for the equality formulae returned as
answers is quite simple; you only need to add some universally quantified
inequations to the equations x1 = x1θ ∧ ... ∧ xn = xnθ which represent
the familiar substitution type of answer. In fact further restrictions can
be made; these are clearly stated in Przymusinski [1989a, theorem 8.1].
Assuming L is infinite, there is an algorithm for reducing any equality
formula E to a strictly normal equality formula NE such that

CETL ⊢ ∀(E ↔ NE).

A strictly normal equality formula is a disjunction of strictly simple equality
formulae each of which has the form

∃y1 ... ∃ys (x1 = t1 ∧ ... ∧ xm = tm ∧ ∀(x'1 ≠ s1) ∧ ... ∧ ∀(x'm ≠ sm))

where each ti, si is either a non-variable term or one of the free variables
of this formula distinct from xi, x'i respectively, where the ∀ in ∀(x'i ≠ si)
universally quantifies some (perhaps none) of the variables in si, where the
y1, ..., ys are distinct from x1, ..., xm, x'1, ..., x'm, and each yi occurs in
at least one of the terms t1, ..., tm. Chan does not describe his normal
form explicitly but from his examples and his reduction algorithm it would
appear that he achieves the further restriction, corresponding to the use of
an idempotent mgu, that the occurrence of xi on the left hand side of xi = ti
is the only occurrence of xi in the formula. But in order to achieve this
he has to admit quantified inequations of the form ∀(yj ≠ rj) as well. For
example, the formula

is in Przymusinski's normal form. The corresponding Chan normal form is

The reason why the assumption that L is infinite simplifies the normal
forms for equality formulas is that CETL is then a complete theory. Normal
forms for the case of finite L have been given in Shepherdson [1988a]. The

only difference is that closed formulae of the form GkL, -GkL as defined in
Section 2.6 may occur in the normal form. The formula GkL expresses
the fact that there at least k distinct elements which are not values of a
function in L. So if one is working with models for which the number
of such elements is known, these can be replaced by true or false and
the same normal forms as above are achievable, although the reduction
algorithm will be more complicated than those of Chan and Przymusinski.
One special case of interest is where one only wishes to consider models
satisfying the domain closure axiom, DCA, of Section 2.6, which includes all
the usual Herbrand models. Then all GkL would be replaced by false. Other
normal forms for equality formulae have been given by Malcev [1971] and
Maher [1988]. The normal forms of Chan and Przymusinski are probably
the most intelligible and easily obtainable ones, but Maher's is also fairly
simple—just a boolean combination of basic formulae. These are of the
form

∃y1 ... ∃ys (x1 = t1 ∧ ... ∧ xm = tm)

where y1, ..., ys, x1, ..., xm are distinct variables and x1, ..., xm do not
occur in t1, ..., tm; so they are the answer formulae corresponding to the usual
answer substitutions given by SLD- or SLDNF-resolution using idempotent
mgu.
Chan's constructive negation is a procedure to translate a given query
Q into an equality formula E which is equivalent to it with respect to the
semantics comp(P) i.e. such that
comp(P) ⊢ ∀(Q ↔ E).
As we have seen above it does not always succeed. Clearly it can only
succeed when Q is equality definable with respect to comp(P) i.e. when
there is such an equivalent equality formula. Let us say P is equality defin-
able with respect to comp(P) when all first order formulae Q are equality
definable with respect to comp(P). Przymusinski [1989a] considers equality
definability with respect to different semantics. Since all perfect models of
P are models of comp(P), and all models of comp(P) are models of P it is
clear that:
If Q is equality definable with respect to P then it is equality
definable with respect to comp(P); if it is equality definable with
respect to comp(P) then it is equality definable with respect to
the perfect model semantics.
He shows that a program P is equality definable with respect to the
perfect model semantics if it has no function symbols or more generally
if it has the bounded term property defined by Van Gelder [Van Gelder,
1989]. The analogous result is not true for the semantics based on P or on
comp(P). Indeed Shepherdson [1988a] showed that P is equality definable
with respect to comp(P) iff there is a 2-level hierarchic program which is

equivalent to P in the sense that its completion is a conservative exten-


sion of comp(P). Przymusinski showed how to extend his SLS-resolution
for stratified programs (cf. Section 2.8) to an 'SLSC-resolution' by aug-
menting it with constructive negation. With respect to the perfect model
semantics the resulting procedure is always sound, and it is complete for
equality definable programs, i.e. it is complete for all programs for which
any procedure could be complete. However, like SLS-resolution, it is not a
computable procedure.
Constructive negation is a significant attempt to deal with the problem
of floundering and to produce answers when SLDNF-resolution flounders.
Its limitations are that it can still only deal with a negative query ¬A
when the query A has a finite derivation tree. As noted above this is a
fairly severe limitation.

2.10 Modal and autoepistemic logic


Gabbay [1989] presents a view of negation as failure as a modal provability
notion.
'... we use a variation of the modal logic of Solovay, originally introduced
to study the properties of the Gödel provability predicate
of Peano arithmetic, and show that ¬A can be read essentially
as 'A is not provable from the program'. To be more
precise, ¬A is understood as saying 'Either the program is inconsistent,
or the program is consistent, in which case ¬A means
A is not provable from the program'.
In symbols

¬A = □(Program → f) ∨ ~□(Program → A),

where ¬ is negation by failure, ~ is classical negation, f is falsity, and □ is
the modality of Solovay. We provide a modal provability completion for a
Prolog program with negation by failure and show that our new completion
has none of the difficulties which plague the usual Clark completion.
We begin with an example. Consider the Prolog program A ← ¬A,
where ¬ is negation by failure. This program loops. Its Clark completion
is A ↔ ¬A, which is a contradiction, and does not satisfactorily give a
logical content to the program.
We regard this program as saying:

(Provable(f) ∨ ~Provable(A)) → A

In symbols, if x is the program and □ is the modality of provability,
the program x says (□(x → f) ∨ ~□(x → A)) → A, where ~ is classical
negation. In other words, the logical content of the program x is the fixed
point solution (which can be proved to always exist) of the equation

x ↔ [(□(x → f) ∨ ~□(x → A)) → A].

This solution turns out to be a consistent sentence of the modal logic of


provability to be described later.
We now describe how to get the completion in the general case.
Let P be a Prolog program. Let P1 be its Clark completion. Let x be a
new variable. Replace each ¬A in the Clark completion P1 essentially by
(□(x → f) ∨ ~□(x → A)). Thus P1 becomes P2(x) containing □, x and
classical negation ~. We claim that in the modal logic of provability, the
modal completion of the program P is the X such that in the modal logic
⊢ X ↔ P2(X) holds. We have that a goal A succeeds from P iff (more or
less) the modal completion of P ⊢ □A. A more precise formulation will be
given later. We denote the modal completion of P by m(P).'
The modal completion is shown to exist and to be unique up to ⊢ equivalence.
Defining GA = A ∧ □A, 'A is true and provable' (in a general provability
logic A may be provable but not true), soundness and completeness
results are proved:
for each atomic goal Q and substitution θ, Qθ succeeds under
SLDNF-resolution iff m(P) ⊢ GQθ, and Qθ fails from P iff
m(P) ⊢ G¬Qθ.
The completeness result is based on a proof for the propositional case
given by Terracini [1988a].
It should be noted that in this result 'succeeds' and 'fails' do not refer to
SLDNF-resolution but to another version of negation as failure. This coincides
with SLDNF-resolution when P is a propositional program, but when
P contains individual variables 'Qθ succeeds from P' means 'there exists η
such that Qθη is ground and succeeds from Pg under SLDNF-resolution';
similarly 'Qθ fails from P' means 'Qθη fails from Pg under SLDNF-resolution
for all η for which Qθη is ground'. Here Pg denotes the (possibly infinite)
propositional program obtained by taking all ground instances of clauses
of P. For example if P is p ← ¬q(x) then Pg is p ← ¬q(a) and the goal
p is said to succeed, although in SLDNF-resolution from P it would flounder.
This device enables the floundering problem to be dealt with and the
predicate case reduced to the propositional case.
The modal logic of provability used is defined both semantically, in
terms of Kripke type models consisting of finite trees, and also syntactically.
Both of these definitions are rather complicated for predicate logic but
the propositional form of the syntactic definition consists of the following
schemas and rules:
1. ⊢ A, if A is a substitution instance of a classical truth functional
tautology.
2. The schemas:

Here ◊A stands for ¬□¬A. This is an extension of Solovay's modal logic
of provability, which is itself obtained by extending the modal logic K4 by
Löb's axiom schema 4(a). It is also described in [Terracini, 1988b]. The
soundness and completeness results show that this is a very satisfactory
semantics for negation as failure. Its main disadvantage is that the modal
logic used is apparently rather complicated and, at present, not widely
known. So it is doubtful whether this semantics would help writers of pro-
grams to understand their meaning and check their correctness. However,
Gabbay argues convincingly that many of the day to day operations of logic
programming have a modal meaning and that logic programmers should
become more familiar with modal logic.
Gelfond [1987] and Przymusinska [1987] discussed negation as failure in
terms of the autoepistemic logic of Moore [1985]. This is a propositional
calculus augmented by a belief operator L where Lp is to be interpreted
as 'p is believed'. Gelfond considers ¬p in a logic program as intended to
mean 'p is not believed'.
He defines the autoepistemic translation I(F) of an objective formula F
(i.e. a propositional formula not containing the belief operator L) to be the
result of replacing each negative literal ¬p in F by ¬Lp. The autoepistemic
translation I(P) of a logic program consists of the set of translations of all
ground instances of clauses of P, i.e. the set of all clauses of the form

A ← A1 ∧ ... ∧ Am ∧ ¬LB1 ∧ ... ∧ ¬LBn

for all ground instances

A ← A1 ∧ ... ∧ Am ∧ ¬B1 ∧ ... ∧ ¬Bn

of clauses of P.
Let T be a set of autoepistemic formulae. Moore defined a stable auto-


epistemic expansion of T to be a set E(T) of autoepistemic formulae which
satisfies the fixed point condition
Negation as Failure, Completion and Stratification 409

where Cn(S) denotes the set of autoepistemic logical consequences of S


(formulae of the form Lg being treated as atoms). This intuitively repre-
sents a set of possible beliefs of an ideally rational agent who believes in
all and only those facts which he can conclude from T and from his other
beliefs. If this expansion is unique it can be viewed as the set of theo-
rems which follow from T in the autoepistemic logic. The autoepistemic
translation I(P) of a logic program P does not always have such a unique
expansion. The program p ← ¬p has no consistent stable autoepistemic
expansion because its translation is p ← ¬Lp and it is easy to verify that
both Lp and ¬Lp must belong to such an expansion. On the other hand
the program p ← ¬q, q ← ¬p has two such expansions, one with Cn(p)
as its objective part, the other with Cn(q) as its objective part. Gelfond
and Przymusinska showed that for stratified propositional programs the
autoepistemic and perfect model semantics coincide:
If P is a stratified propositional logic program then I(P) has a
unique stable autoepistemic expansion E(I(P)) and for every
query Q, PERF(P) ⊨ Q iff E(I(P)) ⊨ I(Q).
Here PERF(P) denotes the perfect model of P which, as noted in Sec-
tion 2.8, coincides with the minimal supported model of Apt, Blair and
Walker, with the well-founded total model and with the unique stable
model.
Przymusinski [1989b; 1991] obtains results applicable to all logic programs
by using a 3-valued autoepistemic logic. Let Mwp(P) denote the
3-valued well-founded model defined in Section 2.8. He shows that if P
is a logic program then I(P) always has at least one stable autoepistemic
expansion, and that the autoepistemic semantics coincides with the well-
founded semantics:
For every ground atom A,
A is true in Mwp(P) iff A is believed in I(P)
A is false in Mwp(P) iff A is disbelieved in I(P)
A is undefined in Mwp(P) iff A is undefined in I(P).
Here A is believed (resp. disbelieved) in I(P) if A is believed (resp.
disbelieved) in all stable autoepistemic expansions of I(P); otherwise we
say A is undefined in I(P).
For further details of autoepistemic logic see Konolige's chapter in Vol-
ume 3 of this Handbook.

2.11 Deductive calculi for negation as failure


The standard procedural descriptions of logic programming systems such
as Prolog, SLD- and SLDNF-resolution are in terms of trees. This makes
proving theorems about them rather awkward because the proofs somehow

have to involve the tree structure. So it might be useful to have descriptions


in the form of a deductive calculus of the familiar kind, based on axioms
and rules of inference, so that proofs can be simply by induction on the
length of the derivation. Mints [1986] gave such a calculus for pure Prolog,
and Gabbay and Sergot [1986] implicitly suggested a similar calculus for
negation as failure.
The definition of SLDNF-resolution given by Kunen [1989] which we
reproduced in Section 1.3 can easily be put in the form of a Mints type
calculus.
The meaning of the notation is as follows:
Y   A query, i.e. a sequence L1, ..., Ln, n ≥ 0, of literals.
Yθ   The result of applying the substitution θ to the goal Y.
(Y; θ)   The query Y succeeds with answer substitution θ.
(Y)   The query Y succeeds.
~(Y)   The query Y fails finitely.
~j(Y)   The goal Y = A, X (where A is an atom) fails finitely if you
      consider only the branch starting with the attempt to unify A
      with the jth suitable program clause (i.e. whose head contains
      the same predicate as A does).
i : A ← Z   The clause A ← Z is (a variant of) the ith clause of the given
      program P which is suitable for (i.e. whose head contains the
      same predicate as) A.
The calculus operates on formulae of the forms (Y; θ), (Y), ~(Y), ~j(Y)
and i : A ← Z.
The axioms will be those formulae

i : A ← Z

such that A ← Z is indeed the ith clause of P which is suitable for A,


together with
(true; 1) (START)
where 1 denotes the identity substitution. The rules of inference are:
Two rules allowing permutation of literals in a goal

where Q" denotes any permutation of the atoms of Q.



A rule corresponding to resolution

where A is an atom and σ = mgu(A, A'). In all these rules A' ← Z is as
usual supposed to be a variant of the program clause which is standardised
apart so as to have no variables in common with A, X.
A rule allowing you to pass from 'succeeds with answer θ' to 'succeeds',

Finally five rules for negation as failure,

where k is the number of suitable clauses for A, the first sub-goal of Y,

provided that A, A' are not unifiable,

where σ = mgu(A, A'),

if A is ground

if A is ground.
If we take SLDNF-resolution to be defined in the familiar way in terms
of trees, as in Lloyd [1987] then the statement that Kunen's definition given
in Section 1.3 above is equivalent to it amounts to the statement:
A goal X succeeds under SLDNF-resolution from P iff (X) is
derivable in this calculus; it succeeds with answer substitution
θ iff (X; θ) is derivable in the calculus; it fails iff ~(X) is
derivable in the calculus.
The proof is routine, by induction on the length of the derivation for
the 'if' halves, and by induction on the number of nodes in the success

or failure tree for the 'only if' halves. For a survey of these calculi see
Shepherdson [1992a] which also gives the obvious extension of this calculus
to deal with the extended negation as failure rules described in Section 1.6
above:
if Aθ fails then ¬A succeeds with answer θ
if A succeeds with answer 1 then ¬A fails.

References
[Andreka and Nemeti, 1978] H. Andreka and I. Nemeti. The generalised
completeness of Horn predicate logic as a programming language. Acta
Cybernet, 4:3-10, 1978.
[Apt, 1990] K. R. Apt. Introduction to logic programming. In J. van
Leeuwen, editor, Handbook of Theoretical Computer Science, Vol B,
Chapter 10. Elsevier Science, North Holland, Amsterdam, 1990.
[Apt and Bezem, 1991] K. R. Apt and M. Bezem. Acyclic programs. New
Generation Computing, 9:335-363, 1991.
[Apt and Blair, 1988] K. R. Apt and H. A. Blair. Arithmetic classification
of perfect models of stratified programs. Technical Report TR-88-09,
University of Texas at Austin, 1988.
[Apt and Bol, 1994] K. R. Apt and R. Bol. Logic programming and negation.
Journal of Logic Programming, 1994.
[Apt and Pugin, 1987] K. R. Apt and J-M. Pugin. Management of strat-
ified databases. Technical Report TR-87-41, University of Texas at
Austin, 1987.
[Apt and van Emden, 1982] K. R. Apt and M. H. van Emden. Contribu-
tions to the theory of logic programming. Journal of the ACM, 29:841-
863, 1982.
[Apt et al, 1988] K. R. Apt, H. A. Blair, and A. Walker. Towards a theory
of declarative knowledge. In J. Minker, editor, Foundations of Deductive
Databases and Logic Programming, pp. 89-148. Morgan Kaufmann, Los
Altos, CA, 1988.
[Barbuti and Martelli, 1986] R. Barbuti and M. Martelli. Completeness of
SLDNF-resolution for structured programs. Preprint, 1986.
[Blair, 1982] H. A. Blair. The recursion theoretic complexity of the se-
mantics of predicate logic as a programming language. Information and
Control, 54, 24-47, 1982.
[Borger, 1987] E. Borger. Unsolvable decision problems for prolog pro-
grams. In E. Borger, editor, Computer Theory and Logic. Lecture Notes
in Computer Science, Springer-Verlag, 1987.
[Brass and Lipeck, 1989] S. Brass and U. W. Lipeck. Specifying closed
world assumptions for logic databases. In Proc. Second Symposium on
Mathematical Fundamentals of Database Systems (MFDBS89), 1989.

[Cavedon, 1988] L. Cavedon. On the Completeness of SLDNF-Resolution.


PhD thesis, Melbourne University, 1988.
[Cavedon, 1989] L. Cavedon. Continuity, consistency and completeness
properties for logic programs (extended abstract). In Proceedings of the
Sixth International Conference on Logic Programming, Lisbon, pp. 571-
589, 1989.
[Cavedon and Lloyd, 1989] L. Cavedon and J. W. Lloyd. A completeness
theorem for SLDNF-resolution. Journal of Logic Programming, 1989.
[Cerrito, 1992] S. Cerrito. A linear axiomatization of negation as failure.
Journal of Logic Programming, 12, 1-24, 1992.
[Cerrito, 1993] S. Cerrito. Negation and linear completion. In L. Farinas
del Cerro and M. Pentonnen, editors, Intensional Logic for Programming.
Oxford University Press, 1993.
[Chan, 1988] D. Chan. Constructive negation based on the completed
database. Technical report, European Computer-Industry Research Cen-
tre, Munich, 1988.
[Clark, 1978] K. L. Clark. Negation as failure. In H. Gallaire and J. Minker,
editors, Logic and Data Bases, pp. 293-322. Plenum, New York, 1978.
[Davis, 1983] M. Davis. The prehistory and early history of automated
deduction. In J. Siekmann and G. Wrightson, editors, Automation of
Reasoning, pp. 1-28. Springer, Berlin, 1983.
[Ebbinghaus, 1969] H. D. Ebbinghaus. Über eine Prädikatenlogik mit partiell
definierten Prädikaten und Funktionen. Arch. Math. Logik, 12, 39-53,
1969.
[Fine, 1989] K. Fine. The justification of negation as failure. In J. E. Fen-
stad et al., editor, Logic, Methodology and Philosophy of Science VIII.
Elsevier Science, Amsterdam, 1989.
[Fitting, 1985] M. Fitting. A Kripke-Kleene semantics for general logic
programs. Journal of Logic Programming, 2, 295-312, 1985.
[Fitting, 1986] M. Fitting. Partial models and logic programming. Com-
puter Science, 48, 229-255, 1986.
[Fitting, 1987a] M. Fitting. Logic programming on a topological bilattice.
Technical report, H. Lehman College, (CUNY), Bronx, NY, 1987.
[Fitting, 1987b] M. Fitting. Pseudo-boolean valued prolog. Technical re-
port, Research Report, H. Lehman College, (CUNY), Bronx, NY, 1987.
[Fitting, 1991] M. Fitting. Bilattices and the semantics of logic program-
ming. Journal of Logic Programming, 11, 91-116, 1991.
[Fitting and Ben-Jacob, 1988] M. Fitting and M. Ben-Jacob. Stratified
and three-valued logic programming semantics. Technical report, Dept.
of Computer Science, CUNY, 1988.
[Gabbay, 1986] D. M. Gabbay. Modal provability foundations for negation
by failure, 1986. Preprint.

[Gabbay, 1989] D. M. Gabbay. Modal provability interpretation for nega-


tion by failure. In P. Schroeder-Heister, editor, Extensions of Logic Pro-
gramming, pp. 179-222. LNCS 475, Springer-Verlag, Berlin, 1989.
[Gabbay and Sergot, 1986] D. M. Gabbay and M. J. Sergot. Negation as
inconsistency. Journal of Logic Programming, 1, 1-36, 1986.
[Gallier and Raatz, 1987] J. H. Gallier and S. Raatz. HORNLOG: A graph
based interpreter for general Horn clauses. Journal of Logic Program-
ming, 4, 119-155, 1987.
[Gallier and Raatz, 1989] J. H. Gallier and S. Raatz. Extending SLD res-
olution methods to equational Horn clauses using E-unification. Journal
of Logic Programming, 6, 3-43, 1989.
[Gelfond, 1987] M. Gelfond. On stratified autoepistemic theories. In Pro-
ceedings AAAI-87, pp. 207-211. American Association for Artificial In-
telligence, Morgan Kaufmann, Los Altos, CA, 1987.
[Gelfond and Lifschitz, 1988] M. Gelfond and V. Lifschitz. The stable
model semantics for logic programming. In 5th International Confer-
ence on Logic Programming, Seattle, 1988.
[Gelfond et al, 1986] M. Gelfond, H. Przymusinski, and T. Przymusinski.
The extended closed world assumption and its relationship to parallel
circumscription. In Proceedings ACM SIGACT-SIGMOD Symposium
on Principles of Database Systems, Cambridge, MA, pp. 133-139,1986.
[Gelfond et al., 1989] M. Gelfond, H. Przymusinski, and T. Przymusinski.
On the relationship between circumscription and negation as failure.
Artificial Intelligence, 39, 265-316, 1989.
[Girard, 1987] J. Y. Girard. Linear logic. Theoretical Computer Science,
50, 1987.
[Goguen and Burstall, 1984] J. A. Goguen and R. M. Burstall. Institu-
tions: Abstract model theory for computer science. In E. Clark and
D. Kozen, editors, Proc. of Logic Programming Workshop, pp. 221-256.
Lecture Notes in Computer Science 164, Springer-Verlag, 1984.
[Haken, 1985] A. Haken. The intractability of resolution. Theoretical Com-
puter Science, 39, 297-308, 1985.
[Henschen and Park, 1988] L. J. Henschen and H-S. Park. Compiling the
GCWA in indefinite databases. In J. Minker, editor, Foundations of De-
ductive Databases and Logic Programming, pp. 395-438. Morgan Kauf-
mann Publishers, Los Altos, CA, 1988.
[Hodges, 1985] W. Hodges. The logical basis of PROLOG, 1985. unpub-
lished text of lecture, 10pp.
[Jaffar and Stuckey, 1986] J. Jaffar and P. J. Stuckey. Canonical logic pro-
grams. Journal of Logic Programming, 3, 143-155, 1986.
[Jaffar et al., 1983] J. Jaffar, J. L. Lassez, and J. W. Lloyd. Completeness
of the negation as failure rule. In IJCAI-83, Karlsruhe, pp. 500-506,

1983.
[Jaffar et al., 1984] J. Jaffar, J. L. Lassez, and M. J. Maher. A theory of
complete logic programs with equality. Journal of Logic Programming,
1, 211-223, 1984.
[Jaffar et al., 1986a] J. Jaffar, J. L. Lassez, and M. J. Maher. Comments
on general failure of logic programs. Journal of Logic Programming, 3,
115-118, 1986.
[Jaffar et al., 1986b] J. Jaffar, J. L. Lassez, and M. J. Maher. Some issues
and trends in the semantics of logic programs. In Proceedings Third
International Conference on Logic Programming, pp. 223-241. Springer,
1986.
[Jager, 1988] G. Jager. Annotations on the consistency of the closed world
assumption. Technical report, Computer Science Dept., Technische
Hochschule, Zürich, 1988.
[Jager, 1993] G. Jager. Some proof-theoretic aspects of logic programming.
In F. L. Bauer, W. Brauer, and H. Schwichtenberg, editors, Logic and
Algebra of Specification, pp. 113-142. Springer-Verlag, 1993. Proceed-
ings of the NATO Advanced Studies Institute on Logic and Algebra of
Specification, Marktoherdorf, Germany, 1991.
[Kleene, 1952] S. C. Kleene. Introduction to Metamathematics. van Nos-
trand, New York, 1952.
[Kowalski, 1979] R. A. Kowalski. Logic for Problem Solving. North Hol-
land, New York, 1979.
[Kunen, 1987] K. Kunen. Negation in logic programming. Journal of Logic
Programming, 4, 289-308, 1987.
[Kunen, 1989] K. Kunen. Signed data dependencies in logic programs.
Journal of Logic Programming, 7, 231-245, 1989.
[Lassez and Maher, 1985] J. L. Lassez and M. J. Maher. Optimal fixed-
points of logic programs. Theoretical Computer Science, 15-25, 1985.
[Lewis, 1978] H. Lewis. Renaming a set of clauses as a horn set. Journal
of the ACM, 25, 134-135, 1978.
[Lifschitz, 1985] V. Lifschitz. Computing circumscription. In Proceedings
IJCAI-85, pp. 121-127, 1985.
[Lifschitz, 1988] V. Lifschitz. On the declarative semantics of logic pro-
grams with negation. In J. Minker, editor, Foundations of Deductive
Databases and Logic Programming, pp. 177-192. Morgan Kaufmann, Los
Altos, CA, 1988.
[Lloyd, 1987] J. W. Lloyd. Foundations of Logic Programming. Springer,
Berlin., second edition, 1987.
[Lloyd and Topor, 1984] J. W. Lloyd and R. W. Topor. Making Prolog
more expressive. Journal of Logic Programming, 1, 225-240, 1984.

[Lloyd and Topor, 1985] J. W. Lloyd and R. W. Topor. A basis for de-
ductive data base systems, II. Journal of Logic Programming, 3, 55-68,
1985.
[Loveland, 1988] D. W. Loveland. Near-horn prolog. In J.-L. Lassez, editor,
Proc. ICLP'87. MIT Press, Cambridge, MA, 1988.
[McCarthy, 1984] J. McCarthy. Applications of circumscription to formal-
izing common sense knowledge. In AAAI Workshop on Non-Monotonic
Reasoning, pp. 295-323, 1984.
[Maher, 1988] M. J. Maher. Complete axiomatization of the algebras of fi-
nite, infinite and rational trees. In Proc. of the Third Annual Symposium
on Logic in Computer Science, Edinburgh, pp. 345-357, 1988.
[Mahr and Makowsky, 1983] B. Mahr and J. A. Makowsky. Characteriz-
ing specification languages which admit initial semantics. In Proc. 8th
CAAP, pp. 300-316. Lecture Notes in Computer Science 159, Springer-
Verlag, 1983.
[Makowsky, 1986] J. A. Makowsky. Why Horn formulas matter in com-
puter science: Initial structures and generic examples (extended ab-
stract). Technical Report 329, Technion Haifa, 1986. Also in Mathemat-
ical Foundations of Software Development, Proceedings of the Interna-
tional Joint Conference on Theory and Practice of Software Development
(TAPSOFT) (H. Ehrig et al, Eds.), Lecture Notes in Computer Science
185, pp. 374-387, Springer, 1985. (Revised version May 15, 1986, 1-28,
preprint.) The references in the text are to this most recent version.
[Malcev, 1971] A. Malcev. Axiomatizable classes of locally free algebras of
various types. In The Metamathematics of Algebraic Systems: Collected
Papers, chapter 23, pp. 262-281. North-Holland, Amsterdam, 1971.
[Mancarella et al., 1988] P. Mancarella, S. Martini, and D. Pedreschi.
Complete logic programs with domain closure axiom. Journal of Logic
Programming, 5, 263-276, 1988.
[Meltzer, 1983] B. Meltzer. Theorem-proving for computers: Some results
on resolution and renaming. In J. Siekmann and G. Wrightson, editors,
Automation of Reasoning, pp. 493-495. Springer, Berlin, 1983.
[Minker, 1982] J. Minker. On indefinite data bases and the closed world
assumption. In Proc. 6th Conference on Automated Deduction, pp. 292-
308. Lecture Notes in Computer Science 138, Springer-Verlag, 1982.
[Minker and Perlis, 1985] J. Minker and D. Perlis. Computing protected
circumscription. Journal of Logic Programming, 2, 1-24, 1985.
[Mints, 1986] G. Mints. Complete calculus for pure Prolog (Russian). Proc.
Acad. Sci. Estonian SSR, 35, 367-380, 1986.
[Moore, 1985] R. C. Moore. Semantic considerations on non-monotonic
logic. Artificial Intelligence, 25, 75-94, 1985.

[Mycroft, 1983] A. Mycroft. Logic programs and many-valued logic. In


Proc. 1st STACS Conf, 1983.
[Naish, 1986] L. Naish. Negation and quantifiers in NU-Prolog. In Proceed-
ings Third International Conference on Logic Programming, pp. 624-634.
Springer, 1986.
[Naqvi, 1986] S. A. Naqvi. A logic for negation in database systems. In
J. Minker, editor, Proceedings of Workshop on Foundations of Deductive
Databases and Logic Programming, Washington, DC, 1986.
[Plaisted, 1984] D. A. Plaisted. Complete problems in the first-order pred-
icate calculus. Journal of Computer and System Sciences, 29, 8-35, 1984.
[Poole and Goebel, 1986] D. L. Poole and R. Goebel. Gracefully adding
negation and disjunction to Prolog. In Proceedings Third International
Conference on Logic Programming, pp. 635-641. Springer, 1986.
[Przymusinska, 1987] H. Przymusinska. On the relationship between au-
toepistemic logic and circumscription for stratified deductive databases.
In Proceedings of the ACM SIGART International Symposium on
Methodologies for Intelligent Systems, Knoxville, TN, 1987.
[Przymusinska and Przymusinski, 1988] H. Przymusinska and T. Przy-
musinski. Weakly perfect model semantics for logic programs. In
R. Kowalski and K. Bowen, editors, Proceedings of the Fifth Logic Pro-
gramming Symposium, Association for Logic Programming, pp. 1106-
1122. MIT Press, Cambridge, Mass, 1988.
[Przymusinski, 1988a] T. C. Przymusinski. On the declarative and proce-
dural semantics of logic programs. Journal of Automated Reasoning, 4,
1988. (Extended abstract appeared in: Przymusinski, T.C. [1988] Per-
fect model semantics. In R. Kowalski and K. Bowen, editors, Proceedings
of the Fifth Logic Programming Symposium, pp. 1081-1096, Association
for Logic Programming, MIT Press, Cambridge, MA.)
[Przymusinski, 1988b] T. C. Przymusinski. On the semantics of stratified
deductive databases. In J. Minker, editor, Foundations of Deductive
Database and Logic Programming, pp. 193-216. Morgan Kaufmann, Los
Altos, CA, 1988.
[Przymusinski, 1989a] T. C. Przymusinski. On constructive negation in
logic pogramming. In Proceedings of the North American Logic Pro-
gramming Conference, Cleveland, Ohio. MIT Press, Cambridge, MA,
1989. Addendum.
[Przymusinski, 1989b] T. C. Przymusinski. Three-valued non-monotonic
formalisms and logic programming. In Proceedings of the First Interna-
tional Conference on Principles of Knowledge Representation and Rea-
soning (KR '89), Toronto, 1989.
[Przymusinski, 1991] T. C. Przymusinski. Three-valued non-monotonic
formalizations and semantics of logic programming. Artificial Intelli-
gence, 49, 309-343, 1991.

[Reiter, 1978] R. Reiter. On closed world data bases. In H. Gallaire and


J. Minker, editors, Logic and Data Bases, pp. 55-76. Plenum, New York,
1978.
[Ross, 1989] K. Ross. A procedural semantics for well founded negation in
logic programs. In Proceedings of the Eighth Symposium on Principles
of Database Systems. ACM SIGACT-SIGMOD, 1989.
[Ross and Topor, 1987] K. Ross and R. W. Topor. Inferring negative in-
formation from disjunctive databases. Technical Report 87/1, University
of Melbourne, 1987.
[Sakai and Miyachi, 1983] K. Sakai and T. Miyachi. Incorporating naive
negation into prolog. Technical Report TR-028, ICOT, 1983.
[Sakai and Miyachi, 1986] K. Sakai and T. Miyachi. Incorporating naive
negation into PROLOG. In Proceedings of a conference, Clayton, Victo-
ria, Australia, 3-8 Jan 1984, Vol. 1, pp. 1-12, 1986.
[Sato, 1987] T. Sato. On the consistency of first order logic programs. Tech-
nical Report 87-12, Electrotechnical Laboratory, Ibaraki, Japan, 1987.
[Schmitt, 1986] P. H. Schmitt. Computational aspects of three valued logic.
In Proc. 8th Conference on Automated Deduction, pp. 190-19. Lecture
Notes in Computer Science, 230, Springer-Verlag, 1986.
[Shepherdson, 1984] J. C. Shepherdson. Negation as failure: A comparison
of Clark's completed data base and Reiter's closed world assumption.
Journal of Logic Programming, 1, 51-81, 1984.
[Shepherdson, 1985] J. C. Shepherdson. Negations as failure II. Journal of
Logic Programming, 3, 185-202, 1985.
[Shepherdson, 1988a] J. C. Shepherdson. Language and equality theory
in logic programming. Technical Report PM-88-08, Mathematics Dept.,
Univ. Bristol, 1988.
[Shepherdson, 1988b] J. C. Shepherdson. Negation in logic programming.
In J. Minker, editor, Foundations of Deductive Databases and Logic Pro-
gramming, pp. 19-88. Morgan Kaufmann, Los Altos, CA, 1988.
[Shepherdson, 1989] J. C. Shepherdson. A sound and complete semantics
for a version of negation as failure. Theoretical Computer Science, 65,
343-371, 1989.
[Shepherdson, 1991] J. C. Shepherdson. Unsolvable problems for SLDNF-
resolution. Journal of Logic Programming, 10, 19-22, 1991.
[Shepherdson, 1992a] J. C. Shepherdson. Mints type deductive calculi for
logic programming. Annals of Pure and Applied Logic, 56, 7-17, 1992.
[Shepherdson, 1992b] J. C. Shepherdson. SLDNF-resolution with equality.
Journal of Automated Reasoning, 8, 297-306, 1992.
[Stark, 1991] R. F. Stark. A complete axiomatization of the three-valued
completion of logic programs. Journal of Logic and Computation, 1,
811-834, 1991.

[Stark, 1994] R. F. Stark. Input/output dependencies for normal logic pro-


grams. Journal of Logic and Computation, 4, 249-262, 1994.
[Stark, 1994] R. F. Stark. From logic programs to inductive definitions.
Technical report, CIS, Universitaet Munich, 1994.
[Stickel, 1986] M. E. Stickel. A PROLOG technology theorem prover: Im-
plementation by an extended PROLOG compiler. In Proceedings Eighth
International Conference on Automated Deduction, Springer, pp. 573-
587, 1986.
[Terracini, 1988a] L. Terracini. A complete bi-modal system for a class of
models. Atti dell'Academia delle Scienze di Torino, 122, 116-125,1988.
[Terracini, 1988b] L. Terracini. Modal interpretation for negation by fail-
ure. Atti dell'Academia delle Scienze di Torino, 122, 81-88, 1988.
[Van Gelder, 1988] A. Van Gelder. Negation as failure using tight deriva-
tions for general logic programs. In J. Minker, editor, Foundations of
Deductive Databases and Logic Programming, pp. 149-176. Morgan
Kaufmann Publishers, Los Altos, CA, 1988. Revised version in Journal
of Logic Programming, 6, 109-133, 1989.
[Van Gelder, 1989] A. Van Gelder. Negation as failure using tight deriva-
tions for general logic programs. Journal of Logic Programming, 6, 109-
133, 1989.
[Van Gelder et al., 1988] A. Van Gelder, K. Ross, and J. Schlipf. Unfounded
sets and well-founded semantics for general logic programs. In Pro-
ceedings of the Symposium on Principles of Database Systems, ACM
SIGACT-SIGMOD, 1988.
[Voda, 1986] P. J. Voda. Choices in, and limitations of, logic programming.
In Proc. 3rd International Conference on Logic Programming, pp. 615-
623. Springer, 1986.
[Yahya and Henschen, 1985] A. Yahya and L. Henschen. Deduction in non-
horn databases. Journal of Automated Reasoning, 1, 141-160, 1985.
Meta-Programming in Logic
Programming
P. M. Hill and J. Gallagher

Contents
1 Introduction 422
1.1 Theoretical foundations 423
1.2 Applications 425
1.3 Efficiency improvements 426
1.4 Preliminaries 427
2 The non-ground representation 429
2.1 The representation 431
2.2 Reflective predicates 434
2.3 Meta-programming in Prolog 439
3 The ground representation 440
3.1 The representation 442
3.2 Reflective predicates 448
3.3 The language Gödel and meta-programming 453
4 Self-applicability 459
4.1 Separated meta-programming 460
4.2 Amalgamated meta-programming 461
4.3 Ambivalent logic 467
5 Dynamic meta-programming 468
5.1 Constructing programs 468
5.2 Updating programs 471
5.3 The three wise men problem 473
5.4 Transforming and specializing programs 478
6 Specialization of meta-programs 481
6.1 Logic program specialization 481
6.2 Specialization and compilation 487
6.3 Self-applicable program specializers 488
6.4 Applications of meta-program specialization 489

1 Introduction
A meta-program, regardless of the nature of the programming language,
is a program whose data denotes another (object) program. The impor-
tance of meta-programming can be gauged from its large number of ap-
plications. These include compilers, interpreters, program analysers, and
program transformers. Furthermore, if the object program is a logic or
functional program formalizing some knowledge, then the meta-program
may be regarded as a meta-reasoner for reasoning about this knowledge.
In this chapter, the meta-program is assumed to be a logic program. The
object program does not have to be a logic program although much of the
work in this chapter assumes this.
We have identified three major topics for consideration. These are the
theoretical foundations of meta-programming, the suitability of the alterna-
tive meta-programming techniques for different applications, and methods
for improving the efficiency of meta-programs. As with logic programs
generally, meta-programs have declarative and procedural semantics. The
theoretical study of meta-programming shows that both aspects of the
semantics depend crucially on the manner in which object programs are
represented as data in a meta-program. The second theme of the pa-
per is the problem of designing and choosing appropriate ways of spec-
ifying important meta-programming problems, including dynamic meta-
programming and problems involving self-application. The third theme
concerns efficient implementation of meta-programs. Meta-programming
systems require representations with facilities that minimize the overhead
of interpreting the object program. In addition, efficiency can be gained
by transforming the meta-program, specializing it for the particular object
program it is reasoning about. This chapter, which concentrates on these
aspects of meta-programming, is not intended to be a survey of the field.
A more complete survey of meta-programming for logic programming can
be found in [Barklund, 1995].
Many issues in meta-programming have their roots in problems in logic
which have been studied for several decades. This chapter emphasizes
meta-programming solutions. It is not intended to give a full treatment of
the underlying logical problems, though we try to indicate some connections
to wider topics in meta-logic.
The meta-programs in this chapter are logic programs based on first
order logic. An alternative approach, which we do not describe here, is
to extend logic programming with features based on higher-order logic.
Higher-order logic programming is an ongoing subject of research and is
discussed by Miller and Nadathur [1995]. The higher-order logic program-
ming language λProlog and its derivatives have been shown to be useful for
many meta-programming applications, particularly when the object pro-
gram is a functional program [Miller and Nadathur, 1987], [Hannan and

Miller, 1989], [Hannan & Miller, 1992].


To avoid confusion between a programming language such as Prolog
and the language of an actual program, the programming language will be
referred to as a programming system and the word language will be used to
refer to the set of expressions defined by a specific alphabet together with
rules of construction. Note that the programming system for the object
program does not necessarily have to be the same as the programming
system for the meta-program although this is often the case and most
theoretical work on meta-programming in logic programming makes this
assumption.
1.1 Theoretical foundations
The key to the semantics of a meta-program is the way the object program
expressions are represented in the meta-program. However in logic pro-
gramming there is no clear distinction between the data and the program,
since the data is sometimes encoded as program clauses, and the question
arises as to what kind of image of the object program can be included in the
meta-program. Normally, a representation is given for each symbol of the
object language. This is called a naming relation. Then rules of construc-
tion can be used to define the representation of the constructed terms and
formulas. Each expression in the language of the object program should
have at least one representation as an expression in the language of the
meta-program.
It is straightforward to define a naming relation for the constants, func-
tions, propositions, predicates and connectives in a language. For example,
we can syntactically represent each object symbol s by its primed form s'
or even by the same token s. However, although a meta-program symbol
may be syntactically similar to the object symbol it represents, it may be
in a different syntactic category. Thus a predicate is often represented as
a function and a proposition as a constant. A connective in the object
language may be represented as a connective, predicate, or function in the
meta-program.
The main problem arises in the representation of the variables of the ob-
ject language in the language of the meta-program. There are two options;
either represent the object variables as ground terms or represent them as
variables (or, more generally, non-ground terms). The first is called the
ground representation and the second the non-ground representation. In
logic, variables are normally represented as ground terms. Such a repre-
sentation has been shown to have considerable potential for reasoning about
an object theory. For example, the arithmetization of first order logic, il-
lustrated by the Godel numbering, has been used in first order logic to
prove the well-known completeness and incompleteness theorems. As logic
programming has the full power of a Turing machine, it is clearly possi-
ble to write declarative meta-programs that use the ground representation.
However, ease of programming and the efficiency of the implementation
are key factors in the choice of technique used to solve a problem. So the
support provided by a programming language for meta-programming often
determines which representation should be used. Most logic programming
systems have been based on Prolog and this has only provided explicit
support for the representation of object variables as variables in the lan-
guage of the meta-program. It was clear that there were several semantic
problems with this approach and, as a consequence, for many years the
majority of meta-programs in logic programming had no clear declarative
semantics and their only defined semantics was procedural. The situation
is now better understood and the problem has been addressed in two ways.
One solution is to clarify the semantics of the non-ground representa-
tion. This has been done by a number of researchers and the semantics is
now better understood. The other solution is for the programmer to write
meta-programs that use the ground representation. To save the user the
work of constructing a ground representation, systems such as Reflective
Prolog [Costantini and Lanzarone, 1989] and Godel [Hill and Lloyd, 1994]
that provide a built-in ground representation have been developed. There
are advantages and disadvantages with each of these solutions. It is much
more difficult to provide an efficient implementation when using the ground
representation compared to the non-ground representation. However, the
ground representation is far more expressive than the non-ground repre-
sentation and can be used for many more meta-programming tasks. In
Sections 2 and 3, we discuss the non-ground and ground representations,
respectively, in more detail.
One other issue concerning the syntactic representation is how the the-
ory of the object program is represented in the meta-program. The meta-
program can either include the representation of the object program it is
reasoning about as program statements, or represent the object program
as a term in a goal that is executed in the meta-program. In the first case
the components of the object program are fixed and the meta-program is
specialized for just those object programs that can be constructed from
these components. In the second case, the meta-program can reason about
arbitrary object programs. These can be either fixed or constructed dy-
namically during the derivation procedure.
The syntactic representation of an object program as outlined above
ignores the semantics of object expressions. As an example, consider an
object program that implements the usual rules of arithmetic. In such a
program, 1 < 2 should be a true statement. Assuming a simple syntactic
representation that represents this formula as a term, the truth of 1 < 2
would be lost to the meta-program. Thus it is necessary to have relations in
the meta-program that represent semantic properties of the object program
so that, if, for example, the meta-program needed to reason about the truth
of arithmetic inequalities in the object program, it would require a relation
representing the truth of inequalities such as 1 < 2. More generally, we
take the concept of a reflective principle to cover the representation, in the
meta-language, of 'truth of object formulas' and, also, the representation of
the notions of validity, derivability, and related concepts such as inference
rules and proofs.
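To make this concrete, a minimal Prolog sketch of such a reflective predicate for arithmetic inequalities might look as follows (the predicate name holds/1 and the functor lt/2, standing for <, are our own illustrative choices, not taken from the chapter):

holds(lt(X, Y)) :-
    number(X), number(Y),   % both arguments must denote object-level numbers
    X < Y.

% The goal  ?- holds(lt(1, 2)).  then succeeds, reflecting the object-level
% truth of 1 < 2 into the meta-program.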
In logic programming, a meta-program will normally require a variety
of reflective predicates realising different reflective principles. For exam-
ple, there may be a reflective predicate defining SLD-resolution for (the
representation of) the object program. Other reflective predicates may de-
fine (using a representation of the object program) unification or a single
derivation step. In fact, it is these more basic steps for which support is
often required.
To make meta-programming a practical tool, many useful reflective
predicates (based on a pre-defined representation) are often built into the
programming system. However, a programming system cannot provide all
possible reflective predicates even for a fixed representation. Thus a sys-
tem that supports a representation with built-in reflective predicates must
also provide a means by which a user can define additional reflective pred-
icates appropriate for the particular application. The actual definition of
the reflective predicates depends not only on the reflective principles it
is intended to model but also on the representation. Later in this chap-
ter, we discuss three reflective predicates in detail. One is Solve/2 which
is defined for the non-ground representation and the others are IDemo/3
and JDemo/3 which are defined for the ground representation. Both Pro-
log and Godel provide system predicates that are reflective. For example,
in Prolog, the predicate call/1 succeeds if its only argument represents
the body of a goal and that goal succeeds with the object program. The
Godel system provides the predicate Succeed which has three arguments.
Succeed/3 is true if the third argument represents the answer that would
be computed by the Godel program represented in the first argument using
the goal represented in the second argument.
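As a small illustration (ours, not the chapter's), Prolog's call/1 already behaves as such a reflective predicate over the non-ground representation, here applied to the library predicate member/2:

?- Goal = member(X, [1, 2, 3]), call(Goal).
% X = 1 ;
% X = 2 ;
% X = 3.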

1.2 Applications
We identify two important application requirements in this chapter. One is
for meta-programs that can be applied to (representations of) themselves
and the other is for meta-programs that need to reason about object pro-
grams that can change. We call the first, self-applicable, and the second,
dynamic meta-programming.
There are many programming tools for which self-application is impor-
tant. In particular, interpreters, compilers, program analysers, program
transformers, program debuggers, and program specializers can be usefully
applied to themselves. Self-applicable meta-programming is discussed in
Section 4 and the use of self-applicable program specializers is discussed in
Section 6.
For static meta-programming, where the object program is fixed, the
meta-program can include the representation of the object program it is
reasoning about in its program statements. However, frequently the pur-
pose of the meta-program is to create an object program in a controlled
way. For example, the object program may be a database that must change
with time as new information arrives; the meta-program may be intended
to perform hypothetical reasoning on the object program; or the object
program may consist of a number of components and the meta-program is
intended to reason with a combination of these components. This form of
meta-programming in which the object program is changed or constructed
by the meta-program we call dynamic. Section 5 explains the different
forms of dynamic meta-programming in more detail.
Of course there are many applications which are both self-applicable and
dynamic and for these we need a combination of the ideas discussed in these
sections. One of these is a program that transforms other programs. As the
usual motivation for transforming a program is to make its implementation
more efficient, it is clearly desirable for the program transformer to be self-
applicable. Moreover, the program transformer has to construct a new
object program dynamically, possibly interpreting at different stages of the
transformation (possibly temporary) versions of the object programs. We
describe such an application in Section 6.

1.3 Efficiency improvements


Despite the fact that meta-programming is often intended to implement a
more efficient computation of an object program, there can be a significant
loss of efficiency even when the meta-program just simulates the reasoning
of the object program. This is partly due to the extra syntax that is
required in the representation and partly due to the fact that the compiler
for the object program performs a number of optimizations which are not
included in the meta-program's simulation. One way of addressing this
problem is through program specialization. This approach is explained
in Section 6. The main aim of specialization is to reduce the overhead
associated with manipulating object language expressions. When dealing
with a fixed object theory the overhead can largely be 'compiled away'
by pre-computing, or 'partially evaluating' parts of the meta-program's
computation.
A second reason for considering specialization is that it can establish
a practical link between the ground and the non-ground representations.
In certain circumstances a meta-program that uses a non-ground repre-
sentation can be obtained by partially evaluating one that uses a ground
representation.
Cat(Tom)
Mouse(Jerry)
Chase(x,y) ← Cat(x) ∧ Mouse(y)

Fig. 1. The Chase Program


Member(x, [x|y])
Member(x, [z|y]) ← Member(x,y)
Member(x, Cons(x,y))
Member(x, Cons(z,y)) ← Member(x,y)

Fig. 2. The Member Program

1.4 Preliminaries
The two principal representations, non-ground and ground, discussed in
this chapter are supported by the programming systems, Prolog and Godel,
respectively. Since Prolog and Godel have different syntax, we have, for
uniformity, adopted a syntax similar to Godel. Thus, for example, vari-
ables begin with a lower-case letter, non-variable symbols are either non-
alphabetic or begin with an upper-case letter. The logical connectives are
standard. Thus ∧, ¬, ←, and ∃ denote conjunction, negation, left implica-
tion and existential quantification, respectively. The exception to the use
of this syntax is where we quote from particular programming systems. In
these cases, we adopt the appropriate notation.
Figures 1 and 2 contain two simple examples of logic programs (in
this syntax) that will be used as object programs to illustrate the meta-
programming concepts in later sections. There are two syntactic forms for
the definition of Member in Figure 2. One uses the standard list notation
[...|...] and the other uses the constant Nil and function Cons/2 to construct
a list. For illustrating the representations it is usually more informative
to use the latter form although the former is simpler. The language of
the program in Figure 1 is assumed to be defined using just the symbols
Cat/1, Mouse/1, Chase/2, Tom, and Jerry occurring in the program. The
language of the program in Figure 2 is assumed to include the non-logical
symbols in the program in Figure 1, the predicate Member/2, the function
Cons/2, the constant Nil, together with the natural numbers.
We summarize here the main logic programming concepts required for
this chapter. Our terminology is based on that of [Lloyd, 1987] and the
reader is referred to this book for more details. A logic program contains
a set of program statements. A program statement which is a formula in
first order logic is written as either

H
or
H ← B

where H is an atom and B is an arbitrary formula called the body of the
statement. If B is a conjunction of literals (respectively, atoms), then the
program statement is called a normal (respectively, definite) clause. A
program is normal (respectively, definite) if all its statements are normal
(respectively, definite). A goal for a program is written as
← B
denoting the formula ¬B, where B is an arbitrary formula called the body
of the goal. If B is a conjunction of literals (respectively, atoms), then
the goal is normal (respectively, definite). As in Prolog and Godel, an '_'
is used to denote a unique variable existentially quantified at the front of
the atom in which it occurs. It is assumed that all other free variables are
universally quantified at the front of the statement or goal.
The usual procedural meaning given to a definite logic program is SLD-
resolution. However, this is inadequate to deal with more general types of
program statements. In logic programming, a (ground) negative literal ¬A
in the body of a normal clause or goal is usually implemented by 'negation
as failure':
the goal ¬A succeeds if A fails
the goal ¬A fails if A succeeds.
Using the program as the theory, negation, as defined in classical logic, can-
not provide a semantics for such a procedure. However, it has been shown
that negation as failure is a satisfactory implementation of logical negation
provided it is assumed that the theory of a program is not just the set of
statements in the program, but is its completion [Clark, 1978]. A clear and
detailed account of negation as failure and a definition of the completion
of a normal program is given in [Shepherdson, 1994]. This definition can
easily be extended to programs with arbitrary program statements. More-
over, it is shown in [Lloyd, 1987] that any program can be transformed
to an equivalent normal program. Thus, in this chapter, we only consider
object programs that are normal and, unless otherwise stated, that the
semantics of a logic program is defined by its completion. The completion
of a program P is denoted by comp(P).
In meta-programming, we are concerned, not only with the actual for-
mulas in an object theory but also the language in which these formulas
are written. This language which we call the object language may either be
inferred from the symbols appearing in the object theory, or be explicitly
declared. When the object theory is a Prolog program, then the language
is inferred. However, when the object program is a Godel program the lan-
guage is declared. These declarations also declare the types of the symbols
so that the semantics of the Godel system is based on many-sorted logic.
The following notation is used in this chapter.
1. The symbols £ and M denote languages. Usually £ denotes an object
language while M is the language of some meta-program.
2. The notation E_£ means that E is an expression in a language £. The
subscript is omitted when the language is either obvious or irrelevant.
3. The representation of an expression E in another language M is
written [E]_M. The actual representation intended by this notation
will depend on the context in which it is used. The subscript is
omitted when the language is either obvious or irrelevant.
4. If £ is a language, then the set of representations of expressions of £
in language M is written [£]_M.
We discuss a number of reflective predicates, but the key examples will
realise adaptations of one or both of the following reflective principles. For
all finite sets of sentences A and single sentences B of an object language £,


A ⊢_£ B   iff   Pr ⊢_M Demo([A], [B])

A ⊨_£ B   iff   Pr ⊨_M Demo([A], [B])

where Pr denotes the theory of the meta-program with language M and
Demo/2 denotes the reflective predicate. Such a program is often called a
meta-interpreter.
A meta-program reasons about another (object) program. As this ob-
ject program can itself be another meta-program, a tower of meta-programs
can be constructed. Within such a tower, a meta-program can be assigned
a level. The base level of the tower will be an object program, say P0,
which is not a meta-program. At the next level, a program P1 will be a
meta-program for reasoning about P0. Similarly, if, for each i ∈ {1,..., n},
Pi-1 is an object program for Pi, then Pi is one level above the level of
Pi-1. Normally, we are only concerned with any two consecutive levels and
refer to the relative positions of an object program and its meta-program
within such a tower as the object level and meta-level, respectively.

2 The non-ground representation


The non-ground representation requires the variables in the object program
to be represented by non-ground terms in the meta-program. This section
assumes, as in Prolog, that object variables are represented as variables in
the meta-program. Before discussing the representation in more detail, we
describe its historical background.
In the early work on logic programming it appears that there was
an understanding that the Prolog system being developed should be self-
applicable. That is, it should facilitate the interpretation of Prolog in
Prolog. The first implementation of a programming system based on the
ideas of logic programming was Marseille Prolog [Colmerauer et al., 1973].
This system was designed for natural language processing but it still pro-
vided a limited number of meta-programming features. However, the meta-
programming predicates such as clause, assert, and retract that we are
familiar with in Prolog (and now formally defined by the committee for
the ISO standard for Prolog [ISO/IEC, 1995]) were introduced by D. H. D.
Warren and were first included as part of DEC-10 Prolog [Pereira et al.,
1978]. Moreover, the ability to use the meta-programming facilities of the
Prolog system to interpret Prolog was first demonstrated in [Pereira et al.,
1978] with the following program.
execute(true).
execute((A,B)) :- execute(A), execute(B).
execute(A) :- clause(A,B), execute(B).
execute(A) :- A.
The program shows the ease by which Prolog programs can interpret them-
selves. It was stated that the last clause enabled the interpreter 'to cope
with calls to ordinary Prolog predicates'. The main purpose for this clause
was to enable the interpreter to cope with system predicates. The use of
the variable A instead of a non-variable goal can be avoided by means of
the call predicate which was also provided by DEC-10 Prolog. Thus the
last clause in this interpreter could have been written as
execute(A) :- call(A).
Both the use of a variable for a goal and the provision of the call predicate
corresponded to and was (probably) inspired by the eval function of Lisp.
Clark and McCabe [1979] illustrated how Prolog could be used for
implementing expert systems. They made extensive use of the meta-
programming facility that allows the use of a variable as a formula. Here
the variable also occurred in the head of the clause or in an atom in the
body of the clause to the left of the variable formula. The programmer had
to ensure that the variable was adequately instantiated before the variable
formula was called. For example in:
r(C) :- system(C), C.
either the call to r should have an atomic formula as its argument, or the
predicate system/1 would have to be defined so that it bound C to an
atomic formula. The left to right computation rule of Prolog ensured that
system (C) was called before C.
This meta-variable feature can be explained as a schema for a set of
clauses, one for each relation used in the program. This explanation is
credited to Roussel. Using just this meta-programming feature, Clark and
McCabe showed how a proof tree could be constructed in solving a query
to a specially adapted form of the expert system. A disadvantage of this
approach was that the object program had to be explicitly modified by the
programmer to support the production of the proof as a term.
Shapiro [1982] developed a Prolog interpreter for declarative debugging
that followed the pattern of the execute program above. This was a pro-
gram which was intended for debugging logic programs where the expected
(correct) outcome of queries to a program is supplied by the programmer.
The programmer can then ignore the actual trace of the execution. The
meta-program was based on an interpreter similar to the above program
from [Pereira et al., 1978] but with the addition of an atom system(A) as a
conjunct to the body of the last statement. system/1 is intended to identify
those predicates for which clause/2 is undefined or explicit interpretation
is not required.
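A minimal Prolog sketch of an interpreter along these lines is given below. The clause order differs slightly from the description above: the system/1 test is placed before the clause/2 clause because modern Prologs raise an error when clause/2 is applied to a built-in predicate. The definition of system/1 shown is only an illustrative assumption.

execute(true).
execute((A, B)) :- execute(A), execute(B).
execute(A) :- system(A), !, call(A).     % built-ins are passed to the underlying system
execute(A) :- clause(A, B), execute(B).  % user predicates are interpreted

% system/1 must be supplied by the programmer, for example:
system(_ is _).
system(_ < _).
system(write(_)).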
A major problem with the use of meta-programs for interpreting other
Prolog programs is the loss of efficiency caused by interpreting the ob-
ject programs indirectly through a meta-program. Gallagher [1986] and
Takeuchi and Furukawa [1986] showed that most of the overhead due to
meta-programming in Prolog could be removed by partially evaluating the
meta-program with respect to a specific object program. This important
development and other research concerning program transformations en-
couraged further development of many kinds of meta-programming tools.
For example, Sterling and Beer [1986; 1989] developed tools for transform-
ing a knowledge base together with a collection of meta-programs for inter-
preting an arbitrary knowledge base into a program with the functionality
of all the interpreters specialized for the given knowledge base.
It is clear that many of the meta-programming facilities of Prolog such
as the predicates var, nonvar, assert, and retract do not have a declar-
ative semantics. However, the predicates such as clause and functor
are less problematical and can be interpreted declaratively. In spite of this,
the semantics of these predicates and a logical analysis of simple meta-
programs that used them was not published until the first workshop on
meta-programming in logic in 1988 [Abramson and Rogers, 1989]. Sub-
stantial work has been done since on the semantics of the Prolog style of
meta-programming and some of this will be discussed later in this section.
Since a proper theoretical account of any meta-program depends on the de-
tails of how the object program is represented, in the next subsection
we describe the simple non-ground representation upon which the Prolog
meta-programming provision is based.

2.1 The representation


It can be seen from the above historical notes that the main motivation for a
non-ground representation is its use in Prolog. Hence, the primary interest
is in the case where variables are represented as variables and non-variable
symbols are represented by constants and functions in the language of the
meta-program.
Therefore, in this section, a non-ground representation is presented
where variables are represented as variables and the naming relation for
the non-logical symbols is summarized as follows.
Object symbol Meta symbol
Constant Constant
Function of arity n Function of arity n
Proposition Constant
Predicate of arity n Function of arity n

Distinct symbols (including the variables) in the object language must be
named by distinct symbols in the meta-language. For example, Tom and
Jerry in the Chase program in Figure 1 could be named by constants, say
Tom' and Jerry'. The predicates Cat/1, Mouse/1, and Chase/2 can be
similarly represented by functions, say Cat'/1, Mouse'/1, and Chase'/2.
For a given naming relation, the representations [t] and [A] of a term
t and atom A are defined as follows.
• If t is a variable x, then [t] is the variable x.
• If t is a constant C, then [t] is the constant C', where C' is the name
of C.
• If t is the term F(t1,...,tn), then [t] is F'([t1],...,[tn]), where F'
is the name of F.
• If A is the atom P(t1,...,tn), then [A] is P'([t1],...,[tn]), where
P' is the name of P.
For example, the atom Cat(Tom) is represented (using the above nam-
ing relation) by Cat'(Tom') and Mouse(x) by Mouse'(x). To represent non-
atomic formulas, a representation of the logical connectives is required. The
representation is as follows.
Object connective Meta symbol
Binary connective Function of arity 2
Unary connective Function of arity 1
We assume the following representation for the connectives used in the ex-
amples here.
Object connective Representation
¬ Prefix function Not/1
∧ Infix function And/2
← Infix function If/2

Given a representation of the atomic formulas and the above represen-
tation of the connectives, the term [Q] representing a formula Q is defined
as follows.
• If Q is of the form ¬R, then [Q] is Not [R].
• If Q is of the form R ∧ S, then [Q] is [R] And [S].
• If Q is of the form R ← S, then [Q] is [R] If [S].
Continuing the above example using this naming relation, the formula
Cat(Tom) ∧ Mouse(Jerry)
is represented by the term
Cat'(Tom') And Mouse'(Jerry').
In this example, the name of an object symbol is distinct from the name
of the symbol that represents it. However, there is no requirement that this
should be the case. Thus, the names of the object symbol and the symbol
that represents it can be the same. For example, with this representation,
the atomic formula Cat (Tom) is represented by the term Cat (Tom). This
is the trivial naming relation, used in Prolog. It does not in itself cause
any amalgamation of the object language and meta-language; it is just a
syntactic convenience for the programmer. We adopt this trivial naming
relation together with the above representation of the connectives for the
rest of this section.
A logic program is a set of normal clauses. It is clearly necessary that
if the meta-program is to reason about the object program, there must be
a way of identifying the clauses of a program. In Prolog, each clause in the
program is represented as a fact (that is, a clause with the empty body).
This has the advantage that the variables in the fact are automatically
standardized apart each time the fact is used. We adopt this representation
here. Thus it is assumed that there is a distinguished constant True and
distinguished predicate Clause/2 in the meta-program defined so that each
clause in the object program of the form
h ← b.
is represented in the meta-program as a fact
Clause([h],[b]).
and each fact
h.
is represented in the meta-program as a fact
Clause([h], True).
Thus, in Figure 1,
Chase(x,y) ← Cat(x) ∧ Mouse(y).
is represented by the fact
Clause(Chase(x,y), Cat(x) And Mouse(y)).
The program in Figure 2 would be represented by the two facts.
Clause(Member(x, Cons(x,_)), True).
Clause(Member(x, Cons(_,y)), Member(x, y)).
An issue that has had much attention is whether the facts defining
Clause accurately represent the object program. The problem is a conse-
quence of the fact that, in Prolog, the language is not explicitly defined
but assumed to be determined by the symbols used in the program and
goal. Thus, the variables in the object program range over the terms in
the language of the object program while the variables in the definition
of Clause range, not only over the terms representing terms in the object
program, but also over the terms representing the formulas of the object
program. Thus, in Figure 1, the terms in the object language are just the
two constants Tom and Jerry, while in a meta-program representing this
program, the terms not only include Tom and Jerry but also Cat(Tom),
Mouse(Jerry), Cat(Cat(Tom)) and so on. Thus in the clause
Chase(x,y) ← Cat(x) ∧ Mouse(y).
in the object program, x and y are assumed to be universally quantified in
a domain just containing Tom and Jerry, while in the fact
Clause(Chase(x,y), Cat(x) And Mouse(y)).
in the meta-program, x and y are assumed to be universally quantified in
a domain that contains a representation of all terms and formulas of the
object program.
There is a simple solution: assume that the intended interpreta-
tion of the meta-program is typed. The types distinguish between the terms
that represent terms in the object program and terms that represent the
formulas. This approach has been developed by Hill and Lloyd [1989]. An
alternative solution is to assume an untyped interpretation but restrict the
object program so that its semantics is preserved in the meta-program.
This approach has been explored by Martens and De Schreye [1992a],
[1992b] using the concept of language independence. (Informally, a pro-
gram is language independent when the perfect Herbrand models are not
affected by the addition of new constants and functions.) In the next sub-
section, we examine how each of these approaches may be used to prove the
correctness of the definition Solve with respect to the intended semantics.

2.2 Reflective predicates


By representing variables as variables, the non-ground representation pro-
vides implicit support for the reflection of the semantics of unification. For
example, a meta-program may define a reflective predicate Unify by the
fact
Unify (u,u)
so that the goal
← Unify(
      Member(x, Cons(x, y)),
      Member(1, Cons(z, Cons(2, Cons(3, Nil)))) )
will succeed with
x = 1, y = Cons(2, Cons(3, Nil)), z = 1.
The ability to use the underlying unification mechanism for both the
object program and its representation makes it easy to define reflective
predicates whose semantics in the meta-program correspond to the seman-
tics of the object program.
The meta-interpreter V in Figure 3 assumes that the object program is
a normal logic program and defines the reflective predicate Solve/1. Given

Solve(True)
Solve(a And b) ←
      Solve(a) ∧
      Solve(b)
Solve(Not a) ←
      ¬Solve(a)
Solve(a) ←
      Clause(a, b) ∧
      Solve(b)
Fig. 3. The Vanilla meta-interpreter V

a normal object program P, the program Vp consists of the program V
together with a set of facts defining Clause/2 and representing P. For
example, let P be the program in Figure 2, and ← G the goal
← ¬Member(x, Cons(2, Cons(3, Nil))) ∧
      Member(x, Cons(1, Cons(2, Cons(3, Nil)))).
Then both the goal ← G and the goal ← Solve([G])
← Solve(
      Not Member(x, Cons(2, Cons(3, Nil))) And
      Member(x, Cons(1, Cons(2, Cons(3, Nil)))) )
have the computed answer
x = 1.
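For readers who want to run this, a rough Prolog transcription of Vp for the Member program is sketched below. The names solve/1 and object_clause/2 are our own (the latter avoids a clash with Prolog's built-in clause/2), and Prolog's left-to-right computation rule forces the positive conjunct to be solved before the negative one.

solve(true).
solve((A, B)) :- solve(A), solve(B).
solve(not(A)) :- \+ solve(A).              % negation as failure
solve(A)      :- object_clause(A, B), solve(B).

% Non-ground representation of the Member program of Figure 2.
object_clause(member(X, cons(X, _)), true).
object_clause(member(X, cons(_, Y)), member(X, Y)).

% ?- solve((member(X, cons(1, cons(2, cons(3, nil)))),
%           not(member(X, cons(2, cons(3, nil)))))).
% X = 1.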
The predicate Solve/1 is intended to satisfy the following reflective princi-
ples which are adaptations of those given in Subsection 1.4:

P ⊢_£ Q   iff   Vp ⊢_M Solve([Q])

comp(P) ⊨_£ Q   iff   comp(Vp) ⊨_M Solve([Q])


Here Q is assumed to be a conjunction of ground literals although the
reflective principles can easily be generalized for non-ground formulas. It
has been shown that the relation Solve/1 has the intended semantics in
program Vp if either the interpretation of Vp is typed or the object pro-
gram represented by the definition of Clause/2 is language independent.
Each of these conditions is described in turn.
We first consider using a typed interpretation of Vp. There are (at
least) two types in the interpretation, o and u, where o is intended for
object language terms and u for object language formulas. The represen-
tation for the terms and formulas in the object language together with the
intended types for their interpretation is as follows.
Object symbol           Meta symbol            Type
Constant                Constant               o
Function of arity n     Function of arity n    o * ... * o → o
Proposition             Constant               u
Predicate of arity n    Function of arity n    o * ... * o → u
Binary connective       Function of arity 2    u * u → u
Unary connective        Function of arity 1    u → u

In addition, the predicates Solve/1 and Clause/2 have each of their argu-
ments of type u. The following result is proved in [Hill and Lloyd, 1989].
Theorem 2.2.1. Let P be a normal program and ← Q a normal goal. Let
Vp be the program defined above. Then the following hold:
1. comp(P) is consistent iff comp(Vp) is consistent.
2. θ is a correct answer for comp(P) ∪ {← Q} iff θ is a correct answer
for comp(Vp) ∪ {← Solve(Q)}.
3. ¬Q is a logical consequence of comp(P) iff ¬Solve(Q) is a logical
consequence of comp(Vp).
Theorem 2.2.2. Let P be a normal program and ← Q a normal goal. Let
Vp be the program defined above. Then the following hold:
1. θ is a computed answer for P ∪ {← Q} iff θ is a computed answer for
Vp ∪ {← Solve(Q)}.
2. P ∪ {← Q} has a finitely failed SLDNF-tree iff Vp ∪ {← Solve(Q)}
has a finitely failed SLDNF-tree.
It is important to note here that, although the declarative semantics
of Vp requires a model which is typed, the procedural semantics for the
program P and for Vp is the same and that, apart from checking that the
given goal is correctly typed, no run-time type checking is involved.
Martens and De Schreye have provided an alternative solution to the
semantics of Vp that does not require the use of types. We now give a
summary of the semantics that they have proposed. Their work requires
the object programs to be stratified and satisfy the condition of language
independence.
Definition 2.2.3. Let P be a program. A language for P¹ is any language
C such that each clause in P is a well-formed expression in the language £.
Let £p be the language defined using just the symbols occurring in P.
Then any language for P will be an extension of £p.
Definition 2.2.4. Let P be a stratified program. Then P is said to be
language independent if the perfect Herbrand model of P is independent of
the choice of language for P.
Language independence is an undecidable property. Thus it is impor-
tant to find a subclass of the language independent programs that can
be recognized syntactically. The following well-known concept of range
restriction determines such a class.
Definition 2.2.5. A clause in a program P is range restricted if every
variable in the clause appears in a positive literal in the body of the clause.
A program is range restricted if all its clauses are range restricted.
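For instance (an illustration added here, in Prolog syntax, not an example from the text):

% Range restricted: every variable of the clause occurs in a positive
% body literal.
connected(X, Y) :- edge(X, Z), connected(Z, Y).
edge(a, b).

% Not range restricted: the head variable X occurs in no body literal.
p(X) :- q.
q.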
It is shown in [Martens and De Schreye, 1992a] that the set of range
restricted stratified programs is a proper subset of the language indepen-
dent programs. It is also demonstrated that, if P is a definite program and
G a definite goal, P is language independent if and only if all computed
answers (using SLD-resolution) for P ∪ {G} are ground.
The correctness of the Vp program is stated in terms of the perfect
model for the object program and the weakly perfect model for the meta-
program. It is shown that if the object program P is stratified, then the
program Vp has a weakly perfect model. Note that a stratified program
always has a perfect model.²
We can now state the main result.
Theorem 2.2.6. Let P be a stratified, language independent, normal pro-
gram and Vp the meta-program, as defined above. Then, for every predicate
p/n occurring in P the following hold.
1. If t1,...,tn are ground terms in the language defined by the symbols
occurring in Vp, then Solve(p(t1,...,tn)) is true in the weakly perfect
Herbrand model of Vp iff p(t1,...,tn) is true in the perfect Herbrand
model of P.
2. If t1,...,tn are ground terms in the language defined by the symbols
occurring in P, then Solve(Not p(t1,...,tn)) is true in the weakly
¹ Note that this definition of a language of a program just applies to the discussion
of Martens and De Schreye's results and does not hold in the rest of this chapter.
² A definition of a weakly perfect model is in [Martens and De Schreye, 1992b]. A
definition of a stratified program can be found in [Lloyd, 1987] as well as in [Martens
and De Schreye, 1992b].
perfect Herbrand model of Vp iff p(t1,...,tn) is not true in the perfect
Herbrand model of P.

The main issue distinguishing the typed and language independent ap-
proaches is the criterion that is used in determining the language of a
program. Either the language of a program is inferred from the symbols it
contains or the language can be defined explicitly by declaring the symbols.
As the language of the program Vp must include a representation of all
the symbols of the object program, it is clear that if we require the lan-
guage of a program to be explicitly declared, the meta-program will need
to distinguish between the terms that represent object terms and those
that represent object formulas. This of course leads naturally to a typed
interpretation. On the other hand, if the language of a program is fixed by
the symbols actually appearing in that program, then we need to ensure
the interpretation is unchanged when the program is extended with new
symbols. This leads us to look at the concept of language independence
which is the basis of the second approach.
The advantage of using a typed interpretation is that no extra condi-
tions are placed on the object program. The usual procedural semantics
for logic programming can be used for both typed and untyped programs.
Thus the type information can be omitted from the program although it
seems desirable that the intended types of the symbols be indicated at least
as a comment to the program code. A possible disadvantage is that we must
use many-sorted logic to explain the semantics of the meta-program instead
of the better known unsorted logic. The alternative of coding the type in-
formation explicitly as part of the program is not desirable since this would
create an unnecessary overhead and adversely affect the efficiency of the
meta-interpreter.
The advantage of the language independence approach is that for def-
inite language independent object programs, the semantics of the V pro-
gram is based on that of unsorted first order logic. However, many common
programs such as the program in Figure 2 are not language independent.
Moreover, as soon as we allow negation in the bodies of clauses, any ad-
vantage is lost. Not only do we still require the language independence
condition for the object program, but we can only compare the weakly
perfect model of the meta-program with the perfect model of the object
program.
We conclude this subsection by presenting in Figure 4 an extended form
of the program V in Figure 3. This program, which is a typical example of
what the non-ground interpretation can be used for is adapted from [Ster-
ling and Shapiro, 1986]. The program W defines a predicate PSolve/2. The
first argument of PSolve/2 corresponds to the single argument of Solve but
the second argument must be bound to the program's proof of the first
argument.
PSolve(True, True)
PSolve(x And y, xproof And yproof) ←
      PSolve(x, xproof) ∧
      PSolve(y, yproof)
PSolve(Not x, True) ←
      ¬PSolve(x, _)
PSolve(x, x If yproof) ←
      Clause(x, y) ∧
      PSolve(y, yproof)
Fig. 4. The Proof-Tree meta-interpreter W

Given a normal object program P, the program Wp consists of the
program W together with a set of facts defining Clause/2 and representing
P. If P is the program in Figure 2, then the goal
← PSolve(
      Not Member(x, Cons(2, Cons(3, Nil))) And
      Member(x, Cons(1, Cons(2, Cons(3, Nil)))),
      proof)
has the computed answer
x = 1
proof = True And (Member(1, Cons(1, Cons(2, Cons(3, Nil)))) If True).
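Again as an informal companion (ours, not the chapter's), W can be transcribed into Prolog along the same lines as the solve/1 sketch above, reusing the assumed object_clause/2 representation and an assumed functor proved/2 in place of If:

psolve(true, true).
psolve((A, B), (ProofA, ProofB)) :-
    psolve(A, ProofA),
    psolve(B, ProofB).
psolve(not(A), true) :-
    \+ psolve(A, _).
psolve(A, proved(A, Proof)) :-
    object_clause(A, B),
    psolve(B, Proof).

% ?- psolve(member(X, cons(1, cons(2, nil))), Proof).
% X = 1, Proof = proved(member(1, cons(1, cons(2, nil))), true).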

2.3 Meta-programming in Prolog


The language Prolog has a number of built-in predicates that are useful for
meta-programming. These can be divided into two categories: those with
a declarative semantics and those whose semantics is purely procedural.
The first category includes functor/3, arg/3, and =../2. functor is
true if the first argument is a term, the second is the name of the top-
level function for this term and the third is its arity. arg is true if the
first argument is a positive integer i, the second, of the form f(t1,...,tn)
(n ≥ i), and the third, the ith argument ti. For a Prolog meta-program
containing the non-ground representation of the Chase program in Figure 1,
the definition of the system predicates functor and arg would be:
functor(tom, tom, 0).
functor(jerry, jerry, 0).
functor(cat(_), cat, 1).
functor(mouse(_), mouse, 1).
functor(chase(_,_), chase, 2).

arg(1,cat(X),X).
arg(1,mouse(X),X).
arg(1,chase(X,Y),X).
arg(2, chase(X,Y),Y).
The predicate =.. is a binary infix predicate which is true when the left
hand argument is a term of the form f or f(t1,...,tn) and the right hand
argument is the list [f,t1,...,tn]. This predicate is not really necessary
and can be defined in terms of functor and arg.
Term =.. [Function|Args] :-
    functor(Term, Function, Arity),
    findargs(1, Arity, Term, Args).

% findargs(N, Arity, Term, Args): Args is the list of the Nth to the
% Arity-th arguments of Term, collected in ascending order.
findargs(N, Arity, _, []) :-
    N > Arity.
findargs(N, Arity, Term, [Arg|Args]) :-
    N =< Arity,
    arg(N, Term, Arg),
    N1 is N + 1,
    findargs(N1, Arity, Term, Args).
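The behaviour these definitions are meant to capture can be checked against the built-in versions in any standard Prolog, for example on a term of the Chase program:

?- functor(chase(tom, jerry), F, N).
% F = chase, N = 2.
?- arg(1, chase(tom, jerry), A).
% A = tom.
?- chase(tom, jerry) =.. L.
% L = [chase, tom, jerry].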
The second category includes meta-logical predicates such as var/1,
nonvar/1, and atomic/1 and the dynamic predicates such as assert/1
and retract/1.
The predicate var/1 tests whether its argument is currently uninstan-
tiated, while nonvar/1 is the opposite of var and tests whether its only
argument is currently instantiated, atomic/1 succeeds if its argument is
currently instantiated to a constant. These predicates are not declarative
since computed answers are not correct answers. Consider the following
Prolog goals.
?- var(X)
?- var(3)
The first goal succeeds with no binding, suggesting that var (X) is true for
all instances of X, while the second goal fails.
The predicates assert/1 and retract/1 allow modification of the pro-
gram being interpreted. On execution of assert(t), provided t is in the
correct syntactical form for a clause, the current instance of t is added as
a clause to the program. retract is the opposite of assert. On execu-
tion of retract(t), if there is a clause in the current program that unifies
with t, then the first such clause will be removed from the program. Since
these modify the program being executed and their effect is not undone on
backtracking, it is clear that they do not have a declarative semantics.

3 The ground representation


In this section we review the work that has been done in meta-programming
using the ground representation; that is, where any expression of the object
program is represented by a ground expression in the meta-program. As
in the previous section, we begin by outlining the historical background to
the use of this representation.
Use of a ground representation in logic can be traced back to the work
of Godel, who gave a method (called a Godel numbering) for representing
expressions in a first order language as natural numbers [Godel, 1931].
This was defined by giving each symbol of the logic a unique positive odd
number, and then, using the sequence of symbols in a logical expression,
a method for computing a unique number for that expression. It was not
only possible to compute the representation of an expression, but also, given
such a number, determine the unique expression that it represented. Using
the properties of the natural numbers, this representation has been used in
proving properties of the logic. Feferman [1962], who applied Godel's ideas
to meta-mathematics, introduced the concept of a reflective principle.
Weyhrauch [1980] designed a proof checker, called FOL, which has spe-
cial support for expressing properties of a FOL structure (using a ground
representation) in a FOL meta-theory. An important concept in FOL is
the simulation structure. This is used in the meta-reasoning part of FOL
for reflective axioms that define an intended interpretation of the represen-
tation of the object language in terms of other theories already defined in
FOL (or in LISP). A FOL meta-theory has, as simulation structure, the
object theory and the mappings between the language of the object theory
and the language of the meta-theory. Weyhrauch generalized the reflection
principle (as defined by Feferman) to be defined simply as a statement of a
relation between a theory and its meta-theory. He gives, as an example, a
statement of the correspondence between provability of a formula f in the
object theory and a predicate Pr/2 in the metatheory. Pr/2 is intended
to be true if the first argument represents the proof of f and the second
argument represents f.
Bowen and Kowalski [1982] showed how the idea of using a representa-
tion similar to that used in FOL could be adapted for a logic programming
system. The actual representation is not specified although a simple syn-
tactic scheme is used in the paper whereby a symbol "P(x, Bill)" is used to
denote a term of the meta-language which names P(x, Bill). In addition,
where a symbol, say A, is used to denote a term or formula of the object
language, A' denotes the term in the meta-language representing A. In
fact, the key point of this paper was not to give a detailed scheme for the
representation of expressions in the object language, but to represent the
provability relation ⊢_£ for the object program with language £ by means
of a predicate Demo/2 in the meta-program with language M. Thus, in
the context of a set of sentences Pr of M, Demo represents ⊢_£ if and only
if, for all finite sets of sentences A and single sentences B of £,

A ⊢_£ B   iff   Pr ⊢_M Demo([A], [B])

It was not intended that ¬Demo in Pr should represent unprovability since
provability is semi-decidable.
The ideas of Bowen and Kowalski were demonstrated by Bowen and
Weinberg [1985] using a logic programming system called MetaProlog,
which extended Prolog with special meta-programming facilities. These
consisted of three system predicates, demo/3, add_to/3, and drop_from/3.
The first of these corresponded to the Demo predicate described by Bowen
and Kowalski. The predicate demo/3 defines a relation between the rep-
resentations of an object program P, a goal G, and a proof R. demo is
intended to be true when G is a logical consequence of P and a proof of
this is given by R. Predicates add_to/3 and drop_from/3 are relations
between the representations of a program P, a clause C, and another pro-
gram Q. add_to/3 and drop_from/3 are intended to be true when Q can
be obtained from P by, respectively, adding or deleting the clause C. These
predicates provided only the basic support for meta-programming and, to
perform other meta-programming tasks, a meta-program has to use the
non-declarative predicates in the underlying Prolog. Thus, although the
theoretical work is based on a ground representation, meta-programming
applications implemented in MetaProlog are often forced to use the non-
ground representation.
Apart from the work of Bowen and Weinberg, meta-programming us-
ing the ground representation remained a mainly theoretical experiment
for a number of years. Stimulation for furthering the research on this sub-
ject was brought about by the initiation in 1988 of a biennial series of
workshops on meta-programming in logic. A paper [Hill and Lloyd, 1989]
in the 1988 workshop built upon the ideas of Bowen and Kowalski and
described a more general framework for defining and using the ground rep-
resentation. In this representation, object clauses were represented as facts
in the meta-program and did not allow for changing the object program.
Subsequent work modified this approach so as to allow for more dynamic
meta-programming [Hill and Lloyd, 1988]. Godel is a programming system
based on these ideas [Hill and Lloyd, 1994]. This system has considerable
specialist support for meta-programming using a ground representation.
Those aspects of Godel, pertinent to the meta-programming facilities, are
described in Subsection 3.3.

3.1 The representation


In Section 2, a simple scheme for representing a first order language based
on that employed by Prolog was described. However, for most meta-
programs such a representation is inadequate. Meta-programs often have to
reason about the computational behaviour of the object program and, for
this, only the ground representation is suitable. A program can be viewed
from many angles; for example, the language, the order of statements, the
modular structure. Meta-programs may need to reason about any of these.
In this subsection, we first consider the components that might constitute
a program and how they may be represented.
In order to discuss the details of how components of an object program
may be represented as terms in a meta-program, we need to understand
the structure of the object program that is to be represented. Object
programs are normally parsed at a number of structural levels, ranging
from the individual characters to the modules (if the program is modular)
that combine together to form the complete program. We assume here that
an object program has some of the following structural levels.
1. Character
2. Symbol
3. Language element
4. Statement or declaration
5. Module
6. Program
For example, for a normal logic program (with a modular structure) these
would correspond to the following.
1. Alphabetic and non-alphabetic character
2. Constant, function, proposition, predicate, connective, and variable
3. Term and formula
4. Clause
5. Set of predicate definitions
6. Logic program
Note that levels 1, 2, and 3 contribute to the language of a program, while
levels 4, 5, and 6 are required for the program's theory. At each structural
level a number of different kinds of tree structure (called here a unit) are
defined. That is, a unit at a structural level other than the lowest will be a
tree whose nodes are units defined at lower levels. At the lowest structural
level, the units are the pre-defined characters allowed by the underlying
programming system. Thus a subset of the trees whose nodes are units
defined at or below a certain structural level will form the set of units at
the next higher level. For example, in the program in Figure 2, M, (, and )
are characters, Member and x, are symbols, and Member(x, Cons(x,y)) is
a formula. Note that, as a tree may consist of just a single unit, some units
may occur at more than one level. Thus the character x is also a variable
as well as a term of the language. A representation will be defined for one
or more of these structural levels of the object program. Moreover, it is
usual for the representation to distinguish between these levels, so that, for
example, a constant such as A may have three representations depending
on whether it is being viewed as a character, constant, or term. Note that
this provision of more than one representation for the different structural
levels of a language is usual in logic. In particular, the arithmetization of
first order logic given by the Godel numbering has a different number for
a constant when viewed as a symbol to that when it is viewed as a term.
The actual structural levels that may be represented depend on the
tasks the meta-program wishes to perform. In particular, if we wish to
define a reflective predicate that has an intended interpretation of SLD-
resolution, we need to define unification. For this, a representation of the
symbols is required. If we wish to be able to change the object program
using new symbols created dynamically, then a representation of the char-
acters would be needed.
A representation is in fact a coding and, as is required for any cipher, a
representation should be invertible (injective) so that results at the meta-
level can be interpreted at the object-level³. Such an inverse mapping
we call here a re-representation. To ensure that not only the syntax and
structural level of the object element can be recovered but also the kind
of unit at that level (for example, if it is a language element in a logic
program, whether it is a term, atom, or formula) that is represented, the
representation has to include this information as part of the coding.
We give, as an example, a simple ground representation for a normal
program. This is used in the next subsection by the Instance-Demo pro-
gram in Figure 5 and the SLD-Demo program in Figure 6. First we de-
fine the naming relation for each symbol in the language and then define
the representation for the terms and formulas (using just the connectives
A, <—, -i) that can be constructed from them. Individual characters are not
represented. For this representation, it is assumed that the language of
type u for terms representing object formulas. It is also assumed that the
language includes the type List(a) for any type a together with the usual
list constructor Cons and constant Nil. The language must also include the
functions Term/2, Atom/2, V/1, C/1, F/1, P/1, And/2, If/2, and Not/1
whose intended types are as follows.

³ There is, of course, an alternative explanation of why a representation must be
injective: it is the inverse of denotation. Since denotation must be functional (any term
has (at most) one denotation), the inverse of denotation must be injective.
Function   Type
Term       s * List(o) → o
Atom       s * List(o) → u
V          Integer → o
C          Integer → s
F          Integer → s
P          Integer → s
And        u * u → u
If         u * u → u
Not        u → u

The representation of the variables, non-logical symbols, and connec-
tives is as follows.
Object symbol   Representation
Variable        Term V(n), n ∈ Integer
Constant        Term C(i), i ∈ Integer
Function        Term F(j), j ∈ Integer
Proposition/
Predicate       Term P(k), k ∈ Integer
∧               Function And/2
←               Function If/2
¬               Function Not/1

To give the representation of the terms and formulas, suppose D is a
constant represented by C(i), G/n a function represented by F(j), and
Q/n a predicate represented by P(k).
Object expression        Representation
Constant D               Term Term(C(i), [])
Term G(t1,...,tn)        Term Term(F(j), [[t1],...,[tn]])
Atom Q(t1,...,tn)        Term Term(P(k), [[t1],...,[tn]])
Formula A ∧ B            Term And([A], [B])
Formula ¬A               Term Not([A])
Formula A ← B            Term If([A], [B])

As an example, consider the language of the program in Figure 2.


Object symbol   Representation
x               V(0)
y               V(1)
z               V(2)
Nil             C(0)
i, i > 0        C(i)
i, i < 0        C(−i − 1)
Thus the atom
Member(x, [1,2])
is represented by the term
Atom( P(0),
      [ V(0),
        Term( F(0),
              [ Term(C(1), []),
                Term(F(0), [Term(C(2), []), Term(C(0), [])] ) ] )
      ] ).
The representation of a program clause which is a fact


H
is
If([H], True)
The representation of a program clause
H ← B
is
If([H], [B])
which is the same as the formula representation of H ← B.


The theory of a program is represented as a list of clauses. Using this
representation, the Member program is represented by the following list:
[ If(Atom(P(0), [V(0), Term(F(0), [V(0), V(1)])]), True),
  If(Atom(P(0), [V(0), Term(F(0), [V(1), V(2)])]),
     Atom(P(0), [V(0), V(2)])) ]
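As an aside (our own sketch, not part of the chapter), the same ground representation can be written directly as a Prolog term; the Godel-style names Term, Atom, V, C, F, P and If are lowercased here so that they are Prolog functors rather than variables, and member/2 is the usual library predicate.

% Ground representation of the Member program as a Prolog term.
member_program(
  [ if(atom(p(0), [v(0), term(f(0), [v(0), v(1)])]), true),
    if(atom(p(0), [v(0), term(f(0), [v(1), v(2)])]),
       atom(p(0), [v(0), v(2)])) ]).

% A meta-program can now inspect the object program by ordinary term
% manipulation, e.g. collecting the predicate symbols it defines:
defined_predicate(P) :-
    member_program(Clauses),
    member(if(atom(p(P), _), _), Clauses).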
This representation is defined for any normal program and is used
for many of the examples in this chapter. However, the main limitation
with this representation is that there is no representation of the characters
making up the object symbols. Thus this representation does not facilitate
the generation of new object languages. In addition, it has been assumed
that the object program has no module structure. By representing a module
Meta-Programming in Logic Programming 447

as a list of clauses and a program as a list of modules, a meta-program could


reason about the structure of a modular program.
It can be seen that a representation such as the one described above,
which encodes structural information, is difficult to use and prone to user
errors. An alternative representation is a string. A string, which is a finite
sequence of characters, is usually indicated by enclosing the sequence in
quotation marks. For example, "ABC" denotes the string A,B,C. With
the string representation, the atom
Member(x, [1, 2])
is represented as
"Member(x, [1, 2])"
If a string concatenation function ++ is available, then unspecified compo-
nents can be expressed using variables in the meta-program. For example
"Member(" ++ x ++ ",[1,2])"
defines a term that corresponds, using the previous representation, to the
(non-ground) term
    Atom(P(0),
         [ x,
           Term(F(0),
                [ Term(C(1), []),
                  Term(F(0), [Term(C(2), []), Term(C(0), [])]) ]) ])
As the string representation carries no structural information, this rep-
resentation is computationally very inefficient. Thus, it is desirable for a
representation such as a string as well as one that is more structurally
descriptive to be available. The meta-programming system should then
provide a means of converting from one representation to the other.
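As a small illustration of the cost of the string form, the following hedged sketch (using SWI-Prolog's sub_string/5; the predicate name is ours) shows that even recovering the predicate symbol of a represented atom requires character-level parsing, whereas in the structured representation it is immediately available as the first argument of Atom.

    % predicate_symbol_of("Member(x,[1,2])", Name) gives Name = "Member".
    predicate_symbol_of(StringRep, Name) :-
        sub_string(StringRep, Before, 1, _, "("),   % leftmost "("
        !,
        sub_string(StringRep, 0, Before, _, Name).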
Most researchers on meta-programming define a specific representation
suitable for their purposes without explaining why the particular repre-
sentation was chosen. However, Van Harmelen [1992] has considered the
many different ways in which a representation may be defined and how this
may assist in reasoning about the object theory. We conclude this subsec-
tion with interesting examples of non-standard representations based on
examples in his paper.
In the first example, the object theory encodes graphs as facts of the
form Edge(Ni,Nj) together with a rule defining the transitive closure.
    Connected(n1, n2) ← Edge(n1, n3) ∧ Connected(n3, n2)
The representation assigns different terms to the object formulas, depend-
ing on their degree of instantiation.
Object formula          Representation
Edge(N15, x)            GroundVar(Edge, N15, V(1))
Connected(N15, x)       GroundVar(Connected, N15, V(1))
Edge(x, N15)            VarGround(Edge, V(1), N15)
Connected(x, N15)       VarGround(Connected, V(1), N15)
Edge(x, y)              VarVar(Edge, V(1), V(2))
Connected(N15, N16)     GroundGround(Connected, N15, N16)
With this representation, we can define the atoms that may be selected by
means of a predicate Selectable in the meta-program.
    Selectable(GroundGround(_, _, _))
    Selectable(GroundVar(_, _, _))
    Selectable(VarGround(Edge, _, _))
Moreover, with this representation, a meta-theory could be constructed
containing the control assertion that the conjunction defining Connected
should be executed from left to right if the first argument of Connected is
given, but from right to left if the second argument is given.
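Such a control assertion could be written, for instance, as follows; this is only a hedged Prolog sketch, with the predicate execution_order/2 and the lower-case functors being our own names rather than part of the representation above.

    % Run the body of the Connected clause left to right when the first
    % argument is given, and right to left when the second is given.
    execution_order(groundvar(connected, _, _), left_to_right).
    execution_order(varground(connected, _, _), right_to_left).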
The second example is taken from the field of knowledge-based systems
and concerns the representation of the difference between implication when
used to mean a causation and when used as a specialization. Such implica-
tions may require different inference procedures. The representation could
be as follows.4
Object formula             Representation
AcuteMenin → Menin         TypeOf(AcuteMenin, Menin)
Meningococcus → Menin      Causes(Meningococcus, Menin)
The meta-program can then specify the inference steps that would be ap-
propriate for each of these relations.
Although the idea of being able to define a representation tailored for
a particular application is attractive, no programming system has been
implemented that provides support for this.

3.2 Reflective predicates


With the ground representation, the reflective predicates have to be explicitly
defined. For example, a predicate GroundUnify/2 that is intended
to be true if some instance of its arguments represent identical terms in
the object language, must be defined so that its completed definition corresponds
to the Clark equality theory. Such a definition requires a full
analysis of the terms representing object terms and atoms. This is computationally
expensive compared with using Unify/2 to unify expressions in
the non-ground representation (see Subsection 2.2).
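To indicate what such an explicit definition involves, here is a hedged Prolog sketch of unification over a ground representation of terms (v(N) for object variables, term(F, Args) otherwise), accumulating bindings as a list of bind(N, T) pairs. It is not the chapter's GroundUnify/2: the predicate names are ours, the occur check is omitted, and the result is a "triangular" substitution in which bound terms may themselves contain bound variables.

    % unify(X, Y, S0, S): unify two represented terms under substitution S0.
    unify(X, Y, S0, S) :-
        deref(X, S0, X1),
        deref(Y, S0, Y1),
        unify_(X1, Y1, S0, S).

    % deref(T, S, T1): follow the bindings of a represented variable.
    deref(v(N), S, T) :-
        member(bind(N, T0), S), !,
        deref(T0, S, T).
    deref(T, _, T).

    unify_(v(N), v(N), S, S) :- !.
    unify_(v(N), T, S, [bind(N, T)|S]) :- !.
    unify_(T, v(N), S, [bind(N, T)|S]) :- !.
    unify_(term(F, Xs), term(F, Ys), S0, S) :-
        unify_args(Xs, Ys, S0, S).

    unify_args([], [], S, S).
    unify_args([X|Xs], [Y|Ys], S0, S) :-
        unify(X, Y, S0, S1),
        unify_args(Xs, Ys, S1, S).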
4
Menin and AcuteMenin are abbreviations of Meningitis and Acute Meningitis,
respectively.
There are two basic styles of meta-interpreter that use the ground rep-
resentation and have been discussed in the literature. The first style is
derived from an idea proposed in [Kowalski, 1990] and has a similar form
to the program V in Figure 3 but uses the ground rather than the non-
ground representation. In this subsection, we present, in Figure 5, an
interpreter I based on this proposal. This style of meta-interpreter and
similar meta-programs are being used in several programs although a com-
plete version and a discussion of its semantics has not previously been
published. In the second style the procedural semantics of the object pro-
gram is intended to be a model of the meta-interpreter. For example, the
meta-interpreter outlined in [Bowen and Kowalski, 1982] is intended to
define SLD-resolution. An extension of this meta-interpreter for SLDNF-
resolution is shown in [Hill and Lloyd, 1989] to be correct with respect to
its intended interpretation. An outline of such a meta-interpreter J is given
in Figure 6. The Godel program SLD-Demo, also based on this style, is
presented in the next subsection in Figure 9.
Both the programs I and J make use of the ground representation de-
fined in the previous subsection. These programs require an additional
type a to be used for the bindings in a substitution. The function Bind/2
is used to construct such a binding. This has domain type Integer * o and
range type a.
The meta-interpreter I in Figure 5 defines the predicate IDemo/3. This
program is intended to satisfy the reflective principles
    P ⊢_L Qθ   iff   I ⊢_M IDemo([P], [Q], [Qθ])
    comp(P) ⊨_L Qθ   iff   comp(I) ⊨_M IDemo([P], [Q], [Qθ]).
Here, Q is a conjunction of literals and θ a substitution that grounds Q.
These reflective principles, which are adaptations of those given in Sub-
section 1.4, ensure that provability and logical consequence for the object
program are defined in the meta-program.
The types of the predicates in Program I are as follows.
Predicate      Type
IDemo          List(u) * u * u
IDemo1         List(u) * u
InstanceOf     u * u
InstFormula    u * u * List(a) * List(a)
InstTerm       o * o * List(a) * List(a)
InstArgs       List(o) * List(o) * List(a) * List(a)

The above reflective principles intended for program I are similar to
those intended for the meta-interpreter Vp. Apart from the representation
of the object variables, the main difference between these programs is that
Vp includes the representation of the object program P. Thus, to make it
IDemo(p, x, y) ←
    InstanceOf(x, y) ∧
    IDemo1(p, y)

IDemo1(_, True)
IDemo1(p, And(x, y)) ←
    IDemo1(p, x) ∧
    IDemo1(p, y)
IDemo1(p, Not(x)) ←
    ¬IDemo1(p, x)
IDemo1(p, Atom(q, xs)) ←
    Member(z, p) ∧ InstanceOf(z, If(Atom(q, xs), b)) ∧
    IDemo1(p, b)

InstanceOf(x, y) ← InstFormula(x, y, [], _)

InstFormula(Atom(q, xs), Atom(q, ys), s, s1) ←
    InstArgs(xs, ys, s, s1)
InstFormula(And(x, y), And(z, w), s, s2) ←
    InstFormula(x, z, s, s1) ∧
    InstFormula(y, w, s1, s2)
InstFormula(If(x, y), If(z, w), s, s2) ←
    InstFormula(x, z, s, s1) ∧
    InstFormula(y, w, s1, s2)
InstFormula(Not(x), Not(z), s, s1) ←
    InstFormula(x, z, s, s1)
InstFormula(True, True, s, s)

InstTerm(V(n), x, [], [Bind(n, x)])
InstTerm(V(n), x, [Bind(n, x)|s], [Bind(n, x)|s])
InstTerm(V(n), x, [Bind(m, y)|s], [Bind(m, y)|s1]) ←
    n ≠ m ∧
    InstTerm(V(n), x, s, s1)
InstTerm(Term(f, xs), Term(f, ys), s, s1) ←
    InstArgs(xs, ys, s, s1)

InstArgs([], [], s, s)
InstArgs([x|xs], [y|ys], s, s2) ←
    InstTerm(x, y, s, s1) ∧
    InstArgs(xs, ys, s1, s2)

Fig. 5. The Instance-Demo program I


easier to compare the programs V and I, the first clause of I needs to be
replaced by
    IDemo(p, x, y) ←
        ObjectProgram(p) ∧
        InstanceOf(x, y) ∧
        IDemo1(p, y)
and then define the program Ip to consist of this modified form of I together
with a unit clause
    ObjectProgram([P])
We expect that similar results to those of Theorems 2.2.1 and 2.2.2 will
then apply to Ip. Further research is needed to clarify the semantics of the
program I in Figure 5 and the variation Ip described above.
The meta-interpreter J in Figure 6 defines the predicate JDemo/3. This
program is intended to satisfy the reflective principle

    P ⊢_L Qθ   iff   comp(J) ⊨_M JDemo([P], [Q], [Qθ]).

Here, Q denotes a conjunction of literals and θ a substitution that grounds
Q. The types of the predicates in J are as follows.
Predicate       Type
JDemo           List(u) * u * u
Derivation      List(u) * u * u * List(a) * List(a) * Integer
Resolve         u * u * u * List(a) * List(a) * u * Integer * Integer
MaxForm         u * Integer
SelectLit       u * u
Ground          u
ReplaceConj     u * u * u * u
Rename          Integer * u * u * Integer
UnifyTerms      List(o) * List(o) * List(a) * List(a)
ApplyToForm     List(a) * u * u

It is intended that the first argument of Derivation/6 represents a normal
program P, the second and third represent conjunctions of literals Q
and R, the fourth and fifth represent substitutions θ and σ, and the sixth
is an index n for renaming variables. Derivation/6 is true if there is an
SLD-derivation of P ∪ {← Qθ} ending in the goal ← Rσ. The predicate
MaxForm/2 finds the maximum of the variable indices (that is, n in V(n))
in the representation of a formula; SelectLit/2 selects a literal from the
body of a clause; Ground/1 checks that a formula is ground; Rename/4
finds a variant of the formula in the second argument by adding the number
in the first argument to the index of each of its variables and setting
the last argument to the maximum of the new variable indices; Resolve
performs a single derivation step; ReplaceConj/4 removes an element from
JDemo(p, x, y) ←
    MaxForm(x, n) ∧
    Derivation(p, x, True, [], s, n) ∧
    ApplyToForm(s, x, y)

Derivation(_, x, x, s, s, _)
Derivation(p, x, z, s, t, n) ←
    SelectLit(Atom(q, xs), x) ∧
    Member(If(Atom(q, ys), ls), p) ∧
    Resolve(x, Atom(q, xs), If(Atom(q, ys), ls), s, s1, z1, n, n1) ∧
    Derivation(p, z1, z, s1, t, n1)
Derivation(p, x, z, s, s, _) ←
    SelectLit(Not(a), x) ∧
    ApplyToForm(s, a, a1) ∧
    Ground(a1) ∧
    ¬Derivation(p, a1, True, [], _, 0) ∧
    ReplaceConj(x, Not(a), True, z)

Resolve(x, Atom(q, xs), If(Atom(q, ys), ls), s, t, z, m, n) ←
    Rename(m, If(Atom(q, ys), ls), If(Atom(q, y1s), l1s), n) ∧
    UnifyTerms(y1s, xs, s, t) ∧
    ReplaceConj(x, Atom(q, xs), l1s, z)

Fig. 6. The SLD-Demo program J

a conjunction and replaces it by another literal or conjunction of literals;
ApplyToForm/3 applies a substitution to a formula; and UnifyTerms/4
finds an mgu for two lists of terms obtained by applying the substitution in
the third argument to the lists of terms in the first and second arguments.
Programs I and J can be used with the Member program in Figure 2
as their object program. A goal for I or J that represents the object-level
query
    ← Member(x, [1, 2])
is given in Figure 7 (where Demo denotes either IDemo or JDemo). Fig-
ure 7 also contains the computed answers for this goal using the programs
in Figures 5 and 6.
The goal in Figure 7 has a ground term representing the object pro-
gram. It has been observed that if a meta-interpreter such as I or J was
used by a goal as above, but where the first argument representing the
object program was only partly instantiated, then a program that satisfied
the goal could be created dynamically. However, in practice, such a goal
would normally flounder if SLDNF-resolution was used as the procedural
semantics. Christiansen [1994] is investigating the use of constraint tech-
Goal

← Demo(
    [ If(Atom(P(0), [V(0), Term(F(0), [V(0), V(1)])]), True),
      If(Atom(P(0), [V(0), Term(F(0), [V(1), V(2)])]),
         Atom(P(0), [V(0), V(2)])) ],
    Atom(P(0), [
         V(0),
         Term(F(0),
              [ Term(C(1), []),
                Term(F(0), [Term(C(2), []), Term(C(0), [])]) ]) ]),
    g)

Computed answers

g = Atom(P(0), [
         Term(C(1), []),
         Term(F(0),
              [ Term(C(1), []),
                Term(F(0), [Term(C(2), []), Term(C(0), [])]) ]) ])

g = Atom(P(0), [
         Term(C(2), []),
         Term(F(0),
              [ Term(C(1), []),
                Term(F(0), [Term(C(2), []), Term(C(0), [])]) ]) ])

Fig. 7. Goal and computed answers for programs I and J

niques in the implementation of the meta-interpreter so as to avoid this
problem. These techniques have the potential to implement various forms
of reasoning including abduction and inductive logic programming.
The Instance-Demo program I relies on using the underlying procedures
for unification and standardizing apart. Thus it would be difficult to adapt
the program to define complex computation rules that involve non-trivial
co-routining. However, the SLD-Demo program J can easily be modified
to allow for arbitrary control strategies.
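As an indication of what such a modification involves, the following hedged Prolog sketch parameterizes a simple interpreter by a selection rule. For brevity it uses the non-ground representation (clauses as if(Head, Body) terms) rather than the ground representation of J, and all predicate names are our own.

    % solve(KB, Goals, Select): resolve the list Goals against the clause
    % list KB, letting Select choose which literal to resolve next.
    solve(_, [], _).
    solve(KB, Goals, Select) :-
        call(Select, Goals, Lit, Rest),
        member(C, KB),
        copy_term(C, if(Lit, Body)),       % rename apart and unify the head
        body_to_list(Body, BodyList),
        append(BodyList, Rest, NewGoals),
        solve(KB, NewGoals, Select).

    body_to_list(true, []) :- !.
    body_to_list((A, B), [A|Bs]) :- !, body_to_list(B, Bs).
    body_to_list(A, [A]).

    % Two example selection rules.
    leftmost([G|Gs], G, Gs).
    rightmost(Gs, G, Rest) :- append(Rest, [G], Gs).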
3.3 The language Godel and meta-programming
The Godel language [Hill and Lloyd, 1994] is a logic programming lan-
guage that facilitates meta-programming using the ground representation.

The provision is mainly focussed on the case where the object program is
another Godel program, although there are also some basic facilities for
representing and manipulating expressions in any structured language.
One of the first problems encountered when using the ground repre-
sentation is how to obtain a representation of a given object program and
goals for that program. It can be seen from the example in Figure 7,
that a ground representation of even a simple formula can be quite a large
expression. Hence constructing, changing, or even just querying such an
expression can require a large amount of program code. The Godel sys-
tem provides considerable system support for the constructing and query-
ing of terms representing object Godel expressions. By employing an ab-
stract data type approach so that the representation is not made explicit,
Godel allows a user to ignore the details of the representation. Such an
abstract data type approach makes the design and maintenance of the
meta-programming facilities easier. Abstract data types are facilitated in
Godel by its type and module systems. Thus, in order to describe the
meta-programming facilities of Godel, a brief account of these systems is
given.
Each constant, function, predicate, and proposition in a Godel program
must be specified by a language declaration. The type of a variable is
not declared but inferred from its context within a particular program
statement. To illustrate the type system, we give the language declarations
that would be required for the program in Figure 1:
BASE Name.
CONSTANT Tom, Jerry : Name.
PREDICATE Chase : Name * Name;
Cat, Mouse : Name.
Note that the declaration beginning BASE indicates that Name is a base
type. In the statement
Chase(x,y) <- Cat(x) & Mouse(y) .
the variables x and y are inferred to be of type Name.
Polymorphic types can also be defined in Godel. They are constructed
from the base types, type variables called parameters, and type construc-
tors. Each constructor has an arity > 1 attached to it. As an example,
we give the language declarations for the non-logical symbols used in the
(second variant) of the program in Figure 2.
CONSTRUCTOR List/1.
CONSTANT Nil : List(a).
FUNCTION Cons : a * List(a) -> List(a)
PREDICATE Member : a * List(a).
Here List is declared to be a type constructor of arity 1. The type List (a)
is a polymorphic type that can be used generically. Godel provides the usual
syntax for lists so that [] denotes Nil and [x|y] denotes Cons(x, y). Thus,
if 1 and 2 have type Integer, [1, 2] is a term of type List(Integer).
The Godel module system is based on traditional software engineering
ideas. A program is a collection of modules. Each module has at most
two parts, an export part and a local part. The part and name of the
module is indicated by the first line of the module. The statements can
occur only in the local part. Symbols that are declared or imported into the
export part of the module are available for use in both parts of the module
and other modules that import it. Symbols that are declared or imported
into the local part (but not into the export part) of the module can only
be used in the local part. There are a number of module conditions that
prevent accidental interference between different modules and facilitate the
definition of an abstract data type.
An example of an abstract data type is illustrated in the Godel program
in Figure 8 which consists of two modules, UseADT and ADT. In UseADT, the

MODULE UseADT.
IMPORT ADT.

EXPORT ADT.
BASE H,K.
CONSTANT C,D : H.
PREDICATE P : H * K * K;
Q : H * H.

LOCAL ADT.
CONSTANT E : K.
FUNCTION F : H * K -> K.
P(u,F(u,E),E).
Q(C,D).

Fig. 8. Defining an abstract data type in Godel

type K, which is imported from ADT, is an abstract data type. If the query
<- Q(x,D) & P ( x , y , z ) .
was given to UseADT, then the displayed answer would be:
x = C,
y = <K>,
z = <K>
The system modules include general purpose modules such as Integers,
Lists, and Strings as well as the modules that give explicit support for
meta-programming. Integers provides the integers and the usual arith-

metic operations on the integers. Lists provides the standard list notation
explained earlier in this subsection as well as the usual list processing predi-
cates such as Member and Append. The module Strings makes available the
standard double quote notation for sequences of (ascii) characters. There
is no type for an individual ascii character except as a string of length one.
Godel provides an abstract data type Unit defined in the module Units.
A Unit is intended to represent a term-like data structure. The module
Flocks imports the module Units and provides an abstract data type
Flock which is an ordered collection of terms of type Unit. Since Flocks
and Units do not provide any reflective predicates, they cannot be regarded
as complete meta-programming modules. However, Flocks are useful tools
for the manipulation of any object language whose syntax can be viewed
as a sequence of units. Thus Flocks can be used as the basis of a meta-
program that can choose the object programming system and its semantics.
The four system modules for meta-programming are Syntax, Programs,
Scripts, and Theories. The modules Programs, Scripts, and Theories
support the ground representation of Godel programs, scripts, and theo-
ries, respectively. A script is a special form of a program where the module
structure is collapsed. Since program transformations frequently violate
the module structure, Scripts is mainly intended for meta-programs that
perform program transformations. A theory is assumed to be defined using
an extension of the Godel syntax to allow for arbitrary first order formulas
as the axioms of the theory. The fourth meta-programming module Syntax
is imported by Programs, Scripts, and Theories and facilitates the ma-
nipulation of expressions in the object language. We describe briefly here
the modules Syntax and Programs. The modules Scripts and Theories
are similar to Programs and details of all the meta-programming modules
can be found in [Hill and Lloyd, 1994].
The module Syntax defines abstract data types including Name, Type,
Term, Formula, TypeSubst, TermSubst, and VarTyping which are the types
of the representations of, respectively, the name of a symbol, a type, a term,
a formula, a type substitution, a term substitution, and a variable typing.
(A variable typing is a set of bindings where each binding consists of a
variable together with the type assigned to that variable.)
The module Syntax provides a large number of predicates that support
these abstract data types. Many of these are concerned with the represen-
tation and can be used to identify and construct representations of object
expressions. For example, And is a predicate with three arguments of type
Formula and is true if the third argument is a representation of the con-
junction of the formulas represented in the first two arguments. Variable
has a single argument of type Term and is true if its argument is the rep-
resentation of a variable.
The predicate Derive is an example of a reflective predicate in Syntax.
Derive has the declaration
PREDICATE Derive : Formula * Formula * Formula * Formula *
                   Formula * TermSubst * Formula.
Given an atom of the form Derive(f1, f2, f3, f4,f5, f6, f7), then this is true
if r1 is the resultant derived from a resultant r with selected atom a and
statement s using mgu t. Here, r1 is the resultant represented by f7, r is the
resultant whose head is represented by f1 and whose body is a conjunction
of the formulas represented by f2, f3, and f4; a is the atom represented
by f3; s (whose variables are standardized apart from the variables in r) is
the statement represented by f5.
The module Programs imports the module Syntax and defines the ab-
stract data type Program. Program is the type of a term representing a
Godel program.
As in the module Syntax, many of the predicates in Programs are
concerned with the representation. Some actually relate the object lan-
guage syntax with the representation. For example, there is a predicate
StringToProgramType/4 with declaration
PREDICATE StringToProgramType : Program * String * String *
List(Type).
that converts a type (represented as a string) to a list of representations of
this type. There may be more than one type corresponding to the string due
to overloading of the names of the symbols in the object program. There
are other predicates for manipulating the elements of the abstract data
type Program directly. Thus DeleteStatement, which has the declaration
PREDICATE DeleteStatement : Program * String * Formula *
Program.
removes a statement from the module named in the second argument from
the object program represented in the first.
Finally, there are several reflective predicates. For example, Succeed,
which has the declaration
PREDICATE Succeed : Program * Formula * TermSubst.
is true when its first argument is the representation of an object program,
its second argument is the representation of the body of a goal in the
language of this program, and its third argument is the representation of a
computed answer for this goal and this program. The predicate Succeed
is the Godel equivalent of the Demo predicate.
The SLD-Demo program in Figure 9 defines a predicate SLDDemo/3 us-
ing some of the system predicates of Syntax and Programs although the
only reflective system predicate used is Derive from Syntax. The predi-
cate SLDDemo/3 is defined for definite programs and goals with the usual
left to right selection rule. By changing the above definition of MyAnd/3,
SLD-resolution can be defined with arbitrary selection rules. The program
provides a starting point for a variety of extensions.

EXPORT SLDDemo.

IMPORT Programs.
PREDICATE SLDDemo : Program * Formula * Formula.

LOCAL SLDDemo.

SLDDemo(prog, query, query1) <-
    IsImpliedBy(query, query, res) &
    EmptyFormula(empty) &
    SLDDemo1(prog, res, res1) &
    IsImpliedBy(query1, empty, res1).

PREDICATE SLDDemo1 : Program * Formula * Formula.
DELAY SLDDemo1(p,r,_) UNTIL NONVAR(p) & NONVAR(r).

SLDDemo1(_, res, res).
SLDDemo1(prog, res, res1) <-
    EmptyFormula(empty) &
    IsImpliedBy(head, body, res) &
    MyAnd(atom, rest, body) &
    StatementMatchAtom(prog, _, atom, clause) &
    Derive(head, empty, atom, rest, clause, _, newres) &
    SLDDemo1(prog, newres, res1).

PREDICATE MyAnd : Formula * Formula * Formula.

MyAnd(atom, rest, body) <-
    And(atom, rest, body) &
    Atom(atom).
MyAnd(body, empty, body) <-
    EmptyFormula(empty) & Atom(body).

Fig. 9. An SLD-Demo program using the Godel meta-programming modules
Although the predicates in the system modules Syntax and Programs
can be used to construct the representation of a Godel expression, this
method would not be practical for a complete Godel program. For this
reason, there is a utility in the Godel system that constructs the ground
representation of an object program (of type Program) and writes this to
a file. There is another utility that obtains the re-representation of a term
of type Program and writes it to the appropriate files.
MODULE TestSLDDemo.

IMPORT SLDDemo, ProgramsIO.

PREDICATE Go : String * String * String.

Go(prog_string, goal_string, goal1_string) <-
    FindInput(prog_string ++ ".prm", In(stream)) &
    GetProgram(stream, prog) &
    MainModuleInProgram(prog, module) &
    StringToProgramFormula(prog, module, goal_string, [goal]) &
    SLDDemo(prog, goal, goal1) &
    ProgramFormulaToString(prog, module, goal1, goal1_string).

Fig. 10. Godel program for testing the SLD-Demo program

To read from or write to a file containing a program representation,
there is a system module ProgramsIO which provides the appropriate input
and output system predicates. For example, to run the SLD-Demo program
in Figure 9, we can use the module in Figure 10. This assumes that a file
containing the ground representation of a program exists. Given a file
Chase. prm containing a representation of a Godel version of the program
in Figure 1, a query of the form
<- Go("Chase", "Chase(Tom, x)",y).
has the answer
y = "Chase(Tom, Jerry)".

4 Self-applicability
There are a number of meta-programs that can be applied to (copies of)
themselves. In this section we review the motivation for this form of meta-
programming and discuss the various degrees of self-applicability that can
be achieved by these programs.
The usefulness of self-applicability was demonstrated by Godel in [1931]
where the natural numbers represent the axioms of arithmetic and some
fundamental theorems about logic are derived using the properties of arith-
metic. Perlis and Subrahmanian [1994] give a description of many of
the general issues surrounding self-reference in logic and artificial intel-
ligence together with a comprehensive bibliography. The concept of self-
applicability has been used to construct many well-known logical para-
doxes, one of the most famous being the liar paradox:
This sentence is false.
Here, we are concerned with programming applications of self-applicability
and how particular programming languages support such applications.

If the languages of the object program and meta-program are kept
separate, we can prevent the problems of self-reference illustrated by the
liar paradox above while being able to express the syntactic form of self-
reference given in
This sentence has five words.
by replacing the words "This sentence" by a representation of the sentence.
Provided a representation of the meta-program can be included as a term in
a goal for the meta-program and object provability is defined in the meta-
program, the meta-program is self-applicable although the object language
and the language of meta-program are kept strictly separate.
Many goals for the meta-program that can also be expressed as goals for
the object language can be solved more efficiently using the object program
rather than using their representation in the meta-program. In order that
the meta-program can use the object program directly, the language M of
the meta-program needs to include the object language L. Assuming L ⊆ M,
Bowen and Kowalski [1982] defined the amalgamation of L and M to be the
language M together with
• at least one ground term in M representing each term and formula
  of L,
• a representation of the provability relation in L by means of a predicate
  in M (in the context of a set of sentences of M), and
• linking rules for communicating goals and computed answers between
  the object language and its representation.
An amalgamation is said to be strong if the languages L and M are the
same. Bowen and Kowalski [1982] showed that self-referential sentences
were possible in this logic. They constructed a sentence J that asserts of
itself that it is underivable. They show that neither J nor ¬J is derivable
in the logic so that J is clearly true. This establishes an incompleteness
result for the amalgamated language. If L is a proper subset of M, then
the amalgamation is said to be weak. As we are not aware of any logic
programming systems that allow the object language and language of the
meta-program to be completely identified, only the weak form of
amalgamation is discussed in this chapter.

4.1 Separated meta-programming


Normally, a self-applicable meta-program reasons about a copy of itself.
This can be achieved with both the non-ground and ground representa-
tions, described in Sections 2 and 3. By using the meta-program as the
object program for another meta-program, meta-meta-programs and higher
towers of meta-programs can be constructed.
For example, the Instance-Demo program I given in Figure 5 can be
applied to the representation of any normal logic program. Thus, as I is
itself a normal program, I can be applied to a representation of itself. This
is achieved by giving I a goal of the form:
    ← IDemo([I], [goal], g)
where goal is the goal in Figure 7. We do not give the actual representation
of I or [goal] here, since these terms are large. This difficulty illustrates
two of the problems of using towers of meta-programs. The terms used to
represent the object program and goal are large and complex and also, be-
cause of the added structural information encoded in the terms, processing
them is much less efficient than that using the original object code. This
issue is discussed more fully in Section 6.
The Godel system assumes that self-applicable meta-programs are sep-
arated. Utilities and certain IO predicates are provided for obtaining the
representation and re-representation of a Godel program. However, apart
from these non-declarative facilities, there is no direct access to the object
program from a meta-program. In contrast, in Prolog, the object program
must be present in the meta-program in order to obtain its representation.
Self-application using separated meta-programming has many applica-
tions and is, we believe, the most useful form of self-applicability. In par-
ticular; interpreters, compilers, program analysers, program transformers,
program debuggers, and program specializers can be usefully applied to
themselves. In Section 3, we explained how the programming system Godel
provided support for meta-programming using the ground representation.
This system has been designed so that self-applicable meta-programs can
be developed. Towers of two or even three levels of program specializers
have been achieved using the Godel system although much work needs to
be done to improve their efficiency [Gurr, 1993]. Section 6 explains how
compilers and even compiler generators can be generated automatically
from program specializers using self-application.

4.2 Amalgamated meta-programming


It is often desirable that not only should the object language be part of
the meta-language but also, the object program be actually included in the
meta-program. Hence we extend the above definition of weak amalgama-
tion of languages to programs. We say that a meta-program is amalgamated
if the languages of the meta-program and its object program are weakly
amalgamated and the statements of the object program are included as
part of the meta-program. With an amalgamated meta-program, a query
for the meta-program that represents a query in the object language can be
defined directly using the object program. For example, if P is a proposi-
tion in the object language represented by the constant P' in the language
of the meta-program, then we could have a statement in the meta-program
of the form:
    Demo(ThisProgram, P') ← P.
In this example and in the examples below, ThisProgram is a constant in
the meta-language that refers to the (ground) representation of an object
program which is included in the meta-program.
A basic requirement for amalgamated meta-programming is that the
semantics of the object program must be preserved. That is, the predi-
cates defined in the object program must have the same definition in the
amalgamated meta-program so that a goal written in the object language
must be a logical consequence of the object program if and only if it is a
logical consequence of the amalgamated meta-program.
In order to realize the advantages of amalgamated programming, there
has to be a means by which the object program and its representation can
communicate. Bowen and Kowalski define the following linking rules that
should be satisfied by the reflective predicate Demo/2 in the amalgamated
program for every formula B in the object program.

    Demo(ThisProgram, [B])   if   B
    B   if   Demo(ThisProgram, [B])
These or similar rules are necessary for communication between the object
program and its representation in the amalgamated meta-program.
To facilitate these linking rules, a means of computing the re-represent-
ation of the terms representing the terms of the object language must be
provided. Also, a method for finding the representation of an object term
is required. This reflective requirement, which concerns only the terms of
the object language, may be realized by means of inference rules, functions,
or relations. In each case, we consider how the predicate Demo/2 whose
semantics is given by the above linking rules may be defined for the atomic
formulas. The definition of Demo/2 for the non-atomic formulas is the
same in every case. Thus, for formulas that are conjunctions of literals, the
definition of Demo/2 would include the following clauses.
    Demo(p, a ∧' b) ← Demo(p, a) ∧ Demo(p, b)
    Demo(p, ¬'a) ← ¬Demo(p, a)
where ∧' represents ∧ and ¬' represents ¬.
If the reflective requirement is realized by means of inference rules, then
these must be built into the programming system. Thus the representation
must also be fixed by the programming system. The inference rule that
determines t from a term [t] must first check that [t] is ground and that
it represents an object level term and secondly, if the object language is
typed, that t is correctly typed in this language. The set of statements of
the form
    Demo(ThisProgram, P'([x1], ..., [xn])) ← P(x1, ..., xn)
for each predicate P/n in the object language represented by a function
P'/n in the language of the meta-program will provide a definition of
Demo/2 for the atomic formulas. Note that the xi are universally quanti-
fied variables, quantified over the terms of the object language.
Note that, as the trivial naming relation is built into Prolog, the repre-
sentation is trivially determined by means of an inference rule. However,
there is no check that [t] is ground before applying the inference rule. Re-
flective Prolog [Costantini and Lanzarone, 1989] (see below), is an example
of a language with a non-trivial naming relation, but where the represen-
tation and re-representation are determined by inference rules.
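For concreteness, the following is a hedged Prolog sketch of an amalgamated Demo of this kind, exploiting Prolog's trivial naming relation so that no explicit Represent step is needed; demo/2 and this_program are our own names, and only the Member program is covered.

    % Non-atomic formulas are handled at the meta-level ...
    demo(P, (A , B)) :- demo(P, A), demo(P, B).
    demo(P, \+ A)    :- \+ demo(P, A).
    % ... while atoms of the object program are answered by calling the
    % object-level predicate directly (one linking clause per predicate).
    demo(this_program, member(X, Xs)) :- member(X, Xs).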
If a re-representation was defined functionally, the meta-program would
require a function such as ReRepresent/1. Thus, for each ground term t in
the object language, the equality theory for the meta-program must satisfy
ReRepresent([t]) = t
As this is part of a logic programming system where the equality theory is
normally fixed by the unification procedure and constraint handling mech-
anisms, the evaluation method for this function would be built-in so that
the representation would again be fixed by the programming system. Using
this function, the definition of Demo/2 for the atomic formulas is given by
a set of statements of the form
    Demo(ThisProgram, P'(y1, ..., yn)) ←
        P(ReRepresent(y1), ..., ReRepresent(yn))
for each predicate P/n in the object program.
For logic programming, the most flexible way in which a representation
may be defined is as a relation, say Represent/2, where the first argument
is an object term and the second its representation.
    Represent(t, [t]).
Then the definition of Demo/2, for each predicate P/n in the object
language, would consist of a statement of the form
    Demo(ThisProgram, P'(y1, ..., yn)) ←
        Represent(x1, y1) ∧ ... ∧ Represent(xn, yn) ∧
        P(x1, ..., xn)
The predicate Represent/2 could be defined by the user and hence, as
discussed in Subsection 3.1, can be chosen to suit a particular application.
Thus, for the Member program in Figure 2, we would have the statement
    Demo(ThisProgram, Member'(y1, ..., yn)) ←
        Represent(x1, y1) ∧ ... ∧ Represent(xn, yn) ∧
        Member(x1, ..., xn)
For this example, the definition of Represent/2 would include the clauses
    Represent(Nil', Nil)
    Represent(Cons'(x1, y1), Cons(x, y)) ←
        Represent(x1, x) ∧
        Represent(y1, y)
In Sections 2 and 3, we gave meta-programs that defined reflective
predicates entirely at the meta-level. We have now shown that, if the meta-
program is amalgamated, these reflective predicates can be defined using
the object program directly. Usually, a combination of these two methods
is preferred since the approach that executes a re-representation is usually
more efficient, but the explicit representation of the procedural semantics
provides greater flexibility. The level at which the explicit definition of the
procedural semantics of the object programming system is replaced by a
call to the object program using the re-representation determines the gran-
ularity of the interpreter. The greater the detail at which the procedure is
defined explicitly, the greater the degree of granularity. Clearly efficiency
will be decreased with increasing granularity.
Amalgamation not only facilitates greater computational efficiency for
meta-programs but also provides an environment that allows interaction
between the actual knowledge and the methods for reasoning about
this knowledge. In particular, with an amalgamated language, not only
can predicates at the meta-level be defined using object-level predicates,
but also object-level predicates can use meta-level predicates in their defi-
nitions. A classic example of an application of this (taken from [Bowen and
Kowalski, 1982]) is the coding of the legal rule that a person is innocent
unless he or she is proven guilty.
    Innocent(x) ← ¬Demo(ThisProgram, Guilty'(y)) ∧
                  Represent(x, y).
An amalgamation can facilitate towers of meta-programming. For ex-
ample, a meta-program can include the statement
    Demo(ThisProgram, Demo'(ThisProgram, y)) ←
        Represent(x, y) ∧
        Demo(ThisProgram, x).
where the predicate Demo/2 is represented by the function Demo'/2.
Note that at each meta-level, the definition of the predicate Represent/2
must be extended with the constants and functions of the previous meta-
level. Suppose, for example, each meta-level uses an additional quote to
represent the previous lower level. Then, just to represent the representa-
tion of the Member program in Figure 2, the following clauses would need
to be added to the above definition of Represent/2.
    Represent(Nil", Nil')
    Represent(Cons"(x1, y1), Cons'(x, y)) ←
        Represent(x1, x) ∧
        Represent(y1, y)
    Represent(Member"(x1, y1), Member'(x, y)) ←
        Represent(x1, x) ∧
        Represent(y1, y)

At the next (third) meta-level, not only would the representation of Nil",
Cons"/2, and Member"/2 have to be defined by Represent/2 but also the
representation of the function Demo'/2.
Each meta-level in a program could contain the representations of sev-
eral programs at the previous level. The relationships between the different
meta-levels in a program are sometimes called its meta-level architecture.
The usual architectures in which each meta-level reasons about only one
object program at the next lower level might be said to be "linear".
As the representation has to be explicitly defined using Demo/2 and
Represent/2, there is always a top-most meta-level. This will contain sym-
bols with no representation. Hence, without any 'higher-order' extensions,
logic programming cannot be used for the strong form of amalgamation.
One of the problems of this amalgamation is that the representation
needs to be made explicit. In the previous discussion, a representation of
the symbols is given and then a representation of the terms and formulas
is constructed in the standard way. However, it is often more convenient to
hide the details of the representation from the programmer. Quine [1951]
introduced the 'quasi-quotes' already used in this chapter to indicate a rep-
resentation of some unspecified object expression. This (or similar syntax)
can be used instead of an explicit representation. For example,
[Cons(x, Nil)]
would correspond (using the above representation) to the term
Cons'(V(0), Nil')
where V(0) is the (ground) representation of x.
This syntax is not constructive. That is, there is no direct means of con-
structing larger expressions from their components. Note that, in the Godel
programming system, the use of abstract data types for meta-programming
has a similar problem.
The main reason that terms representing object-level expressions have
to be constructed dynamically is because the structure of components
of these expressions may not be fully specified. The unspecified sub-
expressions are defined by variables that range not over the object terms
but over the representations of arbitrary object expressions. Such variables
are called meta-variables. Thus, instead of a programming system provid-
ing predicates for constructing terms representing object expressions, the
syntax may distinguish between meta-variables and variables ranging over
the object level terms. The partially specified object expression can then
be enclosed in the quasi-quotes but those meta-variables that occur within
their scope must be syntactically identifiable using some escape notation.
For example, if the escape notation is an overline, x̄ indicates that in the
context of [...], x is a meta-variable. For example,
    [Cons(x̄, Nil)]
would correspond (using the above representation) to the term
    Cons'(x, Nil')
where the x ranges over the terms in the language of the Member program.
Moreover, [x̄] is equivalent to the meta-variable x.5 With this notation,
the clause in the definition of Demo/2 that defines the representation of
Member/2 is as follows.
    Demo(ThisProgram, [Member(x̄1, ..., x̄n)]) ←
        Member(x1, ..., xn).
As explained in the previous subsection, Godel is not intended for amal-
gamated meta-programming and provides no support for this. Prolog meta-
programming facilities force the object program and a meta-program to
be amalgamated but the actual switching between the object level and
meta-level has to be programmed explicitly. In addition, most of the meta-
programming facilities of Prolog are not declarative whereas the intention
of using amalgamated meta-programs for representing knowledge is to pro-
vide a declarative representation of this knowledge.
A programming system called Reflective Prolog [Costantini and Lan-
zarone, 1989], [Costantini, 1990] is intended for amalgamated reasoning.
In this language, the representation of the constants, functions, and predi-
cates is defined by specially annotating the corresponding object symbols.
There are three different kinds of variables: object variables, predicate
meta-variables, and function meta-variables. The rules of substitution en-
sure that these may only be substituted by, respectively, an object term,
a representation of a predicate, and a representation of a function. There
are syntactic restrictions to keep the meta-levels distinct and prevent self-
reference within a single atom. In addition to the object level and different
meta-levels, a reflective Prolog program distinguishes between the meta-
evaluation level and the base level. The meta-evaluation level is at the
top of the meta-level architecture and includes a distinguished predicate
Solve. The base level, containing an amalgamated theory, comprises the
remaining meta-levels below it and cannot refer to any predicates in the
meta-evaluation level. Procedurally, a definite Reflective Prolog program
uses SLD-resolution whenever possible but automatically switches between
the base level and meta-evaluation level in certain circumstances. The
declarative semantics for such programs, called the Least Reflective Her-
brand Model, is an adapted form of the well-known least Herbrand model.
A new system for amalgamated meta-programming called Alloy is de-
scribed by Barklund [1995]. This system, which uses a syntax based on the
5
This differs from Quine's notation where he uses x, y, z etc., to indicate the object
variables (in his case, quantified over numbers) and Greek letters to indicate the meta-
variables. Using this notation, [u] = u. The disadvantage of this notation is that only
two levels are defined and these are assumed to be separated. There is no provision for
more than two levels or for amalgamating the meta-levels.
quasi-quote and escape notation described above, is similar to and largely
inspired by the ideas in Reflective Prolog. Alloy differs from Reflective
Prolog in that it provides explicit support for the ground representation
and for facilitating the definition of multiple meta-levels.
Towers of meta-programs and more general meta-level architectures
have been shown to be useful in a number of areas. These include software
engineering and legal reasoning. Software engineering defines methodolo-
gies for program development independent of any particular programming
language or application. Tools for supporting these methodologies may
distinguish three levels of reasoning. The object level is the application
domain. Given the pre-conditions and post-conditions, the first meta-level
defines what the program is intended to compute. Finally, the top-most
meta-level defines a formalization of correct program development. More
details concerning this application are given in [Dunin-Keplicz, 1994]. It is
known that legal knowledge has a number of reasoning layers. For example,
there may be a several primary legal rules that are intended for distinct
situations. Then secondary rules specify when the primary rules are appli-
cable, how to interpret them, or even how to construct new primary rules.
In law, such techniques are often applied repeatedly, so that tertiary rules
can be defined as meta-rules for the secondary rules and similarly for higher
rules. It is shown in [Barklund and Hamfelt, 1994] that these layers can be
put in a one-to-one correspondence with the meta-levels of an amalgamated
meta-program.

4.3 Ambivalent logic


The flexibility of the trivial naming relation with the non-ground repre-
sentation in Prolog has encouraged a certain style of meta-programming
to be adopted by Prolog programmers. However, first order logic does not
provide many Prolog meta-programs with a declarative semantics. For ex-
ample, a useful feature of Prolog allows an ambivalent syntax where terms
and atoms are not distinguished except by their context in the clause, while
substitution for variables is purely syntactic. Thus, expressions such as
demo(demo(X)) :- demo(X)
demo(X) :- X
are allowed. The first of these is easily explained as overloading the name
demo as both a function and predicate symbol. The second of these can be
understood as a schema for clauses of the form
demo(t) <— t
where the argument t is a ground term representing the ground formula
t on the right of the arrow. However, the concept of a schema takes the
semantics beyond that of first order logic.
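Operationally, standard Prolog accepts the second clause by implicitly wrapping the variable body in call/1; the semantic difficulty discussed above remains, but the clause behaves as the following equivalent form.

    % Explicit form of the 'demo(X) :- X' clause.
    demo(X) :- call(X).

    % ?- assertz(p(1)), demo(p(Y)).
    % Y = 1.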
There have been a number of proposals for extending first order logic
with limited higher order features that may provide these aspects of Prolog

with a semantics. Chen et al. [1993] have developed the logic Hilog. This
is intended to give a logical basis for Prolog's ambivalent syntax. However,
although Hilog does not distinguish between functions, predicates, terms,
and atoms, it does not (as in Prolog) allow variables to range over arbitrary
expressions in the language. Hilog is intended to provide a basis for a new
logic programming system similar to Prolog but based on the logic of Hilog.
Richards [1974] has defined an ambivalent logic that allows formulas to be
treated as terms but not vice-versa. Another ambivalent logic is developed
by Jiang [1994]. This employs features from both Richards' logic and Hilog.
In Jiang's logic, there is no syntactic distinction between functions, predi-
cates, terms, and formulas. Moreover, the semantics distinguishes between
substitution (which is purely syntactic) and equality. The main purpose of
this logic is to provide an expressive syntax for self-reference together with
a suitable extension of first order logic for its semantics. However, it has
also been used to show that the Vanilla program Vp in Section 2 (with-
out the third clause that interprets negative formulas) has the intended
semantics.

5 Dynamic meta-programming
We consider three forms of dynamic meta-programming: constructing a
program using predefined components, updating a program, and trans-
forming or synthesizing a program.

5.1 Constructing programs


The simplest and least dynamic means of creating a program is by com-
bining program components called modules to form the complete program.
There are several applications which require such a modular approach to
programming. These include: the re-use of existing software; the devel-
opment of programs incrementally; non-monotonic reasoning; and object-
orientation with inheritance.
A number of operators that may be used to construct a complete pro-
gram from a set of modules are defined in [Brogi et al., 1990]. These form
a sort of "command shell" for building new programs out of existing ones.
In [Brogi et al., 1992], it is shown how meta-programming techniques using
the non-ground representation can be used to define and implement these
operators. We discuss here the use of meta-programming for two of the
operators, union ∪ and intersection ∩. To explain these, we assume that
there is a type Module for terms representing sets of clauses, a set of con-
stants of type Module representing the sets of clauses that form the initial
program components for constructing complete programs, and binary infix
operators ∩ and ∪ with type Module * Module -> Module. Let P1 and P2
be two terms of type Module. Then:
P1 ∪ P2 represents the module obtained by putting the clauses of modules
P1 and P2 together.
P1 ∩ P2 represents the module consisting of all clauses defined in the following
way:
    If p(t1, ..., tn) ← B1 is in the module represented by P1,
    p(u1, ..., un) ← B2 is in the module represented by P2,
    and θ is an mgu for {p(t1, ..., tn), p(u1, ..., un)},
    then p(t1, ..., tn)θ ← B1θ, B2θ is in the module represented by P1 ∩ P2.6
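Figure 11 below gives the chapter's meta-level definition of these operators; purely as a hedged illustration, they can also be transliterated into Prolog over a non-ground clause store oclause(Module, Head, Body) (our own names), with the unification of the two heads in the intersection case left to Prolog and the renaming apart of footnote 6 ignored.

    % Clauses of a compound module term are derived from its components.
    oclause(union(P, _Q), A, B) :- oclause(P, A, B).
    oclause(union(_P, Q), A, B) :- oclause(Q, A, B).
    oclause(intersect(P, Q), A, (B, C)) :-
        oclause(P, A, B),
        oclause(Q, A, C).

With the module facts given below added as oclause/3 facts, a vanilla interpreter over oclause/3 behaves as OSolve does in Figure 11.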
In the non-ground representation described in Section 2, the object pro-
gram is represented in the meta-program by the definition of the predicate
Clause/2. With this representation, a declarative meta-program cannot
modify the (representation of) the object program. However, by adding an
extra argument with type Module and replacing Clause /2 by OClause/3,
we can include in the representation of a clause the name of the module
in which it occurs. Any expression formed from module names, and the
operators ∪ and ∩ is called a program term. For example, given the facts:
OClause(P, Cat(Tom), True)
OClause(P, Mouse (Jerry), True)
OClause(Q, Chase(x,y), Cat(x) And Mouse(y))
for modules named P and Q, the program term P ∪ Q represents the Chase
program in Figure 1. Moreover, with the set of facts:
OClause(P1, Cat(Tom), True)
OClause(P1, Mouse( Jerry), True)
OClause(P1,Chase(x,y),Cat(x))
OClause(Q1, Cat(Tom), True)
OClause(Q1, Mouse( Jerry), True)
OClause(Q1, Chase(x,y),Mouse(y))
for modules named P1 and Q1, the program term P1 ∩ Q1 also represents
the Chase program.
The operators ∪ and ∩ can be defined by extending program V in
Figure 3 to give the Operator-Vanilla program O in Figure 11. As the
operators have only been defined here in the case of definite programs,
O does not have a clause for interpreting negative literals. Given a set
of modules R, the program O_R consists of the program O together with
the set of facts extending the definition of OClause and representing the
modules in R.
The following result was proved in [Brogi et al., 1992].
Theorem 5.1.1. Let P and Q be object programs. Then, for any ground
atom A in the object language,
• O_{P,Q} ⊢ OSolve(P ∪ Q, A) iff A is a logical consequence of P ∪ Q
6
We assume that the statements are standardized apart so that they have no variables
in common.
OSolve(p, True)
OSolve(p, (b And c)) ←
    OSolve(p, b) ∧
    OSolve(p, c)
OSolve(p, a) ←
    OClause(p, a, b) ∧
    OSolve(p, b)

OClause(p ∪ q, a, b) ←
    OClause(p, a, b)
OClause(p ∪ q, a, b) ←
    OClause(q, a, b)
OClause(p ∩ q, a, (b And c)) ←
    OClause(p, a, b) ∧
    OClause(q, a, c)

Fig. 11. The Operator-Vanilla interpreter O

• O_{P,Q} ⊢ OSolve(P ∩ Q, A) iff A is a logical consequence of P ∩ Q


For example, given the program in Figure 11 together with the above
clauses representing the Chase program, the goals
    ← OSolve(P ∪ Q, Chase(x, y))
    ← OSolve(P1 ∩ Q1, Chase(x, y))
would both succeed with computed answer
x = Tom, y = Jerry.
Composing programs with operators allows for programs to be only
partly specified. The following result was proved in [Brogi et al., 1990].
Theorem 5.1.2. Let P be a (possibly non-ground) program term and G
represent a goal for the program. Assume that the goal ← OSolve(P, G)
succeeds with computed answer θ. Then the goal represented by Gθ can be
proved in the program represented by any ground instance of Pθ.
For example, given the program in Figure 11 together with the above
clauses representing the Chase program, the goal
    ← OSolve(P ∪ q, Chase(x, y))
would succeed with computed answers
    q = Q, x = Tom, y = Jerry;
    q = Q ∪ v, x = Tom, y = Jerry;

5.2 Updating programs


A meta-program can modify an object program by inserting and removing
statements. This form of meta-programming has many applications such
as hypothetical reasoning, knowledge assimilation, and abduction. For
example, in hypothetical reasoning, additional axioms are included as tem-
porary hypotheses during a subcomputation. In knowledge assimilation,
the program defining the current knowledge base has to be updated by the
addition or deletion of clauses. These changes are usually constrained to
satisfy certain integrity constraints as well as to cause minimal changes to
the original knowledge base. Abductive reasoning is similar to the previous
application except, in this case, only new facts may be added to the object
program and no clauses may be deleted. Normally, there is an additional
restriction that there is a predefined set of predicates called abducibles and
the added facts must (partly) define an abducible predicate.
For these applications, it is necessary, if the meta-program is to be
declarative, that the object program be represented as a term in the goal to
the meta-program. Moreover, as first order logic does not allow quantifiers
in terms, a ground representation must be used.
For knowledge assimilation, the changes to the object program are nor-
mally global and permanent. Thus, in the case of knowledge assimilation,
it is particularly important that no unnecessary changes are made and the
new knowledge base remains consistent. In [Kowalski, 1994], the problem
of adding a statement S to the database D is discussed in detail. Four
cases are described:
1. S is a logical consequence of D
2. D = D1 ∪ D2 and D2 is a logical consequence of D1 ∪ {S}.
3. S is inconsistent with D.
4. None of the relationships (1)-(3) holds.
It is assumed here that the knowledge base is a normal logic program whose
semantics is taken to be its completion. In this case it is not appropriate to
include the consistency checks (since all new facts will be inconsistent with
the completion). Thus we only consider cases 1 and 2, and a third case
when 1 and 2 do not hold. The program in Figure 12 defining the predicate
Assimilate/3 (which is based on the program in [Kowalski, 1990]) shows
how these requirements may be realized in logic programming using meta-
programming techniques. To simplify the example, we have assumed that
only facts may be added or removed. The program uses the representation
given in Section 3, where the object program is represented by a list of
terms representing the statements of the object program. The program
also requires a definition of the reflective predicate Demo such as that
of IDemo given in Figure 5 or JDemo given in Figure 6. The types of
Assimilate/3 and Remove/3 are as follows.

Assimilate(kb, s, kb) <-
    Demo(kb, s, _)
Assimilate(kb, s, newkb) <-
    Remove(If(a, True), kb, kb1) ∧
    Demo([If(s, True)|kb1], a, _) ∧
    Assimilate(kb1, s, newkb)
Assimilate(kb, s, [If(s, True)|kb]) <-
    ¬Demo(kb, s, _) ∧
    ¬(∃a ∃kb1 (Remove(If(a, True), kb, kb1) ∧
               Demo([If(s, True)|kb1], a, _)))

Remove(x, [x|xs], xs)
Remove(x, [y|ys], [y|zs]) <-
    Remove(x, ys, zs)

Fig. 12. Assimilating a ground fact into a database

Predicate     Type
Assimilate    List(u) * u * List(u)
Remove        u * List(u) * List(u)
Remove/3 is true if the first argument is an element of the list in the second
argument and the third argument is obtained by removing this element.
It is natural to require certain integrity constraints to hold when facts
are added to or removed from a knowledge base. These constraints are
formulas that should be logical consequences of (the completion of) the
updated knowledge base. A set of integrity constraints may be represented
as a list of terms, each term representing an integrity constraint from the
set. The assimilation with integrity constraint checking is illustrated in
Figure 13. This defines the predicate AssimilateWithIC/4.
Predicate Type
AssimilateWithIC List(u) * List(u) * u * List(u)
The first argument of AssimilateWithIC/4 is the representation of the set
of integrity constraints. If the knowledge base, consisting of the initial
knowledge base (represented by the second argument) together with the
fact which is to be assimilated (represented in the third argument), sat-
isfies the integrity constraints (represented in the first argument), then
the Assimilate predicate, defined in Figure 12, is called to update the
knowledge base. The fourth argument of AssimilateWithIC/4 contains the
representation of the updated knowledge base.
As indicated in Section 3, the first logic programming system to provide
declarative facilities for updating logic programs based on a ground repre-

AssimilateWithIC([ic|ics], kb, s, newkb) <-
    Demo([If(s, True)|kb], ic, _) ∧
    AssimilateWithIC(ics, kb, s, newkb)
AssimilateWithIC([], kb, s, newkb) <-
    Assimilate(kb, s, newkb)

Fig. 13. Checking integrity constraints

sentation was Meta-Prolog [Bowen and Weinberg, 1985]. However, it was
built as an extension to Prolog so that it also inherited the non-declarative
facilities of Prolog. The system Godel, described in Subsection 3.3, provides
considerable support for updating the theory of a program using its ground
representation. For example, the system module Programs defines predi-
cates InsertStatement/4 and DeleteStatement/4 for adding a statement
to and removing a statement from a module in a program.

5.3 The three wise men problem


We illustrate the use of meta-programming for hypothetical reasoning by
means of a well-known problem, the three wise men. This problem has
been much studied by researchers in meta-reasoning and it is thought ap-
propriate to show here how the standard techniques of meta-programming
in logic programming can be used to solve this problem.
The three wise men puzzle is as follows. A king, wishing to find out
which of his three wise men is the wisest, puts a hat on each of their heads
and tells them that each hat is either black or white and at least one of the
hats is white. The king does this in such a way that each wise man can see
the hats of the other wise men, but not his own. In fact, each wise man has
a white hat on. The king then successively asks each wise man if he knows
the colour of his own hat. The first wise man answers "I don't know", as
does the second. Then the third announces that his hat is white.
The reasoning of the third wise man is as follows. "Suppose my hat is
black. Then the second wise man would see a black hat and a white hat,
and would reason that, if his hat is black, the first wise man would see two
black hats and hence would conclude that his hat is white since he knows
that at least one of the hats is white. But the second wise man said he
didn't know the colour of his hat. Hence my hat must be white."
The solution below, using pure logic programming, is intended to il-
lustrate the use of meta-programming for hypothetical reasoning. In this
solution, the king uses his knowledge to simulate the reasoning of the three
wise men (W1, W2, W3). The king assumes that the third wise man uses
his reasoning to simulate the reasoning of the second wise man and hence
of the first. We use the ground representation given in Subsection 3.1 and
assumed by the programs in Figures 5, 6, and 13. The constants and pred-
icates used for the king's and wise men's knowledge bases together with
their representation is as follows.
Object symbol   Representation
Black           C(0)
White           C(1)
W1              C(11)
W2              C(12)
W3              C(13)
Hat/2           P(1)
DontKnow/1      P(2)
Hear/2          P(3)
See/2           P(4)
DiffColor/2     P(5)
Same/2          P(6)
The predicate Demo is defined as either IDemo in Figure 5 or JDemo in
Figure 6. AssimilateWithIC is defined in Figure 13. The knowledge of the
king and his wise men consists of sets of normal clauses represented by a
list of representations of these clauses in some order. There are two initial
knowledge bases. One contains knowledge common to all the men.
Hat(W3, White) <- Hat(W2, Black) ∧ Hat(W1, Black)
Hat(W2, White) <- Hat(W1, Black) ∧ Hat(W3, Black)
Hat(W1, White) <- Hat(W3, Black) ∧ Hat(W2, Black)
Hear(W3, W2)
Hear(W3, W1)
Hear(W2, W1)
See(x, y) <- ¬Same(x, y)
DiffColor(White, Black)
DiffColor(Black, White)
Same(x, x)
This is represented by the single fact defining the predicate CommonKb/1.
CommonKb([
    If(Atom(P(1), [C(13), C(1)]),
       And(Atom(P(1), [C(12), C(0)]), Atom(P(1), [C(11), C(0)]))),
    If(Atom(P(1), [C(12), C(1)]),
       And(Atom(P(1), [C(11), C(0)]), Atom(P(1), [C(13), C(0)]))),
    If(Atom(P(1), [C(11), C(1)]),
       And(Atom(P(1), [C(13), C(0)]), Atom(P(1), [C(12), C(0)]))),
    If(Atom(P(3), [C(13), C(12)]), True),
    If(Atom(P(3), [C(13), C(11)]), True),
    If(Atom(P(3), [C(12), C(11)]), True),
    If(Atom(P(4), [V(1), V(2)]), Not(Atom(P(6), [V(1), V(2)]))),
    If(Atom(P(5), [C(1), C(0)]), True),
    If(Atom(P(5), [C(0), C(1)]), True),
    If(Atom(P(6), [V(1), V(1)]), True)
])
In addition to the above common knowledge base, the men will have certain
commonly held constraints on what combination of knowledge is accept-
able. As an example of such a constraint, we assume that all men know
that a hat cannot be both black and white.
¬(Hat(w, Black) ∧ Hat(w, White))
This can be represented in the fact defining the predicate CommonIC/1.
CommonIC([
    Not(And(
        Atom(P(1), [V(11), C(0)]),
        Atom(P(1), [V(11), C(1)])))
])
The other knowledge base contains a list of facts describing the king's
knowledge about the state of the world.
Hat(W1, White)
Hat(W2, White)
Hat(W3, White)
DontKnow( W1)
DontKnow( W2)
This is represented by the fact defining the predicate KingKb/1.
KingKb([
I f ( A t o m ( P ( 1 ) , [C(ll), C(l)]), True),
I f ( A t o m ( P ( 1 ) , [C(12), C(l)]), True),
If(Atom(P(1),(C(l3),C(1)]), True),
If(Atom(P(2),[C(12)]),True),
If(Atom(P(2),(C(ll)]),True)
])
The king and the wise men can use only their own knowledge when sim-
ulating other men's reasoning. The key to their reasoning is defined by the
predicate LocalKb/3. Given the name of a wise man and a current knowl-
edge base in the first and second arguments, respectively, then LocalKb/3
will be true if the third argument is bound to the part of the knowledge
base of the wise man which is contained in the current knowledge base.
LocalKb(w, kb, localkb) <- LocalKb1(w, kb, kb, localkb)
The predicate LocalKb/3 calls LocalKb1/4. This predicate requires an ex-
tra copy of the initial knowledge base so that it can process all clauses
represented by elements of the list and determine which should be included
in the new knowledge base.

LocalKb1(_, _, [], [])
LocalKb1(w, ikb, [k|kb], [k|lkb]) <-
    CommonKb(ckb) ∧
    Member(k, ckb) ∧
    LocalKb1(w, ikb, kb, lkb)
LocalKb1(w, ikb, [If(Atom(P(1), [w1, c]), True)|kb],
         [If(Atom(P(1), [w1, c]), True)|lkb]) <-
    Demo(ikb, Atom(P(4), [w, w1]), _) ∧
    LocalKb1(w, ikb, kb, lkb)
LocalKb1(w, ikb, [If(Atom(P(1), [w1, _]), True)|kb], lkb) <-
    ¬Demo(ikb, Atom(P(4), [w, w1]), _) ∧
    LocalKb1(w, ikb, kb, lkb)
LocalKb1(w, ikb, [If(Atom(P(2), [w1]), True)|kb],
         [If(Atom(P(2), [w1]), True)|lkb]) <-
    Demo(ikb, Atom(P(3), [w, w1]), _) ∧
    LocalKb1(w, ikb, kb, lkb)
LocalKb1(w, ikb, [If(Atom(P(2), [w1]), True)|kb], lkb) <-
    ¬Demo(ikb, Atom(P(3), [w, w1]), _) ∧
    LocalKb1(w, ikb, kb, lkb)
There are six statements defining LocalKb1/4. The first is the base case.
The second ensures that common knowledge is included in the wise man's
knowledge. The third (resp., fourth) deals with the case where the wise
man can (resp., cannot) see a hat and the colour is known. The fifth
(resp., sixth) deals with the case where the wise man can (resp., cannot)
hear another man and that man says he doesn't know.
The predicate Reason/3 simulates the reasoning of the men. Either the
man in the first argument reasons that the colour of his hat is white because
he believes the other two hats are black, or, if he has heard another wise
man say "I do not know the colour of my hat" , he hypothesises a colour for
his own hat and by simulating the other man's reasoning, tries to obtain
a contradiction. The predicate AssimilateWithIC checks that no integrity
constraints are violated by the extra hypothesis.
Reason(kb, w, c) <-
    Demo(kb, Atom(P(1), [w, V(1)]), Atom(P(1), [w, c]))
Reason(kb, w, c1) <-
    Demo(kb, Atom(P(2), [V(1)]), Atom(P(2), [w1])) ∧
    LocalKb(w1, kb, newkb) ∧
    Demo(kb, Atom(P(5), [V(1), V(2)]), Atom(P(5), [c1, c2])) ∧
    CommonIC(ics) ∧
    AssimilateWithIC(ics, newkb, Atom(P(1), [w, c2]), newkb1) ∧
    Reason(newkb1, w1, _)

Finally, the predicate King/2 models the king's own reasoning. The king
will deduce that the man in the first argument should be able to reason
that the colour of his hat is the colour given in the second argument.
King(w, c) <-
    KingKb(kkb) ∧
    CommonKb(ckb) ∧
    Append(kkb, ckb, kb) ∧
    LocalKb(w, kb, lkb) ∧
    Reason(lkb, w, c)
With the program consisting of these definitions together with the pro-
grams in Figures 5, 12, and 13 and the usual definitions of Append/3 and
Member/2, the goal
<- King(C(13), c)
has just the computed answer
c = C(1).
Moreover, the goals


<- King(C(11),c)
<- King(C(12),c)
both fail.7
Note that this program is not amalgamated, even in the weak sense,
and only requires one level of meta-programming.
A number of alternative approaches to the solution of this problem have
been made. Kim and Kowalski [1990] use full first order logic to give a rep-
resentation and solution of the problem. This solution requires meta-level
reasoning with a weak amalgamation. The solution in [Aiello et al., 1988]
also uses full first order logic but is designed for a more general framework
of non-cooperative but loyal and perfect reasoners. Their solution has been
machine-checked using a version of Weyhrauch's FOL system [Weyhrauch,
1982]. Nait Abdallah [1987] extends logic programming with a concept
similar to a module called an ion. This solution is not expressed as a
logic program and does not use meta-programming techniques. An al-
ternative solution similar to the one given above using the Godel system
is in [Hill and Lloyd, 1994]. This was primarily designed to illustrate the
meta-programming facilities in Godel and differs in that its object program
defining the wise men's knowledge is purely propositional and a query may
only ask the third wise man the colour of his hat.

7
The wise men program with these goals has been machine-checked using the Godel
system.

5.4 Transforming and specializing programs


A major application for meta-programming is in the transformation and
specialization of programs. However, it appears that, although many trans-
formation and specialization procedures and their proofs of correctness have
been published, apart from the work of Gurr [1993], little work has been
done to establish good declarative meta-programming styles for implement-
ing such procedures.
When a program is specialized for a particular application or trans-
formed to a more efficient program, although the new program may have
little resemblance to the old program, it should preserve the semantics, at
least with respect to the expected goals for the program. It is quite clear
that the old object program has to be represented as a ground term in the
goal to the meta-program. In addition, the computed answer of the meta-
program should bind a variable in the goal to a ground term representing
the transformed or specialized program.
It is difficult to discuss how program transformers and specializers can
be implemented in logic programming without a concrete procedure in
mind. Thus we consider the meta-programming requirements for a basic
unfolding step and provide a skeleton logic program that realizes this step.
To define an unfolding step, it is convenient to denote the body of a normal
clause as a sequence of literals or conjunctions of literals. Thus, for example,
    H <- L1, ..., Ln
denotes the normal clause
    H <- L1 ∧ ... ∧ Ln.
The unfolding step for definite programs is defined as follows [Tamaki and
Sato, 1984]8.
Definition 5.4.1. Let P be a normal program that includes the clause
    C : A <- S, Q(t), R.
Suppose, for i = 1, ..., r, the clauses
    Di : Q(ti) <- Ri
are variants (chosen to have no variables in common with C) of all the
clauses in P whose heads unify with Q(t). Let θi be the mgu of Q(t) and
Q(ti). Then the result of unfolding C with respect to Q(t) is the program
P' obtained from P by replacing C by the r clauses
    Ci : (A <- S, Ri, R)θi
obtained by resolving C with Di wrt Q(t).
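For example (our own illustration, written in Prolog syntax rather than in the notation of this chapter), unfolding the clause

    p(X) :- q(X), r(X).

with respect to the atom q(X) in a program that also contains the clauses

    q(a).
    q(f(Y)) :- s(Y).

replaces the first clause by the two clauses

    p(a) :- r(a).
    p(f(Y)) :- s(Y), r(f(Y)).

obtained by resolving it against each clause for q/1.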


This definition was first proved correct with respect to preservation of
the success set by Tamaki and Sato [1984] for definite programs and later
8
A tuple t1, ..., tj is denoted here by t.

Unfold(p, If(h, b), Atom(q, xs), p1) <-
    SelectClause(If(h, b), p) ∧
    SelectLit(Atom(q, xs), b) ∧
    MaxForm(If(h, b), n) ∧
    DeriveAll(p, If(h, b), Atom(q, xs), cs, n) ∧
    Replace(p, If(h, b), cs, p1)

DeriveAll([], _, _, [], _)
DeriveAll([pc|pcs], If(h, b), a, [c|cs], m) <-
    Resolve(b, a, pc, [], s, b1, m, n) ∧
    ApplyToForm(s, If(h, b1), c) ∧
    DeriveAll(pcs, If(h, b), a, cs, n)
DeriveAll([pc|pcs], If(h, b), a, cs, m) <-
    ¬Resolve(b, a, pc, [], _, _, m, _) ∧
    DeriveAll(pcs, If(h, b), a, cs, m)

Resolve(b, Atom(q, xs), If(Atom(q, ys), ls), s, t, r, m, n) <-
    Rename(m, If(Atom(q, ys), ls), If(Atom(q, y1s), l1s), n) ∧
    UnifyTerms(y1s, xs, s, t) ∧
    ReplaceConj(b, Atom(q, xs), l1s, r)

Fig. 14. The unfolding program

by Seki [1989] and Gardner and Shepherdson [1991] for normal programs.
Kanamori and Horiuchi [1987] showed that it preserves computed answers.
Figure 14 contains the top part of a program that performs an unfold-
ing step, defined by the predicate Unfold/4. The ground representation
given in Subsection 2.1 is assumed. The types of Unfold/4, SelectClause/2,
DeriveAll/5, and Replace/4 are as follows.
Predicate       Type
Unfold          List(u) * u * u * List(u)
SelectClause    u * List(u)
DeriveAll       List(u) * u * u * List(u) * Integer
Replace         List(u) * u * List(u) * List(u)
The predicate SelectClause/2 selects a clause from a program; DeriveAll/5
attempts a derivation step for every clause in the program; and Replace/4
removes an element from a list and inserts a sublist of elements in its place.
The remaining predicates are described in Subsection 3.2.
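To indicate what such auxiliary predicates involve, here is a minimal sketch in Prolog (ours; select_clause/2 and replace/4 are lowercase analogues of SelectClause/2 and Replace/4 over the list representation of a program, and append/3 is the standard list predicate):

% select_clause(C, P): C is (nondeterministically) one of the clauses in the
% list P representing the object program.
select_clause(C, [C|_]).
select_clause(C, [_|Cs]) :- select_clause(C, Cs).

% replace(P, C, New, P1): P1 is P with one occurrence of the element C
% replaced by the list of elements New.
replace([C|Cs], C, New, P1) :- append(New, Cs, P1).
replace([D|Cs], C, New, [D|P1]) :- replace(Cs, C, New, P1).

For instance, replace([c1, c2, c3], c2, [d1, d2], P1) binds P1 to [c1, d1, d2, c3].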
The unfold procedure requires certain basic steps: standardising apart
the variables used in the program's clauses from the variables in the clause
selected for unfolding, computing a unifier, and applying substitutions, and
so on. Moreover at the heart of such a procedure is the need to construct
a list of the representations of all clauses that satisfy certain conditions.


This could also be be achieved by means of intensional sets (if these are
provided) and then converting the set of clauses to a list. These basic
steps are common to program transformers and specializers and many other
similar meta-programs. Thus to simplify the writing of this kind of meta-
programming application, predicates that perform these tasks should be
provided by the meta-programming system.
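As an illustration of one of these basic steps, applying a substitution over a ground representation can be sketched in Prolog as follows (ours; v(N) is a lowercase stand-in for the chapter's Var(N), and a substitution is assumed to be a list of Index-Term pairs):

% apply_subst(S, T, T1): T1 is the representation T with every encoded
% variable v(N) that is bound in the substitution S replaced by its binding.
apply_subst(S, v(N), T) :- member(N-T, S), !.
apply_subst(_, v(N), v(N)) :- !.
apply_subst(S, T, T1) :-
    T =.. [F|Args],
    maplist(apply_subst(S), Args, Args1),
    T1 =.. [F|Args1].

For example, apply_subst([1-c(0)], atom(p(1), [v(1), v(2)]), T) binds T to atom(p(1), [c(0), v(2)]).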
Frequently, when transforming a program, the language has to be ex-
tended with new functions and predicates. The names of these symbols
need to be generated by the meta-program. Thus, the meta-programming
system needs to provide support for constructing new names for symbols
and modifying the language of an object program. The representation
given in Subsection 3.1 does not include a representation of the charac-
ters in the symbols of the object language and hence the representation is
not adequate for program transformers that extend or change the object
language.
The language of a Prolog program is determined by the functions and
predicates used in the clauses and goal. Also, names of symbols can be
converted to and from their corresponding lists of ascii codes by means of
predicates such as name/2. Hence, new symbol names can be added to the
representation of the object language. As the naming relation is trivial, this
defines new symbol names for the object languages themselves. However,
as Prolog does not support a ground representation, writing declarative
meta-programs for transforming programs is extremely difficult.
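For instance (an illustrative sketch only; new_symbol/3 is not a predicate of any particular system), a new symbol name can be manufactured in Prolog from an existing name and a numeric suffix:

% new_symbol(Base, N, NewName): construct an atom such as solve_1 from an
% existing atom and an integer, converting through character codes.
new_symbol(Base, N, NewName) :-
    name(Base, BaseCodes),
    name(N, NCodes),
    append(BaseCodes, [0'_ | NCodes], NewCodes),
    name(NewName, NewCodes).

so that, for example, new_symbol(solve, 1, S) binds S to the atom solve_1.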
The Godel system not only includes predicates for adding and deleting
statements, but also for changing the object language. For example, the
system module Programs defines the predicate ProgramConstantName/4
which will convert between a string of characters forming the name of a
constant in an object language and its representation. The module Strings
provides standard string processing predicates and functions. Thus, new
names of symbols required for defining new object languages can be cre-
ated and their representation obtained. Other predicates add and delete
representations of names of symbols and their declarations to and from
the representation of an object language. For example, the predicate
InsertProgramConstant/6 creates a representation of a new program from
the representation [P] of an existing program P by adding the representa-
tion of a constant to [P]. Moreover, the system module Syntax in Godel
provides many predicates for basic meta-programming tasks such as stan-
dardising apart, applying a substitution, and finding an mgu.
In Section 6, the use of meta-programming for program specialization
is illustrated by partial evaluation. This technique uses partial information
about the goal to create a specialized program. It is intended that the
specialized program has the same semantics as the original program (when
called with an appropriate goal) but improved efficiency.

6 Specialization of meta-programs
Meta-level computations involve an overhead for interpreting the repre-
sentation of an object program. The more complex and expressive the
representation, the greater the overhead is likely to be. The ground repre-
sentation, in particular, is associated by many with inefficiency. It has been
shown that the overhead can be "compiled away" for a meta-program that
operates on a given object program. The method for doing this is based on
a program transformation technique called program specialization which
is a large topic in its own right and not limited to meta-programs in its
application. However, the combination of meta-programming and program
specialization appears a particularly fruitful one, and it is this aspect to-
gether with its applications that we discuss in this section.

6.1 Logic program specialization


Let P be a logic program and G a goal. The aim of specialization is to
derive another program P' say, whose computations with G (and instances
of G) give identical results to those given by P. For goals other than G
and its instances, P' may give different results. The restriction of P to G
is exploited to gain efficiency; in other words, P' should be more efficient
than P, with respect to G.
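As a small illustration (ours, in Prolog; map/3, double/2 and map_double/2 are not taken from this chapter), consider specializing a generic mapping program with respect to goals of the form <- map(double, Xs, Ys):

% The original program P: a generic map over lists.
map(_, [], []).
map(P, [X|Xs], [Y|Ys]) :- call(P, X, Y), map(P, Xs, Ys).

double(X, Y) :- Y is 2 * X.

% A possible specialized program P': the call to double/2 has been unfolded
% and the closure argument removed.
map_double([], []).
map_double([X|Xs], [Y|Ys]) :- Y is 2 * X, map_double(Xs, Ys).

For the intended goals, map_double/2 computes the same answers as map/3 but avoids the overhead of call/3 and of passing the closure argument.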
The topic has been studied in a variety of programming languages, and
is often called partial evaluation. A comprehensive treatment of partial
evaluation, mainly for functional languages but with some sections on other
languages, is given by Jones et al. [1993]. Partial evaluation was introduced
into logic programming by Komorowski [1982] (who later called it partial
deduction), and the basic principles and results were established by Lloyd
and Shepherdson [1991]. A recent survey of techniques and results can be
found in [Gallagher, 1993].
Our main interest in specialization is when P, the program to be spe-
cialized, is a meta-program. Suppose either P or the goal G with respect
to which P is to be specialized contains the representation of an object
program. In this case the aim of the specialization of P with respect to G
is to compile away the representation of the object program.
In the following subsections we show the specialization of the Instance-
Demo interpreter and the specialization of a resolution procedure applied to
fixed object programs. The specialization of the Instance-Demo program
(Figure 5) yields clauses syntactically isomorphic to the object language
clauses, and therefore computations using the specialized IDemo/3 are al-
most identical to computations of the object program. The specialization
of resolution yields a program containing low-level predicates manipulating
terms, substitutions and so on. In fact specialization of a resolution inter-
preter is close to true compilation of the object program to a lower-level
target language, and the low-level operations correspond to instructions in

IDemo(p, x, y) <- InstanceOf(x, y) ∧ IDemo1(p, y)

IDemo1(_, True)
IDemo1(p, And(x, y)) <-
    IDemo1(p, x) ∧
    IDemo1(p, y)
IDemo1(p, Not(x)) <-
    ¬IDemo1(p, x)
IDemo1(p, Atom(P(n), xs)) <-
    Member(z, p) ∧ InstanceOf(z, If(Atom(P(n), xs), b)) ∧
    IDemo1(p, b)
Fig. 15. The Instance-Demo interpreter I


IDemo([
    If(Atom(P(0), [Var(0), Term(F(1), [Var(0), Var(1)])]), True),
    If(Atom(P(0), [Var(0), Term(F(1), [Var(1), Var(2)])]),
       Atom(P(0), [Var(0), Var(2)]))], x, y) <-
    InstanceOf(x, y) ∧
    IDemo1(y)

IDemo1(Atom(P(0), [x, Term(F(1), [x, z])]))
IDemo1(Atom(P(0), [x, Term(F(1), [y, z])])) <-
    IDemo1(Atom(P(0), [x, z]))
Fig. 16. The specialized Instance-Demo interpreter

a target language such as Warren Abstract machine instructions.

6.1.1 Specialization of the Instance-Demo interpreter


The Instance-Demo program I in Figure 5 can be partially evaluated with
respect to a given object program. Figure 15 shows again the top level
procedures of this interpreter. When supplied with an object program
P, I can be specialized by partial evaluation, with respect to the goal <-
IDemo([P], x, y). If the object program is the Member program (Figure 2)
then the goal is as follows.

<- IDemo([
       If(Atom(P(0), [Var(0), Term(F(1), [Var(0), Var(1)])]), True),
       If(Atom(P(0), [Var(0), Term(F(1), [Var(1), Var(2)])]),
          Atom(P(0), [Var(0), Var(2)]))],
       x, y)
A suitable partial evaluation of I with respect to this goal gives the result
shown in Figure 16.

The example shows that substantial optimizations are obtainable by
partial evaluation, since the overhead of handling the ground representation
in the original program has been almost eliminated. In other words, compu-
tations with the specialized Instance-Demo program are almost identical to
object-level computations with the object program. In the specialized pro-
gram the clauses for IDemo1/2 are very similar to the clauses for Member/2
in the object program representation, except that the representations of
variables have been replaced by meta-variables. This arises since the calls
to Member(z,p) and InstanceOf(z, If(Atom(P(n),xs),b)) have been com-
pletely unfolded, where p is the representation of the Member program.
Secondly, the first argument of IDemo1/2 containing the object program
has been completely eliminated from the specialized IDemo1/2 predicate
by means of a well-known structure specialization applied by most logic
program specialization systems [Gallagher and Bruynooghe, 1990]. Note
that the Instance-Demo program (and hence also its partial evaluation) is
typed and this ensures that the new meta-variables range over representa-
tions of object terms.
The close correspondence between the specialized Instance-Demo pro-
gram and its original object program is somewhat surprising, and it is
interesting to compare the natural interpretations of the original Instance-
Demo program and the specialized program in Figure 16. In the original
Instance-Demo program the intended domain of interpretation is the set of
object language expressions, and the denotation of a term in the ground
representation is the object term that it represents. On the other hand, in
the specialized program the natural domain of interpretation is (an exten-
sion of) the domain of the Member program.
At first sight it is odd that partial evaluation has the effect of changing
the intended interpretation of the Instance-Demo program as well as spe-
cializing its behaviour. This effect is not so surprising when one considers
that the general Instance-Demo program handles arbitrary (uninterpreted)
programs in the object language, whereas specialization with respect to a
particular object program allows, indeed suggests, a particular interpreta-
tion of the object language. This induces a natural interpretation of the
meta-language representations. More precisely, if the interpretation of a
ground object term t is an object d in the domain of interpretation of the
object language, then [t] in the meta-language will also denote d. For
terms [t] where t is a non-ground object expression a reasonable interpre-
tation of [t] is the set of objects denoted by ground instances of t. Such
an interpretation scheme can be applied to the specialized program in Fig-
ure 16. The interpretation of the specialized IDemo1 predicate in this case
is that IDemo1(Atom(P(0), [ [ t 1 ] , [t2]]) is true iff some ground instance of
Member ( t 1 , t 2 ) is true.
Note that this interpretation gives non-standard, but quite reasonable
interpretations for predicates such as InstanceOf /2, which are now inter-
the domain of object language expressions.
It is clear from the example above that a corresponding set of clauses
for IDemo1/2 could be derived for any object program P. Suppose P is
of form [If(h1, b1), ..., If(hn, bn)], where h1, ..., hn and b1, ..., bn are the
representations of the heads and bodies of the clauses respectively. The spe-
cialized Instance-Demo program contains a set of clauses {IDemo1(h'1) <-
IDemo1(b'1), ..., IDemo1(h'n) <- IDemo1(b'n)}, where each h'i and b'i is ob-
tained from hi and bi respectively by replacing the ground representation
of variables by typed meta-variables.
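A sketch of this construction in Prolog (ours; if/2, v/1, atom/2, term/2, p/1, f/1 and idemo1/1 are lowercase analogues of the representation used above) replaces the encoded variables by fresh meta-variables, shared between head and body:

% melt(T, T1, Map0, Map): replace each encoded variable v(N) in T by a
% Prolog variable, using Map to keep occurrences of the same N shared.
melt(v(N), V, Map, Map) :- member(N-V, Map), !.
melt(v(N), V, Map, [N-V|Map]) :- !.
melt(T, T1, Map0, Map) :-
    T =.. [F|Args],
    melt_args(Args, Args1, Map0, Map),
    T1 =.. [F|Args1].

melt_args([], [], Map, Map).
melt_args([A|As], [A1|A1s], Map0, Map) :-
    melt(A, A1, Map0, Map1),
    melt_args(As, A1s, Map1, Map).

% specialise(Clause, SpecClause): map the representation of one object
% clause to the corresponding clause of the specialized interpreter.
specialise(if(H, true), idemo1(H1)) :- !,
    melt(H, H1, [], _).
specialise(if(H, B), (idemo1(H1) :- idemo1(B1))) :-
    melt(H, H1, [], Map),
    melt(B, B1, Map, _).

Applied to the lowercase encoding of the first Member clause, specialise/2 produces idemo1(atom(p(0), [X, term(f(1), [X, Y])])), matching the shape of the first specialized clause in Figure 16 (modulo the typing of the meta-variables, which this sketch ignores).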
Kowalski [1990] sketched the derivation of a Solve-style interpreter con-
taining non-ground clauses representing an object program, by unfolding a
Demo interpreter. Kowalski's interpreter was similar to the Instance-Demo
interpreter, but contained a set of ground unit clauses representing the ob-
ject program instead of representing the program in an argument of Demo.
The derivation was achieved by unfolding a call to a predicate perform-
ing substitution. The use of types (or corresponding sort predicates) in
the Instance-Demo program is also essential to the process, a point missed
by Kowalski, since otherwise the meta-variables may have unintended in-
stances. Kowalski's argument, and the partial evaluation example above,
suggest the conclusion that the typed non-ground representation of an ob-
ject program is conceptually not far removed from a ground representation.
The well-known limitations of the Solve-style interpreters are a consequence
of the limitations of the Instance-Demo program.
However, as a technique for gaining efficiency in the ground repre-
sentation, the partial evaluation of Instance-Demo is interesting. There
is a superficial connection with the ad hoc implementation technique of
"melting" ground representations by substituting meta-variables in place
of ground representations of variables, in order to increase efficiency of
meta-programming with the ground representation. This was used for ex-
ample in the Logimix self-applicable partial evaluator for Prolog [Mogensen
and Bondorf, 1993].
6.1.2 Specialization of a resolution procedure
More complex and flexible interpreters than Instance-Demo (such as a res-
olution proof procedure) are more difficult to specialize effectively. Gurr
[1993] achieved substantial efficiency improvements by partial evaluation
of a resolution interpreter written for the ground representation in Godel.
This work and others [Kursawe, 1987], [Nilsson, 1993] aimed to show that
specialization of logic program interpreters can produce results similar to
standard techniques for compiling logic programs based on the Warren ab-
stract machine. The connection between specialization and compilation
will be further discussed in Section 6.2. Here we illustrate the idea using
an example of the partial evaluation of resolution in Gurr's system.

Fig. 17. Instantiated call to Resolve

Gurr partially evaluated a predicate Resolve/7 which we have already
used in the SLD-Demo program in Figure 9. It is intended that the first
argument of Resolve/7 represents a formula g; the second, a program clause
c; the third, a substitution s1 to be applied to g; the fifth, the resolvent g1
of g and c (with substitution s1 applied to g); and the fourth, the resulting
substitution s2. The remaining arguments v1 and v2 are indices for renam-
ing variables during standardization apart. We do not show the code for
Resolve/7 used by Gurr, which incorporates standardization apart, unifi-
cation of the goal with the clause head, application of the substitution to
the clause body, and composition of the unifier with the input substitution.
During partial evaluation of the resolution interpreter with a given ob-
ject program, the aim is to generate a separate specialized call to Resolve
for each clause in the object program. Consider the example used in [Gurr,
1993] where the object program contains a clause C:

Figure 17 shows the call to Resolve/7 that is to be partially evaluated. The
ground representation defined in Section 3 is used, where P(0) represents
P, P(1) represents Q, C(0) represents A, F(0) represents F, and V(0), V(1)
represent x and y respectively. Figure 18 shows the result. A new predicate
Resolved/6 has been created and the representation of the clause C (the
second argument of Resolve) has been eliminated. The calls in the body
to UnifyTerms, GetConstant, GetFunction, UnifyVariable, UnifyConstant
and UnifyValue are the operations corresponding to the detailed matching
of the parts of the head of the clause, and are the residual parts of the
full resolution procedure written by Gurr. Note that the variable names
from the clause have also completely disappeared since the standardization
apart has effectively been carried out during partial evaluation. The body
predicates are deliberately named after their analogous Warren abstract
machine (WAM) instructions. This emphasizes the fact that partial eval-
uation with respect to a given object program achieves a similar result to
compilation of the program.
We can compare specialization with the other main approach to improv-

Fig. 18. Partial evaluation of Resolve

ing efficiency in meta-programs, namely "meta-programming facilities",
such as those provided in Godel or Reflective Prolog. Meta-programming
facilities consist of essential meta-programming tools and commonly used
procedures that are carefully coded, and provided to the programmer in
libraries or as built-in procedures in the meta-language. Such facilities are
specific to a given object language and a representation of it. This helps
to avoid unnecessary inefficiency, but does not eliminate the overhead of
using the representation. The advantage of such tools is that once written
they are easily applicable. The disadvantage is that there will always be
applications outside the scope of the given set of meta-programming facil-
ities. For example, if one has to deal with object programs in a different
language, or use a different representation, the built-in facilities may not
be applicable.
Specialization has the advantage of being applicable to any object lan-
guage and representation. Its disadvantage is that it has to be applied to
each meta-program separately, though the Futamura projections discussed
below alleviate this overhead.
Ideally, both approaches to efficiency improvement should be combined.
A meta-programming system should provide some facilities for efficient
handling of standard meta-programming problems, together with program
specialization tools. The latter can gain further efficiency for the standard
procedures, and also help to optimize meta-programs outside the scope of
the standard facilities. This complementary approach to optimizing meta-
programs suggests that built-in procedures should be written in such a way
as to make them more "specializable" [Gurr, 1993], though what this means
in practice is not yet completely understood.

6.2 Specialization and compilation


The specialization of meta-programs is analogous to compilation, and the
relation between executing specialized and unspecialized meta-programs is
analogous to the relation between running compiled and interpreted code.
The comparison with compilation is actually quite extensive and is dis-
cussed further below. A functional notation is used though an equivalent,
but more cumbersome, formulation in logic programming is possible.
Definition 6.2.1. A program specializer written in M for L is a function

    PS_M : [L]_M × [D]_M -> [L]_M,

where [L]_M and [D]_M denote the representations in M of programs of L
and of data values respectively. It is assumed that an object program p_L
in L is a function with two arguments, and that the first argument is known
but not the second. The specializer takes the representations of p_L and of
data x for the first argument of p_L and computes the representation of a
specialized program p^x_L. The defining property of the specializer PS is the
following:

    if PS_M([p_L]_M, [x]_M) = [p^x_L]_M then p^x_L(y) = p_L(x, y) for all y.
Definition 6.2.2. A language interpreter written in M for L is a function

    I_M : [L]_M × [D]_M -> [D]_M.

(We assume that D is the language of input and output for programs in
L.) That is, I_M takes the representations of a program p_L and some data
x_D for p_L and computes the representation of the output, say y_D.
The analogy between compilation and interpreter specialization was
first identified by Futamura [Futamura, 1971]. In the following we formulate
the so-called Futamura projections in such a way as to emphasize meta-
programming aspects.
Definition 6.2.3. First Futamura projection
Let PS_K be a program specializer written in K for M. Let I_M be an
interpreter for programs in L. Then specialization of the interpreter with
respect to a given program p_L is expressed by

    PS_K([I_M]_K, [[p_L]_M]_K) = [I^p_M]_K.

Note that the second argument is a "meta-meta" (or meta²) object in
the sense that it is the representation in K of an object already represented
in M.
The result of the first Futamura projection is the representation of a
specialized program, say [I^p_M]_K. By Definition 6.2.1, I^p_M(x) = I_M(p, x),
so I^p_M preserves the functionality of p. I^p_M can be regarded as a "compiled"
version of p since it maps input directly to output. This analogy becomes
more concrete in practical program specialization systems, which effectively
perform the parsing of the object program, leaving residual "execution"
operations in the specialized program, as in real compilers.
Thus for instance, we could have L = Pascal, M = Scheme and K =
Godel, in which case I is an interpreter for Pascal written in Scheme, and
PS is a specializer of Scheme programs, written in Godel. The second ar-
gument in the first Futamura projection is a Godel term encoding a Scheme
representation of a Pascal program. The result is a Godel representation of
a Scheme program. Note that to run it as a Scheme program involves ex-
tracting the Scheme program from its Godel representation. In summary,
a Pascal program has been "compiled" into Scheme.

6.3 Self-applicable program specializers


We now consider the case where the program specializer is self-applicable.
If K = M, then in principle the program PS_K can be applied to itself by
constructing [PS_K]_K. The possibility of encoding a language in itself was
established by Godel numbering, as discussed in Section 4. It is then possi-
ble to "self-apply" PS, or more accurately, to apply PS to a representation
of itself.
Definition 6.3.1. Second Futamura projection
The specialization of PS with respect to itself and an interpreter I_M is
expressed by

    PS_K([PS_K]_K, [[I_M]_K]_K) = [PS^{I_M}]_K.

Note that the second argument is meta². The result, namely [PS^{I_M}]_K,
is the representation of a program which, when given a program p_L, pro-
duces [I^p_M]_K. This is the representation of a compiled version of p_L, as
established by the first Futamura projection. Thus the second Futamura
projection expresses the production of a compiler from an interpreter for L.
Definition 6.3.2. Third Futamura projection
The specialization of PS with respect to itself and (a representation of)
itself is expressed by

    PS_K([PS_K]_K, [[PS_K]_K]_K) = [PS^{PS}]_K.

Again, the second argument is meta², and is not the same as the first
argument. The result [PS^{PS}]_K is a program which, when given [I_M]_K,
produces [PS^{I_M}]_K. In other words, it returns the representation of a com-
piler of programs in L, as established by the second Futamura projection.
The third Futamura projection thus expresses the production of a compiler
generator (that produces a compiler from a given interpreter).

The second and third projections provide a way of achieving the first
projection in stages. This is useful where the same meta-program is to be
executed with many different object programs. The "compiler" associated
with that meta-program can be obtained using the second projection. If
compilers for different meta-programs are to be produced, then the third
projection is useful since it shows how to obtain a "compiler-generator"
from a partial evaluator.
The effective implementation of the Futamura projections is the subject
of current research. The effectiveness of the first Futamura projection is
critical to the usefulness of the second and third, which are simply means
to achieve the first projection (compilation) by stages. Interpreters, or
indeed any meta-programs, do not appear to be any less complex than
programs in general from the point of view of specialization. Therefore
an effective specializer for the first projection should be also an effective
general purpose specializer.
In order to perform the second and third projections the specializer
should be effectively self-applicable. This requirement, added to the re-
quirement of being a good general purpose specializer, has proved very
difficult to meet. Program analysis methods based on abstract interpre-
tation have been employed to complement partial evaluation and add to
its effectiveness [Sestoft and Jones, 1988], [Mogensen and Bondorf, 1993],
[Gurr, 1993]. Another approach to self-application is to use two or more
versions of a specializer [Ruf and Weise, 1993], [Fujita and Furukawa, 1988].
A simple specializer can be applied to a complex one, or vice versa. In such
a method, extra run-time computation in the compiled program or in the
compiler produced by the projections is traded for less computation during
the second and third Futamura projections respectively.
6.4 Applications of meta-program specialization
The uses of specialization with meta-programming are many, and we finish
this section by indicating some areas of current research in which special-
ization of logic programs is relevant.
Implementing other languages and logics. First order logic, or fragments of
it such as definite clauses or normal clauses, can be used as a meta-language
for defining the semantics of other languages and logics. The Futamura
projections then offer a general compilation mechanism to improve the
computational efficiency of the semantics.
A proof system for a logic L can be constructed from a set of (abstract)
syntax rules for defining expressions of L, together with a set of inference
rules of the form

    α0, ..., αk
    -----------
         β

where α0, ..., αk, β are expressions in L, and β is inferred from α0, ..., αk.
Clearly these rules can be encoded as definite clauses, and the concept of
theory, proof, theorem and so on can be defined by clauses. The procedural
interpretation of clauses can then be used to search for theorems
from given object theories. Such an approach may appear naive as a way
to build efficient theorem proving systems but, when added to techniques
of specialization, practical results are obtainable. The advantage of the
method is its generality.
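As a schematic example of the encoding (ours; imp/2, axiom/1, theorem/2 and the three-axiom theory are purely illustrative), a Hilbert-style object logic whose only inference rule is modus ponens can be written as a definite program:

% theorem(F, D): the object formula F is derivable from the axioms using at
% most D applications of modus ponens (the bound keeps the search finite).
theorem(F, _) :- axiom(F).
theorem(B, D) :- D > 0, D1 is D - 1, theorem(imp(A, B), D1), theorem(A, D1).

% an illustrative object theory
axiom(imp(p, q)).
axiom(imp(q, r)).
axiom(p).

Here the query ?- theorem(r, 2). succeeds. Specializing such a prover with respect to a fixed set of axiom/1 clauses is then an instance of the first Futamura projection, as in the experiment described next.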
One experiment serves to illustrate this general framework. A theorem
prover for first order clauses, based on the model elimination method, was
written as a definite logic meta-program by de Waal and Gallagher [1994].
Here, the object language is full clausal logic, while definite clauses are
the meta-language. The theorem prover was then specialized with respect
to fixed object theories. The result for a given theory was a specialized
theorem prover that can prove theorems only in the given theory, but much
faster than the original prover. Theory-specific information, such as the
uselessness of given object formulas or inference rules in a given proof, can
also be obtained and exploited. This can be seen as an application of the
first Futamura projection, where the theorem prover is an "interpreter" of
clauses.
It appears that this experiment could be repeated for other theorem
provers, since the method assumes only that the theorem prover can be
written as a logic meta-program, an assumption that holds for any com-
putable logic. This provides a general approach to using a uniform meta-
language (logic programs) to implement other logics, and using specializa-
tion to get reasonable efficiency. In similar style, Jones et al. [1993] mention
the possibility of using partial evaluation as a way of improving the effi-
ciency of high-level functional meta-languages for programming language
definition and implementation.
Expert systems and knowledge based systems. A deductive data base or
knowledge base can be viewed as a logic program. Expert systems can also
be included under this heading. Procedures for querying, updating, check-
ing integrity constraints, and similar tasks are thus meta-level procedures.
The potential of using partial evaluation to optimize expert system query
interpreters with respect to fixed bases of knowledge was identified by Levi
and Sardu [1988] and by Sterling and Beer [1989].
Enhanced language interpreters. In logic programming, interpreters for
tracing computations, spying, timing and so on are sometimes written as
"enhanced" versions of a standard or "vanilla" interpreter (see the Proof-
Tree interpreter in Figure 4). That is, the standard interpreter is written
as a meta-program (usually in the non-ground representation) augmented
with operations that record or report the state of the computation at each
step. The interpretation overhead is sometimes heavy, but partial eval-
uation offers a way to reduce this (for a given object program). Safra
and Shapiro [1986] report extensive use of this technique for a concurrent
logic programming system, where facilities such as deadlock detection were
added to a standard interpreter. In their approach the second Futamura
projection was constructed by hand since their partial evaluator was not
self-applicable.
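As an indication of what such an enhancement looks like (ours, in Prolog using the non-ground representation via clause/2, not one of the interpreters of this chapter), a vanilla interpreter can be extended to count resolution steps:

% solve(Goal, Steps): prove Goal against an object program accessible
% through clause/2 (its predicates must be interpreted/dynamic), returning
% in Steps the number of clause resolution steps used.
solve(true, 0) :- !.
solve((A, B), N) :- !, solve(A, NA), solve(B, NB), N is NA + NB.
solve(A, N) :- clause(A, Body), solve(Body, NB), N is NB + 1.

Partially evaluating such an interpreter with respect to a fixed object program removes the interpretation overhead while keeping the step counting.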
Optimizing non-standard procedural semantics. The dual reading of pro-
grams, declarative and procedural, represents one of the most distinctive
features of logic programming. A program with clear declarative seman-
tics can be regarded as a problem specification. Unfortunately, in order to
achieve efficient computations, complex non-standard procedural semantics
are often needed. Examples of these include coroutining, tabulation and
forward checking. The use of such procedural readings is often precluded
since they carry a lot of computational overhead. As a result programs are
often made more efficient (with respect to standard procedural semantics)
at the expense of declarative clarity.
Program specialization appears to offer a general approach to exploit-
ing the dual reading more effectively. The overhead of complex procedural
semantics can sometimes be drastically reduced if an interpreter for the se-
mantics is available, written as a meta-program. In this case the Futamura
projections can be applied to reduce the overhead. This method was advo-
cated by Gallagher [1986] and illustrated on coroutining programs. Gurr
[1993] also shows the compilation of a coroutining example. The work on
"compiling control" [Bruynooghe et al., 1989] is similar in its aims, but is
not based explicitly on meta-programming.
The use of meta-programming combined with specialization as a pro-
gram production technique is in its infancy. Applications such as those
mentioned in this section indicate its generality and promise.

Acknowledgements
We are particularly indebted to John Lloyd and Frank Van Harmelen who
have made many suggestions that have improved this chapter. We greatly
appreciate their help. We also thank all those who, through discussion
and by commenting on earlier drafts contributed to this work. These in-
clude Jonas Barklund, Tony Bowers, Henning Christiansen, Yuejun Jiang,
Bob Kowalski, Bern Martens, Dale Miller, Alberto Pettorossi, Danny De
Schreye, Sten-Ake Tarnlund, and Jiwei Wang.
Work on this chapter has been carried out while the first author was
supported by a SERC grant GR/H/79862. In addition, the ESPRIT Basic
Research Action 3012 (Compulog) and Project 6810 (Compulog 2) have
provided opportunities to develop our ideas and understanding of this area
by supporting many workshops and meetings and providing the funds to
attend.

References
[Abramson and Rogers, 1989] H. Abramson and M. Rogers, eds. Meta-
Programming in Logic Programming, MIT Press, 1989. Proceedings of
the Meta88 Workshop.
[Aiello et al., 1988] L. C. Aiello, D. Nardi and M. Schaerf. Reasoning about
knowledge and ignorance. In Proceedings of the FGCS, pp. 618-627,
1988.
[Barklund, 1995] J. Barklund. Metaprogramming in logic, Technical Re-
port UPMAIL 80, Department of Computer Science, University of Upp-
sala, Sweden. Also in Encyclopedia of Computer Science and Technology,
Vol. 33, A. Kent and J. G. Williams (eds.), Marcel Dekker, 1995.
[Barklund and Hamfelt, 1994] J. Barklund and A. Hamfelt. Hierarchical
representation of legal knowledge with meta-programming in logic, Jour-
nal of Logic Programming 18(1), 55-80, 1994.
[Barklund et al., 1995] J. Barklund, K. Boberg, P. Dell'Aqua and M.
Veanes. Meta-programming with theory systems. In K. Apt and
F. Turini, eds, Meta-programming in Logic Programming, MIT Press,
1995.
[Bowen and Kowalski, 1982] K. Bowen and R. A. Kowalski. Amalgamating
language and metalanguage in logic programming. In K. Clark & S.-A.
Tarnlund, eds, Logic Programming, pp. 153-172. Academic Press, 1982.
[Bowen and Weinberg, 1985] K. Bowen and T. Weinberg. A meta-level ex-
tension of Prolog. In Proceedings of 2nd IEEE Symposium on Logic Pro-
gramming, Boston, pp. 669-675. Computer Society Press, 1985.
[Brogi et al., 1990] A. Brogi, P. Mancarella, D. Pedreschi and F. Turini.
Composition operators for logic theories. In J. W. Lloyd, ed., Computa-
tional Logic, Springer-Verlag, pp. 117-134, 1990.
[Brogi et al., 1992] A. Brogi, P. Mancarella, D. Pedreschi and F. Turini.
Meta for modularising logic programming. In A. Pettorossi, ed., Meta-
Programming in Logic, Proceedings of the 3rd International Workshop,
Meta-92, Uppsala, Sweden, pp. 105-119. Springer-Verlag, 1992.
[Bruynooghe et al., 1989] M. Bruynooghe, D. De Schreye and B. Krekels.
Compiling control, Journal of Logic Programming 6, 135-162, 1989.
[Chen et al., 1993] W. Chen, M. Kifer and D. S. Warren. HiLog: A founda-
tion for higher-order logic programming, Journal of Logic Programming
15, 187-230, 1993.
[Christiansen, 1994] H. Christiansen. On proof predicates in logic program-
ming, In A. Momigliano and M. Ornaghi, eds, Proof-Theoretical Exten-
sions of Logic Programming, CMU, Pittsburgh, PA 15213-3890, USA,
1994. Proceedings of an ICLP-94 Post-Conference Workshop.
[Clark, 1978] K. L. Clark. Negation as failure. In H. Gallaire and J. Minker,
eds, Logic and Data Bases, Plenum Press, pp. 293-322, 1978.

[Clark and McCabe, 1979] K. L. Clark and F. G. McCabe. The control
facilities of IC-PROLOG. In D. Michie, ed., Expert Systems in the Micro
Electronic Age, Edinburgh University Press, pp. 122-149, 1979.
[Colmerauer et al., 1973] A. Colmerauer, H. Kanoui, R. Pasero and P.
Roussel. Un système de communication homme-machine en français,
Technical report, Groupe d'Intelligence Artificielle, Université d'Aix-
Marseille II, Luminy, France, 1973.
[Costantini, 1990] S. Costantini. Semantics of a metalogic programming
language. In M. Bruynooghe, ed., Proceedings of the 2nd Workshop on
Meta-programming in Logic, Katholieke Universiteit, Leuven, Belgium,
pp. 3-18, 1990.
[Costantini and Lanzarone, 1989] S. Costantini and G. Lanzarone. A met-
alogic programming language. In G. Levi and M. Martelli, eds, 6th In-
ternational Conference on Logic Programming, Lisbon, pp. 218-233. The
MIT Press, Cambridge, MA, 1989.
[De Waal and Gallagher, 1994] D. De Waal and J. Gallagher. The applica-
bility of logic program analysis and transformation to theorem proving.
In Proceedings of the 12th International Conference on Automated De-
duction (CADE-12), Nancy, 1994.
[Dunin-Keplicz, 1994] B. Dunin-Keplicz. An architecture with multiple
meta-levels for the development of correct programs. In F. Turini, ed.,
Proceedings of the 4th International Workshop on Meta-Programming in
Logic (Meta-94), 1994. to be published by Springer-Verlag.
[Feferman, 1962] S. Feferman. Transfinite recursive progressions of ax-
iomatic theories, The Journal of Symbolic Logic 27(3), 259-316, 1962.
[Fujita and Furukawa, 1988] H. Fujita and K. Furukawa. A self-applicable
partial evaluator and its use in incremental compilation, New Generation
Computing 6(2,3), 91-118, 1988.
[Futamura, 1971] Y. Futamura. Partial evaluation of computation
process—an approach to a compiler-compiler, Systems, Computers,
Controls 2, 45-50, 1971.
[Gallagher, 1986] J. Gallagher. Transforming logic programs by specialis-
ing interpreters. In Proceedings of the 7th European Conference on Arti-
ficial Intelligence (ECAI-86), Brighton, pp. 109-122, 1986.
[Gallagher, 1993] J. Gallagher. Specialisation of logic programs: A tutorial.
In ACM-SIGPLAN Symposium on Partial Evaluation and Semantics
Based Program Manipulation, Copenhagen, pp. 88-98, 1993.
[Gallagher and Bruynooghe, 1990] J. Gallagher and M. Bruynooghe. Some
low-level source transformations for logic programs. In M. Bruynooghe,
ed., Proceedings of the 2nd Workshop on Meta Programming in Logic,
Katholieke Universiteit Leuven, Belgium, pp. 229-244, 1990.
[Gardner and Shepherdson, 1991] P. Gardner and J. Shepherdson. Un-
fold/fold transformations of logic programs. In J.-L. Lassez & G. Plotkin,
eds, Computational Logic: Essays in Honor of Alan Robinson, MIT
Press, pp. 565-583, 1991.
[Godel, 1931] K. Gödel. Über formal unentscheidbare Sätze der Principia
Mathematica und verwandter Systeme I. Monatsh. Math. Phys. 38, 173-
198, 1931. English translation in From Frege to Gödel, J. van Heijenoort,
ed. pp. 592-617. Harvard University Press, 1967.
[Gurr, 1993] C. Gurr. A Self-applicable Partial Evaluator for the Logic
Programming Language Godel, PhD thesis, Dept. of Computer Science,
University of Bristol, 1993.
[Hannan and Miller, 1989] J. Hannan and D. Miller. A meta-logic for
functional programming. In H. Abramson & M. Rogers, eds, Meta-
Programming in Logic Programming, MIT Press, chapter 24, pp. 453-
476, 1989.
[Hannan & Miller, 1992] J. Hannan and D. Miller. From operational se-
mantics to abstract machines, Mathematical Structures in Computer Sci-
ence 2(4), 415-459, 1992.
[Hill and Lloyd, 1988] P. Hill and J. Lloyd. Meta-programming for dy-
namic knowledge bases, Technical Report CS-88-18, Department of Com-
puter Science, University of Bristol, 1988.
[Hill and Lloyd, 1989] P. Hill and J. Lloyd. Analysis of meta-programs. In
H. Abramson and M. Rogers, eds, Meta-Programming in Logic Program-
ming, MIT Press, pp. 23-52, 1989. Proceedings of the Meta88 Workshop,
June 1988.
[Hill and Lloyd, 1994] P. Hill and J. Lloyd. The Godel Programming Lan-
guage, MIT Press, 1994.
[ISO/IEC, 1995] ISO/IEC 13211-1: 1995. Information Technology — Pro-
gramming Languages — Part 1: General Core. International Standards
Organisation, 1995.
[Jiang, 1994] Y. Jiang. Ambivalent logic as the semantic basis of meta-
logic programming: I. In P. V. Hentenryck, ed., Proceedings of the 11th
International Conference on Logic Programming, MIT Press, 1994 .
[Jones et al., 1993] N. Jones, C. Gomard and P. Sestoft. Partial Evaluation
and Automatic Software Generation, Prentice Hall, 1993.
[Kanamori and Horiuchi, 1987] T. Kanamori and K. Horiuchi. Construc-
tion of logic programs based on generalised unfold/fold rules. In J.-L.
Lassez, ed., Proceedings of the Fourth International Conference on Logic
Programming, 1987.
[Kim and Kowalski, 1990] J. S. Kim and R. A. Kowalski. An application
of amalgamated logic to multi-agent belief. In M. Bruynooghe, ed., Pro-
ceedings of the 2nd Workshop on Meta Programming in Logic, Katholieke
Universiteit Leuven, Belgium, pp. 272-283, 1990.
[Komorowski, 1982] H. Komorowski. Partial evaluation as a means for in-
ferencing data structures in an applicative language: A theory and imple-
mentation in the case of Prolog. In 9th ACM Symposium on Principles of
Programming Languages, Albuquerque, New Mexico, pp. 255-267, 1982.
[Kowalski, 1990] R. A. Kowalski. Problems and promises of computational
logic. In J. W. Lloyd, ed., Computational Logic, Springer-Verlag, pp. 1-
36, 1990.
[Kowalski, 1994] R. A. Kowalski. Logic without model theory. In D. M.
Gabbay, ed., What is a Logical system?, Oxford University Press, 1994.
[Kursawe, 1987] P. Kursawe. How to invent a Prolog machine. New Gen-
eration Computing 5, 97-114, 1987.
[Levi and Sardu, 1988] G. Levi and G. Sardu. Partial evaluation of
metaprograms in a 'multiple worlds' logic language. New Generation
Computing 6, 227-247, 1988.
[Lloyd, 1987] J. Lloyd. Foundations of Logic Programming, second edn,
Springer-Verlag, 1987.
[Lloyd and Shepherdson, 1991] J. Lloyd and J. Shepherdson. Partial eval-
uation in logic programming. The Journal of Logic Programming
11(3&4), 217-242, 1991.
[Martens and De Schreye, 1992a] B. Martens and D. De Schreye. Why un-
typed non-ground meta-programming is not (much of) a problem, Tech-
nical Report CW 159, Department of Computer Science, Katholieke Uni-
versiteit Leuven, 1992. An abridged version will be published in the Journal
of Logic Programming.
[Martens and De Schreye, 1992b] B. Martens and D. De Schreye. A perfect
Herbrand semantics for untyped vanilla meta-programming. In K. Apt,
ed., Proceedings of the Joint International Conference on Logic Program-
ming, Washington, USA, pp. 511-525, 1992.
[Miller and Nadathur, 1987] D. Miller and G. Nadathur. A logic program-
ming approach to manipulating formulas and programs. In S. Haridi,
ed., IEEE Symposium on Logic Programming, San Francisco, pp. 379-
388, 1987.
[Miller and Nadathur, 1995] D. Miller and G. Nadathur. Higher-order logic
programming. In D. Gabbay, C. Hogger and J. Robinson, eds, Handbook
of Logic in Artificial Intelligence and Logic Programming, Volume V:
Deduction Methodologies, Oxford University Press, 1995.
[Mogensen and Bondorf, 1993] T. Mogensen and A. Bondorf. Logimix: A
self-applicable Partial Evaluator for Prolog. In K.-K. Lau & T. Clement,
eds, Logic Program Synthesis and Transformation, Manchester 1992,
Springer-Verlag, 1993.
[Nait and Abdallah, 1987] M. Nait and Abdallah. Logic programming with
ions, In T. Ottmann, ed., Proceedings of the 14th International Collo-
quium on Automata, Languages, and Programming, LNCS 267, pp. 11-
20, 1987.
496 P. M. Hill and J. Gallagher

[Nilsson, 1993] U. Nilsson. Towards a methodology for the design of ab-


stract machines for logic programming languages. Journal of Logic Pro-
gramming 16, 163-189, 1993.
[Pereira et al, 1978] L. Pereira, F. Pereira and D. Warren. User's guide to
DECsystem-10 Prolog, Technical report, Department of A.I., University
of Edinburgh, 1978.
[Perlis and Subrahmanian, 1994] D. Perlis and V. S. Subrahmanian. Meta-
languages, reflection principles, and self-reference. In D. Gabbay, C. Hog-
ger & J. Robinson, eds, Handbook of Logic in Artificial Intelligence and
Logic Programming, Volume II: Deduction Methodologies, Oxford Uni-
versity Press, 1994.
[Quine, 1951] W. Quine. Mathematical Logic (Revised Edition), Harvard
University Press, 1951.
[Richards, 1974] B. Richards. A point of self-reference. Synthese, 1974.
[Ruf and Weise, 1993] E. Ruf and D. Weise. On the specialization of online
program specializers. Journal of Functional Programming 33, 251-281,
1993.
[Safra and Shapiro, 1986] S. Safra and E. Shapiro. Meta interpreters for
real. in In H.-J. Kugler, ed., Information Processing 86, North Holland,
pp. 271-278, 1986.
[Seki, 1989] H. Seki. Unfold/Fold Transformation of Stratified Programs,
In G. Levi & M. Martelli, eds, 6th International Conference on Logic
Programming, Lisbon, Portugal, The MIT Press, Cambridge, MA, 1989.
[Sestoft and Jones, 1988] P. Sestoft and N. Jones. The structure of a self-
applicable partial evaluator. In H. Ganzinger and N. Jones, eds, Programs
as Data Objects, North-Holland, 1988.
[Shapiro, 1982] E. Shapiro. Algorithmic Program Debugging, MIT Press,
1982. An ACM Distinguished Dissertation.
[Shepherdson, 1994] J. Shepherdson. Negation as failure: Completion and
stratification, In D. Gabbay, C. Hogger & J. Robinson, eds, Handbook of
Logic in Artificial Intelligence and Logic Programming, Volume V: Logic
Programming, Oxford University Press, 1995.
[Sterling and Beer, 1986] L. Sterling and R. Beer. Incremental flavor-
mixing of meta-interpreters for expert system construction. In Proceed-
ings of the 3rd Symposium on Logic Programming, Salt Lake City, pp. 20-
27, 1986.
[Sterling and Beer, 1989] L. Sterling and R. Beer. Metainterpreters for ex-
pert system construction. Journal of Logic Programming 6, 163-178,
1989.
[Sterling and Shapiro, 1986] L. Sterling and E. Shapiro. The Art of Prolog,
MIT Press, 1986.
[Takeuchi and Furukawa, 1986] A. Takeuchi and K. Furukawa. Partial
Meta-Programming in Logic Programming 497

evaluation of Prolog programs and its application to meta programming.


In H.-J. Kugler, ed., Information Processing 86, Dublin, North Holland,
pp. 415-420, 1986.
[Tamaki and Sato, 1984] H. Tamaki and T. Sato. Unfold/fold transforma-
tions of logic programs, In Proceedings of the 2nd International Confer-
ence on Logic Programming, Uppsala, Sweden, pp. 127-138, 1984 .
[Van Harmelen, 1992] F. Van Harmelen. Definable naming relations in
metalevel systems, In A.Pettorossi, ed., Meta-Programming in Logic,
Proceedings of the 3rd International Workshop, META-92, Springer-
Verlag, 1992.
[Weyhrauch, 1980] R. Weyhrauch. Prolegomena to a theory of mechanised
formal reasoning, Artificial Intelligence 13, 133-170, 1980.
[Weyhrauch, 1982] R. Weyhrauch. An example of FOL using Metatheory.
Formalizing reasoning systems and introducing derived inference rules.
In Proceedings of the 6th Conference on Automatic Deduction, 1982.
This page intentionally left blank
Higher-Order Logic Programming
Gopalan Nadathur and Dale Miller

Contents
1 Introduction 500
2 A motivation for higher-order features 502
3 A higher-order logic 510
3.1 The language 510
3.2 Equality between terms 513
3.3 The notion of derivation 517
3.4 A notion of models 519
3.5 Predicate variables and the subformula property 522
4 Higher-order Horn clauses 523
5 The meaning of computations 528
5.1 Restriction to positive terms 529
5.2 Provability and operational semantics 534
6 Towards a practical realization 537
6.1 The higher-order unification problem 538
6.2 P-derivations 541
6.3 Designing an actual interpreter 546
7 Examples of higher-order programming 549
7.1 A concrete syntax for programs 549
7.2 Some simple higher-order programs 552
7.3 Implementing tactics and tacticals 556
7.4 A comparison with functional programming 560
8 Using λ-terms as data structures 561
8.1 Implementing an interpreter for Horn clauses 563
8.2 Dealing with functional programs as data 565
8.3 A limitation of higher-order Horn clauses 572
9 Hereditary Harrop formulas 574
9.1 Universal quantifiers and implications in goals 574
9.2 Recursion over structures with binding 577
10 Conclusion 584

1 Introduction
Modern programming languages such as Lisp, Scheme and ML permit pro-
cedures to be encapsulated within data in such a way that they can subse-
quently be retrieved and used to guide computations. The languages that
provide this kind of an ability are usually based on the functional pro-
gramming paradigm, and the procedures that can be encapsulated in them
correspond to functions. The objects that are encapsulated are, therefore,
of higher-order type and so also are the functions that manipulate them.
For this reason, these languages are said to allow for higher-order pro-
gramming. This form of programming is popular among the users of these
languages and its theory is well developed.
The success of this style of encapsulation in functional programming
makes it natural to ask if similar ideas can be supported within the logic
programming setting. Noting that procedures are implemented by predi-
cates in logic programming, higher-order programming in this setting would
correspond to mechanisms for encapsulating predicate expressions within
terms and for later retrieving and invoking such stored predicates. At least
some devices supporting such an ability have been seen to be useful in prac-
tice. Attempts have therefore been made to integrate such features into
Prolog (see, for example, [Warren, 1982]), and many existing implemen-
tations of Prolog provide for some aspects of higher-order programming.
These attempts, however, are unsatisfactory in two respects. First, they
have relied on the use of ad hoc mechanisms that are at variance with the
declarative foundations of logic programming. Second, they have largely
imported the notion of higher-order programming as it is understood within
functional programming and have not examined a notion that is intrinsic
to logic programming.
In this chapter, we develop the idea of higher-order logic programming
by utilizing a higher-order logic as the basis for computing. There are, of
course, many choices for the higher-order logic that might be used in such
a study. If the desire is only to emulate the higher-order features found in
functional programming languages, it is possible to adopt a "minimalist"
approach, i.e., to consider extending the logic of first-order Horn clauses—
the logical basis of Prolog—in as small a way as possible to realize the
additional functionality. The approach that we adopt here, however, is
to enunciate a notion of higher-order logic programming by describing an
analogue of Horn clauses within a rich higher-order logic, namely, Church's
simple theory of types [Church, 1940]. Following this course has a number
of pleasant outcomes. First, the extension to Horn clause logic that results
from it supports, in a natural fashion, the usual higher-order program-
ming capabilities that are present in functional programming languages.
Second, the particular logic that is used provides the elegant mechanism
of λ-terms as a means for constructing descriptions of predicates and this
turns out to have benefits at a programming level. Third, the use of a
higher-order logic blurs the distinction between predicates and functions—
predicates correspond, after all, to their characteristic functions—and this
makes it natural both to quantify over functions and to extend the mecha-
nisms for constructing predicate expressions to the full class of functional
expressions. As a consequence of the last aspect, our higher-order exten-
sion to Horn clauses contains within it a convenient means for representing
and manipulating objects whose structures incorporate a notion of binding.
The abilities that it provides in this respect are not present in any other
programming paradigm, and, in general, our higher-order Horn clauses
lead to a substantial enrichment of the notion of computation within logic
programming.
The term "higher-order logic" is often a source of confusion and so it
is relevant to explain the sense in which it is used here. There are at least
three different readings for this term:
1. Philosophers of mathematics usually divide logic into first-order logic
and second-order logic. The latter is a formal basis for all of mathe-
matics and, as a consequence of Godel's first incompleteness theorem,
cannot be recursively axiomatized. Thus, higher-order logic in this
sense is basically a model theoretic study [Shapiro, 1985].
2. To a proof theorist, all logics correspond to formal systems that are
recursively presented and a higher-order logic is no different. The
main distinction between a higher-order and a first-order logic is
the presence in the former of predicate variables and comprehen-
sion, i.e., the ability to form abstractions over formula expressions.
Cut-elimination proofs for higher-order logics differ qualitatively from
those for first-order logic in that they need techniques such as Girard's
"candidats de reductibilite," whereas proofs in first-order logics can
generally be done by induction [Girard et al., 1989]. Semantic argu-
ments can be employed in this setting, but general models (including
non-standard models) in the sense of Henkin [Henkin, 1950] must be
considered.
3. To many working in automated deduction, higher-order logic refers to
any computational logic that contains typed λ-terms and/or variables
of some higher-order type, although not necessarily of predicate type.
Occasionally, such a logic may incorporate the rules of λ-conversion,
and then unification of expressions would have to be carried out rel-
ative to these rules.
Clearly, it is not sensible to base a programming language on a higher-order
logic in the first sense and we use the term only in the second and third
senses here. Note that these two senses are distinct. Thus, a logic can be a
higher-order one in the second sense but not in the third: there have been
proposals for adding forms of predicate quantification to computational
logics that do not use λ-terms and in which the equality of expressions
continues to be based on the identity relation. One such proposal appears
in [Wadge, 1991]. Conversely, a logic that is higher-order in the third
sense may well not permit a quantification over predicates and, thus, may
not be higher-order in the second sense. An example of this kind is the
specification logic that underlies the Isabelle proof system [Paulson, 1990].
Developing a theory of logic programming within higher-order logic has
another fortunate outcome. The clearest and strongest presentations of
the central results regarding higher-order Horn clauses require the use of
the sequent calculus: resolution and model theory based methods that are
traditionally used in the analysis of first-order logic programming are either
not useful or not available in the higher-order context. It turns out that
the sequent calculus is an apt tool for characterizing the intrinsic role of
logical connectives within logic programming and a study such as the one
undertaken here illuminates this fact. This observation is developed in a
more complete fashion in [Miller et al., 1991] and [Miller, 1994] and in the
chapter by Loveland and Nadathur in this volume of the Handbook.

2 A motivation for higher-order features


We are concerned in this chapter with providing a principled basis for
higher-order features in logic programming. Before addressing this con-
cern, however, it is useful to understand the higher-order additions to this
paradigm that are motivated by programming considerations. We explore
this issue in this section by discussing possible enhancements to the logic of
first-order Horn clauses that underlies usual logic programming languages.
Horn clauses are traditionally described as the universal closures of dis-
junctions of literals that contain at most one positive literal. They are
subdivided into positive Horn clauses that contain exactly one positive lit-
eral and negative Horn clauses that contain no positive literals. This form
of description has its origins in work on theorem-proving within classi-
cal first-order logic, a realm from which logic programming has evolved.
Clausal logic has been useful within this context because its simple syn-
tactic structure simplifies the presentation of proof procedures. Its use is
dependent on two characteristics of classical first-order logic: the ability to
convert arbitrary formulas to sets of clauses that are equivalent from the
perspective of unsatisfiability, and the preservation of clausal form under
logical operations like substitution and resolution. However, these prop-
erties do not hold in all logics that might be of interest. The conversion
to clausal form is, for instance, not possible within the framework of intu-
itionistic logic. In a similar vein, substitution into a disjunction of literals
may not produce another clause in a situation where predicates are per-
mitted to be variables; as we shall see in Subsection 3.5, substitution for
a predicate variable has the potential to change the top-level logical struc-
ture of the original formula. The latter observation is especially relevant
in the present context: a satisfactory higher-order generalization of Horn
clauses must surely allow for predicates to be variables and must therefore
be preceded by a different view of Horn clauses themselves.
Fortunately there is a description of Horn clauses that is congenial to
their programming application and that can also serve as the basis for
a higher-order generalization. Let A be a syntactic variable for an atomic
formula in first-order logic. Then we may identify the classes of (first-order)
G- and D-formulas by the following rules:

G ::= A | G ∧ G | G ∨ G | ∃x G
D ::= A | G ⊃ A | ∀x D

These formulas are related to Horn clauses in the following sense: within
the framework of classical logic, the negation of a G-formula is equivalent
to a set of negative Horn clauses and, similarly, a D-formula is equivalent
to a set of positive Horn clauses. We refer to the D-formulas as definite
clauses, an alternative name in the literature for positive Horn clauses, and
to the G-formulas as goal formulas because they function as goals within
the programming paradigm of interest. These names in fact motivate the
symbol chosen to denote members of the respective classes of formulas.
The programming interpretation of these formulas is dependent on
treating a collection of closed definite clauses as a program and a closed
goal formula as a query to be evaluated; definite clauses and goal formulas
are, for this reason, also called program clauses and queries, respectively.
The syntactic structures of goal formulas and definite clauses are relevant
to their being interpreted in this fashion. The matrix of a definite clause
is a formula of the form A or G ⊃ A, and this is intended to correspond to
(part of) the definition of a procedure whose name is the predicate head
of A. Thus, an atomic goal formula that "matches" with A may be solved
immediately or by solving the corresponding instance of G, depending on
the case of the definite clause. The outermost existential quantifiers in a
goal formula (which are usually left implicit) are interpreted as a request
to find values for the quantified variables that produce a solvable instance
of the goal formula. The connectives ∧ and ∨ that may appear in goal
formulas typically have search interpretations in the programming context.
The first connective specifies an AND branch in a search, i.e. it is a request
to solve both components of a conjoined goal. The goal formula G1 ∨ G2
represents a request to solve either G1 or G2 independently, and ∨ is thus
a primitive for specifying OR branches in a search. This primitive is not
provided for in the traditional description of Horn clauses, but it is nev-
ertheless present in most implementations of Horn clause logic. Finally, a
search related interpretation can also be accorded to the existential quan-
tifier. This quantifier can be construed as a primitive for specifying an
infinite OR branch in a search, where each branch is parameterized by the
choice of substitution for the quantified variable.¹

¹ Our concern, at the moment, is only with the programming interpretation of the
logical symbols. It is a nontrivial property about Horn clauses that this programming
interpretation of these symbols is compatible with their logical interpretation. For in-
stance, even if we restrict our attention to classical or intuitionistic logic, the provability
of a formula of the form ∃x F from a set of formulas Γ does not in general entail the
existence of an instance of ∃x F that is provable from Γ.

We illustrate some of the points discussed above by considering a pro-
gram that implements the "append" relation between three lists. The first
question to be addressed in developing this program is that of the repre-
sentation to be used for lists. A natural choice for this is one that uses
the constant nil for empty lists and the binary function symbol cons to
construct a list out of a "head" element and a "tail" that is a list. This
representation makes use of the structure of first-order terms and is symp-
tomatic of another facet of logic programming: the terms of the underlying
logic provides the data structures of the resulting programming language.
Now, using this representation of lists, a program for appending lists is
given by the following definite clauses:
∀L append(nil, L, L),
∀X ∀L1 ∀L2 ∀L3 (append(L1, L2, L3) ⊃
        append(cons(X, L1), L2, cons(X, L3))).
We assume that append in the definite clauses above is a predicate name,
coincidental, in fact, with the name of the procedure defined by the clauses.
The declarative intent of these formulas is apparent from their logical in-
terpretation: the result of appending the empty list to any other list is the
latter list and the result of appending a list with head X and tail L1 to
(another list) L2 is a list with head X and tail L3, provided L3 is the result
of appending L1 and L2. From a programming perspective, the two definite
clauses have the following import: The first clause defines append as a pro-
cedure that succeeds immediately if its first argument can be matched with
nil; the precise notion of matching here is, of course, first-order unification.
The second clause pertains to the situation where the first argument is a
non-empty list. To be precise, it stipulates that one way of solving an
"append" goal whose arguments unify with the terms cons(X, L 1 ), L2 and
cons(X, L3) is by solving another "append" goal whose arguments are the
relevant instantiations of L1, L2 and L3.
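The same definition can also be transcribed into a curried, λProlog-like concrete syntax similar to the one introduced later in this chapter (Section 7.1); the rendering below is only a sketch, and it assumes that nil, cons and append have been given suitable declarations.

    append nil L L.
    append (cons X L1) L2 (cons X L3) :- append L1 L2 L3.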
The definition of the append procedure might now be used to answer
relevant questions about the relation of appending lists. For example, sup-
pose that a, b, c and d are constant symbols. Then, a question that might
be of interest is the one posed by the following goal formula:
∃L append(cons(a, cons(b, nil)), cons(c, cons(d, nil)), L).
Consistent with the interpretation of goal formulas, this query corresponds
to a request for a value for the variable L that makes
append(cons(a, cons(b, nil)), cons(c, cons(d, nil)), L)
a solvable goal formula. A solution for this goal formula may be sought
by using the procedural interpretation of append, leading to the conclusion
that the only possible answer to it is the value
cons(a, cons(b, cons(c, cons(d, nil))))
for L. A characteristic of this query is that the "solution path" and the
final answer for it are both deterministic. This is not, however, a necessary
facet of every goal formula. As a specific example, the query
∃L1 ∃L2 append(L1, L2, cons(a, cons(b, cons(c, cons(d, nil)))))
may be seen to have five different answers, each of which is obtained by
choosing differently between the two clauses for append in the course of a
solution. This aspect of nondeterminism is a hallmark of logic programming
and is in fact a manifestation of its ability to support the notion of search
in a primitive way.
Our current objective is to expose possible extensions to the syntax of
the first-order G- and D-formulas that permit higher-order notions to be
realized within logic programming. One higher-order ability that is cov-
eted in programming contexts is that of passing procedures as arguments
to other procedures. The above discussion makes apparent that the mech-
anism used for parameter passing in logic programming is unification and
that passing a value to a procedure in fact involves the binding of a vari-
able. A further point to note is that there is a duality in logic programming
between predicates and procedures; the same object plays one or the other
of these roles, depending on whether the point-of-view is logical or that of
programming. From these observations, it is clear that the ability to pass
procedures as arguments must hinge on the possibility of quantifying over
predicates.
Predicate quantification is, in fact, capable of providing at least the sim-
plest of programming abilities available through higher-order programming.
The standard illustration of the higher-order capabilities of a language is
the possibility of defining a "mapping" function in it. Such "functions"
can easily be defined within the paradigm being considered with the pro-
vision of predicate variables. For example, suppose that we wish to define
a function that takes a function and a list as arguments and produces a
new list by applying the given function to each element of the former list.
Given the relational style of programming prevalent in logic programming,
such a function corresponds to the predicate mappred that is defined by
the following definite clauses:
∀P mappred(P, nil, nil),
∀P ∀L1 ∀L2 ∀X ∀Y ((P(X, Y) ∧ mappred(P, L1, L2)) ⊃
        mappred(P, cons(X, L1), cons(Y, L2))).
The representation that is used for lists here is identical to the one described
in the context of the append program. The clauses above involve a quantifi-
cation over P which is evidently a predicate variable. This variable can be
instantiated with the name of an actual procedure (or predicate) and would
lead to the invocation of that procedure (with appropriate arguments) in
the course of evaluating a mappred query. To provide a particular example,
let us assume that our program also contains a list of clauses defining the
ages of various individuals, such as the following:
age(bob, 24),
age(sue, 23).
The procedure mappred can then be invoked with age and a list of indi-
viduals as arguments and may be expected to return a corresponding list
of ages. For instance, the query
∃L mappred(age, cons(bob, cons(sue, nil)), L)
should produce as an answer the substitution cons(24, cons(23, nil)) for L.
Tracing a successful solution path for this query reveals that, in the course
of producing an answer, queries of the form age(bob, Y1) and age(sue, Y2)
have to be solved with suitable instantiations for Y1 and Y2.
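In the concrete syntax sketched earlier for append, the mappred program and the age facts might be rendered as follows; as before, the declarations of the constants involved are assumed rather than shown.

    mappred P nil nil.
    mappred P (cons X L1) (cons Y L2) :- P X Y, mappred P L1 L2.

    age bob 24.
    age sue 23.

The query  mappred age (cons bob (cons sue nil)) L  should then bind L to (cons 24 (cons 23 nil)).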
The above example involves an instantiation of a simple kind for pred-
icate variables—the substitution of a name of a predicate. A question to
consider is if it is useful to permit the instantiation of such variables with
more complex predicate terms. One ability that seems worth while to sup-
port is that of creating new relations by changing the order of arguments
of given relations or by projecting onto some of their arguments. There are
several programming situations where it is necessary to have "views" of a
given relation that are obtained in this fashion, and it would be useful to
have a device that permits the generation of such views without extensive
additions to the program. The operations of abstraction and application
that are formalized by the λ-calculus provide for such a device. Consider,
for example, a relation that is like the age relation above, except that it has
its arguments reversed. Such a relation can be represented by the predicate
term λX λY age(Y, X). As another example, the expression λX age(X, 24)
creates from age a predicate that represents the set of individuals whose
age is 24.
An argument can thus be made for enhancing the structure of terms
in the language by including the operations of abstraction and application.
The general rationale is that it is worth while to couple the ability to
treat predicates as values with devices for creating new predicate valued
terms. Now, there are mechanisms for combining predicate terms, namely
the logical connectives and quantifiers, and the same argument may be
advanced for including these as well. To provide a concrete example, let us
assume that our program contains the following set of definite clauses that
define the "parent" relation between individuals:
parent(bob, john),
parent(john, mary),
parent(sue, dick),
parent(dick, kate).
One may desire to create a grandparent relation based on these clauses.
This relation is, in fact, implemented by the term
λX λY ∃Z (parent(X, Z) ∧ parent(Z, Y)).
An existential quantifier and a conjunction are used in this term to "join"
two relations in obtaining a new one. Relations of this sort can be used in
meaningful ways in performing computations. For example, the query
∃L mappred(λX λY ∃Z (parent(X, Z) ∧ parent(Z, Y)),
        cons(bob, cons(sue, nil)), L)
illustrates the use of the relation shown together with the mappred predi-
cate in asking for the grandparents of the individuals in a given list.
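In the same concrete syntax, and assuming that x\ denotes abstraction of x and that sigma is the syntax for an existential quantifier within a term, this query might be written as

    mappred (x\ y\ sigma z\ (parent x z, parent z y))
            (cons bob (cons sue nil)) L.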
Assuming that we do allow logical symbols to appear in terms, it is
relevant to consider whether the occurrences of these symbols should be
restricted in any way. The role that these symbols are intended to play
eventually indicates the answer to this question. Predicate terms are to
instantiate predicate variables that get invoked as queries after being sup-
plied with appropriate arguments. The logical connectives and quantifiers
that appear in terms therefore become primitives that direct the search
in the course of a computation. This observation, when coupled with our
desire to preserve the essential character of Horn clause programming while
providing for higher-order features, leads to the conclusion that only those
logical symbols should be permitted in terms that can appear in the top-
level structure of goal formulas. This argues specifically for the inclusion
of only conjunctions, disjunctions and existential quantifications. This re-
striction can, of course, be relaxed if we are dealing with a logic whose
propositional and quantificational structure is richer than that of Horn
clauses. One such logic is outlined in Section 9 and is studied in detail in
[Miller et al., 1991].
Our consideration of higher-order aspects began with the provision of
predicate variables. In a vein similar to that for logical symbols, one may
ask whether there are practical considerations limiting the occurrences of
these variables. In answering this question, it is useful to consider the struc-
ture of definite clauses again. These formulas are either of the form ∀x̄ A
or ∀x̄ (G ⊃ A). An intuitive justification can be provided for permitting
predicate variables in at least two places in such formulas: in "argument"
positions in A and as the "heads" of atoms in G. The possibility for pred-
icate variables to appear in these two locations is, in fact, what supports
the ability to pass procedures as arguments and to later invoke them as
goals. Now, there are two other forms in which a predicate variable can
appear in the formulas being considered, and these are as an argument of
an atom in G and as the head of A. An examination of the definition of the
mappred procedure shows that it is useful to permit predicate variables to
appear within the arguments of atomic goal formulas. With regard to the
other possibility, we recall that, from a programming perspective, a definite
clause is to be viewed as the definition of a procedure whose name is given
by the head of A. A consideration of this fact indicates that predicate
variables are not usefully permitted in the head position in A.
We have been concerned up to this point with only predicate variables
and procedures. There is, however, another higher-order facet that should
be considered and this is the possibility of functions to be variables. There
is a similarity between predicates and functions—predicates are, eventu-
ally, boolean valued functions—and so it seems reasonable to permit a
quantification over the latter if it is permitted over the former. There is,
of course, the question of whether permitting such quantifications results
in any useful and different higher-order abilities. To answer this question,
let us consider the "functional" counterpart of mappred that is defined by
the following definite clauses:
∀F mapfun(F, nil, nil),
∀F ∀L1 ∀L2 ∀X (mapfun(F, L1, L2) ⊃
        mapfun(F, cons(X, L1), cons(F(X), L2))).
Reading these clauses declaratively, we see that mapfun relates a function
and two lists just in case the second list is obtained by applying the func-
tion to each element of the first list. Notice that the notion of function
application involved here is quite different from that of solving a goal. For
example, if our terms were chosen to be those of some version of the λ-
calculus, function evaluation would be based on β-conversion. Thus, the
query
        ∃L mapfun(λX h(1, X), cons(1, cons(2, nil)), L)
would apply the term λX h(1, X) to each of 1 and 2, eventually produc-
ing the list cons(h(1, 1), cons(h(1, 2), nil)) as an instantiation for L. (We
assume that the symbol h that appears in the query is a constant.) By
placing suitable restrictions on the λ-terms that we use, we can make the
operation of β-conversion a relatively weak one, and, in fact, strong enough
only to encode the substitution operation. Such a choice of terms makes
it conceivable to run queries like the one just considered in "reverse." In
particular, a query such as
        ∃F mapfun(F, cons(1, cons(2, nil)), cons(h(1, 1), cons(h(1, 2), nil)))
could be posed with the expectation of generating the term λX h(1, X) as a
substitution for F. The ability to find such solutions is dependent critically
on using a weak notion of functions and function evaluation and finding
predicate substitutions through an apparently similar process of solving
goals is not a feasible possibility. To see this, suppose that we are given
the query
∃P mappred(P, cons(bob, cons(sue, nil)), cons(24, cons(23, nil))).
It might appear that a suitable answer can be provided to this query and
that this might, in fact, be the value age for P. A little thought, however,
indicates that the query is an ill-posed one. There are too many predicate
terms that hold of bob and 24 and of sue and 23—consider, for example,
the myriad ways for stating the relation that holds of any two objects—and
enumerating these does not seem to be a meaningful computational task.
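For concreteness, mapfun and the two queries just discussed might be sketched in the same concrete syntax as follows, with h assumed to be a suitably declared constant.

    mapfun F nil nil.
    mapfun F (cons X L1) (cons (F X) L2) :- mapfun F L1 L2.

    mapfun (x\ h 1 x) (cons 1 (cons 2 nil)) L.
    mapfun F (cons 1 (cons 2 nil)) (cons (h 1 1) (cons (h 1 2) nil)).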
The above discussion brings out the distinction between quantification
over only predicates and quantification over both predicates and functions.
This does not in itself, however, address the question of usefulness of per-
mitting quantification over functions. This question is considered in detail
in Sections 8 and 9 and so we provide only a glimpse of an answer to it at
this point. For this purpose, we return to the two mapfun queries above. In
both queries the new ability obtained from function variables and λ-terms
is that of analyzing the process of substitution. In the first query, the com-
putation involved is that of performing substitutions into a given structure,
namely h(1, X). The second query involves finding a structure from which
two different structures can be obtained by substitutions; in particular, a
structure which yields h(1, 1) when 1 is substituted into it and h(1, 2) when
2 is used instead. Now, there are several situations where this ability to
analyze the structures of terms via the operation of substitution is impor-
tant. Furthermore, in many of these contexts, objects that incorporate the
notion of binding will have to be treated. The terms of a λ-calculus and
the accompanying conversion rules are, as we shall see, appropriate tools
for correctly formalizing the idea of substitution in the presence of bound
variables. Using function variables and λ-terms can, therefore, provide for
programming primitives that are useful in these contexts.
We have already argued, using the mappred example, for the provision
of predicate variables in the arguments of (atomic) goal formulas. This
argument can be strengthened in light of the fact that predicates are but
functions of a special kind. When predicate variables appear as the heads
of atomic goals, i.e., in "extensional" positions, they can be instantiated,
thereby leading to the computation of new goals. However, as we have
just observed, it is not meaningful to contemplate finding values for such
variables. When predicate variables appear in the arguments of goals,
i.e. in "intensional" positions, values can be found for them by structural
analyses in much the same way as for other function variables. These
two kinds of occurrences of predicate variables can thus be combined to
advantage: an intensional occurrence of a predicate variable can be used
to form a query whose solution is then sought via an extensional occurrence
of the same variable. We delay the consideration of specific uses of this
facility till Section 7.

3 A higher-order logic
A principled development of a logic programming language that incorpo-
rates the features outlined in the previous section must be based on a
higher-order logic. The logic that we use for this purpose is derived from
Church's formulation of the simple theory of types [Church, 1940] princi-
pally by the exclusion of the axioms concerning infinity, extensionality for
propositions, choice and description. Church's logic is particularly suited
to our purposes since it is obtained by superimposing logical notions over
the calculus of λ-conversion. Our omission of certain axioms is based on a
desire for a logic that generalizes first-order logic by providing a stronger
notion of variable and term, but that, at the same time, encompasses only
the most primitive logical notions that are relevant in this context; only
these notions appear to be of consequence from the perspective of com-
putational applications. Our logic is closely related to that of [Andrews,
1971], the only real differences being the inclusion of η-conversion as a rule
of inference and the incorporation of a larger number of propositional con-
nectives and quantifiers as primitives. In the subsections that follow, we
describe the language of this logic and clarify the intended meanings of
expressions by the presentation of a deductive calculus as well as a notion
of models. There are several similarities between this logic and first-order
logic, especially in terms of proof-theoretic properties. However, the richer
syntax of the higher-order logic makes the interpretation and usefulness of
these properties in the two contexts different. We dwell on this aspect in
Subsection 3.5.

3.1 The language


The language underlying the formal system that we utilize in this chapter
is that of a typed λ-calculus. There are two major syntactic components
to this language: the types and the terms. The purpose of the types is to
categorize the terms based on a functional hierarchy. From a syntactic per-
spective, the types constitute the more primitive notion and the formation
rules for terms identify a type with each term.
The types that are employed are often referred to as simple types. They
are determined by a set S of sorts and a set C of type constructors. We
assume that S contains the sort o that is the type of propositions and at
least one other sort, and that each member of C has associated with it
a unique positive arity. The class of types is then the smallest collection
that includes (i) every sort, (ii) (c σ1 . . . σn), for every c ∈ C of arity n
and σ1, . . . , σn that are types, and (iii) (σ → τ) for every σ and τ that
are types. Understood intuitively, the type (σ → τ) corresponds to the
set of "function" terms whose domains and ranges are given by a and T
respectively. In keeping with this intuition, we refer to the types obtained
by virtue of (i) and (ii) as atomic types and to those obtained by virtue of
(iii) as function types.
We will employ certain notational conventions with respect to types.
To begin with, we will use the letters σ and τ, perhaps with subscripts, as
metalanguage variables for types. Further, the use of parentheses will be
minimized by assuming that → associates to the right. Using this conven-
tion, every type can be written in the form (σ1 → . . . → σn → τ) where τ
is an atomic type. We will refer to σ1, . . . , σn as the argument types and
to τ as the target type of the type when it is written in this form. This
terminology is extended to atomic types by permitting the argument types
to be an empty sequence.
The class of terms is obtained by the operations of abstraction and
application from given sets of constants and variables. We assume that
the constants and variables are each specified with a type and that these
collections meet the following additional conditions: there is at least one
constant and a denumerable number of variables of each type and the
variables of each type are distinct from the constants and the variables of
any other type. The terms or formulas are then specified together with an
associated type in the following fashion:

(1) A variable or a constant of type σ is a term of type σ.


(2) If x is a variable of type σ and F is a term of type τ then (λx F) is
a term of type σ → τ, and is referred to as an abstraction that binds
x and whose scope is F.
(3) If F1 is a term of type σ → τ and F2 is a term of type σ then (F1 F2),
referred to as the application of F1 to F2, is a term of type τ.

Once again, certain notational conventions will be employed in connec-
tion with terms. When talking about an arbitrary term, we will generally
denote this by an uppercase letter that possibly has a subscript and a su-
perscript. It will sometimes be necessary to display abstractions and we
will usually depict the variable being abstracted by a lowercase letter that
may be subscripted. When it is necessary to present specific (object-level)
terms, we will explicitly indicate the symbols that are to be construed as
constants and variables. In writing terms, we will omit parentheses by us-
ing the convention that abstraction is right associative and application is
left associative. This usage will occur at both the object- and the meta-
level. As a further shorthand, the abstraction symbol in a sequence of
abstractions will sometimes be omitted: thus, the term λx1 . . . λxn T may
be abbreviated by λx1, . . . , xn T. Finally, although each term is specified
only in conjunction with a type, we will seldom mention the types of terms
explicitly. These omissions will be justified on the basis that the types can
either be inferred from the context or are inessential to the discussion at
hand.
The rules of formation for terms serve to identify the well-formed sub-
parts of any given term. Specifically, a term G is said to occur in, or to be
a subterm or subformula of, a term F if (a) G is F, or (b) F is of the form
(λx F1) and G occurs in F1, or (c) F is of the form (F1 F2) and G occurs
in either F1 or F2. An occurrence of a variable x in F is either bound or
free depending on whether it is or is not an occurrence in the scope of an
abstraction that binds x. A variable x is a bound (free) variable of F if
it has at least one bound (free) occurrence in F. F is a closed term just
in case it has no free variables. We write ℱ(F) to denote the set of free
variables of F. This notation is generalized to sets of terms and sets of
pairs of terms in the following way: ℱ(D) is ∪{ℱ(F) | F ∈ D} if D is a
set of terms and ∪{ℱ(F1) ∪ ℱ(F2) | (F1, F2) ∈ D} if D is a set of pairs of
terms.
Example 3.1.1. Let int ∈ S and list ∈ C and assume that the arity
of list is 1. Then the following are legitimate types: int, (list int) and
int → (list int) → (list int). The argument types of the last of these types
are int and (list int) and its target type is (list int). Let cons be a constant
of type int → (list int) → (list int), let 1 and 2 be constants of type int
and let l be a variable of type (list int). Then λl (cons 1 (cons 2 l)) is
a term. A cursory inspection reveals that the type of this term must be
(list int) → (list int). The above term has one bound variable and no
free variables, i.e., it is a closed term. However, it has as subterm the term
(cons 1 (cons 2 l)) in which the variable l appears free.
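In a λProlog-like concrete syntax, the signature of this example might be declared roughly as shown below; this is only an illustrative sketch, since declarations of this kind are discussed only later in the chapter.

    kind  list   type -> type.
    type  cons   int -> list int -> list int.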
The language presented thus far gives us a means for describing func-
tions of a simple sort: abstraction constitutes a device for representing
function formation and application provides a means for representing the
evaluation of such functions. Our desire, however, is for a language that
allows not only for the representation of functions, but also for the use of
connectives and quantifiers in describing relations between such functions.
Such a capability can be achieved in the present context by introducing
a set of constants for representing these logical operations. To be precise,
we henceforth assume that the set of constants is partitioned into the set
of parameters or nonlogical constants and the set of logical constants with
the latter comprising the following infinite list of symbols: T of type o, ->
of type o -> o, A, V and D of type o -> o -> o, and, for each o, 3 and
V of type (a —> o) —> o. We recall that o is intended to be the type of
propositions. The logical constants are to be interpreted in the following
manner: T corresponds to the tautologous proposition, the (propositional)
connectives -, V, A, and D correspond, respectively, to negation, disjunc-
tion, conjunction, and implication, and the family of constants 3 and V are,
respectively, the existential and universal quantifiers. The correspondence
between ∃ and ∀ and the quantifiers familiar from first-order logic may be
understood from the following: the existential and universal quantification
of x over P is written as (∃ (λx P)) and (∀ (λx P)) respectively. Under this
representation, the dual aspects of binding and predication that accompany
the usual notions of quantification are handled separately by abstractions
and constants that are propositional functions of propositional functions.
The constants that are used must, of course, be accorded a suitable in-
terpretation for this representation to be a satisfactory one. For example,
the meaning assigned to V must be such that (V (yx P)) holds just in case
(yx P) corresponds to the set of all objects of the type of x.
Certain conventions and terminology are motivated by the intended
interpretations of the type o and the logical constants. In order to per-
mit a distinction between arbitrary terms and those of type o, we re-
serve the word "formula" exclusively for terms of type o. Terms of type
σ1 → . . . → σn → o correspond to n-ary relations on terms and, for this
reason, we will also refer to them as predicates of n arguments. In writing
formulas, we will adopt an infix notation for the symbols ∧, ∨ and ⊃; e.g.,
we will write (∧ F G) as (F ∧ G). In a similar vein, the expressions (∃x F)
and (∀x F) will be used as abbreviations for (∃ (λx F)) and (∀ (λx F)). Par-
allel to the convention for abstractions, we will sometimes write the expres-
sions ∃x1 . . . ∃xn F and ∀x1 . . . ∀xn F as ∃x1, . . . , xn F and ∀x1, . . . , xn F
respectively. In several cases it will only be important to specify that the
"prefix" contains a sequence of some length. In such cases, we will use x̄ as
an abbreviation for a sequence of variables and write λx̄ F, ∃x̄ F or ∀x̄ F,
as the case might be.
Our language at this point has the ability to represent functions as well
as logical relations between functions. However, the sense in which it can
represent these notions is still informal and needs to be made precise. We
do this in the next two subsections, first by describing a formal system that
clarifies the meaning of abstraction and application and then by presenting
a sequent calculus that bestows upon the various logical symbols their
intended meanings.

3.2 Equality between terms


The intended interpretations of abstraction and application are formalized
by the rules of λ-conversion. To define these rules, we need the operation
of replacing all free occurrences of a variable x in the term T1 by a term
T2 of the same type as x. This operation is denoted by S^x_{T2} T1 and is made
explicit as follows:
(i) If T1 is a variable or a constant, then S^x_{T2} T1 is T2 if T1 is x and T1
otherwise.
(ii) If T1 is of the form (λy C), then S^x_{T2} T1 is T1 if y is x and (λy S^x_{T2} C)
otherwise.
(iii) If T1 is of the form (C D), then S^x_{T2} T1 = (S^x_{T2} C  S^x_{T2} D).

In performing this operation of replacement, there is the danger that the
free variables of T2 become bound inadvertently. The term "T2 is sub-
stitutable for x in T1" describes the situations in which the operation is
logically correct, i.e. those situations where x does not occur free in the
scope of an abstraction in T1 that binds a free variable of T2. The rules
of α-conversion, β-conversion and η-conversion are then, respectively, the
following operations and their converses on terms:

(1) Replacing a subterm (λx T) by (λy S^x_y T) provided y is substitutable
for x in T and not free in T.
(2) Replacing a subterm ((λx T1) T2) by S^x_{T2} T1 provided T2 is substi-
tutable for x in T1, and vice versa.
(3) Replacing a subterm (λx (T x)) by T provided x is not free in T, and
vice versa.
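As a simple illustration using the vocabulary of Example 3.1.1, a β-conversion step transforms the term ((λl (cons 1 (cons 2 l))) nil) into (cons 1 (cons 2 nil)); if the argument term had contained a free occurrence of l, an α-conversion step renaming the bound variable would have been needed first.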

The rules above, referred to collectively as the λ-conversion rules, may
be used to define the following relations between terms:
Definition 3.2.1. T λ-conv (β-conv, =) S just in case there is a sequence
of applications of the λ-conversion (respectively α- and β-conversion, α-
conversion) rules that transforms T into S.
The three relations thus defined are easily seen to be equivalence re-
lations. They correspond, in fact, to notions of equality between terms
based on the following informal interpretation of the λ-conversion rules:
α-conversion asserts that the choice of name for the variable bound by an
abstraction is unimportant, β-conversion relates an application to the re-
sult of evaluating the application, and η-conversion describes a notion of
extensionality for function terms (the precise nature of which will become
clear in Subsection 3.4). We use the strongest of these notions in our dis-
cussions below, i.e. we shall consider T and S equal just in case T λ-conv
S.
It is useful to identify the equivalence classes of terms under the relations
just defined with canonical members of each class. Towards this end, we say
that a term is in β-normal form if it does not have a subterm of the form
((λx A) B). If T is in β-normal form and S β-conv T, then T is said to be a
β-normal form for S. For our typed language, it is known that a β-normal
form exists for every term [Andrews, 1971]. By the celebrated Church-
Rosser theorem for β-conversion [Barendregt, 1981], this form must be
unique up to a renaming of bound variables. We may therefore use terms
in β-normal form as representatives of the equivalence classes under the
β-conv relation. We note that each such term has the structure
λx1 . . . λxn (A T1 . . . Tm)
where A is a constant or variable, and, for 1 ≤ i ≤ m, Ti also has the
same structure. We refer to the sequence x1, . . . , xn as the binder, to A as
the head and to T1, . . . , Tm as the arguments of such a term; in particular
instances, the binder may be empty, and the term may also have no argu-
ments. Such a term is said to be rigid if its head, i.e. A, is either a constant
or a variable that appears in the binder, and flexible otherwise.
In identifying the canonical members of the equivalence classes of terms
with respect to λ-conv, there are two different approaches that might be
followed. Under one approach, we say that a term is in η-normal form
if it has no subterm of the form (λx (A x)) in which x is not free in A,
and we say a term is in λ-normal form if it is in both β- and η-normal
form. The other alternative is to say that a term is in λ-normal form if
it can be written in the form λx̄ (A T1 . . . Tm) where A is a constant
or variable of type σ1 → . . . → σm → τ with τ being an atomic type
and, further, each Ti can also be written in a similar form. (Note that
the term must be in β-normal form for this to be possible.) In either
case we say that T is a λ-normal form for S if S λ-conv T and T is in
λ-normal form. Regardless of which definition is chosen, it is known that
a λ-normal form exists for every term in our typed language and that
this form is unique up to a renaming of bound variables (see [Barendregt,
1981] and also the discussion in [Nadathur, 1987]). We find it convenient
to use the latter definition in this chapter and we will write λnorm(T)
to denote a λ-normal form for T under it. To obtain such a form for
any given term, the following procedure may be used: First convert the
term into β-normal form by repeatedly replacing every subterm of the form
((λx A) B) by S^x_B A preceded, perhaps, by some α-conversion steps. Now,
if the resulting term is of the form λx1, . . . , xm (A T1 . . . Tn) where A is of
type σ1 → . . . → σn → σn+1 → . . . → σn+r → τ, then replace it by the term

λx1, . . . , xm, y1, . . . , yr (A T1 . . . Tn y1 . . . yr)

where y1, . . . , yr are distinct variables of appropriate types that are not
contained in {x1, . . . , xm}. Finally repeat this operation of "fluffing-up" of
the arguments on the terms T1, . . . , Tn, y1, . . . , yr.
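To illustrate the difference between the two definitions, consider the term (cons 1) with cons and 1 as in Example 3.1.1. This term is in β-normal form and contains no subterm of the form (λx (A x)), but it does not have the shape required by the second definition since its type, (list int) → (list int), is not atomic; the fluffing-up step converts it into λy (cons 1 y), which is a λ-normal form for it in the latter sense.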
A λ-normal form of a term T as we have defined it here is unique only
up to a renaming of bound variables (i.e., up to α-conversions). While this
is sufficient for most purposes, we will occasionally need to talk of a unique
normal form. In such cases, we use p(F) to designate what we call the
principal normal form of F. Determining this form essentially requires a
naming convention for bound variables and a convention such as the one
in [Andrews, 1971] suffices for our purposes.
The existence of a λ-normal form for each term is useful for two reasons.
First, it provides a mechanism for determining whether two terms are equal
by virtue of the λ-conversion rules. Second, it permits the properties of
terms to be discussed by using a representative from each of the equivalence
classes that has a convenient structure. Of particular interest to us is the
structure of formulas in λ-normal form. For obvious reasons, we call a
formula whose leftmost non-parenthesis symbol is either a variable or a
parameter an atomic formula. Then one of the following is true of a formula
in λ-normal form: (i) it is ⊤, (ii) it is an atomic formula, (iii) it is of the
form ¬F, where F is a formula in λ-normal form, (iv) it is of the form
(F ∨ G), (F ∧ G), or (F ⊃ G) where F and G are formulas in λ-normal
form, or (v) it is of the form (∃x F) or (∀x F), where F is a formula in
λ-normal form.
The β-conversion rule provides a convenient means for defining the op-
eration of substitution on terms. A substitution is formally a (type preserv-
ing) mapping on variables that is the identity everywhere except at finitely
many explicitly specified points. Thus, a substitution is represented by a
set of the form {(xi, Ti) | 1 ≤ i ≤ n}, where, for 1 ≤ i ≤ n, xi is a distinct
variable and Ti is a term that is of the same type as xi but that is distinct
from xi. The application of a substitution to a term requires this mapping
to be extended to the class of all terms and can be formalized as follows:
if θ = {(xi, Ti) | 1 ≤ i ≤ n} and S is a term, then

θ(S) = λnorm((λx1, . . . , xn S) T1 . . . Tn).

It can be seen that this definition is independent of the order in which the
pairs are taken from θ and that it formalizes the idea of replacing the free
occurrences of x1, . . . , xn in S simultaneously by the terms T1, . . . , Tn. We
often have to deal with substitutions that are given by singleton sets and
we introduce a special notation for the application of such substitutions: if
θ is {(x, T)}, then θ(S) may also be written as [T/x]S.
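For example, in the vocabulary of Example 3.1.1, if θ is {(l, nil)} then θ((cons 1 (cons 2 l))), i.e. [nil/l](cons 1 (cons 2 l)), is the term (cons 1 (cons 2 nil)).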
Certain terminology pertaining to substitutions will be used later in
the chapter. Given two terms T1 and T2, we say T1 is an instance of T2
if it results from applying a substitution to T2. The composition of two
substitutions θ1 and θ2, written as θ1 ∘ θ2, is precisely the composition of
θ1 and θ2 when these are viewed as mappings: θ1 ∘ θ2(G) = θ1(θ2(G)). The
restriction of a substitution θ to a set of variables V, denoted by θ ↑ V, is
given as follows:

θ ↑ V = {(x, T) | (x, T) ∈ θ and x ∈ V}.

Two substitutions, θ1 and θ2, are said to be equal relative to a set of
variables V if it is the case that θ1 ↑ V = θ2 ↑ V and this relationship is
denoted by θ1 =V θ2. θ1 is said to be less general than θ2 relative to V,
a relationship denoted by θ1 ≤V θ2, if there is a substitution σ such that
θ1 =V σ ∘ θ2. Finally, we will sometimes talk of the result of applying a
a relationship denoted by 01 < v 02, if there is a substitution a such that
01 =v a o 02. Finally, we will sometimes talk of the result of applying a
substitution to a set of formulas and to a set of pairs of formulas. In the
first case, we mean the set that results from applying the substitution to
each formula in the set, and, in the latter case, we mean the set of pairs
that results from the application of the substitution to each element in each
pair.

3.3 The notion of derivation


The meanings of the logical symbols in our language may be clarified by
providing an abstract characterization of proofs for formulas containing
these symbols. One convenient way to do this is by using a sequent calculus.
We digress briefly to summarize the central notions pertaining to such
calculi. The basic unit of assertion within these calculi is a sequent. A
sequent is a pair of finite (possibly empty) sets of formulas (Γ, Θ) that is
usually written as Γ → Θ. The first element of this pair is referred to
as the antecedent of the sequent and the second is called its succedent. A
sequent corresponds intuitively to the assertion that in each situation in
which all the formulas in its antecedent hold, there is at least one formula
in its succedent that also holds. In the case that the succedent is empty, the
sequent constitutes an assertion of contradictoriness of the formulas in its
antecedent. A proof for a sequent is a finite tree constructed using a given
set of inference figures and such that the root is labeled with the sequent
in question and the leaves are labeled with designated initial sequents.
Particular sequent systems are characterized by their choice of inference
figures and initial sequents.
Our higher-order logic is defined within this framework by the inference
figure schemata contained in Figure 1. Actual inference figures are obtained
from these by instantiating Γ, Θ and Δ by sets of formulas, B, D, and P
by formulas, x by a variable, T by a term and c by a parameter. There
is, in addition, a proviso on the choice of parameter for c: it should not
appear in any formula contained in the lower sequent in the same figure.
Also, in the inference figure schemata λ, the sets Δ and Δ' and the sets Θ
and Θ' differ only in that zero or more formulas in them are replaced by
formulas that can be obtained from them by using the λ-conversion rules.
The initial sequents of our sequent calculus are all the sequents of the form
Γ → Θ where either ⊤ ∈ Θ or for some atomic formulas A ∈ Γ and
A' ∈ Θ it is the case that A = A'.

Fig. 1. Inference figure schemata

Expressions of the form B, Δ and Θ, B that are used in the inference
figure schemata in Figure 1 are to be treated as abbreviations for Δ ∪
{B} and Θ ∪ {B} respectively. Thus, a particular set of formulas may
be viewed as being of the form Θ, B even though B ∈ Θ. As a result
of this view, a formula that appears in the antecedent or succedent of a
sequent really has an arbitrary multiplicity. This interpretation allows us to
eliminate the structural inference figures called contraction that typically
appear in sequent calculi; these inference figures provide for the multiplicity
of formulas in sequents in a situation where the antecedents and succedents
are taken to be lists instead of sets. Our use of sets also allows us to
drop another structural inference figure called exchange that is generally
employed to ensure that the order of formulas is unimportant. A third
kind of structural inference figure, commonly referred to as thinning or

Fig. 1. Inference figure schemata

weakening, permits the addition of formulas to the antecedent or succedent.


This ability is required in calculi where only one formula is permitted in
the antecedent or succedent of an initial sequent. Given the definition of
initial sequents in our calculus, we do not have a need for such inference
figures.
Any proof that can be constructed using our calculus is referred to as
a C-proof. A C-proof in which every sequent has at most one formula
in its succedent is called an I-proof. We write Γ ⊢C B to signify that
Γ ⟶ B has a C-proof and Γ ⊢I B to signify that it has an I-proof. These
relations correspond, respectively, to provability in higher-order classical
and intuitionistic logic. Our primary interest in this chapter is in the

classical notion, and unqualified uses of the words "derivation" or "proof"


are to be read as C-proof. We note that this notion of derivability is
identical to the notion of provability in the system T of [Andrews, 1971]
augmented by the rule of η-conversion. The reader familiar with [Gentzen,
1969] may also compare the calculus LK and the one described here for
C-proofs. One difference between these two is the absence of the inference
figures of the form

from our calculus. These inference figures are referred to as the Cut in-
ference figures and they occupy a celebrated position within logic. Their
omission from our calculus is justified by a rather deep result for the higher-
order logic under consideration, the so-called cut-elimination theorem. This
theorem asserts that the same set of sequents have proofs in the calculi with
and without the Cut inference figures. A proof of this theorem for our logic
can be modelled on the one in [Andrews, 1971] (see [Nadathur, 1987] for
details). The only other significant difference between our calculus and
the one in [Gentzen, 1969] is that our formulas have a richer syntax and
the λ inference figures have been included to manipulate this syntax. This
difference, as we shall observe presently, has a major impact on the process
of searching for proofs.

3.4 A notion of models


An alternative approach to clarifying the meanings of the logical symbols in
our language is to specify the role they play in determining the denotations
of terms from abstract domains of objects. This may be done in a manner
akin to that in [Henkin, 1950]. In particular, we assume that we are given
a family of domains {D_α}_α, each domain being indexed by a type. The
intention is that a term of type α will, under a given interpretation for
the parameters and an assignment for the variables, denote an object in
the domain D_α. There are certain properties that we expect our domains
to satisfy at the outset. For each atomic type σ other than o, we assume
that D_σ is some set of individual objects of that type. The domain D_o will
correspond to a set of truth values. However, within our logic and unlike
the logic dealt with in [Henkin, 1950], distinctions must be made between
the denotations of formulas even when they have the same propositional
content: as an example, it should be possible for Q ∨ P to denote a different
object from P ∨ Q, despite the fact that these formulas share a truth value
under identical circumstances. For this reason, we take D_o to be a domain
of labeled truth values. Formally, we assume that we are given a collection
of labels L and that D_o is a subset of L × {T, F} that denotes a function
from the labels in L to the set {T, F}. For each function type, we assume
that the domain corresponding to it is a collection of functions over the
relevant domains. Thus, we expect D_{σ→τ} to be a collection of functions
from D_σ to D_τ.
We refer to a family of domains {D_α}_α that satisfies the above constraints
as a frame. We assume now that I is a mapping on the parameters
that associates an object from D_α with a parameter of type α. The behavior
of the logical symbols insofar as truth values are concerned is determined
by their intended interpretation as we shall presently observe. However,
their behavior with respect to labels is open to choice. For the purpose of
fixing this in a given context we assume that we are given a predetermined
label T_l and the following mappings:

    ¬_l : L → L;   ∨_l, ∧_l, ⊃_l : L → L → L;   and, for each type α, ∃_l, ∀_l : D_{α→o} → L;

the last two are actually a family of mappings, parameterized by types. Let
C be the set containing T_l and these various mappings. Then the tuple
⟨L, {D_α}_α, I, C⟩ is said to be a pre-structure or pre-interpretation for our
language.
A mapping φ on the variables is an assignment with respect to a pre-
structure ⟨L, {D_α}_α, I, C⟩ just in case φ maps each variable of type α to
an object in D_α. We wish to extend φ to a "valuation" function V_φ on all terms. The
desired behavior of V_φ on a term H is given by induction over the structure
of H:
(1) H is a variable or a constant. In this case
(i) if H is a variable then V_φ(H) = φ(H),
(ii) if H is a parameter then V_φ(H) = I(H),
(iii) if H is ⊤ then V_φ(H) = ⟨T_l, T⟩,
(iv) if H is ¬ then V_φ(H)(⟨l, p⟩) = ⟨¬_l(l), q⟩, where q is F if p is T and
T otherwise,
(v) if H is ∨ then V_φ(H)(⟨l1, p⟩)(⟨l2, q⟩) = ⟨∨_l(l1)(l2), r⟩, where r is T
if either p or q is T and F otherwise,
(vi) if H is ∧ then V_φ(H)(⟨l1, p⟩)(⟨l2, q⟩) = ⟨∧_l(l1)(l2), r⟩, where r is F
if either p or q is F and T otherwise,
(vii) if H is ⊃ then V_φ(H)(⟨l1, p⟩)(⟨l2, q⟩) = ⟨⊃_l(l1)(l2), r⟩, where r is T
if either p is F or q is T and F otherwise,
(viii) if H is ∃ of type ((α → o) → o) then, for any p ∈ D_{α→o},
V_φ(H)(p) = ⟨∃_l(p), q⟩, where q is T if there is some t ∈ D_α such
that p(t) = ⟨l, T⟩ for some l ∈ L, and q is F otherwise, and
(ix) if H is ∀ of type ((α → o) → o) then, for any p ∈ D_{α→o},
V_φ(H)(p) = ⟨∀_l(p), q⟩, where q is T if for every t ∈ D_α there is
some l ∈ L such that p(t) = ⟨l, T⟩ and q is F otherwise.
(2) H is (H1 H2). In this case, V_φ(H) = V_φ(H1)(V_φ(H2)).
(3) H is (λx H1). Let x be of type α and, for any t ∈ D_α, let φ(x := t)
be the assignment that is identical to φ except that it maps x to t.
Then V_φ(H) = p where p is the function on D_α such that p(t) =
V_{φ(x := t)}(H1).
The definition of pre-structure is, however, not sufficiently restrictive to
ensure that V_φ(H) ∈ D_α for every term H of type α. The solution to
this problem is to deem that only certain pre-structures are acceptable for
the purpose of determining meanings, and to identify these pre-structures
and the meaning function simultaneously. Formally, we say a pre-structure
⟨L, {D_α}_α, I, C⟩ is a structure or interpretation only if, for any assignment
φ and any term H of type α, V_φ(H) ∈ D_α where V_φ is given by the
conditions above. It follows, of course, that V_φ(H) is well defined relative to
a structure. For a closed term H, V_φ(H) is independent of the assignment
φ and may, therefore, be thought of as the denotation of H relative to the
given structure.
The idea of a structure as defined here is similar to the notion of a
general model in [Henkin, 1950], the chief difference being that we use a
domain of labeled truth values for D_o. Note that our structures degenerate
to general models in Henkin's sense in the case that the set of labels L
is restricted to a two element set. It is of interest also to observe that the
domains D_{σ→τ} in our structures are collections of functions from D_σ to
D_τ as opposed to functions in combination with a way of identifying them,
and the semantics engendered is extensional in this sense. The axiom of
extensionality is not a formula of our logic since its vocabulary does not
include the equality symbol. However, it is easily seen that the effect of
this axiom holds in our structures at a meta-level: two elements of the
domain D_{σ→τ} are equal just in case they produce the same value when
given identical objects from D_σ. As a final observation, in the special case
that the domains D_{σ→τ} include all the functions from D_σ to D_τ we obtain
a structure akin to the standard models of [Henkin, 1950].
Various logical notions pertaining to formulas in our language can be
explained by recourse to the definition of V_φ. We say, for instance, that a
formula H is satisfied by a structure and an assignment φ if V_φ(H) = ⟨l, T⟩
for some label l relative to that structure. A valid formula is one that is satisfied by every
structure and every assignment. Given a set of formulas Γ and a formula
A, we say that A is a logical consequence of Γ, written Γ ⊨ A, just in case
A is satisfied by every structure and assignment that also satisfies each
member of Γ.
Given a finite set of formulas Θ, let ⋁Θ denote the disjunction of the
formulas in Θ if Θ is nonempty and the formula ¬⊤ otherwise. The fol-
lowing theorem relates the model-theoretic semantics that is presented in
this subsection for our higher-order logic to the proof-theoretic semantics
presented in the previous subsection. The proof of this theorem and a fuller
development of the ideas here may be found in [Nadathur, 1997].

Theorem 3.4.1. Let Γ, Θ be finite sets of formulas. Then Γ ⟶ Θ has
a C-proof if and only if Γ ⊨ ⋁Θ.
3.5 Predicate variables and the subformula property
As noted in Subsection 3.3, the proof systems for first-order logic and our
higher-order logic look similar: the only real differences are, in fact, in the
presence of the λ-conversion rules and the richer syntax of formulas. The
impact of these differences is, however, nontrivial. An important property
of formulas in first-order logic is that performing substitutions into them
preserves their logical structure — the resulting formula is in a certain
precise sense a subformula of the original formula (see [Gentzen, 1969]).
A similar observation can unfortunately not be made about formulas in
our higher-order logic. As an example, consider the formula F = ((p a) ⊃
(Y a)); we assume that p and a are parameters of suitable type here and
that Y is a variable. Now let θ be the substitution

where x, y and z are variables and b is a parameter. Then

As can be seen from this example, applying a substitution to a formula in


which a predicate variable appears free has the potential for dramatically
altering the top-level logical structure of the formula.
The above observation has proof-theoretic consequences that should be
mentioned. One consequence pertains to the usefulness of cut-elimination
theorems. These theorems have been of interest in the context of logic
because they provide an insight into the nature of deduction. Within first-
order logic, for instance, this theorem leads to the subformula property: if
a sequent has a proof, then it has one in which every formula in the proof is
a subformula of some formula in the sequent being proved. Several useful
structural properties of deduction in the first-order context can be observed
based on this property. Prom the example presented above, it is clear that
the subformula property does not hold under any reasonable interpretation
for our higher-order logic even though it admits a cut-elimination theorem;
predicate terms containing connectives and quantifiers may be generalized
upon in the course of a derivation and thus intermediate sequents may
have formulas whose structure cannot be predicted from the formulas in
the final one. For this reason, the usefulness of cut-elimination as a device
for teasing out the structural properties of derivations in higher-order logic
has generally been doubted.
A related observation concerns the automation of deduction. The tradi-
tional method for constructing a proof of a formula in a logic that involves
quantification consists, in a general sense, in substituting expressions for ex-
istentially quantified variables and then verifying that the resulting formula
is a tautology. In a logic where the propositional structure remains invari-

ant under substitutions, the search for a proof can be based on this struc-
ture and the substitution (or, more appropriately, unification) process may
be reduced to a constraint on the search. However, the situation is different
in a logic in which substitutions can change the propositional structure of
formulas. In such logics, the construction of a proof often involves finding
the "right" way in which to change the propositional structure as well. As
might be imagined, this problem is a difficult one to solve in general, and
no good method that is also complete has yet been described for deter-
mining these kinds of substitutions in our higher-order logic. The existing
theorem-provers for this logic either sacrifice completeness [Bledsoe, 1979;
Andrews et al., 1984] or are quite intractable for this reason [Huet, 1973a;
Andrews, 1989].
In the next section we describe a certain class of formulas from our
higher-order logic. Our primary interest in these formulas is that they pro-
vide a logical basis for higher-order features in logic programming. There is
an auxiliary interest, however, in these formulas in the light of the above ob-
servations. The special structure of these formulas enables us to obtain use-
ful information about derivations concerning them from the cut-elimination
theorem for higher-order logic. This information, in turn, enables the de-
scription of a proof procedure that is complete and that at the same time
finds substitutions for predicate variables almost entirely through unifica-
tion. Our study of these formulas thus also demonstrates the utility of the
cut-elimination theorem even in the context of a higher-order logic.

4 Higher-order Horn clauses


In this section we describe a logical language that possesses all the en-
hancements to first-order Horn clauses that were discussed in Section 2.
The first step in this direction is to identify a subcollection of the class of
terms of the higher-order logic that was described in the previous section.
Definition 4.0.1. A positive term is a term of our higher-order language
that does not contain occurrences of the symbols ¬, ⊃ and ∀. The collection
of positive terms that are in λ-normal form is called the positive Herbrand
universe and is denoted by H+.
The structure of positive terms clearly satisfies the requirement that
we wish to impose on the arguments of atomic formulas: these terms are
constructed by the operations of application and abstraction, and function
(and predicate) variables may appear in them, as may the logical symbols
⊤, ∨, ∧ and ∃. As we shall see presently, H+ provides the domain of
terms used for describing the results of computations. It thus plays the
same role in our context as does the first-order Herbrand Universe in other
discussions of logic programming.
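As an illustration of this definition, the following Python sketch (our own,
under an assumed term representation and assumed names for the logical
constants) checks whether a term contains any of the excluded symbols and
hence whether it is positive; the λ-normalization needed to place a positive
term in H+ is not attempted here.

    # Terms: ('const', name), ('var', name), ('app', f, a), ('lam', x, body).
    # The constant names chosen for the logical symbols are assumptions of this sketch.

    EXCLUDED = {'not', 'implies', 'forall'}   # the symbols barred from positive terms

    def is_positive(term):
        """True iff the term contains no negation, implication or universal
        quantifier; top, 'and', 'or' and 'exists' are permitted."""
        tag = term[0]
        if tag == 'const':
            return term[1] not in EXCLUDED
        if tag == 'var':
            return True
        if tag == 'app':
            return is_positive(term[1]) and is_positive(term[2])
        if tag == 'lam':
            return is_positive(term[2])
        return False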
A higher-order version of Horn clauses is now identified by the following
definition.

Definition 4.0.2. A formula of the form (p T1 . . . Tn), where p is a
parameter or a variable and, for 1 ≤ i ≤ n, Ti ∈ H+, is said to be a positive
atomic formula. Recall that such a formula is rigid just in case p is a
parameter. Let A and Ar be symbols that denote positive and rigid positive
atomic formulas respectively. Then the (higher-order) goal formulas and
definite clauses are the G- and D-formulas given by the following inductive
rules:

    G ::= ⊤ | A | G1 ∧ G2 | G1 ∨ G2 | ∃x G
    D ::= Ar | G ⊃ Ar | ∀x D

A finite collection of closed definite clauses is referred to as a program and


a goal formula is also called a query.
There is an alternative characterization of the goal formulas just defined:
they are, in fact, the terms in H+ of type o. The presentation above
is chosen to exhibit the correspondence between the higher-order formulas
and their first-order counterparts. The first-order formulas are contained in
the corresponding higher-order formulas under an implicit encoding that
essentially assigns types to the first-order terms and predicates. To be
precise, let i be a sort other than o. The encoding then assigns the type i to
variables and parameters, the type i → . . . → i → i, with n + 1 occurrences
of i, to each n-ary function symbol, and the type i → . . . → i → o, with n
occurrences of i, to each n-ary predicate symbol. Looked at differently, our
formulas contain within them a many-sorted version of first-order definite
clauses and goal formulas. In the reverse direction, the above definition
is but a precise description of the generalization outlined informally in
Section 2. Of particular note are the restriction of arguments of atoms
to positive terms, the requirement of a specific name for the "procedure"
defined by a definite clause and the ability to quantify over predicates and
functions. The various examples discussed in Section 2 can be rendered
almost immediately into the current syntax, the main difference being the
use of a curried notation.
Example 4.0.3. Let list be a unary type constructor, let int and i be
sorts. Further, let nil and nil' be parameters of type (list i) and (list int)
and let cons and cons' be parameters of type i → (list i) → (list i) and
int → (list int) → (list int). The following formulas constitute a program
under the assumption that mappred is a parameter of type
(i → int → o) → (list i) → (list int) → o
and that P, L1, L2, X and Y are variables of the required types:
∀P (mappred P nil nil'),
∀P, L1, L2, X, Y (((P X Y) ∧ (mappred P L1 L2)) ⊃
(mappred P (cons X L1) (cons' Y L2))).
Assuming that age is a parameter of type i → int → o, bob and sue are

parameters of type i and L is a variable of type (list int), the formula


(mappred age (cons bob (cons sue nil)) L)
constitutes a query. If mapfun is a parameter of type

and F, L1, L2 and X are variables of appropriate types, the following


formulas, once again, constitute a program:

If 1, 2, and h are parameters of type int, int, and int → int → int
respectively and L, X and F are variables of suitable types, the following
are queries:

Higher-order definite clauses and goal formulas are intended to provide


for a programming paradigm in a manner analogous to their first-order
counterparts. The notion of computation corresponds as before to solving
a query from a given program. The desire to preserve the essential charac-
ter of Horn clause logic programming dictates the mechanism for carrying
out such a computation: an abstract interpreter for our language must
perform a search whose structure is guided by the top-level logical symbol
in the query being solved. There is, however, a potential problem in the
description of such an interpreter caused by the possibility for predicate
variables to appear in extensional positions in goal formulas. We recall
from Section 3 that substitutions for such variables have the ability to al-
ter the top-level logical structure of the original formula. In the specific
context of interest, we see that goal formulas do not remain goal formulas
under arbitrary instantiations. For example, consider the instantiation of
the goal formula ∃P (P a) with the term λx ¬(q x); we assume that P and
x are variables here and that a and q are parameters. This instantiation
produces the formula ¬(q a) which is obviously not a goal formula. If such
instantiations must be performed in the course of solving queries similar
to the given one, an interpreter that proceeds by examining the top-level
structure of only goal formulas cannot be described for our language. The
computational mechanism for our language would, therefore, have to di-
verge even in principle from the one used in the first-order context.
Fortunately this problem has a solution that is adequate at least from
a pragmatic perspective. The essential idea is to consider the domain of
our computations to be limited to that of positive terms. In particular,
instantiations with only positive terms will be used for definite clauses
and goal formulas in the course of solving queries. Now, it is easily seen

that "positive" instantiations of quantifiers in goal formulas and definite


clauses yield formulas that are themselves goal formulas and definite clauses
respectively. Problems such as those just discussed would, therefore, not
arise in this context. We adopt this solution in our discussions below.
Although this solution is adequate in a practical sense, we note that there
is a question about its acceptability from a logical perspective; in particular,
using it may be accompanied by a loss in logical completeness. We discuss
this issue in the next section.
In presenting an abstract interpreter for our language, we find a notation
for positive instances of a set of definite clauses useful. This notation is
described below.
Definition 4.0.4. A (closed) positive substitution is a substitution whose
range is a set of (closed) terms contained in H+. Let D be a closed definite
clause. Then the collection of its closed positive instances, denoted by [D],
is
(i) {D} if D is of the form A or G ⊃ A, and
(ii) ∪{[φ(D′)] | φ is a closed positive substitution for x} if D is of the
form ∀x D′.
This notation is extended to programs as follows: if P is a program,
[P] = ∪{[D] | D ∈ P}.
We now specify the abstract interpreter in terms of the desired search
related interpretation for each logical symbol.
Definition 4.0.5. Let P be a program and let G be a closed goal formula.
We use the notation P ⊢o G to signify that our abstract interpreter succeeds
on G when given P; the subscript on ⊢o acknowledges the "operational"
nature of the notion. Now, the success/failure behavior of the interpreter
on closed goal formulas is specified as follows:
(i) P ⊢o ⊤,
(ii) P ⊢o A, where A is an atomic formula, if and only if A = A′ for some
A′ ∈ [P] or for some G ⊃ A′ ∈ [P] such that A = A′ it is the case
that P ⊢o G,
(iii) P ⊢o G1 ∨ G2 if and only if P ⊢o G1 or P ⊢o G2,
(iv) P ⊢o G1 ∧ G2 if and only if P ⊢o G1 and P ⊢o G2, and
(v) P ⊢o ∃x G if and only if P ⊢o φ(G) for some φ that is a closed positive
substitution for x.
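The search behavior that this definition prescribes can be pictured through
the following Python sketch. It is only a schematic rendering of ours, not the
chapter's procedure: the term representation is assumed, clauses(P) is assumed
to enumerate the elements of [P] as ('fact', A) or ('rule', G, A), equal is
assumed to compare atoms up to λ-conversion, and instantiate(x, g) stands
for an oracle enumerating closed positive instances [T/x]g, which is exactly
the clairvoyance that Section 6 sets out to eliminate.

    # Goals: ('top',), ('atom', ...), ('and', g1, g2), ('or', g1, g2), ('exists', x, g).

    def solve(P, g, clauses, instantiate, equal):
        tag = g[0]
        if tag == 'top':
            return True
        if tag == 'atom':
            for c in clauses(P):                      # c ranges over [P]
                if c[0] == 'fact' and equal(c[1], g):
                    return True
                if c[0] == 'rule' and equal(c[2], g) and \
                        solve(P, c[1], clauses, instantiate, equal):
                    return True
            return False
        if tag == 'and':
            return solve(P, g[1], clauses, instantiate, equal) and \
                   solve(P, g[2], clauses, instantiate, equal)
        if tag == 'or':
            return solve(P, g[1], clauses, instantiate, equal) or \
                   solve(P, g[2], clauses, instantiate, equal)
        if tag == 'exists':
            return any(solve(P, inst, clauses, instantiate, equal)
                       for inst in instantiate(g[1], g[2]))
        return False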

The description of the abstract interpreter has two characteristics which


require further explanation. First, it specifies the behavior of the inter-
preter only on closed goal formulas, whereas a query is, by our definition,
an arbitrary goal formula. Second, while it defines what a computation
should be, it leaves unspecified what the result of such a computation is.

The explanations of these two aspects are, in a certain sense, related. The
typical scenario in logic programming is one where a goal formula with some
free variables is to be solved relative to some program. The calculation that
is intended in this situation is that of solving the existential closure of the
goal formula from the program. If this calculation is successful, the result
that is expected is a set of substitutions for the free variables in the given
query that make it so. We observe that the behavior of our abstract inter-
preter accords well with this view of the outcome of a computation: the
success of the interpreter on an existential query entails its success on a
particular instance of the query and so it is reasonable to require a specific
substitution to be returned as an "answer."
Example 4.0.6. Suppose that our language includes all the parameters
and variables described in Example 4.0.3. Further, suppose that our pro-
gram consists of the definite clauses defining mappred in that example and
the following in which 24 and 23 are parameters of type int.
(age bob 24),
(age sue 23).
Then, the query
(mappred age (cons bob (cons sue nil)) L)
in which L is a free variable actually requires the goal formula
∃L (mappred age (cons bob (cons sue nil)) L).
to be solved. There is a solution to this goal formula that, in accordance
with the description of the abstract interpreter, involves solving the follow-
ing "subgoals" in sequence:
(mappred age (cons bob (cons sue nil)) (cons' 24 (cons' 23 nil')))
(age bob 24) A (mappred age (cons sue nil) (cons' 23 nil'))
(age bob 24)
(mappred age (cons sue nil) (cons' 23 nil'))
(age sue 23) A (mappred age nil nil')
(age sue 23)
(mappred age nil nil').
The answer to the original query that is obtained from this solution is the
substitution (cons' 24 (cons' 23 nil')) for L.
As another example, assume that our program consists of the clauses
for mapfun in Example 4.0.3 and that the query is now the goal formula
(mapfun F (cons 1 (cons 2 nil)) (cons (h 1 1) (cons (h 1 2) nil)))
in which F is a free variable. Once again we can construct the goal formula
whose solution is implicitly called for by this query. Further, a successful
solution path may be traced to show that an answer to the query is the
value λx (h 1 x) for F.

We have, at this stage, presented a higher-order generalization to the


Horn clauses of first-order logic and we have outlined, in an abstract fash-
ion, a notion of computation in the context of our generalization that pre-
serves the essential character of the notion in the first-order case. We have,
through this discussion, provided a framework for higher-order logic pro-
gramming. However, there are two respects in which the framework that
we have presented is incomplete. First, we have not provided a justifica-
tion based on logic for the idea of computation that we have described.
We would like to manifest a connection with logic to be true to the spirit
of logic programming in general and to benefit from possible declarative
interpretations of programs in particular. Second, the abstract interpreter
that has been described is not quite adequate as a basis for a practical
programming language. As evidenced in Example 4.0.6, an unacceptable
degree of clairvoyance is needed in determining if a given query has a suc-
cessful solution—an answer to the query must be known at the outset. A
viable evaluation procedure therefore needs to be described for supporting
the programming paradigm outlined here. We undertake these two tasks
in Sections 5 and 6 respectively.

5 The meaning of computations


We may attempt to explain the meaning of a computation as described in
the previous section by saying that a query succeeds from a given program
if and only if its existential closure is provable from, or a logical consequence
of, the program. Accepting this characterization without further argument
is, however, problematic. One concern is the treatment of quantifiers. From
Definition 4.0.5 we see that only positive instances of definite clauses are
used and success on existentially quantified goal formulas depends only on
success on a closed positive instance of the goal. It is unclear that these
restrictions carry over to the idea of provability as well. A related problem
concerns the search semantics accorded to the logical connectives. We note,
for instance, that success on a goal formula of the form G1 V G2 depends
on success on either G1 or G2. This property does not hold of provability
in general: a disjunctive formula may have a proof without either of the
disjuncts being provable.
A specific illustration of the above problems is provided by the following
derivation of the goal formula ∃Y (p Y) from a program consisting solely
of the definite clause ∀X (X ⊃ (p a)); we assume that p, a and b are
parameters here and that X and Y are variables.

The penultimate sequent in this derivation is

(*)   ¬(p b) ⊃ (p a) ⟶ ∃Y (p Y).
The antecedent of this sequent is obtained by substituting a term that is


not positive into a definite clause. This sequent obviously has a derivation.
There is, however, no term T such that ¬(p b) ⊃ (p a) ⟶ (p T) has a
derivation. This is, of course, a cause for concern. If all derivations of

    ∀X (X ⊃ (p a)) ⟶ ∃Y (p Y)

involve the derivation of (*), or of sequents similar to (*), then the idea of
proving ∃Y (p Y) would diverge from the idea of solving it, at least in the
context where the program consists of the formula ∀X (X ⊃ (p a)).
We show in this section that problems of the sort described in the previ-
ous paragraph do not arise, and that the notions of success and provability
in the context of our definite clauses and goal formulas coincide. The
method that we adopt for demonstrating this is the following. We first
identify a C'-proof as a C-proof in which each occurrence of ∀-L and ∃-R
constitutes a generalization upon a closed term from H+. In other words,
in each appearance of figures of the forms

it is the case that T is instantiated by a closed term from H+. We shall
show then that if Γ consists only of closed definite clauses and Δ consists
only of closed goal formulas, then the sequent Γ ⟶ Δ has a C-proof only
if it has a C'-proof. Now C'-proofs of sequents of the kind described have the
following characteristic: every sequent in the derivation has an antecedent
consisting solely of closed definite clauses and a succedent consisting solely
of closed goal formulas. This structural property of the derivation can be
exploited to show rather directly that the existence of a proof coincides
with the possibility of success on a goal formula.

5.1 Restriction to positive terms


We desire to show that the use of only C'-proofs, i.e., the focus on positive
terms, does not restrict the relation of provability as it pertains to definite
clauses and goal formulas. We do this by describing a transformation from
an arbitrary C-proof to a C'-proof. The following mapping on terms is
useful for this purpose.

Definition 5.1.1. Let x and y be variables of type o and, for each α, let
z_α be a variable of type α → o. Then the function pos on terms is defined
as follows:
(i) If T is a constant or a variable,
        pos(T) = λx ⊤        if T is ¬,
        pos(T) = λx λy ⊤     if T is ⊃,
        pos(T) = λz_α ⊤      if T is ∀ of type (α → o) → o,
        pos(T) = T           otherwise.
(ii) pos((T1 T2)) = (pos(T1) pos(T2)).
(iii) pos(λw T) = λw pos(T).
Given a term T, the λ-normal form of pos(T) is denoted by T+.
The mapping defined above is a "positivization" operation on terms as
made clear in the following lemma whose proof is obvious.
Lemma 5.1.2. For any term T, T+ ∈ H+. Further, F(T+) ⊆ F(T).
In particular, if T is closed, then T+ is a closed positive term. Finally, if
T ∈ H+ then T = T+.
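A direct transcription of the positivization mapping into Python is given
below as a sketch of ours (the term representation and the names chosen for
the logical constants are assumptions); it replaces the offending constants by
vacuous abstractions over ⊤ and recurses elsewhere, as the clauses of
Definition 5.1.1 prescribe. T+ would then be obtained by λ-normalizing the
result, which this sketch does not do.

    # Terms: ('const', name), ('var', name), ('app', f, a), ('lam', x, body).
    TOP = ('const', 'top')

    def pos(term):
        """The positivization mapping of Definition 5.1.1, as a sketch."""
        tag = term[0]
        if tag == 'const':
            if term[1] == 'not':
                return ('lam', 'x', TOP)                    # corresponds to λx ⊤
            if term[1] == 'implies':
                return ('lam', 'x', ('lam', 'y', TOP))      # corresponds to λx λy ⊤
            if term[1] == 'forall':
                return ('lam', 'z', TOP)                    # corresponds to λz ⊤
            return term
        if tag == 'var':
            return term
        if tag == 'app':
            return ('app', pos(term[1]), pos(term[2]))
        if tag == 'lam':
            return ('lam', term[1], pos(term[2]))
        return term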
Another property of the mapping defined above is that it commutes
with λ-conversion. This fact follows easily from the lemma below.
Lemma 5.1.3. For any terms T1 and T2, if T1 λ-converts to T2 then
pos(T1) also λ-converts to pos(T2).
Proof. We use an induction on the length of the conversion sequence. The
key is in showing the lemma for a sequence of length 1. It is easily seen that
if T2 is obtained from T1 by a single application of the α- or η-conversion
rule, then pos(T2) results from pos(T1) by a similar rule. Now, let A be
substitutable for x in B. Then an induction on the structure of B confirms
that pos([A/x]B) = [pos(A)/x]pos(B). Thus, if R1 is ((λx B) A) and R2 is
[A/x]B, it must be the case that pos(R1) β-converts to pos(R2). An induction
on the structure of T1 now verifies that if T2 results from it by a single β-
conversion step, then pos(T1) β-converts to pos(T2). ∎
We will need to consider a sequence of substitutions for variables in a
formula in the discussions that follow. In these discussions it is notationally
convenient to assume that substitution is a right associative operation.
Thus [R2/x2][R1/x1]T is to be considered as denoting the term obtained
by first substituting R1 for x1 in T and then substituting R2 for x2 in the
result.
Lemma 5.1.4. If T is a term in H+ and R1, . . . , Rn are arbitrary terms
(n ≥ 0), then

    ([Rn/xn] . . . [R1/x1]T)+ = [(Rn)+/xn] . . . [(R1)+/x1]T.

In particular, this is true when T is an atomic goal formula or an atomic
definite clause.
Proof. We note first that for any term T ∈ H+, pos(T) = T and thus
T+ = T. Using Lemma 5.1.3, it is easily verified that

The lemma follows from these observations. ∎


The transformation of a C-proof into a C'-proof for a sequent of the
kind that we are interested in currently can be described as the result of a
recursive process that works upwards from the root of the proof and uses
the positivization operation on the terms generalized upon in the ∀-L and
∃-R rules. This recursive process is implicit in the proof of the following
theorem.
Theorem 5.1.5. Let Δ be a program and let Θ be a finite set of closed
goal formulas. Then Δ ⟶ Θ has a C-proof only if it also has a C'-proof.
Proof. We note first that all the formulas in Δ ⟶ Θ are closed. Hence,
we may assume that the ∀-L and ∃-R inference figures that are used in the
C-proof of this sequent generalize on closed terms (i.e., they are obtained by
replacing T in the corresponding schemata by closed terms). The standard
technique of replacing all occurrences of a free variable in a proof by a
parameter may be used to ensure that this is actually the case.
Given a set of formulas Γ of the form

where r, l1, . . . , lr ≥ 0, we shall use the notation Γ+ to denote the set

Now let Δ be a set of the form

i.e., a set of formulas, each member of which is obtained by performing a
sequence of substitutions into a definite clause. Similarly, let Θ be a set of
the form

i.e., a set obtained by performing sequences of substitutions into goal for-
mulas. We claim that if Δ ⟶ Θ has a C-proof in which all the ∀-L and
the ∃-R figures generalize on closed terms, then Δ+ ⟶ Θ+ must have a
C'-proof. The theorem is an obvious consequence of this claim.
The claim is proved by an induction on the height of C-proofs for se-
quents of the given sort. If this height is 1, the given sequent must be an
initial sequent. There are, then, two cases to consider. In the first case, for
some i such that 1 ≤ i ≤ s we have that [Simi/yimi] . . . [Si1/yi1]Gi is ⊤. But
then Gi must be an atomic formula. Using Lemma 5.1.4 we then see that
[(Simi)+/yimi] . . . [(Si1)+/yi1]Gi must also be ⊤. In the other case, for some
i, j such that 1 ≤ i ≤ s and 1 ≤ j ≤ t, we have that

and, further, that these are atomic formulas. From the last observation, it
follows that Dj and Gi are atomic formulas. Using Lemma 5.1.4 again, it
follows that

Thus in either case it is clear that Δ+ ⟶ Θ+ is an initial sequent.


We now assume that the claim is true for derivations of sequents of
the requisite sort that have height h, and we verify it for sequents with
derivations of height h + 1. We argue by considering the possible cases for
the last inference figure in such a derivation. We observe that substituting
into a definite clause cannot yield a formula that has ∧, ∨, ¬ or ∃ as
its top-level connective. Thus the last inference figure cannot be an ∧-L,
an ∨-L, a ¬-L or an ∃-L. Further, a simple induction on the heights of
derivations shows that if a sequent consists solely of formulas in λ-normal
form, then any C-proof for it that contains the inference figure λ can be
transformed into a shorter C-proof in which λ does not appear. Since each
formula in Δ ⟶ Θ must be in λ-normal form, we may assume that the
last inference figure in its C-proof is not a λ. Thus, the only figures that
we need to consider are ⊃-L, ∀-L, ∧-R, ∨-R, ¬-R, ⊃-R, ∃-R, and ∀-R.
Let us consider first the case for an ∧-R, i.e., when the last inference
figure is of the form

In this case Θ′ ⊆ Θ and for some i, 1 ≤ i ≤ s,

Our analysis breaks up into two parts depending on the structure of Gi:
(1) If Gi is an atomic formula, we obtain from Lemma 5.1.4 that

Now B and D can be written as [B/y]y and [D/y]y, respectively. From the
hypothesis it thus follows that
Δ+ ⟶ (Θ′)+, B+ and Δ+ ⟶ (Θ′)+, D+
have C'-proofs. Using an ∧-R inference figure in conjunction with these,
we obtain a C'-proof for Δ+ ⟶ Θ+.
(2) If Gi is not an atomic formula then it must be of the form G1i ∧ G2i.
But then B = [Simi/yimi] . . . [Si1/yi1]G1i and D = [Simi/yimi] . . . [Si1/yi1]G2i. It
follows from the hypothesis that C'-proofs exist for
Δ+ ⟶ (Θ′)+, [(Simi)+/yimi] . . . [(Si1)+/yi1]G1i
and
Δ+ ⟶ (Θ′)+, [(Simi)+/yimi] . . . [(Si1)+/yi1]G2i.
A proof for Δ+ ⟶ Θ+ can be obtained from these by using an ∧-R
inference figure.
An analogous argument can be provided when the last figure is ∨-R.
For the case of ⊃-L, we observe first that if the result of performing a
sequence of substitutions into a definite clause D is a formula of the form
B ⊃ C, then D must be of the form G ⊃ Ar where G is a goal formula and
Ar is a rigid positive atom. An analysis similar to that used for ∧-R now
verifies the claim.
Consider now the case when the last inference figure is a ¬-R, i.e., of
the form

We see in this case that for some suitable i, [Simi/yimi] . . . [Si1/yi1]Gi = ¬B.
But then Gi must be an atomic goal formula and by Lemma 5.1.4

[(Simi)+/yimi] . . . [(Si1)+/yi1]Gi = (¬B)+ = ⊤.

Thus, Δ+ ⟶ Θ+ is an initial sequent and the claim follows trivially.
Similar arguments can be supplied for ⊃-R and ∀-R.
The only remaining cases are those when the last inference figure is an
∃-R or a ∀-L. In the former case the last inference figure must have the
form

where Θ′ ⊆ Θ and for some i, 1 ≤ i ≤ s, [Simi/yimi] . . . [Si1/yi1]Gi = ∃w P.
We assume, without loss of generality, that w is distinct from the variables
yi1, . . . , yimi as well as the variables that are free in Si1, . . . , Simi. There are
once again two subcases based on the structure of Gi:
(1) If Gi is an atomic formula, it follows from Lemma 5.1.4 that
[(Simi)+/yimi] . . . [(Si1)+/yi1]Gi = (∃w P)+ = ∃w (P)+.
Writing [T/w]P as [T/w][P/u]u and invoking the hypothesis, we see that
a C'-proof must exist for Δ+ ⟶ (Θ′)+, [T+/w]P+. Adding an ∃-R figure
below this yields a derivation for Δ+ ⟶ (Θ′)+, ∃w (P)+, which is iden-
tical to Δ+ ⟶ Θ+. Further, this must be a C'-proof since T is a closed
term by assumption and hence, by Lemma 5.1.2, T+ must be a closed
positive term.
(2) If Gi is a non-atomic formula, it must be of the form ∃x G'i where
G'i is a goal formula. But now P = [Simi/yimi] . . . [Si1/yi1]G'i. Thus,

has a C'-proof by the hypothesis. By adding an ∃-R inference figure below

this, we obtain a C'-proof for

Noting that

we see that the claim must be true.


The argument for ∀-L is similar to the one for the second subcase of
∃-R.
As mentioned already, Theorem 5.1.5 implicitly describes a transforma-
tion on C-proofs. It is illuminating to consider the result of this transfor-
mation on the derivation presented at the beginning of this section. We
invite the reader to verify that this derivation will be transformed into the
following:

Notice that there is no ambiguity about the answer substitution that should
be extracted from this derivation for the existentially quantified variable
Y.

5.2 Provability and operational semantics


We now show that the possibility of success on a goal formula given a
program coincides with the existence of a proof for that formula from the
program. Theorem 5.1.5 allows us to focus solely on C'-proofs in the course
of establishing this fact, and we do this implicitly below.
The following lemma shows that any set of definite clauses is consistent.
Lemma 5.2.1. There can be no derivation for a sequent of the form
Δ ⟶ where Δ is a program.
Proof. Suppose the claim is false. Then there is a least h and a program
Δ such that Δ ⟶ has a derivation of height h. Clearly h > 1. Con-
sidering the cases for the last inference figure (⊃-L or ∀-L, with the latter
generalizing on a closed positive term), we see that a sequent of the same
sort must have a shorter proof. ∎
The lemma below states the equivalence of classical and intuitionistic
provability in the context of Horn clauses. This observation is useful in
later analysis.
Lemma 5.2.2. Let Δ be a program and let G be a closed goal formula.
Then Δ ⟶ G has a derivation only if it has one in which there is at
most one formula in the succedent of each sequent.

Proof. We make a stronger claim: a sequent of the form

    Δ ⟶ G1, . . . , Gn,

where Δ is a program and G1, . . . , Gn are closed goal formulas, has a deriva-
tion only if for some i, 1 ≤ i ≤ n, Δ ⟶ Gi has a derivation in which at
most one formula appears in the succedent of each sequent.
The claim is proved by an induction on the heights of derivations. It is
true when the height is 1 by virtue of the definition of an initial sequent.
In the situation when the height is h + 1, we consider the possible cases
for the last inference figure in the derivation. The arguments for ∧-R and
∨-R are straightforward. For instance, in the former case the last inference
figure in the derivation must be of the form

where for some j, 1 ≤ j ≤ n, Gj = G1j ∧ G2j and Θ ⊆ {G1, . . . , Gn}.
Applying the inductive hypothesis to the upper sequents of this figure, it
follows that there is a derivation of the requisite sort for Δ ⟶ G for some
G ∈ Θ or for both Δ ⟶ G1j and Δ ⟶ G2j. In the former case the claim
follows directly, and in the latter case we use the two derivations together
with an ∧-R inference figure to construct a derivation of the required kind
for Δ ⟶ G1j ∧ G2j.
Similar arguments can be provided for the cases when the last inference
figure is ∃-R or ∀-L. The only additional observation needed in these cases
is that the restriction to C'-proofs ensures that the upper sequent in these
cases has the form required for the hypothesis to apply.
We are left only with the case of ⊃-L. In this case, the last inference
figure has the form

where Δ = {(G ⊃ A)} ∪ Δ′ and Θ = Γ1 ∪ Γ2. From Lemma 5.2.1 it follows
that Γ1 ≠ ∅. By the hypothesis, we see that there is a derivation of the
required kind for either Δ′ ⟶ G1 for some G1 ∈ Γ2 or for Δ′ ⟶ G.
In the former case, by adding G ⊃ A to the antecedent of every sequent of
the derivation, we obtain one for Δ ⟶ G1. In the latter case, another
use of the inductive hypothesis tells us that there is a derivation of the
desired kind for A, Δ′ ⟶ G2 for some G2 ∈ Γ1. This derivation may be
combined with the one for Δ′ ⟶ G to obtain one for Δ ⟶ G2. ∎
The proof of the above lemma is dependent on only one fact: every
sequent in the derivations being considered has a program as its antecedent
and a set of closed goal formulas as its succedent. This observation (or one
closely related to it) can be made rather directly in any situation where
quantification over predicate variables appearing in extensional positions is

not permitted. It holds, for instance, in the case when we are dealing with
only first-order formulas. Showing that the observation also applies in the
case of our higher-order formulas requires much work, as we have already
seen.
Definition 5.2.3. The length of a derivation E is defined by recursion on
its height as follows:
(i) It is 1 if E consists of only an initial sequent.
(ii) It is l + 1 if the last inference figure in E has a single upper sequent
whose derivation has length l.
(iii) It is l1 + l2 + 1 if the last inference figure in E has two upper sequents
whose derivations are of length l1 and l2 respectively.

The main result of this subsection requires the extraction of a successful


computation from a proof of a closed goal formula from a program. The
following lemma provides a means for achieving this end.
Lemma 5.2.4. Let Δ be a program, let G be a closed goal formula and let
Δ ⟶ G have a derivation of length l. Then one of the following is true:
(i) G is ⊤.
(ii) G is an atomic formula and either G = A for some A in [Δ] or for
some G′ ⊃ A ∈ [Δ] such that G = A it is the case that Δ ⟶ G′
has a derivation of length less than l.
(iii) G is G1 ∧ G2 and there are derivations for Δ ⟶ G1 and Δ ⟶ G2
of length less than l.
(iv) G is G1 ∨ G2 and there is a derivation for either Δ ⟶ G1 or
Δ ⟶ G2 of length less than l.
(v) G is ∃x G1 and for some closed positive term T it is the case that
Δ ⟶ [T/x]G1 has a derivation of length less than l.

Proof. We use an induction on the lengths of derivations. The lemma is
obviously true in the case this length is 1: G is either ⊤ or G is atomic and
G = A for some A ∈ Δ. When the length is l + 1, we consider the cases for
the last inference figure. The argument is simple for the cases of ∧-R, ∨-R
and ∃-R. For the case of ⊃-L, i.e., when the last inference figure is of the
form

where Δ = {G′ ⊃ A} ∪ Δ′, the argument depends on the structure of
G. If G is an atomic formula distinct from ⊤, the lemma follows from the
hypothesis applied to A, Δ′ ⟶ G except in the case when G = A. But in
the latter case we see that (G′ ⊃ A) ∈ [Δ] and a derivation for Δ ⟶ G′
of length less than l + 1 can be obtained from the one for Δ′ ⟶ G′ by
adding (G′ ⊃ A) to the antecedent of every sequent in that derivation. If
G is G1 ∧ G2, we see first that there must be derivations for A, Δ′ ⟶ G1
and A, Δ′ ⟶ G2 of smaller length than that for A, Δ′ ⟶ G. But
using the derivation for Δ′ ⟶ G′ in conjunction with these we obtain
derivations for G′ ⊃ A, Δ′ ⟶ G1 and G′ ⊃ A, Δ′ ⟶ G2 whose lengths
must be less than l + 1. Analogous arguments may be provided for the other
cases for the structure of G. Finally a similar (and in some sense simpler)
argument can be provided for the case when the last inference figure is a
∀-L. ∎
The equivalence of provability and the operational semantics defined in
the previous section is the content of the following theorem.
Theorem 5.2.5. If Δ is a program and G is a closed goal formula, then
Δ ⊢C G if and only if Δ ⊢o G.
Proof. Using Lemma 5.2.4 and an induction on the length of a derivation
it follows easily that Δ ⊢o G if Δ ⊢C G. In the converse direction we use
an induction on the length of the successful computation. If this length is
1, Δ ⟶ G must be an initial sequent. Consider now the case where G
is an atomic formula that is solved by finding a G′ ⊃ A ∈ [Δ] such that
G = A and then solving G′. By the hypothesis, Δ ⟶ G′ has a derivation,
as also does A, Δ ⟶ G. Using an ⊃-L in conjunction with these, we get a
derivation for G′ ⊃ A, Δ ⟶ G. Appending a sequence of ∀-L inference
figures below this, we get a derivation for Δ ⟶ G. The argument for
the remaining cases is simpler and is left to the reader. ∎

6 Towards a practical realization


A practical realization of the programming paradigm that we have de-
scribed thus far depends on the existence of an efficient procedure for de-
termining whether a query succeeds or fails relative to a given program.
The abstract interpreter that is described in Section 4 provides the skeleton
for such a procedure. However, this interpreter is deficient in an impor-
tant practical respect: it requires a prior knowledge of suitable instantia-
tions for the existential quantifiers in a goal formula. The technique that
is generally used in the first-order context for dealing with this problem
is that of delaying the instantiations of such quantifiers till a time when
information is available for making an appropriate choice. This effect is
achieved by replacing the quantified variables by placeholders whose values
are determined at a later stage through the process of unification. Thus, a
goal formula of the form ∃x G(x) is transformed into one of the form G(X)
where X is a new "logic" variable that may be instantiated at a subsequent
point in the computation. The attempt to solve an atomic goal formula
A involves looking for a definite clause ∀ȳ (G′ ⊃ A′) such that A unifies
with the atomic formula that results from A' by replacing the universally

quantified variables with logic variables. Finding such a clause results in


an instantiation of the logic variables in A and the next task becomes that
of solving a suitable instance of G'.
The approach outlined above is applicable to the context of higher-order
Horn clauses as well. The main difference is that we now have to consider
the unification of λ-terms in a situation where equality between these terms
is based on the rules of λ-conversion. This unification problem has been
studied by several researchers and in most extensive detail by [Huet, 1975].
In the first part of this section we expose those aspects of this problem and
its solution that are pertinent to the construction of an actual interpreter
for our language. We then introduce the notion of a p-derivation as a
generalization to the higher-order context of the SLD-derivations that are
used relative to first-order Horn clauses [Apt and van Emden, 1982]. At
one level, P-derivations are syntactic objects for demonstrating success on
a query and our discussions indicate their correctness from this perspective.
At another level, they provide the basis for an actual interpreter for our pro-
gramming paradigm: a symbol manipulating procedure that searches for
p-derivations would, in fact, be such an interpreter. Practicality requires
that the ultimate procedure conduct such a search in a deterministic man-
ner. Through our discussions here we expose those choices in search that
play a critical role from the perspective of completeness and, in the final
subsection, discuss ways in which these choices may be exercised by an
actual interpreter.

6.1 The higher-order unification problem


Let us call a pair of terms of the same type a disagreement pair. A dis-
agreement set is then a finite set, {⟨Ti, Si⟩ | 1 ≤ i ≤ n}, of disagreement
pairs, and a unifier for the set is a substitution θ such that, for 1 ≤ i ≤ n,
θ(Ti) = θ(Si). The higher-order unification problem can, in this context, be
stated as the following: Given any disagreement set, to determine whether
it has unifiers, and to explicitly provide a unifier if one exists.
The problem described above is a generalization of the well-known uni-
fication problem for first-order terms. The higher-order unification prob-
lem has certain properties that are different from those of the unification
problem in the first-order case. For instance, the question of whether
or not a unifier exists for an arbitrary disagreement set in the higher-
order context is known to be undecidable [Goldfarb, 1981; Huet, 1973b;
Lucchesi, 1972]. Similarly, it has been shown that most general unifiers
do not always exist for unifiable higher-order disagreement pairs [Gould,
1976]. Despite these characteristics of the problem, a systematic search
can be made for unifiers of a given disagreement set, and we discuss this
aspect below.
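A standard illustration of the absence of most general unifiers, well known
from the literature cited above though not spelled out in the text here, is
the following; F is assumed to be a function variable of type i → i and a a
parameter of type i.

    % Two incomparable unifiers for the single disagreement pair <(F a), a>:
    \theta_1 = \{\langle F, \lambda x\, x \rangle\}, \qquad
    \theta_2 = \{\langle F, \lambda x\, a \rangle\}.
    % Both yield \theta_i((F\,a)) = a, yet neither is an instance of the other
    % relative to \{F\}, so this unifiable pair has no most general unifier.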
Huet, in [Huet, 1975], describes a procedure for determining the exis-
tence of unifiers for a given disagreement set and, when unifiers do exist,

for enumerating some of them. This procedure utilizes the fact that there
are certain disagreement sets for which at least one unifier can easily be
provided and, similarly, there are other disagreement sets for which it is
easily manifest that no unifiers can exist. Given an arbitrary disagreement
set, the procedure attempts to reduce it to a disagreement set of one of
these two kinds. This reduction proceeds by an iterative use of two sim-
plifying functions, called SIMPL and MATCH, on disagreement sets. The
basis for the first of these functions is provided by the following lemma
whose proof may be found in [Huet, 1975]. In this lemma, and in the rest
of this section, we use the notation U(D) to denote the set of unifiers for a
disagreement set D.
Lemma 6.1.1. Let T1 = λx̄ (H1 A1 . . . Ar) and T2 = λx̄ (H2 B1 . . . Bs)
be two rigid terms of the same type that are in λ-normal form. Then
θ ∈ U({⟨T1, T2⟩}) if and only if
(i) H1 = H2 (and, therefore, r = s), and
(ii) θ ∈ U({⟨λx̄ Ai, λx̄ Bi⟩ | 1 ≤ i ≤ r}).

Given any term T and any substitution θ, it is apparent that θ(T) =
θ(λnorm(T)). Thus the question of unifying two terms can be reduced
to unifying their λ-normal forms. Let us say that T is rigid (flexible)
just in case λnorm(T) is rigid (flexible), and let us refer to the arguments
of λnorm(T) as the arguments of T. If T1 and T2 are two terms of the
same type, their λ-normal forms must have binders of the same length.
Furthermore, we may, by a sequence of α-conversions, arrange their binders
to be identical. If T1 and T2 are both rigid, then Lemma 6.1.1 provides us a
means for either determining that T1 and T2 have no unifiers or reducing the
problem of finding unifiers for T1 and T2 to that of finding unifiers for the
arguments of these terms. This is, in fact, the nature of the simplification
effected on a given unification problem by the function SIMPL.
Definition 6.1.2. The function SIMPL on sets of disagreement pairs is
defined as follows:
(1) If D = ∅ then SIMPL(D) = ∅.
(2) If D = {⟨T1, T2⟩}, then the forms of T1 and T2 are considered.
(a) If T1 is a flexible term then SIMPL(D) = D.
(b) If T2 is a flexible term then SIMPL(D) = {⟨T2, T1⟩}.
(c) Otherwise T1 and T2 are both rigid terms. Let λx̄ (C1 A1 . . . Ar)
and λx̄ (C2 B1 . . . Bs) be λ-normal forms for T1 and T2. If
C1 ≠ C2 then SIMPL(D) = F; otherwise
SIMPL(D) = SIMPL({⟨λx̄ Ai, λx̄ Bi⟩ | 1 ≤ i ≤ r}).
(3) Otherwise D has at least two members. Let D = {⟨T1i, T2i⟩ | 1 ≤ i ≤
n}.
(a) If SIMPL({⟨T1i, T2i⟩}) = F for some i then SIMPL(D) = F;
(b) Otherwise SIMPL(D) = ⋃_{1≤i≤n} SIMPL({⟨T1i, T2i⟩}).
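The following Python sketch of ours mirrors the control structure of this
definition. The helpers lnorm, is_rigid, head and args are assumed to be
supplied (they λ-normalize a term, test rigidity, and return the head and the
arguments of a normalized term), and the re-attachment of the shared binder
to corresponding arguments is elided, so this is only an outline of SIMPL,
not a faithful implementation.

    F = 'F'   # the failure marker

    def simpl(pairs, lnorm, is_rigid, head, args):
        out = []
        for (t1, t2) in pairs:
            n1, n2 = lnorm(t1), lnorm(t2)
            if not is_rigid(n1):
                out.append((n1, n2))      # flexible on the left: keep the pair
            elif not is_rigid(n2):
                out.append((n2, n1))      # rigid-flexible: swap so the flexible term leads
            else:                         # rigid-rigid: compare heads, descend to arguments
                if head(n1) != head(n2):
                    return F
                rest = simpl(list(zip(args(n1), args(n2))), lnorm, is_rigid, head, args)
                if rest == F:
                    return F
                out.extend(rest)
        return out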

Clearly, SIMPL transforms a given disagreement set into either the


marker F or a disagreement set consisting solely of "flexible-flexible" or
"flexible-rigid" terms. By an abuse of terminology, we shall regard F as a
disagreement set that has no unifiers. The intention, then, is that SIMPL
transforms the given set into a simplified set that has the same unifiers.
The lemma below that follows from the discussions in [Huet, 1975] shows
that SIMPL achieves this purpose in a finite number of steps.
Lemma 6.1.3. SIMPL is a total computable function on sets of disagree-
ment pairs. Further, if D is a set of disagreement pairs then U(D) =
U(SIMPL(D)).
The first phase in the process of finding unifiers for a given disagreement
set D thus consists of evaluating SIMPL(D). If the result of this is F, D
has no unifiers. On the other hand, if the result is a set that is either empty
or has only flexible-flexible pairs, at least one unifier can be provided easily
for the set, as we shall see in the proof of Theorem 6.2.7. Such a set is,
therefore, referred to as a solved set. If the set has at least one flexible-rigid
pair, then a substitution needs to be considered for the head of the flexible
term towards making the heads of the two terms in the pair identical.
There are essentially two kinds of "elementary" substitutions that may be
employed for this purpose. The first of these is one that makes the head
of the flexible term "imitate" that of the rigid term. In the context of
first-order terms this is, in fact, the only kind of substitution that needs
to be considered. However, if the head of the flexible formula is a function
variable, there is another possibility: one of the arguments of the flexible
term can be "projected" into the head position in the hope that the head
of the resulting term becomes identical to the head of the rigid one or
can be made so by a subsequent substitution. There are, thus, a set of
substitutions, each of which may be investigated separately as a component
of a complete unifier. The purpose of the function MATCH that is defined
below is to produce these substitutions.
Definition 6.1.4. Let V be a set of variables, let T1 be a flexible term, let
T2 be a rigid term of the same type as T1, and let \x(F A1 . . . Ar), and
Ax (C B1 ... Bs) be A-normal forms of T1 and T2. Further, let the type of
F be <TI -> • • • - > • o> -t T, where T is atomic and, for 1 < i < r, let Wi be
a variable of type a{. The functions IMIT, PROJ, and MATCH are then
defined as follows:

(i) If C is a variable (appearing also in x), then IMIT(T1,T2, V) = 0;


otherwise
Higher-Order Logic Programming 541

where H1, . . . , Hs are variables of appropriate types not contained in


VU{w1, . . . ,wr}.
(ii) For 1 ≤ i ≤ r, if σi is not of the form τ1 → ... → τt → τ then
PROJi(T1, T2, V) = ∅; otherwise
    PROJi(T1, T2, V) = { {⟨F, λw1 ... λwr (wi (H1 w1 ... wr) ... (Ht w1 ... wr))⟩} },
where H1, ..., Ht are variables of appropriate types not contained in
V ∪ {w1, ..., wr}.
(iii) MATCH(T1, T2, V) = IMIT(T1, T2, V) ∪ (⋃_{1≤i≤r} PROJi(T1, T2, V)).

The purpose of MATCH is to suggest a set of substitutions that may


form "initial segments" of unifiers and, in this process, bring the search
for a unifier closer to resolution. That MATCH achieves this effect is the
content of the following lemma whose proof may be found in [Huet, 1975]
or [Nadathur, 1987].
Lemma 6.1.5. Let V be a set of variables, let T1 be a flexible term and
let T2 be a rigid term of the same type as T1. If there is a substitution
θ ∈ U({⟨T1, T2⟩}) then there is a substitution φ ∈ MATCH(T1, T2, V) and
a corresponding substitution θ' such that θ =_{V} θ' ∘ φ. Further, there is
a mapping π from substitutions to natural numbers, i.e., a measure on
substitutions, such that π(θ') < π(θ).
A unification procedure may now be described based on an iterative
use of SIMPL and MATCH. The structure of this procedure is apparent
from the above discussions and its correctness can be ascertained by using
Lemmas 6.1.3 and 6.1.5. A procedure that searches for a P-derivation, a
notion that we describe next, actually embeds such a unification procedure
within it.
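The interplay between SIMPL and MATCH may be illustrated by a small example in which a is assumed to be a constant of type i, g a constant of type i → i and F a variable of type i → i. The disagreement set {⟨(F a), (g a)⟩} is left unchanged by SIMPL since its only pair is flexible-rigid. MATCH then offers two elementary substitutions: the imitation {⟨F, λw (g (H w))⟩}, where H is a new variable, and the projection {⟨F, λw w⟩}. Under the projection the pair becomes ⟨a, (g a)⟩ and SIMPL reduces the set to F, the heads a and g being distinct. Under the imitation the pair becomes ⟨(g (H a)), (g a)⟩, which SIMPL simplifies to {⟨(H a), a⟩}; one further imitation or projection step for H then produces a solved (in fact, empty) set, and composing the substitutions along the two successful paths yields the unifiers {⟨F, λw (g a)⟩} and {⟨F, λw (g w)⟩}.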
6.2 P-derivations
Let the symbols g, D, θ and V, perhaps with subscripts, denote sets of
formulas of type o, disagreement sets, substitutions and sets of variables,
respectively. The following definition describes the notion of one tuple of
the form ⟨g, D, θ, V⟩ being derivable from another similar tuple relative to
a program P.
Definition 6.2.1. Let P be a program. We say a tuple ⟨g2, D2, θ2, V2⟩ is
P-derivable from another tuple ⟨g1, D1, θ1, V1⟩ if D1 ≠ F and, in addition,
one of the following situations holds:
(1) (Goal reduction step) θ2 = ∅, D2 = D1, and there is a goal formula
G ∈ g1 such that
(a) G is T and g2 = g1 - {G} and V2 = V1, or
(b) G is G1 ∧ G2 and g2 = (g1 - {G}) ∪ {G1, G2} and V2 = V1, or
(c) G is G1 ∨ G2 and, for i = 1 or i = 2, g2 = (g1 - {G}) ∪ {Gi}
and V2 = V1, or
(d) G is ∃x P and for some variable y ∉ V1 it is the case that
V2 = V1 ∪ {y} and g2 = (g1 - {G}) ∪ {[y/x]P}.
(2) (Backchaining step) Let G ∈ g1 be a rigid positive atomic formula,
and let D ∈ P be such that D ≡ ∀x̄ (G' ⊃ A) for some sequence of
variables x̄ no member of which is in V1. Then θ2 = ∅, V2 = V1 ∪ {x̄},
g2 = (g1 - {G}) ∪ {G'}, and D2 = SIMPL(D1 ∪ {⟨G, A⟩}). (Here,
{x̄} denotes the set of variables occurring in the list x̄.)
(3) (Unification step) D1 is not a solved set and for some flexible-rigid pair
⟨T1, T2⟩ ∈ D1, either MATCH(T1, T2, V1) = ∅ and D2 = F, or there is
a φ ∈ MATCH(T1, T2, V1) and it is the case that θ2 = φ, g2 = φ(g1),
D2 = SIMPL(φ(D1)), and, assuming φ is the substitution {⟨x, S⟩}
for some variable x, V2 = V1 ∪ F(S).

Let us call a finite set of goal formulas a goal set, and a disagreement set
that is F or consists solely of pairs of positive terms a positive disagreement
set. If g1 is a goal set and D1 is a positive disagreement set then it is clear,
from Definitions 6.1.2, 6.1.4 and 6.2.1 and the fact that a positive term
remains a positive term under a positive substitution, that g2 is a goal
set and D2 a positive disagreement set for any tuple ⟨g2, D2, θ2, V2⟩ that is
P-derivable from ⟨g1, D1, θ1, V1⟩.
Definition 6.2.2. Let g be a goal set. Then we say that a sequence of the
form ⟨gi, Di, θi, Vi⟩_{1≤i≤n} is a P-derivation sequence for g just in case g1 =
g, V1 = F(g1), D1 = ∅, θ1 = ∅, and, for 1 ≤ i < n, ⟨gi+1, Di+1, θi+1, Vi+1⟩
is P-derivable from ⟨gi, Di, θi, Vi⟩.
From our earlier observations it follows easily that, in a P-
derivation sequence for a goal set g, each gi is a goal set and each Di
is a positive disagreement set. We make implicit use of this fact in our
discussions below. In particular, we intend unqualified uses of the sym-
bols g and D to be read as syntactic variables for goal sets and positive
disagreement sets, respectively.
Definition 6.2.3. A P-derivation sequence ⟨gi, Di, θi, Vi⟩_{1≤i≤n} terminates,
i.e., is not contained in a longer sequence, if
(a) gn is either empty or is a goal set consisting solely of flexible atoms
and Dn is either empty or consists solely of flexible-flexible pairs, or
(b) Dn = F.
In the former case we say that it is a successfully terminated sequence. If
this sequence also happens to be a P-derivation sequence for g, then we call
it a P-derivation of g and we say that θn ∘ ... ∘ θ1 is its answer substitution.²
If g = {G} then we also say that the sequence is a P-derivation of G.
Example 6.2.4. Let P be the set of definite clauses defining the predicate
mapfun in Example 4.0.3. Further, let G be the goal formula
(mapfun F1 (cons 1 (cons 2 nil)) (cons (h 1 1) (cons (h 1 2) nil)))
in which F1 is a variable and all other symbols are parameters as in Exam-
ple 4.0.3. Then the tuple ⟨g1, D1, ∅, V1⟩ is P-derivable from ⟨{G}, ∅, ∅, {F1}⟩
by a backchaining step, if
V1 = {F1, F2, L1, L2, X},
g1 = {(mapfun F2 L1 L2)}, and
D1 = {⟨F1, F2⟩, ⟨X, 1⟩, ⟨(F1 X), (h 1 1)⟩,
⟨L1, (cons 2 nil)⟩, ⟨L2, (cons (h 1 2) nil)⟩},
where F2, L1, L2, and X are variables. Similarly, if
V2 = V1 ∪ {H1, H2},
g2 = {(mapfun F2 L1 L2)},
θ2 = {⟨F1, λw (h (H1 w) (H2 w))⟩}, and
D2 = {⟨L1, (cons 2 nil)⟩, ⟨L2, (cons (h 1 2) nil)⟩, ⟨X, 1⟩,
⟨(H1 X), 1⟩, ⟨(H2 X), 1⟩, ⟨F2, λw (h (H1 w) (H2 w))⟩},
then the tuple ⟨g2, D2, θ2, V2⟩ is P-derivable from ⟨g1, D1, ∅, V1⟩ by a uni-
fication step. (H1, H2 and w are additional variables here.) It is, in fact,
obtained by picking the flexible-rigid pair ⟨(F1 X), (h 1 1)⟩ from D1 and
using the substitution provided by IMIT for this pair. If the substitu-
tion provided by PROJ1 was picked instead, we would obtain the tuple
⟨g2, F, {⟨F1, λw w⟩}, V1⟩.
There are several P-derivations of G, and all of them have the same
answer substitution: {⟨F1, λw (h 1 w)⟩}.
Example 6.2.5. Let P be the program containing only the definite clause
∀X (X ⊃ (p a)), where X is a variable of type o and p and a are parameters
of type i → o and i, respectively. Then, the following sequence of tuples
constitutes a P-derivation of ∃Y (p Y):
⟨{∃Y (p Y)}, ∅, ∅, ∅⟩, ⟨{(p Y)}, ∅, ∅, {Y}⟩,
⟨{X}, {⟨Y, a⟩}, ∅, {X, Y}⟩, ⟨{X}, ∅, {⟨Y, a⟩}, {X, Y}⟩.
Notice that this is a successfully terminated sequence, even though the


final goal set contains a flexible atom. We shall presently see that a goal
set that contains only flexible atoms can be "solved" rather easily. In this
particular case, for instance, the final goal set may be solved by applying
the substitution {⟨X, T⟩} to it.
² This is somewhat distinct from what might be construed as the result of a compu-
tation. The latter is obtained by taking the final goal and disagreement sets into
account and restricting the substitution to the free variables in the original goal set. We
discuss this matter in Subsection 6.3.

A P-derivation of a goal G is intended to show that G succeeds in the


context of a program P. The following lemma is useful in proving that
P-derivations are true to this intent. A proof of this lemma may be found
in [Nadathur, 1987] or [Nadathur and Miller, 1990]. The property of the
P-derivability relation that it states should be plausible at an intuitive level,
given Lemma 6.1.3 and the success/failure semantics for goals.
Lemma 6.2.6. Let ⟨g2, D2, θ2, V2⟩ be P-derivable from ⟨g1, D1, θ1, V1⟩,
and let D2 ≠ F. Further let φ ∈ U(D2) be a positive substitution such
that P ⊢O G for every G that is a closed positive instance of a formula in
φ(g2). Then
(i) φ ∘ θ2 ∈ U(D1), and
(ii) P ⊢O G' for every G' that is a closed positive instance of a formula in (φ ∘ θ2)(g1).

We now show the correctness of P-derivations. An interesting aspect


of the proof of the theorem below is that it describes the construction of a
substitution that simultaneously solves a set of flexible atoms and unifies
a set of flexible-flexible disagreement pairs.
Theorem 6.2.7. (Soundness of P-derivations) Let ⟨gi, Di, θi, Vi⟩_{1≤i≤n} be
a P-derivation of G, and let θ be its answer substitution. Then there is a
positive substitution φ such that
(i) φ ∈ U(Dn), and
(ii) P ⊢O G' for every G' that is a closed positive instance of the goal
formulas in φ(gn).
Further, if φ is a positive substitution satisfying (i) and (ii), then P ⊢O G''
for every G'' that is a closed positive instance of (φ ∘ θ)(G).
Proof. The second part of the theorem follows easily from Lemma 6.2.6
and a backward induction on the index of a tuple in the given P-
derivation sequence. For the first part we exhibit a substitution — that
is a simple modification of the one in Lemma 3.5 in [Huet, 1975] — and then
show that it satisfies the requirements.
Let Hα be a chosen variable for each atomic type α. Then for each type
τ we identify a term Eτ in the following fashion:
(a) If τ is o, then Eτ = T.
(b) If τ is an atomic type other than o, then Eτ = Hτ.
(c) If τ is the function type τ1 → ... → τk → τ0 where τ0 is an atomic
type, then Eτ = λx1 ... λxk Eτ0, where, for 1 ≤ i ≤ k, xi is a variable
of type τi that is distinct from Hτi.
Now let φ = {⟨Y, Eτ⟩ | Y is a variable of type τ and Y ∈ F(gn) ∪ F(Dn)}.
Any goal formula in gn is of the form (P S1 ... Sl) where P is a variable
whose type is of the form σ1 → ... → σl → o. From this it is apparent that
if G ∈ gn then any ground instance of φ(G) is identical to T. Thus, it is
clear that φ satisfies (ii). If Dn is empty then φ ∈ U(Dn). Otherwise, let
⟨T1, T2⟩ ∈ Dn. Since T1 and T2 are two flexible terms, it may be seen that
φ(T1) and φ(T2) are of the form λy^1_1 ... λy^1_{m1} Eτ1 and λy^2_1 ... λy^2_{m2} Eτ2
respectively, where τi is a primitive type and Eτi ∉ {y^i_1, ..., y^i_{mi}} for i =
1, 2. Since T1 and T2 have the same types and substitution is a type
preserving mapping, it is clear that τ1 = τ2, m1 = m2 and, for 1 ≤ i ≤ m1,
y^1_i and y^2_i are variables of the same type. But then φ(T1) = φ(T2).
We would like to show a converse to the above theorem that states
that searching for a P-derivation is adequate for determining whether a
goal G succeeds or fails in the context of a program P. The crux of such
a theorem is showing that if the goals in the last tuple of a P-derivation
sequence succeed and the corresponding disagreement set has a unifier,
then that sequence can be extended to a successfully terminated sequence.
This is the content of the following lemma, a proof of which may be found
in [Nadathur, 1987] and [Nadathur and Miller, 1990]. The lemma should,
in any case, be intuitively acceptable, given the success/failure semantics
for goals and Lemmas 6.1.3 and 6.1.5.
Lemma 6.2.8. Let ⟨g1, D1, θ1, V1⟩ be a tuple that is not a terminated P-
derivation sequence and for which F(g1) ∪ F(D1) ⊆ V1. In addition, let
there be a positive substitution φ1 ∈ U(D1) such that, for each G1 ∈ g1,
φ1(G1) is a closed goal formula and P ⊢O φ1(G1). Then there is a tu-
ple ⟨g2, D2, θ2, V2⟩ that is P-derivable from ⟨g1, D1, θ1, V1⟩ and a positive
substitution φ2 such that
(i) φ2 ∈ U(D2),
(ii) φ1 =_{V1} φ2 ∘ θ2,
(iii) for each G2 ∈ g2, φ2(G2) is a closed goal formula that is such that
P ⊢O φ2(G2).

Further, there is a mapping κP from goal sets and substitutions to ordinals,
i.e., a measure on pairs of goal sets and substitutions, relative to P such
that κP(g2, φ2) < κP(g1, φ1). Finally, when there are several tuples that
are P-derivable from ⟨g1, D1, θ1, V1⟩, such a tuple and substitution exist for
every choice of (1) the kind of step, (2) the goal formula in a goal reduction
or backchaining step, and (3) the flexible-rigid pair in a unification step.
In using this lemma to prove the completeness of P-derivations, we
need the following observation that is obtained by a routine inspection of
Definition 6.2.2.
Lemma 6.2.9. Let ⟨g2, D2, θ2, V2⟩ be P-derivable from ⟨g1, D1, θ1, V1⟩ and
let D2 ≠ F. Then V1 ⊆ V2. Further, if F(g1) ∪ F(D1) ⊆ V1, then
F(g2) ∪ F(D2) ⊆ V2.

Theorem 6.2.10. (Completeness of P-derivations) Let φ be a closed posi-
tive substitution for the free variables of G such that P ⊢O φ(G). Then there
is a P-derivation of G with an answer substitution θ such that φ ⪯_{F(G)} θ.
Proof. From Lemmas 6.2.8 and 6.2.9 and the assumption of the theorem,
it is evident that there is a P-derivation sequence ⟨gi, Di, θi, Vi⟩_{1≤i} for {G}
and a sequence of substitutions γi such that
(i) γ1 = φ,
(ii) γi+1 satisfies the equation γi =_{Vi} γi+1 ∘ θi+1,
(iii) γi ∈ U(Di), and
(iv) κP(gi+1, γi+1) < κP(gi, γi).

From (iv) it follows that the sequence must be finite. From (iii) and Lem-
mas 6.1.3 and 6.1.5 it is evident, then, that it must be a successfully ter-
minated sequence, i.e., a P-derivation of G. Suppose that the length of the
sequence is n. From (i), (ii), Lemma 6.2.9 and an induction on n, it can
be seen that φ ⪯_{V1} θn ∘ ... ∘ θ1. But F(G) = V1 and θn ∘ ... ∘ θ1 is
the answer substitution for the sequence.

6.3 Designing an actual interpreter


An interpreter for a language based on our higher-order formulas may be
described as a procedure which, given a program P and a query G, attempts
to construct a P-derivation of G. The search space for such a procedure
is characterized by a set of states each of which consists of a goal set and
a disagreement set. The initial state in the search corresponds to the pair
({G},0). At each stage in the search, the procedure is confronted with
a state that it must attempt to simplify. The process of simplification
involves either trying to find a solution to the unification problem posed
by the disagreement set, or reducing the goal set to an empty set or a set
of flexible atomic goal formulas. Given a particular state, there are several
ways in which an attempt might be made to bring the search closer to a
resolution. However, Definition 6.2.1 indicates that the possible choices are
finite, and Theorem 6.2.10 assures us that if these are tried exhaustively
then a solution is bound to be found if one exists at all. Further, a solution
path may be augmented by substitutions from which an answer can be
extracted in a manner akin to the first-order case.
An exhaustive search, while being necessary for completeness, is clearly
an inappropriate basis for an interpreter for a programming language, and
a trade-off needs to be made between completeness and practicality. From
Lemma 6.2.8 we understand that certain choices can be made by the inter-
preter without adverse effects. However, the following choices are critical:
(i) the disjunct to be added to the goal set in a goal reduction step
involving a disjunctive goal formula,

(ii) the definite clause to be used in a backchaining step, and


(iii) the substitution to be used in a unification step.
When such choices are encountered, the interpreter can, with an accompa-
nying loss of completeness, explore them in a depth-first fashion with the
possibility of backtracking. The best order in which to examine the options
is a matter that is to be settled by experimentation. This issue has been ex-
plored in actual implementations of our language [Brisset and Ridoux, 1992;
Miller and Nadathur, 1988; Elliott and Pfenning, 1991], and we comment
briefly on the insights that have been gained from these implementations.
The description of the search space for the interpreter poses a natural
first question: Should the unification problem be solved first or should a
goal reduction step be attempted? A reasonable strategy in this context
appears to be that of using a unification step whenever possible and at-
tempting to reduce the goal set only after a solved disagreement set has
been produced. There are two points that should be noted with regard
to this strategy. First, an interpreter that always tries to solve an un-
solved unification problem before looking at the goal set would function
in a manner similar to standard Prolog interpreters which always solve
the (straightforward) first-order unification problems first. Second, the
attempt to solve a unification problem stops short of looking for unifiers
for solved disagreement sets. The search for unifiers for such sets can be
rather expensive [Huet, 1975], and may be avoided by carrying the solved
disagreement sets forward as constraints on the remaining search. In fact,
it appears preferable not to look for unifiers for these sets even after the
goal set has been reduced to a "solved" set. When the search reaches such
a stage, the answer substitution and the final solved disagreement and goal
sets may be produced as a response to the original query. From Theo-
rem 6.2.7 we see that composing any substitution that is a unifier for this
disagreement set and that also "solves" the corresponding goal set with
the answer substitution produces a complete solution to the query and the
response of the interpreter may be understood in this sense. In reality,
the presentation of the answer substitution and the mentioned sets can be
limited to only those components that have a bearing on the substitutions
for the free variables in the original query since it is these that constitute
the result of the computation.
In attempting to solve the unification problem corresponding to a dis-
agreement set, a flexible-rigid pair in the set may be picked arbitrarily.
Invoking MATCH on such a pair produces a set of substitutions in gen-
eral. One of these needs to be picked to progress the search further, and
the others must be retained as alternatives to backtrack to in case of a
failure. Certain biases may be incorporated in choosing from the substitu-
tions provided by MATCH, depending on the kinds of solutions that are
desired first. For example, consider the unification problem posed by the

disagreement set {⟨(F 2), (cons 2 (cons 2 nil))⟩} where F is a variable of


type int -> (list int) and the other symbols are as in Example 4.0.3. There
are four unifiers for this set:
{⟨F, λx (cons x (cons x nil))⟩}, {⟨F, λx (cons x (cons 2 nil))⟩},
{⟨F, λx (cons 2 (cons x nil))⟩}, and {⟨F, λx (cons 2 (cons 2 nil))⟩}.
If the substitutions provided by the PROJi's are chosen first at each stage, then
these unifiers will be produced in the order that they appear above, perhaps
with the second and third interchanged. On the other hand, choosing the
substitution provided by IMIT first results in these unifiers being produced
in the reverse order. Now, the above unification problem may arise in a
programming context out of the following kind of desire: We wish to unify
the function variable F with the result of "abstracting" out all occurrences
of a particular constant, which is 2 in this case, from a given data structure,
which is an integer list here. If this is the desire, then it is clearly preferable
to choose the PROJi substitutions before the substitution provided by
IMIT. In a slightly more elaborate scheme, the user may be given a means
for switching between these possibilities.
In attempting to solve a goal set, a nonflexible goal formula from the
set may be picked arbitrarily. If the goal formula is either conjunctive or
existential, then there is only one way in which it can be simplified. If the
goal formula picked is G1 ∨ G2 and the remaining goal set is g, then, for
the sake of completeness, the interpreter should try to solve {G1} ∪ g and
{G2} ∪ g simultaneously. In practice, the interpreter may attempt to solve
{G1} ∪ g first, returning to {G2} ∪ g only in case of failure. This approach,
as the reader might notice, is in keeping with the one used in Prolog. In
the case that the goal formula picked is atomic, a backchaining step must
be used. In performing such a step, it is enough to consider only those
definite clauses in the program of the form ∀x̄ A or ∀x̄ (G ⊃ A) where the
head of A is identical to the head of the goal formula being solved, since all
other cases will cause the disagreement set to be reduced to F by SIMPL.
For completeness, it is necessary to use each of these definite clauses in a
breadth-first fashion in attempting to solve the goal formula. Here again
the scheme that is used by standard Prolog interpreters might be adopted:
the first appropriate clause might be picked based on some predetermined
ordering and others may be returned to only if this choice leads to failure.
The above discussion indicates how an interpreter can be constructed
based on the notion of P-derivations. An interpreter based on only these
ideas would still be a fairly simplistic one. Several specific improvements
(such as recognizing and handling special kinds of unification problems) and
enhancements (such as those for dealing with polymorphism, a necessary
practical addition) can be described to this interpreter. An examination
of these aspects is beyond the scope of this chapter. For the interested
reader, we mention that a further discussion of some of these aspects may

be found in [Nadathur, 1987], that a detailed presentation of a particular


interpreter appears in [Elliott and Pfenning, 1991], that a class of λ-terms
with interesting unifiability properties is studied in [Miller, 1991] and that
ideas relevant to compilation are explored in [Brisset and Ridoux, 1991;
Kwon et al., 1994; Nadathur et al., 1995; Nadathur et al., 1993].

7 Examples of higher-order programming


In this section, we present some programs in our higher-order language
that use predicate variables and λ-terms of predicate type. We begin this
discussion by describing a concrete syntax that will be used in the exam-
ples in this and later sections. We then present a set of simple programs
that illustrate the idea of higher-order programming. In Subsection 7.3
we describe a higher-order logic programming approach to implementing
goal-directed theorem proving based on the use of tactics and tacticals. We
conclude this section with a brief comparison of the notion of higher-order
programming in the logic programming and the functional programming
settings.

7.1 A concrete syntax for programs


The syntax that we shall use is adapted from that of the language λProlog.
In the typical programming situation, it is necessary to identify three kinds
of objects: the sorts and type constructors, the constants and variables
with their associated types and the definite clauses that define the various
predicate symbols. We present the devices for making these identifications
below and simultaneously describe the syntax for expressions that use the
objects so identified.
We assume that the two sorts o and int corresponding to propositions
and integers and the unary list type constructor list are available in our
language at the outset. This collection can be enhanced by declarations of
the form
kind <Id> type -> . . . -> type.
in which <Id> represents a token that is formed out of a sequence of alpha-
numeric characters and that begins with a lowercase letter. Such a decla-
ration identifies <Id> as a type constructor whose arity is one less than the
number of occurrences of type in the declaration. A declaration of this kind
may also be used to add new sorts, given that these are, in fact, nullary
type constructors. As specific examples, if int and list were not primitive
to the language, then they might be added to the available collections by
the declarations
kind int type.
kind list type -> type.

The sorts and type constructors available in a given programming con-


text can be used as expected in forming type expressions. Such expressions
might also use the constructor for function types that is written in con-
crete syntax as ->. Thus, the type int → (list int) → (list int) seen first
in Example 3.1.1 would be written now as int -> (list int) -> (list
int).
The logical constants are rendered into concrete syntax as follows: T
is represented by true, ∧ and ∨ are denoted by the comma and semicolon
respectively, implication is rendered into :- after being written in reverse
order (i.e., G ⊃ A is denoted by A :- G where A and G are the respective
translations of A and G), ¬ is not used, and ∀ and ∃ of type (τ → o) → o
are represented by pi and sigma respectively. The last two constants are
polymorphic in a sense that is explained below. To reduce the number
of parentheses in expressions, we assume that conjunction and disjunction
are right associative operators and that they have narrower scope than
implication. Thus, the formula (F ∧ (G ∧ H)) ⊃ A will be denoted by an
expression of the form A :- F, G, H.
Nonlogical constants and variables are represented by tokens formed out
of sequences of alphanumeric characters or sequences of "sign" characters.
Symbols that consist solely of numeric characters are treated as nonlogical
constants that have the type int associated with them. For other constants,
a type association is achieved by a declaration of the form
type <Id> <Type>.
in which <Id> represents a constant and <Type> represents a type expres-
sion. As an example, if i has been defined to be a sort, then
type f (list i) -> i.
is a valid type declaration and results in f being identified as a constant
of type (list i) -> i. The types of variables will be left implicit in our
examples with the intention that they be inferred from the context.
It is sometimes convenient to identify a family of constants whose types
are similar through one declaration. To accommodate this possibility, our
concrete syntax allows for variables in type expressions. Such variables
are denoted by capital letters. Thus, A -> (list A) -> (list A) is a
valid type expression. A type declaration in which variables occur in the
type is to be understood in the following fashion: It represents an infinite
number of declarations each of which is obtained by substituting, in a
uniform manner, closed types for the variables that occur in the type. For
instance, the quantifier symbol sigma can be thought to be given by the
type declaration
type sigma (A -> o) -> o.
This declaration represents, amongst others, the declarations
type sigma (int -> o) -> o.

type sigma ((int -> int) -> o) -> o.


The tokens sigma that appear in each of these (virtual) declarations are,
of course, distinct and might be thought to be subscripted by the type
chosen in each case for the variable A. As another example, consider the
declarations
type nil (list A).
type :: A -> (list A) -> (list A) .
The symbols nil and :: that are identified by them serve as polymor-
phic counterparts of the constants nil and cons of Example 4.0.3: using
"instances" of nil and :: that are of type
(list int) and int -> (list int) -> (list int)
respectively, it is possible to construct representations of lists of objects
of type int. These two symbols are treated as pre-defined ones of our
language and we further assume that :: is an infix and right-associative
operator.
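For example, the expression 1::2::nil is parsed as 1::(2::nil), i.e., as the term obtained by applying :: to 1 and to the list 2::nil; using the instances of :: and nil at the type int, this expression represents the list containing the integers 1 and 2.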
The symbol \ is used as an infix and right-associative operator that
denotes abstraction and juxtaposition with some intervening white-space
serves as an infix and left-associative operator that represents application.
Parentheses may be omitted in expressions by assuming that abstractions
bind more tightly than applications. Thus, the expressions
x\x, (x\x y\y) and x\y\z\(x z (y z))
denote, respectively, the terms
λx x, ((λx x) (λy y)) and λx λy λz ((x z) (y z)).
Within formulas, tokens that are not explicitly bound by an abstraction are
taken to be variables if they begin with an uppercase letter. A collection
of definite clauses that constitute a program will be depicted as in Prolog
by writing them in sequence, each clause being terminated by a period.
Variables that are not given an explicit scope in any of these clauses are
assumed to be universally quantified over the entire clause in which they
appear. As an example, assuming that the declaration
type append (list A) -> (list A) -> (list A) -> o.
gives the type of the predicate parameter append, the following definite
clauses define the append relation of Prolog:
append nil L L.
(append (X::L1) L2 (X::L3)) :- (append L1 L2 L3).
Notice that, in contrast to Prolog and in keeping with the higher-order
nature of our language, our syntax for formulas and terms is a curried one.
We depict a query by writing a formula of the appropriate kind preceded
by the token ?- and followed by a period. As observed in Section 4, the
free variables in a query are implicitly existentially quantified over it and

represent a means for extracting a result from a computation. Thus, the


expression
?- (append (1::2::nil) (3::4::nil) L).
represents a query that asks for the result of appending the two lists
1::2::nil and 3::4::nil.
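Solving this query relative to the two clauses shown for append leads to exactly one answer substitution, namely one that binds L to the list 1::2::3::4::nil.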
In conjunction with the append example, we see that variables may
occur in the types corresponding to the constants and variables that appear
in a "definite clause". Such a clause is, again, to be thought of as a schema
that represents an infinite set of definite clauses, each member of this set
being obtained by substituting closed types for the type variables that
occur in the schema. Such a substitution is, of course, to be constrained by
the fact that the resulting instance must constitute a well-formed definite
clause. Thus, consider the first of the definite clause "schemata" defining
append. Writing the types of nil, L and append as (list B), C and
(list A) -> (list A) -> (list A) -> o,
respectively, we see that C must be instantiated by a type of the form (list
D) and, further, that the same closed type must replace the variables A, B
and D. Consistent with this viewpoint, the invocation of a clause such as
this one must also be accompanied by a determination of the type instance
to be used. In practice this choice can be delayed through unification at the
level of types. This possibility is discussed in greater detail in [Nadathur
and Pfenning, 1992] and we assume below that it is used in the process of
solving queries.
A final comment concerns the inference of types for variables. Obvi-
ous requirements of this inference is that the overall term be judged to
be well-formed and that every occurrence of a variable that is (implicitly
or explicitly) bound by the same abstraction be accorded the same type.
However, these requirements are, of themselves, not sufficiently constrain-
ing. For example, in the first definite clause for append, these requirements
can be met by assigning the variable L the type (list int) or the type
(list D) as indicated above. Conflicts of this kind are to be resolved by
choosing a type for each variable that is most general in the sense that ev-
ery other acceptable type is an instance of it. This additional condition can
be sensibly imposed and it ensures that types can be inferred for variables
in an unambiguous fashion [Damas and Milner, 1982].
7.2 Some simple higher-order programs
Five higher-order predicate constants are defined through the declarations
in Figure 2. The intended meanings of these higher-order predicates are the
following: A closed query of the form (mappred P L K) is to be solvable if
P is a binary predicate, L and K are lists of equal length and corresponding
elements of L and K are related by P; mappred is, thus, a polymorphic
version of the predicate mappred of Example 4.0.3. A closed query of

type mappred (A -> B -> o) -> (list A) -> (list B) -> o.


type forsome (A -> o) -> (list A) -> o.
type forevery (A -> o) -> (list A) -> o.
type trans (A -> A -> o) -> A -> A -> o.
type sublist (A -> o) -> (list A) -> (list A) -> o.

(mappred P nil nil).


(mappred P (X::L) (Y::K)) :- (P X Y), (mappred P L K).

(forsome P (X::L)) :- (P X).


(forsome P (X::L)) :- (forsome P L).

(forevery P nil).
(forevery P (X::L)) :- (P X), (forevery P L).

(trans R X Y) :- (R X Y).
(trans R X Z) :- (R X Y), (trans R Y Z).

(sublist P (X::L) (X::K)) :- (P X), (sublist P L K).


(sublist P (X::L) K) :- (sublist P L K).
(sublist P nil nil).

Fig. 2. Definition of some higher-order predicates

the form (forsome P L) is to be solvable if L is a list that contains at


least one element that satisfies the predicate P. In contrast, a closed query
of the form (forevery P L) is to be solvable only if all elements of L
satisfy the predicate P. Assuming that R is a closed term denoting a binary
predicate and X and Y are closed terms denoting two objects, the query
(trans R X Y) is to be solvable just in case the objects given by X and Y
are related by the transitive closure of the relation given by R; notice that
the subterm (trans R) of this query is also a predicate. Finally, a closed
query of the form (sublist P L K) is to be solvable if L is a sublist of K
all of whose elements satisfy P.
Figure 3 contains some declarations defining a predicate called age that
can be used in conjunction with these higher-order predicates. An inter-
preter for our language that is of the kind described in Section 6 will succeed
on the query
?- mappred x\y\(age x y) (ned::bob::sue::nil) L.
relative to the clauses in Figures 2 and 3, and will have an answer substi-
tution that binds L to the list (23::24::23::nil). This is, of course, the
list of ages of the individuals ned, bob and sue, respectively. If the query
?- mappred x\y\(age y x) (23::24::nil) K.

kind person type.


type bob person.
type sue person.
type ned person.
type age person -> int -> o.

(age bob 24).


(age sue 23).
(age ned 23).

Fig. 3. A database of people and their ages

is invoked relative to the same set of definite clauses, two substitutions for
K can be returned as answers: (sue::bob::nil) and (ned::bob::nil).
Notice that within the form of higher-order programming being considered,
non-determinism is supported naturally. Support is also provided in this
context for the use of "partially determined" predicates, i.e., predicate
expressions that contain variables whose values are to be determined in
the course of computation. The query
?- (forevery x\(age x A) (ned::bob::sue::nil)).
illustrates this feature. Solving this query requires determining if the pred-
icate x\(age x A) is true of ned, bob and sue. Notice that this predicate
has a variable A appearing in it and a binding for A will be determined in the
course of computation, causing the predicate to become further specified.
The given query will fail relative to the clauses in Figures 2 and 3 because
the three individuals in the list do not have a common age. However, the
query
?- (forevery x\(age x A) (ned::sue::nil)).
will succeed and will result in A being bound to 23. The last two queries
are to be contrasted with the query
?- (forevery x\(sigma Y\(age x Y)) (ned::bob::sue::nil)).
in which the predicate x\(sigma Y\(age x Y)) is completely determined.
This query will succeed relative to the clauses in Figures 2 and 3 since all
the individuals in the list (ned::bob::sue::nil) have an age defined for
them.
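As one further illustration, relative again to the clauses in Figures 2 and 3, consider the query
?- (forsome x\(age x 24) (ned::bob::sue::nil)).
This query succeeds: although ned does not satisfy the predicate x\(age x 24), bob does, and the second clause for forsome permits the initial part of the list to be skipped in reaching him.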
An interpreter for our language that is based on the ideas in Section 6
solves a query by using a succession of goal reduction, backchaining and
unification steps. None of these steps result in a flexible goal formula being
selected. Flexible goal formulas remain in the goal set until substitutions
performed on them in the course of computation make them rigid. This
may never happen and the computation may terminate with such goal
formulas being left in the goal set. Thus, consider the query

?- (P sue 23).
relative to the clauses in Figure 3. Our interpreter will succeed on this
immediately because the only goal formula in its initial goal set is a flex-
ible one. It is sensible to claim this to be a successful computation be-
cause there is at least one substitution — in particular, the substitution
x\y\true for P — that makes this query a solvable one. It might be argued
that there are meaningful answers for this query relative to the given pro-
gram and that these should be provided by the interpreter. For example,
it may appear that the binding of P to the term x\y\(age x y) (that is
equal by virtue of η-conversion to age) should be returned as an answer
to this query. However, many other similarly suitable terms can be of-
fered as bindings for P; for example, consider the terms x\y\(age ned 23)
and x\y\((age x 23), (age ned y)). There are, in fact, far too many
"suitable" answer substitutions for this kind of a query for any reasonable
interpreter to attempt to generate. It is for this reason that flexible goal
formulas are never selected in the course of constructing a P-derivation and
that the ones that persist at the end are presented as such to the user along
with the answer substitution and any remaining flexible-flexible pairs.
Despite these observations, flexible goal formulas can have a meaningful
role to play in programs because the range of acceptable substitutions for
predicate variables can be restricted by other clauses in the program. For
example, while it is not sensible to ask for the substitutions for R that make
the query
?- (R john mary).
a solvable one, a programmer can first describe a restricted collection of
predicate terms and then ask if any of these predicates can be substituted
for R to make the given query a satisfiable one. Thus, suppose that our
program contains the definite clauses that are presented in Figure 4. Then,
the query
?- (rel R), (R john mary).
is a meaningful one and is solvable only if the term
x\y\(sigma Z\((wife x Z), (mother Z y))).
is substituted for R. The second-order predicate rel specifies the collection
of predicate terms that are relevant to consider as substitutions in this
situation.
Our discussions pertaining to flexible queries have been based on a
certain logical view of predicates and the structure of predicate terms.
However, this is not the only tenable view. There is, in fact, an alternative
viewpoint under which a query such as
?- (P sue 23).
can be considered meaningful and for which the only legitimate answer is

type primrel (person -> person -> o) -> o.


type rel (person -> person -> o) -> o.
type mother person -> person -> o.
type wife person -> person -> o.

(primrel mother).
(primrel wife).
(rel R) :- (primrel R).
(rel x\y\(sigma Z\((R x Z), (S Z y)))) :-
(primrel R), (primrel S).
(mother jane mary).
(wife john jane).

Fig. 4. Restricting the range of predicate substitutions

the substitution of age for P. We refer the reader to [Chen et al., 1993] for
a presentation of a logic that justifies this viewpoint and for the description
of a programming language that is based on this logic.

7.3 Implementing tactics and tacticals


To provide another illustration of higher-order programming, we consider
the task of implementing the tactics and tacticals that are often used in the
context of (interactive) theorem proving systems. As described in [Gordon
et al., 1979], a tactic is a primitive method for decomposing a goal into other
goals whose achievement or satisfaction ensures the achievement or satis-
faction of the original goal. A tactical is a high-level method for composing
tactics into meaningful and large scale problem solvers. The functional pro-
gramming language ML has often been used to implement tactics and tac-
ticals. We show here that this task can also be carried out in a higher-order
logic programming language. We use ideas from [Felty and Miller, 1988;
Felty, 1993] in this presentation.
The task that is at hand requires us, first of all, to describe an encod-
ing within terms for the notion of a goal in the relevant theorem proving
context. We might use g as a sort for expressions that encode such goals;
this sort, which corresponds to "object-level" goals, is to be distinguished
from the sort o that corresponds to "meta-level" goals, i.e., the queries of
our programming language. The terms that denote primitive object-level
goals will, in general, be determined by the problem domain being consid-
ered. For example, if the desire is to find proofs for formulas in first-order
logic, then the terms of type g will have to incorporate an encoding of such
formulas. Alternatively, if it is sequents of first-order logic that have to
be proved, then terms of type g should permit an encoding of sequents.
Additional constructors over the type g might be included to support the

encoding of compound goals. For instance, we will use below the constant
truegoal of type g to denote the trivially satisfiable goal and the constant
andgoal of type g -> g -> g to denote a goal formed out of the conjunc-
tion of two other goals. Other combinations such as the disjunction of two
goals are also possible and can be encoded in a similar way.
A tactic in our view is a binary relation between a primitive goal and
another goal, either compound or primitive. Thus tactics can be encoded
by predicates of type g -> g -> o. Abstractly, if a tactic R holds of G1 and
G2, i.e., if (R G1 G2) is solvable from a presentation of primitive tactics as
a set of definite clauses, then satisfying the goal G2 in the object-language
should suffice to satisfy goal G1.
An illustration of these ideas can be provided by considering the task
of implementing a proof procedure for propositional Horn clauses. For
simplicity of presentation, we restrict the propositional goal formulas that
will be considered to be conjunctions of propositions. The objective will, of
course, be to prove such formulas. Each primitive object-level goal therefore
corresponds to showing some atomic proposition to be true, and such a goal
might be encoded by a constant of type g whose name is identical to that
of the proposition. Now, if p and q are two atomic propositions, then
the goal of showing that their conjunction is true can be encoded in the
object-level goal (andgoal p q). The primitive method for reducing such
goals is that of backchaining on (propositional) definite clauses. Thus, the
tactics of interest will have the form (R H G), where H represents the head
of a clause and G is the goal corresponding to its body. The declarations
in Figure 5 use these ideas to provide a tactic-style encoding of the four
propositional clauses
p :- r,s.
q :- r.
s :- r,q.
r.
The tactics clltac, c12tac, c13tac and c14tac correspond to each of
these clauses respectively.
The declarations in Figure 6 serve to implement several general tacti-
cals. Notice that tacticals are higher-order predicates in our context since
they take tactics that are themselves predicates as arguments. The tacticals
in Figure 6 are to be understood as follows. The orelse tactical is one that
succeeds if either of the two tactics it is given can be used to successfully
reduce the given goal. The try tactical forms the reflexive closure of a given
tactic: if R is a tactic then (try R) is itself a tactic and, in fact, one that
always succeeds, returning the original goal if it is unable to reduce it by
means of R. This tactical uses an auxiliary tactic called idtac whose mean-
ing is self-evident. The then tactical specifies the natural join of the two
relations that are its arguments and is used to compose tactics: if R1 and

type p g.
type q g.
type r g.
type s g.

type cl1tac g -> g -> o.
type cl2tac g -> g -> o.
type cl3tac g -> g -> o.
type cl4tac g -> g -> o.

(cl1tac p (andgoal r s)).
(cl2tac q r).
(cl3tac s (andgoal r q)).
(cl4tac r truegoal).

Fig. 5. A tactic-style encoding of some propositional definite clauses

R2 are closed terms denoting tactics, and G1 and G2 are closed terms repre-
senting (object-level) goals, then the query (then R1 R2 G1 G2) succeeds
just in case the application of R1 to G1 produces G3 and the application of
R2 to G3 yields the goal G2. The maptac tactical is used in carrying out
the application of the second tactic in this process since G2 may not
be a primitive object-level goal: maptac maps a given tactic over all the
primitive goals in a compound goal. The then tactical plays a fundamental
role in combining the results of step-by-step goal reduction. The repeat
tactical is defined recursively using then, orelse, and idtac and it repeat-
edly applies the tactic it is given until this tactic is no longer applicable.
Finally, the complete tactical succeeds if the tactic it is given completely
solves the goal it is given. The completely solved goal can be written as
truegoal and (andgoal truegoal truegoal) and in several other ways,
and so the auxiliary predicate goalreduce is needed to reduce all of these
to truegoal. Although the complete tactical is the only one that uses the
predicate goalreduce, the other tacticals can be modified so that they also
use it to simplify the structure of the goal they produce whenever this is
possible.
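To see how these tacticals fit together on a small goal, assume the declarations in Figures 5 and 6 and consider the query
?- (then cl1tac (try cl4tac) p G).
Solving this query uses cl1tac to reduce p to (andgoal r s) and then maps the tactic (try cl4tac) over the two primitive goals in this compound goal: the goal r is reduced to truegoal by cl4tac, while the goal s, to which cl4tac does not apply, is returned unchanged by idtac. The first answer substitution produced for this query therefore binds G to (andgoal truegoal s).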
Tacticals, as mentioned earlier, can be used to combine tactics to pro-
duce large scale problem solvers. As an illustration of this, consider the
following definite clause
(depthfirst G) :-
(complete (repeat (orelse cl1tac
(orelse cl2tac (orelse cl3tac cl4tac))))
G truegoal).
in conjunction with the declarations in Figures 5 and 6. Assuming an

type then (g -> g -> o) -> (g -> g -> o)
-> g -> g -> o.
type orelse (g -> g -> o) -> (g -> g -> o)
-> g -> g -> o.
type maptac (g -> g -> o) -> g -> g -> o.
type repeat (g -> g -> o) -> g -> g -> o.
type try (g -> g -> o) -> g -> g -> o.
type complete (g -> g -> o) -> g -> g -> o.
type idtac g -> g -> o.
type goalreduce g -> g -> o.

(orelse R1 R2 G1 G2) :- (R1 G1 G2).
(orelse R1 R2 G1 G2) :- (R2 G1 G2).

(try R G1 G2) :- (orelse R idtac G1 G2).

(idtac G G).

(then R1 R2 G1 G2) :- (R1 G1 G3), (maptac R2 G3 G2).

(maptac R truegoal truegoal).


(maptac R (andgoal G1 G2) (andgoal G3 G4)) :-
(maptac R G1 G3), (maptac R G2 G4).
(maptac R G1 G2) :- (R G1 G2).

(repeat R G1 G2) :- (orelse (then R (repeat R)) idtac G1 G2).

(complete R G1 truegoal) :-
(R G1 G2), (goalreduce G2 truegoal).

(goalreduce (andgoal truegoal G1) G2) :- (goalreduce G1 G2).
(goalreduce (andgoal G1 truegoal) G2) :- (goalreduce G1 G2).
(goalreduce G G).

Fig. 6. Some simple tacticals

interpreter for our language of the kind described in Section 6, this clause
defines a procedure that attempts to solve an object-level goal by a depth-
first search using the given propositional Horn clauses. The query
?- (depthfirst p).
has a successful derivation and it follows from this that the proposition p
is a logical consequence of the given Horn clauses.

7.4 A comparison with functional programming


The examples of higher-order programming considered in this section have
similarities to higher-order programming in the functional setting. In the
latter context, higher-order programming corresponds to the treatment,
perhaps in a limited fashion, of functions as values. For example, in a
language like ML [Milner et al., 1990], function expressions can be con-
structed using the abstraction operation, bound to variables, and applied
to arguments. The semantics of such a language is usually extensional,
and so intensional operations on functions like that of testing whether two
function descriptions are identical are not supported by it. We have just
seen that our higher-order logic programming language supports the ability
to build predicate expressions, to bind these to variables, to apply them
to arguments and, finally, to invoke the resulting expressions as queries.
Thus, the higher-order capabilities present in functional programming lan-
guages are matched by ones available in our language. In the converse
direction, we note that there are some completely new higher-order capa-
bilities present in our language. In particular, this language is based on
logic that has intensional characteristics and so can support computations
on the descriptions of predicates and functions. This aspect is not exploited
in the examples in this section but will be in those in the next.
In providing a more detailed comparison of the higher-order program-
ming capabilities of our language that are based on an extensional view
with those of functional programming languages, it is useful to consider
the function maplist, the "canonical" higher-order function of functional
programming. This function is given by the following equations:
(maplist f nil) = nil
(maplist f (x::l)) = (f x)::(maplist f l)
There is a close correspondence between maplist and the predicate called
mappred that is defined in Subsection 7.2. In particular, let P be a pro-
gram, let q be a binary predicate and let f be a functional program such
that (q t s) is provable from P if and only if (f t) evaluates to s; that
is, the predicate q represents the function f. Further, let P' be P extended
with the two clauses that define mappred. (We assume that no definite
clauses defining mappred appear in P.) Then, for any functional program-
ming language that is reasonably pure, we can show that (maplist f l)
evaluates to k if and only if (mappred q l k) is provable from P'. Notice,
however, that mappred is "richer" than mapfun in that its first argument
can be a non-functional relation. In particular, if q denotes a non-functional
relation, then (mappred q) is an acceptable predicate and itself denotes a
non-functional relation. This aspect was illustrated earlier in this section
through the query
?- (mappred x\y\(age y x) (23::24::nil) K).

In a similar vein, it is possible for a predicate expression to be partially


specified in the sense that it contains logic variables that get instantiated
in the course of using the expression in a computation. An illustration of
this possibility was provided by the query
?- (forevery x\(age x A) (ned::sue::nil)).
These additional characteristics of higher-order logic programming are, of
course, completely natural and expected.

8 Using λ-terms as data structures


As noted in Section 2, the expressions that are permitted to appear as
the arguments of atomic formulas in a logic programming language con-
stitute the data structures of that language. The data structures of our
higher-order language are, therefore, the terms of a λ-calculus. There are
certain programming tasks that involve the manipulation of objects that
embody a notion of binding; the implementation of theorem proving and
program transformation systems that perform computations on quantified
formulas and programs are examples of such tasks. Perspicuous and suc-
cinct representations of such objects can be provided by using λ-terms.
The data structures of our language also incorporate a notion of equality
based on λ-conversion and this provides useful support for logical opera-
tions that might have to be performed on the objects being represented.
For example, following the discussion in Subsection 3.2, β-conversion can
be used to realize the operation of substitution on these objects. In a sim-
ilar sense, our language permits a quantification over function variables,
leading thereby to the provision of higher-order unification as a primitive
for taking apart λ-terms. The last feature is truly novel to our language
and a natural outcome of considering the idea of higher-order programming
in logic programming in its full generality: logic programming languages
typically support intensional analyses of objects that can be represented
in them, and the manipulation of λ-terms through higher-order unification
corresponds to such examinations of functional expressions.
The various features of our language that are described above lead to
several important applications for it in the realm of manipulating the syn-
tactic objects of other languages such as formulas and programs. We refer
to such applications as meta-programming ones and we provide illustrations
of them in this section. The examples we present deal first with the manip-
ulation of formulas and then with the manipulation of programs. Although
our language has several features that are useful from this perspective, it
is still lacking in certain respects as a language for meta-programming.
We discuss this aspect in the last subsection below as a prelude to an
extension of it that is considered in Section 9.
Before embarking on the main discussions of this section, it is useful to
recapitulate the kinds of computations that can be performed on functional
objects in our language. For this purpose, we consider the predicate mapfun
that is defined by the following declarations.
type mapfun (A -> B) -> (list A) -> (list B) -> o.

(mapfun F nil nil).


(mapfun F (X::L) ((F X)::K)) :- (mapfun F L K).
This predicate is a polymorphic version of the predicate mapfun presented
in Example 4.0.3 and it relates a term of functional type to two lists of equal
length if the elements of the second list can be obtained by applying the
functional term to the corresponding elements of the first list. The notion
of function application that is relevant here is, of course, the one given by
the λ-conversion rules. Thus, suppose that h is a nonlogical constant of
type int -> int -> int. Then the answer to the query
?- (mapfun x\(h 1 x) (1::2::nil) L).
is one that entails the substitution of the term ((h 1 1)::(h 1 2)::nil)
for L. In computing this answer, an interpreter for our language would have
to form the terms ((x\(h 1 x)) 1) and ((x\(h 1 x)) 2) that may sub-
sequently be simplified using the λ-conversion rules. As another example,
consider the query
?- (mapfun F (1::2::nil) ((h 1 1)::(h 1 2)::nil)).
Any P-derivation for this query would have an associated answer substi-
tution that contains the pair ⟨F, x\(h 1 x)⟩. A depth-first interpreter for
our language that is of the kind described in Section 6 would have to con-
sider unifying the terms (F 1) and (h 1 1). There are four incomparable
unifiers for these two terms and these are the ones that require substituting
the terms
x\(h x x), x\(h 1 x), x\(h x 1), and x\(h 1 1)
respectively for F. Now, the terms (F 2) and (h 1 2) can be unified only
under the second of these substitutions. Thus, if the interpreter selects a
substitution for F distinct from the second one above, it will be forced to
backtrack to reconsider this choice when it attempts to unify the latter two
terms. As a final example, consider the query
?- (mapfun F (1::2::nil) (3::4::nil)).
This query does not succeed in the context of our language. The reason
for this is that there is no λ-term of the kind used in our language that
yields 3 when applied to 1 and 4 when applied to 2. Note that there are
an infinite number of functions that map 1 to 3 and 2 to 4. However, none
of these functions can be expressed by our terms.
As a final introductory remark, we observe that it is necessary to dis-
tinguish in the discussions below between the programs and formulas that
are being manipulated and the ones in our language that carry out these

kind term type.


kind form type.

type false form.


type truth form.
type and form -> form -> form.
type or form -> form -> form.
type imp form -> form -> form.
type all (term -> form) -> form.
type some (term -> form) -> form.

type a term.
type b term.
type c term.
type f term -> term.
type path term -> term -> form.
type adj term -> term -> form.
type prog list form -> o.

prog ((adj a b) :: (adj b c) :: (adj c (f c)) ::
(all x\(all y\(imp (adj x y) (path x y)))) ::
(all x\(all y\(all z\(imp (and (adj x y) (path y z))
(path x z))))) ::
nil).

Fig. 7. A specification of an object-level logic and some definite clauses

manipulations. We do this by referring to the former as object-level ones


and to the latter as ones of the meta-level.

8.1 Implementing an interpreter for Horn clauses


Formulas in a quantificational logic can be represented naturally within the
terms of our language. The main concern in encoding these objects is that
of capturing the nature of quantification. As noted in Subsection 3.1, the
binding and predication aspects of a quantifier can be distinguished and
the former can be handled by the operation of abstraction. Adopting this
approach results in direct support for certain logical operations on formu-
las. For example, the equivalence of two formulas that differ only in the
names used for their bound variables is mirrored directly in the equality of
λ-terms by virtue of α-conversion. Similarly, the operation of instantiating
a formula is realized easily through β-conversion. Thus, suppose the ex-
pression (all x\T) represents a universally quantified formula. Then the
instantiation of this formula by a term represented by S is effected simply
by writing the expression ((x\T) S); this term is equal, via β-conversion,
to the term [S/x]T. The upshot of these observations is that a programmer
using our language does not need to explicitly implement procedures for
testing for alphabetic equivalence, for performing substitution or for car-
rying out other similar logical operations on formulas. The benefits of this
are substantial since implementing such operations correctly is a non-trivial
task.
We provide a specific illustration of the aspects discussed above by
considering the problem of implementing an interpreter for the logic of
first-order Horn clauses. The first problem to address here is the represen-
tation of object-level formulas and terms. We introduce the sorts form and
term for this purpose; λ-terms of these two types will represent the objects
in question. Object-level logical and nonlogical constants, functions, and
predicates will be represented by relevant meta-level nonlogical constants.
Thus, suppose we wish to represent the following list of first-order definite
clauses:
adj(a,b),
adj(b,c),
adj(c,f(c)),
∀x∀y(adj(x,y) ⊃ path(x,y)) and
∀x∀y∀z(adj(x,y) ∧ path(y,z) ⊃ path(x,z)).
We might do this by using the declarations in Figure 7. In this program, a
representation of a list of these clauses is eventually "stored" by using the
(meta-level) predicate prog. Notice that universal and existential quantifi-
cation at the object-level are encoded by using, respectively, the constants
all and some of second-order type and that variable binding at the object-
level is handled as expected by meta-level abstraction.
The clauses in Figure 8 implement an interpreter for the logic of first-
order Horn clauses assuming the representation just described. Thus, given
the clauses in Figures 7 and 8, the query
?- (prog Cs), (interp Cs (path a X)).
is solvable and produces three distinct answer substitutions, binding X to
b, c and (f c), respectively. Notice that object-level quantifiers need to be
instantiated at two places in an interpreter of the kind we desire: in dealing
with an existentially quantified goal formula and in generating an instance
of a universally quantified definite clause. Both kinds of instantiation are
realized in the clauses in Figure 8 through (meta-level) b-conversion and
the specific instantiation is delayed in both cases through the use of a logic
variable.
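To make the second of these concrete, suppose the interpreter is trying to
solve the goal (path a X) and backchains on the stored clause
    (all x\(all y\(imp (adj x y) (path x y)))).
The clause for backchain in Figure 8 that treats all binds D to the abstraction
x\(all y\(imp (adj x y) (path x y))) and applies it to a new logic variable T;
meta-level β-conversion then produces (all y\(imp (adj T y) (path T y))), and
the instantiation of T is determined only later, through unification with the
atomic goal.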
In understanding the advantages of our language, it is useful to consider
the task of implementing an interpreter for first-order Horn clause logic in
a pure first-order logic programming language. This task is a much more
involved one since object-level quantification will have to be represented in

type interp     form -> form -> o.
type backchain  form -> form -> form -> o.

(interp Cs (some B)) :- (interp Cs (B T)).
(interp Cs (and B C)) :- (interp Cs B), (interp Cs C).
(interp Cs (or B C)) :- (interp Cs B).
(interp Cs (or B C)) :- (interp Cs C).
(interp Cs A) :- (backchain Cs Cs A).

(backchain Cs (and D1 D2) A) :- (backchain Cs D1 A).
(backchain Cs (and D1 D2) A) :- (backchain Cs D2 A).
(backchain Cs (all D) A) :- (backchain Cs (D T) A).
(backchain Cs A A).
(backchain Cs (imp G A) A) :- (interp Cs G).

Fig. 8. An interpreter for first-order Horn clauses

a different way and substitution will have to be explicitly encoded in such


a language. These problems are "finessed" in the usual meta-interpreters
written in Prolog (such as those in [Sterling and Shapiro, 1986]) by the use
of non-logical features; in particular, by using the "predicate" clause.

8.2 Dealing with functional programs as data


Our second example of meta-programming deals with the problem of rep-
resenting functional programs and manipulating these as data. Some ver-
sion of the λ-calculus usually underlies a functional programming language
and, towards dealing with the representation issue, we first show how un-
typed λ-terms might be encoded in simply typed λ-terms. Figure 9 con-
tains some declarations that are relevant in this context. These declara-
tions identify the sort tm that corresponds to the encodings of object-level
λ-terms and the two constants abs and app that serve to encode object-
level abstraction and application. As illustrations of the representation
scheme, the untyped λ-terms λx x, λx λy (x y), and λx (x x) are denoted
by the simply typed λ-terms (abs x\x), (abs x\(abs y\(app x y))),
and (abs x\(app x x)), respectively. A notion of evaluation based on
β-reduction usually accompanies the untyped λ-terms. Such an evaluation
mechanism is not directly available under the encoding. However, it can be
realized through a collection of definite clauses. The clauses defining the
predicate eval in Figure 9 illustrate how this might be done. As a result
of these clauses, this predicate relates two meta-level terms if the second
encodes the result of evaluating the object-level term that is encoded by
the first; the assumption in these clauses is that any "value" that is pro-
duced must be distinct from an (object-level) application. We note again

kind tm type.

type abs (tm -> tm) -> tm.


type app tm -> tm -> tm.
type eval tm -> tm -> o.

(eval (abs R) (abs R)).


(eval (app M N) V) :-
(eval M (abs R)), (eval N U), (eval (R U) V).

Fig. 9. An encoding of untyped λ-terms and call-by-value evaluation

that application and β-conversion at the meta-level are used in the second
clause for eval to realize the needed substitution at the object-level of an
actual argument for a formal one of a functional expression.
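As a small illustration, consider the query
    ?- (eval (app (abs x\x) (abs y\(app y y))) V).
The second clause for eval applies: (abs x\x) and (abs y\(app y y)) both
evaluate to themselves by the first clause, and the meta-level application of
x\x to (abs y\(app y y)) β-converts to (abs y\(app y y)). The query should
therefore succeed with V bound to (abs y\(app y y)).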
The underlying λ-calculus is enhanced in the typical functional pro-
gramming language by including a collection of predefined constants and
functions. In encoding such constants a corresponding set of nonlogical
constants might be used. For the purposes of the discussions below, we
shall assume the collection presented in Figure 10 whose purpose is under-
stood as follows:
(1) The constants fixpt and cond encode the fixed-point and conditional
operators of the object-language being considered; these operators
play an essential role in realizing recursive schemes.
(2) The constants truth and false represent the boolean values, and the
constant and represents the binary operation of conjunction on booleans.
(3) The constant c when applied to an integer yields the encoding of that
integer; thus (c 0) encodes the integer 0. The constants +, *, -, <
and = represent the obvious binary operators on integers. Finally, the
constant intp encodes a function that recognizes integers: (intp e)
encodes an expression that evaluates to true if e is an encoding of an
integer and to false otherwise.
(4) The constants cons and nill encode list constructors, consp and
null encode recognizers for nonempty and empty lists, respectively,
and the constants car and cdr represent the usual list destructors.
(5) The constant pair represents the pair constructor, pairp represents
the pair recognizer, and first and second represent the obvious
destructors of pairs.
(6) The constant error encodes the error value.
Each of the constants and functions whose encoding is described above
has a predefined meaning that is used in the evaluation of expressions
in the object-language. These meanings are usually clarified by a set of

type cond tm -> tm -> tm -> tm.


type fixpt (tm -> tm) -> tm.

type truth tm.


type false tm.
type and tm -> tm -> tm.

type c int -> tm.


type + tm -> tm -> tm.
type - tm -> tm -> tm.
type * tm -> tm -> tm.
type < tm -> tm -> tm.
type =      tm -> tm -> tm.
type intp   tm -> tm.

type nill tm.


type cons   tm -> tm -> tm.
type null tm -> tm.
type consp tm -> tm.
type car tm -> tm.
type cdr tm -> tm.

type pair tm -> tm -> tm.


type pairp tm -> tm.
type first tm -> tm.
type second tm -> tm.

type error tm.

Fig. 10. Encodings for some predefined constants and functions

equations that are satisfied by them. For instance, if fixpt and cond are
the fixed-point and conditional operators of the functional programming
language being encoded and true and false are the boolean values, then
the meanings of these operators might be given by the following equations:
∀x ((fixpt x) = (x (fixpt x))),
∀x∀y ((cond true x y) = x), and
∀x∀y ((cond false x y) = y).
The effect of such equations on evaluation can be captured in our encoding
by augmenting the set of definite clauses for eval contained in Figure 9.
For example, the definite clause
(eval (fixpt F) V) :- (eval (F (fixpt F)) V).

can be added to this collection to realize the effect of the first of the equa-
tions above. We do not describe a complete set of equations or an encoding
of it here, but the interested reader can find extended discussions of this
and other related issues in [Hannan and Miller, 1992] and [Hannan, 1993].
A point that should be noted about an encoding of the kind described here
is that it reflects the effect of the object-language equations only in the
evaluation relation and does not affect the equality relation of the meta-
language. In particular, the notions of equality and unification for our
typed λ-terms, even those containing the new nonlogical constants, are
still governed only by the λ-conversion rules.
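For instance, the equations for the conditional operator might be reflected
in this style by clauses such as the following; this is only a sketch, and the
precise form would depend on the evaluation order intended for the object-
language and assumes that clauses letting truth and false evaluate to
themselves are also included:
    (eval (cond C M N) V) :- (eval C truth), (eval M V).
    (eval (cond C M N) V) :- (eval C false), (eval N V).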
The set of nonlogical constants that we have described above suffices for
representing several recursive functional programs as λ-terms. For example,
consider a tail-recursive version of the factorial function that might be
defined in a functional programming language as follows:
fact n m = (cond (n = 0) m (fact (n - 1) (n * m))).
(The factorial of a non-negative integer n is to be obtained by evaluating
the expression (fact n 1).) The function fact that is defined in this manner
can be represented by the λ-term
(fixpt fact\(abs n\(abs m\
(cond (= n (c 0)) m
(app (app fact (- n (c 1))) (* n m)))))).
We assume below a representation of this kind for functional programs and
we describe manipulations on these programs in terms of manipulations on
their representation in our language.
As an example of manipulating programs, let us suppose that we wish
to transform a recursive program that expects a single pair argument into
a corresponding curried version. Thus, we would like to transform the
program given by the term
(fixpt fact\(abs p\
(cond (and (pairp p) (= (first p) (c 0)))
(second p)
(cond (pairp p)
(app fact (pair (- (first p) (c 1))
(* (first p) (second p))))
error))))
into the factorial program presented earlier. Let the argument of the given
function be p as in the case above. If the desired transformer is imple-
mented in a language such as Lisp, ML, or Prolog then it would have to
be a recursive program that descends through the structure of the term
representing the functional program, making sure that the occurrences of
the bound variable p in it are within expressions of the form (pairp p),
(first p), or (second p) and, in this case, replacing these expressions

respectively by a true condition, and the first and second arguments of the
version of the program being constructed. Although this does not happen
with the program term displayed above, it is possible for this descent to
enter a context in which the variable p is bound locally, and care must be
exercised to ensure that occurrences of p in this context are not confused
with the argument of the function. It is somewhat unfortunate that the
names of bound variables have to be considered explicitly in this process
since the choice of these names has no relevance to the meanings of pro-
grams. However, this concern is unavoidable if our program transformer
is to be implemented in one of the languages under consideration since a
proper understanding of bound variables is simply not embedded in them.
The availability of λ-terms and of higher-order unification in our lan-
guage permits a rather different kind of solution to the given problem. In
fact, the following (atomic) definite clause provides a concise description
of the desired relationship between terms representing the curried and un-
curried versions of recursive programs:
(curry (fixpt q1\(abs x\(A (first x) (second x) (pairp x)
                           (r\s\(app q1 (pair r s))))))
       (fixpt q2\(abs y\(abs z\(A y z truth
                                (r\s\(app (app q2 r) s)))))))).
The first argument of the predicate curry in this clause constitutes a "tem-
plate" for programs of the type we wish to transform: For a term represent-
ing a functional program to unify with this term, it must have one argument
that corresponds to x, every occurrence of this argument in the body of
the program must be within an expression of the form (first x), (second
x) or (pairp x), and every recursive call in the program must involve the
formation of a pair argument. (The expression r\s\(app q1 (pair r s))
represents a function of two arguments that applies ql to the pair formed
from its arguments.) The recognition of the representation of a functional
program as being of this form results in the higher-order variable A being
bound to the result of extracting the expressions (first x), (second x),
(pairp x), and r\s\(app q1 (pair r s)) from the term corresponding
to the body of this program. The desired transformation can be effected
merely by replacing the extracted expressions with the two arguments of
the curried version, truth and a recursive call, respectively. Such a replace-
ment is achieved by means of application and β-conversion in the second
argument of curry in the clause above.
To illustrate the computation described abstractly above, let us consider
solving the query
?- (curry (fixpt fact\(abs p\
(cond (and (pairp p) (= (first p) (c 0)))
(second p)
(cond (pairp p)

(app fact
(pair (- (first p) (c 1))
(* (first p) (second p))))
error))))
NewProg).
using the given clause for curry. The described interpreter for our language
would, first of all, instantiate the variable A with the term
u\v\p\q\(cond (and p (= u (c 0))) v
              (cond p (q (- u (c 1)) (* u v)) error)).
(This instantiation for A is unique.) The variable NewProg will then be set
to a term that is equal via λ-conversion to
(fixpt q2\(abs y\(abs z\(cond (and truth (= y (c 0))) z
    (cond truth (app (app q2 (- y (c 1))) (* y z)) error))))).
Although this is not identical to the term we wished to produce, it can be
reduced to that form by using simple identities pertaining to the boolean
constants and operations of the function programming language. A pro-
gram transformer that uses these identities to simplify functional programs
can be written relatively easily.
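The identities needed in this particular case are that (and truth e) simplifies
to e and that (cond truth e1 e2) simplifies to e1. They could be encoded, for
example, by clauses for a hypothetical predicate simp that rewrites the top
level of a term:
    type simp  tm -> tm -> o.
    (simp (and truth M) N) :- (simp M N).
    (simp (cond truth M N) P) :- (simp M P).
    (simp M M).
Applying such rewrites within the body of a program term requires, in
addition, a means for descending through the fixpt and abs constructors,
a point we return to in Subsection 8.3.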
As a more complex example of manipulating functional programs, we
consider the task of recognizing programs that are tail-recursive; such a
recognition step might be a prelude to transforming a given program into
one in iterative form. The curried version of the factorial program is an
example of a tail-recursive program as also is the program for summing two
non-negative integers that is represented by the following λ-term:
(fixpt sum\(abs n\(abs m\
(cond (= n (c 0)) m
(app (app sum (- n (c 1))) (+ m (c 1))))))).
Now, the tail-recursiveness of these programs can easily be recognized by
using higher-order unification in conjunction with their indicated represen-
tations. Both are, in fact, instances of the term
(fixpt f\(abs x\(abs y\
(cond (C x y) (H x y)
(app (app f (F1 x y)) (F2 x y))))))
Further, the representations of only tail-recursive programs are instances
of the last term: Any closed term that unifies with the given "second-
order template" must be a representation of a recursive program of two
arguments whose body is a conditional and in which the only recursive
call appears in the second branch of the conditional and, that too, as the
head of the expression constituting that branch. Clearly any functional
program that has such a structure must be tail-recursive. Notice that all
the structural analysis that has to be performed in this recognition process

is handled completely within higher-order unification.
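For instance, unifying the representation of the curried factorial program
presented earlier in this section with this template produces the bindings
    C = x\y\(= x (c 0)),     H = x\y\y,
    F1 = x\y\(- x (c 1)),    F2 = x\y\(* x y);
since each of these variables is applied only to the distinct bound variables,
no alternative unifiers of the kind seen with the mapfun example arise.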


Templates of the kind described above have been used by Huet and
Lang [Huet and Lang, 1978] to describe certain transformations on recursive
programs. Templates are, however, of limited applicability when used alone
since they can only recognize restricted kinds of patterns. For instance,
the template that is shown above for recognizing tail-recursiveness will not
identify tail-recursive programs that contain more than one recursive call
or that have more than one conditional in their body. An example of such
a program is the one for finding the greatest common divisor of two
numbers that is represented by the following λ-term:
(fixpt gcd\(abs x\(abs y\
(cond (= (c 1) x) (c 1)
(cond (= x y) x
(cond (< x y) (app (app gcd y) x)
(app (app gcd (- x y)) y)))))))
This program is tail-recursive but its representation is not an instance of
the template presented above. Worse still, there is no second-order term all
of whose closed instances represent tail-recursive programs and that also
has the representations of the factorial, the sum and the greatest common
divisor programs as its instances.
A recursive specification of a class of tail-recursive program terms that
includes the representations of all of these programs can, however, be pro-
vided by using definite clauses in addition to second-order A-terms and
higher-order unification. This specification is based on the following obser-
vations:
(1) A program is obviously tail-recursive if it contains no recursive calls.
The representation of such a program can be recognized by the tem-
plate (fixpt f\(abs x\(abs y\(H x y)))) in which H is a variable.
(2) A program that consists solely of a recursive call with possibly mod-
ified arguments is also tail-recursive. The second-order term
(fixpt f\(abs x\(abs y\(app (app f (H x y)) (G x y)))))
in which H and G are variables unifies with the representations of only
such programs.
(3) Finally, a program is tail-recursive if its body consists of a conditional
in which there is no recursive call in the test and whose left and right
branches themselves satisfy the requirements of tail-recursiveness.
Assuming that C, H1 and H2 are variables of appropriate type, the
representations of only such programs unify with the λ-term
(fixpt f\(abs x\(abs y\(cond (C x y) (H1 f x y)
(H2 f x y)))))
in a way such that the terms (fixpt f\(abs x\(abs y\(H1 f x y))))
and (fixpt f\(abs x\(abs y\(H2 f x y)))) represent tail-recursive

type tailrec tm -> o.

(tailrec (fixpt f\(abs x\(abs y\(H x y))))).

(tailrec (fixpt f\(abs x\(abs y\(app (app f (H x y))
                                     (G x y)))))).

(tailrec (fixpt f\(abs x\(abs y\(cond (C x y) (H1 f x y)
                                      (H2 f x y)))))) :-
    (tailrec (fixpt f\(abs x\(abs y\(H1 f x y))))),
    (tailrec (fixpt f\(abs x\(abs y\(H2 f x y))))).

Fig. 11. A recognizer for binary tail-recursive functional programs

programs under the instantiations determined for H1 and H2.


These observations can be translated immediately into the definition of a
one place predicate that recognizes tail-recursive functional programs of
two arguments and this is done in Figure 11. It is easily verified that
all three tail-recursive programs considered in this section are recognized
to be so by tailrec. The definition of tailrec can be augmented so
that it also transforms programs that it recognizes to be tail-recursive into
iterative ones in an imperative language. We refer the reader to [Miller and
Nadathur, 1987] for details.
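To see how the clauses in Figure 11 apply to one of these programs, consider
the representation of the sum program displayed above. It matches the head
of the third clause with C instantiated to x\y\(= x (c 0)), H1 to f\x\y\y,
and H2 to f\x\y\(app (app f (- x (c 1))) (+ y (c 1))); the two subgoals that
result are then solved by the first and the second clause, respectively.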

8.3 A limitation of higher-order Horn clauses


Higher-order variables and an understanding of λ-conversion result in the
presence of certain interesting meta-programming capabilities in our lan-
guage. This language has a serious deficiency, however, with respect to this
kind of programming that can be exposed by pushing our examples a bit
further. We do this below.
We exhibited an encoding for a first-order logic in Subsection 8.1 and
we presented an interpreter for the Horn clause fragment of this logic. A
natural question to ask in this context is if we can define a (meta-level)
predicate that recognizes the representations of first-order Horn clauses.
It is an easy matter to write predicates that recognize first-order terms,
atomic formulas, and quantifier-free Horn clauses. However, there is a
problem when we attempt to deal with quantifiers. For example, when
does a term of the form (all B) represent an object-level Horn clause?
If we had chosen to represent formulas using only first-order terms, then
universal quantification might have been represented by an expression of
the form forall(x,B), and we could conclude that such an expression
corresponds to a definite clause just in case B corresponds to one. Under
the representation that we actually use, B is a higher-order term and so we
cannot adopt this simple strategy of "dropping" the quantifier. We might
try to mimic it by instantiating the quantifier. But with what should we

perform the instantiation? The ideal solution is to use a new constant, say
dummy, of type term; thus, (all B) would be recognized as a definite clause
if (B dummy) is a definite clause. The problem is that our language does not
provide a logical way to generate such a new constant. An alternative is to
use just any constant. This strategy will work in the present situation, but
there is some arbitrariness to it and, in any case, there are other contexts
in which the newness of the constant is critical.
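In clause form, and assuming a recognizer defcl for the encodings of
definite clauses, the strategy just outlined would amount to writing
something like
    (defcl (all B)) :- (defcl (B dummy)).
with dummy declared as a constant of type term; the clause for prenex given
towards the end of this subsection has exactly the same character.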
A predicate that recognizes tail-recursive functional programs of two
arguments was presented in Subsection 8.2. It is natural to ask if it is
possible to recognize tail-recursive programs that have other arities. One
apparent answer to this question is that we mimic the clauses in Figure 11
for each arity. Thus, we might add the clauses
(tailrec (fixpt f\(abs x\(H x)))).
(tailrec (fixpt f\(abs x\(app f (H x))))).
(tailrec (fixpt f\(abs x\(cond (C x) (H1 f x) (H2 f x))))) :-
    (tailrec (fixpt f\(abs x\(H1 f x)))),
    (tailrec (fixpt f\(abs x\(H2 f x)))).
to the earlier program to obtain one that also recognizes tail-recursive func-
tional programs of arity 1. However, this is not really a solution to our
problem. What we desire is a finite set of clauses that can be used to
recognize tail-recursive programs of arbitrary arity. If we maintain our en-
coding of functional programs, a solution to this problem seems to require
that we descend through the abstractions corresponding to the arguments
of a program in the term representing the program discharging each of
these abstractions with a new constant, and that we examine the structure
that results from this for tail-recursiveness. Once again we notice that our
language does not provide us with a principled mechanism for creating the
new constants that are needed in implementing this approach.
The examples above indicate a problem in defining predicates in our
language for recognizing terms of certain kinds. This problem becomes
more severe when we consider defining relationships between terms. For
example, suppose we wish to define a binary predicate called prenex that
relates the representations of two first-order formulas only if the second is
a prenex normal form of the first. One observation useful in defining this
predicate is that a formula of the form ∀x B has ∀x C as a prenex normal
form just in case B has C as a prenex normal form. In implementing this
observation, it is necessary, once again, to consider dropping a quantifier.
Using the technique of substitution with a dummy constant to simulate
this, we might translate our observation into the definite clause
(prenex (all B) (all C)) :- (prenex (B dummy) (C dummy)).
Unfortunately this clause does not capture our observation satisfactorily
and describes a relation on formulas that has little to do with prenex normal
forms. Thus, consider the query

?- (prenex (all x\(p x x)) E).


We expect that the only answer to this query is the one that binds E to
(all x\(p x x)), the prenex normal form of (all x\(p x x)). Using
the clause for prenex above, the given goal would be reduced to
?- (prenex (p dummy dummy) (C dummy)).
with E being set to the term (all C). This goal should succeed only if
(p dummy dummy) and (C dummy) are equal. However, as is apparent from
the discussions in Section 6, there are four substitutions for C that unify
these two terms and only one of these yields an acceptable solution to the
original query.
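In more detail, the substitutions that unify (C dummy) with (p dummy dummy)
bind C to one of
    x\(p x x),  x\(p dummy x),  x\(p x dummy),  and  x\(p dummy dummy),
and only the first of these yields the expected binding (all x\(p x x)) for E;
under the other three, E is bound to a term in which dummy appears and that
therefore does not represent a prenex normal form of the original formula.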
The crux of the problem that is highlighted by the examples above is
that a language based exclusively on higher-order Horn clauses does not
provide a principled way to generate new constants and, consequently, to
descend under abstractions in A-terms. The ability to carry out such a
descent, and, thereby, to perform a general recursion over objects contain-
ing bound variables that are represented by our λ-terms, can be supported
by enhancing our language with certain new logical symbols. We consider
such an enhancement in the next section.

9 Hereditary Harrop formulas


The deficiency of our language that was discussed in Subsection 8.3 arises
from the fact that higher-order Horn clauses provide no abstraction or
scoping capabilities at the level of predicate logic to match those present
at the level of terms. It is possible to extend Horn clause logic by allowing
occurrences of implications and universal quantifiers within goal formulas,
thereby producing the logic of hereditary Harrop formulas [Miller et al.,
1991]. This enhancement to the logic leads to mechanisms for controlling
the availability of the names and the clauses that appear in a program. It
has been shown elsewhere that these additional capabilities can be used to
realize notions such as modular programming, hypothetical reasoning, and
abstract data types within logic programming [Miller, 1989b; Miller, 1990;
Nadathur and Miller, 1988]. We show in this section that they can also
be used to overcome the shortcoming of our present language from the
perspective of meta-programming.

9.1 Universal quantifiers and implications in goals


Let Σ be a set of nonlogical constants and let 𝒦_Σ denote the set of λ-normal
terms that do not contain occurrences of the logical constants ⊃ and ⊥ or
of nonlogical constants that are not elements of Σ; notice that the only
logical constants that are allowed to appear in terms in 𝒦_Σ are ⊤, ∧, ∨, ∀,
and ∃. Further, let A and A_r be syntactic variables for atomic formulas and
rigid atomic formulas in 𝒦_Σ, respectively. Then the higher-order hereditary
Harrop formulas and the corresponding goal formulas relative to Σ are the
D- and G-formulas defined by the following mutually recursive syntax rules:

    D ::= A_r | G ⊃ A_r | ∀x D | D ∧ D
    G ::= ⊤ | A | G ∧ G | G ∨ G | ∀x G | ∃x G | D ⊃ G.
Quantification in these formulas, as in the context of higher-order Horn
clauses, may be over function and predicate variables. When we use the
formulas described here in programming, we shall think of a closed D-
formula as a program clause relative to the signature Σ, a collection of such
clauses as a program, and a G-formula as a query. There are considerations
that determine exactly the definitions of the D- and G-formulas that are
given here, and these are described in [Miller, 1990; Miller et al., 1991].
An approach similar to that in Section 4 can be employed here as well in
explaining what it means to solve a query relative to a program and what
the result of a solution is to be. The main difference is that, in explaining
how a closed goal formula is to be solved, we now have to deal with the
additional possibilities for such formulas to contain universal quantifiers
and implications. The attempt to solve a universally quantified goal for-
mula can result in the addition of new nonlogical constants to an existing
signature. It will therefore be necessary to consider generating instances
of program clauses relative to different signatures. The following definition
provides a method for doing this.
Definition 9.1.1. Let Σ be a set of nonlogical constants and let a closed
positive Σ-substitution be one whose range is a set of closed terms contained
in 𝒦_Σ. Now, if D is a program clause, then the collection of its closed
positive Σ-instances is denoted by [D]_Σ and is given as follows:
(i) if D is of the form A_r or G ⊃ A_r, then it is {D},
(ii) if D is of the form D' ∧ D'', then it is [D']_Σ ∪ [D'']_Σ, and
(iii) if D is of the form ∀x D', then it is ∪{[φ(D')]_Σ | φ is a closed positive
Σ-substitution for x}.
This notation is extended to programs as follows: if 𝒫 is a program, then
[𝒫]_Σ = ∪{[D]_Σ | D ∈ 𝒫}.
The attempt to solve an implicational goal formula will lead to an aug-
mentation of an existing program. Thus, in describing an abstract inter-
preter for our new language, it is necessary to parameterize the solution
of a goal formula by both a signature and a program. We do this in the
following definition that modifies the definition of operational semantics
contained in Definition 4.0.5 to suit our present language.
Definition 9.1.2. Let Σ be a set of nonlogical constants and let 𝒫 and G
be, respectively, a program and a goal formula relative to Σ. We then use
the notation Σ; 𝒫 ⊢_O G to signify that our abstract interpreter succeeds on
G when given the signature Σ and the logic program 𝒫. The success/failure
behavior of the interpreter for hereditary Harrop formulas is itself specified
as follows:
(i) Σ; 𝒫 ⊢_O ⊤,
(ii) Σ; 𝒫 ⊢_O A, where A is an atomic formula, if and only if A = A' for
some A' ∈ [𝒫]_Σ or for some (G ⊃ A') ∈ [𝒫]_Σ such that A = A' it is
the case that Σ; 𝒫 ⊢_O G,
(iii) Σ; 𝒫 ⊢_O G_1 ∧ G_2 if and only if Σ; 𝒫 ⊢_O G_1 and Σ; 𝒫 ⊢_O G_2,
(iv) Σ; 𝒫 ⊢_O G_1 ∨ G_2 if and only if Σ; 𝒫 ⊢_O G_1 or Σ; 𝒫 ⊢_O G_2,
(v) Σ; 𝒫 ⊢_O ∃x G if and only if Σ; 𝒫 ⊢_O [t/x]G for some term t ∈ 𝒦_Σ with
the same type as x,
(vi) Σ; 𝒫 ⊢_O D ⊃ G if and only if Σ; 𝒫 ∪ {D} ⊢_O G, and
(vii) Σ; 𝒫 ⊢_O ∀x G if and only if Σ ∪ {c}; 𝒫 ⊢_O [c/x]G where c is a nonlogical
constant that is not already in Σ and that has the same type as x.

Let 𝒫 be a program, let G be a closed goal formula and let Σ be any
collection of nonlogical constants that includes those occurring in 𝒫 and
G. Using techniques similar to those in Section 5, it can then be shown
that 𝒫 ⊢_I G if and only if Σ; 𝒫 ⊢_O G; see [Miller et al., 1991] for details.
Thus ⊢_O coincides with ⊢_I in the context of interest. It is easy to see,
however, that ⊢_C is a stronger relation than ⊢_O. For example, the goal
formulas p ∨ (p ⊃ q) and (((r a) ∧ (r b)) ⊃ q) ⊃ ∃x((r x) ⊃ q) (in which
p, q, r and a are parameters) are both provable in classical logic but they
are not solvable in the above operational sense. (The signature remains
unchanged in the attempt to solve both these goal formulas and hence can
be elided.) The operational interpretation of hereditary Harrop formulas
is based on a natural understanding of the notion of goal-directed search
and we shall therefore think of the complementary logical interpretation of
these formulas as being given by intuitionistic logic and not classical logic.
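To see clauses (vi) and (vii) in action, suppose that q is a nonlogical constant
in Σ of some predicate type and consider the goal formula ∀x ((q x) ⊃ (q x)).
Relative to any program 𝒫, clause (vii) reduces this goal to (q c) ⊃ (q c) for a
new constant c, clause (vi) reduces it in turn to solving (q c) from 𝒫 ∪ {(q c)},
and this last goal succeeds by clause (ii).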
The main novelty from a programming perspective of a language based
on hereditary Harrop formulas over one based on Horn clauses is the fol-
lowing: in the new language it is possible to restrict the scope of nonlogical
constants and program clauses to selected parts of the search for a solution
to a query. It is these scoping abilities that provide the basis for notions
such as modular programming, hypothetical reasoning, and abstract data
types in this language. As we shall see later in this section, these abil-
ities also provide for logic-level abstractions that complement term-level
abstractions.
A (deterministic) interpreter can be constructed for a language based
on higher-order hereditary Harrop formulas in a fashion similar to that
in the case of higher-order Horn clauses. There are, however, some dif-
ferences in the two contexts that must be taken into account. First, the
solution to a goal formula must be attempted in the context of a spe-
cific signature and program associated with that goal formula and not in

type term term -> o.


type atom form -> o.

(term a).
(term b).
(term c).
(term (f X)) :- (term X).

(atom (path X Y)) :- (term X), (term Y).


(atom (adj X Y)) :- (term X), (term Y).

Fig. 12. Recognizers for object-language terms and atomic formulas

a global context. Each goal formula must, therefore, carry a relevant sig-
nature and program. Second, the permitted substitutions for each logic
variable that is introduced in the course of solving an existential query
or instantiating a program clause are determined by the signature that
is in existence at the time of introduction of the logic variable. It is
therefore necessary to encode this signature in some fashion in the logic
variable and to use it within the unification process to ensure that in-
stantiations of the variable contain only the permitted nonlogical con-
stants. Several different methods can be used to realize this requirement
in practice, and some of these are described in [Miller, 1989a; Miller, 1992;
Nadathur, 1993].

9.2 Recursion over structures with binding


The ability to descend under abstractions in λ-terms is essential in de-
scribing a general recursion over representations of the kind described in
Section 8 for objects containing bound variables. It is necessary to "dis-
charge" the relevant abstractions in carrying out such a descent, and so the
ability to generate new constants is crucial to this process. The possibility
for universal quantifiers to appear in goal formulas with their associated
semantics leads to such a capability being present in a language based on
higher-order hereditary Harrop formulas. We show below that this suffices
to overcome the problems outlined in Subsection 8.3 for a language based
on higher-order Horn clauses.
We consider first the problem of recognizing the representations of first-
order Horn clauses. Our solution to this problem assumes that the object-
level logic has a fixed signature that is given by the type declarations in
Figure 7. Using a knowledge of this signature, it is a simple matter to define
predicates that recognize the representations of (object-level) terms and
atomic formulas. The clauses for term and atom that appear in Figure 12
serve this purpose in the present context. These predicates are used in

type defcl form -> o.


type goal form -> o.

(defcl (all C)) :- (pi x\((term x) => (defcl (C x)))).


(defcl (imp G A)) :- (atom A), (goal G).
(defcl A) :- (atom A).

(goal truth).
(goal (and B C)) :- (goal B), (goal C).
(goal (or B C)) :- (goal B), (goal C).
(goal (some C)) :- (pi x\((term x) => (goal (C x)))).
(goal A) :- (atom A).

Fig. 13. Recognizing representations of first-order Horn clauses

Figure 13 in defining the predicates defcl and goal that are intended to
be recognizers of encodings of object-language definite clauses and goal
formulas respectively. We use the symbol => to represent implications in
(meta-level) goal formulas in the program clauses that appear in this figure.
Recall that the symbol pi represents the universal quantifier.
In understanding the definitions presented in Figure 13, it is useful to
focus on the first clause for defcl that appears in it. This clause is not
an acceptable one in the Horn clause setting because an implication and a
universal quantifier are used in its "body". An attempt to solve the query
?- (defcl (all x\(all y\(all z\
(imp (and (adj x y) (path y z)) (path x z)))))).
will result in this clause being used. This will, in turn, result in the variable
C that appears in the clause being instantiated to the term
x\(all y\(all z\(imp (and (adj x y) (path y z))
(path x z)))).
The way in which this λ-term is to be processed can be described as follows.
First, a new constant must be picked to play the role of a name for the
bound variable x. This constant must be added to the signature of the
object-level logic, in turn requiring that the definition of the predicate term
be extended. Finally, the λ-term must be applied to the new constant and
it must be checked if the resulting term represents a Horn clause. Thus,
if the new constant that is picked is d, then an attempt must be made to
solve the goal formula
(defcl (all y\(all z\(imp (and (adj d y) (path y z))
                          (path d z)))))
after the program has been augmented with the clause (term d). From
the operational interpretation of implications and universal quantifiers de-

type quantfree form -> o.

(quantfree false).
(quantfree truth).
(quantfree A) :- (atom A).
(quantfree (and B C)) :- ((quantfree B), (quantfree C)).
(quantfree (or B C)) :- ((quantfree B), (quantfree C)).
(quantfree (imp B C)) :- ((quantfree B), (quantfree C)).

Fig. 14. Recognizing quantifier free formulas in a given object-language

scribed in Definition 9.1.2, it is easily seen that this is exactly the compu-
tation that is performed with the λ-term under consideration.
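Conversely, terms that do not encode Horn clauses are rejected. For example,
an attempt to solve the query
    ?- (defcl (all x\(some y\(adj x y)))).
leads, after a new constant d is added to the signature and (term d) to the
program, to the goal (defcl (some y\(adj d y))); the clauses for defcl either
fail to match this term or reduce to the goal (atom (some y\(adj d y))),
which fails, and so the query is rejected, as one would expect of a formula
that is not a definite clause.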
One of the virtues of our extended language is that it is relatively
straightforward to verify formal properties of programs written in it. For
example, consider the following property of the definition of defcl: If the
query (defcl (all B)) has a solution and if T is an object-level term (that
is, (term T) is solvable), then the query (defcl (B T)) has a solution,
i.e., the property of being a Horn clause is maintained under first-order
universal instantiation. This property can be seen to hold by the following
argument: If the query (defcl (all B)) is solvable, then it must be the
case that the query pi x\((term x) => (defcl (B x))) is also solvable.
Since (term T) is provable and since universal instantiation and modus
ponens hold for intuitionistic logic and since the operational semantics of
our language coincides with intuitionistic provability, we can conclude that
the query (defcl (B T)) has a solution. The reader will easily appreci-
ate the fact that proofs of similar properties of programs written in other
programming languages would be rather more involved than this.
Ideas similar to those above are used in Figures 16 and 15 in defining
the predicate prenex. The declarations in these figures are assumed to
build on those in Figures 7 and 12 that formalize the object-language and
those in Figure 14 that define a predicate called quantfree that recognizes
quantifier free formulas. The predicate merge that is defined in Figure 15
serves to raise the scopes of quantifiers over the binary connectives. This
predicate is used in the definition of prenex in Figure 16. An interpreter
for our language should succeed on the query
?- (prenex (or (all x\(and (adj x x) (and (all y\(path x y))
(adj (f x) c))))
(adj a b))
Pnf).
relative to a program consisting of these various clauses and should produce
the term
all x\(all y\(or (and (adj x x)
                      (and (path x y) (adj (f x) c)))
                 (adj a b)))
as the (unique) binding for Pnf when it does. The query
?- (prenex (and (all x\(adj x x)) (all z\(all y\(adj z y))))
Pnf).
is also solvable, but could result in Pnf being bound to any one of the terms
all z\(all y\(and (adj z z) (adj z y))),
all x\(all z\(all y\(and (adj x x) (adj z y)))),
all z\(all x\(and (adj x x) (adj z x))),
all z\(all x\(all y\(and (adj x x) (adj z y)))), and
all z\(all y\(all x\(and (adj x x) (adj z y)))).
We now consider the task of defining a predicate that recognizes the
representations of tail-recursive functions of arbitrary arity. We assume
the same set of predefined constants and functions for our functional pro-
gramming language here as we did in Subsection 8.2 and the encodings
for these are given, once again, by the declarations in Figure 10. Now,
our approach to recognizing the λ-terms of interest will, at an operational
level, be the following: we shall descend through the abstractions in the
representation of a program discharging each of them with a new constant
and then check if the structure that results from this process satisfies the
requirements of tail-recursiveness. The final test requires the constant that
is introduced as the name of the function, i.e., the constant used to instan-
tiate the top-level abstracted variable of M in the expression (fixpt M), to
be distinguished from the other nonlogical constants. Thus, suppose that
the expression that is produced by discharging all the abstractions is of the
form (cond C M N). For the overall A-term to be the representation of a
tail-recursive program, it must, first of all, be the case that the constant
introduced as the name of the function does not appear in the term C. Fur-
ther, each of M and N should be a λ-term that contains an occurrence of the
constant introduced as the name of the function at most as its head symbol
or that represents a conditional that recursively satisfies the requirements
being discussed.
The required distinction between constants and the processing described
above are reflected in the clauses in Figure 17 that eventually define the
desired predicate tailrec. Viewed operationally, the clauses for the pred-
icates tailrec and trfn realize a descent through the abstractions repre-
senting the name and arguments of a function. Notice, however, that the
program is augmented differently in each of these cases: headrec is asserted
to hold of the constant that is added in descending through the abstraction
representing the name of the function whereas term is asserted to hold of
the new constants introduced for the arguments. Now, the predicates term
and headrec that are also defined by the clauses in this figure recognize

type merge form -> form -> o.

(merge (and (all B) (all C)) (all D)) :-
    (pi x\((term x) => (merge (and (B x) (C x)) (D x)))).
(merge (and (all B) C) (all D)) :-
    (pi x\((term x) => (merge (and (B x) C) (D x)))).
(merge (and (some B) C) (some D)) :-
    (pi x\((term x) => (merge (and (B x) C) (D x)))).
(merge (and B (all C)) (all D)) :-
    (pi x\((term x) => (merge (and B (C x)) (D x)))).
(merge (and B (some C)) (some D)) :-
    (pi x\((term x) => (merge (and B (C x)) (D x)))).
(merge (or (some B) (some C)) (some D)) :-
    (pi x\((term x) => (merge (or (B x) (C x)) (D x)))).
(merge (or (all B) C) (all D)) :-
    (pi x\((term x) => (merge (or (B x) C) (D x)))).
(merge (or (some B) C) (some D)) :-
    (pi x\((term x) => (merge (or (B x) C) (D x)))).
(merge (or B (all C)) (all D)) :-
    (pi x\((term x) => (merge (or B (C x)) (D x)))).
(merge (or B (some C)) (some D)) :-
    (pi x\((term x) => (merge (or B (C x)) (D x)))).
(merge (imp (all B) (some C)) (some D)) :-
    (pi x\((term x) => (merge (imp (B x) (C x)) (D x)))).
(merge (imp (all B) C) (some D)) :-
    (pi x\((term x) => (merge (imp (B x) C) (D x)))).
(merge (imp (some B) C) (all D)) :-
    (pi x\((term x) => (merge (imp (B x) C) (D x)))).
(merge (imp B (all C)) (all D)) :-
    (pi x\((term x) => (merge (imp B (C x)) (D x)))).
(merge (imp B (some C)) (some D)) :-
    (pi x\((term x) => (merge (imp B (C x)) (D x)))).
(merge B B) :- (quantfree B).

Fig. 15. Raising the scopes of quantifiers

two different classes of λ-terms representing functional programs: (i) those


that are constructed using the original set of nonlogical constants and the
ones introduced for the arguments of programs and (ii) those in which the
constant introduced as the name of the program also appears, but only as
the head symbol. The predicate term is defined by reflecting the various
type declarations in Figure 10 into the meta-language. This definition will,
of course, get augmented to realize any extension that occurs to the first

type prenex form -> form -> o.

(prenex false false).


(prenex truth truth).
(prenex B B) :- (atom B).
(prenex (and B C) D) :-
(prenex B U), (prenex C V), (merge (and U V) D).
(prenex (or B C) D) :-
(prenex B U), (prenex C V), (merge (or U V) D).
(prenex (imp B C) D) :-
(prenex B U), (prenex C V), (merge (imp U V) D).
(prenex (all B) (all D)) :-
(pi x\((term x) => (prenex (B x) (D x)))).
(prenex (some B) (some D)) :-
(pi x\((term x) => (prenex (B x) (D x)))).

Fig. 16. Relating first-order formulas and their prenex normal forms

category of terms through the addition of new nonlogical constants. The


conditions under which the predicate headrec should be true of a term can
be expressed as follows: the term must either be identical to the constant
introduced as the name of the function or it must be unifiable with the term
(app M N) with headrec and term being true of the respective resulting
instantiations of M and N. These conditions are obviously captured by the
clause for headrec in Figure 17 and the one that is added in descending
under the abstraction representing the name of the function. The final test
for tail-recursiveness can be carried out easily using these predicates and
this is, in fact, manifest in the definition of trbody in Figure 17.
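In particular, whereas the template-based approach of Subsection 8.2 could
not recognize it, the query
    ?- (tailrec (fixpt gcd\(abs x\(abs y\
           (cond (= (c 1) x) (c 1)
                 (cond (= x y) x
                       (cond (< x y) (app (app gcd y) x)
                             (app (app gcd (- x y)) y)))))))).
involving the greatest common divisor program of Subsection 8.2 should now
succeed: the clauses for tailrec and trfn add a headrec assumption for the
constant introduced for gcd and term assumptions for the constants introduced
for its two arguments, after which trbody checks each branch of the nested
conditionals.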
Our last example in this section concerns the task of assigning types to
expressions in our functional programming language. We shall, of course,
be dealing with encodings of types and expressions of the object-language,
and we need a representation for types to complement the one we already
have for expressions. Figure 18 contains declarations that are useful for this
purpose. The sort ty that is identified by these declarations is intended
to be the meta-level type of terms that encode object-level types. We
assume three primitive types at the object-level: those for booleans, natural
numbers and lists of natural numbers. The nonlogical constants boole,
nat and natlist serve to represent these. Further object-level types are
constructed by using the function and pair type constructors and these are
encoded by the binary constants arrow and pairty, respectively.
The program clauses in Figure 19 define the predicate typeof that
relates the representation of an expression in our functional programming
language to the representation of its type. These clauses are, for the most

type term tm -> o.


type tailrec tm -> o.
type trfn tm -> o.
type trbody tm -> o.
type headrec tm -> o.

(term (abs R)) :- (pi x\((term x) => (term (R x)))).
(term (app M N)) :- (term M), (term N).
(term (cond M N P)) :- (term M), (term N), (term P).
(term (fixpt R)) :- (pi x\((term x) => (term (R x)))).
(term truth).
(term false).
(term (and M N)) :- (term M), (term N).
(term (c X)).
(term (+ M N)) :- (term M), (term N).
(term (- M N)) :- (term M), (term N).
(term (* M N)) :- (term M), (term N).
(term (< M N)) :- (term M), (term N).
(term (= M N)) :- (term M), (term N).
(term (intp M)) :- (term M).
(term nill).
(term (cons M N)) :- (term M), (term N).
(term (null M)) :- (term M).
(term (consp M)) :- (term M).
(term (car M)) :- (term M).
(term (cdr M)) :- (term M).
(term (pair M N)) :- (term M), (term N).
(term (pairp M)) :- (term M).
(term (first M)) :- (term M).
(term (second M)) :- (term M).
(term error).

(tailrec (fixpt M)) :-
    (pi f\((headrec f) => (trfn (M f)))).
(trfn (abs R)) :- (pi x\((term x) => (trfn (R x)))).
(trfn R) :- (trbody R).
(trbody (cond M N P)) :- (term M), (trbody N), (trbody P).
(trbody M) :- (term M); (headrec M).
(headrec (app M N)) :- (headrec M), (term N).

Fig. 17. Recognizing tail-recursive functional programs of arbitrary arity



kind ty type.

type boole ty.


type nat ty.
type natlist ty.
type arrow ty -> ty -> ty.
type pairty ty -> ty -> ty.

Fig. 18. Encoding types for functional programs

part, self explanatory. We draw the attention of the reader to the first two
clauses in this collection that are the only ones that deal with abstractions
in the representation of functional programs. Use is made in both cases of
a combination of universal quantification and implication that is familiar
by now in order to move the term-level abstraction into the meta-level. We
observe, once again, that the manner of definition of the typing predicate
in our language makes it easy to establish formal properties concerning it.
For example, the following property can be shown to be true of typeof by
using arguments similar to those used in the case of defcl: if the queries
(typeof (abs M) (arrow A B)) and (typeof N A) have solutions, then
the query (typeof (M N) B) also has a solution.
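As a simple illustration of these clauses, consider the query
    ?- (typeof (abs x\(+ x (c 1))) Ty).
The first clause introduces a new constant for x together with the assumption
that it has some type A, the clause for + then forces A to be nat and determines
the type of the body to be nat, and the query should consequently succeed
with Ty bound to (arrow nat nat).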
We refer the reader to [Felty, 1993; Pareschi and Miller, 1990; Hannan
and Miller, 1992; Miller, 1991] for other, more extensive, illustrations of
the value of a language based on higher-order hereditary Harrop formulas
from the perspective of meta-programming. It is worth noting that all the
example programs presented in this section as well as several others that
are described in the literature fall within a sublanguage of this language
called L_λ. This sublanguage, which is described in detail in [Miller, 1991],
has the computationally pleasant property that higher-order unification is
decidable in its context and admits of most general unifiers.

10 Conclusion
We have attempted to develop the notion of higher-order programming
within logic programming in this chapter. A central concern in this en-
deavour has been to preserve the declarative style that is a hallmark of
logic programming. Our approach has therefore been to identify an ana-
logue of first-order Horn clauses in the context of a higher-order logic; this
analogue must, of course, preserve the logical properties of the first-order
formulas that are essential to their computational use while incorporating
desirable higher-order features. This approach has led to the description of
the so-called higher-order Horn clauses in the Simple Theory of Types, a
higher-order logic that is based on a typed version of the lambda calculus.
An actual use of these formulas in programming requires that a practically

type typeof tm -> ty -> o.

(typeof (abs M) (arrow A B)) :-
    (pi x\((typeof x A) => (typeof (M x) B))).
(typeof (fixpt M) A) :-
    (pi x\((typeof x A) => (typeof (M x) A))).
(typeof (app M N) B) :-
    (typeof M (arrow A B)), (typeof N A).
(typeof (cond C L R) A) :-
    (typeof C boole), (typeof L A), (typeof R A).
(typeof truth boole).
(typeof false boole).
(typeof (and M N) boole) :-
    (typeof M boole), (typeof N boole).
(typeof (c X) nat).
(typeof (+ M N) nat) :- (typeof M nat), (typeof N nat).
(typeof (- M N) nat) :- (typeof M nat), (typeof N nat).
(typeof (* M N) nat) :- (typeof M nat), (typeof N nat).
(typeof (< M N) boole) :- (typeof M nat), (typeof N nat).
(typeof (= M N) boole) :- (typeof M nat), (typeof N nat).
(typeof (intp M) boole) :- (typeof M A).
(typeof nill natlist).
(typeof (cons M N) natlist) :-
    (typeof M nat), (typeof N natlist).
(typeof (null M) boole) :- (typeof M natlist).
(typeof (consp M) boole) :- (typeof M natlist).
(typeof (car M) nat) :- (typeof M natlist).
(typeof (cdr M) natlist) :- (typeof M natlist).
(typeof (pair M N) (pairty A B)) :-
    (typeof M A), (typeof N B).
(typeof (pairp M) boole) :- (typeof M A).
(typeof (first M) A) :- (typeof M (pairty A B)).
(typeof (second M) B) :- (typeof M (pairty A B)).
(typeof error A).

Fig. 19. A predicate for typing functional programs



acceptable proof procedure exist for them. We have exhibited such a proce-
dure by utilizing the logical properties of the formulas in conjunction with
a procedure for unifying terms of the relevant typed λ-calculus. We have
then examined the applications for a programming language that is based
on these formulas. As initially desired, this language provides for the usual
higher-order programming features within logic programming. This lan-
guage also supports some unusual forms of higher-order programming: it
permits λ-terms to be used in constructing the descriptions of syntactic ob-
jects such as programs and quantified formulas, and it allows computations
to be performed on these descriptions by means of the λ-conversion rules
and higher-order unification. These novel features have interesting uses in
the realm of meta-programming and we have illustrated this fact in this
chapter. A complete realization of these meta-programming capabilities,
however, requires a language with a larger set of logical primitives than
that obtained by using Horn clauses. These additional primitives are in-
corporated into the logic of hereditary Harrop formulas. We have described
this logic here and have also outlined some of the several applications that
a programming language based on this logic has in areas such as theo-
rem proving, type inference, program transformation, and computational
linguistics.
The discussions in this chapter reveal a considerable richness to the
notion of higher-order logic programming. We note also that these dis-
cussions are not exhaustive. Work on this topic continues along several
dimensions such as refinement, modification and extension of the language,
implementation, and exploration of applications, especially in the realm of
meta-programming.

Acknowledgements
We are grateful to Gilles Dowek for his comments on this chapter. Miller's
work has been supported in part by the following grants: ARO DAAL03-89-
0031, ONR N00014-93-1-1324, NSF CCR91-02753, and NSF CCR92-09224.
Nadathur has similarly received support from the NSF grants CCR-89-
05825 and CCR-92-08465.

References
[Andrews, 1971] Peter B. Andrews. Resolution in type theory. Journal of
Symbolic Logic, 36:414-432, 1971.
[Andrews, 1989] Peter B. Andrews. On connections and higher-order logic.
Journal of Automated Reasoning, 5(3):257-291, 1989.
[Andrews et al., 1984] Peter B. Andrews, Eve Longini Cohen, Dale Miller,
and Frank Pfenning. Automating higher order logic. In Automated The-
orem Proving: After 25 Years, pages 169-192. American Mathematical
Society, Providence, RI, 1984.

[Apt and van Emden, 1982] K. R. Apt and M. H. van Emden. Contribu-
tions to the theory of logic programming. Journal of the ACM, 29(3):841-
862, 1982.
[Barendregt, 1981] H. P. Barendregt. The Lambda Calculus: Its Syntax
and Semantics. North-Holland, 1981.
[Bledsoe, 1979] W. W. Bledsoe. A maximal method for set variables in
automatic theorem-proving. In Machine Intelligence 9, pages 53-100.
John Wiley, 1979.
[Brisset and Ridoux, 1991] Pascal Brisset and Olivier Ridoux. Naive re-
verse can be linear. In Eighth International Logic Programming Confer-
ence, Paris, France, June 1991. MIT Press.
[Brisset and Ridoux, 1992] Pascal Brisset and Olivier Ridoux. The archi-
tecture of an implementation of λProlog: Prolog/Mali. In Dale Miller,
editor, Proceedings of the 1992 λProlog Workshop, 1992.
[Chen et al., 1993] Weidong Chen, Michael Kifer, and David S. Warren.
HiLog: A foundation for higher-order logic programming. Journal of
Logic Programming, 15(3):187-230, February 1993.
[Church, 1940] Alonzo Church. A formulation of the simple theory of types.
Journal of Symbolic Logic, 5:56-68, 1940.
[Damas and Milner, 1982] Luis Damas and Robin Milner. Principal type
schemes for functional programs. In Proceedings of the Ninth ACM Sym-
posium on Principles of Programming Languages, pages 207-212. ACM
Press, 1982.
[Elliott and Pfenning, 1991] Conal Elliott and Frank Pfenning. A semi-
functional implementation of a higher-order logic programming language.
In Peter Lee, editor, Topics in Advanced Language Implementation,
pages 289-325. MIT Press, 1991.
[Felty, 1993] Amy Felty. Implementing tactics and tacticals in a higher-
order logic programming language. Journal of Automated Reasoning,
11(1):43-81, August 1993.
[Felty and Miller, 1988] Amy Felty and Dale Miller. Specifying theorem
provers in a higher-order logic programming language. In Ninth Interna-
tional Conference on Automated Deduction, pages 61-80, Argonne, IL,
May 1988. Springer-Verlag.
[Gentzen, 1969] Gerhard Gentzen. Investigations into logical deduction.
In M. E. Szabo, editor, The Collected Papers of Gerhard Gentzen, pages
68-131. North-Holland, 1969.
[Girard et al., 1989] Jean-Yves Girard, Paul Taylor, and Yves Lafont.
Proofs and Types. Cambridge University Press, 1989.
[Goldfarb, 1981] Warren Goldfarb. The undecidability of the second-order
unification problem. Theoretical Computer Science, 13:225-230, 1981.

[Gordon et al., 1979] Michael J. Gordon, Arthur J. Milner, and Christo-


pher P. Wadsworth. Edinburgh LCF: A Mechanised Logic of Computa-
tion, volume 78 of Lecture Notes in Computer Science. Springer-Verlag,
1979.
[Gould, 1976] W. E. Gould. A matching procedure for ω-order logic. Sci-
entific Report No. 4, AFCRL, 1976.
[Hannan, 1993] John Hannan. Extended natural semantics. Journal of
Functional Programming, 3(2):123-152, April 1993.
[Hannan and Miller, 1992] John Hannan and Dale Miller. From opera-
tional semantics to abstract machines. Mathematical Structures in Com-
puter Science, 2(4):415-459, 1992.
[Henkin, 1950] Leon Henkin. Completeness in the theory of types. Journal
of Symbolic Logic, 15:81-91, 1950.
[Huet, 1973a] Gerard Huet. A mechanization of type theory. In Proceed-
ings of the Third International Joint Conference on Artifical Intelligence,
pages 139-146, 1973.
[Huet, 1973b] Gerard Huet. The undecidability of unification in third order
logic. Information and Control, 22:257-267, 1973.
[Huet, 1975] Gerard Huet. A unification algorithm for typed λ-calculus.
Theoretical Computer Science, 1:27-57, 1975.
[Huet and Lang, 1978] Gerard Huet and Bernard Lang. Proving and ap-
plying program transformations expressed with second-order patterns.
Acta Informatica, 11:31-55, 1978.
[Kwon et al., 1994] Keehang Kwon, Gopalan Nadathur, and Debra Sue
Wilson. Implementing polymorphic typing in a logic programming lan-
guage. Computer Languages, 20(1):25-42, 1994.
[Lucchesi, 1972] C. L. Lucchesi. The undecidability of the unification prob-
lem for third order languages. Technical Report CSRR 2059, Department
of Applied Analysis and Computer Science, University of Waterloo, 1972.
[Miller, 1989a] Dale Miller. Lexical scoping as universal quantification.
In Sixth International Logic Programming Conference, pages 268-283,
Lisbon, Portugal, June 1989. MIT Press.
[Miller, 1989b] Dale Miller. A logical analysis of modules in logic program-
ming. Journal of Logic Programming, 6:79-108, 1989.
[Miller, 1990] Dale Miller. Abstractions in logic programming. In Pier-
giorgio Odifreddi, editor, Logic and Computer Science, pages 329-359.
Academic Press, 1990.
[Miller, 1991] Dale Miller. A logic programming language with lambda-
abstraction, function variables, and simple unification. Journal of Logic
and Computation, 1(4):497-536, 1991.
[Miller, 1992] Dale Miller. Unification under a mixed prefix. Journal of
Symbolic Computation, pages 321-358, 1992.
[Miller, 1994] Dale Miller. A multiple-conclusion meta-logic. In S. Abram-
sky, editor, Ninth Annual Symposium on Logic in Computer Science,
pages 272-281, Paris, July 1994.
[Miller and Nadathur, 1987] Dale Miller and Gopalan Nadathur. A logic
programming approach to manipulating formulas and programs. In Seif
Haridi, editor, IEEE Symposium on Logic Programming, pages 379-388,
San Francisco, September 1987.
[Miller and Nadathur, 1988] Dale Miller and Gopalan Nadathur. AProlog
Version 2.7. Distribution in C-Prolog and Quintus sources, July 1988.
[Miller et al., 1991] Dale Miller, Gopalan Nadathur, Frank Pfenning, and
Andre Scedrov. Uniform proofs as a foundation for logic programming.
Annals of Pure and Applied Logic, 51:125-157, 1991.
[Milner et al., 1990] Robin Milner, Mads Tofte, and Robert Harper. The
Definition of Standard ML. MIT Press, 1990.
[Nadathur, 1987] Gopalan Nadathur. A Higher-Order Logic as the Basis
for Logic Programming. PhD thesis, University of Pennsylvania, 1987.
[Nadathur, 1993] Gopalan Nadathur. A proof procedure for the logic
of hereditary Harrop formulas. Journal of Automated Reasoning,
11(1):115-145, August 1993.
[Nadathur, 1997] Gopalan Nadathur. A notion of models for higher-order
logic. Manuscript in preparation, 1997.
[Nadathur and Miller, 1988] Gopalan Nadathur and Dale Miller. An
Overview of AProlog. In Fifth International Logic Programming Con-
ference, pages 810-827, Seattle, Washington, August 1988. MIT Press.
[Nadathur and Miller, 1990] Gopalan Nadathur and Dale Miller. Higher-
order Horn clauses. Journal of the ACM, 37(4):777-814, October 1990.
[Nadathur and Pfenning, 1992] Gopalan Nadathur and Frank Pfenning.
The type system of a higher-order logic programming language. In
Frank Pfenning, editor, Types in Logic Programming, pages 245-283.
MIT Press, 1992.
[Nadathur et al., 1993] Gopalan Nadathur, Bharat Jayaraman, and De-
bra Sue Wilson. Implementation considerations for higher-order fea-
tures in logic programming. Technical Report CS-1993-16, Department
of Computer Science, Duke University, June 1993.
[Nadathur et al., 1995] Gopalan Nadathur, Bharat Jayaraman, and Kee-
hang Kwon. Scoping constructs in logic programming: Implementation
problems and their solution. Journal of Logic Programming, 25(2):119-
161, November 1995.
[Pareschi and Miller, 1990] Remo Pareschi and Dale Miller. Extending def-
inite clause grammars with scoping constructs. In David H. D. Warren
and Peter Szeredi, editors, Seventh International Conference in Logic
Programming, pages 373-389. MIT Press, June 1990.
[Paulson, 1990] Lawrence C. Paulson. Isabelle: The next 700 theorem
provers. In Piergiorgio Odifreddi, editor, Logic and Computer Science,
pages 361-386. Academic Press, 1990.
[Shapiro, 1985] Stewart Shapiro. Second-order languages and mathemati-
cal practice. Journal of Symbolic Logic, 50(3):714-742, September 1985.
[Sterling and Shapiro, 1986] Leon Sterling and Ehud Shapiro. The Art of
Prolog: Advanced Programming Techniques. MIT Press, 1986.
[Wadge, 1991] William W. Wadge. Higher-order Horn logic programming.
In 1991 International Symposium on Logic Programming, pages 289-303.
MIT Press, October 1991.
[Warren, 1982] David H. D. Warren. Higher-order extensions to Prolog:
Are they needed? In Machine Intelligence 10, pages 441-454. Halsted
Press, 1982.
Constraint Logic Programming: A
Survey
Joxan Jaffar and Michael J. Maher

Contents
1 Introduction 592
1.1 Constraint languages 593
1.2 Logic Programming 595
1.3 CLP languages 596
1.4 Synopsis 598
1.5 Notation and terminology 599
2 Constraint domains 601
3 Logical semantics 608
4 Fixedpoint semantics 609
5 Top-down execution 611
6 Soundness and completeness results 615
7 Bottom-up execution 617
8 Concurrent constraint logic programming 619
9 Linguistic extensions 621
9.1 Shrinking the computation tree 621
9.2 Complex constraints 623
9.3 User-defined constraints 624
9.4 Negation 625
9.5 Preferred solutions 626
10 Algorithms for constraint solving 628
10.1 Incrementality 628
10.2 Satisfiability (non-incremental) 630
10.3 Satisfiability (incremental) 633
10.4 Entailment 637
10.5 Projection 640
10.6 Backtracking 643
11 Inference engine 645
11.1 Delaying/wakeup of goals and constraints .... 645
11.2 Abstract machine 651
11.3 Parallel implementations 657
12 Modelling of complex problems 658
12.1 Analysis and synthesis of analog circuits 658
12.2 Options trading analysis 660
12.3 Temporal reasoning 664
13 Combinatorial search problems 665
13.1 Cutting stock 666
13.2 DNA sequencing 668
13.3 Scheduling 670
13.4 Chemical hypothetical reasoning 671
13.5 Propositional solver 674
14 Further applications 675

1 Introduction
Constraint Logic Programming (CLP) began as a natural merger of two
declarative paradigms: constraint solving and logic programming. This
combination helps make CLP programs both expressive and flexible, and
in some cases, more efficient than other kinds of programs. Though a
relatively new field, CLP has progressed in several and quite different di-
rections. In particular, the early fundamental concepts have been adapted
to better serve in different areas of applications. In this survey of CLP,
a primary goal is to give a systematic description of the major trends in
terms of common fundamental concepts.
Consider first an example program in order to identify some crucial CLP
concepts. The program below defines the relation sumto(n, 1 + 2 + ... + n)
for natural numbers n.
sumto(0, 0).
sumto(N, S) :- N >= 1, N <= S, sumto(N - 1, S - N).
The query S <= 3, sumto(N, S) gives rise to three answers (N = 0, S =
0), (N = 1, S = 1), and (N = 2, S = 3), and terminates. The computation
sequence of states for the third answer, for example, is

S <= 3, sumto(N, S)

S <= 3, N = N1, S = S1, N1 >= 1, N1 <= S1,
    sumto(N1 - 1, S1 - N1)

S <= 3, N = N1, S = S1, N1 >= 1, N1 <= S1,
    N1 - 1 = N2, S1 - N1 = S2, N2 >= 1, N2 <= S2,
    sumto(N2 - 1, S2 - N2)

S <= 3, N = N1, S = S1, N1 >= 1, N1 <= S1,
    N1 - 1 = N2, S1 - N1 = S2, N2 >= 1, N2 <= S2,
    N2 - 1 = 0, S2 - N2 = 0

The constraints in the final state imply the answer N = 2, S = 3. Termi-
nation is reasoned as follows. Any infinite computation must use only the
second program rule for state transitions. This means that its first three
states must be as shown above, and its fourth state must be
S <= 3, N = N1, S = S1, N1 >= 1, N1 <= S1,
N1 - 1 = N2, S1 - N1 = S2, N2 >= 1, N2 <= S2,
N2 - 1 = N3, S2 - N2 = S3, N3 >= 1, N3 <= S3,
sumto(N3 - 1, S3 - N3)

We note now that this contains an unsatisfiable set of constraints, and in
CLP, no further reductions are allowed.
This example shows the following key features in CLP:
• Constraints are used to specify the query as well as the answers.
• During execution, new variables and constraints are created.
• The collection of constraints in every state is tested as a whole for
satisfiability before execution proceeds further.
In summary, constraints are: used for input/output, dynamically gener-
ated, and globally tested in order to control execution.
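To make the preceding discussion concrete, the program and query above
can be run essentially as written in a system that provides a linear arith-
metic solver. The transcription below uses the syntax of library(clpq), the
CLP over rationals package shipped with SWI-Prolog and SICStus Prolog;
the library name and the curly-brace constraint syntax are assumptions
about the host system, and the code is offered only as an illustrative sketch.

:- use_module(library(clpq)).      % assumed CLP(Q) library

sumto(0, 0).
sumto(N, S) :-
    { N >= 1, N =< S },            % constraints of the second rule
    { N1 = N - 1, S1 = S - N },    % explicit argument constraints
    sumto(N1, S1).

The query ?- { S =< 3 }, sumto(N, S). should then return the three answers
above and terminate, since the solver detects the unsatisfiable constraint
set described above.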

1.1 Constraint languages


Considerable work on constraint programming languages preceded logic
programming and constraint logic programming. We now briefly survey
some important works, with a view toward the following features. Are
constraints used for input/output? Can new variables and/or constraints
be dynamically generated? Are constraints used for control? What is the
constraint solving algorithm, and to what extent is it complete? What
follows is adapted from the survey in [Michaylov, 1992].
SKETCHPAD [Sutherland, 1963] was perhaps the earliest work that
one could classify as a constraint language. It was, in fact, an interactive
drawing system, allowing the user to build geometric objects from lan-
guage primitives and certain constraints. The constraints are static, and
were solved by local propagation and relaxation techniques. (See chap-
ter 2 in [Leler, 1988] for an introduction to these and related techniques.)
Subsequent related work was THINGLAB [Borning, 1981] whose language
took an object-oriented flavor. While local propagation and relaxation
were also used to deal with the essentially static constraints, the system
considered constraint solving in two different phases. When a graphical
object is manipulated, a plan is generated for quickly re-solving the appro-
priate constraints for the changed part of the object. This plan was then
repeatedly executed while the manipulation continued. Works following
the THINGLAB tradition included the Filters project [Ege et al., 1987]
and Animus [Duisburg, 1986]. Another graphical system, this one focusing
on geometrical layout, was JUNO [Nelson, 1985]. The constraints were
constructed, as in THINGLAB, by text or graphical primitives, and the
geometric object could be manipulated. A difference from the above men-
tioned works is that constraint solving was performed numerically using a
Newton-Raphson solver.
Another collection of early works arose from MIT, motivated by applica-
tions in electrical circuit analysis and synthesis, and gave rise to languages
for general problem solving. In the CONSTRAINTS language [Steele and
Sussman, 1980], variables and constraints are static, and constraint solv-
ing was limited to using local propagation. An extension of this work
[Steele, 1980] provided a more sophisticated environment for constraint
programming, including explanation facilities. Some other related systems,
EL/ARS [Stallman and Sussman, 1977] and SYN [de Kleer and Sussman,
1980], used the constraint solver MACSYMA [MathLab, 1983] to avoid
the restrictions of local propagation. It was noted at this period [Steele,
1980] that there was a conceptual correspondence between the constraint
techniques and logic programming.
The REF-ARF system [Fikes, 1970] was also designed for problem solv-
ing. One component, REF, was essentially a procedural language, but with
nondeterminism because of constraints used in conditional statements. The
constraints are static. They are, in fact, linear integer constraints, and all
variables are bounded above and below. The constraint solver ARF used
backtracking.
The Bertrand system [Leler, 1988] was designed as a meta-language for
the building of constraint solvers. It is itself a constraint language, based
on term rewriting. Constraints are dynamic here, and are used in control.
All constructs of the language are based on augmented rewrite rules, and
the programmer adds rules for the specific constraint solving algorithm to
be implemented.
Post-CLP, there have been a number of works which are able to deal
with dynamic constraints. The language 2LP [McAloon and Tretkoff, 1989]
is described to be a CLP language with a C-like syntax for representing
and solving combinatorial problems. Obtaining parallel execution is one
of the main objectives of this work. The commercial language CHARME,
also based on a procedural framework, arose from the work on CHIP (by
essentially omitting the logic programming part of CHIP). ILOG-SOLVER,
which is also commercial, is a library of constraint algorithms designed
to work with C++ programs. Using a procedural language as a basis,
[Freeman-Benson, 1991] introduced Constraint Imperative programming
which has explicit constraints in the usual way, and also a new kind of
constraints obtained by considering variable assignments such as x = x + 1
as time-stamped. Such assignments are treatable as constraints of the form
x_{i+1} = x_i + 1. Finally, we mention Constraint Functional Programming
[Darlington and Guo, 1992] whose goal is the amalgamation of the ideas of
functional programming found in the HOPE language with constraints.
There is work on languages and systems which are not generally re-
garded as constraint languages, but are nevertheless related to CLP lan-
guages. The development of symbolic algebra systems such as MACSYMA
[MathLab, 1983] concentrated on the solving of difficult algebraic prob-
lems. The programming language aspects are less developed. Languages
for linear programming [Kuip, 1993] provide little more than a primitive
documentation facility for the array of coefficients which is input into a
linear programming module.
In parallel with the development of these constraint languages, much
work was done on the modelling of combinatorial problems as Constraint
Satisfaction Problems (CSPs) and the development of techniques for solv-
ing such problems. The work was generally independent of any host lan-
guage. (A possible exception is ALICE [Lauriere, 1978] which provided a
wide variety of primitives to implement different search techniques.) One
important development was the definition and study of several notions of
consistency. This work had a significant influence on the later development
of the CLP language CHIP. We refer the reader to [Tsang, 1993] for an
introduction to the basic techniques and results concerning CSPs. Finally,
we mention the survey [Burg et al., 1990] which deals not just with con-
straint programming languages, but with constraint-based programming
techniques.

1.2 Logic Programming


Next, we consider conventional logic programming (LP), and argue by ex-
ample that the power of CLP cannot be obtained by making simple changes
to LP systems. The question at hand is whether predicates in a logic pro-
gram can be meaningfully regarded as constraints. That is, is a predicate
with the same declarative semantics as a constraint a sufficient implementa-
tion of the constraint as per CLP? Consider, for example, the logic program
add(0, N, N).
add(s(N), M, s(K)) :- add(N, M, K).
where natural numbers n are represented by s(s(...(0)...)) with n occur-
rences of s. Clearly, the meaning of the predicate add(n,m,k) coincides
with the relation n + m = k. However, the query add(N, M, K), add(N,
M, s(K)), which is clearly unsatisfiable, runs forever in a conventional LP
system. The important point here is that a global test for the satisfiability
of the two add constraints is not done by the underlying LP machinery.
In this example, the problem is that the add predicate is not invoked
with a representation of the add constraints collected so far, and neither
does it return such a representation (after having dealt with one more
constraint). More concretely, the second subgoal of the query above is not
given a representation of the fact that N + M = K.
A partial solution to this problem is the use of a delay mechanism.
Roughly, the idea is that invocation of the predicate is delayed until its
arguments are sufficiently instantiated. For example, if invocation of add
is systematically delayed until its first argument is instantiated, then add
behaves as in CLP when the first argument is ground. Thus the query N
= s(s(...s(0)...)), add(N, M, K), add(N, M, s(K)) fails as desired.
However, the original query add(N, M, K), add(N, M, s(K)) will be de-
layed forever.
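A minimal sketch of such a delay mechanism, using the freeze/2 coroutin-
ing primitive found in several Prolog systems (e.g. SICStus Prolog and
SWI-Prolog), is shown below; the wrapper predicate delayed_add/3 is hy-
pothetical and not part of the survey.

add(0, N, N).
add(s(N), M, s(K)) :- add(N, M, K).

% delayed_add/3 suspends the call to add/3 until its first argument
% is instantiated, mimicking the systematic delay described above.
delayed_add(N, M, K) :- freeze(N, add(N, M, K)).

With this wrapper a query that first binds N to a numeral proceeds as
described, while the query delayed_add(N, M, K), delayed_add(N, M, s(K))
leaves both calls suspended, so the unsatisfiability is never detected.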
A total solution could, in principle, be obtained by simply adding two
extra arguments to the predicate. One would be used for the input rep-
resentation, and one for the output. This would mean that each time a
constraint is dealt with, a representation of the entire set of constraints
accumulated must be manipulated and a new representation constructed.
But this is tantamount to a meta-level implementation of CLP in LP. Fur-
thermore, this approach raises new challenges to efficient implementation.
Since LP is an instance of CLP, in which constraints are equations over
terms, its solver also requires a representation of accumulated constraints.
It happens, however, that there is no need for an explicit representation,
such as the extra arguments discussed above. This is because the accumu-
lated constraints can be represented by a most general unifier, and this, of
course, is globally available via a simple binding mechanism.

1.3 CLP languages


Viewing the subject rather broadly, constraint logic programming can be
said to involve the incorporation of constraints and constraint "solving"
methods in a logic-based language. This characterization suggests the pos-
sibility of many interesting languages, based on different constraints and
different logics. However, to this point, work on CLP has almost exclu-
sively been devoted to languages based on Horn clauses1. We now briefly
describe these languages, concentrating on those that have received sub-
stantial development effort.

1
We note, however, some work combining constraints and resolution in first-order
automated theorem-proving [Stickel, 1984; Burckert, 1990].
Prolog can be said to be a CLP language where the constraints are equa-
tions over the algebra of terms (also called the algebra of finite trees, or the
Herbrand domain). The equations are implicit in the use of unification2.
Almost every language we discuss incorporates Prolog-like terms in ad-
dition to other terms and constraints, so we will not discuss this aspect
further. Prolog II [Colmerauer, 1982a] employs equations and disequations
(≠) over rational trees (an extension of the finite trees of Prolog to cyclic
structures). It was the first logic language explicitly described as using
constraints [Colmerauer, 1983].
CLP(R) [Jaffar et al., 1992a] has linear arithmetic constraints and com-
putes over the real numbers. Nonlinear constraints are ignored (delayed)
until they become effectively linear. CHIP [Dincbas et al., 1988a] and Pro-
log III [Colmerauer, 1988] compute over several domains. Both compute
over Boolean domains: Prolog III over the well-known 2-valued Boolean
algebra, and CHIP over a larger Boolean algebra that contains symbolic
values. Both CHIP and Prolog III perform linear arithmetic over the ratio-
nal numbers. Separately (domains cannot be mixed), CHIP also performs
linear arithmetic over bounded subsets of the integers (known as "finite
domains"). Prolog III also computes over a domain of strings. There are
now several languages which compute over finite domains in the manner
of CHIP, including clp(FD) [Diaz and Codognet, 1993], Echidna [Havens et
al., 1992], and Flang [Mantsivoda, 1993]. cc(FD) [van Hentenryck et al.,
1993] is essentially a second-generation CHIP system.
LOGIN [Aït-Kaci and Nasr, 1986] and LIFE [Aït-Kaci and Podelski,
1993a] compute over an order-sorted domain of feature trees. This domain
provides a limited notion of object (in the object-oriented sense). The
languages support a term syntax which is not first-order, although every
term can be interpreted through first-order constraints. Unlike other CLP
languages/domains, Prolog-like trees are essentially part of this domain,
instead of being built on top of the domain. CIL [Mukai, 1987] computes
over a domain similar to feature trees.
BNR-Prolog [Older and Benhamou, 1993] computes over three domains:
the 2-valued Boolean algebra, finite domains, and arithmetic over the real
numbers. In contrast to other CLP languages over arithmetic domains,
it computes solutions numerically, instead of symbolically. Trilogy [Voda,
1988a; Voda, 1988b] computes over strings, integers, and real numbers.
Although its syntax is closer to that of C, 2LP [McAloon and Tretkoff,
1989] can be considered to be a CLP language permitting only a subset
of Horn clauses. It computes with linear constraints over integers and real
numbers.
CAL [Aiba et al., 1988] computes over two domains: the real num-

2
The language Absys [Elcock, 1990], which was very similar to Prolog, used equations
explicitly, making it more obviously a CLP language.
bers, where constraints are equations between polynomials, and a Boolean
algebra with symbolic values, where equality between Boolean formulas
expresses equivalence in the algebra. Instead of delaying non-linear con-
straints, CAL makes partial use of these constraints during computation.
In the experimental system RISC-CLP(Real) [Hong, 1993] non-linear con-
straints are fully involved in the computation.
Lλ [Miller, 1991] and Elf [Pfenning, 1991] are derived from λProlog
[Miller and Nadathur, 1986] and compute over the values of closed typed
lambda expressions. These languages are not based on Horn clauses (they
include a universal quantifier) and were not originally described as CLP
languages. However, it is argued in [Michaylov and Pfenning, 1993] that
their operational behavior is best understood as the behavior of a CLP
language. An earlier language, Le Fun [Aït-Kaci and Nasr, 1987], also
computed over this domain, and can be viewed as a CLP language with a
weak constraint solver.

1.4 Synopsis
The remainder of this paper is organized into three main parts. In part
I, we provide a formal framework for CLP. Particular attention will be
paid to operational semantics and operational models. As we have seen
in examples, it is the operational interpretation of constraints, rather than
the declarative interpretation, which distinguishes CLP from LP. In part
II, algorithm and data structure considerations are discussed. A crucial
property of any CLP implementation is that its constraint handling algo-
rithms are incremental. In this light, we review several important solvers
and their algorithms for the satisfiability, entailment, and delaying of con-
straints. We will also discuss the requirements of an inference engine for
CLP. In part III, we consider CLP applications. In particular, we discuss
two rather different programming paradigms, one suited for the modelling
of complex problems, and one for the solution of combinatorial problems.
In this survey, we concentrate on the issues raised by the introduction
of constraints to LP. Consequently, we will ignore, or pass over quickly,
those issues inherent in LP. We assume the reader is somewhat familiar
with LP and basic first-order logic. Appropriate background can be ob-
tained from [Lloyd, 1987] for LP and [Shoenfield, 1967] for logic. For
introductory papers on constraint logic programming and CLP languages
we refer the reader to [Colmerauer, 1987; Colmerauer, 1990; Lassez, 1987;
Fruhwirth et al., 1992]. For further reading on CLP, we suggest other
surveys [Cohen, 1990; van Hentenryck, 1991; van Hentenryck, 1992], some
collections of papers [Benhamou and Colmerauer, 1993; Kanellakis et al.,
to appear; van Hentenryck, 1993], and some books [van Hentenryck, 1989a;
Saraswat, 1989]. More generally, papers on CLP appear in various jour-
nals and conference proceedings devoted to computational logic, constraint
processing, or symbolic computation.
1.5 Notation and terminology


This paper will (hopefully) keep to the following conventions. Upper case
letters generally denote collections of objects, while lower case letters gen-
erally denote individual objects. u,v,w,x,y,z will denote variables, s,t
will denote terms, p, q will denote predicate symbols, f, g will denote func-
tion symbols, a will denote a constant, a, b, h will denote atoms, A will
denote a collection of atoms, θ, ψ will denote substitutions, c will denote a
constraint, C, S will denote collections of constraints, r will denote a rule,
P, Q will denote programs, G will denote a goal, 𝒟 will denote a structure,
D will denote its set of elements, and d will denote an element of D. These
symbols may be subscripted or have an over-tilde. x̃ denotes a sequence
of distinct variables x1, x2, ..., xn for an appropriate n. s̃ denotes a se-
quence of (not necessarily distinct) terms s1, s2, ..., sn for an appropriate
n. s̃ = t̃ abbreviates s1 = t1 ∧ s2 = t2 ∧ ··· ∧ sn = tn. ∃_{-x̃} φ denotes the
existential closure of the formula φ except for the variables x̃, which remain
unquantified. ∃ φ denotes the full existential closure of the formula φ.
A signature defines a set of function and predicate symbols and asso-
ciates an arity with each symbol3. If Σ is a signature, a Σ-structure 𝒟
consists of a set D and an assignment of functions and relations on D to
the symbols of Σ which respects the arities of the symbols. A first-order
Σ-formula is built from variables, function and predicate symbols of Σ, the
logical connectives ∧, ∨, ¬, ←, →, ↔ and quantifiers over variables ∃, ∀ in
the usual way [Shoenfield, 1967]. A formula is closed if all variable occur-
rences in the formula are within the scope of a quantifier over the variable.
A Σ-theory is a collection of closed Σ-formulas. A model of a Σ-theory
T is a Σ-structure 𝒟 such that all formulas of T evaluate to true under
the interpretation provided by 𝒟. A 𝒟-model of a theory T is a model of
T extending 𝒟 (this requires that the signature of 𝒟 be contained in the
signature of T). We write T, 𝒟 |= φ to denote that the formula φ is valid
in all 𝒟-models of T.
In this paper, the set of function and predicate symbols defined in the
constraint domain is denoted by Σ and the set of predicate symbols defin-
able by a program is denoted by Π. A primitive constraint has the form
p(t1, ..., tn), where t1, ..., tn are terms and p ∈ Σ is a predicate symbol.
Every constraint is a (first-order) formula built from primitive constraints.
The class of constraints will vary, but we will generally consider only a
subset of formulas to be constraints. An atom has the form p(t1, ..., tn),
where t1, ..., tn are terms and p ∈ Π. A CLP program is a collection of
rules of the form a ← b1, ..., bn where a is an atom and the bi's are atoms
or constraints. a is called the head of the rule and b1, ..., bn is called the
body. Sometimes we represent the rule by a ← c, B, where c is the con-
3
In a many-sorted language this would include associating a sort with each argument
and the result of each symbol. However, we will not discuss such details in this survey.
junction of constraints in the body and B is the collection of atoms in the
body, and sometimes we represent the rule by a ← B, where B is the col-
lection of atoms and constraints in the body. In one subsection we will also
consider programs with negated atoms in the body. A goal (or query) G is
a conjunction of constraints and atoms. A fact is a rule a ← c where c is a
constraint. Finally, we will identify conjunction and multiset union.
To simplify the exposition, we assume that the rules are in a standard
form, where all arguments in atoms are variables and each variable occurs
in at most one atom. This involves no loss of generality since a rule such as
p(t1, t2) ← C, q(s1, s2) can be replaced by the equivalent rule p(x1, x2) ←
x1 = t1, x2 = t2, y1 = s1, y2 = s2, C, q(y1, y2). We also assume that all
rules defining the same predicate have the same head and that no two rules
have any other variables in common (this is simply a matter of renaming
variables). However, in examples we relax these restrictions.
Programs will be presented in teletype font, and will generally follow
the Edinburgh syntax. In particular, program variables begin with an
upper case letter, [Head|Tail] denotes a list with head Head and tail Tail,
and [] denotes an empty list. In one variation from this standard we allow
subscripts on program variables, to improve readability.
The semantics of CLP languages
Many languages based on definite clauses have quite similar semantics.
The crucial insight of the CLP Scheme [Jaffar and Lassez, 1987; Jaffar and
Lassez, 1986] and the earlier scheme of [Jaffar et al., 1984; Jaffar et al., 1986]
was that a logic-based programming language, its operational semantics, its
declarative semantics and the relationships between these semantics could
all be parameterized by a choice of domain of computation and constraints.
The resulting scheme defines the class of languages CLP(X) obtained by
instantiating the parameter X.
We take the view that the parameter X stands for a 4-tuple (Σ, 𝒟, ℒ, T).
Here Σ is a signature, 𝒟 is a Σ-structure, ℒ is a class of Σ-formulas, and
T is a first-order Σ-theory. Intuitively, Σ determines the predefined predi-
cate and function symbols and their arities, 𝒟 is the structure over which
computation is to be performed, ℒ is the class of constraints which can
be expressed, and T is an axiomatization of (some) properties of 𝒟. In
the following section, we define some important relationships between the
elements of the 4-tuple, and give some examples of constraint domains.
We then give declarative and operational semantics for CLP programs,
parameterized by X. The declarative semantics are quite similar to the
corresponding semantics of logic programs, and we cover them quickly.
There are many variations of the resolution-based operational semantics,
and we present the main ones. We also present the main soundness and
completeness results that relate the two styles of semantics. Finally, we
discuss some linguistic features that have been proposed as extensions to
the basic CLP language.

2 Constraint domains
For any signature Σ, let 𝒟 be a Σ-structure (the domain of computation)
and ℒ be a class of Σ-formulas (the constraints). We call the pair (𝒟, ℒ) a
constraint domain. In a slight abuse of notation we will sometimes denote
the constraint domain by 𝒟. We will make several assumptions, none of
which is strictly necessary, to simplify the exposition. We assume
• The terms and constraints in ℒ come from a first-order language4.
• The binary predicate symbol = is contained in Σ and is interpreted
as identity in 𝒟5.
• There are constraints in ℒ which are, respectively, identically true
and identically false in 𝒟.
• The class of constraints ℒ is closed under variable renaming, conjunc-
tion and existential quantification.
We will denote the smallest set of constraints which satisfies these assump-
tions and contains all primitive constraints - the constraints generated by
the primitive constraints - by ℒ_s. In general, ℒ may be strictly larger than
ℒ_s since, for example, universal quantifiers or disjunction are permitted in
ℒ; it also may be smaller, as in Example 2.0.7 of Section 2 below. However,
we will usually take ℒ = ℒ_s. On occasion we will consider an extension of
Σ and ℒ, to Σ* and ℒ* respectively, so that there is a constant in Σ* for
every element of D.
We now present some example constraint domains. In practice, these
are not always fully implemented, but we leave discussion of that until
later. Most general purpose CLP languages incorporate some arithmetic
domain, including BNR-Prolog [Older and Benhamou, 1993], CAL [Aiba
et al., 1988], CHIP [Dincbas et al., 1988a], CLP(R) [Jaffar et al., 1992a],
Prolog III [Colmerauer, 1988], RISC-CLP(Real) [Hong, 1993].
Example 2.0.1. Let Σ contain the constants 0 and 1, the binary function
symbols + and *, and the binary predicate symbols =, < and ≤. Let D
be the set of real numbers and let 𝒟 interpret the symbols of Σ as usual
(i.e. + is interpreted as addition, etc). Let ℒ be the constraints generated
by the primitive constraints. Then ℜ = (𝒟, ℒ) is the constraint domain
of arithmetic over the real numbers. If we omit from Σ the symbol *
then the corresponding constraint domain ℜ_Lin = (𝒟′, ℒ′) is the constraint
domain of linear arithmetic over the real numbers. If the domain is further
4
Without this assumption, some of the results we cite are not applicable, since there
can be no appropriate first-order theory T. The remaining assumptions can be omitted,
at the expense of a messier reformulation of definitions and results.
5
This assumption is unnecessary when terms have a most general unifier in D, as
occurs in Prolog. Otherwise = is needed to express parameter passing.
restricted to the rational numbers then we have a further constraint domain
Q_Lin. In constraints in ℜ_Lin and Q_Lin we will write terms such as 3 and
5x as abbreviations for 1 + 1 + 1 and x + x + x + x + x respectively6. Thus
∃y 5x + y ≤ 3 ∧ z ≤ y − 1 is a constraint in ℜ, ℜ_Lin and Q_Lin, whereas
x * x ≤ y is a constraint only in ℜ. If we extend ℒ′ to allow negated
equations7 (we will use the symbol ≠) then the resulting constraint domains
ℜ_Lin^≠ and Q_Lin^≠ permit constraints such as 2x + y ≤ 0 ∧ x ≠ y. Finally,
if we restrict ℒ to {0, 1, +, =} we obtain the constraint domain ℜ_LinEqn,
where the only constraints are linear equations.
ℜ_Lin and Q_Lin (and ℜ_Lin^≠ and Q_Lin^≠) are essentially the same constraint
domain: they have the same language of constraints and the two structures
are elementarily equivalent [Shoenfield, 1967]. In particular, a constraint
solver for one is also a constraint solver for the other.
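As an aside, constraints such as the one in Example 2.0.1 can be posted
directly to an implemented solver for linear rational arithmetic. The sketch
below assumes the library(clpq) package of SWI-Prolog and SICStus Pro-
log; it is an illustration, not part of the survey.

:- use_module(library(clpq)).      % assumed CLP(Q) library

% Posting 5x + y =< 3 together with z =< y - 1 succeeds, since the
% conjunction is satisfiable; an inconsistent constraint would simply fail.
?- { 5*X + Y =< 3, Z =< Y - 1 }.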
Prolog and standard logic programming can be viewed as constraint
logic programming over the constraint domain of finite trees.
Example 2.0.2. Let Σ contain a collection of constant and function sym-
bols and the binary predicate symbol =. Let D be the set of finite trees
where: each node of each tree is labelled by a constant or function symbol,
the number of children of each node is the arity of the label of the node,
and the children are ordered. Let 𝒟 interpret the function symbols of Σ as
tree constructors, where each f ∈ Σ of arity n maps n trees to a tree whose
root is labelled by f and whose subtrees are the arguments of the mapping.
The primitive constraints are equations between terms, and let ℒ be the
constraints generated by these primitive constraints. Then FT = (𝒟, ℒ)
is the Herbrand constraint domain, as used in Prolog. Typical constraints
are x = g(y) and ∃z x = f(z, z) ∧ y = g(z). (It is unnecessary to write
a quantifier in Prolog programs because all variables that appear only in
constraints are implicitly existentially quantified.)
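In Prolog itself such constraints are written simply as unification goals.
For instance, the second constraint above corresponds to the goal below,
where the auxiliary variable Z plays the role of the existentially quantified
variable z.

?- X = f(Z, Z), Y = g(Z).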
It was pointed out in [Colmerauer, 1982b] that complete (i.e. always ter-
minating) unification which omits the occurs check solves equations over
the rational trees.
Example 2.0.3. We take Σ and ℒ as in the previous example. D is the
set of rational trees (see [Courcelle, 1983] for a definition) and the function
symbols are interpreted as tree constructors, as before. Then RT = (𝒟, ℒ)
is the constraint domain of rational trees.
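The behaviour alluded to above can be observed in many Prolog imple-
mentations, which omit the occurs check by default and therefore construct
a rational (cyclic) tree for a goal such as the one below; this is offered only
as a system-dependent illustration of the domain.

?- X = f(X).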
If we take the set of infinite trees instead of rational trees then we obtain
a constraint domain that is essentially the same as RT, in the same way
that ℜ_Lin and Q_Lin are essentially the same: they have the same language
6
Other syntactic sugar, such as the unary and binary minus symbol —, are allowed.
Rational number coefficients can be used: all terms in the sugared constraint need only
be multiplied by an appropriate number to reduce the coefficients to integers.
7
Sometimes called disequations.
of constraints and the two structures are elementarily equivalent [Maher,
1988].
The next domain contains objects similar to the previous domains, but
has a different signature and constraint language [Aït-Kaci et al., 1992]8,
which results in slightly different expressive power. It can be viewed as the
restriction of domains over which LOGIN [Aït-Kaci and Nasr, 1986] and
LIFE [Aït-Kaci and Podelski, 1993a] compute when all sorts are disjoint.
The close relationship between the constraints and ψ-terms [Aït-Kaci, 1986]
is emphasized by a syntactic sugaring of the constraints.
Example 2.0.4. Let Σ = {=} ∪ S ∪ F where S is a set of unary predicate
symbols (sorts) and F is a set of binary predicate symbols (features). Let
D be the set of (finite or infinite) trees where: each node of each tree
is labelled by a sort, each edge of each tree is labelled by a feature and
no node has two outbound edges with the same label. Such trees are
called feature trees. Let 𝒟 interpret each sort s as the set of feature trees
whose root is labelled by s, and interpret each feature f as the set of pairs
(t1, t2) of feature trees such that t2 is the subtree of t1 that is reached
by the edge labelled by f. (If t1 has no edge labelled by f then there is
no pair (t1, t2) in the set.) Thus features are essentially partial functions.
The domain of feature trees is FEAT = (𝒟, ℒ). A typical constraint is
wine(x) ∧ ∃y region(x, y) ∧ rutherglen(y) ∧ ∃y color(x, y) ∧ red(y), but
there is also a sugared syntax which would represent this constraint as
x : wine[region => rutherglen,color => red].
The next constraint domain takes strings as the basic objects. It is used in
Prolog III [Colmerauer, 1988].
Example 2.0.5. Let Σ contain the binary predicate symbol =, the binary
function symbol ., a constant Λ, and a number of other constants. D is the
set of finite strings of the constants. The symbol . is interpreted in 𝒟 as
string concatenation and Λ is interpreted as the empty string. ℒ is the set
of constraints generated by equations between terms. Then WE = (𝒟, ℒ) is
the constraint domain of equations on strings, sometimes called the domain
of word equations. An example constraint is x.a = b.x.
The constraint domain of Boolean values and functions is used in BNR-
Prolog [Older and Benhamou, 1993], CAL [Aiba et al., 1988], CHIP [Dincbas
et al., 1988a] and Prolog III [Colmerauer, 1988]. CAL and CHIP employ a
more general constraint domain, which includes symbolic Boolean values.
Example 2.0.6. Let Σ contain the constants 0 and 1, the unary function
symbol ¬, the binary function symbols ∧, ∨, ⊕, ⇒, and the binary pred-
icate symbol =. Let D be the set {true, false} and let 𝒟 interpret the

8
A variant of this domain, with a slightly different signature, is used in [Smolka and
Treinen, 1992].
symbols of Σ as the usual Boolean functions (i.e. ∧ is interpreted as con-
junction, ⊕ is exclusive or, etc). Let ℒ be the constraints generated by the
primitive constraints. Then BOOL = (𝒟, ℒ) is the (two-valued) Boolean
constraint domain. An example constraint is ¬(x ∧ y) = y. In a slight
abuse of notation, we allow a constraint t = 1 to be written simply as t
so that, for example, ¬(x ∧ y) ⊕ y denotes the constraint ¬(x ∧ y) ⊕ y = 1.
For the more general constraint domain, let Σ′ = Σ ∪ {a1, ..., ai, ...},
where the ai are constants. Let ℒ′ be the constraints generated by the Σ′
primitive constraints and let 𝒟′ be the free Boolean algebra generated by
{a1, ..., ai, ...}. Then BOOL∞ = (𝒟′, ℒ′) is the Boolean constraint domain
with infinitely many symbolic values9. A constraint c(x̃, ã) is satisfiable in
BOOL∞ iff 𝒟 |= ∃x̃ ∀ỹ c(x̃, ỹ).
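The two-valued example constraint above can also be handed to an imple-
mented Boolean solver. The sketch below assumes the library(clpb) pack-
age of SWI-Prolog, a CLP(B) solver; it is an illustration, not part of the
survey.

:- use_module(library(clpb)).      % assumed Boolean constraint library

% not(X and Y) = Y has the unique solution X = 0, Y = 1, which the
% solver should determine by propagation.
?- sat(~(X * Y) =:= Y).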
The finite domains of CHIP are best viewed as having the integers as
the underlying structure, with a limitation on the language of constraints.
Example 2.0.7. Let D = Z and Σ = {{∈ [m, n]}_{m≤n}, +, =, ≠, ≤}. For
every pair of integers m and n, the interval constraint x ∈ [m, n] denotes
that m ≤ x ≤ n. The other symbols in Σ have their usual meaning. Let
ℒ be the constraints c generated by the primitive constraints, restricted so
that every variable in c is subject to an interval constraint. Then FD =
(𝒟, ℒ) is the constraint domain referred to as finite domains. (The domain
of a variable x is the finite set of values which satisfy all unary constraints
involving x.) A typical constraint in FD is x ∈ [1, 5] ∧ y ∈ [0, 7] ∧ x ≠
3 ∧ x + 2y ≤ 5 ∧ x + y ≤ 9. The domain of x is {1, 2, 4, 5}.
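The constraints of this example correspond closely to what an implemented
finite-domain solver provides. The sketch below assumes the library(clpfd)
package of SWI-Prolog or SICStus Prolog; it is an illustration, not part of
the survey.

:- use_module(library(clpfd)).     % assumed finite-domain library

% The constraint of Example 2.0.7; fd_dom/2 inspects the remaining
% domain of X, which should be the set {1,2,4,5} stated above.
?- X in 1..5, Y in 0..7, X #\= 3, X + 2*Y #=< 5, X + Y #=< 9,
   fd_dom(X, Dom).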
There are several other constraint domains of interest that we cannot
exemplify here for lack of space. They include pseudo-Boolean constraints
(for example, [Bockmayr, 1993]), which are intermediate between Boolean
and integer constraints, order-sorted feature algebras [Aït-Kaci and Podel-
ski, 1993b], domains consisting of regular sets of strings [Walinsky, 1989],
domains of finite sets [Dovier and Rossi, 1993], domains of CLP(Fun(D))
which employ a function variable [Hickey, 1993], domains of functions ex-
pressed by λ-expressions [Miller and Nadathur, 1986; Aït-Kaci and Nasr,
1987; Miller, 1991; Pfenning, 1991; Michaylov and Pfenning, 1993], etc.
It is also possible to form a constraint domain directly from objects
and operations in an application, instead of more general purpose domains
such as those above. This possibility has only been pursued in a limited
form, where a general purpose domain is extended by the ad hoc addition of
primitive constraints. For example, in some uses of CHIP the finite domain

9
Only finitely many constants are used in any one program, so it can be argued
that a finite Boolean algebra is a more appropriate domain of computation. However,
the two alternatives agree on satisfiability and constraint entailment (although not if
an expanded language of constraints is permitted), and it is preferable to view the
constraint domain as independent of the program. Currently, it is not clear whether the
alternatives agree on other constraint operations.
is extended with a predicate symbol element [Dincbas et al., 1988b]. The
relation element(x, l, t) expresses that t is the x'th element in the list l.
We discuss such extensions further in Section 9.2.
These constraint domains are expected to support (perhaps in a weak-
ened form) the following tests and operations on constraints, which have
a major importance in CLP languages. The first operation is the most
important (it is almost obligatory), while the others might not be used in
some CLP languages.
• The first is a test for consistency or satisfiability: 𝒟 |= ∃ c.
• The second is the implication (or entailment) of one constraint by
another: 𝒟 |= c0 → c1. More generally, we may ask whether a
disjunction of constraints is implied by a constraint: 𝒟 |= c0 →
c1 ∨ ··· ∨ cn.
• The third is the projection of a constraint c0 onto variables x̃ to obtain
a constraint c1 such that 𝒟 |= c1 ↔ ∃_{-x̃} c0. It is always possible to
take c1 to be ∃_{-x̃} c0, but the aim is to compute the simplest c1 with
fewest quantifiers. In general it is not possible to eliminate all uses
of the existential quantifier.
• The fourth is the detection that, given a constraint c, there is only
one value that a variable x can take that is consistent with c. That is,
𝒟 |= c(x, z̃) ∧ c(y, w̃) → x = y or, equivalently, 𝒟 |= ∃z ∀x, ỹ (c(x, ỹ) →
x = z). We say x is determined (or grounded) by c.
In Section 10 we will discuss problems and techniques which arise when
implementing these operations in a CLP system. However, we point out
here that some implementations of these operators - in particular, the
test for satisfiability - are incomplete. In some cases it has been argued
[Colmerauer, 1993b; Colmerauer, 1993a; Benhamou and Massat, 1993] that
although an algorithm is incomplete with respect to the desired constraint
domain, it is complete with respect to another (artificially constructed)
constraint domain.
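As an indication of how these operations surface in an implemented sys-
tem, the sketch below exercises the satisfiability test, the entailment test
(entailed/1) and the projection facility (dump/3) of the library(clpq) pack-
age found in SWI-Prolog and SICStus Prolog; the library and predicate
names are assumptions about the host system, not prescribed by this sur-
vey.

:- use_module(library(clpq)).      % assumed CLP(Q) library

demo(Projected) :-
    { X >= 1, Y >= X + 2 },        % satisfiability: posting succeeds
    entailed(Y >= 3),              % entailment: implied by the store
    dump([Y], [V], Projected).     % projection of the store onto Y

A query ?- demo(C). should succeed with C bound to a list such as
[V >= 3], the projection of the accumulated constraints onto the single
variable Y.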
We now turn to some properties of constraint domains which will be
used later. The first two - solution compactness and satisfaction complete-
ness - were introduced as part of the CLP Scheme.
Definition 2.0.8. Let d range over elements of D and c, ci range over
constraints in ℒ, and let I be a possibly infinite index set. A constraint
domain (𝒟, ℒ) is solution compact [Jaffar and Lassez, 1986; Jaffar and
Lassez, 1987] if it satisfies the following conditions:

(SC1) ∀d ∃{ci}i∈I s.t. 𝒟 |= ∀x (x = d ↔ ⋀_{i∈I} ci(x))

(SC2) ∀c ∃{ci}i∈I s.t. 𝒟 |= ∀x (¬c(x) ↔ ⋁_{i∈I} ci(x))


Roughly speaking, SC1 is satisfied iff every element d of D can be defined
by a (possibly infinite) conjunction of constraints, and SC2 is satisfied iff
the complement of each constraint c in ℒ can be described by a (possibly
infinite) disjunction of constraints.
The definition of SC2 in [Jaffar and Lassez, 1986] is not quite equivalent
to the definition in [Jaffar and Lassez, 1987] which we paraphrase above;
see [Maher, 1992]. It turns out that SC1 is not necessary for the results we
present; we include it only for historical accuracy. There is no known nat-
ural constraint domain for which SC2 does not hold. There are, however,
some artificial constraint domains for which it fails.
Example 2.0.9. Let ℜ_Lin^π denote the constraint domain obtained from
ℜ_Lin by adding the unary primitive constraint x ≠ π. The negation of this
constraint (i.e. x = π) cannot be represented as a disjunction of constraints
in ℜ_Lin^π. Thus ℜ_Lin^π is not solution compact.
The theory T in the parameter of the CLP scheme is intended to ax-
iomatize some of the properties of 𝒟. We place some conditions on 𝒟 and
T to ensure that T reflects 𝒟 sufficiently. The first two conditions ensure
that 𝒟 and T agree on satisfiability of constraints, while the addition of
the third condition guarantees that every unsatisfiability in 𝒟 is also de-
tected by T. The theory T and these conditions mainly play a role in the
completeness results of Section 6.
Definition 2.0.10. For a given signature Σ, let (𝒟, ℒ) be a constraint
domain with signature Σ, and T be a Σ-theory. We say that 𝒟 and T
correspond on ℒ if
• 𝒟 is a model of T, and
• for every constraint c ∈ ℒ, 𝒟 |= ∃ c iff T |= ∃ c.
We say T is satisfaction complete with respect to ℒ if for every constraint
c ∈ ℒ, either T |= ∃ c or T |= ¬∃ c.
Satisfaction completeness is a weakening of the notion of a complete theory
[Shoenfield, 1967]. Thus, for example, the theory of the real closed fields
[Tarski, 1951] corresponds and is satisfaction complete with respect to ℜ
since the domain is a model of this theory and the theory is complete.
Clark's axiomatization of unification [Clark, 1978] defines a satisfaction
complete theory with respect to FT which is not complete when there are
only finitely many function symbols [Maher, 1988].
The notion of independence of negative constraints plays a significant
role in constraint logic programming10. In [Colmerauer, 1984], Colmer-
auer used independence of inequations to simplify the test for satisfiability
of equations and inequations on the rational trees. (The independence of

10
It is also closely related to the model-theoretic properties that led to an interest in
Horn formulas [McKinsey, 1943; Horn, 1951].
inequations states: if a conjunction of positive and negative equational con-
straints is inconsistent then one of the negative constraints is inconsistent
with the positive constraints.) Independence of negative constraints has
been investigated in greater generality in [Lassez and McAloon, 1990]. The
property has been shown to hold for several classes of constraints including
equations on finite, rational and infinite trees [Lassez and Marriott, 1987;
Lassez et al., 1988; Maher, 1988], linear real arithmetic constraints (where
only equations may be negated) [Lassez and McAloon, 1992], sort and fea-
ture constraints on feature trees [Aït-Kaci et al., 1992], and infinite Boolean
algebras with positive constraints [Helm et al., 1991], among others [Lassez
and McAloon, 1990]. We consider a restricted form of independence of
negative constraints [Maher, 1993b].
Definition 2.0.11. A constraint domain (D, £) has the independence of
negated constraints property if, for all constraints c, c1, ..., cn ∈ ℒ,
𝒟 |= ∃ c ∧ ¬c1 ∧ ··· ∧ ¬cn iff 𝒟 |= ∃ c ∧ ¬ci for i = 1, ..., n.
The fact that ℒ is assumed to be closed under conjunction and existential
quantification is an important restriction in the above definition. For ex-
ample, Colmerauer's work is not applicable in this setting since that dealt
only with primitive constraints. Neither are many of the other results cited
above, at least not in their full generality. However there are still several
useful constraint domains known to have this property, including the alge-
bras of finite, rational and infinite trees with equational constraints, when
there are infinitely many function symbols [Lassez and Marriott, 1987;
Maher, 1988], feature trees with infinitely many sorts and features [Aït-
Kaci et al., 1992], linear arithmetic equations over the rational or real
numbers, and infinite Boolean algebras with positive constraints [Helm et
al., 1991].
Example 2.0.12. In the Herbrand constraint domain FT with only two
function symbols, a constant a and a unary function f, it is easily seen that
the following statements are true: FT |= ∃x, y, z x = f(y) ∧ ¬y = a ∧ ¬y =
f(z); FT |= ∃x, y x = f(y) ∧ ¬y = a; FT |= ∃x, y, z x = f(y) ∧ ¬y = f(z).
This is an example of the independence of inequations for FT. However,
when we consider the full class of constraints of FT we have the following
facts. The statement FT |= ∃x, y x = f(y) ∧ ¬y = a ∧ ¬∃z y = f(z) is not
true, since every finite tree y is either the constant a or has the form f(z) for
some finite tree z. On the other hand, both FT |= ∃x, y x = f(y) ∧ ¬y = a
and FT |= ∃x, y x = f(y) ∧ ¬∃z y = f(z) are true. Thus, for these function
symbols - and it is easy to see how to extend this example to any finite
set of function symbols - the independence of negated constraints does not
hold.
As is clear from [Scott, 1982], constraint domains (and constraints) are
closely related to the information systems (and their elements) used by
Scott to present his domain theory. Information systems codify notions of
consistency and entailment among elements, which can be interpreted as
satisfiability and implication of constraints on a single variable. Saraswat
[Saraswat et al., 1991; Saraswat, 1992] extended the notion of information
system to constraint systems11 (which allow many variables), and showed
that some of the motivating properties of information systems continue to
hold.
Constraint systems (we will not give a formal definition here) can be
viewed as abstractions of constraint domains which eliminate consideration
of a particular structure 𝒟; the relation 𝒟 |= c1 ∧ ··· ∧ cn → c among con-
straints c, c1, ..., cn is abstracted to the relation c1, ..., cn ⊢ c (and the sat-
isfiability relation 𝒟 |= ∃ c1 ∧ ··· ∧ cn among constraints can be abstracted to
a set Con of all consistent finite sets of constraints {c1, ..., cn} [Scott, 1982;
Saraswat, 1992]). Many of the essential semantic details of a constraint do-
main are still present in the corresponding constraint system, although
properties such as solution compactness and independence of negated con-
straints cannot be expressed without more detail than a constraint system
provides.
11
Although [Saraswat, 1992] does not treat consistency, only entailment.

3 Logical semantics
There are two common logical semantics of CLP programs over a constraint
domain (𝒟, ℒ). The first interprets a rule

p(x̃) ← b1, ..., bn

as the logic formula

∀x̃, ỹ  p(x̃) ∨ ¬b1 ∨ ··· ∨ ¬bn

where x̃ ∪ ỹ is the set of all free variables in the rule. The collection of all
such formulas corresponding to rules of P gives a theory also denoted by
P.
The second logical semantics associates a logic formula with each pred-
icate in Π. If the set of all rules of P with p in the head is

p(x̃) ← B1
p(x̃) ← B2
...
p(x̃) ← Bn

then the formula associated with p is

∀x̃  p(x̃) ↔ (∃ỹ1 B1) ∨ (∃ỹ2 B2) ∨ ··· ∨ (∃ỹn Bn)

where ỹi is the set of variables in Bi except for variables in x̃. If p does not
occur in the head of a rule of P then the formula is

∀x̃  ¬p(x̃)

The collection of all such formulas is called the Clark completion of P, and
is denoted by P*.
A valuation v is a mapping from variables to D, and the natural ex-
tension which maps terms to D and formulas to closed ℒ*-formulas. If
X is a set of facts then [X]_𝒟 = {v(a) | (a ← c) ∈ X, 𝒟 |= v(c)}. A
𝒟-interpretation of a formula is an interpretation of the formula with the
same domain as 𝒟 and the same interpretation for the symbols in Σ as 𝒟.
It can be represented as a subset of B_𝒟 where B_𝒟 = {p(d̃) | p ∈ Π, d̃ ∈ D^k}.
A 𝒟-model of a closed formula is a 𝒟-interpretation which is a model of
the formula.
Let T denote a satisfaction complete theory for (𝒟, ℒ). The usual
logical semantics are based on the 𝒟-models of P and the models of P*, T.
The least 𝒟-model of a formula Q under the subset ordering is denoted by
lm(Q, 𝒟), and the greatest is denoted by gm(Q, 𝒟). A solution to a query
G is a valuation v such that v(G) ⊆ lm(P, 𝒟).

4 Fixedpoint semantics
The fixedpoint semantics we present are based on one-step consequence
functions T_P^𝒟 and S_P, and the closure operator [P] generated by T_P^𝒟.
The functions T_P^𝒟 and [P] map over 𝒟-interpretations. The set of 𝒟-
interpretations forms a complete lattice under the subset ordering, and
these functions are continuous on B_𝒟.

T_P^𝒟(I) = {p(d̃) | p(x̃) ← c, b1, ..., bn is a rule of P,
             v is a valuation such that v(x̃) = d̃,
             𝒟 |= v(c), and v(bi) ∈ I, i = 1, ..., n}

[P] is the closure operator generated by T_P^𝒟. It represents a deductive
closure based on the rules of P. Let Id be the identity function, and define
(f + g)(x) = f(x) ∪ g(x). Then [P](I) is the least fixedpoint of T_P^𝒟 + Id
greater than I, and the least fixedpoint of T_{P∪I}^𝒟.
The function S_P is defined on sets of facts, which form a complete lattice
under the subset ordering. We denote the closure operator generated from
S_P by ⟪P⟫. Both these functions are continuous.

S_P(I) = {p(x̃) ← c | p(x̃) ← c′, b1, ..., bn is a rule of P,
             ai ← ci ∈ I, i = 1, ..., n,
             the rule and facts renamed apart,
             𝒟 |= c ↔ ∃_{-x̃} (c′ ∧ ⋀_{i=1}^{n} (ci ∧ ai = bi))}

We denote the least fixedpoint of a function f by lfp(f) and the greatest
fixedpoint by gfp(f). These fixedpoints exist for the functions of interest,
since they are monotonic functions on complete lattices. For a function f
mapping 𝒟-interpretations to 𝒟-interpretations, we define the upward and
downward iteration of f as follows.

f ↑ 0 = ∅
f ↑ (α + 1) = f(f ↑ α)
f ↑ β = ⋃_{α<β} f ↑ α    if β is a limit ordinal

f ↓ 0 = B_𝒟
f ↓ (α + 1) = f(f ↓ α)
f ↓ β = ⋂_{α<β} f ↓ α    if β is a limit ordinal

We can take as semantics lfp(S_P) or lfp(T_P^𝒟). The two functions involved
are related in the following way: [S_P(I)]_𝒟 = T_P^𝒟([I]_𝒟). Consequently
[lfp(S_P)]_𝒟 = lfp(T_P^𝒟). lfp(S_P) corresponds to the s-semantics [Falaschi
et al., 1989] for languages with constraints [Gabbrielli and Levi, 1991].
Fixedpoint semantics based on sets of clauses [Bossi et al., 1992] also extend
easily to CLP languages.
Based largely on the facts that the 𝒟-models of P are the fixedpoints
of [P] and the 𝒟-models of P* are the fixedpoints of T_P^𝒟, we have the
following connections between the logical and fixedpoint semantics, just as
in standard logic programming.
Proposition 4.0.1. Let P, P1, P2 be CLP programs and Q a set of facts
over a constraint domain 𝒟 with corresponding theory T. Then:

• lm(P, 𝒟) = [{h ← c | P*, 𝒟 |= (h ← c)}]_𝒟 = [{h ← c | P*, T |= (h ← c)}]_𝒟
• lm(P*, 𝒟) = lm(P, 𝒟) = lfp(T_P^𝒟)
• gm(P*, 𝒟) = gfp(T_P^𝒟)
• ⟪P ∪ Q⟫(∅) = ⟪P⟫(Q)
• 𝒟 |= P1 ↔ P2 iff [P1] = [P2]
We will need the following terminology later. P is said to be (𝒟, ℒ)-
canonical iff gfp(T_P^𝒟) = T_P^𝒟 ↓ ω. Canonical logic programs, but not
constraint logic programs, were first studied in [Jaffar and Stuckey, 1986]
which showed that every logic program is equivalent (wrt the success and
finite failure sets) to a canonical logic program. The proof here was not
constructive, but subsequently, [Wallace, 1989] provided an algorithm to
generate the canonical logic program12. Like many other kinds of results
in traditional logic programming, these results are likely to extend to CLP
in a straightforward way.

5 Top-down execution
The phrase "top-down execution" covers a multitude of operational mod-
els. We will present a fairly general framework for operational semantics
in which we can describe the operational semantics of some major CLP
systems.
We will present the operational semantics as a transition system on
states: tuples ⟨A, C, S⟩ where A is a multiset of atoms and constraints, and
C and S are multisets of constraints. The constraints C and S are referred
to as the constraint store and, in implementations, are acted upon by a
constraint solver. Intuitively, A is a collection of as-yet-unseen atoms and
constraints, C is the collection of constraints which are playing an active
role (or are awake), and S is a collection of constraints playing a passive
role (or are asleep). There is one other state, denoted by fail. To express
more details of an operational semantics, it can be necessary to represent
the collections of atoms and constraints more precisely. For example, to
express the left-to-right Prolog execution order we might use a sequence
of atoms rather than a multiset. However, we will not be concerned with
such details here.
We will assume as given a computation rule which selects a transition
type and an appropriate element of A (if necessary) for each state13. The
transition system is also parameterized by a predicate consistent and a
function infer, which we will discuss later. An initial goal G for execution
is represented as a state by ⟨G, ∅, ∅⟩.
The transitions in the transition system are:

    ⟨A ∪ a, C, S⟩ →_r ⟨A ∪ B, C, S ∪ (a = h)⟩

if a is selected by the computation rule, a is an atom, h ← B is a rule of P, renamed to new variables, and h and a have the same predicate symbol. The expression a = h is an abbreviation for the conjunction of equations between corresponding arguments of a and h. We say a is rewritten in this transition.

12 This proof was performed in the more general class of logic programs with negation.
13 A computation rule is a convenient fiction that abstracts some of the behavior of a CLP system. To be realistic, a computation rule should also depend on factors other than the state (for example, the history of the computation). We ignore these possibilities for simplicity.

    ⟨A ∪ a, C, S⟩ →_r fail

if a is selected by the computation rule, a is an atom and, for every rule h ← B of P, h and a have different predicate symbols.

    ⟨A ∪ c, C, S⟩ →_c ⟨A, C, S ∪ c⟩

if c is selected by the computation rule and c is a constraint.

    ⟨A, C, S⟩ →_i ⟨A, C′, S′⟩

if (C′, S′) = infer(C, S).

    ⟨A, C, S⟩ →_s ⟨A, C, S⟩

if consistent(C).

    ⟨A, C, S⟩ →_s fail

if ¬consistent(C).

The →_r transitions arise from resolution, →_c transitions introduce constraints into the constraint solver, →_s transitions test whether the active constraints are consistent, and →_i transitions infer more active constraints (and perhaps modify the passive constraints) from the current collection of constraints. We write → to refer to a transition of arbitrary type.
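Read operationally, these transitions describe the main loop of an interpreter. The Python sketch below is only an illustration of that reading (a quick-checking, left-to-right loop, with backtracking over rule choices omitted); the helpers rules_for, infer, consistent and is_constraint are hypothetical placeholders for the program representation and the constraint solver.

    def derive(goal, rules_for, infer, consistent, is_constraint):
        """One derivation from <goal, {}, {}> under a left-to-right computation rule.
        Each ->r or ->c step is followed by ->i and ->s (quick-checking)."""
        A, C, S = list(goal), [], []            # the state <A, C, S>
        while A:
            item = A.pop(0)                     # leftmost atom or constraint
            if is_constraint(item):
                S.append(item)                  # ->c : send it to the passive store
            else:
                rule = rules_for(item)          # ->r : a renamed rule h <- B, or None
                if rule is None:
                    return 'fail'
                head_equations, body = rule
                S.extend(head_equations)        # the equations a = h
                A = list(body) + A              # body atoms/constraints are seen next
            C, S = infer(C, S)                  # ->i : promote passive constraints
            if not consistent(C):               # ->s : test the active store
                return 'fail'
        return C, S                             # final state; answer = projection of C /\ S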
The predicate consistent(C) expresses a test for consistency of C. Usually it is defined by: consistent(C) iff D ⊨ ∃ C, that is, a complete consistency test. However, systems may employ a conservative but incomplete (or partial) test: if D ⊨ ∃ C then consistent(C) holds, but sometimes consistent(C) holds although D ⊨ ¬∃ C. One example of such a system is
CAL [Aiba et al., 1988] which computes over the domain of real numbers,
but tests consistency over the domain of complex numbers.
The function infer(C, S) computes from the current sets of constraints a new set of active constraints C′ and passive constraints S′. Generally it can be understood as abstracting from S (or relaxing S) in the presence of C to obtain more active constraints. These are added to C to form C′, and S is simplified to S′. We require that D ⊨ (C ∧ S) ↔ (C′ ∧ S′), so that information is neither lost nor "guessed" by infer. The role that infer plays varies widely from system to system. In Prolog, there are no passive constraints and we can define infer(C, S) = (C ∪ S, ∅). In CLP(R), non-linear constraints are passive, and infer simply passes (the linearized version of) a constraint from S to C′ when the constraint becomes linear in the context of C, and deletes the constraint from S. For example, if S is x*y = z ∧ z*y = 2 and C is x = 4 ∧ z < 0 then infer(C, S) = (C′, S′) where C′ is x = 4 ∧ z < 0 ∧ 4y = z and S′ is z*y = 2.
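As an illustration of this behaviour (a sketch only, with a made-up constraint representation rather than the CLP(R) solver's), the following Python fragment wakes a passive constraint x*y = z once the active store fixes one of its factors to a constant:

    def infer(C, S):
        """Move a passive product constraint x*y = z to the active store when the
        active constraints determine x or y; otherwise leave it passive."""
        values = {v: n for (kind, v, n) in C if kind == 'eq'}   # explicit bindings v = n
        new_C, new_S = list(C), []
        for (x, y, z) in S:                     # each passive constraint reads x*y = z
            if x in values:
                new_C.append(('lin', values[x], y, z))          # now linear: values[x]*y = z
            elif y in values:
                new_C.append(('lin', values[y], x, z))
            else:
                new_S.append((x, y, z))         # still non-linear: stays passive
        return new_C, new_S

    # The example from the text: C is x = 4 /\ z < 0, S is x*y = z /\ z*y = 2
    C = [('eq', 'x', 4), ('lt', 'z', 0)]
    S = [('x', 'y', 'z'), ('z', 'y', 2)]
    print(infer(C, S))    # 4*y = z becomes active; z*y = 2 remains passive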
In a language like CHIP, infer performs less obvious inferences. For example, if S is x = y + 1 and C is 2 < x < 5 ∧ 0 < y < 3 then infer(C, S) = (C′, S′) where C′ is 2 < x < 4 ∧ 1 < y < 3 and S′ = S. (Note that we could also formulate the finite domain constraint solving of CHIP as having no passive constraints, but having an incomplete test for consistency. However the formulation we give seems to reflect the systems more closely.) Similarly, in languages employing interval arithmetic over the real numbers (such as BNR-Prolog) intervals are active constraints and other constraints are passive. In this case, infer repeatedly computes smaller intervals for each of the variables, based on the constraints in S, terminating when no smaller interval can be derived (modulo the precision of the arithmetic). Execution of language constructs such as the cardinality operator [van Hentenryck and Deville, 1991a], "constructive disjunction" [van Hentenryck et al., 1993] and special-purpose constructs (for example, in [Dincbas et al., 1988b; Aggoun and Beldiceanu, 1992]) can also be understood as →_i transitions, where these constructs are viewed as part of the language of constraints.
Generally, the active constraints are determined syntactically. As examples, in Prolog all equations are active, in CLP(R) all linear constraints are active, on the finite domains of CHIP all unary constraints (i.e. constraints on just one variable, such as x < 9 or x ≠ 0) are active, and in the interval arithmetic of BNR-Prolog only intervals are active.
The stronger the collection of active constraints, the earlier failure will
be detected, and the less searching is necessary. With this in mind, we
might wish infer to be as strong as possible: for every active constraint c, if infer(C, S) = (C′, S′) and D ⊨ (C ∧ S) → c, then D ⊨ C′ → c. However, this is not always possible14. Even if it were possible, it is generally not
preferred, since the computational cost of a powerful infer function can
be greater than the savings achieved by limiting search.
A CLP system is determined by the constraint domain and a detailed
operational semantics. The latter involves a computation rule and defini-
tions for consistent and infer. We now define some significant properties
of CLP systems. We distinguish the class of systems in which passive con-
straints play no role and the global consistency test is complete. These
systems correspond to the systems treated in [Jaffar and Lassez, 1986;
Jaffar and Lassez, 1987].
14 For example, in CLP(R), where linear constraints are active and non-linear constraints are passive, if S is y = x * x then we can take c to be y ≥ 2kx − k², for any k. There is no finite collection C′ of active constraints which implies all these constraints and is not stronger than S.

Definition 5.0.1. Let →_ris = →_r →_i →_s and →_cis = →_c →_i →_s. We say that a CLP system is quick-checking if its operational semantics can be described by →_ris and →_cis. A CLP system is progressive if, for every state with a nonempty collection of atoms, every derivation from that state either fails, contains a →_r transition or contains a →_c transition. A CLP system is ideal if it is quick-checking, progressive, infer is defined by infer(C, S) = (C ∪ S, ∅), and consistent(C) holds iff D ⊨ ∃ C.
In a quick-checking system, inference of new active constraints is per-
formed and a test for consistency is made each time the collection of con-
straints in the constraint solver is changed. Thus, within the limits of
consistent and infer, it finds inconsistency as soon as possible. A pro-
gressive system will never infinitely ignore the collection of atoms and con-
straints in the first part of a state during execution. All major implemented
CLP systems are quick-checking and progressive, but most are not ideal.
A derivation is a sequence of transitions ⟨A_1, C_1, S_1⟩ → ... → ⟨A_i, C_i, S_i⟩ → .... A state which cannot be rewritten further is called a final state. A derivation is successful if it is finite and the final state has the form ⟨∅, C, S⟩. Let G be a goal with free variables x, which initiates a derivation and produces a final state ⟨∅, C, S⟩. Then ∃_{-x} C ∧ S is called the answer constraint of the derivation.
A derivation is failed if it is finite and the final state is fail. A derivation is fair if it is failed or, for every i and every a ∈ A_i, a is rewritten in a later transition. A computation rule is fair if it gives rise only to fair derivations. A goal G is finitely failed if, for any one fair computation rule, every derivation from G in an ideal CLP system is failed. It can be shown that if a goal is finitely failed then every fair derivation in an ideal CLP system is failed. A derivation flounders if it is finite and the final state has the form ⟨A, C, S⟩ where A ≠ ∅.
The computation tree of a goal G for a program P in a CLP system is a tree with nodes labelled by states and edges labelled by →_r, →_c, →_i or →_s such that: the root is labelled by ⟨G, ∅, ∅⟩; for every node, all outgoing edges have the same label; if a node labelled by a state S has an outgoing edge labelled by →_c, →_i or →_s then the node has exactly one child, and the state labelling that child can be obtained from S via a transition →_c, →_i or →_s respectively; if a node labelled by a state S has an outgoing edge labelled by →_r then the node has a child for each rule in P, and the state labelling each child is the state obtained from S by the →_r transition for that rule; for each →_r and →_c edge, the corresponding transition uses the atom or constraint selected by the computation rule.
Every branch of a computation tree is a derivation, and, given a com-
putation rule, every derivation following that rule is a branch of the cor-
responding computation tree. Different computation rules can give rise
to computation trees of radically different sizes. Existing CLP languages
use computation rules based on the Prolog left-to-right computation rule (which is not fair). We will discuss linguistic features intended to improve on this rule in Section 9.1.
The problem of finding answers to a query can be seen as the problem
of searching a computation tree. Most CLP languages employ a depth-first
search with chronological backtracking, as in Prolog (although there have
been suggestions to use dependency-directed backtracking [De Backer and
Beringer, 1991]). Since depth-first search is incomplete on infinite trees,
not all answers are computed. The depth-first search can be incorporated
in the semantics in the same way as is done for Prolog (see, for example,
[Baudinet, 1988; Barbuti et al, 1992]), but we will not go into details here.
In Section 8 we will discuss a class of CLP languages which use a top-down
execution similar to the one outlined above, but do not use backtracking.
Consider the transition

    ⟨A, C, S⟩ →_g ⟨A, C′, ∅⟩

where C′ is a set of equations in L* such that D ⊨ C′ → (C ∧ S) and, for every variable x occurring in C or S, C′ contains an equation x = d for some constant d. Thus →_g grounds all variables in the constraint solver. We also have the transition

    ⟨A, C, S⟩ →_g fail

if no such C′ exists (i.e. C ∧ S is unsatisfiable in D). A ground derivation is a derivation composed of →_r →_g and →_c →_g transitions.
We now define three sets that crystallize three aspects of the operational semantics. The success set SS(P) collects the answer constraints to simple goals p(x). The finite failure set FF(P) collects the set of simple goals which are finitely failed. The ground finite failure set GFF(P) collects the set of grounded atoms, all of whose fair ground derivations are failed.

    SS(P) = {p(x) ← c | ⟨p(x), ∅, ∅⟩ →* ⟨∅, c′, c″⟩, D ⊨ c ↔ ∃_{-x} (c′ ∧ c″)}.
    FF(P) = {p(x) ← c | for every fair derivation, ⟨p(x), c, ∅⟩ →* fail}.
    GFF(P) = {p(d) | for every fair ground derivation, ⟨p(d), ∅, ∅⟩ →* fail}.

6 Soundness and completeness results


We now present the main relationships between the declarative semantics
and the top-down operational semantics. To keep things simple, we con-
sider only ideal CLP systems. However many of the results hold much
more generally. The soundness results hold for any CLP system, because
of restrictions we place on consistent and infer. Completeness results for
successful derivations require only that the CLP system be progressive.
Theorem 6.0.1. Consider a program P in the CLP language determined by a 4-tuple (Σ, D, L, T), where D and T correspond on L, and executing on an ideal CLP system. Then:
1. SS(P) = lfp(S_P) and [SS(P)]_D = lm(P, D).
2. If the goal G has a successful derivation with answer constraint c, then P, T ⊨ c → G.
3. Suppose T is satisfaction complete wrt L. If G has a finite computation tree, with answer constraints c_1, ..., c_n, then P*, T ⊨ G ↔ c_1 ∨ ... ∨ c_n.
4. If P, T ⊨ c → G then there are derivations for the goal G with answer constraints c_1, ..., c_n such that T ⊨ c → c_1 ∨ ... ∨ c_n. If, in addition, (D, L) has independence of negated constraints then the result holds for n = 1 (i.e. without disjunction).
5. Suppose T is satisfaction complete wrt L. If P*, T ⊨ G ↔ c_1 ∨ ... ∨ c_n then G has a computation tree with answer constraints c′_1, ..., c′_m (and possibly others) such that T ⊨ c_1 ∨ ... ∨ c_n ↔ c′_1 ∨ ... ∨ c′_m.
6. Suppose T is satisfaction complete wrt L. The goal G is finitely failed for P iff P*, T ⊨ ¬G.
7. gm(P*, D) = B_D − GFF(P).
8. Suppose (D, L) is solution compact. T_P^D ↓ ω = B_D − [FF(P)]_D.
9. Suppose (D, L) is solution compact. P is (D, L)-canonical iff [FF(P)]_D = [{h ← c | P*, D ⊨ ¬(h ∧ c)}]_D.
Most of these results are from [Jaffar and Lassez, 1986; Jaffar and Lassez,
1987], but there are also some from [Gabbrielli and Levi, 1991; Maher,
1987; Maher, 1993b]. Results 8 and 9 of the above theorem (which are
equivalent) are the only results, of those listed above, which require solution
compactness. In fact, the properties shown are equivalent to SC2, the second condition of solution compactness [Maher, 1993b]; as mentioned earlier, SC1 is not needed. In soundness results (2, 3 and half of 6), T can be replaced by D. If we omit our assumption of a first-order language of constraints (see Section 2) then only results 1, 2, 3, 7, 8, 9 and the soundness half of 6 (replacing T by D where necessary) continue to hold.
The strong form of completeness of successful derivations (result 4)
[Maher, 1987] provides an interesting departure from the conventional logic
programming theory. It shows that in CLP it is necessary, in general,
to consider and combine several successful derivations and their answers
to establish that c → G holds, whereas only one successful derivation is
necessary in standard logic programming. The other results in this theorem
are more direct liftings of results in the logic programming theory.
The CLP Scheme provides a framework in which the lifting of results
from LP to CLP is almost trivial. By replacing the Herbrand universe by
an arbitrary constraint domain D, unifiability by constraint satisfaction,
Clark's equality theory by a corresponding satisfaction-complete theory,

etc., most results (and even their proofs) lift from LP to CLP. The lifting
is discussed in greater detail in [Maher, 1993b]. Furthermore, most opera-
tional aspects of LP (and Prolog) can be interpreted as logical operations,
and consequently these operations (although not their implementations)
also lift to CLP. One early example is matching, which is used in various
LP systems (e.g. GHC, NU-Prolog) as a basis for affecting the computation
rule; the corresponding operation in CLP is constraint entailment [Maher,
1987].
The philosophy of the CLP Scheme [Jaffar and Lassez, 1987] gives primacy to the structure D over which computation is performed, and less prominence to the theory T. We have followed this approach. However, it is also possible to start with a satisfaction complete theory T (see, for example, [Maher, 1987]) without reference to a structure. We can arbitrarily choose a model of T as the structure D, and the same results apply. Another variation [Hohfeld and Smolka, 1988] considers a collection of structures, and defines consistent(C) to hold iff D ⊨ ∃ C for some structure D in the collection. Weaker forms of the soundness and completeness of successful derivations apply in this case.

7 Bottom-up execution
Bottom-up execution has its main use in database applications. The set-
at-a-time processing limits the number of accesses to secondary storage in
comparison to tuple-at-a-time processing (as in top-down execution), and
the simple semantics gives great scope for query optimization.
Bottom-up execution is also formalized as a transition system. For
every rule r of the form h ← c, b_1, ..., b_n in P and every set A of facts there is a transition

    A ↪ A ∪ {h ← c′ | a_i ← c_i, i = 1, ..., n are elements of A,
                      D ⊨ c′ ↔ c ∧ c_1 ∧ b_1 = a_1 ∧ ... ∧ c_n ∧ b_n = a_n}

In brief, then, we have A ↪ A ∪ S_P^r(A), for every set A and every rule r in P (S_P was defined in Section 4; S_P^r is its restriction to the single rule r). An execution is a sequence of transitions. It
is fair if each rule is applied infinitely often. The limit of ground instances
of sets generated by fair executions is independent of the order in which
transitions are applied, and is regarded as the result of the bottom-up
execution. If Q is an initial set of facts and P is a program, and A is the result of a fair bottom-up execution, then A = SS(P ∪ Q) = ⟨⟨P⟩⟩(Q) and [A]_D = lm(P ∪ Q, D).

An execution Q = X_0 ↪ X_1 ↪ ... ↪ X_i ↪ ... terminates if, for some m and every i > m, X_i = X_m. We say P is finitary if for every finite initial set of facts Q and every fair execution, there is a k such that [X_i]_D = [X_k]_D for all i > k. However, execution can be non-terminating, even when the program is finitary and the initial set is finite.

Example 7.0.1. Consider the following program P on the constraint domain ℜ_Lin.

    p(X+1) ← p(X)
    p(X) ← X > 5
    p(X) ← X < 5

Straightforward bottom-up computation gives {p(x) ← x > 5, p(x) ← x > 6, p(x) ← x > 7, ...} ∪ {p(x) ← x < 5, p(x) ← x < 6, p(x) ← x < 7, ...}, and does not terminate. We also have lfp(T_P^D) = T_P^D ↑ 2 = {p(d) | d ∈ ℜ}.
A necessary technique is to test whether a new fact is subsumed by
the current set of facts, and accumulate only unsubsumed facts. A fact
p(x) ← c is subsumed by the facts p(x) ← c_i, i = 1, ..., n (with respect to (D, L)) if D ⊨ c → c_1 ∨ ... ∨ c_n. The transitions in the modified bottom-up execution model are

    A ↪ A ∪ reduce(S_P^r(A), A)

where reduce(X, Y) eliminates from X all elements subsumed by Y. Under this execution model every finitary program terminates on every finite initial set Q.
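The control structure of this modified execution model is easy to state; the Python sketch below is schematic, with rule_consequences standing in for S_P^r and subsumed standing in for the test D ⊨ c → c_1 ∨ ... ∨ c_n (both hypothetical callbacks, not part of any particular system):

    def bottom_up(rules, initial_facts, rule_consequences, subsumed):
        """Fair bottom-up evaluation with a subsumption filter.
        rule_consequences(r, facts) plays the role of S_P^r(A);
        subsumed(fact, facts) decides whether the new fact is already implied."""
        facts = set(initial_facts)
        changed = True
        while changed:                          # terminates when P is finitary
            changed = False
            for r in rules:                     # every rule is applied: fairness
                for fact in rule_consequences(r, facts):
                    if not subsumed(fact, facts):
                        facts.add(fact)
                        changed = True
        return facts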
Unfortunately, checking subsumption is computationally expensive, in
general. If the constraint domain (D, L) does not satisfy the independence
of negated constraints then the problem of showing that a new fact is not
subsumed is at least NP-hard (see [Srivastava, 1993] for the proof in one
constraint domain). In constraint domains with independence of negated
constraints the problem is not as bad: the new fact only needs to be checked
against one fact at a time [Maher, 1993b]. (Classical database optimiza-
tions are also more difficult without independence of negated constraints
[Klug, 1988; Maher, 1993b].) A pragmatic approach to the problem of subsumption in ℜ_Lin is given in [Srivastava, 1993]. Some work avoids the
problem of subsumption by allowing only ground facts in the database and
intermediate computations.
Even with subsumption, there is still the problem that execution might
not terminate (for example, if P is not finitary). The approach of [Kanellakis et al., 1990] is to restrict the constraint domains D to those which
only permit the computation of finitely representable relations from finitely
representable relations. This requirement is slightly weaker than requiring
that all programs are finitary, but it is not clear that there is a practical
difference. Regrettably, very few constraint domains satisfy this condition,
and those which do have limited expressive power.
The alternative is to take advantage of P and a specific query (or a
class of queries). A transformation technique such as magic templates
[Ramakrishnan, 1991] produces a program Pmg that is equivalent to P
for the specific query. Other techniques [Kemp et al., 1989; Mumick et
al., 1990; Srivastava and Ramakrishnan, 1992; Kemp and Stuckey, 1993]

attempt to further limit execution by placing constraints at appropriate


points in the program. Analyses can be used to check that execution of
the resulting program terminates [Krishnamurthy et al, 1988; Sagiv and
Vardi, 1989; Brodsky and Sagiv, 1991], although most work has ignored
the capability of using constraints in the answers.
Comparatively little work has been done on the nuts and bolts of im-
plementing bottom-up execution for CLP programs, with all the work ad-
dressing the constraint domain ℜ_Lin. [Kanellakis et al., 1990] suggested the
use of intervals, computed as the projection of a collection of constraints, as
the basis for indexing on constrained variables. Several different data struc-
tures, originally developed for spatial databases or computational geome-
try, have been proposed as appropriate for indexing [Kanellakis et al., 1990;
Srivastava, 1993; Brodsky et al., 1993]. A new data structure was presented
in [Kanellakis et al., 1993] which minimizes accesses to secondary storage.
A sort-join algorithm for joins on constrained variables is given in [Brodsky
et al., 1993]. That paper also provides a query optimization methodology
for conjunctive queries that can balance the cost of constraint manipulation
against the cost of traditional database operations.

8 Concurrent constraint logic programming


Concurrent programming languages are languages which allow the descrip-
tion of collections of processes which may interact with each other. In con-
current constraint logic programming (CCLP) languages, communication
and synchronization are performed by asserting and testing for constraints.
The operational semantics of these languages are quite similar to the top-
down execution described in Section 5. However, the different context in
which they are used results in a lesser importance of the corresponding
logical semantics.
For this discussion we will consider only the flat ask-tell CCLP lan-
guages, which were defined in [Saraswat, 1988; Saraswat, 1989] based on
ideas from [Maher, 1987]. We further restrict our attention to languages
with only committed-choice nondeterminism (sometimes called don't-care
nondeterminism); more general languages will be discussed in Section 9.
For more details of CCLP languages, see [Saraswat, to appear: 2; de Boer
and Palamidessi, 1993].
Just as Prolog can be viewed as a kind of CLP language, obtained by a
particular choice of constraint domain, so most concurrent logic languages
can be viewed as concurrent CLP languages15.

15 Concurrent Prolog [Shapiro, 1983a] is not an ask-tell language, but [Saraswat, 1988] shows how it can be fitted inside the CCLP framework.

A program rule takes the form

    h ← ask : tell | B

where h is an atom, B is a collection of atoms, and ask and tell are constraints. Many treatments of concurrent constraint languages employ
a language based on a process algebra involving ask and tell primitives
[Saraswat, 1989], but we use the syntax above to emphasize the similarities
to other CLP languages.
For the sake of brevity, we present a simpler transition system to de-
scribe the operational semantics than the transition system in Section 5.
However, implemented languages can make the same pragmatic compro-
mises on testing consistency (and implication) as reflected in that transition
system. The states in this transition system have the form ⟨A, C⟩ where A is a collection of atoms and C is a collection of constraints. Any state can be an initial state. The transitions in the transition system are:

    ⟨A ∪ a, C⟩ →_r ⟨A ∪ B, C ∪ (a = h) ∪ ask ∪ tell⟩

if h ← ask : tell | B is a rule of P renamed to new variables x, h and a have the same predicate symbol, D ⊨ C → ∃_x (a = h ∧ ask) and D ⊨ ∃ (C ∧ a = h ∧ ask ∧ tell). Roughly speaking, a transition can occur with such a rule provided the accumulated constraints imply the ask constraint and do not contradict the tell constraint. Some languages use only the ask constraint for synchronization. It is shown in [de Boer and Palamidessi, 1991] that such languages are strictly less expressive than ask-tell languages.
An operational semantics such as the above is not completely faithful
to a real execution of the language, since it is possible for two atoms to be
rewritten simultaneously in an execution environment with concurrency.
The above semantics only allows rewritings to be interleaved. A 'true
concurrency' semantics, based on graph-rewriting, is given in [Montanari
and Rossi, 1993].
All ask-tell CCLP programs have the following monotonicity [Saraswat
et al., 1988] or stability [Gaifman et al., 1991] property: If ⟨A, C⟩ →_r ⟨A′, C′⟩ and D ⊨ C″ → C′ then ⟨A, C″⟩ →_r ⟨A′, C″⟩. This property
provides for simple solutions to some problems in distributed computing
related to reliability. When looked at in a more general framework [Gaif-
man et al., 1991], stability seems to be one advantage of CCLP languages
over other languages; most programs in conventional languages for con-
currency are not stable. It is interesting to note that a notion of global
failure (as represented in Section 5 by the state fail) destroys stability. Of
course, there are also pragmatic reasons for wanting to avoid this notion in
a concurrent language. A framework which permits non-monotonic CCLP
languages is discussed in [de Boer et al., 1993].
A program is determinate if every reachable state is determinate, where
a state is determinate if every selected atom gives rise to at most one
→_r transition. Consequently, for every initial state, every fair derivation
rewrites the same atoms with the same rules, or every derivation fails. Thus

non-failed derivations by determinate programs from an initial state differ


from each other only in the order of rewriting (and the renaming of rules).
Substantial parts of many programs are determinate16. The interest in
determinate programs arises from an elegant semantics for such programs
based upon closure operators [Saraswat et al., 1991]. For every collection
of atoms A, the semantics of A is given by the function p_A(C) = ∃_{-x} C′ where ⟨A, C⟩ →* ⟨A′, C′⟩, x is the free variables of ⟨A, C⟩, and ⟨A′, C′⟩
is a final state. This semantics is extended in [Saraswat et al., 1991] to
a compositional and fully abstract semantics of arbitrary programs. A
semantics based on traces is given in [de Boer and Palamidessi, 1990].
For determinate programs we also have a clean application of the clas-
sical logical semantics of a program [Maher, 1987]. If ⟨A, C⟩ →* ⟨A′, C′⟩ then P*, D ⊨ (A ∧ C) ↔ ∃_{-x} (A′ ∧ C′), where x is the free variables of ⟨A, C⟩.
In cases where execution can be guaranteed not to suspend any atom in-
definitely, the soundness and completeness results for success and failure
hold (see Section 6).

9 Linguistic extensions
We discuss in this section some additional linguistic features for top-down
CLP languages.
9.1 Shrinking the computation tree
The aim of →_i transitions is to extract as much information as is reasonable from the passive constraints, so that the branching of →_r transitions is
reduced. There are several other techniques, used or proposed, for achieving
this result.
In [Le Provost and Wallace, 1993] it is suggested that information can
also be extracted from the atoms in a state. The constraint extracted would
be an approximation of the answers to the atom. This operation can be
expressed by an additional transition rule:

    ⟨A ∪ a, C, S⟩ →_x ⟨A ∪ a, C, S ∪ c⟩

where extract(a, C) = c. Here extract is a function satisfying P*, D ⊨ (a ∧ C) → c. The evaluation of extract, performed at run-time, involves an abstract (or approximate) execution of ⟨a, C, ∅⟩. For example, if P defines p with the facts p(1, 2) and p(3, 4) then the constraint extracted by extract(p(x, y), ∅) might be y = x + 1.
16 For the programs we consider, determinate programs can be characterized syntactically by the following condition: for every pair of rules (renamed apart, except for identical heads) h ← ask_1 : tell_1 | B_1 and h ← ask_2 : tell_2 | B_2 in the program, we have D ⊨ ¬(ask_1 ∧ ask_2 ∧ tell_1) or D ⊨ ¬(ask_1 ∧ ask_2 ∧ tell_2). In languages where procedures can be hidden (as in many process algebra formulations) or there is a restriction on the initial states, the class of determinate programs is larger but is not as easily characterized.

A more widespread technique is to modify the order in which atoms are


selected. Most CLP systems employ the Prolog left-to-right computation
rule. This improves the "programmability" by providing a predictable flow
of control. However, when an appropriate flow of control is data-dependent
or very complex (for example, in combinatorial search problems) greater
flexibility is required.
One solution to this problem is to incorporate a data-dependent compu-
tation rule in the language. The Andorra principle [Warren, 1987] involves
selecting determinate atoms, if possible. (A determinate atom is an atom
which only gives rise to one →_ris transition.) A second approach is to al-
low the programmer to annotate parts of the program (atoms, predicates,
clauses, ...) to provide a more flexible computation rule that is, nonethe-
less, programmed. This approach was pioneered in Prolog II [Colmerauer,
1982a] and MU-Prolog [Naish, 1986]. The automatic annotation of pro-
grams [Naish, 1985] brings this approach closer to the first. A third ap-
proach is to introduce constructs from concurrent logic programming into
the language. There are basically two varieties of this approach: guarded
rules and guarded atoms. The former introduces a committed-choice as-
pect into the language, whereas the latter is a variant of the second ap-
proach. All these approaches originated for conventional logic programs,
but the ideas lift to constraint logic programs, and there are now several
proposals based on these ideas [Janson and Haridi, 1991; Smolka, 1991;
Aït-Kaci and Podelski, 1993c; van Hentenryck and Deville, 1991b; van Hen-
tenryck et al., 1993].
One potential problem with using guarded rules is that the completeness
of the operational semantics with respect to the logical semantics of the
program can be lost. This incompleteness was shown to be avoided in
ALPS [Maher, 1987] (modulo infinitely delayed atoms), but that work was
heavily reliant on determinacy. Smolka [Smolka, 1991] discusses a language
of guarded rules which extends ALPS and a methodology for extending a
predicate definition with new guarded rules such that completeness can be
retained, even though execution would involve indeterminate committed-
choice. The Andorra Kernel Language (AKL) [Janson and Haridi, 1991]
also combines the Andorra principle with guarded rules. There the interest
is in providing a language which subsumes the expressive power of CCLP
languages and CLP languages.
Guarded atoms and, more generally, guarded goals take the form c → G
where c is a constraint17 and G is a goal. G is available for execution
only when c is implied by the current active constraints. We call c the
guard constraint and G the delayed goal. Although the underlying mecha-
nisms are very similar, guarded atoms and guarded rules differ substantially
as linguistic features, since guarded atoms can be combined conjunctively whereas guards in guarded rules are combined disjunctively.

17 We also permit the meta-level constraint ground(x).

9.2 Complex constraints


Several language constructs that can be said simply to be complex con-
straints have been added to CLP languages. We can classify them as fol-
lows: those which implement Boolean combinations of (generally simple)
constraints and those which describe an ad hoc, often application-specific,
relation. Falling into the first category are some implementations of con-
straint disjunction [van Hentenryck et al., 1993; De Backer and Beringer,
1993] (sometimes called 'constructive disjunction') and the cardinality op-
erator [van Hentenryck and Deville, 1991a]. Into the second category fall
the element constraint [Dincbas et al., 1988b], and the cumulative con-
straint of [Aggoun and Beldiceanu, 1992], among others. These constraints
are already accounted for in the operational semantics of Section 5, since
they can be considered passive constraints in £. However, it also can be
useful to view them as additions to a better-known constraint domain (in-
deed, this is how they arose).
The cardinality operator can be used to express any Boolean combina-
tion of constraints. A use of this combinator has the form #(L, [c_1, ..., c_n], U), where the c_i are constraints and L and U are variables. This use expresses that the number of constraints c_i that are true lies between the value of L and the value of U (lower and upper bound respectively). By constraining L ≥ 1 the combinator represents the disjunction of the constraints; by constraining U = 0 the combinator represents the conjunction
of the negations of the constraints. The cardinality combinator is imple-
mented by testing whether the constraints are entailed by or are inconsis-
tent with the constraint store, and comparing the numbers of entailed and
inconsistent constraints with the values of L and U. When L and U are
not ground, the cardinality constraint can produce a constraint on these
variables. (For example, after one constraint is found to be inconsistent U
can be constrained by U ≤ n − 1.)
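The counting argument behind this implementation can be sketched in a few lines of Python (an illustration only, not the CHIP code); statuses records, for each c_i, whether the store entails it, refutes it, or leaves it open:

    def cardinality(statuses, lo, hi):
        """Propagation for #(L, [c1, ..., cn], U): tighten the bounds (lo, hi) on
        the number of true constraints, failing when the bounds cross."""
        n = len(statuses)
        entailed = sum(s == 'entailed' for s in statuses)
        refuted = sum(s == 'inconsistent' for s in statuses)
        new_lo = max(lo, entailed)              # at least the entailed constraints hold
        new_hi = min(hi, n - refuted)           # at most n minus the refuted ones hold
        if new_lo > new_hi:
            raise ValueError('cardinality constraint is unsatisfiable')
        return new_lo, new_hi

    # one of three constraints already inconsistent: U is constrained to be <= 2
    print(cardinality(['unknown', 'inconsistent', 'unknown'], 0, 3))   # (0, 2)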
In constraint languages without disjunction, an intended disjunction
c_1(x) ∨ c_2(x) must be represented by a pair of clauses

    p(x) ← c_1(x)
    p(x) ← c_2(x)
In a simple CLP language this representation forces a choice to be made
(between the two disjuncts). Constructive disjunction refers to the direct
use of a disjunctive constraint without immediately making a choice. In-
stead an active constraint is computed which is a safe approximation to
the disjunction in the context of the current constraint store C. In the
constraint domain FD, [van Hentenryck et al., 1993] suggests two possible
approximations, one based on approximating each constraint C ∧ c_i using the domain of each variable and the other (less accurately) approximating each constraint using the interval constraints for each variable. The disjunction of these approximations is easily approximated by an active constraint. For linear arithmetic [De Backer and Beringer, 1993] suggests the use of the convex hull of the regions defined by the two constraints as the approximation. Note that the constructive disjunction behavior could be obtained from the clauses for p using the methods of [Le Provost and Wallace, 1993].
In the second category, we mention two constructs used with the fi-
nite domain solver of CHIP. element(X, L, T) expresses that T is the Xth element in the list L. Operationally, it allows constraints on either the index X or the element T of the list to be reflected by constraints on the other. For example, if X is constrained so that X ∈ {1,3,5} then element(X, [1,1,2,3,5,8], T) can constrain T so that T ∈ {1,2,5} and, similarly, if T is constrained so that T ∈ {1,3,5} then X is constrained so that X ∈ {1,2,4,5}. Declaratively, the cumulative constraint of [Ag-
goun and Beldiceanu, 1992] expresses a collection of linear inequalities on
its arguments. Several problems that can be expressed as integer pro-
gramming problems can be expressed with cumulative. Operationally, it
behaves somewhat differently from the way CHIP would usually treat the
inequalities.

9.3 User-defined constraints


Complex constraints are generally "built in" to the language. There are
proposals to extend CLP languages to allow the user to define new con-
straints, together with inference rules specifying how the new constraints
react with the constraint store.
A basic approach is to use guarded clauses. The new constraint pred-
icate is defined with guarded clauses, where the guards specify the cases
in which the constraint is to be simplified, and the body is an equivalent
conjunction of constraints. Using ground(x) (or a similar construct) as a
guard constraint, it is straightforward to implement local propagation (i.e.
propagation of ground values). We give an example of this use in Section
11.1, and [Saraswat, 1987] has other examples. Some more general forms
of propagation can also be expressed with guarded clauses.
The work [Frühwirth and Hanschke, 1993] can be seen as an extension
of this method. The new constraints occur as predicates, and guarded rules
(called constraint handling rules) are used to simplify the new constraints.
However, the guarded rules may have two (or more) atoms in the head.
Execution matches the head with a collection of constraint atoms in the
goal and reduces to an equivalent conjunction of constraints. This method
appears able to express more powerful solving methods than the guarded
clauses. For example, transitivity of the user-defined constraint leq can be specified by the rule

    leq(X, Y), leq(Y, Z) ==> true | leq(X, Z).
whereas it is not clear how to express this in a one-atom-per-head guarded
clause. A drawback of having multiple atoms, however, is inefficiency. In
particular, it is not clear whether constraint handling rules can produce
incremental (in the sense defined in Section 10.1) constraint solvers except
in simple cases.
A different approach [van Hentenryck et al., 1991] proposes the intro-
duction of 'indexical' terms which refer to aspects of the state of the con-
straint solver (thus providing a limited form of reflection)18. Constraints
containing these terms are called indexical constraints, and from these
indexical constraints user-defined constraints are built. Specifically, [van
Hentenryck et al., 1991] discusses a language over finite domains which
can access the current domain and upper and lower bounds on the value
of a variable using the indexical terms dom(X), max(X) and min(X)
respectively. Indexical constraints have an operational semantics: each
constraint defines a method of propagation. For example, the constraint
Y in 0..max(X) continually enforces the upper bound of Y to be less than
or equal to the upper bound of X. This same behavior can be obtained
in a finite domain solver with the constraint Y < X, but the advantage of
indexical constraints is that there is greater control over propagation: with
Y < X we also propagate changes in the lower bound of Y to X, whereas
we can avoid this with indexical constraints. A discussion of an imple-
mentation of indexical constraints is given in [Diaz and Codognet, 1993].
(One application of this work is a constraint solver for Boolean constraints
[Codognet and Diaz, 1993]; we describe this application in Section 13.5.)
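A minimal Python sketch of this one-directional propagation (an illustration with a made-up interval store, not the actual implementation): the indexical Y in 0..max(X) narrows Y whenever it is re-evaluated, but never touches X.

    dom = {'X': (0, 10), 'Y': (0, 50)}          # hypothetical store of interval domains

    def y_in_0_to_max_x():
        """Propagator for the indexical constraint  Y in 0..max(X)."""
        y_min, y_max = dom['Y']
        dom['Y'] = (max(y_min, 0), min(y_max, dom['X'][1]))

    y_in_0_to_max_x()
    print(dom['Y'])      # (0, 10): Y's upper bound now follows max(X)
    dom['Y'] = (3, 5)    # narrowing Y is not propagated back to X, unlike Y <= X
    y_in_0_to_max_x()
    print(dom['X'])      # (0, 10), unchanged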

9.4 Negation
Treatments of negation in logic programming lift readily to constraint logic
programming, with only minor adjustments necessary. Indeed many of
the semantics for programs with negation are essentially propositional, be-
ing based upon the collection of ground instances of program rules. The
perfect model [Przymusinski, 1988; Apt et al., 1988; van Gelder, 1988],
well-founded model [van Gelder et al., 1988], stable model [Gelfond and
Lifschitz, 1988] and Fitting fixedpoint semantics [Fitting, 1986], to name
but a few, fall into this category. The grounding of variables in CLP rules
by all elements of the domain (i.e. by all terms in L*) and the deletion of all grounded rules whose constraints evaluate to false produces the desired propositional rules (see, for example, [Maher, 1993a]).
18 This approach has been called a 'glass-box' approach.

Other declarative semantics, based on Clark's completion P* of the program, also extend to CLP19. The counterpart of comp(P) [Clark, 1978; Lloyd, 1987] is T, P*, where T is satisfaction complete. Interestingly, it is necessary to consider the complete theory T of the domain for the equivalence of three-valued logical consequences of T, P* and consequences of finite iterations of Fitting's Φ operator (as shown by Kunen [Kunen, 1987]) to continue to hold for CLP programs [Stuckey, 1991].
SLDNF-resolution and its variants are also relatively untouched by the
lifting to CLP programs, although, of course, they must use a consistency
test instead of unification. The other main modification is that groundness
must be replaced by the concept of a variable being determined by the
current constraints (see Section 2). For example, a safe computation rule
[Lloyd, 1987] may select a non-ground negated atom provided all the vari-
ables in the atom are determined by the current collection of constraints.
Similarly, the definition of an allowed rule [Lloyd, 1987] for a CLP pro-
gram requires that every variable either appear in a positive literal in the
body or be determined by the constraints in the body. With these modifi-
cations, various soundness and completeness results for SLDNF-resolution
and comp(P) extend easily to ideal CLP systems. An alternative implemen-
tation of negation, constructive negation [Chan, 1988], has been expanded
and applied to CLP programs by Stuckey [Stuckey, 1991], who gave the
first completeness result for this method.

9.5 Preferred solutions


Often it is desirable to express an ordering (or preference) on solutions to
a goal. This can provide a basis for computing only the 'best' solutions
to the query. One approach is to adapt the approach of mathematical
programming (operations research) and employ an objective function [van
Hentenryck, 1989a; Maher and Stuckey, 1989]. An optimization primitive
is added to the language to compute the optimal value of the objective
function20.
CHIP and cc(FD) have such primitives, but they have a non-logical
behavior. Two recent papers [Fages, 1993; Marriott and Stuckey, 1993b]
discuss optimization primitives based upon the following logical character-
ization:
m is the minimum value of f(x) such that G(x) holds iff

    ∃x (G(x) ∧ f(x) = m) ∧ ¬∃y (G(y) ∧ f(y) < m)


Optimization primitives can be implemented by a branch and bound ap-
proach, pruning the computation tree of G based on the current minimum.
A similar behavior can be obtained through constructive negation, using the above logical formulation [Fages, 1993; Marriott and Stuckey, 1993b], although a special-purpose implementation is more efficient. [Marriott and Stuckey, 1993b] gives a completeness result for such an implementation, based on Kunen's semantics for negation.

19 For example, the extension to allow arbitrary first-order formulas in the bodies of rules [Lloyd and Topor, 1984].
20 We discuss only minimization; maximization is similar.
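A schematic Python rendering of the branch and bound reading of minimization (not any particular system's primitive): the computation tree of G is searched, and a branch is pruned as soon as a lower bound on f over that branch cannot improve the best value found so far. The callbacks expand, is_answer, value and lower_bound are hypothetical.

    def minimize(root, expand, is_answer, value, lower_bound):
        """Branch and bound over the computation tree of G, minimizing f."""
        best = float('inf')
        stack = [root]
        while stack:
            node = stack.pop()
            if lower_bound(node) >= best:       # prune: cannot beat the incumbent
                continue
            if is_answer(node):
                best = min(best, value(node))   # a better value of f has been found
            else:
                stack.extend(expand(node))      # keep searching this subtree
        return best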
A second approach is to admit constraints which are not required to
be satisfied by a solution, but express a preference for solutions which do
satisfy them. Such constraints are sometimes called soft constraints. The
most developed use of this approach is in hierarchical constraint logic pro-
gramming (HCLP) [Borning et al., 1989; Wilson and Borning, 1993]. In
HCLP, soft constraints have different strengths and the constraints accu-
mulated during a derivation form a constraint hierarchy based on these
strengths. There are many possible ways to compare solutions using these
constraint hierarchies [Borning et al., 1989; Maher and Stuckey, 1989;
Wilson and Borning, 1993], different methods being suitable for different
problems. The hierarchy dictates that any number of weak constraints
can be over-ruled by a stronger constraint. Thus, for example, default be-
havior can be expressed in a program by weak constraints, which will be
over-ruled by stronger constraints when non-default behavior is required.
The restriction to best solutions of a constraint hierarchy can be viewed as
a form of circumscription [Satoh and Aiba, 1993].
Each of the above approaches has some programming advantages over
the other, in certain applications, but both have problems as general-
purpose methods. While the first approach works well when there is a
natural choice of objective function suggested by the problem, in general
there is no natural choice. The second approach provides a higher-level
expression of preference but it cannot be so easily 'fine-tuned' and it can
produce an exponential number of best answers if not used carefully. The
approaches have the advantages and disadvantages of explicit (respectively,
implicit) representations of preference. In the first approach, it can be dif-
ficult to reflect intended preferences. In the second approach it is easier
to reflect intended preferences, but harder to detect inconsistency in these
preferences. It is also possible to 'weight' soft constraints, which provides
a combination of both approaches.
Implementation issues
The main innovation required to implement a CLP system is clearly in the
manipulation of constraints. Thus the main focus in this part of the survey
is on constraint solver operations, described in Section 10. Section 11 then
considers the problem of extending the LP inference engine to deal with
constraints. Here the discussion is not tied down to a particular constraint
domain.
It is important to note that the algorithms and data structures in this
part are presented in view of their use in top-down systems and, in partic-
ular, systems with backtracking. At the present, there is little experience

in implementing bottom-up CLP systems, and so we do not discuss them


here. However, some of the algorithms we discuss can be used, perhaps
with modification, in bottom-up systems.

10 Algorithms for constraint solving


In view of the operational semantics presented in part I, there are several
operations involving constraints to be implemented. These include: a sat-
isfiability test, to implement consistent and infer; an entailment test, to
implement guarded goals; and the projection of the constraint store onto a
set of variables, to compute the answer constraint from a final state. The
constraint solver must also be able to undo the effects of adding constraints
when the inference engine backtracks. In this section we discuss the core
efficiency issues in the implementation of these operations.

10.1 Incrementality
According to the folklore of CLP, algorithms for CLP implementations must
be incremental in order to be practical. However, this prescription is not
totally satisfactory, since the term incremental can be used in two different
senses. On one hand, incrementality is used to refer to the nature of the
algorithm. That is, an algorithm is incremental if it accumulates an internal
state and a new input is processed in combination with the internal state.
Such algorithms are sometimes called on-line algorithms. On the other
hand, incrementality is sometimes used to refer to the performance of the
algorithm. This section serves to clarify the latter notion of incrementality
as a prelude to our discussion of algorithms in the following subsections.
We do not, however, offer a formal definition of incrementality.
We begin by abstracting away the inference engine from the operational
semantics, to leave simply the constraint solver and its operations. We
consider the state of the constraint solver to consist of the constraint store
C, a collection of constraints G that are to be entailed, and some backtrack
points. In the initial state, denoted by 0, there are no constraints nor
backtrack points. The constraint solver reacts to a sequence of operations,
and results in (a) a new state, and (b) a response.
Recall that the operations in CLP languages are:
• augment C with c to obtain a new store, determine whether the new
store is satisfiable, and if so, determine which constraints in G are
implied by the new store;
• add a new constraint to G;
• set a backtrack point (and associate with it the current state of the
system);
• backtrack to the previous backtrack point (i.e. return the state of the
system to that associated with the backtrack point);
• project C onto a fixed set of variables.

Only the first and last of these operations can produce a response from the
constraint solver.
Consider the application of a sequence of operations o_1, ..., o_k on a state Δ; denote the updated state by F(Δ, o_1 ... o_k), and the sequence of responses to the operations by Q(o_1 ... o_k). In what follows we shall be concerned with the average cost of computing F and Q. Using standard
definitions, this cost is parameterized by the distribution of (sequences of)
operations (see, for example, [Vitter and Flajolet, 1990]). We use average
cost assuming the true distribution, the distribution that reflects what
occurs in practice. Even though this distribution is almost always not
known, we often have some hypotheses about it. For example, one can
identify typical and often occurring operation sequences and hence can
approximate the true distribution accordingly. The informal definitions
below therefore are intended to be a guide, as opposed to a formal tool for
cost analysis.
For an expression exp(ō) denoting a function of ō, define AV[exp(ō)] to be the average value of exp(ō), over all sequences of operations ō. Note that the definition of average here is also dependent on the distribution of the ō. For example, let cost(ō) denote the cost of computing F(∅, ō) by some algorithm, for each fixed sequence ō. Then AV[cost(ō)] denotes the average cost of computing F(∅, ō) over all ō.
Let Δ be shorthand for F(∅, o_1 ... o_{k-1}). Let A denote an algorithm which applies a sequence of operations to the initial state, giving the same responses as does the constraint solver, but not necessarily computing the new state. That is, A is the batch (or off-line) version of our constraint solver. In what follows we discuss what it means for an algorithm to be incremental relative to some algorithm A. Intuitively A represents the best available batch algorithm for the operations.
At one extreme, we consider that an algorithm for F and Q is 'non-incremental' relative to A if the average cost of applying an extra operation o_k to Δ is no better than the cost of the straightforward approach using A on o_1 ... o_k. We express this as

    AV[cost(Δ, o_k)] ≥ AV[cost_A(o_1 ... o_k)].

At the other extreme, we consider that an algorithm for F and Q is 'perfectly incremental', relative to A, if its cost is no worse than that of A. In other words, no cost is incurred for the incremental nature of the algorithm. We express this as

    AV[cost(∅, o_1 ... o_{k-1}) + cost(Δ, o_k)] ≤ AV[cost_A(o_1 ... o_k)].

In general, any algorithm lies somewhere in between these two extremes. For example, it will not be perfectly incremental as indicated by the cost formula above, but instead we have

    AV[cost(∅, o_1 ... o_{k-1}) + cost(Δ, o_k)] ≤ AV[cost_A(o_1 ... o_k)] + extra_cost(o_1 ... o_k)

where the additional term extra_cost(o_1 ... o_k) denotes the extra cost incurred by the on-line algorithm over the best batch algorithm. Therefore, one possible "definition" of an incremental algorithm, good enough for use in a CLP system, is simply that its extra-cost factor is negligible.
In what follows, we shall tacitly bear in mind this expression to ob-
tain a rough definition of incrementality21 . Although we have defined in-
crementality for a collection of operations, we will review the operations
individually, and discuss incrementality in isolation. This can sometimes
be an oversimplification; for example, [Mannila and Ukkonen, 1986] has
shown that the standard unification problem does not remain linear when
backtracking is considered. In general, however, it is simply too complex,
in a survey article, to do otherwise.

10.2 Satisfiability (non-incremental)


We consider first the basic problem of determining satisfiability of con-
straints independent of the requirement for incrementality. As we will see
in the brief tour below of our sample domains, the dominant criterion used
by system implementers is not the worst-case time complexity of the algo-
rithm.
For the domain FT, linear time algorithms are known [Paterson and Wegman, 1978], and for RT, the best-known algorithms are almost linear time [Jaffar, 1984]. Even so, most Prolog systems implement an algorithm for the latter22 because the best-case complexity of unification in FT is also linear, whereas it is often the case that unification in RT can be done without inspecting all parts of the terms being unified. Hence in practice Prolog systems are really implementations of CLP(RT) rather than CLP(FT). In fact, many Prolog systems choose to use straightforward algorithms which are slower, in the worst case, than these almost linear time algorithms. The reason for this choice (of algorithms which are quadratic time or slower in the worst case) is the belief that these algorithms are faster on average [Albert et al., 1993].
21 There are similar notions found in the (non-CLP) literature; see the bibliography [Ramalingam and Reps, 1993].
22 This is often realized simply by omitting the 'occur-check' operation from a standard unification algorithm for FT. Some Prolog systems perform such an omission naively, and thus obtain an incomplete algorithm which may not terminate in certain cases. These cases are considered pathological and hence are ignored. Other systems guarantee termination at slightly higher cost, but enjoy the new feature of cyclic data structures.

For the arithmetic domain of ℜ_LinEqn, the most straightforward algorithm is based on Gaussian elimination, and this has quadratic worst-case
complexity. For the more general domain ℜ_Lin, polynomial time algorithms are also known [Khachian, 1979], but these algorithms are not used in practical CLP systems. Instead, the Simplex algorithm (see e.g. [Chvatal, 1983]), despite its exponential time worst case complexity [Klee and Minty, 1972], is used as a basis for the algorithm. However, since the Simplex algorithm works over non-negative numbers and non-strict inequalities, it must be extended for use in CLP systems. While such an extension is straightforward in principle, implementations must be carefully engineered to avoid significant overhead. The main difference between the Simplex-based solvers in CLP systems is in the specific realization of this basic algorithm. For example, the CLP(R) system uses a floating-point representation of numbers, whereas the solvers of CHIP and Prolog III use exact precision rational number arithmetic. As another example, in the CLP(R) system a major design decision was to separately deal with equations and inequalities, enjoying a faster (Gaussian-elimination based) algorithm for equations, but enduring a cost for communication between the two kinds of algorithms [Jaffar et al., 1992a]. Some elements of the CHIP solver are described in [van Hentenryck and Graf, 1991]. Disequality constraints can be handled using entailment of the corresponding equation (discussed in Section 10.4), since independence of negated constraints holds [Lassez and McAloon, 1992].
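For the equation-solving part, the classical algorithm is easy to state. The sketch below (Python, using exact rational arithmetic in the spirit of the CHIP and Prolog III solvers, though not their actual code) reduces a system of linear equations by Gaussian elimination and reports unsatisfiability when a row reduces to 0 = c with c ≠ 0:

    from fractions import Fraction

    def gauss_solve(A, b):
        """Reduce A x = b over the rationals; return 'unsatisfiable' or the
        pivot columns together with the reduced rows (the solved form)."""
        m = [[Fraction(x) for x in row] + [Fraction(c)] for row, c in zip(A, b)]
        rows, cols = len(m), len(m[0]) - 1
        pivot_row, pivots = 0, []
        for col in range(cols):
            # find a row with a non-zero entry in this column
            pr = next((r for r in range(pivot_row, rows) if m[r][col] != 0), None)
            if pr is None:
                continue
            m[pivot_row], m[pr] = m[pr], m[pivot_row]
            piv = m[pivot_row][col]
            m[pivot_row] = [x / piv for x in m[pivot_row]]
            for r in range(rows):
                if r != pivot_row and m[r][col] != 0:
                    factor = m[r][col]
                    m[r] = [x - factor * y for x, y in zip(m[r], m[pivot_row])]
            pivots.append(col)
            pivot_row += 1
        # a remaining row 0 = c with c != 0 means the equations are unsatisfiable
        for r in range(pivot_row, rows):
            if m[r][cols] != 0:
                return 'unsatisfiable'
        return pivots, m[:pivot_row]

    # x + y = 3, x - y = 1 reduces to the solved form x = 2, y = 1
    print(gauss_solve([[1, 1], [1, -1]], [3, 1]))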
For the domain of word equations WE, an algorithm is known [Makanin, 1977] but no efficient algorithm is known. In fact, the general problem, though easily provable to be NP-hard, is not known to be in NP. The most efficient algorithm known still has the basic structure of the Makanin algorithm but uses a far better bound for termination [Koscielski and Pacholski, 1992]. Systems using word equations, Prolog III for example, thus resort to partial constraint solving using a standard delay technique on the lengths of word variables. Rajasekar's 'string logic programs' [Rajasekar, 1993] also use a partial solution of word equations. First, solutions are found for equations over the lengths of the word variables appearing in the constraint; only then is the word equation solved.
As with word equations, the satisfiability problem in finite domains
such as FD is almost always NP-hard. Partial constraint solving is once
again required, and here is a typical approach. Attach to each variable x a
data structure representing dom(x), its current possible values^23. Clearly
dom(x) should be a superset of the projection space w.r.t. x. Define min(x)
and max(x) to be the smallest and largest numbers in dom(x) respectively.
Now, assume that every constraint is written so that each inequality is
of the form x < y or x ≤ y, each disequality is of the form x ≠ y, and each
equation is of the form x = n, x = y, or x = y + z, where x, y, z are variables

^23 The choice of such a data structure should depend on the size of the finite domains.
For example, with small domains a characteristic vector is appropriate.

and n a number. Clearly every constraint in FD can be rewritten into a
conjunction of these constraints.
The algorithm considers one constraint at a time and has two main
phases. First, it performs an action which is determined by the form of
the constraint: (a) for constraints x < y, ensure that min(x) < max(y) by
modifying dom(x) and/or dom(y) appropriately^24; (b) for x ≤ y, ensure
that min(x) ≤ max(y); (c) for x ≠ y, consider three subcases: if dom(x) ∩
dom(y) = ∅ then the constraint reduces to true; otherwise, if dom(x) =
{n}, then remove n from dom(y) (and similarly for the case when dom(y)
is a singleton^25); otherwise, nothing more need be done; (d) for x = n,
simply make dom(x) = {n}; (e) for x = y, make dom(x) = dom(y) =
dom(x) ∩ dom(y); (f) for x = y + z, ensure that max(x) ≥ min(y) + min(z)
and min(x) ≤ max(y) + max(z). If at any time during steps (a) through
(f) the domain of a variable becomes empty, then unsatisfiability has been
detected. The second phase of this algorithm is that for each x such that
dom(x) is changed by some action in steps (a) through (f), all constraints
(but the current one that gave rise to this action) that contain x are re-
considered for further action. Termination is, of course, assured simply
because the domains are finite.
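To make this two-phase scheme concrete, the following is a minimal illustrative sketch in Python (the representation and all names are invented for illustration; it is not taken from any of the systems discussed). It implements actions (a)-(f) over explicit domains and reconsiders affected constraints until either a fixpoint is reached or some domain becomes empty.

def propagate(dom, constraints):
    # dom: dict mapping variable -> set of ints, modified in place.
    # constraints: tuples ('<',x,y), ('<=',x,y), ('!=',x,y),
    #              ('=n',x,n), ('=',x,y), ('=+',x,y,z)  meaning x = y + z.
    queue = list(range(len(constraints)))
    while queue:
        i = queue.pop()
        c, changed = constraints[i], set()
        op = c[0]
        if op in ('<', '<='):                       # actions (a) and (b)
            _, x, y = c
            strict = 1 if op == '<' else 0
            newx = {v for v in dom[x] if v + strict <= max(dom[y])}
            newy = {v for v in dom[y] if v >= min(dom[x]) + strict}
            if newx != dom[x]: dom[x] = newx; changed.add(x)
            if newy != dom[y]: dom[y] = newy; changed.add(y)
        elif op == '!=':                            # action (c)
            _, x, y = c
            if len(dom[x]) == 1 and not dom[x].isdisjoint(dom[y]):
                dom[y] = dom[y] - dom[x]; changed.add(y)
            if len(dom[y]) == 1 and not dom[y].isdisjoint(dom[x]):
                dom[x] = dom[x] - dom[y]; changed.add(x)
        elif op == '=n':                            # action (d)
            _, x, n = c
            if dom[x] != {n}: dom[x] = dom[x] & {n}; changed.add(x)
        elif op == '=':                             # action (e)
            _, x, y = c
            common = dom[x] & dom[y]
            if dom[x] != common: dom[x] = common; changed.add(x)
            if dom[y] != common: dom[y] = common; changed.add(y)
        elif op == '=+':                            # action (f): x = y + z
            _, x, y, z = c
            lo, hi = min(dom[y]) + min(dom[z]), max(dom[y]) + max(dom[z])
            newx = {v for v in dom[x] if lo <= v <= hi}
            if newx != dom[x]: dom[x] = newx; changed.add(x)
        if any(not dom[v] for v in changed):
            return False                            # an empty domain: unsatisfiable
        for j, d in enumerate(constraints):         # second phase: reconsider
            if j != i and changed & {v for v in d[1:] if isinstance(v, str)}:
                queue.append(j)
    return True

dom = {'x': set(range(10)), 'y': set(range(10))}
print(propagate(dom, [('<', 'x', 'y'), ('=n', 'y', 3)]), dom['x'])   # True {0, 1, 2}

As in the text, this is only a partial test: an answer of True means that no inconsistency was detected, not that the constraints are satisfiable.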
In the domain of Boolean algebra BOOL, there are a variety of tech-
niques for testing satisfiability. Since the problem is NP-complete, none of
these can be expected to perform efficiently over all constraints. An early
technique, pioneered by Davis and Putnam, is based upon variable elimi-
nation. The essential idea reduces a normal form representation into two
smaller problems, each with one less variable. Binary decision diagrams
[Bryant, 1986] provide an efficient representation. One of the two Boolean
solvers of CHIP, for example, uses variable elimination and these diagrams.
A related technique is based on enumeration and propagation. The con-
straints are expressed as a conjunction of simple constraints and then local
propagation simplifies the conjunction after each enumeration step. See
[Codognet and Diaz, 1993], for example. The method used in Prolog III
[Benhamou, 1993] is a modification of SL-resolution whose main element
is the elimination of redundant expressions. Another technique comes from
operations research. Here the Boolean formula is restated in arithmetic
form, with variables constrained to be 0 or 1. Then standard techniques for
integer programming, for example cutting-planes, can be used. See [Chan-
dru, 1991] for a further discussion of this technique. This technique has not
been used in CLP systems. A more recent development is the adaptation
of Buchberger's Groebner basis algorithm to Boolean algebras [Sakai et al.,

^24 In this case, simply remove from dom(x) all elements bigger than max(y), and
remove from dom(y) all elements smaller than min(x). We omit the details of similar
operations in the following discussion.
^25 If both are singletons, clearly the constraint reduces to either true or false.

to appear], which is used in CAL. Finally, there is the class of algorithms
which perform Boolean unification; see the survey [Martin and Nipkow,
1989] for example. Here satisfiability testing is only part of the problem
addressed, and hence we will discuss these algorithms in the next section.
The satisfiability problem for feature trees is essentially the same as
the satisfiability problem for rational trees, provided that the number of
features that may occur is bounded by a known constant [Aït-Kaci et al.,
1992]. (Generally this bounding constant can be determined at compile-
time.) Two different sort constraints on the same variable clash in the same
way that different function symbols on terms in RT clash. An equation
between two feature tree variables (of the same sort) induces equations
between all the subtrees determined by the features of the variables, in
the same way as occurs in RT. The main difference is that some sorts or
features may be undetermined (roughly, unbound) in FEAT.
10.3 Satisfiability (incremental)
As alluded to above, it is crucial that the algorithm that determines the
satisfiability of a tentatively new constraint store be incremental. For exam-
ple, a linear-time algorithm for a satisfiability problem is often as good as
one can get. Consider a sequence of constraints c1, ..., ck of approximately
equal size N. A naive application of this linear-time algorithm to decide
c1, then c1 ∧ c2, ..., and finally c1 ∧ ... ∧ ck could incur a cost propor-
tional to Nk², on average. In contrast, a perfectly incremental algorithm
as discussed in Section 10.1 has a cost of O(Nk), on average.
In practice, most algorithms represent constraints in some kind of solved
form, a format in which the satisfiability of the constraints is evident. Thus
the satisfiability problem is essentially that of reducibility into solved form.
For example, standard unification algorithms for FT represent constraints
by (one variant of) their mgu, that is, in the form x1 = t1(y), ..., xn = tn(y)
where each ti(y) denotes a term structure containing variables from y,
and no variable xi appears in y. Similarly, linear equations in RLinEqn are
often represented in parametric form x1 = le1(y), ..., xn = len(y) where
each lei(y) denotes a linear expression containing variables from y, and no
variable xi appears in y. In both these examples, call the xi eliminable
variables, and the yi parametric variables. For linear inequalities in RLin,
the Simplex algorithm represents the constraints in an n × m matrix form
Ax = B where A contains an n × n identity submatrix, defining the basis
variables, and all numbers in the column vector B are nonnegative. For
domains based on a unitary equality theory [Siekmann, 1989], the standard
representation is the mgu, as in the case of FT (which corresponds to the
most elementary equality theory). Word equations over WE, however, are
associated with an infinitary theory, and thus a unification algorithm for
these equations [Jaffar, 1990] may not terminate. A solved form for word
equations, or any closed form solution for that matter, is not known.

The first two kinds of solved form above are also examples of solution
forms, that is, a format in which the set of all solutions of the constraints
is evident. Here, any instance of the variables y determines values for x
and thus gives one solution. The set of all such instances gives the set of
all solutions. The Simplex format, however, is not in solution form: each
choice of basis variables depicts just one particular solution.
An important property of solution forms (and sometimes of just solved
forms) is that they define a convenient representation of the projection of
the solution space with respect to any set of variables. More specifically,
each variable can be equated with a substitution expression containing only
parametric variables, that is, variables whose projections are the entire
space. This property, in turn, aids incrementality as we now show via our
sample domains.
In each of the following examples, let C be a (satisfiable) constraint in
solved form and let c be the new constraint at hand. For FT, the substitu-
tion expression for a variable x is simply x if x is not eliminable; otherwise
it is the expression equated to x in the solved form C. This mapping is
generalized to terms in the obvious way. Similarly we can define a map-
ping of linear expressions by replacing the eliminable variables therein with
their substitution expressions, and then collecting like terms. For the do-
main RLin, in which case C is in Simplex form, the substitution expression
for a variable x is simply x if x is not basic; otherwise it is the expression
obtained by writing the (unique) equation in C containing x with x as the
subject. Once again, this mapping can be generalized to any linear expres-
sion in an obvious way. In summary, a solution form defines a mapping
θ which can be used to map any expression t into an equivalent form tθ
which is free of eliminable variables.
The basic step of a satisfiability algorithm using a solution form is
essentially this.
Algorithm 10.3.1. Given C, (a) Replace the newly considered constraint
c by cθ where θ is the substitution defined by C. (b) Then write cθ into
equations of the form x = ..., and this involves choosing the x and rear-
ranging terms. Unsatisfiability is detected at this stage. (c) If the previous
step succeeds, use the new equations to substitute out all occurrences of x
in C. (d) Finally, simply add the new equations to C, to obtain a solution
form for C ∧ c.
Note that the nonappearance of eliminable variables in substitution
expressions is needed in (b) to ensure that the new equations themselves
are in solved form, and in (c) to ensure that C, augmented with the new
equations, remains in solution form.
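The following small Python sketch illustrates Algorithm 10.3.1 for RLinEqn (the representation and names are invented for illustration). The solution form is a mapping from eliminable variables to linear expressions over parametric variables; each new equation is first rewritten using that mapping, then rearranged, and finally substituted into the existing equations.

from fractions import Fraction

def substitute(expr, solved):
    # Step (a): replace eliminable variables by their substitution expressions.
    # A linear expression is a dict {var: coeff}, the constant under key None.
    out = {None: expr.get(None, Fraction(0))}
    for v, c in expr.items():
        if v is None:
            continue
        for w, d in solved.get(v, {v: Fraction(1)}).items():
            out[w] = out.get(w, Fraction(0)) + c * d
    return {v: c for v, c in out.items() if v is None or c != 0}

def add_equation(expr, solved):
    # Add the constraint expr = 0 to the solution form; False if unsatisfiable.
    e = substitute(expr, solved)                             # (a)
    pivot = next((v for v in e if v is not None), None)
    if pivot is None:
        return e.get(None, Fraction(0)) == 0                 # entailed or refuted
    coeff = e[pivot]                                         # (b) rearrange as pivot = rhs
    rhs = {v: -c / coeff for v, c in e.items() if v != pivot}
    for sub in solved.values():                              # (c) substitute pivot out
        if pivot in sub:
            c = sub.pop(pivot)
            for w, d in rhs.items():
                sub[w] = sub.get(w, Fraction(0)) + c * d
    solved[pivot] = rhs                                      # (d) add the new equation
    return True

solved, F = {}, Fraction
print(add_equation({'x': F(1), 'z': F(-1), None: F(-3)}, solved))   # x = z + 3
print(add_equation({'y': F(1), 'z': F(-1), None: F(-2)}, solved))   # y = z + 2
print(add_equation({'x': F(1), 'y': F(-1), None: F(-1)}, solved))   # x - y = 1, entailed
print(solved)

A practical solver would, in addition, keep an occurrence (cross-reference) index so that step (c) touches only the equations actually containing the new eliminable variable; this point is taken up in Section 10.6.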
The belief that this methodology leads to an incremental algorithm is
based upon believing that the cost of dealing with c is more closely related
to the size of c (which is small on average) than that of C (which is very
large on average). This, in turn, is based upon believing that


• the substitution expressions for the eliminable variables in c, which
largely determine the size of cθ, often have a size that is independent
of the size of C, and
• the number of occurrences of the new eliminable variable x in C,
which largely determines the cost of substituting out x in C, is small
in comparison to the size of C.
The domain FT provides a particularly good example of a solved form
for which the basic Algorithm 10.3.1 is incremental. Consider a standard
implementation in which there is only one location for each variable, and
all references to x are implemented by pointers. Given C in solved form,
and given a new constraint c, there is really nothing to do to obtain cθ since
the eliminable (or in this case, bound) variables in c are already pointers
to their substitution expressions. Now if cθ is satisfiable and we obtain the
new equations x = ..., then just one pointer setting of x to its substitution
expression is required, and we are done. In other words, the cost of this
pointer-based algorithm is focused on determining the satisfiability of cθ;
using the new equations incurs no cost.
For RLinEqn, the size of cθ can be large, even though the finally obtained
equations may not be. For example, if C contained just x1 = u − v, x2 = v − w,
x3 = w − u, and c were y = x1 + x2 + x3, then cθ is as big as C. Upon
rearrangement, however, the finally obtained new equation is simply y = 0.
Next, the substitution phase using the new equation also can enlarge the
equations in C (even if temporarily), and rearrangement is needed in each
equation substituted upon. In general, however, the above beliefs hold in
practice, and the algorithm behaves incrementally.
We next consider the domain RT whose universally used solved form
(due to [Colmerauer, 1982b]) is like that of FT with one important change:
constraints are represented in the form x1 = t1, ..., xn = tn where each
ti is an arbitrary term structure. Thus this solved form differs from that
of FT in that the ti can contain the variables xj, and hence Algorithm
10.3.1 is not directly applicable. It is easy to show that a constraint is satisfi-
able iff it has such a solved form, and further, the solved form is a fairly
explicit representation of the set of all solutions (though not as explicit
as the solution forms for FT or RLinEqn). A straightforward satisfiability
algorithm [Colmerauer, 1982b] is roughly as follows. Let x stand for a vari-
able, and s and t stand for non-variable terms. Now, perform the following
rewrite rules until none is applicable: (a) discard each x = x; (b) for any
x = y, replace x by y throughout; (c) replace t = x by x = t; (d) replace
f(s1, ..., sn) = f(t1, ..., tn), n > 0, by n equations si = ti, 1 ≤ i ≤ n; (e)
replace f(...) = g(...), for distinct f and g, by false (and thus the entire collection of constraints
is unsatisfiable); (f) replace every pair of equations x = t1, x = t2, and say
t1 is not bigger than t2, by x = t1, t1 = t2. Termination needs to be argued,
but we will leave the details to [Colmerauer, 1982b].
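For illustration, the following rough Python sketch performs unification without the occur-check in the spirit of the rewrite rules above (it is not Colmerauer's algorithm as such; names and representation are invented). Memoizing the pairs of terms already being unified plays the role of the termination argument.

def deref(t, bind):
    while isinstance(t, str) and t in bind:
        t = bind[t]
    return t

def unify(s, t, bind, seen=None):
    # Terms are variables (strings) or tuples (functor, arg1, ..., argn).
    if seen is None:
        seen = set()
    s, t = deref(s, bind), deref(t, bind)
    if s == t:
        return True                          # rule (a), or identical terms
    if (id(s), id(t)) in seen:
        return True                          # this pair is already being unified
    seen.add((id(s), id(t)))
    if isinstance(s, str):                   # rules (b)/(f): bind, no occur-check
        bind[s] = t
        return True
    if isinstance(t, str):                   # rule (c): orient t = x into x = t
        bind[t] = s
        return True
    if s[0] != t[0] or len(s) != len(t):     # rule (e): clash of functors
        return False
    return all(unify(a, b, bind, seen)       # rule (d): decompose argument-wise
               for a, b in zip(s[1:], t[1:]))

bind = {}
print(unify('x', ('f', 'x', 'y'), bind))                    # cyclic binding: fine in RT
print(unify('x', ('f', ('f', 'x', 'y'), 'a'), bind), bind)  # succeeds, forcing y = a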
We now discuss algorithms which do not fit exactly with Algorithm
10.3.1, but which employ a solved form. Consider first the Simplex algorithm
for the domain RLin. The basic step of one pivoting operation within this
algorithm is essentially the same as Algorithm 10.3.1. The arguments for incre-
mentality for Algorithm 10.3.1 thus apply. The main difference from Algorithm
10.3.1 is that, in general, several pivoting operations are required to produce
the final solved form. However, empirical evidence from CLP systems has
shown that often the number of pivoting operations is small [Jaffar et al.,
1992a].
In the Boolean domain, Boolean unification algorithms [Martin and
Nipkow, 1989] conform to the structure of Algorithm 10.3.1. One unification
algorithm is essentially due to Boole, and we borrow the following presen-
tation from [van Hentenryck, 1991]. Without loss of generality, assume the
constraints are of the form t(x1, ..., xn) = 0 where the xi are the variables
in t. Assuming n ≥ 2, rewrite t = 0 into the form

    xn ∧ g(x1, ..., xn−1) ⊕ h(x1, ..., xn−1) = 0

so that the problem for t = 0 can be reduced to that of

    ¬g(x1, ..., xn−1) ∧ h(x1, ..., xn−1) = 0

which contains one less variable. If this equation is satisfiable, then the
'assignment'

    xn = h(x1, ..., xn−1) ⊕ ¬g(x1, ..., xn−1) ∧ yn

where yn is a new variable, describes all the possible solutions for xn.
This reduction clearly can be repeatedly applied until we are left with the
straightforward problem of deciding the satisfiability of equations of the
form t ∧ x ⊕ u = 0 where t and u are ground. The unifier desired is given
simply by collecting (and substituting out all assigned variables in) the
assignments, such as that for xn above.
The key efficiency problem here is, of course, that the variable elimina-
tion process gives rise to larger expressions, an increase which is exponential
in the number of eliminated variables, in the worst case. So even though
this algorithm satisfies the structure of Algorithm 10.3.1, it does not satisfy
our assumption about the size of expressions obtained after substitution,
and hence our general argument for incrementality does not apply here.
Despite this, and the fact that Boole's work dates far back, this method is
still used, for example, in CHIP [Büttner and Simonis, 1987].
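The elimination step itself is easy to state concretely. In the following small Python sketch (an invented illustration; a real implementation manipulates symbolic normal forms, which is where the exponential growth just mentioned arises) Boolean functions are represented simply as closures over bit-tuples, g and h are the two cofactors, and the parametric assignments are collected as elimination proceeds.

def eliminate(t, n):
    # Eliminate xn from t(x1..xn) = 0; t is a function on bit-tuples.
    if n == 0:
        return t(()) == 0, []
    h = lambda xs: t(xs + (0,))                       # t with xn = 0
    g = lambda xs: t(xs + (1,)) ^ t(xs + (0,))        # so that t = (xn and g) xor h
    reduced = lambda xs: (1 ^ g(xs)) & h(xs)          # (not g) and h = 0
    sat, assigns = eliminate(reduced, n - 1)
    return sat, assigns + [(h, g)]                    # xn = h xor ((not g) and yn)

def instance(assigns, ys):
    # Instantiate the collected assignments on values ys for the new variables.
    xs = ()
    for (h, g), y in zip(assigns, ys):
        xs = xs + (h(xs) ^ ((1 ^ g(xs)) & y),)
    return xs

# Example: the constraint (x1 and x2) xor x1 = 0, i.e. x1 implies x2.
t = lambda xs: (xs[0] & xs[1]) ^ xs[0]
sat, assigns = eliminate(t, 2)
print(sat)                                            # True
for y1 in (0, 1):
    for y2 in (0, 1):
        assert t(instance(assigns, (y1, y2))) == 0    # every instance solves t = 0
print("all instances of the unifier satisfy t = 0")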
Another unification algorithm is due to Löwenheim, and we adapt the
presentation of [Martin and Nipkow, 1989] here. Let f(x1, ..., xn) = 0 be
the equation considered. Let a denote a solution. The unifier is then simply
given by

    xi = yi ⊕ f(y) ∧ (yi ⊕ ai), 1 ≤ i ≤ n

where the yi are new variables. The basic efficiency problem is of course
to determine a. The obtained unifiers are only slightly larger than f, in
contrast to Boole's method. Thus Löwenheim's method provides a way of
extending a constructive satisfiability test into a satisfiability test which
has an incremental form. However, this method is not, to our knowledge,
used in CLP languages.
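Löwenheim's construction can be stated in a few lines; the following Python fragment (an invented illustration) builds the unifier from a given solution a and checks its two characteristic properties, namely that every instance solves f = 0 and that solutions of f = 0 are left fixed.

from itertools import product

def lowenheim_unifier(f, a):
    # Given one solution a of f(x1..xn) = 0, return the substitution
    # xi = yi xor (f(y) and (yi xor ai)).
    def u(ys):
        fy = f(ys)
        return tuple(y ^ (fy & (y ^ ai)) for y, ai in zip(ys, a))
    return u

f = lambda xs: (xs[0] ^ xs[1]) & xs[2]        # f(x1,x2,x3); a = (0,0,1) is a solution
u = lowenheim_unifier(f, (0, 0, 1))
assert all(f(u(ys)) == 0 for ys in product((0, 1), repeat=3))
assert all(u(xs) == xs for xs in product((0, 1), repeat=3) if f(xs) == 0)
print("Lowenheim unifier verified")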
Other algorithms for testing the satisfiability of Boolean constraints are
considerably different from Algorithm 10.3.1. The Groebner basis algorithm pro-
duces a basis for the space of Boolean constraints implied by the constraint
store. It is a solved form but not a solution form. The remaining algorithms
mentioned in the previous subsection do not have a solved form. The al-
gorithm used in Prolog III retains the set of literals implied to be true by
the constraints, but the normal form does not guarantee solvability: that
must be tested beforehand. Enumeration algorithms have the same behav-
ior: they exhibit a solution, and may retain some further information, but
they do not compute a solved form.
In the domain of feature trees FEAT, equations occur only between
variables. Thus Algorithm 10.3.1 does not address the whole problem. Exist-
ing algorithms [Aït-Kaci et al., 1992; Smolka and Treinen, 1992] employ a
solved form in which all implied equations between variables are explicit
and there are no clashes of sort. Such solved forms are, in fact, solution
forms. The implied equations are found by congruence closure, treating the
features as (partial) functions, analogously to rule (d) in the algorithm for
RT.
In summary for this subsection, an important property for algorithms
to decide satisfiability is that they have good average case behavior. More
important, and even crucially so, is that the algorithm is incremental. To-
ward this goal, a common technique is to use a solved form representation
for satisfiable constraints.

10.4 Entailment
Given satisfiable C, guard constraints G such that no constraint therein is
entailed by C, and a new constraint c, the problem at hand is to determine
the subset G1 of G of constraints entailed by C ∧ c. We will also consider
the problem of detecting groundness which is not, strictly speaking, an
entailment problem. However, it is essentially the same as the problem of
detecting groundness to a specific value, which is an entailment problem.
In what follows the distinction is unimportant.

We next present a rule of thumb to determine whether an entailment


algorithm is incremental in the sense discussed earlier. The important factor
is not the number of constraints entailed after a change in the store, but
instead, the number of constraints not entailed. That is, the algorithm
must be able to ignore the latter constraints so that the costs incurred
depend only on the number of entailed constraints, as opposed to the total
number of guard constraints. As in the case of incremental satisfiability, the
property of incremental entailment is a crucial one for the implementation
of practical CLP systems.
We now briefly discuss modifications to some of the previously discussed
algorithms for satisfiability, which provide for incremental entailment.
Consider the domain J-T and suppose G contains only guard constraints
of the form x = t where t is some ground term26. Add to a standard imple-
mentation of a unification algorithm an index structure mapping variables
x to just those guard constraints in G which involve x. (See [Carlsson, 1987]
for a detailed description.) Now add to the process of constructing a solved
form a check for groundness when variables are bound (and this is easily
detectable). This gives rise to an incremental algorithm because the only
guard constraints that are inspected are those x = t for which a has just
become ground, and not the entire collection G.
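The essential index can be sketched as follows (an invented illustration in Python; the check that a bound value actually equals the ground term t of a guard x = t is elided). Only the guards indexed under a newly grounded variable are ever inspected.

from collections import defaultdict

class GroundnessIndex:
    def __init__(self):
        self.watch = defaultdict(list)     # variable -> guards waiting on it
        self.entailed = []

    def add_guard(self, guard_id, variables):
        # The guard is entailed once all of its variables are ground.
        pending = set(variables)
        for v in pending:
            self.watch[v].append((guard_id, pending))

    def note_ground(self, v):
        # Called when the solved form binds v to a ground term; only the
        # guards indexed under v are touched.
        for guard_id, pending in self.watch.pop(v, []):
            pending.discard(v)
            if not pending:
                self.entailed.append(guard_id)

idx = GroundnessIndex()
idx.add_guard('g1', ['x'])                 # e.g. the guard x = f(a)
idx.add_guard('g2', ['x', 'y'])            # e.g. ground(x), ground(y)
idx.note_ground('x')
print(idx.entailed)                        # ['g1']  (g2 still waits on y)
idx.note_ground('y')
print(idx.entailed)                        # ['g1', 'g2']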
Just as with satisfiability, testing entailment is essentially the same
over the domains RT and FEAT. Four recent works have addressed this
problem, all in the context of a CLP system, but with slightly differing
constraint domains. We will discuss them all in terms of RT. With some
modifications, these works can also apply to FT.
In [Smolka and Treinen, 1992; Aït-Kaci et al., 1992] a theoretical foun-
dation is built. [Smolka and Treinen, 1992] then proposes a concrete algo-
rithm, very roughly as follows: the to-be-entailed constraint c is added to
the constraint store C. The satisfiability tester has the capability of detect-
ing whether c is entailed by or inconsistent with C. If neither is detected
then c', essentially a simplified form of c, is stored and the effect of adding
c to C is undone. Every time a constraint is added to C that affects c' this
entailment test is repeated (with c' instead of c).
The algorithm of [Podelski and van Roy, 1993] has some similarities
to the previous algorithm, but avoids the necessity of undoing operations.
Instead, operations that might affect C are delayed and/or performed on a
special data-structure separate from C. Strong incrementality is claimed: if
we replace average-case complexity by worst-case complexity, the algorithm
satisfies our criterion for perfect incrementality.
[Ramachandran and van Hentenryck, 1993] goes beyond the problem of
entailing equations to give an algorithm for entailment when both equa-

^26 As mentioned above, this discussion will essentially apply to guard constraints of
the form ground(x).

tions and disequations (≠) are considered constraints. This algorithm has a
rather different basis than those discussed above; it involves memorization
of pairs of terms (entailments and disequations) and the use of a reduction
of disequation entailment to several equation entailments.
For RLinEqn, let G contain arbitrary equations e. Add to the algorithm
which constructs the solved form a representation of each such equation e in
which all eliminable variables are substituted out. Note, however, that even
though these equations are stored with the other equations in the constraint
store, they are considered as a distinct collection, and they play no direct
role in the question of satisfiability of the current store. For example, a
constraint store containing x = z + 3, y = z + 2 would cause the guard
equation y + z = 4 to be represented as z = 1. It is easy to show that
a guard equation e is entailed iff its representation reduces to the trivial
form 0 = 0, and similarly, the equation is refuted if its representation is of
the form 0 = n where n is a nonzero number. (In our example, the guard
equation is entailed or refuted just in case z becomes ground.) In order to
have incrementality we must argue that the substitution operation is often
applied only to very few of the guard constraints. This is tantamount to
the second assumption made to argue the incrementality of Algorithm 10.3.1.
Hence we believe our algorithm is incremental.
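The following fragment (an invented illustration) shows the representation of guard equations with eliminable variables substituted out, using the example just given: the guard y + z = 4 is first reduced to z = 1, and it is recognized as entailed once z becomes ground.

from fractions import Fraction as F

def represent(guard, solved):
    # Substitute the solved form into a guard {var: coeff, None: const} = 0.
    rep = {None: guard.get(None, F(0))}
    for v, c in guard.items():
        if v is None:
            continue
        for w, d in solved.get(v, {v: F(1)}).items():
            rep[w] = rep.get(w, F(0)) + c * d
    return {v: c for v, c in rep.items() if v is None or c != 0}

def status(rep):
    if all(v is None for v in rep):
        return 'entailed' if rep[None] == 0 else 'refuted'
    return 'unknown'

# Store: x = z + 3, y = z + 2 (x, y eliminable; z parametric).
solved = {'x': {'z': F(1), None: F(3)}, 'y': {'z': F(1), None: F(2)}}
# Guard y + z = 4, i.e. y + z - 4 = 0, is represented as 2z - 2 = 0.
rep = represent({'y': F(1), 'z': F(1), None: F(-4)}, solved)
print(rep, status(rep))                    # not yet decided
# When z later becomes ground to 1, the representation reduces to 0 = 0.
rep = represent(rep, {'z': {None: F(1)}})
print(rep, status(rep))                    # entailed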
We move now to the domain RLin, but allow only equations in the guard
constraints G. Here we can proceed as in the above discussion for RLinEqn
to obtain an incremental algorithm, but we will have the further require-
ment that the constraint store contains all implicit equalities^27 explicitly
represented as equations. It is then still easy to show that the entailment
of a guard equation e need be checked only when the representation of e
is trivial. The argument for incrementality given above for RLinEqn essen-
tially holds here, provided that the cost of computing implicit equalities is
sufficiently low.
There are two main works on the detection of implicit equalities in CLP
systems over RLin. In [Stuckey, 1989], the existence of implicit equalities is
detected by the appearance of an equation of a special kind in the Simplex
tableau at the end of the satisfiability checking process. Such an equation
indicates some of the implicit equalities, but more pivoting (which, in turn,
can give rise to more special equations) is generally required to find all of
them. An important characteristic of this algorithm is that the extra cost
incurred is proportional to the number of implicit equalities. This method
is used in CLP(R) and Prolog III. CHIP uses a method based on [van
Hentenryck and Graf, 1991]. In this method, a solved form which is more
restrictive than the usual Simplex solved form is used. An equation in this
^27 These are equalities which are entailed by the store because of the presence of
inequalities. For example, the constraint store x + y ≤ 3, x + y ≥ 3 entails the implicit
equality x + y = 3.

form does not contribute to any implicit equality, and a whole tableau in
this solved form implies that there are no implicit equalities. The basic
idea is then to maintain the constraints in the solved form and when a
new constraint is encountered, the search for implicit equalities can be first
limited to variables in the new constraint. One added feature of this solved
form is that it directly accommodates strict inequalities and disequations.
Next still consider the domain RLin, but now allow inequalities to be
in G. Here it is not clear how to represent a guard inequality, say x > 5, in
such a way that its entailment or refutation is detectable by some simple
format in its representation. Using the Simplex tableau format as a solved
form as discussed above, and using the same intuition as in the discussion
of guard equations, we could substitute out x in x > 5 in case x is basic.
However, it is not clear to which format(s) we should limit the resulting
expression in order to avoid explicitly checking whether x > 5 is entailed^28.
Thus an incremental algorithm for checking the entailment of inequalities
is yet to be found.
For BOOL there seems to be a similar problem in detecting the entail-
ment of Boolean constraints. However, in the case of groundness entailment
some of the algorithms we have previously discussed are potentially incre-
mental. The Prolog III algorithm, in fact, is designed with the detection of
groundness as a criterion. The algorithm represents explicitly all variables
that are grounded by the constraints. The Groebner basis algorithm will
also contain in its basis an explicit representation of grounded variables.
Finally, for the unification algorithms, the issue is clearly the form of the
unifier. If the unifier is in fully simplified form then every ground variable
will be associated with a ground value.
In summary for this subsection, the problem of detecting entailment is
not limited just to the cost of determining if a particular constraint is en-
tailed. Incrementality is crucial, and this property can be defined roughly as
limiting the cost to depend on the number of guard constraints affected by
each change to the store. In particular, dealing (even briefly) with the entire
collection of guard constraints each time the store changes is unacceptable.
Below, in Section 11.1, an issue related to entailment is taken up. Here
we have focused on how to adapt the underlying satisfiability algorithm
to be incremental for determining entailment. There we will consider the
generic problem, independent of the constraint domain, of managing de-
layed goals which awake when certain constraints become entailed.

10.5 Projection
The problem at hand is to obtain a useful representation of the projection
of constraints C w.r.t. a given set of variables. More formally, the problem
^28 And this can, of course, be done, perhaps even efficiently, but the crucial point is,
once again, that we cannot afford to do this every time the store changes.

is: given target variables x and constraints C(x, y) involving variables from
x and y, express ∃y C(x, y) in the most usable form. While we cannot de-
fine usability formally, it typically means both conciseness and readability.
An important area of use is the output phase of a CLP system: the desired
output from running a goal is the projection of the answer constraints with
respect to the goal variables. Here it is often useful to have only the tar-
get variables output (though, depending on the domain, this is not always
possible). For example, the output of x = z + 1, y = z + 2 w.r.t. x and
y should be x = y − 1 or some rearrangement of this, but it should not
involve any other variable. Another area of use is in meta-programming
where a description of the current store may be wanted for further manip-
ulation. For example, projecting RLin constraints onto a single variable x
can show if x is bounded, and if so, this bound can be used in the pro-
gram. Projection also provides the logical basis for eliminating variables
from the accumulated set of constraints, once it is known that they will
not be referred to again.
There are few general principles that guide the design of projection
algorithms across the various constraint domains. The primary reason is,
of course, that these algorithms have to be intimately related to the do-
main at hand. We therefore will simply resort to briefly mentioning existing
approaches for some of our sample domains.
The projection problem is particularly simple for the domain FT: the
result of projection is x = xθ where θ is the substitution obtained from the
solved form of C. Now, we have described above that this solved form is
simply the mgu of C, that is, equations whose r.h.s. does not contain any
variable on the l.h.s. For example, x = f(y), y = f(z) would have the
solved form x = f(f(z)), y = f(z). However, the equations x = f(y), y =
f(z) are more efficiently stored internally as they are (and this is done
in actual implementations). The solved form for x therefore is obtained
only when needed (during unification for example) by fully dereferencing
y in the term f(y). A direct representation of the projection of C on a
variable x, as required in a printout for example, can be exponential in
the size of C. This happens, for example, if C is of the form x = f(x1, x1),
x1 = f(x2, x2), ..., xn = f(a, a), because xθ would contain 2^(n+1) occurrences
of the constant a. A solution would be to present the several equations
equivalent to x = xθ, such as the n + 1 equations in this example. This
however is a less explicit representation of the projection; for example, it
would not always be obvious if a variable were ground.
Projection in the domain RT can be done by simply presenting those
equations whose l.h.s. is a target variable and, recursively, all equations
whose l.h.s. appears in anything already presented. Such a straightforward
presentation is in general not the most compact. For example, the equation
x = f(f(x, x), f(x, x)) is best presented as x = f(x, x). In general, the
problem of finding the most compact representation is roughly equivalent to
the problem of minimizing states in a finite state automaton [Colmerauer,
1982b].

let x1, ..., xn be the target variables;

for (i = 1; i <= n; i = i + 1) {
    if (xi is a parameter) continue;
    let e denote the equation xi = r.h.s.(xi) at hand;
    if (r.h.s.(xi) contains a variable z of lower priority than xi) {
        choose the z of lowest priority;
        rewrite the equation e into the form z = t;
        if (z is a target variable) mark the equation e as final;
        substitute t for z in the other equations;
    } else mark the equation e as final;
}
return all final equations;

Fig. 1. Projection algorithm for linear equations

For RLinEqn the problem is only slightly more complicated. Recall that
equations are maintained in parametric form, with eliminable and para-
metric variables. A relatively simple algorithm can be obtained by using a
form of Gaussian elimination, and is informally described in Figure 1. It
assumes there is some ordering on variables, and ensures that lower pri-
ority variables are represented in terms of higher priority variables. This
ordering is arbitrary, except for the fact that the target variables should
be of higher priority than other variables. We remark that a crucial point
for efficiency is that the main loop in Figure 1 iterates n times, and this
number (the number of target variables) is often far smaller than the total
number of variables in the system. More details on this algorithm can be
found in [Jaffar et al., 1993].
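A rough executable rendering of Figure 1 is given below (an invented illustration; equations are dictionaries of rational coefficients, and the priority map gives target variables the highest priority).

from fractions import Fraction as F

def project(eqs, targets, priority):
    # eqs: eliminable variable -> rhs, where rhs is {var: coeff, None: const};
    # lower numbers in 'priority' mean higher priority (targets come first).
    final = {}
    for x in targets:
        if x not in eqs:                                  # x is a parameter
            continue
        rhs = eqs.pop(x)
        low = [v for v in rhs if v is not None and priority[v] > priority[x]]
        if low:
            z = max(low, key=lambda v: priority[v])       # lowest-priority variable
            c = rhs.pop(z)
            t = {x: F(1) / c}                             # rewrite x = rhs as z = t
            for v, d in rhs.items():
                t[v] = t.get(v, F(0)) - d / c
            if z in targets:
                final[z] = t
            for other in list(eqs.values()) + list(final.values()):
                if z in other:                            # substitute t for z elsewhere
                    d = other.pop(z)
                    for v, e in t.items():
                        other[v] = other.get(v, F(0)) + d * e
        else:
            final[x] = rhs
    return final

# Store x = z + 1, y = z + 2 projected onto {x, y} gives y = x + 1.
eqs = {'x': {'z': F(1), None: F(1)}, 'y': {'z': F(1), None: F(2)}}
print(project(eqs, ['x', 'y'], {'x': 0, 'y': 1, 'z': 2}))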
For RLin, there is a relatively simple projection algorithm. Assume all
inequalities are written in a standard form ... ≤ 0. Let C+ (C−) denote the
subset of constraints C in which x has only positive (negative) coefficients.
Let C0 denote those inequalities in C not containing x at all. We can now
describe an algorithm, due to Fourier [Fourier, 1824], which eliminates a
variable x from a given C. If constraints c and c' have a positive and a nega-
tive coefficient of x, we can define elim_x(c, c') to be a linear combination of
c and c', which does not contain x^29. A Fourier step eliminates x from a set
of constraints C by computing F_x(C) = {elim_x(c, c') : c ∈ C+, c' ∈ C−}.
It is easy to show that ∃x C ↔ F_x(C). Clearly repeated applications of F

29
Obtained, for example, by multiplying c by 1/m and c' by (—1/m'), where m and m'
are the coefficients of x in c and c' respectively, and then adding the resulting equations
together.

eliminating all non-target variables result in an equivalent set of constraints
in which the only variables (if any) are target variables.
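One Fourier step can be sketched compactly as follows (an invented illustration): elim_x forms the positive linear combination of footnote 29, and fourier_step pairs every constraint with a positive coefficient of x with every constraint with a negative one.

from fractions import Fraction as F

def elim_x(c1, c2, x):
    # c1 has a positive, c2 a negative coefficient of x; combine them so
    # that x cancels (multiply by 1/m and -1/m' as in footnote 29).
    m1, m2 = c1[x], c2[x]
    out = {}
    for v in set(c1) | set(c2):
        out[v] = c1.get(v, F(0)) / m1 - c2.get(v, F(0)) / m2
    del out[x]                                       # coefficient of x is now 0
    return out

def fourier_step(cs, x):
    plus  = [c for c in cs if c.get(x, F(0)) > 0]
    minus = [c for c in cs if c.get(x, F(0)) < 0]
    rest  = [c for c in cs if c.get(x, F(0)) == 0]
    return rest + [elim_x(p, m, x) for p in plus for m in minus]

# x - y <= 0 and y - z <= 0; eliminating y yields a constraint equivalent to x - z <= 0.
cs = [{'x': F(1), 'y': F(-1)}, {'y': F(1), 'z': F(-1)}]
print(fourier_step(cs, 'y'))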
The main problem with this algorithm is that the worst-case size of
F_x(C) is O(N²), where N is the number of constraints in C. (It is in fact
precisely |C0| + (|C+| × |C−|) − (|C+| + |C−|).) In principle, the number
of constraints needed to describe ∃x C using inequalities over variables
var(C) − {x} is far larger than the number of inequalities in C. In prac-
tice, however, the Fourier step generates many redundant constraints^30. See
[Lassez et al., 1989] for a discussion on such redundancy. Work by Cernikov
[Cernikov, 1963] proposed tests on the generated constraints to detect and
eliminate some redundant constraints. The output module of the CLP(R)
system [Jaffar et al., 1993] furthered these ideas, as did Imbert [Imbert,
1993b]. (Imbert [Imbert, 1993a] also considered the more general problem
in which there are disequations.) All these redundancy elimination meth-
ods are correct in the following sense: if {Ci}, i = 1, 2, ..., is the sequence of
constraints generated during the elimination of variables x1, ..., xi from
C, then Ci ↔ ∃x1 ... xi C, for every i.
The survey [Chandru, 1993] contains further perspectives on the Fourier
variable elimination technique. It also contains a discussion on how the
essential technique of Fourier can be adapted to perform projection in other
domains such as linear integer constraints and the Boolean domain.
We finish here by mentioning the non-Fourier algorithms of [Huynh et
al., 1990; Lassez and Lassez, 1991]. In some circumstances, especially when
the matrix representing the constraints is dense, the algorithm of [Huynh et
al., 1990] can be far more efficient. It is, however, believed that typical CLP
programs produce sparse matrices. The algorithm of [Lassez and Lassez,
1991] has the advantageous property that it can produce an approximation
of the projection if the size of the projection is unmanageably large.

10.6 Backtracking
The issue here is to restore the state of the constraint solver to a previ-
ous state (or, at least, an equivalent state). The most common technique,
following Prolog, is the trailing of constraints when they are modified by
the constraint solver and the restoration of these constraints upon back-
tracking. In Prolog, constraints are equations between terms, represented
internally as bindings of variables. Since variables are implemented as point-
ers to their bound values^31, backtracking can be facilitated by the simple
mechanism of an untagged trail [Warren, 1983; Aït-Kaci, 1991]. This iden-
tifies the set of variables which have been bound since the last choice point.
Upon backtracking, these variables are simply reset to become unbound.
^30 A constraint c ∈ C is redundant in C if C ↔ C − {c}.
^31 Recall that this means that eliminable variables are not explicitly dereferenced on
the r.h.s. of the equations in the solved form.

Thus in Prolog, the only information to be trailed is which variables have
just become bound, and untrailing simply unbinds these variables.
For CLP in general, it is necessary to record changes to constraints.
While in Prolog a variable's expression simply becomes more and more in-
stantiated during (forward) execution, in CLP an expression may be com-
pletely changed from its original form. In RLinEqn, for example, a variable
x may have an original linear form and subsequently another. Assuming
that a choice point is encountered just before the change in x, the orig-
inal linear form needs to be trailed in case of backtracking. This kind of
requirement in fact holds in all our sample domains with the exception of
FT and RT. Thus we have our first requirement on our trailing mecha-
nism: the trail is a value trail, that is, each variable is trailed together with
its associated expression. (Strictly speaking, we need to trail constraints
rather than the expression with which a variable is associated. However,
constraints are typically represented internally as an association between a
variable and an expression.)
Now, the trailing of expressions is in general far more costly than the
trailing of the variables alone. For this reason, it is often useful to avoid
trailing when there is no choice point between the time a variable changes
value from one expression to another. A standard technique facilitating this
involves the use of time stamps: a variable is always time stamped with the
time that it last changed value, and every choice point is also time stamped
when created. Now just before a variable's value is to be changed, its time
stamp n is compared with the time stamp m of the most recent choice
point, and if n > m, clearly no trailing is needed^32.
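The interplay of the value trail and time stamps can be sketched as follows (an invented and greatly simplified illustration): an old value is trailed only when the variable was last changed no later than the most recent choice point.

class TrailingSolver:
    def __init__(self):
        self.value = {}           # variable -> current expression
        self.stamp = {}           # variable -> time of its last change
        self.trail = []           # (variable, previous expression)
        self.choice_points = []   # (trail mark, choice point time stamp)
        self.clock = 0

    def push_choice_point(self):
        self.clock += 1
        self.choice_points.append((len(self.trail), self.clock))

    def update(self, var, expr):
        self.clock += 1
        cp_time = self.choice_points[-1][1] if self.choice_points else 0
        if self.stamp.get(var, 0) <= cp_time:               # changed before the choice point:
            self.trail.append((var, self.value.get(var)))   # trail the old value
        self.value[var] = expr                              # otherwise no trailing is needed
        self.stamp[var] = self.clock

    def backtrack(self):
        mark, _ = self.choice_points.pop()
        while len(self.trail) > mark:
            var, old = self.trail.pop()
            if old is None:
                self.value.pop(var, None)
            else:
                self.value[var] = old

s = TrailingSolver()
s.update('x', 'z + 3')
s.push_choice_point()
s.update('x', 'z + 3 - y')        # trailed: first change since the choice point
s.update('x', '2*z')              # not trailed again within the same choice point
s.backtrack()
print(s.value['x'])               # z + 3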
Next consider the use of a cross-reference table for solved forms, such as
those discussed for the arithmetic domains, which use parametric variables.
This is an index structure which maps each parametric variable to a list of
its occurrences in the solved form. Such a structure is particularly useful,
and can even be crucial for efficiency, in the process of substituting out a
parametric variable (step (c) in Algorithm 10.3.1). However, its use adds to
the backtracking problem. A straightforward approach is to simply trail the
entries in this table (according to time stamps). However, since these entries
are in general quite large, and since the cross reference table is redundant
from a semantic point of view, a useful approach is to reconstruct the table
upon backtracking. The details of such reconstruction are straightforward
but tedious, and hence are omitted here; see [Jaffar et al., 1992a] for the
case of the CLP(R) system. A final remark: this reconstruction approach
has the added advantage of incurring cost only when backtracking actually
takes place.
In summary, backtracking in CLP is substantially more complex than
in Prolog. Some useful concepts to be added to the Prolog machinery are
^32 In Prolog, one can think of the stack position of a variable as the time stamp.

as follows: a value trail (and, in practice, a tagged trail as well because
most systems will accommodate variables of different types, for example,
the functor and arithmetic variables in CLP(R)); time stamps, to avoid
repeated trailing for a variable during the lifetime of the same choice point;
and finally, reconstruction of cross-references, rather than trailing.

11 Inference engine
This section deals with extensions to the basic inference engine for logic pro-
gramming needed because of constraints. What follows contains two main
sections. In the first, we consider the problem of an incremental algorithm
to manage a collection of delayed goals and constraints. This problem, dis-
cussed independently of the particular constraint domain at hand, reduces
to the problem of determining which of a given set of guard constraints (cf.
Section 9) are affected as a result of change to the constraint store. The
next section discusses extensions to the WAM, in both the design of the
instruction set as well as in the main elements of the runtime structure.
Finally, we give a brief discussion of work on parallel implementations.

11.1 Delaying/wakeup of goals and constraints


The problem at hand is to determine when a delayed goal is to be woken or
when a passive constraint becomes active. The criterion for such an event
is given by a guard constraint, that is, awaken the goal or activate the
constraint when the guard constraint is entailed by the store^33. In what
follows, we use the term delayed constraint as synonymous with passive
constraint, to emphasize the similarities with delayed goals.
The underlying implementation issue, as far as the constraint solver is
concerned, is how to efficiently process just those guard constraints that
are affected as a result of a new input constraint^34. Specifically, to achieve
incrementality, the cost of processing a change to the current collection of
guard constraints should be related to the guard constraints affected by
the change, and not to all the guard constraints. The following two items
seem necessary to achieve this end.
First is a representation of what further constraints are needed so that
a given guard constraint is entailed. For example, consider the delayed
CLP(R) constraint pow(x, y, z) (meaning x = y^z) which in general awaits
the grounding of two of the three variables x, y, z. In contrast, the con-
straint pow(1, y, z) only awaits the grounding of y (to a nonzero number)
^33 For guarded clauses, the problem is extended to determining which clause is to be
chosen.
^34 However, significant changes to the inference engine are needed to handle delayed
goals and guarded clauses. But these issues are the same as those faced by extending
logic programming systems to implement delayed goals and/or guarded clauses (see, for
example, [Tick, 1993]).

or z (to 1). In general, a delayed constraint is awoken by not one but a


conjunction of several input constraints. When a subset of such input con-
straints has already been encountered, the runtime structure should relate
the delayed constraint to (the disjunction of) just the remaining kinds of
constraints which will awaken it.
Second, we require some index structure which allows immediate ac-
cess to just the guard constraints affected as the result of a new input
constraint. The main challenge is how to maintain such a structure in the
presence of backtracking. For example, if changes to the structure were
trailed using some adaptation of Prolog techniques [Warren, 1983], then a
cost proportional to the number of entries can be incurred even though no
guard constraints are affected.
The following material is a condensation of [Jaffar et al., 1991].

11.1.1 Wakeup systems


For the purposes of this section, we will describe an instance of a constraint
in the form p($1, ..., $n) ∧ C where p is the n-ary constraint symbol at hand,
$1, ..., $n are distinguished variables used as templates for the arguments
of p, and C is a constraint (which determines the values of $1, ..., $n).
A wakeup degree represents a subset of the p constraints, and a wakeup
system consists of a set of wakeup degrees, and further, these degrees are
organized into an automaton where transitions between degrees are labelled
by constraints called wakeup conditions^35. Intuitively, a transition occurs
when a wakeup condition becomes entailed by the store. There is a dis-
tinguished degree called woken which represents active p constraints. We
proceed now with an example.

Fig. 2. Wakeup system for pow/3

^35 These are templates for the guard constraints.

Consider the CLP(R) constraint pow(x, y, z) and see Figure 2. A wakeup
degree may be specified by means of constraints containing $1, ..., $3 (for
describing the three arguments) and some meta-constants #, #1, #2, ...
(for describing unspecified values). Thus, for example, $2 = # specifies
that the second argument is ground. Such a meta-language can also be
used to specify the wakeup conditions. Thus, for example, the wakeup
condition $2 = #, # ≠ 0, # ≠ 1 attached to the bottom-most de-
gree in Figure 2 represents a transition of a constraint pow($1, $2, $3) ∧ C,
where C does not ground $2, into pow($1, $2, $3) ∧ C ∧ c, where C ∧ c does
ground $2 into a number different from 0 and 1. The wakeup condition
$2 = 1, which represents a transition to the degree woken, represents the
fact that pow($1, 1, $3) is an active constraint (equivalent to $1 = 1). Simi-
larly, $3 = 1 represents the transition to the active constraint $1 = $2. Note
that there is no wakeup condition $2 = 0 because pow($1, 0, $3) (which is
equivalent to ($1 = 0 ∧ $3 ≠ 0) ∨ ($1 = 1 ∧ $3 = 0)) is not active.
In general, there will be certain requirements on the structure of such an
automaton to ensure that it does in fact define a mapping from p constraints
into wakeup degrees, and that this mapping satisfies certain properties such
as: it defines a partition, it maps only active constraints into woken, it is
consistent with the wakeup conditions specifying the transitions, etc. A
starting point will be a formalization of the meta-language used. These
formal aspects are beyond the scope of this survey.
In summary, wakeup systems are an intuitive way to specify the orga-
nization of guard constraints. The wakeup degrees represent the various
different cases of a delayed constraint which should be treated differently
for efficiency reasons. Associated with each degree is a number of wakeup
conditions which specify when an input constraint changes the degree of a
delayed constraint. What is intended is that the wakeup conditions repre-
sent all the situations in which the constraint solver can efficiently update
its knowledge about what further constraints are needed to wake the de-
layed constraint.
Before embarking on the runtime structure to implement delayed con-
straints such as pow, we amplify the abovementioned point about the sim-
ilarities between delayed constraints and guarded clauses. Consider the
guarded clause program:

pow(X,Y,Z) :-
    Y=1 | X=1.
pow(X,Y,Z) :-
    ground(X), X≠0, ground(Y), Y≠1 | Z=log(X)/log(Y).
pow(X,Y,Z) :-
    ground(X), X≠0, ground(Z) | Y=X^(1/Z).
pow(X,Y,Z) :-
    ground(Y), Y≠1, ground(Z) | X=Y^Z.
pow(X,Y,Z) :-
    X=0 | Y=0, Z≠0.
pow(X,Y,Z) :-
    Z=0 | X=1.
pow(X,Y,Z) :-
    Z=1 | X=Y.
This program could be compiled into the wakeup system in Figure 2, where
the three intermediate nodes reflect subexpressions in the guards that might
be entailed without the entire guard being entailed. (More precisely, several
woken nodes would be used, one for each clause body.) Thus wakeup sys-
tems express a central part of the implementation of (flat) guarded clauses.
Since a guarded atom can be viewed as a one-clause guarded clause pro-
gram for an anonymous predicate, wakeup systems are also applicable to
implementing these constructs.
11.1.2 Runtime structure
Here we present an implementational framework in the context of a given
wakeup system. There are three major operations with delayed goals or
delayed constraints which correspond to the actions of delaying, awakening
and backtracking:
1. adding a goal or delayed constraint to the current collection;
2. awakening a delayed goal or delayed constraint as the result of in-
putting a new (active) constraint, and
3. restoring the entire runtime structure to a previous state, that is,
restoring the collection of delayed goals and delayed constraints to
some earlier collection, and restoring all auxiliary structures accord-
ingly.
In what follows, we concentrate on delayed constraints; as mentioned above,
the constraint solver operations to handle delayed goals and guarded clauses
are essentially the same.
The first of our two major structures is a stack containing the delayed
constraints. Thus implementing operation 1 simply requires a push opera-
tion. Additionally, the stack contains constraints which are newer forms of
constraints deeper in the stack. For example, if the constraint pow(x, y, z)


were in the stack, and if the input constraint y = 3 were encountered,
then the new constraint pow(x, 3, z) would be pushed, together with a
pointer from the latter to the former. In general, the collection of delayed
constraints contained in the system is described by the sub-collection of
stacked constraints which have no inbound pointers.
Now consider operation 2. In order to implement this efficiently, it is
necessary to have some access structure mapping an entailed constraint
to just those delayed constraints affected. Since there are in general an
infinite number of possible entailed constraints, a finite classification of
them is required. A guard constraint, or simply guard for short, is an
instance of a wakeup condition obtained by renaming the distinguished
argument variables $i into runtime variables. It is used as a template for
describing the collection of entailed constraints (its instances) which affect
the same sub-collection of delayed constraints. For example, suppose that
the only delayed constraint is pow(5, y, z) whose degree is pow(#, $2, $3)
with wakeup conditions $2 = # and $3 = #. Then only two guards need
be considered: y = # and z = #.
We now specify an index structure which maps a guard into a doubly
linked list of occurrence nodes. Each node contains a pointer to a stack
element containing a delayed constraint^36. Corresponding to each occurrence
node is a reverse pointer from the stack element to the occurrence node.
Call the list associated with a guard DW a DW-list, and call each node
in the list a DW-occurrence node.
Initially the access structure is empty. The following specifies what is
done for the basic operations (a simplified code sketch follows the list):
Delay Push the constraint C onto the stack, and for each wakeup condition
associated with (the degree of) C, create the corresponding guard and
DW-list. All occurrence nodes here point to C.
Process entailment Say x = 5 is now entailed. Find all guards which are
implied by x = 5. If there are none, we are done. Otherwise, for each
DW-list L corresponding to each of these conditions, and for each
constraint C = p(...) ∧ C' pointed to in L, (a) delete all occurrence
nodes pointing to C (using the reverse pointers), (b) push the new delayed
constraint C'' = p(...) ∧ C' ∧ x = 5 with a (downward) pointer to C,
and finally, (c) construct the new DW-lists corresponding to C'' as
defined above for the delay operation.
Backtrack Restoring the stack during backtracking is easy because it only
requires a series of pops. Restoring the list structure, however, is not
as straightforward because no trailing/saving of the changes was per-
formed. In more detail, the operation of backtracking is the following:
^36 The total number of occurrence nodes is generally larger than the number of delayed
constraints.

(a) Pop the stack, and let C denote the constraint just popped. (b)
Delete all occurrence nodes pointed to by C. If there is no pointer
from C (and so it was a constraint that was newly delayed) to another
constraint deeper in the stack, then nothing more need be done. (c)
If there is a pointer from C to another constraint C' (and so C is
the reduced form of C'), then perform the modifications to the access
structure as though C' were being pushed onto the stack. These mod-
ifications, described above, involve computing the guards pertinent to
C', inserting occurrence nodes, and setting up reverse pointers.
Note that the index structure obtained in backtracking may not be
structurally the same as that of the previous state. What is important,
however, is that it depicts the same logical structure as that of the
previous state.
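The following much-simplified sketch (an invented illustration) conveys the flavour of the stack and access structure: guards map to lists of entries pointing at stacked constraints. The reverse pointers and the backtracking machinery described above are omitted, and superseded entries are merely marked dead rather than having their occurrence nodes deleted.

class DelayedConstraints:
    def __init__(self):
        self.stack = []      # entries for delayed constraints and their reduced forms
        self.index = {}      # guard -> list of entries (the DW-lists)
        self.woken = []

    def delay(self, constraint, guards):
        entry = {'constraint': constraint, 'live': True}
        self.stack.append(entry)
        for g in guards:
            self.index.setdefault(g, []).append(entry)

    def process_entailment(self, guard, reduce):
        # 'reduce' maps an affected constraint to (reduced constraint, new guards);
        # an empty guard list means the reduced constraint is woken.
        for entry in self.index.pop(guard, []):
            if not entry['live']:
                continue
            entry['live'] = False                 # superseded by its reduced form
            new_c, new_guards = reduce(entry['constraint'])
            if new_guards:
                self.delay(new_c, new_guards)
            else:
                self.woken.append(new_c)

dc = DelayedConstraints()
dc.delay('pow(x,y,z)', ['x=#', 'y=#', 'z=#'])
# x = 5 becomes entailed: the constraint is reduced and re-delayed.
dc.process_entailment('x=#', lambda c: ('pow(5,y,z)', ['y=#', 'z=#']))
# z = 1 then becomes entailed: pow(5,y,1) is active (equivalent to y = 5).
dc.process_entailment('z=#', lambda c: ('y = 5', []))
print(dc.woken)                                   # ['y = 5']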

Figure 3 illustrates the entire runtime structure after the two constraints
pow(x,y,z) and pow(y,x,y) are stored, in this order. Figure 4 illustrates
the structure after a new input constraint makes x = 5 entailed.

[Figure content not reproduced: the stack holds pow(x,y,z) and pow(y,x,y),
indexed by guard lists such as x = #, y = #, z = #, x = 0, y = 1, z = 0, z = 1.]

Fig. 3. The index structure

In summary, a stack is used to store delayed constraints and their re-
duced forms. An access structure maps a finite number of guards to lists of
delayed constraints. The constraint solver is assumed to identify those con-
ditions which are entailed. The cost of one primitive operation on delayed
constraints (delaying a constraint, upgrading the degree of one delayed con-
straint, including awakening the constraint, and undoing the delay/upgrade
of one constraint) is bounded by the (fixed) size of the underlying wakeup
system. The total cost of an operation (delaying a new constraint, process-
ing an entailed constraint, backtracking) on delayed constraints is propor-
tional to the number of the delayed constraints affected by the operation.

[Figure content not reproduced: the remaining guard entries include z = # and # ≠ 0.]

Fig. 4. The index structure after x = 5 is entailed

11.2 Abstract machine


This section discusses some major issues in the design of an abstract ma-
chine for the execution of CLP programs. The primary focus here will be on
the design of the instruction set, with emphasis on the interaction between
their use and information obtained from a potential program analyzer.
Some elements of the runtime structure will also be mentioned.
In general, the essential features of the parts of an abstract machine
dealing with constraints will differ greatly over CLP languages using dif-
ferent constraint domains. This is exemplified in the literature on CLP(R)
[Jaffar et al., 1992b], CHIP [Aggoun and Beldiceanu, 1993], and CLP(FD)
[Diaz and Codognet, 1993]. The following presentation, though based on
one work [Jaffar et al., 1992b], contains material that is relevant to abstract
machines for many CLP languages.
We begin by arguing that an abstract machine is the right approach in
the first place. Abstract machines have been used for implementing pro-
gramming languages for many reasons. Portability is one: only an imple-
mentation of the abstract machine needs to be made available on each
platform. Another is simply convenience: it is easier to write a native code
compiler if the task is first reduced to compiling for an abstract machine
that is semantically closer to the source language. The best abstract ma-
chines sit at just the right point on the spectrum between the conceptual
clarity of the high-level source language and the details of the target ma-
chine. In doing so they can often be used to express programs in exactly
the right form for tackling the efficiency issues of a source language. For
example, the Warren abstract machine [Warren, 1983; Aït-Kaci, 1991] revo-
lutionized the execution of Prolog, since translating programs to the WAM
exposed many opportunities for optimization that were not apparent at the
source level. The benefit from designing an appropriate abstract machine
for a given source language can be so great that even executing the ab-
stract instruction code by interpretation can lead to surprisingly efficient
implementations of a language. Many commercial Prolog systems compile
to WAM-like code. Certainly more efficiency can be obtained from native
code compilation, but the step that made Prolog usable was that of com-
piling to the WAM.
While the WAM made Prolog practical, global analysis shows the po-
tential of making another major leap. For example, [Taylor, 1990] and [van
Roy and Despain, 1990] used fairly efficient analyzers to generate high
quality native code. Based on certain examples, they showed that the code
quality was comparable to that obtained from a C compiler. In the case
of CLP, the opportunities for obtaining valuable information from analysis
are even greater than in Prolog. This is because the constraint solving step
is in general far more involved than the unification step.
11.2.1 Instructions
Next we consider the design of an abstract machine instruction set, in
addition to the basic instruction set of the WAM. While the examples
presented will be for CLP(R), the discussions are made for CLP systems
in general. More details on this material can be obtained from the theses
[Michaylov, 1992; Yap, 1994].
Our first requirement is a basic instruction for invoking the constraint
solver. The format can be of the form
solve_xxx X1 X2 ... Xn
where xxx indicates the kind of constraint and the Xi denote the arguments.
Typically these arguments are, as in the WAM, stack locations or registers.
For example, in CLP(R), there are instructions of the form initpf n and
addpf n, X, where n is a number and X a (solver) variable. The former
initializes a parametric form to contain just the number n. The latter adds
an entry of the form n * pf(X) to the parametric form being stored in an
accumulator, where pf(X) is the parametric form for X in the store. Thus
the accumulator in general stores an expression exp of the form n + n1*X1
+ ... + nk*Xk. Then, the instruction solve_eq0 tests for the consistency
of exp = 0 in conjunction with the store. If consistent, the solver adds the
equation to the store; otherwise, backtracking occurs. There are similar
instructions for inequalities.
There are important special kinds of constraints that justify making
specialized versions of this basic instruction. While there are clearly many
kinds of special cases, some specific to certain constraint domains, there
are three cases which stand out:
1. the constraint is to be added to the store, but no satisfiability check
is needed;
2. the constraint need not be added, but its satisfiability in conjunction

with the store needs to be checked;


3. the constraint needs to be added and satisfiability needs to be checked,
but the constraint is never used later.

To exemplify the special case 1, consider adding the constraint 5+X-Y = 0


to the store. Suppose that Y = Z + 3.14 is already in the store, and that X
is a new variable. A direct compilation results in the following. Note that
the rightmost column depicts the current state of the accumulator.

initpf 5                 accumulator: 5
addpf 1, X               accumulator: 5 + X
addpf -1, Y              accumulator: 1.86 + X - Z
solve_eq0                solve: 1.86 + X - Z = 0

A better compilation can be obtained by using a specialized instruction


solve_no_fail_eq X which adds the equation X = exp to the store, where
exp is the expression in the accumulator. The main difference here with
solve_eq0 is that no satisfiability check is performed. For the above exam-
ple, we now can have

initpf -5                accumulator: -5
addpf 1, Y               accumulator: -1.86 + Z
solve_no_fail_eq X       add: X = -1.86 + Z

In summary for this special case, for CLP systems in general, we often en-
counter constraints which can be organized into a form such that their consis-
tency with the store is obvious. This typically happens when a new variable
appears in an equation, for example, and new variables are often created
in CLP systems. Thus the instructions of the form solve_no_fail_xxx are
justified.
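The following is a purely illustrative sketch, in ordinary Prolog, of the
arithmetic performed by the accumulator instructions; it is our own
reconstruction, not the CLAM data structures, and it uses an explicit
association list in place of the solver's store. A parametric form
n + n1*X1 + ... + nk*Xk is represented as a term pf(n, [n1-X1, ..., nk-Xk]).

:- use_module(library(lists)).   % member/2, append/3

% initpf n: start a new parametric form holding just the number n.
initpf(N, pf(N, [])).

% addpf n, X: add n times the parametric form of X to the accumulator.
% Store is a list of Var-pf(...) pairs standing in for the solver's store.
addpf(N, X, Store, pf(C0, Ts0), pf(C, Ts)) :-
    (   member(X-pf(CX, TsX), Store)      % X already has a parametric form
    ->  C is C0 + N*CX,
        scale(TsX, N, TsXN),
        append(Ts0, TsXN, Ts)
    ;   C = C0,                           % X is itself a parameter
        append(Ts0, [N-X], Ts)
    ).

scale([], _, []).
scale([M-V|Rest], N, [MN-V|Rest1]) :- MN is M*N, scale(Rest, N, Rest1).

With the store [y-pf(3.14, [1-z])] of the example above, the query

?- initpf(5, A0), addpf(1, x, [y-pf(3.14, [1-z])], A0, A1),
   addpf(-1, y, [y-pf(3.14, [1-z])], A1, A2).

yields (up to floating-point rounding) A2 = pf(1.86, [1-x, -1-z]), that is,
the parametric form 1.86 + X - Z which solve_eq0 would then test against
the store.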
Next consider the special case 2, and the following example CLP(R)
program.
sum(0, 0).
sum(N, X) :-
    N >= 1,
    N1 = N - 1,
    X1 = X - N,
    sum(N1, X1).

Of concern to us here are constraints that, if added to the store, can be


shown to become redundant as a result of future additions to the store.
This notion of future redundancy was first described in [Jorgensen et al.,
1991]. Now if we execute the goal sum(N, X) using the second rule above,

we obtain the subgoal


?- N >= 1, N1 = N - 1, X1 = X - N, sum(N1, X1).
Continuing the execution, we now have two choices: choosing the first rule
we obtain the new constraint N1 = 0, and choosing the second rule we
obtain the constraint N1 >= 1 (among others). In each case the original
constraint N >= 1 is made redundant. The main point of this example is
that the constraint N >= 1 in the second rule should be implemented simply
as a test, and not added to the constraint store.
class of instructions solve_no_add_xxx.
This example shows that future redundant constraints do occur in CLP
systems. However, one apparent difficulty with this special case is the prob-
lem of detecting its occurrence. We will mention relevant work on program
analysis below. Meanwhile, we remark that experiments using CLP(R) have
shown that this special case leads to the most substantial efficiency gains
compared to the other two kinds of special cases discussed in this section
[Michaylov, 1992; Yap, 1994].
Finally consider special case 3. Of concern here are constraints which are
neither entailed by the store as in case 1 nor are eventually made redundant
as in case 2, but which are required to be added to the store, and checked for
consistency. What makes these constraints special is that after they have
been added to the store (and the store is recomputed into its new internal
form), their variables appear in those parts of the store that are never again
referred to. Consider the sum program once again. The following sequence
of constraints arises from executing the goal sum(7, X):
(1) X1 = X - 7
(2) X1' = (X - 7) - 6
(3) X1'' = ((X - 7) - 6) - 5

Upon encountering the second equation X1' = X1 - 6 and simplifying
into (2), note that the variable X1 will never be referred to in the future.
Hence equation (1) can be deleted. Similarly, upon encountering the third
equation X1'' = X1' - 5 and simplifying into (3), the variable X1' will never
be referred to in future and so (2) can be deleted. In short, only one equation
involving X need be stored at any point in the computation. We hence add
the class of instructions of the form add_and_delete X which informs the
solver that after considering the constraint associated with X, it may delete
all structures associated with X. In CLP(R), the corresponding instruction
is addpf_and_delete n, X, the obvious variant of the previously described
instruction addpf n, X. Compiling this sum example gives
(1) initpf -7
    addpf 1, X
    solve_no_fail_eq X1
(2) initpf -6
    addpf_and_delete 1, X1
    solve_no_fail_eq X1'
(3) initpf -5
    addpf_and_delete 1, X1'
    solve_no_fail_eq X1''

Note that a different set of instructions is required for the first equation
from that required for the remaining equations. Hence the first iteration
needs to be unrolled to produce the most efficient code. The main challenge
for this special case is, as in special case 2, the detection of the special
constraints. We now address this issue.
11.2.2 Techniques for CLP program analysis
The kinds of program analysis required to utilize the specialized instruc-
tions include those techniques developed for Prolog, most prominently, de-
tecting special cases of unification and deterministic predicates. Algorithms
for such analysis have become familiar; see [Debray, 1989a; Debray, 1989b]
for example. See [Garcia and Hermenegildo, 1993], for example, for a de-
scription of how to extend the general techniques of abstract interpretation
applicable in LP to CLP. Our considerations above, however, require rather
specific kinds of analyses.
Detecting redundant variables and future redundant constraints can in
fact be done without dataflow analysis. One simple method involves unfold-
ing the predicate definition (and typically once is enough), and then, in the
case of detecting redundant variables, simply inspecting where variables
occur last in the unfolded definitions. For detecting a future redundant
constraint, the essential step is determining whether the constraints in an
unfolded predicate definition imply the constraint being analyzed.
An early work describing these kinds of optimizations is [Jorgensen et
al., 1991], and some further discussion can also be found in [Jaffar et al.,
1992b]. The latter first described the abstract machine CLAM for CLP(R),
and the former first defined and examined the problem of our special case
2, that of detecting and exploiting the existence of future redundant con-
straints in CLP(R). More recently, [McDonald et al., 1993] reported new
algorithms for the problem of special case 3, that of detecting redundant
variables in CLP(R). The work [Marriott and Stuckey, 1993a] describes,
in a more general setting, a collection of techniques (entitled refinement,
removal and reordering) for optimization in CLP systems. See also [Mar-
riott et al., 1994] for an overview of the status of CLP(R) optimization and
[Michaylov, 1992; Yap, 1994] for detailed empirical results.
Despite the potential of optimization as reported in these works, the

lack of (full) implementations leaves open the practicality of using these and
other sophisticated optimization techniques for CLP systems in general.

11.2.3 Runtime structure


A CLP abstract machine requires the same basic runtime support as the
WAM. Some data structures needed are a routine extension of those for the
WAM - the usual register, stack, heap and trail organization. The main new
structures pertain to the solver. Variables involved in constraints typically
have a solver identifier, which is used to refer to that variable's location in
the solver data structures.
The modifications to the basic WAM architecture typically would be:

Solver identifiers
It is often necessary to have a way to index from a variable to the
constraints it is involved in. Since the WAM structure provides stack
locations for the dynamically created variables, it remains just to have
a tag and value structure to respectively (a) identify the variable as a
solver variable, and (b) access the constraint(s) associated with this
variable. Note that the basic unification algorithm, assuming functors
are used in the constraint system, needs to be augmented to deal with
this new type.
Tagged trail
As mentioned in Section 10.6, the trail in the WAM merely consists
of a stack of addresses to be reset on backtracking. In CLP systems
in general, the trail is also used to store changes to constraints. Hence
a tagged value trail is required. The tags specify what operation is
to be reversed, and the value component, if present, contains any old
data to be restored.
Time-stamped data structures
Time stamps have been briefly discussed in Section 10.6. The basic
idea here is that the data structure representing a constraint may go
through several changes without there being a new choice point en-
countered during this activity. Clearly only one state of the structure
need be trailed for each choice point.
Constraint accumulator
A constraint is typically built up using a basic instruction repeatedly,
for example, the addpf instruction in CLP(R). During this process,
the partially constructed constraint is represented in an accumulator.
One of the solve instructions then passes the constraint to the solver.
We can think of this linear form accumulator as a generalization of
the accumulator in classical computer architectures, accumulating a
partially constructed constraint instead of a number.

11.3 Parallel implementations


We briefly outline the main works involving CLP and parallelism. The
opportunities for parallelism in CLP languages are those that arise, and
have already been addressed, in the logic programming context (such as
or-parallelism, and-parallelism, stream-parallelism), and those that arise
because of the presence of a potentially computationally costly constraint
solver.
The first work in this area [van Hentenryck, 1989b] was an experimen-
tal implementation of an or-parallel CLP language with domain FD. That
approach has been pursued with the development of the ElipSys system
[Veron et al., 1993], which is the most developed of the parallel implemen-
tations of CLP languages.
Atay [Atay, 1992; Atay et al., 1993] presents the or-parallelization of
2LP, a language that computes with linear inequalities over reals and inte-
gers, but in which rules do not have local variables37. Another work deals
with the or-parallel implementation of a CLP language over FD on mas-
sively parallel SIMD computers [Tong and Leung, 1993]. However the basis
for the parallelism is not the nondeterministic choice of rules, as in con-
ventional LP or-parallelism, but the nondeterministic choice of values for
a variable.
Work on and-parallelism in logic programming depends heavily on no-
tions of independence of atoms in a goal. [Garcia et al., 1993] addresses
this notion in a CLP context, and identifies notions of independence for con-
straint solvers which must hold if the advantages of and-parallelism in LP
are to be fully realized in CLP languages. However, there has not, to our
knowledge, been any attempt to produce an and-parallel implementation
of a CLP language.
Two works address both stream-parallelism and parallelism in con-
straint solving. GDCC [Terasaki et al., 1992] is a committed-choice lan-
guage that can be crudely characterized as a committed-choice version of
CAL. It uses constraints over domains of finite trees, Booleans, real num-
bers and integers. [Terasaki et al., 1992] mainly discusses the parallelization
of the Groebner basis algorithms, which are the core of the solvers for the
real number and Boolean constraint domains, and a parallel branch-and-
bound method that is used in the integer solver. Leung [Leung, 1993] ad-
dresses the incorporation of constraint solving in both a committed-choice
language and a language based on the Andorra model of computation. He
presents distributed solvers for finite domains, the Boolean domain and
linear inequalities over the reals. The finite domain solver is based on [van
Hentenryck, 1989a], the solver for the reals parallelizes the Simplex al-
gorithm, and the Boolean solver parallelizes the unification algorithm of
37
We say that a variable in a rule is local if it appears in the body of the rule, but not
in the head.

[Buttner and Simonis, 1987].


Finally, [Burg et al., 1992; Burg, 1992] reports the design and initial
implementation of CLP(R) with an execution model in which the inference
engine and constraint solver compute concurrently and asynchronously.
One of the issues addressed is backtracking, which is difficult when the
engine and solver are so loosely coupled.
11.3.1 Programming and Applications
In this final part, we discuss the practical use of CLP languages. The format
here is essentially a selected list of successful applications across a variety
of problem domains. Each application is given an overview, with emphasis
on the particular programming paradigm and CLP features used.
It seems useful to classify CLP applications broadly into two classes.
In one class, the essential CLP technique is to use constraints and rules to
obtain a transparent representation of the (relationships underlying the)
problem. Here the constraints also provide a powerful query language. The
other class caters for the many problems which can be solved by enumera-
tion algorithms, the combinatorial search problems. Here the LP aspect of
CLP is useful for providing the enumeration facility while constraints serve
to keep the search space manageable.

12 Modelling of complex problems


We consider here the use of CLP as a specification language: constraints al-
low the declarative interpretation of basic relationships, and rules combine
these for complex relationships.

12.1 Analysis and synthesis of analog circuits


This presentation is adapted from [Heintze et al., 1992], an early application
of CLP(R). Briefly, the general methodology for representing properties of
circuits is that constraints at a base level describe the relationship between
variables corresponding to a subsystem, such as Ohm's law, and constraints
at a higher level describe the interaction between these subsystems, such
as Kirchhoff's law.
Consider the following program fragment defining the procedure
circuit(N, V, I) which specifies that, across an electrical network N, the
potential difference and current are V and I respectively. The network is
specified in an obvious way by a term containing the functors resistor,
series and parallel. In this program, the first rule states the required
voltage-current relationship for a resistor, and the remaining rules combine
such relationships in a network of resistors.

circuit(resistor(R), V, I) :- V = I * R.
circuit(series(N1, N2), V, I) :-
    I = I1,
    I = I2,
    V = V1 + V2,
    circuit(N1, V1, I1),
    circuit(N2, V2, I2).
circuit(parallel(N1, N2), V, I) :-
    V = V1, V = V2,
    I = I1 + I2,
    circuit(N1, V1, I1),
    circuit(N2, V2, I2).
For example, the query
?- circuit(series(series(resistor(R), resistor(R)),
            resistor(R)), V, 5)
asks for the voltage value if a current value of 5 is flowing through a net-
work containing just three identical resistors in series. (The answer is R =
0.0666667*V.) Additional rules can be added for other devices. For exam-
ple, the piece-wise linear model of a diode described by the voltage-current
relationship

    I = 10V + 1000      if V < -100
        0.0001V         if -100 <= V <= 0.6
        100V - 60       if V > 0.6
is captured by the rules:
circuit(diode, V, 10 * V + 1000) :- V < -100.
circuit(diode, V, 0.0001 * V) :- -100 <= V, V <= 0.6.
circuit(diode, V, 100 * V - 60) :- V > 0.6.
This basic idea can be extended to model AC networks. For example, sup-
pose we wish to reason about an RLC network in steady-state. First, we
dispense with complex numbers by representing X + iY as a CLP(R) term
c(X, Y), and use:
c_equal(c(Re, Im), c(Re, Im)).
c_add(c(Re1, Im1), c(Re2, Im2), c(Re1 + Re2, Im1 + Im2)).
c_mult(c(Re1, Im1), c(Re2, Im2), c(Re3, Im3)) :-
    Re3 = Re1 * Re2 - Im1 * Im2,
    Im3 = Re1 * Im2 + Re2 * Im1.

to implement the basic complex arithmetic operations of equality, addition


and multiplication.
Now consider the following procedure circuit(N, V, I, W) which is
like its namesake above except that the voltage and current values are now
complex numbers, and the new parameter W, a real number, is the angular
frequency. It is noteworthy that this significant extension of the previous
program fragment for circuit has been obtained so easily.
circuit(resistor(R), V, I, W) :-
    c_mult(I, c(R, 0), V).
circuit(inductor(L), V, I, W) :-
    c_mult(I, c(0, W * L), V).
circuit(capacitor(C), V, I, W) :-
    c_mult(I, c(0, -1 / (W * C)), V).
circuit(series(N1, N2), V, I, W) :-
    c_equal(I, I1), c_equal(I, I2),
    c_add(V1, V2, V),
    circuit(N1, V1, I1, W),
    circuit(N2, V2, I2, W).
circuit(parallel(N1, N2), V, I, W) :-
    c_equal(V, V1), c_equal(V, V2),
    c_add(I1, I2, I),
    circuit(N1, V1, I1, W),
    circuit(N2, V2, I2, W).
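As an illustration of using this program (our own example, not one from
[Heintze et al., 1992]; the component values are arbitrary), the steady-state
voltage across a 10 ohm resistor in series with a 0.05 henry inductor,
carrying a unit real current at angular frequency 100, can be obtained with
the query

?- circuit(series(resistor(10), inductor(0.05)), V, c(1, 0), 100).

which binds V to c(10, 5), that is, the complex voltage 10 + 5i.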

We close this example application by mentioning that the work in [Heintze


et al., 1992] not only contains further explanation of the above technique,
but also addresses other problems such as the synthesis of networks and
digital signal flow. Not only does the CLP approach provide a concise
framework for modelling circuits (previously done in a more ad hoc man-
ner), but it also provides additional functionality because relationships, as
opposed to values, are reasoned about. Evidence that this approach can be
practical was given; for example, the modelling can be executed at the rate
of about 100 circuit components per second on an RS6000 workstation.

12.2 Options trading analysis


Options are contracts whose value is contingent upon the value of some
underlying asset. The most common options are those on company
shares. A call option gives the holder the right to buy a fixed number of
shares at a fixed exercise price until a certain maturity/expiration date.
Conversely, a put option gives the holder the right to sell at a fixed price.
The option itself may be bought or sold. For example, consider a call option

costing $800 which gives the right to purchase 100 shares at $50 per share
within some period of time. This call option can be sold at the current
market price, or exercised at a cost of $5000. Now if the price of the share
is $60, then the option may be exercised to obtain a profit of $10 per share;
taking the price of the option into account, the net gain is $200. After
the specified period, the call option, if not exercised, becomes worthless.
Figure 5 shows payoff diagrams which are a simple model of the relationship
between the value of a call option and the share price. Sell options have
similar diagrams. Note that c denotes the cost of the option and x the
exercise price.

Fig. 5. Payoff diagrams (buy a call, sell a call, butterfly)

Options can be combined in arbitrary ways to form artificial financial


instruments. This allows one to tailor risk and return in flexible ways. For
example, the butterfly strategy in Figure 5 consists of buying two calls, one
at a lower strike price x and one at a higher price z and selling two calls
at the middle strike price y. This makes a profit if the share stays around
the middle strike price and limits the loss if the movement is large.
The following presentation is due to Yap [Yap, 1994], based on his work
using CLP(R). This material appeared in [Lassez et al., 1987], and the sub-
sequently implemented OTAS system is described in [Huynh and Lassez,
1988]. There are several main reasons why CLP, and CLP(R) in partic-
ular, is suitable for reasoning about option trading: there are complex
trading strategies used which are usually formulated as rules; there is a
combinatorial aspect to the problem as there are many ways of combining
options; a combination of symbolic and numeric computation is involved;
there are well developed mathematical valuation models and constraints on
the relationships involved in option pricing, and finally, flexible 'what-if'
type analysis is required.
A simple mathematical model of valuing options, and other financial
instruments such as stocks and bonds, is with linear piecewise functions.
Let the Heaviside function h and the ramp function r be defined as follows:

    h(x, y) = 0 if x > y, and 1 otherwise;
    r(x, y) = 0 if x > y, and y - x otherwise.

The payoff function for call and put options can now be described by the
following matrix product which creates a linear piecewise function:

    payoff = [h1, h2, r1, r2] · [h(b1,s), h(b2,s), r(b1,s), r(b2,s)]^T
           = h1·h(b1,s) + h2·h(b2,s) + r1·r(b1,s) + r2·r(b2,s)

where s is the share price, bi is either the strike price or 0, and hi and ri are
multipliers of the Heaviside and ramp functions. In the following program,
the variables S, X, R respectively denote the stock price, the exercise price
and the interest rate.
h(X, Y, Z) :- Y < X, Z = 0.
h(X, Y, Z) :- Y >= X, Z = 1.
r(X, Y, Z) :- Y < X, Z = 0.
r(X, Y, Z) :- Y >= X, Z = Y - X.
value(Type, Buy_or_Sell, S, C, P, R, X, B, Payoff) :-
    sign(Buy_or_Sell, Sign),
    data(Type, S, C, P, R, X, B, B1, B2, H1, H2, R1, R2),
    h(B1, S, T1), h(B2, S, T2), r(B1, S, T3),
    r(B2, S, T4),
    Payoff = Sign*(H1*T1 + H2*T2 + R1*T3 + R2*T4).
The parameters for the piecewise functions can be expressed symbolically
in the following tables, implemented simply as CLP facts.
sign(buy, -1).
sign(sell, 1).
data(stock, S, C, P, R, X, B, 0, 0, S*R, 0, -1, 0).
data(call, S, C, P, R, X, B, 0, X, C*R, 0, 0, -1).
data(put, S, C, P, R, X, B, 0, X, P*R-X, 0, 1, -1).
data(bond, S, C, P, R, X, B, 0, 0, B*R, 0, 0, 0).
This program forms the basis for evaluating option combinations. The fol-
lowing direct query evaluates the sale of a call option that expires in-the-
money38,

38
That is, when the strike price is less than the share price.

?- Call = 5, X = 50, R = 1.05, S = 60,


value(call, sell, S, Call, _, R, X, _, Payoff).
giving the answer, Payoff = -4.75. More general queries make use of the
ability to reason with inequalities. We can ask for what share price does
the value exceed 5,
?- Payoff > 5, C = 5, X = 50, R = 1.05,
value(call, sell, S, C, _, R, X, _, Payoff).
The answer constraints returned39 illustrate the piecewise nature of the
model,
Payoff = 5.25, S < 50;
Payoff = 55.25 - S, 50 <= S, S <= 50.25.

More complex combinations can be constructed by composing them out of


the base financial instruments and linking them together with constraints.
For example, the following is a combination of two calls and two puts,
?- R = 0.1,
   Payoff = Payoff1 + Payoff2 + Payoff3 + Payoff4,
   P1 = 10, K1 = 20, value(put, sell, S, _, P1, R, K1,
                           _, Payoff1),
   P2 = 18, K2 = 40, value(put, buy, S, _, P2, R, K2,
                           _, Payoff2),
   C3 = 18, K3 = 60, value(call, buy, S, C3, _, R, K3,
                           _, Payoff3),
   C4 = 18, K4 = 80, value(call, sell, S, C4, _, R, K4,
                           _, Payoff4).
The answer obtained illustrates how combinations of options can be tailored
to produce a custom linear piecewise payoff function.
Payoff = 5.7, S < 20;
Payoff = 25.7 - S, 20 <= S, S < 40;
Payoff = -14.3, 40 <= S, S < 60;
Payoff = S - 74.3, 60 <= S, S < 80;
Payoff = 5.7, 80 <= S.
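For instance, the butterfly strategy of Figure 5 could be composed in the
same way (this query is our own illustration; the premium values 12, 6 and 3
are arbitrary):

?- R = 1.05,
   Payoff = P1 + P2 + P3 + P4,
   value(call, buy,  S, 12, _, R, 40, _, P1),
   value(call, sell, S, 6,  _, R, 50, _, P2),
   value(call, sell, S, 6,  _, R, 50, _, P3),
   value(call, buy,  S, 3,  _, R, 60, _, P4).

The answer constraints again form a piecewise linear function of S, flat
below 40 and above 60 and peaking at the middle strike 50.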

The above is just a brief overview of the core ideas behind the work in
[Lassez et al., 1987]. Among the important aspects that are omitted are
consideration of option pricing models, and details of implementing the de-

39
We will use ';' to separate different sets of answer constraints in the output.

cision support system OTAS [Huynh and Lassez, 1988]. As in the circuit
modelling application described above, the advantages of using CLP(R)
here are that the program is concise and that the query language is expres-
sive.

12.3 Temporal reasoning


It is natural and common to model time as an arithmetic domain, and
indeed we do this in everyday life. Depending upon the application, a dis-
crete representation (such as the integers) or a continuous representation
(such as the reals) may be appropriate, and varying amounts of the arith-
metic signature are needed (for example, we might use only the ordering,
or use only a successor function). In this brief discussion we assume that
time is linearly ordered, although this is not a universally accepted choice
[Emerson, 1990].
Temporal logic [Emerson, 1990] is often used as a language for express-
ing time-related concepts. Temporal logic adds to standard first-order logic
such constructs as next (meaning, roughly, 'in the next time instant'40),
always (meaning 'in every future time instant'), and sometime (meaning
'in some future time instant'). The language Templog [Abadi and Manna,
1989] was designed based on a Horn-like subset of temporal logic in which
the meaning of function symbols does not vary with time, but the meaning
of predicate symbols does. It was shown in [Brzoska, 1991] that the op-
erational behavior of Templog could be mimicked by a CLP language via
the following natural translation: every predicate receives another argu-
ment, representing time. Then, at time t, next is represented by t' = t + 1,
and the future (for always and sometime) is represented by t' > t. In later
work [Brzoska, 1993], Brzoska has presented a more powerful temporal logic
language which also can be viewed as, and implemented through, a CLP
language.
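As a small illustration of the translation (our sketch, not Brzoska's system;
it assumes SWI-Prolog's clp(fd) library for the integer constraints, and the
facts p/1 and r/1 are invented sample data), a rule 'q holds next whenever
p holds' and a goal 'r holds sometime later' can be written with an explicit
time argument:

:- use_module(library(clpfd)).

p(0).     % p holds at time 0 (sample data)
r(3).     % r holds at time 3 (sample data)

% "next": q holds at time T1 = T + 1 whenever p holds at time T.
q(T1) :- T1 #= T + 1, p(T).

% "sometime": r holds at some strictly later instant T1 > T,
% following the t' > t translation described above.
sometime_r(T) :- T1 #> T, r(T1).

The query ?- sometime_r(1). succeeds because r holds at time 3 > 1.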
Often we wish to manipulate the time parameter more directly than
is possible in conventional temporal logic. For example, we may wish to
express durations as well as times. We can do this if we include + in the
signature of our domain modelling time. This is used in applications to
scheduling, among others, as discussed in Section 13.
The use of simple constraint domains to model time has been explored
extensively in the context of temporal databases. In this situation, an item
of data might incorporate the time interval for which it is valid. Simple do-
mains have been considered because of over-riding requirements for quick
and terminating execution of queries, as discussed in Section 7. Further-
more, often the restriction is made that only one or two arguments in a
tuple are time-valued, with the other arguments taking constant values.

40
We assume that time is modelled by the integers.

[Baudinet et al., 1993] surveys work in this area using an integer model of
time.

13 Combinatorial search problems


CLP offers an easy realization of enumeration algorithms for the solving
of combinatorial problems. Given decision variables x1, ..., xn, one uses a
CLP program schema of the form
solve(X1, ..., Xn) :-
    constraints(X1, ..., Xn),
    enumerate(X1, ..., Xn).
to implement a 'constrain-and-generate' enumeration strategy (also called
implicit enumeration), as opposed to a naive enumerate-and-test strategy, to
curtail the search space. We refer to the basic text [van Hentenryck, 1989a],
chapter 2, for further introductory material to this CLP approach.
The above schema is used to represent the set of all solutions to the
constraints. Often one desires an optimal solution according to some crite-
rion, say the solution a1,..., an to x1,..., xn that minimizes some given
function cost(x1, ..., xn). The simplest strategy to obtain this solution is
simply to obtain and check each and every solution of solve. An easy
improvement is obtained by augmenting the search with a branch-and-
bound strategy. Briefly, the cost of the best solution encountered so far is
stored and the continuing search is constrained to find only new solutions
of better cost. More concretely, CLP systems typically provide predicates
such as minimize(solve(X1, ..., Xn, Cost), BestCost) (and simi-
larly maximize(...)) where solve(X1, ..., Xn, Cost) serves to ob-
tain one solution as explained above, with cost Cost, and BestCost is a
number representing the cost of the best solution found so far. (Initially,
this number can be any sufficiently large number.) It is assumed here that
the procedure solve(X1, ..., Xn, Cost) maintains a lower bound for
a variable Cost, which is computed as the values of the decision variables
are determined. The minimize procedure then essentially behaves as a
repeated invocation of the goal ?- Cost < BestCost, solve(X1, ...,
Xn, Cost). In general, the choice of a suitable cost function can be diffi-
cult. Finally, we refer the reader to the text [van Hentenryck, 1989a, Section
4.5.1] for a more detailed explanation of how branch-and-bound is used in
CLP systems.
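As a concrete (and purely illustrative) rendition of the constrain-and-generate
schema with branch-and-bound, the following fragment uses the finite-domain
library of a modern Prolog system; the predicates labeling/2 and its min/1
option belong to SWI-Prolog's clp(fd) library, an assumption on our part, and
the particular constraints and cost function are invented for the example.

:- use_module(library(clpfd)).

solve(X, Y, Cost) :-
    [X, Y] ins 0..10,           % domains of the decision variables
    X + Y #>= 7,                % problem constraints
    2*X + Y #=< 15,
    Cost #= 3*X + 4*Y.          % objective to be minimized

best(X, Y, Cost) :-
    solve(X, Y, Cost),
    labeling([min(Cost)], [X, Y]).   % enumeration with branch-and-bound

The query ?- best(X, Y, Cost). returns X = 7, Y = 0, Cost = 21, the
cheapest assignment satisfying the constraints.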
The constraint domain at hand is discrete and typically finite (since the
enumeration must cover all candidate values for the sequence x1, ..., xn),
and therefore constraint solving is almost always NP-hard. This in turn
restricts implementations to the use of partial solvers, that is, not all con-
straints will be considered active. Recall that partial solvers are, however,
required to be conservative in the sense that whenever unsatisfiability is
reported, the tested constraints are indeed unsatisfiable.

In general, the primary efficiency issues are:


• How complete is the constraint solver? In general, there is a tradeoff
between the larger cost of a more complete solver and the smaller
search space that such a solver can give rise to.
• What constraints to use to model the problem? A special case of this
issue concerns the use of redundant constraints, that is, constraints
that do not change the meaning of the constraint store. In general,
redundant constraints will slow down a CLP system with a complete
solver. With partial solvers, however, redundant constraints may be
useful to the solver in case the equivalent information in the constraint
store is not active.
• In which order do we choose the decision variables for enumeration?
And should such order be dynamically determined?
• In which order do we enumerate the values for a given decision vari-
able? And should such order be dynamically determined?
In this section, we will outline a number of CLP applications in specific
combinatorial problem areas. In each subsection below, unless otherwise
specified, we shall assume that the underlying constraint system is based
on the integers.
13.1 Cutting stock
The following describes a two-dimensional cutting stock problem pertaining
to furniture manufacturing, an early application of CHIP [Dincbas et al.,
1988b]. We are given a sawing machine which cuts a board of wood into
a number of different sized shelves. The machine is able to cut in several
configurations, each of which determines the number of each kind of shelf,
and some amount of wood wasted. Let there be N different kinds of shelves,
and M different configurations. Let Si,j, 1 ≤ i ≤ M, 1 ≤ j ≤ N, denote the
number of shelves j cut in configuration i. Let Wi, 1 ≤ i ≤ M, denote the
wastage in configuration i. Let Ri, 1 ≤ i ≤ N, denote the number of shelves
i required. The problem now can be stated as finding the configurations
such that the required number of shelves are obtained and the wastage
minimized.
In [Dincbas et al., 1988b], there were 6 kinds of shelves, 72 configura-
tions, and the number of boards to be cut was fixed at 4. Two solutions
were then presented, which we now paraphrase.
Let Xi, 1 ≤ i ≤ 72, denote the number of boards cut according to
configuration i. Thus X1 + ... + X72 = 4. The requirements on the number
of shelves are expressed via the constraints X1 * S1,j + ... + X72 * S72,j ≥ Rj,
for 1 ≤ j ≤ N. The objective function, to be minimized, is X1 * W1 + ... +
X72 * W72. The straightforward program representation of all this is given
below. The enumerate procedure has the range {0,1,2,3,4}. Note that
solve is run repeatedly in the search for the solution of lowest Cost.

solve(X1, ..., X72, Cost) :-
    X1 + ... + X72 = 4,
    X1 * S1,1 + ... + X72 * S72,1 >= R1,
    X1 * S1,2 + ... + X72 * S72,2 >= R2,
    ...
    X1 * S1,6 + ... + X72 * S72,6 >= R6,
    Cost = X1 * W1 + ... + X72 * W72,
    enumerate(X1, ..., X72).
The second solution uses the special CHIP constraint element, described
above in Section 9.2. Recall that element(X, List, E) expresses that the
X'th element of List is E. In this second approach to the problem, the
variables Xi, 1 ≤ i ≤ 4, denote the configurations chosen. Thus 1 ≤ Xi ≤
72. Let Ti,j, 1 ≤ i ≤ 4, 1 ≤ j ≤ 6, denote the number of shelves j in
configuration i. Let Costi, 1 ≤ i ≤ 4, denote the wastage in configuration i.
Thus the required shelves are obtained by the constraints T1,j + ... + T4,j ≥
Rj where 1 ≤ j ≤ 6, and the total cost is simply Cost1 + ... + Cost4.
In the program below, the constraints X1 ≤ X2 ≤ X3 ≤ X4 serve to
eliminate consideration of symmetrical solutions. The following group of
24 element constraints serves to compute the Ti,j variables in terms of the
(given) Si,j values and the (computed) Xi values. The next group of 4
element constraints computes the Costi variables in terms of the (given)
Wi variables. The enumerate procedure has the range {1, 2, ..., 72}. Once
again, solve is run repeatedly here in the search for the lowest Cost.
solve(X1, ..., X4, Cost) :-
    X1 <= X2, X2 <= X3, X3 <= X4,
    element(X1, [S1,j, ..., S72,j], T1,j),    % (1 <= j <= 6)
    element(X2, [S1,j, ..., S72,j], T2,j),    % (1 <= j <= 6)
    element(X3, [S1,j, ..., S72,j], T3,j),    % (1 <= j <= 6)
    element(X4, [S1,j, ..., S72,j], T4,j),    % (1 <= j <= 6)
    element(X1, [W1, ..., W72], Cost1),
    element(X2, [W1, ..., W72], Cost2),
    element(X3, [W1, ..., W72], Cost3),
    element(X4, [W1, ..., W72], Cost4),
    T1,j + T2,j + T3,j + T4,j >= Rj,          % (1 <= j <= 6)
    Cost = Cost1 + Cost2 + Cost3 + Cost4,
    enumerate(X1, X2, X3, X4).
The second program has advantages over the first. Apart from a smaller
search space (approximately 10^7 in comparison with 10^43), it was able to
avoid encountering symmetrical solutions. The timings given in [Dincbas
et al., 1988b] showed that the second program ran much faster. This com-
parison exemplifies the abovementioned fact that the way a problem is
modelled can greatly affect efficiency.
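The element constraint itself is available in other finite-domain systems, so
the idiom of the second model can be tried on a small scale; the following
self-contained sketch uses SWI-Prolog's clp(fd) element/3 (an assumption on
our part; the wastage data and predicate names are invented for the
illustration).

:- use_module(library(clpfd)).

% Hypothetical wastage of each of four cutting configurations.
wastage([3, 7, 2, 5]).

% Choose a configuration whose wastage is acceptable.
pick(Config, Waste) :-
    wastage(Ws),
    Config in 1..4,
    element(Config, Ws, Waste),   % Waste is the Config'th entry of Ws
    Waste #=< 3,
    label([Config]).

The query ?- pick(C, W). enumerates C = 1 with W = 3 and C = 3 with
W = 2, just as the element constraints in the program above tie each chosen
configuration Xi to its wastage Costi.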

13.2 DNA sequencing


We consider a simplified version of the problem of restriction site map-
ping (RSM). Briefly, a DNA sequence is a finite string over the letters
{A,C, G,T}, and a restriction enzyme partitions a DNA sequence into
certain fragments. The problem is then to reconstruct the original DNA
sequence from the fragments and other information obtained through ex-
periments. In what follows, we consider an abstraction of this problem
which deals only with the lengths of fragments, instead of the fragments
themselves.
Consider the use of two enzymes. Let the first enzyme partition the
DNA sequence into A1, ..., AN and the second into B1, ..., BM. Now, a
simultaneous use of the two enzymes also produces a partition D1, ..., DK,
corresponding to combining the previous two partitions. That is,

    ∀i ∃j : A1 ··· Ai = D1 ··· Dj  and  ∀i ∃j : B1 ··· Bi = D1 ··· Dj,  and conversely,
    ∀j ∃i : D1 ··· Dj = A1 ··· Ai  or  D1 ··· Dj = B1 ··· Bi.

Let ai denote the length of Ai, and similarly bi and di. The problem at
hand can now be stated as: given the multisets a = {a1, ..., aN},
b = {b1, ..., bM} and d = {d1, ..., dK}, construct the corresponding
sequences (a1, ..., aN), (b1, ..., bM) and (d1, ..., dK), that is, recover the
order in which the fragments occur.
Our basic algorithm generates d1, d2, ... in order and extends the par-
titions for a and b using the following invariant property which can be
obtained from the problem definition above. Either
• dk is aligned with ai, that is, d1 + ··· + dk = a1 + ··· + ai, or
• dk is aligned with bj (but not with ai41), that is, d1 + ··· + dk =
  b1 + ··· + bj.
In the program below, the main procedure solve takes as input three
lists representing a, b and d in the first three arguments, and outputs in
the remaining three arguments. Enumeration is done by choosing, at each
recursive step of the rsm procedure, one of two cases mentioned above.
Hence the two rules for rsm. Note that the three middle arguments of
rsm maintain the length of the subsequences found so far, and in all calls,
either lenA = lenD < lenB or lenB = lenD < lenA holds; the procedure
choose_initial chooses the first fragment, and the first call to rsm is
made with this invariant holding. Finally, the procedure choose deletes
some element from the given list and returns the resultant list. Note that
one more rule for rsm is needed in case the A and B fragments do align
anywhere except at the extreme ends; we have omitted this possibility for
simplicity.
41
For simplicity we assume that we never have all three partitions aligned except at
the beginning and at the end.

solve(A, B, D, [AFrag|MapA], [BFrag|MapB],
      [DFrag|MapD]) :-
    choose_initial(A, B, D, AFrag, BFrag, DFrag, A2, B2, D2),
    rsm(A2, B2, D2, AFrag, BFrag, DFrag, MapA, MapB, MapD).
rsm(A, B, D, LenA, LenB, LenD, MapA, MapB, MapD) :-
    empty(A), empty(B), empty(D),
    MapA = [], MapB = [], MapD = [].
rsm(A, B, D, LenA, LenB, LenD, [Ai|MapA], MapB,
    [Dk|MapD]) :-
    LenA = LenD, LenA < LenB,
    Dk <= LenB - LenA, Ai >= Dk,
    choose(Dk, D, D2),
    choose(Ai, A, A2),
    rsm(A2, B, D2, LenA + Ai, LenB, LenD + Dk,
        MapA, MapB, MapD).
rsm(A, B, D, LenA, LenB, LenD, MapA, [Bj|MapB],
    [Dk|MapD]) :-
    LenB = LenD, LenB < LenA,
    Dk <= LenA - LenB, Bj >= Dk,
    choose(Dk, D, D2),
    choose(Bj, B, B2),
    rsm(A, B2, D2, LenA, LenB + Bj, LenD + Dk,
        MapA, MapB, MapD).

This application of CLP is due to Yap [Yap, 1991; Yap, 1993] and it is im-
portant to note that the above program is a considerable simplification of
Yap's program. A major omission is the consideration of errors in the frag-
ment lengths (because these lengths are obtained from experimentation). A
major point in Yap's approach is that it gives a robust and uniform treat-
ment of the experimental errors inherent in the data as compared with
many of the approaches in the literature. Furthermore, [Yap, 1993] shows
how the simple two enzyme problem can be extended to a number of other
problem variations. Because a map solution is just a set of answer con-
straints returned by the algorithm, it is easy to combine this with other
maps, compare maps, verify maps, etc. This kind of flexibility is impor-
tant as the computational problem of just computing a consistent map is
intractable, and hence when dealing with any substantial amount of data,
any algorithm would have to take into account data from many varieties of
mapping experiments, as well as other information specific to the molecule
in question.

13.3 Scheduling
In this class of problems, we are given a number of tasks, and for each task, a
task duration. Each task also requires other resources to be performed, and
there are constraints on precedences of task performance, and on resource
usage. The problem is to schedule the tasks so that the resources are most
efficiently used (for example, perform the tasks so that all are done as soon
as possible).
Consider now a basic job-shop scheduling problem in which we are given
a number m of machines, j sequences of tasks, the task durations and the
machine assigned to each task. The precedence constraints are that the
tasks in each sequence (called a job) are performed in the sequence order.
The resource constraints are that each machine performs at most one task
at any given time.
In the program below, precedences sets up the precedence constraints
for one job, and is called with two equally long lists. The first contains the
task variables, whose values are the start times. The second list contains
the durations of the tasks. Thus precedences is called once for each job.
The procedure resources is called repeatedly, once for each pair of tasks
T1 and T2 which must be performed without overlapping; their durations
are given by D1 and D2.
precedences([T1, T2 | Tail], [D1, D2 | Tail2]) :-
    T1 + D1 <= T2,
    precedences([T2 | Tail], [D2 | Tail2]).
precedences([_], [_]).
precedences([], []).

resources(T1, D1, T2, D2) :- T1 + D1 <= T2.
resources(T1, D1, T2, D2) :- T2 + D2 <= T1.
A simple way to proceed is to fix an ordering of the tasks performed on
each machine. This corresponds to choosing one of the two resources rules
for each pair of tasks assigned to the same machine. This forms the basis
of the enumerate procedure below. Once an ordering of tasks is fixed, it is
a simple matter to determine the best start times for each task.
This can be done in the manner indicated in the solve procedure below.
An important efficiency point is that by choosing a precedence between two
tasks, the new constraints created by the use of resources, in conjunction
with the precedence constraints, can reduce the number of possible choices
for the remaining pairs. We assume that the procedure define_cost defines
Cost in such a way that, in conjunction with other constraints, it provides
a conservative lower bound of the real cost of the schedule determined so
far. Its precise definition, omitted here, can be obtained in a similar way
as in the second program of the cutting-stock example above.
solve(T1, T2, ..., Tn, Cost) :-
    precedences( ... ),          % one per job
    ...
    precedences( ... ),
    define_cost(T1, T2, ..., Tn, Cost),
    enumerate(T1, T2, ..., Tn),
    generate_start_times(T1, T2, ..., Tn).
enumerate(T1, ..., Tn) :-
    resources( ... ),            % one per pair of tasks assigned to the same machine
    ...
    resources( ... ).

Finally, this solve procedure can be repeatedly run, within a branch-and-


bound framework (with a special minimize predicate mechanism as ex-
plained above) to obtain the best solution over all possible orderings.
In this presentation of the program we have chosen to simply list all
calls to precedences in the procedure solve, to focus on the important
procedures in the program. A real program would use an auxiliary predicate
to iterate over the jobs and generate the calls to precedences. Similarly,
enumerate would iterate to generate calls to resources. Thus the program
would be independent of the number of jobs or the pattern in which tasks
are assigned to machines. Similar comments apply to other programs in
this section.
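To make the schema concrete, the following is a small self-contained instance
(our illustration, written with SWI-Prolog's clp(fd) library rather than the
arithmetic constraints above; all task data are invented). There are two jobs,
A and B, each of two tasks; the first task of each job runs on machine 1 and
the second on machine 2. The reified disjunctions play the role of the two
resources rules, so that a single labeling call with the min option performs
the branch-and-bound search over all orderings.

:- use_module(library(clpfd)).

schedule(Starts, End) :-
    Starts = [A1, A2, B1, B2],
    Starts ins 0..20,
    End in 0..30,
    A1 + 3 #=< A2,                          % job A: task A1 (dur 3) before A2 (dur 4)
    B1 + 2 #=< B2,                          % job B: task B1 (dur 2) before B2 (dur 1)
    (A1 + 3 #=< B1) #\/ (B1 + 2 #=< A1),    % machine 1 runs A1 and B1 without overlap
    (A2 + 4 #=< B2) #\/ (B2 + 1 #=< A2),    % machine 2 runs A2 and B2 without overlap
    End #>= A2 + 4, End #>= B2 + 1,         % End is the completion time (makespan)
    labeling([min(End)], [End | Starts]).

The query ?- schedule(S, End). returns an optimal schedule with makespan
End = 8.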
There are variations and specializations of CLP approaches to this prob-
lem. Section 5.4.2 of [van Hentenryck, 1989a] and section 2 of [Dincbas et
al., 1990], on which this presentation is based, further discuss the prob-
lem and how particular features of CHIP can be useful. Another CHIP
approach, but this time to a specific and practical scheduling problem, is
reported in [Chamard et al., 1992]. In [Aggoun and Beldiceanu, 1992], the
focus is on a new feature of CHIP and how it can be used to obtain an
optimal solution to a particular 10 jobs and 10 machine problem, which
remained open until recently.
Real scheduling problems can involve more kinds of constraints than
just those mentioned above. For example, one could require that there is
at most a certain time elapsed between the completion of one task and
the commencement of another. See [Wallace, 1993] for a more complete
discussion of the CLP approach to the general scheduling problem.

13.4 Chemical hypothetical reasoning


This Prolog III application, described in some detail in [Jourdan et al.,
1990], uses both arithmetic and Boolean constraints. The problem at hand
is that of elucidating chemical-reaction pathways, and we quote [Jourdan
et al., 1990]: given an instantiation of the (two-reagent) reaction schema
A + B ~> T + P1 + ... + Pk, determine the pathway, that is, the set of

constituent reaction steps, as well as other molecules (or species) formed


during the reaction.
The reaction step considered in [Jourdan et al., 1990] contains at most
two reactant molecules, and at most two product molecules, and so can be
described in the form R1 + R2 -> P1 + P2 where R1, R2, P1, P2 are (possibly
empty) molecular formulas. The problem then is to determine, given an
overall reaction, a collection of basic steps or pathway which explain the
overall reaction. For example, given C7H9N + CH2O ~> C17H18N2 + H2O,
the following is a pathway which explains the reaction.

C7H9N + CH2O -> H2O + C8H9N
C8H9N + C8H9N -> C16H18N2
C8H9N + C16H18N2 -> C17H18N2 + C7H9N

Here C8H9N and C16H18N2 are the previously unidentified species.


The program imposes constraints to express requirements for a chem-
ical reaction and to exclude uninteresting reactions. In addition to the
constraints on the number of molecules, there are two other constraints on
reaction steps: for each chemical element, the number of reactant atoms
equals the number of product atoms (i.e. the step is chemically balanced),
and no molecular formula appears in both sides of a step.
There are also constraints on the pathway. Let the reaction schema
under consideration be A + B ~> T + P1 + ... + Pk. Then
• All pathway species must be formable from the two reagents A and
B.
• Neither A nor B alone is sufficient to form the target product T. Here
Boolean variables are used to express the dependency relation 'can be
formed from'. For each pathway step R1, R2 -> P1, P2 we state the
Boolean constraint a1 ∧ a2 => a3 ∧ a4 where a1, a2, a3, a4 are Boolean
variables associated with R1, R2, P1, P2 respectively. The constraint
expresses that both P1 and P2 can be formed if both R1 and R2
can be formed. Let B denote the Boolean formulas thus constructed
over all the steps in a pathway. Then expressing that species R does
not, by itself, produce species P is tantamount to the satisfiability of
the Boolean constraint B ∧ ¬(aR => aP), where aR and aP are the
Boolean variables associated with R and P respectively. Since we have
two original reagents, we will need two sets of Boolean variables and
two sets of dependency constraints to avoid any interference between
the two conditions.
• There is a notion of pathway consistency which is defined to be the
satisfiability of a certain arithmetic formula constructed from the
occurrences of species in the pathway. Essentially this formula is a
conjunction of formulas n1 + n2 = n3 + n4, for each pathway step
R1 + R2 -> P1 + P2, where n1, ..., n4 are the arithmetic variables
of R1, R2, P1, P2 respectively.
• Finally, in the ultimate output of the program, no two pathways
are identical, nor become identical under transformations such as
permuting the reactants or products within a step, or switching the
reactants and products in a step.
The program representation of a molecular formula is as a list of numbers,
each of which specifies the number of atoms of a certain chemical element.
We shall assume that there are only four chemical elements of interest in
our presentation, and hence a molecular formula is a 4-tuple. A species is
also represented by a 4-tuple (n, a, b, f) where n is an arithmetic variable
(to be used in the formulation of the arithmetic formula mentioned above),
a and b are Boolean variables (to be used in expressing the formation
dependencies), and f is the species formula. A step R1 + R2 -> P1 + P2
is represented by a 4-tuple (r1, r2, p1, p2) containing the identifiers of the
representations of R1, R2, P1 and P2.
The listing below is a simplified and translated version of the Prolog
III program in [Jourdan et al., 1990]. In the main procedure solve, the
first argument is a list of fixed size, say n, in which each element is a
species template. The first three templates are given, and these represent
the two initial reagents R1, R2 and the final target T. Similarly, Steps is a list
of fixed size, say m, in which each element is a step template. Thus n and
m are parameters to the program. The undefined procedure formula_of
obtains the species formula from a species, that is, it projects onto the last
element of the given 4-tuple. Similarly, arith_var_of, bool_var_a_of and
bool_var_b_of project onto the first, second and third arguments respec-
tively.
The procedure no_duplicates asserts constraints which prevent du-
plicate species and steps, and it also prevents symmetrical solutions; we
omit the details. Calls to the procedure formation_dependencies gen-
erate the formation dependencies. The procedure both_reagents_needed
imposes two constraints, one for each reagent, that, in conjunction with
the formation dependencies, assert that R1 (respectively R2) alone cannot
produce T. Finally, enumerate_species is self-explanatory.
solve([R1, R2, T | Species], Steps) :-
    no_duplicates( ... ),
    balanced_step( ... ),              % for each step in Steps
    pathway_step_consistency( ... ),   % for each step in Steps
    formation_dependencies( ... ),     % for each step in Steps
    both_reagents_needed(R1, R2, T),
    enumerate_species( ... ).
balanced_step(R1, R2, P1, P2) :-
    formula_of(R1, (C1, H1, N1, O1)),
    formula_of(R2, (C2, H2, N2, O2)),
    formula_of(P1, (C3, H3, N3, O3)),
    formula_of(P2, (C4, H4, N4, O4)),
    C1 + C2 = C3 + C4,
    H1 + H2 = H3 + H4,
    N1 + N2 = N3 + N4,
    O1 + O2 = O3 + O4.
pathway_step_consistency(R1, R2, P1, P2) :-
    arith_var_of(R1, N1), arith_var_of(R2, N2),
    arith_var_of(P1, N3), arith_var_of(P2, N4),
    N1 + N2 = N3 + N4.
formation_dependencies(R1, R2, P1, P2) :-
    bool_var_a_of(R1, A1), bool_var_b_of(R1, B1),
    bool_var_a_of(R2, A2), bool_var_b_of(R2, B2),
    bool_var_a_of(P1, A3), bool_var_b_of(P1, B3),
    bool_var_a_of(P2, A4), bool_var_b_of(P2, B4),
    A1 ∧ A2 => A3 ∧ A4,
    B1 ∧ B2 => B3 ∧ B4.
both_reagents_needed(R1, R2, T) :-
    bool_var_a_of(R1, A1), bool_var_b_of(R2, B1),
    bool_var_a_of(T, A3), bool_var_b_of(T, B3),
    ¬(A1 => A3),
    ¬(B1 => B3).
13.5 Propositional solver
As mentioned above in the discussion about the Boolean constraint domain,
one approach to solving Boolean equations is to use clp(FD), representing
the input formulas in a straightforward way using variables constrained to
be 0 or 1. See section 3.3.2 of [Simonis and Dincbas, 1993] and [Codognet
and Diaz, 1993] for example. What follows is from [Codognet and Diaz,
1993].
Assuming, without loss of generality, that the input is a conjunction of
equations of the form Z = X ∧ Y, Z = X ∨ Y or X = ¬Y, the basic
algorithm is simply to represent each equation

    Z = X ∧ Y   by the FD constraints   Z = X × Y,
                                        Z ≤ X ≤ Z × Y + 1 - Y,
                                        Z ≤ Y ≤ Z × X + 1 - X;
    Z = X ∨ Y   by                      Z = X + Y - X × Y,
                                        Z × (1 - Y) ≤ X ≤ Z,
                                        Z × (1 - X) ≤ Y ≤ Z;
    X = ¬Y      by                      X = 1 - Y,
                                        Y = 1 - X.
The following is a clp(FD) program fragment which realizes these represen-
tations. What is not shown is a procedure which takes the input equation
and calls the and, or and not procedures appropriately, and an enumeration
procedure (over the values 0 and 1) for all variables. In this program val(X)
delays execution of an FD constraint containing it until X is ground, at
which time val(X) denotes the value of X. The meanings of min(X) and
max(X) are, respectively, the current lower and upper bounds on X main-
tained by the constraint solver, as discussed in Section 9.3. A constraint
X in s..t expresses that s and t are, respectively, lower and upper bounds
for X.
and(X, Y, Z) :-
Z in min(X)*min(Y) .. max(X)*max(Y),
X in min(Z) .. max(Z)*max(Y) + 1 - min(Y),
Y in min(Z) .. max(Z)*max(X) + 1 - min(X).
or(X, Y, Z) :-
Z in min(X) + min(Y) - min(X)*min(Y) ..
max(X) + max(Y) - max(X)*max(Y),
X in min(Z)*(1 - max(Y)) .. max(Z),
Y in min(Z)*(1 - max(X)) .. max(Z).
not(X, Y) :-
X in 1 - val(Y),
Y in 1 - val(X).
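For readers without access to the indexical X in s..t constraints of clp(FD),
the same arithmetic encoding can be tried in an ordinary finite-domain
library; the sketch below uses SWI-Prolog's clp(fd) (an assumption on our
part), and its propagation will generally be weaker than the hand-written
indexicals above.

:- use_module(library(clpfd)).

bool_and(X, Y, Z) :- [X, Y, Z] ins 0..1, Z #= X * Y.
bool_or(X, Y, Z)  :- [X, Y, Z] ins 0..1, Z #= X + Y - X*Y.
bool_not(X, Y)    :- [X, Y] ins 0..1, X #= 1 - Y.

For example, the query ?- bool_not(X, NX), bool_and(X, NX, 1). fails,
reflecting the unsatisfiability of X ∧ ¬X.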

We conclude here by mentioning the authors' claim that this approach has
great efficiency. In particular, it is several times faster than each of two
Boolean solvers deployed in CHIP, and some special-purpose stand-alone
solvers.

14 Further applications
The applications discussed in the previous two sections are but a sample
of CLP applications. Here we briefly mention some others to indicate the
breadth of problems that have been addressed using CLP languages and
techniques.
We have exemplified the use of CLP to model analog circuits above.
A considerable amount of work has also been done on digital circuits, in
particular on verification [Simonis, 1989a; Simonis and Dincbas, 1987a;
Simonis et al., 1988; Simonis and Le Provost, 1989], diagnosis [Simonis
and Dincbas, 1987b], synthesis [Simonis and Graf, 1990] and test-pattern
generation [Simonis, 1989b]. Many of these works used the CHIP system.
See also [Filkorn et al., 1991] for a description of a large application. In
civil engineering, [Lakmazaheri and Rasdorf, 1989] used CLP(R) for the
analysis and partial synthesis of truss structures. As with electrical cir-
cuits, the constraints implement physical modelling and are used to verify

truss and support components, as well as to generate spatial configura-


tions. There is also work in mechanical engineering; [Sthanusubramonian,
1991] used CLP(R) to design gear boxes, and [Subramanian and Wang,
1993] combined techniques from qualitative physics and CLP(R) to design
mechanical systems from behavior specifications. In general, engineering
applications such as these use CLP to specify a hierarchical composition of
complex systems and for rule-based reasoning.
Another important application area for CLP is finance. We mentioned
the OTAS work above. Some further work is [Homiak, 1991] which also
deals with option valuations, and [Berthier, 1989; Berthier, 1990; Broek
and Daniels, 1991] which deal with financial planning. These financial ap-
plications have tended to take the form of expert systems involving sophis-
ticated mathematical models.
There have been various proposals for including certainty measures and
probabilities in logic programs to provide some built-in evidential reason-
ing that can be useful when writing expert systems. Original proposals
[Shapiro, 1983b; van Emden, 1986] intended Prolog as the underlying lan-
guage, but it is clear that CLP languages provide for more flexible execution
of such expert systems.
Finally, we mention work on applying CLP languages to: music [To-
bias, 1988], car sequencing [van Hentenryck, 1991], aircraft traffic control
[Codognet et al., 1992], building visual language parsers [Helm et al., 1991],
a warehousing problem [Bisdorff and Laurent, 1993], safety analysis [Corsini
and Rauzy, 1993], frequency assignment for cellular telephones [Carlsson
and Grindal, 1993], timetabling [Boizumault et al., 1993], floor planning
[Kanchanasut and Sumetphong, 1992], spacecraft attitude control [Skup-
pin and Buckle, 1992], interoperability of fiber optic communications equip-
ment [Chadra et al., 1992], interest rate risk management in banking [Gailly
et al., 1992], failure mode and effect analysis of complex systems [Gailly et
al., 1992], development of digitally controlled analog systems [Nerode and
Kohn, 1993], testing of telecommunication protocols [Ladret, 1993], causal
graph management [Rueher, 1993], factory scheduling [Evans, 1992], etc.
The Applause Project [Li et al., 1993] has developed applications that use
the ElipSys system for manufacturing planning, tourist advice, molecular
biology, and environment monitoring and control.

Acknowledgements
We would like to thank the following people for their comments on drafts
of this paper and/or help in other ways: M. Bruynooghe, N. Heintze, P.
van Hentenryck, A. Herold, J-L. Lassez, S. Michaylov, C. Palamidessi, K.
Shuerman, P. Stuckey, M. Wallace, R. Yap. We also thank the anonymous
referees for their careful reading and helpful comments.

References
[Abadi and Manna, 1989] M. Abadi and Z. Manna. Temporal Logic Pro-
gramming, Journal of Symbolic Computation, 8, 277-295, 1989.
[Aggoun and Beldiceanu, 1992] A. Aggoun and N. Beldiceanu. Extend-
ing CHIP to Solve Complex Scheduling and Packing Problems, In
Journees Francophones De Programmation Logique, Lille, France,
1992.
[Aggoun and Beldiceanu, 1993] A. Aggoun and N. Beldiceanu. Overview
of the CHIP Compiler System. In Constraint Logic Programming:
Selected Research, F. Benhamou and A. Colmerauer, eds. pp. 421-
435. MIT Press, 1993.
[Aiba et al., 1988] A. Aiba, K. Sakai, Y. Sato, D. Hawley and R. Hasegawa.
Constraint Logic Programming Language CAL, Proc. International
Conference on Fifth Generation Computer Systems 1988, 263-276,
1988.
[Aït-Kaci, 1986] H. Aït-Kaci. An Algebraic Semantics Approach to the Ef-
fective Resolution of Type Equations, Theoretical Computer Sci-
ence, 45, 293-351, 1986.
[Aït-Kaci, 1991] H. Aït-Kaci. Warren's Abstract Machine: A Tutorial Re-
construction, MIT Press, 1991.
[Aït-Kaci and Nasr, 1986] H. Aït-Kaci and R. Nasr. LOGIN: A Logic Pro-
gramming Language with Built-in Inheritance, Journal of Logic Pro-
gramming, 3, 185-215, 1986.
[Aït-Kaci and Nasr, 1987] H. Aït-Kaci, P. Lincoln and R. Nasr. Le Fun:
Logic Equations and Functions, Proc. Symposium on Logic Program-
ming, 17-23, 1987.
[Aït-Kaci and Podelski, 1993a] H. Aït-Kaci and A. Podelski. Towards a
Meaning of LIFE, Journal of Logic Programming, 16, 195-234, 1993.
[Aït-Kaci and Podelski, 1993b] H. Aït-Kaci and A. Podelski. Entailment
and Disentailment of Order-Sorted Feature Constraints, manuscript,
1993.
[Aït-Kaci and Podelski, 1993c] H. Aït-Kaci and A. Podelski. A General
Residuation Framework, manuscript, 1993.
[Aït-Kaci et al., 1992] H. Aït-Kaci, A. Podelski and G. Smolka. A Feature-
based Constraint System for Logic Programming with Entailment,
Theoretical Computer Science, to appear. Also in: Proc. Interna-
tional Conference on Fifth Generation Computer Systems 1992, Vol.
2, 1992, 1012-1021.
[Albert et al., 1993] L. Albert, R. Casas and F. Fages. Average-case Anal-
ysis of Unification Algorithms, Theoretical Computer Science 113,
3-34, 1993.
[Apt et al., 1988] K. Apt, H. Blair and A. Walker. Towards a theory of
declarative knowledge. In Foundations of Deductive Databases and
Logic Programming, J. Minker, ed. pp. 89-148. Morgan Kaufmann,
1988.
[Atay, 1992] C. Atay. A Parallelization of the Constraint Logic Program-
ming Language 2LP, Ph.D. thesis, City University of New York,
1992.
[Atay et al., 1993] C. Atay, K. McAloon and C. Tretkoff. 2LP: A Highly
Parallel Constraint Logic Programming Language, Proc. 6th. SI AM
Conf. on Parallel Processing for Scientific Computing, 1993.
[Barbuti et al, 1992] R. Barbuti, M. Codish, R. Giacobazzi and M.J. Ma-
her. Oracle Semantics for Prolog, Proc. 3rd Conference on Algebraic
and Logic Programming, LNCS 632, 100-115, 1992.
[Baudinet, 1988] M. Baudinet. Proving Termination Properties of Prolog:
A Semantic Approach, Proc. 3rd. Symp. Logic in Computer Science,
334-347, 1988.
[Baudinet et al., 1993] M. Baudinet, J. Chomicki and P. Wolper. Tempo-
ral deductive databases. In Temporal Databases: Theory, Design
and Implementation, A. Tansel, J. Clifford, S. Gadia, S. Jajodia,
A. Segev and R. Snodgrass, eds. Benjamin/Cummings, 1993.
[Benhamou, 1993] F. Benhamou. Boolean algorithms in PROLOG III. In
Constraint Logic Programming: Selected Research, F. Benhamou
and A. Colmerauer, eds. pp. 307-325. MIT Press, 1993.
[Benhamou and Colmerauer, 1993] F. Benhamou and A. Colmerauer, eds.
Constraint Logic Programming: Selected Research, MIT Press, 1993.
[Benhamou and Massat, 1993] F. Benhamou and J-L. Massat. Boolean
Pseudo-equations in Constraint Logic Programming, Proc. 10th In-
ternational Conference on Logic Programming, 517-531, 1993.
[Berthier, 1989] F. Berthier. A Financial Model using Qualitative and
Quantitative Knowledge, In F. Gardin, editor, Proceedings of the
International Symposium on Computational Intelligence 89, Milano,
1-9, September 1989.
[Berthier, 1990] F. Berthier. Solving Financial Decision Problems with
CHIP, In J.-L. Le Moigne and P. Bourgine, editors, Proceedings
of the 2nd Conference on Economics and Artificial Intelligence—
CECIOA 2, Paris, 233-238, June 1990.
[Bisdorff and Laurent, 1993] R. Bisdorff and S. Laurent. Industrial Dispos-
ing Problem Solved in CHIP, Proc. 10th International Conference
on Logic Programming, 831, 1993.
[Bockmayr, 1993] A. Bockmayr. Logic Programming with Pseudo-Boolean
Constraints, in: Constraint Logic Programming: Selected Research,
F. Benhamou and A. Colmerauer, eds. pp. 327-350. MIT Press,
1993.
[Boizumault et al., 1993] P. Boizumault, Y. Delon and L. Peridy. Solving
a real life exams problem using CHIP, Proc. International Logic
Programming Symposium, 661, 1993.
[Borning, 1981] A. Borning. The Programming Language Aspects of
ThingLab, a Constraint-oriented Simulation Laboratory, ACM
Transactions on Programming Languages and Systems, 3(4), 252-
387, October 1981.
[Borning et al., 1989] A. Borning, M.J. Maher, A. Martindale and M. Wil-
son. Constraint Hierarchies and Logic Programming, Proc. 6th In-
ternational Conference on Logic Programming, 149-164,1989. Fuller
version as Technical Report 88-11-10, Computer Science Depart-
ment, University of Washington, 1988.
[Bossi et ai, 1992] A. Bossi, M. Gabbrielli, G. Levi and M.C. Meo. Contri-
butions to the Semantics of Open Logic Programs, Proc. Int. Conf.
on Fifth Generation Computer Systems, 570-580, 1992.
[Brodsky and Sagiv, 1991] A. Brodsky and Y. Sagiv. Inference of Inequal-
ity Constraints in Logic Programs, Proc. ACM Symp. on Principles
of Database Systems, 1991.
[Brodsky et al, 1993] A. Brodsky, J. Jaffar and M. Maher. Toward Prac-
tical Constraint Databases, Proc. 19th International Conference on
Very Large Data Bases, 567-580, 1993.
[Broek and Daniels, 1991] J.M. Broek and H.A.M. Daniels. Application of
CLP to Asset and Liability Management in Banks, Computer Sci-
ence in Economics and Management, 4(2), 107-116, May 1991.
[Bryant, 1986] R. Bryant. Graph Based Algorithms for Boolean Function
Manipulation, IEEE Transactions on Computers 35, 677-691,1986.
[Brzoska, 1991] C. Brzoska. Temporal Logic Programming and its Rela-
tion to Constraint Logic Programming, Proc. International Logic
Programming Symposium, 661-677, 1991.
[Brzoska, 1993] C. Brzoska. Temporal Logic Programming with Bounded
Universal Modality Goals, Proc. 10th International Conference on
Logic Programming, 239-256, 1993.
[Burckert, 1990] H-J. Burckert. A Resolution Principle for Clauses with
Constraints. In Proc. CADE-10, M. Stickel, ed. pp. 178-192. LNCS
449, Springer-Verlag, 1990.
[Burg, 1992] J. Burg. Parallel Execution Models and Algorithms for Con-
straint Logic Programming over a Real-number Domain, Ph.D. the-
sis, Dept. of Computer Science, University of Central Florida, 1992.
[Burg et al, 1990] J. Burg, C. Hughes, J. Moshell and S.D. Lang.
Constraint-based Programming: A Survey, Technical Report IST-
TR-90-16, Dept. of Computer Science, University of Central Florida,
1990.
[Burg et al., 1992] J. Burg, C. Hughes and S.D. Lang. Parallel Execution
of CLP(R) Programs, Technical Report TR-CS-92-20, University of
Central Florida, 1992.
[Buttner and Simonis, 1987] W. Buttner and H. Simonis. Embedding
Boolean Expressions into Logic Programming, Journal of Symbolic
Computation, 4, 191-205, 1987.
[Carlsson, 1987] M. Carlsson. Freeze, Indexing and other Implementation
Issues in the WAM, Proc. 4th International Conference on Logic
Programming, 40-58, 1987.
[Carlsson and Grindal, 1993] M. Carlsson and M. Grindal. Automatic Fre-
quency Assignment for Cellular Telephones Using Constraint Sat-
isfaction Techniques, Proc. 10th International Conference on Logic
Programming, 647-665, 1993.
[Cernikov, 1963] S.N. Cernikov. Contraction of Finite Systems of Linear
Inequalities (In Russian), Doklady Akademiia Nauk SSSR, 152, No.
5, 1075-1078,1963. (English translation in Soviet Mathematics Dok-
lady, 4, No. 5, 1520-1524, 1963.)
[Chadra et al., 1992] R. Chadra, O. Cockings and S. Narain. Interoperabil-
ity Analysis by Symbolic Simulation, Proc. JICSLP Workshop on
Constraint Logic Programming, 55-58, 1992.
[Chamard et al., 1992] A. Chamard, F. Deces and A. Fischler. Applying
CHIP to a Complex Scheduling Problem, draft manuscript, Dassault
Aviation, Department of Artificial Intelligence, 1992.
[Chan, 1988] D. Chan. Constructive Negation based on Completed
Database, Proc. 5th International Conference on Logic Program-
ming, 111-125, 1988.
[Chandru, 1993] V. Chandru. Variable Elimination in Linear Constraints,
The Computer Journal, 36(5), 463-472, 1993.
[Chandru, 1991] V. Chandru and J.N. Hooker. Extended Horn Sets in
Propositional Logic, Journal of the ACM, 38, 205-221, 1991.
[Chvatal, 1983] V. Chvatal. Linear Programming, W.H. Freeman, New
York, 1983.
[Clark, 1978] K.L. Clark. Negation as Failure. In Logic and Databases, H.
Gallaire and J. Minker, eds. pp. 293-322. Plenum Press, New York,
1978.
[Codognet and Diaz, 1993] P. Codognet and D. Diaz. Boolean Constraint
Solving using clp(FD), Proc. International Logic Programming
Symposium, pp. 525-539, 1993.
[Codognet et al., 1992] P. Codognet, F. Fages, J. Jourdan, R. Lissajoux
and T. Sola. On the Design of Meta(F) and its Applications in
Air Traffic Control, Proc. JICSLP Workshop on Constraint Logic
Programming, pp. 28-35, 1992.
[Cohen, 1990] J. Cohen. Constraint Logic Programming Languages,
CACM, 33, 52-68, July 1990.
[Colmerauer, 1982a] A. Colmerauer. Prolog-II Manuel de Reference et
Modele Theorique, Groupe Intelligence Artificielle, Universite
d'Aix-Marseille II, 1982.
[Colmerauer, 1982b] A. Colmerauer. Prolog and infinite trees. In Logic
Programming, K. L. Clark and S.-A. Tarnlund, eds. pp. 231-251.
Academic Press, New York, 1982.
[Colmerauer, 1983] A. Colmerauer. Prolog in 10 Figures, Proc. 8th Inter-
national Joint Conference on Artificial Intelligence, pp. 487-499,
1983.
[Colmerauer, 1984] A. Colmerauer. Equations and Inequations on Finite
and Infinite Trees, Proc. 2nd. Int. Conf. on Fifth Generation Com-
puter Systems, Tokyo, pp. 85-99, 1984.
[Colmerauer, 1987] A. Colmerauer. Opening the Prolog III Universe,
BYTE Magazine, August 1987.
[Colmerauer, 1988] A. Colmerauer. Prolog III Reference and Users Manual,
Version 1.1, PrologIA, Marseilles, 1990.
[Colmerauer, 1990] A. Colmerauer. An Introduction to Prolog III, CACM,
33, 69-90, July 1990.
[Colmerauer, 1993a] A. Colmerauer. Naive solving of non-linear con-
straints. In Constraint Logic Programming: Selected Research, F.
Benhamou and A. Colmerauer, eds. pp. 89-112, MIT Press, 1993.
[Colmerauer, 1993b] A. Colmerauer. Invited talk at Workshop on the Prin-
ciples and Practice of Constraint Programming, Newport, RI, April
1993.
[Corsini and Rauzy, 1993] M.-M. Corsini and A. Rauzy. Safety Analysis by
means of Fault Trees: an Application for Open Boolean Solvers,
Proc. 10th International Conference on Logic Programming, p. 834,
1993.
[Courcelle, 1983] B. Courcelle. Fundamental properties of infinite trees.
Theoretical Computer Science, 25, 95-169, March 1983.
[Darlington and Guo, 1992] J. Darlington and Y.-K. Guo. A New Perspec-
tive on Integrating Functions and Logic Languages, Proceedings of
the 3rd International Conference on Fifth Generation Computer
Systems, Tokyo, 682-693, 1992.
[De Backer and Beringer, 1991] B. De Backer and H. Beringer. Intelligent
Backtracking for CLP Languages, An Application to CLP(R), Proc.
International Logic Programming Symposium, 405-419, 1991.
[De Backer and Beringer, 1993] B. De Backer and H. Beringer. A CLP
Language Handling Disjunctions of Linear Constraints, Proc. 10th
International Conference on Logic Programming, 550-563, 1993.
[de Boer and Palamidessi, 1990] F.S. de Boer and C. Palamidessi. A Fully
Abstract Model for Concurrent Constraint Programming, Proc. of
TAPSOFT/CAAP, LNCS 493, 296-319, 1991.
[de Boer and Palamidessi, 1991] F.S. de Boer and C. Palamidessi. Embed-
ding as a Tool for Language Comparison, Information and Compu-
tation 108, 128-157, 1994.
[de Boer and Palamidessi, 1993] F.S. de Boer and C. Palamidessi. From
Concurrent Logic Programming to Concurrent Constraint Program-
ming, in: Advances in Logic Programming Theory, Oxford University
Press, to appear.
[de Boer et al., 1993] F. de Boer, J. Kok, C. Palamidessi and J. Rutten.
Non-monotonic Concurrent Constraint Programming, Proc. Inter-
national Logic Programming Symposium, 315-334, 1993.
[Debray, 1989a] S. K. Debray. Static Inference of Modes and Data De-
pendencies in Logic Programs, ACM Transactions on Programming
Languages and Systems 11 (3), 418-450, 1989.
[Debray, 1989b] S. K. Debray and D.S. Warren. Functional Computations
in Logic Programs, ACM Transactions on Programming Languages
and Systems 11 (3), 451-481, 1989.
[de Kleer and Sussman, 1980] J. de Kleer and G.J. Sussman. Propagation
of Constraints Applied to Circuit Synthesis, Circuit Theory and Ap-
plications 8, 127-144, 1980.
[Diaz and Codognet, 1993] D. Diaz and P. Codognet. A Minimal Extension
of the WAM for clp(FD), Proc. 10th International Conference on
Logic Programming, 774-790, 1993.
[Dincbas et al., 1988a] M. Dincbas, P. Van Hentenryck, H. Simonis, and
A. Aggoun. The Constraint Logic Programming Language CHIP,
Proceedings of the 2nd International Conference on Fifth Generation
Computer Systems, 249-264, 1988.
[Dincbas et al., 1988b] M. Dincbas, H. Simonis and P. van Hentenryck.
Solving a Cutting-stock Problem in CLP, Proceedings 5th Interna-
tional Conference on Logic Programming, MIT Press, 42-58, 1988.
[Dincbas et al., 1990] M. Dincbas, H. Simonis and P. Van Hentenryck.
Solving Large Combinatorial Problems in Logic Programming, Jour-
nal of Logic Programming 8 (1 and 2), 75-93, 1990.
[Dovier and Rossi, 1993] A. Dovier and G. Rossi. Embedding Extensional
Finite Sets in CLP, Proc. International Logic Programming Sympo-
sium, 540-556, 1993.
[Duisburg, 1986] R. Duisburg. Constraint-based Animation: Temporal
Constraints in the Animus System, Technical Report CR-86-37, Tek-
tronix Laboratories, August 1986.
[Ege et al, 1987] R. Ege, D. Maier and A. Borning. The Filter Browser:
Defining Interfaces Graphically, Proc. of the European Conf. on
Object-oriented Programming, Paris, 155-165, 1987.
[Elcock, 1990] E. Elcock. Absys: The First Logic Pro-
gramming Language—A Retrospective and Commentary, Journal
of Logic Programming, 9, 1-17, 1990.
[Emerson, 1990] E. Emerson. Temporal and Modal Logic, in: Handbook of
Theoretical Computer Science, Vol. B, Chapter 16, 995-1072, 1990.
[Evans, 1992] O. Evans. Factory Scheduling using Finite Domains, in: Logic
Programming in Action, LNCS 636, Springer-Verlag, 45-53, 1992.
[Fages, 1993] F. Fages. On the Semantics of Optimization Predicates in
CLP Languages, Proc. 13th Conf. on Foundations of Software Tech-
nology and Theoretical Computer Science, LNCS 761,193-204,1993.
[Falaschi et al, 1989] M. Falaschi, G. Levi, M. Martelli and C. Palamidessi.
Declarative Modelling of the Operational Behavior of Logic Lan-
guages, Theoretical Computer Science 69, 289-318, 1989.
[Fikes, 1970] R.E. Fikes. REF-ARF: A system for solving problems stated
as procedures, Artificial Intelligence 1, 27-120, 1970.
[Filkorn et al., 1991] T. Filkorn, R. Schmid, E. Tiden and P. Warkentin.
Experiences from a Large Industrial Circuit Design Application,
Proc. International Logic Programming Symposium, 581-595, 1991.
[Fitting, 1986] M. Fitting. A Kripke-Kleene Semantics for Logic Programs,
Journal of Logic Programming, 4, 295-312, 1985.
[Fourier, 1824] J-B.J. Fourier. Reported in: Analyse des travaux de
l'Académie Royale des Sciences, pendant l'année 1824, Partie mathé-
matique, Histoire de l'Académie Royale des Sciences de l'Institut
de France, Vol. 7, xlvii-lv, 1827. (Partial English translation in: D.A.
Kohler. Translation of a Report by Fourier on his work on Linear
Inequalities. Opsearch, Vol. 10, 38-42, 1973)
[Freeman-Benson, 1991] B.N. Freeman-Benson. Constraint Imperative
Programming, PhD thesis, Department of Computer Science and
Engineering, University of Washington, 1991.
[Fruhwirth and Hanschke, 1993] T. Fruhwirth and P. Hanschke. Termino-
logical Reasoning with Constraint Handling Rules, Proc. Workshop
on Principles and Practice of Constraint Programming, 82-91,1993.
[Fruhwirth et al., 1992] T. Fruhwirth, A. Herold, V. Kuchenhoff, T. Le
Provost, P. Lim and M. Wallace. Constraint Logic Programming—
An Informal Introduction, in: Logic Programming in Action, LNCS
636, Springer-Verlag, 3-35, 1992.
[Gabbrielli and Levi, 1991] M. Gabbrielli and G. Levi. Modeling Answer
Constraints in Constraint Logic Programs, Proc. 8th International
Conference on Logic Programming, 238-252, 1991.
[Gaifman et al., 1991] H. Gaifman, M.J. Maher and E. Shapiro. Replay,
Recovery, Replication and Snapshots of Nondeterministic Concur-
rent Programs, Proc. 10th. ACM Symposium on Principles of Dis-
tributed Computation, 1991.
[Gailly et al, 1992] P.-J. Gailly, W. Krautter, C. Bisiere and S. Bescos.
The Prince project and its Applications, in: Logic Programming in
Action, LNCS 636, Springer-Verlag, 54-63, 1992.
[Garcia and Hermenegildo, 1993] M. Garcia de la Banda and M.
Hermenegildo. A Practical Approach to the Global Analysis of
Constraint Logic Programs, Proc. International Logic Programming
Symposium, 437-455, 1993.
[Garcia et al., 1993] M. Garcia de la Banda, M. Hermenegildo and K. Mar-
riott. Independence in Constraint Logic Programs, Proc. Interna-
tional Logic Programming Symposium, 130-146, 1993.
[Gelfond and Lifschitz, 1988] M. Gelfond and V. Lifschitz. The Stable
Model Semantics for Logic Programming, Proc. 5th International
Conference on Logic Programming, 1070-1080, 1988.
[Havens et al., 1992] W. Havens, S. Sidebottom, G. Sidebottom, J. Jones
and R. Ovans. Echidna: A Constraint Logic Programming Shell,
Proc. Pacific Rim International Conference on Artificial Intelli-
gence, 1992.
[Heintze et al., 1992] N. Heintze, S. Michaylov and P.J. Stuckey. CLP(R)
and Some Electrical Engineering Problems, Journal of Automated
Reasoning 9, 231-260, 1992.
[Helm et al., 1991] R. Helm, K. Marriott and M. Odersky. Building Visual
Language Parsers, Proc. Conf. on Human Factors in Computer Sys-
tems (CHI'91), 105-112, 1991.
[Helm et al., 1991] R. Helm, K. Marriott and M. Odersky. Constraint-
based Query Optimization for Spatial Databases, Proc. 10th ACM
Symp. on Principles of Database Systems, 181-191, 1991.
[Hickey, 1993] T. Hickey. Functional Constraints in CLP Languages, in:
Constraint Logic Programming: Selected Research, F. Benhamou
and A. Colmerauer, eds. pp. 355-381, MIT Press, 1993.
[Hohfeld and Smolka, 1988] M. Hohfeld and G. Smolka. Definite Relations
over Constraint Languages, LILOG Report 53, IBM Deutschland,
1988.
[Homiak, 1991] D. Homiak. A CLP System for Solving Partial Differential
Equations with Applications to Options Valuation, Masters Project,
DePaul University, 1991.
[Hong, 1993] H. Hong. RISC-CLP(Real): logic programming with non-
linear constraints over the reals. In Constraint Logic Programming:
Selected Research, F. Benhamou and A. Colmerauer, eds. pp. 133-
159. MIT Press, 1993.
[Horn, 1951] A. Horn. On sentences which are true of direct unions of al-
gebras. Journal of Symbolic Logic, 16, 14-21, 1951.
[Huynh and Lassez, 1988] T. Huynh and C. Lassez. A CLP(R) Options
Trading Analysis System, Proceedings 5th International Conference
on Logic Programming, pp. 59-69, 1988.
[Huynh et al., 1990] T. Huynh, C. Lassez and J-L. Lassez. Practical issues
on the projection of polyhedral sets. Annals of Mathematics and
Artificial Intelligence, 6, 295-315, 1992.
[Imbert, 1993a] J.-L. Imbert. Variable elimination for disequations in gen-
eralized linear constraint systems. The Computer Journal, 36, 473-
484, 1993.
[Imbert, 1993b] J.-L. Imbert. Fourier's Elimination: which to choose? Proc.
Workshop on Principles and Practice of Constraint Programming,
Newport, pp. 119-131, April 1993.
[Jaffar, 1984] J. Jaffar. Efficient unification over infinite terms. New Gen-
eration Computing, 2, 207-219, 1984.
[Jaffar, 1990] J. Jaffar. Minimal and Complete Word Unification, Journal
of the ACM, 37, 47-85, 1990.
[Jaffar and Lassez, 1986] J. Jaffar and J.-L. Lassez. Constraint Logic Pro-
gramming, Technical Report 86/73, Department of Computer Sci-
ence, Monash University, 1986.
[Jaffar and Lassez, 1987] J. Jaffar and J.-L. Lassez. Constraint Logic Pro-
gramming, Proc. 14th ACM Symposium on Principles of Program-
ming Languages, Munich (January 1987), pp. 111-119.
[Jaffar and Stuckey, 1986] J. Jaffar and P. Stuckey. Canonical Logic Pro-
grams, Journal of Logic Programming, 3, 143-155, 1986.
[Jaffar et al., 1984] J. Jaffar, J.-L. Lassez and M.J. Maher. A theory of
complete logic programs with equality. Journal of Logic Program-
ming, 1, 211-223, 1984.
[Jaffar et al., 1986] J. Jaffar, J.-L. Lassez and M.J. Maher. A logic pro-
gramming language scheme. In Logic Programming: Relations,
Functions and Equations, D. DeGroot and G. Lindstrom, eds. pp.
441-467. Prentice-Hall, 1986.
[Jaffar et al., 1991] J. Jaffar, S. Michaylov and R.H.C. Yap. A Methodol-
ogy for Managing Hard Constraints in CLP Systems, Proc. ACM-
SIGPLAN Conference on Programming Language Design and Im-
plementation, pp. 306-316, 1991.
[Jaffar et al., 1992a] J. Jaffar, S. Michaylov, P. Stuckey and R.H.C. Yap.
The CLP(R) language and system. ACM Transactions on Program-
ming Languages, 14, 339-395, 1992.
[Jaffar et al., 1992b] J. Jaffar, S. Michaylov, P. Stuckey and R.H.C. Yap.
An Abstract Machine for CLP(R), Proceedings ACM-SIGPLAN
Conference on Programming Language Design and Implementation,
pp. 128-139, 1992.
[Jaffar et al, 1993] J. Jaffar, M.J. Maher, P.J. Stuckey and R.H.C. Yap.
Projecting CLP(R) constraints. New Generation Computing, 11,
449-469, 1993.
[Janson and Haridi, 1991] S. Janson and S. Haridi.
Programming Paradigms of the Andorra Kernel Language, Proc.
International Logic Programming Symposium, pp. 167-183, 1991.
[Jorgensen et al., 1991] N. Jorgensen, K. Marriott and S. Michaylov. Some
Global Compile-time Optimizations for CLP(R), Proceedings 1991
International Logic Programming Symposium, pp. 420-434, 1991.
[Jourdan et al., 1990] J. Jourdan and R.E. Valdes-Perez. Constraint Logic
Programming Applied to Hypothetical Reasoning in Chemistry,
Proceedings North American Conference on Logic Programming, pp.
154-172, 1990.
[Kanchanasut and Sumetphong, 1992] K. Kanchanasut and C. Sumet-
phong. Floor Planning Applications in CLP(R), Proc. JICSLP
Workshop on Constraint Logic Programming, pp. 36-44, 1992.
[Kanellakis et al., 1990] P. Kanellakis, G. Kuper and P. Revesz. Constraint
query languages. Journal of Computer and System Sciences, to ap-
pear. Preliminary version appeared in Proc. 9th ACM Symp. on
Principles of Database Systems, pp. 299-313, 1990.
[Kanellakis et al., 1993] P. Kanellakis, S. Ramaswamy, D.E. Vengroff and
J.S. Vitter. Indexing for Data Models with Constraints and Classes,
Proc. ACM Symp. on Principles of Database Systems, 1993.
[Kanellakis et al., to appear] P. Kanellakis, J-L. Lassez and V. Saraswat
(Eds). Principles and Practice of Constraint Programming, MIT
Press, to appear.
[Kemp et al., 1989] D. Kemp, K. Ramamohanarao, I. Balbin and K.
Meenakshi. Propagating Constraints in Recursive Deductive
Databases, Proc. North American Conference on Logic Program-
ming, pp. 981-998, 1989.
[Kemp and Stuckey, 1993] D. Kemp and P. Stuckey. Analysis based Con-
straint Query Optimization, Proc. 10th International Conference on
Logic Programming, pp. 666-682, 1993.
[Khachian, 1979] L.G. Khachian. A polynomial algorithm in linear pro-
gramming, Soviet Math. Dokl., 20, 191-194, 1979.
[Klee and Minty, 1972] V. Klee and G.J. Minty. How good is the Simplex
algorithm? In Inequalities-Ill, O. Sisha, ed. pp. 159-175. Academic
Press, New York, 1972
[Klug, 1988] A. Klug. On conjunctive queries containing inequalities. Jour-
nal of the ACM, 35, 146-160, 1988.
[Koscielski and Pacholski, 1992] A. Koscielski and L. Pacholski. Complex-
ity of Unification in Free Groups and Free Semigroups, Proc. 31st
Symp. on Foundations of Computer Science, pp. 824-829, 1990.
[Krishnamurthy et al., 1988] R. Krishnamurthy, R. Ramakrishnan and O.
Shmueli. A Framework for Testing Safety and Effective Computabil-
ity of Extended Datalog, Proc. ACM Symp. on Management of
Data, pp. 154-163, 1988.
[Kuip, 1993] C.A.C. Kuip. Algebraic languages for mathematical program-
ming. European Journal of Operations Research, 67, 25-51, 1993.
[Kunen, 1987] K. Kunen. Negation in logic programming. Journal of Logic
Programming, 4, 289-308, 1987.
[Ladret, 1993] D. Ladret and M. Rueher. Contribution of Logic Program-
ming to support Telecommunications Protocol Tests, Proc. 10th In-
ternational Conference on Logic Programming, pp. 845-846, 1993.
[Lakmazaheri and Rasdorf, 1989] S. Lakmazaheri and W. Rasdorf. Con-
straint logic programming for the analysis and partial synthesis
of Truss structures. Artificial Intelligence for Engineering Design,
Analysis, and Manufacturing, 3, 157-173, 1989.
[Lassez, 1987] C. Lassez. Constraint logic programming: a tutorial. BYTE
Magazine, 171-176, August 1987.
[Lassez and Lassez, 1991] C. Lassez and J.-L. Lassez. Quantifier elimina-
tion for conjunctions of linear constraints via a convex hull algo-
rithm. In Symbolic and Numeric Computation for Artificial Intelli-
gence, B. Donald, D. Kapur and J.L. Mundy, eds. Academic Press,
to appear. Also, IBM Research Report RC16779, T.J. Watson Re-
search Center,
[Lassez and McAloon, 1990] J.-L. Lassez and K. McAloon. A Constraint
Sequent Calculus, Proc. of Symp. on Logic in Computer Science,
pp. 52-62, 1990.
[Lassez and McAloon, 1992] J.-L. Lassez and K. McAloon. A canonical
form for generalized linear constraints. Journal of Symbolic Com-
putation, 13, 1-24, 1992.
[Lassez and Marriott, 1987] J.-L. Lassez and K.G. Marriott. Explicit rep-
resentation of terms defined by counter examples. Journal of Auto-
mated Reasoning, 3, 301-317, 1987.
[Lassez et al., 1987] C. Lassez, K. McAloon and R.H.C. Yap. Constraint
logic programming and options trading. IEEE Expert, 2, Special
Issue on Financial Software, 42-50, August 1987.
[Lassez et al., 1988] J.-L. Lassez, M. Maher and K.G. Marriott. Unification
revisited. In Foundations of Deductive Databases and Logic Pro-
gramming, J. Minker, ed. pp. 587-625. Morgan Kaufmann, 1988.
[Lassez et al., 1989] J.-L. Lassez, T. Huynh and K. McAloon. Simplification
and Elimination of Redundant Linear Arithmetic Constraints, Proc.
North American Conference on Logic Programming, Cleveland, pp.
35-51, 1989.
[Lauriere, 1978] J-L. Lauriere. A language and a program for stating and
solving combinatorial problems. Artificial Intelligence, 10, 29-127,
1978.
[Leler, 1988] W. Leler. Constraint Programming Languages: Their Specifi-
cation and Generation, Addison-Wesley, 1988.
[Le Provost and Wallace, 1993] T. Le Provost and M. Wallace. General-
ized constraint propagation over the CLP scheme. Journal of Logic
Programming, 16, 319-359, 1993.
[Leung, 1993] H.F. Leung. Distributed Constraint Logic Programming, Vol.
41, World-Scientific Series in Computer Science, World-Scientific,
1993.
[Li et al., 1993] L.L. Li, M. Reeve, K. Schuerman, A. Veron, J. Bellone, C.
Pradelles, Z. Palaskas, D. Stamatopoulos, D. Clark, S. Doursenot,
C. Rawlings, J. Shirazi and G. Sardu, APPLAUSE: Applications
Using the ElipSys Parallel CLP System, Proc. 10th International
Conference on Logic Programming, pp. 847-848, 1993.
[Lloyd, 1987] J.W. Lloyd. Foundations of Logic Programming, Springer-
Verlag, Second Edition, 1987.
[Lloyd and Topor, 1984] J.W. Lloyd and R.W. Topor. Making Prolog more
expressive. Journal of Logic Programming, 1, 93-109, 1984.
[McAloon and Tretkoff, 1989] K. McAloon and C. Tretkoff. 2LP: A Logic
Programming and Linear Programming System, Brooklyn College
Computer Science Technical Report No 1989-21, 1989.
[McDonald et al, 1993] A. McDonald, P. Stuckey and R.H.C. Yap. Redun-
dancy of Variables in CLP(R), Proc. International Logic Program-
ming Symposium, pp. 75-93, 1993.
[McKinsey, 1943] J. McKinsey. The decision problem for some classes of
sentences without quantifiers. Journal of Symbolic Logic, 8, 61-76,
1943.
[Maher, 1987] M.J. Maher. Logic Semantics for a Class of Committed-
Choice Programs, Proc. 4th International Conference on Logic Pro-
gramming, pp. 858-876, 1987.
[Maher, 1988] M.J. Maher. Complete Axiomatizations of the Algebras of
Finite, Rational and Infinite Trees, Proc. 3rd. Symp. Logic in Com-
puter Science, pp. 348-357, 1988. Full version: IBM Research Re-
port, T.J. Watson Research Center.
[Maher, 1992] M.J. Maher. A CLP View of Logic Programming, Proc.
Conf, on Algebraic and Logic Programming, LNCS 632, pp. 364-
383, 1992.
[Maher, 1993a] M.J. Maher. A transformation system for deductive
database modules with perfect model semantics. Theoretical Com-
puter Science, 110, 377-403, 1993.
[Maher, 1993b] M.J. Maher. A Logic Programming View of CLP, Proc.
10th International Conference on Logic Programming, pp. 737-753,
1993. Full version: IBM Research Report, T.J. Watson Research
Center.
[Maher and Stuckey, 1989] M.J. Maher and P.J. Stuckey. Expanding
Query Power in CLP Languages, Proc. North American Conference
on Logic Programming, pp. 20-36. 1989.
[Makanin, 1977] G.S. Makanin. The problem of solvability of equations in
a free semigroup. Math. USSR Sbornik, 32, 129-198,1977. (English
translation, AMS 1979).
[Mannila and Ukkonen, 1986] H. Mannila and E. Ukkonen. On the Com-
plexity of Unification Sequences, Proc. 3rd International Conference
on Logic Programming, pp. 122-133, 1986.
[Mantsivoda, 1993] A. Mantsivoda. Flang and its Implementation, Proc.
Symp. on Programming Language Implementation and Logic Pro-
gramming, LNCS 714, pp. 151-166, 1993.
[Marriott and Stuckey, 1993a] K.G. Marriott and P.J. Stuckey. The 3 R's
of optimizing constraint logic programs: Refinement, removal and
reordering, Proc. 20th ACM Symp. Principles of Programming Lan-
guages, pp. 334-344, 1993.
[Marriott and Stuckey, 1993b] K.G. Marriott and P.J. Stuckey. Semantics
of CLP Programs with Optimization, Technical Report, University
of Melbourne, 1993.
[Marriott et al., 1994] K. Marriott, H. Søndergaard, P.J. Stuckey and
R.H.C. Yap. Optimising Compilation for CLP(R), Proc. Australian
Computer Science Conf., pp. 551-560, 1994.
[Martin and Nipkow, 1989] U. Martin
and T. Nipkow, Boolean unification—the story so far. Journal of
Symbolic Computation, 7, 275-293, 1989.
[MathLab, 1983] MathLab. MACSYMA Reference Manual, The MathLab
Group, Laboratory for Computer Science, MIT, 1983.
[Michaylov, 1992] S. Michaylov. Design and Implementation of Practical
Constraint Logic Programming Systems, Ph.D. Thesis, Carnegie
Mellon University, Report CMU-CS-92-16, August 1992.
[Michaylov and Pfenning, 1993] S. Michaylov and F. Pfenning. Higher-
order Logic Programming as Constraint Logic Programming, Proc.
Workshop on Principles and Practice of Constraint Programming,
1993.
[Miller, 1991] D. Miller. A Logic Programming Language with Lambda-
abstraction, Function Variables, and Simple Unification, in Ex-
tensions of Logic Programming: International Workshop, Springer-
Verlag LNCS 475, 253-281, 1991.
[Miller and Nadathur, 1986] D. Miller and G. Nadathur. Higher-order
Logic Programming, Proc. 3rd International Conference on Logic
Programming, pp. 448-462, 1986.
[Montanari and Rossi, 1993] U. Montanari and F. Rossi. Graph rewriting
for a partial ordering semantics of concurrent constraint program-
ming. Theoretical Computer Science, 109, 225-256, 1993.
[Mukai, 1987] K. Mukai. Anadic Tuples in Prolog, Technical Report TR-
239, ICOT, Tokyo, 1987.
[Mumick et al, 1990] I.S. Mumick, S.J. Finkelstein, H. Pirahesh and R.
Ramakrishnan. Magic Conditions, Proc. 9th ACM Symp. on Prin-
ciples of Database Systems, pp. 314-330, 1990.
[Naish, 1985] L. Naish. Automating control for logic programs. Journal of
Logic Programming, 2, 167-183, 1985.
[Naish, 1986] L. Naish. Negation and Control in PROLOG, Lecture Notes
in Computer Science 238, Springer-Verlag, 1986.
[Nelson, 1985] G. Nelson. JUNO, a constraint based graphics system. Com-
puter Graphics, 19, 235-243, 1985.
[Nerode and Kohn, 1993] A. Nerode and W. Kohn. Hybrid Systems and
Constraint Logic Programming, Proc. 10th International Conference
on Logic Programming, pp. 18-24, 1993.
[Older and Benhamou, 1993] W. Older and F. Benhamou. Programming
in CLP(BNR), Proc. Workshop on Principles and Practice of Con-
straint Programming, pp. 239-249, 1993.
[Paterson and Wegman, 1978] M.S. Paterson and M.N. Wegman. Linear
unification. Journal of Computer and System Sciences, 16, 158-167,
1978.
[Pfenning, 1991] F. Pfenning. Logic programming in the LF logical frame-
work. In Logical Frameworks, G. Huet and G. Plotkin, eds. pp. 149-
181, Cambridge University Press, 1991.
[Podelski and van Roy, 1993] A. Podelski and P. van Roy. The Beauty and
the Beast Algorithm: Testing Entailment and Disentailment Incre-
mentally, draft manuscript, 1993.
[Przymusinski, 1988] T. Przymusinski. On the declarative semantics of de-
ductive databases and logic programs. In Foundations of Deductive
Databases and Logic Programming, J. Minker, ed. pp. 193-216. Mor-
gan Kaufmann, 1988.
[Rajasekar, 1993] A. Rajasekar. String Logic Programs, draft manuscript,
Dept. of Computer Science, Univ. of Kentucky, 1993.
[Ramachandran and van Hentenryck, 1993] V. Ramachandran and P. van
Hentenryck. Incremental Algorithms for Constraint Solving and En-
tailment over Rational Trees, Proc. 13th Conf. on Foundations of
Software Technology and Theoretical Computer Science, LNCS 761,
pp. 205-217, 1993.
[Ramakrishnan, 1991] R. Ramakrishnan. Magic templates: a spellbinding
approach to logic programs. Journal of Logic Programming, 11, 189-
216, 1991.
[Ramalingam and Reps, 1993] G. Ramalingam and T. Reps. A Catego-
rized Bibliography on Incremental Computation, Proc. 17th ACM
Symp. on Principles of Programming Languages, pp. 502-510, 1993.
[Rueher, 1993] M. Rueher. A first exploration of Prolog III's capabilities.
Software-Practice and Experience, 23, 177-200, 1993.
[Sagiv and Vardi, 1989] Y. Sagiv and M. Vardi. Safety of Datalog Queries
over Infinite Databases, Proc. ACM Symp. on Principles of Database
Systems, pp. 160-171, 1989.
[Sakai et al., to appear] K. Sakai, Y. Sato and S. Menju. Boolean Groebner
Bases, to appear.
[Saraswat, 1987] V. Saraswat. CP as a General-purpose Constraint-
language, Proc. AAAI-87, pp. 53-58, 1987.
[Saraswat, 1988] V. Saraswat. A Somewhat Logical Formulation of CLP
Synchronization Primitives, Proc. 5th International Conference
Symposium on Logic Programming, pp. 1298-1314, 1988.
[Saraswat, 1989] V. Saraswat. Concurrent Constraint Programming Lan-
guages, Ph.D. thesis, Carnegie-Mellon University, 1989. Revised ver-
sion appears as Concurrent Constraint Programming, MIT Press,
1993.
[Saraswat, 1992] V. Saraswat. The Category of Constraint Systems is
Cartesian-Closed, Proc. Symp. on Logic in Computer Science, pp.
341-345, 1992.
[Saraswat, to appear: 2] V. Saraswat. A Retrospective Look at Concurrent
Logic Programming, in preparation.
[Saraswat et al., 1988] V. Saraswat, D. Weinbaum, K. Kahn, and E.
Shapiro. Detecting Stable Properties of Networks in Concurrent
Logic Programming Languages, Proc. 7th. ACM Symp. Principles
of Distributed Computing, pp. 210-222, 1988.
[Saraswat et al., 1991] V. Saraswat, M. Rinard and P. Panangaden. Se-
mantic Foundation of Concurrent Constraint Programming, Proc.
18th ACM Symp. on Principles of Programming Languages, pp. 333-
352, 1991.
[Satoh and Aiba, 1993] K. Satoh and A. Aiba. Computing soft constraints
by hierarchical constraint logic programming. Journal of Informa-
tion Processing, 7, 1993,
[Scott, 1982] D. Scott. Domains for denotational semantics, Proc. ICALP,
LNCS 140, 1982.
[Shapiro, 1983a] E. Shapiro. A Subset of Concurrent Prolog and its Inter-
preter, Technical Report CS83-06, Dept of Applied Mathematics,
Weizmann Institute of Science, 1983.
[Shapiro, 1983b] E. Shapiro. Logic Programs with Uncertainties: A Tool
for Implementing Expert Systems, Proc. 8th. IJCAI, pp. 529-532,
1983.
[Shoenfield, 1967] J.R. Shoenfield. Mathematical Logic, Addison-Wesley,
1967.
[Siekmann, 1989] J. Siekmann. Unification theory. Journal of Symbolic
Computation, 7, 207-274, 1989.
[Simonis, 1989a] H. Simonis. Formal verification of multipliers. In Proceed-
ings of the IFIP TC10/WG10.2/WG10.5 Workshop on Applied For-
mal Methods for Correct VLSI Design, Leuven, Belgium, L.J.M.
Claesen, ed. November 1989.
[Simonis, 1989b] H. Simonis. Test Generation using the Constraint Logic
Programming language CHIP, Proc. 6th International Conference
on Logic Programming, 1989.
[Simonis and Dincbas, 1987a] H. Simonis and M. Dincbas. Using an ex-
tended Prolog for digital circuit design, IEEE International Work-
shop on AI Applications to CAD Systems for Electronics, Munich,
Germany, pp. 165-188, October 1987.
[Simonis and Dincbas, 1987b] H. Simonis and M. Dincbas. Using Logic
Programming for Fault Diagnosis in Digital Circuits, German Work-
shop on Artificial Intelligence (GWAI-87), Geseke, Germany, pp.
139-148, September 1987.
[Simonis and Dincbas, 1993] H. Simonis and M. Dincbas. Propositional
calculus problems in CHIP. In Constraint Logic Programming: Se-
lected Research, F. Benhamou and A. Colmerauer, eds. pp.269-285.
MIT Press, 1993.
[Simonis and Graf, 1990] H. Simonis and T. Graf. Technology Mapping in
CHIP, Technical Report TR-LP-44, ECRC, Munich, 1990.
[Simonis and Le Provost, 1989] H. Simonis and T. Le Provost. Circuit
verification in chip: Benchmark results. Proceedings of the IFIP
TC10/WG10.2/WG10.5 Workshop on Applied Formal Methods for
Correct VLSI Design, Leuven, Belgium, pp. 125-129, November
1989.
[Simonis et al, 1988] H. Simonis, H. N. Nguyen and M. Dincbas. Verifi-
cation of digital circuits using CHIP, Proceedings of the IFIP WG
10.2 International Working Conference on the Fusion of Hardware
Design and Verification, Glasgow, Scotland, July 1988.
[Skuppin and Buckle, 1992] R. Skuppin and T. Buckle. CLP and Space-
craft Attitude Control, Proc. JICSLP Workshop on Constraint Logic
Programming, pp. 45-54, 1992.
[Smolka, 1991] G. Smolka. Residuation and guarded rules for constraint
logic programming. In Constraint Logic Programming: Selected Re-
search, F. Benhamou and A. Colmerauer, eds. pp. 405-419. MIT
Press, 1993.
[Smolka and Treinen, 1992] G. Smolka and R. Treinen. Records for logic
programming. Journal of Logic Programming, to appear. Also in:
Proceedings of the Joint International Conference and Symposium
on Logic Programming, pp. 240-254, 1992.
[Srivastava, 1993] D. Srivastava. Subsumption and indexing in constraint
query languages with linear arithmetic constraints. Annals of Math-
ematics and Artificial Intelligence, 8, 315-343, 1993.
[Srivastava and Ramakrishnan, 1992] D. Srivastava and R. Ramakrishnan.
Pushing constraint selections. Journal of Logic Programming, 16,
361-414, 1993.
[Stallman and Sussman, 1977] R.M. Stallman and G.J. Sussman. Forward
reasoning and dependency directed backtracking in a system for
computer aided circuit analysis. Artificial Intelligence, 9, 135-196,
1977.
[Steele and Sussman, 1980] G. Steele and
G.J. Sussman. CONSTRAINTS—a language for expressing almost
hierarchical descriptions. Artificial Intelligence, 14, 1-39, 1980.
[Steele, 1980] G.L. Steele. The Implementation and Definition of a Com-
puter Programming Language Based on Constraints, Ph.D. Disser-
tation (MIT-AI TR 595), Dept. of Electrical Engineering and Com-
puter Science, M.I.T. 1980.
[Sthanusubramonian, 1991] T. Sthanusubramonian. A Transformational
Approach to Configuration Design, Master's thesis, Engineering De-
sign Research Center, Carnegie Mellon University, 1991.
[Stickel, 1984] M. Stickel. Automated deduction by theory resolution. Jour-
nal of Automated Reasoning, 1, 333-355, 1984.
[Stuckey, 1989] P.J. Stuckey. Incremental linear constraint solving and de-
tection of implicit equalities. ORSA Journal on Computing, 3, 269-
274, 1991.
[Stuckey, 1991] P. Stuckey. Constructive Negation for Constraint Logic
Programming, Proc. Logic in Computer Science Conference, pp.
328-339, 1991. Full version in Information and Computation.
[Subramanian and Wang, 1993] D. Subramanian and C-S. Wang. Kine-
matic synthesis with configuration spaces, Proc. Qualitative Rea-
soning 1993, D. Weld, ed. pp. 228-239. 1993.
[Sutherland, 1963] I. Sutherland. A Man-Machine Graphical Communi-
cation System, PhD thesis, Massachusetts Institute of Technology,
January 1963.
[Tarski, 1951] A. Tarski. A Decision Method for Elementary Algebra and
Geometry, University of California Press, 1951.
[Taylor, 1990] A. Taylor. LIPS on a MIPS: Results from a Prolog Com-
piler for a RISC, Proceedings 7th International Conference on Logic
Programming, pp. 174-185, 1990.
[Terasaki et al., 1992] S. Terasaki, D.J. Hawley, H. Sawada, K. Satoh, S.
Menju, T. Kawagishi, N. Iwayama and A. Aiba. Parallel Constraint
Logic Programming Language GDCC and its Parallel Constraint
Solvers, Proc. International Conference on Fifth Generation Com-
puter Systems 1992, Volume I, pp. 330-346, 1992.
[Tick, 1993] E. Tick. The Deevolution of Concurrent Logic Programming
Languages, draft manuscript, 1993.
[Tobias, 1988] J.C. Tobias, II. Knowledge Representation in the Harmony
intelligent tutoring system, Master's thesis, Department of Com-
puter Science, University of California at Los Angeles, 1988.
[Tong and Leung, 1993] B.M. Tong and H.F. Leung. Concurrent Con-
straint Logic Programming on Massively Parallel SIMD Computers,
Proc. International Logic Programming Symposium, pp. 388-402,
1993.
[Tsang, 1993] E. Tsang. Foundations of Constraint Satisfaction, Academic
Press, 1993.
[van Emden, 1986] M. van Emden. Quantitative deduction and its fixpoint
theory. Journal of Logic Programming, 37-53, 1986.
[van Gelder, 1988] A. van Gelder. Negation as failure using tight deriva-
tions for general logic programs. In Foundations of Deductive
Databases and Logic Programming, J. Minker, ed. pp. 149-176. Mor-
gan Kaufmann, 1988.
[van Gelder et al., 1988] A. van Gelder, K. Ross and J.S. Schlipf. Un-
founded sets and well-founded semantics for general logic programs.
Journal of the ACM, 38, 620-650, 1991.
[van Hentenryck, 1989a] P. van Hentenryck. Constraint Satisfaction in
Logic Programming, MIT Press, 1989.
[van Hentenryck, 1989b] P. van Hentenryck. Parallel Constraint Satisfac-
tion in Logic Programming: Preliminary Results of CHIP within
PEPSys, Proc. 6th International Conference on Logic Programming,
pp. 165-180, 1989.
[van Hentenryck, 1991] P. van Hentenryck. Constraint logic programming.
The Knowledge Engineering Review, 6, 151-194, 1991.
[van Hentenryck, 1992] P. van Hentenryck. Constraint satisfaction using
constraint logic programming. Artificial Intelligence, 58, 113-159,
1992.
[van Hentenryck, 1993] P. van Hentenryck, ed. Special issue on constraint
logic programming. Journal of Logic Programming, 16, 1993.
[van Hentenryck and Deville, 1991a] P. van Hentenryck and Y. Deville.
The Cardinality Operator: A New Logical Connective and its Appli-
cation to Constraint Logic Programming, Proc. International Con-
ference on Logic Programming, pp. 745-759, 1991.
[van Hentenryck and Deville, 1991b] P. van Hentenryck and Y. Deville.
Operational Semantics of Constraint Logic Programming over Finite
Domains, Proc. Symp. on Programming Language Implementation
and Logic Programming, LNCS 528, pp. 395-406, 1991.
[van Hentenryck and Graf, 1991] P. van Hentenryck and T. Graf. Standard
forms for rational linear arithmetics in constraint logic program-
ming. Annals of Mathematics and Artificial Intelligence, 5, 303-319,
1992.
[van Hentenryck et al., 1991] P. van Hentenryck, V. Saraswat and Y. Dev-
ille. Constraint Processing in cc(FD), manuscript, 1991.
[van Hentenryck et al., 1993] P. van Hentenryck, V. Saraswat and Y. Dev-
ille. Design, Implementations and Evaluation of the Constraint Lan-
guage cc(FD), Technical Report CS-93-02, Brown University, 1993.
[van Roy and Despain, 1990] P. van Roy and A.M. Despain. The Benefits
of Global Dataflow Analysis for an Optimizing Prolog Compiler,
Proceedings 1990 North American Conference on Logic Program-
ming, pp. 501-515, 1990.
[Veron et al., 1993] A. Veron, K. Schuerman, M. Reeve and L.L. Li. Why
and How in the ElipSys OR-parallel CLP system, Proc. Conf. on
Parallel Architectures and Languages Europe, pp. 291-303, 1993.
[Vitter and Flajolet, 1990] J.S. Vitter and Ph. Flajolet. Average-case
Analysis of Algorithms and Data Structures, Handbook of Theoret-
ical Computer Science, Vol. A, pp. 431-524. Elsevier Science Pub-
lishers, Amsterdam, 1990.
[Voda, 1988a] P. Voda. The Constraint Language Trilogy: Semantics and
Computations, Technical Report, Complete Logic Systems, 1988.
[Voda, 1988b] P. Voda. Types of Trilogy, Proc. 5th International Confer-
ence on Logic Programming, pp. 580-589, 1988.
[Walinsky, 1989] C. Walinsky. CLP(Σ*): Constraint Logic Programming
with Regular Sets, Proc. 6th International Conference on Logic Pro-
gramming, pp. 181-196, 1989.
[Wallace, 1989] M. Wallace. A computable semantics for general logic pro-
grams. Journal of Logic Programming, 6, 269-297, 1989.
[Wallace, 1993] M. Wallace. Applying constraints for scheduling. In Con-
straint Programming, B. Mayoh, E. Tyugu and J. Penjaam, eds.
NATO Advanced Science Institute Series, Springer-Verlag, 1994.
[Warren, 1983] D.H.D. Warren. An Abstract PROLOG Instruction Set,
Technical note 309, AI Center, SRI International, Menlo Park (Oc-
tober 1983).
[Warren, 1987] D.H.D. Warren. The Andorra Principle, presented at the
Gigalips Workshop, 1987.
[Wilson and Borning, 1993] M. Wilson and A. Borning. Hierarchical con-
straint logic programming. Journal of Logic Programming, 16, 277-
318, 1993.
[Yap, 1991] R.H.C. Yap. Restriction Site Mapping in CLP(R), Proceedings
8th International Conference on Logic Programming, pp. 521-534.
MIT Press, June 1991.
[Yap, 1993] R.H.C. Yap. A constraint logic programming framework
for constructing DNA restriction maps. Artificial Intelligence in
Medicine, 5, 447-464, 1993.
[Yap, 1994] R.H.C. Yap. Contributions to CLP(R), Ph.D. thesis, Depart-
ment of Computer Science, Monash University, January 1994 (ex-
pected).
Transformation of Logic Programs
Alberto Pettorossi and Maurizio Proietti

Contents
1 Introduction 697
2 A preliminary example 701
3 Transformation rules for logic programs 704
3.1 Syntax of logic programs 704
3.2 Semantics of logic programs 706
3.3 Unfold/fold rules 707
4 Correctness of the transformation rules 715
4.1 Reversible transformations 716
4.2 A derived goal replacement rule 719
4.3 The unfold/fold proof method 721
4.4 Correctness results for definite programs 723
4.5 Correctness results for normal programs 736
5 Strategies for transforming logic programs 742
5.1 Basic strategies 745
5.2 Techniques which use basic strategies 747
5.3 Overview of other techniques 760
6 Partial evaluation and program specialization 764
7 Related methodologies for program development 771

1 Introduction
Program transformation is a methodology for deriving correct and efficient
programs from specifications.
In this chapter, we will look at the so called 'rules + strategies' ap-
proach, and we will report on the main techniques which have been intro-
duced in the literature for that approach, in the case of logic programs. We
will also present some examples of program transformation, and we hope
that through those examples the reader may acquire some familiarity with
the techniques we will describe.
The program transformation approach to the development of programs
was first advocated in the case of functional languages by Burstall and
Darlington [1977]. In that seminal paper the authors give a comprehensive
account of some basic transformation techniques which they had already
presented in [Darlington, 1972; Burstall and Darlington, 1975].
Similar techniques were also developed in the case of logic languages
by Clark and Sickel [1977], and Hogger [1981], who investigated the use of
predicate logic as a language for both program specification and program
derivation.
In the transformation approach the task of writing a correct and efficient
program is realized in two phases. The first phase consists in writing an
initial, maybe inefficient, program whose correctness can easily be shown,
and the second phase, possibly divided into various subphases, consists
in transforming the initial program with the objective of deriving a new
program which is more efficient.
The separation of the correctness concern from the efficiency concern is
one of the major advantages of the transformation methodology. Indeed,
using this methodology one may avoid some difficulties often encountered
in other approaches. One such difficulty, which may occur when following
the stepwise refinement approach, is the design of the invariant assertions,
which may be quite intricate, especially when developing very efficient pro-
grams.
The experience gained during the past two decades or so shows that the
methodology of program transformation is very valuable and attractive, in
particular for the task of programming 'in the small', that is, for writing
single modules of large software systems.
Program transformation also has the advantage of being adaptable to
various programming paradigms, and although in this chapter we will focus
our attention on the case of logic languages, in the relevant literature one
can find similar results for the case of functional languages, and also for
the case of imperative and concurrent languages.
The basic idea of the program transformation approach can be picto-
rially represented as in Fig. 1. From the initial program P0, which can be
viewed as the initial specification, we want to obtain a final program Pn
with the same semantic value, that is, we want that SEM[Po] = SEM[Pn]
for some given semantic function SEM. The program Pn is often de-
rived in various steps, that is, by constructing a sequence P0 , . . . , Pn of
programs, called a transformation sequence, such that for 0 ≤ i < n,
SEM[Pi] = SEM[Pi+1], where Pi+1 is obtained from Pi by applying a
transformation rule.
In principle, one might obtain a program Pn such that SEM[P0] =
SEM[Pn] by deriving intermediate programs whose semantic value is com-
pletely unrelated to SEM[P0]. This approach, however, has not been fol-
lowed in practice, because it enlarges in an unconstrained way the search
space to be explored when looking for those intermediate programs.
Sometimes, if the programs are nondeterministic, as is the case for
Fig. 1. The program transformation idea: from program P0 we derive
program Pn preserving the semantic value V.

most logic programs which produce a set of answers for any input query,
we may allow transformation steps which are partially correct, but not
totally correct, in the sense that for 0 ≤ i < n and for every input query
Q, SEM[Pi+1, Q] ⊆ SEM[Pi, Q]. (Here, and in what follows, the semantic
function SEM is assumed to depend both on the program and the input
query.)
As already mentioned, during the program transformation process one
is interested in reducing the complexity of the derived program w.r.t. the
initial program. This means that for the sequence P0 , . . . , Pn of programs
there exists a cost function C which measures the computational complexity
of the programs, such that C(P0) > C(Pn).
Notice that we may allow ourselves to derive a program, say Pi, for
some i > 0, such that C(P0) < C(Pi), because subsequent transformations
may lead us to a program whose cost is smaller than the one of P0. Un-
fortunately, there is no general theory of program transformations which
deals with this situation in a satisfactory way in all possible circumstances.
The efficiency improvement from program P0 to program Pn is not
ensured by an undisciplined use of the transformation rules. This is the
reason why we need to introduce the so-called transformation strategies,
that is, meta-rules which prescribe suitable sequences of applications of the
transformation rules.
In logic programming there are many notions of efficiency which have
been used. They are related either to the size of the proofs or to the
machine model which is assumed for the execution of programs. In what
follows we will briefly explain how the strategies which have been proposed
in the literature may improve program efficiency, and we will refer to the
original papers for more details on these issues.
So far we have indicated two major objectives of the program trans-
formation approach, namely the preservation of the semantic value of the
initial program and the reduction of the computational complexity of the
derived program w.r.t. the initial one.
There is a third important objective which is often given less attention:
the formalization of the program transformation process itself. The need
for this formalization derives from the desire of making the program trans-
formation approach a powerful programming methodology. In particular,
via this formalization it is possible for the programmer to perform 'similar'
transformations when starting from 'similar' initial programs, thus avoid-
ing the difficulty of deciding which transformation rule should be applied
at every step. It is also possible to make alternative program transforma-
tions by considering any program of a previously constructed sequence of
programs and deriving from it a different sequence by applying different
transformation rules.
The formalization of the program transformation process allows us to
define various transformation strategies and, through them, to give sugges-
tions to the programmer on how to transform the program at hand on the
basis of the sequence of programs constructed so far.
However, it is not always simple to derive from a given program trans-
formation sequence the strategies which can successfully be applied to sim-
ilar initial programs. Research efforts are currently being made in this
direction.
We often refer to the above three major objectives of the program
transformation approach, that is, the preservation of semantics, the im-
provement of complexity, and the formalization of the transformation pro-
cess itself, as the transformation triangle, with the following three vertices:
(i) semantics, (ii) complexity, and (iii) methodology.
Finally, the program transformation methodology should be supported
by an automatic (or semiautomatic) system, which both guides the pro-
grammer when applying the transformation rules, and also acquires, in the
form of transformation strategies, some knowledge about successful trans-
formation sequences, while it is in operation.
Together with the 'rules + strategies' approach to program transfor-
mation, in the literature one also finds the so-called 'schemata' approach.
We will not establish here the theoretical difference between these two ap-
proaches, and indeed that difference is mainly based on pragmatic issues.
We will briefly illustrate the schemata approach in Section 5.3.
In Section 2 we will present a preliminary example of logic program
transformation. It will allow the reader to have a better understanding
of the various transformation techniques which we will introduce later. In
Section 3 we will describe the transformation rules for logic programs, and
in Section 4 we will study their correctness w.r.t. the various semantics
which may be preserved. Section 5 is devoted to the introduction of the
transformation strategies which are used for guiding the application of the
rules. Then, in Section 6 we will consider the partial evaluation technique,
and finally, in Section 7 we will briefly indicate some methodologies for
program derivation which are related to program transformation.

2 A preliminary example
The 'rules + strategies' approach to program transformation, as it was
first introduced in [Burstall and Darlington, 1977] for recursive equation
programs, is based on the use of two elementary transformation rules: the
unfolding rule and the folding rule.
The unfolding rule consists in replacing an instance of the left hand side
of a recursive equation by the corresponding instance of the right hand side.
This rule corresponds to the 'replacement rule' used in [Kleene, 1971] for
the computation of recursively defined functions. The application of the
unfolding rule can also be viewed as a symbolic computation step.
The folding rule consists in replacing an instance of the right hand side
of a recursive equation by the corresponding instance of the left hand side.
Folding can be viewed as the inverse of unfolding, in the sense that, if we
perform an unfolding step followed by a folding step, we get back the initial
expression. Vice versa, unfolding can be viewed as the inverse of folding.
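As a simple illustration in the setting of recursive equation programs, consider a program containing the two equations double(0) = 0 and double(s(X)) = s(s(double(X))) (the function double is used here purely for illustration). An unfolding step may replace an occurrence of the instance double(s(0)) in the right hand side of some equation of the program by s(s(double(0))), and a subsequent folding step may replace s(s(double(0))) by double(s(0)), thereby restoring the original expression.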
The reader who is not familiar with the transformation methodology,
may wonder about the usefulness of performing a folding step, that is,
of inverting a symbolic computation step, when one desires to improve
program efficiency. However, as we will see in some examples below, the
folding rule allows us to modify the recursive structure of the programs to
be transformed and, by doing so, we will often be able to achieve substantial
efficiency improvements.
Program derivation techniques following the 'rules + strategies' ap-
proach, have been presented in the context of logic programming in [Clark
and Sickel, 1977; Hogger, 1981], where the basic derivation rules consist of
the substitution of a formula by an equivalent formula.
Tamaki and Sato [1984] have adapted the unfolding and folding rules to
the case of logic programs. Following the basic ideas relative to functional
programs, they take an application of the unfolding rule to be equivalent
to a computation step, that is, an application of SLD-resolution, and the
folding rule to be the inverse of unfolding.
As already mentioned, during the transformation process we want to
keep unchanged, at least in a weak sense, the semantic value of the pro-
grams which are derived, and in particular, we want the final program to
be partially correct w.r.t. the initial program.
If from a program P0 we derive by unfold/fold transformations a pro-
gram P1, then the least Herbrand model of P1, as defined in [van Emden and
Kowalski, 1976], is contained in the least Herbrand model of P0 [Tamaki
and Sato, 1984]. Thus, the unfold/fold transformations are partially correct
w.r.t. the least Herbrand model semantics.
In general, unfold/fold transformations are not totally correct w.r.t. the
least Herbrand model semantics, that is, the least Herbrand model of P0
may not be contained in the one of P1. In order to get total correctness
one has to comply with some extra conditions [Tamaki and Sato, 1984].
The study of the various semantics which are preserved when using the
unfold/fold transformation rules will be the objective of Section 4.
Let us now consider a preliminary example of program transformation
where we will see in action some of the rules and strategies for transforming
logic programs. In this example, together with the unfolding and folding
rules, we will also see the use of two other transformation rules, called
definition rule and goal replacement rule, and the use of a transformation
strategy, called tupling strategy.
As already mentioned, the need for strategies which drive the appli-
cation of the transformation rules and improve efficiency comes from the
fact that folding is the inverse of unfolding, and thus we may construct
a useless transformation sequence where the final program is equal to the
initial program.
Let us consider the following logic program P0 for testing whether or
not a given list is a palindrome:
1. pal([]) <-
2. pal([H]) <-
3. pal([H|T]) <- append(Y,[H],T), pal(Y)
4. append([],Y,Y) <-
5. append([H|X],Y,[H|Z]) <- append(X,Y,Z)
We have that, given the lists X, Y, and Z, append(X,Y,Z) holds in the
least Herbrand model of P0 iff Z is the concatenation of X and Y.
Both pal(Y) and append(Y,[H],T) visit the same list Y and we may
avoid this double visit by applying the tupling strategy which suggests the
introduction of the following clause for the new predicate newp:
6. newp(L,T) <- append(Y,L,T), pal(Y)
Actually, clause 6 has been obtained by a simultaneous application of
the tupling strategy and the so-called generalization strategy, in the sense
that in the body of clause 3 the argument [H] has been generalized to the
variable L. In Section 5, we will consider the tupling and the generaliza-
tion strategies and we will indicate in what cases they may be useful for
improving program efficiency.
By adding clause 6 to P0 we get a new program P1 which is equivalent to
P0 w.r.t. all predicates occurring in the initial program P0, in the sense that
each ground atom q(. . .), where q is a predicate occurring in P0, belongs
to the least Herbrand model of P0 iff q(. . .) belongs to the least Herbrand
model of P1.
In order to avoid the double occurrence of the list Y in the body
of clause 3, we now fold that clause using clause 6, that is, we replace
'append(Y, [H], T), pal(Y)' which is an instance of the body of clause 6, by
the corresponding instance 'newp([H],T)' of the head of clause 6. Thus,
we get:
3f. pal([H|T]) <- newp([H],T)
This folding step is the inverse of the step of unfolding clause 3f w.r.t.
newp([H],T).
Unfortunately, if we use the program made out of clauses 1, 2, 3f, 4, 5,
and 6, we do not avoid the double visit of the input list, because newp is
defined in terms of the two predicates append and pal. As we will show,
a gain in efficiency is possible if we derive a definition of newp in terms of
newp itself. This recursive definition of newp can be obtained as follows.
We first unfold clause 6 w.r.t. pal(Y), that is, we derive the following
three resolvents of clause 6 using clauses 1, 2, and 3, respectively:
7. newp(L,T) <- append([],L,T)
8. newp(L,T) <- append([H],L,T)
9. newp(L,T) <- append([H|Y],L,T), append(R,[H],Y), pal(R)
We then unfold clauses 7, 8, and 9 w.r.t. the atoms append([],L,T),
append([H],L,T), and append([H|Y],L,T), respectively, and we get
10. newp(L,L) <-
11. newp(L,[H|L]) <-
12. newp(L,[H|U]) <- append(Y,L,U), append(R,[H],Y), pal(R)
Now, in order to get a recursive definition of the predicate newp where
no multiple visits of lists are performed, we would like to fold the entire
body of clause 12 using clause 6, and in order to do so we need to have only
one occurrence of the atom append. We can do so by applying the goal
replacement rule (actually, the version formalized by rule R5.1 on page 713)
which allows us to replace the goal 'append(Y,L,U), append(R,[H],Y)'
by the equivalent goal 'append(R,[H|L],U)'. Thus, we get the following
clause:
13. newp(L,[H|U]) <- append(R,[H|L],U), pal(R)
We can fold clause 13 using clause 6, and we get:
13f. newp(L,[H|U]) <- newp([H|L],U)
Having derived a recursive definition of newp, the transformation pro-
cess is completed. The final program we have obtained is as follows:
1. pal([]) <-
2. pal([H]) <-
3f. pal([H|T]) <- newp([H],T)
4. append([],Y,Y) <-
5. append([H|X],Y,[H|Z]) <- append(X,Y,Z)
10. newp(L,L) <-
11. newp(L,[H|L]) <-
13f. newp(L,[H|U]) <- newp([H|L],U)
In this final program no double visits of lists are performed, and the
time complexity is improved from O(n^2) to O(n), where n is the size of the
input list. The initial and final programs have the same least Herbrand
model semantics w.r.t. the predicates pal and append.
Notice that if we are interested in the computation of the predicate
pal only, in the final program we can discard clauses 4 and 5, which are
unnecessary.
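As an illustration of how the final program computes, consider, for instance, the query <- pal([a,b,a]). By clause 3f we get the goal <- newp([a],[b,a]), which succeeds by clause 11 with H = b and L = [a]. Similarly, for the query <- pal([a,b,b,a]) we get <- newp([a],[b,b,a]) by clause 3f, then <- newp([b,a],[b,a]) by clause 13f, and this last goal succeeds by clause 10. In both cases the input list is traversed only once.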
The crucial step in the above program transformation, which improves
the program performance, is the introduction of clause 6 defining the new
predicate newp. In the literature that step is referred to as a eureka step
and the predicate newp is also called a eureka predicate.
It can easily be seen that eureka steps cannot, in general, be mechani-
cally performed, because they require a certain degree of ingenuity. There
are, however, many cases in which the synthesis of eureka predicates can
be performed in an automatic way, and this is the reason why in practice
the use of the program transformation methodology is very powerful.
In Section 5, we will consider the problem of inventing the eureka pred-
icates and we will see that it can often be solved on the basis of syntactical
properties of the program to be transformed by applying suitable transfor-
mation strategies.

3 Transformation rules for logic programs


In this section, we will present the most frequently used transformation
rules for logic programs considered in the literature.
As already mentioned, the rules for transforming logic programs are
directly derived from those used in the case of functional programs, but
in adapting those rules, care has been taken, because logic programs com-
pute relations rather than functions, and the nondeterminism inherent in
relations does affect the various transformation techniques.
Moreover, for logic programs a rich variety of semantics can be defined
and the choice of a particular semantics to be preserved affects the trans-
formation rules to be used.
These facts motivate the large amount of research work which has been
devoted to the extension of the transformation methodology to logic pro-
grams. Surveys of the work on logic program transformation have appeared
in [Shepherdson, 1992; Pettorossi and Proietti, 1994].

3.1 Syntax of logic programs


Let us first briefly recall the syntax of logic programs and let us introduce
the terminology which we will use. For other notions concerning logic
programming not explicitly stated here we will refer to [Lloyd, 1987]. We
assume that all our logic programs are written using symbols taken from
a fixed language L. The Herbrand universe and the Herbrand base are
constructed out of L, independently of the programs. This assumption
which is sometimes made in the theory of logic programs (see, for instance,
[Kunen, 1989]), in our case is motivated by the convenience of having a
constant Herbrand universe while transforming programs.
An atom is a formula of the form: p(t1, . . . ,tn) where p is an n-ary
predicate symbol taken from L, and t1 , . . . , tn are terms constructed out of
variables, constants, and function symbols in L. A literal is either a positive
literal, that is, an atom, or a negative literal, that is, a formula of the form:
¬A, where A is an atom.
A goal is a finite, possibly empty, sequence of literals not necessarily
distinct. For clarity, a goal may also be written between round parentheses.
In particular, if L1 ≠ L2 the goals (L1,L2) and (L2,L1) are different, even
though their semantics may be the same. Commas will be used to denote
the associative concatenation of goals. Thus, (L1, . . . , Lm), (Lm+1, . . . , Ln)
is equal to (L1, . . . , Lm, Lm+1, . . . , Ln).
A clause is a formula of the form: H <- B, where the head H is an
atom and the body B is a (possibly empty) goal. The head and the body
of a clause C are denoted by hd(C) and bd(C), respectively.
A logic program is a finite sequence (not a set) of clauses.
A query is a formula of the form: <- G, where G is a (possibly empty)
goal. Notice that our notion of query corresponds to that of goal considered
in [Lloyd, 1987].
Goals, clauses, logic programs, and queries are called definite goals,
definite clauses, definite logic programs, and definite queries, respectively,
if no negative literals occur in them. When we want to stress the fact that
occurrences of negative literals are allowed, we will follow [Lloyd, 1987] and
use the qualification normal. Thus, 'normal goal', 'normal clause', 'normal
program', and 'normal query' are synonyms of 'goal', 'clause', 'program',
and 'query', respectively. We will feel free to omit both qualifications 'defi-
nite' and 'normal' when they are irrelevant or understood from the context.
Given a term t, the set of variables occurring in t is denoted by vars(t).
A similar notation will also be adopted for the variables occurring in literals,
goals, clauses, and queries.
A substitution is a finite mapping from variables to terms of the form:
{X1/t1, . . . , Xn/tn}. The application of a substitution θ to a term t will
be denoted by tθ. Similar notation will be used for substitutions applied
to literals, goals, clauses, and queries.
For other notions related to substitutions, such as instance and most
general unifier, we refer to [Apt, 1990].
A variable renaming is a bijection from the set of variables of the lan-
guage L onto itself. We assume that the variables occurring in a clause
can be freely renamed, as usually done for bound variables in quantified
formulas. This is required to avoid clashes of names, as, for instance, when
performing resolution steps. Two clauses which differ only for a variable
renaming are called variants.
For reasons of simplicity, we will identify any two computed answer
substitutions obtained by resolution steps which differ only by the renaming
of the clauses involved.
By a predicate renaming we mean a bijective mapping of the set of
predicate symbols of L onto itself. Given a predicate renaming p and a
program P, by p(P) we denote the program obtained by replacing each
predicate symbol p in P by p(p). A similar notation will be adopted for
the predicate renaming in queries.
Given a predicate p occurring in a logic program P, the definition of p
in P is the subsequence of all clauses in P whose head predicate is p.
We will say that a predicate p depends on a predicate q in the program
P iff either there exists in P a clause of the form: p(. . .) <- B such that q
occurs in the goal B or there exists in P a predicate r such that p depends
on r in P and r depends on q in P.
We say that a query (or a goal) Q depends on a clause of the form:
p(. . .) <- B in a program P iff either p occurs in Q or there exists a predicate
occurring in Q which depends on p. The relevant part Prel (Q) of program
P for the query (or the goal) Q is the subsequence of clauses in P on which
Q depends.
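For instance, if P is the program consisting of the four clauses p <- q, q <- , r <- s, and s <- (where p, q, r, and s are predicates used here only for illustration), then p depends on q and does not depend on r or s. Thus, the relevant part Prel(<- p) of P for the query <- p is the subsequence made out of the clauses p <- q and q <- .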

3.2 Semantics of logic programs


When deriving new programs from old ones, we need to take into account
the semantics that is preserved. For the formal definition of the semantics
of a logic program we explicitly consider the dependency on the input query,
and thus we define a semantics of a set P of logic programs w.r.t. a set
Q of queries to be a function SEM: P × Q -> (D, ≤), where (D, ≤) is a
partially ordered set.
We will assume that every semantics SEM we consider is preserved by
predicate renaming, that is, for every predicate renaming p, program P,
and query Q, SEM[P,Q] = SEM[p(P),p(Q)].
We will also assume that every semantics SEM is preserved by inter-
changing the order of two adjacent clauses with different head predicates.
Thus, we may always assume that all clauses constituting the definition of
a predicate are adjacent.
We say that two programs P1 and P2 in P are equivalent w.r.t. the
semantics function SEM and the set Q of queries iff for every query Q in
Q we have that SEM[P1,Q] = SEM[P2,Q].
An example of semantics function can be provided by taking P to be
the set of definite programs (denoted by P + ), Q the set of definite queries
(denoted by Q + ), and D the powerset of the set of substitutions (denoted
by P(Subst)), ordered by set inclusion.
We now define the least Herbrand model semantics as the function
LHM: P+ × Q+ -> (P(Subst), ⊆) such that, for every P ∈ P+ and
for every <- G ∈ Q+,
LHM[P, <- G] = {θ | every ground instance of Gθ
is a logical consequence of P}
where we identify the program P and the goal G with the logical formulas
obtained by interpreting sequences of clauses (or atoms) as conjunctions
and interpreting <- as the implication connective. As usual, the empty
sequence of atoms in the body of a clause is interpreted as true.
The least Herbrand model M(P) of a definite program P defined ac-
cording to [van Emden and Kowalski, 1976] can be expressed in terms of
the function LHM as follows:

M(P) = {A | A is a ground atom and LHM[P, <- A] = Subst}.

Thus, two programs P1 and P2 in P+ are equivalent w.r.t. LHM iff
M(P1) = M(P2).
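For instance, if P is the program consisting of the two clauses q(a) <- and p(X) <- q(X) (an illustrative program), then M(P) = {q(a), p(a)}, the substitution {X/a} belongs to LHM[P, <- p(X)], and {X/b} does not belong to LHM[P, <- p(X)] for any constant b of L distinct from a.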
In order to state various correctness results concerning our transforma-


tion rules w.r.t. different semantics functions (see Section 4), we need the
following notion of relevance [Dix, 1995].
Definition 3.2.1 (Relevance). A semantics function SEM: P × Q ->
(D, ≤) is relevant iff for every program P in P and query Q in Q, we have
that SEM[P,Q] = SEM[P r e l (Q),Q].
Thus, a semantics is relevant iff its value for a given program and a
given query is determined only by the query and the clauses on which the
query depends.
The least Herbrand model semantics and many other semantics we will
consider are relevant, but well-known semantics, such as Clark's comple-
tion [Lloyd, 1987] and the stable model semantics [Gelfond and Lifschitz,
1988], are not.

3.3 Unfold/fold rules


As already mentioned, the program transformation process starting from a
given initial program P0 can be viewed as a sequence of programs P0, . . . , Pk,
called a transformation sequence [Tamaki and Sato, 1984], such that pro-
gram Pj+1, with 0 ≤ j ≤ k - 1, is obtained from program Pj by the
application of a transformation rule, which may depend on P0, . . . , Pj. An
application of a transformation rule is also called a transformation step.
Since most transformation rules can be viewed as the replacement of a
given clause C by some new clauses C1 , . . . , Cn, the transformation process
can also be represented by means of trees of clauses [Pettorossi and Proietti,
1989], where the clauses C1, . . . , Cn are viewed as the son-nodes of C. This
tree-based representation will be useful for describing the transformation
strategies (see Section 5).
The transformation rules we will present in this chapter are collectively
called unfold/fold rules and they are a generalization of those introduced
by [Tamaki and Sato, 1984]. Several special cases of these rules will be
introduced in the subsequent sections, when discussing the correctness of
the transformation rules w.r.t. different semantics of logic programs.
In the presentation of the rules we will refer to the transformation se-
quence P0, . . . , Pk and we will assume that the variables of the clauses which
are involved in each transformation rule are suitably renamed so that they
do not have variables in common.
Rule R1. Unfolding. Let Pk be the program E1, . . . , Er, C, Er+1, . . . , Es
and C be the clause H <- F, A, G, where A is a positive literal and F and
G are (possibly empty) goals. Suppose that
1. D1, . . . , Dn, with n > 0, is the subsequence of all clauses of a pro-
gram Pj, for some j, with 0 ≤ j ≤ k, such that A is unifiable
with hd(D1), . . . , hd(Dn), with most general unifiers θ1, . . . , θn, re-
spectively, and
2. Ci is the clause (H <- F, bd(Di), G)θi, for i = 1, . . . , n.
If we unfold C w.r.t. A using Pj we derive the clauses C1, . . . , Cn and we
get the new program Pk+1 = E1, . . . , Er, C1, . . . , Cn, Er+1, . . . , Es.
The unfolding rule corresponds to the application of a resolution step
to clause C with the selection of the positive literal A and the input clauses
D1, . . . , Dn.
Example 3.3.1. Let Pk be the following program:
p(X) <- q(t(X)), r(X), r(b)
q(a) <-
q(t(b)) <-
q(X) <- r(X)
Then, by unfolding p(X) <- q(t(X)), r(X), r(b) w.r.t. q(t(X)) using Pk
itself we derive the following program Pk+1:
p(b) <- r(b), r(b)
p(X) <- r(t(X)), r(X), r(b)
q(a) <-
q(t(b)) <-
q(X) <- r(X)
Remark 3.3.2. There are two main differences between the unfolding rule
in the case of logic programs and the unfolding rule in the case of functional
programs.
The first difference is that, when we unfold a clause C w.r.t. an atom A
in bd(C) using a program Pj, it is not required that A be an instance of the
head of a clause in Pj. We only require that A be unifiable with the head
of at least one clause in Pj. This is related to the fact that a resolution
step produces a unifying substitution, not a matching substitution, as it
happens in a rewriting step for functional programs.
The second difference is that in the functional case it is usually assumed
that the equations defining a program are mutually exclusive. Thus, by
unfolding a given equation we may get at most one new equation. On the
other hand, in the logic case there may be several clauses in a program Pj
whose heads are unifiable with an atom A in the body of a clause C. As a
result, by unfolding C w.r.t. A, we may derive more than one clause.
The unfolding rule is one of the basic transformation rules in all trans-
formation systems for logic programs proposed in the literature. Our pre-
sentation of the rule follows [Tamaki and Sato, 1984] where, however, it is
required that the program used for unfolding a clause is the one where this
clause occurs, that is, with reference to our rule R1, Pj is required to be
Pk.
Some derivation rules for logic programs similar to the unfolding rule
have been considered in [Komorowski, 1982] and in [Clark and Sickel, 1977;
Hogger, 1981], in the context of partial evaluation and program synthesis
(see also Sections 6 and 7).
Gardner and Shepherdson [1991] have defined a transformation rule
which can be considered as the unfolding of a clause w.r.t. a negative literal.
Given the clause C = H <- F, ¬A, G in a program P, where A is a ground
atom, Gardner and Shepherdson's unfolding rule transforms P as follows:
1. either ¬A is deleted from the body of C, if the query <- A has a
finitely failed SLDNF-tree in P,
2. or C is deleted from P, if the query <- A has an SLDNF-refutation
in P.
Gardner and Shepherdson's unfolding rule w.r.t. negative literals can be
expressed in terms of the goal replacement rule (in case 1) and the clause
deletion rule (in case 2). These rules will be introduced below.
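For instance (in a small illustrative program of ours), suppose that P contains the clause p <- q, ¬r. If no clause of P has head r, then the query <- r has a finitely failed SLDNF-tree in P and, by case 1, the clause is replaced by p <- q. If, instead, P contains the unit clause r <- , then <- r has an SLDNF-refutation in P and, by case 2, the clause p <- q, ¬r is deleted from P.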
Also Kanamori and Horiuchi [1987] and Sato [1992] allow unfolding
steps w.r.t. negative literals. However, their notion of program goes beyond
the definition of logic program which we consider here, and their transfor-
mation rules may more properly be regarded as rules for logic program
synthesis starting from first-order logic specifications (see Section 7).
Rule R2. Folding. Let Pk be the program E1, . . . , Er, C1, . . . , Cn, Er+1,
. . . , Es and D1, . . . , Dn be a subsequence of clauses in a program Pj, for
some j, with 0 ≤ j ≤ k. Suppose that there exist an atom A and two goals
F and G such that for each i, with 1 ≤ i ≤ n, there exists a substitution
θi which satisfies the following conditions:
1. Ci is a variant of the clause H <- F, bd(Di)θi, G,
2. A = hd(Di)θi,
3. for every clause D of Pj not in the sequence D1, . . . , Dn, hd(D) is
not unifiable with A, and
4. for every variable X in the set vars(Di) - vars(hd(Di)), we have that
• Xθi is a variable which does not occur in (H, F, G) and
• the variable Xθi does not occur in the term Yθi, for any variable
Y occurring in bd(Di) and different from X.
If we fold C1, . . . , Cn using D1, . . . , Dn in Pj we derive the clause
C = H <- F, A, G, and we get the new program Pk+1 = E1, . . . , Er,
C, Er+1, . . . , Es.
The folding rule is the inverse of the unfolding rule, in the sense
that given a transformation sequence P0, . . . , Pk, Pk+1, where Pk+1 is
obtained from Pk by folding, there exists a transformation sequence
P0, . . . , Pk, Pk+1, Pk, where Pk can be obtained from Pk+1 by unfolding.
Notice that the possibility of inverting a folding step by performing an
unfolding step, depends on the fact that for unfolding (as for folding) we
can use clauses taken from any program of the transformation sequence
constructed so far.
Example 3.3.3. Let us consider a transformation sequence P0, P1 where
P1 includes the clauses
C1. p(X) <- q(t(X),Y), r(X)
C2. p(Z) <- s(Z), r(Z)
and the definition of predicate a in P0 consists of the clauses
D1. a(U) <- q(U,V)
D2. a(t(W)) <- s(W)
Clauses C1, C2 can be folded using D1, D2. Indeed, the conditions listed
in the folding rule R2 are satisfied with θ1 = {U/t(X), V/Y} and θ2 =
{W/Z}. The derived clause is
C. p(X) <- a(t(X)), r(X)
Notice that by unfolding clause C using D1 and D2 we get again clauses
C1 and C2.
The following example shows that condition 4 in rule R2 is necessary
for ensuring that, after a folding step, we can perform an unfolding step
which leads us back to the program we had before the folding step.
Example 3.3.4. Let C be p(Z) <- q(Z) and D be r <- q(X). Suppose
that D is the only clause in Pj with head r. Clauses C and D satisfy
conditions 1, 2, and 3 of the folding rule with n = 1 and θ1 = {X/Z}.
However, they do not satisfy condition 4 because X does not occur in the
head of D, and Xθ1, which is Z, occurs in the head of C. By replacing the
body of C by the head of D we get the clause p(Z) <- r. If we then unfold
p(Z) <- r using Pj we get p(Z) <- q(X), which is not a variant of C.
Notice that, if a program Pk can be transformed into a program Pk+1
by an unfolding step, it is not always possible to derive again Pk by means
of a folding step applied to Pk+1 (see the following example). Thus, we
may say that folding is only a 'right-inverse' of unfolding.
Example 3.3.5. Let C be the clause p(X) <- r(X) and D be the clause
r(t(X)) <- q(X). From program C, D, by unfolding C using the program
C, D itself, we get the clause C1 = p(t(X)) <- q(X) and the program C1, D.
In order to get back C, D by folding, we would like to fold C1 and derive
(a variant of) C. There are only two ways of applying the folding rule
to C1. The first one is to use clause C1 itself, thereby getting the clause
p(t(X)) <- p(t(X)). The second one is to use clause D and if we do so we
get the clause p(t(X)) <- r(t(X)). In neither case do we get a variant of
C.
Our presentation of the folding rule is similar to the one in [Gergatsoulis
and Katzouraki, 1994] where, however, during the application of the folding
rule, the introduction of some equality atoms is also allowed. We will deal
with the equality introduction in a separate rule.
Several folding rules which are special cases of rule R2 have been consid-
ered in the literature. These folding rules have various restrictions depend-
ing on: i) the choice of the program in the transformation sequence from
where the clauses used for folding (i.e. D1, . . . , Dn) are taken, ii) the num-
ber of these clauses, and iii) whether or not these clauses are allowed to be
recursive (i.e. predicates in the bodies of the clauses depend on predicates
in the heads of those clauses).
We will present some of these special cases of the folding rule in sub-
sequent sections. In particular, the rule originally introduced by Tamaki
and Sato [1984] will be presented in Section 4.4.1 (see rule R2.2, page 726),
when dealing with the preservation of the least Herbrand model semantics
while transforming programs.
Rule R3. Definition introduction (or Definition, for short). We
may get program Pk+1 by adding to program Pk n clauses of the form:
pi(. . .) <- Bi, for i = 1, . . . , n, such that the predicate symbol pi does not
occur in P0, . . . , Pk.
The definition rule is said to be non-recursive iff every predicate symbol
occurring in the bodies B1, . . . , Bn occurs in Pk as well.
Our presentation of the definition rule is similar to the one in [Maher,
1987] and it allows us to introduce one or more new predicate definitions
each of which may consist of more than one clause.
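For instance, if the predicates p and q occur in Pk, we may apply the definition introduction rule by adding the single clause newr(X,Y) <- p(X), q(Y), where newr is a predicate symbol not occurring in P0, . . . , Pk (the name newr is chosen here only for illustration). This application of the rule is non-recursive, because p and q occur in Pk.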
Rule R4. Definition elimination. We may get program Pk+1 by delet-


ing from program Pk the clauses constituting the definitions of the pred-
icates q1, . . . , qn such that, for i = 1, . . . , n, qi does not occur in P0 and
every predicate in Pk which depends on qi is in the set {q1, . . . , qn}.
The definition elimination rule can be viewed as an inverse of the defi-
nition introduction rule. It has been presented in [Maher, 1987], where it
has been called deletion, and also in [Bossi and Cocco, 1993], where it has
been called restricting operation.
The next rule we will introduce is the goal replacement rule, which al-
lows us to replace a goal in the body of a clause by an equivalent goal.
Equivalence between goals, as it is usually defined, depends on the seman-
tics of the program Pk where the replacement takes place.
We now introduce a simple notion of goal equivalence which is paramet-
ric w.r.t. the semantics considered. A more complex notion will be given
later.
Definition 3.3.6 (Goal equivalence). Two goals G1 and G2 are equiv-
alent w.r.t. a semantics SEM and a program P iff SEM[P, <- G1] =
SEM[P,<- G2]. (We will feel free to omit the references to SEM and/or
P when they are understood from the context.)
Rule R5. Goal replacement. Let SEM be a semantics function, G1
and G2 two equivalent goals w.r.t. SEM and Pk, and C = H <- L, G1, R
a clause in Pk. By replacement of goal G1 by goal G2 in C we derive the
clause D = H <- L, G2, R and we get Pk+1 from Pk by replacing C by D.
Example 3.3.7. Let SEM be the least Herbrand model semantics LHM
defined in Section 3.2 and Pk be the following program:
C. p(X,Y) <- q(X,Y)
q(a,Y) <-
q(b,Y) <- q(b,Y)
r(X,X) <-
We have that q(X,Y) is equivalent to r(X,a) w.r.t. LHM and Pk.
Thus, by replacement of q(X,Y) by r(X,a) in C we derive the clause
D. p(X,Y) <- r(X,a)
The above definition of goal equivalence does not take into account the
clause where the goal replacement occurs. As a result, many substitutions
of goals by new goals which produce from a program Pk a new program
Pk+1, cannot be viewed as applications of our rule R5, even though Pk and
Pk+1 are equivalent programs.
Example 3.3.8. Let us consider the following clauses in a program Pk:


C. sublist(N,X,Y) <- length(X,N), append(V,X,I), append(I,Z,Y)
A1. append([],L,L) <-
A2. append([H|T],L,[H|U]) <- append(T,L,U)
If we replace the goal 'append(V,X,I), append(I,Z,Y)' in the body of
C by the goal 'append(X,Z,J), append(V,J,Y)' we get a program, say
Pk+1, which is equivalent to Pk w.r.t. the least Herbrand model seman-
tics LHM. However, the two goals 'append(V,X,I), append(I,Z,Y)' and
'append(X,Z,J), append(V,J,Y)' are not equivalent in the sense of Defi-
nition 3.3.6.
Indeed, the substitution θ = {V/[], X/[], I/[], Z/[], Y/[], J/[a]} be-
longs to LHM[Pk, <- (append(V,X,I), append(I,Z,Y))] and does not be-
long to LHM[Pk, <- (append(X,Z,J), append(V,J,Y))].
In order to overcome the above mentioned limitation of the goal re-
placement rule R5, we now consider a weaker notion of goal equivalence
which depends on a given set of variables. A similar notion was introduced,
for definite programs and the computed answer substitution semantics, by
Cook and Gallagher [1994]. We will then consider a version of the goal
replacement rule based on this weaker notion of goal equivalence.
Definition 3.3.9 (Goal equivalence w.r.t. a set of variables). Let
the program Pk be C1, . . . , Cn and let SEM be a semantics function. Let
us consider the following two clauses:
D1. newp1(X1, . . . , Xm) <- G1
D2. newp2(X1, . . . , Xm) <- G2
where newp1 and newp2 are distinct predicate symbols not occurring in
Pk, {X1, . . . , Xm} is a set of m variables, and G1 and G2 are two goals.
Let V denote the set {X1, . . . , Xm}. We say that G1 and G2 are equivalent
w.r.t. SEM, Pk and V, and we write G1 =V G2, iff
SEM[(C1, . . . , Cn, D1), <- newp1(X1, . . . , Xm)]
= SEM[(C1, . . . , Cn, D2), <- newp2(X1, . . . , Xm)].
If G1 and G2 are equivalent w.r.t. SEM, Pk, and V, we also say that
the replacement law G1 =V G2 is valid w.r.t. SEM and Pk.
Since we have assumed that SEM is preserved by predicate renaming,
we have that, for any set V of variables, =V is an equivalence relation.
Rule R5.1 Clausal goal replacement. Let C = H <- L, G1, R be a
clause in Pk. Suppose that goals G1 and G2 are equivalent w.r.t. SEM,
Pk, and the set of variables vars(H,L,R) ∩ vars(G1,G2). By clausal goal
replacement of G1 by G2 in C we derive the clause D = H <- L, G2, R and
we get Pk+1 by substituting D for C in Pk.
In Example 4.3.1 below we will see that rule R5.1 overcomes the above
mentioned limitation of rule R5.
In the next section we will show that the clausal goal replacement rule
R5.1 can be viewed as a derived rule, because its application can be mim-
icked by suitable applications of the transformation rules Rl, R2, R3, R4,
and R5 defined above. Thus, without loss of generality, we may consider
rule R5 as the only goal replacement rule, when also rules Rl, R2, R3, and
R4 are available.
Various notions of goal equivalence and goal replacement have been in-
troduced in the literature [Tamaki and Sato, 1984; Maher, 1987; Gardner
and Shepherdson, 1991; Bossi et al., 1992b; Bossi et al., 1992a; Cook and
Gallagher, 1994]. Each of these notions has been defined in terms of a par-
ticular semantics, while in our presentation we introduced a notion which
has the advantage of being parametric w.r.t. the given semantics SEM.
We finally present a class of transformation rules which will collectively
be called clause replacement rules and referred to as rule R6.
Rule R6. Clause replacement. From program Pk we get program Pk+1
by applying one of the following four rules.
Rule R6.1 Clause rearrangement. We get Pk+1 by replacing in Pk
the sequence C, D of two clauses by D, C.
This clause rearrangement rule is implicitly used by many authors who
consider a program as a set or a multiset of clauses.
Rule R6.2 Deletion of subsumed clauses. A clause C is subsumed
by a clause D iff there exist a substitution θ and a (possibly empty) goal G
such that hd(C) = hd(D)θ and bd(C) = bd(D)θ, G. We may get program
Pk+1 by deleting from Pk a clause which is subsumed by another clause in
Pk.
In particular, the rule for the deletion of subsumed clauses allows us to
remove duplicate clauses.
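For instance, if Pk contains both the clause p(X) <- q(X) and the clause p(a) <- q(a), r(a), then the latter is subsumed by the former, with θ = {X/a} and G = r(a), and it can thus be deleted.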
Rule R6.3 Deletion of clauses with finitely failed body. Let C be
a clause in program Pk of the form: H <- A1, . . . , Am, L, B1, . . . , Bn with
m, n ≥ 0. If literal L has a finitely failed SLDNF-tree in Pk, then we say
that C has a finitely failed body in Pk and we get program Pk+1 by deleting
C from Pk.
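For instance, if Pk contains the clause p <- q, r and no clause of Pk has head q, then the literal q has a finitely failed SLDNF-tree in Pk, so the clause p <- q, r has a finitely failed body and can be deleted.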
The rules for the deletion of subsumed clauses and the deletion of clauses
with finitely failed body are instances of the clause deletion rule introduced
by Tamaki and Sato [1984] for definite programs. Other rules for deleting
clauses from a given program, or adding clauses to a given program are
studied in [Tamaki and Sato, 1984; Gardner and Shepherdson, 1991; Bossi
and Cocco, 1993; Maher, 1993]. The correctness of those rules strictly
depends on the semantics considered. For further details the reader may
look at the original papers.
Rule R6.4 Generalization + equality introduction. Let us assume
that the equality predicate '=' (written in infix notation) is defined by
the clause X = X <- in every program of the transformation sequence
P0, . . . , Pk. Let us also consider a clause
C. H <- A1, . . . , Am
in Pk, a substitution θ = {X/t}, with X not occurring in t, and a clause
GenC. GenH <- GenA1, . . . , GenAm
such that C = GenC θ.
By generalization + equality introduction we derive the clause
D. GenH <- X = t, GenA1, . . . , GenAm
and we get Pk+1 by replacing C by D in Pk.
This transformation rule was formalized in [Proietti and Pettorossi,
1990].
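For instance, if C is the clause p(a) <- q(a), we may take θ = {X/a} and GenC to be the clause p(X) <- q(X), and by generalization + equality introduction we derive the clause D: p(X) <- X = a, q(X).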

4 Correctness of the transformation rules


In this section we will first present some correctness properties of the trans-
formation rules which we introduced in the previous section. These prop-
erties are parametric w.r.t. the semantics function SEM. We will then give
an overview of the main results presented in the literature concerning the
correctness of the transformation rules w.r.t. various semantics for definite
and normal programs.
We first introduce the notion of correctness of a transformation sequence
w.r.t. a generic semantics function SEM.
Definition 4.0.1 (Correctness of a transformation sequence). Let
P be a set of programs, Q a set of queries, and SEM: P × Q -> (D, ≤) be
a semantics function. A transformation sequence P0, . . . , Pk of programs
in P is partially correct (or totally correct) w.r.t. SEM iff for every query
Q in Q, containing only predicate symbols which occur in P0, we have that
SEM[Pk,Q] ≤ SEM[P0,Q] (or SEM[Pk,Q] = SEM[P0,Q]).
A transformation rule is partially correct (or totally correct) w.r.t. SEM
iff for every transformation sequence P0, . . . , Pk which is partially correct
(or totally correct) w.r.t. SEM and for any program Pk+1 obtained from
Pk by an application of that rule, we have that the extended transforma-
tion sequence P0, . . . , Pk, Pk+1 is partially correct (or totally correct) w.r.t.
SEM.
A transformation step which allows us to derive Pk+1 from a trans-
formation sequence P0, . . . ,Pk is said to be partially correct (or totally
correct) w.r.t. SEM iff for every query Q in Q, containing only predicate
symbols which occur in Pk, we have that SEM[Pk+1,Q] ≤ SEM[Pk,Q]
(or SEM[Pk+1,Q] = SEM[Pk,Q]).
Notice that, if a transformation sequence is constructed by performing
a sequence of partially correct transformation steps, then it is partially
correct. However, it may be the case that not all transformation steps
realizing a partially correct transformation sequence are partially correct.
Also, the application of a partially correct transformation rule may generate
a transformation step which is not partially correct.
Similar remarks also hold for total correctness, instead of partial cor-
rectness.
Obviously, if P0 , . . . , Pk and Pk, Pk+1, . . . , Pn are partially correct (or
totally correct) transformation sequences, also their 'concatenation' P0, . . . ,
Pk,Pk+1, . . . ,Pn is partially correct (or totally correct). In what follows,
by 'correctness' we will mean 'total correctness'.
The following lemma establishes a correctness result for the definition
introduction and definition elimination rules assuming that SEM is rele-
vant. In the subsequent sections we will give some more correctness results
which hold with different assumptions on SEM.
Lemma 4.0.2 (Relevance). The rules of definition introduction and def-
inition elimination are totally correct w.r.t. any relevant semantics.

4.1 Reversible transformations


We present here a simple method to prove that a transformation sequence
constructed by applying partially correct rules is totally correct. This
method is based on the notion of reversible transformation sequence.
Definition 4.1.1 (Reversible transformations). A transformation se-
quence P0, P1 , . . . , P n-1 , Pn constructed by using a set R of rules is said to
be reversible iff there exists a transformation sequence Pn, Q1, . . . , Qk, P0,
with k > 0, which can be constructed by using rules in R.
A transformation step which allows us to derive Pk+1 from Pk is said
to be reversible iff Pk,Pk+1 is a reversible transformation sequence. In
particular, if Pk, Pk+1 has been constructed by using a rule Ra and Pk+1, Pk
can be constructed by using a rule Rb, then we say that a transformation
step using Ra is reversible by a transformation step using Rb.
Notice that, in the above definition of reversible transformations the
construction of the transformation sequence Pn, Q1, . . . , Qk, P0 is required
to be independent of the construction of the transformation sequence
P0, P1, . . . , Pn-1, Pn. This independence condition is essential because,
in general, we can derive a new program by using clauses occurring in
a program which precedes the last one in the transformation sequence at
hand. Thus, it may be the case that there exists a transformation sequence
P0, P1, . . . , Pn-1, Pn, R1, . . . , Rh, P0, for h ≥ 0, but there is no transforma-
tion sequence Pn, Q1, . . . , Qk, P0. In this case the transformation sequence
P0, P1, . . . , Pn-1, Pn is not reversible.
In particular, there are folding steps which are not reversible by unfold-
ing steps, because the clauses to be used for unfolding are not available.
Example 4.1.2. Let us consider the following program:
P0:    p <- q        q <- q
By folding the first clause using itself, we get the program
P1:    p <- p        q <- q
This folding step is not reversible by any sequence of unfolding steps,
because starting from program P1 , it is impossible to get program PO by
applying the unfolding rule only.
The importance of the reversibility property is given by the following
result, whose proof is straightforward.
Lemma 4.1.3 (Reversibility). Let SEM be a semantics function and
R a set of transformation rules.
If a transformation step using a rule Ra in R is reversible by a totally
correct transformation step using a rule Rb in R, then the transformation
step using Ra is totally correct.
If the rules in R are partially correct w.r.t. SEM, then any reversible
transformation sequence using rules in R is totally correct w.r.t. SEM.
Notice, however, that in general it is hard to check whether or not a
transformation sequence is reversible.
We now consider instances of the folding rule and the goal replacement
rule which always produce reversible transformation sequences.

Rule R2.1 In-situ folding. A folding step is called an in-situ folding
iff, with reference to rule R2, we have that
• Pk = Pj (that is, the clauses used for folding are taken from the
last program, not from a previous program, in the transformation
sequence at hand) and
• {C1, . . . , Cn} ∩ {D1, . . . , Dn} = {} (that is, no clause among
C1, . . . , Cn is used to fold C1, . . . , Cn).
Any in-situ folding step which derives a clause C from C1, . . . , Cn in
Pk using clauses D1 , . . . , Dn in Pk itself, is reversible by unfolding C using
D1, . . . , Dn. Indeed, clauses D1 , . . . , Dn occur in Pk+1 and by unfolding C
using D1, . . . , Dn we get C1, . . . , Cn. Thus, Pk+1, Pk is a transformation
sequence which can be produced by an unfolding step.
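For instance, suppose that Pk consists of the two clauses p(X) <- q(X), r(X) and news(X) <- q(X), r(X), where the second clause is the only one whose head is unifiable with news(X) (the predicate news is chosen here only for illustration). By in-situ folding of the first clause using the second one we get the program made out of p(X) <- news(X) and news(X) <- q(X), r(X), and by unfolding p(X) <- news(X) w.r.t. news(X) we get back Pk.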
A folding rule similar to in-situ folding has been considered by Maher
[1990; 1993] in the more general context of logic programs with constraints.
Other instances of the in-situ folding rule have been proposed in [Maher,
1987; Gardner and Shepherdson, 1991].
We will see in Section 4.4 that the reversibility property of in-situ folding
allows us to establish in a straightforward way some total correctness results
for this rule. However, in-situ folding has limited power, in the sense that,
as we will see in Section 5, most transformation strategies for improving
program efficiency make use of folding steps which are not in-situ foldings.
Rule R5.2 Persistent goal replacement. Let C be a clause in Pk and
goal G1 be equivalent to goal G2 w.r.t. a semantics SEM and program Pk.
The goal replacement of G1 by G2 in C is said to be persistent iff G1 and
G2 are equivalent w.r.t. SEM and the derived program Pk+1.
Any persistent goal replacement step which replaces G1 by G2 is re-
versible by a goal replacement step which performs the inverse replacement
of G2 by G1. Thus, if the goal replacement rule is partially correct w.r.t.
SEM, then any persistent goal replacement step is totally correct w.r.t.
SEM.
In the following definition we introduce a variant of the goal replacement
rule which is reversible if one considers relevant semantics.
Rule R5.3 Independent goal replacement. Let C be a clause in a
program P and goal G1 be equivalent to goal G2 w.r.t. a semantics SEM
and program P. The replacement of G1 by G2 in C is said to be independent
iff C belongs to neither Prel(G1) nor Prel(G2) (that is, neither G1 nor G2
depends on C).
Lemma 4.1.4 (Reversibility of independent goal replacement). Let
SEM be a relevant semantics and goal G1 be equivalent to goal G2 w.r.t.
SEM and program P. Any independent goal replacement of G1 by G2 in
a clause of P is reversible by performing an independent goal replacement
step.
Proof. Let Q be the program obtained from P by replacing the goal G1
by the goal G2 in a clause C of P such that neither G1 nor G2 depends on
C. We first show that this independent goal replacement step is persistent
by proving that G1 and G2 are equivalent w.r.t. SEM and program Q,
that is, SEM[Q, <- G1] = SEM[Q, <- G2]. Indeed, we have that
SEM[Q, <- G1]            (by relevance of SEM)
= SEM[Qrel(G1), <- G1]   (since C ∉ Prel(G1))
= SEM[Prel(G1), <- G1]   (by relevance of SEM)
= SEM[P, <- G1]          (since G1 is equivalent to G2 w.r.t.
                          SEM and P)
= SEM[P, <- G2]          (by relevance of SEM)
= SEM[Prel(G2), <- G2]   (since C ∉ Prel(G2))
= SEM[Qrel(G2), <- G2]   (by relevance of SEM)
= SEM[Q, <- G2].
Since any independent goal replacement step is persistent, it is also
reversible by performing the inverse independent goal replacement of G2
by G1 in the program Q. •

4.2 A derived goal replacement rule


In this section we show that the clausal goal replacement rule can be de-
rived from the following transformation rules: unfolding, in-situ folding,
non-recursive definition introduction, definition elimination, and goal re-
placement, whenever the rules of non-recursive definition introduction and
in-situ folding are totally correct w.r.t. the semantics SEM. A different
way of viewing the goal replacement rule as a sequence of transformation
rules can be found in [Bossi et al., 1992a].
Let Pk be the program C1, . . . , Cn and let us consider clause Ci of the
form H <- L, G1, R. Suppose that the goal G1 is equivalent to the goal G2
w.r.t. SEM, Pk, and {X1, . . . , Xm} = vars(H,L,R) ∩ vars(G1,G2).
By applying the clausal goal replacement rule R5.1 (page 713) we may
replace G1 by G2 in Ci, thereby deriving the new clause D of the form
H <- L, G2, R and the new program Pk+1 = C1, . . . , Ci-1, D, Ci+1, . . . , Cn.
The same program Pk+1 can be derived by the sequence of the following
five transformation steps.
Step 1. By the non-recursive definition rule we introduce the clauses
D1. newp1(X1, . . . , Xm) <- G1
D2. newp2(X1, . . . , Xm) <- G2
Then we get the program C1, . . . , Cn, D1, D2.


Step 2. Since vars(H,L,R) ∩ vars(G1) ⊆ {X1, . . . , Xm}, condition 4 of the
folding rule is fulfilled. Also the other conditions 1, 2, and 3 are fulfilled.
By in-situ folding of clause Ci using D1 we derive the clause
F. H <- L, newp1(X1, . . . , Xm), R
Thus, we get the new program
Q = C1, . . . , Ci-1, F, Ci+1, . . . , Cn, D1, D2.
Step 3. The goals newp1(X1, . . . , Xm) and newp2(X1, . . . , Xm) are equiv-
alent w.r.t. SEM and Q. Indeed, we have
SEM[(C1, . . . , Ci-1, Ci, Ci+1, . . . , Cn, D1), <- newp1(X1, . . . , Xm)]
(by the correctness of non-recursive definition introduction)
= SEM[(C1, . . . , Ci-1, Ci, Ci+1, . . . , Cn, D1, D2), <- newp1(X1, . . . , Xm)]
(by the correctness of in-situ folding)
= SEM[(C1, . . . , Ci-1, F, Ci+1, . . . , Cn, D1, D2), <- newp1(X1, . . . , Xm)]
and
SEM[(C1, . . . , Ci-1, Ci, Ci+1, . . . , Cn, D1), <- newp1(X1, . . . , Xm)]
(since G1 and G2 are equivalent w.r.t. SEM, Pk, and {X1, . . . , Xm})
= SEM[(C1, . . . , Ci-1, Ci, Ci+1, . . . , Cn, D2), <- newp2(X1, . . . , Xm)]
(by the correctness of non-recursive definition introduction)
= SEM[(C1, . . . , Ci-1, Ci, Ci+1, . . . , Cn, D1, D2), <- newp2(X1, . . . , Xm)]
(by the correctness of in-situ folding)
= SEM[(C1, . . . , Ci-1, F, Ci+1, . . . , Cn, D1, D2), <- newp2(X1, . . . , Xm)].
Thus, by applying the goal replacement rule R5, from clause F in pro-
gram Q we derive the new clause
M. H <- L, newp2(X1, . . . , Xm), R
and we get the new program C1, . . . , Ci-1, M, Ci+1, . . . , Cn, D1, D2.
Step 4. By unfolding M w.r.t. newp2(X1, . . . , Xm) we get
D. H <- L, G2, R
and the program C1, . . . , Ci-1, D, Ci+1, . . . , Cn, D1, D2.
Step 5. Finally, by definition elimination we discard clauses D1 and D2,
and we get exactly the program Pk+1, as desired.
We have also the converse result which we will show below, that is, if
by goal replacement from program Pk we get program Pk+1 and we assume
that the non-recursive definition introduction rule and the independent goal
replacement rule are totally correct w.r.t. SEM, then we may apply the
clausal goal replacement rule for deriving program Pk+1 from program Pk.
Indeed, suppose that the goals G1 and G2 are equivalent w.r.t. SEM
and Pk = C1, . . . , Cn. Also, consider the clauses
D1. newp1(X1, . . . , Xm) <- G1
D2. newp2(X1, . . . , Xm) <- G2
where {X1, . . . , Xm} is any set of variables.
By the correctness of the non-recursive definition introduction rule we
have that
SEM[(C1, . . . , Cn), <- G1]
= SEM[(C1, . . . , Cn, newp1(X1, . . . , Xm) <- G1), <- G1] and
SEM[(C1, . . . , Cn), <- G2]
= SEM[(C1, . . . , Cn, newp1(X1, . . . , Xm) <- G1), <- G2].
Since the goals G1 and G2 are equivalent w.r.t. SEM and (C1, . . . , Cn),
that is, SEM[(C1, . . . , Cn), <- G1] = SEM[(C1, . . . , Cn), <- G2], we have
that
SEM[(C1, . . . , Cn, newp1(X1, . . . , Xm) <- G1), <- G1]
= SEM[(C1, . . . , Cn, newp1(X1, . . . , Xm) <- G1), <- G2],
that is, G1 and G2 are equivalent w.r.t. SEM and the program (C1, . . . , Cn,
newp1(X1, . . . , Xm) <- G1).
As a consequence, we may apply the independent goal replacement
rule and from program (C1, . . . , Cn, newp1(X1, . . . , Xm) <- G1) we derive
program (C1, . . . , Cn, newp1(X1, . . . , Xm) <- G2).
Thus,
SEM[(C1, . . . , Cn, newp1(X1, . . . , Xm) <- G1), <- newp1(X1, . . . , Xm)]
(by the correctness of the independent goal replacement rule)
= SEM[(C1, . . . , Cn, newp1(X1, . . . , Xm) <- G2), <- newp1(X1, . . . , Xm)]
(since SEM is preserved by predicate renaming)
= SEM[(C1, . . . , Cn, newp2(X1, . . . , Xm) <- G2), <- newp2(X1, . . . , Xm)].
We conclude that: i) G1 and G2 are equivalent w.r.t. SEM, Pk, and
any set of variables (see Definition 3.3.9, page 713), and ii) the replacement
of G1 by G2 in any clause of Pk may be viewed as an application of the
clausal goal replacement rule.
To sum up the results presented in this section we have that, if the rules
of non-recursive definition introduction, in-situ folding, and independent
goal replacement are correct w.r.t. SEM, then the goal replacement and
clausal goal replacement are equivalent rules, in the sense that program
Q can be derived from program P by using rules R1, R2, R3, R4, goal
replacement (rule R5), and R6 iff Q can be derived from P by using rules
R1, R2, R3, R4, clausal goal replacement (rule R5.1), and R6.
We will see that the non-recursive definition introduction, in-situ fold-
ing, and independent goal replacement rules are correct w.r.t. all semantics
we will consider, and thus, when describing the correctness results w.r.t.
those semantics, we can make no distinction between the goal replacement
rule R5 and the clausal goal replacement rule R5.1.

4.3 The unfold/fold proof method


The validity of a replacement law is, in general, undecidable. However, if we
use totally correct transformation rules only, then for any transformation
sequence we need to prove a replacement law only once.
Indeed, if G1 =V G2 is valid w.r.t. a semantics SEM and a program
Pk, then it is also valid w.r.t. SEM and Q for every program Q derived
from Pk by using totally correct transformation rules.
In order to prove the validity of a replacement law, there are ad hoc
proof methods depending on the specific semantics which is considered (see
Section 7). As an alternative approach, one can use a simple method based
on unfold/fold transformations which we call unfold/fold proof method.
This proof method was introduced by Kott for recursive equation pro-
grams [Kott, 1982] and its application to logic programs is described in
[Boulanger and Bruynooghe, 1993; Proietti and Pettorossi, 1994a; Proietti
and Pettorossi, 1994b].
The unfold/fold proof method can be described as follows. Given a
program P, a semantics function SEM, and a replacement law G1 =V G2,
with V = {X1, . . . , Xm}, we consider the clauses
D1. newp1(X1, . . . , Xm) <- G1
D2. newp2(X1, . . . , Xm) <- G2
and the programs R0 = C1, . . . , Cn, D1 and S0 = C1, . . . , Cn, D2, where
C1, . . . , Cn are the clauses of P.
We then construct two correct transformation sequences R0, . . . , Ru and
S0, . . . , Sv, such that Ru and Sv are equal modulo predicate renaming.
The validity of G1 =V G2 follows from the total correctness of the
transformation sequences, and the assumption that SEM is preserved by
predicate renaming.
Example 4.3.1. Consider again the program Pk = C, A1, A2 of the
Sublist Example 3.3.8 (page 713). Suppose that we want to ap-
ply the clausal goal replacement rule to replace the goal G1 =
(append(V,X,I), append(I,Z,Y)) by the goal G2 = (append(X,Z,J),
append(V,J,Y)) in the body of the clause
C. sublist(N,X,Y) <- length(X,N), append(V,X,I), append(I,Z,Y)
We need to show the validity of the replacement law

append(V, X, I), append(I, Z, Y) ={X,Y} append(X, Z, J), append(V, J, Y)

where the equivalence w.r.t. the set {X, Y} is justified by the fact that

vars(sublist(N,X,Y), length(X,N)) ∩ vars(G1,G2) = {X,Y}.

As suggested by the unfold/fold proof method, we introduce the clauses


D1. newp1(X,Y) <- append(V,X,I), append(I,Z,Y)
D2. newp2(X,Y) <- append(X,L,J), append(K,J,Y)
We then consider the programs R0 = C, A1, A2, D1 and S0 = C, A1,
A2, D2.
We now construct two transformation sequences starting from R0 and
S0, respectively, as follows.
1. Transformation sequence starting from R0.
By unfolding clause D1 in R0 w.r.t. append(V, X, I) we derive the
following two clauses:
E1. newp1(X,Y) <- append(X,Z,Y)
E2. newp1(X,Y) <- append(T,X,U), append([H|U],Z,Y)
and we get the program R1 = C, A1, A2, E1, E2.
Then, by unfolding clause E2 w.r.t. append([H|U],Z,Y) we get
E3. newp1(X,[H|V]) <- append(T,X,U), append(U,Z,V)
and we get the program R2 = C, A1, A2, E1, E3.
Finally, by folding clause E3 using clause D1 in R0 we derive
E4. newp1(X,[H|V]) <- newp1(X,V)
and we get the program R3 = C, A1, A2, E1, E4.
2. Transformation sequence starting from S0.
By unfolding clause D2 in S0 w.r.t. append(K,J,Y) we derive two
clauses
F1. newp2(X,Y) <- append(X,L,Y)
F2. newp2(X,[H|U]) <- append(X,L,J), append(T,J,U)
and we get the program S1 = C, A1, A2, F1, F2.
By folding clause F2 using clause D2 in S0 we get the clause
F3. newp2(X,[H|U]) <- newp2(X,U)
and we derive the final program of this transformation sequence, which
is S2 = C, A1, A2, F1, F3.
The derived programs R3 and S2 are equal up to predicate renaming


(that is, for a renaming p which maps newp1 to newp2, we have p(R3) = S2)
and the validity of the given replacement law is proved w.r.t. any semantics
SEM, provided the transformation sequences R0, R1, R2, R3 and S0, S1,
S2 are correct w.r.t. SEM. We will show below that these transformation
sequences are correct w.r.t. several semantics and, in particular, the least
Herbrand model semantics LHM.

4.4 Correctness results for definite programs


Let us now consider definite programs and let us study the correctness
properties of the transformation rules w.r.t. various semantics. We will first
look at the correctness properties of the unfold/fold transformations w.r.t.
both the least Herbrand model (Section 4.4.1) and the computed answer
substitution semantics (Section 4.4.2). We will then take into account
various semantics related to program termination, such as the finite failure
semantics (Section 4.4.3) and the answer substitution semantics computed
by the depth-first search strategy of Prolog (Section 4.4.4).
We will assume that the equivalence between goals, as well as the various
instances of the goal replacement rule, refer to the semantics considered in
each section.

4.4.1 Least Herbrand model


In this section we assume that the semantics function is LHM (page 707)
and we present several partial correctness and total correctness results
based on the work in [Tamaki and Sato, 1984; Tamaki and Sato, 1986; Ma-
her, 1987; Gardner and Shepherdson, 1991; Gergatsoulis and Katzouraki,
1994].
The total correctness w.r.t. LHM of the unfolding steps is a straight-
forward consequence of the soundness and completeness of SLD-resolution
w.r.t. the least Herbrand model semantics. As a consequence, by the re-
versibility lemma (page 717) any in-situ folding step is totally correct w.r.t.
LHM.
In the general case, by applying the folding rule R2 to program Pk of
a transformation sequence P0, . . . , Pk, we derive clauses which are true in
the least Herbrand model of P0. Thus, the folding rule is partially correct
w.r.t. LHM. It is not totally correct, as is shown by the following example.
Example 4.4.1. Given the program

P0: p <- q     q <-


by folding the first clause using itself we get

P1: p <- p     q <-


We have that LHM[P1, <- p] ≠ LHM[P0, <- p], because p is true in
the least Herbrand model of P0 and it is false in the least Herbrand model
of P1.
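The effect of this self-folding step can be checked with any Prolog system. Below is a small sketch in standard Prolog syntax; the predicates of the two programs are renamed (p0, q0 and p1, q1) only so that both versions fit in one file.

    % program P0:  p <- q   q <-
    p0 :- q0.
    q0.

    % program P1, obtained by folding the first clause using itself:  p <- p   q <-
    p1 :- p1.
    q1.

    % ?- p0.   succeeds, since p is true in the least Herbrand model of P0
    % ?- p1.   never succeeds: p is false in the least Herbrand model of P1,
    %          and under the Prolog strategy the query loops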
The correctness of the definition introduction and elimination rules
w.r.t. LHM follows from Lemma 4.0.2, since LHM is a relevant semantics.
Similarly to the case of the folding rule, also the goal replacement rule
R5 is partially correct w.r.t. LHM. Indeed, by applying the goal replace-
ment rule to program Pk of a transformation sequence P0 , . . . , Pk, we derive
clauses which are true in the least Herbrand model of P0 .
It is easy to see that, in general, the goal replacement rule is not totally
correct. Indeed, the folding step considered in Example 4.4.1 may also be
taken as an example of a goal replacement step which is not totally correct,
because p is equivalent to q w.r.t. LHM and P0.
However, there are some instances of the goal replacement rule which
are totally correct. In particular, since LHM is a relevant semantics, by
the partial correctness of the goal replacement rule, the reversibility lemma
(page 717), and the reversibility of independent goal replacement lemma
(page 718), the independent goal replacement rule is totally correct w.r.t.
LHM.
Let us now consider the following two special cases of the goal replace-
ment rule.

Rule R5.4 Goal rearrangement. By the goal rearrangement rule we


replace a goal (G, H) in the body of a clause by the goal (H, G).
Rule R5.5 Deletion of duplicate goals. By the deletion of duplicate
goals rule we replace a goal (G, G) in the body of a clause by the goal G.
Steps of goal rearrangement and deletion of duplicate goals are to-
tally correct w.r.t. LHM. Indeed, they are persistent goal replacement
steps, because they are based on the following equivalences w.r.t. LHM:
(G,H) =vars(G,H) (H,G) and (G,G) =vars(G) G, which hold w.r.t. any
given program.
From these total correctness results it follows that, when dealing with
the least Herbrand model semantics, we may assume that bodies of clauses
are sets (not sequences) of atoms.
The total correctness of the clause replacement rules is also straight-
forward. Thus, since the rules for clause rearrangement (rule R6.1) and
deletion of duplicate clauses (rule derived from rule R6.2) are totally cor-
rect w.r.t. LHM, we may assume that programs are sets (not sequences)
of clauses.
However, we will see that some instances of the above rules are not
correct when considering the computed answer substitution semantics (see
Section 4.4.2, page 728) or the pure Prolog semantics (see Section 4.4.4,
page 732).
As a summary of the results mentioned so far we have the following:
Theorem 4.4.2 (First correctness theorem w.r.t. LHM). Let P0,
. . . , Pn be a transformation sequence of definite programs constructed by
using the following transformation rules: unfolding, in-situ folding, def-
inition introduction, definition elimination, goal rearrangement, deletion
of duplicate goals, independent goal replacement, and clause replacement.
Then P0, . . . ,Pn is totally correct w.r.t. LHM.
We have seen that the in-situ folding rule has the advantage of being a
totally correct transformation rule, but it is a weak rule because it does not
allow us to derive recursive definitions. In order to overcome this limitation
we now present a different and more powerful version of the folding rule,
called single-folding.
Let us first notice that by performing a folding step and introducing
recursive clauses from non-recursive clauses, some infinite computations
(due to a non-well-founded recursion) may replace finite computations,
thereby affecting the semantics of the program and losing total correctness.
A simple example of this undesirable introduction of infinite computa-
tions is self-folding, where all clauses in a predicate definition can be folded
using themselves. For instance, the definition p <- q of a predicate p can
be replaced by p <— p (see Example 4.4.1, page 724).
This inconvenience can be avoided by ensuring that 'enough' unfold-

ing steps have been performed before folding, so that 'going backward in
the computation' (as folding does) does not prevail over 'going forward in
the computation' (as unfolding does). This idea is the basis for various
techniques in which total correctness is ensured by counting the number of
unfolding and folding steps performed during the transformation sequence
[Kott, 1978; Kanamori and Fujita, 1986; Bossi et al., 1992a].
An alternative approach is based on the verification that some ter-
mination properties are preserved by the transformation process, thereby
avoiding the introduction of infinite computations [Amtoft, 1992; Bossi and
Etalle, 1994b; Bossi and Cocco, 1994; Cook and Gallagher, 1994].
The following definition introduces the version of the folding rule we
have promised above. This version is a special case of rule R2 for n = 1.

Rule R2.2 Single-folding. Let C be a clause in program Pk and D be


a clause in a program Pj, for some j, with 0 < j < k. Suppose that there
exist two goals F and G and a substitution θ such that:
1. C is a variant of H <- F, bd(D)θ, G,
2. for every clause E of Pj, different from D, hd(E) is not unifiable with
hd(D)θ, and
3. for every variable X in the set vars(D) - vars(hd(D)), we have that
• Xθ is a variable which does not occur in (H, F, G) and
• the variable Xθ does not occur in the term Yθ, for any variable
Y occurring in bd(D) and different from X.
By the single-folding rule, using clause D, from clause C we derive clause
H <- F, hd(D)θ, G.

This rule is called T & S-folding in [Pettorossi and Proietti, 1994].


We now present a correctness result analogous to the first correct-
ness Theorem w.r.t. LHM for a transformation sequence including single-
folding, rather than in-situ folding. We essentially follow [Tamaki and Sato,
1986], but we make some simplifying assumptions. By doing so, total cor-
rectness is ensured by easily verifiable conditions on the transformation
sequence.
We assume that the set of the predicate symbols occurring in the trans-
formation sequence P0, . . . , Pn is partitioned into three sets, called top pred-
icates, intermediate predicates, and basic predicates, respectively, with the
following restrictions:
1. a predicate introduced by the definition rule is a top predicate,
2. an intermediate predicate does not depend in P0 on any top predicate,
and
3. a basic predicate does not depend in P0 on any intermediate or top
predicate.

Notice that this partition process is, in general, nondeterministic. In


particular, we can choose the top predicates in P0 in various ways. Notice
also that dependencies of some intermediate predicates on top predicates
may be introduced by folding steps (see the second correctness theorem
below).
The approach we follow here is more general than the approach de-
scribed in [Kawamura and Kanamori, 1990], where only two sets of predi-
cates are considered (the so-called new predicates and old predicates).
Let us also introduce a new goal replacement rule, called basic goal
replacement, which is a particular case of independent goal replacement.
Rule R5.6 Basic goal replacement. By the basic goal replacement
rule we replace a goal G1 in the body of a clause C by a goal G1 such that
any predicate occurring in G\ or G2 is a basic predicate and the head of C
has a top or an intermediate predicate.
The following theorem establishes the correctness of transformation se-
quences which are constructed by applying a set of transformation rules
including the single-folding rule and the basic goal replacement rule.
Theorem 4.4.3 (Second correctness theorem w.r.t. LHM). Let
P0, . . . , Pn be a transformation sequence of definite programs constructed
by using the following transformation rules: unfolding, single-folding, defi-
nition introduction, definition elimination, goal rearrangement, deletion of
duplicate goals, basic goal replacement, and clause replacement. Suppose
that no single-folding step is performed after a definition elimination step.
Suppose also that when we apply the single-folding rule to a clause, say C,
using a clause, say D, the following conditions hold:
• either D belongs to P0 or D has been introduced by the definition rule,
• hd(D) has a top predicate, and
• either hd(C) has an intermediate predicate
or hd(C) has a top predicate and C has been derived from a clause, say
E, by first unfolding E w.r.t. an atom with an intermediate predicate
and then performing zero or more transformation steps on a clause
derived from the unfolding of E.
Then P0, . . . ,Pn is totally correct w.r.t. the semantics LHM.
The hypothesis that no single-folding step is performed after a definition
elimination step is needed to prevent single-folding from being applied using
a clause with a head predicate whose definition has been eliminated. This
point is illustrated by the following example.
Example 4.4.4. Let us consider the transformation sequence
P0: p <- q p <- fail q <-
(by definition introduction)
P1: p <- q   p <- fail   q <-   newp <- q

(by definition elimination)


P2: p<-q p <- fail q <-
(by single-folding)
P3: p <- newp p <- fail q <-
We may assume that newp is a top predicate and p is an intermedi-
ate predicate. However, the transformation sequence is not correct w.r.t.
LHM, because the query <- p succeeds in the initial program, while it fails
in the final one.
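A hypothetical Prolog rendering of the initial and final programs of this example makes the loss of correctness easy to observe; the predicates are again renamed so that the two versions coexist, and the intermediate predicate fail is rendered by the built-in fail.

    % initial program P0:  p <- q   p <- fail   q <-
    p0 :- q0.
    p0 :- fail.
    q0.

    % final program P3:  p <- newp   p <- fail   q <-
    % newp3 has no clauses; it is declared dynamic so that calling it
    % simply fails instead of raising an existence error
    :- dynamic(newp3/0).
    p3 :- newp3.
    p3 :- fail.
    q3.

    % ?- p0.   succeeds
    % ?- p3.   fails, because the definition of newp was eliminated before folding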
The second correctness theorem w.r.t. LHM can be extended to other
variants of the folding rule as described in [Gergatsoulis and Katzouraki,
1994].
4.4.2 Computed answer substitutions
We now consider a semantics function based on the notion of computed an-
swer substitutions [Lloyd, 1987; Apt, 1990], which captures the procedural
behaviour of definite programs more accurately than the least Herbrand
model semantics.
The computed answer substitution semantics can be defined as a func-
tion
CAS: P+ x Q+ -> (P(Subst), <)
where P+ is the set of definite programs, Q+ is the set of definite queries,
and (P(Subst), <) is the powerset of the set of substitutions ordered by
set inclusion. We define the semantics CAS as follows:
CAS[P, Q] = {θ | there exists an SLD-refutation of Q in P
with computed answer substitution θ}.
CAS is a relevant semantics.
By the soundness and completeness of SLD-resolution w.r.t. LHM,
we have that the equivalence of two programs w.r.t. CAS implies their
equivalence w.r.t. LHM. However, the converse is not true. For instance,
consider the following two programs:
P1: p(X) <-
P2: p(X) <- p(a) <-
We have that LHM[P1, <- p(X)] = LHM[P2, <- p(X)] = Subst. How-
ever, we have that CAS [P1, <- p(X)] = {{}}, whereas CAS[P2, <- p(X)] =
{{}, {X/a}}, where {} is the identity substitution.
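The difference between the two semantics can be observed by collecting the computed answers with findall/3. The following sketch, in standard Prolog syntax, uses the renamed predicates p1 and p2 for the two programs above.

    % program P1:  p(X) <-
    p1(_X).

    % program P2:  p(X) <-   p(a) <-
    p2(_X).
    p2(a).

    % ?- findall(X, p1(X), L).   gives L = [_]     (one answer, X left unbound)
    % ?- findall(X, p2(X), L).   gives L = [_, a]  (the extra answer {X/a})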
As a consequence, not all rules which are correct w.r.t. LHM are correct
also w.r.t. CAS. In particular, the deletion of duplicate goals and the
deletion of subsumed clauses do not preserve the CAS semantics, as is
shown by the following examples.
Example 4.4.5. Let us consider the program
P1: p(X) <- q(X), q(X)   q(t(Y,a)) <-   q(t(a,Z)) <-

By deleting an occurrence of q(X) in the body of the first clause we get


P2: p(X) <- q(X)   q(t(Y,a)) <-   q(t(a,Z)) <-
The substitution {X/t(a,a)} belongs to CAS[P1, <- p(X)] and not to
CAS[P2, <- p(X)].
Example 4.4.6. Let us consider the program
P: p(X) <- p(a) <-
The clause p(a) <- is subsumed by p(X) <-. However, if we delete
p(a) <- from the program P the CAS semantics is not preserved, because
{X/a} is no longer a computed answer substitution for the query <- p(X).
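Both examples can be reproduced with a Prolog interpreter. The sketch below, in standard Prolog syntax with the predicates renamed to keep the versions apart, shows that the answer {X/t(a,a)} of Example 4.4.5 and the answer {X/a} of Example 4.4.6 are lost after the deletions.

    % Example 4.4.5: the clauses for q and the clause with the duplicated goal
    q(t(_Y, a)).
    q(t(a, _Z)).

    dup_p1(X) :- q(X), q(X).   % clause of P1
    dup_p2(X) :- q(X).         % clause of P2, after deleting the duplicate goal

    % ?- findall(X, dup_p1(X), L1).   L1 contains the ground answer t(a,a)
    % ?- findall(X, dup_p2(X), L2).   L2 = [t(_,a), t(a,_)]: no answer {X/t(a,a)}

    % Example 4.4.6: deletion of the subsumed clause p(a) <-
    sub_p1(_X).   % clause p(X) <-
    sub_p1(a).    % clause p(a) <-
    sub_p2(_X).   % program after the deletion

    % ?- findall(X, sub_p1(X), L3).   L3 = [_, a]: {X/a} is a computed answer
    % ?- findall(X, sub_p2(X), L4).   L4 = [_]:    {X/a} is no longer computed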
There are particular cases, however, in which the deletion of duplicate
goals and the deletion of subsumed clauses are correct w.r.t. CAS. The
following definitions introduce two such cases.
Rule R5.7 Deletion of duplicate ground goals. By the rule of deletion
of ground duplicate goals we replace a ground goal (G, G) in the body of a
clause by the goal G.
This rule is an instance of the persistent goal replacement rule (see rule
R5.2, page 718).
Rule R5.8 Deletion of duplicate clauses. By the rule of deletion of
duplicate clauses we replace all occurrences of a clause C in a program by
a single occurrence of C.
Several researchers have addressed the problem of proving the correct-
ness of the transformation rules w.r.t. CAS [Kawamura and Kanamori,
1990; Bossi et al, 1992a; Bossi and Cocco, 1993]. We now present for the
CAS semantics two theorems which correspond to the first and second
correctness theorems w.r.t. the LHM semantics. As already mentioned, in
these theorems the various instances of the goal replacement rule refer to
the equivalence of goals w.r.t. CAS.
Theorem 4.4.7 (First correctness theorem w.r.t. CAS). Let P0, . . . ,
Pn be a transformation sequence of definite programs constructed by using
the following transformation rules: unfolding, in-situ folding, definition in-
troduction, definition elimination, goal rearrangement, deletion of ground
duplicate goals, independent goal replacement, clause rearrangement, dele-
tion of duplicate clauses, deletion of clauses with finitely failed body, and
generalization + equality introduction. Then P0, . . . , Pn is totally correct
w.r.t. CAS.
Theorem 4.4.8 (Second correctness theorem w.r.t. CAS). Let P0,
. . . , Pn be a transformation sequence of definite programs constructed by
using the following transformation rules: unfolding, single-folding, defini-
tion introduction , definition elimination, goal rearrangement, deletion of

ground duplicate goals, basic goal replacement, clause rearrangement, dele-


tion of duplicate clauses, deletion of clauses with finitely failed body, and
generalization + equality introduction. Suppose that no single-folding step
is performed after a definition elimination step. Suppose also that when we
apply the single-folding rule to a clause, say C, using a clause, say D, the
following conditions hold:
• either D belongs to P0 or D has been introduced by the definition rule,
• hd(D) has a top predicate, and
• either hd(C) has an intermediate predicate
or hd(C) has a top predicate and C has been derived from a clause, say
E, by first unfolding E w.r.t. an atom with an intermediate predicate
and then performing zero or more transformation steps on a clause
derived from the unfolding of E.
Then P0, . . . , Pn is totally correct w.r.t. the semantics CAS.

4.4.3 Finite failure


In order to reason about the preservation of finite failure during program
transformation we now consider the semantics function

FF: P+ x Q+ -> (P(Subst), <)


such that
FF[P, Q] = {θ | there exists a finitely failed SLD-tree for Qθ in P}.
The reader may verify that FF is a relevant semantics.
Work on the correctness of transformation rules w.r.t. FF has been
presented in [Maher, 1987; Seki, 1991; Gardner and Shepherdson, 1991;
Cook and Gallagher, 1994]. Similarly to the case of LHM and CAS, we
have the following result, where the independent goal replacement rule is
defined in terms of goal equivalence w.r.t. FF.
Theorem 4.4.9 (First correctness theorem w.r.t. FF). Let P0, . . . ,
Pn be a transformation sequence of definite programs constructed by using
the following transformation rules: unfolding, in-situ folding, definition
introduction, definition elimination, goal rearrangement, deletion of dupli-
cate goals, independent goal replacement, and clause replacement. Then
P0, . . . , Pn is totally correct w.r.t. FF.
However, the use of the rules listed in the second correctness theorems
w.r.t. LHM and CAS (pages 727 and 729), may affect FF. In particular,
if we allow folding steps which are not in-situ foldings, we may transform a
finitely failing program into an infinitely failing program, as shown by the
following example.
Example 4.4.10. Let us consider the transformation sequence, where p
is a top predicate and q and r are intermediate predicates:

P0: p(X) <- q(X), r(X)   q(a) <-   r(b) <- r(b)
(by unfolding the first clause w.r.t. r(X))
P1: p(b) <- q(b), r(b)   q(a) <-   r(b) <- r(b)
(by applying single-folding to the first clause)
P2: p(b) <- p(b)   q(a) <-   r(b) <- r(b)
This transformation sequence satisfies the conditions stated in the sec-
ond correctness theorem w.r.t. both LHM and CAS, but P0 finitely fails
for the query <- p(b), while P2 does not.
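The behaviour described above can be reproduced directly; the sketch below is in standard Prolog syntax, with the predicates of P0 and P2 renamed so that both programs fit in one file.

    % program P0:  p(X) <- q(X), r(X)   q(a) <-   r(b) <- r(b)
    p0(X) :- q0(X), r0(X).
    q0(a).
    r0(b) :- r0(b).

    % program P2, obtained by unfolding and then single-folding:
    %             p(b) <- p(b)   q(a) <-   r(b) <- r(b)
    p2(b) :- p2(b).
    q2(a).
    r2(b) :- r2(b).

    % ?- p0(b).   fails finitely: q0(b) has no matching clause
    % ?- p2(b).   does not terminate: the finite failure has been lost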
As we have shown in the above Example 4.4.10, if we allow folding
steps which are not in-situ foldings, it may be the case that the derived
transformation sequence is not correct w.r.t. FF.
The fact that such folding steps are not totally correct w.r.t. FF is
related to the notion of fair SLD-derivation [Lloyd, 1987].
A possibly infinite SLD-derivation is fair iff it is either failed or for every
occurrence of an atom A in the SLD-derivation, that occurrence of A or
an instance (possibly via the identity substitution) of A which is derived
from that occurrence, is selected for SLD-resolution within a finite number
of steps.
Fairness of SLD-derivations is a sufficient condition for the completeness
of SLD-resolution w.r.t. FF.
Let us consider a program P1 and a query Q, and let us apply a folding
step which replaces a goal B by an atom H, thereby obtaining a program,
say P2. We may view every SLD-derivation δ2 of Q in the derived program
P2 as 'simulating' an SLD-derivation δ1 of Q in P1. Indeed, the simulated
SLD-derivation δ1 can be obtained by replacing in δ2 the instances of H
introduced by folding steps, with the corresponding instances of B.
By applying a folding step (which is not an in-situ folding) to a clause
in P1, we may derive a program P2 such that a fair SLD-derivation for Q
using P2 simulates an unfair SLD-derivation for Q using P1, as shown by
the following example.
Example 4.4.11. Let us consider the program P2 of Example 4.4.10 and
the infinite sequence of queries
<- p(b)   <- p(b)   <- p(b)   . . .
which constitutes a fair SLD-derivation for the program P2 and the query
<- p(b). The folding step which produces P2 from P1 replaces the goal
(q(b), r(b)) by the goal p(b). The above SLD-derivation can be viewed as a
simulation of the following SLD-derivation for P1:
<- p(b)   <- q(b), r(b)   <- q(b), r(b)   . . .
which is unfair, because it has been obtained by always selecting the atom
r(b) for performing an SLD-resolution step.
The Theorem 4.4.12 below is the analogue for the FF semantics of the
second correctness theorems w.r.t. LHM and CAS. Its proof is based on

the fact that an unfair SLD-derivation of a given program cannot be sim-


ulated by a fair SLD-derivation of a transformed program, if all atoms
replaced in a folding step have previously been derived by unfolding. This
condition is not fulfilled by the folding step shown in Example 4.4.10 be-
cause in the body of the clause p(b) <— q(b),r(b) in P1 the atom q(b) has
not been derived by unfolding.
Theorem 4.4.12 (Second correctness theorem w.r.t. FF). Let P0,
. . . , Pn be a transformation sequence of definite programs constructed by
using the following transformation rules: unfolding, single-folding, defini-
tion introduction, definition elimination, goal rearrangement, deletion of
duplicate goals, basic goal replacement, and clause replacement. Suppose
that no single-folding step is performed after a definition elimination step.
Suppose also that when we apply the single-folding rule to a clause, say C,
using a clause, say D, the following conditions hold:
• either D belongs to P0 or D has been introduced by the definition rule,
• hd(D) has a top predicate, and
• either hd(C) has an intermediate predicate
or hd(C) has a top predicate and each atom of bd(C) w.r.t. which the
single-folding step is performed, has been derived in a previous trans-
formation step by first unfolding a clause, say E, w.r.t. an atom with
an intermediate predicate and then performing zero or more transfor-
mation steps on a clause derived from the unfolding of E.
Then P0, . . . , Pn is totally correct w.r.t. the semantics FF.
4.4.4 Pure Prolog
In this section we consider the case where a definite program is evaluated
using a Prolog evaluator. Its control strategy can be described as follows.
The SLD-tree for a given program and a given query, is constructed by using
the left-to-right rule for selecting the atom w.r.t. which SLD-resolution
should be performed in a given goal. In this SLD-tree, the nodes which are
sons of a given goal are ordered from left to right according to the order
of the clauses used for performing the corresponding SLD-resolution step.
Thus, in Prolog we have that the SLD-tree is an ordered tree, and it is
visited in a depth-first manner. The use of the Prolog control strategy has
two consequences: i) the answer substitutions are generated in a fixed order,
possibly with repetitions, and ii) there may be some answer substitutions
which cannot be obtained in a finite number of computation steps, because
in the depth-first visit they are 'after' branches of infinite length. Therefore,
by using Prolog control strategy SLD-resolution is not complete.
We will define a semantics function Prolog by taking into consideration
the 'generation order' of the answer substitutions, their 'multiplicity', and
their 'computability in finite time'. Thus, given a program P and a query
Q, we consider the ordered SLD-tree T constructed as specified above.

The left-to-right ordering of brother nodes in T determines a left-to-right


ordering of branches and leaves.
If T is finite then Prolog [P, Q] is the sequence of the computed answer
substitutions corresponding to the non-failed leaves of T in the left-to-right
order.
If T is infinite we consider a (possibly infinite) sequence F of com-
puted answer substitutions, each substitution being associated with a leaf
of T. The sequence F is obtained by visiting from left to right the non-
failed leaves which are at the end of branches to the left of the leftmost
infinite branch. There are two cases: either F is infinite, in which case
Prolog[P, Q] is F or F is finite, in which case Prolog[P, Q] is F followed
by the symbol ⊥, called the undefined substitution. All substitutions dif-
ferent from ⊥ are said to be defined.
Thus, our semantics for Prolog is a function

Prolog: P+ x Q+ -> (SubstSeq, <)

where P+ and Q+ are the sets of definite programs and definite queries,
respectively. (SubstSeq, <) is the set of finite or infinite sequences of
defined substitutions, and finite sequences of defined substitutions followed
by the undefined substitution ⊥. Similar approaches to the semantics of
Prolog can be found in [Jones and Mycroft, 1984; Debray and Mishra, 1988;
Deville, 1990; Baudinet, 1992].
The sequence consisting of the substitutions θ1, θ2, . . . is denoted by
⟨θ1, θ2, . . .⟩, and the concatenation of two sequences S1 and S2 in SubstSeq
is denoted by S1@S2 and it is defined as the usual monoidal concatenation
of finite or infinite sequences, with the extra property: ⟨⊥⟩@S = ⟨⊥⟩, for
any sequence S.
Example 4.4.13. Consider the following three programs:
P1: p(a) <-   p(b) <-   p(a) <-
P2: p(a) <-   p(X) <- p(X)   p(b) <-
P3: p(a) <-   p(b) <- p(b)   p(a) <-
We have that
Prolog[P1, <- p(X)] = ⟨{X/a}, {X/b}, {X/a}⟩
Prolog[P2, <- p(X)] = ⟨{X/a}, {X/a}, . . .⟩
Prolog[P3, <- p(X)] = ⟨{X/a}, ⊥⟩.
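The three behaviours can be observed with any standard Prolog system; the sketch below renames the predicate of each program so that the three versions coexist.

    % P1:  p(a) <-   p(b) <-   p(a) <-
    pa(a).
    pa(b).
    pa(a).

    % P2:  p(a) <-   p(X) <- p(X)   p(b) <-
    pb(a).
    pb(X) :- pb(X).
    pb(b).

    % P3:  p(a) <-   p(b) <- p(b)   p(a) <-
    pc(a).
    pc(b) :- pc(b).
    pc(a).

    % ?- findall(X, pa(X), L).   terminates with L = [a, b, a]
    % ?- pb(X).   yields X = a, then X = a again, and so on forever
    % ?- pc(X).   yields X = a once; asking for further answers loops, so the
    %             answer {X/a} of the third clause is never reached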

The order < over SubstSeq expresses a 'less defined than or equal to'
relation between sequences which can be introduced as follows.
For any two sequences of substitutions S1 and S2, the relation S1 < S2
holds iff either S1 = S2 or S1 = S3@⟨⊥⟩ and S2 = S3@S4, for some S3 and
S4 in SubstSeq. For instance, we have that: (i) for all substitutions η1

and η2 with η1 ≠ ⊥ and η2 ≠ ⊥, ⟨η1, ⊥⟩ < ⟨η1, η2⟩, and the sequences ⟨η1⟩
and ⟨η1, η2⟩ are not comparable w.r.t. <, and (ii) for any (possibly empty)
sequence S, ⟨⊥⟩ < S.
Unfortunately, most transformation rules presented in the previous sec-
tions are not even partially correct w.r.t. Prolog. Indeed, it is easy to
see that a clause rearrangement may affect the 'generation order' or the
'computability in finite time' of the answer substitutions, and the deletion
of a duplicate clause may affect their multiplicity.
An unfolding step may affect the order of the computed answer substi-
tutions as well as the termination of a program, as is shown by the following
examples.
Example 4.4.14. By unfolding w.r.t. r(Y) the first clause of the following
program:

we get

The order of the computed answer substitutions is not preserved by this


unfolding step. Indeed, we have

Example 4.4.15. By unfolding w.r.t. r the first clause of the following


program:

we get

We have that Prolog[P0, <- p] ≠ Prolog[P1, <- p], because

Example 4.4.16. By unfolding w.r.t. r(X) the first clause of the program



We have that Prolog[P0, <- p] ≠ Prolog[P1, <- p], because
Prolog[P0, <- p] = ⟨⊥⟩ and Prolog[P1, <- p] = ⟨⟩.
We also have that the use of the folding rule does not necessarily pre-
serve the Prolog semantics. In order to overcome this inconvenience, sev-
eral researchers have proposed restricted versions of the unfolding and fold-
ing rules [Proietti and Pettorossi, 1991; Sahlin, 1993]. The following two
instances of the unfolding rule can be shown to be totally correct w.r.t.
Prolog.
Rule Rl.l Leftmost unfolding. The unfolding of a clause C w.r.t. the
leftmost atom of its body is said to be a leftmost unfolding of C.
Rule R1.2 Single non-left-propagating unfolding. The unfolding
of a clause H <— F,A,G w.r.t. the atom A is said to be a single non-left-
propagating unfolding iff i) there exists exactly one clause D such that A
is unifiable with hd(D) via a most general unifier θ, and ii) H <- F is a
variant of (H <- F)θ.
This rule R1.2 is called deterministic non-left-propagating unfolding in
[Pettorossi and Proietti, 1994]. If a folding step is both a single-folding
and an in-situ folding, called here single in-situ folding, then it is reversible
by an application of the single non-left-propagating unfolding rule. By
the reversibility Lemma 4.1.3 (page 717), each single in-situ folding step is
totally correct w.r.t. Prolog.
Since Prolog is relevant, the definition introduction and definition elim-
ination rules are totally correct w.r.t. Prolog.
We also have that the goal replacement rule is partially correct, and by
the reversibility lemma and the reversibility of independent goal replace-
ment lemma (page 718) the independent goal replacement is totally correct
w.r.t. Prolog.
Thus, we can state the following two results which are the analogues
of the first and the second correctness theorems w.r.t. LHM, CAS, and
FF. In the following Theorems 4.4.17 and 4.4.18 the instances of the goal
replacement rule are defined in terms of the notion of goal equivalence w.r.t.
Prolog.
Their proofs are based on the fact that an application of the leftmost
unfolding rule can be viewed as 'a step forward in the computation' using
the left-to-right computation rule.
Theorem 4.4.17 (First correctness theorem w.r.t. Prolog). Let
P0, . . . , Pn be a transformation sequence of definite programs constructed by
using the transformation rules: leftmost unfolding, single non-left-propa-
gating unfolding, single in-situ folding, definition introduction, definition
elimination, independent goal replacement, and generalization + equality
introduction. Then P0,. . . ,Pn is totally correct w.r.t. Prolog.

Theorem 4.4.18 (Second correctness theorem w.r.t. Prolog). Let


P0, . . . ,Pn be a transformation sequence of definite programs constructed
by using the following transformation rules: leftmost unfolding, single non-
left-propagating unfolding, single-folding, definition introduction, definition
elimination, basic goal replacement, and generalization + equality intro-
duction. Suppose that no single-folding step is performed after a definition
elimination step. Suppose also that when we apply the single-folding rule
to a clause, say C, using a clause, say D, the following conditions hold:
• either D belongs to P0 or D has been introduced by the definition rule,
• hd(D) has a top predicate, and
• either hd(C) has an intermediate predicate
or hd(C) has a top predicate and C has been derived from a clause,
say E, by first performing a leftmost unfolding step w.r.t. an atom
in bd(E) with an intermediate predicate and then performing zero or
more transformation steps on a clause derived from the unfolding of
E.
Then P0, . . . ,Pn is totally correct w.r.t. the semantics Prolog.
The following example shows that in the above Theorem 4.4.18 we can-
not replace 'leftmost unfolding step' by 'single non-left-propagating unfold-
ing step'.
Example 4.4.19. Let us consider the following initial program:
P0: p <- q, r   q <- fail   r <- r, q
We assume that p is a top predicate and q, r, and fail are intermediate
predicates. By single non-left-propagating unfolding of p <- q, r w.r.t. r,
we get the program
P1: p <- q, r, q   q <- fail   r <- r, q
If we now fold the first clause of P1 using the first clause of P0, we get
P2: p <- p, q   q <- fail   r <- r, q
P2 is not equivalent to P0 w.r.t. Prolog, because
Prolog[P0, <- p] = ⟨⟩ and Prolog[P2, <- p] = ⟨⊥⟩.
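The loss of termination can be checked with a Prolog evaluator. A minimal sketch in standard Prolog syntax follows; the intermediate predicate fail of the example is rendered by the built-in fail, and the predicates of P0 and P2 are renamed.

    % program P0:  p <- q, r   q <- fail   r <- r, q
    p0 :- q0, r0.
    q0 :- fail.
    r0 :- r0, q0.

    % program P2, obtained by non-leftmost unfolding followed by folding:
    %             p <- p, q   q <- fail   r <- r, q
    p2 :- p2, q2.
    q2 :- fail.
    r2 :- r2, q2.

    % ?- p0.   fails finitely (q0 fails at once), so Prolog[P0, <- p] = ⟨⟩
    % ?- p2.   loops on the recursive call to p2, so Prolog[P2, <- p] = ⟨⊥⟩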
In this chapter we have considered only the case of pure Prolog, where
the SLD-resolution steps have no side-effects. Properties which are pre-
served by unfold/fold rules when transforming Prolog programs with side-
effects, including cuts, are described in [Deville, 1990; Sahlin, 1993; Prest-
wich, 1993b; Leuschel, 1994a].

4.5 Correctness results for normal programs


In this section we consider the case where the bodies of the clauses contain
negative literals. There is a large number of papers dealing with trans-
formation rules which preserve the various semantics proposed for logic

programs with negation. In particular, some restricted forms of unfold-


ing and folding have been shown to be correct w.r.t. various semantics,
such as the computed answer substitution semantics and the finite fail-
ure semantics [Gardner and Shepherdson, 1991; Seki, 1991], the Clark's
completion [Gardner and Shepherdson, 1991], the Fitting's and Kunen's
three-valued extensions of Clark's completion [Fitting, 1985; Kunen, 1987;
Bossi et al., 1992b; Sato, 1992; Bossi and Etalle, 1994a], the perfect model
semantics [Przymusinsky, 1987; Maher, 1993; Seki, 1991], the stable model
semantics [Gelfond and Lifschitz, 1988; Maher, 1990; Seki, 1990], and
the well-founded model semantics [Van Gelder et al., 1989; Maher, 1990;
Seki, 1990; Seki, 1993].
A uniform approach for proving the correctness of the unfold/fold trans-
formation rules w.r.t. various non-monotonic semantics of logic programs
(including stable model, well-founded model, and perfect model semantics)
has been proposed by Aravindan and Dung [1995], who showed that the un-
folding and some variants of folding transformations preserve the so-called
semantic kernel of a normal logic program.
We will report here only on the results concerning the following three
semantics [Lloyd, 1987]: i) computed answer substitutions, ii) finite failure,
and iii) Clark's completion.
The computed answer substitution semantics for normal programs is a
function
CASNF: P x Q -> (P(Subst), <)
where P is the set of normal programs, Q is the set of normal queries,
and (P(Subst), <) is the powerset of the set of substitutions ordered by
set inclusion. (The suffix 'NF' stands for 'negation as failure'.) We define
CASNF as follows:
CASNF[P, Q] = {θ | there exists an SLDNF-refutation of Q in P
with computed answer substitution θ}.
CASNF is a relevant semantics.
For the correctness of a transformation sequence w.r.t. CASNF there
are results which are analogous to the ones for CAS. In particular, the
statement of the first correctness theorem w.r.t. CAS (page 729) is valid
also for CASNF if we replace 'definite programs' with 'normal programs'
and CAS with CASNF.
However, the first correctness theorem w.r.t. CAS (and the new version
for CASNF) does not ensure the correctness of transformation sequences
which also include folding steps different from in-situ foldings.
If we want to perform transformation steps which are not applications of
the in-situ folding rule, and still ensure their correctness, we may use The-
orem 4.5.1 below, which combines the second correctness theorems w.r.t.
CAS and FF.

Following [Seki, 1991] in the hypotheses of Theorem 4.5.1 we assume


that the programs are stratified.
We recall that a program is stratified iff for every program clause
p(. . .) <- B and for every negative literal ¬q(. . .) in B, we have that q
does not depend on p.
Theorem 4.5.1 (Second correctness theorem w.r.t. CASNF). Let
P0 , . . . , Pn be a transformation sequence of stratified normal programs con-
structed by using the following transformation rules: unfolding, single-
folding, definition introduction, definition elimination, goal rearrangement,
deletion of ground duplicate goals, basic goal replacement, clause rearrange-
ment, deletion of duplicate clauses, deletion of clauses with finitely failed
body, and generalization + equality introduction. Suppose that no single-
folding step is performed after a definition elimination step. Suppose also
that when we apply the single-folding rule to a clause, say C, using a clause,
say D, the following conditions hold:
• either D belongs to P0 or D has been introduced by the definition rule,
• hd(D) has a top predicate, and
• either hd(C) has an intermediate predicate
or hd(C) has a top predicate and each atom of bd(C) w.r.t. which
the single-folding step is performed, has been derived in a previous
transformation step by unfolding a clause, say E, w.r.t. an atom with
an intermediate predicate and then performing zero or more transfor-
mation steps on a clause derived from the unfolding of E.
Then P0, . . . , Pn is totally correct w.r.t. the semantics CASNF.

The finite failure semantics for normal programs, denoted FF as in the


case of definite programs, has the same domain and codomain of CASNF.
We define FF as follows:
FF[P, Q] = {θ | there exists a finitely failed SLDNF-tree for Qθ in P}.
For normal programs we may state some correctness results which are
analogous to those stated in the case of definite programs. Indeed, the first
and second correctness theorems w.r.t. FF (pages 730 and 732) continue to
hold if we replace 'definite programs' with 'stratified normal programs' and
we assume FF to be defined with reference to the set of normal programs
and normal queries, instead of the set of definite programs and definite
queries.
Now we consider the Clark's completion semantics.
Let P and Q be the sets of all normal programs and normal queries,
respectively. For any program P € P, let Comp(P) be the set of first
order formulas, called the completion of P, and constructed as indicated in
[Lloyd, 1987], except that Comp(P) also contains a formula ∀X ¬p(X) for
every predicate p in the language L not occurring in P.

The Clark's completion semantics is defined by the function

COMP: P x Q -> (P(Subst), <)

such that
COMP[P, <- G] = {θ | Comp(P) |= ∀(Gθ)}
where ∀(C) denotes the universal closure of a conjunction C of literals, and
similarly to the case of LHM, we identify any program P and any goal G
with the corresponding logical formulas.
As already mentioned, COMP is not a relevant semantics. Thus, we
cannot use the relevance lemma (page 716), and indeed, the definition
introduction rule and the definition elimination rule are not totally cor-
rect w.r.t. COMP. To see this, let us consider the case where we add
to a program P1 whose completion is consistent, a new clause of the form
newp(X) <- ¬newp(X). We get a new program, say P2, whose completion
contains the formula newp(X) ↔ ¬newp(X) and it is inconsistent. Thus,
COMP[P1, Q] ≠ COMP[P2, Q] for Q = <- newp(X).
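Operationally, the same clause is also pathological under SLDNF. The following one-clause sketch, in standard Prolog syntax with negation as failure written \+, loops on any ground call; the predicate name newp is the one used above.

    % the clause added to P1:  newp(X) <- ¬newp(X)
    newp(X) :- \+ newp(X).

    % ?- newp(a).   does not terminate: the call to \+ newp(a) immediately
    %                calls newp(a) again, so the recursion never bottoms out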
However, it can be shown that if the definition introductions are only
non-recursive definition introductions then any step of non-recursive defini-
tion introduction or definition elimination is totally correct w.r.t. COMP.
The partial correctness w.r.t. COMP of the unfolding rule Rl and the
folding rule R2 can easily be established, as illustrated by the following
example.
Example 4.5.2. Let us consider the program

whose completion is

By unfolding the first clause of P0 w.r.t. q we get

whose completion is

Comp(P1) can be obtained by replacing q in p ↔ q ∧ ¬r of Comp(P0)
by (s ∧ t) ∨ (s ∧ u) and then applying the distributive and associative laws.
Since q ↔ (s ∧ t) ∨ (s ∧ u) holds in Comp(P0), we have that Comp(P1) is
a logical consequence of Comp(P0).

From P1 by folding the definition of p using the definition of v in P1


itself, we get

whose completion is

Comp(P2) can be obtained from Comp(P1) by first using the associa-
tive, commutative, and distributive laws for replacing the formula p ↔
(s ∧ t ∧ ¬r) ∨ (s ∧ u ∧ ¬r) by p ↔ (t ∨ u) ∧ (s ∧ ¬r), and then substituting
v for t ∨ u. Since v ↔ t ∨ u holds in Comp(P1), we have that Comp(P2) is
a logical consequence of Comp(P1).
In general, given a transformation sequence P0, . . . , Pk, if a program
Pk+1 can be obtained from a program Pk by an unfolding step using Pj,
with 0 < j < k, and both Comp(Pj) and Comp(Pk) are logical conse-
quences of Comp(P0), then Comp(Pk+1) can be obtained from Comp(Pk)
by one or more replacements of a formula F by a formula G such that
F ↔ G is a logical consequence of Comp(Pj). Thus, also Comp(Pk+1) is a
logical consequence of Comp(P0).
A similar fact holds if Pk+1 can be obtained from Pk by applying the
folding rule, or the goal replacement rule, or the clause replacement rules.
Thus, we have the following result.
Theorem 4.5.3 (Partial correctness w.r.t. COMP). Let P0 , . . . ,Pn
be a transformation sequence of normal programs constructed by using the
following rules: unfolding, folding, non-recursive definition introduction,
definition elimination, goal replacement, and clause replacement. If no
folding step is performed after a definition elimination step, then P0 , . . . , Pn
is partially correct w.r.t. the semantics COMP.
Unfortunately, the unfolding rule is not totally correct w.r.t. COMP
as shown by the following example adapted from [Maher, 1987].
Example 4.5.4. Let us consider the program

whose completion is (equivalent to)

together with the axioms of Clark's equality theory [Lloyd, 1987; Apt,
1990]. By unfolding the last clause of P0 we get

whose completion is (equivalent to)

together with the axioms of Clark's equality theory.


We have that ∀X p(X) is a logical consequence of Comp(P0). On the
other hand, ∀X p(X) is not a logical consequence of Comp(P1). Indeed,
let us consider the interpretation I whose domain is the set of integers,
p(x) holds iff q(x) holds iff x is an even integer, and succ is the successor
function. I is a model of Comp(P1), whereas it is not a model of ∀X p(X).
We may restrict the use of the unfolding rule so to make it totally
correct w.r.t. COMP, as indicated in the following definition.
Rule R1.3 In-situ unfolding. The unfolding of a clause C in a program
Pk w.r.t. an atom A using a program Pj, is said to be an in-situ unfolding
iff Pj = Pk, and hd(C) is not unifiable with A.
If program Pk+1 is derived from program Pk by performing an in-situ
unfolding step, then the transformation sequence Pk,Pk+1 is reversible by
in-situ folding. Thus, by the reversibility lemma 4.1.3 (page 717) and the
partial correctness theorem w.r.t. COMP (page 740), we have that every
in-situ unfolding is totally correct w.r.t. COMP.
By similar arguments we can show the total correctness of any in-situ
folding and independent goal replacement step, because, as the reader may
verify, the independent goal replacement rule is reversible even though
COMP is not a relevant semantics (and, thus, the reversibility of inde-
pendent goal replacement lemma cannot be applied).
It is also straightforward to show the total correctness of any clause
replacement step. Thus, we have the following result.
Theorem 4.5.5 (First correctness theorem w.r.t. COMP). Let P0,
. . . ,Pn be a transformation sequence of normal programs constructed by us-
ing the transformation rules: in-situ unfolding, in-situ folding, non-recursive
definition introduction, definition elimination, independent goal replace-
ment, and clause replacement. Then P0, . . . , Pn is totally correct w.r.t.
COMP.
We end this section by showing, through the following example, that
the hypotheses of the second correctness theorem w.r.t. CASNF are not
sufficient to ensure the correctness of folding w.r.t. COMP.
Example 4.5.6. Let us consider the following transformation sequence:
P0: p <- q   q <- q   r <- p   r <- ¬q
(by unfolding p <- q w.r.t. q)
P1: p <- q   q <- q   r <- p   r <- ¬q
(by single-folding of p <- q)


P2: p <- p   q <- q   r <- p   r <- ¬q
where we assume that p and r are top predicates and q is an intermediate
one.
By the second correctness theorem w.r.t. CASNF we have that P0 and
P2 (and also P1) are equivalent w.r.t. CASNF. Let us now consider the
completions of P0 and P2, respectively:
Comp(P0): p ↔ q   q ↔ q   r ↔ p ∨ ¬q
Comp(P2): p ↔ p   q ↔ q   r ↔ p ∨ ¬q
We have that r is a logical consequence of Comp(P0). On the contrary, r is
not a logical consequence of Comp(P2). Indeed, the interpretation where
p is false, q is true, and r is false, is a model of Comp(P2), but not of r.
Thus, P0 and P2 are not equivalent w.r.t. COMP.
It should be noted that in the above Example 4.5.6, P0 is equivalent
to P2 w.r.t. other two-valued or three-valued semantics for normal pro-
grams such as the already mentioned Fitting's and Kunen's extensions of
Clark's completion, perfect model, stable model, and well-founded model
semantics.
For the case where unfolding and folding are not in-situ, the reader may
find various correctness results w.r.t. the above mentioned semantics in
[Seki, 1990; Seki, 1991; Sato, 1992; Seki, 1993; Aravindan and Dung, 1995;
Bossi and Etalle, 1994a].

5 Strategies for transforming logic programs


The transformation process should be directed by some metarules, which
we call strategies, because, as we have seen in Section 3, the transformation
rules have inverses, and thus, they allow the final program of a transfor-
mation sequence to be equal to the initial program. Obviously, we are not
interested in such useless transformations.
In this section we present an overview of some transformation strategies
which have been proposed in the literature. They are used, in particular,
for solving one of the crucial problems of the transformation methodology,
that is, the use of the definition rule for the introduction of the so-called
eureka predicates.
In [Feather, 1987; Partsch, 1990; Deville, 1990; Pettorossi and Proi-
etti, 1994; Pettorossi and Proietti, 1996] one can find a treatment of the
transformation strategies for both functional and logic programs.
For reasons of simplicity, when we describe the various transformation
strategies we consider only the case of definite programs with the least
Herbrand model semantics LHM.
We assume that the following rules are available: unfolding (rule Rl,
page 708), in-situ folding (rule R2.1, page 717), single-folding (rule R2.2,

page 726), definition introduction (rule R3, page 711), definition elim-
ination (rule R4, page 712), independent goal replacement (rule R5.3,
page 718), goal rearrangement (rule R5.4, page 725), deletion of duplicate
goals (rule R5.5, page 725), basic goal replacement (rule R5.6, page 727),
and clause replacement (rule R6, page 714).
The correctness of those rules w.r.t. LHM is ensured by the first and
the second correctness theorems w.r.t. LHM (pages 725 and 727).
In order to simplify our presentation, sometimes we will not mention
the use of goal rearrangement and deletion of duplicate goals.
As already pointed out, from the correctness of the goal rearrangement,
deletion of duplicate goals, clause rearrangement (rule R6.1, page 714),
and deletion of duplicate clauses (rule derived from rule R6.2, page 714) it
follows that the concatenation of sequences of literals and the concatena-
tion of sequences of clauses are associative, commutative, and idempotent.
Therefore, when dealing with goals or programs, we will feel free to use
set-theoretic notations, such as {...} and U, instead of sequence-theoretic
notations.
Before giving the technical details concerning the transformation strate-
gies we would like to present, we now informally explain the main ideas
which justify their use.
Suppose that we are given an initial program and we want to apply the
transformation rules to improve its efficiency. In order to do so, we usually
need a preliminary analysis of the initial program by which we discover that
the evaluation of a goal, say A1, . . . , An, in the body of a program clause,
say C, is inefficient, because it generates some redundant computations.
For example, by analysing the initial program P0 given in the palindrome
example of Section 2 (page 702), we may discover that the evaluation of
the body of the clause
3. pal([H|T]) <- append(Y,[H],T), pal(Y)
is inefficient because it determines multiple traversals of the list Y.
In order to improve the performance of P0, we can apply the technique
which consists in introducing a new predicate, say newp, by means of a
clause, say N, with body A1, . . . , An.
This initial transformation step can be formalized as an application
of the so-called tupling strategy (page 746). Sometimes we also need a
simultaneous application of the generalization strategy (page 747). Then
we fold clause C w.r.t. the goal A1, . . . , An by using clause N, and we
unfold clause N one or more times, thereby generating some new clauses.
This process can be viewed as a symbolic evaluation of a query which
is an instance of A1, . . . , An. This unfolding gives us the opportunity of
improving our program, because, for instance, we may delete some clauses
with finitely failed body, thereby avoiding failures at run-time, and we may
delete duplicate atoms, thereby avoiding repeated computations.

Looking again at the palindrome example of Section 2, we see that by


applying the tupling and generalization strategies, we have introduced the
clause
6. newp(L,T) <- append(Y,L,T),pal(Y)
and we have used this clause 6 for folding clause 3. Then we have unfolded
clause 6 w.r.t. the atoms pal(...) and append(...) and we have derived the
clauses
10. newp(L, L) <—
11. newp(L,[X|L]) <-
13. newp(L,[H|U]) <- append(R,[H|L],U), pal(R)
These clauses for newp, together with clauses 1, 2, and 3f for pal and
clauses 4 and 5 for append, avoid multiple traversals of the input list, but
as the reader may verify, only when that list has at most three elements.
The efficiency improvements due to the unfoldings can be iterated at
each level of recursion, and thus, they become computationally significant,
only if we find a recursive definition of newp. In that case the multiple
traversals of the input list will be avoided for lists of any length.
This recursive definition can often be obtained by performing a folding
step using the clause initially introduced by tupling. In our palindrome
example that clause is clause 6. By folding clause 13 using clause 6, we get
13f. newp(L,[H|U]) <- newp([H|L],U)
This recursive clause, together with clauses 10 and 11, indeed provides a
recursive definition of newp, and it avoids multiple traversals of any input
list.
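Putting the pieces together, the transformed palindrome program can be run as ordinary Prolog code. The sketch below is in standard Prolog syntax; the first three clauses are a plausible reconstruction of clauses 1, 2, and 3f of Section 2 (the unit clauses for the empty and one-element lists together with the folded clause), and no call to append is needed any longer.

    % clauses 1, 2, and 3f for pal (reconstructed: an assumption, not shown here)
    pal([]).
    pal([_]).
    pal([H|T]) :- newp([H], T).

    % clauses 10, 11, and 13f for newp
    newp(L, L).
    newp(L, [_|L]).
    newp(L, [H|U]) :- newp([H|L], U).

    % ?- pal([a,b,a]).     succeeds
    % ?- pal([a,b,c,a]).   fails
    % The first argument of newp accumulates, in reverse order, the elements
    % already scanned, so each input list is traversed only once.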
In some unfortunate cases we may be unable to perform the desired fold-
ing steps for deriving the recursive definition of the predicates introduced
by the initial applications of the tupling and generalization strategies. In
those cases we may use some auxiliary strategies and we may introduce
some extra eureka predicates which allow us to perform the required fold-
ing steps. Two of those auxiliary strategies are the loop absorption strategy
and the already mentioned generalization strategy, both described in Sec-
tion 5.1 below.
In [Darlington, 1981] the expression 'forced folding' is introduced to
refer to the need for performing the folding steps for improving program
efficiency. This need for folding plays an important role in the program
transformation methodology, and it can be regarded as a meta-strategy. It
is the need for folding that often suggests the appropriate strategy to apply
at each step of the derivation.
The need for folding in program transformation is related to similar
ideas in the field of automated theorem proving [Boyer and Moore, 1975]
and program synthesis [Deville and Lau, 1994], where tactics for inductive

proofs and inductive synthesis are driven by the need for applying suitable
inductive hypotheses.
5.1 Basic strategies
We now describe some of the basic strategies which have been introduced
in the literature for transforming logic programs. They are: tupling, loop
absorption, and generalization. The basic ideas underlying these strategies
come from the early days of program transformation and they were already
present in [Burstall and Darlington, 1977].
The tupling strategy was formally defined in [Pettorossi, 1977] where it
is used for tupling together different function calls which require common
subcomputations or visit the same data structure.
The name 'loop absorption' was introduced in [Proietti and Pettorossi,
1990] for indicating a strategy which derives a new predicate definition
when a goal is recurrently generated during program transformation. This
strategy is present in various forms in a number of different transforma-
tion techniques, such as the above mentioned tupling, supercompilation
[Turchin, 1986], compiling control [Bruynooghe et al., 1989], as well as
various techniques for partial evaluation (see Section 6).
Finally, the generalization strategy has its origin in the automated the-
orem proving context [Boyer and Moore, 1975], where it is used to generate
a new generalized conjecture allowing the application of an inductive hy-
pothesis.
The tupling, loop absorption, and generalization strategies are used
in this chapter as building blocks to describe a number of more complex
transformation techniques.
For a formal description of the strategies and their possible mechaniza-
tion we now introduce the notion of unfolding tree. It represents the process
of transforming a given clause by performing unfolding and basic goal re-
placement steps. This notion is also related to the one of symbolic trace tree
of [Bruynooghe et al., 1989], where, however, the basic goal replacement
rule is not taken into account.
Definition 5.1.1 (Unfolding tree). Let P be a program and C a clause.
An unfolding tree for (P, C) is a (finite or infinite) labelled tree such that
• the root is labelled by the clause C,
• if M is a node labelled by a clause D then
either M has no sons,
or M has n (≥ 1) sons labelled by the clauses D1, . . . , Dn obtained
by unfolding D w.r.t. an atom of its body using P,
or M has one son labelled by a clause obtained by basic goal replace-
ment from D.
In an unfolding tree we also have the usual relations of 'descendant
node' (or clause) and 'ancestor node' (or clause).

Given a program P and a clause C, the construction of an unfolding


tree for (P, C) is nondeterministic. In particular, during the process of
constructing an unfolding tree we need to decide whether or not a node
should have son-nodes, and in case we decide that a node should have son-
nodes constructed by unfolding, we need to choose the atom w.r.t. which
that unfolding step should be performed. Those choices can be realized by
using a function defined as follows.
Definition 5.1.2 (Unfolding selection rule). An unfolding selection
rule (or u-selection rule, for short) is a function that, given an unfolding
tree and one of its leaves, tells us whether or not we should unfold the
clause in that leaf, and in the affirmative case it tells us the atom w.r.t.
which that clause should be unfolded.
We now formally introduce the tupling, loop absorption, and general-
ization strategies.

S1. Tupling strategy. Let us consider a clause C of the form
H <- A1, . . . , Am, B1, . . . , Bn
with m ≥ 1 and n ≥ 0. We introduce a new predicate newp defined by a
clause T of the form
newp(X1, . . . , Xk) <- A1, . . . , Am
where the arguments X1, . . . , Xk are the elements of vars(A1, . . . , Am) ∩
vars(H, B1, . . . , Bn). We then look for the recursive definition of the eureka
predicate newp by performing some unfolding and basic goal replacement
steps followed by suitable folding steps using clause T. We finally fold
clause C w.r.t. the atoms A1, . . . , Am using clause T.
The tupling strategy is often applied when A1, . . . , Am share some vari-
ables. The program improvements which can be achieved by using this
strategy are based on the fact that we need to evaluate only once the sub-
goals which are common to the computations determined by the tupled
atoms A1, . . . , Am. By tupling we can also avoid multiple visits of data
structures and the construction of intermediate bindings.

S2. Loop absorption strategy. Suppose that a non-root clause C in an
unfolding tree has the form
H <- A1, . . . , Am, B1, . . . , Bn
with m ≥ 1 and n ≥ 0, and the body of a descendant D of C contains
(as a subsequence of atoms) the instance (A1, . . . , Am)θ of A1, . . . , Am via
some substitution θ. Suppose also that the clauses in the path from C to
D have been generated by applying no transformation rule, except for goal
rearrangement and deletion of duplicate goals, to B1, . . . , Bn. We introduce
a new predicate defined by the following clause A:
newp(X1, . . . , Xk) <- A1, . . . , Am

where {X1, . . . ,Xk} is the minimum subset of vars(A1, . . . , Am) which is


necessary to perform a single-folding step on C and a single-folding step on
D, both using a clause whose body is A1, . . . , Am. This minimum subset
is determined by condition 3 for the applicability of the single-folding rule
(page 726). We fold clause C using clause A and we then look for the
recursive definition of the eureka predicate newp. This recursive definition
can be found starting from clause A by first performing the unfolding steps
and the basic goal replacement steps corresponding to the ones which lead
from clause C to clause D, and then folding using clause A again.

S3. Generalization strategy. Let us consider a clause C of the form


H <- A1, . . . , Am, B1, . . . , Bn
with m ≥ 1 and n ≥ 0. We introduce a new predicate genp defined by a
clause G of the form
genp(X1, . . . , Xk) <- GenA1, . . . , GenAm
where (GenA1, . . . , GenAm)θ = (A1, . . . , Am), for some substitution θ, and
{X1, . . . , Xk} is a superset of the variables which are necessary to fold C
using a clause whose body is GenA1, . . . , GenAm. We then fold C using G
and we get
H <- genp(X1, . . . , Xk)θ, B1, . . . , Bn
We finally look for the recursive definition of the eureka predicate genp.
A suitable form of the clause G introduced by generalization can often
be obtained by matching clause C against one of its descendants, say D,
in the unfolding tree generated during program transformation (see Exam-
ple 5.2.2, page 753). In particular, we will consider the case where

1. D is the clause K <- E1, . . . , Em, F1, . . . , Fr, and D has been ob-
tained from C by applying no transformation rule, except for goal
rearrangement and deletion of duplicate goals, to B1, . . . , Bn,
2. for i = 1, . . . , m, Ei has the same predicate symbol as Ai,
3. E1, . . . , Em is not an instance of A1, . . . , Am,
4. the goal GenA1, . . . , GenAm is the most specific generalization of
A1, . . . , Am and E1, . . . , Em, and
5. {X1, . . . , Xk} is the minimum subset of vars(GenA1, . . . , GenAm)
which is necessary to fold both C and D using a clause whose body
is GenA1, . . . , GenAm.

5.2 Techniques which use basic strategies


In this section we will present some techniques for improving program effi-
ciency by using the tupling, loop absorption, and generalization strategies.

5.2.1 Compiling control


One of the advantages of logic programming over conventional imperative
programming languages is that by writing a logic program one may easily
separate the 'logic' part of an algorithm from the 'control' part [Kowalski,
1979]. By doing so, the correctness of an algorithm w.r.t. a given specifica-
tion is often easier to prove. Obviously, we are then left with the problem
of providing an efficient control.
Unfortunately, the naive Prolog strategy for controlling SLD-resolution
(see Section 4.4.4) does not always give us the desired level of efficiency,
because the search space generated by the nondeterministic evaluation of
a program is explored without using any information about the program.
Much work has been done in the direction of improving the control strategy
of logic languages (see, for instance, [Bruynooghe and Pereira, 1984; Naish,
1985]).
We consider here a transformation technique, called compiling control
[Bruynooghe et al., 1989], which follows a different approach. Instead of
enhancing the naive Prolog evaluator using a clever (and often more com-
plex) control strategy, we transform the given program so that the derived
program behaves using the naive evaluator as the given program behaves
using an enhanced evaluator.
The main advantage of the compiling control approach is that we can
use relatively simple evaluators which have small and efficient compilers.
The compiling control technique can also be used to 'compile' bottom-
up and mixed evaluation strategies [De Schreye et al., 1991; Sato and
Tamaki, 1988] as well as lazy evaluation and coroutining [Narain, 1986].
Here we will only show the use of compiling control in the case where the
control to be 'compiled' is a computation rule different from the left-to-
right Prolog one. In this case, by applying the compiling control technique
one can improve generate-and-test programs by simulating a computation
rule which selects test predicates as soon as the relevant data are available.
A similar idea has also been investigated in the area of functional pro-
gramming, within the so-called filter promotion strategy [Darlington, 1978;
Bird, 1984]. Some other transformation techniques for improving generate-
and-test logic programs, which are closely related to the compiling control
technique and the filter promotion strategy, can be found in [Seki and Fu-
rukawa, 1987; Brough and Hogger, 1991; Traff and Prestwich, 1992].
The problem of 'compiling' a given computation rule C can be described
as follows: given a program P1 and a set Q of queries, we want to derive a
new program P2 which, for any query in Q, is equivalent to P1 w.r.t. LHM
and behaves using the left-to-right computation rule as P1 does using the
rule C [Bruynooghe et al., 1989; De Schreye and Bruynooghe, 1989].
By 'equal behaviour' we mean that for a query in Q, the SLD-tree, say
T1, constructed by using P1 and the computation rule C, is equal to the
SLD-tree, say T2, constructed by using P2 and the left-to-right computation
rule, if
i) we look at T1 and T2 as directed trees with leaves labelled by 'suc-
cess' or 'failure' and arcs labelled by most general unifiers (thus, we
disregard the goals in the nodes),
ii) we replace zero or more non-branching paths of T1 by single arcs, each
of which is labelled by the composition of the most general unifiers
labelling the corresponding path to be replaced, and
iii) we replace zero or more subtrees of T1 whose roots have an outgoing
arc only, and this arc is labelled by the identity substitution, as fol-
lows: every subtree is replaced by the subtree below the arc outgoing
from its root.
We can formulate basic forms of compiling control in terms of the pro-
gram transformation methodology as we now indicate. Given a program
P1, a set Q of queries, and a computation rule C, the program P2 obtained
by the compiling control technique can be derived by first constructing a
suitable unfolding tree, say T, using the unfolding rule only, and then ap-
plying the loop absorption strategy. Some more complex forms of compiling
control require the use of generalization strategies possibly more powerful
than the strategy S3 (page 747).
Without loss of generality, we assume that every query in Q is of the
form <- q(...) and in P1 there exists only one clause, say R, whose head
predicate is q. (Indeed, we can use the definition rule to comply with this
condition.) The root clause of T is R and the nodes of T are generated
by using a 'suitable' u-selection rule which simulates the evaluation of an
'abstract query' representing the whole set Q, by using the computation
rule C. We will not give here the formal notions of 'simulation' and 'ab-
straction' which may be used to effectively construct the unfolding tree T
from the given P1, Q, and C. We refer to [Cousot and Cousot, 1977] for a
formalization of the techniques of abstract interpretation, and to [De Schr-
eye and Bruynooghe, 1989] for a method based on abstract interpretation
for generating the tree T in a semi-automatic way.
We now give an example of application of the compiling control tech-
nique by using the tupling and the loop absorption strategies.
Example 5.2.1. [Common subsequences] Let sequences be represented
as lists of items. We assume that for any given sequence X and Y,
subseq(X,Y) holds iff X is a subsequence of Y, in the sense that X can
be obtained from Y by deleting some (possibly not contiguous) elements.
Suppose that we want to verify whether or not a sequence X is a common
subsequence of the two sequences Y and Z. The following program Csub
does so by first verifying that X is a subsequence of Y, and then verifying
that X is a subsequence of Z.

1. csub(X, Y, Z) <- subseq(X, Y), subseq(X, Z)
2. subseq([], X) <-
3. subseq([A|X], [A|Y]) <- subseq(X, Y)
4. subseq([A|X], [B|Y]) <- subseq([A|X], Y)
where for any sequences X, Y, and Z, csub(X, Y, Z) holds iff X is a subse-
quence of Y and X is also a subsequence of Z.
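For instance (a check of ours), csub([a], [b, a], [a, c]) holds, since [a] is a subsequence of both [b, a] and [a, c], whereas csub([b], [b, a], [a, c]) does not hold, because [b] is not a subsequence of [a, c].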
Let Q be the set of queries {<- csub(X, s, t) | s and t are ground lists
and X is an unbound variable} and the computation rule C be the following
one:
if the goal is 'subseq(w,x), subseq(y, z)' and w is a proper subterm of y
then C selects for resolution the atom subseq(y, z)
else C selects for resolution the leftmost atom in the goal.
We may expect that the evaluation of a query in Q using the compu-
tation rule C, is more efficient than the evaluation of that query using the
standard left-to-right Prolog computation rule, because the second occur-
rence of subseq(...) in a goal of the form 'subseq(...), subseq(...)' is selected
as soon as it gets suitably instantiated by the evaluation of the first occur-
rence. Thus, using the computation rule C, it may be the case that the
evaluation of a goal of the form 'subseq(...), subseq(...)' fails even if the
first occurrence of subseq(...) has not been completely evaluated.
We now construct an unfolding tree T for (Csub, clause 1) by using the
following u-selection rule Uc which simulates the computation rule C:
if the body of the clause to be unfolded is 'subseq(w, x), subseq(y, z)'
and w is a proper subterm of y
then Uc selects for unfolding the atom subseq(y,z)
else Uc selects for unfolding the leftmost atom in the body.
Clause 1 is the only clause whose head unifies with csub(X, s, t).
A finite portion of T is depicted in Fig. 2, where a dashed arrow from
clause M to clause N means that the body of M is an instance of the body
of N.
Since the body of clause 10 is an instance of the body of clause 6, we
apply the loop absorption strategy. We introduce the eureka predicate
newcsub by the following clause:
11. newcsub(A, X, Y, Z) <- subseq(X, Y), subseq([A|X], Z)
and we fold clause 6, thereby obtaining
6f. csub([A|X], [A|Y], Z) <- newcsub(A, X, Y, Z)
We also have that the body of clause 7 is an instance of the body of
clause 1. We fold clause 7 and we get
7f. csub([A|X], [B|Y], Z) <- csub([A|X], Y, Z)

1. csub(X, Y, Z) <- subseq(X, Y), subseq(X, Z)
5. csub([], Y, Z) <- subseq([], Z)
6. csub([A|X], [A|Y], Z) <- subseq(X, Y), subseq([A|X], Z)
7. csub([A|X], [B|Y], Z) <- subseq([A|X], Y), subseq([A|X], Z)
8. csub([], Y, Z) <-
9. csub([A|X], [A|Y], [A|Z]) <- subseq(X, Y), subseq(X, Z)
10. csub([A|X], [A|Y], [B|Z]) <- subseq(X, Y), subseq([A|X], Z)

Fig. 2. An unfolding tree for (Csub, clause 1) using the u-selection rule Uc.
We have underlined the atoms selected for unfolding. Clauses 5, 6, and 7
are the children of the root clause 1; clause 8 is the child of clause 5; clauses
9 and 10 are the children of clause 6. Dashed arrows go from clause 7 to
clause 1 and from clause 10 to clause 6.

We now have to look for the recursive definition of the predicate new-
csub. Starting from clause 11, we perform the unfolding step corresponding
to the one which leads from clause 6 to clauses 9 and 10. We get the clauses
12. newcsub(A, X, Y, [A|Z]) <- subseq(X, Y), subseq(X, Z)
13. newcsub(A, X, Y, [B|Z]) <- subseq(X, Y), subseq([A|X], Z)
and by folding we get
12f. newcsub(A, X, Y, [A|Z]) <- csub(X, Y, Z)
13f. newcsub(A, X, Y, [B|Z]) <- newcsub(A, X, Y, Z)
The final program is made out of the following clauses:
8. csub([], Y, Z) <-
6f. csub([A|X], [A|Y], Z) <- newcsub(A, X, Y, Z)
7f. csub([A|X], [B|Y], Z) <- csub([A|X], Y, Z)
12f. newcsub(A, X, Y, [A|Z]) <- csub(X, Y, Z)
13f. newcsub(A, X, Y, [B|Z]) <- newcsub(A, X, Y, Z)
The correctness of the above transformation can easily be proved by
applying the second correctness theorem w.r.t. LHM with the assumption
that newcsub and csub are top predicates and subseq is an intermediate
predicate. In particular, the single-folding step which generates clause 6f
from clause 6 using clause 11, satisfies the conditions of that theorem, be-
cause: (i) clause 11 has been introduced by the definition rule, (ii) the head
of clause 11 has a top predicate, and (iii) clause 6 has been derived from
clause 1 by unfolding w.r.t. the intermediate atom subseq(X, Y). Similar
conditions ensure the correctness of the other single-folding steps.
Let us now compare the SLD-tree, say T1, for Csub, a query of the form
<- csub(X, s, t) in Q, and the computation rule C, with the SLD-tree, say
T2, for the final program, the query <- csub(X, s, t), and the left-to-right
computation rule.

Fig. 3. Tree rewritings for the SLD-tree T1.

As the reader may verify, the tree T2 can be obtained from the tree
T1 by first replacing every query of the form '<- subseq(x, b), subseq(x, c)'
by the query '<- csub(x, b, c)' and every query of the form '<- subseq(x, b),
subseq([a|x], c)' by the query '<- newcsub(a, x, b, c)', and then by perform-
ing on the derived tree the rewritings shown in Fig. 3 for any unbound
variable X and ground lists u and v.
5.2.2 Composing programs
A popular style of programming, which can be called compositional, con-
sists in decomposing a given goal into easier subgoals, then writing program
modules which solve these subgoals, and finally, composing the various pro-
gram modules together. The compositional style of programming is often
helpful for writing programs which can easily be understood and proved
correct w.r.t. their specifications.
However, this programming style often produces inefficient programs,
because the composition of the various subgoals does not take into account
the interactions which may occur among the evaluations of these subgoals.
For instance, let us consider a logic program with a clause of the form
p(X) <- q(X, Y), r(Y)
where in order to solve the goal p(X) we are required to solve q(X, Y) and
r(Y). The binding of the variable Y is not explicitly needed because it does
not occur in the head of the clause. If the construction, the memorization,
and the destruction of that binding are expensive, then our program is
likely to be inefficient.
Similar problems occur when the compositional style of programming
is applied for writing programs in other programming languages, different
from logic. In imperative languages, for instance, one may construct several
procedures which are then combined together by using various kinds of
sequential or parallel composition operators. In functional languages, the
small subtasks in which a given task is decomposed are solved by means of
individual functions which are then combined together by using function
application or tupling.
There are various papers in the literature which present techniques for
improving the efficiency of the evaluation of programs written according to

the compositional style of programming.


Similarly to the case discussed in Section 5.2.1, two approaches have
been followed:
1. the improvement of the evaluator by using, for instance, garbage
collection, memoing, and various forms of laziness and coroutining,
and
2. the transformation of the given program into a semantically equiva-
lent one which can be more efficiently evaluated by a non-improved,
standard evaluator.
In the imperative and functional cases, various transformation methods
have been proposed, such as: finite differencing [Paige and Koenig, 1982],
composition or deforestation [Feather, 1982; Wadler, 1990], and tupling
[Pettorossi, 1977]. (See also [Feather, 1987; Partsch, 1990; Pettorossi and
Proietti, 1996] for surveys.)
For logic programs two main methods have been considered: loop fu-
sion [Debray, 1988] and unnecessary variable elimination [Proietti and Pet-
torossi, 1995]. The aim of loop fusion is to transform a program which com-
putes a predicate defined by the composition of two independent recursive
predicates, into a program where the computations corresponding to these
two predicates are performed by one predicate only. Using loop fusion one
may avoid the multiple traversal of data structures and the construction of
intermediate data structures.
The method presented in [Proietti and Pettorossi, 1995] may be used for
deriving programs without unnecessary variables. A variable X of a clause
C is said to be unnecessary if at least one of the following two conditions
holds:
1. X occurs more than once in the body of C (in this case we say that
X is a shared variable),
2. X does not occur in the head of C (in this case we say that X is an
existential variable).
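For instance (an illustration of ours, with length and plus assumed to have their usual meaning), in the clause
doublelength(Xs, N) <- length(Xs, M), plus(M, M, N)
the variable M is unnecessary: it occurs more than once in the body, and hence it is shared, and it does not occur in the head, and hence it is also existential.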
Since unnecessary variables often determine multiple traversals of data
structures and construction of intermediate data structures, the results of
unnecessary variable elimination are often similar to those of loop fusion.
In the following example we recast loop fusion and unnecessary variable
elimination in terms of the basic strategies presented in Section 5.1.
Example 5.2.2. [Maximal number deletion] Suppose that we are given a
list Xs of positive numbers. We want to delete from Xs every occurrence
of its maximal number, say M. This can be done by first computing the
value of M, and then visiting again Xs for deleting each occurrence of M.
A program which realizes this algorithm is as follows:
1. deletemax(Xs, Ys) <- maximal(Xs, M), delete(M, Xs, Ys)
2. maximal([], 0) <-
3. maximal([X|Xs], M) <- maximal(Xs, N), max(N, X, M)
4. delete(M, [], []) <-
5. delete(M, [M|Xs], Ys) <- delete(M, Xs, Ys)
6. delete(M, [X|Xs], [X|Ys]) <- M ≠ X, delete(M, Xs, Ys)
where, for any positive number A, B, and M, max(A, B,M) holds iff M
is the maximum of A and B.
We would like to derive a program which traverses the list Xs once only.
This could be done by applying the loop fusion method and obtaining a
new program where the computations corresponding to maximal and delete
are performed by one predicate only. A similar result can be achieved by
eliminating the shared variables whose bindings are lists, and in particular,
the variable Xs in clause 1.
To this aim we may apply the tupling strategy to the predicates max-
imal and delete which share the argument Xs. Since the atoms to be
tupled together constitute the whole body of clause 1 defining the predi-
cate deletemax, we do not need to introduce a new predicate, and we only
need to look for the recursive definition of the predicate deletemax. After
some unfolding steps, we get
7. deletemax([], []) <-
8. deletemax([M|Xs], Ys) <- maximal(Xs, N), max(N, M, M),
delete(M, Xs, Ys)
9. deletemax([X|Xs], [X|Ys]) <- maximal(Xs, N), max(N, X, M),
M ≠ X, delete(M, Xs, Ys)
As suggested by the tupling strategy, we may now look for a fold of
the goal 'maximal(Xs, N), delete(M, Xs, Ys)' using clause 1. Unfortu-
nately, no matching is possible because this goal is not an instance of
'maximal(Xs, M), delete(M, Xs, Ys)'. Thus, we apply the generalization
strategy and we introduce the following clause:
10. gen(Xs, P, Q, Ys) <- maximal(Xs, P), delete(Q, Xs, Ys)
whose body is the most specific generalization of the following two goals:
'maximal(Xs, M), delete(M, Xs, Ys)', which is the body of clause 1, and
'maximal(Xs, N), delete(M, Xs, Ys)'. By folding clause 1 using clause 10,
we get
1f. deletemax(Xs, Ys) <- gen(Xs, M, M, Ys)
We are now left with the problem of finding the recursive definition of
the predicate gen introduced in clause 10. This is an easy task, because we
can perform the unfolding steps corresponding to those leading from clause
1 to clauses 7, 8, and 9, and then we can use clause 10 for folding. After
those steps we get the following program:
1f. deletemax(Xs, Ys) <- gen(Xs, M, M, Ys)

11. gen([], 0, Q, []) <-
12. gen([X|Xs], P, X, Ys) <- gen(Xs, N, X, Ys), max(N, X, P)
13. gen([X|Xs], P, Q, [X|Ys]) <- gen(Xs, N, Q, Ys), Q ≠ X,
max(N, X, P)
This program performs the desired list transformation in one visit. In-
deed, let us consider a query of the form: <- deletemax(l, Ys), where l is a
ground list and Ys is an unbound variable. During the evaluation of that
query, while visiting the input list l, the predicate gen(l, P, Q, Ys) both
computes the maximal number P and deletes all elements of l which are
equal to P.
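For instance (our own check), the query <- deletemax([2, 1, 2], Ys) succeeds with Ys = [1]: the single call gen([2, 1, 2], M, M, Ys) visits the input list once, binding M to the maximal number 2 and Ys to the list [1] obtained by deleting its occurrences.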
Notice also that no shared variable whose binding is a list occurs in
the clauses defining deletemax and gen. Thus, we have been successful in
eliminating the unnecessary variables at the expense of increasing nonde-
terminism (see clauses 12 and 13).
Now in order to avoid nondeterminism, we may continue our program
derivation by looking for a program in which one avoids the evaluation of
the goal gen(Xs, N, Q, Ys) in clause 13, when the body of clause 12 fails
after the evaluation of gen(Xs, N, X, Ys).
This can be done by the application of the so-called clause fusion [De-
bray and Warren, 1988; Deville, 1990]. This technique can be mimicked by
applying our transformation rules as follows.
We first perform two generalization + equality introduction steps fol-
lowed by a goal rearrangement step and we get

We then introduce the following definition:

and we fold clauses 14 and 15 by applying rule R2.1, thereby getting

We can then simplify clauses 16 and 17 by unfolding, and we obtain

Thus, the final program is made out of the clauses



18. gen([X|Xs], P, Q, Zs) <- gen(Xs, N, Q, Ys), max(N, X, P),
aux(Q, X, Zs, Ys)
19. aux(X, X, Ys, Ys) <-
20. aux(Q, X, [X|Ys], Ys) <- Q ≠ X

5.2.3 Changing data representations


The choice of appropriate data structures is usually very important for the
design of efficient programs. In essence, this is the meaning of Wirth's
motto 'algorithms + data structures = programs' [Wirth, 1976].
However, it is sometimes difficult to identify the data structures which
allow very efficient algorithms before actually writing the programs. More-
over, complex data structures make it harder to prove program correctness.
Program transformation has been proposed as a methodology for pro-
viding appropriate data structures in a dynamic way (see Chapter 8 of
[Partsch, 1990]): first the programmer writes a preliminary version of the
program implementing a given algorithm using simple data structures, and
then he transforms their representations while preserving program seman-
tics and improving efficiency.
An example of transformational change of data representations is the
transformation of logic programs which use lists into equivalent programs
which use difference-lists.
Difference-lists are data structures which are sometimes used for imple-
menting algorithms that manipulate sequences of elements. The advantage
of using difference-lists is that the concatenation of two sequences repre-
sented as difference-lists can often be performed in constant time, while
the concatenation of standard lists takes linear time w.r.t. the length of
the first list.
A difference-list can be thought of as a pair (L, R) of lists, denoted by
L\R, such that there exists a third list X for which the concatenation of
X and R is L [Clark and Tarnlund, 1977]. In that case we say that the list
X is represented by the difference-list L\R. Obviously, a single list can be
represented by many difference-lists.
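For instance (an illustration of ours, where append_dl is a predicate name of our choosing), the list [1, 2, 3] is represented by the difference-list [1, 2, 3|R]\R for any list R, and concatenation of difference-lists can be defined by the single unit clause
append_dl(L\M, M\R, L\R) <-
which takes constant time, since it only has to unify the rear part M of the first difference-list with the front part of the second one.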
Programs that use lists are often simpler to write and understand
than the equivalent ones which make use of difference-lists. Several meth-
ods for transforming programs which use lists into programs which use
difference-lists have been proposed in the literature [Hansson and Tarnlund,
1982; Brough and Hogger, 1987; Zhang and Grant, 1988; Marriott and
Søndergaard, 1993; Proietti and Pettorossi, 1993].
The problem of deriving programs which manipulate difference-lists,
instead of lists, can be formulated as follows.
Let p(X, Y) be a predicate defined in a program P where Y is a list.
We want to define the new predicate diff-p(X, L\R) which holds iff p(X, Y)
holds and Y is represented by the difference-list L\R.

Let us assume that the concatenation of lists is defined in P by means


of a predicate append(X, Y, Z) which for any given list X, Y, and Z, holds
iff the concatenation of X and Y is Z. Then, the desired transformation
can often be achieved by applying the definition rule and introducing the
following clause for the predicate diff-p [Zhang and Grant, 1988]:
D. diff-p(X, L\R) <- p(X, Y), append(Y, R, L)
Then we have to look for a recursive definition of the predicate diff-p,
which should depend neither on p nor on append.
This can be done, as clarified by the following example, by starting
from clause D and performing some unfolding and goal replacement steps,
based on the associativity property of append, followed by folding steps
using D. We can then express p in terms of diff-p by observing that in the
least Herbrand model of P ∪ {D}, diff-p(X, Y\[]) holds iff p(X, Y) holds.
Thus, in our transformed program the clauses for the predicate p can be
replaced by the clause
E. p(X, Y) <- diff-p(X, Y\[])
We leave it to the reader to check that this replacement of clauses can
be performed by a sequence of in-situ folding, unfolding, and independent
goal replacement steps, which are correct by the first correctness theorem
w.r.t. LHM (page 725).
Example 5.2.3. [List reversal using difference-lists] Let us consider the
following program for reversing a list:
1. reverse([], []) <-
2. reverse([H|T], R) <- reverse(T, V), append(V, [H], R)
3. append([], L, L) <-
4. append([H|T], L, [H|S]) <- append(T, L, S)
Given a ground list l of length n and the query <- reverse(l, R), where R
is an unbound variable, this program requires O(n²) SLD-resolution steps.
Indeed, for the evaluation of <- reverse(l, R), clause 2 is invoked n - 1
times. Thus, n - 1 calls to append are generated, and the evaluation of
each of those calls requires O(n) SLD-resolution steps.
The above program can be improved by using a difference-list for rep-
resenting the second argument of reverse. This is motivated by the fact
that by clause 2 the list which appears as second argument of reverse is
constructed by the predicate append, and as already mentioned, concate-
nation of difference-lists can be much more efficient than concatenation of
lists.
We start off by applying the definition rule and introducing the clause
5. diff-rev(X, L\R) <- reverse(X, Y), append(Y, R, L)
corresponding to clause D above.

The recursive definition of diff-rev can easily be derived as follows. We
unfold clause 5 w.r.t. reverse(X, Y) and we get
6. diff-rev([], L\R) <- append([], R, L)
7. diff-rev([H|T], L\R) <- reverse(T, V), append(V, [H], Y),
append(Y, R, L)
By unfolding, clause 6 is replaced by
8. diff-rev([], R\R) <-
By using the unfold/fold proof method described in Section 4.3 we can
prove the validity of the replacement law
F. append(V, [H], Y), append(Y, R, L) ={V,H,R,L} append(V, [H|R], L)
w.r.t. LHM and the current program made out of clauses 1, 2, 3, 4, 7, and
8.
Thus, we apply the goal replacement rule to clause 7 and we get
9. diff-rev([H|T], L\R) <- reverse(T, V), append(V, [H|R], L)
We now fold clause 9 using clause 5 and we get
10. diff-rev([H|T], L\R) <- diff-rev(T, L\[H|R])
which, together with clause 8, provides the desired recursive definition of
diff-rev.
The correctness of the transformation steps described above is ensured
by the second correctness theorem w.r.t. LHM with the assumption that
diff-rev is a top predicate, reverse is an intermediate predicate, and ap-
pend is a basic predicate. Thus, in particular, the replacement performed
to derive clause 9 is a basic goal replacement step, and the folding step
which generates clause 10 from clause 9 using clause 5 is a single-folding
step satisfying the conditions of that second correctness theorem, because:
(i) clause 5 has been introduced by the definition rule, (ii) the heads of
clauses 5 and 9 have top predicates, and (iii) clause 9 has been derived
by first unfolding clause 5 w.r.t. the atom reverse(X, Y) with intermediate
predicate and then performing a basic goal replacement step.
Our final program that uses difference-lists is obtained by replacing the
clauses defining reverse by the following clause (see clause E above):
11. reverse(X, Y) <- diff-rev(X, Y\[])
The derived program is as follows:
11. reverse(X, Y) <- diff-rev(X, Y\[])
8. diff-rev([], R\R) <-
10. diff-rev([H|T], L\R) <- diff-rev(T, L\[H|R])
It takes O(n) SLD-resolution steps for reversing a list of length n.

Fig. 4. An unfolding tree for the reverse program. We have underlined
the atoms selected for unfolding.

A crucial step in the derivation of programs which use difference-lists


is the introduction of the clause of the form
D. diff-p(X, L\R) <- p(X, Y), append(Y, R, L)
which defines the eureka predicate diff-p. This eureka predicate can also
be viewed as the invention of an accumulator variable, in the sense of the
accumulation strategy [Bird, 1984]. Indeed, as indicated in Example 5.2.3,
the argument R of diff-rev(X, L\R) can be viewed as an accumulator which
at each SLD-resolution step stores the result of reversing the list visited so
far.
In the following example we show that the invention of accumulator
variables can be derived by using the basic strategies described in Sec-
tion 5.1.
Example 5.2.4. [Inventing difference-lists by the generalization strategy]
Let us consider again the initial program of Example 5.2.3 (page 757). We
would like to derive a program for list reversal which does not use the
append predicate. We can do so by applying the tupling strategy to clause
2 (because of the shared variable V) and introducing the eureka predicate
new-rev by the following clause:
N. new-rev(T,H,R) <- reverse(T, V), append(V, [H],R)
As suggested by the tupling strategy, we then look for a recursive defi-
nition of new-rev by performing unfolding and goal replacement steps fol-
lowed by folding steps using N. We have the additional requirement that
the recursive definition of new-rev should not contain any call to append.
This requirement can be fulfilled if the final folding steps are performed
w.r.t. a conjunction of the atoms of the form 'reverse(...), append(...)'
and no other calls to append occur in the folded clauses.
The unfolding tree generated by some unfolding and goal replacement
steps starting from clause N is depicted in Fig. 4.
Let us now consider clause N4 in the unfolding tree of Fig. 4. If we
were able to fold it using the root clause N, we would have obtained the

required recursive definition of new-rev. Unfortunately, that folding step


is not possible because the argument [K, H] of the call of append in clause
N4 is not an instance of [H] in clause N. Since N4 is a descendant of N, we
are in a situation where we can apply the generalization strategy. By doing
so we introduce the new eureka predicate gen-rev defined by the following
clause:
G. gen-rev(U, X, Y, R) <- reverse(U, B), append(B, [X|Y], R)
where the body of G is the most specific generalization of the body of N
and the body of N4.
The recursive definition of gen-rev can be found by performing the
transformation steps which correspond to those leading from N to AT4 in
the unfolding tree. We get the following clauses:
gen-rev([], X, Y, [X|Y]) <-
gen-rev([H|T], X, Y, R) <- gen-rev(T, H, [X|Y], R)
We can then fold clause 2 using G and we get
2f. reverse([H|T], R) <- gen-rev(T, H, [], R)
The final program is as follows:
1. reverse([], []) <-
2f. reverse([H|T], R) <- gen-rev(T, H, [], R)
gen-rev([], X, Y, [X|Y]) <-
gen-rev([H|T], X, Y, R) <- gen-rev(T, H, [X|Y], R)
It has a computational behaviour similar to the program derived in
Example 5.2.3 (page 757). In particular, the third argument of gen-rev is
used as an accumulator.

5.3 Overview of other techniques


In this section we would like to give a brief account of some other techniques
which have been presented in the literature for improving the efficiency of
logic programs by using transformation methods.
5.3.1 Schema-based transformations
A common feature of the strategies we have described in Section 5.2 is
that they are made out of sequences of transformation rules which are not
specified in advance; on the contrary, they depend on the structure of the
program at hand during the transformation process.
The schema-based approach to program transformation is complemen-
tary to the 'rules + strategies' approach and it consists in providing a
catalogue of predefined transformations of program schemata.
A program schema (or simply, schema) S is an abstraction via a sub-
stitution θ, of a program P, where some terms, goals, and clauses are
replaced by meta-variables, which once instantiated using θ, give us back


the program P.
If a program schema S is an abstraction of a program P, then we say
that P is an instance of S.
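For instance (an illustration of ours), the schema H(X) <- Q(X, Y), R(Y), where H, Q, and R are meta-variables standing for predicate symbols, has as an instance the clause p(X) <- q(X, Y), r(Y) considered in Section 5.2.2, via the substitution θ = {H/p, Q/q, R/r}.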
Two schemata S1 and S2 are equivalent w.r.t. a given semantics function
SEM iff for all the values of the meta-variables the corresponding instances
P1 and P2 are equivalent programs w.r.t. SEM.
The transformation of a schema S1 into a schema S2 is correct w.r.t.
SEM iff S1 and S2 are equivalent w.r.t. SEM.
Usually, we are interested in a transformation from a schema S1 to a
schema S2 if each instance of S2 is more efficient than the corresponding
instance of S1.
Given an initial program P1, the schema-based program transformation
technique works as follows. We first choose a schema S1 which is an ab-
straction via a substitution θ of P1, then we choose a transformation from
the schema S1 to a schema S2 in a given catalogue of correct schema trans-
formations, and finally, we instantiate S2 using θ to get the transformed
program P2.
The issue of proving the equivalence of program schemata has been ad-
dressed within various formalisms, such as flowchart programs, recursive
schemata, etc. (see, for instance, [Paterson and Hewitt, 1970; Walker and
Strong, 1972; Huet and Lang, 1978]). Some methodologies for developing
logic programs using program schemata are proposed by several authors
(see, for instance, [Deville and Burnay, 1989; Kirschenbaum et al., 1989;
Fuchs and Fromherz, 1992; Flener and Deville, 1993; Marakakis and Gal-
lagher, 1994]) and some examples of logic program schema transforma-
tions can be found in [Brough and Hogger, 1987; Seki and Furukawa, 1987;
Brough and Hogger, 1991]. The schema transformations presented in these
papers are useful for recursion removal (see Section 5.3.2 below) and for
reducing nondeterminism in generate-and-test programs (see Section 5.2.1,
page 748).
An advantage of the schema-based approach over the strategy-based
approach is that the application of a schema transformation requires little
time, because it is simply the application of a substitution. However, the
choice of a suitable schema transformation in the catalogue of the available
transformations may be time consuming, because it requires the time for
computing the matching substitution. On the other hand, one of the draw-
backs of the schema-based approach is the space requirement for storing
the catalogue itself. One more drawback is the fact that, when the program
to be transformed is not an instance of any schema in the catalogue, then
no action can be performed.

5.3.2 Recursion removal


Recursion is the main control structure for declarative (functional or logic)
programs. Unfortunately, the extensive use of recursively denned proce-
dures may lead to inefficiency w.r.t. time and space. In the case of im-
perative programs some program transformation techniques that remove
recursion in favour of iteration have been studied, for instance, in [Pater-
son and Hewitt, 1970; Walker and Strong, 1972].
In logic programming languages, where no iterative constructs are avail-
able, recursion removal can be understood as a technique for deriving tail-
recursive clauses from recursive clauses.
A definite clause is said to be recursive iff its head predicate also occurs
in an atom of its body.
A recursive clause is said to be tail-recursive iff it is of the form
p(t) <- L, p(u)
where L is a definite goal. (For reasons of simplicity when dealing with
recursion removal, we restrict ourselves to definite programs.)
A program is said to be tail-recursive iff all its recursive clauses are
tail-recursive.
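For instance (a sketch of ours, where sum, sumacc, and plus are predicate names chosen for the example), the recursive clause
sum([X|Xs], S) <- sum(Xs, S1), plus(X, S1, S)
is not tail-recursive, because the atom following the recursive call remains to be evaluated, whereas in the program
sumacc([], A, A) <-
sumacc([X|Xs], A, S) <- plus(A, X, A1), sumacc(Xs, A1, S)
the only recursive clause is tail-recursive, the accumulator A playing the role of the partial sum.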
The elimination of recursion in favour of iteration can be achieved in
two steps. First the given program is transformed into an equivalent, tail-
recursive one, and then the derived tail-recursive program is executed in an
efficient, iterative way by using an ad hoc compiler optimization, called tail-
recursion optimization or last-call optimization (see [Bruynooghe, 1982] for
a detailed description and the applicability conditions in the case of Prolog
implementations).
Tail-recursion optimization makes sense only if we assume the left-to-
right computation rule, so that, for instance, when the clause p(t) <- L, p(u)
is invoked, the recursive call p(u) is the last call to be evaluated.
In principle, any recursive clause can be transformed into a tail-recursive
one by simply rearranging the order of the atoms in the body. This transfor-
mation is correct w.r.t. LHM (see Section 4.4.1). However, goal rearrange-
ments can increase the amount of nondeterminism, thus making useless the
efficiency improvements due to tail-recursion optimization. Moreover, goal
rearrangements do not preserve Prolog semantics (see Section 4.4.4), and
tail-recursion optimization is usually applied to Prolog programs.
Many researchers have proposed more complex transformation strate-
gies for obtaining tail-recursive programs without increasing the nondeter-
minism. We would like to mention the following three methods.
The first method consists in transforming almost-tail-recursive clauses
into tail-recursive ones [Debray, 1985; Azibi, 1987; Debray, 1988] by using
the unfold/fold rules. A clause is said to be almost-tail-recursive iff it is of
the form
p(t) <- L, p(u), R

where L is a conjunction of atoms and R, called the tail-computation, is a


conjunction of atoms whose predicates do not depend on p. Usually, the
tail-computation contains calls to 'primitive predicates', such as the ones for
computing concatenation of lists and arithmetic operations, like addition
or multiplication of integers. The transformation techniques presented in
[Debray, 1985; Azibi, 1987; Debray, 1988] use the generalization strategy
and some replacement laws which are valid for the primitive predicates,
such as the associativity of list concatenation, the associativity and the
commutativity of addition, and the distributivity of multiplication over
addition. Those techniques are closely related to the ones considered by
[Arsac and Kodratoff, 1982] for functional programs.
The second method is based on schema transformations [Bloch, 1984;
Brough and Hogger, 1987; Brough and Hogger, 1991], where some almost-
tail recursive program schemata are shown to be equivalent to tail-recursive
ones.
The third method consists in transforming a given program into a bi-
nary program, that is, a program whose clauses have only one atom in their
bodies [Tarau and Boyer, 1990]. This transformation method is applicable
to all programs and it is in the style of the continuation-based transforma-
tions for functional programs [Wand, 1980]. The transformation works by
adding to each predicate an extra argument which encodes the next goal to
be evaluated. This extra argument represents the so-called continuation.
For instance, the program

is transformed into the program


p <- r(true)
r(G) <- G
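As a further illustration (a sketch of ours, with predicate names of our choosing, and with true taken to be the built-in empty goal), a program such as
p <- q, r
q <-
r <-
may be binarized into
p <- q(r(true))
q(Cont) <- Cont
r(Cont) <- Cont
where the continuation argument of q is the term r(true) encoding the goal still to be evaluated, and the metacall of Cont in the bodies of q and r triggers the evaluation of that goal.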

This transformation in itself does not improve efficiency. However, it


allows us to use compilers based on a specialized version of the Warren Ab-
stract Machine [Warren, 1983], and to perform further efficiency improving
transformations [Demoen, 1993; Neumerkel, 1993].
5.3.3 Annotations and memoing
In the previous sections we have mainly considered transformations which
do not make use of the extra-logical features of logic languages, like cuts,
asserts, delay declarations, etc. In the literature, however, there are var-
ious papers which deal with transformation rules which preserve the op-
erational semantics of full Prolog (see Section 4.4.4), and there are also
some transformation strategies which work by inserting in a given Prolog
program extra-logical predicates for improving efficiency by taking advan-
tage of suitable properties of the evaluator. These strategies are related to

some techniques which have been first introduced in the case of functional
programs and are referred to as program annotations [Schwarz, 1982].
In the case of Prolog, a typical technique which produces annotated
programs consists in adding a cut operator '!' in a point where the execution
of the program can be performed in a deterministic way. For instance, the
following Prolog program fragment:
p(X) <- C, BodyA
p(X) <- not(C), BodyB
can be transformed (if C has no side-effects) into
p(X) <- C, !, BodyA
p(X) <- BodyB


The derived code is more efficient than the initial one and behaves like
an if-then-else statement.
Prolog program transformations based on the insertion of cuts are re-
ported in [Sawamura and Takeshima, 1985; Debray and Warren, 1989;
Deville, 1990].
Other techniques which introduce annotations for the evaluator are
related to the automatic generation of delay declarations [Naish, 1985;
Wiggins, 1992], which procrastinate calls to predicates until they are suit-
ably instantiated.
A final form of annotation technique which has been used for improving
program efficiency is the so-called memoing [Michie, 1968]. Results of
previous computations are stored in a table together with the program
itself, and when a query has to be evaluated, that table is looked up first.
This technique has been implemented in logic programming by enhancing
the SLDNF-resolution compiler through tabulations [Warren, 1992] or by
using the 'assert' predicate for the run-time updating of programs [Sterling
and Shapiro, 1994].
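As a small sketch of ours (fib and fib_memo are names chosen for the example, and fib_memo is assumed to be declared dynamic and initially empty), a memoed Fibonacci program using assert might be written as follows:
fib(0, 0) <-
fib(1, 1) <-
fib(N, F) <- N > 1, fib_memo(N, F), !
fib(N, F) <- N > 1, N1 is N - 1, N2 is N - 2, fib(N1, F1), fib(N2, F2),
F is F1 + F2, assert(fib_memo(N, F))
The third clause looks up the table of previously computed results, and the last clause computes a new result and stores it at run time by means of assert.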

6 Partial evaluation and program specialization


Partial evaluation (also called partial deduction in the case of logic pro-
gramming) is a program transformation technique which allows us to derive
a new program from an old one when part of the input data is known at
compile time. This technique which can be considered as an application of
the s-m-n theorem [Rogers, 1967], has been extensively applied in the field
of imperative and functional languages [Futamura, 1971; Ershov, 1977;
Bjørner et al., 1988; Jones et al., 1993] and first used in logic program-
ming by [Komorowski, 1982] (see also [Venken, 1984; Gallagher, 1986;
Safra and Shapiro, 1986; Takeuchi, 1986; Takeuchi and Furukawa, 1986;
Ershov et al., 1988] for early papers on partial deduction, with special
emphasis on the problem of partially evaluating meta-interpreters).

The resulting program may be more efficient than the initial program
because, by using the partially known input, it is possible to perform at
compile time some run-time computations.
Partial evaluation can be viewed as a particular case of program special-
ization [Scherlis, 1981], which is aimed at transforming a given program by
exploiting the knowledge of the context where that program is used. This
knowledge can be expressed as a precondition which is satisfied by the
values of the input to the program.
Not much work has been done in the area of logic program specializa-
tion, apart from the particular case of partial deduction. However, some
results are reported in [Bossi et al., 1990] and in various papers by Gal-
lagher and others [Gallagher et al., 1988; Gallagher and Bruynooghe, 1991;
de Waal and Gallagher, 1992]. In the latter papers the use of the abstract
interpretation methodology plays a crucial role. Using this methodology
one can represent and manipulate a possibly infinite set of input values
which satisfies a given precondition, by considering, instead, an element of
a finite abstract domain.
Abstract interpretations can be used before and after the application
of program specialization, that is, during the so-called preprocessing phase
and postprocessing phase. During the preprocessing phase, by using ab-
stract interpretations we may collect information depending on the control
flow, such as groundness of arguments and determinacy of predicates. This
information can then be exploited for directing the specialization process.
Examples of this preprocessing are the binding time analysis performed by
the Logimix partial evaluator of [Mogensen and Bondorf, 1993] and the
determinacy analysis performed by Mixtus [Sahlin, 1993].
During the postprocessing phase, abstract interpretations may be used
for improving the program obtained by the specialization process, as indi-
cated, for instance, in [Gallagher, 1993] where it is shown how one can get
rid of the so-called useless clauses.
The idea of partial evaluation of logic programs can be presented as
follows [Lloyd and Shepherdson, 1991]. Let us consider a normal program
P and a query <- A, where A is an atom. We construct a finite portion of
an SLDNF-tree for P ∪ {<- A} containing at least one non-root node. For
this construction we use an unfolding strategy U which tells us the atoms
which should be unfolded and when to terminate the construction of that
tree. The notion of unfolding strategy is analogous to the one of u-selection
rule (page 746), but it applies to goals, instead of clauses. The design of
unfolding strategies which eventually terminate, thereby producing a finite
SLDNF-tree, can be done within general frameworks like the ones described
in [Bruynooghe et al., 1992; Bol, 1993].
We then construct the set of clauses {Aθi <- Gi | i = 1, ..., n}, called
resultants, obtained by collecting from each non-failed leaf of the SLDNF-
tree the query <- Gi and the corresponding computed answer substitution θi.

Fig. 5. An SLDNF-tree for P ∪ {<- p(X, a)} using U.

A partial evaluation of P w.r.t. the atom A is the program PA obtained
from P as follows. Let A be of the form p(...). We first replace the clauses
of P which constitute the definition of the predicate symbol p by the set
of resultants {Aθi <- Gi | i = 1, ..., n}, and then we throw away the
definitions of the predicates, different from p, on which p does not depend
after the replacement.
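To see the construction at work on a very small case (a sketch of ours, distinct from Example 6.0.1 below), consider the usual append program
append([], L, L) <-
append([H|T], L, [H|S]) <- append(T, L, S)
and the atom A = append([a, b], Ys, Zs). Unfolding the query <- append([a, b], Ys, Zs) until the goal becomes empty produces a single non-failed leaf whose computed answer substitution binds Zs to [a, b|Ys], so that the only resultant is
append([a, b], Ys, [a, b|Ys]) <-
and this single unit clause is a partial evaluation of the program w.r.t. A.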
Example 6.0.1. Let us consider the following program P:

and the atom A = p(X,a). Let us use the unfolding strategy U which
performs unfolding steps starting from the query <- p(X, a) until each leaf
of the SLDNF-tree is either a success or a failure or it is an atom with
predicate p. We get the tree depicted in Fig. 5.
By collecting the goals and the substitutions corresponding to the leaves
of that tree we have the following set of resultants:

which constitute the partial evaluation PA of P w.r.t. A. The clauses for q


have been discarded because p does not depend on q in the above resultants.
If we use the program PA, the evaluation of an instance of the query
<- p(X, a) is more efficient than the one using the initial program because
the calls to the predicate q need not be evaluated and some failure branches
are avoided.
The notion of partial evaluation of a program w.r.t. an atom can be
extended to the notion of partial evaluation w.r.t. a set S of atoms by
considering the union of the sets of resultants relative to the atoms in S.
We now introduce a correctness notion for partial evaluation which

refers to the semantics CASNF and FF considered in Section 4. Analogous


notions may be given with reference to other semantics.
Definition 6.0.2 (Correctness of partial evaluation). Let P be a
program, Q be a query, and S be a set of atoms. A partial evaluation PS
of P w.r.t. S is correct w.r.t. Q iff we have that
• CASNF[P, Q] = CASNF[PS, Q], and
• FF[P, Q] = FF[PS, Q].
Theorem 6.0.5 below establishes a criterion for the correctness of partial
evaluation. First we need the following definitions, where the notion of
instance is relative to a substitution which may be the identity substitution.
Definition 6.0.3. Let R be a program or a query. Given a set S of
atoms, we say that R is S-closed iff every atom in R with predicate symbol
occurring in S is an instance of an atom in S.
Definition 6.0.4. Given a set S of atoms, we say that S is independent
iff no two atoms in S have a common instance.
Theorem 6.0.5. [Lloyd and Shepherdson, 1991]. Given a program P, a
query Q, and a set S of atoms, let us consider a partial evaluation PS of
P w.r.t. S. If S is independent, and both P and Q are S-closed, then PS
is correct w.r.t. every instance of Q.
In Example 6.0.1, the correctness w.r.t. every instance of the query
<- p(X, a) of the partial evaluation PA of the program P follows from The-
orem 6.0.5. Indeed, for the singleton {p(X, a)} the independence property
trivially holds, and the closedness property also holds because p([], a),
p([H|T], a), and p(T, a) are all instances of p(X, a).
The closedness and independence hypotheses cannot be dropped from
Theorem 6.0.5, as it is shown by the following two examples.
Example 6.0.6. Suppose we want to partially evaluate the following pro-
gram P:
p(a) <- p(b)
p(b) <-
w.r.t. the atom p(a). We can derive the resultant p(a) <- p(b). Let A be
{p(a)}. Thus, a partial evaluation of P w.r.t. p(a) is the program PA:
p(a) <- p(b)
obtained by replacing the definition of p in P (that is, the whole program
P) by the resultant p(a) <- p(b). PA is not {p(a)}-closed and we have that
CASNF[PA, <- p(a)] = {}, whereas CASNF[P, <- p(a)] = {{}}.
Example 6.0.7. Let us consider the following program P:

and the set S of atoms {p, q(X), q(a)}, which is not independent. A partial
evaluation of P w.r.t. S is the following program PS:

The program PS is S-closed and CASNF[PS, <- p] = {{}}, whereas


CASNF[P, <- p] = {}, because the unique SLDNF-derivation of P ∪ {<- p}
flounders.
Lloyd and Shepherdson's theorem suggests the following methodology
for performing correct partial evaluations. Given a program P and a query
Q, we look for an independent set S of atoms and a partial evaluation PS
of P w.r.t. S such that both PS and Q are S-closed.
Various strategies have been proposed in the literature for computing
from a given program P and a given query Q, a suitable set S with the
independence and closedness properties (see, for instance, [Benkerimi and
Lloyd, 1990; Bruynooghe et al., 1992; Gallagher, 1991; Martens et al., 1992;
Gallagher, 1993]). Some of them require generalization steps and the use
of abstract interpretations.
Other techniques for partial evaluation and program specialization are
based on the unfold/fold rules [Fujita, 1987; Bossi et al., 1990; Sahlin, 1993;
Bossi and Cocco, 1993; Prestwich, 1993a; Proietti and Pettorossi, 1993]. By
using those techniques, given a program P and a query <- G, we introduce
a new predicate newp defined by the clause
D. newp(X1, ..., Xn) <- G
where X1, ..., Xn are the variables occurring in G.
Obviously, newp(X1, ..., Xn) and G are equivalent goals w.r.t. the se-
mantics CASNF and the program P ∪ {D}, and also w.r.t. FF and P ∪ {D}.
Moreover, we have that
CASNF[P ∪ {D}, <- newp(X1, ..., Xn)] = CASNF[P, <- G], and
FF[P ∪ {D}, <- newp(X1, ..., Xn)] = FF[P, <- G].
Thus, we may look for a partial evaluation of the program P U {D}
w.r.t. newp(X1, ..., Xn), instead of a partial evaluation of P w.r.t. G.
The partial evaluation of P ∪ {D} w.r.t. newp(...) can be achieved by
transforming P ∪ {D} into a program PG such that
CASNF[P ∪ {D}, <- newp(X1, ..., Xn)]
= CASNF[PG, <- newp(X1, ..., Xn)], and
FF[P ∪ {D}, <- newp(X1, ..., Xn)] = FF[PG, <- newp(X1, ..., Xn)].
These two equalities hold if, for instance, we derive PG from P ∪ {D} by
using the definition, unfolding, and folding rules according to the restric-
tions of the second correctness theorems w.r.t. CASNF (page 738) and
FF (page 732), respectively.
Let us now briefly compare the two approaches to partial evaluation we
have mentioned above, that is, the one based on Lloyd and Shepherdson's
theorem and the one based on the unfold/fold rules.
In the approach based on Lloyd and Shepherdson's theorem, the ef-
ficiency gains are obtained by constructing SLDNF-trees and extracting
resultants. This process corresponds to the application of some unfolding
steps, and since efficiency gains are obtained without using the folding rule,
it may seem that this is an exception to the 'need for folding' meta-strategy
described in Section 5. However, in order to guarantee the correctness of
the partial evaluation of a given program P w.r.t. a set of atoms S, for
each element of S we are required to find an SLDNF-tree whose leaves
contain instances of atoms in S (see the closedness condition), and as the
reader may easily verify, this requirement exactly corresponds to the 'need
for folding'.
Conversely, the approach based on the unfold/fold rules does not require
us to find the set S with the closedness and independence properties, but as
we show in Example 6.0.8 below, we often need to introduce some auxiliary
clauses by the definition rule and we also need to perform some final folding
steps using those clauses.
Example 6.0.8 below also shows that in the partial evaluation approach
based on the unfold/fold rules, the use of the renaming technique for struc-
ture specialization [Benkerimi and Lloyd, 1990; Gallagher and Bruynooghe,
1990; Gallagher, 1993; Benkerimi and Hill, 1993] which is often required in
the first approach, is not needed. For other issues concerning the use of
folding during partial evaluation the reader may refer to [Owen, 1989].
We now present an example of derivation of a partial evaluation of a
program by applying the unfold/fold transformation rules and the loop
absorption strategy.
Example 6.0.8. [String matching] [Sahlin, 1991; Gallagher, 1993]. Let us
consider the following program Match for string matching:
1. match(P, S) <- aux(P, S, P, S)
2. aux([], X, Y, Z) <-
3. aux([A|Ps], [A|Ss], P, S) <- aux(Ps, Ss, P, S)
4. aux([A|Ps], [B|Ss], P, [C|S]) <- ¬(A = B), aux(P, S, P, S)
where the pattern P and the string S are represented as lists, and the
relation match(P, S) holds iff the pattern P occurs in the string S. For
instance, the pattern [a,b] occurs in the string [c,a,b], but it does not occur
in the string [a,c,b].

Fig. 6. An unfolding tree for (Match, newp(X) <- match([a, a, b], X)).

Let us now partially evaluate the given program Match w.r.t. the atom
match([a,a,b],X). In order to do so we first introduce the following defi-
nition:
5. newp(X) «- match([a,a, b],X)
whose body is the atom w.r.t. which the partial evaluation should be per-
formed. As usual when applying the definition rule, the name of the head
predicate is a new symbol, newp in our case. Then we construct the un-
folding tree for (Match, clause 5) using the u-selection rule which
i) unfolds a clause w.r.t. any atom of the form either match(...) or
aux(...), and
ii) does not unfold a clause for which we can apply the loop absorption
strategy, that is, a clause in whose body there is an instance of an
atom which occurs in the body of a clause in an ancestor node.
We get the tree depicted in Fig. 6. In clause 8 of Fig. 6, the atom
aux([a, a, b], S, [a, a, b], S) is an instance of the body of clause 6 via the
substitution {X/S}.
Analogously, in clause 10 the atom aux([a, a, b], [H|S], [a, a, b], [H|S]),
and in clause 12 the atom aux([a, a, b], [a, H|S], [a, a, b], [a, H|S]) are in-
stances of the body of clause 6.
Thus, we can apply the loop absorption strategy and we introduce the
new definition:
14. newq(S) <- aux([a,a,b],S,[a,a,b],S)

We fold clause 6 (see Fig. 6) using clause 14 and we get:


6f. newp(X) <- newq(X)
Now the unfold/fold derivation continues by looking for the recursive
definition of the predicate newq. This can be done by constructing the
unfolding tree for (Match, clause 14). This tree is equal to the tree depicted
in Fig. 6, except that clause 5 is deleted and the name newp is replaced by
the name newq.
Thus, the leaves of the unfolding tree for (Match, clause 14) have the
following clauses:
13q. newq([a, a, b|S]) <-
12q. newq([a, a, H|S]) <- ¬(b = H), aux([a, a, b], [a, H|S], [a, a, b], [a, H|S])
10q. newq([a, H|S]) <- ¬(a = H), aux([a, a, b], [H|S], [a, a, b], [H|S])
8q. newq([H|S]) <- ¬(a = H), aux([a, a, b], S, [a, a, b], S)
By folding clauses 12q, 10q, and 8q, we get the following program:
6f. newp(X) <- newq(X)
13q. newq([a, a, b|S]) <-
12qf. newq([a, a, H|S]) <- ¬(b = H), newq([a, H|S])
10qf. newq([a, H|S]) <- ¬(a = H), newq([H|S])
8qf. newq([H|S]) <- ¬(a = H), newq(S)
which is exactly the program produced by the Mixtus partial evaluator (see
[Sahlin, 1991], page 124).
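For instance (our own check), with this specialized program the query <- newp([c, a, a, b, d]) succeeds, clause 8qf skipping the initial element c and clause 13q then recognizing the prefix a, a, b of the remaining string, whereas the query <- newp([a, b, b]) fails, in accordance with the fact that the pattern [a, a, b] does not occur in [a, b, b].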
One of the most interesting motivations for developing the partial eval-
uation methodology is that it can be used for compiling programs and for
deriving compilers from interpreters via the Futamura projections tech-
nique [Futamura, 1971]. For this last application it is necessary that the
partial evaluator be self-applicable. This means that it should be able
to partially evaluate itself. The interested reader may refer to [Jones
et al., 1993] for a general overview, and to [Fujita and Furukawa, 1988;
Fuller and Abramsky, 1988; Mogensen and Bondorf, 1993; Gurr, 1993;
Leuschel, 1994a] for more details on the problem of self-applicability of
partial evaluators in the logic languages Prolog and Gödel.
Partial evaluation has also been used in the area of deductive databases
for deriving very efficient techniques for recursive query optimization and
integrity checking. Some results in this direction can be found in [Sakama
and Itoh, 1988; Bry, 1989; Leuschel, 1994b].

7 Related methodologies for program development


From what we have presented in the previous sections it is clear that
the program transformation methodology for program development is very
much related to various fields of artificial intelligence, theoretical computer

science, and software engineering. Here we want to briefly indicate some


of the techniques and methods which are used in those fields and are of
relevance to the transformation methodology and its applications.
Let us begin by considering some program analysis techniques by which
the programmer can investigate various program properties. Those proper-
ties may then be used for improving efficiency by applying transformation
methods.
Program properties which are often useful for program transformation
concern, for instance, the flow of computation, the use of data structures,
the propagation of bindings, the sharing of information among arguments,
the termination for a given class of queries, the groundness and/or freeness
of arguments, and the functionality (or determinacy) of predicates.
Perfect knowledge about these properties is, in general, impossible to
obtain, because of undecidability limitations. However, it is often the case
that approximate reasoning can be carried out by using abstract interpre-
tation techniques [Cousot and Cousot, 1977; Debray, 1992]. They make
use of finite interpretation domains where information can be derived via a
'finite amount' of computation. The interpretation domains vary according
to the property to be analysed and the degree of information one would
like to obtain [Cortesi et al., 1992].
A general framework, where program transformation strategies are sup-
ported by abstract interpretation techniques, is defined in [Boulanger and
Bruynooghe, 1993]. Among the many transformation techniques which
depend on program analysis techniques, we would like to mention: i) com-
piling control (see Section 5.2.1), where the information about the flow of
computation is used for generating the unfolding tree, ii) the specialization
method of [Gallagher and Bruynooghe, 1991], which is based on a tech-
nique for approximating the set of all possible calls generated during the
evaluation of a given class of queries, iii) various techniques which insert
cuts on the basis of determinacy information (see Section 5.3 and the sketch
below), and iv) various techniques implemented in the Spes system [Alexandre
et al., 1992; Bsaïes, 1992] in which mode analysis is used to mechanize several
transformation strategies.
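As an example of technique iii), suppose that a determinacy analysis shows
that, for the intended calls in which the first argument is bound to a
number, at most one clause of the following predicate can succeed. A cut may
then be inserted which prunes failing branches only, and hence preserves the
computed answers for that class of calls (an illustrative sketch, not taken
from the systems cited above):

% Original program: for calls with a known first argument the two clauses
% are mutually exclusive.
classify(0, zero).
classify(N, nonzero) :- N \= 0.

% After cut insertion, justified by the determinacy information: once the
% first clause has succeeded no other clause can succeed, so the cut only
% avoids the useless creation of a choice point.
classify_det(0, zero) :- !.
classify_det(N, nonzero) :- N \= 0.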
Very much related to these methodologies for the analysis of programs
are the methodologies for the proof of properties of programs. They have
been used for program verification, and in particular, for ensuring that a
given set of clauses satisfies a given specification, or a given first order
formula is true in a chosen semantic domain. These proofs may be used for
guiding the application of suitable instances of the goal replacement rule.
Many proof techniques can be found in the literature, and in particular,
in the field of theorem proving and automated deduction. For the ones
which have been used for logic programs and may be adapted for program
transformation we recall those in [Drabent and Maluszyński, 1988; Bossi
and Cocco, 1989; Deransart, 1989].
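For example, a proof that the conjunctions append(Xs,Ys,Ts), append(Ts,Zs,Rs)
and append(Ys,Zs,Us), append(Xs,Us,Rs) define the same relation among Xs, Ys,
Zs, and Rs in the least Herbrand model of append, that is, a proof of the
associativity of list concatenation, licenses the following instance of the
goal replacement rule (a small sketch with illustrative predicate names; the
applicability conditions of the rule, discussed in the previous sections, are
not repeated here):

% Standard list concatenation.
append([], Ys, Ys).
append([X|Xs], Ys, [X|Zs]) :- append(Xs, Ys, Zs).

% Before goal replacement: the elements of Xs are copied twice, first into
% the intermediate list Ts and then into Rs.
double_app(Xs, Ys, Zs, Rs) :- append(Xs, Ys, Ts), append(Ts, Zs, Rs).

% After goal replacement, justified by the associativity property: the
% elements of Xs are copied only once.
double_app_new(Xs, Ys, Zs, Rs) :- append(Ys, Zs, Us), append(Xs, Us, Rs).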
The field of program transformation partially overlaps with the field
of program synthesis (see [Deville and Lau, 1994] for a survey in the case
of logic programs). Indeed, if we consider the given initial program as
a program specification then the final program derived by transformation
can be considered as an implementation of that specification. However, it
is usually understood that program synthesis differs from program trans-
formation because in program synthesis the specification is a somewhat
implicit description of the program to be derived, and such implicit de-
scription often does not allow us to get the desired program by a sequence
of simple manipulations, like those determined by standard transformation
rules.
Moreover, it is often the case that the specification language differs
from the executable language in which the final program should be writ-
ten. This language barrier can be overcome by using transformation rules,
but these techniques, we think, go beyond the area of traditional program
transformation and belong to the field of logic program synthesis.
The transformational methods for developing logic programs are also
closely related to methods for logic program construction [Sterling and
Lakhotia, 1988; Deville, 1990; Sterling and Kirschenbaum, 1993], where
complex programs are developed by enhancing and composing together
simpler programs (see Section 5.2.2). However, the basic ideas and ob-
jectives of program construction are quite different from those of program
transformation. In particular, the starting point for the above mentioned
techniques for program construction is not a logic program, but a possibly
incomplete and not fully formalized specification. Thus, the notion of se-
mantics plays a minor role, in comparison with the techniques for program
transformation. Moreover, the main objective of program construction is
the improvement of the efficiency in software production, rather than the
improvement of the efficiency of programs.
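As a small illustration of this style of development, a plain list traversal
skeleton can be enhanced with an additional argument which carries a computed
value (an illustrative sketch; the predicate names are not taken from the
works cited above):

% Skeleton: a bare traversal of a list.
traverse([]).
traverse([_X|Xs]) :- traverse(Xs).

% Enhancement: the same recursive structure, extended with an argument
% which counts the elements of the list.
count([], 0).
count([_X|Xs], N) :- count(Xs, N0), N is N0 + 1.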
Finally, we would like to mention that the transformation and spe-
cialization techniques considered in this chapter have been partially ex-
tended to the case of concurrent logic programs [Ueda and Furukawa,
1988] and constraint logic programs [Hickey and Smith, 1991; Maher, 1993;
Bensaou and Guessarian, 1994; Etalle and Gabbrielli, 1996].

Conclusions
We have looked at the theoretical foundations of the so-called 'rules +
strategies' approach to logic program transformation. We have established
a unified framework for presenting and comparing the various rules which
have been proposed in the literature. That framework is parametric with
respect to the semantics which is preserved during transformation.
We have presented various sets of transformation rules and the cor-
responding correctness results w.r.t. different semantics of definite logic
programs, such as: the least Herbrand model, the computed answer sub-
stitutions, the finite failure, and the pure Prolog semantics.
We have also considered the case of normal programs, and using the
proposed framework, we have presented the rules which preserve computed
answer substitutions, finite failure, and Clark's completion semantics. We
have briefly mentioned the results concerning the rules which preserve other
semantics for normal programs.
We have also presented a unified framework in which it is possible to
describe some of the most significant techniques for guiding the application
of the transformation rules with the aim of improving program efficiency.
We have singled out a few basic strategies, such as tupling, loop absorption,
and generalization, and we have shown that various methods for compiling
control, program composition, change of data representations, and partial
evaluation, can be viewed as suitable applications of those strategies.
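As a reminder of how these strategies operate, the tupling strategy can be
illustrated on a very small example, in which two traversals of the same list
are replaced by a single traversal performed by a new eureka predicate (a
sketch with illustrative predicate names):

% Initial program: the list Xs is traversed twice, once by sum/2 and once
% by len/2.
average(Xs, A) :- sum(Xs, S), len(Xs, N), A is S / N.

sum([], 0).
sum([X|Xs], S) :- sum(Xs, S0), S is S0 + X.

len([], 0).
len([_X|Xs], N) :- len(Xs, N0), N is N0 + 1.

% After tupling: the eureka predicate sum_len/3 computes the sum and the
% length in a single traversal of Xs.
average_t(Xs, A) :- sum_len(Xs, S, N), A is S / N.

sum_len([], 0, 0).
sum_len([X|Xs], S, N) :- sum_len(Xs, S0, N0), S is S0 + X, N is N0 + 1.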
An area of further investigation is the characterization of the power of
the transformation rules and strategies, both in the 'completeness' sense,
that is, their capability of deriving all programs which are equivalent to the
given initial program, and in the 'complexity' sense, that is, their capability
of deriving programs which are more efficient than the initial program. No
conclusive results are available in either direction.
A line of research that can be pursued in the future, is the integration
of tools, like abstract interpretations, proofs of properties, and program
synthesis, within the 'rules + strategies' approach to program transforma-
tion.
Unfortunately, the transformational methodology has so far gained only
moderate attention in the practice of logic programming. However, it
is recognized that the automation of transformation techniques and their
integrated use is of crucial importance for building advanced software de-
velopment systems.
There is a growing interest in the mechanization of transformation
strategies and the production of interactive tools for implementing pro-
gram transformers. Moreover, some optimizing compilers already devel-
oped make use of various transformation techniques.
The importance of the transformation methodology will substantially
increase if we extend its theory and applications also to the case of complex
logic languages which manipulate constraints, and support both concur-
rency and object-orientation.

Acknowledgements
We thank M. Bruynooghe, J. P. Gallagher, M. Leuschel, M. Maher, and
H. Seki for their helpful comments and advice on many issues concerning
the transformation of logic programs. Our thanks go also to O. Aioni, P.
Dell'Acqua, M. Gaspari and M. Kalsbeek for reading a preliminary version
of this chapter.
References
[Alexandre et al., 1992] F. Alexandre, K. Bsaïes, J. P. Finance, and
A. Quere. Spes: A system for logic program transformation. In Pro-
ceedings of the International Conference on Logic Programming and Au-
tomated Reasoning, LPAR '92, Lecture Notes in Computer Science 624,
pages 445-447, 1992.
[Amtoft, 1992] T. Amtoft. Unfold/fold transformations preserving termi-
nation properties. In Proc. PLILP '92, Leuven, Belgium, Lecture Notes
in Computer Science 631, pages 187-201. Springer-Verlag, 1992.
[Apt, 1990] K. R. Apt. Introduction to logic programming. In J. van
Leeuwen, editor, Handbook of Theoretical Computer Science, pages 493-
576. Elsevier, 1990.
[Aravindan and Dung, 1995] C. Aravindan and P. M. Dung. On the cor-
rectness of unfold/fold transformation of normal and extended logic pro-
grams. Journal of Logic Programming, 24(3):201-217, 1995.
[Arsac and Kodratoff, 1982] J. Arsac and Y. Kodratoff. Some techniques
for recursion removal from recursive functions. ACM Transactions on
Programming Languages and Systems, 4(2):295-322, 1982.
[Azibi, 1987] N. Azibi. TREQUASI: Un système pour la transformation
automatique de programmes Prolog recursifs en quasi-iteratifs. PhD the-
sis, Universite de Paris-Sud, Centre d'Orsay, France, 1987.
[Baudinet, 1992] M. Baudinet. Proving termination properties of Prolog
programs: A semantic approach. Journal of Logic Programming, 14:1-
29, 1992.
[Benkerimi and Hill, 1993] K. Benkerimi and P. M. Hill. Supporting trans-
formations for the partial evaluation of logic programs. Journal of Logic
and Computation, 3(5):469-486, 1993.
[Benkerimi and Lloyd, 1990] K. Benkerimi and J. W. Lloyd. A partial eval-
uation procedure for logic programs. In S. Debray and M. Hermenegildo,
editors, Logic Programming: Proceedings of the 1990 North American
Conference, Austin, TX, USA, pages 343-358. The MIT Press, 1990.
[Bensaou and Guessarian, 1994] N. Bensaou and I. Guessarian. Trans-
forming constraint logic programs. In 11th Symp. on Theoretical Aspects
of Computer Science, STACS '94, Lecture Notes in Computer Science
775, pages 33-46. Springer-Verlag, 1994.
[Bird, 1984] R. S. Bird. The promotion and accumulation strategies in
transformational programming. ACM Toplas, 6(4):487-504, 1984.
[Bjørner et al., 1988] D. Bjørner, A. P. Ershov, and N. D. Jones, editors.
Partial Evaluation and Mixed Computation. North-Holland, 1988. IFIP
TC2 Workshop on Partial and Mixed Computation, Gammel Avernaes,
Denmark, 1987.
[Bloch, 1984] C. Bloch. Source-to-source transformations of logic pro-
grams. Master's thesis, Department of Applied Mathematics, Weizmann
Institute of Science, Rehovot, Israel, 1984.
[Bol, 1993] R. Bol. Loop checking in partial deduction. Journal of Logic
Programming, 16:25-46, 1993.
[Bossi and Cocco, 1989] A. Bossi and N. Cocco. Verifying correctness of
logic programs. In Proceedings TAPSOFT '89, Lecture Notes in Com-
puter Science 352, pages 96-110. Springer-Verlag, 1989.
[Bossi and Cocco, 1993] A. Bossi and N. Cocco. Basic transformation oper-
ations which preserve computed answer substitutions of logic programs.
Journal of Logic Programming, 16(1&2):47-87, 1993.
[Bossi and Cocco, 1994] A. Bossi and N. Cocco. Preserving universal ter-
mination through unfold/fold. In Proceedings ALP '94, Lecture Notes
in Computer Science 850, pages 269-286. Springer-Verlag, 1994.
[Bossi and Etalle, 1994a] A. Bossi and S. Etalle. More on unfold/fold
transformations of normal programs: Preservation of Fitting's seman-
tics. In L. Fribourg and F. Turini, editors, Proceedings of LOPSTR'94
and META '94, Pisa, Italy, Lecture Notes in Computer Science 883, pages
311-331, Springer-Verlag, 1994.
[Bossi and Etalle, 1994b] A. Bossi and S. Etalle. Transforming acyclic pro-
grams. ACM Transactions on Programming Languages and Systems,
16(4):1081-1096, July 1994.
[Bossi et al., 1990] A. Bossi, N. Cocco, and S. Dulli. A method for spe-
cializing logic programs. ACM Transactions on Programming Languages
and Systems, 12(2):253-302, April 1990.
[Bossi et al., 1992a] A. Bossi, N. Cocco, and S. Etalle. On safe folding. In
Proceedings PLILP '92, Leuven, Belgium, Lecture Notes in Computer
Science 631, pages 172-186. Springer-Verlag, 1992.
[Bossi et al., 1992b] A. Bossi, N. Cocco, and S. Etalle. Transforming nor-
mal programs by replacement. In A. Pettorossi, editor, Proceedings 3rd
International Workshop on Meta-Programming in Logic, Meta '92, Upp-
sala, Sweden, Lecture Notes in Computer Science 649, pages 265-279.
Springer-Verlag, 1992.
[Boulanger and Bruynooghe, 1993] D. Boulanger and M. Bruynooghe. De-
riving unfold/fold transformations of logic programs using extended
OLDT-based abstract interpretation. Journal of Symbolic Computation,
15:495-521, 1993.
[Boyer and Moore, 1975] R. S. Boyer and J. S. Moore. Proving theorems
about Lisp functions. Journal of the ACM, 22(1):129-144, 1975.
[Brough and Hogger, 1987] D. R. Brough and C. J. Hogger. Compiling
associativity into logic programs. Journal of Logic Programming, 4:345-
359, 1987.
[Brough and Hogger, 1991] D. R. Brough and C. J. Hogger. Grammar-
related transformations of logic programs. New Generation Computing,
9(1):115-134, 1991.
[Bruynooghe, 1982] M. Bruynooghe. The memory management of Prolog
implementations. In K. L. Clark and S.-A. Tarnlund, editors, Logic Pro-
gramming, pages 83-98. Academic Press, 1982.
[Bruynooghe and Pereira, 1984] M. Bruynooghe and L. M. Pereira. De-
duction revision by intelligent backtracking. In J. A. Campbell, editor,
Implementations of Prolog, pages 253-266. Ellis Horwood, 1984.
[Bruynooghe et al, 1989] M. Bruynooghe, D. De Schreye, and B. Krekels.
Compiling control. Journal of Logic Programming, 6:135-162, 1989.
[Bruynooghe et al., 1992] M. Bruynooghe, D. De Schreye, and B. Martens.
A general criterion for avoiding infinite unfolding during partial deduc-
tion of logic programs. New Generation Computing, 11:47-79, 1992.
[Bry, 1989] F. Bry. Query evaluation in recursive data bases: Bottom-up
and top-down reconciled. In Proceedings 1st International Conference
on Deductive and Object-Oriented Databases, Kyoto, Japan, 1989.
[Bsaïes, 1992] K. Bsaïes. Static analysis for the synthesis of eureka proper-
ties for transforming logic programs. In Proceedings 4th UK Conference
on Logic Programming, ALPUK '92, Workshops in Computing, pages
41-61. Springer-Verlag, 1992.
[Burstall and Darlington, 1975] R. M. Burstall and J. Darlington. Some
transformations for developing recursive programs. In Proceedings of the
International Conference on Reliable Software, Los Angeles, CA,USA,
pages 465-472, 1975.
[Burstall and Darlington, 1977] R. M. Burstall and J. Darlington. A trans-
formation system for developing recursive programs. Journal of the
ACM, 24(1):44-67, January 1977.
[Clark and Sickel, 1977] K. L. Clark and S. Sickel. Predicate logic: A cal-
culus for deriving programs. In Proceedings 5th International Joint Con-
ference on Artificial Intelligence, Cambridge, MA, USA, pages 419-420,
1977.
[Clark and Tarnlund, 1977] K. L. Clark and S.-A. Tarnlund. A first order
theory of data and programs. In Proceedings Information Processing '77,
pages 939-944. North-Holland, 1977.
[Cook and Gallagher, 1994] J. Cook and J. P. Gallagher. A transform-
ation system for definite programs based on termination analysis. In
L. Fribourg and F. Turini, editors, Proceedings of LOPSTR'94 and
META'94, Pisa, Italy, Lecture Notes in Computer Science 883, pages
51-68, Springer-Verlag, 1994.
[Cortesi et al., 1992] A. Cortesi, G. Filé, and W. Winsborough. Com-
parison of abstract interpretations. In Proceedings Nineteenth ICALP,
Wien, Austria, Lecture Notes in Computer Science 623, pages 521-532.
Springer-Verlag, 1992.
[Cousot and Cousot, 1977] P. Cousot and R. Cousot. Abstract interpreta-
tion: A unified lattice model for static analysis of programs by construc-
tion of approximation of fixpoints. In Proceedings 4th ACM-SIGPLAN
Symposium on Principles of Programming Languages (POPL '77), pages
238-252. ACM Press, 1977.
[Darlington, 1972] J. Darlington. A Semantic Approach to Automatic Pro-
gram Improvement. PhD thesis, Department of Machine Intelligence,
Edinburgh University, Edinburgh (Scotland) UK, 1972.
[Darlington, 1978] J. Darlington. A synthesis of several sorting algorithms.
Acta Informatica, 11:1-30, 1978.
[Darlington, 1981] J. Darlington. An experimental program transform-
ation system. Artificial Intelligence, 16:1-46, 1981.
[De Schreye and Bruynooghe, 1989] D. De Schreye and M. Bruynooghe.
On the transformation of logic programs with instantiation based com-
putation rules. Journal of Symbolic Computation, 7:125-154, 1989.
[De Schreye et al, 1991] D. De Schreye, B. Martens, G. Sablon, and
M. Bruynooghe. Compiling bottom-up and mixed derivations into top-
down executable logic programs. Journal of Automated Reasoning,
7:337-358, 1991.
[de Waal and Gallagher, 1992] D. A. de Waal and J. P. Gallagher. Special-
ization of a unification algorithm. In T. Clement and K.-K. Lau, editors,
Logic Program Synthesis and Transformation, Proceedings LOPSTR '91,
Manchester, UK, Workshops in Computing, pages 205-221. Springer-
Verlag, 1992.
[Debray, 1985] S. K. Debray. Optimizing almost-tail-recursive Prolog pro-
grams. In Proceedings IFIP International Conference on Functional Pro-
gramming Languages and Computer Architecture, Nancy, France, Lec-
ture Notes in Computer Science 201, pages 204-219. Springer-Verlag,
1985.
[Debray, 1988] S. K. Debray. Unfold/fold transformations and loop opti-
mization of logic programs. In Proceedings SIGPLAN 88 Conference on
Programming Language Design and Implementation, Atlanta, GA, USA,
SIGPLAN Notices, 23, (7), pages 297-307, 1988.
[Debray, 1992] S. K. Debray, editor. Special Issue of the Journal of Logic
Programming on Abstract Interpretation, volume 12, Nos. 2&3. Elsevier,
1992.
[Debray and Mishra, 1988] S. K. Debray and P. Mishra. Denotational and
operational semantics for Prolog. Journal of Logic Programming, 5:61-
91, 1988.
[Debray and Warren, 1988] S. K. Debray and D. S. Warren. Automatic
mode inference for logic programs. Journal of Logic Programming, 5:207-
229, 1988.
[Debray and Warren, 1989] S. K. Debray and D. S. Warren. Functional
computations in logic programs. ACM TOPLAS, 11(3):451-481, 1989.
[Demoen, 1993] B. Demoen. On the transformation of a Prolog program to
a more efficient binary program. In K.-K. Lau and T. Clement, editors,
Logic Program Synthesis and Transformation, Proceedings LOPSTR '92,
Manchester, UK, Workshops in Computing, pages 242-252. Springer-
Verlag, 1993.
[Deransart, 1989] P. Deransart. Proof methods of declarative properties of
logic programs. In Proceedings TAPSOFT '89, Lecture Notes in Com-
puter Science 352, pages 207-226. Springer-Verlag, 1989.
[Deville, 1990] Y. Deville. Logic Programming: Systematic Program De-
velopment. Addison-Wesley, 1990.
[Deville and Burnay, 1989] Y. Deville and J. Burnay. Generalization and
program schemata. In Proceedings NACLP '89, pages 409-425. The MIT
Press, 1989.
[Deville and Lau, 1994] Y. Deville and K.-K. Lau. Logic program synthe-
sis. Journal of Logic Programming, 19, 20:321-350, 1994.
[Dix, 1995] J. Dix. A classification theory of semantics of normal logic pro-
grams: II weak properties. Fundamenta Informaticae, XII(3):257-288,
1995.
[Drabent and Maluszyński, 1988] W. Drabent and J. Maluszyński. Induc-
tive assertion method for logic programs. Theoretical Computer Science,
1(1):133-155, 1988.
[Ershov, 1977] A. P. Ershov. On the partial computation principle. Infor-
mation Processing Letters, 6(2):38-41, 1977.
[Ershov et al., 1988] A. P. Ershov, D. Bjørner, Y. Futamura, K. Furukawa,
A. Haraldson, and W. Scherlis, editors. Special Issue of New Generation
Computing: Workshop on Partial Evaluation and Mixed Computation,
volume 6, Nos. 2&3. Ohmsha Ltd. and Springer-Verlag, 1988.
[Etalle and Gabbrielli, 1996] S. Etalle and M. Gabbrielli. Modular trans-
formations of CLP programs. Theoretical Computer Science, 166:101-
146, 1996.
[Feather, 1982] M. S. Feather. A system for assisting program transform-
ation. ACM Toplas, 4(1):1-20, 1982.
[Feather, 1987] M. S. Feather. A survey and classification of some program
transformation techniques. In L. G. L. T. Meertens, editor, Proceedings
IFIP TC2 Working Conference on Program Specification and Transform-
ation, Bad Tolz, Germany, pages 165-195. North-Holland, 1987.
[Fitting, 1985] M. Fitting. A Kripke-Kleene semantics for logic programs.
Journal of Logic Programming, 2(4):295-312, 1985.
[Flener and Deville, 1993] P. Flener and Y. Deville. Logic program syn-
thesis from incomplete specifications. Journal of Symbolic Computation,
15:775-805, 1993.
[Fuchs and Fromherz, 1992] N. E. Fuchs and M. P. J. Fromherz. Schema-
based transformations of logic programs. In T. Clement and K.-K. Lau,
editors, Logic Program Synthesis and Transformation, Proceedings LOP-
STR '91, Manchester, UK, pages 111-125. Springer-Verlag, 1992.
[Fujita, 1987] H. Fujita. An algorithm for partial evaluation with con-
straints. Technical Memorandum TM-0367, ICOT, Tokyo, Japan, 1987.
[Fujita and Furukawa, 1988] H. Fujita and K. Furukawa. A self-applicable
partial evaluator and its use in incremental compilation. New Generation
Computing, 6(2&3):91-118, 1988.
[Fuller and Abramsky, 1988] D. A. Fuller and S. Abramsky. Mixed compu-
tation of Prolog programs. New Generation Computing, 6(2&3): 119-141,
1988.
[Futamura, 1971] Y. Futamura. Partial evaluation of computation
process—an approach to a compiler-compiler. Systems, Computers,
Controls, 2(5):45-50, 1971.
[Gallagher, 1986] J. P. Gallagher. Transforming programs by specializing
interpreters. In Proceedings Seventh European Conference on Artificial
Intelligence, ECAI '86, pages 109-122, 1986.
[Gallagher, 1991] J. P. Gallagher. A system for specializing logic programs.
Technical Report TR-91-32, University of Bristol, Bristol, U.K., 1991.
[Gallagher, 1993] J. P. Gallagher. Tutorial on specialization of logic pro-
grams. In Proceedings of ACM SIGPLAN Symposium on Partial Evalu-
ation and Semantics Based Program Manipulation, PEPM '93, Copen-
hagen, Denmark, pages 88-98. ACM Press, 1993.
[Gallagher and Bruynooghe, 1990] J. P. Gallagher and M. Bruynooghe.
Some low-level source transformations for logic programs. In M. Bruy-
nooghe, editor, Proceedings of the Second Workshop on Meta-Pro-
gramming in Logic, Leuven, Belgium, pages 229-246. Department of
Computer Science, KU Leuven (Belgium), April 1990.
[Gallagher and Bruynooghe, 1991] J. P. Gallagher and M. Bruynooghe.
The derivation of an algorithm for program specialisation. New Gen-
eration Computing, 6(2):305-333, 1991.
[Gallagher et al, 1988] J. P. Gallagher, M. Codish, and E. Shapiro. Spe-
cialization of Prolog and FCP programs using abstract interpretation.
New Generation Computing, 6(2&3):159-186, 1988.
[Gardner and Shepherdson, 1991] P. A. Gardner and J. C. Shepherdson.
Unfold/fold transformations of logic programs. In J.-L. Lassez and
G. Plotkin, editors, Computational Logic, Essays in Honor of Alan
Robinson, pages 565-583. The MIT Press, 1991.
[Gelfond and Lifschitz, 1988] M. Gelfond and V. Lifschitz. The stable
model semantics for logic programming. In Proceedings of the Fifth In-
ternational Conference and Symposium on Logic Programming, pages
1070-1080. The MIT Press, 1988.
[Gergatsoulis and Katzouraki, 1994] M. Gergatsoulis and M. Katzouraki.
Unfold/fold transformations for definite clause programs. In Proceedings
Sixth International Symposium on Programming Language Implementa-
tion and Logic Programming (PLILP '94), Lecture Notes in Computer
Science 844. Springer-Verlag, 1994.
[Gurr, 1993] C. A. Gurr. A Self-Applicable Partial Evaluator for the Logic
Programming Language Gödel. PhD thesis, University of Bristol, Bristol,
UK, 1993.
[Hansson and Tarnlund, 1982] A. Hansson and S.-A. Tarnlund. Program
transformation by data structure mapping. In K. L. Clark and S.-A.
Tarnlund, editors, Logic Programming, pages 117-122. Academic Press,
1982.
[Hickey and Smith, 1991] T. J. Hickey and D. A. Smith. Towards the par-
tial evaluation of CLP languages. In Proceedings ACM Symposium on
Partial Evaluation and Semantics Based Program Manipulation, PEPM
'91, New Haven, CT, USA, SIGPLAN Notices, 26, 9, pages 43-51. ACM
Press, 1991.
[Hogger, 1981] C. J. Hogger. Derivation of logic programs. Journal of the
ACM, 28(2):372-392, 1981.
[Huet and Lang, 1978] G. Huet and B. Lang. Proving and applying pro-
gram transformations expressed with second-order patterns. Acta Infor-
matica, 11:31-55, 1978.
[Jones and Mycroft, 1984] N. D. Jones and A. Mycroft. Stepwise devel-
opment of operational and denotational semantics for Prolog. In Pro-
ceedings 1984 International Symposium on Logic Programming, Atlantic
City, NJ, USA, pages 289-298, 1984.
[Jones et al., 1993] N. D. Jones, C. K. Gomard, and P. Sestoft. Partial
Evaluation and Automatic Program Generation. Prentice Hall, 1993.
[Kanamori and Fujita, 1986] T. Kanamori and H. Fujita. Unfold/fold
transformation of logic programs with counters. Technical Report 179,
ICOT, Tokyo, Japan, 1986.
[Kanamori and Horiuchi, 1987] T. Kanamori and K. Horiuchi. Construc-
tion of logic programs based on generalized unfold/fold rules. In Pro-
ceedings of the Fourth International Conference on Logic Programming,
pages 744-768. The MIT Press, 1987.
[Kawamura and Kanamori, 1990] T. Kawamura and T. Kanamori. Preser-
vation of stronger equivalence in unfold/fold logic program transform-
ation. Theoretical Computer Science, 75:139-156, 1990.
[Kirschenbaum et al, 1989] M. Kirschenbaum, A. Lakhotia, and L. Ster-
ling. Skeletons and techniques for Prolog programming. TR 89-170, Case
Western Reserve University, 1989.
[Kleene, 1971] S. C. Kleene. Introduction to Metamathematics. North-
Holland, 1971.
[Komorowski, 1982] H. J. Komorowski. Partial evaluation as a means for
inferencing data structures in an applicative language: A theory and
implementation in the case of Prolog. In Ninth ACM Symposium on
Principles of Programming Languages, Albuquerque, New Mexico, USA,
pages 255-267, 1982.
[Kott, 1978] L. Kott. About transformation system: A theoretical study.
In 3ème Colloque International sur la Programmation, pages 232-247,
Paris (France), 1978. Dunod.
[Kott, 1982] L. Kott. The McCarthy's induction principle: 'oldy' but
'goody'. Calcolo, 19(1):59-69, 1982.
[Kowalski, 1979] R. A. Kowalski. Algorithm = Logic + Control. Commu-
nications of the ACM, 22(7):424-436, 1979.
[Kunen, 1987] K. Kunen. Negation in logic programming. Journal of Logic
Programming, 4(4):289-308, 1987.
[Kunen, 1989] K. Kunen. Signed data dependencies in logic programs.
Journal of Logic Programming, 7:231-246, 1989.
[Leuschel, 1994a] M. Leuschel. Partial evaluation of the real thing. In
L. Fribourg and F. Turini, editors, Proceedings of LOPSTR'94 and
META '94, Pisa, Italy, Lecture Notes in Computer Science 883, pages
122-137, Springer-Verlag, 1994.
[Leuschel, 1994b] M. Leuschel. Partial evaluation of the real thing and its
application to integrity checking. Technical report, Computer Science
Department, K.U. Leuven, Heverlee, Belgium, 1994.
[Lloyd, 1987] J. W. Lloyd. Foundations of Logic Programming. Springer-
Verlag, Berlin, Second Edition, 1987.
[Lloyd and Shepherdson, 1991] J. W. Lloyd and J. C. Shepherdson. Partial
evaluation in logic programming. Journal of Logic Programming, 11:217-
242, 1991.
[Maher, 1987] M. J. Maher. Correctness of a logic program transformation
system. IBM Research Report RC 13496, T. J. Watson Research Center,
1987.
[Maher, 1990] M. J. Maher. Reasoning about stable models (and other
unstable semantics). IBM research report, T. J. Watson Research Center,
1990.
[Maher, 1993] M. J. Maher. A transformation system for deductive data-
base modules with perfect model semantics. Theoretical Computer Sci-
ence, 110:377-403, 1993.
[Marakakis and Gallagher, 1994] E. Marakakis and J. P. Gallagher. Sche-
ma-based top-down design of logic programs using abstract data types.
In L. Fribourg and F. Turini, editors, Proceedings of LOPSTR'94 and
META '94, Pisa, Italy, Lecture Notes in Computer Science 883, pages
138-153, Springer-Verlag, 1994.
[Marriott and Søndergaard, 1993] K. Marriott and H. Søndergaard. Differ-
ence-list transformation for Prolog. New Generation Computing, 11:125-
177, 1993.
[Martens et al., 1992] B. Martens, D. De Schreye, and M. Bruynooghe.
Sound and complete partial deduction with unfolding based on well-
founded measures. In Proceedings of the International Conference on
Fifth Generation Computer Systems, pages 473-480. Ohmsha Ltd., IOS
Press, 1992.
[Michie, 1968] D. Michie. Memo functions and machine learning. Nature,
218(5136):19-22,1968.
[Mogensen and Bondorf, 1993] T. Mogensen and A. Bondorf. Logimix: A
self-applicable partial evaluator for Prolog. In K.-K. Lau and T. Clement,
editors, Logic Program Synthesis and Transformation, Proceedings LOP-
STR '92, Manchester, UK, Workshops in Computing, pages 214-227.
Springer-Verlag, 1993.
[Naish, 1985] L. Naish. Negation and Control in Prolog. Lecture Notes in
Computer Science 238. Springer-Verlag, 1985.
[Narain, 1986] S. Narain. A technique for doing lazy evaluation in logic.
Journal of Logic Programming, 3(3):259-276, 1986.
[Neumerkel, 1993] U. W. Neumerkel. Specialization of Prolog Programs
with Partially Static Goals and Binarization. PhD thesis, Technical Uni-
versity Wien, Austria, 1993.
[Owen, 1989] S. Owen. Issues in the partial evaluation of meta-interpreters.
In H. Abramson and M. H. Rogers, editors, Meta-Programming in Logic
Programming, pages 319-339. The MIT Press, 1989.
[Paige and Koenig, 1982] R. Paige and S. Koenig. Finite differencing of
computable expressions. ACM Transactions on Programming Languages
and Systems, 4(3):402-454, 1982.
[Partsch, 1990] H. A. Partsch. Specification and Transformation of Pro-
grams. Springer-Verlag, 1990.
[Paterson and Hewitt, 1970] M. S. Paterson and C. E. Hewitt. Compara-
tive schematology. In Conference on Concurrent Systems and Parallel
Computation Project MAC, Woods Hole, Mass., USA, pages 119-127,
1970.
[Pettorossi, 1977] A. Pettorossi. Transformation of programs and use of
tupling strategy. In Proceedings Informatica 77, Bled, Yugoslavia, pages
1-6, 1977.
[Pettorossi and Proietti, 1989] A. Pettorossi and M. Proietti. Decidability
results and characterization of strategies for the development of logic pro-
grams. In G. Levi and M. Martelli, editors, Proceedings of the Sixth In-
ternational Conference on Logic Programming, Lisbon, Portugal, pages
539-553. The MIT Press, 1989.
[Pettorossi and Proietti, 1994] A. Pettorossi and M. Proietti. Transform-
ation of logic programs: Foundations and techniques. Journal of Logic
Programming, 19,20:261-320, 1994.
[Pettorossi and Proietti, 1996] A. Pettorossi and M. Proietti. Rules and
strategies for transforming functional and logic programs. ACM Com-
puting Surveys, 28(2):360-414, 1996.
[Prestwich, 1993a] S. Prestwich. Online partial deduction of large pro-
grams. In Proceedings ACM Sigplan Symposium on Partial Evaluation
and Semantics-Based Program Manipulation, PEPM '93, Copenhagen,
Denmark, pages 111-118. ACM Press, 1993.
[Prestwich, 1993b] S. Prestwich. An unfold rule for full Prolog. In K.-
K. Lau and T. Clement, editors, Logic Program Synthesis and Trans-
formation, Proceedings LOPSTR '92, Manchester, UK, Workshops in
Computing, pages 199-213. Springer-Verlag, 1993.
[Proietti and Pettorossi, 1990] M. Proietti and A. Pettorossi. Synthesis of
eureka predicates for developing logic programs. In N. D. Jones, editor,
Third European Symposium on Programming, ESOP '90, Lecture Notes
in Computer Science 432, pages 306-325. Springer-Verlag, 1990.
[Proietti and Pettorossi, 1991] M. Proietti and A. Pettorossi. Semantics
preserving transformation rules for Prolog. In ACM Symposium on Par-
tial Evaluation and Semantics Based Program Manipulation, PEPM '91,
Yale University, New Haven, CT, USA, pages 274-284. ACM Press, 1991.
[Proietti and Pettorossi, 1993] M. Proietti and A. Pettorossi. The loop
absorption and the generalization strategies for the development of logic
programs and partial deduction. Journal of Logic Programming, 16(1-
2):123-161, 1993.
[Proietti and Pettorossi, 1994a] M. Proietti and A. Pettorossi. Synthesis of
programs from unfold/fold proofs. In Y. Deville, editor, Logic Program
Synthesis and Transformation, Proceedings of LOPSTR '93, Louvain-
la-Neuve, Belgium, Workshops in Computing, pages 141-158. Springer-
Verlag, 1994.
[Proietti and Pettorossi, 1994b] M. Proietti and A. Pettorossi. Total cor-
rectness of the goal replacement rule based on unfold/fold proofs. In
M. Alpuente, R. Barbuti, and I. Ramos, editors, Proceedings of the
1994 Joint Conference on Declarative Programming, GULP-PRODE
'94, pages 203-217. Universidad Politecnica de Valencia, Peniscola,
Spain, September 19-22, 1994.
[Proietti and Pettorossi, 1995] M. Proietti and A. Pettorossi. Unfolding-
definition-folding, in this order, for avoiding unnecessary variables in
logic programs. Theoretical Computer Science, 142(1):89-124, 1995.
[Przymusinski, 1987] T. Przymusinski. On the declarative semantics of
stratified deductive databases and logic programs. In J. Minker, editor,
Foundations of Deductive Databases and Logic Programming, pages 193-
216. Morgan Kaufmann, 1987.
[Rogers, 1967] H. Rogers. Theory of Recursive Functions and Effective
Computability. McGraw-Hill, 1967.
[Safra and Shapiro, 1986] S. Safra and E. Shapiro. Meta interpreters for
real. In H. J. Kugler, editor, Proceedings Information Processing 86,
pages 271-278. North-Holland, 1986.
[Sahlin, 1991] D. Sahlin. An Automatic Partial Evaluator for Full Prolog.
PhD thesis, SICS, Sweden, 1991.
[Sahlin, 1993] D. Sahlin. Mixtus: An automatic partial evaluator for full
Prolog. New Generation Computing, 12:7-51, 1993.
[Sakama and Itoh, 1988] C. Sakama and H. Itoh. Partial evaluation of
queries in deductive databases. New Generation Computing, 6(2, 3):249-
258, 1988.
[Sato, 1992] T. Sato. An equivalence preserving first order unfold/fold
transformation system. Theoretical Computer Science, 105:57-84,1992.
[Sato and Tamaki, 1988] T. Sato and H. Tamaki. Deterministic transform-
ation and deterministic synthesis. In Future Generation Computers.
North-Holland, 1988.
[Sawamura and Takeshima, 1985] H. Sawamura and T. Takeshima. Recur-
sive unsolvability of determinacy, solvable cases of determinacy and their
application to Prolog optimization. In Proceedings of the International
Symposium on Logic Programming, Boston, USA, pages 200-207. IEEE
Computer Society Press, 1985.
[Scherlis, 1981] W. L. Scherlis. Program improvement by internal special-
ization. In Proc. 8th ACM Symposium on Principles of Programming
Languages, Williamsburg, VA, pages 41-49. ACM Press, 1981.
[Schwarz, 1982] J. Schwarz. Using annotations to make recursive equations
behave. IEEE Transactions on Software Engineering, SE-8(1):21-33,
1982.
[Seki, 1990] H. Seki. A comparative study of the well-founded and the
stable model semantics: Transformation's viewpoint. In Proceedings of
the Workshop on Logic Programming and Non-monotonic Logic, pages
115-123. Cornell University, USA, 1990.
[Seki, 1991] H. Seki. Unfold/fold transformation of stratified programs.
Theoretical Computer Science, 86:107-139, 1991.
[Seki, 1993] H. Seki. Unfold/fold transformation of general logic programs
for well-founded semantics. Journal of Logic Programming, 16(1&2):5-
23, 1993.
[Seki and Furukawa, 1987] H. Seki and K. Furukawa. Notes on transform-
ation techniques for generate and test logic programs. In Proceedings
of the International Symposium on Logic Programming, San Francisco,
CA, USA, pages 215-223. IEEE Press, 1987.
[Shepherdson, 1992] J. C. Shepherdson. Unfold/fold transformations of
logic programs. Mathematical Structures in Computer Science, 2:143-
157, 1992.
[Sterling and Kirschenbaum, 1993] L. Sterling and M. Kirschenbaum. Ap-
plying techniques to skeletons. In J.-M. Jacquet, editor, Constructing
Logic Programs, chapter 6, pages 127-140. Wiley, 1993.
[Sterling and Lakhotia, 1988] L. Sterling and A. Lakhotia. Composing
Prolog meta-interpreters. In R. A. Kowalski and K. A. Bowen, edi-
tors, Proceedings Fifth International Conference on Logic Programming,
Seattle, WA, USA, pages 386-403. The MIT Press, 1988.
[Sterling and Shapiro, 1994] L. Sterling and E. Shapiro. The Art of Prolog.
Second Edition, The MIT Press, 1994.
[Takeuchi, 1986] A. Takeuchi. Affinity between meta-interpreters and par-
tial evaluation. In H. J. Kugler, editor, Proceedings of Information Pro-
cessing '86, pages 279-282. North-Holland, 1986.
[Takeuchi and Furukawa, 1986] A. Takeuchi and K. Furukawa. Partial
evaluation of Prolog programs and its application to meta-programming.
In H. J. Kugler, editor, Proceedings of Information Processing '86, pages
279-282. North-Holland, 1986.
[Tamaki and Sato, 1984] H. Tamaki and T. Sato. Unfold/fold transform-
ation of logic programs. In S.-A. Tarnlund, editor, Proceedings Second In-
ternational Conference on Logic Programming, Uppsala, Sweden, pages
127-138. Uppsala University, 1984.
[Tamaki and Sato, 1986] H. Tamaki and T. Sato. A generalized correctness
proof of the unfold/fold logic program transformation. Technical Report
86-4, Ibaraki University, Japan, 1986.
[Tarau and Boyer, 1990] P. Tarau and M. Boyer. Elementary logic pro-
grams. In P. Deransart and J. Maluszynski, editors, Proceedings PLILP
'90, pages 159-173. Springer-Verlag, 1990.
[Traff and Prestwich, 1992] J. L. Traff and S. D. Prestwich. Meta-pro-
gramming for reordering literals in deductive databases. In A. Pettorossi,
editor, Proceedings 3rd International Workshop on Meta-Programming
in Logic, Meta '92, Uppsala, Sweden, Lecture Notes in Computer Science
649, pages 280-293. Springer-Verlag, 1992.
[Turchin, 1986] V. F. Turchin. The concept of a supercompiler. ACM
TOPLAS, 8(3):292-325, 1986.
[Ueda and Furukawa, 1988] K. Ueda and K. Furukawa. Transformation
rules for GHC programs. In Proceedings International Conference on
Fifth Generation Computer Systems, ICOT, Tokyo, Japan, pages 582-
591, 1988.
[van Emden and Kowalski, 1976] M. H. van Emden and R. Kowalski. The
semantics of predicate logic as a programming language. Journal of the
ACM, 23(4):733-742, 1976.
[Van Gelder et al., 1989] A. Van Gelder, K. Ross, and J. Schlipf. Un-
founded sets and well-founded semantics for general logic programs.
In Proceedings of the ACM Sigact-Sigmod Symposium on Principles of
Database Systems, pages 221-230. ACM Press, 1989.
[Venken, 1984] R. Venken. A Prolog meta-interpretation for partial eval-
uation and its application to source-to-source transformation and query
optimization. In T. O'Shea, editor, Proceedings of ECAI '84, pages 91-
100. North-Holland, 1984.
[Wadler, 1990] P. L. Wadler. Deforestation: Transforming programs to
eliminate trees. Theoretical Computer Science, 73:231-248, 1990.
[Walker and Strong, 1972] S. A. Walker and H. R. Strong. Characteriza-
tion of flowchartable recursions. In Proceedings 4th Annual ACM Sym-
posium on Theory of Computing, Denver, CO, USA, 1972.
[Wand, 1980] M. Wand. Continuation-based program transformation
strategies. Journal of the ACM, 27(1):164-180, 1980.
[Warren, 1983] D. H. D. Warren. An abstract Prolog instruction set. Tech-
nical Report 309, SRI International, 1983.
[Warren, 1992] D. S. Warren. Memoing for logic programs. Communica-
tions of the ACM, 35(3):93-111, 1992.
[Wiggins, 1992] G. A. Wiggins. Negation and control in automatically
generated logic programs. In A. Pettorossi, editor, Proceedings 3rd In-
ternational Workshop on Meta-Programming in Logic, Meta '92, Upp-
sala, Sweden, Lecture Notes in Computer Science 649, pages 250-264.
Springer-Verlag, 1992.
[Wirth, 1976] N. Wirth. Algorithms + Data Structures = Programs. Pren-
tice-Hall, Inc., 1976.
[Zhang and Grant, 1988] J. Zhang and P. W. Grant. An automatic dif-
ference-list transformation algorithm for Prolog. In Proceedings 1988
European Conference on Artificial Intelligence, ECAI '88, pages 320-
325. Pitman, 1988.
INDEX

References to footnotes are indicated by "n" after the page number

H- 4, 6 semantics 308
> | -> 4, 6, 44 simulation 280-5
J-\- 4, 6, 43 use for various forms of reasoning 271-2
> h* 5, 6 abductive framework 242
?- 5, 6, 11, 70 abductive logic programming [ALP] 269
I- 5, 6, 37, 75 modification of semantics 278
0 | 5, 6 abductive proof procedure 273-7
-> 5, 6, 78 abductive phase 258-60
h 6, 40, 75 argumentation-theoretic interpretation 267-
N 6, 7, 70 9, 277-9
consistency phase 258, 258n, 260
A8 soundness 261-2
. 25 abductive reasoning 236-7
= 28 abductive task, intractability 240
-L 71, 192 abstract data type 455
n 120 realization 208
C 120 abstract interpretation 111, 772
T 120, 516 abstract interpreter 526-7
D 166, 363, 406 for higher-order Horn clauses, deficien-
T 192 cies 537
HO 193 abstract logic programming language 198
? 213 examples 199-200, 205
= 387 abstract machine
<-> 387 advantages 651-652
~387 design of instruction set 652-655
0408 runtime support 656
[]429 AC, see admissibility condition
[1 465 acceptability semantics 296
X 516 accumulation strategy 759
:: 551 admissibility condition [AC] 333
\ 551 admissible chain 184
(( )> 609 AKL, see Andorra Kernel Language
[]609 algebra 28; see also error...; functional...;
initial...; relational...
abducible hypotheses, retraction 281-2 allowed program 360-1, 391
abducibles, negation of 274-5 Alloy 466
abducible sentences 237 ALP, see abductive logic programming
abduction a-conversion 514
applications in AI 243-4 ALPS 622
argumentation-theoretic interpretation 236 amalgamated language, incompleteness 460
computation through TMS 279 amalgamated program 461
and constraint logic programming 287-8 amalgamation 460
deduction from the completion 285-7 advantages 464
default and non-default 308 ambivalent logic 467-8
formalizations 240 analog circuits, analysis and synthesis
proof procedures 239 658-60
ancestor filter 170 GET, see Clark equality theory; Clark's
Andorra Kernel Language [AKL] 622 equational theory
Andorra principle 622 chain, admissible 184
annotations 763—4 chemical-reaction pathways, elucidation
answer 671-4
consequentially strongest 14 CHIP, use for scheduling problems 671
most general 20 Church-Rosser property 81
prohibited 13 Church-Rosser results 57
as proof of formula 12 Church's simple theory of types 500, 510
in query system 12 CI, see clause incorporation
answer extraction 189 CLAM 655
answer sets, consistency 290-3 Clark completion 338, 356-9, 365, 374-84,
APL 71 609, 707
append, implementation 504-7 and closed world assumption 379
architecture, meta-level 465 consistency in 3-valued logic 387
arithmetic, Robinson's 18 dependence on language 389
assumption-based truth maintenance sys- incompleteness for normal programs 383
tem [ATMS] 239, 304, 305-7 and non-normal programs 380-2
non-propositional case 307 for programs with equality 383-4
ATMS, see assumption-based truth main- semantics 738
tenance system Clark equality theory [CET] 286, 338
atomic prepositional formula 8 Clark's equational theory [CET] 375,
attack 263 386-7
by explicit negation 297 classical logic 196, 206
by integrity constraint 308 clausal goal replacement rule 713-14,
autoepistemic logic 302, 408-9 719-21
three-valued 409 clause, definite 186
automated theorem proving 163-4 clause incorporation [CI] 217
clause replacement rules 714-15
backchaining refutation 172 clause subsumption 176
backtracking closed world assumption [CWA] 289, 338,
in constraint algorithm 643-645 356-9, 365, 370-4
dependency-directed 304 and Clark completion 379
implicational computation 54 conditions for consistency 371, 372
basic explanation 238 in databases 27
basic goal replacement rule 727 inconsistent, in disjunctive theories 338
bat 291 model-theory of 374
belief revision 248, 309 and negation as failure 357, 373
B-conversion 514 restriction to definite Horn clause pro-
bidirectional search 172 grams 371-2
block structuring, realization 208 see also generalized. . . ; weak generalized..
boolean domain, satisfiability in 632, 636, closure 130
640 CLP, see constraint logic programming
boolean equations, solving 674-5 coherence principle 296-7
bottom-up execution, for CLP systems combinator calculus 138
617-19 combinatorial search problems 665-75
bottom-up refutation 172 committed-choice non-determinism 619
bound variable 17 compactness
of proof system 37
caching 185 of provable-consequence relation 40
call-consistent program 361, 391-3 of semantic-consequence relation 9
CAS, see computed answer substitution se- compilation, compared with specialization
mantics 487-8
CCLP, see concurrent constraint logic pro- compiling control 748
gramming completeness
preservation, in depth-first search proce- practical use of 658
dure 168 preferred solutions in 626-8
of programming system 50 user-defined constraints in 624-5
complete rewriting sequence 97-100 constraint logic programming system
completion, see Clark completion completeness 615-17
complexity, and disjunctive logic programs implementation 627-8
349 soundness 615-17
composition 753 constraint programming languages 593-5
computable relation 3 constraints
computation dynamic 594-5
as search 192—5 incremental satisfiability 633-7
meaning 528-37 non-incrementally satisfiable 630-3
computed answer semantics for normal pro- tests and operations 605
grams 737 constraint solving algorithms 628-645
computed answer substitution 728 constraint system 608
computed answer substitution semantics [CAS] constructive negation 343, 364-5
728-30 constructor-orthogonality 85, 86-87, 91
computed-output relation 45 cross-reference table 644
concurrent constraint logic programming cumulative default logic 252
[CCLP] 619-21 cut elimination 56-7, 196, 519, 522
Concurrent Prolog 619n CWA, see closed world assumption
confluence 57, 74, 81-2, 93-5
consequences 96 D-admissible 298
local 82, 93-5 database view updates, use of abduction
one-step 82 243
of orthogonal systems 87 Datalog 23-8
testing for 84-96 data structures
undecidability 84 choice of 756
confluent term rewriting, near-completeness dynamic provision 756-60
83 DCA, see domain closure axiom
congruence closure 119 DDR, see disjunctive database rule
conjunctive equation 142, 143 decidable set 3
conjunctive implicational computations 54 decidable theory 3
conjunctive normal form, advantages deduction 236
165-6 automated 522
consequences, provable 37-40 deduction property 10
consequentially complete programming sys- deductive database 24
tem 50 default abduction and negation as failure
consistent answer set 290 307
constraint domain 601-8 default logic 249, 302
constraint logic programming [CLP] 36-7, default reasoning 249-54
152, 287-8, 592-601 argumentation-based 301-3
background 598 with explicit negation 299-300
design of programs 651-6 formalisations 250
logical semantics of programs 608-9 methods of performance 254
parallel implementation 657-8 uses of abduction 244
program analysis techniques 655-6 default rule 249
programs 599-600 definite clause 186, 503
semantics 600—1 definite program 361, 384—5
constraint logic programming languages definition elimination rule 712
596-8 definition rule 702, 711-12
complex constants in 623—4 deforestation 753
linguistic features 621-8 delay declaration 764
miscellaneous applications 676 delayed constraint 645
negation in 625 delayed goal 645
delay mechanism 596 rules of inference 75-78
deletion 712 soundness of proof systems 77
demand-driven computation 71, 73 equational logic programming
demodulation 78 data structures 111-20
denotational semantics 57-8 driving procedures 111, 129-37
depth-first search, price 170 extensions 141-53
derivability, formal 6 implementation 119-20
derivation 517-19 pattern-matching 111, 120-9
DHBp, see disjunctive Herbrand base sequencing 111, 120-9
difference-list 756 Equational Machine [EM] 138
digital circuits, use of CLP 675 equational programming language 31-5
directed congruence closure 133, 134 equation solving 147-9
directed set 142 error algebra 73
disjunction, characterization 337 Tj-conversion 510, 514
disjunctive database rule [DDR] 340 eureka predicate 704, 742
disjunctive deductive database 349 eureka step 704
disjunctive Herbrand base [DHBp] 332,336 evaluation, see eager...; lazy...
disjunctive logic, implementation of lan- Event Calculus 245-6
guage 348 eventually outermost 99
disjunctive logic program 326 expert systems 490
semantics 349 explanation
disjunctive logic programming [DLP] 219, formula 285
283, 326, 330-7 most specific 267
transformation of programs into ALP explicit negation 288, 289
284-5 extended logic program 289, 295-7
various semantics the same 336 with abduction 300
DLP, see disjunctive logic programming inconsistency 291-3
DNA sequencing 668-9 semantics 289-90
domain closure axiom [DCA] 389, 405 extended logic programming [ELP]
domain theory 58 288-300
don't-care nondeterminism 619 argumentation-theoretic approach
DSL ALPHA 25 297-9
semantic problem in 26 extended stable model semantics 297
translation of queries into FOPC ques- extension 256n
tions 27 complete 264
dynamic meta-programming 468 in model elimination procedure 182
dynamic scope, and incorrect answers in preferred 263
Lisp 34-5
dynamic scoping, unsound, in early Lisp fact, in Prolog 433-4
processors 55 fair term rewriting sequence 98, 99
fault diagnosis, use of abduction 243
eager evaluation, in denotational semantics felicitous model 401
58 FF, see finite failure
ElipSys 657 filter promotion strategy 748
ELP, see extended logic programming financial planning 676
EM, see Equational Machine finite differencing 753
entailment algorithm 637-40 finite failure [FF] 367
entailment, semantic 6 during program transformation 730-2
EqL 36, 148 first-order N-Prolog 215
equality, definability 405 first-order predicate calculus [FOPC] 1-2,
equational formula 28 16, 70
equational logic models for 16-17
completeness of proof systems 77 programming in 15-21
model of 28 semantic system for 17
programming in 28-31 fixed point semantics 609-11
floundering 343, 359-61, 367, 369, 406 generic model 372
recursive undecidability of occurrence 369 G machine 138
recursive unsolvability 360 goal equivalence 712, 713, 714
in SLDNF-resolution 364 goal replacement 718
flying birds example 249, 250-1, 254-5, 272, goal formulas 503
298, 299-300 goal-oriented refutation 172
see also bat; ostrich goal replacement rule 702, 712-13, 714
FOL 441 Godel numbering 45, 441
folding 717-18, 721-3 Godel [system] 424, 442, 453-9, 461, 473
folding rule 701, 709-11 Godel T-predicate 40
partial correctness 701 ground representation 423
FOPC, see first-order predicate calculus ground resolution 166
forced folding 744 completeness 167
formal derivability 6 soundness 166-7
formula, logical 7 GWFS, see generalized well-founded seman-
forward chaining refutation 172 tics
forward-directed refutation 172
Fourier algorithm 643 Harrop formula 202-3
four-valued logic 391 see also hereditary...
free variable 17 higher-order, value of language 584
functional algebra 58-60 hashed cons 116, 119
functional program head-normal form 130
approaches to compiling 137 Herbrand base, see disjunctive Herbrand
as data 565-72 base
functional programming 560-1 Herbrand domain 327
functional programming language 31-5, 71 Herbrand model 58-60, 327
design of 31-5 fixpoint characterization 378, 379
deviations from pure LP in implementa- intersection property 328, 331, 332
tion 34 see also least...; least reflective...; minimal.
higher-order 31-2 supported
notations 34 Herbrand universe 174
functions, language representing 510-13 see also positive...
Futamura projection 487, 488-9, 771 hereditary Harrop formula 203, 574-84
future redundancy 653—4 language based on, novelty 576
hierarchical program 361, 391-3
Galois connection 8 high-level vision, use of abduction 243
garbage collection, generational 136 higher-order logic, various senses 501-2
Gaussian elimination 630-1 higher-order program, examples 552-6
GCWA, see generalized closed world as- higher-order programming 500
sumption syntax 549-52
GCWAS, see generalized closed world as- Hilog 468
sumption for stratified logic pro- Horn characterization 330
grams Horn clause 21, 22, 186, 326
generalization + equality 715 higher-order 501, 523-8, 572-4, 584-6
generalization strategy 702, 743, 745, 747 implementation of interpreter 563—5
generalized closed world assumption [GCWA] negative 502
338, 394-5 positive 502, 503
computational difficulty for negative queries Horn clause logic 186-90, 229
339 procedural qualities 186
for stratified logic programs 345 Horn clause program, definite 384-5
generalized stable model 270 Horn formula 180
semantics 271 Horn set 180-1
generalized well-founded semantics [GWFS] hypergraph 151-2
346-7 rewriting, implementation 152
general-level N-Prolog 215 hypothetical implication 213
hypothetical reasoning explicit 10, 42
chemical 671-4 implicit 10, 42
use of meta-programming 473-7 representations of 10
knowledge assimilation [KA] 244-9, 309,
iff 358 471-3
implementation 489-90 use of abduction 243
implication, in goals 547-77 knowledge base [KB] 244
implicational computations knowledge based systems 490
backtracking 47 Knuth-Bendix methods 92-6
conjunctive 48
naive 46 lambda abstraction 32
naive, with failure 46 A calculus 136
implicational logic program 46 see also typed lambda-calculus
implied iff 358 A conversion 510, 522
incrementality of algorithms 628-30 rules 513-16
incremental normal form 143-4 A Prolog 23, 212, 549
independence of negated constraints 607 A-terms 500-1
index 104, 105 as data structures 561-74
indexical constraints 625 language 423
indexings, in recursion theory 45 language independence 434
induction 236 language independent stratified program 437,
inference engine, in constraint programming 438
645-648 language interpreter, enhanced 490-1
inferential proof, in equational logic 74, 75 lazy computation 71
infinite terms 141-7 lazy directed congruence closure 133
information system 607-8 lazy evaluation 35, 141
InH-Prolog 221-9, 230 in denotational semantics 58
examples of deductions 224-7 recursive schema 129
representation of inference system 227-8 lazy memo function 134
initial algebra 58—60 leapfrogging, in rewriting 100
instance of 29 least fixpoint semantics 362
institution 8 least Herbrand model [LHM] 707, 724
integrity check 274 Least Reflective Herbrand Model 466
integrity checking 242-3 legal knowledge, layers of 467
optimisation 275-7 let, construct in functional languages 32
integrity constraints 241-3, 251 level of meta-program 429
in logic programming 309-310 LHM, see least Herbrand model [LHM]
role in providing attacks 308 liar paradox 459
interpreter lifting 210
design problems 209-12 linear input format 180
for higher-order formulae 546-9 linear input resolution, for Horn sets 181
intuitionistic logic 196, 206 linear resolution, refinements 175-86
iterative deepening 175 linear restriction 170
linked heap structure 112-14
JTMS, see justification-based truth main- Lisp 22, 71, 72, 500
tenance system compilers 111
justification, in truth maintenance 303-4 early implementations 34-5
justification-based truth maintenance sys- as functional programming language 31
tem [JTMS] 304-5 locally linear refutation 171
locally stratified program 402
KA, see knowledge assimilation logic 1, 70
KB, see knowledge base logical formula 7
KM-admissible 278, 298 logical semantics 57-60
knowledge logic program 239
communication 10 construction 773
negation 337-40 meta-programming
normal 340-7 amalgamated 461-7
positive consequences 327—37 dynamic 425-6
semantics 706-7 facilities 486
strategies for transformation 742-64 using ground representation 440-59
stratified 341-3, 343-5 self-applicable 425
syntax 704-6 separated 460-1
well-founded 346 MetaProlog 442, 473
logic programming [LP] 598 meta-variable 465
abductive proof procedure 257-63 METEOR 185
choice of logical language 190-1, 198 MGTP 284
and CLP 595-6 minimal explanation 238
completeness in 51 minimal Herbrand model 363, 394
design of languages 2 minimality condition [MC] 333
exceptions 293—5 minimal logic 196, 206
generalizations 2 minimal model 402
higher-order 422 ML 500
history 325 modal completion 407
incompleteness and unsoundness of im- modal logic 406-9
plementations 55 of provability 407
meanings 186 mode analysis 772
problems of implementation 81 model 7, 519-22
sorts of negation 365-7 general 521
soundness 51 standard 521
without NAF 295-7 model elimination [ME] 164, 181-2
logic programming languages 70 see also METEOR
equational 70 model-state, of disjunctive logic program
examples 15—37 332
implementation 37-57 modus ponens, see cut elimination; resolu-
motivations for use 2 tion inference rule
process of design 56 MTOSS deduction 180
specification 7-37 multiple-infix notation 4
loop absorption strategy 745, 746 MU-Prolog 622
loop fusion 753
LP, see logic programming NAF, see negation as failure; negation as
LUSH resolution 187 finite failure
naive implicational computation 53
MACSYMA 595 with failure 53-4
MC, see minimality condition naming relation 423
ME, see model elimination narrowing 149
meaning natural language understanding, use of ab-
in nonclassical logics 42 duction 243
of a program 393-4 near-Horn Prolog [nH-Prolog] 213, 219-29,
mechanical engineering, use of CLP 676 230
melting 120-1 intuitionistic soundness 220
memo function 119, 134 negation
memoing 763-4 infeasibility of classical 365
ME refutation 183-4 infeasibility of classical 365
merge condition, in implementation 180 semantics, in terms of classes of models
meta-interpreter, basic styles 449 393-402
meta-program 422 semantics, using non-classical logics
representation of program components 363-4
442-8 semantics, using special models 362-3
self-applicable 459-68 negation as failure [NAF; NF] 254-7,
specialization 481-91 356-9
abductive interpretation 255-6 oriented clause set 172
abductive proof procedures 309 orthogonality 85, 96
argumentation-theoretic interpretation ostrich 293-4
263-7, 308 OTAS 661
common-sense axiomatization 308 overlaps 108-10
deductive calculi 409-12
and default abduction 307 paraconsistent logic 290
incompleteness for Clark completion 359 parallel implementation of rewriting sys-
incompleteness for 2-valued models 386 tems 139-40
as modal provability notion 406 paramodulation 134-6
reasons for using 356 partial completion 391
satisfactory, in completion 428 partial deduction 481, 764
SLDNF-resolution 367-70 partial evaluation 481, 764-71
soundness for Clark completion 358, 376 approaches to 769
soundness for closed world assumption correctness 767
357 passive constraint 645
and 3-valued logic 390-1 path compression 132
3-valued semantics 262 P-derivation 541-6
used with caution 365 completeness 546
see also constructive negation and construction of interpreter 548
negation as finite failure [NAF] 338, 339 correctness 544
negative call evaluation 368 soundness 544
negative consequences, semantics 340 perfect model 342, 363, 397, 399, 402, 409
negative sequential-or 146 Peirce, C. S. 236
NF, see negation as failure Peirce's law 218
nH-Prolog, see near-Horn Prolog planning, use of abduction 243
nondeterminism 619 polymorphic types 454
non-monotonic logic 302, 337 positive consequences, semantics 337
non-monotonic reasoning 310, 349 positive Herbrand universe 523
non-standard procedural semantics, opti- predicate calculus with equality 36-7
mizing 491 predicate, reflective 448-53
non-strictness, in equational logic program- predicate variable
ming 72 extensional occurrence 509
normal form 70, 83 intensional occurrence 509-10
for term 29 preferred explanations 237
uniqueness 83-84 preferred extension semantics 297
normal program 374 principal normal form 515
correctness theorems 735, 738, 740, 741 probability theory, as alternative to com-
notation mon sense 244
computational 2, 3-6 program
logical 2, 3-6 analysis 772
N-Prolog 212, 213, 230 construction from modules 468-70
propositional 214 improvement of efficiency 747-60
with quantifiers [QN-Prolog] 215 specialization 426, 478-80, 481, 765
NR-Prolog 218 synthesis 773
updating 471-3
objective formula 408 program clause 503
object language 428 programming
omega-order predicate calculus 23 compositional style 752
ω-rewriting 103-5 in FOPC 15
Ω-terms 120, 124, 129 programming system 37, 44-9
one-step strategy 97 completeness 49-56
options 661 determinate 45
analysis of trading 661-4 deterministic 45
valuations 676 and proof and query systems 51-3, 55
soundness 49-56 over functions 509
program transformation 426, 478-80, over function symbols 209
697-700 over function variables 561
correctness 698 over predicates 505-9
efficiency 698 quantifiers, universal, in goals 574-7
formalization 699-700 query 503
objectives 699-700 allowed 391
reversible 716-19 notations for, in relational database sys-
rule 697, 701 tems 25
schemata approach 700 query system 11-15, 37
Progressive nH-Prolog 223 question 3
progressive search 223-4 determination of 33-4
projection algorithm 640-3 as formula 12
Prolog 2, 15, 21-3, 186, 325, 500 in query system 11
development 429-30 quick-checking CLP system 614
as CLP language 596-7
execution 651 raising 210
meta-programming 439-40 reachable parameters 118
notations in typical implementations 22 recursion removal 761-3
removal of meta-programming overhead recursive inseparability 101
431 redex 97
unsoundness 359 needed 102, 103
Prolog Technology Theorem Prover 182 reduction, in model elimination procedure
Prolog II 622 182-3
proof reflective predicate 425, 434-9, 457
Hilbert-style 38 reflective principle 425
linear 38 Reflective Prolog 424, 463, 466
natural-deduction style 38-9 reflective requirement 462
uniform 196-8, 213, 230 regularity 85
proof calculus 38 relational algebra 27
proof normalization 56 relational calculus 25, 27
proof procedure 163 relational database 15, 23—8
proof strategies for logic programming pure 24
56-7 relational logic programming 151-3
proof system 11, 37-40 Relevance Theory 248
compact 37 replacement law, validity 721-3
completeness 40-3 representation, non-ground 431-4
for SIC 38 re-representation 444
soundness 40-3 resolution inference rule 166
versus programming system 55 resolution procedure, first-order 173-5
proof theory 7, 39, 56 resolution proof procedure 164
protected data 373 resolution with merging 179-80
provability restart rule 218
classical equivalent to intuitionistic, for restricting operation 712
Horn clauses 534-7 restriction site mapping [RSM] 668-9
distinctions between sorts of 206 retractability 280
provable-consequence relation 40 retractability semantics 282
provable correctness of answer 43 rewrite-ambiguity 88-90
pure lazy Lisp 22 rewrite-orthogonality 85
pure Prolog 22, 190, 732, 736 rewriting, optimal 106-8
rewriting logic 151
QN-Prolog 213, 215 rewriting sequence, completeness 97-100
soundness and completeness 217 Robinson's arithmetic 19
QNR-Prolog 219 RSM, see restriction site mapping
quantification rule-orthogonality, 85, 86, 91
rummage sale 137 SLDNF 239, 260
computation space 259-60
SATCHMO 284 SLDNFA 288
satisfaction completeness 606 SLDNF-resolution 359
scheduling problems 670-1 completeness for Clark completion 361-
schema 760-1 2
scheme 500 incompleteness for closed world assump-
scoping mechanism 209 tion 373
realization 207 soundness with respect to Clark comple-
search problems 665-75 soundness with respect to Clark comple-
search procedure, depth-first 168-9 SLD-resolution 163-4, 187, 189-90, 229,
SECD machine 138 328-9
semantic-consequence relation 9 important properties 220
semantic correctness of answer 43 see also Prolog
semantic entailment 6 SLI-derivation 334-5
semantics 7 s-linear resolution 170
abductive 267 SLINF-resolution 339
answer set 289-90 SLI-resolution 333
denotational 57, 328 SL-resolution 164, 175, 185
fixpoint 328, 329-30 solution compactness 605-6
logical 2, 57 soundness of programming system 50
model-theoretic 328 specialization
procedural 328 compared to compilation 487-8
proof theoretic 328 of interpreter 482-4
relevant 707 of resolution procedure 484-5
specification view 241 specializer, self-applicable 488-9
semantic system 7-11, 37 special notations 6
intuitive evaluation 42 s-resolution 176-9
semicomputable relation 3 stable model 256-7, 342, 401, 402
semi-strict program 361, 392-3 stable semantics 301
sentence 18 stable set of assumptions 302
sequent 195 stable state 345
sequent calculus 517-18 stable theory 265
sequent derivation 39 semantics
sequentiality, see strong...; weak... stationary semantics 347
sequentiality analysis 102 sterile jar problem 203—5
extensions 108-11 stock cutting 666-7
sequent-style proof systems 195-8 strategies for transformation 742
set, decidable 3 stratification semantics 345
shallow implicational calculus [SIC] 8, 9 stratified program 362, 391-3, 399, 437n
completeness 41 stream 73
soundness 41 strict function 71
simplex algorithm 631, 634, 636 strictly normal equality formula 404
shallow implicational-conjunctive calculus strictness analysis 111, 139
[SICC] 14 strict program 393
sharing, dynamic exploitation 133-4 strong sequentiality 103-4, 105
SIC, see shallow implicational calculus decidability 105
SICC, see shallow implicational-conjunctive strong well-founded semantics 347
calculus structure 519-22
Skolem conjunctive form 165 subpattern set 124-8, 129
Skolem-Herbrand-Gödel theorem 174 subset logic programming 150
Skolemization, dynamic 210 subset logic programming 150
SLD 239 subsumption 618-19
SLD derivation 329 computational hardness of checking 618
SLDNA 288 see also clause...; θ-...
subsumption elimination 167 truth maintenance [TM], and abduction
supercombinator method 137 303-7
supported Herbrand model 395 truth maintenance system 249, 310
supported state 345 semantics 303
suspension 130 tupling 753
syntax 7 tupling strategy 702, 743, 745, 746
system 423 typed interpretation of meta-program 434,
438
typed lambda-calculus 39, 510
tactic 556 types, in functional programming language
implementation 556-9 582-4
tactical 556 type theory 23
tactical implementation 556-9
tail-recursive program, recognition problem unfolding 721-3
570, 573 unfolding rule 701, 707-9
T&S-folding 726 partial correctness 701
tautology, detection 13 unfolding selection rule 746
t-clause 333 unfolding step 478-80
template 571 unfolding tree 745
Templog 664 unification algorithm 118
temporal reasoning 664-5 unification problem
term, value of 17, 129 higher-order, undecidability of existence
term data structures, logical interpretation 538
117-20 for λ-terms
term rewriting 57, 78 unifiers, search for 538-41
choice of redex to rewrite 100 uniform proof 196-8, 213, 230
proof strategies using 96-108 Unit nH-Prolog 223, 227
term rewriting proof universal query problem 399
in equational logic 74, 75-78, 78-81 unnecessary variable elimination 753
as shorthand for inferential proof 78, 80 value of a term 17, 129
status of Symmetric rule 80 value trail 644
theory, decidable 3 variable
θ-subsumption 176 free 17
three-valued logic 364, 385-91 free 17
three wise men problem 473-7 view updates 247
thunk 130
Tigre system 138 wakeup 645
time stamp 644 implementation 645
TM, see truth maintenance wakeup conditions 647
top-down execution, for CLP systems wakeup system 646-648
611-15 implementation 648-650
top-down refutation 172 Warren Abstract Machine 138, 171, 182,
TOSS deduction 178-9 651
TOSS refutation 183 weak generalized closed world assumption
tours protocol 146 [WGCWA] 339
tranfac-derivation 334 weakly complete programming system 50
transformation weakly perfect model 399, 401, 437n
schema-based 760-1 weakly stratified program 399, 401
weak sequentiality 102, 103
see also program transformation
well-founded model 264, 400, 402
transformation rule 704-15
well-founded semantics 266
correctness 715-42
for logic programs 348
correctness, for definite programs 723-6 WGCWA, see weak generalized closed world
transformation strategies 697, 699, 701 assumption
tree isomorphism algorithm 133
truss structures, use of CLP 675-6 Yale shooting problem 252-3, 258-9, 305