
Boolean Functions - Theory, Algorithms, and Applications (Crama & Hammer 2011-05-16)


Boolean Functions

Written by prominent experts in the field, this monograph provides the first compre-
hensive and unified presentation of the structural, algorithmic, and applied aspects of
the theory of Boolean functions.
The book focuses on algebraic representations of Boolean functions, especially dis-
junctive and conjunctive normal form representations. It presents within this framework
the fundamental elements of the theory (Boolean equations and satisfiability problems,
prime implicants and associated short representations, dualization), an in-depth study
of special classes of Boolean functions (quadratic, Horn, shellable, regular, threshold,
read-once functions and their characterization by functional equations), and two fruit-
ful generalizations of the concept of Boolean functions (partially defined functions and
pseudo-Boolean functions). Several topics are presented here in book form for the first
time.
Because of the unique depth and breadth of the unified treatment that it provides and
its emphasis on algorithms and applications, this monograph will have special appeal
for researchers and graduate students in discrete mathematics, operations research,
computer science, engineering, and economics.

Dr. Yves Crama is Professor of Operations Research and Production Management and
the former Director General of the HEC Management School of the University of
Liège, Belgium. He is widely recognized as a prominent expert in the field of Boolean
functions, combinatorial optimization, and operations research, and he has coauthored
more than seventy papers and three books on these subjects. Dr. Crama is a member of
the editorial board of Discrete Applied Mathematics, Discrete Optimization, Journal
of Scheduling, and 4OR – The Quarterly Journal of the Belgian, French and Italian
Operations Research Societies.

The late Peter L. Hammer (1936–2006) was a Professor of Operations Research,
Mathematics, Computer Science, Management Science, and Information Systems at
Rutgers University and the Director of the Rutgers University Center for Operations
Research (RUTCOR). He was the founder and editor-in-chief of the journals Annals
of Operations Research, Discrete Mathematics, Discrete Applied Mathematics, Dis-
crete Optimization, and Electronic Notes in Discrete Mathematics. Dr. Hammer was
the initiator of numerous pioneering investigations of the use of Boolean functions in
operations research and related areas, of the theory of pseudo-Boolean functions, and
of the logical analysis of data. He published more than 240 papers and 19 books on
these topics.

Encyclopedia of Mathematics and Its Applications
Founding Editor G.-C. Rota
Editorial Board
R. Doran, P. Flajolet, M. Ismail, T.-Y. Lam, E. Lutwak
The titles below, and earlier volumes in the series, are available from booksellers or from
Cambridge University Press at www.cambridge.org.
110 M.-J. Lai and L. L. Schumaker Spline Functions on Triangulations
111 R. T. Curtis Symmetric Generation of Groups
112 H. Salzmann et al. The Classical Fields
113 S. Peszat and J. Zabczyk Stochastic Partial Differential Equations with Lévy Noise
114 J. Beck Combinatorial Games
115 L. Barreira and Y. Pesin Nonuniform Hyperbolicity
116 D. Z. Arov and H. Dym J-Contractive Matrix Valued Functions and Related Topics
117 R. Glowinski, J.-L. Lions, and J. He Exact and Approximate Controllability for Distributed
Parameter Systems
118 A. A. Borovkov and K. A. Borovkov Asymptotic Analysis of Random Walks
119 M. Deza and M. Dutour Sikirić Geometry of Chemical Graphs
120 T. Nishiura Absolute Measurable Spaces
121 M. Prest Purity, Spectra and Localisation
122 S. Khrushchev Orthogonal Polynomials and Continued Fractions
123 H. Nagamochi and T. Ibaraki Algorithmic Aspects of Graph Connectivity
124 F. W. King Hilbert Transforms I
125 F. W. King Hilbert Transforms II
126 O. Calin and D.-C. Chang Sub-Riemannian Geometry
127 M. Grabisch et al. Aggregation Functions
128 L. W. Beineke and R. J. Wilson (eds.) with J. L. Gross and T. W. Tucker Topics in
Topological Graph Theory
129 J. Berstel, D. Perrin, and C. Reutenauer Codes and Automata
130 T. G. Faticoni Modules over Endomorphism Rings
131 H. Morimoto Stochastic Control and Mathematical Modeling
132 G. Schmidt Relational Mathematics
133 P. Kornerup and D. W. Matula Finite Precision Number Systems and Arithmetic
134 Y. Crama and P. L. Hammer Boolean Models and Methods in Mathematics, Computer
Science, and Engineering
135 V. Berthé and M. Rigo Combinatorics, Automata and Number Theory
136 A. Kristály, V. D. Rǎdulescu, and C. Varga Variational Principles in Mathematical Physics,
Geometry, and Economics
137 J. Berstel and C. Reutenauer Noncommutative Rational Series with Applications
138 B. Courcelle Graph Structure and Monadic Second-Order Logic
139 M. Fiedler Matrices and Graphs in Geometry
140 N. Vakil Real Analysis through Modern Infinitesimals
141 R. B. Paris Hadamard Expansions and Hyperasymptotic Evaluation

Boolean Functions
Theory, Algorithms, and Applications

YVES CRAMA
University of Liège, Belgium

PETER L. HAMMER

Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town,
Singapore, São Paulo, Delhi, Tokyo, Mexico City

Cambridge University Press
32 Avenue of the Americas, New York, NY 10013-2473, USA
www.cambridge.org
Information on this title: www.cambridge.org/9780521847513

© Yves Crama and Peter L. Hammer 2011

This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.

First published 2011

Printed in the United States of America

A catalog record for this publication is available from the British Library.

Library of Congress Cataloging in Publication Data


Crama, Yves, 1958–
Boolean functions / Yves Crama, Peter L. Hammer.
p. cm. – (Encyclopedia of mathematics and its applications)
Includes bibliographical references and index.
Contents: Theory, algorithms, and applications
ISBN 978-0-521-84751-3 (hardback)
1. Algebraic functions. 2. Algebra, Boolean. I. Hammer, P. L., 1936–2006. II. Title.
QA341.C73 2011
511.3′24–dc22 2011009690

ISBN 978-0-521-84751-3 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for
external or third-party Internet Web sites referred to in this publication and does not guarantee that
any content on such Web sites is, or will remain, accurate or appropriate.
To Edith,
by way of apology for countless days
spent in front of the computer.
YC

Contents

Contributors
Preface
Acknowledgments
Notations

Part I Foundations

1 Fundamental concepts and applications
1.1 Boolean functions: Definitions and examples
1.2 Boolean expressions
1.3 Duality
1.4 Normal forms
1.5 Transforming an arbitrary expression into a DNF
1.6 Orthogonal DNFs and number of true points
1.7 Implicants and prime implicants
1.8 Restrictions of functions, essential variables
1.9 Geometric interpretation
1.10 Monotone Boolean functions
1.11 Recognition of functional and DNF properties
1.12 Other representations of Boolean functions
1.13 Applications
1.14 Exercises

2 Boolean equations
2.1 Definitions and applications
2.2 The complexity of Boolean equations: Cook’s theorem
2.3 On the role of DNF equations
2.4 What does it mean to “solve a Boolean equation”?
2.5 Branching procedures
2.6 Variable elimination procedures
2.7 The consensus procedure
2.8 Mathematical programming approaches
2.9 Recent trends and algorithmic performance
2.10 More on the complexity of Boolean equations
2.11 Generalizations of consistency testing
2.12 Exercises

3 Prime implicants and minimal DNFs
Peter L. Hammer and Alexander Kogan
3.1 Prime implicants
3.2 Generation of all prime implicants
3.3 Logic minimization
3.4 Extremal and typical parameter values
3.5 Exercises

4 Duality theory
Yves Crama and Kazuhisa Makino
4.1 Basic properties and applications
4.2 Duality properties of positive functions
4.3 Algorithmic aspects: The general case
4.4 Algorithmic aspects: Positive functions
4.5 Exercises

Part II Special Classes

5 Quadratic functions
Bruno Simeone
5.1 Basic definitions and properties
5.2 Why are quadratic Boolean functions important?
5.3 Special classes of quadratic functions
5.4 Quadratic Boolean functions and graphs
5.5 Reducibility of combinatorial problems to quadratic equations
5.6 Efficient graph-theoretic algorithms for quadratic equations
5.7 Quadratic equations: Special topics
5.8 Prime implicants and irredundant forms
5.9 Dualization of quadratic functions (Contributed by Oya Ekin Karaşan)
5.10 Exercises

6 Horn functions
Endre Boros
6.1 Basic definitions and properties
6.2 Applications of Horn functions
6.3 False points of Horn functions
6.4 Horn equations
6.5 Prime implicants of Horn functions
6.6 Properties of the set of prime implicants
6.7 Minimization of Horn DNFs
6.8 Dualization of Horn functions
6.9 Special classes
6.10 Generalizations
6.11 Exercises

7 Orthogonal forms and shellability
7.1 Computation of orthogonal DNFs
7.2 Shellings and shellability
7.3 Dualization of shellable DNFs
7.4 The lexico-exchange property
7.5 Shellable quadratic DNFs and graphs
7.6 Applications
7.7 Exercises

8 Regular functions
8.1 Relative strength of variables and regularity
8.2 Basic properties
8.3 Regularity and left-shifts
8.4 Recognition of regular functions
8.5 Dualization of regular functions
8.6 Regular set covering problems
8.7 Regular minorants and majorants
8.8 Higher-order monotonicity
8.9 Generalizations of regularity
8.10 Exercises

9 Threshold functions
9.1 Definitions and applications
9.2 Basic properties of threshold functions
9.3 Characterizations of threshold functions
9.4 Recognition of threshold functions
9.5 Prime implicants of threshold functions
9.6 Chow parameters of threshold functions
9.7 Threshold graphs
9.8 Exercises

10 Read-once functions
Martin C. Golumbic and Vladimir Gurvich
10.1 Introduction
10.2 Dual implicants
10.3 Characterizing read-once functions
10.4 The properties of $P_4$-free graphs and cographs
10.5 Recognizing read-once functions
10.6 Learning read-once functions
10.7 Related topics and applications of read-once functions
10.8 Historical notes
10.9 Exercises

11 Characterizations of special classes by functional equations
Lisa Hellerstein
11.1 Characterizations of positive functions
11.2 Functional equations
11.3 Characterizations of particular classes
11.4 Conditions for characterization
11.5 Finite characterizations by functional equations
11.6 Exercises

Part III Generalizations

12 Partially defined Boolean functions
Toshihide Ibaraki
12.1 Introduction
12.2 Extensions of pdBfs and their representations
12.3 Extensions within given function classes
12.4 Best-fit extensions of pdBfs containing errors
12.5 Extensions of pdBfs with missing bits
12.6 Minimization with don’t cares
12.7 Conclusion
12.8 Exercises

13 Pseudo-Boolean functions
13.1 Definitions and examples
13.2 Representations
13.3 Extensions of pseudo-Boolean functions
13.4 Pseudo-Boolean optimization
13.5 Approximations
13.6 Special classes of pseudo-Boolean functions
13.7 Exercises

A Graphs and hypergraphs
A.1 Undirected graphs
A.2 Directed graphs
A.3 Hypergraphs

B Algorithmic complexity
B.1 Decision problems
B.2 Algorithms
B.3 Running time, polynomial-time algorithms, and the class P
B.4 The class NP
B.5 Polynomial-time reductions and NP-completeness
B.6 The class co-NP
B.7 Cook’s theorem
B.8 Complexity of list-generation and counting algorithms

C JBool: A software tool
Claude Benzaken and Nadia Brauner
C.1 Introduction
C.2 Work interface
C.3 Creating a Boolean function
C.4 Editing a function
C.5 Operations on Boolean functions

Bibliography
Index

Contributors

Claude Benzaken
Laboratoire G-SCOP
Université Joseph Fourier
Grenoble, France

Endre Boros
RUTCOR – Rutgers Center for Operations Research
Rutgers University
Piscataway, NJ, USA

Nadia Brauner
Laboratoire G-SCOP
Université Joseph Fourier
Grenoble, France

Martin C. Golumbic
The Caesarea Rothschild Institute
University of Haifa
Haifa, Israel

Vladimir Gurvich
RUTCOR – Rutgers Center for Operations Research
Rutgers University
Piscataway, NJ, USA

Lisa Hellerstein
Department of Computer and Information Science
Polytechnic Institute of New York University
Brooklyn, NY, USA

Toshihide Ibaraki
Kyoto College of Graduate Studies for Informatics
Kyoto, Japan

Oya Ekin Karaşan
Department of Industrial Engineering
Bilkent University
Ankara, Turkey

Alexander Kogan
Rutgers Business School and RUTCOR
Rutgers University
Piscataway, NJ, USA

Kazuhisa Makino
Department of Mathematical Informatics
University of Tokyo
Tokyo, Japan

Bruno Simeone
Department of Statistics
La Sapienza University
Rome, Italy
Preface

Boolean functions, meaning {0, 1}-valued functions of a finite number of {0, 1}-
valued variables, are among the most fundamental objects investigated in pure and
applied mathematics. Their importance can be explained by several interacting
factors.

• It is reasonable to argue that a multivariate function
$f : A_1 \times A_2 \times \cdots \times A_n \to A$ is “interesting” only if each of
the sets $A_1, A_2, \ldots, A_n$, and $A$ contains at
least two elements, since otherwise the function either depends trivially on
some of its arguments, or is constant. Thus, in a sense, Boolean functions are
the “simplest interesting” multivariate functions. It may even be surprising,
actually, that such primitive constructs turn out to display a rich array of
properties and have been investigated by various breeds of scientists for
more than 150 years.
• When the arguments of a Boolean function are viewed as atomic logical
propositions, the value of the function at a 0–1 point can be interpreted as
the truth value of a sentence composed from these propositions. Carrying out
calculations on Boolean functions is then tantamount to performing related
logical operations (such as inference or theorem-proving) on propositional
sentences. Therefore, Boolean functions are at the heart of propositional logic.
• Many concepts of combinatorial analysis have their natural Boolean coun-
terpart. In particular, since every 0–1 point with n coordinates can be
viewed as the characteristic vector of a subset of N = {1, 2, . . . , n}, the set
of points at which a Boolean function takes value 1 corresponds to a col-
lection of subsets of N , or a “hypergraph” on N . (When all subsets have
cardinality 2, then the function corresponds exactly to a graph.) Structural
properties relating to the transversals, stable sets, or colorings of the hyper-
graph, for instance, often translate into interesting properties of the Boolean
function.
• Boolean functions are ubiquitous in theoretical computer science, where they
provide fundamental models for the most basic operations performed by
computers on binary digits (or bits). Turing machines and Boolean circuits
are prime examples illustrating this claim. Similarly, electrical engineers rely
on the Boolean formalism for the description, synthesis, or verification of
digital circuits.
• In operations research or management science, binary variables and Boolean
functions are frequently used to formulate problems where a number of “go –
no go” decisions are to be made; these could be, for instance, investment
decisions arising in a financial management framework, or location deci-
sions in logistics, or assignment decisions for production planning. In most
cases, the variables have to be fixed at values that satisfy constraints express-
ible as Boolean conditions and that optimize an appropriate real-valued
objective function. This leads to – frequently difficult – Boolean equations
(“satisfiability problems”) or integer programming problems.
• Voting games and related systems of collective choice are frequently repre-
sented by Boolean functions, where the variables are associated with (binary)
alternatives available to the decision makers, and the value of the function
indicates the outcome of the process.
• Various branches of artificial intelligence rely on Boolean functions to express
deductive reasoning processes (in the above-mentioned propositional frame-
work), or to model primitive cognitive and memorizing activities of the brain
by neural networks, or to investigate efficient learning strategies, or to devise
storing and retrieving mechanisms in databases, and so on.

We could easily extend this list to speak of Boolean models arising in reliability
theory, in cryptography, in coding theory, in multicriteria analysis, in mathematical
biology, in image processing, in theoretical physics, in statistics, and so on.
The main objective of the present monograph is to introduce the reader to the
fundamental elements of the theory of Boolean functions. It focuses on algebraic
representations of Boolean functions, especially disjunctive or conjunctive nor-
mal form expressions, and it provides a very comprehensive presentation of the
structural, algorithmic, and applied aspects of the theory in this framework.
The monograph is divided into three main parts.
Part I: Foundations proposes in Chapter 1: Fundamental concepts and applica-
tions, an introduction to the major concepts and applications of the theory. It then
successively tackles three generic classes of problems that play a central role in the
theory and in the applications of Boolean functions, namely, Boolean equations
and their extensions in Chapter 2: Boolean equations, the generation of prime
implicants and of optimal normal form representations in Chapter 3: Prime impli-
cants and minimal DNFs, and various aspects of the relation between functions
and their dual in Chapter 4: Duality theory.
Part II: Special Classes presents an in-depth study of several remarkable classes
of Boolean functions. Each such class is investigated from both the structural and
the algorithmic points of view. Chapter 5 is devoted to Quadratic functions, Chapter
6 to Horn functions, Chapter 7 to Orthogonal forms and shellability, Chapter 8 to
Regular functions, Chapter 9 to Threshold functions, and Chapter 10 to Read-once
functions. Chapter 11: Characterizations of special classes by functional equations
provides general conditions under which classes of functions can be “compactly”
characterized.
Finally, Part III: Generalizations deals with two fruitful extensions of the
concept of Boolean functions. Namely, Chapter 12: Partially defined Boolean
functions deals with functions whose domain is restricted to a subset of all pos-
sible {0, 1} points, and Chapter 13: Pseudo-Boolean functions proposes a brief
overview of the theory of real-valued functions of binary variables.
In view of its emphasis on algorithms and applications, this monograph should
appeal to researchers and graduate students in discrete mathematics, operations
research, computer science, engineering, and economics. Although we believe
that it is rather unique in its depth and breadth, our work has been influenced in
various ways by many other books dealing with specialized aspects of the field,
such as threshold logic, logical inference, operations research, game theory, or
reliability theory. We would like to mention, in particular, the classic monograph by P.L.
Hammer and S. Rudeanu, Boolean Methods in Operations Research and Related
Areas (Springer, Berlin, 1968). Although the present book focuses almost exclusively
on Boolean models, rather than pseudo-Boolean ones, it can be seen as a distant
follow-up to the 1968 monograph. We should also cite the influence of books by Anthony [25];
Brayton, Hachtel, McMullen, and Sangiovanni-Vincentelli [153]; Brown [156];
Chandru and Hooker [184]; Chang and Lee [186]; Hu [511, 512]; Jeroslow [533];
Kleine Büning and Lettmann [571]; Knuth [575]; Mendelson [680]; Muroga
[698, 699]; Ramamurthy [777]; Rudeanu [795, 796]; Schneeweiss [811]; Störmer
[849]; Truemper [871]; Wegener [902, 903]; and Winder [917], among others.
As a complement to the monograph, the reader is also advised to consult the
collection of papers Boolean Models and Methods in Mathematics, Computer
Science and Engineering (Y. Crama and P.L. Hammer, eds., Cambridge University
Press, Cambridge, UK, 2010). Each chapter in that volume introduces the reader
to specialized Boolean models and applications investigated in a particular field
of science and provides a survey of important representative results.
Acknowledgments

The genesis of this book spread over many years, and over this long period, the
authors have benefited from the support and advice provided by many individuals.
First and foremost, several colleagues have contributed important material to
the monograph: Endre Boros, Marty Golumbic, Vladimir Gurvich, Lisa Heller-
stein, Toshi Ibaraki, Oya Ekin Karaşan, Alex Kogan, Kaz Makino, and Bruno
Simeone have coauthored several chapters and have provided input on various
sections. Claude Benzaken and Nadia Brauner have developed a software pack-
age for manipulating Boolean functions that serves as a useful companion to the
monograph. The contributions of these prominent experts of Boolean functions
greatly enhance the appeal of the volume.
Comments, reviews, and corrections have been provided at different stages by
colleagues and by RUTCOR students, including Nina Feferman, Noam Goldberg,
Levent Kandiller, Shaoji Li, Tongyin Liu, Irina Lozina, Martin Milanic, Devon
Morrese, David Neu, Sergiu Rudeanu, Gábor Rudolf, Jan-Georg Smaus, and Mine
Subasi.
Special thanks are due to Endre Boros, who provided constant encouragement
and tireless advice to the authors over the gestation period of the volume. Terry
Hart provided the efficient administrative assistance that allowed the authors to
keep track of countless versions of the manuscript and endless mail exchanges.
Finally, I am deeply indebted to my mentor, colleague, and friend, Peter L.
Hammer, for getting us started on this ambitious project, many years ago. Peter
spent much of his academic career stressing the importance and relevance of
Boolean models in different fields of applied mathematics, and he was very keen
on completing this monograph. It is extremely unfair that he did not live to see
the outcome of our joint effort. I am sure that he would have loved it, and that he
would have been very proud of this contribution to the dissemination of the theory,
algorithms, and applications of Boolean functions.
Yves Crama
Liège, Belgium, September 2010

Notations

$B = \{0, 1\}$, $U = [0, 1]$
$X = (x_1, x_2, \ldots, x_n)$, $Y = (y_1, y_2, \ldots, y_n)$, ...: components of points in $B^n$
$x^{\alpha} = x$ if $\alpha = 1$, and $x^{\alpha} = \bar{x}$ if $\alpha = 0$
$X \vee Y = (x_1 \vee y_1, x_2 \vee y_2, \ldots, x_n \vee y_n)$
$X \wedge Y = (x_1 \wedge y_1, x_2 \wedge y_2, \ldots, x_n \wedge y_n) = (x_1 y_1, x_2 y_2, \ldots, x_n y_n)$
$\overline{X} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$
$X \le Y$ (with $X, Y \in B^n$) if and only if $x_i \le y_i$ for $i = 1, 2, \ldots, n$
$e_k$: a unit vector $(0, \ldots, 0, 1, 0, \ldots, 0)$ of appropriate dimension, with 1 in kth
position
$e_A$: the characteristic vector of $A \subseteq \{1, 2, \ldots, n\}$, that is, $e_A = \sum_{k \in A} e_k$; $e_{\emptyset} = 0$
$\mathrm{supp}(X)$: the support of $X \in B^n$, that is, the set $\{ i \in \{1, 2, \ldots, n\} \mid x_i = 1 \}$
$T_{A,B} = \{X \in B^n \mid x_i = 1 \text{ for all } i \in A \text{ and } x_j = 0 \text{ for all } j \in B\}$
$f, g, h, \ldots$: Boolean functions
$\phi, \psi, \theta, \ldots$: Boolean expressions
$1_n$: the function that takes constant value 1 on $B^n$
$0_n$: the function that takes constant value 0 on $B^n$
$T(f)$: the set of true points of function $f$
$F(f)$: the set of false points of function $f$
$\min T(f)$: the set of minimal true points of a positive function $f$
$\max F(f)$: the set of maximal false points of a positive function $f$
$f^d$: the dual of function $f$
$|\phi|$: the (encoding) length, or size, of a Boolean expression $\phi$; when $\phi$ is a DNF,
$|\phi|$ is simply the number of literals appearing in $\phi$
$|f|$: for a positive function $f$, $|f|$ denotes the size of the complete (prime
irredundant) DNF $\phi$ of $f$, that is, $|f| \overset{\mathrm{def}}{=} |\phi|$
$\|\phi\|$: the number of terms of a DNF $\phi$
$(\omega_1, \omega_2, \ldots, \omega_n, \omega)$: the Chow parameters of a Boolean function on $B^n$
$(\pi_1, \pi_2, \ldots, \pi_n, \pi)$: the modified Chow parameters of a Boolean function on $B^n$
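
As an informal illustration of the vector notation listed above, the following Python sketch represents points of B^n as tuples of 0s and 1s. The function names (`complement`, `join`, `meet`, `leq`, `supp`, `char_vector`, `subcube`) are hypothetical and chosen only for this example; the sketch also assumes the usual reading of the complement of x as 1 − x.

```python
from itertools import product

def complement(X):
    """Componentwise complement of X (assumes the complement of x is 1 - x)."""
    return tuple(1 - x for x in X)

def join(X, Y):
    """Componentwise disjunction X v Y."""
    return tuple(max(x, y) for x, y in zip(X, Y))

def meet(X, Y):
    """Componentwise conjunction X ^ Y, equal to the componentwise product."""
    return tuple(x * y for x, y in zip(X, Y))

def leq(X, Y):
    """X <= Y if and only if x_i <= y_i for every i."""
    return all(x <= y for x, y in zip(X, Y))

def supp(X):
    """Support of X: the set of (1-based) indices i with x_i = 1."""
    return {i for i, x in enumerate(X, start=1) if x == 1}

def char_vector(A, n):
    """Characteristic vector e_A of a subset A of {1, ..., n}."""
    return tuple(1 if i in A else 0 for i in range(1, n + 1))

def subcube(A, B, n):
    """T_{A,B}: all points with x_i = 1 for i in A and x_j = 0 for j in B."""
    return [X for X in product((0, 1), repeat=n)
            if all(X[i - 1] == 1 for i in A) and all(X[j - 1] == 0 for j in B)]

X, Y = (1, 0, 1), (0, 0, 1)
print(complement(X), join(X, Y), meet(X, Y), leq(Y, X), supp(X))
```

Note that `supp` and `char_vector` are inverse to each other, which is the correspondence between 0–1 points and subsets of {1, 2, ..., n} used throughout the notation.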

Part I Foundations

1 Fundamental concepts and applications

The purpose of this introductory chapter is threefold. First, it contains the main
definitions, terminology, and notations that are used throughout the book. After the
introduction of our main feature characters – namely, Boolean functions – several
sections are devoted to a discussion of alternative representations, or expressions,
of Boolean functions. Disjunctive and conjunctive normal forms, in particular, are
discussed at length in Sections 1.4–1.11. These special algebraic expressions play
a very central role in our investigations, as we frequently focus on the relation
between Boolean functions and their normal forms. Section 1.12, however, also
provides a short description of different types of function representations, namely,
representations over GF(2), pseudo-Boolean polynomial expressions, and binary
decision diagrams.
A second objective of this chapter is to introduce several of the topics to be
investigated in more depth in subsequent chapters, namely: fundamental algorith-
mic problems (Boolean equations, generation of prime implicants, dualization,
orthogonalization, etc.) and special classes of Boolean functions (bounded-degree
normal forms, monotone functions, Horn functions, threshold functions, etc.).
Finally, the chapter briefly presents a variety of applications of Boolean functions
in such diverse fields as logic, electrical engineering, reliability theory, game the-
ory, combinatorics, and so on. These applications have often provided the primary
motivation for the study of the problems to be encountered in the next chapters.
In a sense, this introductory chapter provides a (very) condensed digest of
what’s to come. It can be considered a degustation: Its main purpose is to whet the
appetite, so that readers will decide to embark on the full course!

1.1 Boolean functions: Definitions and examples


This book is about Boolean functions, meaning: {0, 1}-valued functions of a finite
number of {0, 1}-valued variables.

Definition 1.1. A Boolean function of n variables is a function on $B^n$ into $B$,
where $B$ is the set $\{0, 1\}$, $n$ is a positive integer, and $B^n$ denotes the n-fold
cartesian product of the set $B$ with itself. A point $X^* = (x_1, x_2, \ldots, x_n) \in B^n$ is a
true point (respectively, false point) of the Boolean function $f$ if $f(X^*) = 1$ (respectively,
$f(X^*) = 0$). We denote by $T(f)$ (respectively, $F(f)$) the set of true points (respectively,
false points) of $f$. We denote by $1_n$ the function that takes constant value 1
on $B^n$ and by $0_n$ the function that takes constant value 0 on $B^n$.
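
To make Definition 1.1 concrete, here is a minimal Python sketch that enumerates the points of B^n and splits them into true points and false points. The helper `true_false_points` and the sample function `majority` are assumptions introduced for illustration only; they do not come from the text.

```python
from itertools import product

def true_false_points(f, n):
    """Return (T, F): the true points and false points of f on {0,1}^n."""
    points = list(product((0, 1), repeat=n))
    T = [X for X in points if f(*X) == 1]
    F = [X for X in points if f(*X) == 0]
    return T, F

def majority(x1, x2, x3):
    """Hypothetical example: value 1 when at least two arguments equal 1."""
    return 1 if x1 + x2 + x3 >= 2 else 0

def one3(x1, x2, x3):
    """The constant function taking value 1 on every point of {0,1}^3."""
    return 1

T, F = true_false_points(majority, 3)
print(T)  # [(0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
print(len(true_false_points(one3, 3)[0]))  # 8: every point is a true point
```

Since every point is either a true point or a false point, the two lists always partition the 2^n points of B^n.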
It should be stressed that, in many applications, the role of the set B is
played by another two-element set, like {Yes, No}, {True, False}, {ON, OFF},
{Success, Failure}, {−1, 1} or, more generally, {a, b}, where a and b are abstract
(uninterpreted) elements. In most cases, this distinction is completely irrelevant.
However, it is often convenient to view the elements of B as numerical quanti-
ties in order to perform arithmetic operations on these elements and to manipulate
algebraic expressions like $1 - x$, $x + y - xy$, and so on, where $x$, $y$ are elements of $B$.
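
As a quick sanity check of why the numerical view is convenient, the sketch below verifies exhaustively that, on {0, 1}, the arithmetic expressions 1 − x, x + y − xy, and xy coincide with logical negation, disjunction, and conjunction. Reading the expressions this way is an assumption of the example (the passage itself only lists them as algebraic expressions), and the function names are hypothetical.

```python
from itertools import product

def arith_not(x):
    """1 - x, read as the negation of x."""
    return 1 - x

def arith_or(x, y):
    """x + y - xy, read as the disjunction of x and y."""
    return x + y - x * y

def arith_and(x, y):
    """xy, read as the conjunction of x and y."""
    return x * y

for x, y in product((0, 1), repeat=2):
    # Compare each arithmetic expression with Python's logical operators.
    assert arith_not(x) == int(not x)
    assert arith_or(x, y) == int(bool(x) or bool(y)) == max(x, y)
    assert arith_and(x, y) == int(bool(x) and bool(y)) == min(x, y)
print("arithmetic and logical operations agree on {0, 1}")
```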
As an historical aside, it is interesting to note that the ability to perform algebraic
computations on logical symbols, in a way that is at least formally similar to what
we are used to doing for numerical quantities, was one of the driving forces behind
George Boole’s seminal work in logic theory. Let us quote from Boole [103],
Chapter V.6 (italics are Boole’s):
[...] any system of propositions may be expressed by equations involving symbols
x, y, z, which, whenever interpretation is possible, are subject to laws identical in form
with the laws of a system of quantitative symbols, susceptible only of the values 0 and
1. But as the formal processes of reasoning depend only upon the laws of the symbols,
and not upon the nature of their interpretation, we are permitted to treat the above
symbols, x, y, z, as if they were quantitative symbols of the kind above described. We
may in fact lay aside the logical interpretation of the symbols in the given equation;
convert them into quantitative symbols, susceptible only of the values 0 and 1; perform
upon them as such all the requisite processes of solution; and finally restore to them
their logical interpretation. And this is the mode of procedure which will actually be
adopted [...]

In this book, we systematically follow Boole’s prescription and adhere to the
convention that B = {0, 1}, where 0 and 1 can be viewed as either abstract symbols
or numerical quantities.
The most elementary way to define a Boolean function f is to provide its truth
table.
Definition 1.2. The truth table of a Boolean function on $B^n$ is a complete list of
all the points in $B^n$ together with the value of the function at each point.

Example 1.1. The truth table of a Boolean function on $B^3$ is shown in Table 1.1.

Table 1.1. Truth table for Example 1.1

$(x_1, x_2, x_3)$    $f(x_1, x_2, x_3)$
(0, 0, 0)            1
(0, 0, 1)            1
(0, 1, 0)            0
(0, 1, 1)            1
(1, 0, 0)            0
(1, 0, 1)            1
(1, 1, 0)            0
(1, 1, 1)            1

Of course, the use of truth tables becomes extremely cumbersome when the
function to be defined depends on more than, say, 5 or 6 arguments. As a matter
of fact, Boolean functions are often defined implicitly rather than explicitly, in the
sense that they are described through a procedure that allows us, for any 0–1 point
in the domain of interest, to compute the value of the function at this point. In some
theoretical developments, or when we analyze the computational complexity of
certain problems, such a procedure can simply be viewed as a black box oracle, of
which we can observe the output (that is, the function value) for any given input,
but not the inner working (that is, the details of the algorithm that computes the
output). In most applications, however, more information is available regarding
the process that generates the function of interest, as illustrated by the examples
below. (We come back to these applications in much greater detail in Section 1.13
and in many subsequent chapters of the book.)

Application 1.1. (Logic.) In many applications (such as those arising in artificial


intelligence), a Boolean function can be viewed as indicating the truth value of a
sentence of propositional (or Boolean) logic. Consider, for instance, the sentence
S: “If it rains in the morning, or if the sky is cloudy, then I carry my umbrella.” Let
us denote by x1 , x2 , and x3 , respectively, the subsentences “it rains in the morning,”
“the sky is cloudy,” and “I carry my umbrella”. Then, S can be identified with the
sentence
(x1 OR x2 ) ⇒ x3 .
It is easy to see that the function displayed in Table 1.1 computes the truth value of
S for all possible values of x1 , x2 , x3 , under the usual correspondence True ↔ 1,
False ↔ 0. 
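As an illustration (not part of the original text), the following Python sketch tabulates the sentence S of Application 1.1 in the order used in Table 1.1. The function names `umbrella_sentence` and `truth_table` are our own, and the implication is encoded under the usual assumption that p ⇒ q means (NOT p) OR q.

```python
from itertools import product

def umbrella_sentence(x1, x2, x3):
    """Truth value of (x1 OR x2) => x3, with 1 for True and 0 for False."""
    premise = max(x1, x2)            # x1 OR x2
    return max(1 - premise, x3)      # p => q is encoded as (NOT p) OR q

def truth_table(f, n):
    """List every point of B^n together with the value of f at that point."""
    return [(point, f(*point)) for point in product((0, 1), repeat=n)]

table = truth_table(umbrella_sentence, 3)
for point, value in table:
    print(point, value)
```

The eight printed rows can be compared line by line with Table 1.1.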

Application 1.2. (Electrical engineering.) In electrical or in computer engineering,
a switching circuit is often abstracted into the following model, called a
combinational circuit. The wiring of the circuit is described by an acyclic directed
graph D = (V , A). The vertices of D are the gates of the circuit. The indegree of
each gate is at most 2. Each gate with indegree 2 is labeled either AND or OR, and
each gate with indegree 1 is labeled NOT. The gates with indegree 0 are called
input gates and are denoted v1 , v2 , . . . , vn . Also, all gates of D have outdegree 1,
except for a single gate f , called output gate, which has outdegree 0.

Every such circuit can be viewed as representing a Boolean function
fD(x1, x2, ..., xn). First, for every (x1, x2, ..., xn) ∈ B^n, the state s(v) of gate v ∈ V
is computed according to the following recursive rules:

1. For each input gate vi, s(vi) = xi (i = 1, 2, ..., n).
2. For each AND-gate v ∈ V, if (u, v), (w, v) ∈ A are the arcs entering v, then
   s(v) = min(s(u), s(w)).
3. For each OR-gate v ∈ V, if (u, v), (w, v) ∈ A are the arcs entering v, then
   s(v) = max(s(u), s(w)).
4. For each NOT-gate v ∈ V, if (u, v) ∈ A is the arc entering v, then
   s(v) = 1 − s(u).

Finally, we let fD(x1, x2, ..., xn) = s(f).

For instance, the circuit represented in Figure 1.1 computes the function given
in Example 1.1. This can easily be verified by computing the state of the output gate
(in this case, the OR-gate) for all possible 0–1 inputs. For example, if (x1 , x2 , x3 ) =
(0, 0, 0), then one successively finds that the state of each NOT-gate is 1 (= 1 − 0);
the state of the AND-gate is 1 (= min(1, 1)); and the state of the output gate is 1
(= max(1, 0)).
More generally, the gates of a combinational circuit may be “primitive”
Boolean functions forming another class from the {AND,OR,NOT} collection used
in our small example. In all cases, the gates may be viewed as atomic units of hard-
ware, providing the building blocks for the construction of larger circuits. 
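The recursive rules 1–4 above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the circuit is stored as a dictionary mapping each non-input gate to its label and its predecessors, and the gate names `n1`, `n2`, `a`, `f` for the circuit of Figure 1.1 are hypothetical labels.

```python
def evaluate_circuit(gates, output, inputs):
    """Compute the state of the output gate of a combinational circuit.

    gates  : dict mapping a gate name to (label, list of predecessor gates)
    output : name of the output gate
    inputs : dict mapping each input gate to its value in {0, 1}
    """
    state = dict(inputs)                      # rule 1: s(v_i) = x_i

    def s(v):
        if v not in state:
            label, preds = gates[v]
            values = [s(u) for u in preds]    # states of the gates entering v
            if label == "AND":
                state[v] = min(values)        # rule 2
            elif label == "OR":
                state[v] = max(values)        # rule 3
            elif label == "NOT":
                state[v] = 1 - values[0]      # rule 4
            else:
                raise ValueError(f"unknown gate label: {label}")
        return state[v]

    return s(output)

# The circuit of Figure 1.1, with hypothetical names for the internal gates.
FIGURE_1_1 = {
    "n1": ("NOT", ["v1"]),
    "n2": ("NOT", ["v2"]),
    "a": ("AND", ["n1", "n2"]),
    "f": ("OR", ["a", "v3"]),
}

def f_D(x1, x2, x3):
    return evaluate_circuit(FIGURE_1_1, "f", {"v1": x1, "v2": x2, "v3": x3})
```

For instance, `f_D(0, 0, 0)` follows the computation described above: both NOT-gates are in state 1, the AND-gate is in state 1, and so is the output gate.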

Historically, propositional logic and electrical engineering have been the main
nurturing fields for the development of research on Boolean functions. However,
because they are such fundamental mathematical objects, Boolean functions have
also been used to model a large number of applications in a variety of areas. To
describe these applications, we introduce a few more notations.
Given a point X ∈ Bn , we denote by supp(X) the support of X, that is, supp(X)
is the set { i ∈ {1, 2, . . . , n} | xi = 1}. (Conversely, X is the characteristic vector of
supp(X).)

[Figure 1.1: circuit diagram. The input gates v1 and v2 each feed a NOT-gate; the two NOT-gates feed an AND-gate; the AND-gate and the input gate v3 feed the output OR-gate.]
Figure 1.1. A small combinational circuit.

Application 1.3. (Game theory.) Many group decision procedures (such as those
used in legislative assemblies or in corporate stockholder meetings) can be viewed,
in abstract terms, as decision rules that associate a single dichotomous “Yes–No”
outcome (for instance, adoption or rejection of a resolution) with a collection
of dichotomous “Yes–No” votes (for instance, assent or disagreement of indi-
vidual lawmakers). Such procedures have been studied in the game-theoretic
literature under the name of simple games or voting games. More formally, let
N = {1, 2, . . . , n} be a finite set, the elements of which are to be called players.
A simple game on N is a function v : {A | A ⊆ N } → B. Clearly, from our van-
tage point, a simple game can be equivalently modeled as a Boolean function fv
on B^n: The variables of fv are in 1-to-1 correspondence with the players of the
game (variable i takes value 1 exactly when player i votes “Yes”), and the value
of the function reflects the outcome of the vote for each point X* ∈ B^n describing
a vector of individual votes:

$$f_v(X^*) = \begin{cases} 1 & \text{if } v(\operatorname{supp}(X^*)) = 1, \\ 0 & \text{otherwise.} \end{cases}$$

Application 1.4. (Reliability theory.) Reliability theory investigates the relation-
ship between the operating state of a complex system S and the operating state
of its individual components, say components 1, 2, . . . , n. It is commonly assumed
that the system and its components can be in either of two states: operative or
failed. Moreover, the state of the system is completely determined by the state of
its components via a deterministic rule embodied in a Boolean function fS on B^n,
called the structure function of the system: For each X* ∈ B^n,

$$f_S(X^*) = \begin{cases} 1 & \text{if the system operates when all components in } \operatorname{supp}(X^*) \text{ operate and all other components fail,} \\ 0 & \text{otherwise.} \end{cases}$$

A central issue is to compute the probability that the system operates (meaning
that fS takes value 1) when each component is subject to probabilistic failure.
Thus, reliability theory deals primarily with the stochastic theory of Boolean
functions. 

Application 1.5. (Combinatorics.) Consider a hypergraph H = (N, E), where
N = {1, 2, ..., n} is the set of vertices of H, and E is a collection of subsets of N,
called edges of the hypergraph. A subset of vertices is said to be stable if it does
not contain any edge of H. With H, we associate the Boolean function fH defined
as follows: For each X* ∈ B^n,

$$f_H(X^*) = \begin{cases} 1 & \text{if } \operatorname{supp}(X^*) \text{ is not stable,} \\ 0 & \text{otherwise.} \end{cases}$$

The function fH is the stability function of H. 
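To make the role of supp(X) concrete, here is a small Python sketch of the stability function of Application 1.5. It is an illustration only: the helper names and the example hypergraph on N = {1, 2, 3, 4} are our own assumptions, not taken from the text.

```python
def supp(X):
    """Support of a 0-1 point X: the set of (1-based) indices i with x_i = 1."""
    return {i for i, x in enumerate(X, start=1) if x == 1}

def stability_function(edges):
    """Return f_H, where f_H(X) = 1 exactly when supp(X) is not stable,
    that is, when supp(X) contains at least one edge of the hypergraph."""
    edge_sets = [frozenset(e) for e in edges]

    def f_H(X):
        S = supp(X)
        return 1 if any(e <= S for e in edge_sets) else 0

    return f_H

# Hypothetical hypergraph with edges {1, 2} and {2, 3, 4}.
f_H = stability_function([{1, 2}, {2, 3, 4}])
print(f_H((1, 1, 0, 0)), f_H((1, 0, 1, 1)))
```

The same pattern covers Application 1.3: replacing the list of edges by the list of winning coalitions of a simple game v gives a function that equals 1 exactly when supp(X) contains a winning coalition, which agrees with fv whenever every superset of a winning coalition is itself winning.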
Of course, the kinship among the models presented in Applications 1.3–1.5 is
striking: It is immediately apparent that we are really dealing here with a single
class of mathematical objects, in spite of the distinct motivations that originally
justified their investigation.
Applications of Boolean functions will be discussed more thoroughly in
Section 1.13, after we have introduced some of the fundamental theoretical
concepts that underlie them.
Before we close this section, let us add that, in this book, our view of Boolean
functions will be mostly combinatorial and algorithmic. For algebraic or logic-
oriented treatments, we refer the reader to the excellent books by Rudeanu
[795, 796] or Brown [156]. In these books, as in many related classical publi-
cations by other authors, Boolean functions are actually defined more broadly
than in Definition 1.1, as (special) mappings of the form f : An → A, where A
is the carrier of an arbitrary Boolean algebra (A, ∪, ∩, ¬, 0, 1). By contrast, we
shall essentially restrict ourselves in this book to the two-element Boolean algebra
(B, ∨, ∧, ¯, 0, 1), where B = {0, 1} (see Section 1.2). Brown [156], in particular,
discusses in great detail the pros and cons of working with two-element, rather
than more general, Boolean algebras. While acknowledging the relevance of his
arguments, we feel that, at the risk of giving up some generality, our restricted
framework is already sufficiently rich to model a variety of interesting applica-
tions and to allow us to handle a host of challenging algorithmic problems of
a combinatorial nature. Also, the terminology introduced in Definition 1.1 has
become sufficiently entrenched to justify its continued use, rather than the alterna-
tive terminology switching functions or truth functions which, though less liable
to create confusion, has progressively become obsolete.

1.2 Boolean expressions


As the above examples illustrate, Boolean functions can be described in many alter-
native ways. In this section, we concentrate on a type of representation derived from
propositional logic, namely, the representation of Boolean functions by Boolean
expressions (see, for instance, [156, 680, 795, 848] for different presentations).
Boolean expressions will be used extensively throughout the book. In fact, the
emphasis on Boolean expressions (rather than truth tables, circuits, oracles, etc.)
can be seen as a main distinguishing feature of our approach and will motivate
many of the issues we will tackle in subsequent chapters.
Our definition of Boolean expressions will be recursive, starting with three
elementary operations as building blocks.
Definition 1.3. The binary operation ∨ (disjunction, Boolean OR), the binary
operation ∧ (conjunction, Boolean AND), and the unary operation ¯ (complementation,
negation, Boolean NOT) are defined on B by the following rules:
0 ∨ 0 = 0, 0 ∨ 1 = 1, 1 ∨ 0 = 1, 1 ∨ 1 = 1;
0 ∧ 0 = 0, 0 ∧ 1 = 0, 1 ∧ 0 = 0, 1 ∧ 1 = 1;
$\bar{0} = 1$, $\bar{1} = 0$.

For a Boolean variable x, we sometimes use the following convenient notation:

$$x^{\alpha} = \begin{cases} x, & \text{if } \alpha = 1, \\ \bar{x}, & \text{if } \alpha = 0. \end{cases}$$

Keeping in line with our focus on functions, we often regard the three elementary
Boolean operations as defining Boolean functions on B^2: disj(x, y) = x ∨ y,
conj(x, y) = x ∧ y, and on B: neg(x) = $\bar{x}$. When the elements of B = {0, 1} are
interpreted as integers rather than abstract symbols, these operations can be defined
by simple arithmetic expressions: For all x, y ∈ B,

$$x \vee y = \max\{x, y\} = x + y - xy,$$
$$x \wedge y = \min\{x, y\} = xy,$$
$$\bar{x} = 1 - x.$$
Observe that the conjunction of two elements of B is equal to their arithmetic
product. By analogy with the usual convention for products, we often omit the
operator ∧ and denote conjunction by mere juxtaposition.
We can extend the definitions of all three elementary operators to B^n by writing:
For all X, Y ∈ B^n,

$$X \vee Y = (x_1 \vee y_1, x_2 \vee y_2, \ldots, x_n \vee y_n),$$
$$X \wedge Y = (x_1 \wedge y_1, x_2 \wedge y_2, \ldots, x_n \wedge y_n) = (x_1 y_1, x_2 y_2, \ldots, x_n y_n),$$
$$\bar{X} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n).$$
Let us enumerate some of the elementary properties of disjunction, conjunc-
tion, and complementation. (We note for completeness that the properties listed
in Theorem 1.1 can be viewed as the defining properties of a general Boolean
algebra.)

Theorem 1.1. For all x, y, z ∈ B, the following identities hold:

(1) x ∨ 1 = 1 and x ∧ 0 = 0;
(2) x ∨ 0 = x and x ∧ 1 = x;
(3) x ∨ y = y ∨ x and x y = y x (commutativity);
(4) (x ∨ y) ∨ z = x ∨ (y ∨ z) and x (y z) = (x y) z (associativity);
(5) x ∨ x = x and x x = x (idempotency);
(6) x ∨ (x y) = x and x (x ∨ y) = x (absorption);
(7) x ∨ (y z) = (x ∨ y) (x ∨ z) and x (y ∨ z) = (x y) ∨ (x z) (distributivity);
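(8) $x \vee \bar{x} = 1$ and $x\,\bar{x} = 0$;
(9) $\bar{\bar{x}} = x$ (involution);
(10) $\overline{(x \vee y)} = \bar{x}\,\bar{y}$ and $\overline{(x\,y)} = \bar{x} \vee \bar{y}$ (De Morgan’s laws);
(11) $x \vee (\bar{x}\,y) = x \vee y$ and $x\,(\bar{x} \vee y) = x\,y$ (Boolean absorption).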

Proof. These identities are easily verified, for example, by exhausting all possible
values for x, y, z. 
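The exhaustive verification mentioned in the proof is easy to mechanize. The Python sketch below is an illustration of ours (the names `OR`, `AND`, `NOT`, and `IDENTITIES` are assumptions): it checks a few of the identities of Theorem 1.1 on all eight points of B^3, and also confirms that the arithmetic expressions for the three operations agree with max, min, and 1 − x on {0, 1}.

```python
from itertools import product

def OR(x, y):
    return max(x, y)

def AND(x, y):
    return min(x, y)

def NOT(x):
    return 1 - x

# A few identities of Theorem 1.1, each written as a predicate on (x, y, z).
IDENTITIES = {
    "absorption (6)": lambda x, y, z: (
        OR(x, AND(x, y)) == x and AND(x, OR(x, y)) == x
    ),
    "distributivity (7)": lambda x, y, z: (
        OR(x, AND(y, z)) == AND(OR(x, y), OR(x, z))
        and AND(x, OR(y, z)) == OR(AND(x, y), AND(x, z))
    ),
    "De Morgan (10)": lambda x, y, z: (
        NOT(OR(x, y)) == AND(NOT(x), NOT(y))
        and NOT(AND(x, y)) == OR(NOT(x), NOT(y))
    ),
    "Boolean absorption (11)": lambda x, y, z: (
        OR(x, AND(NOT(x), y)) == OR(x, y)
        and AND(x, OR(NOT(x), y)) == AND(x, y)
    ),
}

def holds_on_B3(identity):
    """True when the identity holds at every point (x, y, z) of B^3."""
    return all(identity(x, y, z) for x, y, z in product((0, 1), repeat=3))

results = {name: holds_on_B3(identity) for name, identity in IDENTITIES.items()}

# The arithmetic forms x + y - xy and xy agree with max and min on {0, 1}.
arithmetic_ok = all(
    OR(x, y) == x + y - x * y and AND(x, y) == x * y
    for x, y in product((0, 1), repeat=2)
)
print(results, arithmetic_ok)
```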

Building upon Definition 1.3, we are now in a position to introduce the important
notion of Boolean expression.
Definition 1.4. Given a finite collection of Boolean variables x1 , x2 , . . . , xn , a
Boolean expression (or Boolean formula) in the variables x1 , x2 , . . . , xn is defined
as follows:
(1) The constants 0, 1, and the variables x1 , x2 , . . . , xn are Boolean expressions
in x1 , x2 , . . . , xn .
(2) If φ and ψ are Boolean expressions in x1, x2, ..., xn, then (φ ∨ ψ), (φ ψ),
and $\bar{\phi}$ are Boolean expressions in x1, x2, ..., xn.
(3) Every Boolean expression is formed by finitely many applications of the
rules (1)–(2).
We also say that a Boolean expression in the variables x1 , x2 , . . . , xn is a Boolean
expression on Bn .
We use notations like φ(x1 , x2 , . . . , xn ) or ψ(x1 , x2 , . . . , xn ) to denote Boolean
expressions in the variables x1 , x2 , . . . , xn .
Example 1.2. Here are some examples of Boolean expressions:
$\phi_1(x) = x$,
$\phi_2(x) = \bar{x}$,
$\psi_1(x, y, z) = (((\bar{x} \vee y)(y \vee \bar{z})) \vee ((xy)z))$,
$\psi_2(x_1, x_2, x_3, x_4) = ((x_1 x_2) \vee (\bar{x}_3 \bar{x}_4))$.

Now, since disjunction, conjunction, and complementation can be interpreted as
Boolean functions, every Boolean expression φ(x1, x2, ..., xn) can also be viewed
as generating a Boolean function defined by composition.
Definition 1.5. The Boolean function fφ represented (or expressed) by a Boolean
expression φ(x1 , x2 , . . . , xn ) is the unique Boolean function on Bn defined as fol-
lows: For every point (x1∗ , x2∗ , . . . , xn∗ ) ∈ Bn , the value of fφ (x1∗ , x2∗ , . . . , xn∗ ) is
obtained by substituting xi∗ for xi (i = 1, 2, . . . , n) in the expression φ and by
recursively applying Definition 1.3 to compute the value of the resulting expression.
When f = fφ on B n , we also say that f admits the representation or the
expression φ, and we simply write f = φ.
Example 1.3. Consider again the expressions defined in Example 1.2. We can
compute, for instance:
$f_{\phi_1}(0) = 0$, $f_{\phi_1}(1) = 1$,
$f_{\phi_2}(0) = \bar{0} = 1$, $f_{\phi_2}(1) = \bar{1} = 0$,
$f_{\psi_1}(0, 0, 0) = (((\bar{0} \vee 0)(0 \vee \bar{0})) \vee ((00)0)) = 1$, ...
In fact, the expression ψ1 in Example 1.2 represents the function f, where
f(0, 0, 1) = f(1, 0, 0) = f(1, 0, 1) = 0,
f(0, 0, 0) = f(0, 1, 0) = f(0, 1, 1) = f(1, 1, 0) = f(1, 1, 1) = 1.
Thus, we can write
$f(x, y, z) = \psi_1(x, y, z) = (((\bar{x} \vee y)(y \vee \bar{z})) \vee ((xy)z))$.
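The recursive evaluation prescribed by Definition 1.5 can be sketched as follows. The encoding of expressions as nested tuples, and the tags `"const"`, `"var"`, `"not"`, `"or"`, `"and"`, are assumptions made for this illustration only. The expression encoded below is ψ1 = (((x̄ ∨ y)(y ∨ z̄)) ∨ ((xy)z)), where x̄ and z̄ denote complemented variables.

```python
def evaluate(expr, point):
    """Value of a Boolean expression at a point.

    expr  : nested tuple, e.g. ("or", ("var", "x"), ("not", ("var", "y")))
    point : dict mapping variable names to values in {0, 1}
    """
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return point[expr[1]]
    if kind == "not":
        return 1 - evaluate(expr[1], point)
    if kind == "or":
        return max(evaluate(expr[1], point), evaluate(expr[2], point))
    if kind == "and":
        return min(evaluate(expr[1], point), evaluate(expr[2], point))
    raise ValueError(f"unknown expression tag: {kind}")

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")

# psi1 = (((NOT x OR y) AND (y OR NOT z)) OR ((x AND y) AND z))
psi1 = ("or",
        ("and", ("or", ("not", x), y), ("or", y, ("not", z))),
        ("and", ("and", x, y), z))

print(evaluate(psi1, {"x": 0, "y": 0, "z": 0}))
```

Evaluating `psi1` at the remaining seven points reproduces the values of f listed in Example 1.3.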
Remark. So that we can get rid of parentheses when writing Boolean expressions,
we assume from now on a priority ranking of the elementary operations: Namely,
we assume that disjunction has lower priority than conjunction, which has lower
priority than complementation. When we compute the value of a parentheses-free
expression, we always start with the operations of highest priority: First, all com-
plementations; next, all conjunctions; and finally, all disjunctions. (This is similar
to the convention that assigns a lower priority to addition than to multiplication, and
to multiplication than to exponentiation when evaluating an arithmetic expression
like $3x^2 + 5xy$.) Moreover, we also discard any parentheses that become redundant
as a consequence of the associativity property of disjunction and conjunction
(Theorem 1.1).
Example 1.4. The expression ψ1 in Example 1.2 (and hence, the function f in
Example 1.3) can be rewritten with fewer parentheses as
$f(x, y, z) = \psi_1(x, y, z) = (\bar{x} \vee y)(y \vee \bar{z}) \vee xyz$.
Similarly, the expression ψ2 in Example 1.2 can be rewritten
as $\psi_2(x_1, x_2, x_3, x_4) = x_1 x_2 \vee \bar{x}_3 \bar{x}_4$.

The relation between Boolean expressions and Boolean functions, as spelled
out in Definition 1.5, deserves to be carefully pondered.
On one hand, it is important to understand that every Boolean function can
be represented by numerous Boolean expressions (see, for instance, Theorem 1.4
in the next section). In fact, it is easy to see that there are “only” $2^{2^n}$ Boolean
functions of n variables, while there are infinitely many Boolean expressions in n
variables. These remarks motivate the distinction we draw between functions and
expressions.
On the other hand, since every Boolean expression φ represents a unique
Boolean function fφ , we are justified in interpreting φ itself as a function, and we
frequently do so. The notation f = φ introduced in Definition 1.5, in particular,
may initially seem confusing, since it equates a function with a formal expression,
but this notational convention is actually innocuous: It is akin to the convention
for real-valued functions of real variables, where it is usual to assimilate a function
with its analytical expression and to write, for instance, equalities like
$$f(x, y) = x^2 + 2xy + y^2 = (x + y)^2. \qquad (1.1)$$
As a matter of fact, since Definition 1.5 implies that we write both f = ψ and f = φ
when ψ and φ represent the same function f (compare with Equation (1.1)), it
also naturally leads to the next notion.

Definition 1.6. We say that two Boolean expressions ψ and φ are equivalent
if they represent the same Boolean function. When this is the case, we write
ψ = φ.

Note that any two expressions that can be deduced from each other by repeated
use of the properties listed in Theorem 1.1 are equivalent even though they are not
identical.

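Example 1.5. The function f(x, y, z) represented by
$\psi_1(x, y, z) = (\bar{x} \vee y)(y \vee \bar{z}) \vee xyz$ (see previous examples) is also
represented by the expression $\phi = \bar{x}\,\bar{z} \vee y$. Indeed,

$$\begin{aligned}
(\bar{x} \vee y)(y \vee \bar{z}) \vee xyz &= (\bar{x}y \vee \bar{x}\,\bar{z} \vee yy \vee y\bar{z}) \vee xyz && \text{(distributivity)} \\
&= \bar{x}y \vee \bar{x}\,\bar{z} \vee y \vee y\bar{z} \vee xyz && \text{(idempotency and associativity)} \\
&= \bar{x}\,\bar{z} \vee y && \text{(absorption).}
\end{aligned}$$

Thus, ψ1(x, y, z) and φ(x, y, z) are equivalent, that is, ψ1(x, y, z) = φ(x, y, z).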
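Equivalence in the sense of Definition 1.6 can also be checked by brute force, by comparing the two represented functions at every point of B^n. The sketch below does this for ψ1 = (x̄ ∨ y)(y ∨ z̄) ∨ xyz and φ = x̄ z̄ ∨ y; the function names are our own, and the approach takes time proportional to 2^n, so it is only practical for small n.

```python
from itertools import product

def psi1(x, y, z):
    # (NOT x OR y)(y OR NOT z) OR xyz, using max for OR and products for AND
    return max(max(1 - x, y) * max(y, 1 - z), x * y * z)

def phi(x, y, z):
    # (NOT x)(NOT z) OR y
    return max((1 - x) * (1 - z), y)

def equivalent(f, g, n):
    """True when f and g take the same value at every point of B^n."""
    return all(f(*point) == g(*point) for point in product((0, 1), repeat=n))

print(equivalent(psi1, phi, 3))
```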

A recurrent theme in Boolean theory concerns the transformation of Boolean
expressions into equivalent expressions that display specific desirable properties.
ties. For instance, in the previous example, the expression φ is intuitively much
“simpler” or “shorter” than ψ1 , even though these two expressions represent the
same function. More generally, for algorithmic purposes, it is necessary to have a
definition of the length of a Boolean expression.

Definition 1.7. The length (or size) of a Boolean expression φ is the number of
symbols used in an encoding of φ as a binary string. The length of φ is denoted by
|φ|.

We refer to standard books on computational complexity for additional comments
regarding this concept (see, for instance, [371, 725]). For most practical
purposes, we can conveniently think of |φ| as the total number of symbols
(constants, variables, operators, parentheses) occurring in the expression φ.
To conclude this important section on Boolean expressions, let us note that
complex Boolean functions can be introduced by substituting functional symbols
for the variables of a Boolean expression. That is, if ψ(y1, y2, ..., ym) is a Boolean
expression on B^m, and f1, f2, ..., fm are m Boolean functions on B^n, then the
Boolean function ψ(f1, f2, ..., fm) can be defined in the natural way: Namely, for
all X* ∈ B^n,

$$\big(\psi(f_1, f_2, \ldots, f_m)\big)(X^*) = \psi\big(f_1(X^*), f_2(X^*), \ldots, f_m(X^*)\big), \qquad (1.2)$$

where we identify the expression ψ with the function fψ that it represents (thus,
(1.2) simply boils down to function composition). In particular, if f and g are two
Boolean functions on B^n, then the functions f ∨ g, f ∧ g, and $\bar{f}$ are defined, for
all X* ∈ B^n, by

$$(f \vee g)(X^*) = f(X^*) \vee g(X^*),$$
$$(f \wedge g)(X^*) = f(X^*) \wedge g(X^*),$$
$$\bar{f}(X^*) = \overline{f(X^*)}.$$

1.3 Duality
With every Boolean function f, the following definition associates another
Boolean function f^d called the dual of f:
Definition 1.8. The dual of a Boolean function f is the function f^d defined by

$$f^d(X) = \bar{f}(\bar{X})$$

for all X = (x1, x2, ..., xn) ∈ B^n, where $\bar{X} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$.

Example 1.6. Let f be the 2-variable function defined by f(0, 0) = f(0, 1) =
f(1, 1) = 1 and f(1, 0) = 0. Then the dual of f satisfies f^d(0, 0) = f^d(1, 0) =
f^d(1, 1) = 0 and f^d(0, 1) = 1.
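Definition 1.8 translates directly into code: complement every argument, evaluate f, and complement the result. The Python sketch below (with names of our own choosing) applies this to the function of Example 1.6.

```python
from itertools import product

def dual(f):
    """Return the dual f^d of f: f^d(X) = 1 - f(complement of X)."""
    def f_d(*X):
        return 1 - f(*(1 - x for x in X))
    return f_d

def f(x1, x2):
    # The function of Example 1.6: f(0,0) = f(0,1) = f(1,1) = 1 and f(1,0) = 0.
    return 0 if (x1, x2) == (1, 0) else 1

f_d = dual(f)
for point in product((0, 1), repeat=2):
    print(point, f(*point), f_d(*point))
```

Applying `dual` twice returns a function that agrees with f at every point, in line with the fact that the dual of the dual is the function itself.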

Dual functions arise naturally in many Boolean models. We only describe here
one simple occurrence of this concept; more applications are discussed in Chapter 4.
Application 1.6. (Voting theory.) Suppose that a voting procedure is modeled by
a Boolean function f on B^n, as explained in Application 1.3. Thus, when the players’
votes are described by the Boolean point X* ∈ B^n, the outcome of the voting
procedure is f(X*). What happens if all the players simultaneously change their
minds and vote $\bar{X}^*$ rather than X*? In many cases, we would expect the outcome of
the procedure to be reversed as well, that is, we would expect $f(\bar{X}^*) = \bar{f}(X^*)$, or
equivalently, $f(X^*) = \bar{f}(\bar{X}^*) = f^d(X^*)$. When the property f(X) = f^d(X) holds
for all X ∈ B^n, we say that the function f (and the voting procedure it describes) is
self-dual. Note, however, that some common voting procedures are not self-dual,
as exemplified by the two-thirds majority rule.
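A brute-force check of self-duality is straightforward for small n. In the sketch below, the two voting rules and the choice of five players are assumptions made for illustration: `simple_majority` accepts when strictly more than half of the players vote “Yes”, and `two_thirds_majority` accepts when at least two thirds of them do.

```python
from itertools import product

def dual(f):
    """Dual of f: complement the arguments, then complement the value."""
    def f_d(*X):
        return 1 - f(*(1 - x for x in X))
    return f_d

def is_self_dual(f, n):
    """True when f(X) = f^d(X) at every point X of B^n."""
    f_d = dual(f)
    return all(f(*X) == f_d(*X) for X in product((0, 1), repeat=n))

def simple_majority(*X):
    # Accept when strictly more than half of the votes are "Yes".
    return 1 if 2 * sum(X) > len(X) else 0

def two_thirds_majority(*X):
    # Accept when at least two thirds of the votes are "Yes".
    return 1 if 3 * sum(X) >= 2 * len(X) else 0

print(is_self_dual(simple_majority, 5))      # five players, no ties possible
print(is_self_dual(two_thirds_majority, 5))
```

With five players, reversing every vote always reverses the outcome under simple majority, whereas under the two-thirds rule a 3-to-2 vote is rejected in both directions.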

We first list some useful properties of dualization.

Theorem 1.2. If f and g are Boolean functions, then
(a) $(f^d)^d = f$ (involution: the dual of the dual is the function itself);
(b) $(\bar{f})^d = \overline{(f^d)}$;
(c) $(f \vee g)^d = f^d g^d$;
(d) $(f g)^d = f^d \vee g^d$.

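Proof. Definition 1.8 immediately implies (a) and (b). For property (c), observe
that

$$\begin{aligned}
(f \vee g)^d(X) &= \overline{(f \vee g)(\bar{X})} \\
&= \overline{f(\bar{X}) \vee g(\bar{X})} \\
&= \bar{f}(\bar{X})\,\bar{g}(\bar{X}) && \text{(by De Morgan’s laws)} \\
&= f^d(X)\,g^d(X).
\end{aligned}$$

Property (d) follows from (a) and (c).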

Observe that, in view of property (a), dualization defines a bijective correspondence
on the space of Boolean functions.
It is natural to ask how the Boolean expressions of a function relate to the
expressions of its dual. To settle this question, we introduce one more definition.
Definition 1.9. The dual of a Boolean expression φ is the expression φ^d obtained
by exchanging the operators ∨ and ∧, as well as the constants 0 and 1, in φ.

Example 1.7. If $\phi(x, y, z) = (\bar{x} \vee y)(y \vee \bar{z}) \vee xyz$, then
$\phi^d(x, y, z) = (\bar{x}y \vee y\bar{z})(x \vee y \vee z)$.

For our notations and terminology to be consistent, φ^d should represent the
dual of the function represented by φ. This is indeed the case.
Theorem 1.3. If the expression φ represents the Boolean function f, then the
expression φ^d represents f^d.
Proof. Let t denote the total number of conjunction, disjunction, and negation
operators in φ. We prove the theorem by induction on t. If t = 0, then φ is either
a constant or a literal, and the statement is easily seen to hold.
Assume now that t > 0. Then, by Definition 1.4, φ takes either the form ψ ∨ θ,
or the form ψθ, or the form $\bar{\psi}$. Assume, for instance, that φ = ψ ∨ θ (the other
cases are similar). Then, by Definition 1.9, φ^d = ψ^d θ^d. Let g be the function
represented by ψ and let h be the function represented by θ. By induction, ψ^d and
θ^d represent g^d and h^d, respectively. So, φ^d represents g^d h^d, which is equal to f^d
by Theorem 1.2.
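Theorem 1.3 can be checked experimentally on small expressions. The sketch below is an illustration under our own assumptions (expressions encoded as nested tuples with the tags `"const"`, `"var"`, `"not"`, `"or"`, `"and"`): it builds the dual expression of Definition 1.9 by swapping ∨ with ∧ and 0 with 1, and compares it pointwise with the dual function of Definition 1.8. The test expression is φ = (x̄ ∨ y)(y ∨ z̄) ∨ xyz from Example 1.7.

```python
from itertools import product

def evaluate(expr, point):
    """Value of a tuple-encoded Boolean expression at a point (dict name -> 0/1)."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return point[expr[1]]
    if kind == "not":
        return 1 - evaluate(expr[1], point)
    if kind == "or":
        return max(evaluate(expr[1], point), evaluate(expr[2], point))
    if kind == "and":
        return min(evaluate(expr[1], point), evaluate(expr[2], point))
    raise ValueError(f"unknown expression tag: {kind}")

def dual_expression(expr):
    """Exchange OR with AND and 0 with 1; variables and negations are kept."""
    kind = expr[0]
    if kind == "const":
        return ("const", 1 - expr[1])
    if kind == "var":
        return expr
    if kind == "not":
        return ("not", dual_expression(expr[1]))
    if kind in ("or", "and"):
        swapped = "and" if kind == "or" else "or"
        return (swapped, dual_expression(expr[1]), dual_expression(expr[2]))
    raise ValueError(f"unknown expression tag: {kind}")

def represents_dual(expr, names):
    """Check that the dual expression represents the dual function on B^n."""
    d = dual_expression(expr)
    for values in product((0, 1), repeat=len(names)):
        point = dict(zip(names, values))
        complemented = {name: 1 - value for name, value in point.items()}
        if evaluate(d, point) != 1 - evaluate(expr, complemented):
            return False
    return True

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
phi = ("or",
       ("and", ("or", ("not", x), y), ("or", y, ("not", z))),
       ("and", ("and", x, y), z))

print(represents_dual(phi, ["x", "y", "z"]))
```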

Duality is an important concept in Boolean theory and we shall return to this
topic many times in subsequent chapters of this book. Chapter 4, in particular, is
fully devoted to duality.

1.4 Normal forms


In this section, we discuss some classes of Boolean expressions of special interest.
Let us adopt the following notations: If {φk | k ∈ Ω} is a family of Boolean
expressions indexed over the set Ω = {k1, k2, ..., km}, then we denote by
$\bigvee_{k \in \Omega} \phi_k$ the expression (φk1 ∨ φk2 ∨ ... ∨ φkm), and we denote by
$\bigwedge_{k \in \Omega} \phi_k$ the expression (φk1 ∧ φk2 ∧ ... ∧ φkm). By convention, when Ω is
empty, $\bigvee_{k \in \Omega} \phi_k$ is equivalent to the constant 0 and $\bigwedge_{k \in \Omega} \phi_k$ is
equivalent to the constant 1.
Definition 1.10. A literal is an expression of the form x or $\bar{x}$, where x is a Boolean
variable. An elementary conjunction (sometimes called term, or monomial, or
cube) is an expression of the form

$$C = \bigwedge_{i \in A} x_i \bigwedge_{j \in B} \bar{x}_j, \quad \text{where } A \cap B = \emptyset,$$

and an elementary disjunction (sometimes called clause) is an expression of the
form

$$D = \bigvee_{i \in A} x_i \vee \bigvee_{j \in B} \bar{x}_j, \quad \text{where } A \cap B = \emptyset,$$

where A, B are disjoint subsets of indices.


A disjunctive normal form (DNF) is an expression of the form

$$\bigvee_{k=1}^{m} C_k = \bigvee_{k=1}^{m} \Big( \bigwedge_{i \in A_k} x_i \bigwedge_{j \in B_k} \bar{x}_j \Big),$$

where each Ck (k = 1, 2, ..., m) is an elementary conjunction; we say that each
conjunction Ck is a term of the DNF.
A conjunctive normal form (CNF) is an expression of the form

$$\bigwedge_{k=1}^{m} D_k = \bigwedge_{k=1}^{m} \Big( \bigvee_{i \in A_k} x_i \vee \bigvee_{j \in B_k} \bar{x}_j \Big),$$

where each Dk (k = 1, 2, ..., m) is an elementary disjunction; we say that each
disjunction Dk is a clause of the CNF.
In particular, 0 is an elementary (empty) disjunction, 1 is an elementary (empty)
conjunction, and any elementary disjunction or conjunction is both a DNF and a
CNF. Additional illustrations of normal forms are provided in the next example.
Example 1.8. The expression $\phi(x, y, z) = \bar{x}\,\bar{z} \vee y$ is a disjunctive normal form;
its terms are the elementary conjunctions $\bar{x}\,\bar{z}$ and y. It is easy to check that φ is
equivalent to the CNF $(\bar{x} \vee y)(y \vee \bar{z})$ with clauses $(\bar{x} \vee y)$ and $(y \vee \bar{z})$.
The expression $\psi_2(x_1, x_2, x_3, x_4) = x_1 x_2 \vee \bar{x}_3 \bar{x}_4$ is a DNF; it is equivalent to
the CNF $(x_1 \vee \bar{x}_3)(x_1 \vee \bar{x}_4)(x_2 \vee \bar{x}_3)(x_2 \vee \bar{x}_4)$.
Bringing together the observations in Examples 1.5 and 1.8, we see that we
have obtained three different expressions for the same Boolean function f :

$$\begin{aligned}
f(x, y, z) &= (\bar{x} \vee y)(y \vee \bar{z}) \vee xyz && (1.3) \\
&= \bar{x}\,\bar{z} \vee y && (1.4) \\
&= (\bar{x} \vee y)(y \vee \bar{z}). && (1.5)
\end{aligned}$$

In particular, we have been able to derive both a DNF representation (1.4) and a
CNF representation (1.5) of the original expression (1.3) (which is not a normal
form). This is not an accident. Indeed, we can now establish a fundamental property
of Boolean functions.
Theorem 1.4. Every Boolean function can be represented by a disjunctive normal
form and by a conjunctive normal form.
Proof. Let f be a Boolean function on B^n, let T be the set of true points of f, and
consider the DNF

$$\phi_f(x_1, x_2, \ldots, x_n) = \bigvee_{Y \in T} \Big( \bigwedge_{i \mid y_i = 1} x_i \bigwedge_{j \mid y_j = 0} \bar{x}_j \Big). \qquad (1.6)$$

If we interpret φf as a function on B^n, then a point X* ∈ B^n is a true point of φf
if and only if there exists Y = (y1, y2, ..., yn) ∈ T such that

$$\bigwedge_{i \mid y_i = 1} x_i^* \bigwedge_{j \mid y_j = 0} \bar{x}_j^* = 1. \qquad (1.7)$$

But condition (1.7) simply means that xi* = 1 whenever yi = 1, and xi* = 0 whenever
yi = 0, that is, X* = Y. Hence, X* is a true point of φf if and only if X* ∈ T,
and we conclude that φf represents f.
A similar reasoning establishes that f is also represented by the CNF

$$\psi_f(x_1, x_2, \ldots, x_n) = \bigwedge_{Y \in F} \Big( \bigvee_{j \mid y_j = 0} x_j \vee \bigvee_{i \mid y_i = 1} \bar{x}_i \Big), \qquad (1.8)$$

where F is the set of false points of f.

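Note that, alternatively, the second part of Theorem 1.4 can also be derived
from its first part by an easy duality argument. Indeed, in view of Theorem 1.3,
the function f is represented by the CNF

$$\bigwedge_{(A,B) \in \Omega} \Big( \bigvee_{i \in A} x_i \vee \bigvee_{j \in B} \bar{x}_j \Big) \qquad (1.9)$$

exactly when its dual f^d is represented by the DNF

$$\bigvee_{(A,B) \in \Omega} \Big( \bigwedge_{i \in A} x_i \bigwedge_{j \in B} \bar{x}_j \Big). \qquad (1.10)$$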

Let us now illustrate Theorem 1.4 by an example.

Example 1.9. The set of true points of the function f represented by the expression
(1.3) is T = {(0, 0, 0), (0, 1, 0), (0, 1, 1), (1, 1, 0), (1, 1, 1)}, and its set of false points
is F = {(1, 0, 0), (0, 0, 1), (1, 0, 1)} (see Example 1.3). Thus, it follows from the
proof of Theorem 1.4 that f is also represented by the DNF

$$\phi_f = \bar{x}\,\bar{y}\,\bar{z} \vee \bar{x}\,y\,\bar{z} \vee \bar{x}\,y\,z \vee x\,y\,\bar{z} \vee x\,y\,z$$

and by the CNF

$$\psi_f = (\bar{x} \vee y \vee z)(x \vee y \vee \bar{z})(\bar{x} \vee y \vee \bar{z}).$$
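The construction used in the proof of Theorem 1.4 can be sketched in Python as follows. The representation is our own assumption: a literal is a string, a leading `~` marks a complemented variable, a term or clause is a list of literals, and the function f is given as x̄ z̄ ∨ y, that is, expression (1.4).

```python
from itertools import product

def minterm_expression(f, names):
    """Canonical DNF: one term per true point Y, with x_i if y_i = 1 and ~x_i if y_i = 0."""
    terms = []
    for Y in product((0, 1), repeat=len(names)):
        if f(*Y) == 1:
            terms.append([n if y == 1 else "~" + n for n, y in zip(names, Y)])
    return terms

def maxterm_expression(f, names):
    """Canonical CNF: one clause per false point Y, with x_j if y_j = 0 and ~x_i if y_i = 1."""
    clauses = []
    for Y in product((0, 1), repeat=len(names)):
        if f(*Y) == 0:
            clauses.append([n if y == 0 else "~" + n for n, y in zip(names, Y)])
    return clauses

def f(x, y, z):
    # (NOT x)(NOT z) OR y
    return max((1 - x) * (1 - z), y)

dnf = minterm_expression(f, "xyz")
cnf = maxterm_expression(f, "xyz")
print(" v ".join(" ".join(term) for term in dnf))
print("".join("(" + " v ".join(clause) + ")" for clause in cnf))
```

Up to the order of the clauses, the output matches the expressions φf and ψf of Example 1.9. Both loops visit all 2^n points, which is why this construction is not efficient for large n.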

The expressions (1.6) and (1.8) have a very special structure that is captured by
the following definitions:
Definition 1.11. A minterm (respectively, maxterm) on B^n is an elementary
conjunction (respectively, disjunction) involving exactly n literals.
Let f be a Boolean function on B^n, let T(f) be the set of true points of f, and
let F(f) be its set of false points. The DNF

$$\phi_f(x_1, x_2, \ldots, x_n) = \bigvee_{Y \in T(f)} \Big( \bigwedge_{i \mid y_i = 1} x_i \bigwedge_{j \mid y_j = 0} \bar{x}_j \Big) \qquad (1.11)$$

is the minterm expression (or canonical DNF) of f, and the terms of φf are the
minterms of f. The CNF

$$\psi_f(x_1, x_2, \ldots, x_n) = \bigwedge_{Y \in F(f)} \Big( \bigvee_{j \mid y_j = 0} x_j \vee \bigvee_{i \mid y_i = 1} \bar{x}_i \Big) \qquad (1.12)$$

is the maxterm expression (or canonical CNF) of f, and the terms of ψf are the
maxterms of f.
Observe that Definition 1.11 actually involves a slight abuse of language, since
the minterm (or the maxterm) expression of a function is unique only up to the
order of its terms and literals. In the sequel, we shall not dwell on this subtle, but
usually irrelevant, point and shall continue to speak of “the” minterm (or maxterm)
expression of a function.
With this terminology, the proof of Theorem 1.4 establishes that every Boolean
function is represented by its minterm expression. This observation can be traced all
the way back to Boole [103]. In view of its uniqueness, the minterm expression provides
a “canonical” representation of a function. In general, however, the number of
minterms (or, equivalently, of true points) of a function can be very large, so that
handling the minterm expression often turns out to be rather impractical.
Normal form expressions play a central role in the theory of Boolean functions.
Their preeminence is partially justified by Theorem 1.4, but this justification is
not sufficient in itself. Indeed, the property described in Theorem 1.4 similarly
holds for many other special classes of Boolean expressions. For instance, it can
be observed that, besides its DNF and CNF expressions, every Boolean function
also admits expressions involving only disjunctions and complementations, but no
conjunctions (as well as expressions involving only conjunctions and complementations,
but no disjunctions). Indeed, as an immediate consequence of De Morgan’s
laws, every conjunction xy can be replaced by the equivalent expression $\overline{(\bar{x} \vee \bar{y})}$,
and similarly, every disjunction x ∨ y can be replaced by the expression $\overline{(\bar{x}\,\bar{y})}$.
More to the point, and as we will repeatedly observe, normal forms arise
quite naturally when one attempts to model various problems within a Boolean
framework. For this reason, normal forms are ubiquitous in this book: Many of
the problems to be investigated will be based on the assumption that the Boolean
functions at hand are expressed in normal form or, conversely, will have as a goal
constructing a normal form of a function described in some alternative way (truth
table, arbitrary Boolean expression, etc.).
However, it should be noticed that DNF and CNF expressions provide closely
related frameworks for representing or manipulating Boolean functions (remember
the duality argument invoked at the end of the proof of Theorem 1.4). This dual
relationship between DNFs and CNFs will constitute, in itself, an object of study
in Chapter 4.
Most of the time, we display a slight preference for DNF representations of
Boolean functions over their CNF counterparts, to the extent that we discuss many
problems in terms of DNF rather than CNF representations. This choice is in
agreement with much of the classical literature on propositional logic, electrical
engineering, and reliability theory, but is opposite to the standard convention in the
artificial intelligence and computational complexity communities. Our preference
for DNFs is partially motivated by their analogy with real polynomials: Indeed,
since the identities x ∧ y = xy and $\bar{x} = 1 - x$ hold when x, y are interpreted as
numbers in {0, 1}, every DNF of the form

$$\bigvee_{k=1}^{m} \Big( \bigwedge_{i \in A_k} x_i \bigwedge_{j \in B_k} \bar{x}_j \Big)$$

can also be rewritten as

$$\bigvee_{k=1}^{m} \Big( \prod_{i \in A_k} x_i \prod_{j \in B_k} (1 - x_j) \Big),$$

a form that is reminiscent of a multilinear real polynomial like

$$\sum_{k=1}^{m} c_k \Big( \prod_{i \in T_k} x_i \Big).$$

Ultimately, however, because of the above-mentioned “duality” between DNFs
and CNFs, our preference can also simply be viewed as a matter of taste and habit,
and so we will make no further attempts to justify it.
Before we close this section, we introduce some additional terminology
(inspired by the analogy of DNFs with polynomials over the reals) and notation
that will be useful in our dealings with DNFs.

Definition 1.12. The degree of an elementary conjunction $C = \bigwedge_{i \in A} x_i \bigwedge_{j \in B} \bar{x}_j$
is the number of literals involved in C, namely, |A| + |B|. If $\phi = \bigvee_{k=1}^{m} C_k$ is a DNF,
then the degree of φ is the maximum degree of the terms Ck over all k ∈ {1, 2, ..., m}.
A DNF is called linear (respectively, quadratic, cubic, ...) if its degree is at most 1
(respectively, at most 2, 3, . . .).

Note that the (encoding) length |φ| of a DNF φ, as introduced in Definition 1.7,
comes within a constant factor of the number of literals appearing in φ. Therefore,
we generally feel free to identify these two measures of the size of φ (especially
when discussing asymptotic complexity results). We denote by ||φ|| the number
of terms of a DNF φ.
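These size measures are simple to compute once a DNF is stored as a list of terms. In the sketch below, the list-of-lists representation, the `~` prefix for complemented variables, and the convention that a DNF with no terms has degree 0 are all assumptions made for illustration.

```python
# Literals are strings; a leading "~" marks a complemented variable.
phi = [["~x", "~z"], ["y"]]                  # the DNF (NOT x)(NOT z) OR y
psi2 = [["x1", "x2"], ["~x3", "~x4"]]        # the DNF x1 x2 OR (NOT x3)(NOT x4)

def degree(dnf):
    """Maximum number of literals in a term (taken to be 0 for a DNF with no terms)."""
    return max((len(term) for term in dnf), default=0)

def number_of_terms(dnf):
    """The quantity denoted ||phi|| in the text."""
    return len(dnf)

def number_of_literals(dnf):
    """Total number of literal occurrences, a proxy for the encoding length |phi|."""
    return sum(len(term) for term in dnf)

for name, dnf in (("phi", phi), ("psi2", psi2)):
    print(name, degree(dnf), number_of_terms(dnf), number_of_literals(dnf))
```

Both example DNFs have degree 2, so they are quadratic in the sense of Definition 1.12.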

1.5 Transforming an arbitrary expression into a DNF


How difficult is it to transform an arbitrary expression φ into an equivalent DNF?
Clearly, the construction given in the proof of Theorem 1.4 is not algorithmically
efficient, as it requires the enumeration of all the true points of φ. On the other hand,
a very simple procedure may come to mind immediately: Given the expression
φ, the properties listed in Theorem 1.1 (especially, De Morgan’s laws and the
distributivity laws) can be repeatedly applied until a DNF is obtained.
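Example 1.10. The expression $\phi(x_1, x_2, x_3, x_4) = (x_1 \vee x_4)\,\overline{(x_1 \vee (x_2 \bar{x}_3))}$ can be
successively transformed into:

$$\begin{aligned}
\phi &= (x_1 \vee x_4)\,(\bar{x}_1 (\bar{x}_2 \vee x_3)) && \text{(De Morgan and involution)} \\
     &= x_1 \bar{x}_1 \bar{x}_2 \vee x_1 \bar{x}_1 x_3 \vee \bar{x}_1 \bar{x}_2 x_4 \vee \bar{x}_1 x_3 x_4 && \text{(distributivity and commutativity)} \\
     &= \bar{x}_1 \bar{x}_2 x_4 \vee \bar{x}_1 x_3 x_4 .
\end{aligned}$$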

The problem with this method (and, actually, with any method that transforms
a Boolean expression into an equivalent DNF) is that it may very well require an
exponential number of steps, as illustrated by the following example:
Example 1.11. The function represented by the CNF

$$\phi(x_1, x_2, \ldots, x_{2n}) = (x_1 \vee x_2)(x_3 \vee x_4) \cdots (x_{2n-1} \vee x_{2n})$$

has a unique shortest DNF expression; call it ψ (this will result from Theorem 1.23
hereunder). The terms of ψ are exactly those elementary conjunctions of n variables
that involve one variable out of each of the pairs $\{x_1, x_2\}, \{x_3, x_4\}, \ldots, \{x_{2n-1}, x_{2n}\}$.
Thus, ψ has $2^n$ terms. Writing down all these terms requires exponentially large
time and space in terms of the length of the original formula φ.

Example 1.11 essentially shows that there is no hope of transforming an arbitrary
expression (or even a CNF) into an equivalent DNF in polynomial time.
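The growth described in Example 1.11 can be observed directly. The following Python sketch (function names are illustrative assumptions, not taken from the text) lists the conjunctions that pick one variable from each pair, and allows a brute-force check that their disjunction agrees with the CNF.

```python
from itertools import product

def shortest_dnf_terms(n):
    """Terms picking one variable index from each pair {x1,x2}, ..., {x_{2n-1},x_{2n}}."""
    pairs = [(2 * k - 1, 2 * k) for k in range(1, n + 1)]
    return [frozenset(choice) for choice in product(*pairs)]

def cnf_value(n, x):
    """Value of (x1 v x2)(x3 v x4)...(x_{2n-1} v x_{2n}) at the 0/1 point x."""
    return all(x[2 * k - 2] or x[2 * k - 1] for k in range(1, n + 1))

def dnf_value(terms, x):
    """Value of the disjunction of the given (all positive) terms at x."""
    return any(all(x[i - 1] for i in term) for term in terms)

# The CNF has 2n literals, while the number of listed terms doubles with each n.
for n in range(1, 6):
    print(n, len(shortest_dnf_terms(n)))
```

The enumeration only illustrates the size of the output; it says nothing about the uniqueness claim, which the text defers to Theorem 1.23.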
In Chapter 4, we shall return to a finer discussion of the following, rather natural,
but surprisingly difficult question: Given a DNF expression ψ of the function f ,
what is the complexity of generating a CNF expression of f ? Or, equivalently, what
is the complexity of generating a DNF expression of $f^d$? For now, we are going to
present a procedure that achieves a less ambitious goal in polynomial (indeed, lin-
ear) time. This procedure is essentially due to Tseitin [872] (see also Blair, Jeroslow,
and Lowe [98]). With an arbitrary Boolean expression $\phi(X) = \phi(x_1, x_2, \ldots, x_n)$
on $B^n$, the procedure associates a DNF $\psi(X, Y) = \psi(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_m)$
(where $(y_1, y_2, \ldots, y_m)$ are additional variables, and possibly m = 0) and a
distinguished literal z among the literals on $\{x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_m\}$. These
constructs have the properties that, for all $X^* \in B^n$, there is a (unique) point
$Y^* \in B^m$ such that $\psi(X^*, Y^*) = 0$. Moreover, in every solution $(X^*, Y^*)$ of the
equation $\psi(X, Y) = 0$, the distinguished coordinate of the point $Y^*$ takes the value
$z^* = \phi(X^*)$.
The DNF ψ(X, Y ) can be regarded as providing an implicit DNF representation
of the function φ(X), in the following sense: in order to compute the value of φ
at a point X∗ ∈ Bn , one can solve the equation ψ(X∗ , Y ) = 0 and read the value
of z in the unique solution of the equation. We will encounter some applications
of this procedure to the analysis of switching circuits in Section 1.13.2 and to the
solution of Boolean equations in Chapter 2.
The procedure recursively processes each of the subexpressions of φ and then
recombines the resulting DNFs into a single one using additional variables. Intu-
itively, each additional variable yi (i = 1, 2, . . . , m) represents the value of one of the
subexpressions occurring in φ. The formulation of the recombination step depends
on whether the outermost operator in φ is a complementation, a disjunction, or a
conjunction.
Before we give a formal statement of this procedure, let us illustrate it on a
small example.
Example 1.12. Consider the expression $\phi = (x_1 \vee x_4)\,\overline{(x_1 \vee (x_2 \bar{x}_3))}$ (see Example
1.10). When working out an example by hand, it is easiest to apply the recursive
procedure “from bottom to top.” So, we start at the lowest level, with the
subexpression $\phi_1 = x_1 \vee x_4$. This subexpression gives rise to an associated DNF

$$\psi_1(x_1, x_4, y_1) = x_1 \bar{y}_1 \vee x_4 \bar{y}_1 \vee \bar{x}_1 \bar{x}_4 y_1,$$

where $y_1$ is the distinguished literal associated with $\psi_1$. We will explain later how
this DNF $\psi_1$ has been constructed, but the reader can already verify that, in every
solution of the equation $\psi_1(x_1, x_4, y_1) = 0$, there holds $y_1 = \phi_1 = x_1 \vee x_4$, so that
the literal $y_1$ can be viewed as implicitly representing the subexpression $\phi_1$.
Let us now proceed with the remaining subexpressions of φ. The subexpression
$\phi_3 = x_1$ yields the trivial expansion $\psi_3(x_1) = 0$, with $x_1$ itself as distinguished
literal, while the subexpression $\phi_4 = x_2 \bar{x}_3$ expands into

$$\psi_4(x_2, x_3, y_4) = \bar{x}_2 y_4 \vee x_3 y_4 \vee x_2 \bar{x}_3 \bar{y}_4,$$

with $y_4$ as distinguished literal. Note again that, in every solution of
$\psi_4(x_2, x_3, y_4) = 0$, we have $y_4 = \phi_4 = x_2 \bar{x}_3$.
Combining $\psi_3$ and $\psi_4$, we obtain the DNF expansion of $\phi_2 = (x_1 \vee (x_2 \bar{x}_3))$ as

$$\psi_2(x_1, x_2, x_3, y_2, y_4) = x_1 \bar{y}_2 \vee y_4 \bar{y}_2 \vee \bar{x}_1 \bar{y}_4 y_2 \vee \bar{x}_2 y_4 \vee x_3 y_4 \vee x_2 \bar{x}_3 \bar{y}_4,$$

where $y_2$ is the distinguished literal associated with $\psi_2$. Here again, one can verify
that the equality $y_2 = \phi_2 = (x_1 \vee (x_2 \bar{x}_3))$ holds in every solution of the equation
$\psi_2 = 0$.
The same DNF $\psi_2$ is also the DNF expansion of $\overline{(x_1 \vee (x_2 \bar{x}_3))}$, this time with
$\bar{y}_2$ as associated literal.
Finally, putting all the pieces together, we obtain the desired expression of ψ:

$$\begin{aligned}
\psi(x_1, x_2, x_3, x_4, y_1, y_2, y_4, z) = {} & \bar{y}_1 z \vee y_2 z \vee y_1 \bar{y}_2 \bar{z} \\
& \vee x_1 \bar{y}_1 \vee x_4 \bar{y}_1 \vee \bar{x}_1 \bar{x}_4 y_1 \\
& \vee x_1 \bar{y}_2 \vee y_4 \bar{y}_2 \vee \bar{x}_1 \bar{y}_4 y_2 \\
& \vee \bar{x}_2 y_4 \vee x_3 y_4 \vee x_2 \bar{x}_3 \bar{y}_4
\end{aligned}$$

with distinguished literal z. We leave it as an easy exercise to check that, for every
$X = (x_1, x_2, x_3, x_4) \in B^4$, the unique solution of the equation ψ = 0 satisfies

$$\begin{aligned}
y_1 &= \phi_1(X) = x_1 \vee x_4, \\
y_2 &= \phi_2(X) = x_1 \vee (x_2 \bar{x}_3), \\
y_4 &= \phi_4(X) = x_2 \bar{x}_3, \\
z &= \phi(X).
\end{aligned}$$

Figure 1.2 presents a formal description of the procedure Expand. Let us now
establish the correctness of this procedure.

Theorem 1.5. With every Boolean expression φ(X) on B n , the procedure Expand
associates a DNF ψ(X, Y ) on Bn+m (m ≥ 0) and a distinguished literal z among the
literals on {x1 , x2 , . . . , xn , y1 , y2 , . . . , ym } with the property that, for each X∗ ∈ B n ,
there is a unique point Y (X ∗ ) ∈ B m such that ψ(X∗ , Y (X ∗ )) = 0; moreover, in this
point, the distinguished literal z is equal to φ(X∗ ). Expand can be implemented
to run in linear time.

Proof. We proceed by induction on the number of symbols in the expression φ. The
statement trivially holds if φ contains only one literal. If φ contains more than one
literal, then it must be of one of the types identified in Expand. Let us concentrate
on the case $\phi = (\phi_1 \vee \phi_2 \vee \ldots \vee \phi_k)$ (the other cases are similar). Let $\psi_j(X, Y_j) =$
Expand($\phi_j$), where $Y_j$ is a Boolean vector of appropriate dimension, and let $z_j$
denote the distinguished literal of $\psi_j$, for j = 1, 2, . . . , k. Then, by construction,

$$\psi := z_1 \bar{y} \vee z_2 \bar{y} \vee \ldots \vee z_k \bar{y} \vee \bar{z}_1 \bar{z}_2 \cdots \bar{z}_k y \vee \psi_1(X, Y_1) \vee \psi_2(X, Y_2) \vee \ldots \vee \psi_k(X, Y_k).$$

Fix $X^* \in B^n$. By induction, there exist k points $Y_1^*, Y_2^*, \ldots, Y_k^*$, each of them
uniquely defined, such that $\psi_j(X^*, Y_j^*) = 0$ and $z_j^* = \phi_j(X^*)$ for j = 1, 2, . . . , k. It is
then straightforward to verify that the condition $\psi(X^*, Y_1, Y_2, \ldots, Y_k, y) = 0$ holds
for a unique choice of $(Y_1, Y_2, \ldots, Y_k, y)$, namely, for $Y_j = Y_j^*$ (j = 1, 2, . . . , k), and
for

$$y = z_1^* \vee z_2^* \vee \ldots \vee z_k^* = \phi_1(X^*) \vee \phi_2(X^*) \vee \ldots \vee \phi_k(X^*) = \phi(X^*).$$

The time complexity of the procedure is easily established by induction.


Procedure Expand(φ)
Input: A Boolean expression φ(x1 , x2 , . . . , xn ).
Output: A DNF ψ(x1 , x2 , . . . , xn , y1 , y2 , . . . , ym ), with a distinguished literal z among the
literals on {x1 , x2 , . . . , xn , y1 , y2 , . . . , ym }.

begin
  if $\phi = x_i$ for some i ∈ {1, 2, . . . , n}
    then return $\psi(x_1, x_2, \ldots, x_n) = 0_n$ and the distinguished literal $x_i$
  else if $\phi = \bar{\phi}_1$ for some expression $\phi_1$ then
    begin
      let $\psi_1$ := Expand($\phi_1$) and let z be the distinguished literal of $\psi_1$;
      return $\psi := \psi_1$ and the distinguished literal $\bar{z}$;
    end
  else if $\phi = (\phi_1 \vee \phi_2 \vee \ldots \vee \phi_k)$ for some expressions $\phi_1, \phi_2, \ldots, \phi_k$ then
    begin
      for j = 1 to k do $\psi_j$ := Expand($\phi_j$);
      let $z_j$ be the distinguished literal of $\psi_j$, for j = 1, 2, . . . , k;
      create a new variable y;
      return $\psi := z_1 \bar{y} \vee z_2 \bar{y} \vee \ldots \vee z_k \bar{y} \vee \bar{z}_1 \bar{z}_2 \cdots \bar{z}_k y \vee \psi_1 \vee \psi_2 \vee \ldots \vee \psi_k$
        and the distinguished literal y;
    end
  else if $\phi = (\phi_1 \phi_2 \ldots \phi_k)$ for some expressions $\phi_1, \phi_2, \ldots, \phi_k$ then
    begin
      for j = 1 to k do $\psi_j$ := Expand($\phi_j$);
      let $z_j$ be the distinguished literal of $\psi_j$, for j = 1, 2, . . . , k;
      create a new variable y;
      return $\psi := \bar{z}_1 y \vee \bar{z}_2 y \vee \ldots \vee \bar{z}_k y \vee z_1 z_2 \cdots z_k \bar{y} \vee \psi_1 \vee \psi_2 \vee \ldots \vee \psi_k$
        and the distinguished literal y;
    end
end

Figure 1.2. Procedure Expand.
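A minimal Python sketch of the procedure may help in experimenting with it. Everything about the encoding is an assumption made here for illustration: expressions are nested tuples tagged "var", "not", "or", "and"; a literal is a (name, is_positive) pair; a DNF is a list of terms, each a list of literals. The auxiliary variables are numbered in order of creation, so their names differ from the y1, y2, y4 used in Example 1.12.

```python
from itertools import count

def neg(lit):
    """Complement a literal, given as a (name, is_positive) pair."""
    name, positive = lit
    return (name, not positive)

def expand(phi, fresh=None):
    """Return (psi, z): a DNF psi (list of terms) and its distinguished literal z."""
    if fresh is None:
        fresh = count(1)                      # numbering of the new y variables
    op = phi[0]
    if op == "var":
        return [], (phi[1], True)             # empty DNF, i.e. the constant 0
    if op == "not":
        psi1, z = expand(phi[1], fresh)
        return psi1, neg(z)                   # same DNF, complemented literal
    if op not in ("or", "and"):
        raise ValueError(f"unknown operator: {op!r}")
    parts = [expand(sub, fresh) for sub in phi[1:]]
    zs = [z for _, z in parts]
    y = (f"y{next(fresh)}", True)
    if op == "or":                            # terms vanish iff y = z1 v ... v zk
        psi = [[z, neg(y)] for z in zs] + [[neg(z) for z in zs] + [y]]
    else:                                     # terms vanish iff y = z1 z2 ... zk
        psi = [[neg(z), y] for z in zs] + [zs + [neg(y)]]
    for sub_psi, _ in parts:
        psi.extend(sub_psi)
    return psi, y

def lit_value(lit, point):
    """Value of a literal at a point given as a dict name -> 0/1."""
    name, positive = lit
    return point[name] if positive else 1 - point[name]

def dnf_value(psi, point):
    """Value of the DNF psi at the given point."""
    return int(any(all(lit_value(l, point) for l in term) for term in psi))

def evaluate(phi, point):
    """Direct evaluation of the expression phi, used only for checking."""
    op = phi[0]
    if op == "var":
        return point[phi[1]]
    if op == "not":
        return 1 - evaluate(phi[1], point)
    values = [evaluate(sub, point) for sub in phi[1:]]
    return max(values) if op == "or" else min(values)

# The expression of Example 1.12, read with its second factor complemented.
x1, x2, x3, x4 = (("var", f"x{i}") for i in range(1, 5))
phi = ("and", ("or", x1, x4), ("not", ("or", x1, ("and", x2, ("not", x3)))))
psi, z = expand(phi)
print(len(psi), z)
```

Each operator adds one new variable and a number of terms proportional to its number of operands, which is consistent with the linear bound stated in Theorem 1.5. On this input the sketch produces twelve terms, the same count as the DNF written out in Example 1.12.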

1.6 Orthogonal DNFs and number of true points


A classical problem of Boolean theory is to derive an orthogonal disjunctive normal
form of an arbitrary Boolean function. In order to define this concept, consider a
DNF

$$\phi = \bigvee_{k=1}^{m} \Big( \bigwedge_{i \in A_k} x_i \bigwedge_{j \in B_k} \bar{x}_j \Big), \qquad (1.13)$$

where Ak ∩ Bk = ∅ for all k = 1, 2, . . . , m.


Definition 1.13. A DNF of the form (1.13) is said to be orthogonal, or to be a sum
of disjoint products, if $(A_k \cap B_\ell) \cup (A_\ell \cap B_k) \neq \emptyset$ for all $k, \ell \in \{1, 2, \ldots, m\}$, $k \neq \ell$.
Definition 1.13 simply states that every two terms of an orthogonal DNF must
be “conflicting” in at least one variable; that is, there must be a variable that appears
complemented in one of the terms and uncomplemented in the other term. This
property is easy to test for any given DNF.
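Note also that a DNF is orthogonal if and only if, for every pair of terms
$k, l \in \{1, 2, \ldots, m\}$, $k \neq l$, and for every $X^* \in B^n$,

$$\Big( \bigwedge_{i \in A_k} x_i^* \bigwedge_{j \in B_k} \bar{x}_j^* \Big) \Big( \bigwedge_{i \in A_l} x_i^* \bigwedge_{j \in B_l} \bar{x}_j^* \Big) = 0.$$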

The terminology “orthogonal” is quite natural in view of this observation, the proof
of which is left to the reader.

Example 1.13. The DNF $\phi = \bar{x}_1 \bar{x}_2 x_4 \vee \bar{x}_1 x_3 x_4$ is not orthogonal since the point
$X^* = (0, 0, 1, 1)$ makes both of its terms equal to 1. But φ is equivalent to the DNF
$\psi = \bar{x}_1 \bar{x}_2 x_4 \vee \bar{x}_1 x_2 x_3 x_4$, which is orthogonal.
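The test of Definition 1.13 is a pairwise check on index sets. The Python sketch below assumes a hypothetical encoding of each term as a pair (A, B) of sets of variable indices, and reads the two DNFs of Example 1.13 as (not x1)(not x2)x4 OR (not x1)x3x4 and (not x1)(not x2)x4 OR (not x1)x2x3x4.

```python
from itertools import combinations

def conflict(term1, term2):
    """True if some variable is uncomplemented in one term and complemented in the other."""
    (A1, B1), (A2, B2) = term1, term2
    return bool((A1 & B2) | (A2 & B1))

def is_orthogonal(dnf):
    """Check the condition of Definition 1.13 on every pair of terms."""
    return all(conflict(s, t) for s, t in combinations(dnf, 2))

# Terms as (A, B): indices of uncomplemented and complemented variables.
phi = [({4}, {1, 2}), ({3, 4}, {1})]
psi = [({4}, {1, 2}), ({2, 3, 4}, {1})]
print(is_orthogonal(phi), is_orthogonal(psi))
```

The check examines each pair of terms once, so its cost is quadratic in the number of terms.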

As this example illustrates, the following specialization of Theorem 1.4 holds.


Theorem 1.6. Every Boolean function can be represented by an orthogonal DNF.
Proof. It suffices to observe that the minterm expression (1.6) used in the proof of
Theorem 1.4 is orthogonal. 

Let us now establish a remarkable property of orthogonal DNFs that reinforces
several of our earlier comments about the usefulness of interpreting the elements
of B = {0, 1} as numbers (see Section 1.1) and the similarity between DNFs and
polynomials over the reals (see the end of Section 1.4).
Theorem 1.7. If the Boolean function f on $B^n$ is represented by an orthogonal
DNF of the form (1.13), and if the elements of B are interpreted as numbers, then

$$f(X) = \sum_{k=1}^{m} \Big( \prod_{i \in A_k} x_i \prod_{j \in B_k} (1 - x_j) \Big), \qquad (1.14)$$

for all $X = (x_1, x_2, \ldots, x_n) \in B^n$.


Proof. Since the terms of (1.14) are pairwise orthogonal, at most one of them takes
value 1 at any point X ∈ Bn . 

One of the main motivations for the interest in orthogonal DNFs is that, for
functions expressed in this form, computing the number of true points turns out to
be extremely easy.
Theorem 1.8. If the Boolean function f on $B^n$ is represented by an orthogonal
DNF of the form (1.13), then the number of its true points is equal to

$$\omega(f) = \sum_{k=1}^{m} 2^{\,n - |A_k| - |B_k|}.$$

Proof. The DNF (1.13) takes value 1 exactly when one of its terms takes value 1.
Since the terms are pairwise orthogonal, $\omega(f) = \sum_{k=1}^{m} \alpha_k$, where $\alpha_k$ denotes the
number of true points of the k-th term. The k-th term fixes $|A_k| + |B_k|$ variables and
leaves the others free, so $\alpha_k = 2^{\,n - |A_k| - |B_k|}$, and the statement follows.
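The counting formula is easy to try out. The following Python sketch assumes a hypothetical encoding of terms as pairs (A, B) of index sets, and compares the formula with a brute-force enumeration on the two DNFs of Example 1.13.

```python
from itertools import product

def count_true_points(dnf, n):
    """omega(f) by the formula of Theorem 1.8; valid only for an orthogonal DNF."""
    return sum(2 ** (n - len(A) - len(B)) for A, B in dnf)

def count_by_enumeration(dnf, n):
    """Brute-force count over all 2^n points; valid for any DNF."""
    def term_true(term, x):
        A, B = term
        return all(x[i - 1] == 1 for i in A) and all(x[j - 1] == 0 for j in B)
    return sum(any(term_true(t, x) for t in dnf) for x in product((0, 1), repeat=n))

# Orthogonal and non-orthogonal DNFs of Example 1.13, as (A, B) index pairs.
psi = [({4}, {1, 2}), ({2, 3, 4}, {1})]
phi = [({4}, {1, 2}), ({3, 4}, {1})]
print(count_true_points(psi, 4), count_by_enumeration(psi, 4))
print(count_true_points(phi, 4), count_by_enumeration(phi, 4))
```

For the non-orthogonal DNF the formula counts the shared true point (0, 0, 1, 1) twice, which shows why orthogonality is needed in the theorem.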
At this point, the reader may be wondering (with some reason) why anyone
would ever want to compute the number of true points of a Boolean function. We
present several applications of this concept in Section 1.13. For now, it may be
sufficient to note that determining the number of true points of a function f is a
roundabout way to check the consistency of the Boolean equation f = 0.
Chow [194] introduced several parameters of a Boolean function that are closely
related to the number ω(f ) defined in Theorem 1.8.
Definition 1.14. The Chow parameters of a Boolean function f on $B^n$ are the
n + 1 integers $(\omega_1, \omega_2, \ldots, \omega_n, \omega)$, where ω = ω(f) is the number of true points of
f and $\omega_i$ is the number of true points $X^*$ of f such that $x_i^* = 1$:

$$\omega_i = |\{X^* \in B^n \mid f(X^*) = 1 \text{ and } x_i^* = 1\}|, \quad i = 1, 2, \ldots, n.$$
The same reasoning as in Theorem 1.8 shows that the Chow parameters of a
function represented in orthogonal form can be efficiently computed: For ω, this
is just a consequence of Theorem 1.8; for ωi (1 ≤ i ≤ n), this follows from the fact
that the DNF obtained by fixing xi to 1 in an orthogonal DNF remains orthogonal.

Example 1.14. The function f represented by the orthogonal DNF $\psi = \bar{x}_1 \bar{x}_2 x_4 \vee
\bar{x}_1 x_2 x_3 x_4$ has Chow parameters $(\omega_1, \omega_2, \omega_3, \omega_4, \omega) = (0, 1, 2, 3, 3)$. Indeed, f has
exactly three true points, $x_1 = 0$ and $x_4 = 1$ in all true points, $x_2 = 1$ in exactly
one true point, and $x_3 = 1$ in exactly two true points.
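The computation behind Example 1.14 can be sketched in Python as follows. The encoding of terms as pairs (A, B) of index sets and the function name are assumptions made for this illustration. Fixing x_i to 1 removes every term in which x_i appears complemented and adds x_i to the remaining terms, after which the counting formula of Theorem 1.8 applies.

```python
def chow_parameters(dnf, n):
    """Chow parameters (omega_1, ..., omega_n, omega) of an ORTHOGONAL DNF."""
    omega = sum(2 ** (n - len(A) - len(B)) for A, B in dnf)
    omegas = []
    for i in range(1, n + 1):
        w = 0
        for A, B in dnf:
            if i in B:
                continue                      # the term vanishes when x_i = 1
            # Number of true points of the term that also satisfy x_i = 1.
            w += 2 ** (n - len(A | {i}) - len(B))
        omegas.append(w)
    return tuple(omegas) + (omega,)

# The orthogonal DNF of Example 1.14, as (A, B) index pairs.
psi = [({4}, {1, 2}), ({2, 3, 4}, {1})]
print(chow_parameters(psi, 4))
```

The result depends on the orthogonality of the input; for an arbitrary DNF the sums would count shared true points more than once.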

Chow parameters, and variants thereof, have been independently rediscovered
by several researchers; in particular, up to scaling and shifting, they are identical
to the so-called degree-0 and degree-1 Fourier coefficients of a Boolean function.
Chow parameters have found applications in fields as diverse as electrical
engineering (Chow [194], Winder [920]), game theory (Banzhaf [52], Dubey and
Shapley [279]), reliability theory (Birnbaum [91], Barlow and Proschan [54]),
cryptography (Carlet [170]), and theoretical computer science (see Ben-Or and
Linial [60], Bruck [157], Kahn, Kalai and Linial [543]); see also O’Donnell [716]
for an overview of applications. We return to Chow parameters in Section 1.13
and in subsequent chapters, especially in Chapter 9. Orthogonal forms are further
discussed in Chapter 7.

1.7 Implicants and prime implicants


Definition 1.15. Given two Boolean functions f and g on Bn , we say that f
implies g (or that f is a minorant of g, or that g is a majorant of f ) if
f (X) = 1 ⇒ g(X) = 1 for all X ∈ Bn .
When this is the case, we write f ≤ g.
This definition extends in a straightforward way to Boolean expressions, since
every such expression can be regarded as a Boolean function.
The terminology “f implies g” is obviously borrowed from logic: If f and
g model, respectively, the truth value of propositional sentences $S_f$ and $S_g$, then
f ≤ g holds exactly when Sf ⇒ Sg . On the other hand, the terms “minorant,” and
“majorant,” as well as the notation “f ≤ g” are easily motivated by looking at f
and g as integer-valued functions. Also, as suggested by the notation, the equality
f = g holds if and only if f ≤ g and g ≤ f hold simultaneously.
The following alternative forms of Definition 1.15 are frequently useful.

Theorem 1.9. For all Boolean functions f and g on Bn , the following statements
are equivalent:

(1) $f \le g$;
(2) $f \vee g = g$;
(3) $\bar{f} \vee g = 1_n$;
(4) $f g = f$;
(5) $f \bar{g} = 0_n$.

Proof. It suffices to note that each of the assertions (1)–(5) fails exactly when
there exists X ∈ Bn such that f (X) = 1 and g(X) = 0. 

Let us record a few additional properties of the implication relation.

Theorem 1.10. For all Boolean functions f , g, and h on Bn ,

(1) $0_n \le f \le 1_n$;
(2) f g ≤ f ≤ f ∨ g;
(3) f = g if and only if (f ≤ g and g ≤ f );
(4) (f ≤ h and g ≤ h) if and only if f ∨ g ≤ h;
(5) (f ≤ g and f ≤ h) if and only if f ≤ g h;
(6) if f ≤ g then f h ≤ g h;
(7) if f ≤ g then f ∨ h ≤ g ∨ h.

Proof. All these properties are easily verified. 

When two Boolean functions f and g are represented by arbitrary Boolean
expressions, it can be quite difficult to check whether or not f implies g. Definition
1.15 does not suggest any efficient way to perform this task, except by complete
enumeration of all the points in Bn , nor does Theorem 1.9 help in this respect.
We will come back to this point in Chapter 2, when we discuss the complexity of
solving Boolean equations.
For elementary conjunctions, however, implication takes an especially simple,
easily verifiable form: Indeed, an elementary conjunction implies another one if
and only if the latter results from the former by deletion of literals (the “longer”
conjunction implies the “shorter” one). More formally:
Theorem 1.11. The elementary conjunction $C_{AB} = \bigwedge_{i \in A} x_i \bigwedge_{j \in B} \bar{x}_j$ implies the
elementary conjunction $C_{FG} = \bigwedge_{i \in F} x_i \bigwedge_{j \in G} \bar{x}_j$ if and only if F ⊆ A and G ⊆ B.
Proof. Assume that F ⊆ A and G ⊆ B and consider any point $X = (x_1, x_2, \ldots, x_n) \in B^n$.
If $C_{AB}(X) = 1$, then $x_i = 1$ for all i ∈ A and $x_j = 0$ for all j ∈ B, so that $x_i = 1$
for all i ∈ F and $x_j = 0$ for all j ∈ G. Hence, $C_{FG}(X) = 1$ and we conclude that
$C_{AB}$ implies $C_{FG}$.
To prove the converse statement, assume for instance that F is not contained
in A. Set $x_i = 1$ for all i ∈ A, $x_j = 0$ for all j ∉ A and $X = (x_1, x_2, \ldots, x_n)$. Then,
$C_{AB}(X) = 1$ but $C_{FG}(X) = 0$ (since $x_k = 0$ for some k ∈ F \ A), so that $C_{AB}$ does
not imply $C_{FG}$.

Definition 1.16. Let f be a Boolean function and C be an elementary conjunction.
We say that C is an implicant of f if C implies f.
Example 1.15. Let $f = xy \vee x\bar{y}z$. Then $xy$, $x\bar{y}z$ and $xz$ are implicants of f.
We can now formulate an easy observation.
Theorem 1.12. If φ is a DNF representation of the Boolean function f , then every
term of φ is an implicant of f . Moreover, if C is an implicant of f , then the DNF
φ ∨ C also represents f .
Proof. For the first statement, notice that, if any term of φ takes value 1, then φ,
and hence f , take value 1. For the second statement, just check successively that
φ ∨ C ≤ f and f ≤ φ ≤ φ ∨ C. 

Example 1.16. By Theorem 1.12, the function $f = xy \vee x\bar{y}z$ (see Example 1.15)
admits the DNF expression $xy \vee x\bar{y}z \vee xz = xy \vee xz$ (the last equality is easily
verified to hold).

Example 1.16 illustrates an important point: With a view toward simplification
of Boolean expressions, it makes sense to replace “long” implicants by “short”
ones in DNF representations of a Boolean function. The meaning of “long” and
“short” can be clarified by reference to Theorem 1.11. This line of reasoning leads
to the following definitions (see Quine [766, 768]).
Definition 1.17. Let f be a Boolean function and C1 , C2 be implicants of f . We
say that C1 absorbs C2 if C1 ∨ C2 = C1 or, equivalently, if C2 ≤ C1 .
Definition 1.18. Let f be a Boolean function and C1 be an implicant of f . We
say that C1 is a prime implicant of f if C1 is not absorbed by any other implicant
of f (namely, if C2 is an implicant of f and C1 ≤ C2 , then C1 = C2 ).
Example 1.17. Consider again the function f defined in Example 1.15. It is easy
to verify that $xy$ and $xz$ are prime implicants of f, whereas $x\bar{y}z$ is not prime
(since $x\bar{y}z \le xz$). As a matter of fact, f has no prime implicants other than $xy$
and $xz$.
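For small n, the definitions of implicant, absorption, and prime implicant can be checked by brute force. The Python sketch below is an illustration under stated assumptions, not an efficient method: a conjunction is encoded as a tuple over {None, 0, 1} (None for an absent variable, 1 for an uncomplemented literal, 0 for a complemented one), and the function of Example 1.15 is read as xy OR x(not y)z.

```python
from itertools import product

def is_implicant(c, f, n):
    """True if f equals 1 at every point of the subcube described by c."""
    free = [i for i in range(n) if c[i] is None]
    for values in product((0, 1), repeat=len(free)):
        x = list(c)
        for i, v in zip(free, values):
            x[i] = v
        if not f(*x):
            return False
    return True

def absorbs(c1, c2):
    """c1 absorbs c2 when every literal of c1 also occurs in c2 (Theorem 1.11)."""
    return all(a is None or a == b for a, b in zip(c1, c2))

def prime_implicants(f, n):
    """Implicants that are not absorbed by any other implicant."""
    imps = [c for c in product((None, 0, 1), repeat=n) if is_implicant(c, f, n)]
    return [c for c in imps if not any(d != c and absorbs(d, c) for d in imps)]

# Assumed reading of Example 1.15: f = xy OR x (not y) z.
f = lambda x, y, z: (x and y) or (x and not y and z)
print(prime_implicants(f, 3))
```

The enumeration visits all 3^n conjunctions and is therefore only usable for a handful of variables.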
Prime implicants play a crucial role in constructing DNF expressions of
Boolean functions. This role is best described by the next theorem (compare with
Theorem 1.4).
Theorem 1.13. Every Boolean function can be represented by the disjunction of
all its prime implicants.
Proof. Let f be a Boolean function on $B^n$, and let $P_1, P_2, \ldots, P_m$ be its prime
implicants (notice that m is finite because the number of elementary conjunctions
on n variables is finite). Consider any DNF representation of f, say $\phi = \bigvee_{k=1}^{r} C_k$.
By Theorem 1.12, the DNF

$$\psi = \bigvee_{k=1}^{r} C_k \vee \bigvee_{j=1}^{m} P_j$$

also represents f. Consider any term $C_k$ of φ (1 ≤ k ≤ r). Since $C_k$ is an implicant
of f, it is absorbed by at least one prime implicant of f, say, by $P_j$ (where possibly
$C_k = P_j$). Then, it follows that $C_k \vee P_j = P_j$, from which we deduce $\psi = \bigvee_{j=1}^{m} P_j$.

The DNF representation introduced in Theorem 1.13 will be used repeatedly
throughout this book, and therefore deserves a special name.
Definition 1.19. The disjunction of all prime implicants of a Boolean function is
called the complete DNF (or the Blake canonical form) of this function.
Note that the complete DNF is only unique up to the order of its terms and
literals. However, just as we did in the case of minterm expressions, we shall
disregard this subtlety and simply look at the complete DNF as being uniquely
defined.
An interesting corollary of Theorem 1.13 is that each Boolean function is
uniquely identified by the list of its prime implicants. Equivalently, two Boolean
functions are equal if and only if they have the same complete DNF. Let us stress,
however, that it is not always necessary to know all the prime implicants of a func-
tion to know the function, and that it is not always necessary to take the disjunction
of all the prime implicants to obtain a correct DNF representation of the function.
Example 1.18. The function g = xy ∨ xy ∨ xz has four prime implicants, namely,
xy, xy, xz, and yz. 

More generally, let us introduce the following terminology.



Definition 1.20. Let f be a Boolean function on $B^n$ and let $\phi = \bigvee_{k \in \Omega} C_k$ be a
DNF representation of f. We say that φ is a prime DNF of f if each term $C_k$
(k ∈ Ω) is a prime implicant of f. We say that φ is an irredundant DNF of f if
there is no j ∈ Ω such that $\bigvee_{k \in \Omega \setminus \{j\}} C_k$ represents f; otherwise, we say that φ is
redundant.
So, a redundant DNF expression can be turned into a shorter equivalent DNF by
dropping some of its terms. For instance, Example 1.18 shows that the complete
DNF of a Boolean function is not necessarily irredundant. Similarly, if a DNF is
not prime, then at least one of its terms can be replaced by a prime implicant that
absorbs it (remember Theorem 1.12 and the comments following it). Therefore,
prime irredundant DNFs provide the shortest possible DNF representations of
Boolean functions. In Chapter 3, we return to the study of prime irredundant
DNFs in detail.
Of course, the concepts of implicants and prime implicants have their natural
disjunctive counterparts.
Definition 1.21. Let f be a Boolean function and D be an elementary disjunction.
We say that D is an implicate of f if f implies D. We say that the implicate D is
prime if it is not implied by any other implicate of f .
Similarly to Theorem 1.13, we obtain:
Theorem 1.14. Every Boolean function can be represented by the conjunction of
all its prime implicates.
Proof. The proof is a straightforward adaptation of the proof of Theorem 1.13. 

Example 1.19. The function g considered in Example 1.18 has four implicates,
namely, (x ∨ y), (x ∨ y ∨ z), (x ∨ y ∨ z), and (x ∨ y ∨ z). However, only the first
and the last implicates in this list are prime, and we conclude that g = (x ∨ y)
(x ∨ y ∨ z). 

1.8 Restrictions of functions, essential variables


We now introduce the concept of restriction (sometimes called projection) of a
Boolean function.
Definition 1.22. Let f be a Boolean function on $B^n$, and let k ∈ {1, 2, . . . , n}. We
denote by $f_{|x_k=1}$ and $f_{|x_k=0}$, respectively, the Boolean functions on $B^{n-1}$ defined
as follows: For every $(x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n) \in B^{n-1}$,

$$f_{|x_k=1}(x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n) = f(x_1, \ldots, x_{k-1}, 1, x_{k+1}, \ldots, x_n),$$
$$f_{|x_k=0}(x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n) = f(x_1, \ldots, x_{k-1}, 0, x_{k+1}, \ldots, x_n).$$

We say that $f_{|x_k=1}$ is the restriction of f to $x_k = 1$ and that $f_{|x_k=0}$ is the restriction
of f to $x_k = 0$.
Even though $f_{|x_k=1}$ and $f_{|x_k=0}$ are, by definition, functions of n − 1 variables,
we can also look at them as functions on $B^n$ rather than $B^{n-1}$, via the following
convention: For every $(x_1, x_2, \ldots, x_n) \in B^n$, we simply let

$$f_{|x_k=1}(x_1, x_2, \ldots, x_n) = f(x_1, \ldots, x_{k-1}, 1, x_{k+1}, \ldots, x_n),$$
and similarly for $f_{|x_k=0}(x_1, x_2, \ldots, x_n)$. This slight abuse of definitions is innocuous
and we use it whenever it proves convenient. Also, we use shorthand like
$f_{|x_1=0,\,x_2=1,\,x_3=0}$ instead of the more cumbersome notation $\big(\big(f_{|x_1=0}\big)_{|x_2=1}\big)_{|x_3=0}$.
The link between representations of a function and representations of its
restrictions is straightforward.
Theorem 1.15. Let f be a Boolean function on Bn , let ψ be a representation of f ,
and let k ∈ {1, 2, . . . , n}. Then, the expression obtained by substituting the constant
1 (respectively, 0) for every occurrence of xk in ψ represents f|xk =1 (respectively,
f|xk =0 ).
Proof. This is an immediate consequence of Definitions 1.22 and 1.5. 

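Example 1.20. Consider the function $f = (xz \vee y)(x \vee \bar{z}) \vee \bar{x}\bar{y}$. After some easy
simplifications, we derive the following expressions for $f_{|y=1}$ and $f_{|y=0}$:

$$f_{|y=1} = (xz \vee 1)(x \vee \bar{z}) \vee \bar{x}\,\bar{1} = x \vee \bar{z},$$

$$f_{|y=0} = (xz \vee 0)(x \vee \bar{z}) \vee \bar{x}\,\bar{0} = xz \vee \bar{x} = z \vee \bar{x}.$$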



We now prove a trivial, but useful identity.
Theorem 1.16. Let f be a Boolean function on $B^n$, and let k ∈ {1, 2, . . . , n}. Then,

$$f(x_1, x_2, \ldots, x_n) = x_k f_{|x_k=1} \vee \bar{x}_k f_{|x_k=0} \qquad (1.15)$$

for all $(x_1, x_2, \ldots, x_n) \in B^n$.
Proof. This is immediate by substitution of the values $x_k = 0$ or $x_k = 1$ in
(1.15).

The right-hand side of the identity (1.15) is often called the Shannon expansion
of the function f with respect to xk , by reference to its use by Shannon in [827],
although this identity was already well-known to Boole [103]. It can be used, in
particular, to construct the minterm DNF of a function (Theorem 1.4 and Definition
1.11). More interestingly, by applying the Shannon expansion to a function and to
its successive restrictions until these restrictions become either 0, or 1, or a literal,
we obtain an orthogonal DNF of the function (this is easily proved by induction
on n). Not every orthogonal DNF, however, can be obtained in this way.
Example 1.21. Consider again the function f in Example 1.20. The Shannon
expansion of $f_{|y=1}$ with respect to x is

$$x f_{|y=1,x=1} \vee \bar{x} f_{|y=1,x=0} = x \vee \bar{x}\bar{z}.$$

Observe that $f_{|y=1,x=1}$ is identically 1 and $f_{|y=1,x=0} = \bar{z}$ is a literal, so we terminate
here the expansion of $f_{|y=1}$.
Similarly, the Shannon expansion of $f_{|y=0}$ with respect to z (for a change) is

$$z f_{|y=0,z=1} \vee \bar{z} f_{|y=0,z=0} = z \vee \bar{z}\bar{x}.$$

Putting the pieces together, we obtain

$$f(x, y, z) = y (x \vee \bar{x}\bar{z}) \vee \bar{y} (z \vee \bar{z}\bar{x}) = xy \vee \bar{x}y\bar{z} \vee \bar{y}z \vee \bar{x}\bar{y}\bar{z},$$

which is an orthogonal DNF of f.
Another orthogonal DNF of f is $\bar{x}\bar{y} \vee xz \vee y\bar{z}$. But this DNF cannot be
obtained from successive Shannon expansions, since, when applying this procedure,
we necessarily produce a DNF in which one of the variables appears in all
the terms.
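The recursive use of the Shannon expansion can be sketched in Python. The sketch makes several assumptions that are not in the text: the function is given as a callable on 0/1 arguments, variables are expanded in index order (so the result differs from the expansion worked out in Example 1.21, which starts with y), the recursion stops only at constant restrictions, and constancy is tested by plain enumeration. The function of Example 1.20 is read as (xz OR y)(x OR not z) OR (not x)(not y).

```python
from itertools import product

def shannon_dnf(f, n, prefix=()):
    """Orthogonal DNF of f as a list of prefixes.

    A prefix (v1, ..., vk) stands for the term fixing the first k variables
    to v1, ..., vk (value 1: uncomplemented literal, value 0: complemented).
    """
    k = len(prefix)
    values = {bool(f(*prefix, *rest)) for rest in product((0, 1), repeat=n - k)}
    if values == {False}:
        return []                 # the restriction is identically 0: no term
    if values == {True}:
        return [prefix]           # the restriction is identically 1: one term
    # Otherwise expand on the next variable: (not x) f|x=0  OR  x f|x=1.
    return shannon_dnf(f, n, prefix + (0,)) + shannon_dnf(f, n, prefix + (1,))

# Assumed reading of Example 1.20.
f = lambda x, y, z: ((x and z) or y) and (x or not z) or (not x and not y)
print(shannon_dnf(f, 3))
```

Any two prefixes produced by the recursion disagree at the position where their branches split, which is why the resulting terms are pairwise orthogonal and all share the first expansion variable.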
Let us now turn to the concept of essential variables.
Definition 1.23. Let f be a Boolean function on Bn , and let k ∈ {1, 2, . . . , n}. We
say that the variable xk is inessential for f , or that xk is a dummy for f , or that
f does not depend on xk , if f|xk =1 (X) = f|xk =0 (X) for all X ∈ B n−1 . Otherwise,
we say that xk is essential.
If a function has a representation in which some specific variable xk does not
appear, then, as a consequence of Theorem 1.15, the function does not depend on
xk . The converse statement is slightly less obvious but nevertheless valid.
Theorem 1.17. Let f be a Boolean function on B n , and let k ∈ {1, 2, . . . , n}. The
following statements are equivalent:
(1) The variable xk is inessential for f .
(2) The variable xk does not appear in any prime implicant of f .
(3) f has a DNF representation in which the variable xk does not appear.
Proof. The second statement implies the third one by Theorem 1.13, and the third
statement implies the first one by Theorem 1.15. Let us now assume that xk is
inessential for f, and let us consider an arbitrary implicant of f, say,
$C_{AB} = \bigwedge_{i \in A} x_i \bigwedge_{j \in B} \bar{x}_j$. Assume, for instance, that k ∈ A (the argument would be similar
for k ∈ B) and consider the conjunction C obtained by deleting $x_k$ from $C_{AB}$:

$$C = \bigwedge_{i \in A \setminus \{k\}} x_i \bigwedge_{j \in B} \bar{x}_j .$$

We claim that C is an implicant of f : This will in turn entail that the prime impli-
cants of f do not involve xk . To prove the claim, let X = (x1 , x2 , . . . , xn ) be any
point in Bn such that C(X) = 1, and let us show that f (X) = 1. Since neither C nor
f depend on xk , we may as well suppose that xk = 1. Then, C(X) = CAB (X) = 1
and hence f (X) = 1, as required. 

It should be obvious, however, that any particular representation of a function
may involve a variable on which the function does not depend. So, for instance, the
minterm expression introduced in Definition 1.11 involves n variables for every
function on $B^n$ (except the null function $0_n$), even when the function depends on
far fewer than n variables.
Example 1.22. The DNF $\phi(x_1, x_2, x_3, x_4) = x_1 x_2 \vee x_1 \bar{x}_2 \vee \bar{x}_1 x_2 \vee \bar{x}_1 \bar{x}_2$ represents
the constant function $1_4$ on $B^4$. In particular, φ does not depend on any of its
variables.
We will prove later that, for a function represented by an arbitrary DNF expres-
sion, it is generally difficult to determine whether any given variable is essential
or not (see Theorem 1.32 in Section 1.11).
Finally, let us mention an interesting connection between the concept of
essential variables and of Chow parameters.
Theorem 1.18. Let f be a Boolean function on B n , let (ω1 , ω2 , . . . , ωn , ω) be its
vector of Chow parameters, and let k ∈ {1, 2, . . . , n}. If the variable xk is inessential
for f , then ω = 2 ωk .
Proof. The sets A = {X ∈ Bn | f (X) = 1, xk = 1} and B = {X ∈ Bn | f (X) = 1,
xk = 0} partition the set of true points of f , and |A| = ωk , |B| = ω − ωk . If xk is
inessential, then A and B are in one-to-one correspondence, so ω = 2 ωk . 

The converse of Theorem 1.18 is not valid since the function $f(x_1, x_2) =
x_1 \bar{x}_2 \vee \bar{x}_1 x_2$ has Chow parameters (1, 1, 2), and both variables $x_1$, $x_2$ are essential.
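Both Theorem 1.18 and the failure of its converse can be checked by enumeration on small functions. The Python sketch below uses illustrative names and brute force over all points, so it is only meant for a few variables. It tests the constant function of Example 1.22 and the two-variable function above, read as x1(not x2) OR (not x1)x2.

```python
from itertools import product

def is_essential(f, n, k):
    """True if f depends on x_k (k is 1-based), checked by full enumeration."""
    for x in product((0, 1), repeat=n):
        if x[k - 1] == 0:
            y = x[:k - 1] + (1,) + x[k:]      # same point with x_k flipped to 1
            if bool(f(*x)) != bool(f(*y)):
                return True
    return False

def chow_parameters(f, n):
    """Chow parameters (omega_1, ..., omega_n, omega) by enumeration."""
    true_points = [x for x in product((0, 1), repeat=n) if f(*x)]
    omegas = tuple(sum(x[i] for x in true_points) for i in range(n))
    return omegas + (len(true_points),)

xor = lambda x1, x2: (x1 and not x2) or (not x1 and x2)
const = lambda x1, x2, x3, x4: ((x1 and x2) or (x1 and not x2)
                                or (not x1 and x2) or (not x1 and not x2))

print(chow_parameters(xor, 2), [is_essential(xor, 2, k) for k in (1, 2)])
print(chow_parameters(const, 4), [is_essential(const, 4, k) for k in (1, 2, 3, 4)])
```

For the first function the relation ω = 2ω_k holds for both variables although both are essential, while for the constant function the relation holds and no variable is essential.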

1.9 Geometric interpretation


Most of the concepts introduced in the previous sections have simple, but fre-
quently useful, geometric interpretations. First, the points of Bn can be identified
with the vertices of the unit hypercube
$$U^n = \{X \in \mathbb{R}^n \mid 0 \le x_i \le 1 \text{ for } i = 1, 2, \ldots, n\}.$$

Every Boolean function defines a partition of the vertices of $U^n$ into true points
and false points. Conversely, this partition completely characterizes the function.
Consider an arbitrary elementary conjunction of the form $C_{AB} = \bigwedge_{i \in A} x_i \bigwedge_{j \in B} \bar{x}_j$.
The set of true points of $C_{AB}$ is

$$T_{AB} = \{X \in B^n \mid x_i = 1 \text{ for all } i \in A \text{ and } x_j = 0 \text{ for all } j \in B\}.$$

Geometrically, the points in $T_{AB}$ are exactly the vertices contained in a face of
$U^n$. Every such face is itself a hypercube of dimension n − |A| − |B| containing
$2^{\,n-|A|-|B|}$ vertices, and will therefore be referred to as a subcube. (Some authors,
especially in the electrical engineering literature, actually use the term “cube”
instead of “elementary conjunction.”)
Consider now a Boolean function f . In view of the previous observation, each
implicant of f corresponds to a subcube of U n that contains no false points of f .
The implicant is prime if the corresponding subcube is maximal with this property.
Let $\phi = \bigvee_{k=1}^{m} C_k$ be an arbitrary DNF expression of the function f. The set
of true points of f coincides with the union of the sets of true points of the
terms $C_k$. In other words, a DNF expression of f can be viewed as a collection
of subcubes of $U^n$ that cover all the true points of f and none of its false points.
In particular, an orthogonal DNF is one for which the subcubes in the collection
are pairwise disjoint. This observation motivates the terminology “sum of disjoint
products” mentioned in Definition 1.13 and may provide an alternative insight into
Theorem 1.8.

[Figure: the eight vertices of the unit cube, from (0, 0, 0) to (1, 1, 1), with the true
points T(f) and the false points F(f) marked by different symbols.]
Figure 1.3. A 3-dimensional view of the Boolean function of Example 1.23.

Table 1.2. A Karnaugh map

                (x2, x3)
            00   01   11   10
(x1)   0     1    1    1    0
       1     0    1    1    0

The classical representation of Boolean functions by Karnaugh maps is directly
inspired by the geometric point of view. Although Karnaugh maps may be useful for
visual inspection of functions involving a small number of variables (up to 5 or 6, at
most), they are inadequate for algorithmic purposes and thus have become obsolete.
We illustrate them here with only a simple example and refer the interested reader
to Maxfield [678] or to Mendelson [680] for more details.

Example 1.23. Consider again the function given by Table 1.1 in Section 1.1.
A Karnaugh map for this DNF is given by the matrix displayed in Table 1.2.
The rows of the map are indexed by the values of the variable x1 ; its columns are
indexed by the values of the pair of variables (x2 , x3 ); and each cell contains the
value of the function in the corresponding Boolean point. For instance, the cell in
the second row, fourth column, of the map contains a 0, since f (1, 1, 0) = 0.
Because of the special way in which the columns are ordered, two adjacent
cells always correspond to neighboring vertices of the unit hypercube $U^3$; that is,
the corresponding points differ in exactly one component. This remains true if we
think of the Karnaugh map as being wrapped on a torus, with cell (0, 10) adjacent
to cell (0, 00). Likewise, each row of the map corresponds to a 2-dimensional face
of $U^3$, and so do squares formed by 4 adjacent cells, like (0, 01), (0, 11), (1, 01),
and (1, 11).
Now, note that every cell of the map containing a 1 can alternatively be viewed as
representing a minterm of the function f. For instance, the cell (0, 01) corresponds
to the minterm $\bar{x}_1 \bar{x}_2 x_3$. Moreover, any two adjacent cells with value 1 can be
combined to produce an implicant of degree 2. So, the cells (0, 01) and (0, 11)
generate the implicant $\bar{x}_1 x_3$, and so on. Finally, each row or square containing
four 1’s generates an implicant of degree 1; e.g., the cells (0, 01), (0, 11), (1, 01),
and (1, 11) correspond to the implicant $x_3$.
So, in order to derive from the map a DNF expression of f , we just have to
find a collection of subsets of adjacent cells corresponding to implicants of f
and covering all the true points of f . Each such collection generates a different
DNF of f . For instance, the pairs of cells ((0, 00), (0, 01)), ((0, 01), (0, 11)), and
((1, 01), (1, 11)) simultaneously cover all the true points of f and generate the DNF
$\phi = \bar{x}_1 \bar{x}_2 \vee \bar{x}_1 x_3 \vee x_1 x_3$.
Alternatively, the true points can be covered by the pair ((0, 00), (0, 01)) and by
the square ((0, 01), (0, 11), (1, 01), (1, 11)), thus giving rise to the DNF
$\psi = \bar{x}_1 \bar{x}_2 \vee x_3$.

Karnaugh maps have been mostly used by electrical engineers to identify short
(irredundant, prime) DNFs of Boolean functions of a small number of variables.
Extensions of this problem to arbitrary functions will be discussed in Section 3.3.
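
As an illustration, the following Python sketch rebuilds the Karnaugh map of Table 1.2 by brute-force enumeration and checks that the three-term cover φ of Example 1.23 represents the same function as the two-term cover ψ. The function is taken to be $\bar{x}_1 \bar{x}_2 \vee x_3$, as in the DNF ψ; the names `karnaugh_map` and `same_function` are ours and serve only this illustration.

```python
from itertools import product

def f(x1, x2, x3):
    """The function of Example 1.23, written as psi = (not x1)(not x2) OR x3."""
    return int((not x1 and not x2) or x3)

def phi(x1, x2, x3):
    """The three-term cover (not x1)(not x2) OR (not x1) x3 OR x1 x3."""
    return int((not x1 and not x2) or (not x1 and x3) or (x1 and x3))

# Gray-code column order: consecutive columns differ in exactly one bit,
# and so do the last and first columns (torus wrap-around).
GRAY_COLUMNS = [(0, 0), (0, 1), (1, 1), (1, 0)]

def karnaugh_map(func):
    """Rows are indexed by x1, columns by (x2, x3) in Gray-code order."""
    return [[func(x1, x2, x3) for (x2, x3) in GRAY_COLUMNS] for x1 in (0, 1)]

def same_function(g, h, n=3):
    """Compare two functions on all 2^n Boolean points."""
    return all(g(*p) == h(*p) for p in product((0, 1), repeat=n))
```

Here `karnaugh_map(f)` reproduces the two rows of Table 1.2, and `same_function(f, phi)` confirms that both covers describe the same function.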

1.10 Monotone Boolean functions


In this section, we introduce one of the most important classes of Boolean func-
tions, namely, the class of monotone functions, which subsumes several other
special classes of functions studied further in this book. We establish some of the
fundamental properties of monotone functions and of their normal forms. Many
other properties of monotone functions will be uncovered in subsequent chapters
(see also Korshunov [580] for a long survey devoted to monotone functions).

1.10.1 Definitions and examples


“Monotonically increasing” and “monotonically decreasing” real-valued functions
are classical objects of study in elementary calculus. The following definition
attempts to capture similar concepts in a Boolean framework.
Definition 1.24. Let f be a Boolean function on $\mathcal{B}^n$, and let $k \in \{1, 2, \ldots, n\}$. We
say that f is positive (respectively, negative) in the variable $x_k$ if $f_{|x_k=0} \le f_{|x_k=1}$
(respectively, $f_{|x_k=0} \ge f_{|x_k=1}$). We say that f is monotone in $x_k$ if f is either
positive or negative in $x_k$.
Thus, when f is positive in xk , changing the value of xk from 0 to 1 (while
keeping the other variables fixed) cannot change the value of f from 1 to 0.
Definition 1.25. A Boolean function is positive (respectively, negative) if it is
positive (respectively, negative) in each of its variables. The function is monotone
if it is monotone in each of its variables.
Example 1.24. The function $f(x_1, x_2, x_3) = \bar{x}_1 \bar{x}_2 \vee x_3$, whose truth table was
given in Example 1.1, is negative in $x_1$, negative in $x_2$, and positive in $x_3$. Hence,
f is monotone, but it is neither positive nor negative.
The function $h(x, y) = x\bar{y} \vee \bar{x}y$ is neither monotone in x nor monotone in y.
For instance, to see that h is not positive in x, observe that h(0, 1) = 1, whereas
h(1, 1) = 0, and hence $h_{|x=0} = y \not\le h_{|x=1} = \bar{y}$.
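
Definition 1.24 can be checked mechanically on small examples. The sketch below compares the two restrictions of a function with respect to one variable on all points of $\mathcal{B}^{n-1}$; it is a brute-force illustration (exponential in n), and the helper name `sign_in_variable` and its return labels are our own.

```python
from itertools import product

def sign_in_variable(func, n, k):
    """Classify func (on n Boolean variables) with respect to variable k (0-based).

    Returns 'positive', 'negative', 'both' (func does not depend on the
    variable), or 'neither' (func is not monotone in the variable).
    """
    positive = negative = True
    for rest in product((0, 1), repeat=n - 1):
        low = func(*rest[:k], 0, *rest[k:])    # restriction with x_k = 0
        high = func(*rest[:k], 1, *rest[k:])   # restriction with x_k = 1
        if low > high:
            positive = False
        if low < high:
            negative = False
    if positive and negative:
        return "both"
    if positive:
        return "positive"
    return "negative" if negative else "neither"

def f(x1, x2, x3):
    """f = (not x1)(not x2) OR x3, as in Example 1.24."""
    return int((not x1 and not x2) or x3)

def h(x, y):
    """h = x (not y) OR (not x) y, as in Example 1.24."""
    return int((x and not y) or (not x and y))
```

Applied to f, the classification is negative, negative, positive for the three variables; applied to h, it is "neither" for both variables, in agreement with Example 1.24.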

Application 1.7. (Voting theory.) Remember the decision-making situation
sketched in Application 1.3. Voting rules are usually designed in such a way that
the outcome of a vote cannot switch from “Yes” to “No” when any single player’s
vote switches from “No” to “Yes”. For this reason, simple games are most ade-
quately modeled by positive Boolean functions. 

Application 1.8. (Reliability theory.) In the context described in Application 1.4,
it is rather natural to assume that a currently working system does not fail when we
replace a defective component by an operative one. Therefore, a common hypoth-
esis in reliability theory is that structure functions of complex systems are positive
Boolean functions. 

Application 1.9. (Graphs and hypergraphs.) The stability function of a hypergraph,
as defined in Application 1.5, is a positive Boolean function since every
subset of a stable set is stable. 
As is too often the case in the Boolean literature, the terminology established in
Definitions 1.24 and 1.25 is not completely standardized, and authors working in
different fields have proposed several variants. So, for example, monotone func-
tions are also called unate or 1-monotone in the electrical engineering and threshold
logic literature. Computer scientists usually reserve the qualifier “monotone” for
what we call “positive” functions, and so forth.
Notice that, in many applications, the distinction between positive and negative
variables (and hence, between positive and monotone functions) turns out to be
irrelevant. This holds by virtue of the following fact.
Theorem 1.19. Let f be a Boolean function on $\mathcal{B}^n$, and let g be the function
defined by
$g(x_1, x_2, \ldots, x_n) = f(\bar{x}_1, x_2, \ldots, x_n)$
for all $(x_1, x_2, \ldots, x_n) \in \mathcal{B}^n$. Then, g is positive in the variable $x_1$ if and only if f
is negative in $x_1$.
Proof. This is a trivial consequence of Definition 1.24. 

So, when a monotone function is neither positive nor negative (as in the pre-
ceding Example 1.24), it can always be brought to one of these two forms by an
elementary change of variables. This suggests that, in many cases, it is sufficient to
study the properties of positive functions to understand the properties of monotone
functions. This is our point of view in the next sections.
Let us give a characterization of positive functions that can be seen as a simple
restatement of Definitions 1.24 and 1.25. For two points X = (x1 , x2 , . . . , xn ) and
Y = (y1 , y2 , . . . , yn ) in B n , we write X ≤ Y if xi ≤ yi for all i = 1, 2, . . . , n.
Theorem 1.20. A Boolean function f on B n is positive if and only if f (X) ≤ f (Y )
for all X, Y ∈ Bn such that X ≤ Y .
Proof. The “if” part of the statement is trivial, and the “only if” part is easily
established by induction on the number of components of X and Y such that
xi < yi . 

1.10.2 DNFs and prime implicants of positive functions


Let us now try to understand the main features of positive functions in terms of their
DNF representations and their prime implicants. To this effect, we first introduce
some remarkable classes of disjunctive normal forms.
Definition 1.26. Let ψ(x1 , x2 , . . . , xn ) be a DNF, and let k ∈ {1, 2, . . . , n}. We say
that
• ψ is positive (respectively, negative) in the variable $x_k$ if the complemented
literal $\bar{x}_k$ (respectively, uncomplemented literal $x_k$) does not appear in ψ;
• ψ is monotone in the variable xk if ψ is either positive or negative in xk ;
• ψ is positive (respectively, negative) if ψ is positive (respectively, negative)
in each of its variables;
• ψ is monotone if ψ is either positive or negative in each of its variables.
Example 1.25. Every elementary conjunction is monotone (since each variable
appears at most once in it). The DNF $\phi(x, y, z) = xy \vee x\bar{y}\bar{z} \vee xz$ is positive in x
and neither positive nor negative in y and z. The DNF $\theta(x, y, z) = xy \vee x\bar{z} \vee y\bar{z}$
is monotone (as it is positive in x, y and negative in z), but it is neither positive
nor negative. The DNF $\psi(x, y, z, u) = xy \vee xzu \vee yz \vee yu$ is positive.
It is important to realize that a nonpositive (or even nonmonotone) DNF may
very well represent a positive function. The DNF φ in Example 1.25 provides
an example: Indeed, this DNF can be checked to represent the monotone func-
tion f (x, y, z) = x. The following result spells out the relation between positive
functions and positive DNFs.
Theorem 1.21. Let f be a Boolean function on $\mathcal{B}^n$, and let $k \in \{1, 2, \ldots, n\}$. The
following statements are equivalent:
(1) f is positive in the variable $x_k$.
(2) The literal $\bar{x}_k$ does not appear in any prime implicant of f.
(3) f has a DNF representation in which the literal $\bar{x}_k$ does not appear.

Proof. To see that the first assertion implies the second one, consider any prime
implicant of f, say, $C_{AB} = \bigwedge_{i \in A} x_i \bigwedge_{j \in B} \bar{x}_j$, and assume that $k \in B$. Since $C_{AB}$
is prime, the conjunction C obtained by deleting $\bar{x}_k$ from $C_{AB}$, namely,

$C = \bigwedge_{i \in A} x_i \bigwedge_{j \in B \setminus \{k\}} \bar{x}_j$,

is not an implicant of f. Therefore, there exists a point $X^* \in \mathcal{B}^n$ such that $C(X^*) = 1$
and $f(X^*) = 0$. Since $C_{AB}$ is an implicant of f, this implies that $C_{AB}(X^*) = 0$,
and hence, $x_k^* = 1$. Consider now the point $Y^* \in \mathcal{B}^n$ defined by $y_i^* = x_i^*$ for $i \ne k$
and $y_k^* = 0$. Then, $C_{AB}(Y^*) = 1$ implies $f(Y^*) = 1$. This establishes that f is not
positive in the variable $x_k$, as required.
By Theorem 1.13, the second assertion implies the third one.
Assume now that the third assertion holds, and let $\phi = \bigvee_{j=1}^{m} C_j$ be any DNF
of f in which the literal $\bar{x}_k$ does not appear. Recall from Theorem 1.15 that the
expression obtained by substituting 1 (respectively, 0) for every occurrence of $x_k$ in
φ represents $f_{|x_k=1}$ (respectively, $f_{|x_k=0}$). Now, if a term $C_j$ does not involve $x_k$,
then the substitution has no effect on this term. On the other hand, if $C_j$ involves
$x_k$ (in uncomplemented form, by hypothesis), then this term vanishes when we
substitute 0 for $x_k$. This directly implies that $f_{|x_k=0} \le f_{|x_k=1}$, and hence, f is positive
in $x_k$.

As an immediate corollary of Theorem 1.21, the prime implicants of a positive
Boolean function do not involve any complemented variables; therefore, every
positive function has at least one positive DNF. This property can actually be
stated more accurately. Before we do so, we first establish a result that facilitates
the comparison of positive DNFs.

Theorem 1.22. Let φ and ψ be two DNFs and assume that ψ is positive. Then, φ
implies ψ if and only if each term of φ is absorbed by some term of ψ.

Proof. We suppose, without loss of generality, that φ and ψ are expressions in the
same n variables. The “if” part of the statement holds even when ψ is not positive,
as an easy corollary of Theorem 1.11. For the converse statement, let us assume that
φ implies ψ, and let us consider some term of φ, say, $C_k = \bigwedge_{i \in A} x_i \bigwedge_{j \in B} \bar{x}_j$. Consider
the characteristic vector of A, denoted $e_A$. There holds $C_k(e_A) = \phi(e_A) = 1$.
Thus, $\psi(e_A) = 1$ (since φ ≤ ψ), and therefore, some term of ψ must take value 1
at the point $e_A$: Denote this term by $C_j = \bigwedge_{i \in F} x_i$ (remember that ψ is positive).
Now, since $C_j(e_A) = 1$, we conclude that F ⊆ A, and hence, $C_j$ absorbs $C_k$
as required.

As a consequence of Theorem 1.22, one can easily check in polynomial time
whether an arbitrary DNF φ implies a positive DNF ψ (the same question is much
more difficult to answer when both φ and ψ are arbitrary DNFs; see the comments
following Theorem 1.10).
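
The polynomial-time test suggested by Theorem 1.22 can be sketched as follows. The representation of a term as a pair of index sets, and all function names, are assumptions made for this illustration; terms are assumed to be elementary conjunctions (no variable occurs both complemented and uncomplemented). A brute-force comparison is included only as a cross-check.

```python
from itertools import product

# A term is a pair (pos, neg) of frozensets of variable indices:
# pos holds the uncomplemented variables, neg the complemented ones.

def term(pos, neg=()):
    return (frozenset(pos), frozenset(neg))

def absorbs(c, d):
    """True if term c absorbs term d, i.e., every literal of c occurs in d."""
    return c[0] <= d[0] and c[1] <= d[1]

def implies_positive(phi, psi):
    """Decide phi <= psi for a positive DNF psi via the absorption criterion."""
    if any(neg for _pos, neg in psi):
        raise ValueError("psi must be a positive DNF")
    return all(any(absorbs(c, d) for c in psi) for d in phi)

def evaluate(dnf, point):
    """Value (0 or 1) of the DNF at a 0-1 point given as a tuple."""
    return int(any(all(point[i] for i in pos) and not any(point[j] for j in neg)
                   for pos, neg in dnf))

def implies_brute_force(phi, psi, n):
    """Reference check by enumeration of all 2^n points (exponential)."""
    return all(evaluate(phi, p) <= evaluate(psi, p)
               for p in product((0, 1), repeat=n))

# Variables x, y, z are numbered 0, 1, 2.
PHI = [term([0, 1]), term([0], [1, 2]), term([0, 2])]  # xy v x(not y)(not z) v xz
PSI_X = [term([0])]                                     # x
PSI_OTHER = [term([0, 1]), term([1, 2])]                # xy v yz
```

The absorption test runs in time proportional to the product of the sizes of the two DNFs, whereas the enumeration grows exponentially with the number of variables.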
Beside its algorithmic consequences, Theorem 1.22 also allows us to derive
one of the fundamental properties of DNFs of positive functions (remember
Definitions 1.19 and 1.20):
Theorem 1.23. The complete DNF of a positive Boolean function f is positive
and irredundant; it is the unique prime DNF of f .
Proof. Let f be a positive function, let $P_1, P_2, \ldots, P_m$ be its prime implicants, and
let $\phi = \bigvee_{k=1}^{m} P_k$ denote the complete DNF of f. By Theorem 1.21, φ is positive.
Consider now an arbitrary prime expression of f, say, $\psi = \bigvee_{k=1}^{r} P_k$, where
1 ≤ r ≤ m. Since f = φ = ψ, we deduce from Theorem 1.22 that each term of φ is
absorbed by some term of ψ. In particular, if m > r, then $P_m$ must be absorbed by
some other prime implicant $P_k$ with k ≤ r. This, however, contradicts the primality
of $P_m$. Hence, we conclude that r = m, which shows that φ is irredundant and is
the unique prime DNF of f.

Theorem 1.23 is due to Quine [767]. It is important because it shows that the
complete DNF provides a “canonical” shortest DNF representation of a positive
Boolean function: Since the shortest DNF representation of a Boolean function is
necessarily prime and irredundant (see the comments following Definition 1.20),
no other DNF representation of a positive function can be as short as its com-
plete DNF. Notice that this unicity result does not hold in general for nonpositive
functions, as illustrated by the example below.
Example 1.26. The DNFs $\psi_1 = \bar{x}y \vee \bar{y}z \vee x\bar{z}$ and $\psi_2 = \bar{x}z \vee y\bar{z} \vee x\bar{y}$ are two
shortest (prime and irredundant) expressions of the same function.
We conclude this section with a useful result that extends Theorem 1.23: This
result states that the complete DNF of a positive function can be obtained by first
dropping the complemented literals from any DNF representation of the function
and then deleting the redundant implicants from the resulting expression.
    
Theorem 1.24. Let $\phi = \bigvee_{k=1}^{m} \left( \bigwedge_{i \in A_k} x_i \bigwedge_{j \in B_k} \bar{x}_j \right)$ be a DNF representation
of a positive Boolean function f. Then, $\psi = \bigvee_{k=1}^{m} \bigwedge_{i \in A_k} x_i$ is a positive DNF
representation of f. The prime implicants of f are the terms of ψ which are not
absorbed by other terms of ψ.
Proof. Clearly, f = φ ≤ ψ (see Theorem 1.22). To prove the reverse inequality,
consider any point $X^* = (x_1^*, x_2^*, \ldots, x_n^*) \in \mathcal{B}^n$ such that $\psi(X^*) = 1$. There is a term
of ψ that takes value 1 at the point $X^*$, or equivalently, there exists $k \in \{1, 2, \ldots, m\}$
such that $x_i^* = 1$ for all $i \in A_k$. If $e_{A_k}$ is the characteristic vector of $A_k$, then
$\phi(e_{A_k}) = f(e_{A_k}) = 1$. Moreover, $e_{A_k} \le X^*$ and therefore, by positivity of f,
$f(X^*) = 1$. This establishes that ψ ≤ f, and thus f = ψ as required.
For the second part of the statement, consider the complete DNF of f, say, $\psi^*$.
Since $\psi = \psi^*$ and ψ is positive, Theorem 1.22 implies that every term of $\psi^*$ is
absorbed by some term of ψ. However, the terms of ψ are implicants of f, and
the terms of $\psi^*$ are prime implicants of f. Hence, all prime implicants of f must
appear among the terms of ψ. This completes the proof.

Example 1.27. As already observed in the comments following Example 1.25, the
DNF $\phi(x, y, z) = xy \vee x\bar{y}\bar{z} \vee xz$ represents a positive function; call it f. An alternative
representation of f is derived by deleting all complemented literals from φ.
In this way, we obtain the redundant DNF ψ = xy ∨ x ∨ xz, and we conclude that
x is the only prime implicant of f.
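
The two-step procedure of Theorem 1.24 (drop the complemented literals, then delete the absorbed terms) can be sketched in a few lines of Python. The term encoding and the function name are our own, and the positivity of the represented function is assumed, not verified: as discussed in Section 1.11, checking it is a hard problem in general.

```python
def prime_implicants_of_positive(dnf):
    """Prime implicants of a function ASSUMED to be positive.

    dnf is an iterable of (pos, neg) pairs, where pos and neg are the sets of
    uncomplemented and complemented variables of a term. The result is a set
    of frozensets, each listing the variables of one prime implicant.
    """
    # Step 1: drop all complemented literals.
    terms = {frozenset(pos) for pos, _neg in dnf}
    # Step 2: keep the terms that are not absorbed by another term, that is,
    # those whose variable set is minimal with respect to inclusion.
    return {t for t in terms if not any(s < t for s in terms)}

# phi = xy v x(not y)(not z) v xz from Example 1.27
PHI = [({"x", "y"}, set()), ({"x"}, {"y", "z"}), ({"x", "z"}, set())]

# psi = xy v xzu v yz v yu from Example 1.25 (already positive)
PSI = [({"x", "y"}, set()), ({"x", "z", "u"}, set()),
       ({"y", "z"}, set()), ({"y", "u"}, set())]
```

For `PHI` the procedure returns the single prime implicant x, as in Example 1.27; for `PSI` no term absorbs another, so all four terms are prime implicants.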

1.10.3 Minimal true points and maximal false points


The definition of the true points (respectively, false points) of an arbitrary Boolean
function has been stated in Definition 1.1: These are simply the points in which the
function takes value 1 (respectively, 0). Let us now consider a further refinement
of these concepts.
Definition 1.27. Let f be a Boolean function on $\mathcal{B}^n$, and let $X \in \mathcal{B}^n$. We say that
X is a minimal true point of f if X is a true point of f and if there is no true point
Y of f such that Y ≤ X and X ≠ Y. Similarly, we say that X is a maximal false
point of f if X is a false point of f and if there is no false point Y of f such that
X ≤ Y and X ≠ Y. We denote by minT(f) (respectively, maxF(f)) the set of
minimal true points (respectively, maximal false points) of f.
Minimal true points and maximal false points have been defined for arbi-
trary Boolean functions. However, these concepts are mostly relevant for positive
functions, as evidenced by the following observation:
Theorem 1.25. Let f be a positive Boolean function on B n and let Y ∈ Bn .
(1) Y is a true point of f if and only if there exists a minimal true point X of f
such that X ≤ Y .
(2) Y is a false point of f if and only if there exists a maximal false point X of
f such that Y ≤ X.
Proof. The “only if” implication is trivial in both cases (and is independent of the
positivity assumption). The converse implications are straightforward corollaries
of Theorem 1.20. 
As a consequence of Theorem 1.25, positive functions are completely characterized
by their set of minimal true points (or maximal false points). More precisely, if
S is a subset of $\mathcal{B}^n$ such that any two distinct points in S are incomparable with
respect to the partial order ≤, then there is a unique positive function which has S
as its set of minimal true points and there is a unique positive function which has
S as its set of maximal false points.
At this point, it is interesting to remember that, as we discussed in Section
1.7, Boolean functions are similarly characterized by their list of prime implicants
(see Theorem 1.13 and the comments following it). This analogy is not fortuitous:
Indeed, there exists a simple, but fundamental, one-to-one correspondence between
the minimal true points and the prime implicants of a positive function.

Theorem 1.26. Let f be a positive Boolean function on $\mathcal{B}^n$, let $C_A = \bigwedge_{i \in A} x_i$ be
an elementary conjunction and let $e_A$ be the characteristic vector of A. Then,
(1) CA is an implicant of f if and only if eA is a true point of f .
(2) CA is a prime implicant of f if and only if eA is a minimal true point of f .
Proof. Consider the first statement. If $C_A$ is an implicant of f, then clearly $e_A$ is
a true point of f (this holds even if f is not positive). Conversely, if $e_A$ is a true
point of f, then $\bigwedge_{i \in A} x_i \bigwedge_{j \notin A} \bar{x}_j$ is an implicant of f. Then, by positivity of f, $C_A$
also is an implicant of f (by the same argument as in the proof of Theorem 1.24).
For a proof of the second statement, consider an elementary conjunction
$C_B = \bigwedge_{i \in B} x_i$ and the characteristic vector $e_B$ of B. Observe that $C_A \le C_B$ if
and only if B ⊆ A, that is, if and only if $e_B \le e_A$. This observation, together with
the first statement, implies that $C_A$ is a prime implicant of f if and only if $e_A$ is a
minimal true point of f.

Example 1.28. Consider the positive function f (x, y, z, u) = xy ∨ xzu ∨ yz. From
Theorem 1.26, we conclude that the minimal true points of f are (1,1,0,0), (1,0,1,1)
and (0,1,1,0).
Theorem 1.26 crucially depends on the positivity assumption. To see this, consider
the function $g(x, y, z) = xy \vee \bar{x}\bar{z}$. The point (1,1,0) is a true point of g derived
from the prime implicant xy, as explained in Theorem 1.26. However, (0,0,0) is
the unique minimal true point of g.
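
The minimal true points of Example 1.28 can be recovered by direct enumeration, which gives a simple sanity check of Theorem 1.26. The sketch below is a brute-force illustration with names of our own choosing; the nonpositive function g is read here as $xy \vee \bar{x}\bar{z}$.

```python
from itertools import product

def f(x, y, z, u):
    """Positive function f = xy v xzu v yz of Example 1.28."""
    return int((x and y) or (x and z and u) or (y and z))

def g(x, y, z):
    """Nonpositive function g = xy v (not x)(not z)."""
    return int((x and y) or (not x and not z))

def leq(p, q):
    """Componentwise partial order on Boolean points."""
    return all(a <= b for a, b in zip(p, q))

def minimal_true_points(func, n):
    """Sorted list of the true points below which there is no other true point."""
    true_points = [p for p in product((0, 1), repeat=n) if func(*p)]
    return sorted(p for p in true_points
                  if not any(q != p and leq(q, p) for q in true_points))
```

For f, the enumeration returns the characteristic vectors of the three prime implicants; for g, it returns only (0,0,0), so the correspondence with prime implicants is lost.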

A similar one-to-one correspondence holds between the maximal false points
and the prime implicates of a positive function.

Theorem 1.27. Let f be a positive Boolean function on $\mathcal{B}^n$, let $D_A = \bigvee_{i \in A} x_i$
be an elementary disjunction, and let $e_{N \setminus A}$ be the characteristic vector of $N \setminus A$.
Then,
(1) DA is an implicate of f if and only if eN \A is a false point of f .
(2) DA is a prime implicate of f if and only if eN \A is a maximal false point
of f .
Proof. It suffices to mimic the proof of Theorem 1.26. Alternatively, Theorem 1.27
can be derived as a corollary of Theorem 1.26 via De Morgan’s laws or simple
duality arguments. 

Example 1.29. The function f given in Example 1.28 has four prime implicates,
namely, (x ∨ y), (x ∨ z), (y ∨ z) and (y ∨ u). Accordingly, it has four maximal false
points, namely, (0,0,1,1), (0,1,0,1), (1,0,0,1) and (1,0,1,0). 
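
The correspondence of Theorem 1.27 can be illustrated in the same brute-force manner: enumerate the maximal false points of the function of Example 1.28 and read off, for each of them, the disjunction $D_A$ whose variables are the zero coordinates of the point. The helper names below are hypothetical.

```python
from itertools import product

def f(x, y, z, u):
    """Positive function f = xy v xzu v yz of Example 1.28."""
    return int((x and y) or (x and z and u) or (y and z))

def leq(p, q):
    """Componentwise partial order on Boolean points."""
    return all(a <= b for a, b in zip(p, q))

def maximal_false_points(func, n):
    """Sorted list of the false points above which there is no other false point."""
    false_points = [p for p in product((0, 1), repeat=n) if not func(*p)]
    return sorted(p for p in false_points
                  if not any(q != p and leq(p, q) for q in false_points))

def implicate_variables(point, names):
    """Variables of D_A when point = e_{N \\ A}: A is the set of zero coordinates."""
    return [v for v, bit in zip(names, point) if bit == 0]
```

With `names = "xyzu"`, the four maximal false points are mapped to the variable sets of the four prime implicates listed in Example 1.29.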

1.11 Recognition of functional and DNF properties


In this section, we concentrate on the broad algorithmic issue of deciding whether
a given function or expression belongs to a particular class. We refer the reader to
Appendix B for a brief primer on computational complexity, and for a reminder
of concepts like NP-completeness, NP-hardness, and so on.
If C is a set of Boolean expressions, we define the decision problem:

DNF Membership in C
Instance: A DNF expression φ.
Question: Is φ in C?

Similarly, if C is a set of Boolean functions, then we define the decision problem:

Functional Membership in C
Instance: A DNF expression of a function f .
Question: Is f in C?

Roughly speaking, a DNF membership problem bears on the DNF itself,
whereas a functional membership problem bears on the function represented by
the DNF. The distinction between both types of problems, however, is not as clear-
cut as it may seem, since every functional membership problem can be viewed
as a DNF membership problem of a special type: Indeed, if φ is a DNF expres-
sion of the function f , then f is in the class C if and only if φ is in the class
C ∗ = {ψ | ψ is a DNF representation of a function in C}.
Consider, for instance, the following classes of Boolean expressions and
Boolean functions:

• The class $T_{DNF}$ of all DNF expressions of the constant function 1 (on
an arbitrary number of arguments) and the class T of all constant functions
$\{1_n \mid n \in \mathbb{N}\}$.
• The class $Z_{DNF}$ of all DNF expressions of the constant function 0 (on
an arbitrary number of arguments) and the class Z of all constant functions
$\{0_n \mid n \in \mathbb{N}\}$.
Since $T_{DNF} = T^*$ and $Z_{DNF} = Z^*$, it is obvious that the functional membership
problems associated with T and Z are equivalent to the DNF membership
problems associated with the classes $T_{DNF}$ and $Z_{DNF}$, respectively.
On the other hand, define now
• the class D+ of all positive DNFs;
• the class F+ of all positive functions.
The relationship between $D_+$ and $F_+$ is not trivial, since a positive function
may very well be represented by a nonpositive DNF. In particular, $D_+ \ne F_+^*$ and
the DNF membership problem associated with the class $D_+$ does not reduce to a
functional membership problem. ($D_+$ is, in fact, defined by a purely syntactical
property.)
Similarly, consider
• the class Dk of all DNFs of degree at most k (k ∈ N),
• the class Fk of all functions representable by a DNF in Dk (k ∈ N).
Here again, $D_k \ne F_k^*$, and the DNF membership problem associated with $D_k$
is not equivalent to the functional membership problem associated with $F_k$.
Now, as one may expect, the difficulty of membership problems depends to a
large extent on the specification of the class C. For instance, it is quite easy to test
whether a DNF is identically 0, or has degree at most k, or is positive.
Theorem 1.28. DNF Membership in $Z_{DNF}$ and Functional Membership in
Z can be tested in constant time. DNF Membership in $D_+$ and DNF Membership
in $D_k$ ($k \in \mathbb{N}$) can be tested in linear time.
Proof. Every elementary conjunction takes value 1 in at least one point. Conse-
quently, a DNF φ is identically 0 if and only if it has no term, a condition which
can be tested in constant time.
Furthermore, computing the degree of a DNF, or checking whether a DNF is
positive, only requires linear time in the size of the DNF. 
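
The syntactic tests behind Theorem 1.28 amount to a single pass over the terms. In the sketch below, a DNF is a list of (pos, neg) pairs of variable sets; this encoding and the function names are assumptions made for illustration, and every term is assumed to be an elementary conjunction.

```python
def is_identically_zero(dnf):
    """A DNF of elementary conjunctions is identically 0 iff it has no term."""
    return len(dnf) == 0

def degree(dnf):
    """Largest number of literals in a term (0 for the empty DNF)."""
    return max((len(pos) + len(neg) for pos, neg in dnf), default=0)

def has_degree_at_most(dnf, k):
    """Syntactic membership test for the class of DNFs of degree at most k."""
    return degree(dnf) <= k

def is_positive_dnf(dnf):
    """Syntactic membership test for positive DNFs: no complemented literal."""
    return all(len(neg) == 0 for _pos, neg in dnf)

# phi = xy v x(not y)(not z) v xz (Example 1.25): a nonpositive DNF of degree 3
PHI = [({"x", "y"}, set()), ({"x"}, {"y", "z"}), ({"x", "z"}, set())]
```

Note that `is_positive_dnf(PHI)` is false although `PHI` represents the positive function x: the test bears on the expression, not on the function.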

As illustrated by Theorem 1.28, many DNF membership problems are easy to
solve. By contrast, however, functional membership problems tend to be difficult:
Intuitively, we might say that this is because most properties of Boolean functions
are not reflected in a straightforward way in their normal form representations.
As a first manifestation of this phenomenon, we can formulate a result that is
a simple restatement of Cook’s fundamental theorem on NP-completeness [208].
The restatement applies to the so-called tautology problem, that is, to the functional
membership problem in T :
Theorem 1.29. The tautology problem Functional Membership in T is co-NP-
complete.
Proof. Cook’s theorem was originally stated and proved in the following form (see
also Theorem 2.1 in Chapter 2): Given a CNF $\psi = \bigwedge_{k=1}^{m} \left( \bigvee_{i \in A_k} x_i \vee \bigvee_{j \in B_k} \bar{x}_j \right)$ in
n variables, it is NP-complete to decide whether there exists a point $X^* \in \mathcal{B}^n$ such
that $\psi(X^*) = 1$. Trivially, the answer to this decision problem is affirmative if and
only if the DNF $\bar{\psi} = \bigvee_{k=1}^{m} \left( \bigwedge_{i \in A_k} \bar{x}_i \bigwedge_{j \in B_k} x_j \right)$ is not in T.

We now extend Theorem 1.29 to a broad category of functional membership
problems. This result can be found in Hegedűs and Megiddo [481]; it relies on a
simple extension of an argument originally proposed by Peled and Simeone [735].

Theorem 1.30. Let C be any class of Boolean functions with the following
properties:
(a) There exists a function g such that $g \notin C$.
(b) For all n ∈ N, the constant function 1n is in C.
(c) C is closed under restrictions; that is, if f is a function in C, then all functions
obtained by fixing some variables of f to either 0 or 1 are also in C.
Then, the problem Functional Membership in C is NP-hard.

Proof. Let $g \notin C$ be a function of m variables, and let γ be an arbitrary DNF
representation of g. We are going to reduce the problem Functional Membership
in T to Functional Membership in C. Let φ be a DNF in n variables (defining
an instance of Functional Membership in T) and let us construct a new DNF
ψ on $\mathcal{B}^{n+m}$:
$\psi(X, Y) = \phi(X) \vee \gamma(Y)$,
where X and Y are disjoint sets of n and m variables, respectively. Notice that
ψ can be constructed in time polynomial in the length of φ, since γ is fixed
independently of φ. We claim that φ represents a tautology (that is, $\phi = 1_n$) if and
only if ψ represents a function in C.
Indeed, if $\phi = 1_n$, then $\psi = 1_{n+m}$ and, by virtue of condition (b), ψ represents
a function in C. Conversely, if φ is not identically 1, then there exists a point
$X^* \in \mathcal{B}^n$ such that $\phi(X^*) = 0$, so that $\psi(X^*, Y) = \gamma(Y)$ for all $Y \in \mathcal{B}^m$. Thus, γ is
a restriction of ψ and condition (c) implies that ψ does not represent a function in
C (remember that γ represents $g \notin C$). This proves the claim and the theorem.
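
The construction used in this proof is easy to experiment with. The sketch below instantiates it for the class of positive functions, taking for g the one-variable function $\bar{y}$ (which is not positive); both choices, as well as all names, are ours. The tautology and positivity tests are done by enumeration and serve only to confirm the claim on toy instances.

```python
from itertools import product

def evaluate(dnf, point):
    """Value of a DNF given as a list of (pos, neg) index sets."""
    return int(any(all(point[i] for i in pos) and not any(point[j] for j in neg)
                   for pos, neg in dnf))

def is_tautology(dnf, n):
    return all(evaluate(dnf, p) for p in product((0, 1), repeat=n))

def represents_positive_function(dnf, n):
    """Brute-force test: raising one coordinate never lowers the value."""
    for p in product((0, 1), repeat=n):
        for k in range(n):
            if p[k] == 0:
                q = p[:k] + (1,) + p[k + 1:]
                if evaluate(dnf, p) > evaluate(dnf, q):
                    return False
    return True

def combine(phi, n, gamma):
    """DNF of psi(X, Y) = phi(X) v gamma(Y); the Y-variables get indices n, n+1, ..."""
    shifted = [(frozenset(i + n for i in pos), frozenset(j + n for j in neg))
               for pos, neg in gamma]
    return list(phi) + shifted

GAMMA = [(frozenset(), frozenset([0]))]                    # not y  (m = 1)
PHI_TAUT = [(frozenset([0]), frozenset()),
            (frozenset(), frozenset([0]))]                 # x v not x
PHI_NOT_TAUT = [(frozenset([0]), frozenset())]             # x
```

In both toy instances, φ is a tautology exactly when the combined DNF represents a positive function, as the claim in the proof predicts.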

In spite of its apparent simplicity, Theorem 1.30 is a very general result that
can be applied to numerous classes of interest due to the weakness of its premises.
Indeed, condition (a) is perfectly trivial because the membership question would
be vacuous without it. Condition (b) is quite weak as well: It is fulfilled by all
the classes introduced earlier in this section, except by Z (remember Theorem
1.28). Condition (c) is stronger than the first two, but it arises naturally in many
situations. In particular, the condition holds again for all the classes of functions
discussed above.
Without further knowledge about the class C, Theorem 1.30 does not allow us to
draw conclusions about NP-completeness or co-NP-completeness of the membership
problem. In any specific application of the theorem, however, we may know
that the problem is in NP or in co-NP, and we may strengthen the conclusions
accordingly. Several examples will be encountered further in the book. Some of
the results presented in Chapter 11 (characterizations by finite sets of functional
equations), in particular, imply that certain classes of functional membership prob-
lems are in co-NP. Also, Aizenstein et al. [13] investigate various relations between
the Functional Membership problem and query learnability of Boolean formu-
las, which allow them to derive a general criterion for Functional Membership
to be in co-NP.
Let us illustrate these comments on a few simple examples.

Theorem 1.31. The problem Functional Membership in $F_+$ is co-NP-complete.
The problem Functional Membership in $F_k$ is co-NP-complete for
all $k \in \mathbb{N}$.

Proof. As already observed, $F_+$ and $F_k$ ($k \in \mathbb{N}$) fulfill conditions (a)–(c) in
Theorem 1.30.
The problem Functional Membership in F+ is in co-NP: Indeed, to certify
that a function f is not in F+ , it suffices to exhibit two points X and Y such that
X ≤ Y , f (X) = 1 and f (Y ) = 0.
The problem Functional Membership in Fk is also in co-NP when k ≤ 2, as
follows from Theorem 11.4 and Theorem 11.5 in Chapter 11.
A similar argument does not apply when k ≥ 3 (because $F_k$ cannot be characterized
by a finite set of functional equations; see the comments at the end of
Chapter 11). However, Aizenstein et al. [13] were able to establish that Functional
Membership in $F_k$ is in co-NP for all $k \in \mathbb{N}$.
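
The co-NP certificate used in this proof for $F_+$ is a pair of points, and verifying it takes time linear in the size of the DNF. A minimal sketch, with a term encoding and function names of our own, reads as follows; the example DNF is θ from Example 1.25, read as $xy \vee x\bar{z} \vee y\bar{z}$.

```python
def evaluate(dnf, point):
    """Value of a DNF given as a list of (pos, neg) index sets."""
    return int(any(all(point[i] for i in pos) and not any(point[j] for j in neg)
                   for pos, neg in dnf))

def certifies_nonpositivity(dnf, x, y):
    """True if x <= y, f(x) = 1 and f(y) = 0 for the function f represented by dnf."""
    return (all(a <= b for a, b in zip(x, y))
            and evaluate(dnf, x) == 1
            and evaluate(dnf, y) == 0)

# theta = xy v x(not z) v y(not z), with x, y, z numbered 0, 1, 2
THETA = [(frozenset([0, 1]), frozenset()),
         (frozenset([0]), frozenset([2])),
         (frozenset([1]), frozenset([2]))]
```

The pair X = (1,0,0), Y = (1,0,1) is such a certificate for θ. Finding a certificate may be hard; only its verification is claimed to be easy.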

Let us finally observe that, even though they are not direct corollaries of The-
orem 1.30, some related complexity results may sometimes be derived from it as
well. For instance:

Theorem 1.32. Given a DNF expression $\psi(x_1, x_2, \ldots, x_n)$ of a function f and an
index $i \in \{1, \ldots, n\}$, it is NP-complete to decide whether the variable $x_i$ is essential
for f, and it is co-NP-complete to decide whether f is positive in $x_i$.

Proof. If there exists a polynomial algorithm to check whether a variable is
essential or not, then the same algorithm can be applied repeatedly (for every
variable) to decide in polynomial time whether a given function is identically 1
or not: a function without essential variables is constant, and its value can then
be read off at any single point. Thus, detecting essential variables is NP-hard.
Moreover, to show that $x_i$ is essential, it is enough to exhibit two points X and Y
that differ only in their i-th component such that $f(X) \ne f(Y)$. This establishes
that the problem is in NP. A similar reasoning shows that testing the positivity of
individual variables is co-NP-complete.
1.12 Other representations of Boolean functions


As we mentioned earlier, we mostly concentrate in this book on Boolean functions
represented by Boolean expressions, in particular by DNF and CNF expressions.
The applications presented in Section 1.13 demonstrate that this class of repre-
sentations is extremely rich and allows us to model and tackle a wide variety of
interesting problems.
However, many other representations of Boolean functions also exist and have
proved useful in various contexts. We briefly mention here some of the most
important ones.

1.12.1 Representations over GF(2)


Definition 1.28. The exclusive-or function, or parity function, is the Boolean
function ⊕: B2 → B defined by

$\oplus(x_1, x_2) = x_1 \bar{x}_2 \vee \bar{x}_1 x_2$

for all x1 , x2 ∈ B. We usually write x1 ⊕ x2 instead of ⊕(x1 , x2 ).


It is easy to check that, when viewed as a binary operator, ⊕ is commutative
and associative, that is, $x_1 \oplus x_2 = x_2 \oplus x_1$ and $(x_1 \oplus x_2) \oplus x_3 = x_1 \oplus (x_2 \oplus x_3)$ for all
$x_1, x_2, x_3 \in \mathcal{B}$. Also, for every $n \in \mathbb{N}_0$, the function $f(x_1, x_2, \ldots, x_n) = x_1 \oplus x_2 \oplus \cdots \oplus x_n$
takes value 1 exactly when the number of ones in the point $(x_1, x_2, \ldots, x_n) \in \mathcal{B}^n$
is odd. Actually, the operation ⊕ defines addition modulo 2 over the Galois field
GF(2) = ({0, 1}, ⊕, ∧).
It is well-known that every Boolean function can be represented uniquely as
a sum-of-products modulo 2. Namely, if we let P(N ) denote the power set of
N = {1, 2, . . . , n}, then:
Theorem 1.33. For every Boolean function f on $\mathcal{B}^n$, there exists a unique mapping
$c : P(N) \to \{0, 1\}$ such that
$f(x_1, x_2, \ldots, x_n) = \bigoplus_{A \in P(N)} c(A) \prod_{i \in A} x_i$. (1.16)

Proof. We provide a constructive proof from first principles. To establish the exis-
tence of the representation, we use induction on n. A representation of the form
(1.16) clearly exists when n = 0, or when n = 1 (since x = x ⊕ 1). For n > 1, the
existence of the representation directly follows from the trivial identity (note the
analogy with the Shannon expansion (1.15)):

f = f|xn =0 ⊕ xn f|xn =0 ⊕ xn f|xn =1 . (1.17)

Indeed, by induction, both f|xn =0 and f|xn =1 can be expressed in the form (1.16).
Substituting these expressions in (1.17) yields a sum-of-products modulo 2 that
may contain pairs of identical terms. In this case, these pairs of terms can be
removed using the identity x ⊕ x = 0.

To prove uniqueness, it suffices to observe that there are exactly $2^{2^n}$ expressions
of the form (1.16) and that this is also the number of Boolean functions on B^n .
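The existence argument is constructive, and it may help to see it run. The following Python sketch is only an illustration: the names `anf` and `eval_anf`, and the encoding of a Boolean function as a Python callable on 0-1 arguments, are assumptions of ours rather than notation used in this book. It applies identity (1.17) recursively and cancels pairs of identical terms with a symmetric difference.

```python
from itertools import product

def anf(f, n):
    """Return the subsets A (frozensets of 1-based indices) with c(A) = 1
    in the representation (1.16) of f, a callable on n arguments in {0, 1}."""
    if n == 0:
        return {frozenset()} if f() else set()
    # Restrictions of f to x_n = 0 and x_n = 1, viewed as functions of n-1 variables.
    f0 = lambda *x: f(*x, 0)
    f1 = lambda *x: f(*x, 1)
    c0, c1 = anf(f0, n - 1), anf(f1, n - 1)
    # Identity (1.17) rewritten as f = f0 + x_n (f0 + f1) modulo 2.
    # The symmetric difference c0 ^ c1 removes pairs of identical terms (x + x = 0).
    return c0 | {A | {n} for A in c0 ^ c1}

def eval_anf(terms, x):
    """Evaluate the sum-of-products modulo 2 described by `terms` at the 0-1 point x."""
    return sum(all(x[i - 1] for i in A) for A in terms) % 2

# x1 OR x2 becomes x1 + x2 + x1 x2 modulo 2.
print(anf(lambda a, b: a | b, 2))
```

Since the function is accessed only through its values, the sketch runs in time exponential in n; it is meant to mirror the proof, not to be efficient.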

Representations of Boolean functions over GF(2) are sometimes called Reed-Muller
expansions, or Zhegalkin polynomials, or algebraic normal forms. They
are a common tool in algebra (see, for instance, Pöschel and Rosenberg [752]), in
cryptography and coding theory (see, for instance, MacWilliams and Sloane [642] or
the survey by Carlet [170]), and in electrical engineering (see, for instance, Astola
and Stanković [35]; Davio, Deschamps and Thayse [259]). The concept of Boolean
derivative (introduced by Reed [782]) also plays a useful role in these applications.
Definition 1.29. Let f be a Boolean function on B^n , and let k ∈ {1, 2, . . . , n}. The
(Boolean) derivative of f with respect to xk is the function ∂f /∂xk : B^{n−1} → B
defined by

    $\frac{\partial f}{\partial x_k} = f_{|x_k=0} \oplus f_{|x_k=1}.$    (1.18)
Comparing (1.18) with (1.17), we see that ∂f /∂xk acts indeed like a formal
derivative. Also, it is quite obvious that a function f depends on its k-th variable
if and only if ∂f /∂xk ≠ 0. A complete theory of Boolean differential calculus can
be built on the basis of Definition 1.29; see [35, 259, 795, 862].
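As a small illustration of Definition 1.29, the following Python sketch computes the derivative pointwise and uses it to test whether a function depends on a given variable. The helper names `derivative` and `depends_on` are hypothetical, and functions are again assumed to be given as callables on 0-1 arguments.

```python
from itertools import product

def derivative(f, k):
    """Boolean derivative (1.18) of f with respect to its k-th argument (1-based),
    returned as a function of the remaining n-1 arguments."""
    def df(*y):
        x0 = y[:k - 1] + (0,) + y[k - 1:]   # point with x_k = 0
        x1 = y[:k - 1] + (1,) + y[k - 1:]   # point with x_k = 1
        return f(*x0) ^ f(*x1)
    return df

def depends_on(f, n, k):
    """f depends on x_k exactly when its derivative is not identically zero."""
    return any(derivative(f, k)(*y) for y in product((0, 1), repeat=n - 1))

# x1 x2 OR x1 (NOT x2) equals x1, so only the first variable matters.
f = lambda a, b, c: (a & b) | (a & (1 - b))
print([depends_on(f, 3, k) for k in (1, 2, 3)])
```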

1.12.2 Representations over the reals


A pseudo-Boolean function of n variables is a function on B n into R, that is, a
real-valued function of Boolean variables. Pseudo-Boolean functions provide a
far-reaching extension of the class of Boolean functions and are discussed in more
detail in Chapter 13. For now, we simply need the following fact (compare with
Theorem 1.33):
Theorem 1.34. For every pseudo-Boolean function f on B^n , there exists a unique
mapping c : P(N ) → R such that

    $f(x_1, x_2, \ldots, x_n) = \sum_{A \in P(N)} c(A) \prod_{i \in A} x_i.$    (1.19)

Proof. This result will be established in Chapter 13; see Theorem 13.1. 

In particular, every Boolean function has a unique representation as a multilinear
polynomial over the reals.
Example 1.30. The function $f = x y \vee \bar{x} \bar{z} \vee \bar{y} z$ can be expressed as
    f = x y + (1 − x) (1 − z) + (1 − y) z
(by orthogonality) or, after some rewriting, as
    f = 1 − x + x y + x z − y z.
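The coefficients of the multilinear polynomial can also be recovered numerically. The sketch below is an illustration under our own assumptions: it relies on the standard Möbius inversion formula c(A) = Σ_{B ⊆ A} (−1)^{|A|−|B|} f(1_B), where 1_B is the characteristic vector of B (a formula not derived in this section), and the function name `multilinear_coefficients` is hypothetical. Variables are indexed from 0 in the code, so that x, y, z correspond to 0, 1, 2.

```python
from itertools import combinations

def multilinear_coefficients(f, n):
    """Nonzero coefficients c(A) of the multilinear polynomial (1.19) of f,
    keyed by the tuple A of 0-based variable indices."""
    coeffs = {}
    for size in range(n + 1):
        for A in combinations(range(n), size):
            total = 0
            # Moebius inversion over all subsets B of A.
            for r in range(size + 1):
                for B in combinations(A, r):
                    point = [1 if i in B else 0 for i in range(n)]
                    total += (-1) ** (size - r) * f(*point)
            if total != 0:
                coeffs[A] = total
    return coeffs

# The function of Example 1.30, written with 0-1 arithmetic.
f = lambda x, y, z: max(x * y, (1 - x) * (1 - z), (1 - y) * z)
print(multilinear_coefficients(f, 3))
```

On the function of Example 1.30, the nonzero coefficients found are those of 1 − x + x y + x z − y z.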


For Boolean functions, we could actually have observed the existence of the rep-
resentation (1.19) while discussing orthogonal expressions in Section 1.6. Indeed,
the existence of a polynomial representation is an immediate corollary of The-
orem 1.7. The latter result also underlines that Boolean functions admit various
representations over the reals.
However, the uniqueness of the multilinear polynomial (1.19) makes it espe-
cially attractive, as it provides a canonical representation of every Boolean
function. Note that checking whether a given multilinear polynomial represents a
Boolean function, rather than an arbitrary pseudo-Boolean function, is quite easy.
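Indeed, a pseudo-Boolean function f is Boolean if and only if $f^2(X) = f(X)$
for all X ∈ B^n . This condition can be checked efficiently because of the uniqueness
of expression (1.19): it suffices to expand $f^2$, to replace every power $x_i^2$ by $x_i$,
and to compare the resulting coefficients with those of f.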
Finally, we note that, when the Boolean function f is viewed as a function on
the domain {−1, +1}^n and taking its values in {−1, +1}, then f obviously admits
an alternative polynomial representation of the form (1.19), sometimes called the
Fourier expansion of f . Although the Fourier expansion is perfectly equivalent
to the multilinear polynomial in 0-1 variables, one of these two expressions may
occasionally prove more useful than the other, depending on the intended purpose.
Applications of the Fourier expansion in the theoretical computer-science literature
are numerous; some illustrations can be found, for instance, in [163, 543, 714, 716];
we refer to Bruck [157] for an introduction to this very fruitful topic. (See also Carlet
[170] for uses of the pseudo-Boolean representation and of the Fourier expansion
of Boolean functions in cryptography and in coding theory.)

1.12.3 Binary decision diagrams and decision trees


We have already discussed the analogy between the representation of Boolean
functions by combinational circuits and Boolean circuits investigated in complex-
ity theory. Another graphical representation of Boolean functions is provided by
binary decision diagrams. A binary decision diagram (or BDD) on n variables
consists of an acyclic directed graph G = (V , A) in which exactly one vertex (the
root) has indegree 0, and all vertices have outdegree either 0 (the leaves) or 2 (the
inner vertices), together with a labeling of the vertices and of the arcs. The inner
vertices are labeled by variables from {x1 , x2 , . . . , xn }, while the leaves get labels
from {0, 1}. One of the arcs leaving each inner vertex is labeled by 0, the other
by 1.
The BDD G = (V , A) represents a Boolean function $f_G$ on B^n , in the following
sense. For each point X∗ ∈ B^n , the value of $f_G(X^*)$ is computed recursively by
traversing G, starting from its root. If vertex v is reached during the traversal, and v
is labeled by variable $x_i$ , then the traversal leaves v along the arc labeled by $x_i^*$ . The
value of $f_G(X^*)$ is the label of the leaf reached at the end of the computation. (Note
that by switching the labels on the leaves, we can similarly view G as providing a
representation of $\bar{f}_G$ , the complement of $f_G$ .)

Example 1.31. A binary decision diagram is displayed in Figure 1.4. It is easy to
verify that it represents the Boolean function
$f(x_1, x_2, x_3) = \bar{x}_2 \vee x_1 x_3 \vee \bar{x}_1 \bar{x}_3$.
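The traversal rule is easy to mimic in code. In the Python sketch below, the dictionary `FIG_1_4` encodes our reading of Figure 1.4 (root r labeled x2, vertex u labeled x1, vertices w and v labeled x3); both the encoding and the function name `evaluate` are assumptions made for the sake of illustration.

```python
from itertools import product

# Inner vertex: name -> (index of its variable, successor along arc 0, successor along arc 1).
# Leaves are represented directly by their labels 0 and 1.
FIG_1_4 = {
    "r": (2, 1, "u"),
    "u": (1, "w", "v"),
    "w": (3, 1, 0),
    "v": (3, 0, 1),
}

def evaluate(bdd, root, point):
    """Traverse the BDD from `root`, leaving each vertex labeled x_i along the arc labeled x_i*."""
    node = root
    while node not in (0, 1):
        var, child0, child1 = bdd[node]
        node = child1 if point[var - 1] else child0
    return node

# Truth table of the represented function, for points (x1, x2, x3).
for x in product((0, 1), repeat=3):
    print(x, evaluate(FIG_1_4, "r", x))
```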
Some special classes of BDDs have been more thoroughly investigated in the
literature. A BDD is a decision tree if its underlying graph is a tree. A BDD is
ordered, or is an OBDD, if there exists a permutation π = (π1 , π2 , . . . , πn ) of
{1, 2, . . . , n} with the following property: πi < πj for each arc (u, v) ∈ A such that
u is labeled by xi and v is labeled by xj . Note that, in an OBDD, the variables
that appear on a path from the root to a leaf form a subsequence of π , and each
input variable is read at most once while evaluating the value of the function at
any given point.
Example 1.32. The BDD in Figure 1.4 is ordered by the permutation π =
(2, 1, 3). 
BDDs have become popular in the engineering community, mostly since Bryant
[160] established the efficiency of OBDDs for performing several operations on
Boolean functions (evaluation, solution of Boolean equations, etc.). Decision trees
are widely used in artificial intelligence, where they provide a tool for the solu-
tion of various machine learning and classification problems (e.g., Quinlan’s ID3
method; see [770] and Section 12.2.5 in Chapter 12). BDDs have also been studied
in the theoretical computer-science literature, under the name of branching pro-
grams, in connection with the derivation of lower bounds on the computational
complexity of structured Boolean functions. A very thorough account of the litera-
ture on BDDs is found in Wegener’s book [903] and in the survey paper by Bollig
et al. [100].
Although we do not intend to discuss BDDs in great detail, we nevertheless
establish a few connections between the concepts introduced in this section and
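[Figure 1.4. A binary decision diagram. The root r is labeled x2; its 0-arc leads to a
leaf labeled 1 and its 1-arc to the vertex u labeled x1. The 0-arc of u leads to the vertex w
and its 1-arc to the vertex v, both labeled x3. The 0-arc of w leads to a leaf labeled 1 and
its 1-arc to a leaf labeled 0; the 0-arc of v leads to the same leaf labeled 0 and its 1-arc
to a leaf labeled 1.]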

Procedure Decision Tree(f )

Input: A Boolean function f (x1 , x2 , . . . , xn ).
Output: A decision tree D(f ) = (V , A) representing f .

begin
    if f is constant then
        D(f ) has a unique vertex r(f ) (which is both its root and its leaf);
        r(f ) is labeled with the constant value of f (either 0 or 1);
    else
        let xi be the variable of smallest index among the variables of f ;
        let f0 := f|xi =0 and f1 := f|xi =1 ;
        run Decision Tree(f0 ) to build D(f0 ) with root r(f0 );
        run Decision Tree(f1 ) to build D(f1 ) with root r(f1 );
        introduce a root r(f ) labeled by xi ;
        make r(f0 ) the right son and r(f1 ) the left son of r(f );
        label the arc (r(f ), r(f0 )) by 0 and the arc (r(f ), r(f1 )) by 1;
    return D(f );
end

Figure 1.5. Procedure Decision Tree

in the remainder of the chapter. Each vertex u ∈ V of a BDD-graph G = (V , A)
can in fact be viewed as defining a Boolean function $f^u$ : Rather than starting the
computation at the root, as we did when defining $f_G$ , simply start it at vertex u.
So, if r is the root of G, we have $f_G = f^r$ . Alternatively, if u is labeled by variable
$x_i$ , and if v, w are the children of u, then it is easy to see that $f^u = x_i f^v \vee \bar{x}_i f^w$
(where we assume that the arc (u, v) is labeled by 1, and the arc (u, w) is labeled
by 0). The similarity of this construction with the Shannon expansion (1.15) is
rather obvious. Note, however, that $f^v$ and $f^w$ may depend on $x_i$ and, thus, are
generally not equal to $f^u_{|x_i=1}$ and $f^u_{|x_i=0}$ , respectively.
On the other hand, when G is a decision tree, then one may safely assume that
each variable $x_i$ is encountered at most once on any path from the root to a leaf.
Thus, with the same notations as above, $f^v = f^u_{|x_i=1}$ and $f^w = f^u_{|x_i=0}$ in decision
trees. Conversely, for an arbitrary function f (x1 , x2 , . . . , xn ), an ordered decision
tree D(f ) representing f can be obtained by successive Shannon expansions as
described in Figure 1.5 (compare with the comments following Theorem 1.16 in
Section 1.8).
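The construction by successive Shannon expansions can be sketched in a few lines of Python. This is an illustration only: the nested-tuple encoding of trees and the names `decision_tree` and `tree_value` are our own assumptions, variables are always expanded in the order x1, x2, . . . , xn, and constancy is tested by brute-force enumeration, so the running time is exponential in n.

```python
from itertools import product

def decision_tree(f, n, fixed=()):
    """Build an ordered decision tree for f by Shannon expansion on x_1, x_2, ...
    A leaf is 0 or 1; an inner vertex is (i, subtree for x_i = 0, subtree for x_i = 1)."""
    # Values taken by f over all completions of the partial assignment `fixed`.
    values = {f(*fixed, *rest) for rest in product((0, 1), repeat=n - len(fixed))}
    if len(values) == 1:        # the restricted function is constant: create a leaf
        return values.pop()
    i = len(fixed) + 1          # next variable in the order x_1, ..., x_n
    return (i, decision_tree(f, n, fixed + (0,)), decision_tree(f, n, fixed + (1,)))

def tree_value(tree, point):
    """Evaluate the tree at a 0-1 point by following the arcs from the root."""
    while tree not in (0, 1):
        i, low, high = tree
        tree = high if point[i - 1] else low
    return tree

# The function of Example 1.31.
f = lambda x1, x2, x3: (1 - x2) | (x1 & x3) | ((1 - x1) & (1 - x3))
print(decision_tree(f, 3))
```

Note that the tree obtained for the function of Example 1.31 reads x1 first, whereas the diagram of Figure 1.4 reads x2 first; different variable orders generally yield different trees for the same function.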
Another interesting observation is that every BDD G gives rise to an orthogonal
DNF of $f_G$ . To see this, consider the set $\mathcal{P}$ of all directed paths from the root r
to the leaves with label 1. Suppose that a particular path $P \in \mathcal{P}$ contains the vertices
u1 (= r), u2 , . . . , up , in that order, where ui is labeled by variable $x_{k(i)}$ and the arc
(ui , ui+1 ) is labeled by ai ∈ {0, 1}, and assume that the conjunction

    $C(P) = \bigwedge_{i \,|\, a_i = 1} x_{k(i)} \bigwedge_{j \,|\, a_j = 0} \bar{x}_{k(j)}$

is not identically 0 (that is, a variable $x_k$ and its complement $\bar{x}_k$ do not simultaneously
appear in the conjunction). Then, C(P ) is an implicant of $f_G$ . Moreover,
$\bigvee_{P \in \mathcal{P} : C(P) \not\equiv 0} C(P)$ is an orthogonal DNF of $f_G$ .
Of course, by applying the same procedure to the paths from the root to the
0–leaves of G, one can similarly compute an orthogonal DNF of $\bar{f}_G$ and of the
dual function $f_G^d$ .
Example 1.33. When we apply this procedure to the binary decision diagram in
Figure 1.4, we obtain the orthogonal DNF $\psi = \bar{x}_2 \vee x_1 x_2 x_3 \vee \bar{x}_1 x_2 \bar{x}_3$ for the
function f represented by the BDD, and the orthogonal DNF
$\phi = \bar{x}_1 x_2 x_3 \vee x_1 x_2 \bar{x}_3$ for its complement $\bar{f}$ .
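The path enumeration behind this example can be sketched as follows. The dictionary `FIG_1_4` again encodes our reading of Figure 1.4, and the name `paths_to_terms` as well as the representation of a conjunction as a dictionary {variable index: required value} are hypothetical choices made for this illustration.

```python
# Inner vertex: name -> (index of its variable, successor along arc 0, successor along arc 1).
FIG_1_4 = {
    "r": (2, 1, "u"),
    "u": (1, "w", "v"),
    "w": (3, 1, 0),
    "v": (3, 0, 1),
}

def paths_to_terms(bdd, root, target):
    """Return the conjunctions C(P) of all paths P from `root` to leaves labeled `target`.
    Paths on which a variable would appear with both values are skipped."""
    terms = []

    def walk(node, literals):
        if node in (0, 1):
            if node == target:
                terms.append(dict(literals))
            return
        var, child0, child1 = bdd[node]
        for value, child in ((0, child0), (1, child1)):
            # Follow the arc only if it does not contradict an earlier choice for x_var.
            if literals.get(var, value) == value:
                walk(child, {**literals, var: value})

    walk(root, {})
    return terms

print(paths_to_terms(FIG_1_4, "r", 1))   # terms of an orthogonal DNF of f
print(paths_to_terms(FIG_1_4, "r", 0))   # terms of an orthogonal DNF of its complement
```

Two terms produced in this way are orthogonal because the corresponding paths diverge at some vertex, whose variable then appears with opposite values in the two terms.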

For arbitrary BDDs, the above procedure may be inefficient because the number
of paths in P may be exponentially large in the size of G. When G is a decision
tree, however, we obtain a stronger result:
Theorem 1.35. Let f be a Boolean function represented by a decision tree D, let
L be the number of leaves of D and let δ be the depth of D, that is, the length of
a longest path from root to leaf in D. Then, an ODNF of f and an ODNF of $f^d$
with degree δ can be computed in time O(δL).
Proof. When D is a decision tree, there is exactly one path from the root to each
leaf of D. Hence, the number of terms in the ODNF is at most L, and each term
can be built in time O(δ). 

Finally, we note the following corollary:


Theorem 1.36. Under the assumptions of Theorem 1.35, the prime implicants of
f and the prime implicants of $f^d$ can be generated in time O(δL) when f is a
positive function.
Proof. This follows from Theorem 1.35 and Theorem 1.24. 

1.13 Applications
In this section, we return to some of the areas of application that we briefly men-
tioned earlier in this chapter: propositional logic, electrical engineering, game
theory, reliability, combinatorics, and integer programming. We sketch how the
basic Boolean concepts arise in these various frameworks and introduce some of
the problems and concepts investigated in subsequent chapters. We stress again,
however, that Boolean functions and expressions play a role in many other fields of
science. We have already mentioned their importance in complexity theory (see,
for instance, Krause and Wegener [583], Papadimitriou [725], Wegener [902]);
in coding theory or in cryptography (see Carlet [170], MacWilliams and Sloane
[642]); and we could cite a variety of additional applications arising in social
sciences (qualitative analysis of data; see Ragin [775]); in psychology (human

concept learning; see Feldman [326, 327]); in medicine (diagnosis, risk assessment;
see Bonates and Hammer [102]); in biology (genetic regulatory networks;
see Kauffman [553], Shmulevich, Dougherty and Zhang [831], Shmulevich and
Zhang [832]), and so on.
Beyond the specific issues arising in connection with each particular applica-
tion, we want to stress that the unifying role played by Boolean functions and,
more generally, by Boolean models, should probably provide the main motivation
for studying this book (it certainly provided one of the main motivations for writ-
ing it). This theme will be recurrent throughout subsequent chapters, where we
will see that the same basic Boolean concepts and results have repeatedly been
reinvented in various areas of applied mathematics.

1.13.1 Propositional logic and artificial intelligence


As suggested in Application 1.1, propositional logic is essentially equivalent to the
calculus of Boolean functions (see, e.g., Stoll [848], Urquhart [882]). Besides its
fundamental role as a theoretical model of formal reasoning, propositional logic
has found practical applications in several domains of artificial intelligence. For
more information on this topic, we refer the reader to classic texts by Chang and
Lee [186], Gallaire and Minker [359], Loveland [627], Kowalski [582], Jeroslow
[533], Anthony and Biggs [29], and so on. We only briefly touch here upon the
surface of this topic.
Consider three propositional variables, say x, y, z. The exact interpretation of
these variables is not relevant here, but, for the sake of the discussion, one may
think of them as representing elementary propositions such as:
x : The patient shows symptom X.
y : Test Y is negative.
z : Diagnosis Z applies.
The knowledge base of an expert system is a list of rules expressing logical rela-
tionships of the “if-then-else” type between the propositional variables of interest.
For instance, a knowledge base may contain the following rules:
Rule 1 : “If x is false and y is true then z is true.”
Rule 2 : “If x is false and y is false then z is false.”
Rule 3 : “If z is true then x is false.”
Rule 4 : “If y is true then z is false.”
Let us associate a Boolean expression φ(x, y, z) with the above knowledge base:

    $\phi(x, y, z) = \bar{x} y \bar{z} \vee \bar{x} \bar{y} z \vee x z \vee y z,$    (1.20)
where each term of φ corresponds in a straightforward way to one of rules 1–4. The
interpretation of φ is easy: A 0–1 point (x, y, z) is a false point of φ if and only if the
corresponding assignment of True–False values to the propositional variables does
not contradict any of the rules in the knowledge base. Thus, in the terminology of

logic theory, the set of solutions of the Boolean equation φ(x, y, z) = 0 is exactly
the set of models of the knowledge base. In particular, the set of rules is not “self-
contradictory” if and only if φ is not identically 1, that is, if and only if the Boolean
equation φ = 0 admits at least one solution (which is easily seen to be the case for
our small example).
The main purpose of an expert system is to draw inferences and to answer
queries involving the propositional variables, such as: “Is the assignment z = 1
consistent with the given set of rules?” (that is, “Does diagnosis Z apply under at
least one imaginable scenario?”). This question can be answered by plugging the
value z = 1 into φ and checking whether the resulting Boolean equation φ|z=1 = 0
remains consistent. For our example, this procedure yields the equation

    $\bar{x} \bar{y} \vee x \vee y = 0,$

which is clearly inconsistent, since its left-hand side is identically 1. Thus, z = 1 is
not possible in the world described by the above knowledge base.
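For a knowledge base of this size, the reasoning can be checked by brute force. The Python sketch below is only an illustration (the names `phi`, `models`, and `z_possible` are ours): it encodes each rule by the term that equals 1 when the rule is violated, lists the solutions of φ = 0, and answers the query about z = 1.

```python
from itertools import product

def phi(x, y, z):
    """DNF (1.20): each term equals 1 exactly when the corresponding rule is violated."""
    return int(
        (not x and y and not z)      # Rule 1 violated
        or (not x and not y and z)   # Rule 2 violated
        or (x and z)                 # Rule 3 violated
        or (y and z)                 # Rule 4 violated
    )

# Models of the knowledge base = solutions of the Boolean equation phi = 0.
models = [p for p in product((0, 1), repeat=3) if phi(*p) == 0]

# Query: is the assignment z = 1 consistent with the rules?
z_possible = any(z == 1 for (_, _, z) in models)

print(models)       # the rules are not self-contradictory if this list is nonempty
print(z_possible)
```

Enumeration is of course impractical for large knowledge bases; it merely confirms the conclusions drawn above for this small example.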
This short discussion illustrates how simple questions pertaining to the atomic
propositions involved in a knowledge base can be reduced to the solution of
Boolean equations. The solution of Boolean equations by algebraic techniques
has been an ongoing topic of research ever since Boole’s original work appeared
in print 150 years ago. We return to Boolean equations in much greater detail in
Chapter 2 of this book.
In actual expert systems, for pragmatic reasons of computational efficiency, it
is usual to restrict the rules incorporated in the knowledge base to so-called Horn
clauses, namely, to rules of the form
    if $x_{i_1}$ is true and $x_{i_2}$ is true and ... and $x_{i_k}$ is true, then $x_{i_{k+1}}$ is true,
or
    either $x_{i_1}$ is false or $x_{i_2}$ is false or ... or $x_{i_k}$ is false,
where $x_{i_1}, x_{i_2}, \ldots, x_{i_k}, x_{i_{k+1}}$ are arbitrary variables. When all the rules are Horn
clauses, then the associated Boolean expression φ is a DNF with terms of the form
$x_{i_1} x_{i_2} \cdots x_{i_k} \bar{x}_{i_{k+1}}$ or $x_{i_1} x_{i_2} \cdots x_{i_k}$. This leads to the following definition:

Definition 1.30. A DNF is a Horn DNF if each of its terms contains at most one
complemented variable.

We will show in Chapter 6 that, when φ is a Horn DNF, the Boolean equation
φ(X) = 0 can be solved easily, more precisely, in linear time. This single fact
suffices to explain the importance of Horn DNFs in the context of expert systems,
where large Boolean equations must be solved repeatedly. Moreover, we also
discover in Chapter 6 that Horn DNFs possess a host of additional remarkable
properties making them a worthwhile object of study.
Before we close this section, we must warn the reader that our view that (propo-
sitional) knowledge bases define Boolean expressions and, concomitantly, Boolean
functions, is quite unorthodox in the artificial intelligence literature, where rules
are more traditionally regarded as forming a “loose” collection of Boolean clauses

rather than a single function. We claim, however, that our point of view has defi-
nite advantages over the traditional one. Indeed, it allows us to take advantage of
the huge body of knowledge regarding Boolean functions and to draw inspiration
from concepts and properties pertaining to such functions.
As an example of this general claim, let us go back to the small knowledge base
just given and to the corresponding DNF φ displayed in equation (1.20). It should
be clear from our previous discussion that, as far as drawing inferences goes, all the
information contained in the knowledge base is adequately translated in φ. More
precisely, any Boolean expression representing the same Boolean function as φ
provides the same information as the original knowledge base. Indeed, if ψ is any
expression such that ψ = φ, then the set of models of the knowledge base is in one-
to-one correspondence with the set of false points of ψ, which coincides with the
set of false points of φ. This observation implies that Boolean transformations can
sometimes be applied in order to obtain a simpler, but equivalent, representation
of the knowledge base. The simplification of Boolean expressions is one of the
main topics of Chapter 3. For now, however, the discussion in Section 1.7 already
suggests that the prime implicants of φ may play an interesting role in this respect.
For our example, it turns out that φ only has two prime implicants, namely, $\bar{x} y$ and
z. As a consequence (recall Theorem 1.13), $\phi = \bar{x} y \vee z$, so that the original
rules 1–4 are equivalent to the conjunction of the following two rules:
Rule 5 : “either x is true or y is false.”
Rule 6 : “z is false.”
(Note that Rule 6 provides a confirmation of our previous conclusion, according
to which z can never be true.)
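The two prime implicants can be confirmed by exhaustive search. The following Python sketch is a brute-force illustration suitable for very small n only; the names `is_implicant` and `prime_implicants`, and the encoding of a term as a dictionary {0-based variable position: required value}, are assumptions of ours. It uses the fact that an implicant is prime exactly when no single literal can be dropped from it.

```python
from itertools import product

def phi(x, y, z):
    """The DNF (1.20) of the knowledge base."""
    return int((not x and y and not z) or (not x and not y and z) or (x and z) or (y and z))

def is_implicant(term, f, n):
    """A term is an implicant of f if f equals 1 at every point that agrees with the term."""
    return all(f(*p) for p in product((0, 1), repeat=n)
               if all(p[i] == v for i, v in term.items()))

def prime_implicants(f, n):
    """All implicants of f from which no literal can be dropped."""
    primes = []
    for choice in product((None, 0, 1), repeat=n):   # None: variable absent from the term
        term = {i: v for i, v in enumerate(choice) if v is not None}
        if is_implicant(term, f, n) and not any(
            is_implicant({j: w for j, w in term.items() if j != i}, f, n) for i in term
        ):
            primes.append(term)
    return primes

# Positions 0, 1, 2 stand for x, y, z.
print(prime_implicants(phi, 3))
```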
Recently, the Boolean formalism has found a very central role in another area of
artificial intelligence, namely, in computational learning theory. In intuitive terms,
many of the fundamental questions in this field take the following form: Given a
class C of Boolean functions and an unknown function f in C, how many rows
of the truth table of f is it necessary to query to be “reasonably confident” that f
is known with “sufficient accuracy?” Another type of question would be: Given a
class C of Boolean functions and two subsets (or “samples”) T , F ⊆ B n , is there a
function f ∈ C such that f takes value 1 on T and value 0 on F ? Related issues
will be tackled in Chapter 12 of this book. For more information on computational
learning theory, we refer the reader to the textbook by Anthony and Biggs [29] and
to survey papers by Anthony [26] and Sloan, Szörényi, and Turán [838].

1.13.2 Electrical and computer engineering


We have already mentioned that every switching or combinational circuit can be
viewed as a device computing the value of a Boolean function f (see Applica-
tion 1.2). Given a description of an {AND,OR,NOT}–circuit, an expression of f
can be constructed recursively. Indeed, let us assume that the output gate of the
circuit is an OR-gate. If we delete this gate, then we obtain two subcircuits that

compute two functions, say f1 and f2 , for which we can (recursively) construct
the representations φ1 and φ2 . Then, the expression φ1 ∨ φ2 represents f .

Example 1.34. The circuit displayed in Figure 1.6 computes the function
    $\phi = (x_1 \vee x_4)\, \overline{(x_1 \vee (x_2 \bar{x}_3))}.$    (1.21)


It can be much more difficult, however, to obtain a DNF of the function asso-
ciated to a given circuit. Fortunately, for many applications, it is sufficient to have
an implicit representation of f via a DNF ψ(X, Y , z) similar to the DNF pro-
duced by the procedure Expand (see Section 1.4). In the DNF ψ(X, Y , z), the
vector X represents the inputs of the circuit, z represents its output, and Y can be
viewed as a vector of variables associated with the outputs of the “hidden” gates
of the circuit (in the physical realization of a switching circuit, the input and output
signals can be directly observed, whereas the value of all other signals cannot,
hence the qualifier “hidden” applied to these internal gates). On every input signal
X∗ , the circuit produces the output z∗ , where (X∗ , Y ∗ , z∗ ) is the unique solution
of the equation ψ(X∗ , Y , z) = 0. (See Abdulla, Bjesse, and Eén [1] for a more
detailed contribution along similar lines.) Let us illustrate this construction on an
example.
Example 1.35. Consider again the circuit displayed in Figure 1.6 and the corre-
sponding expression φ given by (1.21). We have already shown in Example 1.12

[Figure 1.6. Combinational circuit for Example 1.34.]

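that, when applying Expand to φ, we obtain the expression

    $\psi = \bar{y}_1 z \vee y_2 z \vee y_1 \bar{y}_2 \bar{z} \vee x_1 \bar{y}_1 \vee x_4 \bar{y}_1 \vee \bar{x}_1 \bar{x}_4 y_1$
    $\quad \vee\, x_1 \bar{y}_2 \vee y_4 \bar{y}_2 \vee \bar{x}_1 \bar{y}_4 y_2 \vee \bar{x}_2 y_4 \vee x_3 y_4 \vee x_2 \bar{x}_3 \bar{y}_4.$    (1.22)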
For every point (x1∗ , x2∗ , x3∗ , x4∗ ) describing the input signals, the output of the
circuit is given by the value of z in the unique solution of the equation
ψ(x1∗ , x2∗ , x3∗ , x4∗ , y1 , y2 , y4 , z) = 0. Similarly, as discussed in Example 1.12, the value
of y1 in this solution indicates the state of the first (topmost) OR-gate produced by
the inputs (x1∗ , x2∗ , x3∗ , x4∗ ); the value of y2 indicates the state of the second OR-gate;
and the value of y4 indicates the state of the first AND-gate.
Consider, for instance, the input $(x_1^*, x_2^*, x_3^*, x_4^*) = (0, 1, 1, 1)$. At this point, the
equation ψ = 0 boils down to
    $\bar{y}_1 z \vee y_2 z \vee y_1 \bar{y}_2 \bar{z} \vee \bar{y}_1 \vee y_4 \bar{y}_2 \vee \bar{y}_4 y_2 \vee y_4 = 0,$
which has the unique solution $(y_1^*, y_2^*, y_4^*, z^*) = (1, 0, 0, 1)$. Thus, the output signal
of the circuit is $z^* = 1$, which is indeed equal to φ(0, 1, 1, 1).
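The computation in this example can be replayed mechanically. In the Python sketch below, `psi` encodes the twelve terms of (1.22) with 0-1 arithmetic; the gate equations given in the comments are our reading of these terms, and the names `psi` and `solve` are hypothetical. The function `solve` enumerates the values of the hidden and output variables that satisfy ψ = 0 for a fixed input.

```python
from itertools import product

def psi(x1, x2, x3, x4, y1, y2, y4, z):
    """DNF (1.22); it vanishes exactly when every gate value is consistent with its inputs."""
    n = lambda v: 1 - v   # complement of a 0-1 value
    terms = [
        n(y1) * z, y2 * z, y1 * n(y2) * n(z),           # z  = y1 AND (NOT y2)
        x1 * n(y1), x4 * n(y1), n(x1) * n(x4) * y1,     # y1 = x1 OR x4
        x1 * n(y2), y4 * n(y2), n(x1) * n(y4) * y2,     # y2 = x1 OR y4
        n(x2) * y4, x3 * y4, x2 * n(x3) * n(y4),        # y4 = x2 AND (NOT x3)
    ]
    return max(terms)

def solve(x):
    """All (y1, y2, y4, z) such that psi(x, y1, y2, y4, z) = 0 for the input x."""
    return [h for h in product((0, 1), repeat=4) if psi(*x, *h) == 0]

print(solve((0, 1, 1, 1)))   # solutions for the input considered in Example 1.35
```

For every one of the sixteen inputs, the enumeration returns a single solution, in agreement with the uniqueness claim made above.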
We now turn to the opposite problem of constructing a circuit that computes a
given Boolean function. Notice that, as an easy consequence of Theorem 1.4, such
a circuit exists for every Boolean function. Actually, if we allow AND-gates and
OR-gates to have indegree larger than two, then every DNF can even be computed
by a switching circuit involving at most four layers and one OR-gate, with all input
gates in the first layer, all NOT-gates in the second layer, all AND-gates in the third
layer, and the OR-gate in the fourth layer (in the role of output gate). By the same
reasoning, every positive function corresponds to a circuit involving at most three
layers and one OR-gate, since NOT-gates are superfluous in this case.
Broadly speaking, the basic issue of circuit design (or network synthesis) can be
formulated as follows: Given a Boolean function f , we want to construct a combi-
national circuit of minimal size that computes f and that satisfies a number of pre-
specified side constraints. The measure of size used in the optimality criterion may
vary, but it is usually related to the number of gates and/or to the depth (that is, the
length of the longest path) of the circuit. The side constraints may restrict the types
of gates that are allowed (only AND-gates and NOT-gates, no NOT-gates, etc.) or
the indegree of the gates, or, more generally, may be motivated by considerations
of reliability, manufacturability, availability of technology, and so on.
Circuit design problems of this nature have for several decades been addressed
in the engineering literature; see, for instance, Adam [5]; Astola and Stanković
[35]; Brayton et al. [153]; Brown [156]; Hu [511, 512]; Kunz and Stoffel
[590]; McCluskey [634, 635]; Muroga [698]; Sasao [804]; Villa, Brayton, and
Sangiovanni-Vincentelli [891], and so on. They have given rise, among other note-
worthy contributions, to a host of results concerning the size of representations of
Boolean functions, a topic to which we will return in Chapter 3 of this book. More
recently, theoretical computer scientists have shown renewed interest in similar
questions arising in the framework of computational complexity. Although their

research stresses asymptotic measures of performance (“Is it possible to compute
all functions in a given class by circuits of polynomial size?”) rather than engineering
or economic considerations, the issues they investigate remain very much akin
to those studied in electrical engineering. We refer the reader to the monographs by
Wegener [902] and Vollmer [892] or to the survey by Krause and Wegener [583]
for a wealth of information on this line of research, which largely falls outside the
scope of our book.
Starting in the late 1950s, electrical engineers have devoted a lot of attention,
from both an applied and a theoretical perspective, to combinational circuits built
from a remarkable type of switching gates called threshold gates. For our purpose
(and brushing aside all technicalities involved in their implementation), threshold
gates are electronic devices that compute a special class of Boolean functions
called threshold functions.
Definition 1.31. A Boolean function f on B^n is a threshold (or linearly separable)
function if there exist n weights w1 , w2 , . . . , wn ∈ R and a threshold t ∈ R such that,
for all (x1 , x2 , . . . , xn ) ∈ B^n ,

    $f(x_1, x_2, \ldots, x_n) = 0$ if and only if $\sum_{i=1}^{n} w_i x_i \le t.$

In geometric terms, threshold functions are precisely those Boolean functions
for which the set of true points can be separated from the set of false points
by a hyperplane, namely, the separator $\{X \in R^n \mid \sum_{i=1}^{n} w_i x_i = t\}$. It is easy to
see that elementary conjunctions and disjunctions are threshold functions. As a
consequence, every Boolean function can be realized by a circuit involving only
threshold gates. The problem of designing optimal circuits of threshold gates has
generated a huge body of literature. The concept of Chow parameters, for instance,
was originally introduced with the purpose of providing a numerical characterization
of threshold functions (see Chow [194] and Winder [920] or books by
Dertouzos [269], Hu [511], Muroga [698], etc.). We devote two chapters (Chapters
8 and 9) of this book to an investigation of the properties of threshold and related
functions.
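To make the claim about elementary conjunctions and disjunctions concrete, here is a short Python sketch following Definition 1.31. The helper name `threshold_function` and the particular weights and thresholds are our own choices for illustration; many other weight vectors define the same functions.

```python
from itertools import product

def threshold_function(weights, t):
    """Boolean function equal to 0 exactly when sum_i w_i x_i <= t (Definition 1.31)."""
    return lambda *x: int(sum(w * xi for w, xi in zip(weights, x)) > t)

n = 3
conjunction = threshold_function([1] * n, n - 1)   # x1 x2 x3: all n inputs must equal 1
disjunction = threshold_function([1] * n, 0)       # x1 OR x2 OR x3: one input suffices

for x in product((0, 1), repeat=n):
    print(x, conjunction(*x), disjunction(*x))
```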

1.13.3 Game theory


As introduced in Applications 1.3 and 1.7, a simple game (or voting game) v on
a set of players N = {1, 2, . . . , n} can be modeled as a positive Boolean function
fv on B n . This concept was introduced by von Neumann and Morgenstern [893]
(albeit in set-theoretic, rather than Boolean terminology), in the seminal book that
laid the foundations of game theory, and was further developed in Shapley [828].
More recent discussions can be found in several books, for instance [79, 777, 850].
Many of the notions introduced in previous sections have natural interpretations
in a game-theoretic setting. Consider for instance an implicant $\bigwedge_{i \in A} x_i$ of
the function $f_v$ , where A ⊆ N , and consider any point X∗ ∈ B^n such that $x_i^* = 1$

for all i ∈ A (all players in A cast a “Yes” vote). By definition of an implicant,
$f_v(X^*) = 1$ and, in view of the translation rules proposed in Application 1.3,
v(supp(X∗ )) = 1, where supp(X∗ ) is the set of players who voted “Yes.” Thus,
the set A can be viewed as a group of players who, when simultaneously vot-
ing in favor of an issue, have the power to determine the outcome of the vote
irrespective of the decision made by the remaining players. In game theory, such
a decisive group of players is called a winning coalition. Clearly then, a prime
implicant simply corresponds to an (inclusion-wise) minimal winning coalition,
that is, to a subset A of players such that v(A) = 1, but v(B) = 0 for all subsets
B ⊂ A, B ≠ A.
It is well-known that every simple game is completely determined by the col-
lection of its minimal winning coalitions, a fact we can regard as an immediate
corollary of the results established in previous sections (see, for instance, Theorem
1.13 and Theorem 1.23).
A straightforward counterpart of minimal winning coalitions is provided by
maximal losing coalitions, that is, by those subsets A of players such that v(A) = 0,
but v(B) = 1 for all supersets B ⊃ A, B ≠ A. It is easy to see that the maximal losing
coalitions of v are in one-to-one correspondence with the prime implicates of $f_v$ ,
in the sense that A is a maximal losing coalition of v if and only if $\bigvee_{i \in N \setminus A} x_i$ is a
prime implicate of $f_v$ (see also Theorem 1.27). The question of characterizing the
collection L of maximal losing coalitions in terms of the collection W of minimal
winning coalitions, or even of generating L from W, arises quite naturally in this
setting. We shall tackle this type of issue in Chapter 4, in the broader framework
of duality theory.
The most common voting rules used in legislative or corporate assemblies are
modeled by the class of so-called weighted majority games. In such a game, each
player i carries a positive weight wi ∈ R: When he votes in favor of an issue,
player i contributes his full weight wi toward the issue (i = 1, 2, . . . , n). The issue
is adopted if the sum of the weights cast in its favor exceeds a predetermined
threshold t. Comparing this definition with Definition 1.31, it is not too hard to
see that weighted majority games correspond exactly to positive threshold func-
tions. As a consequence, the theory of threshold functions has been thoroughly
investigated in the game-theoretic literature. This happened at about the same
time threshold functions were also attracting the attention of electrical engineers
(starting mostly in the late fifties), so that many properties have been independently
(re)discovered by researchers active in these two fields.
Another main theme of study in game theory is the computation of the “share of
power” held by the players of a game, and several definitions of “power indices”
coexist in the literature. Many of these indices are closely related to the Chow
parameters of the associated Boolean functions (see Dubey and Shapley [279] and
Felsenthal and Machover [329] for detailed presentations). In fact, most indices
are naturally expressed in terms of the so-called modified Chow parameters of the
function, which we now introduce.

Definition 1.32. The modified Chow parameters of a Boolean function
f(x1, x2, ..., xn) are the (n+1) numbers (π1, π2, ..., πn, π) defined as $\pi = \omega - 2^{n-1}$
and $\pi_k = 2\omega_k - \omega$ for k = 1, 2, ..., n, where (ω1, ω2, ..., ωn, ω) are the Chow
parameters of f.

Note that there is a bijective correspondence between the vectors of Chow
parameters and those of modified Chow parameters. Modified Chow parameters
have been considered both in threshold logic (see [698, 920]) and in game theory
(see [279, 329]). In the terminology of Dubey and Shapley [279], π1 , π2 , . . . , πn
are the swing numbers, or raw Banzhaf indices, of the function. The name “swing
number” refers to the following concept.

Definition 1.33. Let f be a positive Boolean function on $B^n$, and let k ∈
{1, 2, ..., n}. A swing of f for variable k is a false point $X^* = (x_1^*, x_2^*, \dots, x_n^*)$
of f such that $X^* \vee e_k$ is a true point of f, where $e_k = (0, \dots, 0, 1, 0, \dots, 0)$ denotes
the k-th unit vector.

The relation between swings and modified Chow parameters is simple.

Theorem 1.37. If f is a positive Boolean function on $B^n$ with modified Chow
parameters (π1, π2, ..., πn, π), then πk is the number of swings of f for k, k =
1, 2, ..., n.

Proof. Let $Y^*$ be any true point of f such that $y_k^* = 1$ (note that there are $\omega_k$ such
points), and write $Y^* = X^* \vee e_k$, where $x_k^* = 0$. Then, either $X^*$ is a swing for
k, or $X^*$ is a true point of f, but not both. Moreover, all swings of f for k and
all true points of f whose k-th component is zero can be obtained in this way.
Since f has exactly $\omega - \omega_k$ true points whose k-th component is zero, denoting
by $s_k$ the number of swings for k, we conclude that $\omega_k = s_k + (\omega - \omega_k)$
or, equivalently, $s_k = 2\omega_k - \omega = \pi_k$.
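
Theorem 1.37 is easy to check numerically on a small example. The sketch below computes the Chow parameters, the modified Chow parameters, and the swing counts by enumerating all points of B^n; the example function (a weighted majority game with weights 4, 3, 2, 1 and threshold 5, variables numbered from 0) and the function names are illustrative assumptions.

```python
from itertools import product

def chow_parameters(f, n):
    """Return ([omega_1, ..., omega_n], omega): omega counts the true points of f,
    omega_k counts the true points whose k-th component equals 1."""
    omega = 0
    omega_k = [0] * n
    for x in product((0, 1), repeat=n):
        if f(x):
            omega += 1
            for k in range(n):
                omega_k[k] += x[k]
    return omega_k, omega

def modified_chow_parameters(f, n):
    """Return ([pi_1, ..., pi_n], pi) as in Definition 1.32."""
    omega_k, omega = chow_parameters(f, n)
    return [2 * w - omega for w in omega_k], omega - 2 ** (n - 1)

def count_swings(f, n, k):
    """Count the false points X with x_k = 0 such that X v e_k is a true point."""
    swings = 0
    for x in product((0, 1), repeat=n):
        if x[k] == 0 and not f(x) and f(x[:k] + (1,) + x[k + 1:]):
            swings += 1
    return swings

# Hypothetical positive threshold function: total weight must exceed 5.
def f(x):
    return sum(w * xi for w, xi in zip((4, 3, 2, 1), x)) > 5

pi_k, pi = modified_chow_parameters(f, 4)
swing_counts = [count_swings(f, 4, k) for k in range(4)]
```

On this example the lists `pi_k` and `swing_counts` coincide, as the theorem predicts for positive functions.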

In voting terms, a swing for variable (that is, player) k corresponds to a losing
coalition (namely, the coalition {i ∈ N | xi∗ = 1}) that turns into a winning coalition
when player k joins it. Intuitively, then, player k retains a lot of power in the game
v if fv has many swings for k, since this means that k plays a “pivotal” role in
many winning coalitions.
Accordingly, many authors define power indices as functions of the number
of swings or, equivalently, of the modified Chow parameters. Banzhaf [52], for
instance, made a proposal which translates as follows in our terminology (see also
Penrose [739] for pioneering work on this topic).

Definition 1.34. If f is a positive, nonconstant Boolean function on $B^n$ with
modified Chow parameters (π1, π2, ..., πn, π), then the k-th (normalized) Banzhaf
index of f is the quantity

\[ \beta_k = \frac{\pi_k}{\sum_{i=1}^{n} \pi_i}, \]

for k = 1, 2, ..., n.
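
A minimal sketch of Definition 1.34, assuming the function is given as a Python callable on 0/1 tuples: by Theorem 1.37, the modified Chow parameters of a positive function can be obtained by counting swings, and the indices are then normalized with exact rational arithmetic. The example game and the name `banzhaf_indices` are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

def banzhaf_indices(f, n):
    """Normalized Banzhaf indices of a positive, nonconstant function f on B^n."""
    swing_counts = [0] * n
    for x in product((0, 1), repeat=n):
        if f(x):
            continue  # swings are false points
        for k in range(n):
            if x[k] == 0 and f(x[:k] + (1,) + x[k + 1:]):
                swing_counts[k] += 1
    total = sum(swing_counts)  # positive because f is nonconstant
    return [Fraction(s, total) for s in swing_counts]

# Hypothetical weighted majority game: weights 4, 3, 2, 1 and threshold 5.
def f(x):
    return sum(w * xi for w, xi in zip((4, 3, 2, 1), x)) > 5

beta = banzhaf_indices(f, 4)
```

Note that the two middle players receive the same index although their weights differ, which illustrates that power is not proportional to weight.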

The Banzhaf index ranks among the most extensively studied and widely
accepted power indices for voting games. In spite of some fundamental draw-
backs, it agrees on many accounts with what we would intuitively expect from
a reasonable measure of power (see Dubey and Shapley [279]; Felsenthal and
Machover [329]; and Straffin [850] for an axiomatic characterization and exten-
sive discussions of the relation between Banzhaf and other power indices).
Note, for instance, that, in view of Theorem 1.18, the Banzhaf index of an
inessential player is equal to zero. The converse statement also holds for pos-
itive Boolean functions (the proof is left to the reader as an exercise). We
return to this topic in Chapter 9. Many other connections between the theory
of Boolean functions and the theory of simple games will also be established in
the monograph.
Finally, it is interesting to observe that Boolean functions provide useful models
for investigating certain types of nonsimple games, for example, 2-player posi-
tional games in normal form. We do not further discuss this topic now but refer the
reader to Chapter 10 and to Gurvich [421, 423, 424, etc.] for more information.

1.13.4 Reliability theory


As explained in Applications 1.4 and 1.8, reliability theory models every complex
system S by a positive Boolean function fS called the structure function of S. To
rule out trivial cases, it is often assumed that all variables of fS are essential. When
this is the case, the system and its structure function are said to be coherent. This
framework was introduced by Birnbaum, Esary, and Saunders [92] and is further
discussed in Barlow and Proschan [54], Colbourn [205, 206], Ramamurthy [777],
or Provan [759]. Colbourn [206], in particular, examines in depth the interplay
between combinatorial and Boolean reliability models.

Let N = {1, 2, ..., n} be the set of components. If $\bigwedge_{i \in A} x_i$ is an implicant of
the function fS, then the whole system S operates whenever the components in
A operate, irrespective of the state of the remaining components. In reliability
parlance, the set A is called a pathset of S. If no subset of A is itself a pathset,
then A is called a minimal pathset. Thus, we see that the (minimal) pathsets of S
correspond exactly to the (prime) implicants of fS .
As mentioned in Application 1.4, the fundamental problem of reliability theory
is to compute the probability that the system S operates when its components
fail randomly. Assume for the sake of simplicity that the components work or
fail independently of each other, and let pi denote the probability that component
i works, for i = 1, 2, . . . , n. Thus, we have pi = Prob[xi = 1] and we want to
compute RelS (p1 , p2 , . . . , pn ) = Prob[fS = 1], which is the probability that the
system S operates.

If $\phi = \bigvee_{k=1}^{m} \bigl( \bigwedge_{i \in A_k} x_i \bigwedge_{j \in B_k} \bar{x}_j \bigr)$ is an orthogonal DNF of fS, then Theorem
1.7 can be used to compute RelS(p1, p2, ..., pn) (see [49, 205, 206, 619, 759,
etc.]). Indeed, denoting by E[fS] the expected value of the random variable
fS(x1, x2, ..., xn), we successively derive:

\begin{align*}
Rel_S(p_1, p_2, \dots, p_n) &= \mathrm{Prob}[f_S = 1] \\
&= E[f_S] && \text{(since $f_S$ is a Bernoulli random variable)} \\
&= E\Bigl[\, \sum_{k=1}^{m} \prod_{i \in A_k} x_i \prod_{j \in B_k} (1 - x_j) \Bigr] && \text{(by Theorem 1.7)} \\
&= \sum_{k=1}^{m} \prod_{i \in A_k} p_i \prod_{j \in B_k} (1 - p_j) && \text{(by the independence assumption).}
\end{align*}
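
The derivation above translates directly into a short computation. The following sketch evaluates the reliability from an orthogonal DNF and compares it with a brute-force sum over all component states; the three-component system f = x0 ∨ x1 x2, its orthogonal DNF x0 ∨ x̄0 x1 x2, the probabilities, and the function names are assumptions chosen for illustration.

```python
from itertools import product
from math import prod

def reliability_from_orthogonal_dnf(terms, p):
    """terms: list of (A, B) index pairs, one per term of an orthogonal DNF,
    where A holds the uncomplemented and B the complemented variables."""
    return sum(prod(p[i] for i in A) * prod(1 - p[j] for j in B)
               for A, B in terms)

def reliability_by_enumeration(f, p):
    """Sum the probabilities of all true points of f (independent components)."""
    total = 0.0
    for x in product((0, 1), repeat=len(p)):
        if f(x):
            total += prod(pi if xi else 1 - pi for xi, pi in zip(x, p))
    return total

# Hypothetical system: it works if component 0 works, or if both 1 and 2 work.
def f(x):
    return x[0] or (x[1] and x[2])

orthogonal_terms = [((0,), ()), ((1, 2), (0,))]  # x0  v  (not x0) x1 x2
p = (0.9, 0.8, 0.7)
rel_dnf = reliability_from_orthogonal_dnf(orthogonal_terms, p)
rel_enum = reliability_by_enumeration(f, p)
```

The orthogonal DNF needs one product per term, whereas the enumeration visits all 2^n points; the formula is only valid because no two terms of the DNF can be equal to 1 simultaneously.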

When viewed as a function from $[0, 1]^n$ to [0, 1], RelS is called the reliability
function or reliability polynomial of S (see, e.g., [54, 205, 206, 777]). Observe
that the polynomial RelS extends the Boolean function $f_S : \{0, 1\}^n \to \{0, 1\}$ over
the whole unit cube $U^n = [0, 1]^n$. As a matter of fact, if fS is viewed as a pseudo-Boolean
function, and if it is represented as a multilinear polynomial over the reals
(see Section 1.12.2)

\[ f_S(x_1, x_2, \dots, x_n) = \sum_{A \in \mathcal{P}(N)} c(A) \prod_{i \in A} x_i, \]

then we similarly conclude that

\[ Rel_S(p_1, p_2, \dots, p_n) = \sum_{A \in \mathcal{P}(N)} c(A) \prod_{i \in A} p_i. \]

Similar observations have also been made in the game theory literature (see [456,
720, 777]).
When $p_i = \frac{1}{2}$ for i = 1, 2, ..., n, all vertices of $B^n$ are equiprobable with
probability $2^{-n}$. As a result,

\[ Rel_S\bigl(\tfrac{1}{2}, \dots, \tfrac{1}{2}\bigr) = \mathrm{Prob}[f_S = 1] = \frac{\omega(f_S)}{2^n}, \]

where ω(fS) denotes as usual the number of true points of fS. Similarly, if $p_k = 1$
for some component k and $p_i = \frac{1}{2}$ for i = 1, 2, ..., n, i ≠ k, then

\[ Rel_S\bigl(\tfrac{1}{2}, \dots, \tfrac{1}{2}, 1, \tfrac{1}{2}, \dots, \tfrac{1}{2}\bigr) = \frac{\omega_k}{2^{n-1}}, \]
where ωk is the k-th Chow parameter of fS . These observations show that com-
puting the Chow parameters of a positive Boolean function is just a special case of
computing the reliability of a coherent system. Also, similarly to what happened
for simple games, variants of the modified Chow parameters have been used in
the literature to estimate the “importance” of individual components of a coherent
system. Ramamurthy [777] explains nicely how Banzhaf and other power indices
(like the Shapley-Shubik index) have been rediscovered in this framework.
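
The reduction from Chow parameters to reliability computations can be sketched as follows, using exact rational arithmetic. The reliability is computed here by plain enumeration, which is only an assumption made to keep the example short; the structure function f = x0 ∨ x1 x2 and the function names are likewise illustrative.

```python
from fractions import Fraction
from itertools import product

def reliability(f, p):
    """Prob[f = 1] when component i works independently with probability p[i]."""
    total = Fraction(0)
    for x in product((0, 1), repeat=len(p)):
        if f(x):
            weight = Fraction(1)
            for xi, pi in zip(x, p):
                weight *= pi if xi else 1 - pi
            total += weight
    return total

def chow_from_reliability(f, n):
    """Recover (omega_1..omega_n, omega) from n + 1 reliability evaluations."""
    half = Fraction(1, 2)
    omega = reliability(f, [half] * n) * 2 ** n
    omega_k = []
    for k in range(n):
        p = [half] * n
        p[k] = Fraction(1)  # component k is perfectly reliable
        omega_k.append(reliability(f, p) * 2 ** (n - 1))
    return omega_k, omega

# Hypothetical coherent system with structure function x0 v x1 x2.
def f(x):
    return x[0] or (x[1] and x[2])

omega_k, omega = chow_from_reliability(f, 3)
```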

1.13.5 Combinatorics
Relations between Boolean functions and other classical combinatorial constructs,
such as graphs, hypergraphs, independence systems, clutters, block designs,
matroids, colorings, and so forth, are amazingly rich and diverse. Over time, these
relations have been exploited to gain insights into the constructs themselves (see,
e.g., Benzaken [64]), to handle algorithmic issues related to the functions or the
constructs (see, e.g., Hammer and Rudeanu [460]; Aspvall, Plass, and Tarjan [34];
Simeone [834]) and to introduce previously unknown classes of combinatorial
objects (see, e.g., Chvátal and Hammer [201]). These are but a few examples, and
we will encounter plenty more throughout this book. In this section, we only men-
tion a few useful connections between the study of hypergraphs and the concepts
introduced so far.
The stability function f = fH of a hypergraph H = (N , E) was introduced in
Application 1.5. We observed in Application 1.9 that fH is a positive function. In
fact, if N = {1, 2, . . . , n}, then it is easy to see that

\[ f_H(x_1, x_2, \dots, x_n) = \bigvee_{A \in E} \bigwedge_{j \in A} x_j. \tag{1.23} \]

It is important to realize that the function fH does not completely define the
hypergraph H. Indeed, consider two hypergraphs H = (N, E) and H′ = (N, E′).
If E ⊆ E′, and if every edge in E′ \ E contains some edge in E, then H and H′ have
exactly the same stable sets, so that fH = fH′. Thus, the expression (1.23) of fH
can be rewritten as

\[ f_H(x_1, x_2, \dots, x_n) = \bigvee_{A \in P} \bigwedge_{j \in A} x_j, \]

where P is the set of minimal edges of H. Putting this observation parallel with
Theorem 1.23, we see that the terms $\bigwedge_{j \in A} x_j$ (A ∈ P) are nothing but the prime
implicants of fH.
Obviously, the minimal edges of a hypergraph H form a clutter (or a Sperner
family), namely, a subhypergraph (N , P) of H with the property that

A ∈ P, B ∈ P, A ≠ B ⇒ A ⊈ B.

Conversely, any clutter can also be viewed as defining the collection of minimal
edges of a hypergraph or, equivalently, the collection of prime implicants of a
positive Boolean function.
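
The passage from a hypergraph to its clutter of minimal edges can be sketched in a few lines. In the example below, edges are represented as sets of vertex indices, and the hypergraph and the function names are assumptions for illustration; the final check confirms that discarding non-minimal edges leaves the stability function unchanged.

```python
from itertools import product

def minimal_edges(edges):
    """Return the clutter of inclusion-wise minimal edges."""
    edge_set = {frozenset(e) for e in edges}
    return {a for a in edge_set if not any(b < a for b in edge_set)}

def stability_function(edges):
    """Return f_H as in (1.23): f_H(x) = 1 iff the support of x contains an edge."""
    edge_list = [frozenset(e) for e in edges]
    def f(x):
        return int(any(all(x[j] for j in e) for e in edge_list))
    return f

# Hypothetical hypergraph on the vertices 0, 1, 2, 3.
E = [{0, 1}, {1, 2}, {0, 1, 2}, {1, 2, 3}]
P = minimal_edges(E)
f_E, f_P = stability_function(E), stability_function(P)
same_function = all(f_E(x) == f_P(x) for x in product((0, 1), repeat=4))
```

Here P contains only {0, 1} and {1, 2}, which correspond to the prime implicants x0 x1 and x1 x2.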
Many operations on hypergraphs or clutters are natural counterparts of oper-
ations on Boolean expressions. For instance, if H = (N, E) is a clutter and
j ∈ N , the clutter H \ j is defined as follows: H \ j = (N \ {j }, F), where
F = E \ {A ∈ E | j ∈ A} (deletion of j ; see, e.g., Seymour [821] or the literature
on matroid theory). Thus, fH\j is simply the restriction of fH to xj = 0.
Similarly, the clutter H/j is defined as H/j = (N \ {j}, G), where G is the
collection of minimal sets in {A \ {j} | A ∈ E} (contraction of j). We see that
fH/j is the restriction of fH to xj = 1. We shall come back to these operations in
Chapter 4, when we discuss duality theory.
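
Both operations, and their interpretation as restrictions, can be checked by brute force on a small clutter. In this sketch a point of B^n is represented by the set of indices of its components equal to 1; the clutter H, the chosen element j, and the helper names are assumptions for illustration.

```python
from itertools import combinations

def f(clutter, ones):
    """Value of f_H at the point whose components equal to 1 are listed in `ones`."""
    return int(any(A <= ones for A in clutter))

def delete(clutter, j):
    """Edge set of H \\ j: drop every edge containing j."""
    return {A for A in clutter if j not in A}

def contract(clutter, j):
    """Edge set of H / j: remove j from every edge, keep the minimal sets."""
    shrunk = {A - {j} for A in clutter}
    return {A for A in shrunk if not any(B < A for B in shrunk)}

def subsets(ground):
    items = sorted(ground)
    for size in range(len(items) + 1):
        for chosen in combinations(items, size):
            yield frozenset(chosen)

# Hypothetical clutter on N = {0, 1, 2, 3}.
H = {frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})}
j = 1
rest = frozenset(range(4)) - {j}
# Deletion corresponds to fixing x_j = 0, contraction to fixing x_j = 1.
deletion_ok = all(f(delete(H, j), S) == f(H, S) for S in subsets(rest))
contraction_ok = all(f(contract(H, j), S) == f(H, S | {j}) for S in subsets(rest))
```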
A (simple, undirected) graph G = (V , E) is a special type of hypergraph such
that |e| = 2 for all edges e ∈ E (we adopt a well-entrenched convention and denote
edges of a graph by lowercase letters). Thus, graphs are in one-to-one correspon-
dence with purely quadratic positive Boolean functions, that is, positive functions
with prime implicants of degree 2 only. This, and related connections between
graphs and quadratic functions, will be exploited repeatedly in later chapters (see,
in particular, Chapter 5). For now, let us just illustrate its use in deriving the fol-
lowing observation due to Ball and Provan [49] (we refer to [371, 725, 883] and
Appendix B for a definition of #P-completeness):

Theorem 1.38. Computing the number of true points of a Boolean function
f expressed in DNF is #P-complete, even if f is purely quadratic and
positive.


Proof. Let $f(x_1, x_2, \dots, x_n) = \bigvee_{\{i,j\} \in E} x_i x_j$, and let G be the corresponding graph,
namely, G = (N, E). We denote by s(G) the number of stable sets of G, and by
ω(f) the number of true points of f. Valiant [883] proved that computing s(G) is
#P-complete. Since $s(G) = 2^n - \omega(f)$, the result follows.

As a corollary of this theorem, we can also conclude that computing the Chow
parameters of a quadratic positive function is #P-hard. Observe that these results
actually hold independently of the representation of f. Indeed, if we know in
advance that f is purely quadratic and positive, then the complete DNF of f can
easily be obtained by querying $O(n^2)$ values of f: For all pairs of indices i, j ∈ N,
compute $f(e_{\{i,j\}})$, where $e_{\{i,j\}}$ is the characteristic vector of {i, j}. Those pairs
{i, j} such that $f(e_{\{i,j\}}) = 1$ are exactly the edges of G.
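
The querying procedure and the identity s(G) = 2^n − ω(f) used in the proof can be illustrated as follows. The function is treated as a black box; the example (the path on vertices 0, 1, 2, 3) and the function names are assumptions, and the true points are counted by exhaustive enumeration, which is of course exponential.

```python
from itertools import combinations, product

def recover_edges(f, n):
    """Query f on the characteristic vectors of all pairs {i, j}."""
    edges = []
    for i, j in combinations(range(n), 2):
        point = [0] * n
        point[i] = point[j] = 1
        if f(tuple(point)):
            edges.append((i, j))
    return edges

def count_true_points(f, n):
    return sum(1 for x in product((0, 1), repeat=n) if f(x))

# Hypothetical purely quadratic positive function: x0 x1 v x1 x2 v x2 x3.
def f(x):
    return bool(x[0] * x[1] or x[1] * x[2] or x[2] * x[3])

edges = recover_edges(f, 4)
stable_sets = 2 ** 4 - count_true_points(f, 4)
```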
We conclude this section by mentioning one last connection between combina-
torial structures and positive Boolean functions. In 1897, Dedekind asked for the
number d(n) of elements of the free distributive lattice on n elements. This famous
question is often referred to as Dedekind’s problem [572]. As it turns out, d(n) is
equal to the number of positive Boolean functions of n variables. The number
d(n) grows quite fast and its exact value is only known for small values of n; see
Table 1.3 based on Berman and Köhler [73]; Church [196]; Wiedemann [908];
and sequence A000372 in Sloane [840]. Kleitman [572] proved that $\log_2 d(n)$ is
asymptotic to the middle binomial coefficient $\binom{n}{\lfloor n/2 \rfloor}$ (see also [542, 573, 578, 579]
for extensions and refinements of this deep result).
We should warn the reader, however, that the relations between combina-
torics and Boolean theory are by no means limited to the study of positive
Boolean functions. Later in the book, we shall have several opportunities to
encounter nonpositive Boolean functions linked, in various ways, to graphs
or hypergraphs.

Table 1.3. The number of positive Boolean functions of n variables for n ≤ 8

n    d(n)
0    2
1    3
2    6
3    20
4    168
5    7581
6    7828354
7    2414682040998
8    56130437228687557907788
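
The first rows of the table can be reproduced with a naive enumeration. The sketch below relies on the observation that f is positive on B^n exactly when its restrictions g (to x_n = 0) and h (to x_n = 1) are positive on B^(n−1) and satisfy g ≤ h; truth tables are encoded as integer bitmasks. This recursive approach and the function name are assumptions made for illustration, and the method is not practical beyond n = 5 or so.

```python
def positive_functions(n):
    """Truth tables (bitmasks over the 2**n points) of all positive functions on B^n."""
    if n == 0:
        return [0, 1]  # the two constant functions
    previous = positive_functions(n - 1)
    half = 1 << (n - 1)  # number of points of B^(n-1)
    # Pair every g with every h such that g <= h pointwise (no bit of g outside h).
    return [g | (h << half)
            for g in previous for h in previous
            if (g & ~h) == 0]

counts = [len(positive_functions(n)) for n in range(6)]
```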

1.13.6 Integer programming


Consider a very general 0–1 integer programming problem P of the form

\begin{align*}
\text{maximize } \quad & z(x_1, x_2, \dots, x_n) = \sum_{i=1}^{n} c_i x_i \tag{1.24} \\
\text{subject to } \quad & (x_1, x_2, \dots, x_n) \in F, \tag{1.25}
\end{align*}


where c1 , c2 , . . . , cn are integer coefficients and F ⊆ B n is a set of feasible 0–1
solutions. Following Granot and Hammer [410], we call resolvent of F the Boolean
function fF (x1 , x2 , . . . , xn ) that takes value 0 on F and value 1 elsewhere (see also
Hammer and Rudeanu [460], where the function fF is called the characteristic
function of F , or Granot and Hammer [411], where fF is implicitly described).
So, problem P is equivalent to

\begin{align*}
\text{maximize } \quad & z(x_1, x_2, \dots, x_n) = \sum_{i=1}^{n} c_i x_i \tag{1.26} \\
\text{subject to } \quad & f_F(x_1, x_2, \dots, x_n) = 0 \tag{1.27} \\
& (x_1, x_2, \dots, x_n) \in B^n. \tag{1.28}
\end{align*}
Let us assume for a moment that we have somehow obtained a DNF expression
of the resolvent fF . Then, problem P can be rewritten as a linear 0–1 programming
problem with a very special structure. Indeed, as observed by Balas and Jeroslow
[43] and by Granot and Hammer [410, 411]:
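Theorem 1.39. If

\[ \psi = \bigvee_{k=1}^{m} \Bigl( \bigwedge_{i \in A_k} x_i \bigwedge_{j \in B_k} \bar{x}_j \Bigr) \tag{1.29} \]

is a DNF expression of the resolvent fF, then problem P is equivalent to the
generalized covering problem

\begin{align*}
\text{maximize } \quad & z(x_1, x_2, \dots, x_n) = \sum_{i=1}^{n} c_i x_i \tag{1.30} \\
\text{subject to } \quad & \sum_{i \in A_k} x_i - \sum_{j \in B_k} x_j \le |A_k| - 1, \quad k = 1, 2, \dots, m \tag{1.31} \\
& (x_1, x_2, \dots, x_n) \in B^n. \tag{1.32}
\end{align*}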

Proof. We must show that the set of false points of fF coincides with the set of
solutions of (1.31). Let X∗ be a false point of fF . For each k = 1, 2, . . . , m, since
ψ(X∗ ) = 0, either there is an index i ∈ Ak such that xi∗ = 0, or there is an index
j ∈ Bk such that xj∗ = 1. In either case, we see that X ∗ satisfies the k-th inequality
in (1.31). The converse statement is equally easy. 
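
The equivalence stated in Theorem 1.39 can be verified exhaustively on a small DNF. In the sketch below each term is stored as a pair (A, B) of index tuples, as in (1.29); the particular DNF and the function names are assumptions for illustration.

```python
from itertools import product

def dnf_value(terms, x):
    """Evaluate the DNF whose k-th term is AND_{i in A} x_i AND_{j in B} (not x_j)."""
    return int(any(all(x[i] for i in A) and not any(x[j] for j in B)
                   for A, B in terms))

def satisfies_covering(terms, x):
    """Check the generalized covering inequalities (1.31), one per term."""
    return all(sum(x[i] for i in A) - sum(x[j] for j in B) <= len(A) - 1
               for A, B in terms)

# Hypothetical resolvent: x0 (not x1)  v  x1 x2  v  (not x0)(not x2).
terms = [((0,), (1,)), ((1, 2), ()), ((), (0, 2))]
equivalent = all((dnf_value(terms, x) == 0) == satisfies_covering(terms, x)
                 for x in product((0, 1), repeat=3))
feasible = [x for x in product((0, 1), repeat=3) if satisfies_covering(terms, x)]
```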

Theorem 1.39 takes an especially interesting form when fF is positive. Indeed,
remember that a set covering problem is a linear 0–1 programming problem of the
form

\begin{align*}
\text{minimize } \quad & \sum_{i=1}^{n} w_i y_i \\
\text{subject to } \quad & \sum_{i \in S_k} y_i \ge 1, \quad k = 1, 2, \dots, m \\
& (y_1, y_2, \dots, y_n) \in B^n,
\end{align*}

where S1, S2, ..., Sm are subsets of {1, 2, ..., n} (see, e.g., Nemhauser and
Wolsey [707]).
Now, if we assume that (1.29) is a positive DNF of fF, namely, if $B_k = \emptyset$ for
k = 1, 2, ..., m, then we obtain (Granot and Hammer [411]):

Theorem 1.40. If the resolvent fF is positive, and if

\[ \psi = \bigvee_{k=1}^{m} \bigwedge_{i \in A_k} x_i \tag{1.33} \]

is a positive DNF of fF, then problem P is equivalent to the following set covering
problem SCP:

\begin{align*}
\text{minimize } \quad & z'(y_1, y_2, \dots, y_n) = \sum_{i=1}^{n} c_i y_i \tag{1.34} \\
\text{subject to } \quad & \sum_{i \in A_k} y_i \ge 1, \quad k = 1, 2, \dots, m \tag{1.35} \\
& (y_1, y_2, \dots, y_n) \in B^n. \tag{1.36}
\end{align*}

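Proof. By Theorem 1.39, P is equivalent to

\begin{align*}
\text{maximize } \quad & z(x_1, x_2, \dots, x_n) = \sum_{i=1}^{n} c_i x_i \\
\text{subject to } \quad & \sum_{i \in A_k} x_i \le |A_k| - 1, \quad k = 1, 2, \dots, m \\
& (x_1, x_2, \dots, x_n) \in B^n.
\end{align*}

For i = 1, 2, ..., n, it is now sufficient to replace variable xi by a new variable
yi = 1 − xi in this formulation. Indeed, the k-th constraint then becomes
$\sum_{i \in A_k} y_i \ge 1$, and maximizing $\sum_{i=1}^{n} c_i x_i = \sum_{i=1}^{n} c_i - \sum_{i=1}^{n} c_i y_i$ amounts to
minimizing $\sum_{i=1}^{n} c_i y_i$.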

Note that the latter result motivates the terminology “generalized covering”
used in Theorem 1.39.
Another way to look at Theorem 1.40 is suggested by the connections estab-
lished in Section 1.13.5. Indeed, when fF is positive, the feasible solutions of P
are exactly the stable sets of a hypergraph, and the feasible solutions of SCP are
the transversals of this hypergraph. So, Theorem 1.40 simply builds on the well-
known observation that stable sets are exactly the complements of transversals
(see, e.g., Berge [72]).
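
The complementation argument behind Theorem 1.40 can be checked by enumeration on a small instance. In this sketch the positive DNF is given by its index sets A_k (the edges of a hypergraph), and both problems are solved by brute force; the instance, the cost vector, and the function names are assumptions for illustration.

```python
from itertools import product

def is_stable(terms, x):
    """x is feasible for P: no term of the positive DNF has all its variables at 1."""
    return not any(all(x[i] for i in A) for A in terms)

def is_transversal(terms, y):
    """y is feasible for SCP: every set A_k contains an index i with y_i = 1."""
    return all(sum(y[i] for i in A) >= 1 for A in terms)

def cost(c, x):
    return sum(ci * xi for ci, xi in zip(c, x))

# Hypothetical instance: psi = x0 x1 v x1 x2 v x2 x3 with costs c.
terms = [(0, 1), (1, 2), (2, 3)]
c = (3, 5, 4, 2)
points = list(product((0, 1), repeat=4))

# Stable sets are exactly the complements of transversals.
complement_ok = all(is_stable(terms, x) == is_transversal(terms, tuple(1 - xi for xi in x))
                    for x in points)
best_P = max(cost(c, x) for x in points if is_stable(terms, x))
best_SCP = min(cost(c, y) for y in points if is_transversal(terms, y))
```

The two optimal values are linked by the constant sum of the costs, as in the proof of the theorem.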
Algorithms based on the transformations described in Theorems 1.39 and 1.40
have been proposed in [408, 409, 410]. Several recent approaches to the solution
of Boolean equations also rely on this transformation (see, e.g., [184]).
We shall come back to integer programming problems of the form P in sub-
sequent chapters of the book (see, in particular, Sections 4.2, 8.6, and 9.4). For
now, we conclude this section with a discussion of the complexity of computing
a DNF expression of the resolvent. For this question to make sense, we must first
specify how the set F is described in (1.25). In the integer programming context,
F would typically be defined as the solution set of a system of linear inequalities
in the variables x1, x2, ..., xn, say,

\[ \sum_{i=1}^{n} a_{ki} x_i \le b_k, \quad k = 1, 2, \dots, s. \tag{1.37} \]

When this is the case, there is generally no low-complexity, practically efficient
algorithm for computing a DNF of fF. More precisely, in Section 9.5 of Chapter 9
we show that the size of every DNF of the resolvent may be exponentially large in
the input size of (1.37), even when s = 1, that is, when P is a so-called knapsack
problem. (Observe that when s = 1, the resolvent is a threshold function.) We also
see in Chapter 9 that, in this context, the resolvent still turns out to be a useful
concept.
On the other hand, there are examples of combinatorial optimization prob-
lems for which the resolvent of F is directly available in DNF. The most obvious
example, in view of Theorem 1.40, is when P is a set-covering problem.

Finally, one should also notice that, as long as the description of F is in NP,
Cook’s theorem [208] guarantees the existence of a polynomial-time procedure
which, given any instance of F, produces an integer t ≥ n and a DNF expression
$\phi(y_1, y_2, \dots, y_t)$ such that $X^* \in F$ if and only if $\phi(y_1^*, y_2^*, \dots, y_t^*) = 0$ for some
$(y_1^*, y_2^*, \dots, y_t^*) \in B^t$. However, although the DNF φ bears some resemblance to
the resolvent of F, it usually involves a large number of additional variables
besides the original variables x1, x2, ..., xn (compare with the DNF produced by the
procedure Expand in Section 1.5).

1.14 Exercises
1. Compute the number of Boolean functions and DNF expressions in n
variables, for n = 1, 2, . . . , 6.
2. Show that $\overline{(a x \vee b \bar{x})} = \bar{a} x \vee \bar{b} \bar{x}$ for all a, b, x ∈ B.
3. Prove that every Boolean function has an expression involving only dis-
junctions and negations, but no conjunctions, as well as an expression
involving only conjunctions and negations, but no disjunctions.
4. The binary operator NOR is defined by $\mathrm{NOR}(x, y) = \bar{x}\,\bar{y}$. Show that every
Boolean expression is equivalent to an expression involving only the NOR
operator (and parentheses). Show that the same property holds for the
NAND operator defined by $\mathrm{NAND}(x, y) = \bar{x} \vee \bar{y}$. (See, e.g., [752] for
far-reaching extensions of these observations.)
5. A Boolean function f is called symmetric if f (x1 , x2 , . . . , xn ) =
f (xσ1 , xσ2 , . . . , xσn ) for all permutations (σ1 , σ2 , . . . , σn ) of {1, 2, . . . , n}.
(a) Prove that f is symmetric if and only if there exists a function
g : {0, 1, ..., n} → B such that, for all $X \in B^n$, $f(x_1, x_2, \dots, x_n) = g\bigl(\sum_{i=1}^{n} x_i\bigr)$.
(b) For k = 0, 1, ..., n, define the Boolean function $r_k$ by $r_k(X) = 1$ if and
only if $\sum_{i=1}^{n} x_i = k$. Prove that f is symmetric if and only if there exists
A ⊆ {0, 1, ..., n} such that $f = \bigvee_{k \in A} r_k$.
(c) Prove that the set of all symmetric functions is closed under disjunc-
tions, conjunctions, and complementations.
(d) What is the complexity of deciding whether a given DNF represents a
symmetric function?
6. Design a data structure to store a DNF φ in which
(a) φ can be stored in O(|φ|) space built in O(|φ|) time;
(b) finding a term of φ of a given degree requires O(1) time;
(c) finding a negative linear term of φ requires O(1) time;
(d) adding/deleting a term of degree k requires O(k) time;
(e) fixing/reactivating a literal occurring l times in φ requires O(l) time.
7. Show that the degree of a DNF expression of a Boolean function may be
strictly smaller than the degree of its complete DNF.
8. For an arbitrary Boolean function f on $B^n$, define the influence of variable
k (k = 1, 2, ..., n) to be the probability that $f_{|x_k=1}(X) \ne f_{|x_k=0}(X)$, where
X is drawn uniformly at random over $B^{n-1}$ (see Kahn, Kalai, and Linial
[543]). Show that, when f is positive, the influence of variable k is equal
to $\pi_k / 2^{n-1}$, where $\pi_k$ is the k-th modified Chow parameter of f.
9. Show that the binary operator ⊕ is commutative and associative, that is,
x1 ⊕ x2 = x2 ⊕ x1 and (x1 ⊕ x2 ) ⊕ x3 = x1 ⊕ (x2 ⊕ x3 ) for all x1 , x2 , x3 ∈ B.
10. The parity function on $B^n$ is the function $p_n(x_1, x_2, \dots, x_n) = x_1 \oplus x_2 \oplus \dots \oplus x_n$.
(a) Write a DNF expression of $p_n$.
(b) Compute the Chow parameters of $p_n$.
11. Assume that f is represented either as a sum-of-products modulo 2 of the
form (1.16) or as a multilinear polynomial over the reals of the form (1.19).
In each case, show how to efficiently solve the equation f (X) = 0.
12. Show that, if f is a Boolean function on B n , and f has an odd number of
true points, then
(a) every orthogonal DNF of f has degree n;
(b) every decision tree for f contains a path of length n from the root to
some leaf.
13. Prove that every Boolean function f on B n has a unique largest positive
minorant f− and a unique smallest positive majorant f + , where
(a) f− and f + are positive functions on B n ;
(b) f− ≤ f ≤ f + ;
(c) if g and h are any two positive functions such that g ≤ f ≤ h, then
g ≤ f− and f + ≤ h.
14. Prove that every Boolean function has the same maximal false points as its
largest positive minorant and the same minimal true points as its smallest
positive majorant (see previous exercise).
15. Consider the 0-1 integer programming problem (1.26)–(1.28) in Section
1.13.6. Prove that, when cj > 0 for j = 1, 2, . . . , n, (1.26)–(1.28) has the
same optimal solutions as the set covering problem obtained upon replacing
the resolvent fF by its largest positive minorant (see previous exercises,
and Hammer, Johnson, and Peled [443]).

Question for thought


16. (Open-ended). Characterize those multilinear polynomials over the reals
that represent Boolean functions. (See Section 1.12.2 and Nisan and
Szegedy [714].)
2
Boolean equations

The solution of Boolean equations is arguably the most fundamental problem
arising in the theory of Boolean functions. Actually, the quote at the beginning
of Chapter 1 shows that an important aspect of Boole’s original research program
was essentially to reduce logic to the solution of Boolean equations. Although
his hopes eventually proved overly optimistic, it will become clear in subsequent
chapters of this book that Boolean equations often arise as subproblems to be
solved in the course of tackling more complex problems. Therefore, their solution
is a cornerstone of many Boolean algorithms.
In this chapter, we present some representative models involving Boolean
equations and describe various algorithmic procedures for their solution: branch-
ing, variable elimination, the consensus method, and mathematical programming
approaches. In view of the importance of this topic, we spend quite a lot of time
discussing the details of classical procedures, their interrelations, respective mer-
its, and complexity. In the last section, we generalize the basic consistency-testing
problem in several ways: We examine the problems of counting and of generating
all solutions of a Boolean equation and briefly discuss the maximum satisfiability
(Max Sat) problem.

2.1 Definitions and applications


Definition 2.1. A Boolean equation is an equation of the form φ(X) = ψ(X),
where X = (x1 , x2 , . . . , xn ) is a vector of Boolean variables, and φ, ψ are Boolean
expressions in these variables. A solution of the equation is a point X∗ ∈ B n such
that φ(X∗ ) = ψ(X ∗ ). A Boolean equation is called consistent if it has a solution;
otherwise, it is called inconsistent.
For reasons to be discussed in Section 2.3, much of the literature on Boolean
equations focuses on DNF equations.
Definition 2.2. A DNF equation is a Boolean equation of the form φ(X) = 0,
where φ is a DNF. The degree of the DNF equation φ(X) = 0 is the degree of φ.


Boolean equations not only play a fundamental role in propositional logic and
in theoretical computer science but also occur directly and naturally in many
applications, such as artificial intelligence, electrical engineering, mathematical
programming, and so on. Here are brief outlines of some typical applications.
Application 2.1. (Propositional logic, artificial intelligence.) In propositional
logic, a formula (or a Boolean expression) φ is called satisfiable if the equation
φ(X) = 1 is consistent, and it is called a contradiction otherwise. The formula is
valid, or is a tautology, if φ is identically equal to 1, that is, if the equation φ(X) = 0
is inconsistent. These classical concepts play a central role in (propositional) logic
and in all applications of artificial intelligence in which propositional logic is used
to model knowledge.
To illustrate, consider a knowledge base of rules involving the propositional
variables x1 , x2 , . . . , xn , and let φ(x1 , x2 , . . . , xn ) be the Boolean expression associ-
ated with the knowledge base, as in Section 1.13.1 (Chapter 1). Then, as we have
seen, the set of solutions of the equation φ = 0 describes the set of models of the
knowledge base, that is, the set of truth assignments that satisfy all the rules. In
particular, the equation φ = 0 is consistent if and only if the collection of rules is
not self-contradictory. Also, questions relative to the atomic propositions – e.g.,
questions of the form, “Is xi = 1 consistent with the given rules?” – are directly
reducible to the solution of Boolean equations.
Similar principles are used in many other areas of artificial intelligence, notably
in automated theorem proving. Assume, for instance, that a theorem proving system
must prove or disprove a general implication of the form

\[ \forall X \in B^n : \quad (\phi(X) = 0) \Longrightarrow (\psi(X) = 0), \tag{2.1} \]

where φ and ψ are arbitrary Boolean expressions (the premise φ(X) = 0 could
express the axioms of the theory as well as a number of more specific hypotheses).
The usual way to attack this question is to reason by contradiction and to solve
the equation
\[ \phi(X) \vee \bar{\psi}(X) = 0. \]
If this equation is consistent, then any of its solutions yields a counter-example to
the conjecture (2.1). Conversely, if the equation is inconsistent, then the implication
(2.1) is a theorem.
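
The refutation scheme can be sketched with a naive solver that simply enumerates B^n; practical theorem provers rely on the far more efficient methods discussed later in this chapter. A counterexample to (2.1) is a point where φ vanishes and ψ does not, that is, a solution of φ ∨ ψ̄ = 0. The rules used in the example and the function names are assumptions for illustration.

```python
from itertools import product

def find_counterexample(phi, psi, n):
    """Return a solution of phi(X) v (not psi(X)) = 0, or None if there is none."""
    for x in product((0, 1), repeat=n):
        if (phi(x) | (1 - psi(x))) == 0:
            return x
    return None

# Hypothetical premise phi = 0 encodes the rules "x0 implies x1" and "x1 implies x2".
def phi(x):
    return (x[0] & (1 - x[1])) | (x[1] & (1 - x[2]))

# Conjecture 1: "x0 implies x2", i.e. x0 (not x2) = 0.
def psi_transitive(x):
    return x[0] & (1 - x[2])

# Conjecture 2: "x2 holds", i.e. (not x2) = 0.
def psi_x2_true(x):
    return 1 - x[2]

proof_1 = find_counterexample(phi, psi_transitive, 3)  # no counterexample: a theorem
proof_2 = find_counterexample(phi, psi_x2_true, 3)     # a counterexample exists
```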
Our discussion focused on propositional logic. However, testing the validity
of formulas in first-order predicate logic, even though an undecidable problem,
can, in principle, be “reduced” to the solution of an infinite number of Boolean
equations through an application of Herbrand’s theorem. This type of reduction is
used, either explicitly or implicitly, in many theorem-proving procedures for first-
order logic; see, for example, Gilmore [380], Davis and Putnam [261], Robinson
[787], Chang and Lee [186], Jeroslow [533], Thayse [863]. Boolean equations
also find applications in solving decision problems from modal logic, as discussed
in [384, 510]. 

Application 2.2. (Electrical engineering.) Boolean equations play a central role
in the design and analysis of logic circuits. We sketch here only some representative
applications arising in this field and refer the reader to the specialized literature
for more information; see, for instance, Abdulla, Bjesse, and Eén [1];
Brayton, Hachtel, McMullen, and Sangiovanni-Vincentelli [153]; Brown [156];
Herbstritt [490]; Kunz and Stoffel [590]; Schneeweiss [811]; Stephan, Brayton,
and Sangiovanni-Vincentelli [846]; or, the surveys by Clarke, Biere, Raimi, and
Zhu [204]; Gu, Purdom, Franco, and Wah [418]; Jiang and Villa [535]; or Villa,
Brayton, and Sangiovanni-Vincentelli [891].
When a Boolean function is to be physically realized by a VLSI circuit, it is usu-
ally desirable to first transform the original expression of the function into another
equivalent expression. This is because the original expression, which arose from
a functional specification of the circuit, may not be best suited for implementa-
tion purposes. Circuit designers would thus typically seek an expression requiring
fewer gates, fewer contacts, and so on in order to reduce the size of the circuit and
increase its speed and reliability. The transformed expression can be obtained by
algebraic manipulations based on the elementary rules spelled out in Chapter 1 or,
possibly, by other means. In particular, the Boolean minimization and dualization
problems discussed in Chapters 3 and 4 of this book mostly arise in this context;
we shall see that the solution of Boolean equations is a basic subproblem in this
framework. (Brayton et al. [153] developed the well-known computer program
Espresso-II for logic design; according to the authors (page 64): “Answering the
tautology question (deciding if f ≡ 1) is the most fundamental Boolean operation
required by Espresso-II.”)
Whatever means are used, the validity of the transformation of a given expres-
sion, say φ(X), to another expression, say ψ(X), has to be carefully established
in a so-called verification phase before one can proceed with the actual imple-
mentation of the circuit. Verification can (in principle) be carried out by solving
the Boolean equation $\phi(X) = \bar{\psi}(X)$. Indeed, φ(X) and ψ(X) are equivalent
expressions if and only if this equation is inconsistent.
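
A minimal sketch of this verification step, assuming both expressions are available as Python callables and using exhaustive enumeration in place of a real equation solver: a solution of φ(X) = ψ̄(X) is a point where the two expressions disagree. The expressions and the name `find_difference` are assumptions for illustration.

```python
from itertools import product

def find_difference(phi, psi, n):
    """Return a solution of phi(X) = not psi(X), or None if the equation is inconsistent."""
    for x in product((0, 1), repeat=n):
        if phi(x) == 1 - psi(x):
            return x
    return None

# Hypothetical specification: x0 x1 v x0 (not x1) v x1 x2.
def original(x):
    return (x[0] & x[1]) | (x[0] & (1 - x[1])) | (x[1] & x[2])

# Proposed simplification: x0 v x1 x2.
def simplified(x):
    return x[0] | (x[1] & x[2])

# Faulty simplification: x0 v x1.
def faulty(x):
    return x[0] | x[1]

check_ok = find_difference(original, simplified, 3)
check_bad = find_difference(original, faulty, 3)
```

When the check fails, the returned point is an input vector on which the two circuits produce different outputs.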
We saw in Chapter 1, Section 1.13.2, that the correct operation of a combi-
national circuit can be described by a Boolean equation ψ(X, Y , z) = 0, where
ψ is a DNF, X = (x1, x2, ..., xn) is the vector of variables associated with
the input signals of the circuit, z corresponds to the output signal of the circuit,
and Y = (y1 , y2 , . . . , ym ) is a vector of variables associated with the outputs of the
internal, “hidden” gates of the circuit.
In reality, a circuit may malfunction for any of a number of reasons, and the
problem of detecting such malfunctions is crucial in VLSI engineering. Various
techniques can be used for this purpose, depending on the type of faults that are
expected. We briefly discuss the detection of stuck-at faults. A stuck-at fault occurs
when, due to some physical defect, one of the gates of the circuit produces a
constant output, independent of the values of its inputs. The gate could be stuck at
1, meaning that it always produces a 1, or stuck at 0, meaning that it always outputs
a 0. Since the hidden gates of the circuit are not directly observable, one can only
infer stuck-at faults from the observed input and output signals of the circuit. In
general terms, the test generation problem for stuck-at faults can be expressed as
follows: Generate an input vector X∗ (or possibly several) such that the output of
the circuit is incorrect on that input when certain gates have stuck-at faults.
To make this more explicit, let us focus on the test generation problem for diag-
nosing whether a specific OR-gate, say, gate k, is stuck at 1. (In practice, one may
often safely assume that only a few gates are faulty in a circuit. It is even common
to posit the “single fault hypothesis” according to which one gate at most could
be faulty.) Let ψ(X, Y , z) be the Boolean expression modeling the combinational
circuit as explained in Section 1.13.2, let y1 , y2 model the inputs of gate k, and let
yk model its output. So, in the expression ψ, we can isolate the terms associated
with gate k by rewriting ψ(X, Y , z) as
$$\psi(X, Y, z) = \phi(X, Y, z) \vee y_1 \overline{y}_k \vee y_2 \overline{y}_k \vee \overline{y}_1 \overline{y}_2 y_k. \tag{2.2}$$
Observe that the role of the last three terms of ψ in (2.2) is only to describe
the correct operation of the OR-gate k (all three terms must be 0 when the gate is
operating properly).
To model the behavior of the circuit when gate k is stuck at 1, we introduce
a new variable w, representing the output of the faulty circuit, and a new vector
of variables V = (v1 , v2 , . . . , vm ), where vi represents the output signal of gate
i (i = 1, 2, . . . , m) in the faulty circuit. Applying the same reasoning as in the
absence of any fault, we can state: In every solution (X ∗ , V ∗ , w∗ ) of the equation
φ|vk =1 (X, V , w) = 0, the variable associated with each gate represents the output
of that gate on the input signal X∗ on the assumption that gate k is stuck at 1 (note
that the terms linking y1 , y2 , and yk are absent from this equation).
It is then easy to conclude that every solution (X∗ , Y ∗ , z∗ , V ∗ , w∗ ) of the equation
$$\psi(X, Y, z) \vee \phi|_{v_k = 1}(X, V, w) \vee z\,w \vee \overline{z}\,\overline{w} = 0 \tag{2.3}$$
has the following property: On the input signal X ∗ , the correct circuit described
by ψ produces the output z∗ , while the faulty circuit in which gate k is stuck at 1
produces the output $w^* = \overline{z}^*$. In other words, a valid test vector for the stuck-at-1
fault at gate k can be generated by solving the Boolean equation (2.3).
Example 2.1. Let us illustrate this procedure for the detection of a stuck-at-1 fault
at the second (lower) OR-gate of the circuit displayed in Figure 1.6. The expres-
sion ψ(X, Y , z) associated with this circuit is given by equation (1.22), where the
output of the OR-gate under consideration is represented by variable y2 . As in
equation (2.2), we can rewrite
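$$\psi(X, Y, z) = \phi(X, Y, z) \vee x_1 \overline{y}_2 \vee y_4 \overline{y}_2 \vee \overline{x}_1 \overline{y}_4 y_2,$$
with
$$\phi(X, Y, z) = \overline{y}_1 z \vee y_2 z \vee y_1 \overline{y}_2 \overline{z} \vee x_1 \overline{y}_1 \vee x_4 \overline{y}_1 \vee \overline{x}_1 \overline{x}_4 y_1 \vee \overline{x}_2 y_4 \vee x_3 y_4 \vee x_2 \overline{x}_3 \overline{y}_4.$$
Then, after some simplifications, equation (2.3) reduces to
$$x_1 \vee \overline{x}_4 \vee \overline{y}_1 \vee y_2 \vee y_4 \vee \overline{z} \vee \overline{v}_1 \vee w \vee x_2 \overline{x}_3 \vee \overline{x}_2 v_4 \vee x_3 v_4 = 0.$$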
The conclusion is that any input vector X∗ satisfying $x_1^* = 0$, $x_4^* = 1$, and
$x_2^* \overline{x}_3^* = 0$ is a valid test vector for a stuck-at-1 fault at the lower OR-gate.
Any such vector produces the output w∗ = 0 in the faulty circuit, when
it should produce the output z∗ = 1 in the correct circuit (it is not very
difficult to check that it is indeed so, by direct verification). 
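The direct verification mentioned in Example 2.1 can be sketched by simulating the correct and the faulty circuit on all 16 inputs. This is only an illustration: the gate equations y1 = x1 ∨ x4, y4 = x2 · (not x3), y2 = x1 ∨ y4, and z = y1 · (not y2) are an assumption read off from the terms of φ and ψ (Figure 1.6 is not reproduced here), and the function names are hypothetical.

```python
from itertools import product

def circuit(x1, x2, x3, x4, stuck_y2=None):
    """Evaluate the assumed circuit; optionally force the lower OR-gate y2 to a constant."""
    y1 = x1 | x4                  # upper OR-gate
    y4 = x2 & (1 - x3)            # AND of x2 and the complement of x3
    y2 = x1 | y4                  # lower OR-gate (the gate under test)
    if stuck_y2 is not None:
        y2 = stuck_y2             # stuck-at fault
    return y1 & (1 - y2)          # output gate

def find_test_vectors():
    """All inputs on which the correct and the stuck-at-1 circuits disagree."""
    return [x for x in product((0, 1), repeat=4)
            if circuit(*x) != circuit(*x, stuck_y2=1)]
```

Under these assumptions, every vector returned by `find_test_vectors()` has x1 = 0, x4 = 1, and x2 · (not x3) = 0, in agreement with the conclusion of the example.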

Larrabee [599] has demonstrated that a Boolean approach to test pattern gen-
eration, based on the formulation just described, is extremely effective in practice
and produces excellent results on well-known benchmark problems. In her exper-
iments, the approach proved competitive with alternative structural approaches
proposed in the specialized literature (see, e.g., [178, 590]).
In more recent work, Clarke et al. [204] describe successful reformulations
of other verification problems as Boolean DNF equations. They observe that this
approach, known as bounded model checking, appears to be remarkably efficient
and robust on industrial systems that would be difficult for the more traditional
model checking techniques based on binary decision diagrams; see also Jiang and
Villa [535]. 

Application 2.3. (Combinatorics.) Many properties of graphs and hypergraphs
can be easily expressed by means of Boolean equations. Theorem 2.1 in Section 2.2
provides a more precise statement of this claim, and Hammer and Rudeanu
[460] give several explicit Boolean formulations of combinatorial problems. More
examples will appear in subsequent chapters. So, we only present here a simple
illustration.
Let H = (N , E) be a hypergraph, where N = {1, 2, . . . , n}, and recall the termi-
nology in Appendix A. We say that H is 2-colorable if N can be partitioned into
two stable sets of H. Equivalently, H is 2-colorable if each of its vertices can be
assigned one of two colors, say blue or red, so that no edge of H is entirely blue or
entirely red. Introduce now n Boolean variables x1 , x2 , . . . , xn with the interpreta-
tion that vertex i is colored blue (respectively, red) if xi = 1 (respectively, xi = 0).
Then, H is 2-colorable if and only if the following DNF equation is consistent:



$$\phi(x_1, x_2, \ldots, x_n) = \bigvee_{A \in E} \bigwedge_{j \in A} x_j \;\vee\; \bigvee_{A \in E} \bigwedge_{j \in A} \overline{x}_j = 0.$$

This straightforward observation seems to be part of the folklore of the field.
Note that, with the notations of Section 1.1, $\phi(X) = f_H(X) \vee f_H(\overline{X})$.
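The 2-colorability formulation can be illustrated with a small brute-force sketch. It assumes that the hypergraph is given as a list of vertex sets over {1, ..., n}; the function name `two_coloring` is hypothetical, and the enumeration is exponential in n, so this is meant for tiny examples only.

```python
from itertools import product

def two_coloring(n, edges):
    """Return a point x of B^n with phi(x) = 0 (a proper 2-coloring), or None.

    x[j-1] = 1 colors vertex j blue, x[j-1] = 0 colors it red.
    """
    for x in product((0, 1), repeat=n):
        some_edge_all_blue = any(all(x[j - 1] == 1 for j in A) for A in edges)
        some_edge_all_red = any(all(x[j - 1] == 0 for j in A) for A in edges)
        if not (some_edge_all_blue or some_edge_all_red):   # phi(x) = 0
            return x
    return None
```

For instance, the triangle with edges {1, 2}, {2, 3}, {1, 3} makes the equation inconsistent, so the function returns `None` on that input.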
Conversely, Linial and Tarsi [615] showed that testing the consistency of any
DNF equation of the form
$$\phi(x_1, x_2, \ldots, x_n) = \bigvee_{k=1}^{m} \Bigl( \bigwedge_{i \in A_k} x_i \bigwedge_{j \in B_k} \overline{x}_j \Bigr) = 0, \tag{2.4}$$

can be very simply transformed to a hypergraph 2-colorability problem. To see
this, let us define $V = \{x_1, x_2, \ldots, x_n, \overline{x}_1, \overline{x}_2, \ldots, \overline{x}_n, 1\}$. We build a hypergraph
H = (V , E) on the vertex-set V , where

– for each i ∈ {1, 2, . . . , n}, $\{x_i, \overline{x}_i\}$ is an edge in E;
– for each k ∈ {1, 2, . . . , m}, $\{x_i \mid i \in A_k\} \cup \{\overline{x}_j \mid j \in B_k\} \cup \{1\}$ is an edge in E.

It is an easy exercise to check that equation (2.4) is consistent if and only if H
is 2-colorable. As a consequence, any algorithm for testing the 2-colorability of
hypergraphs can also be used to solve (2.4) (see [615] for the description of such
an algorithm).
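The construction of H from equation (2.4) can be sketched as follows, together with brute-force checks of both sides of the claimed equivalence. The representation is an assumption made for illustration: each term is a pair (A_k, B_k) of index sets, literal vertices are tagged tuples, and the special vertex is the string "1". The checkers are exponential and only meant for tiny instances; this is not the algorithm of [615].

```python
from itertools import product

def dnf_to_hypergraph(n, terms):
    """Build the edge list of H; terms is a list of pairs (A_k, B_k)."""
    edges = [frozenset({("x", i), ("not x", i)}) for i in range(1, n + 1)]
    for A, B in terms:
        edge = {("x", i) for i in A} | {("not x", j) for j in B} | {"1"}
        edges.append(frozenset(edge))
    return edges

def dnf_consistent(n, terms):
    """Brute force: is there a point at which every term takes value 0?"""
    for x in product((0, 1), repeat=n):
        if not any(all(x[i - 1] == 1 for i in A) and all(x[j - 1] == 0 for j in B)
                   for A, B in terms):
            return True
    return False

def is_two_colorable(edges):
    """Brute force: is there a 2-coloring with no monochromatic edge?"""
    vertices = sorted(set().union(*edges), key=repr)
    for colors in product((0, 1), repeat=len(vertices)):
        col = dict(zip(vertices, colors))
        if all(len({col[v] for v in e}) == 2 for e in edges):
            return True
    return False
```

On small inputs, `dnf_consistent(n, terms)` and `is_two_colorable(dnf_to_hypergraph(n, terms))` can be compared directly.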
Another property of this construction is that the equation (2.4) is consistent
if and only if the hypergraph H has the so-called Kőnig-Egerváry property, that
is, if the maximum number of pairwise disjoint edges of H is equal to the min-
imum cardinality of a transversal of H. A closely related result was previously
established by Simeone [833, 834], who relied on this characterization to pro-
pose a linear time algorithm for the solution of quadratic DNF equations (see
Chapter 5). 

Application 2.4. (Integer programming.) In the course of solving linear or nonlinear
0–1 optimization problems, logical relations can often be deduced between
the values assumed by certain variables in every, or in some, optimal solutions.
This happens typically, though not exclusively, in the preprocessing phase of the
solution procedure. Suppose, for instance, that we are somehow able to derive that
two variables x and y can never be simultaneously 0 in a feasible solution of the
problem. Then, we know that the Boolean relation $\overline{x}\,\overline{y} = 0$ must hold. Similarly, if at
most one of x, y, and u can take value 1 in a feasible solution, then xy ∨xu∨yu = 0
must hold. Collecting several such relations and taking them simultaneously into
account leads to a Boolean equation φ(x, y, u, . . .) = 0, which is necessarily consistent
whenever the optimization problem is feasible. This observation can be used to set
up feasibility tests in a branch-and-bound procedure or to accelerate heuristics
(see, e.g., Granot and Hammer [410], Hammer and Nguyen [454], Hammer and
Hansen [439], Jaumard [526], Boros, Hammer, and Sun [133]). 
Several researchers have recently reported encodings of a large variety of indus-
trial problems in the form of Boolean equations, and solutions of these problems by
general purpose algorithms. Besides the references already cited earlier, let us also
mention the synthesis of small circuits for partially defined Boolean functions
[546], the verification of the validity of an automated safety procedure imple-
mented by Dutch railway stations [412], an application to product data management
in the automotive industry [586], the analysis of data encryption standards [675],
planning problems in logistics [556], and so on (see also the survey [418]).

2.2 The complexity of Boolean equations: Cook’s theorem


In the previous section, we discovered several prominent applications of Boolean
equations. We could have extended this list of applications to encompass several
hundreds of questions. Indeed, it has been observed for a long time that numer-
ous problems of a combinatorial nature can be reduced to the solution of Boolean
equations (see, for instance, Fortet [342, 343]; Hammer and Rudeanu [460]). This
statement was given a more precise and very dramatic formulation by Cook [208],
who proved that each and every decision problem in a broad class of problems
(namely, the so-called class NP) can be transformed in polynomial time into an
equivalent Boolean equation. In order to express Cook’s theorem in the usual for-
mat of complexity theory, we first pose the problem of solving Boolean equations
as a decision problem (see Appendix B).

Boolean Equation
Instance: Two Boolean expressions φ(X) and ψ(X).
Question: Is the equation φ(X) = ψ(X) consistent?

A restricted version of this problem is:

DNF Equation
Instance: A DNF expression φ(X).
Question: Is the equation φ(X) = 0 consistent?

Observe that the answer to DNF Equation is “No” exactly when φ is a
tautology, as discussed in Application 2.1.

Theorem 2.1. (Cook [208]) The problem Boolean Equation is NP-complete,
even when restricted to DNF Equation and to DNF expressions of degree 3.

A proof of this deep and fundamental theorem requires the introduction of
formal machinery from complexity theory, for example, a definition of models
of computation, computing time, polynomial algorithms, reductions, and so on.
We refer the interested reader to Appendix B for a succinct introduction to these
concepts and for a proof of Theorem 2.1 (see also Theorems 2.3 and 2.4 following).
For readers who are not familiar with complexity theory, Appendix B also provides
valuable additional insights into the relevance of Cook’s theorem.
It is important to observe that the DNF equation
$$\phi(X) = \bigvee_{k=1}^{m} \Bigl( \bigwedge_{i \in A_k} x_i \bigwedge_{j \in B_k} \overline{x}_j \Bigr) = 0 \tag{2.5}$$

has exactly the same set of solutions as the equation


$$\psi(X) = \bigwedge_{k=1}^{m} \Bigl( \bigvee_{i \in A_k} \overline{x}_i \vee \bigvee_{j \in B_k} x_j \Bigr) = 1, \tag{2.6}$$

where ψ is now a CNF. As a matter of fact, Cook’s theorem is frequently stated (and
was originally proved) in its dual form involving CNF rather than DNF equations.
More precisely, let us define the following decision problem:

Satisfiability
Instance: A CNF ψ(X).
Question: Is the equation ψ(X) = 1 consistent?

In view of Theorem 2.1 and of the equivalence of (2.5) and (2.6), we immedi-
ately conclude that Satisfiability is NP-complete, even when each clause of the
CNF ψ involves at most three literals (3-Satisfiability or 3-Sat problem).
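The correspondence between (2.5) and (2.6) is just De Morgan's law applied term by term, as the following sketch illustrates. The representation is an assumption made for the example: a DNF is a list of pairs (A_k, B_k) of index sets, a CNF is a list of pairs (positive indices, complemented indices), and the function names are hypothetical.

```python
def dnf_value(terms, x):
    """Term (A, B) equals 1 iff x_i = 1 for all i in A and x_j = 0 for all j in B."""
    return int(any(all(x[i - 1] == 1 for i in A) and all(x[j - 1] == 0 for j in B)
                   for A, B in terms))

def dnf_to_cnf(terms):
    """Complement each term: (A, B) becomes the clause OR_{i in A} not x_i OR_{j in B} x_j."""
    return [(set(B), set(A)) for A, B in terms]     # (positive, complemented)

def cnf_value(clauses, x):
    """A clause equals 1 iff some positive literal is 1 or some complemented one is 0."""
    return int(all(any(x[j - 1] == 1 for j in P) or any(x[i - 1] == 0 for i in N)
                   for P, N in clauses))
```

A point X then satisfies `dnf_value(terms, X) == 0` exactly when it satisfies `cnf_value(dnf_to_cnf(terms), X) == 1`.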
Boolean equations have been frequently stated as satisfiability problems in
the artificial intelligence and computational complexity literatures. On the other
hand, DNF formulations are more commonly used in electrical engineering and
in propositional logic. In this book, we mostly deal with DNF equations rather
than satisfiability problems, but it should be clear that this is purely a matter of
convention.
The reader should also be aware that, in contrast to the foregoing comments,
equations of the form φ = 1, where φ is a DNF, are extremely easy to solve (and
thus rather uninteresting). To see this, simply remember that a DNF takes value 1
if and only if at least one of its terms takes value 1.
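To make this remark concrete, here is a minimal sketch, assuming that the DNF is given as a list of pairs (A, B) of index sets (uncomplemented and complemented variables of each term); the function name is hypothetical. A term can take value 1 unless it contains a variable together with its complement, so a single pass over the terms suffices.

```python
def solve_dnf_equals_one(n, terms):
    """Return a point X of B^n with phi(X) = 1, or None if phi is identically 0."""
    for A, B in terms:
        if set(A).isdisjoint(B):          # the term is not identically 0
            # Set the variables of A to 1 and every other variable to 0.
            return tuple(1 if i in A else 0 for i in range(1, n + 1))
    return None
```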
Finally, it should be noted that the bound on the degree of the equation in
Theorem 2.1 is tight, in the sense that the equation φ(X) = 0 can be solved in
polynomial time if φ is a quadratic DNF, as we shall see in Chapter 5. Numerous
extensions of Theorem 2.1, of the form Boolean Equation is NP-complete,
even when restricted to equations satisfying condition C, have been established in
the literature. We refer to [371, 571] for a discussion of such extensions, and we
propose some of them as end-of-chapter exercises.

2.3 On the role of DNF equations


In subsequent sections, we frequently concentrate on solution techniques for DNF,
rather than arbitrary Boolean equations. There are several reasons for this focus.
Clearly, Theorem 2.1 claims an important role for DNF equations in the theory
of computational complexity. But DNF equations also occur naturally in many
practical settings, as illustrated by several of the applications presented in Section
2.1. Moreover, most solution techniques for Boolean equations actually start by
reducing the given equation to a DNF equation.
The practical relevance of DNF equations is probably better understood when
one realizes that (2.5) is in fact equivalent to the system of equations

$$\bigwedge_{i \in A_k} x_i \bigwedge_{j \in B_k} \overline{x}_j = 0, \qquad k = 1, 2, \ldots, m.$$

Thus, a DNF equation is the natural expression of a system of conditions of the
form “at least one of the variables in Ak must be 0, or at least one of the variables
in Bk must be 1,” all of which must be simultaneously satisfied. For instance,
the production rules used in the knowledge base of an expert system frequently
constitute a system of conditions of this type; see Applications 1.13.1 and 2.1.
This is also the case for the Boolean equation associated with a logic circuit, as
explained in Applications 1.13.2 and 2.2.
More generally, systems of (possibly complex) Boolean conditions also arise
when instantiation techniques based on Herbrand’s theorem are used to prove the
validity of first-order logic formulas. Davis and Putnam [261] argued that, in this
framework, it is quite natural and efficient to work with DNFs. Their argument
goes as follows (for the sake of clarity, we replace the word “conjunctive” by
“disjunctive” in the authors’ original statement, without altering its meaning):
That the disjunctive normal form can be employed follows from the remark that to put
a whole system of formulas into disjunctive normal form we have only to put the
individual formulas into disjunctive normal form. Thus, even if a system has hundreds
or thousands of formulas, it can be put into disjunctive normal form “piece by piece”,
without any “multiplying out” (Davis and Putnam [261]).
In the remainder of this section, we show how an arbitrary system of Boolean
conditions (equations and inequalities) can be efficiently transformed into an
equivalent DNF equation. Let us first define what we mean by a “system of Boolean
conditions.”
Definition 2.3. A Boolean system on $B^n$ is a collection of Boolean equations and
inequalities of the form
$$\phi_k(X) = \psi_k(X), \qquad k = 1, 2, \ldots, p, \tag{2.7}$$
$$\phi_k(X) \le \psi_k(X), \qquad k = p + 1, p + 2, \ldots, p + q, \tag{2.8}$$
where φk and ψk are Boolean expressions on $B^n$, for k = 1, 2, . . . , p + q. A solution
of the system is a point $X^* \in B^n$ such that $\phi_k(X^*) = \psi_k(X^*)$ for k = 1, 2, . . . , p
and $\phi_k(X^*) \le \psi_k(X^*)$ for k = p + 1, p + 2, . . . , p + q.
An easy, but fundamental, result due to Boole [103] allows us to transform any
Boolean system into a single Boolean equation.
Theorem 2.2. The Boolean system (2.7)–(2.8) has the same set of solutions as the
Boolean equation
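$$\bigvee_{k=1}^{p} \Bigl( \phi_k(X)\,\overline{\psi_k(X)} \vee \overline{\phi_k(X)}\,\psi_k(X) \Bigr) \vee \bigvee_{k=p+1}^{p+q} \phi_k(X)\,\overline{\psi_k(X)} = 0. \tag{2.9}$$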

Proof. It suffices to observe that the system (2.7) is equivalent to the system
φk (X) ≤ ψk (X) k = 1, 2, . . . , p,
φk (X) ≥ ψk (X) k = 1, 2, . . . , p,
and that each inequality of the form φk (X) ≤ ψk (X) is in turn equivalent to the
equation $\phi_k(X)\,\overline{\psi_k(X)} = 0$.
In view of Theorem 2.2, it only remains to show that every Boolean equation of
the form φ(X) = 0 can be efficiently transformed into an equivalent DNF equation.
A polynomial time transformation could of course be read from the proof of Cook’s
theorem (Theorem 2.1), but the resulting procedure would be too cumbersome to
be of practical interest.
On the other hand, since every Boolean function has a DNF expression, the left-
hand side of (2.9) could, in principle, be rewritten as an equivalent DNF. However,
we have already observed that this may lead to an exponential explosion in the size
of the problem (see Example 1.11). As a matter of fact, Example 1.11 essentially
shows that there is no hope of achieving the desired polynomial time transformation
of an arbitrary equation into an equivalent DNF equation, unless one is willing to
introduce additional variables in the picture.

Definition 2.4. Consider two Boolean systems, say S1 (X) and S2 (X, Y ), where S1
involves only the variables (x1 , x2 , . . . , xn ), whereas S2 involves (x1 , x2 , . . . , xn ) and
possibly additional variables (y1 , y2 , . . . , ym ). We say that S1 and S2 are equivalent
if the following two conditions hold:

(a) For every solution of S1 , say $X^* \in B^n$, there exists $Y^* \in B^m$ such that
$(X^*, Y^*)$ is a solution of S2 .
(b) For every solution of S2 , say, $(X^*, Y^*) \in B^{n+m}$, $X^*$ is a solution of S1 .

So, when S1 and S2 are equivalent, the solution set of S1 is the projection on
B n of the solution set of S2 . In particular, if S2 only involves the X-variables, then
S1 and S2 are equivalent if and only if they have the same solution set.
We are now ready for the main result of this section (Tseitin [872]; see also [78]
for a broader discussion and for extensions of this result to first-order predicate
logic).

Theorem 2.3. Every Boolean system can be reduced in linear time to an equivalent
DNF equation.

Proof. First, Theorem 2.2 can be used to rewrite (in linear time) the system as a sin-
gle equation of the form φ(X) = 0. Then, apply the procedure Expand described
in Section 1.5 to the expression φ(X). The output of Expand is a DNF ψ(X, Y )
and a distinguished literal z among the literals on (X, Y ), with the property that
the equation φ(X) = 0 is equivalent to the DNF equation ψ|z=0 (X, Y ) = 0. 

Actually, we do not need the full power of the procedure Expand in order to
establish Theorem 2.3. Indeed, we leave it to the reader to verify that the procedure
Expand∗ in Figure 2.1, which introduces fewer additional variables and produces
shorter DNFs than Expand, also achieves the required transformation (we refer, for
instance, to Blair, Jeroslow and Lowe [98], Clarke, Biere, Raimi and Zhu [204],
Eén and Sörensson [290], Jeroslow [533], Plaisted and Greenbaum [750], and
Wilson [914], for descriptions and applications of related procedures).
Procedure Expand∗ (φ)
Input: A Boolean expression φ(X) on $B^n$.
Output: A DNF ψ(X, Y ) on $B^{n+m}$ such that the equations φ(X) = 0 and ψ(X, Y ) = 0 are equivalent.

begin
    if φ is a DNF then return ψ := φ;
    else if $\phi = \overline{\overline{\alpha}}$ for some expression α then return Expand∗ (α);
    else if $\phi = \overline{(\phi_1 \vee \phi_2)}$ for some expressions φ1 , φ2 then return Expand∗ ($\overline{\phi}_1\,\overline{\phi}_2$);
    else if $\phi = \overline{(\phi_1 \phi_2)}$ for some expressions φ1 , φ2 then return Expand∗ ($\overline{\phi}_1 \vee \overline{\phi}_2$);
    else if φ = (φ1 ∨ φ2 ∨ . . . ∨ φk ) for some expressions φ1 , φ2 , . . . , φk then
    begin
        for j = 1 to k do ψj := Expand∗ (φj );
        return ψ := ψ1 ∨ ψ2 ∨ . . . ∨ ψk ;
    end
    else if φ = (φ1 φ2 . . . φk ) for some expressions φ1 , φ2 , . . . , φk then
    begin
        for j = 1 to k do ψj := Expand∗ (φj );
        create k new variables, say y1 , y2 , . . . , yk ;
        return ψ := $\overline{y}_1 \psi_1 \vee \overline{y}_2 \psi_2 \vee \ldots \vee \overline{y}_k \psi_k \vee y_1 y_2 \cdots y_k$;
    end
end

Figure 2.1. Procedure Expand∗ .
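A possible rendering of Expand∗ in Python is sketched below, together with a brute-force check of equivalence in the sense of Definition 2.4. Everything about the representation is an assumption made for illustration: expressions are nested tuples ("var", name), ("not", e), ("or", e1, ..., ek), ("and", e1, ..., ek); a DNF is a list of terms, each a frozenset of literals (name, sign); new variables are named y1, y2, ... and are assumed not to clash with the input variables. The De Morgan step is applied to any number of operands rather than two.

```python
from itertools import count, product

def is_literal(e):
    return e[0] == "var" or (e[0] == "not" and e[1][0] == "var")

def is_term(e):
    return is_literal(e) or (e[0] == "and" and all(is_literal(a) for a in e[1:]))

def is_dnf(e):
    return is_term(e) or (e[0] == "or" and all(is_term(a) for a in e[1:]))

def lit(e):
    """Literal -> (name, sign); sign is True for x and False for its complement."""
    return (e[1], True) if e[0] == "var" else (e[1][1], False)

def dnf_terms(e):
    terms = e[1:] if e[0] == "or" else (e,)
    return [frozenset(lit(l) for l in (t[1:] if t[0] == "and" else (t,)))
            for t in terms]

def expand_star(e, fresh):
    """Return a DNF psi such that e = 0 and psi = 0 are equivalent."""
    if is_dnf(e):
        return dnf_terms(e)
    op = e[0]
    if op == "not":
        inner = e[1]
        if inner[0] == "not":                          # double complement
            return expand_star(inner[1], fresh)
        negated = tuple(("not", a) for a in inner[1:])  # De Morgan
        dual = "and" if inner[0] == "or" else "or"
        return expand_star((dual,) + negated, fresh)
    parts = [expand_star(a, fresh) for a in e[1:]]
    if op == "or":
        return [t for p in parts for t in p]
    # op == "and": one new variable y_j per factor
    ys = [f"y{next(fresh)}" for _ in parts]
    out = [t | {(y, False)} for y, p in zip(ys, parts) for t in p]
    out.append(frozenset((y, True) for y in ys))
    return out

def evaluate(e, a):
    if e[0] == "var":
        return a[e[1]]
    if e[0] == "not":
        return not evaluate(e[1], a)
    vals = [evaluate(s, a) for s in e[1:]]
    return any(vals) if e[0] == "or" else all(vals)

def dnf_is_zero(terms, a):
    return not any(all(a[v] == s for v, s in t) for t in terms)

def equivalent_on(phi, psi, xs):
    """Brute-force check of Definition 2.4 (exponential; tiny examples only)."""
    ys = sorted({v for t in psi for v, _ in t} - set(xs))
    for vals in product((False, True), repeat=len(xs)):
        a = dict(zip(xs, vals))
        extendable = any(dnf_is_zero(psi, {**a, **dict(zip(ys, yv))})
                         for yv in product((False, True), repeat=len(ys)))
        if extendable != (not evaluate(phi, a)):
            return False
    return True
```

A typical call is `expand_star(phi, count(1))`, where the counter supplies the indices of the new variables.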

The next result underlines the special role played by DNF equations of degree 3.
It can be seen as a strengthening of the second half of Theorem 2.1.

Theorem 2.4. Every DNF equation can be reduced in linear time to an equivalent
DNF equation of degree 3.

Proof. Consider a DNF equation of the form (2.5) and assume that |A1 | + |B1 | > 3.
Select two distinct indices in A1 ∪ B1 , say, h, ℓ ∈ A1 (similar arguments apply if
one of the indices is in B1 ). Let y be an additional Boolean variable, different from
x1 , x2 , . . . , xn , and define
$$\psi(X, y) = \Bigl( \bigwedge_{i \in A_1 \setminus \{h, \ell\}} x_i \bigwedge_{j \in B_1} \overline{x}_j \Bigr) y \;\vee\; \bigvee_{k=2}^{m} \Bigl( \bigwedge_{i \in A_k} x_i \bigwedge_{j \in B_k} \overline{x}_j \Bigr) \;\vee\; x_h x_\ell \overline{y} \vee \overline{x}_h y \vee \overline{x}_\ell y.$$

We claim that the equations φ(X) = 0 and ψ(X, y) = 0 are equivalent. To see
this, consider any point $(X^*, y^*) \in B^{n+1}$. It is easy to see that the expression
$x_h^* x_\ell^* \overline{y}^* \vee \overline{x}_h^* y^* \vee \overline{x}_\ell^* y^*$ is equal to 0 if and only if $y^* = x_h^* x_\ell^*$. This implies that,
for all solutions $(X^*, y^*)$ of the equation ψ(X, y) = 0, there also holds $\phi(X^*) = 0$.
And conversely, every solution $X^*$ of φ(X) = 0 gives rise to a solution $(X^*, y^*)$
of the equation ψ(X, y) = 0, by simply setting $y^* = x_h^* x_\ell^*$. Thus, the equations are
equivalent.
Note that the degree of the first term of ψ is equal to |A1 | + |B1 | − 1. Thus,
repeatedly applying this reduction eventually yields a DNF equation of degree 3. It
can be checked that the total number of additional variables and terms introduced
by this transformation is $O\bigl(\sum_{k=1}^{m} (|A_k| + |B_k|)\bigr)$. We leave to the reader a more
complete analysis of the complexity of this procedure. 
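The reduction used in the proof of Theorem 2.4 can be sketched as follows. The representation is an assumption made for illustration: each term is a set of literals (i, s), where s is True for x_i and False for its complement, so that the two selected literals may be of either kind; the function names are hypothetical, and the choice of the two literals (the two smallest) is arbitrary.

```python
def reduce_to_degree_3(n, terms):
    """Return (new_terms, number_of_variables) with every term of degree <= 3."""
    out, work, next_var = [], [set(t) for t in terms], n + 1
    while work:
        t = work.pop()
        if len(t) <= 3:
            out.append(frozenset(t))
            continue
        (h, sh), (l, sl) = sorted(t)[:2]       # two literals of the long term
        y = next_var                           # new variable standing for their product
        next_var += 1
        work.append((t - {(h, sh), (l, sl)}) | {(y, True)})
        out += [frozenset({(h, sh), (l, sl), (y, False)}),   # forces y = 1 when both are 1
                frozenset({(h, not sh), (y, True)}),          # forces y = 0 otherwise
                frozenset({(l, not sl), (y, True)})]
    return out, next_var - 1

def is_zero(terms, x):
    """True if every term takes value 0 at the 0/1 point x."""
    return all(any(x[i - 1] != s for i, s in t) for t in terms)
```

Each pass replaces a term of degree d > 3 by a term of degree d − 1 plus three terms of degree at most 3, and introduces one new variable.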

Relying on Theorem 2.3 (and Theorem 2.4), the remainder of this chapter
mostly concentrates on the solution of DNF equations. The reader should be aware,
however, that the transformation of an arbitrary Boolean equation into a DNF
equation typically introduces a large number of new variables into the picture,
even when procedure Expand∗ is used, rather than Expand. Hence, in some cases,
this transformation may artificially increase the difficulty of the problem at hand.
Since some Boolean equations naturally arise in non-DNF form (e.g., equations of
the form φ(X) = ψ(X) arising in logic circuit verification; see Application 2.2),
it may sometimes be desirable to develop procedures capable of dealing directly
with these alternative forms, rather than blindly relying on the general techniques
discussed earlier.
To illustrate this comment, let us consider an equation of the form φ(X) = ψ(X),
where φ and ψ are DNFs. According to our previous discussion, one way of
handling this equation is to rewrite it as $\phi(X)\,\overline{\psi(X)} \vee \overline{\phi(X)}\,\psi(X) = 0$, and next
to apply Expand∗ to the latter equation. However, a more efficient approach can
be used here. First, check whether the system φ(X) = 1, ψ(X) = 1 has a solution.
Since φ and ψ are both DNFs, this system turns out to be very easy to solve
(we leave this for the reader to check). If it is consistent, then we can stop right
away. Otherwise, the original equation φ(X) = ψ(X) has been reduced to the
system φ(X) = 0, ψ(X) = 0, which is, in turn, equivalent to the DNF equation
φ(X) ∨ ψ(X) = 0. Clearly, this approach usually involves much less work than
the “standard” procedure.
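The two-step approach for an equation φ(X) = ψ(X) between DNFs can be sketched as follows, assuming again that each DNF is a list of pairs (A, B) of index sets; the function names and the tagged return values are hypothetical. The first step looks for one term of φ and one term of ψ that do not conflict, which takes time proportional to the product of the numbers of terms.

```python
def common_true_point(n, phi, psi):
    """Find X with phi(X) = psi(X) = 1 by merging one term of each DNF, or None."""
    for A1, B1 in phi:
        for A2, B2 in psi:
            ones, zeros = set(A1) | set(A2), set(B1) | set(B2)
            if ones.isdisjoint(zeros):        # the two terms are compatible
                return tuple(1 if i in ones else 0 for i in range(1, n + 1))
    return None

def reduce_equation(n, phi, psi):
    """Return ('solution', X), or ('dnf', terms) meaning: solve terms = 0 instead."""
    x = common_true_point(n, phi, psi)
    if x is not None:
        return ("solution", x)
    return ("dnf", list(phi) + list(psi))     # the DNF phi v psi
```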
It will also be easy to see that some of the equation-solving techniques presented
in the following sections (e.g., the enumeration techniques) can be modified in a
straightforward way to handle non-DNF equations. Other techniques have been
generalized in a more sophisticated way with the same goal in mind, for example,
the consensus technique (see Thayse [863], Van Gelder [885]) or local search
heuristics (see Stachniak [844]).

2.4 What does it mean to “solve a Boolean equation”?


The phrase “solving a Boolean equation” can be interpreted in various ways. It is
worthwhile to briefly clarify this issue before proceeding.
While discussing Cook’s theorem in Section 2.2, we formalized Boolean
equations as decision problems: Given a Boolean equation, the task was sim-
ply to decide whether the equation was consistent or not. Any algorithm for the
solution of Boolean equations should be able, as a minimal requirement, to give
an answer to this decision problem.
Now, any algorithm that tests the consistency of an equation can also be used, in
principle, to compute a solution when there is one. To understand this, consider any
such consistency-testing algorithm, say A. Given an equation φ(x1 , x2 , . . . , xn ) = 0,
we first use A to decide whether the equation is consistent. If the answer is
No, then we can stop. Otherwise, we run A again in order to decide whether
φ(x1 , x2 , . . . , xn−1 , 0) = 0 is consistent, where the expression φ(x1 , x2 , . . . , xn−1 , 0)
is obtained by substituting 0 for xn in φ. If the answer is Yes, then we restrict our
attention to solutions where xn = 0, that is, we fix xn to 0 in the equation. If the
answer is No, then we know that xn must be 1 in all solutions, and, accordingly,
we fix xn to 1 in the equation. Thus, in either case, we have reduced the original
equation to an equation in n − 1 variables. Proceeding iteratively, we see that n + 1
calls on the algorithm A suffice to construct a solution of the equation, when there
is one.
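This self-reduction can be sketched as follows. The interface is an assumption made for illustration: the consistency-testing algorithm A is modeled by a function `is_consistent(phi, n, fixed)` that decides whether φ(X) = 0 has a solution agreeing with the partial assignment `fixed`, and a brute-force stand-in is provided only so that the sketch runs.

```python
from itertools import product

def find_solution(phi, n, is_consistent):
    """Construct a solution of phi(X) = 0 with n + 1 calls to the oracle, or return None."""
    fixed = {}
    if not is_consistent(phi, n, fixed):
        return None
    for i in range(n, 0, -1):                 # fix x_n, then x_{n-1}, and so on
        fixed[i] = 0
        if not is_consistent(phi, n, fixed):
            fixed[i] = 1                      # x_i must be 1 in all remaining solutions
    return tuple(fixed[i] for i in range(1, n + 1))

def brute_force_oracle(phi, n, fixed):
    """Stand-in for algorithm A: exhaustive search over the free variables."""
    free = [i for i in range(1, n + 1) if i not in fixed]
    for vals in product((0, 1), repeat=len(free)):
        a = {**fixed, **dict(zip(free, vals))}
        if phi(tuple(a[i] for i in range(1, n + 1))) == 0:
            return True
    return False
```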
Fortunately, this roundabout way of computing solutions will usually not prove
necessary. Indeed, it is difficult to imagine an algorithm that tests whether an
equation is consistent without also finding, implicitly or explicitly, a solution
of the equation when there is one. As a result, all of the algorithms
described in the coming sections will provide an answer to the latter “constructive”
version of the problem.
But there are still other ways of interpreting, or of generalizing the task of
“solving a Boolean equation.” First, we may want to list all solutions of the given
equation. This is, of course, a formidable requirement, since a Boolean equation
may well have an exponential number of solutions. We discuss various ways of
handling this problem, either explicitly, in Section 2.11.2, or implicitly (by giving
a parametric representation of the set of solutions), in Section 2.11.3.
We may also be interested in counting the number of solutions of the equation.
We have already briefly mentioned this question in Sections 1.6 and 1.13, for
instance, in connection with the problem of computing the reliability of a complex
system. We return to it in Section 2.11.1.
We may want to compute the optimal solution of the given equation according
to a variety of numerical criteria. Such formulations bring us into the realm of
integer programming. They have already been evoked in Section 1.13.6, where
we have seen that they can be transformed into equivalent generalized covering
problems, and we return to them several times in subsequent chapters.
Finally, even when the equation is inconsistent, we may want to compute a
point that comes as close as possible to satisfying the equation. For instance, the
famous maximum satisfiability problem (or Max Sat problem) is of this nature.
Indeed, it is equivalent to the following question: Given a DNF φ, find a point
that cancels as many terms of φ as possible. This problem and some of its variants
have been thoroughly investigated in the computational complexity literature. We
discuss them in Section 2.11.4.
For now, let us turn to the fundamental task of testing the consistency of a
Boolean equation. There is a huge field to cover, and we shall primarily concen-
trate on exact Boolean approaches, as opposed, for instance, to heuristics and to
numerical methods. For additional information, we refer to the books [508, 571],
to the collections of papers [278, 377, 537], to the surveys [209, 418, 881], and
so on.

2.5 Branching procedures


Branching procedures (sometimes called splitting procedures) represent the most
elementary and most natural approach to the solution of Boolean equations. Yet,
in spite (or because) of their simplicity, they have established themselves as very
efficient, reliable, and versatile methods. Therefore, they deserve special attention
in this chapter. They also provide a general framework in which many useful
algorithmic ideas can easily be explained and implemented.
The starting point of most branching procedures is the following obvious
observation:

Theorem 2.5. The Boolean equation φ(x1 , . . . , xn−1 , xn ) = 0 is consistent if and
only if either
    φ(x1 , . . . , xn−1 , 0) = 0
or
    φ(x1 , . . . , xn−1 , 1) = 0
is consistent.

Theorem 2.5 suggests that we can solve the equation φ(x1 , . . . , xn−1 , xn ) = 0
using a branching, or enumerative, procedure similar in spirit to the branch-and-
bound methods developed for integer programming problems. We are now going
to describe the basic scheme of such a procedure, first informally, and then more
rigorously. We restrict ourselves to a depth-first search version of the procedure,
partly for the sake of simplicity, and also because many efficient implementations
fall under this category (the reader will easily figure out what a more general
branching scheme may look like).
The procedure can be viewed as growing a binary enumeration tree (or “seman-
tic tree”), where each node of the tree corresponds to a partial assignment of values
to the variables. More precisely, each node is associated with a subproblem that we
denote by (φ, T , F ), where T and F are two disjoint subsets of {1, . . . , n}. This sub-
problem is defined as follows: Find a solution X∗ = (x1∗ , x2∗ , . . . , xn∗ ) of the equation
φ(X) = 0 such that xi∗ = 1 for all i ∈ T , and xi∗ = 0 for all i ∈ F , or decide that no
such solution exists. The root of the tree corresponds to the subproblem (φ, ∅, ∅),
meaning that all variables are initially free.
The branching procedure uses a subroutine Preprocess(φ, T , F ) which could
perform a variety of preprocessing operations on the subproblem (φ, T , F ). We
simply assume that this subroutine always returns one of three possible outputs:

(i) Either a solution X∗ satisfying the conditions of the subproblem (φ, T , F ).
(ii) Or the answer No, if the procedure is able to establish conclusively that
(φ, T , F ) has no solution.
(iii) Or a subproblem of the form (ψ, S, G) with the property that (φ, T , F ) has
a solution if and only if (ψ, S, G) has a solution.
To simplify our presentation, we further assume that ψ is defined on the same
set of variables as φ, and that T ⊆ S and F ⊆ G. (Typically, though not neces-
sarily, (ψ, S, G) would be obtained by determining the value that certain variables
must take in all solutions of (φ, T , F ), and by simplifying φ and extending (T , F )
accordingly.) Finally, let us agree that Preprocess always returns either a solution
or the answer No in the trivial case where T ∪ F = {1, . . . , n}, that is, when values
have been assigned to all variables.
Now consider an arbitrary node of the enumeration tree and the corresponding
subproblem (φ, T , F ). The branching procedure makes a first attempt at solving
(φ, T , F ) by calling the subroutine Preprocess(φ, T , F ). If Preprocess succeeds
in finding a solution X∗ (case (i)), then the search stops, since X ∗ is by definition
a solution of φ(X) = 0. If Preprocess reports that (φ, T , F ) is inconsistent (case
(ii)), then the procedure backtracks by moving to another node of the search tree.
Finally, if Preprocess returns the problem (ψ, S, G) (case (iii)), then the procedure
resorts to Theorem 2.5: That is, a variable xi is selected such that i ∉ S ∪ G, and the
subproblems (ψ, S, G∪{i}) and (ψ, S ∪{i}, G) are recursively solved (this amounts
to fixing xi first to 0, then to 1). The subproblem (φ, T , F ) is only reported to have
no solution if both (ψ, S ∪ {i}, G) and (ψ, S, G ∪ {i}) are eventually found to be
inconsistent.
Figure 2.2 presents a more formal, recursive description of the branching pro-
cedure. The Boolean equation φ = 0 can be solved by calling the procedure
Branch(φ, ∅, ∅). The correctness of the procedure directly follows from Theorem
2.5 and from our previous discussion.

Procedure Branch(φ, T , F ).
Input: A Boolean expression φ(x1 , x2 , . . . , xn ) and two subsets T , F of {1, . . . , n} such that
T ∩ F = ∅.
Output: A solution X∗ = (x1∗ , x2∗ , . . . , xn∗ ) of the equation φ(X) = 0 such that xi∗ = 1 for all i ∈ T ,
and xi∗ = 0 for all i ∈ F , if such a solution exists; No otherwise.

begin
if Preprocess(φ, T , F ) = X∗ then return X ∗ ;
if Preprocess(φ, T , F ) = No then return No;
if Preprocess(φ, T , F ) = (ψ, S, G) then
{comment: branch }
begin
select an index i ∈ {1, . . . , n} \ (S ∪ G);
{comment: fix xi to 0}
if Branch(ψ, S, G ∪ {i}) = X∗ then return X ∗
{comment: fix xi to 1}
else return Branch(ψ, S ∪ {i}, G);
end
end

Figure 2.2. Procedure Branch.



Of course, Branch cannot really be called an algorithm until we specify the


rules applied to selecting the branching variable, as well as the specific fea-
tures of the subroutine Preprocess. In practice, as demonstrated, for instance,
in [239, 281, 476], the efficiency of Branch hinges critically on these factors (as
well as on the strategy used to explore the search tree; see Section 2.9). We now
proceed with a discussion of these topics. We observe that, although no assump-
tion has been formulated so far regarding the nature of the input expression φ,
much of the literature has concentrated on the special (but important) case of DNF
equations. In particular, branching rules and preprocessing operations have been
mostly investigated for DNF equations, and from now on, we restrict our attention
to such equations.

2.5.1 Branching rules


Let us concentrate on the situation arising at the root of the search tree, when the
subproblem to be solved is the original (DNF) equation φ = 0 (the situation at
every other node is similar). When branching is necessary, the branching variable
can be chosen according to various strategies. Most strategies tend to give higher
priority to variables presenting a “large” number of occurrences in the DNF and/or
to variables occurring in “many” terms of “low” degree (the idea being, in both
cases, to reduce as much as possible the size of the DNF after branching). Some
typical suggestions are listed hereunder. In order to describe them, let hi (u) denote
the number of terms of degree i that contain the literal u in the DNF φ, for i =
1, 2, . . . , n.
• Davis and Putnam [261] propose branching first on any literal appearing
in a term of smallest degree (theoretical properties of this rule have been
investigated by Chao and Franco [187] and Chvátal and Reed [202]).
• A popular variant of this rule consists in selecting a literal with the highest
number of occurrences among the terms of smallest degree: Select u that
maximizes hmin(φ) (u), where min(φ) is the minimum degree of any term in
φ (see Cook and Mitchell [209]; Dubois, André, Boufkhad and Carlier [281];
Van Gelder and Tsuji [886], etc.).
• In the computer code Espresso-II, Brayton et al. [153] branch first on a literal
with the highest number of occurrences in the formula, namely, a literal u
that maximizes $\sum_{i=1}^{n} h_i(u)$.
• Jeroslow and Wang [534] combine the above ideas by giving more weight to
shorter terms: They suggest branching first on any literal u that maximizes

$$W(u) = \sum_{i=1}^{n} w_i \, h_i(u), \tag{2.10}$$

where $w_i = 2^{-i}$. Jeroslow and Wang [534] and Harche, Hooker, and Thompson
[476] obtained good computational results with this branching rule, but
Dubois et al. [281] report that other choices of the weights $w_i$ may be more

effective. The performance of the branching rule has been investigated in


depth by Hooker and Vinay [504], who also challenge its rationale and
propose more efficient alternatives.
• Several researchers have successfully used branching rules of the following
form: Select a variable x that maximizes


$$h_{\min(\varphi)}(x) + h_{\min(\varphi)}(\bar{x}) + \alpha \min\{h_{\min(\varphi)}(x),\, h_{\min(\varphi)}(\bar{x})\}, \tag{2.11}$$

where hmin(φ) is defined as above and α is a numerical parameter; see, for


instance, Buro and Kleine Büning [166], Dubois et al. [281], Pretolani [758].
Intuitively, this type of rule not only favors variables that appear frequently
in short terms but also tends to pick variables for which the subtrees created
after branching are roughly balanced (this provides the motivation for the
last term in (2.11)).
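
To make the weighted rule (2.10) concrete, here is a minimal Python sketch that computes W(u) for every literal of a DNF and returns a maximizer. The encoding of terms as sets of nonzero integers (+i for xi, −i for its complement), the function name, and the tie-breaking order are assumptions of the sketch.

```python
from collections import defaultdict


def jeroslow_wang_literal(terms):
    """Return (u, weights) where u maximizes W(u) = sum_i 2**(-i) * h_i(u)."""
    weight = defaultdict(float)
    for term in terms:
        for lit in term:
            # A term of degree i contributes w_i = 2**(-i) to each of its literals.
            weight[lit] += 2.0 ** (-len(term))
    # Assumed tie-breaking: the smallest integer code among the maximizers.
    best = max(sorted(weight), key=lambda lit: weight[lit])
    return best, dict(weight)


best, weights = jeroslow_wang_literal([{1, 2}, {-1, 3}, {1, -3, 4}, {-2}])
```

Replacing the exponent by another function of the term length gives the alternative weight choices mentioned above.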

Other practical branching rules are discussed in [58, 166, 239, 281, 418, 534,
613, 886], and so on. Dubois et al. [281], in particular, stress the fact that the branch-
ing strategies that prove most efficient on consistent instances may be different
from those that perform well on inconsistent instances.
In order to improve the effectiveness of branching, several authors have sug-
gested focusing on control sets, where a control set is a set S of indices such that,
after branching on all the variables with indices in S, in any arbitrary order, the
remaining equation is always “easy” to solve (that is, the subproblem (φ, T , F ) is
“easy” for every partition T ∪ F of S). This type of strategy appears, for instance,
in publications by Brayton et al. [153], Chandru and Hooker [183], Boros et al.
[116], Truemper [871], and so on. Crama, Ekin, and Hammer [229] proved that
finding a smallest control set is NP-hard for a broad range of specifications of what
constitutes an “easy” equation. Closely related concepts have recently been reex-
amined by Williams, Gomes, and Selman [913] under the name backdoor sets; see
also [568, 581, 715, 854] and the discussion of relaxation schemes in Section 2.5.2
hereunder.
The branching rules described earlier may lead to ties that can be broken either
deterministically (e.g., by choosing the variable with smallest index among the
candidates) or by random selection. Implementations of sophisticated randomized
branching rules are found, for instance, in Bayardo and Schrag [58] and Crawford
and Auton [239]. Interestingly, Gomes et al. [403] provide evidence that random-
ized selection may noticeably influence the performance of branching procedures
(namely, the variance of the running time is usually large when the randomized
procedure is applied several times to a single instance).
Departing from the basic algorithm Branch, some authors have suggested
branching on terms of the current DNF rather than on its variables. For
instance, Monien and Speckenmeyer [690] proposed the following approach: If

Preprocess(φ, T , F ) returns (ψ, S, G) at some node of the enumeration tree, then


(a) choose a term of ψ, say a term of the form $\left(\bigwedge_{i=1}^{r} x_i\right)\left(\bigwedge_{j=r+1}^{p} \bar{x}_j\right)$;
(b) create p subproblems, where, in the k-th subproblem:
• If 1 ≤ k ≤ r, then x1 = . . . = xk−1 = 1 and xk = 0.
• If r + 1 ≤ k ≤ p, then x1 = . . . = xr = 1, xr+1 = . . . = xk−1 = 0 and
xk = 1.

Thus, the subproblems created in the search tree correspond to mutually exclusive
ways of setting the term $\left(\bigwedge_{i=1}^{r} x_i\right)\left(\bigwedge_{j=r+1}^{p} \bar{x}_j\right)$ to zero. In their computational
experiments, Gallo and Urbani [365] and Bruni and Sassano [159] found this rule
to perform well.
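
The subproblems of steps (a) and (b) can be enumerated mechanically. The following Python sketch does so for a single term; the signed-integer encoding of literals (+i for xi, −i for its complement) and the function name are assumptions made for illustration.

```python
def term_branching(term):
    """List the partial assignments {variable: value} created by branching on a term."""
    pos = sorted(lit for lit in term if lit > 0)     # x_1, ..., x_r
    neg = sorted(-lit for lit in term if lit < 0)    # x_{r+1}, ..., x_p (complemented)
    # For each variable, the value that makes its literal in the term equal to 1.
    order = [(v, 1) for v in pos] + [(v, 0) for v in neg]
    subproblems = []
    for k, (v, value) in enumerate(order):
        assignment = dict(order[:k])    # the earlier literals are set to 1
        assignment[v] = 1 - value       # the k-th literal is set to 0
        subproblems.append(assignment)
    return subproblems
```

For the term encoded as {1, 2, -3}, the call term_branching({1, 2, -3}) returns [{1: 0}, {1: 1, 2: 0}, {1: 1, 2: 1, 3: 1}]: three mutually exclusive ways of setting the term to zero.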

2.5.2 Preprocessing
Let us now discuss some of the possible ingredients that may go into the sub-
routine Preprocess. We assume again, for the sake of simplicity, that the current
subproblem is the DNF equation φ = 0. We successively handle rewriting rules,
the Davis-Putnam rules, general heuristics, and relaxation schemes.

Rewriting rules
Any rewriting operation that replaces φ by an equivalent DNF can be applied.
Examples of such operations are the removal of duplicate terms or, more generally,
the removal of any term of φ that is absorbed by another term. Several authors
have also experimented with rules which replace φ by an equivalent DNF of the
form φ ∨ C1 ∨ C2 ∨ . . . ∨ Cr , where C1 , C2 , . . . , Cr are (prime) implicants of φ.
The consensus procedure (see Section 2.7) can be interpreted in this framework;
related ideas are found in [599, 886].

Davis-Putnam rules
In an oft-cited paper, Davis and Putnam [261] proposed a number of simple prepro-
cessing rules that have attracted an enormous amount of attention in the literature
on Boolean equations and that are implemented in most of the efficient equation
solvers (strictly speaking, Davis and Putnam’s suggestions were formulated in the
framework of elimination algorithms – to be discussed in Section 2.6 – rather
than branching algorithms; the application of these rules within branching proce-
dures was popularized by Davis, Logemann, and Loveland [260] and Loveland
[627]).
The Davis-Putnam rules identify various special circumstances under which a
variable xi can be fixed to a specific value without affecting the consistency of the
equation. The rules fall into two categories: unit literal rules (sometimes called
unit clause rules, unit deduction rules, forward chaining rules, etc.) and monotone
literal rules (sometimes called pure literal rules, affirmative-negative rules, etc.).

To state them, it is convenient to assume that the terms of the DNF φ have been
grouped as follows:
$$\varphi = \bar{x}_i \varphi_0 \vee x_i \varphi_1 \vee \varphi_2, \tag{2.12}$$

where φ0 , φ1 , and φ2 are DNFs which do not involve xi .

Unit literal rules: For i = 1, 2, . . . , n,

(a) if φ has the form $\bar{x}_i \vee x_i \vee \varphi_2$, then return No: the equation φ = 0 is
inconsistent;
(b) if φ has the form $\bar{x}_i \vee \varphi_2$, then fix xi to 1;
(c) if φ has the form $x_i \vee \varphi_2$, then fix xi to 0.

The unit literal rules are obviously valid; that is, the equation obtained after
applying the rules is consistent if and only if the original equation is consistent.
Within branching algorithms, they are usually applied in an iterative fashion until
their premises are no longer satisfied. At this point, either a complete solution of
the equation φ = 0 has been found, or an equivalent, but simpler equation has been
derived. In the artificial intelligence literature, this procedure sometimes goes by
the name of unit resolution, clausal chaining, or Boolean constraint propagation
(BCP) (see, e.g., [186, 533, 627, 670, 693]).
The unit literal rules can be implemented to run in linear time and are
computationally efficient. It is worth noting that they are somewhat redundant with
most of the branching rules described in the previous subsection, in the sense that
these branching rules tend to select a variable appearing in a term of degree 1
when such a term exists (since the branching rules often give priority to variables
appearing in short terms). Thus, many branching rules can be seen as automatically
enforcing the unit literal rules when they are applicable, and as generalizing these
rules to terms of higher degree otherwise. Separately handling the unit literal rules,
however, usually allows for more efficient implementations.
Let us now turn to the monotone literal rules.

Monotone literal rules: For i = 1, 2, . . . , n,

(a) if xi occurs only uncomplemented in φ, that is, if φ has the form $x_i \varphi_1 \vee \varphi_2$,
then fix xi to 0;
(b) if xi occurs only complemented in φ, that is, if φ has the form $\bar{x}_i \varphi_0 \vee \varphi_2$,
then fix xi to 1.

The monotone literal rules are valid in the sense that φ = 0 has a solution if and
only if the equation obtained after applying the rules has a solution. From a practical
viewpoint, they can be implemented to run in linear time but seem to have only
a marginal effect on the performance of branching procedures. Generalizations of
these rules have been investigated in [126].
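
The iterative application of the unit literal and monotone literal rules can be sketched as follows. This is a simple quadratic-time illustration, not the linear-time implementation alluded to above; the signed-integer encoding of literals (+i for xi, −i for its complement), the function name, and the order in which applicable rules are tried are assumptions. Terms are assumed not to contain a variable together with its complement.

```python
def apply_davis_putnam_rules(terms):
    """Apply unit and monotone literal rules until none applies.

    Returns (reduced_terms, fixed), where fixed maps variables to 0/1;
    reduced_terms is None when the equation is found to be inconsistent.
    """
    terms = [frozenset(t) for t in terms]
    fixed = {}
    while True:
        literals = {lit for t in terms for lit in t}
        # Unit literal rule: a term of degree 1 forces its literal to 0.
        lit = next((next(iter(t)) for t in terms if len(t) == 1), None)
        if lit is None:
            # Monotone literal rule: a literal whose complement never occurs.
            lit = next((u for u in sorted(literals) if -u not in literals), None)
        if lit is None:
            return terms, fixed
        fixed[abs(lit)] = 0 if lit > 0 else 1    # make the literal equal to 0
        reduced = []
        for t in terms:
            if lit in t:
                continue                  # the term vanishes
            if -lit in t:
                t = t - {-lit}            # the complementary literal equals 1
                if not t:
                    return None, fixed    # empty term, i.e., the constant 1
            reduced.append(t)
        terms = reduced
```

An empty list of reduced terms means that every term has vanished, so that any values of the remaining variables complete a solution.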

Heuristics
Any heuristic approach to consistency testing can be used within the branching
framework. For instance, Jeroslow and Wang [534] implement a “greedy” heuris-
tic, which essentially consists in iteratively fixing to 0 any literal u that maximizes
the expression W (u) defined by (2.10). This process is repeated until either a solu-
tion X ∗ of φ(X) = 0 has been produced or a contradiction has been detected. In
the latter case, Preprocess simply returns the original equation. Jaumard, Stan,
and Desrosiers [532] similarly rely on a tabu search heuristic at every node of the
branching tree.

Relaxation schemes
An interesting approach to preprocessing has been initiated by Gallo and Urbani
[365] (who also credit Minoux [unpublished] with a similar idea) and exploited
by several other researchers in various frameworks. This approach makes use of a
basic ingredient of enumerative algorithms: the notion of relaxation of a problem.
We define here a relaxation scheme as an operator that associates with every
(DNF) equation φ(X) = 0 another (DNF) equation ψ(X, Y ) = 0 (its relax-
ation), with the property that φ(X) = 0 is inconsistent whenever ψ(X, Y ) = 0
is inconsistent.
Given a relaxation scheme, the subroutine Preprocess can proceed along the
following lines:
For the current subproblem φ(X) = 0,

(a) generate the relaxation ψ(X, Y ) = 0, and solve it;


(b) if the relaxation is inconsistent, then return No; otherwise, let (X∗ , Y ∗ ) be
a solution of the relaxation;
(c) if φ(X∗ ) = 0, then return the solution X ∗ ; otherwise, return the original
equation.

Thus, solving the relaxation ψ = 0 either proves that the original equation φ = 0
is inconsistent (in step (b)) or produces a candidate (heuristic) solution of φ = 0
(in step (c)).
Generally speaking, the art consists in choosing the relaxation scheme in such
a way that the relaxed equation ψ(X, Y ) = 0 is “easy” to solve, while remaining
sufficiently “close” to the original equation. One way of defining a relaxation
scheme is to construct ψ so that ψ(X, Y ) ≤ φ(X) for all (X, Y ), which can be
achieved by removing a subset of terms from φ. In this framework, the goal is to
remove as few terms as possible from φ (so that ψ remains “close” to φ) until
the equation ψ = 0 becomes “easy” to solve. (This idea is related to the notion
of control set introduced in Section 2.5.1.) Crama, Ekin, and Hammer [229] have
investigated the computational complexity of several versions of this problem.
Gallo and Urbani [365] use Horn equations as relaxations of arbitrary DNF
equations. Horn equations are precisely those DNF equations in which each term
contains at most one complemented variable (recall Definition 1.30 in Section

1.13.1). As we will see in Chapter 6, Horn equations can be solved in linear time
(essentially, by repeated application of the unit literal rules). A DNF equation φ = 0
can be relaxed to a Horn equation by dropping from φ any term that contains more
than one complemented variable. More elaborate schemes are discussed in Gallo
and Urbani [365] or Pretolani [758].
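
The simplest Horn relaxation described above, combined with steps (a)-(c) of the relaxation-based preprocessing, might be sketched as follows. The Horn solver shown here is a naive forward-chaining loop rather than a linear-time algorithm, and the encoding (+i for xi, −i for its complement), the function names, and the return conventions are assumptions of this sketch.

```python
def horn_relaxation(terms):
    """Drop every term that contains more than one complemented variable."""
    return [t for t in terms if sum(1 for lit in t if lit < 0) <= 1]


def solve_horn(terms, n):
    """Solve a Horn DNF equation by forward chaining; None if inconsistent."""
    ones = set()                        # variables forced to 1 so far
    changed = True
    while changed:
        changed = False
        for t in terms:
            pos = {lit for lit in t if lit > 0}
            neg = [-lit for lit in t if lit < 0]
            if pos <= ones:             # all uncomplemented variables equal 1
                if not neg:
                    return None         # the term evaluates to 1
                if neg[0] not in ones:
                    ones.add(neg[0])    # the complemented variable must be 1
                    changed = True
    return tuple(1 if i in ones else 0 for i in range(1, n + 1))


def dnf_value(terms, x):
    """Evaluate the DNF at the 0/1 point x."""
    return int(any(all((x[abs(l) - 1] == 1) == (l > 0) for l in t) for t in terms))


def preprocess_by_horn_relaxation(terms, n):
    solution = solve_horn(horn_relaxation(terms), n)    # step (a)
    if solution is None:
        return "No"                                     # step (b)
    if dnf_value(terms, solution) == 0:
        return solution                                 # step (c), success
    return terms                                        # step (c), inconclusive
```

When the relaxation is too weak, the last case applies and the original DNF is returned unchanged, so that branching has to take over.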
Other authors have similarly proposed to relax the given DNF equation to a
quadratic equation (quadratic equations, like Horn equations, are easily solved in
linear time; see Chapter 5). Buro and Kleine Büning [166]; Dubois and Dequen
[283]; Groote and Warners [412]; Jaumard, Stan, and Desrosiers [532]; Larrabee
[599]; and Van Gelder and Tsuji [886] report on computational experiments relying
on (variants of) such schemes. As Larrabee observed [599], one may expect these
approaches to perform particularly well when the equation contains a relatively
high number of quadratic terms, as is the case with the equations arising from
stuck-at fault detection in combinational circuits (see Application 2.2).
Finally, we note that the decomposition techniques described by Truemper [871]
share some similarities with relaxation schemes.

2.6 Variable elimination procedures


In this section, we discuss variable elimination techniques for the solution of
Boolean equations. Variable elimination procedures apply to Boolean equations
of the form
φ(x1 , . . . , xn−1 , xn ) = 0, (2.13)
where φ is an arbitrary Boolean expression, not necessarily in disjunctive normal
form. They rely on the following result.
Theorem 2.6. The equation φ(x1 , . . . , xn−1 , xn ) = 0 is consistent if and only if the
equation
φ(x1 , . . . , xn−1 , 0) φ(x1 , . . . , xn−1 , 1) = 0 (2.14)
is consistent.
Theorem 2.6 can be viewed as a trivial restatement of Theorem 2.5. It should be
noted, however, that contrary to Theorem 2.5 which only holds in the two-element
Boolean algebra, Theorem 2.6 holds (nontrivially) in general Boolean algebras
as well, so that variable elimination techniques extend directly to such algebras.
Theorem 2.6 and variable elimination procedures can actually be traced to the clas-
sical works of several 19th-century logicians (see, e.g., Boole [103], Chapter VII,
Proposition 1; see also Kuzicheva [591] or Rudeanu [795] for historical accounts).
Equation (2.14) is an equation in n − 1 variables, which we view as resulting
from (2.13) by elimination of variable xn (the operation that associates equation
(2.14) to equation (2.13) is sometimes called variable splitting; see e.g. [186, 261]).
By successive elimination of a subset of variables, a necessary and sufficient
condition for the consistency of (2.13) can be obtained in terms of the remaining
variables. This technique turns out to be useful in applications where some of

the variables are not immediately relevant, but have rather been introduced in the
equation in order to facilitate the formulation of a problem. For instance, in the
Boolean equation φ(X, Y , z) = 0 describing the correct functioning of a switching
circuit (see Application 1.13.2), the variables Y associated with the output of the
hidden gates are usually not of direct interest. In this application, eliminating the Y -
variables from φ = 0 leads to an equation whose solution set describes the relation
between the input signals X and the output signal z (viz., the function computed
by the circuit).
More specifically, successive elimination of all variables of the equation (2.13)
eventually provides a straightforward consistency test for this equation. Before we
make this more precise, however, we would like to address the following question:
Suppose that the equation (2.14) is consistent, and that we know one of its solutions,
say $(x_1^*, \ldots, x_{n-1}^*)$; how can we use this knowledge to produce a solution of the
original equation (2.13)? The next result provides a constructive answer to this
question.
Theorem 2.7. If $(x_1^*, \ldots, x_{n-1}^*)$ is a solution of (2.14), if $x_n^* = \varphi(x_1^*, \ldots, x_{n-1}^*, 0)$,
and $x_n^{**} = \overline{\varphi(x_1^*, \ldots, x_{n-1}^*, 1)}$, then both $(x_1^*, \ldots, x_{n-1}^*, x_n^*)$ and
$(x_1^*, \ldots, x_{n-1}^*, x_n^{**})$ are solutions of (2.13).


Proof. The validity of this statement can be verified by direct substitution. But the
following proof provides more insight into the nature of the elimination technique.
The Shannon expansion of the function φ is
$$\varphi(x_1, \ldots, x_{n-1}, x_n) = x_n \, \varphi(x_1, \ldots, x_{n-1}, 1) \vee \bar{x}_n \, \varphi(x_1, \ldots, x_{n-1}, 0). \tag{2.15}$$
Therefore, if $x_n^* = \varphi(x_1^*, \ldots, x_{n-1}^*, 0)$, it follows from (2.15) that
$$\varphi(x_1^*, \ldots, x_{n-1}^*, x_n^*) = \varphi(x_1^*, \ldots, x_{n-1}^*, 0) \, \varphi(x_1^*, \ldots, x_{n-1}^*, 1), \tag{2.16}$$
which is zero by definition of $(x_1^*, \ldots, x_{n-1}^*)$. The same reasoning applies to $x_n^{**}$.

Let us now illustrate the use of Theorems 2.6 and 2.7 on a small example.
Example 2.2. Consider the DNF equation φ3 (x1 , x2 , x3 ) = 0, where
$$\varphi_3 = \bar{x}_1 x_2 \bar{x}_3 \vee x_1 \bar{x}_2 x_3 \vee \bar{x}_1 \bar{x}_2 \vee \bar{x}_1 x_3 \vee x_2 x_3.$$
By Theorem 2.6, the equation φ3 (x1 , x2 , x3 ) = 0 is consistent if and only if the
equation φ2 (x1 , x2 ) = 0 is consistent, where
$$\varphi_2(x_1, x_2) = \varphi_3(x_1, x_2, 0) \, \varphi_3(x_1, x_2, 1) = (\bar{x}_1 x_2 \vee \bar{x}_1 \bar{x}_2)(x_1 \bar{x}_2 \vee \bar{x}_1 \bar{x}_2 \vee \bar{x}_1 \vee x_2).$$
Applying once again Theorem 2.6, φ2 (x1 , x2 ) = 0 is consistent if and only if
φ1 (x1 ) = 0 is consistent, where
$$\varphi_1(x_1) = \varphi_2(x_1, 0) \, \varphi_2(x_1, 1) = \bar{x}_1.$$
Finally, eliminating x1 yields
φ0 = 0.

Procedure Eliminate(φ)
Input: A Boolean expression φ(x1 , . . . , xn ).
Output: A solution (x1∗ , . . . , xn∗ ) of the equation φ(X) = 0 if the equation is consistent; No otherwise.

begin
φn := φ(x1 , . . . , xn );
{comment: begin successive variable elimination}
for j := n down to 1 do φj −1 (x1 , . . . , xj −1 ) := φj (x1 , . . . , xj −1 , 0) φj (x1 , . . . , xj −1 , 1);
{comment: consistency check}
if φ0 = 1 then return No;
if φ0 = 0 then {comment: the equation is consistent; begin backtracking}
for j := 1 to n do xj∗ := φj (x1∗ , . . . , xj∗−1 , 0);
return (x1∗ , . . . , xn∗ );
end

Figure 2.3. Procedure Eliminate.

The equation φ0 = 0 is clearly consistent, and therefore, we can conclude at this


point that the original equation φ3 = 0 is consistent, too. Using iteratively The-
orem 2.7, we now proceed to compute a solution (x1∗ , x2∗ , x3∗ ) of φ3 = 0. First,
we let x1∗ = φ1 (0) = 1. Since φ2 (x1∗ , 0) = φ2 (1, 0) = 0, we next set x2∗ = 0. And
finally, since φ3 (x1∗ , x2∗ , 0) = φ3 (1, 0, 0) = 0, we let x3∗ = 0. Thus, we conclude that
(x1∗ , x2∗ , x3∗ ) = (1, 0, 0) is a solution of φ3 = 0. 
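
The computations of Example 2.2 can be replayed with a direct, unoptimized Python transcription of the procedure Eliminate of Figure 2.3. In this sketch, Boolean expressions are represented by Python functions on 0/1 arguments (an assumption made for brevity), so that no expression is ever simplified and the running time is exponential in n.

```python
def eliminate(phi, n):
    """phi takes n arguments in {0, 1} and returns 0 or 1.

    Returns a solution of phi = 0 as a tuple, or None for the answer No.
    """
    def project(f):
        # phi_{j-1}(x) = phi_j(x, 0) phi_j(x, 1)
        return lambda *x: f(*x, 0) & f(*x, 1)

    phis = {n: phi}
    for j in range(n, 0, -1):           # successive variable elimination
        phis[j - 1] = project(phis[j])
    if phis[0]() == 1:                  # consistency check
        return None
    x = []
    for j in range(1, n + 1):           # backtracking, as in Theorem 2.7
        x.append(phis[j](*x, 0))
    return tuple(x)


def phi3(x1, x2, x3):
    """The DNF of Example 2.2, with 1 - x standing for the complement of x."""
    n1, n2, n3 = 1 - x1, 1 - x2, 1 - x3
    return (n1 & x2 & n3) | (x1 & n2 & x3) | (n1 & n2) | (n1 & x3) | (x2 & x3)


print(eliminate(phi3, 3))   # (1, 0, 0), the solution computed in Example 2.2
```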

Figure 2.3 presents a formal statement of the procedure Eliminate(φ) for the
solution of Boolean equations of the form (2.13). The correctness of the procedure
is an immediate consequence of Theorems 2.6 and 2.7. It should be noted, however,
that Eliminate can be implemented in a variety of ways. More precisely, the
meaning of the assignment

φj −1 := φj (x1 , . . . , xj −1 , 0) φj (x1 , . . . , xj −1 , 1) (2.17)

in this procedure is not entirely determined. It leaves open an important question:


What expression of φj −1 should we carry over to the next step of the algorithm?
Also, there is no reason to stick to the original ordering (x1 , . . . , xn ) of the variables
in the elimination phase of the procedure. Rather, we may want to decide at each
step, in a dynamic fashion, what variable to eliminate next. The answer to these
questions may determine the efficiency of Eliminate to a large extent, and we
now proceed to discuss them briefly.
Let us first consider the question of what expression to use for φj −1 at each
step of the elimination procedure. If we simply write φj −1 as the conjunction
of the expressions φj (x1 , . . . , xj −1 , 0) and φj (x1 , . . . , xj −1 , 1), without transform-
ing the resulting expression any further, then we eventually obtain the following
expression:

$$\varphi_0 = \bigwedge_{X^* \in \mathcal{B}^n} \varphi(X^*).$$

Successive elimination then amounts to the complete enumeration of all points


of Bn , and the necessary and sufficient condition for consistency, viz., φ0 = 0,
becomes trivial (in the two-element Boolean algebra).
By contrast, transforming the expression (2.17) in each (or some) iteration(s)
of Eliminate generally improves the efficiency of the algorithm. In
particular, simplifying the expression φj −1 sometimes allows us to immediately
detect that φj −1 is identically 0 or identically 1. The elimination procedure can then
be curtailed: Indeed, if φj −1 is constant, then clearly φj −1 = φ0 , and Eliminate
can immediately proceed with the consistency check.
To discuss this point more concretely, let us concentrate on the special case in
which φ = φn is a DNF (recall that no such assumption has been made so far).
When this is the case, we can rewrite φn in the form

$$\varphi_n = \bar{x}_n \psi_0 \vee x_n \psi_1 \vee \psi_2, \tag{2.18}$$

where ψ0 , ψ1 and ψ2 are DNFs involving the variables x1 , . . . , xn−1 , but not xn .
Then,
φn (x1 , . . . , xn−1 , 0) = ψ0 ∨ ψ2
and
φn (x1 , . . . , xn−1 , 1) = ψ1 ∨ ψ2 ,
so that

φn−1 = φn (x1 , . . . , xn−1 , 0) φn (x1 , . . . , xn−1 , 1) = ψ0 ψ1 ∨ ψ2 . (2.19)

The expression (2.19) can be used to rewrite φn−1 as a DNF. Indeed, by dis-
tributivity, the conjunction ψ0 ψ1 has a DNF expression ψ, each term of which is
simply the conjunction of a term of ψ0 with a term of ψ1 . This DNF can be further
simplified by deleting any term that is identically 0 or is absorbed by another term.
These straightforward rules yield a DNF equivalent to φn−1 .
Since a DNF is identically zero if and only if it has no terms, this approach
sometimes allows us to detect consistency early in the elimination procedure,
thus reducing the number of iterations required by Eliminate and speeding up
termination.
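
One elimination step on a DNF, following (2.18) and (2.19) and the two simplification rules just mentioned, could be sketched in Python as follows. The signed-integer encoding of literals (+i for xi, −i for its complement) and the function name are assumptions; an empty result stands for the constant 0, and a result containing the empty term stands for the constant 1.

```python
def eliminate_variable(terms, v):
    """Return a DNF (a set of frozensets) equivalent to psi0 psi1 ∨ psi2."""
    terms = [frozenset(t) for t in terms]
    psi0 = [t - {-v} for t in terms if -v in t]     # terms containing the complement of x_v
    psi1 = [t - {v} for t in terms if v in t]       # terms containing x_v
    psi2 = [t for t in terms if v not in t and -v not in t]
    # Distributivity: conjunction of each term of psi0 with each term of psi1.
    products = {a | b for a in psi0 for b in psi1}
    # Delete the terms that are identically 0 (a variable and its complement).
    candidates = {t for t in products | set(psi2)
                  if not any(-lit in t for lit in t)}
    # Delete the terms that are absorbed by another (strictly shorter) term.
    return {t for t in candidates if not any(s < t for s in candidates)}
```

Calling the function repeatedly, once per variable, yields the successive DNFs of the elimination procedure; the iteration can stop as soon as the returned set is empty.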
Example 2.3. Consider the equation

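$$\varphi_4 = x_1 x_2 \bar{x}_4 \vee \bar{x}_1 x_2 x_3 \bar{x}_4 \vee \bar{x}_2 x_4 \vee \bar{x}_1 \bar{x}_3 x_4.$$

By elimination of x4 , we get

$$\varphi_3 = (x_1 x_2 \vee \bar{x}_1 x_2 x_3)(\bar{x}_2 \vee \bar{x}_1 \bar{x}_3).$$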

Using distributivity, φ3 is directly seen to be identically zero. Thus, we conclude


that φ3 = φ2 = φ1 = φ0 = 0, and that the equation φ4 = 0 is consistent. The solu-
tion (x1∗ , . . . , x4∗ ) = (0, . . . , 0) can be computed using Theorem 2.7. 

We now turn to a brief discussion of the elimination ordering. As noted earlier,


there is no compelling reason to eliminate the variables in the order xn , . . . , x1 rather
than in any other order. We may even want to determine dynamically (that is, on
the fly) which variable xi to eliminate next. In some situations, an obvious choice
can be made for this next variable. For instance, if the current DNF φj (x1 , . . . , xj )
contains xi as a term of degree 1, or if the variable xi appears only uncomplemented
in φj , then eliminating xi is tantamount to fixing xi to 0 in φj (we leave this for the
reader to check). Similarly, if φj (x1 , . . . , xj ) contains a term $\bar{x}_i$, or if the variable xi
appears only complemented in φj , then eliminating xi is tantamount to fixing xi
to 1 in φj . It is easy to recognize in this description an alternative statement of the
Davis-Putnam rules (see Section 2.5), cast here in terms of variable elimination.
Example 2.4. Consider the DNF equation φ6 (x1 , . . . , x6 ) = 0, where

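$$\varphi_6 = \bar{x}_1 x_2 \bar{x}_3 \vee x_1 \bar{x}_2 x_3 \vee \bar{x}_1 \bar{x}_2 x_4 \vee \bar{x}_1 x_3 \vee x_2 x_3 x_4 \vee x_4 x_5 x_6 \vee \bar{x}_4 \bar{x}_5 \bar{x}_6 \vee \bar{x}_4 \vee x_3 x_5 \bar{x}_6.$$

Applying the unit literal rule, we see that x4 can be fixed to 1. This reduces φ6 to

$$\bar{x}_1 x_2 \bar{x}_3 \vee x_1 \bar{x}_2 x_3 \vee \bar{x}_1 \bar{x}_2 \vee \bar{x}_1 x_3 \vee x_2 x_3 \vee x_5 x_6 \vee x_3 x_5 \bar{x}_6.$$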

Variable x5 only appears in uncomplemented form in this DNF. By the monotone


literal rule, we can set x5 to 0, thus reducing the original problem to the equation
solved in Example 2.2. 

Davis and Putnam’s original algorithm [261] is in fact a variant of the classi-
cal procedure Eliminate, especially tailored for the solution of DNF (or CNF)
equations. The additional rules proposed by these authors consist in maintaining the
DNF format throughout the procedure and in computing dynamically an effective
variable elimination ordering. Since both the unit literal rules and the monotone
literal rules lead to a simplification of the current DNF φj , it makes sense to apply
them first in the elimination algorithm. When the rules are no longer applicable,
Davis and Putnam [261] suggest proceeding with the elimination of any variable
that appears in a shortest term of φj (recall our discussion of branching rules in
Section 2.5).
Even with these refinements, however, the main computational hurdle of the
elimination method remains: Namely, the number of terms in the equation tends to
explode in the initial phases of the procedure, before it eventually decreases with the
number of variables. As a result, computer implementations of elimination proce-
dures rapidly face memory space problems, similar in nature to those encountered
by other dynamic programming algorithms. In effect, these problems are often
serious enough to prohibit the solution of equations involving many variables.
This difficulty was first noticed by Davis, Logemann, and Loveland [260] and led
them to replace the original form of the Davis-Putnam algorithm by a branching
procedure of the type discussed in Section 2.5. We will see in Sections 2.11.2
and 2.11.3, however, that elimination procedures are well suited for generating all
solutions or for computing parametric solutions of Boolean equations.

2.7 The consensus procedure


The consensus procedure has a long history in the Boolean literature. It was orig-
inally designed as a method generating (that is, listing) all prime implicants of a
Boolean function given in DNF and was repeatedly discovered in this form by sev-
eral independent researchers; see Blake [99], Samson and Mills [801], Quine [768],
as well as Chapter 3. Brown [156] gives an interesting historical account of this line
of research.
As a solution method for CNF equations, the consensus method mostly owes
its fame to Robinson [787]. In his seminal paper, Robinson introduced an infer-
ence principle (which he calls the resolution principle) for first-order logic. The
resolution principle subsequently became the cornerstone of many algorithmic
techniques used by automated reasoning systems (see, e.g., Wos et al. [925]).
When specialized to propositional equations and translated from the CNF for-
mat favored by Robinson into the equivalent DNF framework adopted here, the
resolution method becomes essentially identical to the consensus method, and it
immediately follows from earlier works that resolution provides a correct solution
procedure for Boolean equations (although Robinson [787] used a direct, ad hoc
argument to establish this important result).
In this section, we explain how a consensus-based procedure can be used to
solve DNF Boolean equations. A general version of the consensus method, allow-
ing the enumeration of all prime implicants of DNF expressions, is discussed
more extensively in Chapter 3. The essence of consensus procedures lies in the
following observation. (Note the similarity of this statement with the statement of
Theorem 2.6 when the latter is specialized to DNF equations; see (2.18), (2.19).)
Theorem 2.8. Let φ(x1 , x2 , . . . , xn ) be a Boolean expression of the form

$$\varphi = x_i C \vee \bar{x}_i D \vee \psi, \tag{2.20}$$

where i ∈ {1, 2, . . . , n}, and C, D are elementary conjunctions. Then, φ = φ ∨ CD,


so that the equation φ = 0 is consistent if and only if the equation φ ∨ CD = 0 is
consistent.
Proof. The claims simply follow from the observation that, in every solution of
φ = 0, either C or D must be 0. 

Theorem 2.8 motivates the following definition.


Definition 2.5. If $xC$ and $\bar{x}D$ are two elementary conjunctions such that CD is
not identically 0, then we say that CD is the consensus of these two conjunctions,
and we say that CD is derived from $xC$ and $\bar{x}D$ by consensus on x.
One interpretation of Theorem 2.8 is that, whenever $x_i C = 0$ and $\bar{x}_i D = 0$
express conditions to be satisfied by the variables (x1 , x2 , . . . , xn ), then CD = 0
expresses another such condition (note that this condition is uninteresting if CD
is identically 0). Therefore, we can see consensus derivation as the application

of an inference rule (namely, the classical syllogism) that allows us to draw the
conclusion CD from the premises $x_i C$ and $\bar{x}_i D$.
This view of consensus derivation, as an operation producing new elemen-
tary conjunctions from existing ones, leads to a natural extension of the previous
concepts.
Definition 2.6. The elementary conjunction C can be derived by consensus from
a set S of elementary conjunctions if there exists a finite sequence C1 , C2 , . . . , Cp
of elementary conjunctions such that
(1) Cp = C, and
(2) for i = 1, . . . , p, either Ci ∈ S or there exist j < i and k < i such that Ci is
the consensus of Cj and Ck .
We are now ready to state the fundamental result that motivates the consideration
of consensus derivation.
Theorem 2.9. The DNF equation φ = 0 is inconsistent if and only if the (empty)
elementary conjunction 1 can be derived by consensus from the set of terms of φ.
Proof. As mentioned earlier, this theorem can be viewed as an immediate corollary
of the results in Chapter 3 (see Theorem 3.5 and its Corollary 3.4). For the sake of
completeness, we prove it here from first principles.
The “if” part of the statement follows directly from Theorem 2.8. For the “only
if” part, we assume that the DNF equation φ(x1 , x2 , . . . , xn ) = 0 is inconsistent,
and we proceed by induction on the number n of variables. The result is trivial if
n = 1. For n > 1, write φ as
\[ \phi = \bar{x}_n \psi_0 \lor x_n \psi_1 \lor \psi_2, \tag{2.21} \]
where $\psi_0$, $\psi_1$ and $\psi_2$ do not depend on $x_n$. Theorem 2.6 implies that the equation
$\psi_0 \psi_1 \lor \psi_2 = 0$ is inconsistent. Now use distributivity to rewrite $\psi_0 \psi_1 \lor \psi_2$ as a
DNF of the form $\psi = \bigvee_{k=1}^{m} C_k$, where each term $C_k$ is either a term of φ or the
conjunction of a term of $\psi_0$ with a term of $\psi_1$, namely, the consensus (on $x_n$) of
two terms of φ. Since ψ depends on $n - 1$ variables, we know by induction that the
constant 1 can be derived by consensus from $\{C_k \mid k = 1, 2, \ldots, m\}$. This, however,
implies that 1 can be derived by consensus from the set of terms of φ.

A procedure for testing the consistency of DNF equations can now be stated
as in Figure 2.4. The correctness of the procedure is an immediate corollary of
Theorem 2.9 (note that the while-loop eventually terminates, since the number of
elementary conjunctions on n variables is finite).
Procedure Consensus(φ)

Input: A DNF expression $\phi(x_1, \ldots, x_n) = \bigvee_{k=1}^{m} C_k$.
Output: Yes if the equation φ = 0 is consistent; No otherwise.

begin
    S := $\{C_k \mid k = 1, 2, \ldots, m\}$;
    while there exist two terms $x_i C$ and $\bar{x}_i D$ in S such that
          $x_i C$ and $\bar{x}_i D$ have a consensus and $CD$ is not in S do
        if $CD = 1$ then return No
        else S := S ∪ $\{CD\}$;
    return Yes;
end

Figure 2.4. Procedure Consensus.

Example 2.5. Consider the DNF equation $\phi(x_1, x_2, x_3, x_4) = 0$, where
\[ \phi = \bar{x}_1 x_2 \bar{x}_3 \lor x_1 \bar{x}_4 \lor \bar{x}_1 \bar{x}_2 \lor \bar{x}_1 x_3 \lor x_4. \]
From the terms $\bar{x}_1 x_2 \bar{x}_3$ and $\bar{x}_1 \bar{x}_2$, we can derive the consensus $\bar{x}_1 \bar{x}_3$. This new
term together with $\bar{x}_1 x_3$ yields the consensus $\bar{x}_1$. On the other hand, the term
$x_1$ can be derived from $x_1 \bar{x}_4$ and $x_4$. Combining now the derived terms $\bar{x}_1$ and
$x_1$, we can produce the constant 1, and we conclude that the equation φ = 0 is
inconsistent.
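The procedure of Figure 2.4 can be sketched in Python as follows. This is only one possible reading of the pseudocode, under assumptions that are not in the text: terms are encoded as frozensets of nonzero integers (+i for x_i, -i for its complement, the empty set for the constant 1), pairs are scanned in arbitrary order, and an input that already contains the constant term 1 is reported as inconsistent. The function names are hypothetical.

```python
from itertools import combinations


def consensus(t1, t2):
    """Consensus of two terms, or None if there is none."""
    clashes = [lit for lit in t1 if -lit in t2]
    if len(clashes) != 1:
        return None
    x = clashes[0]
    return frozenset(t1 - {x}) | frozenset(t2 - {-x})


def consensus_procedure(terms):
    """Return True if phi = 0 is consistent, False otherwise."""
    S = {frozenset(t) for t in terms}
    if frozenset() in S:
        # Assumption: a constant term 1 in the input makes phi identically 1.
        return False
    changed = True
    while changed:
        changed = False
        # Scan a snapshot of S; newly added terms are handled in the next pass.
        for t1, t2 in combinations(list(S), 2):
            c = consensus(t1, t2)
            if c is None or c in S:
                continue
            if not c:
                return False  # the constant 1 has been derived
            S.add(c)
            changed = True
    return True


# The DNF of Example 2.5.
example_2_5 = [{-1, 2, -3}, {1, -4}, {-1, -2}, {-1, 3}, {4}]
print(consensus_procedure(example_2_5))
```

The loop stops because only finitely many terms exist on n variables, which is the termination argument given in the text.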

Two features of the consensus procedure deserve further attention. First, Consensus
does not produce a solution of the DNF equation when there is one. Second,
Consensus is not completely defined, since we did not specify how the terms $x_i C$
and $\bar{x}_i D$ are to be chosen in the while-loop. We now successively tackle these two
points.
Consider first the fact that Consensus only delivers a consistency verdict for
DNF equations, but no solution. This is, from a theoretical viewpoint, no serious
problem. Indeed, as explained in Section 2.4, Consensus can easily be used as a
subroutine to produce a solution of consistent equations.
But the situation is actually even better here. Indeed, we shall prove in Chapter 3
that, when the procedure Consensus(φ) halts and returns the answer Yes, the set
S contains all prime implicants of the function represented by the DNF φ. The
knowledge of these prime implicants is, by itself, sufficient to produce a solution
of the equation φ = 0, as will also be explained in Chapter 3 (see Corollary 3.4).
Let us also notice, as a final remark on this topic, that the consensus procedure
and its various extensions have been mostly used as equation-solving techniques
within the field of automated theorem proving. As previously mentioned, many
applications in this particular field do not require the explicit finding of solu-
tions, since only inconsistent equations are “interesting” (because they correspond
to theorems). On the other hand, what is valuable in this context is an explicit
argument showing why a theorem is true (i.e., a proof of the theorem). A consen-
sus derivation of inconsistency provides such an argument (although sometimes
insufficiently clear; see [881, 925] for a more detailed discussion).
We now take up the second issue mentioned above: How are the terms $x_i C$ and
$\bar{x}_i D$ to be selected in the while-loop of the consensus procedure? This question
is closely related to the question of selecting the next variable to branch upon in
branching procedures, and some of the available strategies should be by now very
familiar.
A first strategy is to replace the condition “CD is not in S” in the while-
statement by the stronger condition “CD is not absorbed by any term in S.” The
procedure remains correct under this modification, as easily follows from the proof
of Theorem 2.9.
Another strategy, much in the spirit of the Davis-Putnam unit literal rule, is to
give priority to so-called unit consensus steps, namely, to pairs of terms $\{x_i C, \bar{x}_i D\}$
such that either $x_i C$ or $\bar{x}_i D$ is of degree 1. Note, for instance, that the consensus
of $x_i$ and $\bar{x}_i D$ is simply $D$, which absorbs $\bar{x}_i D$. Thus, unit consensus steps can be
implemented without increasing the cardinality of the set S. If we restrict the
procedure Consensus to the use of unit consensus steps, then the procedure becomes
extremely fast. But, unfortunately, it can fail to detect inconsistent equations.
Nevertheless, equation-solving heuristics based on this approach are widely used in
automated reasoning procedures.
Similarly, a substantially accelerated heuristic algorithm is obtained when we
restrict consensus formation to pairs of terms of the form $\{x_i C, \bar{x}_i CD\}$; indeed,
such a pair produces the term $CD$, which absorbs $\bar{x}_i CD$.
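The restriction to unit consensus steps described above can be sketched as follows. This is an illustrative Python sketch, not an implementation from the text: the encoding of terms as frozensets of nonzero integers (+i for x_i, -i for its complement), the function name, and the two status strings are all assumptions.

```python
def unit_consensus(terms):
    """Apply unit consensus steps only.

    Returns ("inconsistent", forced) if the constant 1 is derived, and
    ("undetermined", forced) otherwise; `forced` is the set of literals
    that must take the value 0 in every solution of phi = 0.
    """
    S = {frozenset(t) for t in terms}
    forced = set()
    while True:
        if frozenset() in S:
            return "inconsistent", forced
        units = [t for t in S if len(t) == 1]
        if not units:
            return "undetermined", forced
        (lit,) = units[0]
        forced.add(lit)
        new_S = set()
        for t in S:
            if lit in t:
                continue  # absorbed by the unit term (this includes the unit itself)
            # Consensus with the unit term: the complement of lit is dropped,
            # and the shorter term absorbs the original one.
            new_S.add(t - {-lit})
        S = new_S


# Inconsistent, but without any term of degree 1: unit consensus cannot tell.
print(unit_consensus([{1, 2}, {1, -2}, {-1, 2}, {-1, -2}])[0])
```

Each pass removes one variable from S entirely, so the loop runs at most n times, which reflects the speed claimed for this restricted procedure.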
If Consensus starts by selecting all pairs of terms having a consensus on xn , as
long as they are available, and proceeds next to pairs of terms having a consensus on
xn−1 , xn−2 , . . . , x1 , then Consensus becomes essentially identical to the elimination
procedure.
Other specialized forms of consensus used in automated reasoning are the so-
called set of support strategy, linear consensus, input refutation, and so on. Some
of these variants will be introduced in subsequent chapters (e.g., in Chapter 6). We
also refer to [186, 571, 925] and to the exercises at the end of this chapter for more
information.

2.8 Mathematical programming approaches


The approaches surveyed in this section are characterized by their treatment of
Boolean variables as numerical quantities and by the transformation of Boolean
equations into equivalent mathematical programming problems. This is in sharp
contrast with the methods discussed in previous sections, which rely on a purely
symbolic treatment of the variables. The idea of identifying the Boolean symbols
0 and 1 with numbers and reducing problems of logic to optimization prob-
lems goes back a long time (see, among others, Fortet [342, 343], Hammer and
Rudeanu [460]); the interest in such approaches has been revived in recent years.

2.8.1 Integer linear programming


The basic observation underlying integer linear programming approaches can be
phrased as follows:

Theorem 2.10. The DNF equation
\[ \phi(x_1, x_2, \ldots, x_n) = \bigvee_{k=1}^{m} \Big( \bigwedge_{i \in A_k} x_i \bigwedge_{j \in B_k} \bar{x}_j \Big) = 0 \tag{2.22} \]
has the same set of solutions as IS(φ), where IS(φ) is the following system of linear
inequalities in 0-1 variables:
\[ \sum_{i \in A_k} (1 - x_i) + \sum_{j \in B_k} x_j \ge 1, \quad k = 1, 2, \ldots, m; \]
\[ x_i \in \{0, 1\}, \quad i = 1, 2, \ldots, n. \]

In particular, the following statements are equivalent:

(a) The equation φ = 0 is consistent.
(b) The system IS(φ) is feasible.
(c) The optimal value of IP(φ) is 0, where IP(φ) is the integer programming
problem:
\[ \begin{array}{lll}
\text{minimize} & z & \\
\text{subject to} & z + \sum_{i \in A_k} (1 - x_i) + \sum_{j \in B_k} x_j \ge 1, & k = 1, 2, \ldots, m; \\
 & x_i \in \{0, 1\}, & i = 1, 2, \ldots, n; \\
 & z \in \{0, 1\}. &
\end{array} \]

Proof. The first claim is just a restatement of Theorem 1.39 (see Section 1.13.6)
and the second one is an immediate corollary. 
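The first claim of Theorem 2.10 can be checked by brute force on small instances. The following Python sketch does so under assumptions of its own: terms are sets of nonzero integers (+i for x_i, -i for its complement), and the helper names `phi`, `satisfies_IS`, and `same_solution_sets` are hypothetical.

```python
from itertools import product


def phi(terms, x):
    """Value of the DNF at the 0-1 point x (dict: variable index -> 0 or 1)."""
    return int(any(all(x[abs(l)] == (1 if l > 0 else 0) for l in t) for t in terms))


def satisfies_IS(terms, x):
    """Check the linear inequalities of IS(phi) at the 0-1 point x."""
    return all(
        sum((1 - x[l]) if l > 0 else x[-l] for l in t) >= 1
        for t in terms
    )


def same_solution_sets(terms, n):
    """Compare the solutions of phi = 0 and of IS(phi) over all of {0,1}^n."""
    for bits in product((0, 1), repeat=n):
        x = dict(enumerate(bits, start=1))
        if (phi(terms, x) == 0) != satisfies_IS(terms, x):
            return False
    return True


# x1 (not x2) or x2 x3, on three variables.
print(same_solution_sets([{1, -2}, {2, 3}], 3))
```

Each inequality fails exactly when every literal of the corresponding term equals 1, which is the content of the theorem.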

In principle, any algorithm for handling 0-1 linear programming problems
can be used to solve IS(φ) or IP(φ), thereby simultaneously solving the Boolean
equation φ = 0 (see [707] for an overview of integer programming methods). Such
approaches have been taken up and developed by several researchers, following,
in particular, some early work by Jeroslow and his coworkers; see, for example,
[98, 533], Williams [911, 912], and Hooker [498, 499, 500, 501]. The
book by Chandru and Hooker [184] covers these developments in great detail,
so that we shall content ourselves with a brief survey of the basic ideas (see also
Hooker [502] for a discussion of logic-based methods in optimization).
Blair, Jeroslow and Lowe [98] adopt a straightforward approach: They sim-
ply feed the formulation IS(φ) to standard integer linear programming codes
that attempt to solve IS(φ) by branch-and-bound. Let us see what this approach
amounts to.
First, consider the linear relaxation LP(φ) of problem IP(φ), namely, the linear
programming problem:
\[ \begin{array}{lll}
\text{minimize} & z & \\
\text{subject to} & z + \sum_{i \in A_k} (1 - x_i) + \sum_{j \in B_k} x_j \ge 1, & k = 1, 2, \ldots, m, \\
 & 0 \le x_i \le 1, & i = 1, 2, \ldots, n, \\
 & 0 \le z \le 1. &
\end{array} \]

Applying a basic branch-and-bound procedure to solve IP(φ) is tantamount to
solving the equation φ = 0 by the procedure Branch described in Section 2.5,
with a subroutine Preprocess-LP which performs the following steps:

(a) Solve LP(φ) and let $(X^*, z^*)$ be its optimal solution;
(b) If $z^* > 0$, then the optimal value of IP(φ) must be 1 and hence the equation
φ = 0 is inconsistent;
(c) If $z^* = 0$ and $X^*$ is integral, then $X^*$ is a solution of the equation φ = 0.

When neither case (b) nor case (c) applies, then one of the variables assuming
a fractional value in $X^*$ can be selected for branching.
How effective is this particular version of Branch? Let us say that a variable
$x_i$ is fixed to the value 0 (respectively, 1) by unit consensus on φ if the term
$x_i$ (respectively, $\bar{x}_i$) can be derived from the terms of φ by a sequence of unit
consensus steps (i.e., if the linear term $x_i$, respectively $\bar{x}_i$, arises after iterated
applications of the unit literal rule on φ). Also, let us say that unit consensus
detects that φ = 0 is inconsistent if some variable $x_i$ can be fixed both to 0 and to
1 by unit consensus. The next result is due to Blair, Jeroslow, and Lowe [98].

Theorem 2.11. (a) If unit consensus does not detect that φ = 0 is inconsistent,
then there is a feasible solution $(X^*, z^*)$ of LP(φ) in which $z^* = 0$ and
$x_i^* = 1/2$ for each variable $x_i$ that is not fixed by unit consensus ($i = 1, 2, \ldots, n$).
(b) For each $i = 1, 2, \ldots, n$, if $x_i$ is fixed to the value $u \in \{0, 1\}$ by unit consensus
on φ, then $x_i^* = u$ in all those feasible solutions $(X^*, z^*)$ of LP(φ) for which
$z^* = 0$.
(c) The optimal value of LP(φ) is strictly positive if and only if unit consensus
detects that φ = 0 is inconsistent.

Proof. The theorem follows from the fact that, if all terms of φ have degree at least
2, then setting $z^* = 0$ and $x_j^* = 1/2$ for $j = 1, 2, \ldots, n$ defines a feasible solution
of LP(φ). And conversely, if φ contains a term of degree 1, say, the term $x_i$, then
LP(φ) contains the constraint
\[ z + (1 - x_i) \ge 1, \]
so that $x_i^* = 0$ in every feasible solution $(X^*, z^*)$ of LP(φ) for which $z^* = 0$.
Statements (a) and (b) easily follow from these observations, by induction on the
number of variables fixed by unit consensus. Statement (c) is a corollary of the
previous ones.
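The first observation in the proof is easy to check numerically. The sketch below, in Python with exact rational arithmetic, tests whether a point is feasible for LP(φ); the encoding of terms as sets of nonzero integers (+i for x_i, -i for its complement) and the function name `lp_feasible` are assumptions made for this illustration.

```python
from fractions import Fraction


def lp_feasible(terms, x, z):
    """Check whether (x, z) satisfies all constraints of LP(phi)."""
    in_box = 0 <= z <= 1 and all(0 <= v <= 1 for v in x.values())
    rows_ok = all(
        z + sum((1 - x[l]) if l > 0 else x[-l] for l in t) >= 1
        for t in terms
    )
    return in_box and rows_ok


# All terms have degree at least 2: the all-1/2 point is feasible with z = 0.
terms = [{1, -2}, {2, 3}, {-1, -3, 4}]
half = {i: Fraction(1, 2) for i in range(1, 5)}
print(lp_feasible(terms, half, 0))

# A term of degree 1 (here x1) rules out the all-1/2 point when z = 0.
print(lp_feasible([{1}, {-1, 2}], {1: Fraction(1, 2), 2: Fraction(1, 2)}, 0))
```

With at least two literals per term, every constraint sums at least two contributions of 1/2, which is why the fractional point gives no information beyond unit consensus.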

It follows from Theorem 2.11 that, when applied to problem IP(φ) (or IS(φ)),
a branch-and-bound algorithm does not detect inconsistency faster than the unit
literal rules. One may still hope that, in the course of solving the linear relaxation
LP(φ), integer solutions may be produced by “sheer luck,” thus accelerating the
basic branching procedure in the case of consistent equations. While this is true
to some extent, computational experiments indicate that this approach is rather
inefficient and that special-purpose heuristics tend to outperform this general-
purpose LP-based approach (see [98, 534] and Section 2.5.2).
The integer programming framework, however, also offers insights of a more
theoretical nature into the solution of Boolean equations. Let us first recall some
definitions from [197, 211, 812]. (We denote by $\lceil x \rceil$ the smallest integer not smaller
than $x$.)

Definition 2.7. Let $A \in \mathbb{Z}^{m \times n}$, $b \in \mathbb{Z}^{m}$, and consider the system of linear inequalities
$I : (Ax \ge b,\ x \ge 0)$ for $x \in \mathbb{R}^{n}$. A Chvátal cut for $I$ is any inequality of the form
$cx \ge \delta$, where $c \in \mathbb{Z}^{n}$ and $\delta \in \mathbb{R}$, such that for some $d \in \mathbb{R}$, $\lceil d \rceil \ge \delta$, the inequality
$cx \ge d$ can be obtained as a nonnegative linear combination of the inequalities
in $I$.

It should be clear that every integral vector $x \in \mathbb{Z}^{n}$ that satisfies all the inequalities
in $I$ also satisfies every Chvátal cut for $I$. Let us now consider the set of all
the inequalities that can be obtained by iterated computations of Chvátal cuts.

Definition 2.8. The inequality $cx \ge d$ is in the Chvátal closure of $I : (Ax \ge b,\ x \ge 0)$
if there exists a finite sequence of inequalities $c_i x \ge d_i$ ($i = 1, 2, \ldots, p$) such
that

(1) $c_p = c$, $d_p = d$, and
(2) for $i = 1, \ldots, p$, either the inequality $c_i x \ge d_i$ is in $I$, or it is a Chvátal cut
for the system of inequalities ($c_j x \ge d_j$ : $1 \le j < i$).

A deep theorem of Chvátal [197] asserts that, if the solution set of $I$ is bounded,
then every linear inequality $cx \ge \delta$ ($c \in \mathbb{Z}^{n}$, $\delta \in \mathbb{R}$) that is satisfied by all integral
solutions of $I$ is in the Chvátal closure of $I$ (see also [812, 211]). In particular,
if the system I has no integral solution, then the inequality 0 ≥ 1 must be in
its Chvátal closure. We are now ready to apply these concepts to the solution of
Boolean equations.

Theorem 2.12. The DNF equation φ(X) = 0 is inconsistent if and only if the
inequality $0 \ge 1$ is in the Chvátal closure of the system
\[ \sum_{i \in A_k} (1 - x_i) + \sum_{j \in B_k} x_j \ge 1, \quad k = 1, 2, \ldots, m, \tag{2.23} \]
\[ 0 \le x_i \le 1, \quad i = 1, 2, \ldots, n. \tag{2.24} \]

Proof. By Theorem 2.10, we know that φ = 0 is inconsistent if and only if the
system IS(φ) is infeasible, that is, if and only if the system (2.23)–(2.24) has no
integral solution. So, the statement follows from Chvátal’s theorem.

As observed by Cook, Coullard, and Turán [210], Definition 2.8 and Theo-
rem 2.12 suggest a purely algebraic cutting-plane proof system for establishing
the inconsistency of DNF equations. The next result, proved in Cook, Coullard,
and Turán [210]; Hooker [499, 500] and Williams [911], establishes a connection
between this approach and the consensus method.

Theorem 2.13. Let $k \in \{1, \ldots, n\}$; let $A_1, A_2, B_1, B_2$ be subsets of $\{1, \ldots, n\} \setminus \{k\}$ such
that $(A_1 \cup A_2) \cap (B_1 \cup B_2) = \emptyset$; and consider the system of inequalities
\[ (1 - x_k) + \sum_{i \in A_1} (1 - x_i) + \sum_{j \in B_1} x_j \ge 1, \tag{2.25} \]
\[ x_k + \sum_{i \in A_2} (1 - x_i) + \sum_{j \in B_2} x_j \ge 1, \tag{2.26} \]
\[ 0 \le x_i \le 1, \quad i = 1, 2, \ldots, n. \tag{2.27} \]

Then, the inequality
\[ \sum_{i \in A_1 \cup A_2} (1 - x_i) + \sum_{j \in B_1 \cup B_2} x_j \ge 1 \tag{2.28} \]
is a Chvátal cut for (2.25)–(2.27).

Proof. Take the sum of (2.25) and (2.26). Add $(1 - x_i) \ge 0$ to the resulting inequality
for each $i$ that appears in exactly one of $A_1$, $A_2$ and add $x_j \ge 0$ for each $j$ that appears
in exactly one of $B_1$, $B_2$. Divide both sides of the resulting inequality by 2. These
operations yield the valid inequality
\[ \sum_{i \in A_1 \cup A_2} (1 - x_i) + \sum_{j \in B_1 \cup B_2} x_j \ge \frac{1}{2}, \tag{2.29} \]
which shows that (2.28) is a Chvátal cut for (2.25)–(2.27).



Observe that (2.25) represents the elementary conjunction
$C = x_k \big( \bigwedge_{i \in A_1} x_i \bigwedge_{j \in B_1} \bar{x}_j \big)$, (2.26) represents the elementary conjunction
$D = \bar{x}_k \big( \bigwedge_{i \in A_2} x_i \bigwedge_{j \in B_2} \bar{x}_j \big)$, and (2.28) represents the consensus of $C$ and $D$. Therefore, (2.28)
can appropriately be called a consensus cut derived from (2.25)–(2.26).

Example 2.6. Consider two terms of a DNF φ, say, $C = x_1 x_2 \bar{x}_3 \bar{x}_4$ and
$D = \bar{x}_1 x_2 \bar{x}_3 x_5$. In the system IS(φ), these terms give rise to the inequalities
\[ (1 - x_1) + (1 - x_2) + x_3 + x_4 \ge 1, \]
\[ x_1 + (1 - x_2) + x_3 + (1 - x_5) \ge 1. \]

The consensus cut (2.28) derived from these inequalities is
\[ (1 - x_2) + x_3 + x_4 + (1 - x_5) \ge 1, \]
which is also the inequality associated to the consensus of $C$ and $D$, viz.
$x_2 \bar{x}_3 \bar{x}_4 x_5$.
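To make the proof of Theorem 2.13 concrete, the following worked derivation retraces its steps on the two inequalities of Example 2.6, where $k = 1$, $A_1 = \{2\}$, $B_1 = \{3, 4\}$, $A_2 = \{2, 5\}$, and $B_2 = \{3\}$. It is only an illustrative sketch and assumes the `amsmath` package for the `align*` environment.

```latex
\begin{align*}
\text{sum of the two inequalities:}\quad
  & (1 - x_1) + x_1 + 2(1 - x_2) + 2x_3 + x_4 + (1 - x_5) \ge 2, \\
\text{since } (1 - x_1) + x_1 = 1:\quad
  & 2(1 - x_2) + 2x_3 + x_4 + (1 - x_5) \ge 1, \\
\text{add } x_4 \ge 0 \text{ and } (1 - x_5) \ge 0:\quad
  & 2(1 - x_2) + 2x_3 + 2x_4 + 2(1 - x_5) \ge 1, \\
\text{divide by } 2:\quad
  & (1 - x_2) + x_3 + x_4 + (1 - x_5) \ge \tfrac{1}{2}, \\
\text{round up, } \lceil 1/2 \rceil = 1:\quad
  & (1 - x_2) + x_3 + x_4 + (1 - x_5) \ge 1.
\end{align*}
```

The index 5 lies in exactly one of $A_1$, $A_2$ and the index 4 in exactly one of $B_1$, $B_2$, which is why exactly these two bound inequalities are added before dividing.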
Comparing Definitions 2.6 and 2.8 in light of Theorem 2.13, we conclude that
the consensus procedure can be interpreted as a special type of cutting-plane proce-
dure (Cook, Coullard, and Turán [210]). Note in particular that, for an inconsistent
equation φ = 0, the sequence of consensus steps required to derive the empty
conjunction must be at least as long as the number of cuts required to derive the
inequality 0 ≥ 1. This observation raises hope that a cutting-plane approach to the
solution of Boolean equations may be practically efficient (see also Section 2.10.1).
Hooker [500] attacked IP(φ) by a cutting-plane procedure based on Theorem
2.13. In the simplest approach, the procedure Preprocess-LP described earlier in
this section is augmented by the following step:
(d) Try to derive one or more consensus cuts violated by X∗ from the inequalities
in IP(φ). If such cuts are found, then add them to IP(φ) and go back to step (a).
Finding violated consensus cuts can in principle be implemented by sequentially
considering all pairs of inequalities in IP(φ) and checking the correspond-
ing consensus cuts. This inefficient approach, however, can be accelerated
in various ways. We refer to Chandru and Hooker [184] for details and for
additional theoretical developments, and to Chai and Kuehlmann [177] or
Manquinho and Marques-Silva [666] for recent computational work along similar
lines.
Hooker [503] presents further results about the integer programming approach
to logic.

2.8.2 Nonlinear programming


Several attempts have been made to model and to solve Boolean equations as
nonlinear programming problems, either discrete or continuous.
One possible approach consists in minimizing the objective function
$\sum_{i=1}^{n} x_i (1 - x_i)$ subject to the constraints (2.23)–(2.24), which define the
continuous relaxation of IS(φ). Note that the optimal value of this problem is 0 if
and only if the equation φ = 0 is consistent. Kamath, Karmarkar, Ramakrishnan,
and Resende [545, 546], for instance, propose an interior-point algorithm to solve
a closely related model (they first perform the change of variables $y_i = 2x_i - 1$,
which replaces the 0-1 variables $x_1, x_2, \ldots, x_n$ by new variables $y_1, y_2, \ldots, y_n$ taking
values in $\{-1, +1\}$, and they maximize $\sum_{i=1}^{n} y_i^2$).
Another line of attack exploits the following observation:
Theorem 2.14. Consider the DNF
\[ \phi(x_1, x_2, \ldots, x_n) = \bigvee_{k=1}^{m} \Big( \bigwedge_{i \in A_k} x_i \bigwedge_{j \in B_k} \bar{x}_j \Big) \tag{2.30} \]
and the real-valued function
\[ f(x_1, x_2, \ldots, x_n) = \sum_{k=1}^{m} c_k \Big( \prod_{i \in A_k} x_i \prod_{j \in B_k} (1 - x_j) \Big), \tag{2.31} \]
where $c_1, c_2, \ldots, c_m$ are arbitrary positive coefficients. The following statements
are equivalent:

(a) The equation $\phi(x_1, x_2, \ldots, x_n) = 0$ is consistent.
(b) The minimum of $f(x_1, x_2, \ldots, x_n)$ over $\{0, 1\}^n$ is equal to zero.
(c) The minimum of $f(x_1, x_2, \ldots, x_n)$ over $[0, 1]^n$ is equal to zero.

Proof. The equivalence of statements (a) and (b) is obvious. Their equivalence
with statement (c) follows from the claim that $\min_{X \in [0,1]^n} f(X) = \min_{X \in \{0,1\}^n} f(X)$
(as observed by Rosenberg [789], this property actually holds for every multilinear
function $f$; see also Theorem 13.12 in Section 13.4.3). To see this, consider
an arbitrary point $X^* \in [0, 1]^n$ and assume that one of its components, say, $x_1^*$,
is not integral. The restriction of $f$ to $x_i = x_i^*$ for $i \ge 2$, namely, the function
$g(x_1) = f(x_1, x_2^*, \ldots, x_n^*)$, is affine. Hence, $g(x_1)$ attains its minimum at a 0-1 point
$\hat{x}_1$. This implies in particular that $f(\hat{x}_1, x_2^*, \ldots, x_n^*) \le f(x_1^*, x_2^*, \ldots, x_n^*)$. Continuing
in this way with any remaining fractional components, we eventually produce
a point $\hat{X} \in \{0, 1\}^n$ such that $f(\hat{X}) \le f(X^*)$, which proves the claim and the
theorem.
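The rounding argument of the proof translates directly into a short routine. The Python sketch below is an illustration under assumptions of its own: terms are sets of nonzero integers (+i for x_i, -i for its complement), `f` evaluates a function of the form (2.31), and `round_to_vertex` is a hypothetical name for the coordinate-by-coordinate rounding. Exact fractions are used so that the comparison is not blurred by floating-point error.

```python
from fractions import Fraction
from math import prod


def f(terms, coeffs, x):
    """Evaluate sum_k c_k * prod_{i in A_k} x_i * prod_{j in B_k} (1 - x_j)."""
    return sum(
        c * prod(x[l] if l > 0 else 1 - x[-l] for l in t)
        for c, t in zip(coeffs, terms)
    )


def round_to_vertex(terms, coeffs, x):
    """Round fractional components one at a time without increasing f."""
    x = dict(x)
    for i in list(x):
        if x[i] not in (0, 1):
            # f is affine in x_i, so its minimum over [0, 1] is attained at 0 or 1.
            at_zero = f(terms, coeffs, {**x, i: 0})
            at_one = f(terms, coeffs, {**x, i: 1})
            x[i] = 0 if at_zero <= at_one else 1
    return x


terms, coeffs = [{1, -2}, {2, 3}], [1, 2]
start = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(3, 4)}
vertex = round_to_vertex(terms, coeffs, start)
print(f(terms, coeffs, vertex) <= f(terms, coeffs, start))
```

Because each single-coordinate step cannot increase f, the final 0-1 point is at least as good as the starting point, exactly as in the proof.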

Any algorithm for nonlinear 0-1 programming can be used to optimize $f(X)$
over $B^n$ (see Chapter 13 and the survey [469]). Hammer, Rosenberg, and Rudeanu
[458, 460], for instance, have proposed a variable elimination algorithm (inspired
by Eliminate) for minimizing functions of the form (2.31) over $B^n$. A streamlined
version and an efficient implementation of this algorithm are described by
Crama, Hansen, and Jaumard [235], who also observe that this algorithm is applicable
to the solution of Boolean equations. The algorithm described in [235] relies
on numerical bounding procedures to control (to a certain extent) the combinatorial
explosion inherent to elimination procedures (see Section 2.6).
The coefficients ck are arbitrary in (2.31), and the performance of any opti-
mization algorithm based on Theorem 2.14 may be influenced by the choice of
these coefficients. Wah and Shang [894] propose a discrete Lagrangian algorithm
for minimizing (2.31), which can be viewed as starting with ck = 1 for all k and
dynamically adapting these values.
Recently, several authors have experimented with semidefinite programming
reformulations of Boolean equations based on extensions of Theorem 2.14; see, for
instance, Anjos [23, 24] and de Klerk, Warners, and van Maaren [266]. Gu [417]
combines various continuous global optimization algorithms with backtracking
techniques to compute the minimum of (2.31), or of closely related functions,
over $[0, 1]^n$. Other nonlinear programming approaches to the solution of Boolean
equations, including Lagrangian techniques and heuristics, are surveyed by Gu,
Purdom, Franco, and Wah [418].

2.8.3 Local search heuristics


In recent years, several groups of researchers have experimented with heuristics,
or incomplete methods, which are not guaranteed to solve the given equation
φ(X) = 0, but which do so (experimentally) with high probability.
As a matter of fact, heuristic methods for equation solving have been used
for a long time in the artificial intelligence literature (see [186, 533, 627]). We
have already mentioned, for instance, that unit consensus is sometimes viewed
as providing such an incomplete method. Linear consensus (see Exercise 9 in
Section 2.12) is another example. Both unit consensus and linear consensus may
prove either consistency or inconsistency but may sometimes terminate without
any conclusion.
By contrast, a more recent trend of research has turned to heuristics which
are unable to prove inconsistency, and which simply concentrate on the quest for
solutions of the equation. These approaches are typically based on Theorem 2.14
and attempt to minimize the pseudo-Boolean function
\[ f(X) = \sum_{k=1}^{m} \Big( \prod_{i \in A_k} x_i \prod_{j \in B_k} (1 - x_j) \Big) \tag{2.32} \]
by a descent algorithm enhanced with some local search ingredients. Pioneering
work along these lines includes the work of Gu [415, 416]; Selman, Levesque, and
Mitchell [820]; Selman, Kautz, and Cohen [818, 819] (a very similar scheme was
implemented by Hansen and Jaumard [468] for solving the maximum satisfiability
problem). The GSAT algorithm in [820], for instance, starts with a random point
$X^* \in B^n$ and repeats a number of times the following step: If $f(X^*) \ne 0$, then
switch that component $i$ (namely, replace $x_i^*$ by $\bar{x}_i^*$) which results in the largest
decrease of $f(X)$. The decrease may be negative or 0. If no solution of φ(X) = 0
is found after a predetermined number of switches, then the process is restarted
from scratch and, after a number of restarts, the equation is declared (perhaps
wrongly) inconsistent. GSAT was found to perform surprisingly well on a variety
of experimental benchmark problems. It can be improved even further, however,
if the variable to be switched is picked more carefully. Algorithms in the WalkSAT
family [818, 819], for instance, select a term randomly among all terms of (2.32)
that are not canceled by the current assignment $X^*$, and then switch a variable
within that term, either at random or greedily. Variations on this theme have been
explored by several researchers.
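The GSAT step described above can be sketched in a few lines of Python. This is a simplified illustration, not the implementation of [820]: the term encoding (sets of nonzero integers, +i for x_i and -i for its complement), the parameter names `max_flips`, `max_tries`, and `seed`, and the random tie-breaking rule are all assumptions.

```python
import random


def f(terms, x):
    """Number of terms equal to 1 at the 0-1 point x, i.e. the function (2.32)."""
    return sum(all(x[abs(l)] == (l > 0) for l in t) for t in terms)


def gsat(terms, n, max_flips=50, max_tries=10, seed=0):
    """Return a solution of phi = 0 as a dict, or None if none was found."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        # Restart from a random point of {0,1}^n.
        x = {i: rng.randint(0, 1) for i in range(1, n + 1)}
        for _ in range(max_flips):
            if f(terms, x) == 0:
                return x
            # Value of f after switching each single component.
            score = {i: f(terms, {**x, i: 1 - x[i]}) for i in x}
            best = min(score.values())
            # The best switch is made even if it does not decrease f.
            i = rng.choice([j for j in x if score[j] == best])
            x[i] = 1 - x[i]
        if f(terms, x) == 0:
            return x
    return None  # no solution found; the equation may still be consistent


print(gsat([{1}, {2}], 2))
```

A return value of `None` is not a proof of inconsistency, in line with the remark that the equation may be declared inconsistent wrongly.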
Gu et al. [418] and Hoos and Stützle [507, 508] provide a wealth of details
about heuristic approaches to SAT and about their practical performance. Finally,
we note that local search algorithms have also been proposed as exact solution
methods for DNF equations, as in Dantsin et al. [255].

2.9 Recent trends and algorithmic performance


Over the last 15 years, there has been an unprecedented flurry of algorithmic devel-
opments around the solution of Boolean equations. These developments came in
fast, successive waves, with each wave bringing new computational breakthroughs
and new insights into what the “ultimate” solution method might eventually look
like.
In spite of its simplicity, the basic branching scheme enhanced by some
additional features (such as a smart branching rule or a tight relaxation) has
repeatedly proved to provide one of the most effective ways of solving DNF
Boolean equations. State-of-the-art implementations are described in several of
the references cited earlier (see also Hoos and Stützle [506] and the Web site
http://www.satlive.org). It should be noted, however, that some of
the most recent implementations depart in various ways from the basic scheme
described in Section 2.5.
For instance, in their Relsat algorithm, Bayardo and Schrag [58] have incor-
porated look-back strategies inspired from the constraint satisfaction literature:
Relsat is no longer restricted to performing a depth-first traversal of the search
tree, but is allowed to backtrack in more intelligent ways. For this purpose, the
authors, at every node of the search tree, rely on information which is derived from
past branchings by implicitly applying consensus operations on carefully selected
pairs of terms. Thus, their approach provides an interesting, and extremely effective
link between branching-based and consensus-based techniques.
In another paper, Gomes et al. [403] argued convincingly that the performance
of branching procedures can be further enhanced if randomized branching is used
in conjunction with rapid randomized restarts (RRR). In RRR, if the branching
procedure does not stop after a small number of backtracks, then the run is ter-
minated and restarted from the root (since the branching rule is randomized, two
successive runs of the procedure usually behave differently). In particular, the
authors show that RRR further improves the performance of efficient algorithms
like Relsat ([58]) and Satz ([613]).
Thus, in conclusion, today’s most successful methods for the solution of
Boolean equations are a mixture of a broad variety of ingredients. They are
often elaborations of branching methods à la Davis-Putnam, augmented by smart
branching rules spiced with a subtle touch of randomization, and they may rely on
look-ahead or look-back techniques based on the solution of easy subproblems, on
local search optimization or on partial consensus derivations. The highly efficient
algorithms implemented by Bayardo and Schrag [58], Dubois and Dequen [283],
Eén and Sörensson [289], Goldberg and Novikov [392], Gomes et al. [403], Hoos
and Stützle [507, 508], Marques-Silva and Sakallah [670] or Moskewicz et al. [693]
are good examples of these trends. The developments concerning the fast solution
of Boolean equations have certainly not come to a halt yet and, in years to come,
we should still witness much progress in the solution of this venerable problem.

2.10 More on the complexity of Boolean equations


We now briefly return to some theoretical complexity issues. We refer the reader
to the surveys [209, 344, 418, 881, 882] and to the book [571] for additional details
and references.

2.10.1 Complexity of equation-solving procedures


Let us first compare the relative complexity of the solution procedures described
in previous sections. Rather than viewing these procedures as precisely defined
algorithms, we look at them as broad algorithmic frameworks, or proof systems.
For instance, we do not want to specify how the next branching variable or how the
next consensus pair is selected. Moreover, we only consider the simplest versions
of the procedures, without any fancy preprocessing or additional heuristics.
We focus on the number of computational steps required to prove the inconsis-
tency of a DNF equation φ(x1 , x2 , . . . , xn ) = 0 (this is in a sense the more difficult
half of the problem, since proving consistency only requires us to exhibit a solu-
tion). Loosely speaking, we say that algorithm A is stronger than algorithm B if,
for some implementation of A, the number of steps required by A for proving the
inconsistency of φ(x1 , x2 , . . . , xn ) = 0 is no larger than the number of steps required
by any implementation of B (see Urquhart [882] for a more rigorous statement of
this definition).
Theorem 2.15. Cutting-plane procedures are stronger than consensus pro-
cedures, which are stronger than both branching and variable elimination
procedures.
Proof. The relative strength of consensus and cutting-plane procedures was exam-
ined in Section 2.8.1 (see the comments following Theorems 2.12 and 2.13). We
noted at the end of Section 2.7 that consensus is more powerful than variable elim-
ination, since eliminating variable xj can be viewed as performing all possible
consensus steps on xj , for j = 1, 2, . . . , n. Thus, it only remains to establish that
the consensus procedure is stronger than branching.
More precisely, we want to prove that, if a branching tree contains β nodes
and eventually demonstrates the inconsistency of φ = 0, then there is a consensus
derivation of the constant 1 in β steps. The proof is by induction on the number of
variables. Suppose that

\[ \phi = \bar{x}_n \phi_0 \lor x_n \phi_1 \lor \phi_2, \tag{2.33} \]

where φ0 , φ1 and φ2 are DNFs which do not involve xn , and suppose that the
first branching takes place on xn . Two subtrees are created, corresponding to the
equations φ0 ∨ φ2 = 0 and φ1 ∨ φ2 = 0. Say these trees have sizes β0 and β1 ,
respectively, where β0 + β1 = β − 1. Since both equations are inconsistent, the
constant 1 can be derived by consensus from each of them in, at most, β0 and β1
steps, respectively. Now, apply the same consensus steps to the terms of φ (note
that each term of $\phi_0 \lor \phi_2$ or $\phi_1 \lor \phi_2$ corresponds to a term of φ). Either these
consensus steps yield the constant 1, or they must respectively yield the terms $\bar{x}_n$ and
$x_n$. Then, one more consensus step produces the constant 1, and the total length of
this derivation is at most β.

Since solving Boolean equations is NP-hard, one may expect any solution
procedure to take an exponential number of steps on some classes of instances.
Identifying bad instances for any particular method, however, is not an easy task.
The so-called pigeonhole formulae have played an interesting role in this respect.
These formulae express that it is impossible to assign n + 1 pigeons to n holes
without squeezing two pigeons into a same hole. In Boolean terms, this rather
obvious fact of life translates into the inconsistency of the DNF equation
\[ \bigvee_{i=1}^{n+1} \bigwedge_{k=1}^{n} \bar{x}_{ik} \;\lor\; \bigvee_{i=1}^{n} \bigvee_{j=i+1}^{n+1} \bigvee_{k=1}^{n} x_{ik} x_{jk} = 0, \tag{2.34} \]
where variable $x_{ik}$ takes value 1 if the $i$-th pigeon is assigned to the $k$-th hole.
In a famous breakthrough result, Haken [433] showed that any consensus proof
of inconsistency has exponential length for the pigeonhole formulae. Other hard
examples for consensus (and hence, for branching and variable elimination) were
later provided by Urquhart [880] (see also Section 2.10.2).
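For small n, the pigeonhole equation (2.34) can be generated and checked exhaustively. The Python sketch below is an illustration only: the numbering of the variable x_ik as the integer (i - 1) * n + k, the encoding of terms as frozensets of signed integers, and the function names are assumptions, and the brute-force test is of course exponential in the number of variables.

```python
from itertools import combinations, product


def pigeonhole_terms(n):
    """Terms of the DNF in (2.34) for n + 1 pigeons and n holes."""
    def var(i, k):
        return (i - 1) * n + k  # assumed numbering of x_ik

    # Pigeon i is assigned to no hole.
    terms = [frozenset(-var(i, k) for k in range(1, n + 1)) for i in range(1, n + 2)]
    # Pigeons i < j share hole k.
    terms += [
        frozenset({var(i, k), var(j, k)})
        for i, j in combinations(range(1, n + 2), 2)
        for k in range(1, n + 1)
    ]
    return terms


def is_consistent(terms, num_vars):
    """Brute-force test: does some 0-1 point make every term equal to 0?"""
    for bits in product((0, 1), repeat=num_vars):
        x = dict(enumerate(bits, start=1))
        if not any(all(x[abs(l)] == (l > 0) for l in t) for t in terms):
            return True
    return False


terms = pigeonhole_terms(2)
print(len(terms), is_consistent(terms, 6))
```

Dropping a single "pigeon i has no hole" term makes the equation consistent, which shows that every pigeon constraint is needed for the inconsistency.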
It can be shown, however, that cutting-plane derivations of length $O(n^3)$ are sufficient
to prove the inconsistency of (2.34) (see [210]). Exponential lower bounds
for cutting-plane proofs are provided by Pudlák [761]. Let us also mention that an
extended version of consensus has been introduced by Tseitin [872], and is known
to be at least as strong as cutting-plane proofs [210]. Interestingly, no exponential
lower bound has been established for this extended consensus algorithm. We refer
to Urquhart [882] for a discussion of the complexity of other proof systems.
A number of authors have examined upper bounds on the number of steps
required to prove the inconsistency of a DNF equation $\phi(x_1, x_2, \ldots, x_n) = 0$.
Branching procedures trivially require $O(2^n)$ steps. Monien and Speckenmeyer
[690] have improved this bound by proving that a variant of the branching
procedure solves DNF equations of degree $k$ in at most $O(\alpha_k^n)$ steps, where
$\alpha_k$ is the largest root of the equation
\[ x^k = 2x^{k-1} - 1, \]
for $k = 1, 2, \ldots, n$. One computes: $\alpha_3 = 1.618$, $\alpha_4 = 1.839$, $\alpha_5 = 1.928$, and so on.
Note that $\alpha_k < 2$ for all $k$, but that $\alpha_k$ quickly approaches 2 as $k$ goes to infinity. It
is an open question whether DNF equations in $n$ variables can be solved in $O(\alpha^n)$
steps for some constant $\alpha < 2$.
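The constants quoted above can be checked numerically. The following is a minimal sketch, not taken from the text: the function name `alpha` and the choice of bisection are illustrative assumptions. It locates the largest real root of the polynomial equation above on the interval between the minimizer 2(k-1)/k of the polynomial and the point 2.

```python
def alpha(k: int, tol: float = 1e-12) -> float:
    """Largest real root of x**k = 2*x**(k-1) - 1 (illustrative helper, meant for k >= 3)."""
    g = lambda x: x**k - 2 * x**(k - 1) + 1
    # g attains its minimum on x > 0 at x = 2(k-1)/k, where g <= 0, and g(2) = 1 > 0,
    # so the largest root lies in [2(k-1)/k, 2] and bisection applies.
    lo, hi = 2 * (k - 1) / k, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for k in (3, 4, 5):
    print(k, round(alpha(k), 3))
```

Under these assumptions the printed values agree with the three constants given in the text to three decimals.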
The above bounds have been subsequently improved by several authors; see, for
instance, Kullmann [588]; Schiermeyer [808]; Paturi, Pudlák, Saks, and Zane [731];
Dantsin et al. [255]. In particular, the algorithm in [255] requires $(2 - 2/(k+1))^n$
steps for equations of degree $k$ and $O(1.481^n)$ steps for cubic DNF equations.
Van Gelder [885] described an algorithm requiring at most $O(1.093^{|\phi|})$ steps,
where $|\phi|$ is the input length of the DNF (his analysis yields a bound of
$O(1.189^{|\phi|})$ for arbitrary, non-DNF equations). Hirsch [492] strengthened the
bound to $O(1.074^{|\phi|})$ and to $O(1.239^m)$, where $m$ is the number of terms of the
DNF. Yamamoto [930] slightly improved the latter bound to $O(1.234^m)$.
Crama, Hansen, and Jaumard [235] proved that a variable elimination algorithm
for nonlinear 0-1 programming runs in time $O(n\, 2^{tw(\phi)})$, where $tw(\phi)$ is the
so-called tree-width of a graph associated to $\phi$; their arguments are easily adapted to
show that the same bound applies to the Boolean procedure Eliminate. We refer
to their paper for details.

2.10.2 Random equations


A large body of literature has been devoted to the investigation of random Boolean
expressions and random Boolean equations. This approach allows, for instance, a
better understanding of the distinctive features of hard versus easy equations, the
analysis of the behavior of algorithms over various distributions of instances, or
nonconstructive proofs of the existence of certain types of expressions.
We limit our discussion to one particular distribution of random expressions
(see, e.g., [209, 418, 763] for other probabilistic models).
Definition 2.9. Let $n$, $m$ and $k$ be positive integers. A random $(n, m, k)$-DNF is
a DNF $\phi(x_1, x_2, \ldots, x_n) = \bigvee_{j=1}^{m} C_j$ whose terms $C_1, C_2, \ldots, C_m$ are drawn
independently and uniformly from among all elementary conjunctions of degree $k$
on $x_1, x_2, \ldots, x_n$. A random $(n, m, k)$-equation is an equation $\phi = 0$, where $\phi$ is a
random $(n, m, k)$-DNF.
Note that all terms of a random (n, m, k)-DNF have degree exactly k, and that
the definition allows for repeated terms but not for terms which are identically 0.
Since adding terms increases the probability of introducing inconsistencies, one
can expect “long” equations, that is, equations where m is large relative to n, to be
inconsistent with high probability, and “short” equations to be consistent with high
probability. More precise versions of these statements can actually be established.
We start with an easy observation due to Franco and Paull [345].
Theorem 2.16. Let $\phi = 0$ be a random $(n, m, k)$-equation, where $m = cn$ for
some constant $c$. If $c > -1/\log_2(1 - 2^{-k})$, then the equation is inconsistent with
probability tending to 1 as $n$ goes to infinity.

Proof. Let the random equation be $\phi(X) = \bigvee_{j=1}^{m} C_j = 0$, and consider an arbitrary
point $X^*$ in $B^n$. For $j = 1, 2, \ldots, m$, the probability that $C_j(X^*) = 0$ is $1 - 2^{-k}$, and
hence the probability that $\phi(X^*) = 0$ is $(1 - 2^{-k})^m$. Therefore, the expected number
of solutions of the equation is $2^n (1 - 2^{-k})^m$. If $m = cn$ and $c > -1/\log_2(1 - 2^{-k})$,
then this expected number goes to 0 as $n$ goes to infinity. Since the probability that
the equation is consistent is at most the expected number of its solutions, this proves the
statement.

For instance, by setting $k = 2$ or $k = 3$ in the theorem, we conclude that almost
all quadratic equations with more than $2.5\,n$ terms, and almost all cubic equations
with more than $5.191\,n$ terms, are inconsistent.
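The numerical thresholds follow directly from the bound of Theorem 2.16. The sketch below simply evaluates that bound and the base-2 logarithm of the expected number of solutions; the function names are illustrative assumptions, not notation from the text.

```python
import math


def inconsistency_threshold(k: int) -> float:
    """Value -1 / log2(1 - 2^-k) from Theorem 2.16 (hypothetical helper name)."""
    return -1.0 / math.log2(1.0 - 2.0 ** (-k))


def log2_expected_solutions(n: int, c: float, k: int) -> float:
    """log2 of 2^n (1 - 2^-k)^(c n), the expected number of solutions used in the proof."""
    return n * (1.0 + c * math.log2(1.0 - 2.0 ** (-k)))


for k in (2, 3, 4):
    print(k, round(inconsistency_threshold(k), 3), round(2 ** k * math.log(2), 3))
```

For each k the first printed number is the threshold and the second is 2^k ln 2, which the threshold approaches from below as k grows. For k = 2 the threshold is slightly below the value 2.5 quoted in the text, so the statement for quadratic equations holds with some slack.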
A simple counting argument shows that very short equations are almost always
consistent.
 
Theorem 2.17. If $\sum_{j=1}^{m} 2^{-|C_j|} < 1$, then the DNF equation $\phi = \bigvee_{j=1}^{m} C_j = 0$ is
consistent. In particular, every $(n, m, k)$-equation with $m < 2^k$ is consistent.

Proof. Each term $C_j(X)$ takes value 1 in exactly $2^{n-|C_j|}$ points of $B^n$, for
$j = 1, 2, \ldots, m$. So, $\phi(X)$ takes value 1 in at most $\sum_{j=1}^{m} 2^{n-|C_j|}$ points of $B^n$.
If $\sum_{j=1}^{m} 2^{-|C_j|} < 1$, then $\phi(X)$ takes value 1 in fewer than $2^n$ points, which implies
that $\phi(X) = 0$ is consistent. The second statement is an immediate corollary of the
first one.

Of course, there is nothing really probabilistic about the previous result. In order
to improve the bound on m, however, several researchers have analyzed algorithms
which quickly find a solution of random equations with high probability. Following
previous work by Chao and Franco [187], Chvátal and Reed [202] were able to
show that, when $k \ge 2$ and $c < 2^k/(4k)$, random $(n, m, k)$-equations with $m = cn$
terms are consistent with probability approaching 1 as $n$ goes to infinity (this is to
be contrasted with the lower bound on $c$ in Theorem 2.16, which grows roughly
like $2^k \ln 2$).
These results motivate the following conjecture (see [209, 349, 418]).
Threshold conjecture. For each k ≥ 2, there exists a constant c∗ such that random
(n, cn, k)-equations are consistent with probability approaching 1 as n goes to
infinity when c < c∗ and are inconsistent with probability approaching 1 as n goes
to infinity when c > c∗ .
Despite its appeal, the considerable experimental evidence for its validity, and
the existence of similar zero-one laws for other combinatorial structures, the thresh-
old conjecture has only been established when k = 2. In this case, Chvátal and Reed
[202] and Goerdt [390] were able to show that the conjecture holds for the threshold
value c∗ = 1. This result was subsequently sharpened by several researchers; see in
particular the very tight results by Bollobás, Borgs, Chayes, Kim, and Wilson [101].
For k = 3, experiments indicate the existence of a threshold around the value
c∗ = 4.2, but at the time of this writing, the available bounds only imply that, if c∗
exists, then 3.26 < c∗ < 4.506 (see Achlioptas and Sorkin [4]; Dubois, Boufkhad,
and Mandler [282]; Janson, Stamatiou, and Vamvakari [525], etc.). In a remarkable
breakthrough, however, Friedgut [348] proved that a weak form of the threshold
conjecture holds for all k when c∗ is replaced by a function depending on n only.
Achlioptas and Peres [3] established that the conjecture holds asymptotically when
$k \to +\infty$ with $c^* = 2^k \log 2 - O(k)$; see also Frieze and Wormald [350].
From an empirical point of view, it has been repeatedly observed that very
long and very short equations are easy for most algorithms, whereas hard nuts
occur in the so-called phase transition region, near the crossover point at which
about half the instances are (in)consistent. These observations clearly have impor-
tant consequences for the design of experiments aimed at assessing the quality of
equation solvers. They have progressively led researchers to focus their computa-
tional experiments on the solution of special subclasses of random equations, or
on structured equations derived from the encoding of hard combinatorial problems
(see, e.g., [57, 239, 505, 687, etc.]).
The concept of random equations has also been used to analyze the efficiency
of solution algorithms. In a far-reaching extension of the results of Haken [433]
and Urquhart [880] (see Section 2.10.1), Chvátal and Szemerédi [203] proved that
for all fixed integers $c$ and $k \ge 3$, there exists $\varepsilon > 0$ such that, for large $n$, almost no
random $(n, cn, k)$-equations have consensus proofs of inconsistency of length less
than $(1 + \varepsilon)^n$. In view of Theorems 2.15 and 2.16, this result actually implies that
almost all cubic equations with more than 5.191 n terms are hard for branching,
variable elimination, and consensus algorithms.
For more information on the analysis of random equations, we refer to an
extensive survey by Franco [344].

2.10.3 Constraint satisfaction problems and Schaefer’s theorem


A constraint satisfaction problem (CSP) is a class of Boolean equations which can
be formulated by imposing a finite number of constraints on Boolean variables.
If the 0-1 points that satisfy the i-th constraint are modeled as the set of solu-
tions of an “elementary” Boolean equation fi (X) = 0, for i = 1, 2, . . . , q, then the
corresponding CSP is simply the equation

\[ f_1(X) \vee f_2(X) \vee \ldots \vee f_q(X) = 0. \tag{2.35} \]

Interestingly, and rather unexpectedly, the complexity of CSP can be characterized
quite precisely: In an appropriate setting to be described later, it is possible to
classify every CSP of the form (2.35) as either “easy” or “hard,” depending only
on the nature of the individual constraints $f_i(X) = 0$, which are used as building
blocks of the problem.
This line of research was initiated in a seminal paper by Schaefer [807]
and pursued by several researchers after him; the book by Creignou, Khanna, and
Sudan [243] contains a detailed account of their results. In this section, we give
a precise statement of Schaefer’s theorem without going into the intricacies of its
proof.
Let us start with a formal definition of constraint satisfaction problems.
Definition 2.10. A (Boolean) constraint set is a finite set $\mathcal{F} = \{f_1, f_2, \ldots, f_r\}$ of Boolean functions,
where $f_i$ is defined on $B^{n_i}$, $n_i \ge 1$, and $f_i$ is not identically 1 ($i = 1, 2, \ldots, r$). In
this context, each of the functions $f_i$ is called a constraint.
If $X^i$ is an $n_i$-dimensional vector of Boolean variables, then the pair $(f_i, X^i)$ is
called an application of the constraint $f_i$ to $X^i$.
The constraint satisfaction problem associated to the constraint set $\mathcal{F}$, or
CSP($\mathcal{F}$), is the (infinite) collection of Boolean equations of the form

\[ f_{i_1}(X^{i_1}) \vee f_{i_2}(X^{i_2}) \vee \ldots \vee f_{i_q}(X^{i_q}) = 0, \tag{2.36} \]

where $f_{i_1}, f_{i_2}, \ldots, f_{i_q}$ are functions in the constraint set $\mathcal{F}$, and $X^{i_1}, X^{i_2}, \ldots, X^{i_q}$
are vectors of Boolean variables of appropriate lengths. So, an instance of
CSP($\mathcal{F}$) is defined by the list of applications $(f_{i_j}, X^{i_j})$, $j = 1, 2, \ldots, q$.
Let us give a few examples of constraint satisfaction problems.
Example 2.7. Consider the constraint set $\mathcal{F}_{QUAD} = \{f_1, f_2, f_3, f_4, f_5\}$ in which
the constraints are represented by the following expressions:

\[ f_1(x) = x, \quad f_2(x) = \bar{x}, \quad f_3(x_1, x_2) = x_1 x_2, \quad f_4(x_1, x_2) = x_1 \bar{x}_2, \quad f_5(x_1, x_2) = \bar{x}_1 \bar{x}_2. \]

An instance of CSP($\mathcal{F}_{QUAD}$) is for example the equation:

\[ f_3(x_1, x_2) \vee f_3(x_2, x_3) \vee f_4(x_1, x_3) \vee f_4(x_1, x_4) \vee f_4(x_4, x_1) \vee f_4(x_4, x_3) \vee f_5(x_2, x_3) = 0, \]

or, equivalently,

\[ x_1 x_2 \vee x_2 x_3 \vee x_1 \bar{x}_3 \vee x_1 \bar{x}_4 \vee \bar{x}_1 x_4 \vee \bar{x}_3 x_4 \vee \bar{x}_2 \bar{x}_3 = 0. \]

Clearly, CSP($\mathcal{F}_{QUAD}$) is exactly the class of all (nontrivial) quadratic DNF
equations. As we mentioned in Section 2.2, such equations can be solved in
polynomial time (see also Chapter 5).
Note that an immediate generalization of this example would show that, for
every fixed integer k, the class of DNF equations of degree k can be represented
as a constraint satisfaction problem. This problem is NP-complete for all k > 2,
as stated by Cook’s theorem.
Consider next the set $\mathcal{F}_{3NAE} = \{g\}$ in which $g$ is represented by the DNF

\[ g(x_1, x_2, x_3) = x_1 x_2 x_3 \vee \bar{x}_1 \bar{x}_2 \bar{x}_3. \]

Note that in any point $(x_1^*, x_2^*, x_3^*) \in B^3$, $g(x_1^*, x_2^*, x_3^*) = 1$ if and only if $x_1^* = x_2^* = x_3^*$.
Therefore, we call $g$ the “cubic not-all-equal” constraint, a name which is in turn
reflected in the notation $\mathcal{F}_{3NAE}$. The constraint satisfaction problem CSP($\mathcal{F}_{3NAE}$)
is NP-complete (see the exercises at the end of the chapter).

So, depending on the class F, the problem CSP(F) may be either easy or hard,
as illustrated by the classes introduced in the previous example. Schaefer’s theorem
very accurately separates those classes for which CSP(F) is polynomially solvable
from those for which it is NP-complete. Before we can state this result, however,
we need a few more definitions.
Extending Definitions 1.12 and 1.30 in Chapter 1, we say that a Boolean function
is quadratic if it can be represented by a DNF in which each term contains at most
two variables, and that the function is Horn if it can be represented by a DNF in
which each term contains at most one complemented variable. Similarly, we say
that a function is co-Horn if it can be represented by a DNF in which each term
contains at most one noncomplemented variable. It will follow from the results in
Chapter 6 that CSP(F) is polynomially solvable when all the constraints in F are
Horn, or when they are all co-Horn.
Finally, we define a Boolean function f on B n to be affine if the set of false
points of f is exactly the set of solutions of a system of linear equations over
GF(2), that is, if f can be represented by an expression of the form
 
\[ f(x_1, x_2, \ldots, x_n) = \bigvee_{A \in E_0} \Big( \bigoplus_{i \in A} x_i \Big) \vee \bigvee_{A \in E_1} \Big( 1 \oplus \bigoplus_{i \in A} x_i \Big), \tag{2.37} \]

where $E_0$, $E_1$ are families of subsets of $\{1, 2, \ldots, n\}$ (compare with (1.16)). We now
show that systems of linear equations over GF(2) can be solved by an elimination
procedure closely resembling the classical Gaussian elimination process.

Theorem 2.18. Systems of linear equations over GF(2) can be solved in
polynomial time.

Proof. Consider the system f (X) = 0, where f has the form (2.37), and assume
that the first term of (2.37) defines the equation

a ⊕ x1 ⊕ x2 ⊕ . . . ⊕ xn = 0,

where a ∈ {0, 1}. This equation can be rewritten as

xn = a ⊕ x1 ⊕ x2 ⊕ . . . ⊕ xn−1 ,

which can be used to eliminate variable xn from all subsequent equations.


We leave it as an exercise to work out the remaining details of this elimina-
tion procedure and to verify that it can be implemented to run in polynomial time. 
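One possible way of working out these details is sketched below. It is a standard Gauss-Jordan elimination over GF(2) rather than the authors' own write-up; the function name `solve_gf2` and the input format (each equation given as a 0/1 coefficient list together with its right-hand side, so that the equation displayed in the proof has right-hand side a) are assumptions made for illustration.

```python
def solve_gf2(equations, n):
    """Solve a system over GF(2); return one solution as a list of bits, or None.

    equations: list of (coeffs, rhs), meaning XOR of coeffs[i]*x[i] equals rhs.
    """
    rows = [(list(c), b) for c, b in equations]
    pivots = []          # pivots[r] is the pivot column of row r
    rank = 0
    for col in range(n):
        # Look for an equation (not yet used as a pivot) containing x[col].
        piv = next((r for r in range(rank, len(rows)) if rows[r][0][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        pc, pb = rows[rank]
        # Eliminate x[col] from every other equation by adding the pivot row.
        for r in range(len(rows)):
            if r != rank and rows[r][0][col]:
                c, b = rows[r]
                rows[r] = ([u ^ v for u, v in zip(c, pc)], b ^ pb)
        pivots.append(col)
        rank += 1
    # Remaining rows have all-zero coefficients: they read 0 = rhs.
    if any(b for _, b in rows[rank:]):
        return None
    x = [0] * n          # free variables are set to 0
    for r, col in enumerate(pivots):
        x[col] = rows[r][1]
    return x


# x1 + x2 = 1, x2 + x3 = 1, x1 + x3 = 0 (consistent); changing the last rhs to 1 is not.
system = [([1, 1, 0], 1), ([0, 1, 1], 1), ([1, 0, 1], 0)]
print(solve_gf2(system, 3))
print(solve_gf2(system[:2] + [([1, 0, 1], 1)], 3))
```

Each of the at most n pivot steps touches every row once, so the sketch performs a number of bit operations polynomial in the number of equations and variables.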

We are finally ready to present Schaefer’s result [807].

Theorem 2.19. If F satisfies either one of the conditions (1)–(6) hereunder, then
CSP(F) is polynomially solvable; otherwise, it is NP-complete.

(1) For every function f ∈ F, f (0, 0, . . . , 0) = 0.
(2) For every function f ∈ F, f (1, 1, . . . , 1) = 0.
(3) Every function f ∈ F is quadratic.
(4) Every function f ∈ F is Horn.
(5) Every function f ∈ F is co-Horn.
(6) Every function f ∈ F is affine.

Proof. The first half of the theorem is easy: CSP(F) is trivial under conditions (1)
and (2), since the point (0, 0, . . . , 0), respectively (1, 1, . . . , 1), then solves every
instance, and we have already discussed conditions (3)–(6). The NP-completeness
statement, therefore, is the hard nut to crack: we refer to Schaefer [807] or to
Creignou, Khanna, and Sudan [243] for a complete proof.
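For small constraint sets, the six conditions of Theorem 2.19 can be tested by brute force. The sketch below relies on the classical closure characterizations of the false sets, which are not stated in this section and are assumed here: a function is quadratic, Horn, co-Horn, or affine exactly when its set of false points is closed under componentwise majority, AND, OR, or three-term XOR, respectively. All function names are hypothetical.

```python
from itertools import product


def false_points(f, n):
    """Set of points of B^n where the constraint f takes value 0."""
    return {x for x in product((0, 1), repeat=n) if not f(*x)}


def closed_under(points, op, arity):
    """True if applying op componentwise to any `arity` points stays in the set."""
    return all(tuple(op(*bits) for bits in zip(*combo)) in points
               for combo in product(points, repeat=arity))


def schaefer_conditions(constraints):
    """constraints: list of (f, n) pairs; tells which of conditions (1)-(6) hold."""
    sets = [(false_points(f, n), n) for f, n in constraints]
    maj = lambda a, b, c: int(a + b + c >= 2)
    return {
        1: all((0,) * n in s for s, n in sets),
        2: all((1,) * n in s for s, n in sets),
        3: all(closed_under(s, maj, 3) for s, _ in sets),                     # quadratic
        4: all(closed_under(s, lambda a, b: a & b, 2) for s, _ in sets),      # Horn
        5: all(closed_under(s, lambda a, b: a | b, 2) for s, _ in sets),      # co-Horn
        6: all(closed_under(s, lambda a, b, c: a ^ b ^ c, 3) for s, _ in sets),  # affine
    }


# The two constraint sets of Example 2.7, with complements written as 1 - x.
quad = [(lambda x: x, 1), (lambda x: 1 - x, 1), (lambda x, y: x & y, 2),
        (lambda x, y: x & (1 - y), 2), (lambda x, y: (1 - x) & (1 - y), 2)]
nae = [(lambda x, y, z: (x & y & z) | ((1 - x) & (1 - y) & (1 - z)), 3)]

print(schaefer_conditions(quad))
print(schaefer_conditions(nae))
```

On these two inputs the sketch reports that the quadratic set satisfies condition (3) only, and that the not-all-equal constraint satisfies none of the six conditions, in line with the complexity claims of Example 2.7.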

Theorem 2.19 underlines the special role played by quadratic, Horn, and affine
functions in Boolean theory. Chapter 5 and Chapter 6 contain a thorough discussion
of quadratic and Horn functions, respectively. Affine functions will not be further
handled in the book. The monograph [243] contains additional facts about these
functions, as well as several extensions and refinements of Theorem 2.19; see also
Creignou and Daudé [241, 242] for probabilistic extensions.
Finally, we note that Boros, Crama, Hammer and Saks [116] established another
theorem separating NP-hard from polynomially solvable instances of Boolean
Equation (see Section 6.10.2). Although the nature of their classification result
is very different from Schaefer’s classification, it also stresses the importance of
quadratic and Horn equations.

2.11 Generalizations of consistency testing


In this section, we return to some of the extensions of consistency testing which
we briefly introduced in Section 2.4.

2.11.1 Counting the number of solutions


Counting the number of solutions of a Boolean equation, or equivalently, the num-
ber of false points of a Boolean function, is an old problem with applications
in reliability theory (see Section 1.13.4), in game theory (see Section 1.13.3), in
artificial intelligence (see for instance [792]), etc. We have already observed in
Theorem 1.38 that this problem is #P-complete even for quadratic positive func-
tions. It generalizes in an obvious way the consistency question, and its solution
has actually been used by some authors to attack Boolean equations indirectly (see
[145, 522, etc.]).
Of course, counting the number of true points of a function on $B^n$ is equivalent
to counting the number of its false points, since the sum of these two numbers is
exactly $2^n$. Now, the set of true points of a DNF $\phi = \bigvee_{k=1}^{m} C_k$ is just $\bigcup_{k=1}^{m} T_k$,
where $T_k$ is the set of true points of the $k$-th term $C_k$, for $k = 1, 2, \ldots, m$. Hence,
counting the number of true points of a function expressed in DNF can also be
viewed as determining the size of a union of sets, a problem which is frequently
attacked by inclusion-exclusion techniques. These links are explicitly stated and
exploited by several authors, see [144, 145, 522, 551, 614, 630, 759, etc.].
As discussed in Section 1.6, another way of counting the number of true points
of a function f consists in producing an orthogonal DNF of f and in applying
Theorem 1.8. In view of the relationship between BDDs and orthogonal forms (cf.
Section 1.12.3), related approaches can also be cast in a branching framework; see
[48, 49, 90, 111, 280, 619, 759, etc.].

Recently, a number of specialized counting algorithms have been proposed in
[56, 257, 584, 802, etc.].

2.11.2 Generating all solutions


When the objective is to generate all solutions of a Boolean equation, we face
the additional difficulty that the number of solutions, and hence the length of
the output, can be exponential in the input size of the equation. Therefore, the
complexity of any algorithm solving this problem is most meaningfully analyzed
in terms of its input size and its output size (see Appendix B.8).
We first show that, in a sense, generating all solutions of an equation is not
much harder than testing its consistency.

Theorem 2.20. There is an algorithm which, given a Boolean expression $\phi$ on $n$
variables, produces all solutions of the equation $\phi = 0$ by solving $q + 1$ Boolean
equations of size at most $|\phi| + nq$, where $q$ is the number of solutions of $\phi = 0$.
If $t(L)$ is the complexity of solving a Boolean equation with input size at most $L$,
then the running time of this algorithm is polynomial in $|\phi|$, $q$ and $t(|\phi| + nq)$.

Proof. We describe an algorithm which performs $q + 1$ iterations and outputs a
new false point of $\phi$ at each of the first $q$ iterations. To describe a generic iteration,
assume that we have already produced $k \le q$ false points $X^1, X^2, \ldots, X^k$. For
$i = 1, 2, \ldots, k$, let $C_i$ be the unique elementary conjunction such that $C_i(X) = 1$ if
and only if $X = X^i$, and solve the equation

\[ \phi(X) \vee \bigvee_{i=1}^{k} C_i(X) = 0. \tag{2.38} \]

Clearly, $X^* \in B^n$ is a solution of (2.38) if and only if $X^*$ is a false point of
$\phi$ which differs from $X^1, X^2, \ldots, X^k$. Thus, if we find such a solution, we let
$X^{k+1} := X^*$ and proceed with the next iteration. Otherwise, we stop. Since each
iteration can be carried out in time $O(t(|\phi| + nq))$, the proof is complete.
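The iteration described in the proof can be sketched as follows. The solver is treated as a black box; the brute-force `brute_force_oracle` used here is only a stand-in chosen for illustration, and all names are hypothetical. The test equation is the five-term cubic equation used in Examples 2.2 and 2.8, whose two solutions are (1, 0, 0) and (1, 1, 0).

```python
from itertools import product


def brute_force_oracle(phi, n):
    """Stand-in for any equation solver: a point X with phi(X) = 0, or None."""
    for X in product((0, 1), repeat=n):
        if not phi(X):
            return X
    return None


def all_solutions(phi, n, solve=brute_force_oracle):
    """Generate all solutions of phi = 0 with q + 1 calls to `solve`."""
    found = []
    while True:
        # phi(X) v C_1(X) v ... v C_k(X), where C_i(X) = 1 exactly when X = X^i.
        blocked = lambda X, seen=tuple(found): phi(X) or X in seen
        X = solve(blocked, n)
        if X is None:
            return found
        found.append(X)


def phi3(x):
    """The DNF of Example 2.8, with each complement written as 1 - x."""
    x1, x2, x3 = x
    return (((1 - x1) & x2 & (1 - x3)) | (x1 & (1 - x2) & x3)
            | ((1 - x1) & (1 - x2)) | ((1 - x1) & x3) | (x2 & x3))


print(all_solutions(phi3, 3))
```

In an actual implementation the blocking conjunctions would be appended to the expression handed to the solver, which is what makes the size bound in the theorem meaningful; the closure used here only mimics that behavior.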

Note that we did not assume anything about the expression φ in Theorem 2.20,
and hence, this result can be used to generate all solutions of a general equation
of the form φ(X) = ψ(X). Approaches of the type described in the proof of
Theorem 2.20 have also been used in the machine learning literature; see, for
instance, Angluin [21].
Other approaches rely on ad hoc modifications of the equation-solving pro-
cedures described in previous sections in order to generate all solutions of the
given equation. For instance, straightforward extensions of the branching pro-
cedure Branch can be used to handle the problem. We describe here another
approach, based on an extension of the variable elimination technique and the
following simple observations.

Theorem 2.21. The one-variable equation $\phi(x) = a x \vee b \bar{x} = 0$, where $a$, $b$ are
Boolean constants, is consistent if and only if $ab = 0$, or, equivalently, if and only
if $b \le \bar{a}$. When this is the case, the solutions of the equation are the values of $x$
that satisfy $b \le x \le \bar{a}$, namely,
\[ x^{(0)} = b = \phi(0), \tag{2.39} \]
\[ x^{(1)} = \bar{a} = \bar{\phi}(1). \tag{2.40} \]
Proof. This can be checked directly.

Theorem 2.22. The point $(x_1^*, x_2^*, \ldots, x_n^*)$ is a solution of the equation
$\phi(x_1, x_2, \ldots, x_n) = 0$ if and only if $(x_1^*, \ldots, x_{n-1}^*)$ is a solution of the equation
\[ \phi(x_1, \ldots, x_{n-1}, 0)\, \phi(x_1, \ldots, x_{n-1}, 1) = 0, \]
and $x_n^*$ is a solution of the one-variable equation
\[ \phi(x_1^*, \ldots, x_{n-1}^*, x_n) = 0. \]
Proof. This is trivial.

Taken together, Theorems 2.21 and 2.22 provide a slight generalization
of Theorem 2.7, and they allow us to produce all solutions of the equation
$\phi(x_1, x_2, \ldots, x_n) = 0$ by the following recursive procedure: First, we successively
compute all expressions $\phi_n, \phi_{n-1}, \ldots, \phi_0$, as in the procedure Eliminate. If $\phi_0 = 1$,
then the original equation is inconsistent. Otherwise, for $j = 1, 2, \ldots, n$ and for
each solution $(x_1^*, \ldots, x_{j-1}^*)$ of the equation $\phi_{j-1}(x_1, \ldots, x_{j-1}) = 0$, we compute
the solutions of the one-variable equation $\phi_j(x_1^*, \ldots, x_{j-1}^*, x_j) = 0$, and we use
Theorem 2.22 to produce all solutions of the equation $\phi_j(x_1, \ldots, x_j) = 0$.
This procedure is reasonably efficient in the sense that, once the equation has
been found to be consistent, all its solutions are produced in quick succession
(exactly how efficient the procedure is depends on the size of the intermediate
expressions φj , j = 1, 2, . . . , n). For special classes of Boolean equations, however,
it may be possible to achieve a better performance; for instance, Feder [322]
describes a polynomial-delay algorithm to generate all solutions of a quadratic
DNF equation; see Section 5.7.1.
Finally, we observe that all solutions of the equation φ = 0 are immediately
available if a CNF expression of φ is at hand, because a CNF is equal to 0
exactly when (at least) one of its terms is 0. Of course, obtaining a CNF of φ
(or equivalently, dualizing φ) is generally quite difficult. We return to this problem
in Chapter 4.

2.11.3 Parametric solutions


Rather than explicitly generating all solutions of a Boolean equation, we may want
to obtain an implicit representation of these solutions.

Definition 2.11. A parametric solution of the Boolean equation $\phi(x_1, x_2, \ldots, x_n) = 0$
is a mapping $\sigma : B^m \to B^n$ with the property that, for all $(x_1^*, x_2^*, \ldots, x_n^*) \in B^n$,
$(x_1^*, x_2^*, \ldots, x_n^*)$ is a solution of the equation if and only if there exists
$(p_1, p_2, \ldots, p_m) \in B^m$ such that $\sigma(p_1, p_2, \ldots, p_m) = (x_1^*, x_2^*, \ldots, x_n^*)$. The
parametric solution $\sigma$ is called reproductive if $n = m$ and $\sigma(p_1, p_2, \ldots, p_n) =
(p_1, p_2, \ldots, p_n)$ whenever $(p_1, p_2, \ldots, p_n)$ is a solution of the equation.

In other words, a parametric solution is a surjective mapping $\sigma : B^m \to F$,
where $F$ is the set of false points of $\phi$, and a reproductive solution is the identity
on F . Parametric solutions of Boolean equations have been investigated for a very
long time (see Hammer and Rudeanu [460] or Rudeanu [795] for references).
Their connection with the concept of “Boolean unification” has been recently
examined, for instance, in [167, 672], where the authors point out their relevance for
manipulating hardware descriptions (e.g., for verifying and testing digital circuits).
A classical result due to Löwenheim [628, 629] allows the construction of a
parametric solution of the equation φ = 0 once a particular solution of the equation
is known (this is reminiscent of the solution of differential equations in calculus).

Theorem 2.23. Let $X^* = (x_1^*, x_2^*, \ldots, x_n^*)$ be a particular solution of the Boolean
equation $\phi(X) = 0$, and consider the functions $\sigma_i : B^n \to B$ defined by

\[ \sigma_i(p_1, p_2, \ldots, p_n) = x_i^*\, \phi(p_1, p_2, \ldots, p_n) \vee p_i\, \bar{\phi}(p_1, p_2, \ldots, p_n) \]

for $i = 1, 2, \ldots, n$. Then, $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n)$ is a reproductive parametric solution
of $\phi(X) = 0$.

Proof. Let $(p_1, p_2, \ldots, p_n) \in B^n$, and let $X = (x_1, x_2, \ldots, x_n) = \sigma(p_1, p_2, \ldots, p_n)$.
If $\phi(p_1, p_2, \ldots, p_n) = 1$, then $x_i = x_i^*$ for all $i$, so $X = X^*$ is a solution of the
equation. If $\phi(p_1, p_2, \ldots, p_n) = 0$, then $x_i = p_i$ for all $i$, so $\phi(x_1, x_2, \ldots, x_n) =
\phi(p_1, p_2, \ldots, p_n) = 0$. This implies that $\sigma$ is a reproductive parametric solution
which maps every true point to $X^*$.

Example 2.8. Let us return to the equation $\phi_3 = \bar{x}_1 x_2 \bar{x}_3 \vee x_1 \bar{x}_2 x_3 \vee \bar{x}_1 \bar{x}_2 \vee
\bar{x}_1 x_3 \vee x_2 x_3 = 0$ which was examined in Example 2.2. We found there that
$(x_1^*, x_2^*, x_3^*) = (1, 0, 0)$ was a solution of this equation. Using Theorem 2.23, we
obtain the parametric solution

\[ \sigma_1 = \phi_3(p_1, p_2, p_3) \vee p_1\, \bar{\phi}_3(p_1, p_2, p_3), \]
\[ \sigma_2 = p_2\, \bar{\phi}_3(p_1, p_2, p_3), \]
\[ \sigma_3 = p_3\, \bar{\phi}_3(p_1, p_2, p_3). \]

Some additional manipulations show that this parametric solution can be alternatively
represented as $(\sigma_1, \sigma_2, \sigma_3) = (1, p_1 p_2 \bar{p}_3, 0)$, and that it correctly describes
the two solutions of $\phi_3 = 0$, namely, the points (1, 0, 0) and (1, 1, 0).
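Löwenheim's construction is easy to check by enumeration on this example. The following is a minimal sketch under the assumption that Boolean functions are given as Python callables on 0/1 tuples; the names `lowenheim` and `phi3` are illustrative.

```python
from itertools import product


def lowenheim(phi, x_star):
    """Parametric solution of Theorem 2.23 built from a particular solution x_star."""
    def sigma(p):
        v = phi(p)                       # v = 1: fall back on x_star; v = 0: keep p
        return tuple((xs & v) | (pi & (1 - v)) for xs, pi in zip(x_star, p))
    return sigma


def phi3(x):
    """The DNF of Example 2.8, with each complement written as 1 - x."""
    x1, x2, x3 = x
    return (((1 - x1) & x2 & (1 - x3)) | (x1 & (1 - x2) & x3)
            | ((1 - x1) & (1 - x2)) | ((1 - x1) & x3) | (x2 & x3))


sigma = lowenheim(phi3, (1, 0, 0))
image = {sigma(p) for p in product((0, 1), repeat=3)}
print(sorted(image))
```

The image of the mapping consists of exactly the two solutions listed in the example, and each of them is mapped to itself, which is the reproductive property.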

Another type of parametric solution can be derived from the variable elimination
principle. Note first that a parametric solution of a consistent one-variable equation
$\phi(x) = a x \vee b \bar{x} = 0$ is given by $\sigma(p) = p\,\bar{a} \vee \bar{p}\, b = p\, \bar{\phi}(1) \vee \bar{p}\, \phi(0)$ (compare
with Theorem 2.21). This observation leads to the following reformulation of
Theorem 2.22.

Theorem 2.24. The point $(x_1^*, x_2^*, \ldots, x_n^*)$ is a solution of the equation
$\phi(x_1, x_2, \ldots, x_n) = 0$ if and only if $(x_1^*, \ldots, x_{n-1}^*)$ is a solution of the equation
$\phi_{n-1}(x_1, \ldots, x_{n-1}) = 0$ and

\[ x_n^* = p_n\, \bar{\phi}(x_1^*, \ldots, x_{n-1}^*, 1) \vee \bar{p}_n\, \phi(x_1^*, \ldots, x_{n-1}^*, 0) \]

for some parameter $p_n \in B$.

Proof. This is an immediate consequence of Theorem 2.22 and the previous
observation.

Theorem 2.24 allows us to compute recursively a parametric solution
of the equation $\phi = 0$: If $(\sigma_1, \sigma_2, \ldots, \sigma_{i-1})$ is a parametric solution of
$\phi_{i-1}(x_1, x_2, \ldots, x_{i-1}) = 0$, then Theorem 2.24 indicates that $\sigma_i$ can be obtained as

\[ \sigma_i(p_1, p_2, \ldots, p_i) = p_i\, \bar{\phi}_i(\sigma_1, \sigma_2, \ldots, \sigma_{i-1}, 1) \vee \bar{p}_i\, \phi_i(\sigma_1, \sigma_2, \ldots, \sigma_{i-1}, 0). \]

This process yields a parametric solution in “triangular form,” where $\sigma_i$ depends
on $(p_1, p_2, \ldots, p_i)$, but not on $(p_{i+1}, p_{i+2}, \ldots, p_n)$, for $i = 1, 2, \ldots, n$. Furthermore,
this solution can be shown to be reproductive (we leave the proof of this claim as
an exercise).

Example 2.9. Let us again return to Example 2.2. Using Theorem 2.24 and the
expression $\phi_1 = \bar{x}_1$ derived in Example 2.2, we find $\sigma_1 = p_1 \bar{\phi}_1(1) \vee \bar{p}_1 \phi_1(0) =
p_1 \vee \bar{p}_1 = 1$.
Next, $\sigma_2 = p_2 \bar{\phi}_2(\sigma_1, 1) \vee \bar{p}_2 \phi_2(\sigma_1, 0) = p_2 \bar{\phi}_2(1, 1) \vee \bar{p}_2 \phi_2(1, 0)$. In view of
Example 2.2, $\phi_2(1, 0) = \phi_2(1, 1) = 0$, so that $\sigma_2 = p_2$.
Finally, $\sigma_3 = p_3 \bar{\phi}_3(\sigma_1, \sigma_2, 1) \vee \bar{p}_3 \phi_3(\sigma_1, \sigma_2, 0) = p_3 \bar{\phi}_3(1, p_2, 1) \vee \bar{p}_3 \phi_3(1, p_2, 0)$.
From the expression of $\phi_3$, we find immediately that $\sigma_3 = 0$.
Note that this solution $(\sigma_1, \sigma_2, \sigma_3) = (1, p_2, 0)$ is in triangular form, as opposed
to the solution derived in Example 2.8, and that it is reproductive.
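The triangular construction can be reproduced mechanically. The sketch below is only illustrative: it represents each eliminated function as a Python closure defined by the product rule of Theorem 2.22, which costs exponentially many evaluations, whereas the procedure Eliminate works on expressions. The names `eliminate` and `triangular` are assumptions.

```python
from itertools import product


def eliminate(phi, n):
    """Return [phi_0, ..., phi_n] with phi_{j-1}(x) = phi_j(x, 0) & phi_j(x, 1)."""
    phis = [phi]
    for _ in range(n):
        nxt = phis[0]
        phis.insert(0, lambda x, f=nxt: f(x + (0,)) & f(x + (1,)))
    return phis


def triangular(phi, n):
    """Triangular parametric solution of phi = 0, or None if phi_0 = 1."""
    phis = eliminate(phi, n)
    if phis[0](()):
        return None
    def sigma(p):
        x = ()
        for i in range(1, n + 1):
            f = phis[i]
            # sigma_i = p_i * not phi_i(sigma_1..sigma_{i-1}, 1)  v  not p_i * phi_i(.., 0)
            xi = (p[i - 1] & (1 - f(x + (1,)))) | ((1 - p[i - 1]) & f(x + (0,)))
            x += (xi,)
        return x
    return sigma


def phi3(x):
    """The DNF of Example 2.2, with each complement written as 1 - x."""
    x1, x2, x3 = x
    return (((1 - x1) & x2 & (1 - x3)) | (x1 & (1 - x2) & x3)
            | ((1 - x1) & (1 - x2)) | ((1 - x1) & x3) | (x2 & x3))


sigma = triangular(phi3, 3)
print(sorted({sigma(p) for p in product((0, 1), repeat=3)}))
```

On this equation the sketch returns the mapping (1, p2, 0) for every parameter vector, which matches the solution obtained by hand in the example.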

More information on parametric solutions can be found in Hammer and
Rudeanu [460], Martin and Nipkow [672], and Rudeanu [795, 796].

2.11.4 Maximum satisfiability



Definition 2.12. If $\phi(X) = \bigvee_{k=1}^{m} C_k$ is a DNF on $B^n$, and if positive real weights
$w_1, w_2, \ldots, w_m$ are associated with the terms $C_1, C_2, \ldots, C_m$, then the (weighted)
maximum satisfiability problem, or Max Sat, asks for a point $X^* \in B^n$ that
maximizes the total weight of the terms canceled by $X^*$. In other words, Max Sat
is the optimization problem

\[ \text{maximize } \sum_{k=1}^{m} \{\, w_k \mid C_k(X) = 0 \,\} \quad \text{subject to } X \in B^n. \]

The name Max Sat refers more properly to a dual version of the problem
in which the objective is to maximize the number of satisfied clauses of a CNF
$\psi(X) = \bigwedge_{k=1}^{m} D_k$, where a clause $D_k$ is satisfied if it takes value 1. Clearly, both
versions of the problem are equivalent. To be consistent with the remainder of
the book, we carry on the discussion in terms of DNFs; but the terminology Max
Sat is so deeply entrenched that we prefer to apply it to this DNF version as well,
rather than inventing some neologism like “maximum falsifiability problem.”
Max Sat is a natural generalization of DNF equations, viewed as collections
of logical conditions C1 (X) = 0, C2 (X) = 0, . . . , Cm (X) = 0. When the equation
φ(X) = 0 is inconsistent, we may be happy to find a model X∗ that satisfies as
many of the conditions as possible. Applications are discussed, for instance, in
Hansen and Jaumard [468].
Let us call Max d-Sat the restriction of Max Sat to DNFs of degree d. In view
of Cook’s theorem, Max d-Sat is NP-hard for all d ≥ 3. But a stronger statement
can actually be made (Garey, Johnson, and Stockmeyer [372]).
Theorem 2.25. Max 2-Sat is NP-hard, even when w1 = w2 = . . . = wm = 1.

Proof. The problem of solving the DNF equation $\phi(x_1, x_2, \ldots, x_n) = \bigvee_{k=1}^{m} C_k = 0$
is NP-complete even when all terms of $\phi$ have degree exactly 3 (see [208, 371],
Theorem 2.4 and Exercise 4). With such an equation, we associate an instance of
Max 2-Sat on $B^{n+m}$, as follows. First, we introduce $m$ new variables $y_1, y_2, \ldots, y_m$.
Next, for all $k = 1, 2, \ldots, m$, if $C_k = u_1 u_2 u_3$ is the $k$th term of $\phi$, where $u_1, u_2, u_3$
are distinct literals, we create a subformula $\psi_k$ consisting of 10 terms:
\[ \psi_k = u_1 \vee u_2 \vee u_3 \vee \bar{u}_1 \bar{u}_2 \vee \bar{u}_1 \bar{u}_3 \vee \bar{u}_2 \bar{u}_3 \vee y_k \vee u_1 \bar{y}_k \vee u_2 \bar{y}_k \vee u_3 \bar{y}_k. \]
Finally, the instance of Max 2-Sat is the DNF $\psi = \bigvee_{k=1}^{m} \psi_k$, with weight 1 on
each term. We claim that $\phi = 0$ is consistent if and only if the optimal value of this
Max 2-Sat instance is at least $7m$.
Indeed, suppose that φ(X ∗ ) = 0 for some point X ∗ ∈ B n , and consider a term
Ck = u1 u2 u3 . Either 1, 2, or 3 of the literals u1 , u2 , u3 take value 0 at X ∗ . If only
one of the literals is 0, then set yk∗ = 1; otherwise, set yk∗ = 0. The resulting point
(X∗ , Y ∗ ) cancels 7 terms of each DNF ψk , for k = 1, 2, . . . , m, and hence it cancels
7m terms of ψ.
Conversely, assume that the point (X ∗ , Y ∗ ) cancels 7m terms of ψ. For
k = 1, 2, . . . , m, it is easy to see that no assignment of values to u1 , u2 , u3 , yk cancels
more than 7 terms of ψk . Moreover, if u1 = u2 = u3 = 1, then at most 6 terms of
ψk can be cancelled. Therefore, (X ∗ , Y ∗ ) must cancel exactly 7 terms of each DNF
ψk , and X ∗ must be a solution of the equation φ(X) = 0. 
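The counting claims used in this proof concern a single ten-term gadget and can be verified exhaustively. The sketch below assumes the gadget consists of the three single literals, the three pairwise products of complemented literals, the term y, and the three products of a literal with the complement of y; the function name is hypothetical.

```python
from itertools import product


def cancelled_terms(u1, u2, u3, y):
    """Number of the 10 terms of the gadget that take value 0 at (u1, u2, u3, y)."""
    terms = [u1, u2, u3,
             (1 - u1) * (1 - u2), (1 - u1) * (1 - u3), (1 - u2) * (1 - u3),
             y, u1 * (1 - y), u2 * (1 - y), u3 * (1 - y)]
    return sum(1 for t in terms if t == 0)


# Best achievable count for each value of the literals, maximizing over y.
best = {u: max(cancelled_terms(*u, y) for y in (0, 1))
        for u in product((0, 1), repeat=3)}
print(best)
```

The enumeration confirms that 7 terms can be cancelled whenever at least one literal is 0, that no assignment cancels more than 7, and that only 6 can be cancelled when all three literals equal 1.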

The following extension of Theorem 2.10 and Theorem 2.14 will be useful in
the sequel (see also Theorem 13.13 in Section 13.4.3).
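Theorem 2.26. If

\[ \phi(x_1, x_2, \ldots, x_n) = \bigvee_{k=1}^{m} \Big( \bigwedge_{i \in A_k} x_i \bigwedge_{j \in B_k} \bar{x}_j \Big), \tag{2.41} \]

then the optimal value of Max Sat is equal to the optimal value of the 0-1 linear
programming problem

\[ \text{maximize } \sum_{k=1}^{m} w_k z_k \tag{2.42} \]
\[ \text{subject to } \sum_{i \in A_k} (1 - x_i) + \sum_{j \in B_k} x_j \ge z_k, \quad k = 1, 2, \ldots, m; \tag{2.43} \]
\[ x_i \in \{0, 1\}, \quad i = 1, 2, \ldots, n; \tag{2.44} \]
\[ z_k \in \{0, 1\}, \quad k = 1, 2, \ldots, m, \tag{2.45} \]

as well as to the maximum over $\{0, 1\}^n$ and over $[0, 1]^n$ of the real-valued function

\[ f(X) = \sum_{k=1}^{m} w_k \Big( 1 - \prod_{i \in A_k} x_i \prod_{j \in B_k} (1 - x_j) \Big). \tag{2.46} \]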

Proof. In any optimal solution $(X^*, Z^*) \in \{0, 1\}^{n+m}$ of (2.42)–(2.45), variable $z_k^*$
takes value 1 if and only if $C_k(X^*) = 0$, since $w_k > 0$ ($k = 1, 2, \ldots, m$). This proves
the first statement.
Similarly, for every $X^* \in \{0, 1\}^n$, the expression $1 - \prod_{i \in A_k} x_i^* \prod_{j \in B_k} (1 - x_j^*)$
takes value 1 if and only if $C_k(X^*) = 0$. This proves that the maximum of $f(X)$
over $\{0, 1\}^n$ coincides with the optimal value of Max Sat.
Finally, we claim that, if we view $f(X)$ as a function on $[0, 1]^n$, then
$\max_{X \in [0,1]^n} f(X) = \max_{X \in \{0,1\}^n} f(X)$. The proof of this claim is similar to the
proof of Theorem 2.14 (see also Theorem 13.12 in Section 13.4.3).

So, Max Sat can be seen as either a linear or a nonlinear optimization problem
in 0-1 variables and can, in principle, be solved by any 0-1 programming algorithm
(see, e.g., [707] and Chapter 13). Rather than diving into the details of specific
implementations, we restrict ourselves here to a few elegant results concerning
the performance of approximation algorithms for Max Sat, as these results tie in
nicely with previous sections of the chapter. We begin with a definition.
Definition 2.13. Let 0 < α ≤ 1. An α-approximation algorithm for Max Sat is
a polynomial-time algorithm which, for every instance of Max Sat, produces a
point $\hat{X} \in B^n$ such that
\[
\sum_{k=1}^{m} \{ w_k \mid C_k(\hat{X}) = 0 \} \;\ge\; \alpha \max_{X \in B^n} \sum_{k=1}^{m} \{ w_k \mid C_k(X) = 0 \}.
\]
The parameter α is called the performance guarantee of the algorithm.


Of course, it is not a priori obvious that there should exist an α-approximation
algorithm for Max Sat, for some α > 0. Johnson [536], however, was able to
establish the existence of such an algorithm (in this and the following statements,
we assume that the DNF φ has no empty terms).
Theorem 2.27. For all d ≥ 1, there is a $(1 - \frac{1}{2^d})$-approximation algorithm for
the restriction of Max Sat to DNFs in which every term has degree at least d. In
particular, there is a $\frac{1}{2}$-approximation algorithm for Max Sat.
Proof. Let d ≥ 1 be the minimum degree of a term of φ, let $X^H = (\frac{1}{2}, \frac{1}{2}, \ldots, \frac{1}{2})$
denote the center of the unit hypercube, and consider the value assumed by the
function (2.46) at the point $X^H$. Then,
\[
f(X^H) = \sum_{k=1}^{m} w_k \Big( 1 - \big(\tfrac{1}{2}\big)^{|A_k| + |B_k|} \Big) \ge \Big(1 - \frac{1}{2^d}\Big) \sum_{k=1}^{m} w_k \ge \Big(1 - \frac{1}{2^d}\Big) W^{MS}, \tag{2.47}
\]
where $W^{MS}$ is the optimal value of Max Sat. Therefore, starting from the point $X^H$
and proceeding as in the last part of the proof of Theorem 2.14, we can produce a
point $\hat{X} \in \{0,1\}^n$ such that $f(\hat{X}) \ge f(X^H) \ge (1 - \frac{1}{2^d}) W^{MS}$. This procedure clearly
runs in polynomial time, and hence, the algorithm that returns $\hat{X}$ is a $(1 - \frac{1}{2^d})$-approximation
algorithm.

Note that the proof actually establishes a little bit more than what we claimed:
Namely, the algorithm always returns an assignment with value at least $\frac{1}{2} \sum_{k=1}^{m} w_k$.
This shows in particular that, for any DNF equation φ = 0, there exists a point that
cancels at least half of the terms of φ.
Our proof of Theorem 2.27 is inspired by a probabilistic argument due to
Yannakakis [934]. In this approach, each variable $x_i$ is independently set to either
0 or 1 with probability $\frac{1}{2}$, and $f(\frac{1}{2}, \frac{1}{2}, \ldots, \frac{1}{2})$ is interpreted as the expected objective
value of this random assignment. Then, the method of conditional probabilities is
used to “derandomize” the procedure. The above proof translates this probabilistic
method into a purely deterministic one (but not every probabilistic algorithm can
be so easily derandomized; see, for instance, [689] for a brief introduction to
probabilistic algorithms).
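
The derandomization step can be illustrated by a short Python sketch. It is only an illustration under stated assumptions, not the implementation of any of the cited authors: a term is encoded as a hypothetical pair (pos, neg) of sets of 0-based variable indices (the sets $A_k$ and $B_k$), and the names expected_weight and johnson_rounding are ours. Starting from the center of the hypercube, each variable is fixed in turn to the value that does not decrease f; since f is linear in each variable separately, one of the two values always qualifies.

```python
from fractions import Fraction

def expected_weight(terms, weights, x):
    """Evaluate f(X) of (2.46); each entry of x is 0, 1, or 1/2."""
    total = Fraction(0)
    for (pos, neg), w in zip(terms, weights):
        prod = Fraction(1)
        for i in pos:
            prod *= x[i]
        for j in neg:
            prod *= 1 - x[j]
        total += w * (1 - prod)   # weight of the term times Prob(term is cancelled)
    return total

def johnson_rounding(terms, weights, n):
    """Fix the variables one by one, never decreasing the value of f."""
    x = [Fraction(1, 2)] * n
    for i in range(n):
        best_value, best_bit = None, None
        for bit in (0, 1):
            x[i] = Fraction(bit)
            value = expected_weight(terms, weights, x)
            if best_value is None or value > best_value:
                best_value, best_bit = value, bit
        x[i] = Fraction(best_bit)
    return [int(v) for v in x]

# Hypothetical instance: phi = x0 v (not x0) v x0 x1 (not x2), weights 1, 1, 2.
terms = [({0}, set()), (set(), {0}), ({0, 1}, {2})]
weights = [1, 1, 2]
x_hat = johnson_rounding(terms, weights, 3)
```

Exact rational arithmetic is used so that the comparison of the two candidate values is not affected by rounding errors.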
Theorem 2.27 has been subsequently improved by several authors, but the
first real breakthrough came with a $\frac{3}{4}$-approximation algorithm proposed by
Yannakakis [934] (note that Johnson’s algorithm has a performance guarantee
equal to $\frac{3}{4}$ for DNFs without linear terms). Goemans and Williamson [389] later
proposed another, simpler $\frac{3}{4}$-approximation algorithm, which we now describe.
We need some preliminary results.
Define the sequence
\[
\beta_t = 1 - \Big(1 - \frac{1}{t}\Big)^t, \qquad t \in \mathbb{N}.
\]

Lemma 2.1. Let A, B be subsets of {1, 2, . . . , n} with A ∩ B = ∅ and |A| + |B| = n.
All solutions of the system
\begin{align}
& \sum_{i \in A} (1 - x_i) + \sum_{j \in B} x_j \ge z, \tag{2.48} \\
& (x_1, x_2, \ldots, x_n, z) \in [0,1]^{n+1} \tag{2.49}
\end{align}
satisfy the inequality
\[
1 - \prod_{i \in A} x_i \prod_{j \in B} (1 - x_j) \ge \beta_n z. \tag{2.50}
\]

Proof. Assume without loss of generality that |A| = n and B = ∅. The arithmetic-geometric
mean inequality yields
\[
\Big( \prod_{i=1}^{n} x_i \Big)^{1/n} \le \frac{\sum_{i=1}^{n} x_i}{n},
\]
or equivalently
\[
\prod_{i=1}^{n} x_i \le \Big( \frac{\sum_{i=1}^{n} x_i}{n} \Big)^{n}.
\]
From (2.48), $\sum_{i=1}^{n} x_i \le n - z$. Hence,
\[
\prod_{i=1}^{n} x_i \le \Big( \frac{n - z}{n} \Big)^{n} = \Big( 1 - \frac{z}{n} \Big)^{n}.
\]
The function $h(z) = 1 - (1 - \frac{z}{n})^n$ is concave on [0, 1], h(0) = 0 and h(1) = $\beta_n$.
Thus, $h(z) \ge \beta_n z$ on [0, 1], and the lemma follows.

Goemans and Williamson [389] proved:


Theorem 2.28. There is a $\beta_d$-approximation algorithm for Max d-Sat, for all
d ≥ 1. In particular, there is a $(1 - \frac{1}{e})$-approximation algorithm for Max Sat.
Proof. Consider the linear relaxation of (2.42)–(2.45), that is, the linear programming
problem obtained after replacing the integrality constraints (2.44)–(2.45) by
the weaker constraints $x_i \in [0, 1]$ (i = 1, 2, . . . , n) and $z_k \in [0, 1]$ (k = 1, 2, . . . , m).
Call this problem LP MS. Let $(X^{LP}, Z^{LP}) \in [0,1]^{n+m}$ be an optimal solution of
LP MS, with value $W^{LP} = \sum_{k=1}^{m} w_k z_k^{LP}$, and let $W^{MS}$ denote the optimal value
of Max Sat. Note that $(X^{LP}, Z^{LP})$ can be computed in polynomial time and that
$W^{LP} \ge W^{MS}$.
Consider now the value taken by the function f defined by (2.46) at the point
$X^{LP}$. Since each term of φ has degree at most d, and the sequence $\beta_t$ is decreasing
with t, Lemma 2.1 implies
\[
f(X^{LP}) \ge \sum_{k=1}^{m} w_k \, \beta_{|A_k| + |B_k|} \, z_k^{LP} \ge \beta_d W^{LP} \ge \beta_d W^{MS}. \tag{2.51}
\]
As in the proof of Theorem 2.14, we can find in polynomial time a point $\hat{X} \in \{0,1\}^n$
such that $f(\hat{X}) \ge f(X^{LP}) \ge \beta_d W^{MS}$.
The second part of the statement follows from $\lim_{t \to \infty} \beta_t = 1 - \frac{1}{e}$.

Since $\beta_2 = 0.75$, Theorem 2.28 establishes the existence of a $\frac{3}{4}$-approximation
algorithm for Max 2-Sat. This result is in a sense complementary to Theorem 2.27,
since Johnson’s algorithm has performance guarantee equal to 0.75 when each term
of φ has degree at least 2, while the new algorithm has performance guarantee equal
to 0.75 when each term of φ has degree at most 2. This observation led Goemans
and Williamson [389] to the following stronger result (note that $1 - \frac{1}{e} \approx 0.632$):
Theorem 2.29. There is a $\frac{3}{4}$-approximation algorithm for Max Sat.
Proof. Let $f_1 = f(\frac{1}{2}, \frac{1}{2}, \ldots, \frac{1}{2})$, let $f_2 = f(X^{LP})$, where $(X^{LP}, Z^{LP})$ is an optimal
solution of the linear relaxation LP MS, and let $W^{LP}$ be the optimal value of
LP MS. In order to prove the theorem, we only have to establish that
\[
\max(f_1, f_2) \ge \frac{f_1 + f_2}{2} \ge \frac{3}{4} W^{LP} \ge \frac{3}{4} W^{MS}, \tag{2.52}
\]
and to conclude as usual.
The first and last inequalities in (2.52) are trivial. For the middle one, notice
that (2.47) implies
\[
f_1 = \sum_{k=1}^{m} w_k \Big( 1 - \big(\tfrac{1}{2}\big)^{|A_k| + |B_k|} \Big) \ge \sum_{k=1}^{m} w_k \Big( 1 - \big(\tfrac{1}{2}\big)^{|A_k| + |B_k|} \Big) z_k^{LP}.
\]
Adding this to (2.51) yields
\[
f_1 + f_2 \ge \sum_{k=1}^{m} w_k \Big( 1 - \big(\tfrac{1}{2}\big)^{|A_k| + |B_k|} + \beta_{|A_k| + |B_k|} \Big) z_k^{LP}.
\]
Let $\gamma_t = 1 - (\frac{1}{2})^t + \beta_t$ for $t \in \mathbb{N}$. Then, $\gamma_1 = \gamma_2 = 1.5$, and, for t ≥ 3,
$\gamma_t \ge \frac{7}{8} + 1 - \frac{1}{e} \ge 1.5$. The middle inequality in (2.52) follows immediately.
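
The numerical facts used in the proofs of Theorems 2.28 and 2.29 are easy to check. The following Python lines are a mere sanity check in floating-point arithmetic (the function names beta and gamma are ours); they tabulate $\beta_t$ and $\gamma_t$ for small t and do not replace the analytical argument.

```python
import math

def beta(t):
    """beta_t = 1 - (1 - 1/t)^t, the guarantee of Theorem 2.28 for degree t."""
    return 1 - (1 - 1 / t) ** t

def gamma(t):
    """gamma_t = 1 - (1/2)^t + beta_t, as in the proof of Theorem 2.29."""
    return 1 - 0.5 ** t + beta(t)

for t in range(1, 6):
    print(t, round(beta(t), 4), round(gamma(t), 4))

# beta_t decreases toward 1 - 1/e, and gamma_t never drops below 1.5
limit = 1 - 1 / math.e
```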

Several further improvements on the performance guarantee of $\frac{3}{4}$ have been
subsequently reported in the literature, and more will certainly follow in years to
come. Several of these approaches rely on reformulations of Max Sat as a semidefinite
programming problem. We refer the reader to Asano and Williamson [31]
for a 0.7846-approximation algorithm for Max Sat; to Avidor, Berkovitch, and
Zwick [38] for a 0.7968-approximation algorithm for Max Sat; to Feige and
Goemans [325] for a 0.931-approximation algorithm for Max 2-Sat; to Lewin,
Livnat, and Zwick [611] for a 0.9401-approximation algorithm for Max 2-Sat;
and to Karloff and Zwick [549] for a 0.875-approximation algorithm for Max
3-Sat.
By contrast, Håstad [478] has proved that, unless P = NP, no approximation
algorithm for Max 2-Sat can achieve a better guarantee than 21/22 ≈ 0.9545,
and no algorithm for Max 3-Sat (and, a fortiori, for Max Sat) can achieve a
better guarantee than 0.875. Hence, the performance guarantee in [549] is best
possible for Max 3-Sat, but a small gap remains between the known upper and
lower bounds for Max 2-Sat. Khot et al. [567] have shown that, if the so-called
“Unique Games Conjecture” holds, then it is NP-hard to approximate Max 2-Sat
to within any factor greater than 0.943, a bound that is extremely close to the
approximation ratio of 0.9401 due to Lewin, Livnat, and Zwick [611].
Escoffier and Paschos [315] analyze the approximability of Max Sat
under a different type of metric, namely the differential approximation ratio.
Creignou [240] and Khanna, Sudan, and Williamson [563] investigate and classify
some generalizations of Max Sat; see also Creignou, Khanna, and Sudan [243]
for a complete overview.
We have concentrated in this section on the approximability of Max Sat. On
the computational side, numerous algorithms have been proposed for the solu-
tion of Max Sat problems. Most of these algorithms rely on generalizations of
techniques described in previous sections, especially in Section 2.8. We do not
discuss these approaches in detail, and we refer instead to early work by Hansen
and Jaumard [468]; to the papers [55, 540, 557, 785] in the volume edited by Du,
Gu, and Pardalos [278]; to the book by Hoos and Stützle [508], and so on. Recent
efficient algorithms are proposed by Ibaraki et al. [515] or Xing and Zhang [926].
De Klerk and Warners [265] examine the computational performance of semidefi-
nite programming algorithms for Max Sat. We also refer to Chapter 13 for a more
general discussion of pseudo-Boolean optimization.

2.12 Exercises
1. Given an undirected graph G and an integer K, write a DNF φ such that the
equation φ = 0 is consistent if and only if G is K-colorable.
2. Prove that Boolean Equation can be solved in polynomial time when
restricted to DNF equations in which every variable appears at most twice.
3. Complete the proof of Theorem 2.18.
4. Prove that every Boolean equation can be transformed in linear time into an
equivalent DNF equation in which all terms have degree exactly equal to 3.
5. Prove that Boolean Equation is NP-complete, even when restricted to
DNF equations in which every variable appears at most three times.
6. Prove that Boolean Equation is NP-complete, even when restricted to
cubic DNF equations in which every term is either positive or negative.
7. Let ψ(X, Y ) be the DNF produced by the procedure Expand∗ when running
on the expression φ(X). Prove that, for all X∗ ∈ B n , φ(X ∗ ) = 0 if and only if
there exists Y ∗ ∈ B m such that ψ(X ∗ , Y ∗ ) = 0. Show that Y ∗ is not necessarily
unique.

8. Prove that the following problem is NP-hard: Given a DNF $\phi = \bigvee_{k \in A} T_k$,
find a largest subset of terms, say, B ⊆ A, such that the “relaxed” DNF
$\psi = \bigvee_{k \in B} T_k$ is monotone (see Section 2.5 and [229]).
9. Linear consensus is the restricted form of Consensus in which the pair
$\{x_i C, \bar{x}_i D\}$ considered in each step of the while-loop (after the first one)
involves the consensus generated in the previous step. Show that the empty
conjunction 1 can be derived by linear consensus whenever it can be derived
by consensus (i.e., whenever a DNF equation is inconsistent; see, e.g., [186,
571]).
10. Input consensus is the restricted form of Consensus in which a pair of
conjunctions $\{x_i C, \bar{x}_i D\}$ can be used to derive a consensus only if one of
$x_i C$, $\bar{x}_i D$ is among the terms $\{C_k \mid k = 1, 2, \ldots, m\}$ of the original DNF φ.
(a) Show that input consensus can fail to derive the empty conjunction 1
when φ = 0 is inconsistent.
(b) Prove that the empty conjunction 1 can be derived by input consensus
if and only if it can be derived by unit consensus steps ([186, 571]).
11. Let $X^*$ satisfy the inequalities (2.25)–(2.27) in the statement of Theorem
2.13. Show that, if $X^*$ violates (2.28), then $0 < x_k^* < 1$ and the left-hand side
of (2.25) and (2.26), when evaluated at $X^*$, is strictly less than 2 (see Hooker
[500]).
12. Prove that the inconsistency of the n-th pigeonhole formula has a cutting-plane
proof of length $O(n^3)$. Hint: Show how the inequality $\sum_{i=j}^{j+r} x_{ik} \le 1$
can be generated by Chvátal cuts, for r = 1, 2, . . . , n, j = 1, . . . , n − r, k =
1, 2, . . . , n.
13. Prove that, with probability tending to 1, random (n, m, 1)-equations of
degree 1 are consistent whenever $m\, n^{-1/2} \to 0$ and inconsistent whenever
$m\, n^{-1/2} \to \infty$ (see Chvátal and Reed [202]).
14. Prove that the parametric solution derived from the statement of Theorem
2.24 is reproductive.
3
Prime implicants and minimal DNFs
Peter L. Hammer and Alexander Kogan

This chapter is dedicated to two of the most important topics in the theory of
Boolean functions. The first is concerned with the basic building blocks of a
Boolean function, namely, its prime implicants. The set of prime implicants of
a Boolean function not only defines the function, but also provides detailed
information about many of its properties. In this chapter, we discuss various appli-
cations and basic properties of prime implicants and describe several methods for
generating all the prime implicants of a Boolean function.
The second deals with problems related to the representation of a Boolean
function by a DNF; that is, as a disjunction of elementary conjunctions. Since a
Boolean function may have numerous DNF representations, the question of finding
an “optimal” one plays a very important role. Among the most commonly consid-
ered optimality criteria, we discuss in detail the minimization of both the number
of terms and the number of literals in a DNF representation of a given function. We
explain the close relationship between these “logic minimization” problems and
the well-known set covering problem of combinatorial optimization; we describe
several efficient DNF simplification procedures; we establish the computational
complexity of logic minimization problems; and we present a “greedy” procedure
as an efficient and effective approximation algorithm for logic minimization.

3.1 Prime implicants


Let us first recall some of the notations and definitions introduced in Chapter 1
(see Section 1.7, in particular). For a Boolean variable x, we let
\[
x^{\alpha} = \begin{cases} x, & \text{if } \alpha = 1, \\ \bar{x}, & \text{if } \alpha = 0. \end{cases}
\]
An elementary conjunction $C_{PN} = \bigwedge_{i \in P} x_i \bigwedge_{j \in N} \bar{x}_j$ is an implicant of a
Boolean function $f(x_1, x_2, \ldots, x_n)$ if $C_{PN} = 1$ implies f = 1, or equivalently,
if $C_{PN} \le f$. Clearly, every term of any DNF representing a Boolean function f is
an implicant of f. We say that a term C covers a point X if C(X) = 1.


An implicant is called prime if it is not absorbed by any other implicant. In


other words, an implicant is prime if each elementary conjunction obtained by
eliminating an arbitrary literal from it is not an implicant. A DNF consisting only
of prime implicants is called a prime DNF.
A remarkable property of prime implicants is the fact that the disjunction of all
prime implicants represents the function. This expression is called the complete
DNF of the function. We have already noted, however, that a disjunction of a
(sometimes small) subset of prime implicants may already represent the function.
For example, the prime DNF
xy ∨ yz
represents the function whose complete DNF is

xy ∨ yz ∨ xz.

The prime implicants of a Boolean function can be viewed as its “building


blocks.” Indeed, a function is not only completely described by its prime impli-
cants, but also, as will be seen later, the set of all prime implicants reveals many
important properties of the function.

3.1.1 Applications to propositional logic and artificial intelligence


As already discussed in Chapter 1 (Section 1.13), Boolean functions find numerous
applications in the knowledge bases of expert systems. Such knowledge bases are
usually huge collections of propositional implication rules that formally represent
the expert knowledge in a particular domain. As was shown in Section 1.13, a
system of rules can be transformed in a straightforward way to a DNF expression
of a Boolean function, and vice versa. Recall, for instance, that the rule system
Rule 1: If x is false and y is true then z is true
Rule 2: If x is false and y is false then z is false
Rule 3: If z is true then x is false
Rule 4: If y is true then z is false
corresponds to the DNF

\[
\phi(x, y, z) = \bar{x}\, y\, \bar{z} \vee \bar{x}\, \bar{y}\, z \vee x z \vee y z. \tag{3.1}
\]

This DNF is logically equivalent to the prime DNF $\phi = \bar{x} y \vee z$, so that the original
rules 1–4 are equivalent to the conjunction of the following two rules:
Rule 5: If y is true then x is true
Rule 6: z is false
The two rule systems are logically equivalent in the sense that any logical deduction
that follows from one system also follows from the other one.
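
The equivalence of the two rule systems can be checked by brute force over the eight points of $B^3$. The Python sketch below is only an illustration: it assumes, as in Section 1.13, that each rule contributes the term describing its violation, so that the rule system holds exactly when the DNF takes value 0; the function names are ours.

```python
from itertools import product

def phi(x, y, z):
    """DNF (3.1): one term per rule, true exactly when that rule is violated."""
    rule1 = (not x) and y and (not z)      # x false, y true, but z false
    rule2 = (not x) and (not y) and z      # x false, y false, but z true
    rule3 = x and z                        # z true, but x true
    rule4 = y and z                        # y true, but z true
    return rule1 or rule2 or rule3 or rule4

def prime_dnf(x, y, z):
    """Violations of Rule 5 (y true, x false) and Rule 6 (z true)."""
    return ((not x) and y) or z

equivalent = all(phi(*p) == prime_dnf(*p)
                 for p in product([False, True], repeat=3))
```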
The foregoing example shows how the application of the notion of prime impli-
cants allows the simplification of an arbitrary system of rules. Moreover, any
implicant of the associated DNF corresponds to a rule that can be deduced from
the rule system, and vice versa. For example, the term $\bar{x} z$ is an implicant of the
DNF φ(x, y, z), and therefore the rule
Rule 7: If z is true then x is true
can be deduced from the rule system.
Note that Rule 7 is not very interesting, since a more general rule (namely,
Rule 6: “z is false”) can also be deduced. Since z is a prime implicant of φ(x, y, z),
it is impossible to deduce a more general rule than the latter one. It is therefore
natural to consider the so-called irredundant rules, which correspond to the prime
implicants of the associated Boolean function. The complete DNF of the associated
Boolean function will provide all the irredundant rules that can be deduced from
the given rule system. While some of these rules may be present in the original
rule system or can be obtained by generalizing the rules of the original system (i.e.,
by removing some literals from them), some other irredundant rules may bear no
evident similarity to any of the initial rules. Such rules can reveal some possibly
interesting logical implications that are “hidden” in the original system.

3.1.2 Short prime implicants


In many cases, the most “important” prime implicants of a Boolean function are
the shortest ones. The presence of short prime implicants may make it possible to
simplify various problems concerning a Boolean function.
First of all, the constant 1, which is the only elementary conjunction of degree
0, is an implicant of a Boolean function f (x1 , . . . , xn ) if and only if f (x1 , . . . , xn )
is a tautology (namely, if f (x1 , . . . , xn ) = 1 for all Boolean vectors (x1 , . . . , xn )),
and in this case, it is its only prime implicant. In other words, the constant 1 is an
implicant (which cannot be but prime) of f if and only if there is no solution to
the equation f (x1 , . . . , xn ) = 0.
Similarly, a prime implicant of degree 1 is a literal; such implicants will be
called linear. If a function f has a linear prime implicant x (or a linear prime
implicant $\bar{x}$), then no other prime implicant of f contains either x or $\bar{x}$. Indeed,
if x is an implicant of f, then every other implicant of f containing the literal x
is absorbed by the implicant x, and therefore is not prime. On the other hand, if
$\bar{x} C$ is an implicant of f, where C is an elementary conjunction, then C is also an
implicant of f, since
\[
C = x C \vee \bar{x} C \le x \vee \bar{x} C \le f.
\]
This reasoning, together with Theorem 1.13, easily leads to the following result:
Theorem 3.1. If $x_i^{\alpha_i}$, i = 1, 2, . . . , m, are prime implicants of the Boolean function
$f(x_1, \ldots, x_n)$, then there exists a Boolean function $g(x_{m+1}, \ldots, x_n)$ such that
\[
f(x_1, \ldots, x_n) = \bigvee_{i=1}^{m} x_i^{\alpha_i} \vee g(x_{m+1}, \ldots, x_n).
\]
Moreover, an elementary conjunction different from $x_i^{\alpha_i}$, i = 1, . . . , m, is a prime
implicant of $f(x_1, \ldots, x_n)$ if and only if it is a prime implicant of $g(x_{m+1}, \ldots, x_n)$.
The decomposition provided by Theorem 3.1 allows the reduction of many
problems involving Boolean functions to the case of Boolean functions without
linear implicants.
Prime implicants of degree 2, also called quadratic prime implicants, define a
partial order among certain literals. Indeed, if $x_1^{\alpha_1} x_2^{\alpha_2}$ is an implicant of a Boolean
function $f(x_1, \ldots, x_n)$, then the inequality
\[
x_1^{\alpha_1} \le \overline{x_2^{\alpha_2}},
\]
or equivalently, the inequality
\[
x_2^{\alpha_2} \le \overline{x_1^{\alpha_1}},
\]
holds in every false point of f.
Example 3.1. Consider the Boolean function

f = xy ∨ yz ∨ xwz.

Since xy and yz are prime implicants of f , it follows that x ≤ y and y ≤ z in every


false point of f . In other words, if a false point has y = 1, then it must have x = 0
and z = 1. 

If $x_1 x_2$ is a prime implicant of f, then neither $x_1 \bar{x}_2$ nor $\bar{x}_1 x_2$ can be an implicant
of f, since otherwise, either $x_1$ or $x_2$ would also be an implicant, and hence $x_1 x_2$
would not be prime. If both $x_1 \bar{x}_2$ and $\bar{x}_1 x_2$ are prime implicants of f, then the
variables $x_1$ and $x_2$ are logically equivalent in the sense that, in every false point of
f, the value of $x_1$ and the value of $x_2$ are the same. The next theorem shows that in
this case $x_1$ and $x_2$ behave in a perfectly “symmetric” way in the prime implicants
of f.
Theorem 3.2. If both $x_1 \bar{x}_2$ and $\bar{x}_1 x_2$ are prime implicants of the Boolean function
$f(x_1, x_2, \ldots, x_n)$, then
(1) no other prime implicant of f depends on both $x_1$ and $x_2$;
(2) if an elementary conjunction C depends neither on $x_1$ nor on $x_2$, then $x_1^{\alpha} C$
is a prime implicant of f if and only if $x_2^{\alpha} C$ is a prime implicant of f;
(3) $f(x_1, x_2, x_3, \ldots, x_n) = x_1 \bar{x}_2 \vee \bar{x}_1 x_2 \vee g(x_1, x_3, \ldots, x_n)$, where g is obtained
from f by substituting $x_1$ for $x_2$ in f, that is, $g(x_1, x_3, \ldots, x_n) =
f(x_1, x_1, x_3, \ldots, x_n)$;
(4) an elementary conjunction different from $x_1 \bar{x}_2$ or $\bar{x}_1 x_2$ is a prime implicant
of $f(x_1, x_2, \ldots, x_n)$ if and only if
• it is a prime implicant of $g(x_1, x_3, \ldots, x_n)$, or
• it is of the form $x_2^{\alpha} C$, where the elementary conjunction C is such that
$x_1^{\alpha} C$ is a prime implicant of $g(x_1, x_3, \ldots, x_n)$.
Proof. (1) Let $x_1^{\beta} x_2^{\gamma} C$ be a prime implicant of f. Clearly, γ = β since otherwise
$x_1^{\beta} x_2^{\gamma} C$ is absorbed by $x_1 \bar{x}_2$ or $\bar{x}_1 x_2$. However, in this case $x_1^{\beta} C$ is an implicant,
since (writing $\bar{\beta} = 1 - \beta$)
\[
x_1^{\beta} C = x_1^{\beta} x_2^{\beta} C \vee x_1^{\beta} x_2^{\bar{\beta}} C \le x_1^{\beta} x_2^{\gamma} C \vee x_1^{\beta} x_2^{\bar{\beta}} \le f.
\]
Therefore, $x_1^{\beta} x_2^{\gamma} C$ is not prime since it is absorbed by $x_1^{\beta} C$. This proves statement 1.
(2) If $x_1^{\alpha} C$ is an implicant of f, then $x_2^{\alpha} C$ is an implicant of f, since
\[
x_2^{\alpha} C = x_1^{\alpha} x_2^{\alpha} C \vee x_1^{\bar{\alpha}} x_2^{\alpha} C \le x_1^{\alpha} C \vee x_1^{\bar{\alpha}} x_2^{\alpha} \le f.
\]
Therefore, by symmetry, $x_1^{\alpha} C$ is an implicant of f if and only if $x_2^{\alpha} C$ is an implicant
of f.
If $x_1^{\alpha} C$ is a prime implicant of f, but the implicant $x_2^{\alpha} C$ is not prime, then, since
C is not an implicant, there must exist a $C' < C$ (that is, a conjunction obtained from C
by deleting at least one literal) such that $x_2^{\alpha} C'$ is an implicant of
f. Then $x_1^{\alpha} C'$ is also an implicant of f, contradicting the assumption that $x_1^{\alpha} C$ is
prime. Hence, by symmetry, $x_1^{\alpha} C$ is a prime implicant of f if and only if $x_2^{\alpha} C$ is a
prime implicant of f. This proves statement 2.
(3) If we use Theorem 1.13 to represent f as its complete DNF, then statement
3 follows from the identity
\begin{align*}
x_1 \bar{x}_2 \vee \bar{x}_1 x_2 \vee x_2^{\alpha} C
&= x_1^{\alpha} x_2^{\bar{\alpha}} \vee x_1^{\alpha} x_2^{\alpha} C \vee x_1^{\bar{\alpha}} x_2^{\alpha} \vee x_1^{\bar{\alpha}} x_2^{\alpha} C \\
&= x_1^{\bar{\alpha}} x_2^{\alpha} \vee x_1^{\alpha} (x_2^{\bar{\alpha}} \vee x_2^{\alpha} C) \\
&= x_1^{\bar{\alpha}} x_2^{\alpha} \vee x_1^{\alpha} (x_2^{\bar{\alpha}} \vee C) \\
&= x_1 \bar{x}_2 \vee \bar{x}_1 x_2 \vee x_1^{\alpha} C
\end{align*}
applied to each prime implicant depending on $x_2$.
(4) To prove that statement 4 holds, let us first note that every implicant of g is
an implicant of f.
Let us now show that any implicant C of f that does not depend on $x_2$ is an
implicant of g. Indeed, if $C(X^*) = 1$ for some point $X^*$, then $f(X^*) = 1$, and
since C does not depend on $x_2$, $f(x_1^*, x_1^*, \ldots, x_n^*)$ must also be 1, and therefore
$g(X^*) = 1$. It follows now, just as in the proof of statement 2, that every prime
implicant of g is a prime implicant of f, and conversely, every prime implicant
of f that does not depend on $x_2$ is a prime implicant of g. This fact, together
with statement 2, shows that a prime implicant of f that is not a prime implicant
of g is of the form $x_2^{\alpha} C$, where $x_1^{\alpha} C$ is a prime implicant of g. This proves
statement 4.

Remark 3.1. If $x_1 x_2$ and $\bar{x}_1 \bar{x}_2$ are prime implicants of a Boolean function f, then
a valid statement analogous to Theorem 3.2 is obtained after replacing $x_2$ by $\bar{x}_2$,
showing that in this case $x_1$ and $\bar{x}_2$ behave in a perfectly symmetric way in the
prime implicants of f.

3.2 Generation of all prime implicants


In this section, we consider the problem of generating all prime implicants of a
Boolean function, that is, the algorithmic problem:

Prime Implicants
Instance: An arbitrary expression of a Boolean function f .
Output: The complete DNF of f or, equivalently, a list of all prime implicants of f .

This problem has been intensively investigated in the literature since the early
1930s. Its complexity depends very much on the expression of f . We shall suc-
cessively handle the cases in which f is given by a list of its true points, or by an
arbitrary DNF, or by a CNF.

3.2.1 Generation from the set of true points


Let us first assume that the input of Prime Implicants takes the form of a list of
all true points of the function f or, equivalently, of a truth table, or of a minterm
expression of f . The results presented here have been known for a very long time
and seem to belong to the “folklore” of the field.
It is useful to associate an elementary conjunction with a pair of points in a
Boolean cube, as defined next.

Definition 3.1. Given two Boolean points $Y = (y_1, y_2, \ldots, y_n)$ and
$Z = (z_1, z_2, \ldots, z_n)$, the hull of Y and Z (denoted by [Y, Z]) is the elementary
conjunction defined by
\[
[Y, Z] = \bigwedge_{i:\, y_i = z_i = 1} x_i \bigwedge_{j:\, y_j = z_j = 0} \bar{x}_j.
\]

Example 3.2. If Y = (1, 0, 1, 0, 1) and Z = (0, 0, 1, 1, 1), then $[Y, Z] = \bar{x}_2 x_3 x_5$.

Clearly, for any two Boolean points Y and Z,

[Y , Z] = [Z, Y ]

and
[Y , Z](Y ) = [Y , Z](Z) = 1.
In fact, the set of true points of [Y , Z] is the smallest subcube covering both Y
and Z.
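
Definition 3.1 translates directly into code. In the following Python sketch, an elementary conjunction is encoded as a pair (pos, neg) of frozensets containing the 1-based indices of its uncomplemented and complemented variables; this encoding and the names hull and covers are our own illustrative choices.

```python
def hull(y, z):
    """Return [Y, Z] as (pos, neg): the coordinates where Y and Z agree."""
    pos = frozenset(i for i, (a, b) in enumerate(zip(y, z), start=1) if a == b == 1)
    neg = frozenset(i for i, (a, b) in enumerate(zip(y, z), start=1) if a == b == 0)
    return pos, neg

def covers(term, point):
    """True if the conjunction term takes value 1 at the given point."""
    pos, neg = term
    return all(point[i - 1] == 1 for i in pos) and all(point[j - 1] == 0 for j in neg)

# Example 3.2: the hull of these two points is (not x2) x3 x5
Y = (1, 0, 1, 0, 1)
Z = (0, 0, 1, 1, 1)
term = hull(Y, Z)
```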

Theorem 3.3. If C is an implicant of a Boolean function f , then for any true


point Y of f such that C(Y ) = 1, there exists a unique true point Z of f such that
C(Z) = 1 and C = [Y , Z].
Proof. Let $C = \bigwedge_{i \in P} x_i \bigwedge_{j \in N} \bar{x}_j$. Since C(Y) = 1, the point $Y = (y_1, y_2, \ldots, y_n)$
is such that $y_i = 1$ for i ∈ P and $y_j = 0$ for j ∈ N. Let us define the point
$Z = (z_1, z_2, \ldots, z_n)$ in the following way:
\[
z_i = \begin{cases} y_i, & \text{if } i \in P \cup N, \\ \bar{y}_i, & \text{if } i \notin P \cup N. \end{cases}
\]
Clearly, C(Z) = 1, and therefore Z is a true point of f. Moreover, it follows easily
from Definition 3.1 that C = [Y, Z] and that C ≠ [Y, W] when W ≠ Z.

Corollary 3.1. If a Boolean function f has m true points, then the number of
(prime) implicants of f does not exceed $\binom{m}{2} + m$.

Proof. By Theorem 3.3, every implicant of f is the hull of some pair of (possibly
identical) true points of f , and every pair of true points generates in this way at
most one implicant of f . 

We now describe an efficient way of generating all (prime) implicants of a


Boolean function f when the set T (f ) of its true points is given. To generate all
implicants of f , it is sufficient to examine all the pairs of true points of f , and
check for every pair whether its hull is an implicant of f .
Given the set T(f) of all true points of a Boolean function f on $B^n$, and an
elementary conjunction C of degree d, one can easily check whether C is an
implicant of f. Indeed, one can simply count the number of points X ∈ T(f) such
that C(X) = 1: Obviously, C is an implicant of f if and only if this count equals
$2^{n-d}$. For every X ∈ T(f), both evaluating C(X) and incrementing the counter can
be done in O(n) time. Therefore, for every pair of true points, it can be checked in
O(n|T(f)|) time whether or not its hull is an implicant of f, and all the implicants
of f (possibly with repetitions) can be generated in $O(n|T(f)|^3)$ time.
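
The enumeration just described can be sketched in Python as follows. This is an illustration under our own conventions rather than a tuned implementation: conjunctions are pairs (pos, neg) of frozensets of 1-based indices, the function names are ours, and a set is used to discard repeated implicants as they are found.

```python
from itertools import combinations_with_replacement

def hull(y, z):
    """The hull [Y, Z] of Definition 3.1, as a pair (pos, neg)."""
    pos = frozenset(i for i, (a, b) in enumerate(zip(y, z), start=1) if a == b == 1)
    neg = frozenset(i for i, (a, b) in enumerate(zip(y, z), start=1) if a == b == 0)
    return pos, neg

def covers(term, point):
    pos, neg = term
    return all(point[i - 1] == 1 for i in pos) and all(point[j - 1] == 0 for j in neg)

def implicants(true_points, n):
    """All implicants of f, given the set T(f) of its true points in B^n."""
    T = set(true_points)
    found = set()
    # every implicant is the hull of a pair of (possibly identical) true points
    for y, z in combinations_with_replacement(sorted(T), 2):
        term = hull(y, z)
        degree = len(term[0]) + len(term[1])
        count = sum(1 for x in T if covers(term, x))
        if count == 2 ** (n - degree):   # the whole subcube consists of true points
            found.add(term)
    return found

# f = x1 v x2 on B^2 has the true points (0,1), (1,0), (1,1)
result = implicants([(0, 1), (1, 0), (1, 1)], 2)
```

For this small function, the hull of (0, 1) and (1, 0) is the empty conjunction, which is rejected because only 3 of the 4 points of $B^2$ are true points.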
Now, given a list (possibly with repetitions) of all the implicants of a Boolean
function f, one can generate all the prime implicants of f by eliminating from
the list those implicants that are absorbed by some other ones. If the list contains
M elementary conjunctions, a naive approach requires $\binom{M}{2}$ pairwise comparisons,
each taking O(n) time. In this way, the list of all prime implicants is generated in
$O(nM^2)$ time. Since M is typically much larger than n, one may want to reduce
the generation time by making use of the fact that all the implicants are present in
the list. We now describe how to achieve this time reduction.
Let $C = \bigwedge_{i \in L} x_i^{\alpha_i}$, and let us denote $C^j = \bigwedge_{i \in L \setminus \{j\}} x_i^{\alpha_i}$ for j ∈ L. If C is an
implicant of f, then C is prime if and only if no $C^j$ (for j ∈ L) is an implicant of
f. To be able to find out efficiently whether an elementary conjunction is present
in the list of all implicants of f, we need to order the implicants in such a way that
$C^j$ always appears “before” C in the list.
Let us introduce such a linear order on the set of all elementary conjunctions.
We place C “before” C′ if the degree of C is lower than the degree of C′. When
C and C′ have the same degree, then their order is the lexicographic order induced
by the linear order of literals, whereby $x_i$ is before $\bar{x}_i$, $\bar{x}_j$ is before ∗ (meaning
“not present”), and $x_i$ is before $x_j$ if i < j.
A comparison of two elementary conjunctions according to this order can be
performed in O(n) time, and the set of implicants of f can be linearly ordered in
O(nM log M) time. Ordering the list also allows us to eliminate possible repetitions.
Then, using binary search, one can check whether a conjunction is present
in the list by doing at most log M comparisons, that is, in O(n log M) time. For
any implicant C, at most n conjunctions $C^j$ need to be checked. Therefore, all the
nonprime implicants can be eliminated from the list in $O(n^2 M \log M)$ time. When
M is sufficiently large, this bound is better than the naive $O(nM^2)$ bound.
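
The primality test based on the conjunctions $C^j$ can be sketched as follows in Python. Note that this sketch replaces the sorted list and binary search of the text by a hash set, which is a different (and simpler to write) lookup structure serving the same purpose; the encoding of conjunctions as pairs (pos, neg) of frozensets and the function name are our own assumptions.

```python
def prime_implicants(implicants):
    """Keep the implicants C such that no C^j (C minus one literal) is an implicant.

    The input must contain ALL implicants of the function, as in the text.
    """
    known = set(implicants)          # also removes repetitions
    primes = set()
    for pos, neg in known:
        shorter = [(pos - {i}, neg) for i in pos] + [(pos, neg - {j}) for j in neg]
        if not any(c in known for c in shorter):
            primes.add((pos, neg))
    return primes

# All implicants of f = x1 v x2: three minterms and the two literals x1, x2
x1 = (frozenset({1}), frozenset())
x2 = (frozenset({2}), frozenset())
all_implicants = [
    (frozenset({1}), frozenset({2})),    # x1 (not x2)
    (frozenset({2}), frozenset({1})),    # (not x1) x2
    (frozenset({1, 2}), frozenset()),    # x1 x2
    x1,
    x2,
]
primes = prime_implicants(all_implicants)
```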
The arguments above prove the following statement:

Theorem 3.4. If a Boolean function f of n variables is represented by the set
T(f) of its true points, then
(a) all implicants of f can be generated in $O(n|T(f)|^3)$ time;
(b) all prime implicants of f can be generated in $O(n|T(f)|^2 (|T(f)| + n \log |T(f)|))$ time.

Note that for those Boolean functions whose number of true points is sufficiently
large, the additional expense of reducing the list of all implicants and keeping only
the prime ones is asymptotically negligible compared with the time required to
generate all the implicants.

3.2.2 Generation from a DNF representation: The consensus method


We now turn to the Prime Implicants problem when its input is in disjunctive nor-
mal form. The best-known method of solving this problem is the consensus method.
Recall that we introduced this fundamental procedure in Chapter 2, Section 2.7,
for the solution of DNF equations. The consensus method, however, has been ini-
tially proposed, and repeatedly rediscovered, as a method of generating all prime
implicants of a function represented in DNF. The most frequently cited references
in this framework include Blake [99], Samson and Mills [801], and Quine [768];
see Brown [156] for a historical perspective on the development of the consensus
method.
Given an arbitrary DNF φ(x1 , . . . , xn ), the consensus procedure transforms φ
by repeatedly applying the operations of absorption and consensus, as displayed
in Figure 3.1 (recall Definition 2.5 and compare with Figure 2.4 in Section 2.7;
the present section contains significant overlap with Section 2.7, but we find it
advisable to repeat some of those definitions and concepts here for the sake of
clarity). In the description of this procedure, we use the shorthand φ \ D to denote
the DNF obtained by removing a term D from the DNF φ.
Procedure Consensus*(φ)

Input: A DNF expression $\phi(x_1, \ldots, x_n) = \bigvee_{k=1}^{m} C_k$ of a Boolean function f.
Output: The complete DNF of f, that is, the disjunction of all prime implicants of f.

begin
    while one of the following conditions applies do
        if there exist two terms C and D of φ such that C absorbs D
        then remove D from φ: φ := φ \ D;
        if there exist two terms $x_i C$ and $\bar{x}_i D$ of φ such that $x_i C$ and $\bar{x}_i D$
            have a consensus and CD is not absorbed by another term of φ
        then add CD to φ: φ := φ ∨ CD;
    end while
    return φ;
end

Figure 3.1. Procedure Consensus*

The consensus procedure stops when

(1) the absorption operation cannot be applied, and


(2) either the consensus operation cannot be applied, or all the terms that can
be produced by consensus are absorbed by other terms in φ.

We shall say that a DNF is closed under absorption if it satisfies the first
condition above, and that it is closed under consensus if it satisfies the second
condition.
Note that the consensus procedure always terminates and produces a DNF
closed under consensus and absorption in a finite number of steps: Indeed, the
number of terms in the given variables is finite, and once a term is removed by
absorption, it will never again be added by consensus.

Example 3.3. Consider the following DNF:

$\phi(x_1, x_2, x_3, x_4) = x_1 \bar{x}_2 x_3 \vee \bar{x}_1 \bar{x}_2 x_4 \vee x_2 x_3 x_4.$

Note that absorption cannot be applied to $\phi$. The application of consensus to the
first two terms of $\phi$ transforms it into

$\phi'(x_1, x_2, x_3, x_4) = x_1 \bar{x}_2 x_3 \vee \bar{x}_1 \bar{x}_2 x_4 \vee x_2 x_3 x_4 \vee \bar{x}_2 x_3 x_4.$

Again, absorption cannot be applied to $\phi'$. The application of consensus to the last
two terms of $\phi'$ transforms it into

$\phi''(x_1, x_2, x_3, x_4) = x_1 \bar{x}_2 x_3 \vee \bar{x}_1 \bar{x}_2 x_4 \vee x_2 x_3 x_4 \vee \bar{x}_2 x_3 x_4 \vee x_3 x_4.$

Now the last term of $\phi''$ absorbs the two previous terms, and $\phi''$ is transformed
into

$\phi'''(x_1, x_2, x_3, x_4) = x_1 \bar{x}_2 x_3 \vee \bar{x}_1 \bar{x}_2 x_4 \vee x_3 x_4.$

Here, the consensus procedure stops. Note that the first two terms of $\phi'''$ actually
have a consensus, but it is absorbed by the last term. □
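The procedure of Figure 3.1 can be sketched in Python as follows. This is only an illustrative sketch: the encoding of a term as a set of signed integers (i for the literal x_i, -i for its complement) and the names `consensus` and `consensus_procedure` are assumptions made here, not notation from the text. Input terms are assumed to be elementary conjunctions, that is, no term contains a variable together with its complement.

```python
from itertools import combinations

def consensus(c, d):
    """Consensus of two terms, or None unless they conflict in exactly one variable."""
    conflicts = [lit for lit in c if -lit in d]
    if len(conflicts) != 1:
        return None
    x = conflicts[0]
    return (c - {x}) | (d - {-x})

def consensus_procedure(terms):
    """Apply absorption and consensus until neither operation applies."""
    phi = {frozenset(t) for t in terms}
    while True:
        # Absorption: c absorbs d when the literals of c form a proper subset of d.
        phi = {d for d in phi if not any(c < d for c in phi)}
        # Consensus: look for one consensus that no term of phi absorbs.
        new_term = None
        for c, d in combinations(phi, 2):
            r = consensus(c, d)
            if r is not None and not any(e <= r for e in phi):
                new_term = r
                break
        if new_term is None:
            return phi
        phi.add(new_term)

# The DNF of Example 3.3 in the signed-integer encoding.
example = [{1, -2, 3}, {-1, -2, 4}, {2, 3, 4}]
result = consensus_procedure(example)
```

On the DNF of Example 3.3, the sketch should return the three terms obtained at the end of the example, regardless of the order in which pairs of terms are examined.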

We have already observed that the operations of absorption and consensus
transform DNFs, but do not change the Boolean functions that they represent.
This is implied by the two lemmas below, which easily follow from the basic
Boolean identities (see also Theorem 2.8).
Lemma 3.1. For any two elementary conjunctions C and CD,

C ∨ CD = C.

Lemma 3.2. For any two elementary conjunctions $xC$ and $\bar{x}D$,

$xC \vee \bar{x}D = xC \vee \bar{x}D \vee CD.$
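One possible way to verify Lemma 3.2 from Lemma 3.1 is sketched below; the derivation is added here for illustration and is not taken from the text. It splits the consensus term CD according to the value of x and lets each half be absorbed by one of the two parent terms.

```latex
\begin{align*}
CD &= (x \vee \bar{x})\,CD = xCD \vee \bar{x}CD, \\
xC \vee \bar{x}D \vee CD
   &= (xC \vee xCD) \vee (\bar{x}D \vee \bar{x}CD)
    = xC \vee \bar{x}D .
\end{align*}
```

The last equality applies Lemma 3.1 twice, since $xC$ absorbs $xCD$ and $\bar{x}D$ absorbs $\bar{x}CD$.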

The importance of the consensus procedure in the theory of Boolean functions
derives from the following theorem, which asserts the correctness of Procedure
Consensus*.
Theorem 3.5. Given an arbitrary DNF φ of a Boolean function f , the consensus
procedure applied to φ produces the complete DNF of f , that is, the disjunction
of all prime implicants of f .
In view of its crucial role, we shall provide two alternative proofs of this the-
orem. In order to present the first proof, we start by establishing two technical
lemmas.
Lemma 3.3. Given an arbitrary DNF φ of a Boolean function f , a prime implicant
of f can involve only those variables that are present in φ.
Proof. If φ does not involve x, then the value of f does not change when only the
value of x changes. Therefore, implicants of f involving x cannot be prime. □

Lemma 3.4. Given an arbitrary DNF φ of a Boolean function f, if C is an
implicant of f that involves all variables present in φ, then C is absorbed by a
term in φ.
Proof. If C contains all the variables in φ, then the assignment that makes C = 1
assigns values to all variables in φ. Since C is an implicant of f, this assignment
makes φ = 1, and therefore at least one term in φ is 1. This term absorbs C. □

We are now ready to proceed with a first proof of the theorem.


Proof of Theorem 3.5. We prove the statement by contradiction. Let us assume
that there exists a Boolean function f and a DNF φ of f such that, when the
consensus procedure is applied to φ, it returns a DNF ψ that does not contain
some prime implicant $C_0$ of f. Lemma 3.3 implies that $C_0$ involves only variables
present in ψ. Let us consider the set S of elementary conjunctions C satisfying
the following three conditions:
1. C only involves variables present in ψ.
2. $C \le C_0$ (and therefore C is an implicant of f).
3. C is not absorbed by any term in ψ.
The set S is not empty, since $C_0$ satisfies all three conditions. Let $C^m$ be a
term of maximum degree in S. Since $C^m$ is not absorbed by any term in ψ, by
Lemma 3.4, $C^m$ cannot involve all the variables present in ψ. Let x be a variable
present in ψ and not present in $C^m$. The degree of the elementary conjunctions
$xC^m$ and $\bar{x}C^m$ exceeds that of $C^m$. Since the degree of $C^m$ was assumed to be
maximum, $xC^m$ and $\bar{x}C^m$ do not belong to S, and therefore cannot satisfy all three
conditions. Since they obviously satisfy the first two conditions, they must violate
the last one, namely, there must exist terms $C'$ and $C''$ in ψ such that $xC^m \le C'$ and
$\bar{x}C^m \le C''$. Since $C^m$ is not absorbed by either $C'$ or $C''$, it follows that $C' = xD'$
and $C'' = \bar{x}D''$, where $D'$ and $D''$ are elementary conjunctions that absorb $C^m$. This
implies that $D'$ and $D''$ do not conflict in any variable. Therefore, the consensus of
$C'$ and $C''$ exists: It is $D'D''$, and this term absorbs $C^m$. Since the consensus
procedure stops on the DNF ψ, there must exist a term $C'''$ in ψ that absorbs $D'D''$.
Then $C'''$ must also absorb $C^m$, contradicting the assumption that $C^m$ belongs
to S. □

Before we proceed with the second proof of Theorem 3.5, we first establish
a lemma. Recall from Section 1.9 that, if A and B are two disjoint subsets of
$\{1, \ldots, n\}$, then the set of all Boolean vectors in $\mathcal{B}^n$ whose coordinates in A are
fixed at 1 and whose coordinates in B are fixed at 0, forms a subcube of $\mathcal{B}^n$. This
subcube is denoted by $T_{A,B}$.
Lemma 3.5. Let φ be a DNF closed under consensus and let $T_{A,B}$ be a subcube.
The equation φ(X) = 0 has a solution in $T_{A,B}$ if and only if no term of φ is
identically 1 on $T_{A,B}$.
Proof. The “only if” part of the statement is trivial. Let us prove the “if” part by
contradiction. Assume that $T_{A^*,B^*}$ is a subcube such that no term of φ is identically
1 on $T_{A^*,B^*}$, and such that no solution of φ(X) = 0 exists in $T_{A^*,B^*}$; moreover,
assume that $|A^* \cup B^*|$ has maximum cardinality among all subcubes satisfying
these conditions. Clearly, $|A^* \cup B^*| \le n - 1$, since the statement trivially holds
when $|A \cup B| = n$.
Let us select an arbitrary variable $x_i$ such that $i \notin A^* \cup B^*$. Since each of the two
subcubes $T_{A^* \cup \{i\},B^*}$ and $T_{A^*,B^* \cup \{i\}}$ is a subset of $T_{A^*,B^*}$, no solution of the equation
φ(X) = 0 exists either in $T_{A^* \cup \{i\},B^*}$ or in $T_{A^*,B^* \cup \{i\}}$. It follows, then, from the
maximality of $|A^* \cup B^*|$ that, on each of the two subcubes $T_{A^* \cup \{i\},B^*}$ and $T_{A^*,B^* \cup \{i\}}$,
at least one of the terms of φ is identically 1. Obviously, one of these terms must
involve the literal $x_i$, while the other one must involve $\bar{x}_i$. Let the two terms in
question be $x_i C$ and $\bar{x}_i D$. Clearly, C and D are elementary conjunctions that are
both identically 1 on the subcube $T_{A^*,B^*}$. Therefore, C and D cannot conflict, and
hence the consensus CD of $x_i C$ and $\bar{x}_i D$ exists. Since φ is closed under consensus,
it must contain a term E that absorbs CD. Then this term E must be identically 1
on the subcube $T_{A^*,B^*}$, contradicting our assumption. □

Lemma 3.6. A DNF is closed under consensus if and only if it contains all prime
implicants of the Boolean function it represents.

Proof. We first prove the “if ” part of the statement. It follows from Lemma 3.2 that
the consensus of any two terms of a DNF φ is an implicant of the Boolean function
f represented by φ. This implies in turn that, if φ contains all prime implicants of
f , then it is closed under consensus.
To prove the “only if” part of the lemma, let us assume that a DNF φ representing
the Boolean function f is closed under consensus, and that the conjunction
$C = \bigwedge_{i \in P} x_i \bigwedge_{i \in N} \bar{x}_i$ is a prime implicant of f not contained in φ. Clearly, the partial
assignment defined by

$x_i = 1$ for $i \in P$ and $x_i = 0$ for $i \in N$

makes no term in φ identically 1, since such a term would absorb C. Therefore,
by Lemma 3.5, there exists a solution $X^*$ to φ(X) = 0 such that $X^* \in T_{P,N}$. Then
$C(X^*) = 1$, while $\phi(X^*) = 0$, contradicting the assumption that C is an implicant
of f. □

We are now ready to present the next proof.

Proof of Theorem 3.5. Let $\phi'$ be the DNF produced by the consensus procedure
applied to the given DNF $\phi$. This DNF $\phi'$ is closed under absorption and consensus.
By Lemma 3.6, $\phi'$ contains all the prime implicants of f. Since every implicant
of f is absorbed by a prime implicant of f, it follows that $\phi'$ is the complete DNF
of f. □

Theorem 3.5 is equivalent to the following statement:

Corollary 3.2. A DNF is closed under consensus and absorption if and only if it
is the complete DNF of the function it represents.

This statement is frequently used to check whether or not a DNF is complete.
The following corollary shows that the completeness of a DNF can be verified in
polynomial time. Recall that ||φ|| denotes the number of terms in a DNF φ.

Corollary 3.3. Given a DNF φ of a Boolean function f, one can check in
$O(n\|\phi\|^3)$ time whether φ is the complete DNF of f.

Proof. Given two terms, one can check in O(n) time whether one is absorbed
by the other. Therefore, one can check in $O(n\|\phi\|^2)$ time whether the absorption
operation can be applied to φ.
Given two terms, checking the existence of their consensus and producing it can
be done in O(n) time. Since there are $\binom{\|\phi\|}{2}$ pairs of terms of φ, it can be checked
in $\binom{\|\phi\|}{2}\, O(n\|\phi\|) = O(n\|\phi\|^3)$ time whether every consensus of two terms of φ
is absorbed by another term of φ. □
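The test described in this proof can be sketched in Python as follows. The sketch is illustrative only: the signed-integer encoding of literals (i for x_i, -i for its complement) and the name `is_complete_dnf` are assumptions introduced here. It performs the two checks of Corollary 3.2, namely closure under absorption and closure under consensus.

```python
from itertools import combinations

def consensus(c, d):
    """Consensus of two terms, or None unless they conflict in exactly one variable."""
    conflicts = [lit for lit in c if -lit in d]
    if len(conflicts) != 1:
        return None
    x = conflicts[0]
    return (c - {x}) | (d - {-x})

def is_complete_dnf(terms):
    """Check closure under absorption and under consensus (Corollary 3.2)."""
    phi = {frozenset(t) for t in terms}
    # Closed under absorption: no term is a proper subset of another term.
    if any(c < d for c in phi for d in phi):
        return False
    # Closed under consensus: every existing consensus is absorbed by a term of phi.
    for c, d in combinations(phi, 2):
        r = consensus(c, d)
        if r is not None and not any(e <= r for e in phi):
            return False
    return True

initial = [{1, -2, 3}, {-1, -2, 4}, {2, 3, 4}]   # the DNF phi of Example 3.3
final = [{1, -2, 3}, {-1, -2, 4}, {3, 4}]        # the DNF reached at the end of Example 3.3
```

The loop over pairs, with one absorption scan per consensus, mirrors the counting argument of the proof.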

The next corollary is essentially due to Robinson [787]. It shows how a solution
of a consistent DNF equation can be efficiently computed once the prime implicants
of the DNF are available.

Corollary 3.4. If a DNF φ is closed under consensus, then one can find a solution
of the equation φ(X) = 0, or prove that the equation is inconsistent, in $O(n^2\|\phi\|)$
time.

Proof. By Lemma 3.6, the equation φ(X) = 0 is inconsistent if and only if 1 is one
of the terms of φ. If this is not the case, then a solution of the equation is obtained
by a simple “greedy” procedure: Fix successively the variables x1 , x2 , . . . , xn to
either 0 or 1, while avoiding making any term in the DNF identically equal to 1.
Indeed, Lemma 3.5 implies that this procedure is correct, since any DNF that is
closed under consensus will remain closed under consensus after substituting any
Boolean values for any of the variables. The time bound follows from the fact that
substituting a value for a variable in any of the DNFs obtained in the process of
fixing variables can be done in O(n||φ||) time. □
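The greedy procedure of this proof might be sketched in Python as follows. The signed-integer encoding of literals and the name `solve_closed_dnf` are assumptions of this sketch. The input is assumed to be closed under consensus; the sketch raises an error if it detects that this assumption fails, but it does not verify closure in general.

```python
def solve_closed_dnf(terms, n):
    """Return a point X with phi(X) = 0, or None if phi is identically 1.

    Assumes `terms` is a DNF closed under consensus over variables 1..n.
    """
    phi = {frozenset(t) for t in terms}
    if frozenset() in phi:            # the constant term 1 is present
        return None
    x = []
    for i in range(1, n + 1):
        for value, lit in ((1, i), (0, -i)):
            # Fixing x_i := value makes `lit` true: terms containing the
            # opposite literal vanish, the other terms lose `lit`.
            restricted = {t - {lit} for t in phi if -lit not in t}
            if frozenset() not in restricted:   # no term became identically 1
                phi = restricted
                x.append(value)
                break
        else:
            raise ValueError("input does not appear to be closed under consensus")
    return tuple(x)

closed = [{1, -2, 3}, {-1, -2, 4}, {3, 4}]   # complete DNF from Example 3.3
solution = solve_closed_dnf(closed, 4)
```

Each variable is tried at 1 first and at 0 second; by Lemma 3.5, at least one of the two choices keeps every term from becoming identically 1 when the input is closed under consensus.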

Variable depletion
A streamlined version of the consensus procedure called variable depletion was
proposed by Blake [99] and later by Tison [864]. This method organizes the
consensus procedure in the following way: First, a starting variable $x_{i_1}$ is chosen,
and all possible consensuses are formed using pairs of terms that conflict in $x_{i_1}$.
After completing this stage and removing all absorbed terms, another variable $x_{i_2}$
is chosen, and all consensuses on $x_{i_2}$ are produced. The process is repeated in the
same way until all variables have been exhausted.
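A minimal Python sketch of variable depletion, under the same illustrative assumptions as before (terms are sets of signed integers, i for x_i and -i for its complement; the function names are hypothetical), could look as follows. The variable order is passed explicitly, since the method allows any order.

```python
def consensus(c, d):
    """Consensus of two terms, or None unless they conflict in exactly one variable."""
    conflicts = [lit for lit in c if -lit in d]
    if len(conflicts) != 1:
        return None
    x = conflicts[0]
    return (c - {x}) | (d - {-x})

def variable_depletion(terms, order):
    """Process the variables in the given order, one consensus stage per variable."""
    phi = {frozenset(t) for t in terms}
    for x in order:
        positive = [t for t in phi if x in t]
        negative = [t for t in phi if -x in t]
        # Form every consensus of a pair of terms conflicting in x.
        for c in positive:
            for d in negative:
                r = consensus(c, d)
                if r is not None:
                    phi.add(r)
        # Remove all absorbed terms before moving to the next variable.
        phi = {d for d in phi if not any(c < d for c in phi)}
    return phi

example = [{1, -2, 3}, {-1, -2, 4}, {2, 3, 4}]          # Example 3.3
forward = variable_depletion(example, [1, 2, 3, 4])
backward = variable_depletion(example, [4, 3, 2, 1])
```

A consensus on x never contains x, so the terms created during a stage cannot give rise to further pairs conflicting in the variable of that stage.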
The surprising fact, perhaps, is that after the stage based on an arbitrary variable
xi is completed, there is no need later on to apply again the consensus operation to
any pair of terms conflicting in xi . Before proving the correctness of this method,
we first establish the following lemma (which extends Theorem 2.6).

Lemma 3.7. Let f be a Boolean function depending on the variables $x_1, x_2, \ldots, x_n$,
and let g, h, and l be Boolean functions depending on the variables $x_1, x_2, \ldots, x_{n-1}$
such that:

$f = x_n g \vee \bar{x}_n h \vee l.$

A conjunction C not depending on $x_n$ is an implicant (prime implicant) of f if and
only if it is an implicant (prime implicant) of

$f' = (g \vee l)(h \vee l) = gh \vee l.$

Proof. Since $f' \le f$, it follows that every implicant of $f'$ is an implicant of f.
Conversely, let us assume that C is an implicant of f that does not depend on $x_n$
and let $X^*$ be any point in $\mathcal{B}^{n-1}$. If $C(X^*) = 1$, then $f(X^*, 0) = f(X^*, 1) = 1$ (since
C is an implicant of f), or equivalently $h(X^*) \vee l(X^*) = g(X^*) \vee l(X^*) = 1$, and
hence $f'(X^*) = 1$. Thus, C is an implicant of $f'$.
Furthermore, if C is a prime implicant of f but not a prime implicant of $f'$, then
there exists another implicant $C'$ of $f'$ such that $C < C'$. Since every implicant of
$f'$ is an implicant of f, $C' > C$ is an implicant of f, contradicting the assumption
that C is a prime implicant of f.
A similar reasoning shows that every prime implicant of $f'$ is also a prime
implicant of f. □

Lemma 3.8. Let f be a Boolean function depending on the variables $x_1, x_2, \ldots, x_n$,
and let g, h, and l be Boolean functions depending on the variables $x_1, x_2, \ldots, x_{n-1}$
such that:

$f = x_n g \vee \bar{x}_n h \vee l.$

A conjunction $x_n C$ is an implicant (prime implicant) of f if and only if it is an
implicant (prime implicant) of

$f'' = x_n g \vee l.$
The proof of this statement is analogous to that of Lemma 3.7, and is therefore
omitted.
We are now ready to formally state and prove the correctness of the method of
variable depletion.
Theorem 3.6. Given an arbitrary DNF φ of a Boolean function f , the method of
variable depletion applied to φ produces the complete DNF of f .
Proof. Let us call a variable x non-unate in a DNF φ if φ contains both a term $xC$
and a term $\bar{x}C'$. Let us prove the theorem by induction on the number of non-unate
variables in the given DNF φ. If φ contains just one non-unate variable, then the
variable depletion procedure stops after one step, and the resulting DNF is closed
under consensus and absorption. Corollary 3.2 implies that this resulting DNF is
the complete DNF of f, thus proving the basis of induction.
Let us assume now that the theorem holds if the number of non-unate variables
is at most n − 1, and let φ contain n non-unate variables. Let $x_n$ be the first variable
used in the variable depletion procedure, and let us represent the DNF φ of f as

$\phi = x_n \phi_1 \vee \bar{x}_n \phi_0 \vee \phi_2,$

where the DNFs $\phi_0$, $\phi_1$, and $\phi_2$ do not depend on $x_n$.
Note that the first step of the variable depletion procedure will generate all the
terms of the conjunction $\phi_0 \phi_1$. Therefore, the DNF $\phi'$ produced after the first step
of variable depletion is

$\phi' = x_n \phi_1 \vee \bar{x}_n \phi_0 \vee \phi_2 \vee \phi_0 \phi_1.$

Although some absorptions may be possible in $\phi'$, no term of $\phi_2 \vee \phi_0 \phi_1$ can be
absorbed by a term of $x_n \phi_1 \vee \bar{x}_n \phi_0$. It follows from Lemma 3.7 that every prime
implicant of f that does not depend on $x_n$ is a prime implicant of $\phi_2 \vee \phi_0 \phi_1$. Note
that the DNF $\phi_2 \vee \phi_0 \phi_1$ has at most n − 1 non-unate variables. By the inductive
assumption, the variable depletion procedure applied to $\phi_2 \vee \phi_0 \phi_1$ will generate all
such prime implicants of f. All these prime implicants will also be generated by the
variable depletion procedure applied to $\phi'$. Indeed, since $x_n$ was already “depleted”,
every consensus in this latter procedure, which involves a term depending on $x_n$,
must result in a term depending on $x_n$. Additionally, a term not depending on $x_n$
cannot be absorbed by a term depending on $x_n$. Thus, all prime implicants of f
not depending on $x_n$ will be generated by the variable depletion procedure.
Applying Lemma 3.8 to the DNF $\phi'$, one can see that a term $x_n C$ is a prime
implicant of f if and only if it is a prime implicant of $x_n \phi_1 \vee \phi_2 \vee \phi_0 \phi_1$. Note
that this DNF has at most n − 1 non-unate variables. By the inductive assumption,
the variable depletion procedure applied to this DNF will generate all those prime
implicants of f that have the form $x_n C$. All these prime implicants will also
be generated by the variable depletion procedure applied to $\phi'$. Indeed, since $x_n$
was already “depleted”, every consensus in this latter procedure that involves
a term containing $\bar{x}_n$ must result in a term containing $\bar{x}_n$. Additionally, a term
not containing $\bar{x}_n$ cannot be absorbed by a term containing $\bar{x}_n$. Thus, all prime
implicants of f having the form $x_n C$ will be generated by the variable depletion
procedure.
The case of prime implicants of f having the form $\bar{x}_n C$ is completely analogous
to the above. □

Term disengagement
Another interesting variant of the consensus procedure based on term disengage-
ment was introduced by Tison [864], who proved that it works for arbitrary DNFs;
it was subsequently generalized by Pichat [747] to more abstract lattice-theoretic
structures. The Disengagement Consensus procedure is described in Figure 3.2.
It relies on the following principles:

Definition 3.2. A consensus algorithm is said to be a (term) disengagement
algorithm if it maintains a list L of implicants and proceeds in successive stages,
where

(i) at each stage a term C in the current list L is selected and all possible
consensuses of C with all other terms of L are generated;
(ii) each of the newly generated terms is checked for absorption by any other
(old or new) existing term; if it is not absorbed, then it is added to L;
(iii) the term C can no longer be chosen as a parent (it is “disengaged”) in any
subsequent stages, although it can still absorb some new terms.

Procedure Disengagement Consensus(φ)

Input: A DNF expression $\phi = \bigvee_{k=1}^{m} C_k$ of a Boolean function f.
Output: The list L of all prime implicants of f.

begin
    L := $(C_1, C_2, \ldots, C_m)$, the list of terms of φ;
    declare all terms C in L to be engaged;
    while L contains some engaged term do
        select an engaged term C;
        declare C to be disengaged;
        generate all possible consensuses of C and the other terms of L;
        let R be the list of all such consensuses;
        L := L ∪ R;
        for each $C'$ in R
            if $C'$ is not absorbed by another term in L
                then add $C'$ to L and declare $C'$ to be engaged;
    end while
    return L;
end

Figure 3.2. Procedure Disengagement Consensus

We refer to Tison [864] and Pichat [747] for a proof of correctness of the
disengagement procedure.
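The three principles of Definition 3.2 can be illustrated by the following Python sketch. It is not a transcription of Figure 3.2: terms are encoded as sets of signed integers (i for x_i, -i for its complement), engaged terms are selected in first-in, first-out order, a new consensus is added to the list only when no listed term absorbs it, and a final absorption pass is added so that only non-absorbed terms are returned. All of these choices, as well as the function names, are assumptions of this sketch.

```python
def consensus(c, d):
    """Consensus of two terms, or None unless they conflict in exactly one variable."""
    conflicts = [lit for lit in c if -lit in d]
    if len(conflicts) != 1:
        return None
    x = conflicts[0]
    return (c - {x}) | (d - {-x})

def disengagement_consensus(terms):
    """Each term is used exactly once as a parent, then disengaged."""
    listed = []
    for t in terms:
        t = frozenset(t)
        if t not in listed:
            listed.append(t)
    engaged = list(listed)
    while engaged:
        c = engaged.pop(0)                 # select an engaged term and disengage it
        for d in list(listed):             # all other terms currently in the list
            if d == c:
                continue
            r = consensus(c, d)
            # Keep the consensus only if no listed term (old or new) absorbs it.
            if r is not None and not any(e <= r for e in listed):
                listed.append(r)
                engaged.append(r)
    # Final absorption pass (an addition of this sketch).
    return {t for t in listed if not any(e < t for e in listed)}

example = [{1, -2, 3}, {-1, -2, 4}, {2, 3, 4}]   # Example 3.3
result = disengagement_consensus(example)
```

A disengaged term stays in the list, so it can still absorb later consensuses and can still serve as the second parent when a newer engaged term is selected.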

3.2.3 Generation from a DNF representation: Complexity


In this subsection, we are going to discuss the computational complexity of gen-
erating the prime implicants of a Boolean function given in DNF. The most basic
computational problem simply consists of checking whether a given elementary
conjunction is an implicant of a Boolean function represented by a DNF:

Implicant Recognition
Instance: An elementary conjunction C and a Boolean function f in DNF.
Question: Is C an implicant of f ?

Theorem 3.7. The Implicant Recognition problem is co-NP-complete.

Proof. Clearly, the problem belongs to the class co-NP, since one can easily check
in polynomial time whether a Boolean point gives value 1 to the elementary
conjunction C and value 0 to the function f .
A DNF equation φ = 0 is inconsistent if and only if the empty conjunction
C = 1 is an implicant of the function represented by φ. Since the DNF equation
problem is NP-complete, it follows that the implicant recognition problem is
co-NP-complete. □

Theorem 3.7 already suggests that generating the prime implicants of a function
given in DNF cannot be an easy task. Moreover, Theorem 3.17 will show that the
number of prime implicants (that is, the length of the output) may be exponential in
the length of the initial DNF (the input). Therefore, the computational complexity
of prime implicant generation algorithms should be measured in terms of the sizes
of their input and of their output (see Appendix B for a more detailed discussion
of list-generation algorithms).
In fact, if it were possible to design a prime implicant generation algorithm
that runs in polynomial total time (that is, polynomial in the combined sizes of the
input and of the output), then this algorithm could be used to solve DNF equations
in polynomial time, since the only prime implicant of a tautology is the constant
1 (see the proof of Theorem 3.7). This, of course, is not to be expected.
The next theorems will show that the computational complexity of the DNF
equation problem actually is the main stumbling block on the way to the efficient
recognition and generation of prime implicants. In order to state these results, let
us recall that |φ| denotes the length (that is, the number of literals) of a DNF φ,
and let t(L) denote the computational complexity of solving a DNF equation of
length at most L.

Theorem 3.8. For any Boolean function f , any DNF φ representing f , and any
elementary conjunction C, one can check in O(|φ|) + t(|φ|) time whether C is an
implicant of f .

Proof. By definition, an elementary conjunction $C = \bigwedge_{i \in A} x_i \bigwedge_{j \in B} \bar{x}_j$ is an
implicant of f if and only if the restriction of f to the subcube $T_{A,B}$ is a tautology. The
latter property can be checked in O(|φ|) + t(|φ|) time by fixing $x_i$ to 1, for all
i ∈ A, and $x_j$ to 0, for all j ∈ B, in the DNF φ, and by solving the resulting DNF
equation. □

Corollary 3.5. For any Boolean function f , any DNF φ representing f , and any
implicant C of f , a prime implicant absorbing C can be constructed in O(|C|(|φ|+
t(|φ|))) time.

Proof. If C is not prime, then there must exist a literal in C such that the elementary
conjunction obtained from C by removing this literal remains an implicant of f .
By Theorem 3.8, this process can be carried out in O(|φ|) + t(|φ|) time for every
literal in C. □
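The constructions behind Theorem 3.8 and Corollary 3.5 can be sketched in Python as follows. The names `restrict`, `is_tautology`, `is_implicant`, and `to_prime` and the signed-integer encoding of literals are assumptions of this sketch. In particular, `is_tautology` is a brute-force stand-in for a DNF equation solver of complexity t(·); it enumerates all assignments and is meant only for small examples.

```python
from itertools import product

def restrict(terms, fixed):
    """Fix every literal in `fixed` to true and return the resulting DNF."""
    result = []
    for t in terms:
        if any(-lit in t for lit in fixed):
            continue                       # the term becomes identically 0
        result.append(frozenset(t) - fixed)
    return result

def is_tautology(terms):
    """Brute-force check that the equation phi = 0 is inconsistent."""
    variables = sorted({abs(lit) for t in terms for lit in t})
    for values in product((0, 1), repeat=len(variables)):
        true_literals = {v if b else -v for v, b in zip(variables, values)}
        if not any(t <= true_literals for t in terms):
            return False
    return True

def is_implicant(c, terms):
    """Theorem 3.8: C is an implicant iff the restriction to its subcube is a tautology."""
    return is_tautology(restrict(terms, frozenset(c)))

def to_prime(c, terms):
    """Corollary 3.5: drop literals one at a time while C remains an implicant."""
    c = frozenset(c)
    for lit in sorted(c):
        if is_implicant(c - {lit}, terms):
            c = c - {lit}
    return c

phi = [{1, -2, 3}, {-1, -2, 4}, {2, 3, 4}]       # Example 3.3
prime = to_prime({2, 3, 4}, phi)
```

A single pass over the literals suffices here: if removing a literal fails for some implicant, it also fails for every implicant obtained from it by removing further literals.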

Let us now denote by D(f ) the set of prime implicants of a Boolean function f .

Theorem 3.9. For any Boolean function f and any DNF φ representing f,
the set D(f) can be generated by an algorithm that solves $O(n|D(f)|^2)$ DNF
equations of length at most |φ|. If t(L) is the computational complexity of
solving DNF equations of length L, then the running time of this algorithm is
$O(n|D(f)|^2 (n|D(f)| + |\phi| + t(|\phi|)))$.

Proof. By Corollary 3.5, for every elementary conjunction C in φ, one can find a
prime implicant of f absorbing C in O(|C|(|φ| + t(|φ|))) time. In this way, φ can
be reduced to a prime DNF (i.e., a disjunction of prime implicants) $\phi'$. In order
to obtain the complete DNF of f, we are going to add prime implicants to $\phi'$, as
described below.
First, the terms of $\phi'$ are ordered arbitrarily, and the first term is marked. At
each step of the algorithm,

• the first unmarked term is compared to every marked term, and their
consensus, if any, is produced;
• for every consensus produced, a prime implicant of f absorbing it is found
(as in Corollary 3.5), and this prime implicant is added to $\phi'$ if it is not already
present.

After this, the term is marked, and the algorithm continues with the next unmarked
term of $\phi'$. The algorithm stops when all the terms of $\phi'$ are marked.
By construction, the resulting DNF $\phi'$ is closed under absorption and consensus.
Therefore, by Corollary 3.2, the output DNF $\phi'$ is complete.
Since the number of terms of $\phi'$ never exceeds |D(f)|, the number of steps of
the algorithm does not exceed |D(f)|. At each step, an unmarked term is compared
to at most |D(f)| marked terms, and for every consensus produced, a prime
implicant of f absorbing it can be found in O(n(|φ| + t(|φ|))) time. Finally, it
can be checked in O(n|D(f)|) time whether a prime implicant is already present
in $\phi'$. Therefore, the total running time of the algorithm is
$O(n|D(f)|^2 (n|D(f)| + |\phi| + t(|\phi|)))$. □
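The marking algorithm of this proof might be sketched in Python as follows. The signed-integer encoding of literals and all function names are assumptions of this sketch, and `is_implicant` is a brute-force stand-in (it enumerates all points of the cube) for the DNF equation solver whose complexity the theorem denotes by t(·).

```python
from itertools import product

def evaluate(term, x):
    """Value of a term at the point x (literal i is x_i, -i its complement)."""
    return all((x[abs(lit) - 1] == 1) == (lit > 0) for lit in term)

def is_implicant(c, terms, n):
    """Brute-force stand-in for the DNF equation oracle of Theorem 3.8."""
    return all(any(evaluate(t, x) for t in terms)
               for x in product((0, 1), repeat=n) if evaluate(c, x))

def to_prime(c, terms, n):
    """Corollary 3.5: reduce an implicant to a prime implicant absorbing it."""
    c = frozenset(c)
    for lit in sorted(c):
        if is_implicant(c - {lit}, terms, n):
            c = c - {lit}
    return c

def consensus(c, d):
    conflicts = [lit for lit in c if -lit in d]
    if len(conflicts) != 1:
        return None
    x = conflicts[0]
    return (c - {x}) | (d - {-x})

def generate_prime_implicants(terms, n):
    # Reduce the input to a prime DNF, kept as an ordered list without repeats.
    phi = []
    for t in terms:
        p = to_prime(t, terms, n)
        if p not in phi:
            phi.append(p)
    marked = 1                              # the terms phi[:marked] are marked
    while marked < len(phi):
        c = phi[marked]                     # first unmarked term
        for d in phi[:marked]:
            r = consensus(c, d)
            if r is not None:
                p = to_prime(r, terms, n)
                if p not in phi:
                    phi.append(p)
        marked += 1
    return set(phi)

example = [{1, -2, 3}, {-1, -2, 4}, {2, 3, 4}]   # Example 3.3
primes = generate_prime_implicants(example, 4)
```

Because every term kept in the list is prime, the list never grows beyond |D(f)| terms, which is what bounds the number of steps in the proof.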

Note that the complexity of the algorithm in Theorem 3.9 depends not only
on the size of the output (namely, on |D(f)|), but also on the complexity t(|φ|)
of solving the DNF equation φ = 0. For arbitrary DNFs, we expect t(|φ|) to be
exponential in the input size.
It is natural, however, to consider the problem of generating the prime implicants
of a Boolean function represented by a DNF in the special case where the associated
DNF equation can be solved efficiently. More precisely, let us call a class C of DNFs
tractable if the DNF equation φ = 0 can be solved in polynomial time for every
DNF φ in C and for every DNF obtained by fixing variables to either 0 or 1 in
such a DNF. For instance, the class of quadratic DNFs and the class of Horn DNFs are
tractable (see Chapters 5 and 6).

Corollary 3.6. For every tractable class C, there exists a polynomial p(x, y) and
an algorithm that, for every DNF φ ∈ C, generates the set of prime implicants
D(f ) of the function represented by φ in polynomial total time p(|φ|, |D(f )|).
Proof. This statement follows immediately from the definition of a tractable class
and from the proof of Theorem 3.9. □

In the terminology of Johnson, Yannakakis, and Papadimitriou [538] and of
Appendix B, the algorithm mentioned in Corollary 3.6 actually runs in polynomial
incremental time. We leave this for the reader to verify.
Theorem 3.9 and Corollary 3.6 were originally established (in a slightly different
form) by Boros, Crama, and Hammer [112].

3.2.4 Generation from a CNF representation


Finally, we briefly discuss the prime implicant generation problem Prime Implicants
when the function is given as a CNF. It will follow from Theorem 3.18
that no method can produce all prime implicants of a function f given by a CNF φ
in time polynomial in ||φ||. Despite its computational intractability, this problem,
which is a special case of the dualization problem investigated in Chapter 4, plays
a major role in the theory of Boolean functions.
The term “dualization” is due to the fact that the dual expression $\phi^d$ is a DNF
representing the dual Boolean function $f^d$, and therefore, the problem of generating
all prime implicants of a function represented by a CNF is equivalent to the problem
of generating all prime implicants of the dual of a Boolean function represented by
a DNF. As a consequence, several dualization algorithms can be used to generate
all the prime implicants of a Boolean function represented by a CNF; we refer to
Chapter 4, in particular, Section 4.3, for a more thorough discussion of this topic.
In addition, the following fact is worth noticing. Suppose that A is a dualization
algorithm that, when applied to a DNF representation of a function f , produces
all the prime implicants of the dual function $f^d$. Then, the involution property of
Boolean functions (namely, $(f^d)^d = f$, see Theorem 1.2) makes it possible to use A
for generating all the prime implicants of f, by simply applying dualization twice;
namely, given any DNF φ representing f, apply A to φ to produce all the prime
implicants of $f^d$, and then apply again A to the complete DNF of $f^d$ to produce
all the prime implicants of f . This approach is sometimes known as the double
dualization method, and is usually attributed to Nelson [705]. From the point of
view of computational efficiency, it is clear that the advantages of the double
dualization method over the consensus procedure, if any, must be confined to
special situations.

3.3 Logic minimization


It was already observed in Chapter 1 that a Boolean function may have numerous
DNF representations (see, e.g., Example 1.16). It was also mentioned there that in
some applications a “short” DNF representation of a Boolean function is preferred
over a longer one (see Section 1.13.1, describing how a system of implication rules
in artificial intelligence can be replaced by a logically equivalent one, containing
fewer and simpler rules).
The problem of constructing a short DNF representation of a Boolean function
is usually referred to as the problem of logic minimization, or two-level logic min-
imization, or Boolean function minimization. This problem was originally studied
within the context of electrical and computer engineering (see Section 1.13.2),
where logic minimization is used to reduce the number of electronic components
in a switching circuit that realizes a Boolean function. We refer, for instance, to
Coudert [221]; Coudert and Sasao [222]; Czort [249]; Sasao [804]; Umans, Villa,
and Sangiovanni-Vincentelli [877]; or Villa, Brayton, and Sangiovanni-Vincentelli
[891] for surveys.
The complexity of a DNF φ can be measured in several ways. The two most
popular measures used in logic minimization are ||φ|| (the number of terms) and
|φ| (the number of literals) in φ. Note that in other areas of the theory of Boolean
functions, different measures of DNF complexity can be more relevant, such as
for instance the degree of φ, that is, the largest number of literals in a term of φ.
Let us remark that a ||φ||-minimizing DNF must be irredundant, while a |φ|-
minimizing DNF must be both irredundant and prime (see Definition 1.30, for the
terminology). On the other hand, an arbitrary prime irredundant DNF of a Boolean
function may be neither ||φ||-minimizing nor |φ|-minimizing.

Example 3.4. Consider the Boolean function f (x1 , x2 , x3 ) represented by the DNF

$\phi_1 = x_1 \bar{x}_2 \vee \bar{x}_1 x_2 \vee x_1 \bar{x}_3 \vee \bar{x}_1 x_3.$

This DNF is neither ||φ||-minimizing nor |φ|-minimizing, since f can also be
represented by the DNF

$\phi_2 = x_1 \bar{x}_2 \vee \bar{x}_1 x_3 \vee x_2 \bar{x}_3,$

which has both fewer terms and fewer literals. □
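The claim of Example 3.4 can be checked by brute force over the eight points of the cube, as in the following Python sketch. The signed-integer encoding of literals (i for x_i, -i for its complement) and the helper names are assumptions made for this illustration.

```python
from itertools import product

def evaluate_dnf(terms, x):
    """Value of a DNF at the point x."""
    return any(all((x[abs(lit) - 1] == 1) == (lit > 0) for lit in t) for t in terms)

phi1 = [{1, -2}, {-1, 2}, {1, -3}, {-1, 3}]
phi2 = [{1, -2}, {-1, 3}, {2, -3}]

# Both DNFs represent the same function if they agree on every point.
equivalent = all(evaluate_dnf(phi1, x) == evaluate_dnf(phi2, x)
                 for x in product((0, 1), repeat=3))

num_terms = (len(phi1), len(phi2))                                   # ||phi||
num_literals = (sum(map(len, phi1)), sum(map(len, phi2)))            # |phi|
```

The two measures computed at the end correspond to ||φ|| and |φ| for each of the two DNFs.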

In our discussion of logic minimization, to avoid unnecessary technical
complications, we shall focus on finding ||φ||-minimizing DNFs. Clearly, if a
||φ||-minimizing DNF is not prime, then this DNF can be simplified further by
reducing each of its nonprime terms to a prime one. Therefore, in this section we
limit our attention to those ||φ||-minimizing DNFs that are not only irredundant
but also prime. (The reader should note at this point, however, that it may already
be quite hard to recognize whether an arbitrary DNF is irredundant, or whether it
is prime; see Exercises 8 and 9 at the end of this chapter.)
Finally, we shall also need to distinguish among different versions of the logic
minimization problem, depending on the format of its input. Accordingly, we
formally define the following algorithmic problems:

(T , F ) ||φ||-minimization
Instance: The complete truth table of a Boolean function f .
Output: A prime ||φ||-minimizing DNF of f .

T ||φ||-minimization
Instance: The list of true points of a Boolean function f or, equivalently, the
minterm DNF expression of f .
Output: A prime ||φ||-minimizing DNF of f .

minT ||φ||-minimization
Instance: The list of prime implicants of a Boolean function f or, equivalently,
the complete DNF of f .
Output: A prime ||φ||-minimizing DNF of f .

||φ||-minimization
Instance: An arbitrary DNF expression of a Boolean function f .
Output: A prime ||φ||-minimizing DNF of f .

3.3.1 Quine-McCluskey approach: Logic minimization as set covering


We first present a fundamental result due to McCluskey [633] and to Quine [766],
which will allow us to reformulate logic minimization problems as set covering
problems.
Let us assume that a Boolean function f (x1 , x2 , . . . , xn ) is represented by the
set T (f ) of its true points. As shown in Section 3.2.1 (Theorem 3.4), all the prime
implicants C1 , C2 , . . . , Ck of f (x1 , x2 , . . . , xn ) can be generated in time polynomial
in n|T (f )|.
Let us associate a 0, 1-variable si with each of the prime implicants Ci , i =
1, 2, . . . , k: The interpretation of these variables will be that si = 1 if Ci is retained
in the construction of a DNF on the collection of terms {C1 , C2 , . . . , Ck }. More
formally, every Boolean point $S = (s_1, s_2, \ldots, s_k) \in \mathcal{B}^k$ defines a DNF

$\phi_S(x_1, x_2, \ldots, x_n) = \bigvee_{i:\, s_i = 1} C_i = \bigvee_{i=1}^{k} s_i C_i.$    (3.2)

Since $\bigvee_{i=1}^{k} C_i$ is the complete DNF of the Boolean function f, for every Boolean
vector S we have:

$\phi_S(x_1, x_2, \ldots, x_n) \le f(x_1, \ldots, x_n).$    (3.3)

Clearly, every prime DNF of f corresponds to a vector S for which the inequal-
ity (3.3) holds as an equality and, conversely, every vector S for which (3.3)
becomes an equality defines a prime DNF of f . It follows from (3.3) that to char-
acterize those vectors S that correspond to the prime DNFs of f , it is sufficient to
characterize those S for which the reverse inequality

$\phi_S(x_1, x_2, \ldots, x_n) \ge f(x_1, \ldots, x_n)$    (3.4)

also holds. Moreover, the inequality (3.4) can be reformulated as a system of
|T(f)| linear inequalities in the variables $s_i$, i = 1, . . . , k:

$\sum_{i=1}^{k} s_i C_i(X) \ge 1$, for all $X \in T(f)$.    (3.5)

We now consider the Boolean function π(s1 , s2 , . . . , sk ) that takes value 1 exactly
on those points S = (s1 , s2 , . . . , sk ) for which the system of inequalities (3.5) holds
or, equivalently, on those points S for which (3.2) defines a prime DNF of f .
This function is known in the literature as the Petrick function associated with
f (see [744]). A CNF representation of the Petrick function follows directly
from (3.5):

\[
\pi(s_1, \ldots, s_k) = \bigwedge_{X \in T(f)} \Bigl( \bigvee_{i=1}^{k} s_i\, C_i(X) \Bigr). \tag{3.6}
\]

This CNF representation clearly shows that the Petrick function is positive, a fact
that can also be easily derived from its definition.
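As a minimal sketch of how the clauses of (3.6) can be read off from the data, the following Python fragment builds, for each true point X, the set of indices i with C_i(X) = 1. The encoding of a term as a dictionary from variable positions to required values, and the names `term_value` and `petrick_clauses`, are assumptions made for this illustration only. The sample function is 1 everywhere on B^3 except at (0, 0, 0) and (1, 1, 1).

```python
def term_value(term, point):
    """Evaluate an elementary conjunction at a Boolean point.

    `term` maps a 0-based variable position to the value (0 or 1) required
    by the corresponding literal; this encoding is our own choice.
    """
    return int(all(point[var] == val for var, val in term.items()))


def petrick_clauses(true_points, prime_implicants):
    """Return one clause per true point X: the indices i (1-based) with C_i(X) = 1."""
    return [
        frozenset(i for i, term in enumerate(prime_implicants, start=1)
                  if term_value(term, point))
        for point in true_points
    ]


# Illustrative data: all points of B^3 except (0,0,0) and (1,1,1) are true points.
true_points = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
prime_implicants = [
    {0: 1, 1: 0},  # x1 and not x2
    {0: 0, 1: 1},  # not x1 and x2
    {0: 1, 2: 0},  # x1 and not x3
    {0: 0, 2: 1},  # not x1 and x3
    {1: 1, 2: 0},  # x2 and not x3
    {1: 0, 2: 1},  # not x2 and x3
]
clauses = petrick_clauses(true_points, prime_implicants)
```

Each clause lists the variables s_i appearing in one factor of the CNF, so the positivity of the Petrick function is visible directly in this representation.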
By definition of the Petrick function, there is a one-to-one correspondence
between its positive implicants and the prime DNFs of the function f . Furthermore,
one can easily see that there is a one-to-one correspondence between the prime
implicants of the Petrick function and the prime irredundant DNFs of the function
f. But of course, in general, generating all the
prime implicants of the Petrick function is computationally prohibitive.
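To make the correspondence concrete, the prime implicants of a positive CNF can be obtained by multiplying out the clauses one at a time and applying absorption after each step. The sketch below is one simple way of doing this, not necessarily the dualization method the text has in mind; the function name and the six-clause input are illustrative assumptions, and the running time can be exponential, in line with the remark above.

```python
def positive_cnf_to_prime_implicants(clauses):
    """Multiply out a positive CNF, applying absorption after every clause.

    Each clause is an iterable of variable indices. The result is the set of
    prime implicants of the positive function, each given as a frozenset.
    """
    terms = {frozenset()}
    for clause in clauses:
        expanded = {term | {var} for term in terms for var in clause}
        # Absorption: drop every term that strictly contains another term.
        terms = {t for t in expanded if not any(o < t for o in expanded)}
    return terms


# A small positive CNF on the variables s1, ..., s6 (illustrative input).
clauses = [(1, 3), (2, 5), (4, 6), (3, 5), (1, 6), (2, 4)]
prime_implicants = positive_cnf_to_prime_implicants(clauses)
```

Each returned set selects the terms of one prime irredundant DNF of the underlying function.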
In view of the preceding discussion, the problem of finding a ||φ||-minimizing
DNF can be formulated as the problem of finding a minimum degree prime impli-
cant of the Petrick function. Alternatively, the same problem can be formulated as
the set covering problem
\[
\text{minimize} \quad \sum_{i=1}^{k} s_i \tag{3.7}
\]
\[
\text{subject to (3.5) and } (s_1, s_2, \ldots, s_k) \in \mathbb{B}^k. \tag{3.8}
\]

Similarly, the problem of finding a |φ|-minimizing DNF can be formulated as
the weighted set covering problem
\[
\text{minimize} \quad \sum_{i=1}^{k} \deg(C_i)\, s_i \tag{3.9}
\]
\[
\text{subject to (3.5) and } (s_1, s_2, \ldots, s_k) \in \mathbb{B}^k \tag{3.10}
\]

(where deg(Ci ) denotes the degree of Ci , i.e., the number of literals in Ci ).
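The formulations (3.7)–(3.8) and (3.9)–(3.10) differ only in their objective coefficients, so a single routine can serve both. The following sketch solves either problem by exhaustive enumeration; it is meant only for very small instances, and the function name, the clause-based input format, and the sample data are assumptions of this illustration.

```python
from itertools import combinations


def minimum_covers(clauses, weights):
    """Return (best_cost, optimal index sets) by exhaustive search.

    `clauses[j]` lists the columns i with a 1 in row j; `weights[i]` is the
    objective coefficient of s_i (1 for (3.7), deg(C_i) for (3.9)).
    """
    columns = sorted(weights)
    best_cost, best = None, []
    for size in range(len(columns) + 1):
        for chosen in combinations(columns, size):
            chosen_set = set(chosen)
            if all(chosen_set & set(clause) for clause in clauses):
                cost = sum(weights[i] for i in chosen)
                if best_cost is None or cost < best_cost:
                    best_cost, best = cost, [chosen_set]
                elif cost == best_cost:
                    best.append(chosen_set)
    return best_cost, best


# Illustrative instance with six rows and six columns.
clauses = [(1, 3), (2, 5), (4, 6), (3, 5), (1, 6), (2, 4)]
unit_cost, unit_best = minimum_covers(clauses, {i: 1 for i in range(1, 7)})
# Every term has degree 2 here, so the weighted optimum selects the same sets.
deg_cost, deg_best = minimum_covers(clauses, {i: 2 for i in range(1, 7)})
```

Since the search visits all 2^k subsets, it only serves to check small examples by hand; it is not a practical set covering algorithm.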


Example 3.5. Consider again the Boolean function f (x1 , x2 , x3 ) of Example 3.4,
represented this time by its set of true points
T (f ) = {(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1)}.
Using the algorithm described in Section 3.2.1, we generate all the prime implicants
of this function: $x_1\bar{x}_2$, $\bar{x}_1 x_2$, $x_1\bar{x}_3$, $\bar{x}_1 x_3$, $x_2\bar{x}_3$, $\bar{x}_2 x_3$. Associating with these
prime implicants the binary variables s1 , s2 , . . . , s6 , respectively, we can write the
CNF (3.6) of the Petrick function as
π(s1 , . . . , s6 ) = (s1 ∨ s3 )(s2 ∨ s5 )(s4 ∨ s6 )(s3 ∨ s5 )(s1 ∨ s6 )(s2 ∨ s4 ).
The complete DNF of the Petrick function is obtained by dualization of this CNF
(see Section 3.2.4):
π(s1 , . . . , s6 ) = s1 s4 s5 ∨ s2 s3 s6 ∨ s1 s2 s3 s4 ∨ s1 s2 s5 s6 ∨ s3 s4 s5 s6 .
From this DNF, we conclude that the function f (x1 , x2 , x3 ) has five prime irre-
dundant DNFs, two of them consisting of three prime implicants each, and three
others consisting of four prime implicants each.
The problem of finding a ||φ||-minimizing DNF of f (without necessarily listing
all its prime irredundant DNFs) can be formulated as the following set covering
problem:
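\[
\begin{array}{ll}
\text{minimize}   & s_1 + s_2 + s_3 + s_4 + s_5 + s_6 \\
\text{subject to} & s_1 + s_3 \ge 1 \\
                  & s_2 + s_5 \ge 1 \\
                  & s_4 + s_6 \ge 1 \\
                  & s_3 + s_5 \ge 1 \\
                  & s_1 + s_6 \ge 1 \\
                  & s_2 + s_4 \ge 1 \\
                  & s_i \in \{0, 1\}, \quad i = 1, \ldots, 6.
\end{array} \tag{3.11}
\]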
For this small example, it can be easily checked that the optimal solutions of the
set covering problem (3.11) are (1, 0, 0, 1, 1, 0) and (0, 1, 1, 0, 0, 1), corresponding
to the two ||φ||-minimizing DNFs of f :
\[ \phi = x_1\bar{x}_2 \vee \bar{x}_1 x_3 \vee x_2\bar{x}_3, \]
and
\[ \phi = \bar{x}_1 x_2 \vee x_1\bar{x}_3 \vee \bar{x}_2 x_3. \]
Since, in this example, all the prime implicants of f have the same degree, its
||φ||-minimizing DNFs and |φ|-minimizing DNFs coincide. □

It follows from Section 3.2.1 that, given the set of true points of a Boolean
function, the set covering formulation of the logic minimization problem can be
constructed in polynomial time. However, it is well known that the set covering
problem is NP-hard; therefore, this approach to logic minimization does not
necessarily provide a polynomial algorithm. In fact, it will be seen later in this
chapter that the problem of logic minimization is intractable in general. Moreover,
if a Boolean function is represented by an arbitrary DNF, or even by its complete
DNF, then just the construction of the set covering formulation of the logic mini-
mization problem can in itself be computationally difficult because of the possibly
exponential number of true points.

3.3.2 Local simplifications of DNFs


A main challenge of the logic minimization problem stems from the fact that the
same Boolean function can be represented by numerous DNFs of varying lengths,
even if we restrict our attention only to prime and irredundant DNFs. While the
set of prime implicants of a Boolean function is unique, the subsets of the prime
implicants used in two distinct DNF representations of the same function can be
quite different, and, as Example 3.5 shows, these subsets can even be disjoint. On
the other hand, some of the prime implicants of a Boolean function can exhibit
a consistent pattern of behavior regarding their participation in the prime and
irredundant DNFs of the function, as illustrated in the following example.
Example 3.6. Let us consider the Boolean function f whose set of prime
implicants is
\[ D(f) = \{x\bar{y},\ \bar{x}y,\ xu,\ yu,\ \bar{u}w,\ xw,\ yw\}. \]
It can be verified that this function has exactly two prime and irredundant DNFs:
\[ \phi_1 = x\bar{y} \vee \bar{x}y \vee xu \vee \bar{u}w \]
and
\[ \phi_2 = x\bar{y} \vee \bar{x}y \vee yu \vee \bar{u}w. \]
Notice that the prime implicants $x\bar{y}$, $\bar{x}y$, and $\bar{u}w$ appear in all prime and
irredundant DNFs of f, while the prime implicants $xw$ and $yw$ do not appear in any
prime and irredundant DNFs of f. □

In view of this example, let us introduce the following concepts (see
Quine [769], Pyne and McCluskey [764]).
Definition 3.3. A prime implicant of a Boolean function f is called essential if it
appears in every prime DNF of f . A prime implicant of f is called redundant if it
does not appear in any prime and irredundant DNF of f .
In the foregoing example, $x\bar{y}$, $\bar{x}y$, and $\bar{u}w$ are essential prime implicants, while
$xw$ and $yw$ are redundant prime implicants. The prime implicant $xu$ (as well as
$yu$) is neither essential nor redundant, since there exists a prime and irredundant
DNF in which it appears, and another one in which it does not.
Clearly, the knowledge of essential and redundant prime implicants is very
useful in solving logic minimization problems. As we will see, if a Boolean function
is represented by the set of its true points, the detection of essential and redundant
prime implicants can be carried out without major computational difficulties. To
do so, we first return to the set covering formulation (3.5) of logic minimization.
With the set of linear inequalities (3.5), let us associate a (0, 1)-matrix A with
$|T(f)|$ rows and k columns. If $X^1, X^2, \ldots, X^{|T(f)|}$ are the true points of f, then the
elements of A are defined as
\[
a_{ji} = C_i(X^j) \quad \text{for } j = 1, 2, \ldots, |T(f)| \text{ and } i = 1, 2, \ldots, k. \tag{3.12}
\]
The rows of this matrix correspond to the true points of f and will be denoted
by $a_j$, $j = 1, 2, \ldots, |T(f)|$. The columns of the matrix correspond to the prime
implicants of f and will be denoted by $a^i$, $i = 1, 2, \ldots, k$.
Let us say that a (0, 1)-point S = (s1 , s2 , . . . , sk ) satisfying the system of inequal-
ities (3.5) is a minimal solution of (3.5) if no point obtained by changing any of
the components of S from 1 to 0 also satisfies (3.5). We now discuss three com-
putationally easy transformations which can be used to simplify the system of set
covering inequalities (3.5), while preserving all its minimal solutions (the presen-
tation is ours, but we refer to Gimpel [382], Pyne and McCluskey [764, 765], or
Zhuravlev [937] for early references on this topic).
S1 If the matrix A contains a row $a_{j^*}$ with a single component, say $i^*$, equal to 1
   (that is, $a_{j^* i^*} = 1$, and $a_{j^* i} = 0$ for all $i \ne i^*$), then fix $s_{i^*} = 1$ and remove
   from the matrix A the column $a^{i^*}$ and all the rows $a_j$ having $a_{j i^*} = 1$.
S2 If the matrix A contains two comparable rows, say $a_j$ and $a_{j'}$, such that
   $a_j \le a_{j'}$ (i.e., $a_{ji} \le a_{j'i}$ for every i), then remove the row $a_{j'}$ from A.
S3 If the matrix A contains a column $a^{i^*}$ consisting only of 0 components, then
   fix $s_{i^*} = 0$ and remove the column $a^{i^*}$ from A.
It can be seen easily that the three simplifications S1, S2, and S3 preserve the
set of minimal solutions of the set covering inequalities (3.5). Therefore, one can
simplify (3.5) by repeatedly applying S1, S2, and S3 in an arbitrary order, for as
long as possible. Let us denote the resulting matrix by $\hat{A}$, the set of variables s
that are fixed at 1 by $\hat{S}_1$, and the set of variables s that are fixed at 0 by $\hat{S}_0$. One
would expect the matrix $\hat{A}$ and the sets $\hat{S}_1$ and $\hat{S}_0$ to depend on the particular order
in which the simplifications were applied. To avoid ambiguity, let us now specify
an algorithm that first applies S1 for as long as possible, then applies S2 for as
long as possible, and finally applies S3 for as long as possible. We shall call this
algorithm the essential reduction algorithm (ERA). Let us denote the resulting
matrix by $A^*$, and the set of variables which are fixed at 1 (respectively, 0) by $S_1^*$
(respectively, $S_0^*$).
Theorem 3.10. The end result of applying simplifications S1, S2, and S3 as long
as possible does not depend on the order of their application: Every possible order
always yields $\hat{A} = A^*$, $\hat{S}_1 = S_1^*$, and $\hat{S}_0 = S_0^*$.

Proof. The proof follows from three simple observations. First, let us observe that
if an intermediate matrix $A'$ contains a row with a single 1 component, then that
row cannot contain more than one 1 in the original matrix A. Indeed, if a column
was removed during the simplification process, then either this column had no 1’s,
and therefore its removal did not affect the number of 1’s in the remaining rows,
or all the rows in which this column had a 1 were also removed at the same step.
Therefore, $\hat{S}_1 = S_1^*$.
Second, an intermediate matrix $A'$ contains two comparable rows if and only
if these two rows are also comparable in the original matrix A. This is a direct
consequence of the fact that none of the simplification steps S1, S2, or S3 affects
the comparability of the remaining rows. It follows, then, from the foregoing two
observations that the sets of rows of $\hat{A}$ and $A^*$ are exactly the same.
Third, the set of columns of $\hat{A}$ and $A^*$ consists exactly of those columns of the
original matrix that have at least one 1 component in the remaining rows. Indeed,
on the one hand, neither of the matrices contains a column consisting only of 0’s.
On the other hand, if a removed column did have some 1’s, then all the rows in
which it had 1’s were also removed. In conclusion, the set of remaining columns
is uniquely determined by the set of remaining rows. Since the sets of rows of $\hat{A}$
and $A^*$ coincide, we have $\hat{A} = A^*$. It follows that exactly the same sets of variables
were fixed in both procedures, and since we have already concluded that $\hat{S}_1 = S_1^*$,
we can now conclude that $\hat{S}_0 = S_0^*$. □


Lemma 3.9. For every variable s of the system of set covering inequalities (3.5),
s is not fixed by ERA if and only if there exists a minimal solution of (3.5) in which
s = 1 and a minimal solution of (3.5) in which s = 0.
Proof. The “if” part follows from the fact that the simplifications S1, S2, and S3
preserve all the minimal solutions.
We now prove the “only if” part. Let $s_{i^*}$ be a variable that is not fixed by
ERA. On the one hand, since every row of $A^*$ has at least two 1’s, we can set
$s_{i^*} = 0$, and the problem will remain feasible, showing that there must exist a
minimal solution of (3.5) in which $s_{i^*} = 0$. On the other hand, since $A^*$ has no
columns consisting of all 0’s, there must exist a row $j^*$ in $A^*$ such that $a_{j^* i^*} = 1$.
Let us now set $s_i = 0$ for every $i \ne i^*$ such that $a_{j^* i} = 1$. Since $A^*$ has no comparable
rows, the set covering system remains feasible, and in every solution
of this reduced set covering system (including the minimal ones), $s_{i^*}$ must be
equal to 1 because there is no other way to satisfy the inequality corresponding
to row $j^*$. □

Since prime and irredundant DNFs of a Boolean function are in one-to-one
correspondence with the minimal solutions of the set covering inequalities (3.5),
Lemma 3.9 implies the following characterizations of the essential and redundant
prime implicants:
Theorem 3.11. A prime implicant of a Boolean function is essential if and only if
the corresponding variable s is fixed at 1 by ERA.
Theorem 3.12. A prime implicant of a Boolean function is redundant if and only
if the corresponding variable s is fixed at 0 by ERA.
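The two characterizations can be checked on a small instance without running ERA at all, simply by enumerating the minimal solutions of the covering system. The sketch below does this by brute force for the function of Example 3.6; the term encoding and all function names are assumptions of this illustration, and the enumeration is exponential in the number of prime implicants.

```python
from itertools import combinations, product

# Prime implicants of the function of Example 3.6 over (x, y, u, w), encoded as
# {variable position: required value}; this encoding is an assumption of the sketch.
TERMS = [
    {0: 1, 1: 0},  # x and not y
    {0: 0, 1: 1},  # not x and y
    {0: 1, 2: 1},  # x and u
    {1: 1, 2: 1},  # y and u
    {2: 0, 3: 1},  # not u and w
    {0: 1, 3: 1},  # x and w
    {1: 1, 3: 1},  # y and w
]


def covers(term, point):
    """True if the conjunction `term` takes value 1 at `point`."""
    return all(point[v] == val for v, val in term.items())


def covering_rows(terms, num_vars):
    """One 0/1 row per true point of the disjunction of `terms`, one column per term."""
    rows = []
    for point in product((0, 1), repeat=num_vars):
        row = [int(covers(t, point)) for t in terms]
        if any(row):
            rows.append(row)
    return rows


def minimal_solutions(rows, num_cols):
    """All minimal 0/1 solutions of the covering system, as sets of column indices."""
    found = []
    for size in range(num_cols + 1):
        for chosen in combinations(range(num_cols), size):
            s = set(chosen)
            feasible = all(any(row[i] for i in s) for row in rows)
            # Sets are visited by increasing size, so a feasible set is minimal
            # exactly when it contains no previously found solution.
            if feasible and not any(prev < s for prev in found):
                found.append(s)
    return found


rows = covering_rows(TERMS, 4)
solutions = minimal_solutions(rows, len(TERMS))
essential = set.intersection(*solutions)                     # in every minimal solution
redundant = set(range(len(TERMS))) - set.union(*solutions)   # in no minimal solution
```

Columns that land in neither set are exactly those that take both values over the minimal solutions, as in Lemma 3.9.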
It is important to observe that, from a computational point of view, ERA is
relatively inexpensive. More specifically, given a $|T(f)| \times k$ set covering matrix
A, the ERA simplifications can be carried out in $O(k|T(f)|^2)$ time. Indeed, all
the simplifications S1 and S3 can be done in $O(k|T(f)|)$ time, since each of these
two types of simplifications requires a single pass over the set covering matrix.
Additionally, to carry out all the simplifications S2, one has to compare at most
$\binom{|T(f)|}{2}$ pairs of rows, and each comparison can be done in $O(k)$ time.
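A direct, unoptimized rendering of ERA is sketched below. It favors readability over the running time discussed above (the loops rescan the matrix after every change), and the function name, the return convention, and the use of 0-based indices are assumptions of this illustration. The sample matrix is the covering matrix of the function of Example 3.6, with columns ordered as in D(f).

```python
def essential_reduction(matrix):
    """Apply S1, then S2, then S3, each for as long as possible.

    `matrix` is a list of 0/1 rows. Returns (rows, cols, fixed_one, fixed_zero),
    where `rows` and `cols` are the surviving row and column indices.
    """
    rows = set(range(len(matrix)))
    cols = set(range(len(matrix[0]))) if matrix else set()
    fixed_one, fixed_zero = set(), set()

    # S1: a row with a single 1 forces the corresponding variable to 1.
    changed = True
    while changed:
        changed = False
        for j in sorted(rows):
            ones = [i for i in cols if matrix[j][i]]
            if len(ones) == 1:
                i_star = ones[0]
                fixed_one.add(i_star)
                cols.discard(i_star)
                rows = {r for r in rows if not matrix[r][i_star]}
                changed = True
                break

    # S2: if row j <= row j2 componentwise, row j2 is implied and is removed.
    changed = True
    while changed:
        changed = False
        for j in sorted(rows):
            for j2 in sorted(rows):
                if j != j2 and all(matrix[j][i] <= matrix[j2][i] for i in cols):
                    rows.discard(j2)
                    changed = True
                    break
            if changed:
                break

    # S3: a column with no 1 in the surviving rows is fixed to 0.
    for i in sorted(cols):
        if not any(matrix[j][i] for j in rows):
            fixed_zero.add(i)
            cols.discard(i)
    return rows, cols, fixed_one, fixed_zero


# Covering matrix for the function of Example 3.6 (12 true points, 7 prime implicants).
A = [
    [0, 0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 1], [1, 0, 0, 0, 1, 1, 0], [0, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0], [0, 1, 0, 1, 0, 0, 1], [1, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 0, 0, 0], [0, 0, 1, 1, 0, 1, 1],
]
rows, cols, fixed_one, fixed_zero = essential_reduction(A)
```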
Example 3.7. Let us consider the set covering matrix A associated with the logic
minimization problem for the Boolean function f given in Example 3.6:
\[
\begin{array}{c|ccccccc}
(x, y, u, w) & x\bar{y} & \bar{x}y & xu & yu & \bar{u}w & xw & yw \\ \hline
(0, 0, 0, 1) & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
(0, 1, 0, 0) & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
(1, 0, 0, 0) & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
(0, 1, 0, 1) & 0 & 1 & 0 & 0 & 1 & 0 & 1 \\
(1, 0, 0, 1) & 1 & 0 & 0 & 0 & 1 & 1 & 0 \\
(0, 1, 1, 0) & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
(1, 0, 1, 0) & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
(0, 1, 1, 1) & 0 & 1 & 0 & 1 & 0 & 0 & 1 \\
(1, 0, 1, 1) & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\
(1, 1, 0, 1) & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
(1, 1, 1, 0) & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
(1, 1, 1, 1) & 0 & 0 & 1 & 1 & 0 & 1 & 1
\end{array}
\]
The twelve rows of this matrix correspond to the true points of the function, while
the seven columns correspond to its prime implicants. Three applications of the
simplification S1 show that $x\bar{y}$, $\bar{x}y$, and $\bar{u}w$ are essential prime implicants. The
resulting simplified set covering matrix is
\[
\begin{array}{c|cccc}
(x, y, u, w) & xu & yu & xw & yw \\ \hline
(1, 1, 1, 0) & 1 & 1 & 0 & 0 \\
(1, 1, 1, 1) & 1 & 1 & 1 & 1
\end{array}
\]
Applying now S2, the matrix reduces to
\[
\begin{array}{c|cccc}
(x, y, u, w) & xu & yu & xw & yw \\ \hline
(1, 1, 1, 0) & 1 & 1 & 0 & 0
\end{array}
\]
Finally, two applications of the simplification S3 show that $xw$ and $yw$ are
redundant prime implicants. The set covering matrix of the remaining problem is
\[
\begin{array}{c|cc}
(x, y, u, w) & xu & yu \\ \hline
(1, 1, 1, 0) & 1 & 1
\end{array}
\]
showing that every prime and irredundant DNF contains either $xu$ or $yu$, but not
both. These results confirm the statements made in Example 3.6. □
We have seen that the simplifications S1, S2, and S3, and therefore ERA, pre-
serve the minimal solutions of the system of set covering inequalities (3.5); namely,
they preserve the set of prime and irredundant DNFs. This property allows the
application of S1, S2, and S3, and of ERA, to any type of logic minimization
problem whose objective is to minimize the number of terms, the number of lit-
erals, or any monotonically increasing function of these two DNF complexity
measures.
Let us now turn our attention to another type of simplifying transformation,
which has a more limited scope of application, since it may not preserve all the
minimal solutions of the set covering inequalities (3.5).
S4 If the matrix A contains two comparable columns, say, $a^i$ and $a^{i'}$, such that
   $a^i \ge a^{i'}$ (i.e., $a_{ji} \ge a_{ji'}$ for every j), then fix $s_{i'} = 0$ and remove the
   column $a^{i'}$ from A.
Note that the simplification S3 introduced earlier is a special case of S4. The
simplification S4 is guaranteed to preserve at least one minimum-cardinality solution
of the system (3.5), namely, one optimal solution of the set covering problem
(3.7)–(3.8). Indeed, a single application of S4 reduces the current set of minimal
solutions in such a way that only those minimal solutions in which $s_{i'} = 0$ are preserved.
Further, since $a^i \ge a^{i'}$, if there is a current minimum-cardinality solution
S with $s_{i'} = 1$, then the point $S^*$, which is equal to S in all components, except
$s^*_{i'} = 0$ and $s^*_i = 1$, is also a minimum-cardinality solution and is preserved by
S4.
It is now clear that S4 can be applied to simplify those logic minimization
problems whose objective is to find at least one minimum solution of the set
covering problem (3.7)–(3.8), that is, to find a ||φ||-minimizing prime DNF.
Note that the simplification process can never start with S4 because, in our
logic minimization problems, the initial set covering matrix A does not contain
comparable columns (because no prime implicant is absorbed by another one).
Nevertheless, S4 may become applicable after several applications of simplifi-
cations S1 or S2. On the other hand, the opposite phenomenon can also happen;
namely, it is possible that neither S1 nor S2 is applicable but S4 is, and after several
applications of S4 it may become possible to apply S1 or S2. Therefore, further
simplifications can be achieved by alternately applying ERA and S4 as long as
possible.
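The column-dominance test behind S4 can be sketched as follows. The function name, the argument convention (surviving rows and columns passed as index sets), and the sample matrices are assumptions of this illustration. When two columns coincide on the surviving rows, the sketch removes only one of them, since the survivor no longer has a dominating partner.

```python
def remove_dominated_columns(matrix, rows, cols):
    """Apply S4 repeatedly: drop column i2 whenever another surviving column i
    satisfies matrix[j][i] >= matrix[j][i2] on every surviving row j."""
    cols = set(cols)
    fixed_zero = set()
    changed = True
    while changed:
        changed = False
        for i2 in sorted(cols):
            dominated = any(
                i != i2 and all(matrix[j][i] >= matrix[j][i2] for j in rows)
                for i in cols
            )
            if dominated:
                cols.discard(i2)
                fixed_zero.add(i2)
                changed = True
                break
    return cols, fixed_zero


# Column 1 dominates column 0; columns 1 and 2 are incomparable.
M = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
kept, zeroed = remove_dominated_columns(M, rows={0, 1, 2}, cols={0, 1, 2})

# Two identical columns on a single row: exactly one of them is removed.
kept_single, zeroed_single = remove_dominated_columns([[1, 1]], rows={0}, cols={0, 1})
```

In the single-row case the surviving column then forms a row with a single 1, so S1 becomes applicable again, which illustrates the alternation described above.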

3.3.3 Computational complexity of logic minimization


It was seen in the previous subsection that logic minimization problems can be
reduced to set covering problems. This reduction makes it possible to solve logic
minimization problems by generic set covering algorithms. Note, however, that
generic algorithms may not be the most appropriate choice if the resulting
set covering problems turn out to possess special properties
that allow the development of specialized, more efficient algorithms.
At first glance, set covering problems arising from logic minimization prob-
lems do display some special features. For example, since each column of the set
covering matrix corresponds to a prime implicant (i.e., a subcube), the number
of 1’s in the column must be a power of 2. Similarly, the number of 1’s in the
intersection of any subset of columns must also be a power of 2.
In view of such special features, formally, not every set covering problem
originates from logic minimization. Therefore, it comes as a surprise that every
(nontrivial) set covering problem is, in fact, an S1-simplified version of a logic
minimization problem. More precisely, given an arbitrary set covering problem
without zero rows or columns, there exists a logic minimization problem, which –
after several applications of the simplification S1 – reduces to it. This subsection
is devoted to a proof of this result and its corollaries.
Let us consider the system of set covering inequalities
\[
\sum_{i=1}^{m} a_{ji}\, s_i \ge 1, \quad j = 1, 2, \ldots, n \tag{3.13}
\]
and the corresponding matrix $A = (a_{ji})_{j=1,\ldots,n}^{i=1,\ldots,m}$, which we assume to have no zero
rows or columns. The construction of a logic minimization problem reducible
to the given set covering system involves two steps. First, we construct a set of
Boolean points and a set of terms in such a way that the given matrix A repre-
sents the associated set covering conditions. At this stage, the logic minimization
problem is not completely defined because the constructed terms can also cover
other Boolean points, besides the constructed ones. Then we extend the construc-
tion by adding some special terms and the corresponding true points. It will be
shown that in the completely defined logic minimization problem constructed
in this way, the terms added at the second stage are essential prime implicants.
Moreover, the set covering inequalities associated with this logic minimization
problem will be shown to be reducible (by using the simplification S1) to the
originally given set covering system (3.13). Let us now describe the details of the
construction.
As a first step, let us associate with each row j of A a Boolean point of dimension
n, denoted $P^j = (p_1^j, p_2^j, \ldots, p_n^j)$, where, for $j, r = 1, 2, \ldots, n$,
\[
p_r^j = \begin{cases} 1 & \text{if } r \ne j, \\ 0 & \text{if } r = j. \end{cases} \tag{3.14}
\]

Also, with each column i of A, $i = 1, 2, \ldots, m$, let us associate an elementary
conjunction $C_i$ on variables from $\{x_1, x_2, \ldots, x_n\}$:
\[
C_i = \bigwedge_{j:\, a_{ji} = 0} x_j. \tag{3.15}
\]

Example 3.8. As a small example, let us consider the following set covering
matrix:
\[
A = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
Then, the points associated with its rows are
\[
\begin{pmatrix} P^1 \\ P^2 \\ P^3 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix},
\]
while the terms associated with its columns are
\[
(C_1, C_2, C_3) = (x_3,\ x_1 x_3,\ x_2).
\]

Lemma 3.10. For every matrix $A \in \mathbb{B}^{n \times m}$ without zero rows, there holds
(a) for all $j = 1, 2, \ldots, n$ and $i = 1, 2, \ldots, m$, $a_{ji} = C_i(P^j)$;
(b) for all $j = 1, 2, \ldots, n$, $P^j$ is a true point of the function represented by the
    DNF $\bigvee_{i=1}^{m} C_i$.

Proof. To establish (a), notice that, by construction of $C_i$, $C_i(P^j) = 0$ if and only
if there exists an index k such that $a_{ki} = 0$ and $p_k^j = 0$. But by definition of $P^j$,
this is equivalent to $k = j$ and $a_{ji} = 0$.
To prove assertion (b), simply note that, for each $P^j$, there is at least one conjunction
$C_i$ such that $C_i(P^j) = 1$ (since A has no zero row). □
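Assertion (a) is easy to check mechanically. The sketch below builds the points of (3.14) and the terms of (3.15) from a 0/1 matrix and evaluates every term at every point; the function names and the representation of a positive term as a list of 0-based variable indices are assumptions of this illustration. The matrix used is the one of Example 3.8.

```python
def points_and_terms(A):
    """Build the points P^j of (3.14) and the terms C_i of (3.15).

    P^j is the all-ones point of length n with a 0 in position j; C_i is
    returned as the list of 0-based indices j with a_{ji} = 0.
    """
    n, m = len(A), len(A[0])
    points = [tuple(0 if r == j else 1 for r in range(n)) for j in range(n)]
    terms = [[j for j in range(n) if A[j][i] == 0] for i in range(m)]
    return points, terms


def evaluate(term, point):
    """Value of the positive conjunction `term` at `point`."""
    return int(all(point[j] for j in term))


A = [[1, 0, 1],
     [1, 1, 0],
     [0, 0, 1]]
points, terms = points_and_terms(A)
# Lemma 3.10(a): the matrix is recovered entry by entry as C_i(P^j).
recovered = [[evaluate(terms[i], points[j]) for i in range(len(A[0]))]
             for j in range(len(A))]
```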

Lemma 3.10 suggests that A comes close to being the matrix associated with a
logic minimization problem, because it expresses the covering of the true points
$P^j$ by the terms $C_i$. However, as Example 3.8 shows, absorption may
take place among the conjunctions $C_i$ (indeed, $x_3$ absorbs $x_1 x_3$ in the example).
Therefore, the construction has to be modified if we want the conjunctions Ci to
represent the prime implicants of some Boolean function.
Let us call a column $a^i$ of A dominating if there exists another column $a^{i'}$ in
A such that $a^i \ge a^{i'}$, and let us redefine the associated conjunctions $C_i$ by
\[
C_i := \begin{cases} C_i, & \text{if } a^i \text{ is not dominating}, \\ C_i\, y_i, & \text{if } a^i \text{ is dominating}, \end{cases} \tag{3.16}
\]

where the yi ’s represent additional Boolean variables. Obviously, after this trans-
formation, there will be no absorption among the conjunctions Ci . In order to
complete the construction, we shall extend the associated vectors P j by adding
additional components for each of the additional variables yi , and defining the
value of all these components to be 1. This modification preserves the property
that A expresses the covering of the points P j ’s by the conjunctions Ci ’s.

Example 3.9. Returning to our Example 3.8, we find now:
\[
(C_1, C_2, C_3) = (x_3 y_1,\ x_1 x_3,\ x_2),
\]
and
\[
\begin{pmatrix} P^1 \\ P^2 \\ P^3 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \end{pmatrix}.
\]
□
To define a logic minimization problem equivalent to the original set covering
problem, we construct the DNF
\[
\psi = \bigvee_{i=1}^{m} C_i, \tag{3.17}
\]

where the terms Ci are defined by (3.15) and (3.16). Note that ψ represents a
positive Boolean function, say, f , and, since ψ is closed under absorption, it is the
complete DNF of f . The true points of f include all the points P j , j = 1, 2, . . . , n
but can also include many additional points, say, Qt , t = 1, 2, . . . , T .
If we simply extend the set covering problem by adding to A all the rows
corresponding to the additional true points Qt , then the resulting matrix may not
necessarily be reducible to A by using the simplifications S1, S2, and S3. To make
this reduction possible, we introduce two additional variables, z0 and z1 . For any
Boolean point Q, let us denote by [Q] the unique minterm (in the (x, y)-variables)
covering Q, and let us say that Q is “even” (respectively, “odd”) if it has an even
(respectively, odd) number of components equal to 1. We can now define the DNF:
\[
\psi^* = \bigvee_{i=1}^{m} z_0 z_1 C_i \ \vee \bigvee_{t:\, Q^t \text{ is even}} z_0 [Q^t] \ \vee \bigvee_{t:\, Q^t \text{ is odd}} z_1 [Q^t]. \tag{3.18}
\]
We let $f^*$ be the Boolean function represented by $\psi^*$.
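Example 3.10. For Example 3.9, there are eight additional true points $Q^t$:
\[
\begin{pmatrix} Q^1 \\ Q^2 \\ Q^3 \\ Q^4 \\ Q^5 \\ Q^6 \\ Q^7 \\ Q^8 \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 \\
0 & 1 & 0 & 0
\end{pmatrix}.
\]
The associated DNF is
\[
\begin{aligned}
\psi^* = {} & x_3 y_1 z_0 z_1 \vee x_1 x_3 z_0 z_1 \vee x_2 z_0 z_1 \vee x_1 x_2 x_3 y_1 z_0 \vee x_1 x_2 \bar{x}_3 \bar{y}_1 z_0 \vee x_1 \bar{x}_2 x_3 \bar{y}_1 z_0 \\
& \vee \bar{x}_1 x_2 x_3 \bar{y}_1 z_0 \vee \bar{x}_1 x_2 \bar{x}_3 y_1 z_0 \vee \bar{x}_1 \bar{x}_2 x_3 y_1 z_0 \vee x_1 x_2 x_3 \bar{y}_1 z_1 \vee \bar{x}_1 x_2 \bar{x}_3 \bar{y}_1 z_1.
\end{aligned}
\]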
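The additional true points and their parity classes can be recomputed by enumeration, which is a convenient check on the example. In the sketch below, the function name and the encoding of positive terms as tuples of 0-based variable indices, with variables ordered (x1, x2, x3, y1), are assumptions of this illustration; the enumeration is exponential in the number of variables.

```python
from itertools import product


def additional_true_points(terms, points, num_vars):
    """True points of the positive DNF `terms` that are not among `points`.

    Each term is a collection of 0-based variable indices (all literals are
    positive). Returns (even, odd), split by the parity of the number of 1's.
    """
    known = set(points)
    even, odd = [], []
    for q in product((0, 1), repeat=num_vars):
        if q not in known and any(all(q[v] for v in term) for term in terms):
            (even if sum(q) % 2 == 0 else odd).append(q)
    return even, odd


# Data of Examples 3.9 and 3.10, variables ordered (x1, x2, x3, y1).
terms = [(2, 3), (0, 2), (1,)]                        # x3 y1, x1 x3, x2
points = [(0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 1)]   # P^1, P^2, P^3
even, odd = additional_true_points(terms, points, 4)
```

The even points give rise to the terms of ψ* multiplied by z0, and the odd points to those multiplied by z1.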

Lemma 3.11. The complete DNF of $f^*$ is $\psi^*$.

Proof. Let us write $\psi^*$ as
\[
\psi^* = z_0 z_1 \psi \vee z_0 \psi_0 \vee z_1 \psi_1.
\]
By construction, no two terms of $\psi$ absorb each other. The same holds for the
terms of $\psi_0$ and $\psi_1$. Moreover, it is obvious that no term of $z_0 \psi_0$ can absorb a term
of $z_1 \psi_1$, and vice versa. It is also obvious that no term of $z_0 z_1 \psi$ can absorb any
term of $z_0 \psi_0$ or $z_1 \psi_1$. Since A has no zero columns, no term of $\psi$ is a minterm on
the (x, y)-variables. Then, since every term of $\psi_0$ and of $\psi_1$ is a minterm, no term
of $z_0 \psi_0$ or $z_1 \psi_1$ can absorb a term of $z_0 z_1 \psi$. Thus, $\psi^*$ is closed under absorption.
Let us now prove that $\psi^*$ is closed under consensus. Obviously, no two terms
of $z_0 z_1 \psi$ have a consensus because they are all positive. Moreover, any two terms
of $\psi_0$ have at least two conflicting literals, and hence no two terms of $z_0 \psi_0$ have
a consensus. For the same reason, no two terms of $z_1 \psi_1$ have a consensus.
Let us now assume that a term of $z_0 z_1 \psi$, say $z_0 z_1 C$, and a term of $z_0 \psi_0$, say
$z_0 [Q]$, have a consensus. This can only happen if there is a variable w in C such
that $\bar{w}$ appears in $[Q]$. Since Q is a true point of f, there exists a prime implicant
of f, say, $C'$ that absorbs $[Q]$ and obviously does not contain w. Then, $z_0 z_1 C'$ is
a term of $z_0 z_1 \psi$ that absorbs the consensus of $z_0 z_1 C$ and $z_0 [Q]$. Similarly, every
consensus of a term in $z_0 z_1 \psi$ and a term in $z_1 \psi_1$ will be absorbed by a term in
$z_0 z_1 \psi$.
Let us next assume that a term of $z_0 \psi_0$, say, $z_0 [Q']$, and a term of $z_1 \psi_1$, say
$z_1 [Q'']$, have a consensus. Without loss of generality, let us assume that $[Q'] = wG$
and $[Q''] = \bar{w}H$. Again, there exists a prime implicant of f, say, C that absorbs
$[Q'']$ and that does not contain w. Then, $z_0 z_1 C$ is a term of $z_0 z_1 \psi$ that absorbs the
consensus of $z_0 [Q']$ and $z_1 [Q'']$.
Thus, $\psi^*$ is closed under consensus and, in view of Corollary 3.2, $\psi^*$ is the
complete DNF of $f^*$. □

We now discuss the logic minimization problem for $f^*$. By Lemma 3.11, the
columns of the set covering matrix $A^*$ associated with $f^*$ correspond to the terms
of $\psi^*$. The rows of $A^*$ correspond to the set of true points $T(f^*)$. These true points
are derived from the points $P^j$, $j = 1, 2, \ldots, n$, and $Q^t$, $t = 1, 2, \ldots, T$, by extending
them with two additional components, corresponding to $z_0$ and $z_1$, so that $T(f^*)$
consists of the following disjoint subsets:
• The set $\mathcal{P}$ of points $(P^j, 1, 1)$, $j = 1, 2, \ldots, n$.
• The set $\mathcal{Q}_{11}$ of points $(Q^t, 1, 1)$, $t = 1, 2, \ldots, T$.
• The set $\mathcal{Q}_{10}$ of points $(Q^t, 1, 0)$, where $Q^t$ is even.
• The set $\mathcal{Q}_{01}$ of points $(Q^t, 0, 1)$, where $Q^t$ is odd.
Let us see what happens when the simplification steps S1 are performed on
$A^*$. Every true point of the form $(Q^t, \bar{\sigma}, \sigma)$ ($t = 1, \ldots, T$; $\sigma = 0, 1$) is covered
by a single prime implicant $z_\sigma [Q^t]$. Therefore, every prime implicant $z_\sigma [Q^t]$ is
essential, and the application of the simplification S1 removes the corresponding
columns and all the rows in $\mathcal{Q}_{10} \cup \mathcal{Q}_{01}$ from the set covering matrix $A^*$. Moreover,
the rows in $\mathcal{Q}_{11}$ are also removed by S1, since every prime implicant of the form
$z_\sigma [Q^t]$ covers $(Q^t, 1, 1)$.
So, the application of S1 only leaves in $A^*$ the rows associated with the true
points $(P^j, 1, 1)$, $j = 1, 2, \ldots, n$, and the columns associated with the prime implicants
of the form $z_0 z_1 C_i$, $i = 1, 2, \ldots, m$. It now follows from Lemma 3.10 that this
reduced set covering matrix coincides with the original matrix A. This completes
the proof of the following result, due to Gimpel [381]:

Theorem 3.13. Given an arbitrary set covering problem without zero rows or
columns, there exists a logic minimization problem whose set covering formu-
lation can be reduced to the given problem after several applications of the
simplification S1.

The foregoing arguments show how to transform an arbitrary set covering prob-
lem of size n × m to an equivalent logic minimization problem having at most
n + m + 1 Boolean variables. Since it is well known that the set covering problem
is NP-hard [371], one may be tempted to interpret this construction as an NP-
hardness proof for the logic minimization problem. Unfortunately, this inference
is incorrect, since the described reduction is not necessarily polynomial because
the number of true points of f ∗ constructed above can be exponentially large in
n and m. However, this difficulty is easy to overcome, and we can establish the
following result (Gimpel [381]):

Theorem 3.14. The logic minimization problem is NP-hard when its input is a
Boolean function given by the set of its true points.

Proof. It is known [371] that the set covering problem remains NP-hard in the
special case in which every column of the set covering matrix contains at most
three 1’s, and the matrix does not contain any pair of comparable columns. In this
case, because of the incomparability of the columns, no variable y is needed in the
construction (3.17) of the DNF $\psi$. Moreover, the degree of every conjunction $C_i$
is at least $n - 3$, so each $C_i$ covers at most $2^3 = 8$ points; hence the number T of
the additional true points $Q^t$ of f is at most
8m. Therefore, the number of prime implicants of the Boolean function $f^*$ is at
most 9m, and the DNF $\psi^*$ can be constructed in polynomial time. It follows that,
for this special case of set covering problems, the transformation to an equivalent
logic minimization problem is polynomial, which completes the proof. □

Theorem 3.14 describes the complexity of the logic minimization problem in
the most commonly considered case, where the Boolean function is given by the
set of its true points. However, there are many other ways to represent a Boolean
function, for example, by an arbitrary DNF or CNF, by a complete truth table
containing the value of the function in all the $2^n$ Boolean points, or by the set of its
false points, and so on. It is important to note that the computational complexity
of the logic minimization problem can depend on the representation of the input,
since different representations of the same Boolean function are not polynomially
equivalent; namely, the length of one representation may not necessarily be limited
by a polynomial function of the length of another one.
Some representations of a Boolean function can be viewed as special cases of
others. For example, the representation of a Boolean function by the set of its true
points can be viewed as a special type of DNF representation. Thus, in particular,
Theorem 3.14 implies that the logic minimization problem is NP-hard for Boolean
functions expressed in DNF.
On the other hand, the representation of a Boolean function by the set of its
true points can be exponentially shorter than its representation by a complete truth
table. It is therefore surprising that the latter, possibly much larger, representation
does not make the logic minimization problem significantly simpler. As a matter of
fact, Masek [674] was able to prove that the logic minimization problem remains
NP-hard when its input is a complete truth table. A more accessible proof (based on
Gimpel’s construction [381]) of the latter result was recently proposed by Allender
et al. [16].

3.3.4 Efficient approximation algorithms for logic minimization


We saw in the previous subsection that the logic minimization problem is compu-
tationally equivalent to the set covering problem. Because of the NP-hardness
of these problems, it is widely believed that solving them to optimality may
require exponential time. This explains the importance of developing efficient
approximation algorithms for their solution.
One of the most natural approaches to the solution of many optimization prob-
lems is to use a “greedy” procedure, that is, an iterative process of which each
step is aimed at reaping the maximum immediate benefit, without the heavy com-
putational expense required to analyze its global impact. The general philosophy
of greedy procedures has found numerous implementations, often with excellent
results.
We are going to describe in this subsection an efficient greedy procedure for
solving logic minimization problems, derived from the associated set covering
formulations with constraints (3.5).
The classical greedy procedure for a generic set covering problem of the form

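\[
\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{k} s_i \\
\text{subject to} \quad & \sum_{i=1}^{k} a_{ji} s_i \ge 1, \quad j = 1, 2, \ldots, n, \\
& (s_1, s_2, \ldots, s_k) \in \mathcal{B}^k,
\end{aligned}
\]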

is an iterative process at each step of which a variable $s_i$ is chosen in such a way
that setting this variable to 1 satisfies the largest possible number of yet unsatisfied
constraints (3.5). When all the constraints are satisfied, the greedy procedure stops,
and all those variables si , which have not been set to 1 in this process, are now
set to 0.
Let us now describe this greedy procedure in terms of the set covering matrix
$A = (a_{ji})_{j=1,\ldots,n}^{i=1,\ldots,k}$. Denote by $A(r)$ the reduced set covering
matrix at the beginning of step $r$ of the greedy procedure. Thus, $A(1)$ denotes the
original set covering matrix $A$. At step $r$, the greedy procedure
1. calculates the number $|a(r)_i|$ of 1's in every column $a(r)_i$ of the matrix
$A(r)$;
2. chooses a column $a(r)_{i_r}$ having the maximum number of 1's; and
3. reduces $A(r)$ to $A(r+1)$ by removing from $A(r)$ the chosen column $a(r)_{i_r}$
as well as all the rows of $A(r)$ covered by it, namely, the rows $j$ with $a(r)_{j i_r} = 1$.
The process stops when all rows have been removed from the set covering matrix.
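
The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the text: the function name `greedy_set_cover`, the list-of-rows encoding of the matrix, and the tie-breaking rule (smallest column index) are assumptions made for the example.

```python
def greedy_set_cover(A):
    """Greedy cover for a 0/1 matrix A given as a list of n rows of length k.

    Returns the chosen column indices in the order in which they were picked.
    The matrix encoding and the tie-breaking rule are illustrative assumptions.
    """
    n = len(A)
    k = len(A[0]) if n else 0
    uncovered = set(range(n))   # rows still present in the reduced matrix A(r)
    available = set(range(k))   # columns still present in A(r)
    chosen = []
    while uncovered:
        if not available:
            raise ValueError("some row cannot be covered by any column")
        # Step 1: count the 1's of each remaining column among uncovered rows.
        counts = {i: sum(A[j][i] for j in uncovered) for i in available}
        # Step 2: pick a column with the maximum count (ties: smallest index).
        best = max(sorted(available), key=lambda i: counts[i])
        if counts[best] == 0:
            raise ValueError("some row cannot be covered by any column")
        # Step 3: remove the chosen column and all the rows it covers.
        chosen.append(best)
        available.remove(best)
        uncovered -= {j for j in uncovered if A[j][best] == 1}
    return chosen
```

For instance, on the 5 x 4 matrix with rows (1,0,0,1), (1,1,0,0), (0,1,1,0), (0,0,1,1), (1,0,1,0), the sketch first picks column 0 (three 1's) and then column 2, which covers the two remaining rows.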
Let $q$ be the number of steps of the greedy procedure. For simplicity, let us
renumber the columns of the set covering matrix in such a way that the removed
columns $i_1, i_2, \ldots, i_q$ become $1, 2, \ldots, q$; the remaining columns are numbered
from $q + 1$ to $k$. Let us denote by $w_{ir}$ the number $|a(r)_i|$ of 1's in the $i$-th
column of $A(r)$. With this notation, $w_{rr}$ is the number of rows removed from the set
covering matrix at step $r$ of the greedy procedure. Note that $w_{11}$ is the maximum
number of 1's in the columns of the original set covering matrix $A$ (we assume,
without loss of generality, that $w_{11} \ge 1$).
Two important observations about the greedy procedure are in order. First, this
procedure is very efficient. Indeed, if n is the number of rows and k is the number
of columns in the set covering matrix, then the number of steps of the procedure
does not exceed min{n, k}, while each step takes O(nk) time; therefore the com-
putational complexity of the greedy procedure is O(min{n, k}nk). Second, despite
its low computational cost, the greedy procedure produces very good solutions.
To quantify this last statement, let us compare the size q of the greedy cover (that
is, the number of variables fixed to 1 by the greedy procedure) with the size m of a
minimum cover. Obviously, m ≤ q. On the other hand, q cannot be “much worse”
than the optimum, in view of the following surprising result:
Theorem 3.15. For any set covering problem, if m is the size of a minimum cover,
then the size q of the greedy cover is bounded by the relation

\[ q \le H(w_{11})\, m, \tag{3.19} \]

where $w_{11}$ is the maximum number of 1's in a column of the set covering matrix,
and
\[ H(d) = \sum_{i=1}^{d} \frac{1}{i} \quad \text{for all positive integers } d. \]

We refer to Chvátal [198], Johnson [536], or Lovász [623], for a proof of this
classical result. It is easy to show (e.g., by induction) that H (d) ≤ 1 + ln d for any
positive integer d. Thus, Theorem 3.15 implies the following corollary (see also
Slavík [837] for a slight improvement).
Corollary 3.7. For any set covering problem, if m is the size of a minimum cover,
then the size q of the greedy cover is bounded by the relation

q ≤ (1 + ln w11 )m ≤ (1 + ln n)m, (3.20)

where w11 is the maximum number of 1’s in a column of the set covering matrix,
and n is the number of its rows.
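
As a quick numerical illustration of the factors appearing in (3.19) and (3.20), the following sketch evaluates $H(d)$ directly and compares it with $1 + \ln d$. The helper names `harmonic` and `greedy_ratio_bounds` are hypothetical and introduced only for this example.

```python
import math

def harmonic(d):
    """H(d) = 1 + 1/2 + ... + 1/d for a positive integer d."""
    return sum(1.0 / i for i in range(1, d + 1))

def greedy_ratio_bounds(w11, n):
    """Return the factors H(w11), 1 + ln(w11) and 1 + ln(n) of (3.19)-(3.20).

    w11 is the largest column count and n the number of rows, so w11 <= n.
    """
    return harmonic(w11), 1.0 + math.log(w11), 1.0 + math.log(n)

# Spot check of H(d) <= 1 + ln d on a few values (a small tolerance
# absorbs floating-point rounding; equality holds for d = 1).
for d in (1, 2, 10, 1000):
    assert harmonic(d) <= 1.0 + math.log(d) + 1e-12
```

With $w_{11} = 2^{n-1}$ the middle factor becomes $1 + (n-1)\ln 2 = 1 - \ln 2 + n \ln 2$, which is the factor that the greedy bound takes when each column covers at most $2^{n-1}$ rows.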

Let us now consider the application of these approximation results to logic
minimization. If a Boolean function is identically 1 and is represented by the set
of its true points, then, obviously, the logic minimization problem is trivial. If a
Boolean function is not identically 1, then each of its prime implicants covers
at most $2^{n-1}$ points. Together with Corollary 3.7, this observation implies the
following result.

Corollary 3.8. Let $f$ be a Boolean function of $n$ variables, let $q$ be the number
of terms in its prime DNF constructed by the greedy procedure applied to the set
covering formulation (3.7)–(3.8), and let $m$ be the number of terms in a ||φ||-
minimizing DNF of $f$. Then,

\[ q \le (1 - \ln 2 + n \ln 2)\, m. \tag{3.21} \]

We observed in Section 3.3.1 that, when the input is the set of true points
of a Boolean function, the set covering formulation (3.7)–(3.8) of the logic
minimization problem can be constructed in polynomial time. Hence, the greedy
procedure also runs in polynomial time on this input and provides a solution of
the ||φ||-minimization problem that approximates its optimal value to within a
factor O(n).
A natural question to be asked now is whether there exists a polynomial time
algorithm having a significantly better approximation ratio. In all likelihood, the
answer to this question is negative. Indeed, Feldman [328] established the follow-
ing result: Even when the input of the logic minimization problem consists of the
complete truth table of a Boolean function f , there exists a constant γ > 0 such
that it is NP-hard to approximate $m$ to within a factor $n^{\gamma}$, where $m$ is the number of
terms in a ||φ||-minimizing DNF of f , and n is the number of variables. This result
implies that the approximation factor achieved by the greedy algorithm is at most
polynomially larger than the best ratio that can be achieved in polynomial time,
unless P=NP. (When the input is an arbitrary DNF, Umans [876] proves stronger
inapproximability results.)
Additionally, the following surprising fact was proved by Feige [324]: Under
the assumption that NP-complete problems cannot be solved in $O(l^{O(\log \log l)})$ time
(where $l$ denotes the length of the input), no polynomial time algorithm for the
set covering problem can have an approximation ratio less than $(1 - o(1)) \ln \rho$.
Since, by Corollary 3.7, the approximation ratio of the greedy procedure is
$(1 + o(1)) \ln \rho$, the only remaining possibility is to improve the approximation
ratio by a lower-order term $o(\ln \rho)$.
Chvátal [198] generalized Theorem 3.15 and Corollary 3.7 for the weighted
version of the set covering problem in which nonnegative weights ci are associated
with the variables si , and the problem is the following:
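\[
\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{k} c_i s_i \\
\text{subject to} \quad & \sum_{i=1}^{k} a_{ji} s_i \ge 1, \quad j = 1, 2, \ldots, n, \\
& (s_1, s_2, \ldots, s_k) \in \mathcal{B}^k.
\end{aligned}
\]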

In this case, the generalized greedy procedure is defined in a similar way, the only
difference being that at each iteration $r$ a column $a(r)_i$ is chosen so as to maximize
the ratio $w_{ir}/c_i$ of the number of 1's remaining in the column divided by its weight.
The approximation results of Theorem 3.15 and Corollary 3.7 remain valid for this
weighted set covering problem [198]. Therefore, if $q_l$ is the number of literals in
a prime DNF of an $n$-variable Boolean function $f$, constructed by the generalized
greedy procedure applied to the set covering formulation (3.9)–(3.10), and if $m_l$
is the number of literals in a |φ|-minimizing DNF of $f$, then it follows, similarly
to Corollary 3.8, that
\[ q_l \le (1 - \ln 2 + n \ln 2)\, m_l. \tag{3.22} \]

3.4 Extremal and typical parameter values


Several numerical parameters provide important information about Boolean func-
tions and their DNFs. Typical examples of such parameters include the number of
terms and the number of literals of a DNF, the degree of implicants, the number
of irredundant and prime DNFs of a Boolean function, and so on. We discuss first
several issues related to the number of prime implicants of a Boolean function.

3.4.1 Number of prime implicants


The number of different terms, or elementary conjunctions, in $n$ Boolean variables
equals $3^n$, since each variable can be either present in uncomplemented form or
present in complemented form, or can be absent in a term. We shall show that a
Boolean function of $n$ variables can have almost as many prime implicants as there
are terms. We obtain this result by analyzing a special class of Boolean functions,
called symmetric. The value of a symmetric function depends only on the number
of 1's in the Boolean point where it is computed (see Exercise 5 in Chapter 1). An
important subclass of symmetric functions consists of the so-called belt functions,
denoted $b^n_{m,k}$ and defined by
\[
b^n_{m,k}(x_1, \ldots, x_n) =
\begin{cases}
1, & \text{if } m \le \sum_{i=1}^{n} x_i \le m + k, \\
0, & \text{otherwise.}
\end{cases}
\]
Here m and k are nonnegative integers such that m + k ≤ n.



Lemma 3.12. A term $C = \bigwedge_{i \in P} x_i \bigwedge_{j \in N} \overline{x}_j$ is a prime implicant of a belt
function $b^n_{m,k}$ if and only if $|P| = m$ and $|N| = n - m - k$.
Proof. Clearly, every term with $|P| = m$ and $|N| = n - m - k$ is an implicant, since
it covers only points whose number of 1's is between $m$ and $m + k$. Every such
implicant is prime, since removing any literal from the term will result in a term
that covers a point with either fewer than $m$ 1's or more than $m + k$ 1's.
On the other hand, an implicant of $b^n_{m,k}$ must have at least $m$ positive and
$n - m - k$ negative literals, and if an implicant has more than $m$ positive or more
than $n - m - k$ negative literals, then it is not prime, since the term that results
from removing an extra literal will remain an implicant.
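
Lemma 3.12 is easy to confirm by brute force on a small instance. The sketch below enumerates all $3^n$ terms and keeps those that are prime implicants; the encoding of a term as a dictionary from variable index to required value, and the function names, are assumptions made for the illustration, and the approach is only sensible for very small $n$.

```python
from itertools import product

def belt(n, m, k):
    """Belt function: 1 exactly when m <= x_1 + ... + x_n <= m + k."""
    return lambda x: int(m <= sum(x) <= m + k)

def is_implicant(term, f, n):
    """term maps a variable index to its required value (1: x_i, 0: complement)."""
    free = [i for i in range(n) if i not in term]
    for values in product((0, 1), repeat=len(free)):
        x = [0] * n
        for i, v in term.items():
            x[i] = v
        for i, v in zip(free, values):
            x[i] = v
        if not f(x):
            return False
    return True

def prime_implicants(f, n):
    """All prime implicants of f, found by scanning the 3^n possible terms."""
    primes = []
    for pattern in product((None, 0, 1), repeat=n):
        term = {i: v for i, v in enumerate(pattern) if v is not None}
        if not is_implicant(term, f, n):
            continue
        # An implicant is prime when dropping any single literal breaks it.
        if all(not is_implicant({j: v for j, v in term.items() if j != i}, f, n)
               for i in term):
            primes.append(term)
    return primes
```

For $n = 5$, $m = 1$, $k = 2$, every prime implicant found this way has one positive and two negative literals, and there are $5!/(1!\,2!\,2!) = 30$ of them.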

Theorem 3.16. There is a positive constant $c$ such that, for every $n \ge 3$, there
exists a Boolean function of $n$ variables having at least $c\,\frac{3^n}{n}$ prime implicants.
Proof. The statement holds for the belt function $b^n_{n/3,\,n/3}$. Indeed, it follows from
Lemma 3.12 that the number of prime implicants of a belt function $b^n_{m,k}$ equals
\[ \binom{n}{m} \binom{n-m}{n-m-k} = \frac{n!}{m!\,k!\,(n-m-k)!}, \]
which, for $m = k = \frac{n}{3}$, equals
\[ \frac{n!}{\left(\frac{n}{3}!\right)^3}. \tag{3.23} \]
Substituting into (3.23) the well-known Stirling formula (see, e.g., [314])
\[ n! = \sqrt{2\pi n} \left(\frac{n}{e}\right)^n (1 + o(1)), \]
one can see that there exists a positive constant $c$ such that the number of prime
implicants of $b^n_{n/3,\,n/3}$ is at least $c\,\frac{3^n}{n}$.

The previous statement shows that the number of prime implicants of a Boolean
function can be exponentially large in the number of Boolean variables. From the
algorithmic point of view, it is also important to understand how large the number
of prime implicants can be in terms of the length of an arbitrary DNF or CNF
representation of a Boolean function. Interestingly, the number of prime implicants
can be exponential in the length of a DNF, even for seemingly simple functions,
as the following theorem shows.
Theorem 3.17. For every integer $n \ge 1$, there exists a Boolean function $f$ that
has $2^n + 2n$ prime implicants and can be represented by a DNF having $2n + 1$
terms.
Proof. Let $f$ be the Boolean function represented by the DNF

\[ \phi(x_1, \ldots, x_n, y_1, \ldots, y_n) = \bigwedge_{i=1}^{n} x_i \;\vee\; \bigvee_{i=1}^{n} (x_i \overline{y}_i \vee \overline{x}_i y_i), \]

which has $2n + 1$ terms and can be easily seen to be prime.
If we apply Theorem 3.2 consecutively to every pair $\{x_i, y_i\}$, we can see that
an elementary conjunction different from $x_i \overline{y}_i$ or $\overline{x}_i y_i$ is a prime implicant of $f$ if
and only if it has the form $\bigwedge_{i=1}^{n} u_i$, where each $u_i$ is either $x_i$ or $y_i$. Therefore, the
number of prime implicants of $f$ equals $2^n + 2n$.

The argument in this proof can be easily modified (e.g., by adding to φ an
additional linear term $z$) for the case of DNFs with an even number of terms.
Similarly, the number of prime implicants of a Boolean function can be
exponentially large in the length of a CNF representation of the function.

Theorem 3.18. For every integer $n \ge 1$, there exists a Boolean function $f$ that
has $3^n$ prime implicants and can be represented by a CNF having $n$ clauses.

Proof. Let us consider the positive function $f$ of $3n$ variables represented by the
CNF
\[ \psi(x_1, \ldots, x_n, y_1, \ldots, y_n, z_1, \ldots, z_n) = \bigwedge_{i=1}^{n} (x_i \vee y_i \vee z_i). \]

This CNF has $n$ clauses. It is clear from the CNF expression that, in each
minimal true point of $f$, exactly one of the variables $x_i$, $y_i$, and $z_i$ equals 1, for every
$i \in \{1, 2, \ldots, n\}$. Therefore, in view of Theorem 1.26, the complete DNF of $f$
consists of elementary conjunctions of the form $\bigwedge_{i=1}^{n} u_i$, where each $u_i$ is either $x_i$,
$y_i$, or $z_i$. Hence, the function $f$ has $3^n$ prime implicants.
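
The counting argument of this proof can be replayed numerically for small $n$. The sketch below relies on the correspondence, used in the proof, between the prime implicants of a positive function and its minimal true points; the variable ordering $x_1, y_1, z_1, x_2, \ldots$ and the function names are assumptions chosen for the example.

```python
from itertools import product

def minimal_true_points(f, num_vars):
    """Minimal true points of a positive (monotone) function f, by brute force."""
    true_points = [x for x in product((0, 1), repeat=num_vars) if f(x)]

    def strictly_below(u, v):
        # u <= v componentwise and u != v
        return u != v and all(a <= b for a, b in zip(u, v))

    return [x for x in true_points
            if not any(strictly_below(y, x) for y in true_points)]

def psi(n):
    """The CNF (x_1 v y_1 v z_1) ... (x_n v y_n v z_n) as a Python predicate.

    Variables are ordered x_1, y_1, z_1, x_2, y_2, z_2, ... (an assumption).
    """
    return lambda v: all(any(v[3 * i: 3 * i + 3]) for i in range(n))
```

For $n = 2$ this yields $9 = 3^2$ minimal true points, each containing exactly one 1 in every block $(x_i, y_i, z_i)$.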

3.4.2 Extremal parameters of minimal DNFs


To better understand the nature of the logic minimization problem, we provide
in this section some evaluations of the extremal values of a number of important
DNF parameters.
We start our discussion with the analysis of the worst-case values of DNF
parameters. Probably, the most interesting such parameter related to the logic min-
imization problem is the largest number of terms contained in a ||φ||-minimizing
DNF of a Boolean function of n variables.

Theorem 3.19. A ||φ||-minimizing DNF of a Boolean function of $n$ variables
cannot contain more than $2^{n-1}$ terms, and this number of terms can be attained.
Proof. To establish the upper bound, we prove by induction on the number of
variables that every Boolean function of $n$ variables can be represented by a DNF
containing at most $2^{n-1}$ terms. Clearly, the statement holds for $n = 1$. Assuming
that the statement holds for all functions of up to $n - 1$ variables, let us consider
an arbitrary function $f(x_1, x_2, \ldots, x_n)$ and, using the Shannon expansion, represent
it as

\[ f(x_1, x_2, \ldots, x_n) = \overline{x}_n f(x_1, x_2, \ldots, x_{n-1}, 0) \vee x_n f(x_1, x_2, \ldots, x_{n-1}, 1). \]

By the induction hypothesis, $f(x_1, x_2, \ldots, x_{n-1}, 0)$ and $f(x_1, x_2, \ldots, x_{n-1}, 1)$, being
functions of $n - 1$ variables, have DNF representations $\phi_0$ and $\phi_1$ such that
$||\phi_0|| \le 2^{n-2}$ and $||\phi_1|| \le 2^{n-2}$. Then, $f(x_1, x_2, \ldots, x_n)$ can be represented by the
expression

\[ \overline{x}_n \phi_0 \vee x_n \phi_1, \]

which immediately expands into a DNF $\phi$ such that $||\phi|| \le ||\phi_0|| + ||\phi_1|| \le 2^{n-1}$.
To show that the bound is attained, define the parity function of $n$ variables
to be the Boolean function whose value in the Boolean point $X = (x_1, x_2, \ldots, x_n)$
is 1 if and only if $\sum_{i=1}^{n} x_i$ is odd. Obviously, the number of true points of the
parity function is $2^{n-1}$. Since every two terms in the minterm DNF of the parity
function have degree $n$ and conflict in at least two variables, this DNF is closed
under absorption and consensus, and is therefore the complete DNF of the parity
function. Since the minterm DNF is obviously irredundant, it then follows that the
parity function has a unique DNF representation, and that this representation has
$2^{n-1}$ terms.
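
The inductive construction in the first half of this proof translates directly into a recursive procedure. The following sketch is one possible rendering under stated assumptions: a function is a Python callable on 0/1 tuples, a term is a dictionary from variable index to required value, and the names `shannon_dnf`, `evaluate`, and `parity` are hypothetical.

```python
def shannon_dnf(f, n):
    """DNF of f (a callable on 0/1 tuples of length n >= 1) with at most 2**(n-1) terms.

    A term is a dict {variable index: required value}; the empty dict stands
    for the constant 1.
    """
    if n == 1:
        f0, f1 = f((0,)), f((1,))
        if f0 and f1:
            return [{}]          # constant 1: a single empty term
        if f1:
            return [{0: 1}]      # the literal x_1
        if f0:
            return [{0: 0}]      # the complemented literal
        return []                # constant 0: the empty DNF
    terms = []
    for value in (0, 1):
        # Restriction of f obtained by fixing the last variable to `value`.
        restricted = lambda xs, value=value: f(xs + (value,))
        for t in shannon_dnf(restricted, n - 1):
            terms.append({**t, n - 1: value})
    return terms

def evaluate(terms, x):
    """Value of the DNF `terms` at the 0/1 tuple x."""
    return int(any(all(x[i] == v for i, v in t.items()) for t in terms))

def parity(x):
    return sum(x) % 2
```

Applied to the parity function of 4 variables, the procedure returns exactly $2^3 = 8$ terms, in line with the second half of the proof; for other functions it may return a DNF that is far from minimal, since no simplification is attempted.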

Another parameter of interest for logic minimization is the so-called spread of
$f$: If $\phi_m$ is any ||φ||-minimizing DNF of $f$, the spread of $f$ is
\[ Y(f) = \max\left\{ \frac{||\phi||}{||\phi_m||} : \phi \text{ is a prime irredundant DNF of } f \right\}. \]
It was shown by Vasiliev [889] that the maximum value of $Y(f)$ over all Boolean
functions of $n$ variables is at least $2^{n - 3\sqrt{n}}$, which clearly justifies the relevance of
logic minimization.
Since among the ||φ||-minimizing DNFs there is always a prime irredundant
one, it is also interesting to obtain some information about the number $I(f)$ of
different prime irredundant DNFs of a Boolean function $f$. It turns out (see [890])
that the maximum value of $I(f)$ over all Boolean functions of $n$ variables exceeds
$2^{2^{n(1-o(1))}}$, where $o(1) \to 0$ when $n \to \infty$.

3.4.3 Typical parameters of Boolean functions and their DNFs


Since the number of distinct Boolean functions of $n$ variables is $2^{2^n}$, let us say
that a certain property holds for almost all Boolean functions if the number of
functions of $n$ variables that have this property is $(1 - o(1))\,2^{2^n}$.
Theorem 3.20. For almost all Boolean functions $f$ of $n$ variables, the number
$|T(f)|$ of true points of $f$ satisfies the inequalities:
\[ 2^{n-1} - n 2^{n/2} \le |T(f)| \le 2^{n-1} + n 2^{n/2}. \tag{3.24} \]

Proof. The number of Boolean functions of $n$ variables having exactly $k$ true points
is $\binom{2^n}{k}$, since every Boolean point is either a true point or a false point. Hence, the
total number of Boolean functions with the property that their number of true
points satisfies (3.24) is
\[ \sum_{k = 2^{n-1} - n 2^{n/2}}^{2^{n-1} + n 2^{n/2}} \binom{2^n}{k} \;=\; 2^{2^n} - 2 \sum_{k=0}^{2^{n-1} - n 2^{n/2} - 1} \binom{2^n}{k}. \]

The statement of the theorem follows from the fact that
\[ \sum_{k=0}^{2^{n-1} - n 2^{n/2}} \binom{2^n}{k} \;\le\; \left(2^{n-1} - n 2^{n/2}\right) \binom{2^n}{2^{n-1} - n 2^{n/2}} \;=\; o\!\left(2^{2^n}\right), \]

where the last equality can be obtained by using the formula $\binom{m}{k} = \frac{m!}{k!\,(m-k)!}$, together
with the following refined version of the Stirling formula (see, e.g., [314]):
\[ \sqrt{2\pi n} \left(\frac{n}{e}\right)^n e^{\frac{1}{12n+1}} < n! < \sqrt{2\pi n} \left(\frac{n}{e}\right)^n e^{\frac{1}{12n}}, \]
with the limits: $(1 + \frac{1}{m})^m \to e$ and $(1 - \frac{1}{m})^m \to \frac{1}{e}$ when $m \to \infty$.
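
For moderate $n$, the proportion of functions satisfying (3.24) can be computed exactly from the binomial counts used in the proof. The sketch below is illustrative only; the function name `fraction_in_range` is hypothetical, and the interval endpoints are rounded inward to integers, which is an assumption needed when $n$ is odd.

```python
from fractions import Fraction
from math import ceil, comb, floor, sqrt

def fraction_in_range(n):
    """Exact fraction of n-variable Boolean functions whose number of true
    points lies in [2**(n-1) - n*2**(n/2), 2**(n-1) + n*2**(n/2)]."""
    points = 2 ** n
    half_width = n * sqrt(points)                 # n * 2**(n/2)
    low = max(0, ceil(points / 2 - half_width))
    high = min(points, floor(points / 2 + half_width))
    # comb(points, k) counts the functions with exactly k true points.
    favourable = sum(comb(points, k) for k in range(low, high + 1))
    return Fraction(favourable, 2 ** points)
```

Note that for $n \le 8$ the interval in (3.24) covers the whole range $[0, 2^n]$, so the computed fraction is exactly 1; the inequalities only become restrictive from $n = 9$ onward.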

A simple interpretation of Theorem 3.20 is that, for almost all Boolean functions,
the number of true points is about the same as that of false points, namely, about
$2^{n-1}$. After establishing this fact, it is natural to ask in what way these two sets
of true and false points are mixed in the Boolean hypercube. More specifically,
one may wonder whether the set of true points of a typical Boolean function
contains large subcubes. The next theorem states that a typical Boolean function
has only “long” implicants, thus showing that the answer to the previous question
is negative.
Theorem 3.21. For almost all Boolean functions of $n$ variables, the degree of
every implicant is at least $n - \log_2(3n)$.
Proof. Before proving the statement, we first calculate the average number of
implicants of a fixed degree $k$ over the set of all Boolean functions of $n$ variables.
Note that the number of different terms of degree $k$ is $\binom{n}{k} 2^k$. Every such term
takes the value 1 in exactly $2^{n-k}$ Boolean vectors. Therefore, every such term is
an implicant of exactly $2^{2^n - 2^{n-k}}$ different Boolean functions of $n$ variables.
Let us consider now a bipartite graph having two disjoint vertex sets $A$ and $B$,
where the nodes in $A$ correspond to the terms of degree $k$ over $n$ variables, while
the nodes in $B$ correspond to the different Boolean functions of $n$ variables; an
edge $(a, f)$ connects $a \in A$ to $f \in B$ if and only if $a$ is an implicant of $f$. Clearly,
the number of edges in this graph is
\[ \binom{n}{k}\, 2^k\, 2^{2^n - 2^{n-k}}. \]
Since the total number of Boolean functions of $n$ variables is $2^{2^n}$, the average
number of edges incident to a node in $B$ is
\[ \frac{1}{2^{2^n}} \binom{n}{k}\, 2^k\, 2^{2^n - 2^{n-k}} = \frac{\binom{n}{k} 2^k}{2^{2^{n-k}}}. \tag{3.25} \]
Obviously, this number is the average number of implicants of degree $k$ over the
set of all Boolean functions of $n$ variables. It follows that at most $\frac{1}{n} 2^{2^n}$ Boolean
functions of $n$ variables can have
\[ g(n, k) = \frac{n \binom{n}{k} 2^k}{2^{2^{n-k}}} \]
or more implicants of degree $k$, since, otherwise, the average number of implicants
would exceed (3.25). Therefore, for almost all Boolean functions of $n$ variables,
the number of implicants of degree $k$ is at most $g(n, k)$.
Obviously, if $g(n, k) < 1$, then it is true that almost all Boolean functions of $n$
variables do not have implicants of degree $k$ or less. Since neither $\binom{n}{k}$ nor $2^k$ can
exceed $2^n$, and since $n < 2^n$, the inequality
\[ 2^{3n} < 2^{2^{n-k}} \tag{3.26} \]
implies that $g(n, k) < 1$. Obviously, the inequality (3.26) is implied by the
inequality
\[ k < n - \log_2(3n). \tag{3.27} \]
This shows that if (3.27) holds, then $g(n, k) < 1$. Hence, for almost all Boolean
functions of $n$ variables, the degree of any implicant is at least $n - \log_2(3n)$.
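
The averaging step (3.25) can be checked exhaustively when $n$ is tiny, and the implication from (3.27) to $g(n, k) < 1$ can be spot-checked for larger $n$. The sketch below does both; it encodes each Boolean function as a bitmask of its true points, which is an implementation choice made for this example, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def average_implicants(n, k):
    """Average, over all 2**(2**n) functions, of the number of degree-k implicants."""
    points = list(product((0, 1), repeat=n))
    # Encode the set of points covered by each degree-k term as a bitmask.
    covers = []
    for variables in combinations(range(n), k):
        for values in product((0, 1), repeat=k):
            mask = 0
            for idx, x in enumerate(points):
                if all(x[v] == b for v, b in zip(variables, values)):
                    mask |= 1 << idx
            covers.append(mask)
    total = 0
    for table in range(2 ** (2 ** n)):      # table = bitmask of the true points
        # A term is an implicant when all the points it covers are true points.
        total += sum(1 for mask in covers if mask & table == mask)
    return Fraction(total, 2 ** (2 ** n))

def formula_3_25(n, k):
    """Right-hand side of (3.25)."""
    return Fraction(comb(n, k) * 2 ** k, 2 ** (2 ** (n - k)))

def g(n, k):
    """The threshold g(n, k) used in the proof."""
    return n * formula_3_25(n, k)
```

For $n = 3$ the exhaustive average agrees with (3.25) for every degree; for example, both give 3 for $k = 2$.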

The next natural question concerns the number of terms (or literals) in a ||φ||-
minimizing (or |φ|-minimizing) DNF of a typical Boolean function of n variables.
Several important results are known in this area. We shall not present the detailed
proofs of these technical results here, and we give only a brief overview.
An interesting result obtained by Nigmatullin [712] shows that the number of
terms (respectively, literals) in the ||φ||-minimizing (respectively, |φ|-minimizing)
DNFs of almost all Boolean functions of n variables is asymptotically the same.
Let t(n) and l(n) represent “asymptotic estimates” of these two numbers. It follows
from Theorem 3.21 that l(n) behaves like nt(n); thus, it is sufficient to estimate
t(n) only.
Glagolev [385] obtained the following lower bound on $t(n)$:
\[ t(n) \ge \frac{2^{n-1}}{(\log_2 n)(\log_2 \log_2 n)}. \]
Moreover, an upper bound on $t(n)$ obtained by Sapozhenko [803] shows that
\[ t(n) \le \frac{2^n}{\log_2 n}. \]
Together with Theorem 3.19, these two bounds imply that the number of terms
in the ||φ||-minimizing DNFs of almost all Boolean functions of n variables is
asymptotically smaller than the worst possible one, but not by much.
To conclude, we stress that the results in this section are only intended to indicate
the flavor of the research carried out in this area. A more complete presentation
would substantially exceed the scope of this volume.

3.5 Exercises
1. Consider a set of ordered pairs D = {(i, j )}, where i, j ∈ {1, 2, . . . , n}, and call
a Boolean point X = (x1 , x2 , . . . , xn ) D-feasible if, for every pair (i, j ) ∈ D,
the implication "$x_i = 1$ implies $x_j = 1$" holds. Let $f_D$ be the Boolean function
that takes the value 1 on D-feasible Boolean points, and the value 0 on all
the other Boolean points. Prove that $\overline{f}_D$ has no prime implicants of degree 3.
2. Consider the linear inequality
\[ \sum_{i=0}^{n} 2^i x_i \le k, \]
where $x_i \in \{0, 1\}$, $i = 0, 1, \ldots, n$, and consider the Boolean function
$f(x_0, x_1, \ldots, x_n)$ that takes the value 0 if and only if the Boolean point
$(x_0, x_1, \ldots, x_n)$ satisfies the given inequality. Determine the maximum degree
of a prime implicant of $f$ if
(a) $k = 2^m - 1$,
(b) $k = 2^m - 2$,
where $m$ is a positive integer not exceeding $n$.
3. Prove that it is NP-hard to check whether all the prime implicants of a
Boolean function given by a DNF are quadratic.
4. Prove Lemma 3.8.

5. Let $\psi = \bigvee_{k=1}^{m} C_k$ be a DNF of a Boolean function $f$. Prove that the following
statements are equivalent:
(a) The collection $\{C_k \mid k = 1, 2, \ldots, m\}$ contains all prime implicants of $f$.
(b) For every DNF $\phi$, the implication $\phi \le \psi$ holds if and only if each term
of $\phi$ is absorbed by some term of $\psi$.
6. Prove that, if $f$ and $g$ are two Boolean functions on $\mathcal{B}^n$ such that $g \le f$, then
every implicant of $g$ is absorbed by some prime implicant of $f$.
7. Prove that the following problem is co-NP-complete: Given an elementary
conjunction $C$, and given a prime and irredundant DNF $\psi$, decide whether
$C$ is an implicant of the function represented by $\psi$. (Compare with
Theorem 3.7.) Hint: Show that the DNF equation $\bigvee_{k=1}^{m} C_k = 0$ is consistent if
and only if $C = y_1 y_2 \ldots y_m$ is not an implicant of the prime and irredundant
DNF $\psi(X, Y) = \bigvee_{k=1}^{m} y_k C_k$.
8. Prove that it is NP-complete to decide whether a DNF $\psi$ is irredundant.
Hint: Show that the DNF equation $\bigvee_{k=1}^{m} C_k = 0$ is consistent if and only if
the DNF $\psi(X, Y) = y_1 y_2 \ldots y_m \vee \bigvee_{k=1}^{m} y_k C_k$ is irredundant.
9. Prove that it is NP-complete to decide whether a DNF $\psi$ is prime. (Compare
with Corollary 3.3.) Hint: Show that the DNF equation $\bigvee_{k=1}^{m} C_k = 0$ is
consistent if and only if the DNF $\psi(X, Y) = (\overline{y}_1 y_2 \ldots y_m) \vee (y_1 \overline{y}_2 y_3 \ldots y_m)
\vee \ldots \vee (y_1 \ldots y_{m-1} \overline{y}_m) \vee \bigvee_{k=1}^{m} y_k C_k$ is prime.
10. Let $f$ be an arbitrary Boolean function on $\mathcal{B}^n$, let $D^+$ be the set of its positive
prime implicants, and let $f_-$ be the largest positive minorant of $f$; namely,
$f_-$ is the largest positive function smaller than $f$. Prove that $f_- = \bigvee_{P \in D^+} P$.
(See Exercise 13 in Chapter 1, and Hammer, Johnson and Peled [443].)
11. If every term of a DNF of a Boolean function f contains at most one
nonnegated variable, then show that the same property holds for each prime
implicant of f .
12. Does the property described in Exercise 11 hold if “at most one” is replaced
by “at least one”?
13. Does the property described in Exercise 11 hold if “at most one” is replaced
by “at most two”?
14. Let us define a simplification algorithm ERA+ consisting of the application
of ERA followed by the repeated application of S4 as long as possible, and
then the iteration of these two steps as long as possible. Let $\hat{A}$ denote the
final set covering matrix, let $\hat{s}_1$ denote the number of the $s$ variables fixed
at 1, and let $\hat{s}_0$ denote the number of the $s$ variables fixed at 0. Let us also
consider a procedure consisting of the repeated applications of S1, S2, S3,
and S4 in any order and as long as possible. Let $A^*$, $s_1^*$, and $s_0^*$ denote the
final set covering matrix and the number of the $s$ variables fixed at 1 and at
0, respectively. Prove that
• $\hat{s}_1 = s_1^*$,
• $\hat{s}_0 = s_0^*$,
• the matrix $\hat{A}$ can be obtained from $A^*$ by a permutation of its rows and
columns.
15. Prove that if a Boolean function f is not identically 1, then every linear
implicant of f is an essential prime implicant.
16. Prove that if a Boolean function has linear prime implicants, then the DNF
constructed by the greedy procedure will include all of them.
17. Use the result in the previous exercise to show that, if n ≥ 2, then the
approximation ratio of the greedy procedure is not greater than n ln 2 −
2 ln 2 + 1.
18. Construct an example showing that the DNF produced by the greedy
procedure is not necessarily irredundant.
4
Duality theory
Yves Crama and Kazuhisa Makino

This chapter deals with yet another fundamental topic in the theory of Boolean
functions, namely, duality theory. Some of the applications of duality were sketched
in Chapter 1, and the concept has appeared on various occasions in Chapters 2 and 3.
Here, we collect some of the basic properties of the dual of Boolean functions
and then characterize those functions that are comparable to (i.e., either imply,
or are implied by) their dual. A large section of the chapter is then devoted to
algorithmic aspects of dualization, especially for the special and most interesting
case of positive functions expressed in disjunctive normal form. It turns out that
the complexity of the latter problem remains incompletely understood, in spite of
much recent progress on the question.

4.1 Basic properties and applications


4.1.1 Dual functions and expressions
Recall Definition 1.8 from Chapter 1, Section 1.3.
Definition 4.1. The dual of a Boolean function $f$ is the function $f^d$ defined by

\[ f^d(X) = \overline{f(\overline{X})} \tag{4.1} \]

for all $X = (x_1, x_2, \ldots, x_n) \in \mathcal{B}^n$, where $\overline{X} = (\overline{x}_1, \overline{x}_2, \ldots, \overline{x}_n)$.

Example 4.1. Let $f$ be the 2-variable function defined by $f(0, 0) = f(0, 1) =
f(1, 1) = 1$ and $f(1, 0) = 0$. Then the dual of $f$ is defined by $f^d(0, 0) = f^d(1, 0) =
f^d(1, 1) = 0$ and $f^d(0, 1) = 1$.
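
Definition 4.1 is straightforward to mirror in code. The following minimal sketch assumes that a Boolean function is given as a Python callable on 0/1 tuples; the names `dual`, `f`, `fd`, and `table` are introduced only for this illustration, and the function `f` is the one from Example 4.1.

```python
from itertools import product

def dual(f):
    """Return f^d, where f^d(X) = NOT f(NOT X) on 0/1 tuples."""
    return lambda x: 1 - f(tuple(1 - xi for xi in x))

# The function of Example 4.1: it takes the value 0 only at the point (1, 0).
def f(x):
    return 0 if x == (1, 0) else 1

fd = dual(f)
# Truth table of the dual on the four points of B^2.
table = {x: fd(x) for x in product((0, 1), repeat=2)}
```

Here `table` assigns the value 1 to the point (0, 1) and the value 0 to the three other points, as stated in the example.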
The basic properties of dual functions are easily established.
Theorem 4.1. If $f$ and $g$ are Boolean functions on $\mathcal{B}^n$, then
(a) $g = f^d$ if and only if, for all $X \in \mathcal{B}^n$, $f(X) \vee g(\overline{X}) = 1$ and $f(X)\, g(\overline{X}) = 0$;
(b) $(f^d)^d = f$ (involution: the dual of the dual is the function itself);
(c) $(\overline{f})^d = \overline{(f^d)}$;
(d) $(f \vee g)^d = f^d g^d$;
(e) $(f g)^d = f^d \vee g^d$;
(f) $f \le g$ if and only if $g^d \le f^d$.

Proof. All these properties are trivial consequences of Definition 4.1 (properties
(b)–(e) have already been verified in Theorem 1.2). 
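
Since there are only 16 Boolean functions of two variables, properties (a)–(f) can also be confirmed exhaustively in that case. The sketch below does so with truth tables stored as dictionaries; this representation and all helper names are assumptions made for the example, and the check is of course no substitute for the proof.

```python
from itertools import product

N = 2
POINTS = list(product((0, 1), repeat=N))

def comp(x):
    """Complement of a Boolean point."""
    return tuple(1 - xi for xi in x)

def dual(table):
    """table maps each point to 0/1; the dual maps X to NOT table[NOT X]."""
    return {x: 1 - table[comp(x)] for x in POINTS}

def all_functions():
    for values in product((0, 1), repeat=len(POINTS)):
        yield dict(zip(POINTS, values))

def join(f, g):
    return {x: f[x] | g[x] for x in POINTS}

def meet(f, g):
    return {x: f[x] & g[x] for x in POINTS}

def neg(f):
    return {x: 1 - f[x] for x in POINTS}

def leq(f, g):
    return all(f[x] <= g[x] for x in POINTS)

def check_theorem_4_1():
    for f in all_functions():
        assert dual(dual(f)) == f                                   # (b)
        assert dual(neg(f)) == neg(dual(f))                         # (c)
        for g in all_functions():
            cond_a = all((f[x] | g[comp(x)]) == 1 and (f[x] & g[comp(x)]) == 0
                         for x in POINTS)
            assert (g == dual(f)) == cond_a                         # (a)
            assert dual(join(f, g)) == meet(dual(f), dual(g))       # (d)
            assert dual(meet(f, g)) == join(dual(f), dual(g))       # (e)
            assert leq(f, g) == leq(dual(g), dual(f))               # (f)
    return True
```

Calling `check_theorem_4_1()` runs through all 256 ordered pairs of functions and raises an `AssertionError` if any property fails.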

In view of the involution property (b), we sometimes say that two functions
$f$, $g$ are mutually dual when $g = f^d$ or, equivalently, when $f = g^d$.
Observe that the properties stated in Theorem 4.1 continue to hold when we
replace dualization by complementation. As a matter of fact, investigating
properties of the dual function $f^d$ is tantamount to investigating properties of the function
$\overline{f}$, namely, the complement of $f$, up to the "change of variables" $X \leftrightarrow \overline{X}$. It turns
out, however, that the duality concept arises quite naturally in several applications.
Therefore, we prefer to place our discussion in this framework.
If $f$ is a function on $\mathcal{B}^n$ and $P$, $N$ are disjoint subsets of $\{1, 2, \ldots, n\}$, then we
denote by $f_{|P,N}$ the restriction of $f$ obtained by fixing $x_i = 1$ for all $i \in P$ and
$x_j = 0$ for all $j \in N$. The next property expresses in a formal way that "the dual
of the restriction of a function is the restriction of the dual of the function to the
complementary values."

Theorem 4.2. Let $f$ be a Boolean function on $\mathcal{B}^n$, and let $P, N \subseteq \{1, 2, \ldots, n\}$,
with $P \cap N = \emptyset$. Then $(f_{|P,N})^d = (f^d)_{|N,P}$.

Proof. This property follows from the definition of f d . 

Another easy, but useful, property is stated as follows:

Theorem 4.3. Let $f$ and $g$ be two Boolean functions on $\mathcal{B}^n$. If $f$ and $g$ are mutually
dual, then, for all $i \in \{1, 2, \ldots, n\}$, $f_{|x_i=0}$ and $g_{|x_i=1}$ are mutually dual, and $f_{|x_i=1}$
and $g_{|x_i=0}$ are mutually dual. Conversely, if for some $i \in \{1, 2, \ldots, n\}$, $f_{|x_i=0}$ and
$g_{|x_i=1}$ are mutually dual, and $f_{|x_i=1}$ and $g_{|x_i=0}$ are mutually dual, then $f$ and $g$ are
mutually dual.

Proof. For every $i = 1, 2, \ldots, n$, we can write the Shannon expansions of $g$ and
$f^d$ as

\[ g = x_i\, g_{|x_i=1} \vee \overline{x}_i\, g_{|x_i=0}, \tag{4.2} \]

\[ f^d = x_i\, (f^d)_{|x_i=1} \vee \overline{x}_i\, (f^d)_{|x_i=0}. \tag{4.3} \]

From (4.3) and Theorem 4.2,

\[ f^d = x_i\, (f_{|x_i=0})^d \vee \overline{x}_i\, (f_{|x_i=1})^d. \tag{4.4} \]

The theorem follows by comparing (4.2) and (4.4).


We also recall that, by definition, the dual of a Boolean expression $\phi$ is the
expression $\phi^d$ obtained by exchanging ∨ and ∧ as well as the constants 0 and 1 in
$\phi$ (Definition 1.9). We have shown that if the expression $\phi$ represents $f$, then $\phi^d$
represents $f^d$ (Theorem 1.3). The latter property can be seen as a consequence of
De Morgan's laws.

Example 4.2. Consider again the function $f$ in Example 4.1. Then $\varphi = \overline{x} \vee y$
represents $f$, and the dual expression $\varphi^d = \overline{x}\, y$ represents $f^d$.

More generally, we mention the following fundamental duality principle of
Boolean algebra:

Theorem 4.4. Let $I$ be a valid statement expressed in terms of the constants 0, 1;
the operations ∨, ∧; the implication relations ≤, ≥; and Boolean functions. Then
the "dual statement" $I^d$ obtained from $I$ by exchanging the symbols 0 and 1, ∨
and ∧, ≤ and ≥, and by replacing every function by its dual, is also valid.

Proof. We refer, for example, to Rudeanu [795] or Stoll [848]. 

4.1.2 Normal forms and implicants of dual functions


This subsection considers disjunctive and conjunctive normal forms, as well as
(prime) implicants and implicates of dual functions. The following connection
is an immediate consequence of the properties mentioned before; we record it
explicitly because of its importance.
  
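Theorem 4.5. The DNF $\phi = \bigvee_{k=1}^{m} \left( \bigwedge_{j \in P_k} x_j \bigwedge_{j \in N_k} \overline{x}_j \right)$ represents the Boolean
function $f$ if and only if the CNF $\psi = \bigwedge_{k=1}^{m} \left( \bigvee_{j \in P_k} x_j \vee \bigvee_{j \in N_k} \overline{x}_j \right)$ represents $f^d$.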

Proof. This holds because ψ is the dual expression of φ. 

This theorem suggests a simple characterization of the (prime) implicants and
implicates of dual functions.

Theorem 4.6. For a Boolean function $f$,
(i) the elementary conjunction $C_{PN} = \bigwedge_{j \in P} x_j \bigwedge_{j \in N} \overline{x}_j$ is an implicant
(respectively, a prime implicant) of $f^d$ if and only if $D_{PN} =
\bigvee_{j \in P} x_j \vee \bigvee_{j \in N} \overline{x}_j$ is an implicate (respectively, a prime implicate) of $f$;
(ii) the elementary disjunction $D_{PN} = \bigvee_{j \in P} x_j \vee \bigvee_{j \in N} \overline{x}_j$ is an implicate
(respectively, a prime implicate) of $f^d$ if and only if $C_{PN} = \bigwedge_{j \in P} x_j \bigwedge_{j \in N} \overline{x}_j$ is an
implicant (respectively, a prime implicant) of $f$.
Proof. Assertion (i) easily follows from the observation that $f \le D_{PN}$ if and only
if $D_{PN}^d \le f^d$, and from the identity $D_{PN}^d = C_{PN}$. Assertion (ii) is obtained by
interchanging the roles of $f$ and $f^d$ in the previous one.
The next result presents an alternative characterization of dual prime
implicants, which is frequently useful. In words, this characterization expresses that,
when viewed as collections of literals, dual implicants and implicants always have
a nonempty intersection, and that dual prime implicants are minimal with this
property. (A similar characterization of prime implicates would be immediately
obtained by combining the statements of Theorems 4.6 and 4.7.)

Theorem 4.7. Let $\phi = \bigvee_{i=1}^{m} \left( \bigwedge_{j \in P_i} x_j \bigwedge_{j \in N_i} \overline{x}_j \right)$ be an arbitrary DNF of
a Boolean function $f$, and let $C_{PN} = \bigwedge_{j \in P} x_j \bigwedge_{j \in N} \overline{x}_j$ be an elementary
conjunction. Then,
(i) $C_{PN}$ is an implicant of $f^d$ if and only if

\[ (P \cap P_i) \cup (N \cap N_i) \ne \emptyset \quad \text{for } i = 1, 2, \ldots, m; \tag{4.5} \]

(ii) $C_{PN}$ is a prime implicant of $f^d$ if and only if (4.5) holds and, for every $P' \subseteq P$
and $N' \subseteq N$ with $P' \cup N' \ne P \cup N$, there exists an index $i \in \{1, 2, \ldots, m\}$
such that
\[ (P' \cap P_i) \cup (N' \cap N_i) = \emptyset. \]
Proof. By definition of dual functions (namely, $f^d(X) = \overline{f(\overline{X})}$), $C_{PN}$ is an
implicant of $f^d$ if and only if $C_{NP} = \bigwedge_{j \in P} \overline{x}_j \bigwedge_{j \in N} x_j$ is an implicant of $\overline{f}$. Since
$f \wedge \overline{f} = 0$, the identity $C_{P_i N_i} \wedge C_{NP} = 0$ must hold for all implicants $C_{P_i N_i}$ of $f$,
which implies (4.5).
Conversely, if (4.5) holds, then $f \wedge C_{NP} = 0$ holds identically, meaning that
$C_{NP}$ is an implicant of $\overline{f}$. This establishes assertion (i).
Assertion (ii) follows from the definition of prime implicants.

Observe that, in conditions (i) and (ii) of Theorem 4.7, the conjunctions $C_{P_i N_i}$
could be taken to be prime implicants, rather than arbitrary implicants of f.
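Condition (4.5) is a purely set-theoretic test, so it is easy to try out on small instances. The sketch below is a minimal illustration under the assumption that a DNF is stored as a list of pairs (P_i, N_i) of Python sets with 0-based indices; the function names and the example DNF are hypothetical. For primality it removes one literal at a time, which suffices because condition (4.5) is preserved when literals are added.

```python
from itertools import product

def is_dual_implicant(terms, P, N):
    """Condition (4.5): (P, N) meets every term (P_i, N_i) of the DNF of f."""
    return all((P & Pi) or (N & Ni) for Pi, Ni in terms)

def is_dual_prime_implicant(terms, P, N):
    """Theorem 4.7(ii): (4.5) holds, and fails as soon as one literal is dropped."""
    if not is_dual_implicant(terms, P, N):
        return False
    return (all(not is_dual_implicant(terms, P - {j}, N) for j in P) and
            all(not is_dual_implicant(terms, P, N - {j}) for j in N))

def brute_force_dual_implicant(terms, P, N, n):
    """Direct check of C_PN <= f^d on all points of B^n (for comparison only)."""
    for x in product((0, 1), repeat=n):
        c = all(x[j] for j in P) and not any(x[j] for j in N)
        comp = tuple(1 - v for v in x)
        f_comp = any(all(comp[j] for j in Pi) and not any(comp[j] for j in Ni)
                     for Pi, Ni in terms)
        if c and f_comp:      # C_PN(X) = 1 while f^d(X) = 1 - f(comp X) = 0
            return False
    return True

# Hypothetical example: f = x0 (not x1)  OR  x1 x2
terms = [({0}, {1}), ({1, 2}, set())]
```

For this example, `is_dual_prime_implicant(terms, {0, 1}, set())` holds, whereas `({0, 1, 2}, set())` satisfies (4.5) without being prime.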

4.1.3 Dual-comparable functions


Definition 4.2. A Boolean function f is called dual-minor if $f \le f^d$, dual-major
if $f \ge f^d$, and self-dual if $f^d = f$. A function is dual-comparable if it is either
dual-minor, dual-major, or self-dual.
Example 4.3. The function $f = x_1 x_2 x_3$ is dual-minor, since $f^d = x_1 \vee x_2 \vee x_3$
satisfies $f \le f^d$. By Theorem 4.5, the dual of
$g = x_1 x_2 \bar{x}_3 \vee x_1 \bar{x}_2 x_3 \vee \bar{x}_1 x_2 x_3 \vee \bar{x}_1 \bar{x}_2 \bar{x}_3$ is

$g^d = (x_1 \vee x_2 \vee \bar{x}_3)(x_1 \vee \bar{x}_2 \vee x_3)(\bar{x}_1 \vee x_2 \vee x_3)(\bar{x}_1 \vee \bar{x}_2 \vee \bar{x}_3)$
$\phantom{g^d} = x_1 x_2 \bar{x}_3 \vee x_1 \bar{x}_2 x_3 \vee \bar{x}_1 x_2 x_3 \vee \bar{x}_1 \bar{x}_2 \bar{x}_3$,

and therefore g is self-dual.


4.1 Basic properties and applications 171

The investigation of dual-comparable functions has proved useful in a variety of
contexts (see, e.g., Muroga [698]). The next theorems present several characterizations
of these functions. First, we observe that trivial examples of dual-comparable
functions are easily provided.

Theorem 4.8. Suppose that the function f has a prime implicant of degree 1.
Then, f is dual-major. Moreover, f is dual-minor (and self-dual) if and only if f
has no other prime implicant.

Proof. Assume without loss of generality that $f(x_1, x_2, \dots, x_n) = x_1 \vee g(x_2, x_3, \dots, x_n)$.
Then, $f^d = x_1 g^d$. Since f = 0 implies $x_1 = 0$, and hence $f^d = 0$, we see that
f is dual-major. If f has no other prime implicant than $x_1$, then f is clearly
self-dual. Conversely, if f has another prime implicant, then there exists a point
$(x_1^*, x_2^*, \dots, x_n^*) \in B^n$ such that $x_1^* = 0$ and $f(x_1^*, x_2^*, \dots, x_n^*) = 1$. But $x_1^* = 0$ implies
$f^d(x_1^*, x_2^*, \dots, x_n^*) = 0$, and we conclude that f is not dual-minor.

Of course, the function g in Example 4.3 shows that there also exist nontrivial
examples of dual-comparable functions. The next result is a simple restatement of
Definition 4.2 (compare with Theorem 4.1(a)).

Theorem 4.9. Let f be a Boolean function on $B^n$.

(i) f is dual-minor if and only if the complement of every true point of f is a
false point of f: For all $X \in B^n$, $f(X) = 1 \Rightarrow f(\bar{X}) = 0$ or, equivalently,
$f(X)\, f(\bar{X}) = 0$.
(ii) f is dual-major if and only if the complement of every false point of f is
a true point of f: For all $X \in B^n$, $f(X) = 0 \Rightarrow f(\bar{X}) = 1$ or, equivalently,
$f(X) \vee f(\bar{X}) = 1$.
(iii) f is self-dual if and only if every pair of complementary points contains
exactly one true point and one false point of f: For all $X \in B^n$, $f(X) = 1
\Leftrightarrow f(\bar{X}) = 0$.
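The three conditions of Theorem 4.9 translate directly into a truth-table test. The following sketch is only an illustration for small n; the helper names are assumptions, and the function g of Example 4.3 is encoded under the assumed reading that it is true exactly when an even number of its three variables equal 1.

```python
from itertools import product

def complement(x):
    return tuple(1 - v for v in x)

def is_dual_minor(f, n):
    # Theorem 4.9(i): f(X) f(complement of X) = 0 for every X
    return all(not (f(x) and f(complement(x)))
               for x in product((0, 1), repeat=n))

def is_dual_major(f, n):
    # Theorem 4.9(ii): f(X) or f(complement of X) = 1 for every X
    return all(f(x) or f(complement(x))
               for x in product((0, 1), repeat=n))

def is_self_dual(f, n):
    # Theorem 4.9(iii): both conditions hold simultaneously
    return is_dual_minor(f, n) and is_dual_major(f, n)

f = lambda x: x[0] & x[1] & x[2]       # f = x1 x2 x3 (Example 4.3)
g = lambda x: int(sum(x) % 2 == 0)     # assumed reading of g in Example 4.3
h = lambda x: x[0] | x[1]              # x1 OR x2, an extra illustrative case
```

With these definitions, f is dual-minor but not dual-major, g is self-dual, and h is dual-major but not dual-minor.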
The next characterization of dual-minor functions is based on Theorem 4.7.

Theorem 4.10. A function f is dual-minor if and only if

$(P \cap P') \cup (N \cap N') \neq \emptyset$ (4.6)

for all pairs of (prime) implicants $C_{PN} = \bigwedge_{j \in P} x_j \bigwedge_{j \in N} \bar{x}_j$ and
$C_{P'N'} = \bigwedge_{j \in P'} x_j \bigwedge_{j \in N'} \bar{x}_j$ of f.

Proof. If f is dual-minor, then every implicant $C_{PN}$ of f is an implicant of $f^d$,
since $C_{PN} \le f \le f^d$. In view of Theorem 4.7(i), this implies conditions (4.6).
On the other hand, if f is not dual-minor, then there exists a (prime) implicant
$C_{PN}$ of f such that $C_{PN} \not\le f^d$, that is, such that $C_{PN}$ is not an implicant of $f^d$.
Hence, by Theorem 4.7(i), there exists a (prime) implicant $C_{P'N'}$ of f such that
(4.6) does not hold.

We now give a necessary and sufficient condition for a function to be dual-major.

Theorem 4.11. A function f is dual-major if and only if, for all A ⊆ {1, 2, ..., n},
there exists a (prime) implicant $C_{PN} = \bigwedge_{j \in P} x_j \bigwedge_{j \in N} \bar{x}_j$ of f such that either
$P \subseteq A$ and $N \cap A = \emptyset$, or $P \cap A = \emptyset$ and $N \subseteq A$.

Proof. For any $X \in B^n$, let $A = \{ i \mid x_i = 1 \}$. Then, the condition to be established
is easily seen to be equivalent to condition (ii) in Theorem 4.9.

We now establish that self-dual functions are maximal among all dual-minor
functions. More precisely, let us say that a dual-minor function f is maximally
dual-minor if there exists no dual-minor function g such that $f \le g$ and $f \neq g$.

Theorem 4.12. A Boolean function is self-dual if and only if it is maximally dual-minor.

Proof. If f is self-dual and g is a dual-minor function such that $f \le g$, then we
derive the sequence of inequalities

$g^d \le f^d = f \le g \le g^d$.

Hence, f = g, implying that f is maximally dual-minor.
Conversely, assume that f is dual-minor, but not self-dual. Then, there exists a
point $X^*$ such that $f(X^*) = 0$ and $f^d(X^*) = 1$. Assume for instance that $x_1^* = 1$, and
consider the function $g = f \vee f^d x_1$. Clearly, $f \le g$, and $f \neq g$ since $g(X^*) = 1$.
Moreover, g is dual-minor (actually, self-dual):

$g^d = f^d (f \vee x_1) = f^d f \vee f^d x_1 = g$

(the last equality holds because f is dual-minor). Therefore, f is not maximally
dual-minor.

Of course, we would similarly show that:

Theorem 4.13. A Boolean function is self-dual if and only if it is minimally dual-major.

The construction used in the proof of Theorem 4.12 can be generalized to yield
a simple, standard way of associating a self-dual function with an arbitrary Boolean
function.

Definition 4.3. For a Boolean function $f(x_1, x_2, \dots, x_n)$, the self-dual extension
of f is the function $f^{SD}(x_1, x_2, \dots, x_n, x_{n+1})$, defined by

$f^{SD}(x_1, x_2, \dots, x_n, x_{n+1}) = f(x_1, x_2, \dots, x_n)\, \bar{x}_{n+1} \vee f^d(x_1, x_2, \dots, x_n)\, x_{n+1}$. (4.7)

This terminology is well-justified.



Theorem 4.14. For every Boolean function f, the function $f^{SD}$ defined by (4.7)
is self-dual. The mapping $SD: f \mapsto f^{SD}$ is a bijection between the set of Boolean
functions of n variables and the set of self-dual functions of n + 1 variables.

Proof. The dual of (4.7) is

$(f^d(X) \vee \bar{x}_{n+1})(f(X) \vee x_{n+1}) = f^d(X) f(X) \vee f(X)\, \bar{x}_{n+1} \vee f^d(X)\, x_{n+1}$
$\phantom{(f^d(X) \vee \bar{x}_{n+1})(f(X) \vee x_{n+1})} = f(X)\, \bar{x}_{n+1} \vee f^d(X)\, x_{n+1}$.

Hence, $f^{SD}$ is self-dual.
The mapping SD is injective, since the restriction of $f^{SD}$ to $x_{n+1} = 0$ is exactly
f. Moreover, SD has an inverse, defined by $g \mapsto g_{|x_{n+1}=0}$ for every self-dual
function g on $B^{n+1}$. Indeed, if g is self-dual, then

$(g_{|x_{n+1}=0})^{SD} = g_{|x_{n+1}=0}\, \bar{x}_{n+1} \vee (g_{|x_{n+1}=0})^d\, x_{n+1}$
$\phantom{(g_{|x_{n+1}=0})^{SD}} = g_{|x_{n+1}=0}\, \bar{x}_{n+1} \vee (g^d)_{|x_{n+1}=1}\, x_{n+1}$
$\phantom{(g_{|x_{n+1}=0})^{SD}} = g_{|x_{n+1}=0}\, \bar{x}_{n+1} \vee g_{|x_{n+1}=1}\, x_{n+1}$,

and this last expression is exactly the Shannon expansion of g.
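Theorem 4.14 can be checked exhaustively for a small number of variables. The sketch below builds the self-dual extension (4.7) of every Boolean function on $B^2$ and records the resulting truth tables on $B^3$. Representing functions as Python callables on 0/1 tuples, and the names used, are assumptions made for illustration.

```python
from itertools import product

def dual(f):
    """Return f^d, where f^d(X) = 1 - f(complement of X)."""
    return lambda x: 1 - f(tuple(1 - v for v in x))

def self_dual_extension(f):
    """f^SD(X, x_{n+1}) equals f(X) if x_{n+1} = 0 and f^d(X) if x_{n+1} = 1."""
    fd = dual(f)
    return lambda x: fd(x[:-1]) if x[-1] else f(x[:-1])

def truth_table(f, n):
    return tuple(f(x) for x in product((0, 1), repeat=n))

n = 2
points = list(product((0, 1), repeat=n))
images = set()
all_self_dual = True
for values in product((0, 1), repeat=2 ** n):     # every function on B^2
    f = dict(zip(points, values)).__getitem__     # f given by its truth table
    g = self_dual_extension(f)
    all_self_dual &= truth_table(g, n + 1) == truth_table(dual(g), n + 1)
    images.add(truth_table(g, n + 1))

print(all_self_dual, len(images))
```

Every extension turns out to be self-dual, and the 16 functions on $B^2$ yield 16 distinct functions on $B^3$, in line with the bijection stated in the theorem.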

Note that, when applied to dual-minor functions, the definition of self-dual
extensions assumes a simpler form.

Theorem 4.15. If f is a dual-minor function on $B^n$, then $f^{SD} = f \vee f^d x_{n+1}$.

Proof. This holds because, for all a, b, x ∈ B, $a \le b$ implies $a \bar{x} \vee b x = a \vee b x$.


Theorem 4.14 implies, in particular, that there are $2^{2^{n-1}}$ self-dual functions of
n variables, as compared to $2^{2^n}$ Boolean functions of n variables (this could have
been deduced from Theorem 4.9(iii) as well).
Another corollary of Theorem 4.14 is that dual comparability is not preserved
under fixation of variables, a fact which also follows directly from the observation
that the constant function $1_n$ is not dual-minor, and that the constant function $0_n$ is
not dual-major. Interestingly, however, self-duality is preserved under composition
of Boolean functions.

Theorem 4.16. If $f_1(x_1, x_2, \dots, x_n, x_{n+1})$ and $f_2(y_1, y_2, \dots, y_m)$ are self-dual
functions (where $f_1$ and $f_2$ may depend on common variables), then the function

$g(x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_m) = f_1(x_1, x_2, \dots, x_n, f_2(y_1, y_2, \dots, y_m))$

is self-dual.

Proof. Let $X = (x_1, x_2, \dots, x_n) \in B^n$ and $Y = (y_1, y_2, \dots, y_m) \in B^m$. Then,

$\bar{g}(\bar{X}, \bar{Y}) = \bar{f}_1(\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n, f_2(\bar{Y}))$
$\phantom{\bar{g}(\bar{X}, \bar{Y})} = \bar{f}_1(\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n, \bar{f}_2(Y))$
$\phantom{\bar{g}(\bar{X}, \bar{Y})} = f_1(x_1, x_2, \dots, x_n, f_2(Y))$
$\phantom{\bar{g}(\bar{X}, \bar{Y})} = g(X, Y)$,

which shows that g is self-dual.

4.1.4 Applications
Duality plays a central role in various applications arising in artificial intelli-
gence, computational logic, data mining, reliability theory, game theory, integer
programming, and so on. Some of these applications have already been mentioned
in previous chapters. We have observed several times, for instance, that one way
of solving the Boolean equation φ = 0 is to compute a CNF representation of φ or,
equivalently, a DNF representation of $\phi^d$. Actually, if a DNF expression of $\phi^d$ is
at hand, then all solutions of the equation φ = 0 are readily available (see Section
2.11.2).

Example 4.4. Let $f(x, y, z, u) = \bar{x} y \vee \bar{x} z u \vee x \bar{y} z \vee y z \bar{u}$. It can be checked that
$f^d = \bar{x} z \vee y z \vee x y u \vee \bar{x} \bar{y} \bar{u}$. Hence, the solutions of f = 0 have the form (1, ∗, 0, ∗),
(∗, 0, 0, ∗), (0, 0, ∗, 0), or (1, 1, ∗, 1), where ∗ denotes an arbitrary 0–1 value.
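The correspondence between the terms of $f^d$ and the solutions of f = 0 can be verified by enumeration. The sketch below assumes the reading $f = \bar{x} y \vee \bar{x} z u \vee x \bar{y} z \vee y z \bar{u}$ and $f^d = \bar{x} z \vee y z \vee x y u \vee \bar{x} \bar{y} \bar{u}$ for Example 4.4; this placement of complements is an assumption chosen to agree with the listed solutions, and the names and 0-based indices (x, y, z, u mapped to 0, 1, 2, 3) are illustrative.

```python
from itertools import product

def f(x, y, z, u):
    # Assumed reading of f in Example 4.4
    return int((not x and y) or (not x and z and u)
               or (x and not y and z) or (y and z and not u))

# Terms of f^d as (positive, negative) index sets:
# (not x) z,  y z,  x y u,  (not x)(not y)(not u)
dual_terms = [({2}, {0}), ({1, 2}, set()), ({0, 1, 3}, set()), (set(), {0, 1, 3})]

def matches(point, term):
    """A term of f^d yields the solutions x_j = 0 (j in P), x_j = 1 (j in N)."""
    P, N = term
    return all(point[j] == 0 for j in P) and all(point[j] == 1 for j in N)

points = list(product((0, 1), repeat=4))
zeros = {p for p in points if f(*p) == 0}
covered = {p for p in points if any(matches(p, t) for t in dual_terms)}
```

Under this reading, `zeros == covered`: the four patterns read off from the terms of $f^d$ describe exactly the points where f vanishes.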

We now present a few additional models involving dual functions. Other appli-
cations will be presented in Section 4.2, when we concentrate more specifically
on positive functions.

Application 4.1. (Artificial intelligence, electrical engineering.) Reiter [783] proposes
a logic-based framework for the analysis of diagnosis problems and presents
an application to fault diagnosis in combinational circuits. If we restrict ourselves
to propositional logic, then Reiter’s approach can be sketched as follows (even in
propositional logic, Reiter’s model is actually more general than the one below;
but our formulation is already sufficiently general, for instance, to encompass the
circuit fault diagnosis problem):
Consider a complex system I consisting of m interrelated components.
The intended operation of I is modeled by a collection of Boolean equations
$\phi_k(X, Y) = 0$ (k = 1, 2, ..., m), with the following interpretation: For every point
$X^* \in B^n$, there exists a unique point $Y^* \in B^t$ such that $\bigvee_{k=1}^{m} \phi_k(X^*, Y^*) = 0$
(see Section 1.13.2). Each point $X^* \in B^n$ is called an observation. Assume now
that a particular observation $X^*$ is such that the equation $\bigvee_{k=1}^{m} \phi_k(X^*, Y) = 0$ is
inconsistent. This means that the behavior of the system I deviates from its
specification, that is, I is faulty. The diagnosis issue is now, intuitively, to understand
what went wrong with the system. More precisely, Reiter [783] defines a diagnosis
as a minimal subset J ⊆ {1, 2, ..., m} such that

$\bigvee_{k=1,\ k \notin J}^{m} \phi_k(X^*, Y) = 0$

is consistent. The idea is that, were it not for the components in J, I would
have been functioning properly. The minimality of J translates what Reiter calls
the “Principle of Parsimony.”
The task of the analyst is now to produce all diagnoses associated with a given
observation $X^*$. Let $p_1, p_2, \dots, p_m$ be m new Boolean variables, and define the
function

$f(Y, P) = \bigvee_{k=1}^{m} p_k\, \phi_k(X^*, Y)$.

Let also

$\bigwedge_{i \in J_k} p_i \bigwedge_{j \in A_k} y_j \bigwedge_{j \in B_k} \bar{y}_j$, k = 1, 2, ..., r,

denote the prime implicants of the dual function $f^d$. We leave it to the reader to
check that diagnoses are exactly the minimal members of the collection of sets
$\{J_1, J_2, \dots, J_r\}$. Reiter proposes an ad hoc algorithm that produces all the
diagnoses and that uses as a subroutine a simple dualization algorithm for positive
functions (see also the exercises at the end of this chapter).

Application 4.2. (Complexity theory.) Theoretical computer scientists have introduced
several measures of complexity reflecting the difficulty of computing Boolean
functions, and they have analyzed the relations between them. One such measure,
albeit a rather primitive one, is the degree deg(f) of the Boolean function f, that
is, the minimum degree of a DNF representing f. A more elaborate measure is the
decision tree complexity of f. Remember that decision trees were introduced in
Section 1.12.3. The depth δ(T) of a decision tree T is the length (that is, the number
of arcs) of a longest path from the root to a leaf of T. The decision tree complexity
DT(f) of a Boolean function f is the minimum of δ(T) over all decision trees
computing f. This measure of complexity has been extensively investigated; see,
for example, [902, 903]. Now, assume that f is computed by a tree of depth δ.
Then, as we noted at the end of Section 1.12.3, f and $\bar{f}$ can both be represented
by (orthogonal) DNFs of degree at most δ. Therefore, we obtain the relation
$\max(\deg(f), \deg(f^d)) \le DT(f)$.
However, a more subtle relation also holds, namely,
$DT(f) \le \deg(f)\, \deg(f^d)$.
We can prove this inequality by induction on the number of variables ([626, 903]).
Let $f = \phi(x_1, x_2, \dots, x_n)$, and $f^d = \psi(x_1, x_2, \dots, x_n)$, where φ and ψ are DNFs
of degrees deg(f) and deg($f^d$), respectively; let C be any term of φ, and let
(without loss of generality) $\{x_1, x_2, \dots, x_k\}$ be the set of variables occurring in C.
Thus, k ≤ deg(f). We construct a decision tree for f as shown in Figure 1.5,
branching on the variables in the natural order $(x_1, x_2, \dots, x_n)$. Thus, the root of
the tree is labeled by $x_1$. More generally, if u is an internal vertex at depth i from
the root (0 ≤ i < k), then u is labeled by $x_{i+1}$. Now, consider any internal vertex
at depth k − 1 (if there is no such vertex, then the tree has depth at most k − 1 and
the required inequality holds). Let v and w be the children of this vertex. Then, the
subtree hanging from v (respectively, from w) is a decision tree for a function of the
form $g = f_{|P,N}$ (resp., $h = f_{|P \setminus \{k\},\, N \cup \{k\}}$), where (P, N) is a partition of {1, 2, ..., k}
and k ∈ P.
We can assume that the subtrees representing g and h both have optimal depth.
In this way, we obtain for f a decision tree with depth max(DT(g), DT(h)) + k.
Assume that max(DT(g), DT(h)) = DT(g) (the other case is similar). By induction,
we can assume that $DT(g) \le \deg(g)\, \deg(g^d)$. Note that deg(g) ≤ deg(f),
since g is a restriction of f. Moreover, by Theorem 4.2,
$g^d = (f_{|P,N})^d = (f^d)_{|N,P}$. (4.8)
Since P ∪ N = {1, 2, ..., k} is the set of indices of the variables in C, Theorem
4.7, together with (4.8), implies that a DNF of $g^d$ is obtained by fixing at least one
variable to either 0 or 1 in each term of ψ. Therefore, $\deg(g^d) \le \deg(f^d) - 1$.
So, we have represented f by a decision tree of depth
$DT(g) + k \le \deg(g)\, \deg(g^d) + k \le \deg(f)\, (\deg(f^d) - 1) + \deg(f) = \deg(f)\, \deg(f^d)$,
which proves the required inequality.
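Both inequalities can be observed numerically on small functions. The sketch below is a brute-force illustration, not the inductive construction of the proof. It relies on the observation that a DNF of degree d represents f exactly when every true point of f is covered by an implicant of degree at most d, so deg(f) is the largest, over true points, of the size of a smallest covering implicant. All names are assumptions.

```python
from functools import lru_cache
from itertools import combinations, product

def dual(f):
    return lambda x: 1 - f(tuple(1 - v for v in x))

def degree(f, n):
    """deg(f): max over true points X of the size of a smallest implicant covering X."""
    pts = list(product((0, 1), repeat=n))

    def smallest_implicant(x):
        for size in range(n + 1):
            for S in combinations(range(n), size):
                # Fixing the variables in S as in x must force f = 1.
                if all(f(y) for y in pts if all(y[j] == x[j] for j in S)):
                    return size

    return max((smallest_implicant(x) for x in pts if f(x)), default=0)

def decision_tree_complexity(f, n):
    """DT(f) by exhaustive recursion over partial assignments (None = free variable)."""
    pts = list(product((0, 1), repeat=n))

    @lru_cache(maxsize=None)
    def depth(fixed):
        values = {f(y) for y in pts
                  if all(v is None or y[j] == v for j, v in enumerate(fixed))}
        if len(values) == 1:          # f is constant on this subcube: a leaf
            return 0
        return 1 + min(
            max(depth(fixed[:j] + (b,) + fixed[j + 1:]) for b in (0, 1))
            for j in range(n) if fixed[j] is None
        )

    return depth((None,) * n)

majority = lambda x: int(sum(x) >= 2)   # majority of three variables
n = 3
d = degree(majority, n)
dd = degree(dual(majority), n)
dt = decision_tree_complexity(majority, n)
```

For the majority function of three variables this gives deg(f) = deg($f^d$) = 2 and DT(f) = 3, consistent with $\max(\deg(f), \deg(f^d)) \le DT(f) \le \deg(f)\,\deg(f^d)$.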

4.2 Duality properties of positive functions


Much of the literature on Boolean duality has focused on the special case of positive
functions, where the results usually have simple combinatorial interpretations. In
this section, we reexamine some of the results of Section 4.1 within this framework
and discuss their meaning within various fields of application. Most of these results
have actually been independently discovered in several areas; see, for example,
Benzaken [64], Berge [72], and Muroga [698].

4.2.1 Normal forms and implicants of dual functions


Recall from Section 1.10 that a Boolean function f is positive if and only if X ≤ Y
implies f (X) ≤ f (Y ) for all X, Y ∈ Bn , and that f is positive if and only if it
can be represented by a positive expression (namely, a Boolean expression which
contains only positive literals). In fact, the complete DNF of a positive Boolean
function is positive and is its unique prime irredundant DNF (see Theorem 1.23).

Thus, in view of Theorem 4.5, positivity is preserved under dualization.

Theorem 4.17. A function f is positive if and only if its dual f d is positive.

For a positive function f, we denote by minT(f) the set of minimal true points
of f, and by maxF(f) the set of its maximal false points. Theorem 1.26 describes
a simple one-to-one correspondence between the prime implicants of f and its
minimal true points: Namely, $X^* \in \min T(f)$ if and only if $C = \bigwedge_{i \in \mathrm{supp}(X^*)} x_i$ is a
prime implicant of f, where $\mathrm{supp}(X^*) = \{ i \in \{1, 2, \dots, n\} \mid x_i^* = 1 \}$.
Theorem 1.27 establishes a similar relationship between the prime implicates
of f and its maximal false points. In duality terms, this result translates as follows:

Theorem 4.18. Let f be a positive Boolean function on $B^n$. The point
$X^* \in B^n$ is a maximal false point of f if and only if the elementary conjunction
$C = \bigwedge_{i \in \mathrm{supp}(\bar{X}^*)} x_i$ is a prime implicant of $f^d$, where
$\mathrm{supp}(\bar{X}^*) = \{ i \in \{1, 2, \dots, n\} \mid x_i^* = 0 \}$.

Example 4.5. Let $f = x_1 \vee x_2 x_3$. Its dual is $f^d = x_1 x_2 \vee x_1 x_3$. One can check that
the maximal false points of f are (0, 0, 1) (corresponding to the prime implicant
$x_1 x_2$ of $f^d$) and (0, 1, 0) (corresponding to the prime implicant $x_1 x_3$ of $f^d$).

Other useful characterizations of dual prime implicants of positive functions
are best stated in hypergraph terminology. Recall from Chapter 1 and Appendix A
that, if $H = (N, \mathcal{E})$ is a hypergraph and S ⊆ N is a subset of vertices, then S is
called a transversal of H (or of $\mathcal{E}$) if $S \cap E \neq \emptyset$ holds for all edges $E \in \mathcal{E}$, and S is
called stable if its complement is a transversal (namely, if S does not include any
edge of H). A transversal S is minimal if it does not (properly) include any other
transversal.
Now, for a positive Boolean function f on $B^n$, let $H_f = (N, \mathcal{P})$, where N =
{1, 2, ..., n} and $\mathcal{P}$ denotes the family of all subsets P ⊆ {1, 2, ..., n} such that
$\bigwedge_{i \in P} x_i$ is a prime implicant of f. We know that $H_f$ is a clutter (or a Sperner
hypergraph), meaning that no set in $\mathcal{P}$ contains another set in $\mathcal{P}$ (sometimes, we
may say that $\mathcal{P}$ itself is the clutter).
 
Theorem 4.19. Let $f = \bigvee_{P \in \mathcal{P}} \bigwedge_{i \in P} x_i$ and $g = \bigvee_{T \in \mathcal{T}} \bigwedge_{i \in T} x_i$ be the complete
DNFs of two positive functions on $B^n$. The following statements are equivalent:

(a) $g = f^d$.
(b) For every partition of N = {1, 2, ..., n} into two sets A and $\bar{A}$, there is either
a member of $\mathcal{P}$ contained in A or a member of $\mathcal{T}$ contained in $\bar{A}$, but not
both.
(c) $\mathcal{T}$ is exactly the family of minimal transversals of $\mathcal{P}$.

Proof. The equivalence of (a) and (b) is a restatement of Theorem 4.1(a). Statement
(c) is a corollary of Theorem 4.7.

Example 4.6. As in Example 4.5, consider the function $f = x_1 \vee x_2 x_3$ and its
dual $f^d = x_1 x_2 \vee x_1 x_3$. The hypergraph $H_f$ has the edge-set $\mathcal{E} = \{\{1\}, \{2, 3\}\}$. One
easily checks that {1, 2} and {1, 3} are exactly the minimal transversals of $H_f$.
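Examples 4.5 and 4.6 are small enough to recompute by brute force. The following sketch lists the minimal transversals of the clutter {{1}, {2, 3}} and compares them with the zero-sets of the maximal false points of $f = x_1 \vee x_2 x_3$, as in Theorem 4.18. The enumeration is exponential in n and the function names are assumptions; it is not one of the dualization algorithms discussed later in the chapter.

```python
from itertools import combinations, product

def minimal_transversals(edges, n):
    """All inclusion-minimal subsets of {1..n} meeting every edge (small n only)."""
    found = []
    for size in range(n + 1):                      # by increasing cardinality
        for S in map(frozenset, combinations(range(1, n + 1), size)):
            if all(S & E for E in edges) and not any(T <= S for T in found):
                found.append(S)
    return set(found)

edges = [frozenset({1}), frozenset({2, 3})]        # prime implicants of f
blocker = minimal_transversals(edges, 3)

def f(x):                                          # f = x1 OR x2 x3
    return x[0] | (x[1] & x[2])

false_points = [x for x in product((0, 1), repeat=3) if not f(x)]
maximal_false = [x for x in false_points
                 if not any(y != x and all(a <= b for a, b in zip(x, y))
                            for y in false_points)]
# Index sets of the zero coordinates of the maximal false points
zero_sets = {frozenset(i + 1 for i, v in enumerate(x) if v == 0)
             for x in maximal_false}
```

Here `blocker` and `zero_sets` both equal {{1, 2}, {1, 3}}, and applying `minimal_transversals` to `blocker` returns the original edge set.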

The previous results provide an efficient characterization of the dual (prime)
implicants of a positive function: Namely, given any reasonable description of a
positive function f (e.g., its complete DNF) and given an elementary conjunction
$C = \bigwedge_{i \in T} x_i$, Theorem 4.18 allows us to verify efficiently whether C is a prime
implicant or an implicant of $f^d$ (see also Theorem 1.27).
It turns out to be more difficult to decide whether C is a dual subimplicant of
f, that is, to determine whether there exists a set of indices S such that T ⊆ S and
$\bigwedge_{i \in S} x_i$ is a prime implicant of $f^d$. Boros, Gurvich, and Hammer [121] proved
that this question is NP-complete when f is given in DNF, but they also gave a
characterization of dual subimplicants which can be efficiently tested when |T |
is bounded. We defer a presentation of this result to Chapter 10 (see Theorem
10.4), where it will constitute a main tool for the recognition of read-once Boolean
functions.

4.2.2 Dual-comparable functions


Let us now turn to the characterization of dual-comparable functions. We say that
a hypergraph $H = (N, \mathcal{E})$ (or the family of sets $\mathcal{E}$) is intersecting if $E \cap E' \neq \emptyset$ for
all $E, E' \in \mathcal{E}$.
Theorem 4.20. A positive function f is dual-minor if and only if Hf is intersecting.
Proof. This follows from Theorem 4.10. 

Let $H = (N, \mathcal{E})$ be an arbitrary hypergraph, and let k ≥ 1 be an integer. A
k-coloring of H is a partition of N into k stable sets $N_1, N_2, \dots, N_k$. We say that
H is k-colorable if it admits a k-coloring, and we denote by χ (H) the chromatic
number of H, that is, the smallest integer k such that H is k-colorable. Note that
χ (H) is finite, except when H has either an empty edge or an edge of cardinality 1.
We let χ (H) = +∞ in either of these two cases. On the other hand, χ (H) = 1
exactly when H has no edge.
For a positive Boolean function f , the hypergraph Hf has an empty edge only if
f = 1, and it has an edge of cardinality 1 only if f has a linear prime implicant. In
view of Theorem 4.8, we do not lose much if we disregard linear prime implicants
in the next statement.
Theorem 4.21. If f is a dual-minor positive function without prime implicants of
degree 1, then $\chi(H_f) \le 3$.
Proof. If f is dual-minor, then $f \neq 1$, and hence the chromatic number of $H_f$ is finite.
Consider an arbitrary coloring of $H_f = (N, \mathcal{P})$ into k stable sets $N_1, N_2, \dots, N_k$,
and assume that k ≥ 4. One of the sets $A = N_1 \cup N_2$ or $\bar{A} = N_3 \cup \dots \cup N_k$ is stable:
Otherwise, there are two sets $P, P' \in \mathcal{P}$ such that $P \subseteq A$ and $P' \subseteq \bar{A}$, and thus
$P \cap P' = \emptyset$, in contradiction with Theorem 4.20. Therefore, either $(N_1, N_2, \bar{A})$ or
$(A, N_3, \dots, N_k)$ is a coloring of $H_f$ involving fewer than k classes.

Theorem 4.22. A positive function f is dual-major if and only if $\chi(H_f) \ge 3$.

Proof. The clutter $H_f$ is 2-colorable if and only if there exists a partition $(A, \bar{A})$
of {1, 2, ..., n} such that $P \cap A \neq \emptyset$ and $P \cap \bar{A} \neq \emptyset$ for all $P \in \mathcal{P}$. In view of
Theorem 4.11, this means that f is not dual-major.
The case $\chi(H_f) = 1$ corresponds to the constant function f = 0 which is not
dual-major.

From these results, we derive a characterization of self-dual positive functions.


Theorem 4.23. A positive function f without prime implicants of degree 1 is
self-dual if and only if Hf is intersecting and χ (Hf ) = 3.
Proof. This follows directly from Theorems 4.20, 4.21, and 4.22. 
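The criterion of Theorem 4.23 can be tested mechanically on small clutters. The sketch below is a brute-force illustration that assumes a clutter is given as a list of nonempty Python sets over {1, ..., n}; the function names and the example clutters are hypothetical.

```python
from itertools import combinations, product

def is_intersecting(edges):
    """Every two (nonempty) edges share at least one vertex."""
    return all(E & F for E, F in combinations(edges, 2))

def chromatic_number(edges, n):
    """Smallest k such that {1..n} splits into k classes, none containing an edge."""
    if any(len(E) <= 1 for E in edges):
        return float("inf")            # empty edge or edge of cardinality 1
    for k in range(1, n + 1):
        for colors in product(range(k), repeat=n):
            # A coloring is valid if no edge is monochromatic.
            if all(len({colors[v - 1] for v in E}) > 1 for E in edges):
                return k

def is_self_dual_positive(edges, n):
    # Theorem 4.23; assumes f has no prime implicant of degree 1
    return is_intersecting(edges) and chromatic_number(edges, n) == 3

majority3 = [{1, 2}, {1, 3}, {2, 3}]   # x1 x2 OR x1 x3 OR x2 x3
star = [{1, 2}, {1, 3}]                # x1 x2 OR x1 x3
```

The majority clutter is intersecting with chromatic number 3, hence self-dual; the second clutter is intersecting but 2-colorable, so the corresponding function is dual-minor without being self-dual.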

Finally, let us note that the proof of Theorem 4.12 is easily adapted to establish
the next result.
Theorem 4.24. A positive Boolean function f on B n is self-dual if and only if it
is maximal among all positive dual-minor functions or, equivalently, if and only if
{supp(X): f (X) = 1} is a maximal intersecting family of subsets of {1, 2, . . . , n}.
The number of positive self-dual functions on B n is not as easily determined as
the total number of self-dual functions, but asymptotic formulas have been derived
by Korshunov [579] (see also Bioch and Ibaraki [88]; Loeb and Conway [621]).

4.2.3 Applications
Application 4.3. (Combinatorics.) We saw in Section 1.13.5 that positive functions
are in one-to-one correspondence with clutters, by way of the mapping

$f(x_1, x_2, \dots, x_n) = \bigvee_{A \in \mathcal{P}} \bigwedge_{j \in A} x_j \mapsto \mathcal{P}$.

Let $\phi = \bigvee_{T \in \mathcal{T}} \left( \bigwedge_{j \in T} x_j \right)$ be the complete DNF of $f^d$. By Theorem 4.19, every
set T in $\mathcal{T}$ is a minimal transversal of $\mathcal{H}$. In hypergraph terminology, $\mathcal{T}$ is the
transversal clutter or blocker of $\mathcal{H}$ (see, e.g., Berge [72]; Eiter and Gottlob [295];
the terminology blocker is due to Edmonds and Fulkerson [288, 353]).
Let $T(\mathcal{H})$ denote the blocker of an arbitrary clutter $\mathcal{H}$. Many elementary
properties of blockers are probably best viewed in a Boolean context (and, in this
context, can be extended to nonpositive functions). For instance, Lawler [603]
and Edmonds and Fulkerson [288] observed that $T(T(\mathcal{H})) = \mathcal{H}$, a property
that is equivalent to the Boolean identity $(f^d)^d = f$. Similarly, we can deduce
from Theorem 4.2 the following property mentioned in Seymour [821]: For all
S ⊆ {1, 2, ..., n},
$T(\mathcal{H}) \setminus S = T(\mathcal{H} / S)$ and $T(\mathcal{H}) / S = T(\mathcal{H} \setminus S)$,
where the deletion (\) and contraction (/) operations have been introduced in
Section 1.13.5.
Properties of intersecting clutters (that is, dual-minor functions), (non)
2-colorable clutters (that is, dual-major functions), maximal intersecting hyper-
graphs (corresponding to self-dual functions), and k-colorable hypergraphs have
been extensively studied in the literature; see, for instance, Berge [72] or Schrij-
ver [814]. Their connections with Boolean duality have been stressed in a series
of papers by Benzaken [62, 63, 64, etc.]. 

Application 4.4. (Integer programming and combinatorial optimization.) Consider
a set covering problem SCP, as introduced in Section 1.13.6:

minimize $z(y_1, y_2, \dots, y_n) = \sum_{i=1}^{n} c_i y_i$ (4.9)
subject to $\sum_{i \in A_k} y_i \ge 1$, k = 1, 2, ..., m (4.10)
$(y_1, y_2, \dots, y_n) \in B^n$, (4.11)

and let $\mathcal{P} = \{A_1, A_2, \dots, A_m\}$. Clearly, the (minimal) feasible solutions of SCP
are the characteristic vectors of the (minimal) transversals of $\mathcal{P}$. Therefore, if we
define a Boolean function f by

$f = \bigvee_{k=1}^{m} \bigwedge_{i \in A_k} x_i = \bigvee_{P \in \mathcal{P}} \bigwedge_{i \in P} x_i$,

then the (minimal) feasible solutions of SCP are exactly the (minimal) true points
of $f^d$. In particular, any algorithm that computes the dual of f could be used, in
principle, to solve the set covering problem (see, e.g., Lawler [603] for early work
based on these observations).
More generally, dual blocking pairs (P, T ), where T is the blocker of P, play
a very important role in the theory of combinatorial optimization. A paradigmatic
example of such a pair is provided by the set P of elementary paths joining two
vertices s and t in a directed graph, and by the set T of minimal cuts separating s
from t. Another example consists of the set P of all chains in a partially ordered
set and the set T of all antichains.
We have just seen that the set covering problem SCP is equivalent to the
minimization problem: $\min_{T \in \mathcal{T}} \sum_{i \in T} c_i$. If we replace the sum by a max-operator in
the objective function, then we obtain a class of bottleneck optimization problems,
expressed as
$\min_{T \in \mathcal{T}} \max_{i \in T} c_i$.
Edmonds and Fulkerson [288] have established that this class of problems displays
a very strong property which, in fact, provides a rather unexpected characterization
of duality for positive Boolean functions.
Theorem 4.25. Let $\mathcal{P}$ and $\mathcal{T}$ be two nonempty clutters on {1, 2, ..., n}. Then, the
equality
$\max_{P \in \mathcal{P}} \min_{i \in P} c_i = \min_{T \in \mathcal{T}} \max_{i \in T} c_i$ (4.12)
holds for all choices of real coefficients $c_1, c_2, \dots, c_n$ if and only if $\mathcal{T}$ is the blocker
of $\mathcal{P}$.
Proof. Assume first that $\mathcal{T}$ is the blocker of $\mathcal{P}$ and fix the coefficients $c_1, c_2, \dots, c_n$.
Consider any $P \in \mathcal{P}$ and $T \in \mathcal{T}$. Since $P \cap T \neq \emptyset$, $\min_{i \in P} c_i \le \max_{i \in T} c_i$.
Therefore, the left-hand side of (4.12) is no larger than its right-hand side.
Now, assume without loss of generality that $c_1 \ge c_2 \ge \dots \ge c_n$, and consider
the smallest index j such that {1, 2, ..., j} contains a member of $\mathcal{P}$; say $P^* \subseteq
\{1, 2, \dots, j\}$ and $P^* \in \mathcal{P}$. Then, $\min_{i \in P^*} c_i = c_j$. Note that {j + 1, j + 2, ..., n}
does not contain any set $T \in \mathcal{T}$ because such a set T would not intersect $P^*$.
On the other hand, {j, j + 1, ..., n} is a transversal of $\mathcal{P}$ (since its complement is
stable in $\mathcal{P}$, by choice of j), and hence it contains some set $T^* \in \mathcal{T}$. Therefore,
$\max_{i \in T^*} c_i = c_j$, and equality holds in (4.12).
For the converse implication, let us assume that (4.12) holds for all choices
of $c_1, c_2, \dots, c_n$, and let us establish condition (b) in Theorem 4.19. Let $(A, \bar{A})$ be
a partition of {1, 2, ..., n} into two sets, and let $c_i = 1$ if $i \in A$, $c_i = 0$ if $i \in \bar{A}$.
By assumption, (4.12) holds for this choice of $c_1, c_2, \dots, c_n$. If both sides of the
equation are equal to 1, this means that there is a set $P^* \in \mathcal{P}$ such that $P^* \subseteq A$, and
that no set in $\mathcal{T}$ is entirely contained in $\bar{A}$. On the other hand, if both sides of (4.12)
are equal to 0, then the reverse conclusion holds. Hence, by Theorem 4.19(b), $\mathcal{T}$
is the blocker of $\mathcal{P}$.
Gurvich [421] generalized Theorem 4.25 in order to characterize Nash-solvable
game forms.
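The bottleneck identity (4.12) is easy to observe on the clutter of Example 4.6. The sketch below evaluates both sides for a few cost vectors, once with the blocker and once with a clutter that is not the blocker. The cost values and the second clutter are arbitrary illustrative choices, and the function names are assumptions.

```python
from itertools import permutations

def max_min(family, c):
    """Left-hand side of (4.12): max over sets of the smallest cost in the set."""
    return max(min(c[i] for i in S) for S in family)

def min_max(family, c):
    """Right-hand side of (4.12): min over sets of the largest cost in the set."""
    return min(max(c[i] for i in S) for S in family)

P = [{1}, {2, 3}]
T = [{1, 2}, {1, 3}]       # blocker of P (Example 4.6)
T_other = [{1, 2}]         # a clutter that is not the blocker of P

# Six cost vectors: all assignments of the values 5, 2, 9 to elements 1, 2, 3
costs = [dict(zip((1, 2, 3), p)) for p in permutations((5.0, 2.0, 9.0))]

holds_for_blocker = all(max_min(P, c) == min_max(T, c) for c in costs)
fails_for_other = any(max_min(P, c) != min_max(T_other, c) for c in costs)
```

Both flags evaluate to True: equality holds for every tested cost vector when T is the blocker, and at least one cost vector separates the two sides for the other clutter. A finite sample of costs illustrates the theorem but of course does not prove it.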
Application 4.5. (Reliability theory.) As in Section 1.13.4, let $f_S$ be the (positive)
structure function of a coherent system S. We have already seen that each prime
implicant $\bigwedge_{i \in P} x_i$ corresponds to a minimal pathset of S, namely, a minimal set
of components P with the property that the whole system S works whenever the
components in P work.
Similarly, every (prime) implicant $\bigwedge_{i \in T} x_i$ of $f_S^d$ is associated with a subset T
of components called a (minimal) cutset of S. A cutset T has the distinguishing
property that, if $X^*$ describes a state of the components such that $x_i^* = 0$ for all
$i \in T$, then $f_S^d(\bar{X}^*) = 1$, and hence $f_S(X^*) = 0$. In other words, the system S fails
whenever all components in the cutset fail, irrespective of the operating state of
the other components. Therefore, the dual function $f_S^d$ describes the system S in
terms of failing states.
This duality relationship between minimal pathsets and minimal cutsets is well-known
in the context of reliability theory, as stressed by Ramamurthy [777]. Several
authors have actually investigated the use of Boolean dualization techniques to
generate the list of minimal cutsets of a system from a list of its minimal pathsets;
see, for instance, Locks [616] or Shier and Whited [830].

Application 4.6. (Game theory.) Let v be a simple game on the player set N ,
and let fv be the positive Boolean function associated with v, as explained in
Section 1.13.3. Then, the prime implicants of fv correspond to the minimal winning
coalitions of the game, namely, to those minimal subsets P of players such that
v = 1 whenever all players in P vote “Yes.”

If $\bigwedge_{i \in T} x_i$ is a prime implicant of $f_v^d$, then, in view of Theorem 4.18, T is
the complement of a maximal losing coalition. In other words, T is a blocking
coalition, that is, a minimal subset of players such that v = 0 if all players in T
vote “No.”
When modeling real-world voting bodies, it often makes sense to consider cer-
tain special classes of games (see, e.g., Ramamurthy [777] or Shapley [828]). A
game v is called proper if two complementary coalitions S and N \ S cannot be
simultaneously winning. It follows from Theorem 4.9(i) (or from Theorem 4.20)
that the game v is proper if and only if the function fv is dual-minor. On the other
hand, in a strong game, two complementary coalitions cannot be simultaneously
losing. By Theorem 4.9(ii) (or Theorem 4.22), a game v is strong if and only if fv is
dual-major. Finally, v is called decisive (or constant-sum) if exactly one of any two
complementary coalitions is winning. So, v is decisive if and only if fv is self-dual.
For obvious reasons, most practical voting rules are proper. For instance, when
all the players carry one vote and decisions are made based on the majority rule
with threshold $q > n/2$, then the resulting game is proper. If the number of players
is odd and $q = (n+1)/2$, then the game is also decisive.
The concept of self-dual extension has been studied in the game-theoretic
literature under the name of constant-sum extension.
Unexpectedly, perhaps, Boolean duality also plays an important role in the
investigation of solution concepts for nonsimple games, such as 2-person (or
n-person) positional games; we refer to Gurvich [421, 423, 424, etc.] and to
Chapter 10 for illustrations. 
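The notions of proper, strong, and decisive games can be tested directly on small majority games. In the sketch below, a coalition is encoded as a 0/1 vector of "Yes" votes and the game is the one-player-one-vote majority rule with threshold q; the function names are assumptions.

```python
from itertools import product

def majority_game(n, q):
    """v(S) = 1 when at least q of the n one-vote players vote Yes."""
    return lambda x: int(sum(x) >= q)

def is_proper(v, n):
    # Two complementary coalitions are never both winning (f_v is dual-minor).
    return all(not (v(x) and v(tuple(1 - a for a in x)))
               for x in product((0, 1), repeat=n))

def is_strong(v, n):
    # Two complementary coalitions are never both losing (f_v is dual-major).
    return all(v(x) or v(tuple(1 - a for a in x))
               for x in product((0, 1), repeat=n))

def is_decisive(v, n):
    # Exactly one of any two complementary coalitions wins (f_v is self-dual).
    return is_proper(v, n) and is_strong(v, n)
```

For instance, `is_decisive(majority_game(5, 3), 5)` holds, whereas with n = 4 and q = 3 the game is proper but not strong, since two coalitions of two players each can both lose.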

Application 4.7. (Distributed computing systems.) Dual-comparable Boolean
functions have also found applications in several areas of computer science.
Lamport [593], for instance, has proposed to use them in order to achieve mutual
exclusion in distributed computing systems (see also Davidson, Garcia-Molina,
and Skeen [258]; Garcia-Molina and Barbara [370]; Bioch and Ibaraki [88];
Ibaraki and Kameda [516]; etc.). In this context, intersecting clutters (correspond-
ing to the prime implicants of positive dual-minor functions) are usually called
coteries, and each member of a coterie is called a quorum.
More precisely, let N = {1, 2, . . . , n} represent the sites in a distributed system
and let C be a coterie on N . Lamport [593] proposed that a task (e.g., updating
data in a replicated database) should be allowed to enter a critical section only

if it can get permission from all the members of a quorum T ∈ C, where each site
is allowed to issue at most one permission at a time. The intersecting property of
coteries guarantees that at most one task can enter the critical section at any time
(meaning, e.g., that conflicting updates cannot be performed concurrently in the
database).
A coterie C is said to dominate another coterie D if, for each quorum T1 ∈ D,
there is a quorum T2 ∈ C satisfying T2 ⊆ T1 (see Garcia-Molina and Bar-
bara [370]). Non-dominated coteries have maximal “efficiency” and are therefore
important in practical applications. Theorem 4.24 shows that nondominated coter-
ies are nothing but self-dual positive functions in disguise. Theorems 4.22 and 4.23
have also been rediscovered in this context (see [370]). 
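The two definitions above translate directly into set operations. The sketch below represents a coterie as a collection of sets of sites; the function names are illustrative, and the requirement that a dominating coterie differ from the dominated one is an assumption added here so that no coterie dominates itself.

```python
def is_coterie(quorums):
    """Intersecting clutter: quorums pairwise intersect and none contains another."""
    qs = [frozenset(q) for q in quorums]
    intersecting = all(a & b for a in qs for b in qs)
    clutter = not any(a < b for a in qs for b in qs)
    return intersecting and clutter

def dominates(C, D):
    """C dominates D if every quorum of D contains some quorum of C.
    The condition C != D is an assumption of this sketch."""
    C = {frozenset(q) for q in C}
    D = {frozenset(q) for q in D}
    return C != D and all(any(t2 <= t1 for t2 in C) for t1 in D)
```

For three sites, the majority coterie {{1, 2}, {2, 3}, {1, 3}} dominates the coterie consisting of the single quorum {1, 2, 3}, but not conversely.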

Further discussions of the use of duality concepts in applications can be
found, for instance, in papers by Domingo, Mishra, and Pitt [275] or Eiter and
Gottlob [295].

4.3 Algorithmic aspects: The general case


4.3.1 Definitions and complexity results
The applications presented in the previous sections have established the need for
an algorithm that computes an expression of f d from an expression of f . Since
we know that an expression of f d can be obtained by exchanging ∨ and ∧, as well
as the constants 0 and 1, in any given expression of f , the problem has to be stated
more precisely in order to avoid trivialities. A closer look at the applications shows
that, in many cases, we are more specifically interested in one of the following
algorithmic problems.

Dual Recognition
Instance: DNF representations of two Boolean functions f and g.
Question: Is g = f d ?

Dualization
Instance: An arbitrary expression of a Boolean function f .
Output: The complete DNF of f d or, equivalently, a list of all prime implicants
of f d .

DNF Dualization
Instance: A DNF representation of a Boolean function f .
Output: The complete DNF of f d or, equivalently, a list of all prime implicants
of f d .
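For very small instances, Dual Recognition can be settled by comparing truth tables, using the fact that the dual takes at X the complement of the value of f at the complemented point. The sketch below is exponential in n and only illustrates the problem statement; the encoding of a term as a dictionary from variable index to required bit is an assumption made for this example.

```python
from itertools import product

def eval_dnf(dnf, X):
    """dnf: list of terms; a term maps a variable index to the bit it requires."""
    return any(all(X[j] == b for j, b in term.items()) for term in dnf)

def dual_value(dnf, X):
    """Value of the dual at X: complement of f at the complemented point."""
    return not eval_dnf(dnf, tuple(1 - x for x in X))

def is_dual_pair(f_dnf, g_dnf, n):
    """Exhaustive test of g = f^d over all 2^n points."""
    return all(eval_dnf(g_dnf, X) == dual_value(f_dnf, X)
               for X in product((0, 1), repeat=n))
```

For instance, the pair (x0 x1, x0 ∨ x1) is recognized as dual, and the empty DNF (the constant 0) is dual to any tautology.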

In this section, we examine more closely the algorithmic complexity of these
dualization problems, as well as their relationship with the solution of Boolean
equations and the generation of prime implicants. We start with an easy result.

Theorem 4.26. Dual Recognition is co-NP-complete, even if f is a positive
function represented by its complete DNF.

Proof. Consider an arbitrary DNF equation φ = 0. Let f = 0, and let g be the
function represented by φ. Then, $g = f^d = 1$ if and only if the equation φ = 0 is
inconsistent.

Theorem 4.26 already underlines the intrinsic complexity of the dualization
problem in its decision version (which simply requires a Yes or No answer). When
we turn to the list-generation problems Dualization and DNF Dualization,
another difficulty arises. Indeed, we have observed in Theorem 3.18 that the num-
ber of prime implicants of f d can be exponentially larger than the number of terms
in a DNF of f , even when f is a positive function. Thus, the size of the output of
the Dualization and DNF Dualization problems is generally not polynomi-
ally bounded in the size of their input. In view of this unavoidable difficulty, the
complexity of dualization algorithms is most meaningfully expressed as a func-
tion of the combined size of their input and of their output (we refer to [538, 605]
and to Appendix B for a discussion of the complexity of list-generation algorithms).

Remark. The reader should note that on some occasions, it may be easier to
generate a shortest DNF of f d , or even an arbitrary DNF of f d , rather than its
complete DNF. Indeed, Theorem 3.17 shows that for some Boolean functions, the
size of the complete DNF may be exponentially larger than the size of certain
appropriately selected DNF representations. (This can only hold for nonmonotone
functions. Indeed, for monotone functions, the complete DNF is necessarily shorter
than any other DNF; see Theorem 1.24.)
It turns out, however, that practically all dualization algorithms generate the
complete DNF of f d , rather than an arbitrary DNF. Moreover, analyzing the com-
plexity of the “incomplete” version of the problem requires special care, since the
output of the problem is not univocally defined, or may not have an efficient char-
acterization (e.g., when the objective is to generate a shortest DNF of f d ). These
reasons explain why we mostly concentrate here on generating the complete DNF
of f d . Exceptions will be found in Theorem 4.29 and, indirectly, in the proof of
Theorem 4.28. 

As mentioned in Sections 2.11.2 and 4.1.4, and as expressed by the proof of
Theorem 4.26, Dualization and DNF Dualization can be seen as generaliza-
tions of the problem of solving (DNF) equations. Therefore, both problems are
certainly hard. More precisely:

Theorem 4.27. Unless P = NP, there is no polynomial total time algorithm for
Dualization or DNF Dualization, even if their input is restricted to cubic
DNFs.

Proof. Assume that there is a polynomial total time algorithm A for either problem.
Denote by r(L, U ) the running time of A, where r(x, y) is a bivariate polynomial,
L is the input length and U is the output length.
Let φ be a cubic DNF. From Theorem 2.1, we know that, unless P = NP, there
is no polynomial time algorithm for deciding whether the equation φ(X) = 0 is
consistent. Note that φ = 0 is inconsistent exactly when φ d is identically 0, that is
when φ d has no implicant.
We now consider any of the two dualization problems with the input φ. Run the
algorithm A on φ until either (i) it halts or (ii) the time limit r(|φ|, 0) is exceeded.
In case (i), if A outputs some implicant of f d , then the equation φ(X) = 0 is
consistent; otherwise, it is inconsistent. In case (ii), the equation φ(X) = 0 is con-
sistent. Therefore, in both cases, the equation has been solved in time polynomial
in |φ|, which can only happen if P = NP. 

A converse of Theorem 4.27 holds. Indeed, if P = NP, then the following result
implies the existence of a polynomial total time algorithm for Dualization (and
hence, for DNF Dualization):

Theorem 4.28. There is an algorithm for Dualization which, given an arbitrary
Boolean expression φ(x1 , x2 , . . . , xn ) of a function f , produces the complete DNF
ψ of f d by solving O(np) Boolean equations of size at most |φ| + |ψ|, where p is
the number of prime implicants of f d . If t(L) is the complexity of solving a Boolean
equation with input length at most L, then the running time of this algorithm is
polynomial in |φ|, p and t(|φ| + |ψ|).

Proof. The algorithm combines the arguments developed in Theorem 2.20,
Corollary 3.5, and Theorem 3.9. It consists of two phases.
In Phase 1, as in the proof of Theorem 2.20, assume that the prime implicants
C1 , C2 , . . . , Ck of f d have already been produced (k ≤ p). In the next iteration, the
algorithm solves the equation

\[ \phi(X) \vee \bigvee_{i=1}^{k} C_i(\overline{X}) = 0. \qquad (4.13) \]

If $X^* \in B^n$ is a solution of (4.13), then $X^*$ is a false point of $\phi$, that is, $\overline{X}^*$ is a
true point of $f^d$, and $\overline{X}^*$ is not covered by any of $C_1, C_2, \ldots, C_k$. In other words,
the minterm $C^* = \bigwedge_{j \in \mathrm{supp}(\overline{X}^*)} x_j \bigwedge_{j \notin \mathrm{supp}(\overline{X}^*)} \bar{x}_j$ is an implicant of $f^d$ that is not
absorbed by any of the known prime implicants. Therefore, as in Corollary 3.5,
solving n Boolean equations allows us to produce a new prime implicant of f d that
absorbs C ∗ . The algorithm adds this new prime implicant to the list and proceeds
with the next iteration of Phase 1.

Conversely, if (4.13) is inconsistent, then it means that every true point of $f^d$ is
covered by one of $C_1, C_2, \ldots, C_k$. So, at this point, the DNF $\phi'(X) = \bigvee_{i=1}^{k} C_i(X)$
represents $f^d$, although some prime implicants of $f^d$ may still be missing. Then,
Phase 1 terminates and the algorithm enters a second phase where the consensus
procedure is applied to the DNF $\phi'$ (with the same modifications as in the proof
of Theorem 3.9), until the complete DNF of $f^d$ has been obtained.
Clearly, the whole algorithm runs in time polynomial in |φ|, p and
t(|φ| + |ψ|). 
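Phase 1 of this algorithm can be sketched as follows. The code is only an illustration under stated assumptions: exhaustive enumeration stands in for the Boolean equation solver (so it is exponential in n), the expression is passed as a Python callable `phi`, terms are dictionaries mapping a variable index to its required bit, and Phase 2 (the consensus procedure) is omitted, so the returned list represents the dual but may miss some prime implicants. All names are hypothetical.

```python
from itertools import product

def covers(term, Y):
    """A term maps variable index -> required bit; it covers Y if all bits match."""
    return all(Y[j] == b for j, b in term.items())

def dual_dnf_phase1(phi, n):
    """Return prime implicants of the dual of phi whose disjunction represents it."""
    points = list(product((0, 1), repeat=n))

    def dual(Y):                      # dual is 1 at Y iff phi is 0 at the complement
        return not phi(tuple(1 - y for y in Y))

    def is_implicant(term):           # stand-in for one call to the equation solver
        return all(dual(Y) for Y in points if covers(term, Y))

    found = []
    while True:
        # Stand-in for solving (4.13): a true point of the dual not yet covered.
        Y = next((Y for Y in points
                  if dual(Y) and not any(covers(C, Y) for C in found)), None)
        if Y is None:
            return found
        term = dict(enumerate(Y))     # the minterm of Y
        for j in range(n):            # try to drop each literal once
            shorter = {i: b for i, b in term.items() if i != j}
            if is_implicant(shorter):
                term = shorter
        found.append(term)
```

A single pass over the literals suffices to reach a prime implicant, because a literal that cannot be dropped from a term cannot be dropped from any shorter term either.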

A result similar to Theorem 4.28 holds for generating the minterm expression
of the dual (remember that the minterm expression of a function is a special type
of DNF representation; see Definition 1.11).
Theorem 4.29. There is an algorithm which, given an arbitrary Boolean expres-
sion φ(x1 , x2 , . . . , xn ) of a function f , produces the minterm expression of f d by
solving q + 1 Boolean equations of size at most |φ| + nq, where q is the number of
minterms of f d . If t(L) is the complexity of solving a Boolean equation with input
length at most L, then the running time of this algorithm is polynomial in |φ|, q
and t(|φ| + nq).
Proof. This is an immediate corollary of Theorem 2.20 and of the fact that $X$ is a
false point of $f$ if and only if $\overline{X}$ is a true point of $f^d$.

Together with the results obtained in previous chapters, Theorems 4.28 and
4.29 stress once again the close connection among three fundamental problems on
Boolean functions, namely, the solution of Boolean equations, the generation of
prime implicants, and the dualization problem. Essentially, these results show that
an algorithm for any of these three problems can be used as a black box for the
solution of the other two problems. Indeed, assume that A is an algorithm taking
as input an arbitrary Boolean expression φ, and let f be the function represented
by φ:
(i) If A is a dualization algorithm or an algorithm that generates all prime
implicants of f , then A trivially solves the equation φ = 0.
(ii) Conversely, if A is an algorithm for the solution of Boolean equations,
then A can be used to produce all prime implicants of f (see Theorem 3.9)
as well as all prime implicants of f d (see Theorem 4.28).

4.3.2 Dualization by sequential distributivity


The algorithms sketched in Theorems 4.29 and 4.28 are valid when φ is not in
disjunctive normal form, but they require subroutines (i.e., NP-oracles) for the
solution of Boolean equations. In this section, we present a simple dualization
algorithm for the most important case, namely, the DNF Dualization problem.
It is based on Theorem 4.5, which shows that a CNF of f d can be immediately
deduced from any DNF of f . Then, by repeated use of the distributivity law and
of absorption, the available CNF can easily be transformed into a DNF of f d .
More formally, for the input DNF $\phi = \bigvee_{i=1}^{m} C_i$, let $\phi_k = \bigvee_{i=1}^{k} C_i$ and let $f_k$
denote the function represented by $\phi_k$ (k = 1, 2, . . . , m). The k-th iteration of the
algorithm computes all prime implicants of $f_k^d$, so that the task is complete after
the m-th iteration.
For i = 1, 2, . . . , m, let $C_i = \bigwedge_{j \in L_i} \ell_j$, where $\ell_1, \ell_2, \ldots$ are literals. The prime
implicants of $f_1^d = C_1^d$ are exactly the literals $\ell_j$ ($j \in L_1$). For k > 1, suppose that
$f_{k-1}^d$ is expressed by its complete DNF $\bigvee_{T \in \mathcal{T}} P_T$. Then, by Theorem 4.5,

\[ f_k^d = \bigwedge_{i=1}^{k} \Big( \bigvee_{j \in L_i} \ell_j \Big) = \Big( \bigvee_{T \in \mathcal{T}} P_T \Big) \wedge \Big( \bigvee_{j \in L_k} \ell_j \Big), \]

and, by distributivity,

\[ f_k^d = \bigvee_{T \in \mathcal{T}} \bigvee_{j \in L_k} P_T \, \ell_j . \]

So, we obtain all prime implicants of $f_k^d$ from those of $f_{k-1}^d$ by first generating all
terms $P_T \ell_j$ ($j \in L_k$), for each prime implicant $P_T$ of $f_{k-1}^d$, and then removing the
terms that are absorbed.
Example 4.7. Let $\phi = \bar{x}y \vee xzu \vee x\bar{y}z$. Then, $\phi_1 = \bar{x}y$ and $f_1^d$ has two prime
implicants, namely, $\bar{x}$ and $y$. Consider now $\phi_2 = \bar{x}y \vee xzu$. Applying the distributivity
law to the dual expression $\phi_2^d = (\bar{x} \vee y)(x \vee z \vee u)$, we generate the terms
$x\bar{x}$, $\bar{x}z$, $\bar{x}u$, $xy$, $yz$, and $yu$. The first term is identically 0 and disappears, so that
$\phi_2^d$ has 5 prime implicants: $\bar{x}z$, $\bar{x}u$, $xy$, $yz$, and $yu$. Finally, we obtain that
$\phi_3^d = (\bar{x}z \vee \bar{x}u \vee xy \vee yz \vee yu)(x \vee \bar{y} \vee z)$,
and we generate the terms $\bar{x}\bar{y}z$, $\bar{x}z$, $\bar{x}\bar{y}u$, $\bar{x}zu$, $xy$, $xyz$, $yz$, $xyu$, and $yzu$. Since
$\bar{x}\bar{y}z$, $\bar{x}zu$, $xyz$, $xyu$, and $yzu$ are absorbed, we conclude that $\phi^d$ ($= \phi_3^d$) has 4
prime implicants, namely, $\bar{x}z$, $\bar{x}\bar{y}u$, $xy$, and $yz$.

The resulting procedure is called SD-Dualization (for “sequential-distributive
dualization”) and is stated more formally in Figure 4.1.
Theorem 4.30. Procedure SD-Dualization outputs all the prime implicants
of f d .
Proof. The statement follows easily from Theorem 4.7. 

Procedure SD-Dualization is part of the folklore of the field and has been
repeatedly proposed by numerous authors, often in the context of the dualization
of positive DNFs; see Fortet [342], Maghout [643], Pyne and McCluskey [765],
Kuntzmann [589], Benzaken [61], Lawler [603], and so on. (Some authors [119,
432] recently called it “Berge multiplication,” in reference to its description in
[71, 72].) Nelson [705] proposed using it as a subroutine in his so-called double

Procedure SD-Dualization
Input: A DNF $\phi = \bigvee_{i=1}^{m} \bigwedge_{j \in L_i} \ell_j$ of a Boolean function f.
Output: The set of prime implicants of $f^d$.
begin
    $T^* := \{\ell_j \mid j \in L_1\}$;
    for k = 2 to m do
    begin
        $T := \emptyset$;
        for all $P \in T^*$ and for all $j \in L_k$ do $T := T \cup \{P \ell_j\}$;
        remove from T every term which is absorbed by another term in T;
        $T^* := T$;
    end
    return $T^*$;
end

Figure 4.1. Procedure SD-Dualization.

dualization method for generating all prime implicants of a function f represented
by a DNF φ: Indeed, all prime implicants of f can be obtained by applying
SD-Dualization twice in succession, first on φ, then on the complete DNF of $f^d$
obtained after this first step (see Section 3.2.4).
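A compact rendering of SD-Dualization is sketched below. It is an illustration rather than a reference implementation: literals are encoded as pairs (variable, is_positive), products containing a variable together with its complement are dropped explicitly (the procedure of Figure 4.1 treats such vanishing terms as absorbed), and the input is assumed to contain at least one term.

```python
def absorb(terms):
    """Keep only the terms that do not strictly contain another term."""
    terms = set(terms)
    return {t for t in terms if not any(s < t for s in terms)}

def sd_dualization(dnf):
    """dnf: list of terms, each a list of literals (variable, is_positive).
    Returns the prime implicants of the dual as frozensets of literals."""
    current = {frozenset([lit]) for lit in dnf[0]}
    for term in dnf[1:]:
        products = set()
        for P in current:
            for (v, s) in term:
                if (v, not s) not in P:          # skip products that are identically 0
                    products.add(P | {(v, s)})
        current = absorb(products)
    return current
```

On a three-term DNF in the variables x, y, z, u, namely (not x) y ∨ x z u ∨ x (not y) z, the intermediate list has five terms after the second iteration and the final answer has four.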
From a practical viewpoint, this simple algorithm is reasonably efficient for
small problem sizes and can easily be accelerated by various procedural shortcuts,
such as those based on the following result, found in Benzaken [61]:

Theorem 4.31. Let $P_1, P_2, \ldots, P_k$ be the prime implicants of a Boolean function f,
and let $C = \bigwedge_{j=1}^{p} \ell_j$ be an elementary conjunction, where $\ell_1, \ell_2, \ldots, \ell_p$ are literals.
Assume that $P_1, P_2, \ldots, P_k$ are sorted into p + 2 classes T(1), T(2), . . . , T(p + 2),
where

• for i = 1, 2, . . . , p, each conjunction in T(i) involves the literal $\ell_i$ and no
  other literal from C;
• each conjunction in T(p + 1) involves at least 2 literals from C; and
• the conjunctions in T(p + 2) do not involve any literal from C.

Then, the prime implicants of $f \wedge \big( \bigvee_{j=1}^{p} \ell_j \big)$ are exactly

(i) the conjunctions in T(1), T(2), . . . , T(p + 1); and
(ii) the conjunctions of the form $P_i \ell_j$, where $P_i \in T(p + 2)$, $j \in \{1, 2, \ldots, p\}$,
and $P_i \ell_j$ is not absorbed by any conjunction in T(j).

Proof. Left as exercise for the reader. 
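The classification used in Theorem 4.31 leads to a multiplication step that only tests absorption against the class T(j). The sketch below uses the same hypothetical literal encoding (variable, is_positive) as a pair; the explicit test that discards products containing a variable and its complement is an addition of this sketch, needed when f is not positive.

```python
def multiply_by_clause(primes, clause):
    """primes: set of frozensets of literals; clause: iterable of literals.
    Returns the prime implicants of f AND (l_1 OR ... OR l_p)."""
    clause = set(clause)
    kept = set()                                  # classes T(1), ..., T(p+1)
    single = {lit: [] for lit in clause}          # T(j), indexed by its literal
    free = []                                     # class T(p+2)
    for P in primes:
        hits = P & clause
        if not hits:
            free.append(P)
            continue
        kept.add(P)
        if len(hits) == 1:
            single[next(iter(hits))].append(P)
    result = set(kept)
    for P in free:
        for (v, s) in clause:
            if (v, not s) in P:                   # product is identically 0
                continue
            Q = P | {(v, s)}
            if not any(T <= Q for T in single[(v, s)]):
                result.add(Q)
    return result
```

Compared with generating all products and then comparing every pair of terms, only the products built from T(p + 2) are created, and each is compared with a single class.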

Additional shortcuts and other improvements of SD-Dualization have been
proposed and implemented by several researchers, such as Benzaken [61],
Locks [616, 617], Shier and Whited [830], etc. More recently, the application of
this algorithm to positive DNFs has received special attention, and its efficiency
has been improved in various ways, for instance, by Bailey, Manoukian, and
Ramamohanarao [40]; Dong and Li [276]; or Kavvadias and Stavropoulos [559].
SD-Dualization does not run in polynomial total time (even on positive
DNFs), namely, its running time may be exponentially large in the combined input
and output size of the problem. In fact, it tends to generate many useless terms in
its intermediate iterations (for k = 2, . . . , m − 1), and it only generates the prime
implicants of $f^d$ in its very last iteration (when k = m), after exponentially many
operations may already have been performed. This behavior was described more
accurately by Takata [856], who showed that on some examples, SD-Dualization
may produce a superpolynomial blowup for every possible ordering of the terms
of the input DNF (see also Hagen [432]). By contrast however, Boros, Elbassioni
and Makino [119] proved that SD-Dualization can be implemented to run in
output-subexponential time on positive DNFs, and to run in polynomial total time
on certain special classes of positive DNFs, such as bounded-degree DNFs or
read-once DNFs (see also Exercise 7).

4.4 Algorithmic aspects: Positive functions


4.4.1 Some complexity results
This section focuses on the dualization problem for positive Boolean functions.
Just as in the general case, this problem appears to be intractable (Lawler, Lenstra,
and Rinnooy Kan [605]).

Theorem 4.32. Unless P = NP, there exists no polynomial total time algorithm
for Dualization, even if its input represents a positive function.

Proof. Consider a DNF equation $\psi(x_1, x_2, \ldots, x_n) = 0$, and assume that each of the
literals $x_i$ and $\bar{x}_i$ appears at least once in $\psi$, for i = 1, 2, . . . , n. Clearly, solving this
type of DNF equation is NP-complete.
Now, let $\psi^*(x_1, x_2, \ldots, x_n, x_{n+1}, x_{n+2}, \ldots, x_{2n})$ be the positive DNF obtained
after replacing each negative literal $\bar{x}_i$ by a new variable $x_{n+i}$ in $\psi$ (i = 1, 2, . . . , n).
Notice that $\psi(X) = 0$ if and only if $\psi^*(X, \overline{X}) = 0$. Define further the positive
expression:

\[ \phi(x_1, x_2, \ldots, x_{2n}) = \psi^* \wedge \bigwedge_{i=1}^{n} (x_i \vee x_{n+i}). \qquad (4.14) \]

Let $X^{(i)}$, i = 1, 2, . . . , n, be the point of $B^{2n}$ having all its components equal to
1 except for the i-th and (n + i)-th components. Clearly, $X^{(i)}$ is a maximal false
point of $\phi$. We now claim that the maximal false points of $\phi$ are exactly the points
$X^{(1)}, X^{(2)}, \ldots, X^{(n)}$ if and only if the equation $\psi = 0$ has no solution.
Let us first assume that $\phi$ has a maximal false point $Y \in B^{2n}$ other than
$X^{(1)}, X^{(2)}, \ldots, X^{(n)}$. If $y_i = y_{n+i} = 0$ holds for some index i, then $Y \le X^{(i)}$,
a contradiction. So, $\bigwedge_{i=1}^{n} (y_i \vee y_{n+i}) = 1$, and it follows that $\psi^*(Y) = 0$.
Let $U = (y_1, y_2, \ldots, y_n)$ and note that $(U, \overline{U}) \le Y$. Hence (by positivity of $\psi^*$)
$\psi^*(U, \overline{U}) = 0$ and $\psi(U) = 0$, thus proving the “if” part of the claim.
Conversely, if $\psi = 0$ has a solution, say, $\psi(U) = 0$, then $\psi^*(U, \overline{U}) = \phi(U, \overline{U}) = 0$.
Thus, there exists a maximal false point of $\phi$, say, $Y \in B^{2n}$, such that $(U, \overline{U}) \le Y$.
Note that $y_i = y_{n+i} = 0$ cannot hold for any index i, and hence Y is distinct from
$X^{(1)}, X^{(2)}, \ldots, X^{(n)}$, proving the “only if” part of the claim.
Using Theorem 4.18, we obtain that the dual of φ has exactly n prime impli-
cants if and only if the equation ψ = 0 has no solution. Now, the proof is easily
completed by the same type of argument as in the proof of Theorem 4.27. 

Observe that the expression (4.14) is not a disjunctive normal form, so that
Theorem 4.32 does not settle the complexity of Dualization when its input φ is
restricted to positive DNFs: Let us call this problem Positive DNF Dualization.
Clearly, we can assume without loss of generality that the input of Positive DNF
Dualization is the complete DNF of a positive function f , that is, a positive
DNF consisting of all prime implicants of f . Thus, formally, we define Positive
DNF Dualization as follows:

Positive DNF Dualization


Instance: The complete DNF of a positive Boolean function f .
Output: The complete DNF of f d .

For simplicity, and when no confusion can arise, we often use the same notation
for a positive function f and for its complete DNF φ in the sequel. For instance,
we denote by |f| the size of the complete DNF of f, that is, we let $|f| \stackrel{\mathrm{def}}{=} |\phi|$.
Positive DNF Dualization is known to be equivalent to many interest-
ing problems encountered in various fields (see Section 4.2 and [295]). Within
Boolean theory alone, several authors – in particular, Bioch and Ibaraki [89];
Eiter and Gottlob [295]; Fredman and Khachiyan [347]; Johnson, Yannakakis,
and Papadimitriou [538] – have observed that this problem is polynomially equiv-
alent to the fundamental problem of recognizing whether two positive functions f
and g are mutually dual, namely, whether f = g d (note that this is just the positive
version of the Dual Recognition problem introduced in Section 4.3.1):

Positive Dual Recognition


Instance: The complete DNFs of two positive Boolean functions f and g.
Question: Is g = f d ?

If f and g are not mutually dual, then by definition of duality, there exists a point
$X^* \in B^n$ such that $f(X^*) = g(\overline{X}^*)$. Let us now establish that solving Positive
Dual Recognition indirectly allows us to determine such a point $X^*$. (It is
interesting to observe that a similar result holds without the positivity assumptions.)

Theorem 4.33. If f and g are two positive functions on $B^n$ expressed by their
complete DNFs, and if f and g are not mutually dual, then a point $X^* \in B^n$
such that $f(X^*) = g(\overline{X}^*)$ can be found by solving n instances of Positive Dual
Recognition with size at most |f| + |g|.

Proof. The proof is by induction on n. Let A be an algorithm for Positive Dual
Recognition, and assume that A returns the output No on the instance (f, g).
Theorem 4.3 implies that either $f_{|x_n=0}$ and $g_{|x_n=1}$ are not mutually dual or $f_{|x_n=1}$
and $g_{|x_n=0}$ are not mutually dual. Let us assume, without loss of generality, that
$f_{|x_n=0}$ and $g_{|x_n=1}$ are not mutually dual (one call on the algorithm A suffices to
find out). By induction on the number of variables, n − 1 additional calls on A can
be used to compute a point $Y^* \in B^{n-1}$ such that $f_{|x_n=0}(Y^*) = g_{|x_n=1}(\overline{Y}^*)$. Then,
$f(Y^*, 0) = g(\overline{Y}^*, 1)$, and the point $X^* = (Y^*, 0)$ is as required.
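The inductive argument can be unrolled into a loop that fixes one variable per oracle call. In the sketch below, a positive complete DNF is a set of frozensets of variable indices, a point is represented by its support, and a brute-force truth-table comparison stands in for the recognition algorithm A; all names are illustrative.

```python
from itertools import product

def minimal(terms):
    """Remove absorbed terms, keeping the inclusion-minimal sets."""
    terms = set(terms)
    return {t for t in terms if not any(s < t for s in terms)}

def value(terms, ones):
    """Value of the positive DNF at the point whose support is `ones`."""
    return any(t <= ones for t in terms)

def restrict(terms, v, bit):
    """Complete DNF of the restriction x_v = bit."""
    if bit == 0:
        return {t for t in terms if v not in t}
    return minimal(t - {v} for t in terms)

def mutually_dual(F, G, variables):
    """Brute-force stand-in for the oracle A (exponential in the number of variables)."""
    vs = list(variables)
    for bits in product((0, 1), repeat=len(vs)):
        ones = frozenset(v for v, b in zip(vs, bits) if b)
        if value(F, ones) == value(G, frozenset(vs) - ones):
            return False
    return True

def witness(F, G, variables):
    """Assuming F and G are not mutually dual, return the support of a point X*
    with f(X*) = g(complement of X*), using one oracle call per variable."""
    vs, ones = list(variables), set()
    while vs:
        v = vs.pop()
        F0, G1 = restrict(F, v, 0), restrict(G, v, 1)
        if not mutually_dual(F0, G1, vs):
            F, G = F0, G1                         # fix x_v = 0
        else:
            F, G = restrict(F, v, 1), restrict(G, v, 0)
            ones.add(v)                           # fix x_v = 1
    return frozenset(ones)
```

The loop preserves the invariant that the current pair of restrictions is not mutually dual, which is exactly what Theorem 4.3 guarantees for one of the two branches.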

We are now in a position to establish the equivalence of Positive DNF
Dualization and Positive Dual Recognition.

Theorem 4.34. Positive DNF Dualization and Positive Dual Recognition
are polynomially equivalent. More precisely:

(i) There is an algorithm for Positive Dual Recognition which, given the
complete DNFs of two positive functions f and g, decides whether f and g
are mutually dual by solving one instance of Positive DNF Dualization.
If r(|f |, |f d |) is the complexity of solving Positive DNF Dualization on
the input f , then the running time of this algorithm is polynomial in |f |, |g|
and r(|f |, |g|).
(ii) Conversely, there is an algorithm for Positive DNF Dualization which,
given the complete DNF of a positive function f , produces the complete
DNF of f d by solving O(np) instances of Positive Dual Recognition
of size at most |f | + |f d |, where p is the number of prime implicants of
f d . If t(f1 , f2 ) is the complexity of solving Positive Dual Recognition
on the input (f1 , f2 ), then the running time of this algorithm is polynomial
in |f |, p and t(|f |, |f d |).

Proof. (i) If A is a dualization algorithm with running time $r(|f|, |f^d|)$, and (f, g)
is the input to Positive Dual Recognition, then we run A on the input f. If A
does not stop at time $r(|f|, |g|)$, then it means that $g \neq f^d$. Otherwise, the output
of A can be used to determine whether $g = f^d$ and to answer Positive Dual
Recognition.
(ii) Assume that A is an algorithm for Positive Dual Recognition and assume
that, at some stage, the prime implicants $P_J$ ($J \in \mathcal{G}$) of $f^d$ have already been
produced, where $|\mathcal{G}| \le p$. In the next iteration, the algorithm considers the positive
function

\[ g = \bigvee_{J \in \mathcal{G}} P_J(X). \qquad (4.15) \]

The algorithm A can be used to decide whether $g = f^d$. In the affirmative, we
can stop. Otherwise, A can again be used (as in Theorem 4.33) to compute a
point $X^* \in B^n$ such that $f(X^*) = g(\overline{X}^*)$. Since $g \le f^d$, it must be the case that
$f(X^*) = g(\overline{X}^*) = 0$. Then, we can find (in polynomial time) a maximal false point
of f, say, $Y^*$, such that $X^* \le Y^*$. By Theorem 4.18, the term $P = \bigwedge_{j \in \mathrm{supp}(\overline{Y}^*)} x_j$
is a prime implicant of $f^d$ and $P(\overline{Y}^*) = 1$. On the other hand, by positivity of
g, $g(\overline{Y}^*) = 0$, which implies that the prime implicant P is not in the current list
($P_J$, $J \in \mathcal{G}$).
This process can be repeated p + 1 times in order to produce all the prime
implicants of f d . 
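The incremental scheme of part (ii) can be sketched as follows. Here a positive complete DNF on variables 0, ..., n−1 is a set of frozensets, a point is represented by its support, and an exhaustive search (exponential in n) stands in for the calls to the recognition algorithm A; the names are illustrative.

```python
from itertools import product

def value(terms, ones):
    """Value of the positive DNF at the point whose support is `ones`."""
    return any(t <= ones for t in terms)

def find_witness(F, G, n):
    """Stand-in for the oracle: support of a point X* at which f is 0 and
    g is 0 at the complemented point, or None if no such point exists."""
    everyone = frozenset(range(n))
    for bits in product((0, 1), repeat=n):
        ones = frozenset(i for i in range(n) if bits[i])
        if not value(F, ones) and not value(G, everyone - ones):
            return ones
    return None

def dualize(F, n):
    """Return the prime implicants of the dual of the positive function F."""
    everyone = frozenset(range(n))
    G = set()
    while True:
        X = find_witness(F, G, n)
        if X is None:
            return G
        Y = set(X)
        for v in everyone - X:                    # grow X* to a maximal false point Y*
            if not value(F, Y | {v}):
                Y.add(v)
        G.add(everyone - frozenset(Y))            # variables outside supp(Y*)
```

Each pass through the loop adds one new prime implicant of the dual, so the loop body is executed p + 1 times, the last pass only confirming that the list is complete.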

Lawler, Lenstra, and Rinnooy Kan [605] and several other researchers (see
Garcia-Molina, and Barbara [370]; Johnson, Yannakakis, and Papadimitriou [538];
Bioch and Ibaraki [89]; Eiter and Gottlob [295]) have asked whether Positive DNF
Dualization can be solved in polynomial total time or, equivalently, whether
Positive Dual Recognition can be solved in polynomial time. This central
question of duality theory remains open to this day. A breakthrough result by
Fredman and Khachiyan [347], however, has established the existence of quasi-
polynomial time algorithms for Positive Dual Recognition and for Positive
DNF Dualization. This is in stark contrast with the NP-hardness results obtained
for the general Dual Recognition (Theorem 4.26) and DNF Dualization prob-
lems (Theorem 4.27), since it is widely believed that NP-hard problems have no
quasi-polynomial time algorithm.

4.4.2 A quasi-polynomial dualization algorithm


We now describe a simple version of the dualization algorithm proposed by
Fredman and Khachiyan [347], which builds on the approach developed in
Theorems 4.33 and 4.34. Consider the complete DNFs of a positive function f on $B^n$
and of its dual $f^d$, say,

\[ f = \bigvee_{I \in \mathcal{F}} \bigwedge_{i \in I} x_i \qquad (4.16) \]

and

\[ f^d = \bigvee_{J \in \mathcal{F}^d} \bigwedge_{j \in J} x_j . \qquad (4.17) \]

As in Theorem 4.34, let $\mathcal{G} \subseteq \mathcal{F}^d$ represent the collection of prime implicants of $f^d$
which are currently known, and let

\[ g = \bigvee_{J \in \mathcal{G}} \bigwedge_{j \in J} x_j . \qquad (4.18) \]

The algorithm proceeds to determine whether f and g are mutually dual and, in
the negative, to find a point $X^* \in B^n$ such that

\[ f(X^*) = g(\overline{X}^*) = 0. \qquad (4.19) \]
However, since an efficient procedure is not immediately available for deciding
whether f and g are mutually dual (i.e., for solving Positive Dual Recognition),
we cannot apply the same recursive approach used in the proof of Theorem 4.33.
Therefore, we introduce here two crucial modifications. First, instead of exactly
solving an instance of Positive Dual Recognition at every step of the recursion
(as in the proof of Theorem 4.33), we rely on an incomplete test based on examining
the quantity

\[ E(f, g) \stackrel{\mathrm{def}}{=} \sum_{I \in \mathcal{F}} 2^{-|I|} + \sum_{J \in \mathcal{G}} 2^{-|J|}. \qquad (4.20) \]

Theorem 4.35. Let f and g be two positive functions defined by (4.16) and (4.18).
If E(f , g) < 1, then f and g are not mutually dual, and a point X ∗ satisfying (4.19)
can be computed in polynomial time.
Proof. We use the same approach as in the proofs of Theorems 2.26 and 2.27. Namely,
consider the polynomial

\[ F(X) = \sum_{I \in \mathcal{F}} \prod_{i \in I} x_i + \sum_{J \in \mathcal{G}} \prod_{j \in J} (1 - x_j). \]

It defines a real-valued function on $[0, 1]^n$. Let $X^H = (\tfrac{1}{2}, \tfrac{1}{2}, \ldots, \tfrac{1}{2})$ denote the center
of the unit hypercube. There holds $F(X^H) = E(f, g)$, and (using Rosenberg’s
results [789], as in the proof of Theorem 2.26), one can compute in polynomial
time a point $X^* \in \{0, 1\}^n$ such that $F(X^*) \le F(X^H)$. In particular, if E(f, g) < 1,
then $F(X^*) = 0$ (because F takes integer values on $\{0, 1\}^n$), which implies that
$f(X^*) = g(\overline{X}^*) = 0$, and f and g are not mutually dual.
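One standard way to obtain such a point is to fix the variables one at a time, each time choosing the value that does not increase the value of F when the undecided variables are kept at 1/2. The sketch below follows this idea; it is an illustration under the assumption that this rounding scheme is an acceptable substitute for the cited results, and its names are hypothetical. Positive DNFs are sets of frozensets of variable indices and a point is represented by its support.

```python
from fractions import Fraction

def potential(F, G, ones=frozenset(), zeros=frozenset()):
    """Value of the polynomial F(X) when the variables in `ones` are 1,
    those in `zeros` are 0, and all other variables are 1/2."""
    total = Fraction(0)
    for t in F:
        if not t & zeros:                         # product of x_i over the term
            total += Fraction(1, 2 ** len(t - ones))
    for t in G:
        if not t & ones:                          # product of (1 - x_j) over the term
            total += Fraction(1, 2 ** len(t - zeros))
    return total

def low_point(F, G, variables):
    """Support of a 0/1 point X* at which the polynomial is at most E(f, g)."""
    ones, zeros = frozenset(), frozenset()
    for v in variables:
        if potential(F, G, ones | {v}, zeros) <= potential(F, G, ones, zeros | {v}):
            ones |= {v}
        else:
            zeros |= {v}
    return ones
```

Since the current value is the average of the two candidate values at every step, the smaller of the two never exceeds it, which is why the final value is at most E(f, g) = potential(F, G).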

Thus, when E(f , g) < 1, Theorem 4.35 can be used as a substitute for Theo-
rem 4.33. When E(f , g) ≥ 1, however, we cannot draw any immediate conclusion,
and we turn instead to a recursive divide-and-conquer procedure based on The-
orem 4.3. But rather than decomposing f and g on an arbitrary variable, we are
going to show how to choose a “good” variable xi , so that the size of the resulting
subproblems is relatively small. Observe first that when E(f , g) ≥ 1, either f or
g contains a prime implicant of only logarithmic length.
Lemma 4.1. Let f and g be two positive functions defined by (4.16) and (4.18).
If E(f , g) ≥ 1, then either f or g has a prime implicant with degree at most
log(|F| + |G|).
Proof. Let $\delta = \min\{|I| \mid I \in \mathcal{F} \cup \mathcal{G}\}$ be the degree of a shortest prime implicant of
either f or g. By definition (4.20), $(|\mathcal{F}| + |\mathcal{G}|)\,2^{-\delta} \ge E(f, g) \ge 1$.

For M ∈ [0, 1] and i ∈ {1, 2, . . . , n}, we say that variable xi occurs in f with
frequency at least M if

\[ \frac{|\{I \mid i \in I,\ I \in \mathcal{F}\}|}{|\mathcal{F}|} \ge M. \]

We say that xi is a frequent variable for the pair (f , g) if xi occurs with frequency
at least 1/ log(|F| + |G|) either in f or in g.

Procedure Recognize Dual
Input: Two positive Boolean functions f and g on $B^n$ expressed by their complete DNFs (4.16)
and (4.18), with $I \cap J \neq \emptyset$ for all $I \in \mathcal{F}$ and all $J \in \mathcal{G}$.
Output: Yes if f and g are mutually dual. Otherwise, a point $X^* \in B^n$ such that $f(X^*) = g(\overline{X}^*) = 0$.
begin
    Step 1: if $E(f, g) \ge 1$ then go to Step 2
            else return a vector $X^* \in B^n$ such that $f(X^*) = g(\overline{X}^*)$;
    Step 2: if $|\mathcal{F}||\mathcal{G}| \le 1$ then check directly whether $g = f^d$;
            if $g \neq f^d$ then return $X^* \in B^n$ such that $f(X^*) = g(\overline{X}^*)$
            else return “Yes”;
    Step 3: select a frequent variable $x_i$ for the pair (f, g);
            call Recognize Dual($f_{|x_i=0}$, $g_{|x_i=1}$);
            if the returned value is $Y^* \in B^{n-1}$
            then return $X^* \in B^n$, where $x_j^* := y_j^*$ for all $j \neq i$ and $x_i^* := 0$
            else begin
                call Recognize Dual($f_{|x_i=1}$, $g_{|x_i=0}$);
                if the returned value is $Y^* \in B^{n-1}$
                then return $X^* \in B^n$, where $x_j^* := y_j^*$ for all $j \neq i$ and $x_i^* := 1$
                else return “Yes”;
            end
end;

Figure 4.2. Procedure Recognize Dual.

Theorem 4.36. Let f and g be two positive functions defined by (4.16) and (4.18),
and assume that

\[ I \cap J \neq \emptyset \quad \text{for all } I \in \mathcal{F} \text{ and } J \in \mathcal{G}. \qquad (4.21) \]

If E(f , g) ≥ 1 and |F| |G| ≥ 1, then there exists a frequent variable for the pair
(f , g).
Proof. By Lemma 4.1, either f or g has a prime implicant with degree at most
log(|F| + |G|). Let us assume without loss of generality that J ∈ G defines such a
short implicant. Then, in view of (4.21), some variable $x_i$, $i \in J$, must occur in f
with frequency at least $1/|J| \ge 1/\log(|\mathcal{F}| + |\mathcal{G}|)$.

We now have all the necessary ingredients to present the important quasi-
polynomial time algorithm proposed by Fredman and Khachiyan [347] for the
solution of Positive Dual Recognition. A formal description of the algorithm
is given in Figure 4.2.
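The three steps of the procedure can be put together as in the sketch below. It is an illustration under several assumptions rather than a faithful transcription: positive complete DNFs are sets of frozensets of variable indices satisfying (4.21), a point is represented by its support, Step 1 uses a variable-by-variable rounding of the polynomial from the proof of Theorem 4.35, and Step 3 simply picks a variable of maximum frequency (which is a frequent variable whenever one exists). All names are hypothetical.

```python
from fractions import Fraction

def minimal(terms):
    """Remove absorbed terms, keeping the inclusion-minimal sets."""
    terms = set(terms)
    return {t for t in terms if not any(s < t for s in terms)}

def restrict(terms, v, bit):
    """Complete DNF of the restriction x_v = bit of a positive function."""
    if bit == 0:
        return {t for t in terms if v not in t}
    return minimal(t - {v} for t in terms)

def weight(F, G, ones=frozenset(), zeros=frozenset()):
    """The polynomial of Theorem 4.35 with undecided variables set to 1/2;
    with no variable decided this is E(f, g)."""
    total = Fraction(0)
    for t in F:
        if not t & zeros:
            total += Fraction(1, 2 ** len(t - ones))
    for t in G:
        if not t & ones:
            total += Fraction(1, 2 ** len(t - zeros))
    return total

def frequent_variable(F, G):
    """A variable of maximum frequency in f or in g."""
    best = None
    for T in (F, G):
        for v in set().union(*T):
            freq = Fraction(sum(v in t for t in T), len(T))
            if best is None or freq > best[0]:
                best = (freq, v)
    return best[1]

def recognize_dual(F, G, variables):
    """Return None for "Yes"; otherwise the support of a point X* such that
    f is 0 at X* and g is 0 at the complement of X*."""
    variables = list(variables)
    if weight(F, G) < 1:                                        # Step 1
        ones, zeros = frozenset(), frozenset()
        for v in variables:
            if weight(F, G, ones | {v}, zeros) <= weight(F, G, ones, zeros | {v}):
                ones |= {v}
            else:
                zeros |= {v}
        return ones
    if len(F) * len(G) <= 1:                                    # Step 2
        if not F:
            return None if frozenset() in G else frozenset(variables)
        if not G:
            return None if frozenset() in F else frozenset()
        return None        # one term each, E >= 1 and (4.21) force f = g = x_v
    v = frequent_variable(F, G)                                 # Step 3
    rest = [u for u in variables if u != v]
    Y = recognize_dual(restrict(F, v, 0), restrict(G, v, 1), rest)
    if Y is not None:
        return Y                                                # x_v = 0
    Y = recognize_dual(restrict(F, v, 1), restrict(G, v, 0), rest)
    return None if Y is None else Y | {v}                       # x_v = 1
```

The choice of variable in Step 3 affects only the number of recursive calls, not the correctness of the answer; condition (4.21) is preserved by both restrictions, so it holds at every level of the recursion.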
Theorem 4.37. Procedure Recognize Dual is correct and runs in time
$m^{4 \log^2 m + O(1)}$, where $m = |\mathcal{F}| + |\mathcal{G}|$.
Proof. The correctness of the procedure follows from the above discussion. Theo-
rem 4.35 implies that Step 1 can be executed in time polynomial in the input size
|f| + |g|. It can be checked that, if $g = f^d$, then $n \le |\mathcal{F}||\mathcal{G}| \le m^2$ (see Exercise 5),
and hence $|f| + |g| = O(nm) = O(m^3)$. Step 2 is easily done in O(1) time. Therefore,
up to a polynomial factor $m^{O(1)}$, the running time of the procedure is bounded
by the number of recursive calls.
Fix m, and let M = 1/ log m. Let v = |F||G| be the volume of the pair (f , g), and
let a(v) be the maximum number of recursive calls of the procedure when running
on a pair with size at most m and volume at most v. We are going to show that

\[ a(v) \le m^{4 \log^2 m}. \qquad (4.22) \]

Note that the size of each pair involved in a recursive call is smaller than m. So,
the frequent variable xi selected in Step 3 always has frequency at least M either
in f or in g. Suppose, without loss of generality, that xi occurs with frequency
M in f .
Then, the number of terms of f|xi =0 is at most (1 − M)|F|, and the number of
terms of g|xi =1 is at most |G|, so that the volume of the pair (f|xi =0 , g|xi =1 ) is at
most (1 − M)v. Also, the number of terms of f|xi =1 is at most |F| and the number
of terms of g|xi =0 is at most |G| − 1, so that the volume of the pair (f|xi =1 , g|xi =0 )
is at most v − 1.
We thus obtain the following recurrence:

a(v) ≤ 1 + a((1 − M)v) + a(v − 1) and a(1) = 1.

From this recurrence, we obtain $a(v) \le k + k\,a((1 - M)v) + a(v - k)$ for all $k \le v$.
Letting $k = \lceil vM \rceil$ yields $a(v) \le (3 + 2vM)\,a((1 - M)v)$, and hence

\[ a(v) \le (3 + 2vM)^{(\log v)/M}. \]

The bound (4.22) on a(v) follows from $v = |\mathcal{F}||\mathcal{G}| \le (|\mathcal{F}| + |\mathcal{G}|)^2/4 \le m^2/4$ and
$M = 1/\log m$.
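The passage from the recurrence to the closed-form bound is compressed in the proof. One way to fill in the steps is sketched below, assuming that a is nondecreasing and at least 1 (as its definition suggests), that log denotes the base-2 logarithm, and ignoring the rounding of the number of iterations t.

```latex
\begin{align*}
a(v) &\le 1 + a((1-M)v) + a(v-1) \\
     &\le 2 + a((1-M)v) + a((1-M)(v-1)) + a(v-2) \\
     &\le \cdots \le k + k\,a((1-M)v) + a(v-k)
        && \text{(monotonicity of } a\text{)}.
\intertext{With $k = \lceil vM \rceil$ we have $v - k \le (1-M)v$ and $k \le vM + 1$, so}
a(v) &\le k + (k+1)\,a((1-M)v) \le (2k+1)\,a((1-M)v) \le (3 + 2vM)\,a((1-M)v).
\intertext{Iterating $t$ times, and using $(1-M)^t v \le e^{-Mt} v \le 1$ as soon as
$t \ge (\ln v)/M$, which holds for $t = (\log v)/M$ since $\ln v \le \log v$,}
a(v) &\le (3 + 2vM)^{t}\, a\big((1-M)^{t} v\big) \le (3 + 2vM)^{(\log v)/M}.
\end{align*}
```

Substituting $v \le m^2/4$ and $M = 1/\log m$ gives a base of at most $m^2$ (for m large enough) and an exponent of at most $2 \log^2 m$, which is the bound (4.22).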

A dualization algorithm for positive functions in DNF can be obtained as a
by-product of Recognize Dual, just as in Theorem 4.34. The procedure is described
in Figure 4.3.
As an immediate consequence of the above results, we obtain:
Theorem 4.38. Procedure FK-Dualization is correct and runs in time
$m^{4 \log^2 m + O(1)}$, where $m = |\mathcal{F}| + |\mathcal{F}^d|$.
Fredman and Khachiyan [347] have improved the time complexity of Recognize
Dual (or FK-Dualization) to $m^{o(\log m)}$ (see also Elbassioni [309]). But, as
already mentioned, it remains an important open question to determine whether
the dual recognition problem can be solved in polynomial time or, equivalently,
whether Positive DNF Dualization can be solved in polynomial total time.
The results presented in this section have been a source of inspiration for much
subsequent research and have been generalized in many ways. For instance, Boros
et al. [117, 123, 124, 562, etc.] considered natural generalizations of positive
dualization problems that allow them to model numerous interesting applications

Procedure FK-Dualization
Input: A positive Boolean function f on $B^n$ expressed by its complete DNF.
Output: The complete DNF of $f^d$.
Step 0: g := 0;
Step 1: Call Recognize Dual on the pair (f, g);
        if the returned value is “Yes” then halt;
        else let $X^* \in B^n$ be the point returned by Recognize Dual;
             compute a maximal false point of f, say $Y^*$, such that $X^* \le Y^*$;
             $g := g \vee \bigwedge_{j \in \mathrm{supp}(\overline{Y^*})} x_j$;
             return to Step 1.

Figure 4.3. Procedure FK-Dualization.

[9, 651, 654, 839]. We refer to Eiter, Makino, and Gottlob [302] and to Boros,
Elbassioni, Gurvich, and Makino [118] for surveys of related results. It is also worth
recalling at this point that the sequential-distributive algorithm SD-Dualization
has been recently shown to run in subexponential time on positive DNFs ([119];
see Section 4.3.2).
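
The loop of Figure 4.3 can be illustrated by the following minimal Python sketch. It is only an illustration under stated assumptions: the call to Recognize Dual is replaced by a brute-force search over all points (the hypothetical helper `find_witness`), so the sketch does not have the running time of Theorem 4.38. A positive DNF is assumed to be given as a collection of sets of 0-based variable indices, and each new term is built from the variables that are 0 in the maximal false point, that is, from the support of its complement.

```python
from itertools import product

def evaluate(dnf, point):
    """Value of a positive DNF (terms are sets of 0-based indices) at a 0/1 point."""
    return any(all(point[j] for j in term) for term in dnf)

def find_witness(f, g, n):
    """Brute-force stand-in for Recognize Dual (assumption, not the FK procedure).

    Returns a point X with f(X) == g(complement of X), or None if g is the dual of f.
    """
    for x in product((0, 1), repeat=n):
        comp = tuple(1 - b for b in x)
        if evaluate(f, x) == evaluate(g, comp):
            return x
    return None

def maximal_false_point(f, x):
    """Greedily raise coordinates of the false point x while f stays false."""
    y = list(x)
    for j in range(len(y)):
        if y[j] == 0:
            y[j] = 1
            if evaluate(f, y):
                y[j] = 0
    return tuple(y)

def fk_dualization(f, n):
    """Return the prime implicants of the dual of the positive DNF f."""
    g = set()                                   # Step 0: g := 0
    while True:
        x = find_witness(f, g, n)               # Step 1
        if x is None:
            return g
        y = maximal_false_point(f, x)
        # New prime implicant of the dual: variables equal to 0 in y.
        g.add(frozenset(j for j in range(n) if y[j] == 0))

# f = x0 x1 v x1 x2, whose dual is x1 v x0 x2
dual = fk_dualization([frozenset({0, 1}), frozenset({1, 2})], 3)
```

Each pass adds exactly one new prime implicant of the dual, so the number of calls to the recognition step is one more than the number of prime implicants of the dual.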

4.4.3 Additional results


Bioch and Ibaraki [89] and Eiter and Gottlob [295] have systematically investi-
gated several algorithmic problems that turn out to be polynomially equivalent to
dualization. We have already mentioned the equivalence of Positive DNF Dual-
ization and Positive Dual Recognition. It can also be shown that Positive
Dual Recognition is equivalent to the (apparently more restrictive) problem of
deciding whether a positive function given in complete disjunctive normal form
is self-dual or not, and to the following (apparently more general) identification
problem:

Identification
Instance: A black-box oracle to evaluate a positive Boolean function f at any
given point.
Output: All prime implicants of f and of f d .

The importance of this problem, where knowledge of f can only be gained
through queries of the form: “What is the value of f at the point X?” has been
underlined by Bioch and Ibaraki [89] and has been investigated especially in the
machine learning literature in relation to various other models of “exact learning
by membership queries”; see [21, 22, 29, 275, 429, 651, 652, 653, 654, 838, 884,
etc.] and [233] or Chapter 12 for related considerations. Incremental approaches
of the type used in Theorems 4.29, 4.28, 4.34, in particular, have proved useful in
the oracle context (see, for instance, Lawler, Lenstra, and Rinnooy Kan [605] or
Angluin [21]).

Many researchers have investigated natural special cases of Positive DNF
Dualization [74, 129, 225, 275, 295, 538, 652, 653, 735, 736]. If φ is a positive
quadratic DNF, then the dualization problem is equivalent to the problem
of generating all maximal stable sets of a graph and can be solved with poly-
nomial delay [873, 605, 538]. More generally, the dualization problem has a
polynomial total time algorithm when its input is restricted to positive DNFs
of degree at most k, where k is viewed as a constant [119, 121, 295]. Many
other subcases can also be solved in polynomial total time; this is the case
when f is regular, threshold, matroidal, read-once, acyclic, and so on (see
[74, 119, 129, 225, 275, 295, 429, 605, 652, 653, 735, 736]). We refer to a survey
by Eiter, Makino, and Gottlob [302] for more details.
Finally, Lawler, Lenstra, and Rinnooy Kan [605] observed that a general
approach (inspired from previous work by Paull and Unger [733]) can be used
to derive polynomial dualization algorithms for certain special classes of positive
functions. This approach is quite different from those described so far: Instead of
producing the prime implicants of $f^d$ one by one, as in Theorem 4.34 or procedure
FK-Dualization, it recursively dualizes $f_{|x_1=\dots=x_n=0}$, then $f_{|x_2=\dots=x_n=0}$, . . .,
$f_{|x_n=0}$, and finally f. For j = 1, 2, . . . , n, consider the following subproblem:

Add-j
Instance: A prime implicant P of $(f_{|x_j=\dots=x_n=0})^d$, where f is a positive Boolean
function on $B^n$ expressed in DNF.
Output: All prime implicants of $(f_{|x_{j+1}=\dots=x_n=0})^d$ that are absorbed by P.

Theorem 4.39. If C is a class of positive functions such that Add-j can be
solved in polynomial total time on C for all j = 1, 2, . . . , n, then Positive DNF
Dualization can be solved in polynomial total time on C.

Proof. We only sketch the proof. For every positive function f,

$$
\begin{aligned}
f^d &= \left( x_n f_{|x_n=1} \vee f_{|x_n=0} \right)^d \\
    &= \left( x_n \vee (f_{|x_n=1})^d \right) (f_{|x_n=0})^d \\
    &= x_n (f_{|x_n=0})^d \vee (f_{|x_n=1})^d (f_{|x_n=0})^d .
\end{aligned}
$$

It follows that every prime implicant of $f^d$ is absorbed by some prime implicant
of $(f_{|x_n=0})^d$ and, therefore, $f^d$ can be computed by repeatedly solving Add-n for
all prime implicants of $(f_{|x_n=0})^d$.
Similarly, $(f_{|x_{j+1}=\dots=x_n=0})^d$ can be computed for all j by repeatedly solving
instances of Add-j. Details, and ways to accelerate the algorithm, can be found
in Lawler, Lenstra, and Rinnooy Kan [605].
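
The decomposition of the dual used in this proof sketch can be checked numerically on a small instance. The following Python fragment is only a brute-force sanity check on one hypothetical example (f = x1 x2 ∨ x2 x3, written with 0-based indices), not an implementation of the Lawler, Lenstra, and Rinnooy Kan approach; the helper names are assumptions made for the illustration.

```python
from itertools import product

def evaluate(dnf, x):
    """Value of a positive DNF (terms are sets of 0-based indices) at the 0/1 point x."""
    return any(all(x[j] for j in term) for term in dnf)

def dual(func):
    """Dual of a Boolean function: the complement of func at the complemented point."""
    return lambda x: not func(tuple(1 - b for b in x))

n = 3
f = lambda x: evaluate([{0, 1}, {1, 2}], x)   # example function on three variables
f0 = lambda y: f(y + (0,))                    # restriction of f to x_n = 0
f1 = lambda y: f(y + (1,))                    # restriction of f to x_n = 1

# Check f^d = x_n (f|x_n=0)^d  v  (f|x_n=1)^d (f|x_n=0)^d at every point.
identity_holds = all(
    dual(f)(x) == bool(
        (x[-1] and dual(f0)(x[:-1])) or (dual(f1)(x[:-1]) and dual(f0)(x[:-1]))
    )
    for x in product((0, 1), repeat=n)
)
```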

Despite its apparent simplicity, the approach sketched in Theorem 4.39 has a
surprisingly broad range of applicability. Several related approaches are mentioned
by Eiter, Makino, and Gottlob [302]; see also Grossi [413].

4.5 Exercises
1. Consider Reiter’s analysis of the diagnosis problem (Application 4.1).
(a) Prove that the characterization of diagnoses is correct.
(b) With the same notations as in Application 4.1, define a conflict set to be
a minimal subset N ⊆ {1, 2, . . . , m} such that

$$\bigvee_{\substack{k=1 \\ k \in N}}^{m} \phi_k(X^*, Y) = 0$$

is inconsistent. Show that N ⊆ {1, 2, . . . , m} is a conflict set if and only
if $\bigwedge_{k \in N} p_k$ is a prime implicant of f.
(c) Prove that the diagnoses are exactly the transversals of the conflict sets.
2. Prove that the composition of dual-minor positive functions is dual-minor,
and the composition of dual-major positive functions is dual-major. Show
that these results do not hold without the positivity assumption.
3. Show that, if f (x1 , x2 , . . . , xn ) is a Boolean function, then g(x1 , x2 , . . . , xn , xn+1 ,
xn+2 ) = xn+1 xn+2 ∨ xn+1 f ∨ xn+2 f d is self-dual.
4. Show that there exists a positive function f such that χ (Hf ) ≤ 3, but f is
not dual-minor (compare with Theorem 4.21).
5. Prove that, if f is a positive Boolean function on n variables, then
n ≤ p q, where p (respectively, q) is the number of prime implicants of
f (respectively, f d ).
6. Show that the procedure SD-Dualization presented in Section 4.3.1 does
not run in polynomial total time.
7. Consider a variant of SD-Dualization where the prime implicants of f
are sorted in such a way that, for j = 1, 2, . . . , n, the prime implicants on
{x1 , x2 , . . . , xj } precede any prime implicant containing xj +1 . Prove that this
variant can be implemented to run in polynomial total time on quadratic
positive functions. (Note: this implies that all maximal stable sets of a graph
can be generated in polynomial total time).
8. Prove Theorem 4.31.
9. Let ψ be a DNF of the Boolean function f (x1 , x2 , . . . , xn ). Show that the
complete DNF of $f^d$ can be generated by the following procedure: (a) In ψ,
replace every occurrence of the literal $\bar{x}_i$ by a new variable $y_i$ (i = 1, 2, . . . , n),
thus producing a positive DNF φ(x1 , x2 , . . . , xn , y1 , y2 , . . . , yn ); (b) Generate
the complete DNF of $\phi^d$, say η(x1 , x2 , . . . , xn , y1 , y2 , . . . , yn ); (c) In η, replace
every occurrence of $y_i$ by $\bar{x}_i$, and remove the terms which are identically
zero. Is this sufficient to conclude that the problem DNF Dualization is
no more difficult than Positive DNF Dualization?
10. Show that the bounds in Lemma 4.1 and Theorem 4.36 are tight up to a
factor of 2. (Fredman and Khachiyan [347].)
11. Show that Theorem 4.35, Lemma 4.1, and Theorem 4.36 hold for arbitrary,
not necessarily positive functions. (Fredman and Khachiyan [347].)
12. Prove that Positive Dual Recognition is polynomially equivalent to
deciding whether a positive function given in complete disjunctive normal
form is self-dual.
13. Prove that Positive DNF Dualization is polynomially equivalent to the
Identification problem.
14. Complete the proof of Theorem 4.39. Show that Add-j can be solved in
polynomial time on the class C of quadratic positive functions. (Compare
with Exercise 7.)

Question for thought


15. What is the complexity of the following problem: Given the complete DNFs
of two Boolean functions f and g, decide whether g = f d ?
Part II

Special Classes
5
Quadratic functions
Bruno Simeone

This chapter is devoted to an important class of Boolean functions, namely,
quadratic Boolean functions, or Boolean functions that can be represented by
DNFs of degree at most two. Since linear functions are trivial in many respects,
quadratic functions are in a sense the simplest interesting Boolean functions: Most
of the fundamental problems introduced in the first part of this monograph – solv-
ing Boolean equations, generating prime implicants, dualization – turn out to be
efficiently solvable for quadratic functions expressed in DNF. Their solution, how-
ever, requires a good understanding of structural properties of quadratic functions,
as well as clever algorithms. Graph-theoretical models play a central role in these
developments, and we will see that, conversely, many questions about graphs can
also be fruitfully rephrased as questions involving quadratic Boolean functions.

5.1 Basic definitions and properties


We start with basic definitions and properties.

Definition 5.1. We call a DNF

$$\phi(x_1, \dots, x_n) = \bigvee_{i=1}^{m} \Big( \bigwedge_{j \in P_i} x_j \bigwedge_{j \in N_i} \bar{x}_j \Big)$$

quadratic if all its terms are quadratic, that is, if they are conjunctions of at most
two literals: |Pi ∪ Ni | ≤ 2 for all i ∈ {1, . . . , m}. A term is called linear or purely
quadratic according to whether it consists of exactly one or exactly two literals.

Definition 5.2. A Boolean function f is called quadratic if it admits a quadratic
DNF.

In a similar fashion, we call a CNF quadratic if all its clauses are disjunctions
of at most two literals.

Definition 5.3. A Boolean function f is called dually quadratic if it admits a
quadratic CNF.
This definition is equivalent to the property that the dual function $f^d$ is
quadratic.
Recall from Chapter 2 (Definition 2.5) that the consensus of two terms $xC$ and
$\bar{x}D$ is the term $CD$ (provided it is not identically 0). Note that if both $xC$ and $\bar{x}D$
are quadratic, then their consensus $CD$ is quadratic, too.
An important consequence of this observation is the following:
Theorem 5.1. All prime implicants of a quadratic Boolean function are quadratic.
Proof. Let f be a quadratic Boolean function, and let φ be an arbitrary quadratic
DNF representing f . By Theorem 3.5, all prime implicants of f can be obtained
by applying the consensus procedure to φ. By the above observation, all terms
obtained by this procedure, and, in particular, all prime implicants of f , must be
quadratic. 
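
To make the observation behind Theorem 5.1 concrete, here is a minimal Python sketch of a single consensus step. The encoding is an assumption made for this illustration: a term is a frozenset of nonzero integers, where +i stands for the variable x_i and -i for its complement. It is not the full consensus procedure, only the operation on one pair of terms.

```python
def consensus(term_a, term_b):
    """Return the consensus of two terms, or None if it does not exist.

    Terms are frozensets of nonzero ints: +i is x_i, -i is the complement of x_i.
    The consensus exists only when exactly one variable appears with opposite
    signs in the two terms; otherwise the product CD is identically 0 or undefined.
    """
    clashes = [lit for lit in term_a if -lit in term_b]
    if len(clashes) != 1:
        return None
    x = clashes[0]
    return frozenset((term_a - {x}) | (term_b - {-x}))

a = frozenset({1, 2})       # x1 x2
b = frozenset({-1, 3})      # (not x1) x3
c = consensus(a, b)         # x2 x3: again a term with at most two literals
```

Since each quadratic input contributes at most one literal besides the clashing variable, the result has at most two literals, which is the closure property the proof relies on.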

Definition 5.4. A quadratic Boolean function f is called purely quadratic if it is
not constant and if it has no linear prime implicant or, equivalently, if no linear
term appears in any DNF of f .
The following statement follows immediately from the definitions:
Lemma 5.1. If f is purely quadratic, then in every quadratic DNF of f every
term is a prime implicant.
Let us remark that a function might be quadratic even though at first sight it
does not appear as such. In other words, it is quite possible for a quadratic function
to be represented by a DNF of higher degree.
Example 5.1. The function
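$$f = x_1 x_2 \bar{x}_3 \bar{x}_4 \vee x_1 x_2 \bar{x}_3 x_4 \vee \bar{x}_1 x_2 \bar{x}_3 x_4 \vee x_1 x_2 x_3 \vee \bar{x}_2 \bar{x}_3 x_4 \vee \bar{x}_1 x_3 \vee \bar{x}_2 \bar{x}_4$$
is quadratic, since it also admits the DNF
$$f = x_1 x_2 \vee \bar{x}_1 x_3 \vee \bar{x}_2 \bar{x}_4 \vee \bar{x}_3 x_4.$$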
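
The equality of the two DNFs in Example 5.1 can be confirmed by comparing truth tables. The short Python check below is a sketch under an assumed encoding (a term is a tuple of nonzero integers, +i for x_i and -i for its complement, with complemented literals placed as the example is read here).

```python
from itertools import product

def evaluate(dnf, x):
    """Value of a DNF at the 0/1 point x; literal +i reads x[i-1], -i its complement."""
    return any(all(x[abs(l) - 1] == (l > 0) for l in term) for term in dnf)

# Degree-4 DNF of Example 5.1
high_degree = [(1, 2, -3, -4), (1, 2, -3, 4), (-1, 2, -3, 4),
               (1, 2, 3), (-2, -3, 4), (-1, 3), (-2, -4)]
# Quadratic DNF of the same function
quadratic = [(1, 2), (-1, 3), (-2, -4), (-3, 4)]

same_function = all(
    evaluate(high_degree, x) == evaluate(quadratic, x)
    for x in product((0, 1), repeat=4)
)
```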
As noted in Chapter 1, Theorem 1.31, the problem of recognizing whether a
given DNF represents a quadratic Boolean function is co-NP-complete. Therefore,
we often assume that a quadratic Boolean function is given by a quadratic DNF.
In particular, this is the case in Definition 5.5, which introduces one of the most
important notions of this chapter.
Definition 5.5. A quadratic Boolean equation is a DNF equation of the form
ϕ(X) = 0,
where ϕ is a quadratic DNF.

Many authors prefer to concentrate on quadratic CNF equations of the form

ψ(X) = 1,

where ψ is a quadratic CNF. The problem of deciding whether an equation of the
latter form has solutions is known under the name 2-Satisfiability (2-Sat for
short). As follows from the discussion in Section 1.4 and Section 2.2, however,
the DNF and CNF forms of quadratic equations are strictly equivalent.

5.2 Why are quadratic Boolean functions important?


Quadratic Boolean functions are interesting, and worthy of investigation, for many
reasons. Here, we list the main ones:

(1) Quadratic Boolean functions are “abundant in nature.”
(2) There are strong connections in both directions between quadratic Boolean
functions and graphs.
(3) Many significant combinatorial problems can be reduced to 2-Sat.
(4) Low complexity algorithms are available for solving 2-Sat, as well as for
finding all prime implicants and irredundant normal forms of quadratic
Boolean functions.

We now briefly comment on these points. Item (2) is discussed at length in
Section 5.4, item (3) in Section 5.5, and item (4) in Sections 5.6 and 5.8.

(1) Quadratic Boolean functions are “abundant in nature.”
The most common types of logical relations, like
“P implies Q.”
“Either P or Q is true.”
“Either P or Q is false.”
“Exactly one of P or Q is true.”
“P is true if and only if Q is true.”
can be represented by quadratic equations, such as

$p\bar{q} = 0$,
$\bar{p}\,\bar{q} = 0$,
$pq = 0$,
$pq \vee \bar{p}\,\bar{q} = 0$,
$p\bar{q} \vee \bar{p}q = 0$.

In fact, it has been estimated that about 95% of the production rules in
expert systems are of the foregoing types and, hence, can be represented by
quadratic equations (see Jaumard, Simeone, and Ow [531]).
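
The correspondence between the five logical relations and their quadratic equations can be verified exhaustively over the four truth assignments. The following Python check is a small sketch; the pairing of each relation with its DNF is written out as read here, with each equation of the form "DNF = 0".

```python
from itertools import product

# (relation name, relation on truth values, quadratic DNF whose zeros should match)
cases = [
    ("P implies Q",
     lambda p, q: (not p) or q,        lambda p, q: p and not q),
    ("Either P or Q is true",
     lambda p, q: p or q,              lambda p, q: (not p) and (not q)),
    ("Either P or Q is false",
     lambda p, q: (not p) or (not q),  lambda p, q: p and q),
    ("Exactly one of P or Q is true",
     lambda p, q: p != q,              lambda p, q: (p and q) or ((not p) and (not q))),
    ("P is true if and only if Q is true",
     lambda p, q: p == q,              lambda p, q: (p and not q) or ((not p) and q)),
]

# A relation holds exactly when the corresponding DNF takes the value 0.
results = {
    name: all(rel(p, q) == (not dnf(p, q))
              for p, q in product((False, True), repeat=2))
    for name, rel, dnf in cases
}
```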

(2) Quadratic Boolean functions and graphs.
The theory of quadratic Boolean functions has a strong combinatorial appeal.
This is mainly due to the fact that, with any given quadratic Boolean function
f , one can associate in many ways a graph that “represents” f , and vice
versa. Depending on f , the graph is either undirected or directed, or bidi-
rected. (A bidirected graph is a graph in which a label from the set {−1, 1} is
independently assigned to each endpoint of every edge. The arc associated
with edge (i, j ), according to the labels of its two endpoints, can be viewed
as going either from i to j or from j to i, or as being directed into both i
and j or out of both i and j ; see Figure 5.1.)
This two-way correspondence between quadratic functions and graphs is
very useful. For several important subclasses of quadratic Boolean func-
tions (discussed in detail in Section 5.3), the recognition problem can be
formulated as a problem in graph theory. The most efficient procedures for
solving quadratic Boolean equations known so far are graph algorithms (see
Section 5.6). In the opposite direction, many graph-theoretic properties, such
as bipartiteness or the Kőnig-Egerváry property, can be naturally expressed
as quadratic Boolean equations.
(3) Many significant combinatorial problems can be reduced to quadratic
Boolean equations.
Another good reason for studying quadratic Boolean functions is that a
host of significant combinatorial decision problems can be formulated as
quadratic equations. Early examples (recognition of bipartiteness and signed
graph balance) already appear in Maghout [644] and Hammer [436].
In Section 5.5, we present a collection of problems that are reducible
to quadratic equations. For each of these problems, the reduction can be
obtained in polynomial time, and for some, even in linear time. Since, as
we show in Section 5.6, there are quite fast (indeed, linear) algorithms for
quadratic equations, each of the above problems turns out to be efficiently
solvable in polynomial, or even in linear time.
Just as 3-Sat problems (or cubic equations) are a “template” for a
broad class of “hard” combinatorial decision problems (including maximum
clique, vertex cover, chromatic number, subset sum, set covering, traveling
salesman, etc.) that can all be reduced to 3-Sat in polynomial time, 2-Sat
problems (or quadratic Boolean equations) can be taken to be the “tem-
plate” of a rich class of “easy,” although nontrivial, combinatorial problems
(including the above-mentioned collection of problems and many others),
all of which are efficiently reducible to 2-Sat.
(4) Low-complexity algorithms are available for quadratic equations as well as
for finding all prime implicants and irredundant normal forms of quadratic
Boolean functions.
As we show in Section 5.5, the recognition of the Kőnig-Egerváry property in
graphs and quadratic equations are mutually reducible to each other. On this
ground, an efficient algorithm of Gavril [374] for testing the Kőnig-Egerváry
property can be easily translated into a linear-time algorithm for quadratic
equations, which is actually the fastest currently available algorithm for
quadratic equations.
Another nice feature of quadratic Boolean functions, which is not enjoyed
by those of higher degree, is that, starting from an arbitrary quadratic DNF,
one can produce in polynomial time all the prime implicants of the function,
as well as an irredundant DNF of it (for a general definition of these notions,
see Section 1.7). Efficient algorithms for these problems are presented in
Section 5.8.

5.3 Special classes of quadratic functions


5.3.1 Classes
In this section, we introduce several classes of quadratic Boolean functions and
then – starting from the class of all quadratic Boolean functions – we point out
characterizations of some of these classes by functional inequalities.
In any DNF, a quadratic term may take one of the three forms

$xy$, $\bar{x}\,\bar{y}$, $x\bar{y}$,

where x and y are variables. By forbidding all terms having one or more of these
three forms, one can naturally define meaningful special subclasses of quadratic
DNFs and, accordingly, of quadratic Boolean functions.
Let us now introduce some special classes of general, not necessarily quadratic,
DNFs. We start with the definitions of Horn, co-Horn and polar DNFs, which are
thoroughly studied in Chapters 6 and 11.
Definition 5.6. A Horn DNF is a DNF in which every term contains at most one
complemented variable.
Definition 5.7. A co-Horn DNF is a DNF in which every term contains at most
one uncomplemented variable.
Definition 5.8. A polar DNF is a DNF in which no term contains both a
complemented and an uncomplemented variable.
In Section 5.4, we extensively refer to those quadratic DNFs in which every
quadratic term consists of one complemented and one uncomplemented variable.
Definition 5.9. A mixed DNF is a DNF that is both Horn and co-Horn.
As mentioned above, these important subclasses of DNFs, when restricted to
quadratic DNFs, can be simply characterized by means of forbidden terms (see
Table 5.1).
Any of these types of DNFs defines in a natural way a corresponding class of
Boolean functions. For example, we say that a Boolean function is Horn if it is
representable by a Horn DNF.

Table 5.1. Subclasses of quadratic DNFs and their forbidden terms

Class                         Forbidden terms
Horn                          $\bar{x}\,\bar{y}$
co-Horn                       $xy$
polar                         $x\bar{y}$
mixed                         $xy$, $\bar{x}\,\bar{y}$
positive (purely quadratic)   $x\bar{y}$, $\bar{x}\,\bar{y}$
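
Definitions 5.6 to 5.9 translate directly into a syntactic test on the terms of a DNF. The Python sketch below is one possible rendering; the encoding (terms as tuples of nonzero integers, +i for x_i and -i for its complement) and the function name `classify` are assumptions made for the illustration.

```python
def classify(dnf):
    """Return the set of DNF classes (by name) to which the given DNF belongs."""
    negatives = [sum(1 for l in term if l < 0) for term in dnf]
    positives = [sum(1 for l in term if l > 0) for term in dnf]
    classes = set()
    if all(k <= 1 for k in negatives):           # at most one complemented variable
        classes.add("Horn")
    if all(k <= 1 for k in positives):           # at most one uncomplemented variable
        classes.add("co-Horn")
    if all(a == 0 or b == 0 for a, b in zip(negatives, positives)):
        classes.add("polar")                     # no term mixes both kinds of literals
    if {"Horn", "co-Horn"} <= classes:
        classes.add("mixed")
    if all(k == 0 for k in negatives):
        classes.add("positive")
    return classes

mixed_example = classify([(1, -2), (2, -3)])     # x1 ~x2 v x2 ~x3
polar_example = classify([(1, 2), (-1, -2)])     # x1 x2 v ~x1 ~x2
```

Note that this classifies a DNF, not the function it represents: a function is Horn as soon as one of its DNFs is Horn.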

Before we proceed with functional characterizations of these subclasses, let us
mention a result on DNF representations of purely quadratic Boolean functions.
Recall from Section 1.10 that by a positive DNF, we mean a DNF containing no
complemented variables.
Lemma 5.2. Let ϕ be a quadratic DNF of a purely quadratic Boolean function f .
If f is a positive, Horn, co-Horn, or mixed Boolean function, then ϕ is a positive,
Horn, co-Horn, or mixed DNF, respectively.
Proof. Let f be a purely quadratic Boolean function, and let ϕ be any quadratic
DNF of f . By Lemma 5.1, every term of ϕ is a prime implicant of f . If f is
positive, then every prime implicant of f is positive; hence, ϕ is positive.
Since every term of ϕ is a prime implicant of f , it can be generated by the con-
sensus algorithm of Chapter 2, executed on an arbitrary quadratic DNF of f . On
the other hand, each consensus operation, when performed on a pair of quadratic
terms, preserves the Horn, co-Horn, and mixed types. 

Note that Lemma 5.2 does not extend to polar DNFs.

5.3.2 Characterizations by functional relations


Ekin, Foldes, Hammer, and Hellerstein [305] obtained, for every class of Boolean
functions in Table 5.2 (and for others), a characterization in terms of functional
inequalities satisfied by every function in the class.
In Table 5.2, the inequalities are understood to be universally quantified over
all vectors X, Y, Z in $B^n$; XY and X ∨ Y are the vectors in $B^n$ whose ith component
is given by $x_i y_i$ and by $x_i \vee y_i$, respectively, for i = 1, . . . , n. The functional
characterization of quadratic Boolean functions on the first line of the table was
obtained by Schaefer [807]. A proof of this result, due to Ekin, Foldes, Hammer,
and Hellerstein [305], will be presented in Chapter 11 together with proofs of the
other functional characterizations in Table 5.2.
In view of their functional characterization, polar functions are sometimes called
supermodular and mixed ones submodular. (Notice the formal analogy with the
supermodular and submodular (real-valued) set functions defined in Chapter 13.)

Table 5.2. Characterizations of classes of Boolean functions

Class              Functional relations
quadratic          f(XY ∨ XZ ∨ YZ) ≤ f(X) ∨ f(Y) ∨ f(Z)
dually quadratic   f(X) f(Y) f(Z) ≤ f((X ∨ Y)(X ∨ Z)(Y ∨ Z))
Horn               f(XY) ≤ f(X) ∨ f(Y)
co-Horn            f(X ∨ Y) ≤ f(X) ∨ f(Y)
polar              f(X) ∨ f(Y) ≤ f(XY) ∨ f(X ∨ Y)
mixed              f(XY) ∨ f(X ∨ Y) ≤ f(X) ∨ f(Y)

Table 5.3. Quadratic Boolean functions and graphs

Quadratic Boolean functions   Graphs
positive                      undirected
mixed                         directed
arbitrary                     bidirected

Further properties of sub- and supermodular Boolean functions are discussed in
Chapter 6 and Chapter 11.
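
The two-vector relations of Table 5.2 can be tested by brute force on small examples. The Python sketch below is an illustration only: the encoding of terms (+i for x_i, -i for its complement) and the helper names are assumptions, and the co-Horn relation is taken in the form f(X ∨ Y) ≤ f(X) ∨ f(Y), the mirror image of the Horn relation.

```python
from itertools import product

def evaluate(dnf, x):
    """Value of a DNF at the 0/1 point x; literal +i reads x[i-1], -i its complement."""
    return any(all(x[abs(l) - 1] == (l > 0) for l in term) for term in dnf)

def meet(x, y):
    return tuple(a & b for a, b in zip(x, y))    # componentwise conjunction XY

def join(x, y):
    return tuple(a | b for a, b in zip(x, y))    # componentwise disjunction X v Y

RELATIONS = {
    "Horn":    lambda f, x, y: f(meet(x, y)) <= (f(x) or f(y)),
    "co-Horn": lambda f, x, y: f(join(x, y)) <= (f(x) or f(y)),
    "polar":   lambda f, x, y: (f(x) or f(y)) <= (f(meet(x, y)) or f(join(x, y))),
    "mixed":   lambda f, x, y: (f(meet(x, y)) or f(join(x, y))) <= (f(x) or f(y)),
}

def satisfied(dnf, n):
    """Names of the relations that hold for all pairs of points of B^n."""
    f = lambda x: evaluate(dnf, x)
    points = list(product((0, 1), repeat=n))
    return {name for name, rel in RELATIONS.items()
            if all(rel(f, x, y) for x in points for y in points)}

mixed_result = satisfied([(1, -2), (2, -3)], 3)    # x1 ~x2 v x2 ~x3
polar_result = satisfied([(1, 2), (-1, -2)], 2)    # x1 x2 v ~x1 ~x2
```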

5.4 Quadratic Boolean functions and graphs


5.4.1 Graph models of quadratic functions
There is a quite natural correspondence between certain classes of quadratic
Boolean functions on one side, and graphs, digraphs, and bidirected graphs on
the other side, as shown in Table 5.3.
In fact, as explained in Section 1.13.5, one can associate with any undirected
graph G = (V , E) its stability function, namely, the positive quadratic Boolean
function given by

$$f = \bigvee_{(i,j) \in E} x_i x_j. \qquad (5.1)$$

Note that the prime implicants of f are precisely the terms xi xj of this DNF,
which is also the unique irredundant DNF of f . It follows that the correspondence
between positive purely quadratic Boolean functions and undirected graphs is
one-to-one.
Let now D = (N, A) be a directed graph, with N = {1, 2, . . . , n}. We can associate
with D a quadratic mixed DNF ϕ ≡ ϕ(D) as follows: We associate with every
vertex i ∈ N a variable $x_i$ of ϕ, and with every arc (i, j) ∈ A a quadratic term $x_i \bar{x}_j$
of ϕ. Conversely, given any mixed quadratic DNF ϕ (without linear terms), one
can uniquely reconstruct the directed graph D ≡ D(ϕ) whose associated DNF is
ϕ.

Figure 5.1. Terms associated with bidirected arcs.

However, this time the correspondence between digraphs and quadratic mixed
Boolean functions is not one-to-one: Indeed, a purely quadratic mixed Boolean
function f may be represented by many irredundant quadratic mixed DNFs. In
order to state this relation more precisely, we need the notion of transitive closure
of a digraph (see also Appendix A): Given a digraph D = (N , A), its transitive
closure is the digraph obtained from D by adding to A all the arcs (u, v) such that
there is a directed path from u to v in D.

Theorem 5.2. Two digraphs correspond to the same quadratic mixed Boolean
function if and only if their transitive closures are identical.

Proof. Two mixed DNFs represent the same quadratic Boolean function if and only
if the two sets of prime implicants that one can obtain from them by the consensus
algorithm are the same. It is easy to see that these implicants are quadratic mixed
terms, and that the digraph associated with their disjunction is transitively closed. 
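
Theorem 5.2 can be illustrated on small acyclic digraphs. In the Python sketch below, each arc (i, j) contributes the term "x_i and not x_j", as in the construction above; the three example digraphs and the helper names are hypothetical choices made for the illustration.

```python
from itertools import product

def transitive_closure(n, arcs):
    """Warshall-style closure: add (i, j) whenever (i, k) and (k, j) are present."""
    reach = set(arcs)
    for k in range(1, n + 1):
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                if (i, k) in reach and (k, j) in reach:
                    reach.add((i, j))
    return reach

def truth_table(n, arcs):
    """Truth table of the mixed DNF with one term x_i (not x_j) per arc (i, j)."""
    return tuple(
        any(x[i - 1] == 1 and x[j - 1] == 0 for i, j in arcs)
        for x in product((0, 1), repeat=n)
    )

d1 = {(1, 2), (2, 3)}
d2 = {(1, 2), (2, 3), (1, 3)}      # same transitive closure as d1
d3 = {(1, 2)}                      # different transitive closure

same_closure = transitive_closure(3, d1) == transitive_closure(3, d2)
same_function = truth_table(3, d1) == truth_table(3, d2)
different_closure = transitive_closure(3, d1) != transitive_closure(3, d3)
different_function = truth_table(3, d1) != truth_table(3, d3)
```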

Finally, if B = (N, H) is a bidirected graph, one introduces again the variables
{x1 , x2 , . . . , xn } associated with its n vertices as above. Quadratic terms are associ-
ated with the arcs of B as indicated in Figure 5.1. Then, ϕ is the DNF consisting
of the disjunction of all such quadratic terms. Conversely, B can be reconstructed
from ϕ.

5.4.2 The matched graph


Another graph that can be conveniently associated with a quadratic DNF ϕ is the
matched graph $G_\varphi$, introduced by Simeone [834]. This undirected graph has 2n
vertices corresponding to the 2n literals $\{x_1, \dots, x_n, \bar{x}_1, \dots, \bar{x}_n\}$. Its set of edges is

$$\{(x_i, \bar{x}_i) : i \in \{1, \dots, n\}\} \cup \{(\xi, \eta) : \xi\eta \text{ is a term of } \varphi\}.$$

If ϕ contains linear terms, a loop (ξ, ξ) is introduced for every such term ξ.

Example 5.2. The matched graph associated with the DNF

$$\varphi = \bar{x}_1 \vee x_4 \vee x_1 x_2 \vee \bar{x}_1 \bar{x}_2 \vee \bar{x}_1 \bar{x}_4 \vee x_2 \bar{x}_3 \vee x_2 x_4 \vee x_3 \bar{x}_4 \qquad (5.2)$$

is shown in Figure 5.2.

Figure 5.2. A matched graph.

The edges of $G_\varphi$ are classified as positive, negative, mixed, or null edges
according to whether they have the form $(x_i, x_j)$, $(\bar{x}_i, \bar{x}_j)$, $(x_i, \bar{x}_j)$, or $(x_i, \bar{x}_i)$,
respectively.
The consistency of the quadratic Boolean equation ϕ = 0 has a nice graph-
theoretic counterpart for Gϕ . In order to state this property, we need some
terminology.
If µ(G) and τ (G) respectively denote the maximum cardinality of a matching
and the minimum cardinality of a (vertex) cover of an arbitrary graph G (see
definitions in Appendix A), then the following relation always holds:

µ(G) ≤ τ (G) . (5.3)

The graph G is said to have the Kőnig-Egerváry (KE) property if equality holds
in (5.3).
Theorem 5.3. The quadratic Boolean equation ϕ = 0 is consistent if and only if
the matched graph Gϕ has the Kőnig-Egerváry property.
Proof. The n null edges form a maximum matching of Gϕ . Therefore, Gϕ has the
KE property if and only if there is a cover C in Gϕ with |C| = n.
Assume first that $G_\varphi$ has the KE property, and let C be a cover with |C| = n.
As every null edge has exactly one endpoint in C, we can define $Z \in B^n$ by

$$z_i = \begin{cases} 0 & \text{if vertex } x_i \text{ belongs to } C, \\ 1 & \text{if vertex } \bar{x}_i \text{ belongs to } C. \end{cases}$$

Since C is a cover, Z is a solution of the equation ϕ = 0.
For the converse direction, let Z be a solution of ϕ = 0. Let C be the set of all
those vertices $x_i$ for which $z_i = 0$ and all those vertices $\bar{x}_i$ for which $z_i = 1$. Then
C is a cover with |C| = n, and so $G_\varphi$ has the KE property.
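
The correspondence used in this proof, between solutions Z of ϕ = 0 and covers of size n of the matched graph, can be checked by enumeration on the DNF (5.2). The Python sketch below assumes the encoding +i for x_i and -i for its complement, with the terms of (5.2) transcribed as read here; it enumerates only the vertex sets that pick one endpoint of each null edge, which is what every cover of size n must do.

```python
from itertools import product

# DNF (5.2): +i stands for x_i, -i for its complement (transcription is an assumption).
phi = [(-1,), (4,), (1, 2), (-1, -2), (-1, -4), (2, -3), (2, 4), (3, -4)]
n = 4

# Matched graph: the n null edges, plus one edge per quadratic term
# and one loop per linear term.
edges = [(i, -i) for i in range(1, n + 1)] + [(t[0], t[-1]) for t in phi]

def literal_value(lit, z):
    v = z[abs(lit) - 1]
    return v if lit > 0 else 1 - v

def is_solution(z):
    """True when every term of phi vanishes at z."""
    return not any(all(literal_value(l, z) for l in t) for t in phi)

def cover_of(z):
    """The n literals that take the value 0 under z."""
    return {i if z[i - 1] == 0 else -i for i in range(1, n + 1)}

def is_cover(c):
    return all(u in c or v in c for u, v in edges)

points = list(product((0, 1), repeat=n))
solutions = [z for z in points if is_solution(z)]
covers = [z for z in points if is_cover(cover_of(z))]
```

On this instance the two lists coincide and contain the single point (1, 0, 0, 0).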

A variant of the matched graph in which null edges are absent is introduced in
Section 5.9 as a useful tool for dualization.

5.4.3 The implication graph


As an alternative to the matched graph Gϕ , one can associate with the quadratic
DNF ϕ a directed graph Dϕ , called the implication (di)graph of ϕ, and again
characterize the consistency of ϕ = 0 in terms of a simple property of Dϕ . As we
shall see in Section 5.8, the implication graph also turns out to be a convenient
tool for the efficient solution of two other fundamental problems, namely, finding
all prime implicants or computing an irredundant DNF of a quadratic Boolean
function. Moreover, the implication graph will prove useful in obtaining a concise
parametric product form of the solutions of a quadratic Boolean equation and in
getting a fast on-line 2-Sat algorithm (Section 5.7).
The definition of an implication graph naturally arises from the observation that
the relation
$$\xi\eta = 0$$
is equivalent to the implication
$$\xi \Rightarrow \bar{\eta}, \qquad (5.4)$$
as well as to the implication
$$\eta \Rightarrow \bar{\xi}. \qquad (5.5)$$
As in the matched graph $G_\varphi$, the vertices of the implication graph $D_\varphi$ correspond
to the 2n literals $\{x_1, \dots, x_n, \bar{x}_1, \dots, \bar{x}_n\}$. For each quadratic term ξη, in view of (5.4)
and (5.5), there are in $D_\varphi$ two arcs $(\xi, \bar{\eta})$ and $(\eta, \bar{\xi})$ (either arc will be called the
mirror arc of the other one, and the simultaneous presence of these two arcs will
be referred to as the Mirror Property). For each linear term ξ, there is an arc $(\xi, \bar{\xi})$
in $D_\varphi$.
Example 5.3. The implication graph associated with the DNF (5.2) is shown in
Figure 5.3. 
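
The construction of the implication graph can be sketched in a few lines of Python. The encoding is an assumption made for the illustration: a literal is a nonzero integer (+i for x_i, -i for its complement), so that negating the integer complements the literal, and the DNF (5.2) is transcribed as read here.

```python
def implication_graph(dnf):
    """Arc set of the implication digraph of a quadratic DNF."""
    arcs = set()
    for term in dnf:
        if len(term) == 1:
            (a,) = term
            arcs.add((a, -a))          # linear term: arc from the literal to its complement
        else:
            a, b = term
            arcs.add((a, -b))          # a implies not b
            arcs.add((b, -a))          # mirror arc: b implies not a
    return arcs

phi = [(-1,), (4,), (1, 2), (-1, -2), (-1, -4), (2, -3), (2, 4), (3, -4)]
arcs = implication_graph(phi)

# Mirror Property: the arc set is closed under (u, v) -> (not v, not u).
mirror_property = all((-v, -u) in arcs for u, v in arcs)
```

The arc of a linear term is its own mirror image, which is why the Mirror Property holds for the whole digraph.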

Figure 5.3. An implication graph.

The notion of implication graph was introduced by Aspvall, Plass, and
Tarjan [34]. Their representation of linear terms, however, is different from ours:
They add two dummy vertices x0 (representing the constant 0) and x 0 (represent-
ing the constant 1) and, for each linear term ξ , two arcs (x0 , ξ ) and (ξ , x 0 ), again
mirroring each other. The advantages of our representation will become apparent
in Section 5.8, when we discuss the relationship between prime implicants of a
quadratic DNF and transitive closures.
One should notice that, through the implication graph, a quadratic Boolean
equation is represented by an equivalent system of logical implications – a
deductive knowledge base in the terminology of artificial intelligence (see
Nilsson [713]).
The most important property of the implication graph Dϕ relates its strong com-
ponents to the consistency of the quadratic Boolean equation ϕ = 0. According to
definitions in Appendix A, a strongly connected component (or, briefly, a strong
component) of Dϕ = (N, A) is any maximal subset C of vertices with the property
that any two vertices of C lie on some closed directed walk consisting only of
vertices of C. The strong components of Dϕ form a partition of its vertex-set N ,
and they can be computed in O(m) time, where m = |A| (Tarjan [858]). By shrink-
ing each strong component into a single vertex, one obtains an acyclic digraph D̂ϕ ,
the condensed implication graph of ϕ. Notice that, in view of the Mirror Property,
the strong components of Dϕ come in pairs: If C is a strong component, then the
set $\bar{C}$ consisting of the negations of all literals in C also is a strong component.
Aspvall, Plass, and Tarjan [34] proved:
Theorem 5.4. The quadratic Boolean equation ϕ = 0 is consistent if and only if
no strong component of $D_\varphi$ contains both a literal ξ and its complement $\bar{\xi}$.
To prove this theorem, let us state a simple, but useful, result.
Lemma 5.3. An assignment of binary values to the vertices of Dϕ corresponds to
a solution of the equation ϕ = 0 if and only if
(i) for all i, vertices $x_i$ and $\bar{x}_i$ receive complementary values, and
(ii) no arc (and hence no directed path) goes from a 1-vertex (that is, a vertex
with value 1) to a 0-vertex (that is, a vertex with value 0).

Proof. This equivalence follows directly from the construction of the implication
graph. 

We now turn to the proof of Theorem 5.4.


Proof. First, assume that an assignment of binary values to the vertices of Dϕ
corresponds to a solution of ϕ = 0. Suppose also that the literals ξ and ξ belong
to the same strong component. This means that

ξ ⇒ ξ and ξ ⇒ ξ .

Therefore, the literal ξ must take the values 0 and 1 at the same time, a contradiction.
Hence ϕ = 0 has no solution.
For the converse direction, let us show that if no strong component of Dϕ
contains both a literal and its complement, then ϕ = 0 has a solution. The proof is
by induction on the number s of strong components of Dϕ (which is always even).
If s = 2 and C is a strong component, then the other strong component is C.
Since C and C are different strong components, we may assume that all the arcs
between C and C (if any) go from a vertex of C to a vertex of C. Now, assign the
value 0 to all literals in C and the value 1 to those in C. Properties (i) and (ii) of
Lemma 5.3 are satisfied and thus the assignment defines a solution of ϕ = 0.
Assume now that the statement is true whenever the implication graph has at
most s − 2 strong components (s ≥ 4), and let Dϕ have s strong components.
Consider the acyclic condensed digraph D̂ϕ obtained from Dϕ upon contraction
of the strong components of Dϕ .
Let C be the strong component of Dϕ corresponding to a source in D̂ϕ . Then,
by the Mirror Property, C̄ is a sink of D̂ϕ . By the definitions of source and sink, no
arc of Dϕ goes into C and no arc leaves C̄. Remove both C and C̄ from Dϕ . Let D
be the resulting subdigraph of Dϕ . The digraph D has s − 2 strong components.
Hence the statement of Theorem 5.4 holds for D by the inductive hypothesis, and
therefore there is an assignment of binary values to the vertices of D satisfying
(i) and (ii) of Lemma 5.3. Such an assignment can be extended to Dϕ by assigning
the value 0 to all literals in C and the value 1 to all literals in C̄. It is immediate to
verify that the extended assignment still satisfies (i) and (ii) of Lemma 5.3 in the
digraph Dϕ . Hence, it yields a solution of ϕ = 0. 
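The test of Theorem 5.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a literal is encoded as a pair (variable, is_uncomplemented), the implication graph is assumed to contain the arcs (ξ, η̄) and (η, ξ̄) for each term ξη, and the names `neg` and `solve_quadratic` are hypothetical. The strong components are found with Kosaraju's two-pass search rather than by the source/sink peeling used in the proof; the resulting assignment still gives the value 1 to the literal whose component lies closer to the sinks.

```python
from collections import defaultdict

def neg(lit):
    """Complement of a literal encoded as (variable, is_uncomplemented)."""
    var, positive = lit
    return (var, not positive)

def solve_quadratic(terms):
    """Return {variable: 0/1} solving phi = 0, or None if phi = 0 is inconsistent.

    `terms` lists the terms of a quadratic DNF; each term is a tuple of one
    or two literals.
    """
    succ, pred, vertices = defaultdict(set), defaultdict(set), set()
    for term in terms:
        a, b = term if len(term) == 2 else (term[0], term[0])
        # Assumed arcs: a*b = 0 gives a => not b and b => not a.
        for u, v in ((a, neg(b)), (b, neg(a))):
            succ[u].add(v)
            pred[v].add(u)
        vertices.update((a, b, neg(a), neg(b)))

    # Kosaraju, pass 1: record vertices by increasing DFS finishing time.
    order, seen = [], set()
    def visit(u):
        seen.add(u)
        for v in succ[u]:
            if v not in seen:
                visit(v)
        order.append(u)
    for u in vertices:
        if u not in seen:
            visit(u)

    # Pass 2: sweep the reversed digraph; components come out in a
    # topological order of the condensed digraph (sources first).
    comp = {}
    def collect(u, label):
        comp[u] = label
        for v in pred[u]:
            if v not in comp:
                collect(v, label)
    label = 0
    for u in reversed(order):
        if u not in comp:
            collect(u, label)
            label += 1

    solution = {}
    for var in {v for v, _ in vertices}:
        x, not_x = (var, True), (var, False)
        if comp[x] == comp[not_x]:
            return None  # Theorem 5.4: a literal and its complement share a component
        # The literal whose component comes later (closer to the sinks) gets 1.
        solution[var] = 1 if comp[x] > comp[not_x] else 0
    return solution
```

The recursion is adequate for small instances; a production version would use an iterative search.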

The implication graph enables us not only to determine the consistency of the
corresponding quadratic Boolean equation but also, in case of consistency, to infer
further properties of its solutions.
We say that a literal ξ is forced to the value α (for α ∈ {0, 1}) if either the
quadratic Boolean equation ϕ = 0 is inconsistent, or if ξ takes the value α in all
its solutions.
Theorem 5.5. Suppose that the equation ϕ = 0 is consistent. Then, the literal ξ is
forced to 0 if and only if there exists a directed path from ξ to ξ̄ in Dϕ .

Proof. If there is a directed path from ξ to ξ̄ and ξ = 1 in some solution, then
ξ̄ = 0 in that solution, and this contradicts part (ii) of Lemma 5.3.
For the converse direction, suppose that there is no directed path from ξ to ξ̄,
and let X be any solution of ϕ = 0. If ξ = 1 in X, then we are done. Else, let us
modify X as follows: Assign to ξ and to all its successors the value 1; assign to ξ̄
and to all its ancestors the value 0. Let X′ be the resulting assignment. First of all,
X′ is well defined: No conflicting values may arise, since no ancestor of ξ̄ can be
a successor of ξ (as this would yield a directed path from ξ to ξ̄).
Let us show that X′ is a solution. If not, by Lemma 5.3 (ii), there is a path
from a 1-vertex α to a 0-vertex β. Since this path did not exist for X, either α is
a successor of ξ or β is an ancestor of ξ̄. By symmetry, it is enough to consider
the former case. But, if α is a successor of ξ , so is β, and hence β should take the
value 1 in X′, which is a contradiction.
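Theorem 5.5 turns the detection of forced literals into a reachability question. The following Python sketch illustrates this under the same assumptions as before: literals are pairs (variable, is_uncomplemented), each term ξη is assumed to contribute the arcs (ξ, η̄) and (η, ξ̄), and the function names are hypothetical. The equation is assumed to be consistent, as the theorem requires.

```python
from collections import defaultdict

def neg(lit):
    """Complement of a literal encoded as (variable, is_uncomplemented)."""
    var, positive = lit
    return (var, not positive)

def implication_arcs(terms):
    """Successor lists of the (assumed) implication graph of a quadratic DNF."""
    succ = defaultdict(set)
    for term in terms:
        a, b = term if len(term) == 2 else (term[0], term[0])
        succ[a].add(neg(b))
        succ[b].add(neg(a))
    return succ

def has_path(succ, source, target):
    """Depth-first search for a directed path with at least one arc."""
    stack, seen = [source], {source}
    while stack:
        u = stack.pop()
        for v in succ.get(u, ()):
            if v == target:
                return True
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

def forced_to_zero(terms, lit):
    """Theorem 5.5: lit is forced to 0 iff its complement is reachable from it."""
    return has_path(implication_arcs(terms), lit, neg(lit))
```

A literal is forced to 1 exactly when its complement is forced to 0, so the same routine covers both cases.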

Theorem 5.6. Let ξ be a literal not forced to 0, and let η be a literal not forced
to 1. The relation ξ ≤ η holds in all solutions of the quadratic Boolean equation
ϕ = 0 if and only if there is a directed path from ξ to η in Dϕ .

Proof. The “if” part is obvious after part (ii) of Lemma 5.3. Let us prove the “only
if” part. Assume there is no directed path from ξ to η, and let us prove that, if there
is a solution at all, then there is also a solution in which ξ = 1 and η = 0.
Consider an arbitrary solution X. By part (ii) of Lemma 5.3 there is no directed
path from any 1-vertex to a 0-vertex. If in X we have ξ = 1 and η = 0, we are
done. Otherwise, let us modify X as follows: Assign the value 1 to ξ and the value
0 to η. Also, assign the value 1 to all successors of ξ and 0 to all ancestors of η.
Taking into account the Mirror Property, assign the value 0 to all ancestors of ξ̄
and the value 1 to all successors of η̄. Leave the remaining values unchanged. We
claim that the assignment of values X′ obtained in this way is also a solution of
ϕ = 0.
First of all, X′ is well-defined: No conflicting values may arise, since no successor
of ξ may be an ancestor of η (as this would yield a directed path from ξ to
η, against our assumption).
Furthermore, no successor of ξ can be also an ancestor of ξ̄, else there would
be a directed path from ξ to ξ̄, and ξ would be forced to 0. Similarly, no ancestor
of η can be a successor of η̄. Suppose that in X′ there is a directed path from a
1-vertex α to a 0-vertex β. Then α is a successor either of ξ or η̄. But then, so is
β; hence β should take the value 1 in X′, a contradiction.

Two literals ξ and η are said to be twins if ξ = η in every solution of the quadratic
Boolean equation ϕ = 0.

Corollary 5.1. Suppose that the two literals ξ and η are not forced. Then, they
are twins if and only if they are in the same strong component of the implication
graph Dϕ .

Proof. The equality ξ = η is equivalent to the pair of relations ξ ≤ η, η ≤ ξ . The
statement then follows from Theorem 5.6.

5.4.4 Conflict codes and quadratic graphs


In this section, we describe yet another way of associating a graph with a DNF.
Let us say that two elementary conjunctions conflict if there is a variable that
appears complemented in one of them and uncomplemented in the other one.
Given an arbitrary DNF ϕ, the conflict graph Cϕ of ϕ is the undirected graph
whose vertices are the terms of ϕ, and whose edges are the pairs of conflicting
terms (see Hammer [437, 465]).
Conversely, given a graph G, a (conflict) code of G is an assignment of ele-
mentary Boolean conjunctions to the vertices of G such that, if ϕ is the disjunction
of these conjunctions, then G = Cϕ .

Example 5.4. Figure 5.4 shows a graph G and two of its conflict codes. 

Let us introduce some additional terminology. Consider an arbitrary DNF ϕ
and its conflict graph Cϕ = (V , E). For a variable x of ϕ, we call color of x the set
of all edges (T , T′) ∈ E such that x is complemented in T and uncomplemented in
T′, or vice-versa. Clearly, each color spans a (possibly empty, and not necessarily
induced) complete bipartite subgraph of Cϕ . Moreover, the union of all colors
corresponding to the variables of ϕ covers the edge-set of Cϕ .
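As a small illustration of these definitions, the following Python sketch builds the conflict graph of a DNF together with its colors. The encoding is an assumption made for the example: each term is a dictionary mapping a variable to True (uncomplemented) or False (complemented), vertices are term indices, and the three-term DNF used in the usage note is hypothetical.

```python
from itertools import combinations

def conflict_graph(terms):
    """Return (edges, colors) for a DNF given as a list of terms.

    edges  : set of index pairs (s, t), s < t, of conflicting terms
    colors : dict mapping each variable to the set of edges it "colors"
    """
    edges = set()
    colors = {var: set() for term in terms for var in term}
    for (s, T), (t, U) in combinations(enumerate(terms), 2):
        for var in T.keys() & U.keys():
            if T[var] != U[var]:      # complemented in one term, not in the other
                edges.add((s, t))
                colors[var].add((s, t))
    return edges, colors
```

For the DNF xy ∨ x̄z ∨ ȳz̄, encoded as `[{"x": True, "y": True}, {"x": False, "z": True}, {"y": False, "z": False}]`, the conflict graph is a triangle and each of the three colors consists of a single edge.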
Conversely, for an arbitrary graph G = (V , E), any collection of complete
bipartite subgraphs that covers E defines a conflict code of G. It easily follows
from this observation that every graph has at least one, and generally many distinct
conflict codes, as illustrated by Example 5.4.
As pointed out by Hammer [438, 465]; Benzaken, Hammer, and Simeone [68,
69]; and Hammer and Simeone [463], the non-uniqueness of a conflict code of
a graph can be exploited in order to preprocess and simplify weighted maxi-
mum stable set problems in graphs, weighted maximum satisfiability problems
(Max Sat), and unconstrained nonlinear binary optimization problems; see
Section 13.4.4.
Notice that the DNF corresponding to the conflict code of the graph in
Figure 5.4(b) is quadratic, whereas the one corresponding to Figure 5.4(a) is
not. Naturally, one may ask which graphs admit a quadratic code. Such graphs
are called quadratic by Benzaken, Hammer, and Simeone [68, 69]; an equivalent
graph-theoretic definition is that a graph is quadratic if and only if its edge-set can
be covered by complete bipartite graphs (corresponding to colors) so that at most
two different colors meet at each vertex. If, furthermore, the colors can be chosen
to be stars, then the graph is called bistellar (Hammer and Simeone [461]).
Figure 5.4. Two conflict codes of the same graph.

Since two terms may have more than one conflicting variable, the colors generally
form a covering, but not necessarily a partition, of the edge-set of Cϕ . However,
they do form a partition when the DNF ϕ is both quadratic and primitive, that is,
when two different terms of ϕ do not involve exactly the same set of variables.
A quadratic graph is called primitive, Horn, or mixed if it admits a primitive,
Horn, or mixed quadratic code, respectively.
The complexity of recognizing quadratic graphs appears to be still an open
question. However, the following negative result was established by Crama and
Hammer [230].
Theorem 5.7. Recognizing quadratic primitive graphs is NP-complete.
Actually, they proved the following stronger result.
Theorem 5.8. Recognizing whether the edge-set of a bipartite graph can be partitioned
into colors, so that all colors are either stars or squares (that is, C4's),
and at most two colors meet at each vertex, is an NP-complete problem.
Benzaken, Hammer, and Simeone [69] remarked that quadratic primitive mixed
graphs are precisely the adjoints of directed graphs (where the adjoint of a digraph
D is the undirected graph whose vertices are the arcs of D, and where two vertices
u and v are adjacent if and only if the head of v coincides with the tail of u).
Chvátal and Ebenegger [200] proved:

Theorem 5.9. Recognizing quadratic primitive mixed graphs is NP-complete.

On the positive side, Benzaken, Boyd, Hammer, and Simeone [65] obtained
a characterization of quadratic primitive Horn graphs, and Hammer and Sime-
one [461] characterized bistellar graphs. In the statement of Theorem 5.10
hereunder, the word “configuration” refers to a family of digraphs on a given
set S of vertices. A configuration is defined by two disjoint subsets A, B ⊆ S × S.
The meaning is that, in every digraph of the family, the arcs in A must always be
present, the arcs in B must be absent, and all the remaining arcs may be either
present or absent.

Theorem 5.10. A graph G is quadratic primitive Horn if and only if it admits an
edge-orientation that avoids the ten special configurations of Figure 5.5.

Theorem 5.11. A graph G is bistellar if and only if each connected component of
the subgraph of G induced by vertices of degree at least 3 is a 1-tree, that is, it is
either a tree or it becomes a tree after deletion of one edge.

5.5 Reducibility of combinatorial problems to quadratic equations

5.5.1 Introduction
As noted earlier, the importance of quadratic Boolean functions is substantiated by
the fact that many combinatorial decision problems can be efficiently reduced to
quadratic equations. A partial list, to be further discussed in this section, includes
checking bipartiteness of a graph, balance in signed graphs, recognition of split
graphs, recognition of the Kőnig-Egerváry property, and single-bend drawings
of electronic circuits. For some of these problems, the reduction can even be
performed in linear time. Conversely, some of them also admit a linear time reduc-
tion from quadratic Boolean equations, which makes the former equivalent, in a
well-defined sense, to the latter.
Additional applications of quadratic Boolean functions and equations can be
found in papers by Waltz [895] (computer vision); Even, Itai, and Shamir [318]
(timetabling); Hansen and Jaumard [467] (minimum sum-of-diameters clustering);
Boros, Hammer, Minoux, and Rader [132] (VLSI design); Eskin, Halperin,
and Karp [316] (phylogenetic trees); Miyashiro and Matsui [688] (selection of
home and away games in round-robin tournaments); Wang et al. [898] (routing
on the internet); and so forth. In Section 6.10.1, we present yet another application
of quadratic Boolean equations, namely, the recognition of renamable Horn
functions.
Figure 5.5. The ten forbidden configurations (C1 to C10) for quadratic primitive Horn graphs.
Continuous arcs must be present; dashed ones must be absent.

5.5.2 Bipartite graphs


Recall that an undirected graph G = (V , E) is bipartite if its vertex-set V can be
partitioned into two subsets V1 and V2 such that every edge of G has exactly one
endpoint in V1 and the other endpoint in V2 . Introduce binary variables xi , i ∈ V ,
where xi = 1 or 0 according to whether vertex i belongs to V1 or to V2 . Then, the
graph G is bipartite if and only if the quadratic Boolean equation

$$\bigvee_{(i,j) \in E} (x_i x_j \vee \bar{x}_i \bar{x}_j) = 0 \tag{5.6}$$

is consistent.
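The reduction can be sketched as follows in Python. The encoding is an assumption of the example: a literal is a pair (vertex, is_uncomplemented), and each edge (i, j) contributes the two terms xi xj and x̄i x̄j of equation (5.6). Consistency is checked here by plain enumeration, which is only meant for tiny graphs; the function names are hypothetical.

```python
from itertools import product

def bipartite_terms(edges):
    """Terms of the quadratic DNF of (5.6): both endpoints on the same side."""
    terms = []
    for i, j in edges:
        terms.append(((i, True), (j, True)))     # x_i x_j
        terms.append(((i, False), (j, False)))   # complement of x_i times complement of x_j
    return terms

def find_solution(terms):
    """Brute force: return an assignment making every term 0, or None."""
    variables = sorted({v for t in terms for v, _ in t})
    for values in product((0, 1), repeat=len(variables)):
        x = dict(zip(variables, values))
        if not any(all(x[v] == int(pos) for v, pos in t) for t in terms):
            return x
    return None
```

For the 4-cycle `[(1, 2), (2, 3), (3, 4), (4, 1)]` a solution exists and its 1-vertices form V1; for a triangle the function returns None.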

5.5.3 Balance in signed graphs


A signed graph is an undirected graph G = (V , E), together with a partition of E
into a set P of “positive” edges and a set N of “negative” edges. A signed graph is
balanced if the number of negative edges along every circuit is even. Harary [475]
showed that G is balanced if and only if V can be partitioned into two sets V1 and
V2 , so that each negative edge has exactly one endpoint in V1 and the other in V2 ,
while each positive edge has both of its endpoints either in V1 or in V2 . (Note that
when E = N, G is balanced if and only if it is bipartite.)
For a signed graph G, let us associate a binary variable xi with each vertex i, as
in the previous example. Then, as pointed out by Hammer [436], G is balanced if
and only if the quadratic Boolean equation

$$\Bigl( \bigvee_{(i,j) \in P} (x_i \bar{x}_j \vee \bar{x}_i x_j) \Bigr) \vee \Bigl( \bigvee_{(i,j) \in N} (x_i x_j \vee \bar{x}_i \bar{x}_j) \Bigr) = 0 \tag{5.7}$$

is consistent.
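A minimal Python sketch of this reduction, under the same assumed encoding (a literal is a pair (vertex, is_uncomplemented)) and with hypothetical function names: positive edges forbid endpoints on different sides, negative edges forbid endpoints on the same side, exactly as in (5.7). The enumeration-based consistency check is only suitable for very small graphs.

```python
from itertools import product

def balance_terms(positive, negative):
    """Terms of the quadratic DNF of (5.7) for a signed graph."""
    terms = []
    for i, j in positive:                         # endpoints must be on the same side
        terms.append(((i, True), (j, False)))
        terms.append(((i, False), (j, True)))
    for i, j in negative:                         # endpoints must be on different sides
        terms.append(((i, True), (j, True)))
        terms.append(((i, False), (j, False)))
    return terms

def find_solution(terms):
    """Brute force: return an assignment making every term 0, or None."""
    variables = sorted({v for t in terms for v, _ in t})
    for values in product((0, 1), repeat=len(variables)):
        x = dict(zip(variables, values))
        if not any(all(x[v] == int(pos) for v, pos in t) for t in terms):
            return x
    return None
```

A triangle with three negative edges has an odd number of negative edges on its only circuit, so the call returns None; replacing one of these edges by a positive edge makes the equation consistent.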

5.5.4 Split graphs


Foldes and Hammer [335] introduced the following definition: A graph G = (V , E)
is split if its vertex-set can be partitioned into a clique C and a stable set I ; that is,
G is split if V can be partitioned into two (possibly empty) subsets C and I such
that
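(i) if i, j ∈ C and i ≠ j , then (i, j ) ∈ E;
(ii) if i, j ∈ I and i ≠ j , then (i, j ) ∉ E.
Define binary variables xj , j ∈ V , with the interpretation that

$$x_j = \begin{cases} 1 & \text{if } j \in C, \\ 0 & \text{if } j \in I. \end{cases}$$

Then, conditions (i) and (ii) hold if and only if the quadratic Boolean equation

$$\Bigl( \bigvee_{(i,j) \in \bar{E}} x_i x_j \Bigr) \vee \Bigl( \bigvee_{(i,j) \in E} \bar{x}_i \bar{x}_j \Bigr) = 0,$$

where Ē is the edge-set of the complement of G,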

is consistent.
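The split graph reduction can be sketched in Python as follows. The encoding is assumed for the example (a literal is a pair (vertex, is_uncomplemented), with value 1 meaning "in the clique C"), the function names are hypothetical, and consistency is tested by enumeration, which is adequate only for tiny graphs.

```python
from itertools import combinations, product

def split_terms(vertices, edges):
    """One quadratic term per pair of distinct vertices."""
    edge_set = {frozenset(e) for e in edges}
    terms = []
    for i, j in combinations(vertices, 2):
        if frozenset((i, j)) in edge_set:
            terms.append(((i, False), (j, False)))   # an edge cannot lie inside I
        else:
            terms.append(((i, True), (j, True)))     # a non-edge cannot lie inside C
    return terms

def find_solution(terms):
    """Brute force: return an assignment making every term 0, or None."""
    variables = sorted({v for t in terms for v, _ in t})
    for values in product((0, 1), repeat=len(variables)):
        x = dict(zip(variables, values))
        if not any(all(x[v] == int(pos) for v, pos in t) for t in terms):
            return x
    return None
```

On the path 1-2-3-4 the only solution puts vertices 2 and 3 in the clique; on the 4-cycle no solution exists, in agreement with the fact that C4 is not a split graph.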
The related class of bisplit graphs has been investigated by Brandstädt, Hammer,
Le and Lozin [151]. Their recognition turns out again to be reducible to a quadratic
Boolean equation.

5.5.5 Forbidden-color graph bipartition


Gavril [375] has studied the following decision problem in graph theory (presented
here in a slightly different, but equivalent, form).

Forbidden-color graph bipartition


Instance: A graph G = (V , E), together with an edge-coloring of G (that is, a
partition of E) consisting of at least two colors, say “red” and “blue,” and possibly
other colors.
Question: Is there a partition of V into two (possibly empty) subsets U and W
such that
(i) no red edge is entirely contained in U ;
(ii) no blue edge is entirely contained in W ?
We use the shorthand FCGB to denote the foregoing problem. Gavril [375]
showed that several combinatorial decision problems are polynomial-time
reducible (and, in fact, log-space reducible) to FCGB. For example, the recogni-
tion of split graphs is a special case of FCGB on the complete graph Kn (n = |V |):
Just color “red” the edges of G, and “blue” those of the complement Ḡ.
Furthermore, Gavril showed that quadratic equations and FCGB are mutu-
ally reducible in linear time. Here we show that FCGB is reducible to quadratic
equations. In fact, let R and B be the sets of red and blue edges of G, respectively.
Introduce binary variables xj , j ∈ V , such that

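$$x_j = \begin{cases} 1 & \text{if } j \in U, \\ 0 & \text{if } j \in W. \end{cases}$$

Then, the answer to FCGB is Yes if and only if the quadratic Boolean equation

$$\Bigl( \bigvee_{(i,j) \in R} x_i x_j \Bigr) \vee \Bigl( \bigvee_{(i,j) \in B} \bar{x}_i \bar{x}_j \Bigr) = 0$$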

is consistent.
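A minimal Python sketch of the reduction from FCGB, with an assumed encoding (a literal is a pair (vertex, is_uncomplemented), value 1 meaning "in U") and hypothetical function names. Edges of other colors impose no constraint and are simply not passed to the function. The enumeration is intended for small instances only.

```python
from itertools import product

def fcgb_terms(red, blue):
    """Red edges may not lie inside U (x = 1); blue edges may not lie inside W (x = 0)."""
    terms = [((i, True), (j, True)) for i, j in red]
    terms += [((i, False), (j, False)) for i, j in blue]
    return terms

def find_solution(terms):
    """Brute force: return an assignment making every term 0, or None."""
    variables = sorted({v for t in terms for v, _ in t})
    for values in product((0, 1), repeat=len(variables)):
        x = dict(zip(variables, values))
        if not any(all(x[v] == int(pos) for v, pos in t) for t in terms):
            return x
    return None
```

Coloring the edges of the 5-cycle red and those of its complement blue gives an instance whose answer is No, which reflects the special case mentioned above: C5 is not a split graph.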

5.5.6 Totally unimodular matrices with two nonzero entries per column
Definition 5.10. A matrix is totally unimodular (TU) if all its square submatrices
have determinant 0, 1 or −1.
Clearly, all entries of a TU matrix must be 0, 1, or −1. TU matrices are very
important in integer programming in view of the following classical result of
Hoffman and Kruskal [495].
Theorem 5.12. Let A be an m × n TU matrix, and let b ∈ Z^m be an arbitrary
integral m-vector. Then, each extreme point of the polyhedron

P = {x ∈ R^n : Ax ≤ b}

is integral.
Proof. See Hoffman and Kruskal [495]. 

Theorem 5.12 and the Fundamental Theorem of Linear Programming (see e.g.,
[199, 812]) imply the following corollary:

Corollary 5.2. Let A be an m × n TU matrix, let c ∈ R^n , and let b ∈ Z^m be an
integral m-vector. If the linear program

$$\begin{array}{ll} \text{maximize} & cx \\ \text{subject to} & Ax \le b, \; x \in \mathbb{R}^n \end{array} \tag{5.8}$$

has a finite optimum, then it has an integral optimal solution.

Hence the integer linear program obtained from (5.8) by the addition of
integrality constraints on x can be solved by ordinary linear programming.
A complete characterization of TU matrices was obtained by Seymour [823];
a polynomial-time recognition algorithm based on this result can be found in
Schrijver [812].
For the special case of matrices with two nonzero entries per column, how-
ever, Heller and Tompkins [483] gave more efficient characterizations of totally
unimodular matrices.

Theorem 5.13. A necessary and sufficient condition for a (−1, 0, 1)-matrix A with
two nonzero entries per column to be totally unimodular is that its set of rows can
be partitioned into two (possibly empty) subsets R1 and R2 such that, for each
column j :

(i) if the two nonzero entries of column j are different, then the rows containing
them both belong to R1 , or they both belong to R2 ;
(ii) if the two nonzero entries of column j are equal, then one of these rows belongs
to R1 , and the other one belongs to R2 .

Clearly, these conditions can be expressed in terms of consistency of a quadratic
Boolean equation with two quadratic terms per column.

Example 5.5. The matrix

$$A = \begin{pmatrix} 0 & 1 & 1 & -1 \\ -1 & 0 & -1 & 0 \\ 1 & 1 & 0 & -1 \end{pmatrix}$$

is not TU, since the associated quadratic Boolean equation

$$\bar{x}_2 x_3 \vee x_2 \bar{x}_3 \vee x_1 x_3 \vee \bar{x}_1 \bar{x}_3 \vee x_1 \bar{x}_2 \vee \bar{x}_1 x_2 \vee x_1 x_3 \vee \bar{x}_1 \bar{x}_3 = 0$$

has no solution (the submatrix formed by the first three columns has
determinant −2).
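The passage from the conditions of Theorem 5.13 to a quadratic equation can be sketched in Python. Assumptions of the example: rows are numbered from 1, variable i takes the value 1 when row i is placed in R1, a literal is a pair (row, is_uncomplemented), and the function names are hypothetical. Consistency is tested by enumeration, which suffices for a 3-row example.

```python
from itertools import product

def heller_tompkins_terms(A):
    """Two quadratic terms per column of a matrix with two nonzeros per column."""
    terms = []
    for col in range(len(A[0])):
        nonzero = [r for r in range(len(A)) if A[r][col] != 0]
        if len(nonzero) != 2:
            raise ValueError("each column must have exactly two nonzero entries")
        p, q = nonzero[0] + 1, nonzero[1] + 1            # 1-based row numbers
        if A[p - 1][col] != A[q - 1][col]:
            # different entries: both rows in the same class
            terms += [((p, True), (q, False)), ((p, False), (q, True))]
        else:
            # equal entries: the two rows in different classes
            terms += [((p, True), (q, True)), ((p, False), (q, False))]
    return terms

def is_consistent(terms):
    """Brute force: does some assignment make every term equal to 0?"""
    variables = sorted({v for t in terms for v, _ in t})
    for values in product((0, 1), repeat=len(variables)):
        x = dict(zip(variables, values))
        if not any(all(x[v] == int(pos) for v, pos in t) for t in terms):
            return True
    return False
```

Applied to the matrix of Example 5.5, `heller_tompkins_terms` produces the eight terms displayed above and `is_consistent` returns False.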

5.5.7 The Kőnig-Egerváry property for graphs


In Section 5.4.2, we have proved that a quadratic Boolean equation ϕ = 0 is
consistent if and only if the matched graph Gϕ associated with ϕ has the
Kőnig-Egerváry property. Here we show that, conversely, checking the
Kőnig-Egerváry property for graphs can be reduced to checking the consistency
of a quadratic Boolean equation.
Let G = (V , E) be an arbitrary graph, and let M be a maximum matching
of G. Note that M can be found in O(|V|^2.5) time (see, e.g., Papadimitriou and
Steiglitz [726]). Let F be the set of all free vertices, that is, the set of vertices that
are not endpoints of any edge in M. For each edge ei ∈ M, let us associate the
literal xi with one of the endpoints of ei , and the literal x̄i with the other endpoint;
moreover, we associate a literal xj with each j ∈ F .
Finally, denoting by ξ(v) the literal associated with vertex v ∈ V , we set

$$\phi = \Bigl( \bigvee_{(u,v) \in E \setminus M} \xi(u)\,\xi(v) \Bigr) \vee \Bigl( \bigvee_{w \in F} \bar{\xi}(w) \Bigr). \tag{5.9}$$

Simeone [834] proved:


Theorem 5.14. The graph G has the Kőnig-Egerváry property if and only if the
quadratic Boolean equation ϕ = 0 is consistent, where ϕ is defined by (5.9).
Before proving the theorem, let us introduce the notion of “rake,” and let us state
two related results. A pair (C, M), where C ⊆ V is a cover and M is a matching,
is called a rake if every v ∈ C is an endpoint of exactly one edge of M, and
every e ∈ M has exactly one endpoint in C. Note that if (C, M) is a rake, then
C necessarily is a minimum cover and M necessarily is a maximum matching.
The next two results are due to Klee (as reported in [604]) and to Gavril [374],
respectively.
Theorem 5.15. A graph has the Kőnig-Egerváry property if and only if it has a
rake.
Theorem 5.16. A graph has the Kőnig-Egerváry property if and only if, for every
minimum cover C and every maximum matching M, the pair (C, M) is a rake.
Now we can prove Theorem 5.14.
Proof. Assume that the Boolean equation ϕ = 0 is consistent, and let X∗ be a
solution. Let I be the set of all vertices v ∈ V such that the associated literal ξ(v)
takes value 1 in X∗ . The set I must be stable; hence, C = V \I is a cover. Moreover,
all vertices in C must be matched because F ⊆ I . On the other hand, every edge
of M must have exactly one endpoint in C and one in I . Hence, (C, M) is a rake,
and by the “if” part of Theorem 5.15, G has the KE property.
Conversely, assume that the KE property holds for G. If C is an arbitrary minimum
cover of G, then (C, M) must be a rake by the “only if” part of Theorem 5.16.
The set I = V \C is stable and must include F because all vertices in C are matched.
Hence, if we assign the value 0 or 1 to ξ(v) according to whether v ∈ C or v ∈ I ,
we obtain a solution of the equation ϕ = 0.

Figure 5.6. Graph for Example 5.6 (vertices labeled x1 , x̄1 , x2 , x̄2 , x3 , x4 ).

Example 5.6. Consider the graph of Figure 5.6, where the matching is represented
by thick edges. The associated Boolean equation is
ϕ ≡ x1 x2 ∨ x̄1 x2 ∨ x̄1 x̄2 ∨ x̄2 x3 ∨ x̄2 x4 ∨ x̄3 ∨ x̄4 = 0.
It is easy to see that this equation is inconsistent, and that the graph does not have
the KE property. 
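The construction (5.9) can be sketched in Python as follows. Assumptions of the example: each vertex is identified with its literal, encoded as a pair (variable, is_uncomplemented); the list of non-matching edges used in the usage note is read off the terms of the equation of Example 5.6, since the figure itself is not reproduced here; the function names are hypothetical. Consistency is tested by enumeration.

```python
from itertools import product

def ke_terms(nonmatching_edges, free_vertices):
    """Terms of (5.9): one quadratic term per edge outside the matching,
    one complemented linear term per free vertex."""
    terms = [(u, v) for u, v in nonmatching_edges]
    terms += [((var, not positive),) for var, positive in free_vertices]
    return terms

def is_consistent(terms):
    """Brute force: does some assignment make every term equal to 0?"""
    variables = sorted({v for t in terms for v, _ in t})
    for values in product((0, 1), repeat=len(variables)):
        x = dict(zip(variables, values))
        if not any(all(x[v] == int(pos) for v, pos in t) for t in terms):
            return True
    return False

# Example 5.6 (edge list inferred from the equation, an assumption):
X1, NX1, X2, NX2, X3, X4 = (1, True), (1, False), (2, True), (2, False), (3, True), (4, True)
example_edges = [(X1, X2), (NX1, X2), (NX1, NX2), (NX2, X3), (NX2, X4)]
example_free = [X3, X4]
```

With these data, `is_consistent(ke_terms(example_edges, example_free))` is False, matching the conclusion of the example. For a path on three vertices with one matched edge and one free vertex, the equation is consistent.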

5.5.8 Single-bend wiring


In the design of microwave integrated circuits, some prescribed pairs of pins with
known locations on a rectangular board are to be connected. When the conductors
are microstrip lines, it is desirable that each connection consist only of a horizontal
segment and of a vertical one; due to this single-bend wiring requirement, only
two connections, called upper and lower, respectively, are allowed for any given
pair of pins (see Figure 5.7).
For technological reasons, we want to find, if there is one, a set of pairwise non-
crossing connections for the prescribed pairs. (We may assume that the pins are in
“general position,” that is, no two of them are aligned along the same horizontal
or vertical line. This assumption simplifies the discussion.)
Let us associate a Boolean variable x with every pair of pins to be con-
nected, where x = 1 or 0, respectively, depending on whether an upper or a lower
connection is chosen for the pair.

Figure 5.7. Single bend wiring: (a) upper connection; (b) lower connection.

Figure 5.8. Some basic patterns in the single bend wiring problem: (a) infeasibility; (b) forcing to “upper”; (c) “upper” implies “upper”.

It is easy to see that the relative positions of two given pairs of pins may
induce some constraints on the connections between each pair, and hence, on
the corresponding Boolean variables. Figure 5.8 shows some of the patterns that
may occur. In case (a), no matter whether connections AA′ and BB′ are upper or
lower, they must cross each other, giving rise to an infeasible situation. In case (b),
regardless of whether connection BB′ is upper or lower, connection AA′ is forced to
be upper, else it would cross BB′. In case (c), if connection BB′ is upper, then also
AA′ must be upper, else it would cross BB′. In every case, each constraint involves
only two connections; hence it can be represented by quadratic conditions on the
corresponding Boolean variables. Therefore, checking the existence of a feasible
noncrossing wiring can be reduced to the solution of a quadratic Boolean equation;
see Raghavan, Cohoon, and Sahni [773] and Garrido, Márquez, Morgana, and
Portillo [373] for details and extensions.
In an interactive computer-aided design (CAD) environment, one usually places
one component at a time and tries to connect it to the others. The addition of such
a component gives rise to new terms in the quadratic equation. This motivates the
investigation of an on-line model; see Section 5.7.4.

5.5.9 Max-quadratic functions and VLSI design


Recall from Section 1.12.2 that a pseudo-Boolean function is a real-valued function
of Boolean variables. Boros, Hammer, Minoux, and Rader [132] define a max-
quadratic function as any pseudo-Boolean function that is the pointwise maximum
of a finite set of (quadratic) pseudo-Boolean functions of two variables.
Formally, let F be a finite family of (possibly repeated) ordered pairs p =
(p1 , p2 ) of elements in {1, 2, . . . , n}. Then a max-quadratic function has the form

$$g(x_1, x_2, \ldots, x_n) = \max_{p \in F} g_p(x_{p_1}, x_{p_2}),$$

where

$$g_p(x_{p_1}, x_{p_2}) = a_p x_{p_1} x_{p_2} + b_p x_{p_1} + c_p x_{p_2} + d_p, \qquad p \in F,$$

and all xj ∈ {0, 1}.
Boros et al. [132] report an interesting application of the minimization of max-
quadratic functions to a VLSI design problem. The decision version of this problem
asks whether, for a given threshold value t, the set of inequalities

gp (xp1 , xp2 ) ≤ t, p ∈ F (5.10)

has a solution.
For any given pair p = (p1 , p2 ) ∈ F, the set of all solutions to the inequality
gp (xp1 , xp2 ) ≤ t is a subset of the 2-dimensional binary cube B^2 . Every such subset
is itself the set of solutions of a quadratic Boolean equation in two variables. It
follows that the set of solutions of the system of inequalities (5.10) is also the set
of solutions of a quadratic Boolean equation.
Example 5.7. Let

g(2,5) (x2 , x5 ) = 7 − 3x2 − 2x5 + 4x2 x5

and let t = 5. Then the set of solutions of the inequality g(2,5) (x2 , x5 ) ≤ t consists
of the points (x2 , x5 ) = (0, 1) and (x2 , x5 ) = (1, 0). Hence, the set of solutions of
the inequality g(2,5) (x2 , x5 ) ≤ t coincides with the set of solutions of the quadratic
Boolean equation
x2 x5 ∨ x̄2 x̄5 = 0.
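The translation of one inequality gp ≤ t into quadratic terms can be sketched in Python: each point of B^2 that violates the inequality is excluded by the unique two-literal term that takes the value 1 at that point. The encoding of literals as pairs (index, is_uncomplemented) and the function name are assumptions of the example.

```python
from itertools import product

def inequality_terms(g, p1, p2, t):
    """Quadratic terms whose disjunction vanishes exactly where g(x_p1, x_p2) <= t."""
    terms = []
    for a, b in product((0, 1), repeat=2):
        if g(a, b) > t:
            # The term equal to 1 only at (a, b): uncomplemented where the
            # coordinate is 1, complemented where it is 0.
            terms.append(((p1, bool(a)), (p2, bool(b))))
    return terms

# Data of Example 5.7.
g25 = lambda x2, x5: 7 - 3 * x2 - 2 * x5 + 4 * x2 * x5
```

For Example 5.7, `inequality_terms(g25, 2, 5, 5)` yields the two terms x̄2 x̄5 and x2 x5, since g25 exceeds 5 exactly at the points (0, 0) and (1, 1).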


5.5.10 A level graph drawing problem


A level graph is a directed acyclic graph (DAG) G = (V , A) together with a level
function, that is, a function l from V onto Jr ≡ {1, . . . , r} (r being a positive integer)
such that
(u, v) ∈ A ⇒ l(v) > l(u).
In G, level h (h = 1, . . . , r) is defined to be the set

Lh = {v ∈ V : l(v) = h},

which is certainly nonempty by our assumption that l is surjective.
A level graph is proper if the stronger condition

(u, v) ∈ A ⇒ l(v) = l(u) + 1

holds. A level-planar embedding of the level graph G is an embedding of G in the
plane such that
(i) the vertices of each level Lh are aligned along a straight vertical line which
differs from level to level;
(ii) all arcs are represented by straight line segments whose endpoints must
lie on two consecutive vertical lines;
(iii) any two such straight line segments, if different, may intersect only in a
common endpoint.
Checking whether a given level graph admits a level-planar embedding is a
question of practical importance in the area of graph drawing, in view of its
applications to software engineering, database design, and project management.
The essence of the problem lies in finding suitable linear orders of each level Lh
such that, if the vertices in Lh are placed along a vertical line from top to bottom
according to the linear order in Lh , no arc-crossing arises.
Thus, checking the existence of a level-planar embedding of a proper level
graph can be rephrased in order-theoretic terms as follows. Let (R, ≤) and (S, ⪯)
be two finite linearly ordered sets. Let ϕ be a one-to-many mapping of R into S.
The mapping ϕ is said to be isotonic if

(x, y ∈ R and x < y) ⇒ (ξ ⪯ η for all ξ ∈ ϕ(x) and η ∈ ϕ(y)).

Consider the following decision problem:

Isotony
Instance: r mutually disjoint finite sets L1 , . . . , Lr ; for each h = 1, . . . , r − 1, a
one-to-many mapping ϕh from Lh to Lh+1 .
Question: Are there r linear orders ⪯1 , . . . , ⪯r on L1 , . . . , Lr , respectively, such that
ϕh is an isotonic mapping from (Lh , ⪯h ) into (Lh+1 , ⪯h+1 ), for h = 1, . . . , r − 1?
Clearly, the above embedding problem is reducible to Isotony.

Figure 5.9. The level graph for Example 5.8.

Example 5.8. Consider the proper level graph G shown in Figure 5.9(a). The
levels of G are L1 = {a, b}, L2 = {c, d, e}, L3 = {f , g}. A level-planar embedding
of G is shown in Figure 5.9(b). With reference to the corresponding Isotony
formulation, the mappings ϕ1 and ϕ2 are given by

ϕ1 (a) = {c, d}, ϕ1 (b) = {c, e};

ϕ2 (c) = {g}, ϕ2 (d) = {f , g}, ϕ2 (e) = {g}.


The answer to Isotony is Yes and the required linear orders ⪯1 , ⪯2 , ⪯3 are given by

b ≺1 a; e ≺2 c ≺2 d; g ≺3 f .

Randerath, Speckenmeyer, Boros, Čepek, Hammer, Kogan, Makino, and
Simeone [778] pointed out a simple reduction of Isotony to a cubic Boolean
equation (or equivalently, to 3-Sat). In order to describe it, let us introduce binary
variables

$$z^h_{ij} = \begin{cases} 1 & \text{if } i, j \in L_h,\ i \ne j, \text{ and } i \prec_h j, \\ 0 & \text{otherwise.} \end{cases}$$

In other words, Z^h = [z^h_ij] is the incidence matrix of the (unknown) linear order
⪯h , h = 1, . . . , r. Then, the following constraints must be satisfied:
(i) For each h = 1, . . . , r − 1; i, p ∈ Lh , i ≠ p; j ∈ ϕh (i), q ∈ ϕh (p):

i ≺h p ⇒ j ⪯h+1 q (isotony)

or, equivalently,

$$z^h_{ip}\, z^{h+1}_{qj} = 0 \tag{5.11}$$

since each ⪯h is a linear order.
(ii) For each h = 1, . . . , r; i, p ∈ Lh , i ≠ p:

i ⪯h p ⇐⇒ p ⋠h i (asymmetry and completeness)

or, equivalently,

$$z^h_{ip}\, z^h_{pi} \vee \bar{z}^h_{ip}\, \bar{z}^h_{pi} = 0. \tag{5.12}$$

(iii) For each h = 1, . . . , r; i, k, p ∈ Lh :

i ⪯h k and k ⪯h p ⇒ i ⪯h p (transitivity)

or, equivalently,

$$z^h_{ik}\, z^h_{kp}\, \bar{z}^h_{ip} = 0. \tag{5.13}$$

Summing up, the answer to Isotony is Yes if and only if the cubic Boolean
equation

F (Z) = 0

is consistent, where F is the disjunction of all the left-hand sides of (5.11), (5.12),
and (5.13). Randerath et al. [778] proved the following surprising result.
Theorem 5.17. The cubic constraints (5.13) are redundant. Therefore, Isotony
is polynomially reducible to a quadratic Boolean equation, and it can be answered
in polynomial time.
Proof. The proof is lengthy and must be omitted here. The reader may consult the
paper by Randerath et al. [778].

Table 5.4. Complexity of reductions to quadratic equations

Problem                              Complexity of the reduction
Bipartiteness                        O(m)
Balance in signed graphs             O(m)
Recognition of split graphs          O(n^2)
Forbidden-color graph bipartition    O(m)
Totally unimodular matrices          linear
Kőnig-Egerváry property              O(n^2.5)
Single bend wiring                   quadratic
Max-quadratic functions              linear
Level graph drawing                  quadratic
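The quadratic part of the Isotony reduction of Section 5.5.10 can be sketched in Python. Several points are assumptions of this illustration rather than statements of the source: constraint (5.11) is read as "z(h, i, p) and z(h+1, q, j) are not both 1" for j ≠ q, that is, i ≺ p forbids q ≺ j; a variable is keyed by the triple (h, i, p); levels are numbered from 0; and the function names are hypothetical. Consistency is tested by enumeration, which is exponential and only meant for examples of the size of Example 5.8.

```python
from itertools import combinations, permutations, product

def isotony_terms(levels, maps):
    """Quadratic terms for constraints (5.11) and (5.12).

    levels  : list of lists of elements (level h is levels[h])
    maps[h] : dict sending each element of levels[h] to a set in levels[h + 1]
    """
    terms = []
    for h, level in enumerate(levels):
        for i, p in combinations(level, 2):
            a, b = (h, i, p), (h, p, i)
            terms.append(((a, True), (b, True)))      # (5.12): not both i < p and p < i
            terms.append(((a, False), (b, False)))    # (5.12): one of the two holds
    for h, phi in enumerate(maps):
        for i, p in permutations(levels[h], 2):
            for j in phi[i]:
                for q in phi[p]:
                    if j != q:
                        # (5.11), as read above: i before p forbids q before j
                        terms.append((((h, i, p), True), ((h + 1, q, j), True)))
    return terms

def is_consistent(terms):
    """Brute force: does some assignment make every term equal to 0?"""
    variables = list({v for t in terms for v, _ in t})
    for values in product((0, 1), repeat=len(variables)):
        z = dict(zip(variables, values))
        if not any(all(z[v] == int(pos) for v, pos in t) for t in terms):
            return True
    return False

# Data of Example 5.8.
levels = [["a", "b"], ["c", "d", "e"], ["f", "g"]]
maps = [{"a": {"c", "d"}, "b": {"c", "e"}},
        {"c": {"g"}, "d": {"f", "g"}, "e": {"g"}}]
```

On the data of Example 5.8 the quadratic equation is consistent, in line with the linear orders exhibited in the example. If two vertices of one level are both mapped onto the same two vertices of the next level, the equation becomes inconsistent.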

5.5.11 A final look into complexity


Most of the reductions to quadratic Boolean equations discussed in Sections 5.5.2
to 5.5.10 can be performed in linear time, and all of them in polynomial time. The
complexity of these reductions is summarized in Table 5.4. In this table, n and m
stand for the number of vertices and edges of the input graph, while “linear” and
“quadratic” are meant with respect to the input size.
In conclusion, we see that quadratic Boolean equations, or 2-Sat problems,
play, within a wide class of “tractable” problems, an analogous role to that of
cubic equations, or 3-Sat problems, for the class of “intractable” NP-complete
problems. Formally, it can be proved (see Papadimitriou [725]) that 2-Sat is
NL-complete, where NL denotes the class of those problems that can be solved by a
nondeterministic Turing machine using a logarithmic amount of memory space.
This result supports the view that, in a sense, quadratic Boolean equations are
among the “hardest easy” discrete problems.

5.6 Efficient graph-theoretic algorithms for quadratic equations


5.6.1 Introduction
As discussed in Chapter 2, solving Boolean equations is one of the most fundamen-
tal and important problems on Boolean functions. Although intractable in general,
this problem admits efficient algorithms when the input is restricted to quadratic
DNFs. Indeed, the polynomial-time solvability of quadratic equations was already
pointed out by Cook [208] in his seminal paper on the NP-completeness of Sat-
isfiability. Here, we provide a simple argument to establish this fact (in DNF
formulation).
Given a quadratic DNF equation in n variables, apply the classical variable
elimination method presented in Section 2.6, maintaining at each iteration a current
list of terms. Eliminating an arbitrary variable requires computing the conjunction
(product) of two linear expressions, which results in O(n^2) quadratic terms, and
checking whether each of the generated terms is absorbed by some term in the
current list, which takes O(n^4) time. The backward step for retrieving a solution
costs only O(n^2) time. In conclusion, an O(n^5) algorithm ensues.
The key property that allows this procedure to run in polynomial time is that
the equation obtained after eliminating a variable remains quadratic. As a conse-
quence, no exponential blowup can occur in the course of the algorithm. A similar
reasoning would apply to the consensus procedure described in Section 2.7, since
the consensus of any two quadratic terms is again quadratic.
However, one can do much better in terms of complexity. In the rest of this
section, we describe four fast algorithms for the solution of quadratic Boolean
equations:

• The Labeling algorithm of Gavril [374].


• The Alternative Labeling algorithm of Even, Itai, and Shamir [318] (this
paper contains only an outline of the algorithm; more detailed descriptions
can be found in Gavril [374] and Simeone [834]).
• The Switching algorithm of Petreschi and Simeone [741].
• The Strong Components algorithm of Aspvall, Plass, and Tarjan [34].

All four algorithms above are graph theoretic: In the first three algorithms, the
quadratic Boolean expression ϕ is represented by an undirected graph (namely, the
matched graph introduced in Section 5.4.2), whereas the fourth algorithm exploits
a digraph model (namely, the implication graph introduced in Section 5.4.3).
Figure 5.10. The matched graph for Example 5.9.

Consider a quadratic Boolean equation in n variables x1, . . . , xn and in m terms, say,

ϕ ≡ T1 ∨ · · · ∨ Tm = 0, (5.14)
where, without loss of generality, we may assume that each term is the conjunction
of exactly two literals and that no term appears more than once in the expression ϕ.

Example 5.9. The four algorithms to be described will be demonstrated on the
quadratic Boolean equation

ϕ = x1 x̄3 ∨ x1 x̄4 ∨ x̄1 x2 ∨ x2 x4 ∨ x̄2 x̄4 ∨ x3 x̄4 = 0. (5.15)

The corresponding matched graph Gϕ and implication graph Dϕ are shown in
Figures 5.10 and 5.11, respectively. □

5.6.2 Labeling algorithm (L)


The basic principle of the Labeling algorithm for quadratic Boolean equations can
be traced back to the algorithm proposed by Gavril [374] for the recognition of the
Kőnig-Egerváry property in graphs, see Petreschi and Simeone [742]. The idea is
to guess the value of an arbitrary literal ξ and to deduce (essentially, by the unit
literal rules of Section 2.5.2) the possible consequences of this guess on other
variables. One keeps track of these consequences by a 0–1 labeling of the literals
occurring in the input DNF ϕ.
Initially all terms are declared to be “unscanned.” An arbitrary literal ξ is
selected and is given the label 1; at the same time its complement ξ̄ is given the label 0. Then the
labeling is propagated to as many literals as possible through repeated execution
of the following STEP:
Figure 5.11. The implication graph for Example 5.9.

STEP:
Pick an arbitrary unscanned term ηζ such that η has the label 1, and assign to ζ
and to its complement ζ̄ the labels 0 and 1, respectively, making sure that ζ did not
previously receive the label 1. Declare the term ηζ “scanned.”

If a conflict arises because ζ was previously assigned the label 1, and the
algorithm now tries to assign the label 0 to ζ , then the labeling stops, all labels
are erased, and the alternative guess ξ = 0 is made. The labeling procedure starts
again, and if a new conflict occurs at a later stage, then the algorithm terminates
with the conclusion that the equation has no solution. On the other hand, if all
literals are successfully labeled, then the algorithm concludes that the equation is
consistent and the labeling directly yields a solution. However, a third possibility
may occur: The labeling “gets stuck,” in the sense that no conflict has occurred,
but some literals are still unlabeled. This may happen only when no unscanned
term contains a literal labeled 1, in other words, when each literal appearing in an
unscanned term is either unlabeled or has the label 0. If this situation occurs, then
the labeled variables are fixed according to the current labels, and the labeling
restarts with a new guess on the reduced expression involving only the unlabeled
literals.
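
The procedure described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation of the cited authors: the function name `labeling_algorithm`, the encoding of literals as signed integers (v for the variable x_v, -v for its complement), and the fixed guessing order x1, x2, ... are choices made for this example, and every term is assumed to consist of two distinct literals.

```python
def labeling_algorithm(n, terms):
    """Solve T1 v ... v Tm = 0 where each term is a pair of literals.

    Hypothetical helper for illustration. A literal is a nonzero int:
    v stands for x_v, -v for its complement. Returns a dict
    {variable: 0 or 1}, or None if the equation has no solution.
    """
    # partner[lit] lists the literals that share a term with lit.
    partner = {lit: [] for v in range(1, n + 1) for lit in (v, -v)}
    for a, b in terms:
        partner[a].append(b)
        partner[b].append(a)

    fixed = {}  # labels made permanent after a conflict-free propagation

    def propagate(guess):
        """Give guess the label 1 and propagate; return labels, or None on conflict."""
        labels = {guess: 1, -guess: 0}
        pending = [guess]  # literals labeled 1 whose terms are not yet scanned
        while pending:
            eta = pending.pop()
            for zeta in partner[eta]:
                current = labels.get(zeta, fixed.get(zeta))
                if current == 1:
                    return None  # zeta would need both labels 1 and 0
                if current is None:
                    labels[zeta], labels[-zeta] = 0, 1
                    pending.append(-zeta)
        return labels

    for v in range(1, n + 1):
        if v in fixed:
            continue
        labels = propagate(v)           # guess x_v = 1
        if labels is None:
            labels = propagate(-v)      # alternative guess x_v = 0
        if labels is None:
            return None                 # both guesses lead to a conflict
        fixed.update(labels)            # finished or stuck: keep the labels
    return {v: fixed[v] for v in range(1, n + 1)}


# Terms of equation (5.15) in the signed-integer encoding assumed above.
EXAMPLE_5_9 = [(1, -3), (1, -4), (-1, 2), (2, 4), (-2, -4), (3, -4)]
solution = labeling_algorithm(4, EXAMPLE_5_9)
print(solution)
```

Because this sketch guesses the variables in index order, its run on equation (5.15) need not coincide with the run recorded in Table 5.5, and it may return a different solution. Each propagation visits every term at most twice, and at most 2n guesses are made.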

Theorem 5.18. The Labeling algorithm is correct and runs in O(mn) time.

Proof. The algorithm makes a guess on the value of some literal, and then it
deduces the values of as many literals as possible. Label propagation (that is,
value assignment) stops in three cases:
Case 1: All literals have been labeled without conflicts.


In this case the labels assigned to the literals define a solution X* of the quadratic
Boolean equation ϕ = 0. Indeed, the algorithm is such that

(i) the labels assigned to variable xi and to its complement x̄i are different for
i = 1, . . . , n;
(ii) the label 1 is never simultaneously assigned to two literals appearing in the
same quadratic term.

Case 2: A conflict occurs.


In this case the initial guess ξ = 1 was wrong. This means that one must have
ξ = 0 in every solution (if any) of the Boolean equation ϕ = 0; equivalently, ξ is
a linear implicant of the quadratic Boolean function f associated with ϕ. If the
label propagation consequent to the alternative guess ξ = 0 also ends in a conflict,
then ξ̄ must be a linear implicant of f, too, meaning that the equation ϕ = 0 has
no solution.

Case 3: The algorithm “gets stuck,” that is, a proper subset of literals is labeled
and the labeling cannot be extended further.
In this case, let L and U be the sets of labeled and unlabeled literals, respectively.
Let ϕU be the subexpression of ϕ obtained after fixing all the labeled variables to
their current labels; thus, ϕU involves only the unlabeled literals. We claim that
ϕ = 0 is consistent if and only if ϕU = 0 is consistent.
Observe that each term T of ϕU is among the terms of ϕ: Indeed, if this is
not the case, then T must result from some term Ti = ηζ of ϕ by fixation of one
of its literals to 1. But then, the propagation step implies that the other literal of Ti
should have been labeled 0, so that T should not appear in ϕU .
Now, assume that ϕ = 0 is consistent and that ϕ(X∗ ) = 0. Then, in view of
the previous observation, the restriction of X∗ to the variables associated with
unlabeled literals defines a solution of ϕU = 0.
Conversely, assume that ϕU = 0 is consistent, and consider the labeling cor-
responding to an arbitrary solution. Such labeling, together with the one already
obtained for L, defines a complete labeling of the literals of ϕ having the above
properties (i) and (ii), and hence, a solution of ϕ = 0.
It follows that the labels of L can be made permanent, and that the labeling can
restart from U after the process gets stuck. Hence, the algorithm is correct.
The total number of initial guesses made by the algorithm is at most 2n. After
each guess, the corresponding label propagation stage explores at most m terms.
Hence, the worst-case complexity of the Labeling algorithm is O(mn). 

Example 5.9 (continued). The history of the execution of the Labeling algorithm
on the quadratic equation (5.15) is shown in Table 5.5. After Step 10 all literals
have been labeled without conflicts. Hence, the equation ϕ = 0 is consistent, and
a solution is x1* = 0, x2* = 0, x3* = 1, x4* = 1. □

Table 5.5. Execution of the Labeling algorithm on the equation (5.15)

Step   Term     Labels             State
0      /        x2 = 1, x̄2 = 0     guess
1      x̄1 x2    x̄1 = 0, x1 = 1
2      x2 x4    x4 = 0, x̄4 = 1
3      x1 x̄3    x̄3 = 0, x3 = 1
4      x1 x̄4    x̄4 = 0, x4 = 1     conflict
5      /        x2 = 0, x̄2 = 1     alternative guess
6      x̄2 x̄4    x̄4 = 0, x4 = 1     stuck
7      /        x1 = 0, x̄1 = 1     guess
8      /                           stuck
9      /        x3 = 1, x̄3 = 0     guess
10     /                           end

Example 5.10. Consider the DNF

ϕ = x̄1 x2 ∨ x̄2 x3 ∨ . . . ∨ x̄n−1 xn ∨ x̄n−1 x̄n

with m = n. If the initial guess x1 = 0 is made, then the ensuing label propagation
stage discovers a conflict very late, that is, after n steps, when xn must
successively receive the labels 0 and 1. For the alternative guess x1 = 1, label
propagation immediately gets stuck, and no further variable may be labeled.
Afterwards, the wrong guess x2 = 0 can be made, and so on. So, in the worst
case, n + (n − 1) + · · · + 2 + 1 = (n + 1)n/2 = (n + 1)m/2 terms are scanned (with
repetitions) by the algorithm, and the total number of operations performed is of
the order of Θ(mn). □

5.6.3 Alternative Labeling algorithm (AL)


The idea of the Alternative Labeling algorithm is again to guess the value of an
arbitrary literal ξ in some solution and to deduce the possible consequences of
this guess on other variables appearing in the expression. Since ξ can take either
the value 0 or the value 1, the algorithm analyzes in parallel the consequences
of these two alternative guesses on ξ . It keeps track of these consequences by
a “red” labeling (corresponding to the guess ξ = 1) and by a “green” labeling
(corresponding to the guess ξ = 0). The purpose of propagating the two labelings
in parallel is to avoid wasting time on the green labeling, say, as soon as the red
one either detects an early conflict or gets stuck.
Initially, all terms are declared to be “red-unscanned” and “green-unscanned.”
Then, the algorithm selects an arbitrary literal ξ and assigns to it both the red label
1 and the green label 0, while the complementary literal ξ̄ receives the red label 0
and the green label 1.

Table 5.6. Execution of the Alternative Labeling algorithm on the equation (5.15)

       RED LABELING                          GREEN LABELING
Step   Term     Labels             State     Term     Labels             State
0      /        x2 = 0, x̄2 = 1     guess     /        x2 = 1, x̄2 = 0     guess
1      x̄2 x̄4    x̄4 = 0, x4 = 1               x̄1 x2    x̄1 = 0, x1 = 1
2      x2 x4    x2 = 0, x̄2 = 1     stuck
3      /        x3 = 0, x̄3 = 1     guess     /        x̄3 = 0, x3 = 1     guess
4      x1 x̄3    x1 = 0, x̄1 = 1     end


The two labelings are then extended to as many literals as possible through the
alternate execution of the following STEP for the red labeling and for the green
one:

STEP:
Pick an arbitrary unscanned term ηζ such that η has the label 1, and assign to ζ
and to its complement ζ̄ the labels 0 and 1, respectively, making sure that ζ did not
previously receive the label 1. Declare the term ηζ “scanned.” (Here, terms like “label,”
“unscanned,” “scanned” are relative to the color currently under consideration.)

If a conflict arises, say, for the red labeling (i.e., some literal that was previously
red-labeled 1 is forced to get the red label 0, or vice versa), the red labeling stops
and the red labels are erased. If, at a later stage, a conflict occurs also for the green
labeling, the algorithm stops and the equation has no solution. It may happen that
one of the labelings, say, the red one, “gets stuck,” meaning that no conflict has
occurred, but that there are still literals having no red label. This is possible only
when, for each red-unscanned term, the literals appearing in that term are either
red-unlabeled or have red label 0. If this situation occurs, then the red labels are
made permanent, and both the red and the green labeling are restarted on the
reduced expression involving only the red-unlabeled literals.
The algorithm can be shown to run in O(m) time (see Gavril [374]).

Example 5.9 (continued). Table 5.6 summarizes a run of the algorithm on the
equation (5.15). After step 4, all literals have been (red-)labeled without conflicts.
Hence the equation is consistent and a solution is given by x1* = 0, x2* = 0,
x3* = 0, x4* = 1. □

5.6.4 Switching algorithm (S)


The Switching algorithm relies on the idea of Horn-renamability. Lewis [612]
introduced the class of Horn-renamable DNFs, consisting of those DNFs that
can be written as Horn DNFs after switching a subset of variables, that is, after
performing the change of variables that replaces some of the original variables xi by
new variables yi = x̄i. He provided a 2-Sat characterization of Horn-renamability
(see Section 6.10.1 for details). For quadratic DNFs, a sort of converse relation
holds.
Theorem 5.19. Given a pure quadratic Boolean DNF ϕ, the equation ϕ = 0 is
consistent if and only if ϕ is Horn-renamable.
Proof. The proof is left as an easy exercise. □

On the basis of Theorem 5.19, the Switching algorithm tries to transform the
given expression ϕ into a Horn expression, if possible, through a sequence of
switches of variables. The algorithm first identifies an arbitrary negative term,
say x̄i x̄r; if this term is to be transformed into a Horn term, then at least one of
the variables xi, xr needs to be switched. The algorithm accordingly picks one of the
variables, say xi, and tries to deduce the consequences of this choice.
In order to describe the algorithm more formally, it is convenient to introduce
some preliminary definitions. (We use the tree terminology of Appendix A.) An
alternating tree rooted at x̄i is a subgraph T(x̄i) of the matched graph Gϕ with
the following properties:
(1) T(x̄i) is a tree, and x̄i is its root.
(2) If xj is a vertex of T(x̄i), then its father in T(x̄i) is x̄j.
(3) If x̄j is a vertex of T(x̄i) and j ≠ i, then its father is a vertex xr of T(x̄i)
such that (xr, x̄j) is a mixed edge of Gϕ.
(4) If xr is a vertex of T(x̄i) and (xr, x̄j) is a mixed edge of Gϕ, then x̄j is a
vertex of T(x̄i).
Note that it is easy to “grow” a maximal alternating tree T(x̄i) rooted at a vertex
x̄i of a matched graph. Indeed, suppose that T is any tree that satisfies
conditions (1)–(3) (initially, T may contain the isolated vertex x̄i only), and perform
the following steps as long as possible:
(i) If T has a leaf of the form x̄j, then add vertex xj and edge (x̄j, xj) to T.
(ii) If T has a leaf xr, then add to T all vertices x̄j and edges (xr, x̄j) such that
(xr, x̄j) is a mixed edge of Gϕ and x̄j is not already in T.
It is clear that conditions (1)–(3) are maintained by both steps (i) and (ii). Moreover,
when step (ii) no longer applies, then condition (4) is also satisfied; hence, T is an
alternating tree rooted at x̄i.
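
Steps (i) and (ii) can be sketched in Python as follows. This is a hedged illustration only: the function name `grow_alternating_tree` and the signed-integer encoding of literals (v for x_v, -v for its complement) are assumptions of this example, and the mixed edges of the matched graph are read directly off the terms, a mixed term being one with exactly one complemented literal. Scanning all terms at every leaf keeps the sketch short but is not meant to be efficient.

```python
def grow_alternating_tree(root_var, terms):
    """Grow a maximal alternating tree rooted at the complement of x_root_var.

    Illustrative helper, not taken from the source. Literals are nonzero
    ints (v for x_v, -v for its complement); each term is a pair of
    literals. Returns a dict mapping every tree vertex to its father
    (None for the root).
    """
    root = -root_var
    father = {root: None}
    leaves = [root]  # vertices whose children have not been generated yet
    while leaves:
        v = leaves.pop()
        if v < 0:
            # Step (i): a complemented leaf receives its uncomplemented mate.
            father[-v] = v
            leaves.append(-v)
        else:
            # Step (ii): follow every mixed edge leaving the uncomplemented leaf v.
            for a, b in terms:
                for pos, neg in ((a, b), (b, a)):
                    if pos == v and neg < 0 and neg not in father:
                        father[neg] = v
                        leaves.append(neg)
    return father


# Terms of equation (5.15) in the signed-integer encoding assumed above.
EXAMPLE_5_9 = [(1, -3), (1, -4), (-1, 2), (2, 4), (-2, -4), (3, -4)]
tree = grow_alternating_tree(2, EXAMPLE_5_9)
print(tree)
```

For equation (5.15) and the root x̄2, the father pointers place x2 on the path from the root to x4, so x2 is an ancestor of x4 in the tree built by this sketch.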
Let us now record two useful properties of alternating trees.
Lemma 5.1. Let T(x̄i) be an alternating tree of Gϕ rooted at x̄i, let xj be any
vertex of T(x̄i), and let P(i, j) be the unique path from xi to xj in T(x̄i). If X* is
a solution of the equation ϕ(X) = 0 such that xj* = 0, then xk* = 0 for all vertices
xk lying on P(i, j).

Proof. The proof is by induction on the length of the path P(i, j). If P(i, j) has
length 0, then i = j and the statement is trivial. Otherwise, observe that x̄j is the
father of xj in T(x̄i), and consider the father of x̄j; in view of condition (3), this
is a vertex xr such that xr x̄j is a term of ϕ. Since ϕ(X*) = 0 and xj* = 0, we
obtain that xr* = 0. Now, the conclusion follows by induction, since the path from
xi to xr is shorter than P(i, j). □

To state the next property, we define the join of two vertices of T(x̄i) to be their
common ancestor that is farthest away from the root x̄i. Note that the join of any
two vertices necessarily corresponds to an uncomplemented variable.

Lemma 5.2. If xh and xk are two vertices of an alternating tree T(x̄i), and xh xk
is a positive term of ϕ, then the variable xj associated with the join of xh and xk
is forced to 0 in all solutions of ϕ = 0.

Proof. In every solution (if any) of ϕ = 0, either xh or xk must take value 0. Since
xj is on the path from xi to xh and on the path from xi to xk , the conclusion follows
from Lemma 5.1. 

We are now ready to describe the Switching algorithm. The algorithm works on
the matched graph Gϕ. An endpoint x̄i of a negative edge (x̄i, x̄r) is selected, and
an alternating tree T(x̄i) is grown, as explained above. As soon as a new vertex xh
of T(x̄i) is generated, one checks whether Gϕ has a positive edge (xh, xk) linking
xh to a previously generated vertex xk of T(x̄i). If this is the case, the variable xj
corresponding to the join of xh and xk must be forced to 0 by Lemma 5.2.
As a consequence, other variables are forced in cascade according to the
following rules:

• If ξ is forced to 0, then ξ̄ is forced to 1.
• If ξ is forced to 1 and (ξ, η) is an edge of Gϕ, then η is forced to 0.

If a conflict occurs during this process (that is, if some variable is forced both to 0
and to 1), then the algorithm stops and concludes that the equation is inconsistent.
Otherwise, we obtain a reduced equation involving fewer variables, and a new
iteration begins. If the construction of T(x̄i) has been completed and no positive
edge between two vertices of T(x̄i) has been detected, then a switch is performed
on all the variables corresponding to the vertices of T(x̄i). In this way, we produce
an equivalent expression, and a new iteration begins. The procedure is iterated
until either a Horn equation is obtained or all variables are forced. In both cases,
a solution of the original equation ϕ = 0 can be found by inspection of the lists of
the forced variables and of the switched ones.

Example 5.9 (continued). The matched graph Gϕ of Figure 5.10 has a negative
edge (x̄2, x̄4). Hence, the alternating tree T(x̄2) shown in Figure 5.12 is grown,
until the positive edge (x2, x4) is detected.
Since the join of x2 and x4 is x2 itself, the variable x2 is forced to 0. Because
of the term x̄2 x̄4, variable x4 is forced to 1. The subgraph of Gϕ induced by x1, x3
and their complements has no negative edge, so the associated DNF ψ is Horn,
and the equation ψ = 0 has the trivial solution x1 = x3 = 0. It follows that the
equation ϕ = 0 is consistent, and that it admits the solution x1 = x2 = x3 = 0,
x4 = 1. □

Figure 5.12. Alternating tree rooted at x̄2 for the matched graph of Figure 5.10.

Theorem 5.20. The Switching algorithm is correct and can be implemented to
run in O(mn) time.

Proof. In view of Lemma 5.2, the consistency of the original equation is not
affected when we fix variables as explained in the algorithm. Also, switching a
set of variables does not affect consistency. Therefore, if the algorithm terminates
(either because the equation is proved to be inconsistent or because a solution
has been produced), then it necessarily returns the correct answer. Thus, we only
need to prove that the algorithm always terminates. To see this, let us show that
each vertex x̄i can occur at most once as the root of an alternating tree during
the execution of the algorithm. Consider what can happen when the tree T(x̄i) is
generated.

• If the equation is declared inconsistent, then the algorithm stops.
• If a positive edge (xh, xk) is encountered and the join of xh, xk is forced to 0,
then as a consequence of Lemma 5.1, xi is subsequently fixed to 0 as well,
and this variable disappears from the remaining equation.
• If all variables occurring in T(x̄i) are switched when T(x̄i) has been
completely generated, then we claim that no new negative edges arise in the
process (in other words, a positive edge or a mixed edge is never transformed
into a negative edge in the course of the algorithm). This implies, in
particular, that x̄i will never appear in a negative edge in any subsequent iteration
of the algorithm. To prove the claim,
– consider any positive edge (xh, xk) of Gϕ; at most one of xh and xk can
belong to T(x̄i); otherwise, the positive edge (xh, xk) would have been
detected and handled earlier by the algorithm; hence, this edge either
remains positive or becomes mixed after switching;
– consider a mixed edge (xh, x̄k) of Gϕ; in view of condition (4) in the
definition of alternating trees, it cannot be the case that xh is a vertex
of T(x̄i) but xk is not; hence, (xh, x̄k) cannot be transformed into a
negative edge.

Petreschi and Simeone [741] describe an implementation of the Switching
algorithm with complexity O(mn). □

5.6.5 Strong Components algorithm (SC)


This algorithm is based on Theorem 5.4 and Lemma 5.3. It works on the implication
graph D = Dϕ and preliminarily finds the strong components of D in reverse
topological order (see Appendix A and Tarjan [858]). The Mirror Property of D
(see Section 5.4.3) implies that for every strong component C of D, there exists a
“mirror” component C̄, the complement of C, induced by the complements of the
vertices in C. Hence, Theorem 5.4 can be restated as follows: “ϕ is satisfiable if
and only if no strong component of D coincides with its complement.”
The general step of the Strong Components algorithm implements the procedure
described in the proof of Theorem 5.4. Namely, it processes the strong components
of D (or equivalently, the vertices of the condensed implication graph D̂) in reverse
topological order, starting from a sink, and it labels them in the following way. For
each strong component C, one of the following cases must occur:

(a) C is already labeled. Then, the algorithm processes the next strong
component.
(b) C = C̄. Then, the algorithm stops. In view of Theorem 5.4, the equation
ϕ = 0 is inconsistent.
(c) C is unlabeled. Then, the algorithm assigns the label 1 to C and the label 0
to C̄.

Figure 5.13. The condensed implication graph D̂ϕ for the equation (5.15)

It is easy to see that, if C1 and C2 are two strong components, if there exists an
arc from some vertex of C1 to some vertex of C2 in D, and if C1 is labeled 1, then
C2 is necessarily labeled 1 as well. Thus, if we assign to each vertex ξ the label of
the component containing ξ , we get a solution to the equation ϕ = 0 (by virtue of
Lemma 5.3).
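
A compact Python sketch of this labeling scheme is given below. It rests on assumptions that should be checked against Section 5.4.3: for every term ξη, the implication graph is taken to contain the arcs ξ → η̄ and η → ξ̄, and literals are encoded as signed integers (v for x_v, -v for its complement). The function name `strong_components_solve` is hypothetical. The strong components are found with a recursive version of Tarjan's depth-first search, which emits them sinks first, that is, in reverse topological order of the condensed graph.

```python
def strong_components_solve(n, terms):
    """Solve a quadratic equation T1 v ... v Tm = 0 via strong components.

    Illustrative sketch. Returns {variable: 0 or 1}, or None if some
    strong component coincides with its mirror component.
    """
    lits = [lit for v in range(1, n + 1) for lit in (v, -v)]
    succ = {lit: [] for lit in lits}
    for a, b in terms:
        succ[a].append(-b)  # assumed arc: a = 1 forces the complement of b to 1
        succ[b].append(-a)  # assumed arc: b = 1 forces the complement of a to 1

    index, low, comp = {}, {}, {}
    stack, on_stack, components = [], set(), []

    def visit(v):
        # Recursive Tarjan search; adequate for small illustrative instances.
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            members = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp[w] = len(components)
                members.append(w)
                if w == v:
                    break
            components.append(members)  # components are appended sinks first

    for lit in lits:
        if lit not in index:
            visit(lit)

    # Case (b): a component equal to its mirror contains x_v and its complement.
    if any(comp[v] == comp[-v] for v in range(1, n + 1)):
        return None

    label = {}
    for members in components:          # reverse topological order
        if members[0] in label:         # case (a): already labeled
            continue
        for lit in members:             # case (c): label C with 1, its mirror with 0
            label[lit], label[-lit] = 1, 0
    return {v: label[v] for v in range(1, n + 1)}


# Terms of equation (5.15) in the signed-integer encoding assumed above.
EXAMPLE_5_9 = [(1, -3), (1, -4), (-1, 2), (2, 4), (-2, -4), (3, -4)]
sc_solution = strong_components_solve(4, EXAMPLE_5_9)
print(sc_solution)
```

The order in which components of equal rank are emitted depends on the traversal order, so the solution returned by this sketch is one valid labeling and need not match a labeling obtained with a different processing order.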

Example 5.9 (continued). We consider again the equation ϕ = 0 given in (5.15)
and the associated implication graph Dϕ. As can be seen from Figure 5.11, the
strong components of Dϕ are {x̄2, x4}, {x1}, {x3} and their mirror components. The
condensed implication graph D̂ϕ is shown in Figure 5.13.
Since no pair xi, x̄i belongs to the same strong component for any i, ϕ = 0
is consistent. The strong components of Dϕ are labeled in the order shown in
Table 5.7. Hence, a solution of the quadratic Boolean equation ϕ = 0 is given by
x1* = 1, x2* = 0, x3* = 1, x4* = 1. □

Aspvall, Plass and Tarjan [34] show that the Strong Components algo-
rithm has complexity O(m). A randomized version of the algorithm, with
expected O(n) time complexity, has been described by Hansen, Jaumard, and
Minoux [470].
Table 5.7. Labeling of the strong components of Dϕ for the equation (5.15)

Strong component   Label
{x̄2, x4}           1
{x2, x̄4}           0
{x3}               1
{x̄3}               0
{x1}               1
{x̄1}               0

5.6.6 An experimental comparison of algorithms for quadratic equations


Petreschi and Simeone [742] report on the results of an experimental study in
which the performance of the four algorithms for quadratic equations described in
Sections 5.6.2–5.6.5 has been compared on 400 randomly generated test problems
with up to 2000 variables and 8000 terms.
In all test problems, the density m/n was nearly constant and equal to 4. With such
density, almost all random quadratic equation instances are unsatisfiable under
mild assumptions on the probability distribution of their terms (see Theorem 2.16
in Chapter 2 and Exercises 12–13 at the end of the current chapter). Therefore,
200 random instances were generated, and all of them proved to be unsatisfiable.
The remaining 200 instances were randomly generated so as to be renamable
Horn and thus provably satisfiable. One shortcoming of the uniform probability
model was that almost all the strong components of the implication graph were
singletons, except for one (in the unsatisfiable case) or two (in the satisfiable
case) “megacomponents”: This is in agreement with the theoretical probabilistic
results in Hansen, Jaumard, and Minoux [470]. In order to eliminate these and
other related anomalies, another instance generator was built, which produced
strong components with binomially distributed sizes. Then random instances were
generated by this “binomial” generator.
An analysis of the results led to the following main conclusions:

1) The first, and perhaps most important, observation is that quadratic Boolean
equations are indeed easy to solve: Even the slowest algorithm took only
44 milliseconds (on an IBM 3090 – nowadays an archaic computer!) to
solve the largest problem (2000 variables and 8000 terms).
2) In the satisfiable case, the foregoing experiments show a clear-cut ranking
of the four algorithms with respect to running times: L is unquestionably
the fastest one, followed by AL, S, and SC (see Figure 5.14).
3) In the unsatisfiable case, the running times of L, AL, and S are roughly
comparable, whereas the running time of SC is by far larger; except for
SC, the running times were much smaller in the unsatisfiable case than in
the satisfiable one (see Figure 5.15, where the SC-graph is oversized and
hence, is not shown).

Figure 5.14. Running times for satisfiable formulas. [Plot with curves for SC, S, AL, and L.]

Figure 5.15. Running times for unsatisfiable formulas. [Plot with curves for S, AL, and L.]

4) In the satisfiable case, the running times of SC and L grow quite regularly
with the problem size. In fact, they are very well fitted by a straight line:
The authors found that TIME_L = 5.94n and TIME_SC = 21.99n, the squared
correlation coefficients being R²_L = 0.999 and R²_SC = 1, respectively. On
the other hand, the graph of the running times of AL and S as a function of
n is less regular, but it lies between two straight lines corresponding to L
and SC (see Figure 5.14).
In the unsatisfiable case, the behavior of SC is as regular as it is in the
satisfiable case. The other three algorithms, however, behave very irregu-
larly and exhibit frequent nonmonotonicities (see Figure 5.15). At any rate,
their complexity turns out to be sublinear: Roughly speaking, the running
times are proportional to the square root of n. Furthermore, the running
times of L and AL are seen to be highly correlated.
In conclusion, the experimental average complexity of both L and S is
lower than their worst-case complexity, and is, in any case, bounded above
by a linear function of n. Similar conclusions are reached with the binomial
generator.
5) In the satisfiable case, the vast majority of the variables turned out to be
forced.
6) A direct comparison between L and AL shows that the latter algorithm,
despite its O(m) worst-case complexity, is more than twice as slow as the
former one, whose worst-case complexity is O(mn).

The main point is that L “capitalizes on luck,” whereas AL follows a more “pes-
simistic” approach, and L is less affected by random factors, which may increase
its running time in the worst-case but may also decrease it on average. Actually,
for L to reach its O(mn) worst-case complexity, the following events must take
place:

• Every time a guess is made, it is always the wrong one.


• Every time a wrong guess is made, the resulting conflict is detected very late.
• Every time a conflict takes place, the alternative guess results in very early
blocking.

However, under both probability models, things do not go that way:

• A guess is successful in about 50% of the cases.


• Every time a wrong guess is made, the resulting conflict is detected rather
early because conflicts are due to “local obstructions” (Simeone [834]).
• Every time a conflict takes place, a certain literal ζ is recognized as being
forced; as a consequence, a large set C(ζ ) of literals is then forced.

5.7 Quadratic equations: Special topics


5.7.1 The set of solutions of a quadratic equation
There is a nice connection between the set of solutions of a quadratic Boolean
equation and median graphs. A median graph is an undirected graph having the
property that, for any three vertices x, y, z, there exists a unique vertex w (called
the median of x, y, z) that at the same time lies on some shortest path between x
and y, on some shortest path between x and z, and on some shortest path between
y and z. Median graphs display many interesting properties and remarkable con-
nections with other branches of mathematics, computer science, natural sciences,
and social sciences; see Bandelt and Chepoi [50]; Chung, Graham, and Saks [195];
Mulder [694]; Mulder and Schrijver [695], and so on.
Given a quadratic DNF ϕ and its implication graph Dϕ, let us introduce an
undirected graph H whose vertices are all the solutions of the quadratic Boolean
equation ϕ = 0, where two solutions X* and Y* are adjacent if there exists some
strong component C of Dϕ having the property that Y* is obtained from X* (and
vice versa) by switching the values of some variables xi such that either xi or
x̄i belongs to C. For instance, if X* = (0,1,0,0,1,0), Y* = (0,0,1,0,0,0), and there
exists a strong component {x2, x̄3, x̄5, x6}, then X* and Y* are adjacent in H, since
one obtains Y* from X* by switching the values of the second, third, and fifth
variables, and the literals x2, x̄3, x̄5 all belong to the strong component.

Theorem 5.21. The foregoing construction always produces a median graph, and
all median graphs can be obtained in this way.

This result follows from work of Schaefer [807]; see also Bandelt and Chepoi
[50] and Feder [323]. An interesting “closure” property can be derived from it.
(This property is in fact a restatement of the characterization of quadratic functions
given in Section 5.3.2.)

Corollary 5.3. Let X, Y, Z be any three solutions of a quadratic Boolean equation
in n variables. Let W be the point of B^n defined as follows: for each i = 1, 2, . . . , n,
the i-th component of W takes the value 1 if and only if the i-th components of at
least two out of the three vectors X, Y, Z take the value 1; that is, W is obtained
from these three vectors according to the majority rule (componentwise). Then W
also is a solution of the quadratic Boolean equation.
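
The majority-rule closure can be checked by brute force on a small instance. The following Python sketch does so for the equation (5.15) of Example 5.9; the encoding of literals as signed integers (v for x_v, -v for its complement) and all function names are assumptions made for this illustration.

```python
from itertools import product

# Terms of equation (5.15): each pair is a conjunction of two literals.
TERMS = [(1, -3), (1, -4), (-1, 2), (2, 4), (-2, -4), (3, -4)]
N = 4


def literal_value(lit, x):
    """Value of a literal at the 0-1 point x (a tuple indexed from 0)."""
    v = x[abs(lit) - 1]
    return v if lit > 0 else 1 - v


def is_solution(x):
    """True if every term of the DNF vanishes at x."""
    return all(literal_value(a, x) * literal_value(b, x) == 0 for a, b in TERMS)


def majority(x, y, z):
    """Componentwise majority of three 0-1 vectors."""
    return tuple(int(a + b + c >= 2) for a, b, c in zip(x, y, z))


solutions = [x for x in product((0, 1), repeat=N) if is_solution(x)]
closed = all(
    is_solution(majority(x, y, z))
    for x in solutions for y in solutions for z in solutions
)
print(len(solutions), closed)
```

Such an exhaustive check is feasible only for very small n, since it enumerates all 2^n points; it illustrates the statement of the corollary and does not replace its proof.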

The number of solutions of a quadratic Boolean equation

ϕ(x1 , x2 , . . . , xn ) = 0 (5.16)

may be exponentially larger than the number of its variables, and generating them
all is generally a prohibitive task. In fact, Valiant [883] proved that even determin-
ing the number of such solutions is #P-complete, and hence, probably very difficult.
It is perhaps worth mentioning here that merely counting the solutions is somewhat
“easier” than generating them; see  Dahlöf, Jonsson, and Wahlström [252]; Fürer
and Kasiviswanathan [354].
Feder [322, 323] proposed a generating algorithm, which we now sketch. For
ease of presentation, we assume that the quadratic equation given by (5.16) is pure
and Horn, that is, all its terms are quadratic and either positive (they involve only
uncomplemented variables) or mixed (they involve exactly one complemented and
one uncomplemented variable). This assumption is not restrictive, since, in view
of Theorem 5.19, every consistent purely quadratic Boolean equation can always
be cast into a Horn equation after some of its variables are renamed.
For every pair of Boolean variables xk, xj, the following equivalences hold:

xk x̄j = 0 if and only if xk ≤ xj,
xk xj = 0 if and only if xk ≤ x̄j.

Therefore, (5.16) can be rewritten (in more than one way) as a system of Boolean
implications of the form

xk ≤ xj for all xj ∈ Dk, (5.17)

xk ≤ x̄j for all x̄j ∈ Dk, (5.18)

where Dk ⊆ X ∪ X̄ for k = 1, 2, . . . , n.
We also assume, without loss of generality, that there are no forced variables
and no twin literals in the equation, since these can easily be detected and handled
in a preprocessing phase. As a consequence of our assumptions, the implications
(5.17)–(5.18) can be written in such a way that k < j when either xj ∈ Dk or
x̄j ∈ Dk.
Feder [322, 323] observed:
Theorem 5.22. Let X* ∈ B^n be a nonzero solution of (5.17)–(5.18), and let ℓ ≤ n
be such that xℓ* = 1 and xi* = 0 for 1 ≤ i < ℓ. Then, the point Y* obtained after
replacing xℓ* by 0 is again a solution of (5.17)–(5.18).
Proof. Because yℓ* = 0, the point Y* clearly satisfies all implications of the form
(5.17) for xj ∈ Dℓ, as well as all implications of the form (5.18) for x̄j ∈ Dℓ and for
x̄ℓ ∈ Dk. Moreover, when xℓ ∈ Dk, the implication (5.17) is necessarily satisfied
by Y* because k < ℓ, and hence, yk* = 0. □

Example 5.11. Consider the quadratic Boolean equation

$x_1 x_2 \vee x_1 x_3 \vee x_1 \bar{x}_5 \vee x_2 \bar{x}_7 \vee x_3 \bar{x}_4 \vee x_3 \bar{x}_8 \vee x_4 x_5 \vee x_4 \bar{x}_7 \vee x_5 x_6 \vee x_6 x_7 \vee x_6 \bar{x}_8 \vee x_7 x_8 = 0$. (5.19)

This equation is equivalent to the system of inequalities

$x_1 \le \bar{x}_2,\ x_1 \le \bar{x}_3,\ x_1 \le x_5,\ x_2 \le x_7,\ \ldots,\ x_7 \le \bar{x}_8$. (5.20)

Because $X^* = (0, 1, 0, 0, 1, 0, 1, 0)$ is a solution of the equation, we can deduce that
$Y^* = (0, 0, 0, 0, 1, 0, 1, 0)$ (obtained after replacing $x^*_2$ by 0) also is a solution.

We say that Y ∗ is the father of the solution X ∗ if Y ∗ and X ∗ are in the relation
described by Theorem 5.22. Note that every nonzero solution has exactly one
father. Consider now the digraph T = (S, A), where S is the set of solutions of
(5.17)–(5.18) (or, equivalently, of the quadratic equation ϕ = 0), and where an arc
(Y ∗ , X ∗ ) is in A if and only if Y ∗ is the father of X ∗ . Then, T defines an arborescence
rooted at the all-zero solution. Given any solution Y ∗ ∈ S, the children of Y ∗ in T
can easily be generated: If yj∗ is the first nonzero component of Y ∗ , then the children
of Y ∗ are exactly the points of the form X∗ = Y ∗ ∨ ei , i < j , such that X ∗ ∈ S. It
follows that the arborescence T can be generated and traversed efficiently (in fact,
with polynomial delay; see Appendix B.8).
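
The traversal of the arborescence T can be sketched in a few lines of Python. This is a naive illustration under stated assumptions, not Feder's low-complexity implementation: the names `TERMS`, `is_solution`, `children`, `generate_solutions`, and `brute_force_solutions` are hypothetical, literals are encoded as signed integers (`-5` stands for the complement of x5), and the instance is the equation (5.19) of Example 5.11, read with complemented variables where the term is mixed.

```python
from itertools import product

N = 8
# Terms of (5.19); a negative integer denotes a complemented variable (assumed encoding).
TERMS = [(1, 2), (1, 3), (1, -5), (2, -7), (3, -4), (3, -8),
         (4, 5), (4, -7), (5, 6), (6, 7), (6, -8), (7, 8)]


def literal_value(lit, x):
    """Value of a literal at the 0/1 point x (x[i-1] is the value of x_i)."""
    v = x[abs(lit) - 1]
    return v if lit > 0 else 1 - v


def is_solution(x):
    """True if every term of the DNF vanishes at x."""
    return all(literal_value(a, x) * literal_value(b, x) == 0 for a, b in TERMS)


def children(y):
    """Solutions obtained from y by setting to 1 one component before its first 1."""
    first_one = y.index(1) if 1 in y else N
    for i in range(first_one):
        x = y[:i] + (1,) + y[i + 1:]
        if is_solution(x):
            yield x


def generate_solutions():
    """Depth-first traversal of the arborescence rooted at the all-zero solution."""
    stack = [(0,) * N]
    while stack:
        y = stack.pop()
        yield y
        stack.extend(children(y))


def brute_force_solutions():
    """Reference enumeration over all 2^N points, used only as a cross-check."""
    return {x for x in product((0, 1), repeat=N) if is_solution(x)}


solutions = list(generate_solutions())
```

Each solution is produced exactly once because it has a unique father; the cost per solution in this sketch is O(nm), whereas the bound of Theorem 5.23 requires the more careful data structures described by Feder.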
Feder [322, 323] describes a low-complexity implementation of this procedure.

Theorem 5.23. The solutions of a quadratic equation with n variables and m
terms can be generated after O(m) preprocessing time in O(n) time per solution,
using O(m) space.

Proof. We refer the reader to Feder [322, 323] for details of the analysis. 

5.7.2 Parametric solutions


In spite of the high complexity of generating the solutions of a quadratic Boolean
equation, Crama, Hammer, Jaumard, and Simeone [234] showed that one can
obtain a concise product-form parametric representation for all such solutions.
The representation uses no more than n free Boolean parameters for an equation
in n variables. Each variable (or its complement) is expressed as a product of
these parameters or their complements, and these expressions provide a complete
description of the solution set of the equation. Furthermore, the representation can
be computed in $O(n^3)$ time.
In fact, algebraic methods for determining parametric representations in the
case of general Boolean equations have been known for a long time (see Löwen-
heim [628, 629] and Section 2.11.3). When specialized to quadratic equations,
Löwenheim’s method produces (in polynomial time) a parametric representation
of the solution set, each variable being associated with some Boolean expression
of the parameters. The resulting expressions are generally in neither disjunctive
nor conjunctive normal form, and reducing them to such a convenient format can
be computationally expensive. This is to be contrasted with the very simple form
of the representation proposed by Crama et al. [234].
Let us sketch the basic ideas leading to this parametric representation. As in
the previous section, we assume that the quadratic equation is represented by the
system of Boolean implications (5.17)–(5.18), where $x_k \notin D_k$, $\bar{x}_k \notin D_k$, $D_k$ does not
contain both a variable and its complement, and $x_k \notin D_j$ when $x_j \in D_k$, for $k, j \in
\{1, \ldots, n\}$ (otherwise, the equation can be simplified). The system (5.17)–(5.18) is
in turn equivalent to the following one:



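$x_k \le \prod_{j:\, x_j \in D_k} x_j \prod_{j:\, \bar{x}_j \in D_k} \bar{x}_j \quad (k = 1, 2, \ldots, n)$, (5.21)

and hence, also to the system of equations:

$x_k = x_k \prod_{j:\, x_j \in D_k} x_j \prod_{j:\, \bar{x}_j \in D_k} \bar{x}_j \quad (k = 1, 2, \ldots, n)$. (5.22)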

In the remainder of this section, we focus on the equivalent expression (5.22) of
the original quadratic equation.

The expression (5.22) suggests the following construction. Let $P = (p_1, p_2, \ldots, p_n)$
denote a vector of free Boolean parameters, and define the functions



$g_k(P) = g_k(p_1, \ldots, p_n) = p_k \prod_{j:\, x_j \in D_k} p_j \prod_{j:\, \bar{x}_j \in D_k} \bar{p}_j$ (5.23)

for k = 1, 2, . . . , n. Let

$Q = \{(g_1(P), \ldots, g_n(P)) : P \in \{0, 1\}^n\}$. (5.24)

Then, we can prove:

Lemma 5.4. If S is the set of solutions of the system (5.22), and if Q is defined by
(5.23)–(5.24), then S ⊆ Q.

Proof. If $(x^*_1, \ldots, x^*_n) \in S$, then $x^*_k = g_k(x^*_1, \ldots, x^*_n)$ for $k = 1, 2, \ldots, n$ in view of
(5.22). Hence, $(x^*_1, \ldots, x^*_n) \in Q$.

The next proposition states a necessary and sufficient condition under which
equality holds between S and Q. We first introduce some additional notation.
With the system (5.22), we associate the directed graph $H = (X \cup \bar{X}, A)$, defined
as follows: For all $x_k$ in $X$ and $\mu$ in $X \cup \bar{X}$, the arc $(x_k, \mu)$ is in A if and only
if $\mu \in D_k$. (H is in general a subgraph of the implication graph of the original
equation (5.16).)

Theorem 5.24. If S is the set of solutions of the system (5.22), and if Q is defined
by (5.23)–(5.24), then S = Q if and only if the digraph H is transitive.

Proof. Assume in the first place that H is transitive. By Lemma 5.4, we only have
to prove that every point $(g_1(P), \ldots, g_n(P))$ in Q is a solution of (5.17)–(5.18).
Let $\bar{x}_j \in D_k$. If $g_k(P) = 1$, then $p_j = 0$, and hence, $g_j(P) = 0$. This shows that
the implications (5.18) are satisfied by $(g_1(P), \ldots, g_n(P))$.
Let $x_j \in D_k$. If $g_j(P) = 0$, then either (i) $p_j = 0$, or (ii) $p_i = 0$ for some i such
that $x_i \in D_j$, or (iii) $p_i = 1$ for some i such that $\bar{x}_i \in D_j$. In case (ii), $x_i \in D_k$ by
transitivity of H. Similarly, in case (iii), $\bar{x}_i \in D_k$. Hence, in all cases, $g_k(P) = 0$,
and the implications (5.17) are satisfied by $(g_1(P), \ldots, g_n(P))$.
Conversely, assume that H is not transitive. This means that, for some $x_k, x_j \in X$
and $\mu \in X \cup \bar{X}$, $(x_k, x_j)$ and $(x_j, \mu)$ are in A, but $(x_k, \mu)$ is not in A. Assume for
instance that $\mu \in X$, that is, $\mu = x_i$ for some $i \in \{1, \ldots, n\}$ (the proof is similar if
$\mu \in \bar{X}$). So, $x_i \in D_j$, but $x_i \notin D_k$. Notice that $i \ne k$, by our assumptions on the
system (5.17)–(5.18).
Let $P = (p_1, \ldots, p_n)$, where $p_k = 1$, $p_i = 0$, $p_l = 1$ if $x_l \in D_k$ and $p_l = 0$ otherwise
(this is a valid assignment of values to the parameters). Then, $g_k(P) = 1$
and $g_j(P) = 0$. So $(g_1(P), \ldots, g_n(P))$ is not a solution of (5.17)–(5.18) and
$S \ne Q$.

So, when H is transitive, the expressions $g_k(P)$ ($k = 1, 2, \ldots, n$) defined by
(5.23) yield a simple, product-form parametric representation of the solutions of
(5.22), and hence of the original equation (5.16). Notice that, even if H is not
transitive, (5.22) can always be transformed into an equivalent system for which
the associated graph is transitive, by adding to it the necessary missing terms. More
precisely, if xk ≤ xj and xj ≤ µ are two inequalities in the system (5.17)–(5.18),
then the inequality xk ≤ µ is redundant, and it can always be added to the system.
Iterating this operation until the resulting graph is transitive amounts to computing
the transitive closure of H (see Section 5.8).
Crama et al. [234] rely on these ideas and on the properties of implication graphs
to derive an efficient algorithm with complexity $O(\max\{m, n^3\})$ that computes a
product-form parametric representation for an arbitrary quadratic equation. We
refer to their paper for details.
Example 5.12. Consider again the quadratic Boolean equation (5.19), which is
equivalent to the system of inequalities (5.20). The digraph H associated with
the system (5.20) is represented in Figure 5.16. The transitive closure H ∗ of H
is displayed in Figure 5.17. (At this point, we can notice that $x_3$ must be equal to
zero in all solutions of (5.19), because $x_8$ and $\bar{x}_8$ are successors of $x_3$ in $H^*$.)
Using Theorem 5.24, we derive the following product-form parametric
representation of the solutions of (5.19):
$x_1 = p_1 \bar{p}_2 \bar{p}_3 p_5 \bar{p}_6$,
$x_2 = p_2 p_7 \bar{p}_8$,
$x_3 = 0$,
$x_4 = p_4 \bar{p}_5 p_7 \bar{p}_8$,
$x_5 = p_5 \bar{p}_6$,
$x_6 = p_6 \bar{p}_7 p_8$,
$x_7 = p_7 \bar{p}_8$,
$x_8 = p_8$.

Figure 5.16. The digraph H associated with (5.20) in Example 5.12.

Figure 5.17. The transitive closure of H in Example 5.12.

The reader can check that all solutions of (5.19) are generated by giving all possible
0–1 values to the parameters p1 , p2 , . . . , p8 .
Note also that, since the system of inequalities (5.20) is not uniquely defined by
(5.19), it is possible to derive from Theorem 5.24 several product-form parametric
representations of the solutions of (5.19). 
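
The claim of Example 5.12 can be checked mechanically on this small instance. The following sketch is only an illustration: it assumes that the elided part of (5.20) is obtained term by term from (5.19) (a positive term $x_k x_j$ giving $x_k \le \bar{x}_j$, a mixed term $x_k \bar{x}_j$ giving $x_k \le x_j$), it encodes literals as signed integers, and all function names are hypothetical. It builds the sets $D_k$, closes them transitively, and compares the set Q of (5.24) with the solution set S, both before and after the closure.

```python
from itertools import product

N = 8
# Terms of (5.19); a negative integer denotes a complemented variable (assumed encoding).
TERMS = [(1, 2), (1, 3), (1, -5), (2, -7), (3, -4), (3, -8),
         (4, 5), (4, -7), (5, 6), (6, 7), (6, -8), (7, 8)]


def literal_value(lit, x):
    """Value of a literal at the 0/1 point x (x[i-1] is the value of x_i or p_i)."""
    v = x[abs(lit) - 1]
    return v if lit > 0 else 1 - v


def solution_set():
    """The set S, computed by exhaustive enumeration."""
    return {x for x in product((0, 1), repeat=N)
            if all(literal_value(a, x) * literal_value(b, x) == 0 for a, b in TERMS)}


def successor_sets():
    """D[k] holds the literals mu such that x_k <= mu appears in (5.17)-(5.18)."""
    D = {k: set() for k in range(1, N + 1)}
    for k, lit in TERMS:          # the term x_k * lit = 0 reads x_k <= complement(lit)
        D[k].add(-lit)
    return D


def transitive_closure(D):
    """Add x_k <= mu whenever x_k <= x_j and x_j <= mu, until nothing changes."""
    D = {k: set(s) for k, s in D.items()}
    changed = True
    while changed:
        changed = False
        for k in D:
            for lit in list(D[k]):
                if lit > 0 and not D[lit] <= D[k]:
                    D[k] |= D[lit]
                    changed = True
    return D


def parametric_points(D):
    """The set Q of (5.24): all points (g_1(P), ..., g_n(P)) for P in {0,1}^n."""
    return {tuple(int(p[k - 1] == 1 and all(literal_value(lit, p) == 1 for lit in D[k]))
                  for k in range(1, N + 1))
            for p in product((0, 1), repeat=N)}


S = solution_set()
D = successor_sets()
D_star = transitive_closure(D)
Q_raw = parametric_points(D)         # H is not transitive here
Q_closed = parametric_points(D_star)
```

In this sketch `Q_closed` coincides with `S`, while `Q_raw` contains points that are not solutions, in line with Theorem 5.24. The set `D_star[1]` contains the literals $\bar{x}_2, \bar{x}_3, x_5, \bar{x}_6$, which yields the expression given for $x_1$.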

5.7.3 Maximum 2-satisfiability


For a quadratic DNF ϕ on B n , the maximum 2-satisfiability problem, or Max
2-Sat, consists in finding a point X∗ ∈ Bn that cancels the maximum number
of terms of ϕ. Of course, if the quadratic equation ϕ = 0 is consistent, then any
false point of ϕ is a solution of Max 2-Sat. In contrast with quadratic Boolean
equations, however, Max 2-Sat is an NP-hard problem, and remains hard even if
we are only interested in finding a “provably good” approximate solution of the
problem. This optimization problem was discussed extensively in Section 2.11.4.
We refer to this section for more information and references on Max 2-Sat; see
also Chapter 13 for a brief discussion of quadratic binary optimization problems
placed in the broader framework of pseudo-Boolean optimization problems.
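
To make the definition concrete, here is a minimal exhaustive-search sketch of Max 2-Sat on a tiny instance. It is exponential in n and is in no way one of the methods discussed in Section 2.11.4; the function names and the instance are hypothetical, and literals are encoded as signed integers.

```python
from itertools import product


def cancelled_terms(terms, x):
    """Number of quadratic terms that take the value 0 at the point x."""
    def value(lit):
        v = x[abs(lit) - 1]
        return v if lit > 0 else 1 - v
    return sum(1 for a, b in terms if value(a) * value(b) == 0)


def max_2sat_brute_force(terms, n):
    """Return a point cancelling the maximum number of terms, and that number."""
    best = max(product((0, 1), repeat=n), key=lambda x: cancelled_terms(terms, x))
    return best, cancelled_terms(terms, best)


# Hypothetical inconsistent instance: x1 x2 v x1 ~x2 v ~x1 x2 v ~x1 ~x2 = 0.
terms = [(1, 2), (1, -2), (-1, 2), (-1, -2)]
point, count = max_2sat_brute_force(terms, 2)
```

Every point makes exactly one of the four terms equal to 1, so at most three terms can be cancelled simultaneously.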

5.7.4 On-line quadratic equations


In some applications, rather than a single quadratic Boolean equation (or 2-Sat
problem), one is required to solve a nested sequence of m equations, where each
formula is the disjunction of the previous one with an additional term of degree 2
(the initial formula is void and represents the constant 0). This problem is called an
on-line quadratic equation, or on-line 2-Sat. Note, in particular, the following:
• The on-line model is quite natural in interactive environments.
• It leads to an early detection of inconsistency at its very onset.
• As soon as the equation becomes inconsistent, the removal of the last term
immediately restores consistency.
Clearly, on-line quadratic equations can be solved in $O(1 + 2 + \cdots + m) = O(m^2)$
time by a naive approach. The main idea of an on-line algorithm, however,

is to update at each step a suitable data structure that keeps track of the work
done so far and allows us to solve the whole sequence of problems with less
computational effort. In this case, the classical worst-case analysis of the cost of
a single operation may not be adequate to analyze the cost of the whole sequence
of operations, and amortized complexity arguments are more appropriate. For a
general discussion of amortized complexity, see Tarjan [859].
For an on-line equation involving n variables and m terms, Jaumard, Marchioro,
Morgana, Petreschi, and Simeone [528] present an algorithm running in (amor-
tized) O(n) time per term, and hence, in overall O(mn) time. For each formula
in the nested sequence, not only does the algorithm check whether the formula is
consistent or not, but it also yields an explicit solution, if any, and detects the sets
of forced and twin (or identical) variables.
One can hardly conceive on-line algorithms with lower complexity, since simply
writing out the solutions to m equations already requires O(mn) time. For details,
we refer to the paper by Jaumard et al. [528].

5.8 Prime implicants and irredundant forms


5.8.1 Introduction
In this section, we consider the following two problems (recall the definitions of
prime implicants and irredundant DNF from Section 1.7):

(1) Given a quadratic DNF ϕ of a quadratic Boolean function f , find all prime
implicants of f .
(2) Given a quadratic DNF ϕ of a quadratic Boolean function f , find an
irredundant DNF of f .

Because all the prime implicants of a quadratic Boolean function in n variables are
quadratic, their number is $O(n^2)$; moreover, as we mentioned in Section 5.6.1, the
consensus method, starting from ϕ, generates all of them in polynomial time
(actually, in time $O(n^6)$). Similar conclusions follow from Theorem 3.9 and Corollary
3.6 in Chapter 3.
However, much faster algorithms can be obtained on the basis of the close
relationship that exists between the generation of all prime implicants of f and
the generation of the transitive closure of a digraph. As we show in Section 5.8.2,
the prime implicants of f can be easily obtained from the transitive closure of the
implication graph of ϕ.
The disjunction of all the prime implicants of a Boolean function f is, in a
sense, the most detailed and explicit DNF of f : Along with each pair of terms
it explicitly features their consensus (or some term absorbing it); so, all logical
implications derivable from those appearing in the DNF are themselves featured
in the DNF. At the opposite extreme, irredundant DNFs are the most succinct and
implicit DNFs of f : No consensus of pairs of terms appearing in any such DNF is

also present in it, and the logical implications derivable from those appearing in
the DNF are implicitly, rather than explicitly, present.
A polynomial bound can be derived for the complexity of finding an irredundant
DNF of a quadratic Boolean function f , starting from an arbitrary quadratic DNF
of f. This bound can be estimated as follows: Generate in $O(n^6)$ time, as earlier,
the disjunction ψ of all prime implicants of f. Choose any term T of ψ and check
in $O(n^2)$ time whether T is an implicant of the DNF ψ′ resulting from the deletion
of T in ψ (as, e.g., in Theorem 3.8 of Chapter 3). If so, then T is redundant and
ψ can be replaced by ψ′; otherwise, ψ remains unchanged. At this point, choose
another term T and repeat. The process ends when all terms have been checked
for redundancy, and possibly deleted. Since the number of terms in ψ is $O(n^2)$,
the overall complexity of the foregoing procedure is $O(n^6)$, again a polynomial
bound.
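
The deletion procedure just described can be sketched as follows. This is an illustration only: the implicant test is done here by exhaustive enumeration (exponential in n) rather than by the $O(n^2)$ test referred to above, the function names are hypothetical, literals are signed integers, and the input is the list of prime implicants obtained later in Example 5.13.

```python
from itertools import product


def term_value(term, x):
    """True if every literal of the term is true at x (x[i-1] is the value of x_i)."""
    return all(x[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in term)


def is_implicant(term, dnf, n):
    """Exhaustive check that the term implies the DNF (assumed stand-in test)."""
    return all(any(term_value(t, x) for t in dnf)
               for x in product((0, 1), repeat=n) if term_value(term, x))


def irredundant_dnf(dnf, n):
    """Scan the terms once, deleting each term that is implied by the remaining ones."""
    current = list(dnf)
    for term in dnf:
        rest = [t for t in current if t != term]
        if is_implicant(term, rest, n):
            current = rest
    return current


# Prime implicants of the function of Example 5.13: x1, x2 x3, x2 x4, ~x3 x4.
psi = [(1,), (2, 3), (2, 4), (-3, 4)]
chi = irredundant_dnf(psi, 4)
```

On this input the term x2 x4 is deleted, since it is the consensus of x2 x3 and of the term with complemented x3; the three remaining terms are all needed.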
However, much faster algorithms can be obtained for this problem, too. As
mentioned above, the graph-theoretic tool of choice for the generation of all prime
implicants of a quadratic Boolean function f is the transitive closure of the impli-
cation digraph. On the other hand, as we show in Section 5.8.4, the appropriate
notion for the generation of an irredundant quadratic DNF of f is that of transitive
reduction of a digraph – just the converse of the transitive closure.

5.8.2 A transitive closure algorithm for finding all prime implicants


Let $D_\varphi$ be the implication graph associated with the quadratic DNF ϕ. An elementary,
but important, property of $D_\varphi$ is that if $\xi\bar{\eta}$ and $\eta\bar{\zeta}$ are any two terms for which
there is a consensus $\xi\bar{\zeta}$, then the corresponding arcs $(\xi, \eta)$, $(\eta, \zeta)$, and $(\xi, \zeta)$ form
a transitive triplet, as shown in Figure 5.18. The arc $(\xi, \zeta)$ is present in $D_\varphi$ if and
only if the consensus $\xi\bar{\zeta}$ appears in ϕ. Analogous statements hold for the “mirror”
arcs $(\bar{\zeta}, \bar{\eta})$, $(\bar{\eta}, \bar{\xi})$, and $(\bar{\zeta}, \bar{\xi})$.
Recall from Appendix A that the transitive closure of a digraph $D = (V, A)$
is the digraph $D^* = (V, A^*)$, where $A^* = A \cup \{(u, v)$ : there is
a directed path from u to v in $D\}$. Each consensus operation can be interpreted on
$D_\varphi$ as the addition of two mirror transitive arcs, and vice versa. Hence, in the
transitive closure $D^*_\varphi$ of $D_\varphi$, each pair $(\alpha, \bar{\beta})$ and $(\beta, \bar{\alpha})$ of mirror arcs corresponds
to a quadratic implicant $\alpha\beta$ if $\alpha \ne \beta$, and to a linear implicant $\alpha$ if $\alpha = \beta$. Some of
the quadratic implicants associated with arcs of $D^*_\varphi$ may not be prime, since they
might be absorbed by linear ones. However, it follows from Theorem 5.6 that all
prime implicants must correspond to some pair of arcs of $D^*_\varphi$.
The obvious idea for generating all prime implicants of the quadratic DNF ϕ,
then, is to compute the transitive closure Dϕ∗ and to efficiently perform absorption
in order to remove nonprime quadratic implicants. The operation of absorption
also has a simple interpretation on $D^*_\varphi$. Suppose that the linear term $\xi$ absorbs the
quadratic term $\xi\eta$. Then, the arcs $(\xi, \bar{\xi})$, $(\xi, \bar{\eta})$, and $(\eta, \bar{\xi})$ have to be present in $D^*_\varphi$
(see Figure 5.19).

Figure 5.18. Transitive arcs in the implication graph.

Figure 5.19. Absorption in the implication graph.

Therefore, absorption can be performed directly on $D^*_\varphi$ by application of the
following rule: Whenever an arc $(\xi, \bar{\xi})$ is present, remove all arcs leaving $\xi$ (except
for $(\xi, \bar{\xi})$), as well as all arcs entering $\bar{\xi}$ (again, except for $(\xi, \bar{\xi})$).
A survey on transitive closures is given in van Leeuwen [887]. Most of the
known transitive closure algorithms are of two kinds.

1) Algorithms that perform a sequence of transitive arc additions.
The O(mn) algorithms of Goralcikova and Koubek [404], Ebert [286],
Schmitz [810], Jaumard and Minoux [529], Chen and Cooke [191] belong
to this class (for some of these algorithms, stronger complexity bounds hold,
depending also on size parameters other than n and m).

2) Algorithms based on Boolean matrix multiplication.
A straightforward implementation results in an $O(n^3)$ transitive closure
algorithm (Warshall [899]). Strassen-like matrix multiplication methods
typically achieve complexities of $O(n^{2+\alpha} \log n)$, where $0 < \alpha < 1$; see
Furman [355], Fischer and Meyer [331], Munro [697], Booth [104],
Coppersmith and Winograd [212].

Munro [697] was apparently first to point out that, when computing the transitive
closure of a digraph D, one may assume, without loss of generality, that D is
connected (in the sense that its underlying undirected graph is connected) and
acyclic. As a matter of fact, if D is disconnected, its transitive closure D ∗ is the
union of the transitive closures of the connected components of D. If D has cycles,
then one can preliminarily find the strong components of D by the O(m) algorithm
of Tarjan [858], and subsequently generate the acyclic condensation D̂ of D by
shrinking each strong component into a single supervertex. Once D̂ ∗ has been
computed, D ∗ can be obtained as follows:
Let $A^*$ and $\hat{A}^*$ be the arc sets of $D^*$ and $\hat{D}^*$, respectively. Then,

$(x, y) \in A^* \iff$ there exists $(u, v) \in \hat{A}^*$ such that x belongs to the strong
component of D represented by u, and y belongs to the strong component of D
represented by v. (5.25)

Let $n_k$ be the number of vertices in the kth strong component of D, $k = 1, \ldots, r$; let
$\hat{n}$ and $\hat{m}$ be the number of vertices and arcs of $\hat{D}$, respectively. Besides the $O(\hat{m}\hat{n})$
operations required to generate $\hat{D}^*$, one needs

$\sum_{(i,j) \in \hat{A}^*} n_i n_j \le (n_1 + \cdots + n_r)^2 = n^2$

elementary operations to compute $A^*$, according to (5.25). But $\hat{n} \le n$, $\hat{m} \le m$, and
$m \ge n - 1$ under the assumption that D is connected. It follows that $D^*$ can be
computed in $O(\hat{m}\hat{n} + n^2)$ time and thus in $O(mn)$ time. We state in Figure 5.20
a formal description of a transitive closure algorithm for the generation of all the
prime implicants of a quadratic Boolean function f.
Clearly Step 2 can be implemented in O(mn) time, and this is also the overall
complexity of the algorithm. From the discussion at the beginning of this section,
we obtain the following results:

Theorem 5.25. The algorithm Quadratic Prime Implicants is correct, that is,
it produces all prime implicants of the quadratic Boolean function f represented
by the input DNF ϕ.

Example 5.13. Let f be the quadratic Boolean function represented by the DNF

$\varphi = x_1 \bar{x}_2 \vee x_1 \bar{x}_3 \vee x_2 x_3 \vee \bar{x}_3 x_4$.

Procedure Quadratic Prime Implicants(ϕ)
Input: A quadratic DNF ϕ.
Output: All prime implicants of the quadratic Boolean function f represented by ϕ.

begin
Step 1: construct the implication graph $D_\varphi$;
Step 2: run a transitive closure algorithm on the input $D_\varphi$;
        let $H = D^*_\varphi$ be the (transitive) graph obtained at the end of this step;
Step 3: for each arc $(\xi, \bar{\xi})$ in H, remove all arcs leaving $\xi$ (except $(\xi, \bar{\xi})$)
        and all arcs entering $\bar{\xi}$ (except $(\xi, \bar{\xi})$); let Q be the resulting digraph;
Step 4: if there is a pair of arcs $(\xi, \bar{\xi})$, $(\bar{\xi}, \xi)$ in Q, then the Boolean constant $1_n$
        is the only prime implicant of ϕ;
        else
          for each arc $(\xi, \bar{\xi})$ in Q, the linear term $\xi$ is a prime implicant of ϕ;
          for each pair of mirror arcs $(\xi, \bar{\eta})$ and $(\eta, \bar{\xi})$, the quadratic term $\xi\eta$
          is a prime implicant of ϕ;
Step 5: return the list of prime implicants constructed in Step 4.
end

Figure 5.20. Procedure Quadratic Prime Implicants.

Figure 5.21. The implication graph Dϕ .

The implication graph Dϕ is shown in Figure 5.21; the graphs H and Q are shown
in Figures 5.22 and 5.23, respectively. It follows that the disjunction of all the
prime implicants of f is given by

$x_1 \vee x_2 x_3 \vee x_2 x_4 \vee \bar{x}_3 x_4$.

This can also be checked by the consensus method. 
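
For readers who wish to experiment, the procedure of Figure 5.20 can be sketched in Python as follows. This is only an illustration under stated assumptions: literals are encoded as signed integers (`-3` stands for the complement of x3), the function name is hypothetical, the closure in Step 2 is a plain cubic Warshall-style loop rather than one of the O(mn) algorithms cited above, and the DNF of Example 5.13 is read with the complemented variables that the prime implicants listed above require.

```python
def quadratic_prime_implicants(terms):
    """Prime implicants of the function given by a quadratic DNF.

    Each term is a tuple of one or two nonzero integers; -i denotes the
    complement of x_i (assumed encoding). Each prime implicant is returned as
    a frozenset of literals; the constant 1 is returned as {frozenset()}.
    """
    # Step 1: the term ab yields the mirror arcs (a, -b) and (b, -a);
    # a linear term a yields the single arc (a, -a).
    arcs = set()
    for term in terms:
        a, b = (term[0], term[0]) if len(term) == 1 else term
        arcs.add((a, -b))
        arcs.add((b, -a))
    vertices = {v for arc in arcs for v in arc}
    vertices |= {-v for v in vertices}

    # Step 2: transitive closure (Warshall-style triple loop).
    for w in vertices:
        for u in vertices:
            if (u, w) in arcs:
                for v in vertices:
                    if (w, v) in arcs:
                        arcs.add((u, v))

    # Step 3: absorption. A literal u is a linear implicant if (u, -u) is an arc;
    # drop the other arcs leaving u and the other arcs entering -u.
    linear = {u for (u, v) in arcs if v == -u}
    kept = {(u, v) for (u, v) in arcs
            if v == -u or (u not in linear and -v not in linear)}

    # Step 4: read the prime implicants off the remaining arcs.
    if any(-u in linear for u in linear):
        return {frozenset()}
    primes = set()
    for u, v in kept:
        if v == -u:
            primes.add(frozenset({u}))
        elif u != v:                      # loops do not correspond to terms
            primes.add(frozenset({u, -v}))
    return primes


# Example 5.13: x1 ~x2 v x1 ~x3 v x2 x3 v ~x3 x4.
phi = [(1, -2), (1, -3), (2, 3), (-3, 4)]
primes = quadratic_prime_implicants(phi)
```

On the DNF of Example 5.13, the sketch returns the four terms of the disjunction displayed above.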



Figure 5.22. The graph H ; dashed lines represent arcs added to Dϕ .

Figure 5.23. The graph Q.

5.8.3 A restricted consensus method and its application to computing the transitive closure of a digraph
In the present subsection, following a direction opposite to the previous one, we
show how to obtain a fast and simple O(mn) algorithm for the transitive clo-
sure of a digraph G through the execution of a very restricted form of consensus
algorithm on a (quadratic) mixed DNF naturally associated with G. Unlike other
transitive closure algorithms with the same complexity, this one has a very sim-
ple implementation and does not require complex data structures. The material in
this subsection is drawn from recent work of Boros, Foldes, Hammer, and Sime-
one [120]. We refer to Section 3.2.2 and Section 6.5 for the general notions of
disengagement consensus and input consensus, respectively.

Definition 5.1. A consensus algorithm is said to be an input disengagement
algorithm if it is both an input algorithm and a disengagement algorithm.

Whether an input disengagement algorithm works or not for a given (quadratic)
mixed Boolean function may actually depend on the disengagement order of the
terms of its DNF representation. However, we will prove that, for an arbitrary
mixed Boolean function f , there always exists some input disengagement algo-
rithm that works for f . Before giving examples, let us work out a graph-theoretic
framework which makes things easier to visualize and, as an additional bonus,
leads to an efficient transitive closure algorithm. We recall from Section 5.4.1 that,
for a mixed DNF ϕ, one can define a directed graph G ≡ G(ϕ) – not to be confused
with the implication graph Dϕ – as follows:

$x_i$ is a vertex of G $\iff$ $x_i$ is a variable of ϕ, (5.26)

$(x_i, x_j)$ is an arc of G $\iff$ $x_i \bar{x}_j$ is a term of ϕ. (5.27)

Conversely, given an arbitrary digraph G, one can associate with G a mixed DNF
ϕ ≡ ϕ(G) by simply reading the double implications (5.26) and (5.27) from left
to right. Two terms in ϕ, say $x\bar{y}$ and $u\bar{v}$, have a consensus only in two
cases:

(a) $u = y$: then their consensus is $x\bar{v}$;
(b) $x = v$: then their consensus is $u\bar{y}$.

Thus, the consensus of any two mixed terms is still mixed. A graph-theoretic
interpretation in G of cases (a) and (b) is provided by Figures 5.24 (a) and (b),
respectively. As in the case of implication graphs, here, too, an elementary con-
sensus operation corresponds to a transitive arc addition, and vice versa. Observe
that in the context of mixed DNFs, absorption is trivial. In fact, since linear terms
can never be generated by consensus in this case, a quadratic mixed term can
be absorbed only by itself; that is, it is absorbed only if it is already present in
the current list of terms. Accordingly, any consensus algorithm whose input is a
mixed DNF ϕ can be interpreted as a transitive closure algorithm on the associated
digraph G, and vice versa (recall also Theorem 5.2).
Now we are ready to give a graph-theoretic description of a generic input
disengagement consensus algorithm. We assume that the algorithm directly takes
as input, instead of a mixed DNF, a digraph G = (V , E). As in Section 5.8.2, we
may assume, without loss of generality, that G is a connected directed acyclic
graph (a connected DAG).

Figure 5.24. Transitive arc additions.



Procedure Input Disengagement Consensus (G, ≺)
Input: A connected DAG G = (V , E) and a disengagement order ≺ on E.
Output: A DAG H = (V , F ), F ⊇ E.

begin
let F := E;
declare all arcs of E to be engaged;
while there is some engaged arc do { process arc a }
select the first (with respect to ≺) engaged arc a;
declare arc a to be disengaged;
let a = (h, k);
for each arc (p, h) ∈ F do add arc (p, k) to F (if missing);
for each arc (k, q) ∈ F do add arc (h, q) to F (if missing);
end while
return H = (V , F );
end

Figure 5.25. Procedure Input Disengagement Consensus.

Figure 5.26. Processing an arc in the Input Disengagement Consensus procedure (processed arc (h, k); added arcs (p, k) and (h, q)).

Definition 5.2. A disengagement order ≺ is any strict linear order on the arc set
of G.

A disengagement order is meant to represent the order in which the arcs
are disengaged in the Input Disengagement Consensus algorithm described
in Figure 5.25 (compare with the Disengagement Consensus procedure in
Figure 3.2 of Section 3.2.2, and see also Figure 5.26).
Does the digraph H = (V , F ) output by the Input Disengagement Consensus
procedure coincide with the transitive closure G∗ of G? The answer may depend
on the chosen disengagement order ≺, as illustrated by the following example:

Example 5.14. Consider the directed path $G = P_5$, and label its four arcs as
shown in Figure 5.27. If the disengagement order is $1 \prec 4 \prec 3 \prec 2$, then H is a
proper subgraph of $G^*$, since arc $(v_1, v_5)$ is missing (see Figure 5.28; here and
in Figure 5.29, all arcs are assumed to be directed from top to bottom; at each
iteration, the thick arc is the one that is being processed, and the dashed arcs are
the ones that are being added).
On the other hand, if the disengagement order is $1 \prec 3 \prec 4 \prec 2$, then $H = G^*$
(see Figure 5.29). Interestingly, in this case, only three iterations are needed in
order to generate $G^*$.

Figure 5.27. The dipath P5.
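
The two runs of Example 5.14 can be replayed with a direct transcription of the procedure of Figure 5.25. The sketch below is an illustration with hypothetical function names; vertices $v_1, \ldots, v_5$ are represented by the integers 1 to 5, and the reference closure is computed by a simple fixed-point loop.

```python
def input_disengagement_consensus(arcs, order):
    """Run the procedure of Figure 5.25 on the arc collection `arcs`.

    `order` lists the input arcs in the chosen disengagement order; only
    input arcs are processed, and added arcs are never engaged.
    """
    F = set(arcs)
    for h, k in order:
        predecessors = [p for (p, head) in F if head == h]
        successors = [q for (tail, q) in F if tail == k]
        F.update((p, k) for p in predecessors)
        F.update((h, q) for q in successors)
    return F


def transitive_closure(arcs):
    """Reference closure: add (u, w) whenever (u, v) and (v, w) are present."""
    closure = set(arcs)
    while True:
        new = {(u, w) for (u, v) in closure for (x, w) in closure if v == x} - closure
        if not new:
            return closure
        closure |= new


# The dipath P5 of Example 5.14; arc i is (v_i, v_{i+1}).
arc = {i: (i, i + 1) for i in range(1, 5)}
G_star = transitive_closure(arc.values())
H_bad = input_disengagement_consensus(arc.values(), [arc[i] for i in (1, 4, 3, 2)])
H_good = input_disengagement_consensus(arc.values(), [arc[i] for i in (1, 3, 4, 2)])
H_three = input_disengagement_consensus(arc.values(), [arc[i] for i in (1, 3, 4)])
```

With the first order, the only missing arc is $(v_1, v_5)$; with the second order, the closure is already complete after the arcs 1, 3, and 4 have been processed.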

Definition 5.3. A disengagement order ≺ is successful (for the digraph G) if the
input disengagement algorithm outputs $G^*$ when it runs on the input (G, ≺).
Can successful disengagement orders be characterized? Theorem 5.26 yields
some insights into this question, providing a full characterization in the case of
dipaths; this characterization proves useful in establishing our main Theorem 5.28.
Let us first introduce some preliminary definitions and notation. We denote
by Pn the standard dipath whose vertices are v1 , . . . , vn and whose arcs are
(v1 , v2 ), (v2 , v3 ), . . . , (vn−1 , vn ). We let m = n − 1 and label arc (vi , vi+1 ) as i, for
i = 1, 2, . . . , m.

Figure 5.28. H does not coincide with G∗ .

Figure 5.29. H = G∗ .

Definition 5.4. A disengagement order ≺ on the arc set $\{1, 2, \ldots, m\}$ of $P_n$ is said
to be
(i) monotone if
either $1 \prec 2 \prec \cdots \prec m$,
or $1 \succ 2 \succ \cdots \succ m$;
(ii) an N-order if
either $1 \prec 2$ and $2 \succ 3$ and $3 \prec 4 \prec \cdots \prec m$,
or (symmetrically) $1 \succ 2 \succ \cdots \succ m - 2$ and $m - 2 \prec m - 1$
and $m - 1 \succ m$;
(iii) a V-order if there exists an index i, $1 < i < m$, such that:

$h \succ h + 1$ for $h = 1, \ldots, i - 1$;
$h \prec h + 1$ for $h = i, \ldots, m - 1$;

(iv) a W-order if there exists an index i, $2 < i < m - 1$, such that:

$h \succ h + 1$ for $h = 1, \ldots, i - 2$;
$i - 1 \prec i$ and $i \succ i + 1$;
$h \prec h + 1$ for $h = i + 1, \ldots, m - 1$.

Notice that V-orders, N-orders and W-orders exist only when m ≥ 3, m ≥ 4, and
m ≥ 5, respectively. Monotone orders and N-orders are very easy to recognize.
One can recognize both V-orders and W-orders among all strict linear orders in
O(m) time by constructing the m-vector rank, whose components are defined
by rank(h) = r if and only if h is the r-th smallest element with respect to ≺
(h = 1, . . . , m), and comparing each component with the next one.
Clearly, for m ≤ 3, any linear order is a successful disengagement order for the
path Pm+1 . The following theorem yields several characterizations of successful
disengagement orders for m ≥ 4:
Theorem 5.26. Let Pn be the standard dipath on n vertices whose arcs are labeled
1, 2, . . . , m (where m = n − 1), and let ≺ be any disengagement order on the set
{1, 2, . . . , m}. Then, the following statements are equivalent for m ≥ 4:
(a) The disengagement order ≺ is successful for Pn .
(b) There are no $i < j < h < k$ such that $i \prec j$ and $h \succ k$.
(c) There is no i such that

either $i \prec i + 1$ and $i + 2 \succ i + 3$,
or $i \prec i + 1$ and $i + 3 \succ i + 4$.

(d) The disengagement order ≺ is either monotone, or an N-order, or a V-order,
or a W-order.
(e) There is an arc a such that, for each t = 1, 2, . . . , m − 1, At ∪ {a} induces
a subpath of Pn , where At consists exactly of the first t arcs of Pn with
respect to ≺.
Proof. See Boros et al. [120]. 
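
The equivalence between statements (a) and (b) of Theorem 5.26 can be checked exhaustively for small dipaths. The sketch below is an illustration only, with hypothetical function names: it runs the input disengagement procedure on $P_{m+1}$ for every linear order of the arcs and compares the outcome with condition (b), read as "no $i < j < h < k$ with i disengaged before j and k disengaged before h".

```python
from itertools import combinations, permutations


def closure_by_order(m, order):
    """Input disengagement consensus on the dipath P_{m+1}; arc i is (i, i + 1)."""
    F = {(i, i + 1) for i in range(1, m + 1)}
    for i in order:
        h, k = i, i + 1
        F |= {(p, k) for (p, head) in F if head == h}
        F |= {(h, q) for (tail, q) in F if tail == k}
    return F


def is_successful(m, order):
    """Statement (a): the procedure outputs the full transitive closure."""
    full = {(i, j) for i in range(1, m + 2) for j in range(i + 1, m + 2)}
    return closure_by_order(m, order) == full


def satisfies_condition_b(m, order):
    """Statement (b): no i < j < h < k with i before j and k before h."""
    rank = {a: r for r, a in enumerate(order)}
    return not any(rank[i] < rank[j] and rank[h] > rank[k]
                   for i, j, h, k in combinations(range(1, m + 1), 4))


def equivalence_holds(m):
    """Compare (a) and (b) over all m! disengagement orders."""
    return all(is_successful(m, order) == satisfies_condition_b(m, order)
               for order in permutations(range(1, m + 1)))
```

For instance, `satisfies_condition_b(4, (1, 4, 3, 2))` is false and `satisfies_condition_b(4, (1, 3, 4, 2))` is true, matching the two runs of Example 5.14.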

The following result is also worth mentioning:


Theorem 5.27. The minimum cardinality of a set of arcs to be disengaged in order
to generate the transitive closure of Pn is n − 2. For any successful disengagement
order and any arc a  as in Theorem 5.26(e), one can obtain one such minimum
cardinality set by moving a to the last rank in the order.

Proof. See Boros et al. [120]. For instance, in the second case of Example 5.14, it
is enough to disengage the arcs 1, 3, 4 in order to generate the transitive closure
of P5 (see Figure 5.29), but this is impossible with one or two arcs. 

The main result of this section can now be stated.

Theorem 5.28. For an arbitrary DAG G, there always exists a disengagement
order that is successful for G. Such a disengagement order can be found in time
O(m), where m is the number of arcs of G.

Proof. We label the arcs of G from 1 to m, as follows: The vertices of G are visited
in reverse topological order and for each vertex i, the arcs going into i are assigned
the highest previously unassigned labels (ties can be broken arbitrarily). This can
be done in time O(m) as in Tarjan [858].
Now, let ≺ be the disengagement order, defined by

i ≺ j ⇐⇒ label(i) < label(j ).

Since the arc labels are strictly increasing along each dipath P of G, ≺ induces
a monotone strict linear order on the arcs of each dipath. By Theorem 5.26, the
input disengagement algorithm running on the instance (G, ≺) must generate the
transitive closure of each maximal dipath of G. Since a DAG is transitively closed
if and only if each of its maximal dipaths is such, it follows that the DAG H
produced by the input disengagement algorithm must coincide with G∗ . 
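
The construction used in this proof can be sketched as follows. This is an illustration under stated assumptions: the function names and the small test DAG are hypothetical, the topological order is obtained from the standard `graphlib` module (Python 3.9 or later), and sorting the arcs by the topological position of their head is used as an equivalent of the labeling described above (it costs O(m log m) rather than O(m)).

```python
from graphlib import TopologicalSorter


def successful_order(vertices, arcs):
    """Arcs sorted by the topological position of their head vertex."""
    preds = {v: set() for v in vertices}
    for u, v in arcs:
        preds[v].add(u)
    position = {v: i for i, v in enumerate(TopologicalSorter(preds).static_order())}
    return sorted(arcs, key=lambda a: position[a[1]])


def input_disengagement_consensus(arcs, order):
    """The procedure of Figure 5.25; only input arcs are processed."""
    F = set(arcs)
    for h, k in order:
        F |= {(p, k) for (p, head) in F if head == h}
        F |= {(h, q) for (tail, q) in F if tail == k}
    return F


def transitive_closure(arcs):
    """Reference closure: add (u, w) whenever (u, v) and (v, w) are present."""
    closure = set(arcs)
    while True:
        new = {(u, w) for (u, v) in closure for (x, w) in closure if v == x} - closure
        if not new:
            return closure
        closure |= new


# Hypothetical connected DAG used only for illustration.
V = "abcdef"
E = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"),
     ("d", "e"), ("c", "f"), ("e", "f")]
order = successful_order(V, E)
H = input_disengagement_consensus(E, order)
```

Along every dipath the heads of the arcs appear in topological order, so the arcs of each dipath are disengaged monotonically, as required by the proof.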

The final result of this subsection concerns the complexity of the input
disengagement algorithm.

Theorem 5.29. The complexity of the Input Disengagement Consensus
algorithm is O(mn).

Proof. Since the algorithm is an input consensus one and since all arcs of G are
disengaged after processing, the algorithm consists of m stages, one for each arc
of G. At each stage, an arc (h, k) of G is processed: All its predecessors (p, h) and
all its successors (k, q) are examined and the arcs (p, k) and (h, q) added to the
current set F , provided that they are not already present. Since the initial digraph
G is acyclic and each transitive arc addition transforms a DAG again into a DAG,
no predecessor p of h can coincide with a successor q of k. Hence, for a fixed arc
(h, k), the number of all such vertices p and q is at most n − 2. Therefore, there
are at most m(n − 2) transitive arc additions, and the thesis follows. 

5.8.4 Irredundant normal forms and transitive reductions


We turn our attention to the second problem stated in Section 5.8.1: Given a
quadratic DNF ϕ of a quadratic Boolean function f , find an irredundant DNF of f .

We restrict ourselves to finding prime irredundant DNFs. Recall from Section 1.7
that a prime irredundant DNF of f has the following two properties:
• It is a disjunction of prime implicants of f .
• It does not have any redundant terms, that is, terms whose deletion results in
a shorter DNF representation of f .
Therefore, a natural algorithmic strategy for finding a prime irredundant DNF
is the following:
1) Generate all linear implicants of f from the input DNF ϕ.
2) If there are two linear implicants of the form ξ and ξ̄, then the constant 1_n is
the only prime implicant, and hence, also the only prime irredundant form
of f; stop.
3) Otherwise, perform all possible absorptions of quadratic terms by linear
ones. The resulting DNF χ is prime.
4) Check whether any term of χ is redundant.
Step 1 can be efficiently implemented as follows: To check whether the linear
term ξ is an implicant of f , assign to ξ the value 1 and deduce the values of as many
literals as possible, exactly as in the Labeling algorithm of Section 5.6.2. Then, ξ
is an implicant of f if and only if a conflict arises. Another efficient alternative
is to work on the implication graph Dϕ using Theorem 5.5 to check whether ξ is
forced to 0. Each of these two approaches takes O(mn) time.
Steps 2 and 3 are easy to implement.
An efficient implementation of Step 4 relies on the notion of transitive reduction.
A transitive reduction of a digraph D = (V, A) is any digraph D′ = (V, A′) such
that the transitive closure of D is equal to the transitive closure of D′, and such
that the cardinality of A′ is minimum with this property. In the case of acyclic
digraphs, the transitive reduction is unique and can be computed in polynomial
time (see Aho, Garey, and Ullman [10]).
Let Dχ be the implication graph of the DNF χ found in Step 3. At this point,
we may assume that no linear term is redundant. Also, we may assume, without
loss of generality, that Dχ is an acyclic digraph.
Lemma 5.5. A quadratic term ξη is redundant in χ if and only if, in the transitive
reduction of Dχ, the arcs (ξ, η̄) and (η, ξ̄) are both missing.
Proof. A term ξη of χ is redundant if and only if it can be obtained from the
remaining terms of χ through a sequence of consensus operations. In view of
the interpretation of consensus as a transitive arc addition, ξη is redundant if
and only if in Dχ there is a directed path of length at least two from ξ to η̄ (and
hence, also from η to ξ̄), that is, if and only if both arcs (ξ, η̄) and (η, ξ̄) are
missing in the transitive reduction of Dχ.

As a consequence of Lemma 5.5, one gets the following simple implementation
of Step 4: Build the implication graph Dχ and its transitive reduction D^r_χ. Delete
from χ all the quadratic implicants ξη such that both arcs (ξ, η̄) and (η, ξ̄) are
missing in D^r_χ.
The resulting DNF T is a prime irredundant DNF of f .
Aho, Garey, and Ullman [10] have shown that the transitive reduction of an
arbitrary DAG can be generated with the same order of complexity as its transitive
closure. In particular, O(mn) algorithms are available. Hence, an irredundant DNF
of f can be obtained within the same complexity.
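
The graph computation behind Step 4 can be sketched as follows. This is a minimal illustration, assuming the implication graph is acyclic and is given as a list of arcs; the function names `reachable` and `transitive_reduction` are hypothetical, and the simple search used here favors clarity over the O(mn) bound quoted above.

```python
from collections import defaultdict

def reachable(succ, start):
    """Return the set of vertices reachable from `start` by a nonempty dipath."""
    seen, stack = set(), list(succ[start])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(succ[v])
    return seen

def transitive_reduction(arcs):
    """Transitive reduction of a DAG given as an iterable of arcs (u, v)."""
    succ = defaultdict(set)
    for u, v in arcs:
        succ[u].add(v)
        succ[v]  # touch v so that every vertex gets an entry
    reach = {u: reachable(succ, u) for u in list(succ)}
    # Keep arc (u, v) unless v is reachable from another successor of u.
    return {(u, v) for u in succ for v in succ[u]
            if not any(v in reach[w] for w in succ[u] if w != v)}

# The arc ("a", "c") is implied by the dipath a -> b -> c and is dropped.
print(sorted(transitive_reduction([("a", "b"), ("b", "c"), ("a", "c")])))
```

In the setting of Lemma 5.5, a quadratic term whose two arcs are both dropped by this computation is a redundant term.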

5.9 Dualization of quadratic functions


(Contributed by Oya Ekin Karaşan)

5.9.1 Introduction
Several algorithmic problems related to dualization of Boolean functions were
introduced in Section 4.3. We now consider the following special case (recall that
the complete DNF of a Boolean function consists of the disjunction of all its prime
implicants):

Quadratic DNF Dualization
Instance: The complete DNF of a quadratic Boolean function f.
Output: The complete DNF of f^d.

We observe that for a quadratic function f, there is no serious loss of generality
from assuming that f is given by its complete DNF, rather than by an arbitrary
DNF representation. Indeed, fast algorithms can be used to generate all prime
implicants of f from any DNF, as explained in Section 5.8.
We should note, however, that there are quadratic Boolean functions f whose
dual f^d has exponentially more prime implicants than f. An example of such a
function is given (for even n) by the DNF

\[ \bigvee_{i=1}^{n/2} x_{2i-1} x_{2i}, \]

whose dual has 2^{n/2} prime implicants.
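
The count 2^{n/2} can be checked by brute force for small n. The sketch below relies on the standard fact that the prime implicants of the dual of a positive DNF are the minimal transversals of its terms; the function name `dual_prime_implicants` is hypothetical, and the enumeration is exponential, so it is meant only as an illustration.

```python
from itertools import combinations

def dual_prime_implicants(terms, n):
    """Minimal transversals of the term sets of a positive DNF on variables 1..n."""
    hitting = []
    for size in range(n + 1):  # increasing size, so minimal sets are found first
        for cand in combinations(range(1, n + 1), size):
            c = set(cand)
            if all(c & t for t in terms) and not any(h <= c for h in hitting):
                hitting.append(c)
    return hitting

n = 6
terms = [{2 * i - 1, 2 * i} for i in range(1, n // 2 + 1)]  # x1x2, x3x4, x5x6
duals = dual_prime_implicants(terms, n)
print(len(duals))  # 8, that is, 2 ** (n // 2)
```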


Since the output may be large, the question of interest again becomes designing
algorithms that run either with polynomial delay, in polynomial incremental time,
or in polynomial total time (see Appendix B).
Recall that the problem of dualizing a positive quadratic Boolean function was
mentioned in Chapter 4. There, it was noted that, due to its relationship with
the problem of generating all maximal stable sets of a graph, the problem can be
solved with polynomial delay (cf. also Exercise 7 of Chapter 4).
In fact, as shown by Ekin [303, 304], this relation can be further exploited
in order to develop a polynomial-delay algorithm for the dualization of general,
not necessarily positive, quadratic DNFs. We discuss this in more detail in the
following subsection.

5.9.2 The dualization algorithm


Let f be a quadratic Boolean function, and consider the complete DNF ϕ or,
equivalently, the list of prime implicants of f . To solve the problem Quadratic
DNF Dualization, we may assume, without loss of generality, that ϕ is purely
quadratic, meaning that all prime implicants of f are quadratic. Indeed,
• if ϕ ≡ 1_n, then f^d = 0_n;
• if ξ is a linear prime implicant of f, then all prime implicants of f^d
contain ξ.
Let Gf = (V, E) be the matched graph associated with ϕ, from which the null
edges have been deleted; thus, the edges in E are in one-to-one correspondence
with the prime implicants of f. It immediately follows from the definition of the
dual that the prime implicants of f^d are in one-to-one correspondence with those
minimal vertex covers of Gf that do not contain both a vertex ξ and its negation ξ̄.
As in Section 5.4.3, we say that literals ξ and η are twins if both ξη̄ and ξ̄η are
prime implicants of f, that is, if ξ = η for all false points of f.
Let C be a minimal vertex cover of Gf that does not contain both a vertex
and its negation. It is easy to see that if ξ and η are twin literals, then C contains
either both ξ and η or neither of them. Indeed, if ξ ∈ C, and ξ and η are twins,
then (η, ξ̄) ∈ E. Therefore, as ξ̄ ∉ C by assumption, C must contain η in order to
intersect the edge (η, ξ̄).
Let us construct a graph G∗f = (V∗, E∗) from Gf as follows: The vertex set V∗
consists of all equivalence classes induced by the “twin-relation” on the set V of
literals; that is, ξ and η belong to the same equivalence class if and only if they are
twins. Note that the negations of all vertices in an equivalence class I also form
an equivalence class; we denote it by Ī and call it the negation of the equivalence
class I. There is an edge in G∗f between two equivalence classes I and J if and
only if (ξ, η) is an edge of Gf for some ξ ∈ I and η ∈ J.
Observe that no edge of Gf joins two twins, for this would violate the primality
of the term in ϕ corresponding to this edge. Additionally, if (I , J ) ∈ E ∗ , then ξ η is
a prime implicant of f for every ξ ∈ I and η ∈ J , which is simply a consequence
of the consensus operation. Hence, we can conclude that no information is lost
in the process of identifying a set of twins, and the graph G∗f summarizes all the
information present in Gf .
As mentioned above, a minimal vertex cover of Gf that does not contain both a
vertex and its negation contains only entire equivalence classes. Hence, a minimal
vertex cover of Gf that does not contain both a vertex and its negation corresponds
to a minimal vertex cover of G∗f that does not contain both a vertex and its negation.
In fact, the following stronger statement is valid:
Lemma 5.6. No minimal vertex cover of G∗f contains both a vertex and its
negation.
Proof. Assume by contradiction that C is a minimal vertex cover of G∗f containing
both I and Ī. Since C is minimal, there is an edge (I, J) ∈ E∗ such that J ∉ C.
Similarly, there is an edge (Ī, K) ∈ E∗ such that K ∉ C. Note that it is not possible
to have J = K as this would contradict primality. Moreover, J = K̄ is not possible
either, since it would mean that I ∪ K is an equivalence class.
Because (I, J) ∈ E∗ and (Ī, K) ∈ E∗, there exist ξ ∈ I, η ∈ J, γ ∈ K such
that both ξη and ξ̄γ are prime implicants of f. It follows that their consensus
ηγ gives rise to the edge (J, K) in E∗. But this edge is not covered by C, a
contradiction.

We conclude that the minimal vertex covers of Gf that do not contain both a
vertex and its negation (namely, the prime implicants of f^d) correspond precisely
to the minimal vertex covers of G∗f.
Several algorithms are available in the literature for generating all maximal
stable sets of a graph with polynomial delay (and even in linear space) [538, 605,
873]. Since maximal stable sets are precisely the complements of minimal vertex
covers, we obtain the following result due to Ekin [303, 304].

Theorem 5.30. The problem Quadratic DNF Dualization can be solved with
polynomial delay.

Let us illustrate the dualization algorithm with an example.

Example 5.15. Let the Boolean function f be given by the quadratic DNF

\[ x_1 \lor \bar{x}_1 x_2 \lor \bar{x}_3 \bar{x}_4 \lor x_4 x_5 \lor x_3 \bar{x}_5 \lor x_4 \bar{x}_6 \lor x_5 \bar{x}_7 \lor x_6 x_8. \]

• Step 1: Find the complete DNF representation ϕ of f. We obtain

\[ \begin{aligned}
\phi = {} & x_1 \lor x_2 \lor \bar{x}_3 \bar{x}_4 \lor \bar{x}_3 x_5 \lor x_3 x_4 \lor x_4 x_5 \lor x_3 \bar{x}_5 \lor \bar{x}_4 \bar{x}_5 \lor \bar{x}_3 \bar{x}_6 \lor x_4 \bar{x}_6 \\
& \lor \bar{x}_5 \bar{x}_6 \lor x_3 \bar{x}_7 \lor \bar{x}_4 \bar{x}_7 \lor x_5 \bar{x}_7 \lor \bar{x}_6 \bar{x}_7 \lor x_6 x_8 \lor \bar{x}_7 x_8 \\
& \lor \bar{x}_3 x_8 \lor x_4 x_8 \lor \bar{x}_5 x_8 .
\end{aligned} \]

We observe that f ≢ 1, and that the variables x1, x2 can be removed from
further consideration (they appear in every dual prime implicant).
• Step 2: Identify the equivalence classes. In this example, literals x̄3, x4, and
x̄5 are equivalent, and so are x3, x̄4, and x5.
• Step 3: Construct G∗f; see Figure 5.30.
• Step 4: Find all maximal stable sets of G∗f. There are three such sets, and
each of them yields a prime implicant of f^d.

Maximal stable sets        Corresponding prime implicants of f^d
{x̄3 x4 x̄5, x6, x̄7}         x1 x2 x3 x̄4 x5 x̄6 x8
{x3 x̄4 x5, x̄6, x6}         x1 x2 x̄3 x4 x̄5 x̄7 x8
{x3 x̄4 x5, x̄6, x8}         x1 x2 x̄3 x4 x̄5 x6 x̄7
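
Step 4 of the example can be reproduced by brute force. The sketch below is an illustration only: the vertex names (`"I"` for the class {x̄3, x4, x̄5}, `"~I"` for its negation, `"~x6"` for x̄6, and so on) are hypothetical, the edge list is our own reading of the quadratic terms of the complete DNF obtained in Step 1, and we assume that the isolated literals x7 and x̄8 are left out of the graph. The exhaustive enumeration does not have polynomial delay.

```python
from itertools import combinations

# Vertices of G*_f: the two twin classes and the remaining non-isolated literals.
V = ["I", "~I", "x6", "~x6", "~x7", "x8"]
E = [("I", "~I"), ("I", "~x6"), ("I", "x8"), ("~I", "~x7"),
     ("~x6", "~x7"), ("~x7", "x8"), ("x6", "x8")]

def is_stable(s):
    """True if no edge has both endpoints in s."""
    return not any(u in s and v in s for u, v in E)

def maximal_stable_sets():
    stable = [set(c) for k in range(len(V) + 1)
              for c in combinations(V, k) if is_stable(set(c))]
    return [s for s in stable if not any(s < t for t in stable)]

for s in maximal_stable_sets():
    cover = [v for v in V if v not in s]  # complement = minimal vertex cover
    print(sorted(s), "->", cover)
```

Each printed cover, expanded into the literals of its classes and multiplied by x1 x2, gives one row of the table above.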



Figure 5.30. The graph G∗f .

5.10 Exercises
1. In a plant, two machines are available for processing n jobs. Each job i has
a fixed start time si and a fixed end time ti , and it must be processed without
interruption by either machine. No job can be processed by both machines,
and neither machine can process more than one job at a time. When a job
ends, the next one can start instantaneously on the same machine. Set up a
quadratic Boolean equation that is consistent if and only if a feasible schedule
exists for the n jobs.
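2. Solve the quadratic Boolean equation ϕ = 0, with

\[ \begin{aligned}
\phi = {} & x_1 \bar{x}_2 \lor x_1 x_6 \lor \bar{x}_1 x_2 \lor \bar{x}_1 \bar{x}_5 \lor \bar{x}_1 \bar{x}_6 \lor \bar{x}_1 \bar{x}_9 \lor x_2 x_6 \lor \bar{x}_2 x_3 \lor \bar{x}_2 \bar{x}_6 \lor x_3 x_4 \\
& \lor \bar{x}_3 \bar{x}_4 \lor \bar{x}_3 x_7 \lor \bar{x}_3 \bar{x}_8 \lor x_4 x_5 \lor x_4 x_6 \lor \bar{x}_4 x_7 \lor x_5 \bar{x}_8 \lor \bar{x}_5 x_8 \lor x_6 \bar{x}_7 \lor x_6 \bar{x}_8 \\
& \lor \bar{x}_7 x_9 \lor x_8 x_9,
\end{aligned} \]

by the four algorithms of Section 5.6.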
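3. Show that the quadratic Boolean equation

\[ \begin{aligned}
& x_1 \bar{x}_3 \lor x_1 x_5 \lor \bar{x}_1 x_2 \lor \bar{x}_2 x_4 \lor \bar{x}_2 x_7 \lor x_3 \bar{x}_6 \lor x_3 \bar{x}_8 \lor \bar{x}_3 x_5 \lor \bar{x}_4 x_6 \lor \bar{x}_4 x_8 \\
& \lor \bar{x}_5 \bar{x}_7 \lor \bar{x}_5 \bar{x}_8 \lor \bar{x}_7 x_8 = 0
\end{aligned} \]

has no solution by pinpointing a strong component containing both a variable
and its complement in the implication graph.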
4. Given a pure quadratic Boolean DNF ϕ, show that the equation ϕ = 0 is
consistent if and only if ϕ is Horn-renamable.
5. Show that the Alternative Labeling algorithm can be implemented to run in
O(m) time.
6. Exhibit an example showing that the Switching algorithm in Section 5.6.4
can attain its O(mn) worst-case complexity bound.
7. Prove that, for every n ≥ 2, the number of solutions of the quadratic Boolean
equation

\[ x_1 x_2 \lor x_2 x_3 \lor \cdots \lor x_{n-1} x_n = 0 \]

is given by the Fibonacci number F_{n+1} and thus grows exponentially with
n.
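8. Find all prime implicants of the quadratic Boolean function

\[ f(x_1, \ldots, x_7) = x_1 x_2 \lor x_1 \bar{x}_7 \lor \bar{x}_2 x_3 \lor \bar{x}_2 x_4 \lor \bar{x}_3 x_4 \lor \bar{x}_4 \bar{x}_5 \lor \bar{x}_4 x_6 \lor x_5 \bar{x}_6 \lor \bar{x}_6 x_7 \]

by the algorithm in Section 5.8.2.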
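9. Find an irredundant DNF of the quadratic DNF

\[ \begin{aligned}
& x_1 x_2 \lor x_1 \bar{x}_4 \lor x_1 x_6 \lor x_2 \bar{x}_3 \lor x_2 x_5 \lor \bar{x}_2 \bar{x}_4 \lor x_3 x_5 \lor x_3 \bar{x}_6 \lor \bar{x}_3 \bar{x}_4 \lor x_4 \bar{x}_5 \\
& \lor x_4 x_6 \lor x_5 x_6
\end{aligned} \]

by the algorithm in Section 5.8.4.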
10. A posiform is a multilinear polynomial in the 2n variables x1, x2, . . . , xn, x̄1,
x̄2, . . . , x̄n with nonnegative real coefficients.
(i) Show that for every quadratic pseudo-Boolean function f(X) on B^n,
there exist a constant c and a quadratic posiform φ(X, X̄) such that
f(X) = c + φ(X, X̄) for all X ∈ B^n.
(ii) Clearly, c is a lower bound on the minimum of f in B^n. Show that this
lower bound is tight if and only if a certain quadratic Boolean equation
is consistent.
(See Hammer, Hansen, and Simeone [440] and Chapter 13.)

11. Let φ(x1, x2, . . . , xn) = ⋁_{(i,j)∈E} xi xj be a positive quadratic DNF of n variables,
let V = {1, 2, . . . , n}, and let the graph G = (V, E) be connected.
Assume we know that, for some reason, the condition xi ≤ xj must hold
between two variables xi and xj . Because this condition is equivalent to
xi xj = xi , it follows that the term xi xj of φ can be “linearized” when the
constraint xi ≤ xj holds. The question arises: What is the minimum number
of binary order constraints that need to be imposed in order to make φ linear?
(We only count the order constraints that are explicitly imposed, not those
that are implied by the transitivity of the order relation ≤.)
(i) Show that, in order to linearize φ, at least n − 1 order constraints need
to be imposed.
(ii) Show that a set of n − 1 order constraints linearizing φ is given by the
set {xi ≤ xj : (i, j ) ∈ A}, where A is the set of arcs of a depth-first search
tree T of G with the following property: If vertices i and j are adjacent
in G, then i is an ancestor of j in T , or vice versa.
(See Tarjan [858]; Hammer and Simeone [462].)
12. Consider the probability model in which all the quadratic Boolean equations
with n variables and m terms are equally likely. Show that a random quadratic
equation is almost surely satisfiable when m < n and almost surely unsatis-
fiable when m > n.
(See Chvátal and Reed [202].)
13. Show that if a quadratic Boolean equation with n variables and m terms is
generated at random in the preceding probability model, then one can solve
it in expected O(n) time.
(Hint: Randomly select 4n terms and solve the corresponding equation in
O(n) time. With high probability, the equation is inconsistent. If not, solve
the full equation in O(m) time. See Hansen, Jaumard, and Minoux [470].)
14. Find the transitive closure of the DAG in Figure 5.31 by the input
disengagement method of Section 5.8.3.

Figure 5.31. A directed acyclic graph.

15. Show that Lemma 5.6 does not hold in general for the graph Gf defined in
Section 5.9.2.
6
Horn functions
Endre Boros

In this chapter, we study the class of Horn functions. The importance of Horn
functions is supported by their basic role in complexity theory (see, e.g., Schaefer
[807]), by the number of applications involving these functions, and, last but not
least, by the beautiful mathematical properties that they exhibit.
Horn expressions and Horn logic were introduced first in formal logic by
McKinsey [638] and Horn [509] and were later recognized as providing a proper
setting for universal algebra by Galvin [367], Malcev [657], and McNulty [640].
Horn logic proved particularly useful and gained prominence in logic programming
[19, 185, 488, 489, 494, 521, 552, 582, 648, 656, 721, 855, 816], artificial intelli-
gence [186, 277, 318, 612, 853], and in database theory through its proximity to
functional dependencies in relational databases [179, 267, 319, 320, 646, 647, 797].
The basic principles of Horn logic have been implemented in several widely
used software products, including the programming language PROLOG and the
query language DATALOG for relational databases [494, 648]. Though many of
the cited papers are about first-order logic, the simplicity, expressive power, and
algorithmic tractability of propositional Horn formulae are at the heart of these
applications.

6.1 Basic definitions and properties


Horn functions, just like monotone and quadratic functions, are customarily
defined by the syntax of their DNF (or CNF) expressions. It is important to note,
however, that this syntactical property of a particular representation of a Horn
function propagates, in fact, to all its prime representations. In this sense, the cus-
tomarily used syntactical description of Horn functions does indeed define a class
of functions, and not merely a family of expressions.
To see this, let us start with some basic definitions (see also Section 1.13 in
Chapter 1).


Definition 6.1. An elementary conjunction

\[ T(x_1, \ldots, x_n) = \bigwedge_{j \in P} x_j \bigwedge_{k \in N} \bar{x}_k \tag{6.1} \]

is called a Horn term if |N| ≤ 1, that is, if T contains at most one complemented
variable. The term T is called pure Horn if |N | = 1, and positive if N = ∅.
Definition 6.2. A DNF

\[ \eta(x_1, \ldots, x_n) = \bigvee_{i=1}^{m} \Bigl( \bigwedge_{j \in P_i} x_j \bigwedge_{k \in N_i} \bar{x}_k \Bigr) \tag{6.2} \]

is called Horn (pure Horn) if all of its terms are Horn (pure Horn).
Note that the same function may have both Horn and non-Horn DNF
representations.
Example 6.1. The DNF

\[ \eta_1(x_1, x_2, x_3) = x_1 \bar{x}_2 \lor x_1 \bar{x}_3 \lor x_2 x_3 \]

is Horn because its first two terms are pure Horn and its last term is positive,
whereas the following DNF of the same (monotone) Boolean function,

\[ \eta_2(x_1, x_2, x_3) = x_1 x_2 \lor x_1 x_3 \lor x_1 \bar{x}_2 \bar{x}_3 \lor x_2 x_3, \]

is not Horn because its third term contains two complemented variables.
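
The two claims of Example 6.1 can be checked mechanically. In the sketch below, a term is stored as a pair (P, N) of index sets, following Definition 6.1; this encoding and the helper names `is_horn` and `evaluate` are our own illustrative choices.

```python
from itertools import product

# A term is a pair (P, N): indices of uncomplemented and complemented variables.
eta1 = [({1}, {2}), ({1}, {3}), ({2, 3}, set())]
eta2 = [({1, 2}, set()), ({1, 3}, set()), ({1}, {2, 3}), ({2, 3}, set())]

def is_horn(dnf):
    """A DNF is Horn if every term has at most one complemented variable."""
    return all(len(N) <= 1 for _, N in dnf)

def evaluate(dnf, x):
    """Value of the DNF at the point x, a dict mapping variable index to 0/1."""
    return any(all(x[j] for j in P) and not any(x[k] for k in N) for P, N in dnf)

points = [dict(zip((1, 2, 3), bits)) for bits in product((0, 1), repeat=3)]
print(is_horn(eta1), is_horn(eta2))                                 # True False
print(all(evaluate(eta1, x) == evaluate(eta2, x) for x in points))  # True
```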

 
Definition 6.3. For a pure Horn term T = x̄k ∧ ⋀_{j∈P} xj, variable xk is called
the head of T, while variables xj, j ∈ P, are called the subgoals of T.
To simplify subsequent discussions, we further introduce the following notations.
Given a subset P ⊆ {1, 2, . . . , n}, we also use the letter P to denote the
corresponding elementary conjunction as well as the Boolean function defined by
that conjunction:

\[ P = P(x_1, \ldots, x_n) = \bigwedge_{j \in P} x_j, \]

whenever this notation does not cause any confusion. Thus, a Horn DNF can be
written as

\[ \eta = \bigvee_{P \in \mathcal{P}_0} P \;\lor\; \bigvee_{i=1}^{n} \Bigl( \bigvee_{P \in \mathcal{P}_i} P \bar{x}_i \Bigr), \tag{6.3} \]

where 𝒫0 denotes the set of positive terms, while 𝒫i denotes the family of subgoals
of the terms with head xi, for i = 1, . . . , n. We interpret the families 𝒫i, i = 0, . . . , n,
as hypergraphs over the base set {1, 2, . . . , n}.
Example 6.2. Consider the Boolean expression

\[ \eta = \bar{x}_1 \lor x_1 \bar{x}_2 \lor x_1 x_2 \bar{x}_3 \lor x_2 x_3 \bar{x}_1. \tag{6.4} \]

This is a Horn expression, for which 𝒫0 = ∅, 𝒫1 = {∅, {2, 3}}, 𝒫2 = {{1}}, and
𝒫3 = {{1, 2}}. Since 𝒫0 = ∅, η is in fact a pure Horn formula.

Recall Definition 2.5: If Ax and Bx̄ are two terms such that AB ≢ 0, then
AB is called their consensus. The term AB is the largest elementary conjunction
satisfying AB ≤ Ax ∨ Bx̄, and thus, whenever both Ax and Bx̄ are implicants of
the same Boolean function f, their consensus AB is an implicant of f, too.
Theorem 6.1. The consensus of two Horn terms is Horn. More precisely, the
consensus of two pure Horn terms is pure Horn, while the consensus of a positive
and a pure Horn term is positive.
Proof. Assume, without any loss of generality, that Ax̄ and Bx are two Horn terms
that have a consensus (at least one of the terms must be pure Horn for their
consensus to exist). Then, A must contain only positive literals, and B can contain
at most one negated variable (which cannot belong to A). Hence, their consensus
AB contains at most one negative literal; thus it is Horn. More precisely, AB is
positive (respectively, pure Horn) if Bx is positive (respectively, pure Horn).
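
Theorem 6.1 can be illustrated with a small sketch of the consensus operation. Terms are encoded as pairs (P, N) of index sets as in Definition 6.1; the function name `consensus` and this encoding are hypothetical choices made for illustration.

```python
def consensus(t1, t2):
    """Consensus of two terms (P, N), or None if it does not exist."""
    (P1, N1), (P2, N2) = t1, t2
    clash = (P1 & N2) | (P2 & N1)  # variables occurring with opposite signs
    if len(clash) != 1:
        return None                # exactly one conflicting variable is required
    return ((P1 | P2) - clash, (N1 | N2) - clash)

# Two pure Horn terms: x1 x2 ~x3 and x3 x4 ~x5 give the pure Horn term x1 x2 x4 ~x5.
print(consensus(({1, 2}, {3}), ({3, 4}, {5})))
# A pure Horn and a positive term: x1 ~x2 and x2 x3 give the positive term x1 x3.
print(consensus(({1}, {2}), ({2, 3}, set())))
```

In both calls the result has at most one complemented variable, as the theorem predicts.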

An important consequence of this observation is the following:


Theorem 6.2. If h is a Boolean function which can be represented by a (pure)
Horn DNF, then all prime implicants of h are (pure) Horn.
Proof. Consider a (pure) Horn DNF η representing h. According to Theorem 3.5,
all prime implicants of h can be obtained by applying the consensus method to
η. Thus all prime implicants of h can be obtained by a sequence of consensus
operations starting with terms present in η, that is, with (pure) Horn terms. Thus,
by Theorem 6.1 all terms obtained by that procedure must also be (pure) Horn,
and in particular, all prime implicants of h must be (pure) Horn. 

Example 6.3. Returning to Example 6.2, we observe that among the terms of η,
only x̄1 is a prime implicant of the function h represented by η. All other terms in
(6.4) are nonprime. In fact, h has three prime implicants, namely, x̄1, x̄2, and x̄3,
the disjunction of which is another representation of h.

Theorem 6.2 implies the following statement:


Corollary 6.1. If h is a Boolean function that can be represented by a (pure) Horn
DNF, then all prime DNF representations of h are (pure) Horn.
This fact provides our motivation for the following definition:
Definition 6.4. A Boolean function h is called a (pure) Horn function if it can be
represented by a (pure) Horn DNF.
Note that the constant function h = 1_n is Horn since, for instance, 1_n =
x1 ∨ x̄1 is a Horn DNF, but it is not pure Horn, since h(1, 1, . . . , 1) = 0 must hold
for all pure Horn functions. Let us further add that we can consider 0_n to be both
Horn and pure Horn, by definition, since its only DNF representation is the empty
DNF.
Although pure Horn functions play an important role in parts of this chapter,
they are not fundamentally different from Horn ones. Indeed:

Theorem 6.3. A function h on B^n is pure Horn if and only if
(a) h is Horn, and
(b) h(1, 1, . . . , 1) = 0.

Proof. Necessity of (a)–(b) is obvious from the definition of pure Horn functions.
Conversely, if (a) holds then h can be represented by a Horn DNF, and if (b) holds
then this DNF cannot contain any positive term. 

Another easy relation is established by considering the pure Horn function p_n
on B^n represented by π = x̄1 ∨ x̄2 ∨ . . . ∨ x̄n. Note that p_n is 1 everywhere except
at (1, 1, . . . , 1).

Theorem 6.4. A function h on B^n is Horn if and only if h p_n is pure Horn.

Proof. Assume that h is represented by a Horn DNF φ, and let T be a term of φ.
If T is pure Horn, then T π = T. If T is positive, then T π is a pure Horn DNF
(possibly 0). Thus, h p_n = φ π is pure Horn.
Conversely, assume that h p_n is pure Horn. If h(1, 1, . . . , 1) = 0, then h p_n = h and
h is Horn. If h(1, 1, . . . , 1) = 1, then h = h p_n ∨ x1 x2 . . . xn, and h is Horn, too.

When dealing with Horn functions, we usually assume that the function is rep-
resented by one of its Horn DNFs. As a matter of fact, recognizing Horn functions
expressed by arbitrary DNFs turns out to be hard.

Theorem 6.5. (a) It is co-NP-complete to recognize whether an arbitrary DNF
represents a Horn function.
(b) It is co-NP-complete to recognize whether an arbitrary DNF represents a pure
Horn function.

Proof. In statement (a), NP-hardness follows from Theorem 1.30, and membership
in co-NP is an easy consequence of Corollary 6.2 to be proved in Section 6.3.
In statement (b), NP-hardness is implied by statement (a) and Theorem 6.4,
whereas membership in co-NP is implied by Theorem 6.3 and Corollary 6.2. 

Finally, note that the number of prime implicants of a Horn function can be
much larger than the number of terms in an arbitrary defining Horn expression
of it.
Example 6.4. The expression given in the proof of Theorem 3.17 is such a Horn
DNF. We can also consider the following, somewhat simpler expression:

\[ \eta_2 = \Bigl( \bigvee_{i=1}^{k} x_i \bar{y}_i \Bigr) \lor \bigwedge_{i=1}^{k} y_i. \tag{6.5} \]

Clearly, η2 is a Horn expression in 2k variables and k + 1 terms, and it has more
than 2^k prime implicants. For instance, all terms of the form

\[ \bigwedge_{i \in S} x_i \bigwedge_{i \notin S} y_i \]

for any subset S ⊆ {1, 2, . . . , k}, are prime implicants of η2.
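
For small k, the claim of Example 6.4 can be verified by exhaustive search. The sketch below is a brute-force illustration, not an efficient method: a term is a tuple over {0, 1, None} listing first x1, . . . , xk and then y1, . . . , yk, with None marking an absent variable. The function names are hypothetical.

```python
from itertools import product

def eta(k, x, y):
    """Evaluate (6.5): OR_i (x_i and not y_i), or AND_i y_i."""
    return any(x[i] and not y[i] for i in range(k)) or all(y)

def is_implicant(k, term):
    """True if eta equals 1 at every point consistent with the term."""
    free = [i for i, v in enumerate(term) if v is None]
    for bits in product((0, 1), repeat=len(free)):
        point = list(term)
        for i, b in zip(free, bits):
            point[i] = b
        if not eta(k, point[:k], point[k:]):
            return False
    return True

def prime_implicants(k):
    primes = []
    for term in product((0, 1, None), repeat=2 * k):
        if not is_implicant(k, term):
            continue
        fixed = [i for i, v in enumerate(term) if v is not None]
        # Prime: dropping any single literal destroys the implicant property.
        if all(not is_implicant(k, term[:i] + (None,) + term[i + 1:]) for i in fixed):
            primes.append(term)
    return primes

k = 3
primes = prime_implicants(k)
print(len(primes) > 2 ** k)  # True
```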

6.2 Applications of Horn functions


Horn functions appear in many different disciplines, though sometimes in
disguised form. We now describe a few examples of such applications.

6.2.1 Propositional rule bases


Expert systems, in particular, propositional production rule-based systems, are
widely used for decision support (see, e.g., Ignizio [519] and Section 1.13.1).
Boolean variables (propositions) are used in such systems to represent simple
statements about the state of the world. To use statements about a sick person as
examples, we may consider propositions like: x1 =“has a headache,” x2 =“must
take aspirin,” x3 =“coughs,” x4 =“must go to doctor,” and so on. In a rule base,
we can include simple implications, corresponding to statements which are known
(or required) to be true:

R = {x1 ∧ x3 ⟹ x2, x1 ∧ x3 ⟹ x4, · · · }.

In certain situations, some of the values of these propositional variables are known,
and the rule base R is used to derive the values of the other variables (e.g., to choose
which actions to take) so that all rules remain valid. In other cases, we might just
want to check whether a certain chain of events (assignments of truth values to the
propositional variables) obeys all the rules, or not.
We can easily see that such a rule base can equivalently be represented by a
Horn DNF

\[ h = x_1 x_3 \bar{x}_2 \lor x_1 x_3 \bar{x}_4 \lor \cdots . \]
More precisely, a binary assignment X to the propositional variables satisfies all
the rules of R exactly when h(X) = 0 (such an assignment X is called a model of
R). In other words, the models of R are exactly the false points of h.
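
The correspondence between models of R and false points of h can be sketched as follows. Only the two rules displayed above are encoded; the rules hidden behind the ellipsis are not reconstructed, and the data layout (a rule as a pair of a body set and a head) is our own assumption.

```python
# Each rule is (body, head): the conjunction of `body` implies `head`.
rules = [({"x1", "x3"}, "x2"), ({"x1", "x3"}, "x4")]

def h(assignment):
    """Horn DNF of the rule base: equals 1 exactly when some rule is violated."""
    return int(any(all(assignment[v] for v in body) and not assignment[head]
                   for body, head in rules))

def is_model(assignment):
    """A model of the rule base is a false point of h."""
    return h(assignment) == 0

print(is_model({"x1": 1, "x2": 1, "x3": 1, "x4": 1}))  # True
print(is_model({"x1": 1, "x2": 0, "x3": 1, "x4": 1}))  # False
```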
Important problems arising in this context include deciding the consistency of a
given rule base (namely, finding a solution to the Horn equation h = 0, see Section
6.4), deriving all consequences of a partial assignment in a system in which all
rules of R must remain valid (namely, computing the forward chaining closure with
respect to h, see Section 6.4), finding a simpler equivalent expression of a given
rule base (namely, finding a “shorter” DNF of the Horn function representation,
see Section 6.7), etc. (see, for instance, [108, 112, 172, 173, 297, 298, 299, 300,
308, 391, 446, 447, 449, 450, 564]).
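
As a rough illustration of deriving consequences, the sketch below computes a forward chaining closure for rules of the form "body implies head". It uses the usual fixpoint formulation and may differ in detail from the procedure of Section 6.4; the function name and the rule encoding are hypothetical, and the simple repeated scan is not the most efficient implementation.

```python
def forward_chaining(rules, known_true):
    """Smallest set of variables forced to 1, given that `known_true` are 1."""
    closure = set(known_true)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in closure and body <= closure:
                closure.add(head)  # all subgoals hold, so the head must hold
                changed = True
    return closure

rules = [({"x1", "x3"}, "x2"), ({"x1", "x3"}, "x4")]
print(sorted(forward_chaining(rules, {"x1", "x3"})))  # ['x1', 'x2', 'x3', 'x4']
```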

6.2.2 Functional dependencies in databases


For simplicity, we can imagine a database as a large array in which every row
corresponds to a particular item, usually called record, and in which the columns
correspond to the various attributes those records may have. The entries are strings
of text, numbers, dates, or more complex data structures themselves, and not every
attribute value is necessarily defined for a particular record. As a typical example,
we can think of each record as corresponding to a transaction with a customer
(such as bill sent, payment received, reminder sent, etc.) of a large company,
which may have thousands of customers, and many transactions with each of
these customers. Such a large database is typically highly redundant; for example,
in each transaction the customer may be identified by name, address, phone, and
account number, implying that all these attributes appear repeatedly in many of the
records. To handle such a large amount of data, to produce various reports quickly,
to check consistency efficiently, and for many other typical operations, it is crucial
to store and access the database in an efficient way.
Functional dependencies provide one of the most important and most widely
used theoretical tools to model these issues (see, e.g., [30, 176, 264, 319, 320, 517,
565, 646, 647, 663, 664, 874, 875]). For subsets X and Y of the attributes, we say
that X determines Y , and we write X → Y , if in every record of the database, the
values of the attributes in X determine uniquely the values of those in Y .
For instance, consider the following small database containing 4 records with
attributes A, B, C, and D:

A    B    C    D
a    b    c    d
a    bb   c    dd
aa   b    cc   d
aa   bb   cc   dd

We can observe that {A} → {C}, {B, C} → {A, D}, and {D} → {B} are a few of the
many functional dependencies in this database. For instance, {D} → {B} means
that, whenever we know the value of attribute D, we “know” the value of B as
well: Whenever D = d in the above database, we also have B = b. In fact, using
{A} → {C} and {D} → {B}, we can uniquely recompute all records of the given
database from the following two small tables:

A    C           D    B
a    c     and   d    b
aa   cc          dd   bb
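
The dependencies claimed for the four-record database can be checked directly. The sketch below stores each record as a dictionary; the function name `holds` is hypothetical.

```python
records = [
    {"A": "a",  "B": "b",  "C": "c",  "D": "d"},
    {"A": "a",  "B": "bb", "C": "c",  "D": "dd"},
    {"A": "aa", "B": "b",  "C": "cc", "D": "d"},
    {"A": "aa", "B": "bb", "C": "cc", "D": "dd"},
]

def holds(X, Y):
    """True if the functional dependency X -> Y holds in `records`."""
    seen = {}
    for r in records:
        key = tuple(r[a] for a in X)
        val = tuple(r[a] for a in Y)
        # Two records agreeing on X must also agree on Y.
        if seen.setdefault(key, val) != val:
            return False
    return True

print(holds(["A"], ["C"]), holds(["B", "C"], ["A", "D"]), holds(["D"], ["B"]))  # True True True
print(holds(["A"], ["B"]))  # False
```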

Furthermore, it is obvious that the functional dependency X → Y is equivalent
to the set of functional dependencies X → {y} for y ∈ Y, and that a set of functional
dependencies Xi → {yi}, i = 1, . . . , m, can equivalently be represented by the Horn
system

\[ \eta = \bigvee_{i=1}^{m} \Bigl( \bigwedge_{x \in X_i} x \Bigr) \land \bar{y}_i \]

(see, e.g., Sagiv et al. [797]; Ibaraki, Kogan, and Makino [517]). This connection
faithfully preserves all logical inferences, too, namely, all implicants of η
correspond to valid functional dependencies of the
same database, and vice versa (cf. so-called Armstrong’s axioms [30, 647]).

6.2.3 Directed graphs, hypergraphs, and Petri nets


Simple examples of (pure) Horn systems arise from the following correspondence
with directed graphs: Given a directed graph G = (V, A) on the vertex set V =
{1, 2, . . . , n}, we can associate with it a quadratic pure Horn DNF by defining

\[ \eta_G = \bigvee_{(i,j) \in A} x_i \bar{x}_j. \]

It is easy to see that all prime implicants of ηG are also quadratic and pure Horn,
and that they are in a one-to-one correspondence with the directed paths in G,
namely, xi x̄j is a prime implicant of ηG if and only if there exists a directed path
from i to j in G. Algorithms and graph properties for directed graphs naturally
correspond to operations and properties of Horn functions. For instance, strong
components of G correspond in a one-to-one way to logically equivalent variables
of ηG, the transitive closure of G corresponds to the set of prime implicants of ηG,
etc. (see Chapter 5 for more details, in particular, Sections 5.4 and 5.8).
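
The correspondence between prime implicants and directed paths can be sketched with Warshall's algorithm. In the illustration below, each returned pair (i, j) stands for the term xi x̄j; the function name `closure_pairs` is hypothetical, and the example graph is an acyclic path chosen for simplicity.

```python
def closure_pairs(n, arcs):
    """All pairs (i, j), i != j, joined by a directed path (Warshall's algorithm)."""
    reach = [[False] * (n + 1) for _ in range(n + 1)]  # index 0 unused
    for i, j in arcs:
        reach[i][j] = True
    for k in range(1, n + 1):
        for i in range(1, n + 1):
            if reach[i][k]:
                for j in range(1, n + 1):
                    if reach[k][j]:
                        reach[i][j] = True
    return {(i, j) for i in range(1, n + 1) for j in range(1, n + 1)
            if i != j and reach[i][j]}

arcs = [(1, 2), (2, 3), (3, 4)]
print(sorted(closure_pairs(4, arcs)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```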
Directed hypergraphs (V , A) provide a natural generalization of directed
graphs. They consist of hyperarcs of the form T → h, where T ⊆ V and h ∈ V .
The set T is called the tail (or source set) of the hyperarc T → h, while h is called
its head (see Ausiello, D’Atri, and Sacca [37] or Gallo et al. [361]). The connec-
tion with Horn expressions is quite obvious, and several algorithmic problems and
procedures of logical inference on Horn systems can naturally be reformulated on
directed hypergraphs (see, e.g., [168, 360, 363, 756]).
The more general notion of Petri nets was introduced for modeling and ana-
lyzing finite state dynamic systems (see Petri [743]). Many important aspects of
Petri nets can equivalently be modeled by associated Horn expressions, providing
efficient algorithmic solutions to some of the basic problems of system design and
analysis (see, e.g., Barkaoui and Minoux [53, 683]).

6.2.4 Integer programming and polyhedral combinatorics


Just as monotone Boolean functions correspond naturally to set covering problems
(see Chapter 1), many examples of Horn systems also arise in integer programming.
Conditional covering problems involve binary variables and inequalities of the
form

\[ \sum_{i \in P} x_i \ge 1 \quad \text{for } P \in \mathcal{P}, \]

\[ \sum_{i \in H} x_i \ge x_j \quad \text{for } (H, j) \in \mathcal{H}, \]

and are used to model certain facility location problems (see Moon and Chaudhry
[691] or Chaudhry, Moon, and McCormick [189]). A similar model is used by
Salvemini, Simeone, and Succi [800] to model shareholders’ networks and to
determine optimal ownership control.
For another type of connection, let us consider a Horn DNF, say, for instance,

\[ \eta = x_1 x_2 x_3 \bar{x}_4 \lor x_3 \bar{x}_2 \lor x_1 x_4 \lor \cdots \]

and observe that a binary assignment X is a false point of η if and only if the
corresponding system of linear inequalities

\[ \begin{aligned}
-x_1 - x_2 - x_3 + x_4 &\ge -2 \\
x_2 - x_3 &\ge 0 \\
-x_1 - x_4 &\ge -1 \\
&\;\;\vdots
\end{aligned} \]

is satisfied by X. One characteristic of this system of inequalities is that each row
has at most one positive coefficient. This feature turns out to imply interesting
properties of the set of feasible solutions. Namely, it was proved by Cottle and
Veinott [216] that a nonempty convex polyhedron of the form

$$P = \{x \mid A^{T} x \ge b,\ x \ge 0\} \tag{6.6}$$

has a least element if each row of the integral matrix A has at most one positive
element. Furthermore, as was shown by Chandrasekaran [180], the polyhedron P
has an integral least element for every integral right-hand side vector b if A has
at most one positive element in each row, and all positive elements in A are equal
to 1. For the special case of 0, ±1 matrices, this property was also observed and
utilized by Jeroslow and Wang [534] and Chandru and Hooker [182].
The property that P has a least element is perfectly analogous to the fact that
Horn functions have a unique minimal false point, and it can in fact be established
analogously to Theorem 6.6. This very useful property implies that for a linear
integer minimization problem over a polytope of the form (6.6), a simple rounding
procedure provides the optimal solution.
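
The correspondence between Horn terms and inequalities with at most one positive coefficient can be checked mechanically. The following Python sketch is only an illustration: the encoding of a term as a pair (pos, neg) of index sets, the 0-based variable indices, and all function names are assumptions of this example, and only the three terms of η that are displayed explicitly are used.

```python
from itertools import product

def term_to_inequality(pos, neg, n):
    """Return (coeffs, rhs) such that the term is false at x iff coeffs . x >= rhs."""
    coeffs = [0] * n
    for j in pos:          # uncomplemented variables get coefficient -1
        coeffs[j] = -1
    for k in neg:          # complemented variables get coefficient +1
        coeffs[k] = 1
    return coeffs, 1 - len(pos)

def eval_dnf(dnf, x):
    """Value (0 or 1) of the DNF at the binary point x."""
    return int(any(all(x[j] for j in pos) and not any(x[k] for k in neg)
                   for pos, neg in dnf))

def satisfies_system(dnf, x):
    """True if x satisfies the inequality associated with every term."""
    for pos, neg in dnf:
        coeffs, rhs = term_to_inequality(pos, neg, len(x))
        if sum(c * v for c, v in zip(coeffs, x)) < rhs:
            return False
    return True

# x1 x2 x3 ~x4,  x3 ~x2,  x1 x4  (indices shifted to start at 0)
ETA = [({0, 1, 2}, {3}), ({2}, {1}), ({0, 3}, set())]

for x in product((0, 1), repeat=4):
    assert (eval_dnf(ETA, x) == 0) == satisfies_system(ETA, x)
```

Since a Horn term has at most one complemented variable, each generated row has at most one coefficient equal to +1, which is the structural property used above.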
For further connections between cutting planes in binary integer programming
and prime implicant generation techniques for Boolean functions and, in particular,
those specialized for Horn DNFs, we refer the reader to the book by Chandru and
Hooker [184] and to the survey by Hooker [503].
The next interesting connection is between (0, ±1) matrices, certain associated
polyhedra, and Horn functions. It is quite natural to associate with an m×n, (0, ±1)
matrix A the DNF
   
$$\phi_A = \bigvee_{i=1}^{m} \Bigl( \bigwedge_{j : a_{ij} = 1} x_j \Bigr) \wedge \Bigl( \bigwedge_{k : a_{ik} = -1} \bar{x}_k \Bigr).$$

The association $A \leftrightarrow \phi_A$ is one-to-one between (0, ±1) matrices and Boolean DNFs. Though this association is merely syntactical, in some cases, it covers a much deeper connection. Perfect (0, ±1) matrices, introduced by Conforti, Cornuéjols, and de Francesco [207], constitute just such an interesting case. This family of matrices generalizes perfect (0, 1) matrices (i.e., matrices that are the clique-vertex incidence matrices of maximal cliques of perfect graphs; see Lovász [622] and Padberg [722]), totally unimodular matrices, and balanced (0, ±1) matrices (see, e.g., the books by Truemper [871] or Cornuéjols [215]). A (0, ±1) matrix A is called perfect if the polyhedron
$$P_A = \{x \mid Ax \le 1 - n(A),\ 0 \le x \le 1\}$$
has integral vertices, where n(A) is an integer vector, the ith component of which
is the number of negative entries in row i of A, for i = 1, . . . , m, and where m is
the number of rows in A. Perfect (0, ±1) matrices have several characterizations
in terms of the perfection of associated graphs (see [107, 207, 419]), and are also
connected to the family of Horn functions. Namely, it was shown by Boros and Čepek [107] that a (0, ±1) matrix A satisfying $P_A \neq \emptyset$ is perfect only if $\phi_A$ belongs to a subclass of acyclic renamable Horn functions (see Section 6.9 for definitions).

6.3 False points of Horn functions


Given a Boolean function f on B n , let us recall from Chapter 1 that T (f ) and F (f )
denote, respectively, the sets of its true points and false points. Given binary vectors
X, Y ∈ Bn , we call Z = X ∧ Y their conjunction, defined by componentwise con-
junction. In other words, if X = (x1 , . . . , xn ), Y = (y1 , . . . , yn ), and Z = (z1 , . . . , zn ),
then zj = xj ∧ yj for j = 1, . . . , n.
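Definition 6.5. For a nonempty subset $S \subseteq \mathbb{B}^n$, let us define
$$S^{\wedge} = \Bigl\{ \bigwedge_{X \in R} X \;\Bigm|\; \emptyset \neq R \subseteq S \Bigr\},$$
and define $\emptyset^{\wedge} = \emptyset$. We call $S^{\wedge}$ the conjunction closure of S. Finally, we say that a subset $S \subseteq \mathbb{B}^n$ is closed under conjunction, or conjunction-closed, if $S = S^{\wedge}$.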
Example 6.5. If S = {(0, 1, 0, 1), (0, 1, 1, 0), (1, 1, 1, 1)}, then
$S^{\wedge}$ = {(0, 1, 0, 1), (0, 1, 1, 0), (1, 1, 1, 1), (0, 1, 0, 0)}.
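
The conjunction closure of a small set of points can be computed directly from the definition. The sketch below is illustrative: points are represented as tuples of 0/1 values and the function names are assumptions of this example. It relies on the fact that closing a set under pairwise conjunction also closes it under conjunctions of arbitrary nonempty subsets.

```python
def conj(x, y):
    """Componentwise conjunction of two binary vectors."""
    return tuple(a & b for a, b in zip(x, y))

def conjunction_closure(points):
    """Smallest superset of `points` that is closed under conjunction."""
    closure = set(points)
    frontier = list(closure)
    while frontier:
        x = frontier.pop()
        for y in list(closure):
            z = conj(x, y)
            if z not in closure:
                closure.add(z)
                frontier.append(z)   # new points are combined with all others later
    return closure

S = {(0, 1, 0, 1), (0, 1, 1, 0), (1, 1, 1, 1)}
print(sorted(conjunction_closure(S)))
```

On the set S of Example 6.5 the only new point produced is (0, 1, 0, 0), the conjunction of the first two vectors.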
Note that the mapping $S \mapsto S^{\wedge}$ satisfies the usual properties of closure operations, justifying its name:
Lemma 6.1. For all subsets $A \subseteq B \subseteq \mathbb{B}^n$, we have $A \subseteq A^{\wedge}$, $A^{\wedge} \subseteq B^{\wedge}$, and $(A^{\wedge})^{\wedge} = A^{\wedge}$.
Proof. Immediate by the definition.

Since Boolean functions can also be defined by their sets of true and/or false
points, and since Horn functions constitute a proper subfamily of all Boolean
functions, not all subsets of B n can appear as sets of false points of Horn functions.
Indeed, the set of false points of a Horn function has a very special property,
observed first by McKinsey [638] and also by Horn [507; Lemma 7].
Theorem 6.6. A Boolean function is Horn if and only if its set of false points is
closed under conjunction.
Proof. Let us consider a Boolean function h on $\mathbb{B}^n$, and let $T_1, \ldots, T_p$ denote its prime implicants.
Assume first that h is Horn or, equivalently, by Theorem 6.2, that all its prime implicants are Horn. Let us note that $F(h) = \bigcap_{k=1}^{p} F(T_k)$, and that the intersection of conjunction-closed sets is conjunction-closed again. Hence, to prove the first half of the statement, it is enough to show that the set of false points F(T) of a Horn term T is closed under conjunction. Since this is obvious for a positive term, let us assume that $T = \bigl(\bigwedge_{j \in P} x_j\bigr)\bar{x}_i$, and let us consider binary vectors X, Y, and Z for which $Z = X \wedge Y$ and T(Z) = 1. Then, we must have $z_i = 0$ and $z_j = 1$ for all j ∈ P, implying by the definition of conjunction that $x_j = y_j = 1$ for all j ∈ P and $x_i \wedge y_i = 0$. Thus, at least one of $x_i$ and $y_i$ must be equal to 0, say, $x_i = 0$, and therefore T(X) = 1 follows. This implies that F(T) is closed under conjunction.
For the reverse direction, let us assume that h is not Horn, and let us consider a non-Horn prime implicant $T = \bigl(\bigwedge_{j \in P} x_j\bigr) \wedge \bigl(\bigwedge_{k \in N} \bar{x}_k\bigr)$ of h, where |N| ≥ 2. According to Definition 1.18 of prime implicants, deleting any literal from T yields a non-implicant of h. Thus in particular, for every index i ∈ N there exists a binary vector $X^i \in \mathbb{B}^n$ such that $x^i_j = 1$ for j ∈ P, $x^i_k = 0$ for k ∈ N \ {i}, and $x^i_i = 1$, and for which $h(X^i) = 0$ holds. Therefore, $T(X^i \wedge X^{i'}) = 1$ follows for any two distinct indices $i \neq i'$, $i, i' \in N$, implying $h(X^i \wedge X^{i'}) = 1$, and thus proving that F(h) is not closed under conjunction.

This result has several interesting consequences. First, it implies a simple char-
acterization of Horn functions, which can serve as the basis for learning Horn
theories (see, e.g., [139, 264, 650]), and which was generalized to several other
classes of Boolean functions (see, e.g., [305, 303] and Chapter 11 in this book).
Corollary 6.2. A Boolean function f on $\mathbb{B}^n$ is Horn if and only if
$$f(X \wedge Y) \le f(X) \vee f(Y) \tag{6.7}$$
holds for every $X, Y \in \mathbb{B}^n$.
Proof. Indeed, (6.7) implies that F(f) is closed under conjunction, namely, f is Horn, by Theorem 6.6. Conversely, if f is Horn, then the left-hand side of (6.7) is zero whenever the right-hand side is zero, again by Theorem 6.6.
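
Inequality (6.7) yields a direct, if exponential, test for the Horn property when a function is available as a black box. The following sketch is illustrative only: the function name `is_horn`, the representation of points as tuples, and the two sample functions are assumptions of this example.

```python
from itertools import product

def is_horn(f, n):
    """Check f(X ^ Y) <= f(X) v f(Y) for all X, Y in B^n (takes 4^n evaluations)."""
    points = list(product((0, 1), repeat=n))
    for x in points:
        for y in points:
            z = tuple(a & b for a, b in zip(x, y))
            if f(z) > max(f(x), f(y)):
                return False
    return True

f = lambda x: x[0] & (1 - x[1])          # x1 ~x2   : a Horn term
g = lambda x: (1 - x[0]) & (1 - x[1])    # ~x1 ~x2  : two complemented variables

print(is_horn(f, 2), is_horn(g, 2))
```

For g, the false points (0, 1) and (1, 0) have the true point (0, 0) as their conjunction, so (6.7) fails.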

Another implication of Theorem 6.6 is the following statement:

Corollary 6.3. For every Horn function h on $\mathbb{B}^n$, $h \neq 1_n$, there exists a unique minimal false vector $X^h \in F(h) \subseteq \mathbb{B}^n$.
Proof. According to Theorem 6.6, the false vector
$$X^h = \bigwedge_{Y \in F(h)} Y$$
is well-defined, unique and satisfies the inequalities $X^h \le Y$ for all $Y \in F(h) \neq \emptyset$.

Theorem 6.6 also implies that every Boolean function f has a unique maximal Horn minorant h, that is, a Horn function h such that h ≤ f and the inequalities $h \le h' \le f$ hold for no other Horn function $h' \neq h$.
Theorem 6.7. Given a Boolean function f, let h be the function defined by $F(h) = F(f)^{\wedge}$. Then h is the unique maximal Horn minorant of f.
Proof. Clearly, h is well defined, and since $F(h)^{\wedge} = (F(f)^{\wedge})^{\wedge} = F(f)^{\wedge} = F(h)$, it is also Horn by Theorem 6.6. It is also clear that $F(h) = F(f)^{\wedge} \supseteq F(f)$, and hence, h ≤ f. Furthermore, for any Horn minorant $h' \le f$ we have $F(h') \supseteq F(f)$, and thus, by Theorem 6.6, $F(h') = F(h')^{\wedge} \supseteq F(f)^{\wedge} = F(h)$, which implies $h \ge h'$.
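
The construction in Theorem 6.7 can be followed literally on small examples: list the false points of f, close them under conjunction, and read off h. The sketch below is a brute-force illustration under the assumption that f is given as a Python callable on 0/1 tuples; the function names and the sample function are hypothetical.

```python
from itertools import product

def conjunction_closure(points):
    """Smallest superset of `points` closed under componentwise conjunction."""
    closure = set(points)
    frontier = list(closure)
    while frontier:
        x = frontier.pop()
        for y in list(closure):
            z = tuple(a & b for a, b in zip(x, y))
            if z not in closure:
                closure.add(z)
                frontier.append(z)
    return closure

def maximal_horn_minorant(f, n):
    """Return h with F(h) equal to the conjunction closure of F(f)."""
    false_points = {x for x in product((0, 1), repeat=n) if f(x) == 0}
    closed = conjunction_closure(false_points)
    return lambda x: 0 if tuple(x) in closed else 1

# f = ~x1 ~x2  v  x3 on B^3 (0-based indices in the code)
f = lambda x: ((1 - x[0]) & (1 - x[1])) | x[2]
h = maximal_horn_minorant(f, 3)
```

In this example the closure adds the single point (0, 0, 0) to F(f), so that h coincides with the function $x_3$.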

6.3.1 Deduction in AI
The false points of Horn functions play a role in artificial intelligence in a slightly different context, though the characterization by Theorem 6.6 remains essential. In the artificial intelligence literature, typically, Horn CNFs instead of Horn DNFs are considered. A Horn CNF is a conjunction of elementary disjunctions, called clauses, in which at most one literal is positive. Due to De Morgan's laws, $\eta = \bigvee_{i=1}^{m} \bigl(\bigwedge_{j \in P_i} x_j\bigr)\bigl(\bigwedge_{k \in N_i} \bar{x}_k\bigr)$ is a Horn DNF if and only if $\bar{\eta} = \bigwedge_{i=1}^{m} \bigl(\bigvee_{j \in P_i} \bar{x}_j \vee \bigvee_{k \in N_i} x_k\bigr)$ is a Horn CNF. Accordingly, the solutions of the Boolean equation η = 0, that is, the false points of the Boolean mapping η are referred to as the models of η or, more precisely, as the models of the Boolean function h represented by the DNF η.
One of the frequently arising tasks in this context is deduction, that is, the problem of recognizing whether another logical expression $\eta'$ is consistent with the given knowledge base h represented by η. Here, consistency means that all models of η are also models of $\eta'$. Thus, deduction is equivalent to recognizing whether $\eta' \le \eta$. Such a task is solved customarily by algebraic manipulations of the expressions η and $\eta'$ (e.g., consensus operations). As a new approach, model-based reasoning was introduced recently by a number of authors (see, e.g., [554, 566]). In this approach, based on the equivalence $\eta' \le \eta \iff F(\eta') \supseteq F(\eta)$, the relation $\eta' \le \eta$ is tested by checking the values of $\eta'$ on the set F(η). Though this approach may be inefficient for general Boolean functions, a more efficient variant of it was introduced by Khardon and Roth [566] for Horn knowledge bases. The following observation serves as the basis for the improvement:
Theorem 6.8. Let A, B, S be subsets of $\mathbb{B}^n$ such that $A \subseteq S \subseteq A^{\wedge}$ and $B \subseteq S \subseteq B^{\wedge}$. Then, $S \subseteq (A \cap B)^{\wedge}$.
Proof. Assume indirectly that $S \not\subseteq (A \cap B)^{\wedge}$. It follows from Lemma 6.1 that $A \not\subseteq (A \cap B)^{\wedge}$ and $B \not\subseteq (A \cap B)^{\wedge}$. Let us choose a maximal point $X \in (A \cup B) \setminus (A \cap B)^{\wedge}$ with respect to the usual componentwise comparison. We can assume, without any loss of generality, that X ∈ A (and hence, $X \notin B$). Since $A \subseteq S \subseteq B^{\wedge}$, there exist k binary vectors $Y_1, \ldots, Y_k \in B$ such that $X = Y_1 \wedge Y_2 \wedge \cdots \wedge Y_k$. Furthermore, since $X \notin B$, we have k ≥ 2 and $Y_j \neq X$ for j = 1, …, k. By the maximality of X in $(A \cup B) \setminus (A \cap B)^{\wedge}$, we must have $Y_j \in (A \cap B)^{\wedge}$ for all j = 1, …, k, implying $X \in (A \cap B)^{\wedge}$ by Lemma 6.1, and hence, contradicting the choice of X.

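Corollary 6.4. For every nonempty subset $S \subseteq \mathbb{B}^n$, there exists a unique minimal subset $Q(S) \subseteq S$ such that $Q(S)^{\wedge} = S^{\wedge} \supseteq S$.
Proof. Define
$$Q(S) = \bigcap_{Q \subseteq S \subseteq Q^{\wedge}} Q. \tag{6.8}$$

Clearly, $Q(S) \subseteq S$, and by Theorem 6.8, $S \subseteq Q(S)^{\wedge}$. It follows by Lemma 6.1 that $Q(S)^{\wedge} = S^{\wedge}$.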

In particular, for a Horn function h, we have $F(h) = Q(F(h))^{\wedge}$ by Theorem 6.6 and by Corollary 6.4. The elements of Q(F(h)) are called the characteristic models of h by Khardon and Roth [566], who argue that these points are enough for model-based deduction.
Theorem 6.9. ([566]). Given a Horn function h and a Horn DNF η, we have η ≤ h
if and only if η(X) = 0 holds for all X ∈ Q(F (h)).
Proof. Assume first that η ≤ h holds. Then we have η(X) = h(X) = 0 for all
X ∈ F (h); and the claim follows by F (h) ⊇ Q(F (h)).
Assume next that η(X) = 0 for all X ∈ Q(F (h)). This means that
F (η) ⊇ Q(F (h)); hence F (η) ⊇ Q(F (h))∧ = F (h) by Theorem 6.6 and
Corollary 6.4. 
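
Model-based deduction with characteristic models can be illustrated on the set of Example 6.5. The sketch below rests on an observation that is not spelled out in the text and should be read as an assumption of this example: a point of S belongs to Q(S) exactly when it is not the conjunction of the points of S lying strictly above it. The (pos, neg) encoding of terms, the 0-based indices, and the function names are likewise hypothetical.

```python
def characteristic_models(points):
    """Points of S that are not the conjunction of the other points above them."""
    S = set(points)
    result = set()
    for x in S:
        above = [y for y in S
                 if y != x and all(a <= b for a, b in zip(x, y))]
        if not above or tuple(min(col) for col in zip(*above)) != x:
            result.add(x)
    return result

def eval_dnf(dnf, x):
    """Value (0 or 1) of a DNF given as a list of (pos, neg) index sets."""
    return int(any(all(x[j] for j in pos) and not any(x[k] for k in neg)
                   for pos, neg in dnf))

def below_h(dnf, char_models):
    """Test of Theorem 6.9: eta <= h iff eta vanishes on Q(F(h))."""
    return all(eval_dnf(dnf, x) == 0 for x in char_models)

# F(h): the conjunction-closed set of Example 6.5
F_h = {(0, 1, 0, 1), (0, 1, 1, 0), (1, 1, 1, 1), (0, 1, 0, 0)}
Q = characteristic_models(F_h)

print(below_h([(set(), {1})], Q))   # eta = ~x2
print(below_h([({2}, {3})], Q))     # eta = x3 ~x4
```

Here Q consists of the three generating points only, so each deduction query evaluates η on three points instead of four; the saving can be much larger when F(h) is big.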

Further characterizations and properties of characteristic models are stated as exercises at the end of this chapter (see also [566]).

6.4 Horn equations


One of the main reasons Horn expressions appear in applications is that the tautol-
ogy problem for Horn DNFs (or, equivalently, the satisfiability problem for Horn
CNFs) can be solved efficiently (see Even, Itai, and Shamir [318] or Dowling
and Gallier [277]). In this section we recall this result as well as several related
algorithmic ideas.

6.4.1 Horn equations and the unit literal rule


Let us first observe, as a further important implication of Theorem 6.6, that the unique minimal false point $X^h$ of a Horn function h, as defined in Corollary 6.3, provides us with a characterization of the negative linear prime implicants of h.
Corollary 6.5. Given a Horn function h, a negative linear term $T = \bar{x}_j$ is an implicant of h if and only if $x^h_j = 1$.
Proof. Assume first that $T = \bar{x}_j$ is an implicant of h, that is, $\bar{x}_j \le h$. This implies that $y_j = 1$ for all vectors Y ∈ F(h), and hence, $x^h_j = 1$, by the characterization of $X^h$ in the proof of Corollary 6.3.
For the converse direction, consider an index j for which $x^h_j = 1$. Then $y_j = 1$ for all vectors Y ∈ F(h) by the definition of $X^h$, and hence, $\bar{x}_j \le h$, that is, the term $T = \bar{x}_j$ is indeed an implicant of h.

To prove that the tautology problem for Horn DNFs can be solved efficiently, we shall show below that given a Horn DNF η representing the Horn function h, the unique minimal false point $X^h \in F(h)$ can be found in linear time in the size |η| of the DNF η. (As before, |η| denotes the number of literals occurring in the DNF η.) Furthermore, h = 1 can also be recognized with the same effort whenever F(h) = ∅.
Consider a Horn DNF η of the form (6.2), and denote by h the Horn function represented by η. We assume, without loss of generality, that $|P_i \cup N_i| > 0$ for all i = 1, …, m.
Note first that if $P_i \neq \emptyset$ for all terms i = 1, …, m, then the vector $0 = (0, 0, \ldots, 0) \in \mathbb{B}^n$ is a solution of the equation η(X) = 0, and clearly, $X^h = 0$ is the unique minimal false point in this case.
Consider next the case in which $P_i = \emptyset$ for some term $T_i$ of η. In this case, $T_i$ must be a negative linear term of the form $T_i = \bar{x}_j$ for some index j. Clearly, for all solutions of the equation η(X) = 0 (i.e., for all false points X ∈ F(h)), we have $x_j = 1$, and thus $x^h_j = 1$ is implied, too.
Based on these observations, a naïve approach to solving the equation η(X) = 0
could proceed as shown in Figure 6.1. We can observe that this procedure is a
restricted version of the so-called Unit Literal Rule employed by most satisfiability
algorithms (see Chapter 2). In this version only negative linear terms are used, and
hence, we call it the Negative Unit Literal Rule procedure (NULR).
Procedure NULR(η)
Input: A Horn DNF η representing the Horn function h.
Output: A false point of h or a proof that h = 1.

set $\eta^0 := \eta$ and k := 0.
repeat
    if there is an empty term in $\eta^k$
    then stop {comment: no solution, h = 1}
    else find j such that $\bar{x}_j$ is a negative linear term of $\eta^k$;
        if there is no such index j
        then set all remaining variables to 0,
             return $X = X^h$ and stop {comment: solution found}
        else
             set $x_j := 1$, $\eta^{k+1} := \eta^k|_{x_j = 1}$, and k := k + 1.

Figure 6.1. Procedure NULR.
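
The procedure of Figure 6.1 translates almost line by line into code. The following Python sketch is a naive rendering, not an optimized one: a term is encoded as a pair (pos, neg) of index sets with at most one element in neg, variables are numbered from 0, and `None` is returned when an empty term is found. These conventions and the function name are assumptions of this example.

```python
def nulr(dnf, n):
    """Negative unit literal rule on a Horn DNF over n variables.

    Returns the minimal false point as a tuple, or None if the DNF is a tautology.
    """
    current = [(frozenset(p), frozenset(q)) for p, q in dnf]
    x = [0] * n
    while True:
        if any(not p and not q for p, q in current):
            return None                      # empty term: h is identically 1
        unit = next((q for p, q in current if not p and len(q) == 1), None)
        if unit is None:
            return tuple(x)                  # remaining variables stay at 0
        (j,) = unit
        x[j] = 1
        # restriction to x_j = 1: terms containing ~x_j vanish, x_j is deleted elsewhere
        current = [(p - {j}, q) for p, q in current if j not in q]

# ~x1  v  x1 ~x2  v  x2 x3 ~x4  v  x1 x2 x5   (0-based indices)
ETA = [(set(), {0}), ({0}, {1}), ({1, 2}, {3}), ({0, 1, 4}, set())]
print(nulr(ETA, 5))
print(nulr([(set(), {0}), ({0}, set())], 1))
```

Each pass of the loop rescans the whole current DNF, which matches the O(n|η|) bound of the naive procedure rather than a linear one.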

Theorem 6.10. Let η be a Horn DNF of the Horn function h on $\mathbb{B}^n$. Then, algorithm NULR(η) runs in O(n|η|) time, and either it detects that $h \equiv 1$ or it finds the vector $X^h \in F(h)$.

Proof. Let us denote by l the value of index k at termination, and let $j_k$ denote the index of the variable fixed at 1 in step k − 1. Observe that for every k = 1, …, l we have $\eta^k = \eta^{k-1}|_{x_{j_k} = 1}$, and hence,

$$\eta^k = \eta|_{x_{j_1} = 1,\, x_{j_2} = 1,\, \ldots,\, x_{j_k} = 1}. \tag{6.9}$$

We claim that $\bar{x}_{j_k}$, k = 1, …, l, are negative linear implicants of h. This is clearly true for k = 1, since $\bar{x}_{j_1}$ is a term of η by the choice of $j_1$. Let us prove the claim by induction on k, and let us assume that it is true for k < i ≤ l. Then $\bar{x}_{j_i}$ is a term of $\eta^{i-1}$ by the choice of $j_i$; hence, by (6.9), η must have a term T of the form $T = \bigl(\bigwedge_{j \in S} x_j\bigr)\bar{x}_{j_i}$ for some subset $S \subseteq \{j_1, \ldots, j_{i-1}\}$. Since the terms $\bar{x}_j$ for j ∈ S are linear implicants of h by our assumption, $\bar{x}_{j_i} \le T \vee \bigvee_{j \in S} \bar{x}_j \le h$ follows, proving that $\bar{x}_{j_i}$ is an implicant of h and concluding the inductive proof of the claim.
If algorithm NULR terminates with finding an empty term in $\eta^l$, then it follows by (6.9) that η must contain a term T of the form

$$T = \bigwedge_{i \in S} x_i$$

for some subset $S \subseteq \{j_1, j_2, \ldots, j_{l-1}\}$. Therefore,

$$1 = \Bigl(\bigwedge_{i \in S} x_i\Bigr) \vee \Bigl(\bigvee_{i \in S} \bar{x}_i\Bigr) \le h$$

follows, implying $h \equiv 1_n$.
On the other hand, if NULR terminates with finding a solution, let us denote this solution by $X^*$; thus, $X^*$ is the point defined as

$$x^*_i = \begin{cases} 1 & \text{if } i \in \{j_1, \ldots, j_l\}, \\ 0 & \text{otherwise.} \end{cases}$$

Then, since $\eta^l$ has neither an empty term, nor a negative linear term, $\eta^l(0, 0, \ldots, 0) = 0$ follows. By (6.9), we have $0 = \eta^l(0, 0, \ldots, 0) = \eta(X^*)$, implying $X^* \in F(h)$. Then $X^* \ge X^h$ follows by Corollary 6.3. Since we have shown that all the terms $\bar{x}_{j_k}$ for k = 1, …, l are negative linear implicants of h, and since $h \not\equiv 1_n$, all these terms must be negative linear prime implicants of h, implying thus $X^* \le X^h$ by the definition of $X^*$ and by Corollary 6.5. Hence, $X^* = X^h$ follows, concluding the proof of correctness.
Finally, we note that all operations of the repeat loop can obviously be carried out in linear time in the size of the input DNF η, hence the total running time of NULR can be bounded by O(n|η|).

The procedure NULR can actually be implemented to run in linear time, by representing the input DNF in an appropriate data structure:

Theorem 6.11. Procedure NULR can be implemented to run in O(n + |η|) time.

Proof. We leave the proof as an exercise to the reader (see, e.g., Exercise 6 at the
end of this chapter). 
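
One possible data structure achieving the bound of Theorem 6.11 keeps, for every term, a counter of its uncomplemented variables not yet fixed to 1, together with occurrence lists for the variables. The sketch below is offered as an illustration of this idea and is not claimed to be the intended solution of the exercise; the encoding of terms as (pos, neg) pairs, the 0-based indices, and all names are assumptions.

```python
from collections import deque

def nulr_linear(dnf, n):
    """Counter-based NULR: returns the minimal false point, or None for a tautology."""
    terms = [(list(p), next(iter(q), None)) for p, q in dnf]
    missing = [len(p) for p, _ in terms]     # uncomplemented variables not yet fixed to 1
    occurs = [[] for _ in range(n)]          # occurs[j]: terms containing x_j uncomplemented
    for i, (p, _) in enumerate(terms):
        for j in p:
            occurs[j].append(i)
    x = [0] * n
    queue = deque()

    def fire(i):
        """Term i has lost all its uncomplemented variables; False means empty term."""
        head = terms[i][1]
        if head is None:
            return False
        if not x[head]:                      # ~x_head has become a negative linear term
            x[head] = 1
            queue.append(head)
        return True

    for i in range(len(terms)):
        if missing[i] == 0 and not fire(i):
            return None
    while queue:
        j = queue.popleft()
        for i in occurs[j]:
            missing[i] -= 1
            if missing[i] == 0 and not fire(i):
                return None
    return tuple(x)

ETA = [(set(), {0}), ({0}, {1}), ({1, 2}, {3}), ({0, 1, 4}, set())]
print(nulr_linear(ETA, 5))
```

Every variable enters the queue at most once and every literal occurrence is visited at most once, so the work after the initial pass over the input is proportional to n + |η|.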

A first important consequence of the previous results is that, unlike in the case
of general Boolean functions, we can decide in polynomial time whether or not a
given term is an implicant of a Horn function.

Corollary 6.6. Given a Horn DNF η of a Horn function h, one can decide in
O(n + |η|) time whether a given term T is an implicant of h.

Proof. Follows readily by Theorems 3.8 and 6.11, since the restriction of η for
T = 1 is, again, a Horn DNF. 

Recall from Chapter 1 that a DNF η of the Boolean function h is called prime
if all terms of η are prime implicants of h and called irredundant if no terms can
be deleted from η without changing the Boolean function it represents.

Theorem 6.12. Given a Horn DNF η of a Horn function h, one can construct in O(|η|(n + |η|)) time an irredundant and prime Horn DNF of h.

Proof. For a term T of η, let $\eta'$ denote the DNF obtained from η by deleting the term T. Clearly, $\eta = \eta'$ if and only if T is an implicant of $\eta'$, which we can test in $O(n + |\eta'|)$ time in view of Corollary 6.6. Repeating this for all terms of η one by one, and deleting redundant terms, we can produce in O(m(n + |η|)) time an irredundant DNF of h.
To achieve primality, let us take a term T of the current Horn DNF, and let $T'$ denote the term obtained from T by deleting a literal u of T. By definition, if $T'$ is an implicant of h, then we can replace T by $T'$. According to Corollary 6.6, we can test whether $T'$ is an implicant in O(n + |η|) time. Thus, by repeating this procedure for all literals of T, replacing T by $T'$ whenever $T'$ is proved to be an implicant, and repeating for all terms of η, we can derive in O(|η|(n + |η|)) time a prime DNF of h.
Since |η| ≥ m, the claim follows.
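
The implicant test of Corollary 6.6 and the two reductions used in the proof above can be sketched as follows. This is a compact illustration under several assumptions: terms are pairs (pos, neg) of index sets, the tautology test is a simple quadratic variant of NULR rather than the linear-time one, and all function names are hypothetical. The sketch shortens terms first and removes redundant terms afterwards, a choice made here so that the final list is both prime and irredundant.

```python
def is_tautology(dnf):
    """NULR-style tautology test for a Horn DNF (simple, not linear-time)."""
    ones = set()
    while True:
        live = [(p - ones, q) for p, q in dnf if not (q & ones)]
        if any(not p and not q for p, q in live):
            return True                       # an empty term appeared
        units = {k for p, q in live if not p for k in q}
        if not units:
            return False                      # a false point exists
        ones |= units

def is_implicant(dnf, pos, neg):
    """T is an implicant iff the restriction of the DNF to T = 1 is a tautology."""
    restricted = [(p - pos, q - neg) for p, q in dnf
                  if not (p & neg) and not (q & pos)]
    return is_tautology(restricted)

def irredundant_prime(dnf):
    terms = [(frozenset(p), frozenset(q)) for p, q in dnf]
    # Step 1: delete literals as long as the term remains an implicant.
    for i, (p, q) in enumerate(terms):
        for lit in sorted(p):
            if is_implicant(terms, p - {lit}, q):
                p = p - {lit}
        for lit in sorted(q):
            if is_implicant(terms, p, q - {lit}):
                q = q - {lit}
        terms[i] = (p, q)
    # Step 2: drop every term that is an implicant of the remaining ones.
    i = 0
    while i < len(terms):
        rest = terms[:i] + terms[i + 1:]
        if is_implicant(rest, *terms[i]):
            terms = rest
        else:
            i += 1
    return terms

# x1 x2  v  x1 ~x2  v  x1 x3 ~x4  represents the function x1
print(irredundant_prime([({0, 1}, set()), ({0}, {1}), ({0, 2}, {3})]))
```

A single pass over the literals of a term suffices in Step 1: if deleting a literal fails once, it also fails after further literals have been deleted.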

6.4.2 Pure Horn equations and forward chaining


When dealing with pure Horn DNFs, the tautology problem is trivial in view of
the following observation:
Remark 6.1. If η is a pure Horn DNF, then η(1, 1, . . . , 1) = 0. 
Thus, in order to solve a pure Horn DNF equation, it is enough to read the input
to confirm that it is indeed pure Horn; no additional computations are needed.
However, to find the unique minimal solution of a pure Horn equation, one needs
to employ NULR or a similar procedure. Such a variant of NULR, applied to pure
Horn expressions, is widely used in artificial intelligence, where it is known as the
forward chaining procedure.
To derive this procedure, we first consider a slightly more general inference problem that frequently arises in the AI literature. Given a DNF η and a subset S ⊆ {1, 2, …, n} of indices of its variables, let us denote by $\eta|_S$ the DNF $\eta|_{x_i = 1,\, i \in S}$ obtained by fixing all variables $x_j$ to 1 in η, for j ∈ S. With these notations, we are interested in the inference problem: Given a pure Horn DNF η and a subset S of indices, find all other indices $j \notin S$ such that $x_j = 1$ is implied by the equation $\eta|_S = 0$. Clearly, a pure Horn term of the form $\bigl(\bigwedge_{i \in S} x_i\bigr)\bar{x}_j$ is an implicant of η if and only if $x_j = 1$ is such an implied assignment. Thus, in other words, we would like to determine the set of all negative linear terms of the pure Horn expression $\eta|_S$. Our previous results show that this can be done in linear time by using NULR. However, the computation can be organized in a somewhat simpler way in this special case, as described in Figure 6.2.
It is easy to see that this forward chaining procedure can be implemented to
run in linear O(n + |η|) time, just like NULR (see Theorem 6.11). However, there
are two major differences between forward chaining and NULR: Namely, forward
chaining starts fixing the variables xj for j ∈ S in the first step of the procedure, and
it does not check for inconsistency of the input expression. This is well justified
by Remark 6.1, since forward chaining is defined for pure Horn expressions only.
NULR can in fact be viewed as a natural generalization of forward chaining for
general Horn expressions, starting with S = ∅.
For a pure Horn DNF η, we look at the set $S^{\eta}$ as the logical closure of S; the mapping $S \mapsto S^{\eta}$ satisfies the usual properties of a closure operator.

Procedure Forward Chaining(η, S)

Input: A pure Horn DNF η on $\mathbb{B}^n$, and a subset S ⊆ {1, 2, …, n}.
Output: A superset $S^{\eta}$ of S.

set $S^0 := S$, $\eta^0 := \eta|_S$, and k := 0;
repeat while there is a negative linear term $\bar{x}_j$ of $\eta^k$:
    set $S^{k+1} := S^k \cup \{j\}$, $\eta^{k+1} := \eta^k|_{x_j = 1}$, and k := k + 1;
return $S^{\eta} := S^k$

Figure 6.2. Procedure Forward Chaining.
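
Forward chaining can be sketched in a few lines. The version below is deliberately naive (it rescans all terms until nothing changes, so it is not linear-time); the encoding of a pure Horn term as a pair (pos, neg) with exactly one index in neg, the 0-based indices, and the function name are assumptions of this example.

```python
def forward_chaining(dnf, S):
    """Return the closure of the index set S under the pure Horn DNF `dnf`."""
    closure = set(S)
    changed = True
    while changed:
        changed = False
        for pos, neg in dnf:
            (j,) = neg                      # pure Horn: exactly one complemented variable
            if j not in closure and set(pos) <= closure:
                closure.add(j)              # ~x_j is a negative linear term of the restriction
                changed = True
    return closure

# x1 ~x2  v  x2 x3 ~x4  v  x4 ~x5  v  x6 ~x1   (0-based indices)
ETA = [({0}, {1}), ({1, 2}, {3}), ({3}, {4}), ({5}, {0})]
print(sorted(forward_chaining(ETA, {0, 2})))
```

Starting from $x_1$ and $x_3$, the procedure successively derives $x_2$, $x_4$, and $x_5$, while $x_6$ is never forced.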

Lemma 6.2. If η is a pure Horn DNF, then $S^{\eta} \supseteq S$, and $(S^{\eta})^{\eta} = S^{\eta}$ for all subsets S ⊆ {1, 2, …, n}.

Proof. Follows directly by the definition. 

Note that, in fact, the set $S^{\eta}$ depends only on the pure Horn function h represented by η, and not on the particular representation η. Hence, we often prefer to use the notation $S^h$ rather than $S^{\eta}$. Still, although the set $S^h$ does not depend on the given representation of h, its computation may; hence, the notation $S^{\eta}$ will also be used when necessary to avoid computational ambiguity.
The forward chaining procedure can also be viewed as producing the unique minimal false point of h within a subcube of $\mathbb{B}^n$. Recall from Chapter 1 that for a subset S ⊆ {1, 2, …, n} we denote by $e_S$ the characteristic vector of S, and by $T|_{S,\emptyset}$ the subcube of vectors $X \ge e_S$: $T|_{S,\emptyset} = \{X \in \mathbb{B}^n \mid x_i = 1 \text{ for all } i \in S\}$. With these notations the following statement follows directly from the forward chaining procedure:

Remark 6.2. Given a pure Horn function h and a subset S ⊆ {1, 2, …, n}, the point $e_{S^h}$ is the unique minimal point in $T|_{S,\emptyset} \cap F(h)$.

Let us add that the simple linear-time forward chaining procedure is also
instrumental in testing if a given term is an implicant of a Horn function.

Lemma 6.3. Given a pure Horn DNF η of the pure Horn function h, a term $T = \bigl(\bigwedge_{j \in P} x_j\bigr)\bigl(\bigwedge_{j \in N} \bar{x}_j\bigr)$ is an implicant of h if and only if $N \cap P^{\eta} \neq \emptyset$.

Proof. If T is not an implicant of h, then there must exist a vector $X^* \in \mathbb{B}^n$ such that $h(X^*) = 0$ and $T(X^*) = 1$, implying $x^*_j = 0$ for all j ∈ N. Moreover, for all indices $j \in P^{\eta}$ we must have $x^*_j = 1$ by the definition of $P^{\eta}$. Thus, $N \cap P^{\eta} = \emptyset$ follows.
Conversely, if $i \in N \cap P^{\eta}$, then the term $T' = \bigl(\bigwedge_{j \in P} x_j\bigr)\bar{x}_i$ is an implicant of h, as we observed earlier, and thus $T \le T'$ is an implicant, too.
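
Lemma 6.3 reduces the implicant test for pure Horn functions to one run of forward chaining. The sketch below compares this test with a brute-force check over all points; the term encoding as (pos, neg) index sets, the 0-based indices, the sample DNF, and the function names are assumptions of this example.

```python
from itertools import product

def forward_chaining(dnf, S):
    """Naive closure of the index set S under a pure Horn DNF."""
    closure = set(S)
    changed = True
    while changed:
        changed = False
        for pos, neg in dnf:
            (j,) = neg
            if j not in closure and set(pos) <= closure:
                closure.add(j)
                changed = True
    return closure

def is_implicant(dnf, pos, neg):
    """Test of Lemma 6.3: N must meet the forward chaining closure of P."""
    return bool(set(neg) & forward_chaining(dnf, pos))

def is_implicant_brute_force(dnf, pos, neg, n):
    """Check T <= h point by point (exponential, for comparison only)."""
    for x in product((0, 1), repeat=n):
        t = all(x[j] for j in pos) and not any(x[k] for k in neg)
        h = any(all(x[j] for j in p) and not any(x[k] for k in q) for p, q in dnf)
        if t and not h:
            return False
    return True

# x1 ~x2  v  x2 x3 ~x4  v  x4 ~x5  v  x6 ~x1   (0-based indices)
ETA = [({0}, {1}), ({1, 2}, {3}), ({3}, {4}), ({5}, {0})]
print(is_implicant(ETA, {0, 2}, {4}))   # x1 x3 ~x5
print(is_implicant(ETA, {0}, {3}))      # x1 ~x4
```

The first term is an implicant because index 4 lies in the closure of {0, 2}; the second is not, since the closure of {0} is {0, 1}.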

6.4.3 More on Horn equations


We conclude this section with a few remarks about related results and techniques.
First, we note that the polynomial solvability of Horn equations was probably
well-known “folklore,” and some implementations were made independently in
AI (see, e.g., the development of the programming language PROLOG [648]) and
in database theory (see, e.g., [19, 185, 267, 489, 539, 582]), well before linear
time solvability was formally proved. Several linear time algorithms have been
proposed for Horn equations, using mainly graph or directed hypergraph models
(see, e.g., [277, 318, 815]).
A special variant of this problem, the so-called unique Horn satisfiability prob-
lem also gained some popularity in the literature. In the DNF variant of this
problem, a Horn DNF η is given, and the problem is to decide whether the
Boolean equation η = 0 has a unique solution or, in other words, to decide whether
|F (h)| = 1 for the function h represented by η. The difficulty of this problem
comes from the fact that, while it is relatively easy to generate the negative linear prime implicants of a Horn function, one has to employ a more complicated algorithm to efficiently generate the positive linear prime implicants. Minoux [682] presented an O(|η| + n log n) algorithm, improved later by Schlipf et al. [809] to O(|η| + mα(m + n, n)), where α denotes the inverse Ackermann function, and m is the number of terms in η. The existence of a truly linear time algorithm for unique Horn satisfiability is an open problem, as of now. Using the somewhat different computational model of random access machines, Pretolani [757] provides a linear time algorithm (based on a result of Gabow and Tarjan [357]).
The unit literal rule is widely used and is one of the basic procedures in most satisfiability solvers. In fact, it was shown to provide a polynomial time solution, not only to Horn equations, but also to a much larger class of Boolean equations (see, e.g., Schlipf et al. [809]).
In order to attack hard general (non-Horn) equations and satisfiability problems,
several heuristics and approximations rely on Horn approximations of Boolean
expressions (see, e.g., [106, 365, 554, 555, 596] and Section 2.5.2). We pose as
exercises the related problems of finding tight Horn minorants and/or majorants
of Boolean expressions.

6.5 Prime implicants of Horn functions


Logical inference is a central problem in various areas, including theorem proving,
logic programming, databases, and so on. We can formulate logical inference as
the problem of generating the prime implicants of a given DNF.
As we saw in Chapter 3, the consensus method is one of the general methods used
to obtain all prime implicants of a Boolean function. Let us recall (see Theorem 3.9
and Corollary 3.6) that, for function classes for which the corresponding Boolean
equation is tractable, prime implicants can efficiently be generated in total time
(see also Appendix B). This certainly implies, according to Theorems 6.10 and 6.11, that prime implicants of a Horn function can be generated in polynomial total time. For the sake of completeness, let us briefly repeat here the heart of the argument.
Going into the details of the general consensus method, we can see that a great
part of its computational redundancy is caused by the fact that for every prime
implicant T of the input function f , the algorithm may generate many (maybe
exponentially many) implicants, all of which will eventually be absorbed by T .
Hence, if we could repeatedly simplify the list of terms representing f , allowing
only prime implicants in it, we would cut drastically the length of the computations.
For Horn DNFs this can be done efficiently, by Theorem 6.12. Using this idea
Boros, Crama, and Hammer [112] proved that prime implicants of a Horn DNF
can be generated in polynomial incremental time. More precisely, it was shown
in [112] that given two Horn DNFs, φ and ψ, one can decide whether ψ contains
all prime implicants of φ, and if not, one can find a new prime implicant of φ in
poly(|φ|, |ψ|) time. Let us note that the same decision problem for general DNFs
is a hard problem, since the testing of whether the terms of ψ are indeed (prime)
implicants of φ is already a co-NP-complete problem by Theorem 3.7.
In this section, we present some further specialized versions of the consensus
method that for Horn functions provide incrementally efficient ways to generate
all prime implicants.
The first variant is the Prime Implicant Depletion procedure described in
Figure 6.3. It runs on a prime DNF, and it is based on applying the consensus
operation with the prime implicants of the input Horn DNF one-by-one, in any
order, without ever returning to the same prime implicant. (This is akin to the
variable depletion procedure in Section 3.2.2.)
To see the correctness of the algorithm in Figure 6.3, let us first prove the
following lemma.

Procedure Prime Implicant Depletion(η)

Input: A prime DNF $\eta = T_1 \vee T_2 \vee \ldots \vee T_m$.
Output: All prime implicants of the function f represented by η.

initialize P := ∅, L := {$T_1, \ldots, T_m$};
repeat while L ≠ ∅
    select a term T ∈ L and set L := L \ {T} and P := P ∪ {T};
    generate all consensuses of T with terms $T' \in$ L, and add the produced terms to L;
    substitute each term in L by a corresponding prime implicant of f absorbing this term;
    eliminate duplicates from L, as well as those terms which also appear in P;
end while
return the list of terms in P

Figure 6.3. Procedure Prime Implicant Depletion.


Lemma 6.4. Let f be a Boolean function represented by the DNF ϕ = A ∨ ψ, where A is an elementary conjunction, and let us denote by η the disjunction of all the terms obtained by consensus of A with the terms of ψ. Then, for every implicant T of f, we have either T ≤ A or T ≤ ψ ∨ η.

Proof. Let us assume that T is an implicant of f for which T ≤ A does not hold; hence, A contains a literal u that is not in T. Let us show that T ≤ ψ ∨ η.
Consider any binary point $X \in \mathbb{B}^n$ for which T(X) = 1. Since T ≤ f, we must have A(X) ∨ ψ(X) = 1. If ψ(X) = 1, we are done. Otherwise, A(X) = 1, and in particular, u(X) = 1. Let us denote by Y the binary point obtained from X by switching the value of the literal u. Since $u \notin T$, we have T(Y) = 1, hence f(Y) = 1. On the other hand, since u ∈ A, we have A(Y) = 0, thus implying ψ(Y) = 1. This, together with ψ(X) = 0, implies that there is a term B in ψ involving the literal $\bar{u}$, for which B(Y) = 1. Hence, the terms A and B have exactly one conflict, and thus their consensus $C = (A \cup B) \setminus \{u, \bar{u}\}$ must be a term of η, implying C(X) = 1 ≤ η(X). This proves the lemma.

Theorem 6.13. The Prime Implicant Depletion procedure generates the com-
plete list of prime implicants of the Boolean function f represented by the prime
DNF η. Furthermore, when η is a Horn DNF, the procedure runs in polynomial
incremental time and each main while loop takes O(n(n + |η|)|L|) time, where L
is the current list of prime implicants at the beginning of the loop.

Proof. Let TL denote the disjunction of prime implicants in the current list L.
We argue by induction on the size of P that at any moment during the procedure,
every prime implicant of f is either explicitly listed in P, or is an implicant of
TL . This is clearly the case at the very beginning of the procedure. According
to Lemma 6.4, this property is not changed when we move a term T (a prime
implicant of f ) from L to P, and then increment L with the consensuses obtained
with T . The property also remains unchanged when we substitute the terms in L
by some absorbing prime implicants, since such a substitution does not change
the function represented by TL . Similarly, the property remains valid when we
eliminate duplicates from L.
Now, when the algorithm stops L is empty; hence, P contains all prime
implicants of f .
To see the complexity claim, let us observe that the consensus of two terms can
be carried out in O(n) steps, where n is the number of variables in η; hence all
consensuses in a main iteration take O(n|L|) time. This step introduces at most
|L| new terms. For each term, we need to find a prime implicant of f that absorbs
it, which can be done, for instance, by forward chaining in O(n(n + |η|)) time.
Hence, a prime list can be obtained in O(n(n + |η|)|L|) time. Finally, by keeping L
and P in a hash, the elimination of duplicates can be accomplished in O(|L| log n)
time, proving the claim. 
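
The Prime Implicant Depletion procedure can be sketched for Horn DNFs as follows. The code is an illustration under stated assumptions: terms are pairs (pos, neg) of frozensets of 0-based indices, the implicant test uses a simple quadratic variant of NULR, absorbing prime implicants are found by deleting literals one at a time, and all function names are hypothetical. The input terms are first shortened to prime implicants, so the input need not be prime.

```python
def is_tautology(dnf):
    """NULR-style tautology test for a Horn DNF (simple, not linear-time)."""
    ones = set()
    while True:
        live = [(p - ones, q) for p, q in dnf if not (q & ones)]
        if any(not p and not q for p, q in live):
            return True
        units = {k for p, q in live if not p for k in q}
        if not units:
            return False
        ones |= units

def is_implicant(dnf, pos, neg):
    restricted = [(p - pos, q - neg) for p, q in dnf
                  if not (p & neg) and not (q & pos)]
    return is_tautology(restricted)

def to_prime(dnf, term):
    """Shorten an implicant to a prime implicant absorbing it."""
    pos, neg = term
    for lit in sorted(pos):
        if is_implicant(dnf, pos - {lit}, neg):
            pos = pos - {lit}
    for lit in sorted(neg):
        if is_implicant(dnf, pos, neg - {lit}):
            neg = neg - {lit}
    return (pos, neg)

def consensus(a, b):
    """Consensus of two terms, or None unless they conflict in exactly one variable."""
    (p1, q1), (p2, q2) = a, b
    conflict = (p1 & q2) | (p2 & q1)
    if len(conflict) != 1:
        return None
    return ((p1 | p2) - conflict, (q1 | q2) - conflict)

def prime_implicants(dnf):
    base = [(frozenset(p), frozenset(q)) for p, q in dnf]
    L = {to_prime(base, t) for t in base}
    P = set()
    while L:
        T = L.pop()
        P.add(T)
        new_terms = {consensus(T, U) for U in L} - {None}
        L |= {to_prime(base, C) for C in new_terms}
        L -= P                               # never return to a processed prime implicant
    return P

# x1 ~x2  v  x2 ~x3  has the additional prime implicant x1 ~x3
result = prime_implicants([({0}, {1}), ({1}, {2})])
print(sorted((sorted(p), sorted(q)) for p, q in result))
```

Using Python sets for L and P makes the elimination of duplicates implicit; the order in which terms are selected does not affect the final list.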
A further improvement can be achieved by introducing the restricted version of the consensus method in which only those consensuses are considered where at least one of the terms belongs to the original input DNF. More precisely, given a DNF η of the function f, let us call a consensus between two implicants of f an input consensus if at least one of these implicants is present in η.
Let us remark that input consensus is not necessarily complete for an arbitrary
input DNF, in the sense that not all prime implicants can be generated in this way.
Example 6.6. Consider the DNF

\[ \varphi = x_1 \bar{x}_3 x_7 \vee x_2 x_3 x_6 \vee x_1 \bar{x}_2 x_7 \vee \bar{x}_1 \bar{x}_4 x_7 \vee x_4 x_5 x_8 \vee \bar{x}_1 \bar{x}_5 x_7. \]

It is easy to check that the term $T = x_6 x_7 x_8$ is a prime implicant of φ. However, it
cannot be obtained from φ by input consensus, since all terms in φ have at least
two variables not present in T.
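
The claims of Example 6.6 can be checked by brute force. The following Python sketch assumes that the complemented literals of φ are those of x3 in the first term, x2 in the third, x1 and x4 in the fourth, and x1 and x5 in the sixth (the reading under which T is a prime implicant). The signed-integer encoding of literals and the helper names `evaluate`, `is_implicant`, and `is_prime_implicant` are illustrative choices, not notation from the text.

```python
from itertools import product

# Literals are signed integers: +i stands for x_i, -i for its complement.
# Assumed reading of the DNF phi of Example 6.6.
PHI = [
    frozenset({1, -3, 7}), frozenset({2, 3, 6}), frozenset({1, -2, 7}),
    frozenset({-1, -4, 7}), frozenset({4, 5, 8}), frozenset({-1, -5, 7}),
]
N = 8  # number of variables


def evaluate(dnf, point):
    """Value of the DNF at `point`, a dict mapping variable index to 0 or 1."""
    return any(all(point[abs(l)] == (l > 0) for l in term) for term in dnf)


def is_implicant(term, dnf, n):
    """Brute-force test: the DNF equals 1 wherever `term` equals 1."""
    fixed = {abs(l): int(l > 0) for l in term}
    free = [v for v in range(1, n + 1) if v not in fixed]
    return all(
        evaluate(dnf, {**fixed, **dict(zip(free, bits))})
        for bits in product((0, 1), repeat=len(free))
    )


def is_prime_implicant(term, dnf, n):
    """An implicant is prime if dropping any single literal destroys implicancy."""
    return is_implicant(term, dnf, n) and not any(
        is_implicant(term - {l}, dnf, n) for l in term
    )


T = frozenset({6, 7, 8})
print(is_prime_implicant(T, PHI, N))
# A consensus with an input term keeps all but one literal of that term, so it
# still contains a variable outside T whenever the check below succeeds.
print(all(len({abs(l) for l in t} - T) >= 2 for t in PHI))
```

The second check mirrors the argument of the example: a term produced by an input consensus always retains a variable foreign to T, so it cannot equal T.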
Furthermore, even when $\varphi = 1_n$ (so that the empty term is the unique prime implicant of φ),
this prime implicant can be generated by input consensus if and only if it can also be
generated by unit consensuses, that is, by the NULR procedure (see Exercise 10 in Chapter 2).
For Horn DNFs, however, this restricted variant of the consensus method works
well. It is described more precisely in Figure 6.4.
Before we prove the correctness of this procedure, we need to establish a result
shown originally by Chang and Lee [186] and by Jones and Laaser [539].
Lemma 6.5. Let us assume that η is a Horn DNF representing the function 1. An
empty term can then be derived from η by a sequence of input consensuses such
that each term of η is used at most once in the sequence.
Proof. Let us consider the procedure NULR(η). Since η = 1, NULR terminates
by finding an empty term in $\eta_k = \eta|_{x_{j_1}=1,\ldots,x_{j_k}=1}$, and we can conclude, as in the
proof of Theorem 6.10, that there is a corresponding positive term $T_0$ in η such
that $T_0 \subseteq \{j_1, \ldots, j_k\}$.

Procedure Input Consensus(η)


Input: A DNF η = T1 ∨ T2 ∨ . . . ∨ Tm .
Output: All prime implicants of the function f represented by η.

initialize P := ∅, L := {T1 , . . . , Tm };
repeat while L ≠ ∅
    select a term T ∈ L and set L := L \ {T } and P := P ∪ {T };
    generate all consensuses of T with the input terms T1 , …, Tm , and
        add the obtained new terms to L;
    absorption: delete from P ∪ L all terms which are absorbed
        by some other terms of P ∪ L;
end while
return the list of terms in P

Figure 6.4. Procedure Input Consensus.



For all i = 1, . . . , k, we can also observe that, since $\bar{x}_{j_i}$ is a negative linear term
of $\eta_{i-1}$, η must contain a corresponding term of the form

\[ T_i = \Bigl( \bigwedge_{j \in S_i} x_j \Bigr) \wedge \bar{x}_{j_i}, \]

where $S_i \subseteq \{j_1, \ldots, j_{i-1}\}$ (e.g., $S_1 = \emptyset$).

Let us then define $C_0 = T_0$, and let $C_i$ be the consensus of $C_{i-1}$ and $T_{k-i+1}$, for
i = 1, . . . , k. It is easy to verify by induction on i that these terms indeed have a
consensus (since otherwise NULR(η) would have stopped earlier), and that $C_i$ is
a positive term with $C_i \subseteq \{j_1, \ldots, j_{k-i}\}$ for i = 1, . . . , k − 1. Therefore, $C_k$ is the
empty term.
Since $T_0$ and $T_i$ for i = 1, . . . , k are all different terms of η, this chain of
consensuses provides an input consensus derivation of the empty term with no
repetitions.

Using the preceding lemma, we can now prove that the input consensus
algorithm indeed works for Horn DNFs (see Hooker [498]).
Lemma 6.6. Let η be a Horn DNF of the Horn function h, and let T be a prime
implicant of h. Then, T can be obtained from η by a sequence of input consensuses
such that each term of η is used at most once in the sequence.
Proof. Let us consider the DNF $\eta' = \eta|_{T=1}$ obtained from η by substituting the
value 1 for all literals in T. Then $\eta' \equiv 1$, and hence, by Lemma 6.5, there is a subset of its terms,
say, $D_1, \ldots, D_l$, such that the empty term can be obtained from these by a sequence
of input consensuses, without repetitions. Since $D_1, \ldots, D_l$ are terms of $\eta'$, each $D_i$
corresponds to a term $T_i$ of η, for i = 1, . . . , l. Performing exactly the same sequence
of consensuses on $T_1, \ldots, T_l$ yields T.

It follows immediately from Lemma 6.6 that:


Corollary 6.7. The Input Consensus procedure correctly generates all prime
implicants of any Horn DNF.
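
The procedure of Figure 6.4 can be sketched in Python as follows. This is only an illustration under stated assumptions: literals are encoded as signed integers (+i for a variable, -i for its complement), terms as frozensets, and the names `consensus` and `input_consensus` are hypothetical. The sets `done` and `todo` play the roles of the lists P and L, and consensuses that already belong to P are not added to L again.

```python
def consensus(a, b):
    """Consensus of two terms, or None unless they conflict in exactly one variable."""
    conflicts = [l for l in a if -l in b]
    if len(conflicts) != 1:
        return None
    u = conflicts[0]
    return frozenset((a | b) - {u, -u})


def input_consensus(eta):
    """Sketch of Procedure Input Consensus; returns the final list P as a set."""
    inputs = [frozenset(t) for t in eta]
    done, todo = set(), set(inputs)           # the lists P and L
    while todo:
        t = todo.pop()
        done.add(t)
        for s in inputs:                       # consensuses with input terms only
            c = consensus(t, s)
            if c is not None and c not in done:
                todo.add(c)
        # Absorption: drop every term that strictly contains another term.
        pool = done | todo
        keep = {x for x in pool if not any(y < x for y in pool)}
        done &= keep
        todo &= keep
    return done


# Small Horn DNF x1 ~x2 v x2, which represents x1 v x2.
print(sorted(sorted(t) for t in input_consensus([{1, -2}, {2}])))
```

The absorption step scans all of P ∪ L in every iteration, which is exactly the cost that the prime substitution step discussed next is meant to avoid.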
The complexity of the Input Consensus algorithm, however, may not be
polynomial in the number of prime implicants of the input DNF: To achieve
polynomiality, we have to perform again the same “prime substitution” step as
in the Prime Implicant Depletion procedure; that is, whenever a new term T is
generated and added to the list L, we should subsequently substitute T by a prime
implicant absorbing it. This leads us to the Input Prime Consensus procedure
displayed in Figure 6.5.
We next prove that this modification is acceptable, and that the Input Prime
Consensus method correctly generates all prime implicants of the input function.
We first state an easy technical lemma.

Procedure Input Prime Consensus(η)


Input: A DNF η = T1 ∨ T2 ∨ . . . ∨ Tm .
Output: All prime implicants of the function f represented by η.

initialize P := ∅, L := {T1 , . . . , Tm };
repeat while L ≠ ∅
    select a term T ∈ L and set L := L \ {T } and P := P ∪ {T };
    generate all consensuses of T with the input terms T1 , …, Tm ;
    replace each such consensus by a prime implicant of f absorbing it;
    check if each of these prime implicants is in P ∪ L, and if not,
        add the new ones to L;
end while
return the list of terms in P

Figure 6.5. Procedure Input Prime Consensus.

Lemma 6.7. Let us assume that P, Q, R are implicants of a function f and that
P is the consensus of Q and R. Let us assume further that $R'$ is a prime implicant
of f absorbing R. Then, P is absorbed either by $R'$ or by the consensus of Q
and $R'$.

Proof. Assume first that Q and $R'$ do not have a consensus. Since $R \le R'$, this
implies that $P \le R'$. Assume next that Q and $R'$ have a consensus, say T. Then,
Q and $R'$ must have the same conflicting variable as Q and R, and thus, $P \le T$ is
implied.

Theorem 6.14. When η is a Horn DNF, the Input Prime Consensus procedure
correctly generates all prime implicants of the function represented by η.

Proof. Consider an arbitrary prime implicant P. In view of Lemma 6.6, P can be
generated by input consensus from η. Let $T_{i_j}$, j = 0, 1, ..., k, be the input terms
used in this consensus derivation of P, and let $R_j$, j = 1, ..., k, be the implicants
generated by these consensuses; more precisely, $R_1$ is the consensus of $T_{i_0}$ and $T_{i_1}$,
and $R_j$ is the consensus of $R_{j-1}$ and $T_{i_j}$ for j = 2, ..., k. Finally, $P = R_k$.
We claim that, for all j = 1, ..., k, the output list $\mathcal{P}$ contains a prime implicant $P_j \in \mathcal{P}$
absorbing $R_j$. Since $P = R_k$ is a prime implicant of f, this implies that $P = P_k \in \mathcal{P}$,
which completes the proof of the theorem.
Let us establish the claim. Clearly, the consensus of $T_{i_0}$ and $T_{i_1}$ is executed
by procedure Input Prime Consensus(η); thus, we must have a prime implicant
$P_1 \in \mathcal{P}$ absorbing $R_1$. Assume now, for some j with 2 ≤ j ≤ k, that there is a prime implicant
$P_{j-1} \in \mathcal{P}$ absorbing $R_{j-1}$, and consider $R_j$. By Lemma 6.7, either $P_{j-1}$ absorbs
$R_j$, or the consensus C of $T_{i_j}$ and $P_{j-1}$ absorbs $R_j$. In the former case, we can take
$P_j = P_{j-1}$. In the latter case, the consensus C must have been generated by procedure
Input Prime Consensus(η), since $P_{j-1} \in \mathcal{P}$ and $T_{i_j}$ is an input term; therefore, there is
a prime implicant $P_j \in \mathcal{P}$ absorbing C, and hence $P_j$ absorbs $R_j$.

Corollary 6.8. The complete list of prime implicants of the Boolean function
represented by a Horn DNF η can be generated with polynomial delay using
procedure Input Prime Consensus(η).

Proof. Let us remark first that the incremental complexity of the previously
described methods for prime implicant generation (namely, Prime Implicant
Depletion and Input Consensus) resulted from the fact that, in each main cycle,
we had to check for absorption, a task requiring time proportional to the length
of the lists P and L. The speedup of Input Prime Consensus is due to the fact
that, instead of absorption, we have to check now for membership in P and L; this
can be done in O(n) time with an appropriate data structure, independently of the
length of those lists.
More precisely, let us assume that we keep both P and L in a hash table. Then
inserting a new member, deleting a member, or checking membership can all be
done in O(n) time. Now, it is easy to see that with every execution of the main
while loop, we add exactly one new element to the output list P. In the while
loop, selecting a term T , deleting it from L and adding it to P takes O(n) time;
generating the consensus of T with T1 , ..., Tm can be done in O(nm) time; replacing
the (at most m) consensuses by prime implicants can be done in O(nm|η|) time;
checking membership in P and L takes O(nm) time; and adding the new terms to
L can be done in O(nm) time. It follows that all prime implicants can be generated
with polynomial delay O(nm|η|) between two successive prime implicants. 
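
A minimal Python sketch of the Input Prime Consensus procedure follows, assuming the input is a prime Horn DNF. Literals are signed integers and terms are frozensets; the helper names (`closure`, `is_implicant`, `prime_absorbing`, `input_prime_consensus`) are hypothetical. The prime substitution step is implemented here by greedily dropping literals, with implicancy tested by forward chaining, and Python hash sets stand in for the hash table mentioned in the proof.

```python
def consensus(a, b):
    """Consensus of two terms, or None unless they conflict in exactly one variable."""
    conflicts = [l for l in a if -l in b]
    if len(conflicts) != 1:
        return None
    u = conflicts[0]
    return frozenset((a | b) - {u, -u})


def closure(pos, eta):
    """Forward chaining closure of a set of variables under the pure Horn terms."""
    closed = set(pos)
    changed = True
    while changed:
        changed = False
        for term in eta:
            heads = [-l for l in term if l < 0]
            body = {l for l in term if l > 0}
            if len(heads) == 1 and heads[0] not in closed and body <= closed:
                closed.add(heads[0])
                changed = True
    return closed


def is_implicant(term, eta):
    """Implicant test for a Horn DNF `eta` via forward chaining."""
    closed = closure({l for l in term if l > 0}, eta)
    if any(all(l > 0 for l in t) and t <= closed for t in eta):
        return True   # no false point of eta contains the positive part
    return any(-l in closed for l in term if l < 0)


def prime_absorbing(term, eta):
    """Greedily drop literals while the term remains an implicant."""
    term = set(term)
    for lit in sorted(term, key=abs):
        if is_implicant(term - {lit}, eta):
            term.discard(lit)
    return frozenset(term)


def input_prime_consensus(eta):
    """Sketch of Procedure Input Prime Consensus for a prime Horn DNF."""
    inputs = [frozenset(t) for t in eta]
    done, todo = set(), set(inputs)            # hash sets for P and L
    while todo:
        t = todo.pop()
        done.add(t)
        for s in inputs:
            c = consensus(t, s)
            if c is None:
                continue
            p = prime_absorbing(c, inputs)
            if p not in done and p not in todo:  # membership test, no absorption
                todo.add(p)
    return done
```

Each pass of the main loop moves exactly one term into the output set and only performs membership tests, in line with the polynomial delay argument above.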

6.6 Properties of the set of prime implicants


Horn and pure Horn functions appear in many areas of applications, primarily
because several of the tasks arising in those applications can be reduced to solving
Boolean equations and thus, as we saw in Section 6.4, can be handled efficiently for
Horn systems. The actual complexity of these procedures depends, however, on the
representation of the underlying Horn function. Since a Horn function can typically
be represented by many different DNFs (and/or CNFs, etc.) of widely varying sizes,
it is a natural problem to find a “most efficient” representation of a given Horn
function. This is a very important practical problem, frequently considered in the
literature: Executing queries, or checking consistency in Horn rule bases, is faster
on “shorter” DNFs; the storage efficiency of relational databases is improved if
the Horn system of relations is represented in a most condensed form, and so on.
The basic problem of finding a “shortest,” or “most economical” Horn DNF
of a given Horn function is, in principle, a special case of logic minimization, a
topic that we considered in Chapter 3; see, in particular, Section 3.3. In this special
case, however, we are not only able to state more precise results, but we can
also introduce specific measures expressing what “shortest” should really mean in
different contexts.
While logic minimization is a hard problem in general, it becomes tractable for
certain measures of size in the special case of Horn functions. To be able to present
some of these positive results about Horn minimization, we need to establish
further results about the structure of the family of implicants and about Horn DNF
representations of a Horn function. Since these results may be of independent
interest, we present them in this section, before turning to Horn minimization in
Section 6.7.
Definition 6.6. A set $\mathcal{T}$ of terms (elementary conjunctions) is said to be
closed under consensus if, for any two terms $T', T'' \in \mathcal{T}$, their consensus, when
it exists, also belongs to $\mathcal{T}$.
Let us note the difference between this definition and a similar one introduced
in Section 3.2.2. In Definition 6.6, we consider a set of terms (without absorp-
tions), and not their disjunction. This is an important detail, since our purpose is
to understand the structure of different DNF representations of a given function.
Clearly, the intersection of closed sets of terms is closed again; hence, every
set of terms has a unique smallest closed set containing it.
Definition 6.7. The consensus closure $\mathcal{T}^c$ of a set of terms $\mathcal{T}$ is the smallest closed
set containing $\mathcal{T}$.
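
As an illustration of Definition 6.7, the consensus closure of a finite set of terms can be computed by a simple worklist iteration. The sketch below uses an assumed encoding (signed integers for literals, frozensets for terms) and hypothetical function names; note that, in line with the remark above, no absorption is performed.

```python
def consensus(a, b):
    """Consensus of two terms, or None unless they conflict in exactly one variable."""
    conflicts = [l for l in a if -l in b]
    if len(conflicts) != 1:
        return None
    u = conflicts[0]
    return frozenset((a | b) - {u, -u})


def consensus_closure(terms):
    """Smallest set of terms containing `terms` and closed under consensus."""
    closed = {frozenset(t) for t in terms}
    frontier = list(closed)
    while frontier:
        t = frontier.pop()
        for s in list(closed):          # pair t with every term found so far
            c = consensus(t, s)
            if c is not None and c not in closed:
                closed.add(c)
                frontier.append(c)
    return closed


# x1 and ~x1 x2 have the consensus x2; the absorbed term ~x1 x2 is kept.
print(sorted(sorted(t) for t in consensus_closure([{1}, {-1, 2}])))
```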
Given a Boolean function f, let us denote by $I_f$ the set of all implicants of f,
and let $P_f$ denote the set of its prime implicants. Clearly, $I_f$ is a closed set, and
$P_f^c$ is a subset of $I_f$ (typically, a proper subset).
Definition 6.8. Let T be a closed set of terms. A partition (R, D) of T
(R ∪ D = T and R ∩ D = ∅) is called a recessive-dominant partition (or, in
short, an RD-partition) of T if
• both R and D are closed under consensus, and
• if two terms T1 ∈ R and T2 ∈ D have a consensus T3 , then T3 ∈ D.

This terminology is inspired by a biological analogy: We can view the set of
implicants as a “population,” and the consensus operation as “mating” between
the members of this population. Then the above definition expresses that offspring
inherit a “dominant” strain when at least one of the parents possesses it, and they
inherit a “recessive” strain exactly when both parents have it.
Example 6.7. Let h be a Horn function, and consider the partition of its implicants
into positive and pure Horn terms, defined by
R = {T ∈ Ih | T is pure Horn} and D = {T ∈ Ih | T is positive}.
It is easy to verify that this is an RD-partition of Ih (cf. Exercise 26). 

For further examples of RD-partitions, we refer the reader to Čepek [173]
and Boros, Čepek, and Kogan [108] (see also Exercise 27). Let us add that
RD-partitions have nice algebraic properties (see, e.g., Exercise 25) which allow the
generation of even larger families of RD-partitions.
The significance of RD-partitions for Horn minimization is that RD-partitions
allow the decomposition of the minimization problem into a sequence of smaller
minimization problems.

To simplify our notations for the rest of this section, and with a slight abuse of
terminology, we shall view a DNF as a set of terms.

Definition 6.9. Given a DNF η representing the Horn function h and a family $\mathcal{T}$
of terms, let us denote by $\eta^{\mathcal{T}} = \eta \cap \mathcal{T}$ the DNF formed by those terms of η that
belong to $\mathcal{T}$, and let us call it the $\mathcal{T}$-component of the DNF η. Let us further denote
by $h^{\mathcal{T}}$ the Horn function defined by the disjunction of the terms in $P_h \cap \mathcal{T}$, and let
us call it the $\mathcal{T}$-component of h.

The following result of Čepek [173] implies that the $\mathcal{R}$-component of an
arbitrary prime DNF representation η of a Horn function h defines the same Boolean
function, namely, the $\mathcal{R}$-component of h, for any RD-partition $(\mathcal{R}, \mathcal{D})$ of $P_h^c$. As
a consequence, one can start the minimization of h by minimizing first $h^{\mathcal{R}}$,
represented by $\eta^{\mathcal{R}}$, and then replacing in η the terms of $\eta^{\mathcal{R}}$ by the obtained minimal
representation of $h^{\mathcal{R}}$, yielding a new, “shorter” DNF representation of h (in fact,
this scheme works for several different measures of “size”).

Theorem 6.15. Let h be a Horn function represented by a Horn DNF $\eta \subseteq P_h^c$, and
let $(\mathcal{R}, \mathcal{D})$ be an RD-partition of $P_h^c$. Then, the $\mathcal{R}$-component $\eta^{\mathcal{R}}$ of η is a Horn
DNF representation of the $\mathcal{R}$-component $h^{\mathcal{R}}$ of h.

Proof. Let us first note that $\eta^c = P_h^c$ by Theorem 3.5 and by the properties
of the consensus closure. Since we obviously have $(\eta \cap \mathcal{R})^c \subseteq \mathcal{R}^c = \mathcal{R}$, the
above equality and the definition of an RD-partition (Definition 6.8) imply
$(\eta \cap \mathcal{R})^c = \mathcal{R}$. Applying this to the particular DNF representation $P_h$ of h,
instead of η, we also get $(P_h \cap \mathcal{R})^c = \mathcal{R}$. Consequently, both $\eta^{\mathcal{R}}$ and $h^{\mathcal{R}}$ have
the same set of prime implicants, namely, $P_h \cap \mathcal{R}$ (since no prime implicant
of h is absorbed by a term of $P_h^c$, and since $(P_h \cap \mathcal{R})^c \subseteq P_h^c$, obviously),
which proves the statement.

For the special case in which R is the set of pure Horn terms of a Horn function
(as in Example 6.7), this result was established by Hammer and Kogan [446].
Let us remark that the condition η ⊆ Phc in Theorem 6.15 can easily be fulfilled
by requiring η to be prime. Note, however, that this condition cannot be relaxed
completely; for instance, it cannot be simply replaced by the irredundancy of η.
Indeed, irredundant Horn DNFs may contain terms that cannot be obtained from
the prime implicants by consensus; moreover, there may exist an irredundant DNF
representation of a Horn function, which is perfectly disjoint from the consensus
closure of its prime implicants. To illustrate this, let us consider the following
example.

Example 6.8. Consider the Horn DNFs

\[ \eta = x_1 \bar{x}_2 \vee x_1 \bar{x}_3 \vee x_1 x_2 x_3 \vee \bar{x}_1 x_2 x_3, \qquad \varphi = x_1 \vee x_2 x_3. \]

It is easy to verify that both DNFs are irredundant Horn representations of the
same function h. However, when $\mathcal{R}$ is the family of all pure Horn terms in $P_h^c$ (as
in Example 6.7), we have $\eta^{\mathcal{R}} = x_1 \bar{x}_2 \vee x_1 \bar{x}_3 \vee \bar{x}_1 x_2 x_3 \neq 0 = \varphi^{\mathcal{R}}$. The main reason
for the equality $\eta^{\mathcal{R}} = h^{\mathcal{R}}$ to fail in this case is that none of the terms of η belongs
to $P_h^c$.

Let us further remark that a result analogous to Theorem 6.15 does not hold
for D-components: The D-components of different Horn DNF representations of
the same Horn function may represent different Boolean functions, as the following
example shows:
Example 6.9. Consider the following Horn DNFs

\[ \eta = x_1 x_2 \vee x_1 \bar{x}_3 \vee x_2 \bar{x}_4 \vee x_3 \bar{x}_1 \vee x_4 \bar{x}_2, \]
\[ \varphi = x_3 x_4 \vee x_1 \bar{x}_3 \vee x_2 \bar{x}_4 \vee x_3 \bar{x}_1 \vee x_4 \bar{x}_2. \]

It is easy to verify that η and φ are equivalent irredundant prime Horn DNFs of
the Horn function h having the following prime implicants:
$P_h = \{x_1 x_2, x_1 x_4, x_2 x_3, x_3 x_4, x_1 \bar{x}_3, x_2 \bar{x}_4, x_3 \bar{x}_1, x_4 \bar{x}_2\}$.
If we partition the implicants of h into pure Horn
and positive terms, we obtain an RD-partition (as in Example 6.7). However, the
D-components of the above DNFs, $\eta^{\mathcal{D}} = x_1 x_2$ and $\varphi^{\mathcal{D}} = x_3 x_4$, are not equivalent,
and neither of them represents the disjunction $x_1 x_2 \vee x_1 x_4 \vee x_2 x_3 \vee x_3 x_4$ of all
positive prime implicants of h.
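
The equivalence claimed in Example 6.9 is small enough to confirm by enumerating all 16 points. The Python sketch below assumes that the complemented literals are x3, x4, x1, x2 in the four pure Horn terms (the second literal of each); the signed-integer encoding and the function names are illustrative only.

```python
from itertools import product

# +i stands for x_i, -i for its complement (assumed reading of Example 6.9).
ETA = [frozenset(t) for t in ({1, 2}, {1, -3}, {2, -4}, {3, -1}, {4, -2})]
PHI = [frozenset(t) for t in ({3, 4}, {1, -3}, {2, -4}, {3, -1}, {4, -2})]


def evaluate(dnf, bits):
    """Value of the DNF at the 0/1 vector `bits`, where bits[i-1] is x_i."""
    return any(all(bits[abs(l) - 1] == (l > 0) for l in term) for term in dnf)


def equivalent(f, g, n):
    """Truth-table comparison of two DNFs on n variables."""
    return all(evaluate(f, b) == evaluate(g, b) for b in product((0, 1), repeat=n))


def positive_component(dnf):
    """Terms without complemented literals (the D-component of Example 6.7)."""
    return [t for t in dnf if all(l > 0 for l in t)]


print(equivalent(ETA, PHI, 4))
print(equivalent(positive_component(ETA), positive_component(PHI), 4))
```

Under the stated reading, the first comparison succeeds while the second fails, which is the point of the example: the two representations agree as functions although their positive parts do not.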

Although the concept of D-component of a Horn function does not appear
to be very useful, D-components of Horn DNFs turn out to have a remarkable
property, at least for certain RD-partitions: Namely, for such RD-partitions, the
D-components of all irredundant and prime DNF representations of a Horn function
h contain the same number of terms. Since a representation involving the
minimum number of terms can be assumed to be irredundant and prime, this property
implies that it is enough to minimize the R-component of irredundant and
prime representations in order to find a term-minimal representation of h. We now
establish the property (see [173]).
Theorem 6.16. Let h be a Horn function, let $\eta_1, \eta_2 \subseteq P_h^c$ be two irredundant DNFs
of h, and let $(\mathcal{R}, \mathcal{D})$ be an RD-partition of $P_h^c$ such that no two terms in $\mathcal{D}$ have a
consensus. Then, the number of terms in $\eta_1^{\mathcal{D}}$ and $\eta_2^{\mathcal{D}}$ is the same.
Proof. Let us associate with h a directed graph $G = (\mathcal{D}, A)$, where

\[ A = \{ (T, T') \mid T, T' \in \mathcal{D},\ T' \text{ is an implicant of } h^{\mathcal{R}} \vee T \}. \]

Clearly, G is a transitively closed directed graph, and its definition depends only on
h and the considered RD-partition $(\mathcal{R}, \mathcal{D})$, but not on any particular representation
of h. Let us denote by $C_1, \ldots, C_q$ the strong components of G and assume that
$C_1, \ldots, C_t$ (t ≤ q) are its source components, that is, those components that have
no incoming arcs.

By Theorem 6.15 we know that if $\eta \subseteq P_h^c$, then $\eta^{\mathcal{R}} \equiv h^{\mathcal{R}}$. Using this fact,
we can show that η must contain exactly one term from each of the components
$C_1, \ldots, C_t$, and no other terms from $\mathcal{D}$. Applying this claim to $\eta_1$ and $\eta_2$ will then
prove the statement.
To show the claim, let us consider an arbitrary irredundant DNF $\eta \subseteq P_h^c$ of
h. Since every implicant of h belonging to $\mathcal{D} \subseteq P_h^c = \eta^c$ can be obtained by a
series of consensus operations from η, and since no consensus operation can be
performed between the terms of $\mathcal{D}$ by our assumption, only one term of $\eta^{\mathcal{D}}$ is used
in such a consensus chain; all other terms must be from $\eta^{\mathcal{R}}$. Thus, for every $P \in \mathcal{D}$,
there exists a term T in $\eta^{\mathcal{D}}$ such that P is an implicant of $\eta^{\mathcal{R}} \vee T \equiv h^{\mathcal{R}} \vee T$. In
other words, for every $P \in \mathcal{D}$, there must exist a directed path in G from a term
of $\eta^{\mathcal{D}}$, implying that $C_j \cap \eta^{\mathcal{D}} \neq \emptyset$ for j = 1, . . . , t. On the other hand, if T is a
term of $\eta^{\mathcal{D}}$, then for all other terms $P \in \mathcal{D}$ for which there exists a directed path
from T to P in G, we have that $P \le \eta^{\mathcal{R}} \vee T \equiv h^{\mathcal{R}} \vee T$; thus those terms cannot
appear in the irredundant DNF η. This implies that $|\eta^{\mathcal{D}} \cap C_j| = 1$ for j = 1, . . . , t
and $|\eta^{\mathcal{D}} \cap C_j| = 0$ for j = t + 1, . . . , q, proving the claim, and completing the proof
of the theorem.

This theorem was proved for the positive terms of an irredundant prime Horn
DNF (see Example 6.7) by Hammer and Kogan [446]. The statement implies in this
case that the number of positive terms is the same constant in all irredundant and
prime DNF representations of a Horn function; see Example 6.9 for an illustration
of this.
Let us note again that the conditions η1 , η2 ⊂ Phc cannot be simply disregarded,
since the statement does not remain true, in general, even for irredundant Horn
DNFs, as the following example shows:
Example 6.10. Consider the Horn DNFs of Example 6.8. The DNF η contains
only one positive term, while φ contains two such terms, and, in fact, φ is the
(unique) shortest DNF of the corresponding Horn function. The conclusion of
Theorem 6.16 fails here because η contains implicants that do not belong to Phc . It
is possible to perform consensus operations with these implicants that introduce
extra arcs in the corresponding digraph G, and in effect reduce the number of
source components from 2 to 1 (cf. Exercise 28). 

Theorems 6.15 and 6.16 provide the basis for a very useful decomposition
technique of Horn minimization problems. For Horn functions, and especially
for pure Horn functions, there are several different RD-partitions that could be
utilized in such decomposition methods (see, e.g., [108] and Exercise 27). Similar
structural properties of Horn CNFs also play an important role in decomposability
of Horn functions, and in an AI context, in Horn belief revision (see [595]).
As we shall see in the rest of this chapter, the above results alone provide
efficient minimization techniques for several special classes of Horn functions.
We also refer the reader to [109] for a more thorough treatment of this topic.

6.7 Minimization of Horn DNFs


We now turn our attention to the problem of finding a “shortest” DNF represen-
tation of a given Horn function. We present here a number of related results from
several different sources (see, e.g., [37, 108, 173, 446, 447, 646]). The word
“shortest” may in fact refer to several different objectives here (cf. Chapter 3). Given a
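Horn function h, represented by the Horn DNF

\[ \eta = \Bigl( \bigvee_{P \in \mathcal{P}_0} P \Bigr) \vee \Bigl( \bigvee_{i=1}^{n} \bigvee_{P \in \mathcal{P}_i} P \bar{x}_i \Bigr) \]

as in (6.3), we can consider the number of terms

\[ \tau(\eta) = \|\eta\| = |\mathcal{P}_0| + \sum_{i=1}^{n} |\mathcal{P}_i|, \tag{6.10} \]

and the number of literals

\[ \lambda(\eta) = |\eta| = \sum_{P \in \mathcal{P}_0} |P| + \sum_{i=1}^{n} \sum_{P \in \mathcal{P}_i} (1 + |P|) \tag{6.11} \]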

as measures of the size of η. For a function h and µ ∈ {λ, τ }, we define

µ(h) = min{µ(η) | η is a DNF of h}.

Let us recall from Section 6.2 that a Horn function h can also be represented as a
set of implications of the form

$P \Longrightarrow 0$ for $P \in \mathcal{P}_0$, and
$P \Longrightarrow x_i$ for $P \in \mathcal{P}_i$, i = 1, . . . , n.

The sets of positive literals $P \in \mathcal{P}_0 \cup \mathcal{P}_1 \cup \cdots \cup \mathcal{P}_n$ are called the source sides of
these implications. Let us also observe that, if $P \in \mathcal{P}_i \cap \mathcal{P}_j \cap \cdots \cap \mathcal{P}_k$, then the
corresponding implications can be written as a single implication of the form
$P \Longrightarrow (x_i \wedge x_j \wedge \cdots \wedge x_k)$. Thus, h can also be represented as the collection of
such implications by a DNF of the form:

\[ \mathcal{V} = \bigwedge_{P \in \mathcal{P}} \Bigl( P \Longrightarrow \bigwedge_{j \in R(P)} x_j \Bigr). \tag{6.12} \]

The number σ (V) = |P| is called the number of source sides in such an implication
representation V, and can also be used as a measure of the size of the represen-
tation (see, e.g., [37, 646]). For a Horn function h we define σ (h) = min σ (V),
where the minimization is over all possible implication representations V of h, as
in (6.12).
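
For a concrete Horn DNF, the three size measures are easy to evaluate. The short Python sketch below is an illustration only: terms are sets of signed integers (+i for a variable, -i for its complement), the function names are hypothetical, and `sigma` counts the distinct source sides of the implication representation obtained by grouping the terms of the given DNF, treating a positive term as an implication with an empty right-hand side (an assumption of this sketch). It does not compute the minima τ(h), λ(h), or σ(h).

```python
def tau(dnf):
    """Number of terms, as in (6.10)."""
    return len(dnf)


def lam(dnf):
    """Number of literals, as in (6.11)."""
    return sum(len(term) for term in dnf)


def sigma(dnf):
    """Number of distinct source sides (positive parts) among the terms."""
    return len({frozenset(l for l in term if l > 0) for term in dnf})


# x1 x2 ~x3  v  x1 x2 ~x4  v  x3 x4
eta = [{1, 2, -3}, {1, 2, -4}, {3, 4}]
print(tau(eta), lam(eta), sigma(eta))  # prints: 3 8 2
```

The first two terms share the source side x1 x2 and therefore merge into the single implication x1 x2 ⟹ x3 ∧ x4, which is why σ is smaller than τ here.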

We also consider for each µ ∈ {λ, τ , σ } the decision variant of the problem of
finding a shortest representation of a given Horn function:

Horn µ-Minimization
Instance: A Horn DNF η of a Horn function h and an integer K.
Output: A (Horn) DNF or implication representation η∗ of the Horn function h
such that µ(η∗ ) ≤ K, if there is one.

Note that we do not have to require the output to be Horn in case of µ ∈ {λ, τ }.
In fact, by substituting the non-Horn terms of η∗ by prime implicants of η (which
can easily be done in polynomial time in the size of η according to Lemma 6.3),
we can always obtain a Horn DNF η∗∗ such that η ≥ η∗∗ ≥ η∗ and µ(η∗∗ ) ≤ µ(η∗ )
for both measures µ ∈ {τ , λ}. It is also easy to see that η∗ ≥ η holds if and only
if η∗∗ ≥ η holds, and the latter can be checked in polynomial time by Lemma 6.3
(see [173]). Thus, we can assume in the sequel, without any loss of generality, that
η∗ is a Horn DNF when µ ∈ {λ, τ }.

6.7.1 Minimizing the number of terms


Since partitioning the implicants into pure Horn and positive terms provides an RD-
partition, and since there is no consensus between positive terms, Theorems 6.15
and 6.16 immediately imply the following decomposition, as shown by Hammer
and Kogan [446]:
Corollary 6.9. Given a Horn function h and an irredundant prime DNF η of h,
consider the RD-partition of its implicants into the sets of pure Horn and positive
terms. Then we have $\tau(\eta^{\mathcal{D}}) = \tau(h) - \tau(h^{\mathcal{R}})$; that is, $\tau(\eta^{\mathcal{D}})$ is a constant,
independent of η. Furthermore, $h = \eta^{\mathcal{D}} \vee \eta'$ holds for an arbitrary DNF $\eta'$ of the pure
Horn component $h^{\mathcal{R}}$ of h. Thus, the problem of finding a τ-minimal (shortest)
DNF of h can be reduced in polynomial time to finding a τ-minimal DNF of its
pure Horn component $h^{\mathcal{R}}$.
Proof. Theorem 6.16 claims that for any RD-partition $(\mathcal{R}, \mathcal{D})$ of $P_h^c$ such that there
is no consensus between the terms of $\mathcal{D}$, the number of terms $|\eta \cap \mathcal{D}|$ in the
D-component of an irredundant DNF $\eta \subseteq P_h^c$ of h is a constant. Applying this for the
pure Horn versus positive RD-partition, we can conclude that even the “shortest”
Horn DNF of h contains exactly the same constant number of positive terms.
Furthermore, Theorem 6.15 states that the R-component of η represents the
function $h^{\mathcal{R}}$, namely, the R-component of h, for all representations $\eta \subseteq P_h^c$
of h. Consequently, if $\eta'$ is an arbitrary Horn DNF representation of $h^{\mathcal{R}}$, then
$\eta'' = \eta^{\mathcal{D}} \vee \eta'$ is a Horn DNF representation of h. Thus, if $\eta'$ is a “shortest” Horn
DNF of $h^{\mathcal{R}}$, then $\eta''$ is a “shortest” Horn DNF of h.

Given a pure Horn function h, finding a τ-minimal pure Horn DNF of h is,
however, a difficult problem. This problem was first considered in the slightly
different context of directed hypergraphs, and its hardness was shown by Ausiello,
D’Atri, and Saccà [37] using a reduction from set covering. We sketch their proof
in the context of pure Horn τ-minimization:

Theorem 6.17. Horn τ-minimization is NP-complete, even if the input is restricted
to pure Horn expressions.

Proof. Let us consider a hypergraph $\mathcal{H} = (V, \mathcal{E})$ over the base set V = {1, 2, ..., n}
such that $\bigcup_{H \in \mathcal{E}} H = V$. It is well-known that, for a given integer $k < m = |\mathcal{E}|$, it
is NP-complete to decide the existence of a subset of hyperedges $\mathcal{S} \subseteq \mathcal{E}$ that is a
cover of $\mathcal{H}$ of cardinality at most k, that is, such that $|\mathcal{S}| \le k$ and $\bigcup_{H \in \mathcal{S}} H = V$
(see, e.g., [371]).
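With the hypergraph $\mathcal{H}$ and with every subset of hyperedges $\mathcal{S} \subseteq \mathcal{E}$, we now
associate pure Horn DNFs $\mathcal{V}$ and $\eta_{\mathcal{S}}$, depending on the Boolean variables z, $x_j$
for j ∈ V, and $y_H$ for $H \in \mathcal{E}$, where

\[ \mathcal{V} = \Bigl( \bigvee_{H \in \mathcal{E}} \bigvee_{j \in H} \bar{x}_j y_H \Bigr) \vee \Bigl( \bigvee_{H \in \mathcal{E}} \Bigl( \bigwedge_{j=1}^{n} x_j \Bigr) \bar{y}_H \Bigr), \]

and

\[ \eta_{\mathcal{S}} = \Bigl( \bigvee_{H \in \mathcal{S}} z \bar{y}_H \Bigr) \vee \mathcal{V}. \]

Let us further denote by h the Horn function represented by the pure Horn DNF
$\eta_{\mathcal{H}}$ (that is, $\eta_{\mathcal{S}}$ with $\mathcal{S} = \mathcal{E}$). We claim that h has a DNF with no more than
$k + \tau(\mathcal{V})$ terms if and only if $\mathcal{H}$ has a cover of cardinality no more than k.
To see this, let us observe first that since $\eta_{\mathcal{H}}$ does not involve the literal $\bar{z}$, no
term in $P_h^c$ contains $\bar{z}$ (all those terms can be obtained from $\eta_{\mathcal{H}}$ by consensus).
Let us then define $\mathcal{D}$ as the set of those terms in $P_h^c$ involving the literal z, and
let $\mathcal{R} = P_h^c \setminus \mathcal{D}$. Any consensus involving a term in $\mathcal{D}$ will result in a term also
containing z. Hence, $(\mathcal{R}, \mathcal{D})$ forms an RD-partition for h and thus $\mathcal{V}$ represents
$h^{\mathcal{R}}$, the R-component of h. Furthermore, $\mathcal{V}$ is a τ-minimal representation of $h^{\mathcal{R}}$.
This is because all quadratic terms in $\mathcal{V}$ must appear in all representations of $\mathcal{V}$,
and all such representations must also contain at least one term including $\bar{y}_H$ for
all $H \in \mathcal{E}$. Since the only prime implicants in $\mathcal{D}$ are $z \bar{y}_H$, $H \in \mathcal{E}$, and $z \bar{x}_j$, j ∈ V,
and since a term $z \bar{x}_j$ can always be replaced by $z \bar{y}_H$ for $H \in \mathcal{E}$ such that j ∈ H
without changing the size of the representation, Theorem 6.15 implies that a
τ-minimal prime DNF of h looks like $\eta_{\mathcal{S}}$ for some subhypergraph $\mathcal{S} \subseteq \mathcal{E}$. Since $\eta_{\mathcal{S}}$
represents h if and only if $\mathcal{S}$ is a cover, our main claim follows.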
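
The last step of the proof, namely that the DNF built from a subfamily S represents h exactly when S is a cover, can be checked exhaustively on small hypergraphs. The Python sketch below is a sanity check under stated assumptions, not part of the reduction itself: it reads the terms of the construction as y_H with complemented x_j, x_1 ... x_n with complemented y_H, and z with complemented y_H; variables are numbered z = 1, x_j = 1 + j, and the y variables follow; all function names are hypothetical.

```python
from itertools import combinations, product


def build_eta(n, edges, chosen):
    """Terms of eta_S for the hypergraph ({1..n}, edges) and index set `chosen`."""
    z = 1
    x = {j: 1 + j for j in range(1, n + 1)}
    y = {i: n + 2 + i for i in range(len(edges))}
    terms = []
    for i, edge in enumerate(edges):
        terms += [frozenset({y[i], -x[j]}) for j in edge]    # y_H implies x_j
        terms.append(frozenset(set(x.values()) | {-y[i]}))   # x_1...x_n imply y_H
    terms += [frozenset({z, -y[i]}) for i in chosen]         # z implies y_H, H in S
    return terms


def evaluate(dnf, bits):
    """Value of the DNF at the 0/1 vector `bits`, where bits[v-1] is variable v."""
    return any(all(bits[abs(l) - 1] == (l > 0) for l in term) for term in dnf)


def equivalent(f, g, nvars):
    return all(evaluate(f, b) == evaluate(g, b)
               for b in product((0, 1), repeat=nvars))


def check(n, edges):
    """True if 'eta_S represents h' coincides with 'S is a cover' for every S."""
    m = len(edges)
    full = build_eta(n, edges, range(m))
    ok = True
    for size in range(m + 1):
        for chosen in combinations(range(m), size):
            covered = set().union(*(edges[i] for i in chosen))
            is_cover = covered == set(range(1, n + 1))
            same = equivalent(build_eta(n, edges, chosen), full, 1 + n + m)
            ok = ok and (same == is_cover)
    return ok


print(check(3, [{1, 2}, {2, 3}, {3}]))
```

The enumeration is exponential in the number of variables and is meant only for toy instances; the reduction itself is polynomial.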
This result can further be improved, as observed by Boros, Čepek, and
Kučera [110].

Theorem 6.18. Horn τ-minimization remains NP-complete even if the input is
restricted to cubic pure Horn expressions.

Proof. Let us repeat the above proof with a small modification in the definition
of $\mathcal{V}$. Namely, let us introduce n − 1 additional variables $u_1, \ldots, u_{n-1}$ and replace the
high-degree terms by a chain of cubic and quadratic terms, as follows:

\[ \mathcal{T} = \Bigl( \bigvee_{H \in \mathcal{E}} \bigvee_{j \in H} \bar{x}_j y_H \Bigr) \vee x_1 x_2 \bar{u}_1 \vee u_1 x_3 \bar{u}_2 \vee \cdots \vee u_{n-2} x_n \bar{u}_{n-1} \vee \Bigl( \bigvee_{H \in \mathcal{E}} u_{n-1} \bar{y}_H \Bigr), \]

and set

\[ \eta_{\mathcal{S}} = \Bigl( \bigvee_{H \in \mathcal{S}} z \bar{y}_H \Bigr) \vee \mathcal{T}. \]

As in the proof of Theorem 6.17, we denote by h the function represented by the
cubic pure Horn DNF $\eta_{\mathcal{H}}$. We can then repeat the preceding proof, with $\mathcal{T}$ playing
the role of $\mathcal{V}$.

Note that for quadratic pure Horn DNFs, τ -minimization is equivalent to finding
the transitive reduction of a directed graph (that is, finding the smallest subset of
arcs, the transitive closure of which is the same as that of the original graph), which
is a polynomially solvable problem; see Sections 5.4.1 and 5.8.4.
On the positive side, for an arbitrary Horn function h, Hammer and Kogan
[447] proved that τ (h) is approximated within a reasonable factor by the size of
any irredundant prime DNF of h.
Theorem 6.19. If h is a Horn function on $B^n$ and $\eta \subseteq P_h^c$ is an irredundant Horn
DNF of h, then τ(η) ≤ (n − 1)τ(h).
Proof. Let us consider the RD-partition $\mathcal{R} \cup \mathcal{D} = P_h^c$ into pure Horn and positive
terms, and let ζ denote a τ-optimal irredundant, prime DNF of h. Then, $\tau(\eta^{\mathcal{D}}) = \tau(\zeta^{\mathcal{D}})$
holds for the positive components according to Theorem 6.16, and
$\eta_1 = \eta^{\mathcal{R}} \equiv \zeta_1 = \zeta^{\mathcal{R}} = h^{\mathcal{R}}$ must hold for the pure Horn components by Theorem 6.15.
Let us further divide $\mathcal{R}$ into $\mathcal{R} = P^c_{h^{\mathcal{R}}} = \mathcal{R}' \cup \mathcal{D}'$, where $\mathcal{D}'$ is the set of linear terms
and $\mathcal{R}'$ is the set of nonlinear pure Horn terms in $\mathcal{R}$. This yields an RD-partition
of the closure of the prime implicants of $h^{\mathcal{R}}$ (see Exercise 27), and by the same
theorems, we get that $\tau(\eta_1^{\mathcal{D}'}) = \tau(\zeta_1^{\mathcal{D}'})$ and $\eta_2 = \eta_1^{\mathcal{R}'} \equiv \zeta_2 = \zeta_1^{\mathcal{R}'} = h^{\mathcal{R}'}$.
Let us consider next a term $A\bar{y}$ of $\zeta_2$. Since $\eta_2 \equiv \zeta_2$, this term is an implicant
of $\eta_2$, and thus, by Lemma 6.3, variable y must belong to the forward chaining
closure $A^{\eta_2}$ of A. Let $A^{\eta_2} \setminus A = \{x_{i_1}, x_{i_2}, \ldots, x_{i_k}\}$ be indexed according to the order
in which forward chaining adds these variables to A, and let $A_{i_j} \bar{x}_{i_j}$ be the term of
$\eta_2$ used in this process when adding $x_{i_j}$ to A, for j = 1, . . . , k. (We have $y = x_{i_t}$ for
some t ≤ k.) It is easy to see that performing consensuses between these terms,
we can derive the prime implicant $A\bar{y}$.
Thus we need at most $|A^{\eta_2} \setminus A|$ terms of $\eta_2$ to derive a term $A\bar{y} \in \zeta_2$. Due to the
fact that η is irredundant, $\eta_2$ must also be irredundant (this follows by Theorem
6.15), and thus, all terms of $\eta_2$ must appear in such a derivation for some term of
$\zeta_2$. Therefore, we have

\[ \tau(\eta_2) \le \sum_{A\bar{y} \in \zeta_2} |A^{\eta_2} \setminus A| \le (n-1)\,\tau(\zeta_2), \]

since $\zeta_2$ does not contain linear pure Horn terms by our construction.
Putting all the above together, we obtain

\[ \tau(\eta) = \tau(\eta^{\mathcal{D}}) + \tau(\eta_1^{\mathcal{D}'}) + \tau(\eta_2) = \tau(\zeta^{\mathcal{D}}) + \tau(\zeta_1^{\mathcal{D}'}) + \tau(\eta_2) \]
\[ \le \tau(\zeta^{\mathcal{D}}) + \tau(\zeta_1^{\mathcal{D}'}) + (n-1)\,\tau(\zeta_2) \le (n-1)\,\tau(\zeta), \]

which completes the proof.
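
The forward chaining step used in the proof can be illustrated by a short Python sketch. It assumes a pure Horn DNF given as sets of signed integers (+i for a variable, -i for its complement); the function name `forward_chain` and the sample DNF are hypothetical. Besides the closure, the function records which term was used when each variable was added, so that the number of terms involved in the derivation can be compared with the size of the closure minus A.

```python
def forward_chain(start, eta):
    """Forward chaining closure of `start` under a pure Horn DNF.

    Returns the closure and the list of (variable, term) pairs in the order in
    which the variables were added.
    """
    closed, steps = set(start), []
    changed = True
    while changed:
        changed = False
        for term in eta:
            heads = [-l for l in term if l < 0]
            if len(heads) != 1:
                continue                      # not a pure Horn term, skip it
            head = heads[0]
            body = {l for l in term if l > 0}
            if head not in closed and body <= closed:
                closed.add(head)
                steps.append((head, term))
                changed = True
    return closed, steps


# x1 x2 ~x3  v  x3 ~x4  v  x1 x4 ~x5
eta2 = [frozenset({1, 2, -3}), frozenset({3, -4}), frozenset({1, 4, -5})]
closed, steps = forward_chain({1, 2}, eta2)
print(sorted(closed), [v for v, _ in steps])
```

In this sample, x5 belongs to the closure of {x1, x2}, so x1 x2 with complemented x5 is an implicant, and the derivation uses one term per added variable.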

Let us again observe that in this theorem, $\eta \subseteq P_h^c$ is an important condition
without which the claim does not remain true, as illustrated by the following
example.

Example 6.11. Consider the irredundant DNF representation
$\eta = x_1 \bar{x}_2 \vee x_1 \bar{x}_3 \vee x_1 x_2 x_3$ of the Horn function $h = x_1$. In this case we have
n = 3, τ(η) = 3, and τ(h) = 1.

Let us finally remark that much better polynomial time approximation may
not be achievable, as shown by a recent inapproximability result of Bhattacharya,
DasGupta, Mubayi, and Turán [77]:
Theorem 6.20. For any fixed $0 < \varepsilon < 1$, one cannot guarantee a $2^{\log^{1-\varepsilon} n}$-approximation
for Horn τ-minimization in polynomial time, unless
$NP \subseteq DTIME(n^{\mathrm{polylog}(n)})$.

6.7.2 Minimizing the number of literals


We turn next to the minimization of the number of literals in a Horn representation.
The first related result, due to Maier [646], establishes the hardness of minimization
for a somewhat different measure; its proof, however, carries easily over to the case
of λ-minimization (see, e.g., [173]). A simpler and more elegant reduction from
set covering to λ-minimization was presented by Hammer and Kogan [447]. This
result can further be strengthened, as noted by Boros, Čepek, and Kučera [110]:

Theorem 6.21. Horn λ-minimization is NP-complete, even if the input is restricted
to cubic pure Horn DNFs.

Proof. Given a hypergraph (V , E), let us consider the cubic Horn DNF ηS ,
defined as in the proof of Theorem 6.18, for any subfamily S ⊆ E. It can be
verified that ηS is not only τ -minimal but also λ-minimal if and only if S is a
minimal cover. 

For quadratic pure Horn DNFs, λ-minimization is easily seen to be equivalent
to τ-minimization, and hence, it is polynomially solvable, as we remarked earlier.
On the positive side, Hammer and Kogan [447] proved that λ(h) is approximated
within a reasonable factor by any irredundant and prime Horn DNF
representation. More precisely, we can show the following:

Theorem 6.22. If $h$ is a Horn function on $B^n$ and $\eta \subseteq P_h^c$ is an irredundant Horn
DNF of $h$, then $\lambda(\eta) \le \binom{n}{2}\lambda(h)$.
Proof. Let us consider the RD-partition D ∪R = Phc into linear and nonlinear terms
of the set Phc (see Exercise 27), and let ζ denote a λ-minimal DNF of h. Then, by
Theorems 6.15 and 6.16, we have λ(ηD ) = τ (ηD ) = τ (ζ D ) = λ(ζ D ) and ηR ≡
ζ R ≡ hR . Since ζ R does not contain any linear terms, we have λ(ζ R ) ≥ 2τ (ζ R ) ≥
2τ (hR ). Furthermore, by Theorem 6.19, we have τ (ηR ) ≤ (n − 1)τ (hR ). Putting
all these together with the trivial inequality λ(φ) ≤ nτ (φ), we obtain
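\[
\begin{aligned}
\lambda(\eta) &= \lambda(\eta^D) + \lambda(\eta^R) = \lambda(\zeta^D) + \lambda(\eta^R) \le \lambda(\zeta^D) + n\,\tau(\eta^R) \\
&\le \lambda(\zeta^D) + n(n-1)\,\tau(h^R) \le \lambda(\zeta^D) + \tfrac{1}{2}\,n(n-1)\,\lambda(\zeta^R) \\
&\le \binom{n}{2}\lambda(\zeta) = \binom{n}{2}\lambda(h).
\end{aligned}
\]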


Here again, condition η ⊆ Phc is important because Theorem 6.22 does not hold
for arbitrary irredundant Horn DNFs.
Example 6.12. Consider the DNF η of the Horn function h = x1 , as in Example
6.11. In this case, we have n = 3, λ(η) = 7, while λ(h) = 1. 

6.7.3 Minimization of the number of source sides


An arbitrary Horn DNF η can be rewritten straightforwardly as an implication
expression V of the form (6.12), and σ (V) will be exactly the number of different
sets of positive variables appearing in η. Conversely, any implication expression
V can be rewritten as a Horn DNF η, such that the number of different sets of
positive variables appearing in η is exactly σ (V). Thus, we can denote by σ (η)
the number of different sets of positive variables appearing in an arbitrary Horn
DNF η, and restate Horn σ -minimization as the problem of finding a Horn DNF
representation η of a given Horn function h minimizing σ (η).
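
As a small illustration of the measure σ(η), the sketch below counts the distinct sets of positive variables of a Horn DNF. The encoding of a term as a pair (set of positive variables, negated variable or `None` for a positive term) and the function name `sigma` are assumptions made for this example only.

```python
def sigma(horn_dnf):
    """Number of distinct sets of positive variables among the terms of a Horn DNF."""
    return len({frozenset(positives) for positives, _negated in horn_dnf})

# eta = x1 x2 ~x3  v  x1 x2 ~x4  v  x3 ~x1  v  x3 x4   (illustrative)
eta = [({1, 2}, 3), ({1, 2}, 4), ({3}, 1), ({3, 4}, None)]

print(sigma(eta))  # the source sets are {1,2}, {3}, and {3,4}
```

The first two terms share the source set {x1, x2}, so they contribute only once to σ(η), although they contribute two terms to τ(η).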
Horn σ -minimization was shown to be solvable in polynomial time by Maier
[646] and by Ausiello, D’Atri, and Saccà [37]. In the rest of this section, we provide
a proof of this lone, truly positive result in the area of Horn DNF minimization.
We first show that it is enough to consider the problem for pure Horn functions.
Lemma 6.8. If $\pi$ and $\pi'$ are positive DNFs on $\{x_1, x_2, \ldots, x_n\}$, and $\eta$ and $\eta'$ are
pure Horn DNFs on $\{x_1, x_2, \ldots, x_n\}$, then $\pi \vee \eta$ and $\pi' \vee \eta'$ represent the same Horn
function $h$ if and only if the DNFs $\eta \vee (\pi \wedge \bar{x}_{n+1})$ and $\eta' \vee (\pi' \wedge \bar{x}_{n+1})$ represent
the same pure Horn function $h'$ on $n + 1$ variables.
Proof. The claimed equivalence trivially holds if xn+1 = 0, and follows by
the existence of a unique pure Horn component (see Theorem 6.15) when
xn+1 = 1. 

Lemma 6.8 implies that we can associate a unique pure Horn function $h'$ in
$n + 1$ variables with every Horn function $h$ in $n$ variables, so that $\sigma(h) = \sigma(h')$.
Therefore, in the sequel, we shall consider source minimization only for pure Horn
functions.
Recall from Section 6.4 that the forward chaining closure S η of a subset S of the
variables is uniquely defined for every (pure) Horn DNF η, and that this closure
is the same for every (pure) Horn DNF representing a given function h, so that
we can also denote S η as S h . It follows from Lemma 6.3 that a pure Horn term
Ax ∈ Phc is an implicant of a Horn function h if and only if x ∈ Ah .
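
The closure just recalled, and the implicant test it yields, can be sketched as follows. This is a simple quadratic-time illustration rather than the linear-time procedure of Section 6.4; a pure Horn term is encoded as a pair (body, head), meaning the conjunction of the body variables with the negated head, and all names are hypothetical.

```python
def forward_chaining(pure_horn_dnf, start):
    """Forward chaining closure of `start` with respect to a pure Horn DNF."""
    closure = set(start)
    changed = True
    while changed:
        changed = False
        for body, head in pure_horn_dnf:
            # A term fires when its whole body is already in the closure.
            if head not in closure and set(body) <= closure:
                closure.add(head)
                changed = True
    return closure

def is_implicant(pure_horn_dnf, body, head):
    """Criterion quoted above from Lemma 6.3: the head lies in the closure of the body."""
    return head in forward_chaining(pure_horn_dnf, body)

# eta = x1 ~x2  v  x2 ~x3   (illustrative)
eta = [({1}, 2), ({2}, 3)]
print(forward_chaining(eta, {1}))   # closure of {x1}
print(is_implicant(eta, {1}, 3))    # x1 ~x3 is the consensus of the two terms
```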
Note further that, since we view a DNF as a set of terms, we consider η = xz to be
different from η′ = xz ∨ xyz, even if they represent the same Boolean function; but
η is considered to be the same as η″ = xz ∨ xz, even if they are written differently.

Definition 6.10. Given an implicant $T\bar{x} \in P_h^c$ of a pure Horn function $h$, the set
of terms $I(T) = \{T\bar{y} \mid y \in T^h \setminus T\} \subseteq P_h^c$ is called the h-star of $T$.

Note that if T x ∈ Phc , then we have I(T ) ⊆ Phc , by Lemma 6.3.

Definition 6.11. For a pure Horn DNF η, we denote by S(η) the family of all
those subsets of variables which appear as sets of positive variables of a term of η.
We call S(η) the family of source sets of η.

With this definition, we have σ (η) = |S(η)| for every pure Horn DNF η.

Definition 6.12. Given a DNF $\eta \subseteq P_h^c$ of the pure Horn function $h$, we associate
to it another DNF defined by $\eta^* = \bigvee_{T \in S(\eta)} I(T)$. We say that $\eta^*$ is the star closure
of $\eta$, and we say that $\eta$ is star closed if $\eta = \eta^*$.

The star closure η∗ represents h, and we have S(η) = S(η∗ ) by the preceding
definitions.

Definition 6.13. A star closed pure Horn DNF $\eta$ representing the pure Horn
function $h$ is called star irredundant if the DNF $\bigvee_{T \in S'} I(T)$ does not represent
$h$ for any proper subset $S' \subsetneq S(\eta)$.

Lemma 6.9. Given a DNF $\eta \subseteq P_h^c$ representing a pure Horn function $h$, a star
closed and star irredundant DNF $\tilde{\eta}$ representing $h$ can be constructed in $O(n|\eta|^2)$
time.

Proof. Since $T^\eta$ can be computed by forward chaining in $O(n + |\eta|)$ time for an
arbitrary subset $T$ of the variables (see Section 6.4), we can compute the star closure
$\eta^*$ of $\eta$, namely, the sets $I(T)$ for $T \in S(\eta)$, in $O(|S(\eta)|(n + |\eta|)) = O(n|\eta| + |\eta|^2)$
time.
Let us next initialize $\tilde{\eta} = \eta^*$ and label the sets $S(\eta) = \{T_1, T_2, \ldots, T_k\}$ (where
$k = |S(\eta)|$). Then, repeat the following for $j = 1, \ldots, k$: define the DNF
$\varphi_j = \bigvee_{Q \in S(\tilde{\eta}) \setminus \{T_j\}} I(Q)$, and compute the forward chaining closure $T_j^{\varphi_j}$ in
$O(n + |\varphi_j|) = O(n|\eta|)$ time. Clearly, if $T_j^{\varphi_j} = T_j^{\eta}$, then $\varphi_j$ also represents $h$;
hence, the star $I(T_j)$ is redundant in $\tilde{\eta}$. In this case, update $\tilde{\eta} = \varphi_j$. Otherwise,
keep the star of $T_j$ in the representation $\tilde{\eta}$.
At the end of this loop, $\tilde{\eta}$ is a star irredundant (and star closed) representation of
$h$, as claimed. Since we have $|S(\eta)| \le |\eta|$ steps in the loop, we can complete this
part in $O(n|\eta|^2)$ time. Thus, the total time required by the procedure is $O(n|\eta|^2)$,
as stated.
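
The procedure in this proof can be sketched as below. It is an unoptimized illustration that does not attempt to match the stated time bound; the (body, head) encoding of pure Horn terms and the names `closure` and `star_irredundant` are assumptions made for the example.

```python
def closure(dnf, start):
    """Forward chaining closure of `start` under the pure Horn terms (body, head)."""
    closed = set(start)
    changed = True
    while changed:
        changed = False
        for body, head in dnf:
            if head not in closed and set(body) <= closed:
                closed.add(head)
                changed = True
    return closed

def star_irredundant(eta):
    """Star closed, star irredundant DNF obtained by dropping redundant stars one by one."""
    sources = []
    for body, _head in eta:                      # source sets, in order of appearance
        if frozenset(body) not in sources:
            sources.append(frozenset(body))
    full = {T: closure(eta, T) for T in sources}  # closure of each source set under eta
    kept = list(sources)
    for T in sources:
        others = [Q for Q in kept if Q != T]
        phi = [(Q, y) for Q in others for y in full[Q] - Q]   # union of the other stars
        if closure(phi, T) == full[T]:            # star of T is redundant
            kept = others
    return sorted((tuple(sorted(Q)), y) for Q in kept for y in full[Q] - Q)

# eta = x1 ~x2  v  x2 ~x1  v  x1 x2 ~x3   (illustrative; three source sets)
eta = [({1}, 2), ({2}, 1), ({1, 2}, 3)]
print(star_irredundant(eta))
```

In this example the star of the source set {x1, x2} is dropped, since the stars of {x1} and {x2} already derive x3, and two source sets remain.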

The main result of this subsection, then, states that any star irredundant and star
closed DNF representation of a pure Horn function is also σ -minimal.
Theorem 6.23. If h is a pure Horn function, and η ⊆ Phc is a star closed, star
irredundant DNF of h, then σ (h) = σ (η).
Before we prove this statement, we need a few more definitions and lemmas.
Observe first that if $h$ is a pure Horn function, and $S$ is a subset of its variables such
that $S^h = S$, then the partition $R_S = \{A\bar{x} \in P_h^c \mid A \subseteq S\}$ and $D_S = \{A\bar{x} \in P_h^c \mid A \not\subseteq S\}$
is an RD-partition of $P_h^c$ (see Exercise 27). To simplify our notations, we denote
respectively by $h_S$ and $\eta_S$ the $R_S$-components of $h$ and $\eta$, when $\eta \subseteq P_h^c$ is a DNF
representation of $h$; we call $h_S$ and $\eta_S$ the S-components of $h$ and $\eta$, respectively.
Note that $h_S$ could equivalently be defined by the disjunction of all terms $A\bar{x} \in P_h^c$
for which $A^h \subseteq S$, and that $\eta_S$ is a DNF representation of $h_S$ for every DNF $\eta \subseteq P_h^c$
of $h$, by Theorem 6.15.
Definition 6.14. For a pure Horn function $h$ and a subset $S$ of its variables such
that $S^h = S$, we denote by $h^S$ the function defined by the disjunction of all those
terms $T\bar{x} \in P_h^c$ such that $T^h \subsetneq S$. Analogously, for a DNF $\eta \subseteq P_h^c$ of $h$, we denote
by $\eta^S$ the disjunction of all those terms $T\bar{x} \in \eta$ such that $T^h \subsetneq S$.
The next lemma is instrumental in our proof of Theorem 6.23, and it leads to the
identification of another type of “subfunction” of pure Horn functions, not implied
by RD-partitions.
Lemma 6.10. Let $h$ be a pure Horn function, let $S$ be a subset of its variables
such that $S^h = S$, and let $\eta \subseteq P_h^c$ be a Horn DNF of $h$. Then, for every implicant
$A\bar{x} \le h_S$, either $A\bar{x} \le \eta^S$ or $A^h = S$.
Proof. Let us consider an arbitrary implicant $A\bar{x} \le h_S$ for which $A\bar{x} \not\le \eta^S$. We
claim that $A^h \supseteq S$, which will imply the lemma, since $S \supseteq A$ and $S^h = S$ by our
assumptions. To see this claim, we consider the partial assignment that sets all
variables in $A$ to 1 and assigns 0 to $x$. Since $A\bar{x} \not\le \eta^S$, the Horn function obtained
from $\eta^S \equiv h^S$ by substituting this partial assignment has some false points, and
thus it has a unique minimal false point by Corollary 6.3. Let $X^*$ denote this unique
binary assignment, extended with the values assigned to the variables of $A$ and to
$x$, and let us denote by $Q$ the subset of variables which are assigned value 1 in
$X^*$. It is easy to see by the definition of forward chaining that we have $Q \subseteq A^{\eta^S}$
(since $x = 0$ limits the forward chaining procedure). Since $A\bar{x} \le h_S \equiv \eta_S$, and the
term $A\bar{x}$ evaluates to 1 at $X^*$, by our construction, there must exist a term $B\bar{y}$ of
$\eta_S$ that also evaluates to 1 at $X^*$, that is, for which $B \subseteq Q$ and $y \notin Q$. Clearly, this
term of $\eta_S$ does not belong to $\eta^S$, since all terms of $\eta^S$ vanish at $X^*$; thus, $B^h = S$
is implied by the definition of $\eta^S$. Since we have $A^h = A^{\eta} \supseteq A^{\eta^S} \supseteq Q \supseteq B$, the
relations $A^h = (A^h)^h \supseteq B^h = S$ follow, concluding the proof of the claim.

Corollary 6.10. Let $h$ be a pure Horn function, let $S$ be a subset of its variables
such that $S^h = S$, and let $\eta \subseteq P_h^c$ be a Horn DNF of $h$. Then, $\eta^S$ represents the
function $h^S$.

Proof. For any term $T\bar{x} \in P_h^c$ for which $T^h \subsetneq S$ it follows by Lemma 6.10 that
$T\bar{x} \le \eta^S$, which then implies $h^S \le \eta^S$ by Definition 6.14. For the converse
direction, the terms of $\eta^S$ are also implicants of $h^S$ by Definition 6.14, since $\eta \subseteq P_h^c$ is
assumed.

We are now ready to prove the main theorem of this subsection.

Proof of Theorem 6.23. Consider two star closed, star irredundant DNFs $\eta \subseteq P_h^c$
and $\zeta \subseteq P_h^c$ of the pure Horn function $h$, and fix an arbitrary subset $S$ of the
variables for which $S^h = S$. Clearly, both $\eta_S$ and $\zeta_S$ represent the S-component $h_S$
of $h$; thus, they both must be star closed and star irredundant because both $\eta$ and
$\zeta$ are assumed to be star closed and star irredundant. Let us further denote by
\[ S(\eta_S) \setminus S(\eta^S) = \{A_1, \ldots, A_k\} \quad \text{and} \quad S(\zeta_S) \setminus S(\zeta^S) = \{B_1, \ldots, B_\ell\} \]
the source sets of $\eta$ and $\zeta$, respectively, for which $A_i^h = B_j^h = S$ holds for $i = 1, \ldots, k$
and $j = 1, \ldots, \ell$.
We claim that $k = \ell$. Since every source set of $S(\eta)$ and $S(\zeta)$ corresponds
to exactly one subset $S$ of the variables satisfying $S^h = S$, this claim implies
the statement of the theorem, for example, by assuming that $\zeta$ is a σ-optimal
representation.
To prove the claim, let us assume indirectly that, for instance, $k > \ell$. Note
first that according to Corollary 6.10, both $\eta^S$ and $\zeta^S$ represent the same function
$h^S$. Furthermore, the star irredundancy of $\eta_S$ and $\zeta_S$ implies that $A_i\bar{x} \not\le \eta^S$ and
$B_j\bar{y} \not\le \zeta^S$ for some variables $x \in A_i^h \setminus A_i$ and $y \in B_j^h \setminus B_j$, for all $i = 1, \ldots, k$ and
$j = 1, \ldots, \ell$.
Thus it follows, as in the proof of Lemma 6.10, that for every index $i$, there
exists a corresponding index $j$ such that $A_i^{h^S} \supseteq B_j$, and conversely, for every
index $j$, there exists a corresponding index $i$ such that $B_j^{h^S} \supseteq A_i$. Since $k > \ell$, we
must have distinct indices $i_1$, $i_2$ and an index $j$ for which $A_{i_1}^{h^S} \supseteq B_j$ and $A_{i_2}^{h^S} \supseteq B_j$.
Let us denote by $i_3$ one of the indices for which $B_j^{h^S} \supseteq A_{i_3}$ holds. Since $i_1 \ne i_2$, we
can assume, without any loss of generality, that $i_3 \ne i_1$. Thus,
\[ A_{i_1}^{\eta^S} = \bigl(A_{i_1}^{\eta^S}\bigr)^{\eta^S} \supseteq B_j^{h^S} \supseteq A_{i_3} \]
follows, from which we can derive
\[ A_{i_1}^{\eta^S \cup I(A_{i_3})} = \bigl(A_{i_1}^{\eta^S}\bigr)^{I(A_{i_3})} \supseteq A_{i_3}^{I(A_{i_3})} = S. \]
This last relation implies by Lemma 6.3 that every term of $I(A_{i_1})$ is an implicant
of $\eta^S \cup I(A_{i_3})$, contradicting the fact that $\eta$ was chosen as a star irredundant
expression. This contradiction proves that $k = \ell$, finishing the proof of the claim
and of the theorem.

We close this section by mentioning that a remarkable directed graph can be
associated quite naturally with pure Horn DNFs (and with pure Horn functions,
as well; see [448]), and this directed graph plays an important role (explicitly
or implicitly) in many of the related results obtained in this area (see, e.g.,
[20, 37, 108, 173, 383, 448, 449, 711]). We refer the reader to Section 6.9.4 for
further details. We also note that, besides the “minimality” of Horn expressions
(in various senses), several other extremal properties of Horn representations lead
to interesting combinatorial results (see, e.g., [594]).

6.8 Dualization of Horn functions


The dual of a Boolean function $f(X)$ has been defined as $f^d(X) = \overline{f(\bar{X})}$, where
$\bar{X}$ denotes the componentwise negation of $X$; see Section 1.3 and Chapter 4. In
this section, we consider the problems of characterizing and generating $f^d$ when
$f$ is represented by a Horn DNF.
Despite the fact that duals of Horn functions must be very special, since Horn
functions are special, it is not immediate to obtain a simple characterization.
Certainly, the dual of a Horn function is not necessarily Horn, as shown by the
example $h(x_1, x_2) = \bar{x}_1 \vee \bar{x}_2$, for which $h^d(x_1, x_2) = \bar{x}_1\bar{x}_2$ is not Horn.
Generalizing slightly a result of Eiter, Ibaraki, and Makino [298], we can obtain the
following characterization of DNF expressions of the dual of a Horn function:
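Theorem 6.24. Consider a DNF $\phi$ of a Boolean function $f$, where
\[ \phi(X) = \bigvee_{i=1}^{m} \Bigl( \bigwedge_{j \in P_i} x_j \bigwedge_{k \in N_i} \bar{x}_k \Bigr). \]
Then, $f$ is the dual of a Horn function if and only if for any two distinct indices
$i \ne i'$,
\[ \phi|_{\{x_j = 1 \mid j \in P_i \cup P_{i'}\} \cup \{x_k = 0 \mid k \in N_i \cap N_{i'}\}} \equiv 1. \qquad (6.13) \]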
Proof. Recall (see Theorem 4.7) that a nontrivial term
\[ T = \bigwedge_{j \in P} x_j \bigwedge_{k \in N} \bar{x}_k \qquad (6.14) \]
(where $P \cap N = \emptyset$) is a prime implicant of the dual function $f^d$ if and only if
$(P \cap P_i) \cup (N \cap N_i) \ne \emptyset$ for all $i = 1, \ldots, m$, and the set of literals in $T$ is minimal
with respect to these conditions. In other words, the prime implicants of the dual
$f^d$, as subsets of literals, are in one-to-one correspondence with those minimal
transversals of the hypergraph on the set of literals formed by the terms of $\phi$,
which do not contain complementary pairs of literals. Thus, $f$ is the dual of a
Horn function if and only if all such minimal transversals of the terms of $\phi$ contain
at most one negative literal.
To prove the theorem, let us assume first that there exists a non-Horn prime
implicant of $f^d$ of the form (6.14) with $|N| \ge 2$, and let us prove that, in this case,
condition (6.13) is violated.
Since $T$ is prime, for every $\ell \in N$ there must exist a term $i(\ell)$ of $\phi$ such that
$(P \cup N) \cap (P_{i(\ell)} \cup N_{i(\ell)}) = \{\ell\}$. Thus, for any two distinct indices $\ell \ne \ell'$, $\ell, \ell' \in N$,
we have $\ell \in N_{i(\ell)} \setminus N_{i(\ell')}$, $\ell' \in N_{i(\ell')} \setminus N_{i(\ell)}$ and $P \cap (P_{i(\ell)} \cup P_{i(\ell')}) = \emptyset$. On the other
hand, $P \cap P_i \ne \emptyset$ must hold for all terms of $\phi$ such that $N_i \subseteq (N_{i(\ell)} \cup N_{i(\ell')}) \setminus \{\ell, \ell'\}$,
since $N \cap N_i = \emptyset$ for such terms.
It follows from these observations that the assignment $\{x_j = 0 \mid j \in P\} \cup
\{x_k = 1 \mid k \in N\}$ is compatible with the assignment $\{x_j = 1 \mid j \in P_{i(\ell)} \cup P_{i(\ell')}\} \cup
\{x_k = 0 \mid k \in N_{i(\ell)} \cap N_{i(\ell')}\}$. However, since $T$ is an implicant of $f^d$, $\phi$ vanishes
when $x_j = 0$ for $j \in P$ and $x_k = 1$ for $k \in N$, contradicting (6.13).
For the reverse direction, let us assume indirectly that there exist two distinct
indices $i$ and $i'$ such that
\[ \phi|_{\{x_j = 1 \mid j \in P_i \cup P_{i'}\} \cup \{x_k = 0 \mid k \in N_i \cap N_{i'}\}} \not\equiv 1. \]
Let $X$ be an assignment of the variables $x_j$, $j \notin P_i \cup P_{i'} \cup (N_i \cap N_{i'})$, at which the
left-hand side vanishes, and let us define $P = \{j \mid x_j = 0\}$ and $N = \{k \mid x_k = 1\}$.
Then the term $T$ corresponding to these sets $P$ and $N$ is a transversal of the terms
in $\phi$; thus, it contains a minimal transversal. All such minimal transversals,
however, must have a literal from both terms $i$ and $i'$, which can only be from the
sets $N_i \setminus N_{i'}$ and $N_{i'} \setminus N_i$, respectively, implying that all such minimal transversals
must contain at least two negative literals.
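
For very small DNFs, condition (6.13) can be tested by exhaustive enumeration. The sketch below does exactly that and is exponential in the number of free variables, so it is meant only to illustrate the statement of Theorem 6.24; the encoding of a term as a pair (P, N) of frozensets and the function names are hypothetical.

```python
from itertools import combinations, product

def evaluate(dnf, point):
    """Evaluate a DNF whose terms are pairs (P, N) at a point {variable: 0/1}."""
    return any(all(point[j] == 1 for j in P) and all(point[k] == 0 for k in N)
               for P, N in dnf)

def is_dual_of_horn(dnf, n):
    """Brute-force test of condition (6.13) for every pair of distinct terms."""
    for (P1, N1), (P2, N2) in combinations(dnf, 2):
        fixed = {j: 1 for j in P1 | P2}           # x_j = 1 for j in P_i u P_i'
        fixed.update({k: 0 for k in N1 & N2})     # x_k = 0 for k in N_i n N_i'
        free = [v for v in range(1, n + 1) if v not in fixed]
        for bits in product((0, 1), repeat=len(free)):
            point = {**fixed, **dict(zip(free, bits))}
            if not evaluate(dnf, point):          # restriction is not identically 1
                return False
    return True

f = frozenset
phi1 = [(f({1, 2}), f()), (f({1}), f({3}))]       # x1 x2 v x1 ~x3
phi2 = [(f({1}), f({2})), (f({2}), f({3}))]       # x1 ~x2 v x2 ~x3
print(is_dual_of_horn(phi1, 3), is_dual_of_horn(phi2, 3))
```

Here `phi1` is the dual of the Horn function x1 ∨ x2 x̄3, while the dual of `phi2` has the non-Horn prime implicant x̄2 x̄3.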

For general DNFs φ the above characterization is not computationally efficient,
since (6.13) is a tautology problem (and any tautology problem can arise in this
way). Actually, we have:
Theorem 6.25. It is co-NP-complete to decide whether a given DNF φ represents
the dual of a Horn function.
Proof. In view of Theorem 6.24, the recognition problem is in co-NP: Indeed, to
show that φ does not represent the dual of a Horn function, it suffices to exhibit two
indices $i, i'$ and a point $X$ such that the left-hand side of (6.13) evaluates to 0 at $X$.
Moreover, NP-hardness immediately follows from Theorem 1.30 in
Section 1.11: If C denotes the class of duals of Horn functions, then C does not
contain all Boolean functions, the constant function $1_n$ is in C, and all restrictions
of a member of C are in C.

However, for special classes of DNFs for which tautology is tractable and
remains so after fixing some of the variables, Theorem 6.24 provides a computa-
tionally efficient way of recognizing whether the dual of the input DNF is indeed
Horn. This applies, for instance, when φ itself is a Horn DNF; see also [298] and
Section 6.9.2.
We turn now to the problem of generating a DNF of the dual f d of a Horn
function f . It is clear that this problem is at least as hard as the generation of
the dual of a monotone function, since monotone functions are Horn. It is not
so clear, however, whether “Horn dualization” is strictly harder than “monotone
dualization.” Recall that a prime DNF of the dual of a monotone function can
be generated incrementally efficiently (see Fredman and Khachiyan [347] and
Section 4.4.2 in Chapter 4). We explain next that a similar claim can be made for
Horn dualization, as well.
While it is hard to recognize whether a given conjunction is an implicant of a
function expressed in DNF (see Theorem 3.7), we can show that the same problem
is tractable for the dual function (see also Theorem 4.7).

Theorem 6.26. Given a DNF $\phi$ of a Boolean function $f$ and an elementary
conjunction $T$, we can test in $O(|T| + |\phi|)$ time whether $T$ is an implicant or a prime
implicant of the dual function $f^d$.

Proof. By definition, $T$ is an implicant of $f^d$ if $T \le f^d$ or, equivalently, if
$T^d \ge f = \phi$. The latter inequality is easy to test, by simply fixing the literals
of $T$ (where $T \not\equiv 0$) at zero, and checking whether this partial assignment makes the DNF $\phi$
vanish, that is, whether every term of $\phi$ has a common literal with $T$. It is also clear
that $T$ is a prime implicant of $f^d$ if for every literal $u$ of $T$, the DNF $\phi$ contains a
term that has only $u$ as a common literal with $T$. These conditions can be checked
by simply reading through $\phi$ and maintaining a counter for all literals in $T$.
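
The two tests in this proof can be sketched as follows. The sketch uses set intersections instead of the literal counters mentioned in the proof, so it illustrates the criterion rather than the stated running time; literals are encoded as signed integers (+j for x_j, −j for the negation of x_j), and all names are assumptions.

```python
def dual_implicant_status(phi, T):
    """Classify a nontrivial term T with respect to the dual of the function given by phi.

    phi is a list of terms, each a set of signed integers; T is a set of signed integers.
    """
    T = set(T)
    hit_alone = set()                 # literals of T that are the only common literal with some term
    for term in phi:
        common = T & set(term)
        if not common:
            return "not an implicant"  # this term of phi shares no literal with T
        if len(common) == 1:
            hit_alone |= common
    return "prime implicant" if hit_alone == T else "implicant"

# phi = x1 ~x2  v  x2 ~x3, whose dual is x1 x2 v x1 ~x3 v ~x2 ~x3   (illustrative)
phi = [{1, -2}, {2, -3}]
print(dual_implicant_status(phi, {1, 2}))
print(dual_implicant_status(phi, {1, 2, -3}))
print(dual_implicant_status(phi, {1}))
```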

In contrast, note that checking whether $f^d$ has no prime implicant, that is,
whether $\phi^d \equiv 0$, is co-NP-complete for general DNFs. However, even this case
becomes easy when $\phi$ is a Horn DNF (see Theorems 6.10 and 6.11). This implies
that the following special variant of the Dual Recognition problem may be easier
than the general case:

Horn Dual Recognition


Instance: A Horn DNF η and a disjunction φ of some of the prime implicants
of ηd .
Output: YES if ηd = φ, and NO otherwise.

In fact, it was observed by Khardon [564] that the quasi-polynomial algorithm
introduced by Fredman and Khachiyan [347] for monotone dualization (see
Chapter 4) can be straightforwardly applied in this case, too.

Theorem 6.27. The Horn Dual Recognition problem can be solved in
$N^{O(\log^2 N)}$ time, where $N = |\eta| + |\phi|$.

Proof. Clearly, $\eta^d \ne \phi$ only if there exists a binary assignment $X \in B^n$ such that
$\eta(\bar{X}) \vee \phi(X) = 0$, where $n$ denotes the number of variables in $\eta$ and $\phi$. By the
same reasoning as in Section 4.4.2, either such a vector is easy to find or there must
exist a variable appearing in the DNF $\eta(\bar{X}) \vee \phi(X)$ with high frequency, in which
case recursion can be applied. Correctness and complexity of this procedure can
be proved as in [347] (see also Section 4.4.2).

This result shows that Horn Dual Recognition is unlikely to be NP-hard,
unless all NP-hard problems can be solved in quasi-polynomial time. It is also
important to note that, though essentially the “same” algorithm works for Horn
dualization as for monotone dualization, it remains an open question whether
these two problems, that is, Dual Recognition for Horn and monotone inputs,
are indeed polynomially equivalent.
Finally, Theorem 6.27 implies that the dual of a Horn function (expressed in
DNF) can be generated in quasi-polynomial total time; here again, the proof follows
the same arguments as in Section 4.4.2.

6.9 Special classes


In this section, we discuss several interesting special classes of Horn functions
which have been considered in the literature (see, e.g., [107, 108, 296, 298, 308,
449]).

6.9.1 Submodular functions


A Boolean function $f(X)$ on $B^n$ is called submodular if
\[ f(X \vee Y) \vee f(X \wedge Y) \le f(X) \vee f(Y) \qquad (6.15) \]
for all $X, Y \in B^n$. A function $f(X)$ is called co-Horn if $g(X) = f(\bar{X})$ is Horn.
Ekin, Hammer, and Peled [308] observed the following relation between Horn,
co-Horn, and submodular functions:

Theorem 6.28 ([308]). A Boolean function is submodular if and only if it is both
Horn and co-Horn. All prime implicants of a submodular function are either linear
or quadratic pure Horn.

Proof. It is easy to verify that for a function f both conditions – namely, being
submodular or being simultaneously Horn and co-Horn – are equivalent to the fact
that F (f ) is closed with respect to both componentwise conjunction and compo-
nentwise disjunction; see Corollary 6.2. 
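
The equivalence used in this proof can be checked exhaustively on a small cube. The sketch below enumerates all 256 Boolean functions on three variables and compares inequality (6.15) with the closure of the set of false points under componentwise conjunction (the Horn property recalled in Section 6.9.2) and under componentwise disjunction (taken here as the co-Horn property, which is how the proof above uses it). All names are illustrative.

```python
from itertools import product

n = 3
points = list(product((0, 1), repeat=n))

def meet(X, Y):
    return tuple(a & b for a, b in zip(X, Y))   # componentwise conjunction

def join(X, Y):
    return tuple(a | b for a, b in zip(X, Y))   # componentwise disjunction

def is_submodular(f):
    """Inequality (6.15) for all pairs of points; f maps points to 0/1."""
    return all(max(f[join(X, Y)], f[meet(X, Y)]) <= max(f[X], f[Y])
               for X in points for Y in points)

def false_points_closed(f, op):
    """Is the set of false points of f closed under the binary operation op?"""
    return all(f[op(X, Y)] == 0
               for X in points for Y in points if f[X] == 0 and f[Y] == 0)

agree = True
submodular_count = 0
for values in product((0, 1), repeat=len(points)):
    f = dict(zip(points, values))
    horn = false_points_closed(f, meet)
    co_horn = false_points_closed(f, join)
    sub = is_submodular(f)
    submodular_count += sub
    agree = agree and (sub == (horn and co_horn))
print(agree)
```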

Because submodular functions are quadratic, many of their properties
immediately follow from the results established in Chapter 5. We simply recall them here
briefly.
Consider a submodular function $f$, and (since linear prime implicants do not
have common variables with other prime implicants) assume for simplicity that $f$
is purely quadratic. If $\phi$ is a prime DNF of $f$, we can associate with it a directed
graph $G_\phi = (V, A)$, where $V = \{1, 2, \ldots, n\}$, and $(i, j) \in A$ if $x_i\bar{x}_j$ is a term in $\phi$. It
is easy to see that $x_i\bar{x}_j$ is a prime implicant of $f$ if and only if there is a directed path
from $i$ to $j$ in $G_\phi$. Thus, the transitive closure $G_f$ of $G_\phi$ corresponds to $f$ in the
sense that the quadratic prime implicants of $f$ are in one-to-one correspondence
with the arcs of $G_f$; see Section 5.3 (and see Appendix A for the definition of the
transitive closure).
As we observed in Sections 6.7.1 and 6.7.2 (see also Section 5.8.4), the number
of terms (or the number of literals) in a DNF representation of a submodular
function f can be minimized in polynomial time, since it is easy to find a minimum
cardinality subset of the arcs of Gf that induces the same transitive closure.
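
The correspondence between quadratic prime implicants and arcs of the transitive closure can be illustrated with Warshall's algorithm. This is only a sketch under the assumption that a purely quadratic DNF is given as a set of arcs (i, j), one per term consisting of x_i and the negation of x_j; the function name is hypothetical.

```python
def transitive_closure(n, arcs):
    """Arcs of the transitive closure of a digraph on vertices 1..n (Warshall's algorithm)."""
    reach = set(arcs)
    for k in range(1, n + 1):
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                if (i, k) in reach and (k, j) in reach:
                    reach.add((i, j))
    return reach

# phi = x1 ~x2  v  x2 ~x3, encoded by the arcs (1, 2) and (2, 3)   (illustrative)
arcs = {(1, 2), (2, 3)}
prime_implicants = sorted((i, j) for i, j in transitive_closure(3, arcs) if i != j)
print(prime_implicants)   # each pair (i, j) stands for the term x_i ~x_j
```

The extra arc (1, 3) corresponds to the consensus of the two terms of the DNF.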
We remark next that the dual $f^d$ of a submodular function $f$ can also be
characterized with the help of the associated directed graph $G_f = (V, A)$ (see Ekin,
Hammer, and Peled [308]). We write $i \prec j$ or, equivalently, $j \succ i$, if there is a
directed path from $i$ to $j$ in $G_f$. We say that two vertices $i$ and $j$ are comparable
in $G_f$ if either $i \prec j$ or $i \succ j$. A set of pairwise incomparable vertices is called
an antichain. Let $\mathcal{I}(G_f)$ denote the family of maximal antichains of $G_f$. The
following characterization is established in [308]:

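Theorem 6.29. Let $f$ be a submodular function without linear prime implicants,
and let $G_f$ be the associated directed graph.

• If $G_f$ is strongly connected, then
\[ f^d = \bigwedge_{j=1}^{n} x_j \;\vee\; \bigwedge_{k=1}^{n} \bar{x}_k. \]

• If $G_f$ is acyclic, then
\[ f^d = \bigvee_{I \in \mathcal{I}(G_f)} \Bigl( \bigwedge_{\substack{j \notin I:\ j \prec i \\ \text{for some } i \in I}} x_j \Bigr) \Bigl( \bigwedge_{\substack{k \notin I:\ k \succ i \\ \text{for some } i \in I}} \bar{x}_k \Bigr). \]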

In the general case, when Gf has c strong components (c > 1), we can write
f = f0 ∨ f1 ∨ · · · ∨ fc , where f0 is the disjunction of those prime implicants that
involve variables from different strong components, and fi is the disjunction of
those prime implicants that involve variables only from the ith strong component
of f , for i = 1, . . . , c. Then, we have f d = f0d ∧ f1d ∧ · · · ∧ fcd , where each of these
functions can be determined by Theorem 6.29, since Gf0 is acyclic, and Gfi is
strongly connected for i = 1, . . . , c.

Let us finally mention that, if φ is an arbitrary DNF, then it is co-NP-complete
to recognize whether φ represents a submodular function; this follows easily from
Theorem 1.30 (see also [308]).

6.9.2 Bidual Horn functions


A Boolean function f is called bidual Horn if both f and f d are Horn. We mention
some interesting properties of bidual Horn functions established by Eiter, Ibaraki,
and Makino [298], who were the first to consider this class of functions.
As we recall from Section 6.3, a function f is Horn if and only if its set of false
points F (f ) is closed under componentwise conjunction (see Theorem 6.6). From
this fact and from the definition of the dual function, it is easy to derive that f d
is Horn if and only if its set of true points T (f ) is closed under componentwise
disjunction.
A special case of Theorem 6.24 can be used to recognize whether a Horn DNF
represents a bidual Horn:
Theorem 6.30 ([298]). A Horn DNF $\eta$ represents a bidual Horn function if and
only if for any two pure Horn terms $A\bar{x}$ and $B\bar{y}$ of $\eta$ with $x \ne y$, the term $AB$ is
an implicant of $\eta$.
Proof. Let us apply Theorem 6.24 for the DNF $\eta$. If for two terms $A\bar{x}$ and $B\bar{y}$ of
$\eta$ we have $x = y$, then (6.13) trivially holds, since all literals in these terms are
assigned value 1. If $x \ne y$, then (6.13) means that $AB$ is an implicant of $\eta$.

Since testing $AB \le \eta$ can be done in linear time when $\eta$ is Horn (see, e.g.,
Corollary 6.6), the above characterization provides an $O(|\eta|^2 \|\eta\|)$ algorithm to
test whether a Horn DNF represents a bidual Horn function.
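
A direct, unoptimized sketch of this recognition test follows. It assumes that a Horn DNF is given as a list of positive terms (frozensets of variables) and a list of pure Horn terms (body, head); the implicant test relies on forward chaining over the pure Horn terms followed by a check that some positive term lies inside the closure, which is one possible way of testing whether AB is an implicant and is not claimed to be the linear-time method of Corollary 6.6. All names are hypothetical.

```python
def closure(pure_terms, start):
    """Forward chaining closure of `start` under pure Horn terms (body, head)."""
    closed = set(start)
    changed = True
    while changed:
        changed = False
        for body, head in pure_terms:
            if head not in closed and body <= closed:
                closed.add(head)
                changed = True
    return closed

def is_positive_implicant(positive_terms, pure_terms, variables):
    """Is the positive term on `variables` an implicant of the Horn DNF?

    The least candidate false point above `variables` is the closure; it is a true
    point exactly when some positive term is contained in it.
    """
    reached = closure(pure_terms, variables)
    return any(P <= reached for P in positive_terms)

def is_bidual_horn(positive_terms, pure_terms):
    """Criterion of Theorem 6.30, checked for all pairs of pure Horn terms."""
    for A, x in pure_terms:
        for B, y in pure_terms:
            if x != y and not is_positive_implicant(positive_terms, pure_terms, A | B):
                return False
    return True

f = frozenset
# x1 x2 v x1 ~x3 v x2 ~x4: the only pair with distinct heads needs x1 x2 as an implicant.
print(is_bidual_horn([f({1, 2})], [(f({1}), 3), (f({2}), 4)]))
# ~x1 v ~x2: the empty term is not an implicant, and the dual ~x1 ~x2 is not Horn.
print(is_bidual_horn([], [(f(), 1), (f(), 2)]))
```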
Unfortunately, this positive result does not extend to general DNF representa-
tions. Namely, it was shown in [298] that it is co-NP-complete to recognize whether
an arbitrary DNF represents a bidual Horn function (this is again a corollary of
Theorem 1.30).
Recall from Definition 3.3 in Section 3.3.2 that a prime implicant of a Boolean
function f is essential if it is present in all prime DNF representations of f . An
interesting property of bidual Horn functions is stated next.
Theorem 6.31 ([298]). If f is bidual Horn, then all pure Horn prime implicants
of f are essential. 
In light of Theorem 6.16, this implies that every irredundant prime DNF of a
bidual Horn function has the same number of terms; thus, minimizing the number
of terms in a DNF representation of a bidual Horn function given by a Horn DNF
is polynomially solvable, by Theorem 6.12.
The foregoing does not imply that all irredundant prime DNFs of a bidual Horn
function $f$ should involve the same number of literals. Still, finding a
representation with the minimum number of literals can also be solved efficiently in
$O(l(m_h^2 m_p + l))$ time, where $l$ is the number of literals in a given Horn DNF $\eta$ of
$f$, and $m_h$ and $m_p$ denote respectively the number of Horn and positive terms in
$\eta$. Furthermore, the number of positive prime implicants of $f$ cannot be more than
$2m_h^2 + m_p(m_h + 1)$, and thus the consensus algorithm generates from $\eta$ all prime
implicants of $f$ in polynomial time (see [298]).
Let us further observe that generating the dual of a bidual Horn function f
represented by a Horn DNF η is not easier than dualizing a monotone DNF, since
bidual DNFs include all monotone DNFs as special cases.
Finally, we remark that the existence of a bidual extension for a given partially
defined Boolean function (T , F ) (see Chapter 12 for definitions) can be checked
in O(n|T||F|) time, where n is the number of variables. Interestingly, listing all
bidual extensions of (T, F) is computationally equivalent to (i.e., as easy or as
difficult as) generating all prime implicants of the dual of a monotone DNF (see [298]). In
particular, deciding whether a given partially defined Boolean function (T , F ) has
a unique bidual extension is equivalent to Dual Recognition (see Chapter 4),
and hence can be solved in quasi-polynomial time (see [347]).

6.9.3 Double Horn functions


A Boolean function $f$ is called double Horn if both $f$ and $\bar{f}$ (the negation of $f$) are
Horn. This class of functions was studied by Eiter, Ibaraki, and Makino [296], who
provided many interesting properties and nice characterizations, some of which
we recall here without proofs.
First, as follows easily from Theorem 6.6, a function f is double Horn if and
only if both its set of false points F (f ) and its set of true points T (f ) are closed
under componentwise conjunction.

Theorem 6.32 ([296]). A Boolean function $f$ on $B^n$ is double Horn if and only if
it can be represented by a DNF of the form
\[ \phi = \bigvee_{i \in S} \Bigl( \bigwedge_{k=1}^{i-1} x_{j_k} \Bigr) \bar{x}_{j_i}, \]
where $S \subseteq \{1, 2, \ldots, n\}$ and $(j_1, j_2, \ldots, j_n)$ is a permutation of $\{1, 2, \ldots, n\}$.

Note that the preceding DNF is an orthogonal expression (i.e., no two of its
terms can take value 1 simultaneously; see Section 1.6 and Chapter 7) which is
short, since it consists of at most n + 1 terms, where n is the number of variables.
In fact, a much stronger statement can be established:

Theorem 6.33 ([296]). If $f$ is a double Horn function on $n > 1$ variables, then
$f$, $\bar{f}$, and $f^d$ all have unique prime DNF representations, each having at most $n$
terms and $n^2$ literals. Given any of these DNFs, the other ones can be obtained in
$O(n^2)$ time. Furthermore, the number of nonisomorphic (up to relabeling of the
variables) double Horn functions on $n$ variables is exactly $2^{n+1}$.

It can also be shown (e.g., by Theorem 6.32) that double Horn functions are
read-once, that is they can be represented by a Boolean expression in which every
variable appears at most once (see Chapter 10).
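
The shape of the DNF in Theorem 6.32 can be explored on a small instance. The sketch below assumes that each term consists of a positive prefix of the permuted variables followed by one negated variable, builds the terms for an arbitrarily chosen permutation and index set, and checks by enumeration that the expression is orthogonal and that both its true points and its false points are closed under componentwise conjunction. The names and the chosen instance are illustrative.

```python
from itertools import product

def normal_form_terms(perm, S):
    """Terms (positive prefix, negated variable), one for each index i in S (1-based)."""
    return [(frozenset(perm[:i - 1]), perm[i - 1]) for i in sorted(S)]

def term_value(term, point):
    prefix, negated = term
    return all(point[v] == 1 for v in prefix) and point[negated] == 0

def f_value(terms, point):
    return any(term_value(t, point) for t in terms)

def meet(p, q):
    return {v: p[v] & q[v] for v in p}            # componentwise conjunction

perm, S = (2, 3, 1), {1, 3}                       # phi = ~x2  v  x2 x3 ~x1
terms = normal_form_terms(perm, S)
points = [dict(zip((1, 2, 3), bits)) for bits in product((0, 1), repeat=3)]

# Orthogonality: at most one term is true at any point.
orthogonal = all(sum(term_value(t, p) for t in terms) <= 1 for p in points)

true_pts = [p for p in points if f_value(terms, p)]
false_pts = [p for p in points if not f_value(terms, p)]
true_closed = all(f_value(terms, meet(p, q)) for p in true_pts for q in true_pts)
false_closed = all(not f_value(terms, meet(p, q)) for p in false_pts for q in false_pts)
print(orthogonal, true_closed, false_closed)
```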
Despite the fact that this class of functions is very “small” and quite well charac-
terized, recognizing whether a given DNF φ represents a double Horn function is
still co-NP-complete in view of Theorem 1.30. However, the recognition problem
is polynomially solvable under appropriate conditions on the input DNFs.
Theorem 6.34 ([296]). Let F be a class of formulae that is closed under
restrictions (i.e., variable fixing) and for which checking $\varphi \equiv 1$ and $\varphi \equiv 0$ can both be
done in $t(n, |\varphi|)$ time, where $n$ is the number of variables and $|\varphi|$ denotes the input
length of formula $\varphi \in$ F. Then, deciding whether $\varphi \in$ F represents a double Horn
function can be performed in $O(n^2\, t(n, |\varphi|))$ time.

Thus, in particular, if $f$ is represented by a Horn DNF $\eta$, then we can recognize
in $O(n^2 \|\eta\|)$ time if $f$ is a double Horn function.
We finally mention that the existence of a double Horn extension of a partially
defined Boolean function $(T, F)$ can be decided in polynomial $O(n(|T| + |F|))$
time (see Chapter 12 for definitions). Furthermore, DNF expressions for all such
extensions can be generated with $O(n^3(|T| + |F|))$ delay (namely, DNF expressions
$\phi_1, \phi_2, \ldots$ can be produced so that the computing time between successive outputs
$\phi_i$ and $\phi_{i+1}$ is never more than $O(n^3(|T| + |F|))$). In particular, deciding if a
given partially defined Boolean function has a unique double Horn extension can
be done in polynomial time. Unfortunately, the number of double Horn extensions
of a given partially defined Boolean function (T , F ) can be exponential in terms
of n, |T |, and |F |, and finding a “shortest” double Horn extension is NP-hard. We
refer the reader to [296] for details.

6.9.4 Acyclic Horn functions


Graph-based special classes generalizing some subclasses of Horn formulae (see
[20, 383, 711]) were introduced by Hammer and Kogan [448, 449]. We present
here a few interesting properties of one of these classes.
Given a pure Horn DNF η, let us associate to it a directed graph G_η = (V, A_η),
where V = {1, 2, ..., n} is the set of indices of the variables, and (i, j) ∈ A_η if η
has a term involving both x_i and x̄_j. Analogously, if h is a pure Horn function, let
us associate to it a directed graph G_h = (V, A_h) by including an arc (i, j) ∈ A_h if
h has a prime implicant involving both x_i and x̄_j. We call G_h the implicant graph
of h.
Clearly, if η is a prime DNF of h, then Gη is a subgraph of Gh . A very useful
property of these graphs is formulated in the following statement:
Theorem 6.35 ([448]). If η is a prime DNF representing the pure Horn function
h, and if A x_i x̄_j is a prime implicant of h, then G_η has a directed path from vertex
i to vertex j. In other words, G_h is a subgraph of the transitive closure of G_η.

Proof. See Exercises 29 and 30.

A pure Horn function h is called acyclic if G_h is an acyclic directed graph. In
view of Theorem 6.35, it follows that h is acyclic if and only if G_η is acyclic for
an arbitrary prime DNF η of h.
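
As an illustration, the graph G_η can be built from a pure Horn DNF and tested for acyclicity by a depth-first search. The following Python sketch is only indicative. It assumes that each term is encoded as a pair (body, head) standing for the term whose positive variables are indexed by body and whose single negated variable is x̄_head; this encoding and the names dnf_graph and is_acyclic are ours. The answer says something about h itself only when the input is a prime DNF of h.

```python
from collections import defaultdict

def dnf_graph(terms):
    """Arcs of G_eta: one arc (i, head) for each positive variable i of a term."""
    arcs = defaultdict(set)
    for body, head in terms:
        for i in body:
            arcs[i].add(head)
    return arcs

def is_acyclic(arcs):
    """Depth-first search; a grey (open) vertex reached again closes a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def visit(v):
        colour[v] = GREY
        for w in arcs.get(v, ()):
            if colour[w] == GREY:
                return False
            if colour[w] == WHITE and not visit(w):
                return False
        colour[v] = BLACK
        return True

    return all(colour[v] != WHITE or visit(v) for v in list(arcs))

# x1 x̄2 ∨ x2 x̄3 gives the path 1 -> 2 -> 3; x1 x̄2 ∨ x2 x̄1 gives a 2-cycle.
chain = [({1}, 2), ({2}, 3)]
cycle = [({1}, 2), ({2}, 1)]
```

Here is_acyclic(dnf_graph(chain)) holds, whereas the second DNF yields a cyclic graph.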
Recall again from Definition 3.3 in Section 3.3.2 that a prime implicant
of a Boolean function f is called essential if it is present in all prime DNF
representations of f , and redundant if no irredundant prime DNF of f includes it.

Theorem 6.36 ([448]). If h is an acyclic pure Horn function, then every prime
implicant of h is either essential or redundant. 

This remarkable property of acyclic Horn functions implies that they have
a unique irredundant prime DNF representation. Thus, in light of the preced-
ing results and of Theorem 6.12, we can check whether a given pure Horn
DNF η is acyclic, and if yes, we can find the unique irredundant prime DNF
representing the same acyclic Horn function in O(‖η‖^2) time (where, actually,
the majority of the time will be spent on transforming η into an irredun-
dant prime DNF). Clearly, the unique irredundant prime DNF of an acyclic
Horn function minimizes all usual measures of complexity (see Chapter 3 and
Section 6.7).
Further properties and generalizations of acyclic functions based on the
structure of associated graphs can be found in [108, 173, 448, 449].

6.10 Generalizations
6.10.1 Renamable Horn expressions and functions
In most applications of Boolean functions, the meaning of a variable and its nega-
tion are interchangeable, since a particular variable x could equally well denote the
truth value of a logical proposition or its negation. Thus, it is natural to consider
logical expressions obtained from a given expression after replacing some of the
variables by their negations. More formally, given a DNF

\[
\phi = \bigvee_{i=1}^{m} \Bigl( \bigwedge_{j \in P_i} x_j \bigwedge_{k \in N_i} \bar{x}_k \Bigr) \tag{6.16}
\]

and a subset S ⊆ {1, 2, ..., n}, we say that the DNF φ^S is obtained from φ by
switching (or renaming; see also Chapter 5) the variables in the subset S if

\[
\phi^{S} = \bigvee_{i=1}^{m} \Bigl( \bigwedge_{j \in (P_i \setminus S) \cup (N_i \cap S)} x_j \bigwedge_{k \in (N_i \setminus S) \cup (P_i \cap S)} \bar{x}_k \Bigr).
\]

We say that the DNF φ is renamable Horn if φ^S is a Horn DNF for some subset
S of the variables (as before, we do not distinguish between sets of variables and
sets of indices whenever this does not cause any confusion).
The problem of recognizing whether a given DNF φ is renamable Horn was
considered first by Lewis [612], who provided an elegant proof showing that this
problem is polynomially solvable, namely, that it can be reduced to a quadratic
Boolean equation.
Theorem 6.37 ([612]). Let φ be a DNF given as in (6.16) and let S ⊆ {1, 2, ..., n}.
Then φ^S is Horn if and only if the following implications hold for every i = 1, ..., m:
\[
\begin{aligned}
P_i \cap S \neq \emptyset &\implies N_i \subseteq S \text{ and } |P_i \cap S| = 1, \\
N_i \setminus S \neq \emptyset &\implies |N_i \setminus S| = 1 \text{ and } P_i \cap S = \emptyset.
\end{aligned}
\]
Proof. If some of the above implications were not valid for the ith term of φ, then
after switching the variables in S, this term would have more than one negated
variable. On the other hand, if all of the above implications hold for term i, then
it will have at most one negated variable after switching.
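
The switching operation and the two implications of Theorem 6.37 are easy to state in code. The following Python sketch is an illustration under our own conventions: a DNF is a list of pairs (P, N) of index sets, as in (6.16), and the names switch, is_horn, and lewis_conditions are hypothetical helpers rather than notation from the text.

```python
def switch(dnf, S):
    """Return phi^S: indices in S move between the positive and the negated part."""
    S = set(S)
    return [((P - S) | (N & S), (N - S) | (P & S)) for P, N in dnf]

def is_horn(dnf):
    """A DNF is Horn if every term has at most one negated variable."""
    return all(len(N) <= 1 for _, N in dnf)

def lewis_conditions(dnf, S):
    """Check the two implications of Theorem 6.37 for every term."""
    S = set(S)
    for P, N in dnf:
        if P & S and not (N <= S and len(P & S) == 1):
            return False
        if N - S and not (len(N - S) == 1 and not (P & S)):
            return False
    return True

# phi = x1 x̄2 x̄3 ∨ x2 x3 is not Horn, but switching x2 makes it Horn.
phi = [({1}, {2, 3}), ({2, 3}, set())]
```

For this φ, switch(phi, {2}) is the Horn DNF x_1 x_2 x̄_3 ∨ x̄_2 x_3, and the two tests agree on every subset S, as the theorem asserts.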

Introducing the binary characteristic vector Y^S = (y_1, ..., y_n), where y_j = 1 if
and only if j ∈ S, we can rewrite the foregoing implications as a single quadratic
Boolean condition

\[
\bigvee_{i=1}^{m} \Bigl( \bigvee_{\substack{j_1 \in P_i \\ j_2 \in N_i}} y_{j_1} \bar{y}_{j_2} \;\vee \bigvee_{\substack{j_1 \in P_i \\ j_2 \in P_i \setminus \{j_1\}}} y_{j_1} y_{j_2} \;\vee \bigvee_{\substack{j_1 \in N_i \\ j_2 \in N_i \setminus \{j_1\}}} \bar{y}_{j_1} \bar{y}_{j_2} \Bigr) = 0 \tag{6.17}
\]

If S is not given, then Y^S can be viewed as a vector of unknowns, and the
condition for Horn renamability translates into the quadratic Boolean equation
(6.17), involving n variables and $\sum_{i=1}^{m} \binom{|P_i \cup N_i|}{2}$ quadratic terms. This equation can
be solved in O(n^2) time (see Chapter 5), and the reduction provides a quadratic-time
recognition algorithm for renamable Horn DNFs.
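
To make the reduction concrete, the sketch below generates the quadratic terms of (6.17) from a DNF and looks for a solution Y^S. It is a hedged illustration only: the search is exhaustive (exponential in n) and merely stands in for the quadratic equation solvers of Chapter 5, and the (P, N) encoding of terms and all function names are assumptions of ours.

```python
from itertools import combinations, product

def quadratic_terms(dnf):
    """Terms of (6.17) as pairs of literals (j, positive).

    (j, True) stands for y_j and (j, False) for its complement.
    """
    terms = set()
    for P, N in dnf:
        for j1 in P:
            for j2 in N:
                terms.add(((j1, True), (j2, False)))
        for j1, j2 in combinations(sorted(P), 2):
            terms.add(((j1, True), (j2, True)))
        for j1, j2 in combinations(sorted(N), 2):
            terms.add(((j1, False), (j2, False)))
    return terms

def literal_value(lit, y):
    j, positive = lit
    return y[j - 1] == 1 if positive else y[j - 1] == 0

def horn_renaming(dnf, n):
    """Exhaustive search for a 0-1 solution of (6.17); returns the set S or None."""
    terms = quadratic_terms(dnf)
    for y in product((0, 1), repeat=n):
        if not any(literal_value(a, y) and literal_value(b, y) for a, b in terms):
            return {j for j in range(1, n + 1) if y[j - 1] == 1}
    return None

phi = [({1}, {2, 3}), ({2, 3}, set())]      # x1 x̄2 x̄3 ∨ x2 x3
```

With this enumeration order, horn_renaming(phi, 3) returns {3}: switching x_3 gives x_1 x̄_2 x_3 ∨ x_2 x̄_3, which is Horn.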
It was observed by Aspvall [33] that, by using some auxiliary variables, an
equivalent quadratic system can be constructed that involves only O(|φ|) terms,
thus providing the first linear-time recognition algorithm for renamable Horn
DNFs. Further linear-time recognition algorithms were proposed by Chandru
et al. [181], Mannila and Mehlhorn [662], and Sykora [853]. A linear-time recog-
nition algorithm for a more general class of expressions was presented by Boros,
Hammer, and Sun [135]; as a special case, this algorithm also detects in linear
time if a given DNF is Horn renamable (see Section 6.10.2 for more details). We
further add that recognizing whether a given DNF has a unique Horn renaming
can also be detected in linear time (see Hebrard [480]), and that an iterative Horn
renaming-based algorithm was presented by Boros, Čepek, and Kogan [108] to
find a short DNF representation of a given Horn function.
In some applications, most notably when solving Boolean DNF equations,
it may be advantageous to have many Horn terms in the input DNF (see, e.g.,
Section 2.5.2). The problem of switching a subset of variables so as to maximize
the number of Horn terms of a given DNF was considered by several authors. We
can observe, for instance, that for a cubic DNF, at least half of its terms can always
be switched to Horn (see Exercise 33 at the end of this chapter). Chandru and
Hooker [183] showed that finding the maximum number of terms of a given DNF
that can be switched simultaneously to Horn is an NP-hard optimization problem,
and Crama, Ekin, and Hammer [229] observed that it remains NP-hard even for
quadratic DNFs. In Boros [106], a simple polynomial time approximation algorithm
is presented for this hard optimization problem, guaranteeing that at least 40/67
of the maximum possible number of terms can be renamed to Horn in polynomial
time. It was shown by Zwick [941] that guaranteeing more than 2/3 for cubic Horn
DNFs is not possible, unless P=NP, and that 2/3 is achievable by a semidefinite
programming-based approximation algorithm.
In this section, so far, we have focused on the renamability of (DNF) expres-
sions. We should note, however, that variable switching is not only an operation on
expressions, but also defines a mapping (a bijection) on the set of Boolean functions.
Namely, for a subset S ⊆ {1, 2, ..., n}, a binary point X = (x_1, x_2, ..., x_n) ∈ B^n
and a Boolean function f, let us define the point X[S] by

\[
x_j[S] =
\begin{cases}
x_j & \text{if } j \notin S, \\
\bar{x}_j & \text{if } j \in S,
\end{cases}
\]

and f^S(X) = f(X[S]). Clearly, X ←→ X[S] is a bijection over B^n, and thus
f ←→ f^S is an induced bijection over the set of Boolean functions on B^n. Accordingly,
we say that a Boolean function f is renamable Horn if f^S is a Horn function
for some subset S.
Note that even Horn functions (which clearly form a subfamily of renamable
Horn functions) may have DNF representations that cannot be renamed to Horn.

Example 6.13. The (monotone) Horn function h defined by the DNF

η = x1 ∨ x2 ∨ x3 ,

can also be represented by the irredundant DNF

φ = x_1 x_2 ∨ x_1 x_3 ∨ x_2 x_3 ∨ x̄_1 x̄_2 x_3 ∨ x̄_1 x_2 x̄_3 ∨ x_1 x̄_2 x̄_3,

which is not Horn renamable. 
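
Both claims of Example 6.13 can be checked mechanically on the eight points of B^3. The Python sketch below does so by brute force; it assumes our own encoding of a term as a pair (P, N) of index sets, and the helper names are hypothetical.

```python
from itertools import combinations, product

def evaluate(dnf, x):
    """Value of a DNF at the 0-1 point x (variables are numbered from 1)."""
    return any(all(x[j - 1] == 1 for j in P) and all(x[k - 1] == 0 for k in N)
               for P, N in dnf)

def negated_after_switch(term, S):
    """Number of negated variables of a term once the variables in S are switched."""
    P, N = term
    return len(N - S) + len(P & S)

def renamable_horn(dnf, n):
    """Try every switching set S (exponential, but n = 3 here)."""
    for r in range(n + 1):
        for S in combinations(range(1, n + 1), r):
            if all(negated_after_switch(t, set(S)) <= 1 for t in dnf):
                return True
    return False

eta = [({1}, set()), ({2}, set()), ({3}, set())]
phi = [({1, 2}, set()), ({1, 3}, set()), ({2, 3}, set()),
       ({3}, {1, 2}), ({2}, {1, 3}), ({1}, {2, 3})]

same_function = all(evaluate(eta, x) == evaluate(phi, x)
                    for x in product((0, 1), repeat=3))
```

The two DNFs agree at every point, η is trivially renamable Horn (take S = ∅), and no switching set works for φ.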

In fact, Theorem 1.30 implies that it is NP-hard to recognize whether an arbitrary
DNF represents a Horn-renamable function.
However, if f^S is Horn, then the same switching set S turns all the prime
implicants of f into Horn terms, and thus any DNF φ ⊆ P_f^c representing f is also
Horn renamable (where P_f^c denotes, as usual, the consensus closure of the prime
implicants of f).

6.10.2 Q-Horn functions


A further generalization, the family of so-called Q-Horn functions, was introduced
by Boros, Crama, and Hammer [112]. This class includes Horn and renamable Horn
as well as quadratic functions.
With a DNF φ, given as in (6.16), let us associate a polyhedron P_φ ⊆ R^n, defined
by

\[
P_\phi = \Bigl\{ \alpha \in \mathbb{R}^n \Bigm| \sum_{j \in P_i} \alpha_j + \sum_{k \in N_i} (1 - \alpha_k) \le 1 \ \text{for } i = 1, \dots, m; \quad 0 \le \alpha_j \le 1 \ \text{for } j = 1, \dots, n \Bigr\}. \tag{6.18}
\]

We say that φ is a Q-Horn DNF if P_φ ≠ ∅. It is easy to see that
• α = (0, 0, ..., 0) ∈ P_φ whenever φ is Horn;
• α = X^S ∈ P_φ, where X^S is the characteristic vector of S, whenever φ can be
turned into a Horn formula by switching the variables in S; and
• α = (1/2, 1/2, ..., 1/2) ∈ P_φ whenever φ is a quadratic DNF.

Example 6.14. The following DNF


φ = x_1 x_2 x_3 ∨ x_1 x̄_2 x_4 ∨ x̄_1 x_2 x̄_5 ∨ x̄_1 x̄_2 x̄_6 ∨ x_3 x̄_4 x̄_5 ∨ x_3 x_6 ∨ x_4 x̄_5
is Q-Horn, since (1/2, 1/2, 0, 0, 1, 1) ∈ P_φ in this case. In fact P_φ = {(1/2, 1/2, 0, 0, 1, 1)},
and thus, this DNF is neither quadratic, nor Horn, nor renamable Horn.
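
Membership of a given point in P_φ amounts to evaluating the inequalities (6.18) term by term. The sketch below does this for the DNF of Example 6.14 with exact rational arithmetic; the encoding of a term as a pair (P, N) of index sets and the name in_polyhedron are assumptions made for the illustration.

```python
from fractions import Fraction

def in_polyhedron(dnf, alpha):
    """Check the inequalities (6.18); alpha[j - 1] plays the role of alpha_j."""
    if not all(0 <= a <= 1 for a in alpha):
        return False
    return all(sum(alpha[j - 1] for j in P) + sum(1 - alpha[k - 1] for k in N) <= 1
               for P, N in dnf)

half = Fraction(1, 2)

# The seven terms of Example 6.14, in the order in which they are written.
phi = [({1, 2, 3}, set()), ({1, 4}, {2}), ({2}, {1, 5}), (set(), {1, 2, 6}),
       ({3}, {4, 5}), ({3, 6}, set()), ({4}, {5})]
alpha = [half, half, 0, 0, 1, 1]
```

Here in_polyhedron(phi, alpha) holds, whereas the all-zero point, the all-1/2 point, and every 0-1 point violate some inequality, in line with the statement that φ is neither Horn, nor quadratic, nor renamable Horn.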

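Definition 6.15. Given a real vector α ∈ R^n, let us define [α] ∈ R^n by

\[
[\alpha]_j =
\begin{cases}
1 & \text{if } \alpha_j > \tfrac{1}{2}, \\
\tfrac{1}{2} & \text{if } \alpha_j = \tfrac{1}{2}, \\
0 & \text{if } \alpha_j < \tfrac{1}{2}.
\end{cases}
\]

Furthermore, let H(α) = {j | α_j = 1/2}.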


Lemma 6.11 ([112]). If α ∈ P_φ, then [α] ∈ P_φ, and hence, P_φ ≠ ∅ if and only if
P_φ ∩ {0, 1/2, 1}^n ≠ ∅. Furthermore, if P_φ ≠ ∅, then there exists a unique minimal
subset H = H_φ such that H = H(α) for some α ∈ P_φ ∩ {0, 1/2, 1}^n and H ⊆ H(β)
for all β ∈ P_φ.
Proof. Let us first note that if 0 ≤ r ≤ 1, then [1 − r] = 1 − [r] by Definition
6.15, thus Σ_{j∈P} [α_j] + Σ_{j∈N} (1 − [α_j]) = Σ_{j∈P} [α_j] + Σ_{j∈N} [1 − α_j]. Observe
next that, if the sum of some nonnegative reals is not larger than 1, then at most
one of these numbers is larger than 1/2 (and then all others are smaller than 1/2),
or at most two of them are equal to 1/2 (and then all others are equal to 0). Thus
Σ_{j∈P} α_j + Σ_{j∈N} (1 − α_j) ≤ 1 implies Σ_{j∈P} [α_j] + Σ_{j∈N} [1 − α_j] ≤ 1, proving
that if α ∈ P_φ, then [α] ∈ P_φ, too.
For the second half of the lemma, let us observe that, if α, β ∈ P_φ, then for almost
all reals 0 < λ < 1 (except finitely many values), we have H(λα + (1 − λ)β) =
H(α) ∩ H(β). Since there are only finitely many different subsets H ⊆ {1, 2, ..., n},
it follows that there exists a vector α ∈ P_φ such that H(α) ⊆ H(β) for all β ∈ P_φ.
Thus the lemma follows by H(α) = H([α]).
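
The rounding operation of Definition 6.15 and the first claim of Lemma 6.11 can be tried out on a small instance. The sketch below is illustrative only: the quadratic DNF, the point α, and the helper names round_half, H, and in_polyhedron are our own choices, and terms are encoded as pairs (P, N) of index sets.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def in_polyhedron(dnf, alpha):
    """Check the inequalities (6.18) for the point alpha."""
    if not all(0 <= a <= 1 for a in alpha):
        return False
    return all(sum(alpha[j - 1] for j in P) + sum(1 - alpha[k - 1] for k in N) <= 1
               for P, N in dnf)

def round_half(alpha):
    """Componentwise rounding [alpha] of Definition 6.15."""
    return [Fraction(1) if a > HALF else HALF if a == HALF else Fraction(0)
            for a in alpha]

def H(alpha):
    """Indices (numbered from 1) of the components equal to 1/2."""
    return {j + 1 for j, a in enumerate(alpha) if a == HALF}

phi = [({1, 2}, set()), ({3}, {1}), ({2}, {3})]    # x1 x2 ∨ x̄1 x3 ∨ x2 x̄3
alpha = [Fraction(3, 5), Fraction(1, 5), Fraction(2, 5)]
rounded = round_half(alpha)                         # equals (1, 0, 0)
```

Both α and [α] = (1, 0, 0) satisfy (6.18), and H(α) = H([α]) = ∅. Since [α] is the characteristic vector of S = {1}, this agrees with the fact that switching x_1 turns φ into the Horn DNF x̄_1 x_2 ∨ x_1 x_3 ∨ x_2 x̄_3.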

Lemma 6.11 implies easily that if φ is a Q-Horn DNF, and α ∈ P_φ, then α ∈ P_{φ^c},
where φ^c denotes the DNF formed by the disjunction of all terms obtainable from
φ by consensuses (i.e., φ^c is the consensus closure of φ; see Section 6.6).
Thus, we can define Q-Horn functions as those Boolean functions whose com-
plete DNF (the disjunction of all their prime implicants) is Q-Horn. The family
of Q-Horn functions properly includes all quadratic, Horn, and renamable Horn
functions.
Using linear programming, we can recognize efficiently whether a given DNF φ
is Q-Horn or not; moreover, a half-integral vector in P_φ ∩ {0, 1/2, 1}^n can be found in
polynomial time. A linear time recognition algorithm was given by Boros, Hammer,
and Sun [135].
It was shown in [112] that the Boolean equation φ = 0 can be solved in linear
time for a Q-Horn DNF φ whenever a vector α ∈ P_φ ∩ {0, 1/2, 1}^n is known. More
precisely, we can find α ∈ P_φ ∩ {0, 1/2, 1}^n, for which H(α) = H_φ, whenever P_φ ≠ ∅
or recognize that P_φ = ∅, in O(‖φ‖) time. Since, in particular, φ is renamable Horn
if and only if Hφ = ∅, the same algorithm also recognizes in linear time whether
or not a given DNF is renamable Horn.
Consequently, Q-Horn equations can be solved in linear time. More precisely,
for every Boolean equation φ = 0, we can either recognize that φ is not Q-Horn
or solve the equation in linear time.
If we associate a (0, ±1) matrix with a given DNF, as in Section 6.2.4, then the
family of DNFs for which the corresponding (0, ±1) matrix has a so-called mono-
tone decomposition, as introduced by Truemper [871], includes Q-Horn DNFs. A
linear time algorithm to find a monotone decomposition of a given (0, ±1) matrix
is also presented in [871].
Finally, by relaxing the definition of Q-Horn DNFs, we can introduce a useful
index associated with a DNF φ, which is related to the difficulty of solving the
Boolean equation φ = 0. For a DNF φ defined by (6.16), we define the index z(φ)
as the optimal value of the linear programming problem

\[
\begin{aligned}
z(\phi) = \min\ & z \\
\text{s.t.}\ & z \ge \sum_{j \in P_i} \alpha_j + \sum_{k \in N_i} (1 - \alpha_k) && \text{for } i = 1, \dots, m, \\
& 0 \le \alpha_j \le 1 && \text{for } j = 1, \dots, n.
\end{aligned}
\]

Clearly, φ is Q-Horn if and only if z(φ) ≤ 1. Boros et al. [116] showed that if
z(φ) ≤ 1 + (c log n)/n, then the Boolean equation φ = 0 can be solved in O(n^c)
time. On the other hand, the tautology problem remains NP-complete for any fixed
M < 1 when restricted to instances for which z(φ) ≤ 1 + n^{−M}.

6.10.3 Extended Horn expressions


Another generalization of Horn formulae was introduced by Chandru and Hooker
[182]. The motivation behind this generalization is the integer programming round-
ing result by Chandrasekaran [180] mentioned in Section 6.2.4, and the possibility
of using linear programming to solve Boolean equations (see Section 2.8).
For a formal definition, let us consider an arborescence T rooted at vertex r (i.e.,
a directed tree with all arcs oriented away from the root) that has n arcs, labeled by
{1, 2, ..., n}. We say that a term ⋀_{j∈P} x_j ⋀_{k∈N} x̄_k is extended Horn with respect to
T if the set N is a directed path of T , and if the set P is a union of directed paths
in T with the property that either (i) all paths in P start at the root or (ii) one of
them starts where N starts, and all others start at the root. The same term is called
simple extended Horn with respect to T if (ii) does not occur. Accordingly, a DNF
φ is called (simple) extended Horn if all of its terms are (simple) extended Horn
with respect to the same arborescence T .

Theorem 6.38 ([182]). If φ is extended Horn, then the Boolean equation φ = 0
has a solution if and only if the polyhedron

\[
Q_\phi = \Bigl\{ X \in \mathbb{R}^n \Bigm| \sum_{j \in P_i} x_j + \sum_{k \in N_i} (1 - x_k) \le |P_i \cup N_i| - 1 \ \text{for } i = 1, \dots, m; \quad 0 \le x_j \le 1 \ \text{for } j = 1, \dots, n \Bigr\}
\]

is not empty. Furthermore, repeated application of the unit literal rule allows us
to detect whether φ ≡ 1.

Note that by Theorem 2.10, an arbitrary DNF equation φ = 0 has a solution if and
only if the polyhedron Qφ contains an integral point. The strength of the preceding
statement is that for an extended Horn DNF φ, the integrality requirement can be
disregarded, and hence, the consistency question can be decided in polynomial
time by linear programming.
It was also shown by Schlipf et al. [809] that extended Horn equations (and
many others, including renamable extended Horn equations) can be solved by the
single look-ahead unit literal rule. In this algorithm, variables are assigned binary
values one-by-one, and the unit literal rule is applied right after each assignment
has been made. If a contradiction is found, then the last assignment is reversed;
otherwise, the last assignment is accepted permanently.
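
The single look-ahead scheme just described can be sketched in a few lines. The Python code below is a simplified illustration under stated assumptions, not the algorithm of [809] as published: a DNF is a list of pairs (P, N) of index sets, the equation is φ = 0 (so every term must contain a literal equal to 0), variables are tried in index order with value 0 first, and the function names are ours.

```python
def unit_propagate(dnf, assignment):
    """Apply the unit literal rule to phi = 0.

    Returns the extended assignment, or None if some term has all its literals at 1.
    """
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for P, N in dnf:
            # Each literal is stored as (variable, value that makes the literal 1).
            literals = [(j, 1) for j in P] + [(k, 0) for k in N]
            if any(assignment.get(v) == 1 - one for v, one in literals):
                continue                      # some literal is 0, so the term is 0
            free = [(v, one) for v, one in literals if v not in assignment]
            if not free:
                return None                   # contradiction: the term equals 1
            if len(free) == 1:
                v, one = free[0]
                assignment[v] = 1 - one       # the last free literal must be 0
                changed = True
    return assignment

def slur(dnf, n):
    """Single look-ahead unit literal rule for phi = 0.

    Returns a solution as a list, or None if a contradiction arises at the
    outset or for both values of some variable.
    """
    assignment = unit_propagate(dnf, {})
    if assignment is None:
        return None
    for v in range(1, n + 1):
        if v in assignment:
            continue
        for value in (0, 1):
            trial = unit_propagate(dnf, {**assignment, v: value})
            if trial is not None:
                assignment = trial            # accept the assignment permanently
                break
        else:
            return None                       # both values lead to a contradiction
    return [assignment[v] for v in range(1, n + 1)]
```

For the Horn DNF x_1 x̄_2 ∨ x_2 x̄_3 ∨ x_3, the unit literal rule alone already forces the solution (0, 0, 0); for x_1 ∨ x̄_1 the procedure reports a contradiction.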
The recognition of extended Horn DNFs is strongly related to the so-called
arborescence realization problem (given a hypergraph H on a base set E, find an
arborescence T with arc set E such that all hyperedges of H are directed paths
in T ), and, in fact, a polynomial time recognition algorithm for simple extended
Horn DNFs was derived via arborescence realization by Swaminathan and Wagner
[852]. This was later improved to a linear time algorithm by Benoist and Hebrard
[59]. The problem of recognizing extended Horn DNFs is still open.

6.10.4 Polynomial hierarchies built on Horn expressions


A polynomial hierarchy of DNFs is a sequence of families of DNFs

D0 ⊂ D1 ⊂ · · · ⊂ Dk ⊂ · · ·

such that (i) the membership φ ∈ D_k can be tested in time polynomial in |φ|^k; (ii)
if k is a fixed constant and φ ∈ D_k, then the Boolean equation φ = 0 can be solved
in polynomial time; and (iii) for every DNF φ, φ ∈ D_k for some integer k.
Several such hierarchies were considered in the literature (see, e.g., [174,
253, 362, 756]), most of them built on Horn expressions or on some of their
generalizations. To describe these, we need to introduce a few more notations.
With a DNF φ given by (6.16), let us associate the hypergraph 𝒩(φ) = {N_i |
i = 1, ..., m} consisting of the index sets of the negated variables of the terms of
φ. Note that 𝒩(φ) may not be a clutter; for example, it contains the empty set
whenever φ includes a positive term. For an index j, consider two operations,
defined by 𝒩 \ {j} = 𝒩 \ {N ∈ 𝒩 | N ∋ j} and 𝒩 ÷ {j} = {N \ {j} | N ∈ 𝒩},
respectively, called the deletion and the contraction of element j (note the slight
difference with the similar terminology introduced in Section 1.13.5).
One of the earliest polynomial hierarchies N_0 ⊂ N_1 ⊂ ··· ⊂ N_k ⊂ ···, where
N_0 is the family of Horn expressions, was proposed by Gallo and Scutella [362].
To describe this hierarchy, first we need to define a hierarchy of hypergraphs
I_0 ⊂ I_1 ⊂ ··· by

• 𝒩 ∈ I_0 if |N| ≤ 1 for all N ∈ 𝒩; and
• for k > 0, 𝒩 ∈ I_k if there exists an index j such that 𝒩 \ {j} ∈ I_{k−1} and
𝒩 ÷ {j} ∈ I_k.

Note that class I_k for k > 0 is initialized by the condition I_{k−1} ⊂ I_k. Then, classes
of DNFs N_k, k = 0, 1, ... are defined by φ ∈ N_k if and only if 𝒩(φ) ∈ I_k.
Clearly, N0 is the family of Horn DNFs. The class N1 is the family of so-called
generalized Horn DNFs, introduced earlier by Yamasaki and Doshita [931]. It
was shown in [362] that the membership φ ∈ N_k can be tested in O(|φ| n^k) time.
Furthermore, the membership algorithm in [362] provides the index j appearing
in the recursive definition of I_k. When k is a fixed constant, a polynomial time
algorithm to solve the Boolean equation φ = 0, with φ ∈ N_k, follows easily from
these results. Indeed, branching on the j-th variable results in two subproblems,
one from N_{k−1} and one from N_k, both having one variable less than the original
problem. (The same results were obtained by [931] when k = 1.)
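
The recursive definition of the classes I_k can be transcribed almost literally. The sketch below is a brute-force transcription for small examples only; it is not the O(|φ| n^k) membership algorithm of [362]. The hypergraph is represented as a frozenset of frozensets, terms as pairs (P, N) of index sets, and the names delete, contract, in_class, and level are our own.

```python
from functools import lru_cache

def delete(hyper, j):
    """Deletion of j: drop every edge that contains j."""
    return frozenset(e for e in hyper if j not in e)

def contract(hyper, j):
    """Contraction of j: remove j from every edge."""
    return frozenset(e - {j} for e in hyper)

@lru_cache(maxsize=None)
def in_class(hyper, k):
    """Test membership in I_k by following the recursive definition."""
    if all(len(e) <= 1 for e in hyper):
        return True                      # member of I_0, hence of every I_k
    if k == 0:
        return False
    if in_class(hyper, k - 1):
        return True                      # I_{k-1} is contained in I_k
    vertices = set().union(*hyper)
    return any(in_class(delete(hyper, j), k - 1) and in_class(contract(hyper, j), k)
               for j in vertices)

def level(dnf):
    """Smallest k such that the hypergraph of negated index sets lies in I_k."""
    hyper = frozenset(frozenset(N) for _, N in dnf)
    k = 0
    while not in_class(hyper, k):
        k += 1
    return k
```

For instance, a Horn DNF has level 0, the single term x̄_1 x̄_2 has level 1, and x̄_1 x̄_2 ∨ x̄_3 x̄_4 has level 2.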
The previous hierarchy was somewhat improved by Dalal and Etherington
[253], so that both Horn and quadratic formulae could be included at the lowest
level of the hierarchy. Furthermore, it was shown by Kleine Büning [570] that,
to prove φ ≡ 1 for a DNF φ ∈ Nk , it is enough to use a restricted version of the
consensus algorithm in which the consensus of two terms is computed only if at
least one of the terms is of degree at most k.

Pretolani [756] observed that many other classes of DNFs could be used in
place of N0 , resulting in a similar polynomial hierarchy. Unfortunately, renamable
extensions of otherwise simple classes may not always be included at low levels
of such hierarchies. For instance, Eiter, Kilpelainen, and Mannila [301] showed
that recognizing renamable generalized Horn DNFs is an NP-complete problem.
Recent work of Čepek and Kučera [174] provides a quite general framework
for more general polynomial hierarchies. Let D0 be a class of DNFs, and

• for k > 0, let φ ∈ D_k if and only if there exists a literal u of φ such that
φ|_{u=0} ∈ D_{k−1} and φ|_{u=1} ∈ D_k.

Theorem 6.39 ([174]). If D_0 is a nontrivial class that is closed under (i)
switching a subset of the variables and (ii) fixing a subset of the variables at
binary values, and if the Boolean equation φ = 0 for φ ∈ D_0 can be solved in
polynomial p(|φ|) time, then the classes D_0 ⊂ D_1 ⊂ ··· define a polynomial hierarchy.
In particular, for a DNF φ, membership in D_k can be tested in O(p(|φ|) n^{k+1}) time,
and if φ ∈ D_k, then the Boolean equation φ = 0 can also be solved in O(|φ| n^{k+1})
time.

For example, the class D0 can be chosen to be the family of renamable Horn
DNFs or the family of Q-Horn DNFs, and so on, with each choice resulting in a
different polynomial hierarchy.

6.11 Exercises
1. Let f and g denote arbitrary Horn functions. Decide whether the following
claims are true or false:
• f ∨ g is Horn.
• f ∧ g is Horn.
• f̄ is Horn.
2. Find a Boolean function in n variables for which the number of minimal
Horn majorants is exponential in n.
3. Find a Horn function in n variables for which the number of prime implicants
is polynomial in n, but the number of different Horn DNF representations
is exponential in n.
4. Let f be a Boolean function, and let P_i, i = 1, ..., m, be its Horn prime
implicants. Prove that η(X) = ⋁_{i=1}^m P_i(X) is the unique maximal Horn minorant
of f. Does this claim remain true if P_i, i = 1, ..., m are the Horn terms of
an arbitrary DNF of f ?
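5. Let

\[
f = \bigvee_{i=1}^{n} \bigvee_{P \in \mathcal{P}_i} P \bar{x}_i
\]

and

\[
g = \bigvee_{i=1}^{n} \bigvee_{Q \in \mathcal{Q}_i} Q \bar{x}_i
\]

be the complete DNFs of two pure Horn functions. We then define

\[
f \otimes g = \bigvee_{i=1}^{n} \bigvee_{\substack{P \in \mathcal{P}_i \\ Q \in \mathcal{Q}_i}} P Q \bar{x}_i .
\]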

• Prove that the family of pure Horn functions with the operations ⊗ and
∨ forms a lattice.
• Prove that f ⊗ g is the unique largest Horn minorant of f ∧ g.
• Can you generalize this for the family of Horn functions?
6. Prove that for a nonempty subset S ⊆ B^n and for the characteristic models
Q(S) of this set (see Corollary 6.4), we have

\[
Q(S) = \bigl\{ X \in S \mid X \notin (S \setminus \{X\})^{\wedge} \bigr\}. \tag{6.19}
\]

7. Let h be a Horn function in n variables, and let m* and l* denote, respectively,
the numbers of terms and literals in a DNF representation of h^d. Prove that
the following inequality holds:

\[
|Q(F(h))| \le m^{*}(n + 1) - l^{*}. \tag{6.20}
\]

8. Construct examples of Horn functions h for which there is an exponential
gap in inequality (6.20).
9. Find examples of Horn DNFs η such that Q(F(η)) and η^d are simultaneously
exponentially larger than η.
10. Find examples of Horn functions h for which every DNF representation of both
functions h and h^d is exponentially larger than the cardinality |Q(F(h))|.
11. Given X, Y, A ∈ B^n, let us write X ≥_A Y if x_i ⊕ a_i ≥ y_i ⊕ a_i for all i = 1, ..., n,
where ⊕ denotes the modulo 2 addition. For a subset S ⊆ B^n let

\[
S^{A} = \{ X \mid X \ge_A Y \text{ for some } Y \in S \}
\]

denote the A-monotone closure of S.

• Prove that, for every subset S ⊆ B^n,

\[
S = \bigcap_{A \in B^n} S^{A}.
\]

• Let 𝒜 denote the set of those n + 1 binary vectors from B^n that contain
at least n − 1 ones. Prove that, for every Horn function h, we have

\[
F(h) = \bigcap_{A \in \mathcal{A}} F(h)^{A}.
\]

(See more in [161].)



12. Given a DNF

\[
\phi = \bigvee_{i=1}^{m} \Bigl( \bigwedge_{j \in P_i} x_j \Bigr) \wedge \Bigl( \bigwedge_{j \in N_i} \bar{x}_j \Bigr)
\]

in n variables, let us call a mapping σ : [m] → [n] a selector if σ(i) ∈ N_i
whenever N_i ≠ ∅. With φ and the selector σ, let us associate a DNF φ_σ
defined by

\[
\phi_\sigma = \Bigl( \bigvee_{i : N_i = \emptyset} \bigwedge_{j \in P_i} x_j \Bigr) \vee \Bigl( \bigvee_{i : N_i \neq \emptyset} \Bigl( \bigwedge_{j \in P_i} x_j \Bigr) \wedge \bar{x}_{\sigma(i)} \Bigr).
\]

• Prove that φ_σ is a Horn majorant of φ, for every selector σ.
• Prove that, for every Horn majorant η of φ, there exists a selector σ
such that φ ≤ φ_σ ≤ η.
13. Let h_i, i = 1, ..., N, be the set of minimal Horn majorants of the Boolean
function f. Prove that f = ⋀_{i=1}^N h_i.
14. Which Boolean functions have a unique minimal Horn majorant?
15. Can you characterize those Boolean functions that have exactly two minimal
Horn majorants?
16. Let φ be a DNF, and let η be a Horn DNF. How difficult is it to decide whether
or not η ≤ φ holds? What is the complexity of this problem if we assume
that φ contains all prime implicants of the Boolean function it represents?
17. Let η be a Horn DNF representing the unique maximal Horn minorant of the
DNF φ. Prove that deciding the consistency of the Boolean equations η = 0
and φ = 0 are computationally equivalent problems. What is the complexity
of finding the maximal Horn minorant of a DNF?
18. Given Horn DNFs ηj , j = 1, . . . , k, what is the complexity of finding the
maximal Horn minorant of η1 ∧ η2 ∧ · · · ∧ ηk ?
19. Let η be a Horn DNF representing a minimal Horn majorant of the DNF
φ. Prove that deciding the consistency of the Boolean equations η = 0 and
φ = 0 are computationally equivalent problems. What is the complexity of
finding a minimal Horn majorant of a DNF?
20. Given a pure Horn function h in variables V = {x_1, x_2, ..., x_n}, find a minimal
subset S ⊆ V for which S^h = V, that is, for which the forward chaining
closure of S includes all variables. How difficult is this problem? Is such a
minimal subset unique?
21. Prove that for two Horn functions h and h′, we have S^h = S^{h′} for every
subset S of variables if and only if h = h′.
22. Given a Horn function h of n variables, let us denote by h^(k) the disjunction
of those prime implicants of h having degree at most k. Note that h^(1) ≤
h^(2) ≤ ··· ≤ h^(n) = h.
• Is it true that h^(1) has a DNF representation not longer than the shortest
DNF of h?
• Construct a Horn DNF η representing the Horn function h such that, for
every DNF representation η^(2) of h^(2), we have |η^(2)| > |η| (cf. [172]).
23. Let us call a consensus k-restricted if at least one of the terms involved in
the consensus has degree at most k.
• Prove that all linear prime implicants of a pure Horn DNF η can be
obtained by a sequence of 1-restricted consensuses.
• Generalize this statement for any k ≥ 2 (see [173]).
24. Consider a Horn function h given by a prime DNF η, and let T be an implicant
of h. How difficult is it to decide whether T can be derived from the prime
implicants of h by a sequence of consensuses? How many prime implicants
of h are needed for such a consensus derivation of T when it exists?
25. Let T and Q ⊆ T be two sets of terms, both closed under consensus, and let
(R1 , D1 ) and (R2 , D2 ) be two RD-partitions of T .
• Prove that (R3 , D3 ) is also an RD-partition of T if R3 = R1 ∩ R2 and
D3 = D1 ∪ D2 .
• Prove that (R4 , D4 ) is an RD-partition of Q if R4 = R1 ∩ Q and D4 =
D1 ∩ Q.
26. Prove that the partition in Example 6.7 is an RD-partition.
27. Prove that, if h is a Horn function, then each of the following defines an
RD-partition of P_h^c:
(a) R = {T ∈ P_h^c | |T| ≥ 2} and D = P_h^c \ R.
(b) R = {T ∈ P_h^c | |T| ≤ 2} and D = P_h^c \ R.
(c) R = {T ∈ P_h^c | T(X) = 0} and D = P_h^c \ R, if X ∈ B^n is a point at which
every prime implicant of h contains at most one literal that evaluates to
zero. (How easy is it to check for the existence of such a binary vector
X ∈ B^n?)
(d) R = {T ∈ P_h^c | all variables of T belong to S} and D = P_h^c \ R, where
S is a subset of the variables that is closed under forward chaining,
namely, S^h = S (see Section 6.4).
28. Prove that the minimum number of positive terms in a Horn DNF of a
Horn function h is always at most 1. For which Horn functions is it 0?
How difficult is it to find such an “optimal” Horn DNF, having the minimum
number of positive terms?
29. Consider a pure Horn DNF η of a pure Horn function h, and the associated
directed graph G_η = (V, A_η) defined in Section 6.9.4. Prove that if A ȳ is
an implicant of h (not necessarily present in η) and x ∈ A, then there is a
directed path from x to y in G_η.
30. Consider two prime DNFs η1 and η2 of the pure Horn function h. Prove
that the transitive closures of the directed graphs Gη1 and Gη2 are the same,
and that they coincide with the transitive closure of Gh (see Appendix A for
definitions).

31. Let us consider a pure Horn function h, the associated transitively closed
directed graph G_h = (V, A_h), as defined in the previous exercise, and let us
assume that S ⊆ V is an initial set of the vertices (namely, there is no arc
(x, y) with x ∈ V \ S and y ∈ S). Define

R = {T | T ∈ P_h^c, the head of T belongs to S}.

Prove that R and D = P_h^c \ R form an RD-partition of P_h^c.


32. Consider a transitively closed directed graph D = (V, A), and let H_D denote
the set of those pure Horn functions h for which D = G_h. Prove that if
h, h′ ∈ H_D, then both h ∨ h′ and h ⊗ h′ (as defined in Exercise 5) belong to
H_D. Prove also that H_D contains a unique minimal function and a unique
maximal function. Can you write a DNF of these unique minimal and
maximal members of H_D?
33. Let φ be a DNF of m terms, as given in (6.16). Prove that at least

\[
\Bigl\lceil \sum_{i=1}^{m} \frac{|P_i \cup N_i| + 1}{2^{|P_i \cup N_i|}} \Bigr\rceil
\]

of its terms can be switched to Horn by renaming some of
its variables. Can you give a polynomial time algorithm to accomplish this?
34. Prove that the lower bound in the previous exercise is tight (cf. [585]).
7
Orthogonal forms and shellability

The concept of orthogonal disjunctive normal form (or ODNF, sometimes called
sum of disjoint products) was introduced in Chapter 1. Orthogonal forms are a
classic object of investigation in the theory of Boolean functions, where they were
originally introduced in connection with the solution of Boolean equations (see
Kuntzmann [589], Rudeanu [795]). More recently, they have also been extensively
studied in the reliability literature (see, e.g., Colbourn [205, 206]; Provan [759];
Schneeweiss [811]).
In general, however, orthogonal forms are difficult to compute, and few classes
of disjunctive normal forms are known for which orthogonalization can be effi-
ciently performed. An interesting class with this property, called the class of
shellable DNFs, has been introduced and investigated by Ball and Provan [49, 760].
As these authors established, the DNFs describing several important classes of reli-
ability problems (all-terminal reliability, all-point reachability, k-out-of-n systems,
etc.) are shellable. Moreover, besides its unifying role in reliability theory, shella-
bility also provides a powerful theoretical and algorithmic tool of combinatorial
geometry, where it originally arose in the study of abstract simplicial complexes
(see [96, 97, 205, 206, 254, 569, etc.]; let us simply mention here, without further
details, that an abstract simplicial complex can be viewed as the set of true points
of a positive Boolean function).
In this chapter, we first review some basic facts concerning orthogonal forms
and describe a simple orthogonalization procedure for DNFs. Then, we intro-
duce shellable DNFs and establish some of their most remarkable properties: In
particular, we prove that shellable DNFs can be orthogonalized and dualized in
polynomial time. Finally, we define and investigate a fruitful strengthening of
shellability, namely, the lexico-exchange property.

7.1 Computation of orthogonal DNFs


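Recall from Chapter 1, Section 1.6, that the DNF

\[
\phi = \bigvee_{k=1}^{m} \Bigl( \bigwedge_{i \in A_k} x_i \bigwedge_{j \in B_k} \bar{x}_j \Bigr), \tag{7.1}
\]

is orthogonal if no two terms of φ can be simultaneously equal to 1, that is, if

\[
(A_k \cap B_\ell) \cup (A_\ell \cap B_k) \neq \emptyset \quad \text{for all } 1 \le k < \ell \le m,
\]

or, equivalently,

\[
\Bigl( \bigwedge_{i \in A_k} x_i \bigwedge_{j \in B_k} \bar{x}_j \Bigr) \Bigl( \bigwedge_{i \in A_\ell} x_i \bigwedge_{j \in B_\ell} \bar{x}_j \Bigr) \equiv 0 \quad \text{for all } 1 \le k < \ell \le m.
\]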

As described in Section 1.6, one of the main applications of ODNFs is in
enumerating the true points of a Boolean function or, more generally, in computing
the probability that a Boolean function takes the value 1 when each of its variables
takes the value 0 or 1 randomly and independently of the values of the other
variables. Indeed, for functions in orthogonal form, this probability is very easily
computed by summing the probabilities associated with all individual terms, since
any two terms correspond to a pair of disjoint events. This explains, in particular,
why ODNFs have become an object of study in reliability theory (see Section
1.13.4).
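
The computation just described takes only a few lines of code. The sketch below is illustrative: a term is encoded as a pair (A, B) of index sets, as in (7.1), the probabilities p[j] = Pr[x_j = 1] are supplied in a dictionary, and the function names are our own.

```python
def term_probability(term, p):
    """Probability that a term equals 1 when the variables are independent."""
    A, B = term
    prob = 1
    for i in A:
        prob *= p[i]
    for j in B:
        prob *= 1 - p[j]
    return prob

def is_orthogonal(dnf):
    """Check that A_k ∩ B_l or A_l ∩ B_k is nonempty for every pair of terms."""
    return all((A1 & B2) or (A2 & B1)
               for i, (A1, B1) in enumerate(dnf)
               for (A2, B2) in dnf[i + 1:])

def probability(odnf, p):
    """Pr[f = 1] for an orthogonal DNF: the terms are pairwise disjoint events."""
    if not is_orthogonal(odnf):
        raise ValueError("the DNF is not orthogonal")
    return sum(term_probability(t, p) for t in odnf)

# x1 ∨ x̄1 x2 is an ODNF of x1 ∨ x2.
odnf = [({1}, set()), ({2}, {1})]
p = {1: 0.5, 2: 0.5}
```

With p_1 = p_2 = 1/2 the result is 1/2 + 1/4 = 3/4, that is, three true points out of four. Applying the same sum to the non-orthogonal DNF x_1 ∨ x_2 would count the point (1, 1) twice, which is why the function refuses such input.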
As noted earlier, however, computing an ODNF of a Boolean function often
turns out to be a difficult computational task. In previous chapters, we described
different ways of obtaining an ODNF of a given function, for instance, by comput-
ing its minterm expression (see Section 2.11.2 and the “complete state enumeration
scheme” in Provan’s classification [759]), by iterative applications of the Shannon
expansion (see Section 1.8 and the “pivotal decomposition scheme” in [759]), or as
a byproduct of binary decision diagrams (see Section 1.12.3; Ball and Nemhauser
[48]; Birnbaum and Lozinskii [90]; Wegener [903], etc.). We now present another
classical approach, which relies on the following simple observations.

Theorem 7.1. Let $\phi = \bigvee_{k=1}^{m} C_k$ be a DNF. Then,

(i) the expression

\[
\psi = C_1 \vee \bar{C}_1 C_2 \vee \bar{C}_1 \bar{C}_2 C_3 \vee \ldots \vee \bar{C}_1 \bar{C}_2 \ldots \bar{C}_{m-1} C_m
\]

is equivalent to φ;

(ii) if $\psi_k$ is an ODNF of $\bar{C}_1 \bar{C}_2 \ldots \bar{C}_{k-1} C_k$ for k = 1, 2, . . . , m, then $\bigvee_{k=1}^{m} \psi_k$ is
an ODNF of φ.
Proof. The expression ψ is clearly equivalent to φ. Let $T_1$ be a term of $\psi_k$ and $T_2$
be a term of $\psi_j$, where $T_1 \neq T_2$ and k ≤ j. If k < j, then $T_1 T_2 \equiv 0$, since $\psi_k \psi_j \equiv 0$.
On the other hand, if k = j, then $T_1 T_2 \equiv 0$ by orthogonality of $\psi_k$. □

Theorem 7.1 suggests the recursive procedure described in Figure 7.1 for
computing an ODNF of an arbitrary DNF (see, e.g., Kuntzmann [589]).

Procedure Orthogonalize(φ)

Input: A DNF $\phi = \bigvee_{k=1}^{m} C_k$.
Output: An orthogonal DNF ψ equivalent to φ.

begin
    for k := 1 to m do
    begin
        compute a DNF $\phi_k$ of $\bar{C}_1 \bar{C}_2 \ldots \bar{C}_{k-1} C_k$;
        $\psi_k$ := Orthogonalize($\phi_k$);
    end;
    $\psi := \bigvee_{k=1}^{m} \psi_k$;
end

Figure 7.1. Procedure Orthogonalize.

There are many ways of implementing this algorithm, thus giving rise to differ-
ent variants of Orthogonalize, such as those proposed by Fratta and Montanari
[346]; Abraham [2]; Aggarwal, Misra, and Gupta [7]; Locks [619]; Bruni [158];
and so on; see also the surveys [206, 776]. (Note that most authors restrict
their attention to positive Boolean functions, although there is no need to be so
restrictive.)
A specific difficulty with Orthogonalize is to work around the recursive
call to the procedure, since orthogonalizing φk may, in general, be as difficult
as orthogonalizing φ itself. One way to resolve this difficulty is to produce φk
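directly in orthogonal form, as this suppresses the need for the recursive call. To
achieve this goal, we write $C_j = \bigwedge_{i=1}^{n_j} \ell_{ij}$, where $\ell_{1j}, \ell_{2j}, \ldots, \ell_{n_j j}$ are literals, for
j = 1, 2, . . . , m. Then,

\[
\bar{C}_1 \bar{C}_2 \ldots \bar{C}_{k-1} C_k
= \bigwedge_{j=1}^{k-1} \Big( \bigvee_{i=1}^{n_j} \bar{\ell}_{ij} \Big) C_k
= \bigwedge_{j=1}^{k-1} \Big( \bar{\ell}_{1j} \vee \ell_{1j} \bar{\ell}_{2j} \vee \ell_{1j} \ell_{2j} \bar{\ell}_{3j} \vee \ldots \vee \ell_{1j} \ell_{2j} \ldots \ell_{n_j - 1, j} \bar{\ell}_{n_j, j} \Big) C_k .
\]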

Using distributivity to “multiply out” its k − 1 factors, the latter expression can
easily be transformed into an orthogonal DNF ψk .
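Abraham [2] suggested implementing this approach in an iterative fashion,
by successively computing an ODNF expression $\varphi_j$ of $\bar{C}_1 \bar{C}_2 \ldots \bar{C}_j C_k$ for j =
1, 2, . . . , k − 1, until $\varphi_{k-1} = \phi_k = \psi_k$ is obtained. Suppose that the ODNF $\varphi_{j-1}$ is
in the form $\varphi_{j-1} = \bigvee_{t \in T} P_t$, where $P_t$ (t ∈ T) are elementary conjunctions. Then,

\[
\bar{C}_1 \bar{C}_2 \ldots \bar{C}_j C_k = \bar{C}_j \varphi_{j-1} = \Big( \bigvee_{i=1}^{n_j} \bar{\ell}_{ij} \Big) \Big( \bigvee_{t \in T} P_t \Big), \tag{7.2}
\]

and the right-hand side of (7.2) can be transformed to produce the ODNF

\[
\varphi_j = \Big( \bigvee_{i=1}^{n_j} \bar{\ell}_{ij} \Big) \Big( \bigvee_{t \in T} P_t \Big)
= \bigvee_{t \in T} \Big( \bar{\ell}_{1j} P_t \vee \ell_{1j} \bar{\ell}_{2j} P_t \vee \ell_{1j} \ell_{2j} \bar{\ell}_{3j} P_t \vee \ldots \vee \ell_{1j} \ell_{2j} \ldots \ell_{n_j - 1, j} \bar{\ell}_{n_j, j} P_t \Big). \tag{7.3}
\]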

Abraham [2] proposed to accelerate this procedure by various types of computational
shortcuts (similar to those described in the context of dualization
algorithms; see Theorem 4.31). For instance, if some term $P_t$ contains the complement
of one of the literals $\ell_{ij}$, then $\bar{C}_j P_t = P_t$, and the t-th subexpression
in (7.3) can be replaced by $P_t$. If $P_t$ contains a subset of $\{\ell_{1j}, \ell_{2j}, \ldots, \ell_{n_j j}\}$, say,
without loss of generality, $\{\ell_{r+1,j}, \ell_{r+2,j}, \ldots, \ell_{n_j j}\}$, then $\bar{C}_j P_t = \big( \bigvee_{i=1}^{r} \bar{\ell}_{ij} \big) P_t$ and
the right-hand side of (7.3) simplifies accordingly. Also, absorption can be applied
at any stage of the procedure (the previous two simplifications can actually be
viewed as resulting from absorption). Finally, as noted in [2, 7, 346, etc.], the
efficiency of the procedure is usually improved if the terms of φ are reordered by
nondecreasing degree.
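Example 7.1. Let $\phi = x_1 x_2 \vee x_2 \bar{x}_3 \vee x_3 x_4$, and let us apply Abraham’s method.
First, we let $\phi_1 = \psi_1 = x_1 x_2$. Next, we find

\[
\phi_2 = \psi_2 = (\bar{x}_1 \vee \bar{x}_2)\, x_2 \bar{x}_3 = \bar{x}_1 x_2 \bar{x}_3 .
\]

Finally, we need an orthogonal DNF of $(\bar{x}_1 \vee \bar{x}_2)(\bar{x}_2 \vee x_3)\, x_3 x_4$. We first produce

\[
\varphi_1 = (\bar{x}_1 \vee x_1 \bar{x}_2)\, x_3 x_4 = \bar{x}_1 x_3 x_4 \vee x_1 \bar{x}_2 x_3 x_4 .
\]

Then, we produce (note that both terms of $\varphi_1$ conflict with a literal of $x_2 \bar{x}_3$)

\[
\varphi_2 = \phi_3 = \psi_3 = (\bar{x}_2 \vee x_3)\, \varphi_1 = \bar{x}_1 x_3 x_4 \vee x_1 \bar{x}_2 x_3 x_4 ,
\]

and we eventually obtain the following ODNF of φ:

\[
\psi = \psi_1 \vee \psi_2 \vee \psi_3 = x_1 x_2 \vee \bar{x}_1 x_2 \bar{x}_3 \vee \bar{x}_1 x_3 x_4 \vee x_1 \bar{x}_2 x_3 x_4 .
\]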
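
The computations of Example 7.1 can be reproduced with a short Python sketch of the iterative scheme based on (7.2) and (7.3). It is only an illustration under stated assumptions, not Abraham's original implementation: a literal is encoded as a signed integer (+i for $x_i$, −i for $\bar{x}_i$), a term is a tuple of literals whose order fixes the numbering $\ell_{1j}, \ell_{2j}, \ldots$, and the function names are hypothetical. Only the two shortcuts mentioned above are applied (a conflicting literal, and literals already present in $P_t$).

```python
from itertools import product

def orthogonalize(dnf):
    """Return an orthogonal DNF (list of frozensets of signed literals) equivalent to dnf."""
    result = []
    for k, c_k in enumerate(dnf):
        current = [frozenset(c_k)]            # start from C_k itself
        for c_j in dnf[:k]:                   # multiply by the complement of C_j, j < k
            nxt = []
            for p in current:
                if any(-lit in p for lit in c_j):
                    nxt.append(p)             # shortcut: (not C_j) P = P
                    continue
                remaining = [lit for lit in c_j if lit not in p]
                prefix = set()
                for lit in remaining:         # expansion (7.3) restricted to 'remaining'
                    nxt.append(p | prefix | {-lit})
                    prefix.add(lit)
                # if 'remaining' is empty, C_j is contained in P and the product vanishes
            current = nxt
        result.extend(current)
    return result

def evaluate(terms, point):
    """Value of a DNF at a 0/1 point given as a dict {variable index: value}."""
    return any(all((point[abs(l)] == 1) == (l > 0) for l in t) for t in terms)

def is_orthogonal(terms):
    """Every pair of distinct terms must contain a pair of complementary literals."""
    return all(any(-l in t2 for l in t1)
               for i, t1 in enumerate(terms) for t2 in terms[i + 1:])

# Example 7.1: x1 x2  v  x2 (not x3)  v  x3 x4
phi = [(1, 2), (2, -3), (3, 4)]
psi = orthogonalize(phi)
```

On this input, `psi` consists of the four terms of the ODNF ψ obtained in Example 7.1. Reordering the terms of φ, or the literals inside a term, generally changes the output, which is consistent with the remark on reordering terms by nondecreasing degree.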


Another way to look at the for loop of the procedure in Figure 7.1 relies on the
observation that computing a DNF of $\bar{C}_1 \bar{C}_2 \ldots \bar{C}_{k-1} C_k$ is essentially equivalent to
dualizing the function $f_{k-1} = \bigvee_{i=1}^{k-1} C_i$. Indeed, if $\theta_{k-1}(X)$ is a DNF of $f^d_{k-1}(X)$,
then the required DNF $\phi_k$ is easily derived from the expression $\theta_{k-1}(\bar{X})\, C_k(X)$.
Note, however, that the resulting DNF is usually not orthogonal, so that the
recursive call to Orthogonalize is needed here.
Incidentally, for positive functions, this relation between dualization and
orthogonalization procedures prompts an intriguing conjecture.

Conjecture 7.1. Every positive Boolean function f has an ODNF ψ whose length
is polynomially related to the length of the complete (i.e., prime irredundant) DNFs
of f and $f^d$: More precisely, there exist positive constants α and β such that, if
p, q, and r respectively denote the number of terms of f, $f^d$, and a shortest ODNF
of f, then asymptotically
\[
\alpha\, (p + q) \le r \le (p + q)^{\beta} .
\]
Weaker forms of the lower bound conjecture have been informally stated by Ball
and Nemhauser [48] and Boros et al. [111] (see also Jukna et al. [541] for related
considerations and negative results in the context of decision trees and branching
programs). Note that if m denotes the number of terms of an arbitrary DNF of
f , then the bound p ≤ m holds in view of the unicity of the prime irredundant
representation of positive functions.
An interesting result concerning the length of ODNFs was established by Ball
and Nemhauser [48]. (The proof of this result involves arguments based on linear
programming duality. It is rather lengthy and we omit it here.)
Theorem 7.2. For all n ≥ 1, the shortest ODNF of $f(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n) =
\bigvee_{i=1}^{n} x_i y_i$ contains $2^n - 1$ terms.

Observe that the dual of the function mentioned in Theorem 7.2 has $2^n$ prime
implicants, in agreement with Conjecture 7.1.

7.2 Shellings and shellability


7.2.1 Definition
An extreme simplification of the procedure Orthogonalize is achieved when
each of the expressions $\bar{C}_1 \bar{C}_2 \ldots \bar{C}_{k-1} C_k$ (k = 1, 2, . . . , m) reduces to an elementary
conjunction. This observation motivated Ball and Provan [49] to introduce and to
investigate the properties of shellable disjunctive normal forms.

Definition 7.1. A shelling of the DNF $\bigvee_{k=1}^{m} C_k$ is a permutation $(C_{\pi(1)}, C_{\pi(2)}, \ldots,
C_{\pi(m)})$ of its terms such that, for each k = 1, 2, . . . , m, the expression
\[
\bar{C}_{\pi(1)} \bar{C}_{\pi(2)} \ldots \bar{C}_{\pi(k-1)} C_{\pi(k)}
\]
is equivalent to an elementary conjunction. A DNF is called shellable if it admits
a shelling.
Note that the definition in [49] is given for positive DNFs only, but it extends in
a straightforward way to arbitrary DNFs. It should also be stressed that, as usual,
we identify the constant 1 with the empty elementary conjunction, but the constant
0 is not an elementary conjunction. However, we could slightly generalize Definition 7.1
to include the case where $\bar{C}_{\pi(1)} \bar{C}_{\pi(2)} \ldots \bar{C}_{\pi(k-1)} C_{\pi(k)} = 0$, and all results
in forthcoming sections could be adapted accordingly without much difficulty.
It can be shown that several natural classes of DNFs are shellable, but we
delay our presentation of such generic examples until the end of the chapter
(Section 7.6), when we shall have more tools at hand with which to establish
shellability.
For now, we just provide a couple of small examples showing that shellable
DNFs exist, that some DNFs are not shellable, and that an arbitrary permutation
of the terms of a shellable DNF is not necessarily a shelling.
Example 7.2. Consider again the DNF $\phi = x_1 x_2 \vee x_2 \bar{x}_3 \vee x_3 x_4$, as in Example
7.1. The permutation $(x_1 x_2, x_2 \bar{x}_3, x_3 x_4)$ is not a shelling of its terms, since
\[
(\bar{x}_1 \vee \bar{x}_2)(\bar{x}_2 \vee x_3)\, x_3 x_4 = \bar{x}_1 x_3 x_4 \vee \bar{x}_2 x_3 x_4
\]
is not equivalent to an elementary conjunction. However, φ is shellable. Indeed,
when we consider its terms in the order $(x_2 \bar{x}_3, x_3 x_4, x_1 x_2)$, we successively obtain
\[
(\bar{x}_2 \vee x_3)\, x_3 x_4 = x_3 x_4 ,
\]
and
\[
(\bar{x}_2 \vee x_3)(\bar{x}_3 \vee \bar{x}_4)\, x_1 x_2 = x_1 x_2 x_3 \bar{x}_4 .
\]
Thus, in particular, φ is equivalent to the orthogonal DNF $x_2 \bar{x}_3 \vee x_3 x_4 \vee x_1 x_2 x_3 \bar{x}_4$.
Finally, the positive DNF $x_1 x_2 \vee x_3 x_4$ is not shellable, since neither $(\bar{x}_1 \vee
\bar{x}_2)\, x_3 x_4$ nor $(\bar{x}_3 \vee \bar{x}_4)\, x_1 x_2$ is equivalent to an elementary conjunction. □

As should be clear from the introductory discussion, and as illustrated by
Example 7.2, the following statement holds:
Theorem 7.3. If φ is a shellable DNF on m terms, then φ is equivalent to an
orthogonal DNF on m terms.
Proof. This follows from Definition 7.1 and Theorem 7.1. 

However, it is absolutely not obvious that the “short” ODNF whose existence is
guaranteed by Theorem 7.3 can always be computed efficiently (say, in polynomial
time) for every shellable DNF. This question actually raises multiple side issues:
How difficult is it to recognize whether a DNF is shellable? How difficult is it
to find a shelling of a shellable DNF? How difficult is it to recognize whether a
given permutation of the terms of a DNF is a shelling? Given a shelling of a DNF,
how difficult is it to compute an equivalent ODNF? and so on. We tackle most of
these questions in forthcoming sections. From here on, however, we restrict our
attention to positive DNFs, since all published results concerning shellability have
been obtained for such DNFs.

7.2.2 Orthogonalization of shellable DNFs


For positive DNFs, Ball and Provan [49] proposed an alternative approach to
the concept of shellability, based again on the consideration of the procedure
Orthogonalize. To motivate this approach, let us consider a positive DNF
$\phi = \bigvee_{k=1}^{m} C_k$, where

\[
C_k = \bigwedge_{i \in A_k} x_i , \quad k = 1, 2, \ldots, m, \tag{7.4}
\]

and $A_k \neq \emptyset$ for k = 1, 2, . . . , m. Since computing an ODNF of $\phi_k = \bar{C}_1 \bar{C}_2 \ldots \bar{C}_{k-1} C_k$
is usually a rather costly process, Ball and Provan suggest computing instead an
elementary conjunction $U_k$ such that $\phi_k \le U_k$. The disjunction of these elementary
conjunctions yields a DNF $\phi_U = \bigvee_{k=1}^{m} U_k$ such that $\phi \le \phi_U$. If the ultimate goal
is to compute the probability that φ = 1, then the conjunctions $U_k$ can be used to
produce an upper bound on the target value, since

\[
\mathrm{Prob}[\phi = 1] \le \mathrm{Prob}[\phi_U = 1] \le \sum_{k=1}^{m} \mathrm{Prob}[U_k = 1] .
\]

We now describe conditions that must be fulfilled by any upper-bounding
elementary conjunctions $U_k$. We start with an easy lemma, for further reference.
Lemma 7.1. For k = 1, 2, . . . , m, the expression $\phi_k = \bar{C}_1 \bar{C}_2 \ldots \bar{C}_{k-1} C_k$ is identically
zero if and only if there exists $\ell < k$ such that $A_\ell \subseteq A_k$.
Proof. The expression $\phi_k$ is identically zero if and only if
\[
C_k = 1 \;\Rightarrow\; \bar{C}_1 \bar{C}_2 \ldots \bar{C}_{k-1} = 0
\]
or, equivalently, if and only if
\[
C_k = 1 \;\Rightarrow\; C_1 \vee C_2 \vee \ldots \vee C_{k-1} = 1,
\]
which means that $C_k$ is an implicant of $C_1 \vee C_2 \vee \ldots \vee C_{k-1}$. This completes the
proof, since all conjunctions $C_1, C_2, \ldots, C_m$ are positive. □

The lemma shows, in particular, that if $A_\ell \subseteq A_k$, then $C_k$ must precede $C_\ell$ in
every shelling (since 0 is not an elementary conjunction).
We need yet another definition ([49, 105, 111]).
Definition 7.2. Let $A_1, A_2, \ldots, A_m$ be an ordered list of subsets of {1, 2, . . . , n}. For
k = 1, 2, . . . , m, the shadow of $A_k$ is the set
\[
S(A_k) = \{\, j \in \{1, 2, \ldots, n\} : \text{there exists } \ell < k \text{ such that } A_\ell \setminus A_k = \{j\} \,\}. \tag{7.5}
\]
Note that the shadow of Ak depends on the order in which the sets A1 , A2 , . . . , Am
are listed, so that a notation like S(A1 , A2 , . . . , Ak ) may be more appropriate than
S(Ak ). However, we adhere to the shorter notation for the sake of brevity.
Example 7.3. Consider the sets A1 = {1, 2}, A2 = {1, 3, 5}, A3 = {2, 3, 5}, A4 =
{3, 4, 5}, in this order. Their shadows are, respectively, S(A1 ) = ∅, S(A2 ) = {2},
S(A3 ) = {1}, and S(A4 ) = {1, 2}. 
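
Definition 7.2 translates almost literally into code. The following Python sketch is an illustration only: the function name `shadows` and the representation of the ordered list as a Python list of sets are assumptions. It recomputes the shadows of Example 7.3.

```python
def shadows(sets):
    """Return [S(A_1), ..., S(A_m)] for an ordered list of index sets."""
    result = []
    for k, a_k in enumerate(sets):
        shadow = set()
        for a_l in sets[:k]:          # only sets listed before A_k matter
            diff = a_l - a_k
            if len(diff) == 1:        # A_l \ A_k = {j} puts j in the shadow
                shadow |= diff
        result.append(shadow)
    return result

# Example 7.3
example_sets = [{1, 2}, {1, 3, 5}, {2, 3, 5}, {3, 4, 5}]
print(shadows(example_sets))
```

Because only the sets preceding $A_k$ are inspected, permuting the list generally changes the shadows, in line with the remark following the definition.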


Lemma 7.2. Let $C_\ell = \bigwedge_{i \in A_\ell} x_i$ for $\ell$ = 1, 2, . . . , m, let k ∈ {1, 2, . . . , m}, and assume
that $\phi_k = \bar{C}_1 \bar{C}_2 \ldots \bar{C}_{k-1} C_k$ is not identically zero. For an arbitrary elementary
conjunction

\[
U_k = \bigwedge_{i \in O_k} x_i \bigwedge_{j \in F_k} \bar{x}_j , \tag{7.6}
\]

the implication $\phi_k \le U_k$ holds if and only if

(a) $O_k \subseteq A_k$; and
(b) $F_k \subseteq S(A_k)$.

Proof. Sufficiency. Assume that conditions (a)–(b) hold and assume that $U_k(X^*) = 0$
for some $X^* \in B^n$. We want to show that $\phi_k(X^*) = 0$. If there is $i \in A_k$ such that
$x_i^* = 0$, then $C_k(X^*) = 0$; hence, $\phi_k(X^*) = 0$. On the other hand, if $x_i^* = 1$ for all
$i \in A_k$, then condition (a) implies that $x_i^* = 1$ for all $i \in O_k$. Since $U_k(X^*) = 0$, there
must be an index $j \in F_k$ such that $x_j^* = 1$ and, by condition (b), there exists $\ell < k$
such that $A_\ell \setminus A_k = \{j\}$. This implies that $x_i^* = 1$ for all $i \in A_\ell$, hence $C_\ell(X^*) = 1$
and $\phi_k(X^*) = 0$, as required.
Necessity. Conversely, if $\phi_k \le U_k$, let $X^* \in B^n$ denote the characteristic vector
of $A_k$. Then $\phi_k(X^*) = 1$ (because $\phi_k$ is not identically 0); hence, $U_k(X^*) = 1$,
which implies condition (a).
Suppose now that condition (b) does not hold, that is, suppose that there is an
index $j \in F_k$ such that $A_\ell \setminus A_k \neq \{j\}$ for all $\ell < k$. Note that $A_\ell \setminus A_k \neq \emptyset$ (by
Lemma 7.1). Hence, for all $\ell < k$, there exists $i_\ell \neq j$ such that $i_\ell \in A_\ell \setminus A_k$. Define
a point $Y^* \in B^n$ by setting $y_i^* = 0$ if $i \in \{i_1, i_2, \ldots, i_{k-1}\}$ and $y_i^* = 1$ otherwise. In
particular, $y_j^* = 1$, and therefore $U_k(Y^*) = 0$. On the other hand, $C_k(Y^*) = 1$ and
$C_\ell(Y^*) = 0$ for all $\ell < k$, so that $\phi_k(Y^*) = 1$. This contradicts the assumption that
$\phi_k \le U_k$, and the proof is complete. □

As an easy corollary, we obtain [49, 111]:

Lemma 7.3. If $\phi = \bigvee_{k=1}^{m} \bigwedge_{i \in A_k} x_i$ and
$\phi^{sh} = \bigvee_{k=1}^{m} \big( \bigwedge_{i \in A_k} x_i \bigwedge_{j \in S(A_k)} \bar{x}_j \big)$, then
φ and $\phi^{sh}$ are equivalent DNFs.

Proof. Comparing the DNFs termwise, it is obvious that $\phi^{sh} \le \phi$. The inequality
$\phi \le \phi^{sh}$ follows from Theorem 7.1 and Lemma 7.2. □

Example 7.4. Observe that the DNF $\phi^{sh}$ is not necessarily orthogonal. For
instance, when $\phi = x_1 x_2 \vee x_3 x_4$, we find $S(A_1) = S(A_2) = \emptyset$, and $\phi = \phi^{sh}$. □

We are now ready to establish several characterizations of shellable positive
DNFs due to Ball and Provan [49] (see also [111]).

Theorem 7.4. Let $\phi = \bigvee_{k=1}^{m} C_k$, where $C_k = \bigwedge_{i \in A_k} x_i$ for k = 1, 2, . . . , m. The
following statements are equivalent:

(a) $(C_1, C_2, \ldots, C_m)$ is a shelling of φ.
(b) For k = 1, 2, . . . , m,

\[
\bar{C}_1 \bar{C}_2 \ldots \bar{C}_{k-1} C_k = \bigwedge_{i \in A_k} x_i \bigwedge_{j \in S(A_k)} \bar{x}_j . \tag{7.7}
\]

(c) The DNF

\[
\phi^{sh} = \bigvee_{k=1}^{m} \Big( \bigwedge_{i \in A_k} x_i \bigwedge_{j \in S(A_k)} \bar{x}_j \Big) \tag{7.8}
\]

is orthogonal.
(d) $A_\ell \cap S(A_k) \neq \emptyset$ for all $1 \le \ell < k \le m$.
(e) For all $1 \le \ell < k \le m$, there exist $j \in A_\ell$ and $h < k$ such that $A_h \setminus A_k = \{j\}$.
Proof. (a) ⟺ (b). Statement (b) implies (a), by definition of shellings. Conversely,
assume that $(C_1, C_2, \ldots, C_m)$ is a shelling of φ. Then, $\bar{C}_1 \bar{C}_2 \ldots \bar{C}_{k-1} C_k$ must be an
elementary conjunction. But Lemma 7.2 implies that the right-hand side of (7.7)
is the smallest elementary conjunction implied by $\bar{C}_1 \bar{C}_2 \ldots \bar{C}_{k-1} C_k$, and hence,
equality must hold in (7.7).
(b) ⟺ (c). If (b) holds, then $\phi^{sh}$ is orthogonal, since the expressions
$\bar{C}_1 \bar{C}_2 \ldots \bar{C}_{k-1} C_k$ are pairwise orthogonal. Conversely, suppose that $\phi^{sh}$ is orthogonal.
By Lemma 7.2, we know that

\[
\bar{C}_1 \bar{C}_2 \ldots \bar{C}_{k-1} C_k \le \bigwedge_{i \in A_k} x_i \bigwedge_{j \in S(A_k)} \bar{x}_j \tag{7.9}
\]

for every k = 1, 2, . . . , m. If the $\ell$-th inequality is strict, then there exists $X^* \in B^n$
such that the left-hand side of (7.9) is 0 and the right-hand side of (7.9) is 1 at
the point $X^*$, for $k = \ell$. Moreover, since $\phi^{sh}$ is orthogonal, the right-hand side
(and therefore, the left-hand side) of (7.9) is 0 at the point $X^*$ for all $k \neq \ell$.
Thus, we conclude that $\phi(X^*) = \bigvee_{k=1}^{m} \bar{C}_1 \bar{C}_2 \ldots \bar{C}_{k-1} C_k = 0$, while $\phi^{sh}(X^*) = 1$,
contradicting Lemma 7.3.
(c) ⟺ (d). Condition (d) trivially implies (c). Conversely, suppose that $\phi^{sh}$
is orthogonal, and that condition (d) does not hold, that is, $A_\ell \cap S(A_k) = \emptyset$ for
some pair $(\ell, k)$ with $\ell < k$. Choose $\ell$ as small as possible with this property. Since
$\phi^{sh}$ is orthogonal, it must be the case that $S(A_\ell) \cap A_k \neq \emptyset$, say, $j \in S(A_\ell) \cap A_k$.
So, by definition of $S(A_\ell)$, there exists $h < \ell$ such that $A_h \setminus A_\ell = \{j\}$. Moreover,
$j \notin S(A_k)$, since $A_k$ and $S(A_k)$ are disjoint. Therefore, $A_h \cap S(A_k) \subseteq (A_\ell \cup \{j\}) \cap
S(A_k) = \emptyset$. Since $h < \ell$, this contradicts our choice of $\ell$.
(d) ⟺ (e). The equivalence of these conditions is obvious in view of the
definition of shadows. □

Example 7.5. Consider the DNF $\phi = x_1 x_2 \vee x_1 x_3 x_5 \vee x_2 x_3 x_5 \vee x_3 x_4 x_5$ and the
corresponding sets $A_1 = \{1, 2\}$, $A_2 = \{1, 3, 5\}$, $A_3 = \{2, 3, 5\}$, $A_4 = \{3, 4, 5\}$. We
computed the shadows of these sets in Example 7.3. The reader will check that
Equation (7.7) holds for k = 1, 2, 3, 4, so that φ is shellable and is represented by
the orthogonal DNF $\phi = x_1 x_2 \vee x_1 \bar{x}_2 x_3 x_5 \vee \bar{x}_1 x_2 x_3 x_5 \vee \bar{x}_1 \bar{x}_2 x_3 x_4 x_5$. □

As a corollary of Theorem 7.4, we can now answer some of the questions posed
at the end of Section 7.2.1 (compare with Theorem 7.3).
Theorem 7.5. If $\phi = \bigvee_{k=1}^{m} C_k$ is a positive DNF on n variables, there is an
$O(nm^2)$-time algorithm to test whether $(C_1, C_2, \ldots, C_m)$ is a shelling of φ and,
when this is the case, to compute an orthogonal DNF of φ.
Proof. Given a permutation of the terms, it suffices to compute the expression (7.8)
and to test whether it is orthogonal. 
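
The test of Theorem 7.5 can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the authors' algorithm: it checks condition (d) of Theorem 7.4, which is equivalent to the orthogonality of (7.8), and it represents each positive term by its index set $A_k$; all function names are hypothetical. With set operations of cost O(n), both the shadow computation and the pairwise test take O(nm²) time.

```python
def shadow(sets, k):
    """S(A_k): indices j such that A_l \\ A_k = {j} for some l < k."""
    a_k = sets[k]
    return {next(iter(a_l - a_k)) for a_l in sets[:k] if len(a_l - a_k) == 1}

def is_shelling(sets):
    """Condition (d) of Theorem 7.4: A_l meets S(A_k) for all l < k."""
    all_shadows = [shadow(sets, k) for k in range(len(sets))]
    return all(sets[l] & all_shadows[k]
               for k in range(len(sets)) for l in range(k))

def shelled_form(sets):
    """Terms of (7.8) as pairs (A_k, S(A_k)); orthogonal when is_shelling(sets) holds."""
    return [(set(a), shadow(sets, k)) for k, a in enumerate(sets)]

# Example 7.5 (a shelling) and the DNF x1 x2 v x3 x4 of Example 7.2 (not shellable)
example_75 = [{1, 2}, {1, 3, 5}, {2, 3, 5}, {3, 4, 5}]
print(is_shelling(example_75), shelled_form(example_75))
print(is_shelling([{1, 2}, {3, 4}]))
```

Note that the function only tests the given order of the terms; it says nothing about whether some other permutation is a shelling.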

In contrast with Theorem 7.5, the complexity of recognizing shellable DNFs
is an important and intriguing open problem, already mentioned, for instance, in
[49, 254].

7.2.3 Shellable DNFs versus shellable functions


So far, we have defined and investigated shellable DNFs, rather than the functions
they represent. We now consider the following definitions.
Definition 7.3. A positive Boolean function is shellable if its complete DNF is
shellable. It is weakly shellable if it can be represented by a shellable DNF.
We have already seen (in Example 7.2) that certain positive functions are not
shellable: A minimal example is provided by the function

f (x1 , . . . , x4 ) = x1 x2 ∨ x3 x4 .

On the other hand, the concept of weak shellability is rather vacuous, since it can be
shown that every positive Boolean function is weakly shellable (Boros et al. [111]).
Theorem 7.6. Every positive Boolean function can be represented by a shellable
DNF.

Proof. Let f be a positive function, let $\{ C_I = \bigwedge_{j \in I} x_j \mid I \in \mathcal{I} \}$ denote the set of
all implicants (not necessarily prime) of f, and let π be a permutation that orders
the implicants by nonincreasing degree. Then, the DNF

\[
\phi = \bigvee_{I \in \mathcal{I}} \bigwedge_{j \in I} x_j
\]

represents f, and condition (d) in Theorem 7.4 can be used to verify that π
is a shelling of φ. Indeed, if $I_\ell, I_k \in \mathcal{I}$ and $C_{I_\ell}$ precedes $C_{I_k}$ in π, then there
is an index $j \in I_\ell \setminus I_k$. The set $I = I_k \cup \{j\}$ is in $\mathcal{I}$ and $C_I$ precedes $C_{I_k}$ in
π. Therefore, $j \in S(I_k)$, and we conclude that $j \in I_\ell \cap S(I_k)$, as required by
condition (d). □

Since the size of the DNF produced in the proof of Theorem 7.6 can generally
be very large relative to the number of prime implicants of f , let us provide another
construction that uses a smaller subset of the implicants.
We first recall a well-known definition.

Definition 7.4. If I , J are two subsets of N = {1, 2, . . . , n}, we say that I precedes
J in the lexicographic order, and we write I <L J if

min{j ∈ N | j ∈ I \ J } < min{j ∈ N | j ∈ J \ I }.

Now, for $I \in \mathcal{I}$, let h(I) denote the largest element of the subset I ⊆ {1, 2, . . . , n},
and let $H(I) = I \setminus \{h(I)\}$. We call leftmost implicant of f any implicant $C_I$ of
f for which $C_{H(I)}$ is not an implicant of f, and we denote by $\mathcal{L}$ the family of
leftmost implicants of f. Clearly, all prime implicants of f are in $\mathcal{L}$; therefore, f
is represented by the DNF $\psi_{\mathcal{L}} = \bigvee_{I \in \mathcal{L}} \bigwedge_{j \in I} x_j$. Boros et al. [111] showed that
the lexicographic order $<_L$ defines a shelling of $\psi_{\mathcal{L}}$. We leave the proof of this
claim as an end-of-chapter exercise and simply illustrate it on an example.

Example 7.6. We know that the function $f(x_1, \ldots, x_4) = x_1 x_2 \vee x_3 x_4$ is not
shellable. Its leftmost implicants are $x_1 x_2$, $x_1 x_3 x_4$, $x_2 x_3 x_4$ and $x_3 x_4$, listed here
in lexicographic order. The corresponding DNF

\[
\psi_{\mathcal{L}} = x_1 x_2 \vee x_1 x_3 x_4 \vee x_2 x_3 x_4 \vee x_3 x_4
\]

represents f and is shellable, since the DNF

\[
\psi_{\mathcal{L}}^{sh} = x_1 x_2 \vee x_1 \bar{x}_2 x_3 x_4 \vee \bar{x}_1 x_2 x_3 x_4 \vee \bar{x}_1 \bar{x}_2 x_3 x_4
\]

is orthogonal. □
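
For small instances, the family of leftmost implicants can be enumerated by brute force. The Python sketch below is only an illustration and runs in time exponential in n: the function names are hypothetical, the function f is assumed to be given by the index sets of its prime implicants (all nonempty), and the sort key assumes the convention that the minimum of an empty set is +∞ in Definition 7.4.

```python
from itertools import combinations

INF = float("inf")

def lex_key(index_set):
    """Sort key realizing <_L: sorted elements followed by a +infinity sentinel."""
    return tuple(sorted(index_set)) + (INF,)

def is_implicant(index_set, primes):
    """A positive term is an implicant of f iff it contains a prime implicant."""
    return any(p <= index_set for p in primes)

def leftmost_implicants(primes, n):
    """Leftmost implicants of f on x_1..x_n, listed in lexicographic order."""
    result = []
    for size in range(1, n + 1):
        for combo in combinations(range(1, n + 1), size):
            i = frozenset(combo)
            h = i - {max(i)}              # H(I) = I \ {h(I)}
            if is_implicant(i, primes) and not is_implicant(h, primes):
                result.append(i)
    return sorted(result, key=lex_key)

# Example 7.6: f = x1 x2 v x3 x4
primes_76 = [frozenset({1, 2}), frozenset({3, 4})]
print(leftmost_implicants(primes_76, 4))
```

With the sentinel, a set precedes each of its proper subsets that forms a prefix of it, which is why $x_2 x_3 x_4$ is listed before $x_3 x_4$ in Example 7.6.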

Let us finally observe that there exist families of positive functions for which
the smallest shellable DNF representation involves a number of terms that grows
exponentially with the number of its prime implicants:

Theorem 7.7. For all n ≥ 1, every shellable DNF of $f(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n)
= \bigvee_{i=1}^{n} x_i y_i$ contains at least $2^n - 1$ terms.

Proof. This is an immediate corollary of Theorems 7.2 and 7.3. □

7.3 Dualization of shellable DNFs


The formal similarity between certain dualization and orthogonalization proce-
dures was noted in Section 7.1. Since shellable DNFs have short orthogonal forms,
it is quite natural to wonder whether they also have short dual expressions (remem-
ber Conjecture 7.1). In this section, we provide an affirmative answer to this
question and prove a result due to Boros et al. [111] stating that shellable positive
DNFs can be dualized in time polynomial in their input size. This result implies,
in particular, that for shellable positive functions, the number of prime implicants
of the dual is polynomially bounded in the number of prime implicants of the
function.
Theorem 7.8. If a Boolean function f in n variables can be represented by a
shellable positive DNF φ involving m terms, then $f^d$ has at most nm prime implicants.
If a shelling of φ is available, then the prime implicants of $f^d$ can be
generated in $O(nm^2)$ time.

Proof. We prove the first statement by induction on m.
If m = 1, then f is an elementary conjunction and its dual is an elementary
disjunction that has at most n prime implicants.
Let us now assume that the statement has been established for shellable DNFs
of at most m − 1 terms, let f be represented by the DNF $\phi = \bigvee_{k=1}^{m} C_k =
\bigvee_{k=1}^{m} \bigwedge_{i \in A_k} x_i$, where $(C_1, C_2, \ldots, C_m)$ is a shelling of φ, and let $g = \bigvee_{k=1}^{m-1} C_k$.
Observe that $(C_1, C_2, \ldots, C_{m-1})$ is a shelling of g. Therefore, by the induction
hypothesis, $g^d$ has at most n(m − 1) prime implicants. Let us denote the complete
DNF of $g^d$ by

\[
\psi = \bigvee_{k=1}^{p} P_k = \bigvee_{k=1}^{p} \bigwedge_{j \in J_k} x_j , \tag{7.10}
\]

where $P_1, P_2, \ldots, P_p$ are all prime implicants of $g^d$, and p ≤ n(m − 1). Then,

\[
f^d = g^d \wedge \Big( \bigvee_{i \in A_m} x_i \Big) = \Big( \bigvee_{k=1}^{p} \bigwedge_{j \in J_k} x_j \Big) \wedge \Big( \bigvee_{i \in A_m} x_i \Big). \tag{7.11}
\]

On the other hand, $\bar{g} = \bar{C}_1 \bar{C}_2 \ldots \bar{C}_{m-1} = \bigvee_{k=1}^{p} \bigwedge_{j \in J_k} \bar{x}_j$, so that

\[
\bar{C}_1 \bar{C}_2 \ldots \bar{C}_{m-1} C_m
= \Big( \bigvee_{k=1}^{p} \bigwedge_{j \in J_k} \bar{x}_j \Big) \wedge \Big( \bigwedge_{i \in A_m} x_i \Big)
= \bigvee_{k=1}^{p} \Big( \bigwedge_{j \in J_k} \bar{x}_j \bigwedge_{i \in A_m} x_i \Big). \tag{7.12}
\]

By definition of shellings, the DNF (7.12) is equivalent to a single conjunction.
Since no absorption can take place in (7.12), and no two terms of (7.12) form
a consensus, it must be the case that all its terms are identically zero, except
one. In other words, there is an index $\ell \in \{1, 2, \ldots, p\}$ such that $J_\ell \cap A_m = \emptyset$ and
$J_k \cap A_m \neq \emptyset$ for all $k \neq \ell$. (The same conclusion can be reached by noting that $J_\ell$
is exactly the shadow of $A_m$.)
Thus, from (7.11),

\[
f^d = \Big( \bigvee_{\substack{k=1 \\ k \neq \ell}}^{p} \bigwedge_{j \in J_k} x_j \Big) \vee \Big( \bigvee_{i \in A_m} \bigwedge_{j \in J_\ell \cup \{i\}} x_j \Big), \tag{7.13}
\]

and we conclude that $f^d$ has at most nm prime implicants.


Using relation (7.13), all prime implicants of $f^d$ can easily be generated in
O(nm) time once the prime implicants of $g^d$ are known. The overall $O(nm^2)$ time
bound follows. □
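
The inductive step of the proof can be sketched in Python as follows. This is an illustration under stated assumptions rather than a faithful implementation of the $O(nm^2)$ bound: the function name `dualize` is hypothetical, each term is given by its index set, and the sketch applies (7.11) term by term followed by an explicit absorption step as a safeguard, since the raw candidate list may contain non-minimal sets (for the shelling $(x_1 x_2, x_1 x_3)$, the candidate $x_1 x_2$ is absorbed by $x_1$). The quadratic absorption filter makes this sketch slower than the bound stated in Theorem 7.8.

```python
def dualize(sets):
    """Prime implicants of the dual of a positive DNF, added one term at a time.

    'sets' lists the index sets A_1, ..., A_m (each nonempty). The update is
    correct for any order of the terms; for a shelling, exactly one current
    dual term is disjoint from the new A_m, which keeps the list short.
    """
    dual = [frozenset({i}) for i in sets[0]]         # dual of a single conjunction
    for a_m in sets[1:]:
        kept = [j for j in dual if j & a_m]           # terms already meeting A_m
        disjoint = [j for j in dual if not (j & a_m)]
        candidates = set(kept) | {j | {i} for j in disjoint for i in a_m}
        # absorption: keep only the inclusion-minimal index sets
        dual = [c for c in candidates if not any(d < c for d in candidates)]
    return dual

# Example 7.5, in its shelling order
print(dualize([{1, 2}, {1, 3, 5}, {2, 3, 5}, {3, 4, 5}]))
```

On the DNF of Example 7.5, the result corresponds to $f^d = x_1 x_3 \vee x_1 x_5 \vee x_2 x_3 \vee x_2 x_5 \vee x_1 x_2 x_4$, well within the bound nm = 20.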

Note that the dualization procedure sketched in the proof of Theorem 7.8
is exactly the classical algorithm SD-Dualization presented in Chapter 4,
Section 4.3.2.
In Theorem 7.6, we established that every positive function can be represented
by a shellable DNF. This result, combined with Theorem 7.8, might raise the
impression that every positive function can be dualized in polynomial time. This
is, of course, a fallacy because, as shown in Theorem 7.7, the shortest shellable
representation of a positive Boolean function may be extremely large.
Finally, we mention that Theorem 7.8 generalizes a sequence of earlier results
on regular functions [74, 225, 735, 736] and on aligned functions [105], since these
are special classes of shellable positive DNFs (see Chapter 8 and the end-of-chapter
exercises). For aligned and regular functions, efficient dualization algorithms with
running time $O(n^2 m)$ have been proposed in [74, 105, 225, 736]. None of those
procedures, however, seems to be generalizable to shellable functions.

7.4 The lexico-exchange property


7.4.1 Definition
We now introduce a subclass of shellable DNFs, whose definition can best be
viewed as a specialization of condition (e) in Theorem 7.4. As in Definition 7.4,
<L denotes the lexicographic order on the subsets of N = {1, 2, . . . , n}.
Definition 7.5. A positive DNF

\[
\phi(x_1, x_2, \ldots, x_n) = \bigvee_{k=1}^{m} \bigwedge_{j \in A_k} x_j \tag{7.14}
\]

has the lexico-exchange (LE) property with respect to $(x_1, x_2, \ldots, x_n)$ if, for every
pair $\ell, k \in \{1, 2, \ldots, m\}$ such that $A_\ell <_L A_k$, there exists $h \in \{1, 2, \ldots, m\}$ such that
$A_h <_L A_k$ and $A_h \setminus A_k = \{j\}$, where $j = \min\{ i \mid i \in A_\ell \setminus A_k \}$.
We say that φ has the LE property with respect to a permutation
$(\sigma(x_1), \sigma(x_2), \ldots, \sigma(x_n))$ of its variables, or that σ is an LE order for φ, if the
DNF $\phi^{\sigma}$ defined by
\[
\phi^{\sigma}(\sigma(x_1), \sigma(x_2), \ldots, \sigma(x_n)) = \phi(x_1, x_2, \ldots, x_n)
\]
has the LE property with respect to $(\sigma(x_1), \sigma(x_2), \ldots, \sigma(x_n))$.
Finally, we simply say that φ has the LE property if φ has the LE property with
respect to some permutation of its variables.
Note that these definitions can be extended to positive functions by applying
them to the complete DNF of such functions (as in Section 7.2.3).

The LE property was introduced by Ball and Provan in [49] and further inves-
tigated in [111, 760]. Interest in this concept is motivated by the observation that
every DNF with the LE property is also shellable.
Theorem 7.9. If the DNF φ(x1 , x2 , . . . , xn ) given by equation (7.14) has the
LE property with respect to (x1 , x2 , . . . , xn ), then the lexicographic order on
{A1 , A2 , . . . , Am } induces a shelling of the terms of φ.
Proof. This follows by comparing Definition 7.5 and condition (e) in
Theorem 7.4. 

In fact, most classes of shellable DNFs investigated in the literature have the
LE property (see [49, 105] and the examples in Section 7.6).
It is interesting to observe that the converse of Theorem 7.9 does not hold:
Namely, the lexicographic order may induce a shelling of the terms of a DNF,
even when this DNF does not have the LE property with respect to (x1 , x2 , . . . , xn ).
This is because Definition 7.5 not only determines the order of the terms of φ but
also imposes the choice of the element j in $A_\ell \setminus A_k$.
Example 7.7. The DNF $\phi = x_1 x_2 \vee x_2 x_3 \vee x_3 x_4$ is shellable with respect to
the lexicographic order of its terms. However, φ does not have the LE property
with respect to $(x_1, x_2, x_3, x_4)$: With $A_\ell = \{1, 2\}$ and $A_k = \{3, 4\}$, we obtain
$j = \min\{ i \mid i \in A_\ell \setminus A_k \} = 1$, and there is no h such that $A_h \setminus A_k = \{1\}$. (But the reader
may check that φ has the LE property with respect to the permutation $(x_2, x_3, x_1, x_4)$
of its variables.) □
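
Both claims of Example 7.7 can be checked mechanically with a direct transcription of Definition 7.5 into Python. The sketch is an illustration only: the function names are hypothetical, the minimum of an empty set is taken to be +∞ in the lexicographic comparison, and a permutation of the variables is interpreted as listing the variable indices from first to last, so that `order=(2, 3, 1, 4)` treats $x_2$ as the first variable. This is the naive test, not the faster procedure of Provan and Ball.

```python
INF = float("inf")

def lex_precedes(i, j):
    """I <_L J per Definition 7.4, with min of the empty set taken as +infinity."""
    return min(i - j, default=INF) < min(j - i, default=INF)

def has_le_property(sets, order=None):
    """Test Definition 7.5 for index sets A_1..A_m and an optional variable order."""
    if order is not None:
        # relabel each variable by its position in the given order
        rank = {var: pos for pos, var in enumerate(order, start=1)}
        sets = [{rank[i] for i in a} for a in sets]
    for a_l in sets:
        for a_k in sets:
            if not lex_precedes(a_l, a_k):
                continue
            j = min(a_l - a_k)            # nonempty whenever A_l <_L A_k
            if not any(lex_precedes(a_h, a_k) and a_h - a_k == {j}
                       for a_h in sets):
                return False
    return True

# Example 7.7: x1 x2 v x2 x3 v x3 x4
example_77 = [{1, 2}, {2, 3}, {3, 4}]
print(has_le_property(example_77), has_le_property(example_77, order=(2, 3, 1, 4)))
```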

7.4.2 LE property and leaders


In the remainder of this section, when φ(x1 , x2 , . . . , xn ) is a positive DNF, we denote
by φ1 (respectively, φ0 ) the disjunction of the terms of φ involving x1 (respectively,
not involving x1 ), so that
φ(x1 , x2 , . . . , xn ) = x1 φ1 ∨ φ0 . (7.15)
Definition 7.6. We say that x1 is a leader for a positive DNF φ(x1 , x2 , . . . , xn ) if
φ1 ≥ φ0 . Equivalently, x1 is a leader if every term of φ0 is absorbed by a term of
φ1 , or if every term of φ0 is an implicant of φ1 .
The next theorem clarifies the relationship between the LE property and the
existence of leaders.
Theorem 7.10. A positive DNF φ(x1 , x2 , . . . , xn ) = x1 φ1 ∨ φ0 has the LE property
with respect to (x1 , x2 , . . . , xn ) if and only if
(a) both φ1 and φ0 have the LE property with respect to (x2 , x3 , . . . , xn ); and
(b) either x1 is a leader for φ or φ does not involve x1 .
Proof. Let $\phi = \bigvee_{k=1}^{m} C_k = \bigvee_{k=1}^{m} \bigwedge_{j \in A_k} x_j$.
Necessity. Property (a) is an immediate consequence of Definition 7.5. To establish
property (b), we must show that, if φ involves $x_1$ and $C_k$ is any term of $\phi_0$,
then $C_k$ is absorbed by some term of $\phi_1$. Let $C_\ell$ be any term of $\phi_1$, and observe that
$A_\ell <_L A_k$ and $\min\{ i \mid i \in A_\ell \setminus A_k \} = 1$. Since φ has the LE property, there exists
$h \in \{1, 2, \ldots, m\}$ such that $A_h <_L A_k$ and $A_h \setminus A_k = \{1\}$. Then, $\bigwedge_{j \in A_h \setminus \{1\}} x_j$ is a
term of $\phi_1$ which absorbs $C_k$, as required.
Sufficiency. Suppose that (a) and (b) hold. If φ does not involve $x_1$, then (a)
implies that $\phi = \phi_0$ has the LE property. So, assume that $x_1$ is a leader for φ, let
$A_\ell <_L A_k$, and let $j = \min\{ i \mid i \in A_\ell \setminus A_k \}$. If $C_\ell$ and $C_k$ are both in $\phi_1$ or both in
$\phi_0$, then condition (a) implies that φ has the LE property. Otherwise, it must be
the case that $C_\ell$ is a term of $\phi_1$, $C_k$ is a term of $\phi_0$, and j = 1. By definition of
leaders, there is a term in $\phi_1$, say, $\bigwedge_{j \in A_h \setminus \{1\}} x_j$, which absorbs $C_k$. Then, however,
$A_h <_L A_k$ and $A_h \setminus A_k = \{1\}$, showing that φ has the LE property. □

7.4.3 Recognizing the LE property


In view of Definition 7.5, verifying whether a positive DNF φ(x1 , x2 , . . . , xn ) has
the LE property with respect to the identity permutation (x1 , x2 , . . . , xn ) can easily
be done in polynomial time, say, in $O(nm^3)$ time, where m is the number of
terms of φ. Provan and Ball [760] presented another procedure with $O(n^2 m)$
time complexity for this problem. Since m is typically much larger than n, we
expect their procedure to be more efficient than the trivial one. We now describe
this procedure, which also turns out to be useful for recognizing regular Boolean
functions (in Chapter 8).
The procedure can be seen as relying on Theorem 7.10, which characterizes
the LE property in terms of leaders (although the description in [760] does not
explicitly use this characterization). Let us, therefore, momentarily concentrate on
the algorithmic complexity of the following type of queries: For a positive DNF
$\phi(x_1, x_2, \ldots, x_n)$, and for a subset A ⊆ {1, 2, . . . , n}, is $\bigwedge_{j \in A} x_j$ absorbed by a term
of φ? (Remember the definition of leaders.)
Such a query can easily be answered in O(nm) time for a DNF on m terms. But
in fact, this time complexity is far from optimal when φ is a fixed DNF possessing
the LE property: Then, for each input subset A, it becomes possible to answer
the query in time O(n). This complexity can be achieved by using an appropriate
data structure to represent φ. The fixed overhead incurred in setting up the data
structure amounts to O(nm) operations but can be amortized if the number of
queries to be answered for φ is large enough.
The data structure to be used is a rooted, labeled binary tree T(φ). The tree
T(φ) is defined for an arbitrary positive DNF $\phi(x_1, x_2, \ldots, x_n)$. (As we will see
in Section 7.4.4, T(φ) is essentially equivalent to a decision tree for the function
represented by φ when φ has the LE property with respect to $(x_1, x_2, \ldots, x_n)$.)
For n ≥ 1, the tree T(φ) is recursively defined as follows (we denote its root
by r(φ)):

(a) If φ is identically 0, then T (φ) is empty, that is, T (φ) has no vertices.
(b) If φ is identically 1, then T (φ) has exactly one unlabeled vertex, namely,
its root r(φ).
(c) If φ(x1 , x2 , . . . , xn ) is not identically 1, then let φ = x1 φ1 ∨ φ0 (where φ0
and φ1 do not involve x1 , as usual); build T (φ) by introducing a root r(φ)
labeled by x1 , creating disjoint copies of T (φ0 ) and T (φ1 ), and making
r(φ1 ) (respectively, r(φ0 )) the left son (respectively, the right son) of r(φ).
(If either r(φ1 ) or r(φ0 ) is not defined, i.e., if either φ1 or φ0 is identically
zero, then the corresponding son of r(φ) does not exist.)

Example 7.8. Consider the DNF φ(x1 , x2 , x3 , x4 , x5 ) = x1 x2 ∨ x1 x3 ∨ x1 x4 x5 ∨


x2 x3 x4 . The corresponding tree T (φ) is represented in Figure 7.2. The leaves are
indexed by terms as explained in Theorem 7.11 hereunder. 

It is obvious that T (φ) has height at most n. Except for the leaves, all
vertices of T (φ) are labeled by a variable. Moreover, the leaves themselves
correspond in a natural way to the terms of φ. Indeed, for an arbitrary leaf v, let
r(φ) = u1 , u2 , . . . , uq = v be the vertices of T (φ) lying on the unique path from the
root r(φ) to v. Define

$$P(v) = \bigcup_{k=1}^{q-1} \{\, j \mid u_k \text{ is labeled by } x_j \text{ and } u_{k+1} \text{ is the left son of } u_k \,\}. \tag{7.16}$$

Figure 7.2. The binary tree for Example 7.8.

Theorem 7.11. For every positive DNF φ, the mapping $v \mapsto \bigwedge_{j \in P(v)} x_j$ defines
a one-to-one correspondence between the leaves of T (φ) and the terms of φ.

Proof. The proof is left to the reader. 

Thus, T (φ) has exactly m leaves and at most nm vertices. It is actually easy
to see that T (φ) sorts the terms of φ in lexicographic order, from left to right.
Moreover, T (φ) can be set up in time O(nm).
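The recursive definition of T (φ) and the correspondence of Theorem 7.11 can be sketched in a few lines of Python. This is only an illustration, not the O(nm) construction referred to in the text: the names `Node`, `build_tree`, `leaf_terms`, and `count_vertices` are hypothetical, and a positive DNF is assumed to be given as a list of index sets (one set per term).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: Optional[int] = None      # index j of the labeling variable x_j; None for a leaf
    left: Optional["Node"] = None    # root of T(phi_1)
    right: Optional["Node"] = None   # root of T(phi_0)


def build_tree(terms, var=1):
    """Return the root of T(phi), or None when phi is identically 0 (case (a))."""
    terms = [frozenset(t) for t in terms]
    if not terms:                          # case (a): no term at all
        return None
    if any(len(t) == 0 for t in terms):    # case (b): an empty term means phi is identically 1
        return Node()
    phi1 = [t - {var} for t in terms if var in t]   # terms of phi_1
    phi0 = [t for t in terms if var not in t]       # terms of phi_0
    return Node(var, build_tree(phi1, var + 1), build_tree(phi0, var + 1))


def leaf_terms(node, path=frozenset()):
    """Collect P(v) for every leaf v, visiting the leaves from left to right."""
    if node is None:
        return []
    if node.label is None:
        return [path]
    return (leaf_terms(node.left, path | {node.label})
            + leaf_terms(node.right, path))


def count_vertices(node):
    """Number of vertices of the tree rooted at node."""
    if node is None:
        return 0
    return 1 + count_vertices(node.left) + count_vertices(node.right)


# The DNF of Example 7.8: x1x2 v x1x3 v x1x4x5 v x2x3x4.
phi = [{1, 2}, {1, 3}, {1, 4, 5}, {2, 3, 4}]
root = build_tree(phi)
print(leaf_terms(root))
print(count_vertices(root))
```

On the DNF of Example 7.8, the leaves come out in the lexicographic order of the terms, as the remark after Theorem 7.11 predicts.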

With the data structure T (φ) at hand, let us now revert to the query: “Is $\bigwedge_{j \in A} x_j$
absorbed by a term of φ?” Our next goal is to show that, when φ has the LE property
with respect to (x1 , x2 , . . . , xn ), the query is correctly answered by the procedure
Implicant(A) in Figure 7.3, consisting of one traversal of T (φ) along a path from
root to leaf.

Example 7.9. The reader may want to apply the procedure Implicant(A) to the
tree T (φ) displayed in Figure 7.2, with A = {2, 4, 5}, and check that it returns the
answer False. 

Procedure Implicant(A)
Input: A subset A of {1, 2, . . . , n}.
Output: True if $\bigwedge_{j \in A} x_j$ is absorbed by a term of the DNF φ represented by T (φ),
False otherwise.

begin
  if T (φ) is empty (that is, φ = 0) then return False;
  u1 := r(φ);
  for k = 1 to n + 1 do
  begin
    if uk is a leaf of T (φ) then return True
    else if k ∈ A then
    begin
      if uk has a left son then uk+1 := leftson(uk ) else uk+1 := rightson(uk )
    end
    else if k ∉ A then
    begin
      if uk has a right son then uk+1 := rightson(uk ) else return False
    end
  end
end

Figure 7.3. Procedure Implicant.


The procedure Implicant can be implemented to run in time O(n). It is certainly
worth stressing that it does not necessarily return the correct answer when φ does
not have the LE property with respect to (x1 , x2 , . . . , xn ), as the next example
illustrates.

Example 7.10. Consider the DNF φ(x1 , x2 , x3 , x4 , x5 ) = x1 x2 x3 ∨ x1 x4 ∨ x1 x5 ∨


x2 x4 x5 . We leave it to the reader to verify that Implicant({1, 2, 5}) returns the
answer False, despite the fact that x1 x2 x5 is an implicant of φ. 
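Examples 7.9 and 7.10 can be replayed with a small sketch of the traversal in Figure 7.3. The encoding is an assumption made for brevity: the tree is stored as nested tuples `(variable, left, right)`, the string `LEAF` marks a leaf, and `None` stands for a missing son; the names `build_tree` and `implicant` are hypothetical. The loop simply runs until a leaf is reached or a required right son is missing.

```python
LEAF = "leaf"  # marker for the unlabeled leaf vertices


def build_tree(terms, var=1):
    """T(phi) as nested tuples (var, left, right); None encodes the empty tree."""
    terms = [frozenset(t) for t in terms]
    if not terms:
        return None
    if any(not t for t in terms):
        return LEAF
    return (var,
            build_tree([t - {var} for t in terms if var in t], var + 1),
            build_tree([t for t in terms if var not in t], var + 1))


def implicant(tree, A):
    """One root-to-leaf traversal, following the rules of Figure 7.3."""
    if tree is None:                 # phi is identically 0
        return False
    u = tree
    while u != LEAF:
        k, left, right = u
        if k in A:
            # Prefer the left son; a non-leaf vertex always has at least one son.
            u = left if left is not None else right
        elif right is not None:
            u = right
        else:
            return False
    return True


tree_78 = build_tree([{1, 2}, {1, 3}, {1, 4, 5}, {2, 3, 4}])     # Example 7.8
tree_710 = build_tree([{1, 2, 3}, {1, 4}, {1, 5}, {2, 4, 5}])    # Example 7.10

print(implicant(tree_78, {2, 4, 5}))    # query of Example 7.9
print(implicant(tree_710, {1, 2, 5}))   # query of Example 7.10
```

Both calls print False. The first answer is correct, whereas the second one is wrong, since x1 x2 x5 is absorbed by the term x1 x5.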

However, even for an arbitrary DNF φ, Implicant works correctly “in half of
the cases”: Namely, it never errs on the answer True.

Theorem 7.12. Let φ(x1 , x2 , . . . , xn ) be a positive DNF, let T (φ) be the associated
binary tree, and let A ⊆ {1, 2, . . . , n}. If the procedure Implicant(A) returns the
answer True and terminates at the leaf v of T (φ), then $\bigwedge_{j \in A} x_j$ is absorbed by
the term of φ associated with v.

Proof. Suppose that the procedure eventually reaches the leaf v = uq and returns
the answer True. Let $\bigwedge_{j \in P(v)} x_j$ be the term of φ associated with v by (7.16).
From the description of Implicant, we see that, if k ∉ A, then uk+1 is the right son
of uk , k = 1, 2, . . . , q. Hence, by construction of P (v), k ∉ P (v). Thus, P (v) ⊆ A,
and $\bigwedge_{j \in A} x_j$ is absorbed by the term $\bigwedge_{j \in P(v)} x_j$ of φ.

More interestingly for our purpose, Provan and Ball [760] proved that Impli-
cant works correctly when φ has the LE property with respect to (x1 , x2 , . . . , xn ).
Note that the DNF φ considered in Example 7.10 does not have the LE property
with respect to (x1 , x2 , x3 , x4 , x5 ) (although it has it with respect to the permutation
(x1 , x4 , x5 , x2 , x3 ) of its variables).

Theorem 7.13. Let φ(x1 , x2 , . . . , xn ) be a positive DNF having the LE property
with respect to (x1 , x2 , . . . , xn ), let T (φ) be the associated binary tree, and let
A ⊆ {1, 2, . . . , n}. The procedure Implicant(A) returns the answer True if and
only if $\bigwedge_{j \in A} x_j$ is absorbed by a term of φ.

Proof. The “only if” statement follows from Theorem 7.12.
We prove the converse statement by induction on n. If n = 1, then the statement
is easily verified. Assume next that n ≥ 2 and that $\bigwedge_{j \in A} x_j$ is absorbed by a term
of φ. If φ is identically 1, then T (φ) has exactly one vertex, namely, r(φ), and we
are done. Otherwise, write φ = x1 φ1 ∨ φ0 . By Theorem 7.10, x1 is either a leader
for φ or does not appear in φ.
If 1 ∉ A or if x1 does not appear in φ, then $\bigwedge_{j \in A} x_j$ is absorbed by a term of
φ0 . Hence, φ0 is not identically 0, and r(φ) has a right son, which can be identified
with the root of T (φ0 ), say, r(φ0 ). In the execution of Implicant(A), u2 is set
equal to r(φ0 ) (note that if x1 does not appear in φ, then r(φ) has no left son).
The next steps of the procedure are identical to those performed by Implicant(A)
on the subtree T (φ0 ). Note that, by Theorem 7.10, φ0 has the LE property with
respect to (x2 , x3 , . . . , xn ). Hence, by induction, the procedure returns the output
True.
Assume now, on the other hand, that 1 ∈ A and that x1 is a leader. Then, $\bigwedge_{j \in A} x_j$
is absorbed by a term of φ1 or by a term of φ0 . In both cases, however, the definition
of leaders implies that $\bigwedge_{j \in A} x_j$ is absorbed by a term of φ1 . Hence, u2 is set equal
to the left son of r(φ), namely, r(φ1 ), and the proof is complete by induction as in the
previous case.

Procedure LE-Property(φ)
Input: A DNF $\varphi(x_1, x_2, \ldots, x_n) = \bigvee_{k=1}^{m} \bigwedge_{j \in A_k} x_j$, where A1 <L A2 <L . . . <L Am .
Output: True if φ has the LE property with respect to (x1 , x2 , . . . , xn ), False
otherwise.

begin
  set up the binary tree T (φ);
  for k = 2 to m do
  begin
    find the leaf of T (φ), say vk , associated with the term $\bigwedge_{j \in A_k} x_j$;
    for each vertex u on the path from r(φ) to vk
      if vk is a successor of the right son of u and if u has a left son then
      begin
        let xi be the label of u;
        if Implicant(Ak ∪ {i}) = False then return False;
      end
  end
  return True;
end

Figure 7.4. Procedure LE-Property.

We can now state (in Figure 7.4) the efficient procedure proposed by Provan
and Ball [760] to test whether a DNF φ(x1 , x2 , . . . , xn ) has the LE property with
respect to the identity permutation (x1 , x2 , . . . , xn ).

Theorem 7.14. The procedure LE-Property is correct and can be implemented
to run in $O(n^2 m)$ time.

Proof. Assume first that φ has the LE property with respect to (x1 , x2 , . . . , xn ).
Trivially, for all k ∈ {1, 2, . . . , m} and all i ∈ {1, 2, . . . , n}, $\bigwedge_{j \in A_k \cup \{i\}} x_j$ is absorbed by
the term $\bigwedge_{j \in A_k} x_j$. Hence, by Theorem 7.13, Implicant(Ak ∪ {i}) always returns
the answer True, and LE-Property eventually returns True.
Conversely, assume that LE-Property returns True, and consider two sets
Aℓ , Ak with Aℓ <L Ak and i = min{j | j ∈ Aℓ \ Ak }. Let vk and vℓ be the leaves
of T (φ) associated with Ak and Aℓ , respectively. On the path from r(φ) to vk ,
consider the last vertex u that is an ancestor of vℓ . Then, u is labeled by xi , and
Implicant(Ak ∪ {i}) is called in the innermost for loop of the procedure. When
running on the input Ak ∪ {i}, Implicant traverses T (φ) until vertex u, then visits
the left son of u, and eventually returns the value True (by assumption). By
Theorem 7.12, this means that $\bigwedge_{j \in A_k \cup \{i\}} x_j$ is absorbed by the term $C_h = \bigwedge_{j \in A_h} x_j$
associated with the leaf reached by Implicant. It follows that Ah <L Ak and
Ah \ Ak = {i}; hence, φ has the LE property.
We have mentioned that T (φ) can be set up in time O(nm). LE-Property
makes at most nm calls on Implicant, and each of these calls can be executed in
time O(n). Hence, the overall running time of LE-Property is $O(n^2 m)$.
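For small instances, the LE property can also be checked directly from the exchange condition used in the proof above: for every pair Aℓ <L Ak, with i = min(Aℓ \ Ak), some Ah <L Ak must satisfy Ah \ Ak = {i}. The sketch below is a brute-force check, not the O(n²m) procedure of Figure 7.4. It assumes that S <L T means that the smallest index belonging to exactly one of S and T lies in S (the order is defined earlier in the chapter); the function names are hypothetical.

```python
def lex_before(S, T):
    """Assumed order: S <_L T when the smallest index in exactly one of S, T lies in S."""
    diff = set(S) ^ set(T)
    return bool(diff) and min(diff) in S


def has_le_property(terms):
    """Brute-force test of the lexico-exchange condition for the identity order."""
    terms = [frozenset(t) for t in terms]
    for Ak in terms:
        earlier = [A for A in terms if lex_before(A, Ak)]
        for Al in earlier:
            i = min(Al - Ak)   # nonempty because Al <_L Ak
            if not any(Ah - Ak == {i} for Ah in earlier):
                return False
    return True


phi_78 = [{1, 2}, {1, 3}, {1, 4, 5}, {2, 3, 4}]       # Example 7.8
phi_710 = [{1, 2, 3}, {1, 4}, {1, 5}, {2, 4, 5}]      # Example 7.10

# Rename the variables of Example 7.10 along the permutation (x1, x4, x5, x2, x3).
rename = {1: 1, 4: 2, 5: 3, 2: 4, 3: 5}
phi_710_permuted = [{rename[j] for j in t} for t in phi_710]

print(has_le_property(phi_78))
print(has_le_property(phi_710))
print(has_le_property(phi_710_permuted))
```

Under these assumptions, the check agrees with the remark following Theorem 7.12: the DNF of Example 7.10 fails the test for the identity order and passes it after the permutation.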

In contrast with the previous results, Provan and Ball [760] pointed out that
the existence of an efficient procedure to determine whether a DNF has the LE
property with respect to some unknown permutation of its variables, is far from
obvious. Boros et al. [111] settled this question in the negative by proving the
following result:

Theorem 7.15. It is NP-complete to decide whether a positive DNF φ has the LE
property, even when φ has degree at most 5.

We omit the (rather technical) proof of this result. It should be noted, however,
that the complexity of this recognition problem remains open for DNFs of degree
3 or 4. The case of quadratic DNFs is the topic of Section 7.5.

7.4.4 Dualization of functions having the LE property


In Section 1.12.3 of Chapter 1, we saw that, given a Boolean function f and an
arbitrary order of the variables, say, (x1 , x2 , . . . , xn ), a decision tree D(f ) for f can
be recursively constructed as follows (see Figure 1.5):
(a) If f is constant, then D(f ) has a unique vertex (which is both its root and
its leaf) labeled with the constant value of f (either 0 or 1).
(b) Otherwise, let f0 = f|x1 =0 , f1 = f|x1 =1 , and build D(f ) by introducing a
root r(f ) labeled by x1 , creating disjoint copies of D(f0 ) and D(f1 ) and
making r(f1 ) (respectively, r(f0 )) the left son (respectively, the right son)
of r(f ).
When f is represented by a positive DNF φ(x1 , x2 , . . . , xn ) = x1 φ1 ∨ φ0 , and
when φ has the LE property with respect to (x1 , x2 , . . . , xn ), Theorem 7.10 implies
that f1 = φ1 and f0 = φ0 (unless φ1 = 0, in which case f1 = f0 = φ0 ). It is
then easy to see that the decision tree D(f ) produced by the above procedure is
essentially identical to the binary tree T (φ) defined in Section 7.4.3, up to some
minor differences. In particular, D(f ) has at most 2nm leaves and can be set up
in time O(nm).
Therefore, as a corollary of Theorem 1.35 and Theorem 1.36, we obtain
(E. Boros, personal communication):

Theorem 7.16. If a Boolean function f (x1 , x2 , . . . , xn ) is expressed by a positive
DNF φ such that φ has the LE property with respect to (x1 , x2 , . . . , xn ), then a
decision tree D(f ) representing f can be built in time O(nm). Moreover, an
ODNF of f , an ODNF of $f^d$, and the prime implicants of $f^d$ can be generated
from D(f ) in time $O(n^2 m)$.

Proof. We leave the details of the proof to the reader. 

Although this theorem follows in a rather straightforward way from well-known
properties of decision trees and from the results in Provan and Ball [760], it does
not seem to have been formulated explicitly in the literature; see Boros [105] for
related considerations.

7.5 Shellable quadratic DNFs and graphs


In this section, we concentrate on the case in which φ(x1 , x2 , . . . , xn ) is a pure
quadratic positive DNF, that is, a DNF of the form φ = $\bigvee_{\{i,j\} \in E} x_i x_j$, where E
is a set of pairs of elements of N = {1, 2, . . . , n}. We assume that all members
of E are distinct, so that φ can be viewed as the complete DNF of a quadratic
positive function f and G = (N , E) is a simple, undirected graph. For simplicity,
we transpose from DNFs to graphs the terminology introduced in this chapter.
Thus, a graph G = (N , E) is shellable if and only if the corresponding quadratic
positive DNF φ = $\bigvee_{\{i,j\} \in E} x_i x_j$ is shellable. Similarly, we speak of shelling of the
edges of G, of the LE property for G, and so on.
The purpose of this section is to present some results characterizing shellable
graphs from Benzaken et al. [66]. Let us first recall a few graph-theoretic definitions
(we follow the terminology in Appendix A and in Golumbic [398]). We denote by
Ck a chordless cycle on k vertices and k edges (k ≥ 3), and by 2K2 the graph on four
vertices consisting of two disjoint edges. So, 2K2 is the complement of C4 . A graph
is called triangulated (or chordal) if it contains no induced cycle of length 4 or
more. Triangulated graphs constitute one of the fundamental, and most extensively
studied, classes of perfect graphs. They have been characterized in numerous ways;
see, for example, Berge [71], Brandstädt, Le and Spinrad [152], Duchet [284],
Golumbic [398], and so on. We shall use the fact that a graph G = (N , E) is the
complement of a triangulated graph if and only if every induced subgraph of G
contains a cosimplicial vertex, that is, a vertex v such that {u ∈ N | {u, v} ∉ E} is a
stable set.
Benzaken et al. [66] observed that shellable graphs can be built up, one edge at
a time, without ever producing 2K2 .

Theorem 7.17. Let G = (N , E) and E = {e1 , e2 , . . . , em }. The permutation
(e1 , e2 , . . . , em ) is a shelling of G if and only if, for every k = 1, 2, . . . , m, the
graph Gk = (N , {e1 , e2 , . . . , ek }) has no induced subgraph isomorphic to 2K2 .

Proof. By Theorem 7.4(e), (e1 , e2 , . . . , em ) is a shelling of G if and only
if, for all eℓ , ek ∈ E with ℓ < k, there exists j ∈ eℓ and there exists h < k such
that eh \ ek = {j }. The latter condition means that the edge eh shares at least one
vertex (namely, vertex j ) with eℓ , and shares exactly one vertex with ek . This
is easily seen to be equivalent to the condition that eℓ ∪ ek does not induce 2K2
in Gk .
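The criterion of Theorem 7.17 is easy to test on small graphs. The sketch below checks every prefix graph Gk for an induced 2K2 by brute force; the function names are hypothetical, and edges are assumed to be given as pairs of vertices.

```python
from itertools import combinations


def has_induced_2k2(edges):
    """True if two disjoint edges exist with no edge joining their endpoints."""
    E = {frozenset(e) for e in edges}
    for e, f in combinations(E, 2):
        if e & f:
            continue   # the two edges share a vertex
        if not any(frozenset((u, v)) in E for u in e for v in f):
            return True
    return False


def is_shelling(edge_order):
    """Theorem 7.17: no prefix graph G_k may contain an induced 2K2."""
    return all(not has_induced_2k2(edge_order[:k])
               for k in range(1, len(edge_order) + 1))


# Two orderings of the edges of the path 1 - 2 - 3 - 4.
print(is_shelling([(1, 2), (2, 3), (3, 4)]))
print(is_shelling([(1, 2), (3, 4), (2, 3)]))
```

The first ordering is a shelling. The second one is not, because its first two edges already form a 2K2.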

We are now ready for our main characterization of shellable graphs [66].

Theorem 7.18. For a graph G, the following statements are equivalent:

(a) G has the LE property.


(b) G is shellable.
(c) The complement of G is triangulated.

Proof. (a) ⇒ (b). This implication holds by Theorem 7.9.


(b) ⇒ (c). Assume first that G = (N , E) is the complement of a chordless cycle
on n vertices, that is, G = $\overline{C_n}$. Then, we show by induction on n that G is not
shellable. Indeed, if n = 4, then G = $\overline{C_4}$ is isomorphic to 2K2 , hence G is not
shellable by Theorem 7.17. For n > 4, assume by contradiction that (e1 , e2 , . . . , em )
is a shelling of G, and let H = (N, E \ {em }). Then, (e1 , e2 , . . . , em−1 ) is a shelling
of H , and hence, H is shellable. Note that H contains $\overline{C_k}$, the complement of
a chordless cycle on k vertices, with 4 ≤ k < n (indeed, the complement of H
is a cycle on n vertices with exactly one chord). Now, by induction, $\overline{C_k}$ is not
shellable. On the other hand, in view of Theorem 7.17, all the induced subgraphs
of a shellable graph are shellable; hence, $\overline{C_k}$ (as a subgraph of H ) should be
shellable. The contradiction shows that the complement of a chordless cycle is
not shellable. Now, if G is not the complement of a triangulated graph, then G
contains the complement of a chordless cycle as an induced subgraph, implying
that G is not shellable.
(c) ⇒ (a). Let G be the complement of a triangulated graph, and let
(v1 , v2 , . . . , vn ) be a permutation of N such that, for j = 1, 2, . . . , n, vj is a
cosimplicial vertex in the subgraph Gj of G induced by {vj , vj +1 , . . . , vn }. We want to
prove that G has the LE property with respect to (v1 , v2 , . . . , vn ).
Consider two edges eℓ = {vj , vi } and ek = {vr , vs } with eℓ <L ek , j < i and
j ≤ r < s. We must show that there exists eh <L ek such that eh \ ek = {min{vt | vt ∈
eℓ \ ek }}. If vj = vr , or vi = vr , or vi = vs , then it is easy to check that eh = eℓ
satisfies the condition. Hence, we can assume that all four vertices vj , vi , vr , and
vs are distinct. Consider now the subgraph Gj : Since vj is cosimplicial in this
graph, and {vr , vs } is not stable, either vr or vs must be a neighbor of vj .
Suppose, for instance, that eh = {vj , vr } ∈ E (the other case is similar). Then eh is as
required.

As a consequence of Theorem 7.18, we note that quadratic shellable DNFs
can be recognized in $O(n^2)$ time, since the same result holds for triangulated
graphs.
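The characterization by cosimplicial vertices suggests a simple (if slow) recognition sketch: repeatedly delete a cosimplicial vertex of the remaining induced subgraph. If the deletion succeeds down to the empty graph, the resulting vertex order is of the kind used in the proof of (c) ⇒ (a); if it gets stuck, the complement of G is not triangulated. This greedy version is far from the O(n²) bound mentioned above, and the name `cosimplicial_order` is hypothetical.

```python
def cosimplicial_order(vertices, edges):
    """Return an order (v1, ..., vn) with vj cosimplicial in G[vj, ..., vn], or None."""
    E = {frozenset(e) for e in edges}
    remaining = sorted(vertices)
    order = []
    while remaining:
        for v in remaining:
            non_nbrs = [u for u in remaining
                        if u != v and frozenset((u, v)) not in E]
            # v is cosimplicial when its non-neighbors form a stable set.
            if all(frozenset((a, b)) not in E
                   for a in non_nbrs for b in non_nbrs if a < b):
                break
        else:
            return None   # no cosimplicial vertex: G is not shellable
        order.append(v)
        remaining.remove(v)
    return order


path_p4 = [(1, 2), (2, 3), (3, 4)]                   # complement is again a path
two_k2 = [(1, 2), (3, 4)]                            # complement is C4
cycle_c5 = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]  # complement is C5

print(cosimplicial_order([1, 2, 3, 4], path_p4))
print(cosimplicial_order([1, 2, 3, 4], two_k2))
print(cosimplicial_order([1, 2, 3, 4, 5], cycle_c5))
```

Only the path yields an order; the other two graphs have complements containing a chordless cycle of length at least 4, so the function returns None.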

7.6 Applications
We conclude this chapter with a brief presentation of three generic classes of
shellable DNFs arising in reliability theory and in game theory. We refer to Ball
and Nemhauser [48], Ball and Provan [49, 760], or Colbourn [205, 206] for a more
detailed discussion.
Application 7.1. (Undirected all-terminal reliability.) Let G = (N , E) be a con-
nected undirected graph with E = {e1 , e2 , . . . , em }, and let T be the collection of all
spanning trees of G, viewed as subsets of E. Let us associate with every edge ei ,
i = 1, 2, . . . , m, a Boolean variable xi indicating whether the edge is operational

or failed. Then, the DNF $\varphi = \bigvee_{T \in \mathcal{T}} \bigwedge_{e_i \in T} x_i$ takes value 1 exactly when the graph
formed by the operational edges is connected. In the terminology of Section 1.13.4,
φ represents the structure function of the reliability system whose minimal pathsets
are the spanning trees of G.
We claim that φ satisfies the LE property with respect to (x1 , x2 , . . . , xm ) (i.e.,
with respect to an arbitrary permutation of its variables). Indeed, let Tℓ , Tk be two
spanning trees with Tℓ <L Tk , and let j = min{i | ei ∈ Tℓ \ Tk }. From elementary
properties of trees, there exists an edge ei ∈ Tk \ Tℓ such that (Tk ∪ {ej }) \ {ei } is a
spanning tree. Call this spanning tree Th . Then, Th \ Tk = {ej }, and Th <L Tk , as
required by the LE property.
This result implies, in particular, that the all-terminal reliability of a graph can
be computed in time polynomial in the number of spanning trees of the graph (see
Ball and Nemhauser [48] for details). 

Application 7.2. (Matroids.) This is a generalization of the previous example. A
collection M of subsets of N = {1, 2, . . . , n} is the set of bases of a matroid if it
satisfies the following condition: For all Bℓ , Bk ∈ M and for all j ∈ Bℓ \ Bk , there
exists i ∈ Bk \ Bℓ such that (Bk ∪ {j }) \ {i} is in M (see, e.g., Welsh [905, 906]).
It is well-known that the spanning trees of a connected graph are the bases of a
matroid.
Now, when M is the set of bases of a matroid on N , let
$\varphi_M(x_1, x_2, \ldots, x_n) = \bigvee_{B \in M} \bigwedge_{i \in B} x_i$. From the foregoing definition, it is easy to check that φM satisfies
the LE property with respect to every ordering of (x1 , x2 , . . . , xn ).

Application 7.3. (Threshold functions and weighted majority games.) Suppose
that f (x1 , x2 , . . . , xn ) is a threshold function representing a weighted majority game
on N = {1, 2, . . . , n}, as defined in Chapter 1, Section 1.13.3. Thus, each player i
carries a positive weight wi ∈ R, and the point X∗ ∈ Bn is a true point of f (i.e.,
X∗ is the characteristic vector of a winning coalition of players) if and only if
$\sum_{i=1}^{n} w_i x_i^* > t$, where t ∈ R is a predetermined quota. If the weights are sorted so
that w1 ≥ w2 ≥ . . . ≥ wn , then the function f (or, equivalently, the complete DNF
of f ) has the LE property with respect to (x1 , x2 , . . . , xn ).
Indeed, observe that the prime implicants of f correspond to the minimal winning
coalitions of the game, and let Aℓ and Ak be two minimal winning coalitions
such that Aℓ <L Ak . If j = min{i | i ∈ Aℓ \ Ak } and i is any index in Ak \ Aℓ ,
then (Ak ∪ {j }) \ {i} is a winning coalition (since j < i implies wj ≥ wi ).
Therefore, there exists Ah ⊆ (Ak ∪ {j }) \ {i} such that Ah is a minimal winning coalition,
Ah \ Ak = {j }, and Ah <L Ak . This shows that f has the LE property.
As a consequence, the number of true points (namely, winning coalitions) of f
can be efficiently computed when the list of all its minimal true points (namely,
minimal winning coalitions) is available. In view of the relation between Chow
parameters and Banzhaf indices (as discussed in Chapter 1, Section 1.13.3 and
Section 1.13.4), this also implies that the Banzhaf indices of a weighted major-
ity game can be computed in time polynomial in the number of its minimal true
points. We return to these topics in subsequent chapters (Chapter 8 and Chapter 9).
In particular, Chapter 8 is devoted to the investigation of an important class of
shellable functions generalizing threshold functions, namely, the class of regular
functions. 
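The role of the weight ordering in Application 7.3 can be illustrated on a toy game. The sketch below lists the minimal winning coalitions of a weighted majority game by brute force and tests the exchange condition on them. The weights (4, 3, 2, 1) and the quota 5 are made-up illustration data, the function names are hypothetical, and the lexicographic order is assumed to be: S <L T when the smallest index in exactly one of S and T lies in S.

```python
from itertools import combinations


def lex_before(S, T):
    """Assumed order: S <_L T when the smallest index in exactly one of S, T lies in S."""
    diff = set(S) ^ set(T)
    return bool(diff) and min(diff) in S


def has_le_property(terms):
    """Brute-force test of the lexico-exchange condition for the identity order."""
    terms = [frozenset(t) for t in terms]
    for Ak in terms:
        earlier = [A for A in terms if lex_before(A, Ak)]
        for Al in earlier:
            i = min(Al - Ak)
            if not any(Ah - Ak == {i} for Ah in earlier):
                return False
    return True


def minimal_winning(weights, quota):
    """Coalitions of total weight > quota that lose as soon as any member leaves."""
    players = range(1, len(weights) + 1)

    def wins(S):
        return sum(weights[i - 1] for i in S) > quota

    result = []
    for size in range(len(weights) + 1):
        for S in map(frozenset, combinations(players, size)):
            if wins(S) and not any(wins(S - {i}) for i in S):
                result.append(S)
    return result


sorted_game = minimal_winning([4, 3, 2, 1], 5)     # weights in nonincreasing order
reversed_game = minimal_winning([1, 2, 3, 4], 5)   # same game, players renumbered

print(sorted_game, has_le_property(sorted_game))
print(reversed_game, has_le_property(reversed_game))
```

With nonincreasing weights, the check succeeds, in line with the claim above; with the players renumbered in the opposite order, it fails for this particular game.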

7.7 Exercises
1. Let C1 , C2 , . . . , Ck and U be elementary conjunctions. Prove that it is co-NP-
complete to decide whether $\overline{C}_1 \overline{C}_2 \cdots \overline{C}_{k-1} C_k \le U$. (Compare with Lemma
7.2.) Hint: Let Ck = y and U = z, where y, z are variables not occurring in
C1 , C2 , . . . , Ck−1 .
2. Complete the argument following the proof of Theorem 7.6: Show that the
DNF ψL is shellable, where ψL is the disjunction of all leftmost implicants
of a positive function.

3. A positive DNF $\varphi = \bigvee_{k=1}^{m} \bigwedge_{i \in A_k} x_i$ is aligned if, for every k = 1, 2, . . . , m
and for every j ∉ Ak such that j < hk = max{i : i ∈ Ak }, there exists Aℓ ⊆
(Ak ∪ {j }) \ {hk }. Prove that every aligned DNF has the LE property (see
Boros [105] and Section 8.9.2).
4. Complete the proof of Theorem 7.16 (see Boros [105]).

5. Let $\varphi(x_1, x_2, \ldots, x_n) = \bigvee_{k=1}^{m} \bigwedge_{i \in A_k} x_i$ be a DNF such that |Ak | = n − 2 for
k = 1, 2, . . . , m. Show that φ is shellable if and only if the graph G = (N , E)
is connected, where N = {1, 2, . . . , n} and E = {N \ Ak | k = 1, 2, . . . , m}.

Questions for thought


6. Find a small, shellable DNF that does not have the LE property with respect
to any order of its variables.
7. Prove or disprove: If a DNF φ(x1 , x2 , . . . , xn ) is shellable with respect to the
lexicographic order of its terms, then it has the LE property with respect to
some order of its variables. (Compare with Example 7.7.)
8. Determine the complexity of Shellability:
Instance: A positive DNF φ.
Output: Yes if φ is shellable, No otherwise.

9. The article [111] states a stronger form of Theorem 7.8, namely, it claims
that:
Claim. If a Boolean function in n variables can be represented by a shellable
positive DNF of m terms, then its dual can be represented by a shellable
DNF of at most nm terms.
Unfortunately, the proof given in [111] is flawed, so that the validity of the
claim (namely, the existence of a short, shellable DNF of the dual) remains
open. Can you prove or disprove it?
8
Regular functions

In this chapter we investigate the main properties of regular Boolean functions. This
class of functions constitutes a natural extension of the class of threshold functions,
and, as such, has repeatedly and independently been “rediscovered” by several
researchers over the last 40 years. It turns out that regular functions display many
of the most interesting properties of threshold functions, and that these properties
are, accordingly, best understood by studying them in the appropriate context of
regularity. From an algorithmic viewpoint, regular functions constitute one of the
most tractable classes of Boolean functions: Indeed, fundamental problems such
as dualization, computation of reliability, or set covering are efficiently solvable
when associated with regular functions. Besides its more obvious implications, this
nice algorithmic behavior will eventually pave the way for the efficient recognition
of threshold functions, which are discussed in the next chapter.

8.1 Relative strength of variables and regularity


In Chapter 1 (Definition 1.31), we defined the class of threshold Boolean functions
as follows:
Definition 8.1. A Boolean function f on Bn is a threshold (or linearly separable)
function if there exist n weights w1 , w2 , . . . , wn ∈ R and a threshold t ∈ R such that,
for all (x1 , x2 , . . . , xn ) ∈ Bn ,

$$f(x_1, x_2, \ldots, x_n) = 0 \quad \text{if and only if} \quad \sum_{i=1}^{n} w_i x_i \le t.$$

The (n + 1)-tuple (w1 , w2 , . . . , wn , t) is called a (separating) structure of f .


One of the most remarkable properties of a threshold function is that the weights
w1 , w2 , . . . , wn naturally determine an ordinal ranking of the variables, translating
the relative “influence” of the variables on the value of the function: Namely, if
wi ≥ wj , then the function is “more likely” to take the value 1 when xi = 1 and
xj = 0 than when xi = 0 and xj = 1.

This notion of relative influence, or relative strength, of variables can be
extended to more general Boolean functions, as expressed by the following
definition, which was independently introduced by Isbell [520]; Muroga, Toda, and
Takasu [700]; Paull and McCluskey [732]; Winder [916]; Maschler and Peleg
[673]; Neumaier [709], and so on. In this definition, as usual, we denote by ek the
n-dimensional unit vector with k-th component equal to 1 (k = 1, 2, . . . , n).

Definition 8.2. Let f (x1 , x2 , . . . , xn ) be a Boolean function, and let i, j ∈
{1, 2, . . . , n}. We say that variable xi is stronger than variable xj with respect to
f , and we write xi ⪰f xj , if and only if, for all X∗ ∈ Bn ,

xi∗ = xj∗ = 0 ⇒ f (X∗ ∨ ei ) ≥ f (X∗ ∨ ej ).

Equivalently, xi ⪰f xj if either i = j or f|xi =1,xj =0 ≥ f|xi =0,xj =1 .

The subscript f appearing in the symbol ⪰f is a reminder that the strength
relation depends on f . To simplify the notations, we sometimes write “xi ⪰ xj with
respect to f ”, instead of xi ⪰f xj .
Let us illustrate Definition 8.2 with a couple of examples.
Let us illustrate Definition 8.2 with a couple of examples.

Example 8.1. Let $f(x_1, x_2) = x_1 \overline{x}_2 \vee \overline{x}_1 x_2$. There holds x1 ⪰f x2 and x2 ⪰f x1 ,
since f (0, 1) = f (1, 0) = 1.

Example 8.2. If f (x1 , x2 , x3 , x4 ) = x1 x2 ∨ x2 x3 ∨ x3 x4 , then x2 ⪰f x1 . Indeed, for
all values of x3 and x4 , f (0, 1, x3 , x4 ) = x3 ≥ f (1, 0, x3 , x4 ) = x3 x4 .
One similarly verifies that x2 ⪰f x4 , x3 ⪰f x1 , and x3 ⪰f x4 . No other pairs of
variables are comparable with respect to f . For instance, x1 and x4 are not
comparable, since f (1, x2 , x3 , 0) = x2 and f (0, x2 , x3 , 1) = x3 .
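For functions of a few variables, Definition 8.2 can be checked exhaustively. The sketch below enumerates all points X∗ with xi∗ = xj∗ = 0 and compares f (X∗ ∨ ei ) with f (X∗ ∨ ej ); the helper name `stronger` is hypothetical, and the function f is the one of Example 8.2.

```python
from itertools import product


def stronger(f, n, i, j):
    """True if x_i is at least as strong as x_j with respect to f (1-based indices)."""
    if i == j:
        return True
    for x in product((0, 1), repeat=n):
        if x[i - 1] == 0 and x[j - 1] == 0:
            with_i = list(x)
            with_i[i - 1] = 1          # the point X* v e_i
            with_j = list(x)
            with_j[j - 1] = 1          # the point X* v e_j
            if f(*with_i) < f(*with_j):
                return False
    return True


def f(x1, x2, x3, x4):
    # Example 8.2: f = x1x2 v x2x3 v x3x4
    return (x1 & x2) | (x2 & x3) | (x3 & x4)


pairs = {(i, j) for i in range(1, 5) for j in range(1, 5)
         if i != j and stronger(f, 4, i, j)}
print(sorted(pairs))
```

The pairs found are exactly the four comparisons listed in Example 8.2: x2 and x3 are each stronger than x1 and x4, and nothing else is comparable.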

A bit of additional terminology and notations comes in handy when dealing
with the strength relation. We say that
• xi is strictly stronger than xj (xi ≻f xj ) if xi ⪰f xj but not xj ⪰f xi ;
• xj is weaker than xi (xj ⪯f xi ) if xi ⪰f xj , and xj is strictly weaker than xi
(xj ≺f xi ) if xi ≻f xj ;
• xi and xj are comparable if xi ⪰f xj or xj ⪰f xi ;
• xi and xj are equivalent or symmetric (xi ≈f xj ) if xi ⪰f xj and xj ⪰f xi .

The qualifier “symmetric” is justified by the following easy observation.

Theorem 8.1. For a Boolean function f (x1 , x2 , . . . , xn ), and for i, j ∈ {1, 2, . . . , n},
xi ≈f xj if and only if

$$f(Y) = f(Y^{ij}) \quad \text{for all } Y \in B^n, \tag{8.1}$$

where $Y^{ij}$ is the point obtained by exchanging the values of components i and j
in Y .

Proof. Assume that xi ≈f xj , and let Y ∈ Bn . If yi = yj then (8.1) trivially holds.
Else, suppose, for instance, that yi = 1 and yj = 0. Then, since xi ⪰f xj , $f(Y) \ge f(Y^{ij})$,
and since xj ⪰f xi , $f(Y) \le f(Y^{ij})$. Thus (8.1) holds again.
One similarly shows that (8.1) implies xi ≈f xj .

As Example 8.2 illustrates, certain pairs of variables may turn out to be
incomparable with respect to the strength relation ⪰f ; in other words, ⪰f is generally
not a complete relation. On the other hand, as we prove now, the strength relation
always defines a preorder, that is, a reflexive and transitive relation.

Theorem 8.2. The strength relation is a preorder on the set of variables of every
Boolean function.

Proof. The strength relation is obviously reflexive. To see that it is also transitive,
consider a function f (x1 , x2 , . . . , xn ), three indices i, j , k such that xi ⪰f xj and
xj ⪰f xk , and a point X∗ ∈ Bn with xi∗ = xk∗ = 0. We must show that f (X∗ ∨ ek ) ≤
f (X∗ ∨ ei ).
If xj∗ = 0, then f (X∗ ∨ ek ) ≤ f (X∗ ∨ ej ) ≤ f (X∗ ∨ ei ) and we are done. If
xj∗ = 1, then let Y∗ ∈ Bn be the point obtained by switching the j -th component of
X∗ from 1 to 0; thus, yj∗ = 0 and X∗ = Y∗ ∨ ej . Then,

f (X∗ ∨ ek ) = f (Y∗ ∨ ej ∨ ek )
≤ f (Y∗ ∨ ei ∨ ek ) (since xi ⪰f xj )
≤ f (Y∗ ∨ ei ∨ ej ) (since xj ⪰f xk )
= f (X∗ ∨ ei ),
and the proof is complete.

We pointed out in the introductory paragraphs of this section that the strength
relation associated with a threshold function is always complete. More precisely,
we can state:

Theorem 8.3. If f (x1 , x2 , . . . , xn ) is a threshold function with separating structure
(w1 , w2 , . . . , wn , t) and w1 ≥ w2 ≥ . . . ≥ wn , then x1 ⪰f x2 ⪰f · · · ⪰f xn .

Proof. Let 1 ≤ i < j ≤ n. If X∗ ∈ Bn and xi∗ = xj∗ = 0, then
$\sum_{k=1}^{n} w_k x_k^* + w_i \ge \sum_{k=1}^{n} w_k x_k^* + w_j$, and hence, f (X∗ ∨ ei ) ≥ f (X∗ ∨ ej ).

Threshold functions are not the only Boolean functions featuring a complete
strength preorder. For instance, the function displayed in Example 8.1 has a com-
plete strength preorder but is not a threshold function, since it is not monotone (the
reader will easily verify that every threshold function is monotone). If we restrict
our attention to monotone functions, then it can be shown that all functions of five
variables for which the strength preorder is complete are threshold functions, but
this implication fails for functions of six variables or more (see Winder [917] and
Exercise 11 at the end of this chapter).

The foregoing observations motivate the main definition of this chapter.

Definition 8.3. A positive Boolean function f is regular if its strength preorder
is complete. In particular, we say that f (x1 , x2 , . . . , xn ) is regular with respect to
(x1 , x2 , . . . , xn ) if x1 ⪰f x2 ⪰f · · · ⪰f xn .

Example 8.3. The function f in Example 8.2 is not regular since x1 and x4 are
not comparable in the preorder ⪰f .
On the other hand, the function g(x1 , x2 , x3 ) = x1 x2 ∨ x1 x3 is regular with respect
to (x1 , x2 , x3 ), and the function h(x1 , x2 , . . . , x5 ) = x1 x2 ∨ x1 x3 ∨ x1 x4 x5 ∨ x2 x3 x4
is regular with respect to (x1 , x2 , . . . , x5 ).
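The claims of Example 8.3 can be verified by brute force. In the sketch below, regularity with respect to (x1 , . . . , xn ) is tested on consecutive pairs only, which suffices because the strength relation is transitive (Theorem 8.2). The function names are hypothetical, and positivity of the input function is assumed rather than checked.

```python
from itertools import product


def stronger(f, n, i, j):
    """True if x_i is at least as strong as x_j with respect to f (1-based indices)."""
    for x in product((0, 1), repeat=n):
        if i != j and x[i - 1] == 0 and x[j - 1] == 0:
            with_i = list(x)
            with_i[i - 1] = 1
            with_j = list(x)
            with_j[j - 1] = 1
            if f(*with_i) < f(*with_j):
                return False
    return True


def regular_wrt_identity(f, n):
    """x1, x2, ..., xn is a chain of the strength preorder (consecutive pairs suffice)."""
    return all(stronger(f, n, i, i + 1) for i in range(1, n))


def has_complete_preorder(f, n):
    """Every pair of variables is comparable."""
    return all(stronger(f, n, i, j) or stronger(f, n, j, i)
               for i in range(1, n + 1) for j in range(i + 1, n + 1))


def f(x1, x2, x3, x4):
    return (x1 & x2) | (x2 & x3) | (x3 & x4)


def g(x1, x2, x3):
    return (x1 & x2) | (x1 & x3)


def h(x1, x2, x3, x4, x5):
    return (x1 & x2) | (x1 & x3) | (x1 & x4 & x5) | (x2 & x3 & x4)


print(has_complete_preorder(f, 4))
print(regular_wrt_identity(g, 3), regular_wrt_identity(h, 5))
```

The first line reports that f is not regular; the second confirms that g and h are regular with respect to the identity order of their variables.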

Because it is so natural and (as we will see) fruitful, the regularity concept has
been “rediscovered” several times in various fields of applications (see Muroga,
Toda, and Takasu [700]; Paull and McCluskey [732]; Winder [916]; Neumaier
[709]; Golumbic [398]; Ball and Provan [49], etc.). It constitutes our main object
of study in this chapter.
Before diving more deeply into this topic, however, let us first offer the impatient
reader an illustration of how the notion of strength preorder can be used in a game-
theoretical framework. More applications are presented at the end of Section 8.2,
after we have become better acquainted with the elementary properties of the
strength relation.

Application 8.1. (Political science, Game theory.) The legislative body in Boole-
land consists of 45 representatives, 11 senators and a president. In order to be
passed by this legislature, a bill must receive

(1) at least half of the votes in the House of Representatives and in the Senate,
as well as the president’s vote, or
(2) at least two-thirds of the votes in the House of Representatives and in the
Senate.

(The knowledgeable reader will recognize that this lawmaking process is a slightly
simplified version of the system actually in use in the United States.)
As usual, we can model this voting mechanism by a monotone Boolean function
f (r1 , . . . , r45 , s1 , . . . , s11 , p), where variable ri (respectively, sj , p) takes value 1 if
representative i (respectively, senator j , the president) casts a “Yes” vote, and takes
value 0 otherwise (1 ≤ i ≤ 45 and 1 ≤ j ≤ 11). The true points of f correspond to
the voting patterns described by rules (1) and (2) above.
A more detailed description of f can be obtained as follows: For k, n ≥ 1,
denote by gk (x1 , x2 , . . . , xn ) the “k-majority” function on n variables, that is, the
threshold function defined by

$$g_k(x_1, x_2, \ldots, x_n) = 1 \iff \sum_{i=1}^{n} x_i \ge k.$$

Then f can be expressed as

$$f(r_1, \ldots, r_{45}, s_1, \ldots, s_{11}, p) = \bigl(g_{23}(r_1, \ldots, r_{45}) \wedge g_6(s_1, \ldots, s_{11}) \wedge p\bigr) \vee \bigl(g_{30}(r_1, \ldots, r_{45}) \wedge g_7(s_1, \ldots, s_{11})\bigr).$$

One can easily verify that, with respect to the strength preorder associated
with f ,

• any two representatives are equivalent;
• any two senators are equivalent;
• a representative and a senator cannot be compared in terms of strength;
• the president is strictly stronger than any representative or any senator.
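The four claims above can be checked mechanically. The following is a minimal sketch, not taken from the text: because f depends only on the number of “Yes” votes in each chamber and on the president's vote, it enumerates vote counts rather than all 2^57 points. The names passes and at_least_as_strong are illustrative assumptions.

```python
from itertools import product

N_REPS, N_SENS = 45, 11  # chamber sizes from Application 8.1

def passes(c):
    """Value of f for a dict of Yes counts: {"rep": .., "sen": .., "pres": ..}."""
    rule1 = c["rep"] >= 23 and c["sen"] >= 6 and c["pres"] >= 1
    rule2 = c["rep"] >= 30 and c["sen"] >= 7
    return rule1 or rule2

def at_least_as_strong(a, b):
    """Is a legislator of kind a at least as strong as a distinct one of kind b?

    Kinds are "rep", "sen", "pres". Both legislators start at "No"; we
    enumerate the Yes counts among all remaining legislators and compare the
    outcome when only a switches to Yes with the outcome when only b does.
    """
    free = {"rep": N_REPS, "sen": N_SENS, "pres": 1}
    free[a] -= 1
    free[b] -= 1
    ranges = [range(free[k] + 1) for k in ("rep", "sen", "pres")]
    for r, s, p in product(*ranges):
        with_a = {"rep": r, "sen": s, "pres": p}
        with_b = dict(with_a)
        with_a[a] += 1
        with_b[b] += 1
        if passes(with_b) and not passes(with_a):
            return False  # b's vote passes the bill where a's vote does not
    return True

for a, b in [("rep", "rep"), ("sen", "sen"), ("rep", "sen"),
             ("sen", "rep"), ("pres", "rep"), ("pres", "sen")]:
    print(a, ">=", b, ":", at_least_as_strong(a, b))
```

Strict dominance of the president corresponds to at_least_as_strong("pres", k) holding while at_least_as_strong(k, "pres") fails, for k equal to "rep" or "sen".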

In this political setting, the strength preorder can be straightforwardly interpreted
as defining an ordinal measure of power on the set of legislators. Indeed,
what does it mean here for legislator i to be (strictly) stronger than legislator j ?
Simply that, if S is any coalition (that is, subset) of legislators who all decided to
vote in the same way (either all “Yes” or all “No”), and if neither i nor j has
committed her vote yet, then the members of S prefer i to j as an additional con-
vert. Indeed, i’s vote is more likely to influence the final outcome of the vote than
j ’s vote. Thus, i is “more powerful” than j . It seems that the strength relation was
first explicitly introduced in this context by Maschler and Peleg [673], although
similar concepts can be found in Isbell [520]. 

8.2 Basic properties


We present in this section some of the fundamental properties of the strength pre-
order and of regular functions. To begin with, we address an issue which may
already have come to the reader’s mind during our discussion of Application 8.1,
namely, the question of the relationship between the strength preorder and the
Chow parameters of a function. As a matter of fact, we argued in Chapter 1 that
the Chow parameters provide a numerical measure of the influence of each variable
on the value of the function (see Sections 1.6, 1.13.3, 1.13.4). Since the strength
relation also captures this influence, albeit in an ordinal setting, one would legit-
imately expect some connection between the two concepts. Such a connection
indeed exists, as expressed by the next statement.

Theorem 8.4. Let f (x1 , x2 , . . . , xn ) be a Boolean function, and let (ω1 , ω2 , . . . , ωn , ω)
denote its Chow parameters. If xi ≻f xj , then ωi > ωj . If xi ≈f xj , then ωi = ωj .

Proof. Let T be the set of true points of f . If xi ⪰f xj , then it follows immediately
from the definitions of the Chow parameters and of the strength preorder that

ωi = | {X ∈ T : xi = 1} |
= | {X ∈ T : xi = xj = 1} | + | {X ∈ T : xi = 1, xj = 0} |
≥ | {X ∈ T : xi = xj = 1} | + | {X ∈ T : xi = 0, xj = 1} |
= | {X ∈ T : xj = 1} |
= ωj .

If xi ≻f xj , then there exists at least one point X∗ ∈ B n such that xi∗ = xj∗ = 0,
f (X∗ ∨ ei ) = 1 and f (X∗ ∨ ej ) = 0. Thus, the above inequality is strict.
If xi ≈f xj , then f is symmetric on xi , xj , and hence, ωi = ωj .
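To make Theorem 8.4 concrete, here is a brute-force sketch (an illustration only, exponential in n) that computes the Chow parameters and tests the strength relation directly from its definition. It uses the function x1 x2 ∨ x2 x3 ∨ x3 x4 of Example 8.5; the helper names chow and at_least_as_strong are assumptions, and indices are 0-based in the code.

```python
from itertools import product

def f(x):
    # x1x2 ∨ x2x3 ∨ x3x4, the function of Example 8.5
    x1, x2, x3, x4 = x
    return int((x1 and x2) or (x2 and x3) or (x3 and x4))

def chow(f, n):
    """Return ([omega_1, ..., omega_n], omega) by enumerating all true points."""
    true_points = [x for x in product((0, 1), repeat=n) if f(x)]
    omegas = [sum(x[i] for x in true_points) for i in range(n)]
    return omegas, len(true_points)

def at_least_as_strong(f, n, i, j):
    """x_i is at least as strong as x_j: f(X ∨ e_i) >= f(X ∨ e_j) whenever x_i = x_j = 0."""
    for x in product((0, 1), repeat=n):
        if x[i] == 0 and x[j] == 0:
            with_i = x[:i] + (1,) + x[i + 1:]
            with_j = x[:j] + (1,) + x[j + 1:]
            if f(with_i) < f(with_j):
                return False
    return True

omegas, omega = chow(f, 4)
print(omegas, omega)
print(at_least_as_strong(f, 4, 2, 0), at_least_as_strong(f, 4, 0, 2))
```

On this function, x3 is strictly stronger than x1 and ω3 > ω1, as the theorem requires. The variables x1 and x4 have equal Chow parameters although they are incomparable, which illustrates that the converse of the theorem does not hold.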

Having clarified this point, let us now turn to the issue of deciding whether two
variables are comparable with respect to the strength relation. We only deal with
positive functions expressed by their complete (i.e., prime irredundant) DNF, as
the same question turns out to be NP-hard for arbitrary DNFs (see Exercise 2 at
the end of this chapter).
Theorem 8.5. Let f (x1 , x2 , . . . , xn ) be a positive Boolean function and let i, j be
distinct indices in {1, 2, . . . , n}. Write the complete DNF of f in the form α xi xj ∨
β xi ∨ γ xj ∨ δ, where α, β, γ and δ are positive DNFs that involve neither xi nor
xj . Then
xi ⪰f xj if and only if β ≥ γ .
Proof. Without loss of generality, suppose that i = 1 and j = 2. For X = (0, 0, Y ) ∈
B n , we get f (X ∨ e1 ) = β(Y ) ∨ δ(Y ), and f (X ∨ e2 ) = γ (Y ) ∨ δ(Y ). Hence, by
definition of the strength relation, x1 ⪰f x2 if and only if β(Y ) ∨ δ(Y ) ≥ γ (Y ) ∨ δ(Y )
for all Y ∈ B n−2 . To establish the theorem, note that β ≥ γ trivially implies
β ∨ δ ≥ γ ∨ δ. For the converse implication, assume that β ∨ δ ≥ γ ∨ δ, and
let C be a prime implicant of γ . Since C ≤ γ ≤ β ∨ δ, the DNF β ∨ δ contains a
term B which absorbs C. Note that B cannot be a term of δ (hence, of f ), since B
absorbs Cx2 , which is, by assumption, a prime implicant of f . Hence, B must be
a term of β. We conclude that β ≥ γ , and the proof is complete.

Theorem 8.5 can be rephrased as follows:


Theorem 8.6. Let f (x1 , x2 , . . . , xn ) be a positive Boolean function. For all i, j ∈
{1, 2, . . . , n}, the following statements are equivalent:
(a) xi ⪰f xj .
(b) For each prime implicant of f , say, $\bigwedge_{k \in A} x_k$, such that j ∈ A and i ∉ A,
$\bigwedge_{k \in (A \cup \{i\}) \setminus \{j\}} x_k$ is an implicant of f .
(c) For each prime implicant of f , say, $\bigwedge_{k \in A} x_k$, such that j ∈ A and i ∉ A,
there is a prime implicant of f , say, $\bigwedge_{k \in P} x_k$, such that P ⊆ (A ∪ {i}) \ {j }.
Proof. This is an immediate consequence of Theorem 8.5 and Theorem 1.22.

Example 8.4. Let f (x1 , x2 , x3 , x4 , x5 , x6 , x7 ) = x1 x2 x3 ∨ x1 x3 x4 ∨ x1 x3 x5 ∨
x2 x3 x4 x6 ∨ x2 x3 x4 x7 ∨ x2 x3 x5 x7 ∨ x4 x5 x6 . Letting i = 1 and j = 2 in the statement
of Theorem 8.5, we get

β = x3 x4 ∨ x3 x5 > γ = x3 x4 x6 ∨ x3 x4 x7 ∨ x3 x5 x7 ,

and hence, x1 ≻f x2 . On the other hand, for i = 1 and j = 4, we have

β = x2 x3 ∨ x3 x5 and γ = x2 x3 x6 ∨ x2 x3 x7 ∨ x5 x6 .

Since neither β ≥ γ nor β ≤ γ holds, we conclude that x1 and x4 are not comparable
with respect to f .
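The test of Theorem 8.5 is easy to carry out on a complete DNF stored as a list of index sets. The sketch below is an illustration under that representation (an assumption of ours, as are the names split_terms, dominates and compare). It relies on the standard fact that, for positive DNFs, β ≥ γ holds exactly when every term of γ is absorbed by some term of β.

```python
def split_terms(dnf, i, j):
    """Terms of beta (contain i, not j, with i removed) and gamma (symmetrically)."""
    beta = [t - {i} for t in dnf if i in t and j not in t]
    gamma = [t - {j} for t in dnf if j in t and i not in t]
    return beta, gamma

def dominates(big, small):
    """big >= small for positive DNFs: each term of small contains a term of big."""
    return all(any(b <= s for b in big) for s in small)

def compare(dnf, i, j):
    """Return '>', '<', '~' (equivalent) or None (incomparable) for x_i versus x_j."""
    beta, gamma = split_terms(dnf, i, j)
    ge, le = dominates(beta, gamma), dominates(gamma, beta)
    if ge and le:
        return "~"
    if ge:
        return ">"
    if le:
        return "<"
    return None

# Complete DNF of Example 8.4, one frozenset of variable indices per prime implicant.
EXAMPLE_8_4 = [frozenset(t) for t in (
    {1, 2, 3}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4, 6},
    {2, 3, 4, 7}, {2, 3, 5, 7}, {4, 5, 6},
)]

print(compare(EXAMPLE_8_4, 1, 2))
print(compare(EXAMPLE_8_4, 1, 4))
```

The two calls reproduce the two cases worked out in Example 8.4: x1 is strictly stronger than x2, while x1 and x4 are incomparable.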

Let us now see how the strength preorder behaves under some fundamen-
tal transformations of Boolean functions, namely, restriction, composition, and
dualization.

Theorem 8.7. Let f (x1 , x2 , . . . , xn ) be a Boolean function, let i, j , k be distinct
indices in {1, 2, . . . , n}, let g = f|xk =1 , and let h = f|xk =0 . Then, xi ⪰f xj if and only
if both xi ⪰g xj and xi ⪰h xj . Moreover, the following statements are equivalent:

(a) f is regular with respect to (x1 , x2 , . . . , xn ).
(b) x1 ⪰f x2 , and both f|x1 =1 and f|x1 =0 are regular with respect to
(x2 , x3 , . . . , xn ).

Proof. The first equivalence is an immediate consequence of Definition 8.2, and
the second equivalence follows from it.

Example 8.5. As in Example 8.2, consider the function f (x1 , x2 , x3 , x4 ) =
x1 x2 ∨ x2 x3 ∨ x3 x4 , for which x3 ⪰f x1 and x3 ⪰f x4 . The restriction of
f to x2 = 1 is the function g(x1 , x3 , x4 ) = x1 ∨ x3 , and its restriction
to x2 = 0 is the function h(x1 , x3 , x4 ) = x3 x4 . Theorem 8.7 implies that
x3 ⪰g x1 , x3 ⪰g x4 , x3 ⪰h x1 , x3 ⪰h x4 . On the other hand, x1 and x4 are not comparable
with respect to f , since x1 ≻g x4 and x4 ≻h x1 . Thus, f is not regular (even
though both g and h are regular; see also Exercise 1 at the end of the chapter).

We next establish an easy result concerning the composition of functions.

Theorem 8.8. If xi is stronger than xj with respect to each of the Boolean
functions fk (x1 , x2 , . . . , xn ) (k = 1, 2, . . . , m), and if g(y1 , y2 , . . . , ym ) is a positive
function, then xi is stronger than xj with respect to the composite function
h = g(f1 , f2 , . . . , fm ), for all i, j in {1, 2, . . . , n}.

Proof. Let h = g(f1 , f2 , . . . , fm ), and let X ∗ be a point of B n with xi∗ = xj∗ = 0. For
k = 1, 2, . . . , m, fk (X ∗ ∨ ei ) ≥ fk (X ∗ ∨ ej ). Hence, by positivity of g, h(X∗ ∨ ei ) ≥
h(X∗ ∨ ej ). 

In particular, we observe that:


Theorem 8.9. If xi is stronger than xj with respect to each of the Boolean
functions fk (x1 , x2 , . . . , xn ) (k = 1, 2, . . . , m), then xi is stronger than xj
with respect to f1 f2 . . . fm and with respect to f1 ∨ f2 ∨ . . . ∨ fm .
Proof. This is an immediate corollary of Theorem 8.8. 

The strength preorder is invariant under dualization:


Theorem 8.10. The strength preorders of a function and of its dual are identical.
In particular, a function is regular if and only if its dual is regular.
Proof. Let f (x1 , x2 , . . . , xn ) be a Boolean function and let i, j be distinct indices in
{1, 2, . . . , n}. We only have to show that, if xi is stronger than xj with respect to f ,
then xi is stronger than xj with respect to f d (the converse implication follows
by duality). For simplicity of presentation, assume that i = 1 and j = 2, and that
x1 ⪰f x2 . Then, for all X = (0, 0, Y ) ∈ B n ,

\[ f^d(1, 0, Y) = \overline{f}(0, 1, \overline{Y}) \ge \overline{f}(1, 0, \overline{Y}) = f^d(0, 1, Y). \]

Hence, x1 is stronger than x2 with respect to f d . 

Example 8.6. Consider again f (x1 , x2 , x3 , x4 ) = x1 x2 ∨ x2 x3 ∨ x3 x4 , as in the previous
example. Then, f d = x1 x3 ∨ x2 x3 ∨ x2 x4 , and the strength preorder of f d is
the same as that of f .
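Example 8.6 can be confirmed by brute force. The sketch below is illustrative only; it assumes the usual definition of the dual, f d (X) = 1 − f (complement of X), and the helper names dual and strength_relation are ours. It builds the dual pointwise and compares the two strength relations, with 0-based indices.

```python
from itertools import product

def f(x):
    # x1x2 ∨ x2x3 ∨ x3x4
    x1, x2, x3, x4 = x
    return int((x1 and x2) or (x2 and x3) or (x3 and x4))

def dual(g):
    """Dual function: complement the input point and the output value."""
    return lambda x: 1 - g(tuple(1 - v for v in x))

def strength_relation(g, n):
    """Set of pairs (i, j), i != j, such that x_i is at least as strong as x_j."""
    rel = set()
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if all(g(x[:i] + (1,) + x[i + 1:]) >= g(x[:j] + (1,) + x[j + 1:])
                   for x in product((0, 1), repeat=n)
                   if x[i] == 0 and x[j] == 0):
                rel.add((i, j))
    return rel

fd = dual(f)
print(sorted(strength_relation(f, 4)))
print(strength_relation(f, 4) == strength_relation(fd, 4))
```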
In some of the subsequent developments, it will be of interest to know conditions
which must hold when a variable is stronger than all the other ones. The next result
states a simple necessary condition found in Winder [916].
Theorem 8.11. Let f (x1 , x2 , . . . , xn ) be a positive Boolean function, not identically
equal to 1, and let i ∈ {1, 2, . . . , n}. Write the complete DNF of f in the form
φ1 xi ∨ φ0 , where φ1 and φ0 are positive DNFs that do not involve xi . If xi ⪰f xj
for j = 1, 2, . . . , n, then φ1 ≥ φ0 .
Proof. Suppose, for instance, that i = 1 and x1 ⪰f xj for j = 1, 2, . . . , n. We only
have to show that if Y∗ = (y2∗ , y3∗ , . . . , yn∗ ) is a minimal true point of φ0 , then Y∗
is a true point of φ1 . If Y∗ = 0, then φ0 and f are identically 1, contradicting the
hypothesis. Thus Y∗ ≠ 0, and we can assume that Y∗ = (1, Z∗ ), where Z∗ ∈ B n−2 .
By minimality of Y∗ , (0, Z∗ ) is a false point of φ0 . Therefore,

f (0, 1, Z∗ ) = φ0 (1, Z∗ ) = φ0 (Y∗ ) = 1,
f (1, 0, Z∗ ) = φ1 (0, Z∗ ) ∨ φ0 (0, Z∗ ) = φ1 (0, Z∗ ).

However, x1 ⪰f x2 implies that f (0, 1, Z∗ ) ≤ f (1, 0, Z∗ ), and hence φ1 (0, Z∗ ) = 1.
By positivity of φ1 , we conclude that φ1 (Y∗ ) = φ1 (1, Z∗ ) = 1, as required.

Example 8.7. Consider the function f (x1 , x2 , x3 ) = x1 x2 ∨ x1 x3 , for which
x1 ≻f x2 and x1 ≻f x3 . Letting i = 1 in Theorem 8.11, we obtain φ1 = x2 ∨ x3 ,
φ0 = 0, and hence, φ1 ≥ φ0 , as expected.
We can use the same example to show that the converse of Theorem 8.11 does
not hold in general. Indeed, if we let i = 2 in the statement of Theorem 8.11, then
we get φ1 = x1 , φ0 = x1 x3 , and hence φ1 ≥ φ0 . But x2 is strictly weaker than x1 .

In Chapter 7, when discussing shellability and the lexico-exchange (LE) property,
we called “leader” a variable satisfying the necessary condition in Theorem
8.11 (see Definition 7.6), and we established the relation between the LE property
and the existence of leaders in Theorem 7.10.
Combining these results, it is now rather straightforward to prove the following
theorem due to Ball and Provan [49, 760] (see also Application 7.3 in Chapter 7).

Theorem 8.12. If f (x1 , x2 , . . . , xn ) is regular with respect to (x1 , x2 , . . . , xn ), then
f has the LE property with respect to (x1 , x2 , . . . , xn ).

Proof. We use induction on n. When n = 1, the claim is trivial, so let us assume
that n > 1. If f is regular with respect to (x1 , x2 , . . . , xn ), then, by Theorem 8.11,
x1 is a leader of f . Moreover, by Theorem 8.7, both f|x1 =1 and f|x1 =0 are regular,
and hence, they have the LE property with respect to (x2 , x3 , . . . , xn ). Then,
Theorem 7.10 implies that f also has the LE property with respect to (x1 , x2 , . . . , xn ).

We will return in subsequent sections to this connection between the LE property
and regularity. For now, we describe some additional applications of the concepts
of strength preorder and of regularity.

Application 8.2. (Integer programming.) Consider an optimization problem in
0–1 variables of the form:

\[
\begin{aligned}
\text{maximize} \quad & z(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} c_i x_i && (8.2) \\
\text{subject to} \quad & f(x_1, x_2, \ldots, x_n) = 0 && (8.3) \\
& (x_1, x_2, \ldots, x_n) \in B^n, && (8.4)
\end{aligned}
\]

where f is a positive Boolean function (cf. Section 1.13.6 in Chapter 1). If xi and
xj are two variables such that xi ⪰f xj and ci ≤ cj , then one easily verifies that
there exists an optimal solution X∗ of (8.2)–(8.4) such that xi∗ ≤ xj∗ . This fact can
be used in an enumerative approach to the solution of (8.2)–(8.4). Indeed, as soon
as variable xi has been fixed to 1 in a branch of the enumeration tree, xj
can automatically be fixed to 1. More generally, the conclusion that xi ≤ xj can
also be handled as a logical condition to be satisfied by the optimal solution of
the problem (see Application 2.4 in Section 2.1).
In particular, if c1 ≤ c2 ≤ · · · ≤ cn and if f is regular with x1 ⪰f x2 ⪰f · · · ⪰f xn ,
then (8.2)–(8.4) has an optimal solution X∗ satisfying x1∗ ≤ x2∗ ≤ · · · ≤ xn∗ . Under
these assumptions, an optimal solution of (8.2)–(8.4) is given by the largest vector
X∗ of the form X∗ = ei ∨ ei+1 ∨ · · · ∨ en which satisfies the constraint f (X∗ ) = 0.
Such a solution is delivered by the greedy procedure, which successively sets the
variables xn , xn−1 , . . . , x1 to 1, while maintaining the feasibility of the solution thus
produced.
In Section 8.6, we shall see that, when f is a regular function given by the list
of its prime implicants, problem (8.2)–(8.4) is always solvable in polynomial time,
without any further conditions on the coefficients c1 , c2 , . . . , cn . 
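The greedy procedure of Application 8.2 can be sketched in a few lines. This is an illustration under explicit assumptions that go slightly beyond the text: the coefficients are nonnegative and nondecreasing, f is given as a callable, and f is regular with the variables sorted by nonincreasing strength. The regular function of Example 8.8 serves as test case, and a brute-force search is included only as a cross-check; the names greedy and brute_force_optimum are ours.

```python
from itertools import product

def f(x):
    # Regular function of Example 8.8: x1x2 ∨ x1x3 ∨ x1x4x5 ∨ x2x3x4
    x1, x2, x3, x4, x5 = x
    return int((x1 and x2) or (x1 and x3) or (x1 and x4 and x5)
               or (x2 and x3 and x4))

def greedy(f, n):
    """Set x_n, x_{n-1}, ... to 1 for as long as the constraint f(X) = 0 holds."""
    x = [0] * n
    for i in reversed(range(n)):
        x[i] = 1
        if f(tuple(x)):   # constraint violated: undo the last step and stop
            x[i] = 0
            break
    return tuple(x)

def brute_force_optimum(f, c):
    """Best objective value over all false points of f (exponential cross-check)."""
    return max(sum(ci * xi for ci, xi in zip(c, x))
               for x in product((0, 1), repeat=len(c)) if not f(x))

c = (1, 2, 3, 4, 5)   # assumed nonnegative, with c1 <= c2 <= ... <= cn
x_star = greedy(f, len(c))
print(x_star, sum(ci * xi for ci, xi in zip(c, x_star)), brute_force_optimum(f, c))
```

The loop stops at the first infeasible step because the solution sought has the form ei ∨ ei+1 ∨ · · · ∨ en , that is, a contiguous suffix of ones.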

Application 8.3. (Game theory). Since a simple game is nothing but a positive
Boolean function, we can speak of the strength preorder of a simple game (see
Section 1.13.3). What can be said about this preorder in a game-theoretic setting?
As discussed in Application 8.1, the strength preorder can be naturally inter-
preted as providing an ordinal ranking of the players according to their relative
power in the game. On the other hand, we have defined in Section 1.13.3 different
cardinal measures of power, or power indices, associated with a simple game. In
particular, we have observed that the Banzhaf indices are monotone transformations
of the Chow parameters of the associated Boolean function. Hence, it
follows from Theorem 8.4 that these power indices are consistent with the strength
preorder, in the following sense: If variable xi is (strictly) stronger than variable
xj with respect to the strength preorder of the game, then the Banzhaf index of
player i is (strictly) larger than the Banzhaf index of player j .
The notion of strength preorder has been extended by Maschler and Peleg [673]
to cooperative games in characteristic function form (i.e., pseudo-Boolean func-
tions, or real-valued functions of 0-1 variables; see Chapter 13). 

Application 8.4. (Combinatorics). A tactical configuration over the finite set N =
{1, 2, . . . , n} is a hypergraph H = (N , E) with the following two properties:
1. Each member of E has the same cardinality, say, k > 0.
2. Each element of N appears in the same number, say, r > 0, of members of E.
Neumaier [709] proved a result about tactical configurations, which is easily
stated and established in our Boolean-theoretic framework. Given the tactical
configuration H = (N , E), let fH (x1 , x2 , . . . , xn ) be the positive Boolean function
defined, as in Section 1.13.5, by

\[ f_H(x_1, x_2, \ldots, x_n) = \bigvee_{A \in E} \bigwedge_{j \in A} x_j. \]

Note that H is a tactical configuration if and only if all terms of fH have the same
degree k and every variable appears in r terms of fH . Then, Neumaier’s result
states: If H is a tactical configuration such that fH is regular, then E = {A ⊆ N :
|A| = k}. To see that this is indeed the case, consider any two variables xi and xj
with xi ⪰f xj (where f = fH ) and rewrite fH in the form: fH = αxi xj ∨ βxi ∨ γ xj ∨ δ.
Theorem 8.5 implies that β ≥ γ . But then, using the definition of a tactical configuration,
it is easy to verify that β = γ . Since xi and xj are two arbitrary variables, we
conclude that fH is symmetric on all its variables, and Neumaier’s result follows.
Euler [317] and Reiterman et al. [784] have investigated other classes of regular
hypergraphs.

Application 8.5. (Reliability.) We have already mentioned that, in the terminology
of reliability theory, every positive Boolean function f (x1 , x2 , . . . , xn ) can
be interpreted as the structure function of a coherent binary system (see Section
1.13.4). The strength relation often has an obvious interpretation for complex engi-
neering systems. For instance, if two resistors R1 and R2 are placed in series in
an electrical circuit, and if R1 has higher resistance than R2 , then R1 is stronger
than R2 with respect to the structure function of the circuit.
We have also mentioned that two of the fundamental algorithmic problems in
reliability theory are the dualization of the structure function f and the computation
of the reliability polynomial of f , that is, Relf = Prob[f (x1 , x2 , . . . , xn ) = 1],
when the xi ’s are viewed as independent Bernoulli random variables taking value 1
with probability pi and value 0 with probability 1 − pi (i = 1, 2, . . . , n).
We already know that these two problems are computationally difficult for gen-
eral functions but turn out to be polynomially solvable when f has the LE property
(and is given as a complete DNF). Hence, by virtue of Theorem 8.12, they are also
polynomially solvable when f is regular. Section 8.5 is devoted to the description
of a streamlined, very efficient algorithm for the dualization of regular functions.
As for the computation of Relf , the results described in Chapter 7 can be special-
ized as follows. 

Theorem 8.13. Assume that f (x1 , x2 , . . . , xn ) is regular with respect to
(x1 , x2 , . . . , xn ), and let $\bigvee_{k=1}^{m} \bigwedge_{j \in A_k} x_j$ denote the complete DNF of f . For
k = 1, 2, . . . , m, let µk = max{j : j ∈ Ak } and Sk = {1, 2, . . . , µk } \ Ak . Then, f is
represented by the orthogonal (sum of disjoint products) DNF

\[ \varphi^{sh} = \bigvee_{k=1}^{m} \Bigl( \bigwedge_{j \in A_k} x_j \Bigr) \Bigl( \bigwedge_{j \in S_k} \overline{x}_j \Bigr) \tag{8.5} \]

and

\[ \mathrm{Rel}_f(p_1, p_2, \ldots, p_n) = \mathrm{Prob}[f(X) = 1] = \sum_{k=1}^{m} \Bigl( \prod_{j \in A_k} p_j \Bigr) \Bigl( \prod_{j \in S_k} (1 - p_j) \Bigr). \tag{8.6} \]
Before proving Theorem 8.13, we illustrate it by means of a small example.
Example 8.8. Let f = x1 x2 ∨ x1 x3 ∨ x1 x4 x5 ∨ x2 x3 x4 . Then, x1 ⪰f x2 ⪰f x3 ⪰f x4
⪰f x5 . We obtain

µ1 = 2, S1 = ∅, µ2 = 3, S2 = {2}, µ3 = 5, S3 = {2, 3}, µ4 = 4, S4 = {1},

so that f can be written as the sum of disjoint products

\[ f = x_1 x_2 \vee x_1 \overline{x}_2 x_3 \vee x_1 \overline{x}_2 \overline{x}_3 x_4 x_5 \vee \overline{x}_1 x_2 x_3 x_4, \]

and for all choices of (p1 , p2 , p3 , p4 , p5 ),

Prob[f (X) = 1]
= p1 p2 + p1 (1 − p2 )p3 + p1 (1 − p2 )(1 − p3 )p4 p5 + (1 − p1 )p2 p3 p4 .

Proof. Assume, without loss of generality, that the prime implicants of f are listed
in lexicographic order, that is, A1 <L A2 <L . . . <L Am (remember Definition 7.4).
Then, the statement is an immediate corollary of Theorem 7.4 if we can prove that,
for k = 1, 2, . . . , m, the set Sk is the shadow of Ak , that is,

\[ S_k = \{\, j \in \{1, 2, \ldots, n\} : \text{there exists } \ell < k \le m \text{ such that } A_\ell \setminus A_k = \{j\} \,\}. \tag{8.7} \]

Consider first an index r ∈ Sk = {1, 2, . . . , µk } \ Ak . By Theorem 8.6(c), since r <
µk (so that xr ⪰f xµk by regularity), there exists a prime implicant $\bigwedge_{j \in A_\ell} x_j$ of f
such that Aℓ ⊆ (Ak ∪ {r}) \ {µk }.
Clearly, Aℓ \ Ak = {r} and Aℓ <L Ak . This shows that Sk is contained in the
right-hand side of (8.7).
Conversely, suppose now that Aℓ \ Ak = {r} for some ℓ < k ≤ m. From the
definition of the lexicographic order, it follows that r = min{j : j ∈ Aℓ \ Ak } <
min{j : j ∈ Ak \ Aℓ } ≤ µk . Hence, r ∈ Sk , and equality holds in (8.7).

Note that the computation of the expressions (8.5) and (8.6) does not require
explicitly computing the lexicographic order of A1 , A2 , . . . , Am , that is, the shelling
of f . All that is actually needed is the knowledge of the strength (complete)
preorder on the variables of f .
As a corollary of Theorem 8.13, we observe that the number of true points and
the Chow parameters of a regular Boolean function can be efficiently computed.
Indeed, as pointed out in Section 1.13.4, the number of true points of a function
f is equal to 2^n times the probability that f takes the value 1 when each variable
takes value 0 or 1 with probability 1/2. In view of equation (8.6), this probability is
given by the expression

\[ \mathrm{Rel}_f\bigl(\tfrac{1}{2}, \ldots, \tfrac{1}{2}\bigr) = \sum_{k=1}^{m} \Bigl( \prod_{j \in A_k} \tfrac{1}{2} \Bigr) \Bigl( \prod_{j \in S_k} \tfrac{1}{2} \Bigr) = \sum_{k=1}^{m} \Bigl( \tfrac{1}{2} \Bigr)^{\mu_k} \]

(see Winder [920] for related observations).
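Formulas (8.5) and (8.6) and the counting corollary translate directly into code. The following sketch assumes that the prime implicants are given as sets of 1-based indices and that the variables are already sorted by nonincreasing strength; the function names are illustrative. It uses the data of Example 8.8 and exact rational arithmetic, and includes a direct enumeration purely as a cross-check.

```python
from fractions import Fraction
from itertools import product

# Prime implicants of Example 8.8 (1-based variable indices).
PRIME_IMPLICANTS = [{1, 2}, {1, 3}, {1, 4, 5}, {2, 3, 4}]

def shadows(prime_implicants):
    """Pairs (A_k, S_k) with S_k = {1, ..., max(A_k)} minus A_k."""
    return [(a, set(range(1, max(a) + 1)) - a) for a in prime_implicants]

def reliability(prime_implicants, p):
    """Evaluate formula (8.6); p[j] is the probability that x_j = 1."""
    total = 0
    for a, s in shadows(prime_implicants):
        term = 1
        for j in a:
            term *= p[j]
        for j in s:
            term *= 1 - p[j]
        total += term
    return total

def count_true_points(prime_implicants, n):
    """2^n * Rel_f(1/2, ..., 1/2), i.e. the sum over k of 2^(n - mu_k)."""
    return sum(2 ** (n - max(a)) for a in prime_implicants)

def brute_force_count(prime_implicants, n):
    """Direct enumeration of the true points, used only as a cross-check."""
    return sum(any(all(x[j - 1] for j in a) for a in prime_implicants)
               for x in product((0, 1), repeat=n))

half = {j: Fraction(1, 2) for j in range(1, 6)}
print(shadows(PRIME_IMPLICANTS))
print(reliability(PRIME_IMPLICANTS, half), count_true_points(PRIME_IMPLICANTS, 5))
```

Note that only the sorted variable order is used; no shelling order of the prime implicants is computed, in line with the remark preceding the corollary.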

8.3 Regularity and left-shifts


In this section, we briefly discuss a useful characterization of regular functions
relying on the notion of left-shift of a Boolean point. Recall that the support of a
point Y ∈ Bn is the set supp(Y ) = {i ∈ {1, 2, . . . , n} : yi = 1}.

Definition 8.4. For any two points X∗ , Y∗ ∈ B n , we say that Y∗ is a left-shift of
X∗ , and we write Y∗ ⪰ X∗ , if there exists a mapping σ : supp(X∗ ) → supp(Y∗ )
such that
(a) σ is injective, that is, σ (i) ≠ σ (j ) when i ≠ j ; and
(b) σ (i) ≤ i for all i ∈ supp(X∗ ).
Intuitively speaking, Y∗ ⪰ X∗ if the 1’s of X∗ can be “shifted to the left” (from
position i to position σ (i)) until they coincide with a subset of the 1’s of Y∗ . Notice
that ⪰ is a preorder and that ⪰ is an extension of the preorder ≥, in the sense
that Y∗ ≥ X∗ implies Y∗ ⪰ X∗ .
Example 8.9. In B 3 ,

(1, 1, 1) ⪰ (1, 1, 0) ⪰ (1, 0, 1) ⪰ (1, 0, 0) ⪰ (0, 1, 0) ⪰ (0, 0, 1) ⪰ (0, 0, 0)

and

(1, 1, 1) ⪰ (1, 1, 0) ⪰ (1, 0, 1) ⪰ (0, 1, 1) ⪰ (0, 1, 0) ⪰ (0, 0, 1) ⪰ (0, 0, 0),

but the points (1, 0, 0) and (0, 1, 1) are not comparable with respect to ⪰ .
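The left-shift relation can be tested without constructing σ explicitly. The sketch below relies on an equivalent criterion that is not stated in the text but follows from a standard matching argument: an injective σ with σ (i) ≤ i exists exactly when every prefix of Y∗ contains at least as many 1’s as the corresponding prefix of X∗ . The name is_left_shift is an assumption.

```python
from itertools import accumulate

def is_left_shift(y, x):
    """True if y is a left-shift of x in the sense of Definition 8.4.

    Uses the prefix-count criterion: for every k, the number of ones among
    y_1..y_k is at least the number of ones among x_1..x_k.
    """
    return all(cy >= cx for cy, cx in zip(accumulate(y), accumulate(x)))

chain = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0)]
print(all(is_left_shift(a, b) for a, b in zip(chain, chain[1:])))
print(is_left_shift((1, 0, 0), (0, 1, 1)), is_left_shift((0, 1, 1), (1, 0, 0)))
```

The first line checks the first chain of Example 8.9; the second confirms that (1, 0, 0) and (0, 1, 1) are incomparable.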

Theorem 8.14. For a positive Boolean function f (x1 , x2 , . . . , xn ), the following
statements are equivalent:
(a) f is regular, with x1 ⪰f x2 ⪰f · · · ⪰f xn .
(b) Every left-shift of a true point is a true point: For all Y , Z ∈ B n , if Z ⪰ Y ,
then f (Y ) ≤ f (Z).
Proof. Assume that f is regular with respect to (x1 , x2 , . . . , xn ), and consider two
points Y , Z ∈ B n with Z ⪰ Y . Let σ be the mapping associated with Y and Z,
as in Definition 8.4, and let Y σ be the point with support {σ (i) : yi = 1}. Then,
Y σ ⪰ Y and Z ≥ Y σ . Since f is positive, f (Y σ ) ≤ f (Z). On the other hand, the
definition of the strength preorder easily implies that f (Y ) ≤ f (Y σ ), since Y σ is
obtained by “shifting to the left” the nonzero entries of Y . Condition (b) follows.
Assume now that condition (b) is satisfied, and consider two indices 1 ≤ i <
j ≤ n. Let X∗ ∈ B n with xi∗ = xj∗ = 0. Then, (X∗ ∨ ei ) ⪰ (X∗ ∨ ej ) implies
f (X∗ ∨ ej ) ≤ f (X∗ ∨ ei ). Hence, xi ⪰f xj , and condition (a) follows.

Some authors prefer to take condition (b) in Theorem 8.14 as the defining
property of regular functions (up to a permutation of the variables). In particular,
consideration of the “left-shift” relation allows us to introduce in a natural way
some special types of false points and true points that play an interesting role in
computational manipulations of regular and threshold functions (see e.g., Bradley,
Hammer, and Wolsey [148], Muroga [698], and Section 9.4.2).
Definition 8.5. A point X∗ ∈ B n is a ceiling of the Boolean function
f (x1 , x2 , . . . , xn ) if X∗ is a false point of f and if no other false point of f is
a left-shift of X∗ . Similarly, X∗ is a floor of f if X∗ is a true point of f and if X∗
is a left-shift of no other true point of f .
Thus, a ceiling is a “leftmost” false point, and a floor is a “rightmost” true point.
Observe that a ceiling X∗ of f is necessarily a maximal false point of f , since, for
all Y∗ ∈ B n , X∗ ≤ Y∗ implies Y∗ ⪰ X∗ , and hence, either X∗ = Y∗ or Y∗ is a true
point of f . Similarly, every floor of f must be a minimal true point of f .
Clearly, the notions of ceiling and floor depend on the labeling of the variables.
In the sequel, when we refer to ceilings and floors of a regular function f , we
always assume that f is regular with respect to (x1 , x2 , . . . , xn ), meaning that the
variables have been preliminarily sorted by nonincreasing strength.
Example 8.10. Consider the function f = x1 ∨ x2 x3 . Its maximal false point
X∗ = (0, 0, 1) is not a ceiling, since Y∗ = (0, 1, 0) is another false point of f and
Y∗ ⪰ X∗ . One can check that Y∗ is the unique ceiling of f . The floors of f are
the minimal true points (1, 0, 0) and (0, 1, 1).
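Ceilings and floors can be listed by exhaustive search for small n. The sketch below is purely illustrative and exponential in n; it tests the left-shift relation with a prefix-count criterion (an equivalent reformulation of Definition 8.4 that we assume here), and the helper names are ours. It is applied to the function of Example 8.10.

```python
from itertools import accumulate, product

def is_left_shift(y, x):
    """y is a left-shift of x: every prefix of y has at least as many ones as that of x."""
    return all(cy >= cx for cy, cx in zip(accumulate(y), accumulate(x)))

def ceilings_and_floors(f, n):
    points = list(product((0, 1), repeat=n))
    false_pts = [x for x in points if not f(x)]
    true_pts = [x for x in points if f(x)]
    # Ceiling: a false point of which no other false point is a left-shift.
    ceilings = [x for x in false_pts
                if not any(y != x and is_left_shift(y, x) for y in false_pts)]
    # Floor: a true point that is a left-shift of no other true point.
    floors = [x for x in true_pts
              if not any(y != x and is_left_shift(x, y) for y in true_pts)]
    return ceilings, floors

def f(x):
    # x1 ∨ x2x3
    return int(x[0] or (x[1] and x[2]))

print(ceilings_and_floors(f, 3))
```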
An easy corollary of Theorem 8.14 is that a regular Boolean function is uniquely
defined by the collection of its ceilings or its floors. This can be seen as the main
motivation for introducing Definition 8.5. More precisely, we can state:
Theorem 8.15. Let A be a subset of B n such that no two points in A are comparable
with respect to ⪰ . Then, there exists a unique function rA (x1 , x2 , . . . , xn )
that is regular with respect to (x1 , x2 , . . . , xn ), and for which A is the set of ceilings.
Similarly, there exists a unique function r^A (x1 , x2 , . . . , xn ) that is regular with
respect to (x1 , x2 , . . . , xn ) and for which A is the set of floors.
Proof. We only establish the statement concerning ceilings, since the argument
is easily adapted to prove the statement about floors. Let rA (x1 , x2 , . . . , xn ) be the
Boolean function defined as follows:

For all X∗ ∈ B n , rA (X∗ ) = 0 if and only if there exists Y∗ ∈ A such that Y∗ ⪰ X∗ . (8.8)

Now, let Y , Z ∈ B n with Z ⪰ Y and rA (Z) = 0. Then, it follows from (8.8) and
from the transitivity of ⪰ that rA (Y ) = 0. Hence, by Theorem 8.14, rA is regular
with respect to (x1 , x2 , . . . , xn ). Moreover, it is easy to verify that A is exactly the
set of ceilings of rA . To establish the uniqueness of rA , consider now a regular function
f (x1 , x2 , . . . , xn ) with x1 ⪰f x2 ⪰f · · · ⪰f xn , which admits A as its set of ceilings. We
want to show that f necessarily is the unique function satisfying (8.8). Suppose
first that Y∗ ∈ A and that Y∗ ⪰ X∗ . Since Y∗ is a ceiling of f , f (Y∗ ) = 0, and
hence, by Theorem 8.14, f (X∗ ) = 0. Conversely, if f (X∗ ) = 0, then there exists
a “leftmost” point Y∗ such that Y∗ ⪰ X∗ and f (Y∗ ) = 0. By definition, Y∗ is a
ceiling of f , and hence Y∗ ∈ A.

Example 8.11. Let A = {(0, 1, 0)}. If rA (x1 , x2 , x3 ) is a function that is regular with
respect to (x1 , x2 , x3 ) and such that (0, 1, 0) is its unique ceiling, then, by Theorem 8.14,
all points X∗ such that (0, 1, 0) ⪰ X∗ must be false points of rA . Moreover, by definition
of a ceiling, all left-shifts of (0, 1, 0) are true points of rA . A look at Example 8.9 indicates
that this classification exhausts all points of B 3 . Hence, rA is uniquely determined.
One easily verifies that rA (x1 , x2 , x3 ) = x1 ∨ x2 x3 .
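Definition (8.8) gives a direct way to evaluate rA from a set of ceilings. The sketch below illustrates this on the data of Example 8.11; it reuses a prefix-count test for the left-shift relation (an assumed equivalent of Definition 8.4), and the name regular_from_ceilings is hypothetical.

```python
from itertools import accumulate, product

def is_left_shift(y, x):
    """y is a left-shift of x: every prefix of y has at least as many ones as that of x."""
    return all(cy >= cx for cy, cx in zip(accumulate(y), accumulate(x)))

def regular_from_ceilings(ceilings):
    """Return r_A as a callable: r_A(X) = 0 iff some ceiling is a left-shift of X."""
    def r(x):
        return 0 if any(is_left_shift(y, x) for y in ceilings) else 1
    return r

r = regular_from_ceilings([(0, 1, 0)])
for x in product((0, 1), repeat=3):
    print(x, r(x))
```

The printed truth table can be compared with x1 ∨ x2 x3 , the function identified in Example 8.11.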
Peled and Simeone [735] used Theorem 8.15 to show that, if r(n) is the number
of regular functions on n variables, then log2 r(n) ≥ c n^(−3/2) 2^n for some constant c.

8.4 Recognition of regular functions


We tackle in this section the algorithmic problem of recognizing regular Boolean
functions, mostly concentrating on the case in which the input function f is posi-
tive and is represented by its complete DNF, that is, on the problem:

Regularity Recognition
Instance: The complete DNF of a positive Boolean function f .
Output: True if f is regular, False otherwise.

It is not too hard to see that Regularity Recognition can be solved in
polynomial time. Indeed, each question of the form:
“Is xi ⪰f xj , or is xj ⪰f xi , or are xi and xj incomparable with respect to f ?” (8.9)
can be answered in time O(nm^2 ) by virtue of Theorem 8.6, where n is the number of
variables and m is the number of prime implicants of f . By asking enough questions
of this type, we can either find a pair of incomparable variables, or determine a
permutation (xi1 , xi2 , . . . , xin ) of the variables such that xi1 ⪰f xi2 ⪰f · · · ⪰f xin in case
f is regular. Therefore, we can state the following result:
Theorem 8.16. There is an O(n^2 m^2 log n) algorithm to decide whether a positive
Boolean function given by its complete DNF is regular, where n is the number of
variables and m is the number of prime implicants of the function.
Proof. Using an optimal sorting strategy (like Mergesort [11]), one can determine
whether the input function is regular by asking O(n log n) questions of the form
(8.9), and each question can be answered in time O(nm^2 ).

Although polynomially bounded, the complexity of this simple procedure is
quite high. In particular, the factor m^2 in the time bound is unsatisfactory since we
generally expect m to be large with respect to n. In the remainder of this section,
we present several results due to Winder [916, 917] and Provan and Ball [760] that
will allow us to derive an improved recognition procedure for regular functions
with time complexity O(n^2 m).
The improvements will be achieved on two separate fronts. First, we will show
how to quickly obtain a complete ordering σ of the variables of f , with the property
that f is regular if and only if σ coincides with the strength preorder of f . “Quickly”
means here in O(n^2 + nm) operations. Next, making use of an appropriate data
structure, we explain how to check in O(n^2 m) steps whether σ actually is the
strength preorder of f and, hence, whether f is regular.

Strength preorder and Winder matrix


We start with an elegant result due to Winder [916, 917], which makes use of the
concept of lexicographic order of points in R^n .

Definition 8.6. For X, Y ∈ R^n , we say that X precedes Y in the lexicographic
order, and we write X <L Y , if xk < yk , where k = min{j : xj ≠ yj , 1 ≤ j ≤ n}. We
write X ≤L Y if either X = Y or X <L Y .

Definition 8.7. The Winder matrix of a positive Boolean function f (x1 , x2 , . . . , xn )
is the n × n matrix R = (rid ), where rid denotes the number of prime implicants
of f that involve xi and whose degree is exactly d (i, d = 1, 2, . . . , n).

Theorem 8.17. Let f (x1 , x2 , . . . , xn ) be a positive Boolean function, and denote
by R i the i-th row of its Winder matrix (i = 1, 2, . . . , n). For i, j = 1, 2, . . . , n,

(a) if xi ≈f xj then R i = R j ;
(b) if xi ≻f xj then R i >L R j .

Proof. Consider two variables xi , xj , and write the complete DNF of f in the form
α xi xj ∨ β xi ∨ γ xj ∨ δ as in Theorem 8.5. If xi ≈f xj , then β = γ , and hence,
R i = R j . So, assume now that xi ≻f xj . Then β > γ . For d = 0, 1, . . . , n − 1, define

\[
\begin{aligned}
B(d) &= \{P : |P| = d \text{ and } \textstyle\bigwedge_{k \in P} x_k \text{ is a term of } \beta\}, \\
C(d) &= \{P : |P| = d \text{ and } \textstyle\bigwedge_{k \in P} x_k \text{ is a term of } \gamma\}.
\end{aligned}
\]

If B(d) = C(d) for all d, then β = γ , which contradicts our assumption. Thus,
there exists a smallest d∗ such that B(d∗ ) ≠ C(d∗ ). We claim that C(d∗ ) ⊂ B(d∗ ).
Indeed, let P ∈ C(d∗ ). Since β > γ , there exists a term of β, say $\bigwedge_{k \in Q} x_k$, such
that Q ⊆ P . If Q is not equal to P , then |Q| < |P | = d∗ , and hence, Q ∈ B(d)
for some d < d∗ . By our choice of d∗ , this implies that Q ∈ C(d). But, then both
$\bigwedge_{k \in P} x_k$ and $\bigwedge_{k \in Q} x_k$ are terms of γ , a contradiction. So, we conclude that Q = P ,
and hence, P ∈ B(d∗ ) as required.
From the assertions B(d) = C(d) for d < d∗ and C(d∗ ) ⊂ B(d∗ ), one easily
derives rid = rjd for d < d∗ and rjd∗ < rid∗ , which completes the proof.

Example 8.12. Let f = x1 x2 ∨ x1 x3 ∨ x1 x4 x5 ∨ x2 x3 x4 . One checks for instance
that r2,3 = 1, since x2 occurs in exactly one prime implicant of degree 3. The
complete matrix R associated with f is

\[
R = \begin{pmatrix}
0 & 2 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 0 & 0
\end{pmatrix}.
\]

Since x1 ≻f x2 , the first row of R is lexicographically larger than its second
row. Also, the second and third rows of R are identical, since x2 ≈f x3 .
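Computing the Winder matrix takes a single pass over the prime implicants. The sketch below is an illustration, with prime implicants stored as sets of 1-based indices and a function name of our choosing. It reproduces the matrix of Example 8.12 and then sorts the variables by lexicographically nonincreasing rows, using the fact that Python compares lists lexicographically.

```python
def winder_matrix(prime_implicants, n):
    """R[i-1][d-1] = number of prime implicants of degree d that involve x_i."""
    R = [[0] * n for _ in range(n)]
    for a in prime_implicants:
        d = len(a)
        for i in a:
            R[i - 1][d - 1] += 1
    return R

PRIME_IMPLICANTS = [{1, 2}, {1, 3}, {1, 4, 5}, {2, 3, 4}]  # Example 8.12
R = winder_matrix(PRIME_IMPLICANTS, 5)
for row in R:
    print(row)

# Candidate strength order: variables sorted by lexicographically nonincreasing rows.
order = sorted(range(1, 6), key=lambda i: R[i - 1], reverse=True)
print(order)
```

By Theorem 8.17, this order is the only candidate (up to ties) for an ordering by nonincreasing strength; whether it actually is one still has to be verified.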
Note that the strength preorder ⪰f does not coincide perfectly, in general, with
the lexicographic order ≥L on the rows of R (in particular, ≥L completely orders
the rows of R, whereas ⪰f is generally incomplete). When f is regular, however,
we obtain as an immediate corollary of Theorem 8.17:
Theorem 8.18. Let f (x1 , x2 , . . . , xn ) be a regular function and denote by R i the
i-th row of its Winder matrix (i = 1, 2, . . . , n). Then,
R 1 ≥L R 2 ≥L · · · ≥L R n if and only if x1 ⪰f x2 ⪰f · · · ⪰f xn .
Proof. This immediately follows from Theorem 8.17.

For a positive function f expressed in complete DNF, with n variables and
m prime implicants, the Winder matrix R can be computed in time O(n^2 + nm)
and its rows can be lexicographically ordered in time O(n^2 ) (see [11]). Assuming
for simplicity that R 1 ≥L R 2 ≥L · · · ≥L R n , one can then decide whether f is
regular by checking whether x1 ⪰f x2 ⪰f · · · ⪰f xn . This requires (n − 1) pairwise
comparisons of variables, and each of these can be performed in time O(nm^2 )
(using Theorem 8.5). Thus, Theorem 8.18 directly leads to an O(n^2 m^2 ) recognition
algorithm for regular functions.
To get rid of a factor of m in this time complexity, more work is needed.

Efficient comparison of variables


Given two variables $x_i, x_j$ of a positive function $f(x_1, x_2, \ldots, x_n)$, deciding whether
$x_i \succeq_f x_j$ amounts (by Theorem 8.6) to testing whether $\bigwedge_{k \in (A \cup \{i\}) \setminus \{j\}} x_k$ is an
implicant of f, for each prime implicant of f of the form $\bigwedge_{k \in A} x_k$ such that j ∈ A
and i ∉ A. This observation motivates us to momentarily concentrate on the
algorithmic complexity of the following type of queries: For a positive function
$f(x_1, x_2, \ldots, x_n)$ expressed in complete DNF and for a subset A ⊆ {1, 2, ..., n}, is
$\bigwedge_{k \in A} x_k$ an implicant of f?
Now, we have already seen in Chapter 7, Section 7.4, that queries of this type
can be answered efficiently when f has the LE property. Since Theorem 8.12
asserts that regular functions have the LE property, all results in Section 7.4 apply
to regular functions as well. (Note that it is not necessary to master all of Chapter 7
in order to appreciate the contents of Section 7.4: The reader can still study Section
7.4, now simply substituting the words “regularity property” for “LE property”
everywhere in the section.)
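To make the pairwise comparison concrete, here is a minimal sketch of the test based on Theorem 8.6. It does not use the binary tree of Section 7.4: the implicant query is answered naively, by looking for a prime implicant contained in the queried term, which is valid for any positive function in complete DNF but slower. The names `is_implicant` and `at_least_as_strong` are illustrative assumptions.

```python
def is_implicant(prime_implicants, A):
    """A positive term is an implicant of f if and only if it contains
    (the index set of) some prime implicant of f."""
    return any(P <= A for P in prime_implicants)


def at_least_as_strong(prime_implicants, i, j):
    """Test whether x_i is at least as strong as x_j: for every prime
    implicant A with j in A and i not in A, the term obtained by
    replacing x_j with x_i must again be an implicant."""
    for A in prime_implicants:
        if j in A and i not in A:
            if not is_implicant(prime_implicants, (A | {i}) - {j}):
                return False
    return True


# Function of Example 8.12: f = x1x2 v x1x3 v x1x4x5 v x2x3x4
f = [frozenset(t) for t in ({1, 2}, {1, 3}, {1, 4, 5}, {2, 3, 4})]
print(at_least_as_strong(f, 1, 2))   # x1 versus x2
print(at_least_as_strong(f, 2, 1))   # x2 versus x1
```

On this function the first query succeeds and the second fails (replacing x1 by x2 in x1x3 gives x2x3, which is not an implicant), in agreement with the strict relation between x1 and x2 noted in Example 8.12.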

Procedure Regular(f)
Input: A positive Boolean function $f(x_1, x_2, \ldots, x_n)$ in complete DNF.
Output: True if f is regular, False otherwise.

begin
    compute R, the Winder matrix of f;
    order the rows of R lexicographically;
    {comment: assume without loss of generality that $R^1 \geq_L R^2 \geq_L \cdots \geq_L R^n$}
    set up the binary tree T(f);
    for i = 1 to n − 1 and
    for every prime implicant $\bigwedge_{k \in A} x_k$ of f such that i ∉ A and i + 1 ∈ A do
        if Implicant((A ∪ {i}) \ {i + 1}) = False then return False;
    return True;
end

Figure 8.1. Procedure Regular.

More precisely, denote by T(f) the binary tree associated with (the complete
DNF of) a positive function f as on page 341, and consider the procedure
Implicant(A) defined on page 342. Then, we can state:

Theorem 8.19. Let $f(x_1, x_2, \ldots, x_n)$ be a positive function, and let
A ⊆ {1, 2, ..., n}.

(a) If the procedure Implicant(A) returns the answer True, then $\bigwedge_{j \in A} x_j$ is
    an implicant of f.
(b) When f is regular with respect to $(x_1, x_2, \ldots, x_n)$, the procedure
    Implicant(A) returns the answer True if and only if $\bigwedge_{j \in A} x_j$ is an
    implicant of f.
Proof. This is a corollary of Theorem 7.12 and Theorem 7.13. □

We are now ready to state an efficient algorithm due to Provan and Ball [760]
for the recognition of regular functions; see Figure 8.1 for a formal statement of
the algorithm.

Theorem 8.20. Algorithm Regular correctly recognizes regular functions given
by their complete DNF. It can be implemented to run in time $O(n^2 m)$, where n is
the number of variables and m is the number of prime implicants of the function
to be tested.

Proof. If f is regular and $R^1 \geq_L R^2 \geq_L \cdots \geq_L R^n$, then $x_1 \succeq_f x_2 \succeq_f \cdots \succeq_f x_n$
by Theorem 8.18. So, for every i ∈ {1, 2, ..., n − 1} and for every prime implicant
$\bigwedge_{k \in A} x_k$ of f such that i ∉ A and i + 1 ∈ A, Theorem 8.6 implies that
$\bigwedge_{k \in (A \cup \{i\}) \setminus \{i+1\}} x_k$ is an implicant of f. Hence, by Theorem 8.19, Regular(f)
returns the answer True.
Conversely, if f is not regular, then there is a smallest index i such that
$x_i \not\succeq_f x_{i+1}$. For this i, there is a prime implicant $\bigwedge_{k \in A} x_k$ of f such that i ∉ A, i + 1 ∈ A,
and $\bigwedge_{k \in (A \cup \{i\}) \setminus \{i+1\}} x_k$ is not an implicant of f. But then, Implicant((A ∪ {i}) \
{i + 1}) returns False, by Theorem 8.19. This establishes that the procedure is
correct.
As for the complexity of the procedure, we have already observed that its first
and second steps can be performed in time $O(n^2 + nm)$. Setting up the tree T(f)
takes time O(nm) (see Section 7.4). The nested loops require at most nm calls on
the procedure Implicant, and each of these calls can be executed in time O(n).
Hence, the overall running time of Regular is $O(n^2 m)$. □
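The overall structure of the recognition procedure can be sketched as follows. This is a simplified variant under stated assumptions, not the procedure of Figure 8.1 itself: the binary tree T(f) and the procedure Implicant are replaced by a naive containment test over the list of prime implicants, so the sketch does not attain the $O(n^2 m)$ bound. Prime implicants are assumed to be encoded as sets of 1-based variable indices, and all function names are illustrative.

```python
def winder_matrix(n, primes):
    """R[i-1][d-1] = number of prime implicants of degree d containing x_i."""
    R = [[0] * n for _ in range(n)]
    for P in primes:
        for i in P:
            R[i - 1][len(P) - 1] += 1
    return R


def is_implicant(primes, A):
    """Naive stand-in for Implicant(A): some prime implicant lies inside A."""
    return any(P <= A for P in primes)


def is_regular(n, primes):
    """Sort the variables by lexicographically decreasing Winder rows, then
    compare each pair of consecutive variables in that order."""
    primes = [frozenset(P) for P in primes]
    R = winder_matrix(n, primes)
    order = sorted(range(1, n + 1), key=lambda i: R[i - 1], reverse=True)
    for a, b in zip(order, order[1:]):
        for A in primes:
            if b in A and a not in A:
                if not is_implicant(primes, (A | {a}) - {b}):
                    return False
    return True


# Function of Example 8.12, and a function that is not regular
print(is_regular(5, [{1, 2}, {1, 3}, {1, 4, 5}, {2, 3, 4}]))
print(is_regular(4, [{1, 2}, {3, 4}]))
```

The second function, x1x2 ∨ x3x4, fails the test because replacing x3 by x2 in the prime implicant x3x4 yields x2x4, which is not an implicant.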

Example 8.13. Consider the function $f(x_1, x_2, x_3, x_4, x_5) = x_1 x_2 \vee x_1 x_3 \vee x_1 x_4 x_5 \vee
x_2 x_3 x_4$. We computed the Winder matrix of f in Example 8.12. The tree T(f) is
represented in Figure 7.2. The reader can check that Regular returns the answer
True when running on f. □
The $O(n^2 m)$ time complexity stated in Theorem 8.20 has been further improved
to O(nm) by Makino [649]. Makino’s algorithm makes use of an improved binary
tree data structure in order to achieve this time complexity; we refer to the paper
[649] for details.
Before closing this section on the recognition of regular functions, let us address
the complexity of a more general version of the problem: Namely, given an arbitrary
DNF (as opposed to a positive one), how difficult is it to determine whether this
DNF represents a regular Boolean function? Peled and Simeone [735] showed:
Theorem 8.21. Deciding whether a DNF represents a regular Boolean function
is co-NP-complete, even if the DNF has degree at most three.
Proof. NP-hardness follows immediately from Theorem 1.30 and from the obser-
vation that not all functions are regular. The decision problem is in co-NP since
we can show that a function is not regular by exhibiting a pair of incomparable
variables. 

The problem of recognizing regular functions given by an oracle, rather than


by a Boolean expression, has also been considered in a number of publications;
see, for instance, Boros, Hammer, Ibaraki, and Kawakami [129] or Makino and
Ibaraki [653].

8.5 Dualization of regular functions


In this section, we consider the problem of dualizing regular Boolean functions
expressed in complete (prime irredundant) disjunctive normal form, namely, the
problem:

Regular Dualization
Instance: The complete DNF of a regular function f or, equivalently, the list of
all minimal true points of f.
Output: The complete DNF of $f^d$ or, equivalently, the list of all maximal false
points of f.

Motivation for this problem can be found in Chapter 4, as well as in Application 8.5.
Also, and perhaps most importantly, the efficient dualization of regular
functions will turn out to be an essential step for the efficient recognition of threshold
functions in Chapter 9. As a consequence, this problem has a rather complex
and interesting history.
The first specialized dualization algorithm for regular functions was proposed
by Hammer, Peled, and Pollatschek [455]. This algorithm runs in “polynomial
total time” in the sense of Appendix B, meaning that its running time is bounded
by a polynomial in the size of its input and of its output. Denote by n, m, and
p, respectively, the number of variables, minimal true points, and maximal false
points of the function f . So, the algorithm of Hammer, Peled, and Pollatschek [455]
is polynomial in n, m and p. However, the authors did not carry out a more detailed
complexity analysis of their algorithm and, in particular, they did not provide any
precise bound on the magnitude of p.
Similar comments hold for the general dualization scheme of Lawler, Lenstra,
and Rinnooy Kan [605] sketched in Theorem 4.39. Indeed, as noticed by Peled
and Simeone [735], the approach proposed by Lawler, Lenstra, and Rinnooy Kan
for the enumeration of all maximal feasible solutions of knapsack problems can
be generalized for the dualization of regular functions. It leads to an $O(n^2 p)$
dualization algorithm for regular functions, but again, the approach does not seem
to imply any reasonable bound on p.
Peled and Simeone [735] presented the first dualization algorithm for regular
functions whose running time could be proved to be polynomially bounded in n
and m only. More precisely, their algorithm outputs the maximal false points of f
in time $O(n^3 m)$. Clearly, such a result is only possible if the number of maximal
false points of f , namely, p, is itself polynomially bounded in n and m. And indeed,
as a by-product of the complexity analysis of their algorithm, Peled and Simeone
established that the bound p ≤ nm + m + n always holds for regular functions.
Therefore, in particular, the algorithms of Hammer, Peled, and Pollatschek [455]
and Lawler, Lenstra, and Rinnooy Kan [605] mentioned above also have their
running time bounded by a polynomial in n and m.
In spite of its low computational complexity, Peled and Simeone’s algorithm
is quite intricate. By contrast, Crama [225] proposed a straightforward $O(n^2 m)$
dualization algorithm for regular functions, based on a simple recursive charac-
terization of the maximal false points of these functions in terms of their minimal
true points (see Theorem 8.22 hereunder). His characterization also implies a
stronger bound on the number of maximal false points: Namely, p ≤ (n − 1)m
when m > 1.
Bertolazzi and Sassano [74, 75] independently rediscovered these same results
and extended them to a more compact characterization of the maximal false points
of regular functions (see Theorem 8.27). Their characterization also leads to an
$O(n^2 m)$ dualization algorithm and lends itself to an O(nm) algorithm for the
solution of “regular set covering problems” to be discussed in Section 8.6. Later
on, Peled and Simeone [736] proposed yet another $O(n^2 m)$ regular dualization
algorithm.
Finally, we note that an $O(n^2 m)$ dualization algorithm for regular functions
can be obtained as a corollary of Theorem 7.16, since regular functions
have the LE property by Theorem 8.12. This algorithm was first described by
Boros [105], within the framework of his analysis of so-called aligned functions
(see Section 8.9.2).
The presentation hereunder combines ideas from Crama [225] and Bertolazzi
and Sassano [74]. It mostly rests on a key result from Crama [225]:
Theorem 8.22. Assume that $f(x_1, x_2, \ldots, x_n)$ is regular with respect to
$(x_1, x_2, \ldots, x_n)$ and let $X^* \in B^{n-1}$. Then, $(X^*, 0)$ is a maximal false point of f
if and only if $(X^*, 1)$ is a minimal true point of f.
Proof. Assume that $(X^*, 0)$ is a maximal false point of f. Then, $(X^*, 1)$ is a true
point of f. To see that $(X^*, 1)$ actually is a minimal true point of f, consider any
index i < n such that $x_i^* = 1$. Since $x_i \succeq_f x_n$,
$(x_1^*, x_2^*, \ldots, x_{i-1}^*, 0, x_{i+1}^*, \ldots, x_{n-1}^*, 1)$
is a false point of f, as required.
Conversely, if $(X^*, 1)$ is a minimal true point of f, then $(X^*, 0)$ is a false
point of f. To see that $(X^*, 0)$ is a maximal false point, consider i < n such that
$x_i^* = 0$. Since $x_i \succeq_f x_n$,
$(x_1^*, x_2^*, \ldots, x_{i-1}^*, 1, x_{i+1}^*, \ldots, x_{n-1}^*, 0)$ is a true point of f, as
required. □

Theorem 8.22 provides a simple and tractable characterization of those maximal
false points of a regular function that have their last component equal to 0. On the
other hand, the maximal false points with last component equal to 1 can easily
be treated recursively. To see this, let us introduce a new notation: For a function
$f(x_1, x_2, \ldots, x_n)$ and an index i ∈ {1, 2, ..., n}, let us denote by $f_i$ the restriction of
f to $x_i = x_{i+1} = \cdots = x_n = 1$. We look at $f_i$ as a function of $(x_1, x_2, \ldots, x_{i-1})$. By
convention, we also set $f_{n+1} = f$.
Theorem 8.23. Let $f(x_1, x_2, \ldots, x_n)$ be a positive function and let $X^* \in B^{n-1}$.
Then, $(X^*, 1)$ is a maximal false point of f if and only if $X^*$ is a maximal false
point of $f_n$.
Proof. This is trivial. □

Note that, in contrast with Theorem 8.22, Theorem 8.23 is valid for all positive
functions, whether regular or not. Taken together, these theorems immediately
suggest a recursive dualization procedure for regular functions. This procedure,
which we call DualReg0, is described in Figure 8.2.
The procedure is obviously correct in view of Theorem 8.22 and Theorem 8.23.
Moreover, it can actually be implemented recursively, since $f_n$ is regular when f
is regular (by Theorem 8.7).

Procedure DualReg0(f)
Input: The list of minimal true points of a regular function $f(x_1, x_2, \ldots, x_n)$
       such that $x_1 \succeq_f x_2 \succeq_f \cdots \succeq_f x_n$.
Output: All maximal false points of f.

begin
    identify all minimal true points of f with last component equal to 1,
        say $(X_1^*, 1), (X_2^*, 1), \ldots, (X_k^*, 1)$;
    fix $x_n$ to 1 in f and determine the minimal true points of $f_n$;
    generate (recursively) all maximal false points of $f_n$,
        say $X_{k+1}^*, X_{k+2}^*, \ldots, X_p^*$;
    return $(X_1^*, 0), (X_2^*, 0), \ldots, (X_k^*, 0)$ and
        $(X_{k+1}^*, 1), (X_{k+2}^*, 1), \ldots, (X_p^*, 1)$;
end

Figure 8.2. Procedure DualReg0.

Example 8.14. Consider the function $f(x_1, x_2, x_3, x_4, x_5) = x_1 x_2 \vee x_1 x_3 \vee x_1 x_4 \vee
x_2 x_3 \vee x_2 x_4 x_5$, which is regular with $x_1 \succeq_f x_2 \succeq_f x_3 \succeq_f x_4 \succeq_f x_5$. The minimal
true points of f are (in lexicographic order): $Y^1 = (0, 1, 0, 1, 1)$, $Y^2 = (0, 1, 1, 0, 0)$,
$Y^3 = (1, 0, 0, 1, 0)$, $Y^4 = (1, 0, 1, 0, 0)$ and $Y^5 = (1, 1, 0, 0, 0)$. Let us execute the
procedure DualReg0 on f.

Step 1. The only maximal false point of f with 0 as last component is $X^1 =
(0, 1, 0, 1, 0)$ (derived from $Y^1$ via Theorem 8.22).
Step 2. The restriction of f to $x_5 = 1$ is $f_5(x_1, x_2, x_3, x_4) = x_1 x_2 \vee x_1 x_3 \vee
x_1 x_4 \vee x_2 x_3 \vee x_2 x_4$, which has the minimal true points: $Z^1 = (0, 1, 0, 1)$, $Z^2 =
(0, 1, 1, 0)$, $Z^3 = (1, 0, 0, 1)$, $Z^4 = (1, 0, 1, 0)$, $Z^5 = (1, 1, 0, 0)$.
Step 3. We now recursively apply DualReg0 to $f_5$.

    Step 1. The maximal false points of $f_5$ with last component equal to 0 are (0, 1, 0, 0)
    and (1, 0, 0, 0) (derived from $Z^1$ and $Z^3$ by Theorem 8.22). Thus, f has the maximal
    false points $X^2 = (0, 1, 0, 0, 1)$ and $X^3 = (1, 0, 0, 0, 1)$ (by Theorem 8.23).
    Step 2. The restriction of $f_5$ to $x_4 = 1$ is $f_4(x_1, x_2, x_3) = x_1 \vee x_2$, with minimal true
    points $V^1 = (0, 1, 0)$ and $V^2 = (1, 0, 0)$.
    Step 3. We recursively apply DualReg0 to $f_4$.

        Step 1. $f_4$ has no maximal false points with $x_3 = 0$.
        Step 2. Setting $x_3 = 1$ in $f_4$, we get $f_3(x_1, x_2) = x_1 \vee x_2$, with minimal true points
        $W^1 = (0, 1)$ and $W^2 = (1, 0)$.
        Step 3. We recursively apply DualReg0 to $f_3$.

            Step 1. Using Theorem 8.22, we see that $f_3$ has the maximal false point (0, 0) with
            last component equal to 0. Thus, f has the maximal false point $X^4 = (0, 0, 1, 1, 1)$
            (by repeated applications of Theorem 8.23).
            Step 2. Fixing $x_2 = 1$ in $f_3$, we obtain $f_2(x_1) \equiv 1$.
            Step 3. Since $f_2$ has no maximal false points, the procedure terminates here: all
            maximal false points of f have been listed. □
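The recursion traced in Example 8.14 can be sketched in a few lines. This is an illustrative implementation under stated assumptions: points are tuples of 0/1 values, the input function is regular with its variables already in strength order, and the minimal true points of the restricted function are recomputed by a naive quadratic filter (the helper `minimal_points` is ours), so no attempt is made at efficiency.

```python
def minimal_points(points):
    """Keep the componentwise-minimal points of a collection of 0/1 tuples."""
    pts = set(points)
    return [p for p in pts
            if not any(q != p and all(a <= b for a, b in zip(q, p))
                       for q in pts)]


def dual_reg0(n, min_true):
    """Maximal false points of a regular function on n variables, given
    the list of its minimal true points (variables in strength order)."""
    if not min_true:                         # f is identically 0
        return [(1,) * n]
    if any(sum(y) == 0 for y in min_true):   # f is identically 1
        return []
    # Theorem 8.22: minimal true points (X, 1) give maximal false points (X, 0).
    result = [y[:-1] + (0,) for y in min_true if y[-1] == 1]
    # Fix x_n to 1: drop the last component and keep the minimal points.
    restricted = minimal_points(y[:-1] for y in min_true)
    # Theorem 8.23: maximal false points X of f_n give (X, 1) for f.
    result += [x + (1,) for x in dual_reg0(n - 1, restricted)]
    return result


# Minimal true points of the function of Example 8.14
Y = [(0, 1, 0, 1, 1), (0, 1, 1, 0, 0), (1, 0, 0, 1, 0),
     (1, 0, 1, 0, 0), (1, 1, 0, 0, 0)]
for x in dual_reg0(5, Y):
    print(x)
```

The four points produced are the points $X^1$, $X^2$, $X^3$, $X^4$ of the example, in the order in which the recursion discovers them.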

DualReg0 requires generating the minimal true points of $f_n$ from the minimal
true points of f. To carry out this step efficiently, one may rely on the next
observation.

Theorem 8.24. Let $f(x_1, x_2, \ldots, x_n)$ be a positive function and let $Y \in B^{n-1}$. Then,
Y is a minimal true point of $f_n$ if and only if

(a) either (Y, 1) is a minimal true point of f, or
(b) (Y, 0) is a minimal true point of f, and f has no minimal true point of the
    form (Z, 1) with Z < Y.

Proof. We leave this easy proof to the reader. □
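As a sketch of how Theorem 8.24 translates into a computation, the function below derives the minimal true points of $f_n$ from those of f. It assumes points are 0/1 tuples; the name `restrict_last_to_one` is hypothetical, and the pairwise test of condition (b) is the naive quadratic one.

```python
def restrict_last_to_one(min_true):
    """Minimal true points of f_n (last variable fixed to 1), obtained from
    the minimal true points of a positive function f via Theorem 8.24."""
    pts = set(min_true)
    # Case (a): (Y, 1) is a minimal true point of f.
    ones = [y[:-1] for y in pts if y[-1] == 1]
    result = list(ones)
    # Case (b): (Y, 0) is a minimal true point of f and no minimal true
    # point of the form (Z, 1) satisfies Z < Y.
    for y in pts:
        if y[-1] == 0:
            Y = y[:-1]
            dominated = any(Z != Y and all(a <= b for a, b in zip(Z, Y))
                            for Z in ones)
            if not dominated:
                result.append(Y)
    return sorted(result)


# Minimal true points of the function of Example 8.14
Y = [(0, 1, 0, 1, 1), (0, 1, 1, 0, 0), (1, 0, 0, 1, 0),
     (1, 0, 1, 0, 0), (1, 1, 0, 0, 0)]
Z = restrict_last_to_one(Y)      # minimal true points of f_5
V = restrict_last_to_one(Z)      # minimal true points of f_4
print(Z)
print(V)
```

Applied twice to the function of Example 8.14, the sketch recovers the points $Z^1, \ldots, Z^5$ and then $V^1, V^2$ listed there.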

A straightforward implementation of DualReg0 based on Theorem 8.24 yields
an $O(n^2 m^2)$ dualization algorithm for regular functions with n variables and m
minimal true points. Our next goal in this section will be to reduce this complexity
by a factor of m. We now briefly sketch the line of attack that we will follow in
order to achieve this goal.
We first derive an accurate characterization of certain minimal true points of the
restricted functions $f_1, f_2, \ldots, f_n$ in terms of the minimal true points of f, under
the assumption that f is regular (see Theorem 8.26; notice that Theorem 8.24 does
not rest on any regularity assumption). This result will then lead to a compact
description of the maximal false points of a regular function (Theorem 8.27) and,
finally, to the announced $O(n^2 m)$ dualization algorithm (Theorem 8.28).
We now launch this programme with a first refinement of Theorem 8.24.
We use the following notations: If Y is a nonzero point in $B^n$, we denote by
µ(Y) the largest index k such that $y_k = 1$, and we denote by $Y - e_i$ the point
$(y_1, \ldots, y_{i-1}, 0, y_{i+1}, \ldots, y_n)$, for i = 1, 2, ..., n. Then (Crama [225]):

Theorem 8.25. Assume that $f(x_1, x_2, \ldots, x_n)$ is regular with respect to
$(x_1, x_2, \ldots, x_n)$, and let Y be a nonzero point in $B^{n-1}$. Then, Y is a minimal true
point of $f_n$ if and only if

(a) either (Y, 1) is a minimal true point of f, or
(b) (Y, 0) is a minimal true point of f, but $(Y - e_{\mu(Y)}, 1)$ is not.

Proof. Necessity. This is a corollary of Theorem 8.24.
Sufficiency. If (Y, 1) is a minimal true point of f, then Y is a minimal true point
of $f_n$ by Theorem 8.24. So, assume now that (Y, 0) is a minimal true point of f,
but that Y is not a minimal true point of $f_n$. We will deduce from these assumptions
that $(Y - e_{\mu(Y)}, 1)$ is a minimal true point of f, thus completing the proof.
By Theorem 8.24, f must have a minimal true point of the form (Z, 1), with
Z < Y. Let j be any index in {1, 2, ..., n − 1} such that $z_j = 0$ and $y_j = 1$. By
regularity, $(Z \vee e_j, 0)$ is a true point of f, and, by minimality of (Y, 0), it follows
that $Y = Z \vee e_j$. So, $(Z, 1) = (Y - e_j, 1)$ is a minimal true point of f.
If j = µ(Y), then we are done. Otherwise, j < µ(Y), and, by regularity,
$(Y - e_{\mu(Y)}, 1)$ is a true point of f, as required. To see that $(Y - e_{\mu(Y)}, 1)$ actually
is a minimal true point of f, observe first that $(Y - e_{\mu(Y)}, 0)$ is a false point of
f, since (Y, 0) is a minimal true point. Next, consider any index k < µ(Y) such
that $y_k = 1$. Since $(Y - e_{\mu(Y)}, 0)$ is a false point of f, $(Y - e_k - e_{\mu(Y)}, 1)$ also is a
false point, by regularity. Thus, $(Y - e_{\mu(Y)}, 1)$ is a minimal true point. □
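The practical gain of Theorem 8.25 is that condition (b) becomes a single membership query instead of a scan over all minimal true points. The sketch below illustrates this, assuming the function is regular with variables in strength order and that points are 0/1 tuples stored in a hash set; the function name is hypothetical. The all-zero point, which the theorem excludes, is handled separately.

```python
def restrict_last_to_one_regular(min_true):
    """Minimal true points of f_n for a regular function f, using the
    membership test of Theorem 8.25 for condition (b)."""
    pts = set(min_true)
    result = []
    for y in pts:
        Y = y[:-1]
        if y[-1] == 1:
            result.append(Y)                       # case (a)
        elif not any(Y):
            result.append(Y)                       # (0,...,0) true: f is constant 1
        else:
            # 0-based position of mu(Y), the last component of Y equal to 1
            mu = max(k for k, v in enumerate(Y) if v == 1)
            shifted = Y[:mu] + (0,) + Y[mu + 1:] + (1,)   # (Y - e_mu(Y), 1)
            if shifted not in pts:
                result.append(Y)                   # case (b)
    return sorted(result)


# Minimal true points of the function of Example 8.14
Y = [(0, 1, 0, 1, 1), (0, 1, 1, 0, 0), (1, 0, 0, 1, 0),
     (1, 0, 1, 0, 0), (1, 1, 0, 0, 0)]
Z = restrict_last_to_one_regular(Y)   # minimal true points of f_5
V = restrict_last_to_one_regular(Z)   # minimal true points of f_4
print(Z)
print(V)
```

With set lookups taking expected constant time per hashed tuple of length n, each restriction costs O(nm) operations in this sketch, which is consistent with the $O(n^2 m)$ bound for the n successive restrictions.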

As shown in Crama [225], Theorem 8.25 can already be used to produce an
$O(n^2 m)$ implementation of DualReg0. But we now go one step further and
establish a more precise characterization of those minimal true points of $f_j$
(j = 1, 2, ..., n) that have a 1 as last component (observe that these are the only
minimal true points of $f_j$ that we need to know to carry out DualReg0).
For two minimal true points Y and Z of f, let us say that Z immediately precedes
Y if $Z <_L Y$ and if there is no minimal true point of f between Z and Y in the
lexicographic order $<_L$ (see Definition 8.6). The following result is essentially
due to Bertolazzi and Sassano (see Theorem 4.2 in [74]):
Theorem 8.26. Let $f(x_1, x_2, \ldots, x_n)$ be regular with respect to $(x_1, x_2, \ldots, x_n)$, let
j ∈ {1, 2, ..., n − 1}, and let $(y_1, y_2, \ldots, y_j)$ be a point in $B^j$ such that $y_j = 1$.
The point $(y_1, y_2, \ldots, y_j)$ is a minimal true point of $f_{j+1}$ if and only if there exists
$(y_{j+1}, y_{j+2}, \ldots, y_n) \in B^{n-j}$ such that
(a) $Y = (y_1, y_2, \ldots, y_n)$ is a minimal true point of f, and
(b) if Z is the minimal true point of f immediately preceding Y, then
    $(y_1, y_2, \ldots, y_{j-1}) \neq (z_1, z_2, \ldots, z_{j-1})$.
Proof. Necessity. If $(y_1, y_2, \ldots, y_j)$ is a minimal true point of $f_{j+1}$, then it
follows from Theorem 8.24 that f must have a minimal true point of the form
$Y = (y_1, y_2, \ldots, y_n)$ for some appropriate values of $y_{j+1}, y_{j+2}, \ldots, y_n$. Choose
$y_{j+1}, y_{j+2}, \ldots, y_n$ in such a way that Y is lexicographically smallest among all
minimal true points of f of this form.
Let now Z be the minimal true point of f immediately preceding Y and
assume by contradiction that $(y_1, y_2, \ldots, y_{j-1}) = (z_1, z_2, \ldots, z_{j-1})$. If $z_j = 1$, then
$(y_1, y_2, \ldots, y_j) = (z_1, z_2, \ldots, z_j)$, contradicting the choice of Y. So, $z_j = 0$. On the
other hand, $(z_1, z_2, \ldots, z_j)$ is a true point of $f_{j+1}$, since

$$f_{j+1}(z_1, z_2, \ldots, z_j) = f(z_1, z_2, \ldots, z_j, 1, \ldots, 1) \geq f(Z) = 1.$$

Hence, $(y_1, y_2, \ldots, y_j)$ is not a minimal true point of $f_{j+1}$, a contradiction.
Sufficiency. Assume now that f has a minimal true point of the form $Y =
(y_1, y_2, \ldots, y_n)$ but that $(y_1, y_2, \ldots, y_j)$ is not a minimal true point of $f_{j+1}$. Then,
there exists an index k, j < k ≤ n, such that $(y_1, y_2, \ldots, y_k)$ is a minimal true point
of $f_{k+1}$, but $(y_1, y_2, \ldots, y_{k-1})$ is not a minimal true point of $f_k$. In view of Theorem
8.25, this means that $y_k = 0$, and that $V = (y_1, y_2, \ldots, y_{k-1}, 1) - e_\mu$ is a minimal
true point of $f_{k+1}$, where $\mu = \mu(y_1, y_2, \ldots, y_{k-1})$. Observe that j ≤ µ, since $y_j = 1$
by assumption.
Since V is a minimal true point of $f_{k+1}$, there exists (by Theorem 8.24) a
minimal true point of f of the form (V, W). Moreover, $(V, W) <_L Y$, since
$v_1 = y_1, v_2 = y_2, \ldots, v_{\mu-1} = y_{\mu-1}, v_\mu = 0 < 1 = y_\mu$. Hence, if Z denotes the
minimal true point of f immediately preceding Y, then $(V, W) \leq_L Z <_L Y$. From j ≤ µ,
it now follows easily that $(v_1, v_2, \ldots, v_{j-1}) = (z_1, z_2, \ldots, z_{j-1}) = (y_1, y_2, \ldots, y_{j-1})$,
as required. □

As announced earlier in the section, we are now ready to present a complete
characterization of the maximal false points of a regular function in terms of its
minimal true points.
Theorem 8.27. Let $f(x_1, x_2, \ldots, x_n)$ be regular with respect to $(x_1, x_2, \ldots, x_n)$.
Assume that f is not identically equal to 0, and let $Y^1, Y^2, \ldots, Y^m$ be its minimal
true points, labeled in such a way that $Y^1 <_L Y^2 <_L \cdots <_L Y^m$. A point $X^* \in B^n$
is a maximal false point of f if and only if there exists a minimal true point $Y^i$
(1 ≤ i ≤ m) and an index j ∈ {1, 2, ..., n} such that
(a) either i = 1 or $(y_1^{i-1}, y_2^{i-1}, \ldots, y_{j-1}^{i-1}) \neq (y_1^i, y_2^i, \ldots, y_{j-1}^i)$;
(b) $x_k^* = y_k^i$ for k = 1, 2, ..., j − 1;
(c) $x_j^* = 0$ and $y_j^i = 1$;
(d) $x_k^* = 1$ for k = j + 1, ..., n.
Proof. Let $X^* \in B^n$, $X^* \neq (1, \ldots, 1)$, and let j be the largest index such that $x_j^* = 0$.
By Theorem 8.23, $X^*$ is a maximal false point of f if and only if $(x_1^*, \ldots, x_j^*)$ is a
maximal false point of $f_{j+1}$. Hence, by Theorem 8.22, $X^*$ is a maximal false point
of f if and only if $(x_1^*, \ldots, x_{j-1}^*, 1)$ is a minimal true point of $f_{j+1}$. The proof is
now easily completed by referring to Theorem 8.26. □

An efficient dualization algorithm for regular functions can be immediately
deduced from Theorem 8.27. To efficiently test condition (a) in the statement of this
theorem, it is convenient to compute, in a preprocessing phase of the algorithm, the
smallest index $\nu_i$ on which $Y^i$ differs from $Y^{i-1}$, for i = 2, 3, ..., m. By convention,
we let $\nu_1 = 0$. Then, condition (a) can be simply replaced by
(a') $\nu_i < j$,
and the algorithm can be stated as in Figure 8.3.
Procedure DualReg(f)
Input: The list of minimal true points $Y^1, Y^2, \ldots, Y^m$ of a regular function
       $f(x_1, x_2, \ldots, x_n)$ such that $x_1 \succeq_f x_2 \succeq_f \cdots \succeq_f x_n$.
Output: The list L of all maximal false points of f.

begin
    sort the points $Y^i$ (i = 1, 2, ..., m) in lexicographic order;
    {comment: assume without loss of generality that $Y^1 <_L Y^2 <_L \cdots <_L Y^m$};
    $\nu_1 := 0$;
    for i = 2 to m do $\nu_i := \min\{k : y_k^{i-1} < y_k^i\}$;
    initialize L := empty list;
    for i = 1 to m and for j = 1 to n do
        if $y_j^i = 1$ and $\nu_i < j$ then
        begin
            for k = 1 to j − 1 do $x_k^* := y_k^i$;
            $x_j^* := 0$;
            for k = j + 1 to n do $x_k^* := 1$;
            add $X^*$ to L;
        end
    return L;
end

Figure 8.3. Procedure DualReg.

We are now finally ready for the main result of this section.
Theorem 8.28. The procedure DualReg(f) is correct and can be implemented
to run in time $O(n^2 m)$, where n is the number of variables and m is the number
of minimal true points of f.
Proof. The correctness of the procedure follows from Theorem 8.27. As for its
complexity, notice that the minimal true points of f can be lexicographically ordered
in time O(nm) (see for instance Aho, Hopcroft, and Ullman [11]). The parameters
$\nu_1, \nu_2, \ldots, \nu_m$ can be simultaneously computed on the run. Each execution of the
(for i, for j) loop requires O(n) operations, thus leading to the overall $O(n^2 m)$
time bound. □
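A direct transcription of the procedure DualReg into Python might look as follows. It is a sketch under stated assumptions: points are distinct 0/1 tuples, f is regular with variables in strength order and not identically 0, and Python's built-in comparison sort replaces the linear-time lexicographic sort cited in the proof, so the sorting step costs O(nm log m) here.

```python
def dual_reg(n, min_true):
    """List the maximal false points of a regular function from its
    minimal true points, following the characterization of Theorem 8.27."""
    Y = sorted(min_true)            # tuples are compared lexicographically
    L = []
    for i, y in enumerate(Y):
        if i == 0:
            nu = 0                  # convention for the first point
        else:
            prev = Y[i - 1]
            # smallest (1-based) index on which y differs from its predecessor
            nu = next(k for k in range(1, n + 1) if prev[k - 1] != y[k - 1])
        for j in range(1, n + 1):
            if y[j - 1] == 1 and nu < j:
                # copy y up to position j-1, put 0 in position j, then all 1s
                L.append(y[:j - 1] + (0,) + (1,) * (n - j))
    return L


# Minimal true points of the function of Example 8.14
Y = [(0, 1, 0, 1, 1), (0, 1, 1, 0, 0), (1, 0, 0, 1, 0),
     (1, 0, 1, 0, 0), (1, 1, 0, 0, 0)]
for x in dual_reg(5, Y):
    print(x)
```

On the function of Example 8.14, only the first and third minimal true points contribute, and the four points produced coincide with those found by DualReg0.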

An interesting feature of the procedure DualReg is worth stressing here.
Namely, in each execution of the (for i, for j) loop, at most one maximal false
point is identified and added to the list L. Explicitly producing, that is, writing
up this false point, requires O(n) operations. But in fact, the point is implicitly
identified in constant time by simply testing whether $y_j^i = 1$ and $\nu_i < j$ (this is, of
course, a direct consequence of Theorem 8.27). It is this feature of DualReg that
allows Bertolazzi and Sassano [74] to solve regular set covering problems in time
O(nm), as we explain in Section 8.6.
We close this section with a bound on the size of the dual of a regular function
(see Bertolazzi and Sassano [74] and Crama [225]):
Theorem 8.29. If $f(x_1, x_2, \ldots, x_n)$ is a regular function with minimal true points
$Y^1, Y^2, \ldots, Y^m$, then the number of maximal false points of f is exactly

$$p = \sum_{i=1}^{m} \sum_{j=1}^{n} \{ y_j^i : \nu_i < j \},$$

where $\nu_i$ is defined as in DualReg. In particular, p ≤ |f|, where |f| is the number
of literals in the complete DNF of f, and p ≤ (n − 1)m when m > 1.

Proof. This is a straightforward corollary of Theorem 8.27. □

Theorem 8.29 strengthens the result of Peled and Simeone [735] mentioned in
the introduction of this section. Further refinements of the bound can be found in
[105, 225, 735].
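The counting formula of Theorem 8.29 is easy to evaluate numerically. The sketch below, which assumes 0/1 tuples and uses a hypothetical function name, computes p for the function of Example 8.14 and compares it with the two bounds stated in the theorem.

```python
def count_maximal_false_points(n, min_true):
    """Evaluate the formula of Theorem 8.29: for each minimal true point,
    count its components equal to 1 at positions beyond nu_i."""
    Y = sorted(min_true)
    p = 0
    for i, y in enumerate(Y):
        if i == 0:
            nu = 0
        else:
            nu = next(k for k in range(1, n + 1) if Y[i - 1][k - 1] != y[k - 1])
        p += sum(y[j - 1] for j in range(nu + 1, n + 1))
    return p


# Minimal true points of the function of Example 8.14
Y = [(0, 1, 0, 1, 1), (0, 1, 1, 0, 0), (1, 0, 0, 1, 0),
     (1, 0, 1, 0, 0), (1, 1, 0, 0, 0)]
n, m = 5, len(Y)
p = count_maximal_false_points(n, Y)
literals = sum(sum(y) for y in Y)    # |f|: number of literals in the complete DNF
print(p, literals, (n - 1) * m)
```

For this function the count is 4, in agreement with the four maximal false points of Example 8.14, while the two upper bounds evaluate to 11 and 20.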

8.6 Regular set covering problems


We deal in this section with the set covering problem (SCP):

$$
\begin{aligned}
\text{maximize} \quad & z(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} c_i x_i & (8.10) \\
\text{subject to} \quad & f(x_1, x_2, \ldots, x_n) = 0 & (8.11) \\
& (x_1, x_2, \ldots, x_n) \in B^n, & (8.12)
\end{aligned}
$$

where f is a positive Boolean function expressed in complete (prime irredundant)
disjunctive normal form and $c_i \geq 0$ for i = 1, 2, ..., n (see, e.g., Section 1.13.6,
Application 4.4 in Section 4.2, and Application 8.2 in Section 8.2). We are more
particularly interested in the special case in which f is regular. When this is the
case, we say that SCP is a regular set covering problem (RSCP).
Since we know that at least one optimal solution of RSCP is to be found among
the maximal false points of f, we immediately conclude that RSCP is solved in
polynomial time by the procedure RegCover0 in Figure 8.4.

Theorem 8.30. The procedure RegCover0(c, f) is correct and can be implemented
to run in time $O(n^2 m)$, where n is the number of variables and m is the
number of prime implicants of f.

Proof. The procedure is obviously correct. It can easily be implemented to run in
time $O(n^2 m)$ if the dualization algorithm DualReg is used to generate the maximal
false points of f (see Theorem 8.28). □

Procedure RegCover0(c, f)
Input: A vector $(c_1, c_2, \ldots, c_n)$ of integer coefficients and a regular function
       $f(x_1, x_2, \ldots, x_n)$ in complete disjunctive normal form.
Output: An optimal solution of the instance of RSCP defined by $(c_1, c_2, \ldots, c_n)$ and f.

begin
    generate all maximal false points of f;
    evaluate the value of each maximal false point and return the best one;
end

Figure 8.4. Procedure RegCover0.

Example 8.15. Consider the regular set covering problem:

$$
\begin{aligned}
\text{maximize} \quad & z(x_1, x_2, x_3, x_4, x_5) = 3x_1 + 2x_2 + x_3 + x_4 + 2x_5 \\
\text{subject to} \quad & f = x_1 x_2 \vee x_1 x_3 \vee x_1 x_4 \vee x_2 x_3 \vee x_2 x_4 x_5 = 0 \\
& (x_1, x_2, x_3, x_4, x_5) \in B^5.
\end{aligned}
$$

The maximal false points of f have been computed in Example 8.14; they are
$X^1 = (0, 1, 0, 1, 0)$, $X^2 = (0, 1, 0, 0, 1)$, $X^3 = (1, 0, 0, 0, 1)$ and $X^4 = (0, 0, 1, 1, 1)$.
Their respective values are $z(X^1) = 3$, $z(X^2) = 4$, $z(X^3) = 5$ and $z(X^4) = 4$. So,
$X^3$ is an optimal solution for this instance of RSCP. □
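The second step of RegCover0 amounts to a plain maximization over an explicit list. As a minimal sketch, assuming the maximal false points have already been generated (here they are copied from Example 8.14) and using the illustrative names `value` and `reg_cover0`:

```python
def value(c, x):
    """Objective value z(x) = sum of c_i * x_i."""
    return sum(ci * xi for ci, xi in zip(c, x))


def reg_cover0(c, max_false_points):
    """Return a maximal false point of largest value, with that value."""
    best = max(max_false_points, key=lambda x: value(c, x))
    return best, value(c, best)


# Data of Example 8.15
c = (3, 2, 1, 1, 2)
X = [(0, 1, 0, 1, 0), (0, 1, 0, 0, 1), (1, 0, 0, 0, 1), (0, 0, 1, 1, 1)]
print([value(c, x) for x in X])
print(reg_cover0(c, X))
```

The values computed are 3, 4, 5 and 4, and the point returned is $X^3 = (1, 0, 0, 0, 1)$ with value 5, as in the example.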

The first polynomial-time algorithm for RSCP was obtained by Peled and
Simeone [735], based on the general approach outlined in procedure DualReg0.
The complexity of their algorithm is $O(n^3 m)$, since this is also the complexity
of the dualization algorithm proposed in [735]. The better time bound mentioned
in Theorem 8.30 immediately results from the improvements brought by Crama
[225] or Bertolazzi and Sassano [74] to the efficiency of dualization procedures
for regular functions.
However, as shown by Bertolazzi and Sassano [74], Hammer and Simeone
[462], or Peled and Simeone [736], even faster algorithms (with complexity
O(nm)) can be obtained for RSCP by exploiting a slightly different idea: Namely,
these authors manage to replace the explicit generation of the maximal false points
of f by their implicit generation, and to compute in constant time the value z(X)
of each such point. In Bertolazzi and Sassano [74], this idea is implemented via
a simple adaptation of the dualization algorithm DualReg. This leads to the
procedure RegCover shown in Figure 8.5. In this procedure, the variable best keeps
track of the value of the best point found so far, and $i^*$, $j^*$ are the values of i and
j describing this point, as in Theorem 8.27.
Procedure RegCover(c, f)
Input: A vector $(c_1, c_2, \ldots, c_n)$ of nonnegative integer coefficients and the list of
       minimal true points $Y^1, Y^2, \ldots, Y^m$ of a regular function $f(x_1, x_2, \ldots, x_n)$
       such that $x_1 \succeq_f x_2 \succeq_f \cdots \succeq_f x_n$.
Output: An optimal solution of the instance of RSCP defined by $(c_1, c_2, \ldots, c_n)$ and f.

begin
    best := −1;
    $S := \sum_{j=1}^{n} c_j$;
    sort the points $Y^i$ (i = 1, 2, ..., m) in lexicographic order;
    {comment: assume without loss of generality that $Y^1 <_L Y^2 <_L \cdots <_L Y^m$};
    $\nu_1 := 0$;
    for i = 2 to m do $\nu_i := \min\{k : y_k^{i-1} < y_k^i\}$;
    {comment: compute the value of each maximal false point};
    for i = 1 to m do
    begin
        C := S;
        for j = 1 to n do
        begin
            if $y_j^i = 0$ then $C := C - c_j$;
            if $y_j^i = 1$ and $\nu_i < j$ and $C - c_j >$ best then
            begin
                best := $C - c_j$;
                $i^* := i$;
                $j^* := j$;
            end
        end
    end
    return $(y_1^{i^*}, y_2^{i^*}, \ldots, y_{j^*-1}^{i^*}, 0, 1, 1, \ldots, 1)$;
end

Figure 8.5. Procedure RegCover.

The meaning of the computations carried out in RegCover is revealed in the
following proof.

Theorem 8.31. The procedure RegCover(c, f) is correct and can be implemented
to run in time O(nm), where n is the number of variables and m is the
number of prime implicants of f.

Proof. It is trivial to verify that, at the beginning of an arbitrary (for j) loop (that
is, just after the counter j has been increased), the value of C is given as

$$C = \sum_{k < j} c_k y_k^i + \sum_{k \geq j} c_k.$$

On the other hand, in view of Theorem 8.27 and the comments that follow it, we
know that the maximal false points of f are all points of the form

$$X^* = (y_1^i, y_2^i, \ldots, y_{j-1}^i, 0, 1, \ldots, 1)$$

such that $y_j^i = 1$ and $\nu_i < j$. Thus, if $X^* = (y_1^i, y_2^i, \ldots, y_{j-1}^i, 0, 1, \ldots, 1)$ is such a
point, then $C - c_j$ is precisely the value of $z(X^*)$. It follows easily that RegCover
returns a maximal false point with maximum value.
The complexity analysis is straightforward. □

Example 8.16. Let us consider again the set covering instance given in Example 8.15,
and let us run RegCover on this instance. The minimal true points of f are
(in lexicographic order): $Y^1 = (0, 1, 0, 1, 1)$, $Y^2 = (0, 1, 1, 0, 0)$, $Y^3 = (1, 0, 0, 1, 0)$,
$Y^4 = (1, 0, 1, 0, 0)$, and $Y^5 = (1, 1, 0, 0, 0)$. So, $\nu_1 = 0$, $\nu_2 = 3$, $\nu_3 = 1$, $\nu_4 = 3$,
$\nu_5 = 2$. The sum of the objective function coefficients is S = 9, and we initially set
best := −1.
For i = 1 and for j = 1 to 5, we successively obtain
j = 1: $y_1^1 = 0$ ⟹ $C := 9 - c_1 = 6$;
j = 2: $y_2^1 = 1$ and $\nu_1 < 2$ and $C - c_2 = 4 >$ best ⟹ best := 4, $i^* := 1$, $j^* := 2$;
j = 3: $y_3^1 = 0$ ⟹ $C := 6 - c_3 = 5$;
j = 4: $y_4^1 = 1$ and $C - c_4 = 4 \leq$ best ⟹ no update;
j = 5: $y_5^1 = 1$ and $C - c_5 = 3 \leq$ best ⟹ no update.
No better solution is found for i = 2, since $\nu_2 \geq j$ whenever $y_j^2 = 1$.
For i = 3, we get:
j = 1: $y_1^3 = 1$ and $\nu_3 \geq j$ ⟹ no update;
j = 2: $y_2^3 = 0$ ⟹ $C := 9 - c_2 = 7$;
j = 3: $y_3^3 = 0$ ⟹ $C := 7 - c_3 = 6$;
j = 4: $y_4^3 = 1$ and $\nu_3 < 4$ and $C - c_4 = 5 >$ best ⟹ best := 5, $i^* := 3$, $j^* := 4$.
We leave it to the reader to continue the execution of RegCover on this example
and to verify that no further updates of best, $i^*$ and $j^*$ take place. So, the solution
returned by the algorithm is

$$(y_1^{i^*}, y_2^{i^*}, \ldots, y_{j^*-1}^{i^*}, 0, 1, \ldots, 1) = (y_1^3, y_2^3, y_3^3, 0, 1) = (1, 0, 0, 0, 1),$$

with an objective function value of 5. □
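The run traced in Example 8.16 can be reproduced with the following sketch of RegCover. The assumptions are the same as for the procedure itself (f regular with variables in strength order, nonnegative coefficients, f not constant), plus two of ours: points are 0/1 tuples, and Python's comparison sort stands in for a linear-time lexicographic sort, so only the main double loop is guaranteed to run in time O(nm).

```python
def reg_cover(c, min_true):
    """Return (best point, best value) over the maximal false points of f,
    without generating those points explicitly."""
    n = len(c)
    Y = sorted(min_true)                  # lexicographic order
    S = sum(c)
    best, best_i, best_j = -1, None, None
    for i, y in enumerate(Y):
        if i == 0:
            nu = 0
        else:
            nu = next(k for k in range(1, n + 1) if Y[i - 1][k - 1] != y[k - 1])
        C = S
        for j in range(1, n + 1):
            if y[j - 1] == 0:
                C -= c[j - 1]
            elif nu < j and C - c[j - 1] > best:
                # C - c_j is the value of (y_1, ..., y_{j-1}, 0, 1, ..., 1)
                best, best_i, best_j = C - c[j - 1], i, j
    y = Y[best_i]
    point = y[:best_j - 1] + (0,) + (1,) * (n - best_j)
    return point, best


# Data of Examples 8.15 and 8.16
c = (3, 2, 1, 1, 2)
Y = [(0, 1, 0, 1, 1), (0, 1, 1, 0, 0), (1, 0, 0, 1, 0),
     (1, 0, 1, 0, 0), (1, 1, 0, 0, 0)]
print(reg_cover(c, Y))
```

Only the optimal point is written out explicitly at the end, which is the reason why each candidate can be evaluated in constant time inside the loop.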

Further connections between regular functions and set covering problems can be
found, for instance, in Balas [42], Hammer, Johnson and Peled [443, 444], Laurent
and Sassano [602], Wolsey [922], etc. (see also Section 8.7.3 and Chapter 9).

8.7 Regular minorants and majorants


In view of the computational tractability of regular functions, it may be of interest
to approximate a given nonregular function by a regular one. We deal in this section
with a restricted form of these problems in which the approximating function is
required to be either a majorant or a minorant of the original one, and in which
the strength ordering of the approximant is imposed. Thus, we state as follows
the problem to be tackled: Given a positive function f (x1 , x2 , . . . , xn ), find two
positive functions f− (x1 , x2 , . . . , xn ) and f + (x1 , x2 , . . . , xn ) such that

• f− and f + are both regular with respect to (x1 , x2 , . . . , xn ); (8.13)
• f− ≤ f ≤ f + ; (8.14)
• f− and f + are “closest” to f among all functions satisfying (8.13) and (8.14). (8.15)

The word “closest” in condition (8.15) needs to be further clarified: Before
we can speak of closeness, it may seem necessary to introduce first a notion of
distance between Boolean functions. However, this difficulty is easily avoided
in the present context. Indeed, as we prove next, there always exists a smallest
majorant and a largest minorant satisfying conditions (8.13) and (8.14). They will
play for us the roles of “closest majorant” and “closest minorant.” (Compare with
Exercise 13 in Chapter 1.)

Theorem 8.32. For every Boolean function f (x1 , x2 , . . . , xn ), there exist two
positive functions f R (x1 , x2 , . . . , xn ) and fR (x1 , x2 , . . . , xn ) such that

(a) fR and f R are both regular with respect to (x1 , x2 , . . . , xn );
(b) fR ≤ f ≤ f R ;
(c) if f− and f + are any two functions satisfying conditions (8.13) and (8.14),
then f− ≤ fR and f R ≤ f + .

Proof. Let us denote by L and U the sets of all positive functions such that (8.13)
and (8.14) are satisfied for all f− ∈ L and f + ∈ U . Observe that L and U are both
nonempty, since 0n ∈ L and 1n ∈ U . Define

\[
f_R = \bigvee_{f_- \in L} f_-, \qquad f^R = \bigwedge_{f^+ \in U} f^+ .
\]

Then, fR and f R trivially satisfy conditions (b) and (c), and Theorem 8.9
implies (a). 

The functions fR and f R introduced in Theorem 8.32 will be called the
largest regular minorant and the smallest regular majorant of f with respect
to (x1 , x2 , . . . , xn ), respectively.
Note that condition (a) in Theorem 8.32 cannot be replaced by the weaker
condition “fR and f R are regular” without further specification of the strength
ordering. This is illustrated by the next example.

Example 8.17. The function f = x1 x2 ∨ x3 x4 is not regular. The largest regular
minorant of f with respect to (x1 , x2 , x3 , x4 ) is fR = x1 x2 ∨ x1 x3 x4 ∨ x2 x3 x4 (see
Example 8.20 hereunder). Another regular minorant of f is g = x1 x2 x3 ∨ x1 x2 x4 ∨
x3 x4 , which is such that x3 ≈g x4 ≻g x1 ≈g x2 . But there is no regular minorant of
f which is larger than both fR and g. Indeed, assume that h is a minorant of f
such that fR ≤ h and g ≤ h. Then, fR ∨ g ≤ h ≤ f . However, fR ∨ g = f . Hence,
h = f , and h is not regular. 
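
The two facts used in Example 8.17, namely that fR and g are both minorants of f and that fR ∨ g = f, can be confirmed by enumerating B^4. The sketch below assumes functions are encoded as Python callables on 0/1 tuples; this encoding is chosen for illustration only.

```python
from itertools import product

# Functions of Example 8.17, written on 0/1 tuples (x[0] stands for x1, etc.)
f = lambda x: (x[0] & x[1]) | (x[2] & x[3])
fR = lambda x: (x[0] & x[1]) | (x[0] & x[2] & x[3]) | (x[1] & x[2] & x[3])
g = lambda x: (x[0] & x[1] & x[2]) | (x[0] & x[1] & x[3]) | (x[2] & x[3])

points = list(product((0, 1), repeat=4))
both_minorants = all(fR(x) <= f(x) and g(x) <= f(x) for x in points)
join_is_f = all((fR(x) | g(x)) == f(x) for x in points)
print(both_minorants, join_is_f)
```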

A useful characterization of the functions fR and f R can be derived from results
in Section 8.3 (recall in particular Theorem 8.15, and compare with Exercise 14
in Chapter 1).

Theorem 8.33. For every Boolean function f (x1 , x2 , . . . , xn ),

(i) fR is the unique function that is regular with respect to (x1 , x2 , . . . , xn ) and
that has the same set of ceilings as f; and
(ii) f R is the unique function that is regular with respect to (x1 , x2 , . . . , xn ) and
that has the same set of floors as f .

Proof. Let A be the set of ceilings of f , and let rA be defined as in Theorem 8.15.
We want to prove that fR = rA , that is, we want to prove that rA satisfies conditions
(a)-(c) in Theorem 8.32. Condition (a) follows from the definition of rA .
To obtain condition (b), let X∗ be a false point of f . Then, by Definition 8.5,
there exists a ceiling Y∗ ∈ A such that Y∗ ⪰ X∗ . It follows from (8.8) in the proof
of Theorem 8.15 that rA (X∗ ) = 0, and hence rA ≤ f .
Since rA is a regular minorant of f , we have rA ≤ fR . To see that rA = fR ,
consider now any point X∗ such that rA (X∗ ) = 0. By (8.8) in Theorem 8.15, there
exists Y∗ ∈ A such that Y∗ ⪰ X∗ . Since Y∗ is a ceiling of f , f (Y∗ ) = 0, and hence,
fR (Y∗ ) = 0. By Theorem 8.14, we conclude that fR (X∗ ) = 0.
This completes the proof of statement (i). The proof of the second statement is
similar. 

In the next subsections, we propose some algorithms for the computation of fR
and f R when f is positive (observe that Theorem 8.32 and Theorem 8.33 do not
depend on the positivity assumption). Before we turn to these problems, however,
we note that the size of the complete DNF of fR and of the complete DNF of f R
can be exponentially large in the size of the complete DNF of f , so that there is
no hope of computing fR and f R in polynomial time.

Example 8.18. Consider the function f = xn+1 ∨ . . . ∨ x2n on B 2n . Its unique max-
imal false point (and unique ceiling) is the characteristic vector of {1, 2, . . . , n},
that is, Y ∗ = (1, . . . , 1, 0, . . . , 0). Let A = {Y ∗ } and let F be the set of all points of
B2n with exactly n components equal to 1. Then, Y ∗ is a left-shift of every point
in F , and it follows from condition (8.8) in Theorem 8.15 that F is exactly the
set of maximal false points of rA = fR . Hence, the minimal true points of fR are
the points of B2n with exactly n + 1 components equal to 1, and their number is
exponential in the size of f . 

8.7.1 Largest regular minorant with respect to a given order


Consider a positive function f (x1 , x2 , . . . , xn ) = αxi xj ∨ βxi ∨ γ xj ∨ δ, where
i, j ∈ {1, 2, . . . , n} and α, β, γ , δ are positive DNFs that do not involve either xi or xj .
Hammer, Johnson, and Peled [443] introduced an operation (to be called (i, j )–
minorization) that transforms the function f into another positive function fij
defined by any of the following equivalent expressions:

\[
\begin{aligned}
f_{ij} &= f \wedge (x_i \vee f_{|x_i=1,\,x_j=0}) && (8.16)\\
&= f \wedge (x_i \vee \beta \vee \delta) && (8.17)\\
&= (\alpha \vee \gamma)\, x_i x_j \vee \beta\, x_i \vee \beta\gamma\, x_j \vee \delta, && (8.18)
\end{aligned}
\]

where we look at f|xi =1, xj =0 as a function of (x1 , x2 , . . . , xn ). We leave it to the
reader to verify that these expressions actually are equivalent. We say that fij is
the (i, j )–minor of f .
Example 8.19. The (1, 3)–minor of f = x1 x2 ∨ x3 x4 is

f13 = (x1 x2 ∨ x3 x4 ) (x1 ∨ x2 ) = x1 x2 ∨ x1 x3 x4 ∨ x2 x3 x4 . 
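
As an illustration of definition (8.16), the following sketch builds the (i, j)–minor of a function given as a Python callable on 0/1 tuples and compares it, point by point, with the DNF obtained in Example 8.19. The helper names `minor` and `same_function` are hypothetical, and the exhaustive comparison is only practical for small n.

```python
from itertools import product

def minor(f, i, j):
    """Return f_ij = f AND (x_i OR f|x_i=1,x_j=0), following (8.16).

    Indices i and j are 1-based, as in the text.
    """
    def f_ij(x):
        z = list(x)
        z[i - 1], z[j - 1] = 1, 0            # the restriction x_i = 1, x_j = 0
        return f(x) & (x[i - 1] | f(tuple(z)))
    return f_ij

def same_function(f, g, n):
    """Compare two functions on every point of B^n."""
    return all(f(x) == g(x) for x in product((0, 1), repeat=n))

f = lambda x: (x[0] & x[1]) | (x[2] & x[3])
f13_expected = lambda x: ((x[0] & x[1]) | (x[0] & x[2] & x[3])
                          | (x[1] & x[2] & x[3]))
print(same_function(minor(f, 1, 3), f13_expected, 4))
```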


The next result shows that fij is the largest positive minorant of f for which xi
is stronger than xj .

Theorem 8.34. Let f (x1 , x2 , . . . , xn ) be a positive Boolean function and let i, j be
any two indices in {1, 2, . . . , n}. Then,

(a) fij ≤ f ;
(b) xi is stronger than xj with respect to fij ;
(c) if g(x1 , x2 , . . . , xn ) is a positive function such that g ≤ f and xi ⪰g xj , then
g ≤ fij .

Proof. Assertions (a) and (b) are easily verified. Suppose now that g ≤ f and
xi ⪰g xj . Let Y ∈ Bn . We must show that g(Y ) ≤ fij (Y ).
If yi = 1, then fij (Y ) = f (Y ) by (8.16), and hence, g(Y ) ≤ fij (Y ).
If yi = yj = 0, then

\[
\begin{aligned}
f_{ij}(Y) &= f(Y) \wedge f(Y \vee e_i) && \text{(by (8.16))}\\
&= f(Y) && \text{(by positivity of } f\text{)}\\
&\ge g(Y).
\end{aligned}
\]

If yi = 0 and yj = 1, let Z ∈ Bn be such that Y = Z ∨ ej and zj = 0. Then,

\[
\begin{aligned}
f_{ij}(Y) &= f(Y) \wedge f(Z \vee e_i) && \text{(by (8.16))}\\
&\ge g(Y) \wedge g(Z \vee e_i)\\
&\ge g(Y) \wedge g(Z \vee e_j) && \text{(because } x_i \succeq_g x_j\text{)}\\
&= g(Y).
\end{aligned}
\]

Theorem 8.34 suggests the procedure RegMinor0 displayed in Figure 8.6.
Hammer, Johnson, and Peled [443] proved:

Procedure RegMinor0(f )
Input: A positive Boolean function f (x1 , x2 , . . . , xn ).
Output: fR , the largest regular minorant of f with respect to (x1 , x2 , . . . , xn ).

begin
    fR := f ;
    while there is a pair of variables xi , xj such that i < j and
          xi is not stronger than xj with respect to fR
    do fR := (fR )ij ;   { the (i, j )–minor of the current function fR }
    return fR ;
end

Figure 8.6. Procedure RegMinor0.


Theorem 8.35. The procedure RegMinor0(f ) is correct, that is, it stops for
every input, and it returns the largest regular minorant of f with respect to
(x1 , x2 , . . . , xn ).
Proof. It follows from Theorem 8.34 that, if xi is not stronger than xj with respect
to f , then fij < f . Thus, the sequence of functions produced in the while loop is
strictly decreasing, and it must terminate. Denote by g the output of the procedure,
and denote by fR the largest regular minorant of f with respect to (x1 , x2 , . . . , xn )
(its existence is guaranteed by Theorem 8.32). We must show that g = fR .
By construction, g is regular with respect to (x1 , x2 , . . . , xn ). Thus, g ≤ fR by
definition of fR .
On the other hand, Theorem 8.34(c) implies (by induction) that fR ≤ f ′ for
each of the functions f ′ produced in the course of the procedure. In particular,
fR ≤ g, and this completes the proof.

Example 8.20. Consider the function f = x1 x2 ∨ x3 x4 , and note that x1 and x3
are not comparable with respect to f . The (1, 3)–minor of f has already been
computed in Example 8.19. Since f13 is regular with respect to (x1 , x2 , x3 , x4 ), we
conclude that f13 is the largest regular minorant of f with respect to this order of
the variables.

Regarding the computational complexity of RegMinor0, note that the number
of iterations of the while statement, that is, the number of (i, j )–minorization steps
to be executed, does not appear to be polynomially bounded. Indeed, it may very
well happen that a pairwise strength relation imposed in some minorization step
is destroyed in a further step, and hence, needs to be reestablished later on. This
possibility is illustrated by the following example.

Example 8.21. The strength preorder of the function f (x1 , x2 , x3 , x4 ) = x1 x2 ∨
x3 ∨ x4 is given as x3 ≈f x4 ≻f x1 ≈f x2 . Assume that we want to produce fR ,
namely, the largest regular minorant of f with respect to (x1 , x2 , x3 , x4 ). We start
the execution of RegMinor0 by performing a (1, 3)–minorization step on f , thus
producing f13 = x1 x2 ∨ x1 x3 ∨ x2 x3 ∨ x4 . Observe now that x4 is strictly stronger
than x3 with respect to f13 . Thus, the relation “x3 is stronger than x4 ,” which holds
for f and is to hold for fR , has been temporarily lost for f13 .

It is possible, however, to carry out the (i, j )–minorization steps in such a
way that, once established, the strength relation among a pair of variables will
not be spoiled in a later stage. More precisely, consider the following minoriza-
tion strategy: Impose first x1 ⪰f xj for all j ≥ 2, then x2 ⪰f xj for all j ≥ 3,
then x3 ⪰f xj for all j ≥ 4, and so on. The next result shows that, if we adopt
this strategy, then no relation xi ⪰f xj ever needs to be imposed twice; indeed,
all relations valid at some stage of the procedure (and expressed by conditions
(a), (b), (c) in the next statement) remain valid after a subsequent minorization
step.

Theorem 8.36. Let f (x1 , x2 , . . . , xn ) be a positive function, and let i, j be two
indices in {1, 2, . . . , n}, i < j , such that the following conditions hold with respect
to f :

(a) x1 ⪰ x2 ⪰ · · · ⪰ xi−1 ;
(b) For all k ∈ {i, i + 1, . . . , n}, xi−1 ⪰ xk ;
(c) For all k ∈ {i + 1, i + 2, . . . , j − 1}, xi ⪰ xk .

Then, conditions (a), (b), and (c) also hold with respect to fij .

Proof. Let h = fij and g = xi ∨ f|xi =1,xj =0 , so that h = f ∧ g (see (8.16)).
(a) To see that condition (a) holds with respect to h, suppose that 1 ≤ k < r ≤ i − 1.
Theorem 8.7 implies that xk ⪰ xr with respect to f|xi =1,xj =0 . So, xk ⪰f xr , xk ⪰g xr ,
and xk ⪰h xr follows by virtue of Theorem 8.9.
(b) To establish condition (b), consider first the case in which k ≠ i and k ≠ j .
Then, the same argument used in (a) shows that xi−1 ⪰h xk .
Consider next the case k = i, and let Y ∈ Bn be such that yi−1 = yi = 0. We must
show that h(Y ∨ ei ) ≤ h(Y ∨ ei−1 ). Now,

\[
h(Y \vee e_i) = f(Y \vee e_i)
\]

and

\[
\begin{aligned}
h(Y \vee e_{i-1}) &= f(Y \vee e_{i-1}) \wedge g(Y \vee e_{i-1})\\
&= f(Y \vee e_{i-1}) \wedge f_{|x_i=1,\,x_j=0}(Y \vee e_{i-1})\\
&= f(Y \vee e_{i-1}) \wedge f_{|x_j=0}(Y \vee e_{i-1} \vee e_i).
\end{aligned}
\]

There are two distinct subcases. If yj = 0, then by positivity of f

\[
f_{|x_j=0}(Y \vee e_{i-1} \vee e_i) = f(Y \vee e_{i-1} \vee e_i) \ge f(Y \vee e_i).
\]

On the other hand, if yj = 1, then

\[
f_{|x_j=0}(Y \vee e_{i-1} \vee e_i) \ge f(Y \vee e_i)
\]

since xi−1 ⪰f xj . In either case, we get

\[
\begin{aligned}
h(Y \vee e_{i-1}) &\ge f(Y \vee e_{i-1}) \wedge f(Y \vee e_i)\\
&= f(Y \vee e_i) && \text{(since } x_{i-1} \succeq_f x_i\text{)}\\
&= h(Y \vee e_i),
\end{aligned}
\]

so that xi−1 ⪰h xi as required.
Consider finally the case k = j . Here, the relation xi−1 ⪰h xj directly follows
from xi−1 ⪰h xi (which we just established) and from xi ⪰h xj (which follows from
the definition of h = fij ).

Procedure RegMinor(f )
Input: A positive Boolean function f (x1 , x2 , . . . , xn ).
Output: fR , the largest regular minorant of f with respect to (x1 , x2 , . . . , xn ).

begin
    fR := f ;
    for i = 1 to n − 1 do
        for j = i + 1 to n do
            if xi is not stronger than xj with respect to fR
            then fR := (fR )ij ;   { the (i, j )–minor of the current function fR }
    return fR ;
end

Figure 8.7. Procedure RegMinor.

(c) We want to show that, for k ∈ {i + 1, i + 2, . . . , j − 1}, xi ⪰h xk . Let Y ∈ Bn be
such that yi = yk = 0. Then,

\[
\begin{aligned}
h(Y \vee e_k) &= f(Y \vee e_k) \wedge f_{|x_i=1,\,x_j=0}(Y \vee e_k)\\
&\le f(Y \vee e_k)\\
&\le f(Y \vee e_i) && \text{(since } x_i \succeq_f x_k\text{)}\\
&= h(Y \vee e_i),
\end{aligned}
\]

and hence, xi ⪰h xk as required.

Theorem 8.36 suggests the specialization of RegMinor0 described in
Figure 8.7.

Theorem 8.37. The procedure RegMinor(f ) is correct and performs O(n^2)
minorization steps, where n is the number of variables of f .

Proof. The correctness of the procedure is implied by Theorem 8.36, and the bound
on the number of minorization steps is trivial. 

Note that, despite the fact that the number of minorization steps performed by
RegMinor is small, this procedure necessarily runs in exponential (input) time,
in view of Example 8.18. It is not clear, however, whether the procedure runs in
polynomial total time, that is, in time polynomial in |f | + |fR | (see Appendix B).
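
For very small n, the procedure of Figure 8.7 can be tried out directly on truth tables. The sketch below is an illustration under that assumption, not the DNF-based procedure analyzed in the text: it stores a function as a dictionary from points of B^n to 0/1, so its running time is exponential in n by construction. All helper names are hypothetical.

```python
from itertools import product

def truth_table(f, n):
    """Tabulate a callable on 0/1 tuples over all of B^n."""
    return {x: f(x) for x in product((0, 1), repeat=n)}

def put(x, k, v):
    """Copy of point x with coordinate k (1-based) set to v."""
    return x[:k - 1] + (v,) + x[k:]

def stronger(t, i, j):
    """True if x_i is stronger than x_j with respect to the table t."""
    return all(t[put(x, i, 1)] >= t[put(x, j, 1)]
               for x in t if x[i - 1] == 0 and x[j - 1] == 0)

def minor(t, i, j):
    """(i, j)-minor via (8.16): f AND (x_i OR f|x_i=1,x_j=0)."""
    return {x: t[x] & (x[i - 1] | t[put(put(x, i, 1), j, 0)]) for x in t}

def reg_minor(t, n):
    """Loop structure of Figure 8.7, applied to a truth table."""
    for i in range(1, n):
        for j in range(i + 1, n + 1):
            if not stronger(t, i, j):
                t = minor(t, i, j)      # minor of the current function
    return t

# Example 8.20: the result should be x1x2 v x1x3x4 v x2x3x4.
f = truth_table(lambda x: (x[0] & x[1]) | (x[2] & x[3]), 4)
f13 = truth_table(lambda x: (x[0] & x[1]) | (x[0] & x[2] & x[3])
                  | (x[1] & x[2] & x[3]), 4)
print(reg_minor(f, 4) == f13)
```

Applied to the function of Example 8.21, the same loop performs a (1, 3)– and then a (1, 4)–minorization step and stops at the function that is true exactly on the points with at least two components equal to 1.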

8.7.2 Smallest regular majorant with respect to a given order


We start with an easy observation:

Theorem 8.38. The smallest regular majorant of a function f (x1 , x2 , . . . , xn ) with
respect to a given order of the variables is the dual of the largest regular minorant
of f d with respect to the same order of the variables.

Proof. Let (x1 , x2 , . . . , xn ) be the given order, and let g d be the largest regular
minorant of f d with respect to (x1 , x2 , . . . , xn ). We must prove that g is the smallest
regular majorant of f with respect to (x1 , x2 , . . . , xn ).
First, g d ≤ f d implies f ≤ g. Next, since g d is regular with respect to
(x1 , x2 , . . . , xn ), g is regular with the same strength preorder (by Theorem 8.10).
Finally, if g is not the smallest regular majorant of f with respect to (x1 , x2 , . . . , xn ),
then there exists another regular function h, with the same strength preorder, such
that f ≤ h < g. But then, g d < h d ≤ f d , contradicting the definition of g d .

According to Theorem 8.38, everything there is to know about smallest regular
majorants can easily be derived from the corresponding results concerning largest
regular minorants. In particular, a procedure similar to RegMinor can be devel-
oped for the computation of the smallest regular majorant of a positive function
with respect to a given order, based on the following ideas.
Consider a positive function f (x1 , x2 , . . . , xn ) = αxi xj ∨ βxi ∨ γ xj ∨ δ, where
i, j ∈ {1, 2, . . . , n} and α, β, γ , δ are positive DNFs that do not involve either xi or
xj . We define the (i, j )–major of f as the function f ij (x1 , x2 , . . . , xn ) represented
by any of the following equivalent expressions:

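\[
\begin{aligned}
f^{ij} &= f \vee (x_i \wedge f_{|x_i=0,\,x_j=1}) && (8.19)\\
&= f \vee \gamma\, x_i && (8.20)\\
&= \alpha\, x_i x_j \vee (\beta \vee \gamma)\, x_i \vee \gamma\, x_j \vee \delta. && (8.21)
\end{aligned}
\]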

Paraphrasing the statement of Theorem 8.34, we obtain the following result due
to Hammer and Mahadev [452].

Theorem 8.39. Let f (x1 , x2 , . . . , xn ) be a positive Boolean function, and let i, j
be two indices in {1, 2, . . . , n}. Then,

(a) f ≤ f ij ;
(b) xi is stronger than xj with respect to f ij ;
(c) if g(x1 , x2 , . . . , xn ) is a positive function such that f ≤ g and xi ⪰g xj , then
f ij ≤ g.

Proof. This can be proved either by a duality argument or by adapting the proof of
Theorem 8.34. Details are left to the reader. 
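
The duality argument can be made concrete on a small instance: the (i, j)–major of f defined by (8.19) should coincide with the dual of the (i, j)–minor of f d. The sketch below checks this on truth tables for f = x1 x2 ∨ x3 x4 and (i, j) = (1, 3); the dictionary encoding and the helper names are assumptions made for illustration.

```python
from itertools import product

def truth_table(f, n):
    return {x: f(x) for x in product((0, 1), repeat=n)}

def put(x, k, v):
    """Copy of point x with coordinate k (1-based) set to v."""
    return x[:k - 1] + (v,) + x[k:]

def minor(t, i, j):
    """(i, j)-minor via (8.16): f AND (x_i OR f|x_i=1,x_j=0)."""
    return {x: t[x] & (x[i - 1] | t[put(put(x, i, 1), j, 0)]) for x in t}

def major(t, i, j):
    """(i, j)-major via (8.19): f OR (x_i AND f|x_i=0,x_j=1)."""
    return {x: t[x] | (x[i - 1] & t[put(put(x, i, 0), j, 1)]) for x in t}

def dual(t):
    """Dual function: f^d(X) = 1 - f(complement of X)."""
    return {x: 1 - t[tuple(1 - v for v in x)] for x in t}

f = truth_table(lambda x: (x[0] & x[1]) | (x[2] & x[3]), 4)
via_duality = dual(minor(dual(f), 1, 3))
print(major(f, 1, 3) == via_duality)
```

For this f, expression (8.20) gives the major x1 x2 ∨ x3 x4 ∨ x1 x4 , since here γ = x4 .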

Similarly to the minorization case, Theorem 8.39 leads to an algorithm that
produces the smallest regular majorant of an arbitrary positive function, and this
algorithm can be implemented to perform O(n^2) majorization steps.

8.7.3 Regular minorization and set covering problems


We conclude this section by indicating how the concept of regular minorization
can be used to transform an arbitrary set covering problem into an equivalent (in
some sense to be made precise) regular set covering problem. Most results in this
section are due to Hammer, Johnson, and Peled [443].
Let us consider an instance of the problem SCP:

\[
\begin{aligned}
\text{maximize} \quad & z(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} c_i x_i && (8.22)\\
\text{subject to} \quad & f(x_1, x_2, \ldots, x_n) = 0 && (8.23)\\
& (x_1, x_2, \ldots, x_n) \in B^n, && (8.24)
\end{aligned}
\]

where f is a positive Boolean function, and let us define the set covering problem
SCP12 as follows:

\[
\begin{aligned}
\text{maximize} \quad & z(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} c_i x_i\\
\text{subject to} \quad & f_{12}(x_1, x_2, \ldots, x_n) = 0\\
& (x_1, x_2, \ldots, x_n) \in B^n,
\end{aligned}
\]

where f12 is the (1,2)-minor of f defined in Section 8.7.1.
Since f12 ≤ f (Theorem 8.34), every feasible solution of SCP also is a feasible
solution of SCP12 . Hence, the optimal value of SCP12 is at least as large as the
optimal value of SCP. However, more is actually true, namely:

Theorem 8.40. If c1 > c2 , then SCP and SCP 12 have the same set of optimal
solutions.

Proof. We only need to show that every optimal solution of SCP12 is feasible
for SCP. Let X∗ = (x1∗ , x2∗ , . . . , xn∗ ) be an optimal solution of SCP12 . Since
f12 (X∗ ) = 0, Equation (8.16) implies that f (X∗ ) = 0 or

\[
x_1^* \vee f_{|x_1=1,\,x_2=0}(X^*) = 0. \tag{8.25}
\]

If f (X∗ ) = 0, then we are done. Otherwise, (8.25) holds, or equivalently,

\[
x_1^* = 0 \quad \text{and} \quad f(1, 0, x_3^*, x_4^*, \ldots, x_n^*) = 0. \tag{8.26}
\]

Now, there are two cases. If x2∗ = 0, then

\[
f(X^*) = f(0, 0, x_3^*, x_4^*, \ldots, x_n^*) \le f(1, 0, x_3^*, x_4^*, \ldots, x_n^*) = 0,
\]

and X∗ is feasible for SCP.
If x2∗ = 1, then define Y∗ = (1, 0, x3∗ , x4∗ , . . . , xn∗ ). In view of (8.26),
Y∗ is feasible for SCP, and hence, for SCP12 . However, since c2 < c1 ,
z(X∗ ) = z(0, 1, x3∗ , x4∗ , . . . , xn∗ ) < z(1, 0, x3∗ , x4∗ , . . . , xn∗ ) = z(Y∗ ). But then X∗ is not
an optimal solution of SCP12 , and we reach a contradiction.

Example 8.22. Consider the set covering problem:

\[
\begin{aligned}
\text{maximize} \quad & z = 5x_1 + 4x_2 + 3x_3 + 2x_4 + x_5 && (8.27)\\
\text{subject to} \quad & f = x_1 x_2 \vee x_1 x_3 \vee x_1 x_4 \vee x_2 x_3 \vee x_2 x_5 = 0 && (8.28)\\
& (x_1, x_2, x_3, x_4, x_5) \in B^5. && (8.29)
\end{aligned}
\]

The (1, 2)–minor of f is the regular function:

\[
f_{12} = x_1 x_2 \vee x_1 x_3 \vee x_1 x_4 \vee x_2 x_3 \vee x_2 x_4 x_5 .
\]

As shown in Example 8.14, the maximal false points of f12 are X1 = (0, 1, 0, 1, 0),
X2 = (0, 1, 0, 0, 1), X3 = (1, 0, 0, 0, 1), and X4 = (0, 0, 1, 1, 1). The objective func-
tion value of X1 , X 3 and X4 is 6, and the value of X2 is 5. So, Theorem 8.40
implies that {X1 , X 3 , X 4 } is the set of optimal solutions of the original problem
(8.27)–(8.29). One easily verifies that this is indeed the fact, since X 1 , X 3 , and X4
are all the maximal false points of f . 
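
Example 8.22 is small enough to be checked by exhaustive enumeration of B^5. The sketch below lists the optimal solutions of the problem with constraint f = 0 and of the problem with constraint f12 = 0, so that the two sets can be compared as Theorem 8.40 predicts. The function name `optimal_solutions` is hypothetical, and brute force is used only because n = 5.

```python
from itertools import product

def optimal_solutions(f, c):
    """Optimal value and optimal points of: maximize c.x subject to f(x) = 0."""
    n = len(c)
    feasible = [x for x in product((0, 1), repeat=n) if f(x) == 0]
    value = lambda x: sum(ci * xi for ci, xi in zip(c, x))
    best = max(value(x) for x in feasible)
    return best, sorted(x for x in feasible if value(x) == best)

c = (5, 4, 3, 2, 1)
f = lambda x: ((x[0] & x[1]) | (x[0] & x[2]) | (x[0] & x[3])
               | (x[1] & x[2]) | (x[1] & x[4]))
f12 = lambda x: ((x[0] & x[1]) | (x[0] & x[2]) | (x[0] & x[3])
                 | (x[1] & x[2]) | (x[1] & x[3] & x[4]))

print(optimal_solutions(f, c))
print(optimal_solutions(f12, c))
```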

In view of Theorem 8.40 and of the results obtained in Section 8.7, it is now
natural to associate with SCP the set covering problem SCPR , defined as follows:

\[
\begin{aligned}
\text{maximize} \quad & z(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} c_i x_i\\
\text{subject to} \quad & f_R(x_1, x_2, \ldots, x_n) = 0\\
& (x_1, x_2, \ldots, x_n) \in B^n,
\end{aligned}
\]

where fR is the largest regular minorant of f with respect to (x1 , x2 , . . . , xn ).

Theorem 8.41. If c1 > c2 > · · · > cn , then SCP and SCP R have the same set of
optimal solutions.

Proof. The statement easily follows from Theorem 8.35 and 8.40, by induction on
the number of (i, j )–minorization steps that are necessary to derive fR from f . 

Since the coefficients c1 , c2 , . . . , cn can always be sorted in nonincreasing order,
Theorem 8.41 provides a constructive transformation of an arbitrary set covering
problem into an equivalent regular one, under the assumption that all coefficients
of the objective function are distinct. (Compare with Exercise 15 in Chapter 1.) As
illustrated by the next example, the conclusion of Theorem 8.41 may fail to hold
when the coefficients are not distinct.

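Example 8.23. Consider the set covering problem:

\[
\begin{aligned}
\text{maximize} \quad & z = 4x_1 + 4x_2 + 2x_3 + x_4 + x_5\\
\text{subject to} \quad & f = x_1 x_2 \vee x_1 x_3 \vee x_1 x_4 \vee x_2 x_3 \vee x_2 x_5 = 0\\
& (x_1, x_2, x_3, x_4, x_5) \in B^5
\end{aligned}
\]
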
(compare with Example 8.22). It is easy to check that SCP has exactly two
optimal solutions, namely, X1 = (0, 1, 0, 1, 0) and X3 = (1, 0, 0, 0, 1), whereas
the associated problem SCP R has three optimal solutions, namely, X 1 , X 3 , and
X2 = (0, 1, 0, 0, 1). 
Nevertheless, the following result can be proved:
Theorem 8.42. If c1 ≥ c2 ≥ · · · ≥ cn , then the lexicographically largest optimal
solution of SCP R is an optimal solution of SCP.
Proof. As was the case for Theorem 8.41, we only need to show that the statement
is correct with SCP R replaced by SCP 12 . So, let X∗ = (x1∗ , x2∗ , . . . , xn∗ ) be the lexi-
cographically largest optimal solution of SCP 12 . If f (X ∗ ) = 0, then we are done.
Otherwise, as in the proof of Theorem 8.40, one shows that x1∗ = 0, and that X ∗
is feasible (hence, optimal) for SCP if x2∗ = 0. So, assume that x2∗ = 1. Note that
the point Y ∗ = (1, 0, x3∗ , x4∗ , . . . , xn∗ ) is feasible for SCP 12 . Moreover, since c2 ≤ c1 ,
z(X∗ ) ≤ z(Y ∗ ). Hence, Y ∗ is an optimal solution of SCP 12 and X∗ <L Y ∗ , con-
tradicting the choice of X ∗ . 
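
The situation of Example 8.23 and the statement of Theorem 8.42 can be observed together on the same instance. The sketch below assumes that fR coincides with the (1, 2)–minor f12 of Example 8.22, which holds here because f12 is already regular, and it relies on Python's tuple comparison as the lexicographic order (first coordinate most significant). Enumeration is used only because n = 5; the helper name is hypothetical.

```python
from itertools import product

def optimal_solutions(f, c):
    """Optimal points of: maximize c.x subject to f(x) = 0, by enumeration."""
    n = len(c)
    feasible = [x for x in product((0, 1), repeat=n) if f(x) == 0]
    value = lambda x: sum(ci * xi for ci, xi in zip(c, x))
    best = max(value(x) for x in feasible)
    return sorted(x for x in feasible if value(x) == best)

c = (4, 4, 2, 1, 1)                      # coefficients are not all distinct
f = lambda x: ((x[0] & x[1]) | (x[0] & x[2]) | (x[0] & x[3])
               | (x[1] & x[2]) | (x[1] & x[4]))
fR = lambda x: ((x[0] & x[1]) | (x[0] & x[2]) | (x[0] & x[3])
                | (x[1] & x[2]) | (x[1] & x[3] & x[4]))   # assumed equal to f12

opt_scp = optimal_solutions(f, c)
opt_scpr = optimal_solutions(fR, c)
lex_largest = max(opt_scpr)              # tuples compare lexicographically
print(opt_scp, opt_scpr, lex_largest in opt_scp)
```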

Theorem 8.42 suggests an approach to the solution of an arbitrary set cov-
ering problem SCP: First, transform SCP into the regular set covering problem
SCPR , then compute the lexicographically largest optimal solution of SCPR . This
approach may look attractive, since the procedures RegCover0 or RegCover
(presented in Section 8.6) are easily adapted to carry out its second phase in poly-
nomial time. However, the first phase involves the computation of fR , and hence,
as observed earlier, it cannot be performed in polynomial time (which does not
come as a surprise in view of the fact that the set covering problem is NP-hard).
As shown by Hammer, Johnson, and Peled [443], most results in this section
can easily be extended to optimization problems of the form

\[
\begin{aligned}
\text{maximize} \quad & g(X) && (8.30)\\
\text{subject to} \quad & f(X) = 0 && (8.31)\\
& X \in B^n, && (8.32)
\end{aligned}
\]

where f is a Boolean function and g is a pseudo-Boolean function, that is, a real-
valued function on Bn (see Chapter 13), if the assumption c1 > c2 > . . . > cn is
replaced by an appropriate “generalized regularity” condition, namely,

\[
\text{for all } 1 \le i < j \le n \text{ and for } X^* \in B^n: \quad \text{if } x_i^* = x_j^* = 0 \text{ then } g(X^* \vee e_i) > g(X^* \vee e_j). \tag{8.33}
\]

8.8 Higher-order monotonicity


The notion of strength preorder among variables can be generalized to a notion of
strength relation among subsets of variables. Various such generalizations have
been introduced for instance by Muroga, Toda, and Takasu [700]; Paull and
McCluskey [732]; Winder [916] in the context of switching theory; and by Lapidot
[597] in his investigations of simple games (as cited by Einy [291]). Most of these
proposals originally stemmed from attempts to provide purely Boolean or com-
binatorial characterizations of threshold functions (or weighted majority games)
in contrast with the numerical flavor of Definition 8.1. These efforts, where the
study of regularity also found its origins (see Section 8.1), eventually resulted
in the unearthing of several important properties of threshold functions, that is,
necessary conditions for a function to be threshold.
We adopt here Lapidot’s approach [597, 291] which rests on a natural extension
of Definition 8.2. Recall the following notation: if T is a subset of {1, 2, . . . , n},
then the characteristic vector of T is denoted

\[
e_T = \sum_{i \in T} e_i \quad (\text{and } e_\emptyset = 0).
\]

Definition 8.8. Let f (x1 , x2 , . . . , xn ) be a Boolean function and let S, T be two
subsets of {1, 2, . . . , n}. We say that S is stronger than T with respect to f , and we
write S ⪰f T if and only if, for all X∗ ∈ Bn ,

xi∗ = 0 for all i ∈ S ∪ T ⇒ f (X∗ ∨ eS ) ≥ f (X∗ ∨ eT ).

We say that S and T are comparable with respect to f if either S ⪰f T holds or
T ⪰f S holds.

As usual, we drop the subscript f from the symbol ⪰f when no confusion can
result.

Application 8.6. (Game theory.) The strength relation among subsets of vari-
ables has a clear interpretation in the context of game theory. If f represents a
simple game, and S is stronger than T with respect to f , then a coalition C (dis-
joint from S and T ) can more easily form a winning coalition by joining S than
by joining T (remember Application 8.1). Therefore, Lapidot [597] says that S is
“more desirable” than T when S ⪰f T . This relation among coalitions was used
by Peleg [737, 738] to develop a theory of coalition formation in simple games,
and its game-theoretic properties have been further investigated by Einy [291]. 

It is easily checked that, for a function f (x1 , x2 , . . . , xn ) and for i, j ∈
{1, 2, . . . , n},

• {i} ⪰f ∅ if and only if f is positive in xi ;
• ∅ ⪰f {i} if and only if f is negative in xi ;
• {i} ⪰f {j } if and only if xi is stronger than xj in the sense of Definition 8.2.

Thus, in particular, monotone functions are precisely those functions such that
S and T are comparable whenever |S ∪ T | ≤ 1. Similarly, a positive function is
regular if and only if S and T are comparable whenever |S ∪ T | = 2.

Definition 8.9. A Boolean function f on Bn is k-monotone (1 ≤ k ≤ n) if, for all
pairs of subsets S, T ⊆ {1, 2, . . . , n} such that |S ∪ T | ≤ k, S and T are comparable
with respect to f . A function on B n is completely monotone if it is n-monotone,
that is, if the strength relation is complete on the power set of {1, 2, . . . , n}.
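
Definitions 8.8 and 8.9 translate directly into an exhaustive test for small n. The following sketch is a brute-force illustration only (its cost grows exponentially with n); functions are assumed to be Python callables on 0/1 tuples, indices are 0-based inside the code, and the helper names are hypothetical.

```python
from itertools import product, combinations

def subsets(n, max_size):
    """All subsets of {0, ..., n-1} with at most max_size elements."""
    return [frozenset(s) for r in range(max_size + 1)
            for s in combinations(range(n), r)]

def stronger(f, S, T, n):
    """S is stronger than T with respect to f, per Definition 8.8."""
    for x in product((0, 1), repeat=n):
        if all(x[i] == 0 for i in S | T):
            xs = tuple(1 if i in S else v for i, v in enumerate(x))
            xt = tuple(1 if i in T else v for i, v in enumerate(x))
            if f(xs) < f(xt):
                return False
    return True

def is_k_monotone(f, n, k):
    """Definition 8.9: S, T comparable whenever |S u T| <= k."""
    subs = subsets(n, k)
    return all(stronger(f, S, T, n) or stronger(f, T, S, n)
               for S in subs for T in subs if len(S | T) <= k)

f = lambda x: (x[0] & x[1]) | (x[2] & x[3])      # positive but not regular
maj = lambda x: int(sum(x) >= 2)                 # majority of three variables
print(is_k_monotone(f, 4, 1), is_k_monotone(f, 4, 2), is_k_monotone(maj, 3, 3))
```

Here f is 1-monotone but not 2-monotone, because {1} and {3} are not comparable, whereas the majority function is completely monotone.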

So, 1-monotonicity is equivalent to monotonicity, and, up to switching the
negative variables, 2-monotonicity is equivalent to regularity. The motivation for
introducing k-monotonicity in connection with the study of threshold functions is
provided by the following result, which extends Theorem 8.3.

Theorem 8.43. Every threshold function is completely monotone. More precisely,
if f (x1 , x2 , . . . , xn ) is a threshold function with structure (w1 , w2 , . . . , wn , t), and if
S, T are two subsets of {1, 2, . . . , n} such that Σi∈S wi ≥ Σi∈T wi , then S ⪰f T .

Proof. This is straightforward. 

Properties of k-monotone and completely monotone functions have been exten-
sively studied in the threshold logic literature (see, e.g., Winder [917] and Muroga
[698] for an account). Some of them have been independently rediscovered in
the framework of game theory (Einy [291]). We present now a sample of such
properties.
In view of Definition 8.9, k-monotonicity implies h-monotonicity for all h ≤ k.
Winder [916, 917] showed that this implication cannot be reversed in general:
Namely, for each k, there exists a (k − 1)-monotone function of n variables
that is not k-monotone (we omit the proof of this result, but see the end-of-
chapter exercises for the case k = 3). Also, all completely monotone functions
of eight or fewer variables are threshold functions, but in Chapter 9 we provide
an example of a nonthreshold completely monotone function of nine variables
(Theorem 9.15). In other words, complete monotonicity fails to be a sufficient
condition for thresholdness.
Thus, if we denote by T h the set of all threshold functions, by Mk the set of
k-monotone functions, and by CM the set of completely monotone functions, we
obtain the picture in Figure 8.8 for all k ≥ 1, where all inclusions are strict.
However, Winder [916, 917] proved that, for fixed n, the hierarchy of
k-monotone functions collapses at level ⌊n/2⌋.

T h ⊂ CM ⊂ . . . ⊂ Mk+1 ⊂ Mk ⊂ . . . ⊂ M1

Figure 8.8. The hierarchy of k-monotone Boolean functions.


Theorem 8.44. A Boolean function of n variables is completely monotone if and
only if it is ⌊n/2⌋-monotone.
Proof. Assume that f (x1 , x2 , . . . , xn ) is not completely monotone. Then, there exist
S, T ⊆ {1, 2, . . . , n} such that S and T are not comparable with respect to f ,
meaning that there exist X∗ , Y ∗ ∈ Bn such that

xi∗ = 0 for all i ∈ S ∪ T , f (X∗ ∨ eS ) = 0, f (X∗ ∨ eT ) = 1,

and
yi∗ = 0 for all i ∈ S ∪ T , f (Y ∗ ∨ eS ) = 1, f (Y ∗ ∨ eT ) = 0.
We can assume without loss of generality that S and T are disjoint (see the end-
of-chapter exercises).
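Let now I = {i : xi∗ = 0, yi∗ = 1}, J = {i : xi∗ = 1, yi∗ = 0}, K = {i : xi∗ = yi∗ , i ∉
S ∪ T }, and define two points W∗ , Z∗ as follows:

\[
w_i^* = \begin{cases} 1 & \text{if } i \in T,\\ x_i^* & \text{if } i \in K,\\ 0 & \text{otherwise,} \end{cases}
\]

and

\[
z_i^* = \begin{cases} 1 & \text{if } i \in S,\\ x_i^* & \text{if } i \in K,\\ 0 & \text{otherwise.} \end{cases}
\]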
It is trivial to check that X∗ ∨ eS = Z ∗ ∨ eJ , X∗ ∨ eT = W ∗ ∨ eJ , Y ∗ ∨ eS = Z ∗ ∨ eI ,
and Y ∗ ∨ eT = W ∗ ∨ eI . As a consequence, we see that

f (W ∗ ∨ eI ) = 0, f (W ∗ ∨ eJ ) = 1,

and
f (Z ∗ ∨ eI ) = 1, f (Z ∗ ∨ eJ ) = 0.
Hence, I and J are not comparable with respect to f . Since |I ∪ J | + |T ∪ S| ≤ n,
we conclude that f is not k-monotone for some k ≤ ⌊n/2⌋.

So, if we restrict our attention to functions of n variables (for fixed n), the
hierarchy of k-monotone functions boils down to

T h ⊆ CM = M⌊n/2⌋ ⊂ M⌊n/2⌋−1 ⊂ . . . ⊂ M1 ,

and all inclusions are strict when n ≥ 9.
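
For n = 3 the collapse stated in Theorem 8.44 can be observed by enumeration: since ⌊3/2⌋ = 1, the theorem predicts that a function of three variables is completely monotone exactly when it is 1-monotone. The sketch below runs through all 256 Boolean functions on B^3, stored as truth tables, and records whether the two tests ever disagree. It is a brute-force illustration with hypothetical helper names, not an efficient recognition procedure.

```python
from itertools import product, combinations

N = 3
POINTS = list(product((0, 1), repeat=N))
SUBSETS = [frozenset(s) for r in range(N + 1)
           for s in combinations(range(N), r)]

def raised(x, S):
    """The point x with every coordinate in S set to 1 (that is, x v e_S)."""
    return tuple(1 if i in S else v for i, v in enumerate(x))

def stronger(t, S, T):
    """S is stronger than T with respect to the truth table t (Definition 8.8)."""
    return all(t[raised(x, S)] >= t[raised(x, T)]
               for x in POINTS if all(x[i] == 0 for i in S | T))

def is_k_monotone(t, k):
    return all(stronger(t, S, T) or stronger(t, T, S)
               for S in SUBSETS for T in SUBSETS if len(S | T) <= k)

agree = True
for values in product((0, 1), repeat=len(POINTS)):   # all 256 functions on B^3
    t = dict(zip(POINTS, values))
    if is_k_monotone(t, N) != is_k_monotone(t, N // 2):
        agree = False
print(agree)
```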


We now investigate the behavior of the strength relation with respect to the
fixation of variables (compare with Theorem 8.7).
Theorem 8.45. Let f (x1 , x2 , . . . , xn ) be a Boolean function, let i ∈ {1, 2, . . . , n},
and let S, T ⊆ {1, 2, . . . , n} \ {i}. If S is stronger than T with respect to f , then S is
stronger than T with respect to f|xi =1 and with respect to f|xi =0 .

Proof. This immediately follows from Definition 8.8. 

Recall that, for 0 ≤ d ≤ n, a face of Bn of dimension d is a subset of Bn of the
form

F (I , J ) = {X ∈ Bn | xi = 1 for all i ∈ I and xj = 0 for all j ∈ J }

where I , J are disjoint subsets of {1, 2, . . . , n} such that |I ∪ J | = n − d. Two
faces F1 , F2 are complementary if F1 = F (I , J ) and F2 = F (J , I ) for some
I , J ⊆ {1, 2, . . . , n}. We denote by f|I ,J or by f|F the restriction of a function
f (x1 , x2 , . . . , xn ) to a face F = F (I , J ) of Bn . As usual, we sometimes consider
f|F as a Boolean function of d variables, where d is the dimension of F .

Theorem 8.46. For k ≤ n, a Boolean function f on Bn is k-monotone if and
only if one of the implications f|I ,J ≤ f|J ,I or f|J ,I ≤ f|I ,J holds for all pairs of
complementary faces F (I , J ) and F (J , I ) of dimension at least n − k.

Proof. Necessity. Assume first that f is k-monotone, and consider two complemen-
tary faces F (I , J ) and F (J , I ), with |I ∪ J | ≤ k. By definition of k-monotonicity,
we can assume without loss of generality that I ⪰f J . But this easily implies that
f|I ,J ≥ f|J ,I .
Sufficiency. To prove the reverse implication, consider two (disjoint) subsets
S, T of {1, 2, . . . , n} such that |S ∪ T | ≤ k. Then, if we assume for instance that
f|S,T ≥ f|T ,S , it is straightforward to check that S ⪰f T .

As a corollary, we obtain:

Theorem 8.47. A Boolean function is k-monotone if and only if its dual is
k-monotone.

Proof. This follows from Theorem 8.46 and from Theorem 4.2 in Section 4.1. 

Muroga, Toda, and Takasu [700] observed that completely monotone functions
are dual-comparable (see Section 4.1.3 for definitions).

Theorem 8.48. Every completely monotone Boolean function is either dual-minor
or dual-major.

Proof. Assume that f is neither dual-minor nor dual-major. Then, there exist
X∗ , Y∗ ∈ Bn such that f (X∗ ) = 1, f d (X∗ ) = 0, f (Y∗ ) = 0, and f d (Y∗ ) = 1.
By definition of the dual, this means that f takes the value 1 at the complement
of X∗ and the value 0 at the complement of Y∗ .
Let S = {i : xi∗ = yi∗ = 1} and T = {i : xi∗ = yi∗ = 0}. Define two points W∗ , Z∗ ∈
Bn as follows:

\[
w_i^* = \begin{cases} 0 & \text{if } i \in S \cup T,\\ x_i^* & \text{otherwise,} \end{cases}
\]

and

\[
z_i^* = \begin{cases} 0 & \text{if } i \in S \cup T,\\ y_i^* & \text{otherwise.} \end{cases}
\]

One easily verifies that

\[
f(W^* \vee e_S) = f(X^*) = 1, \qquad f(W^* \vee e_T) = f(\overline{Y^*}) = 0,
\]

and

\[
f(Z^* \vee e_S) = f(Y^*) = 0, \qquad f(Z^* \vee e_T) = f(\overline{X^*}) = 1.
\]

Hence, S and T are not comparable with respect to f , and f is not completely
monotone.

Taken together, Theorems 8.45 and 8.48 imply that the restriction of a completely
monotone function to any face of Bn is either dual-minor or dual-major. Ding
[272] established that this property actually characterizes completely monotone
functions.
Theorem 8.49. A Boolean function f on B n is completely monotone if and only
if, for every face F of Bn , f|F is either dual-minor or dual-major.
Proof. Assume that f is not completely monotone. Then, there exist S, T ⊆
{1, 2, . . . , n} and X ∗ , Y ∗ ∈ Bn such that

xi∗ = 0 for all i ∈ S ∪ T , f (X∗ ∨ eS ) = 0, f (X∗ ∨ eT ) = 1,

and
yi∗ = 0 for all i ∈ S ∪ T , f (Y ∗ ∨ eS ) = 1, f (Y ∗ ∨ eT ) = 0.
Moreover, we can again assume, without loss of generality, that S and T are
disjoint.
Let now $I = \{i : x_i^* = y_i^* = 1\}$, $J = \{i \notin S \cup T : x_i^* = y_i^* = 0\}$, F = F(I, J)
and g = f|F . We claim that g is neither dual-minor nor dual-major, that is, there
exist W ∗ , Z ∗ ∈ F such that g(W ∗ ) = 1, g d (W ∗ ) = 0 and g(Z ∗ ) = 0, g d (Z ∗ ) = 1.
We leave it to the reader to verify that W ∗ = X ∗ ∨ eT and Z ∗ = X ∗ ∨ eS are as
required. 

We conclude this section with a last characterization of complete monotonicity that rests on the concept of 2-summability.
Definition 8.10. A Boolean function f on B n is 2-summable if there exist two (not
necessarily distinct) false points of f , say, X∗ , W ∗ ∈ Bn , and two (not necessarily
distinct) true points of f , say, Y ∗ , Z ∗ ∈ Bn , such that X ∗ + W ∗ = Y ∗ + Z ∗ (where
the summation is over Rn ). Otherwise, f is 2-asummable.
Example 8.24. The function $f(x_1, x_2) = x_1 x_2 \vee \bar{x}_1 \bar{x}_2$ is 2-summable. Indeed, if we let $X^* = (0, 1)$, $W^* = (1, 0)$, $Y^* = (0, 0)$ and $Z^* = (1, 1)$, then $X^* + W^* = Y^* + Z^*$.
The function $f(x_1, x_2, x_3, x_4) = x_1 x_2 \vee x_3 x_4$ is also 2-summable. On the other hand, it is easy to see that every threshold function is 2-asummable (see Theorem 9.14 in Chapter 9). □
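Definition 8.10 can be checked by brute force on small examples. The following Python sketch is only an illustration: the function name `is_2_summable` and the encoding of a Boolean function as a Python callable on 0/1 tuples are assumptions made here, and the enumeration is exponential in n.

```python
from itertools import product

def is_2_summable(f, n):
    """Brute-force test of Definition 8.10 for f: {0,1}^n -> {0,1}.

    Hypothetical helper: f is any callable taking a tuple of n bits.
    """
    points = list(product((0, 1), repeat=n))
    false_pts = [p for p in points if f(p) == 0]
    true_pts = [p for p in points if f(p) == 1]
    # Componentwise sums (over the reals) of all pairs of true points,
    # where the two points of a pair need not be distinct.
    true_sums = {tuple(a + b for a, b in zip(y, z))
                 for y in true_pts for z in true_pts}
    # f is 2-summable if some pair of false points has the same sum.
    return any(tuple(a + b for a, b in zip(x, w)) in true_sums
               for x in false_pts for w in false_pts)

# The two functions of Example 8.24, plus a threshold function for contrast.
xnor = lambda p: int(p[0] == p[1])                       # x1 x2 OR (not x1)(not x2)
two_pairs = lambda p: int((p[0] and p[1]) or (p[2] and p[3]))
majority = lambda p: int(sum(p) > 1)                     # x1 + x2 + x3 > 1

print(is_2_summable(xnor, 2), is_2_summable(two_pairs, 4), is_2_summable(majority, 3))
```

The majority function is 2-asummable because any two false points have at most two ones in total, whereas any two true points have at least four.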
Elgot [310] proved:
Theorem 8.50. A Boolean function is completely monotone if and only if it is
2-asummable.
Proof. Sufficiency. Assume that f (x1 , x2 , . . . , xn ) is not completely monotone, that
is, there exist S, T ⊆ {1, 2, . . . , n} and X ∗ , Y ∗ ∈ Bn such that xi∗ = yi∗ = 0 for i ∈ S ∪T
and f (X∗ ∨ eS ) = f (Y ∗ ∨ eT ) = 0, f (X ∗ ∨ eT ) = f (Y ∗ ∨ eS ) = 1. Then f is
2-summable, since

(X ∗ ∨ eS ) + (Y ∗ ∨ eT ) = (X∗ ∨ eT ) + (Y ∗ ∨ eS ).

Necessity. Assume that $f(x_1, x_2, \ldots, x_n)$ is 2-summable, and let $X^*, W^*, Y^*, Z^*$ be as in Definition 8.10. Let $S = \{i : x_i^* = 1, y_i^* = 0\}$, $T = \{i : x_i^* = 0, y_i^* = 1\}$, and define two points $U^*, V^* \in \mathcal{B}^n$ as follows:

$$u_i^* = \begin{cases} 0 & \text{if } i \in S \cup T, \\ x_i^* & \text{otherwise,} \end{cases}
\qquad
v_i^* = \begin{cases} 0 & \text{if } i \in S \cup T, \\ w_i^* & \text{otherwise.} \end{cases}$$

From the equality $X^* + W^* = Y^* + Z^*$, one easily derives that
• for $i \notin S \cup T$, $x_i^* = y_i^* = u_i^*$ and $v_i^* = w_i^* = z_i^*$;
• for $i \in S$, $x_i^* = z_i^* = 1$ and $y_i^* = w_i^* = u_i^* = v_i^* = 0$;
• for $i \in T$, $x_i^* = z_i^* = u_i^* = v_i^* = 0$ and $y_i^* = w_i^* = 1$.
This, in turn, implies that

f (U ∗ ∨ eS ) = f (X∗ ) = 0, f (U ∗ ∨ eT ) = f (Y ∗ ) = 1,

f (V ∗ ∨ eS ) = f (Z ∗ ) = 1, f (V ∗ ∨ eT ) = f (W ∗ ) = 0.
Hence, S and T are not comparable with respect to f , and f is not completely
monotone. 

As a corollary of Theorem 8.50, we conclude that completely monotone functions can be recognized in polynomial time (Ding [272]).
Theorem 8.51. There exists an $O(n^3 m^4)$ algorithm to determine whether a positive function expressed in complete DNF is completely monotone, where n is the number of variables and m is the number of prime implicants of the function.
Proof. Given a positive function f, we first test in time $O(n^2 m)$ whether f is regular (see Theorem 8.20). If f is not regular, then it is not completely monotone. Otherwise, we generate in time $O(n^2 m)$ the maximal false points of f (see Theorem 8.28). Because f is positive, it follows from Definition 8.10 that f is 2-summable if and only if there exists a pair of maximal false points $X^*, W^*$ and a pair of minimal true points $Y^*, Z^*$ such that $Y^* + Z^* \le X^* + W^*$ (see Exercise 13). In view of Theorem 8.29, f has at most nm maximal false points, so that at most $(nm)^2 m^2 = n^2 m^4$ quadruples need to be examined, each in O(n) time, and the claim follows. □

More properties of k-monotone and completely monotone functions can


be found for instance in Ding [272], Einy [291], Giles and Kannan [379],
Muroga [698], Winder [917], and so on. We return to the topic of asummability in
Chapter 9.

8.9 Generalizations of regularity


In this section, we briefly introduce several extensions of the class of regular
functions and describe their main properties.

8.9.1 Weakly regular functions


The next result generalizes Theorem 8.22.
Theorem 8.52. For a positive Boolean function f (x1 , x2 , . . . , xn ), the following
properties are equivalent:
(a) $x_j \succeq_f x_n$ for all j ∈ {1, 2, . . . , n}.
(b) For all X ∗ ∈ B n−1 , (X ∗ , 0) is a maximal false point of f if and only if (X ∗ , 1)
is a minimal true point of f .
Proof. The proof of Theorem 8.22 establishes that (a) implies (b). We leave the
proof of the reverse implication as an easy end-of-chapter exercise. 

Based on this observation, and on the fact that many of the remarkable features
of regular functions actually rest on Theorem 8.22 and Theorem 8.23, Crama [224]
introduced the following class of functions:
Definition 8.11. A positive Boolean function f (x1 , x2 , . . . , xn ) is weakly regular
with respect to (x1 , x2 , . . . , xn ) if f is constant, or if
(a) $x_j \succeq_f x_n$ for all j ∈ {1, 2, . . . , n}, and
(b) f|xn =1 is weakly regular with respect to (x1 , x2 , . . . , xn−1 ).
We simply say that f is weakly regular if f is weakly regular with respect to some
permutation of its variables.
So, when f is weakly regular with respect to (x1 , x2 , . . . , xn ), xi is a “weakest”
variable in the preorder associated with f|xi+1 =···=xn =1 , for all i ∈ {1, 2, . . . , n}.
Clearly, regular functions are weakly regular, but the converse is not necessarily
true.

Example 8.25. The function x1 x3 ∨ x1 x4 ∨ x2 x3 ∨ x1 x2 x5 is weakly regular with


respect to (x1 , x2 , . . . , x6 ), but it is not regular (x1 and x3 are not comparable). 
Many results from previous sections extend in a straightforward way to weakly regular functions. For instance, algorithm DualReg0 allows us to dualize these functions in $O(n^2 m^2)$ time, and RegCover0 solves weakly regular set covering problems in the same time complexity. Regularization can be extended in an obvious way to weak regularization and can be used to solve set covering problems as explained in Section 8.7.
Finally, Crama [224] noted that a function $f(x_1, x_2, \ldots, x_n)$ can be tested for weak regularity in polynomial time by a simple greedy procedure: if there is no variable $x_i$ such that $x_j \succeq_f x_i$ for all j ∈ {1, 2, . . . , n}, then f is not weakly regular; otherwise, we can fix $x_i$ to 1 in f and repeat the test with $f_{|x_i = 1}$. The procedure is correct because, when several variables qualify as “weakest” variables, f is symmetric on these variables and hence, the choice among them is immaterial (see Theorem 8.1).
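The greedy procedure just described can be sketched in Python as follows. This is an illustrative brute-force version, not the polynomial-time implementation alluded to in the text: the function is assumed to be positive and is given as a callable on 0/1 tuples, the strength relation is tested by enumerating all points, and the name `weak_regularity_order` is hypothetical.

```python
from itertools import product

def weak_regularity_order(f, n):
    """Return an order (0-based indices) with respect to which the positive
    function f is weakly regular, or None if no such order exists.
    Assumption: f is positive; this is not checked."""
    fixed = {}                 # variables already fixed to 1
    free = list(range(n))      # variables not yet fixed
    removed = []               # weakest variables, in the order they are found

    def value(assign):
        point = [0] * n
        for i, v in {**fixed, **assign}.items():
            point[i] = v
        return f(tuple(point))

    def stronger(j, i):
        """x_j is at least as strong as x_i for the current restriction of f."""
        others = [k for k in free if k not in (i, j)]
        for bits in product((0, 1), repeat=len(others)):
            a = dict(zip(others, bits))
            if value({**a, j: 1, i: 0}) < value({**a, j: 0, i: 1}):
                return False
        return True

    def is_constant():
        return len({value(dict(zip(free, bits)))
                    for bits in product((0, 1), repeat=len(free))}) == 1

    while free and not is_constant():
        weakest = next((i for i in free
                        if all(stronger(j, i) for j in free if j != i)), None)
        if weakest is None:
            return None        # no weakest variable: not weakly regular
        removed.append(weakest)
        fixed[weakest] = 1     # restrict f to x_weakest = 1 and repeat
        free.remove(weakest)
    return free + removed[::-1]

# Function of Example 8.25 (on x1..x5) and the function f_C4 of Example 8.26.
ex_8_25 = lambda p: int((p[0] and p[2]) or (p[0] and p[3])
                        or (p[1] and p[2]) or (p[0] and p[1] and p[4]))
f_c4 = lambda p: int((p[0] or p[1]) and (p[2] or p[3]))

print(weak_regularity_order(ex_8_25, 5))
print(weak_regularity_order(f_c4, 4))
```

On the function of Example 8.25, the only weakest variable at the first step is $x_5$, so it appears last in the returned order; $f_{C_4}$ has no weakest variable and the procedure returns `None`.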

8.9.2 Aligned functions


Boros [105] has introduced and investigated the class of aligned functions, which
provide another generalization of regular functions:
Definition 8.12. A positive Boolean function f (x1 , x2 , . . . , xn ) is aligned (with
respect to (x1 , x2 , . . . , xn )) if its dual f d is weakly regular (with respect to
(x1 , x2 , . . . , xn )).
Equivalently, the function f is aligned with respect to (x1 , x2 , . . . , xn ) if, for all
i ∈ {1, 2, . . . , n}, xi is weakest in the preorder associated with (f d )|xi+1 =···=xn =1 ,
which is identical to the preorder associated with f|xi+1 =···=xn =0 . This implies, in
particular, that aligned functions can be recognized in polynomial time by the same
type of procedure described for the recognition of weakly regular functions.
Boros [105] established yet another characterization of aligned functions:
Theorem 8.53. A positive Boolean function $f(x_1, x_2, \ldots, x_n)$ is aligned with respect to $(x_1, x_2, \ldots, x_n)$ if and only if, for every prime implicant of f, say $\bigwedge_{k \in A} x_k$, and for every $j \notin A$ such that $j < \mu = \max\{k \mid k \in A\}$, $\bigwedge_{k \in (A \cup \{j\}) \setminus \{\mu\}} x_k$ is an implicant of f.
Proof. We leave this proof as an exercise at the end of the chapter. 

Comparing this statement with Definition 7.5 in Section 7.4, it is easy to con-
clude that aligned functions have the LE property. As a consequence, aligned
functions can be dualized in $O(n^2 m)$ time (see Theorem 7.16 in Section 7.4.4).
Example 8.26. To see that the class of aligned functions is distinct from previ-
ously introduced classes, consider fB = x1 x2 ∨ x1 x3 ∨ x1 x4 ∨ x2 x3 ∨ x2 x4 x5 ∨
x3 x4 x5 x6 ∨ x4 x5 x6 x7 . This function is aligned with respect to (x1 , x2 , . . . , x7 ), but it
is not weakly regular, and hence, it is not regular either (Boros [105]). By duality,
the dual $f_B^d$ is weakly regular, but it is not aligned.
On the other hand, the function fC4 = x1 x3 ∨ x1 x4 ∨ x2 x3 ∨ x2 x4 has the LE
property by virtue of Theorem 7.18, but it is not aligned since it does not have a
weakest variable. 

8.9.3 Ideal functions


Bertolazzi and Sassano [75] defined the class of ideal functions.

Definition 8.13. Let $f = \bigvee_{k=1}^{m} \bigwedge_{i \in A_k} x_i$ be the complete DNF of a positive function. We say that $x_n$ is a last variable of f if, for all $k, \ell \in \{1, 2, \ldots, m\}$ such that $n \in A_k \setminus A_\ell$, there exists $j \in A_\ell$ such that $\bigwedge_{i \in (A_k \cup \{j\}) \setminus \{n\}} x_i$ is an implicant of f.
We say that f is ideal with respect to $(x_1, x_2, \ldots, x_n)$ if $x_i$ is a last variable of $f_{|x_{i+1} = \cdots = x_n = 1}$ for all i ∈ {1, 2, . . . , n}, and we say that f is ideal if f is ideal with respect to some permutation of its variables.
Bertolazzi and Sassano proved that ideal functions can be recognized efficiently
(in time $O(n^3 m^2)$; see [75] and Exercise 18 at the end of this chapter). They also
observed that regular functions are ideal. More precisely:
Theorem 8.54. If $x_i \succeq_f x_n$ for all i = 1, 2, . . . , n, then $x_n$ is a last variable of f. In particular, every weakly regular function is ideal.
Proof. To show that $x_n$ is last, suppose that $k, \ell \in \{1, 2, \ldots, m\}$ and that $n \in A_k \setminus A_\ell$. Choose j arbitrarily in $A_\ell \setminus A_k$. Since $x_j \succeq_f x_n$, $\bigwedge_{i \in (A_k \cup \{j\}) \setminus \{n\}} x_i$ is an implicant of f, and this proves the first part of the statement. The second part follows immediately. □

The converse of this theorem is false: an ideal function is not necessarily weakly
regular.
Example 8.27. Each of x1 , x2 , x3 , x4 is a last variable of fC4 = x1 x3 ∨ x1 x4 ∨
x2 x3 ∨ x2 x4 , so that the function is ideal. But fC4 is neither weakly regular nor
aligned because it has no weakest variable.
Similarly, the function fP4 = x1 x2 ∨ x1 x3 ∨ x2 x4 is ideal with respect to the
order (x1 , x2 , x3 , x4 ), but fP4 is neither weakly regular nor aligned. 
The main motivation for considering ideal functions is that they can be dualized in polynomial time. To describe this result, let us introduce the following notation: If $\bigwedge_{i \in A} x_i$ is a prime implicant of f and if j ∈ A, we let

$$P(A, j) = \Big\{ h \in \{1, 2, \ldots, n\} \;\Big|\; \bigwedge_{i \in (A \cup \{h\}) \setminus \{j\}} x_i \text{ is an implicant of } f \Big\}$$

and

$$Q(A, j) = P(A, j) \cup \{j\}.$$

Note for further reference that $P(A, j) \cap A = \emptyset$.
Bertolazzi and Sassano [75] proved:


Bertolazzi and Sassano [75] proved:

Theorem 8.55. Let $x_n$ be a last variable of $f = \bigvee_{k=1}^{m} \bigwedge_{i \in A_k} x_i$. The prime implicants of $f^d$ containing $x_n$ are exactly the elementary conjunctions of the form $\bigwedge_{i \in Q(A_k, n)} x_i$ for all k ∈ {1, 2, . . . , m} such that $n \in A_k$.

Proof. Let $\mathcal{P} = \{A_k \mid k = 1, 2, \ldots, m\}$, let $\{A_1, A_2, \ldots, A_q\} = \{A \in \mathcal{P} \mid n \in A\}$ and let $\mathcal{T} = \{Q(A_k, n) \mid k = 1, 2, \ldots, q\}$. By Theorem 4.19, we must show that the sets in $\mathcal{T}$ are exactly the minimal transversals of $\mathcal{P}$ that contain n.
Fix k ∈ {1, 2, . . . , q}. We first want to show that $Q(A_k, n)$ is a transversal of $\mathcal{P}$, meaning that $Q(A_k, n) \cap A_\ell \neq \emptyset$ for all $\ell \in \{1, 2, \ldots, m\}$.
(i) If $n \in A_\ell$, then $n \in Q(A_k, n) \cap A_\ell$.
(ii) If $n \notin A_\ell$, then, since $x_n$ is a last variable, there exists $j \in A_\ell$ such that $\bigwedge_{i \in (A_k \cup \{j\}) \setminus \{n\}} x_i$ is an implicant of f. This, however, means that $j \in P(A_k, n)$, and hence, $j \in Q(A_k, n) \cap A_\ell$.
So, for all k ∈ {1, 2, . . . , q}, the set $Q(A_k, n)$ contains a minimal transversal of $\mathcal{P}$. Assume now that A is a minimal transversal of $\mathcal{P}$ such that n ∈ A, and assume that $Q(A_k, n) \neq A$ for all k = 1, 2, . . . , q. Fix k ∈ {1, 2, . . . , q}. Note that there exists $h \in Q(A_k, n) \setminus A$: Otherwise, $Q(A_k, n) \subset A$, contradicting the minimality of A. Clearly, $h \neq n$ and hence $h \in P(A_k, n)$. So, by definition of $P(A_k, n)$, there exists a prime implicant $A_\ell$, $\ell \in \{1, 2, \ldots, m\}$, such that $A_\ell \subseteq (A_k \cup \{h\}) \setminus \{n\}$. But $A \cap A_\ell \neq \emptyset$ (since A is a transversal) and $h \notin A$ (by choice of h). Hence, $A \cap (A_k \setminus \{n\}) \neq \emptyset$ or, equivalently, $(A \setminus \{n\}) \cap A_k \neq \emptyset$. This conclusion holds for all k = 1, 2, . . . , q, but it also holds trivially for k = q + 1, . . . , m because A is a transversal of $\mathcal{P}$. Therefore, we obtain that $A \setminus \{n\}$ is a transversal of $\mathcal{P}$, which contradicts the minimality of A. This concludes the proof. □

Theorem 8.55, combined with Theorem 8.23, allows us to generate the dual of
an ideal function in polynomial time. Details are left to the reader.
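As a small illustration of Theorem 8.55, the sets $Q(A_k, n)$ can be computed directly from the list of prime implicants. The Python sketch below assumes that the positive function is given by the index sets of its prime implicants and that $x_n$ is indeed a last variable (this is not checked); the names `is_implicant` and `dual_primes_with_last` are hypothetical.

```python
def is_implicant(prime_sets, B):
    """For a positive function given by the index sets of its prime implicants,
    the conjunction of the variables in B is an implicant iff B contains some A_k."""
    return any(A <= B for A in prime_sets)

def dual_primes_with_last(prime_sets, n):
    """Index sets Q(A_k, n) of Theorem 8.55, for the prime implicants A_k containing n.
    Assumption: x_n is a last variable of the function."""
    universe = set().union(*prime_sets)
    result = set()
    for A in prime_sets:
        if n in A:
            # P(A, n): indices h such that (A + {h}) - {n} is an implicant.
            P = {h for h in universe if is_implicant(prime_sets, (A | {h}) - {n})}
            result.add(frozenset(P | {n}))
    return result

# f_C4 = x1x3 v x1x4 v x2x3 v x2x4 and f_P4 = x1x2 v x1x3 v x2x4 (Example 8.27).
f_c4 = [frozenset(s) for s in ({1, 3}, {1, 4}, {2, 3}, {2, 4})]
f_p4 = [frozenset(s) for s in ({1, 2}, {1, 3}, {2, 4})]

print(dual_primes_with_last(f_c4, 4))   # the term x3 x4 of the dual x1x2 v x3x4
print(dual_primes_with_last(f_p4, 4))   # the term x1 x4 of the dual x1x2 v x1x4 v x2x3
```

In both cases the output agrees with the dual obtained by expanding the product of the clauses by hand.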
Example 8.28. Note that the dual of an ideal function is generally not ideal:
For instance, the dual of the function fC4 defined in Example 8.27 is f2K2 =
x1 x2 ∨ x3 x4 , which is not ideal. 

8.9.4 Relations among classes


The mutual relations among the classes of Boolean functions introduced in this and
previous sections have not been completely clarified in the literature. Figure 8.9
summarizes the relations we have explicitly identified in this chapter.
We have provided examples showing that none of the implications in Figure 8.9
can be reversed (see Examples 8.25, 8.26, and 8.27). In Example 8.29, we show
that most of the missing implications cannot be added either. The exercises in
Section 8.10 contain a number of additional open questions that may be worth
investigating (we do not claim that these are very difficult questions, but simply
that their answers do not seem to appear readily in the literature).

[Figure 8.9. Generalizations of regular functions: regular ⇒ aligned ⇒ LE property; regular ⇒ weakly regular ⇒ ideal; aligned ⇔ weakly regular by duality.]

Example 8.29. We have already noted that the function fC4 = x1 x3 ∨ x1 x4 ∨


x2 x3 ∨ x2 x4 is ideal with respect to (x1 , x2 , x3 , x4 ), but it is not aligned. Conversely,
aligned functions are not necessarily ideal, as illustrated by the function fB in
Example 8.26.
The function fP4 = x1 x2 ∨ x1 x3 ∨ x2 x4 has the LE property, and it is ideal with
respect to (x1 , x2 , x3 , x4 ), but it is neither aligned nor weakly regular.
Ideal functions are not necessarily shellable: Indeed, f = x1 x2 x5 ∨ x1 x3 ∨ x2 x4
is ideal with respect to (x1 , x2 , x3 , x4 , x5 ) but it is not shellable with respect to any
permutation of its terms (and it is not weakly regular either).
Finally, the dual of ideal functions is not necessarily shellable: the function $f_{C_4}^d = f_{2K_2}$ is a counter-example. □

8.10 Exercises
1. Let f (x1 , x2 , . . . , xn ) be a positive Boolean function, let g = f|x1 =1 , let
h = f|x1 =0 , and assume that both g and h are regular with respect to
(x2 , x3 , . . . , xn ). Show that f is not necessarily regular (compare with
Theorem 8.7).
2. Show that, for a function f (x1 , x2 , . . . , xn ) given in DNF, it is co-NP-complete
to decide whether $x_1 \succeq_f x_2$.
3. Prove Theorem 8.12 by resorting only to the definitions of regularity and of
the LE property.
4. Prove the validity of the claims in Application 8.2.
5. Prove the validity of the claims in Application 8.4.
6. Prove Theorem 8.24.
7. Show that Theorem 8.25 can be used to produce an $O(n^2 m)$ implementation
of DualReg0 (Crama [225]).
8. Prove Theorem 8.39.
9. Prove that Theorem 8.41 extends to problem (8.30)–(8.32) if the objective
function g satisfies the “generalized regularity” condition (8.33). (Hammer,
Johnson, and Peled [443]).

10. Show that in Definition 8.9, comparisons can be restricted to pairs of disjoint
subsets: A Boolean function f on Bn is k-monotone (1 ≤ k ≤ n) if and only
if S and T are comparable for all S, T ⊆ {1, 2, . . . , n} such that S ∩ T = ∅ and
|S ∪ T | ≤ k.
11. Show that the function f (x1 , x2 , . . . , x6 ) defined by
f = x1 (x2 ∨ x3 ∨ x4 x5 x6 ) ∨ x2 x3 (x4 x5 ∨ x4 x6 ∨ x5 x6 ) ∨ (x2 ∨ x3 ) x4 x5 x6
is regular, but is not 3-monotone (and, hence, is not threshold; Winder [917]).
12. Prove that
(a) a function f of n variables is completely monotone if and only if
its self-dual extension $f^{SD}(x_1, x_2, \ldots, x_n, x_{n+1}) = f \bar{x}_{n+1} \vee f^d x_{n+1}$ is
completely monotone;
(b) a self-dual function of n variables is completely monotone if and only
if it is 7n/38-monotone.
(See [698, 918].)
13. Show that, for a positive function f ,
(a) f is 2-summable if and only if there exists a pair of maximal false points
X∗ , W ∗ and a pair of minimal true points Y ∗ , Z ∗ such that Y ∗ + Z ∗ ≤
X∗ + W ∗ ;
(b) in the previous statement, the inequality Y ∗ + Z ∗ ≤ X∗ + W ∗ cannot be
replaced by Y ∗ + Z ∗ = X∗ + W ∗ . (Compare with Definition 8.10.)
14. Prove that a function f is completely monotone if and only if it is k-
monotone, where k is the largest degree of a prime implicant of f . (See
[272].)
15. When $S \succeq_f T$ holds, but $T \succeq_f S$ does not hold for two subsets S, T ⊆
{1, 2, . . . , n}, we write $S \succ_f T$. Show that the relation $\succ_f$ may be cyclic in
general, but is acyclic for threshold functions. (See, e.g., Einy [291], but also
Muroga [698, p. 200] and Winder [917] for additional results along this line.)
16. Complete the proof of Theorem 8.52.
17. Prove Theorem 8.53 and conclude that aligned functions have the LE
property.
18. Prove that, if f (x1 , x2 , . . . , xn ) is ideal, then f|xj =1 is ideal for all j ∈
{1, 2, . . . , n}. Use this result to derive a polynomial-time algorithm for the
recognition of ideal functions. (See Bertolazzi and Sassano [75].)

19. Let G = (V, E) be a graph and let $f_G(x_1, x_2, \ldots, x_n) = \bigvee_{(i,j) \in E} x_i x_j$ be
the corresponding stability function (see Section 1.13.5 in Chapter 1). For
i ∈ V , denote by N (i) the neighborhood of vertex i, that is, N (i) = {j ∈ V :
(i, j ) ∈ E}.
(a) Prove that $x_i \succeq x_j$ with respect to $f_G$ if and only if $N(j) \setminus \{i\} \subseteq N(i) \setminus \{j\}$,
for all i, j ∈ V.
(b) Prove that fG is regular if and only if G does not contain 2K2 , P4 , or
C4 as induced subgraphs.
(c) Prove that fG is regular if and only if fG is weakly regular.
(See Chvátal and Hammer [201]; Crama [224].)

Questions for thought


20. Analyze the complexity of the procedure RegMinor in Figure 8.7. Does it
run in polynomial total time (that is, in time polynomial in |f | + |fR |)?
21. Relations among function classes:
(a) Is it true that weakly regular functions have the LE property? Are they
shellable?
(b) If a function is both aligned and weakly regular with respect to
(x1 , x2 , . . . , xn ), is it also regular with respect to (x1 , x2 , . . . , xn )?
(c) Characterize ideal quadratic functions.
(d) Is there some unifying concept behind the definitions of “leaders” and
of the LE property on one hand, and those of “last variables” and of
ideal functions on the other hand? (Compare, e.g., Theorem 8.11 and
Theorem 8.54.) Exploring these concepts in parallel may lead to fruitful
insights.
22. (Due to Endre Boros.) What can be said about self-dual regular functions?
Are they always threshold?
23. (Due to Endre Boros.) For a regular function f and an arbitrary maximum
false point X∗ of f , does there always exist a permutation σ of the variables
such that f is regular and such that X∗ is a ceiling with respect to the new
order of the variable σ (x1 ), …, σ (xn )? (Compare with Example 8.10.)
9
Threshold functions

In this chapter, we investigate the properties of threshold Boolean functions, an


important class of functions which has already been mentioned several times in
previous chapters. Threshold functions provide a simple but fundamental model for
many questions investigated in electrical engineering, artificial intelligence, game
theory, and many other areas. As such, their main properties have been investi-
gated by countless researchers and frequently rediscovered in various guises. In
particular, we present here a number of necessary conditions for a function to
be threshold, and we establish a classical characterization of threshold functions
based on their “asummability” properties. We also describe a polynomial-time
recognition algorithm for threshold functions represented by positive disjunctive
normal forms, we analyze the complexity of enumerating the prime implicants and
of computing the Chow parameters of threshold functions, and we briefly examine
the class of threshold graphs.

9.1 Definitions and applications


Let us first recall the definition of threshold functions.
Definition 9.1. A Boolean function f on $\mathcal{B}^n$ is called a threshold (or linearly separable) function if there exist n weights $w_1, w_2, \ldots, w_n \in \mathbb{R}$ and a threshold $t \in \mathbb{R}$ such that, for all $(x_1, x_2, \ldots, x_n) \in \mathcal{B}^n$,

$$f(x_1, x_2, \ldots, x_n) = 0 \quad \text{if and only if} \quad \sum_{i=1}^{n} w_i x_i \le t.$$

The hyperplane $\{X \in \mathbb{R}^n : \sum_{i=1}^{n} w_i x_i = t\}$ is called a separator of f, and the (n + 1)-tuple $(w_1, w_2, \ldots, w_n, t)$ is called a (separating) structure of f. We say that the separator and the separating structure represent f.
Note that in this definition, variables x1 , x2 , . . . , xn have to be interpreted as
natural numbers in {0, 1} ⊂ N, rather than purely Boolean, meaningless symbols


(remember the discussion in Section 1.1 of Chapter 1). In geometric terms, thresh-
old functions are precisely those functions for which the set of true points can be
separated from the set of false points by a hyperplane (the separator).
Example 9.1. The function $f(x, y, z) = x\bar{y} \vee z$ is a threshold function, with separator $\{(x, y, z) \in \mathbb{R}^3 : x - y + 2z = 0\}$ and with structure (1, −1, 2, 0). Observe
that f admits many other separators (actually, an infinite number of them): For
instance {(x, y, z) ∈ R3 : α x − α y + 2α z = 0} is a separator for all α > 0, but
so are {(x, y, z) ∈ R3 : x − 2y + 3z = 0}, {(x, y, z) ∈ R3 : 5x − 5y + 10z = 3},
and so on. 
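The separating structures listed in Example 9.1 can be verified by enumerating the eight points of the cube. The sketch below is a plain brute-force check of Definition 9.1; the helper name `represents` is an assumption made for illustration, and the function is taken to be x AND (NOT y) OR z, which is the reading consistent with the structure (1, −1, 2, 0).

```python
from itertools import product

def represents(weights, t, f):
    """True if (weights, t) is a separating structure of f in the sense of
    Definition 9.1: f(X) = 0 exactly when the weighted sum is at most t."""
    n = len(weights)
    return all((f(x) == 0) == (sum(w * xi for w, xi in zip(weights, x)) <= t)
               for x in product((0, 1), repeat=n))

# f(x, y, z) = x AND (NOT y) OR z
f = lambda p: int((p[0] and not p[1]) or p[2])

print(represents((1, -1, 2), 0, f))    # structure given in the example
print(represents((1, -2, 3), 0, f))    # another separator from the example
print(represents((5, -5, 10), 3, f))   # and a third one
print(represents((1, 1, 2), 0, f))     # not a structure: fails at (0, 1, 0)
```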

Example 9.2. The function $f(x, y) = xy \vee \bar{x}\bar{y}$ is not a threshold function. Indeed, its set of true points is {(0, 0), (1, 1)}, its set of false points is {(0, 1), (1, 0)}, and these two sets cannot be separated by a line in $\mathbb{R}^2$. □
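The geometric claim of Example 9.2 can also be checked algebraically from Definition 9.1. The short derivation below is a sketch in which $(w_1, w_2, t)$ denotes a hypothetical separating structure of f; the four true and false points listed in the example lead to a contradiction.

```latex
\begin{align*}
f(0,0) = 1 &\implies 0 > t, \\
f(1,1) = 1 &\implies w_1 + w_2 > t, \\
f(0,1) = 0 &\implies w_2 \le t, \\
f(1,0) = 0 &\implies w_1 \le t.
\end{align*}
Adding the last two inequalities gives $w_1 + w_2 \le 2t$, and $t < 0$ yields
$2t < t$, so that $w_1 + w_2 < t$, which contradicts the second line.
```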

Threshold functions constitute one of the most extensively investigated classes


of Boolean functions. This interest in threshold functions has been stimulated
by their central role in many fields of application, a role which is itself justified
by the simplicity of their description (since a threshold function is completely
characterized by a vector of (n + 1) numbers) and by their numerous nice
properties.
Application 9.1. (Electrical engineering.) A switching gate is an electrical device
(i.e., a circuit consisting of resistors, transistors, etc.) which admits a number
of input voltages V1 , V2 , . . . , Vn , and which releases an output voltage V0 . In a
simplified model, each of the input and output voltages can only assume two
distinct values, say, Vi ∈ {ai , bi } for i = 0, 1, 2, . . . , n.
A threshold gate is a special type of switching gate characterized by a threshold
value t and numerical weights $w_1, w_2, \ldots, w_n$ attached to the inputs; the value of
the output voltage is equal to $a_0$ if $\sum_i w_i V_i \le t$ and is equal to $b_0$ otherwise. So,
up to a simple transformation of variables, the functioning of a threshold gate is
described by a threshold Boolean function.
Threshold gates can be combined in various ways to produce switching net-
works, that is, physical realizations of more general (not necessarily threshold)
Boolean functions, as explained in Section 1.13.2. As a matter of fact, every
Boolean function can be realized by a switching network of threshold gates (see
Theorem 9.2 hereunder). For this reason, threshold gates were widely used as
basic components in the design of early computers. This important application
stimulated, in the late 1950s, a dense flow of research aimed at understanding
the theoretical properties of threshold functions. This research eventually evolved
into a coherent field known as threshold logic, an account of which can be found,
for instance, in books by Dertouzos [269], Hu [511, 512], Mendelson [680], or
Muroga [698, 699].
More recently, the complexity of Boolean circuits made up of threshold gates
has been investigated in the theoretical computer science literature; we refer, for

instance, to the monograph Wegener [902] and to papers by Anthony [27, 28],
Bruck [157], Krause and Wegener [583] for various aspects of this line of
research. 

Application 9.2. (Artificial neural networks.) An artificial neural network con-


sists of a directed graph D together with a collection of functions (or neurons)
associated with the vertices of D. In one of the best known models, D is acyclic,
each vertex of indegree 0 corresponds to a Boolean variable, and each neuron is
a threshold Boolean function, sometimes called perceptron in this context. Then,
each vertex of outdegree 0 can be viewed as computing a Boolean function obtained
as a superposition of several threshold functions, via the same feedforward process
described for combinational networks in Section 1.1.
The reader will quickly notice that this model is extremely similar to the switch-
ing circuit model sketched in Application 9.1. Its interpretation as an abstract
computational model, however, has given rise to an independent stream of research
originating with the book by Minsky and Papert [684]. We refer to Anthony [25, 27]
for a discussion of the links between Boolean threshold functions and neural net-
work theory. 

Application 9.3. (Reliability theory.) In reliability theory, a complex system consisting of n components is called a k-out-of-n system (k ≤ n) if the system works whenever at least k of its components work and if it fails otherwise. Thus, the structure function (see Section 1.13.4) of a k-out-of-n system is the threshold function with separator $\{(x_1, x_2, \ldots, x_n) \in \mathcal{B}^n : \sum_{i=1}^{n} x_i \le k - 1\}$.
More general threshold systems, namely, systems whose structure function is
an arbitrary threshold function, have been considered, for instance, by Ball and
Provan [49]. 

Application 9.4. (Game theory.) In the framework of game theory (recall Section
1.13.3), a positive threshold Boolean function is called a weighted majority game.
Such games model the familiar situation in which each of n players (or voters) is
assigned a number of votes, say, wi (i = 1, 2, . . . , n), which she can decide to cast –
or not – in favor of the issue at stake. The issue is adopted if the total number of votes
cast in its favor exceeds a predetermined threshold t. In the simplest case (simple
majority rule), every voter carries exactly one vote, and the threshold is equal
to half the number of players. More elaborate voting rules arise, for instance, in
legislatures, where the number of votes of each member is correlated to the size of
the constituency that she represents, or in shareholder meetings, where the number
of votes corresponds to the number of shares held by each member.
Weighted majority procedures constitute the main paradigm in the theory of sim-
ple games and social choice. Many properties of these procedures and of their gen-
eralizations appear in the literature, for instance, in [79, 720, 777, 850, 861, 893],
and so on. 
Application 9.5. (Integer programming.) A knapsack problem is an optimization problem of the form

$$\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^{n} c_i x_i \\
\text{subject to} \quad & \sum_{i=1}^{n} w_i x_i \le t, \\
& (x_1, x_2, \ldots, x_n) \in \mathcal{B}^n,
\end{aligned}$$

where $c_i$, $w_i$ and t are nonnegative integers for i = 1, 2, . . . , n. Knapsack problems have been extensively studied in integer programming (Kellerer, Pferschy, and Pisinger [561]; Martello and Toth [671]).
Remember from Section 1.13.6 that the resolvent of a system of constraints in
0–1 variables is the Boolean function whose false points are the feasible solutions of
the system. So, by definition, the resolvent of the knapsack inequality $\sum_i w_i x_i \le t$
is a threshold function. We shall see in Section 9.5 that fundamental Boolean
concepts, such as that of prime implicant, prove useful in describing the solution
set of a knapsack inequality.
Conversely, any system of inequalities in 0–1 variables whose resolvent is a
threshold function, say $f(x_1, x_2, \ldots, x_n)$, is equivalent to a single linear inequality $\sum_i w_i x_i \le t$, where $(w_1, w_2, \ldots, w_n, t)$ is a structure of f. In particular, a
polynomial-time algorithm will be given in Section 9.4 to decide whether an
instance of the set–covering problem can be transformed into an equivalent
instance of the knapsack problem in the same variables (where the term “equiva-
lent” means that both instances have the same set of feasible solutions). 

Application 9.6. (Distributed computing systems.) Boolean functions can be


used to prevent conflicts in distributed computing systems, as briefly sketched
in Application 4.7 of Section 4.2. A popular way to implement mutual exclusion
mechanisms in a distributed system relies on threshold functions. In this approach,
a vote wi is assigned to each site of the system, and a group of sites is allowed
to perform an operation (such as updating a database) only if its members have
a majority of the total number of votes (see, e.g., [258, 370]). Similar ideas have
been proposed in other computational contexts, such as the synchronization of
parallel processes [487, 888]. 

Before concluding this section, we mention the existence of a large body of literature dealing with higher-degree generalizations of threshold functions. Namely, a Boolean function f on $\mathcal{B}^n$ is called a polynomial threshold function of degree k if there exists a multilinear (pseudo-Boolean) polynomial $p(X) = \sum_{A \in \mathcal{P}(N)} c(A) \prod_{i \in A} x_i$ of degree k such that

$$f(X) = 0 \quad \text{if and only if} \quad p(X) \le 0$$

(recall the definitions in Section 1.12.2). So, a linearly separable function is a polynomial threshold function of degree 1, and it is easy to see that every Boolean function on $\mathcal{B}^n$ is a polynomial threshold function of degree n.
Polynomial threshold functions have been investigated in connection with cir-
cuit complexity and neural networks. The reader is referred to the monograph by
Anthony [25] and to survey papers by Anthony [27], Bruck [157], or Saks [799]
for more information.

9.2 Basic properties of threshold functions


In this section, we get acquainted with some of the elementary properties of thresh-
old functions (see, e.g., [700, 732, 916, 917], as well as additional references cited
in [698]). Many of these properties can best be seen as necessary conditions for a
function to be threshold, but all turn out to be strictly weaker than thresholdness.
Complete characterizations of threshold functions will be presented in Section 9.3.
We start with a few easy observations.
Theorem 9.1. Elementary conjunctions and elementary disjunctions represent
threshold functions.
 
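Proof. The equation $\sum_{i \in A} x_i + \sum_{j \in B} (1 - x_j) = |A| + |B| - 1$ defines a separator of the function $C_{AB} = \bigwedge_{i \in A} x_i \bigwedge_{j \in B} \bar{x}_j$, and the equation $\sum_{i \in A} x_i + \sum_{j \in B} (1 - x_j) = 0$ defines a separator of the elementary disjunction $\bigvee_{i \in A} x_i \vee \bigvee_{j \in B} \bar{x}_j$. □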
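The two separators used in the proof of Theorem 9.1 can be confirmed numerically for any particular choice of A and B. The following Python sketch enumerates all points of the cube; the function name `check_theorem_9_1` and the 0-based index sets are assumptions made for this illustration.

```python
from itertools import product

def check_theorem_9_1(n, A, B):
    """Check, for disjoint 0-based index sets A and B, that the left-hand side
    sum_{i in A} x_i + sum_{j in B} (1 - x_j) separates both the elementary
    conjunction (threshold |A|+|B|-1) and the elementary disjunction (threshold 0)."""
    for x in product((0, 1), repeat=n):
        lhs = sum(x[i] for i in A) + sum(1 - x[j] for j in B)
        conj = int(all(x[i] for i in A) and all(not x[j] for j in B))
        disj = int(any(x[i] for i in A) or any(not x[j] for j in B))
        # Definition 9.1: the function is 0 exactly when lhs <= threshold.
        if (conj == 0) != (lhs <= len(A) + len(B) - 1):
            return False
        if (disj == 0) != (lhs <= 0):
            return False
    return True

print(check_theorem_9_1(4, [0, 2], [1]))   # literals x1, x3 and (not x2) on four variables
```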

An interesting corollary of Theorem 9.1 is that every Boolean function can be
expressed as a composition of threshold functions.
Theorem 9.2. Every Boolean function f(X) on $B^n$ can be expressed in the form
$f(X) = g(h_1(X), h_2(X), \ldots, h_m(X))$,
where g and $h_1, h_2, \ldots, h_m$ are threshold functions.
Proof. This follows immediately from Theorem 9.1 and from the fact that every
Boolean function has a disjunctive normal form.

As mentioned in Application 9.1, this observation motivates the realization of
switching networks by threshold gates.
Another easy, but important, property of threshold functions is that their class
is closed under restrictions.
Theorem 9.3. If $f(x_1, x_2, \ldots, x_n)$ is a threshold function on $B^n$ with separating
structure $(w_1, w_2, \ldots, w_n, t)$, then $f|_{x_1=1}$ is a threshold function on $B^{n-1}$ with
separating structure $(w_2, w_3, \ldots, w_n, t - w_1)$ and $f|_{x_1=0}$ is a threshold function on $B^{n-1}$
with separating structure $(w_2, w_3, \ldots, w_n, t)$.
Proof. This is trivial. 

We have already observed that a threshold function may have infinitely many
separators (see Example 9.1). In fact, the set of separators can be characterized
more precisely.
Theorem 9.4. The separating structures of a threshold function of n variables
constitute a full-dimensional convex cone in $\mathbb{R}^{n+1}$.
Proof. If $S$ and $S'$ are two arbitrary separating structures of the threshold function
f, and if α is a positive scalar, then $\alpha S$ and $S + S'$ are also separating structures
of f. Thus, the set of separating structures is a convex cone.
To establish full-dimensionality, let $S = (w_1, w_2, \ldots, w_n, t)$. If f is identically
0, then the claim is easily checked. Otherwise, define

$$\mu = \min \Big\{ \sum_{i=1}^n w_i x_i : \sum_{i=1}^n w_i x_i > t,\ X \in B^n \Big\},$$

and choose α arbitrarily in the interval (0, µ − t) so that t + α is nonzero (note
that µ is well-defined, and that µ − t > 0). Consider now the (n + 1) vectors
$S^1, S^2, \ldots, S^{n+1}$, where $S^i = S + \alpha e_i + \alpha e_{n+1}$ (i = 1, 2, . . . , n), $S^{n+1} = S + \alpha e_{n+1}$,
and $e_j$ denotes the j-th unit vector in $\mathbb{R}^{n+1}$. It is straightforward to check that the
vectors $S^1, S^2, \ldots, S^{n+1}$ are linearly independent, and that each of them is a
structure of f.

As a corollary of Theorem 9.4, we obtain the following useful property (which
is also easily established from first principles).
Theorem 9.5. Every threshold function has an integral separating structure.
Proof. Details are left to the reader.

In the remainder of this section, we try to understand where threshold functions
fit in the world of Boolean functions, in particular with respect to monotone,
regular, or dual-comparable functions (this topic will be taken up further in
Section 9.3).
First, we note that every threshold function is monotone, and hence, can be
turned into a positive function by “switching” some of its variables. Moreover, the
positivity or negativity of each variable is reflected in the sign of the corresponding
weight.
Theorem 9.6. Every threshold function is monotone. More precisely, if
f (x1 , x2 , . . . , xn ) is a threshold function with structure (w1 , w2 , . . . , wn , t), then,
for i = 1, 2, . . . , n:
(1) If wi = 0, then f does not depend on xi .
(2) If f does not depend on xi , then (w1 , . . . , wi−1 , 0, wi+1 , . . . , wn , t) is a structure
of f .
(3) If wi > 0, then f is positive in xi .
(4) If f is positive in xi and f depends on xi , then wi > 0.
(5) If wi < 0, then f is negative in xi .
(6) If f is negative in xi and f depends on xi , then wi < 0.
(7) Assume that wj ≥ 0 for j = 1, 2, . . . , k, wj < 0 for j = k + 1, k + 2, . . . , n, and
define the function $g(x_1, x_2, \ldots, x_n) = f(x_1, x_2, \ldots, x_k, \bar{x}_{k+1}, \ldots, \bar{x}_n)$. Then,
g is a positive threshold function with structure
$(w_1, w_2, \ldots, w_k, -w_{k+1}, \ldots, -w_n, t - \sum_{j=k+1}^n w_j)$.
Proof. The proof is left as an exercise. 

Example 9.3. The function $f(x, y, z) = x\bar{y} \vee z$ considered in Example 9.1 is a
threshold function with structure (1, −2, 3, 0). The associated function $g(x, y, z) =
xy \vee z$ is also a threshold function with structure (1, 2, 3, 2).
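The switching step of Theorem 9.6(7) is easy to replay on this example. The sketch below reads f as $x\bar{y} \vee z$, which is the function that the structure (1, −2, 3, 0) defines, and takes g = xy ∨ z; the helper names are hypothetical, and the code handles negative weights in any position rather than only in the last n − k positions.

```python
from itertools import product

def threshold_value(structure, x):
    """f(X) = 1 iff w.X > t, for structure = (w_1, ..., w_n, t)."""
    *w, t = structure
    return int(sum(wi * xi for wi, xi in zip(w, x)) > t)

def switch_negative_weights(structure):
    """Structure of g, obtained from f by complementing the variables with w_i < 0."""
    *w, t = structure
    return (*[abs(wi) for wi in w], t - sum(wi for wi in w if wi < 0))

def f(x, y, z):
    return int((x and not y) or z)

def g(x, y, z):
    return int((x and y) or z)

s_f = (1, -2, 3, 0)
s_g = switch_negative_weights(s_f)
points = list(product((0, 1), repeat=3))
print(s_g)
print(all(threshold_value(s_f, p) == f(*p) for p in points))
print(all(threshold_value(s_g, p) == g(*p) for p in points))
```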

Let us stress the fact that, as the next example illustrates, a variable can have
nonzero weight in the separating structure of a threshold function even if the
function does not depend on this variable (as a matter of fact, we will show later
that it is NP-hard to determine whether or not a threshold function given by a
separating structure depends on a particular variable; see Theorem 9.26).
Example 9.4. The function f (x, y, z, u) = xy ∨ z is a threshold function with
structure (2, 4, 6, 1, 5). The variable u, which is inessential, has positive weight in
this separating structure. 

One can check by complete enumeration that monotonicity is equivalent to
thresholdness for functions of three variables or less. However, as one may expect,
this statement does not hold for functions of more variables.
Example 9.5. The functions f (x, y, z, u) = xy ∨ zu, g(x, y, z, u) = xy ∨ yz ∨ zu,
and h(x, y, z, u)= xy ∨ yz ∨ zu ∨ xu are not threshold. Up to permutations of
their variables, f , g, and h are, in fact, the only positive nonthreshold functions
of four variables [698]. 

An easy way of proving that the functions f , g, and h in Example 9.5 are not
threshold is to observe that they are not regular. Indeed:
Theorem 9.7. Every threshold function has a complete strength preorder. More
precisely, if $f(x_1, x_2, \ldots, x_n)$ is a threshold function, then,
(1) for every structure $(w_1, w_2, \ldots, w_n, t)$ of f, and for all i, j = 1, 2, . . . , n, if
$w_i \ge w_j$, then $x_i \succeq_f x_j$;
(2) there exists a structure $(w_1, w_2, \ldots, w_n, t)$ of f such that, for all i, j =
1, 2, . . . , n, $w_i \ge w_j$ if and only if $x_i \succeq_f x_j$.
Proof. The proof of (1) is straightforward (this statement was established as
Theorem 8.3 in Section 8.1). As for statement (2), consider an arbitrary structure
$(v_1, v_2, \ldots, v_n, t)$ of f. Denote by C an equivalence class of the relation $\approx_f$,
say, without loss of generality, $C = \{x_1, x_2, \ldots, x_k\}$. By symmetry of the variables
$x_1, x_2, \ldots, x_k$, it is clear that the vector

$$(V_i, t) = (v_i, v_{i+1}, \ldots, v_k, v_1, v_2, \ldots, v_{i-1}, v_{k+1}, v_{k+2}, \ldots, v_n, t)$$

is a structure of f, for i = 1, 2, . . . , k. Therefore, by Theorem 9.4,
$(V, t) = \frac{1}{k} \sum_{i=1}^k (V_i, t)$ is also a structure of f, for which all variables in C have the same
weight. The same procedure can be repeated for all other equivalence classes
of $\approx_f$ until we eventually obtain a structure (W, t) of f with the property that
$x_i \approx_f x_j$ implies $w_i = w_j$, for i, j = 1, 2, . . . , n. But then, (W, t) is a structure
of f as described by (2), since, by statement (1), $x_i \succ_f x_j$ implies $w_i > w_j$, for
i, j = 1, 2, . . . , n.

Example 9.6. Consider again the function g(x, y, z) = xy ∨ z defined in
Example 9.3. This is a threshold function with structure (1, 2, 3, 2). From statement
(1) in Theorem 9.7, we conclude that $x \preceq_g y \preceq_g z$. On the other hand, applying
the procedure described in the proof of statement (2) with C = {x, y}, we
obtain the alternative structure (3/2, 3/2, 3, 2) that gives equal weight to symmetric
variables.
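The averaging argument in the proof of Theorem 9.7(2) amounts to replacing the weights of the variables in C by their mean. A minimal sketch follows, under the assumption that the indices passed in `cls` really form a class of symmetric variables (the code does not verify this); `realizes` and `equalize_weights` are hypothetical helper names, and exact rationals are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

def realizes(structure, func, n):
    """True if (w, t) is a structure of func: func(X) = 1 iff w.X > t."""
    *w, t = structure
    return all(
        (sum(wi * xi for wi, xi in zip(w, x)) > t) == bool(func(*x))
        for x in product((0, 1), repeat=n)
    )

def equalize_weights(structure, cls):
    """Average of the cyclic shifts (V_i, t): every index in cls gets the mean weight."""
    *w, t = structure
    mean = Fraction(sum(w[i] for i in cls), len(cls))
    weights = tuple(mean if i in cls else Fraction(wi) for i, wi in enumerate(w))
    return weights + (Fraction(t),)

g = lambda x, y, z: (x and y) or z
balanced = equalize_weights((1, 2, 3, 2), cls={0, 1})
print(balanced, realizes(balanced, g, 3))
```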

It can be checked directly that every regular function of five variables or less is
a threshold function, but there exist nonthreshold regular functions of six variables
(see Winder [917] and Exercise 11 in Chapter 8).
We also recall Theorem 8.43 (Section 8.8).

Theorem 9.8. Every threshold function is completely monotone. More precisely,
if $f(x_1, x_2, \ldots, x_n)$ is a threshold function with structure $(w_1, w_2, \ldots, w_n, t)$, and if
S, T are two subsets of {1, 2, . . . , n} such that $\sum_{i \in S} w_i \ge \sum_{i \in T} w_i$, then $S \succeq_f T$.

Proof. Straightforward.

As mentioned in Section 8.8, all completely monotone functions of eight
variables or less are threshold functions, but we shall present in Section 9.3 an example
of a nonthreshold completely monotone function of nine variables (see
Theorem 9.15). Let us also mention that Winder [917] has constructed a nonthreshold
completely monotone function, for which the strength relation of f is acyclic (see
Einy [291] and Muroga [698] for additional information on this line of research).
We now investigate the behavior of threshold functions with respect to
dualization (see Section 4.1.3 for definitions).

Theorem 9.9. If f is a threshold function on $B^n$ and $(w_1, w_2, \ldots, w_n, t)$ is
an integral structure of f, then $f^d$ is a threshold function with structure
$(w_1, w_2, \ldots, w_n, \sum_{i=1}^n w_i - t - 1)$. If $t \le \frac{1}{2} (\sum_{i=1}^n w_i - 1)$, then f is dual-major.
If $t \ge \frac{1}{2} (\sum_{i=1}^n w_i - 1)$, then f is dual-minor.
Proof. Let $t' = \sum_{i=1}^n w_i - t - 1$. Since t and $w_1, w_2, \ldots, w_n$ are integral, the
following equivalences hold for all $X \in B^n$:

$f^d(X) = 0$ if and only if $f(\bar{X}) = 1$
if and only if $\sum_{i=1}^n w_i (1 - x_i) > t$
if and only if $\sum_{i=1}^n w_i x_i \le t'$.

This proves the first part of the statement. For the second and third parts, simply
notice that $f^d \le f$ if $t \le t'$ and $f \le f^d$ if $t' \le t$.

In view of Theorem 9.5, the requirement that the structure be integral is obvi-
ously not essential in the statement of Theorem 9.9; it merely simplifies its
expression. Note also that the conditions for f to be dual-major or dual-minor
are sufficient, but not necessary, in this statement, as illustrated by the next exam-
ple. (Exercise 8 at the end of the chapter actually suggests that it may be hard to
characterize self-dual threshold functions.)
Example 9.7. The threshold function f (x, y, z, u) = xy ∨ xz ∨ xu ∨ yzu
admits the structure (4, 2, 2, 2, 5). Thus, f d is a threshold function with structure
(4, 2, 2, 2, 4), and f is dual-minor. But another structure of f is (2, 1, 1, 1, 2), which
implies that the same vector (2, 1, 1, 1, 2) is also a structure of f d , and hence, that
f is self-dual: f = f d . 
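The structures quoted in Example 9.7 can be verified by brute force. The sketch below is an illustration only: it uses the definition $f^d(X) = 1 - f(\bar{X})$ and the dual structure of Theorem 9.9, and the helper names are hypothetical.

```python
from itertools import product

def threshold_fn(structure):
    """Function defined by (w, t): value 1 iff w.X > t."""
    *w, t = structure
    return lambda x: int(sum(wi * xi for wi, xi in zip(w, x)) > t)

def dual_structure(structure):
    """Structure of f^d given an integral structure of f (Theorem 9.9)."""
    *w, t = structure
    return (*w, sum(w) - t - 1)

def dual(f):
    """f^d(X) = 1 - f(complement of X)."""
    return lambda x: 1 - f(tuple(1 - xi for xi in x))

def table(f, n):
    return [f(x) for x in product((0, 1), repeat=n)]

s1, s2 = (4, 2, 2, 2, 5), (2, 1, 1, 1, 2)
f = threshold_fn(s1)
print(dual_structure(s1), dual_structure(s2))
print(table(threshold_fn(dual_structure(s1)), 4) == table(dual(f), 4))
print(table(f, 4) == table(dual(f), 4))  # self-duality of f
```

Note that the first structure only certifies that f is dual-minor, whereas the second one, whose threshold coincides with the threshold of its dual structure, certifies self-duality.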
The next property was independently observed in the context of threshold logic
(see for instance [698]) and of game theory (see [291]). It involves the concept of
self-dual extension, which we introduced in Section 4.1.3.
Theorem 9.10. The function $f(x_1, x_2, \ldots, x_n)$ is a threshold function if and only if
its self-dual extension $f^{SD}(x_1, x_2, \ldots, x_n, x_{n+1}) = f \bar{x}_{n+1} \vee f^d x_{n+1}$ is a threshold
function.
Proof. Assume that f is a threshold function and that $(w_1, w_2, \ldots, w_n, t)$ is an
integral structure of f. Then, it follows from Theorem 9.9 that
$(w_1, w_2, \ldots, w_n, 2t + 1 - \sum_{i=1}^n w_i, t)$ is a structure of $f^{SD}$. Conversely, if $f^{SD}$ is a threshold
function with structure $(w_1, w_2, \ldots, w_n, w_{n+1}, t)$, then $(w_1, w_2, \ldots, w_n, t)$ is a
structure of f.
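As an illustration of the first half of this proof, the following sketch builds the structure of the self-dual extension from an integral structure of f and compares it with the definition of the extension (f when the last variable is 0, $f^d$ when it is 1). The test function xy ∨ z with structure (1, 2, 3, 2) is borrowed from Example 9.3; the helper names are hypothetical.

```python
from itertools import product

def threshold_fn(structure):
    """Function defined by (w, t): value 1 iff w.X > t."""
    *w, t = structure
    return lambda x: int(sum(wi * xi for wi, xi in zip(w, x)) > t)

def self_dual_extension_structure(structure):
    """Structure of f^SD built from an integral structure of f (Theorem 9.10)."""
    *w, t = structure
    return (*w, 2 * t + 1 - sum(w), t)

def self_dual_extension(f):
    """f^SD(X, x_{n+1}) equals f(X) if x_{n+1} = 0 and f^d(X) otherwise."""
    def f_sd(x):
        *head, last = x
        if last == 0:
            return f(tuple(head))
        return 1 - f(tuple(1 - xi for xi in head))
    return f_sd

s = (1, 2, 3, 2)
s_sd = self_dual_extension_structure(s)
f_sd = self_dual_extension(threshold_fn(s))
agree = all(threshold_fn(s_sd)(x) == f_sd(x) for x in product((0, 1), repeat=4))
print(s_sd, agree)
```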

We conclude this section by stating some results regarding the number of thresh-
old functions and the size of the weights required in a separating structure. The
number of threshold functions of n variables is known quite precisely.
Theorem 9.11. The number $\tau_n$ of threshold functions of n variables satisfies

$$\frac{n^2}{2} - \frac{n}{2} \le \log_2 \tau_n \le n^2. \tag{9.1}$$

Moreover, for n sufficiently large,

$$n^2 \left( 1 - \frac{10}{\ln n} \right) < \log_2 \tau_n. \tag{9.2}$$

The upper bound in (9.1) was independently proved by several authors and
published by Winder in [916] (see [698, 917] for an account). The lower bound in
(9.1) is due to Yajima and Ibaraki [928] and Smith [842]. The sharper asymptotic
lower bound (9.2) was eventually established by Zuev [940], thus settling Winder’s
conjecture [917] that $(\log_2 \tau_n)/n^2$ approaches 1 as n goes to infinity. We do not
prove these results here; the reader is referred to the original publications or to
Anthony [25, 27] for extensions.
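For very small n, the number of threshold functions can be obtained by enumerating integral structures and collecting the distinct truth tables they define. The sketch below is illustrative only; it rests on the assumption (not taken from the text) that weights of magnitude at most 1 suffice for n = 2 and at most 2 for n = 3, and it compares the resulting counts with the bounds (9.1).

```python
from itertools import product
from math import log2

def count_threshold_functions(n, max_weight):
    """Count distinct truth tables over integral structures with |w_i| <= max_weight."""
    points = list(product((0, 1), repeat=n))
    bound = n * max_weight
    tables = set()
    for w in product(range(-max_weight, max_weight + 1), repeat=n):
        # Thresholds outside [-bound - 1, bound] only repeat the constant functions.
        for t in range(-bound - 1, bound + 1):
            tables.add(tuple(int(sum(a * b for a, b in zip(w, x)) > t) for x in points))
    return len(tables)

for n, max_weight in ((2, 1), (3, 2)):
    tau = count_threshold_functions(n, max_weight)
    print(n, tau, n * (n - 1) / 2 <= log2(tau) <= n * n)
```

The enumeration grows far too quickly to be useful beyond a handful of variables.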
The size of weights in a separator can be bounded as follows:

Theorem 9.12. For every threshold function of n variables, there exists an integral
separating structure $(w_1, w_2, \ldots, w_n, t)$ such that

$$\max\{|w_1|, |w_2|, \ldots, |w_n|, |t|\} \le (n + 1)\, n^{n/2}. \tag{9.3}$$

Moreover, there are constants k > 0 and c > 1 such that, for n a power of 2, there
is a threshold function f of n variables such that any integral separating structure
representing f involves a weight of magnitude at least $k c^{-n} n^{n/2}$.

In Theorem 9.12 (which we quote directly from Anthony [25]), the upper bound
is due to Muroga [698] and the lower bound is due to Håstad [477]. Observe that
the dominating factor $n^{n/2}$ is identical in both bounds. Here, we again omit the
proofs and refer the reader to [25, 27, 477, 698] for additional details; see also
Diakonikolas and Servedio [271] for significant extensions.

9.3 Characterizations of threshold functions


In this section, we present two alternative characterizations of threshold functions
and discuss related results.
The first characterization is a simple linear programming formulation which
provides a useful computational tool for the recognition of threshold functions
(see Section 9.4). For the sake of simplicity, we only state it for positive functions:
Since every threshold function is monotone, this restriction does not entail any
essential loss of generality.

Theorem 9.13. A positive Boolean function with maximal false points
$X^1, X^2, \ldots, X^p$ and minimal true points $Y^1, Y^2, \ldots, Y^m$ is a threshold function if
and only if the system of inequalities

(TS)
$\sum_{i=1}^n w_i x_i^j \le t$ (j = 1, 2, . . . , p)
$\sum_{i=1}^n w_i y_i^j \ge t + 1$ (j = 1, 2, . . . , m)
$w_i \ge 0$ (i = 1, 2, . . . , n)

has a solution $(w_1, w_2, \ldots, w_n, t)$. When this is the case, every solution of (TS) is a
separating structure of the function.
Proof. The statement follows directly from Definition 9.1 and Theorems 9.4 and
9.6.

Example 9.8. Let f = x1 x2 ∨ x1 x3 x4 ∨ x2 x3 x4 . The maximal false points of f are
(1, 0, 1, 0), (1, 0, 0, 1), (0, 1, 1, 0), (0, 1, 0, 1), (0, 0, 1, 1), and its minimal true points
are (1, 1, 0, 0), (1, 0, 1, 1), (0, 1, 1, 1). Thus, the system (TS) associated with f is

w1 + w3 ≤ t
w1 + w4 ≤ t
w2 + w3 ≤ t
w2 + w4 ≤ t
w3 + w4 ≤ t
w1 + w2 ≥ t +1
w1 + w3 + w4 ≥ t +1
w2 + w3 + w4 ≥ t +1
w1 , w2 , w3 , w4 ≥ 0.
This system admits the solution (w1 , w2 , w3 , w4 , t) = (5, 4, 3, 2, 8). Hence, f is a
threshold function with structure (5, 4, 3, 2, 8). 
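The data of this example can be recomputed by brute force. In the following sketch, a positive DNF is encoded as a list of sets of 0-based variable indices (an illustrative encoding), the maximal false points and minimal true points are found by enumeration, and a candidate structure is tested against the inequalities of (TS). The enumeration is exponential in n, so this is a check of the example rather than a recognition algorithm.

```python
from itertools import product

def evaluate(terms, x):
    """Value of a positive DNF; each term is a set of 0-based variable indices."""
    return int(any(all(x[i] for i in term) for term in terms))

def extremal_points(terms, n):
    """Maximal false points and minimal true points, found by enumeration."""
    pts = list(product((0, 1), repeat=n))
    false_pts = [x for x in pts if not evaluate(terms, x)]
    true_pts = [x for x in pts if evaluate(terms, x)]
    leq = lambda a, b: all(ai <= bi for ai, bi in zip(a, b))
    max_false = [x for x in false_pts
                 if not any(x != y and leq(x, y) for y in false_pts)]
    min_true = [y for y in true_pts
                if not any(y != z and leq(z, y) for z in true_pts)]
    return max_false, min_true

def satisfies_TS(structure, max_false, min_true):
    """Check the inequalities of (TS) for a candidate (w_1, ..., w_n, t)."""
    *w, t = structure
    dot = lambda p: sum(wi * pi for wi, pi in zip(w, p))
    return (all(wi >= 0 for wi in w)
            and all(dot(x) <= t for x in max_false)
            and all(dot(y) >= t + 1 for y in min_true))

terms = [{0, 1}, {0, 2, 3}, {1, 2, 3}]
max_false, min_true = extremal_points(terms, 4)
print(sorted(max_false, reverse=True))
print(sorted(min_true, reverse=True))
print(satisfies_TS((5, 4, 3, 2, 8), max_false, min_true))
```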
Theorem 9.13, like Definition 9.1, has a strong numerical flavor. The next result
originated in the efforts devoted by researchers in switching logic to establish
purely combinatorial, rather than numerical, characterizations of threshold func-
tions (remember that the study of regularity and of k-monotonicity also originated
in such attempts; see Chapter 8).
We start with a definition (due to Winder [917]) that extends the notions
of 2-summability and 2-asummability already introduced in Definition 8.10 of
Section 8.8.
Definition 9.2. Let k ∈ N, k ≥ 2. A Boolean function f on $B^n$ is k-summable if, for
some r ∈ {2, 3, . . . , k}, there exist r (not necessarily distinct) false points of f, say,
$X^1, X^2, \ldots, X^r$, and r (not necessarily distinct) true points of f, say, $Y^1, Y^2, \ldots, Y^r$,
such that $\sum_{i=1}^r X^i = \sum_{i=1}^r Y^i$. A function is k-asummable if it is not k-summable,
and it is asummable if it is k-asummable for all k ≥ 2.
Example 9.9. We have shown in Example 8.24 that the function f (x1 , x2 ) =
x1 x2 ∨ x1 x2 is 2-summable. We shall provide an example of a 2-asummable,
3-summable function in the proof of Theorem 9.15. 
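Definition 9.2 can be tested directly on small functions. The sketch below searches, for a fixed r, for r false points and r true points with equal vector sums; it is a brute-force illustration with a hypothetical function name, and the test functions (the exclusive-or of two variables, xy ∨ zu from Example 9.5, and the threshold function xy ∨ z) are chosen here only as examples.

```python
from itertools import combinations_with_replacement, product

def summability_witness(f, n, r):
    """Return r false points and r true points with equal sums, or None."""
    pts = list(product((0, 1), repeat=n))
    false_pts = [x for x in pts if not f(*x)]
    true_pts = [x for x in pts if f(*x)]
    vec_sum = lambda group: tuple(map(sum, zip(*group)))
    sums = {}
    # Points need not be distinct, hence combinations with replacement.
    for group in combinations_with_replacement(false_pts, r):
        sums.setdefault(vec_sum(group), group)
    for group in combinations_with_replacement(true_pts, r):
        s = vec_sum(group)
        if s in sums:
            return sums[s], group
    return None

xor = lambda x, y: x != y
two_pairs = lambda x, y, z, u: (x and y) or (z and u)
threshold = lambda x, y, z: (x and y) or z

print(summability_witness(xor, 2, 2))
print(summability_witness(two_pairs, 4, 2))
print(summability_witness(threshold, 3, 2), summability_witness(threshold, 3, 3))
```

A function is k-summable when such a witness exists for some r between 2 and k, so a full test would loop over r.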
The following characterization of threshold Boolean functions is due to Chow
[193] and Elgot [310].
Theorem 9.14. A Boolean function is a threshold function if and only if it is
asummable.
Proof. Let f be a threshold function on $B^n$ with structure $(W, t) \in \mathbb{R}^{n+1}$, let
$X^1, X^2, \ldots, X^r$ be r false points of f, and let $Y^1, Y^2, \ldots, Y^r$ be r true points of
f. Then, for i = 1, 2, . . . , r,

$$W X^i \le t < W Y^i,$$

and hence, $\sum_{i=1}^r X^i \ne \sum_{i=1}^r Y^i$. Therefore, f is asummable.
Conversely, assume that f is not threshold, meaning that the set
$\{X^1, X^2, \ldots, X^p\}$ of false points of f cannot be separated from the set
$\{Y^1, Y^2, \ldots, Y^m\}$ of its true points by a hyperplane of $\mathbb{R}^n$. Then, standard separation
theorems (see, e.g., [788]) imply that the convex hulls of $\{X^1, X^2, \ldots, X^p\}$ and of
$\{Y^1, Y^2, \ldots, Y^m\}$ have nonempty intersection. In other words, the following system
has a feasible solution in the variables $u_i$, i = 1, 2, . . . , p, and $v_j$, j = 1, 2, . . . , m:

$$\sum_{i=1}^p u_i X^i = \sum_{j=1}^m v_j Y^j \tag{9.4}$$
$$\sum_{i=1}^p u_i = 1 \tag{9.5}$$
$$\sum_{j=1}^m v_j = 1 \tag{9.6}$$
$$u_i \ge 0 \quad (i = 1, 2, \ldots, p) \tag{9.7}$$
$$v_j \ge 0 \quad (j = 1, 2, \ldots, m). \tag{9.8}$$

Let $(U, V) \in \mathbb{Q}^{p+m}$ be a rational solution of (9.4)–(9.8) (such a solution exists, since
the system has rational coefficients). For some positive integer k, all components
of the vector (kU, kV) are nonnegative integers, and

$$\sum_{i=1}^p (k u_i) X^i = \sum_{j=1}^m (k v_j) Y^j \tag{9.9}$$
$$\sum_{i=1}^p k u_i = k \tag{9.10}$$
$$\sum_{j=1}^m k v_j = k. \tag{9.11}$$

Now the equalities (9.9)–(9.11) express that f is a k-summable function: Simply
take $k u_i$ copies of the false point $X^i$ for i = 1, 2, . . . , p, and $k v_j$ copies of the true
point $Y^j$ for j = 1, 2, . . . , m.

In this proof, we have stressed the connection of Theorem 9.14 with geometric
separability theorems. Alternatively, this result could also be deduced directly by
applying the strong duality theorem of linear programming to the formulation (TS)
(as in [310, 917]).

$Th \subseteq A_{k+1} \subseteq A_k \subseteq \ldots \subseteq A_2 = CM$

Figure 9.1. The hierarchy of k-asummable Boolean functions.

Thus, if we denote by $Th$ the set of threshold functions, by $A_k$ the set of
k-asummable functions (k ≥ 2), and by $CM$ the set of completely monotone
functions, we obtain the hierarchy displayed in Figure 9.1 for all k ≥ 2. (Compare with
the hierarchy of k-monotone functions pictured in Figure 8.8 of Section 8.8, and
recall that $A_2 = CM$ by Theorem 8.50.)
It was once conjectured that this hierarchy may be finite, meaning that there
would exist some possibly large, but fixed value $k^*$ such that the equality
$Th = A_k = A_{k^*}$ holds for all $k \ge k^*$. This conjecture was demolished by Winder [915,
917] who proved that, for every k, there exist k-asummable functions that are not
linearly separable. We do not establish this result here, but simply prove the weaker
statement that the inclusion $Th \subseteq A_2$ is strict.
Theorem 9.15. Some 2-asummable functions are not threshold functions.
Proof. Moore [692] (cited in [698, 917]) first exhibited a 12-variable function
establishing this statement. Gabelman [356] later produced a 9-variable example.
We propose here a variant of Gabelman’s example.
Consider first the vector A = (14, 18, 24, 26, 27, 30, 31, 36, 37). We shall use the
observation that the only points of $B^9$ lying on the hyperplane
$H = \{X \in B^9 : \sum_{i=1}^9 a_i x_i = 81\}$ are the six points:

$X^1 = (1, 0, 0, 0, 0, 0, 1, 1, 0)$, $X^2 = (0, 1, 0, 1, 0, 0, 0, 0, 1)$, $X^3 = (0, 0, 1, 0, 1, 1, 0, 0, 0)$,
$Y^1 = (1, 0, 0, 0, 0, 1, 0, 0, 1)$, $Y^2 = (0, 1, 0, 0, 1, 0, 0, 1, 0)$, $Y^3 = (0, 0, 1, 1, 0, 0, 1, 0, 0)$.

Define now a Boolean function $f(x_1, x_2, \ldots, x_9)$ as follows: The false points of f
are all points X such that $\sum_{i=1}^9 a_i x_i \le 80$, plus the three points $X^1$, $X^2$, and $X^3$.
Notice that, in particular, $Y^1$, $Y^2$, and $Y^3$ are true points of f. We claim that f is
2-asummable but not a threshold function.
To see that f is not a threshold function, it suffices to observe that
$X^1 + X^2 + X^3 = Y^1 + Y^2 + Y^3$, and hence, that f is 3-summable.
On the other hand, assume that f is 2-summable, and that

$$U^* + V^* = W^* + Z^*, \tag{9.12}$$

where $U^*$, $V^*$ are false points of f, and $W^*$, $Z^*$ are true points of f. Then, all four
points $U^*$, $V^*$, $W^*$, and $Z^*$ must lie on the hyperplane H; otherwise,

$$\sum_{i=1}^9 a_i u_i^* + \sum_{i=1}^9 a_i v_i^* < \sum_{i=1}^9 a_i w_i^* + \sum_{i=1}^9 a_i z_i^*,$$

contradicting equation (9.12). So, $\{U^*, V^*\} \subset \{X^1, X^2, X^3\}$ and $\{W^*, Z^*\} \subset
\{Y^1, Y^2, Y^3\}$. But this is easily seen to be incompatible with equation (9.12).

threshold ⇔ asummable ⇒ k-asummable (k > 2) ⇒ 2-asummable ⇔ completely monotone ⇒ k-monotone (k ≥ 1)

Figure 9.2. A hierarchy of Boolean functions.
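The numerical claims used in the proof of Theorem 9.15 lend themselves to a direct check. The following sketch enumerates $B^9$, lists the points on the hyperplane H, builds f as defined in the proof, and tests 3-summability (through the given witnesses) and 2-summability (by comparing all sums of two false points with all sums of two true points). The variable names are illustrative.

```python
from itertools import product

A = (14, 18, 24, 26, 27, 30, 31, 36, 37)
X = [(1, 0, 0, 0, 0, 0, 1, 1, 0), (0, 1, 0, 1, 0, 0, 0, 0, 1), (0, 0, 1, 0, 1, 1, 0, 0, 0)]
Y = [(1, 0, 0, 0, 0, 1, 0, 0, 1), (0, 1, 0, 0, 1, 0, 0, 1, 0), (0, 0, 1, 1, 0, 0, 1, 0, 0)]

def weight(p):
    return sum(a * x for a, x in zip(A, p))

def f(p):
    """False points: weight at most 80, plus the three points X^1, X^2, X^3."""
    return 0 if (weight(p) <= 80 or p in X) else 1

def vec_sum(group):
    return tuple(map(sum, zip(*group)))

points = list(product((0, 1), repeat=9))
on_H = [p for p in points if weight(p) == 81]
false_pts = [p for p in points if f(p) == 0]
true_pts = [p for p in points if f(p) == 1]

three_summable = vec_sum(X) == vec_sum(Y)
# Sums of two (not necessarily distinct) false points, then of two true points.
false_pair_sums = {vec_sum((u, v)) for i, u in enumerate(false_pts) for v in false_pts[i:]}
two_summable = any(vec_sum((w, z)) in false_pair_sums
                   for i, w in enumerate(true_pts) for z in true_pts[i:])

print(sorted(on_H) == sorted(X + Y), three_summable, two_summable)
```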

The proof of Theorem 9.15 actually shows that the inclusion $A_3 \subset A_2$ is strict.
This result was generalized by Taylor and Zwicker [860], who proved that
$A_{k+1} \ne A_k$ for all k ≥ 2. Another interesting generalization of Winder’s result is provided
by Theorem 11.14 in Chapter 11.
Figure 9.2 summarizes the relations between some of the classes of Boolean
functions studied in this chapter and in the previous one. The one-way implications
displayed in Figure 9.2 cannot be reversed. It may be useful to recall here that 1-
monotone functions are exactly monotone functions, and that 2-monotone positive
functions coincide with regular functions. Figure 9.2 will be enriched with one
more class of functions in Section 9.6 (see Figure 9.5).

9.4 Recognition of threshold functions


9.4.1 A polynomial-time algorithm for positive DNFs
A fundamental algorithmic problem is to recognize whether a given Boolean func-
tion f is a threshold function, and, when the answer is affirmative, to produce a
separating structure of f . As always, the complexity of this problem depends very
much on the assumptions regarding the format of its input: For instance, it is easy
to see that the problem can be solved by linear programming when f is given by its
truth table, but that it may require an exponential number of steps when f is given
by an oracle (see Exercise 10; note, however, that Matulef, O’Donnell, Rubinfeld,
and Servedio [677] provide efficient algorithms for “approximately” recognizing
threshold functions in the oracle framework).
We focus here on the following formulation of the recognition problem:

Threshold Recognition
Instance: A Boolean function f represented by a Boolean expression.
Output: False if f is not a threshold function; a separating structure of f other-
wise.

This question has been extensively studied in the threshold logic literature under
the name of threshold synthesis problem (see, e.g., Hu [511] or Muroga [698]). It
has stimulated the discovery of properties of threshold functions that we discussed
in Section 9.2 and Section 9.3. As we have seen, all early attempts to derive
a “tractable” characterization of thresholdness were unsuccessful. In particular,
none of the increasingly intricate conjectures linking threshold functions to k-
monotonicity or to k-asummability has resisted a deeper examination. Note also
that the asummability characterization in Theorem 9.14 does not seem to yield a
straightforward, efficient thresholdness test.
In spite of this negative news, we are going to prove in this section that the
threshold recognition problem is polynomially solvable when the input function is
positive and is expressed by its complete (prime irredundant) disjunctive normal
form. In Section 9.4.3, we briefly discuss the extent to which these assumptions
are restrictive.
Like most classical approaches to the threshold recognition problem, the algo-
rithm presented relies on the characterization of threshold functions and on the
system of inequalities (TS) formulated in Theorem 9.13. We know that if a pos-
itive Boolean function is given by its complete DNF, then the list of its minimal
true points is readily available. Thus, in order to generate the system (TS) for
such a function, we only need to enumerate the maximal false points of the func-
tion, or, equivalently, to dualize it. But, as we know from Chapter 4, dualizing an
arbitrary positive Boolean function is in general a difficult task, and the number
of maximal false points may very well be exponential in the size of the input
DNF. These difficulties originally motivated the quest for efficient dualization
algorithms for regular functions, which eventually led to the results presented in
Section 8.5. Indeed, these results are easily exploited to obtain a polynomial-time
implementation of the recognition procedure displayed in Figure 9.3.
We thus obtain a remarkable result due to Peled and Simeone [735].
Theorem 9.16. The procedure Threshold is correct and can be implemented to
run in time $O(n^7 m^5)$, where n is the number of variables and m is the number of
prime implicants of the function to be tested.

Procedure Threshold(f )
Input: The complete DNF of a positive Boolean function f (x1 , x2 , . . . , xn ).
Output: False if f is not a threshold function; a separating structure of f otherwise.

begin
    if f is not regular then return False
    else begin
        dualize f ;
        set up the system (TS);
        solve (TS);
        if (TS) has no solution then return False
        else return a solution (w1 , w2 , . . . , wn , t) of (TS);
    end
end

Figure 9.3. Procedure Threshold.


Proof. Testing whether the input function f is regular can be accomplished in time
$O(n^2 m)$ by the procedure Regular presented in Section 8.4 (Theorem 8.20). If
f is not regular, then f is not a threshold function (by Theorem 9.7). If f is
regular, then it can be dualized in $O(n^2 m)$ time by the procedure DualReg
(Theorem 8.28), and the system (TS) can be set up within the same time bound. Now, by
Theorem 9.13, f is a threshold function if and only if the system (TS) is consistent,
and every solution of (TS) is a structure of f. Using a polynomial-time algorithm
for linear programming (see [76, 812]), (TS) can be solved in time $O(n^7 m^5)$, since
(TS) has n + 1 variables and O(nm) constraints (by Theorem 8.29).

Example 9.10. Let f = x1 x2 ∨ x1 x3 x4 ∨ x2 x3 x4 . This function is regular, with
$x_1 \approx_f x_2 \succ_f x_3 \approx_f x_4$, and $f^d = x_1 x_2 \vee x_1 x_3 \vee x_1 x_4 \vee x_2 x_3 \vee x_2 x_4$. The system
(TS) associated with f was set up in Example 9.8, where we learned that f is a
threshold function with structure (5, 4, 3, 2, 8).
Note that the worst-case time complexity of the procedure Threshold is quite
high due to the solution of the system of linear inequalities (TS) by a generic linear
programming algorithm. This observation is somewhat disturbing in view of the
fact that the other steps of the procedure require only $O(n^2 m)$ operations. It may
be interesting to know whether threshold functions can be recognized through an
entirely combinatorial procedure without resorting to the solution of the system
(TS) by a generic linear programming algorithm. An attempt in this direction can
be found in Smaus [841], but some of the details missing in the proofs of this paper
may not be easy to fill in.

9.4.2 A compact formulation


For practical computations, the system (TS) can be simplified considerably (even
though these simplifications do not affect the worst-case complexity of Threshold).
To understand this, assume, for instance, that the input function is regular
with $x_1 \approx_f x_2 \succ_f x_3$. Then, Theorem 9.7(2) can be used to introduce the additional
constraints $w_1 = w_2 \ge w_3$ in (TS). (Actually, we could even add the constraints
$w_2 \ge w_3 + 1$; check this!) As a consequence, some of the original constraints of
(TS) become redundant and can be eliminated.
Example 9.11. Consider again the function f = x1 x2 ∨ x1 x3 x4 ∨ x2 x3 x4 as in
Example 9.10. Since $x_1 \approx_f x_2 \succ_f x_3 \approx_f x_4$, we can add to (TS) the constraints
$w_1 = w_2 \ge w_3 = w_4$. As a consequence, $w_2$ and $w_4$ can be eliminated from the
system (TS), which reduces to

(TS*)
$w_1 + w_3 \le t$
$2 w_3 \le t$
$2 w_1 \ge t + 1$
$w_1 + 2 w_3 \ge t + 1$
$w_1 \ge w_3 \ge 0.$

Moreover, since $2 w_3 \le w_1 + w_3$ in every solution of (TS*), the second inequality
of (TS*) is redundant and can be removed. A solution of (TS*) is, for instance,
(2, 2, 1, 1, 3), which is easily seen to be a separating structure of f.

To describe more precisely what happens in Example 9.11, we first recall two
definitions from Section 8.3.

Definition 9.3. For any two points $X^*, Y^* \in B^n$, we say that $Y^*$ is a left-shift of
X∗ and we write Y ∗  X ∗ if there exists a mapping σ : supp(X∗ ) → supp(Y ∗ )
such that

(a) σ is injective, that is, σ(i) ≠ σ(j) when i ≠ j, and
(b) σ(i) ≤ i for all i = 1, 2, . . . , n.

Definition 9.4. A point $X^* \in B^n$ is a ceiling of the Boolean function
$f(x_1, x_2, \ldots, x_n)$ if $X^*$ is a false point of f and if no other false point of f is
a left-shift of $X^*$. Similarly, $X^*$ is a floor of f if $X^*$ is a true point of f and if $X^*$
is a left-shift of no other true point of f.

Thus, a ceiling is a “leftmost” (maximal) false point, and a floor is a “rightmost”
(minimal) true point. Theorem 8.14 implies that a regular function is completely
characterized by the list of its ceilings or of its floors. As for threshold functions,
we can refine the statement of Theorem 9.13 as follows:

Theorem 9.17. Let $f(x_1, x_2, \ldots, x_n)$ be a regular Boolean function such that
$x_1 \succeq_f x_2 \succeq_f \ldots \succeq_f x_n$, let $X^1, X^2, \ldots, X^r$ denote the ceilings of f, and let $Y^1, Y^2, \ldots, Y^s$
denote its floors. Then, f is a threshold function if and only if the system of
inequalities

(TS*)
$\sum_{i=1}^n w_i x_i^j \le t$ (j = 1, 2, . . . , r)
$\sum_{i=1}^n w_i y_i^j \ge t + 1$ (j = 1, 2, . . . , s)
$w_i \ge 0$ (i = 1, 2, . . . , n)
$w_i = w_j$ if $x_i \approx_f x_j$ (i, j = 1, 2, . . . , n)
$w_i \ge w_j$ if $x_i \succ_f x_j$ (i, j = 1, 2, . . . , n)

has a solution $(w_1, w_2, \ldots, w_n, t)$. When this is the case, every solution of (TS*) is
a separating structure of the function.

Proof. If f is a threshold function, then (TS*) has a solution by Theorems 9.4,
9.6, and 9.7. Conversely, if (TS*) has a solution $(w_1, w_2, \ldots, w_n, t)$, then it follows
easily from the definition of ceilings and floors that $(w_1, w_2, \ldots, w_n, t)$ is a solution
of (TS), and hence, f is a threshold function.

Example 9.12. Consider again the function f = x1 x2 ∨ x1 x3 x4 ∨ x2 x3 x4 , as in
Example 9.11. The unique ceiling of f is the point (1, 0, 1, 0), and its floors are the
points (1, 1, 0, 0) and (0, 1, 1, 1). Thus, the system (TS*) associated with f reads
w1 + w3 ≤ t
w1 + w2 ≥ t +1
w2 + w3 + w4 ≥ t +1
w1 = w2 ≥ w3 = w4 ≥ 0.
This system is equivalent to the system (TS∗ ) in Example 9.11. 
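The ceiling and the floors quoted in Example 9.12 can be recomputed from Definitions 9.3 and 9.4. The sketch below is a brute-force illustration with hypothetical helper names; it relies on the observation that an injective mapping σ with σ(i) ≤ i exists exactly when the k-th smallest element of supp(Y*) does not exceed the k-th smallest element of supp(X*) for every k, which follows from a greedy matching argument.

```python
from itertools import product

def is_left_shift(y, x):
    """True if y is a left-shift of x (Definition 9.3), using 1-based positions."""
    sx = [i for i, xi in enumerate(x, 1) if xi]
    sy = [i for i, yi in enumerate(y, 1) if yi]
    # Greedy matching: pair the k-th smallest elements of the two supports.
    return len(sy) >= len(sx) and all(b <= a for a, b in zip(sx, sy))

def ceilings_and_floors(f, n):
    """Ceilings and floors of f (Definition 9.4), found by enumeration."""
    pts = list(product((0, 1), repeat=n))
    false_pts = [p for p in pts if not f(p)]
    true_pts = [p for p in pts if f(p)]
    ceilings = [x for x in false_pts
                if not any(y != x and is_left_shift(y, x) for y in false_pts)]
    floors = [x for x in true_pts
              if not any(z != x and is_left_shift(x, z) for z in true_pts)]
    return ceilings, floors

def f(p):
    x1, x2, x3, x4 = p
    return int((x1 and x2) or (x1 and x3 and x4) or (x2 and x3 and x4))

ceilings, floors = ceilings_and_floors(f, 4)
print(ceilings, floors)
```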

9.4.3 The general case


We have so far only handled the special case of the threshold recognition problem
in which the input is the complete DNF of a positive function. Let us now drop
this assumption, and let us assume that the input function is given by an arbitrary
Boolean expression.
In this case, a generic approach for solving the threshold recognition problem
can be sketched as follows [698]:
(a) Generate the prime implicants of f .
(b) Use Theorem 1.21 to check whether f is monotone. If not, then f is
not a threshold function. Otherwise, convert f into a positive function by
performing the change of variables: $y_i \leftarrow \bar{x}_i$ for all negative variables $x_i$.
(c) Use the procedure Threshold to decide whether the resulting function is a
threshold function.
Note that steps (b) and (c) of this procedure are easy in the sense that their
complexity is polynomial in the number of prime implicants of f (that is, in
the size of the output of step (a)). However, we know from Chapter 4 that step
(a) is difficult even when f is in disjunctive normal form, and that its output
may actually be exponentially large in the size of f . One may, therefore, wonder
whether alternative, more efficient lines of attack could be devised.
An answer to this question was provided by Peled and Simeone [735], who
established that the general version of the threshold recognition problem is likely
to be significantly harder than the positive case.
Theorem 9.18. Threshold Recognition is co-NP-complete even when its input
is expressed as a DNF of degree 3.
Proof. From the proof of Theorem 1.30 in Section 1.11, and from the observation
that the quadratic DNF γ = y1 y2 ∨ y3 y4 does not represent a threshold function,
we can immediately conclude that Threshold Recognition is NP-hard when
restricted to DNFs of degree 3. Thus, the only point requiring some attention is
the claim that the associated decision problem is in co-NP.
To see that this is the case, let $f$ be an arbitrary input function on $\mathcal{B}^n$, denote its
false points by $\{X^1, X^2, \ldots, X^p\}$ and its true points by $\{Y^1, Y^2, \ldots, Y^m\}$. If $f$ is not
a threshold function, then (as in the proof of Theorem 9.14) the system (9.4)–(9.8)
has a feasible solution in the variables $u_i$ ($i = 1, 2, \ldots, p$) and $v_j$ ($j = 1, 2, \ldots, m$).
Because this system has rational coefficients and involves $n + 2$ equations, standard
results about linear programming problems imply that (9.4)–(9.8) has a rational
solution in which at most $n + 2$ variables take a nonzero value, and whose size is
polynomially bounded in $n$ (see, e.g., [812]; in geometric terms, this is also a
consequence of Carathéodory's theorem; see [199, 788]). Let $(U, V) \in \mathbb{R}^{p+m}$ be such
a solution, with $I = \{i : u_i > 0,\ i = 1, 2, \ldots, p\}$, $J = \{j : v_j > 0,\ j = 1, 2, \ldots, m\}$,
$|I| \le n + 2$, and $|J| \le n + 2$. Then, the points $X^i$ ($i \in I$) and $Y^j$ ($j \in J$), together
with the coefficients $u_i$ ($i \in I$) and $v_j$ ($j \in J$), constitute a polynomial-size
certificate of nonthresholdness for $f$. This implies that Threshold Recognition is
in co-NP. □
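To make the notion of certificate concrete, the following sketch checks a candidate certificate for the function γ = y1 y2 ∨ y3 y4 used in the proof. It assumes, as the count of n + 2 equations suggests, that the system (9.4)–(9.8) asks for nonnegative multipliers u_i and v_j, each family summing to 1, such that the two convex combinations of false points and of true points coincide; the exact form of the system is given earlier in the chapter and should be consulted there. The names `gamma` and `is_certificate` are hypothetical.

```python
from fractions import Fraction


def gamma(p):
    """The quadratic DNF y1 y2 v y3 y4."""
    y1, y2, y3, y4 = p
    return int((y1 and y2) or (y3 and y4))


def is_certificate(f, false_pts, u, true_pts, v):
    """Check a convex-combination certificate of nonthresholdness.

    Assumed form: f is 0 on false_pts and 1 on true_pts, the multipliers are
    positive and sum to 1 in each family, and sum u_i X^i == sum v_j Y^j.
    """
    n = len(false_pts[0])
    ok_points = (all(f(x) == 0 for x in false_pts)
                 and all(f(y) == 1 for y in true_pts))
    ok_weights = all(c > 0 for c in u + v) and sum(u) == 1 and sum(v) == 1
    lhs = [sum(c * x[k] for c, x in zip(u, false_pts)) for k in range(n)]
    rhs = [sum(c * y[k] for c, y in zip(v, true_pts)) for k in range(n)]
    return ok_points and ok_weights and lhs == rhs


half = Fraction(1, 2)
X = [(1, 0, 1, 0), (0, 1, 0, 1)]   # two false points of gamma
Y = [(1, 1, 0, 0), (0, 0, 1, 1)]   # two true points of gamma
print(is_certificate(gamma, X, [half, half], Y, [half, half]))
```

If a separating structure (w, t) existed, the common point would have weighted sum at most t (as a convex combination of false points) and larger than t (as a convex combination of true points), which is impossible.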

The bound on the degree of the input DNF is sharp in Theorem 9.18: Indeed,
the prime implicants of a quadratic DNF can be generated in polynomial time (see
Section 5.8), so that the generic recognition procedure sketched at the beginning
of this subsection applies. In particular, the case of nonmonotone quadratic DNFs
can be efficiently reduced to the positive case, which we discuss in greater detail
in Section 9.7.

Application 9.7. (Integer programming.) The aggregation problem for a system
of linear inequalities in 0–1 variables can be stated as follows. Given a system of
inequalities

\[ \sum_{j=1}^{n} a_{ij} x_j \le b_i \quad (i = 1, 2, \ldots, m), \tag{9.13} \]

is there a single inequality in $(x_1, x_2, \ldots, x_n)$, say,

\[ \sum_{j=1}^{n} w_j x_j \le t \tag{9.14} \]

such that (9.13) and (9.14) have the same set of solutions over B n ? To establish
the link between the aggregation problem and the threshold recognition problem,
we rely on some of the concepts that have been introduced in Chapter 1, Section
1.13.6. Remember that the resolvent of the system (9.13) is the Boolean function
f (x1 , x2 , . . . , xn ) whose false points are exactly the 0–1 solutions of (9.13) (see
Section 1.13.6). Then, the aggregation problem is simply asking whether f is a
threshold function, and Theorem 9.18 implies that the aggregation problem is NP-
hard, even for systems of generalized covering inequalities (see Theorem 1.39).
However, when (9.13) happens to be a system of set-covering inequalities, meaning
that aij ∈ {−1, 0} and bi = −1 (i = 1, 2, . . . , m, j = 1, 2, . . . , n), then f is a nega-
tive function and its prime implicants are readily available. Hence, in this special
case, the procedure Threshold provides an efficient solution of the aggregation
problem (Peled and Simeone [735]). See also Application 9.12 in Section 9.7 for
related considerations. 
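As a small illustration of the aggregation question, the sketch below compares by brute force the 0–1 solutions of a two-inequality system with those of a single candidate inequality. The system and the candidate are made up for the purpose of the example, and the helper name `solutions` is hypothetical; enumeration is only practical for very small n.

```python
from itertools import product


def solutions(rows, rhs, n):
    """Set of 0-1 points satisfying every inequality sum_j a_ij x_j <= b_i."""
    return {x for x in product((0, 1), repeat=n)
            if all(sum(a * xj for a, xj in zip(row, x)) <= b
                   for row, b in zip(rows, rhs))}


A = [(1, 1, 0), (1, 0, 1)]      # x1 + x2 <= 1  and  x1 + x3 <= 1
b = [1, 1]
aggregated = solutions([(2, 1, 1)], [2], 3)   # candidate: 2x1 + x2 + x3 <= 2
print(solutions(A, b, 3) == aggregated)
```

Here the resolvent is f = x1 x2 ∨ x1 x3, a threshold function with structure (2, 1, 1, 2), which is why a single aggregated inequality exists.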
9.5 Prime implicants of threshold functions


In the previous section, we tackled the problem of computing a structure of a
threshold function when the function is expressed in DNF. We now turn to the
opposite question; namely, given a separating structure, how can we generate the
prime implicants of the corresponding threshold function? For the sake of simplic-
ity, we consider only the case of positive functions; by virtue of Theorem 9.6, this
assumption does not entail any loss of generality. Note also that, as an immediate
consequence of Theorem 9.9, all results in this section carry over mutatis mutandis
to the prime implicates of threshold functions.
We start with a simple characterization of the prime implicants of a threshold
function in terms of a separating structure of the function.

Theorem 9.19. Let $f(x_1, x_2, \ldots, x_n)$ be a positive threshold function with separating
structure $(w_1, w_2, \ldots, w_n, t)$, where $w_j \ge 0$ for $j = 1, 2, \ldots, n$. The elementary
conjunction $\bigwedge_{j \in P} x_j$ is a prime implicant of $f$ if and only if $\sum_{j \in P} w_j > t$ and
$\sum_{j \in P \setminus \{i\}} w_j \le t$ for all $i \in P$.

Proof. We know that $\bigwedge_{j \in P} x_j$ is a prime implicant of $f$ if and only if the point
$X^P \in \mathcal{B}^n$, defined by $x_j^P = 1$ for $j \in P$ and $x_j^P = 0$ for $j \notin P$, is a minimal
true point of $f$. This is trivially equivalent to the conditions given in the
statement. □
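The characterization of Theorem 9.19 translates directly into a test. The sketch below is a minimal illustration; the function name `is_prime_implicant` and the use of 0-based indices are choices made here, not conventions of the text.

```python
def is_prime_implicant(P, w, t):
    """Theorem 9.19 test for the conjunction of the variables indexed by P.

    Assumes nonnegative weights w and 0-based indices in P.
    """
    total = sum(w[j] for j in P)
    # The point X^P must be true, and dropping any single variable of P
    # must produce a false point.
    return total > t and all(total - w[i] <= t for i in P)


w, t = (5, 4, 3, 2, 1), 8
print(is_prime_implicant({0, 1}, w, t))      # x1 x2
print(is_prime_implicant({0, 1, 2}, w, t))   # x1 x2 x3: an implicant, but not prime
```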

We now present an algorithm to generate all prime implicants (or, more precisely,
all minimal true points) of a positive threshold function $f(x_1, x_2, \ldots, x_n)$
described by a structure $(w_1, w_2, \ldots, w_n, t)$. We assume that the variables of $f$ have
been permuted in such a way that $w_1 \ge w_2 \ge \cdots \ge w_n \ge 0$ and, to rule out the
trivial cases where $f$ is constant on $\mathcal{B}^n$, we also assume that $0 \le t < \sum_{i=1}^{n} w_i$.
For $k = 1, 2, \ldots, n$, we denote by $T_k$ the set of all points $(y_1^*, y_2^*, \ldots, y_k^*) \in \mathcal{B}^k$ such
that $f$ has a minimal true point of the form $(y_1^*, y_2^*, \ldots, y_n^*) \in \mathcal{B}^n$ for an appropriate
choice of $(y_{k+1}^*, y_{k+2}^*, \ldots, y_n^*) \in \mathcal{B}^{n-k}$. Thus, $T_n$ contains exactly the minimal true
points of $f$. We also let $T_0 = \{(\,)\}$, where $(\,)$ is the "empty" vector. (Observe that
this convention is coherent with the previous definition, since we have assumed
that $T_n$ is nonempty.)
Now, the prime implicant generation algorithm recursively generates
T1 , T2 , . . . , Tn : The next result explains how Tk+1 can be efficiently generated when
Tk is at hand.

Theorem 9.20. Let $f(x_1, x_2, \ldots, x_n)$ be a positive threshold function with structure
$(w_1, w_2, \ldots, w_n, t)$, where $w_1 \ge w_2 \ge \cdots \ge w_n \ge 0$; let $0 \le k \le n - 1$; and let
$(y_1^*, y_2^*, \ldots, y_k^*)$ be a point in $T_k$. Then,

(1) $(y_1^*, y_2^*, \ldots, y_k^*, 1) \in T_{k+1}$ if and only if $\sum_{i=1}^{k} w_i y_i^* \le t$;
(2) $(y_1^*, y_2^*, \ldots, y_k^*, 0) \in T_{k+1}$ if and only if $\sum_{i=1}^{k} w_i y_i^* + \sum_{i=k+2}^{n} w_i > t$.
Proof. Let $Y^* = (y_1^*, y_2^*, \ldots, y_k^*)$, and consider assertion (1). If $(Y^*, 1)$ is in $T_{k+1}$, then
$f$ has a minimal true point of the form $(Y^*, 1, Z^*) \in \mathcal{B}^n$, and hence, $(Y^*, 0, \ldots, 0) \in \mathcal{B}^n$
is a false point of $f$, meaning that $\sum_{i=1}^{k} w_i y_i^* \le t$.
Conversely, assume now that $\sum_{i=1}^{k} w_i y_i^* \le t$. Since $Y^* \in T_k$, the point
$(Y^*, 1, \ldots, 1) \in \mathcal{B}^n$ is a true point of $f$, and hence, $\sum_{i=1}^{k} w_i y_i^* + \sum_{i=k+1}^{n} w_i > t$.
Let $r \ge k + 1$ be the smallest index such that $\sum_{i=1}^{k} w_i y_i^* + \sum_{i=k+1}^{r} w_i > t$. Define
$y_{k+1}^* = \cdots = y_r^* = 1$, $y_{r+1}^* = \cdots = y_n^* = 0$, and $X^* = (Y^*, y_{k+1}^*, \ldots, y_n^*)$. Then, $X^*$
is a minimal true point of $f$, and hence, $(Y^*, 1) \in T_{k+1}$ as required.
Consider now assertion (2), and assume that $(Y^*, 0)$ is in $T_{k+1}$. Then
$(Y^*, 0, 1, \ldots, 1) \in \mathcal{B}^n$ is a true point of $f$; hence, $\sum_{i=1}^{k} w_i y_i^* + \sum_{i=k+2}^{n} w_i > t$.
Conversely, assume that $\sum_{i=1}^{k} w_i y_i^* + \sum_{i=k+2}^{n} w_i > t$. There are two cases:
• If $\sum_{i=1}^{k} w_i y_i^* \le t$, let $r \ge k + 2$ be the smallest index such that
  $\sum_{i=1}^{k} w_i y_i^* + \sum_{i=k+2}^{r} w_i > t$. Define $y_{k+1}^* = 0$, $y_{k+2}^* = \cdots = y_r^* = 1$,
  $y_{r+1}^* = \cdots = y_n^* = 0$, and let $X^* = (Y^*, y_{k+1}^*, \ldots, y_n^*)$. Then, $X^*$ is a minimal
  true point of $f$, and hence, $(Y^*, 0) \in T_{k+1}$.
• If $\sum_{i=1}^{k} w_i y_i^* > t$, then $X^* = (Y^*, 0, \ldots, 0) \in \mathcal{B}^n$ is a true point of $f$. On the
  other hand, since $Y^* \in T_k$, there exists a minimal true point of $f$ of the form
  $Z^* = (Y^*, y_{k+1}^*, \ldots, y_n^*)$. By minimality of $Z^*$, we conclude that $X^* = Z^*$,
  and hence, $(Y^*, 0) \in T_{k+1}$. □

Theorem 9.20 leads to the algorithm displayed in Figure 9.4. We illustrate this
algorithm on a small example.

Example 9.13. We apply procedure MinTrue to the threshold function $f$ represented
by the separating structure $(w_1, w_2, \ldots, w_5, t) = (5, 4, 3, 2, 1, 8)$. Since
$\sum_{j=1}^{5} w_j = 15 > 8$, we start with $T_0 = \{(\,)\}$.
In order to generate $T_1$, we use Theorem 9.20 with $k = 0$ and $Y^* = (\,)$.
Since $\sum_{i=1}^{k} w_i y_i^* = 0 \le 8$ and $\sum_{i=1}^{k} w_i y_i^* + \sum_{i=k+2}^{n} w_i = 10 > 8$, we obtain
$T_1 = \{(1), (0)\}$.

Procedure MinTrue(w1 , w2 , . . . , wn , t)
Input: The separating structure (w1 , w2 , . . . , wn , t) ∈ Qn+1 of a threshold function f ,
with w1 ≥ w2 ≥ · · · ≥ wn ≥ 0.
Output: The set T of minimal true points of f .

begin
if w1 + w2 + · · · + wn ≤ t then return T := ∅
else if t < 0 then return T := {(0, . . . , 0)}
else begin
T0 := {( )};
for j := 1 to n do use Theorem 9.20 to generate Tj ;
return T := Tn ;
end
end

Figure 9.4. Procedure MinTrue.




Next, we let $k = 1$ in Theorem 9.20. When $Y^* = (1)$, we have $\sum_{i=1}^{k} w_i y_i^* = 5 \le 8$
and $\sum_{i=1}^{k} w_i y_i^* + \sum_{i=k+2}^{n} w_i = 11 > 8$. Thus, the points $(1, 0)$ and
$(1, 1)$ are in $T_2$. On the other hand, when $Y^* = (0)$, $\sum_{i=1}^{k} w_i y_i^* = 0 \le 8$ and
$\sum_{i=1}^{k} w_i y_i^* + \sum_{i=k+2}^{n} w_i = 6 \le 8$. Hence, $(0, 1)$ is in $T_2$ but $(0, 0)$ is not, and we
conclude that $T_2 = \{(1, 0), (1, 1), (0, 1)\}$.
Continuing in this way, we successively produce

T3 = {(1, 0, 1), (1, 1, 0), (0, 1, 1)}


T4 = {(1, 0, 1, 1), (1, 0, 1, 0), (1, 1, 0, 0), (0, 1, 1, 1)}
T5 = {(1, 0, 1, 1, 0), (1, 0, 1, 0, 1), (1, 1, 0, 0, 0), (0, 1, 1, 1, 0)}.

The set T5 contains the complete list of minimal true points of f. □
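Procedure MinTrue and the computations of Example 9.13 can be reproduced with the following sketch. It is one possible breadth-first transcription of Figure 9.4, not the authors' implementation; the function name `min_true`, the 0-based indexing, and the precomputed suffix sums are choices made here.

```python
def min_true(w, t):
    """Minimal true points of the threshold function with structure (w, t).

    Assumes w[0] >= w[1] >= ... >= w[n-1] >= 0, as Procedure MinTrue does.
    """
    n = len(w)
    if sum(w) <= t:
        return []                      # f is identically 0: no true point
    if t < 0:
        return [(0,) * n]              # f is identically 1
    # suffix[k] = w[k] + ... + w[n-1]; suffix[k + 1] is the sum over
    # i = k+2, ..., n in the 1-based notation of Theorem 9.20.
    suffix = [0] * (n + 1)
    for k in range(n - 1, -1, -1):
        suffix[k] = suffix[k + 1] + w[k]
    level = [((), 0)]                  # T_0: pairs (partial point, its weight)
    for k in range(n):                 # build T_{k+1} from T_k
        nxt = []
        for y, s in level:
            if s <= t:                         # assertion (1)
                nxt.append((y + (1,), s + w[k]))
            if s + suffix[k + 1] > t:          # assertion (2)
                nxt.append((y + (0,), s))
        level = nxt
    return [y for y, _ in level]


for point in min_true((5, 4, 3, 2, 1), 8):
    print(point)
```

Each partial point carries its accumulated weight, so both tests of Theorem 9.20 take constant time per vertex.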

In the next statement, the term (arithmetic) operations denotes elementary oper-
ations, such as additions, subtractions, multiplications, comparisons, performed on
numbers of size polynomially bounded in the size of the input.

Theorem 9.21. Procedure MinTrue is correct and can be implemented to perform
O(nm) arithmetic operations, where n is the number of variables, and m is
the number of minimal true points of the input function.

Proof. Theorem 9.20 implies that MinTrue is correct.


To establish the complexity bound, it is useful to picture a binary tree $T(f)$ of
height $n$, whose root is the "empty" point $(\,)$, and whose vertices at height $k$ are
the elements of $T_k$ ($k = 1, 2, \ldots, n$). The parent of vertex $(y_1^*, y_2^*, \ldots, y_{k+1}^*) \in T_{k+1}$
is vertex $(y_1^*, y_2^*, \ldots, y_k^*) \in T_k$. Note that, since the tree $T(f)$ has $m$ leaves, it has
$O(nm)$ vertices.
For an efficient implementation of MinTrue, we do not explicitly record
the components of vertex $(y_1^*, y_2^*, \ldots, y_k^*) \in T_k$, but only the quadruplet of
labels $\left(k,\ y_k^*,\ \sum_{i=1}^{k} w_i y_i^*,\ \sum_{i=k+2}^{n} w_i\right)$. The root is labeled by the quadruplet
$\left(0,\ *,\ 0,\ \sum_{i=2}^{n} w_i\right)$.
Now, the procedure MinTrue builds $T(f)$ recursively, visiting every vertex
of $T(f)$ exactly once in the process. Note that, for each element $Y^*$ of $T_k$, testing
the conditions in Theorem 9.20 and computing the labels associated with
the children of $Y^*$ requires a constant number of operations. Hence, the labels
associated with $T_{k+1}$ can be generated from those associated with $T_k$ in time
$O(|T_k|)$, and this implies that all minimal true points can be listed in total time
$O\left(\sum_{k=1}^{n} |T_k|\right) = O(nm)$. □

Procedure MinTrue is a version of a procedure described by Hammer and
Rudeanu [460]. For related work, see, for instance, Granot and Hammer [410];
Bradley, Hammer, and Wolsey [148]; and Lawler, Lenstra, and Rinnooy Kan [605].
For the sake of simplicity, we have described MinTrue as a breadth-first search
traversal of T (f ), but it should be obvious that it can also be implemented as a
depth-first search procedure, possibly allowing reduction of storage requirements
(details are left to the reader).
Procedure MinTrue runs in polynomial total time, since its complexity is poly-
nomially bounded in the size of its input and of its output (see Appendix B). In fact,
the minimal true points can even be generated with polynomial delay if MinTrue
is implemented as a depth-first search procedure. In the worst case, however, the
number of minimal true points to be generated could be exponentially large in the
encoding size of the input structure.
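A depth-first variant can be sketched as a generator that yields the minimal true points one at a time. This is an illustration only: the name `min_true_dfs` and the recursive layout are choices made here. Every vertex of the tree has at least one child by definition of the sets T_k, so a new leaf is reached after at most O(n) steps, which is the source of the polynomial delay.

```python
def min_true_dfs(w, t):
    """Yield the minimal true points of the structure (w, t), depth first.

    Assumes w[0] >= w[1] >= ... >= w[n-1] >= 0.
    """
    n = len(w)
    if sum(w) <= t:
        return                          # f is identically 0
    if t < 0:
        yield (0,) * n                  # f is identically 1
        return
    suffix = [0] * (n + 1)              # suffix[k] = w[k] + ... + w[n-1]
    for k in range(n - 1, -1, -1):
        suffix[k] = suffix[k + 1] + w[k]
    y = [0] * n                         # current path from the root, reused

    def visit(k, s):
        if k == n:
            yield tuple(y)
            return
        if s <= t:                      # Theorem 9.20, assertion (1)
            y[k] = 1
            yield from visit(k + 1, s + w[k])
        if s + suffix[k + 1] > t:       # Theorem 9.20, assertion (2)
            y[k] = 0
            yield from visit(k + 1, s)

    yield from visit(0, 0)


generator = min_true_dfs((5, 4, 3, 2, 1), 8)
print(next(generator))                  # first minimal true point, no full listing needed
```

Only the current path is stored, so the working memory is O(n) in addition to the output.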
Example 9.14. For $n \ge 1$, consider the structure $(w_1, w_2, \ldots, w_n, t) =
(1, 1, \ldots, 1, \lfloor n/2 \rfloor)$ and the corresponding threshold function $f_n$. Then, the
encoding size of the structure is $O(n)$. But $f_n$ has $m(n) = \binom{n}{\lfloor n/2 \rfloor}$ minimal true points, and
$m(n)$ is not bounded by any polynomial in $n$. □
Application 9.8. (Integer programming.) Consider the knapsack constraints (see
Application 9.5)

\[ \sum_{i=1}^{n} w_i x_i \le t \tag{9.15} \]
\[ (x_1, x_2, \ldots, x_n) \in \mathcal{B}^n \tag{9.16} \]

and their continuous relaxation

\[ \sum_{i=1}^{n} w_i x_i \le t \tag{9.17} \]
\[ 0 \le x_i \le 1 \quad (i = 1, 2, \ldots, n), \tag{9.18} \]
where we assume that wi ≥ 0 for i = 1, 2, . . . , n.
The 0–1 solutions of (9.15)–(9.16) are the false points of a positive threshold
function f (x1 , x2 , . . . , xn ), with structure (w1 , w2 , . . . , wn , t). We call the convex hull
of these 0–1 points a threshold (or knapsack) polyhedron. From Section 1.13.6,
we know that, if $C_k = \bigwedge_{j \in P(k)} x_j$, $k = 1, 2, \ldots, m$, denote the prime implicants of
$f$, then each of the inequalities

\[ \sum_{i \in P(k)} x_i \le |P(k)| - 1 \quad (k = 1, 2, \ldots, m) \tag{9.19} \]

defines a valid inequality for the corresponding threshold polyhedron, meaning
that every point in the threshold polyhedron satisfies the inequalities (9.19).
Moreover, the solution set of the system (9.17)–(9.19) is, in general, strictly smaller
than the solution set of (9.17)–(9.18). As a consequence, inequalities of the form
(9.19) have been successfully used in cutting-plane algorithms for the solution
of large-scale 0–1 linear programming problems. Each constraint of such a
problem is then considered individually in order to generate the corresponding
inequalities (9.19).
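The cutting effect of the inequalities (9.19) can be seen on the structure of Example 9.13. In the sketch below, the sets in `covers` are the supports of the four minimal true points listed in that example, and the fractional point is chosen here purely for illustration; the helper name `violated_covers` is hypothetical.

```python
from fractions import Fraction


def violated_covers(covers, x):
    """Covers P whose inequality sum_{i in P} x_i <= |P| - 1 is violated by x."""
    return [P for P in covers if sum(x[i] for i in P) > len(P) - 1]


w, t = (5, 4, 3, 2, 1), 8
# Supports of the minimal true points of Example 9.13 (0-based indices).
covers = [{0, 2, 3}, {0, 2, 4}, {0, 1}, {1, 2, 3}]
x = (Fraction(1), Fraction(3, 4), 0, 0, 0)      # a fractional point

in_relaxation = (sum(wi * xi for wi, xi in zip(w, x)) <= t
                 and all(0 <= xi <= 1 for xi in x))
print(in_relaxation, violated_covers(covers, x))
```

The point satisfies (9.17)–(9.18) with equality in the knapsack constraint, but it violates the inequality x1 + x2 ≤ 1 derived from the prime implicant x1 x2.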
The investigation of the relationship between the facets of threshold polyhedra
and the prime implicant inequalities (9.19) was initiated by Balas [42], Hammer,
Johnson, and Peled [444], and Wolsey [922], and further developed in numerous
publications (see, e.g., Balas and Zemel [46]; Weismantel [904]; Zemel [935],
etc.). Their practical use in 0–1 programming was first convincingly demonstrated
by Crowder, Johnson, and Padberg [245]. We refer the reader to Nemhauser and
Wolsey [707] or Wolsey [924] for more information on this topic.
We also note that, more recently, a number of researchers have examined effi-
cient procedures to translate knapsack systems of the form (9.15)–(9.16) into
equivalent Boolean DNF equations, possibly involving additional variables. This
line of research, in the spirit of Chapter 2, Section 2.3, opens the possibility
of relying on purely Boolean techniques (such as satisfiability solvers) to han-
dle 0–1 linear optimization problems; see, for instance, Bailleux, Boufkhad, and
Roussel [41]; Eén and Sörensson [290]; or Manquinho and Roussel [667]. □

Application 9.9. (Integer programming.) In certain applications, it may be
advantageous to substitute an initial separating structure by one with smaller weights
and/or threshold value, but which defines the same threshold function. This type
of transformation gives rise to a variety of coefficient reduction problems.
To illustrate, consider for instance the following system of inequalities, defining
the continuous relaxation of a particular knapsack problem:

10x1 + 8x2 + 7x3 + 6x4 ≤ 22 (9.20)

0 ≤ xj ≤ 1 (j = 1, 2, 3, 4). (9.21)

It is easily seen that the inequality (9.20) has exactly the same 0–1 solutions as

2x1 + x2 + x3 + x4 ≤ 3 (9.22)

(that is, both inequalities define the same threshold function $f(x_1, x_2, x_3, x_4) =
x_1 x_2 x_3 \vee x_1 x_2 x_4 \vee x_1 x_3 x_4$), but some fractional solutions of (9.20)–(9.21) are cut
off by the inequality (9.22): For instance, $(x_1^*, x_2^*, x_3^*, x_4^*) = (\tfrac{2}{3}, \tfrac{2}{3}, \tfrac{2}{3}, \tfrac{2}{3})$ satisfies
(9.20)–(9.21) but violates (9.22).
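Both claims are easy to confirm numerically. The following sketch enumerates the 16 points of the cube and evaluates the two left-hand sides at the fractional point; exact rational arithmetic is used to avoid rounding issues, and the helper name `zero_one_solutions` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def zero_one_solutions(w, t):
    """Set of 0-1 points x with w.x <= t."""
    return {x for x in product((0, 1), repeat=len(w))
            if sum(wi * xi for wi, xi in zip(w, x)) <= t}


original = ((10, 8, 7, 6), 22)      # inequality (9.20)
reduced = ((2, 1, 1, 1), 3)         # inequality (9.22)
same = zero_one_solutions(*original) == zero_one_solutions(*reduced)

x = (Fraction(2, 3),) * 4
lhs_original = sum(wi * xi for wi, xi in zip(original[0], x))   # 62/3
lhs_reduced = sum(wi * xi for wi, xi in zip(reduced[0], x))     # 10/3
print(same, lhs_original <= 22, lhs_reduced <= 3)
```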
Even though it is not true that a reduction of the coefficient sizes always implies
a strengthening of the inequality, this is, nevertheless, often the case. Coefficient
reduction is therefore of interest in branch-and-bound and cutting-plane algo-
rithms for 0–1 linear programming problems (see Bradley, Hammer, and Wolsey
[148]; Nemhauser and Wolsey [707]; Williams [910], etc.). Similar issues also
arise in electrical engineering; see, for instance, Muroga [698].
A possible approach to coefficient reduction problems goes as follows: Given the
initial separating structure (w1 , w2 , . . . , wn , t), generate the maximal false points
X 1 , X 2 , . . . , X p and the minimal true points Y 1 , Y 2 , . . . , Y m of the corresponding
threshold function f (as usual, we assume that f is positive). Then, in view of
Theorem 9.13, a “reduced” structure of f can be found by solving the optimization
problem

\begin{align}
\text{minimize} \quad & g(w_1, w_2, \ldots, w_n, t) \tag{9.23} \\
\text{subject to} \quad & \sum_{i=1}^{n} w_i x_i^j \le t \quad (j = 1, 2, \ldots, p) \tag{9.24} \\
& \sum_{i=1}^{n} w_i y_i^j \ge t + 1 \quad (j = 1, 2, \ldots, m) \tag{9.25} \\
& w_i \ge 0 \quad (i = 1, 2, \ldots, n), \tag{9.26}
\end{align}

where $g(w_1, w_2, \ldots, w_n, t)$ could be any of a variety of objective functions, such
as: $\sum_{i=1}^{n} w_i$, or $t$, or $\sum_{i=1}^{n} w_i + t$, or $\max(w_1, w_2, \ldots, w_n)$, and so on.
Note, however, that the optimal solution of (9.23)–(9.26) depends on the choice
of the objective function $g$. This is related to the fact that, in general, the solution
set of (9.24)–(9.26) has no componentwise minimum element. □

Example 9.15. The separating structures $(W^*, t^*) = (13, 7, 6, 6, 4, 4, 4, 3, 2, 24)$
and $(V^*, t^*) = (13, 7, 6, 6, 4, 4, 4, 2, 3, 24)$ define the same threshold function $f$.
But no solution of the system (9.24)–(9.26) associated with $f$ is componentwise
smaller than both $(W^*, t^*)$ and $(V^*, t^*)$ (see Exercise 7 at the end of this
chapter). □
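The first claim of Example 9.15 can be checked by enumerating the 512 points of the cube. The sketch below does only that; it does not address the second claim, which is the subject of the exercise. The helper name `same_function` is hypothetical.

```python
from itertools import product


def same_function(w1, t1, w2, t2):
    """True if the structures (w1, t1) and (w2, t2) define the same function."""
    n = len(w1)
    return all((sum(a * x for a, x in zip(w1, p)) > t1)
               == (sum(b * x for b, x in zip(w2, p)) > t2)
               for p in product((0, 1), repeat=n))


W = (13, 7, 6, 6, 4, 4, 4, 3, 2)
V = (13, 7, 6, 6, 4, 4, 4, 2, 3)
print(same_function(W, 24, V, 24))
```

The two structures agree because no subset of the first seven weights sums to exactly 22, so exchanging the weights 3 and 2 of the last two variables never moves a point across the threshold.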

9.6 Chow parameters of threshold functions


We have had several opportunities to discuss the concept of Chow parameters
(see, e.g., Sections 1.6, 1.13, and 8.2). Historically, the motivation to introduce
this concept stemmed from the observation that the Chow parameters of threshold
functions display numerous remarkable properties; Dertouzos [269], Dubey and
Shapley [279], and Winder [920] present a wealth of information about this early
stream of research.
Recall Definition 1.14 from Section 1.6.
Definition 9.5. The Chow parameters of a Boolean function $f$ on $\mathcal{B}^n$ are the $n + 1$
integers $(\omega_1(f), \omega_2(f), \ldots, \omega_n(f), \omega(f))$, where $\omega(f)$ is the number of true points
of $f$ and $\omega_i(f)$ is the number of true points $Y^*$ of $f$ such that $y_i^* = 1$:

\[ \omega_i(f) = |\{Y^* \in \mathcal{B}^n \mid f(Y^*) = 1 \text{ and } y_i^* = 1\}|, \quad i = 1, 2, \ldots, n. \]

When no confusion can arise, we sometimes drop the symbol $f$ from the notation
$\omega_i(f)$ or $\omega(f)$. Note that $(\omega_1, \omega_2, \ldots, \omega_n) = \sum_{j=1}^{\omega} Y^j$, where $Y^1, Y^2, \ldots, Y^\omega$
are the true points of $f$. We should also mention that many variants of Definition
9.5 have been used in the literature (see, e.g., [920] and Section 9.6.2). These variants
give rise to different scalings of the Chow parameters, while preserving their
main features.
9.6.1 Chow functions


Definition 9.6. A Boolean function f is a Chow function if no other function has
the same Chow parameters as f .
Example 9.16. The function $f = x_1 x_2 \vee \bar{x}_1 \bar{x}_2$ is not a Chow function, since it has
the same Chow parameters as $g = x_1 \bar{x}_2 \vee \bar{x}_1 x_2$, namely, $(\omega_1, \omega_2, \omega) = (1, 1, 2)$. □
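Definition 9.5 and Example 9.16 can be checked with a few lines of code. The sketch below computes Chow parameters by enumerating the cube, which is only meant for small n; the name `chow_parameters` is hypothetical, and the two functions of Example 9.16 are read here as the equivalence and the exclusive-or of x1 and x2.

```python
from itertools import product


def chow_parameters(f, n):
    """Return (omega_1, ..., omega_n, omega) for f: {0,1}^n -> {0,1}."""
    true_points = [p for p in product((0, 1), repeat=n) if f(p)]
    # omega_i is the i-th coordinate of the vector sum of all true points.
    omegas = [sum(p[i] for p in true_points) for i in range(n)]
    return tuple(omegas) + (len(true_points),)


def f(p):
    return int(p[0] == p[1])        # x1 x2  v  (not x1)(not x2)


def g(p):
    return int(p[0] != p[1])        # x1 (not x2)  v  (not x1) x2


print(chow_parameters(f, 2), chow_parameters(g, 2))
```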

With Definition 9.6 at hand, we are now ready to state Chow’s fundamental
result (see Chow [194]; Muroga [698] also credits Tannenbaum [857] for this
result).
Theorem 9.22. Every threshold function is a Chow function.
Proof. Consider a threshold function $f$ on $\mathcal{B}^n$, and a function $g$ on $\mathcal{B}^n$ having the
same Chow parameters as $f$. We must show that $f = g$.
Let us denote by $Y^1, Y^2, \ldots, Y^\omega$ the true points of $f$, and by $X^1, X^2, \ldots, X^k,
Y^{k+1}, Y^{k+2}, \ldots, Y^\omega$ the true points of $g$, where $\omega = \omega(f) = \omega(g)$, $0 \le k \le \omega$,
and $X^1, X^2, \ldots, X^k$ are false points of $f$. Since $f$ and $g$ have the same Chow
parameters,

\[ \sum_{j=1}^{\omega} Y^j = \sum_{j=1}^{k} X^j + \sum_{j=k+1}^{\omega} Y^j, \]

or, equivalently,

\[ \sum_{j=1}^{k} Y^j = \sum_{j=1}^{k} X^j. \tag{9.27} \]

Now, if $k \ge 1$, then (9.27) contradicts the fact that $f$ is asummable. Hence, we
conclude that $k = 0$, meaning that $f$ and $g$ have the same true points, and that $f = g$. □

This result shows that every threshold function is uniquely identified by its
Chow parameters. Chow parameters have therefore been used as convenient iden-
tifiers for cataloging threshold functions; see Muroga [698] for a table of threshold
functions up to five variables; Muroga, Toda, and Kondo [701] or Winder [917]
for functions of six variables; Winder [919] for functions of seven variables; and
Muroga, Tsuboi, and Baugh [702] (cited in [698]) for functions of eight variables.
Observe that all points occurring in equation (9.27) are distinct. This motivates
the introduction of yet another concept.
Definition 9.7. A Boolean function $f$ is weakly asummable if, for all $k \ge 1$, there do
not exist $k$ distinct false points of $f$, say, $X^1, X^2, \ldots, X^k$, and $k$ distinct true points
of $f$, say, $Y^1, Y^2, \ldots, Y^k$, such that $\sum_{i=1}^{k} X^i = \sum_{i=1}^{k} Y^i$.

Clearly, every asummable (that is, threshold) function is weakly asummable.
Moreover, the proof of Theorem 9.22 actually establishes that every weakly
asummable function is a Chow function. Yajima and Ibaraki [929] (see also Winder
[920]) proved that the converse implication holds as well, namely:
threshold  ⇒  k-asummable (k > 2)  ⇒  2-asummable
    ⇕                                      ⇕
asummable  ⇒  weakly asummable     ⇒  completely monotone  ⇒  k-monotone (k ≥ 1)
                    ⇕
                  Chow

Figure 9.5. A hierarchy of Boolean functions: Enlarged version.

Theorem 9.23. A Boolean function is weakly asummable if and only if it is a Chow
function.
Proof. We only have to show that, if a Boolean function is not weakly asummable,
then it is not a Chow function. Let $X^1, X^2, \ldots, X^q$ denote the false points of a
function $f$, and let $Y^1, Y^2, \ldots, Y^p$ denote its true points. If $f$ is not weakly asummable,
then we can assume, without loss of generality, that $\sum_{i=1}^{k} X^i = \sum_{i=1}^{k} Y^i$ for
some $k \ge 1$. Let $g$ be the Boolean function whose true points are exactly
$X^1, X^2, \ldots, X^k, Y^{k+1}, Y^{k+2}, \ldots, Y^p$. The functions $f$ and $g$ are distinct, but they
have the same Chow parameters. Hence, $f$ is not a Chow function. □
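The construction used in this proof can be illustrated on the function f = x1 x2 ∨ x3 x4, which is not weakly asummable. In the sketch below, the choice of the two false points and two true points is made here for illustration, and the helper name `chow` is hypothetical.

```python
from itertools import product


def chow(true_points, n):
    """Chow parameters computed from an explicit set of true points."""
    return (tuple(sum(p[i] for p in true_points) for i in range(n))
            + (len(true_points),))


n = 4
f_true = {p for p in product((0, 1), repeat=n)
          if (p[0] and p[1]) or (p[2] and p[3])}
X = [(1, 0, 1, 0), (0, 1, 0, 1)]       # false points of f
Y = [(1, 1, 0, 0), (0, 0, 1, 1)]       # true points of f with the same vector sum
g_true = (f_true - set(Y)) | set(X)    # exchange the two families, as in the proof
print(g_true != f_true, chow(f_true, n) == chow(g_true, n))
```

The functions f and g differ, yet share the Chow parameters (5, 5, 5, 5, 7), so neither is a Chow function.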

There exist Chow functions that are not threshold, and the function constructed
in the proof of Theorem 9.15 is completely monotone but not a Chow function.
On the other hand, Yajima and Ibaraki [929] showed that Chow functions are
completely monotone (we leave the proof of this assertion as an end-of-chapter
exercise). Thus, we obtain the hierarchy displayed in Figure 9.5 (compare with
Figure 9.2 in Section 9.3).
Chow’s theorem has been more recently revisited by O’Donnell and Serve-
dio [717], who established a “robust” generalization of it: Namely, they
proved that if f is a threshold function, if g is an arbitrary function, and if
(ω1 (f ), ω2 (f ), . . . , ωn (f ), ω(f )) is “close” to (ω1 (g), ω2 (g), . . . , ωn (g), ω(g)) in
some appropriate norm, then the functions $f$ and $g$ are also "close" in the norm
$\sum_{X \in \mathcal{B}^n} |f(X) - g(X)|$ (if we replace "close" by "equal" in this statement, then
we obtain exactly Theorem 9.22). Based on this result, O'Donnell and Servedio
proposed a fast algorithmic version of Chow’s theorem, which allows them to effi-
ciently construct an approximate representation of a threshold function given its
Chow parameters (an extension of this problem is mentioned in Application 9.10
hereunder). We refer to [717] for details and applications in learning theory; see
also Matulef et al. [677] for additional far-reaching extensions of Chow’s theorem.

9.6.2 Chow parameters and separating structures


It is natural to expect some sort of relationship between the Chow parameters of a
threshold function and the separating structures defining the function, since both
types of coefficients somehow provide a “measure” of the “influence” of each
variable on the function (remember the discussion of power indices in Section
1.13.3). This relationship is most naturally expressed in terms of the so-called
modified Chow parameters of the function, which were introduced in Section
1.13.3.

Definition 9.8. The modified Chow parameters of a Boolean function
$f(x_1, x_2, \ldots, x_n)$ are the $(n + 1)$ numbers $(\pi_1, \pi_2, \ldots, \pi_n, \pi)$, defined as
$\pi = \omega - 2^{n-1}$ and $\pi_k = 2\omega_k - \omega$ for $k = 1, 2, \ldots, n$, where $(\omega_1, \omega_2, \ldots, \omega_n, \omega)$ are the
Chow parameters of $f$.

Since there is a bijective correspondence between Chow parameters and modified
Chow parameters, Theorem 9.22 implies that every threshold function is
uniquely determined by its modified Chow parameters, or by its Chow parameters,
or by any of its separating structures.
The following statements display formal analogy with Theorems 9.6, 9.7, and
9.9, but they hold for arbitrary (not necessarily threshold) functions:

Theorem 9.24. If $f(x_1, x_2, \ldots, x_n)$ is a Boolean function with modified Chow
parameters $(\pi_1, \pi_2, \ldots, \pi_n, \pi)$, then, for all $i, j \in \{1, 2, \ldots, n\}$,

(1) if $f$ is positive in $x_i$ and $f$ depends on $x_i$, then $\pi_i > 0$;
(2) if $f$ is negative in $x_i$ and $f$ depends on $x_i$, then $\pi_i < 0$;
(3) if $f$ does not depend on $x_i$, then $\pi_i = 0$;
(4) if $x_i \succ_f x_j$, then $\pi_i > \pi_j$;
(5) if $x_i \approx_f x_j$, then $\pi_i = \pi_j$;
(6) the modified Chow parameters of $f^d$ are $(\pi_1, \pi_2, \ldots, \pi_n, -\pi)$;
(7) if $f^d \le f$, then $\pi \ge 0$;
(8) if $f^d \ge f$, then $\pi \le 0$.

Proof. Let $(\omega_1, \omega_2, \ldots, \omega_n, \omega)$ denote the Chow parameters of $f$. Fix $i \in \{1, 2, \ldots, n\}$,
and let $A = \{X \in \mathcal{B}^n : f(X) = 1, x_i = 1\}$, $B = \{X \in \mathcal{B}^n : f(X) = 1, x_i = 0\}$. So,
$|A| = \omega_i$, $|B| = \omega - \omega_i$, and $\pi_i = 2\omega_i - \omega = |A| - |B|$. If $f$ is positive in $x_i$, then
the mapping $m(X) = X \vee e_i$ is one-to-one on $B$, and $m(B) \subseteq A$. Hence, $|B| \le |A|$ and
$\pi_i \ge 0$. Moreover, if $f$ depends on $x_i$, then $|B| < |A|$, and hence $\pi_i > 0$. This
establishes assertion (1); assertions (2) and (3) are proved in a similar way.
Assertions (4) and (5) are a restatement of Theorem 8.4 in Section 8.2.
By definition of duality, $f^d(X) = 1$ if and only if $f(\bar{X}) = 0$. It follows directly
that $\omega(f^d) = 2^n - \omega$, and hence, that $\pi(f^d) = \omega(f^d) - 2^{n-1} = 2^{n-1} - \omega = -\pi$.
Similarly, for $i = 1, 2, \ldots, n$,

\begin{align*}
\omega_i(f^d) &= |\{X : f^d(X) = 1,\ x_i = 1\}| \\
&= |\{X : f(X) = 0,\ x_i = 0\}| \\
&= |\{X : x_i = 0\}| - |\{X : f(X) = 1,\ x_i = 0\}| \\
&= 2^{n-1} - (\omega - \omega_i),
\end{align*}

and hence,

\begin{align*}
\pi_i(f^d) &= 2\,\omega_i(f^d) - \omega(f^d) \\
&= 2\,(2^{n-1} - \omega + \omega_i) - (2^n - \omega) \\
&= 2\,\omega_i - \omega \\
&= \pi_i.
\end{align*}

This proves assertion (6). As for (7), observe that $f^d \le f$ implies $\omega(f^d) \le \omega$, and
hence, $\pi \ge 0$. A similar reasoning yields (8). □
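Some assertions of Theorem 9.24 can be observed on a small nonthreshold-specific example. The sketch below computes modified Chow parameters as in Definition 9.8 and builds the dual by complementing both the argument and the value; the test function and the names `modified_chow` and `dual` are choices made here for illustration.

```python
from itertools import product


def modified_chow(f, n):
    """Return (pi_1, ..., pi_n, pi) following Definition 9.8."""
    true_points = [p for p in product((0, 1), repeat=n) if f(p)]
    omega = len(true_points)
    pis = [2 * sum(p[i] for p in true_points) - omega for i in range(n)]
    return tuple(pis) + (omega - 2 ** (n - 1),)


def dual(f):
    """f^d(X) is the complement of f evaluated at the complement of X."""
    return lambda p: 1 - f(tuple(1 - x for x in p))


def f(p):
    # x1 (x2 v not x3): positive in x1 and x2, negative in x3.
    return int(p[0] and (p[1] or not p[2]))


print(modified_chow(f, 3), modified_chow(dual(f), 3))
```

The signs of the first three parameters agree with assertions (1) and (2), and passing to the dual only changes the sign of the last parameter, as stated in assertion (6).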

In the case of threshold functions, how much further does the analogy go
between weights and modified Chow parameters? First, it can be informally
stated that, for a threshold function with separating structure (w1 , w2 , . . . , wn , t),
the vectors (w1 , w2 , . . . , wn ) and (π1 , π2 , . . . , πn ) often turn out to be “roughly”
proportional. This, in spite of the fact that, as the following example shows,
proportionality can become quite rough when the separating structure is picked
arbitrarily.
Example 9.17. The threshold function with structure $(w_1, w_2, w_3, t) = (1, 1, 1, 1)$ has
modified Chow parameters $(\pi_1, \pi_2, \pi_3) = (2, 2, 2)$, so that $(w_1, w_2, w_3)$ is exactly
proportional to $(\pi_1, \pi_2, \pi_3)$. But $(50, 50, 1, 50)$ and $(50, 33, 18, 50)$ are two other
structures of the same function, for which proportionality with $(\pi_1, \pi_2, \pi_3)$ is much
more approximate! □
Based on the previous example, one may be tempted to go one step further and
to conjecture that every threshold function admits a separating structure whose
weights are proportional to the modified Chow parameters π1 , π2 . . . πn of the
function. Or, in other words, that every such function has a structure of the form
(π1 , π2 , . . . , πn , t), for some suitable choice of t. This conjecture is easily disproved,
however.
Example 9.18. The function $f(x_1, x_2, x_3, x_4, x_5) = x_1 x_2 \vee x_1 x_3 x_4 \vee x_1 x_3 x_5 \vee
x_2 x_3 x_4 x_5$ is a threshold function with separating structure $(4, 3, 2, 1, 1, 6)$, and
its modified Chow parameters are $(\pi_1, \pi_2, \pi_3, \pi_4, \pi_5, \pi) = (10, 6, 4, 2, 2, -4)$. But this
function has no structure of the form $(10, 6, 4, 2, 2, t)$, for any $t$ (otherwise, $14 \le t$,
since $(1, 0, 1, 0, 0)$ is a false point of $f$, and $t < 14$, since $(0, 1, 1, 1, 1)$ is a true
point of $f$). A similar reasoning also shows that $f$ has no structure of the form
$(\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, t) = (11, 9, 8, 7, 7, t)$. □
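The figures of Example 9.18 can be recomputed by enumeration. In the sketch below, the helper `threshold_range` (a name chosen here) returns the largest value of a candidate weight vector over the false points and its smallest value over the true points; a suitable threshold exists for that weight vector exactly when the first number is strictly smaller than the second.

```python
from itertools import product

w, t = (4, 3, 2, 1, 1), 6
points = list(product((0, 1), repeat=5))
true_pts = [p for p in points if sum(a * x for a, x in zip(w, p)) > t]
false_pts = [p for p in points if sum(a * x for a, x in zip(w, p)) <= t]


def threshold_range(c):
    """(max of c.X over false points, min of c.X over true points)."""
    def dot(p):
        return sum(a * x for a, x in zip(c, p))
    return max(map(dot, false_pts)), min(map(dot, true_pts))


omega = len(true_pts)
omegas = tuple(sum(p[i] for p in true_pts) for i in range(5))
pis = tuple(2 * o - omega for o in omegas)
print(pis, omega - 2 ** 4)
print(threshold_range(pis), threshold_range(omegas), threshold_range(w))
```

For the original weights the two numbers are separated, whereas for both the modified and the plain Chow parameters the maximum over false points is at least the minimum over true points, so no threshold t can work.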
Notwithstanding this dispiriting news, Dubey and Shapley [279] observed that,
in some sense, the vector of modified Chow parameters actually is proportional to
the “average” of the vector of weights.
Theorem 9.25. Let $w_1, w_2, \ldots, w_n$ be fixed nonnegative numbers and $W =
\sum_{j=1}^{n} w_j$, let $t$ be a random variable uniformly distributed on $[0, W]$,
and let $f(x_1, x_2, \ldots, x_n)$ be the (random) threshold function with structure
$(w_1, w_2, \ldots, w_n, t)$. Then, for $i = 1, 2, \ldots, n$, the expected value of the (random)
modified Chow parameter $\pi_i$ is equal to $2^{n-1} w_i / W$.

Table 9.1. Modified Chow parameters for Example 9.19

t        π1   π2   π3   π4   π5
0, 14     1    1    1    1    1
1, 13     2    2    2    2    0
2, 12     3    3    3    1    1
3, 11     5    5    3    1    1
4, 10     7    5    3    3    1
5, 9      8    6    4    4    2
6, 8      9    7    5    3    1
7        10    6    6    2    2
Total    80   64   48   32   16
Proof. Fix $i \in \{1, 2, \ldots, n\}$. By Theorem 1.37 in Section 1.13.3, the expected value
of $\pi_i$ is nothing but the expected number of swings of $f$ for $i$. Now, if $X^* \in \mathcal{B}^n$
and $x_i^* = 0$, then,

\[ \mathrm{Prob}(X^* \text{ is a swing of } f \text{ for } i) = \mathrm{Prob}\Big( \sum_{j=1}^{n} w_j x_j^* \le t < \sum_{j=1}^{n} w_j x_j^* + w_i \Big) = \frac{w_i}{W} \]

(since $t$ is uniformly distributed). Hence,

\[ E[\pi_i] = \sum_{\{X \in \mathcal{B}^n :\ x_i = 0\}} \frac{w_i}{W} = 2^{n-1} \frac{w_i}{W}. \]

The same result holds, with the same proof, if the weights w1 , w2 , . . . , wn
are assumed to be nonnegative integers and if t is uniformly distributed on
{0, 1, . . . , W − 1}. Dubey and Shapley [279] illustrate this point with the following
example.
Example 9.19. Let $(w_1, w_2, w_3, w_4, w_5) = (5, 4, 3, 2, 1)$ and consider all threshold
functions with separating structures $(w_1, w_2, w_3, w_4, w_5, t)$, where $t$ can take any
value in the set $\{0, 1, \ldots, 14\}$. The modified Chow parameters of these 15 functions
are displayed in Table 9.1. The average value of $\pi_1$ for this set of functions
is $\frac{80}{15} = 2^{n-1} \frac{w_1}{W}$. Note, however, that there is no single choice of the threshold
$t$ for which the vector of modified Chow parameters is exactly proportional to
$(w_1, w_2, \ldots, w_n)$. □
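The totals of Table 9.1 and the averaging property can be recomputed by brute force. The sketch below enumerates the 32 points of the cube for each of the 15 thresholds; the name `modified_chow_weights` is hypothetical.

```python
from fractions import Fraction
from itertools import product

w = (5, 4, 3, 2, 1)
n, W = len(w), sum(w)
points = list(product((0, 1), repeat=n))


def modified_chow_weights(t):
    """(pi_1, ..., pi_n) for the threshold function with structure (w, t)."""
    true_pts = [p for p in points if sum(a * x for a, x in zip(w, p)) > t]
    omega = len(true_pts)
    return tuple(2 * sum(p[i] for p in true_pts) - omega for i in range(n))


table = {t: modified_chow_weights(t) for t in range(W)}      # t = 0, 1, ..., 14
totals = tuple(sum(row[i] for row in table.values()) for i in range(n))
averages = [Fraction(total, W) for total in totals]           # W = 15 functions
expected = [Fraction(2 ** (n - 1) * wi, W) for wi in w]
print(totals, averages == expected)
print(table[7], table[8])
```

The averages coincide with the values 2^{n-1} w_i / W of Theorem 9.25, although no individual row of the table is proportional to (5, 4, 3, 2, 1).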

Additional theoretical results describing the relation between weights and Chow
parameters of a threshold function, as well as algorithms allowing us to reconstruct
a threshold function from the (approximate) knowledge of its Chow parameters,
can be found in Alon and Edelman [17]; Aziz, Paterson, and Leech [39]; and
O'Donnell and Servedio [717].
We conclude this section with a discussion of interesting related issues arising
in political science.

Application 9.10. (Political science, game theory.) Picture a federation of states


administered by a legislature in which each state, independently of its size, is
represented by exactly one legislator. The legislature makes its decisions according
to a weighted majority voting scheme, whereby every legislator carries a (possibly
different) number of votes. In order to embody the “one man one vote” principle
in the functioning of this legislature, the U.S. Supreme Court has ruled that “the
voting power detained by each legislator ought to be proportional to the size of the
constituency that he or she represents.” The question is now: How is this principle
to be put into practice?
Apportionment problems of this nature, far from being only theoretical, actually
arose in several U.S. elected bodies in the 1960s. They led John F. Banzhaf III
[52] to propose the indices now bearing his name as adequate measures of voting
power; his proposal was eventually adopted by several official bodies (see Dubey
and Shapley [279], Felsenthal and Machover [329], or Lucas [631] for details on
this story).
If we accept the principle that the Banzhaf index of a legislator can be equated
with his share of voting power, then the Supreme Court decree can be mathemati-
cally reformulated as follows: Denote by 1, 2, . . . , n the members of the legislature,
and assume that member i represents a state of size si (i = 1, 2, . . . , n). Also, let wi
be the weight of legislator i in the voting system (i = 1, 2, . . . , n), and let t + 1 be
the required number of votes for a resolution to pass in the legislature (we assume
w1 , w2 , . . . , wn and t to be integer). In other words, the weighted majority voting
rule used by the legislature is described by the threshold function f (x1 , x2 , . . . , xn )
with structure (w1 , w2 , . . . , wn , t).
Now, recall from Definition 1.34 in Section 1.13.3 that the i th (normalized)
Banzhaf index of f is the quantity $\beta_i = \pi_i / \sum_j \pi_j$, where $\pi_i$ is the i th modified Chow
parameter of f (i = 1, 2, . . . , n).
Putting these facts together, we come to the conclusion that, according to the
Supreme Court’s interpretation of the “one man, one vote” principle, the weights
(w1 , w2 , . . . , wn ) and the threshold t should be chosen in such a way that the vector
(β1 , β2 , . . . , βn ) of Banzhaf indices be equal to the vector $\frac{1}{S}(s_1, s_2, \ldots, s_n)$ of relative
population sizes, where $S = \sum_{i=1}^{n} s_i$ is the total population size.
As Example 9.17 shows, it is generally not sufficient to let (w1 , w2 , . . . , wn ) =
(s1 , s2 , . . . , sn ) to abide by the Supreme Court decree; see also Application 9.11
hereunder and the computations relative to the distribution of power in the
European Union Council reported in Algaba, Bilbao, Fernández Garcia, and
López [15]; Bilbao [79]; Bilbao, Fernández, Jiménez Losada, and López [80];
Bilbao, Fernández, Jiménez, and López [81]; Laruelle and Widgrén [600]; Leech
[607], and so on.
Even more interestingly, the above mathematical model makes it very clear that
the one man, one vote principle, as embodied in the Court decree and further inter-
preted in terms of Banzhaf indices, cannot always be implemented in real-world
situations. Indeed, for fixed n, the number of possible realizations of the vector
$\frac{1}{S}(s_1, s_2, \ldots, s_n)$ is infinite, whereas the number of Banzhaf vectors (β1 , β2 , . . . , βn )
is obviously finite (since the number of threshold functions of n variables is finite).
So, for most distributions of population sizes, there exists no allocation of weights
(w1 , w2 , . . . , wn ) that implies a distribution of power equal to $\frac{1}{S}(s_1, s_2, \ldots, s_n)$. In
such cases, the need arises again to give an operational meaning to the one man,
one vote principle. How can this be achieved?
One (rather intriguing) possibility raised by Papayanopoulos [729] would be
to assign exactly si votes to legislator i, and to let the threshold t vary randomly
between 0 and S (namely, the threshold would be drawn randomly in [0, S] when-
ever the legislature is to vote). By virtue of Theorem 9.25, this would provide
legislator i with an expected share of power proportional to the size of his or her
constituency. Unfortunately, even though this solution may sound quite attractive
to a mathematically inclined political scientist, it is doubtful that it will be adopted
by any real-world legislature in the foreseeable future!
Another, more realistic way out of the dilemma has been actually implemented
by some county supervisorial boards in the State of New York. In these bodies, the
one man, one vote principle has been translated as follows: The voting weights
w1 , w2 , . . . , wn and the threshold t should be specified in such a way that the
Banzhaf vector (β1 , β2 , . . . , βn ) be “as close as possible” to the population distri-
bution $\frac{1}{S}(s_1, s_2, \ldots, s_n)$, or, in other words, so as to minimize the distance (in some
appropriate norm) between (β1 , β2 , . . . , βn ) and $\frac{1}{S}(s_1, s_2, \ldots, s_n)$. This interpretation
of the one man, one vote principle gives rise to an interesting, but hard, combinato-
rial optimization problem; see Alon and Edelman [17]; Aziz, Paterson, and Leech
[39]; Lucas [632]; McLean [639]; O’Donnell and Servedio [717]; Papayanopou-
los [727, 728, 729]; Laruelle and Widgrén [600]; or Leech [607, 608] for more
information and related applications. 
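The gap between voting weights and voting power can be checked numerically on small instances. The following is a minimal brute-force sketch, not an algorithm from the text: it assumes the usual swing-counting definition of the Banzhaf index and the convention that a resolution passes when the total weight of the favorable votes exceeds t. The function name `banzhaf_indices` and the three-voter instance are illustrative assumptions, and the enumeration is exponential in n.

```python
from fractions import Fraction
from itertools import product


def banzhaf_indices(weights, t):
    """Normalized Banzhaf indices of a weighted majority rule.

    Assumption: a resolution passes iff the total weight of the 'yes'
    votes is strictly larger than t, and weights are nonnegative.
    """
    n = len(weights)
    swings = [0] * n
    for x in product((0, 1), repeat=n):
        total = sum(w * xi for w, xi in zip(weights, x))
        if total <= t:
            continue  # losing coalition: nobody is pivotal
        for i in range(n):
            # voter i is pivotal if the coalition wins with i and loses without i
            if x[i] == 1 and total - weights[i] <= t:
                swings[i] += 1
    all_swings = sum(swings)  # zero only if no coalition ever wins
    return [Fraction(c, all_swings) for c in swings]


# Hypothetical instance: weights (2, 1, 1), three votes needed (t = 2).
print(banzhaf_indices([2, 1, 1], 2))
```

On this instance the weight shares are (1/2, 1/4, 1/4), whereas the computed power shares are (3/5, 1/5, 1/5), which illustrates why setting the weights equal to the population sizes does not, in general, yield proportional power.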

9.6.3 Computing the Chow parameters


We now turn our attention to the algorithmic problem of computing the Chow
parameters of a threshold function. As might be expected, the complexity of this
problem depends very much on the format of its input. For instance, since threshold
functions are 2-monotone (Theorem 9.7), the results in Chapter 7 and Chapter 8
(in particular, Application 8.5 in Section 8.2) imply that the Chow parameters of
a threshold function can be computed in polynomial time when the list of prime
implicants of the function is available. On the other hand, the problem becomes
more difficult when the input function is described by a separating structure.
Indeed, the following result due to Garey and Johnson [371], in conjunction with
Theorem 9.24, shows that it is already NP-complete to decide whether a modified
Chow parameter vanishes or not (compare with Theorem 1.32).

Theorem 9.26. Deciding whether a threshold function depends on its last variable
is NP-complete when the function is described by a separating structure.

Proof. The problem is obviously in NP. Now, recall that the following Subset Sum
problem is NP-complete [371]: Given n + 1 positive integers (w1 , w2 , . . . , wn , t),
is there a point $X^* \in B^n$ such that $\sum_{j=1}^{n} w_j x_j^* = t$?
With an arbitrary instance (w1 , w2 , . . . , wn , t) of Subset Sum, we associate the
threshold function f (x1 , x2 , . . . , xn+1 ) with structure $(w_1, w_2, \ldots, w_n, \frac{1}{2}, t)$. Since
the weights $w_1, w_2, \ldots, w_n$ and t are integers, adding $\frac{1}{2}$ to $\sum_{j=1}^{n} w_j x_j^*$ changes the value
of f exactly when this sum equals t. Hence,
f depends on its last variable xn+1 if and only if (w1 , w2 , . . . , wn , t) is a Yes
instance of Subset Sum. 
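The reduction used in this proof can be illustrated on small instances. The sketch below is only a brute-force check under stated assumptions: the function is taken to equal 1 exactly when the weighted sum exceeds the threshold, and the structure is doubled to (2w1, ..., 2wn, 1, 2t) to stay within integers, which represents the same function as the structure with weight 1/2. The function names are hypothetical, and the enumeration is of course exponential.

```python
from itertools import product


def depends_on_last_variable(weights, t):
    """Brute force: does f(x) = 1 iff sum(w_i x_i) > t depend on its last variable?"""
    n = len(weights)
    for x in product((0, 1), repeat=n - 1):
        base = sum(w * xi for w, xi in zip(weights, x))
        # compare f at (x, 0) and at (x, 1)
        if (base > t) != (base + weights[-1] > t):
            return True
    return False


def subset_sum_via_threshold(w, t):
    """Answer Subset Sum on (w, t) through the associated threshold function."""
    doubled = [2 * wi for wi in w] + [1]  # same function as (w_1, ..., w_n, 1/2, t)
    return depends_on_last_variable(doubled, 2 * t)


print(subset_sum_via_threshold([3, 5, 7], 12))  # 5 + 7 = 12
print(subset_sum_via_threshold([3, 5, 7], 11))  # no subset sums to 11
```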

Prasad and Kelly [754] actually proved that, for a threshold function given
by a separating structure, computing Banzhaf indices – or, equivalently, Chow
parameters – is #P-complete; compare with Theorem 1.38. (A similar observation
was already formulated by Garey and Johnson [371] for Shapley-Shubik indices;
see also Deng and Papadimitriou [268] and Matsui and Matsui [676].)
As a remarkable illustration of the occurrence of “dummy” variables in weighted
majority systems, we mention a well-known story among political scientists (see,
e.g., [150, 329]).

Application 9.11. (Political science, game theory.) In 1958, the European Eco-
nomic Community had six member-states, namely, Belgium, France, Germany,
Italy, Luxembourg, and the Netherlands. Its Council of Ministers relied on a
weighted majority decision rule with voting weight 4 for France, Germany and
Italy, weight 2 for Belgium and the Netherlands, and weight 1 for Luxembourg. The
threshold was set to t = 11. With these rules, it is readily seen that Luxembourg
actually had no voting power at all, since the outcome of the vote was always
determined regardless of the decision made by Luxembourg. 
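The claim about Luxembourg is easy to verify by enumeration. The following sketch assumes, consistently with Application 9.10, that a resolution passes when the favorable votes total more than t = 11, that is, at least 12 of the 17 votes; the function name `dummy_voters` is an illustrative choice.

```python
from itertools import product


def dummy_voters(weights, t):
    """Indices of voters who are never pivotal (resolution passes iff total > t)."""
    n = len(weights)
    dummies = []
    for i in range(n):
        others = weights[:i] + weights[i + 1:]
        pivotal = False
        for x in product((0, 1), repeat=n - 1):
            base = sum(w * xi for w, xi in zip(others, x))
            # voter i matters if the outcome changes when i joins the others
            if base <= t < base + weights[i]:
                pivotal = True
                break
        if not pivotal:
            dummies.append(i)
    return dummies


states = ["France", "Germany", "Italy", "Belgium", "Netherlands", "Luxembourg"]
weights = [4, 4, 4, 2, 2, 1]
print([states[i] for i in dummy_voters(weights, 11)])
```

The reason is a parity argument: all other weights are even, so the other members' total is an even number, and adding Luxembourg's single vote can never move it from at most 11 to at least 12.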

The previous story, as well as the apportionment problem described in Appli-
cation 9.10, build a strong case for “practically efficient” procedures for the
computation of Chow parameters. Theorem 9.27 describes a simple dynamic pro-
gramming (i.e., recursive) algorithm for the computation of ω, the number of true
points of f .

Theorem 9.27. If f (x1 , x2 , . . . , xn ) is a threshold function given by the integral
structure (w1 , w2 , . . . , wn , t), then the number of true points of f can be computed
in O(nt) arithmetic operations.

Proof. We assume for the sake of simplicity that w1 , w2 , . . . , wn and t are positive
(only minor adaptations are required in the general case). For j = 0, 1, . . . , n and
s = 0, 1, . . . , t, define p(j , s) to be the number of points $X^* \in B^n$ such that
$x_{j+1}^* = \ldots = x_n^* = 0$ and $\sum_{k=1}^{n} w_k x_k^* = s$. In particular, p(n, s) is the number of points
such that $\sum_{k=1}^{n} w_k x_k^* = s$, and hence,

$$\omega = 2^n - \sum_{s=0}^{t} p(n, s). \qquad (9.28)$$

The numbers p(j , s) satisfy the recursions

$$p(j, s) = p(j-1, s) \quad \text{for } j = 1, 2, \ldots, n;\ s = 1, 2, \ldots, w_j - 1, \qquad (9.29)$$

and

$$p(j, s) = p(j-1, s) + p(j-1, s - w_j) \quad \text{for } j = 1, 2, \ldots, n;\ s = w_j, w_j + 1, \ldots, t. \qquad (9.30)$$

Indeed, when $s < w_j$, then $x_j^* = 0$ in all solutions $X^*$ of $\sum_{k=1}^{n} w_k x_k^* = s$; Equation
(9.29) follows from this observation. On the other hand, when $s \ge w_j$, then $x_j^*$ can
be either 0 or 1, and this gives rise to the two terms in Equation (9.30).
Note also that the initial conditions p(j , 0) = 1 for j = 0, 1, . . . , n, and p(0, s) = 0
for s = 1, 2, . . . , t, must hold.
Equations (9.29) and (9.30), together with these initial conditions, can be used
to fill in the (n + 1) × (t + 1) matrix with elements p(j , s). This only requires
O(nt) arithmetic operations, and the theorem follows from (9.28). 
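The recursion in this proof can be sketched in a few lines. The version below is an illustration under the same simplifying assumptions as the proof (positive integer weights; here t may also be 0), with the convention that a point is true when its weighted sum exceeds t. It stores a single row of the matrix p(j, s) and updates it in place, which is a space-saving choice of this sketch rather than something prescribed by the text; the function name is hypothetical.

```python
def count_true_points(weights, t):
    """Number of true points of f(x) = 1 iff sum(w_i * x_i) > t.

    Assumes positive integer weights and a nonnegative integer t.
    """
    n = len(weights)
    # p[s] holds p(j, s); initially j = 0: p(0, 0) = 1 and p(0, s) = 0 for s >= 1
    p = [1] + [0] * t
    for w in weights:
        # go through s in decreasing order so that p[s - w] is still p(j - 1, s - w)
        for s in range(t, w - 1, -1):
            p[s] += p[s - w]  # recursion (9.30); entries with s < w are unchanged (9.29)
    return 2 ** n - sum(p)    # equation (9.28)


# Structure (3, 2, 2, 1, 3): weights (3, 2, 2, 1) and threshold 3.
print(count_true_points([3, 2, 2, 1], 3))
```

For the structure (3, 2, 2, 1, 3), the seven points with weighted sum at most 3 are the false points, so the function has 16 − 7 = 9 true points.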

Since the complexity of the algorithm in Theorem 9.27 increases polynomially
with the value of the threshold t, we conclude that the number of true points of a
threshold function can be computed in pseudo-polynomial time (which is the next
best thing to a genuine polynomial algorithm).
Suppose next that we want to compute all n + 1 Chow parameters
$(\omega_1, \omega_2, \ldots, \omega_n, \omega)$ of a threshold function f . Observe that $\omega_1 = \omega(f_{|x_1=1})$, where
$f_{|x_1=1}$ is the threshold function with structure $(w_2, w_3, \ldots, w_n, t - w_1)$. It follows
that in order to compute $(\omega_1, \omega_2, \ldots, \omega_n, \omega)$, we only need to apply the previous
algorithm to $f_{|x_1=1}, f_{|x_2=1}, \ldots, f_{|x_n=1}$ and f . Hence, all Chow parameters can be
computed in $O(n^2 t)$ operations.
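As a sketch of this restriction argument, the code below applies a true-point counter to each restricted structure in turn. It is self-contained, so it repeats a compact counter; the handling of a negative restricted threshold (every point is then true) is one of the "minor adaptations" and is an assumption of this sketch, as are the function names.

```python
def count_true_points(weights, t):
    """True points of f(x) = 1 iff sum(w_i x_i) > t, for positive integer weights."""
    if t < 0:
        return 2 ** len(weights)  # every point exceeds a negative threshold
    p = [1] + [0] * t
    for w in weights:
        for s in range(t, w - 1, -1):
            p[s] += p[s - w]
    return 2 ** len(weights) - sum(p)


def chow_parameters(weights, t):
    """Return [omega_1, ..., omega_n, omega] using n + 1 runs of the counter."""
    omegas = [
        # restriction to x_i = 1: drop w_i and lower the threshold by w_i
        count_true_points(weights[:i] + weights[i + 1:], t - weights[i])
        for i in range(len(weights))
    ]
    return omegas + [count_true_points(weights, t)]


print(chow_parameters([3, 2, 2, 1], 3))
```

For weights (3, 2, 2, 1) and threshold 3, this yields (7, 6, 6, 5, 9): for instance, fixing the first variable to 1 leaves the structure (2, 2, 1, 0), whose only false point is the origin, so the first parameter is 8 − 1 = 7.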
Dynamic programming algorithms similar to the algorithm described in The-
orem 9.27 are classical tools for the solution of knapsack problems (see [671]).
Such algorithms have been proposed for the computation of power indices by Lucas
[631]; see Matsui and Matsui [676]. Uno [878] showed that the Banzhaf indices
(or Chow parameters) of all players can actually be computed in O(nt) opera-
tions by eliminating redundant operations. Klinz and Woeginger [574] describe a
dynamic programming algorithm with complexity $O(n^2 \, 1.415^n)$. See also Pesant
and Quimper [740] or Trick [869] for related work in the context of constraint
programming.
Pseudo-polynomial algorithms based on the consideration of generating func-
tions have been proposed for the computation of Shapley-Shubik indices by Mann
and Shapley [661] (who credit Cantor), and for Banzhaf indices by Brams and
Affuso [150]; see also Algaba et al. [15]; Bilbao [79]; Bilbao et al. [80, 81];
Fernández, Algaba, Bilbao, Jiménez, Jiménez and López [330]; Leech [607];
Papayanopoulos [729] for related work, extensions, and applications in various
political settings.
Finally, we refer to Crama and Leruth [236]; Crama, Leruth, Renneboog, and
Urbain [237]; Cubbin and Leech [246]; Gambarelli [368]; Leech [606, 608, 609]
for different approaches to the computation of Banzhaf indices in the framework
of corporate finance applications.

9.7 Threshold graphs


In this section, we specialize some of the above results to the case of graphic (that
is, purely quadratic and positive) functions. Recall that such a function can be iden-
tified with an undirected graph. More precisely, if $f(x_1, x_2, \ldots, x_n) = \bigvee_{(i,j) \in E} x_i x_j$,
we denote by Gf the graph (V , E), where V = {1, 2, . . . , n}. Conversely, if
G = (V , E) is an arbitrary graph, we define the Boolean function fG by the
expression $\bigvee_{(i,j) \in E} x_i x_j$.

Definition 9.9. A graph G is a threshold graph if the Boolean function fG is
threshold. We say that (w1 , w2 , . . . , wn , t) is a separating structure of G if it is a
separating structure of fG .

Example 9.20. The function f (x1 , x2 , x3 , x4 ) = x1 x2 ∨ x1 x3 ∨ x1 x4 ∨ x2 x3 is
graphic, and the associated graph Gf is shown in Figure 9.6. Since f is a thresh-
old function, Gf is a threshold graph. A separating structure for Gf is for instance
(3, 2, 2, 1, 3). 

Threshold graphs were introduced by Chvátal and Hammer [201] and, inde-
pendently, by Henderson and Zalcstein [487]. Most of this section is based
on [201].
The central question we want to address is: Which graphic functions are thresh-
old, or, equivalently, which graphs are threshold? In view of the correspondence
between false points of f and stable sets of Gf (recall Application 1.13.5), we
immediately obtain a first, trivial characterization.

Figure 9.6. Graph Gf for Example 9.20.

Theorem 9.28. A graph G = (V , E) is a threshold graph if and only if there exists
a structure (w1 , w2 , . . . , wn , t) such that, for every subset S of vertices,

$$S \text{ is stable in } G \text{ if and only if } \sum_{i \in S} w_i \le t.$$

Proof. This is a mere reformulation of Definition 9.9. 

Recall that for a graph G = (V , E) and a vertex i ∈ V , we denote by N (i) the
neighborhood of i, that is, the set N (i) = {j ∈ V : (i, j ) ∈ E}. We say that i is
isolated if N (i) = ∅, and i is dominating if N (i) = V \ {i}. Note that isolated
vertices of G correspond to inessential (dummy) variables of fG .

Theorem 9.29. For a graphic function f (x1 , x2 , . . . , xn ), the following statements
are equivalent:

(a) f is a threshold function.
(b) f is a regular function.
(c) There is a permutation (σ (1), σ (2), . . . , σ (n)) of {1, 2, . . . , n} such that, for
every i ∈ {1, 2, . . . , n}, σ (i) is either isolated or dominating in the subgraph
of Gf induced by {σ (i), σ (i + 1), . . . , σ (n)}.

Proof. The implication (a) =⇒ (b) follows from Theorem 9.7.
We prove the implication (b) =⇒ (c) by induction on n. The implication cer-
tainly holds for n = 1. So, assume that n > 1, and let f be a regular graphic function
with $x_1 \succeq_f x_2 \succeq_f \cdots \succeq_f x_n$. We first claim that either Gf has an isolated vertex,
or vertex 1 is dominating in Gf .
Indeed, assume that Gf has no isolated vertex, and consider an arbitrary vertex
j in {2, 3, . . . , n}. Since j is not isolated, there exists a vertex k ∈ N (j ). If k = 1,
then we are done. Otherwise, since $x_1 \succeq_f x_k$, Theorem 8.5 in Section 8.2 implies
that N (k) \ {1} ⊆ N (1) \ {k}. Thus, j ∈ N (1) \ {k}, and the claim is proved.
Now, we can define σ (1) as follows: If Gf has an isolated vertex, say i, then
we let σ (1) = i; otherwise, we let σ (1) = 1.
Let H be the subgraph of Gf obtained by deleting σ (1) from Gf . Thus, H = Gg ,
where g is the restriction of f obtained by fixing xσ (1) to 0. By Theorem 8.7, g is
a regular graphic function, and the proof is complete by induction.
Finally, we use again induction on n to prove the implication (c) =⇒ (a). If
n = 1, then the implication trivially holds. Otherwise, without loss of generality,
assume that 1 is either an isolated or a dominating vertex of Gf . Let H be the
induced subgraph of Gf obtained by deleting 1 from Gf . Then H = Gg , where g
is the restriction of f to x1 = 0.
Since condition (c) holds for H , the induction hypothesis implies that g is a
threshold function. Let (w2 , w3 , . . . , wn , t) be a separating structure for g. Using
Theorem 9.6 and the arguments in the proof of Theorem 9.4, we can assume that
wi > 0 for all i = 2, 3, . . . , n, and that t > 0.
Now, it is easy to see that (w1 , w2 , . . . , wn , t) is a structure for f , where w1 = 0
if vertex 1 is isolated in Gf , and w1 = t if vertex 1 is dominating in Gf . Hence, f
is a threshold function. 

Figure 9.7. (a): 2K2 , (b): P4 , (c): C4 .
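Condition (c) of Theorem 9.29 suggests a simple recognition procedure: repeatedly delete a vertex that is isolated or dominating in what remains. The sketch below is a naive illustration of that idea, not the linear-time algorithms cited later in this section. It assumes a simple graph on vertices 1, ..., n given as a list of edges, and it relies on the fact that induced subgraphs of threshold graphs are threshold, so that the choice of the vertex to delete does not matter. The function name is hypothetical.

```python
def is_threshold_graph(n, edges):
    """Test condition (c) of Theorem 9.29 by greedy vertex elimination."""
    adj = {v: set() for v in range(1, n + 1)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    remaining = set(adj)
    while len(remaining) > 1:
        # look for a vertex that is isolated or dominating in the remaining subgraph
        pick = next(
            (v for v in remaining
             if len(adj[v] & remaining) in (0, len(remaining) - 1)),
            None,
        )
        if pick is None:
            return False
        remaining.remove(pick)
    return True


print(is_threshold_graph(4, [(1, 2), (1, 3), (1, 4), (2, 3)]))  # graph of Example 9.20
print(is_threshold_graph(4, [(1, 2), (2, 3), (3, 4)]))          # the path P4
```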

The graphs 2K2 , P4 , and C4 are represented in Figure 9.7.

Theorem 9.30. A graph is a threshold graph if and only if it has no induced
subgraph isomorphic to 2K2 , P4 , or C4 .

Proof. As a consequence of Theorem 9.3, every induced subgraph of a threshold
graph is threshold. Moreover, it is easy to check that 2K2 , P4 , and C4 are not
threshold. Therefore, a threshold graph cannot have any of these graphs as an
induced subgraph.
To prove the converse statement, assume that G is not threshold. Hence, by
Theorem 9.29, fG is not regular. This means that there are two variables, say, xi
and xj , that are not comparable in the strength preorder associated with fG . From
Theorem 8.5, it follows that $N(i) \setminus \{j\} \not\subseteq N(j) \setminus \{i\}$ and $N(j) \setminus \{i\} \not\subseteq N(i) \setminus \{j\}$.
Let $k \in N(i) \setminus N(j)$, and let $\ell \in N(j) \setminus N(i)$, where $i, j, k, \ell$ are all distinct. Then,
$\{i, j, k, \ell\}$ must induce 2K2 or P4 or C4 . 
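The configuration exhibited at the end of this proof gives a direct, if inefficient, test for the three forbidden subgraphs: four distinct vertices i, j, k, l with k adjacent to i but not to j, and l adjacent to j but not to i. The sketch below enumerates all ordered quadruples; the function name and the edge-list input format are illustrative assumptions.

```python
from itertools import permutations


def has_forbidden_subgraph(n, edges):
    """True iff some four vertices induce 2K2, P4, or C4 (vertices are 1..n)."""
    edge_set = {frozenset(e) for e in edges}

    def adjacent(u, v):
        return frozenset((u, v)) in edge_set

    for i, j, k, l in permutations(range(1, n + 1), 4):
        # k in N(i) \ N(j) and l in N(j) \ N(i); the pairs (i, j) and (k, l) are free
        if (adjacent(i, k) and not adjacent(j, k)
                and adjacent(j, l) and not adjacent(i, l)):
            return True
    return False


print(has_forbidden_subgraph(4, [(1, 2), (1, 3), (1, 4), (2, 3)]))  # Example 9.20
print(has_forbidden_subgraph(4, [(1, 2), (2, 3), (3, 4), (4, 1)]))  # the cycle C4
```

Depending on whether none, one, or both of the free pairs are edges, the four vertices induce 2K2, P4, or C4, respectively.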

Several characterizations of threshold graphs rely on the concept of degree
sequence, which we define next.

Definition 9.10. Let G be a graph on {1, 2, . . . , n}, and let di = deg(i) be the
degree of vertex i, for i = 1, 2, . . . , n. If (π(1), π(2), . . . , π(n)) is a permutation
of {1, 2, . . . , n} such that dπ(1) ≥ dπ(2) ≥ · · · ≥ dπ(n) , then we say that d(G) =
(dπ(1) , dπ(2) , . . . , dπ(n) ) is the degree sequence of G. A degree sequence is called
threshold if it is the degree sequence of at least one threshold graph.

Theorem 9.31. If G is a threshold graph and H is a graph such that d(H ) = d(G),
then H is isomorphic to G.

Proof. Consider a threshold graph G with degree sequence d(G) = (d1 , d2 , . . . , dn ),
and let H be another graph with d(H ) = d(G). We assume, without loss of gen-
erality, that G and H have the same vertex–set, say, {1, 2, . . . , n}, and that di is the
degree of vertex i in both G and H , for i = 1, 2, . . . , n. We now prove by induction
on n that H is isomorphic to G.
If n = 1, this is trivial. If n > 1, then, by Theorem 9.29, there is a vertex
i ∈ {1, 2, . . . , n} such that either di = n − 1 or di = 0. Recall that G \ i is the sub-
graph obtained by deleting i from G, and let d(G \ i) = d̂. Then, d̂ is a threshold
degree sequence and, by induction, G \ i is (up to isomorphism) the unique graph
with this degree sequence. In particular, since d(H \ i) = d̂, we conclude that G \ i
is isomorphic to H \ i. From this, it easily follows that G is isomorphic to H . 

Theorem 9.31 is closely related to Theorem 9.22: Indeed, the Chow parameters
of a threshold graphic function can be explicitly expressed as a function of the
corresponding degree sequence (see Exercise 19 at the end of the chapter).
An interesting corollary of this result is that all the information concerning the
“thresholdness” of a graph is embodied in its degree sequence. In other words,
it must be possible to decide whether a graph G is a threshold graph by simply
examining its degree sequence. As a matter of fact, a careful reading of the proof
of Theorem 9.31 indicates how to decide, in $O(n^2)$ operations, whether a sequence
(d1 , d2 , . . . , dn ) of nonnegative integers is a threshold sequence. This result is not
best possible: Threshold sequences and threshold graphs can be recognized in time
O(n); see Golumbic [398] or Mahadev and Peled [645] for details.
The foregoing observations can also be derived from an analytical characteri-
zation of threshold sequences due to Hammer, Ibaraki, and Simeone [442]. Before
we state this result, we first recall a classical theorem of Erdős and Gallai [313]
(see also [71]).

Theorem 9.32. A sequence (d1 , d2 , . . . , dn ) with d1 ≥ d2 ≥ · · · ≥ dn ≥ 0 is a degree
sequence if and only if $\sum_{i=1}^{n} d_i$ is even and, for r = 1, 2, . . . , n,

$$\sum_{i=1}^{r} d_i \le r(r-1) + \sum_{i=r+1}^{n} \min\{r, d_i\}. \qquad (9.31)$$

For r = 1, 2, . . . , n, we call (9.31) the r th Erdős-Gallai inequality.

Example 9.21. The sequence d = (2, 2, 2, 2) is the degree sequence of the cycle
C4 . For this sequence, (9.31) becomes

$$2r \le r(r-1) + r(4-r) \quad \text{when } r = 1, 2,$$

and

$$2r \le r(r-1) + 2(4-r) \quad \text{when } r = 3, 4.$$

Notice that, in the previous example, all of the Erdős-Gallai inequalities are
satisfied as strict inequalities. This need not be the case in general.

Theorem 9.33. The degree sequence d = (d1 , d2 , . . . , dn ) is a threshold sequence
if and only if equality holds in the r th Erdős-Gallai inequality associated with d,
for all r ∈ {1, 2, . . . , n} such that r − 1 ≤ dr .
We refer to the paper by Hammer, Ibaraki, and Simeone [442] for a proof of
this result, and we simply illustrate it on a small example.
Example 9.22. The degree sequence of the graph Gf described in Example 9.20
is (3, 2, 2, 1). It is easy to check that (9.31) holds as equality for all r ∈ {1, 2, 3, 4}
such that r − 1 ≤ dr , that is, for r = 1, 2, 3. 
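The conditions of Theorems 9.32 and 9.33 are straightforward to evaluate. The sketch below computes, for each r, the slack of the r th Erdős-Gallai inequality (right-hand side minus left-hand side) and then applies both tests. It assumes that the input sequence is already sorted in nonincreasing order; the function names are illustrative, and the quadratic running time is not the best possible.

```python
def erdos_gallai_slacks(d):
    """Slack of inequality (9.31) for r = 1, ..., n; d must be nonincreasing."""
    n = len(d)
    return [
        r * (r - 1) + sum(min(r, d[i]) for i in range(r, n)) - sum(d[:r])
        for r in range(1, n + 1)
    ]


def is_degree_sequence(d):
    """Theorem 9.32: even sum and all Erdős-Gallai inequalities satisfied."""
    return sum(d) % 2 == 0 and all(s >= 0 for s in erdos_gallai_slacks(d))


def is_threshold_sequence(d):
    """Theorem 9.33: equality in (9.31) for every r with r - 1 <= d_r."""
    slacks = erdos_gallai_slacks(d)
    return is_degree_sequence(d) and all(
        slacks[r - 1] == 0
        for r in range(1, len(d) + 1)
        if r - 1 <= d[r - 1]
    )


print(erdos_gallai_slacks([3, 2, 2, 1]), is_threshold_sequence([3, 2, 2, 1]))
print(erdos_gallai_slacks([2, 2, 2, 2]), is_threshold_sequence([2, 2, 2, 2]))
```

For (3, 2, 2, 1) the slacks are (0, 0, 0, 4), in agreement with Example 9.22, whereas for (2, 2, 2, 2) all slacks are positive, in agreement with Example 9.21 and with the fact that C4 is not a threshold graph.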
Many additional results on threshold graphs can be found in Golumbic [398] or
Mahadev and Peled [645]. These books also describe applications of threshold
graphs to integer programming, mathematical psychology, personnel schedul-
ing and synchronization of parallel processes. We briefly discuss two of these
applications.
Application 9.12. (Integer programming.) A system of set packing inequalities
is a system of the form

$$\sum_{j=1}^{n} a_{kj} x_j \le 1, \qquad k = 1, 2, \ldots, m, \qquad (9.32)$$

where A = (akj ) is an m × n matrix with 0–1 elements. As observed by Chvátal
and Hammer [201], the aggregation problem (see Application 9.7) for set packing
inequalities can be translated into the problem of recognizing threshold graphs.
Indeed, let us associate a graph G(A) = (V , E) with the system (9.32), where
V = {1, 2, . . . , n}, and E = { (i, j ) : aki = akj = 1 for some k ∈ {1, 2, . . . , m} }. It is
easy to see that the system (9.32) has the same 0–1 solutions as the system

xi + xj ≤ 1, (i, j ) ∈ E. (9.33)

Now, the 0–1 solutions of (9.33) are precisely the characteristic vectors of
the stable sets of G(A). Hence, by Theorem 9.28, there exists a single linear
inequality having the same 0–1 solutions as (9.32) (or (9.33)) if and only if G(A)
is a threshold graph. In particular, this proves that the aggregation problem is
polynomially solvable for set packing inequalities.
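The passage from the system (9.32) to the graph G(A) and to the system (9.33) can be illustrated as follows. This is only a brute-force sketch on a hypothetical 2 × 4 matrix: it builds the edge set of G(A) and checks by enumeration that (9.32) and (9.33) have the same 0–1 solutions. Vertices are numbered from 0 rather than from 1, and the function names are assumptions of the sketch.

```python
from itertools import combinations, product


def conflict_graph(A):
    """Edges (i, j), i < j, such that some row of A has a 1 in columns i and j."""
    n = len(A[0])
    return {
        (i, j)
        for i, j in combinations(range(n), 2)
        if any(row[i] == 1 and row[j] == 1 for row in A)
    }


def same_01_solutions(A):
    """Check by enumeration that (9.32) and (9.33) have the same 0-1 solutions."""
    n = len(A[0])
    edges = conflict_graph(A)
    for x in product((0, 1), repeat=n):
        packing = all(sum(a * xi for a, xi in zip(row, x)) <= 1 for row in A)
        stable = all(x[i] + x[j] <= 1 for i, j in edges)
        if packing != stable:
            return False
    return True


A = [[1, 1, 1, 0],
     [1, 0, 0, 1]]  # hypothetical set packing system with m = 2, n = 4
print(sorted(conflict_graph(A)), same_01_solutions(A))
```

Up to the shift in vertex numbering, the graph obtained for this matrix is the threshold graph of Example 9.20, so this particular system is equivalent to a single linear inequality.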
When (9.32) is not equivalent to a single linear inequality, one can push the
investigation a bit further and ask instead: What is the smallest integer δ for which
there exists a system of δ linear inequalities

$$\sum_{j=1}^{n} w_{kj} x_j \le t_k, \qquad k = 1, 2, \ldots, \delta, \qquad (9.34)$$

such that (9.32) and (9.34) have the same set of 0–1 solutions (Chvátal and Hammer
[201], Neumaier [709])? We denote this value by δ(A) and call it the threshold
dimension of A. Similarly, for a graph G = (V , E), we can define the threshold
dimension of G as δ(G) := δ(A(G)), where A(G) is the coefficient matrix of
the system (9.33) (notice that this definition is coherent in the sense that δ(A) =
δ(G(A)) for every 0–1 matrix A).
The threshold dimension has an interesting graph-theoretic interpretation. Of
course, a graph G = (V , E) is threshold if and only if δ(G) = 1. But, more gen-
erally, it can also be shown that δ(G) is the smallest δ for which there exist δ
threshold graphs Gk = (V , Ek ) (k = 1, 2, . . . , δ) satisfying E1 ∪ E2 ∪ · · · ∪ Eδ = E
(this is left as an exercise to the reader).
Chvátal and Hammer [201] proved that computing δ(G) is NP-hard.
Yannakakis [933] refined this result by proving that, for every fixed k ≥ 3, it is NP-
complete to decide whether δ(G) ≤ k. For many years, and in spite of a flurry of
research on this topic, it remained unknown whether testing δ(G) ≤ 2 was NP-hard
or not. Finally, the question was settled by Ma [641] who provided a polynomial-
time algorithm for the recognition of graphs with threshold dimension 2. Other
polynomial algorithms for this problem were later proposed by Raschle and Simon
[779] and Sterbini and Raschle [847]. We refer the reader to the original papers
or to Mahadev and Peled [645] for additional information. 

Application 9.13. (Mathematical psychology, social choice.) Let S denote a set of
individuals, and let P denote a set of propositions, and assume that each individual
declares either to “agree” or to “disagree” with each proposition. For instance,
the individuals may be citizens and the propositions may be items in an opinion
poll, or the individuals may be college students and the propositions may be math
problems that the students can either solve or not, and so forth.
We would like to map all individuals and all propositions to a common linear
scale (e.g., from “left” to “right” or from “hard” to “easy”) in such a way that each
individual agrees with all propositions following him or her and disagrees with all
propositions preceding him or her on the scale. Such a scale is called a Guttman scale. More
precisely, assume that the data of the problem are described by the bipartite graph
H = (V , A), where V = S ∪ P and
A = { (s, p) ∈ S × P : s agrees with p }.
A Guttman scale for H is a mapping g from V to R such that, for each s ∈ S and
p ∈ P , (s, p) ∈ A if and only if g(s) < g(p).
Of course, not every bipartite graph admits a Guttman scale. To obtain a full
characterization, consider the graph G = (V , E), where
E = { (s, t) ∈ S × S : s ≠ t } ∪ A.
The following result is due to Cozzens and Leibowitz [223].
Theorem 9.34. The graph H has a Guttman scale if and only if G is a threshold
graph.
Proof. If G is a threshold graph, then it has a separating structure with threshold
t, with weight w(s) for vertex s ∈ S and with weight a(p) for vertex p ∈ P . We
can now construct a Guttman scale g as follows. For p ∈ P , let g(p) = a(p). For
s ∈ S, consider the largest value a(p∗ ) such that w(s) + a(p∗ ) ≤ t, and define
g(s) = a(p∗ ). It is easy to check that g is a valid Guttman scale.
Conversely, if G is not a threshold graph, then by Theorem 9.30 it has four
vertices, say, 1, 2, 3, 4, such that (1, 2) ∈ E, (3, 4) ∈ E, (1, 3) ∉ E and (2, 4) ∉ E
(cf. Figure 9.7). We can assume, without loss of generality, that vertices 1 and 4
are in P , and vertices 2 and 3 are in S (indeed, 1 and 3 are not both in S, since
they are not linked; hence, we can assume that 1 ∈ P ; then, 2 ∈ S, etc.). Then, if g
is a Guttman scale, there holds

g(4) ≤ g(2) < g(1), since 2 agrees with 1 but 2 does not agree with 4,

g(1) ≤ g(3) < g(4), since 3 agrees with 4 but 3 does not agree with 1,
and we reach a contradiction. 
We refer to the paper by Cozzens and Leibowitz [223] for additional information
on the connections between Guttman scales and threshold graphs. 
Other connections between threshold functions and graph properties have been
explored in several papers. For instance, Benzaken and Hammer [67] characterized
domishold graphs: A graph G = (V , E) is domishold if there exists a structure
(w1 , w2 , . . . , wn , t) such that, for every S ⊆ V ,

$$S \text{ is dominating in } G \text{ if and only if } \sum_{i \in S} w_i \le t.$$

Hammer, Maffray, and Queyranne [451] investigated cut-threshold graphs in
which subsets of edges (or vertices) corresponding to cuts are characterized by
a similar threshold-type property. We refer again to Mahadev and Peled [645] for
more information on such graph classes.

9.8 Exercises
1. A Boolean function f (x1 , x2 , . . . , xn ) is a ball if there exist $(w_1, w_2, \ldots, w_n, r) \in \mathbb{R}^{n+1}$
such that, for all $(x_1, x_2, \ldots, x_n) \in B^n$,

$$f(x_1, x_2, \ldots, x_n) = 0 \text{ if and only if } \sum_{i=1}^{n} (w_i - x_i)^2 \le r^2.$$

Prove that a function is a ball if and only if it is a threshold function (Hegedűs
and Megiddo [481]).
2. (a) Show that, if a Boolean function can be represented by a DNF of degree
k, then it is a polynomial threshold function of degree k in the sense of
Section 9.1.
(b) Show that the parity function f (x1 , x2 , . . . , xn ) = x1 ⊕ x2 ⊕ . . . ⊕ xn is
not a polynomial threshold function of degree k for any k < n (Wang and
Williams [897]).
3. Prove that every threshold function has an integral separating structure
(Theorem 9.5).
4. Prove Theorem 9.6.
5. Derive Theorem 9.14 from the strong duality theorem of linear program-
ming.
6. Consider the Boolean function f on $B^{10}$ defined as follows: f (x1 ,
x2 , . . . , x10 ) = 1 if and only if

x1 + x2 + . . . + x10 ≥ 7

and

34x1 + 29x2 + 9x3 + 7x4 + 5x5 + 5x6 + 4x7 + 3x8 + 3x9 + x10 ≥ 50.

Prove that f is regular, but that f is not a threshold function.


7. Consider the separating structures $(W^*, t^*) = (13, 7, 6, 6, 4, 4, 4, 3, 2, 24)$ and
$(V^*, t^*) = (13, 7, 6, 6, 4, 4, 4, 2, 3, 24)$, as in Example 9.15.
(a) Show that $(W^*, t^*)$ and $(V^*, t^*)$ define the same self-dual threshold
function f .
(b) Observe that $Y^1 = (1, 0, 1, 1, 0, 0, 0, 0, 0)$, $Y^2 = (1, 0, 0, 1, 1, 0, 0, 0, 1)$,
$Y^3 = (0, 1, 1, 0, 1, 1, 1, 0, 0)$, $Y^4 = (0, 1, 0, 1, 1, 1, 1, 0, 0)$, and
$Y^5 = (0, 0, 1, 1, 0, 1, 1, 1, 1)$ are minimal true points of f , and that
$X^1 = (1, 1, 0, 0, 0, 1, 0, 0, 0)$, $X^2 = (1, 1, 0, 0, 0, 0, 1, 0, 0)$,
$X^3 = (0, 1, 1, 1, 0, 0, 0, 1, 1)$, and $X^4 = (0, 0, 1, 1, 1, 1, 1, 0, 0)$ are maximal false
points of f . Observe also that $14Y^1 + 3Y^2 + 15Y^3 + 12Y^4 + 11Y^5 - 8X^1 - 8X^2 - 10X^3 - 29X^4 = (1, 1, 1, 1, 1, 1, 1, 1, 4)$.
Conclude that, in every solution of the system (9.24)–(9.26) such that $w_9 \le 2$, there holds
$\sum_{i=1}^{n} w_i \ge 49$, and hence, no solution of (9.24)–(9.26) is simultaneously
smaller than both $(W^*, t^*)$ and $(V^*, t^*)$.
8. Prove that the following decision problem is co-NP-complete: Given a sep-
arating structure (w1 , w2 , . . . , wn , t), decide whether the threshold function
represented by this structure is self-dual. (Compare with Theorem 9.9.)
9. Use the lower bound (9.1) in Theorem 9.11 to prove the following: For every
k > 2 and for n large enough, there exists a threshold function f of n variables
such that any integral separating structure representing f involves a weight of
magnitude at least $2^{n/k}$. (Compare with the – much stronger – lower bound
in Theorem 9.12.)
10. Prove that any oracle algorithm for the threshold recognition problem must
perform, in the worst case, an exponential number of queries on the oracle.
(An oracle algorithm is an algorithm that can only gain information about
the input function through queries of the form: “Is X ∗ a true point of the
function?”)
11. Let S1 be the solution set of the system (9.17)–(9.18), and let S2 be the
solution set of the system (9.18)–(9.19). Show that no inclusion relation
holds in general between S1 and S2 .
12. In 1973, the voting weights of the nine members of the Council of Ministers
of the European Economic Community were 10, 10, 10, 10, 5, 5, 3, 3, and
2, respectively. The threshold was 40 votes. Show that this voting procedure
is equivalent to the procedure defined by the smaller weights 6, 6, 6, 6, 3, 3,
2, 2, 1 with threshold 24. Compute the Banzhaf indices of the nine states.
13. Prove that every Chow function is completely monotone.
14. Let f and g be two functions on $B^n$ . Prove that, if f is a positive threshold
function and ωi (f ) = ωi (g) for i = 1, 2, . . . , n, then either f = g or ω(f ) <
ω(g).
15. Prove that a graph G = (V , E) is threshold if and only if there exist (n + 1)
numbers a1 , a2 , . . . , an and q such that, for all i, j in V ,

(i, j ) ∈ E if and only if ai + aj > q. (9.35)

16. Show that, if G is a threshold graph and the numbers a1 , a2 , . . . , an , q satisfy
(9.35), then (a1 , a2 , . . . , an , q) is not necessarily a separating structure of G.
17. A positive Boolean function $f(x_1, x_2, \ldots, x_n) = \bigvee_{P \in E} \bigwedge_{j \in P} x_j$ is r-
uniform if |P | = r for all P ∈ E. We say that an r-uniform function has
property (T) if there exist (n + 1) numbers a1 , a2 , . . . , an and q such that, for
all P ⊆ {1, 2, . . . , n} with |P | = r,

$$P \in E \text{ if and only if } \sum_{i \in P} a_i > q.$$

Prove that

(a) if f is uniform and threshold, then f has property (T); if f is uniform
and has property (T), then f is regular (Golumbic [398]); and
(b) the reverse of both implications in (a) may fail for 3-uniform functions
(Reiterman, Rödl, Šiňajová, and Tůma [784]).

18. Let G = (V , E) be a threshold graph on the vertex-set V = {1, 2, . . . , n}, and
let (d1 , d2 , . . . , dn ) be the degree sequence of G, where di is the degree of
vertex i (i = 1, 2, . . . , n). Prove that

(a) K = { i ∈ V : i − 1 ≤ di } is a maximum clique of G;
(b) V \ K is a stable set of G.

19. If G is a threshold graph, express the Chow parameters of fG as a function
of the degree sequence d(G). (B. Simeone, private communication.)
20. Show that the threshold dimension of a graph G is the smallest value of δ for
which there exist δ threshold graphs Gk = (V , Ek ) (k = 1, 2, . . . , δ) satisfying
E1 ∪ E2 ∪ · · · ∪ Eδ = E.

Questions for thought


21. Let k(n) be the smallest integer k such that every k-asummable 4√ function
5 of
n variables6√ 7 is a threshold function. It is known that k(n) ≥ n , and that
k(15) > 15 (Muroga [698]). What else can be said about k(n)?
22. Is it possible to recognize threshold functions through an entirely combina-
torial procedure, that is, without resorting to the solution of the system (TS)
as in Theorem 9.16, or by developing a specialized combinatorial algorithm
for its solution?
23. If f (x1 , x2 , . . . , xn ) is a positive Boolean function, denote by δ(f ) (respec-
tively, ρ(f )) the smallest number m such that f is the disjunction of m
threshold (respectively, regular) functions.
(a) Show that δ(f ) and ρ(f ) can take any value between 1 and 7n/28.
(b) Is it true that, for every pair of integers (d, r) with d ≥ r, there exists
a positive function f with δ(f ) = d and ρ(f ) = r ? (See Neumaier
[709].)
10
Read-once functions
Martin C. Golumbic and Vladimir Gurvich

10.1 Introduction
In this chapter, we present the theory and applications of read-once Boolean func-
tions, one of the most interesting special families of Boolean functions. A function
f is called read-once if it can be represented by a Boolean expression using
the operations of conjunction, disjunction, and negation in which every variable
appears exactly once. We call such an expression a read-once expression for f .
For example, the function

f0 (a, b, c, w, x, y, z) = ay ∨ cxy ∨ bw ∨ bz

is a read-once function, since it can be factored into the expression

f0 = y(a ∨ cx) ∨ b(w ∨ z)

which is a read-once expression.


Observe, from the definition, that read-once functions must be monotone (or
unate), since every variable appears either in its positive or negative form in the
read-once expression (see Exercise 1 at the end of the chapter). However, we
will make the stronger assumption that a read-once function is positive, simply
by renaming any negative variable x̄i as a new positive variable xi . Thus, every
variable will be positive, and we may freely rely on the results presented earlier
(in particular, in Chapters 1 and 4) on positive Boolean functions.
Let us look at two simple functions,

f1 = ab ∨ bc ∨ cd

and
f2 = ab ∨ bc ∨ ac.
Neither of these is a read-once function; indeed, it is impossible to express them so
that each variable appears only once. (Try to do it.) The functions f1 and f2 illustrate
the two types of forbidden functions that characterize read-once functions, as we
will see. We begin by defining the co-occurrence graph of a positive Boolean
function.

Figure 10.1. The co-occurrence graph of f0 = ay ∨ cxy ∨ bw ∨ bz.

Figure 10.2. The co-occurrence graphs of (a): f1 , and (b): f2 , f3 .

Let f be a positive Boolean function over the variable set V = {x1 , x2 , . . . , xn }.
The co-occurrence graph of f , denoted G(f ) = (V , E), has vertex set V (the same
as the set of variables), and there is an edge (xi , xj ) in E if xi and xj occur together
(at least once) in some prime implicant of f . In this chapter, we often regard a
prime implicant as the set of its literals. Formally, let P denote the collection of
prime implicants of f . Then,

(xi , xj ) ∈ E ⇐⇒ xi , xj ∈ P for some P ∈ P.

Figures 10.1 and 10.2 show the co-occurrence graphs of f0 , f1 , f2 .
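
The definition of the co-occurrence graph translates directly into a few lines of code. The following is a minimal sketch, assuming that each prime implicant is encoded as a string of one-letter variable names (an encoding chosen here for illustration only); the function name cooccurrence_edges is likewise hypothetical.

```python
from itertools import combinations


def cooccurrence_edges(prime_implicants):
    """Edge set of G(f): pairs of variables that share some prime implicant.

    Assumed encoding: each prime implicant is an iterable of variable
    names, e.g. the string "cxy" stands for the term cxy.
    """
    edges = set()
    for term in prime_implicants:
        for u, v in combinations(sorted(set(term)), 2):
            edges.add(frozenset((u, v)))
    return edges


# The three functions discussed in the text, given by their prime implicants.
f0 = ["ay", "cxy", "bw", "bz"]
f1 = ["ab", "bc", "cd"]
f2 = ["ab", "bc", "ac"]

for name, terms in (("f0", f0), ("f1", f1), ("f2", f2)):
    print(name, sorted("".join(sorted(e)) for e in cooccurrence_edges(terms)))
```

Note that the input must be the list of prime implicants; an arbitrary DNF with absorbed terms could produce spurious edges.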


We denote by P4 the graph consisting of a chordless path on 4 vertices and 3
edges, which is the graph G(f1 ) in Figure 10.2 (see also Appendix A). A graph is
called P4 -free if it contains no induced subgraph isomorphic to P4 . The P4 -free
graphs are also known as cographs (for “complement reducible graphs”); we will
have more to say about them in Section 10.4.
Since we have observed that f1 is not read-once, and since its co-occurrence
graph is P4 , it would be reasonable to conjecture that the co-occurrence graph
of a read-once function must be P4 -free. In fact, we will prove this statement
in Section 10.3. This is not enough, however. In order to characterize read-once
functions in terms of graphs, we will need a second property called normality.1

1 The property of normality is sometimes called clique-maximality in the literature. It also appears in the
definition of conformal hypergraphs in Berge [71] and is used in the theory of acyclic hypergraphs.

To see this, note that the function

f3 = abc

has the same co-occurrence graph as f2 , namely, the triangle G(f2 ) = G(f3 ) in
Figure 10.2, yet f3 is clearly read-once and f2 is not read-once. This example
illustrates the motivation for the following definition.
A Boolean function f is called normal if every clique of its co-occurrence graph
is contained in a prime implicant of f .
In our example, f2 fails to be normal, since the triangle {a, b, c} is not contained
in any prime implicant of f2 . This leads to our second necessary property of read-
once functions, namely, that a read-once function must be normal, which we will
also prove in Section 10.3. Moreover, a classical theorem of Gurvich [422, 426]
shows that combining these two properties characterizes read-once functions.

Theorem 10.1. A positive Boolean function f is read-once if and only if its
co-occurrence graph G(f ) is P4 -free and f is normal.

A new proof of this theorem will be given in Section 10.3 as part of
Theorem 10.6.
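
For small functions, the two conditions of Theorem 10.1 can be tested by brute force. The sketch below is only an illustration under stated assumptions: the input is the complete list of prime implicants, encoded as strings of one-letter variables, and all function names are hypothetical. It enumerates all subsets of variables, so it is exponential and is not the polynomial algorithm of Section 10.5.

```python
from itertools import combinations, permutations


def edges_of(primes):
    """Co-occurrence edges of the function given by its prime implicants."""
    return {frozenset(pair) for p in primes for pair in combinations(p, 2)}


def is_p4_free(vertices, edges):
    """True if no four vertices induce a chordless path a-b-c-d."""
    def adj(u, v):
        return frozenset((u, v)) in edges

    for a, b, c, d in permutations(vertices, 4):
        if (adj(a, b) and adj(b, c) and adj(c, d)
                and not adj(a, c) and not adj(a, d) and not adj(b, d)):
            return False
    return True


def is_normal(vertices, edges, primes):
    """True if every clique of G(f) is contained in some prime implicant."""
    prime_sets = [set(p) for p in primes]
    for size in range(1, len(vertices) + 1):
        for cand in combinations(vertices, size):
            clique = all(frozenset(pair) in edges
                         for pair in combinations(cand, 2))
            if clique and not any(set(cand) <= p for p in prime_sets):
                return False
    return True


def is_read_once(primes):
    """Test of Theorem 10.1: G(f) is P4-free and f is normal."""
    vertices = sorted(set().union(*map(set, primes)))
    edges = edges_of(primes)
    return is_p4_free(vertices, edges) and is_normal(vertices, edges, primes)


for name, primes in (("f0", ["ay", "cxy", "bw", "bz"]),
                     ("f1", ["ab", "bc", "cd"]),
                     ("f2", ["ab", "bc", "ac"]),
                     ("f3", ["abc"])):
    print(name, is_read_once(primes))
```

Here f1 fails the P4 test, f2 fails normality, and f0 and f3 pass both tests, in agreement with the discussion above.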
Read-once functions first appeared explicitly in the literature in the papers
of Chein [190] and Hayes [479] that gave exponential time recognition algo-
rithms for the family (see the historical notes at the end of this chapter).
Gurvich [422, 425, 426] gave the first characterization theorems for read-once
functions; they are presented in Section 10.3. Several authors have subsequently
discovered and rediscovered these and a number of other characterizations. The-
orem 10.1 also provides the justification for the polynomial time recognition
algorithm of read-once functions by Golumbic, Mintz, and Rotics [401, 402],
presented in Section 10.5. In particular, we will show how to factor read-once
functions using the properties of P4 -free graphs.
Read-once functions have been studied in computational learning theory, where
they have been shown to constitute a class that can be learned in polynomial time.
Section 10.6 will survey some of these results. Additional applications of read-once
functions are presented in Section 10.7.
Before turning our full attention to read-once functions, however, we review a
few properties of the dual of a Boolean function and prove an important result on
positive Boolean functions that will be useful in subsequent sections.

10.2 Dual implicants


In this section, we first recall some of the relationships between the prime impli-
cants of a function f and the prime implicants of its dual function f d in the case
of positive Boolean functions. All of these properties were presented in Chapter 1
and Chapter 4. We then present a characterization of the subimplicants of the dual
of a positive Boolean function, due to Boros, Gurvich, and Hammer [121]. This
result will be used later in the proof of one of the characterizations of read-once
functions.

10.2.1 Implicants and dual implicants


The dual of a Boolean function f is the function f d defined by

f d (X) = f̄ (X̄),

that is, the complement of the value of f at the complemented point X̄,

and an expression for f d can be obtained from any expression for f by simply
interchanging the operators ∧ and ∨ as well as the constants 0 and 1. In particular,
given a DNF expression for f , this exchange yields a CNF expression for f d . This
shows that the dual of a read-once function is also read-once.
The process of transforming a DNF expression of f into a DNF expression of
f d is called DNF dualization; its complexity for positive Boolean functions is still
unknown, the current best algorithm being quasi-polynomial [347]; see Chapter 4.
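
As a small sanity check of the interchange rule, one can evaluate the dual pointwise from the definition and compare it with the expression obtained by swapping ∧ and ∨. The sketch below does this for the function f0 of Section 10.1; the helper names and the dictionary encoding of points are assumptions made for this illustration.

```python
from itertools import product


def evaluate_dnf(terms, point):
    """Value of a positive DNF at `point` (a dict: variable -> 0 or 1)."""
    return int(any(all(point[v] for v in term) for term in terms))


def evaluate_dual(terms, point):
    """Dual by definition: complement of f at the complemented point."""
    complemented = {v: 1 - bit for v, bit in point.items()}
    return 1 - evaluate_dnf(terms, complemented)


def dual_of_f0(p):
    """Read-once expression y(a ∨ cx) ∨ b(w ∨ z) with ∧ and ∨ interchanged."""
    return int((p["y"] or (p["a"] and (p["c"] or p["x"])))
               and (p["b"] or (p["w"] and p["z"])))


f0 = ["ay", "cxy", "bw", "bz"]
variables = "abcwxyz"

# Compare both descriptions of the dual on all 2^7 points.
agree = all(
    evaluate_dual(f0, dict(zip(variables, bits)))
    == dual_of_f0(dict(zip(variables, bits)))
    for bits in product((0, 1), repeat=len(variables))
)
print(agree)
```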
Let P be the collection of prime implicants of a positive Boolean function f
over the variables x1 , x2 , . . . , xn , and let D be the collection of prime implicants of
the dual function f d . We assume throughout that all of the variables for f (and
hence for f d ) are essential. We use the term “dual (prime) implicant” of f to
mean a (prime) implicant of f d . For positive functions, the prime implicants of f
correspond precisely to the set of minimal true points minT (f ), and the dual prime
implicants of f correspond precisely to the set of maximal false points maxF (f );
see Sections 1.10.3 and 4.2.1.
Theorem 4.7 states that the implicants and dual implicants of a Boolean function
f , viewed as sets of literals, have pairwise nonempty intersections. In particular,
this holds for the prime implicants and the dual prime implicants. Moreover, the
prime implicants and the dual prime implicants are minimal with this property,
that is, for every proper subset S of a dual prime implicant of f , there is a prime
implicant P such that P ∩ S = ∅.
In terms of hypergraph theory, the prime implicants P form a clutter (namely,
a collection of sets, or hyperedges, such that no set contains another set), as does
the collection of dual prime implicants D.
Finally, we recall the following properties of duality to be used in this chapter
and which can be derived from Theorems 4.1 and 4.19.

Theorem 10.2. Let f and g be positive Boolean functions over {x1 , x2 , . . . , xn },
and let P and D be the collections of prime implicants of f and g, respectively.
Then the following statements are equivalent:

(i) g = f d .
(ii) For every partition of {x1 , x2 , . . . , xn } into sets A and Ā, there is either a
member of P contained in A or a member of D contained in Ā, but not
both.
(iii) D is exactly the family of minimal transversals of P.
(iv) P is exactly the family of minimal transversals of D.
(v) (a) For all P ∈ P and D ∈ D, we have P ∩ D ≠ ∅; and
(b) For every subset B ⊆ {x1 , x2 , . . . , xn }, there exists D ∈ D such that
D ⊆ B if and only if P ∩ B ≠ ∅ for every P ∈ P.
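
Statements (iii) and (iv) suggest a direct, if naive, way to dualize a small positive function: enumerate the minimal transversals of its prime implicants. The following sketch is a brute-force illustration only (exponential in the number of variables, and unrelated to the quasi-polynomial algorithm cited above); the function name and the string encoding of terms are assumptions.

```python
from itertools import combinations


def minimal_transversals(family):
    """All minimal sets that intersect every member of `family`.

    Candidates are scanned by increasing size, so a hitting set is
    minimal exactly when it contains no previously found transversal.
    """
    family = [frozenset(s) for s in family]
    ground = sorted(set().union(*family))
    found = []
    for size in range(1, len(ground) + 1):
        for cand in combinations(ground, size):
            c = frozenset(cand)
            if any(t <= c for t in found):
                continue  # contains a smaller transversal: not minimal
            if all(c & s for s in family):
                found.append(c)
    return set(found)


f1 = ["ab", "bc", "cd"]
dual = minimal_transversals(f1)
print(sorted("".join(sorted(d)) for d in dual))

# Dualizing twice gives back the original prime implicants.
print(minimal_transversals(dual) == {frozenset(p) for p in f1})
```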

We obtain from Theorem 10.2(v) the following characterization of dual
implicants.

Theorem 10.3. A set of variables B is a dual implicant of the function f if and
only if P ∩ B ≠ ∅ for all prime implicants P of f .

10.2.2 The dual subimplicant theorem


We are now ready to present a characterization of the subimplicants of the dual of a
positive function, due to Boros, Gurvich, and Hammer [121]. This characterization
is interesting on its own and also provides a useful tool for proving other results.
Let f be a positive Boolean function over the variables V = {x1 , x2 , . . . , xn },
and let f d be its dual. As before, P and D denote the prime implicants of f and
f d , respectively. We assume throughout that all of the variables of f (and f d ) are
essential.
A subset T of the variables is called a dual subimplicant of f if T is a subset
of a dual prime implicant of f , that is, if there exists a prime implicant D of f d
such that T ⊆ D. A proper dual subimplicant is a nonempty proper subset of a
dual prime implicant.

Example 10.1. Let f = x1 x2 ∨ x2 x3 x4 ∨ x4 x5 . Its dual is f d = x1 x3 x5 ∨
x1 x4 ∨ x2 x4 ∨ x2 x5 . The proper dual subimplicants of f are the pairs
{x1 , x3 }, {x3 , x5 }, {x1 , x5 } and the five singletons {xi }, i = 1, . . . , 5.

We will make use below of the following consequence of Theorem 10.3:

Remark 10.1. Let T be a subset of the variables {x1 , x2 , . . . , xn }. If T is a
proper dual subimplicant of f , then there exists a prime implicant P ∈ P such that
P ∩ T = ∅.

Let T be a subset of the variables. Our goal will be to determine whether T is
contained in some D ∈ D, namely, whether T is a dual subimplicant. We define
the following sets of prime implicants of f , with respect to the set T :

P0 (T ) = {P ∈ P | P ∩ T = ∅},

and, for all x ∈ T ,

Px (T ) = {P ∈ P | P ∩ T = {x}}.
Note that by Theorem 10.3, P0 (T ) is empty if and only if T is a dual implicant, and
by Remark 10.1, P0 (T ) is nonempty when T is a proper dual subimplicant. The
remaining prime implicants in P, which contain two or more variables of T , will
not be relevant for our analysis. (We may omit the parameter T from our notation
when it is clear which subset is meant.)
A selection S(T ), with respect to T , consists of one prime implicant Px ∈
Px (T ) for every x ∈ T . A selection is called covering if there is a prime implicant
P0 ∈ P0 (T ) such that P0 ⊆ ⋃_{x∈T} Px . Otherwise, it is called noncovering. (See
Example 10.2.)
We now present the characterization of the dual subimplicants of a positive
Boolean function from [121].
Theorem 10.4. Let f be a positive Boolean function over the variable set
{x1 , x2 , . . . , xn }, and let T be a subset of the variables. Then T is a dual subimplicant
of f if and only if there exists a noncovering selection with respect to T .
Proof. Assume that T is a dual subimplicant of f , and let D ∈ D be a prime
implicant of f d for which T ⊆ D. For any variable x ∈ T , the subset D \ {x} is a
proper subset of D, and therefore, by Remark 10.1 (or trivially, if D = {x}), there
exists a prime implicant Px ∈ P such that Px ∩ (D \ {x}) = ∅. Since Px ∩ D ≠ ∅
by Theorem 10.3, we have {x} = Px ∩ D = Px ∩ T , that is, Px ∈ Px (T ).
If S = {Px | x ∈ T } were a covering selection, then there would exist a prime
implicant P0 ∈ P0 (T ) such that P0 ⊆ ⋃_{x∈T} Px . But this would imply

P0 ∩ D ⊆ (⋃_{x∈T} Px ) ∩ D = ⋃_{x∈T} (Px ∩ D) = T ,

which, together with P0 ∩ T = ∅, would give P0 ∩ D = ∅, contradicting
Theorem 10.3. Thus, the selection S we have constructed is a noncovering selection
with respect to T . (Note that in the special case when T = D, we would have
P0 (T ) empty, and any selection would be noncovering.)
Conversely, suppose there exists a noncovering selection S = {Px |x ∈ T }, where
Px ∈ Px (T ). Since S is noncovering, we have for all P0 ∈ P0 (T ) that

P0 ⊈ ⋃_{x∈T} Px .

Let B be defined as the complementary set

B = ({x1 , x2 , . . . , xn } \ ⋃_{x∈T} Px ) ∪ T .

Clearly, for any prime implicant P0 ∈ P0 (T ), we have P0 ∩ B ≠ ∅, since S is
noncovering. Moreover, by definition, all other prime implicants P ∈ P \ P0 (T )
intersect T , and therefore, they intersect B, since T ⊆ B. Thus, we have shown
that P ∩ B ≠ ∅ for all P ∈ P, implying that B is a (not necessarily prime) dual
implicant.
Let D ∈ D be a dual prime implicant such that D ⊆ B. From the definition of
B, it follows that Px ∩ B = {x} for all x ∈ T . But each Px intersects D, since Px is
a prime implicant and D is a dual prime implicant, which, together with the fact
that D ⊆ B, implies that Px ∩ D = {x}. Hence, T ⊆ D, proving that T is a dual
subimplicant. 
Figure 10.3. The co-occurrence graph for Example 10.2.

We will often apply Theorem 10.4 in its contrapositive form or in its dual form,
as follows.
Remark 10.2. A subset T is not a dual subimplicant of f if and only if every
selection with respect to T is a covering selection. 
Remark 10.3. We may also apply Theorem 10.4 to subimplicants of f and dual
selections, where the roles of P and D are reversed in the obvious manner. 
Example 10.2. Consider the positive Boolean function

f = adg ∨ adh ∨ bdg ∨ bdh ∨ eag ∨ ebg ∨ ecg ∨ eh,

whose co-occurrence graph is shown in Figure 10.3.


(i) Let T = {b, c, h}. We have

P0 (T ) = {adg, eag}, Pb (T ) = {bdg, ebg}, Pc (T ) = {ecg}, Ph (T ) = {adh, eh}.

The selection S = {bdg, ecg, eh} is noncovering since neither {a, d, g} nor {a, e, g}
is contained in {b, c, d, e, g, h}; hence, by Theorem 10.4, T is a dual subimplicant.
(ii) Now let T = {a, b, g}. We have

P0 (T ) = {eh}, Pa (T ) = {adh}, Pb (T ) = {bdh}, Pg (T ) = {ecg}.

There is only one possible selection S = {adh, bdh, ecg} and S is a covering
selection since {e, h} ⊆ {a, b, c, d, e, g, h}. Hence, by Remark 10.2, T
is not a dual subimplicant.
It can be verified that, in case (i), T = {b, c, h} is contained in the dual prime
implicant abch, and that, in case (ii), to extend T = {a, b, g} to a dual implicant,
it would be necessary to add either e or h; however, neither abeg nor abgh is
prime (since abe, bgh ∈ D); see Exercise 5 at the end of the chapter.
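
The test of Theorem 10.4 can be sketched directly: build P0 (T ) and the sets Px (T ), then search for a noncovering selection. The code below is a minimal illustration, assuming prime implicants are encoded as strings of one-letter variables; the function name is hypothetical, and the search enumerates all selections, so its running time grows with the product of the sizes of the sets Px (T ).

```python
from itertools import product


def is_dual_subimplicant(primes, T):
    """True if some selection with respect to T is noncovering (Theorem 10.4)."""
    primes = [frozenset(p) for p in primes]
    T = frozenset(T)
    # P0(T): prime implicants disjoint from T.
    p0 = [p for p in primes if not (p & T)]
    # Px(T): prime implicants meeting T exactly in {x}, one list per x in T.
    px = [[p for p in primes if (p & T) == frozenset({x})] for x in sorted(T)]
    # If some Px(T) is empty, product() yields no selection at all.
    for selection in product(*px):
        union = frozenset().union(*selection)
        if not any(p <= union for p in p0):
            return True  # this selection is noncovering
    return False


f = ["adg", "adh", "bdg", "bdh", "eag", "ebg", "ecg", "eh"]
print(is_dual_subimplicant(f, "bch"))  # case (i) of Example 10.2
print(is_dual_subimplicant(f, "abg"))  # case (ii) of Example 10.2
```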
The problem of recognizing whether a given subset T is a dual subimplicant of
a positive function f given by its complete DNF was shown to be NP-complete by
Boros, Gurvich, and Hammer [121]. However, they point out that Theorem 10.4
can be applied in a straightforward manner to answer this recognition problem
in O(n|f |^{1+min{|T |,|P0 (T )|}} ) time, where |f | denotes the number of literals in the
complete DNF of f . This becomes feasible for very small and very large values
of |T |, such as 2, 3, n − 2, n − 1. Specifically, by applying this for every pair
T = {xi , xj }, 1 ≤ i < j ≤ n, we obtain the following:
Theorem 10.5. The co-occurrence graph G(f d ) of the dual of a positive Boolean
function f can be determined in polynomial time, when f is given by its complete
DNF. The complexity of determining all the edges of G(f d ) is at most O(n^3 |f |^3 ).
Proof. Consider a given pair T = {xi , xj }. We observe the following:
(1) If either Pxi or Pxj is empty, then there is no possible selection (covering or
noncovering). Hence, Theorem 10.4 implies that xi and xj are not contained
together in a dual prime implicant and, therefore, are not adjacent in G(f d ).
(2) If both Pxi and Pxj are nonempty, but P0 is empty, then there is a selection
and every selection will be noncovering. Hence, Theorem 10.4 implies that
{xi , xj } is a dual subimplicant, and so xi and xj are adjacent in G(f d ).
(3) If all three sets P0 , Pxi and Pxj are nonempty, then we may have to check all
possible O(|f |^2 ) selections before knowing whether there is a noncovering
selection.
We leave a detailed complexity analysis as an exercise for the reader. 

Example 10.3. Let us calculate G(f d ) for the function f = abc ∨ bde ∨ ceg, as
illustrated in Figure 10.4.
The pair (a, b) is not an edge: Indeed, we have in this case Pa = ∅, so a and
b are not adjacent in G(f d ). Similarly, (a, c), (b, d), (c, g), (d, e), (e, g) are also
nonedges.
The pair (b, c) is an edge: In this case, both Pb and Pc are nonempty, but P0 is
empty, so b and c are adjacent in G(f d ). Similarly, (b, e), (c, e) are also edges.

Figure 10.4. The co-occurrence graphs of f and f d in Example 10.3.



The pair (a, e) is an edge: In this case, as in the previous one, both Pa ≠ ∅ and
Pe ≠ ∅, but P0 = ∅, so a and e are adjacent in G(f d ). Similarly, (b, g), (c, d) are
also edges.
The pair (a, d) is an edge: In this case, Pa = {abc}, Pd = {bde}, P0 = {ceg}.
Since {c, e, g} ⊈ {a, b, c, d, e}, we conclude that a and d are adjacent in G(f d ).
Similarly, (a, g), (d, g) are also edges.
Notice what happens if we add an additional prime implicant bce to the function
f in this example. Consider the function f′ = abc ∨ bde ∨ ceg ∨ bce. Then ad is
not a dual subimplicant of f′ although it was of f . Indeed, there is still only one
selection {abc, bde}, but now it is covering, since its union {a, b, c, d, e} contains
bce. By symmetry, neither ag nor dg is a dual subimplicant of f′ .
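
The pair-by-pair procedure behind Theorem 10.5 can be sketched as follows. This is an illustrative sketch rather than a tuned implementation: prime implicants are assumed to be encoded as strings of one-letter variables, the function names are hypothetical, and no attempt is made to match the stated complexity bound.

```python
from itertools import combinations, product


def has_noncovering_selection(primes, T):
    """Theorem 10.4 test for a subset T of the variables."""
    T = frozenset(T)
    p0 = [p for p in primes if not (p & T)]
    px = [[p for p in primes if (p & T) == frozenset({x})] for x in sorted(T)]
    for selection in product(*px):  # empty if some Px(T) is empty
        union = frozenset().union(*selection)
        if not any(p <= union for p in p0):
            return True
    return False


def dual_cooccurrence_edges(primes):
    """Edges of G(f^d): the pairs {xi, xj} that are dual subimplicants of f."""
    primes = [frozenset(p) for p in primes]
    variables = sorted(frozenset().union(*primes))
    return {frozenset(pair) for pair in combinations(variables, 2)
            if has_noncovering_selection(primes, pair)}


f = ["abc", "bde", "ceg"]
print(sorted("".join(sorted(e)) for e in dual_cooccurrence_edges(f)))

# Adding the prime implicant bce destroys the edges ad, ag, and dg.
f_prime = f + ["bce"]
print(sorted("".join(sorted(e)) for e in dual_cooccurrence_edges(f_prime)))
```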

10.3 Characterizing read-once functions


In this section, we present the mathematical theory underlying read-once functions
due to Gurvich [422, 425, 426] and rediscovered by several other authors; see
[293, 294, 548, 696]. The algorithmic aspects of recognizing and factoring read-
once functions will be presented in Section 10.5.
Recall from Section 10.1 that a read-once expression is a Boolean expression
in which every variable appears exactly once. A read-once Boolean function is a
function that can be transformed (i.e., factored) into a read-once expression over
the operations of conjunction and disjunction. We have also assumed read-once
functions to be positive.
A positive Boolean expression, over the operations of conjunction and disjunc-
tion, may be represented as a (rooted) parse tree whose leaves are labeled by the
variables {x1 , x2 , . . . , xn }, and whose internal nodes are labeled by the Boolean
operations ∧ and ∨. The parse tree represents the computation of the associated
Boolean function according to the given expression, and each internal node is the
root of a subtree corresponding to a part of the expression; see Figure 10.5. (A parse
tree is a special type of combinational circuit, as introduced in Section 1.13.2.) If
the expression is read-once, then each variable appears on exactly one leaf of the
tree, and there is a unique path from the root to the variable.
We begin by presenting a very useful lemma relating a read-once expression to
the co-occurrence graph of the function. It also shows that the read-once expression
is unique for a read-once function (Exercise 9).
Lemma 10.1. Let T be the parse tree of a read-once expression for a positive
Boolean function f over the variables x1 , x2 , . . . , xn . Then (xi , xj ) is an edge in
G(f ) if and only if the lowest common ancestor of xi and xj in the tree T is labeled
∧ (conjunction).
Proof. Since T is a tree, there is a unique path from the leaf labeled xi to the root.
Thus, for a pair (xi , xj ), there is a unique lowest common ancestor v of xi and xj .
The lemma is trivial if there is only one variable. Let us assume that the lemma is
true for all functions with fewer than n variables, and prove the result by induction.
Figure 10.5. The parse tree of the expression x2 (x1 ∨ x3 ) ∨ x4 (x3 ∨ x5 ) ∨ x5 x1 .

Let u1 , . . . , ur be the children of the root of T , and for k = 1, . . . , r, let Tk be the
subexpression (subtree) rooted at uk , denoting its corresponding function by fk .
Note that the variables at the leaves of Tk are disjoint from the leaves of Tl for
k ≠ l, since the expression is read-once.
If the root of T is labeled ∨, then f = f1 ∨ · · · ∨ fr and the graph G(f ) will be
the disjoint union of the graphs G(fk ) (k = 1, . . . , r), since multiplying out each of
the expressions Tk will yield disjoint prime implicants of f . Thus, xi and xj are
adjacent in G(f ) if and only if they are in the same Tk and adjacent in G(fk ) and,
by induction, if and only if their lowest common ancestor (in Tk and hence in T )
is labeled ∧ (conjunction).
If the root is labeled ∧, then f = f1 ∧ · · · ∧ fr and the graph G(f ) will be the
join of the graphs G(fk ), (k = 1, . . . , r). That is, every vertex of the subgraph G(fk )
is adjacent to every vertex of the subgraph G(fl ) for k ≠ l, since multiplying out
each expression Tk and then expanding the entire expression T will put every pair
of variables from different subtrees into some (perhaps many) prime implicants.
Therefore, if xi and xj are on leaves of different subtrees, then they are connected
in G(f ), and their lowest common ancestor is the root of T that is labeled ∧. If
xi and xj are on leaves of the same subtree, then again by induction, (xi , xj ) is an
edge in G(fk ) if and only if the lowest common ancestor of xi and xj is labeled ∧
(conjunction). 
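
Lemma 10.1 can be checked on a concrete read-once expression. In the sketch below, a parse tree is encoded (as an assumption made for this illustration) either as a variable name or as a tuple whose first entry is "and" or "or" and whose remaining entries are subtrees. One function collects the pairs whose lowest common ancestor is labeled ∧; the other multiplies out the expression, so that the two edge sets can be compared.

```python
from itertools import combinations


def leaves(tree):
    """Variables at the leaves of a parse tree."""
    if isinstance(tree, str):
        return [tree]
    return [v for child in tree[1:] for v in leaves(child)]


def edges_by_lca(tree):
    """Pairs of variables whose lowest common ancestor is an 'and' node."""
    if isinstance(tree, str):
        return set()
    op, children = tree[0], tree[1:]
    edges = set()
    for child in children:
        edges |= edges_by_lca(child)
    if op == "and":
        # Leaves in different subtrees of an 'and' node have it as their LCA.
        for left, right in combinations(children, 2):
            edges |= {frozenset((u, v))
                      for u in leaves(left) for v in leaves(right)}
    return edges


def multiply_out(tree):
    """Terms obtained by expanding a read-once tree into a DNF."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    op, children = tree[0], tree[1:]
    parts = [multiply_out(c) for c in children]
    if op == "or":
        return set().union(*parts)
    result = {frozenset()}
    for part in parts:
        result = {p | q for p in result for q in part}
    return result


# f0 = y(a ∨ cx) ∨ b(w ∨ z)
tree = ("or",
        ("and", "y", ("or", "a", ("and", "c", "x"))),
        ("and", "b", ("or", "w", "z")))

terms = multiply_out(tree)
cooccurrence = {frozenset(pair)
                for t in terms for pair in combinations(sorted(t), 2)}
print(edges_by_lca(tree) == cooccurrence)
```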

We are now ready to present and prove the main characterization theorem of
read-once functions. We describe briefly what will be shown in our Theorem 10.6.
We saw in Theorem 10.2 that for any positive Boolean function f , every prime
implicant P of f and every prime implicant D of its dual f d must have at least
one variable in common. This property is strengthened in the case of read-once
functions, by condition (iv) in Theorem 10.6, which claims that f is read-once if
and only if this common variable is unique. Moreover, this condition immediately
implies (by definition) that the co-occurrence graphs G(f ) and G(f d ) have no
edges in common; otherwise, a pair of variables adjacent in both graphs would
be contained in some prime implicant and in some dual prime implicant. This
is condition (iii) of our theorem and already implies that recognizing read-once
functions has polynomial-time complexity (by Theorem 10.5).
Condition (ii) is a further strengthening of condition (iii). It says that in addition
to being edge-disjoint, the graphs are complementary, that is, every pair of variables
appear together either in some prime implicant or in some dual prime implicant,
but not both.
The remaining condition (v) characterizing read-once functions is the one
mentioned as Theorem 10.1 at the beginning of this chapter, namely, that the
co-occurrence graph G(f ) is P4 -free and the maximal cliques of G(f ) are pre-
cisely the prime implicants of f (normality). It is condition (v) that will be used
in Section 10.5 to obtain an efficient O(n|f |) recognition algorithm for read-once
functions.
Example 10.4. The function

f4 = x1 x2 ∨ x2 x3 ∨ x3 x4 ∨ x4 x5 ∨ x5 x1 ,

whose co-occurrence graph G(f4 ) is the chordless 5-cycle C5 , is normal but G(f4 )
is not P4 -free. Hence, f4 is not a read-once function. Its dual

f4d = x1 x2 x4 ∨ x2 x3 x5 ∨ x3 x4 x1 ∨ x4 x5 x2 ∨ x5 x1 x3 ,

whose co-occurrence graph G(f4d ) is the clique (complete graph) K5 , which is
P4 -free, is not a normal function.

Theorem 10.6. Let f be a positive Boolean function over the variable set
{x1 , x2 , . . . , xn }. Then the following conditions are equivalent:
(i) f is a read-once function.
(ii) The co-occurrence graphs G(f ) and G(f d ) are complementary, that is,
G(f d ) is the complement graph of G(f ).
(iii) The co-occurrence graphs G(f ) and G(f d ) have no edges in common, that
is, E(G(f )) ∩ E(G(f d )) = ∅.
(iv) For all P ∈ P and D ∈ D, we have |P ∩ D| = 1.
(v) The co-occurrence graph G(f ) is P4 -free and f is normal.
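
Conditions (ii), (iii), and (iv) are easy to test on small examples once the dual prime implicants are available. The sketch below computes them by brute-force enumeration of minimal transversals (Theorem 10.2(iii)), which is exponential and meant only as an illustration; the function names and the encoding of terms as iterables of variable labels are assumptions.

```python
from itertools import combinations


def minimal_transversals(family):
    """Brute-force minimal hitting sets, scanned by increasing size."""
    family = [frozenset(s) for s in family]
    ground = sorted(set().union(*family))
    found = []
    for size in range(1, len(ground) + 1):
        for cand in combinations(ground, size):
            c = frozenset(cand)
            if not any(t <= c for t in found) and all(c & s for s in family):
                found.append(c)
    return set(found)


def edges_of(family):
    """Co-occurrence edges of a family of terms."""
    return {frozenset(e) for s in family for e in combinations(sorted(s), 2)}


def check_conditions(primes):
    """Return the truth values of conditions (ii), (iii), (iv) of Theorem 10.6."""
    P = {frozenset(p) for p in primes}
    D = minimal_transversals(P)
    variables = sorted(set().union(*P))
    all_pairs = {frozenset(e) for e in combinations(variables, 2)}
    cond_iii = not (edges_of(P) & edges_of(D))
    cond_ii = cond_iii and (edges_of(P) | edges_of(D)) == all_pairs
    cond_iv = all(len(p & d) == 1 for p in P for d in D)
    return cond_ii, cond_iii, cond_iv


f0 = ["ay", "cxy", "bw", "bz"]                    # read-once
f4 = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]     # Example 10.4
print(check_conditions(f0))
print(check_conditions(f4))
```

Since the conditions are equivalent, the three values agree for every positive function; they are all true for f0 and all false for f4.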
Proof. (i) =⇒ (ii): Assume that f is a read-once function, and let T be the parse
tree of a read-once expression for f . By interchanging the operations ∨ and ∧, we
obtain the parse tree T d of a read-once expression for the dual f d . By Lemma 10.1,
(xi , xj ) is an edge in G(f ) if and only if the lowest common ancestor of xi and
xj in the tree T is labeled ∧ (conjunction). Similarly, (xi , xj ) is an edge in G(f d )
if and only if the lowest common ancestor of xi and xj in the tree T d is labeled
∧ (conjunction). It follows from the foregoing construction that G(f ) and G(f d )
are complementary.
(ii) =⇒ (iii): Trivial.


(iii) ⇐⇒ (iv): As noted in the discussion above, by definition, the co-occurrence
graphs G(f ) and G(f d ) have no edges in common if and only if |P ∩ D| ≤ 1 for
every prime implicant P of f and every prime implicant D of its dual f d . However,
for any positive Boolean function, we have |P ∩D| ≥ 1 by Theorem 10.2(v), which
proves the equivalence.
(iv) =⇒ (v): We first prove that the function f is normal (Claim 1), and then
that the graph G(f ) is P4 -free (Claim 3). We may assume both conditions (iii) and
(iv) since we have already shown that they are equivalent.
Claim 1. The function f is normal, that is, every clique of G(f ) is contained
in a prime implicant of f .
The claim is certainly true for any clique of size one, since we assume that all
variables are essential, and it is true for any clique of size two, by the definition
of the co-occurrence graph G(f ). Let us consider the smallest value k (k ≥ 3)
for which the claim fails, that is, there exists a clique K = {x1 , . . . , xk } of G(f )
that is not a subimplicant of f . We denote the subcliques of K of size k − 1 by
Ki = K − {xi }, i = 1, . . . , k.
By our assumption of k being smallest possible, each set Ki is a subimplicant
of f , so each is contained, respectively, in a prime implicant Pi ∈ P, which we
can express in the form
Pi = Ki ∪ Ai ,
where K ∩ Ai = ∅, since K is not a subimplicant.
In addition, each variable xi ∈ K is contained in a dual prime implicant Di ∈ D,
which we can express in the form
Di = {xi } ∪ Bi ,
where K ∩ Bi = ∅, by our assumption (iv). Applying (iv) further, we note that
|Pi ∩ Dj | = |(Ki ∪ Ai ) ∩ ({xj } ∪ Bj )| = 1
for all i, j . In the case of i ≠ j , since xj ∈ Ki , this implies
Ai ∩ Bj = ∅ (∀ i ≠ j ). (10.1)
In the case of i = j , we obtain
|Ai ∩ Bi | = 1,
since the common variable cannot belong to K. This enables us to define
yi = Ai ∩ Bi (i = 1, . . . , k). (10.2)
Moreover, yi ≠ yj for i ≠ j by (10.1).
We now apply Theorem 10.4 (the dual subimplicant theorem). Consider a pair
T = {xi , xj } (1 ≤ i < j ≤ k). Since (xi , xj ) is an edge of G(f ), by assumption (iii),
it is not an edge of G(f d ) and, hence, not a dual subimplicant. By Theorem 10.4,
this implies that every selection S with respect to T must be a covering selection.
Now, S = {Pi , Pj } is a selection for T = {xi , xj } since Pi ∩ {xi , xj } = {xj }
and Pj ∩ {xi , xj } = {xi }. Therefore, there exists a prime implicant P0 such that
P0 ∩ {xi , xj } = ∅ and P0 ⊆ Pi ∪ Pj . Thus, P0 ⊆ (K \ {xi , xj }) ∪ Ai ∪ Aj .
Since 1 = |P0 ∩ Di | = |P0 ∩ Bi |, it follows from (10.2) that yi ∈ P0 . Similarly,
1 = |P0 ∩ Dj | = |P0 ∩ Bj |, so yj ∈ P0 . Thus, (yi , yj ) is an edge in G(f ). In fact,
since i and j were chosen arbitrarily, the set Y = {y1 , . . . , yk } is a clique in G(f ).
Now, we apply Theorem 10.4 to the dual function f d , as suggested in
Remark 10.3. Since the clique K is not a subimplicant of f , every dual selection S
with respect to K must be a covering dual selection. In particular, S = {D1 , . . . , Dk }
is such a selection since Di = {xi } ∪ Bi intersects K only in xi . Therefore,
there exists a dual prime implicant D0 satisfying D0 ∩ K = ∅ and
D0 ⊆ ⋃_{xi∈K} ({xi } ∪ Bi ), or

D0 ⊆ ⋃_{xi∈K} Bi . (10.3)

For each i, we have 1 = |D0 ∩ Pi | = |D0 ∩ (Ki ∪ Ai )|. It therefore follows
from (10.1), (10.2), and (10.3) that D0 ∩ Pi = {yi }. Moreover, since i was chosen
arbitrarily, Y = {y1 , . . . , yk } ⊆ D0 , implying that Y is a clique in G(f d ). This is
a contradiction to (iii), since Y cannot be both a clique in G(f ) and a clique in
G(f d ). This proves Claim 1.
Claim 2. If (x1 , x2 ), (x2 , x3 ) ∈ E(G(f )) and (x1 , x3 ) ∉ E(G(f )), then (x1 , x3 ) ∈
E(G(f d )).
Suppose that (x1 , x3 ) is not an edge of G(f d ). Choose prime implicants

{x1 , x2 } ∪ A12 , {x2 , x3 } ∪ A23 ∈ P

and dual prime implicants

{x1 } ∪ B1 , {x3 } ∪ B3 ∈ D.

By our assumptions,

{x1 , x2 , x3 } ∩ (A12 ∪ A23 ∪ B1 ∪ B3 ) = ∅.

By condition (iv), we have

|({x1 , x2 } ∪ A12 ) ∩ ({x1 } ∪ B1 )| = 1 =⇒ |A12 ∩ B1 | = 0 (10.4)


|({x2 , x3 } ∪ A23 ) ∩ ({x3 } ∪ B3 )| = 1 =⇒ |A23 ∩ B3 | = 0 (10.5)

and

|({x1 , x2 } ∪ A12 ) ∩ ({x3 } ∪ B3 )| = 1 =⇒ |A12 ∩ B3 | = 1 (10.6)


|({x2 , x3 } ∪ A23 ) ∩ ({x1 } ∪ B1 )| = 1 =⇒ |A23 ∩ B1 | = 1. (10.7)

From (10.6) and (10.7), we can define

y1 = A12 ∩ B3

y3 = A23 ∩ B1
and from (10.4) and (10.5),

y1 ≠ y3 .
On the one hand, because we have assumed that {x1 , x3 } is not a subimplicant
of the dual f d , by Theorem 10.4, we claim that every selection with respect to
{x1 , x3 } is covering. Now,
S = {{x1 , x2 } ∪ A12 , {x2 , x3 } ∪ A23 }
is such a selection, so there exists a prime implicant
P0 ⊆ {x2 } ∪ A12 ∪ A23 .
By condition (iv), (10.4), and (10.5), we have
|P0 ∩ ({x1 } ∪ B1 )| = 1 =⇒ P0 ∩ ({x1 } ∪ B1 ) = y3
and
|P0 ∩ ({x3 } ∪ B3 )| = 1 =⇒ P0 ∩ ({x3 } ∪ B3 ) = y1 .
Hence, {y1 , y3 } ⊆ P0 and (y1 , y3 ) is an edge of G(f ), that is,
(y1 , y3 ) ∈ E(G(f )). (10.8)
On the other hand, since we have also assumed that {x1 , x3 } is not a subimplicant
of the original function f , we again apply Theorem 10.4, this time in its dual form,
by claiming that every dual selection with respect to {x1 , x3 } is covering. Now,
S = {{x1 } ∪ B1 , {x3 } ∪ B3 }
is such a dual selection, so there exists a dual prime implicant
D0 ⊆ B1 ∪ B3 .
By condition (iv), we have
|D0 ∩ ({x2 , x3 } ∪ A23 )| = 1 =⇒ D0 ∩ ({x2 , x3 } ∪ A23 ) = y3
and
|D0 ∩ ({x1 , x2 } ∪ A12 )| = 1 =⇒ D0 ∩ ({x1 , x2 } ∪ A12 ) = y1 .
Hence, {y1 , y3 } ⊆ D0 and (y1 , y3 ) is an edge of G(f d ), that is,
(y1 , y3 ) ∈ E(G(f d )). (10.9)
Finally, combining the conclusions of (10.8) and (10.9), we have a contradiction,
since G(f ) and G(f d ) cannot share a common edge. This proves Claim 2.
Claim 3. The graph G(f ) is P4 -free.
Suppose G(f ) has a copy of P4 with edges (x1 , x2 ), (x2 , x3 ),(x3 , x4 ) and
nonedges (x2 , x4 ),(x4 , x1 ),(x1 , x3 ). By Claim 2, we have (x1 , x3 ), (x2 , x4 ) are edges
in G(f d ). Choose prime implicants
{x1 , x2 } ∪ A12 , {x3 , x4 } ∪ A34 ∈ P

and dual prime implicants

{x1 , x3 } ∪ B13 , {x2 , x4 } ∪ B24 ∈ D.

By repeatedly using condition (iv), it is simple to verify that the sets

{x1 , x2 , x3 , x4 }, A12 ∪ A34 , B13 ∪ B24 (10.10)

are pairwise disjoint.


Since {x1 , x4 } is not a subimplicant of f , Theorem 10.4 implies that the dual
selection
S = {{x1 , x3 } ∪ B13 , {x2 , x4 } ∪ B24 }
with respect to {x1 , x4 } must be covering. So there exists a dual prime implicant
D0 ∈ D satisfying D0 ⊆ S , where

S = ({x1 , x3 } ∪ B13 ) ∪ ({x2 , x4 } ∪ B24 )

and x1 , x4 ∉ D0 . By the pairwise disjointness of the sets in (10.10), we have

S ∩ ({x1 , x2 } ∪ A12 ) = {x1 , x2 },

so
D0 ∩ ({x1 , x2 } ∪ A12 ) = {x2 }.
Hence, x2 ∈ D0 .
In a similar manner, we can show that

D0 ∩ ({x3 , x4 } ∪ A34 ) = {x3 }.

Hence, x3 ∈ D0 .
Thus, we have shown x2 , x3 ∈ D0 , implying that (x2 , x3 ) is an edge of G(f d ),
a contradiction to condition (iii). This proves Claim 3.
(v) =⇒ (i): Let us assume that f is normal and that G = G(f ) is P4 -free. We
will show how to construct a read-once formula for f recursively. In order to prove
this implication, we will use the following property of P4 -free graphs (cographs)
which we will prove in Section 10.4, Theorem 10.7.
Claim 4. If a graph G is P4 -free, then its complement Ḡ is also P4 -free;
moreover, if G has more than one vertex, precisely one of G and Ḡ is connected.
The function is trivially read-once if there is only one variable. Assume that the
implication (v) ⇒ (i) is true for all functions with fewer than n variables.
By Claim 4, one of G or Ḡ is disconnected. Suppose G is disconnected, with
connected components G1 , . . . , Gr partitioning the variables of f into r disjoint
sets. Then the prime implicants of f are similarly partitioned into r collections
Pi , (i = 1, . . . , r), defining positive functions f1 , . . . , fr , respectively, where Gi =
G(fi ) and f = f1 ∨ · · · ∨ fr . Clearly, G(fi ) is P4 -free because it is an induced
subgraph of G(f ), and each fi is normal for the same reason. Therefore, by
induction, there is a read-once expression Fi for each i, and combining these, we
obtain a read-once expression for f given by F = F1 ∨ · · · ∨ Fr .
Now suppose that Ḡ is disconnected, and let H1 , . . . , Hr be the connected components
of Ḡ, again partitioning the variables into r disjoint sets. Define Gi = H̄i .
We observe that every vertex xi of Gi is adjacent to every vertex xj of Gj for
i ≠ j , so each maximal clique of G(f ) consists of a union of maximal cliques of
G1 , . . . , Gr . Moreover, since f is normal, the maximal cliques are precisely the
prime implicants. It now follows that by restricting f to the variables of Gi , we
obtain a normal function fi whose co-occurrence graph G(fi ) = Gi is P4 -free,
and f = f1 ∧ · · · ∧ fr . Therefore, by induction, as before, there is a read-once
expression Fi for each i, and combining these, we obtain a read-once expression
for f given by F = F1 ∧ · · · ∧ Fr . 

Example 10.5. Let us again consider the function

f0 = ay ∨ cxy ∨ bw ∨ bz,

whose co-occurrence graph G(f0 ) was shown in Figure 10.1. Clearly, f0 is normal,
and G(f0 ) is P4 -free and has two connected components G1 = G{a,c,x,y} and G2 =
G{b,w,z} . Using the arguments presented after Claim 4 above, we can handle these
components separately, finding a read-once expression for each and taking their
disjunction.
For G1 , we note that its complement Ḡ1 is disconnected with two components,
namely, an isolated vertex H1 = {y} and H2 = Ḡ1{a,c,x} having two edges; we can
handle the components separately and take their conjunction. The complement
H̄2 has an isolate {a} and edge (c, x) which we combine with disjunction. Finally,
complementing (c, x) gives two isolates which are combined with conjunction.
Therefore, the read-once expression representing G1 will be y ∧ (a ∨ [c ∧ x]).
For G2 , we observe that its complement Ḡ2 has an isolate {b} and edge (w, z),
which we combine with conjunction, giving b ∧ (w ∨ z). So the read-once expression
for f0 is
f0 = [y ∧ (a ∨ [c ∧ x])] ∨ [b ∧ (w ∨ z)].
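The recursive construction used in this example can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the input is assumed to be the list of prime implicants of a normal function, normality itself is not tested, and the ASCII operators `|` and `&` stand for ∨ and ∧ so that the result can be evaluated directly.

```python
from itertools import combinations, product

def cooccurrence_graph(primes):
    """Adjacency sets: two variables are adjacent iff some prime implicant contains both."""
    adj = {v: set() for p in primes for v in p}
    for p in primes:
        for u, v in combinations(sorted(p), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def components(vertices, adj, complement=False):
    """Connected components of the induced subgraph, or of its complement."""
    vertices = set(vertices)
    remaining, comps = set(vertices), []
    while remaining:
        start = remaining.pop()
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            nbrs = (vertices - adj[u] - {u}) if complement else (adj[u] & vertices)
            for w in nbrs & remaining:
                remaining.discard(w)
                comp.add(w)
                stack.append(w)
        comps.append(comp)
    return comps

def read_once_expression(vertices, adj):
    """Nested expression string, or None if the decomposition gets stuck."""
    vertices = set(vertices)
    if len(vertices) == 1:
        return next(iter(vertices))
    # Disconnected graph -> disjunction; disconnected complement -> conjunction.
    for complement, op in ((False, " | "), (True, " & ")):
        comps = components(vertices, adj, complement)
        if len(comps) > 1:
            parts = [read_once_expression(c, adj) for c in sorted(comps, key=min)]
            return None if None in parts else "(" + op.join(parts) + ")"
    return None  # both the subgraph and its complement are connected

def agrees_with_dnf(expr, primes):
    """Compare the expression with the DNF on every 0/1 point (exponential check)."""
    names = sorted({v for p in primes for v in p})
    for bits in product((0, 1), repeat=len(names)):
        env = dict(zip(names, bits))
        dnf = int(any(all(env[v] for v in p) for p in primes))
        if eval(expr, {}, env) != dnf:
            return False
    return True

f0 = [{"a", "y"}, {"c", "x", "y"}, {"b", "w"}, {"b", "z"}]
adj = cooccurrence_graph(f0)
expr = read_once_expression(adj, adj)
```

The exhaustive comparison in `agrees_with_dnf` is there only to sanity-check small examples; it plays no role in the construction itself.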


10.4 The properties of P4 -free graphs and cographs


The recursive construction of a read-once expression, which we just saw illustrated
at the end of the last section in Example 10.5, was based on the special properties
of P4 -free graphs and, in particular, the use of Claim 4. We present these structural
and algorithmic properties in this section.
The complement reducible graphs, or cographs, can be defined recursively as
follows:
(1) A single vertex is a cograph.
(2) The union of disjoint cographs is a cograph.
(3) The join of disjoint cographs is a cograph,
where the join of disjoint graphs G1 , . . . , Gk is the graph G with V (G) = V (G1 ) ∪
· · · ∪ V (Gk ) and E(G) = E(G1 ) ∪ · · · ∪ E(Gk ) ∪ {(x, y) | x ∈ V (Gi ), y ∈ V (Gj ),
for all i ≠ j }. An equivalent definition can be obtained by substituting for (3) the
rule
(3′) the complement of a cograph is a cograph;
see Exercise 15 at the end of the chapter.
The building of a cograph G from these rules can be represented by a rooted
tree T that records its construction, where
(a) the leaves of T are labeled by the vertices of G;
(b) if G is formed from the disjoint cographs G1 , . . . , Gk (k > 1), then the root
r of T has as its children the roots of the trees of G1 , . . . , Gk ; moreover,
(c) the root r is labeled 0 if G is formed by the union rule (2), and labeled 1 if
G is formed by the join rule (3).
Among all such constructions, there is a canonical one whose tree T is called
the cotree and satisfies the additional property that
(d) on every path, the labels of the internal nodes alternate between 0 and 1.
Thus, the root of the cotree is labeled 1 if G is connected and labeled 0 if G is
disconnected; an internal node is labeled 0 if its parent is labeled 1, and vice versa.
A subtree Tu rooted at an internal node u represents the subgraph of G induced by
the labels of its leaves, and vertices x and y of G are adjacent in G if and only if
their least common ancestor in the cotree is labeled 1.
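The adjacency rule just stated is easy to try out on a small cotree. The sketch below assumes a hypothetical encoding in which a leaf is a string and an internal node is a pair (label, list of children); the cotree shown is the one we would expect for the graph G(f0) of Example 10.5, written out by hand.

```python
def leaves(node):
    """Set of leaf labels below a cotree node."""
    if isinstance(node, str):
        return {node}
    _, children = node
    return set().union(*(leaves(c) for c in children))

def adjacent(node, x, y):
    """True iff the distinct leaves x and y have a least common ancestor labeled 1."""
    label, children = node
    for child in children:
        below = leaves(child)
        if x in below and y in below:
            return adjacent(child, x, y)   # the common ancestor lies deeper
    return label == 1                       # this node is the least common ancestor

# Hand-written cotree for G(f0), f0 = ay ∨ cxy ∨ bw ∨ bz (labels alternate 0/1).
cotree = (0, [(1, ["y", (0, ["a", (1, ["c", "x"])])]),
              (1, ["b", (0, ["w", "z"])])])
```

Recomputing `leaves` at every level keeps the sketch short; an implementation concerned with running time would precompute ancestor information instead.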
Notice that the recursive application of rules (1)–(3) follows a bottom-up view-
point of the construction of G. An alternate top-down viewpoint can also be taken,
as a recursive decomposition of G, where we repeatedly partition the vertices
according to either the connected components of G (union) or the connected
components of its complement (join).
One can recognize whether a graph G is a cograph by repeatedly decomposing
it this way, until the decomposition either fails on some component H (both H and
H̄ are connected) or succeeds, reaching all the vertices. The cotree is thus built
top-down as the decomposition proceeds.2
The next theorem gives several characterizations of cographs.
Theorem 10.7. The following are equivalent for an undirected graph G:
(i) G is a cograph.
(ii) G is P4 -free.
(iii) For every subset X of vertices (|X| > 1), either the induced subgraph GX
is disconnected or its complement ḠX is disconnected.
2 This latter viewpoint is a particular case of modular decomposition [358] that applies to arbitrary
graphs, and any modular decomposition algorithm will produce a cotree when given a cograph,
although such general algorithms [430, 636] are more involved than is necessary for cograph
recognition.
In particular, any graph G for which both G and Ḡ are connected must contain
an induced P4 . This claim appears in Seinsche [817]; independently, it was one of
the problems on the 1971 Russian Mathematics Olympiad, and seven students gave
correct proofs; see [366]. The full version of the theorem was given independently
by Gurvich [422, 423, 425] and by Corneil, Lerchs, and Burlingham [213], where
further results on the theory of cographs were developed. Note that it is impossible
for both a graph G and its complement Ḡ to be disconnected; see Exercise 7.
It is rather straightforward to recognize cographs and build their cotree in O(n^3)
time. The first linear O(n + e) time algorithm for recognizing cographs appears in
Corneil, Perl, and Stewart [214]. Subsequently, other linear time algorithms have
appeared in [154, 155, 431]; a fully dynamic algorithm is given in [826], and a
parallel algorithm was proposed in [251].
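For experimenting with small graphs, condition (ii) of Theorem 10.7 can be checked by brute force. The sketch below is deliberately naive (it examines all ordered 4-tuples of vertices) and is not one of the linear-time algorithms cited above; the function name and the adjacency-dictionary encoding are assumptions made for illustration.

```python
from itertools import permutations

def find_induced_p4(adj):
    """Return (a, b, c, d) forming an induced path a-b-c-d, or None if the graph is P4-free.

    `adj` maps every vertex to the set of its neighbors.
    """
    for a, b, c, d in permutations(adj, 4):
        has_path = b in adj[a] and c in adj[b] and d in adj[c]
        no_chords = c not in adj[a] and d not in adj[a] and d not in adj[b]
        if has_path and no_chords:
            return a, b, c, d
    return None

path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
```

Here `path` is itself a P4, while the 4-cycle `cycle` contains no induced P4.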

Proof of Theorem 10.7. (iii) =⇒ (i): This implication follows immediately
from the top-down construction of the cotree, as we just discussed.
(i) =⇒ (ii): Let T be the cotree of G, and for vertex x ∈ V (G), let px denote
the path in T from the leaf labeled x to the root of the tree.
Suppose that G contains an induced P4 with edges (a, b), (b, c), (c, d). Since c
and d are adjacent in G, their least common ancestor in T is an internal node u
labeled 1. Consider the path pa . Since both pc and pd must meet pa in an internal
node labeled by a 0, it follows that (i) they meet pa in the same internal node, say
v, and (ii) v is an ancestor of u.
Let us consider pb . Now, pb meets pa in an internal node z labeled 1. If z is
above v, then the least common ancestor of b and d will be z, which is labeled 1,
contradicting the fact that b and d are nonadjacent in G. Furthermore, z ≠ v, since
they have opposite labels, which implies that z must lie below v on pa . However,
in this case, the least common ancestor of b and c will be v, which is labeled 0,
contradicting the fact that b and c are adjacent in G. This proves the implication.
(ii) =⇒ (iii): Assume that G is P4 -free, thus Ḡ is also P4 -free, since P4 is
self-complementary. Suppose that there is an induced subgraph H of G such that
both H and its complement H̄ are connected. Clearly, they are also P4 -free and
can contain neither an isolated vertex nor a universal vertex (one that is adjacent
to all other vertices).
We will construct an ordering a1 , a2 , . . . , an of V (H ) such that, for odd-indexed
vertices a2j −1 ,
(ai , a2j −1 ) ∈ E(H ), for all i < 2j − 1,
and, for even-indexed vertices a2j ,

(ai , a2j ) ∈ E(H̄ ), for all i < 2j .

In this case, an will either be an isolated vertex if n is even, or a universal vertex
if n is odd, a contradiction.
Choose a1 arbitrarily. Since a1 cannot be universal in H , there is a vertex a2
such that (a1 , a2 ) ∈ E(H̄ ). Since H is connected, there is a path in H from a1
to a2 . Consider the shortest such path. It consists of exactly two edges of H , say,
(a1 , a3 ), (a2 , a3 ) ∈ E(H ), since H is P4 -free.
By a complementary argument, since H̄ is connected and P4 -free, there is
a shortest path in H̄ from a2 to a3 consisting of exactly two edges of H̄ , say,
(a2 , a4 ), (a3 , a4 ) ∈ E(H̄ ). Now we argue that (a1 , a4 ) ∈ E(H̄ ), since otherwise, H
would have a P4 .
We continue constructing the ordering in the same manner. Assume we have
a1 , a2 , . . . , a2j ; we will find the next vertices in the ordering.
(Find a2j +1 ). There is a shortest path in H from a2j −1 to a2j consisting of
exactly two edges of H , say, (a2j −1 , a2j +1 ), (a2j , a2j +1 ) ∈ E(H ). Note that a2j +1
has not yet been seen in the ordering, since none of the ai is adjacent to a2j . We
argue, for all i < 2j − 1, that (ai , a2j +1 ) ∈ E(H ), since otherwise, H would have
a P4 on the vertices {ai , a2j −1 , a2j +1 , a2j }. Thus, we have enlarged our ordering by
one new vertex.
(Find a2j +2 ). There is a shortest path in H̄ from a2j to a2j +1 consisting of
exactly two edges of H̄ , say, (a2j , a2j +2 ), (a2j +1 , a2j +2 ) ∈ E(H̄ ). Now we argue,
for all i < 2j , that (ai , a2j +2 ) ∈ E(H̄ ), since otherwise, H̄ would have a P4 on
the vertices {ai , a2j , a2j +2 , a2j +1 }. Thus, we have enlarged our ordering by another
new vertex.
Eventually, this process orders all vertices, and the last one an will be either
isolated or universal, giving the promised contradiction. 

10.5 Recognizing read-once functions


Given a Boolean function f , can we efficiently determine whether f is a read-once
function? This is known as the recognition problem for read-once functions, which
we define as follows:

Read-Once Recognition
Input: A representation of a positive Boolean function f by its list of prime impli-
cants, namely, its complete DNF expression.
Output: A read-once expression for f , or “failure” if there is none.

Chein [190] and Hayes [479] first introduced read-once functions and provided
an exponential-time recognition algorithm for the family. Peer and Pinter [734]
also gave an exponential-time factoring algorithm for read-once functions, whose
nonpolynomial complexity is due to the need for repeated calls to a routine that
converts a DNF representation to a CNF representation, or vice-versa. We have
already observed in Section 10.3 that combining Theorem 10.5 with condition (iii)
of Theorem 10.6 implies that recognizing read-once functions has polynomial-time
complexity, although without immediately providing the read-once expression.
In this section, we present the polynomial-time recognition algorithm due
to Golumbic, Mintz, and Rotics [400, 401, 402] and analyze its computational
complexity. The algorithm is described in Figure 10.6. It is based on condition
(v) of Theorem 10.6, that a function is read-once if and only if its co-occurrence
graph is P4 -free (namely, is a cograph) and the function is normal. That is, we first
test whether G(f ) is P4 -free and construct its cotree T , then we test whether f is
normal. Passing both tests assures that f is read-once. Moreover, T will provide
us with the read-once expression; see Remark 10.4.

Procedure GMR Read-Once Recognition(f )

Step 1: Build the co-occurrence graph G(f ).
Step 2: Test whether G(f ) is P4 -free. If so, construct the cotree T for G(f ). Otherwise, exit with
“failure.”
Step 3: Test whether f is a normal function, and if so, output T as the read-once expression.
Otherwise, exit with “failure.”

Figure 10.6. Procedure GMR Read-Once Recognition.

Remark 10.4. The reader has no doubt noticed that the cotree of a P4 -free graph
is very similar to the parse tree of a read-once expression. On the one hand, when
a function is read-once, its parse tree is identical to the cotree of its co-occurrence
graph: Just switch the labels {0, 1} to {∨, ∧}. On the other hand, a cotree always
generates a read-once expression that represents “some” Boolean function g. Thus,
the question to be asked is:
Given a function f , although G(f ) may be P4 -free and, thus, has a cotree T , will
the read-once function g represented by T be equal to f or not? (In other words,
G(g) = G(f ) and, by construction, the maximal cliques of G(g) are precisely the
prime implicants of g, so will these also be the prime implicants of f ?)
The function f = ab ∨ bc ∨ ac is a negative example; its graph is a triangle and
g = abc.
The answer to our question lies in testing normality, that is, comparing the prime
implicants of g with those of f , and doing it efficiently. 

The main result of this section is the following:

Theorem 10.8. [400, 401, 402] Given the complete DNF formula of a positive
Boolean function f on n variables, the GMR procedure solves the Read-Once
Recognition problem in time O(n|f |), where |f | denotes the length of the DNF
expression.

Proof. (Step 1.) The first step of the GMR procedure is building the graph G(f ).
If an arbitrary positive function f is given by its DNF expression, that is, as a list
of its prime implicants P = {P1 , . . . , Pm }, then the edge set of G(f ) can be found
in O(Σ_{i=1}^{m} |Pi |^2) time. It is easy to see that this is at most O(n|f |).
(Step 2.) As we saw in Section 10.4, the complexity of testing whether the graph
G(f ) is P4 -free and providing a read-once expression (its cotree T ) is O(n + e),
as first shown in [214]. This is at worst O(n^2) and is bounded by O(n|f |). (A
straightforward application of Theorem 10.7 would yield complexity O(n^3).)
(Step 3.) Finally, we show that the function f can be tested for normality
in O(n|f |) time by a novel method, due to [400] and described more fully in
[401, 402, 685]3 . As in Remark 10.4, we denote by g the function represented by
the cotree T ; we will verify that g = f .

Testing normality
We may assume that G = G(f ) has successfully been tested to be P4 -free, and
that T is its cotree. We construct the set of maximal cliques of G recursively, by
traversing the cotree T from bottom to top, according to Lemma 10.2 below. For
a node x of T , we denote by Tx the subtree of T rooted at x, and we denote by gx
the function represented by Tx . We note that Tx is also the cotree representing the
subgraph GX of G induced by the set X of labels of the leaves of Tx .
First, we introduce some notation. Let X1 , X2 , . . . , Xr be disjoint sets, and let Ci
be a set of subsets of Xi (1 ≤ i ≤ r). We define the Cartesian sum C = C1 ⊗ · · · ⊗ Cr ,
to be the set whose elements are unions of individual elements from the sets Ci
(one element from each set). In other words,

C = C1 ⊗ · · · ⊗ Cr = {C1 ∪ · · · ∪ Cr | Ci ∈ Ci , 1 ≤ i ≤ r}.

For a cotree T , let C(T ) denote the set of all maximal cliques in the cograph
corresponding to T . From the definitions of cotree and cograph, we obtain:

Lemma 10.2. Let G be a P4 -free graph and let T be the cotree of G. Let h be an
internal node of T and let h1 , . . . , hr be the children of h in T .
(1) If h is labeled with 0, then C(Th ) = C(Th1 ) ∪ · · · ∪ C(Thr ).
(2) If h is labeled with 1, then C(Th ) = C(Th1 ) ⊗ · · · ⊗ C(Thr ).
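Lemma 10.2 translates directly into a recursive computation. The sketch below assumes a hypothetical cotree encoding (a leaf is a string, an internal node is a pair of a 0/1 label and a list of children) and makes no attempt at the careful bookkeeping discussed next; it only illustrates the union rule and the Cartesian sum.

```python
from itertools import product

def maximal_cliques(node):
    """Set of maximal cliques (as frozensets) of the cograph represented by `node`."""
    if isinstance(node, str):
        return {frozenset([node])}
    label, children = node
    child_cliques = [maximal_cliques(c) for c in children]
    if label == 0:
        # Rule (1): a 0-node takes the union of its children's clique sets.
        return set().union(*child_cliques)
    # Rule (2): a 1-node takes the Cartesian sum, one clique from each child.
    return {frozenset().union(*choice) for choice in product(*child_cliques)}

triangle = (1, ["a", "b", "c"])
f0_cotree = (0, [(1, ["y", (0, ["a", (1, ["c", "x"])])]),
                 (1, ["b", (0, ["w", "z"])])])
```

For `triangle`, the only maximal clique is {a, b, c}, which matches the negative example f = ab ∨ bc ∨ ac of Remark 10.4: the clique abc is not a prime implicant of f.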

The following algorithm calculates, for each node x of the cotree, the set C(Tx )
of all the maximal cliques in the cograph defined by Tx . It proceeds bottom up,
using Lemma 10.2, and also keeps at each node x:
s(Tx ): The number of cliques in C(Tx ). This number is equal to the number of
prime implicants in gx .

L(Tx ): The total length of the list of cliques at Tx , namely, L(Tx ) = Σ{|C| :
C ∈ C(Tx )}, which represents the total length of the list of prime
implicants of gx .
A global variable L maintains the overall size of the clique lists as they are being
built. (In other words, L is the sum of all L(Tx ) taken over all x on the frontier as
we proceed bottom up.)

3 In [401] only a complexity bound of O(n^2 k) was claimed, where k is the number of prime implicants;
however, using an efficient data structure and careful analysis, it has been shown in [402], following
[685], that the method can be implemented in O(n|f |). For the general case of a positive Boolean
function given in DNF form, it is possible to check normality in O(n^3 k) time using the results of
[538]; see Exercise 13 at the end of the chapter.

Procedure Checking Normality(f )

Step 3a: Initialize k to be the number of terms (clauses) in the DNF representation of f . For every
leaf a of T , set C(Ta ) = {a} and set s(Ta ) = 1, L(Ta ) = 1, and L = n.
Step 3b: Scan T from bottom to top, at each internal node h reached, let h1 , . . . , hr be the children
of h and do:

(1) If h is labeled with 0:


• set s(Th ) = s(Th1 ) + · · · + s(Thr )
• if s(Th ) > k stop, and claim that f is not normal; otherwise,
• set L(Th ) = L(Th1 ) + · · · + L(Thr )
• L remains unchanged
• set C(Th ) = C(Th1 ) ∪ · · · ∪ C(Thr )
(2) If h is labeled with 1:
• set s(Th ) = s(Th1 ) × · · · × s(Thr )
• if s(Th ) > k stop, and claim that f is not normal; otherwise,
• set L(Th ) = Σ{|C1 | + · · · + |Cr | | (C1 , . . . , Cr ) ∈ C(Th1 ) × · · · × C(Thr )}
• set L ← L + L(Th ) − [L(Th1 ) + · · · + L(Thr )]
• if L > |f | stop, and claim that f is not normal; otherwise,
• set C(Th ) = C(Th1 ) ⊗ · · · ⊗ C(Thr )
Step 3c: Let y be the root of T , and let C(Ty ) be the set of maximal cliques of the cograph, obtained
by the preceding step.

• If s(Ty ) ≠ k or if L ≠ |f |, stop, and claim that f is not normal.


• Otherwise, compare the set C(Ty ) with the set of prime implicants (from the DNF) of f ,
using radix sort as described in the proof. If the sets are equal, claim that f is normal.
Otherwise, claim that f is not normal.

Figure 10.7. Procedure Checking Normality.

The steps of the normality-checking procedure are given in Figure 10.7. This
procedure correctly tests normality because it tests whether the maximal cliques
of the cograph are precisely the prime implicants of f .
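A simplified version of the procedure in Figure 10.7 might look as follows. This is a sketch under several assumptions: the cotree encoding (leaf = string, internal node = (label, children)) and the function name are hypothetical, the length test is applied to each node's own L(Th ) rather than to the global counter L, and the final comparison uses Python sets instead of radix sort. It therefore illustrates the early-exit idea without reproducing the O(n|f |) bound.

```python
from itertools import product
from math import prod

def is_normal(cotree, primes):
    """Check whether the maximal cliques encoded by `cotree` are exactly `primes`."""
    primes = {frozenset(p) for p in primes}
    k = len(primes)                         # number of prime implicants
    size_f = sum(len(p) for p in primes)    # |f|, total length of the DNF

    def cliques(node):
        """Return the clique list C(T_node), or None once a bound is exceeded."""
        if isinstance(node, str):
            return [frozenset([node])]
        label, children = node
        parts = []
        for child in children:
            part = cliques(child)
            if part is None:
                return None
            parts.append(part)
        if label == 0:
            if sum(len(part) for part in parts) > k:
                return None                 # too many cliques
            return [c for part in parts for c in part]
        s = prod(len(part) for part in parts)
        if s > k:
            return None                     # too many cliques
        # L(T_h) computed before the cliques are built: each clique of child i
        # occurs in s / s_i of the combined cliques.
        length = sum((s // len(part)) * sum(len(c) for c in part) for part in parts)
        if length > size_f:
            return None                     # clique list longer than |f|
        return [frozenset().union(*choice) for choice in product(*parts)]

    root = cliques(cotree)
    return root is not None and set(root) == primes

f0 = [{"a", "y"}, {"c", "x", "y"}, {"b", "w"}, {"b", "z"}]
f0_cotree = (0, [(1, ["y", (0, ["a", (1, ["c", "x"])])]),
                 (1, ["b", (0, ["w", "z"])])])
```

Both bounds are checked before the Cartesian sum is materialized, mirroring the order of the tests in Step 3b(2).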

Complexity analysis
The purpose of comparing s(Th ) with k at each step is simply a speedup mechanism
to assure that the number of cliques never exceeds the number of prime implicants.
Similarly, calculating L(Th ), that is, |gh | and comparing L with |f | at each step
assures that the overall length of the list of cliques will never exceed the sum of
the lengths of the prime implicants. (Note that we precompute L, and test against
|f | before we actually build a new set of cliques.)
For efficiency, we number the variables {x1 , x2 , . . . , xn }, and maintain both the
prime implicants and the cliques as lists of their variables. Then, each collection of
cliques C(Tx ) is maintained as a list of such lists. In this way, constructing C(Th ) in
Step 3b(1) can be done by concatenating the lists C(Th1 ), . . . , C(Thr ), and construct-
ing C(Th ) in Step 3b(2) can be done by creating a new list of cliques by repeatedly
taking r (sub)cliques, one from each set C(Th1 ), . . . , C(Thr ) and concatenating these
r (disjoint) lists of variables.

Thus, the overall calculation of C(Th ) takes at most O(|f |) time. Since the
number of internal nodes of the cotree is less than n, the complexity of Steps 3a
and 3b is O(n|f |).
It remains to compare the list of the prime implicants of f with the list of the
maximal cliques C(Ty ), where y is the root of T . This can be accomplished using
radix sort in O(nk) time. Initialize two k × n bit matrices P and C filled with zeros.
Each prime implicant Pi is traversed (it is a list of variables), and for every xj ∈ Pi
we assign Pi,j ← 1, thus, converting it into its characteristic vector, which will be
in row i of P. Similarly, we traverse each maximal clique Ci and convert it into
its characteristic vector, which will be in row i of C. It is now a straightforward
procedure to lexicographically sort the rows of these two matrices and compare
them in O(nk) time.
This concludes the proof, since the complexity of each step is bounded by
O(n|f |). 

Of course, the form in which a function f is given influences the computational
complexity of recognizing whether it is read-once. For example, if f is
initially represented by an arbitrary Boolean expression, we are required to pay a
preprocessing expense to test that f is positive and to transform f into its DNF
expression in order to apply the GMR procedure. The same would be true if f
were to be given as a BDD. This preprocessing could be exponential in the size of
the original input.
Actually, for a general (nonmonotone) DNF expression ψ, Theorem 1.30
(Section 1.11) implies that it is NP-hard to decide whether ψ represents a read-
once function, and Aizenstein et al. [13] proved that this decision problem is in
co-NP, but the question remains open for BDDs.
As for positive expressions (other than DNFs), the problem is co-NP-complete.
More precisely, we are now going to show that it is co-NP-complete to decide
whether a positive Boolean function given by an arbitrary positive Boolean
expression is read-once.
In the remainder of this section, we let g0 be the positive Boolean function
defined by the quadratic DNF formula φ0 = x1 y1 ∨ . . . ∨ xn yn .

Lemma 10.3. (Gurvich and Khachiyan [429]) When h is a positive function
defined by a CNF formula θ on the variables x1 , y1 , . . . , xn , yn , it is co-NP-complete
to verify the equality g0 ∨ h = g0 . Moreover, the problem remains co-NP-complete
under the additional conditions that h has no linear implicants and no quadratic
implicants.

Proof. Let θ′ be any CNF such that no variable appears more than three times in
θ′ . It is NP-complete to decide whether θ′ is satisfiable; see [371, 932].
Now, replace x̄i by yi in θ′ for all i ∈ {1, 2, . . . , n}, denote by θ the resulting
(positive) CNF, and denote by h the Boolean function represented by θ . It is easy
to see that θ′ is satisfiable if and only if g0 < h or, equivalently, if and only if
g0 ∨ h ≠ g0 . Moreover, if the number of clauses of θ is large enough (say, at
least 7), then θ has no implicants of degree smaller than three. □
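The substitution used in this proof can be explored on tiny instances by exhaustive enumeration. In the sketch below, the clause encoding and all function names are assumptions made for illustration: a literal is a pair (i, positive) over variables x0, …, x(n−1), and `h` evaluates the positive CNF obtained by replacing each negated xi with yi. The restrictions on the number of occurrences and of clauses mentioned in the proof are not enforced.

```python
from itertools import product

def satisfiable(cnf, n):
    """Brute-force satisfiability of the CNF before substitution."""
    return any(all(any(x[i] == positive for i, positive in clause) for clause in cnf)
               for x in product((False, True), repeat=n))

def h(cnf, x, y):
    """Value of the positive CNF after substitution (negated x_i replaced by y_i)."""
    return all(any(x[i] if positive else y[i] for i, positive in clause) for clause in cnf)

def g0(x, y):
    """The quadratic function x_1 y_1 ∨ ... ∨ x_n y_n."""
    return any(a and b for a, b in zip(x, y))

def absorbed(cnf, n):
    """True iff g0 ∨ h = g0, that is, no point has h = 1 and g0 = 0."""
    points = list(product((False, True), repeat=n))
    return all(g0(x, y) or not h(cnf, x, y) for x in points for y in points)

unsat = [[(0, True)], [(0, False)]]                       # x0 ∧ ¬x0
sat = [[(0, True), (1, True)], [(0, False), (1, True)]]   # (x0 ∨ x1)(¬x0 ∨ x1)
```

On these two instances, the unsatisfiable CNF yields g0 ∨ h = g0, while the satisfiable one does not, in line with the equivalence used in the proof.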

Remark 10.5. Recall that verifying the equality of two Boolean functions defined
by positive DNF and CNF expressions, respectively, is exactly Positive DNF
Dualization, which is not co-NP-complete unless every problem of co-NP can
be solved in quasi-polynomial time; see Section 4.4.2. Yet, verifying the similar
identity g0 ∨ h = g0 appears harder. 

Lemma 10.4. Let h be a positive function without linear implicants. If the function
f = g0 ∨ h is read-once, then f is quadratic.

Proof. Since h has no linear implicants, xi yi is a prime implicant of f = g0 ∨ h
for each i ∈ {1, 2, . . . , n}.
If f is read-once, let T be its associated parse tree. By definition, the leaves
of T are labeled by the variables x1 , y1 , . . . , xn , yn , each of which appears at most
once, and in fact, exactly once, since xi yi is a prime implicant of f for each
i ∈ {1, 2, . . . , n}. All other nodes of T are labeled by ∨ and ∧.
For each i ∈ {1, 2, . . . , n}, let us consider in T two paths pi and ri from the
root v0 to the leaves labeled by xi and yi , respectively, and denote by vi the last
common vertex of these two paths. Obviously, vi is a ∧-vertex, since xi yi is a
prime implicant of f . For the same reason, vertex vi is of degree three in T : The
corresponding three edges lead towards xi , yi , and v0 . Moreover, for the same
reason, paths pi and ri have no other ∧-vertices.
Since i ∈ {1, 2, . . . , n} was chosen arbitrarily, we conclude that every path in T
from the root to a leaf contains exactly one ∧-vertex, and that this vertex is of
degree three. This easily implies that every prime implicant of f is quadratic. 

The read-once functions constructed as in the previous lemma are more
completely characterized in Exercise 21.

Remark 10.6. It is easy to demonstrate that the condition on h is essential in
Lemma 10.4. Let us consider, for example, the function

h = (x1 ∨ x2 ∨ . . . ∨ xn ∨ y1 )(x1 ∨ x2 ∨ . . . ∨ xn ∨ y2 ) . . . (x1 ∨ x2 ∨ . . . ∨ xn ∨ yn ).

Obviously, the corresponding function g0 ∨ h = x1 ∨ x2 ∨ . . . ∨ xn ∨ (y1 y2 . . . yn )
is read-once, but it contains the prime implicant (y1 y2 . . . yn ), which is not
quadratic when n > 2. Yet, in this case h has n linear prime implicants, namely,
x1 , x2 , . . . , xn . □

Lemma 10.5. Let h be a positive function without linear or quadratic implicants.
The function f = g0 ∨ h is read-once if and only if g0 ∨ h = g0 .

Proof. The “if” part is obvious, since function g0 is read-once, while the “only if”
part follows immediately from the previous lemma. 

Remark 10.7. Again, it is easy to demonstrate that the assumptions on h are
essential. For instance, when n = 5 and

h = (x1 ∨ y2 ∨ x3 ∨ x4 )(x1 ∨ y2 ∨ y3 ∨ y4 )(y1 ∨ x2 ∨ x3 ∨ x4 )(y1 ∨ x2 ∨ y3 ∨ y4 ),

we find
g0 ∨ h = (x1 ∨ y2 )(x2 ∨ y1 ) ∨ (x3 ∨ x4 )(y3 ∨ y4 ) ∨ x5 y5 ,
so that g0 ∨ h is read-once but distinct from g0 . 
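The identity in Remark 10.7 can be checked by hand. The following derivation is a sketch that uses only the distributive law (a ∨ b)(a ∨ c) = a ∨ bc, applied first to the two pairs of clauses sharing a common part and then once more:

```latex
\begin{align*}
h &= \bigl[(x_1 \vee y_2) \vee (x_3 \vee x_4)\bigr]\bigl[(x_1 \vee y_2) \vee (y_3 \vee y_4)\bigr]
     \bigl[(y_1 \vee x_2) \vee (x_3 \vee x_4)\bigr]\bigl[(y_1 \vee x_2) \vee (y_3 \vee y_4)\bigr] \\
  &= \bigl[(x_1 \vee y_2) \vee (x_3 \vee x_4)(y_3 \vee y_4)\bigr]
     \bigl[(y_1 \vee x_2) \vee (x_3 \vee x_4)(y_3 \vee y_4)\bigr] \\
  &= (x_1 \vee y_2)(y_1 \vee x_2) \vee (x_3 \vee x_4)(y_3 \vee y_4).
\end{align*}
```

The terms x1 y1 and x2 y2 of g0 are absorbed by (x1 ∨ y2 )(x2 ∨ y1 ), and x3 y3 and x4 y4 are absorbed by (x3 ∨ x4 )(y3 ∨ y4 ), so only x5 y5 survives from g0 . Note that here h has quadratic implicants, such as x1 y1 .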
Now we are ready to prove the desired result.
Theorem 10.9. For a Boolean function f given by a positive ∨-∧ expression, it
is co-NP-complete to decide whether f is read-once.
Proof. NP-hardness immediately follows from Lemmas 10.3 and 10.5. It remains
to show that the decision problem is in co-NP. This will follow from Theorem 10.6:
f is read-once if and only if every prime implicant P of f and every prime implicant D of
f d have exactly one variable in common. Hence, to disprove that f is read-once,
it is sufficient (and necessary) to exhibit a prime implicant P0 of f and a prime
implicant D0 of f d with at least two common variables. Furthermore, to verify that P0 is a prime implicant
of f , it is sufficient to check that
(i) f is true if all variables of P0 are true, while all others are false.
(ii) f is false if all variables of P0 but one are true, while all others are false.
This can be checked in polynomial time.
Similarly, we can check that D0 is a prime implicant of f d . To do so, it is
enough to dualize the expression of f by swap of ∨ and ∧ (see Theorem 1.3 in
Section 1.3). 
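The certificate check described in this proof can be sketched as follows. The function names are hypothetical, f is assumed to be positive and is accessed only through evaluation at points (dictionaries mapping variables to truth values), and the dual is evaluated through the identity f d (X) = ¬f (¬X) rather than by rewriting the expression of f.

```python
def is_prime_implicant(f, variables, p0):
    """Conditions (i) and (ii) for a positive function f: |p0| + 1 evaluations."""
    p0 = set(p0)
    point = {v: v in p0 for v in variables}
    if not f(point):                      # (i) f must be true on the point of p0
        return False
    # (ii) switching off any single variable of p0 must make f false
    return not any(f({**point, v: False}) for v in p0)

def dual(f):
    """Dual function: complement the input point and the output value."""
    return lambda point: not f({v: not val for v, val in point.items()})

def refutes_read_once(f, variables, p0, d0):
    """True iff (p0, d0) certifies that the positive function f is not read-once."""
    return (len(set(p0) & set(d0)) >= 2
            and is_prime_implicant(f, variables, p0)
            and is_prime_implicant(dual(f), variables, d0))

def majority(p):
    """f = ab ∨ bc ∨ ac, the negative example of Remark 10.4."""
    return (p["a"] and p["b"]) or (p["b"] and p["c"]) or (p["a"] and p["c"])
```

For instance, `majority` is its own dual, so P0 = D0 = {a, b} is a valid certificate that it is not read-once.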

Remark 10.8. The recognition problem remains in co-NP when the function f
is given by any polynomially computable representation (or polynomial oracle)
and is guaranteed to be positive. Moreover, Aizenstein et al. [13] showed that the
problem remains in co-NP even without assumption of the positivity of f . 
Remark 10.9. Interestingly, the same arguments (three lemmas and theorem)
prove that it is a co-NP-complete problem to recognize whether a positive Boolean
formula, φ0 ∨ θ, defines a quadratic Boolean function. Indeed, the corresponding
Boolean function g0 ∨ h is quadratic if and only if g0 ∨ h = g0 , provided h has no
implicants of degree less than three. 
Exercise 27 at the end of the chapter raises some related open questions
regarding the complexity of recognizing a read-once function depending on the
representation of the function. For example, we may be fortunate to receive f as
a very compact expression, yet not know how to take advantage of this. When
might it be possible to efficiently construct the co-occurrence graph of a Boolean
function and test normality for forms other than a positive DNF representation?

10.6 Learning read-once functions


I’ve got a secret. It’s a Boolean function f . Can you guess what it is? You can ask
me questions like: “What is the value of f at the point X?” Can you figure out my
mystery function with just 20 questions?

The answer, of course, is yes, 20 questions are enough if the number of variables
is at most 4. Otherwise, the answer is no. If there are n variables, then there will
be 2^n independent points to be queried before you can “know” the function.

Suppose I give you a clue: The function f is a positive Boolean function. Now
can you learn f with fewer queries?

Again the answer is yes. The extra information given by the clue allows you to
ask fewer questions in order to learn the function. For example, in the case n = 4,
first try (1,1,0,0). If the answer is true, then you immediately know that (1,1,1,0),
(1,1,0,1) and (1,1,1,1) are all true. If the answer is false, then (1,0,0,0), (0,1,0,0)
and (0,0,0,0) are all false. Either way, you asked one question and got four answers.
Not bad. Now if you query (0,0,1,1), you will similarly get two or three more free
answers. In the worst case, it could take 10 queries to learn the function (rather
than 16 had you queried each point).
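The "free answers" in this example come from monotonicity alone, and they are easy to compute. The sketch below uses hypothetical function names and a deliberately naive query order (points are visited in lexicographic order), so it makes no claim about the smallest possible number of queries; it only shows how each answer is propagated.

```python
from itertools import product

def implied(point, value):
    """Points whose value follows from f(point) = value when f is positive."""
    n = len(point)
    if value:
        # Every point above a true point is true.
        return {q for q in product((0, 1), repeat=n) if all(a <= b for a, b in zip(point, q))}
    # Every point below a false point is false.
    return {q for q in product((0, 1), repeat=n) if all(b <= a for a, b in zip(point, q))}

def learn_positive(oracle, n):
    """Learn the full truth table of a positive function; return (table, number of queries)."""
    known, queries = {}, 0
    for point in product((0, 1), repeat=n):
        if point not in known:
            value = bool(oracle(point))
            queries += 1
            for q in implied(point, value):
                known[q] = value
    return known, queries

def example(p):
    """An arbitrary positive function used for illustration: x1 x2 ∨ x3 x4."""
    return (p[0] and p[1]) or (p[2] and p[3])
```

Calling `implied((1, 1, 0, 0), True)` returns the queried point together with the three points listed in the text, and likewise for a false answer.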
Learning a Boolean function in this manner is sometimes called Exact Learn-
ing with Queries; see Angluin [21]. It receives as input an oracle for a Boolean
function f , that is, a “black box” that can answer a query on the value of f at a
given Boolean point in constant time. It then attempts to learn the value of f at all
2^n points and outputs a Boolean expression that is logically equivalent to f .
If we know something extra about the structure of the function f , then it may
be possible to reduce the number of queries required to learn the function. We saw
this earlier in our example with the clue (that the mystery function was positive).
However, even for positive functions, the number of queries needed to learn the
function remains exponential.
The situation is much better for read-once functions. In this case, the number
of required queries can be reduced to a polynomial number, and the unique read-
once formula can be produced, provided we “know” that the function is read-once.
Thus, the read-once functions constitute a very natural class of functions that can
be learned efficiently, and, for this reason, they have been extensively studied
within the computational learning theory community.
For our purposes, we define the problem as follows:

Procedure AHK Read-Once Exact Learning(f )


Step 0: Check whether f is a constant function, using the oracle: If f(1, ..., 1) = 0, then f is constant 0;
if f(0, ..., 0) = 1, then f is constant 1.
Step 1: Use the oracle to construct the co-occurrence graph G(f).
Step 2: Build a cotree T for G(f) (“knowing” a priori that it must be P4-free and thus will succeed).
Step 3: Immediately output T as the read-once expression (“knowing” a priori that f is normal).

Figure 10.8. Procedure AHK Read-Once Exact Learning.

Read-Once Exact Learning


Input: A black-box oracle to evaluate f at any given point, where f is known
a priori to be a positive read-once function.
Output: A read-once factorization for f .

Remark 10.10. There is a subtle but significant difference between the Exact
Learning problem and the Recognition problem. With recognition, we have
a DNF expression for f and must determine whether it represents a read-once
function. With exact learning, we have an oracle for f whose correct usage relies
upon the a priori assumption that the function to be learned is read-once. So the
input assumptions are different, but the output goal in both cases is a correct
read-once expression for f . Also, when measuring the complexity of recognition,
we count the algorithmic operations; when measuring the complexity of exact
learning, we must count both the operations implemented by the algorithm and the
number of queries to the oracle. 
As we saw in Section 10.5, the GMR recognition procedure: (1) uses the DNF
expression to construct the co-occurrence graph G(f ), then (2) tests whether G(f )
is P4 -free and builds a cotree T for it, and (3) uses T and the original DNF formula
to test whether f is normal; if so, T is the read-once expression.
In contrast to this, Angluin, Hellerstein, and Karpinski [22] give the exact
learning algorithm in Figure 10.8.
The main difference between AHK exact learning and GMR recognition that
concerns us will be Step 1, that is, how to construct G(f ) using an oracle. We
outline the solution through a series of exercises at the end of the chapter.
(A) In a greedy manner, we can determine whether a subset U ⊆ X of the
variables contains a prime implicant, and find one when the answer is positive.
Exercise 16 gives such a routine Find-PI-In(U ), which has complexity O(n) plus
|U | queries to the oracle. A similar greedy algorithm Find-DualPI-In(U ) will
find a dual prime implicant contained in U .
(B) An algorithm Find-Essential-Variables is developed in Exercises 17,
18, and 19 that not only finds the set Y of essential variables⁴ but also, in the
process, for each variable xi in Y, generates a prime implicant P[i] and a dual
prime implicant D[i] containing xi. This algorithm uses Find-PI-In and
Find-DualPI-In and can be implemented to run in O(n^2) time using O(n^2) queries to
the oracle.
⁴ We have generally assumed throughout this chapter that all of the variables for a Boolean function
f (and hence for f^d) are essential. However, in the exact learning problem, we may wish to drop
this assumption and then need to find the set of essential variables.
(C) Finally, we construct the co-occurrence graph G(f ) based on the following
Lemma (whose proof is proposed as Exercise 14):
Lemma 10.6. Let f be a nonconstant read-once function over the variables N =
{x1 , x2 , . . . , xn }. Suppose that Di is a dual prime implicant containing xi but not
xj , and that Dj is a dual prime implicant containing xj but not xi . Let Ri,j =
(N \ (Di ∪ Dj )) ∪ {xi , xj }. Then (xi , xj ) is an edge in the co-occurrence graph
G(f ) if and only if Ri,j contains a prime implicant.
We obtain G(f ) using the oracle in the following way: For each pair of essential
variables xi and xj ,
C.1: if xi ∈ D[j ] or xj ∈ D[i], then (xi , xj ) is not an edge of G(f );
C.2: otherwise, construct R_{i,j} from D[i] and D[j] and test whether R_{i,j} contains
a prime implicant using just one query to the oracle, namely, is f(X_{R_{i,j}}) = 1, where
X_{R_{i,j}} denotes the characteristic vector of R_{i,j}? If so, then (xi, xj) is an edge
in G(f); otherwise, it is not an edge.
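Steps C.1 and C.2 can be sketched as follows. This is a minimal Python illustration, assuming that variables are indexed 0, ..., n−1, that all of them are essential, and that D[i] is already available as a set (for instance from Part B); the names indicator and cooccurrence_edges are ours and do not come from [22].

```python
from itertools import combinations

def indicator(subset, n):
    """Characteristic vector of a set of variable indices."""
    return tuple(1 if i in subset else 0 for i in range(n))

def cooccurrence_edges(oracle, n, D):
    """Edges of G(f) for a read-once f, where D[i] is a dual prime implicant containing i."""
    everything = set(range(n))
    edges = set()
    for i, j in combinations(range(n), 2):
        if i in D[j] or j in D[i]:
            continue  # C.1: the pair shares a dual prime implicant, so it is not an edge
        R = (everything - (D[i] | D[j])) | {i, j}
        if oracle(indicator(R, n)) == 1:  # C.2: a single query decides the pair
            edges.add((i, j))
    return edges

# Hypothetical example: f = x0 (x1 OR x2), whose dual prime implicants are {x0} and {x1, x2}.
f = lambda x: int(x[0] and (x[1] or x[2]))
D = [{0}, {1, 2}, {1, 2}]
print(sorted(cooccurrence_edges(f, 3, D)))
```

Each pair costs O(n) set operations and at most one query, which matches the accounting in the complexity discussion.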

Complexity
The computational complexity of the procedure is determined as follows. Step 0
requires two queries to the oracle. Step 1 constructs the co-occurrence graph G(f)
by first calling the algorithm Find-Essential-Variables (Part B) to generate
P[i] and D[i] for each variable xi in O(n^2) time using O(n^2) queries; then it
applies Lemma 10.6 (Part C) to determine the edges of the graph. Step C.1 can be
done in the same complexity as Step B; however, Step C.2 uses O(n^3) time and
O(n^2) queries, since, for each pair i, j, we have O(n) operations and 1 query. Step
2, building the cotree T for G(f), takes O(n^2) time using one of the fast cograph
algorithms of [154, 214, 431], and Step 3 takes no time at all.
To summarize, the overall complexity using the method of Angluin, Hellerstein,
and Karpinski [22] will be O(n^3) time and O(n^2) queries. However, in an unpublished
manuscript [250], Dahlhaus subsequently reported an alternative to Step C.2
using only O(n^2) time. (Further generalizations by Raghavan and Schach [774]
lead to the same time bound.)
The main result, therefore, is the following:
Theorem 10.10. The Read-Once Exact Learning problem can be solved with
the AHK procedure in O(n^2) time, using O(n^2) queries to the oracle.
Proof. The correctness of the AHK exact learning procedure follows from
Lemma 10.6, Exercises 17–19, and Remark 10.4. 
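Steps 2 and 3 of the procedure can be illustrated by a simple recursive sketch: a disconnected graph yields a disjunction over its components, and a graph with a disconnected complement yields a conjunction. The Python code below is our own illustration and makes several assumptions: it is a naive recursion rather than one of the fast cograph algorithms cited above, the function names are hypothetical, and it relies (as Step 3 does) on f being normal.

```python
from itertools import product

def components(vertices, adjacent):
    """Connected components of the graph on `vertices` with the given adjacency test."""
    remaining, parts = set(vertices), []
    while remaining:
        stack = [remaining.pop()]
        part = set(stack)
        while stack:
            v = stack.pop()
            for w in list(remaining):
                if adjacent(v, w):
                    remaining.remove(w)
                    part.add(w)
                    stack.append(w)
        parts.append(part)
    return parts

def read_once_expression(vertices, edges):
    """Cotree of a P4-free graph, as a nested ('or' | 'and', [children]) expression."""
    vertices = set(vertices)
    if len(vertices) == 1:
        return next(iter(vertices))
    is_edge = lambda v, w: (v, w) in edges or (w, v) in edges
    parts, op = components(vertices, is_edge), "or"
    if len(parts) == 1:  # connected: split along the complement graph instead
        parts, op = components(vertices, lambda v, w: not is_edge(v, w)), "and"
    if len(parts) == 1:
        raise ValueError("graph contains an induced P4, so it is not a cograph")
    return (op, [read_once_expression(p, edges) for p in parts])

def evaluate(expr, x):
    """Evaluate a nested expression at the 0/1 point x."""
    if not isinstance(expr, tuple):
        return x[expr]
    values = [evaluate(child, x) for child in expr[1]]
    return int(any(values)) if expr[0] == "or" else int(all(values))

# Co-occurrence graph of the hypothetical example ((x0 OR x1) x2 OR x3) x4.
edges = {(0, 2), (0, 4), (2, 4), (1, 2), (1, 4), (3, 4)}
expr = read_once_expression(range(5), edges)
target = lambda x: int(((x[0] or x[1]) and x[2] or x[3]) and x[4])
print(all(evaluate(expr, x) == target(x) for x in product((0, 1), repeat=5)))
```

If the input graph is not P4-free, the recursion reaches a subgraph that is connected together with its complement and reports the failure.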

Remark 10.11. If a lying, deceitful, cunning adversary were to place a non-read-once
function into our “black-box” query oracle, then the exact learning method
described here would give an incorrect identification answer, since the “a priori
read-once” assumption is vital for the construction of G(f). (See the discussion
in Exercise 28 concerning what might happen if such an oracle were to be applied
to a non-read-once function.)
Further topics relating computational learning theory with read-once functions
may be found in [13, 22, 162, 484, 396, 397, 482, 749, 774, 838, 884, etc.].

10.7 Related topics and applications of read-once functions


In this section, we briefly mention three topics related to read-once functions and
application areas in which they play an interesting role.

10.7.1 The readability of a Boolean function


Suppose a given function f is not a read-once function. In this case, we may still
want to obtain an expression that is logically equivalent to f and that has a small
number of repetitions of the variables. The readability of a Boolean function
captures this idea.
We call a Boolean expression read-m if each variable appears at most m times
in the expression. A Boolean function f is defined to be a read-m function if it
has an equivalent read-m expression. Finally, the readability of f is the smallest
number m such that f is a read-m function.
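As a small illustration of the definition, the following Python sketch counts variable occurrences in an expression given as nested (operator, [children]) tuples. The representation and the names occurrences and read_number are our own assumptions. Note that the result is a property of one particular expression, so it only gives an upper bound on the readability of the function it represents.

```python
from collections import Counter

def occurrences(expr, counts=None):
    """Count how often each variable appears in a nested (op, [children]) expression."""
    counts = Counter() if counts is None else counts
    if isinstance(expr, tuple):
        for child in expr[1]:
            occurrences(child, counts)
    else:
        counts[expr] += 1  # a leaf is a variable name
    return counts

def read_number(expr):
    """Smallest m for which this particular expression is read-m."""
    return max(occurrences(expr).values())

# Two equivalent expressions of the same function: ab OR ac, and a(b OR c).
dnf = ("or", [("and", ["a", "b"]), ("and", ["a", "c"])])
factored = ("and", ["a", ("or", ["b", "c"])])
print(read_number(dnf), read_number(factored))
```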
The definition of readability does not require the function to be positive.
Thus, characterizing read-m Boolean functions and characterizing positive read-m
Boolean functions appear to be separate questions.
As noted earlier in Section 10.5, recognizing whether a nonmonotone DNF
represents a read-once function is NP-hard. The same result holds for recognizing
whether a nonmonotone DNF represents a read-m function when m > 1. (This
follows again from Theorem 1.30.)
To the best of our knowledge, the complexity of recognizing read-m functions
given by an irredundant positive DNF is open for all fixed m ≥ 2. Golumbic,
Mintz, and Rotics therefore proposed in [401] to investigate restrictions of the
general problem to special cases of positive Boolean functions f identified by the
structure of the co-occurrence graph G(f ). As a first step in this direction, they
showed the following result:
Theorem 10.11. [401] Let f be a positive Boolean function. If f is a normal
function and its co-occurrence graph G(f) is a partial k-tree, then f is a read-2^k
function and a read-2^k expression for f can be obtained in polynomial (O(n^{k+1}))
time.
Notice that if G(f ) is a tree, then f would immediately be normal. Therefore,
in the case of k = 1, Theorem 10.11 reduces to the following:
Corollary 10.1. Let f be a positive Boolean function. If G(f ) is a tree, then f is
a read-twice function.

10.7.2 Factoring general Boolean functions


Factoring is the process of deriving a parenthesized Boolean expression or factored
form representing a given Boolean function. Since, in general, a function will have
many factored forms, the problem of factoring Boolean functions into shorter, more
compact, logically equivalent expressions is one of the basic operations in the early
stages in designing logic circuits. Generating an optimum factored form (a shortest
length expression) is an NP-hard problem. Thus, heuristic algorithms have been
developed in order to obtain good factored forms.
The read-once functions, as we have already seen, are an exception. For
a read-once function f, the read-once expression is unique and can be determined
very efficiently; moreover, it is the shortest possible expression for f. According
to [734], read-once functions account for a significant percentage of functions that
arise in real circuit applications. Some smaller or specifically designed circuits
may indeed be read-once functions, but most often they will not even be positive
functions. Nevertheless, we can use the optimality of factoring read-once functions
as part of a heuristic method.
Such an approach for factoring general Boolean functions has been described
in [399, 686], and is based on graph partitioning. Their heuristic algorithm is
recursive and operates on the function and its dual to obtain the better factored
expression. The read-once functions form a special class that appears in the lower
levels of the recursive factoring process.
The original function f is decomposed into smaller components, for example,
f = f1 ∨ f2 ∨ f3 , and when a component is recognized to be read-once, a special
purpose subroutine (namely, the GMR procedure of Section 10.5) is called to
factor that read-once component efficiently and optimally. Their method has been
implemented in the SIS logic synthesis environment, and an empirical evaluation
indicates that the factored expressions obtained are usually significantly better than
those from previous fast algebraic factoring algorithms and are quite competitive
with previous Boolean factoring methods, but with lower computation costs (see
[685, 686]).

10.7.3 Positional games


We introduce here the notions of normal, extensive, and positional game forms,
and then show their relationship to read-once functions.

Definition 10.1. Given three finite sets S^1 = {s^1_1, s^1_2, ..., s^1_{m_1}}, S^2 = {s^2_1, s^2_2, ..., s^2_{m_2}},
which are interpreted as the sets of strategies of the players 1 and 2, and X =
{x1, x2, ..., xk}, which is interpreted as the set of outcomes, a game form (of two
players) is a mapping g : S^1 × S^2 → X, which assigns an outcome x(s^1, s^2) ∈ X
to every pair of strategies s^1 ∈ S^1, s^2 ∈ S^2.

A convenient representation of a game form is a matrix M = M(g) whose rows
are labeled by S^1, whose columns are labeled by S^2, and whose elements are
labeled by X. For example,

M_1 = \begin{pmatrix} x_1 & x_2 \\ x_2 & x_1 \end{pmatrix}.

Each outcome x ∈ X may appear several times in M(g), because g may not be
injective. We can interpret M(g) as “a game in normal form in which the payoff
is not specified, yet.”

Definition 10.2. Two strategies s^i_1 and s^i_2 of player i, where i = 1 or 2, are
called equivalent if for every strategy s^{3−i} of the opponent, we have g(s^i_1, s^{3−i}) =
g(s^i_2, s^{3−i}); in other words, if in matrix M(g), the rows (i = 1) or the columns
(i = 2) corresponding to the strategies s^i_1 and s^i_2 are equal.

We restrict ourselves to studying game forms without equivalent strategies.

Definition 10.3. Given a read-once function f , we can interpret its parse tree
(or read-once formula) T (f ) as an extensive game form (or game tree) of two
players. The leaves X = {x1 , x2 , ..., xk } of T are the final positions or outcomes.
The internal vertices of T are the internal positions. The game starts at the root of
T and ends in a final position x ∈ X. Each path from the root to a final position
(leaf) is called a play. If an internal node v is labeled by ∨ (respectively, by ∧),
then it is the turn of player 1 (respectively, player 2) to move in v. This player can
choose any vertex that is a child of v in T .
A strategy of a player is a mapping which assigns a move to every position in
which this player has to move. In other words, a strategy is a plan of how to play
in every possible situation.
Any pair of strategies s^1 of player 1 and s^2 of player 2 defines a play p(s^1, s^2) and
an outcome x(s^1, s^2) that would appear if both players implement these strategies.
Two strategies s^i_1 and s^i_2 of player i, where i = 1 or 2, are called equivalent
if for every strategy s^{3−i} of the opponent the outcome is the same, that is, if
x(s^i_1, s^{3−i}) = x(s^i_2, s^{3−i}). By suppressing all but one (arbitrary) strategy from every
class of equivalent strategies, we obtain two reduced sets of strategies, denoted by
S^1 = {s^1_1, s^1_2, ..., s^1_{m_1}} and S^2 = {s^2_1, s^2_2, ..., s^2_{m_2}}.
The mapping g : S^1 × S^2 → X, which assigns the outcome x(s^1, s^2) ∈ X to
every pair of strategies s^1 ∈ S^1, s^2 ∈ S^2, defines a game form, which we call the
normal form of the corresponding extensive game form.
Note that such a mapping g = g(T) may not be injective because different pairs
of strategies may generate the same play.
We call a game form g positional if it is the normal form of an extensive game
form, that is, if g = g(T (f )) for a read-once function f .

Example 10.6. In the extensive game form defined by the read-once formula
((x1 ∨ x2)x3 ∨ x4)x5, each player has three strategies, and the corresponding
normal game form is given by the following (3 × 3)-matrix:

M_2 = \begin{pmatrix} x_1 & x_3 & x_5 \\ x_2 & x_3 & x_5 \\ x_4 & x_4 & x_5 \end{pmatrix}.

The game form given by the matrix

M_3 = \begin{pmatrix} x_1 & x_1 \\ x_2 & x_3 \end{pmatrix}

is also generated by a read-once formula, namely, by x1 ∨ x2x3.
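The passage from a read-once formula to its normal form, as in Definition 10.3, can be sketched in Python. The code below is only an illustration under our own assumptions: a formula is a nested (operator, [children]) tuple with string leaves, a strategy is a dictionary from a player's internal nodes (identified by their paths from the root) to a chosen child, and all function names are hypothetical. It enumerates every strategy, so it is exponential in general.

```python
from itertools import product

def decision_nodes(tree, op, path=()):
    """Paths to the internal nodes labeled `op` ('or' = player 1, 'and' = player 2)."""
    if not isinstance(tree, tuple):
        return []
    found = [path] if tree[0] == op else []
    for k, child in enumerate(tree[1]):
        found += decision_nodes(child, op, path + (k,))
    return found

def node_at(tree, path):
    for k in path:
        tree = tree[1][k]
    return tree

def strategies(tree, op):
    """All strategies of one player: one chosen child at each of that player's nodes."""
    nodes = decision_nodes(tree, op)
    choices = [range(len(node_at(tree, p)[1])) for p in nodes]
    return [dict(zip(nodes, pick)) for pick in product(*choices)]

def outcome(tree, s1, s2):
    """Follow the play determined by the two strategies down to a leaf."""
    path, node = (), tree
    while isinstance(node, tuple):
        move = (s1 if node[0] == "or" else s2)[path]
        path += (move,)
        node = node[1][move]
    return node

def normal_form(tree):
    """Matrix of the normal form, with equivalent strategies suppressed."""
    S1, S2 = strategies(tree, "or"), strategies(tree, "and")
    rows = [tuple(outcome(tree, a, b) for b in S2) for a in S1]
    rows = list(dict.fromkeys(rows))        # keep one row per class of equivalent strategies
    cols = list(dict.fromkeys(zip(*rows)))  # likewise for columns
    return [list(r) for r in zip(*cols)]

# The formula ((x1 OR x2) x3 OR x4) x5 of Example 10.6.
tree = ("and", [("or", [("and", [("or", ["x1", "x2"]), "x3"]), "x4"]), "x5"])
for row in normal_form(tree):
    print(row)
```

Each player has four strategies in this example before equivalent ones are suppressed, and three afterward.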
Our aim is to characterize the positional game forms.
Definition 10.4. Let us consider a game form g and the corresponding matrix
M = M(g). We associate with M two DNFs, representing two Boolean functions
f1 = f1 (g) = f1 (M) and f2 = f2 (g) = f2 (M), respectively, by first taking the
conjunction of all the variables in each row (respectively, each column) of M,
and then taking the disjunction of all these conjunctions for all rows (respectively,
columns) of M.
We call a game form g (as well as its matrix M) tight if the functions f1 and f2
are mutually dual.
Example 10.7. Matrix M2 of Example 10.6 generates the functions f1 (M2 ) =
x1 x3 x5 ∨ x2 x3 x5 ∨ x4 x5 and f2 (M2 ) = x1 x2 x4 ∨ x3 x4 ∨ x5 . These functions are
mutually dual, thus the game form is tight. Matrix M3 is also tight, because its func-
tions f1 (M3 ) = x1 ∨ x2 x3 and f2 (M3 ) = x1 x2 ∨ x1 x3 are mutually dual. However,
M1 is not tight, because its functions f1 (M1 ) = f2 (M1 ) = x1 x2 are not mutually
dual. 
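For small matrices, tightness can be checked by brute force. The following Python sketch is an illustration only: the name is_tight is ours, and the loop over all 0/1 assignments of the outcomes is exponential, so it is meant for examples of the size considered here. It uses the fact that f2 is the dual of f1 exactly when f2(X) differs from f1 evaluated at the complemented point, for every X.

```python
from itertools import product

def is_tight(M):
    """Check by enumeration that f1 (rows) and f2 (columns) of M are mutually dual."""
    variables = sorted({x for row in M for x in row})
    rows = [set(r) for r in M]          # terms of f1
    cols = [set(c) for c in zip(*M)]    # terms of f2

    def dnf(terms, true_vars):
        return any(t <= true_vars for t in terms)

    for bits in product((0, 1), repeat=len(variables)):
        on = {v for v, b in zip(variables, bits) if b}
        off = set(variables) - on
        # Duality requires f2(X) = NOT f1(complement of X) at every point X.
        if dnf(cols, on) == dnf(rows, off):
            return False
    return True

M1 = [["x1", "x2"], ["x2", "x1"]]
M2 = [["x1", "x3", "x5"], ["x2", "x3", "x5"], ["x4", "x4", "x5"]]
M3 = [["x1", "x1"], ["x2", "x3"]]
print(is_tight(M1), is_tight(M2), is_tight(M3))
```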
Remark 10.12. It is proven in [421] that a normal game form (of two players) is
Nash-solvable (that is, for an arbitrary payoff the obtained game has at least one
Nash equilibrium in pure strategies) if and only if this game form is tight. 
Theorem 10.12. Let f be a read-once function; T = T(f), the parse tree of f
interpreted as an extensive game form; g = g(T), its normal form; M = M(g),
the corresponding matrix; and f1 = f1(M), f2 = f2(M), the functions generated
by M. Then, f1 = f and f2 = f^d.
Proof. By induction. For a trivial function f the claim is obvious. If f = f′ ∨ f″,
then f1 = f1′ ∨ f1″ and f2 = f2′ ∧ f2″. If f = f′ ∧ f″, then f1 = f1′ ∧ f1″ and f2 =
f2′ ∨ f2″. The theorem follows directly from the definition of strategies.

Definition 10.5. We call a game form g : S^1 × S^2 → X (as well as the corresponding
matrix M) rectangular if every outcome x ∈ X occupies a rectangular array
in M, that is, if the following property holds: g(s^1_1, s^2_1) = g(s^1_2, s^2_2) = x implies
g(s^1_1, s^2_2) = g(s^1_2, s^2_1) = x.
For example, matrices M2 and M3 above are rectangular, while M1 is not.
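The rectangularity condition translates directly into a check over pairs of cells. The Python sketch below is our own illustration (the name is_rectangular is hypothetical), and it favors clarity over efficiency.

```python
def is_rectangular(M):
    """Equal outcomes in two cells must force the same outcome in the two crossed cells."""
    cells = [(i, j) for i in range(len(M)) for j in range(len(M[0]))]
    for i1, j1 in cells:
        for i2, j2 in cells:
            if M[i1][j1] == M[i2][j2]:
                x = M[i1][j1]
                if M[i1][j2] != x or M[i2][j1] != x:
                    return False
    return True

M1 = [["x1", "x2"], ["x2", "x1"]]
M2 = [["x1", "x3", "x5"], ["x2", "x3", "x5"], ["x4", "x4", "x5"]]
M3 = [["x1", "x1"], ["x2", "x3"]]
print(is_rectangular(M1), is_rectangular(M2), is_rectangular(M3))
```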

Theorem 10.13. A game form g and its corresponding matrix M are rectangular
if and only if every prime implicant of f1 (M) and every prime implicant of f2 (M)
have exactly one variable in common.
Proof. Obviously, any two such prime implicants must have at least one common
variable because every row and every column in M intersect, that is, row s^1 and
column s^2 always have a common outcome x = g(s^1, s^2). Let us suppose that they
have another common outcome, namely, that there exist strategies s^1_i and s^2_j such
that g(s^1, s^2_j) = g(s^1_i, s^2) = x′ ≠ x. Then, since g(s^1, s^2) = x ≠ x′, g is not rectangular.
Conversely, let us assume that g is not rectangular, that is, g(s^1_1, s^2_1) = g(s^1_2, s^2_2) =
x, while g(s^1_1, s^2_2) = x′ ≠ x. Then row s^1_1 and column s^2_2 have at least two outcomes
in common, namely, x and x′.

Theorem 10.14. (Gurvich [423, 424]). A normal game form g is positional if and
only if it is tight and rectangular.
Proof. The normal form g corresponding to an extensive game form T (f ) is tight
in view of Theorem 10.12, and g is rectangular in view of Theorem 10.13 and
Theorem 10.6(iv).
Conversely, if g is tight and rectangular, then, by definition, f1 (g) and f2 (g)
are dual. Further, according to Theorem 10.13, every prime implicant of f1 (g) and
every prime implicant of f2 (g) have exactly one variable in common. Hence, by
Theorem 10.6(iv), f1 (g) and f2 (g) are read-once; thus, g is positional. 

Remark 10.13. In [423], this theorem is generalized for game forms of n players.
The criterion is the same: A game form is positional if and only if it is tight and
rectangular. The proof is based on the cotree decomposition of P4 -free graphs; see
Sections 10.3, 10.5. 

10.8 Historical notes


We conclude this chapter with a few brief remarks about the history of read-once
functions. It is important to distinguish between
(A) the algorithms to verify read-onceness based on the parse tree decompo-
sition, or, in other words, the ∨-∧ disjoint decomposition, and
(B) the criteria of read-onceness based on “rectangularity” of the pair f and
f^d, or P4-freeness and normality of f.
In fact, (A) is at least 20 years older than (B). The oldest reference we know is
by Kuznetsov [592], in 1958. Kuznetsov claims that the parse tree decomposition
is well defined (i.e., it is unique), and he also says a few words on how to get it; De
Morgan’s formulae are mentioned, too. This implies (A), though read-onceness is
not mentioned explicitly in this paper.

In his 1978 doctoral thesis, Gurvich [423] remarked that the parse tree decom-
position is a must for any minimum ∨-∧ formula for f , in both the monotone
and general cases. However, a bit earlier, Michel Chein’s short paper [190] based
on his doctoral thesis of 1967 may be the earliest one mentioning “read-once”
functions. J. Kuntzmann (Chein’s thesis advisor) raised the question a few years
earlier in the first edition (1965) of his book “Algèbre de Boole” [589], mentioning
a problem called “dédoublement de variables,” and in the second edition (1968)
he cites Chein’s work.
What Chein does (using our notation) is to look at the bipartite graph B(f ) =
(P, V , E), where P is the set of prime implicants, V is the set of variables, and
edges represent containment, that is, for all P ∈ P, v ∈ V ,
(P , v) ∈ E ⇐⇒ v ∈ P .
The reader can easily verify that B(f ) is connected if and only if the graph
G(f ) is connected.
Chein’s method is to check which of B(f) or B(f^d) is disconnected (failing if
both are connected) and to continue recursively. An exponential price is paid for
dualizing. Peer and Pinter [734] do something quite similar.
By contrast, as the reader also now knows, the polynomial-time algorithm of
Golumbic, Mintz, and Rotics similarly acts on G(f) and G(f^d), but G(f^d) is obtained for
free, without dualizing, thanks to the fact that G(f^d) equals the graph complement
of G(f) (by Theorem 10.6), paying only a small extra price to check for normality.
Finally, to clarify complexities using our notation: Clearly, building B(f^d)
involves dualization of f; however, building G(f^d) can be done in polynomial
time for any positive Boolean function (i.e., without any dualization). The implica-
tion is that one can compute a unique read-once decomposition for any (positive)
read-once Boolean function in polynomial time; see also Ramamurthy’s book
[777].
To summarize, testing read-onceness and obtaining a parse tree decomposition
is just an extreme case of representing f by a minimum length ∨-∧ formula.
The parse tree decomposition implies (A) and has been known since 1958 [592],
whereas (B) has been known since 1977 [422, 423] and been rediscovered inde-
pendently several times thereafter [293, 294, 548, 696]. Dominique de Werra has
described it as “an additional interesting example of rediscovery by people from
the same scientific community. It shows that the problem has kept its importance
and [those involved] have good taste.”

10.9 Exercises
1. Prove that a Boolean function f for which some variable appears in its
positive form x in one prime implicant and in its negative form x̄ in another
prime implicant cannot be a read-once function.
2. Verify Remark 10.1; namely, if T is a proper dual subimplicant of f , then
there exists a prime implicant of f , say, P , such that P ∩ T = ∅.

3. Consider the positive Boolean function

f = x1 x2 ∨ x1 x5 ∨ x2 x3 ∨ x2 x4 ∨ x3 x4 ∨ x4 x5 .

(a) Draw the co-occurrence graph G(f ). Prove that f is not a read-once
function.
(b) Let T = {x1 , x4 }. What are the sets P0 , Px1 , Px4 ? Prove that T is a dual
subimplicant of f by finding a noncovering selection.
(c) Let T = {x3 , x4 , x5 }. What are the sets P 0 , P x3 , P x4 , P x5 ? Prove that
T is not a dual subimplicant of f .
4. Consider the function f = ab ∨ bc ∨ cd. Verify that {a, d} is not a dual
subimplicant.
5. Verify that the function

f = adg ∨ adh ∨ bdg ∨ bdh ∨ eag ∨ ebg ∨ ecg ∨ eh

in Example 10.2 is not a normal function. Find the collection D of dual
prime implicants of f. Is f^d normal?
6. Let f be a positive Boolean function over the variable set {x1 , x2 , ..., xn }, and
let T be a subset of the variables. Prove the following:
(a) T is a dual prime implicant if and only if P0 = ∅ and there is a
nonempty selection S for T (i.e., Pxi ≠ ∅ for every xi ∈ T).
(b) T is a dual super implicant (i.e., D ⊂ T for some dual prime implicant
D ∈ D) if and only if P0 = ∅ and Pxi = ∅ for some xi ∈ T (i.e., no
selection S is possible).
7. Prove that for any graph G, the complement graph Ḡ must be connected if G is disconnected.
8. Give a direct proof (using the dual subimplicant theorem) of the implication
(iii) =⇒ (ii) of Theorem 10.4; namely, if G(f) and G(f^d) do not share a
common edge, then G(f) and G(f^d) are complementary graphs.
9. Using Lemma 10.1, prove that the read-once expression is unique for a
read-once function (up to commutativity of the operations ∨ and ∧).
10. Verify that the function f = abc ∨ bde ∨ ceg from Example 10.3 is not
normal, though its three prime implicants correspond to maximal cliques of
the co-occurrence graph G(f ); see Figure 10.4. Verify that G(f ) contains
an induced P4 . How many P4 ’s does it contain?
11. Consider two functions:

f1 = x1 x3 x5 ∨ x1 x3 x6 ∨ x1 x4 x5 ∨ x1 x4 x6 ∨ x2 x3 x5 ∨ x2 x3 x6 ∨ x2 x4 x5 ∨ x2 x4 x6

and

f2 = x1 x3 x5 ∨ x1 x3 x6 ∨ x1 x4 x5 ∨ x1 x4 x6 ∨ x2 x3 x5 ∨ x2 x3 x6 ∨ x2 x4 x5 .

Verify that they generate the same co-occurrence graph G, which is P4 -free,
and that all prime implicants of f1 and f2 correspond to maximal cliques of
G; yet, f1 is normal, while f2 is not. Find the cotree for G and the read-once
expression for f1 .

12. Give an example of a pair of functions g and f with the same co-occurrence
graph G = G(g) = G(f), which is P4-free, and where the numbers of prime
implicants of g and f are equal; yet, g is normal and thus read-once, while
f is not. (Hint: Combine nonnormal functions seen in this chapter whose
graphs are P4-free.)
13. Prove that for a positive Boolean function given by its complete DNF expres-
sion, it is possible to check normality in O(n3 k) time, where n is the number
of essential variables, and k is the number of prime implicants of the function.
(Hint: Use the results of [538].)
14. Prove Lemma 10.6: Let f be a nonconstant read-once function over the
variables N = {x1 , x2 , . . . , xn }. Suppose that Di is a dual prime implicant
containing xi but not xj , and that Dj is a dual prime implicant containing xj
but not xi . Let Ri,j = (N \ (Di ∪ Dj )) ∪ {xi , xj }. Then, (xi , xj ) is an edge in
the co-occurrence graph G(f ) if and only if Ri,j contains a prime implicant.
(Hint: Use (iv) of Theorem 10.6, or see reference [22].)
15. Prove that the recursive definition of cographs based on rules (1), (2), (3)
in Section 10.4 is equivalent to the alternative definition using rules
(1), (2), (3′).
16. Let f be a positive Boolean function over the variables N = {x1 , x2 , . . . , xn },
and let U ⊆ N.
(a) Prove that the following greedy algorithm Find-PI-In(U ) finds a prime
implicant P ⊆ U of f , if one exists, and can be implemented to run
in O(n) time using |U| membership queries. (We denote by e_U the
characteristic vector of U, where (e_U)_i = 1 for xi ∈ U, and (e_U)_i = 0
otherwise.)
Algorithm Find-PI-In(U)
Step 1: Verify that f(e_U) = 1.
        Otherwise, exit with no solution, since U contains no prime
        implicant.
Step 2: Set S ← U.
Step 3: For all xi ∈ U, do
            if f(e_{S \ {xi}}) = 1 then S ← S \ {xi}
        end-do
Step 4: Set P ← S and output P.
(b) Write an analogous dual Algorithm Find-DualPI-In(U ) to find a dual
prime implicant D ⊆ U of f , if one exists.
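For readers who wish to experiment, the greedy routine of part (a) and one possible dual version can be sketched in Python. This is only a sketch under our own assumptions: variables are indexed 0, ..., n−1, the oracle takes a 0/1 tuple, and the names indicator, find_pi_in, and find_dual_pi_in are hypothetical. The dual routine is one way to approach part (b), based on the observation that, for a positive f, a set T is a dual implicant exactly when f evaluates to 0 once all variables of T are set to 0 and all others to 1.

```python
def indicator(subset, n):
    """Characteristic vector e_U of a set of variable indices."""
    return tuple(1 if i in subset else 0 for i in range(n))

def find_pi_in(oracle, n, U):
    """Greedy search for a prime implicant contained in U; returns None if there is none."""
    if oracle(indicator(U, n)) != 1:
        return None
    S = set(U)
    for i in sorted(U):
        if oracle(indicator(S - {i}, n)) == 1:
            S.discard(i)  # x_i is not needed to keep f at 1
    return S

def find_dual_pi_in(oracle, n, U):
    """Greedy search for a dual prime implicant contained in U; returns None if there is none."""
    everything = set(range(n))
    if oracle(indicator(everything - set(U), n)) != 0:
        return None
    T = set(U)
    for i in sorted(U):
        if oracle(indicator(everything - (T - {i}), n)) == 0:
            T.discard(i)  # x_i is not needed to keep f at 0
    return T

# Hypothetical example: f = x0 x1 OR x2 on three variables.
f = lambda x: int((x[0] and x[1]) or x[2])
print(find_pi_in(f, 3, {0, 1, 2}), find_pi_in(f, 3, {0, 1}), find_dual_pi_in(f, 3, {0, 1, 2}))
```

Which prime implicant is returned depends on the order in which the variables of U are examined; the sketch uses increasing index order.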
17. The next three exercises are due to [22].
Prove the following: Let f be a nonconstant read-once function, and let Y be
a nonempty subset of its variables. Then Y is the set of essential variables of
f if and only if for every variable xi ∈ Y , xi is contained in a prime implicant
of f that is a subset of Y , and xi is contained in a dual prime implicant of f
that is a subset of Y .
18. Let f be a read-once function over the set of variables N = {x1, x2, ..., xn}.
Prove the following: If S is a prime implicant of f containing the variable
xi, then (N \ S) ∪ {xi} contains a dual prime implicant of f, and any such
dual prime implicant contains xi. Dually, if T is a dual prime implicant of
f containing the variable xi, then (N \ T) ∪ {xi} contains a prime implicant
of f, and any such prime implicant contains xi.
19. Let f be a read-once function over the set of variables N = {x1, x2, ..., xn}.
Using Exercises 16, 17, and 18, prove that the following algorithm finds the
set Y of essential variables and can be implemented to run in O(n^2) time
using O(n^2) membership queries. In the process, for each variable xi in Y, it
generates a prime implicant P[i] and a dual prime implicant D[i] containing
xi.
Algorithm Find-Essential-Variables
Step 1: Set P[i] ← D[i] ← ∅ for i = 1, ..., n.
Step 2: Set W ← P ← Find-PI-In(N), and
        for each xj ∈ P, set P[j] ← P.
Step 3: While there exists xi ∈ N such that exactly one of P[i] and D[i]
        is ∅, do
        (3a) if D[i] = ∅, then set D ← Find-DualPI-In((N \ P[i]) ∪ {xi}),
             and for each xj ∈ D, set D[j] ← D, and set W ← W ∪ D.
        (3b) if P[i] = ∅, then set P ← Find-PI-In((N \ D[i]) ∪ {xi}), and
             for each xj ∈ P, set P[j] ← P, and set W ← W ∪ P.
        end-do
Step 4: Set Y ← W and output Y.
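The algorithm above can be tried out with the following Python sketch. It is an illustration under our own assumptions rather than the implementation of [22]: variables are indexed 0, ..., n−1, None plays the role of ∅ in P[i] and D[i], the helper routines are the hypothetical greedy searches in the spirit of Exercise 16, and an explicit error is raised if the oracle turns out not to behave like a nonconstant read-once function.

```python
def indicator(subset, n):
    return tuple(1 if i in subset else 0 for i in range(n))

def find_pi_in(oracle, n, U):
    """Greedy prime implicant inside U, or None."""
    if oracle(indicator(U, n)) != 1:
        return None
    S = set(U)
    for i in sorted(U):
        if oracle(indicator(S - {i}, n)) == 1:
            S.discard(i)
    return S

def find_dual_pi_in(oracle, n, U):
    """Greedy dual prime implicant inside U, or None."""
    everything = set(range(n))
    if oracle(indicator(everything - set(U), n)) != 0:
        return None
    T = set(U)
    for i in sorted(U):
        if oracle(indicator(everything - (T - {i}), n)) == 0:
            T.discard(i)
    return T

def find_essential_variables(oracle, n):
    """Return (Y, P, D) for a nonconstant read-once f queried through `oracle`."""
    N = set(range(n))
    P, D = [None] * n, [None] * n
    first = find_pi_in(oracle, n, N)           # Step 2
    W = set(first)
    for j in first:
        P[j] = first
    while True:                                # Step 3
        pending = [i for i in range(n) if (P[i] is None) != (D[i] is None)]
        if not pending:
            break
        i = pending[0]
        if D[i] is None:                       # (3a)
            found = find_dual_pi_in(oracle, n, (N - P[i]) | {i})
            target = D
        else:                                  # (3b)
            found = find_pi_in(oracle, n, (N - D[i]) | {i})
            target = P
        if found is None or i not in found:
            raise ValueError("oracle is not consistent with a read-once function")
        for j in found:
            target[j] = found
        W |= found
    return W, P, D                             # Step 4: Y = W

# Hypothetical example: f = x0 x1 OR x2, viewed as a function of four variables (x3 is inessential).
f = lambda x: int((x[0] and x[1]) or x[2])
Y, P, D = find_essential_variables(f, 4)
print(sorted(Y))
```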
20. Give a counter example to show that the statement in Exercise 17 may fail
when f is a positive Boolean function but is not read-once. Show that for
an arbitrary positive Boolean function f , identifying the set of essential
variables may require an exponential number of calls on a membership
oracle.
21. Let f be a read-once positive function of 2n variables with n prime implicants
xi yi for i ∈ N = {1, ..., n}. Prove that there is a partition N = I1 ∪ ... ∪ Ik
such that f = ⋁_{j=1}^{k} µj νj, where, for j ∈ {1, ..., k}, µj and νj are elementary
disjunctions, each containing exactly one of xi, yi for each i ∈ Ij, and no
other variables. (See Lemma 10.4.)
22. (From Lisa Hellerstein.) Consider the function
f1 = x1 ∨ x2 ∨ ... ∨ xn
and the class of functions F = {fA}, where A is an element in {0,1}^n having
at least two 1’s, and
fA(X) = 1 ⇐⇒ f1(X) = 1 and X ≠ A.
(a) Prove that the functions fA are not monotone.
(b) Prove that determining that a function is equal to f1 and not some fA
requires querying all possible A’s, and there are Ω(2^n) of them.

23. Prove directly that the normal form of any extensive game form is rectangular.
In other words, if two pairs of strategies (s^1_1, s^2_1) and (s^1_2, s^2_2) result in
the same play p, that is, p(s^1_1, s^2_1) = p(s^1_2, s^2_2) = p, then (s^1_1, s^2_2) and (s^1_2, s^2_1)
also result in the same play, that is, p(s^1_1, s^2_2) = p(s^1_2, s^2_1) = p.
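24. Verify that the following two game forms are tight:

M_4 = \begin{pmatrix} x_1 & x_2 & x_1 & x_2 \\ x_3 & x_4 & x_4 & x_3 \\ x_1 & x_4 & x_1 & x_5 \\ x_3 & x_2 & x_6 & x_2 \end{pmatrix},

M_5 = \begin{pmatrix} x_1 & x_1 & x_2 \\ x_1 & x_1 & x_3 \\ x_2 & x_4 & x_2 \end{pmatrix}.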

Questions for thought


25. To what extent is Lemma 10.1 true for all expressions, that is, not just the
read-once formula and the DNF formula of prime implicants?
26. The polynomial time complexity given in Theorem 10.5 can (almost cer-
tainly) be improved by a more careful choice of data structures. In this
direction, what is the complexity of calculating P0 and Pxi for all xi ?
Consider using bit vectors to represent sets of variables.
27. What can be said about the complexity of recognizing read-once functions if
the input formula is not a DNF, but some other type of representation, such
as a BDD or an arbitrary Boolean expression? In such a case, we might have
to pay a high price to convert the formula into a DNF or CNF and use the
GMR method of Section 10.5. When is there an efficient alternative way to
build the co-occurrence graph G(f ) directly from a representation of f that
is different from the DNF or CNF expression? What assumptions must be
made regarding f ? When can normality also be tested?
It is shown in [13] that if ψ is a nonmonotone DNF expression, the
read-once recognition problem is co-NP-complete. Furthermore, as we saw
in Theorem 10.9, the problem remains co-NP-complete even for arbitrary
positive expressions. How does this impact the answer?
28. What would happen if we attempted to apply the read-once oracle learning
method to a positive function f that was not read-once? In other words, in
the building of the co-occurrence graph (Step 1), how did we rely upon the
read-once assumption? Would the oracle fail, in which case we would know
that f is not read-once, or would it produce some other graph? What graph
would we get? When would it still yield the correct co-occurrence graph
G(f )? If so, we can easily test whether it is a cograph, but how can we test
whether the function is normal? For example, consider what would happen
for the functions f1 and f2 of Section 10.1. Could the oracle generate all
prime implicants? What would be the complexity?
29. The two game forms M4 and M5 in Exercise 24 represent the normal form
of some extensive games on graphs that have no terminal positions, and
their cycles are the outcomes of the game. Find two graphs that generate
M4 and M5 .
11
Characterizations of special classes by
functional equations
Lisa Hellerstein

The previous chapters covered a number of different classes of Boolean functions
and provided a variety of characterizations of those classes. Some of those
characterizations were in terms of functional equations or inequalities, such as
the characterization of Horn functions by the inequality f (XY ) ≤ f (X) ∨ f (Y )
in Chapter 6. This chapter presents similar characterizations of other Boolean
function classes.
This chapter also presents general results on characterizations of Boolean func-
tion classes by functional equations. Some important classes of Boolean functions
can be characterized by a single simple functional equation. Other classes can be
characterized by an infinite set of functional equations, but not by any finite set.
Finally, some classes cannot be characterized even by an infinite set of functional
equations.
Ekin, Foldes, Hammer, and Hellerstein [305] were the first to systematically
study the characterization of Boolean functions by functional equations and similar
logical expressions. Related results and characterizations, and extensions to non-
Boolean classes of functions, appeared in a number of papers (cf. [748, 485, 334,
751, 340, 217, 218, 219, 220]); several of these papers point out the connections
between equational characterizations of Boolean functions and Post’s classical
description of the classes of Boolean functions closed under compositions (see
[753, 752]).
Except where otherwise noted, the results in this chapter are from Ekin
et al. [305].

11.1 Characterizations of positive functions


To help motivate what follows, we begin with some simple characterizations.
Recall from Section 1.10 that for two points X = (x1 , x2 , . . . , xn ) and Y =
(y1 , y2 , . . . , yn ) in B n , we write X ≤ Y if xi ≤ yi for all i = 1, 2, . . . , n. Let X ∨ Y
denote (x1 ∨ y1, ..., xn ∨ yn), the bitwise disjunction of X and Y. Let X ∧ Y (also
written XY) and X̄ similarly denote the bitwise conjunction of X and Y and the
bitwise negation of X, respectively.
By Theorem 1.20, a Boolean function f on Bn is positive if and only if f (X) ≤
f (Y ) for all X, Y ∈ B n such that X ≤ Y . The following theorem gives two other
characterizations of positive functions:
Theorem 11.1. A Boolean function f on B n is positive if and only if the following
inequality is satisfied for all X, Y ∈ Bn :
f (X) ≤ f (X ∨ Y ) (11.1)
or, equivalently, if and only if the following inequality is satisfied for all X, Y ∈ Bn :

f (XY ) ≤ f (X). (11.2)


Proof. We prove that the statement holds for the first inequality. The second is
proved similarly.
Let f be a Boolean function defined on Bn . Since for all X, Y ∈ B n , X ≤ X ∨ Y ,
if f is positive then f satisfies f (X) ≤ f (X ∨ Y ).
Conversely, suppose f satisfies f (X) ≤ f (X ∨ Y ). Consider V , W ∈ Bn such
that V ≤ W . Since V ∨W = W , f satisfies f (V ) ≤ f (V ∨W ) = f (W ). Therefore,
f is positive. 
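
Theorem 11.1 can be checked exhaustively for small n. The following is a minimal sketch, not part of the original text: the helper names (points, vee, wedge, all_functions) are illustrative choices, and a Boolean function is stored as a dictionary from points of B^n to 0 or 1.

```python
from itertools import product

def points(n):
    """All points of B^n as 0/1 tuples."""
    return list(product((0, 1), repeat=n))

def leq(X, Y):
    """Componentwise order X <= Y."""
    return all(x <= y for x, y in zip(X, Y))

def vee(X, Y):
    """Bitwise disjunction."""
    return tuple(x | y for x, y in zip(X, Y))

def wedge(X, Y):
    """Bitwise conjunction."""
    return tuple(x & y for x, y in zip(X, Y))

def all_functions(n):
    """Every Boolean function on B^n, as a dict from points to values."""
    P = points(n)
    for values in product((0, 1), repeat=len(P)):
        yield dict(zip(P, values))

def is_positive(f, n):
    """Order-preserving definition: X <= Y implies f(X) <= f(Y)."""
    P = points(n)
    return all(f[X] <= f[Y] for X in P for Y in P if leq(X, Y))

def satisfies_11_1(f, n):
    """Inequality (11.1): f(X) <= f(X v Y) for all X, Y."""
    P = points(n)
    return all(f[X] <= f[vee(X, Y)] for X in P for Y in P)

def satisfies_11_2(f, n):
    """Inequality (11.2): f(XY) <= f(X) for all X, Y."""
    P = points(n)
    return all(f[wedge(X, Y)] <= f[X] for X in P for Y in P)

def check_theorem_11_1(n):
    """True if the three tests agree on every function on B^n."""
    return all(
        is_positive(f, n) == satisfies_11_1(f, n) == satisfies_11_2(f, n)
        for f in all_functions(n)
    )

print(check_theorem_11_1(2), check_theorem_11_1(3))
```

The check enumerates all 2^(2^n) functions, so it is only practical for n up to 3 or 4.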

From the foregoing, it is easy to show that the class of negative Boolean
functions is characterized by the inequalities
f (X ∨ Y ) ≤ f (X)
and
f (X) ≤ f (XY ),
which are opposite to the inequalities given for positive functions.
We will show below that similar functional equations and inequalities charac-
terize other interesting classes of Boolean functions.

11.2 Functional equations


In this section, we formally define what it means to characterize a class of Boolean
functions using functional equations or inequalities.
We first give preliminary definitions and notation. Let m, n > 0. A Boolean
expression φ on B m can be interpreted as representing a function from (B n )m to
Bn , as follows.
Definition 11.1. Let φ(x1 , . . . , xm ) be a Boolean expression. Let n ≥ 1, and let
Y1 . . . , Ym be elements of B n . We define φ(Y1 , . . . , Ym ) to be the vector obtained
by applying φ componentwise to the entries of Y1 , . . . , Ym . More formally, letting
Yi = (yi1 , . . . , yin ), for 1 ≤ i ≤ m, we define φ(Y1 , . . . , Ym ) to be equal to
(φ(y1,1 , . . . , ym,1 ), φ(y1,2 , . . . , ym,2 ), . . . , φ(y1,n , . . . , ym,n )).
The expression φ(Y1, ..., Ym) thus represents a function from (B^n)^m to B^n. We call
this function the interpretation of φ in B^n.
Example 11.1. Let
\[
\phi_1(Y_1, Y_2, Y_3) = Y_1 Y_2 \vee \overline{Y}_1 Y_3, \qquad \phi_2(Y_1) = 0.
\]
Let Y1 = (1, 0), Y2 = (0, 1) and Y3 = (1, 1). Then,
\[
\begin{aligned}
\phi_1(Y_1, Y_2, Y_3) &= \phi_1((1,0), (0,1), (1,1)) \\
&= (1,0)(0,1) \vee \overline{(1,0)}\,(1,1) \\
&= (0,0) \vee (0,1)(1,1) \\
&= (0,1)
\end{aligned}
\]
and φ2(1, 0) = (0, 0).
As is standard with functions taking a single vector-valued input, we write, for
example, φ2 (1, 0) rather than φ2 ((1, 0)).
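
The componentwise interpretation of Definition 11.1 can be sketched in a few lines. This is an illustrative sketch only: the function name interpret is an assumption, and the scalar expressions are written as ordinary Python functions on 0/1 values, with the first variable of φ1 complemented in its second term as in Example 11.1.

```python
def interpret(phi, *vectors):
    """Apply the scalar expression phi componentwise to equal-length 0/1 vectors."""
    return tuple(phi(*column) for column in zip(*vectors))

# phi1(y1, y2, y3) = y1 y2 v (not y1) y3, and phi2(y1) = 0, as in Example 11.1.
phi1 = lambda y1, y2, y3: (y1 & y2) | ((1 - y1) & y3)
phi2 = lambda y1: 0

Y1, Y2, Y3 = (1, 0), (0, 1), (1, 1)
print(interpret(phi1, Y1, Y2, Y3))  # (0, 1)
print(interpret(phi2, Y1))          # (0, 0)
```

Here zip(*vectors) pairs up the jth entries of Y1, ..., Ym, which is exactly the tuple (y1j, ..., ymj) to which φ is applied in the definition.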
Given a Boolean function g on B n and a Boolean expression φ(Y1 , . . . , Ym ),
the expression g(φ(Y1 , . . . , Ym )) denotes the composition of φ, interpreted in B n ,
and g. This composite function is a map from (B n )m to B.
We now give a formal definition of a functional equation.
Definition 11.2. A functional equation in the variables Y1 , . . . , Ym and the function
symbol f is an equation of the form
\[
h_1\bigl(f(\tau_1(Y_1,\dots,Y_m)), \dots, f(\tau_s(Y_1,\dots,Y_m))\bigr)
= h_2\bigl(f(\tau'_1(Y_1,\dots,Y_m)), \dots, f(\tau'_t(Y_1,\dots,Y_m))\bigr), \tag{11.3}
\]
where m, s, t ≥ 1, h1 is a Boolean expression on B^s, h2 is a Boolean expression on
B^t, and each τi and τ′i is a Boolean expression on B^m.
We refer to the variables Y1 , . . . , Ym as the vector variables of the equation.
Functional inequalities are defined analogously to functional equations.
Example 11.2. Consider the functional equation
h1(f(τ1(Y1, Y2)), f(τ2(Y1, Y2))) = h2(f(τ′1(Y1, Y2))),
where h1(x1, x2) = x1 ∨ x2, h2(x1) = x1, τ1(x1, x2) = x1, τ2(x1, x2) = x1 ∨ x2, and
τ′1(x1, x2) = x1 ∧ x2.
We write this more succinctly as
f (Y1 ) ∨ f (Y1 ∨ Y2 ) = f (Y1 Y2 ).

Consider a functional equation C = D in the variables Y1, ..., Ym and the function
symbol f, as in Equation (11.3). By replacing the function symbol f in C by
a particular function g on B^n (for some n ≥ 0), and interpreting the τi in C in B^n,
we obtain an expression representing a Boolean function on (B^n)^m. We denote this
function by Cg(Y1, ..., Ym). The function Dg is defined analogously.
Example 11.3. Let
\[
C = f(Y_1 \vee Y_3) \vee \overline{f(Y_2)}.
\]
Let g be the function on B^2 such that g(x1, x2) = x1 x2. Then,
\[
C_g(Y_1, Y_2, Y_3) = g(Y_1 \vee Y_3) \vee \overline{g(Y_2)}.
\]
The value of Cg((0, 1), (1, 0), (0, 0)) can be computed as follows:
\[
C_g((0,1), (1,0), (0,0)) = g((0,1) \vee (0,0)) \vee \overline{g(1,0)}
= g(0,1) \vee \overline{g(1,0)} = 0 \vee \overline{0} = 1.
\]

We say that a particular Boolean function g on B^n satisfies a functional equation
C = D in the variables Y1, ..., Ym and function symbol f if, for all Y1, ..., Ym ∈ B^n,
\[
C_g(Y_1, \dots, Y_m) = D_g(Y_1, \dots, Y_m).
\]
Otherwise, we say that g falsifies the equation.
Example 11.4. Consider the functional equation
\[
f(Y_1 \vee Y_3) \vee \overline{f(Y_2)} = f(Y_1 Y_2).
\]
Let C denote the left-hand side of this equation and D the right-hand side. Note
that C is the same as in the previous example.
Also as in the previous example, let g be the function on B^2 such that g(x1, x2) =
x1 x2, and let Y1 = (0, 1), Y2 = (1, 0), Y3 = (0, 0). We showed that Cg(Y1, Y2, Y3) = 1.
For the same values of the Yi's,
\[
D_g(Y_1, Y_2, Y_3) = g((0,1)(1,0)) = g(0,0) = 0.
\]

Thus g falsifies the above equation. 
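
The notions of satisfying and falsifying a functional equation translate directly into a brute-force test over all assignments of the vector variables. The sketch below is an illustration under stated assumptions: the names satisfies and counterexample are hypothetical, each side of the equation is passed as a Python function of g and the vector variables, and the equation used is the one from Example 11.2, f(Y1) ∨ f(Y1 ∨ Y2) = f(Y1 Y2).

```python
from itertools import product

def points(n):
    """All points of B^n as 0/1 tuples."""
    return list(product((0, 1), repeat=n))

def vee(X, Y):
    return tuple(x | y for x, y in zip(X, Y))

def wedge(X, Y):
    return tuple(x & y for x, y in zip(X, Y))

def counterexample(C, D, g, n, m):
    """Return some (Y1, ..., Ym) in (B^n)^m with C_g != D_g, or None if g satisfies C = D."""
    for Ys in product(points(n), repeat=m):
        if C(g, *Ys) != D(g, *Ys):
            return Ys
    return None

def satisfies(C, D, g, n, m):
    return counterexample(C, D, g, n, m) is None

# Equation of Example 11.2: f(Y1) v f(Y1 v Y2) = f(Y1 Y2).
C = lambda g, Y1, Y2: g(Y1) | g(vee(Y1, Y2))
D = lambda g, Y1, Y2: g(wedge(Y1, Y2))

const0 = lambda X: 0
const1 = lambda X: 1
conj = lambda X: X[0] & X[1]   # g(x1, x2) = x1 x2

print(satisfies(C, D, const0, 2, 2))   # True
print(satisfies(C, D, conj, 2, 2))     # False
print(counterexample(C, D, conj, 2, 2))
```

For this particular equation, setting Y2 = (0, ..., 0) gives f(Y1) = f(0, ..., 0) for every Y1, so only the constant functions satisfy it.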

Definition 11.3. A (possibly infinite) set I of functional equations characterizes a


class K of Boolean functions if K consists precisely of the Boolean functions that
satisfy all equations in I .
Our primary focus is on characterization by functional equations of the form
C = D. However, it is sometimes more convenient to consider characterizations
by functional inequalities C ≤ D.
Theorem 11.2. The following two equations each characterize the same set of
Boolean functions as the functional inequality C ≤ D:
• C ∨ D = D.
• CD = C.

Proof. Follows directly from the fact that C and D are both Boolean-valued. 

Example 11.5. By Theorem 11.1, the inequality f(X) ≤ f(X ∨ Y) characterizes
the class of positive Boolean functions. Therefore, so does each of the following
functional equations:
f (X) ∨ f (X ∨ Y ) = f (X ∨ Y )
and
f (X)f (X ∨ Y ) = f (X).

An interesting alternative to using functional equations or inequalities is to
instead use relations called Boolean constraints. These relations were introduced
by Pippenger, who showed that a class of Boolean functions can be characterized
by functional equations if and only if it can be characterized by a set of Boolean
constraints [748] (cf. Exercise 4).

11.3 Characterizations of particular classes


In this section, we present and discuss functional equations and inequalities
characterizing some important classes of Boolean functions.

11.3.1 Horn functions


In Chapter 6, Corollary 6.2, the following inequality was shown to characterize
the class of Horn functions:

f (XY ) ≤ f (X) ∨ f (Y ). (11.4)


This inequality for Horn functions is very similar to the inequality f (XY ) ≤
f (X)f (Y ). The latter inequality characterizes the positive functions. This can
be shown by combining the inequality f (XY ) ≤ f (X), previously shown to
characterize positive functions (in Theorem 11.1), with the equivalent inequality
f (XY ) ≤ f (Y ).
Recall that a Boolean function f is co-Horn if the function g(X) = f(X̄) is
Horn. From the inequality characterizing Horn functions (applied to g at the points
X̄ and Ȳ, whose conjunction is the bitwise negation of X ∨ Y), it is easy to show that
the following inequality characterizes the co-Horn functions:

f (X ∨ Y ) ≤ f (X) ∨ f (Y ). (11.5)
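
Inequalities (11.4) and (11.5) can be tested by brute force. The sketch below is illustrative only: the two example functions and all helper names are assumptions made here, not taken from the text. The helper flip builds g(X) = f(X̄), so that the co-Horn test on f can be compared with the Horn test on g.

```python
from itertools import product

def points(n):
    return list(product((0, 1), repeat=n))

def vee(X, Y):
    return tuple(x | y for x, y in zip(X, Y))

def wedge(X, Y):
    return tuple(x & y for x, y in zip(X, Y))

def satisfies_horn(f, n):
    """Inequality (11.4): f(XY) <= f(X) v f(Y)."""
    P = points(n)
    return all(f(wedge(X, Y)) <= (f(X) | f(Y)) for X in P for Y in P)

def satisfies_co_horn(f, n):
    """Inequality (11.5): f(X v Y) <= f(X) v f(Y)."""
    P = points(n)
    return all(f(vee(X, Y)) <= (f(X) | f(Y)) for X in P for Y in P)

def flip(f):
    """g(X) = f(complement of X)."""
    return lambda X: f(tuple(1 - x for x in X))

# Hypothetical examples: x1 x2 (not x3) v (not x1), and (not x1)(not x2).
f_a = lambda X: (X[0] & X[1] & (1 - X[2])) | (1 - X[0])
f_b = lambda X: (1 - X[0]) & (1 - X[1])

print(satisfies_horn(f_a, 3))      # True
print(satisfies_horn(f_b, 2))      # False: X = (1,0), Y = (0,1) gives f(XY) = 1
print(satisfies_co_horn(f_b, 2))   # True
```

The violation for f_b is easy to see by hand: f_b(1,0) = f_b(0,1) = 0 while f_b(0,0) = 1.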

11.3.2 Linear functions and related classes


In Chapter 1 (Definition 1.12), the degree of a DNF φ was defined to be the
maximum degree (number of literals) in any term of φ. We now define the degree
of a Boolean function.
Definition 11.4. The degree of a Boolean function f is the degree of the complete
DNF of f . Equivalently, it is the maximum degree of any prime implicant of f . A
Boolean function is called linear if its degree is at most 1.
If a Boolean function is representable by a DNF of degree 1, then all of its prime
implicants have degree 1. Therefore, a Boolean function is linear if and only if it
can be represented by a DNF of degree at most 1.
We discuss functions of degree k ≥ 2 in the next section.
Polar functions were defined in Chapter 5, Section 5.3. A Boolean function is
polar if it is representable by a DNF in which no term contains both a complemented
and an uncomplemented variable. Equivalently, a Boolean function f is polar if
f = g ∨ h for some positive function g and some negative function h.
Submodular functions were defined in Chapter 6, Section 6.9 to be the functions
satisfying the inequality
f (X) ∨ f (Y ) ≥ f (X ∨ Y ) ∨ f (XY ). (11.6)
Supermodular functions are defined by reversing the inequality for submodular
functions.
Definition 11.5. A Boolean function is supermodular if it satisfies the inequality
f (X) ∨ f (Y ) ≤ f (X ∨ Y ) ∨ f (XY ). (11.7)
In fact, the class of supermodular functions is identical to the class of polar
functions.
Theorem 11.3. A Boolean function f is polar if and only if it is supermodular.
Proof. Suppose f is a polar function on Bn . Let f = g ∨ h, where g is positive
and h is negative. Suppose X, Y ∈ B n are such that f (X) ∨ f (Y ) = 1. Assume,
without loss of generality, that f (X) = 1. Then, X satisfies either g or h, or both.
If X satisfies g, then X ∨ Y must also satisfy g because g is positive, and hence,
f (X ∨ Y ) = 1. If X satisfies h, then XY must satisfy h because h is negative, and
hence, f (XY ) = 1. Therefore, f is supermodular.
Conversely, suppose f is a supermodular function on Bn . Define the following
sets:
S = {X ∈ Bn | f (X) = 1 and for all Y ∈ Bn , X ≤ Y ⇒ f (Y ) = 1},
T = {X ∈ Bn | f (X) = 1 and for all Y ∈ Bn , Y ≤ X ⇒ f (Y ) = 1}.
Let g be the function on Bn such that g(X) = 1 if and only if X ∈ S, and let
h be the function on Bn such that h(X) = 1 if and only if X ∈ T . Clearly g is
positive, h is negative, and g ∨ h ≤ f . We will show that f = g ∨ h. Suppose not.
Then, there exist points P , Q, R ∈ Bn such that f (Q) = 1, f (P ) = f (R) = 0, and
P ≤ Q ≤ R. Define Z = P ∨ Q̄R. Since P ≤ Q ≤ R, ZQ = P and Z ∨ Q = R.
But then f (Z) ∨ f (Q) = 1 and f (ZQ) ∨ f (Z ∨ Q) = 0, contradicting that f is
supermodular. Therefore f = g ∨ h, and thus, f is polar. 
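
The construction in the proof of Theorem 11.3 can be run on every function of a small number of variables. The following sketch is an illustration, with helper names chosen here: it builds the sets S and T from the proof, tests whether f = g ∨ h, and compares the outcome with inequality (11.7).

```python
from itertools import product

def points(n):
    return list(product((0, 1), repeat=n))

def leq(X, Y):
    return all(x <= y for x, y in zip(X, Y))

def vee(X, Y):
    return tuple(x | y for x, y in zip(X, Y))

def wedge(X, Y):
    return tuple(x & y for x, y in zip(X, Y))

def is_supermodular(f, n):
    """Inequality (11.7): f(X) v f(Y) <= f(X v Y) v f(XY)."""
    P = points(n)
    return all((f[X] | f[Y]) <= (f[vee(X, Y)] | f[wedge(X, Y)]) for X in P for Y in P)

def is_polar(f, n):
    """Test f = g v h with g, h defined from the sets S and T of the proof."""
    P = points(n)
    S = {X for X in P if all(f[Y] for Y in P if leq(X, Y))}  # all points above X are true
    T = {X for X in P if all(f[Y] for Y in P if leq(Y, X))}  # all points below X are true
    return all(f[X] == int(X in S or X in T) for X in P)

def check_theorem_11_3(n):
    P = points(n)
    return all(
        is_polar(dict(zip(P, vals)), n) == is_supermodular(dict(zip(P, vals)), n)
        for vals in product((0, 1), repeat=len(P))
    )

print(check_theorem_11_3(3))
```

Using S and T is enough to decide polarity: if f = g′ ∨ h′ for any positive g′ and negative h′, then the true points of g′ lie in S and those of h′ lie in T.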
The inequalities characterizing polar and submodular functions yield an
equation characterizing linear functions.
Theorem 11.4. A Boolean function is linear if and only if it satisfies the functional
equation
f (X) ∨ f (Y ) = f (X ∨ Y ) ∨ f (XY ).
Proof. A Boolean function is linear if and only if it is polar, Horn, and co-Horn.
In Chapter 6, Section 6.9, it was shown that a Boolean function is submodular if
and only if it is both Horn and co-Horn. Using these two facts, Theorem 11.4 fol-
lows immediately from Theorem 11.3 and the functional equation for submodular
functions. 
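
As a small sanity check of Theorem 11.4, one can count the functions on B^n that satisfy the equation. The sketch below uses illustrative helper names. For n = 2, the functions of degree at most 1 are the two constants, the four literals, and the four disjunctions of a literal in x1 with a literal in x2, which gives 10 functions.

```python
from itertools import product

def points(n):
    return list(product((0, 1), repeat=n))

def vee(X, Y):
    return tuple(x | y for x, y in zip(X, Y))

def wedge(X, Y):
    return tuple(x & y for x, y in zip(X, Y))

def satisfies_linear_equation(f, n):
    """f(X) v f(Y) = f(X v Y) v f(XY) for all X, Y in B^n."""
    P = points(n)
    return all((f[X] | f[Y]) == (f[vee(X, Y)] | f[wedge(X, Y)]) for X in P for Y in P)

def count_satisfying(n):
    """Number of Boolean functions on B^n that satisfy the equation."""
    P = points(n)
    return sum(
        satisfies_linear_equation(dict(zip(P, vals)), n)
        for vals in product((0, 1), repeat=len(P))
    )

print(count_satisfying(2))  # 10
```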

11.3.3 Quadratic and degree k functions


Quadratic functions were defined previously in Chapter 5 as the Boolean func-
tions representable by DNFs of degree at most 2. By Theorem 5.1, if a function
is quadratic, then all its prime implicants have degree at most 2. Hence the
quadratic functions are precisely the functions of degree at most 2, in the sense of
Definition 11.4.
In Chapter 1, Section 1.11, we defined Fk to be the class of Boolean functions
representable by DNFs of degree at most k. For k = 1 and k = 2, Fk is also the class
of functions of degree at most k. However, for k ≥ 3, Fk is not the class of Boolean
functions of degree at most k (in the sense of Definition 11.4). For example, the
function f(x1, x2, x3, x4, x5) = x1x2x3 ∨ x̄3x4x5 is representable by the given DNF,
which has degree 3, but it has a prime implicant of degree greater than 3, namely
x1x2x4x5 (the consensus of the two terms on x3).
As mentioned in Section 5.3.2 of Chapter 5, an early functional characterization
of quadratic Boolean functions was given by Schaefer [807]. This characterization
was rediscovered (in a slightly different form) by Ekin et al. [305], and we give
their proof here.
Theorem 11.5. Quadratic Boolean functions are characterized by the inequality

f (XY ∨ XZ ∨ Y Z) ≤ f (X) ∨ f (Y ) ∨ f (Z). (11.8)

Proof. Suppose f is quadratic. Let Q, R, S be points such that

f (Q) ∨ f (R) ∨ f (S) = 0.

We will show that
\[
f(QR \vee RS \vee QS) = 0. \tag{11.9}
\]
Let P be a prime implicant of f. The prime implicant P contains at most two
literals, and Q, R, and S must each falsify at least one literal of P. Therefore, there
exists a literal z of P that is falsified by at least two of Q, R, and S. Without loss of
generality, assume that Q and R both falsify z. Then, whether z is complemented
or not, QR ∨ RS ∨ QS also falsifies z, and hence, P as well. This implies (11.9)
and completes the proof of inequality (11.8) for quadratic functions.
Conversely, suppose that f is not quadratic, that is, some prime implicant P of
f has degree at least three. Then P can be written as

P1 P2 P3 ,

where each factor Pi is an elementary conjunction with at least one variable, but
no two of the three factors P1 , P2 , P3 have a common variable. Define elementary
conjunctions
R1 = P1 P3 , R2 = P2 P3 , R3 = P1 P2 .
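Since P is a prime implicant, none of these Ri is an implicant of f; that is, there
are points X, Y, Z such that
\[
R_1(X) = R_2(Y) = R_3(Z) = 1 \quad\text{and}\quad f(X) = f(Y) = f(Z) = 0.
\]
Each factor Pi is satisfied by two of the three points, so XY ∨ XZ ∨ YZ satisfies P
and f(XY ∨ XZ ∨ YZ) = 1. These points therefore violate (11.8).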
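
The argument XY ∨ XZ ∨ YZ of inequality (11.8) is the bitwise majority of the three points, which makes the inequality easy to test. The sketch below is illustrative: the two example functions are chosen here, not taken from the text. The function x1x2x3 has a prime implicant of degree 3, and the violating triple is the one produced by the converse part of the proof with P1 = x1, P2 = x2, P3 = x3.

```python
from itertools import product

def points(n):
    return list(product((0, 1), repeat=n))

def majority(X, Y, Z):
    """Bitwise XY v XZ v YZ."""
    return tuple((x & y) | (x & z) | (y & z) for x, y, z in zip(X, Y, Z))

def satisfies_11_8(f, n):
    """Inequality (11.8): f(XY v XZ v YZ) <= f(X) v f(Y) v f(Z)."""
    P = points(n)
    return all(
        f(majority(X, Y, Z)) <= (f(X) | f(Y) | f(Z))
        for X in P for Y in P for Z in P
    )

quadratic = lambda X: (X[0] & X[1]) | ((1 - X[1]) & X[2])   # x1 x2 v (not x2) x3
cubic = lambda X: X[0] & X[1] & X[2]                        # x1 x2 x3

print(satisfies_11_8(quadratic, 3))   # True
print(satisfies_11_8(cubic, 3))       # False

# Points satisfying R1 = x1 x3, R2 = x2 x3, R3 = x1 x2 on which cubic is 0.
X, Y, Z = (1, 0, 1), (0, 1, 1), (1, 1, 0)
print(majority(X, Y, Z), cubic(majority(X, Y, Z)))   # (1, 1, 1) 1
```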

Although linear and quadratic functions can be characterized by a functional
equation, we will show in Section 11.4 that for k > 2, there is no set of functional
equations that characterizes the functions of degree at most k. However, by
generalizing the equation for quadratic functions, we obtain the following result for
positive functions:
Theorem 11.6. Let f be a positive Boolean function and let k ≥ 2. Then f has
degree at most k if and only if f satisfies the inequality
\[
f\Bigl(\bigvee_{i=1}^{k+1} \bigwedge_{j \ne i} Y_j\Bigr) \le f(Y_1) \vee \dots \vee f(Y_{k+1}). \tag{11.10}
\]

Proof. Let f be defined on B^n. First, we show that if f is of degree at most k, then
(11.10) holds. Suppose

f(Y1) = ... = f(Yk+1) = 0

for some Y1, ..., Yk+1 ∈ B^n. Let P be a prime implicant of f. Then P contains
at most k literals. Each Yi must falsify at least one literal of P, and hence, there
exists a literal z of P that is falsified by at least two of Y1, ..., Yk+1. Without loss of
generality, assume Y1 and Y2 falsify z. Since P is positive, z is an uncomplemented
literal. Thus, the variable z takes the value 0 in Y1 and Y2. Then, z also takes the
value 0 in
\[
\bigvee_{i=1}^{k+1} \bigwedge_{j \ne i} Y_j,
\]
and hence, so does P. It follows that the left-hand side of Equation (11.10) is 0.
Conversely, suppose that some prime implicant P of f has degree at least k + 1.
Then, P can be written as
P1 ... Pk+1,
where each factor Pi is an elementary conjunction with at least one variable, but
no two factors have a common variable. For each i = 1, ..., k + 1, let
\[
R_i = \bigwedge_{j \ne i} P_j.
\]
Since P is a prime implicant, there are points Y1, ..., Yk+1 such that
\[
R_1(Y_1) = \dots = R_{k+1}(Y_{k+1}) = 1 \quad\text{and}\quad f(Y_1) = \dots = f(Y_{k+1}) = 0.
\]
These points violate (11.10).

For k ≥ 2, the positive functions of degree at most k can be characterized by the
inequality for positive functions together with the inequality in Theorem 11.6.
For arbitrary (i.e., not necessarily positive) Boolean functions, the inequality in
Theorem 11.6 is a sufficient but not necessary condition for the function to have
degree at most k.

11.4 Conditions for characterization


Having given explicit characterizations of a number of particular classes of
Boolean functions, we now address the following general question: Which classes
of Boolean functions can be characterized by a set of functional equations?
Our answer to this question involves two operations on Boolean functions,
identification of variables and addition of inessential variables.

Definition 11.6. Let f be a Boolean function on B^n. Let m ≤ n and let r :
{1, ..., n} → {1, ..., m} be a surjective function. We say that the Boolean function
g on B^m defined by g(x1, ..., xm) = f(x_{r(1)}, ..., x_{r(n)}) is produced from f by
identification of variables. We call r the identification map that produces g from
f. If r is a bijection, we say that g is obtained from f by permutation of variables.
Let J = {(x_{r(1)}, ..., x_{r(n)}) | (x1, ..., xm) ∈ B^m}. Let s be the bijection from J to
B^m such that for all (x1, ..., xm) ∈ B^m, s(x_{r(1)}, ..., x_{r(n)}) = (x1, ..., xm). We call s
the vector map associated with r. Clearly, for all X ∈ J, f(X) = g(s(X)).

Definition 11.7. Let f be a Boolean function on B^n. Let k > 0. Then, the function
g on B^{n+k} defined by g(x1, ..., xn+k) = f(x1, ..., xn) is said to be produced from
f by addition of inessential variables.

If f and g are such that g is produced from f by identification map r, and φ is
a Boolean formula representing f, then one can produce a formula representing g
by simply replacing each variable xi in φ by x_{r(i)}.

Example 11.6. Let f(x1, x2, x3) = x1x2 ∨ x1x̄3. Let r : {1, 2, 3} → {1, 2} be such that
r(1) = r(3) = 2 and r(2) = 1. Then g(x1, x2) = x1x2 ∨ x2x̄2 = x1x2 is produced from
f by the identification map r. The function h(x1, x2, x3, x4) = x1x2 can be produced
from g by addition of inessential variables. The function h′(x1, x2, x3, x4) = x1x4
can be produced from h by identification of variables (in fact, by permutation of
variables).
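
The two operations of Definitions 11.6 and 11.7 can be sketched as higher-order functions. This is an illustration under stated assumptions: the names identify and add_inessential are chosen here, the map r is passed as the list [r(1), ..., r(n)] with 1-based values, and the permutation used for the last step of Example 11.6 (sending positions 1, 2, 3, 4 to 1, 4, 2, 3) is one of several that work.

```python
from itertools import product

def identify(f, r, m):
    """g(x1..xm) = f(x_{r(1)}, ..., x_{r(n)}), with r given as [r(1), ..., r(n)]."""
    assert set(r) == set(range(1, m + 1)), "r must map onto {1, ..., m}"
    return lambda *x: f(*(x[ri - 1] for ri in r))

def add_inessential(f, n, k):
    """g(x1..x_{n+k}) = f(x1..xn); the last k arguments are ignored."""
    return lambda *x: f(*x[:n])

# Example 11.6: f = x1 x2 v x1 (not x3), with r(1) = r(3) = 2 and r(2) = 1.
f = lambda x1, x2, x3: (x1 & x2) | (x1 & (1 - x3))
g = identify(f, [2, 1, 2], 2)              # g(x1, x2) = f(x2, x1, x2) = x1 x2
h = add_inessential(g, 2, 2)               # h(x1, x2, x3, x4) = x1 x2
h_prime = identify(h, [1, 4, 2, 3], 4)     # h'(x1, ..., x4) = x1 x4

print(all(g(a, b) == (a & b) for a, b in product((0, 1), repeat=2)))
print(all(h_prime(*x) == (x[0] & x[3]) for x in product((0, 1), repeat=4)))
```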

The importance of the operations of identification of variables and addition of
inessential variables can be seen in the following theorem:

Theorem 11.7. If a class K of Boolean functions can be characterized by a set
of functional equations, then K is closed under identification of variables and
addition of inessential variables.

Proof. Let C = D be a functional equation. Let f be a Boolean function on B^n
that satisfies C = D.
Consider a Boolean function f′ that is produced from f by addition of
inessential variables. Clearly, f′ also satisfies C = D.
Now consider a Boolean function f′ that is produced from f by identification
of variables using an identification map r. Let J = {(x_{r(1)}, ..., x_{r(n)}) |
(x1, ..., xm) ∈ B^m}. Let s : J → B^m be the vector map associated with r. Consider
any X ∈ J. Clearly, f(X) = f′(s(X)). Since J ⊆ B^n, Cf(X) = Df(X), and hence,
Cf′(s(X)) = Df′(s(X)). Since s is surjective, it follows that f′ satisfies C = D.

We can use Theorem 11.7 to prove that certain classes of Boolean functions
cannot be characterized by functional equations.

Theorem 11.8. The following classes of functions do not have a characterization
by a set of functional equations:
(a) Monotone functions.
(b) Functions of degree at most k, for all k ≥ 3.
(c) Shellable functions.
(d) Regular functions.
(e) Read-once functions.

Proof. We show that each of these classes is not closed under identification of
variables.
Monotone functions: Let f(x1, x2, x3, x4) = x1x̄2 ∨ x̄3x4, and apply the identification
map r : {1, 2, 3, 4} → {1, 2} such that r(1) = 1, r(2) = 2, r(3) = 1, and r(4) = 2
to yield f′(x1, x2) = x1x̄2 ∨ x̄1x2. The function f′ is neither positive nor negative
in x1 and x2, and hence, it is not monotone (i.e., not unate).
Functions of degree at most k, for all k ≥ 3: Let f(x1, ..., x_{2k}) = x1x2x3 ... xk
∨ x̄_{k+1} ... x̄_{2k}, and apply the identification map r : {1, ..., 2k} → {1, ..., 2k − 1}
such that r(i) = i for all i < 2k, and r(2k) = 1. The resulting function has
x2x3 ... xk x̄_{k+1} ... x̄_{2k−1} as a prime implicant, and hence, it is not of degree at
most k.
Shellable functions: (The following proof was provided by Yves Crama.) The
function f = x1x2 ∨ x1x3x5 ∨ x2x3x5 ∨ x3x4x5 is shellable, since it is represented
by the orthogonal DNF φ = x1x2 ∨ x1x̄2x3x5 ∨ x̄1x2x3x5 ∨ x̄1x̄2x3x4x5.
Now, identify variables x4 and x5 in f. This yields the function g = x1x2 ∨
x1x3x4 ∨ x2x3x4 ∨ x3x4x4 = x1x2 ∨ x3x4, which is not shellable. Therefore, the
class of shellable functions is not characterizable by functional equations.
Regular functions: The function f(x1, x2, x3, x4, x5, x6) = x4x6 ∨ x5x6 ∨ x2x4x5 ∨
x3x4x5 ∨ x1x2x3x5 ∨ x1x2x3x6 is regular because (using the notation from
Chapter 8) x1 ≺f x2 ≈f x3 ≺f x4 ≺f x5 ≺f x6. Applying the identification map
r : {1, ..., 6} → {1, ..., 5} such that r(1) = 1, r(2) = 2, and r(i) = i − 1 for all i ≥ 3,
yields the function

f′(x1, x2, x3, x4, x5) = x3x5 ∨ x4x5 ∨ x2x3x4 ∨ x1x2x4 ∨ x1x2x5.

The function f′ is not regular because

f′(0, 0, 1, 0, 1) = 1
f′(0, 1, 0, 0, 1) = 0
f′(1, 1, 0, 1, 0) = 1
f′(1, 0, 1, 1, 0) = 0,

meaning that x2 and x3 are not comparable.


Read-once functions: Left as an end-of-chapter exercise for the reader
(Exercise 2). 
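
Two of the counterexamples in this proof are small enough to recompute mechanically. The sketch below is illustrative only, with a helper name (identify) chosen here and the identification map passed as the list [r(1), ..., r(n)]. It applies the stated maps to the monotone and regular examples and evaluates the results.

```python
from itertools import product

def identify(f, r):
    """g(x1..xm) = f(x_{r(1)}, ..., x_{r(n)}), with r given as [r(1), ..., r(n)]."""
    return lambda *x: f(*(x[ri - 1] for ri in r))

# Monotone example: x1 (not x2) v (not x3) x4 with r = (1, 2, 1, 2).
f_unate = lambda x1, x2, x3, x4: (x1 & (1 - x2)) | ((1 - x3) & x4)
g_unate = identify(f_unate, [1, 2, 1, 2])
print([g_unate(a, b) for a, b in product((0, 1), repeat=2)])   # [0, 1, 1, 0]

# Regular example with r(1) = 1, r(2) = 2, r(i) = i - 1 for i >= 3.
f_reg = lambda x1, x2, x3, x4, x5, x6: (
    (x4 & x6) | (x5 & x6) | (x2 & x4 & x5) | (x3 & x4 & x5)
    | (x1 & x2 & x3 & x5) | (x1 & x2 & x3 & x6)
)
g_reg = identify(f_reg, [1, 2, 2, 3, 4, 5])
for point in [(0, 0, 1, 0, 1), (0, 1, 0, 0, 1), (1, 1, 0, 1, 0), (1, 0, 1, 1, 0)]:
    print(point, g_reg(*point))
```

The first output is the truth table of x1 ⊕ x2, which is not unate. The four values printed for the second function are 1, 0, 1, 0, matching the table in the proof.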

Surprisingly, closure under identification of variables and addition of inessential
variables is not just a necessary condition for a class of functions to have a
characterization by functional equations; it is also a sufficient condition.
Theorem 11.9. Let K be a class of Boolean functions that is closed under
identification of variables and addition of inessential variables. Then K can be
characterized by a (possibly infinite) set of functional equations.
Proof. This result was first shown by Ekin et al. [305]. The following version of
the proof uses simplifications due to Pippenger [748].
Let G be the set of Boolean functions not in K. For each g ∈ G, we will construct
a functional equation Ig such that Ig is falsified by g and satisfied by every function
in K. The set of equations {Ig | g ∈ G} clearly characterizes K.
Let g ∈ G be defined on B^m. The construction of Ig is as follows: Let t = 2^m.
Let A be the t × m binary matrix whose rows are the t binary vectors of length
m, listed in lexicographic order. Let A1 , . . . , At denote the rows of A. Let col(A)
denote the set of column vectors of A. All the columns are distinct.
For i ∈ {1, ..., t}, let h_i be the Boolean function on B^t such that, for all
(x1, ..., xt) ∈ B^t, h_i(x1, ..., xt) = x_i if the transpose of (x1, ..., xt) is in col(A),
and h_i(x1, ..., xt) = 0 otherwise. Similarly, for i ∈ {1, ..., t}, let h_{t+i} be the Boolean
function on B^t such that, for all (x1, ..., xt) ∈ B^t, h_{t+i}(x1, ..., xt) = x_i if the transpose
of (x1, ..., xt) is in col(A), and h_{t+i}(x1, ..., xt) = 1 otherwise. For i ∈ {1, ..., 2t},
let φ_i(x1, ..., xt) be a Boolean expression representing h_i.
For all n ≥ 0, define the function h_i^n : (B^n)^t → B^n as follows: For all
X1, ..., Xt ∈ B^n, h_i^n(X1, ..., Xt) = (y1, ..., yn) such that, for all j ∈ {1, ..., n},
y_j = h_i(X1[j], X2[j], ..., Xt[j]). That is, h_i^n is the function obtained by applying
h_i componentwise to X1, ..., Xt. Thus, h_i^n is the interpretation of φ_i(x1, ..., xt)
in B^n. Because n may not be equal to 1, we will write φ_i(X1, ..., Xt) rather than
φ_i(x1, ..., xt), to emphasize that the variables of φ_i are vector variables.
Let H = {h1 , . . . , h2t }. Define a partition of H into two sets, H0 and H1 as
follows:
H0 = {hkt+i : i ∈ {1, . . . , t}, k ∈ {0, 1}, and g(Ai ) = 0},
H1 = {hkt+i : i ∈ {1, . . . , t}, k ∈ {0, 1}, and g(Ai ) = 1}.
The desired equation Ig is defined to be
\[
\Bigl(\bigvee_{h_i \in H_0} f(\phi_i(X_1, \dots, X_t))\Bigr) \vee
\Bigl(\bigvee_{h_i \in H_1} \overline{f(\phi_i(X_1, \dots, X_t))}\Bigr) = 1. \tag{11.11}
\]
We show that Ig is falsified by g but satisfied by all functions in K.
Let C(X1, ..., Xt) denote the functional expression on the left-hand side of Ig,
so Ig is C(X1, ..., Xt) = 1. For all k ∈ {0, 1}, i ∈ {1, ..., t}, φ_{kt+i}(A1, ..., At) =
h^m_{kt+i}(A1, ..., At) = A_i. It follows from the definitions of H0 and H1 that
Cg(A1, ..., At) = 0. Therefore, g falsifies Ig.
We now show that any Boolean function f falsifying Ig is not a member of K.
Suppose f is a Boolean function on B^n that falsifies Ig. Then, for some
(W1, ..., Wt) ∈ (B^n)^t, Cf(W1, ..., Wt) = 0. Let W be the t × n matrix whose rows
are W1, ..., Wt. Since Cf(W1, ..., Wt) = 0, it follows that, for all k ∈ {0, 1}, i ∈ {1, ..., t},
\[
f(\phi_{kt+i}(W_1, \dots, W_t)) = g(A_i). \tag{11.12}
\]
The column vectors of W are not necessarily all distinct. Let col(W ) denote
the set of column vectors of W . Let q = |col(W ) ∩ col(A)|.
We first consider the case q > 0. For each column vector in col(W ) ∩ col(A),
choose a column of W that is equal to that column vector. Let k1 , . . . , kq be the
indices of the chosen columns. Let j1 , . . . , jq be the indices of the columns of
A that are equal to columns k1 , . . . , kq of W respectively. Let jq+1 , . . . , jm be the
indices of the remaining columns of A. Let r : {1, . . . , n} → {1, . . . , q} be such that
for i ∈ {1, . . . , n}, r(i) = d if column i of W equals column kd of W (and hence,
column jd of A), and r(i) = 1 if column i of W is not in col(A).
Let f′ be the function produced from f by the identification map r. Let f0 be
produced from f′ by addition of m − q inessential variables. Let p : {1, ..., m} →
{1, ..., m} be the bijection such that, for all u ∈ {1, ..., m}, p(u) = j_u. Let f″ be
the function produced from f0 by the identification map p.
Let i ∈ {1, ..., t}. For index c, let W_{ic} and A_{ic} denote the cth components of W_i
and A_i respectively. Let ρ = W_{i k_1}. Then,
\[
h^n_{\rho t+i}(W_1, \dots, W_t) = (W_{i k_{r(1)}}, \dots, W_{i k_{r(n)}}) \tag{11.13}
\]
because if column c of W is equal to a column of A, then column c of W is equal
to column k_{r(c)} of W, and otherwise r(c) = 1.
We now have
\[
\begin{aligned}
g(A_i) &= f(h^n_{\rho t+i}(W_1, \dots, W_t)) && \text{by Equation (11.12)} \\
&= f(W_{i k_{r(1)}}, \dots, W_{i k_{r(n)}}) && \text{by Equation (11.13)} \\
&= f'(W_{i k_1}, \dots, W_{i k_q}) && \text{because } f'(x_1, \dots, x_q) = f(x_{r(1)}, \dots, x_{r(n)}) \text{ for all } (x_1, \dots, x_q) \in B^q \\
&= f_0(W_{i k_1}, \dots, W_{i k_q}, A_{i j_{q+1}}, A_{i j_{q+2}}, \dots, A_{i j_m}) && \text{by addition of inessential variables to } f' \\
&= f_0(A_{i j_1}, \dots, A_{i j_q}, A_{i j_{q+1}}, A_{i j_{q+2}}, \dots, A_{i j_m}) && \text{since } W_{i k_1}, \dots, W_{i k_q} \text{ equal } A_{i j_1}, \dots, A_{i j_q} \text{ respectively} \\
&= f''(A_{i1}, \dots, A_{im}) && \text{by definition of } f'' \\
&= f''(A_i).
\end{aligned}
\]

Thus g(A_i) = f″(A_i) for all i ∈ {1, ..., t}. Since the rows of A are the t elements
of the domain of g, f″ = g.
The class K is closed under identification of variables and addition of inessential
variables. If f were in K, then g would be also, since g can be obtained from f
by these operations. Therefore, f is not in K, which is what we wanted to show.
It remains to consider the case q = 0. Let i ∈ {1, ..., t}. By Equation (11.12),
for ρ ∈ {0, 1}, g(h^m_{ρt+i}(A1, ..., At)) = f(h^n_{ρt+i}(W1, ..., Wt)). By the definitions of
h_i and h_{t+i}, it follows that g(A_{i1}, ..., A_{im}) = f(0, ..., 0) = f(1, ..., 1). Since this
is true for all i ∈ {1, ..., t}, g is a constant function. The constant function g can be
produced from f by first applying the identification map r : {1, ..., n} → {1} such
that r(u) = 1 for all u ∈ {1, ..., n}, and then adding m − 1 inessential variables. As
in the case q > 0, it follows immediately that f is not in K.
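
The equation Ig is concrete enough to be tested by exhaustive search when m and n are small. The sketch below is an illustration under stated assumptions: the function name falsifies_I_g is hypothetical, Boolean functions take a single 0/1 tuple, and the matrix W is enumerated by its n columns. A function f falsifies Ig exactly when some W makes f(φ_{kt+i}(W1, ..., Wt)) equal to g(A_i) for every i and both k, which is what the inner test checks.

```python
from itertools import product

def falsifies_I_g(f, n, g, m):
    """True if some (W1, ..., Wt) in (B^n)^t makes the left-hand side of I_g equal to 0."""
    A = list(product((0, 1), repeat=m))   # rows A_1..A_t in lexicographic order, t = 2^m
    t = len(A)
    col_A = set(zip(*A))                  # the m distinct column vectors of A
    # Enumerate W column by column; each column is a vector in B^t.
    for W_cols in product(product((0, 1), repeat=t), repeat=n):
        if all(
            # default 0 corresponds to h_i, default 1 to h_{t+i}
            f(tuple(c[i] if c in col_A else default for c in W_cols)) == g(A[i])
            for i in range(t)
            for default in (0, 1)
        ):
            return True
    return False

xor = lambda X: X[0] ^ X[1]
conj = lambda X: X[0] & X[1]
# x1 (not x2) v (not x3) x4, which yields xor after identifying x3 with x1 and x4 with x2.
wide = lambda X: (X[0] & (1 - X[1])) | ((1 - X[2]) & X[3])

print(falsifies_I_g(xor, 2, xor, 2))    # True: take W = A
print(falsifies_I_g(conj, 2, xor, 2))   # False
print(falsifies_I_g(wide, 4, xor, 2))   # True
```

The search space has (2^t)^n matrices, so this is only a way to experiment with the construction for m = 2 and n ≤ 4, not a practical recognition procedure.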

Theorem 11.9 can be used to show that particular classes of functions have a
characterization by functional equations. For example, we can prove the following
result for the class of threshold functions.

Theorem 11.10. The class of threshold functions can be characterized by a set of
functional equations.
Proof. By Theorem 11.9, it suffices to show that the class of threshold functions
is closed under identification of variables and addition of inessential variables.
Closure under addition of inessential variables is obvious.
We show closure under identification of variables. Suppose f(x1, ..., xn) is a
threshold function. Then, for some w1, ..., wn and t in R, f(x1, ..., xn) = 0 if and
only if Σ_{i=1}^{n} w_i x_i ≤ t. If f′(x1, ..., xm) is obtained from f using an identification
map r, then f′(x1, ..., xm) = 0 if and only if
\[
\sum_{i=1}^{m} \Bigl(\sum_{1 \le j \le n,\ r(j) = i} w_j\Bigr) x_i \le t.
\]
Therefore, f′ is a threshold function.
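
The weight-summing step of this proof can be illustrated numerically. The sketch below makes several assumptions that are not in the text: the helper names, the integer weights, the threshold, and the identification map are all arbitrary example values.

```python
from itertools import product

def threshold(w, t):
    """f(x) = 0 if and only if sum_i w_i x_i <= t."""
    return lambda x: 0 if sum(wi * xi for wi, xi in zip(w, x)) <= t else 1

def identified_weights(w, r, m):
    """New weight of x_i is the sum of w_j over all j with r(j) = i (r is 1-based)."""
    new_w = [0] * m
    for j, wj in enumerate(w):
        new_w[r[j] - 1] += wj
    return new_w

# Hypothetical example: four variables identified down to two.
w, t = [3, -2, 5, 1], 3
r, m = [1, 2, 1, 2], 2

f = threshold(w, t)
f_identified = lambda x: f(tuple(x[ri - 1] for ri in r))   # f(x_{r(1)}, ..., x_{r(n)})
g = threshold(identified_weights(w, r, m), t)

print(identified_weights(w, r, m))   # [8, -1]
print(all(f_identified(x) == g(x) for x in product((0, 1), repeat=m)))   # True
```

The threshold t is unchanged by the identification; only the weights of identified variables are added together.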

Similarly, it is easy to show that the class Fk of functions representable by
DNFs of degree at most k has a characterization by functional equations (see also
Exercise 3). This is in contrast to the result (cf. Theorem 11.8) that, for k ≥ 3, the
class of functions of degree at most k has no such characterization.
Note that the set of equations constructed in the proof of Theorem 11.9 consists
of one equation Ig for each function g not in the set K being characterized. Since
there are an infinite number of Boolean functions that are not threshold functions,
Theorem 11.9 implies that there is an infinite set of functional equations character-
izing the class of threshold functions. (See Exercise 4 for another way to construct
a characterization of Boolean threshold functions by an infinite set of functional
equations.) In the next section, we address the question of whether the class of
threshold functions can be characterized by a finite set of functional equations.
Combining Theorems 11.7 and 11.9 yields the following:
Theorem 11.11. A class K of functions can be characterized by a set of functional
equations if and only if K is closed under identification of variables and addition
of inessential variables.

11.5 Finite characterizations by functional equations


Theorem 11.12. If a class K of Boolean functions can be characterized by a finite
set of functional equations, then it can be characterized by a single functional
equation.
Proof. Let {C1 = D1, ..., Cm = Dm} be a finite set of functional equations. Without
loss of generality, assume that these equations are over disjoint sets of variables.
A function g satisfies all the equations in the above set if and only if it satisfies the
equation $\bigwedge_{i=1}^{m} \bigl( C_i D_i \vee \overline{C_i}\,\overline{D_i} \bigr) = 1$.

When can a class of Boolean functions be characterized by a finite set of
functional equations (and hence by a single one)? We begin by describing a necessary
condition.

Definition 11.8. Let K be a class of Boolean functions. Let g be a Boolean function
on B^n. A certificate of nonmembership of g in K is a subset Q ⊆ B^n such that
for all Boolean functions f on B^n, if f ∈ K, then there exists X ∈ Q such that
f(X) ≠ g(X). A class K of Boolean functions has constant-size certificates of
nonmembership if there exists an integer c ≥ 0 such that for every Boolean function
g ∉ K, there is a certificate Q of nonmembership of g in K such that |Q| ≤ c.
Example 11.7. Let g(x1, x2) = x1 x̄2 ∨ x̄1 x2. By Theorem 11.1, the positive
functions are characterized by the functional inequality f(X) ≤ f(X ∨ Y). If X = (0, 1)
and Y = (1, 1), then g(X) > g(X ∨ Y). Therefore, {(0, 1), (1, 1)} is a certificate of
nonmembership of g in the class of positive functions.
Since every Boolean function g on B^n that is not a positive function must
falsify f(X) ≤ f(X ∨ Y), for each such g, there exists a set {X, Y} ⊆ B^n that is a
certificate of nonmembership of g in the class of positive functions. Therefore, the
class of positive functions has constant-size certificates of nonmembership.
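The argument of Example 11.7 suggests a direct search for a two-point certificate. The sketch below is a brute-force illustration only; the function name `positivity_certificate` and the encoding of points as tuples are assumptions made here. It looks for X and Y with g(X) > g(X ∨ Y) and returns the pair {X, X ∨ Y}.

```python
from itertools import product

def positivity_certificate(g, n):
    """Search for X, Y in B^n with g(X) > g(X v Y).

    Returns the two-point set {X, X v Y}, or None if g is positive.
    """
    for X in product((0, 1), repeat=n):
        for Y in product((0, 1), repeat=n):
            XY = tuple(a | b for a, b in zip(X, Y))   # componentwise disjunction
            if g(X) > g(XY):
                return {X, XY}
    return None

# g(x1, x2) = x1 x̄2 ∨ x̄1 x2, the function of Example 11.7
g = lambda x: (x[0] & (1 - x[1])) | ((1 - x[0]) & x[1])
cert = positivity_certificate(g, 2)
```

For the function of Example 11.7 the search stops at the same pair of points as in the text. The search is exponential in n, so it is meant only for small examples.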
By generalizing Example 11.7 we easily obtain the following result
(Hellerstein [485]):
Theorem 11.13. Let K be a class of functions that can be characterized by
a finite set of functional equations. Then K has constant-size certificates of
nonmembership.
Proof. Let Z be a finite set of functional equations characterizing K. Let
c be the maximum number of vector variables in any equation in Z. Let
g be a Boolean function on B^n that is not in K. Then g falsifies some
functional equation C(X1, ..., Xm) = D(X1, ..., Xm) in Z, where m ≤ c. The
two sides of the equation are Boolean expressions over elements of the
form f(τ(X1, ..., Xm)), where τ is a Boolean expression on B^m. For fixed
Y1, ..., Ym ∈ B^n, the values of g(τ(Y1, ..., Ym)), for all τ appearing in the equation,
determine whether Cg(Y1, ..., Ym) ≠ Dg(Y1, ..., Ym). Since g falsifies the
functional equation C(X1, ..., Xm) = D(X1, ..., Xm), there exist Y1, ..., Ym ∈ B^n
such that Cg(Y1, ..., Ym) ≠ Dg(Y1, ..., Ym); the set of vectors τ(Y1, ..., Ym) ∈ B^n,
for all τ appearing in the functional equation, constitutes a certificate that g is not
in K. Since each such τ expresses one of the $2^{2^m}$ functions on B^m, it follows that
this certificate has size at most $2^{2^m} \le 2^{2^c}$.

By Theorem 11.10, threshold functions can be characterized by a set of
functional equations. However, Hellerstein [485] showed that they cannot be
characterized by a finite set of functional equations. This is proved using the
following result:
Theorem 11.14. Threshold functions do not have constant-size certificates of
nonmembership.
Proof. Suppose for contradiction that the set of threshold functions has certificates
of nonmembership of size c.
Then for each Boolean function g that is not a threshold function, there exists
a certificate Qg of nonmembership of g in the set of threshold functions, such
that Qg has size at most c. Let Sg = {X ∈ Qg |g(X) = 1}, and let Tg = {X ∈ Qg |
g(X) = 0}.
Consider an arbitrary Boolean function g on Bn that is not a threshold function.
If the convex hull of Sg does not intersect the convex hull of Tg , then, by standard
separation theorems (see, e.g., [788]), there exists a hyperplane separating the
points in Sg from the points in Tg . In this case, there exists a threshold function f
such that f (X) = g(X) for all X ∈ Qg . This contradicts that Qg is a certificate of
nonmembership of g in the set of threshold functions. Hence, the convex hulls of
Sg and Tg intersect.
Let X1, ..., Xt be the elements of Qg. Let Mg be the t × n matrix whose rows
are X1, ..., Xt. Let M'g be the matrix obtained from Mg by deleting all columns
j from Mg such that for some j' < j, column j' and column j of Mg are equal.
Let m be the number of columns of M'g, and let X̂1, ..., X̂t be the rows of M'g
corresponding to rows X1, ..., Xt of Mg.
Let Ŝg = {X̂i |Xi ∈ Sg }, and T̂g = {X̂i |Xi ∈ Tg }. Since Sg and Tg are disjoint, so
are Ŝg and T̂g . Also, since the convex hulls of Sg and Tg intersect, the convex hulls
of Ŝg and T̂g intersect.
Since the convex hulls of Ŝg and T̂g intersect, it follows from the proof of
Theorem 9.14 in Chapter 9 that, for some z > 0, there exist z points X̂i1 , .., X̂iz in
Ŝg (not necessarily distinct), and z points X̂j1 , ..., X̂jz in T̂g (not necessarily distinct)
such that

X̂i1 + · · · + X̂iz = X̂j1 + · · · + X̂jz , (11.14)

and hence,

Xi1 + · · · + Xiz = Xj1 + · · · + Xjz . (11.15)

Let zg be the smallest such z. Note that zg is completely determined by Ŝg
and T̂g.
The columns of M'g are all distinct. Since there are only 2^t different binary
vectors of length t, it follows that m ≤ 2^t. Because Qg has size at most c, t ≤ c,
and hence, m ≤ 2^c.
Therefore, over all possible Boolean functions g that are not threshold functions,
there are a finite number of possible values for Ŝg and T̂g and, hence, a finite number
of possible values for zg .
Let α be the maximum value of zg over all Boolean functions g that are not
threshold functions.
As mentioned in Chapter 9, Section 9.3, Winder showed that for every k
there is a function that is k-asummable but not a threshold function [917, 915, 860].
Consider a function g that is α-asummable but not a threshold function. Since g is
not a threshold function, it follows that, for z = zg, there exist z points Xi1, ..., Xiz
in Sg (not necessarily distinct), and z points Xj1, ..., Xjz in Tg (not necessarily
distinct), such that Equation (11.15) holds. Since zg ≤ α, g is α-summable, a
contradiction.

By Definition 9.2, any k-summable function has a certificate of size at most
2k that it is k-summable. Thus, Theorem 11.14 generalizes Winder’s result that, for
any fixed k, k-asummability is not a sufficient condition for thresholdness; see
Section 9.3. Informally, it says that any condition depending on only a constant
number of points of the function cannot be a sufficient condition for thresholdness.
Returning to the question of characterization by functional equations, we now
have the following theorem:
Theorem 11.15. Threshold functions cannot be characterized by a finite set of
functional equations.
Proof. Follows immediately from Theorems 11.13 and 11.14. 

Although the existence of constant-size certificates of nonmembership is a
necessary condition for characterization of a class by a finite set of functional
equations, it is not a sufficient condition.
Example 11.8. Let g be a Boolean function on B n such that g is not a monotone
function. Then, there exists k ∈ {1, . . . , n} such that g is neither positive nor negative
in the variable xk . It follows that there exist X = (x1 , . . . , xn ) and Y = (y1 , . . . , yn )
in Bn such that
g(x1 , . . . , xk−1 , 0, xk+1 , . . . , xn ) = 0, (11.16)
g(x1 , . . . , xk−1 , 1, xk+1 , . . . , xn ) = 1, (11.17)
g(y1 , . . . , yk−1 , 1, yk+1 , . . . , yn ) = 0, (11.18)
g(y1 , . . . , yk−1 , 0, yk+1 , . . . , yn ) = 1. (11.19)
The four vectors in the above equations constitute a certificate of nonmembership
of g in the class of monotone functions. Since such a set of four vectors exists
for each non-monotone g, monotone functions have constant-size certificates of
nonmembership. However, by Theorem 11.8, monotone functions cannot be char-
acterized by any set (finite or infinite) of functional equations. 
Ekin et al. [305] showed that a condition that is both necessary and sufficient
can be obtained by considering identification minors.
Definition 11.9. Let f be a Boolean function, and let g be a function that is
produced from f by identification of variables. The function g is called an identi-
fication minor of f . We use the notation g , f to denote that g is an identification
minor of f .
Identification minors are a restricted case of the Boolean minors introduced
by Wang and Williams [897] and Wang [896]. They are called minors because
of their similarity to graph minors, which have been extensively studied in graph
theory.

Definition 11.10. Let K be a class of Boolean functions. A Boolean function g is
called a forbidden identification minor of K if g is not an identification minor of
any function f ∈ K.

Example 11.9. The function f(x1, x2) = x1 x̄2 ∨ x̄1 x2 is a forbidden identification
minor of the class of positive functions.

Definition 11.11. Let K be a class of Boolean functions, and let Z be a set of
forbidden identification minors of K. The set Z characterizes K if every function
not in K has an identification minor in Z.

Theorem 11.16. Let K be a class of Boolean functions. Then K can be characterized
by a finite set of functional equations if and only if K is closed under addition
of inessential variables and can be characterized by a finite set of forbidden
identification minors.

Proof. Suppose K can be characterized by a finite set of functional equations.
By Theorem 11.7, K must be closed under addition of inessential variables. We
show now that it can be characterized by a finite set of forbidden identification
minors.
Since K can be characterized by a finite set of functional equations, by
Theorem 11.12 it can be characterized by a single functional equation E = F. Let
X1, ..., Xm be the vector variables appearing in E = F. Suppose f is a Boolean
function on B^n such that n > 2^m and f does not satisfy E = F. Then there exist
V1, ..., Vm ∈ B^n such that Ef(V1, ..., Vm) ≠ Ff(V1, ..., Vm). Consider the m × n
matrix W with rows V1, ..., Vm, in that order. Let n' be the number of distinct columns
of W. Clearly, n' ≤ 2^m. Consider an identification map r : {1, ..., n} → {1, ..., n'}
such that r(i) = r(j) if and only if columns i and j of W are equal. This map
produces an identification minor f' of f defined on B^n'. Let s be the vector
map corresponding to r. For i ∈ {1, ..., m}, f(Vi) = f'(s(Vi)). Therefore,
Ef'(s(V1), ..., s(Vm)) ≠ Ff'(s(V1), ..., s(Vm)).
Thus, for every f defined on B^n, with n > 2^m and f ∉ K, there exists f' defined
on B^n' with n' ≤ 2^m, such that f' is an identification minor of f and f' ∉ K.
The set of all such f' is finite,
and forms a set of forbidden identification minors that characterizes K.
Conversely, suppose K is closed under addition of inessential variables and can
be characterized by a finite set of forbidden identification minors. Clearly, K is
closed under identification of variables. Let Z = {g1 , . . . , gn } be a set of forbidden
identification minors characterizing K.
Referring to the proof of Theorem 11.9, consider the equations Ig1, ..., Ign. By
Theorem 11.7, if a function f satisfies these equations, then so do all identification
minors of f. Because g1, ..., gn do not satisfy all these equations, it follows that
g1, ..., gn are not identification minors of f. Hence, f ∈ K. Conversely, by the
proof of Theorem 11.9, if f belongs to K then f satisfies every Igi. Therefore, the
equations {Ig1, ..., Ign} characterize K.

As we showed in Example 11.8, for arbitrary classes of Boolean functions,
having constant-size certificates of nonmembership is a necessary, but not
sufficient, condition for the class to have a characterization by a finite set of functional
equations. However, Hellerstein [485] showed that for classes closed under
identification of variables and addition of inessential variables, the condition is both
necessary and sufficient.

Theorem 11.17. Let K be a class of Boolean functions that is closed under
identification of variables and addition of inessential variables. Then K can
be characterized by a finite set of functional equations if and only if K has
constant-size certificates of nonmembership.

Proof. Necessity was shown in Theorem 11.13.
To show sufficiency, suppose every Boolean function not in K has a certificate
of nonmembership of size at most c, for some constant c.
Let g be a Boolean function on B^n that is not in K. Let Q = {Q1, ..., Qk} be a
certificate of nonmembership of g in K such that k ≤ c.
Consider the matrix A whose rows are Q1, ..., Qk. Let n' be the number of
distinct column vectors appearing as columns of A. Clearly n' ≤ 2^k. Without loss
of generality, assume that the first n' columns of A are distinct. Let r : {1, ..., n} →
{1, ..., n'} be such that for all j ∈ {1, ..., n}, r(j) = i, where 1 ≤ i ≤ n' and the ith
and jth columns of A are equal. Let g' be the function produced from g using the
identification map r.
Now consider the function g'' derived from g' by adding n − n' inessential
variables to g'. For each Qi ∈ Q, g''(Qi) = g(Qi). Since Q is a certificate of
nonmembership of g in K, it is also a certificate of nonmembership of g'' in K.
Thus g'' ∉ K.
Since g'' can be produced from g' by addition of inessential variables, and since
K is closed under addition of inessential variables, g' ∉ K. Therefore, g has an
identification minor g' that is not in K such that g' is defined on B^n' for some
n' ≤ 2^c. This holds for each g not in K. Let Z be the set of all such g'. The set Z
consists of forbidden identification minors of K and characterizes K.
Because there are only a finite number of functions defined on B^n', for all
n' ≤ 2^c, Z is a finite set. By Theorem 11.16, K can be characterized by a finite set
of functional equations.
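The construction used in the sufficiency part of this proof (merge all variables whose columns agree on every point of the certificate) can be sketched in a few lines of Python. This is a hedged illustration: the name `minor_from_certificate`, the helper `shrink`, and the sample function are assumptions made here, and the code keeps the first occurrence of each distinct column rather than assuming that the first n' columns are distinct.

```python
def minor_from_certificate(g, Q):
    """Identify variables of g whose columns agree on all points of Q.

    g: function on n-tuples of 0/1; Q: list of points (the certificate).
    Returns (g_minor, shrink, n_prime), where shrink maps a point of Q to
    the corresponding point of the smaller cube.
    """
    n = len(Q[0])
    columns = [tuple(q[j] for q in Q) for j in range(n)]
    distinct = list(dict.fromkeys(columns))       # first occurrences, in order
    r = [distinct.index(col) for col in columns]  # identification map (0-based)

    def g_minor(y):
        return g(tuple(y[r[j]] for j in range(n)))

    def shrink(q):
        return tuple(q[columns.index(col)] for col in distinct)

    return g_minor, shrink, len(distinct)

# Hypothetical example: g = x1 XOR x2 on four variables (x3, x4 inessential),
# with a two-point certificate of nonmembership in the positive functions.
g = lambda x: x[0] ^ x[1]
Q = [(0, 1, 0, 1), (1, 1, 0, 1)]
g_minor, shrink, n_prime = minor_from_certificate(g, Q)
```

With k certificate points there are at most 2^k distinct columns, so `n_prime` is bounded independently of n, and the minor takes the same values on the shrunken points as g does on Q.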

Hellerstein and Raghavan [486] showed that, for any k, the class of functions
representable by DNFs having at most k terms has constant-sized certificates of
nonmembership, and hence, by the above theorem, it has a characterization by a
finite set of functional equations.
We observed earlier that the class Fk can be characterized by a set of functional
equations. In contrast to this result, Fk cannot be characterized by a finite set of
functional equations (see Exercise 3).

11.6 Exercises
1. Give a functional equation characterizing the class of elementary conjunc-
tions.
2. Prove that the class of read-once functions cannot be characterized by a set
of functional equations.
3. This exercise is based on a result of Bernard Rosell (personal communication).
Let k > 2. Recall that the class Fk consists of functions representable
by DNFs of degree at most k. Let f(x1, ..., xn) be the function whose output
is 1 if and only if at least (k+1)/2 of its inputs are 1 and at least (k+1)/2
of its inputs are 0.
(a) Show that the given function f is not in Fk .
(b) Prove a lower bound on the size of any certificate of nonmembership of
f in Fk . Use this lower bound to show that Fk cannot be characterized
by a finite set of functional equations.
4. In [748], Pippenger defined a Boolean constraint to be a pair (R, S), where
R and S are each a set of binary column vectors of length m, for some m ≥ 0.
If A is an m × n binary matrix, and f (x1 , . . . , xn ) is a Boolean function on n
variables, then let f (A) denote the column vector produced by applying f to
each row of A; namely, f (A) is the length m column vector whose ith entry
is f (A[i, 1], A[i, 2], . . . , A[i, n]), for all entries i. We write A ≺ R if each
column of A is a member of R. Function f (x1 , . . . , xn ) satisfies constraint
(R, S) if for all m × n binary matrices A, A ≺ R implies that f (A) ∈ S.
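A set I of Boolean constraints characterizes a class K of Boolean functions
if K consists precisely of the Boolean functions that satisfy all constraints
in I.
(a) Show that the following constraint characterizes the class of positive
Boolean functions:
\[
\left( \left\{ \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \end{bmatrix} \right\},\
\left\{ \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \end{bmatrix} \right\} \right).
\]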
(b) Give a constraint that characterizes the class of Horn functions.
(c) By Theorem 9.14, a Boolean function is a threshold function if and only
if it is k-asummable for every k ≥ 2.
Describe a constraint that characterizes the set of functions that are k-
asummable, for fixed k ≥ 2. Then construct an infinite set of constraints
that characterizes the class of threshold functions.
(d) Show that a class of Boolean functions can be characterized by a set
of functional equations if and only if it can be characterized by a set of
Boolean constraints. (See [748].)
5. Let K be a class of Boolean functions. A Boolean function g defined on
B^n is a minimal forbidden identification minor of K if it is a forbidden
identification minor of K, and, for every identification minor g' of g, if g'
is defined on B^n' and n' < n, then g' ∈ K.
(a) Prove that, if a Boolean function f is defined on B^5, then f is not a
minimal forbidden identification minor of the class of linear functions.
(b) Give an example of a function defined on B^3 that is a minimal forbidden
identification minor of the class of linear functions.
Part III

Generalizations
12
Partially defined Boolean functions
Toshihide Ibaraki

12.1 Introduction
Suppose that a set of data points is at hand for a certain phenomenon. A data point
is called a positive example if it describes a case that triggers the phenomenon, and
a negative example otherwise. We consider the situation in which all data points
are binary and have a fixed dimension; namely, they belong to B n .
Given a set of positive examples T ⊆ B n and a set of negative examples F ⊆ B n ,
we call the pair (T , F ) a partially defined Boolean function (pdBf) on B n . For a
pdBf (T , F ) on B n , a Boolean function f : Bn → B satisfying
T (f ) ⊇ T and F (f ) ⊇ F
is called an extension of (T , F ), where
T(f) = {A ∈ B^n | f(A) = 1}, (12.1)
F(f) = {B ∈ B^n | f(B) = 0}. (12.2)
If we associate n Boolean variables xj , j = 1, 2, . . . , n, with the components of
points in B n , then extensions are Boolean functions of the variables x1 , x2 , . . . , xn .
As an example of a pdBf (T , F ), let us assume that each point A =
(a1 , a2 , . . . , an ) ∈ T ∪ F indicates the result of physical tests applied to a patient,
where T denotes the set of results for patients diagnosed as positive, and F denotes
the set of negative results. Each component aj of a point A gives the result of the
j -th test; for example, a1 = 1 may indicate that blood pressure is “high,” while
a1 = 0 indicates “low”; a2 = 1 may say that body temperature is “high,” while
a2 = 0 says “low,” and so on. An extension f of this pdBf (T , F ) then describes
how the diagnosis of the disease could be formulated for all possible patients. In
other words, this Boolean function f contains all the details of the diagnosis. As
extensions of a given pdBf (T , F ) are not unique, in general, it is interesting and
important to investigate how to build meaningful extensions from given pdBfs.
This line of approach to data analysis recently received increasing attention in
statistics and in artificial intelligence under various names such as data mining,
knowledge discovery, and knowledge acquisition (Agrawal, Imielinski, and Swami
[8]; Crama, Hammer, and Ibaraki [233]; Fayyad et al. [321]; Mangasarian [659];
Mangasarian, Setiono, and Wolberg [660]; Mannila, Toivonen, and Verkamo [665];
Quinlan [770, 771]), reflecting the current trend that large amounts of data are
available in many applications. In addition to the diagnosis of diseases, applications
include the analysis of sales records at retail shops, economic indices of countries
and enterprises, stock market records, DNA sequences, geological data, and many
others. Extraction of meaningful information from such data sets is considered very
important. It may be interesting to observe that closely related approaches have also
been proposed in the social science literature, where they are specifically applied
to the analysis of small sets of qualitative data which do not lend themselves to
classical statistical approaches; see, for instance, Flament [333] and Ragin [775].
We term the approach in this chapter logical analysis of data (LAD) to empha-
size its logical aspects in statistics and in artificial intelligence. The study of LAD
was initiated by Crama, Hammer, and Ibaraki [233] and has been elaborated in
subsequent papers, such as those by Boros et al. [122, 128, 130, 131, 139, 140]
and Bonates and Hammer [102]. More references are found in other sections of
this chapter.
A large body of studies on pdBfs can be found in switching theory (Curtis [248];
Hu [512]; Kuntzmann [589]; McCluskey [634]; Mendelson [680]; Muroga [699];
Prather [755]; Roth [793]; Urbano and Mueller [879]). In this area, pdBfs are
often called “incompletely specified Boolean functions,” as the value of Boolean
functions is usually specified in most points, except for some binary points called
“don’t cares,” which never arise as input vectors because of circuit specification
constraints. The main issue here is to exploit don’t cares to simplify the resulting
circuits. Various minimization techniques used for Boolean functions (as discussed
in Section 3.3) have been generalized to minimize functions with don’t cares. Some
discussion in this direction will be given in Section 12.6; see Villa, Brayton, and
Sangiovanni-Vincentelli [891] for more information.
Extensions of pdBfs are also closely related to problems studied in computa-
tional learning theory (see, e.g., Aizenstein et al. [13]; Angluin [21]; Anthony [26];
Anthony and Biggs [29]; Bshouty [161]; Kearns, Li and Valiant [560]; Pitt and
Valiant [749]; Sloan, Szörényi and Turán [838]; Valiant [884], etc.); in fact, some
relevant results on pdBfs were first obtained in learning theory.
We also note that psychologists rely on Boolean functions to model human
concept learning from examples, as explained, for instance, by Feldman [326, 327];
see also Ganter and Wille [369] for a general mathematical framework of concept
formation.
Finally, discriminant functions studied in pattern recognition have obvious
resemblance with extensions, although statistical models and methods are usu-
ally considered in pattern recognition (e.g., Gnanadesikan [388]; Hand [466]), in
contrast with the purely logical and combinatorial methods to be covered in this
chapter.

Table 12.1. An example of pdBf (T , F )

             x1  x2  x3  x4  x5  x6  x7  x8
T   A(1) =    0   1   0   1   0   1   1   0
    A(2) =    1   1   0   1   1   0   0   1
    A(3) =    0   1   1   0   1   0   0   1

F   B(1) =    1   0   1   0   1   0   1   0
    B(2) =    0   0   0   1   1   1   0   0
    B(3) =    1   1   0   1   0   1   0   1
    B(4) =    0   0   1   0   1   0   1   0

Example 12.1. Consider a pdBf (T , F ) as shown in Table 12.1. Extensions of this
pdBf can be expressed by the following DNFs:

f1 = x̄1 x2 ∨ x2 x5
f2 = x̄1 x̄5 ∨ x3 x̄7 ∨ x1 x5 x̄7
f3 = x5 x8 ∨ x6 x7 .

It can be verified that all these functions are indeed extensions of (T , F ), that is,
fk (A(i) ) = 1 holds for i = 1, 2, 3 and fk (B (i) ) = 0 holds for i = 1, 2, 3, 4. As we
shall see later, this pdBf has many other extensions. 
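The verification mentioned in Example 12.1 is mechanical, and a short script makes it explicit. The Python sketch below transcribes the points of Table 12.1 and the three DNFs; the helper names `term` and `is_extension` are assumptions introduced for this illustration only.

```python
# Points of Table 12.1 (components x1, ..., x8).
T = [(0, 1, 0, 1, 0, 1, 1, 0), (1, 1, 0, 1, 1, 0, 0, 1), (0, 1, 1, 0, 1, 0, 0, 1)]
F = [(1, 0, 1, 0, 1, 0, 1, 0), (0, 0, 0, 1, 1, 1, 0, 0),
     (1, 1, 0, 1, 0, 1, 0, 1), (0, 0, 1, 0, 1, 0, 1, 0)]

def term(x, pos=(), neg=()):
    """Conjunction with uncomplemented variables pos and complemented ones neg (1-based)."""
    return all(x[j - 1] == 1 for j in pos) and all(x[j - 1] == 0 for j in neg)

def f1(x):  # x̄1 x2 ∨ x2 x5
    return term(x, pos=(2,), neg=(1,)) or term(x, pos=(2, 5))

def f2(x):  # x̄1 x̄5 ∨ x3 x̄7 ∨ x1 x5 x̄7
    return (term(x, neg=(1, 5)) or term(x, pos=(3,), neg=(7,))
            or term(x, pos=(1, 5), neg=(7,)))

def f3(x):  # x5 x8 ∨ x6 x7
    return term(x, pos=(5, 8)) or term(x, pos=(6, 7))

def is_extension(f, T, F):
    """f extends (T, F) if it is true on every point of T and false on every point of F."""
    return all(f(a) for a in T) and not any(f(b) for b in F)

results = [is_extension(f, T, F) for f in (f1, f2, f3)]
```

Each entry of `results` reports whether the corresponding DNF is true on A(1), A(2), A(3) and false on B(1), ..., B(4).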

As extensions are Boolean functions, they can be represented by DNFs, CNFs,
and other Boolean expressions. We shall also discuss decision trees as a means of
representing extensions in Section 12.2.5.
When choosing among many extensions of a given pdBf, we need some criteria
to guide the choice. We emphasize the following two points: First,
• the simplicity of extensions,

which may reflect our general belief that the truth is simple and beautiful, or as a
translation of Occam’s razor principle. Simplicity can be measured, for example,
by the sizes of representations such as DNFs, CNFs, and decision trees. The size of
a “support set,” to be discussed in Section 12.2.2, is another measure of simplicity.
Second,
• embodiment in the extensions of structural knowledge concerning the
phenomenon to be modeled.
For example, if high blood pressure is known to favor the appearance of a disease,
then we expect the extension f to depend positively on the variable xj associated
with blood pressure.
In more general mathematical terms, we require the obtained extension to
belong to a specified class of functions C. The selected class C may arise not
only from prior structural information, but also from the application that we have
in mind for the resulting extensions. For example, if an extension f is Horn, then
f can be dealt with by Horn rules; as discussed in Chapter 6, this allows us to
benefit from numerous convenient mathematical properties of Horn rules.
In this chapter, we consider the following classes of functions:
(1) The class of all Boolean functions, FALL .
(2) The class of positive functions, F+ (defined in Sections 1.10 and 1.11).
(3) The class of monotone, or unate functions, FUNATE (defined in Section 1.10).
(4) The class of functions representable by a DNF of degree at most k, Fk
(defined in Sections 1.4 and 1.11).
(5) The class of Horn functions, FHORN (discussed in Chapter 6).
(6) The class of threshold functions, FTh (discussed in Chapter 9).
(7) The class of decomposable functions, FF0 (S0 ,F1 (S1 )) (defined in
Section 12.3.6).
(8) The class of k-convex functions, Fk-CONV (discussed in Section 12.3.7).
For other classes of functions studied in the literature on pdBfs, see [139].
In dealing with real-world data, we should also be aware that the data may
contain errors as well as missing bits. A missing bit is denoted by ∗, meaning that
it can be either 0 or 1. We shall discuss in Sections 12.4 and 12.5 how to deal
with these situations, and we shall introduce various problems associated with the
extensions in such cases.

12.2 Extensions of pdBfs and their representations


12.2.1 Definitions
Given a Boolean function of n variables f : Bn → B, let T (f ) denote its set of
true points and F (f ) its set of false points, as defined by (12.1)–(12.2). Obviously
T (f ) ∩ F (f ) = ∅ and T (f ) ∪ F (f ) = Bn hold. For two Boolean functions f and
g on the same set of n variables, recall that we write f ≤ g if f (A) ≤ g(A) holds
for all A ∈ Bn , where we consider 0 < 1 for B = {0, 1}. As already defined in
Section 12.1, a Boolean function f : B n → B is an extension of a pdBf (T , F ),
where T ⊆ Bn and F ⊆ Bn , if T (f ) ⊇ T and F (f ) ⊇ F hold.
A fundamental question raised in Section 12.1 can be stated as follows, where
C denotes an arbitrary class of Boolean functions:

Problem EXTENSION(C)
Instance: A pdBf (T , F ).
Question: Does (T , F ) have an extension in C?

When the answer to the question is “yes,” it is frequently required to output an
extension in C.
In this section, we consider the class C = FALL . Other classes will be
discussed in subsequent sections. The following theorem is immediate from the above
definitions.

Theorem 12.1. A pdBf (T , F ) has an extension in FALL if and only if T ∩ F = ∅.
Hence, problem EXTENSION(FALL ) can be solved in polynomial time.

If a pdBf (T , F ) satisfies T ∩ F = ∅, then it has $2^{2^n - |T| - |F|}$ extensions. Define
two extensions fmin and fmax by

T (fmin ) = T , F (fmin ) = Bn \ T , (12.3)

T (fmax ) = Bn \ F , F (fmax ) = F . (12.4)

Then, any extension f of (T , F ) satisfies

fmin ≤ f ≤ fmax ,

that is, fmax maximizes T (f ) and fmin minimizes T (f ) among all extensions f
of (T , F ). Furthermore, all extensions of (T , F ) form a finite lattice with respect
to the operations ∨ and ∧ between functions. The largest element of this lattice is
fmax , and its smallest element is fmin . A remaining question is: Which extensions
in the lattice are appropriate for the purpose of logical analysis of data?
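For very small n, the count of extensions and the bounds fmin ≤ f ≤ fmax can be confirmed by enumerating all truth tables. The following Python sketch is a brute-force illustration under assumed names (`extensions`, `f_min`, `f_max`) and an arbitrary small pdBf chosen for this example; it is not practical beyond a handful of variables.

```python
from itertools import product

def extensions(T, F, n):
    """Yield every extension of (T, F) on B^n as a dict mapping point -> value."""
    points = list(product((0, 1), repeat=n))
    for values in product((0, 1), repeat=len(points)):
        f = dict(zip(points, values))
        if all(f[a] == 1 for a in T) and all(f[b] == 0 for b in F):
            yield f

# Hypothetical pdBf on B^3 with disjoint T and F.
n = 3
T = {(1, 1, 0)}
F = {(0, 0, 0), (0, 0, 1)}

exts = list(extensions(T, F, n))
cube = list(product((0, 1), repeat=n))
f_min = {p: int(p in T) for p in cube}        # true exactly on T, as in (12.3)
f_max = {p: int(p not in F) for p in cube}    # false exactly on F, as in (12.4)
bounded = all(f_min[p] <= f[p] <= f_max[p] for f in exts for p in cube)
```

Every point outside T ∪ F can be assigned either value independently, which is where the exponent 2^n − |T| − |F| comes from.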

12.2.2 Support sets of variables


For a subset U ⊆ B n and S ⊆ {1, 2, . . . , n}, we denote by U |S the projection of
U to S. In other words, U |S = {A|S | A ∈ U }, where A|S = (aj | j ∈ S) is the
point obtained from A by considering only those components aj with j ∈ S.
For example, for U = {(1, 0, 1, 1), (0, 1, 1, 0), (0, 0, 0, 1)} and S = {2, 3}, we have
U |S = {(0, 1), (1, 1), (0, 0)}. Given a pdBf (T , F ) with T , F ⊆ B n , and a class C
of Boolean functions, a subset S ⊆ {1, 2, . . . , n} is called a support set for class
C if (T |S , F |S ) has an extension in class C. In a sense, given a support set S, all
variables xj , j ∈ {1, 2, . . . , n} \ S, are redundant because there is an extension in C
that does not depend on them.
From the viewpoint of pursuing simple extensions, therefore, it is meaningful
to consider small support sets. We say that a support set S is minimal if there is no
other support set properly contained in S, and minimum if it minimizes |S|.
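For the class FALL, Theorem 12.1 turns the support set test into a disjointness check on projections. The sketch below illustrates this; the function names `project` and `is_support_set` are assumptions made for this example, and indices are 1-based to match the text.

```python
def project(U, S):
    """U|_S: keep only the components whose (1-based) index lies in S."""
    idx = sorted(S)
    return {tuple(a[j - 1] for j in idx) for a in U}

def is_support_set(T, F, S):
    """For F_ALL, S is a support set iff T|_S and F|_S are disjoint (Theorem 12.1)."""
    return project(T, S).isdisjoint(project(F, S))

# The projection example from the text.
U = {(1, 0, 1, 1), (0, 1, 1, 0), (0, 0, 0, 1)}
U_S = project(U, {2, 3})

# The pdBf of Table 12.1.
T = [(0, 1, 0, 1, 0, 1, 1, 0), (1, 1, 0, 1, 1, 0, 0, 1), (0, 1, 1, 0, 1, 0, 0, 1)]
F = [(1, 0, 1, 0, 1, 0, 1, 0), (0, 0, 0, 1, 1, 1, 0, 0),
     (1, 1, 0, 1, 0, 1, 0, 1), (0, 0, 1, 0, 1, 0, 1, 0)]
ok_58 = is_support_set(T, F, {5, 8})
ok_5 = is_support_set(T, F, {5})
```

For other classes C the disjointness test is only necessary, so the check would have to be replaced by a procedure for EXTENSION(C) applied to the projected pdBf.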

Problem MIN-SUPPORT(C)
Instance: A pdBf (T , F ) (where we assume that (T , F ) has an extension in C).
Output: A minimum support set S of (T , F ) for class C.

We first show that this problem for class FALL can be formulated as a set
covering problem. Recall that the set covering problem is the following NP-hard
optimization problem [371]:

Problem SET COVER


Instance: An m × n 0–1 matrix Q.
Output: An n-dimensional 0–1 vector y = (y1, y2, ..., yn)^t that satisfies Qy ≥ 1
and that minimizes $\sum_{j=1}^{n} y_j$, where 1 is the m-dimensional column vector of 1’s.

For any two points A, B ∈ Bn , define

J(A, B) = {j ∈ {1, 2, ..., n} | aj ≠ bj}. (12.5)

Let us introduce 0–1 variables yj, j = 1, 2, ..., n, to denote whether j ∈ S (i.e.,
yj = 1) or j ∉ S (i.e., yj = 0). It is easy to see that A|S ≠ B|S holds for S = {j |
yj = 1} if

\[
\sum_{j \in J(A,B)} y_j \ge 1. \tag{12.6}
\]

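Therefore, as a result of Theorem 12.1, problem MIN-SUPPORT(FALL ) can be
formulated as follows:

\[
\begin{aligned}
\text{minimize} \quad & \sum_{j=1}^{n} y_j \\
\text{subject to} \quad & \sum_{j \in J(A,B)} y_j \ge 1, \qquad A \in T,\ B \in F, \\
& y_j \in \{0, 1\}, \qquad j \in \{1, 2, \ldots, n\}.
\end{aligned}
\tag{12.7}
\]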

The relation between support sets and the set covering problem has been observed
in various early papers (e.g., Kambayashi [547]; Kuntzmann [589]; Necula [704]).
The preceding description follows the presentation by Crama, Hammer, and
Ibaraki [233].

Example 12.2. The set covering problem (12.7) corresponding to the pdBf in
Example 12.1 is given as follows.

\[
\begin{aligned}
\text{minimize} \quad & \sum_{j=1}^{8} y_j \\
\text{subject to} \quad
& y_1 + y_2 + y_3 + y_4 + y_5 + y_6 \ge 1 \\
& y_2 + y_5 + y_7 \ge 1 \\
& y_1 + y_7 + y_8 \ge 1 \\
& y_2 + y_3 + y_4 + y_5 + y_6 \ge 1 \\
& y_2 + y_3 + y_4 + y_7 + y_8 \ge 1 \\
& y_1 + y_2 + y_6 + y_8 \ge 1 \\
& y_5 + y_6 \ge 1 \\
& y_1 + y_2 + y_3 + y_4 + y_7 + y_8 \ge 1 \\
& y_1 + y_2 + y_7 + y_8 \ge 1 \\
& y_2 + y_3 + y_4 + y_6 + y_8 \ge 1 \\
& y_1 + y_3 + y_4 + y_5 + y_6 \ge 1 \\
& y_2 + y_7 + y_8 \ge 1 \\
& y_1, y_2, \ldots, y_8 \in \{0, 1\}.
\end{aligned}
\]

This set of inequalities contains many redundant inequalities, and can be greatly
simplified. As already observed in Chapter 1, Section 1.13, the constraints of a set
covering problem can be associated with a CNF such that a 0–1 assignment of
values to y satisfies the set covering constraints if and only if it satisfies all clauses
of the CNF. In the current example, we obtain the CNF

ψ = (y1 ∨ y2 ∨ y3 ∨ y4 ∨ y5 ∨ y6 )(y2 ∨ y5 ∨ y7 )(· · · )(y2 ∨ y7 ∨ y8 ).

It is not difficult to see that the prime implicants of the function represented
by ψ correspond exactly to the minimal support sets of (T , F ) (see Chapter 4,
Section 4.2). Applying this procedure, we conclude that there are eight minimal
support sets for our example, namely,

S1 = {5, 8}, S2 = {6, 7},


S3 = {1, 2, 5}, S4 = {1, 2, 6}, S5 = {2, 5, 7},
S6 = {2, 6, 8}, S7 = {1, 3, 5, 7}, S8 = {1, 4, 5, 7}.

The first two sets, S1 and S2 , are the only minimum support sets, and the following
DNFs provide two extensions associated with S1 and S2 , respectively.

ϕ1 = x5 x8 ∨ x̄5 x̄8
ϕ2 = x6 x7 ∨ x̄6 x̄7 .
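
For an instance as small as this one, the list of minimal support sets can be double-checked by exhaustive search. The sketch below is not the dualization procedure of Section 4.2: it simply enumerates subsets of {1, . . . , 8} by increasing size, which takes exponential time and is meant only as a sanity check. The function name `minimal_support_sets` is ours, and the data are those of Example 12.1 (Table 12.3).

```python
from itertools import combinations, product

T = [(0, 1, 0, 1, 0, 1, 1, 0),
     (1, 1, 0, 1, 1, 0, 0, 1),
     (0, 1, 1, 0, 1, 0, 0, 1)]
F = [(1, 0, 1, 0, 1, 0, 1, 0),
     (0, 0, 0, 1, 1, 1, 0, 0),
     (1, 1, 0, 1, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 0, 1, 0)]
n = 8

# Index sets J(A, B) of the constraints (12.7), with 1-based indices.
constraints = [{j + 1 for j in range(n) if a[j] != b[j]}
               for a, b in product(T, F)]


def minimal_support_sets(constraints, n):
    """Enumerate subsets by increasing size; keep those that meet every
    constraint and contain no smaller solution found earlier."""
    found = []
    for size in range(1, n + 1):
        for S in map(set, combinations(range(1, n + 1), size)):
            if any(m <= S for m in found):
                continue  # contains a smaller support set: not minimal
            if all(S & J for J in constraints):
                found.append(S)
    return found


supports = minimal_support_sets(constraints, n)
for S in supports:
    print(sorted(S))
```

Because subsets are visited by increasing size, the minimum support sets appear first in the output.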


Theorem 12.2. Problem MIN-SUPPORT(FALL ) is NP-hard.


Proof. We provide a reduction from SET COVER. Given an instance Q of SET
COVER, we consider the following instance of MIN-SUPPORT(FALL ):

T = {Qi | i = 1, 2, . . . , m},
F = {(0, 0, · · · , 0)},

where Qi denotes the i-th row of the 0–1 matrix Q. It is easy to see that the
formulation (12.7) for MIN-SUPPORT(FALL ) is exactly the same as the original
instance of SET COVER. This shows that SET COVER is reducible to MIN-
SUPPORT(FALL ), and proves the theorem. 

Problem SET COVER has been intensively studied in operations research, as it
has a wide variety of applications. Even though it is NP-hard, branch-and-bound
algorithms can solve fairly large instances of SET COVER exactly (Nemhauser
and Wolsey [707]), and there are various heuristic algorithms that can find very
good feasible solutions of large instances (Caprara, Fischetti, and Toth [169];
Yagiura, Kishida, and Ibaraki [927]). A theoretical analysis of simple greedy heuristics
can be found in papers by Chvátal [198] and Lovász [623]. These algorithms
can be used to solve the formulation (12.7) of MIN-SUPPORT(FALL ) exactly or
approximately. Other types of heuristic algorithms to find support sets are described
in Boros et al. [137].
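
As an illustration of the approximate approach, here is a minimal sketch of the classical greedy set covering heuristic applied to formulation (12.7). It is not one of the specific algorithms cited above. The function name `greedy_support_set` and the tie-breaking rule (smallest index first) are our own choices, and the data are those of Example 12.1 (Table 12.3).

```python
from itertools import product

T = [(0, 1, 0, 1, 0, 1, 1, 0),
     (1, 1, 0, 1, 1, 0, 0, 1),
     (0, 1, 1, 0, 1, 0, 0, 1)]
F = [(1, 0, 1, 0, 1, 0, 1, 0),
     (0, 0, 0, 1, 1, 1, 0, 0),
     (1, 1, 0, 1, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 0, 1, 0)]


def greedy_support_set(T, F):
    """Greedy heuristic for (12.7): repeatedly pick the variable that
    occurs in the largest number of still-unsatisfied constraints."""
    n = len(T[0])
    uncovered = [{j + 1 for j in range(n) if a[j] != b[j]}
                 for a, b in product(T, F)]
    if any(not J for J in uncovered):
        raise ValueError("T and F intersect: no support set exists")
    chosen = []
    while uncovered:
        # Ties are broken in favour of the smallest index (arbitrary choice).
        best = max(range(1, n + 1),
                   key=lambda j: (sum(j in J for J in uncovered), -j))
        chosen.append(best)
        uncovered = [J for J in uncovered if best not in J]
    return sorted(chosen)


greedy = greedy_support_set(T, F)
print(greedy)
```

On this instance, the greedy rule first selects x2, which occurs in nine of the twelve constraints, and ends with a support set of size 3, whereas the minimum support sets found in Example 12.2 have size 2. The heuristic thus yields a feasible, but not necessarily minimum, solution.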

12.2.3 Patterns and theories of pdBfs


In this section, we consider methods of obtaining extensions with rather simple
DNFs. A DNF will be considered “simple” if all its terms are short and the number
of its terms is small.
We say that a term t covers A ∈ Bn if t(A) = 1 holds, where t is regarded as
a function. Let us define a term t as a pattern of a pdBf (T , F ) if it covers some
point A ∈ T , but does not cover any point B ∈ F ; that is, if T (t) ∩ T ≠ ∅ and
T (t) ∩ F = ∅. For the pdBf of Example 12.1,

x̄1 x2 x̄3 x4 x̄5 x6 x7 x̄8 , x̄1 x2 x̄3 x4 , x̄1 x2

are some of the patterns which cover A(1) ∈ T .


Let t be a pattern of (T , F ). We say that t is a prime pattern if no pattern of
(T , F ) can be obtained by deleting some literals from t, that is, if T (t′) ∩ F ≠ ∅
holds for every term t′ ≠ t that absorbs t. Continuing the above example, we can see
that
x̄1 x2 , x̄1 x̄5 , x2 x7 , x2 x̄8 , x̄3 x7 , x4 x7 , x̄5 x7 , x̄5 x̄8 , x6 x7
are all the prime patterns that cover A(1) ∈ T .
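
The list above can be reproduced by brute force. In the sketch below, a term is encoded as a dictionary mapping a 1-based variable index to the value required by the corresponding literal. This encoding and the names `covers`, `prime_patterns_covering`, and `show` are ours, and F is assumed to be nonempty. The search examines all sub-terms of the minterm of A by increasing length and is therefore exponential in n.

```python
from itertools import combinations

T = [(0, 1, 0, 1, 0, 1, 1, 0),
     (1, 1, 0, 1, 1, 0, 0, 1),
     (0, 1, 1, 0, 1, 0, 0, 1)]
F = [(1, 0, 1, 0, 1, 0, 1, 0),
     (0, 0, 0, 1, 1, 1, 0, 0),
     (1, 1, 0, 1, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 0, 1, 0)]


def covers(term, point):
    """t(A) = 1: every literal of the term agrees with the point."""
    return all(point[j - 1] == v for j, v in term.items())


def prime_patterns_covering(a, F):
    """Prime patterns covering a: minimal sets of literals of the minterm
    of a whose conjunction covers no point of F."""
    n = len(a)
    found = []
    for size in range(1, n + 1):
        for idx in combinations(range(1, n + 1), size):
            if any(set(p) <= set(idx) for p in found):
                continue  # absorbed by a shorter pattern: not prime
            term = {j: a[j - 1] for j in idx}
            if not any(covers(term, b) for b in F):
                found.append(idx)
    return [{j: a[j - 1] for j in idx} for idx in found]


def show(term):
    """Readable form, writing ~xj for a complemented literal."""
    return " ".join(("x%d" if v else "~x%d") % j for j, v in sorted(term.items()))


primes = prime_patterns_covering(T[0], F)
for t in primes:
    print(show(t))
```

Every term built from literals of the minterm of A(1) covers A(1) by construction, so only the condition T(t) ∩ F = ∅ needs to be tested.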
Patterns and prime patterns of (T , F ) are closely related to the function fmax
defined by (12.4), as shown by the next lemma.
Lemma 12.1. Let (T , F ) be a pdBf. A term t is a pattern (respectively, a prime
pattern) of (T , F ) if and only if t is an implicant (respectively, a prime implicant)
of fmax that covers some point in T .
Proof. Let t be a term of (T , F ). The condition T (t)∩F = ∅ is equivalent to T (t) ⊆
T (fmax ), which means, in turn, that t is an implicant of fmax . The characterization
of patterns follows directly from this observation.
If t is a prime pattern of (T , F ), then every term t′ obtained from t by deleting
some literals satisfies T (t′) ∩ F (fmax ) ≠ ∅, meaning that t′ is not an implicant
of fmax ; hence, t is a prime implicant of fmax . The converse statement is proved
similarly. □

Generating all prime patterns is a very important problem in logical analysis
of data, and in the design of logic circuits. Some methods for this purpose are
discussed in Section 12.6.
Now consider a DNF ϕ consisting only of patterns of a given pdBf (T , F ) such
that every A ∈ T is covered by some patterns in ϕ. Such a DNF ϕ represents an
extension of (T , F ) and is called a theory of (T , F ). If the patterns in a theory ϕ
are all prime, then ϕ is called a prime theory. Every prime theory is a theory, but
the converse does not hold in general. If a theory ϕ has the additional property
that none of its patterns can be removed without sacrificing the covering condition
of T , it is called an irredundant theory of (T , F ). In general, there exist many
irredundant theories of (T , F ), and every such theory is minimal (but may not be
minimum) in the sense of the number of terms. In a similar manner, we can define
a prime irredundant theory of (T , F ). A prime irredundant theory is minimal with
respect to the length of each term as well as the number of terms. In subsequent
sections, when no confusion arises, we may sometimes call “theory” the extension
represented by a theory.
Example 12.3. Consider again the pdBf of Example 12.1. It is easy to see that
x̄1 x2 is a prime pattern that covers A(1) ,
x2 x5 is a prime pattern that covers A(2) ,
x3 x8 is a prime pattern that covers A(3) ,
and the following DNF gives a prime theory:
ϕ = x̄1 x2 ∨ x2 x5 ∨ x3 x8 .
However, this theory is not irredundant, because the DNF ϕ′ obtained from ϕ by
removing x3 x8 still covers all points in T :
ϕ′ = x̄1 x2 ∨ x2 x5 .
This prime theory ϕ′ is irredundant, as none of the patterns in ϕ′ can be removed
any longer. □
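
The definitions of theory and irredundant theory translate directly into two small tests, sketched below on the DNFs of Example 12.3. A DNF is encoded as a list of terms, and a term as a dictionary mapping a 1-based variable index to the value its literal requires. This encoding and the names `is_theory` and `is_irredundant` are ours.

```python
T = [(0, 1, 0, 1, 0, 1, 1, 0),
     (1, 1, 0, 1, 1, 0, 0, 1),
     (0, 1, 1, 0, 1, 0, 0, 1)]
F = [(1, 0, 1, 0, 1, 0, 1, 0),
     (0, 0, 0, 1, 1, 1, 0, 0),
     (1, 1, 0, 1, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 0, 1, 0)]


def covers(term, point):
    """t(A) = 1: every literal of the term agrees with the point."""
    return all(point[j - 1] == v for j, v in term.items())


def covers_all(dnf, points):
    """Every given point is covered by at least one term of the DNF."""
    return all(any(covers(t, p) for t in dnf) for p in points)


def is_theory(dnf, T, F):
    """All terms are patterns of (T, F) and together they cover T."""
    all_patterns = all(any(covers(t, a) for a in T)
                       and not any(covers(t, b) for b in F) for t in dnf)
    return all_patterns and covers_all(dnf, T)


def is_irredundant(dnf, T, F):
    """A theory from which no term can be dropped without uncovering T."""
    return is_theory(dnf, T, F) and not any(
        covers_all(dnf[:i] + dnf[i + 1:], T) for i in range(len(dnf)))


phi = [{1: 0, 2: 1}, {2: 1, 5: 1}, {3: 1, 8: 1}]   # ~x1 x2 v x2 x5 v x3 x8
phi_reduced = phi[:2]                              # ~x1 x2 v x2 x5

print(is_theory(phi, T, F), is_irredundant(phi, T, F))
print(is_theory(phi_reduced, T, F), is_irredundant(phi_reduced, T, F))
```

Primality of the individual patterns is a separate property and is not tested here.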
At this point, let us note that not all extensions of a given pdBf (T , F ) are
theories. For the pdBf of Example 12.1, for instance, the extension f with T (f ) =
T ∪ {(1, 1, 1, 1, 1, 1, 1, 1)} and F (f ) = Bn \ T (f ) is not a theory, since any term that
covers a point in T and (1, 1, 1, 1, 1, 1, 1, 1) must cover some other points not in T
(hence in F (f )). In most cases, only a very small fraction of all extensions are
theories (and an even smaller fraction are prime theories). In this sense, theories
(in particular, prime irredundant theories) define extensions of a given pdBf (T , F )
that can be considered as simple in their DNF expressions. Indeed, the following
statement holds:

Theorem 12.3. Let f be an extension of the pdBf (T , F ) and let $\varphi = \bigvee_{i=1}^{m} C_i$ be
an arbitrary DNF expression of f . If f is not a theory, then there exists a proper
subset S ⊂ {1, 2, . . . , m} such that $\bigvee_{i \in S} C_i$ is a theory of (T , F ).

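Table 12.2. All basic theories of the pdBf in Table 12.1

$\varphi_1 = x_5 x_8 \vee \bar{x}_5 \bar{x}_8$
$\varphi_2 = x_6 x_7 \vee \bar{x}_6 \bar{x}_7$
$\varphi_3^1 = \bar{x}_1 x_2 \vee x_2 x_5$
$\varphi_3^2 = \bar{x}_1 \bar{x}_5 \vee x_2 x_5$
$\varphi_4 = \bar{x}_1 x_2 \vee x_2 \bar{x}_6$
$\varphi_5^1 = x_2 x_5 \vee x_2 x_7$
$\varphi_5^2 = x_2 x_5 \vee \bar{x}_5 x_7$
$\varphi_6^1 = x_2 \bar{x}_6 \vee x_2 \bar{x}_8$
$\varphi_6^2 = x_2 \bar{x}_8 \vee \bar{x}_6 x_8$
$\varphi_7^1 = \bar{x}_1 \bar{x}_5 \vee x_3 \bar{x}_7 \vee x_1 x_5 \bar{x}_7$
$\varphi_7^2 = \bar{x}_1 \bar{x}_5 \vee x_3 \bar{x}_7 \vee x_1 \bar{x}_3 x_5$
$\varphi_7^3 = x_3 \bar{x}_7 \vee \bar{x}_3 x_7 \vee x_1 x_5 \bar{x}_7$
$\varphi_7^4 = x_3 \bar{x}_7 \vee \bar{x}_3 x_7 \vee x_1 \bar{x}_3 x_5$
$\varphi_7^5 = x_3 \bar{x}_7 \vee \bar{x}_5 x_7 \vee x_1 x_5 \bar{x}_7$
$\varphi_7^6 = x_3 \bar{x}_7 \vee \bar{x}_5 x_7 \vee x_1 \bar{x}_3 x_5$
$\varphi_8^1 = \bar{x}_1 \bar{x}_5 \vee \bar{x}_4 \bar{x}_7 \vee x_1 x_4 x_5$
$\varphi_8^2 = \bar{x}_1 \bar{x}_5 \vee \bar{x}_4 \bar{x}_7 \vee x_1 x_5 \bar{x}_7$
$\varphi_8^3 = x_4 x_7 \vee \bar{x}_4 \bar{x}_7 \vee x_1 x_4 x_5$
$\varphi_8^4 = x_4 x_7 \vee \bar{x}_4 \bar{x}_7 \vee x_1 x_5 \bar{x}_7$
$\varphi_8^5 = \bar{x}_4 \bar{x}_7 \vee \bar{x}_5 x_7 \vee x_1 x_4 x_5$
$\varphi_8^6 = \bar{x}_4 \bar{x}_7 \vee \bar{x}_5 x_7 \vee x_1 x_5 \bar{x}_7$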

Proof. For every A ∈ T , there is a term $C_{i(A)}$, with i(A) ∈ {1, 2, . . . , m}, such that
$C_{i(A)}(A) = 1$. Since $C_{i(A)}(B) = 0$ for all B ∈ F , we see that $C_{i(A)}$ is a pattern of
(T , F ). Now, let S = {i(A) | A ∈ T }. Then, $\bigvee_{i \in S} C_i$ is a theory of (T , F ) and since
f itself is not a theory, it must be the case that S is a proper subset of {1, 2, . . . , m}. □

Another measure of simplicity addressed in Section 12.2.2 is the minimality
of a support set S. Any theory of (T |S , F |S ) over a support set S is a theory of
the original pdBf (T , F ). From the viewpoint of simplicity, it is desirable that the
support set S be minimal and the theory be prime. Furthermore, it is easy to show
that any irredundant theory of (T |S , F |S ) for a support set S is an irredundant
theory of (T , F ). Combining these concepts together, we call basic theory any
prime irredundant theory of a pdBf (T , F ) defined over a minimal support set. A
basic theory displays simplicity with respect to both the size of its DNF expression
and the size of the support set.

Example 12.4. As all minimal support sets were listed in Example 12.2 for the
pdBf (T , F ) of Example 12.1, we are now able to obtain all basic theories by
enumerating all prime patterns for each support set Sk . Table 12.2 gives all basic
theories thus obtained, where ϕki indicates the i-th basic theory generated from a
minimal support set Sk . (The superscript i is not indicated if Sk has only one basic
theory.)
Note that the pdBf (T , F ) has $|B^8 \setminus (T \cup F)| = 2^8 - 7 = 249$ unspecified points,
implying that it has $2^{249}$ extensions in FALL . Table 12.2 shows that only 21 of these
extensions are basic theories. □

The preceding discussion can be symmetrically applied to the set F of a pdBf
(T , F ) (in other words, when we consider the pdBf (F , T ) instead of (T , F )).
An implicant t of f¯min that covers at least one point B ∈ F is called a copattern
of (T , F ). A copattern is a prime copattern if it is a prime implicant of f¯min .
Cotheories, prime cotheories, irredundant cotheories and basic cotheories can
then be defined from copatterns and prime copatterns in the same manner. For the
pdBf (T , F ) of Example 12.1, x5 x̄8 is a prime copattern that covers B(1) , B(2) , B(4) ,
and x̄5 x8 is a prime copattern that covers B(3) . Therefore ϕ = x5 x̄8 ∨ x̄5 x8 is a prime
cotheory. It is easy to see that this extension ϕ is also a basic cotheory.

Table 12.3. Flat data set corresponding to the pdBf in Table 12.1

x1 x2 x3 x4 x5 x6 x7 x8 x9

0 1 0 1 0 1 1 0 1
1 1 0 1 1 0 0 1 1
0 1 1 0 1 0 0 1 1
1 0 1 0 1 0 1 0 0
0 0 0 1 1 1 0 0 0
1 1 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0 0

In concluding this subsection, we briefly comment upon the history of the fun-
damental concepts of patterns and theories. In the context of LAD, the definitions
of patterns and theories were formulated in Crama, Hammer, and Ibaraki [233], and
their properties have been studied in several subsequent papers; see, for example,
Boros et al. [115]. However, as patterns and prime patterns for pdBfs are natural
generalizations of implicants and prime implicants for Boolean functions, similar
concepts can be found in early references such as Mendelson [680]; Prather [755];
and Roth [793, 879]. For example, in [755, 793], patterns are discussed under the
name of “basic cells,” and prime patterns under the name of “maximal basic cells.”
The concept of theories is also introduced in these references.
There also exists an interesting relation between patterns, as defined in this
section, and association rules, which are a basic concept used in data mining. In
data mining, a data set is usually given as a “flat” list of data points without “output
bit,” rather than as a pair of sets consisting of positive and negative examples. The
data set of Example 12.1, for instance, would be given as Table 12.3, after adding
the attribute x9 that indicates the outcome of each data point.
A property that holds among such data points is called an association rule if it
can be described as an implication of the form: “if x2 = 0 and x4 = 1, then x9 = 1
holds.” More formally, an association rule takes the form,

(xj1 = aj1 , xj2 = aj2 , . . . , xjk = ajk ) =⇒ xl = al ,

where aj1 , aj2 , . . . , ajk and al are either 0 or 1, respectively. It is further required
that at least one data point should satisfy the rule, that no data point should violate
the rule, and that no shorter rule (namely, consisting of a subset of {xj1 = aj1 ,
xj2 = aj2 , . . . , xjk = ajk } in its left-hand side) should exist.
It is not difficult to see that, if we give a special role to the conclusion variable
xl , and if we define the set of data points satisfying xl = 1 (respectively, xl = 0)
as T (respectively, F ), then the above association rule actually asserts that
$\bigwedge_{i \in P} x_{j_i} \bigwedge_{i \in N} \bar{x}_{j_i}$ is either a prime pattern (in case al = 1) or a prime copattern (in
case al = 0) of the pdBf (T , F ), where P = {i | aji = 1} and N = {i | aji = 0}.
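
The three requirements imposed on an association rule can be checked directly on the flat data set of Table 12.3, as in the following sketch. The encoding of a rule as a premise dictionary and a conclusion pair, and the names `holds` and `is_association_rule`, are our own. Support and confidence thresholds, which are used in data mining practice, are deliberately left out.

```python
# Rows of Table 12.3: columns x1, ..., x8 followed by the outcome x9.
ROWS = [(0, 1, 0, 1, 0, 1, 1, 0, 1),
        (1, 1, 0, 1, 1, 0, 0, 1, 1),
        (0, 1, 1, 0, 1, 0, 0, 1, 1),
        (1, 0, 1, 0, 1, 0, 1, 0, 0),
        (0, 0, 0, 1, 1, 1, 0, 0, 0),
        (1, 1, 0, 1, 0, 1, 0, 1, 0),
        (0, 0, 1, 0, 1, 0, 1, 0, 0)]


def holds(premise, conclusion, rows):
    """At least one row satisfies the rule and no row violates it.
    premise: {column (1-based): value}; conclusion: (column, value)."""
    col, val = conclusion
    matching = [r for r in rows
                if all(r[j - 1] == v for j, v in premise.items())]
    return bool(matching) and all(r[col - 1] == val for r in matching)


def is_association_rule(premise, conclusion, rows):
    """The rule holds and no rule with a smaller premise holds. Dropping
    one condition at a time suffices: if a smaller premise admits no
    violation, neither does any premise lying between it and the full one."""
    if not holds(premise, conclusion, rows):
        return False
    for j in premise:
        shorter = {k: v for k, v in premise.items() if k != j}
        if holds(shorter, conclusion, rows):
            return False
    return True


print(is_association_rule({5: 1, 8: 0}, (9, 0), ROWS))
print(is_association_rule({2: 1, 5: 1}, (9, 1), ROWS))
print(is_association_rule({1: 0, 2: 1, 4: 1}, (9, 1), ROWS))
```

The first rule corresponds to the prime copattern x5 x̄8 and the second to the prime pattern x2 x5 of Example 12.3. The third rule is valid but not minimal, since its premise contains that of the rule associated with x̄1 x2.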
In the discussion of association rules in data mining, data sets are usually sup-
posed to contain errors and missing parts. To cope with such situations, the concepts
of support and confidence are introduced as essential constituents of association
rules. We do not go into details, but refer to Agrawal, Imielinski, and Swami [8];
Fayyad et al. [321]; and Mannila, Toivonen, and Verkamo [665] for further discus-
sion. In this chapter, we shall deal with errors and missing bits of data in Sections
12.4 and 12.5, respectively, from a slightly different viewpoint.

Remark 12.1. There is some confusion in the use of the terms “theory” and
“cotheory” in application areas. In learning theory and data-mining, “theory” is
often used as a synonym of “extension.” But, here we use it to mean a special
extension with certain properties. A cotheory is sometimes referred to as a “negative
theory” to emphasize its role with respect to the set of negative examples F (in
this case, the theory itself is called “positive theory”). We do not follow these
conventions, so as to avoid a potential confusion with the concepts of positive and
negative functions. 

12.2.4 Roles of theories and cotheories


In this subsection, we discuss some properties of theories and cotheories. Most
results in this section are based on Boros et al. [115]. Given a pdBf (T , F ), let us
define the following theory α(T ,F ) and cotheory β(T ,F ) :

$$\alpha_{(T,F)} = \bigvee_{t \in P(T,F)} t \tag{12.8}$$

$$\beta_{(T,F)} = \bigvee_{t \in coP(T,F)} t, \tag{12.9}$$

where P (T , F ) (respectively, coP (T , F )) denotes the set of all patterns (respectively,
copatterns) of (T , F ). The suffix (T , F ) of α and β may be omitted if no
confusion arises. In words, α (respectively, β) is the largest theory (respectively,
cotheory) of (T , F ). We can also define α and β by taking the disjunction of all
prime patterns and prime copatterns of (T , F ), respectively. The resulting theory
and cotheory are equivalent to those defined by (12.8)–(12.9), in the sense that
they define the same functions on B n . Thus, we may also say that α (respectively,
β) is the largest prime theory (respectively, prime cotheory) of (T , F ).
As an important property of α and β, we can show that every point in Bn is a
true point of either α or β. (But note that T (α) ∩ T (β) is not empty in general.)

Theorem 12.4. For every pdBf (T , F ) on Bn , T (α) ∪ T (β) = Bn .



Proof. Take an arbitrary point X ∈ Bn , and let A ∈ T ∪ F be the closest point to X
in the sense of the Hamming distance, which is defined by

$$d(V, W) = |\{\, j \in \{1, 2, \ldots, n\} \mid v_j \ne w_j \,\}| = \sum_{j=1}^{n} |v_j - w_j| \quad \text{for all } V, W \in B^n.$$

Assume A ∈ T without loss of generality. For a point Y ∈ Bn , we use the notation
L(Y ) to denote the set of all literals in its minterm (e.g., if Y = (1, 0, 1, 1),
we have L(Y ) = {x1 , x̄2 , x3 , x4 }). Then let t be the term consisting of all literals in
L(X) ∩ L(A) (e.g., if X = (0, 0, 1, 1, 1) and A = (1, 0, 0, 1, 1), then t = x̄2 x4 x5 ). This
term t satisfies t(A) = 1 by definition. Moreover, t(B) = 0 holds for all B ∈ F ,
since d(X, A) ≤ d(X, B) and A ≠ B imply that at least one literal in t does not
coincide with L(B). Thus, t is a pattern of (T , F ), and hence, α(X) = 1 holds,
which establishes the theorem. □

Example 12.5. Consider the following pdBf (T , F ):

T = {(1, 0, 0), (1, 1, 1)},


F = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}.

This pdBf is illustrated in Figure 12.1. It has the following:

Patterns: x1 , x1 x2 , x1 x̄2 , x1 x3 , x1 x̄3 , x1 x2 x3 , x1 x̄2 x̄3 ,


Copatterns: x̄1 , x̄1 x2 , x̄1 x̄2 , x̄1 x3 , x̄1 x̄3 , x̄1 x2 x3 , x̄1 x̄2 x3 , x̄2 x3 , x̄1 x̄2 x̄3 ,
Prime patterns: x1 ,
Prime copatterns: x̄1 , x̄2 x3 .

Figure 12.1. An example of pdBf in 3-dimensional space.



Therefore, the functions α and β, when represented by the disjunctions of all prime
patterns and prime copatterns, can be written as:
α = x1 , β = x̄1 ∨ x̄2 x3 .
This implies that
T (α) = {(1, 0, 0), (1, 1, 1), (1, 1, 0), (1, 0, 1)},
T (β) = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0), (1, 0, 1)}.
Note that the point (1, 0, 1) belongs to both T (α) and T (β). 
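
The construction used in the proof of Theorem 12.4 can be traced on the pdBf of Example 12.5. For each point X, the sketch below picks a closest point of T ∪ F and forms the term made of the literals on which both points agree. The names `hamming`, `covers`, and `witness_term` are ours, and ties between equally close points are broken arbitrarily (by list order).

```python
from itertools import product

T = [(1, 0, 0), (1, 1, 1)]
F = [(0, 0, 0), (0, 0, 1), (0, 1, 1)]


def hamming(v, w):
    """Hamming distance d(V, W)."""
    return sum(a != b for a, b in zip(v, w))


def covers(term, point):
    """t(A) = 1, with a term given as {1-based index: required value}."""
    return all(point[j - 1] == v for j, v in term.items())


def witness_term(x, T, F):
    """Term on L(X) ∩ L(A) for a closest point A of T ∪ F, together with
    the kind of term obtained: a pattern if A is in T, else a copattern."""
    a = min(T + F, key=lambda p: hamming(x, p))
    term = {j + 1: x[j] for j in range(len(x)) if x[j] == a[j]}
    return term, ("pattern" if a in T else "copattern")


for x in product((0, 1), repeat=3):
    term, kind = witness_term(x, T, F)
    print(x, term, kind)
```

By construction, the returned term always covers X, which is why every point of Bn is a true point of α or of β.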
For a pdBf (T , F ), let us now define
T ∗ = F (β) = {X ∈ Bn | β(T ,F ) (X) = 0}, (12.10)

F ∗ = F (α) = {X ∈ Bn | α(T ,F ) (X) = 0}. (12.11)


Namely, T ∗ (respectively, F ∗ ) is the set of points at which all cotheories of (T , F )
evaluate to 0 (respectively, all theories evaluate to 0). Obviously, T ∗ ⊇ T and
F ∗ ⊇ F hold.
Example 12.6. For the pdBf (T , F ) of Example 12.5, we obtain
T ∗ = {(1, 0, 0), (1, 1, 1), (1, 1, 0)},
F ∗ = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0)}. 
The two pdBfs (T , F ) and (T ∗ , F ∗ ) are mathematically very close, as evidenced
by Lemma 12.2.
Lemma 12.2. For a given pdBf (T , F ), let (T ∗ , F ∗ ) be the pdBf as defined above.
Then,
(i) every pattern (respectively, copattern) of (T , F ) is a pattern (respectively,
copattern) of (T ∗ , F ∗ );
(ii) every pattern (respectively, copattern) of (T ∗ , F ∗ ) is an implicant of α(T ,F )
(respectively, β(T ,F ) );
(iii) α(T ,F ) (X) = α(T ∗ ,F ∗ ) (X) and β(T ,F ) (X) = β(T ∗ ,F ∗ ) (X) for all X ∈ Bn ;
(iv) the pdBfs (T , F ) and (T ∗ , F ∗ ) have the same support sets.
Proof. We omit the proofs of (i)–(iii), as they easily follow from definitions. To
prove (iv), first note that any support set of (T ∗ , F ∗ ) is a support set of (T , F ) by
the property T ∗ ⊇ T and F ∗ ⊇ F . Therefore, let us assume that S is a support set
of (T , F ), and let us show that it is also a support set of (T ∗ , F ∗ ). If this is not
true, there is a pair of points A ∈ T ∗ and B ∈ F ∗ with A|S = B|S . Let t be the
term consisting of all literals in L(A|S ). This term t satisfies t(A) = t(B) = 1 by
definition. Since S is a support set of (T , F ), the term t is either a pattern of (T , F )
(we ignore the case of copattern without loss of generality) or it satisfies t(X) = 0
for all points X ∈ T ∪ F . If t is a pattern of (T , F ), this implies that t(B) = 0
by the definition of F ∗ , leading to a contradiction. In the other case, take a point
C ∈ T ∪ F for which the Hamming distance d(A|S , C|S ) is minimized among all
points in T ∪ F . We assume C ∈ T without loss of generality. Then let t′ be the
term consisting of all literals in L(A|S ) ∩ L(C|S ). This t′ satisfies t′(A) = 1 and
t′(Y ) = 0 for all Y ∈ F by construction (use the argument based on the Hamming
distance in the proof of Theorem 12.4); that is, t′ is a pattern of (T , F ). Then
t′(B) = 0 follows from the definition of F ∗ , again contradicting the assumption
t(B) = 1 since t′ involves only a subset of the literals in t. □

It may be a lengthy procedure to generate all the elements in T ∗ and F ∗ from
a given pdBf (T , F ). We can, however, state the next theorem for membership
testing.

Theorem 12.5. For a given pdBf (T , F ), the membership in T ∗ (or in F ∗ ) can be
tested in polynomial time.

Proof. We consider only the membership in T ∗ , since the other case is similar. Let
X ∈ Bn be a point not in T ∪ F . Then X ∉ T ∗ if and only if there is a copattern
t of (T , F ) satisfying t(X) = 1. Let this t cover B ∈ F , and let t(X,B) be the term
consisting of all the literals in L(X) ∩ L(B). By definition, t(X,B) ≤ t holds and
t(X,B) is also a copattern of (T , F ). This argument implies that the condition X ∉ T ∗
holds if and only if t(X,B) is a copattern for some B ∈ F . This test can be conducted
in time polynomial in the input length n(|T | + |F |). □
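
The membership test described in the proof can be sketched as follows. It is applied here to the pdBf of Example 12.5, so that the result can be compared with Example 12.6. The names `common_term` and `in_star` are ours; the same routine handles T∗ and F∗ by exchanging the roles of T and F.

```python
from itertools import product

T = [(1, 0, 0), (1, 1, 1)]
F = [(0, 0, 0), (0, 0, 1), (0, 1, 1)]


def covers(term, point):
    """t(A) = 1, with a term given as {1-based index: required value}."""
    return all(point[j - 1] == v for j, v in term.items())


def common_term(x, p):
    """Term t(X,P) consisting of the literals in L(X) ∩ L(P)."""
    return {j + 1: x[j] for j in range(len(x)) if x[j] == p[j]}


def in_star(x, same, opposite):
    """Membership of x in T* (same=T, opposite=F) or in F* (same=F,
    opposite=T): x is excluded iff some term t(X,P), with P in `opposite`,
    covers no point of `same`, i.e. is a (co)pattern covering x."""
    for p in opposite:
        term = common_term(x, p)
        if not any(covers(term, s) for s in same):
            return False
    return True


points = list(product((0, 1), repeat=3))
T_star = [x for x in points if in_star(x, T, F)]
F_star = [x for x in points if in_star(x, F, T)]
print(T_star)
print(F_star)
```

Each call examines at most |F| (or |T|) terms and tests each of them against every point on the other side, which is consistent with the polynomial bound stated in the theorem.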

In view of the symmetric relation between theory and cotheory, it may be
interesting to give special consideration to those theories whose complement is
a cotheory: We say that a theory ϕ is a bi-theory of (T , F ) if ϕ̄ is a cotheory of
(T , F ) (more precisely, if ϕ̄ is equivalent to some cotheory of (T , F )). For the pdBf
of Example 12.5, we see that ϕ = x1 is a bi-theory, since its complement ϕ̄ = x̄1
is a cotheory. There is another bi-theory ϕ = x1 x2 ∨ x1 x̄3 , and this exhausts all
bi-theories for this example.
It is natural to ask whether every pdBf has a bi-theory, assuming of course that
it has an extension. The next theorem answers this question.

Theorem 12.6. If a pdBf has an extension, then it has at least one bi-theory.

Proof. Consider a pdBf (T , F ). For a point X ∈ Bn and a set U ⊆ Bn , let d(X, U ) =
$\min_{Y \in U} d(X, Y)$, where d denotes the Hamming distance. To prove the theorem
constructively, define the Boolean function f by

$$f(X) = \begin{cases} 1 & \text{if } d(X, T) \le d(X, F), \\ 0 & \text{otherwise.} \end{cases}$$

It is easy to see that if (T , F ) has an extension, that is, if T ∩ F = ∅, then f is an
extension of (T , F ).
For each X ∈ T (f ), let tX be the term consisting of all the literals in L(X) ∩ L(A)
for some A ∈ T satisfying d(X, A) = d(X, T ). Then define

$$\varphi = \bigvee_{X \in T(f)} t_X.$$

Similarly, for each Y ∈ F (f ), let tY be the term consisting of all the literals in
L(Y ) ∩ L(B) for some B ∈ F satisfying d(Y , B) = d(Y , F ), and define

$$\psi = \bigvee_{Y \in F(f)} t_Y.$$

It follows from these definitions that ϕ is a theory of (T , F ) and ψ is a cotheory of
(T , F ). We are going to show that ϕ represents f and ψ represents f̄ (and hence
ϕ̄), which together imply that ϕ is a bi-theory. For simplicity, we only prove the
statement about ϕ, since the other statement is analogous.
From the definition of ϕ, it follows immediately that ϕ(X) = 1 holds for all
X ∈ T (f ). To show that ϕ(Y ) = 0 holds for all Y ∈ F (f ), choose X ∈ T (f ) and
Y ∈ F (f ) arbitrarily. Let d(X, A) = d(X, T ) hold for A ∈ T and d(Y , B) = d(Y , F )
hold for B ∈ F . Then we have
d(X, A) = d(X, T ) ≤ d(X, F ) ≤ d(X, B) and
d(Y , B) = d(Y , F ) < d(Y , T ) ≤ d(Y , A).
For these A and B, define LAB = L(A) \ L(B) and LBA = L(B) \ L(A) (namely,
LAB is the set of literals in L(A) whose complements are in L(B), and LBA is
defined similarly). Then d(X, A) ≤ d(X, B) implies |tX ∩ LAB | ≥ |LAB |/2, where
tX ∩LAB denotes the set of literals in both tX and LAB . Similarly, d(Y , B) < d(Y , A)
implies |tY ∩ LBA | > |LBA |/2 = |LAB |/2. Therefore, there is at least one literal in
tX whose complement is in tY , and hence, tX (Y ) = 0 holds. As tX was an arbitrary
term in ϕ, this proves ϕ(Y ) = 0. 
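
The construction in this proof is easy to carry out on a small instance. The sketch below builds the nearest-neighbor extension f and the DNFs ϕ and ψ for the pdBf of Example 12.5, enumerating all of Bn, which is practical only for small n. All function names are ours, and nearest points are chosen arbitrarily (by list order) in case of ties.

```python
from itertools import product

T = [(1, 0, 0), (1, 1, 1)]
F = [(0, 0, 0), (0, 0, 1), (0, 1, 1)]
n = 3


def hamming(v, w):
    return sum(a != b for a, b in zip(v, w))


def nearest(x, U):
    """A point of U at minimum Hamming distance from x."""
    return min(U, key=lambda p: hamming(x, p))


def f(x):
    """f(X) = 1 iff d(X, T) <= d(X, F)."""
    return int(hamming(x, nearest(x, T)) <= hamming(x, nearest(x, F)))


def common_term(x, p):
    """Term made of the literals in L(X) ∩ L(P)."""
    return {j + 1: x[j] for j in range(len(x)) if x[j] == p[j]}


def evaluate(dnf, point):
    """Value of a DNF (list of terms) at a point."""
    return int(any(all(point[j - 1] == v for j, v in t.items()) for t in dnf))


points = list(product((0, 1), repeat=n))
phi = [common_term(x, nearest(x, T)) for x in points if f(x) == 1]
psi = [common_term(y, nearest(y, F)) for y in points if f(y) == 0]

print([x for x in points if f(x) == 1])
print(all(evaluate(phi, x) == f(x) for x in points))
print(all(evaluate(psi, x) == 1 - f(x) for x in points))
```

On this instance, f coincides with the bi-theory ϕ = x1 mentioned before Theorem 12.6.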

With regard to bi-theories, the sets (T ∗ , F ∗ ) defined by (12.10)–(12.11) can be
characterized as follows (Boros et al. [115]): X ∈ T ∗ if and only if ϕ(X) = 1 holds
for all bi-theories ϕ, and Y ∈ F ∗ if and only if ϕ(Y ) = 0 holds for all bi-theories ϕ.
Bi-theories play an important role in logical analysis of data, as the classifica-
tions which they produce are justified both by examples from T and by examples
from F . However, not much is known about the complexity of their recognition
and generation. We refer the reader to Boros et al. [115] for additional details.

12.2.5 Decision trees


Decision trees were introduced in Chapter 1, Section 1.12.3, as a means of rep-
resenting Boolean functions. Recall that a decision tree is a binary rooted tree,
in which each intermediate node has exactly two children corresponding to the
assignments xj = 0 and xj = 1 for a chosen variable xj , and each leaf node carries
the function value 0 or 1 for the assignment defined by the unique path from the
root (the top node numbered 0) to the leaf node under consideration. Figure 12.2
shows an example of a decision tree, where intermediate nodes are drawn as circles
and leaves are drawn as squares. For example, the rightmost bottom node (with
assignment 1) indicates that the function value for the assignment x2 = 1, x1 = 1,
and x5 = 1 (along the rightmost path) is 1. To know the function value for a given
data point A = (0, 1, 0, 1, 0, 1, 1, 0), for example, we start from the root and follow
the branch x2 = 1 (since a2 = 1) to the intermediate node 1. Then from node 1 we
follow the branch x1 = 0 (since a1 = 0) to arrive at a leaf node with value 1. This
tells us that f (A) = 1 for the function f represented by this decision tree.

Figure 12.2. An example of a decision tree.

Given a decision tree representing a Boolean function f , the above explanation
entails that a DNF of f can be constructed by the procedure in Figure 12.3.
In this procedure, the term corresponding to a path is defined by including the
literal xj (respectively, x̄j ) if the assignment xj = 1 (respectively, xj = 0) occurs
along the path.

Procedure Tree-DNF
For each leaf node with value 1, construct the term corresponding to the path from the root to the
leaf node, and take the disjunction of all such terms.

Figure 12.3. Procedure Tree-DNF.

Example 12.7. As the decision tree in Figure 12.2 has two leaf nodes with value 1,
the following DNF is obtained by the procedure Tree-DNF.

ϕ = x̄1 x2 ∨ x1 x2 x5 .

This can be simplified to


ϕ = x̄1 x2 ∨ x2 x5 ,
which shows that the tree in Figure 12.2 represents the function f1 of Example 12.1.
The DNF ϕ is actually one of the prime theories obtained in Example 12.3 for the
pdBf of Example 12.1.
As observed in Section 1.12.3, a decision tree yields a DNF of f¯ as well, by
applying the procedure Tree-DNF to all leaf nodes with value 0 (instead of those
with value 1). For the above example, we obtain

ϕ̄ = x̄2 ∨ x1 x2 x̄5 = x̄2 ∨ x1 x̄5 . 
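
Procedure Tree-DNF amounts to a depth-first traversal of the tree. In the sketch below, a leaf is encoded as the integer 0 or 1 and an intermediate node as a triple (j, child for xj = 0, child for xj = 1). This encoding and the names `tree_dnf` and `evaluate` are ours. The tree of Figure 12.2 is encoded from the paths described in the text and in this example.

```python
# Tree of Figure 12.2: branch on x2 at the root, then on x1, then on x5.
TREE = (2, 0, (1, 1, (5, 0, 1)))


def tree_dnf(tree, value=1, path=None):
    """Terms of the root-to-leaf paths ending in a leaf with the given
    value; a term is a dict {variable index: value assigned on the path}."""
    path = path or {}
    if isinstance(tree, int):                     # leaf node
        return [dict(path)] if tree == value else []
    var, child0, child1 = tree
    return (tree_dnf(child0, value, {**path, var: 0})
            + tree_dnf(child1, value, {**path, var: 1}))


def evaluate(tree, point):
    """Follow the branches selected by the point until a leaf is reached."""
    while not isinstance(tree, int):
        var, child0, child1 = tree
        tree = child1 if point[var - 1] else child0
    return tree


print(tree_dnf(TREE, 1))   # terms of a DNF of f
print(tree_dnf(TREE, 0))   # terms of a DNF of the complement of f
print(evaluate(TREE, (0, 1, 0, 1, 0, 1, 1, 0)))
```

The terms are returned without simplification, so they correspond to the first of the two DNFs given for ϕ and for ϕ̄ in this example.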

Let us now consider the problem of constructing a decision tree that represents
an extension of a given pdBf (T , F ). Similarly to the case of DNFs, there are
many such decision trees, and it is desirable to obtain a “simple” one. The sim-
plicity of decision trees may be measured by their number of nodes, or by their
height. Exact minimization is, however, intractable; for example, it is known that
finding a decision tree with the minimum number of nodes is NP-hard (Hyafil
and Rivest [514]). Therefore, various heuristic algorithms have been proposed to
obtain approximately minimum decision trees. When applied to a pdBf (T , F ),
most of these heuristics fit in the generic scheme described in Figure 12.4, where
we assume that T ∩ F = ∅.

Procedure pdBf Decision Tree
Start with the rooted tree consisting of a single node numbered 0, which is unprocessed, and which
is associated with the original pdBf (T , F ).
Repeat the following Branching step as long as there remains an unprocessed node:
(Branching step) Select an unprocessed node numbered k, associated with a pdBf (T_k , F_k ), and
process it according to the following rules.

1. If T_k = ∅, then node k becomes a leaf node with value 0.
2. If F_k = ∅, then node k becomes a leaf node with value 1.
3. If T_k ≠ ∅ and F_k ≠ ∅, then determine a branching variable x_{j_k} such that x_{j_k} does not take a
constant value in T_k ∪ F_k , and generate two children k0 and k1 corresponding to x_{j_k} = 0 and
x_{j_k} = 1, respectively, for which the associated pdBfs (T_{k0} , F_{k0} ) and (T_{k1} , F_{k1} ) are defined
as follows.

T_{k0} = {A ∈ T_k | a_{j_k} = 0},
F_{k0} = {B ∈ F_k | b_{j_k} = 0},
T_{k1} = {A ∈ T_k | a_{j_k} = 1},
F_{k1} = {B ∈ F_k | b_{j_k} = 1}.

Figure 12.4. Procedure pdBf Decision Tree.

The procedure pdBf Decision Tree yields a decision tree D = D(T , F ) for
every pdBf (T , F ). The tree D can be viewed as representing a Boolean function
fD , which is an extension of (T , F ). It is interesting to observe that every extension
produced in this way is a bi-theory, as introduced in Section 12.2.4:

Theorem 12.7. For every pdBf (T , F ), if D is a decision tree produced by the
procedure pdBf Decision Tree, and if fD is the extension of (T , F ) represented
by D, then fD is a bi-theory of (T , F ).

Proof. A DNF ϕD of the function fD can be constructed by the procedure Tree-DNF.
It is easy to see that each term of ϕD is a pattern of (T , F ), and hence, the
DNF ϕD is a theory of (T , F ).
Similarly, the DNF ψD obtained by applying the procedure Tree-DNF to the
leaf nodes of D with value 0 is a cotheory of (T , F ). Since ψD represents f̄D , we
conclude that fD is a bi-theory. □

Note that this result provides an alternative proof of Theorem 12.6. It is
illustrated by Example 12.7. Additional connections between decision trees and
bi-theories are established in Boros et al. [115].
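
The procedure of Figure 12.4 can be sketched recursively, leaving the choice of the branching variable as a parameter. In the sketch below, the default rule (take the first variable that is not constant on Tk ∪ Fk) is merely a placeholder of ours. The tree encoding, with leaves 0 or 1 and intermediate nodes (j, child for xj = 0, child for xj = 1), and the names `build_tree` and `evaluate` are also ours. The data are those of Example 12.1 (Table 12.3), and T ∩ F = ∅ is assumed.

```python
T = [(0, 1, 0, 1, 0, 1, 1, 0),
     (1, 1, 0, 1, 1, 0, 0, 1),
     (0, 1, 1, 0, 1, 0, 0, 1)]
F = [(1, 0, 1, 0, 1, 0, 1, 0),
     (0, 0, 0, 1, 1, 1, 0, 0),
     (1, 1, 0, 1, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 0, 1, 0)]


def build_tree(T, F, choose=None):
    """Branching step of Figure 12.4, applied recursively.
    `choose(T, F, candidates)` returns a 0-based index among `candidates`."""
    if not T:
        return 0                                   # rule 1
    if not F:
        return 1                                   # rule 2
    n = len(T[0])
    # Rule 3: only variables that are not constant on T ∪ F may be chosen.
    candidates = [j for j in range(n) if len({p[j] for p in T + F}) == 2]
    if not candidates:
        raise ValueError("T and F must be disjoint")
    j = choose(T, F, candidates) if choose else candidates[0]
    child0 = build_tree([a for a in T if a[j] == 0],
                        [b for b in F if b[j] == 0], choose)
    child1 = build_tree([a for a in T if a[j] == 1],
                        [b for b in F if b[j] == 1], choose)
    return (j + 1, child0, child1)


def evaluate(tree, point):
    """Value at `point` of the function represented by the tree."""
    while not isinstance(tree, int):
        var, child0, child1 = tree
        tree = child1 if point[var - 1] else child0
    return tree


tree = build_tree(T, F)
print(tree)
```

Since the branching variable is not constant on Tk ∪ Fk, both children receive strictly fewer points than their parent, so the recursion terminates.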
When using the procedure pdBf Decision Tree, the rule applied for choosing
a branching variable at each intermediate node is crucial and determines the prop-
erties of the resulting decision tree, including its size. Different heuristic methods
rely on different ways to select the branching variable. As a representative exam-
ple, we describe now the “information theoretic” rule proposed by Quinlan [770]
in a popular algorithm called ID3.
For a pdBf (Tk , Fk ), let p = |Tk | and q = |Fk |, and define the entropy of (Tk , Fk )
by
$$I(p, q) = -\frac{p}{p+q} \log_2 \frac{p}{p+q} - \frac{q}{p+q} \log_2 \frac{q}{p+q},$$
with the usual convention that $0 \log_2 0 = 0$.
If a branching variable xj yields the pdBfs (Tk0 , Fk0 ) and (Tk1 , Fk1 ), then their
average entropy becomes
$$E(x_j) = \frac{p_0 + q_0}{p+q}\, I(p_0, q_0) + \frac{p_1 + q_1}{p+q}\, I(p_1, q_1),$$

where p0 = |Tk0 |, q0 = |Fk0 |, p1 = |Tk1 | and q1 = |Fk1 |. This means that the amount
of information gained by the decomposition based on the selection of xj is

$$\mathrm{gain}(x_j) = I(p, q) - E(x_j). \tag{12.12}$$

In ID3, the variable xj that maximizes gain(xj ) among all the remaining unfixed
variables is selected as the branching variable at node k.

Example 12.8. Let us apply the procedure ID3 to the pdBf of Example 12.1. We
first apply rule 3 to the original pdBf (T , F ). In determining the branching variable
that maximizes gain(xj ), we can choose the variable that minimizes E(xj ), since
I (p, q) is constant for all xj in (12.12). In order to illustrate the computation of
E(x1 ), observe that the following pdBfs (T′ , F′ ) and (T″ , F″ ) result when we fix
x1 to 0 and to 1, respectively,
T′ = {A(1) , A(3) }
F′ = {B(2) , B(4) }
T″ = {A(2) }
F″ = {B(1) , B(3) }.
Thus we have p0 = |T′ | = 2, q0 = |F′ | = 2, p1 = |T″ | = 1, q1 = |F″ | = 2, and
hence,
$$E(x_1) = \frac{4}{7}\left(-\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4}\right) + \frac{3}{7}\left(-\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3}\right) \approx 0.96.$$
Similarly, we obtain
E(x2 ) ≈ 0.46,
E(x3 ) = E(x4 ) = E(x6 ) = E(x7 ) ≈ 0.96,
E(x5 ) ≈ 0.98,
E(x8 ) ≈ 0.86.
Therefore, at the root, x2 minimizes E(xj ) and is chosen as the branching variable.
Now the two pdBfs (T0 , F0 ) and (T1 , F1 ) that result by fixing x2 to 0 and to 1,
respectively, are given by
T0 = ∅,
F0 = {B (1) , B (2) , B (4) },
T1 = {A(1) , A(2) , A(3) },
F1 = {B (3) }.
As the pdBf (T0 , F0 ) corresponding to x2 = 0 satisfies T0 = ∅, we obtain a leaf node
with value 0 by rule 1 of the branching step in procedure pdBf Decision Tree.
For the pdBf (T1 , F1 ) corresponding to x2 = 1, we again apply rule 3 to select a
branching variable from among x1 , x3 , x4 , . . . , x8 ; in this case, x1 is selected.
Repeating this procedure, we eventually obtain the decision tree of Figure 12.2,
in which the pdBf (T2 , F2 ) of node 2 does not depend on x1 , x2 , and is given by
T2 = {A(2) }
F2 = {B (3) }. 
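
The entropy computations of this example can be reproduced with a few lines of Python. The sketch below uses base-2 logarithms, as in the definition of I(p, q), together with the convention 0 log2 0 = 0. The names `entropy` and `average_entropy` are ours, and the data are those of Example 12.1 (Table 12.3).

```python
from math import log2

T = [(0, 1, 0, 1, 0, 1, 1, 0),
     (1, 1, 0, 1, 1, 0, 0, 1),
     (0, 1, 1, 0, 1, 0, 0, 1)]
F = [(1, 0, 1, 0, 1, 0, 1, 0),
     (0, 0, 0, 1, 1, 1, 0, 0),
     (1, 1, 0, 1, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 0, 1, 0)]


def entropy(p, q):
    """I(p, q); terms with a zero count are skipped (0 * log2(0) = 0)."""
    total = p + q
    return sum(-c / total * log2(c / total) for c in (p, q) if c)


def average_entropy(j, T, F):
    """E(x_j) for the 1-based variable index j."""
    total = len(T) + len(F)
    result = 0.0
    for value in (0, 1):
        p = sum(a[j - 1] == value for a in T)
        q = sum(b[j - 1] == value for b in F)
        if p + q:
            result += (p + q) / total * entropy(p, q)
    return result


E = {j: average_entropy(j, T, F) for j in range(1, 9)}
for j, e in E.items():
    print("E(x%d) = %.2f" % (j, e))

# Maximizing gain(x_j) = I(p, q) - E(x_j) is the same as minimizing E(x_j).
root = min(E, key=E.get)
print("branching variable at the root: x%d" % root)
```

The same function can be applied at node 1 to the pdBf (T1 , F1 ) obtained by fixing x2 = 1. There, x1 , x5 , and x6 all attain the minimum average entropy, so the selection of x1 relies on a tie-breaking rule (here, the smallest index).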
Other types of selection rules for branching variables have also been proposed.
The successful software C4.5 and its successor C5.0 by Quinlan [771, 772], for
example, use a rule based on the gain-ratio in place of the above gain crite-
rion. Another important addition included in these algorithms is the operation of
“pruning,” which is applied after a decision tree is constructed. This operation is
performed on each intermediate node in order to test whether it is more beneficial

to retain the node or to prune it into a leaf node, according to some statistical
criterion. The resulting decision tree usually features a more robust behavior on
new input samples.
Before closing this section, we briefly compare two representations of exten-
sions of pdBfs, by DNFs and by decision trees, respectively. Generally speaking,
if an extension f has a small decision tree, it tends to have a small DNF, and vice
versa, since both representations are closely related as explained earlier in this
section. A decision tree is visually appealing, while a DNF may be more conve-
nient for the purpose of understanding the logical content of f . For certain function
classes, such as F+ , FHORN , and Fk , it is easier to check whether a function belongs
to the class when it is represented by a DNF.
The size of a support set is also positively correlated with that of a decision
tree, but not always exactly. Recall that a support set is a set of variables which
is required to represent an extension. On the other hand, heuristic minimization
of a decision tree, such as performed by ID3, is based on choosing an appropriate
branching variable at each node, independently of the choices at other nodes. As a
result of this difference, minimization of a support set does not generally coincide
with minimization of a decision tree.

12.3 Extensions within given function classes


In this section we consider the problem EXTENSION(C), defined in Section
12.2.1, for the function classes $F^+$, $F_{\mathrm{UNATE}}$, $F_{\mathrm{HORN}}$, $F_{\mathrm{Th}}$, $F_{F_0(S_0,F_1(S_1))}$, $F_k$, and
$F_{k\text{-CONV}}$. We discuss necessary and sufficient conditions for the existence of such
extensions, and the computational complexity of finding an extension in the class
when there is one. The results are mainly borrowed from papers by Boros, Ibaraki,
and Makino [139] and Crama, Hammer, and Ibaraki [233], which also consider
other classes of functions.

12.3.1 Positive extensions


Let us first consider the class of positive functions F+ , defined in Sections 1.10
and 1.11. For this class, we obtain (Zuev [939]):

Theorem 12.8. A pdBf (T, F) has an extension f ∈ F+ if and only if there exists
no pair (A, B) with A ∈ T and B ∈ F such that A ≤ B. This condition can be
checked in polynomial time.

Proof. Necessity. Assume that (T, F) has a positive extension f, and let A ∈ T,
B ∈ F. Then f(A) = 1 and f(B) = 0, and the positivity of f rules out that A ≤ B.
Sufficiency. Define a Boolean function $f^+_{\min}$ by
$$T(f^+_{\min}) = \{C \in \mathcal{B}^n \mid C \ge A \text{ holds for some } A \in T\},$$
$$F(f^+_{\min}) = \mathcal{B}^n \setminus T(f^+_{\min}).$$
It is clear that $f^+_{\min}$ is a positive function. Furthermore, $T(f^+_{\min}) \cap F = \emptyset$ holds by
the assumption on T and F. Therefore, $f^+_{\min}$ is a positive extension of (T, F).
Finally, the condition in the theorem statement can be checked by directly
comparing all pairs (A, B) with A ∈ T and B ∈ F. This can be done in O(n|T||F|)
time, which is polynomial in the input length n(|T| + |F|). □
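
The pairwise test of Theorem 12.8 and the function defined in the sufficiency part of the proof can be sketched in a few lines of Python. This is only an illustration: the helper names and the toy pdBf are hypothetical and do not come from the text, and points are represented as 0/1 tuples.

```python
def leq(a, b):
    """Componentwise order A <= B on 0/1 tuples."""
    return all(x <= y for x, y in zip(a, b))

def has_positive_extension(T, F):
    """Test of Theorem 12.8: no A in T and B in F with A <= B (O(n|T||F|))."""
    return not any(leq(a, b) for a in T for b in F)

def f_min_plus(T):
    """The function of the sufficiency proof: true exactly above some A in T."""
    return lambda c: int(any(leq(a, c) for a in T))

# Hypothetical toy pdBf on three variables (an assumption for illustration).
T = [(1, 1, 0), (0, 1, 1)]
F = [(1, 0, 0), (0, 0, 1)]
ok = has_positive_extension(T, F)
f = f_min_plus(T)
```

When the test succeeds, `f` takes value 1 on every point of T and value 0 on every point of F, as the proof asserts.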

The positive extension $f^+_{\min}$ defined in the proof minimizes the set T(f) among
all positive extensions f of the pdBf (T, F). It is not difficult to show that $f^+_{\min}$ is
in fact the unique minimum positive extension of (T, F). We can also define $f^+_{\max}$
dually:
$$F(f^+_{\max}) = \{C \in \mathcal{B}^n \mid C \le B \text{ holds for some } B \in F\},$$
$$T(f^+_{\max}) = \mathcal{B}^n \setminus F(f^+_{\max}).$$
This function $f^+_{\max}$ is the unique extension that maximizes the set T(f) among all
positive extensions f. Any positive extension f of a pdBf (T, F) satisfies
$$f^+_{\min} \le f \le f^+_{\max},$$
and all positive extensions form a lattice under the operations ∨ and ∧ between
functions. This is a sublattice of the lattice of all extensions of a pdBf (T, F)
introduced in Section 12.2.1.
Assume now that a pdBf (T, F) has positive extensions. We say that a set
$S^+ \subseteq \{1, 2, \dots, n\}$ is a positive support set for (T, F) if $(T|_{S^+}, F|_{S^+})$ has a positive
extension, and we define
$$J_+(A, B) = \{j \in \{1, 2, \dots, n\} \mid a_j = 1,\ b_j = 0\}$$
(compare with J(A, B) of (12.5) in Section 12.2.2). Then the problem of finding
a minimum positive support set can be formulated as the following set covering
problem:
$$\begin{aligned}
\text{minimize} \quad & \sum_{j=1}^{n} y_j \\
\text{subject to} \quad & \sum_{j \in J_+(A,B)} y_j \ge 1, \quad A \in T,\ B \in F, \\
& y_j \in \{0, 1\}, \quad j \in \{1, 2, \dots, n\}.
\end{aligned}$$
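
For very small instances, the set covering formulation above can be solved by exhaustive search. The sketch below is not an algorithm from the text: it enumerates candidate index sets by increasing size, runs in exponential time, and uses 0-based indices and a hypothetical toy pdBf.

```python
from itertools import combinations

def j_plus(a, b):
    """J_+(A, B) with 0-based indices: positions where a_j = 1 and b_j = 0."""
    return {j for j, (x, y) in enumerate(zip(a, b)) if x == 1 and y == 0}

def min_positive_support(T, F):
    """Smallest index set hitting every J_+(A, B), or None if some J_+ is empty."""
    n = len(T[0])
    rows = [j_plus(a, b) for a in T for b in F]
    if any(not r for r in rows):
        return None  # some pair satisfies A <= B, so no positive extension exists
    for size in range(n + 1):
        for S in combinations(range(n), size):
            if all(r & set(S) for r in rows):
                return set(S)

# Hypothetical toy pdBf (an assumption for illustration).
T = [(1, 1, 0), (0, 1, 1)]
F = [(1, 0, 0), (0, 0, 1)]
support = min_positive_support(T, F)
```

On this toy instance the second variable alone separates T from F, so the returned set contains the single 0-based index 1.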


The next theorem can be proved similarly to Theorem 12.2 [233].
Theorem 12.9. Problem MIN-SUPPORT(F+ ) is NP-hard. 

12.3.2 Monotone (unate) extensions


As defined in Section 1.10, a Boolean function f is called monotone (or unate)
if f is either positive or negative in each of its variables. In finding an extension
f ∈ FUNATE of a pdBf (T, F), therefore, one must also determine the polarity
(either positive or negative) of each variable xj. We first show that this problem can
be formulated as a 0–1 integer programming problem, by adapting the argument
used for support sets in Sections 12.2.2 and 12.3.1.
Introduce two new 0–1 variables yj and zj for each j ∈ {1, 2, . . . , n}, where
yj = 1 implies that variable xj appears positively in a unate extension, while zj = 1
implies that xj appears negatively in this extension. The assignment yj = zj = 1
is prohibited, and yj = zj = 0 indicates that the extension does not depend on xj .
Define
J+ (A, B) = {j ∈ {1, 2, . . . , n} | aj = 1, bj = 0},
J− (A, B) = {j ∈ {1, 2, . . . , n} | aj = 0, bj = 1}.
Then problem EXTENSION($F_{\mathrm{UNATE}}$) has a solution for a given pdBf (T, F) if
and only if the following problem has a feasible solution:
$$\sum_{j \in J_+(A,B)} y_j + \sum_{j \in J_-(A,B)} z_j \ge 1, \quad A \in T,\ B \in F \tag{12.13}$$
$$y_j + z_j \le 1, \quad j \in \{1, 2, \dots, n\} \tag{12.14}$$
$$y_j \in \{0, 1\},\ z_j \in \{0, 1\}, \quad j \in \{1, 2, \dots, n\}. \tag{12.15}$$
Furthermore, since yj = 1 or zj = 1 implies that j is used in the resulting support
set of f ∈ $F_{\mathrm{UNATE}}$, problem MIN-SUPPORT($F_{\mathrm{UNATE}}$) can be formulated as the
0–1 programming problem obtained by considering the objective function
$$\text{minimize} \quad \sum_{j=1}^{n} y_j + \sum_{j=1}^{n} z_j$$
together with the constraint set (12.13)–(12.15).
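
To make the role of the variables yj and zj concrete, here is a brute-force sketch that enumerates all polarity assignments and keeps one of minimum support. It is an illustration under stated assumptions, not a method from the text: the encoding (+1 for yj = 1, -1 for zj = 1, 0 for yj = zj = 0) and the toy data are hypothetical, and the search takes time proportional to 3^n.

```python
from itertools import product

def unate_polarities(T, F):
    """Search (12.13)-(12.15) exhaustively; return a minimum-support polarity
    tuple (entries +1, -1, 0) or None when the system is infeasible."""
    n = len(T[0])
    best = None
    for pol in product((0, 1, -1), repeat=n):
        # constraint (12.13): each pair (A, B) needs a separating index
        feasible = all(
            any((p == 1 and a[j] == 1 and b[j] == 0) or
                (p == -1 and a[j] == 0 and b[j] == 1)
                for j, p in enumerate(pol))
            for a in T for b in F)
        if feasible and (best is None or
                         sum(map(abs, pol)) < sum(map(abs, best))):
            best = pol
    return best

# Hypothetical toy data: the first pdBf is consistent with x1 AND (NOT x2),
# the second is the exclusive-or, which has no unate extension.
pol = unate_polarities([(1, 0)], [(0, 0), (1, 1)])
xor = unate_polarities([(1, 0), (0, 1)], [(0, 0), (1, 1)])
```

Constraint (12.14) is enforced implicitly, since each variable receives exactly one of the three states.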


For practical purposes, the above problems may be solved by existing integer
programming algorithms, and heuristic algorithms may be developed to solve large
problem instances. However, the constraint set (12.13)–(12.15) is more compli-
cated than the set covering constraints used in Sections 12.2.2 and 12.3.1, due to the
presence of the additional constraints (12.14). Therefore, the following theorem
(due to [233]) should not come as a surprise.
Theorem 12.10. Problem EXTENSION(FUNATE ) is NP-complete.
Proof. The problem is obviously in the class NP, since it is straightforward to
check whether any assignment of 0–1 values to the variables (yj , zj ) satisfies the
constraints (12.13)–(12.14).
To prove that EXTENSION(FUNATE ) is NP-complete, we provide a reduction
from the following NP-complete problem:

DNF Equation
Instance: A DNF expression φ(X) on the variables X = (x1, x2, . . . , xn).
Question: Is the equation φ(X) = 0 consistent?

(see Chapter 2 and Appendix B). Given an instance φ(X) of DNF Equation,
we construct a pdBf (T, F) such that the corresponding 0–1 problem (12.13)–
(12.14) has a feasible solution if and only if the equation φ(X) = 0 is consistent.
For this purpose, let t1, t2, . . . , tm denote the terms of the DNF φ, and let Ci ⊆
{x1, x̄1, . . . , xn, x̄n} be the set of literals that appear in ti. We define
$$T = \{A^i \in \mathcal{B}^{n+m} \mid i = 1, 2, \dots, m\},$$
$$F = \{B^i, D^i \in \mathcal{B}^{n+m} \mid i = 1, 2, \dots, m\},$$
where, for i, k = 1, 2, . . . , m and j = 1, 2, . . . , n,
$$\begin{aligned}
a^i_j = 1 \text{ and } b^i_j = 0 \quad & \text{if } x_j \in C_i, \\
a^i_j = 0 \text{ and } b^i_j = 1 \quad & \text{if } \bar{x}_j \in C_i, \\
a^i_j = b^i_j = 0 \quad & \text{if } x_j, \bar{x}_j \notin C_i, \\
a^i_{n+i} = b^i_{n+i} = 1, \quad & \\
a^i_{n+k} = b^i_{n+k} = 0 \quad & \text{if } k \ne i, \\
d^i_j = a^i_j, \quad & \\
d^i_{n+k} = 0. \quad &
\end{aligned}$$

For the pdBf (T, F), we obtain the following set of inequalities from (12.13)–
(12.14):

(i) For $A^i \in T$ and $B^i \in F$ (for the same i),
$$\sum_{x_j \in C_i} y_j + \sum_{\bar{x}_j \in C_i} z_j \ge 1, \quad i \in \{1, 2, \dots, m\}.$$
(ii) For $A^i \in T$ and $D^i \in F$ (for the same i),
$$y_{n+i} \ge 1, \quad i \in \{1, 2, \dots, m\}.$$
(iii) $y_j + z_j \le 1, \quad j \in \{1, 2, \dots, n + m\}.$
(iv) Other inequalities.

From (ii) and (iii), we see that $y_{n+i} = 1$ and $z_{n+i} = 0$ must hold for all i ∈
{1, 2, . . . , m}. This implies that the inequalities in (iv) are all redundant, since any
inequality in (iv) contains at least one variable $y_{n+i}$ (i = 1, 2, . . . , m) in its left-
hand side. Therefore, our problem EXTENSION($F_{\mathrm{UNATE}}$) becomes equivalent to
deciding whether the constraints (i) and (iii) have a feasible 0–1 solution. It is
now obvious that such a solution (Y, Z) exists if and only if the original Boolean
equation has a solution X: given (Y, Z), define xj = 0 if yj = 1, xj = 1 if zj = 1, and xj
arbitrary if yj = zj = 0, so that, by (i), every term ti contains a literal that vanishes at X. □

12.3.3 Degree-k extensions


A DNF ϕ is called a k-DNF if it has degree k, that is, if every term of ϕ contains
at most k literals, where k is a given positive integer. We denote by Fk the class
of Boolean functions which can be represented by a k-DNF. The following state-
ment is an immediate corollary of Theorem 12.3 and of the definition of prime
irredundant theories in Section 12.2.3:
Lemma 12.3. If a pdBf (T , F ) has an extension in Fk , then it has a prime irre-
dundant theory in Fk . 
In view of this lemma, a pdBf (T, F) has an extension in Fk if and only if every
point A ∈ T is covered by a pattern of degree k or less. For a given A ∈ T, this
property can be checked as follows. First, construct the minterm
$$t^*_A = \bigwedge_{j : a_j = 1} x_j \bigwedge_{j : a_j = 0} \bar{x}_j,$$
and generate all terms t consisting of at most k literals chosen from the n literals in
$t^*_A$. If at least one of these terms t satisfies T(t) ∩ F = ∅, then t is the required pattern;
otherwise, A is not covered by any pattern of degree k or less. A naive implementation of
this procedure requires $O(n^k \times n|F|)$ time, which is polynomial when k is viewed
as a constant.
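
The naive procedure just described might be sketched as follows. The function names and the toy data are hypothetical; a term is represented as a dictionary mapping a 0-based variable index to the value that variable must take.

```python
from itertools import combinations

def covering_pattern(a, F, k):
    """Return a term of at most k literals of the minterm of `a` that is
    false on every point of F, or None if no such pattern exists."""
    n = len(a)
    for size in range(k + 1):
        for idx in combinations(range(n), size):
            # the term fixes x_j = a_j for j in idx; it covers b iff b agrees there
            if not any(all(b[j] == a[j] for j in idx) for b in F):
                return {j: a[j] for j in idx}
    return None

def has_degree_k_extension(T, F, k):
    """Every true point must be covered by a pattern of degree at most k."""
    return all(covering_pattern(a, F, k) is not None for a in T)

# Hypothetical toy pdBf (an assumption for illustration).
T = [(1, 1, 0)]
F = [(1, 0, 0), (0, 1, 0)]
```

On this toy instance no single literal excludes both false points, while the term x1 x2 does, so an extension exists in F2 but not in F1.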
Thus, we obtain:
Theorem 12.11. The problem EXTENSION($F_k$) can be solved in polynomial
time when k is fixed. Similarly, EXTENSION($F_k^+$) can be solved in polynomial
time for every fixed k.
Proof. The first part of the theorem follows from the above discussion. The
statement about positive extensions can be shown similarly, by starting from
$t^+_A = \bigwedge_{j : a_j = 1} x_j$ instead of $t^*_A$. □

12.3.4 Horn extensions


Recall the characterization of a Horn function in Section 6.3: a Boolean function
f is Horn if and only if $F(f) = F(f)^{\wedge}$ holds, where $U^{\wedge}$ denotes the conjunction
closure of a set $U \subseteq \mathcal{B}^n$. This implies that any Horn extension f of a pdBf (T, F)
satisfies $F(f) \supseteq F^{\wedge}$, and hence, $F^{\wedge} \cap T = \emptyset$ is a necessary condition for the
existence of a Horn extension. The next theorem establishes that this condition is
also sufficient:
Theorem 12.12. A pdBf (T, F) has an extension f ∈ FHORN if and only if $F^{\wedge} \cap T = \emptyset$.
This condition can be checked in polynomial time.
Proof. The Boolean function f defined by $F(f) = F^{\wedge}$ is a Horn function, and it
is an extension of (T, F) if and only if $F^{\wedge} \cap T = \emptyset$. This proves the first part of
the theorem.

Table 12.4. A pdBf (T, F) with Horn extensions

              x1 x2 x3 x4 x5 x6 x7 x8 x9
T   A(1) =     1  1  1  1  0  0  1  0  0
    A(2) =     1  1  1  0  1  0  1  0  0
    A(3) =     1  1  1  0  0  1  0  1  0
    A(4) =     0  0  1  0  0  0  1  0  0
    A(5) =     1  0  0  0  0  0  1  0  0
    A(6) =     0  1  1  0  0  0  0  0  1
    A(7) =     1  1  0  0  0  0  0  0  1
    A(8) =     1  1  1  1  1  1  0  0  0

F   B(1) =     1  1  1  1  0  0  1  1  0
    B(2) =     1  1  1  0  1  0  1  1  1
    B(3) =     1  1  1  0  0  1  1  1  0
    B(4) =     1  1  1  0  0  0  1  0  1

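For the time complexity, note that condition $F^{\wedge} \cap T = \emptyset$ can be rewritten as
$$\bigwedge_{B \in F'} B \ne A, \quad \text{for all } F' \subseteq F \text{ and for all } A \in T.$$
For $A \in \mathcal{B}^n$, define
$$F_{\ge A} = \{B \in F \mid B \ge A\}.$$
For every $F' \subseteq F$, the condition $\bigwedge_{B \in F'} B = A$ implies that B ≥ A for all B ∈ F′
(i.e., $F' \subseteq F_{\ge A}$), and hence, that $\bigwedge_{B \in F_{\ge A}} B = A$ also holds. Therefore, the condition
$F^{\wedge} \cap T = \emptyset$ is equivalent to
$$\bigwedge_{B \in F_{\ge A}} B \ne A, \quad \text{for all } A \in T, \tag{12.16}$$
which can be checked in O(n|T||F|) time by scanning all B ∈ F for each A ∈ T. □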

Example 12.9. Consider the pdBf (T , F ) defined in Table 12.4. It is easily checked
that

F≥A(1) = {B (1) },
F≥A(2) = {B (2) },
F≥A(3) = {B (3) },
F≥A(4) = F≥A(5) = {B (1) , B (2) , B (3) , B (4) },
F≥A(6) = F≥A(7) = {B (2) , B (4) },
F≥A(8) = ∅,
and condition (12.16) holds for all A(i) ∈ T . Therefore, this pdBf has a Horn exten-
sion by Theorem 12.12. 
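
Condition (12.16) is easy to check mechanically. The following sketch applies it to the data of Table 12.4; the function names are hypothetical, points are 0/1 tuples, and the conjunction over an empty set is taken to be the all-ones vector.

```python
def bits(s):
    """Turn a string such as '1101' into a 0/1 tuple."""
    return tuple(int(c) for c in s)

def meet(points, n):
    """Componentwise conjunction; the empty conjunction is (1, ..., 1)."""
    out = [1] * n
    for p in points:
        out = [x & y for x, y in zip(out, p)]
    return tuple(out)

def has_horn_extension(T, F):
    """Condition (12.16): the meet of F_{>=A} differs from A for every A in T."""
    n = len(T[0])
    for a in T:
        above = [b for b in F if all(x <= y for x, y in zip(a, b))]
        if meet(above, n) == a:
            return False
    return True

# Data of Table 12.4.
T = [bits(s) for s in ("111100100", "111010100", "111001010", "001000100",
                       "100000100", "011000001", "110000001", "111111000")]
F = [bits(s) for s in ("111100110", "111010111", "111001110", "111000101")]
horn_ok = has_horn_extension(T, F)
```

Each point of T requires one scan of F, which matches the O(n|T||F|) bound stated in the proof of Theorem 12.12.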
In general, a pdBf (T, F) may have many Horn extensions. Let $f^{\mathrm{HORN}}_{\max}$ denote
the Horn extension that maximizes T(f) among all Horn extensions f. Then, it
follows from the discussion before Theorem 12.12 that $f^{\mathrm{HORN}}_{\max}$ is given by
$$F(f^{\mathrm{HORN}}_{\max}) = F^{\wedge},$$
$$T(f^{\mathrm{HORN}}_{\max}) = \mathcal{B}^n \setminus F^{\wedge},$$
and it is unique. On the other hand, there are generally many minimal Horn
extensions, that is, Horn extensions f with minimal true set T(f).
As observed in Chapter 6, DNFs of Horn functions have numerous special
properties. Some of them can be generalized to Horn extensions of pdBfs. In
particular, there are pdBfs (T, F) for which the number of prime implicants in
$f^{\mathrm{HORN}}_{\max}$ is exponential in the input length n(|T| + |F|). There are algorithms for
generating all prime implicants of $f^{\mathrm{HORN}}_{\max}$, but none of them runs in polynomial time
in its input and output length (Kautz, Kearns, and Selman [554]; Khardon [564]).
It is known that this problem has a polynomial time algorithm if and only if there
is a polynomial time algorithm (in its input and output length) to generate all
prime implicants of the dual of a positive function (Kavvadias, Papadimitriou, and
Sideri [558]). As discussed in Section 4.4, the complexity of the latter problem is
still open. Observe that just finding any Horn DNF of $f^{\mathrm{HORN}}_{\max}$ is not easier than
finding all its prime implicants, since, from such a DNF, all prime implicants can
be generated in polynomial total time (see Section 6.5). The complexity of this
problem and of other related problems, such as finding an irredundant DNF of $f^{\mathrm{HORN}}_{\max}$
and finding a shortest DNF of $f^{\mathrm{HORN}}_{\max}$, remains to be studied.
On the other hand, DNFs of minimal (in the sense of T(f)) Horn extensions
can be described in a canonical form, each of which is of polynomial length in the
input length n(|T| + |F|). To see this, let us introduce some notation. For a pdBf
(T, F) with T ∩ F = ∅, and for each A ∈ T, define
$$I(A) = \{j \in \{1, 2, \dots, n\} \mid a_j = 0 \text{ and } b_j = 1 \text{ for all } B \in F_{\ge A}\},$$
$$R(A) = \begin{cases}
\{\bigwedge_{j=1}^{n} x_j\} & \text{if } A = (1, 1, \dots, 1), \\
\{(\bigwedge_{j : a_j = 1} x_j)\, \bar{x}_l \mid l \in I(A)\} & \text{if } A \ne (1, 1, \dots, 1) \text{ and } I(A) \ne \emptyset, \\
\emptyset & \text{if } A \ne (1, 1, \dots, 1) \text{ and } I(A) = \emptyset.
\end{cases}$$
Note that R(A) is empty only if A ≠ (1, 1, . . . , 1) and I(A) = ∅, in which case
condition (12.16) implies that (T, F) has no Horn extension.
Now, when R(A) is nonempty for all A ∈ T, we define a canonical Horn DNF
for the pdBf (T, F) to be any DNF of the form
$$\varphi = \bigvee_{A \in T} t_A, \quad \text{where } t_A \in R(A).$$
In words, a canonical Horn DNF is obtained by choosing one term tA from each
set R(A) and taking the disjunction of these terms over all A ∈ T. Note that
each term tA ∈ R(A) satisfies tA(A) = 1 and tA(B) = 0 for all B ∈ F. Therefore,
every canonical Horn DNF represents a Horn extension of (T, F), and its length
is O(n|T|).

Example 12.10. Let us obtain I (A) and R(A) for all A ∈ T of Example 12.9.

I (A(1) ) = {8}, R(A(1) ) = {123478̄}


I (A(2) ) = {8, 9}, R(A(2) ) = {123578̄, 123579̄}
I (A(3) ) = {7}, R(A(3) ) = {12367̄8}
I (A(4) ) = {1, 2}, R(A(4) ) = {1̄37, 2̄37}
I (A(5) ) = {2, 3}, R(A(5) ) = {12̄7, 13̄7}
I (A(6) ) = {1, 7}, R(A(6) ) = {1̄239, 237̄9}
I (A(7) ) = {3, 7}, R(A(7) ) = {123̄9, 127̄9}
I (A(8) ) = {7, 8, 9}, R(A(8) ) = {1234567̄, 1234568̄, 1234569̄},

where we employ a shorthand notation for terms; for example, 123478̄ stands for
x1 x2 x3 x4 x7 x̄8 , and so on. Consequently, there are 1×2×1×2×2×2×2×3 = 96
canonical Horn DNFs, among which we find, for example,

ϕ (1) = 123478̄ ∨ 123579̄ ∨ 12367̄8 ∨ 1̄37 ∨ 13̄7 ∨ 237̄9 ∨ 127̄9 ∨ 1234567̄


ϕ (2) = 123478̄ ∨ 123579̄ ∨ 12367̄8 ∨ 1̄37 ∨ 13̄7 ∨ 1̄239 ∨ 123̄9 ∨ 1234569̄.
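
The sets I(A) and R(A) of this example can be recomputed with a short script. The sketch below is illustrative only: the function names are hypothetical, indices are 1-based as in the text, and a term is stored as a dictionary mapping a variable index to 1 (positive literal) or 0 (negated literal).

```python
from math import prod

def bits(s):
    return tuple(int(c) for c in s)

# Data of Table 12.4.
T = [bits(s) for s in ("111100100", "111010100", "111001010", "001000100",
                       "100000100", "011000001", "110000001", "111111000")]
F = [bits(s) for s in ("111100110", "111010111", "111001110", "111000101")]

def index_set(a, F):
    """I(A): indices j with a_j = 0 and b_j = 1 for all B in F_{>=A}."""
    above = [b for b in F if all(x <= y for x, y in zip(a, b))]
    return [j + 1 for j in range(len(a))
            if a[j] == 0 and all(b[j] == 1 for b in above)]

def canonical_terms(a, F):
    """R(A) as a list of terms {index: 1 for x_j, 0 for the negated literal}."""
    ones = {j + 1: 1 for j, x in enumerate(a) if x == 1}
    if all(a):
        return [ones]
    return [{**ones, l: 0} for l in index_set(a, F)]

def value(term, point):
    """Evaluate a term at a 0/1 point."""
    return int(all(point[j - 1] == v for j, v in term.items()))

sizes = [len(canonical_terms(a, F)) for a in T]
count = prod(sizes)  # number of canonical Horn DNFs
```

The product of the sizes |R(A)| reproduces the count of canonical Horn DNFs given in the example.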


It can be proved that every minimal Horn extension has a canonical Horn DNF,
but the converse is not always true; we refer the reader to Makino, Hatanaka,
and Ibaraki [650] for details. In the above Example 12.10, ϕ (2) represents a
minimal Horn extension, but ϕ (1) does not. It can be checked in polynomial
time whether a canonical Horn DNF represents a minimal Horn extension or
not [650]. Further properties of Horn extensions can be found in Ibaraki, Kogan,
and Makino [518].

12.3.5 Threshold extensions


Because of their natural interpretation, threshold extensions of pdBfs have been
extensively studied in data mining and pattern recognition (see, e.g., Mangasarian
[658]; Bradley, Fayyad, and Mangasarian [149]; Mangasarian, Setiono, and
Wolberg [660]), in machine learning (see Matulef et al. [677]; O’Donnell and
Servedio [717]), and in mathematical psychology (see Medina and Schwanen-
flugel [679]; Smith, Murray, and Minda [843]; Wattenmaker et al. [901]).
The problem of deciding whether a pdBf admits a threshold extension is easily
settled.

Theorem 12.13. The pdBf (T, F) has a threshold extension if and only if the
system of inequalities
$$\sum_{j=1}^{n} w_j x_j \le t \quad \text{for all } X \in F, \tag{12.17}$$
$$\sum_{j=1}^{n} w_j x_j \ge t + 1 \quad \text{for all } X \in T, \tag{12.18}$$
has a solution (w1, w2, . . . , wn, t). This condition can be checked in polynomial
time.
Proof. The characterization follows immediately from the definition of threshold
functions. The feasibility of the system of linear inequalities (12.17)–(12.18) can
be checked in polynomial time (see, e.g., [76]). □

Note however that, even when there exists a threshold extension, the solution of
the system (12.17)–(12.18) does not immediately produce a DNF of the extension,
but only a linear separating structure (w1 , w2 , . . . , wn , t). Also, the existence of a
threshold extension does not guarantee the existence of a threshold prime theory
or of a threshold theory defined over a minimum cardinality support set.
Example 12.11. Consider the pdBf given by
T = {(1, 0, 1, 1), (1, 1, 0, 0), (1, 1, 0, 1), (1, 1, 1, 0), (1, 1, 1, 1)},
F = {(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0), (0, 1, 0, 1),
(0, 1, 1, 0), (1, 0, 0, 0), (1, 0, 0, 1), (1, 0, 1, 0)}.
This pdBf has four extensions, namely,
ψ1 = x1 x2 ∨ x1 x3 x4 ,
ψ2 = x1 x2 ∨ x1 x3 x4 ∨ x2 x3 x4 ,
ψ3 = x1 x2 ∨ x1 x3 x4 ∨ x̄2 x3 x4 ,
ψ4 = x1 x2 ∨ x3 x4 .
Of these four extensions, only ψ1 and ψ2 are threshold. The unique prime theory
of (T , F ) is ψ4 , which is not threshold.
Similarly, the pdBf of Example 12.1 has several threshold extensions, but the
extensions defined over the minimum cardinality support sets S1 = {5, 8} and
S2 = {6, 7} (namely, ϕ1 and ϕ2 in Table 12.2) are not threshold. 
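
The claims of Example 12.11 can be explored with a small search for a separating structure (w1, . . . , wn, t). Note that the sketch below does not use linear programming, which is what yields the polynomial bound of Theorem 12.13; it simply tries all integer weight vectors in a small box, so a negative answer only means that no solution exists within the assumed bound. The function names and the bound are hypothetical.

```python
from itertools import product

def threshold_structure(T, F, bound=3):
    """Try integer weights in [-bound, bound]; return (w, t) satisfying
    (12.17)-(12.18), or None if none exists within the bound."""
    n = len(T[0])
    for w in product(range(-bound, bound + 1), repeat=n):
        def dot(x):
            return sum(wi * xi for wi, xi in zip(w, x))
        t = max(dot(x) for x in F)           # smallest t allowed by (12.17)
        if min(dot(x) for x in T) >= t + 1:  # check (12.18)
            return w, t
    return None

# The four extensions of Example 12.11.
PSI = {
    "psi1": lambda x1, x2, x3, x4: (x1 and x2) or (x1 and x3 and x4),
    "psi2": lambda x1, x2, x3, x4: (x1 and x2) or (x1 and x3 and x4)
                                   or (x2 and x3 and x4),
    "psi3": lambda x1, x2, x3, x4: (x1 and x2) or (x1 and x3 and x4)
                                   or ((not x2) and x3 and x4),
    "psi4": lambda x1, x2, x3, x4: (x1 and x2) or (x3 and x4),
}

def split(psi):
    """True and false points of a fully defined function on four variables."""
    pts = list(product((0, 1), repeat=4))
    return [p for p in pts if psi(*p)], [p for p in pts if not psi(*p)]

found = {name: threshold_structure(*split(psi)) for name, psi in PSI.items()}
```

For ψ1 and ψ2 the search returns a separating structure, while for ψ3 and ψ4 it returns nothing within the assumed bound, in agreement with the example.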

12.3.6 Decomposable extensions


Consider a family of subsets S0, S1, . . . , Sk, where Si ⊆ {1, 2, . . . , n} for all i. In
general, we allow Si and Sj to intersect, that is, Si ∩ Sj ≠ ∅, although the case
of disjoint Si’s will be most interesting. We denote the projection of a vector of
variables X = (x1, x2, . . . , xn) to a set S as $X|_S = (x_j \mid j \in S)$.
Now, consider a Boolean function f on $\mathcal{B}^n$. We say that f is
F0(S0, F1(S1), . . . , Fk(Sk))-decomposable if there exist (k + 1) Boolean functions
$g : \mathcal{B}^{|S_0|+k} \to \mathcal{B}$ and $h_i : \mathcal{B}^{|S_i|} \to \mathcal{B}$, for i = 1, 2, . . . , k, such that f can be represented
as the following composition of g and hi, i = 1, 2, . . . , k:
$$f(X) = g(X|_{S_0}, h_1(X|_{S_1}), \dots, h_k(X|_{S_k})). \tag{12.19}$$
Here, F0(S0, F1(S1), . . . , Fk(Sk)) is referred to as a scheme in which F0 and Fi
stand for some Boolean functions.
Decomposability of Boolean functions is an important topic in logic design
[32, 248, 512], database theory [264], reliability and game theory [777], and other
fields; we refer to Bioch [87] for a recent survey. In logic design, decompositions of
partially defined Boolean functions received some attention in the foregoing early
references, and enumerative type algorithms were proposed. Decomposability is
also important from the viewpoint of logical analysis of data, since decompositions
such as (12.19) reveal essential hierarchical logical structures in the underlying
data sets. As the simplest decomposition scheme of this kind, we study in this
section the decomposition scheme F0(S0, F1(S1)), where $F_{F_0(S_0,F_1(S_1))}$ denotes the
class of functions decomposable under this scheme. We also consider the class
$F^+_{F_0(S_0,F_1(S_1))}$ in which the functions g and h1 are restricted to being positive in the
decomposition $g(X|_{S_0}, h_1(X|_{S_1}))$.
We first consider the problem EXTENSION($F_{F_0(S_0,F_1(S_1))}$) for a given pair of
sets S0 and S1. Let us define the structure graph $G_{(T,F)} = (V, E)$ by

V = V0 ∪ V1 ,
E = EF ∪ ET ,
Vi = {X|Si | X ∈ T ∪ F }, i = 0, 1
ET = {(A|S0 , A|S1 ) | A ∈ T },
EF = {(B|S0 , B|S1 ) | B ∈ F }.

When displaying the graph G(T ,F ) , we draw the edges in ET as solid lines, and the
edges in EF as broken lines.

Example 12.12. Consider the pdBf in Table 12.5 with S0 = {1, 2, 3} and S1 =
{4, 5, 6} (ignore the column h1 for the time being). The corresponding structure
graph is shown in Figure 12.5. 

In view of Theorem 12.1, the pdBf (T, F) has an extension $f \in F_{F_0(S_0,F_1(S_1))}$ if
and only if there exists a function $h_1 : V_1 \to \mathcal{B}$ such that $T' \cap F' = \emptyset$, where
$$T' = \{(A|_{S_0}, h_1(A|_{S_1})) \mid A \in T\},$$
$$F' = \{(B|_{S_0}, h_1(B|_{S_1})) \mid B \in F\}.$$

Table 12.5. An example of a pdBf having a decomposition g(X|S0, h1(X|S1))

      S0    S1    h1
T     100   101   1
      011   110   0

F     011   010   1
      110   101   1
      100   110   0
      000   110   0
      000   010   1

Figure 12.5. The structure graph G(T ,F ) of the pdBf in Example 12.12.

In terms of the graph G(T ,F ) , this condition is described as follows:


Lemma 12.4. The pdBf (T, F) has an extension $f \in F_{F_0(S_0,F_1(S_1))}$ if and only if
there exists a function $h_1 : V_1 \to \mathcal{B}$ such that, for every pair of edges $e = (X_0, X_1) \in E_T$
and $e' = (X_0', X_1') \in E_F$, either $X_0 \ne X_0'$ holds or $h_1(X_1) \ne h_1(X_1')$ holds.

Example 12.13. For the pdBf of Example 12.12, possible values of h1 (X) (X ∈ V1 )
are indicated in Table 12.5 and also beside the vertices in V1 , in Figure 12.5. It
is easy to see that these values h1 (X) satisfy the condition in Lemma 12.4, thus
implying that the pdBf of Example 12.12 has an extension in FF0 (S0 ,F1 (S1 )) . 
In order to verify whether there exists a function h1 satisfying the condition of Lemma
12.4, let us construct the auxiliary graph $G^*_{(T,F)} = (V^*, E^*)$ as follows:
$$V^* = V_1,$$
$$E^* = \{(X_1, X_1') \mid \text{there is a vertex } X_0 \in V_0 \text{ in } G_{(T,F)} \text{ such that } (X_0, X_1) \in E_T \text{ and } (X_0, X_1') \in E_F\}.$$

Figure 12.6. The graph G∗(T ,F ) and its two-coloring for the pdBf in Example 12.12.

With this construction, we can state:

Theorem 12.14. The pdBf (T , F ) has an extension f ∈ FF0 (S0 ,F1 (S1 )) if and only
if G∗(T ,F ) is a bipartite graph. In particular, for given sets S0 and S1 , the problem
EXTENSION(FF0 (S0 ,F1 (S1 )) ) can be solved in polynomial time.

Proof. It is easy to see that there exists a function h1 as described in Lemma 12.4
if and only if each vertex of G∗(T ,F ) can be assigned one of two colors, either 0 or
1, so that no two adjacent vertices receive the same color. This condition means
that G∗(T ,F ) must be bipartite, and it can be checked in polynomial time. 

Example 12.14. The auxiliary graph G∗(T ,F ) for the pdBf (T , F ) of Example 12.12
is displayed in Figure 12.6. The colors satisfying the above condition are indi-
cated beside the vertices. This construction illustrates how the h1 -values shown in
Figure 12.5 were obtained. 
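
The test of Theorem 12.14 amounts to building the auxiliary graph and two-coloring it. The sketch below does this with a breadth-first search; the function name is hypothetical, S0 and S1 are given as 1-based index lists, and the first data set is that of Table 12.5, while the second is an invented instance whose auxiliary graph is a triangle.

```python
from collections import deque

def decomposition_coloring(T, F, S0, S1):
    """Return h1 as a dict on V1 if the auxiliary graph is bipartite, else None."""
    proj = lambda x, S: tuple(x[j - 1] for j in S)
    et = [(proj(a, S0), proj(a, S1)) for a in T]
    ef = [(proj(b, S0), proj(b, S1)) for b in F]
    adj = {x1: set() for _, x1 in et + ef}
    for x0, x1 in et:
        for y0, y1 in ef:
            if x0 == y0:  # same S0-part: h1 must separate x1 from y1
                adj[x1].add(y1)
                adj[y1].add(x1)
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None  # odd cycle (or loop): not bipartite
    return color

# Data of Table 12.5, with S0 = {1, 2, 3} and S1 = {4, 5, 6}.
T = [(1, 0, 0, 1, 0, 1), (0, 1, 1, 1, 1, 0)]
F = [(0, 1, 1, 0, 1, 0), (1, 1, 0, 1, 0, 1), (1, 0, 0, 1, 1, 0),
     (0, 0, 0, 1, 1, 0), (0, 0, 0, 0, 1, 0)]
h1 = decomposition_coloring(T, F, [1, 2, 3], [4, 5, 6])

# Hypothetical instance whose auxiliary graph is a triangle.
T3 = [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 1, 0)]
F3 = [(0, 0, 0, 1), (0, 1, 1, 0), (1, 0, 0, 0)]
h3 = decomposition_coloring(T3, F3, [1, 2], [3, 4])
```

A valid coloring is determined only up to swapping the two colors within each connected component, so the values returned for Table 12.5 may be the complement of those printed in the table.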

We next turn to the class of positively decomposable functions $F^+_{F_0(S_0,F_1(S_1))}$. In
this case, we have to rely on Theorem 12.8 rather than on Theorem 12.1. Thus, let
us define the positive structure graph $G^+_{(T,F)} = (V_0 \cup V_1, E_F \cup E_T \cup H_0 \cup H_1)$ for
a given pdBf (T, F), by adding the following sets of directed arcs to the structure
graph $G_{(T,F)} = (V_0 \cup V_1, E_F \cup E_T)$:
$$H_i = \{(X, X') \mid X, X' \in V_i \text{ and } X \le X'\}, \quad i = 0, 1.$$
The arcs (X, X′) in H0 ∪ H1 are drawn as solid arrows from X to X′.

Example 12.15. For the pdBf (T , F ) of Table 12.6, assume that S0 = {1, 2} and
S1 = {3, 4, 5} are given (ignore the column h1 temporarily). The positive structure
graph G+ (T ,F ) is shown in Figure 12.7. 

Table 12.6. An example of a pdBf having a positive decomposition.

      S0    S1    h1
T     11    011   0
      01    101   1
      01    110   1

F     01    010   0
      00    101   1
      10    110   1

Figure 12.7. The positive structure graph $G^+_{(T,F)}$ of the pdBf in Example 12.15.

In view of Theorem 12.8, a positive decomposable extension of (T, F) exists
if and only if there is a function $h_1 : V_1 \to \mathcal{B}$ such that
(i) for all $X_1, X_1' \in V_1$ with $X_1 \le X_1'$, the inequality $h_1(X_1) \le h_1(X_1')$ holds;
(ii) there is no pair of edges $e = (X_0, X_1) \in E_T$ and $e' = (X_0', X_1') \in E_F$ such
that both inequalities $X_0 \le X_0'$ and $h_1(X_1) \le h_1(X_1')$ simultaneously hold.
The condition for the existence of such a function h1 is expressed by the next
lemma, where we let
$$T^* = \{A \in T \mid \text{there exists } B' \in F \text{ such that } B'|_{S_0} \ge A|_{S_0}\},$$
$$F^* = \{B \in F \mid \text{there exists } A' \in T \text{ such that } B|_{S_0} \ge A'|_{S_0}\}.$$

Lemma 12.5. A pdBf (T, F) has an extension $f = g(X|_{S_0}, h_1(X|_{S_1})) \in F^+_{F_0(S_0,F_1(S_1))}$
if and only if there is no pair of points $A \in T^*$ and $B \in F^*$ such
that $A|_{S_1} \le B|_{S_1}$.
Proof. The general condition for the existence of a pair $A \in T^*$ and $B \in F^*$ such
that $A|_{S_1} \le B|_{S_1}$ is illustrated in Figure 12.8. The condition in the lemma asserts
that the positive structure graph $G^+_{(T,F)}$ does not contain Figure 12.8 as a subgraph.
Note that some vertices (e.g., those connected by the arcs in Hi) may be contracted
when we consider the subgraph of Figure 12.8.

Figure 12.8. Illustration of a pair (A, B), $A \in T^*$ and $B \in F^*$, such that $A|_{S_1} \le B|_{S_1}$.

Necessity. Assume that there are points $A \in T^*$, $B \in F^*$, $A' \in T$, $B' \in F$ sat-
isfying the condition of Figure 12.8, for which $A|_{S_1} \le B|_{S_1}$, $A|_{S_0} \le B'|_{S_0}$ and
$A'|_{S_0} \le B|_{S_0}$ hold. This means $h_1(A|_{S_1}) = 1$, because $h_1(A|_{S_1}) = 0$ implies
$(A|_{S_0}, h_1(A|_{S_1})) \le (B'|_{S_0}, h_1(B'|_{S_1}))$, which contradicts condition (ii) on h1
stated before this lemma. However, $h_1(A|_{S_1}) = 1$ implies $h_1(B|_{S_1}) = 1$ by con-
dition (i) on h1, and hence, $(A'|_{S_0}, h_1(A'|_{S_1})) \le (B|_{S_0}, h_1(B|_{S_1}))$, contradicting
condition (ii) on h1.
Sufficiency. If the subgraph of Figure 12.8 is not contained in $G^+_{(T,F)}$, then a
positive function $h_1 : V_1 \to \{0, 1\}$ can be defined as follows:
$$h_1(X) = \begin{cases}
1 & \text{if some } A \in T^* \text{ satisfies } A|_{S_1} \le X, \\
0 & \text{otherwise.}
\end{cases} \tag{12.20}$$
It is straightforward to show that this function h1 satisfies the above conditions (i)
and (ii). □

Example 12.16. It can be checked directly that the positive structure graph G+ (T ,F )
of Figure 12.7 for Example 12.15 does not contain the subgraph of Figure 12.8.
The values of h1 , indicated in Figure 12.7 beside the vertices of V1 , are determined
by (12.20). This assignment h1 satisfies conditions (i) and (ii), as easily seen from
Table 12.6, and we can conclude that the pdBf (T , F ) has a positive extension
f = g(X|S0 , h1 (X|S1 )) ∈ FF+0 (S0 ,F1 (S1 )) . 
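
The condition of Lemma 12.5 and the assignment (12.20) can be checked directly on the data of Table 12.6. In this sketch the function names are hypothetical and S0, S1 are 1-based index lists; the function returns the values of h1 on V1 when the condition of the lemma holds, and nothing otherwise.

```python
def leq(a, b):
    """Componentwise order on 0/1 tuples."""
    return all(x <= y for x, y in zip(a, b))

def positive_decomposition(T, F, S0, S1):
    """Return h1 of (12.20) as a dict on V1, or None if Lemma 12.5 fails."""
    proj = lambda x, S: tuple(x[j - 1] for j in S)
    t_star = [a for a in T if any(leq(proj(a, S0), proj(b, S0)) for b in F)]
    f_star = [b for b in F if any(leq(proj(a, S0), proj(b, S0)) for a in T)]
    if any(leq(proj(a, S1), proj(b, S1)) for a in t_star for b in f_star):
        return None
    v1 = {proj(x, S1) for x in T + F}
    return {x: int(any(leq(proj(a, S1), x) for a in t_star)) for x in v1}

# Data of Table 12.6, with S0 = {1, 2} and S1 = {3, 4, 5}.
T = [(1, 1, 0, 1, 1), (0, 1, 1, 0, 1), (0, 1, 1, 1, 0)]
F = [(0, 1, 0, 1, 0), (0, 0, 1, 0, 1), (1, 0, 1, 1, 0)]
h1 = positive_decomposition(T, F, [1, 2], [3, 4, 5])
```

The computed values agree with the column h1 of Table 12.6.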

Since the condition of Lemma 12.5 can be checked in polynomial time by
enumerating all possible subsets of eight vertices, we obtain the next theorem.

Theorem 12.15. For given sets S0 and S1, problem EXTENSION($F^+_{F_0(S_0,F_1(S_1))}$)
can be solved in polynomial time.

To conclude this section, we summarize in Table 12.7 the complexity status
of problem EXTENSION(C) for various decomposition schemes in FALL and
F+, where S1, S2, . . . , Sk are given subsets. For F+, we require that the functions
g and hi in (12.19) should all be positive. In Table 12.7, a letter “P” indi-
cates that the corresponding problem is solvable in polynomial time, and “NPC”
means that it is NP-complete. Most of these results are due to Boros et al. [122],
except the results for $C^+_{F_0(S_0,F_1(S_1),\dots,F_k(S_k))}$ with k ≥ 3, and $C^+_{F_0(F_1(S_1),F_2(S_2),\dots,F_k(S_k))}$
with k ≥ 4, which are proved by Makino, Yano, and Ibaraki [655]. The lat-
ter reference also considers cases in which some or all of the functions are
restricted to be Horn. Further related results can be found in Ono, Makino, and
Ibaraki [718].

Table 12.7. Complexity results for decomposable extensions

Scheme                                        FALL   F+
F0(S0, F1(S1))                                P      P
F0(F1(S1), F2(S2))                            P      P
F0(S0, F1(S1), F2(S2))                        NPC    P
F0(F1(S1), F2(S2), F3(S3))                    NPC    P
F0(S0, F1(S1), . . . , Fk(Sk)), k ≥ 3         NPC    NPC
F0(F1(S1), F2(S2), . . . , Fk(Sk)), k ≥ 4     NPC    NPC

P: polynomial time, NPC: NP-complete

12.3.7 k-convex extensions


The concept of k-convex function was introduced by Ekin, Hammer, and Kogan
[306], and k-convex extensions were studied by the same authors in [307]. A
Boolean function f is called k-convex for a given integer k ≥ 2 if, for every pair
of true points A, C ∈ T (f ) with Hamming distance d(A, C) ≤ k, every point B
located between A and C is also a true point of f . Here, we say that B is located
between A and C if d(A, B) + d(B, C) = d(A, C) holds. The class of k-convex
functions is denoted Fk-CONV .
The class Fk-CONV deserves attention in data analysis because k-convex func-
tions can model situations in which the set of true points consists of a number of
clusters that lie far apart (at distance larger than k) from each other.
Let us say that two terms s and t conflict in h literals if there are h variables,
each of which appears in exactly one of the terms s and t as a positive literal, and
in the other term as a negative literal. A k-convex function can be characterized as
follows [306]:

Lemma 12.6. For k ≥ 2, a Boolean function f is k-convex if and only if every
two prime implicants of f conflict in at least k + 1 literals.

Example 12.17. Consider a function f with two prime implicants,

f = x1 x2 x3 x4 ∨ x̄1 x̄2 x̄3 .

Since the two prime implicants of f conflict in three literals, this function is 2-
convex. In other words, T (f ) consists of two clusters represented by the two prime
implicants, and any two points belonging to different clusters are at Hamming dis-
tance at least 3. 

For a function f (which may not be k-convex), define the k-convex envelope
of f to be the smallest k-convex majorant of f . The k-convex envelope of f is
denoted by [f ]k . Thus, [f ]k ∈ Fk-CONV , and [f ]k ≤ g for all g ∈ Fk-CONV such
that f ≤ g.
Ekin, Hammer and Kogan [306] introduced the k-convex envelope and proved
that it always exists. In order to describe an algorithm to compute the k-convex
envelope, we first define the convex hull of two terms s and t. Let yj,
j = 1, 2, . . . , n, denote (positive or negative) literals, and assume that s and t are
written as
$$s = \bigwedge_{j \in S_1} y_j \bigwedge_{j \in S_2} y_j \bigwedge_{j \in S_3} y_j,$$
$$t = \bigwedge_{j \in S_1} \bar{y}_j \bigwedge_{j \in S_2} y_j \bigwedge_{j \in S_4} y_j,$$
where S1 denotes the set of indices of conflicting literals, S2 the set of indices of
common literals, and S3 and S4 (satisfying S3 ∩ S4 = ∅) the sets of indices of literals which
appear only in s and only in t, respectively. The convex hull [s, t] is defined as the
conjunction of the common literals in s and t:
$$[s, t] = \bigwedge_{j \in S_2} y_j.$$
Given a DNF ϕ0 of a function f, a DNF of its k-convex envelope [f]k is
obtained by applying the following operation as long as possible.
If the current DNF ϕ contains two terms s and t conflicting in at most k literals, then
remove s and t from ϕ, and add the new term [s, t] to ϕ.

This algorithm terminates in polynomial time in the length of the DNF ϕ0 , since
the number of terms decreases by one at each iteration.

Example 12.18. Let us compute the 2-convex envelope of the following function:

f = x1 x2 x3 x4 x5 ∨ x1 x2 x3 x4 x6 ∨ x̄1 x̄2 x̄3 x5 x6 ∨ x̄1 x̄2 x̄3 x4 x̄5 x6 ∨ x̄1 x̄2 x̄3 x̄4 x̄5 x̄6 .
Taking the convex hull of the first two terms, which have no conflicting literal, we
obtain
[x1 x2 x3 x4 x5 , x1 x2 x3 x4 x6 ] = x1 x2 x3 x4 .
Similarly, from the third and fourth terms having one conflicting literal x5 , we
obtain
[x̄1 x̄2 x̄3 x5 x6 , x̄1 x̄2 x̄3 x4 x̄5 x6 ] = x̄1 x̄2 x̄3 x6 .
Finally, from this new term and the fifth term of f having one conflicting literal
x6 , we obtain
[x̄1 x̄2 x̄3 x6 , x̄1 x̄2 x̄3 x̄4 x̄5 x̄6 ] = x̄1 x̄2 x̄3 .
The resulting two terms conflict in three (= k + 1) literals, and thus we have
obtained the 2-envelope of f :
[f ]2 = x1 x2 x3 x4 ∨ x̄1 x̄2 x̄3 .
This is indeed a 2-convex function as already discussed in Example 12.17. 
Now let (T , F ) be a pdBf, and suppose we want to know whether (T , F ) admits
a k-convex extension. Let ϕT be the DNF consisting of all the minterms associated
with the true points in T (ϕT is the minterm expression of fmin ; see (12.3)). Then
from the preceding argument, the following theorem easily follows:
Theorem 12.16. A pdBf (T , F ) has an extension f ∈ Fk-CONV if and only if the k-
envelope of ϕT satisfies [ϕT ]k (B) = 0 for all B ∈ F . This condition can be checked
in polynomial time.
Proof. Suppose that g is a k-convex extension of (T , F ). Since ϕT ≤ g, the
definition of the k-convex envelope implies [ϕT ]k ≤ g. Now, for all B ∈ F , g(B) =
0, and hence, [ϕT ]k (B) = 0.
The converse implication is straightforward (if [ϕT ]k vanishes on F , then [ϕT ]k is
itself a k-convex extension of (T , F ), since ϕT ≤ [ϕT ]k ), and so is the complexity
statement. □
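
As a hedged illustration of the test in Theorem 12.16, the following Python sketch builds the minterms of the true points, computes their k-convex envelope by the merging operation described earlier, and checks that no false point is covered. The representation (a term is a dictionary from a 0-based variable index to the required bit) and all function names are assumptions made for this sketch; the small pdBf at the end is hypothetical.

```python
from itertools import combinations

def conflicts(s, t):
    """Number of shared variables on which terms s and t require different bits."""
    return sum(1 for v in s.keys() & t.keys() if s[v] != t[v])

def k_envelope(terms, k):
    """Merge terms conflicting in at most k literals until no such pair remains."""
    terms = [dict(t) for t in terms]
    while True:
        pair = next(((i, j) for i, j in combinations(range(len(terms)), 2)
                     if conflicts(terms[i], terms[j]) <= k), None)
        if pair is None:
            return terms
        s, t = terms[pair[0]], terms[pair[1]]
        hull = {v: s[v] for v in s.keys() & t.keys() if s[v] == t[v]}
        terms = [u for idx, u in enumerate(terms) if idx not in pair] + [hull]

def covers(term, point):
    """True if the term takes value 1 at the given 0/1 point."""
    return all(point[v] == bit for v, bit in term.items())

def has_k_convex_extension(T, F, k):
    """Criterion of Theorem 12.16: the envelope of the minterm DNF must vanish on F."""
    minterms = [dict(enumerate(a)) for a in set(T)]
    envelope = k_envelope(minterms, k)
    return not any(covers(term, b) for term in envelope for b in F)

# Hypothetical data: the two true points conflict in exactly two bits.
T = [(1, 1, 0, 0), (1, 1, 1, 1)]
F = [(1, 1, 0, 1)]
with_k2 = has_k_convex_extension(T, F, 2)   # envelope x1 x2 covers the false point
with_k1 = has_k_convex_extension(T, F, 1)   # no merge happens
```

In this toy instance a 1-convex extension exists while a 2-convex one does not, since for k = 2 the two minterms must be merged into x1 x2.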

Remark 12.2. The problem EXTENSION(C) has also been extensively studied in
computational learning theory, where it is usually called the consistency problem.
This interest is motivated by the fact that a class C is not PAC learnable and not
polynomially exact learnable with equivalence queries, if the consistency problem
for C is NP-complete (provided, of course, P ≠ NP); see, for example, Anthony
[26] for details. For example, the consistency problem for the class of h-term DNF
functions (namely, functions representable by a disjunction of at most h terms)
was shown to be NP-complete by Pitt and Valiant [749]. For related topics, the
reader is referred to Aizenstein et al. [13]; Angluin [21]; Bshouty [161]; Kearns,
Li, and Valiant [560]; and Valiant [884]. □

12.4 Best-fit extensions of pdBfs containing errors


Real-world data sets represented as pdBfs (T , F ) are prone to errors. Some points
in T ∪ F may contain corrupted bits, some points may have been erroneously
classified, and some attributes not included in the current data set may render
it inconsistent. In this section, in order to cope with such situations, we allow an
extension f “to make errors” in the sense that some points A ∈ T may be classified
in F (f ) (f (A) = 0), and some points B ∈ F may be classified in T (f ) (f (B) = 1).
However, we obviously want to minimize the magnitude of such errors. In order
to state more precisely the resulting questions, let

w : T ∪ F → R+

be a weighting function that represents the importance of each data point in T ∪ F .


For a subset U ⊆ T ∪ F , we let

\[ w(U) = \sum_{A \in U} w(A). \]

Boros, Ibaraki, and Makino [139] introduced the following problem (see also
Boros, Hammer, and Hooker [128]):

Problem BEST-FIT(C)
Instance: A pdBf (T , F ) and a weighting function w on T ∪ F .
Output: A pdBf (T ∗ , F ∗ ) (and an extension f ∈ C of (T ∗ , F ∗ )) with the following
properties:
1. T ∗ ∩ F ∗ = ∅ and T ∗ ∪ F ∗ = T ∪ F .
2. (T ∗ , F ∗ ) has an extension in C.
3. w(T ∩ F ∗ ) + w(F ∩ T ∗ ) is minimized.
The conditions in this problem express that if we consider the points in T ∩
F ∗ and F ∩ T ∗ as erroneously classified, and if we change their classification
accordingly, then the resulting pdBf (T ∗ , F ∗ ) has an extension in the designated
class C. In case the weighting function w satisfies w(A) = 1 for all A ∈ T ∪ F , the
problem asks to minimize the number of erroneously classified points in T ∪ F .
Clearly, problem BEST-FIT(C) contains problem EXTENSION(C) as a spe-
cial case. Therefore, if EXTENSION(C) is NP-complete, then BEST-FIT(C) is
NP-hard. Conversely, if BEST-FIT(C) is solvable in polynomial time, then so is
EXTENSION(C). The next theorem indicates that BEST-FIT(C) is quite hard and
is polynomially solvable only for very restrictive classes C (see [139] for additional
results).

Theorem 12.17. Problem BEST-FIT(C) can be solved in polynomial time for C =
FALL and C = F+ , but is NP-hard for C ∈ {FUNATE , FTh , FHORN , FF0 (S0 ,F1 (S1 )) , Fk }.

Proof. We prove the polynomiality of BEST-FIT(C) for FALL and F+ . Its NP-
hardness for FUNATE follows from Theorem 12.10. The results for other classes
are omitted (see Boros, Ibaraki, and Makino [139]).
C = FALL : By Theorem 12.1, if (T , F ) does not have an extension in FALL , then
T ∩ F ≠ ∅. The optimal pdBf (T ∗ , F ∗ ) is obtained by reclassifying every point
X ∈ T ∩ F either into T ∗ or into F ∗ . Since both decisions carry the same weight
w(X), we can minimize w(T ∗ ∩ F ) + w(F ∗ ∩ T ) by letting, for example,

T ∗ = T \ F , F ∗ = F . (12.21)

C = F+ : By Theorem 12.8, if the pdBf (T , F ) does not have an extension in
F+ then there are two points A ∈ T , B ∈ F with A ≤ B. Define a bipartite graph
H(T ,F ) = (T ∪ F , E) by

E = {(A, B) | A ≤ B, A ∈ T , B ∈ F }.

This graph H(T ,F ) can be constructed from (T , F ) in O(n|T ||F |) time. A minimum
vertex cover of H(T ,F ) is a subset of vertices U ⊆ T ∪ F such that
(1) U is a vertex cover of H(T ,F ) , that is, every edge (A, B) ∈ E satisfies either
A ∈ U or B ∈ U , and
(2) w(U ) is minimum among all vertex covers.
Although the problem of finding a minimum vertex cover is NP-hard for general
graphs, it is solvable in O((|T | + |F |)^3) time for bipartite graphs (e.g., Ford and
Fulkerson [341]; Kuhn [587]).
Let U be a minimum vertex cover of H(T ,F ) . We can assume without loss of
generality that U is a minimal cover, meaning that no proper subset of U is a vertex
cover (this is certainly true if all weights w are strictly positive; otherwise, simply
remove all redundant vertices from U ).
Observe that for every positive Boolean function f , the set

W = (T ∩ F (f )) ∪ (F ∩ T (f ))

is a vertex cover of H(T ,F ) . (Indeed, otherwise, there is an edge (A, B) ∈ E such
that A ≤ B, f (A) = 1 and f (B) = 0, which contradicts the positivity of f .) This
implies that
w(T ∩ F (f )) + w(F ∩ T (f )) ≥ w(U ) (12.22)
for every positive function f .
Now define

T ∗ = (T \ U ) ∪ (F ∩ U ), (12.23)

F ∗ = (T ∩ U ) ∪ (F \ U ). (12.24)

We claim that the pdBf (T ∗ , F ∗ ) has an extension in F+ . Every such extension f
satisfies

w(T ∩ F (f )) + w(F ∩ T (f )) = w(T ∩ F ∗ ) + w(F ∩ T ∗ ) = w(U ),

and this, together with (12.22), implies that (T ∗ , F ∗ ) provides an optimal solution
of BEST-FIT(F+ ). The total time required for the entire computation of (T ∗ , F ∗ )
is O(n|T ||F | + (|T | + |F |)^3).
In order to prove the claim, assume that (T ∗ , F ∗ ) does not have a positive
extension. This means that there exist A ∈ T ∗ and B ∈ F ∗ such that
A ≤ B. We distinguish three cases, according to the definition (12.23)–(12.24)
of (T ∗ , F ∗ ):

(1) A ∈ T \ U and B ∈ F \ U : Then, the edge (A, B) is in E, and this contradicts
the assumption that U is a vertex cover of H(T ,F ) .
(2) A ∈ T \ U and B ∈ T ∩ U : If there is an edge (B, B′) ∈ E with B′ ∈ F
and B ≤ B′, we have A ≤ B ≤ B′. Hence, (A, B′) is an edge of H(T ,F ) , and
B′ ∈ U , since U is a vertex cover and A ∉ U . This shows that U \ {B} is
also a vertex cover, contradicting the minimality of U .
(3) A ∈ F ∩ U : If there is an edge (A′, A) ∈ E with A′ ∈ T and A′ ≤ A, we
have A′ ≤ A ≤ B. If A′ ∈ T \ U , then the same reasoning as in either (1)
or (2) (with A′ now playing the role of A) leads again to a contradiction.
Thus, A′ ∈ U , and we conclude that U \ {A} is also a vertex cover of H(T ,F ) ,
contradicting again the minimality of U .

This completes the proof of the claim and of the theorem. □
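
The construction in this proof can be tried out with the following Python sketch. It is restricted, by assumption, to the unit-weight case w(A) = 1 for all points, where a minimum vertex cover of a bipartite graph can be read off a maximum matching (König's theorem); the general weighted case needs a minimum-cut computation that is not shown. The function names and the data layout (points as 0/1 tuples) are hypothetical.

```python
def leq(a, b):
    """Componentwise order A <= B on 0/1 tuples."""
    return all(x <= y for x, y in zip(a, b))

def best_fit_positive(T, F):
    """Unit-weight BEST-FIT(F+): returns (T_star, F_star, number of reclassified points)."""
    T, F = list(T), list(F)
    # Edges of H(T,F): T[i] -- F[j] whenever T[i] <= F[j].
    adj = [[j for j, b in enumerate(F) if leq(a, b)] for a in T]
    match_F = [None] * len(F)          # match_F[j] = index of the T-vertex matched to F[j]

    def augment(i, seen):
        """Try to find an augmenting path starting at T-vertex i."""
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                if match_F[j] is None or augment(match_F[j], seen):
                    match_F[j] = i
                    return True
        return False

    for i in range(len(T)):
        augment(i, set())

    # Alternating search from the unmatched T-vertices.
    matched_T = {i for i in match_F if i is not None}
    reach_T = {i for i in range(len(T)) if i not in matched_T}
    reach_F = set()
    stack = list(reach_T)
    while stack:
        i = stack.pop()
        for j in adj[i]:
            if j not in reach_F:
                reach_F.add(j)
                k = match_F[j]
                if k is not None and k not in reach_T:
                    reach_T.add(k)
                    stack.append(k)

    # Minimum vertex cover U = (unreached T-vertices) + (reached F-vertices).
    U_T = {T[i] for i in range(len(T)) if i not in reach_T}
    U_F = {F[j] for j in reach_F}
    T_star = (set(T) - U_T) | U_F      # (12.23)
    F_star = U_T | (set(F) - U_F)      # (12.24)
    return T_star, F_star, len(U_T) + len(U_F)

# Data of Example 12.19.
T = [(0, 1, 1, 0, 0), (0, 1, 0, 1, 0), (0, 0, 1, 1, 0), (0, 0, 1, 0, 1), (0, 0, 1, 1, 1)]
F = [(0, 1, 0, 1, 1), (1, 1, 0, 1, 0), (0, 1, 1, 1, 0), (0, 0, 1, 1, 1)]
T_star, F_star, errors = best_fit_positive(T, F)
```

On the data of Example 12.19 the sketch reclassifies three points and returns the same pair (T ∗ , F ∗ ) as the example. With unit weights a minimum cover is automatically minimal, so the clean-up step mentioned in the proof is not needed here.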

Example 12.19. Consider a pdBf (T , F ) on B^5 defined by

T = {(0, 1, 1, 0, 0), (0, 1, 0, 1, 0), (0, 0, 1, 1, 0), (0, 0, 1, 0, 1), (0, 0, 1, 1, 1)},
F = {(0, 1, 0, 1, 1), (1, 1, 0, 1, 0), (0, 1, 1, 1, 0), (0, 0, 1, 1, 1)}.

The weighting function w is given by w(A) = 1 for all A ∈ T ∪ F .


Since T ∩ F = {(0, 0, 1, 1, 1)}, a best-fit extension in FALL is obtained from the
pdBf (T ∗ , F ∗ ) defined by

T ∗ = T \ {(0, 0, 1, 1, 1)},
F∗ = F,

in view of (12.21).
To solve BEST-FIT(F+ ), we then construct the bipartite graph H(T ,F ) of
Figure 12.9. This graph has a minimum vertex cover

U = {(0, 1, 0, 1, 0), (0, 1, 1, 1, 0), (0, 0, 1, 1, 1)},

as illustrated by the dark circles in the figure. Therefore, by (12.23)–(12.24) in the
above proof, we obtain

T ∗ = (T \ U ) ∪ (F ∩ U )
= {(0, 1, 1, 0, 0), (0, 0, 1, 1, 0), (0, 0, 1, 0, 1), (0, 0, 1, 1, 1), (0, 1, 1, 1, 0)},

F ∗ = (T ∩ U ) ∪ (F \ U )
= {(0, 1, 0, 1, 0), (0, 1, 0, 1, 1), (1, 1, 0, 1, 0)}.
Figure 12.9. Bipartite graph H(T ,F ) for the pdBf in Example 12.19 (dark circles denote the
vertices in a minimum vertex cover U ).

It is easily checked that there is no pair (A, B) with A ∈ T ∗ and B ∈ F ∗
such that A ≤ B, and hence, the pdBf (T ∗ , F ∗ ) has an extension in F+ by
Theorem 12.8. □

The problem BEST-FIT was extensively studied by Boros, Ibaraki, and Makino
[139]. As the problem plays an important role in analyzing real-world data, efficient
heuristic algorithms are necessary to deal with those classes C for which BEST-
FIT(C) is NP-hard. Some attempts in this direction have been made, for instance,
in Boros et al. [131].

12.5 Extensions of pdBfs with missing bits


Not only do real-world data sets often contain erroneous data, but they may also
turn out to be incomplete. By incomplete, we mean here that one or several bits
may be missing from certain data points. Such bits simply may have been lost
in the handling process, or may have been unavailable at the time of data collec-
tion, or intentionally omitted, for instance, because obtaining the bits is costly or
dangerous. Let us denote each missing bit by ∗, and let

M = {0, 1, ∗}.

We call partially defined Boolean function with missing bits, abbreviated as pBmb,
any pair (T̃ , F̃ ) consisting of a set of positive examples T̃ ⊆ Mn and of a set of
negative examples F̃ ⊆ Mn . Following the line of Boros, Ibaraki, and Makino
[140], we introduce in the next subsection various types of extensions which are
meaningful for pBmbs. Related complexity results are then discussed in Section
12.5.2.
12.5.1 Three types of extensions


When trying to define the concept of an “extension of a pBmb,” missing bits “∗”
in data points may be interpreted in two different ways:

1. We consider that each missing bit can take value either 0 or 1, and the value
of the extension should be identical in both cases.
2. We consider that each missing bit should be fixed to one of the two values
0 and 1, and an extension should exist for these fixed values. (Here, it is
important to fix appropriately the value of the missing bits.)

If we take the first point of view, then we can define a fully robust extension1
of a pBmb (T̃ , F̃ ) to be a Boolean function f such that f (A) = 1 (respectively,
f (B) = 0) for all A ∈ Bn (respectively, B ∈ B n ) obtainable from a point à ∈ T̃
(respectively, B̃ ∈ F̃ ) by fixing each missing bit to either 0 or 1. From the second
point of view, we can define a consistent extension of (T̃ , F̃ ) to be an extension of
some pdBf (T , F ) obtained from (T̃ , F̃ ) by fixing all missing bits appropriately.
When a pBmb (T̃ , F̃ ) admits a consistent extension, but no fully robust exten-
sion, then we may also take an intermediate view whereby we should fix a smallest
possible number of missing bits so that the resulting pBmb has a fully robust
extension. Such an extension is called a most robust extension.
To describe more precisely the above three problems, let us introduce some
notations. For a set of points S̃ ⊆ Mn , let

AS(S̃) = {(X, j ) | X ∈ S̃, xj = ∗}.

For each subset Q ⊆ AS(S̃) and α ∈ B^Q, we interpret α as an assignment of values
to the missing bits xj , for all points X and all indices j such that (X, j ) ∈ Q. The
outcome of the assignment α to S̃ is denoted by S̃^α = {X^α | X ∈ S̃}, where

\[ x_j^{\alpha} = \begin{cases} \alpha(X, j) & \text{if } (X, j) \in Q, \\ x_j & \text{otherwise.} \end{cases} \]

Example 12.20. If

S̃ = {X = (1, ∗, 0, 1), Y = (0, 1, ∗, ∗), Z = (1, 1, ∗, 0)},

then
AS(S̃) = {(X, 2), (Y , 3), (Y , 4), (Z, 3)}.
If we consider Q = {(X, 2), (Y , 4)} and the assignment (α(X, 2), α(Y , 4)) = (1, 0) ∈
BQ , then we obtain

S̃ α = {Xα = (1, 1, 0, 1), Y α = (0, 1, ∗, 0), Z α = (1, 1, ∗, 0)}.
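
The notation AS(S̃) and S̃^α can be mirrored directly in code. The following Python sketch is only illustrative: points are stored (by assumption) in a dictionary keyed by a point name, missing bits are written as the string '*', indices are 1-based as in the text, and the helper names are hypothetical.

```python
def AS(S):
    """Set of pairs (name, j) such that bit j of the named point is missing."""
    return {(name, j)
            for name, point in S.items()
            for j, x in enumerate(point, start=1)
            if x == '*'}

def apply_assignment(S, alpha):
    """Return S^alpha: bits listed in alpha are fixed, all other bits are kept."""
    return {name: tuple(alpha.get((name, j), x)
                        for j, x in enumerate(point, start=1))
            for name, point in S.items()}

# Data of Example 12.20.
S = {'X': (1, '*', 0, 1), 'Y': (0, 1, '*', '*'), 'Z': (1, 1, '*', 0)}
alpha = {('X', 2): 1, ('Y', 4): 0}     # an assignment on Q = {(X, 2), (Y, 4)}
S_alpha = apply_assignment(S, alpha)
```

The sketch assumes that alpha only mentions pairs belonging to AS(S); it does not check this.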




1 This extension is called a robust extension in [140].


We also use the following shorthand notations: For a given pBmb (T̃ , F̃ ), we
let
AS = AS(T̃ ∪ F̃ ),

and if S̃ is a singleton {X}, then we simply write AS(X) for AS(S̃). Note that an
assignment α ∈ BAS fixes all missing bits of the points in T̃ ∪ F̃ .
Based on these definitions, we say that a Boolean function f is a fully robust
extension of the pBmb (T̃ , F̃ ) if the conditions

f (A^α) = 1 for all A ∈ T̃ , (12.25)

f (B^α) = 0 for all B ∈ F̃ , (12.26)

hold for all α ∈ B^AS. A Boolean function f is a consistent extension of (T̃ , F̃ ) if
there is an assignment α ∈ B^AS for which (12.25)–(12.26) hold.
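
For small instances, both definitions can be checked by brute force for a given function f. The Python sketch below is a hedged illustration only: f is an arbitrary Python callable on 0/1 tuples, missing bits are written '*', and the names completions, is_fully_robust_extension, and is_consistent_extension are hypothetical. It uses the observation that an assignment fixes the missing bits of each point independently, so a consistent extension only needs one suitable completion per point (provided no point belongs to both sets). The tiny pBmb at the end is made up for the illustration.

```python
from itertools import product

def completions(point):
    """All 0/1 points obtained by fixing every missing bit '*' of the point."""
    choices = [(0, 1) if x == '*' else (x,) for x in point]
    return list(product(*choices))

def is_fully_robust_extension(f, T, F):
    """(12.25)-(12.26) must hold for every assignment of the missing bits."""
    return (all(f(a) == 1 for A in T for a in completions(A)) and
            all(f(b) == 0 for B in F for b in completions(B)))

def is_consistent_extension(f, T, F):
    """(12.25)-(12.26) must hold for at least one assignment of the missing bits."""
    if set(T) & set(F):
        return False   # a shared point would need both values under one assignment
    return (all(any(f(a) == 1 for a in completions(A)) for A in T) and
            all(any(f(b) == 0 for b in completions(B)) for B in F))

# Hypothetical pBmb on two variables.
T = [(1, '*')]
F = [(0, '*')]
f = lambda x: x[0]            # f = x1
g = lambda x: x[0] & x[1]     # g = x1 x2
```

Here f = x1 is a fully robust extension, while g = x1 x2 is only a consistent one. The enumeration is exponential in the number of missing bits per point and is meant for experimentation, not as an algorithm for the problems defined next.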
Various extension problems for partially defined Boolean functions with miss-
ing bits can now be defined as follows.

Problem FRE(C) (fully robust extension)


Instance: A pBmb (T̃ , F̃ ).
Question: Does (T̃ , F̃ ) have a fully robust extension in C? (When the answer is
“yes,” it is usually required to output one such extension.)

Problem CE(C) (consistent extension)


Instance: A pBmb (T̃ , F̃ ).
Question: Does (T̃ , F̃ ) have a consistent extension in C? (When the answer is
“yes,” it is usually required to output one such extension and the corresponding
assignment α ∈ BAS .)

Problem MRE(C) (most robust extension)


Instance: A pBmb (T̃ , F̃ ).
Question: Does (T̃ , F̃ ) have a consistent extension in C? If the answer is “yes,”
then output a subset Q ⊆ AS and an assignment α ∈ B Q such that

(1) the pBmb (T̃ α , F̃ α ) has a fully robust extension in C, and
(2) |Q| is minimized among all (Q, α) satisfying condition (1).

As is obvious from these definitions, FRE(C) and CE(C) both contain
EXTENSION(C) as a special case. MRE(C) is more general than FRE(C) and
CE(C). Therefore, NP-hardness of EXTENSION(C) for a class C implies NP-
hardness of FRE(C), CE(C), and MRE(C). Furthermore, if one of FRE(C) and
CE(C) is NP-hard, then so is MRE(C). For polynomial solvability, these arguments
can be reversed.
In the next section, we investigate the complexity of FRE(C), CE(C), and
MRE(C) for some function classes C of interest.
Remark 12.3. There is yet another type of extension of a pBmb called fully
consistent extension: A pBmb (T̃ , F̃ ) is fully consistent in class C if for every
assignment α ∈ BAS there is an extension f ∈ C of the pdBf (T̃ α , F̃ α ). Note that
the extensions may be different for different assignments α ∈ BAS . Clearly, a
pBmb (T̃ , F̃ ) is fully consistent in class C if it has a fully robust extension in C,
but the converse may not be true. This type of extension was studied in Boros et
al. [141, 142]. 

12.5.2 Complexity results


In this section, for two points A, B ∈ M^n, we write A ≈ B if there exists an
assignment α ∈ B^AS({A,B}) such that A^α = B^α. We write A ⪯ B if A^α ≤ B^α holds
for some assignment α. For example, (0, ∗, 1, ∗) ≈ (∗, 1, 1, 0) and (0, ∗, 0, ∗) ⪯
(∗, 1, 1, 0), but (1, ∗, 1, ∗) ≉ (0, 1, 1, 0) and (0, ∗, 1, ∗) ⋠ (∗, 1, 0, ∗). For a point A ∈
M^n, let A^1 denote the point in B^n obtained from A by fixing all missing bits ∗ to
1, and A^0 the point obtained from A by fixing all missing bits ∗ to 0.

Theorem 12.18. A pBmb (T̃ , F̃ ) has a fully robust extension if and only if there
exists no pair (A, B) with A ∈ T̃ and B ∈ F̃ such that A ≈ B. Hence, FRE(FALL )
can be solved in polynomial time.

Proof. The necessary and sufficient condition is obvious from the definition of a
fully robust extension. The condition A ≉ B is equivalent to the existence of an
index j such that aj ≠ bj and aj , bj ∈ {0, 1}. This can be checked in O(n|T̃ ||F̃ |) time
by direct comparison of all points A ∈ T̃ and B ∈ F̃ . □
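
The pairwise comparison used in this proof is easy to write down. In the following Python sketch (an illustration under the assumption that points are tuples over 0, 1 and the string '*'; the function names are hypothetical), compatible(A, B) tests A ≈ B, and the FRE(FALL ) test simply looks for a compatible pair.

```python
def compatible(A, B):
    """A ≈ B: there is no index where both bits are fixed and different."""
    return all(a == b or '*' in (a, b) for a, b in zip(A, B))

def has_fully_robust_extension(T, F):
    """Criterion of Theorem 12.18 for the class of all Boolean functions."""
    return not any(compatible(A, B) for A in T for B in F)

# The two pairs used as examples in the text above.
first_pair = compatible((0, '*', 1, '*'), ('*', 1, 1, 0))
second_pair = compatible((1, '*', 1, '*'), (0, 1, 1, 0))

# A hypothetical pBmb whose only pair differs in a fixed bit.
answer = has_fully_robust_extension([(1, '*', 0)], [(0, '*', 0)])
```

The double loop performs the O(n|T̃ ||F̃ |) comparison mentioned in the proof.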

The next lemma holds for the class F+ and for any subclass of F+ .

Lemma 12.7. A pBmb (T̃ , F̃ ) has a fully robust extension in the class C ⊆ F+ if
and only if the pdBf (T − , F + ) defined by

T − = {A0 | A ∈ T̃ }, F + = {B 1 | B ∈ F̃ },

has an extension in C.

Proof. If there is a fully robust extension f ∈ C of (T̃ , F̃ ), then by definition, f is
also an extension of the pdBf (T − , F + ) since the latter is obtained from (T̃ , F̃ ) by
some assignment of values to missing bits.
To prove the converse, assume that (T − , F + ) has an extension g ∈ C. Then
A^0 ≤ A^β holds for all A ∈ T̃ and all assignments β ∈ B^AS(A), and hence, since g
is positive, 1 = g(A^0) ≤ g(A^β) implies g(A^β) = 1. Similarly, we obtain g(B^β) = 0 for
all B ∈ F̃ and all β ∈ B^AS(B). This shows that g is a fully robust extension
of (T̃ , F̃ ). □
Example 12.21. Consider the following pBmb (T̃ , F̃ ) with n = 5:

T̃ = {(0, 1, ∗, ∗, 0), (∗, 1, 0, 1, 1)},


F̃ = {(∗, ∗, 1, 0, 1), (0, ∗, 1, ∗, 1)}.

It is easily checked that A ≈ B does not hold for any A ∈ T̃ , B ∈ F̃ , and hence,
there is a fully robust extension of (T̃ , F̃ ) in FALL . Such a fully robust extension f
is for example given by

T (f ) = {(0, 1, 0, 0, 0), (0, 1, 0, 1, 0), (0, 1, 1, 0, 0), (0, 1, 1, 1, 0), (0, 1, 0, 1, 1),
(1, 1, 0, 1, 1)}
F (f ) = B 5 \ T (f ).

We next construct (T − , F + ) as in Lemma 12.7:

T − = {(0, 1, 0, 0, 0), (0, 1, 0, 1, 1)}


F + = {(1, 1, 1, 0, 1), (0, 1, 1, 1, 1)}.

Since A ≤ B holds for A = (0, 1, 0, 0, 0) ∈ T − and B = (1, 1, 1, 0, 1) ∈ F + , (T − , F + )
does not have an extension in F+ by Theorem 12.8. Hence, by the previous lemma,
the pBmb (T̃ , F̃ ) does not have a fully robust extension in F+ . □
A variant of Lemma 12.7 applies to consistent extensions:
Lemma 12.8. A pBmb (T̃ , F̃ ) has a consistent extension in the class C ⊆ F+ if
and only if the pdBf (T + , F − ) defined by

T + = {A1 | A ∈ T̃ }, F − = {B 0 | B ∈ F̃ }

has an extension in C.
Proof. Assume first that there is a consistent extension f ∈ C of (T̃ , F̃ ). That is,
f is an extension of the pdBf (T̃ β , F̃ β ) for some assignment β ∈ B AS . Since f
is positive and Aβ ≤ A1 , we see that f (A1 ) = 1 holds for all A ∈ T̃ . Similarly,
f (B 0 ) = 0 for all B ∈ F̃ . Therefore, f is an extension of (T + , F − ).
The converse direction is obvious since (T + , F − ) is obtained from (T̃ , F̃ ) by
an assignment. 
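
Lemmas 12.7 and 12.8 reduce both questions to an ordinary extension problem. For the class F+ itself, the following Python sketch combines them with the criterion of Theorem 12.8 as it is used in this chapter (a pdBf has a positive extension exactly when no true point lies below a false point). The function names are hypothetical, missing bits are written '*', and the data are those of Example 12.21.

```python
def fix(point, value):
    """Replace every missing bit '*' of the point by the given value (0 or 1)."""
    return tuple(value if x == '*' else x for x in point)

def leq(a, b):
    """Componentwise order on 0/1 tuples."""
    return all(x <= y for x, y in zip(a, b))

def has_positive_extension(T, F):
    """Theorem 12.8: no pair A in T, B in F with A <= B."""
    return not any(leq(a, b) for a in T for b in F)

def fre_positive(T, F):
    """Lemma 12.7: test the pdBf (T-, F+) with T- = {A^0}, F+ = {B^1}."""
    return has_positive_extension([fix(A, 0) for A in T], [fix(B, 1) for B in F])

def ce_positive(T, F):
    """Lemma 12.8: test the pdBf (T+, F-) with T+ = {A^1}, F- = {B^0}."""
    return has_positive_extension([fix(A, 1) for A in T], [fix(B, 0) for B in F])

# The pBmb of Example 12.21.
T_tilde = [(0, 1, '*', '*', 0), ('*', 1, 0, 1, 1)]
F_tilde = [('*', '*', 1, 0, 1), (0, '*', 1, '*', 1)]
robust = fre_positive(T_tilde, F_tilde)
consistent = ce_positive(T_tilde, F_tilde)
```

On this pBmb the sketch reports that no fully robust positive extension exists, as in Example 12.21, whereas a consistent positive extension does exist. For a proper subclass C of F+ the last step would have to be replaced by an extension test for C.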

We proved earlier that EXTENSION(C) is polynomially solvable for the classes
C = F+ , FF+0 (S0 ,F1 (S1 )) , Fk+ , among others. The following theorem then immediately
follows from Lemmas 12.7 and 12.8:
Theorem 12.19. The problems FRE(C) and CE(C) are solvable in polynomial
time for the classes C = F+ , FF+0 (S0 ,F1 (S1 )) , and Fk+ . 

Fully robust threshold extensions can also be identified in polynomial time.


Theorem 12.20. The problem FRE(FTh ) can be solved in polynomial time.


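Proof. For a pBmb (T̃ , F̃ ) on Mn , consider the system of linear inequalities:

\[ \sum_{j : a_j = 1} w_j + \sum_{j : a_j = *} y_j \ge t + 1 \quad \text{for all } A \in \tilde{T}, \qquad (12.27) \]
\[ \sum_{j : b_j = 1} w_j + \sum_{j : b_j = *} z_j \le t \quad \text{for all } B \in \tilde{F}, \qquad (12.28) \]
\[ y_j \le w_j, \quad y_j \le 0, \quad j = 1, 2, \ldots, n, \qquad (12.29) \]
\[ z_j \ge w_j, \quad z_j \ge 0, \quad j = 1, 2, \ldots, n. \qquad (12.30) \]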
We claim that this system has a feasible solution if and only if (T̃ , F̃ ) has a fully
robust threshold extension.
Let us assume first that (12.27)–(12.30) has a feasible solution (W , Y , Z, t).
Then, for all A ∈ T̃ and for all α ∈ B^AS(A), we obtain from (12.27) and (12.29):

\[ t + 1 \le \sum_{j : a_j = 1} w_j + \sum_{j : a_j = *} y_j \le \sum_{j : a_j^{\alpha} = 1} w_j = \sum_{j=1}^{n} w_j a_j^{\alpha}. \qquad (12.31) \]

Applying the same reasoning on F̃ , we conclude that the structure (W , t) defines
a fully robust threshold extension of (T̃ , F̃ ).
Conversely, assume that (T̃ , F̃ ) has a fully robust threshold extension f , and
let (W , t) be a separating structure for f . Set yj = min{0, wj } and zj = max{0, wj }
for j = 1, 2, . . . , n. We claim that (W , Y , Z, t) is a feasible solution of the system
(12.27)–(12.30). Indeed, for any A ∈ T̃ , let us define α to be the assignment on
AS(A) which sets a_j^α = 1 when yj = wj , and a_j^α = 0 when yj = 0. Since f is
a fully robust extension, we have f (A^α) = 1, and since f is threshold, we have
\sum_{j=1}^{n} w_j a_j^{\alpha} ≥ t + 1. Thus,

\[ \sum_{j=1}^{n} w_j a_j^{\alpha} = \sum_{j : a_j^{\alpha} = 1} w_j = \sum_{j : a_j = 1} w_j + \sum_{j : a_j = *} y_j, \qquad (12.32) \]

and (12.27) is satisfied. The same reasoning holds for (12.28), and hence,
(W , Y , Z, t) is a feasible solution of (12.27)–(12.30). □

Finally, we establish another polynomially solvable case of the fully robust
extension problem.
Theorem 12.21. The problem FRE(FHORN ) is solvable in polynomial time.
Proof. For a pBmb (T̃ , F̃ ) and a point A ∈ T̃ , define

F̃_{⪰A} = {B ∈ F̃ | B ⪰ A},

where B ⪰ A means that A^α ≤ B^α holds for some assignment α. We claim that
(T̃ , F̃ ) has a fully robust extension in FHORN if and only if, for every
A ∈ T̃ such that F̃_{⪰A} ≠ ∅,

there is an index j such that aj = 0 and bj = 1 hold for all B ∈ F̃_{⪰A}. (12.33)

This claim will prove the theorem, since condition (12.33) can be checked for all
A ∈ T̃ in O(n|T̃ ||F̃ |) time.
To prove the claim, assume first that condition (12.33) holds. Consider a point
A ∈ T̃ and the corresponding index j satisfying condition (12.33). Then, for all
assignments α ∈ B^AS and all B ∈ F̃_{⪰A} (the points of F̃ that lie above A under
some assignment), we have a_j^α = 0 and b_j^α = 1. Therefore, the Horn term

\[ t_A = \Bigl( \bigwedge_{i : a_i = 1} x_i \Bigr) \bar{x}_j \]

satisfies tA (A^α) = 1 and tA (B^α) = 0 for all α ∈ B^AS and all B ∈ F̃_{⪰A}. This term
tA also satisfies tA (B^α) = 0 for all B ∈ F̃ \ F̃_{⪰A} and all α ∈ B^AS; indeed, for all
such B, there is some i such that ai = 1 and bi = 0 by the assumption that B ⋡ A.
We conclude that the following Horn DNF represents a fully robust extension of
(T̃ , F̃ ):

\[ \varphi = \bigvee_{A \in \tilde{T}} t_A. \qquad (12.34) \]

Conversely, if condition (12.33) does not hold for some A ∈ T̃ with F̃_{⪰A} ≠ ∅,
then define an assignment α ∈ B^AS({A} ∪ F̃_{⪰A}) as follows: For all (A, i) ∈ AS(A),

\[ \alpha(A, i) = \begin{cases} \bigwedge_{B \in \tilde{F}_{\succeq A} : b_i \ne *} b_i & \text{if there is a point } B \in \tilde{F}_{\succeq A} \text{ such that } b_i \ne *, \\ 1 & \text{otherwise,} \end{cases} \]

and for all (B, i) ∈ AS(F̃_{⪰A}),

\[ \alpha(B, i) = \begin{cases} a_i^{\alpha} & \text{if } (A, i) \in AS(A), \\ a_i & \text{otherwise.} \end{cases} \]

Then, it can be checked that (F̃_{⪰A})^α = (F̃^α)_{≥A^α} = {B^α ∈ F̃^α | B^α ≥ A^α} satisfies

\[ A^{\alpha} = \bigwedge_{B^{\alpha} \in (\tilde{F}^{\alpha})_{\ge A^{\alpha}}} B^{\alpha}. \]

By condition (12.16) in the proof of Theorem 12.12, this implies that the pdBf
(T̃^α , F̃^α ) does not have a Horn extension, and consequently, the pBmb (T̃ , F̃ )
does not have a fully robust extension in FHORN . □

Example 12.22. Let us consider the pBmb (T̃ , F̃ ) of Example 12.21. For the two
points A in T̃ , the sets F̃_{⪰A} of points B ∈ F̃ satisfying A^α ≤ B^α for some
assignment α are

F̃_{⪰(0,1,∗,∗,0)} = {(∗, ∗, 1, 0, 1), (0, ∗, 1, ∗, 1)},
F̃_{⪰(∗,1,0,1,1)} = {(0, ∗, 1, ∗, 1)}.

The condition (12.33) holds with j = 5 for A = (0, 1, ∗, ∗, 0), and with j = 3 for
A = (∗, 1, 0, 1, 1). Therefore, the DNF ϕ defined by (12.34), namely,

ϕ = x2 x̄5 ∨ x2 x4 x5 x̄3 ,

is a Horn DNF representing a fully robust extension of (T̃ , F̃ ). □
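
The test (12.33) and the Horn DNF (12.34) can be sketched in Python as follows. This is an illustration under several assumptions: points are tuples over 0, 1 and '*', a Horn term is returned as a pair (list of positive variable indices, index of the negated variable or None), indices are 1-based, and all names are hypothetical. For a point A with no point of F̃ above it, the text does not spell out a term; the sketch uses the conjunction of the variables with a_i = 1, which is an assumption of this sketch.

```python
def preceq(A, B):
    """A ⪯ B: some assignment gives A^α <= B^α, i.e. no index has a_j = 1 and b_j = 0."""
    return not any(a == 1 and b == 0 for a, b in zip(A, B))

def horn_fully_robust_dnf(T, F):
    """Return the terms t_A of (12.34), or None if condition (12.33) fails for some A."""
    dnf = []
    for A in T:
        above = [B for B in F if preceq(A, B)]
        positives = [i + 1 for i, a in enumerate(A) if a == 1]
        if not above:
            dnf.append((positives, None))   # assumed choice: no negated literal needed
            continue
        # Look for an index j with a_j = 0 and b_j = 1 for every B above A.
        j = next((j for j, a in enumerate(A)
                  if a == 0 and all(B[j] == 1 for B in above)), None)
        if j is None:
            return None
        dnf.append((positives, j + 1))
    return dnf

# The pBmb of Examples 12.21 and 12.22.
T_tilde = [(0, 1, '*', '*', 0), ('*', 1, 0, 1, 1)]
F_tilde = [('*', '*', 1, 0, 1), (0, '*', 1, '*', 1)]
phi = horn_fully_robust_dnf(T_tilde, F_tilde)
```

On this input the sketch returns the two terms x2 x̄5 and x2 x4 x5 x̄3 of Example 12.22. When several indices j satisfy (12.33), it picks the smallest one.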

In contrast with the previous positive results, Boros, Ibaraki, and Makino [140]
also proved that, except for the special cases discussed in Theorems 12.18, 12.19,
12.20, and 12.21, all other variants of the problems FRE, CE, MRE are either NP-
complete or NP-hard for the classes FALL , F+ , FUNATE , FTh , FHORN , FF0 (S0 ,F1 (S1 )) ,
and Fk . We refer the reader to [140] for details and additional results.

12.6 Minimization with don’t cares


In designing logic circuits of computers and other digital systems, Boolean the-
ory has been extensively used to make the circuits efficient and economical.
This has been discussed in several other chapters of this book, for instance, in
Chapter 1, Section 1.13.2; in Chapter 2, Section 2.1; and, in particular, in Chapter 3,
Section 3.3. In the process of logic design, complex logic functions are first decom-
posed into many small basic blocks, and each one is then realized as a logic circuit.
This is illustrated in Figure 12.10 in which the central block realizes three Boolean
functions f1 , f2 , f3 of the variables x1 , x2 , x3 , x4 . Although each block has in gen-
eral many outputs, for simplicity, we consider here the case of realizing a single
function f .
In practical applications, there are usually many combinations of input values
that can never be simultaneously observed and that are therefore called don’t care
points, or simply don’t cares. For example, two physical lines associated with x1
and x2 may be used to represent the binary numbers “0” and “1” by a special
coding scheme “0” = (0, 1) and “1” = (1, 0). Then, it is prohibited to use the
combinations (0, 0) or (1, 1), meaning that we can ignore all input points X satisfying
(x1 , x2 ) = (0, 0) or (1, 1). In general, the values of input lines are mutually
correlated, and these input values must satisfy many constraints. All input points X not
satisfying such constraints are called don’t cares, as we do not need to care about
the output obtained for such input values when designing the logic circuit under
consideration. This provides some freedom, which can be exploited in the design
process. Since this aspect was not mentioned in Chapter 3, Section 3.3, we discuss
it here very briefly (for more details, we refer the reader to the specialized
literature; see, e.g., Umans, Villa and Sangiovanni-Vincentelli [877] or Villa, Brayton,
and Sangiovanni-Vincentelli [891]).

Figure 12.10. A basic block in logic circuits.
In the terminology of this chapter, we can restate the logic synthesis problem
as the problem of realizing some extension of a pdBf (T , F ) (instead of a Boolean
function) by a DNF φ, where all points in B n \ (T ∪ F ) are interpreted as don’t
cares. Keeping in mind the difference between a Boolean function and a pdBf,
we can accordingly adapt the discussion of logic minimization in Section 3.3.
Namely, we want now to find an extension f of (T , F ) having a shortest DNF φ,
as measured either by |φ| (the number of literals) or by ||φ|| (the number of terms).
A main difference with the discussion in Chapter 3 is that here, we do not know
the function f beforehand, but we have to select it among the extensions of (T , F ).
Note that, since our objective is to minimize the size of the DNF representation,
Lemma 12.3 implies that there is no loss of generality in restricting our attention
to prime irredundant theories (defined in Section 12.2.3). Therefore, we use prime
patterns of (T , F ) (instead of prime implicants of f in Chapter 3), and we aim
to find a set of prime patterns that together cover T . The DNF φ defined as the
disjunction of such prime patterns is a prime theory. If this prime theory mini-
mizes |φ| (or ||φ||), then we deem it desirable from the point of view of circuit
design.
In order to select an appropriate prime theory, the usual procedures require first
to generate all prime patterns of the given pdBf (T , F ). Since the prime patterns
of (T , F ) are among the prime implicants of the function fmax defined by (12.4)
(see Lemma 12.1 in Section 12.2.3), we can proceed as described in Figure 12.11.
Note that ψfmax is explicitly available from F . The prime implicants of fmax can
be generated, for instance, by (the dual version of) the procedure SD-Dualization
of Section 4.3.2.

Procedure Prime Patterns of (T , F )

1. Construct the maxterm expression ψfmax (namely, the CNF expression of fmax introduced
in Definition 1.11 of Section 1.4) by taking the conjunction of all maxterms associated with
the points in F = F (fmax ).
2. Generate all prime implicants of fmax by one of the dualization methods discussed in
Section 4.3 (see also Section 3.2.4).
3. Select the prime patterns of (T , F ) among the prime implicants of fmax .

Figure 12.11. Procedure Prime Patterns of (T , F ).

Example 12.23. Let us consider the pdBf in Table 12.1 of Section 12.1. First we
construct ψfmax from F = {B (1) , B (2) , B (3) , B (4) } as follows (we use the shorthand
notation 1̄2 for x̄1 x2 , etc.):

ψfmax = (1̄ ∨ 2 ∨ 3̄ ∨ 4 ∨ 5̄ ∨ 6 ∨ 7̄ ∨ 8)(1 ∨ 2 ∨ 3 ∨ 4̄ ∨ 5̄ ∨ 6̄ ∨ 7 ∨ 8)


(1̄ ∨ 2̄ ∨ 3 ∨ 4̄ ∨ 5 ∨ 6̄ ∨ 7 ∨ 8̄)(1 ∨ 2 ∨ 3̄ ∨ 4 ∨ 5̄ ∨ 6 ∨ 7̄ ∨ 8).

Expanding this CNF into a DNF, and then manipulating it as discussed in Section
4.3.2, we obtain the next DNF consisting of all prime implicants of fmax :

ϕ = 1̄2∗ ∨ 1̄5̄∗ ∨ 1̄8∗ ∨ 23∗ ∨ 24̄∗ ∨ 25∗ ∨ 2̄5̄ ∨ 26̄∗ ∨ 27∗ ∨ 28̄∗
∨ 2̄8 ∨ 34 ∨ 3̄4̄ ∨ 35̄ ∨ 36 ∨ 3̄6̄∗ ∨ 3̄7∗ ∨ 37̄∗ ∨ 38∗ ∨ 4̄5̄
∨ 46̄∗ ∨ 4̄6 ∨ 47∗ ∨ 4̄7̄∗ ∨ 4̄8∗ ∨ 5̄6̄ ∨ 5̄7∗ ∨ 58∗ ∨ 5̄8̄∗ ∨ 67∗
∨ 6̄7̄∗ ∨ 6̄8∗ ∨ 78 ∨ 12̄3̄ ∨ 12̄4 ∨ 12̄6 ∨ 12̄7̄ ∨ 13̄5∗ ∨ 13̄8̄
∨ 145∗ ∨ 148̄ ∨ 156 ∨ 157̄∗ ∨ 168̄ ∨ 17̄8̄ ∨ 268̄∗ .

In this DNF, the terms marked with ∗ are the prime patterns of (T , F ), while the
unmarked terms are not prime patterns. This example shows that the procedure
may generate many prime implicants which are not prime patterns of (T , F ). 

Following the line of Section 3.3, the next step of logic minimization is to find
a set of prime patterns that together cover the set T of true points. For the purpose
of computing a theory φ which minimizes |φ| or ||φ||, the methods described in
Section 3.3 (Quine-McCluskey method and its extensions) can be readily applied,
if we simply replace the words “prime implicant” by “prime pattern.” We illustrate
this by continuing the foregoing example.

Example 12.24. From the list of prime patterns marked in the DNF ϕ for Example
12.23, we choose a set of prime patterns which cover the set T = {A(1) , A(2) , A(3) }
given in Table 12.1. It is easy to see that a single prime pattern cannot do
this, and thus we must select at least two prime patterns. Even if we restrict
ourselves to short prime implicants, there are many such sets, for example,
{1̄2, 25}, {1̄2, 26̄}, . . . , {67, 6̄7̄}. The corresponding prime theories φ contain exactly
two prime patterns of degree two and are minimum with respect to both norms |φ|
and ||φ||. The extensions f1 and f3 given in Example 12.1 are two such minimum
realizations of (T , F ). 

Table 12.8. Summary of complexity results obtained in this chapter

                       EXT    MIN-SUPT   BEST-FIT   FRE    CE     MRE
FALL                   P      NPH        P          P      NPC    NPH
F+                     P      NPH        P          P      P      NPH
FUNATE                 NPC    NPH        NPH        NPC    NPC    NPH
FTh                    P      NPH        NPH        P      NPC    NPH
Fk                     P      NPH        NPH        NPC    NPC    NPH
FHORN                  P      NPH        NPH        P      NPC    NPH
FF0 (S0 ,F1 (S1 ))     P      NPH        NPH        NPC    NPC    NPH
FF+0 (S0 ,F1 (S1 ))    P      NPH        NPH        P      P      NPH

P: polynomial time, NPH: NP-hard, NPC: NP-complete

The preceding method based on generating all prime patterns seems reasonably
efficient for those pdBfs (T , F ) such that T ∪ F is not much smaller than B^n (which
is often the case when don’t cares are considered). But if the set T ∪ F is small,
other methods that construct prime patterns directly from T may be more efficient.
For example, Boros et al. [131] propose a naive method that first generates all terms
of degree one and selects the prime patterns among them, and then repeats the same
for all terms of degree 2, and so on. This method can be used to obtain short prime
patterns. Another approach is to apply, for each A ∈ T , a method to generate all
prime patterns that cover A, by relying on the set covering characterization of
patterns (see Exercise 2 of this chapter). This can be elaborated into an algorithm
that runs with polynomial delay for the generation of all prime patterns, as discussed
in Boros et al. [117].
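
The degree-by-degree enumeration mentioned in this paragraph can be sketched as follows in Python. The sketch is not taken from [131]; it only illustrates the idea under stated assumptions. It assumes the definitions of Section 12.2.3 in the following form: a pattern is a term that covers at least one point of T and no point of F , and a pattern is prime if no literal can be dropped from it without covering a point of F . Terms are tuples of (0-based variable index, required bit) pairs, F is assumed nonempty, and the function names and the small pdBf are hypothetical.

```python
from itertools import combinations, product

def covers(term, point):
    """True if the term takes value 1 at the 0/1 point."""
    return all(point[v] == bit for v, bit in term)

def is_pattern(term, T, F):
    """Covers at least one true point and no false point."""
    return any(covers(term, a) for a in T) and not any(covers(term, b) for b in F)

def prime_patterns(T, F, max_degree):
    """Enumerate prime patterns of degree 1, 2, ..., max_degree (exhaustive search)."""
    n = len(T[0])
    found = []
    for d in range(1, max_degree + 1):
        for variables in combinations(range(n), d):
            for bits in product((0, 1), repeat=d):
                term = tuple(zip(variables, bits))
                if not is_pattern(term, T, F):
                    continue
                # Prime: dropping any single literal makes the term cover a false point.
                if all(any(covers(term[:i] + term[i + 1:], b) for b in F)
                       for i in range(d)):
                    found.append(term)
    return found

# Hypothetical pdBf on three variables.
T = [(1, 1, 0), (1, 0, 1)]
F = [(0, 0, 0), (0, 1, 1)]
result = prime_patterns(T, F, 2)
```

For this toy pdBf the prime patterns of degree at most 2 are x1, x̄2 x3, and x2 x̄3. The search examines all terms of each degree, so it is practical only for short patterns, which is the use suggested in the text.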

12.7 Conclusion
In this chapter, we introduced partially defined Boolean functions (pdBfs) as fun-
damental models arising in various fields of applications, in particular, in logical
analysis of data. We defined various problems and classified their computational
complexity, with an emphasis on questions related to extensions of pdBfs. We sum-
marize in Table 12.8 the main complexity results mentioned in this chapter. In this
table, a letter P indicates that the corresponding problem is solvable in polynomial
time, while NPH or NPC indicate that it is NP-hard or NP-complete, respectively.
Also, EXT stands for EXTENSION and MIN-SUPT for MIN-SUPPORT.
As mentioned in the introduction of this chapter, acquiring or discovering
meaningful information (or knowledge) from available data has recently received
increased attention. The approach in this chapter may be regarded as a logical
approach, since it is based solely on the consideration of pdBfs and of their
extensions, viewed as Boolean functions having simple Boolean expressions. The
performance of different approaches may be compared from several viewpoints,
such as:

• accuracy of the performance of the obtained classification on new data points;


• ease of comprehension of the classification, and of the underlying knowledge
unveiled by the approach;
• compactness of the representation of this knowledge, allowing its use for
various purposes;
• efficiency of the computation of the classification.

It remains important to develop and to investigate better methods, possibly by
combining existing approaches, so that they become more useful and more meaningful
when applied to real-world situations.

12.8 Exercises
1. Prove Theorem 12.9, that is, prove that problem MIN-SUPPORT(F+ ) is
NP-hard.
2. Given a pdBf (T , F ) on $B^n$ and a point A ∈ T , let $t_A = z_1 z_2 \cdots z_n$ be the
minterm of A (that is, $z_j = x_j$ if $a_j = 1$ and $z_j = \bar{x}_j$ otherwise). Define an
$|F| \times n$ matrix Q by

$$Q_{ij} = \begin{cases} 1 & \text{if } b_j^{(i)} \neq a_j, \\ 0 & \text{otherwise,} \end{cases}$$

where $B^{(i)}$ is the i-th point in F . Then consider the following set covering
constraints:

$$Q y \geq 1 \tag{12.35}$$
$$y \in \{0, 1\}^n. \tag{12.36}$$

Show that y is a feasible solution of the system (12.35)–(12.36) if and only
if the term $t = \bigwedge_{j : y_j = 1} z_j$ is a pattern of (T , F ) that covers A. Furthermore,
y is a minimal solution of (12.35)–(12.36) (namely, no $y' \leq y$ with $y' \neq y$
is feasible) if and only if t is a prime pattern.
3. As a special case of F0 (F1 (S1 ), F2 (S2 ))-decomposability of a Boolean
function f , where S1 , S2 ⊆ {1, 2, . . . , n}, let us consider a conjunctive
decomposition of type

f (X) = h1 (X|S1 ) ∧ h2 (X|S2 ). (12.37)

Define a bipartite graph G(T ,F ) = (V , E) by

V = V1 ∪ V2 ,
E = EF ∪ ET
Vi = {X|Si | X ∈ T ∪ F }, i = 1, 2
ET = {(X|S1 , X|S2 ) | X ∈ T },
EF = {(X|S1 , X|S2 ) | X ∈ F }.

Prove that (T , F ) has an extension that is decomposable as in (12.37) if and
only if G(T ,F ) has no four vertices A, B, C, D such that A, C ∈ V1 , B, D ∈ V2 ,
(A, B) ∈ ET , (C, D) ∈ ET , and (C, B) ∈ EF .
4. Similarly to Exercise 3, consider now a disjunctive decomposition:

f (X) = h1 (X|S1 ) ∨ h2 (X|S2 ),

and derive a necessary and sufficient condition for this type of decompos-
ability.
5. Prove the second half of Theorem 12.11, that is, prove that problem
EXTENSION(Fk+ ) can be solved in polynomial time.

6. For a given graph G = (V , E) with V = {1, 2, . . . , n}, define the points $A^{(i,j)}$,
$(i, j) \in E$, and $B^{(i)}$, $i \in V$, as follows:
• $a_k^{(i,j)} = 1$ for $k \notin \{i, j\}$ and $a_k^{(i,j)} = 0$ for $k \in \{i, j\}$,
• $b_k^{(i)} = 1$ for $k \neq i$ and $b_k^{(i)} = 0$ for $k = i$.
Then define a pdBf (T , F ) in $B^n$ by

$$T = \{A^{(i,j)} \mid (i, j) \in E\}, \qquad F = \{B^{(i)} \mid i \in V\}.$$

Show that

$$\min\,(|T \cap F^*| + |F \cap T^*|) = \tau(G)$$

holds, where the minimum is taken over all pdBfs $(T^*, F^*)$ having an extension
f ∈ FHORN , and τ (G) is the size of a minimum vertex cover in G.
Knowing that the minimum vertex cover problem is NP-hard, prove that
BEST-FIT(FHORN ) is NP-hard.
7. For each of the following conditions, construct a pBmb (T̃ , F̃ ) satisfying
it.
a. (T̃ , F̃ ) has a consistent extension in FALL , but does not have a fully
robust extension in FALL .
b. (T̃ , F̃ ) has a fully robust extension in FALL , but does not have a fully
robust extension in F+ .
c. (T̃ , F̃ ) has a consistent extension in F+ , but does not have a fully robust
extension in F+ .
8. Consider the consistent extension problem CE(FALL ) for a pBmb (T̃ , F̃ )
such that each A ∈ T̃ ∪ F̃ has at most one missing bit. Recall that (T̃ , F̃ )
has a consistent extension in FALL if and only if there is an assignment
α such that $A^{\alpha} \neq B^{\alpha}$ holds for all pairs of A ∈ T̃ and B ∈ F̃ . Show that
the question of the existence of such an assignment can be formulated as a
quadratic Boolean equation (or 2-sat problem). Since quadratic equations
are solvable in polynomial time, this proves that CE(FALL ) is also solvable
in polynomial time under the stated restriction.
13
Pseudo-Boolean functions

13.1 Definitions and examples


In Chapter 1, we defined a pseudo-Boolean function to be a mapping from $B^n =
\{0, 1\}^n$ to $\mathbb{R}$. In other words, a pseudo-Boolean function is a real-valued function of
a finite number of 0–1 variables. Identifying the Boolean symbols 0 and 1 (or T and
F , Yes and No, etc.) with the corresponding integers, we see that pseudo-Boolean
functions provide a proper generalization of Boolean functions. In fact, just as in the
Boolean case, the deliberate ambiguity that results from this identification rarely
causes any difficulties, but it is frequently the source of fruitful developments.
The systematic investigation of pseudo-Boolean functions, their theoretical
properties, and their applications was initiated by Hammer and Rudeanu in
[460], building on previous ideas of Fortet [342, 343] and of Hammer, Rosenberg,
and Rudeanu [458]. This field of research has given rise to countless subsequent
publications over the last decades.
Since the elements of $\{0, 1\}^n$ are in one-to-one correspondence with the sub-
sets of N = {1, 2, . . . , n}, every pseudo-Boolean function can also be viewed as a
real-valued set function defined on P(N ), the power set of N . Set
functions have been extensively studied because of their mathematical appeal and
their presence in numerous fundamental models of mathematics and of applied sci-
ences. By considering functions defined on $\{0, 1\}^n$ rather than on P(N ), however,
the pseudo-Boolean approach provides an algebraic viewpoint, which some-
times carries clear advantages over the set-theoretic description. For instance,
we mentioned in Chapter 1, Section 1.12.2, that every pseudo-Boolean func-
tion can be (uniquely) represented as a multilinear polynomial in its variables.
This representation (and related ones) opens the door to algebraic and numeri-
cal manipulations of pseudo-Boolean functions that play a major role in many
applications.
Another (voluminous) book would be required in order to discuss appropriately
the enormous body of literature devoted to the investigation of pseudo-Boolean
functions. Our intention in this chapter, therefore, is only to skim the surface of the


topic and to briefly indicate some of the main research directions and techniques
encountered in the field.
We now proceed with a description of a few representative problems arising in
mathematics, computer science, and operations research, where pseudo-Boolean
functions appear naturally and contribute to the analysis and the solution of area-
specific problems.

Mathematics
Application 13.1. (Graph theory.) As observed by Hammer and Rudeanu [460],
many graph theoretic concepts can be easily formulated in the pseudo-Boolean
language. We only give here a few examples.
Let N = {1, 2, . . . , n}, and consider a graph G = (N , E) with nonnegative
weights $w : N \to \mathbb{R}_+$ on its vertices, and capacities $c : E \to \mathbb{R}_+$ on its (undirected)
edges. For every $S \subseteq N$, the cut $(S, N \setminus S)$ is the set of edges having exactly
one endpoint in S; the capacity of this cut is defined as $\sum_{(i,j) \in (S, N \setminus S)} c(i, j)$. The
max-cut problem is to find a cut of maximum capacity in G. If $(x_1, x_2, \ldots, x_n)$ is
interpreted as the characteristic vector of S, then the edge (i, j ) has $i \in S$ and
$j \notin S$ if and only if $x_i \bar{x}_j = 1$. Therefore, the max-cut problem is equivalent to the
maximization of the quadratic pseudo-Boolean function

$$f(x_1, x_2, \ldots, x_n) = \sum_{(i,j) \in E} c(i, j)\,(x_i \bar{x}_j + \bar{x}_i x_j). \tag{13.1}$$
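
As a small illustration of (13.1), the following sketch evaluates the quadratic function on a toy graph and checks that it coincides with the cut capacity at every 0–1 point. The graph, its capacities, and the helper names are hypothetical, and the brute-force maximization is meant only for tiny instances.

```python
from itertools import product

def maxcut_value(x, capacities):
    """Evaluate (13.1): sum over edges of c(i,j) * (x_i*(1-x_j) + (1-x_i)*x_j)."""
    return sum(c * (x[i] * (1 - x[j]) + (1 - x[i]) * x[j])
               for (i, j), c in capacities.items())

def cut_capacity(S, capacities):
    """Capacity of the cut defined by S, computed directly from the definition."""
    return sum(c for (i, j), c in capacities.items() if (i in S) != (j in S))

# Hypothetical graph on vertices 0..3 (assumption, not taken from the text).
capacities = {(0, 1): 3, (1, 2): 1, (2, 3): 4, (0, 3): 2, (0, 2): 5}
n = 4
points = list(product((0, 1), repeat=n))

# The pseudo-Boolean function agrees with the cut capacity at every point.
for x in points:
    S = {i for i in range(n) if x[i] == 1}
    assert maxcut_value(x, capacities) == cut_capacity(S, capacities)

# Brute-force maximization (exponential in n; illustration only).
best = max(points, key=lambda x: maxcut_value(x, capacities))
best_value = maxcut_value(best, capacities)
```
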

Recall that a stable set in G is a set $S \subseteq N$ such that no edge has both of
its endpoints in S; the weight of S is $w(S) = \sum_{i \in S} w(i)$. The weighted stability
problem is to find a stable set of maximum weight in G. If $(x_1, x_2, \ldots, x_n)$ denotes
again the characteristic vector of S, then this is equivalent to maximizing the
quadratic pseudo-Boolean function

$$f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} w(i)\, x_i - M \sum_{(i,j) \in E} x_i x_j \tag{13.2}$$

for a sufficiently large value of the penalty M (say, $M > \max_{1 \leq i \leq n} w(i)$).
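
The role of the penalty in (13.2) can be checked numerically. The sketch below uses a hypothetical weighted graph and the choice M = max w(i) + 1 (an assumption consistent with the bound stated above); it verifies by enumeration that the unconstrained maximizer of the penalized function is a maximum-weight stable set.

```python
from itertools import product

def penalized_weight(x, w, edges, M):
    """Evaluate (13.2): sum_i w(i)*x_i - M * sum over edges of x_i*x_j."""
    return (sum(w[i] * x[i] for i in range(len(w)))
            - M * sum(x[i] * x[j] for i, j in edges))

def is_stable(x, edges):
    """True if no edge has both endpoints selected."""
    return all(not (x[i] and x[j]) for i, j in edges)

# Hypothetical instance (assumption): path 0-1-2-3 plus the chord 1-3.
w = [2, 3, 2, 4]
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
M = max(w) + 1
points = list(product((0, 1), repeat=len(w)))

# Unconstrained maximization of the penalized function.
best = max(points, key=lambda x: penalized_weight(x, w, edges, M))

# Constrained optimum, by enumeration of the stable sets.
best_stable_weight = max(sum(w[i] * x[i] for i in range(len(w)))
                         for x in points if is_stable(x, edges))
```
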
Let us now assume that w(i) = 1 for i = 1, 2, . . . , n. For every A ⊆ N , we denote
by $\alpha_G(A)$ the stability number of the subgraph of G induced by A, that is, the size of
a largest stable set of G contained in A. We can associate with G a pseudo-Boolean
function $f_{\alpha_G}$ defined as follows: For each $X = (x_1, x_2, \ldots, x_n) \in B^n$,

$$f_{\alpha_G}(X) = \alpha_G(\mathrm{supp}(X)), \tag{13.3}$$

where supp(X) denotes as usual the subset of N with characteristic vector X.
Pseudo-Boolean functions defined in this way were introduced in [66] in
connection with the study of perfect graphs. To illustrate their interest, we mention
for instance that G is the complement of a triangulated graph if and only if all the
coefficients in the multilinear polynomial representation of $f_{\alpha_G}(X)$ take values in
{−1, 0, 1} (see [66]).

Application 13.2. (Linear algebra.) Let V be a finite set of vectors over an arbi-
trary field and consider the set function f : P(V ) → R, where f (T ), T ⊆ V , is the
rank of the matrix whose rows are the members of T . This rank function has two
interesting properties that are further examined in Section 13.6. First, the function
is monotone nondecreasing, that is,
f (S) ≤ f (T ) whenever S ⊆ T .
Second, the function is submodular, meaning that
f (S ∪ T ) + f (S ∩ T ) ≤ f (S) + f (T ) for all S, T ⊆ V .
It is interesting to remark that both of these properties continue to hold for rank
functions defined on subsets of elements of a matroid (see for instance Welsh
[905]). 

Computer science and engineering


Application 13.3. (Artificial intelligence, Maximum satisfiability.) Expert sys-
tems are frequently described as systems of rules of the form

$$(C_k(x_{i_1}, \ldots, x_{i_{n_k}}) = 1) \Rightarrow (x_{j_k} = 1), \quad (k = 1, 2, \ldots, m),$$

where the Boolean variables $x_i$, i = 1, 2, . . . , n, are associated with various control
parameters, and where each $C_k$, k = 1, 2, . . . , m, is an elementary conjunction.
Suppose that there is a real-valued penalty $w_k$ for the violation of the k-th rule.
Then, the total penalty incurred for the assignment of values $(x_1, x_2, \ldots, x_n)$ to the
control variables is described by the pseudo-Boolean function

$$f = \sum_{k=1}^{m} w_k\, C_k(x_{i_1}, \ldots, x_{i_{n_k}})\, \bar{x}_{j_k}.$$

This situation can be generalized as follows: Consider a CNF $\bigwedge_{k=1}^{m} C_k$,
where each $C_k$ is a Boolean clause (or elementary disjunction) of the form
$C_k = (\bigvee_{i \in A_k} \bar{x}_i) \vee (\bigvee_{j \in B_k} x_j)$, and assume that a real weight $w_k$ has been assigned
to each clause $C_k$, for k = 1, 2, . . . , m. The weighted maximum satisfiability (Max
Sat) problem is to find a point $X^*$ in $\{0, 1\}^n$ that maximizes the total weight of the
satisfied clauses, that is, Max Sat is the pseudo-Boolean optimization problem

$$\text{maximize } \sum_{k=1}^{m} \{\, w_k \mid C_k(X) = 1 \,\} \quad \text{subject to } X \in B^n.$$

Clearly a clause $C_k$ takes value 1 if and only if the term $(\prod_{i \in A_k} x_i)(\prod_{j \in B_k} \bar{x}_j)$ is
equal to 0. Therefore, Max Sat is equivalent to minimizing the pseudo-Boolean
function

$$f = \sum_{k=1}^{m} w_k \Big( \prod_{i \in A_k} x_i \Big) \Big( \prod_{j \in B_k} \bar{x}_j \Big).$$

We refer the reader to Chapter 2, Section 2.11.4, for a more complete discussion
of this well-known generalization of the Boolean satisfiability problem.
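
The equivalence between maximizing the satisfied weight and minimizing f can be checked on a small instance. In the sketch below, the clause list and the function names are hypothetical; each clause is stored as (A_k, B_k, w_k), with A_k the complemented variables and B_k the uncomplemented ones, as in the text.

```python
from itertools import product

# Hypothetical weighted CNF on 3 variables (assumption): each entry is (A_k, B_k, w_k),
# standing for the clause (OR of not-x_i, i in A_k) OR (OR of x_j, j in B_k).
clauses = [({0}, {1}, 2), ({1, 2}, set(), 3), (set(), {0, 2}, 5), ({2}, {0}, 1)]
n = 3
points = list(product((0, 1), repeat=n))

def clause_value(x, A, B):
    """Boolean value of the clause at the point x."""
    return int(any(x[i] == 0 for i in A) or any(x[j] == 1 for j in B))

def satisfied_weight(x):
    """Total weight of the clauses satisfied by x (the Max Sat objective)."""
    return sum(w for A, B, w in clauses if clause_value(x, A, B))

def violation_penalty(x):
    """f(X) = sum_k w_k * prod_{i in A_k} x_i * prod_{j in B_k} (1 - x_j)."""
    total = 0
    for A, B, w in clauses:
        term = 1
        for i in A:
            term *= x[i]
        for j in B:
            term *= 1 - x[j]
        total += w * term
    return total

total_weight = sum(w for _, _, w in clauses)
best_for_maxsat = max(points, key=satisfied_weight)
best_for_f = min(points, key=violation_penalty)
```

At every point, the satisfied weight and f(X) add up to the total weight, which is why the two optimization problems have the same optimal solutions.
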

Application 13.4. (Data mining, classification, learning theory.) Consider a finite
set $\Omega^+ \subseteq \{0, 1\}^n$ of positive observations, and a finite set $\Omega^- \subseteq \{0, 1\}^n$ of negative
observations, such that $\Omega^+ \cap \Omega^- = \emptyset$. In order to distinguish the sets of positive
and negative vectors, two families of elementary conjunctions $C_1^+, \ldots, C_k^+$ and
$C_1^-, \ldots, C_h^-$ (called respectively positive and negative patterns) can be determined,
such that for all $X \in \Omega^+ \cup \Omega^-$,

$$C_i^+(X) = 1 \Rightarrow X \in \Omega^+ \quad (i = 1, \ldots, k)$$
$$C_j^-(X) = 1 \Rightarrow X \in \Omega^- \quad (j = 1, \ldots, h)$$

(see Chapter 12 for details). In Boros et al. [131], patterns have been used to
define a family of discriminants, namely, pseudo-Boolean functions of the form

$$d(X) = \sum_{i=1}^{k} \alpha_i\, C_i^+(X) - \sum_{j=1}^{h} \beta_j\, C_j^-(X),$$

where the $\alpha_i$'s and the $\beta_j$'s are nonnegative reals, and $\sum_{i=1}^{k} \alpha_i = \sum_{j=1}^{h} \beta_j = 1$.
An appropriate choice of the parameters $(\alpha_i, \beta_j)$ allows the construction of dis-
criminants which take “high” values in positive observations, and “low” values
in negative ones. We refer to [131] for details. See also Genkin, Kulikowski, and
Muchnik [376] for other pseudo-Boolean models in data mining.

Application 13.5. (Computer vision.) A fundamental problem in computer vision
is to restore a “better” version of an initially blurred, or “noisy” image. Ide-
ally, the restored image should be “similar” to the initial one but should display
large “uniformly colored” regions with “crisp” transitions at boundaries between
different colors.
A basic formulation of the problem can be stated as follows: We are given a set
P = {1, 2, . . . , n} of pixels, a set $\mathcal{C} = \{1, 2, \ldots, C\}$ of colors, an initial assignment
$c_0 : P \to \mathcal{C}$ of colors to pixels, and a so-called energy function E(c) which measures
the inadequacy of any new coloring $c : P \to \mathcal{C}$. This energy function is to be
minimized over all possible colorings c.
Typically, the energy function takes the form

$$E(c) = \sum_{p \in P} (c_0(p) - c(p))^2 + \sum_{(p,q) \in \mathcal{E}} V(c(p), c(q)),$$

where $\mathcal{E}$ is a collection of “neighboring pixels.” The first group of terms estimates
the similarity between the initial coloring $c_0$ and the new coloring c, whereas the
remaining terms penalize the assignment of distinct colors to neighboring pixels.
In the simplest (black-and-white) case, every pixel can take exactly one of two
colors (C = 2), so that each c(p) can be viewed as a Boolean variable and E(c)
is a quadratic pseudo-Boolean function (by virtue of Theorem 13.1 hereunder,
and because each term V (c(p), c(q)) depends on two Boolean variables only).
In spite of its apparent simplicity, this binary model arises as a subproblem in
the solution of more realistic formulations. We refer the reader to Boykov, Vek-
sler, and Zabih [147] or to Kolmogorov and Rother [576] for more details on
applications. 

Operations research
Application 13.6. (0–1 linear programming.) Consider the 0–1 linear program-
ming problem

$$\text{maximize } z(x_1, x_2, \ldots, x_n) = \sum_{j=1}^{n} c_j x_j \tag{13.4}$$
$$\text{subject to } \sum_{j=1}^{n} a_{ij} x_j = b_i, \quad i = 1, 2, \ldots, m \tag{13.5}$$
$$(x_1, x_2, \ldots, x_n) \in \{0, 1\}^n. \tag{13.6}$$

This fundamental problem of discrete optimization is equivalent to the uncon-
strained quadratic pseudo-Boolean optimization problem

$$\text{maximize } f(x_1, x_2, \ldots, x_n) = \sum_{j=1}^{n} c_j x_j - M \sum_{i=1}^{m} \Big( \sum_{j=1}^{n} a_{ij} x_j - b_i \Big)^2$$
$$\text{subject to } (x_1, x_2, \ldots, x_n) \in \{0, 1\}^n, \tag{13.7}$$

for a sufficiently large value of M.
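
The penalty reformulation (13.7) can be illustrated on a tiny instance. The data below are hypothetical, and the choice of M relies on two assumptions that are not stated in the text: the data are integers (so every infeasible point has a squared violation of at least 1) and the original problem is feasible.

```python
from itertools import product

# Hypothetical instance (assumption): maximize 3*x1 + 2*x2 + 4*x3
# subject to x1 + x2 = 1 and x2 + x3 = 1.
c = [3, 2, 4]
a = [[1, 1, 0], [0, 1, 1]]
b = [1, 1]

# With integer data, M > sum |c_j| makes every infeasible point worse than any feasible one.
M = sum(abs(cj) for cj in c) + 1

def z(x):
    """Linear objective (13.4)."""
    return sum(cj * xj for cj, xj in zip(c, x))

def row_value(row, x):
    return sum(aij * xj for aij, xj in zip(row, x))

def feasible(x):
    """Equality constraints (13.5)."""
    return all(row_value(row, x) == bi for row, bi in zip(a, b))

def f(x):
    """Penalized objective (13.7)."""
    return z(x) - M * sum((row_value(row, x) - bi) ** 2 for row, bi in zip(a, b))

points = list(product((0, 1), repeat=len(c)))
constrained_opt = max(z(x) for x in points if feasible(x))
x_star = max(points, key=f)   # unconstrained maximizer of the penalized function
```
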

Application 13.7. (Game theory.) A game in characteristic form is a set func-
tion f defined on P(N ), where N = {1, 2, . . . , n} is a finite set of players. The
value of f (S) is interpreted as the payoff that players in S can secure by acting
together. It is usual to assume that f (∅) = 0 and that f is monotone nondecreas-
ing, that is, f (S) ≤ f (T ) whenever S ⊆ T . Another frequent assumption is that
f is superadditive, meaning that f (S) + f (T ) ≤ f (S ∪ T ) whenever S ∩ T = ∅.
The economic interpretation of superadditivity is that players can achieve more
value by cooperating than by acting separately.
If f is viewed as a pseudo-Boolean function on $B^n$, then its multilinear rep-
resentation and its continuous extension $f^c$ (see Sections 13.2 and 13.3) play
an interesting role in this context. Indeed, several central concepts in game
theory (such as imputations, core, Shapley value, Banzhaf index) have natural
pseudo-Boolean interpretations, leading to interesting theoretical and algorithmic
insights; see, for instance, [441, 456, 669, 719, 720] and Section 13.5.
Monotone nondecreasing functions such that f (∅) = 0 and f (N ) = 1 have
also been examined in artificial intelligence under the name belief functions or
Choquet capacities, where they are used to model uncertainty and subjective prob-
abilities (see Chateauneuf and Jaffray [188]; Shafer [824, 825], etc.), and in
multicriteria decision-making under the name fuzzy measures, as tools for the
aggregation of interacting criteria (see Sugeno [851]; Grabisch [405]; Grabisch,
Marichal, Mesiar, and Pap [406]; Marichal [668], etc.). They are discussed further
in Section 13.6.2.
Application 13.8. (Production management and logistics.) So-called fixed charge
constraints of the form

$$(y = 1) \text{ if and only if } (x_i = 1 \text{ for all } i \in A) \tag{13.8}$$

are encountered in many business decision problems, like capital budgeting, pro-
duction planning, plant location, etc. Since constraint (13.8) simply expresses that
$y = \prod_{i \in A} x_i$, pseudo-Boolean formulations of the associated problems often arise
quite naturally by elimination of the y-variables. We briefly describe two models
of this type.
A fundamental planning problem for flexible manufacturing systems (FMS) is
the part selection problem. A part-set containing n parts must be processed, one
part at a time, on a single flexible machine. The machine can use different tools,
numbered from 1 to m. Each part requires a specific subset of tools which have to
be loaded in the tool magazine of the machine before the part can be processed:
Say part i requires T (i) ⊆ {1, 2, . . . , m}. The magazine features C tool slots. When
loaded on the machine, tool j occupies sj slots in the magazine (j = 1, 2, . . . , m).
The total number of tools required to process all parts can be much larger than C, so
that it is sometimes necessary to change tools in order to process the complete part-
set. Now, the part selection problem consists in determining the largest number of
parts that can be produced without tool changes.
This problem can be modeled as a pseudo-Boolean optimization problem in
various ways. In the simplest model, a Boolean variable $x_j$ indicates whether
tool j is placed in the magazine or not (j = 1, 2, . . . , m). Then, the part selection
problem is

$$\text{maximize } f(x_1, x_2, \ldots, x_m) = \sum_{i=1}^{n} \prod_{j \in T(i)} x_j \tag{13.9}$$
$$\text{subject to } \sum_{j=1}^{m} s_j x_j \leq C, \tag{13.10}$$
$$(x_1, x_2, \ldots, x_m) \in \{0, 1\}^m, \tag{13.11}$$

where the product $\prod_{j \in T(i)} x_j$ takes value 1 only if part i can be processed by the
selected tools. Different formulations and detailed discussions of this problem can
be found, for instance, in [228, 238, 845].
For a second example, consider the classical simple facility location problem:
Here, we must select an optimal subset of locations for some facilities (such as
plants, warehouses, emergency facilities) in order to serve the needs of a set of
users. Opening a facility in a given location i requires a fixed cost ci , and deliv-
ering the service to user j from location i carries a cost dj i (j = 1, 2, . . . , m,
i = 1, 2, . . . , n).
Let us introduce a 0–1 variable xi which indicates whether a facility is to be
opened in location i (i = 1, . . . , n). Two pseudo-Boolean functions can be defined:
A function c(X) to indicate the total fixed cost required to open a configuration
X = (x1 , x2 , . . . , xn ), and a function d(X) to indicate the optimal cost of serving
the set of users from the corresponding locations. The optimal location problem
(essentially) consists now in finding the minimum of the pseudo-Boolean function
c(X) + d(X). Detailed expressions of this function were first proposed by Hammer
[434] and further examined in [70, 263, 394, 395, 900, etc.]. If we denote by $\pi(j) =
(i_1(j), i_2(j), \ldots, i_n(j))$ a permutation of locations such that $d_{j i_1(j)} \leq d_{j i_2(j)} \leq \ldots \leq
d_{j i_n(j)}$, then the function to be minimized can be written as

$$f(X) = \sum_{i=1}^{n} c_i x_i + \sum_{j=1}^{m} \sum_{k=1}^{n} d_{j i_k(j)}\, x_{i_k(j)} \prod_{\ell < k} \bar{x}_{i_\ell(j)} + M \prod_{i=1}^{n} \bar{x}_i.$$

In this formulation, the last term involves a large penalty M; it is necessary to
ensure that at least one facility is opened.
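
To see why this expression equals c(X) + d(X) whenever at least one facility is open, one can compare it with a direct computation on a small instance. The costs below, the helper names, and the particular value chosen for M are hypothetical.

```python
from itertools import product

# Hypothetical data (assumption): 3 candidate locations, 4 users.
c = [4, 3, 5]                                      # fixed costs c_i
d = [[1, 6, 3], [5, 2, 4], [7, 3, 1], [2, 2, 6]]   # d[j][i]: cost of serving user j from i
n, m = len(c), len(d)
M = sum(c) + sum(max(row) for row in d) + 1        # larger than any cost with an open facility

def direct_cost(x):
    """c(X) + d(X): fixed costs plus, for each user, the cheapest open location."""
    open_locs = [i for i in range(n) if x[i]]
    return (sum(c[i] for i in open_locs)
            + sum(min(d[j][i] for i in open_locs) for j in range(m)))

def formula(x):
    """Pseudo-Boolean expression of the text, with the permutation pi(j) obtained by sorting."""
    total = sum(c[i] * x[i] for i in range(n))
    for j in range(m):
        order = sorted(range(n), key=lambda i: d[j][i])   # i_1(j), ..., i_n(j)
        cheaper_all_closed = 1                            # product of (1 - x) over cheaper locations
        for i in order:
            total += d[j][i] * x[i] * cheaper_all_closed
            cheaper_all_closed *= 1 - x[i]
    all_closed = 1
    for i in range(n):
        all_closed *= 1 - x[i]
    return total + M * all_closed

points = list(product((0, 1), repeat=n))
best = min(points, key=formula)
```
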

13.2 Representations
Different application areas may rely on different descriptions of pseudo-Boolean
functions. For instance, in game theory, the payoff of a coalition of players may
be computed as the optimal value of an associated combinatorial optimization
problem (see Bilbao [79]). In other models, the values assumed by a pseudo-
Boolean function may be listed in a table, or computed by a black-box oracle.
One of the main impacts of the pseudo-Boolean viewpoint on the theory of set
functions, however, is due to the existence of various algebraic representations
of these functions. The properties of such algebraic representations are the main
topic of this section.

13.2.1 Polynomial expressions, pseudo-Boolean normal forms and posiforms
The following representation theorem is stated in Hammer, Rosenberg, and
Rudeanu [458] and in Hammer and Rudeanu [460] (where it is attributed to T.
Gaspar).

Theorem 13.1. For every pseudo-Boolean function f on $B^n$, there exists a unique
mapping $c : P(N) \to \mathbb{R}$ such that

$$f(x_1, x_2, \ldots, x_n) = \sum_{A \in P(N)} c(A) \prod_{i \in A} x_i. \tag{13.12}$$

Proof. For every point $X^* \in B^n$, the expression

$$f(X^*) \prod_{i \mid x_i^* = 1} x_i \prod_{j \mid x_j^* = 0} \bar{x}_j \tag{13.13}$$

takes value $f(X^*)$ in the point $X^*$, and the value 0 in every other point of $B^n$.
Therefore,

$$f(x_1, x_2, \ldots, x_n) = \sum_{X^* \in B^n} f(X^*) \prod_{i \mid x_i^* = 1} x_i \prod_{j \mid x_j^* = 0} \bar{x}_j. \tag{13.14}$$

Replacing $\bar{x}_j$ by $(1 - x_j)$, expanding the products and using distributivity
immediately yields a polynomial expression of the form (13.12).
Assume now that $p_1$ and $p_2$ are two different polynomial expressions of
the form (13.12) with coefficients $c_1(A)$ and $c_2(A)$, $A \in P(N)$, respectively.
Let $A^*$ be a subset of N such that $c_1(A^*) \neq c_2(A^*)$, and such that $c_1(A) =
c_2(A)$ for all A with $|A| < |A^*|$. If $X^*$ denotes the characteristic vector of $A^*$,
then $p_1(X^*) - p_2(X^*) = c_1(A^*) - c_2(A^*) \neq 0$, so that $p_1$ and $p_2$ cannot both
represent f .

Note that the polynomial (13.12) is linear in each of its variables: We say that
it is multilinear.
Definition 13.1. The expression in the right-hand side of (13.12) is the (multilin-
ear) polynomial expression of f . The degree of f is the degree of this polynomial,
namely, $\mathrm{degree}(f) = \max\{|A| : c(A) \neq 0\}$. We say that a pseudo-Boolean func-
tion is either linear, or quadratic, or cubic if its degree is at most 1, or 2, or 3,
respectively.
The set function $c : P(N) \to \mathbb{R}$ is sometimes called the Möbius transform or the
mass function associated with f (see for instance [407, 824]). In fact, it follows
from the elementary theory of Möbius inversion for ordered sets that c can be
computed as

$$c(A) = \sum_{S \subseteq A} (-1)^{|A| - |S|} f(e_S) \quad \text{for all } A \in P(N),$$

where $e_S$ denotes as usual the characteristic vector of S (see Aigner [12]). The
bijective correspondence linking the functions f and c has been investigated in
a broader context by various authors; see for instance Grabisch, Marichal, and
Roubens [407].
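
The Möbius inversion formula translates directly into a short computation. The sketch below applies it to the table of values used in Example 13.1, with the variables (x, y, z) indexed 0, 1, 2; the function names are hypothetical, and the enumeration over all subsets is exponential in n, so this is only meant for small examples.

```python
from itertools import combinations, product

def subsets(items):
    """Yield all subsets of items as frozensets."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for S in combinations(items, r):
            yield frozenset(S)

def mobius_coefficients(table, n):
    """c(A) = sum over S subset of A of (-1)^(|A|-|S|) * f(e_S)."""
    coeffs = {}
    for A in subsets(range(n)):
        coeffs[A] = sum(
            (-1) ** (len(A) - len(S)) * table[tuple(1 if i in S else 0 for i in range(n))]
            for S in subsets(A))
    return coeffs

def evaluate(coeffs, x):
    """Evaluate the multilinear polynomial (13.12) at the 0-1 point x."""
    return sum(c for A, c in coeffs.items() if all(x[i] for i in A))

# Table of Example 13.1, with (x, y, z) indexed as (0, 1, 2).
table = {(0, 0, 0): 3, (0, 0, 1): 1, (0, 1, 0): 0, (0, 1, 1): -2,
         (1, 0, 0): 4, (1, 0, 1): 2, (1, 1, 0): -5, (1, 1, 1): 6}

coeffs = mobius_coefficients(table, 3)
nonzero = {tuple(sorted(A)): c for A, c in coeffs.items() if c != 0}
# nonzero corresponds to 3 + x - 3y - 2z - 6xy + 13xyz
```
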

The polynomial expression of a pseudo-Boolean function does not involve
complemented variables. If we allow complementation, then we obtain a broader
class of expressions.

Definition 13.2. A pseudo-Boolean normal form (PBNF) is an expression ψ of
the form

$$\psi(x_1, x_2, \ldots, x_n) = b_0 + \sum_{k=1}^{m} b_k \Big( \prod_{i \in A_k} x_i \Big) \Big( \prod_{j \in B_k} \bar{x}_j \Big), \tag{13.15}$$

where $b_0, b_1, \ldots, b_m$ are real coefficients, and $A_k \cap B_k = \emptyset$, $A_k \cup B_k \neq \emptyset$ for k =
1, 2, . . . , m.

Every pseudo-Boolean function can be represented by (many) distinct PBNFs.
For instance, the representation in equation (13.14) is a PBNF (called the minterm
PBNF) of f that can be readily constructed from a table of values of f ; see also
Example 13.1 hereunder.
PBNFs with positive coefficients play a special role in many applications.

Definition 13.3. The PBNF (13.15) is called a posiform if bk > 0 for all k =
1, . . . , m.

Note that the sign of the free coefficient b0 is unrestricted in a posiform. Hammer
and Rosenberg [457] introduced posiforms and observed the following property:

Theorem 13.2. Every pseudo-Boolean function can be represented by a posiform.

Proof. Let us consider the polynomial representation (13.12) of a pseudo-Boolean
function f . If $T = c\, x_{i_1} x_{i_2} \cdots x_{i_k}$ is a term of (13.12) with c < 0, then successive
applications of the identity $x_{i_j} = 1 - \bar{x}_{i_j}$, for j = k, k − 1, . . . down to 1, transform
T into

$$T = c - c\, \bar{x}_{i_1} - c\, x_{i_1} \bar{x}_{i_2} - \cdots - c\, x_{i_1} x_{i_2} \cdots x_{i_{k-1}} \bar{x}_{i_k},$$

which is a posiform. Repeating this transformation for every negative term of
(13.12) eventually produces a posiform of f .

Other posiforms representing the same pseudo-Boolean function would be
obtained by applying in a different order the transformations described in the
proof of Theorem 13.2.
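
The transformation used in the proof of Theorem 13.2 can be sketched as follows. The representation chosen here (a dictionary from tuples of variable indices to coefficients for the polynomial, and triples (b, A, B) for posiform terms) is an assumption made for illustration; the input is the polynomial expression 3 + x − 3y − 2z − 6xy + 13xyz of Example 13.1, with (x, y, z) indexed 0, 1, 2.

```python
from itertools import product

def to_posiform(poly):
    """Rewrite each negative term c*x_{i1}...x_{ik} as
    c + (-c)*not(x_{i1}) + (-c)*x_{i1}*not(x_{i2}) + ... (proof of Theorem 13.2).
    Returns (b0, terms), where each term is (b, A, B) with b > 0,
    A the uncomplemented and B the complemented variables."""
    b0 = 0
    terms = []
    for variables, c in poly.items():
        if not variables:
            b0 += c
        elif c > 0:
            terms.append((c, tuple(variables), ()))
        elif c < 0:
            b0 += c
            for k, v in enumerate(variables):
                terms.append((-c, tuple(variables[:k]), (v,)))
    return b0, terms

def evaluate_polynomial(poly, x):
    return sum(c for variables, c in poly.items() if all(x[i] for i in variables))

def evaluate_posiform(b0, terms, x):
    return b0 + sum(b for b, A, B in terms
                    if all(x[i] for i in A) and all(not x[j] for j in B))

poly = {(): 3, (0,): 1, (1,): -3, (2,): -2, (0, 1): -6, (0, 1, 2): 13}
b0, terms = to_posiform(poly)
```

With the variable order used here, the result is the unsimplified posiform −8 + x + 6x̄ + 3ȳ + 2z̄ + 6xȳ + 13xyz; processing the variables of −6xy in the other order would give a different posiform of the same function.
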

Example 13.1. The pseudo-Boolean function f (x, y, z) defined by the table

x   y   z   f (x, y, z)
0   0   0    3
0   0   1    1
0   1   0    0
0   1   1   −2
1   0   0    4
1   0   1    2
1   1   0   −5
1   1   1    6

admits the minterm PBNF

$$\mu = 3\, \bar{x}\bar{y}\bar{z} + \bar{x}\bar{y}z - 2\, \bar{x}yz + 4\, x\bar{y}\bar{z} + 2\, x\bar{y}z - 5\, xy\bar{z} + 6\, xyz.$$

Replacing each complemented variable $\bar{u}$ by 1 − u, we find the unique polynomial
expression of f :

$$f = 3 + x - 3y - 2z - 6xy + 13xyz.$$

Replacing now the terms −3y and −2z by $-3 + 3\bar{y}$ and $-2 + 2\bar{z}$, respectively,
and replacing the term −6xy either by $-6 + 6\bar{x} + 6x\bar{y}$ or by $-6 + 6\bar{y} + 6\bar{x}y$, we
obtain the posiform representations:

$$\psi_1 = -8 + x + 6\bar{x} + 3\bar{y} + 2\bar{z} + 6x\bar{y} + 13xyz$$

and

$$\psi_2 = -8 + x + 3\bar{y} + 6\bar{y} + 2\bar{z} + 6\bar{x}y + 13xyz,$$

which can be further simplified to

$$\psi_1 = -7 + 5\bar{x} + 3\bar{y} + 2\bar{z} + 6x\bar{y} + 13xyz$$

and

$$\psi_2 = -8 + x + 9\bar{y} + 2\bar{z} + 6\bar{x}y + 13xyz,$$

respectively.

13.2.2 Piecewise linear representations


Hammer and Rosenberg [457] observed that every pseudo-Boolean function f
can be expressed as the pointwise-minimum of a family of linear functions. To see
this, consider an arbitrary posiform of f :

$$\psi(x_1, x_2, \ldots, x_n) = b_0 + \sum_{k=1}^{m} b_k \Big( \prod_{i \in A_k} x_i \Big) \Big( \prod_{j \in B_k} \bar{x}_j \Big) \tag{13.16}$$

where $A_k \cap B_k = \emptyset$, $A_k \cup B_k \neq \emptyset$ and $b_k > 0$ for k = 1, 2, . . . , m. A selector for
(13.16) is a vector $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_m)$ such that $\sigma_k \in A_k \cup B_k$ for k = 1, 2, . . . , m.
For every selector σ , the linear function

$$l^{\sigma}(x_1, x_2, \ldots, x_n) = b_0 + \sum_{k \,:\, \sigma_k \in A_k} b_k\, x_{\sigma_k} + \sum_{k \,:\, \sigma_k \in B_k} b_k\, (1 - x_{\sigma_k}) \tag{13.17}$$

is a majorant of f , that is, $f(x_1, x_2, \ldots, x_n) \leq l^{\sigma}(x_1, x_2, \ldots, x_n)$ for all
$(x_1, x_2, \ldots, x_n) \in B^n$ (since the inequality holds termwise).

Theorem 13.3. If S is the set of all selectors for (13.16), then

$$f(x_1, x_2, \ldots, x_n) = \min_{\sigma \in S}\, l^{\sigma}(x_1, x_2, \ldots, x_n) \quad \text{for all } (x_1, x_2, \ldots, x_n) \in B^n. \tag{13.18}$$

Proof. The previous discussion implies that $f \leq \min_{\sigma \in S} l^{\sigma}$ on $B^n$.
To establish the reverse inequality, let $X^*$ be a point in $B^n$. We define a selector σ
as follows. For k = 1, 2, . . . , m, consider the value of $T_k^* = (\prod_{i \in A_k} x_i^*)(\prod_{j \in B_k} \bar{x}_j^*)$.
If $T_k^* = 1$, then $\sigma_k$ can be an arbitrary index in $A_k \cup B_k$. If $T_k^* = 0$, then $\sigma_k$ is either
an index in $A_k$ such that $x_{\sigma_k}^* = 0$ or an index in $B_k$ such that $\bar{x}_{\sigma_k}^* = 0$. In all cases,
it is easy to see that $f(X^*) = l^{\sigma}(X^*)$, and hence, equality holds in (13.18).
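
Theorem 13.3 can be checked by enumeration on a small posiform. The sketch below uses the simplified posiform ψ1 of Example 13.1 (variables x, y, z indexed 0, 1, 2); the data layout and function names are hypothetical. Since the number of selectors is the product of the sizes of the sets A_k ∪ B_k, this enumeration is practical only for very small expressions.

```python
from itertools import product

# Posiform psi_1 of Example 13.1: -7 + 5*not(x) + 3*not(y) + 2*not(z) + 6*x*not(y) + 13*x*y*z.
# Each term is (b_k, A_k, B_k) with A_k the uncomplemented and B_k the complemented variables.
b0 = -7
terms = [(5, (), (0,)), (3, (), (1,)), (2, (), (2,)), (6, (0,), (1,)), (13, (0, 1, 2), ())]

def posiform(x):
    """Value of the posiform (13.16) at the 0-1 point x."""
    return b0 + sum(b for b, A, B in terms
                    if all(x[i] for i in A) and all(not x[j] for j in B))

def majorant(sigma, x):
    """Linear function (13.17) associated with the selector sigma."""
    value = b0
    for (b, A, B), s in zip(terms, sigma):
        value += b * x[s] if s in A else b * (1 - x[s])
    return value

# A selector picks one variable index from A_k union B_k for every term.
selectors = list(product(*[A + B for _, A, B in terms]))
points = list(product((0, 1), repeat=3))

# Pointwise minimum of the majorants over all selectors.
lower_envelope = {x: min(majorant(s, x) for s in selectors) for x in points}
```
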

13.2.3 Disjunctive and conjunctive normal forms


An interesting representation of pseudo-Boolean functions is based on the use
of elementary conjunctions and disjunctions, by analogy with classical represen-
tations of Boolean functions. Our discussion in this section is based on several
papers by Foldes and Hammer [336, 337, 338], where additional information
can be found. Closely related concepts are discussed by Cuninghame-Green
[247]; Davio, Deschamps, and Thayse [259]; Grabisch et al. [406]; Marichal [668];
Störmer [849]; Sugeno [851], and so on.

Definition 13.4. If $f_1$ and $f_2$ are two pseudo-Boolean functions on $B^n$, their
disjunction is the pseudo-Boolean function $f_1 \vee f_2$ defined as

$$(f_1 \vee f_2)(X) = \max\{f_1(X), f_2(X)\} \quad \text{for all } X \in B^n,$$

and their conjunction is the function $f_1 \wedge f_2$ defined as

$$(f_1 \wedge f_2)(X) = \min\{f_1(X), f_2(X)\} \quad \text{for all } X \in B^n.$$

Clearly, if the functions f1 and f2 are Boolean, then disjunction and conjunc-
tion are simply the usual Boolean operators (and we sometimes omit to write the
operator ∧).

Definition 13.5. A (pseudo-Boolean) elementary conjunction is an expression of
the form

$$p(X) = a + b \prod_{i \in A} x_i \prod_{j \in B} \bar{x}_j, \tag{13.19}$$

where $a, b \in \mathbb{R}$, $b \geq 0$, and A, B are subsets of indices with $|A| + |B| \geq 1$ and
$A \cap B = \emptyset$.
A (pseudo-Boolean) disjunctive normal form (DNF) is a disjunction of elemen-
tary conjunctions that all have the same minimum, that is, an expression of the
form

$$f = \bigvee_{k=1}^{m} \Big( a + b_k \prod_{i \in A_k} x_i \prod_{j \in B_k} \bar{x}_j \Big), \tag{13.20}$$

where $b_1, b_2, \ldots, b_m \geq 0$, $|A_k| + |B_k| \geq 1$ and $A_k \cap B_k = \emptyset$ for k = 1, 2, . . . , m. We say
that the right-hand side of (13.20) is a DNF representation or a DNF expression
of the function f .
Note that every constant function p(X) = a is an elementary conjunction (with
b = 0).
We know that Boolean functions can always be represented in disjunctive
normal form. In the pseudo-Boolean case, we similarly obtain:
Theorem 13.4. Every pseudo-Boolean function has infinitely many DNF repre-
sentations.
Proof. Let a be any constant such that $a \leq \min_{X \in B^n} f(X)$. Then, f is represented
by the DNF expression

$$\psi(x_1, \ldots, x_n) = \bigvee_{X^* \in B^n} \Big( a + (f(X^*) - a) \prod_{i \mid x_i^* = 1} x_i \prod_{j \mid x_j^* = 0} \bar{x}_j \Big) \tag{13.21}$$

(compare with (13.14)).

Example 13.2. The pseudo-Boolean function

$$f(x, y) = 6 + 3x - xy$$

attains its minimum value (min f (X) = 6) when (x, y) = (0, 0) or when (x, y) =
(0, 1). Hence, using the construction (13.21), f can be expressed as

$$f(x, y) = (6 + 2\, xy) \vee (6 + 3\, x\bar{y}),$$

or as

$$f(x, y) = (5 + 3\, xy) \vee (5 + \bar{x}y) \vee (5 + 4\, x\bar{y}) \vee (5 + \bar{x}\bar{y}),$$

or as

$$f(x, y) = (8\, xy) \vee (6\, \bar{x}y) \vee (9\, x\bar{y}) \vee (6\, \bar{x}\bar{y}),$$

and so on.
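
The construction (13.21) behind these expressions can be reproduced in a few lines. In the sketch below (hypothetical names), each elementary conjunction is stored as a triple (a, b, X*) and takes value a + b at the point X* and value a elsewhere; the three choices a = 6, 5, 0 correspond to the three DNFs listed in the example, up to the constant conjunctions with b = 0 that the example omits.

```python
from itertools import product

def f(x, y):
    """Function of Example 13.2."""
    return 6 + 3 * x - x * y

def minterm_dnf(func, n, a):
    """Construction (13.21): one elementary conjunction (a, f(X*) - a, X*) per point X*."""
    return [(a, func(*p) - a, p) for p in product((0, 1), repeat=n)]

def evaluate_dnf(dnf, x):
    """Disjunction = pointwise maximum of the elementary conjunctions."""
    return max(a + (b if x == p else 0) for a, b, p in dnf)

points = list(product((0, 1), repeat=2))
f_min = min(f(*p) for p in points)

# Any constant a <= f_min yields a valid DNF of f.
dnfs = {a: minterm_dnf(f, 2, a) for a in (f_min, 5, 0)}
```
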

Pseudo-Boolean elementary disjunctions and conjunctive normal forms can be
defined in a similar way. A pseudo-Boolean elementary disjunction is an expression
of the form

$$a + b \Big( \bigvee_{i \in A} x_i \vee \bigvee_{j \in B} \bar{x}_j \Big), \tag{13.22}$$

where $a, b \in \mathbb{R}$, $b \geq 0$, and A, B are subsets of indices with $|A| + |B| \geq 1$, $A \cap B = \emptyset$.
A conjunctive normal form (CNF) is a conjunction of elementary disjunctions that
all have the same maximum M, namely, an expression of the form

$$f = \bigwedge_{k=1}^{m} \Big( a_k + b_k \Big( \bigvee_{i \in A_k} x_i \vee \bigvee_{j \in B_k} \bar{x}_j \Big) \Big), \tag{13.23}$$

where $b_k \geq 0$, $a_k + b_k = M$, $|A_k| + |B_k| \geq 1$ and $A_k \cap B_k = \emptyset$ for k = 1, 2, . . . , m.
The existence of CNF representations is shown similarly to that of DNFs.

Pseudo-Boolean implicants and implicates


An elementary conjunction (respectively, disjunction) p is an implicant (respec-
tively, implicate) of a pseudo-Boolean function f if p ≤ f (respectively, f ≤ p).
An implicant p is called a prime implicant of f if f has no implicant $p'$ such that
$p' \neq p$ and $p \leq p'$. Prime implicates are similarly defined.
Let us establish some of the fundamental properties of pseudo-Boolean (prime)
implicants and implicates.
Lemma 13.1. Let f be a pseudo-Boolean function on $B^n$, let

$$p(X) = a + b \prod_{i \in A} x_i \prod_{j \in B} \bar{x}_j \tag{13.24}$$

be an elementary conjunction, let $F_p$ denote the face

$$F_p = \{X \in B^n \mid x_i = 1 \text{ for all } i \in A \text{ and } x_j = 0 \text{ for all } j \in B\}$$

if b > 0, and let $F_p = B^n$ if b = 0. Let $f_{\min} = \min_{X \in B^n} f(X)$ and $f_p =
\min_{X \in F_p} f(X)$.
(i) p is an implicant of f if and only if $a \leq f_{\min}$ and $a + b \leq f_p$.
(ii) If p is a prime implicant of f , then $a = f_{\min}$ and $a + b = f_p$.
Proof. Note that the elementary conjunction p(X) given by (13.24) takes value
a + b on $F_p$ and value a elsewhere. Claim (i) follows immediately from these
observations.
To prove Claim (ii), assume first that b = 0 and that $a < f_{\min}$ (or equivalently
in this case, $a + b < f_p$). Since $p'(X) = f_{\min}$ is an implicant of f and since
$p(X) = a < f_{\min} = p'(X)$, we conclude that p(X) is not prime.
So, let us assume from now on that b > 0. If $a < f_{\min}$, let $0 < M \leq \min(b, f_{\min} -
a)$, and define $p'(X) = (a + M) + (b - M) \prod_{i \in A} x_i \prod_{j \in B} \bar{x}_j$. There holds $p(X) \leq
p'(X) \leq f(X)$ for all $X \in B^n$, $p(X) \neq p'(X)$, and we conclude that p(X) is not
prime.
Finally, assume that $a + b < f_p$, let $0 < \eta \leq f_p - (a + b)$, and define
$p'(X) = a + (b + \eta) \prod_{i \in A} x_i \prod_{j \in B} \bar{x}_j$. Here again, $p(X) \leq p'(X) \leq f(X)$
for all $X \in B^n$, and we conclude that p(X) is not prime.

Theorem 13.5. Every pseudo-Boolean function has an infinite number of impli-


cants and implicates, and a finite number of prime implicants and prime
implicates.
Proof. We only discuss the case of implicants, as a similar reasoning applies for
implicates. For every pseudo-Boolean function f , for every constant a strictly
smaller than the minimum value of f , and for every sufficiently small constant b,
the elementary conjunction (13.24) is an implicant of f . This shows that f has
infinitely many implicants.
Let us say that an implicant of the form (13.24) is “tight” if a = fmin and
a + b = fp , as in statement (ii) of Lemma 13.1. The lemma states that every
prime implicant is tight. Moreover, the number of tight implicants is finite, since
there is only a finite number of possible choices for the sets A and B in (13.24),
and the value of a and b is fixed as soon as A and B are given. This proves the
theorem. 

Theorem 13.6. For every implicant (respectively, implicate) $p$ of a pseudo-Boolean
function $f$, there is a prime implicant (respectively, implicate) $p'$ of $f$
such that $p \le p' \le f$ (respectively, $f \le p' \le p$).
Proof. We concentrate on the claim concerning an implicant $p$. The proof of Lemma
13.1 actually implies that there is a tight implicant $p'$ such that $p \le p' \le f$. Let
us choose $p'$ to be maximal with this property, that is, let us assume that there is
no tight implicant $q$ such that $p' \le q \le f$, $q \ne p'$ (this assumption is legitimate
because the set of tight implicants is finite).
Now, if $p'$ is not prime, then there exists another implicant $p''$ such that
$p' \le p'' \le f$, $p' \ne p''$. But here again, the proof of Lemma 13.1 implies that
$p''$ must be dominated by a tight implicant $q$ such that $p'' \le q \le f$, contradicting
the maximality of $p'$. □

If ψ is a DNF expression of a pseudo-Boolean function f , then all elementary


conjunctions appearing in ψ are implicants of f . Clearly, different DNFs may
use very different sets of implicants. However, the prime implicants allow us to

define a canonical DNF for each pseudo-Boolean function, thus extending the cor-
responding representation theory of Boolean functions. A similar situation arises
for CNFs.

Theorem 13.7. Every pseudo-Boolean function is the disjunction of its prime


implicants and the conjunction of its prime implicates.

Proof. This is an immediate consequence of Theorems 13.5 and 13.6. 
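The representation results above can be checked numerically on a small function. The following sketch is only an illustration under stated assumptions: it interprets the pseudo-Boolean disjunction as a pointwise maximum, it uses the function of Example 13.3 as test data, and it enumerates all tight implicants in the sense of Lemma 13.1 (a finite set that contains the prime implicants) rather than the prime implicants alone. The names `tight_implicants`, `evaluate`, and `disjunction` are hypothetical helpers.

```python
from itertools import product


def f(x, y, z):
    # Polynomial expression of the function from Example 13.3.
    return 3 + x - 3 * y - 2 * z - 6 * x * y + 13 * x * y * z


def tight_implicants(func, n):
    """List (a, b, A, B) with a = f_min and a + b = f_p for all disjoint A, B."""
    points = list(product((0, 1), repeat=n))
    f_min = min(func(*p) for p in points)
    implicants = []
    # status[i]: 0 = x_i absent, 1 = x_i appears, 2 = complement of x_i appears
    for status in product((0, 1, 2), repeat=n):
        A = [i for i in range(n) if status[i] == 1]
        B = [i for i in range(n) if status[i] == 2]
        if not A and not B:
            continue  # skip the constant term; it adds nothing to the maximum
        face = [p for p in points
                if all(p[i] == 1 for i in A) and all(p[j] == 0 for j in B)]
        f_p = min(func(*p) for p in face)
        implicants.append((f_min, f_p - f_min, A, B))
    return implicants


def evaluate(implicant, point):
    """Value a + b on the face F_p, value a elsewhere."""
    a, b, A, B = implicant
    on_face = all(point[i] == 1 for i in A) and all(point[j] == 0 for j in B)
    return a + b if on_face else a


def disjunction(implicants, point):
    """Pseudo-Boolean disjunction, taken here as the pointwise maximum."""
    return max(evaluate(p, point) for p in implicants)
```

On this example, every tight implicant is dominated by f and their disjunction reproduces f at each of the eight points, which is consistent with Theorems 13.6 and 13.7.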

Foldes and Hammer [336] propose an algorithm which produces all prime
implicants of an arbitrary function expressed in DNF. Their algorithm is a gener-
alization of the Boolean consensus method (see Chapter 3). It is also analogous
to the consensus procedure for discrete functions described by Davio, Deschamps
and Thayse [259].

13.3 Extensions of pseudo-Boolean functions


We denote by U n the “solid” hypercube U n = [0, 1]n spanned by B n .

Definition 13.6. A (continuous) extension of the pseudo-Boolean function f :


B n → R is a function g : U n → R which coincides with f at the vertices of the
hypercube, meaning that

f (X) = g(X) for all X ∈ Bn .

Remark. The term “extension” was used with a different meaning in Chapter 12,
where it applied to partially defined Boolean functions. On the other hand, the
qualifier “continuous” is somewhat ambiguous in Definition 13.6, since this def-
inition does not require that extensions be continuous in the standard sense for
functions of real variables (namely, with respect to the Euclidean topology of Rn );
the word “continuous” only reminds us here that extensions are defined over a
nondiscrete domain. Therefore, we generally use the short terminology “exten-
sion” in this chapter; this should hopefully cause no confusion. 

Extensions of pseudo-Boolean functions find applications in optimization (see
Section 13.4 below), in reliability theory (see Section 1.13.4), in game theory
(see, e.g., Alonso-Meijide et al. [18], Owen [719, 720]), or in multicriteria
decision-making, as illustrated by the next example.

Application 13.9. (Multicriteria decision making.) Suppose that, in a particular


decision problem, n relevant criteria c1 , c2 , . . . , cn are defined and take value on a
continuous [0, 1] scale. Thus, ci (a) indicates the evaluation of a particular action
a according to criterion ci , and (c1 (a), . . . , cn (a)) ∈ U n .
A pseudo-Boolean function $f$ on $\mathcal{B}^n$ can be used to model the importance of
each subset of criteria: namely, for each $X \in \mathcal{B}^n$, the value $f(X)$ indicates the
importance of the subset of criteria $\{c_i \mid x_i = 1\}$. Now, if $g$ is an extension of
$f$ on $U^n$, then $g(c_1(a), \ldots, c_n(a))$ can be interpreted as the global evaluation of
action $a$. For instance, if $w_i \ge 0$ ($i = 1, 2, \ldots, n$),
\[ f(X) = \sum_{i=1}^{n} w_i x_i \quad \text{for all } X \in \mathcal{B}^n, \]
and
\[ g(X) = \sum_{i=1}^{n} w_i x_i \quad \text{for all } X \in U^n, \]
then $w_i$ can be viewed as the “weight” of criterion $i$ in a simple additive weighting
scheme.
Other classes of pseudo-Boolean functions and extensions can be used
to model complex, nonlinear interactions among criteria (see for instance
[405, 407, 668]). 

Of course, every pseudo-Boolean function f has infinitely many extensions.


We now discuss some classes of extensions which have proved to be of special
interest in various settings.

13.3.1 The polynomial extension


Definition 13.7. When viewed as a mapping on $U^n$, the multilinear polynomial
expression
\[ \sum_{A \in \mathcal{P}(N)} c(A) \prod_{i \in A} x_i \tag{13.25} \]
of a pseudo-Boolean function $f$ defines an extension of $f$ that we call its
polynomial extension and that we denote by $f^{pol}$.

In game theory, f pol is frequently called the multilinear extension of f ; see


Owen [719, 720].
More generally, if $f$ is represented by the PBNF
\[ \psi(x_1, x_2, \ldots, x_n) = b_0 + \sum_{k=1}^{m} b_k \Bigl( \prod_{i \in A_k} x_i \Bigr) \Bigl( \prod_{j \in B_k} \bar{x}_j \Bigr), \tag{13.26} \]
then the expression
\[ \hat{\psi}(x_1, x_2, \ldots, x_n) = b_0 + \sum_{k=1}^{m} b_k \Bigl( \prod_{i \in A_k} x_i \Bigr) \Bigl( \prod_{j \in B_k} (1 - x_j) \Bigr) \tag{13.27} \]
provides an alternative representation of the polynomial extension $f^{pol}$. This easily
follows from the observation that, if we expand all products in (13.27), then we
obtain a polynomial, which, in view of Theorem 13.1, necessarily coincides with
the multilinear polynomial expression of $f$.
Example 13.3. Consider again the pseudo-Boolean function $f$ introduced in
Example 13.1, which can be represented by either of the expressions
\[ \varphi = 3 + x - 3y - 2z - 6xy + 13xyz \]
or
\[ \psi_1 = -8 + x + 6\bar{x} + 3\bar{y} + 2\bar{z} + 6x\bar{y} + 13xyz. \]
The expression
\[ \hat{\psi}_1 = -8 + x + 6(1 - x) + 3(1 - y) + 2(1 - z) + 6x(1 - y) + 13xyz \]
represents the extension $f^{pol} = 3 + x - 3y - 2z - 6xy + 13xyz$ on $U^3$. □

The polynomial extension of f admits an interesting probabilistic interpretation.


Theorem 13.8. Let f be a pseudo-Boolean function on Bn . Assume that
x1 , x2 , . . . , xn are independent Bernoulli random variables, where xi takes value 1
with probability pi and value 0 with probability 1 − pi . Then, the expected value
of f is equal to f pol (p1 , p2 , . . . , pn ).
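Proof. Let $f$ be given by (13.25) and denote by $E[u]$ the expectation of a random
variable $u$. Then,
\begin{align*}
E[f(x_1, x_2, \ldots, x_n)] &= \sum_{A \in \mathcal{P}(N)} c(A)\, E\Bigl[ \prod_{i \in A} x_i \Bigr] \\
&= \sum_{A \in \mathcal{P}(N)} c(A)\, \mathrm{Prob}\Bigl[ \prod_{i \in A} x_i = 1 \Bigr] \\
&= \sum_{A \in \mathcal{P}(N)} c(A) \prod_{i \in A} p_i \\
&= f^{pol}(p_1, p_2, \ldots, p_n).
\end{align*}
□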


Example 13.4. In Example 13.3, if each variable takes value 0 or 1 with probability
$\tfrac{1}{2}$, then the expected value of $f$ is
$f^{pol}(\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}) = \tfrac{9}{8}$. □
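The value 9/8 in Example 13.4 can be reproduced by comparing a direct enumeration of the expectation with an evaluation of the polynomial extension. This is a small sketch using exact rational arithmetic; the helper name `expected_value` is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product


def f(x, y, z):
    # Multilinear polynomial of Example 13.3; on fractional arguments it is f^pol.
    return 3 + x - 3 * y - 2 * z - 6 * x * y + 13 * x * y * z


def expected_value(func, probs):
    """E[func] when variable i is an independent Bernoulli(probs[i])."""
    total = Fraction(0)
    for point in product((0, 1), repeat=len(probs)):
        weight = Fraction(1)
        for bit, p in zip(point, probs):
            weight *= p if bit == 1 else 1 - p
        total += weight * func(*point)
    return total


half = Fraction(1, 2)
probs = (half, half, half)
print(expected_value(f, probs), f(*probs))
```

Both quantities agree for any choice of probabilities, as Theorem 13.8 states, because the polynomial is multilinear and the variables are independent.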

In the special case where f is a Boolean function, Theorem 13.8 has already
been anticipated in our discussion of reliability theory, in Section 1.13.4 of
Chapter 1. In this framework, the polynomial extension f pol corresponds to the
so-called reliability polynomial; see for instance Colbourn [205, 206], Ramamurthy
[777].

13.3.2 Concave and convex extensions


Every pseudo-Boolean function $f$ admits various concave and convex extensions
that have been frequently examined in the optimization literature. A simple way
to demonstrate the existence of such extensions is to observe that the piecewise
linear representation (13.18) defines a concave real-valued function on $\mathbb{R}^n$, as the
pointwise minimum of linear functions. Also, the function $g$ defined by
\[ g(x_1, x_2, \ldots, x_n) = f^{pol}(x_1, x_2, \ldots, x_n) + M \sum_{j=1}^{n} x_j (1 - x_j) \quad \text{for all } (x_1, x_2, \ldots, x_n) \in U^n \]
is an extension of $f$ and is concave (respectively, convex) when $M$ is a large
enough positive (respectively, negative) number (for a quadratic function $f$, this
was observed by Hammer and Rubin [459]; the general case was considered by
Giannessi and Niccolucci [378] and by Kalantari and Rosen [544]).
The concave envelope of f , denoted f env , is defined as the pointwise minimum
of all concave extensions of f :

f env (X) = min { g(X) | g is a concave extension of f } for all X ∈ U n .

Note that f env is concave on U n , as pointwise minimum of concave functions, and


that it can be viewed as the smallest concave extension of f . The convex envelope
of f would be similarly defined.
Another class of concave extensions has been introduced in Crama [227].
Suppose again that $f$ is represented by the PBNF
\[ \psi(x_1, x_2, \ldots, x_n) = b_0 + \sum_{k=1}^{m} b_k \Bigl( \prod_{i \in A_k} x_i \Bigr) \Bigl( \prod_{j \in B_k} \bar{x}_j \Bigr), \tag{13.28} \]
where $b_0, b_1, \ldots, b_m \in \mathbb{R}$, $A_k \cap B_k = \emptyset$, and $A_k \cup B_k \ne \emptyset$ for $k = 1, 2, \ldots, m$.
Then the function
\[ \psi^{std}(x_1, x_2, \ldots, x_n) = b_0 + \sum_{k=1}^{m} b_k\, g_k(x_1, x_2, \ldots, x_n) \quad \text{for all } (x_1, x_2, \ldots, x_n) \in U^n, \tag{13.29} \]
where
\[ g_k(x_1, x_2, \ldots, x_n) = \begin{cases} \min\bigl( \min(x_i \mid i \in A_k),\ \min(1 - x_j \mid j \in B_k) \bigr) & \text{if } b_k > 0, \\[4pt] \max\bigl( 0,\ 1 - |A_k| + \sum_{i \in A_k} x_i - \sum_{j \in B_k} x_j \bigr) & \text{if } b_k < 0, \end{cases} \]
is an extension of $f$ and is concave: Indeed,
$g_k(X) = ( \prod_{i \in A_k} x_i ) ( \prod_{j \in B_k} \bar{x}_j )$ for
all $X \in \mathcal{B}^n$, and each of the functions $g_k$ is concave (respectively, convex) for
$b_k > 0$ (respectively, for $b_k < 0$). In [227], the function $\psi^{std}$ is called the standard
extension of $f$ associated with the PBNF $\psi$.
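The definition of the standard extension translates directly into code. The sketch below is illustrative only: it stores a PBNF as a constant `b0` plus a list of hypothetical triples `(b_k, A_k, B_k)` with 0-based indices, and it uses the polynomial expression of Example 13.3 (viewed as a PBNF without complemented variables) as test data, so that both branches of the definition of g_k are exercised.

```python
from fractions import Fraction
from itertools import product


def g_k(b, A, B, x):
    """Piecewise-linear replacement of the k-th term, following (13.29)."""
    if b > 0:
        return min([x[i] for i in A] + [1 - x[j] for j in B])
    return max(0, 1 - len(A) + sum(x[i] for i in A) - sum(x[j] for j in B))


def psi_std(b0, terms, x):
    """Standard extension associated with the PBNF (b0, terms)."""
    return b0 + sum(b * g_k(b, A, B, x) for b, A, B in terms)


def psi(b0, terms, x):
    """Value of the PBNF itself at a 0-1 point x."""
    total = b0
    for b, A, B in terms:
        value = b
        for i in A:
            value *= x[i]
        for j in B:
            value *= 1 - x[j]
        total += value
    return total


# 3 + x - 3y - 2z - 6xy + 13xyz with x, y, z at indices 0, 1, 2
B0 = 3
TERMS = [(1, [0], []), (-3, [1], []), (-2, [2], []),
         (-6, [0, 1], []), (13, [0, 1, 2], [])]

vertices = list(product((0, 1), repeat=3))
agrees = all(psi_std(B0, TERMS, v) == psi(B0, TERMS, v) for v in vertices)
centre = psi_std(B0, TERMS, (Fraction(1, 2),) * 3)
print(agrees, centre)
```

Evaluating the extension at midpoints of pairs of vertices gives a quick, though not exhaustive, sanity check of concavity.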
The following facts will be useful:
Lemma 13.2. Consider the PBNF ψ in (13.28). For k = 1, 2, . . . , m, let Hk denote
the polyhedron
Hk = { (X, y) ∈ U n+1 | y ≤ gk (X) } if bk > 0
and
Hk = { (X, y) ∈ U n+1 | y ≥ gk (X) } if bk < 0.
All vertices of Hk are in B n+1 , that is, they only have 0–1 components.
Proof. The claim follows from the fact that, in both cases, the system of inequalities
defining Hk is totally unimodular; this follows from Theorem 5.13 in Chapter 5
for the case where bk is positive (see [47, 474, 786]); the other case is easily estab-
lished by direct arguments. 

The next lemma is found in Crama [227] (see also Hammer and Kalantari [445]
and Hammer and Simeone [463]).
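Lemma 13.3. Consider the PBNF $\psi$ in (13.28). If $\psi$ consists of a single nonconstant
term, that is, if $\psi = b_1 ( \prod_{i \in A_1} x_i ) ( \prod_{j \in B_1} \bar{x}_j )$, then its standard extension
$\psi^{std}$ and its concave envelope $\psi^{env}$ coincide on $U^n$:
\[ \psi^{env}(X) = \psi^{std}(X) = b_1 g_1(X) \quad \text{for all } X \in U^n. \]
Proof. Since $\psi^{std}$ is concave, $\psi^{env} \le \psi^{std}$ on $U^n$. To establish the reverse inequality,
let $X^* \in U^n$. Since the point $(X^*, g_1(X^*))$ is in $H_1$, it is a convex combination
of vertices of $H_1$: That is, there exists a collection of 0–1 points $(X^r, y^r) \in H_1$
and of positive scalars $\lambda_r$ ($r \in R$) such that
$(X^*, g_1(X^*)) = \sum_{r \in R} \lambda_r (X^r, y^r)$ and $\sum_{r \in R} \lambda_r = 1$. Hence,
\begin{align*}
\psi^{std}(X^*) &= b_1 g_1(X^*) \\
&= \sum_{r \in R} \lambda_r b_1 y^r \\
&\le \sum_{r \in R} \lambda_r b_1 g_1(X^r) && \text{(since } (X^r, y^r) \in H_1\text{)} \\
&= \sum_{r \in R} \lambda_r \psi(X^r) && \text{(since } X^r \in \mathcal{B}^n \text{ by Lemma 13.2)} \\
&= \sum_{r \in R} \lambda_r \psi^{env}(X^r) && \text{(since } X^r \in \mathcal{B}^n\text{)} \\
&\le \psi^{env}\Bigl( \sum_{r \in R} \lambda_r X^r \Bigr) && \text{(by concavity of } \psi^{env}\text{)} \\
&= \psi^{env}(X^*).
\end{align*}
□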


Let us now introduce yet another class of concave extensions associated
with the PBNF $\psi$. For $k = 1, 2, \ldots, m$, let $p_k$ be any linear function such that
$b_k ( \prod_{i \in A_k} x_i ) ( \prod_{j \in B_k} \bar{x}_j ) \le p_k(X)$ for all $X \in \mathcal{B}^n$. Then, the linear function
\[ p(X) = b_0 + \sum_{k=1}^{m} p_k(X) \quad \text{for all } X \in U^n \tag{13.30} \]
is called a paved upper-plane of $\psi$ (and of the function $f$ represented by $\psi$).
Clearly, a paved upper-plane is a linear majorant of $f$. Let now $P$ denote the set
of all paved upper-planes of $f$. The paved upper-plane extension of $f$ associated
with the PBNF $\psi$ is the function $\psi^{pup}$ defined by
\[ \psi^{pup}(X) = \min_{p \in P} p(X) \quad \text{for all } X \in U^n. \tag{13.31} \]

Our next result shows that, in spite of their very different definitions, $\psi^{std}$ and
$\psi^{pup}$ turn out to be identical.
Theorem 13.9. The standard extension $\psi^{std}$ and the paved upper-plane extension
$\psi^{pup}$ associated with the same PBNF $\psi$ coincide on $U^n$.
Proof. Let p(X) be a paved upper-plane of ψ given by (13.30). Since each term pk
(k = 1, 2, . . . , m) is a concave majorant of the corresponding term of ψ, it follows
from Lemma 13.3 that bk gk (X) ≤ pk (X), and hence, ψ std (X) ≤ p(X) for all
X ∈ U n . So, ψ std ≤ ψ pup on U n .
To see that $\psi^{pup} \le \psi^{std}$ on $U^n$, fix $X^* \in U^n$ and consider the paved upper-plane
$p(X)$ given by (13.30), where for each $k = 1, 2, \ldots, m$:
(a) $p_k = b_k x_i$ if $b_k > 0$ and $g_k(X^*) = x_i^*$, $i \in A_k$;
(b) $p_k = b_k (1 - x_j)$ if $b_k > 0$ and $g_k(X^*) = 1 - x_j^*$, $j \in B_k$;
(c) $p_k = 0$ if $b_k < 0$ and $g_k(X^*) = 0$;
(d) $p_k = b_k (1 - |A_k| + \sum_{i \in A_k} x_i - \sum_{j \in B_k} x_j)$ if $b_k < 0$ and $g_k(X^*) > 0$.
(Apply an arbitrary tie-breaking rule to select the indices $i$ and $j$ if either (a) or
(b) is ambiguous.) This construction is such that $p(X^*) = \psi^{std}(X^*)$, and hence,
$\psi^{pup}(X^*) \le \psi^{std}(X^*)$. □

Theorem 13.9 is due to Crama [227]. It generalizes a sequence of previous


results by Hammer, Hansen, and Simeone [440]; Hansen, Lu, and Simeone [471];
Adams and Dearing [6], and so on, showing that the maximum of ψ std and the
maximum of ψ pup coincide on U n .

13.3.3 The Lovász extension


Consider again a pseudo-Boolean function $f$ on $\mathcal{B}^n$ and its polynomial expression
\[ f(X) = \sum_{A \in \mathcal{P}(N)} c(A) \prod_{i \in A} x_i . \tag{13.32} \]
In this section, we assume for simplicity of notation that $f(0, 0, \ldots, 0) = 0$, that
is, $c(\emptyset) = 0$.
Definition 13.8. The Lovász extension of $f$ is the extension $f^L$ defined by
\[ f^L(X) = \sum_{A \in \mathcal{P}(N)} c(A) \min_{i \in A} x_i \quad \text{for all } (x_1, x_2, \ldots, x_n) \in U^n. \tag{13.33} \]

This extension was introduced by Lovász in [624]; see also [625]. Observe
that, if c(A) ≥ 0 for all A ⊆ {1, 2, . . . , n} such that |A| ≥ 2, then f L coincides with
the standard extension associated with the polynomial representation of f , and
it is concave. In general, however, f L is neither concave nor convex on U n , as
illustrated by the next example.
Example 13.5. The Lovász extension of $f(x, y, z) = xy - xz$ is the function
$f^L = \min(x, y) - \min(x, z)$, which is neither concave nor convex on $U^3$ since
\[ \tfrac{1}{2} = \tfrac{1}{2} f^L(1, 1, 0) + \tfrac{1}{2} f^L(0, 1, 1) > f^L(\tfrac{1}{2}, 1, \tfrac{1}{2}) = 0, \]
and
\[ -\tfrac{1}{2} = \tfrac{1}{2} f^L(1, 0, 1) + \tfrac{1}{2} f^L(0, 1, 1) < f^L(\tfrac{1}{2}, \tfrac{1}{2}, 1) = 0. \]
□
The following discussion provides a different perspective on the Lovász extension.
For a set $A \subseteq \{1, 2, \ldots, n\}$, $A \ne \emptyset$, denote by $m(A)$ the smallest element in $A$:
$m(A) = \min\{i \mid i \in A\}$. Let $S = \{X \in U^n \mid x_1 \le x_2 \le \ldots \le x_n\}$ and observe that
$S$ is a simplex, that is, $S$ is a full-dimensional convex bounded polyhedron with
$n + 1$ vertices. Its vertices are exactly the points $(0, 0, \ldots, 0, 0, 0)$, $(0, 0, \ldots, 0, 0, 1)$,
$(0, 0, \ldots, 0, 1, 1)$, $\ldots$, $(1, 1, \ldots, 1, 1, 1)$.
Consider now the restriction of $f^L$ to the simplex $S$. This function, which we
denote by $f^L_S$, is linear on $S$: Indeed, for all $X \in S$, Definition 13.8 yields
\[ f^L(X) = f^L_S(X) = \sum_{A \in \mathcal{P}(N)} c(A)\, x_{m(A)} . \]
Even more, since $f^L_S$ coincides with $f$ at the $n + 1$ vertices of $S$, it follows that
$f^L_S$ actually is the unique linear extension of $f$ on $S$.
This reasoning is easily generalized. For an arbitrary permutation $\pi$ of
$\{1, 2, \ldots, n\}$, let $S(\pi)$ be the simplex
$S(\pi) = \{X \in U^n \mid x_{\pi(1)} \le x_{\pi(2)} \le \ldots \le x_{\pi(n)}\}$
and let $f^L_{S(\pi)}$ be the restriction of $f^L$ to $S(\pi)$. Then, $f^L_{S(\pi)}$ is the unique linear
extension of $f$ on $S(\pi)$. Moreover, since the cube $U^n$ is covered by the family of
simplices
\[ \mathcal{S} = \{ S(\pi) \mid \pi \text{ is a permutation of } \{1, 2, \ldots, n\} \}, \]
it follows that $f^L$ is the unique extension of $f$ that is linear on every member of $\mathcal{S}$.
In order to obtain an analytical expression of the function $f^L_{S(\pi)}$, let us introduce
the following notation: For $1 \le k \le n$, let
\[ E^{\pi,k} = e_{\pi(k)} + e_{\pi(k+1)} + \ldots + e_{\pi(n)} . \]
We also let $E^{\pi,n+1} = (0, \ldots, 0)$, so that $E^{\pi,1}, E^{\pi,2}, \ldots, E^{\pi,n+1}$ are exactly the vertices
of the simplex $S(\pi)$.
Theorem 13.10. For every permutation $\pi$ of $\{1, 2, \ldots, n\}$ and for every $X \in S(\pi)$,
\[ f^L_{S(\pi)}(X) = \sum_{k=1}^{n} (x_{\pi(k)} - x_{\pi(k-1)})\, f(E^{\pi,k}), \tag{13.34} \]
where $x_{\pi(0)} = 0$ by convention.
Proof. Since the right-hand side of (13.34) defines a linear function, it suffices to
verify that this function coincides with $f$ at every vertex of $S(\pi)$, which is true by
construction. □
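The two descriptions of the Lovász extension, the sum of minima in (13.33) and the sorted formula (13.34), can be compared on the function of Example 13.5. The sketch below is an illustration under assumptions: the polynomial is stored as a hypothetical dictionary `coeffs` mapping nonempty tuples of 0-based indices to the coefficients c(A), with c(∅) = 0 as assumed in this section.

```python
from fractions import Fraction


def f_value(coeffs, point):
    """Value of the polynomial sum_A c(A) prod_{i in A} x_i at a 0-1 point."""
    return sum(c for A, c in coeffs.items() if all(point[i] == 1 for i in A))


def lovasz_min_form(coeffs, x):
    """Formula (13.33): replace each product by a minimum."""
    return sum(c * min(x[i] for i in A) for A, c in coeffs.items())


def lovasz_sorted_form(coeffs, x):
    """Formula (13.34) on a simplex S(pi) containing x."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])  # pi(1), ..., pi(n)
    total = 0
    previous = 0  # plays the role of x_{pi(0)} = 0
    for k in range(n):
        vertex = [0] * n
        for i in order[k:]:  # E^{pi,k} has ones in positions pi(k), ..., pi(n)
            vertex[i] = 1
        total += (x[order[k]] - previous) * f_value(coeffs, vertex)
        previous = x[order[k]]
    return total


# f(x, y, z) = xy - xz from Example 13.5
COEFFS = {(0, 1): 1, (0, 2): -1}
half = Fraction(1, 2)
print(lovasz_min_form(COEFFS, (half, 1, half)),
      lovasz_sorted_form(COEFFS, (half, 1, half)))
```

Sorting the coordinates selects a permutation whose simplex contains the point; ties may be broken arbitrarily because the corresponding terms of (13.34) vanish.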

Equation (13.34) leads to the definition of f L originally proposed by Lovász in


[624, 625] (see also the end-of-chapter exercises). As observed by Singer [836],
this approach to the construction of extensions can be further generalized by
considering different coverings of U n by collections of simplices.

13.4 Pseudo-Boolean optimization


We refer to the optimization of pseudo-Boolean functions over subsets of B n =
{0, 1}n as pseudo-Boolean optimization or nonlinear 0–1 optimization. This important
field of research was popularized by Hammer and Rudeanu [460], and is
surveyed in [127, 469]. We mostly restrict ourselves here to a discussion of the
unconstrained maximization problem
maximize f (X) subject to X ∈ Bn , (13.35)
and we only mention a few fundamental results about it.

Remark. Some authors have recently started to use the term “pseudo-Boolean
optimization problems” to designate 0–1 linear programming problems of the
form (13.4)–(13.6), possibly subject to inequality constraints; see Eén and Sörens-
son [290]; Manquinho and Roussel [667], and so on. This usage is likely to create
confusion with the classically accepted definition of pseudo-Boolean optimization
problems, and we do not encourage it. 

Observe that the unconstrained problem (13.35) is NP-hard even when $f$ is
quadratic, since it subsumes several hard combinatorial problems, like max-cut,
weighted stability, Max 2-Sat, or 0–1 linear programming (see Section 13.1). We
return to quadratic optimization in Section 13.6.1. On the other hand, problem
(13.35) turns out to be easy when $f$ is linear: Indeed, if
\[ f(X) = \sum_{i=1}^{n} w_i x_i , \]
then the maximum of $f$ is attained at any point $X^* \in \mathcal{B}^n$ such that
\[ x_i^* = 1 \text{ when } w_i > 0, \qquad x_i^* = 0 \text{ when } w_i < 0. \tag{13.36} \]

13.4.1 Local optima


We start with a few definitions.
Definition 13.9. Two points $X^*, Y^* \in \mathcal{B}^n$ are neighbors if they differ in exactly one
component, that is, if they correspond to adjacent vertices of the unit hypercube.
If $f$ is a pseudo-Boolean function on $\mathcal{B}^n$, then $X^* \in \mathcal{B}^n$ is a local maximum of $f$
if
\[ f(X^*) \ge f(Y^*) \quad \text{for all neighbors } Y^* \text{ of } X^*. \]
Definition 13.10. For $i = 1, 2, \ldots, n$, the $i$-th derivative of $f$ is the pseudo-Boolean
function
\[ J_i f = f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n) - f(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_n). \tag{13.37} \]

Since $J_i f$ does not depend on $x_i$, we may want to look at it as a function
on $\mathcal{B}^n$ or on $\mathcal{B}^{n-1}$, as the context requires. It is easy to check that if the (unique)
polynomial expression of $f$ is written as
\[ f(x_1, x_2, \ldots, x_n) = x_i\, g(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) + h(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n), \tag{13.38} \]
where the polynomials $g$ and $h$ do not depend on $x_i$, then $g$ is the (unique) polynomial
expression of $J_i f$. In other words, the polynomial expression of $J_i f$ is
obtained by writing the partial derivative $\partial f / \partial x_i$ of the polynomial expression of $f$
with respect to $x_i$.
Fortet [343] and Hammer and Rudeanu [460] observed that the local maxima of
a function are characterized by a system of implications involving its derivatives
(compare with (13.36)).

Theorem 13.11. If $f$ is a pseudo-Boolean function on $\mathcal{B}^n$, then $X^* \in \mathcal{B}^n$ is a local
maximum of $f$ if and only if the following conditions hold for $i = 1, 2, \ldots, n$:
\[ x_i^* = 1 \text{ when } J_i f(X^*) > 0, \qquad x_i^* = 0 \text{ when } J_i f(X^*) < 0. \tag{13.39} \]
Proof. This is easily derived from (13.37) or from (13.38). □

Let now Mi be an arbitrary upper bound on |Ji f | (for instance, the sum of the
absolute values of all coefficients in the polynomial representation of Ji f ). Then,
it is easily seen that an equivalent characterization of the local maxima of f is
given by the system of inequalities

Mi (xi − 1) ≤ Ji f ≤ Mi xi , for i = 1, 2, . . . , n. (13.40)

Thus, in principle, a local maximum of f could be obtained by finding a 0–1


solution of the system (13.40). This may be a difficult task in itself. It should be
observed, however, that the system (13.40) is linear when f is quadratic and that it
may lend itself to an easier treatment in this special case.
A local maximum of f can be found by any simple local search procedure
starting from an arbitrary 0–1 point and moving from neighbor to neighbor as long
as this improves the value of the function. Such algorithms tend to work very fast
in practice; see, for instance, Boros, Hammer, and Tavares [136]; Boykov, Veksler,

and Zabih [147]; Davoine, Hammer, and Vizvári [262]; Hansen and Jaumard [468];
Hvattum, Løkketangen, and Glover [513]; Lodi, Allemand, and Liebling [620],
Merz and Freisleben [681], and so on.
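A bare-bones version of such a local search, together with the derivative test of Theorem 13.11, might look as follows. This is a minimal sketch and not one of the cited implementations: the function is treated as a black box, the test function is the one from Example 13.3, and the helper names are assumptions.

```python
def f(x, y, z):
    # Function of Example 13.3, used here only as test data.
    return 3 + x - 3 * y - 2 * z - 6 * x * y + 13 * x * y * z


def derivative(func, i, x):
    """i-th derivative (13.37) evaluated at the 0-1 point x (a tuple)."""
    with_one = x[:i] + (1,) + x[i + 1:]
    with_zero = x[:i] + (0,) + x[i + 1:]
    return func(*with_one) - func(*with_zero)


def is_local_max(func, x):
    """Check the implications (13.39) at x."""
    for i in range(len(x)):
        d = derivative(func, i, x)
        if (d > 0 and x[i] != 1) or (d < 0 and x[i] != 0):
            return False
    return True


def local_search(func, start):
    """Move to an improving neighbor until no single flip increases func."""
    x = tuple(start)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            if func(*y) > func(*x):
                x = y
                improved = True
    return x


print(local_search(f, (0, 0, 0)), local_search(f, (1, 1, 1)))
```

On this example the search started at (0, 0, 0) stops at the local maximum (1, 0, 0) with value 4, whereas the global maximum 6 is attained at (1, 1, 1), which illustrates that a local maximum need not be global.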
From a theoretical perspective, however, things are not so nice. Indeed, it can
be shown that in order to find a local maximum of a pseudo-Boolean function
of n variables, such local search procedures may require a number of steps that
grows exponentially with n (see Emamy-K. [311]; Hammer, Simeone, Liebling,
and de Werra [464]; Hoke [496]; Tovey [866, 867, 868] for related investigations)
or with the encoding size of the polynomial expression of f (see Schäffer and Yan-
nakakis [806]). Moreover, Schäffer and Yannakakis [806] proved that computing
a local maximum of a quadratic pseudo-Boolean function belongs to a class of
hard (so-called PLS-complete), and likely intractable, local search problems (see
also Pardalos and Jha [730]).
Finally, it should be observed that the value of f may be arbitrarily worse in a
local maximum of f than in its global maximum (see Exercise 5 at the end of the
chapter).

13.4.2 An elimination algorithm for global optimization


Hammer, Rosenberg and Rudeanu [458, 460] described a combinatorial variable
elimination algorithm that finds a global maximum of a pseudo-Boolean func-
tion. The following streamlined version and an efficient implementation of this
algorithm have been proposed by Crama, Hansen, and Jaumard [235].
Let $f_0(x_1, x_2, \ldots, x_n)$ be the function to be maximized. We can write
\[ f_0(x_1, x_2, \ldots, x_n) = x_1 J_1(x_2, x_3, \ldots, x_n) + h(x_2, x_3, \ldots, x_n), \]
where $J_1$ and $h$ do not depend on $x_1$. As a slight extension of Theorem 13.11, it
is easy to see that there exists a global maximum of $f_0$, say $(x_1^*, x_2^*, \ldots, x_n^*)$, with the
property that
\[ x_1^* = 1 \text{ if and only if } J_1(x_2^*, x_3^*, \ldots, x_n^*) > 0. \tag{13.41} \]
This observation suggests a function $t_1(x_2, x_3, \ldots, x_n)$ defined as follows:
\[ t_1(x_2, x_3, \ldots, x_n) = \begin{cases} J_1(x_2, x_3, \ldots, x_n) & \text{if } J_1(x_2, x_3, \ldots, x_n) > 0, \\ 0 & \text{otherwise.} \end{cases} \tag{13.42} \]
Then, setting $f_1 = t_1 + h$, we have reduced the maximization of the original function
$f_0$ in $n$ variables to the maximization of $f_1$, which only depends on $n - 1$ variables:
Indeed, if $(x_2^*, x_3^*, \ldots, x_n^*)$ is a maximum of $f_1$, then setting $x_1^*$ to either 0 or 1
according to rule (13.41) yields a maximum of $f_0$.
Repeating this elimination process n times produces a sequence of pseudo-Boolean
functions f0 , f1 , . . . , fn , where fi depends on n − i variables, and
eventually allows us to determine a (global) maximum of f0 by backtracking. (Note
the analogy with the elimination techniques for the solution of Boolean equations
presented in Chapter 2, Section 2.6, which originally inspired the development of
this procedure.)
Assuming that f0 is given in pseudo-Boolean normal form (13.15), the expen-
sive step in the elimination process is to deduce a PBNF of fi+1 from a PBNF
of fi , for i = 0, 1, . . . , n − 1. An efficient implementation of this step has been
proposed in [235], where it is also proved that the elimination algorithm runs in
polynomial time for a special class of pseudo-Boolean functions associated with
graphs of bounded tree-width.
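The recursion behind the elimination algorithm can be sketched on explicit value tables. This is deliberately not the efficient implementation of [235], which manipulates PBNFs: the version below stores each function f_i as a dictionary over all 0-1 assignments of its remaining variables, so it takes exponential time and only illustrates the elimination and backtracking steps. The name `eliminate` is a hypothetical helper.

```python
from itertools import product


def f(x, y, z):
    # Function of Example 13.3, used here only as test data.
    return 3 + x - 3 * y - 2 * z - 6 * x * y + 13 * x * y * z


def eliminate(func, n):
    """Return (maximum value, maximizer) of func over {0,1}^n."""
    # table maps the remaining variables (x_{i+1}, ..., x_n) to f_i
    table = {p: func(*p) for p in product((0, 1), repeat=n)}
    derivatives = []  # derivatives[i] is J_{i+1} as a table over the rest
    for _ in range(n):
        rests = {p[1:] for p in table}
        J = {r: table[(1,) + r] - table[(0,) + r] for r in rests}
        h = {r: table[(0,) + r] for r in rests}
        # f_{i+1} = t + h with t = max(J, 0), as in (13.42)
        table = {r: max(J[r], 0) + h[r] for r in rests}
        derivatives.append(J)
    best_value = table[()]
    # Backtracking: fix the last variable first, then apply rule (13.41).
    rest = ()
    for J in reversed(derivatives):
        rest = ((1,) if J[rest] > 0 else (0,)) + rest
    return best_value, rest


print(eliminate(f, 3))
```

For the function of Example 13.3, the procedure returns the value 6 together with the maximizer (1, 1, 1), in agreement with a direct enumeration of the eight points.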

13.4.3 Extensions and relaxations


If g is an arbitrary extension of the pseudo-Boolean function f over the cube
U n = [0, 1]n , then maxX∈U n g(X) is an upper bound for maxX∈Bn f (X). We now
examine some properties of this bound for different families of extensions.

The polynomial extension


As observed by Rosenberg [789], the multilinear polynomial extension f pol has
the attractive feature that its maximum is attained at a vertex of the hypercube
[0, 1]n and hence, that this maximum coincides with the maximum of f .

Theorem 13.12. For every pseudo-Boolean function $f$ on $\mathcal{B}^n$,
\[ \max_{X \in \mathcal{B}^n} f(X) = \max_{X \in U^n} f^{pol}(X). \]
Proof. Let $X^*$ denote a maximizer of $f^{pol}$ on $U^n$ and consider an arbitrary index
$i \in \{1, 2, \ldots, n\}$. Write $f^{pol}$ as
\[ f^{pol}(x_1, x_2, \ldots, x_n) = x_i\, g(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) + h(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n), \tag{13.43} \]
where the polynomials $g$ and $h$ do not depend on $x_i$. The function
\[ p(x_i) = x_i\, g(x_1^*, \ldots, x_{i-1}^*, x_{i+1}^*, \ldots, x_n^*) + h(x_1^*, \ldots, x_{i-1}^*, x_{i+1}^*, \ldots, x_n^*) \]
is linear in $x_i$, so that the maximum of $p(x_i)$ over $U = [0, 1]$ is attained when $x_i = 0$
or when $x_i = 1$. Hence, if $0 < x_i^* < 1$, we can replace $x_i^*$ by a 0–1 value without
changing the value of $f^{pol}$. □
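The argument in the proof suggests a simple coordinate-by-coordinate rounding scheme that never decreases the value of the polynomial extension. The sketch below assumes that f^pol is available as a callable multilinear polynomial (here the one of Example 13.3) and uses exact fractions; `round_to_vertex` is a hypothetical name.

```python
from fractions import Fraction


def f_pol(x, y, z):
    # Multilinear polynomial of Example 13.3, evaluated on [0, 1]^3.
    return 3 + x - 3 * y - 2 * z - 6 * x * y + 13 * x * y * z


def round_to_vertex(poly, point):
    """Replace fractional coordinates one at a time by the better of 0 and 1."""
    x = list(point)
    for i in range(len(x)):
        if x[i] in (0, 1):
            continue
        with_one = x[:i] + [1] + x[i + 1:]
        with_zero = x[:i] + [0] + x[i + 1:]
        # poly is linear in x_i, so one of the two endpoints is at least as good
        x = with_one if poly(*with_one) >= poly(*with_zero) else with_zero
    return tuple(x)


start = (Fraction(1, 2), Fraction(1, 2), Fraction(1, 2))
vertex = round_to_vertex(f_pol, start)
print(vertex, f_pol(*start), f_pol(*vertex))
```

Starting from the point (1/2, 1/2, 1/2), where the extension takes value 9/8, this procedure reaches the vertex (1, 0, 0) with value 4. The result is a 0-1 point at least as good as the starting point, not necessarily a global maximizer.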

Note that Theorem 13.12 can alternatively be viewed as a corollary of Theo-


rem 13.8: Indeed, for every point (p1 , p2 , . . . , pn ) ∈ U n , f pol (p1 , p2 , . . . , pn ) is
the expected value of f with respect to an appropriate probability distribution
on Bn ; hence, by well-known properties of the expectation, minX∈Bn f (X) ≤
f pol (p1 , p2 , . . . , pn ) ≤ maxX∈Bn f (X).
The proof of Theorem 13.12 actually implies that “rounding” a fractional point to a
“better” 0–1 point can be performed efficiently. This result was already anticipated
in earlier chapters of the book (see, e.g., Theorems 2.26, 2.27, and 2.28 in
Section 2.11.4), and was put to systematic use in Boros and Hammer [127], Boros
and Prékopa [145], and so on.
Theorem 13.12 also suggests that continuous global optimization techniques
can be applied to f pol to compute the maximum of f . This approach has not
proved computationally efficient in past experiments, but it remains conceptually
valuable.

Linearization and concave extensions


A classical approach to pseudo-Boolean optimization consists in transforming the
problem max{f (X) : X ∈ {0, 1}n } into an equivalent linear 0–1 programming prob-
lem by substituting a variable yk for the kth monomial Tk of a PBNF representation,
and by setting up a collection of linear constraints that enforce the equality yk = Tk .
More precisely, the following result can be traced to papers by Dantzig [256], Fortet
[342, 343], and Glover and Woolsey [387]; see Hansen, Jaumard, Mathon [469]
for additional references.
Theorem 13.13. If the pseudo-Boolean function $f$ is represented by the PBNF
\[ \psi(x_1, x_2, \ldots, x_n) = b_0 + \sum_{k=1}^{m} b_k \Bigl( \prod_{i \in A_k} x_i \Bigr) \Bigl( \prod_{j \in B_k} \bar{x}_j \Bigr), \tag{13.44} \]
where $b_0, b_1, \ldots, b_m \in \mathbb{R}$, $A_k \cap B_k = \emptyset$, and $A_k \cup B_k \ne \emptyset$ for $k = 1, 2, \ldots, m$, then the
maximum of $f$ over $\mathcal{B}^n$ is equal to the optimal value of the 0–1 linear programming
problem
\begin{align*}
\text{maximize} \quad & b_0 + \sum_{k=1}^{m} b_k y_k \tag{13.45} \\
\text{subject to} \quad & y_k \le x_i, && i \in A_k,\ k = 1, 2, \ldots, m,\ b_k > 0; \tag{13.46} \\
& y_k \le 1 - x_j, && j \in B_k,\ k = 1, 2, \ldots, m,\ b_k > 0; \tag{13.47} \\
& 1 - |A_k| + \sum_{i \in A_k} x_i - \sum_{j \in B_k} x_j \le y_k, && k = 1, 2, \ldots, m,\ b_k < 0; \tag{13.48} \\
& x_i \in \{0, 1\}, && i = 1, 2, \ldots, n; \tag{13.49} \\
& y_k \in \{0, 1\}, && k = 1, 2, \ldots, m. \tag{13.50}
\end{align*}
Proof. In every optimal solution $(X^*, Y^*) \in \{0, 1\}^{n+m}$ of (13.45)–(13.50), variable
$y_k$ takes value 1 if and only if $( \prod_{i \in A_k} x_i^* ) ( \prod_{j \in B_k} \bar{x}_j^* ) = 1$. □
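On a tiny instance, the equivalence stated in Theorem 13.13 can be verified by enumerating all 0-1 vectors (X, Y) that satisfy constraints (13.46)–(13.48). The sketch below uses brute force in place of an integer programming solver, and it encodes a PBNF as a constant plus hypothetical triples `(b_k, A_k, B_k)` with 0-based indices; the test data is the polynomial of Example 13.3.

```python
from itertools import product


def feasible(terms, x, y):
    """Check constraints (13.46)-(13.48) for 0-1 vectors x and y."""
    for (b, A, B), yk in zip(terms, y):
        if b > 0:
            if any(yk > x[i] for i in A) or any(yk > 1 - x[j] for j in B):
                return False
        elif b < 0:
            lower = 1 - len(A) + sum(x[i] for i in A) - sum(x[j] for j in B)
            if lower > yk:
                return False
    return True


def linearized_max(b0, terms, n):
    """Optimal value of the 0-1 linear program (13.45)-(13.50), by enumeration."""
    best = None
    for x in product((0, 1), repeat=n):
        for y in product((0, 1), repeat=len(terms)):
            if feasible(terms, x, y):
                value = b0 + sum(b * yk for (b, _, _), yk in zip(terms, y))
                if best is None or value > best:
                    best = value
    return best


def direct_max(b0, terms, n):
    """Maximum of the PBNF itself over {0,1}^n."""
    def psi(x):
        total = b0
        for b, A, B in terms:
            if all(x[i] == 1 for i in A) and all(x[j] == 0 for j in B):
                total += b
        return total
    return max(psi(x) for x in product((0, 1), repeat=n))


# 3 + x - 3y - 2z - 6xy + 13xyz with x, y, z at indices 0, 1, 2
B0 = 3
TERMS = [(1, [0], []), (-3, [1], []), (-2, [2], []),
         (-6, [0, 1], []), (13, [0, 1, 2], [])]
print(linearized_max(B0, TERMS, 3), direct_max(B0, TERMS, 3))
```

Both computations return the same optimal value; in practice the enumeration over Y would of course be replaced by a proper integer programming algorithm.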
∗ ∗


This 0–1 linear model can be handled, in principle, by any algorithm for the solution
of integer programming problems. The analysis of its facial structure has
been initiated by Balas and Mazzola [44, 45]. Its continuous relaxation, meaning
the linear programming problem obtained after replacing the integrality requirements
(13.49) and (13.50) by the weaker constraints 0 ≤ xi ≤ 1 (i = 1, 2, . . . , n)
and 0 ≤ yk ≤ 1 (k = 1, 2, . . . , m), yields an easily computable upper bound W std
on the maximum of f ; this bound is the maximum over
U n of the concave standard extension ψ std introduced in Section 13.3. Properties
of the bound W std have been investigated by Hammer, Hansen, and Simeone [440]
and in a series of subsequent papers; see Crama [227] for a brief account and Section
13.6.1 for related considerations. Compare also with Theorem 2.26 and Theorem 2.28
in Section 2.11.4, where this relaxation was investigated in connection with the
Maximum Satisfiability problem.

The Lovász extension


An analog of Rosenberg’s Theorem 13.12 holds for the Lovász extension f L .

Theorem 13.14. For every pseudo-Boolean function $f$ on $\mathcal{B}^n$,
\[ \max_{X \in \mathcal{B}^n} f(X) = \max_{X \in U^n} f^L(X). \]
Proof. This follows from Theorem 13.10, which shows that the Lovász extension is
linear on every simplex $S(\pi)$: Hence, its maximum is necessarily attained at a
vertex of some simplex $S(\pi)$, that is, at a point of $\mathcal{B}^n$. □

13.4.4 Posiform transformations and conflict graphs


In view of Theorem 13.2, every pseudo-Boolean optimization problem can be
reduced to the optimization of a posiform
\[ \psi = b_0 + \sum_{k=1}^{m} b_k T_k = b_0 + \sum_{k=1}^{m} b_k \Bigl( \prod_{i \in A_k} x_i \Bigr) \Bigl( \prod_{j \in B_k} \bar{x}_j \Bigr), \tag{13.51} \]
where $A_k \cap B_k = \emptyset$, $A_k \cup B_k \ne \emptyset$, and $b_k > 0$ for $k = 1, 2, \ldots, m$. It turns out that
both the minimization and the maximization of posiforms have natural connections
with other fundamental combinatorial optimization problems.
First, Theorems 2.14 and 2.26 show that DNF equations and maximum sat-
isfiability problems are easily expressed as posiform minimization problems.
Conversely, a straightforward extension of Theorem 2.26 shows that every posi-
form minimization problem can be viewed as a maximum satisfiability problem:
Indeed, minimizing a posiform ψ precisely consists in finding a point X ∗ ∈ Bn
that cancels (or “satisfies") as many terms as possible in ψ.
In this minimization setting, a useful remark is that, if (13.51) is an arbitrary
posiform representation of a function f , then the free term b0 is a lower bound
on the global minimum of f (since the remaining terms are always nonnegative).
In fact, for any function f , there always exists a posiform such that the free
term b0 is exactly equal to minX∈Bn f (X) (we leave the proof of this claim as an
exercise for the reader). Approaches to pseudo-Boolean minimization based on this
observation have been developed for instance by Bourjolly, Hammer, Pulleyblank,
and Simeone [146] and Hammer, Hansen, and Simeone [440]. The idea here is to
“squeeze out” the highest possible constant b0 by successive transformations of a
posiform.
Let us now turn to the posiform maximization problem.As observed by Hammer
[437, 465], this problem bears a fruitful relation to the maximum weighted stability
problem described in Application 13.1. In order to discuss this relation, we first
define the concept of conflict graph (conflict graphs were introduced in a slightly
different framework in Chapter 5; see also [65, 69, 230, 461], etc.). Consider again
the posiform (13.51), and assume for simplicity that b0 = 0, as this assumption entails
no loss of generality. We say that two terms $T_k$ and $T_\ell$ conflict if $T_k T_\ell \equiv 0$ (that
is, if the same variable appears in both $T_k$ and $T_\ell$, once complemented and once
uncomplemented). Now, the conflict graph of ψ is the graph G(ψ) = (V , E),
where V = {1, 2, . . . , m}, and where $(k, \ell) \in E$ if and only if $T_k$ and $T_\ell$ conflict, for
$k, \ell \in V$. We say that $b_k$ is the weight of vertex k, for k = 1, 2, . . . , m. Finally, we
let α(ψ) denote the weight of a maximum weighted stable set in G(ψ):
$$\alpha(\psi) = \max \Bigl\{ \sum_{k \in S} b_k \;\Big|\; S \text{ is a stable set of } G(\psi) \Bigr\}.$$

Hammer [437, 465] proved:

Theorem 13.15. For every posiform ψ on Bn ,

$$\max_{X \in B^n} \psi(X) = \alpha(\psi).$$

Proof. For any point X∗ ∈ Bn , let us observe first that the set

S(X∗ ) = { k ∈ {1, 2, . . . , m} | Tk (X ∗ ) = 1}

is a stable set of the graph G(ψ). Indeed, no two terms in S(X∗ ) can conflict, since
otherwise, at least one of them would vanish at the point X∗ . Hence,

$$\psi(X^*) = \sum_{k \in S(X^*)} b_k \le \alpha(\psi) \quad \text{for all } X^* \in B^n.$$

Conversely, if S ⊆ V is a stable set of G(ψ), then the terms associated with
the vertices in S do not conflict, and thus all literals appearing in these terms can
simultaneously be made equal to 1. In other words, for any stable set S ⊆ V , there
exists a point X∗ ∈ B n such that Tk (X ∗ ) = 1 for all k ∈ S. Applying this observation
to a stable set S ∗ of maximum weight, we obtain
 
$$\alpha(\psi) = \sum_{k \in S^*} b_k = \sum_{k \in S^*} b_k T_k(X^*) \le \psi(X^*) \le \max_{X \in B^n} \psi(X).$$
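The equality of Theorem 13.15 can be checked by brute force on a small instance. The sketch below is only illustrative: the encoding of a term as a triple (weight, A_k, B_k), the 0-based variable indices, and the function names are assumptions of this example, and both searches are exponential.

```python
from itertools import combinations, product

# A posiform term is encoded as (b_k, A_k, B_k): A_k holds the indices of the
# uncomplemented variables, B_k those of the complemented ones (hypothetical
# encoding chosen for this sketch, with b_0 = 0).
def conflict(t1, t2):
    _, a1, b1 = t1
    _, a2, b2 = t2
    # Two terms conflict when some variable is plain in one and complemented in the other.
    return bool(a1 & b2) or bool(b1 & a2)

def posiform_max(terms, n):
    # Enumerate all points of B^n and keep the largest posiform value.
    best = 0
    for x in product((0, 1), repeat=n):
        value = sum(b for b, a, c in terms
                    if all(x[i] == 1 for i in a) and all(x[j] == 0 for j in c))
        best = max(best, value)
    return best

def max_weight_stable_set(terms):
    # Enumerate all vertex subsets of the conflict graph and keep the heaviest stable one.
    m = len(terms)
    best = 0
    for r in range(m + 1):
        for s in combinations(range(m), r):
            if all(not conflict(terms[k], terms[l]) for k, l in combinations(s, 2)):
                best = max(best, sum(terms[k][0] for k in s))
    return best

# psi = 3 x0 x1 + 2 (not x0) x2 + 4 (not x1)(not x2) + x2
terms = [(3, {0, 1}, set()), (2, {2}, {0}), (4, set(), {1, 2}), (1, {2}, set())]
assert posiform_max(terms, 3) == max_weight_stable_set(terms)
```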


So, every posiform maximization problem can be easily reduced to a graph
stability problem. The converse statement is true as well, in view of the formulation
(13.2) and of Theorem 13.2. In fact, another interesting transformation of the
weighted stable set problem to posiform maximization can also be inferred from
the following observations:
First, for a posiform ψ on Bn given by (13.51), consider an arbitrary variable $x_i$
and define the sets

Pi = { k ∈ {1, 2, . . . , m} | i ∈ Ak } and Ni = { k ∈ {1, 2, . . . , m} | i ∈ Bk }

(possibly Pi = ∅ or Ni = ∅). By definition, in the conflict graph G(ψ), every vertex
of Pi is linked to every vertex of Ni ; in other words, the graph Hi = (Vi , Ei ) where
Vi = Pi ∪ Ni and

$$E_i = \{ (k, \ell) \in E \mid k \in P_i,\ \ell \in N_i \}$$

is a complete bipartite subgraph of G(ψ). Moreover, $E = \bigcup_{i=1}^{n} E_i$, meaning that
the edge-set of G(ψ) is covered by the collection of complete bipartite graphs
H1 , H2 , . . . , Hn associated with the variables of ψ.
Hammer [437] observed that this construction can be reversed and established
the following result (recall that αG denotes the weight of a maximum weighted
stable set of G).

Theorem 13.16. For every graph G = (V , E) and vertex weights w : V → R+ ,
there exists a posiform ψ such that G = G(ψ) and $\alpha_G = \max_{X \in B^n} \psi(X)$.

Proof. Consider any collection H1 , H2 , . . . , Hn of complete bipartite graphs
covering the edges of G, and let Hi = (Pi ∪ Ni , Ei ); thus, every edge of Hi has an
endpoint in Pi and the other endpoint in Ni , and $E = \bigcup_{i=1}^{n} E_i$. If I is the set of
isolated vertices of G, that is, $I = \{k \in V \mid \text{for all } e \in E,\ k \notin e\}$, and if I is nonempty,
then assume, without loss of generality, that Pn = I and Nn = En = ∅.
For i = 1, 2, . . . , n, associate a variable xi with the subgraph Hi , and for each
k ∈ V let

Ak = {i ∈ {1, 2, . . . , n} | k ∈ Pi },
Bk = {i ∈ {1, 2, . . . , n} | k ∈ Ni },
bk = w(k).

With these definitions, if ψ is the posiform given by (13.51), then it is easy
to check that G = G(ψ). The equality $\alpha_G = \max_{X \in B^n} \psi(X)$ follows from
Theorem 13.15.
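The construction in this proof can be sketched in a few lines. The example below assumes the simplest possible cover, namely one complete bipartite graph per edge, plus one extra pair (I, ∅) for the isolated vertices; the function names and the dictionary-based encoding are hypothetical choices made for illustration.

```python
from itertools import combinations

def posiform_from_graph(vertices, edges, weight):
    # Cover the edges by bicliques (P_i, N_i). Here each edge is its own biclique,
    # which is an assumption of this sketch; any cover would do.
    bicliques = [({u}, {v}) for u, v in edges]
    isolated = set(vertices) - {u for e in edges for u in e}
    if isolated:
        bicliques.append((isolated, set()))  # P_n = I, N_n = empty set
    terms = {}
    for k in vertices:
        a_k = {i for i, (p, _) in enumerate(bicliques) if k in p}  # A_k
        b_k = {i for i, (_, q) in enumerate(bicliques) if k in q}  # B_k
        terms[k] = (weight[k], a_k, b_k)
    return terms, len(bicliques)

def conflict_edges(terms):
    # Edge set of the conflict graph of the posiform described by `terms`.
    return {frozenset((k, l)) for k, l in combinations(terms, 2)
            if terms[k][1] & terms[l][2] or terms[k][2] & terms[l][1]}

# Path 1 - 2 - 3 together with the isolated vertex 4.
vertices, edges = [1, 2, 3, 4], [(1, 2), (2, 3)]
terms, n_vars = posiform_from_graph(vertices, edges, {1: 2, 2: 5, 3: 1, 4: 7})
assert conflict_edges(terms) == {frozenset(e) for e in edges}
```

The extra pair for isolated vertices matters: without it, the term of an isolated vertex would contain no literal at all, which (13.51) does not allow.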

The relations between posiform maximization and weighted stability described
in Theorems 13.15 and 13.16 have been exploited by several researchers. Ebenegger,
Hammer, and de Werra [285], in particular, have proposed a specific posiform
transformation technique leading to an algorithm called struction for the weighted
stability problem. Extensions and applications of struction to various classes of
graphs have been investigated in [14, 453, 491], and so on. We refer the reader to
these publications for more details.

13.5 Approximations
In this section, we briefly discuss the problem of approximating a pseudo-Boolean
function f on $B^n$ by a “simpler” function. Hammer and Holzman [441] considered the
specific version of this problem in which the objective is to find a function g of
degree k, for a predetermined value of k, which minimizes the L2 -norm

$$\sum_{X \in B^n} [f(X) - g(X)]^2. \tag{13.52}$$

When k = 1, g is the best linear L2 -approximation of f and we denote it by
L(f ). Let us assume that f is represented by the polynomial expression (13.12).
Then, in order to compute L(f ), it is sufficient to know how to compute the best
linear approximation of a monomial. Indeed, L(f ) can be viewed as the projection
of f on the subspace of linear functions, and hence, there holds

$$L(f) = \sum_{A \in \mathcal{P}(N)} c(A)\, L\Bigl(\prod_{i \in A} x_i\Bigr).$$

Hammer and Holzman [441] showed that

$$L\Bigl(\prod_{i \in A} x_i\Bigr) = \frac{1}{2^{|A|}} \Bigl( 1 - |A| + 2 \sum_{i \in A} x_i \Bigr) \quad \text{for all } A \subseteq N.$$
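The closed form for the best linear approximation of a monomial can be verified numerically through the normal equations of least squares: the residual must be orthogonal to the constant function and to every coordinate function. The sketch below assumes 0-based variable indices and uses exact rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def monomial(x, a):
    # Value of prod_{i in A} x_i at the 0-1 point x.
    return int(all(x[i] == 1 for i in a))

def linear_approx(x, a):
    # Closed form quoted in the text: (1 - |A| + 2 * sum_{i in A} x_i) / 2^|A|.
    return Fraction(1 - len(a) + 2 * sum(x[i] for i in a), 2 ** len(a))

def residual_is_orthogonal(n, a):
    points = list(product((0, 1), repeat=n))
    residual = [monomial(x, a) - linear_approx(x, a) for x in points]
    # Basis of the linear functions on B^n: the constant 1 and the coordinates x_i.
    basis = [[1] * len(points)] + [[x[i] for x in points] for i in range(n)]
    return all(sum(r * g for r, g in zip(residual, row)) == 0 for row in basis)

assert residual_is_orthogonal(4, {0, 2, 3})
```

Orthogonality of the residual to the whole subspace is exactly the condition that characterizes the L2 projection, so the assertion confirms the formula for this choice of A.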

The best quadratic, cubic, and higher-order L2 -approximations can be derived by
similar approaches; see also Ding, Lax, Chen, and Chen [273]; Ding, Lax, Chen,
Chen, and Marx [274]; Grabisch, Marichal, and Roubens [407]; or Zhang and
Rowe [936] for extensions of these results.
Important game-theoretical applications of best L2 -approximations consist in
finding the Banzhaf indices of the players of a simple game, or the Shapley values
of the players of an n-person characteristic function game. As shown in [441], these
indices are simply the coefficients of best (weighted) linear L2 -approximations of
the pseudo-Boolean functions describing the games.
Another application of these results allows the efficient determination of excel-
lent heuristic solutions of unconstrained pseudo-Boolean optimization problems,
as shown by Davoine, Hammer, and Vizvári [262]. Zhang and Rowe [936] dis-
cuss the relevance of pseudo-Boolean approximations for the development of
evolutionary algorithms.
Finally, we note that (different types of) approximations of pseudo-Boolean
functions are also of interest in the theory of probabilistic databases, where they
can be used to track the most influential facts in the derivation of a conclusion;
see Ré and Suciu [780].

13.6 Special classes of pseudo-Boolean functions


Many special classes of pseudo-Boolean functions can be defined by analogy with
their Boolean counterparts: quadratic, monotone, supermodular, and so on.

13.6.1 Quadratic functions and quadratic 0-1 optimization


Quadratic pseudo-Boolean functions, or pseudo-Boolean functions of degree (at
most) 2, have been the object of numerous investigations; surveys are provided by
Boros and Hammer [127] and by Hammer and Simeone [463].
Quadratic 0–1 optimization, in particular, is an important special case of nonlin-
ear 0–1 optimization, both because numerous applications appear in this form (see
Applications 13.1, 13.5, 13.6, etc.), and because the general case is easily reduced
to it. This reduction can be performed in various ways. For instance, Theorems 13.2
and 13.15 suggest the following procedure: In order to maximize a pseudo-Boolean
function f , produce a posiform of f , build the conflict graph G of this posiform,
and formulate the weighted stability problem associated with G as a quadratic 0–1
maximization problem.
Another efficient transformation was proposed by Rosenberg [790]. It relies
on the substitution of the product of any two variables by a new variable, and the
addition of appropriate penalty terms which, at every optimal point, force the new
variable to take the value of the product of the two substituted variables. More
precisely:
Theorem 13.17. Let f be a pseudo-Boolean function represented by the polynomial
expression

$$f(x_1, x_2, \ldots, x_n) = \sum_{k=1}^{m} c_k \prod_{i \in A_k} x_i,$$

assume that $|A_1| \ge 2$, and select $j, \ell \in A_1$. Let y be a new 0-1 variable, different
from x1 , x2 , . . . , xn , let M be a positive constant, and define

$$g(x_1, x_2, \ldots, x_n, y) = c_1 \Bigl(\prod_{i \in A_1 \setminus \{j, \ell\}} x_i\Bigr) y + \sum_{k=2}^{m} c_k \prod_{i \in A_k} x_i - M (x_j x_\ell - 2 x_j y - 2 x_\ell y + 3y).$$

If M is large enough, then the maximum value of f over $B^n$ is equal to the maximum
value of g over $B^{n+1}$.
Proof. Consider any point $(X^*, y^*) \in B^{n+1}$. It is easy to check that the expression
$x_j^* x_\ell^* - 2 x_j^* y^* - 2 x_\ell^* y^* + 3 y^*$ is equal to 0 when $y^* = x_j^* x_\ell^*$, and is strictly positive
otherwise.
Assume now that M is large (say, $M > |c_1|$). Then, $f(X^*) = g(X^*, y^*)$ for all
$(X^*, y^*) \in B^{n+1}$ such that $y^* = x_j^* x_\ell^*$, and $g(X^*, y^*) < f(X^*)$ for all other points
in $B^{n+1}$. The claim follows directly.

Note that, after applying the transformation described in Theorem 13.17, the degree
of the first term of g is equal to |A1 | − 1. Thus, applying this transformation
repeatedly eventually yields a function of degree 2 which has the same maximum
value as f .
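One step of this substitution can be sketched as follows. The representation of a polynomial as a dictionary from sets of variable indices to coefficients, the function names, and the small test function are assumptions made for illustration; the maxima are compared by exhaustive enumeration.

```python
from itertools import product

def evaluate(poly, x):
    # poly maps a frozenset of variable indices (a monomial) to its coefficient.
    return sum(c * all(x[i] for i in mono) for mono, c in poly.items())

def rosenberg_step(poly, term, j, l, y, big_m):
    # Replace x_j * x_l by the new variable y in the chosen term and add the penalty
    # -M (x_j x_l - 2 x_j y - 2 x_l y + 3 y).
    g = dict(poly)
    c1 = g.pop(term)

    def add(mono, c):
        mono = frozenset(mono)
        g[mono] = g.get(mono, 0) + c

    add((term - {j, l}) | {y}, c1)
    add({j, l}, -big_m)
    add({j, y}, 2 * big_m)
    add({l, y}, 2 * big_m)
    add({y}, -3 * big_m)
    return g

def maximum(poly, n):
    return max(evaluate(poly, x) for x in product((0, 1), repeat=n))

# f = 5 x0 x1 x2 - 3 x1 x3 + 2 x3; the new variable y receives index 4.
f = {frozenset({0, 1, 2}): 5, frozenset({1, 3}): -3, frozenset({3}): 2}
g = rosenberg_step(f, frozenset({0, 1, 2}), 0, 1, 4, big_m=6)  # M > |c_1| = 5
assert maximum(f, 4) == maximum(g, 5)
assert max(len(mono) for mono in g) == 2  # g is quadratic
```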
It is interesting to observe that this argument is analogous to the proof that
every Boolean DNF equation is equivalent to a DNF equation of degree 3 (see
Theorem 2.4 in Chapter 2). Actually, in many ways, it can be said that quadratic
0–1 optimization problems play the same fundamental role with respect to pseudo-
Boolean optimization problems, as DNF equations of degree 3 (or 3-Sat problems)
with respect to general DNF equations (or satisfiability problems).
Other transformations of pseudo-Boolean optimization problems to the
quadratic case have been proposed and have been shown to be computationally
effective by Buchheim and Rinaldi [164, 165].
Hammer, Hansen, and Simeone [440] showed that, for every quadratic pseudo-
Boolean function f , one can efficiently construct a linear function
$$l(x_1, x_2, \ldots, x_n) = l_0 + \sum_{j=1}^{n} l_j x_j,$$

called the roof dual of f , that majorizes f (x1 , x2 , . . . , xn ) in every binary point
and that has the following property of strong persistency: If lj is strictly positive
(respectively, negative), then xj is equal to 1 (respectively, 0) in every maximizer
of f . Thus, in some cases, strong persistency allows the determination of the
optimal values of a subset of variables.
Note that the maximum of l(X) over $B^n$ is simply equal to
$\rho(f) = l_0 + \sum_{j=1}^{n} \max(l_j, 0)$, and ρ(f ) provides an upper bound on the maximum of f over
$B^n$. Hammer, Hansen, and Simeone [440] proved that ρ(f ) is exactly the optimal
value $W^{std}$ of the continuous relaxation of the 0–1 linear programming model
(13.45)–(13.50) associated with the polynomial expression of f or with any posiform of f.
Moreover, the equality ρ(f ) = maxX∈Bn f (X) holds if and only if an associated
quadratic Boolean function is consistent; therefore, the optimality of ρ(f ) can be
tested in polynomial time (see Exercise 10 in Chapter 5).
The determination of the roof dual l(X) was derived in [440] from the solution
of the continuous relaxation of the model (13.45)–(13.50); Boros, Hammer, and Sun
[125, 134] showed that the computation of the roof dual can be efficiently reduced
to a maximum flow problem. We refer again to the survey by Boros and Hammer
[127] for additional details, as well as to Boros, Crama, and Hammer [113, 114]
or Boros, Lari, and Simeone [143] for extensions of roof duality theory.
The convex hull of the set of 0–1 solutions of (13.46)–(13.50) is called the quadric
polytope, or correlation polytope. Its facial structure was investigated by Padberg
[723] and by several other authors; see also Deza and Laurent [270] and Laurent
and Rendl [601].
There is a huge number of papers discussing exact or heuristic optimization
algorithms for quadratic pseudo-Boolean functions, and it is impossible to cite
them all here. Among recent ones, let us only mention a variety of approaches by
Billionnet and Elloumi [85]; Boros, Hammer, and Tavares [136]; Glover and Hao
[386]; Gueye and Michelon [420]; Hansen and Meyer [472]; Lodi, Allemand, and
Liebling [620]; Merz and Freisleben [681]; Palubeckis [724], and so on, as well
as efficient implementations of the roof duality computations in the framework
of computer vision applications by Kolmogorov and Rother [576] and Rother,
Kolmogorov, Lempitsky, and Szummer [794].

13.6.2 Monotone functions


Definition 13.11. A pseudo-Boolean function f on B n is called monotone
nondecreasing if

f (X) ≤ f (Y ) for all X, Y ∈ Bn such that X ≤ Y ,

and it is called monotone nonincreasing if

f (X) ≥ f (Y ) for all X, Y ∈ Bn such that X ≤ Y .

As noted in Application 13.7, monotone nondecreasing functions such that
f (0, . . . , 0) = 0 and f (1, . . . , 1) = 1 have also been studied in the literature under
the names of Choquet capacities, belief functions, fuzzy measures, and so on.

Example 13.6. The function f1 (x, y) = 1 + 2x + 2y − xy is monotone nondecreasing,
while f2 (x, y) = 3 − y − xy is monotone nonincreasing.

Just as in the case of functions of real variables, monotonicity properties can
be related to the signs of first-order derivatives.

Theorem 13.18. The pseudo-Boolean function f is monotone nondecreasing if
and only if $\Delta_i f(X) \ge 0$ for all $X \in B^n$ and for all i = 1, 2, . . . , n. It is monotone
nonincreasing if and only if $\Delta_i f(X) \le 0$ for all $X \in B^n$ and for all i = 1, 2, . . . , n.

Proof. This is straightforward. 

Extending Definition 13.11, we say that a function f is monotone if the sign of
$\Delta_i f$ is constant on $B^n$ for each i = 1, 2, . . . , n (Wilde and Sanchez-Anton [909]).
Maximizing a monotone function f on $B^n$ is trivial if the sign of each first derivative
is known: Indeed, a global maximum $X^*$ is obtained by setting $x_i^* = 1$ if
$\Delta_i f(X) \ge 0$ on $B^n$, and by setting $x_i^* = 0$ otherwise.
Note also that, for a function f given in polynomial form, the sign of each
derivative can be easily determined if we know beforehand that f is monotone
(Hammer [435]). However, recognizing whether a function is monotone is a hard
task in itself, as proved by Crama [226].

Theorem 13.19. It is co-NP-complete to decide whether a pseudo-Boolean function
expressed in polynomial form is monotone, even when the input is restricted
to cubic polynomials.

Proof. The decision problem is in co-NP: Indeed, in order to establish that an
instance f is not monotone, it suffices to exhibit an index i and two points $X^*, Y^* \in B^n$ such that
$\Delta_i f(X^*) > 0$ and $\Delta_i f(Y^*) < 0$.
To prove that the problem is co-NP-complete, we provide a transformation from
the NP-complete Subset Sum problem, which can be stated as follows (see [371]):
Given n + 1 positive integers (w1 , w2 , . . . , wn , t), is there a point $X^* \in B^n$ such that
$\sum_{j=1}^{n} w_j x_j^* = t$?
With an arbitrary instance (w1 , w2 , . . . , wn , t) of Subset Sum, we associate the
linear function

$$r(x_1, x_2, \ldots, x_n) = \sum_{j=1}^{n} w_j x_j - t,$$

and the cubic function

$$f(x_1, x_2, \ldots, x_{n+1}) = \bigl(r^2(x_1, x_2, \ldots, x_n) - 1\bigr)\, x_{n+1} + 3C^2 \sum_{j=1}^{n} x_j,$$

where C is a large enough constant (say, $C = \sum_{j=1}^{n} w_j + t$). One easily verifies
that $\Delta_i f \ge 0$ for i = 1, 2, . . . , n, and that $\Delta_{n+1} f = r^2(x_1, x_2, \ldots, x_n) - 1$. Hence, f
is not monotone if and only if there exists $X^* \in B^n$ such that $r(X^*) = 0$, that is, if
and only if the Subset Sum problem has a “Yes” answer.
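The reduction can be illustrated on tiny instances by exhaustive enumeration. The sketch below assumes that the derivative is the difference f(x_i = 1) − f(x_i = 0), uses 0-based indices (so that the extra variable has index n), and restricts attention to instances with t ≥ 2; all function names are hypothetical.

```python
from itertools import product

def build_f(w, t):
    # f(x_1..x_{n+1}) = (r^2 - 1) x_{n+1} + 3 C^2 sum_j x_j, with C = sum(w) + t.
    n = len(w)
    c = sum(w) + t

    def f(x):
        r = sum(wj * xj for wj, xj in zip(w, x)) - t  # zip ignores the last coordinate
        return (r * r - 1) * x[n] + 3 * c * c * sum(x[:n])

    return f, n + 1

def derivative_signs(f, n, i):
    # Set of signs (-1, 0, 1) taken by the i-th derivative over B^n.
    signs = set()
    for x in product((0, 1), repeat=n):
        if x[i] == 0:
            d = f(x[:i] + (1,) + x[i + 1:]) - f(x)
            signs.add((d > 0) - (d < 0))
    return signs

def is_monotone(f, n):
    # Monotone: no derivative takes both a positive and a negative value.
    return all(not {1, -1} <= derivative_signs(f, n, i) for i in range(n))

def has_subset_sum(w, t):
    return any(sum(wj * xj for wj, xj in zip(w, x)) == t
               for x in product((0, 1), repeat=len(w)))

for t in (4, 8):
    f, n = build_f((3, 5, 7), t)
    assert is_monotone(f, n) == (not has_subset_sum((3, 5, 7), t))
```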

The same argument shows that it is also co-NP-complete to decide whether
a cubic function given in polynomial form is monotone nondecreasing or monotone
nonincreasing. For quadratic polynomials, the problem is easy in view of
Theorem 13.18.
Foldes and Hammer [337] have investigated monotone pseudo-Boolean func-
tions expressed in disjunctive normal forms (see also Marichal [668], Sugeno
[851]).

Example 13.7. The functions in Example 13.6 can also be expressed as
f1 = x̄ ∨ 3x̄y ∨ 3x ∨ 4xy and f2 = y ∨ 3ȳ ∨ 2x̄y.

We have already seen that it is co-NP-complete to recognize whether a Boolean
DNF is monotone (see Theorems 1.31 and 1.32 in Chapter 1). Since every Boolean
DNF can be interpreted as a pseudo-Boolean DNF, it easily follows that recognizing
monotone (nonincreasing or nondecreasing) pseudo-Boolean functions expressed
in DNF is also co-NP-complete.
The following theorem generalizes another well-known result from the theory
of Boolean functions (recall Theorem 1.21, and see Bioch [86] for an extension to
the class of discrete functions).

Theorem 13.20. For a pseudo-Boolean function f , the following conditions are
equivalent:

(i) f is monotone nondecreasing.
(ii) Some DNF of f contains no complemented variables.
(iii) Some CNF of f contains no complemented variables.

Proof. By monotonicity of the operators ∨ and ∧, it is obvious that each of the
properties (ii) and (iii) implies (i).
Assume now that f is monotone nondecreasing and let us show that this implies
(ii) (the other case is similar). By Theorem 13.4, we know that f can be represented
by a pseudo-Boolean DNF of the form $\psi = \bigvee_{k=1}^{m} p_k(X)$, where each term $p_k$ has
the form

$$p_k(X) = a + b_k \prod_{i \in A_k} x_i \prod_{j \in B_k} \bar{x}_j,$$

where $b_k \ge 0$ for k = 1, 2, . . . , m.
Suppose that some term of ψ contains at least one complemented variable, say,
$B_1 \neq \emptyset$, and let

$$q_1(X) = a + b_1 \prod_{i \in A_1} x_i.$$

Since $p_1(X) \le q_1(X)$, there holds

$$f(X) = \bigvee_{k=1}^{m} p_k(X) \le q_1(X) \vee \bigvee_{k=2}^{m} p_k(X). \tag{13.53}$$

We claim that $q_1(X) \le f(X)$ for all $X \in B^n$. Indeed, assume that $q_1(X^*) > f(X^*)$
for some point $X^* \in B^n$. Then, we define another point $Y^* \in B^n$ as follows: $y_j^* = x_j^*$
for all $j \notin B_1$, and $y_j^* = 0$ for all $j \in B_1$. For this point $Y^*$,

$$p_1(Y^*) = q_1(X^*) > f(X^*) \ge f(Y^*)$$

(the last inequality holds because f is nondecreasing). But the conclusion $p_1(Y^*) > f(Y^*)$
is in contradiction with the definition of the DNF expression of f .
Thus, there holds $q_1(X) \le f(X)$ for all X, and (13.53) leads to

$$f(X) = q_1(X) \vee \bigvee_{k=2}^{m} p_k(X).$$

Repeating this procedure for each term of the DNF ψ, we eventually conclude that
the expression obtained by dropping all complemented literals from ψ is again a
DNF of f (compare with Theorem 1.24 in Chapter 1). 

Example 13.8. Consider again the nondecreasing function f1 already introduced
in Example 13.6 and in Example 13.7. This function is represented by
the DNF ψ = 1 ∨ 3y ∨ 3x ∨ 4xy and by the CNF φ = (3 + x) ∧ (3 + y)
∧ (1 + 3(x ∨ y)). 

13.6.3 Supermodular and submodular functions


Definition 13.12. A pseudo-Boolean function f on B n is supermodular if

f (X) + f (Y ) ≤ f (X ∨ Y ) + f (X ∧ Y ) for all X, Y ∈ Bn . (13.54)

The function f is submodular if (−f ) is supermodular, or equivalently if

f (X) + f (Y ) ≥ f (X ∨ Y ) + f (X ∧ Y ) for all X, Y ∈ Bn . (13.55)

Supermodular and submodular functions arise in numerous contexts and have
been thoroughly investigated in discrete mathematics, in combinatorial optimization,
in algebra, in statistics, in game theory, in economics, in engineering, in
artificial intelligence, and so on. We refer to Choquet [192], Edmonds [287]
and Shapley [829] for early work, and to Fujishige [351], Iwata [523], Lovász
[624, 625], McCormick [637], Narayanan [703], Nemhauser and Wolsey [707],
Rosenmüller [791], Schrijver [814], and Topkis [865] for in-depth discussions and
additional references.
Specific examples of supermodular functions were encountered earlier in this
chapter. For instance, the objective function (13.9) in Application 13.8 is super-
modular. (The reader can either try to check directly the conditions in Definition
13.12 or use Theorem 13.21 hereunder.) Note also that, if f is supermodular and
f (0, 0, . . . , 0) = 0, then f is superadditive in the sense of Application 13.7.
Examples of submodular functions have been provided in Application 13.1
(Equations (13.1) and (13.2)) and in Application 13.2. Submodular functions
also arise in a variety of computer science models (data mining, see Application
13.4, Genkin, Kulikowski, and Muchnik [376]; computer vision, see Application
13.5, Boykov, Veksler, and Zabih [147], Kolmogorov and Zabih [577]; artificial
intelligence, see Živný, Cohen, and Jeavons [938]).
As argued by Lovász in [624], supermodular functions share some of the charac-
teristic features of concave and of convex functions on Rn . In particular, similarly
to convex functions, supermodular functions have nonnegative second derivatives
(or, equivalently, in view of Theorem 13.18, nondecreasing first derivatives).

Theorem 13.21. A pseudo-Boolean function f on Bn is supermodular if and only if

$$\Delta_i \Delta_j f(X) \ge 0 \quad \text{for all } X \in B^n \text{ and for all } i, j = 1, 2, \ldots, n. \tag{13.56}$$

The function f is submodular if and only if

$$\Delta_i \Delta_j f(X) \le 0 \quad \text{for all } X \in B^n \text{ and for all } i, j = 1, 2, \ldots, n. \tag{13.57}$$

Proof. We focus on the first statement, since the second one follows immediately
by sign reversal.
Suppose first that f is supermodular and consider two indices i < j (note that
$\Delta_i \Delta_i f(X) \equiv 0$). In view of Definition 13.10,

$$\begin{aligned}
\Delta_i \Delta_j f = {} & f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_{j-1}, 1, x_{j+1}, \ldots, x_n) \\
& - f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_{j-1}, 0, x_{j+1}, \ldots, x_n) \\
& - f(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_{j-1}, 1, x_{j+1}, \ldots, x_n) \\
& + f(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_{j-1}, 0, x_{j+1}, \ldots, x_n)
\end{aligned}$$

for all $(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n) \in B^{n-2}$. Letting

$$X^* = (x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_{j-1}, 0, x_{j+1}, \ldots, x_n)$$

and

$$Y^* = (x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_{j-1}, 1, x_{j+1}, \ldots, x_n),$$

we see that $\Delta_i \Delta_j f \ge 0$ holds as a consequence of (13.54).
Conversely, assume that (13.56) holds, and let $X^0, Y^0 \in B^n$. We are going to establish
that (13.54) holds for $X^0, Y^0$ by induction on the Hamming distance $d(X^0, Y^0)$
between $X^0$ and $Y^0$, where

$$d(X, Y) = \sum_{i=1}^{n} |x_i - y_i|$$

for all $X, Y \in B^n$. When $d(X^0, Y^0) = 0$ or 1, the inequality (13.54) is trivially satisfied.
For $d(X^0, Y^0) = 2$, it is a reformulation of (13.56), as follows from the first part of
the proof. Assume now that $d(X^0, Y^0) \ge 3$, and assume without loss of generality
that $X^0 = (0, 0, X^2)$ and $Y^0 = (1, 1, Y^2)$.
Introduce the point $U^0 = (0, 1, Y^2)$. There holds $d(X^0, U^0) = d(X^0, Y^0) - 1$,
and hence, by induction,

$$f(X^0) + f(U^0) \le f(X^0 \vee U^0) + f(X^0 \wedge U^0). \tag{13.58}$$

Moreover,

$$d(X^0 \vee U^0, Y^0) = 1 + d(X^2 \vee Y^2, Y^2) \le 1 + d(X^2, Y^2) = d(X^0, Y^0) - 1,$$

hence, we obtain again by induction and after some easy computations:

$$f(X^0 \vee U^0) + f(Y^0) \le f(X^0 \vee U^0 \vee Y^0) + f\bigl((X^0 \vee U^0) \wedge Y^0\bigr) = f(X^0 \vee Y^0) + f(U^0). \tag{13.59}$$

Adding (13.58) and (13.59) yields

$$f(X^0) + f(Y^0) \le f(X^0 \vee Y^0) + f(X^0 \wedge Y^0),$$

and the proof is complete. 
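For small n, the equivalence stated in Theorem 13.21 can be observed directly by comparing the lattice inequality (13.54) with the sign of the second-order derivatives. The sketch below assumes that a pseudo-Boolean function is given as a Python callable on 0-1 tuples; the two sample functions are illustrative choices.

```python
from itertools import product

def is_supermodular_by_definition(f, n):
    # Check f(X) + f(Y) <= f(X v Y) + f(X ^ Y) for all pairs of points.
    pts = list(product((0, 1), repeat=n))
    join = lambda x, y: tuple(a | b for a, b in zip(x, y))
    meet = lambda x, y: tuple(a & b for a, b in zip(x, y))
    return all(f(x) + f(y) <= f(join(x, y)) + f(meet(x, y)) for x in pts for y in pts)

def second_derivative(f, x, i, j):
    # f(..1..1..) - f(..1..0..) - f(..0..1..) + f(..0..0..) in positions i and j.
    def put(a, b):
        z = list(x)
        z[i], z[j] = a, b
        return tuple(z)
    return f(put(1, 1)) - f(put(1, 0)) - f(put(0, 1)) + f(put(0, 0))

def is_supermodular_by_derivatives(f, n):
    return all(second_derivative(f, x, i, j) >= 0
               for x in product((0, 1), repeat=n)
               for i in range(n) for j in range(i + 1, n))

f_super = lambda x: 2 * x[0] * x[1] + x[1] * x[2] - 3 * x[0]  # nonnegative quadratic terms
f_not = lambda x: x[0] * x[1] * x[2] - x[0] * x[1]            # mixed derivative is x2 - 1

for f in (f_super, f_not):
    assert is_supermodular_by_definition(f, 3) == is_supermodular_by_derivatives(f, 3)
```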



The sequence of Theorems 13.18 and 13.21 has been extended in Crama, Hammer, and
Holzman [232] and Foldes and Hammer [339] to the characterization of functions
with nonnegative derivatives of higher order (see also Choquet [192]).
Theorem 13.21 has several corollaries for a pseudo-Boolean function f given by
its polynomial expression.
First, notice that f is linear if and only if all its second-order derivatives
are identically zero. This implies that linear functions are exactly those pseudo-
Boolean functions that are simultaneously supermodular and submodular; they are
sometimes called “modular” in the literature.
Example 13.9. A prime example of linear pseudo-Boolean function is provided
by a probability measure on a finite set. Linearity is due to the defining identity

$$\mathrm{Prob}(A) = \sum_{j \in A} \mathrm{Prob}(\{j\}) \quad \text{for all } A \subseteq \{1, 2, \ldots, n\},$$

whereas sub- and supermodularity appear clearly in the well-known inclusion-exclusion
formula

$$\mathrm{Prob}(A \cup B) = \mathrm{Prob}(A) + \mathrm{Prob}(B) - \mathrm{Prob}(A \cap B).$$

Consider now the quadratic case. It follows from Theorem 13.21 that a quadratic
function f is supermodular if and only if all its quadratic terms have nonnegative
coefficients (Nemhauser, Wolsey, and Fisher [708]). This property can easily be
checked in polynomial time.
The second-order derivatives of cubic functions are linear functions. Hence,
the minimum and maximum of these derivatives can be efficiently computed.
This implies in turn that supermodular and submodular cubic functions can also
be recognized in polynomial time. On the other hand, the following result was
independently established by Crama [226] and by Gallo and Simeone [364].
Theorem 13.22. It is co-NP-complete to decide whether a pseudo-Boolean func-
tion expressed in polynomial form is supermodular (or submodular), even when
the input is restricted to polynomials of degree 4.
Proof. The proof is similar to the proof of Theorem 13.19. We leave it as an end-of-
chapter exercise to the reader. 

An important connection between supermodularity and concavity was established
by Lovász [624]. It relies on an elegant characterization of supermodular
functions in terms of their Lovász extension (see Section 13.3.3, and remember
that we have only defined the Lovász extension when f (0, 0, . . . , 0) = 0).
Theorem 13.23. A pseudo-Boolean function f such that f (0, 0, . . . , 0) = 0 is
supermodular if and only if its Lovász extension f L is concave.
Proof. We assume that f is defined on $B^n$ and we use the same notations as in
Section 13.3.3.

(If) Assume that $f^L$ is concave and let $X, Y \in B^n$. Observe that the points
X ∧ Y and X ∨ Y are in the same simplex S(π ) ∈ S since X ∧ Y ≤ X ∨ Y . Thus, we
successively derive:

$$\begin{aligned}
\tfrac12 f(X) + \tfrac12 f(Y) &= \tfrac12 f^L(X) + \tfrac12 f^L(Y) && \text{(since } f^L \text{ is an extension of } f\text{)} \\
&\le f^L\bigl(\tfrac12 (X + Y)\bigr) && \text{(by concavity of } f^L\text{)} \\
&= f^L\bigl(\tfrac12 (X \vee Y) + \tfrac12 (X \wedge Y)\bigr) \\
&= \tfrac12 f^L(X \vee Y) + \tfrac12 f^L(X \wedge Y) && \text{(by linearity of } f^L \text{ on } S(\pi)\text{)} \\
&= \tfrac12 f(X \vee Y) + \tfrac12 f(X \wedge Y) && \text{(since } f^L \text{ is an extension of } f\text{)}.
\end{aligned}$$

This proves that f is supermodular.

(Only if) Assume that f is supermodular. Recall that, for an arbitrary permutation
π of {1, 2, . . . , n}, $f^L_{S(\pi)}$ denotes the unique linear extension of f on S(π ) and
that it can be expressed by Equation (13.34). By a slight abuse of notation, we look
at $f^L_{S(\pi)}$ as being defined on $R^n$, rather than on S(π ) only.
Consider now an arbitrary point $X \in U^n$, and assume that
$x_{\pi^*(1)} \le x_{\pi^*(2)} \le \ldots \le x_{\pi^*(n)}$, meaning that X is in the simplex $S(\pi^*)$ and
$f^L(X) = f^L_{S(\pi^*)}(X)$. We are going to prove that, for every other permutation π ,

$$f^L_{S(\pi^*)}(X) \le f^L_{S(\pi)}(X). \tag{13.60}$$

Observe that if (13.60) holds, then it follows that

$$f^L(X) = f^L_{S(\pi^*)}(X) = \min_{S(\pi) \in S} f^L_{S(\pi)}(X), \tag{13.61}$$

and hence, $f^L$ is concave because it is the pointwise minimum of (finitely many)
linear functions.
In order to prove inequality (13.60), consider the smallest index j such that
$x_{\pi(j)} > x_{\pi(j+1)}$. If j does not exist, then (13.60) holds as an equality since $X \in S(\pi) \cap S(\pi^*)$.
Otherwise, define a permutation ρ by transposing j and j + 1:

$$\rho(j) = \pi(j+1), \quad \rho(j+1) = \pi(j), \quad \text{and } \rho(i) = \pi(i) \text{ for all } i \neq j, j+1.$$

Some computations show that

$$\begin{aligned}
f^L_{S(\pi)}(X) - f^L_{S(\rho)}(X) &= \sum_{k=1}^{n} \bigl[ (x_{\pi(k)} - x_{\pi(k-1)}) f(E^{\pi,k}) - (x_{\rho(k)} - x_{\rho(k-1)}) f(E^{\rho,k}) \bigr] \\
&= (x_{\pi(j)} - x_{\pi(j+1)}) \times \bigl[ f(E^{\pi,j}) - f(E^{\pi,j+1}) - f(E^{\rho,j+1}) + f(E^{\pi,j+2}) \bigr].
\end{aligned}$$

Moreover, $E^{\pi,j} = E^{\pi,j+1} \vee E^{\rho,j+1}$ and $E^{\pi,j+2} = E^{\pi,j+1} \wedge E^{\rho,j+1}$. Therefore,
supermodularity implies that

$$f^L_{S(\pi)}(X) - f^L_{S(\rho)}(X) \ge 0.$$

Repeating this argument finitely many times (at most n(n − 1)/2 times, since each
transposition removes one inversion) eventually transforms π into a permutation
$\rho^*$ which sorts the components of X in nondecreasing order and such that

$$f^L_{S(\pi^*)}(X) = f^L_{S(\rho^*)}(X) \le f^L_{S(\pi)}(X).$$

This establishes (13.60), and the proof is complete.
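Equation (13.61) lends itself to a small numerical check. The sketch below assumes the form of the linear pieces that can be read off the computation in the proof, namely f^L_{S(π)}(X) = Σ_k (x_{π(k)} − x_{π(k−1)}) f(E^{π,k}) with x_{π(0)} = 0 and E^{π,k} the characteristic vector of {π(k), . . . , π(n)}; Equation (13.34) itself is not reproduced here, so this reading is an assumption. Indices are 0-based and the names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def linear_piece(f, n, perm, x):
    # Assumed form of f^L_{S(pi)}: sum_k (x_{pi(k)} - x_{pi(k-1)}) * f(E^{pi,k}).
    total, previous = 0, 0
    for k in range(n):
        e = tuple(1 if i in perm[k:] else 0 for i in range(n))  # indicator of {pi(k),...,pi(n)}
        total += (x[perm[k]] - previous) * f(e)
        previous = x[perm[k]]
    return total

def lovasz_extension(f, n, x):
    # Use the permutation that sorts the coordinates of x in nondecreasing order.
    perm = tuple(sorted(range(n), key=lambda i: x[i]))
    return linear_piece(f, n, perm, x)

def lower_envelope(f, n, x):
    return min(linear_piece(f, n, p, x) for p in permutations(range(n)))

f_super = lambda x: 2 * x[0] * x[1] + x[1] * x[2] - 3 * x[0]  # supermodular, f(0) = 0
point = (Fraction(1, 2), Fraction(1, 5), Fraction(9, 10))
assert lovasz_extension(f_super, 3, point) == lower_envelope(f_super, 3, point)

f_sub = lambda x: -x[0] * x[1]  # submodular, so the identity is expected to fail
q = (Fraction(1, 4), Fraction(3, 4))
assert lovasz_extension(f_sub, 2, q) != lower_envelope(f_sub, 2, q)
```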

The proof of Theorem 13.23, in particular, Equation (13.61), shows that every
supermodular function can be represented as the lower-envelope of linear (pseudo-
Boolean) functions. Interestingly, supermodular functions can also be shown to
be upper-envelopes of linear functions; this result is discussed in Rosenmüller
[791], where it is used to characterize extreme rays of the cone of nonnegative
supermodular functions.
Let us now turn to the problem of optimizing supermodular functions.
Grötschel, Lovász, and Schrijver [414] were first to prove that supermodular func-
tions can be maximized in polynomial time, even when the function can only be
accessed via an oracle (that is, a black-box algorithm which returns the value f (X)
for every input X ∈ B n ). Another proof of this result was provided by Lovász
[624], as a direct consequence of Theorem 13.23, of the fact that concave functions
can be maximized over convex sets in polynomial time, and of the observation
that maxX∈U n f L (X) = maxX∈Bn f (X) (Theorem 13.14).
Strongly polynomial combinatorial algorithms for the maximization of super-
modular functions were subsequently proposed by Iwata, Fleischer, and Fujishige
[524] and Schrijver [813]; see also Fujishige [351] and Schrijver [814], as well as
the surveys by Iwata [523] and McCormick [637].
When a supermodular function is given by its polynomial expression and is
either quadratic or cubic, then its maximization can be reduced to a max-flow min-
cut problem in an associated network (compare with Equation (13.1) in Section
13.1; see for instance Balinski [47], Billionnet and Minoux [84], Hansen and
Simeone [474], Kolmogorov and Zabih [577], Picard and Ratliff [746], Rhys
[786], Živný, Cohen, and Jeavons [938], and Section 13.6.4 hereunder for related
considerations).
Finally, let us remark that even though the maximum of a supermodular (or
the minimum of a submodular) function can be computed in polynomial time,
the opposite optimization problems, namely, the maximization of a submodular
(or the minimization of a supermodular) function, are NP-hard; this follows easily,
for instance, from the NP-hardness of the max-cut problem and of the weighted
stability problem in graphs; see Application 13.1. However, a standard greedy
procedure for the maximization of a submodular set function provides a (1 − 1/e)-approximation
of the maximum; see Fisher, Nemhauser, and Wolsey [332, 708],
Fujito [352], Nemhauser and Wolsey [706], Wolsey [923], and so on. Goldengorin
[393] reviews theoretical results about the structure of local and global maxima of
submodular functions, and discusses specialized maximization algorithms.
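A minimal sketch of the greedy procedure follows. It assumes the classical setting in which the (1 − 1/e) guarantee is usually stated, that is, a nondecreasing submodular set function maximized over subsets of at most k elements; this setting, the coverage function used as an example, and all names are assumptions of the sketch rather than details taken from the text above.

```python
from itertools import combinations
from math import e

def coverage(sets, chosen):
    # f(S) = size of the union of the chosen sets: nondecreasing and submodular.
    return len(set().union(*(sets[i] for i in chosen)))

def greedy(sets, k):
    # Repeatedly add the element with the largest resulting value.
    chosen = []
    for _ in range(k):
        best = max((i for i in range(len(sets)) if i not in chosen),
                   key=lambda i: coverage(sets, chosen + [i]))
        chosen.append(best)
    return chosen

sets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6, 7}]
k = 2
greedy_value = coverage(sets, greedy(sets, k))
optimum = max(coverage(sets, s) for s in combinations(range(len(sets)), k))
assert greedy_value >= (1 - 1 / e) * optimum
```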

13.6.4 Unimodular functions


Definition 13.13. A pseudo-Boolean function is almost-positive if all its non-
linear terms (i.e., terms of degree at least 2) have nonnegative coefficients in its
polynomial expression.
Theorem 13.21 implies that almost-positive functions are supermodular, and
that the converse relation holds for quadratic functions. It is well-known that
the maximization of almost-positive functions can be performed efficiently, by
reduction to the computation of a minimum cut in a network (Balinski [47]; Picard
and Queyranne [745]; Picard and Ratliff [746]; Rhys [786]). This observation has
prompted several researchers to investigate broader classes of functions for which
the same property holds. In order to define these classes, we introduce the following
switching operation. For a pseudo-Boolean function f on Bn and a subset S of
{1, 2, . . . , n}, we denote by $f_S$ the function defined for all $(x_1, x_2, \ldots, x_n)$ in $B^n$ by
$f_S(x_1, x_2, \ldots, x_n) = f(y_1, y_2, \ldots, y_n)$, where $y_j = \bar{x}_j$ if $j \in S$ and $y_j = x_j$ if $j \notin S$,
and we say that $f_S$ is obtained from f by switching S. It is easy to see that the class
of almost-positive functions is not closed under switching, and this motivates the
next definition.
Definition 13.14. A pseudo-Boolean function is unate if it can be obtained from
an almost-positive function by switching a subset of its variables.
Another extension of the class of almost-positive functions was introduced by
Billionnet and Minoux [84].
Definition 13.15. A posiform is polar if each of its terms involves either no com-
plemented variables or no uncomplemented variables. A pseudo-Boolean function
is polar if it has at least one polar posiform.
Almost-positive functions are obviously polar. Moreover, Billionnet and
Minoux [84] observed that the class of polar functions is properly included in
the class of supermodular functions, and that both classes coincide when restricted
to cubic functions. The maximization of polar functions is again reducible to a
network min-cut problem; this follows, for instance, from the observation that
the conflict graph of a polar posiform is bipartite, or from the special structure
of the constraint matrix of the integer programming problem (13.45)–(13.50),
which turns out to be totally unimodular for polar posiforms (as a consequence of
Theorem 5.13). Since bipartiteness of the conflict graph and total unimodularity
are preserved by switching operations, whereas polarity is not, we can define yet
another class of functions.
Definition 13.16. A pseudo-Boolean function is unimodular if it can be obtained
from a polar function by switching a subset of its variables.

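[Diagram of implications: almost-positive ⇒ unate ⇒ unimodular; almost-positive ⇒ polar ⇒ unimodular;
polar ⇒ supermodular ⇒ supermodular after switching; unimodular ⇒ supermodular after switching.]

Figure 13.1. Classes related to unimodular and supermodular functions.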

[Diagram of implications: almost-positive ⇒ unate ⇒ (unimodular ⇔ supermodular after switching);
almost-positive ⇒ (polar ⇔ supermodular) ⇒ (unimodular ⇔ supermodular after switching).]

Figure 13.2. Classes related to unimodular and supermodular cubic functions.

[Diagram of implications: (almost-positive ⇔ polar ⇔ supermodular) ⇒
(unate ⇔ unimodular ⇔ supermodular after switching).]

Figure 13.3. Classes related to unimodular and supermodular quadratic functions.

Unimodular functions were introduced by Hansen and Simeone [474]. Their
definition was directly stated in terms of total unimodularity, but the equivalence
of both definitions was observed by Crama [226] and by Simeone, de Werra, and
Cochand [835].
Clearly, almost-positive, unate, and polar functions are unimodular. Figure 13.1
summarizes the mutual relationships between several classes of functions. The sim-
pler diagram obtained for cubic functions is displayed in Figure 13.2. For quadratic
functions, the diagram shrinks even further, as shown in Figure 13.3.
Simeone, de Werra, and Cochand [835] proposed an efficient recognition algo-
rithm for unate functions given in polynomial form. Crama [224, 226] described a
polynomial-time algorithm that recognizes polar and unimodular functions; when
a function f is unimodular, this algorithm produces a switching set S and a polar
posiform of fS .

13.6.5 Threshold and unimodal functions


Hammer, Simeone, Liebling, and de Werra [464] have introduced a hierarchy
of pseudo-Boolean functions that generalize Boolean threshold functions (see
Chapter 9), and that also present interesting features in relation with local maxi-
mization algorithms (see Section 13.4.1). We briefly describe them in this section. All
pseudo-Boolean functions considered here are assumed to be injective on B^n: If
X, Y ∈ B^n and X ≠ Y, then f(X) ≠ f(Y).
Definition 13.17. A pseudo-Boolean function f on B^n is called threshold if, for all
r ∈ R, there exist n weights w1(r), w2(r), . . . , wn(r) ∈ R and a threshold t(r) ∈ R
such that, for all (x1, x2, . . . , xn) ∈ B^n,

$$ f(x_1, x_2, \ldots, x_n) \le r \quad \text{if and only if} \quad \sum_{i=1}^{n} w_i(r)\, x_i \le t(r). $$

So, for each value of r, there exists a hyperplane which separates the vertices
of B n where f takes value at most r from those where it takes value larger than r.
Definition 13.18. A pseudo-Boolean function f on B n is unimax if it has a unique
local maximum in B n . It is completely unimodal if, for each face F of Bn , the
restriction of f to F is unimax.
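For small n, both properties can be checked by brute force. The following is a minimal sketch, assuming that f is given as a Python callable on 0/1 tuples and is injective (so that local maxima are strict); the function names and the example tables `g` and `h` are hypothetical and serve only as illustrations.

```python
from itertools import product

def local_maxima(f, n, fixed=None):
    """Local maxima of f on the face of B^n where the coordinates in `fixed` are frozen."""
    fixed = fixed or {}
    free = [i for i in range(n) if i not in fixed]
    maxima = []
    for bits in product((0, 1), repeat=len(free)):
        x = [fixed.get(i, 0) for i in range(n)]
        for i, b in zip(free, bits):
            x[i] = b
        x = tuple(x)
        # Neighbors of x inside the face: flip exactly one free coordinate.
        nbrs = [x[:i] + (1 - x[i],) + x[i + 1:] for i in free]
        if all(f(x) > f(y) for y in nbrs):
            maxima.append(x)
    return maxima

def is_unimax(f, n):
    return len(local_maxima(f, n)) == 1

def is_completely_unimodal(f, n):
    # A face is described by a pattern: each coordinate is fixed to 0, fixed to 1, or free.
    for pattern in product((0, 1, None), repeat=n):
        fixed = {i: v for i, v in enumerate(pattern) if v is not None}
        if len(local_maxima(f, n, fixed)) != 1:
            return False
    return True

# A linear function with distinct values on B^3.
lin = lambda x: x[0] + 2 * x[1] + 4 * x[2]

# Two local maxima on B^2: (0, 0) and (1, 1).
g_table = {(0, 0): 3, (1, 1): 4, (0, 1): 1, (1, 0): 2}
g = lambda x: g_table[x]

# Unimax on B^3, but the face x3 = 0 reproduces g, so h is not completely unimodal.
h_table = {(0, 0, 0): 3, (1, 1, 0): 4, (0, 1, 0): 1, (1, 0, 0): 2,
           (0, 0, 1): 5, (1, 0, 1): 6, (0, 1, 1): 7, (1, 1, 1): 8}
h = lambda x: h_table[x]
```

The enumeration visits all 3^n faces and is only meant to make the definitions concrete; it says nothing about the complexity of recognizing these classes from a polynomial expression.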
This terminology is due to Hammer et al. [464], who proved that threshold
functions are completely unimodal. Completely unimodal functions were also
examined by Emamy-K. [311], Hoke [496, 497], and Wiedemann [907], and
unimax functions by Tovey [866, 867].
The main motivation for considering unimax functions is that local maximiza-
tion algorithms could be expected to perform well for such functions. Indeed, if
f is a unimax function, then the decision version of the maximization problem
is in NP ∩ co-NP, since the global maximum of f is “well-characterized” [866];
based on this observation, it has been conjectured that unimax functions can be
maximized in polynomial time. (Pardalos and Jha [730] proved that it is NP-hard
to find the global maximum of a quadratic pseudo-Boolean function even when
this global maximum is unique; however, this does not seem to have immediate
consequences for unimax functions.)
When f is completely unimodal, Hammer et al. [464] proved that there always
exists an increasing path of length at most n from any point X ∈ Bn to the maximum
of f . However, rather surprisingly, it has also been shown that simple local search
procedures may perform an exponential number of steps before they reach a local
(and global) maximum of a completely unimodal function; we refer to the above-
mentioned references or to papers by Björklund, Sandberg, and Vorobyov [93, 94,
95] for related investigations and for applications in game theory and computer-
aided verification.
Crama [226] proved that the recognition problem is NP-hard for threshold,
completely unimodal, and unimax functions expressed in polynomial form. The
question remains open, however, for quadratic unimax functions.

13.7 Exercises
1. Prove that the conjunction of two pseudo-Boolean elementary conjunctions
is an elementary conjunction.
2. Show that condition (ii) in Lemma 13.1 does not completely characterize
the prime implicants of a pseudo-Boolean function.
3. Consider the (simple) game associated with a Boolean function f , and
let βi denote the Banzhaf index of player i, as in Section 1.13.3.
Show that (β1, β2, . . . , βn) is proportional to the vector of first derivatives
(J1 f^pol(C), J2 f^pol(C), . . . , Jn f^pol(C)) evaluated at the point
C = (1/2, 1/2, . . . , 1/2). (See Owen [720].)
4. (a) Show that every point X ∈ U^n can be written in a unique way as a linear
combination of the form

$$ X = \sum_{k=1}^{K} \lambda_k X^k, \tag{13.62} $$

where λk > 0 (k = 1, 2, . . . , K) and X^1 ≤ X^2 ≤ . . . ≤ X^K are distinct points
in B^n.
(b) Show that the Lovász extension of a pseudo-Boolean function f can be
expressed as

$$ f^{L}(X) = \sum_{k=1}^{K} \lambda_k f(X^k) \quad \text{subject to (13.62)}. $$

5. Show that the value of a pseudo-Boolean function f may be arbitrarily
worse in a local maximum than in the global maximum of f, even when f
is assumed to be quadratic.
6. Show that the maximum of the max-cut function (13.1) is at least

$$ \frac{1}{2} \sum_{1 \le i < j \le n} c(i,j). $$

Conclude that every graph contains a cut of capacity at least equal to half
the sum of the edge capacities, and that such a cut can be found efficiently.
(See Erdős [312] and Sahni and Gonzalez [798].)
7. Prove that every pseudo-Boolean function f on B^n has a posiform ψ of the
form (13.51) such that b0 = min_{X ∈ B^n} f(X).
8. Prove that the optimal value of the linear relaxation of (13.45)–(13.50) is exactly
the maximum of the concave standard extension ψ^std (13.29).
9. Show that the hyperbolic or fractional programming problem

$$ \max_{X \in B^n} f(X) = \frac{a_0 + \sum_{j=1}^{n} a_j x_j}{b_0 + \sum_{j=1}^{n} b_j x_j} $$

can be solved in polynomial time if

$$ b_0 + \sum_{j=1}^{n} b_j x_j > 0 \quad \text{for all } X \in B^n, $$


but is NP-hard when this condition does not hold. (See Boros and Hammer
[127]; Hammer and Rudeanu [460]; Hansen, Poggi de Aragão, and Ribeiro
[473].)
10. Prove that it is co-NP-complete to decide whether a pseudo-Boolean function
expressed in DNF is monotone.
11. Prove Theorem 13.22.
12. Prove that the concave envelope of a supermodular pseudo-Boolean function
is its Lovász extension.
13. Show that the classes of almost-positive, supermodular, and polar functions
are not closed under switching.
14. Establish all the implications displayed in Figures 13.1, 13.2, and 13.3, and show
that they cannot be reversed.
15. Prove that threshold pseudo-Boolean functions are completely unimodal.
16. If f is a completely unimodal function on Bn , prove that there always exists
an increasing path of length at most n from any point X ∈ B n to the maximum
of f .
17. Prove that
(a) it is NP-hard to decide whether a quadratic pseudo-Boolean function
has a unique global maximum;
(b) it is NP-hard to find the maximum of a quadratic pseudo-Boolean function
even if we know that the global maximum is unique (Pardalos and
Jha [730]).

Question for thought


18. How difficult is it to recognize whether a quadratic pseudo-Boolean function
is unimax?
Appendix A

Graphs and hypergraphs

This appendix proposes a short primer on graph and hypergraph theory. It sums up
the basic concepts and terminology used in the remainder of the monograph. For
(much) more information, we refer the reader to numerous excellent books dealing
in-depth with this topic, such as Bang-Jensen and Gutin [51]; Berge [71, 72];
Brandstädt, Le, and Spinrad [152]; Golumbic [398]; Mahadev and Peled [645]; or
Schrijver [814].

A.1 Undirected graphs


An undirected graph, or graph for short, is a pair of finite sets G = (V , E) in which
V is the set of vertices of the graph, and E is a set of unordered pairs of vertices
called edges of the graph. Abiding by widespread conventions, we often use the
notation (u, v), or even simply uv, for an edge {u, v}. Occasionally, we consider
undirected graphs with loops, where a loop is an edge of the form (v, v) for v ∈ V
(we may view a loop as an edge of cardinality 1).
A graph can be represented as a diagram consisting of points (vertices) joined
by lines (edges), as in Figure A.1.
When e = (u, v) is an edge, we say that vertices u and v are adjacent, that u is a
neighbor of v, that u and v are incident to e, that u and v are the endpoints of e, and
so forth. The neighborhood of a vertex u ∈ V is the set N (u) = {v ∈ V : (u, v) ∈ E}.
The degree of u in G is the number of edges incident to u. We denote it by degG (u)
or simply deg(u).
Two graphs (V , E) and (W , A) are isomorphic if there exists a bijection
ψ:V → W such that, for all u, v ∈ V , (u, v) ∈ E if and only if (ψ(u), ψ(v)) ∈ A.
Intuitively, two graphs (V , E) and (W , A) are isomorphic if they can be represented
by the same diagram.
The complement of the loopless graph G = (V, E) is the graph Ḡ = (V, Ē), where
Ē = { (u, v) : u, v ∈ V, u ≠ v, (u, v) ∉ E }. So, the edges of Ḡ are exactly the
nonedges of G.
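These basic notions translate directly into code. The sketch below assumes one particular encoding, chosen only for illustration: a loopless graph is a set V of vertices together with a set E of two-element frozensets. The function names are hypothetical.

```python
from itertools import combinations

def neighborhood(E, u):
    """N(u): the vertices joined to u by an edge."""
    return {v for e in E if u in e for v in e if v != u}

def degree(E, u):
    """Number of edges incident to u (loopless graphs only)."""
    return sum(1 for e in E if u in e)

def complement(V, E):
    """Edges of the complement: all unordered pairs of distinct vertices not in E."""
    return {frozenset(p) for p in combinations(V, 2)} - set(E)

# The path on vertices 1, 2, 3, 4 with edges 12, 23, 34.
V = {1, 2, 3, 4}
E = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})}
E_bar = complement(V, E)   # edges 13, 14, 24
```

Taking the complement twice returns the original edge set, as expected from the definition.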

Figure A.1. Representation of a small graph.

A.1.1 Subgraphs
Let G = (V , E) be a graph. A graph H = (W , A) is a subgraph of G if W ⊆ V and
A ⊆ E. We say that H is the subgraph of G induced by W if A is exactly the set of
edges of G that have both of their endpoints in W ; namely, if A = {e ∈ E : e ⊆ W }.
We sometimes denote by GW the subgraph of G induced by W .
A subset of vertices S ⊆ V is said to be a stable set (or an independent set) of
G if S does not contain any edge of G. The subset S is a clique of G if every pair
of vertices of S is an edge. It is a transversal, or a vertex cover, if every edge in E
intersects S.
We denote by α(G) the maximum size of a stable set of G; by ω(G), the
maximum size of a clique of G; and by τ (G), the minimum size of a vertex cover.
A subset of edges M ⊆ E is called a matching of G if the edges in M are
pairwise disjoint. A matching is perfect if it contains |V|/2 edges, that is, if every
vertex of G is incident to an edge of the matching.
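The definitions of this section can be checked mechanically on small examples. The following sketch assumes, for illustration only, that edges are stored as two-element frozensets and that vertex subsets are Python sets; all function names are hypothetical.

```python
from itertools import combinations

def is_stable(E, S):
    """S contains no edge of the graph."""
    return not any(e <= S for e in E)

def is_clique(E, S):
    """Every pair of vertices of S is an edge."""
    return all(frozenset(p) in E for p in combinations(S, 2))

def is_vertex_cover(E, S):
    """Every edge intersects S."""
    return all(e & S for e in E)

def is_matching(M):
    """The edges of M are pairwise disjoint."""
    return all(not (e & f) for e, f in combinations(M, 2))

def is_perfect_matching(V, E, M):
    return set(M) <= set(E) and is_matching(M) and 2 * len(M) == len(V)

# The circuit on vertices 1, 2, 3, 4.
V = {1, 2, 3, 4}
E = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4}), frozenset({1, 4})}
M = [frozenset({1, 2}), frozenset({3, 4})]
```

On this example, {1, 3} is stable and its complement {2, 4} is a vertex cover, which reflects the general fact that S is stable exactly when V \ S is a transversal.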

A.1.2 Paths and connectivity


A graph can simply be viewed as a symmetric binary relation on its set of vertices.
But the “pictorial” representation of a graph as a diagram of points (vertices)
and lines (edges) naturally places the emphasis on topological notions like paths,
cycles, or connectivity.
A walk of length k in a graph G = (V , E) is a sequence

C = (v1 , e1 , v2 , e2 , v3 , . . . , vk , ek , vk+1 ) (A.1)

in which k ≥ 0, v1, v2, . . . , vk+1 are vertices, e1, e2, . . . , ek are edges, and ei =
(vi, vi+1) for i = 1, 2, . . . , k. It can also be denoted as C = (v1, v2, v3, . . . , vk, vk+1)
or C = (e1 , e2 , . . . , ek ) when no confusion arises. The vertices v1 and vk+1 are the
endpoints of the walk C, and we say that they are connected by the walk. The walk
is closed if v1 = vk+1 .
The walk (A.1) is a path if all its vertices (and hence, all its edges) are distinct:
vi ≠ vj for 1 ≤ i < j ≤ k + 1. The walk (A.1) is a circuit if it is closed (v1 = vk+1),
if v1, v2, . . . , vk are all distinct, and if e1, e2, . . . , ek are all distinct.
A connected component of G = (V , E) is a maximal subset S ⊆ V such that,
for all u, v ∈ S, u and v are the endpoints of a path in G. So, connected components
are the equivalence classes of the equivalence relation “u and v are connected by
a path.” A graph is connected if it has a unique connected component.
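Connected components can be computed by a simple graph search. The sketch below is one possible way to do it, assuming a loopless graph encoded as a vertex set V and a set E of two-element frozensets; the function name is hypothetical.

```python
def connected_components(V, E):
    """Return the connected components of the graph (V, E) as a list of vertex sets."""
    adj = {v: set() for v in V}
    for e in E:
        u, v = tuple(e)          # each edge has exactly two endpoints
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for s in V:
        if s in seen:
            continue
        comp, stack = {s}, [s]   # depth-first search from s
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        components.append(comp)
    return components

V = {1, 2, 3, 4, 5}
E = {frozenset({1, 2}), frozenset({2, 3}), frozenset({4, 5})}
components = connected_components(V, E)   # {1, 2, 3} and {4, 5}, in some order
```

A graph is then connected exactly when the returned list has a single element.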

A.1.3 Special classes of graphs


In this section, we introduce a few classes of graphs with special properties. More
classes are defined in several chapters throughout the book.
First, we denote by Pn the path with vertex set N = {1, 2, . . . , n} and with edges
(i, i + 1) for i = 1, 2, . . . , n − 1. Similarly, we denote by Cn the circuit with vertex
set N = {1, 2, . . . , n} and with edges (1, n) and (i, i + 1), i = 1, 2, . . . , n − 1.
The graph G = (V , E) is complete if E = { (u, v) : u, v ∈ V }, that is, if V is a
clique of G. We denote by Kn the complete graph on N = {1, 2, . . . , n}.
The graphs P4 , C4 and K4 are represented in Figure A.2.
The graph G = (V , E) is bipartite if there exists a partition of V into two subsets
B, R (say, blue and red) such that every edge of G has one blue endpoint and one
red endpoint, namely,
E ⊆ { (u, v) : u ∈ B, v ∈ R }. (A.2)
The graph is called complete bipartite if E is exactly equal to the right-hand side of
(A.2). For example, the graphs P4 and C4 are bipartite, and C4 is complete bipartite
(see Figure A.2). A star is a complete bipartite graph such that |B| = 1.
A graph is a forest if it contains no circuit. A tree is a connected forest. It is
easy to see that, in a tree, there always exists a unique path between any pair of
vertices. A rooted tree is a pair (T , r), where T = (V , E) is a tree and r ∈ V is a
distinguished vertex called the root of T . A small rooted tree is shown in Figure
A.3. Let
P = (v1 , e1 , v2 , e2 , v3 , . . . , vk , ek , vk+1 )
be a path in a rooted tree, with v1 = r, and let vj be one of the vertices in P , with
1 < j < k + 1. Then, we say that
• vj−1 is the (unique) father of vj;
• v1, v2, . . . , vj−1 are the ancestors of vj;
• vj+1 is a child (not necessarily unique) of vj;
• vj+1, vj+2, . . . , vk+1 are successors of vj.

Figure A.2. (a): P4, (b): C4, (c): K4.

Figure A.3. A tree rooted at r.

A.2 Directed graphs


A directed graph, or digraph for short, is a pair of finite sets D = (V , A) where
V is the set of vertices of the digraph and A is a collection of ordered pairs of
vertices, called arcs. We think of every arc (u, v) as being directed from its tail u
to its head v. A loop is an arc of the form (u, u).
The outdegree of vertex u is the number of arcs “leaving” u (that is, with tail
u), and the indegree of vertex u is the number of arcs “entering” u (that is, with
head u).
Every digraph can be obtained by orienting the edges of a graph G = (V , E),
that is, by replacing every edge {u, v} of G by one (or both) of the arcs (u, v), (v, u).
Conversely, by disregarding the orientation of its arcs, each digraph D = (V , A)
gives rise to the underlying (undirected) graph G = (V , E), where E = {{u, v} |
(u, v) ∈ A}.
Most of the notions presented in the previous section can be extended to directed
graphs.

A.2.1 Directed paths and connectivity


A directed walk of length k in a digraph D = (V , A) is a sequence P =
(v1 , a1 , v2 , a2 , v3 , . . . , vk , ak , vk+1 ), where k ≥ 0, v1 , v2 , . . . , vk+1 are vertices,
a1, a2, . . . , ak are arcs, and ai = (vi, vi+1) for i = 1, 2, . . . , k. The directed walk
P is closed if v1 = vk+1. It is a directed path, or dipath, if all its vertices (and
hence, all its arcs) are distinct. It is a cycle if k ≥ 1, v1 , v2 , . . . , vk are all distinct,
v1 = vk+1 , and a1 , a2 , . . . , ak are all distinct.
If there is a dipath from u to v in D, then we say that u is an ancestor of v, and
that v is a successor of u.
A strongly connected component, or strong component, of D = (V , A) is a
maximal subset S ⊆ V such that, for every pair u, v of distinct vertices in S, there
is a directed path from u to v and a directed path from v to u in D. We say that D
is strongly connected if V is its unique strong component. We simply say that D
is connected if its underlying undirected graph is connected.
The condensation of digraph D = (V , A) is the digraph D̂ = (V̂ , Â), where the
elements of V̂ are the strong components of D, and (S1 , S2 ) ∈ Â if there is at least
one arc in D from some vertex of S1 to some vertex of S2 . It is easy to see that D̂
is an acyclic digraph.
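Strong components and the condensation can be computed naively from mutual reachability. The sketch below favors brevity over efficiency (linear-time algorithms exist) and makes two assumptions: arcs are stored as ordered pairs, and the condensation only records arcs between distinct components, so that it has no loops. All names are hypothetical.

```python
def reachable(V, A, s):
    """Vertices that can be reached from s by a directed walk (s included)."""
    succ = {v: set() for v in V}
    for u, v in A:
        succ[u].add(v)
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def strong_components(V, A):
    reach = {v: reachable(V, A, v) for v in V}
    components, assigned = [], set()
    for u in V:
        if u in assigned:
            continue
        # v belongs to the component of u when each can reach the other.
        comp = frozenset(v for v in reach[u] if u in reach[v])
        components.append(comp)
        assigned |= comp
    return components

def condensation(V, A):
    components = strong_components(V, A)
    where = {v: S for S in components for v in S}
    arcs = {(where[u], where[v]) for u, v in A if where[u] != where[v]}
    return set(components), arcs

V = {1, 2, 3, 4}
A = {(1, 2), (2, 1), (2, 3), (3, 4), (4, 3)}
V_hat, A_hat = condensation(V, A)
```

Here the strong components are {1, 2} and {3, 4}, and the condensation consists of the single arc from the first to the second.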

A.2.2 Special classes of digraphs


An arborescence rooted at r is a pair (T, r), where T = (V, A) is a digraph and
r ∈ V is a distinguished vertex of T such that, for every v ∈ V, there exists a unique
directed path from r to v in T.
If (T , r) is a rooted (undirected) tree, where T = (V , E), we obtain an arbores-
cence ((V , A), r) as follows: For every edge {u, v} ∈ E, if u is the father of v, then
we create the arc (u, v) in A (i.e., we orient every edge from father to son); this con-
struction is illustrated in Figure A.4 for the tree of Figure A.3. Every arborescence
arises in this way.
A digraph D = (V , A) is transitive if the following implication holds for all
u, v, w ∈ V :
(u, v) ∈ A and (v, w) ∈ A ⇒ (u, w) ∈ A.
A DAG is a directed acyclic graph, that is, a directed graph without cycles. Every
DAG D has at least one vertex with indegree 0, called a source of D, and at least
one vertex with outdegree 0, called a sink or leaf of D. A topological ordering
of a DAG D = (V , A) is a bijection σ : V → {1, 2, . . . , n} such that σ (u) < σ (v)
when (u, v) ∈ A. Every DAG has a topological ordering.
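A topological ordering can be built by repeatedly removing a source. The following sketch uses this classical idea (often attributed to Kahn, which is our attribution rather than the book's); arcs are ordered pairs and the function name is hypothetical. It returns None when the digraph contains a cycle.

```python
def topological_ordering(V, A):
    """Return a dict sigma with sigma[u] < sigma[v] for every arc (u, v), or None."""
    indegree = {v: 0 for v in V}
    succ = {v: [] for v in V}
    for u, v in A:
        succ[u].append(v)
        indegree[v] += 1
    sources = [v for v in V if indegree[v] == 0]
    sigma = {}
    while sources:
        u = sources.pop()
        sigma[u] = len(sigma) + 1          # next free position in 1, 2, ..., n
        for v in succ[u]:
            indegree[v] -= 1               # "remove" the arc (u, v)
            if indegree[v] == 0:
                sources.append(v)
    return sigma if len(sigma) == len(V) else None

V = {"a", "b", "c", "d"}
A = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
sigma = topological_ordering(V, A)
```

Several orderings may be valid for the same DAG, so the sketch only guarantees the defining property σ(u) < σ(v) for every arc (u, v).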

A.2.3 Transitive closure and transitive reduction


The transitive closure of a digraph D = (V , A) is the smallest transitive digraph
D ∗ that contains D as a subgraph; in other words, the transitive closure of D is
the digraph D ∗ = (V , A∗ ), where A∗ contains all the arcs (u, v) ∈ V × V such that
there is a directed path from u to v in D.
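The transitive closure can be computed by a Warshall-type procedure. This sketch assumes that arcs are ordered pairs and that only directed paths of length at least 1 are considered, so that an arc (u, u) appears in the closure only when u lies on a closed walk; the function names are hypothetical.

```python
def transitive_closure(V, A):
    """Arc set of the transitive closure of the digraph (V, A)."""
    closure = set(A)
    for w in V:                                          # w is the pivot vertex
        into = [u for u in V if (u, w) in closure]
        out = [v for v in V if (w, v) in closure]
        closure |= {(u, v) for u in into for v in out}   # shortcut u -> w -> v
    return closure

def is_transitive(A):
    return all((u, w) in A for (u, v) in A for (v2, w) in A if v == v2)

V = {1, 2, 3, 4}
A = {(1, 2), (2, 3), (3, 4)}
A_star = transitive_closure(V, A)
```

For the directed path 1 → 2 → 3 → 4, the closure adds the arcs (1, 3), (2, 4), and (1, 4), and the resulting arc set is transitive.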
A transitive reduction of the digraph D = (V, A) is any digraph D′ = (V, A′)
such that the transitive closure of D is equal to the transitive closure of D′, and
such that the cardinality of A′ is minimum with this property. If D is acyclic, then
D has a unique transitive reduction.

Figure A.4. An arborescence rooted at r.

A.3 Hypergraphs
A hypergraph, or set system, is a pair of sets H = (V , E), where V is the set of
vertices of H, and the elements of E are subsets of V called edges (or hyperedges)
of the hypergraph.
Hypergraphs constitute a natural generalization of (undirected) graphs: Indeed,
a graph is nothing but a hypergraph with edges of cardinality 2. As such, many of
the concepts introduced for graphs can be extended (often in more than one way)
to hypergraphs.
For instance, a subset of vertices is said to be stable in H if it does not contain
any edge of H, and it is a transversal of H if it intersects every edge of H. A
matching is a set of pairwise disjoint edges of H.
A clutter (or Sperner family, or simple hypergraph) is a hypergraph H = (V , E)
with the property that no edge is a subset of another edge: If A ∈ E, B ∈ E and
A ≠ B, then A ⊈ B.
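As a small illustration, the hypergraph notions above can be tested directly from their definitions. The sketch assumes that a hypergraph is given by a list of edges, each an arbitrary collection of vertices; the function names are hypothetical.

```python
def is_clutter(edges):
    """No edge is a subset of another (distinct) edge."""
    family = {frozenset(e) for e in edges}
    return all(not (a <= b) for a in family for b in family if a != b)

def is_stable(edges, S):
    """S contains no edge of the hypergraph."""
    return not any(frozenset(e) <= S for e in edges)

def is_transversal(edges, S):
    """S intersects every edge of the hypergraph."""
    return all(frozenset(e) & S for e in edges)

H = [{1, 2}, {2, 3}, {1, 3, 4}]       # a clutter
H_bad = [{1, 2}, {1, 2, 3}]           # {1, 2} is contained in {1, 2, 3}
```

For H, the set {1, 2} is a transversal but not a stable set, whereas {1, 4} is both stable and a transversal.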
Appendix B

Algorithmic complexity

By and large, we assume that the readers of this book have at least some intu-
itive knowledge about algorithms and complexity. For the sake of completeness,
however, we provide in this appendix an informal introduction to fundamental
concepts of computational complexity: problems, algorithms, running time, easy
and hard problems, etc. For a more thorough and rigorous introduction to this
topic, we refer the reader to the classical monograph by Garey and Johnson [371],
or to other specialized books like Aho, Hopcroft and Ullman [11], Papadim-
itriou [725], or Papadimitriou and Steiglitz [726]. Note that Cook et al. [211]
and Schrijver [814] also provide gentle introductions to the topic, much in the
spirit of this appendix.
In Section B.8, we propose a short primer on the complexity of list-generating
algorithms; such algorithms are usually not discussed in basic textbooks on
complexity theory, but they arise naturally in several chapters of our book.

B.1 Decision problems


Intuitively speaking, an algorithmic or computational problem is a generic question
whose formulation contains a number of undetermined parameters. For instance,
we can think of “addition” as the generic problem of adding two numbers (the
numbers themselves must be specified before any specific computation can be
performed). Similarly, “solving quadratic equations” of the form ax^2 + bx + c = 0
is a problem which can be handled by an appropriate algorithm as soon as the
numerical values of the parameters a, b, and c are known. In order to express these
concepts more precisely, we need to explain how problems are stated and how
their parameters are specified.
An alphabet is a finite set I, and a word on I is a finite (ordered) sequence of
symbols from I. For instance, if I is the binary alphabet B = {0, 1}, then examples
of words on I are: 000, 10100, 0011. If I consists of the Roman alphabet and
of some usual typographical symbols, namely I = {a, b, c, . . . , z, !, ; , −, . . .}, then
do−not−disturb or dsfhuhf;;jseee are examples of words on I. The
size of a word W is the number of symbols in W; we denote it by |W| and we
denote by I∗ the set of all words (of any size) on I.
When we fix an arbitrary alphabet I, the words in I ∗ can be used to encode
many types of objects, such as a data set of numbers in binary format, or a text
in natural language, or an algebraic equation, or a Boolean expression. Think of a
word as the input string which is read by a computer program. Then, intuitively,
a “problem” is a question that is asked about the input string: What is the largest
number in the data set? Does the text contain the word do−not−disturb?
Does the Boolean expression represent the constant 1?
In particular, we say that a question about an input string is a decision problem
if the answer to the question is either “Yes” or “No”. More formally, a deci-
sion problem is simply defined as a subset D of I ∗ . (Since decision problems
are sets of words, they are also called languages.) In this context, an arbi-
trary word is called an instance (or input) of the problem. The word W is a
“Yes-instance” for the decision problem D if W ∈ D, and it is a “No-instance”
otherwise.
In our informal description of problems, we often use the following type of
presentation:

Problem D
Instance: A word W ∈ I ∗ .
Question: Is W contained in D?

To make things concrete, let us give some examples of problems:

Quadratic equations
Instance: Three integers a, b, c ∈ N.
Question: Does the equation ax^2 + bx + c = 0 have a solution in R?

Tree
Instance: A graph G = (V , E).
Question: Is G a tree?

Hamiltonian graph
Instance: A graph G = (V , E).
Question: Does G contain a Hamiltonian circuit, that is, does G contain a circuit
that visits every vertex exactly once?

DNF Equation
Instance: A DNF expression φ(X).
Question: Is the equation φ(X) = 0 consistent?
In the previous examples, we have (implicitly) assumed

D_Quad = {W ∈ I∗ : W represents a triplet (a, b, c) ∈ N^3 such that
ax^2 + bx + c = 0 has a solution in R},
D_Tree = {W ∈ I∗ : W represents a tree},
D_Hamilton = {W ∈ I∗ : W represents a Hamiltonian graph},
D_DNF = {W ∈ I∗ : W represents a DNF expression φ such that φ(X) ≢ 1}.

B.2 Algorithms
In order to solve a problem, we like to rely on an algorithm, that is, on a step-
by-step procedure that describes how to compute a solution for each instance of
the problem. Thus, for the problem Quadratic equations, the algorithm may
consist in computing the discriminant ρ = b^2 − 4ac, in testing whether ρ ≥ 0, and in
returning the answer either “Yes” or “No” depending on the outcome of the test.
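The procedure just described fits in a few lines. The sketch below works on integers, so that the test is exact; the handling of the degenerate case a = 0 is an assumption of the sketch, since the text does not discuss it, and the function name is hypothetical.

```python
def quadratic_has_real_root(a, b, c):
    """Decide whether a*x^2 + b*x + c = 0 has a solution in R (integer coefficients)."""
    if a == 0:
        # Degenerate case (assumption): the equation reduces to b*x + c = 0.
        return b != 0 or c == 0
    rho = b * b - 4 * a * c      # rho = b^2 - 4ac
    return rho >= 0
```

For instance, the instance (1, 0, 1), which encodes x^2 + 1 = 0, is a No-instance, whereas (1, 3, 2) is a Yes-instance.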
More formally, algorithms (and computers) can be modelled in many different
ways, such as Turing machines or random access machines (RAMs). A sketchy
description of Turing machines [11, 371, 725] will suffice for our purpose. (In
fact, we only need this description for the proof of Cook’s theorem, in Section B.7
hereunder. So, the reader may choose to skip the following definitions in a first
reading and to return to them later if necessary.)
A one-tape Turing machine A consists of

• a “processor,” which is always in one of a finite number of “states”; the set of
states, say Q, contains three distinguished states, namely, the “initial state”
q0, and the “final states” qY (for “Yes”) and qN (for “No”);
• a single one-dimensional “tape,” to be viewed as memory space, which
contains an infinite number of “cells” indexed by the integers in Z; at any
time, each cell of the tape can hold at most one symbol from the alphabet
I0 = I ∪ {Z}, where Z is a special “blank” symbol;
• a “read-write (RW) head,” which can move along the tape and scan any of
its cells; when the RW head scans a cell, it can read the symbol marked in
the cell and/or replace it by a new symbol;
• a “transition function” T : Q × I0 → Q × I0 × {−1, +1}, to be viewed as a
primitive program.

A Turing machine operates on words of I∗ according to the following recursive
rules. Initially, the processor is in state q0, the input word W is written in the
adjacent cells 1, 2, . . . , |W | of the tape (one symbol per cell), all the other cells
contain the blank symbol Z, and the RW head scans the leftmost symbol of W in
cell 1. Suppose now that, at the start of iteration i, the processor is in state q ∈ Q,
and the RW head scans cell k ∈ Z, where k contains the symbol σ ∈ I0 . If q = qY or
q = qN, then the machine ends its computation. Otherwise, let T(q, σ) = (q′, σ′, m),
where m ∈ {−1, +1}. Then, the processor changes its state from q to q′, the RW
head replaces the symbol σ by σ′ in cell k, and the head moves to cell k + m (that
is, it moves either one step to the left or one step to the right). Iteration i + 1 can
begin.
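The operating rules above can be simulated directly. In the following sketch, the transition function is a dictionary, the character "_" stands in for the blank symbol, and a step limit (`max_steps`, an addition of the sketch) guards against endless computations; the example machine, which accepts binary words containing an even number of 1s, is hypothetical and only serves as an illustration.

```python
from collections import defaultdict

BLANK = "_"   # stand-in for the blank symbol of the text

def run(T, word, q0="q0", qY="qY", qN="qN", max_steps=10_000):
    """Simulate a one-tape Turing machine; return (accepted, number of iterations)."""
    tape = defaultdict(lambda: BLANK)      # all cells are blank unless written
    for k, symbol in enumerate(word, start=1):
        tape[k] = symbol                   # the input occupies cells 1, ..., |W|
    q, k, steps = q0, 1, 0                 # initial state, head on cell 1
    while q not in (qY, qN):
        if steps == max_steps:
            return None, steps             # give up: the machine may never halt
        q_next, symbol_next, m = T[(q, tape[k])]
        tape[k] = symbol_next              # write, then change state and move the head
        q = q_next
        k += m
        steps += 1
    return q == qY, steps

# State q0: even number of 1s read so far; state q1: odd number.
T = {
    ("q0", "0"): ("q0", "0", +1),
    ("q0", "1"): ("q1", "1", +1),
    ("q1", "0"): ("q1", "0", +1),
    ("q1", "1"): ("q0", "1", +1),
    ("q0", BLANK): ("qY", BLANK, +1),
    ("q1", BLANK): ("qN", BLANK, +1),
}
```

This machine scans its input once from left to right and halts on the first blank cell, so it performs |W| + 1 iterations on every word W.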
We say that the Turing machine A accepts the word W ∈ I ∗ if it halts in state
qY when applied to W . The set of words (that is, the language) accepted by A is, by
definition, a decision problem DA . Note that when A is applied to an input word
that does not belong to DA , then A may either halt in the state qN , or it may go on
computing forever. Since we are not fond of endless computations, we introduce
one more concept: Namely, we say that the Turing machine A solves the decision
problem D if D = DA and if A halts for all inputs W ∈ I ∗ . Thus, A returns the
answer “Yes” when W ∈ D, and it returns “No” otherwise.
We also note, for the record, that if a Turing machine A halts for all inputs
W ∈ I ∗ , then it can be used to compute a function fA : I ∗ → I ∗ , where fA (W )
is the word written on the tape when A halts, disregarding all blank symbols.
Despite its apparent simplicity, the Turing machine model is surprisingly power-
ful and can be used to simulate complex computations, such as those performed by
real-world computers. Therefore, in the remainder of this appendix and throughout
most of the book, we do not distinguish between “algorithms” and Turing machines
unless the distinction is absolutely required. We refer again to the literature cited
earlier for a discussion of the relation between Turing machines and other mod-
els of computation. Roughly speaking, however, the basic idea is that all these
models are “essentially equivalent” from the point of view of their computational
efficiency. Which brings us to our next topic...

B.3 Running time, polynomial-time algorithms, and the class P


The running time of a computer program on a given data set can be influenced
by many factors, including the speed of the CPU, the skill of the programmer,
the features of the programming language and of the compiler, and so on. But
essentially, it is directly related to the number of elementary operations performed
by the underlying algorithm and to the size of the data set. These observations
motivate the following definitions.
Consider a problem D and a Turing machine (or an algorithm) A that solves
D. For every W ∈ I ∗ , the running time of A on the input W is the number of
iterations performed by A on W before it halts.
The running time function RA (n) of A denotes the worst-case running time of
A over all input words of size n, that is,
RA (n) = max{r : r is the running time of A on a word W ∈ I ∗ such that |W | = n}.
The function RA (n) is sometimes called the time complexity function of A, or
simply the complexity of A.
Algorithm A runs in polynomial time if there exists a polynomial p(n) such
that RA(n) ≤ p(n) for all n ∈ N. The complexity of a polynomial-time algorithm
does not increase too fast with the size of the instances that it solves: We consider
such an algorithm to be efficient.
The complexity class P contains the set of all problems that can be solved by a
polynomial-time algorithm (or Turing machine):

P = {D : D is a decision problem and there is a polynomial-time algorithm
that solves D}.

The class P is of paramount importance in the theory of computation, so much so
that, for combinatorial algorithmic problems, the qualifiers “solved in polynomial
time,” “well-solved,” or “efficiently solved,” have become quasi-synonymous. We
refer again to [11, 371, 725, 726, 814] for a more thorough discussion.
By analogy with time complexity, one can also define the space complexity of
a Turing machine A by reference to the number of cells scanned by the RW head
until it halts. We do not make much use of this concept in the book.

B.4 The class NP


Another important complexity class is the class NP, where the initial “N” stands for
“nondeterministic” and “P” stands for “polynomial.” To understand its definition,
consider again your favorite decision problem, say, DNF Equation as defined in
Section B.1, and consider an instance φ of this problem, where

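$$ \phi = \bar{x}_1 x_2 \bar{x}_3 \vee x_1 \bar{x}_2 x_3 \vee \bar{x}_1 \bar{x}_2 x_4 \vee \bar{x}_1 x_3 \vee x_2 x_3 x_4 \vee x_4 x_5 x_6 \vee \bar{x}_4 \bar{x}_5 \bar{x}_6 \vee x_1 x_3 \bar{x}_4 \vee x_3 x_5 \bar{x}_6. $$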

It may not be easy for you to decide whether the equation φ = 0 is consistent or
not. (Try!) But since we are nice people, we can provide some help: In fact, we
can assure you of the existence of a solution, and we can even convince you easily
that we are not lying. Indeed, X ∗ = (1, 0, 0, 1, 0, 0) is a solution.
Now, a crucial point in this example is that you do not need to know how we
have found the solution in order to convince yourself of its correctness: We may
have stumbled upon it by chance (nondeterministically), or guessed it otherwise.
What matters is that, once you hold the candidate X ∗ , it is easy to check that the
equation φ = 0 is indeed consistent. (This situation is not as strange as it may
initially appear; mathematicians, in particular, do not usually have to explain how
they came up with the proof of a new theorem: Their professional community only
requires that they be able to verify the validity of the alleged proof.)
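
The verification step can be made concrete with a short Python sketch. The encoding of a DNF as a list of terms, each term being a list of signed indices (k for x_k, −k for its complement), and the function names are illustrative assumptions; so is the placement of the complements in the nine terms, which is our reading of the instance φ. Checking a candidate point takes time linear in the size of φ, whatever the way in which the candidate was obtained.

```python
# Each term is a list of signed indices: k stands for x_k, -k for its complement.
# The complements below are an assumption about the instance phi of this section.
PHI = [
    [-1, 2, -3], [1, -2, 3], [-1, -2, 4], [-1, 3], [2, 3, 4],
    [4, 5, 6], [-4, -5, -6], [1, 3, -4], [3, 5, -6],
]

def literal_value(lit, point):
    """Value of a signed literal at a 0/1 point (variables are 1-indexed)."""
    value = point[abs(lit) - 1]
    return value if lit > 0 else 1 - value

def dnf_value(dnf, point):
    """Evaluate a DNF: 1 if some term has all its literals equal to 1."""
    return int(any(all(literal_value(l, point) for l in term) for term in dnf))

def is_certificate(dnf, point):
    """True when `point` certifies that the equation dnf = 0 is consistent."""
    return dnf_value(dnf, point) == 0

x_star = (1, 0, 0, 1, 0, 0)
print(is_certificate(PHI, x_star))   # every term vanishes at x_star: prints True
```

Note that the sketch only verifies a given certificate; it says nothing about how hard it is to find one.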
Let us now generalize this idea. We say that a decision problem D ⊆ I∗ is in
the class NP if there exists a problem D′ ∈ P and a polynomial p(n) such that, for
every word W ∈ I∗, the following statements are equivalent:
(a) W ∈ D, that is, W is a Yes-instance of D.
(b) There exists a certificate V ∈ I∗ such that |V| ≤ p(|W|) and such that
(V, W) ∈ D′.

To relate this formal definition to the previous discussion, note that for every
Yes-instance W ∈ D, there must exist a certificate V (in our previous example, a
candidate solution X∗) that is reasonably short relative to W (this is ensured by
the condition |V| ≤ p(|W|)), and such that checking the condition W ∈ D boils
down to verifying that (V, W) ∈ D′ (in our example, verifying that φ(X∗) = 0).
Moreover, the condition (V, W) ∈ D′ must be testable in polynomial time; this is
ensured by the assumption that D′ ∈ P.
It is easy to see that P ⊆ NP, meaning that every polynomially solvable problem
is in NP. Indeed, if D ∈ P, then it suffices to choose D′ = D and p(n) ≡ 0 in the
definition of NP (with V the empty string).
It is also quite obvious that the problem DNF Equation is in NP, just like
Hamiltonian graph and numerous other combinatorial problems (for example,
any Hamiltonian circuit can be used to certify that a graph is Hamiltonian). To
date, however, nobody has been able to devise a polynomial-time algorithm for
DNF Equation or for Hamiltonian graph; that is, nobody knows whether these
problems are in P or in NP\P.
The vast majority of mathematicians and computer scientists actually believe
that P ≠ NP, but this famous conjecture has resisted all proof attempts (and there
have been many) since the early 1970s. To better appreciate this conjecture, it is
useful to introduce the concepts of polynomial-time reductions and of NP-complete
problems.

B.5 Polynomial-time reductions and NP-completeness


It is common practice in mathematics to establish that a problem D can be viewed
as a “special case” of another problem D′, and to solve D by an algorithm originally
designed for the more general problem D′.
In our context, we say that a decision problem D is (polynomially) reducible
to a decision problem D′ if there is a polynomial-time algorithm A that computes,
for any input word W ∈ I∗, another word f_A(W) = W′ ∈ I∗ such that

W ∈ D if and only if W′ ∈ D′.

The algorithm A that transforms any instance of D into an instance of D′ is called
a polynomial-time reduction of D to D′.
Note that, if D is reducible to D′ and if D′ belongs to P, then D also belongs to
P. It is slightly less obvious, but equally true, that if D is reducible to D′ and if D′
belongs to NP, then D belongs to NP.
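
To illustrate how a reduction is used, the following Python sketch reduces the satisfiability question for a CNF to DNF Equation by De Morgan's laws, and then answers the original question by calling a solver for the target problem. The function names and the encoding of clauses as lists of signed indices are hypothetical choices made for this example. Only the transformation runs in polynomial time; the solver shown here is a deliberately naive enumeration, included so that the sketch can be executed.

```python
from itertools import product

def reduce_cnf_sat_to_dnf_equation(cnf):
    """Polynomial-time transformation f_A: complement every literal.

    A CNF is a list of clauses, each clause a list of signed indices
    (k for x_k, -k for its complement). By De Morgan's laws, the returned
    DNF is the complement of the CNF, so the CNF is satisfiable if and
    only if the equation dnf = 0 is consistent.
    """
    return [[-lit for lit in clause] for clause in cnf]

def dnf_equation_is_consistent(dnf, n):
    """Brute-force solver for the target problem (exponential in n)."""
    def term_is_one(term, point):
        return all((point[abs(l) - 1] == 1) == (l > 0) for l in term)
    return any(
        not any(term_is_one(term, point) for term in dnf)
        for point in product((0, 1), repeat=n)
    )

def cnf_is_satisfiable(cnf, n):
    """Solve the source problem by composing the reduction with the solver."""
    return dnf_equation_is_consistent(reduce_cnf_sat_to_dnf_equation(cnf), n)

print(cnf_is_satisfiable([[1, 2], [-1, 2], [-2, 3]], 3))   # a satisfiable CNF
print(cnf_is_satisfiable([[1], [-1]], 1))                  # an unsatisfiable CNF
```

If the enumeration were replaced by a polynomial-time algorithm for the target problem, the composed procedure would itself run in polynomial time, which is exactly the argument used in the text.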
Now, a problem D is called NP-complete if D belongs to NP and every problem in NP
is reducible to D. So, NP-complete problems can be viewed as the most general, or the hardest,
problems in NP (they are “complete” in the sense that they “contain” every other
problem of NP as a subproblem). It is not obvious, however, that NP-complete
problems should actually exist. Cook’s fundamental contribution was to demon-
strate the existence of at least one natural NP-complete problem, namely, DNF
Equation [208] (see also Levin [610]). We sketch a proof of this seminal result
later, in Section B.7.
Once we get hold of a first NP-complete problem D, it becomes easier to
establish that another problem D′ is also NP-complete: Indeed, to reach this conclusion,
it suffices to prove that D′ is at least as hard as D, or, more precisely,
that D is reducible to D′. This type of reduction has been provided for thousands
of decision problems, starting with the work of Cook [208] and Karp [550]; see
also Ausiello et al. [36], Crescenzi and Kann [244], or Garey and Johnson [371].
Several examples of NP-completeness proofs are given in the book.
Note also that the existence of NP-complete problems has interesting conse-
quences for the “P vs. NP” question stated above: Namely, to validate the conjecture
that P ≠ NP, it is sufficient to prove that at least one NP problem cannot be solved
in polynomial time, and NP-complete problems are most natural candidates for
this purpose. Moreover, the equality P = NP holds if and only if at least one
NP-complete problem happens to be polynomially solvable.

B.6 The class co-NP


The definition of the class NP in Section B.4 displays a striking asymmetry between
“Yes-instances” and “No-instances” of decision problems. In fact, this apparent
anomaly is well-grounded. Indeed, we have been able to argue that a problem like
DNF Equation is in NP by observing that, when the DNF equation φ(X) = 0 is
consistent, any solution X ∗ provides a concise certificate of consistency (remember
the small example in Section B.4). But when a DNF equation is not consistent, we
may be hard put to provide a short proof of inconsistency.
As a consequence of this observation, we can introduce a new complexity class,
to be called co-NP, by reversing the roles of “Yes-instances” and “No-instances”
in the definition of the class NP. Equivalently, we say that a decision problem D
belongs to co-NP if and only if its complementary problem (I ∗ \ D) belongs to
NP. Since D ∈ P trivially implies that (I ∗ \ D) ∈ P, and since P ⊆ NP, we can also
conclude that
P ⊆ NP ∩ co-NP.
Problems in NP ∩ co-NP have short, polynomially verifiable certificates for both
positive and negative instances. Therefore, these problems are sometimes called
“well-characterized.” Such problems are frequently known to belong to P as well,
but the question of whether P = NP ∩ co-NP remains open.
Co-NP-complete problems can be defined by analogy with NP-complete prob-
lems, namely: Problem D is co-NP-complete if and only if D belongs to co-NP
and every problem in co-NP is reducible to D. Equivalently, D is co-NP-complete
exactly when its complementary problem (I ∗ \ D) is NP-complete.
Finally, we use the term NP-hard rather loosely to designate any problem D
(be it a decision problem or an optimization problem) that is at least as hard as
every NP-complete problem in the sense that, if D can be solved in polynomial
time, then so can every NP-complete problem. In particular, NP-complete and co-
NP-complete problems are NP-hard, as are certain problems that are not known
to be either in NP or in co-NP.

B.7 Cook’s theorem


In this section, we provide a proof of the following version of Cook’s theorem
[208]:

Theorem B.1. The problem DNF Equation is NP-complete.

Proof. We only sketch the main arguments of the proof, leaving aside some of the
technical fine points, and we refer the reader to the specialized literature for details
(see Cook’s original paper or Garey and Johnson [371]).
The proof of the theorem heavily relies on the observation that the computa-
tions performed by a Turing machine can be “encoded” by the solution of a DNF
equation, much in the same way that the output of a combinational circuit can be
implicitly represented by the solution of a DNF equation (see Section 1.13.2). So,
we start with a demonstration of this fact.
Consider an arbitrary decision problem D, and suppose that D is solved in
polynomial time by a Turing machine A. The complexity of A is bounded by a
polynomial p(n) for every instance of size n ∈ N.
For simplicity, and without loss of generality, we assume that A works on the
encoding alphabet B ∪ {Z}, so that an input word of size n can be viewed as a
point in B n .
We make the following claim:
Claim. For every n ∈ N, there is an integer m = O(p(n)^2) and a Boolean DNF
φ(X, Y, z) (where X ∈ B^n, Y ∈ B^m, and z ∈ B) with the property that, for every
point X∗ ∈ B^n,

(i) the DNF equation φ(X∗, Y, z) = 0 has a unique solution (X∗, Y∗, z∗) ∈ B^{n+m+1}, and
(ii) when the Turing machine A operates on the input word X∗, the output of
A is q_Y (“Yes”) if z∗ = 1, and the output of A is q_N (“No”) if z∗ = 0.

Moreover, the DNF φ can be constructed in time polynomial in n and p(n).


Proof of the claim. For each fixed n, if the input point (or word) X is in Bn ,
then the number of iterations performed by A is bounded by p(n), so that the
read-write head will be able to scan at most p(n) cells of the tape until the Turing
machine stops. More precisely, since the RW head initially scans cell 1, it can only
scan the cells in K = {−p(n) + 1, . . . , p(n)} until it stops.
Let us now introduce p(n)(|Q| + 8p(n)) variables that completely describe the
configuration of A in successive iterations: For q ∈ Q, k ∈ K, t ∈ {1, . . . , p(n)}
and σ ∈ B ∪ {Z}, we define
• variables y^Q_{q,t}: their intended meaning is that y^Q_{q,t} = 1 if A is in state q at
iteration t;
• variables y^H_{k,t}, where y^H_{k,t} = 1 if the RW head scans cell k at iteration t;
• variables y^C_{σ,k,t}, where y^C_{σ,k,t} = 1 if σ is the symbol contained in cell k at
iteration t.

For the variables (y^Q_{q,t}, y^H_{k,t}, y^C_{σ,k,t}) to correctly describe the (uniquely defined)
configuration of the Turing machine at every iteration t, there must hold

(a) y^Q_{q0,1} = 1 (the machine is initially in state q0) and, for all q ∈ Q \ {q0},
y^Q_{q,1} = 0;
(b) y^H_{1,1} = 1 (the RW head initially scans cell 1) and, for all k ≠ 1, y^H_{k,1} = 0;
(c) if k ∈ {1, 2, ..., n} and σ = x_k, then y^C_{σ,k,1} = 1; if k ∈ K \ {1, 2, ..., n} and
σ = Z, then y^C_{σ,k,1} = 1; for all other pairs (σ, k), y^C_{σ,k,1} = 0.
At every iteration, the variables describe a valid configuration resulting
from a correct transition from the previous configuration, meaning that
(d) for all t ∈ {1, ..., p(n)}, for all k ∈ K, for all q ∉ {q_Y, q_N}, for all σ, for
(q′, σ′, m) = T(q, σ), for all q″ ≠ q′, for all k′ ≠ k + m, for all k″ ≠ k, for
all σ″,
if y^Q_{q,t} = 1 and y^H_{k,t} = 1 and y^C_{σ,k,t} = 1, then

y^Q_{q′,t+1} = 1, y^Q_{q″,t+1} = 0 (the machine is in state q′ at iteration t + 1),

y^H_{k+m,t+1} = 1, y^H_{k′,t+1} = 0 (the RW head scans cell k + m at iteration t + 1),

y^C_{σ′,k,t+1} = 1, y^C_{σ″,k,t+1} = 0 for σ″ ≠ σ′ (cell k contains the symbol σ′ at iteration t + 1),

y^C_{σ″,k″,t+1} = y^C_{σ″,k″,t} (all cells other than cell k remain unchanged).

(These rules preserve the following property: At every iteration t, the
machine is in a unique state, the RW head scans a unique cell, and each cell
contains a unique symbol.)
(e) for all t ∈ {1, ..., p(n) − 1}, for q ∈ {q_Y, q_N} (if the machine has reached
a halting state, then its configuration remains unchanged in subsequent
iterations),
if y^Q_{q,t} = 1, then

y^Q_{q′,t+1} = y^Q_{q′,t} for all q′ ∈ Q,

y^H_{k,t+1} = y^H_{k,t} for all k ∈ K,

y^C_{σ,k,t+1} = y^C_{σ,k,t} for all k ∈ K and for all σ.

(f) z = y^Q_{q_Y,p(n)}.
The conditions (a)–(f) are easily translated into a DNF equation φ(X, Y, z) = 0.
For instance, condition (c) can be written as

⋁_{k=1}^{n} (y^C_{0,k,1} x_k ∨ ȳ^C_{0,k,1} x̄_k ∨ y^C_{1,k,1} x̄_k ∨ ȳ^C_{1,k,1} x_k ∨ y^C_{Z,k,1})
  ∨ ⋁_{k ∈ K\{1,...,n}} (y^C_{0,k,1} ∨ y^C_{1,k,1} ∨ ȳ^C_{Z,k,1}) = 0.
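
The translation of condition (c) can be checked mechanically on a toy case. In the Python sketch below, a literal is a pair (variable, sign), where sign 1 denotes the variable itself and sign 0 its complement; the variable names, the function names, and the sizes n = 2 and p(n) = 2 are assumptions made only for this illustration. For every input word, the sketch counts the assignments of the variables y^C_{σ,k,1} for which all terms vanish, and finds exactly one, as the construction requires.

```python
from itertools import product

def condition_c_terms(n, cells):
    """Terms of a DNF encoding condition (c) for the cells in `cells`."""
    terms = []
    for k in cells:
        y0, y1, yZ = ("yC", "0", k), ("yC", "1", k), ("yC", "Z", k)
        if 1 <= k <= n:
            x = ("x", k)
            terms += [
                [(y0, 1), (x, 1)], [(y0, 0), (x, 0)],   # force y0 = complement of x_k
                [(y1, 1), (x, 0)], [(y1, 0), (x, 1)],   # force y1 = x_k
                [(yZ, 1)],                              # force yZ = 0
            ]
        else:
            terms += [[(y0, 1)], [(y1, 1)], [(yZ, 0)]]  # blank cell: only yZ = 1
    return terms

def y_solutions(terms, x_values, y_names):
    """All assignments of the y variables for which every term takes value 0."""
    found = []
    for bits in product((0, 1), repeat=len(y_names)):
        y_values = dict(zip(y_names, bits))
        point = {**x_values, **y_values}
        if not any(all(point[v] == s for v, s in term) for term in terms):
            found.append(y_values)
    return found

n, p = 2, 2                                   # toy sizes, chosen for illustration
cells = range(-p + 1, p + 1)                  # the set K
y_names = [("yC", s, k) for k in cells for s in "01Z"]
terms = condition_c_terms(n, cells)
for x in product((0, 1), repeat=n):
    x_values = {("x", k): x[k - 1] for k in range(1, n + 1)}
    print(x, len(y_solutions(terms, x_values, y_names)))
```

The other conditions can be handled in the same spirit: each implication of the form “if these variables equal 1, then that variable equals 0 or 1” forbids a partial assignment, and every forbidden partial assignment yields one term of the DNF.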

By construction, this equation has a unique solution (X∗, Y∗, z∗) ∈ B^{n+m+1} for
every fixed X∗ ∈ B^n, and the values of Y∗ and z∗ in this solution describe the
operations of the Turing machine on the input X∗. This establishes the claim.
We are now ready to conclude the proof of the theorem. Let D be an arbitrary
problem in NP. By definition, and with the same notations as in Section B.4, there
is a problem D′ ∈ P and a polynomial p(n) such that, for every instance W ∈ I∗,
W ∈ D if and only if there exists a certificate V ∈ I∗ such that |V| ≤ p(|W|) and
such that (V, W) ∈ D′. Let A be a Turing machine that solves D′ in polynomial
time.
For every fixed n ∈ N, there is a Boolean DNF φ(X, Y, z) associated with A as
in the proof of the claim. We can view every input word X ∈ B^n as consisting of
two subwords V and W, with V ∈ B^r and W ∈ B^s for some fixed s and r = p(s).
A word W∗ ∈ B^s is a Yes-instance of D if and only if there exists V∗ ∈ B^r such
that X∗ = (V∗, W∗) ∈ D′, or equivalently if and only if the equation

φ(V, W∗, Y, 1) = 0

is consistent. Observe that when the equation has a solution (V∗, W∗, Y∗, 1), the
point V∗ describes the certificate associated with W∗, and the point Y∗ describes the
steps of the verification of the certificate by A. This completes the proof of Cook’s
theorem.

B.8 Complexity of list-generation and counting algorithms


In this book, we frequently investigate problems that are neither decision problems
nor optimization problems, but problems of the following type: Given a binary
relation D ⊆ I ∗ × I ∗ and a word W ∈ I ∗ , we must generate all words V ∈ I ∗
such that (V , W ) ∈ D. We say that this is the list-generation problem associated
with property D. Occasionally, we also consider the counting problem associated
with D, that is, the question of determining the number of words V such that
(V , W ) ∈ D. To keep things reasonable, we further assume that the size of each
“solution” V is polynomially-bounded in the size of the input W .
For example, if D expresses the property “X is a solution of the DNF equation
φ = 0”, and if the input string W encodes φ, then the associated counting problem
asks for the number of solutions of the equation φ = 0, and the list-generation
problem consists in generating all solutions of the equation (these problems are
considered in Sections 2.11.1 and 2.11.2, respectively). Similarly, if D expresses
the property “C is a prime implicant of the function f ”, and if the input W encodes
f (in some predetermined format), then the counting problem asks for the number
of prime implicants of f , and the list-generation problem requires the production
of all prime implicants of f (see Chapter 3).
We do not discuss the complexity of counting problems in detail here, as we
encounter very few of them in this book. Let us simply say that we call #P-complete
those counting problems that are “hardest” among a natural class of
counting problems (essentially, among those counting problems such that the prop-
erty (V , W ) ∈ D can be verified in polynomial time). We refer to [371, 725, 883]
for details.
By contrast, we find it necessary to discuss more formally the complexity
of list-generation algorithms. The main difficulty here is that the number of
solutions V satisfying the property (V , W ) ∈ D may be much larger than the
size |W | of the input; to put it another way, the size of the output of a list-
generation problem may be exponentially large in the size of its input, and hence,
no polynomial-time algorithm can possibly exist for such a problem. There-
fore, it makes sense to measure the complexity of list-generation algorithms as
a function of their input size and of their output size. This notion has been formal-
ized and used by many authors; early references include Read and Tarjan [781];
Valiant [883]; Lawler, Lenstra, and Rinnooy Kan [605]; and Johnson, Yannakakis,
and Papadimitriou [538].
Consider a binary relation D ⊆ I ∗ × I ∗ and the associated list-generation
problem LD . Let A be a list-generation algorithm for LD , and suppose that, when
running on the input W , A outputs the list V1 , V2 , . . . , Vm , in that order. Note that
the value of m depends on W but is independent of A. We take it as a measure
of the output size of LD for the instance W ∈ I ∗ (remember that the size of each
solution V1 , V2 , . . . , Vm has been assumed to be polynomially bounded in the size
of W ).
For k = 1, . . . , m, we denote by τ (k) the running time required by A to
output the first k elements of the list, that is, to generate V1 , V2 , . . . , Vk . So,
τ (m) is the total running time of A on W , and if we let τ (0) = 0, then
τ (k) − τ (k − 1) is the time elapsed between the (k − 1)-st and the k-th outputs, for
k = 1, 2, . . . , m.
Following the terminology of Johnson, Yannakakis, and Papadimitriou [538],
we say that

• A runs in polynomial total time if τ(m) is bounded by a polynomial in |W|
and m;
• A runs in polynomial incremental time if τ (k) is bounded by a polynomial
in |W | and k, for k = 1, 2, . . . , m;
• A runs with polynomial delay if τ (k) − τ (k − 1) is bounded by a polynomial
in |W |, for k = 1, 2, . . . , m.
626 Appendix B

Polynomial total time is, in a sense, the weakest notion of polynomiality that
can be applied to LD , since the running time of any algorithm for LD must grow
at least linearly with m.
Polynomial incremental time captures the idea that the algorithm A outputs
the solutions of LD sequentially and does not spend “too much time” between
two successive outputs. Indeed, the definition implies that τ (k) − τ (k − 1) is
polynomially bounded in |W | and k, for all k. When generating the next element
in the list, however, the algorithm may need to look at all previous outputs, and
therefore, we allow τ (k) to depend on k as well as on the input size |W |.
Finally, an algorithm runs with polynomial delay when the time elapsed between
two successive outputs is polynomial in the input size of the problem. This is a
rather strong requirement, the strongest, in fact, among those discussed by Johnson,
Yannakakis, and Papadimitriou [538].
In order to better understand the complexity of the list-generation problem LD ,
it is also useful to grasp its relation with the following problem:

NEXT-GEND
Instance: A word W ∈ I ∗ , and a set K of words such that (V , W ) ∈ D for all
V ∈ K.
Output: Either find a word V ∉ K such that (V, W) ∈ D, or prove that no such
word exists.
Clearly, if problem NEXT-GEND can be solved in polynomial time (meaning, in
time polynomial in |W | and |K|), then LD can be solved in polynomial incremental
time: Indeed, starting from the empty list K = ∅, one can iteratively generate
solutions of LD by solving a sequence of instances of NEXT-GEND , until we can
conclude that all solutions of LD have been generated.
Boros et al. [117] pointed out that, somewhat surprisingly, the converse relation
also holds (see also Lawler et al. [605]). Namely, if algorithm A solves the list-
generation problem LD in polynomial incremental time, then NEXT-GEND can
be solved in polynomial time for every input (W , K) by a single run of A on the
input W , which can be aborted after the generation of the first |K| + 1 solutions.
Thus, investigating the complexity of NEXT-GEND provides valuable insights
into the complexity of the list-generation problem LD .
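
Both directions of this equivalence can be sketched in a few lines of Python. All names below are hypothetical, and the list-generation algorithm used as an example is a plain enumeration that does not run in polynomial incremental time; it is included only so that the sketch can be executed. The second function assumes, as in the definition of the problem, that every word of K is a solution, and that the generator never outputs the same solution twice.

```python
from itertools import islice, product

def generate_with_next_gen(next_gen, word):
    """List generation driven by repeated calls to a NEXT-GEN procedure."""
    known = []
    while True:
        new = next_gen(word, known)      # a solution not in `known`, or None
        if new is None:
            return known
        known.append(new)

def next_gen_from_generator(generator, word, known):
    """NEXT-GEN obtained from a single, possibly aborted, run of a generator.

    The run is stopped after at most len(known) + 1 outputs: one of them is
    new unless all the solutions already belong to `known`.
    """
    known_set = set(known)
    for candidate in islice(generator(word), len(known) + 1):
        if candidate not in known_set:
            return candidate
    return None

def dnf_solutions(word):
    """Toy list-generation algorithm: all solutions of dnf = 0, by enumeration."""
    dnf, n = word
    for point in product((0, 1), repeat=n):
        if not any(all((point[abs(l) - 1] == 1) == (l > 0) for l in t) for t in dnf):
            yield point

word = ([[1, 2], [-1, -2]], 2)           # the DNF x1 x2 OR (not x1)(not x2)
next_gen = lambda w, known: next_gen_from_generator(dnf_solutions, w, known)
print(generate_with_next_gen(next_gen, word))
```

Since `islice` consumes the generator lazily, the run is indeed abandoned as soon as the required number of outputs has been examined.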
Appendix C

JBool: A software tool


Claude Benzaken and Nadia Brauner

C.1 Introduction
JBool is an application designed for teaching and illustrative purposes. It allows
users to work with Boolean functions in disjunctive normal form (DNF) or in
conjunctive normal form (CNF), and to easily manipulate the concepts described
in this book or test conjectures on small-size examples. It is not an industrial
software package, and it is not optimized to tackle large problems.
JBool can be downloaded freely from http://hdl.handle.net/2268/72714.
The user interface is written in Java and the core engine for Boolean
functions is written in ANSI C. The Java application requires a Java Runtime
Environment (JRE) 1.3 or later, and binaries for the engine are available for the
following platforms:

• Mac OS 10.3 and later
• Windows XP and later
• Linux x86

Source code is available, so the engine can be compiled for other platforms as
well.
This appendix is organized as follows. First, the basic interface of the software
is presented in Section C.2. The tools available to create, load, or save a function
are described in Section C.3. The main functionalities of the software are then
successively examined: editing a function (Section C.4), creating several
representations of the same function (Section C.5.1), applying various
operators to the current function (Section C.5.2), performing operations on
several functions (Section C.5.3), and testing properties of the current function
(Section C.5.4). More details on all these functionalities can be found in the
on-line help of the software.


C.2 Work interface


Figure C.1 displays the work interface of JBool. The main elements of this interface
are described in the following sections.

C.2.1 Menu bar


When no Boolean function is selected, only [File], [Edit], and [Help] menus are
visible in the menu bar. Other menus appear when a function is active.

• The [File] menu gives access to standard functionalities like New, Open,
Save, and so on.
• The [Edit] menu contains classical commands like Cut, Copy, Paste, as well
as some functionalities that change the function form.
• The [Presentation] menu contains items that produce an equivalent Boolean
expression of the current function, like a dual form or an orthogonal
form. In each case, a new name is created with structure
<Item name>(<function name>).
• The [Construction] menu allows various constructions of new functions from
the current one, like duplication of a function, restriction by assignment of
values to literals, and so on. The same naming procedure is used for the new
function as in the [Presentation] menu.
• The [Operations] menu allows the user to perform basic operations on pairs
of functions, like disjunction or conjunction.
• The [Computation] menu allows the user to test properties (such as positivity,
regularity, and so forth) of the current function.

Figure C.1. JBool Interface.

C.2.2 Function windows


The Boolean functions are displayed in function windows. When a Boolean func-
tion is created, it appears in a new window with the default name Function n,
where n is its sequence number. Some menu items open a new window associ-
ated with a new function, depending on the operation that has been performed.
Then, several windows can be used simultaneously. The title bar of each window
recalls the name of the operation used to create the function (for example, Dual
function(Function 1)). When a window is selected, the corresponding
function appears in the main text zone.
Each function window contains a general board that displays the following
information: the number mF of terms or clauses, the number nX of variables,
the variable set Varset, the normal form type (disjunctive or conjunctive), and a
general or positive qualifier. Below the title bar, the text zone displays a normal
form representation of the function, as explained in Section C.3.

C.2.3 Text zone


The text zone is located below the menu bar, as in Figure C.1. It is activated when
a new function is created or loaded in the [File] menu. When a function window
is selected, the corresponding Boolean function appears in the text zone. Each
change in the text zone affects the corresponding function window when the [OK]
button is selected.

C.3 Creating a Boolean function


C.3.1 Function syntax and presentation
A function can be edited only in the text zone. Each variable is represented
by one character within the lowercase alphabet a to z or by an integer between 1
and 6. Thus one can use 32 variables in the set {a, . . . , z} ∪ {1, . . . , 6}. The software
only allows the representation of functions as normal forms, either DNFs or CNFs.
Terms or clauses are written as simple words, separated by “+” (representing the
“or” operator) in a DNF, and separated by “&” (representing the “and” operator)
in a CNF (when typing a function, one may input a space character instead of “+”
or “&”). Each word starts with the alphabetical list of positive literals, followed
by the sign “-” and by the alphabetical list of negative (complemented) literals.
Empty words are allowed. For instance, the DNF (a ∧ b̄ ∧ f̄) ∨ (c̄) ∨ (d̄ ∧ e) is
written as a-bf + -c + e-d. Similarly, the CNF (a ∨ b̄ ∨ f̄) ∧ (c̄) ∧ (d̄ ∨ e) is written
as a-bf & -c & e-d.
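
The syntax is simple enough to be parsed in a few lines. The Python sketch below is not part of JBool; the function names and the representation of a word as a pair of sets (positive literals, negative literals) are assumptions made for this illustration. The sketch does not handle empty words, nor does it enforce the alphabetical order of the literals.

```python
import re

VARIABLES = "abcdefghijklmnopqrstuvwxyz123456"   # the 32 admissible variable names

def parse_word(word):
    """Split one word such as 'a-bf' into (positive, negative) literal sets."""
    positive, _, negative = word.partition("-")
    for name in positive + negative:
        if name not in VARIABLES:
            raise ValueError(f"unknown variable {name!r}")
    return frozenset(positive), frozenset(negative)

def parse_normal_form(text):
    """Parse a DNF or a CNF; words are separated by '+', '&', or spaces."""
    words = [w for w in re.split(r"[+&\s]+", text.strip()) if w]
    return [parse_word(w) for w in words]

for positive, negative in parse_normal_form("a-bf + -c + e-d"):
    print(sorted(positive), sorted(negative))
```

The same list of words is obtained from a-bf & -c & e-d; whether it is read as a DNF or as a CNF depends only on the separator.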
An empty list (mF = 0) represents a constant function (0 for a DNF, and 1
for a CNF) and is displayed as “F” (False) for a DNF and as “T” (True) for a CNF.
(One may also simply type “T” or “F” in the text zone.)
All Boolean expressions are automatically simplified according to the absorp-
tion laws

x ∧ (x ∨ y) = x, x ∨ (x ∧ y) = x.

For instance, the DNF expression a ∨ (a ∧ b̄ ∧ c) with the corresponding syntax a
+ ac-b is automatically simplified to the expression a (by the absorption law).
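
A simplification of this kind can be sketched as follows in Python. This is an illustration under our own assumptions and not the code of the JBool engine: each term (or clause) is represented by a pair of sets (positive literals, negative literals), and a term is discarded as soon as the literals of another term form a subset of its own literals. The same test applies to the terms of a DNF and to the clauses of a CNF.

```python
def absorb(terms):
    """Remove every term (or clause) that is absorbed by another one."""
    unique = list(dict.fromkeys(terms))          # drop exact duplicates, keep order

    def absorbs(s, t):
        # s absorbs t when s differs from t and every literal of s occurs in t
        return s != t and s[0] <= t[0] and s[1] <= t[1]

    return [t for t in unique if not any(absorbs(s, t) for s in unique)]

a = (frozenset("a"), frozenset())                # the word a
acb = (frozenset("ac"), frozenset("b"))          # the word ac-b
print(absorb([a, acb]) == [a])                   # prints True
```

The quadratic number of comparisons is of no concern for the small examples that the software targets.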

C.3.2 Creation modes


There are four ways of creating a function: One can create a new empty function,
generate a random function, calculate a threshold function from the definition of
a separator, or load an existing function.
The [New] item in the [File] menu creates a new function whose default Boolean
expression is F (False) in DNF, and T (True) in CNF. This function can be
subsequently modified in the text zone.
The [Random...] item in the [File] menu opens a dialog box, as shown in
Figure C.2, for generating a random expression. Six fields are displayed in the
dialog box: the number of variables, an upper bound for the number of terms, the
minimal degree (number of literals) of each term, a specification of uniform degree
(all terms have equal degree), positivity of the function, and the conjunctive or
disjunctive character of the normal form. Positivity here means that no negative
literal appears in the expression. With the choices in Figure C.2, JBool might return
the Boolean function (d ∨ e)(b ∨ c ∨ d ∨ f )(a ∨ b ∨ e ∨ f ).
The [Threshold...] item in the [File] menu opens a dialog box, as in Figure C.3,
for generating a threshold function from the definition of a separator. First, the
threshold value (which can be negative) and the number n of variables are required.
Then a grid is opened with n boxes to be filled by integers (positive or negative)
which are the weights of the n variables. The inequality corresponding to the
example in Figure C.3 is 2a − b + 3c ≥ 2, and JBool returns the corresponding
Boolean threshold function: c ∨ ab̄.
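
The result can be checked by enumerating the eight points of B^3. The Python sketch below (with hypothetical names, and the variables ordered as a, b, c) compares the true points of the separator 2a − b + 3c ≥ 2 with those of the expression c ∨ ab̄, in which b appears complemented, as its negative weight requires.

```python
from itertools import product

def threshold_true_points(weights, threshold):
    """True points of the function defined by sum(w_i * x_i) >= threshold."""
    return [x for x in product((0, 1), repeat=len(weights))
            if sum(w * v for w, v in zip(weights, x)) >= threshold]

# Separator of the example: 2a - b + 3c >= 2, variables ordered (a, b, c).
points = threshold_true_points([2, -1, 3], 2)

# True points of the expression c OR (a AND NOT b).
candidate = [x for x in product((0, 1), repeat=3)
             if x[2] == 1 or (x[0] == 1 and x[1] == 0)]

print(points == candidate)   # prints True
```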
The [Open...] item in the [File] menu allows us to load a previously saved
Boolean function. This item opens a dialog box for selecting a Boolean function
file. Once opened, the Boolean function appears in the function window and in the
text zone.

Figure C.2. Random function dialog.

Figure C.3. Threshold function dialog.

C.3.3 Saving a function


The [Save As…] item in the [File] menu saves a Boolean function as a text file.
This item opens a dialog box for entering the file name of the function. The [Save]
item in the [File] menu saves an existing Boolean function. If the function does
not exist yet, then this item opens a [Save As…] dialog.
Each Boolean function is saved in a text (.txt) file that only contains the function
written in the JBool syntax. Figure C.4 shows an example of a Boolean function
file.

Example.txt
a + b + c-d

Figure C.4. A Boolean function file: Example.txt.

The [Rename function] item in the [File] menu opens a dialog for entering a
new name for the current Boolean function. The new name appears in the title bar
of the corresponding window.

C.4 Editing a function


The [Edit] menu contains tools for editing a function. Classical commands, like
Cut, Copy, Paste are available. The menu also contains some functionalities that
modify the form of the function: sort the terms by degree, change the normal form,
modify the variable set, and so on. We next describe two of these operations.

C.4.1 Changing the normal form


The [Change normal form] command in the [Edit] menu carries out the formal
transformation of the current normal form into the dual form: it simply replaces
all “&” by “+” (or conversely). For instance, it replaces the DNF a ∨ b ∨ cd by the
CNF ab(c ∨ d). This command is equivalent to changing the form in the function
window using the toggle buttons.
Similarly, the [Formal Complement] item in the [Edit] menu creates a Boolean
function which is the formal complement of the current one: it replaces all “&” by
“+” (or conversely) and it replaces each literal by its complement. For instance,
the formal complement of the DNF a ∨ b ∨ cd is the CNF āb̄(c̄ ∨ d̄). The name of
the new function is Formal complement(<function name>).
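
Both transformations are purely syntactic, as the following Python sketch suggests. The representation (a normal form is a kind, "DNF" or "CNF", together with a list of words, each word being a pair of sets of positive and negative variables) and the function names are assumptions made for this illustration. The last lines check by enumeration that the formal complement takes the opposite value at every point, in accordance with De Morgan's laws.

```python
from itertools import product

def change_normal_form(kind, words):
    """Formal dual: keep every word, swap the reading 'DNF' / 'CNF'."""
    return ("CNF" if kind == "DNF" else "DNF"), list(words)

def formal_complement(kind, words):
    """Swap the normal form and replace each literal by its complement."""
    swapped = [(negative, positive) for positive, negative in words]
    return ("CNF" if kind == "DNF" else "DNF"), swapped

def evaluate(kind, words, point):
    """Value of a normal form at `point`, a dict from variable names to 0/1."""
    def literals(word):
        positive, negative = word
        return [point[v] for v in positive] + [1 - point[v] for v in negative]
    if kind == "DNF":
        return int(any(all(literals(w)) for w in words))
    return int(all(any(literals(w)) for w in words))

f = ("DNF", [({"a"}, set()), ({"b"}, set()), ({"c", "d"}, set())])   # a + b + cd
g = formal_complement(*f)
print(g[0], all(
    evaluate(*g, dict(zip("abcd", bits))) == 1 - evaluate(*f, dict(zip("abcd", bits)))
    for bits in product((0, 1), repeat=4)
))
```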

C.4.2 Modifying the variable set


The [Shift variables...] item in the [Edit] menu opens a dialog window for entering
a number k. Each variable rank is then shifted by k. For instance, if k = 2, then
the Boolean function a ∨ bc becomes c ∨ de. Notice that 32 − k has to be larger
than the largest rank of the variables of the current function.
The [Add dummy] item in the [Edit] menu opens a dialog that asks for a subset
A of variables. All variables in A are then added to the set of variables (Varset)
of the current function. The [Delete dummy] item in the [Edit] menu deletes all
variables on which the function does not effectively depend.
The [Compact...] item in the [Edit] menu deletes all dummy variables in the
variable set (Varset) and replaces the rank of all the variables by the smallest
possible rank. For instance, the Boolean function e ∨ j ∨ hu becomes a ∨ c ∨ bd.
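
The effect of these commands on the words of a function can be mimicked as follows. The Python sketch relies on an assumption that is not stated explicitly above, namely, that the 32 variables are ranked in the order a, ..., z, 1, ..., 6; the function names are hypothetical, and the sketch manipulates only the words of the function, not its variable set Varset.

```python
VARIABLES = "abcdefghijklmnopqrstuvwxyz123456"   # assumed ranking of the variables

def shift_variables(words, k):
    """Shift the rank of every variable by k (words use the JBool syntax)."""
    def shift(ch):
        if ch == "-":
            return ch
        rank = VARIABLES.index(ch) + k
        if not 0 <= rank < len(VARIABLES):
            raise ValueError("the shift leaves the range of available variables")
        return VARIABLES[rank]
    return ["".join(shift(ch) for ch in word) for word in words]

def compact(words):
    """Rename the variables that occur in the words to the smallest ranks."""
    used = sorted({ch for w in words for ch in w if ch != "-"}, key=VARIABLES.index)
    rename = {old: VARIABLES[i] for i, old in enumerate(used)}
    return ["".join(rename.get(ch, ch) for ch in word) for word in words]

print(shift_variables(["a", "bc"], 2))   # the words of c + de
print(compact(["e", "j", "hu"]))         # the words of a + c + bd
```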

C.5 Operations on Boolean functions


This section briefly presents the main functionalities of the JBool software (we
refer to the on-line help for details).

C.5.1 Equivalent presentations of a Boolean function


The [Presentation] menu contains commands that produce equivalent Boolean
expressions of the current function, like a dual form, an orthogonal form, an
irredundant form, the full list of prime implicants, or an irredundant list of prime
implicants of the current function. For each item, a new function is created whose
name is <Item Name>(<Function name>).

C.5.2 Constructions
The [Construction] menu allows various constructions of new functions from the
current one, for instance, by duplication, dualization, or complementation of the
current function; by assignment of values to subsets of literals; or by merging of
variables. The new function can also be obtained by extracting terms of a given
degree or by switching variables. In each case, a new function is created whose
name is <Item Name>(<Function name>).

C.5.3 Operations on two Boolean functions


The possible operations are the disjunction and the conjunction of two Boolean
functions. The items in the [Operations] menu open a dialog with the list of all
Boolean functions in use. One function must be selected in this dialog. Then, a
new function is created by applying the chosen operation to the current function
and to the function selected in the dialog.

C.5.4 Testing properties of a function


The [Computation] menu allows testing whether the current function is identically
1, monotone, 2-monotone, quadratic, pure Horn, disguised-Horn, or quasi-Horn-
quadratic.
Bibliography

[1] P.A. Abdulla, P. Bjesse and N. Eén, Symbolic reachability analysis based on SAT-solvers,
in: S. Graf and M. Schwartzbach, eds., Tools and Algorithms for the Construction and
Analysis of Systems, Lecture Notes in Computer Science, Vol. 1785, Springer-Verlag,
Berlin Heidelberg, 2000, pp. 411–425.
[2] J.A. Abraham, An improved algorithm for network reliability, IEEE Transactions on
Reliability R-28 (1979) 58–61.
[3] D. Achlioptas and Y. Peres, The threshold for random k-SAT is 2^k log 2 − O(k), Journal
of the American Mathematical Society 17 (2004) 947–973.
[4] D. Achlioptas and G.B. Sorkin, Optimal myopic algorithms for random 3-SAT, Proceed-
ings of the 41st Annual IEEE Symposium on the Foundations of Computer Science, IEEE,
2000, pp. 590–600.
[5] A. Adam, Truth Functions and the Problem of Their Realization by Two-Terminal Graphs,
Akademiai Kiado, Budapest, 1968.
[6] W.P. Adams and P.M. Dearing, On the equivalence between roof duality and Lagrangian
duality for unconstrained 0–1 quadratic programming problems, Discrete Applied
Mathematics 48 (1994) 1–20.
[7] K.K. Aggarwal, K.B. Misra and J.S. Gupta, A fast algorithm for reliability evaluation,
IEEE Transactions on Reliability R-24 (1975) 83–85.
[8] R. Agrawal, T. Imielinski and A. Swami, Mining association rules between sets of items in
large databases, International Conference on Management of Data (SIGMOD 93), 1993,
pp. 207–216.
[9] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen and A.I. Verkamo, Fast discovery of
association rules, in: U.M. Fayyad et al., eds., Advances in Knowledge Discovery and
Data Mining, AAAI Press, Menlo Park, California, 1996, pp. 307–328.
[10] A.V. Aho, M.R. Garey and J.D. Ullman, The transitive reduction of a directed graph, SIAM
Journal on Computing 1 (1972) 131–137.
[11] A.V. Aho, J.E. Hopcroft and J.D. Ullman, The Design and Analysis of Computer
Algorithms, Addison-Wesley Publishing Company, Reading, MA, 1974.
[12] M. Aigner, Combinatorial Theory, Springer-Verlag, Berlin, Heidelberg, New York,
1979.
[13] H. Aizenstein, T. Hegedűs, L. Hellerstein and L. Pitt, Complexity theoretic hardness
results for query learning, Computational Complexity 7 (1998) 19–53.
[14] G. Alexe, P.L. Hammer, V. Lozin and D. de Werra, Struction revisited, Discrete Applied
Mathematics 132 (2003) 27–46.
[15] E. Algaba, J.M. Bilbao, J.R. Fernández Garcia and J.J. López, Computing power
indices in weighted multiple majority games, Mathematical Social Sciences 46 (2003)
63–80.

[16] E. Allender, L. Hellerstein, P. McCabe, T. Pitassi and M.E. Saks, Minimizing disjunctive
normal form formulas and AC^0 circuits given a truth table, SIAM Journal on Computing
38 (2008) 63–84.
[17] N. Alon and P.H. Edelman, The inverse Banzhaf problem, Social Choice and Welfare 34
(2010) 371–377.
[18] M. Alonso-Meijide, B. Casas-Méndez, M.J. Holler and S. Lorenzo-Freire, Computing
power indices: Multilinear extensions and new characterizations, European Journal of
Operational Research 188 (2008) 540–554.
[19] H. Andreka and I. Nemeti, The generalized completeness of Horn predicate-logic as a
programming language, Research Report of the Department of Artificial Intelligence 21,
University of Edinburgh, 1976.
[20] D. Angluin, Learning propositional Horn sentences with hints, Research Report of the
Department of Computer Science 590, Yale University, 1987.
[21] D. Angluin, Queries and concept learning, Machine Learning 2 (1988) 319–342.
[22] D. Angluin, L. Hellerstein and M. Karpinski, Learning read-once formulas with queries,
Journal of the ACM 40 (1993) 185–210.
[23] M.F. Anjos, An improved semidefinite programming relaxation for the satisfiability
problem, Mathematical Programming 102 (2005) 589–608.
[24] M.F. Anjos, Semidefinite optimization approaches for satisfiability and maximum-
satisfiability problem, Journal on Satisfiability, Boolean Modeling and Computation
1 (2005) 1–47.
[25] M. Anthony, Discrete Mathematics of Neural Networks: Selected Topics, SIAM Mono-
graphs on Discrete Mathematics and Applications, SIAM, Philadelphia, 2001.
[26] M. Anthony, Probabilistic learning and Boolean functions, in: Y. Crama and P.L. Hammer,
eds., Boolean Models and Methods in Mathematics, Computer Science, and Engineering,
Cambridge University Press, Cambridge, 2010, pp. 197–220.
[27] M. Anthony, Neural networks and Boolean functions, in: Y. Crama and P.L. Hammer,
eds., Boolean Models and Methods in Mathematics, Computer Science, and Engineering,
Cambridge University Press, Cambridge, 2010, pp. 554–576.
[28] M. Anthony, Decision lists and related classes of Boolean functions, in: Y. Crama and P.L.
Hammer, eds., Boolean Models and Methods in Mathematics, Computer Science, and
Engineering, Cambridge University Press, Cambridge, 2010, pp. 577–595.
[29] M. Anthony and N. Biggs, Computational Learning Theory, Cambridge University Press,
Cambridge, 1992.
[30] W.W. Armstrong, Dependency structures of database relationships, in: IFIP-74, North-
Holland, Amsterdam, 1974, pp. 580–583.
[31] T. Asano and D.P. Williamson, Improved approximation algorithms for MAX SAT, Work-
ing paper, IBM Almaden Research Center, 2000. Preliminary version in the Proceedings
of the 11th ACM-SIAM Symposium on Discrete Algorithms, 2000, pp. 96–105.
[32] R.L. Ashenhurst, The decomposition of switching functions, in: Proceedings of the
International Symposium on the Theory of Switching, Part I, Harvard University Press,
Cambridge, MA, 1959, pp. 75–116.
[33] B. Aspvall, Recognizing disguised NR(1) instances of the satisfiability problem, Journal
of Algorithms 1 (1980) 97–103.
[34] B. Aspvall, M.F. Plass and R.E. Tarjan, A linear-time algorithm for testing the truth of
certain quantified Boolean formulas, Information Processing Letters 8 (1979) 121–123.
[35] J. Astola and R.S. Stanković, Fundamentals of Switching Theory and Logic Design: A
Hands on Approach, Springer, Dordrecht, The Netherlands, 2006.
[36] G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela and M. Protasi,
Complexity and Approximation, Springer-Verlag, Berlin, 1999.
[37] G. Ausiello, A. D’Atri and D. Saccà, Minimal representation of directed hypergraphs,
SIAM Journal on Computing 15 (1986) 418–431.
[38] A. Avidor, I. Berkovitch and U. Zwick, Improved approximation algorithms for MAX
NAE-SAT and MAX SAT, in: T. Erlebach and G. Persiano, eds., Approximation and
Online Algorithms, Lecture Notes in Computer Science, Vol. 3879, Springer-Verlag,
Berlin Heidelberg, 2006, pp. 27–40.
[39] H. Aziz, M. Paterson and D. Leech, Efficient algorithm for designing weighted vot-
ing games, in: Proceedings of the 11th IEEE International Multitopic Conference, IEEE
Computer Society, 2007, pp. 1–6.
[40] J. Bailey, T. Manoukian and K. Ramamohanarao, A fast algorithm for computing hyper-
graph transversals and its application in mining emerging patterns, in: Proceedings of
the 3rd IEEE International Conference on Data Mining Florida, USA, IEEE Computer
Society, 2003, pp. 485–488.
[41] O. Bailleux, Y. Boufkhad and O. Roussel, A translation of pseudo-Boolean con-
straints to SAT, Journal on Satisfiability, Boolean Modeling and Computation 2 (2006)
191–200.
[42] E. Balas, Facets of the knapsack polytope, Mathematical Programming 8 (1975) 146–164.
[43] E. Balas and R. Jeroslow, Canonical cuts on the unit hypercube, SIAM Journal on Applied
Mathematics 23 (1972) 661–669.
[44] E. Balas and J.B. Mazzola, Nonlinear 0–1 programming: I. Linearization techniques,
Mathematical Programming 30 (1984) 1–21.
[45] E. Balas and J.B. Mazzola, Nonlinear 0–1 programming: II. Dominance relations and
algorithms, Mathematical Programming 30 (1984) 22–45.
[46] E. Balas and E. Zemel, Facets of the knapsack polytope from minimal covers, SIAM
Journal on Applied Mathematics 34 (1978) 119–148.
[47] M.L. Balinski, On a selection problem, Management Science 17 (1970) 230–231.
[48] M.O. Ball and G.L. Nemhauser, Matroids and a reliability analysis problem, Mathematics
of Operations Research 4 (1979) 132–143.
[49] M.O. Ball and J.S. Provan, Disjoint products and efficient computation of reliability,
Operations Research 36 (1988) 703–715.
[50] H.-J. Bandelt and V. Chepoi, Metric graph theory and geometry: A survey, in:
J.E. Goodman, J. Pach and R. Pollack, eds., Surveys on Discrete and Computational Geom-
etry: Twenty Years Later, Contemporary Mathematics, Vol. 453, American Mathematical
Society, Providence, RI, 2008, pp. 49–86.
[51] J. Bang-Jensen and G. Gutin, Digraphs: Theory, Algorithms and Applications, Springer-
Verlag, London, 2000.
[52] J.F. Banzhaf, Weighted voting doesn’t work: A mathematical analysis, Rutgers Law
Review 19 (1965) 317–343.
[53] K. Barkaoui and M. Minoux, A polynomial-time algorithm to decide liveness of some basic
classes of bounded Petri nets, in: Application and Theory of Petri Nets 1992, Lecture Notes
in Computer Science, Vol. 616, Springer-Verlag, Berlin Heidelberg, 1992, pp. 62–75.
[54] R.E. Barlow and F. Proschan, Statistical Theory of Reliability and Life Testing, Holt,
Rinehart and Winston, New York, 1975.
[55] R. Battiti and M. Protasi, Solving MAX-SAT with non-oblivious functions and history-
based heuristics, in: D. Du, J. Gu and P.M. Pardalos, eds., Satisfiability Problem: Theory
and Applications, DIMACS series in Discrete Mathematics and Theoretical Computer
Science, Vol. 35, American Mathematical Society, 1997, pp. 649–667.
[56] R.J. Bayardo Jr. and J.D. Pehoushek, Counting models using connected components,
in: Proceedings of the 17th National Conference on Artificial Intelligence and 12th
Conference on Innovative Applications of Artificial Intelligence, Austin, TX, 2000,
pp. 157–162.
[57] R.J. Bayardo Jr. and R.C. Schrag, Using CSP look-back techniques to solve exceptionally
hard SAT instances, in: Proceedings of the Second International Conference on Principles
and Practice of Constraint Programming, Lecture Notes in Computer Science, Vol. 1118,
Springer, Berlin, 1996, pp. 46–60.
[58] R.J. Bayardo Jr. and R.C. Schrag, Using CSP look-back techniques to solve real-world
SAT instances, in: Proceedings of the Fourteenth National Conference on Artificial
Intelligence, Providence, RI, 1997, pp. 203–208.
[59] E. Benoist and J-J. Hebrard, Recognition of simple enlarged Horn formulas and simple
extended Horn formulas, Annals of Mathematics and Artificial Intelligence 37 (2003)
251–272.
[60] M. Ben-Or and N. Linial, Collective coin flipping, in: S. Micali, ed., Randomness and
Computation, Academic Press, New York, 1990, pp. 91–115.
[61] C. Benzaken, Algorithmes de dualisation d’une fonction booléenne, R.F.T.I.-Chiffres 9
(1966) 119–128.
[62] C. Benzaken, Post’s closed systems and the weak chromatic number of hypergraphs,
Discrete Mathematics 23 (1978) 77–84.
[63] C. Benzaken, Critical hypergraphs for the weak chromatic number, Journal of Combina-
torial Theory B 29 (1980) 328–338.
[64] C. Benzaken, From logical gates synthesis to chromatic bicritical clutters, Discrete Applied
Mathematics 96–97 (1999) 259–305.
[65] C. Benzaken, S. Boyd, P.L. Hammer and B. Simeone, Adjoints of pure bidirected graphs,
Congressus Numerantium 39 (1983) 123–144.
[66] C. Benzaken, Y. Crama, P. Duchet, P.L. Hammer and F. Maffray, More characterizations
of triangulated graphs, Journal of Graph Theory 14 (1990) 413–422.
[67] C. Benzaken and P.L. Hammer, Linear separation of dominating sets in graphs, Annals of
Discrete Mathematics 3 (1978) 1–10.
[68] C. Benzaken, P.L. Hammer and B. Simeone, Graphes de conflit des fonctions pseudo-
booléennes quadratiques, in: P. Hansen and D. de Werra, eds., Regards sur la Théorie des
Graphes, Presses Polytechniques Romandes, Lausanne, 1980, pp. 165–170.
[69] C. Benzaken, P.L. Hammer and B. Simeone, Some remarks on conflict graphs of quadratic
pseudo-Boolean functions, International Series of Numerical Mathematics 55 (1980)
9–30.
[70] V.L. Beresnev, On a problem of mathematical standardization theory, Upravliajemyje
Sistemy 11 (1973) 43–54 (in Russian).
[71] C. Berge, Graphes et Hypergraphes, Dunod, Paris, 1970. (Graphs and Hypergraphs,
North-Holland, Amsterdam, 1973, revised translation.)
[72] C. Berge, Hypergraphs, North-Holland, Amsterdam, 1989.
[73] J. Berman and P. Köhler, Cardinalities of finite distributive lattices, Mitteilungen aus dem
Mathematischen Seminar Giessen 121 (1976) 103–124.
[74] P. Bertolazzi and A. Sassano, An O(mn) algorithm for regular set-covering problems,
Theoretical Computer Science 54 (1987) 237–247.
[75] P. Bertolazzi and A. Sassano, A class of polynomially solvable set-covering problems,
SIAM Journal on Discrete Mathematics 1 (1988) 306–316.
[76] D. Bertsimas and J. Tsitsiklis, Introduction to Linear Optimization, Athena Scientific,
Belmont, MA, 1997.
[77] A. Bhattacharya, B. DasGupta, D. Mubayi and G. Turán, On approximate Horn
minimization, manuscript, 2009.
[78] W. Bibel and E. Eder, Methods and calculi for deduction, in: D.M. Gabbay, C.J. Hogger and
J.A. Robinson, eds., Handbook of Logic in Artificial Intelligence and Logic Programming,
Vol. 1, Logical Foundations, Oxford Science Publications, Clarendon Press, Oxford, 1993,
pp. 67–182.
[79] J.M. Bilbao, Cooperative Games on Combinatorial Structures, Kluwer Academic
Publishers, Dordrecht, 2000.
[80] J.M. Bilbao, J.R. Fernández, A. Jiménez Losada and J.J. López, Generating functions for
computing power indices efficiently, Sociedad de Estadística e Investigación Operativa
Top 8 (2000) 191–213.
[81] J.M. Bilbao, J.R. Fernández, N. Jiménez and J.J. López, Voting power in the Euro-
pean Union enlargement, European Journal of Operational Research 143 (2002)
181–196.
[82] L.J. Billera, Clutter decomposition and monotonic Boolean functions, Annals of the New
York Academy of Sciences 175 (1970) 41–48.
[83] L.J. Billera, On the composition and decomposition of clutters, Journal of Combinatorial
Theory 11 (1971) 234–245.
[84] A. Billionnet and M. Minoux, Maximizing a supermodular pseudoboolean function:
A polynomial algorithm for supermodular cubic functions, Discrete Applied Mathematics
12 (1985) 1–11.
[85] A. Billionnet and S. Elloumi, Using a mixed integer quadratic programming solver for the
unconstrained quadratic 0–1 problem, Mathematical Programming 109 (2007) 55–68.
[86] J.C. Bioch, Dualization, decision lists and identification of monotone discrete functions,
Annals of Mathematics and Artificial Intelligence 24 (1998) 69–91.
[87] J.C. Bioch, Decomposition of Boolean functions, in: Y. Crama and P.L. Hammer, eds.,
Boolean Models and Methods in Mathematics, Computer Science, and Engineering,
Cambridge University Press, Cambridge, 2010, pp. 39–75.
[88] J.C. Bioch and T. Ibaraki, Generating and approximating non-dominated coteries, IEEE
Transactions on Parallel and Distributed Systems 6 (1995) 905–914.
[89] J.C. Bioch and T. Ibaraki, Complexity of identification and dualization of positive Boolean
functions, Information and Computation 123 (1995) 50–63.
[90] E. Birnbaum and E.L. Lozinskii, The good old Davis-Putnam procedure helps counting
models, Journal of Artificial Intelligence Research 10 (1999) 455–477.
[91] Z.W. Birnbaum, On the importance of different components in a multicomponent system,
in: P.R. Krishnaiah, ed., Multivariate Analysis-II, Academic Press, New York, 1969.
[92] Z.W. Birnbaum, J.D. Esary and S.C. Saunders, Multi-component systems and structures
and their reliability, Technometrics 3 (1961) 55–77.
[93] H. Björklund, S. Sandberg and S. Vorobyov, Optimization on completely unimodal hyper-
cubes, Technical Report TR-2002-018, Department of Information Technology, Uppsala
University, Sweden, May 2002.
[94] H. Björklund, S. Sandberg and S. Vorobyov, Complexity of model checking by iterative
improvement: The pseudo-Boolean framework, in: M. Broy and A.V. Zamulin, eds.,
Perspectives of System Informatics 2003, Lecture Notes in Computer Science, Vol. 2890,
Springer-Verlag, Berlin-Heidelberg, 2003, pp. 381–394.
[95] H. Björklund and S. Vorobyov, Combinatorial structure and randomized subexponential
algorithms for infinite games, Theoretical Computer Science 349 (2005) 347–360.
[96] A. Björner, Homology and shellability of matroids and geometric lattices, in: N. White,
ed., Matroid Applications, Cambridge University Press, Cambridge, 1992, pp. 226–283.
[97] A. Björner, Topological methods, in: R. Graham, M. Grötschel and L. Lovász, eds.,
Handbook of Combinatorics, Elsevier, Amsterdam, 1995, pp. 1819–1872.
[98] C.E. Blair, R.G. Jeroslow and J.K. Lowe, Some results and experiments in program-
ming techniques for propositional logic, Computers and Operations Research 13 (1986)
633–645.
[99] A. Blake, Canonical Expressions in Boolean Algebras, Dissertation, Department of Math-
ematics, University of Chicago, 1937. Published by University of Chicago Libraries,
1938.
[100] B. Bollig, M. Sauerhoff, D. Sieling and I. Wegener, Binary decision diagrams, in:
Y. Crama and P.L. Hammer, eds., Boolean Models and Methods in Mathematics, Computer
Science, and Engineering, Cambridge University Press, Cambridge, 2010, pp. 473–505.
[101] B. Bollobás, C. Borgs, J. Chayes, J.H. Kim and D.B. Wilson, The scaling window of the
2-SAT transition, Random Structures and Algorithms 18 (2001) 201–256.
[102] T. Bonates and P.L. Hammer, Logical Analysis of Data: From combinatorial optimization
to medical applications, Annals of Operations Research 148 (2006) 203–225.
[103] G. Boole, An Investigation of the Laws of Thought, Walton, London, 1854. (Reprinted by
Dover Books, New York, 1954.)
[104] K.S. Booth, Boolean matrix multiplication using only O(n^{log_2 7} log n) bit operations,
SIGACT News 9 (Fall 1977) p. 23.
[105] E. Boros, Dualization of aligned Boolean functions, RUTCOR Research Report RRR
9-94, Rutgers University, Piscataway, NJ, 1994.
[106] E. Boros, Maximum renamable Horn sub-CNFs, Discrete Applied Mathematics 96–97
(1999) 29–40.
[107] E. Boros and O. Čepek, Perfect 0, ±1 matrices, Discrete Mathematics 165–166 (1997)
81–100.
[108] E. Boros, O. Čepek and A. Kogan, Horn minimization by iterative decomposition, Annals
of Mathematics and Artificial Intelligence 23 (1998) 321–343.
[109] E. Boros, O. Čepek, A. Kogan and P. Kučera, Exclusive and essential sets of implicates of
Boolean functions, RUTCOR Research Report 10-2008, Rutgers University, Piscataway,
NJ, 2008.
[110] E. Boros, O. Čepek and P. Kučera, Complexity of minimizing the number of clauses and
literals in a Horn CNF, manuscript, 2010.
[111] E. Boros, Y. Crama, O. Ekin, P.L. Hammer, T. Ibaraki and A. Kogan, Boolean normal
forms, shellability and reliability computations, SIAM Journal on Discrete Mathematics
13 (2000) 212–226.
[112] E. Boros, Y. Crama and P.L. Hammer, Polynomial-time inference of all valid implications
for Horn and related formulae, Annals of Mathematics and Artificial Intelligence 1 (1990)
21–32.
[113] E. Boros, Y. Crama and P.L. Hammer, Upper bounds for quadratic 0–1 maximization,
Operations Research Letters 9 (1990) 73–79.
[114] E. Boros, Y. Crama and P.L. Hammer, Chvátal cuts and odd cycle inequalities in quadratic
0–1 optimization, SIAM Journal on Discrete Mathematics 5 (1992) 163–177.
[115] E. Boros, Y. Crama, P.L. Hammer, T. Ibaraki, A. Kogan and K. Makino, Logical Analysis
of Data: Classification with justification, Annals of Operations Research (2011), to appear.
[116] E. Boros, Y. Crama, P.L. Hammer and M. Saks, A complexity index for satisfiability
problems, SIAM Journal on Computing 23 (1994) 45–49.
[117] E. Boros, K.M. Elbassioni, V. Gurvich, L. Khachiyan and K. Makino, Dual-bounded
generating problems: All minimal integer solutions for a monotone system of linear
inequalities, SIAM Journal on Computing 31 (2002) 1624–1643.
[118] E. Boros, K.M. Elbassioni, V. Gurvich and K. Makino, Generating vertices of polyhe-
dra and related monotone generation problems, in: D. Avis, D. Bremner and A. Deza,
eds., Polyhedral Computations, CRM Proceedings and Lecture Notes, Vol. 48, Centre de
Recherches Mathématiques and AMS (2009) pp. 15–44.
[119] E. Boros, K.M. Elbassioni and K. Makino, On Berge multiplication for monotone Boolean
dualization, in: A. Luca et al., eds., Proceedings of the 35th International Colloquium on
Automata, Languages and Programming (ICALP), Lecture Notes in Computer Science,
Vol. 5125, Springer-Verlag, Berlin Heidelberg, 2008, pp. 48–59.
[120] E. Boros, S. Foldes, P.L. Hammer and B. Simeone, A restricted consensus algorithm for
the transitive closure of a digraph, manuscript, in preparation, 2008.
[121] E. Boros, V. Gurvich and P.L. Hammer, Dual subimplicants of positive Boolean functions,
Optimization Methods and Software 10 (1998) 147–156.
[122] E. Boros, V. Gurvich, P.L. Hammer, T. Ibaraki and A. Kogan, Decompositions of partially
defined Boolean functions, Discrete Applied Mathematics 62 (1995) 51–75.
[123] E. Boros, V. Gurvich, L. Khachiyan and K. Makino, Dual-bounded generating problems:
Partial and multiple transversals of a hypergraph, SIAM Journal on Computing 30 (2000)
2036–2050.
[124] E. Boros, V. Gurvich, L. Khachiyan and K. Makino, Dual-bounded generating problems:
Weighted transversals of a hypergraph, Discrete Applied Mathematics 142 (2004) 1–15.
[125] E. Boros and P.L. Hammer, A max-flow approach to improved roof duality in quadratic
0–1 minimization, RUTCOR Research Report RRR 15-1989, Rutgers University, 1989.
[126] E. Boros and P.L. Hammer, A generalization of the pure literal rule for satisfiability
problems, RUTCOR Research Report 20-92, Rutgers University, 1992.
[127] E. Boros and P.L. Hammer, Pseudo-Boolean optimization, Discrete Applied Mathematics
123 (2002) 155–225.
[128] E. Boros, P.L. Hammer and J.N. Hooker, Predicting cause-effect relationships from
incomplete discrete observations, SIAM Journal on Discrete Mathematics 7 (1994)
531–543.
[129] E. Boros, P.L. Hammer, T. Ibaraki and K. Kawakami, Polynomial time recognition of
2-monotonic positive Boolean functions given by an oracle, SIAM Journal on Computing
26 (1997) 93–109.
[130] E. Boros, P.L. Hammer, T. Ibaraki and A. Kogan, Logical analysis of numerical data,
Mathematical Programming 79 (1997) 163–190.
[131] E. Boros, P.L. Hammer, T. Ibaraki, A. Kogan, E. Mayoraz and I. Muchnik, An implemen-
tation of logical analysis of data, IEEE Transactions on Knowledge and Data Engineering
12 (2000) 292–306.
[132] E. Boros, P.L. Hammer, M. Minoux and D.J. Rader Jr., Optimal cell flipping to mini-
mize channel density in VLSI design and pseudo-Boolean optimization, Discrete Applied
Mathematics 90 (1999) 69–88.
[133] E. Boros, P.L. Hammer and X. Sun, The DDT method for quadratic 0–1 minimization,
RUTCOR Research Report 39-89, Rutgers University, 1989.
[134] E. Boros, P.L. Hammer and X. Sun, Network flows and minimization of quadratic pseudo-
Boolean functions, RUTCOR Research Report 17-91, Rutgers University, 1991.
[135] E. Boros, P.L. Hammer and X. Sun, Recognition of q-Horn formulae in linear time,
Discrete Applied Mathematics 55 (1994) 1–13.
[136] E. Boros, P.L. Hammer and G. Tavares, Local search heuristics for quadratic unconstrained
binary optimization, Journal of Heuristics 13 (2007) 99–132.
[137] E. Boros, T. Horiyama, T. Ibaraki, K. Makino and M. Yagiura, Finding essential attributes
from binary data, Annals of Mathematics and Artificial Intelligence 39 (2003) 223–257.
[138] E. Boros, T. Ibaraki and K. Makino, Boolean analysis of incomplete examples, in: R.
Karlsson and A. Lingas, eds., Algorithm Theory – SWAT’96, Lecture Notes in Computer
Science, Vol. 1097, Springer-Verlag, Berlin, 1996, pp. 440–451.
[139] E. Boros, T. Ibaraki and K. Makino, Error-free and best-fit extensions of partially defined
Boolean functions, Information and Computation 140 (1998) 254–283.
[140] E. Boros, T. Ibaraki and K. Makino, Logical analysis of binary data with missing bits,
Artificial Intelligence 107 (1999) 219–264.
[141] E. Boros, T. Ibaraki and K. Makino, Fully consistent extensions of partially defined
Boolean functions, in: J. van Leeuwen, O. Watanabe, M. Hagiya, P.D. Mosses and T. Ito,
eds., Theoretical Computer Science - International Conference IFIP TCS 2000, Lecture
Notes in Computer Science, Vol. 1872, Springer, Berlin, 2000, pp. 257–272.
[142] E. Boros, T. Ibaraki and K. Makino, Variations on extending partially defined Boolean
functions with missing bits, Information and Computation 180 (2003) 53–70.
[143] E. Boros, I. Lari and B. Simeone, Block linear majorants in quadratic 0–1 optimization,
Discrete Applied Mathematics 145 (2004) 52–71.
[144] E. Boros and A. Prékopa, Closed form two-sided bounds for probabilities that at least r or
exactly r out of n events occur, Mathematics of Operations Research 14 (1989) 317–342.
[145] E. Boros and A. Prékopa, Probabilistic bounds and algorithms for the maximum
satisfiability problem, Annals of Operations Research 21 (1989) 109–126.
[146] J.-M. Bourjolly, P.L. Hammer, W.R. Pulleyblank and B. Simeone, Boolean-combinatorial
bounding of maximum 2-satisfiability, in: O. Balci, R. Sharda, S. Zenios, eds., Computer
Science and Operations Research: New Developments in their Interfaces, Pergamon Press,
1992, pp. 23–42.
[147] Y. Boykov, O. Veksler and R. Zabih, Fast approximate energy minimization via graph cuts,
IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2001) 1222–1239.
[148] G.H. Bradley, P.L. Hammer and L.A. Wolsey, Coefficient reduction for inequalities in
0–1 variables, Mathematical Programming 7 (1974) 263–282.
[149] P.S. Bradley, U.M. Fayyad and O.L. Mangasarian, Mathematical programming for data
mining: Formulations and challenges, INFORMS Journal on Computing 11 (1999)
217–238.
[150] S.J. Brams and P.J. Affuso, Power and size: A new paradox, Theory and Decision 7 (1976)
29–56.
[151] A. Brandstädt, P.L. Hammer, V.B. Le and V.V. Lozin, Bisplit graphs, Discrete Mathematics
299 (2005) 11–32.
[152] A. Brandstädt, V.B. Le and J.P. Spinrad, Graph Classes: A Survey, SIAM Monographs on
Discrete Mathematics and Applications, SIAM, Philadelphia, 1999.
[153] R.K. Brayton, G.D. Hachtel, C.T. McMullen and A.L. Sangiovanni-Vincentelli, Logic
Minimization Algorithms for VLSI Synthesis, Kluwer Academic Publishers, Boston, 1984.
[154] A. Bretscher, D.G. Corneil, M. Habib and C. Paul, A simple linear time LexBFS cograph
recognition algorithm (extended abstract), in: Proceedings of the 29th International Work-
shop on Graph-Theoretic Concepts in Computer Science, WG2003, Lecture Notes in
Computer Science, Vol. 2880, Springer-Verlag, Berlin Heidelberg, 2003, pp. 119–130.
[155] A. Bretscher, D.G. Corneil, M. Habib and C. Paul, A simple linear time LexBFS cograph
recognition algorithm, SIAM Journal on Discrete Mathematics 22 (2008) 1277–1296.
[156] F.M. Brown, Boolean Reasoning: The Logic of Boolean Equations, Kluwer Academic
Publishers, Boston - Dordrecht - London, 1990.
[157] J. Bruck, Fourier transforms and threshold circuit complexity, in: Y. Crama and
P.L. Hammer, eds., Boolean Models and Methods in Mathematics, Computer Science,
and Engineering, Cambridge University Press, Cambridge, 2010, pp. 531–553.
[158] R. Bruni, On the orthogonalization of arbitrary Boolean formulae, Journal of Applied
Mathematics and Decision Sciences 2 (2005) 61–74.
[159] R. Bruni and A. Sassano, A complete adaptive solver for propositional satisfiability,
Discrete Applied Mathematics 127 (2003) 523–534.
[160] R.E. Bryant, Graph-based algorithms for Boolean function manipulation, IEEE Transac-
tions on Computers 35 (1986) 677–691.
[161] N.H. Bshouty, Exact learning Boolean functions via the monotone theory, Information
and Computation 123 (1995) 146–153.
[162] N. Bshouty, T.R. Hancock and L. Hellerstein, Learning boolean read-once formulas with
arbitrary symmetric and constant fan-in gates, Journal of Computer and System Sciences
50 (1995) 521–542.
[163] N. Bshouty and C. Tamon, On the Fourier spectrum of monotone functions, Journal of
the Association for Computing Machinery 43 (1996) 747–770.
[164] C. Buchheim and G. Rinaldi, Efficient reduction of polynomial zero-one optimization to
the quadratic case, SIAM Journal on Optimization 18 (2007) 1398–1413.
[165] C. Buchheim and G. Rinaldi, Terse integer linear programs for Boolean optimization,
Journal on Satisfiability, Boolean Modeling and Computation 6 (2009) 121–139.
[166] M. Buro and H. Kleine Büning, Report on a SAT competition, Report Nr. 110,
Mathematik/Informatik, Universität Paderborn, 1992.
[167] W. Büttner and H. Simonis, Embedding Boolean expressions into logic programming,
Journal of Symbolic Computation 4 (1987) 191–205.
[168] R. Cambini, G. Gallo and M.G. Scutellà, Flows on hypergraphs, Mathematical Program-
ming 78 (1997) 195–217.
[169] A. Caprara, M. Fischetti and P. Toth, A heuristic method for the set covering problem,
Operations Research 47 (1999) 730–743.
[170] C. Carlet, Boolean functions for cryptography and error-correcting codes, in: Y. Crama
and P.L. Hammer, eds., Boolean Models and Methods in Mathematics, Computer Science,
and Engineering, Cambridge University Press, Cambridge, 2010, pp. 257–397.
[171] C. Carlet, Vectorial Boolean functions for cryptography, in: Y. Crama and P.L. Hammer,
eds., Boolean Models and Methods in Mathematics, Computer Science, and Engineering,
Cambridge University Press, Cambridge, 2010, pp. 398–469.
[172] O. Čepek, Restricted consensus method and quadratic implicates of pure Horn functions,
RUTCOR Research Report 31, Rutgers University, Piscataway, NJ, September 1994.
[173] O. Čepek, Structural properties and minimization of Horn Boolean functions, Ph.D. thesis,
RUTCOR, Rutgers University, Piscataway, NJ, October 1995.
[174] O. Čepek and P. Kučera, Known and new classes of generalized Horn formulae with
polynomial recognition and SAT testing, Discrete Applied Mathematics 149 (2005) 14–52.
[175] O. Čepek and P. Kučera, On the complexity of minimizing the number of literals in Horn
formulae, RUTCOR Research Report 11-2008, Rutgers University, Piscataway, NJ, 2008.
[176] S. Ceri, G. Gottlob and L. Tanca, Logic Programming and Databases, Springer-Verlag,
Berlin Heidelberg, 1990.
[177] D. Chai and A. Kuehlmann, A fast pseudo-Boolean constraint solver, IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems 24 (2005) 305–317.
[178] S.T. Chakradhar, V.D. Agrawal and M.L. Bushnell, Neural Models and Algorithms for
Digital Testing, Kluwer Academic Publishers, Boston - Dordrecht - London, 1991.
[179] A.K. Chandra, H.R. Lewis and J.A. Makowsky, Embedded implicational dependencies
and their inference problem, in: Proceedings of the 13th Annual ACM Symposium on the
Theory of Computation, ACM Press, New York, 1981, pp. 342–354.
[180] R. Chandrasekaran, Integer programming problems for which a simple rounding type
algorithm works, in: W.R. Pulleyblank, ed., Progress in Combinatorial Optimization,
Academic Press Canada, Toronto, 1984, pp. 101–106.
[181] V. Chandru, C.R. Coullard, P.L. Hammer, M. Montanez and X. Sun, On renamable
Horn and generalized Horn functions, Annals of Mathematics and Artificial Intelligence
1 (1990) 33–47.
[182] V. Chandru and J.N. Hooker, Extended Horn sets in propositional logic, Journal of the
ACM 38 (1991) 205–221.
[183] V. Chandru and J.N. Hooker, Detecting embedded Horn structure in propositional logic,
Information Processing Letters 42 (1992) 109–111.
[184] V. Chandru and J.N. Hooker, Optimization Methods for Logical Inference, John Wiley &
Sons, New York etc., 1999.
[185] C.L. Chang, The unit proof and the input proof in theorem proving, Journal of the ACM
17 (1970) 698–707.
[186] C.-L. Chang and R.C. Lee, Symbolic Logic and Mechanical Theorem Proving, Academic
Press, New York - San Francisco - London, 1973.
[187] M.T. Chao and J. Franco, Probabilistic analysis of a generalization of the unit-clause
literal selection heuristic for the k-satisfiability problem, Information Sciences 51 (1990)
289–314.
[188] A. Chateauneuf and J.Y. Jaffray, Some characterizations of lower probabilities and other
monotone capacities through the use of Möbius inversion, Mathematical Social Sciences
17 (1989) 263–283.
[189] S.S. Chaudhry, I.D. Moon and S.T. McCormick, Conditional covering: Greedy heuristics
and computational results, Computers and Operations Research 14 (1987) 11–18.
[190] M. Chein, Algorithmes d’écriture de fonctions Booléennes croissantes en sommes et
produits, Revue Française d’Informatique et de Recherche Opérationnelle 1 (1967)
97–105.
[191] Y. Chen and D. Cooke, On the transitive closure representation and adjustable compres-
sion, in: SAC06 – Proceedings of the 21st Annual ACM Symposium on Applied Computing,
Dijon, France, 2006, pp. 450–455.
[192] G. Choquet, Theory of capacities, Annales de l’Institut Fourier 5 (1954) 131–295.
[193] C.K. Chow, Boolean functions realizable with single threshold devices, in: Proceedings
of the IRE 49 (1961) 370–371.
[194] C.K. Chow, On the characterization of threshold functions, in: IEEE Symposium on
Switching Circuit Theory and Logical Design, 1961, pp. 34–48.
[195] F.R.K. Chung, R.L. Graham and M.E. Saks, A dynamic location problem for graphs,
Combinatorica 9 (1989) 111–132.
[196] R. Church, Enumeration by rank of the elements of the free distributive lattice with 7
generators, Notices of the American Mathematical Society 12 (1965) 724.
[197] V. Chvátal, Edmonds polytopes and a hierarchy of combinatorial problems, Discrete
Mathematics 4 (1973) 305–337.
[198] V. Chvátal, A greedy heuristic for the set-covering problem, Mathematics of Operations
Research 4 (1979) 233–235.
[199] V. Chvátal, Linear Programming, W.H. Freeman and Co., New York, 1983.
[200] V. Chvátal and C. Ebenegger, A note on line digraphs and the directed max-cut problem,
Discrete Applied Mathematics 29 (1990) 165–170.
[201] V. Chvátal and P.L. Hammer, Aggregation of inequalities in integer programming, Annals
of Discrete Mathematics 1 (1977) 145–162.
[202] V. Chvátal and B. Reed, Mick gets some (the odds are on his side), in: Proceedings of
the 33rd Annual IEEE Symposium on the Foundations of Computer Science, IEEE, 1992,
pp. 620–627.
[203] V. Chvátal and E. Szemerédi, Many hard examples for resolution, Journal of the
Association for Computing Machinery 35 (1988) 759–788.
[204] E. Clarke, A. Biere, R. Raimi and Y. Zhu, Bounded model checking using satisfiability
solving, Formal Methods in System Design 19 (2001) 7–34.
[205] C.J. Colbourn, The Combinatorics of Network Reliability, Oxford University Press,
New York, 1987.
[206] C.J. Colbourn, Boolean aspects of network reliability, in: Y. Crama and P.L. Hammer,
eds., Boolean Models and Methods in Mathematics, Computer Science, and Engineering,
Cambridge University Press, Cambridge, 2010, pp. 723–759.
[207] M. Conforti, G. Cornuéjols and C. de Francesco, Perfect 0, ±1 matrices, Linear Algebra
and its Applications 43 (1997) 299–309.
[208] S.A. Cook, The complexity of theorem-proving procedures, in: Proceedings of the Third
ACM Symposium on the Theory of Computing, 1971, pp. 151–158.
[209] S.A. Cook and D.G. Mitchell, Finding hard instances for the satisfiability problem: A
survey, in: D. Du, J. Gu and P.M. Pardalos, eds., Satisfiability Problem: Theory and Appli-
cations, DIMACS series in Discrete Mathematics and Theoretical Computer Science,
Vol. 35, American Mathematical Society, 1997, pp. 1–17.
[210] W.J. Cook, C.R. Coullard and Gy. Turán, On the complexity of cutting-plane proofs,
Discrete Applied Mathematics 18 (1987) 25–38.
[211] W.J. Cook, W.H. Cunningham, W.R. Pulleyblank and A. Schrijver, Combinatorial
Optimization, Wiley-Interscience, New York, 1998.
[212] D. Coppersmith and S. Winograd, On the asymptotic complexity of matrix multiplication,
SIAM Journal on Computing 11 (1982) 472–492.
[213] D. Corneil, H. Lerchs and L. Burlingham, Complement reducible graphs, Discrete Applied
Mathematics 3 (1981) 163–174.
[214] D. Corneil, Y. Perl and L. Stewart, A linear recognition algorithm for cographs, SIAM
Journal on Computing 14 (1985) 926–934.
[215] G. Cornuéjols, Combinatorial Optimization, SIAM, Philadelphia, 2001.
[216] R.W. Cottle and A.F. Veinott, Polyhedral sets having a least element, Mathematical
Programming 3 (1972) 238–249.
[217] M. Couceiro and S. Foldes, Definability of Boolean function classes by linear equations
over GF(2), Discrete Applied Mathematics 142 (2004) 29–34.
[218] M. Couceiro and S. Foldes, On closed sets of relational constraints and classes of functions
closed under variable substitutions, Algebra Universalis 54 (2005) 149–165.
[219] M. Couceiro and S. Foldes, Functional equations, constraints, definability of function
classes, and functions of Boolean variables, Acta Cybernetica 18 (2007) 61–75.
[220] M. Couceiro and M. Pouzet, On a quasi-ordering on Boolean functions, Theoretical
Computer Science 396 (2008) 71–87.
[221] O. Coudert, Two-level logic minimization: An overview, Integration: The VLSI Journal
17 (1994) 97–140.
[222] O. Coudert and T. Sasao, Two-level logic minimization, in: Logic Synthesis and Verifica-
tion, S. Hassoun and T. Sasao, eds., Kluwer Academic Publishers, Norwell, MA, 2002,
pp. 1–27.
[223] M.B. Cozzens and R. Leibowitz, Multidimensional scaling and threshold graphs, Journal
of Mathematical Psychology 31 (1987) 179–191.
[224] Y. Crama, Recognition and Solution of Structured Discrete Optimization Problems, Ph.D.
thesis, Rutgers University, Piscataway, NJ, 1987.
[225] Y. Crama, Dualization of regular Boolean functions, Discrete Applied Mathematics 16
(1987) 79–85.
[226] Y. Crama, Recognition problems for special classes of polynomials in 0–1 variables,
Mathematical Programming 44 (1989) 139–155.
[227] Y. Crama, Concave extensions for nonlinear 0–1 maximization problems, Mathematical
Programming 61 (1993) 53–60.
[228] Y. Crama, Combinatorial optimization models for production scheduling in automated
manufacturing systems, European Journal of Operational Research 99 (1997) 136–153.
[229] Y. Crama, O. Ekin and P.L. Hammer, Variable and term removal from Boolean formulae,
Discrete Applied Mathematics 75 (1997) 217–230.
[230] Y. Crama and P.L. Hammer, Recognition of quadratic graphs and adjoints of bidirected
graphs, in: G.S. Bloom, R.L. Graham and J. Malkevitch, eds., Combinatorial Mathemat-
ics: Proceedings of the Third International Conference, Annals of the New York Academy
of Sciences, Vol. 555, 1989, pp. 140–149.
[231] Y. Crama and P.L. Hammer, eds., Boolean Models and Methods in Mathematics, Computer
Science, and Engineering, Cambridge University Press, Cambridge, 2010.
[232] Y. Crama, P.L. Hammer and R. Holzman, A characterization of a cone of pseudo-Boolean
functions via supermodularity-type inequalities, in: P. Kall, J. Kohlas, W. Popp and
C.A. Zehnder, eds., Quantitative Methoden in den Wirtschaftswissenschaften, Springer-
Verlag, Berlin-Heidelberg, 1989, pp. 53–55.
[233] Y. Crama, P.L. Hammer and T. Ibaraki, Cause-effect relationships and partially defined
Boolean functions, Annals of Operations Research 16 (1988) 299–326.
[234] Y. Crama, P.L. Hammer, B. Jaumard and B. Simeone, Product form parametric represen-
tation of the solutions to a quadratic Boolean equation, RAIRO - Operations Research 21
(1987) 287–306.
[235] Y. Crama, P. Hansen and B. Jaumard, The basic algorithm for pseudo-Boolean program-
ming revisited, Discrete Applied Mathematics 29 (1990) 171–185.
[236] Y. Crama and L. Leruth, Control and voting power in corporate networks: Concepts and
computational aspects, European Journal of Operational Research 178 (2007) 879–893.
[237] Y. Crama, L. Leruth, L. Renneboog and J.-P. Urbain, Corporate control concentration
measurement and firm performance, in: J.A. Batten and T.A. Fetherston, eds., Social
Responsibility: Corporate Governance Issues, Research in International Business and
Finance (Volume 17), Elsevier, Amsterdam, 2003, pp. 123–149.
[238] Y. Crama and J.B. Mazzola, Valid inequalities and facets for a hypergraph model of the
nonlinear knapsack and FMS part-selection problems, Annals of Operations Research 58
(1995) 99–128.
[239] J.M. Crawford and L.D. Auton, Experimental results on the crossover point in random
3-SAT, Artificial Intelligence 81 (1996) 31–57.
[240] N. Creignou, A dichotomy theorem for maximum generalized satisfiability problems,
Journal of Computer and System Sciences 51 (1995) 511–522.
[241] N. Creignou and H. Daudé, Generalized satisfiability problems: Minimal elements and
phase transitions, Theoretical Computer Science 302 (2003) 417–430.
[242] N. Creignou and H. Daudé, The SAT–UNSAT transition for random constraint satisfaction
problems, Discrete Mathematics 309 (2009) 2085–2099.
[243] N. Creignou, S. Khanna and M. Sudan, Complexity Classifications of Boolean Constraint
Satisfaction Problems, SIAM Monographs on Discrete Mathematics and Applications,
SIAM, Philadelphia, 2001.
[244] P. Crescenzi and V. Kann, eds., A compendium of NP optimization problems, pub-
lished electronically at http://www.nada.kth.se/~viggo/wwwcompendium/
(2005).
[245] H.P. Crowder, E.L. Johnson and M.W. Padberg, Solving large-scale zero–one linear
programming problems, Operations Research 31 (1983) 803–834.
[246] J. Cubbin and D. Leech, The effect of shareholding dispersion on the degree of control in
British companies: Theory and measurement, The Economic Journal 93 (1983) 351–369.
[247] R. Cunninghame-Green, Minimax Algebra, Lecture Notes in Economics and Mathemat-
ical Systems, Vol. 166, Springer, Berlin, 1979.
[248] H.A. Curtis, A New Approach to the Design of Switching Circuits, D. Van Nostrand,
Princeton, NJ, 1962.
[249] S.L.A. Czort, The Complexity of Minimizing Disjunctive Normal Form Formulas,
Master’s thesis, University of Aarhus, 1999.
[250] E. Dahlhaus, Learning monotone read-once formulas in quadratic time, Unpublished
manuscript, Department of Computer Science, University of Sydney, 1990.
[251] E. Dahlhaus, Efficient parallel recognition algorithms of cographs and distance hereditary
graphs, Discrete Applied Mathematics 57 (1995) 29–44.
[252] V. Dahllöf, P. Jonsson and M. Wahlström, Counting models for 2SAT and 3SAT formulae,
Theoretical Computer Science 332 (2005) 265–291.
[253] M. Dalal and D.W. Etherington, A hierarchy of tractable satisfiability problems,
Information Processing Letters 44 (1992) 173–180.
[254] G. Danaraj and V. Klee, Which spheres are shellable? Annals of Discrete Mathematics 2
(1978) 33–52.
[255] E. Dantsin, A. Goerdt, E.A. Hirsch, R. Kannan, J. Kleinberg, Ch. Papadimitriou, P. Ragha-
van and U. Schöning, A deterministic (2 − 2/(k + 1))^n algorithm for k-SAT based on local
search, Theoretical Computer Science 289 (2002) 69–83.
[256] G.B. Dantzig, On the significance of solving linear programming problems with some
integer variables, Econometrica 28 (1960) 30–44.
[257] A. Darwiche, New advances in compiling CNF to decomposable negation normal form,
in: Proceedings of the 16th European Conference on Artificial Intelligence, Valencia,
Spain, 2004, pp. 328–332.
[258] S.B. Davidson, H. Garcia-Molina and D. Skeen, Consistency in partitioned networks,
ACM Computing Surveys 17 (1985) 341–370.
[259] M. Davio, J.-P. Deschamps and A. Thayse, Discrete and Switching Functions,
McGraw-Hill, New York, 1978.
[260] M. Davis, G. Logemann and D. Loveland, A machine program for theorem-proving,
Communications of the ACM 5 (1962) 394–397.
[261] M. Davis and H. Putnam, A computing procedure for quantification theory, Journal of
the Association for Computing Machinery 7 (1960) 201–215.
[262] T. Davoine, P.L. Hammer and B. Vizvári, A heuristic for Boolean optimization problems,
Journal of Heuristics 9 (2003) 229–247.
[263] P.M. Dearing, P.L. Hammer and B. Simeone, Boolean and graph theoretic formulations
of the simple plant location problem, Transportation Science 26 (1992) 138–148.
[264] R. Dechter and J. Pearl, Structure identification in relational data, Artificial Intelligence
58 (1992) 237–270.
[265] E. de Klerk and J.P. Warners, Semidefinite programming relaxations for MAX 2-SAT and
3-SAT: Computational perspectives, in: P.M. Pardalos, A. Migdalas and R.E. Burkard,
eds., Combinatorial and Global Optimization, Series on Applied Optimization, Volume
14, World Scientific Publishers, River Edge, NJ, 2002, pp. 161–176.
[266] E. de Klerk, J.P. Warners and H. van Maaren, Relaxations of the satisfiability problem
using semidefinite programming, Journal of Automated Reasoning 24 (2000) 37–65.
[267] C. Delobel and R.G. Casey, Decomposition of a database and the theory of Boolean
switching functions, IBM Journal of Research and Development 17 (1973) 374–386.
[268] X. Deng and C.H. Papadimitriou, On the complexity of cooperative solution concepts,
Mathematics of Operations Research 19 (1994) 257–266.
[269] M.L. Dertouzos, Threshold Logic: A Synthesis Approach, M.I.T. Press, Cambridge, MA,
1965.
[270] M.M. Deza and M. Laurent, Geometry of Cuts and Metrics, Springer-Verlag, Berlin, 1997.
[271] I. Diakonikolas and R.A. Servedio, Improved approximation of linear threshold functions,
in: Proceedings of the 24th Annual IEEE Conference on Computational Complexity, IEEE
Computer Society, Los Alamitos, CA, 2009, pp. 161–172.
[272] G. Ding, Monotone clutters, Discrete Mathematics 119 (1993) 67–77.
[273] G. Ding, R.F. Lax, J. Chen and P.P. Chen, Formulas for approximating pseudo-Boolean
random variables, Discrete Applied Mathematics 156 (2008) 1581–1597.
[274] G. Ding, R.F. Lax, J. Chen, P.P. Chen and B.D. Marx, Transforms of pseudo-Boolean
random variables, Discrete Applied Mathematics 158 (2010) 13–24.
[275] C. Domingo, N. Mishra and L. Pitt, Efficient read-restricted monotone CNF/DNF
dualization by learning with membership queries, Machine Learning 37 (1999) 89–110.
[276] G. Dong and J. Li, Mining border descriptions of emerging patterns from dataset pairs,
Knowledge and Information Systems 8 (2005) 178–202.
[277] W.F. Dowling and J.H. Gallier, Linear time algorithms for testing the satisfiability of
propositional Horn formulae, Journal of Logic Programming 3 (1984) 267–284.
[278] D. Du, J. Gu and P.M. Pardalos, eds., Satisfiability Problem: Theory and Applications,
DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 35,
American Mathematical Society, 1997.
[279] P. Dubey and L.S. Shapley, Mathematical properties of the Banzhaf power index,
Mathematics of Operations Research 4 (1979) 99–131.
[280] O. Dubois, Counting the number of solutions for instances of satisfiability problems,
Theoretical Computer Science 81 (1991) 49–64.
[281] O. Dubois, P. André, Y. Boufkhad and J. Carlier, SAT versus UNSAT, in: D.S. Johnson
and M.A. Trick, eds., Cliques, Coloring, and Satisfiability, DIMACS Series in Discrete
Mathematics and Theoretical Computer Science, Vol. 26, American Mathematical Society,
1996, pp. 415–436.
[282] O. Dubois, Y. Boufkhad and J. Mandler, Typical random 3-SAT formulae and the sat-
isfiability threshold, in: Proceedings of the Eleventh Annual ACM-SIAM Symposium on
Discrete Algorithms, 2000, pp. 126–127.
[283] O. Dubois and G. Dequen, A backbone-search heuristic for efficient solving of hard 3-
SAT formulae, in: Proceedings of the 17th International Joint Conference on Artificial
Intelligence (IJCAI’01), Seattle, Washington, 2001, pp. 248–253.
[284] P. Duchet, Classical perfect graphs, in: Topics on Perfect Graphs, North-Holland,
Amsterdam, 1984, pp. 67–96.
[285] Ch. Ebenegger, P.L. Hammer and D. de Werra, Pseudo-Boolean functions and stability
of graphs, Annals of Discrete Mathematics 19 (1984) 83–97.
[286] J. Ebert, A sensitive transitive closure algorithm, Information Processing Letters 12 (1981)
255–258.
[287] J. Edmonds, Submodular functions, matroids, and certain polyhedra, in: R. Guy,
H. Hanani, N. Sauer and J. Schönheim, eds., Combinatorial Structures and Their
Applications, Gordon and Breach, New York, 1970, pp. 69–87.
[288] J. Edmonds and D.R. Fulkerson, Bottleneck extrema, Journal of Combinatorial Theory
8 (1970) 299–306.
[289] N. Eén and N. Sörensson, An extensible SAT-solver, in: Proceedings of the 6th
International Conference on Theory and Applications of Satisfiability Testing, 2003.
[290] N. Eén and N. Sörensson, Translating pseudo-Boolean constraints into SAT, Journal on
Satisfiability, Boolean Modeling and Computation 2 (2006) 1–26.
[291] E. Einy, The desirability relation of simple games, Mathematical Social Sciences 10 (1985)
155–168.
[292] E. Einy and E. Lehrer, Regular simple games, International Journal of Game Theory 18
(1989) 195–207.
[293] T. Eiter, Exact transversal hypergraphs and application to Boolean µ-functions, Journal
of Symbolic Computation 17 (1994) 215–225.
[294] T. Eiter, Generating Boolean µ-expressions, Acta Informatica 32 (1995) 171–187.
[295] T. Eiter and G. Gottlob, Identifying the minimal transversals of a hypergraph and related
problems, SIAM Journal on Computing 24 (1995) 1278–1304.
[296] T. Eiter, T. Ibaraki and K. Makino, Double Horn functions, Information and Computation
144 (1998) 155–190.
[297] T. Eiter, T. Ibaraki and K. Makino, Computing intersections of Horn theories for reasoning
with models, Artificial Intelligence 110 (1999) 57–101.
[298] T. Eiter, T. Ibaraki and K. Makino, Bidual Horn functions and extensions, Discrete Applied
Mathematics 96 (1999) 55–88.
[299] T. Eiter, T. Ibaraki and K. Makino, On the difference of Horn theories, Journal of Computer
and System Sciences 61 (2000) 478–507.
[300] T. Eiter, T. Ibaraki and K. Makino, Disjunction of Horn theories and their cores, SIAM
Journal on Computing 31 (2001) 269–288.
[301] T. Eiter, P. Kilpelainen and H. Mannila, Recognizing renamable generalized propositional
Horn formulas is NP-complete, Discrete Applied Mathematics 59 (1995) 23–31.
[302] T. Eiter, K. Makino and G. Gottlob, Computational aspects of monotone dualization: A
brief survey, Discrete Applied Mathematics 156 (2008) 2035–2049.
[303] O. Ekin, Special Classes of Boolean Functions, Ph.D. thesis, Rutgers University,
Piscataway, NJ, 1997.
[304] O. Ekin Karaşan, Dualization of quadratic Boolean functions, Annals of Operations
Research (2011), to appear.
[305] O. Ekin, S. Foldes, P.L. Hammer and L. Hellerstein, Equational characterizations of
Boolean function classes, Discrete Mathematics 211 (2000) 27–51.
[306] O. Ekin, P.L. Hammer and A. Kogan, On connected Boolean functions, Discrete Applied
Mathematics 96/97 (1999) 337–362.
[307] O. Ekin, P.L. Hammer and A. Kogan, Convexity and logical analysis of data, Theoretical
Computer Science 244 (2000) 95–116.
[308] O. Ekin, P.L. Hammer and U.N. Peled, Horn functions and submodular Boolean functions,
Theoretical Computer Science 175 (1997) 257–270.
[309] K.M. Elbassioni, On the complexity of monotone dualization and generating minimal
hypergraph transversals, Discrete Applied Mathematics 156 (2008) 2109–2123.
[310] C.C. Elgot, Truth functions realizable by single threshold organs, in: IEEE Symposium
on Switching Circuit Theory and Logical Design, 1961, pp. 225–245.
[311] M.R. Emamy-K., The worst case behavior of a greedy algorithm for a class of pseudo-
Boolean functions, Discrete Applied Mathematics 23 (1989) 285–287.
[312] P. Erdős, On some extremal problems in graph theory, Israel Journal of Mathematics 3
(1965) 113–116.
[313] P. Erdős and T. Gallai, Graphen mit Punkten vorgeschriebenen Graden, Mat. Lapok 11
(1960) 264–274.
[314] P. Erdős and J. Spencer, Probabilistic Methods in Combinatorics, Akadémiai Kiadó,
Budapest, 1974.
[315] B. Escoffier and V.Th. Paschos, Differential approximation of MIN SAT, MAX SAT and
related problems, European Journal of Operational Research 181 (2007) 620–633.
[316] E. Eskin, E. Halperin and R.M. Karp, Efficient reconstruction of haplotype structure via
perfect phylogeny, Journal of Bioinformatics and Computational Biology 1 (2003) 1–20.
[317] R. Euler, Regular (2,2)-systems, Mathematical Programming 24 (1982) 269–283.
[318] S. Even, A. Itai and A. Shamir, On the complexity of timetable and multicommodity flow
problems, SIAM Journal on Computing 5 (1976) 691–703.
[319] R. Fagin, Functional dependencies in a relational database and propositional logic, IBM
Journal of Research and Development 21 (1977) 534–544.
[320] R. Fagin, Horn clauses and database dependencies, Journal of the ACM 29 (1982)
952–985.
[321] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, Advances in Knowledge
Discovery and Data Mining, The MIT Press, Cambridge, MA, 1996.
[322] T. Feder, Network flow and 2-satisfiability, Algorithmica 11 (1994) 291–319.
[323] T. Feder, Stable Networks and Product Graphs, Memoirs of the American Mathematical
Society, Vol. 116, No. 555, Providence, RI, 1995.
[324] U. Feige, A threshold of ln n for approximating set cover, Journal of the Association for
Computing Machinery 45 (1998) 634–652.
[325] U. Feige and M.X. Goemans, Approximating the value of two prover proof sys-
tems, with applications to MAX SAT and MAX DICUT, in: Proceedings of the
Third Israel Symposium on Theory of Computing and Systems, Tel Aviv, Israel, 1995,
pp. 182–189.
[326] J. Feldman, Minimization of Boolean complexity in human concept learning, Nature 407
(2000) 630–633.
[327] J. Feldman, An algebra of human concept learning, Journal of Mathematical Psychology
50 (2006) 339–368.
[328] V. Feldman, Hardness of approximate two-level logic minimization and PAC learning
with membership queries, in: Proceedings of the 38th ACM Symposium on Theory of
Computing (STOC), 2006, pp. 363–372.
[329] D.S. Felsenthal and M. Machover, The Measurement of Voting Power: Theory and
Practice, Problems and Paradoxes, Edward Elgar, Cheltenham, UK, 1998.
[330] J.R. Fernández, E. Algaba, J.M. Bilbao, A. Jiménez, N. Jiménez and J.J. López, Generating
functions for computing the Myerson value, Annals of Operations Research 109 (2002)
143–158.
[331] M.J. Fischer and A.R. Meyer, Boolean matrix multiplication and transitive closure, in:
Proceedings of the 12th Annual IEEE Symposium on the Foundations of Computer
Science, IEEE, 1971, pp. 129–131.
[332] M.L. Fisher, G.L. Nemhauser and L.A. Wolsey, An analysis of approximations for
maximizing submodular set functions - II, Mathematical Programming Study 8 (1978)
73–87.
[333] C. Flament, L’analyse booléenne de questionnaires, Mathématiques et Sciences Humaines
12 (1966) 3–10.
[334] S. Foldes, Equational classes of Boolean functions via the HSP Theorem, Algebra
Universalis 44 (2000) 309–324.
[335] S. Foldes and P.L. Hammer, Split graphs, Congressus Numerantium 19 (1977)
311–315.
[336] S. Foldes and P.L. Hammer, Disjunctive and conjunctive normal forms of pseudo-Boolean
functions, Discrete Applied Mathematics 107 (2000) 1–26.
[337] S. Foldes and P.L. Hammer, Monotone, Horn and quadratic pseudo-Boolean functions,
Journal of Universal Computer Science 6 (2000) 97–104.
[338] S. Foldes and P.L. Hammer, Disjunctive analogues of submodular and supermodular
pseudo-Boolean functions, Discrete Applied Mathematics 142 (2004) 53–65.
[339] S. Foldes and P.L. Hammer, Submodularity, supermodularity, and higher-order mono-
tonicities of pseudo-Boolean functions, Mathematics of Operations Research 30 (2005)
453–461.
[340] S. Foldes and G.R. Pogosyan, Post classes characterized by functional terms, Discrete
Applied Mathematics 142 (2004) 35–51.
[341] L.R. Ford and D.R. Fulkerson, Flows in Networks, Princeton University Press, Princeton,
NJ, 1962.
[342] R. Fortet, L’algèbre de Boole et ses applications en recherche opérationnelle, Cahiers du
Centre d’Etudes de Recherche Opérationnelle 1 (1959) 5–36.
[343] R. Fortet, Applications de l’algèbre de Boole en recherche opérationnelle, Revue
Française de Recherche Opérationnelle 4 (1960) 17–26.
[344] J. Franco, Probabilistic analysis of satisfiability algorithms, in: Y. Crama and P.L. Hammer,
eds., Boolean Models and Methods in Mathematics, Computer Science, and Engineering,
Cambridge University Press, Cambridge, 2010, pp. 99–159.
[345] J. Franco and M. Paull, Probabilistic analysis of the Davis-Putnam procedure for solving
the satisfiability problem, Discrete Applied Mathematics 5 (1983) 77–87.
[346] L. Fratta and U.G. Montanari, A Boolean algebra method for computing the terminal
reliability in a communication network, IEEE Transactions on Circuit Theory CT-20
(1973) 203–211.
[347] M. Fredman and L. Khachiyan, On the complexity of dualization of monotone disjunctive
normal forms, Journal of Algorithms 21 (1996) 618–628.
[348] E. Friedgut, Sharp threshold of graph properties, and the k-SAT problem, Journal of the
American Mathematical Society 12 (1999) 1017–1054 (with an appendix by J. Bourgain).
[349] A.M. Frieze and B. Reed, Probabilistic analysis of algorithms, in: M. Habib, C. McDi-
armid, J. Ramirez-Alfonsin and B. Reed, eds., Probabilistic Methods for Algorithmic
Discrete Mathematics, Springer, Berlin, 1998, pp. 36–92.
[350] A. Frieze and N.C. Wormald, Random k-SAT: A tight threshold for moderately growing k,
Combinatorica 25 (2005) 297–305.
[351] S. Fujishige, Submodular Functions and Optimization, Annals of Discrete Mathematics
Vol. 58, Elsevier, Amsterdam, 2005.
[352] T. Fujito, On approximation of the submodular set cover problem, Operations Research
Letters 25 (1999) 169–174.
[353] D.R. Fulkerson, Networks, frames, blocking systems, in: G.B. Dantzig and A.F. Veinott
Jr., eds., Mathematics of the Decision Sciences - Part I, American Mathematical Society,
Providence, RI, 1968, pp. 303–334.
[354] M. Fürer and S.P. Kasiviswanathan, Algorithms for counting 2-SAT solutions and col-
orings with applications, Algorithmic Aspects in Information and Management, Lecture
Notes in Computer Science, Vol. 4508, Springer-Verlag, Berlin, 2007, pp. 47–57.
[355] M.E. Furman, Application of a method of fast multiplication to the problem of finding
the transitive closure of a graph, Soviet Mathematics Doklady 22 (1970) 1252.
[356] I.J. Gabelman, The Functional Behavior of Majority (Threshold) Elements, Ph.D.
Dissertation, Department of Electrical Engineering, Syracuse University, NY, 1961.
[357] H.N. Gabow and R.E. Tarjan, A linear-time algorithm for a special case of disjoint set
union, Journal of Computer and System Sciences 30 (1985) 209–221.
[358] T. Gallai, Transitiv orientierbare Graphen, Acta Mathematica Academiae Scientiarum
Hungaricae 18 (1967) 25–66.
[359] H. Gallaire and J. Minker, eds., Logic and Data Bases, Plenum, New York, 1978.
[360] G. Gallo, C. Gentile, D. Pretolani and G. Rago, Max Horn sat and the minimum cut
problem in directed hypergraphs, Mathematical Programming 80 (1998) 213–237.
[361] G. Gallo, G. Longo, S. Nguyen and S. Pallottino, Directed hypergraphs and applications,
Discrete Applied Mathematics 42 (1993) 177–201.
[362] G. Gallo and M.G. Scutellà, Polynomially solvable satisfiability problems, Information
Processing Letters 29 (1988) 221–227.
[363] G. Gallo and M.G. Scutellà, Directed hypergraphs as a modelling paradigm, Rivista
AMASES 21 (1998) 97–123.
[364] G. Gallo and B. Simeone, On the supermodular knapsack problem, Mathematical
Programming Study 45 (1989) 295–309.
[365] G. Gallo and G. Urbani, Algorithms for testing the satisfiability of propositional formulae,
Journal of Logic Programming 7 (1989) 45–61.
[366] G. Galperin and A. Tolpygo, Moscow Mathematical Olympiads, in: A. Kolmogorov, ed.,
Prosveschenie (Education), Moscow, USSR, 1986, Problem 72 (in Russian).
[367] F. Galvin, Horn sentences, Annals of Mathematical Logic 1 (1970) 389–422.
[368] G. Gambarelli, Power indices for political and financial decision making, Annals of
Operations Research 51 (1994) 165–173.
[369] B. Ganter and R. Wille, Formal Concept Analysis - Mathematical Foundations, Springer-
Verlag, Berlin, 1999.
[370] H. Garcia-Molina and D. Barbara, How to assign votes in a distributed system, Journal
of the Association for Computing Machinery 32 (1985) 841–860.
[371] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of
NP-Completeness, W.H. Freeman, New York, 1979.
[372] M.R. Garey, D.S. Johnson and L. Stockmeyer, Some simplified NP-complete graph
problems, Theoretical Computer Science 1 (1976) 237–267.
[373] M.A. Garrido, A. Márquez, A. Morgana and J.R. Portillo, Single bend wiring on surfaces,
Discrete Applied Mathematics 117 (2002) 27–40.
[374] F. Gavril, Testing for equality between maximum matching and minimum node covering,
Information Processing Letters 6 (1977) 199–202.
[375] F. Gavril, An efficiently solvable graph partition problem to which many problems are
reducible, Information Processing Letters 45 (1993) 285–290.
[376] A. Genkin, C.A. Kulikowski and I.B. Muchnik, Set covering submodular maximization:
An optimal algorithm for data mining in bioinformatics and medical informatics, Journal
of Intelligent and Fuzzy Systems 12 (2002) 5–17.
[377] I. Gent, H. van Maaren and T. Walsh, eds., SAT2000: Highlights of Satisfiability Research
in the Year 2000, IOS Press, Amsterdam, 2000.
[378] F. Giannessi and F. Niccolucci, Connections between nonlinear and integer programming
problems, Symposia Mathematica XIX (1976) 161–176.
[379] R. Giles and R. Kannan, A characterization of threshold matroids, Discrete Mathematics
30 (1980) 181–184.
[380] P.C. Gilmore, A proof method for quantification theory: Its justification and realization,
IBM Journal of Research and Development 4 (1960) 28–35.
[381] J.F. Gimpel, A method of producing a Boolean function having an arbitrarily pre-
scribed prime implicant table, IEEE Transactions on Electronic Computers EC-14 (1965)
485–488.
[382] J.F. Gimpel, A reduction technique for prime implicant tables, IEEE Transactions on
Electronic Computers EC-14 (1965) 535–541.
[383] A. Ginsberg, Knowledge-base reduction: A new approach to checking knowledge bases
for inconsistency and redundancy, in: Proceedings of the Seventh National Conference
on Artificial Intelligence, 1988, pp. 585–589.
[384] E. Giunchiglia, F. Giunchiglia and A. Tacchella, SAT-based decision procedures for clas-
sical modal logics, in: I. Gent, H. van Maaren and T. Walsh, eds., SAT2000: Highlights
of Satisfiability Research in the Year 2000, IOS Press, Amsterdam, 2000, pp. 403–426.
[385] V.V. Glagolev, Some estimates of disjunctive normal forms of functions in the algebra of
logic, in: Problems of Cybernetics, Vol. 19, Nauka, Moscow, 1967, pp. 75–94 (in Russian).
[386] F. Glover and J-K. Hao, Efficient evaluations for solving large 0–1 unconstrained quadratic
optimisation problems, International Journal of Metaheuristics 1 (2010) 3–10.
[387] F. Glover and E. Woolsey, Converting the 0-1 polynomial programming problem to a 0-1
linear program, Operations Research 22 (1974) 180–182.
[388] R. Gnanadesikan, Methods for Statistical Data Analysis of Multivariate Observations,
Wiley-Interscience, New York, 1977.
[389] M.X. Goemans and D.P. Williamson, New 3/4-approximation algorithm for the maximum
satisfiability problem, SIAM Journal on Discrete Mathematics 7 (1994) 656–666.
[390] A. Goerdt, A threshold for unsatisfiability, in: I.M. Havel and V. Koubek, eds., Proceedings
of the 17th International Symposium on Mathematical Foundations of Computer Science,
Lecture Notes in Computer Science, Vol. 629, Springer-Verlag, Berlin, 1992, pp. 264–274.
[391] G. Gogic, C. Papadimitriou and M. Sideri, Incremental recompilation of knowledge,
Journal of Artificial Intelligence Research 8 (1998) 23–37.
[392] E. Goldberg and Y. Novikov, BerkMin: A fast and robust SAT solver, Discrete Applied
Mathematics 155 (2007) 1549–1561.
[393] B. Goldengorin, Maximization of submodular functions: Theory and enumeration
algorithms, European Journal of Operational Research 198 (2009) 102–112.
[394] B. Goldengorin, D. Ghosh and G. Sierksma, Equivalent instances of the simple plant
location problem, SOM Research Report No. 00A54, University of Groningen, The
Netherlands, 2000.
[395] B. Goldengorin, D. Ghosh and G. Sierksma, Branch and peg algorithms for the simple
plant location problem, Computers and Operations Research 31 (2004) 241–255.
[396] S.A. Goldman, M.J. Kearns and R.E. Schapire, Exact identification of read-once formulas
using fixed points of amplification functions, SIAM Journal on Computing 22 (1993)
705–726.
[397] J. Goldsmith, R.H. Sloan, B. Szorenyi and G. Turán, Theory revision with queries: Horn,
read-once, and parity formulas, Artificial Intelligence 156 (2004) 139–176.
[398] M.C. Golumbic, Algorithmic Graph Theory and Perfect Graphs, Academic Press,
New York, 1980. Second edition: Annals of Discrete Mathematics, Vol. 57, Elsevier,
Amsterdam, 2004.
[399] M.C. Golumbic and A. Mintz, Factoring logic functions using graph partitioning, in:
Proceedings of the IEEE/ACM International Conference on Computer Aided Design,
November 1999, pp. 195–198.
[400] M.C. Golumbic, A. Mintz and U. Rotics, Factoring and recognition of read-once functions
using cographs and normality, in: Proceedings of the 38th Design Automation Conference,
June 2001, pp. 109–114.
[401] M.C. Golumbic, A. Mintz and U. Rotics, Factoring and recognition of read-once functions
using cographs and normality and the readability of functions associated with partial
k-trees, Discrete Applied Mathematics 154 (2006) 1465–1477.
[402] M.C. Golumbic, A. Mintz and U. Rotics, An improvement on the complexity of factoring
read-once Boolean functions, Discrete Applied Mathematics 156 (2008) 1633–1636.
[403] C.P. Gomes, B. Selman, N. Crato and H. Kautz, Heavy-tailed phenomena in satisfiability
and constraint satisfaction problems, in: I. Gent, H. van Maaren and T. Walsh, eds.,
SAT2000: Highlights of Satisfiability Research in the Year 2000, IOS Press, Amsterdam,
2000, pp. 15–41.
[404] A. Goralcikova and V. Koubek, A reduct and closure algorithm for graphs, in: Pro-
ceedings of the 8th Symposium on Mathematical Foundations of Computer Science
(MFCS’79), Lecture Notes in Computer Science, Vol. 74, Springer-Verlag, Berlin, 1979,
pp. 301–307.
[405] M. Grabisch, The application of fuzzy integrals in multicriteria decision making, European
Journal of Operational Research 89 (1996) 445–456.
[406] M. Grabisch, J.-L. Marichal, R. Mesiar and E. Pap, Aggregation Functions, Cambridge
University Press, Cambridge, 2009.
[407] M. Grabisch, J.-L. Marichal and M. Roubens, Equivalent representations of set functions,
Mathematics of Operations Research 25 (2) (2000) 157–178.
[408] D. Granot and F. Granot, Generalized covering relaxations for 0–1 programs, Operations
Research 28 (1980) 1442–1450.
[409] D. Granot, F. Granot and J. Kallberg, Covering relaxation for positive 0–1 polynomial
programs, Management Science 25 (1979) 264–273.
[410] F. Granot and P.L. Hammer, On the use of Boolean functions in 0–1 programming,
Methods of Operations Research 12 (1972) 154–184.
[411] F. Granot and P.L. Hammer, On the role of generalized covering problems, Cahiers du
Centre d’Etudes de Recherche Opérationnelle 16 (1974) 277–289.
[412] J.F. Groote and J.P. Warners, The propositional formula checker HeerHugo, in: I. Gent,
H. van Maaren and T. Walsh, eds., SAT2000: Highlights of Satisfiability Research in the
Year 2000, IOS Press, Amsterdam, 2000, pp. 261–282.
[413] A. Grossi, Algorithme à séparation de variables pour la dualisation d’une fonction
booléenne, R.A.I.R.O. 8 (B-1) (1974) 41–55.
[414] M. Grötschel, L. Lovász and A. Schrijver, The ellipsoid method and its consequences in
combinatorial optimization, Combinatorica 1 (1981) 169–197.
[415] J. Gu, Efficient local search for very large-scale satisfiability problems, SIGART Bulletin
3 (1992) 8–12.
[416] J. Gu, Local search for satisfiability (SAT) problems, IEEE Transactions on Systems, Man
and Cybernetics 23 (1993) 1108–1129.
[417] J. Gu, Global optimization for satisfiability (SAT) problems, IEEE Transactions on
Knowledge and Data Engineering 6 (1994) 361–381.
[418] J. Gu, P.W. Purdom, J. Franco and B.W. Wah, Algorithms for the satisfiability (SAT)
problem: A survey, in: D. Du, J. Gu and P.M. Pardalos, eds., Satisfiability Problem: Theory
and Applications, DIMACS series in Discrete Mathematics and Theoretical Computer
Science, Vol. 35, American Mathematical Society, 1997, pp. 19–151.
[419] B. Guenin, Perfect and ideal 0, ±1 matrices, Mathematics of Operations Research 23
(1998) 322–338.
[420] S. Gueye and P. Michelon, A linearization framework for unconstrained quadratic (0–1)
problems, Discrete Applied Mathematics 157 (2009) 1255–1266.
[421] V. Gurvich, Nash-solvability of positional games in pure strategies, USSR Computational
Mathematics and Mathematical Physics 15(2) (1975) 74–87.
[422] V. Gurvich, On repetition-free Boolean functions, Uspekhi Mat. Nauk. 32 (1977) 183–184
(in Russian); translated as: On read-once Boolean functions, Russian Mathematical
Surveys 32 (1977) 183–184.
[423] V. Gurvich, Applications of Boolean Functions and Networks in Game Theory, Ph.D.
thesis, Moscow Institute of Physics and Technology, Moscow, USSR, 1978 (in Russian).
[424] V. Gurvich, On the normal form of positional games, Soviet Mathematics Doklady 25(3)
(1982) 572–575.
[425] V. Gurvich, Some properties and applications of complete edge-chromatic graphs and
hypergraphs, Soviet Mathematics Doklady 30(3) (1984) 803–807.
[426] V. Gurvich, Criteria for repetition-freeness of functions in the algebra of logic, Soviet
Mathematics Doklady 43(3) (1991) 721–726.
[427] V. Gurvich, Positional game forms and edge-chromatic graphs, Soviet Mathematics
Doklady 45(1) (1992) 168–172.
[428] V. Gurvich and L. Khachiyan, On the frequency of the most frequently occurring variable
in dual DNFs, Discrete Mathematics 169 (1997) 245–248.
[429] V. Gurvich and L. Khachiyan, On generating the irredundant conjunctive and disjunctive
normal forms of monotone Boolean functions, Discrete Applied Mathematics 96 (1999)
363–373.
[430] M. Habib, F. de Montgolfier and C. Paul, A simple linear-time modular decomposition
algorithm, in: Proceedings of the 9th Scandinavian Workshop on Algorithm Theory -
SWAT 2004, Lecture Notes in Computer Science, Vol. 3111, Springer-Verlag, Berlin,
2004, pp. 187–198.
[431] M. Habib and C. Paul, A simple linear time algorithm for cograph recognition, Discrete
Applied Mathematics 145 (2005) 183–197.
[432] M. Hagen, Algorithmic and Computational Complexity Issues of MONET, Ph.D. thesis,
Friedrich-Schiller-Universität Jena, Germany, 2009.
[433] A. Haken, The intractability of resolution, Theoretical Computer Science 39 (1985)
297–308.
[434] P.L. Hammer, Plant location: A pseudo-Boolean approach, Israel Journal of Technology
6 (1968) 330–332.
[435] P.L. Hammer, A note on the monotonicity of pseudo-Boolean functions, Zeitschrift für
Operations Research 18 (1974) 47–50.
[436] P.L. Hammer, Pseudo-Boolean remarks on balanced graphs, International Series of
Numerical Mathematics 36 (1977) 69–78.
[437] P.L. Hammer, The conflict graph of a pseudo-Boolean function, Bell Laboratories,
Technical Report, August 1978.
[438] P.L. Hammer, Boolean elements in combinatorial optimization, in: P.L. Hammer, E.L.
Johnson and B. Korte, eds., Discrete Optimization, Annals of Discrete Mathematics Vol. 4,
Elsevier, Amsterdam, 1979, pp. 51–71.
[439] P.L. Hammer and P. Hansen, Logical relations in quadratic 0–1 programming, Revue
Roumaine de Mathématiques Pures et Appliquées 26 (1981) 421–429.
[440] P.L. Hammer, P. Hansen and B. Simeone, Roof duality, complementation and persistency
in quadratic 0–1 optimization, Mathematical Programming 28 (1984) 121–155.
[441] P.L. Hammer and R. Holzman, Approximations of pseudo-Boolean functions: Applica-
tions to game theory, ZOR - Methods and Models of Operations Research 36 (1992)
3–21.
[442] P.L. Hammer, T. Ibaraki and B. Simeone, Threshold sequences, SIAM Journal on
Algebraic and Discrete Methods 2 (1981) 39–49.
[443] P.L. Hammer, E.L. Johnson and U.N. Peled, Regular 0–1 programs, Cahiers du Centre
d’Etudes de Recherche Opérationnelle 16 (1974) 267–276.
[444] P.L. Hammer, E.L. Johnson and U.N. Peled, Facets of regular 0–1 polytopes, Mathematical
Programming 8 (1975) 179–206.
[445] P.L. Hammer and B. Kalantari, A bound on the roof duality gap, in: B. Simeone, ed.,
Combinatorial Optimization, Lecture Notes in Mathematics, Vol. 1403, Springer, Berlin,
1989, pp. 254–257.
[446] P.L. Hammer and A. Kogan, Horn functions and their DNFs, Information Processing
Letters 44 (1992) 23–29.
[447] P.L. Hammer and A. Kogan, Optimal compression of propositional knowledge bases:
complexity and approximation, Artificial Intelligence 64 (1993) 131–145.
[448] P.L. Hammer and A. Kogan, Graph based methods for Horn knowledge compression,
in: Proceedings of the 27th Hawaii International Conference on System Sciences, IEEE
Press, 1994, pp. 300–309.
[449] P.L. Hammer and A. Kogan, Quasi-acyclic propositional Horn knowledge bases: opti-
mal compression, IEEE Transactions on Knowledge and Data Engineering 7(5) (1995)
751–762.
[450] P.L. Hammer and A. Kogan, Essential and redundant rules in Horn knowledge bases,
Decision Support Systems 16 (1996) 119–130.
[451] P.L. Hammer, F. Maffray and M. Queyranne, Cut-threshold graphs, Discrete Applied
Mathematics 30 (1991) 163–179.
[452] P.L. Hammer and N.V.R. Mahadev, Bithreshold graphs, SIAM Journal on Algebraic and
Discrete Methods 6 (1985) 497–506.
[453] P.L. Hammer, N.V.R. Mahadev and D. de Werra, The struction of a graph: Application to
CN-free graphs, Combinatorica 5 (1985) 141–147.
[454] P.L. Hammer and S. Nguyen, APOSS – A partial order in the solution space of bivalent
programs, in: N. Christofides, A. Mingozzi, C. Sandi, and P. Toth, eds., Combinatorial
Optimization, John Wiley & Sons, Chichester, New York, 1979, pp. 93–106.
[455] P.L. Hammer, U.N. Peled and M.A. Pollatschek, An algorithm to dualize a regular
switching function, IEEE Transactions on Computers C-28 (1979) 238–243.
[456] P.L. Hammer, U.N. Peled and S. Sorensen, Pseudo-Boolean functions and game
theory I. Core elements and Shapley value, Cahiers du Centre d’Etudes de Recherche
Opérationnelle 19 (1977) 159–176.
[457] P.L. Hammer and I.G. Rosenberg, Linear decomposition of a positive group-Boolean
function, in: L. Collatz and W. Wetterling, eds., Numerische Methoden bei Optimierung,
Vol. 2, Birkhäuser, Basel, 1974, pp. 51–62.
[458] P.L. Hammer, I.G. Rosenberg and S. Rudeanu, On the determination of the minima
of pseudo-Boolean functions (in Romanian), Studii şi Cercetari Matematice 14 (1963)
359–364.
[459] P.L. Hammer and A.A. Rubin, Some remarks on quadratic programming with 0–1
variables, Revue Française d’Informatique et de Recherche Opérationnelle 4 (1970)
67–79.
[460] P.L. Hammer and S. Rudeanu, Boolean Methods in Operations Research and Related
Areas, Springer, Berlin, 1968.
[461] P.L. Hammer and B. Simeone, Quasimonotone Boolean functions and bistellar graphs,
Annals of Discrete Mathematics 9 (1980) 107–119.
[462] P.L. Hammer and B. Simeone, Order relations of variables in 0–1 programming, in: C.
Ribeiro, G. Laporte and S. Martello, eds., Surveys in Combinatorial Optimization, Annals
of Discrete Mathematics Vol. 31, North-Holland, Amsterdam, 1987, pp. 83–111.
[463] P.L. Hammer and B. Simeone, Quadratic functions of binary variables, in: B. Simeone,
ed., Combinatorial Optimization, Lecture Notes in Mathematics, Vol. 1403, Springer,
Berlin, 1989, pp. 1–56.
[464] P.L. Hammer, B. Simeone, T. Liebling and D. de Werra, From linear separability
to unimodality: A hierarchy of pseudo-Boolean functions, SIAM Journal on Discrete
Mathematics 1 (1988) 174–184.
[465] A. Hamor (alias P.L. Hammer), Stories of the one-zero-zero-one nights: Abu Boul in
Graphistan, in: P. Hansen and D. de Werra, eds., Regards sur la Théorie des Graphes,
Presses Polytechniques Romandes, Lausanne, 1980.
[466] D.J. Hand, Construction and Assessment of Classification Rules, Wiley, Chichester, 1997.
[467] P. Hansen and B. Jaumard, Minimum sum of diameters clustering, Journal of Classifica-
tion 4 (1987) 215–226.
[468] P. Hansen and B. Jaumard, Algorithms for the maximum satisfiability problem, Computing
44 (1990) 279–303.
[469] P. Hansen, B. Jaumard and V. Mathon, Constrained nonlinear 0–1 programming, ORSA
Journal on Computing 5 (1993) 97–119.
[470] P. Hansen, B. Jaumard and M. Minoux, A linear expected-time algorithm for deriving all
logical conclusions implied by a set of Boolean inequalities, Mathematical Programming
34 (1986) 223–231.
[471] P. Hansen, S.H. Lu and B. Simeone, On the equivalence of paved-duality and standard
linearization in nonlinear 0–1 optimization, Discrete Applied Mathematics 29 (1990)
187–193.
[472] P. Hansen and C. Meyer, Improved compact linearizations for the unconstrained quadratic
0–1 minimization problem, Discrete Applied Mathematics 157 (2009) 1267–1290.
[473] P. Hansen, M.V. Poggi de Aragão and C.C. Ribeiro, Boolean query optimization and the
0–1 hyperbolic sum problem, Annals of Mathematics and Artificial Intelligence 1 (1990)
97–109.
[474] P. Hansen and B. Simeone, Unimodular functions, Discrete Applied Mathematics 14
(1986) 269–281.
[475] F. Harary, On the notion of balance of a signed graph, Michigan Mathematical Journal 2
(1954) 143–146.
[476] F. Harche, J.N. Hooker and G.L. Thompson, A computational study of satisfiability
algorithms for propositional logic, ORSA Journal on Computing 6 (1994) 423–435.
[477] J. Håstad, On the size of weights for threshold gates, SIAM Journal on Discrete
Mathematics 7 (1994) 484–492.
[478] J. Håstad, Some optimal inapproximability results, Journal of the Association for
Computing Machinery 48 (2001) 798–859.
[479] J.P. Hayes, The fanout structure of switching functions, Journal of the ACM 22 (1975)
551–571.
[480] J.-J. Hebrard, Unique Horn renaming and unique 2-satisfiability, Information Processing
Letters 54 (1995) 235–239.
[481] T. Hegedűs and N. Megiddo, On the geometric separability of Boolean functions, Discrete
Applied Mathematics 66 (1996) 205–218.
[482] R. Heiman and A. Wigderson, Randomized vs. deterministic decision tree complexity for
read-once Boolean functions, Computational Complexity 1 (1991) 311–329.
[483] I. Heller and C.B. Tompkins, An extension of a theorem of Dantzig, in: H.W. Kuhn and
A.W. Tucker, eds., Linear Inequalities and Related Systems, Princeton University Press,
Princeton, N.J., 1956, pp. 247–254.
[484] L. Hellerstein, Functions that are read-once on a subset of their variables, Discrete Applied
Mathematics 46 (1993) 235–251.
[485] L. Hellerstein, On generalized constraints and certificates, Discrete Mathematics 226
(2001) 211–232.
[486] L. Hellerstein and V. Raghavan, Exact learning of DNF formulas using DNF hypothesis,
Journal of Computer and System Sciences 70 (2005) 435–470.
[487] P.B. Henderson and Y. Zalcstein, A graph-theoretic characterization of the PV chunk class
of synchronizing primitives, SIAM Journal on Computing 6 (1977) 88–108.
[488] L.J. Henschen, Semantic resolution for Horn sets, IEEE Transactions on Computers 25
(1976) 816–822.
[489] L.J. Henschen and L. Wos, Unit refutations and Horn sets, Journal of the ACM 21 (1974)
590–605.
[490] M. Herbstritt, Satisfiability and Verification: From Core Algorithms to Novel Application
Domains, Suedwestdeutscher Verlag für Hochschulschriften, 2009.
[491] A. Hertz, On the use of Boolean methods for the computation of the stability number,
Discrete Applied Mathematics 76 (1997) 183–203.
[492] E.A. Hirsch, New worst-case upper bounds for SAT, Journal of Automated Reasoning 24
(2000) 397–420.
[493] W. Hodges, Reducing first order logic to Horn logic, School of Mathematical Sciences,
Queen Mary and Westfield College, London, 1985.
[494] W. Hodges, Logical features of Horn clauses, in: Handbook of Logic in Artifi-
cial Intelligence and Logic Programming, Vol. 1, Oxford University Press, 1993,
pp. 449–503.
[495] A.J. Hoffman and J.B. Kruskal, Integral boundary points of convex polyhedra, in:
H.W. Kuhn and A.W. Tucker, eds., Linear Inequalities and Related Systems, Princeton
University Press, Princeton, N.J., 1956, pp. 223–246.
[496] K. Williamson Hoke, Completely unimodal numberings of a simple polytope, Discrete
Applied Mathematics 20 (1988) 69–81.
[497] K. Hoke, Extending shelling orders and a hierarchy of functions of unimodal simple
polytopes, Discrete Applied Mathematics 60 (1995) 211–217.
[498] J.N. Hooker, A quantitative approach to logical inference, Decision Support Systems 4
(1988) 45–69.
[499] J.N. Hooker, Generalized resolution and cutting planes, Annals of Operations Research
12 (1988) 217–239.
[500] J.N. Hooker, Resolution vs. cutting plane solution of inference problems: Some compu-
tational experience, Operations Research Letters 7 (1988) 1–7.
[501] J.N. Hooker, Resolution and the integrality of satisfiability problems, Mathematical
Programming 74 (1996) 1–10.
[502] J.N. Hooker, Logic-Based Methods for Optimization: Combining Optimization and
Constraint Satisfaction, John Wiley & Sons, New York, 2000.
[503] J.N. Hooker, Optimization methods in logic, in: Y. Crama and P.L. Hammer, eds., Boolean
Models and Methods in Mathematics, Computer Science, and Engineering, Cambridge
University Press, Cambridge, 2010, pp. 160–194.
[504] J.N. Hooker and V. Vinay, Branching rules for satisfiability, Journal of Automated
Reasoning 15 (1995) 359–383.
[505] H.H. Hoos and T. Stützle, Towards a characterisation of the behaviour of stochastic local
search algorithms for SAT, Artificial Intelligence 112 (1999) 213–232.
[506] H.H. Hoos and T. Stützle, SATLIB: An online resource for research on SAT, in:
I. Gent, H. van Maaren and T. Walsh, eds., SAT2000: Highlights of Satisfiability Research
in the Year 2000, IOS Press, Amsterdam, 2000, pp. 283–292.
[507] H.H. Hoos and T. Stützle, Local search algorithms for SAT: An empirical evaluation,
Journal of Automated Reasoning 24 (2000) 421–481.
[508] H.H. Hoos and T. Stützle, Stochastic Local Search: Foundations and Applications,
Morgan Kaufmann Publishers, San Francisco, CA, 2005.
[509] A. Horn, On sentences which are true of direct unions of algebras, Journal of Symbolic
Logic 16 (1951) 14–21.
[510] I. Horrocks and P.F. Patel-Schneider, Evaluating optimized decision procedures for propo-
sitional modal K(m) satisfiability, in: I. Gent, H. van Maaren and T. Walsh, eds., SAT2000:
Highlights of Satisfiability Research in the Year 2000, IOS Press, Amsterdam, 2000,
pp. 427–458.
[511] S.-T. Hu, Threshold Logic, University of California Press, Berkeley - Los Angeles,
1965.
[512] S.-T. Hu, Mathematical Theory of Switching Circuits and Automata, University of
California Press, Berkeley - Los Angeles, 1968.
[513] L.M. Hvattum, A. Løkketangen and F. Glover, Adaptive memory search for Boolean
optimization problems, Discrete Applied Mathematics 142 (2004) 99–109.
[514] L. Hyafil and R.L. Rivest, Constructing optimal binary decision trees is NP-complete,
Information Processing Letters 5 (1976) 15–17.
[515] T. Ibaraki, T. Imamichi, Y. Koga, H. Nagamochi, K. Nonobe and M. Yagiura, Efficient
branch-and-bound algorithms for weighted MAX-2-SAT, Technical Report 2007-011,
Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto
University, May 2007.
[516] T. Ibaraki and T. Kameda, A theory of coteries: Mutual exclusion in distributed systems,
IEEE Transactions on Parallel and Distributed Systems 4 (1993) 779–794.
[517] T. Ibaraki, A. Kogan and K. Makino, Functional dependencies in Horn theories, Artificial
Intelligence 108 (1999) 1–30.
[518] T. Ibaraki, A. Kogan and K. Makino, Inferring minimal functional dependencies in
Horn and q-Horn theories, Annals of Mathematics and Artificial Intelligence 38 (2003)
233–255.
[519] J.P. Ignizio, Introduction to Expert Systems: The Development and Implementation of
Rule-Based Expert Systems, McGraw-Hill, New York, 1991.
[520] J.R. Isbell, A class of simple games, Duke Mathematical Journal 25 (1958) 423–439.
[521] A. Itai and J.A. Makowsky, Unification as a complexity measure for logic programming,
Journal of Logic Programming 4 (1987) 105–117.
[522] K. Iwama, CNF satisfiability test by counting and polynomial average time, SIAM Journal
on Computing 18 (1989) 385–391.
[523] S. Iwata, Submodular function minimization, Mathematical Programming Ser. B 112
(2008) 45–64.
[524] S. Iwata, L. Fleischer and S. Fujishige, A combinatorial, strongly polynomial-time algo-
rithm for minimizing submodular functions, in: Proceedings of the 32nd ACM Symposium
on Theory of Computing, 2000, pp. 97–106.
[525] S. Janson, Y.C. Stamatiou and M. Vamvakari, Bounding the unsatisfiability threshold of
random 3-SAT, Random Structures and Algorithms 17 (2000) 103–116.
[526] B. Jaumard, Extraction et Utilisation de Relations Booléennes pour la Résolution des
Programmes Linéaires en Variables 0-1, Thèse de doctorat, Ecole Nationale Supérieure
des Télécommunications, Paris, France, 1986.
[527] B. Jaumard, P. Marchioro, A. Morgana, R. Petreschi and B. Simeone, An O(n^3)
on-line algorithm for 2-satisfiability, Atti Giornate di Lavoro AIRO, Pisa, 1988,
pp. 391–399.
[528] B. Jaumard, P. Marchioro, A. Morgana, R. Petreschi and B. Simeone, On-line 2-
satisfiability, Annals of Mathematics and Artificial Intelligence 1 (1990) 155–165.
[529] B. Jaumard and M. Minoux, An efficient algorithm for the transitive closure and a linear
worst-case complexity result for a class of sparse graphs, Information Processing Letters
22 (1986) 163–169.
[530] B. Jaumard and B. Simeone, On the complexity of the maximum satisfiability problem
for Horn formulas, Information Processing Letters 26 (1987) 1–4.
[531] B. Jaumard, B. Simeone and P.S. Ow, A selected Artificial Intelligence bibliography for
Operations Researchers, Annals of Operations Research 12 (1988) 1–50.
[532] B. Jaumard, M. Stan and J. Desrosiers, Tabu search and a quadratic relaxation for
the satisfiability problem, in: D.S. Johnson and M.A. Trick, eds., Cliques, Coloring,
and Satisfiability, DIMACS Series in Discrete Mathematics and Theoretical Computer
Science, Vol. 26, American Mathematical Society, 1996, pp. 457–477.
[533] R.G. Jeroslow, Logic-Based Decision Support - Mixed Integer Model Formulation,
North-Holland, Amsterdam, 1989.
[534] R.G. Jeroslow and J. Wang, Solving propositional satisfiability problems, Annals of
Mathematics and Artificial Intelligence 1 (1990) 167–187.
[535] J.H.R. Jiang and T. Villa, Hardware equivalence checking, in: Y. Crama and P.L. Hammer,
eds., Boolean Models and Methods in Mathematics, Computer Science, and Engineering,
Cambridge University Press, Cambridge, 2010, pp. 599–674.
[536] D.S. Johnson, Approximation algorithms for combinatorial problems, Journal of Com-
puter and System Sciences 9 (1974) 256–278.
[537] D.S. Johnson and M.A. Trick, eds., Cliques, Coloring, and Satisfiability, DIMACS
Series in Discrete Mathematics and Theoretical Computer Science, Vol. 26, American
Mathematical Society, 1996.
[538] D.S. Johnson, M. Yannakakis and C.H. Papadimitriou, On generating all maximal
independent sets, Information Processing Letters 27 (1988) 119–123.
[539] N.D. Jones and W.T. Laaser, Complete problems for deterministic polynomial time,
Theoretical Computer Science 3 (1976) 105–117.
[540] S. Joy, J. Mitchell and B. Borchers, A branch and cut algorithm for MAX-SAT and
weighted MAX-SAT, in: D. Du, J. Gu and P.M. Pardalos, eds., Satisfiability Problem:
Theory and Applications, DIMACS series in Discrete Mathematics and Theoretical
Computer Science, Vol. 35, American Mathematical Society, 1997, pp. 519–536.
[541] S. Jukna, A. Razborov, P. Savický and I. Wegener, On P versus NP ∩ co-NP for decision
trees and read-once branching programs, in: I. Privara and P. Ruzicka, eds., Mathematical
Foundations of Computer Science 1997, Lecture Notes in Computer Science, Vol. 1295,
Springer-Verlag, Berlin-New York, 1997, pp. 319–326.
[542] J. Kahn, Entropy, independent sets and antichains: A new approach to Dedekind’s problem,
Proceedings of the American Mathematical Society 130 (2002) 371–378.
[543] J. Kahn, G. Kalai and N. Linial, The influence of variables on Boolean functions, in: Pro-
ceedings of the 29th Annual IEEE Symposium on the Foundations of Computer Science,
IEEE, White Plains, NY, 1988, pp. 68–80.
[544] B. Kalantari and J.B. Rosen, Penalty formulation for zero-one nonlinear programming,
Discrete Applied Mathematics 16 (1987) 179–182.
[545] A.P. Kamath, N.K. Karmarkar, K.G. Ramakrishnan and M.G.C. Resende, Computa-
tional experience with an interior point algorithm on the satisfiability problem, Annals of
Operations Research 25 (1990) 43–58.
[546] A.P. Kamath, N.K. Karmarkar, K.G. Ramakrishnan and M.G.C. Resende, A continuous
approach to inductive inference, Mathematical Programming 57 (1992) 215–238.
[547] Y. Kambayashi, Logic design of programmable logic arrays, IEEE Transactions on
Computers C-28 (1979) 609–617.
[548] M. Karchmer, N. Linial, I. Newman, M. Saks and A. Wigderson, Combina-
torial characterization of read-once formulae, Discrete Mathematics 114 (1993)
275–282.
[549] H. Karloff and U. Zwick, A 7/8-approximation algorithm for MAX 3SAT?, in: Proceedings
of the 38th Annual IEEE Symposium on the Foundations of Computer Science, IEEE, 1997,
pp. 406–415.
[550] R.M. Karp, Reducibility among combinatorial problems, in: R.E. Miller and
J.W. Thatcher, eds., Complexity of Computer Computations, Plenum Press, New York,
1972, pp. 85–103.
[551] R.M. Karp, M. Luby and N. Madras, Monte-Carlo approximation algorithms for
enumeration problems, Journal of Algorithms 10 (1989) 429–448.
[552] M. Karpinski, H. Kleine Büning and P.H. Schmitt, On the computational complexity of
quantified Horn clauses, in: E. Börger, H. Kleine Büning and M.M. Richter, eds., CSL’87,
First Workshop on Computer Science Logic, Lecture Notes in Computer Science, Vol. 329,
Springer-Verlag, Berlin, 1988, pp. 129–137.
[553] S.A. Kauffman, The Origins of Order: Self-Organization and Selection in Evolution,
Oxford University Press, New York, 1993.
[554] H.A. Kautz, M.J. Kearns and B. Selman, Horn approximations of empirical data, Artificial
Intelligence 74 (1995) 129–145.
[555] H. Kautz and B. Selman, Knowledge compilation and theory of approximation, Journal
of the ACM 43 (1996) 193–224.
[556] H. Kautz and B. Selman, Pushing the envelope: Planning, propositional logic, and stochas-
tic search, in: Proceedings of the 13th National Conference on Artificial Intelligence,
Portland, OR, 1996, pp. 1188–1194.
[557] H. Kautz, B. Selman and Y. Jiang, A general stochastic approach to solving problems with
hard and soft constraints, in: D. Du, J. Gu and P.M. Pardalos, eds., Satisfiability Prob-
lem: Theory and Applications, DIMACS series in Discrete Mathematics and Theoretical
Computer Science, Vol. 35, American Mathematical Society, 1997, pp. 573–586.
[558] D.J. Kavvadias, C.H. Papadimitriou and M. Sideri, On Horn envelopes and hypergraph
transversals, in: K.W. Ng et al., eds., Algorithms and Computation – ISAAC’93, Lecture
Notes in Computer Science, Vol. 762, Springer-Verlag, Berlin, 1993, pp. 399–405.
[559] D.J. Kavvadias and E.C. Stavropoulos, An efficient algorithm for the transversal
hypergraph generation, Journal of Graph Algorithms and Applications 9 (2005) 239–264.
[560] M. Kearns, M. Li and L. Valiant, Learning Boolean functions, Journal of the Association
for Computing Machinery 41 (1994) 1298–1328.
[561] H. Kellerer, U. Pferschy and D. Pisinger, Knapsack Problems, Springer-Verlag, Berlin-
Heidelberg-New York, 2004.
[562] L. Khachiyan, E. Boros, K. Elbassioni and V. Gurvich, Generating all minimal integral
solutions to AND-OR systems of monotone inequalities: Conjunctions are simpler than
disjunctions, Discrete Applied Mathematics 156 (2008) 2020–2034.
[563] S. Khanna, M. Sudan and D.P. Williamson, A complete classification of the approx-
imability of maximization problems derived from Boolean constraint satisfaction, in:
Proceedings of the 29th Annual ACM Symposium on the Theory of Computing, 1997,
pp. 11–20.
[564] R. Khardon, Translating between Horn representations and their characteristic models,
Journal of Artificial Intelligence Research 3 (1995) 349–372.
[565] R. Khardon, H. Mannila and D. Roth, Reasoning with examples: Propositional formulae
and database dependencies, Acta Informatica 36 (1999) 267–286.
[566] R. Khardon and D. Roth, Reasoning with models, Artificial Intelligence 87 (1996)
187–213.
[567] S. Khot, G. Kindler, E. Mossel and R. O’Donnell, Optimal inapproximability results for
MAX-CUT and other 2-variable CSPs?, SIAM Journal on Computing 37 (2007) 319–357.
[568] P. Kilby, J.K. Slaney, S. Thibaux and T. Walsh, Backbones and backdoors in satisfiability,
AAAI Proceedings, 2005, pp. 1368–1373.
[569] V. Klee and P. Kleinschmidt, Convex polytopes and related complexes, in: R. Graham, M.
Grötschel and L. Lovász, eds., Handbook of Combinatorics, Elsevier, Amsterdam, 1995,
pp. 875–917.
[570] H. Kleine Büning, On generalized Horn formulas and k-resolution, Theoretical Computer
Science 116 (1993) 405–413.
[571] H. Kleine Büning and T. Lettmann, Propositional Logic: Deduction and Algorithms,
Cambridge University Press, Cambridge, 1999.
[572] D. Kleitman, On Dedekind’s problem: The number of monotone Boolean functions,
Proceedings of the American Mathematical Society 21 (1969) 677–682.
[573] D. Kleitman and G. Markowsky, On Dedekind’s problem: The number of isotone Boolean
functions. II, Transactions of the American Mathematical Society 213 (1975) 373–390.
[574] B. Klinz and G.J. Woeginger, Faster algorithms for computing power indices in weighted
voting games, Mathematical Social Sciences 49 (2005) 111–116.
[575] D.E. Knuth, The Art of Computer Programming, Volume 4, Fascicle 0, Introduction to
Combinatorial Algorithms and Boolean Functions, Stanford University, Stanford, CA,
2008. http://www-cs-faculty.stanford.edu/~knuth/taocp.html
[576] V. Kolmogorov and C. Rother, Minimizing nonsubmodular functions with graph cuts -
A review, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2007)
1274–1279.
[577] V. Kolmogorov and R. Zabih, What energy functions can be minimized via graph cuts?,
IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (2004) 147–159.
[578] A.D. Korshunov, The number of monotone Boolean functions, Problemy Kibernetiki 38
(1981) 5–108 (in Russian).
[579] A.D. Korshunov, Families of subsets of a finite set and closed classes of Boolean functions,
in: P. Frankl et al., eds., Extremal Problems for Finite Sets, János Bolyai Mathematical
Society, Budapest, Hungary, 1994, pp. 375–396.
[580] A.D. Korshunov, Monotone Boolean functions, Russian Mathematical Surveys 58 (2003)
929–1001.
[581] S. Kottler, M. Kaufmann and C. Sinz, Computation of renameable Horn backdoors, in:
Proceedings of the 11th International Conference on Theory and Applications of Satisfia-
bility Testing (SAT 2008), Lecture Notes in Computer Science, Vol. 4996, Springer-Verlag,
Berlin, 2008, pp. 154–160.
[582] R. Kowalski, Logic for Problem Solving, North-Holland, Amsterdam-New York, 1979.
[583] M. Krause and I. Wegener, Circuit complexity, in: Y. Crama and P.L. Hammer, eds.,
Boolean Models and Methods in Mathematics, Computer Science, and Engineering,
Cambridge University Press, Cambridge, 2010, pp. 506–530.
[584] L. Kroc, A. Sabharwal and B. Selman, Leveraging belief propagation, backtrack search,
and statistics for model counting, in: L. Perron and M.A. Trick, eds., Integration of AI and
OR Techniques in Constraint Programming for Combinatorial Optimization Problems,
Lecture Notes in Computer Science Vol. 5015, Springer-Verlag, Berlin Heidelberg, 2008,
pp. 127–141.
[585] P. Kučera, On the size of maximum renamable Horn sub-CNF, Discrete Applied
Mathematics 149 (2005) 126–130.
[586] W. Küchlin and C. Sinz, Proving consistency assertions for automotive product data
management, in: I. Gent, H. van Maaren and T. Walsh, eds., SAT2000: Highlights of
Satisfiability Research in the Year 2000, IOS Press, Amsterdam, 2000, pp. 327–342.
[587] H.W. Kuhn, The Hungarian method for solving the assignment problem, Naval Research
Logistics Quarterly 2 (1955) 83–97.
[588] O. Kullmann, New methods for 3-SAT decision and worst-case analysis, Theoretical
Computer Science 223 (1999) 1–72.
[589] J. Kuntzmann, Algèbre de Boole, Dunod, Paris, 1965. English translation: Fundamental
Boolean Algebra, Blackie and Son Limited, London and Glasgow, 1967.
[590] W. Kunz and D. Stoffel, Reasoning in Boolean Networks, Kluwer Academic Publishers,
Boston - Dordrecht - London, 1997.
[591] Z.A. Kuzicheva, Mathematical logic, in: A.N. Kolmogorov and A.P. Yushkevich, eds.,
Mathematics of the 19th Century, Volume 1, 2nd revised edition, Birkhäuser Verlag,
Basel, 2001, pp. 1–34.
[592] A.V. Kuznetsov, Non-repeating contact schemes and non-repeating superpositions of
functions of algebra of logic, in: Collection of Articles on Mathematical Logic and its
Applications to Some Questions of Cybernetics, Proceedings of the Steklov Institute of
Mathematics, Vol. 51, Academy of Sciences of USSR, Moscow, 1958, pp. 186–225.
[593] L. Lamport, The implementation of reliable distributed multiprocess systems, Computer
Networks 2 (1978) 95–114.
[594] M. Langlois, D. Mubayi, R.H. Sloan and G. Turán, Combinatorial problems for Horn
clauses, manuscript, 2008.
[595] M. Langlois, R.H. Sloan, B. Szörényi and G. Turán, Horn complements: Towards Horn-
to-Horn belief revision, in: D. Fox and C.P. Gomes, eds., Proceedings of the Twenty-Third
AAAI Conference on Artificial Intelligence, AAAI 2008, Chicago, Illinois, USA, 2008,
pp. 466–471.
[596] M. Langlois, R.H. Sloan and G. Turán, Horn upper bounds and renaming, in: J. Marques-
Silva and K.A. Sakallah, eds., Proceedings of the 10th International Conference on Theory
and Applications of Satisfiability Testing – SAT 2007, Lisbon, Portugal, 2007, pp. 80–93.
[597] E. Lapidot, Weighted majority games and symmetry groups of games, M.Sc. thesis (in
Hebrew), Technion, Haifa, Israel, 1968.
[598] E. Lapidot, The counting vector of a simple game, Proceedings of the American
Mathematical Society 31 (1972) 228–231.
[599] T. Larrabee, Test pattern generation using Boolean satisfiability, IEEE Transactions on
Computer-Aided Design 11 (1992) 4–15.
[600] A. Laruelle and M. Widgrén, Is the allocation of voting power among EU states fair?,
Public Choice 94 (1998) 317–339.
[601] M. Laurent and F. Rendl, Semidefinite programming and integer programming, in:
K. Aardal, G. Nemhauser and R. Weismantel, eds., Discrete Optimization, Elsevier,
Amsterdam, 2005, pp. 393–514.
[602] M. Laurent and A. Sassano, A characterization of knapsacks with the max-flow-min-cut
property, Operations Research Letters 11 (1992) 105–110.
[603] E.L. Lawler, Covering problems: Duality relations and a new method of solution, SIAM
Journal on Applied Mathematics 14 (1966) 1115–1132.
[604] E.L. Lawler, Combinatorial Optimization: Networks and Matroids, Holt, Rinehart and
Winston, New York, 1976.
[605] E.L. Lawler, J.K. Lenstra and A.H.G. Rinnooy Kan, Generating all maximal independent
sets: NP-hardness and polynomial-time algorithms, SIAM Journal on Computing 9 (1980)
558–565.
[606] D. Leech, The relationship between shareholding concentration and shareholder voting
power in British companies: A study of the application of power indices for simple games,
Management Science 34 (1988) 509–528.
[607] D. Leech, Designing the voting system for the Council of the European Union, Public
Choice 113 (2002) 437–464.
[608] D. Leech, Voting power in the governance of the International Monetary Fund, Annals of
Operations Research 109 (2002) 375–397.
[609] D. Leech, Computation of power indices, Warwick Economic Research Papers, Number
644, The University of Warwick, 2002.
[610] L.A. Levin, Universal’nye zadachi perebora, Problemy Peredachi Informatsii 9 (1973)
115–116 (in Russian); translated as: Universal sequential search problems, Problems of
Information Transmission 9 (1974) 265–266.
[611] M. Lewin, D. Livnat and U. Zwick, Improved rounding techniques for the MAX 2-SAT
and MAX DI-CUT problems, in: Integer Programming and Combinatorial Optimiza-
tion (IPCO), Lecture Notes in Computer Science, Vol. 2337, Springer-Verlag, Berlin
Heidelberg New York, 2002, pp. 67–82.
[612] H.R. Lewis, Renaming a set of clauses as a Horn set, Journal of the ACM 25 (1978)
134–135.
[613] C.M. Li and Anbulagan, Heuristics based on unit propagation for satisfiability problems,
Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence,
Morgan Kaufmann, 1997, pp. 366–371.
[614] N. Linial and N. Nisan, Approximate inclusion-exclusion, Combinatorica 10 (1990)
349–365.
[615] N. Linial and M. Tarsi, Deciding hypergraph 2-colourability by H-resolution, Theoretical
Computer Science 38 (1985) 343–347.
[616] M.O. Locks, Inverting and minimalizing path sets and cut sets, IEEE Transactions on
Reliability R-27 (1978) 107–109.
[617] M.O. Locks, Inverting and minimizing Boolean functions, minimal paths and mini-
mal cuts: Noncoherent system analysis, IEEE Transactions on Reliability R-28 (1979)
373–375.
[618] M.O. Locks, Recursive disjoint products, inclusion-exclusion, and min-cut approxima-
tions, IEEE Transactions on Reliability R-29 (1980) 368–371.
[619] M.O. Locks, Recursive disjoint products: A review of three algorithms, IEEE Transactions
on Reliability R-31 (1982) 33–35.
[620] A. Lodi, K. Allemand and T.M. Liebling, An evolutionary heuristic for quadratic 0–1
programming, European Journal of Operational Research 119 (1999) 662–670.
[621] D.E. Loeb and A.R. Conway, Voting fairly: Transitive maximal intersecting families of
sets, Journal of Combinatorial Theory A 91 (2000) 386–410.
[622] L. Lovász, Normal hypergraphs and the perfect graph conjecture, Discrete Mathematics
2 (1972) 253–267.
[623] L. Lovász, On the ratio of optimal integral and fractional covers, Discrete Mathematics
13 (1975) 383–390.
[624] L. Lovász, Submodular functions and convexity, in: A. Bachem, M. Grötschel and
B. Korte, eds., Mathematical Programming – The State of the Art, Springer-Verlag, Berlin,
1983, pp. 235–257.
[625] L. Lovász, An Algorithmic Theory of Numbers, Graphs and Convexity, Society for
Industrial and Applied Mathematics, Philadelphia, 1986.
[626] L. Lovász, Lecture Notes on Evasiveness of Graph Properties, Notes by Neal Young,
Computer Science Department, Princeton University, January 1994.
[627] D.W. Loveland, Automated Theorem-Proving: A Logical Basis, North-Holland, Amster-
dam, 1978.
[628] L. Löwenheim, Über das Auflösungsproblem im logischen Klassenkalkul, Sitzungs-
berichte der Berliner Mathematischen Gesellschaft 7 (1908) 89–94.
[629] L. Löwenheim, Über die Auflösung von Gleichungen im logischen Gebietkalkul,
Mathematische Annalen 68 (1910) 169–207.
[630] E. Lozinskii, Counting propositional models, Information Processing Letters 41 (1992)
327–332.
[631] W.F. Lucas, Measuring power in weighted voting systems, in: Case Studies in Applied
Mathematics, Mathematical Association of America, 1976, pp. 42–106. Also Chapter 9
in: S.J. Brams, W.F. Lucas and P.D. Straffin, Jr., eds., Political and Related Models,
Springer-Verlag, Berlin Heidelberg New York, 1983.
[632] W.F. Lucas, The apportionment problem, Chapter 14 in: S.J. Brams, W.F. Lucas and P.D.
Straffin, Jr., eds., Political and Related Models, Springer-Verlag, Berlin Heidelberg New
York, 1983.
[633] E.J. McCluskey, Minimization of Boolean functions, Bell System Technical Journal 35
(1956) 1417–1444.
[634] E.J. McCluskey, Introduction to the Theory of Switching Circuits, McGraw-Hill,
New York, 1965.
[635] E.J. McCluskey, Logic Design Principles, Prentice-Hall, Englewood Cliffs, New Jersey,
1986.
[636] R.M. McConnell and J.P. Spinrad, Modular decomposition and transitive orientation,
Discrete Mathematics 201 (1999) 189–241.
[637] S.T. McCormick, Submodular function minimization, in: K. Aardal, G.L. Nemhauser,
R. Weismantel, eds., Discrete Optimization, Handbooks in Operations Research and
Management Science, Vol. 12, Elsevier, Amsterdam, 2005, pp. 321–391.
[638] J.C.C. McKinsey, The decision problem for some classes of sentences without quantifiers,
Journal of Symbolic Logic 8 (1943) 61–76.
[639] I. McLean, Don’t let the lawyers do the math: Some problems of legislative districting in
the UK and the USA, Mathematical and Computer Modelling 48 (2008) 1446–1454.
[640] G.F. McNulty, Fragments of first order logic, I: Universal Horn logic, Journal of Symbolic
Logic 42 (1977) 221–237.
[641] T.-H. Ma, On the threshold dimension 2 graphs, Technical report, Institute of Information
Sciences, Academia Sinica, Taipei, Republic of China, 1993.
[642] F.J. MacWilliams and N.J.A. Sloane, The Theory of Error-Correcting Codes, North-
Holland, Amsterdam, The Netherlands, 1977.
[643] K. Maghout, Sur la détermination des nombres de stabilité et du nombre chroma-
tique d’un graphe, Comptes Rendus de l’Académie des Sciences de Paris 248 (1959)
3522–3523.
[644] K. Maghout, Applications de l’algèbre de Boole à la théorie des graphes et aux programmes
linéaires et quadratiques, Cahiers du Centre d’Etudes de Recherche Opérationnelle 5
(1963) 21–99.
[645] N.V.R. Mahadev and U. Peled, Threshold Graphs and Related Topics, Annals of Discrete
Mathematics Vol. 56, North-Holland, Amsterdam, The Netherlands, 1995.
[646] D. Maier, Minimal covers in the relational database model, Journal of the ACM 27 (1980)
664–674.
[647] D. Maier, The Theory of Relational Databases, Computer Science Press, Rockville, MD,
1983.
[648] D. Maier and D.S. Warren, Computing with Logic: Logic Programming with PROLOG,
Benjamin/Cummings Publishing Co., Menlo Park, CA, 1988.
[649] K. Makino, A linear time algorithm for recognizing regular Boolean functions, Journal of
Algorithms 43 (2002) 155–176.
[650] K. Makino, K. Hatanaka and T. Ibaraki, Horn extensions of a partially defined Boolean
function, SIAM Journal on Computing 28 (1999) 2168–2186.
[651] K. Makino and T. Ibaraki, Interior and exterior functions of Boolean functions, Discrete
Applied Mathematics 69 (1996) 209–231.
[652] K. Makino and T. Ibaraki, The maximum latency and identification of positive Boolean
functions, SIAM Journal on Computing 26 (1997) 1363–1383.
[653] K. Makino and T. Ibaraki, A fast and simple algorithm for identifying 2-monotonic positive
Boolean functions, Journal of Algorithms 26 (1998) 291–305.
[654] K. Makino and T. Ibaraki, Inner-core and outer-core functions of partially defined Boolean
functions, Discrete Applied Mathematics 96–97 (1999) 307–326.
[655] K. Makino, K. Yano and T. Ibaraki, Positive and Horn decomposability of partially defined
Boolean functions, Discrete Applied Mathematics 74 (1997) 251–274.
[656] J.A. Makowsky, Why Horn formulas matter in computer science: Initial structures and
generic examples, Journal of Computer and System Sciences 34 (1987) 266–292.
[657] A.I. Malcev, The Metamathematics of Algebraic Systems, Collected Papers: 1936–1967,
North Holland, Amsterdam, 1971.
[658] O.L. Mangasarian, Linear and nonlinear separation of patterns by linear programming,
Operations Research 13 (1965) 444–452.
[659] O.L. Mangasarian, Mathematical programming in neural networks, ORSA Journal on
Computing 5 (1993) 349–360.
[660] O.L. Mangasarian, R. Setiono and W.H. Wolberg, Pattern recognition via linear
programming: Theory and applications to medical diagnosis, in: T.F. Coleman and Y.
Li, eds., Large-Scale Numerical Optimization, SIAM Publications, Philadelphia, 1990,
pp. 22–30.
[661] I. Mann and L.S. Shapley, Values of large games VI: Evaluating the Electoral College
exactly, RM-3158, The Rand Corporation, Santa Monica, CA, 1962.
[662] H. Mannila and K. Mehlhorn, A fast algorithm for renaming a set of clauses as a Horn
set, Information Processing Letters 21 (1985) 261–272.
[663] H.K. Mannila and J. Räihä, Design of Relational Databases, Addison-Wesley, Woking-
ham, 1992.
[664] H.K. Mannila and J. Räihä, Algorithms for inferring functional dependencies, Data and
Knowledge Engineering 12 (1994) 83–99.
[665] H.K. Mannila, H. Toivonen and A.I. Verkamo, Efficient algorithms for discovering
association rules, in: U.M. Fayyad and R. Uthurusamy, eds., AAAI Workshop on Knowledge
Discovery in Databases, 1994, pp. 181–192.
[666] V. Manquinho and J.P. Marques-Silva, On using cutting planes in pseudo-Boolean
optimization, Journal on Satisfiability, Boolean Modeling and Computation 2 (2006)
209–219.
[667] V.M. Manquinho and O. Roussel, The first evaluation of pseudo-Boolean solvers (PB’05),
Journal on Satisfiability, Boolean Modeling and Computation 2 (2006) 103–143.
[668] J.-L. Marichal, On Sugeno integral as an aggregation function, Fuzzy Sets and Systems
114 (2000) 347–365.
[669] J.-L. Marichal, The influence of variables on pseudo-Boolean functions with applications
to game theory and multicriteria decision making, Discrete Applied Mathematics 107
(2000) 139–164.
[670] J.P. Marques-Silva and K.A. Sakallah, GRASP: A search algorithm for propositional
satisfiability, IEEE Transactions on Computers C-48 (1999) 506–521.
[671] S. Martello and P. Toth, Knapsack Problems: Algorithms and Computer Implementations,
John Wiley & Sons, Chichester, New York, 1990.
[672] U. Martin and T. Nipkow, Boolean unification: The story so far, Journal of Symbolic
Computation 7 (1989) 275–293.
[673] M. Maschler and B. Peleg, A characterization, existence proof and dimension bounds for
the kernel of a game, Pacific Journal of Mathematics 18 (1966) 289–328.
[674] W.J. Masek, Some NP-complete set covering problems, MIT, Cambridge, MA, unpub-
lished manuscript, August 1979.
[675] F. Massacci and L. Marraro, Logical cryptanalysis as a SAT problem: Encoding and
analysis of the U.S. data encryption standard, in: I. Gent, H. van Maaren and T. Walsh, eds.,
SAT2000: Highlights of Satisfiability Research in the Year 2000, IOS Press, Amsterdam,
2000, pp. 343–376.
[676] T. Matsui and Y. Matsui, A survey of algorithms for calculating power indices of
weighted majority games, Journal of the Operations Research Society of Japan 43 (2000)
71–86.
[677] K. Matulef, R. O’Donnell, R. Rubinfeld and R. Servedio, Testing halfspaces, in: ACM-
SIAM Symposium on Discrete Algorithms (SODA), 2009, pp. 256–264.
[678] C. Maxfield, Bebop to the Boolean Boogie: An Unconventional Guide to Electronics
Fundamentals, Components, and Processes, LLH Technology Publications, Eagle Rock,
VA, 1995.
[679] D.L. Medin and P.J. Schwanenflugel, Linear separability in classification learn-
ing, Journal of Experimental Psychology: Human Learning and Memory 7 (1981)
355–368.
[680] E. Mendelson, Boolean Algebra and Switching Circuits, Schaum’s Outline Series,
McGraw-Hill, New York, 1970.
[681] P. Merz and B. Freisleben, Greedy and local search heuristics for unconstrained binary
quadratic programming, Journal of Heuristics 8 (2002) 197–213.
[682] M. Minoux, The unique-Horn satisfiability problem and quadratic Boolean equations,
Annals of Mathematics and Artificial Intelligence 6 (1992) 253–266.
[683] M. Minoux and K. Barkaoui, Deadlocks and traps in Petri nets as Horn satisfiability solu-
tions and some related polynomially solvable problems, Discrete Applied Mathematics
29 (1990) 195–210.
[684] M. Minsky and S. Papert, Perceptrons, MIT Press, Cambridge, MA, 1969.
[685] A. Mintz, Multi-Level Synthesis: Factoring Logic Functions Using Graph Partitioning
Algorithms, Ph.D. Thesis, Bar-Ilan University, Ramat Gan, Israel, 2000.
[686] A. Mintz and M.C. Golumbic, Factoring Boolean functions using graph partitioning,
Discrete Applied Mathematics 149 (2005) 131–153.
[687] D. Mitchell, B. Selman and H. Levesque, Hard and easy distributions of SAT problems,
in: AAAI’92, Proceedings of the Tenth National Conference on Artificial Intelligence, San
Jose, CA, 1992, pp. 459–465.
[688] R. Miyashiro and T. Matsui, A polynomial-time algorithm to find an equitable home-away
assignment, Operations Research Letters 33 (2005) 235–241.
[689] M. Molloy, The probabilistic method, in: M. Habib, C. McDiarmid, J. Ramirez-Alfonsin
and B. Reed, eds., Probabilistic Methods for Algorithmic Discrete Mathematics, Springer,
Berlin, 1998, pp. 1–35.
[690] B. Monien and E. Speckenmeyer, Solving satisfiability in less than 2^n steps, Discrete
Applied Mathematics 10 (1985) 287–295.
[691] I.D. Moon and S.S. Chaudhry, An analysis of network location problems, Management
Science 30 (1984) 290–307.
[692] E.F. Moore, Counterexample to a conjecture of McCluskey and Paull, unpublished
memorandum, Bell Telephone Laboratories, 1957.
[693] M.W. Moskewicz, C.F. Madigan, Y. Zhao, L. Zhang and S. Malik, Chaff: Engineer-
ing an efficient SAT solver, in: Proceedings of the 38th Design Automation Conference
(DAC’01), 2001, pp. 530–535.
[694] H.M. Mulder, The structure of median graphs, Discrete Mathematics 24 (1978) 197–204.
[695] H.M. Mulder and A. Schrijver, Median graphs and Helly hypergraphs, Discrete Mathe-
matics 25 (1979) 41–50.
[696] D. Mundici, Functions computed by monotone Boolean formulas with no repeated
variables, Theoretical Computer Science 66 (1989) 113–114.
[697] I. Munro, Efficient determination of the transitive closure of a directed graph, Information
Processing Letters 1 (1971) 56–58.
[698] S. Muroga, Threshold Logic and Its Applications, Wiley-Interscience, New York, 1971.
[699] S. Muroga, Logic Design and Switching Theory, Wiley-Interscience, New York, 1979.
[700] S. Muroga, S. Takasu and I. Toda, Theory of majority decision elements, Journal of the
Franklin Institute 271 (1961) 376–418.
[701] S. Muroga, M. Kondo and I. Toda, Majority decision functions of up to six variables,
Mathematics of Computation 16 (1962) 459–472.
[702] S. Muroga, T. Tsuboi and C.R. Baugh, Enumeration of threshold functions of eight vari-
ables, Department of Computer Science, University of Illinois, Report no 245, 1967.
Excerpts in IEEE Transactions on Computers C-19 (1970) 818–825.
[703] H. Narayanan, Submodular Functions and Electrical Networks, Annals of Discrete
Mathematics Vol. 54, Elsevier, Amsterdam, 1997.
[704] N.N. Necula, O metodǎ pentru reducerea numǎrului de variabile ale functiilor Booleene
foarte slab definite, Studii şi Cercetari Matematice 24 (1972) 561–566.
[705] R.J. Nelson, Simplest normal truth functions, Journal of Symbolic Logic 20 (2) (1955)
105–108.
[706] G.L. Nemhauser and L.A. Wolsey, Maximizing submodular set functions: Formulations
and analysis of algorithms, Annals of Discrete Mathematics 11 (1981) 279–301.
[707] G.L. Nemhauser and L.A. Wolsey, Integer and Combinatorial Optimization, Wiley-
Interscience Series in Discrete Mathematics and Optimization, John Wiley & Sons,
New York, 1988.
[708] G.L. Nemhauser, L.A. Wolsey and M.L. Fisher, An analysis of approximations for
maximizing submodular set functions - I, Mathematical Programming 14 (1978) 265–294.
[709] A. Neumaier, Inklusions- und Abstimmungssysteme, Mathematische Zeitschrift 141
(1975) 147–158.
[710] I. Newman, On read-once boolean functions, in: M.S. Paterson, ed., Boolean Function
Complexity: Selected Papers from LMS Symposium, Durham, July 1990, Cambridge
University Press, 1992, pp. 24–34.
[711] T.A. Nguyen, W.A. Perkins, T.J. Laffey and D. Pecora, Knowledge base verification, AI
Magazine 8 (1987) 69–75.
[712] R.G. Nigmatullin, A variational principle in the algebra of logic, in: Discrete Analysis,
Vol. 10, Novosibirsk, 1967, pp. 69–89 (in Russian).
[713] N.J. Nilsson, Principles of Artificial Intelligence, Morgan Kaufmann Publishers, San
Francisco, CA, 1980.
[714] N. Nisan and M. Szegedy, On the degree of Boolean functions as real polynomials,
Computational Complexity 4 (1994) 301–313.
[715] N. Nishimura, P. Ragde and S. Szeider, Detecting backdoor sets with respect to Horn
and binary clauses, Seventh International Conference on Theory and Applications of
Satisfiability Testing – SAT04, 2004, Vancouver, Canada.
[716] R. O’Donnell, Some topics in analysis of Boolean functions, in: Proceedings of the 40th
ACM Annual Symposium on Theory of Computing (STOC), 2008, pp. 569–578.
[717] R. O’Donnell and R.A. Servedio, The Chow parameters problem, in: Proceedings of the
40th ACM Annual Symposium on Theory of Computing (STOC), 2008, pp. 517–526.
[718] H. Ono, K. Makino and T. Ibaraki, Logical analysis of data with decomposable structures,
Theoretical Computer Science 289 (2002) 977–995.
[719] G. Owen, Multilinear extensions of games, Management Science 18 (1972) 64–79.
[720] G. Owen, Game Theory, Academic Press, San Diego, 1995.
[721] P. Padawitz, Computing in Horn Clause Theories, Springer-Verlag, Berlin, 1988.
[722] M.W. Padberg, Perfect zero-one matrices, Mathematical Programming 6 (1974) 180–196.
[723] M.W. Padberg, The Boolean quadric polytope: Some characteristics, facets and relatives,
Mathematical Programming 45 (1989) 139–172.
[724] G. Palubeckis, Iterated tabu search for the unconstrained binary quadratic optimization
problem, Informatica 17 (2006) 279–296.
[725] C.H. Papadimitriou, Computational Complexity, Addison Wesley Publishing Co.,
Reading, MA, 1994.
[726] C.H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and
Complexity, Prentice Hall, Englewood Cliffs, NJ, 1982.
[727] L. Papayanopoulos, Computerized weighted voting reapportionment, in: AFIPS Proceed-
ings, Vol. 50, 1981, pp. 623–629.
[728] L. Papayanopoulos, On the partial construction of the semi-infinite Banzhaf polyhedron,
in: A.V. Fiacco and K.O. Kortanek, eds., Semi-Infinite Programming and Applications,
Lecture Notes in Economics and Mathematical Systems, Vol. 215, Springer-Verlag,
Berlin-Heidelberg-New York, 1983, pp. 208–218.
[729] L. Papayanopoulos, DD analysis: Variational and computational properties of power
indices, Research Report 83-18, Graduate School of Management, Rutgers University,
NJ, 1983.
[730] P.M. Pardalos and S. Jha, Complexity of uniqueness and local search in quadratic 0–1
programming, Operations Research Letters 11 (1992) 119–123.
[731] R. Paturi, P. Pudlák, M.E. Saks and F. Zane, An improved exponential-time algorithm
for k-SAT, in: Proceedings of the 39th Annual IEEE Symposium on the Foundations of
Computer Science, IEEE, 1998, pp. 628–637.
[732] M.C. Paull and E.J. McCluskey, Jr., Boolean functions realizable with single threshold
devices, Proceedings of the IRE 48 (1960) 1335–1337.
[733] M.C. Paull and S.H. Unger, Minimizing the number of states in incompletely specified
sequential switching functions, IRE Transactions on Electronic Computers EC-8 (1959)
356–367.
[734] J. Peer and R. Pinter, Minimal decomposition of Boolean functions using non-repeating
literal trees, in: Proceedings of the International Workshop on Logic and Architecture
Synthesis, IFIP TC10 WD10.5, Grenoble, 1995, pp. 129–139.
[735] U.N. Peled and B. Simeone, Polynomial-time algorithms for regular set-covering and
threshold synthesis, Discrete Applied Mathematics 12 (1985) 57–69.
[736] U.N. Peled and B. Simeone, An O(nm)-time algorithm for computing the dual of a regular
Boolean function, Discrete Applied Mathematics 49 (1994) 309–323.
[737] B. Peleg, A theory of coalition formation in committees, Journal of Mathematical
Economics 7 (1980) 115–134.
[738] B. Peleg, Coalition formation in simple games with dominant players, International
Journal of Game Theory 10 (1981) 11–33.
[739] L.S. Penrose, The elementary statistics of majority voting, Journal of the Royal Statistical
Society 109 (1946) 53–57.
[740] G. Pesant and C.-G. Quimper, Counting solutions of knapsack constraints, in: L. Perron
and M.A. Trick, eds., Integration of AI and OR Techniques in Constraint Programming for
Combinatorial Optimization Problems, Lecture Notes in Computer Science, Vol. 5015,
Springer-Verlag, Berlin-Heidelberg, 2008, pp. 203–217.
[741] R. Petreschi and B. Simeone, A switching algorithm for the solution of quadratic Boolean
equations, Information Processing Letters 11 (1980) 193–198.
[742] R. Petreschi and B. Simeone, Experimental comparison of 2-satisfiability algorithms,
RAIRO Recherche Opérationnelle 25 (1991) 241–264.
[743] C.A. Petri, Introduction to General Net Theory of Processes and Systems, Springer-Verlag,
Berlin, 1980.
[744] S.R. Petrick, A direct determination of the irredundant forms of a boolean function from
the set of prime implicants, Technical Report AFCRC-TR-56-110, Air Force Cambridge
Research Center, Cambridge, MA, April 1956.
[745] J.-C. Picard and M. Queyranne, A network flow solution to some nonlinear
0–1 programming problems, with applications to graph theory, Networks 12 (1982)
141–159.
[746] J.-C. Picard and H.D. Ratliff, Minimum cuts and related problems, Networks 5 (1975)
357–370.
[747] E. Pichat, The disengagement algorithm or a new generalization of the exclusion
algorithm, Discrete Mathematics 17 (1977) 95–106.
[748] N. Pippenger, Galois theory for minors of finite functions, Discrete Mathematics 254
(2002) 405–419.
[749] L. Pitt and L.G. Valiant, Computational limitations on learning from examples, Journal
of the Association for Computing Machinery 35 (1988) 965–984.
[750] D. Plaisted and S. Greenbaum, A structure-preserving clause form translation, Journal of
Symbolic Computation 2 (1986) 293–304.
[751] G.R. Pogosyan, Classes of Boolean functions defined by functional terms, Multiple Valued
Logic 7 (2002) 417–448.
[752] R. Pöschel and I. Rosenberg, Compositions and clones of Boolean functions, in: Y. Crama
and P.L. Hammer, eds., Boolean Models and Methods in Mathematics, Computer Science,
and Engineering, Cambridge University Press, Cambridge, 2010, pp. 3–38.
[753] E.L. Post, The Two-Valued Iterative Systems of Mathematical Logic, Annals of Mathe-
matics Studies Vol. 5, Princeton University Press, Princeton, NJ, 1941.
[754] K. Prasad and J.S. Kelly, NP-completeness of some problems concerning voting games,
International Journal of Game Theory 19 (1990) 1–9.
[755] R.E. Prather, Introduction to Switching Theory: A Mathematical Approach, Allyn and
Bacon, Inc., Boston, MA, 1967.
[756] D. Pretolani, Satisfiability and Hypergraphs, Ph.D. thesis, University of Pisa, Pisa, Italy,
1992.
[757] D. Pretolani, A linear time algorithm for unique Horn satisfiability, Information Processing
Letters 48 (1993) 61–66.
[758] D. Pretolani, Efficiency and stability of hypergraph SAT algorithms, in: D.S. Johnson
and M.A. Trick, eds., Cliques, Coloring, and Satisfiability, DIMACS Series in Discrete
Mathematics and Theoretical Computer Science, Vol. 26, American Mathematical Society,
1996, pp. 479–498.
[759] J.S. Provan, Boolean decomposition schemes and the complexity of reliability computa-
tions, DIMACS Series in Discrete Mathematics Vol. 5, American Mathematical Society,
1991, pp. 213–228.
[760] J.S. Provan and M.O. Ball, Efficient recognition of matroid and 2-monotonic systems,
in: R.D. Ringeisen and F.S. Roberts, eds., Applications of Discrete Mathematics, SIAM,
Philadelphia, 1988, pp. 122–134.
[761] P. Pudlák, Lower bounds for resolution and cutting planes proofs and monotone
computations, Journal of Symbolic Logic 62 (1997) 981–998.
[762] P.W. Purdom, Solving satisfiability with less searching, IEEE Transactions on Pattern
Analysis and Machine Intelligence 6(4) (1984) 510–513.
[763] P.W. Purdom, A survey of average time analyses of satisfiability algorithms, Journal of
Information Processing 13 (1990) 449–455.
[764] I.B. Pyne and E.J. McCluskey, Jr., An essay on prime implicant tables, Journal of the
Society for Industrial and Applied Mathematics 9 (1961) 604–631.
[765] I.B. Pyne and E.J. McCluskey, Jr., The reduction of redundancy in solving prime implicant
tables, IRE Transactions on Electronic Computers EC-11 (1962) 473–482.
[766] W.V. Quine, The problem of simplifying truth functions, American Mathematical Monthly
59 (1952) 521–531.
[767] W.V. Quine, Two theorems about truth functions, Boletin de la Sociedad Matemática
Mexicana 10 (1953) 64–70.
[768] W.V. Quine, A way to simplify truth functions, American Mathematical Monthly 62 (1955)
627–631.
[769] W.V. Quine, On cores and prime implicants of truth functions, American Mathematical
Monthly (1959) 755–760.
[770] J.R. Quinlan, Induction of decision trees, Machine Learning 1 (1986) 81–106.
[771] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers,
1993.
[772] J.R. Quinlan, Data mining tools See5 and C5.0, published electronically at
http://www.rulequest.com/see5-info.html/ (2000).
[773] R. Raghavan, J. Cohoon and S. Sahni, Single bend wiring, Journal of Algorithms 7 (1986)
232–257.
[774] V. Raghavan and S. Schach, Learning switch configurations, in: Proceedings of the Third
Annual Workshop on Computational Learning Theory, Morgan Kaufmann Publishers,
San Francisco, CA, 1990, pp. 38–51.
[775] C.C. Ragin, The Comparative Method: Moving Beyond Qualitative and Quantitative
Strategies, University of California Press, Berkeley-Los Angeles-London, 1987.
[776] S. Rai, M. Veeraraghavan and K.S. Trivedi, A survey of efficient reliability computation
using disjoint products approach, Networks 25 (1995) 147–163.
[777] K.G. Ramamurthy, Coherent Structures and Simple Games, Kluwer Academic Publishers,
Dordrecht, 1990.
[778] B. Randerath, E. Speckenmeyer, E. Boros, O. Čepek, P.L. Hammer, A. Kogan, K. Makino
and B. Simeone, Satisfiability formulation of problems on level graphs, in: H. Kautz and
B. Selman, eds., Proceedings of the LICS 2001 Workshop on Theory and Applications of
Satisfiability Testing (SAT 2001), Boston, MA, Electronic Notes in Discrete Mathematics
9 (2001) pp. 1–9.
[779] T. Raschle and K. Simon, Recognition of graphs with threshold dimension two,
Proceedings of the 27th Annual ACM Symposium on the Theory of Computing, Las Vegas, NV,
1995, pp. 650–661.
[780] C. Ré and D. Suciu, Approximate lineage for probabilistic databases, Proceedings of the
Very Large Database Endowment 1 (2008) 797–808.
[781] R.C. Read and R.E. Tarjan, Bounds on backtrack algorithms for listing cycles, paths, and
spanning trees, Networks 5 (1975) 237–252.
[782] I.S. Reed, A class of multiple error-correcting codes and the decoding scheme, IRE
Transactions on Information Theory IT-4 (1954) 38–49.
[783] R. Reiter, A theory of diagnosis from first principles, Artificial Intelligence 32 (1987)
57–95.
[784] J. Reiterman, V. Rödl, E. Šiňajová and M. Tůma, Threshold hypergraphs, Discrete
Mathematics 54 (1985) 193–200.
[785] M.G. Resende, L.S. Pitsoulis and P.M. Pardalos, Approximate solution of weighted
MAX-SAT problems using GRASP, in: D. Du, J. Gu and P.M. Pardalos, eds., Satis-
fiability Problem: Theory and Applications, DIMACS series in Discrete Mathematics
and Theoretical Computer Science, Vol. 35, American Mathematical Society, 1997,
pp. 393–405.
[786] J.M.W. Rhys, A selection problem of shared fixed costs and network flows, Management
Science 17 (1970) 200–207.
[787] J.A. Robinson, A machine oriented logic based on the resolution principle, Journal of the
Association for Computing Machinery 12 (1965) 23–41.
[788] R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, 1970.
[789] I.G. Rosenberg, 0–1 optimization and non-linear programming, Revue Française d’Auto-
matique, d’Informatique et de Recherche Opérationnelle (Série Bleue) 2 (1972) 95–97.
[790] I.G. Rosenberg, Reduction of bivalent maximization to the quadratic case, Cahiers du
Centre d’Etudes de Recherche Opérationnelle 17 (1975) 71–74.
[791] J. Rosenmüller, Nondegeneracy problems in cooperative game theory, in: A. Bachem,
M. Grötschel and B. Korte, eds., Mathematical Programming – The State of the Art,
Springer-Verlag, 1983, pp. 391–416.
[792] D. Roth, On the hardness of approximate reasoning, Artificial Intelligence 82 (1996)
273–302.
[793] J.P. Roth, Algebraic topological methods for the synthesis of switching systems,
Transactions of the American Mathematical Society 88 (1958) 301–326.
[794] C. Rother, V. Kolmogorov, V. Lempitsky and M. Szummer, Optimizing binary MRFs via
extended roof duality, in: IEEE Conference on Computer Vision and Pattern Recognition,
June 2007.
[795] S. Rudeanu, Boolean Functions and Equations, North-Holland, Amsterdam, 1974.
[796] S. Rudeanu, Lattice Functions and Equations, Springer-Verlag, Heidelberg, 2001.
[797] Y. Sagiv, C. Delobel, D.S. Parker and R. Fagin,An equivalence between relational database
dependencies and a fragment of propositional logic, Journal of the ACM 28 (1981)
435–453.
[798] S. Sahni and T. Gonzalez, P-complete approximation problems, Journal of the ACM 23
(1976) 555–565.
[799] M. Saks, Slicing the hypercube, in: K. Walker, ed., Surveys in Combinatorics, Cambridge
University Press, Cambridge, 1993, pp. 211–255.
[800] M.T. Salvemini, B. Simeone and R. Succi, Analisi del possesso integrato nei gruppi di
imprese mediante grafi, L’Industria XVI(4) (1995) 641–662.
[801] E.W. Samson and B.E. Mills, Circuit minimization: Algebra and algorithms for new
Boolean canonical expressions, Air Force Cambridge Research Center, Technical Report
TR 54-21, 1954.
[802] T. Sang, F. Bacchus, P. Beame, H.A. Kautz and T. Pitassi, Combining component caching
and clause learning for effective model counting, in: SAT 2004 - The Seventh International
Conference on Theory and Applications of Satisfiability Testing, Vancouver, Canada,
2004, pp. 20–28.
[803] A.A. Sapozhenko, On the complexity of disjunctive normal forms obtained by the use
of the gradient algorithm, in: Discrete Analysis, Vol. 21, Novosibirsk, 1972, pp. 62–71
(in Russian).
[804] T. Sasao, Switching Theory for Logic Synthesis, Kluwer Academic Publishers, Norwell,
Massachusetts, 1999.
[805] M. Sauerhoff, I. Wegener and R. Werchner, Optimal ordered binary decision diagrams
for read-once formulas, Discrete Applied Mathematics 46 (1993) 235–251.
[806] A.A. Schäffer and M. Yannakakis, Simple local search problems that are hard to solve,
SIAM Journal on Computing 20 (1991) 56–87.
[807] T.J. Schaefer, The complexity of satisfiability problems, in: Proceedings of the
10th Annual ACM Symposium on the Theory of Computing, San Diego, CA, 1978,
pp. 216–226.
[808] I. Schiermeyer, Pure literal lookahead: an O(1.497^n) 3-satisfiability algorithm, in:
Proceedings of the Workshop on Satisfiability, Siena, Italy, 1996, pp. 63–72.
[809] J.S. Schlipf, F.S. Annexstein, J.V. Franco and R.P. Swaminathan, On finding solutions for
extended Horn formulas, Information Processing Letters 54 (1995) 133–137.
[810] L. Schmitz, An improved transitive closure algorithm, Computing 30 (1983) 359–371.
[811] W.G. Schneeweiss, Boolean Functions with Engineering Applications and Computer
Programs, Springer-Verlag, Berlin, New York, 1989.
[812] A. Schrijver, Theory of Linear and Integer Programming, Wiley-Interscience Series in
Discrete Mathematics and Optimization, John Wiley & Sons, Chichester, 1986.
[813] A. Schrijver, A combinatorial algorithm minimizing submodular functions in strongly
polynomial time, Journal of Combinatorial Theory B 80 (2000) 346–355.
[814] A. Schrijver, Combinatorial Optimization: Polyhedra and Efficiency, Springer, Berlin,
2003.
[815] M.G. Scutellà, A note on Dowling and Gallier’s top-down algorithm for propositional
Horn satisfiability, Journal of Logic Programming 8 (1990) 265–273.
[816] J. Sebelik and P. Stepanek, Horn clause programs for recursive functions, in: K.L. Clark
and S.-A. Tarnlund, eds., Logic Programming, Academic Press, 1982, pp. 325–340.
[817] D. Seinsche, On a property of the class of n-colorable graphs, Journal of Combinatorial
Theory B 16 (1974) 191–193.
[818] B. Selman, H. Kautz and B. Cohen, Noise strategies for improving local search, in:
Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, WA,
1994, pp. 337–343.
[819] B. Selman, H. Kautz and B. Cohen, Local search strategies for satisfiability testing,
in: D.S. Johnson and M.A. Trick, eds., Cliques, Coloring, and Satisfiability, DIMACS
Series in Discrete Mathematics and Theoretical Computer Science, Vol. 26, American
Mathematical Society, 1996, pp. 521–531.
[820] B. Selman, H. Levesque and D. Mitchell, A new method for solving hard satisfiabil-
ity problems, in: AAAI’92, Proceedings of the Tenth National Conference on Artificial
Intelligence, San Jose, CA, 1992, pp. 440–446.
[821] P.D. Seymour, The forbidden minors of binary matroids, Journal of the London
Mathematical Society Ser. 2, 12 (1976) 356–360.
[822] P.D. Seymour, The matroids with the max-flow min-cut property, Journal of Combinato-
rial Theory B 23 (1977) 189–222.
[823] P.D. Seymour, Decomposition of regular matroids, Journal of Combinatorial Theory B
28 (1980) 305–359.
[824] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton,
1976.
[825] G. Shafer, Perspectives on the theory and practice of belief functions, International
Journal of Approximate Reasoning 4 (1990) 323–362.
[826] R. Shamir and R. Sharan, A fully dynamic algorithm for modular decomposition and
recognition of cographs, Discrete Applied Mathematics 136 (2004) 329–340.
[827] C.E. Shannon, The synthesis of two-terminal switching circuits, Bell System Technical
Journal 28 (1949) 59–98.
[828] L.S. Shapley, Simple games: An outline of the descriptive theory, Behavioral Science 7
(1962) 59–66.
[829] L.S. Shapley, Cores of convex games, International Journal of Game Theory 1 (1971)
11–26.
[830] D.R. Shier and D.E. Whited, Algorithms for generating minimal cutsets by inversion,
IEEE Transactions on Reliability R-34 (1985) 314–318.
[831] I. Shmulevich, E.R. Dougherty and W. Zhang, From Boolean to probabilistic Boolean
networks as models of genetic regulatory networks, in: Proceedings of the IEEE 90 (2002)
1778–1792.
[832] I. Shmulevich and W. Zhang, Binary analysis and optimization-based normalization of
gene expression data, Bioinformatics 18 (2002) 555–565.
[833] B. Simeone, Quadratic 0–1 Programming, Boolean Functions and Graphs, Ph.D. thesis,
University of Waterloo, Ontario, Canada, 1979.
[834] B. Simeone, Consistency of quadratic Boolean equations and the Kőnig-Egerváry property
for graphs, Annals of Discrete Mathematics 25 (1985) 281–290.
[835] B. Simeone, D. de Werra and M. Cochand, Recognition of a class of unimodular functions,
Discrete Applied Mathematics 29 (1990) 243–250.
[836] I. Singer, Extensions of functions of 0–1 variables and applications to combinatorial
optimization, Numerical Functional Analysis and Optimization 7 (1984-85) 23–62.
[837] P. Slavík, A tight analysis of the greedy algorithm for set cover, Journal of Algorithms 25
(1997) 237–254.
[838] R.H. Sloan, B. Szörényi and G. Turán, Learning Boolean functions with queries, in:
Y. Crama and P.L. Hammer, eds., Boolean Models and Methods in Mathematics, Computer
Science, and Engineering, Cambridge University Press, Cambridge, 2010, pp. 221–256.
[839] R.H. Sloan, K. Takata and G. Turán, On frequent sets of Boolean matrices, Annals of
Mathematics and Artificial Intelligence 24 (1998) 1–4.
[840] N.J.A. Sloane, The On-Line Encyclopedia of Integer Sequences, published electronically
at http://www.research.att.com/~njas/sequences/ (2006).
[841] J.-G. Smaus, On Boolean functions encodable as a single linear pseudo-Boolean con-
straint, in: P. Van Hentenryck and L.A. Wolsey, eds., Proceedings of the 4th International
Conference on Integration of AI and OR Techniques in Constraint Programming for Com-
binatorial Optimization Problems (CPAIOR 2007), Lecture Notes in Computer Science,
Vol. 4510, Springer-Verlag, Berlin-Heidelberg, 2007, pp. 288–302. Full version available
as: Technical Report 230, Institut für Informatik, Universität Freiburg, Germany, 2007.
[842] D.R. Smith, Bounds on the number of threshold functions, IEEE Transactions on
Electronic Computers EC-15 (1966) 368–369.
[843] J.D. Smith, M.J. Murray, Jr. and J.P. Minda, Straight talk about linear separability,
Journal of Experimental Psychology: Learning, Memory, and Cognition 23 (1997)
659–680.
[844] Z. Stachniak, Going non-clausal, in: Fifth International Symposium on the Theory and
Applications of Satisfiability Testing, SAT 2002, Cincinnati, Ohio, 2002, pp. 316–322.
[845] K.E. Stecke, Formulation and solution of nonlinear integer production planning problems
for flexible manufacturing systems, Management Science 29 (1983) 273–288.
[846] P.R. Stephan, R.K. Brayton and A.L. Sangiovanni-Vincentelli, Combinational test gen-
eration using satisfiability, IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems 15 (1996) 1167–1176.
[847] A. Sterbini and T. Raschle, An O(n^3) time algorithm for recognizing threshold dimension
2 graphs, Information Processing Letters 67 (1998) 255–259.
[848] R.R. Stoll, Set Theory and Logic, Dover Publications, New York, 1979.
[849] H. Störmer, Binary Functions and their Applications, Lecture Notes in Economics and
Mathematical Systems, Vol. 348, Springer, Berlin, 1990.
[850] P.D. Straffin, Game Theory and Strategy, The Mathematical Association of America,
Washington, 1993.
[851] M. Sugeno, Fuzzy measures and fuzzy integrals: a survey, in: M.M. Gupta, G.N. Saridis
and B.R. Gaines, eds., Fuzzy Automata and Decision Processes, North-Holland, Amster-
dam, 1977, pp. 89–102.
[852] R. Swaminathan and D.K. Wagner, The arborescence realization problem, Discrete
Applied Mathematics 59 (1995) 267–283.
[853] O. Sykora, An optimal algorithm for renaming a set of clauses into the Horn set, Computers
and Artificial Intelligence 4 (1985) 37–43.
[854] S. Szeider, Backdoor sets for DLL subsolvers, Journal of Automated Reasoning 35 (2005)
73–88.
[855] W. Szwast, On Horn spectra, Theoretical Computer Science 82 (1991) 329–339.
[856] K. Takata, A worst-case analysis of the sequential method to list the minimal hitting sets
of a hypergraph, SIAM Journal on Discrete Mathematics 21 (2007) 936–946.
[857] M. Tannenbaum, The establishment of a unique representation for a linearly separable
function, Lockheed, Technical Note no 20, 1961.
[858] R.E. Tarjan, Depth first search and linear graph algorithms, SIAM Journal on Computing
1 (1972) 146–160.
[859] R.E. Tarjan, Amortized computational complexity, SIAM Journal on Algebraic and
Discrete Methods 6 (1985) 306–318.
[860] A.D. Taylor and W.S. Zwicker, Simple games and magic squares, Journal of Combinato-
rial Theory A 71 (1995) 67–88.
[861] A.D. Taylor and W.S. Zwicker, Simple Games: Desirability Relations, Trading, Pseu-
doweightings, Princeton University Press, Princeton, NJ, 1999.
[862] A. Thayse, Boolean Calculus of Differences, Lecture Notes in Computer Science, Vol. 101,
Springer-Verlag, Berlin-Heidelberg-New York, 1981.
[863] A. Thayse, From Standard Logic to Logic Programming, John Wiley & Sons, Chichester
etc., 1988.
[864] P. Tison, Generalization of consensus theory and application to the minimization of
Boolean functions, IEEE Transactions on Electronic Computers EC-16, No. 4 (1967)
446–456.
[865] D.M. Topkis, Supermodularity and Complementarity, Princeton University Press, Prince-
ton, NJ, 1998.
[866] C.A. Tovey, Hill climbing with multiple local optima, SIAM Journal on Algebraic and
Discrete Methods 6 (1985) 384–393.
[867] C.A. Tovey, Low order polynomial bounds on the expected performance of local
improvement algorithms, Mathematical Programming 35 (1986) 193–224.
[868] C.A. Tovey, Local improvement on discrete structures, in: E. Aarts and J.K. Lenstra,
eds., Local Search in Combinatorial Optimization, John Wiley & Sons, Chichester, 1997,
pp. 57–89.
[869] M.A. Trick, A dynamic programming approach for consistency and propagation for
knapsack constraints, Annals of Operations Research 118 (2003) 73–84.
[870] K. Truemper, Monotone decomposition of matrices, Technical Report UTDCS-1-94,
1994.
[871] K. Truemper, Effective Logic Computation, Wiley-Interscience, New York, 1998.
[872] G.S. Tseitin, On the complexity of derivations in propositional calculus, in: A.O. Slisenko,
ed., Studies in Constructive Mathematics and Mathematical Logic, Part II, Consultants
Bureau, New York, 1970, pp. 115–125. (Translated from the Russian).
[873] S. Tsukiyama, M. Ide, H. Ariyoshi and I. Shirakawa, A new algorithm for generating all
the maximal independent sets, SIAM Journal on Computing 6 (1977) 505–517.
[874] J.D. Ullman, Principles of Database and Knowledge-Base Systems, Vol. I: Classical
Database Systems, Computer Science Press, New York, 1988.
[875] J.D. Ullman, Principles of Database and Knowledge-Base Systems, Vol. II: The New
Technologies, Computer Science Press, New York, 1989.
[876] C. Umans, The minimum equivalent DNF problem and shortest implicants, Journal of
Computer and System Sciences 63 (2001) 597–611.
[877] C. Umans, T. Villa and A.L. Sangiovanni-Vincentelli, Complexity of two-level logic
minimization, IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems 25 (2006) 1230–1246.
[878] T. Uno, Efficient computation of power indices for weighted majority games, NII
Technical Report NII-2003-006E, National Institute of Informatics, Japan, 2003.
[879] R.H. Urbano and R.K. Mueller, A topological method for the determination of the minimal
forms of a Boolean function, IRE Transactions on Electronic Computers EC-5 (1956)
126–132.
[880] A. Urquhart, Hard examples for resolution, Journal of the Association for Computing
Machinery 34 (1987) 209–219.
[881] A. Urquhart, The complexity of propositional proofs, Bulletin of Symbolic Logic 1 (1995)
425–467.
[882] A. Urquhart, Proof theory, in: Y. Crama and P.L. Hammer, eds., Boolean Models and
Methods in Mathematics, Computer Science, and Engineering, Cambridge University
Press, Cambridge, 2010, pp. 79–98.
[883] L.G. Valiant, The complexity of enumeration and reliability problems, SIAM Journal on
Computing 8 (1979) 410–421.
[884] L.G. Valiant, A theory of the learnable, Communications of the ACM 27 (1984)
1134–1142.
[885] A. Van Gelder, A satisfiability tester for non-clausal propositional calculus, Information
and Computation 79 (1988) 1–21.
[886] A. Van Gelder and Y.K. Tsuji, Satisfiability testing with more reasoning and less guessing,
in: D.S. Johnson and M.A. Trick, eds., Cliques, Coloring, and Satisfiability, DIMACS
Series in Discrete Mathematics and Theoretical Computer Science, Vol. 26, American
Mathematical Society, 1996, pp. 559–586.
[887] J. van Leeuwen, Graph algorithms, in: J. van Leeuwen, ed., Handbook of Theoretical
Computer Science: Algorithms and Complexity, Volume A, The MIT Press, Cambridge,
MA, 1990, pp. 525–631.
[888] H. Vantilborgh and A. van Lamsweerde, On an extension of Dijkstra’s semaphore
primitives, Information Processing Letters 1 (1972) 181–186.
[889] Yu.L. Vasiliev, On the comparison of the complexity of prime irredundant and mini-
mal DNFs, in: Problems of Cybernetics, Vol. 10, PhysMatGIz, Moscow, 1963, pp. 5–61
(in Russian).
[890] Yu.L. Vasiliev, The difficulties of minimizing Boolean functions using universal
approaches, Doklady Akademii Nauk SSSR, Vol. 171, No. 1, 1966, pp. 13–16
(in Russian).
[891] T. Villa, R.K. Brayton and A.L. Sangiovanni-Vincentelli, Synthesis of multi-level Boolean
networks, in: Y. Crama and P.L. Hammer, eds., Boolean Models and Methods in Math-
ematics, Computer Science, and Engineering, Cambridge University Press, Cambridge,
2010, pp. 675–722.
[892] H. Vollmer, Introduction to Circuit Complexity: A Uniform Approach, Springer, Berlin -
New York, 1999.
[893] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, Princeton
University Press, Princeton, NJ, 1944.
[894] B.W. Wah and Y. Shang, A discrete Lagrangian-based global-search method for solving
satisfiability problems, in: D. Du, J. Gu and P.M. Pardalos, eds., Satisfiability Prob-
lem: Theory and Applications, DIMACS series in Discrete Mathematics and Theoretical
Computer Science, Vol. 35, American Mathematical Society, 1997, pp. 365–392.
[895] D. Waltz, Understanding line drawings of scenes with shadows, in: P.H. Winston, ed., The
Psychology of Computer Vision, McGraw-Hill, New York, 1975.
[896] C. Wang, Boolean minors, Discrete Mathematics 141 (1995) 237–258.
[897] C. Wang and A.C. Williams, The threshold order of a Boolean function, Discrete Applied
Mathematics 31 (1991) 51–69.
[898] H. Wang, H. Xie, Y.R. Yang, L.E. Li, Y. Liu and A. Silberschatz, Stable egress route
selection for interdomain traffic engineering: Model and analysis, in: Proceedings of
Thirteenth IEEE Conference on Network Protocols (ICNP ’05), Boston, 2005, pp. 16–29.
[899] S. Warshall, A theorem on Boolean matrices, Journal of the ACM 9 (1962) 11–12.
[900] A. Warszawski, Pseudo-Boolean solutions to multidimensional location problems,
Operations Research 22 (1974) 1081–1096.
[901] W.D. Wattenmaker, G.I. Dewey, T.D. Murphy and D.L. Medin, Linear separability
and concept learning: Context, relational properties, and concept naturalness, Cognitive
Psychology 18 (1986) 158–194.
[902] I. Wegener, The Complexity of Boolean Functions, Wiley-Teubner Series in Computer
Science, John Wiley & Sons, Chichester etc., 1987.
[903] I. Wegener, Branching Programs and Binary Decision Diagrams: Theory and Applica-
tions, SIAM Monographs on Discrete Mathematics and Applications, SIAM, Philadel-
phia, PA, 2000.
[904] R. Weismantel, On the 0–1 knapsack polytope, Mathematical Programming 77 (1997)
49–68.
[905] D.J.A. Welsh, Matroid Theory, London Mathematical Society Monographs, Vol. 8,
Academic Press, New York, 1976.
[906] D.J.A. Welsh, Matroids: Fundamental concepts, in: R. Graham, M. Grötschel
and L. Lovász, eds., Handbook of Combinatorics, Elsevier, Amsterdam, 1995,
pp. 481–526.
[907] D. Wiedemann, Unimodal set-functions, Congressus Numerantium 50 (1985) 165–169.
[908] D. Wiedemann, A computation of the eighth Dedekind number, Order 8 (1991) 5–6.
[909] D.J. Wilde and J.M. Sanchez-Anton, Discrete optimization on a multivariable Boolean
lattice, Mathematical Programming 1 (1971) 301–306.
[910] H.P. Williams, Experiments in the formulation of integer programming problems,
Mathematical Programming Studies 2 (1974) 180–197.
[911] H.P. Williams, Linear and integer programming applied to the propositional calculus,
Systems Research and Information Sciences 2 (1987) 81–100.
[912] H.P. Williams, Logic applied to integer programming and integer programming applied
to logic, European Journal of Operational Research 81 (1995) 605–616.
[913] R. Williams, C. Gomes and B. Selman, Backdoors to typical case complexity, in: Pro-
ceedings of the International Joint Conference on Artificial Intelligence (IJCAI) 2003,
pp. 1173–1178.
[914] J.M. Wilson, Compact normal forms in propositional logic and integer programming
formulations, Computers and Operations Research 90 (1990) 309–314.
[915] R.O. Winder, More about threshold logic, in: IEEE Symposium on Switching Circuit
Theory and Logical Design, 1961, pp. 55–64.
[916] R.O. Winder, Single stage threshold logic, in: IEEE Symposium on Switching Circuit
Theory and Logical Design, 1961, pp. 321–332.
[917] R.O. Winder, Threshold Logic, Ph.D. Dissertation, Department of Mathematics, Princeton
University, Princeton, NJ, 1962.
[918] R.O. Winder, Properties of threshold functions, IEEE Transactions on Electronic
Computers EC-14 (1965) 252–254.
[919] R.O. Winder, Enumeration of seven-arguments threshold functions, IEEE Transactions
on Electronic Computers EC-14 (1965) 315–325.
[920] R.O. Winder, Chow parameters in threshold logic, Journal of the Association for
Computing Machinery 18 (1971) 265–289.
[921] P.H. Winston, Artificial Intelligence, Addison-Wesley, Reading, MA, 1984.
[922] L.A. Wolsey, Faces for a linear inequality in 0–1 variables, Mathematical Programming
8 (1975) 165–178.
[923] L.A. Wolsey, An analysis of the greedy algorithm for the submodular set covering problem,
Combinatorica 2 (1982) 385–393.
[924] L.A. Wolsey, Integer Programming, Wiley-Interscience Series in Discrete Mathematics
and Optimization, John Wiley & Sons, New York, 1998.
[925] L. Wos, R. Overbeek, E. Lusk and J. Boyle, Automated Reasoning: Introduction and
Applications, Prentice-Hall, Englewood Cliffs, NJ, 1984.
[926] Z. Xing and W. Zhang, MaxSolver: An efficient exact algorithm for (weighted) maximum
satisfiability, Artificial Intelligence 164 (2005) 47–80.
[927] M. Yagiura, M. Kishida and T. Ibaraki, A 3-flip neighborhood local search for the set
covering problem, European Journal of Operational Research 172 (2006) 472–499.
[928] S. Yajima and T. Ibaraki, A lower bound on the number of threshold functions, IEEE
Transactions on Electronic Computers EC-14 (1965) 926–929.
[929] S. Yajima and T. Ibaraki, On relations between a logic function and its characteristic
vector, Journal of the Institute of Electronic and Communication Engineers of Japan 50
(1967) 377–384 (in Japanese).
[930] M. Yamamoto, An improved Õ(1.234^m)-time deterministic algorithm for SAT, in: X.
Deng and D. Du, eds., Algorithms and Computation - ISAAC 2005, Lecture Notes in
Computer Science, Vol. 3827, Springer-Verlag, Berlin-Heidelberg, 2005, pp. 644–653.
[931] S. Yamasaki and S. Doshita, The satisfiability problem for a class consisting of Horn
sentences and some non-Horn sentences in propositional logic, Information and Control
59 (1983) 1–12.
[932] M. Yannakakis, Node- and edge-deletion NP-complete problems, in: Proceedings of the
10th Annual ACM Symposium on Theory of Computing (STOC) 1978, ACM, NY, USA,
pp. 253–264.
[933] M. Yannakakis, The complexity of the partial order dimension problem, SIAM Journal
on Algebraic and Discrete Methods 3 (1982) 351–358.
[934] M. Yannakakis, On the approximation of maximum satisfiability, Journal of Algorithms
17 (1994) 475–502.
[935] E. Zemel, Easily computable facets of the knapsack polytope, Mathematics of Operations
Research 14 (1989) 760–764.
[936] H. Zhang and J.E. Rowe, Best approximations of fitness functions of binary strings,
Natural Computing 3 (2004) 113–124.
[937] Yu.I. Zhuravlev, Set-theoretical methods in Boolean algebra, Problems of Cybernetics 8
(1962) 5–44 (in Russian).
[938] S. Živný, D.A. Cohen and P.G. Jeavons, The expressive power of binary submodular
functions, Discrete Applied Mathematics 157 (2009) 3347–3358.
[939] Yu.A. Zuev, Approximation of a partial Boolean function by a monotonic Boolean
function, U.S.S.R. Computational Mathematics and Mathematical Physics 18 (1979)
212–218.
[940] Yu.A. Zuev, Asymptotics of the logarithm of the number of threshold functions of the
algebra of logic, Soviet Mathematics Doklady 39 (1989) 512–513.
[941] U. Zwick, Approximation algorithms for constraint satisfaction problems involving at
most three variables per constraint, in: SODA ’98: Proceedings of the 9th Annual ACM-
SIAM Symposium on Discrete Algorithms, 1998, SIAM, Philadelphia, PA, pp. 201–210.
Index

2-Sat problem, see quadratic equation
3-Sat problem, see DNF equation, degree 3

absorption, 9, 26
  closure, 131
affine Boolean function, 110
algebraic normal form, see representation over GF(2)
aligned function, 398–399
almost-positive pseudo-Boolean function, 604
apportionment problem, 434
approximation algorithm, 117
arborescence, 613
artificial intelligence, 50–52, 68, 124, 174, 273–274, 279–280, 511, 566, 569, 599
association rule, 521–522
asummable function, 414–417
  k-asummable
    definition, 414
  2-asummable
    complete monotonicity, 396
    definition, 395
    vs. threshold function, 416
  threshold function, 414
  weakly asummable function, 429
    Chow function, 430

backdoor set, 83
Banzhaf index, 57–58
  and pseudo-Boolean approximations, 593
  and strength preorder, 360
  definition, 57
  in reliability, 59
  of threshold functions, 349, 434, 436, 437
  raw, 57
BDD, see binary decision diagram (BDD)
belief function, 569
belt function, 159
bidirected graph, 206, 210
binary decision diagram (BDD), 46–49
  and orthogonal DNF, 48
  ordered (OBDD), 47
bipartite graph, 542, 549, 611
  and conflict codes, 216, 592
  and Guttman scale, 443
  and posiforms, 604
  complete, 611
  recognition, 219
black box oracle, see oracle algorithm
blocker, 179
Boolean equation
  complexity, 72–74, 104–111
  consistent, 67
  definition, 67, 73
  DNF, see DNF equation
  generating all solutions, 112
  inconsistent, 67
  parametric solutions, 113–115
Boolean expression
  definition, 10
  dual, 14
  equivalent, 12
  length, size, 12
  of a function, 10–13
  read-once, 448
  satisfiable, 68
  tautology, 68
  valid, 68
Boolean function, 3
  expression, representation, 10–13
  normal form representations, 15–19
bottleneck optimization, 180
branching procedures for Boolean equations, 80–87
  branching on terms, 83
  branching rules, 82
  complexity, 104, 105

Choquet capacity, 569
Chow function, 429
  weakly asummable function, 430
Chow parameters
  and Banzhaf indices, 57
  and degree sequences, 441
  and essential variables, 31
  and reliability, 59
  and strength preorder, 355
  complexity, 61, 436
  definition, 24
  modified, 56, 66, 431
  of threshold functions, 428–438
circuit: combinational, logic, switching, 5, 52–55, 69–71, 142, 174, 405, 456, 558
clause, 15
clique, 610
clutter, 60, 177, 451, 614
CNF, see conjunctive normal form (CNF)
co-Horn DNF, 207
co-Horn function, 309
  and Schaefer’s theorem, 110
co-occurrence graph, 449, 455
  of read-once function, 458
  oracle algorithm, 475
coalition
  blocking, 182
  maximal losing, 56
  minimal winning, 56
  winning, 56
cograph, see P4-free graph
coloring
  of graphs, 121, 220
  of hypergraphs, 71–72, 178–180
complementation (Boolean), 8
complete DNF, 27
  recognition, 134
completely monotone function, 391–397
  2-asummability, 396
  dual-comparability, 394–395
  recognition, 396
  vs. threshold function, 392
completely unimodal pseudo-Boolean function, 606
computer vision, 567
concave envelope of a pseudo-Boolean function, 581
condensation of a digraph, 613
conflict code, 216
conflict graph, 216–218, 591
  stable set, 591
conflicting terms, 216, 545, 591
conjunction
  Boolean, 8
  pseudo-Boolean, 574
conjunctive normal form (CNF), 14–19
  clause, 15
  definition, 15
  of a function, 15–19
  pseudo-Boolean, 576
connected component, 611
consensus
  Chvátal cut, 99
  closure, 131, 293
  derivation, 93
  of two conjunctions, 92
  unit consensus, 95, 97, 102
consensus procedure for Boolean equations, 92–95
  and cutting-plane proofs, 99
  complexity, 104, 108
  hard examples, 105
  input consensus, 122
  linear consensus, 102, 122
consensus procedure for prime implicants, 130–138
  disengagement order, 256
  input consensus, 289
  input disengagement, 255
  input prime consensus, 291
  prime implicant depletion, 287
  term disengagement, 137
  variable depletion, 135
constraint satisfaction problem, 108–111
constraint set, 108
control set, 83, 86
convex Boolean function, 545
convex envelope
  of a Boolean function, 546
  of a pseudo-Boolean function, 581
convex hull of terms, 546
Cook’s theorem, 41, 72–74, 230, 622–624
correlation polytope, 595
coterie, 182
crossover point, 108
cube, 15
cut-threshold graph, 444
cutset, 181
cutting-plane proof, 99
  Chvátal closure, 98
  Chvátal cut, 98
  complexity, 104
  consensus cut, 99
  hard examples, 105

data mining, 511, 521, 522, 538, 567
databases, 274–275, 593
Davis-Putnam rules, see DNF equation, Davis-Putnam rules
De Morgan’s laws, 9
decision tree, 47–49
  complexity, 175
  construction, 48
  depth, 49, 175
  of a pdBf, 526
decomposable function, 540
degree
  of a Boolean function, 492
  of a DNF, 18
  of a polynomial threshold function, 407
  of a pseudo-Boolean function, 571
  of an elementary conjunction, 18
  of prime implicants, 163, 193
degree-k DNF, 41, 197
  functions representable by, 41
    characterization by functional equations, 500, 506
    recognition, 43
degree-k extension, 535
degree-k function, 491–495
  characterization by functional equations, 494, 496
degree-k pseudo-Boolean approximation, 593
disjunction
  Boolean, 8
  pseudo-Boolean, 574
disjunctive normal form (DNF), 14–19
  complete, 27
    recognition, 134
  definition, 15
  degree
    definition, 18
    typical, 163
  extremal size, 161
  irredundant, 27
  linear, 18
  mixed, 207
  monotone, 35
  negative, 35
  of a function, 15–19
  orthogonal, see orthogonal DNF
  polar, 207
  positive, 35
  prime, 27
  pseudo-Boolean, 575
  quadratic, 18
  random, 106
  redundant, 27
  term, 15
  transformation into, 19–22
distributed computing, 182, 407
DNF, see disjunctive normal form (DNF)
DNF equation
  branching procedures, 80–87
  consensus procedure, 92–95, 135
  counting solutions, 111
  Davis-Putnam rules, 84, 91
  definition, 67, 73
  degree 3 (3-Sat), 73, 74, 105, 121
    random, 107
    reduction to, 77
  heuristics, 86, 102
  Horn relaxation, 86
  integer programming approaches, 95–100
  nonlinear programming approaches, 100–102
  preprocessing, 84–87
  quadratic relaxation, 87
  random, 106
  relative strength of procedures, 104
  relaxation schemes, 86
  rewriting rules, 84
  satisfiability problem, 74
  variable elimination, 87–91
domishold graph, 444
don’t care points, 512, 558
dual expression, 14
dual function, 13
  mutually dual functions, 168, 183, 190
dual implicant, 169, 177, 308, 450, 458
dual subimplicant, 452
  recognition, 455
  theorem, 453
dual-comparable function, 170–174, 178–179, 394
dual-major function, 170, 179, 180, 182, 394, 411
dual-minor function, 170, 178, 180, 182, 394, 411
duality
  and bottleneck optimization, 181
  and game theory, 13, 56, 182, 477–480
  and hypergraphs, 61, 177, 179
  and integer programming, 180
  and reliability theory, 181
  principle, 169
dualization
  algorithms, 183–189, 192–196
  Berge multiplication, 187
  by sequential distributivity, 186–189
dualization (cont.)
  complexity, 183–186, 189–192, 481
  double dualization, 141, 188
  equivalent problems, 191, 196
  Fredman-Khachiyan algorithm, 192–196, 308
  of Horn functions, 306–309
  of quadratic functions, 263–266
  of regular functions, 369–377
  of shellable functions, 336–338
  recursive algorithm, 197
  vs. identification, 196

electrical engineering, 5, 52–55, 69–71, 174, 224–226, 405
elementary conjunction
  Boolean, 15
  pseudo-Boolean, 575
elementary disjunction
  Boolean, 15
  pseudo-Boolean, 576
elementary operations, 8
  properties, 9
Espresso, 69, 82
essential variable, 30
  of a read-once function, 474, 484
  recognition, 43, 436, 484
exclusive-or, 44
expert system, 50, 68, 124, 205, 273, 512, 566
expression, see Boolean expression
extension of a pdBf, 514–558
  best fit, 548
  bi-theory, 525
  convex extension, 545–547
  decision tree, 529
  decomposable extension, 539–545
  definition, 511
  degree-k extension, 535
  existence, 514
  Horn extension, 535–538
  largest, 515
  monotone extension, 532–534
  positive extension, 531
  smallest, 515
  theory, 519
  threshold extension, 538–539
  with errors, 547–551
  with missing bits, 551–558
    consistent, 552
    fully robust, 552, 553
    most robust, 552
extension of a pseudo-Boolean function, 578
  concave, 581–583, 589–590
  concave envelope, 581
  convex, 581–583
  convex envelope, 581
  Lovász extension, 583, 590
  paved upper-plane, 583
  polynomial, 579, 588–589
  standard, 582, 590, 595

facility location, 570
false points, 3
  maximal, 38
Fourier expansion, see representation over the reals
functional equations
  addition of inessential variables, 495
  certificate of non-membership, 501, 505
  characterizable classes, 490, 495–506
  definition, 489
  finitely characterizable classes, 500–506
  for co-Horn functions, 491
  for degree-k positive functions, 494
  for Horn functions, 208, 278, 491
  for linear functions, 493
  for positive functions, 488
  for quadratic functions, 208, 493
  for submodular functions, 309, 492
  for supermodular functions, 492
  forbidden identification minors, 504
  identification of variables, 495
  non-characterizable classes, 496
fuzzy measure, 569

game theory
  characteristic form, 568, 593
  positional games, 477–480
  simple games, 7, 55–58, 182, 348, 354, 360, 391, 406, 434–436, 593
    constant-sum, 182
    decisive, 182
    proper, 182
    strong, 182
greedy heuristic
  for Boolean equations, 86, 135
  for logic minimization, 156–159
  for set covering, 156
  for submodular optimization, 603
Guttman scale, 443

Hamming distance, 523
heuristics
  for Boolean equations, 86, 102
Horn DNF
  definition, 51, 270
  dual recognition, 308
  extended, 319
  generalized, 320
  irredundant and prime, 283, 294–296, 314
  literal minimization: approximability, 301
  literal minimization: complexity, 301
  minimization, 297
  polynomial hierarchies, 320–321
  pure, 270
  renamable, 235, 314–316
  source sides minimization: complexity, 302
  term minimization: approximability, 300–301
  term minimization: complexity, 298–300
Horn equation, 86, 281–286
  and forward chaining, 284–285
  and unit literal rule, 281–284
  unique solution, 286
Horn function
  acyclic, 313–314
  and Schaefer’s theorem, 110
  bidual, 311–312
  characteristic models, 280, 322
  characterization by functional equations, 278, 491
  definition, 271
  double Horn, 312–313
  dual, 306
  dual recognition, 307
  dualization, 306–309
  false points, 277–280
  generation of prime implicants, 140, 286–292
  maximal minorant, 279, 321–323
  minimal majorant, 321–323
  properties of prime implicants, 292–296
  pure, 271
  recognition, 272
  renamable, 316, 317
Horn term
  definition, 269
  head, 270
  positive, 270
  pure, 270
  subgoals, 270
hypergraph, 60–61, 298
  coloring, 71, 178–180
  definition, 614
  directed, 275
  hierarchy, 320
  regular, 360
  stable sets, 7, 34
  transversal, 177, 179

ideal function, 399–400
identification minor, 503
identification of variables, 495
implicant
  complexity of recognition, 138
  definition, 26
  dual, 169, 450
  geometric interpretation, 31
  pseudo-Boolean, 576
  vs. true point, 39
implicate
  definition, 28
  dual, 169
  pseudo-Boolean, 576
  vs. false point, 39
implication graph, 212–216, 247
  and generation of prime implicants, 250–254
  and irredundant quadratic DNFs, 261–263
  Mirror property, 212
  solving quadratic equations, 239–240
incompletely specified function, see partially defined Boolean function
inessential variable, 30
  addition, 495
input consensus, 122, 255, 289, 291
integer programming
  Max Sat, 117
  aggregation problem, 422, 442
  Boolean equations, 95–100
  Boolean formulations, 62–65, 180–181
  Boolean preprocessing, 72
  extended Horn equations, 319
  Horn equations, 276–277
  knapsack problem, 407, 426, 427
  logic minimization, 143, 156
  monotone pdBf extension, 533
  nonlinear, see pseudo-Boolean optimization
  pdBf support set, 532
  pseudo-Boolean formulations, 568–570
  regular set covering, 377–380, 388–390
  set packing, 442
  strength preorder, 359
  total unimodularity, 221

k-monotone function, 391–397
  dual, 394
  related properties, 430
Karnaugh map, 32
knapsack problem, 64, 407, 426, 427
Kőnig-Egerváry property, 72, 211, 223–224, 231

LE property, see lexico-exchange property
leader, 339, 359
left-shift, 362–365
  ceiling, 363, 420
  definition, 362, 420
  floor, 363, 420

level graph, 227 approximability, 118–121


lexico-exchange property, 338–346 complexity, 116
dualization, 345–346 integer programming formulation, 117
orthogonal form, 345 pseudo-Boolean formulation,
quadratic functions, 346–347 566, 590
recognition, 340–345 maxterm, 17
regular functions, 359 (Boolean) expression, 17
shellability, 339 median graph, 243
lexicographic order membership problems
of points in R^n, 366, 374 membership problems
of sets, 336, 338, 362 definition, 40
linear equations over GF(2), 110 DNF, 40
linear function functional, 40
characterization by functional equations, 493 certificate of non-membership, 501
linearly separable function, see threshold min-cut problem, 603, 604
function minorant
list-generating algorithms definition, 24
complexity, 625–626 largest positive, 66, 166
for prime implicants, 128–141 largest regular, 380, 382, 389
for solutions of Boolean equations, 112 maximal Horn, 279, 321–323
for solutions of quadratic equations, 244 regular, 380–390
local maximum of pseudo-Boolean function, 606 minterm, 17
complexity, 587 (Boolean) expression, 17
definition, 586 (pseudo-Boolean) PBNF, 572
logic minimization, 141–159 models (of a Boolean equation), 51, 68, 273,
approximability, 156–159 279, 280
complexity, 150–156 monomial, 15
extremal number of terms, 161 monotone Boolean function, 34
literal minimization, 142 k-monotone, 392
set covering formulation, 143–150 characterization by functional equations,
term minimization, 142 496
typical number of terms, 164 completely monotone, 392
with don’t cares, 558–561
monotone literal rules, 85
logical analysis of data, 512
monotone pseudo-Boolean function, 566, 568,
Lovász extension, 583, 590
569, 596–598
and supermodular functions, 601
derivatives, 596
DNF and CNF, 597
Möbius transform, 571 recognition, 596
machine learning, 47, 52, 196, 473, 512, 522, multicriteria decision-making, 569, 578
538, 547, 567 multilinear expression, extension, see
majorant polynomial expression, extension
definition, 24 mutually dual functions, 168, 183, 190
Horn, 321–323
regular, 380–390
smallest positive, 66 negation (Boolean), 8
smallest regular, 380, 386 negative function, 34
matched graph, see quadratic equation Negative Unit Literal Rule (NULR), 281
matching, 610, 614 neural network, 406, 408
matroid, 197, 348, 566 nonlinear programming
Max Sat problem, see maximum satisfiability for Boolean equations, 100–102
max-cut problem, 565, 603 normal form of a pseudo-Boolean function
max-quadratic pseudo-Boolean function, 226 (PBNF), 572
maximum satisfiability, 115–121 normal function, 450
Max 2-Sat, 249 recognition, 468, 483

ODNF, see orthogonal DNF (ODNF) perceptron, 406


oracle algorithm, 5 perfect matrix, 277
for essential variables, 484 Petrick function, 144
for identification, 196 phase transition for DNF equations, 108
for read-once learning, 473–476, 485 piecewise linear representation of a
for read-once recognition, 472 pseudo-Boolean function, 574
for regular recognition, 369 pigeonhole formula, 105
for supermodular optimization, 603 polar function
for threshold recognition, 417, 445 Boolean, 208, 492
orthogonal DNF (ODNF), 22, 48, 326 pseudo-Boolean, 604
algorithms, 326–330 polynomial delay: definition, 625
of a regular function, 361 polynomial expression
of a shellable DNF, 331–335 of a Boolean function, 45
reliability, 58 of a pseudo-Boolean function, 571
vs. dual DNF, 329 polynomial extension of a pseudo-Boolean
function, 579
optimization, 588–589
P4-free graph, 449
polynomial incremental time: definition, 625
complement, 462
polynomial total time: definition, 625
cotree, 464
posiform of a pseudo-Boolean function, 572
recognition, 465
optimization, 590–592
vs. cograph, 463–466
positive function, 34
parametric solution of equations, 113–115
recognition, 43
quadratic equations, 246–249
reproductive, 114 prime implicant
parity function, 44, 66, 162, 444 definition, 26
parse tree dual, 169, 177, 183
as game tree, 478 essential, 146, 311, 314
of a positive expression, 456 linear, 125
of a read-once expression, 456 number of, 129, 159–161
partially defined Boolean function, 511 of Horn functions, 292–296
basic theory, 520 of positive functions, 35–38
bi-theory, 525–526, 529 pseudo-Boolean, 576
co-pattern, 520 quadratic, 126
co-theory, 520 redundant, 146, 314
decision tree, 526–531 small degree, 125, 193
extension, 511, 514–558 vs. minimal pathset, 58
irredundant theory, 519 vs. minimal true point, 39
logic minimization, 558–561 vs. minimal winning coalition, 56
missing bits, 551 prime implicant generation, 128–141
pattern, 518, 567 by consensus, 130–138
prime pattern, 518 by double dualization, 141, 188
prime theory, 519 by term disengagement, 137
support set, 515 by variable depletion, 135
theory, 518–526 complexity, 139, 141
definition, 519 for Horn functions, 140, 286–292
pathset, 58, 181, 348 for quadratic functions, 140, 250–254
pattern (of a pdBf), see partially defined for threshold functions, 423–428
Boolean function, pattern from transitive closure, 251–254, 310
paved upper-plane, 583 from true points, 128
extension, 583 tractable classes, 140
PBNF, see normal form of a pseudo-Boolean prime implicate
function definition, 28
pdBF Decision Tree, 528 dual, 169
pdBf, see partially defined Boolean function pseudo-Boolean, 576

prime implicate (cont.) FK-Dualization (Fredman and Khachiyan),


vs. maximal false point, 39 196
vs. maximal losing coalition, 56 Forward Chaining, 285
problem definition GMR Read-Once Recognition, 467
(T, F) ||φ||-minimization, 143 Implicant, 342
T ||φ||-minimization, 143 Input Consensus, 289
minT ||φ||-minimization, 143 Input Disengagement Consensus, 257
||φ||-minimization, 143 Input Prime Consensus, 291
Add-j, 197 MinTrue, 424
Best-Fit(C), 548 MinTrue, 424
Boolean Equation, 73 NULR (Negative Unit Literal Rule), 282
CE(C) (consistent extension), 553 Orthogonalize, 328
DNF Dualization, 183 Prime Implicant Depletion, 287
DNF Equation, 73 Prime Patterns, 559
DNF Membership in C, 40 Quadratic Prime Implicants, 254
Dual Recognition, 183 Recognize Dual, 194
Dualization, 183 RegCover0 (regular set covering), 377
Extension(C), 514 RegCover (regular set covering), 379
FRE(C) (fully robust extension), 553 RegMinor0 (largest regular minorant), 383
Forbidden-color graph bipartition, 221 RegMinor (largest regular minorant), 386
Regular (recognition), 368
Functional Membership in C, 40
SD-Dualization (sequential distributive),
Horn µ-Minimization, 298
188
Horn Dual Recognition, 308
Threshold (recognition), 418
Identification, 196
Tree-DNF, 527
Implicant Recognition, 138
production planning, 569
Isotony, 227
projection of a Boolean function, 28
MRE(C) (most robust extension), 553
propositional logic, 5, 50–52, 68, 124, 273–274
Min-Support(C), 515
pseudo-Boolean function
Positive DNF Dualization, 190
almost-positive, 604
Positive Dual Recognition, 190
approximation, 593
Prime Implicants, 128
completely unimodal, 606
Quadratic DNF Dualization, 263 concave extension, 581–583
Read-Once Exact Learning, 474 continuous extensions, 578–585
Read-Once Recognition, 466 convex extension, 581–583
Regular Dualization, 369 definition, 564
Regularity Recognition, 365 degree, 571
Satisfiability, 74 derivative, 586, 596, 599
Set Cover, 515 DNF and CNF, 574–578
Shellability, 349 linear, 601
Threshold Recognition, 417 Lovász extension, 583–585
procedure modular, 601
AHK Read-Once Exact Learning, 474 monotone, 596–598
Branch (solving equations), 81 normal form (PBNF), 571–572
Checking Normality, 469 paved upper-plane extension, 583
Consensus* (prime implicants), 131 piecewise linear representation, 573–574
Consensus (solving equations), 94 polynomial expression, 570–571
Decision Tree, 48, 528 polynomial extension, 579–580
Disengagement Consensus, 138 posiform, 572–573
DualReg0 (regular dualization), 372 quadratic, 594–596
DualReg (regular dualization), 376 representation, 570–578
Eliminate (solving equations), 89 standard extension, 582
Expand* (Tseitin’s procedure), 77 submodular, 599–603
Expand (Tseitin’s procedure), 22 supermodular, 599–605

threshold, 606 definition, 203


unate, 604 generation of prime implicants, 140, 250–254
unimax, 606 purely quadratic, 204
unimodal, 605–606 shellability, 346–347
unimodular, 604–605 quadratic graph, 216–218
pseudo-Boolean optimization quadratic irredundant DNF
concave extension, 589–590 and transitive reduction, 261–263
conflict graph, 590–592 quadratic pseudo-Boolean function
continuous relaxations, 588–590 optimization, 594–596
linearization, 589 super- or submodular, 601
local optima, 585–587, 606 quadric polytope, 595
Lovász extension, 590
methods, 585–592
models, 565–570 random Boolean function, 163
polynomial extension, 588–589 random DNF equation, 106
quadratic, 594–596 crossover point, 108
rounding (Rosenberg’s theorem), 101, 193, phase transition, 108
588, 590 quadratic, 241–243, 267–268
standard extension, 589–590 threshold conjecture, 107
standard extension bound, 590, 595 random DNF expression, 106
supermodular function, 603–605 rank function (vector space, matroid), 566
transformation to the quadratic case, 594 read-m function, 476–477
variable elimination, 587–588 read-once expression
psychology, 443, 512, 538 definition, 448
parse tree, 456
unicity, 456
Q-Horn read-once function
DNF, 317 P4-freeness, 461
equation, 318 characterization, 450, 458
function, 317 characterization by functional equations, 496
quadratic equation (2-Sat), 230–243 co-occurrence graph, 456
Alternative Labeling algorithm, 234–235 definition, 448
and Schaefer’s theorem, 109 learning with oracle, 473–476
definition, 204 normality, 459
forced literal, 214 positional game, 477–480
generating all solutions, 244 recognition, 466–473
implication graph, 212–216, 239–240, 247 arbitrary representation, 470–473
Labeling algorithm, 231–234 complete DNF expression, 466–470
matched graph, 210–212, 236–239 Reed-Muller expansion, see representation over
number of solutions, 61, 244 GF(2)
on-line, 249 regular function
parametric solution, 246–249 characterization by functional equations, 496
random, 107, 241–243, 267–268 definition, 354
reducibility to, 218–230 dualization, 369–377
set of solutions, 243 largest regular minorant, 380–390
Strong Components algorithm, 239–240 left-shifts, 362–365
Switching algorithm, 235–239 lexico-exchange property, 359
twin literals, 215 maximal false points, 371, 375
variable elimination, 230 number of, 376
vs. 2-Satisfiability, 205 prime implicant recognition, 367–368
quadratic function recognition, 365–369
and Schaefer’s theorem, 109 set covering, 377–380
and transitive closure, 210, 251–254 smallest regular majorant, 380–390
characterization by functional equations, 208, threshold graph, 439
493 vs. aligned, 400

regular function (cont.) strength relation, 351–362, 391–397


vs. ideal, 400 k-monotonicity, 392
vs. weakly regular, 400 and Chow parameters, 355
Winder matrix, 367 and regular functions, 354
regular hypergraph, 360 and reliability theory, 361
reliability polynomial, 59, 361, 580 and simple games, 355, 360, 391
reliability theory, 7, 34, 58–59, 181, 348, complete monotonicity, 392
361–362, 406 leader, 359
renaming variables, see switching variables of dual function, 358
representation by Boolean expressions, 10–13 of restrictions, 357, 393
normal forms, 15–19 on subsets, definition, 391
representation over GF(2), 44 on variables, definition, 352
linear equations, 110 recognition, 356, 366–369, 401
representation over the reals, 45 Winder matrix, 366
resolution principle, 92 strong component, 613
resolvent of Boolean constraints, 62, 407, 422 strongly connected digraph, 613
restriction of a Boolean function, 28 struction, 592
roof dual, 595 stuck-at fault, 69–71, 87
strong persistency, 595 subcube, 31
submodular Boolean function, 208, 309–311
characterization by functional equations, 309,
satisfiability problem, see DNF equation 492
satisfiable expression, 68 submodular pseudo-Boolean function, 566,
Schaefer’s theorem, 108–111 599–603
self-dual extension, 172, 402, 412 derivatives, 599
self-dual function, 13, 170, 179–180, 182–183, quadratic, 601
196, 199, 402, 412, 445 recognition, 601
set covering problem, 63, 66, 143, 147, 150, sum of disjoint products, see orthogonal DNF
156, 180, 299, 422, 515, 532, 562 superadditive pseudo-Boolean function, 568
generalized, 62, 276, 422 supermodular Boolean function, 208
regular, 377–380, 388–390 characterization by functional equations, 492
set function, 564 supermodular pseudo-Boolean function,
shadow, 332, 337, 362 599–605, 608
Shannon expansion, 29–30, 44, 88 derivatives, 599
shellable DNF, 330 Lovász extension, 601
dualization, 336–338, 349 optimization, 603–605
orthogonal form, 331–335 quadratic, 601
recognition, 335, 349 recognition, 601
shellable function, 335 support set of a pdBf, 515–518
characterization by functional equations, 496 minimum, 515
shelling, 330 complexity, 517
recognition, 335 positive, 532
signed graph, 220 swing, 57, 433
Sperner family, Sperner hypergraph, 60, 177, switching variables, 235–239, 314–316, 321,
614 604–605
split graph, 220 symmetric
spread of a function, 162 function, 65, 159, 361
stability function, 7, 34, 60, 209, 402 variables, 352, 356
stable set, 7, 34, 60, 71, 177, 197, 263, 439, 610, system of equations (Boolean), 74–76
614
and pseudo-Boolean optimization, 565,
590–592, 594 tautology, 68
standard extension of a pseudo-Boolean term, 15
function, 582 theory (of a pdBf), see partially defined Boolean
bound, 590, 595 function, theory

threshold conjecture, 107 and Horn functions, 277


threshold dimension, 443, 446 pseudo-Boolean optimization, 604
threshold function recognition, 221
asummability, 414, 502 transitive closure, 613
certificate of non-membership, 501 and quadratic functions, 210
characterization by functional equations, 499, by consensus, 255–261
501–503 transitive reduction, 613
Chow function, 429 and irredundant quadratic DNFs, 261–263
Chow parameters, 428–438 transversal, 64, 177–181, 610, 614
computation, 435 tree, 611
vs. separating structure, 430–435 rooted, 611
complete monotonicity, 392, 411 triangulated graph, 346, 565
definition, 404 true points, 3
dual, 411 minimal, 38
generation of prime implicants, 423–428 number of, 23, 24, 61, 111, 163, 327, 436
linear programming characterization, 413 truth table, 4, 143, 156, 158, 417
number of threshold functions, 412 Tseitin’s procedure, 19–22, 76
polynomial threshold function, 407
prime implicant characterization, 423
unate function
pseudo-Boolean, 606
Boolean, 34
recognition, 417–422
pseudo-Boolean, 604
complexity, 418, 421
unimax pseudo-Boolean function, 606
oracle algorithm, 417, 445 unimodal pseudo-Boolean function, 605–606
regularity, 410 unimodular pseudo-Boolean function, 604–605
related graph classes, 438–444 unique games conjecture, 121
related properties, 430 unit literal rules, 85
restrictions, 408 and extended Horn equations, 319
separating structure, 404, 408–410 and Horn equations, 281–284
cone, 409 unit resolution, 85
integral, 409
random, 432, 435
size of weights, 413 variable elimination
shellability, 348 complexity, 104, 106
vs. 2-asummability, 416 for Boolean equations, 87–91
threshold graph, 438–444 for pseudo-Boolean optimization, 587–588
aggregation problem, 442
degree sequence, 440–442, 446
weakly regular function, 397–398
forbidden subgraphs, 440
weighted majority game, 56, 182, 348, 406, 407,
Guttman scale, 443
434, 436
threshold dimension, 443
threshold synthesis, 417
topological ordering of a digraph, 613 Zhegalkin polynomial, see representation over
totally unimodular matrix GF(2)
