6 Reasoning About Functional Programs
Most programmers know how hard it is to make a program work. In the 1970s, it
became apparent that programmers could no longer cope with software projects
that were growing ever more complex. Systems were delayed and cancelled;
costs escalated. In response to this software crisis, several new methodologies
have arisen — each an attempt to master the complexity of large systems.
Structured programming seeks to organize programs into simple parts with simple interfaces. An abstract data type lets the programmer view a data structure, with its operations, as a mathematical object. The next chapter, on modules, will say more about these topics.
Functional programming and logic programming aim to express computations directly in mathematics. The complicated machine state is made invisible; the programmer has to understand only one expression at a time.
Program correctness proofs are introduced in this chapter. Like the other responses to the software crisis, formal methods aim to increase our understanding. The first lesson is that a program only ‘works’ if it is correct with respect to its specification. Our minds cannot cope with the billions of steps in an execution. If the program is expressed in a mathematical form, however, then each stage of the computation can be described by a formula. Programs can be verified — proved correct — or derived from a specification. Most of the early work on program verification focused on Pascal and similar languages; functional programs are easier to reason about because they involve no machine state.
Chapter outline
The chapter presents proofs about functional programs, paying particular attention to induction. The proof methods are rigorous but informal. Their purpose is to increase our understanding of the programs.
The chapter contains the following sections:
Some principles of mathematical proof. A class of ML programs can be treated within elementary mathematics. Some integer functions are verified using mathematical induction.
Structural induction. This principle generalizes mathematical induction to
finite lists and trees. Proofs about higher-order functions are presented.
A general induction principle. Some unusual inductive proofs are discussed.
Well-founded induction provides a uniform framework for such proofs.
Specification and verification. The methods of the chapter are applied to an
extended example: the verification of a merge sort function. Some limitations
of verification are discussed.
Recall from Section 2.11 that functional programs are computed by reduction. Computing facti(n, p) yields a unique result for all n ≥ 0; thus facti is a mathematical function satisfying these laws:
facti (0, p) = p
facti (n, p) = facti (n − 1, n × p) for n > 0
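Although the chapter reasons from the laws alone, it may help to see them as an ML declaration. Here is a sketch reconstructed from the two laws (the original declaration lives in Chapter 2):

```sml
(* Iterative factorial: facti (n, p) computes n! * p, for n >= 0 *)
fun facti (n, p) = if n = 0 then p else facti (n - 1, n * p);
```

For example, facti(4, 1) reduces through facti(3, 4), facti(2, 12) and facti(1, 24) to 24.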
If n < 0 then facti(n, p) produces a computation that runs forever; it is undefined. We may regard facti(n, p) as meaningful only for n ≥ 0, which is the function’s precondition.
For another example, consider the following declaration:
fun undef (x) = undef (x) - 1;
Since undef(x) does not terminate for any x, we shall not regard it as meaningful. We may not adopt undef(x) = undef(x) − 1 as a law about numbers, for it is clearly false.
It is possible to introduce the value ⊥ (called ‘bottom’) for the value of a
nonterminating computation, and develop a domain theory for reasoning about
arbitrary recursive function definitions. Domain theory interprets undef as the
function satisfying undef (x ) = ⊥ for all x . It turns out that ⊥ − 1 = ⊥, so
undef (x ) = undef (x ) − 1 means simply ⊥ = ⊥, which is valid. But domain
theory is complex and difficult. The value ⊥ induces a partial ordering on all
types. All functions in the theory must be monotonic and continuous over this
partial ordering; recursive functions denote least fixed points. By insisting upon
termination, we can work within elementary set theory.
Restricting ourselves to terminating computations entails some sacrifices. It is harder to reason about programs that do not always terminate, such as interpreters. Nor can we reason about lazy evaluation — which is a pity, for using this sophisticated form of functional programming requires mathematical insights. Most functional programmers eventually learn some domain theory; there is no other way to understand what computation over an infinite list really means.
Logical notation. This chapter assumes you have some familiarity with formal
proof. We shall adopt the following notation for logical formulæ:
¬φ not φ
φ∧ψ φ and ψ
φ∨ψ φ or ψ
φ→ψ φ implies ψ
φ↔ψ φ if and only if ψ
∀x . φ(x ) for all x , φ(x )
∃x . φ(x ) for some x , φ(x )
From highest to lowest precedence, the connectives are ¬, ∧, ∨, →, ↔. Here is
an example of precedence in formulæ:
P ∧ Q → P ∨ ¬R abbreviates (P ∧ Q) → (P ∨ (¬R))
The formula φ(0) holds by the base case, and φ(1), φ(2), . . . , follow by repeated use of the induction step. Therefore φ(n) holds for all n.
As a trivial example of induction, let us prove the following.

Theorem 1 For every natural number n,
n is even or n is odd.

Proof By mathematical induction on n. The base case holds because 0 is even. For the induction step, we assume the induction hypothesis
k is even or k is odd
and prove
k + 1 is even or k + 1 is odd.
By the induction hypothesis, there are two cases: if k is even then k + 1 is odd; if k is odd then k + 1 is even. Since the conclusion holds in both cases, the proof is finished. □
Theorem 2 Every natural number has the form 2m or 2m + 1 for some natural number m.

Proof Since this property is fairly complicated, let us express it in logical notation:
∃m . n = 2m ∨ n = 2m + 1
We prove this formula by induction on n. The base case holds with m = 0, since 0 = 2 × 0. For the induction step, we assume the induction hypothesis
∃m . k = 2m ∨ k = 2m + 1
and prove
∃m′ . k + 1 = 2m′ ∨ k + 1 = 2m′ + 1.
If k = 2m then k + 1 = 2m + 1, so take m′ = m; if k = 2m + 1 then k + 1 = 2(m + 1), so take m′ = m + 1. In both cases the conclusion holds. □
Observing that itfib(n, prev, curr) is defined for all n ≥ 1, we set out to prove itfib(n, 0, 1) = F_n. As in the previous example, the induction formula must be generalized to say something about all the arguments of the function. There is no automatic procedure for doing this, but examining some computations of itfib(n, 0, 1) reveals that prev and curr are always Fibonacci numbers. This suggests the relationship
itfib(n, F_k, F_{k+1}) = F_{k+n}.
Again, a universal quantifier must be inserted before induction.
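For reference, itfib can be declared as follows. This is a sketch consistent with the computations above; the book’s version appears in an earlier chapter:

```sml
(* Iterative Fibonacci: prev and curr hold two successive Fibonacci numbers *)
fun itfib (n, prev, curr) : int =
    if n = 1 then curr
    else itfib (n - 1, curr, prev + curr);
```

Note that itfib(1, F_k, F_{k+1}) returns F_{k+1}, and itfib(n + 1, F_k, F_{k+1}) calls itfib(n, F_{k+1}, F_k + F_{k+1}), exactly as the proof steps below require.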
Theorem 5 For every integer n ≥ 1, itfib(n, 0, 1) = F_n.

Proof Put k = 0 in the following formula, which is proved by induction on n:
∀k . itfib(n, F_k, F_{k+1}) = F_{k+n}
Since n ≥ 1, the base case is to prove this for n = 1:
∀k . itfib(1, F_k, F_{k+1}) = F_{k+1}
This is immediate by the definition of itfib.
For the induction step, the induction hypothesis is given above; we must show
∀k . itfib(n + 1, F_k, F_{k+1}) = F_{k+(n+1)}.
We prove this by simplifying the left side:
itfib(n + 1, F_k, F_{k+1})
= itfib(n, F_{k+1}, F_k + F_{k+1})   [itfib]
= itfib(n, F_{k+1}, F_{k+2})   [Fibonacci]
= F_{(k+1)+n}   [ind hyp]
= F_{k+(n+1)}   [arithmetic]
Powers. We now prove that power(x, k) = x^k for every real number x and integer k ≥ 1. Recall the definition of power (Section 2.14):
fun power (x, k) : real =
    if k = 1 then x
    else if k mod 2 = 0 then power (x*x, k div 2)
    else x * power (x*x, k div 2);
> val power = fn : real * int -> real
The proof will assume that ML’s real arithmetic is exact, ignoring roundoff errors. It is typical of program verification to ignore the limitations of physical hardware. To demonstrate that power is suitable for actual computers would require an error analysis as well, which would involve much more work.
We must check that power (x , k ) is defined for k ≥ 1. The case k = 1 is
obvious. If k ≥ 2 then we need to examine the recursive calls, which replace k
by k div 2. These terminate because 1 ≤ k div 2 < k .
Since x varies during the computation of power(x, k), the induction formula must have a quantifier:
∀x . power(x, k) = x^k
The base case is
∀x . power(x, 1) = x^1.
This holds by the definition of power, since power(x, 1) = x.
Exercise 6.4 Verify that introot computes integer square roots (Section 2.16).
Exercise 6.5 Recall sqroot of Section 2.17, which computes real square roots
by the Newton-Raphson method. Discuss the problems involved in verifying
this function.
Structural induction
Mathematical induction establishes φ(n) for all natural numbers n by
considering how a natural number is constructed. Although there are infinitely
many natural numbers, they are constructed in just two ways:
• 0 is a number.
• If k is a number then so is k + 1.
Strictly speaking, we should introduce the successor function suc and reformulate the above:
• If k is a number then so is suc(k).
Addition and other arithmetic functions are then defined recursively in terms of
0 and suc, which are essentially the constructors of an ML datatype. Structural
induction is a generalization of mathematical induction to datatypes such as lists
and trees.
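To make the analogy concrete, the construction can be mirrored by an illustrative datatype (the names Zero, Suc and add are ours, not the book’s):

```sml
(* Natural numbers built from the constructors Zero and Suc *)
datatype nat = Zero | Suc of nat;

(* Addition defined by structural recursion on the first argument *)
fun add (Zero, n)  = n
  | add (Suc k, n) = Suc (add (k, n));
```

Proofs about add by structural induction on nat are exactly mathematical induction in disguise.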
This theorem does not apply to infinite lists, for [1,1,1,. . . ] equals its own tail. The structural induction rules given here are sound for finite objects only. In domain theory, induction can be extended to infinite lists — but not for arbitrary formulæ! The restrictions are complicated; roughly speaking, the conclusion holds for infinite lists only if the induction formula is a conjunction of equations. So x :: xs ≠ xs cannot be proved for infinite lists.
Let us prove theorems about some of the list functions of Chapter 3. Each of
these functions terminates for all arguments because each recursive call involves
a shorter list.
The length of a list:
fun nlength [] = 0
| nlength (x ::xs) = 1 + nlength xs;
Length and append. Here is an obvious property about the length of the concatenation of two lists.
Theorem 8 For all lists xs and ys, nlength(xs@ys) = nlength xs+nlength ys.
Proof By structural induction on xs. We avoid renaming this variable; thus, the
formula above also serves as the induction hypothesis.
The base case is
nlength([] @ ys) = nlength [] + nlength ys.
It holds because nlength([] @ ys) = nlength ys = 0 + nlength ys = nlength [] + nlength ys.
For the induction step, assume the induction hypothesis and show, for all x
and xs, that
nlength((x :: xs) @ ys) = nlength(x :: xs) + nlength ys.
This holds because
nlength((x :: xs) @ ys)
= nlength(x :: (xs @ ys)) [@]
= 1 + nlength(xs @ ys) [nlength]
= 1 + (nlength xs + nlength ys) [ind hyp]
= (1 + nlength xs) + nlength ys [associativity]
= nlength(x :: xs) + nlength ys. [nlength]
We could have written 1 + nlength xs + nlength ys, omitting parentheses, instead of applying the associative law explicitly. □
The proof brings out the correspondence between inserting the list elements
and counting them. Induction on xs works because the base case and induction
step can be simplified using function definitions. Induction on ys leads nowhere:
try it.
Efficient list reversal. The function nrev is a mathematical definition of list reversal, while revAppend reverses lists efficiently. The proof that they are equivalent is similar to Theorem 4, the correctness of facti. In both proofs, the induction formula is universally quantified over an accumulating argument.
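The two functions, recalled here in sketch form (their originals are in Chapter 3):

```sml
(* Naive reversal: a clear, mathematical definition, but quadratic *)
fun nrev []      = []
  | nrev (x::xs) = nrev xs @ [x];

(* Efficient reversal: the second argument accumulates the result *)
fun revAppend ([], ys)    = ys
  | revAppend (x::xs, ys) = revAppend (xs, x::ys);
```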
Theorem 9 For every list xs, we have ∀ys . revAppend(xs, ys) = nrev(xs) @ ys.
Proof By structural induction on xs, taking the formula above as the induction
hypothesis. The base case is
∀ys . revAppend ([], ys) = nrev [] @ ys.
It holds because revAppend ([], ys) = ys = [] @ ys = nrev [] @ ys.
The induction step is to show, for arbitrary x and xs, the formula
∀ys . revAppend (x :: xs, ys) = nrev (x :: xs) @ ys.
Simplifying the right side of the equality yields
nrev (x :: xs) @ ys = (nrev (xs) @ [x ]) @ ys. [nrev ]
Exercise 6.8 Prove nrev (nrev xs) = xs for every list xs.
Exercise 6.9 Show that nlength xs = length xs for every list xs. (The func-
tion length was defined in Section 3.4.)
Binary trees admit structural induction. In most respects, their treatment resembles that of lists. Suppose φ(t) is a property of trees, where t has type τ tree. To prove φ(t) by structural induction, it suffices to prove two premises:
• The base case is φ(Lf).
• The induction step is to show that φ(t1) and φ(t2) imply φ(Br(x, t1, t2)) for all x of type τ and t1, t2 of type τ tree. There are two induction hypotheses: φ(t1) and φ(t2).
The rule can be portrayed thus:

                 [φ(t1), φ(t2)]
    φ(Lf)       φ(Br(x, t1, t2))
    ----------------------------
               φ(t)

proviso: x, t1 and t2 must not occur in other assumptions of φ(Br(x, t1, t2)).
This structural induction rule is sound because it covers all the ways of building
a tree. The base case establishes φ(Lf ). Applying the induction step once
establishes φ(Br (x , Lf , Lf )) for all x , covering all trees containing one Br
node. Applying the induction step twice establishes φ(t) where t is any tree
containing two Br nodes. Further applications of the induction step cover larger
trees.
We can also justify the rule by complete induction on the number of labels
in the tree, because every tree is finite and its subtrees are smaller than itself.
Structural induction is not sound in general for infinite trees.
We shall prove some facts about the following functions on binary trees, from
Section 4.10.
The number of labels in a tree:
fun size Lf = 0
  | size (Br (v, t1, t2)) = 1 + size t1 + size t2;
Reflection of a tree:
fun reflect Lf = Lf
  | reflect (Br (v, t1, t2)) = Br (v, reflect t2, reflect t1);
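Theorem 13 below also mentions depth. For completeness, here is a sketch of the datatype and the depth function along the lines of Section 4.10:

```sml
datatype 'a tree = Lf | Br of 'a * 'a tree * 'a tree;

(* Depth: the length of the longest path from the root to a leaf *)
fun depth Lf = 0
  | depth (Br (v, t1, t2)) = 1 + Int.max (depth t1, depth t2);
```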
Double reflection. We begin with an easy example: reflecting a tree twice yields
the original tree.
Theorem 11 For every binary tree t, reflect(reflect t) = t.

Proof By structural induction on t. The base case is
reflect(reflect Lf) = Lf,
which holds by the definition of reflect. In the induction step, the induction hypotheses are reflect(reflect t1) = t1 and reflect(reflect t2) = t2, and we have
reflect(reflect(Br(x, t1, t2)))
= reflect(Br(x, reflect t2, reflect t1))   [reflect]
= Br(x, reflect(reflect t1), reflect(reflect t2))   [reflect]
= Br(x, t1, t2).   [ind hyp]  □
Preorder and postorder. If the concepts of preorder and postorder are obscure to
you, then the following theorem may help. A key fact is Theorem 10, concerning
nrev and @, which we have recently proved.
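Sketches of the two traversal functions, as in Section 4.10 (the datatype is repeated so the fragment stands alone):

```sml
datatype 'a tree = Lf | Br of 'a * 'a tree * 'a tree;

(* Preorder visits the label first; postorder visits it last *)
fun preorder Lf = []
  | preorder (Br (v, t1, t2)) = [v] @ preorder t1 @ preorder t2;

fun postorder Lf = []
  | postorder (Br (v, t1, t2)) = postorder t1 @ postorder t2 @ [v];
```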
Theorem 12 For every binary tree t, postorder (reflect t) = nrev (preorder t).
Proof By structural induction on t. The base case is
postorder (reflect Lf ) = nrev (preorder Lf ).
This is routine; both sides are equal to [].
For the induction step we have the induction hypotheses
postorder (reflect t1 ) = nrev (preorder t1 )
postorder (reflect t2 ) = nrev (preorder t2 )
and must show
postorder (reflect(Br (x , t1 , t2 ))) = nrev (preorder (Br (x , t1 , t2 ))).
First, we simplify the right-hand side:
nrev (preorder (Br (x , t1 , t2 )))
= nrev ([x ] @ preorder t1 @ preorder t2 ) [preorder ]
= nrev (preorder t2 ) @ nrev (preorder t1 ) @ nrev [x ] [Theorem 10]
= nrev (preorder t2 ) @ nrev (preorder t1 ) @ [x ] [nrev ]
Some steps have been skipped. Theorem 10 has been applied twice, to both
occurrences of @, and nrev [x ] is simplified directly to [x ].
Now we simplify the left-hand side:
postorder (reflect(Br (x , t1 , t2 )))
= postorder (Br (x , reflect t2 , reflect t1 )) [reflect]
= postorder (reflect t2 ) @ postorder (reflect t1 ) @ [x ] [postorder ]
= nrev (preorder t2 ) @ nrev (preorder t1 ) @ [x ] [ind hyp]
Thus, both sides are equal. □
Count and depth. We now prove a law relating the number of labels in a binary
tree to its depth. The theorem is an inequality, reminding us that formal methods
involve more than mere equations.
Theorem 13 For every binary tree t, size t ≤ 2^(depth t) − 1.

Proof By structural induction on t. The base case is
size Lf ≤ 2^(depth Lf) − 1.
It holds because size Lf = 0 = 2^0 − 1 = 2^(depth Lf) − 1.
In the induction step the induction hypotheses are
size t1 ≤ 2^(depth t1) − 1    and    size t2 ≤ 2^(depth t2) − 1
and we must demonstrate
size(Br(x, t1, t2)) ≤ 2^(depth(Br(x, t1, t2))) − 1.
First, simplify the right-hand side:
2^(depth(Br(x, t1, t2))) − 1
= 2^(1 + max(depth t1, depth t2)) − 1   [depth]
= 2 × 2^(max(depth t1, depth t2)) − 1   [arithmetic]
Next, show that the left side is less than or equal to this:
size(Br(x, t1, t2)) = 1 + size t1 + size t2   [size]
≤ 1 + (2^(depth t1) − 1) + (2^(depth t2) − 1)   [ind hyp]
= 2^(depth t1) + 2^(depth t2) − 1   [arithmetic]
≤ 2 × 2^(max(depth t1, depth t2)) − 1   [arithmetic]
Here we have identified max, the mathematical function for the maximum of two integers, with the library function Int.max. □
Function values and functionals
The mathematics in this chapter is based on set theory. Since there is no set A that is
isomorphic to the set of functions A → A, we can make no sense of this declaration.
In domain theory, this declaration can be interpreted because there is a domain D iso-
morphic to D → D, which is the domain of continuous functions from D to D. Even
in domain theory, no induction rule useful for reasoning about D is known. This is
because the type definition involves recursion to the left of the function arrow (→). We
shall not consider datatypes involving functions.
The declaration of type term (Section 5.11) refers to lists:
datatype term = Var of string
| Fun of string * term list;
Type term denotes a set of finite terms and satisfies a structural induction rule. However,
the involvement of lists in the type complicates the theory and proofs (Paulson, 1995,
Section 4.4).
Exercise 6.10 Formalize and prove: No binary tree equals its own left subtree.
Exercise 6.13 Prove nrev (inorder (reflect t)) = inorder t for every binary
tree t.
Exercise 6.14 Define a function leaves to count the Lf nodes in a binary tree.
Then prove leaves t = size t + 1 for all t.
Exercise 6.15 Verify the function preord of Section 4.11. In other words,
prove preord (t, []) = preorder t for every binary tree t.
The extensionality law is valid because the only operation that can be performed on an ML function is application to an argument. Replacing f by g, if these functions are extensionally equal, does not affect the value of any application of f.¹
A different concept of equality, called intensional equality, regards two functions as equal only if their definitions are identical. Our three doubling functions are all distinct under intensional equality. This concept resembles function equality in Lisp, where a function value is a piece of Lisp code that can be taken apart.
There is no general, computable method of testing whether two functions are
extensionally equal. Therefore ML has no equality test for function values. Lisp
tests equality of functions by comparing their internal representations.
We now prove a few statements about function composition (the infix o) and
the functional map (of Section 5.7).
fun (f o g) x = f (g x);
fun map f [] = []
  | map f (x::xs) = (f x) :: map f xs;
The associativity of composition. Our first theorem is trivial. It asserts that func-
tion composition is associative.
(f ◦ g) ◦ h = f ◦ (g ◦ h).
By the extensionality law, it suffices to show, for all x,
((f ◦ g) ◦ h) x = (f ◦ (g ◦ h)) x.
¹ The extensionality law relies on our global assumption that functions terminate. ML distinguishes ⊥ (the undefined function value) from λx.⊥ (the function that never terminates when applied) although both functions yield ⊥ when applied to any argument.
((f ◦ g) ◦ h) x = (f ◦ g)(h x )
= f (g(h x ))
= f ((g ◦ h) x )
= (f ◦ (g ◦ h)) x .
As stated, the theorem holds only for functions of appropriate type; the equation must be properly typed. Typing restrictions apply to all our theorems and will not be mentioned again.
The list functional map. Functionals enjoy many laws. Here is a theorem about map and composition that can be used to avoid computing intermediate lists: for every list xs,
map (f ◦ g) xs = map f (map g xs).
Since xs is a list, we may use structural induction. This formula will also be our induction hypothesis. The base case is
map (f ◦ g) [] = map f (map g []),
which holds because both sides equal []. For the induction step, we assume the induction hypothesis and show (for arbitrary x and xs)
map (f ◦ g) (x :: xs) = map f (map g (x :: xs)).
Both sides reduce to f (g x) :: map f (map g xs), by the definitions of map and ◦ together with the induction hypothesis. Despite the presence of function values, the proof is a routine structural induction. □
The list functional foldl. The functional foldl applies a 2-argument function
over the elements of a list. Recall its definition from Section 5.10:
fun foldl f e [] = e
  | foldl f e (x::xs) = foldl f (f (x, e)) xs;
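Two quick illustrations of its behaviour (the bindings below are ours, and the definition is repeated so the fragment stands alone): folding with addition sums a list, while folding with the cons operator reverses one, because each element is tucked onto the front of the accumulator.

```sml
fun foldl f e []      = e
  | foldl f e (x::xs) = foldl f (f (x, e)) xs;

val total    = foldl op+ 0 [1, 2, 3, 4];   (* 10 *)
val reversed = foldl op:: [] [1, 2, 3];    (* [3, 2, 1] *)
```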
Exercise 6.16 Prove map f (xs @ ys) = (map f xs) @ (map f ys).
Exercise 6.20 Prove foldl f z (xs @ ys) = foldl f (foldl f z xs) ys.
Exercise 6.22 Let ⊕, e and F be as in the previous exercise. Define the function G by G(l, z) = F z l. Prove that for all ls, foldr G e ls = F e (map (F e) ls).
The list function nrev makes its recursive call on the tail of its argument.
This kind of recursion is called structural recursion by analogy with structural
induction. However, recursive functions can shorten the list in other ways. The
function maxl , when applied to m :: n :: ns, may call itself on m :: ns:
fun maxl [m] : int = m
| maxl (m::n::ns) = if m>n then maxl (m::ns)
else maxl (n::ns);
Quick sort and merge sort divide a list into two smaller lists and sort them recursively. Matrix transpose (Section 3.9) and Gaussian elimination make recursive calls on a smaller matrix obtained by deleting rows and columns.
Most functions on trees use structural recursion: their recursive calls involve a
node’s immediate subtrees. The function nnf , which converts a proposition into
negation normal form, is not structurally recursive. We shall prove theorems
about nnf in this section.
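The definition of nnf is not repeated in this section; a sketch consistent with the case analysis used below (the datatype prop is from Chapter 4) is:

```sml
datatype prop = Atom of string
              | Neg  of prop
              | Conj of prop * prop
              | Disj of prop * prop;

(* Convert a proposition into negation normal form *)
fun nnf (Atom a)           = Atom a
  | nnf (Neg (Atom a))     = Neg (Atom a)
  | nnf (Neg (Neg p))      = nnf p
  | nnf (Neg (Conj (p,q))) = nnf (Disj (Neg p, Neg q))
  | nnf (Neg (Disj (p,q))) = nnf (Conj (Neg p, Neg q))
  | nnf (Conj (p,q))       = Conj (nnf p, nnf q)
  | nnf (Disj (p,q))       = Disj (nnf p, nnf q);
```

Note the two Neg(Conj ...) and Neg(Disj ...) cases: they call nnf on propositions that are not immediate constituents of the argument, which is why termination needs the separate argument given below.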
Structural induction works best with functions that are structurally recursive.
With other functions, well-founded induction is often superior. Well-founded
induction is a powerful generalization of complete induction. Because the rule
is abstract and seldom required in full generality, our proofs will be done by a
special case: induction on size. For instance, the function nlength formalizes the
size of a list. In the induction step we have to prove φ(xs) under the induction
hypothesis
∀ys . nlength ys < nlength xs → φ(ys).
The mutually recursive functions nnfpos and nnfneg compute the same normal
form, but more efficiently:
fun nnfpos (Atom a) = Atom a
| nnfpos (Neg p) = nnfneg p
| nnfpos (Conj (p,q)) = Conj (nnfpos p, nnfpos q)
| nnfpos (Disj (p,q)) = Disj (nnfpos p, nnfpos q)
and nnfneg (Atom a) = Neg (Atom a)
| nnfneg (Neg p) = nnfpos p
| nnfneg (Conj (p,q)) = Disj (nnfneg p, nnfneg q)
| nnfneg (Disj (p,q)) = Conj (nnfneg p, nnfneg q);
We must verify that these functions terminate. The functions nnfpos and nnfneg are structurally recursive — recursion is always applied to an immediate constituent of the argument — and therefore terminate. For nnf, termination is not so obvious.
so obvious. Consider nnf (Neg(Conj (p, q))), which makes a recursive call on
a large expression. But this reduces in a few steps to
Disj (nnf (Neg p), nnf (Neg q)).
Thus the recursive calls after Neg(Conj (p, q)) involve the smaller propositions
Neg p and Neg q. The other complicated pattern, Neg(Disj (p, q)), behaves
similarly. In every case, recursive computations in nnf involve smaller and
smaller propositions, and therefore terminate.
Let us prove that nnfpos and nnf are equal. The termination argument suggests that theorems involving nnf p should be proved by induction on the size of p. Let us write nodes(p) for the number of Neg, Conj and Disj nodes in p. This function can easily be coded in ML.
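A sketch of such a coding (the prop datatype is repeated so the fragment is self-contained):

```sml
datatype prop = Atom of string
              | Neg  of prop
              | Conj of prop * prop
              | Disj of prop * prop;

(* nodes p: the number of Neg, Conj and Disj nodes in p *)
fun nodes (Atom a)      = 0
  | nodes (Neg p)       = 1 + nodes p
  | nodes (Conj (p, q)) = 1 + nodes p + nodes q
  | nodes (Disj (p, q)) = 1 + nodes p + nodes q;
```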
Theorem 17 For all propositions p, nnf p = nnfpos p.
Proof By mathematical induction on nodes(p), taking as induction hypotheses
nnf q = nnfpos q for all q such that nodes(q) < nodes(p). We consider seven
cases, corresponding to the definition of nnf .
If p = Atom a then nnf (Atom a) = Atom a = nnfpos(Atom a).
If p = Neg(Atom a) then
nnf (Neg(Atom a)) = Neg(Atom a) = nnfpos(Neg(Atom a)).
If p = Conj (r , q) then
nnf (Conj (r , q)) = Conj (nnf r , nnf q) [nnf ]
= Conj (nnfpos r , nnfpos q) [ind hyp]
= nnfpos(Conj (r , q)). [nnfpos]
The case p = Disj (r , q) is similar.
If p = Neg(Conj (r , q)) then
nnf (Neg(Conj (r , q))) = nnf (Disj (Neg r , Neg q)) [nnf ]
= Disj (nnf (Neg r ), nnf (Neg q)) [nnf ]
= Disj (nnfpos(Neg r ), nnfpos(Neg q)) [ind hyp]
= nnfneg(Conj (r , q)) [nnfneg]
= nnfpos(Neg(Conj (r , q))). [nnfpos]
We have induction hypotheses for Neg r and Neg q because they are smaller, as
measured by nodes, than Neg(Conj (r , q)).
The case p = Neg(Disj (r , q)) is similar.
If p = Neg(Neg r ) then
nnf (Neg(Neg r )) = nnf r [nnf ]
= nnfpos r [ind hyp]
= nnfneg(Neg r ) [nnfneg]
= nnfpos(Neg(Neg r )). [nnfpos]
An induction hypothesis applies since r contains fewer nodes than Neg(Neg r). □
This function is unusual in its case analysis and its recursive calls.
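The function in question is distrib, which distributes disjunction over conjunction. A sketch of its shape, consistent with the discussion below (the exact original is in Chapter 4):

```sml
datatype prop = Atom of string
              | Neg  of prop
              | Conj of prop * prop
              | Disj of prop * prop;

(* Distribute Disj over Conj; the last case applies only when
   neither argument is a Conj *)
fun distrib (p, Conj (q, r)) = Conj (distrib (p, q), distrib (p, r))
  | distrib (Conj (q, r), p) = Conj (distrib (q, p), distrib (r, p))
  | distrib (p, q)           = Disj (p, q);
```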
The first two cases overlap if both arguments in distrib(p, q) are conjunctions. Because ML tries the first case before the second, the second case cannot simply be taken as an equation. There seems to be no way of making the cases separate except to write nearly every combination of one Atom, Neg, Disj or Conj with another: at least 13 cases seem necessary. To avoid this, take the second case of distrib as a conditional equation; if p does not have the form Conj(p1, p2) then
Tr(distrib(p′, q′)) ↔ Tr(p′) ∨ Tr(q′)
for all p′ and q′ such that nodes(p′) + nodes(q′) < nodes(p) + nodes(q). The proof considers the same cases as in the definition of distrib.
If q = Conj(q′, r) then
The induction hypothesis has been applied twice using these facts:
We may now assume that q is not a Conj. If p = Conj(p′, r) then the conclusion follows as in the previous case. If neither p nor q is a Conj then
The proof exploits the distributive law of ∨ over ∧, as might be expected. The
overlapping cases in distrib do not complicate the proof at all. On the contrary,
they permit a concise definition of this function and a simple case analysis.
Exercise 6.23 State and justify a rule for structural induction on values of type
prop. To demonstrate it, prove the following formula by structural induction
on p:
nnf p = nnfpos p ∧ nnf (Neg p) = nnfneg p
Exercise 6.24 Define a predicate Isnnf on propositions such that Isnnf (p)
holds exactly when p is in negation normal form. Prove Isnnf (nnf p) for every
proposition p.
For instance, ‘less than’ (<) on the natural numbers is well-founded. ‘Less than’
on the integers is not well-founded: there exists the decreasing chain
· · · < −n < · · · < −2 < −1.
‘Less than’ on the rational numbers is not well-founded either; consider the decreasing chain
· · · < 1/n < · · · < 1/2 < 1/1.
Observe that we have to state the domain of the relation — the set of values it is
defined over — and not simply say that < is well-founded.
Another well-founded relation is the lexicographic ordering of pairs of natural numbers, defined by
(i′, j′) ≺lex (i, j) if and only if i′ < i ∨ (i′ = i ∧ j′ < j).
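As an ML predicate on pairs of integers, the ordering is simply (an illustrative sketch; the name lexLess is ours):

```sml
(* (i', j') strictly precedes (i, j) in the lexicographic ordering *)
fun lexLess ((i', j'), (i, j)) =
    i' < i orelse (i' = i andalso j' < j);
```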
To see that ≺lex is well-founded, suppose there is an infinite decreasing chain
· · · ≺lex (i_n, j_n) ≺lex · · · ≺lex (i_2, j_2) ≺lex (i_1, j_1).
If (i′, j′) ≺lex (i, j) then i′ ≤ i. Since < is well-founded on the natural numbers, the decreasing chain
· · · ≤ i_n ≤ · · · ≤ i_2 ≤ i_1
reaches some constant value i after, say, M steps: thus i_n = i for all n ≥ M. Now consider the strictly decreasing chain
· · · < j_{M+n} < · · · < j_{M+1} < j_M.
This must eventually terminate at some constant value j after, say, N steps: thus (i_n, j_n) = (i, j) for all n ≥ M + N. At this point the chain of pairs becomes constant, contradicting our assumption that it was decreasing under ≺lex.
Similar reasoning shows that lexicographic orderings for triples, quadruples
and so forth, are well-founded. The lexicographic ordering is not well-founded
for lists of natural numbers; it admits an infinite decreasing chain:
· · · ≺ [1, 1, . . . , 1, 2] ≺ · · · ≺ [1, 2] ≺ [2]
Another sort of well-founded relation is given by a measure function. If f is
a function into the natural numbers, then there is a well-founded relation ≺f
defined by
x ≺f y if and only if f (x ) < f (y).
Clearly, if there were an infinite decreasing chain
· · · ≺f x_n ≺f · · · ≺f x_2 ≺f x_1
then there would be an infinite decreasing chain
· · · < f(x_n) < · · · < f(x_2) < f(x_1)
in the natural numbers, which is impossible. Here f typically ‘measures’ the size of something.
of something. The well-founded relations ≺nlength and ≺size compare lists and
trees by size. Our proof about distrib used the measure nodes(p) + nodes(q)
on pairs (p, q) of propositions.
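Measure relations are easy to express in ML (a sketch; the names measureLess and shorterThan are ours):

```sml
(* The relation induced by a measure function f:
   x precedes y exactly when f x < f y *)
fun measureLess (f : 'a -> int) (x, y) = f x < f y;

(* Comparing lists by length gives a well-founded relation on lists *)
fun shorterThan (xs, ys) = measureLess List.length (xs, ys);
```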
The demonstration that ≺f is well-founded applies just as well if < is replaced by any other well-founded relation. For instance, f could return pairs of natural numbers to be compared by ≺lex. Similarly, the construction of ≺lex may be applied to any existing well-founded relations. There are several ways of constructing well-founded relations from others. Frequently we can show that a relation is well-founded by construction, without having to argue about decreasing chains.
    [∀y′ ≺ y . φ(y′)]
         φ(y)
    -----------------
         φ(x)

proviso: y must not occur in other assumptions of the premise.
The rule is sound by contradiction: if φ(x) is false for any x then we obtain an infinite decreasing chain in ≺. By the induction step we know that ∀y′ ≺ x . φ(y′) implies φ(x). If ¬φ(x) then ¬φ(y_1) for some y_1 ≺ x. Repeating this argument for y_1, we get ¬φ(y_2) for some y_2 ≺ y_1. We then get y_3 ≺ y_2, and so forth.²
Complete induction is an instance of this rule, where ≺ is the well-founded
relation < (on the natural numbers). Our other induction rules are instances of
well-founded induction for suitable choices of ≺.
Suppose that we are also given a well-founded relation ≺ such that g(x) ≺ x for all x such that p(x) = false. We then know that f1 and f2 terminate, and can prove theorems about them.
Theorem 19 Suppose ⊕ is an infix operator that is associative and has identity e; that is, for all x, y and z,
x ⊕ (y ⊕ z) = (x ⊕ y) ⊕ z
e ⊕ x = x = x ⊕ e.
Then, for all x,
∀y . f2(x, y) = f1(x) ⊕ y.
Exercise 6.26 Recall the function fst, such that fst(x , y) = x for all x and y.
Give an example of a well-founded relation that uses fst as a measure function.
Use a well-founded relation to show that ack (m, n) is defined for all natural
numbers m and n. Prove ack (m, n) > m + n by well-founded induction.
Exercise 6.30 Show that well-founded induction on the ‘tail of’ relation ≺L is
equivalent to structural induction for lists.
A trivial induction proves that gcd (m, n) = GCD(m, n) for all natural num-
bers m and n not both zero. We thereby learn that GCD(m, n) is computable.
A sorting function is not verified like this. It is not practical to define a mathematical function sorting and to prove tmergesort(xs) = sorting(xs). Sorting involves two different correctness properties, which can be considered separately:
1 The output is ordered.
2 The output is a rearrangement of the input.
Too often in program verification, some of the correctness properties are ignored. This is dangerous. A function can satisfy property 1 by returning the empty list, or property 2 by returning its input unchanged. Either property alone is useless.
The specification does not have to specify the output uniquely. We might specify that a compiler generates correct code, but should not specify the precise code to generate. This would be too complicated and would forbid code optimizations. We might specify that a database system answers queries correctly, but should not specify the precise storage layout.
The next sections will prove that tmergesort is correct, in the sense that it
returns an ordered rearrangement of its input. Let us recall some functions from
Chapter 3. Proving that they terminate is left as an exercise.
The list utilities take and drop:
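Sketches of take and drop, together with merge (needed by tmergesort below), along the lines of Chapter 3 (given here over int lists for simplicity; the book’s merge works on real lists):

```sml
(* take (xs, i): the first i elements of xs; drop (xs, i): all but the first i *)
fun take ([], i)    = []
  | take (x::xs, i) = if i > 0 then x :: take (xs, i - 1) else [];

fun drop ([], i)    = []
  | drop (x::xs, i) = if i > 0 then drop (xs, i - 1) else x :: xs;

(* Merge two ordered lists into one ordered list *)
fun merge ([], ys) = ys : int list
  | merge (xs, []) = xs
  | merge (x::xs, y::ys) =
      if x <= y then x :: merge (xs, y::ys)
      else y :: merge (x::xs, ys);
```

The library functions List.take and List.drop used by tmergesort behave in the same way on the arguments that arise here.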
fun tmergesort [] = []
| tmergesort [x ] = [x ]
| tmergesort xs =
let val k = length xs div 2
in merge (tmergesort (List.take(xs,k )),
tmergesort (List.drop(xs,k )))
end;
Since both arguments of merge are ordered, the conclusion follows by the previous theorem. □
Exercise 6.32 Write another predicate to define the notion of ordered list, and
prove that it is equivalent to ordered .
this approach accepts [1,1,1,1,2] as a valid sorting of [2,1,2]. Sets do not take
account of repeated elements. This specification is too abstract.
Multisets are a good way to specify sorting. A multiset is a collection of elements that takes account of their number but not of their order. The multisets ⟨1, 1, 2⟩ and ⟨1, 2, 1⟩ are equal; they differ from ⟨1, 2⟩. Multisets are often called bags, for reasons that should be obvious. Here are some ways of forming multisets:
• ∅, the empty bag, contains no elements.
• ⟨u⟩, the singleton bag, contains one occurrence of u.
• b1 ⊎ b2, the bag sum of b1 and b2, contains all elements in the bags b1 and b2 (accumulating repetitions of elements).
Rather than assume bags as primitive, let us represent them as functions into the
natural numbers. If b is a bag then b(x) is the number of occurrences of x in b.
Thus, for all x,

∅(x) = 0
⟨u⟩(x) = 1 if u = x, and 0 if u ≠ x
(b₁ ⊎ b₂)(x) = b₁(x) + b₂(x).

These laws are easily checked:

b₁ ⊎ b₂ = b₂ ⊎ b₁
(b₁ ⊎ b₂) ⊎ b₃ = b₁ ⊎ (b₂ ⊎ b₃)
∅ ⊎ b = b.
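These definitions can be modelled directly for experimentation, representing a bag as a function into the natural numbers just as the text does. The following Python sketch (illustrative only) spot-checks the three laws pointwise; since bag equality is extensional, a program can test it only at individual points:

```python
# A bag is modelled as a function from elements to occurrence counts.
def empty(x):
    """The empty bag: no occurrences of anything."""
    return 0

def singleton(u):
    """The singleton bag: one occurrence of u, none of anything else."""
    return lambda x: 1 if x == u else 0

def bag_sum(b1, b2):
    """Bag sum: occurrence counts add pointwise."""
    return lambda x: b1(x) + b2(x)

b1 = singleton(1)
b2 = bag_sum(singleton(1), singleton(2))   # the bag containing 1 and 2
b3 = singleton(2)

# Spot-check the three laws at some sample points.
for x in [0, 1, 2, 3]:
    assert bag_sum(b1, b2)(x) == bag_sum(b2, b1)(x)                            # commutative
    assert bag_sum(bag_sum(b1, b2), b3)(x) == bag_sum(b1, bag_sum(b2, b3))(x)  # associative
    assert bag_sum(empty, b1)(x) == b1(x)                                      # identity
```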
Let us define a function to convert lists into bags:

bag [] = ∅
bag (x :: xs) = ⟨x⟩ ⊎ bag xs.

The ‘rearrangement’ correctness property can finally be specified:

bag(tmergesort xs) = bag xs.
We should have to constrain k in the statement of the theorem and modify the
proof accordingly.
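The rearrangement specification can also be tested mechanically, though of course not proved that way. In Python, collections.Counter serves as a ready-made bag, and any candidate sorting function can be checked against it; here Python's built-in sorted stands in for the sort under test:

```python
from collections import Counter

def bag(xs):
    """Convert a list into a bag; Counter compares by counts, ignoring order."""
    return Counter(xs)

# Bags ignore order but not multiplicity:
assert bag([1, 2, 1]) == bag([2, 1, 1])
assert bag([1, 1, 1, 1, 2]) != bag([2, 1, 2])   # the set-based spec would wrongly accept this

# The rearrangement property, tested on a sample input:
xs = [3, 1, 4, 1, 5]
assert bag(sorted(xs)) == bag(xs)
```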
The key inductive step in proving Theorem 23, bag(merge(xs, ys)) = bag xs ⊎ bag ys,
runs as follows. If x ≤ y then

bag(merge(x :: xs′, y :: ys′))
= bag(x :: merge(xs′, y :: ys′))       [merge]
= ⟨x⟩ ⊎ bag(merge(xs′, y :: ys′))      [bag]
= ⟨x⟩ ⊎ bag xs′ ⊎ bag(y :: ys′)        [ind hyp]
= bag(x :: xs′) ⊎ bag(y :: ys′).       [bag]
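The content of Theorem 23 — that merging two lists preserves the combined bag of their elements — can likewise be spot-checked. In the Counter model of bags, the + operator is exactly bag sum:

```python
from collections import Counter

def merge(xs, ys):
    """Merge two ordered lists (models the ML merge)."""
    if not xs:
        return ys
    if not ys:
        return xs
    if xs[0] <= ys[0]:
        return [xs[0]] + merge(xs[1:], ys)
    return [ys[0]] + merge(xs, ys[1:])

# bag(merge(xs, ys)) = bag xs (+) bag ys, on sample ordered lists:
for xs, ys in [([1, 3, 5], [2, 3, 4]), ([], [7]), ([1, 1], [1])]:
    assert Counter(merge(xs, ys)) == Counter(xs) + Counter(ys)
```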
Finally, we prove that merge sort preserves the bag of elements given to it.
Theorem 24 For every list xs, bag(tmergesort xs) = bag xs.
Proof By induction on the length of xs. The only hard case is when length xs ≥ 2.
As in Theorem 21, the induction hypotheses apply to take(xs, k) and drop(xs, k).
Therefore

bag(tmergesort xs)
= bag(merge(tmergesort(take(xs, k)),
            tmergesort(drop(xs, k))))                   [tmergesort]
= bag(tmergesort(take(xs, k))) ⊎
  bag(tmergesort(drop(xs, k)))                          [Theorem 23]
= bag(take(xs, k)) ⊎ bag(drop(xs, k))                   [ind hyp]
= bag xs.                                               [Theorem 22] □
Exercise 6.33 Verify that ] is commutative and associative. (Hint: recall the
extensional equality of functions.)
6.12 The significance of verification 257
Exercise 6.34 Prove that insertion sort preserves the bag of elements it is
given. In particular, prove these facts:
Exercise 6.35 Modify merge sort to suppress repetitions: each input element
should appear exactly once in the output. Formalize this property and state the
theorems required to verify it.
Apart from these fundamental limitations, there is a practical one: formal proof
is tedious. Look back over the proofs in this chapter; usually they take great
pains to prove something elementary. Now consider verifying a compiler. The
specification will be gigantic, comprising the syntax and semantics of a pro-
gramming language along with the complete instruction set of the target ma-
chine. The compiler will be a large program. The proof should be divided into
parts, separately verifying the parser, the type checker, the intermediate code
generator and so forth. There may only be time to verify the most interesting
part of the program: say, the code generator. So the ‘verified’ compiler could
fail due to faulty parsing.
Let us not be too negative. Writing a formal specification reveals ambiguities
and inconsistencies in the design requirements. Since design errors are far more
serious than coding errors, writing a specification is valuable even if the code
will not be verified. Many companies go to great expense to produce specifica-
tions that are rigorous, if not strictly formal.
The painstaking work of verification yields rewards. Most programs are in-
correct, and an attempted proof often pinpoints the error. To see this, insert an
error into any program verified in this chapter, and work through the proof again.
The proof should fail, and the point of failure should indicate the precise
conditions under which the modified program goes wrong.
A correctness proof is a detailed explanation of how the program or system
works. If the proof is simple, we can go through it line by line, reading it as a
series of snapshots of the execution. The inductive step traces what happens at
a recursive function call. A large proof may consist of hundreds of theorems,
examining every component or subsystem.
Specification and verification yield a fuller knowledge of the program and its
task. This leads to increased confidence in the system. Formal proof does not
eliminate the need for systematic testing, especially for a safety-critical system.
Testing is the only way to investigate whether the computational model and the
formal specification accurately reflect the real world. However, while testing
can detect errors, it cannot guarantee success; nor does it provide insights into
how a program works.
Further reading. Bevier et al. (1989) have verified a tiny computer system
consisting of several levels, both software and hardware. Avra Cohn (1989a)
has verified some correctness properties of the Viper microprocessor. Taking her proofs
as an example, Cohn (1989b) discusses the fundamental limitations of verification.
Fitzgerald et al. (1995) report a study in which two teams independently develop a
trusted gateway. The control team uses conventional methods while the experimental
team augments these methods by writing a formal specification. An unusually large
study involves the AAMP 5 pipelined microprocessor. This is a commercial product de-
signed for use in avionics. It has been specified on two levels and some of its microcode
proved correct (Srivas and Miller, 1995). Both studies suggest that writing formal spec-
ifications — whether or not they are followed up by formal proofs — uncovers errors.
A major study by Susan Gerhart et al. (1994) investigated 12 cases involving the use
of formal methods. And in a famous philosophical monograph, Lakatos (1976) argues
that we can learn from partial, even faulty, proofs.