Undergraduate Topics in Computer Science (UTiCS) delivers high-quality instructional content for undergraduates studying in all areas of computing and information science. From core foundational and theoretical material to final-year topics and applications, UTiCS books take a fresh, concise, and modern approach and are ideal for self-study or for a one- or two-semester course. The texts are all authored by established experts in their fields, reviewed by an international advisory board, and contain numerous examples and problems. Many include fully worked solutions.
Introduction
to the Theory
of Programming
Languages
Gilles Dowek
Labo. d'Informatique
École polytechnique
route de Saclay
91128 Palaiseau
France
[email protected]
Jean-Jacques Lévy
Centre de Recherche Commun
INRIA-Microsoft Research
Parc Orsay Université
28 rue Jean Rostand
91893 Orsay Cedex
France
[email protected]
Series editor
Ian Mackie
Advisory board
Samson Abramsky, University of Oxford, Oxford, UK
Chris Hankin, Imperial College London, London, UK
Dexter Kozen, Cornell University, Ithaca, USA
Andrew Pitts, University of Cambridge, Cambridge, UK
Hanne Riis Nielson, Technical University of Denmark, Lyngby, Denmark
Steven Skiena, Stony Brook University, Stony Brook, USA
Iain Stewart, University of Durham, Durham, UK
The work was first published in 2006 by Les Éditions de l'École polytechnique with the following title: Introduction à la théorie des langages de programmation. The translator of the work is Maribel
Fernandez.
ISSN 1863-7310
ISBN 978-0-85729-075-5
e-ISBN 978-0-85729-076-2
DOI 10.1007/978-0-85729-076-2
Springer London Dordrecht Heidelberg New York
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
© Springer-Verlag London Limited 2011
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced,
stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licenses issued by the
Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to
the publishers.
The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a
specific statement, that such names are exempt from the relevant laws and regulations and therefore free
for general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the information
contained in this book and cannot accept any legal responsibility or liability for any errors or omissions
that may be made.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
The ultimate, definitive programming language has not been created yet, far from it.
Almost every day a new language is created, and new functionalities are added to
existing languages. Improvements in programming languages contribute to making
programs more reliable, shorten the development time, and make programs easier
to maintain. Improvements are also needed to satisfy new requirements, such as the
development of parallel, distributed or mobile programs.
The first thing that we need to describe, when defining a programming language,
is its syntax. Should we write x := 1 or x = 1? Should we put brackets after
an if or not? More generally, what are the strings of symbols that can be used as
a program? There is a useful tool for this: the notion of a formal grammar. Using
a grammar, we can describe the syntax of the language in a precise way, and this
makes it possible to build programs to check the syntactical correctness of programs.
But it is not sufficient to know what a syntactically correct program is in order to
know what is going to happen when we run the program. When defining a programming language, it is also necessary to describe its semantics, that is, the expected
behaviour of the program when it is executed. Two languages may have the same
syntax but different semantics.
The following is an example of what is meant (informally) by semantics. Function evaluation is often explained as follows. The result V of the evaluation of an
expression of the form f e1 ... en , where the symbol f is a function defined by
the expression f x1 ... xn = e, is obtained in the following way. First, the
arguments e1 , ..., en are evaluated, returning values W1 , ..., Wn . Then,
these values are associated to the variables x1 , ..., xn , and finally the expression e is evaluated. The value V is the result of this evaluation.
This explanation of the semantics of the language, expressed in a natural language (English), allows us to understand what happens when a program is executed,
but is it precise? Consider, for example, the program
f x y = x
g z = (n = n + z; n)
n = 0; print(f (g 2) (g 7))
Depending on the way we interpret the explanation given above, we can deduce that
the program will result in the value 2 or in the value 9. This is because the natural
language explanation does not indicate whether we have to evaluate g 2 before or
after g 7, and the order in which we evaluate these expressions is important in this
case. Instead, the explanation should have said: the arguments e1 , ..., en are
evaluated starting from e1 or else starting from en .
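To see the ambiguity concretely, here is a sketch of the same program in OCaml (an illustration, not from the book). OCaml leaves the evaluation order of function arguments unspecified, and its current compilers evaluate them right to left, which is exactly the choice the informal explanation above fails to make.

  (* A sketch of the program above in OCaml (not from the book).  The result
     depends on the order in which the two arguments of f are evaluated,
     which the language definition leaves unspecified. *)
  let n = ref 0
  let f x _y = x
  let g z = n := !n + z; !n
  let () = print_int (f (g 2) (g 7))
  (* right-to-left argument evaluation prints 9; left-to-right would print 2 *)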
If two different programmers read an ambiguous explanation, they might understand different things. Even worse, the designers of the compilers for the language
might choose different conventions. Then the same program will give different results depending on the compiler used.
It is well known that natural languages are too imprecise to express the syntax of a programming language; a formal language should be used instead. Similarly, natural
languages are too imprecise to express the semantics of a programming language,
and we need to use a formal language for this.
What is the semantics of a program? Let us take for instance a program p that
requests an integer, computes its square, and displays the result of this operation. To
describe the behaviour of this program, we need to describe a relation R between
the input value and the associated output.
The semantics of this program is, thus, a relation R between elements of the set E
of input values and elements of the set S of output values, that is, a subset of E × S.
The semantics of a program is then a binary relation. The semantics of a programming language is, in turn, a ternary relation: the program p with input value e
returns the output value s. We denote this relation by p, e ↪ s. The program p
and the input e are available before the execution of the program starts. Often, these
two elements are paired in a term p e, and the semantics of the language assigns a
value to this term. The semantics of the language is then a binary relation t ↪ s.
To express the semantics of a programming language we need a language that
can express relations.
When the semantics of a program is a functional relation, that is, for each input
value there is at most one output value, we say that the program is deterministic.
Video games are examples of non-deterministic programs, since some randomness
is necessary to make the game enjoyable. A language is deterministic if all the programs that can be written in the language are deterministic, or equivalently, if the
semantics is a functional relation. In this case, it is possible to define its semantics
using a language to define functions instead of a language to define relations.
Acknowledgements
The authors would like to thank Gérard Assayag, Antonio Bucciarelli, Roberto Di Cosmo, Xavier Leroy, Dave MacQueen, Luc Maranget, Michel Mauny, François Pottier, Didier Rémy, Alan Schmitt, Élodie-Jane Sims and Véronique Viguié Donzeau-Gouge.
Contents

4 Compilation
  4.1 An Interpreter Written in a Language Without Functions
  4.2 From Interpretation to Compilation
  4.3 An Abstract Machine for PCF
      4.3.1 The Environment
      4.3.2 Closures
      4.3.3 PCF Constructs
      4.3.4 Using de Bruijn Indices
      4.3.5 Small-Step Operational Semantics
  4.4 Compilation of PCF

6 Type Inference
  6.1 Inferring Monomorphic Types
      6.1.1 Assigning Types to Untyped Terms
      6.1.2 Hindley's Algorithm
      6.1.3 Hindley's Algorithm with Immediate Resolution
  6.2 Polymorphism
      6.2.1 PCF with Polymorphic Types
      6.2.2 The Algorithm of Damas and Milner

Epilogue
References
Index
Chapter 1
Let A be an arbitrary set. The inclusion relation ⊆ over the set ℘(A) of all the subsets of A is another example of a weakly complete ordering. The limit of an increasing sequence U0, U1, U2, ... is the set ⋃_{i∈ℕ} Ui. In addition, this relation has a least element ∅.
Let f be a function from E to E. The function f is increasing if

x ≤ y ⇒ f x ≤ f y
Proof First, since m is the smallest element in E, m ≤ f m. The function f is increasing, therefore f^i m ≤ f^(i+1) m. Since the sequence f^i m is increasing, it has a limit p. The sequence f^(i+1) m also has p as limit, thus p = lim_i (f (f^i m)) = f (lim_i (f^i m)) = f p. Moreover, p is the least fixed point, because if q is another fixed point, then m ≤ q and f^i m ≤ f^i q = q (since f is increasing). Hence p = lim_i (f^i m) ≤ q.
The second fixed point theorem states the existence of a fixed point for increasing
functions, even if they are not continuous, provided the ordering satisfies a stronger
property.
An ordering ≤ over a set E is strongly complete if every subset A of E has a least upper bound sup A.
The standard ordering relation ≤ over the interval [0, 1] is an example of a strongly complete ordering relation. The standard ordering over ℝ⁺ is not strongly complete because the set ℝ⁺ itself has no upper bound.
Let A be an arbitrary set. The inclusion relation ⊆ over the set ℘(A) of all the subsets of A is another example of a strongly complete ordering. The least upper bound of a set B is the set ⋃_{C∈B} C.
Exercise 1.1 Show that any strongly complete ordering is also weakly complete.
Is the ordering weakly complete? Is it strongly complete?
Note that if the ordering ≤ over the set E is strongly complete, then any subset A of E has a greatest lower bound inf A. Indeed, let A be a subset of E, let B be the set {y ∈ E | ∀x ∈ A, y ≤ x} of lower bounds of A and l the least upper bound of B. By definition, l is an upper bound of the set B

∀y ∈ B, y ≤ l

and it is the least one

(∀y ∈ B, y ≤ l′) ⇒ l ≤ l′

It is easy to show that l is the greatest lower bound of A. Indeed, if x is an element of A, it is an upper bound of B and since l is the least upper bound, l ≤ x. Thus, l is a lower bound of A. To show that it is the greatest one, it is sufficient to note that if m is another lower bound of A, it is an element of B and therefore m ≤ l.
The greatest lower bound of a set B of subsets of A is, of course, the set ⋂_{C∈B} C.
Second Fixed Point Theorem Let ≤ be a strongly complete ordering over a set E. Let f be a function from E to E. If f is increasing then p = inf {c | f c ≤ c} is the least fixed point of f.
n ∈ P
─────────
n + 2 ∈ P
In order to define a language inductively, we will sometimes use a notation borrowed from language theory, where, for example, the set of words of the form aⁿbcⁿ is defined as follows

m = b
  | a m c
To show that there is indeed a smallest subset of A that is closed under the functions f1, f2, ..., we define a function F from ℘(A) to ℘(A)

F C = {x ∈ A | ∃i ∃y1 ... ∃yni ∈ C, x = fi y1 ... yni}

A subset C of A is closed under the functions f1, f2, ... if and only if F C ⊆ C.
The function F is trivially increasing, that is, if C ⊆ C′ then F C ⊆ F C′. In addition, it is continuous, that is, if C0 ⊆ C1 ⊆ C2 ⊆ ... then F (⋃_j Cj) = ⋃_j (F Cj). Indeed, if an element x of A is in F (⋃_j Cj), then there exists a number i and elements y1, ..., yni in ⋃_j Cj such that x = fi y1 ... yni. Each of these elements is in one of the Cj. Since the sequence Cj is increasing, they are all in Ck, which is the largest of these sets. Therefore, the element x belongs to F Ck and also to ⋃_j (F Cj). Conversely, if x is in ⋃_j (F Cj), then it belongs to some F Ck, and there is therefore a number i and elements y1, ..., yni of Ck such that x = fi y1 ... yni. The elements y1, ..., yni are in ⋃_j Cj, and therefore x is in F (⋃_j Cj).
The set E is defined as the least fixed point of the function F. This is the smallest set that satisfies the property F E = E and, according to the second fixed point theorem, it is also the smallest set that satisfies the property F E ⊆ E. Thus, it is the smallest set that is closed under the functions f1, f2, ....
The set of even numbers is not the only subset of ℕ that contains 0 and is closed under the function n ↦ n + 2 (the set ℕ, for example, also satisfies these properties) but it is the smallest one. It can be defined as the intersection of all those sets. The second fixed point theorem allows us to generalise this observation and define E as the intersection of all the sets that are closed under the functions f1, f2, ....
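As an illustration (not from the book), the least fixed point of F can also be computed by iterating F from the empty set, as in the first fixed point theorem. The OCaml sketch below does this for the rules defining the even numbers; the cut-off at 10 is our own assumption, added only to keep the sets finite.

  (* A sketch (not from the book): the even numbers below 10 as the least
     fixed point of the increasing function F C = {0} ∪ {n + 2 | n ∈ C},
     computed by iterating F from the empty set. *)
  module IntSet = Set.Make (Int)

  let f c =
    IntSet.add 0 (IntSet.filter (fun n -> n < 10) (IntSet.map (fun n -> n + 2) c))

  (* iterate g from c until a fixed point is reached *)
  let rec lfp g c = let c' = g c in if IntSet.equal c c' then c else lfp g c'

  let () =
    IntSet.iter (fun n -> Printf.printf "%d " n) (lfp f IntSet.empty)
    (* prints 0 2 4 6 8 *)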
The first fixed point theorem shows that an element x is in E if and only if there is some number k such that x is in the set F^k ∅, that is, if there is a function fi such that x = fi y1 ... yni where y1, ..., yni are in F^(k-1) ∅. Iterating, that is, by induction on k, we can show that an element x of A is in E if and only if there exists a tree where the nodes are labelled by elements of A, the root is labelled by x, and if a node is labelled by c, then its children are labelled by d1, ..., dn such that for some function fi we have c = fi d1 ... dn. Such a tree is called a derivation of x. This notion of a derivation generalises the notion of proof in logic. We can then define the set E as the set of elements x of A for which there is a derivation.
We will use a specific notation for derivations. First, the root of the tree will be
written at the bottom, and the leaves at the top. Then, we will write a line over each
node in the tree and we will write its children over the line.
The number 8, for example, is in the set of even numbers, as the following derivation shows

───
 0
───
 2
───
 4
───
 6
───
 8
If we call P the set of even numbers, we can write the derivation as follows

─────
0 ∈ P
─────
2 ∈ P
─────
4 ∈ P
─────
6 ∈ P
─────
8 ∈ P
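As an aside, derivations can be represented directly as a data structure. Here is an OCaml sketch (not from the book); the names derivation, check, zero and plus_two are ours, and a rule is modelled as a partial function from the labels of the children to the label of the node.

  (* A sketch (not from the book): derivations as trees, and a checker that
     verifies that every node is justified by one of the given rules. *)
  type 'a derivation = Node of 'a * 'a derivation list

  let label (Node (x, _)) = x

  let rec check rules (Node (x, children)) =
    List.for_all (check rules) children
    && List.exists (fun r -> r (List.map label children) = Some x) rules

  (* the two rules defining the even numbers: 0 ∈ P, and n ∈ P implies n + 2 ∈ P *)
  let zero = function [] -> Some 0 | _ -> None
  let plus_two = function [n] -> Some (n + 2) | _ -> None

  (* the derivation of 8 shown above *)
  let d8 = Node (8, [ Node (6, [ Node (4, [ Node (2, [ Node (0, []) ]) ]) ]) ])
  let () = assert (check [ zero; plus_two ] d8)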
────── (if x R y)
x R* y

──────
x R* x

x R* y    y R* z
────────────────
x R* z
If we see R as a directed graph, then R* is the relation that links two nodes when there is a path from one to the other.
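As an illustration (not from the book), here is an OCaml sketch that computes R* on a finite set by saturating a list of pairs with the three rules above; rstar and step are our own names.

  (* A sketch (not from the book): the reflexive-transitive closure of a
     relation on a finite set, obtained by adding reflexive and transitive
     pairs until nothing new appears (a least fixed point once more). *)
  let rstar elements r =
    let step rel =
      let refl = List.map (fun x -> (x, x)) elements in
      let trans =
        List.concat_map
          (fun (x, y) ->
             List.filter_map (fun (y', z) -> if y = y' then Some (x, z) else None) rel)
          rel
      in
      List.sort_uniq compare (rel @ refl @ trans)
    in
    let rec loop rel = let rel' = step rel in if rel' = rel then rel else loop rel' in
    loop (List.sort_uniq compare r)

  (* (1, 4) is in R* because there is a path 1 - 2 - 4 in the graph of R *)
  let () = assert (List.mem (1, 4) (rstar [1; 2; 3; 4] [(1, 2); (2, 4); (3, 3)]))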
1.2 Languages
1.2.1 Languages Without Variables
Now that we have introduced inductive definitions, we will use this technique to
define the notion of a language. The notion of language that we will define does not
take into account superficial syntactic conventions, for instance, it does not matter
whether we write 3 + 4, +(3,4), or 3 4 +. This term will be represented in
an abstract way by a tree. Each node in the tree will be labelled by a symbol. The
number of children of a node depends on the node's label: 2 children if the label is +, 0 if it is 3 or 4, ....
A language is thus a set of symbols, each with an associated number called arity,
or simply number of arguments, of the symbol. The symbols without arguments are
called constants.
The set of terms of the language is the set of trees inductively defined by
– if f is a symbol with n arguments and t1, ..., tn are terms, then f(t1, ..., tn), that is, the tree that has a root labelled by f and subtrees t1, ..., tn, is a term.
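A sketch of this definition in OCaml (not from the book): a term is a tree of symbols, and a well-formedness check verifies that each symbol has as many children as its arity requires. The names term, arity and well_formed are ours.

  (* A sketch (not from the book): terms of a language without variables. *)
  type term = Term of string * term list

  (* the language of the example: + has 2 arguments, 3 and 4 are constants *)
  let arity = function "+" -> 2 | "3" | "4" -> 0 | s -> invalid_arg ("unknown symbol " ^ s)

  let rec well_formed (Term (f, args)) =
    List.length args = arity f && List.for_all well_formed args

  (* the term written 3 + 4, +(3,4) or 3 4 + in concrete syntax *)
  let t = Term ("+", [ Term ("3", []); Term ("4", []) ])
  let () = assert (well_formed t)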
1.2.2 Variables
Imagine that we want to design a language to define functions. One possibility would be to use constants sin, cos, ... and a symbol with two arguments ∘. We could, for instance, build the term sin ∘ (cos ∘ sin) in this language.
However, we know that, to specify functions, it is easier to use a notion invented by F. Viète (1540-1603): the notion of a variable. Thus, the function described above can be written sin (cos (sin x)).
Since the 1930s, we write this function x ↦ sin (cos (sin x)) or λx sin (cos (sin x)), using the symbol ↦ or λ to bind the variable x. By indicating explicitly which variables are bound, we can distinguish the arguments of the function from potential parameters, and we also fix the order of the arguments.
two constants sin and cos to represent the functions sine and cosine, a symbol α, called application, such that α(f,x) is the object obtained by applying the function f to the object x, and a symbol fun to build functions. This language includes then four symbols: the constants sin and cos, the symbol α with arity (0,0) and fun with arity (1); the set of terms is inductively defined by
We will adopt a simplified notation, writing t u for the term α(t,u) and fun x -> t for the term fun(x t).
For example, fun x -> sin (cos (sin x)) is a term in this language.
1.2.5 Substitution
The first operation that we need to define is substitution: indeed, the rle of variables
is not only to be bound but also to be substituted. For example, when we apply the
function fun x -> sin (cos (sin x)) to the term 2 * , at some point
we will need to substitute in the term sin (cos (sin x)) the variable x by
the term 2 * .
A substitution is simply a mapping from variables to terms, with a finite domain. In other words, a substitution is a finite set of pairs where the first element is a variable and the second a term, and such that each variable occurs at most once as first element in a pair. We can also define a substitution as an association list θ = t1/x1, ..., tn/xn.
When a substitution is applied to a term, each occurrence of a variable x1 ,
..., xn in the term is replaced by t1 , ..., tn , respectively.
Of course, this replacement only affects the free variables. For example, if we
substitute the variable x by the term 2 in the term x + 3, we should obtain the
term 2 + 3. However, if we substitute the variable x by the term 2 in the term
fun x -> x which represents the identity function we should obtain the term
fun x -> x and not fun x -> 2.
The first attempt to define the application of a substitution θ to a term is as follows:
θ xi = ti,
θ x = x if x is not in the domain of θ,
where we use the notation θ_{|V\{y1, ..., yk}} for the restriction of the substitution θ to the set V \ {y1, ..., yk}, that is, the substitution where we have omitted all the pairs where the first element is one of the variables y1, ..., yk.
This definition is problematic, because substitutions could capture variables. For
example, the term fun x -> (x + y) represents the function that adds y to its
argument. If we substitute y by 4 in this term, we obtain the term fun x ->
(x + 4) representing the function that adds 4 to its argument. If we substitute y
by z, we get the term fun x -> (x + z) representing the function that adds
z to its argument. But if we substitute y by x, we obtain the function fun x ->
(x + x) which doubles its argument, instead of the function that adds x to its
argument as expected. We can avoid this problem if we change the name of the
bound variable: bound variables are dummies, their name does not matter. In other
words, in the term fun x -> (x + y), we can replace the bound variable x by
any other variable, except of course y. Similarly, when we substitute in the term
u the variables x1 , ..., xn by the terms t1 , ..., tn , we can change the
names of the bound variables in u to make sure that their names do not occur in
x1 , ..., xn , or in the variables of t1 , ..., tn , or in the variables of u, to
avoid capture.
We start by defining an equivalence relation on terms, by induction on the height of terms. This relation is called alphabetic equivalence (or α-equivalence) and it corresponds to variable renaming.
x ∼ x,
f(y11 ... y1k1 t1, ..., yn1 ... ynkn tn) ∼ f(y′11 ... y′1k1 t′1, ..., y′n1 ... y′nkn t′n) if for all i, and for any sequence of fresh variables z1, ..., zki (that is, variables that do not occur in ti, t′i), we have (z1/yi1, ..., zki/yiki) ti ∼ (z1/y′i1, ..., zki/y′iki) t′i.
For example, the terms fun x -> x + z and fun y -> y + z are equivalent.
In the rest of the book we will work with terms modulo -equivalence, that is,
we will consider implicitly -equivalence classes of terms.
We can now define the operation of substitution by induction on the height of terms:
θ xi = ti,
θ x = x if x is not in the domain of θ,
θ f(y11 ... y1k1 u1, ..., yn1 ... ynkn un) = f(z11 ... z1k1 (θ, z11/y11, ..., z1k1/y1k1) u1, ..., zn1 ... znkn (θ, zn1/yn1, ..., znkn/ynkn) un) where z11, ..., z1k1, ..., zn1, ..., znkn are variables that do not occur in f(y11 ... y1k1 u1, ..., yn1 ... ynkn un) or in θ.
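A minimal OCaml sketch of this definition (not from the book), on a language whose only binder is fun x -> t. The function fresh plays the role of the variables z11, ..., znkn above; the generated names are assumed not to occur in the terms at hand.

  (* A sketch (not from the book): capture-avoiding substitution.  Every bound
     variable is renamed to a fresh name before substituting, which is always
     safe; fresh names are assumed not to clash with the user's variables. *)
  type term =
    | Var of string
    | Fun of string * term            (* fun x -> t *)
    | App of term * term

  let counter = ref 0
  let fresh () = incr counter; Printf.sprintf "_z%d" !counter

  (* subst s t applies the substitution s (an association list) to t *)
  let rec subst (s : (string * term) list) (t : term) : term =
    match t with
    | Var x -> (try List.assoc x s with Not_found -> Var x)
    | App (u, v) -> App (subst s u, subst s v)
    | Fun (x, u) ->
        let z = fresh () in
        Fun (z, subst ((x, Var z) :: s) u)

  (* substituting y by x in fun x -> y renames the bound x and avoids capture:
     the result is Fun ("_z1", Var "x"), not Fun ("x", Var "x") *)
  let example = subst [ ("y", Var "x") ] (Fun ("x", Var "y"))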
For example, if we substitute the variable y by the term 2 * x in the term fun
x -> x + y, we obtain the term fun z -> z + (2 * x). The choice of
if and only if

⟦p⟧ e = s

Of course, this simply moves the problem further down: we now need to define the function ⟦p⟧. For this, we will use two tools: explicit definitions of functions, and the fixed point theorem... but we will leave this for later.
For example, when we run the program fun x -> (x * x) + x with input
4, we obtain the result 20. But the term (fun x -> (x * x) + x) 4 does
not become 20 in one step, it is first transformed into (4 * 4) + 4, then 16 +
4, and finally 20.
The most important relation is not the one that links (fun x -> (x * x) + x) 4 with 20, but ⟶, which relates the term (fun x -> (x * x) + x) 4 with (4 * 4) + 4, then the term (4 * 4) + 4 with 16 + 4 and finally the term 16 + 4 with the term 20.
Once the relation ⟶ is given, ↪ can be derived from the reflexive-transitive closure ⟶* of the relation ⟶

t ↪ s   if and only if   t ⟶* s and s is irreducible

The fact that the term s is irreducible implies that there is nothing else to compute in s. For example, the term 20 is irreducible, but the term 16 + 4 is not. A term s is irreducible if there is no term s′ such that s ⟶ s′.
1.3.4 Non-termination
The execution of a program may produce a result, produce an error, or never terminate. Errors can be seen as particular kinds of results. For non-terminating programs,
there are several ways to define a semantics. A first alternative is to consider that if
the term t does not terminate, then there is no pair (t,s) in the relation ↪. Another alternative is to add a specific element ⊥ to the set of output values, and to state that the relation ↪ contains the pair (t,⊥) when the term t does not terminate.
The difference may seem superficial: it is easy to delete all the pairs of the form (t,⊥), or to add such a pair if there is no pair of the form (t,s) in the relation. However, readers who are familiar with computability problems will notice that, if we add the pairs (t,⊥), the relation ↪ is no longer recursively enumerable.
Chapter 2
of a syntax that is different from the syntax used for a standard argument such as an
integer or a string. In a functional language, functions are defined in the same way
whether they take numbers or functions as arguments.
For example, the composition of a function with itself is defined by fun f ->
fun x -> f (f x).
To highlight the fact that functions are not treated differently from other values, and can thus be used as arguments or returned as results by other functions, we say that functions are first-class objects.
2.1.4 No Assignments
In contrast with languages such as Caml or Java, the main feature of PCF is a total
lack of assignments. There is no construction of the form x := t or x = t to
assign a value to a variable. We will describe, in Chap. 7, an extension of PCF
with assignments.
It is often said that a function is recursive if the function is used in its own definition. This is absurd: in programming languages, as everywhere else, circular definitions are meaningless. We cannot define the function fact by fun n ->
ifz n then 1 else n * (fact (n - 1)). In general, we cannot define
a function f by a term G which contains an occurrence of f. However, we can define
the function f as the fixed point of the function fun f -> G. For example, we can
define the function fact as the fixed point of the function fun f -> fun n ->
ifz n then 1 else n * (f (n - 1)).
Does this function have a fixed point? and if it does, is this fixed point unique?
Otherwise, which fixed point are we referring to? We will leave these questions for
a moment, and simply state that a recursive function is defined as a fixed point.
In PCF, the symbol fix binds a variable in its argument, and the term
fix f G denotes the fixed point of the function fun f -> G. The function
fact can then be defined by fix f fun n -> ifz n then 1 else n *
(f (n - 1)).
Note, again, that using the symbol fix we can build the factorial function without necessarily giving it a name.
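As an aside, the same idea can be phrased in OCaml (a sketch, not from the book): a fixed-point operator fix is defined once, and fact is then obtained as the fixed point of a non-recursive function, exactly as with PCF's fix symbol.

  (* A sketch (not from the book): factorial as the fixed point of
     fun f -> fun n -> ifz n then 1 else n * (f (n - 1)). *)
  let rec fix g = fun x -> g (fix g) x
  let fact = fix (fun f -> fun n -> if n = 0 then 1 else n * f (n - 1))
  let () = assert (fact 3 = 6)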
2.1.6 Definitions
We could, in theory, omit definitions and replace everywhere the defined symbols by
their definitions. However, programs are simpler and clearer if we use definitions.
We add then a final construct in PCF, written let x = t in u. The occurrences of the variable x in u are bound, but those in t are not. The symbol let is a
binary operator that binds a variable in its second argument.
t = x
  | fun x -> t
  | t t
  | n
  | t + t | t - t | t * t | t / t
  | ifz t then t else t
  | fix x t
  | let x = t in t
Despite its small size, PCF is Turing complete, that is, all computable functions
can be programmed in PCF.
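For later reference, here is one possible OCaml datatype for the abstract syntax of PCF, following the grammar above (a sketch, not from the book; the constructor names are ours).

  (* A sketch (not from the book): abstract syntax of PCF. *)
  type term =
    | Var of string
    | Fun of string * term                    (* fun x -> t *)
    | App of term * term                      (* t t *)
    | Num of int                              (* n *)
    | Op of string * term * term              (* t + t, t - t, t * t, t / t *)
    | Ifz of term * term * term               (* ifz t then t else t *)
    | Fix of string * term                    (* fix x t *)
    | Let of string * term * term             (* let x = t in t *)

  (* the factorial function written as a PCF term *)
  let fact =
    Fix ("f", Fun ("n",
      Ifz (Var "n", Num 1,
           Op ("*", Var "n", App (Var "f", Op ("-", Var "n", Num 1))))))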
Exercise 2.1 Write a PCF program that takes two natural numbers n and p as inputs and returns n^p.
Exercise 2.2 Write a PCF program that takes a natural number n as input and returns the number 1 if the input is a prime number, and 0 otherwise.
Exercise 2.3 (Polynomials in PCF) Write a PCF program that takes a natural number q as input, and returns the greatest natural number u such that u * (u + 1) / 2 ≤ q.
Cantor's function K is a function from ℕ² to ℕ defined by fun n -> fun p -> (n + p) * (n + p + 1) / 2 + n. Let K′ be the function from ℕ to ℕ² defined by fun q -> (q - (u * (u + 1) / 2), u - q + u * (u + 1) / 2) where u is the greatest natural number such that u * (u + 1) / 2 ≤ q.
Show that K ∘ K′ = id. Let n and p be two natural numbers, show that the greatest natural number u such that u * (u + 1) / 2 ≤ (n + p) * (n + p + 1) / 2 + n is n + p. Then deduce that K′ ∘ K = id. From this fact, deduce that K is a bijection from ℕ² to ℕ.
Let L be the function fun n -> fun p -> (K n p) + 1. A polynomial with integer coefficients a0 + a1 X + ... + ai X^i + ... + an X^n can be represented by the integer L a0 (L a1 (L a2 ... (L an 0) ...)).
Write a PCF program that takes two natural numbers as input and returns the value of the polynomial represented by the first number applied to the second.
goes well. The first step in this simplification process is parameter passing, that
is, the replacement of the formal argument x by the actual argument 3. The initial
term becomes, after a first small-step transformation, the term 2 * 3. In the second
step, the term 2 * 3 is evaluated, resulting in the number 6. The first small step,
parameter passing, can be performed each time we have a term of the form (fun
x -> t) u where a function fun x -> t is applied to an argument u. As a
consequence, we define the following rule, called the β-reduction rule

(fun x -> t) u ⊳ (u/x)t

The relation t ⊳ u should be read "t reduces (or rewrites) to u". The second step mentioned above can be generalised as follows

p ⊗ q ⊳ n    (if p ⊗ q = n)

where ⊗ is any of the four arithmetic operators included in PCF. We add similar rules for conditionals

ifz 0 then t else u ⊳ t
ifz n then t else u ⊳ u    (if n is a number different from 0)

a rule for fixed points

fix x t ⊳ (fix x t/x)t

and a rule for let

let x = t in u ⊳ (t/x)u

A redex is a term t that can be reduced. In other words, a term t is a redex if there exists a term u such that t ⊳ u.
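A minimal OCaml sketch of these rules (not from the book), reusing the term datatype sketched after the PCF grammar above. The function subst, passed as an argument and assumed given, is the capture-avoiding substitution of Sect. 1.2.5: subst u x t computes (u/x)t.

  (* A sketch (not from the book): reduction at the root of a term. *)
  let eval_op op p q =
    match op with
    | "+" -> p + q | "-" -> p - q | "*" -> p * q | "/" -> p / q
    | _ -> invalid_arg "eval_op"

  let reduce_root (subst : term -> string -> term -> term) (t : term) : term option =
    match t with
    | App (Fun (x, body), u) -> Some (subst u x body)            (* beta *)
    | Op (op, Num p, Num q) -> Some (Num (eval_op op p q))
    | Ifz (Num 0, u, _) -> Some u
    | Ifz (Num _, _, v) -> Some v
    | Fix (x, body) as f -> Some (subst f x body)
    | Let (x, u, v) -> Some (subst u x v)
    | _ -> None                                                  (* t is not a redex *)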
2.2.2 Numbers
It could be said, quite rightly, that the rule p ⊗ q ⊳ n (if p ⊗ q = n), of which 2 * 3 ⊳ 6 is an instance, does not really explain the semantics of the arithmetic operators, since it just replaces the multiplication in PCF by that of Mathematics. This choice is however motivated by the fact that we are not really interested in the semantics of arithmetic operators; instead, our goal is to highlight the semantics of the other constructs in the language.
To define the semantics of the arithmetic operators in PCF without referring to the mathematical operators, we should consider a variant of PCF without numeric constants, where we introduce just one constant for the number 0 and a symbol S (successor) with one argument. The number 3, for instance, is represented by the term S(S(S(0))). We then add small-step rules

0 + u ⊳ u
S(t) + u ⊳ S(t + u)
0 - u ⊳ 0
t - 0 ⊳ t
S(t) - S(u) ⊳ t - u
0 * u ⊳ 0
S(t) * u ⊳ t * u + u
t / S(u) ⊳ ifz t - u then 0 else S((t - S(u)) / S(u))
Note that, to be precise, we should add a rule for division by 0, which should raise
an exception: error.
Exercise 2.4 (Church numerals) Instead of introducing the symbols 0 and S, we
can represent the number n by the term fun z -> fun s -> s (s (...
(s z)...)) rather than S(S(...(0)...)). Show that addition and multiplication can be programmed on these representations. Show that the function that
checks whether a number is 0 can also be programmed.
Exercise 2.5 (Position numerals) It could be said that the representations of numbers using the symbols 0 and S, or using Church numerals, are not efficient, since the size of the term representing a number grows linearly with the number (as in the representation in unary notation, where to write the number n we need n symbols) and not logarithmically, as is the case with the usual position-based notation. An alternative could be to use a symbol z for the number 0 and two functions O and I to represent the functions n ↦ 2 * n and n ↦ 2 * n + 1. The number 26 would then be represented by the term O(I(O(I(I(z))))), and reversing it we obtain IIOIO, the binary representation of this number.
Write a small-step operational semantics for the arithmetic operators in this language.
2.2.3 A Congruence
Using the rules of the small-step semantics we obtain

(fun x -> 2 * x) 3 ⊳ 2 * 3 ⊳ 6

Thus, denoting by ⊳* the reflexive-transitive closure of ⊳, we can write (fun x -> 2 * x) 3 ⊳* 6.
However, with this definition, the term (2 + 3) + 4 does not reduce to the term 9 according to ⊳*. Indeed, to reduce a term of the form t + u the terms t and u should be numeric constants, but our first term 2 + 3 is a sum, not a constant. The first step should then be the evaluation of 2 + 3, which produces the number 5. Then, a second step reduces 5 + 4 to 9. The problem is that, with our definition, the term 2 + 3 reduces to 5, but (2 + 3) + 4 does not reduce to 5 + 4.
We need to define another relation, where rules can be applied to any subterm of a term to be reduced. Let us define inductively the relation ⟶ as follows

t ⟶ u   if t ⊳ u
t ⟶ u
──────────
t v ⟶ u v

t ⟶ u
──────────
v t ⟶ v u

t ⟶ u
────────────────────────
fun x -> t ⟶ fun x -> u

t ⟶ u
──────────────
t + v ⟶ u + v

...

It is possible to show that a term is a redex with respect to the relation ⟶ if and only if one of its subterms is a redex with respect to ⊳.
2.2.4 An Example
To illustrate PCF's small-step semantic rules, let us compute the factorial of 3.

(fix f fun n -> ifz n then 1 else n * (f (n - 1))) 3
⟶ (fun n -> ifz n then 1 else n * ((fix f fun n -> ifz n then 1 else n * (f (n - 1))) (n - 1))) 3
⟶ ifz 3 then 1 else 3 * ((fix f fun n -> ifz n then 1 else n * (f (n - 1))) (3 - 1))
⟶ 3 * ((fix f fun n -> ifz n then 1 else n * (f (n - 1))) (3 - 1))
⟶ 3 * ((fix f fun n -> ifz n then 1 else n * (f (n - 1))) 2)
⟶ 3 * ((fun n -> ifz n then 1 else n * ((fix f fun n -> ifz n then 1 else n * (f (n - 1))) (n - 1))) 2)
⟶ 3 * (ifz 2 then 1 else 2 * ((fix f fun n -> ifz n then 1 else n * (f (n - 1))) (2 - 1)))
⟶ 3 * (2 * ((fix f fun n -> ifz n then 1 else n * (f (n - 1))) (2 - 1)))
⟶ 3 * (2 * ((fix f fun n -> ifz n then 1 else n * (f (n - 1))) 1))
⟶ 3 * (2 * ((fun n -> ifz n then 1 else n * ((fix f fun n -> ifz n then 1 else n * (f (n - 1))) (n - 1))) 1))
⟶ 3 * (2 * (ifz 1 then 1 else 1 * ((fix f fun n -> ifz n then 1 else n * (f (n - 1))) (1 - 1))))
⟶ 3 * (2 * (1 * ((fix f fun n -> ifz n then 1 else n * (f (n - 1))) (1 - 1))))
(fun x -> x)), and ifz V1 then V2 else V3 where V1 , V2 and V3 are
irreducible and closed and V1 is not a number (for example, ifz (fun x ->
x) then 1 else 2).
Exercise 2.7 Which are the values associated to the terms
(fun x -> fun x -> x) 2 3
and
(fun x -> fun y -> ((fun x -> (x + y)) x)) 5 4
according to the small-step operational semantics of PCF?
Exercise 2.8 (Static binding) Does the small-step operational semantics of PCF
associate the value 10 or the value 11 to the term
let x = 4 in let f = fun y -> y + x
in let x = 5 in f 6?
The first versions of the language Lisp produced the value 11 instead of 10 for
this term. In this case, we say that the binding is dynamic.
2.2.6 Non-termination
It is easy to see that the relation ↪ is not total, that is, there are terms t for which there is no term u such that t ↪ u. For example, the term b = fix x x reduces to itself, and only to itself. It does not reduce to any irreducible term.
Exercise 2.9 Let b1 = (fix f (fun x -> (f x))) 0. Show all the terms
obtained by reducing this term. Does the computation produce a result in this case?
Exercise 2.10 (Curry's fixed point operator) Let t be a term and u be the term
(fun y -> (t (y y)))(fun y -> (t (y y))). Show that u reduces
to t u.
Let t be a term and v be the term (fun y -> ((fun x -> t)
(y y)))(fun y -> ((fun x -> t) (y y))). Show that v reduces to
(v/x)t.
Thus, we can deduce that the symbol fix is superfluous in PCF. However, it is
not going to be superfluous later when we add types to PCF.
Write a term u without using the symbol fix and equivalent to b = fix x x.
Describe the terms that can be obtained by reduction. Does the computation produce
a result in this case?
2.2.7 Confluence
Is it possible for a closed term to produce several results? And, in general, can a
term reduce to several different irreducible terms? The answer to these questions
is negative. In fact, every PCF program is deterministic, but this is not a trivial
property. Let us see why.
The term (3 + 4) + (5 + 6) has two subterms which are both redexes.
We could then start by reducing 3 + 4 to 7 or 5 + 6 to 11. Indeed, the term
(3 + 4) + (5 + 6) reduces to both 7 + (5 + 6) and (3 + 4) + 11.
Fortunately, neither of these terms is irreducible, and if we continue the computation
we reach in both cases the term 18.
To prove that any term can be reduced to at most one irreducible term we need to
prove that if two computations originating in the same term produce different terms,
then they will eventually reach the same irreducible term.
This property is a consequence of another property of the relation ⟶: confluence.
A relation R is confluent if each time we have a R* b1 and a R* b2, there exists some c such that b1 R* c and b2 R* c.
It is not difficult to show that confluence implies that each term has at most one irreducible result. If the term t can be reduced to two irreducible terms u1 and u2, then we have t ⟶* u1 and t ⟶* u2. Since ⟶ is confluent, there exists a term v such that u1 ⟶* v and u2 ⟶* v. Since u1 is irreducible, the only term v such that u1 ⟶* v is u1 itself. Therefore, u1 = v and similarly u2 = v. We conclude that u1 = u2. In other words, t reduces to at most one irreducible term.
We will not give here the proof of confluence for the relation ⟶. The idea is that
when a term t contains two redexes r1 and r2 , and t1 is obtained by reducing r1
and t2 is obtained by reducing r2 , then we can find the residuals of r2 in t1 and
reduce them. Similarly, we can reduce the residuals of r1 in t2 , obtaining the same
term. For example, by reducing 5 + 6 in 7 + (5 + 6) and reducing 3 + 4 in
(3 + 4) + 11, we obtain the same term: 7 + 11.
other terms). By reducing always the innermost redex, we can build an infinite reduction sequence C b1 ⟶ C b2 ⟶ C b1 ⟶ ..., whereas reducing the outermost
This example may seem an exception, because it contains a function C that does
not use its argument; but note that the ifz construct is similar, and in the example
of the factorial function, when computing the factorial of 3 for instance, we can
observe the same behaviour: The term ifz 0 then 1 else 0 * ((fix f
fun n -> ifz n then 1 else n * (f (n - 1))) (0 - 1)) has
several redexes. Outermost reduction produces the result 1 (the other redexes disappear), whereas reducing the redex fix f fun n -> ifz n then 1 else
n * (f (n - 1)) we get an infinite reduction sequence. In other words, the
term fact 3 can be reduced to 6, but it can also generate reductions that go on
forever.
Both C b1 and fact 3 produce a unique result, but not all reduction sequences
reach a result.
Since the term C b1 has the value 0 according to the PCF semantics, an evaluator, that is, a program that takes as input a PCF term and returns its value, should
produce the result 0 when computing C b1 . Let us try to evaluate this term using
some current compilers. In Caml, the program
let rec f x = f x in let g x = 0 in g (f 0)
does not terminate. In Java, we have the same problem with the program
class Omega {
  static int f (int x) {return f(x);}
  static int g (int x) {return 0;}
  static public void main (String [] args) {
    System.out.println(g(f(0)));}}
Only a small number of compilers, using call by name or lazy evaluation, such as
Haskell, Lazy-ML or Gaml, produce a terminating program for this term.
This is because the small-step semantics of PCF does not correspond to the semantics of Caml or Java. In fact, it is too general and when a term has several
redexes it does not specify which one should be reduced first. By default, it imposes
termination of all programs that somehow can produce a result. An ingredient is
missing in this semantic definition: the notion of a strategy, that specifies the order
of reduction of redexes.
A strategy is a partial function that associates to each term in its domain one of its redex occurrences. Given a strategy s, we can define another semantics, replacing the relation ⟶ by a new relation ⟶_s such that t ⟶_s u if s t is defined and u is obtained by reducing the redex s t in t. Then, we define the relation ⟶_s* as the reflexive-transitive closure of ⟶_s, and the relation ↪_s as before.
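As an illustration (not from the book), here is one possible strategy written in OCaml, reusing the term datatype and the reduce_root function from the previous sketches: it chooses the leftmost outermost redex, does not reduce under fun, and only reduces the condition of an ifz; None means the strategy is undefined on the term.

  (* One possible strategy (a sketch, not from the book). *)
  let rec step subst t =
    match reduce_root subst t with
    | Some t' -> Some t'                        (* reduce at the root if possible *)
    | None ->
        (match t with
         | App (u, v) ->
             (match step subst u with
              | Some u' -> Some (App (u', v))
              | None ->
                  (match step subst v with
                   | Some v' -> Some (App (u, v'))
                   | None -> None))
         | Op (o, u, v) ->
             (match step subst u with
              | Some u' -> Some (Op (o, u', v))
              | None ->
                  (match step subst v with
                   | Some v' -> Some (Op (o, u, v'))
                   | None -> None))
         | Ifz (c, u, v) ->
             (match step subst c with
              | Some c' -> Some (Ifz (c', u, v))
              | None -> None)
         | _ -> None)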
Instead of defining a strategy, an alternative would be to weaken the reduction
rules, in particular the congruence rules, so that only some specific reductions can
be performed.
The big-step operational semantics of a programming language provides an inductive definition of the relation ↪, without first defining ⊳ and ⟶.
fun x -> t ↪ fun x -> t

t ↪ fun x -> t′    (u/x)t′ ↪ V
───────────────────────────── (call by name)
t u ↪ V

t ↪ fun x -> t′    u ↪ W    (W/x)t′ ↪ V
─────────────────────────────────────── (call by value)
t u ↪ V

n ↪ n

t ↪ p    u ↪ q
────────────── (if p ⊗ q = n)
t ⊗ u ↪ n

t ↪ 0    u ↪ V
───────────────────────
ifz t then u else v ↪ V

t ↪ n    v ↪ V
─────────────────────── (if n is a constant ≠ 0)
ifz t then u else v ↪ V
(fix x t/x)t ↪ V
────────────────
fix x t ↪ V

t ↪ W    (W/x)u ↪ V
────────────────────
let x = t in u ↪ V
Notice that, even under call by value, we keep the rules for the ifz

t ↪ 0    u ↪ V
───────────────────────
ifz t then u else v ↪ V

t ↪ n    v ↪ V
─────────────────────── (if n is a constant ≠ 0)
ifz t then u else v ↪ V
that is, we do not evaluate the second and third arguments of an ifz until they are
needed.
Note also that, even under call by value, we keep the rule

(fix x t/x)t ↪ V
────────────────
fix x t ↪ V

We must resist the temptation to evaluate the term fix x t to a value W before substituting it in t, because the rule

fix x t ↪ W    (W/x)t ↪ V
──────────────────────────
fix x t ↪ V

requires, in order to evaluate fix x t, to start by evaluating fix x t, which would create a loop, and the term fact 3 would never produce a value: its evaluation would give rise to an infinite computation.
Note finally that other rule combinations are possible. For example, some variants
of the call by name semantics use call by value in the let rule.
Exercise 2.13 Which values do we obtain under big-step semantics for the terms
(fun x -> fun x -> x) 2 3
and
(fun x -> fun y -> ((fun x -> (x + y)) x)) 5 4?
Compare your answer with that of Exercise 2.7.
Exercise 2.14 Does the big-step semantics associate the value 10 or the value 11
to the term
let x = 4 in let f = fun y -> y + x
in let x = 5 in f 6?
Compare your answer with that of Exercise 2.8.
Chapter 3
closure, consisting of a term that must be of the form fun x -> t and an environment e. We will write such values as follows: ⟨x, t, e⟩. Values are no longer a subset of terms, and we will have to define a language of values independently from the language of terms.
As a consequence, we will need to rewrite the rules for the call by name big-step operational semantics of PCF, in order to consider a relation of the form e ⊢ t ↪ V, read "t is interpreted as V in e", where e is an environment, t a term and V a value. When the environment e is empty, this relation will be written t ↪ V. The rules that extend the environment are the application rule, which adds a pair consisting of a variable x and a thunk ⟨u, e⟩, the let rule, which adds a pair consisting of the variable x and the thunk ⟨t, e⟩, and the fix rule, which adds a pair consisting of the variable x and the thunk ⟨fix x t, e⟩. In the latter rule, the term t is duplicated: one of the copies is interpreted and the other is kept in the environment for any recursive calls arising from the interpretation of the first one.
e′ ⊢ t ↪ V
────────── (if e contains x = ⟨t, e′⟩)
e ⊢ x ↪ V

e ⊢ t ↪ ⟨x, t′, e′⟩    (e′, x = ⟨u, e⟩) ⊢ t′ ↪ V
────────────────────────────────────────────────
e ⊢ t u ↪ V

e ⊢ fun x -> t ↪ ⟨x, t, e⟩

e ⊢ n ↪ n

e ⊢ t ↪ p    e ⊢ u ↪ q
────────────────────── (if p ⊗ q = n)
e ⊢ t ⊗ u ↪ n

e ⊢ t ↪ 0    e ⊢ u ↪ V
──────────────────────────
e ⊢ ifz t then u else v ↪ V

e ⊢ t ↪ n    e ⊢ v ↪ V
────────────────────────── (if n is a number ≠ 0)
e ⊢ ifz t then u else v ↪ V

(e, x = ⟨fix x t, e⟩) ⊢ t ↪ V
─────────────────────────────
e ⊢ fix x t ↪ V

(e, x = ⟨t, e⟩) ⊢ u ↪ V
────────────────────────
e ⊢ let x = t in u ↪ V
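Before turning to the exercises, here is a sketch (not from the book) of the data structures these rules manipulate, in OCaml and reusing the term datatype sketched in Chap. 2; the names Vnum, Closure and Thunk are ours.

  (* A sketch (not from the book): values, thunks and environments for the
     call by name rules above. *)
  type value =
    | Vnum of int                                  (* the value of a number *)
    | Closure of string * term * env               (* the closure <x, t, e> *)
  and thunk = Thunk of term * env                  (* the thunk <t, e> *)
  and env = (string * thunk) list                  (* e, x1 = <t1, e1>, ... *)

  let empty_env : env = []
  let extend (e : env) (x : string) (th : thunk) : env = (x, th) :: e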
Exercise 3.1 Write a call by name interpreter for PCF.
Exercise 3.2 Which values will be obtained for the following terms according to
the interpretation rules given above for PCF?
(fun x -> fun x -> x) 2 3
and
(fun x -> fun y -> ((fun x -> (x + y)) x)) 5 4
Compare with Exercises 2.7 and 2.13.
Exercise 3.3 Will the interpretation rules for PCF compute the value 10 or the
value 11 for the term
let x = 4 in let f = fun y -> y + x
in let x = 5 in f 6?
Compare with Exercises 2.8 and 2.14.
e ⊢ t ↪ ⟨x, t′, e′⟩    e ⊢ u ↪ W    (e′, x = W) ⊢ t′ ↪ V
─────────────────────────────────────────────────────────
e ⊢ t u ↪ V
e ⊢ t ↪ 0    e ⊢ u ↪ V
──────────────────────────
e ⊢ ifz t then u else v ↪ V

e ⊢ t ↪ n    e ⊢ v ↪ V
────────────────────────── (if n is a number ≠ 0)
e ⊢ ifz t then u else v ↪ V

(e, x = ⟨fix x t, e⟩) ⊢ t ↪ V
─────────────────────────────
e ⊢ fix x t ↪ V

e ⊢ t ↪ W    (e, x = W) ⊢ u ↪ V
────────────────────────────────
e ⊢ let x = t in u ↪ V
Exercise 3.4 When we compute the value of the term (fact 3) where the function fact is defined by fix f fun n -> ifz n then 1 else
n * (f (n - 1)), we start by calling recursively the function fact with argument 2, which will create an association between the variable n and the value 2.
When we come back from the recursive call to compute the value of n and perform
the multiplication, is the variable n associated to the value 2 or the value 3? Why?
Exercise 3.5 Write a call by value interpreter for PCF.
For example, the term above will be written fun x -> fun y -> (x1 +
(fun z -> fun w -> (x3 + y2 + z1 + w0 )) (2 * 8) (14 + 4))
(5 + 7) (20 - 6).
It is easy to show that an occurrence of a subterm translated in the variable environment x1 , ..., xn will always be interpreted in an environment of the form
x1 = ., ..., xn = . For this reason, to find the value of the variable associated to the index p we will just look for the pth element in the environment.
This suggests an alternative way to interpret a term: we start by computing the
de Bruijn index for each occurrence of a variable; once the indices are known,
we no longer need to keep in the environment the list of variables. The environment will simply be a list of extended values. Similarly, we can dispose of variable names in closures and in thunks. Indeed, variable names are useless now
and we could for instance rewrite the term above as follows: fun _ -> fun _
-> (_1 + (fun _ -> fun _ -> (_3 + _2 + _1 + _0 )) (2 * 8)
(14 + 4)) (5 + 7) (20 - 6).
The big-step operational semantic rules can now be defined as follows

e ⊢ _p ↪ V    (if the p-th element of e is V)

e ⊢ fun -> t ↪ ⟨t, e⟩

e ⊢ t ↪ ⟨t′, e′⟩    e ⊢ u ↪ W    (e′, W) ⊢ t′ ↪ V
──────────────────────────────────────────────────
e ⊢ t u ↪ V

e ⊢ t ↪ 0    e ⊢ u ↪ V
──────────────────────────
e ⊢ ifz t then u else v ↪ V

e ⊢ t ↪ n    e ⊢ v ↪ V
────────────────────────── (if n is a number ≠ 0)
e ⊢ ifz t then u else v ↪ V

(e, ⟨fix _ t, e⟩) ⊢ t ↪ V
─────────────────────────
e ⊢ fix _ t ↪ V

e ⊢ t ↪ W    (e, W) ⊢ u ↪ V
────────────────────────────
e ⊢ let _ = t in u ↪ V
Exercise 3.6 Write a program to replace each variable by its De Bruijn index. Write
an interpreter for this language.
Exercise 3.7 Write the rules of the call by name big-step operational semantics
using de Bruijn indices.
We will highlight the advantages of this notation, which eliminates the names of
variables, when we study compilation in the next chapter.
In the meantime, notice that two terms have the same de Bruijn translations if and only if they are α-equivalent. This gives us a new definition of alphabetic equivalence. Replacing variables by indices that indicate the position where they are bound can be seen as a radical point of view that highlights the fact that bound variables are dummies.
In this case, we could define simpler variations of the rules for the call by value
interpreter.
The rule that we have given to interpret the construction fixfun f x -> t can be reformulated as follows

e ⊢ fixfun f x -> t ↪ ⟨f, x, t, e⟩

When we interpret an application t u under a call by value semantics, if the term t is interpreted as the recursive closure ⟨f, x, t′, e′⟩, that is, ⟨x, t′, (e′, f = ⟨fixfun f x -> t′, e′⟩)⟩, and the term u as the value W, then to interpret the term t u, the application rule requires to interpret the term t′ in the environment (e′, f = ⟨fixfun f x -> t′, e′⟩, x = W).
We can anticipate the interpretation of the thunk ⟨fixfun f x -> t′, e′⟩ that appears in this environment, and this gives rise, by the rule for fixfun, to the recursive closure ⟨f, x, t′, e′⟩. In the case of recursive closures, the application rule can then be specialised as follows

e ⊢ t ↪ ⟨f, x, t′, e′⟩    e ⊢ u ↪ W    (e′, f = ⟨f, x, t′, e′⟩, x = W) ⊢ t′ ↪ V
──────────────────────────────────────────────────────────────────────────────
e ⊢ t u ↪ V
Thunks are no longer used in this rule; thus, under call by value, by introducing
recursive closures we eliminate thunks and we no longer need the rule to interpret
them.
A final simplification: standard closures ⟨x, t, e⟩ can be replaced by recursive closures ⟨f, x, t, e⟩ where f is an arbitrary variable that does not occur in t. We can then discard the application rule for the case of standard closures.
Finally, we obtain the rules

e ⊢ x ↪ V    (if e contains x = V)

e ⊢ t ↪ ⟨f, x, t′, e′⟩    e ⊢ u ↪ W    (e′, f = ⟨f, x, t′, e′⟩, x = W) ⊢ t′ ↪ V
──────────────────────────────────────────────────────────────────────────────
e ⊢ t u ↪ V

e ⊢ fun x -> t ↪ ⟨f, x, t, e⟩
where f is an arbitrary variable, different from x, that does not occur in t or e

e ⊢ fixfun f x -> t ↪ ⟨f, x, t, e⟩

e ⊢ n ↪ n

e ⊢ t ↪ p    e ⊢ u ↪ q
────────────────────── (if p ⊗ q = n)
e ⊢ t ⊗ u ↪ n

e ⊢ t ↪ 0    e ⊢ u ↪ V
──────────────────────────
e ⊢ ifz t then u else v ↪ V

e ⊢ t ↪ n    e ⊢ v ↪ V
────────────────────────── (if n is a number ≠ 0)
e ⊢ ifz t then u else v ↪ V

e ⊢ t ↪ W    (e, x = W) ⊢ u ↪ V
────────────────────────────────
e ⊢ let x = t in u ↪ V
Exercise 3.8 Write a call by value interpreter for PCF, using recursive closures.
Exercise 3.9 How will the rules of the big-step operational semantics with recursive closures change if variables are replaced by de Bruijn indices (see Sect. 3.3)?
Using the notation FIX X ⟨x, t, (e, f = X)⟩ for this rational value, we can replace the rule above by

e ⊢ fixfun f x -> t ↪ FIX X ⟨x, t, (e, f = X)⟩

and again thunks will no longer be needed.
Note that it is sometimes better to represent such a rational value in an equivalent way
f = G(G(G(...))). In a sense, this explains the intuition that recursive programs are infinite programs. For example, the term fact could be written fun x -> ifz x then 1 else x * (ifz x - 1 then 1 else (x - 1) * (ifz x - 2 then 1 else (x - 2) * ...)). This replacement must only be done on demand: in a lazy way.
We have seen that there are several ways to express this behaviour in the semantics of PCF (and finally in the code of a PCF interpreter): substitute x by fix x t and freeze this redex if it is under a fun or an ifz, store this redex as a thunk or a recursive closure and unfreeze the thunk on demand, or represent the term f = G(G(G(...))) as a rational tree and traverse it on demand. A final method could be to use the encoding of fix given in Exercise 2.10, and only reduce this term (which requires the duplication of a subterm) when needed.
Exercise 3.13 (An extension of PCF with pairs) We extend PCF with the following
constructions: t,u represents the pair where the first component is t and the second
is u; fst t and snd t are, respectively, the first and second component of the
pair t. Write small-step and big-step operational semantic rules for this extension
of PCF. Write an interpreter for this extension of PCF.
Exercise 3.14 (An extension of PCF with lists) We extend PCF with the following
constructions: nil denotes the empty list, cons n l denotes a list where the first
element is the natural number n and l is the rest of the list, ifnil t then u
else v checks whether a list is empty or not, hd l returns the first element of
the list l and tl l the list l without its first element. Write small-step and big-step operational semantic rules for this extension of PCF. Write an interpreter for
this extension of PCF. Write a program to implement a sorting algorithm over these
lists.
Exercise 3.15 (An extension of PCF with trees) We extend PCF with the following
constructions: L n denotes a tree that consists of one leaf labelled by the natural
number n, N t u denotes a tree with two subtrees t and u, ifleaf t then
u else v checks whether its first argument is a tree of the form L n or N t u,
content t denotes the content of the tree t if it is a leaf, left t and right
t denote, respectively, the left and right subtrees of t if it is not a leaf. Write small-step and big-step operational semantic rules for this extension of PCF. Write an
interpreter for this extension of PCF.
Chapter 4
Compilation
When a computer comes out of the factory, it is not capable of interpreting a PCF
term, not even a Caml or Java program. For a computer to be able to run a PCF,
Caml or Java program, we need to have an interpreter for the language, which must
be written in the machine language of the computer. In the previous chapter we
described the principles underlying PCF interpretation, and we wrote an interpreter
in a high-level language, such as Caml. We could continue this line of thought, and
try to write now an interpreter in machine language. . . .
One possibility is to leave the realm of interpretation and move towards a compiler. An interpreter takes a PCF term as input and returns its value. A compiler,
instead, is a program that takes a PCF term as argument and returns a program, in
machine language, whose execution returns the value of the term. In other words, a
PCF compiler is a program that translates PCF terms into machine language, that is,
into a language which can be directly executed by the machine.
One of the advantages of using a compiler is that the program is translated once
and for all, when it is compiled, rather than each time it is executed. Once compiled,
the execution is usually faster. Another advantage comes from the fact that a compiler can compile itself (we call this bootstrapping, see Exercise 4.4), whereas an interpreter cannot interpret itself.
The implementation of a compiler should be guided by the rules of the operational semantics of the language (as was the case for the interpreter). To simplify,
we will focus on a fragment of PCF where only functions can be defined recursively,
and we will use the big-step semantics with recursive closuressee Sect. 3.4.
The machine language that we will use is not a commercial one: it is the machine
language of an imaginary computer. This kind of machine is called an abstract machine. We will write a program that will simulate the behaviour of this machine.
The use of an abstract machine is not only motivated by pedagogical reasons, there
are practical reasons too: the main compilers for Caml and Java, for instance, use
abstract machines. Compiled programs are executed by a program that simulates the
workings of the abstract machine, or are further translated (in a second compilation
phase) to the machine language of a concrete machine.
The second program takes a PCF term as input and, depending on the term, produces machine instructions, which will be executed by the machine, one by one. If
t is a PCF term, we denote by |t| the sequence of abstract machine instructions
generated by this program during the interpretation of the term. For instance, for
the term ((((1 + 2) + 3) + 4) + 5) + 6, the machine instructions generated are: Ldi 6, Push, Ldi 5, Push, Ldi 4, Push, Ldi 3, Push, Ldi 2, Push,
Ldi 1, Add, Add, Add, Add, Add.
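To make this division of work concrete, here is a sketch (not from the book) of the accumulator and stack part of such a machine in OCaml, restricted to the three instructions used above; the exact instruction set of the machine is introduced progressively in the following sections.

  (* A sketch (not from the book): Ldi n puts n in the accumulator, Push
     pushes the accumulator on the stack, and Add pops the top of the stack
     and adds it to the accumulator. *)
  type instr = Ldi of int | Push | Add

  let run (code : instr list) : int =
    let rec exec acc stack = function
      | [] -> acc
      | Ldi n :: rest -> exec n stack rest
      | Push :: rest -> exec acc (acc :: stack) rest
      | Add :: rest ->
          (match stack with
           | top :: stack' -> exec (top + acc) stack' rest
           | [] -> failwith "Add: empty stack")
    in
    exec 0 [] code

  (* the sequence generated above for ((((1 + 2) + 3) + 4) + 5) + 6 *)
  let () =
    assert (run [Ldi 6; Push; Ldi 5; Push; Ldi 4; Push; Ldi 3; Push;
                 Ldi 2; Push; Ldi 1; Add; Add; Add; Add; Add] = 21)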
Exercise 4.1 Which instructions will be executed by the abstract machine when
interpreting the term 1 + (2 + (3 + (4 + (5 + 6))))?
This way of sharing the work resembles the behaviour of a car driver and a passenger in an unfamiliar city: the passenger reads the map and gives instructions to
the driver, who follows the instructions without really knowing where the car is.
If the passenger could generate the instructions just by looking at the map, it
would be possible to record the list of instructions in a compact disk, which the
driver could then listen to in the car. In this scenario, the passenger does not need
to be in the car to guide the driver. Similarly, the interpreter could leave the sequence |t| of instructions in a file, and the file could then be executed later by the
abstract machine. We have just transformed the interpreter into a compiler.
In general, we consider that the abstract machine contains, in addition to the
accumulator and the stack, a third register: the code, the list of instructions that have
to be executed. At the beginning, the abstract machine looks for an instruction in the
code register, executes it, then looks for another instruction. . . until the code register
becomes empty. As we will see, the fact that the execution of an instruction may
add new instructions to the code register will allow us to write loops and recursive
definitions.
then instructions Pushenv and Popenv to put the contents of the environment in
the stack and recover it. These operations are often further decomposed into several
operations to push and pop individual elements of the environment, but here we will
not decompose them in this way.
4.3.2 Closures
In PCF it is also necessary to define closures as values. In addition to the instruction
Ldi n, we will need an instruction Mkclos(f,x,t), with two variables f and x
and a term t as arguments. This instruction will build the closure ⟨f, x, t, e⟩,
where e is the content of the environment register, and put the closure in the accumulator.
Let us consider now the compilation process for such a term. The interpretation
of the term u is replaced by the execution of the sequence |u| of instructions, and
similarly the interpretation of the term t is replaced by the execution of the sequence
|t| of instructions. The interpretation of the closure body t′ has to be replaced by the execution of
the sequence |t′| of instructions. However, there is a difficulty here: t′ is not a
subterm of t u; it is provided by the closure resulting from the interpretation of t.
We then need to modify the notion of closure, and replace the term t in ⟨f, x,
t, e⟩ by a sequence i of instructions. Thus, terms of the form fun x -> t and
fixfun f x -> t should not be compiled into Mkclos(f, x, t); instead,
they should be compiled into Mkclos(f, x, |t|), to build the closure ⟨f, x,
|t|, e⟩ where e is the content of the environment register.
Finally, we need to include in the machine an instruction Apply that takes
a closure ⟨f, x, i, e⟩ from the accumulator, puts the environment (e, f =
⟨f, x, i, e⟩, x = W), where W is the top of the stack, in the environment register, discards the top of the stack and adds the sequence i of instructions to the code
register.
The term t u can then be compiled as the sequence of instructions Pushenv,
|u|, Push, |t|, Apply, Popenv.
Summarising, the abstract machine has the set of instructions Ldi n, Push, Add,
Extend x, Search x, Pushenv, Popenv, Mkclos(f,x,i) and Apply. To complete it, we just need to add the arithmetic operations Sub, Mult, Div and the test
Test(i,j) to compile the operators -, *, / and ifz.
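A sketch of the corresponding compilation function, for the fragment with variables, functions and applications, might look as follows; the constructor names are ours, and the instructions Extend, Sub, Mult, Div and Test, as well as the compilation of let, ifz and the operators other than +, are omitted.

type term =
  | Num of int
  | Plus of term * term
  | Var of string
  | Fun of string * term
  | Fixfun of string * string * term
  | App of term * term

type instr =
  | Ldi of int
  | Push
  | Add
  | Search of string
  | Pushenv
  | Popenv
  | Mkclos of string * string * instr list
  | Apply

let rec compile = function
  | Num n -> [ Ldi n ]
  | Plus (t, u) -> compile u @ [ Push ] @ compile t @ [ Add ]
  | Var x -> [ Search x ]
  (* for a non-recursive function the name of the closure is not used;
     a dummy name is picked here *)
  | Fun (x, t) -> [ Mkclos ("_", x, compile t) ]
  | Fixfun (f, x, t) -> [ Mkclos (f, x, compile t) ]
  | App (t, u) -> [ Pushenv ] @ compile u @ [ Push ] @ compile t @ [ Apply; Popenv ]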
To simplify the machine we can use De Bruijn indices (see Sect. 3.3). Recall that
the instruction Search x is generated by the compilation of variables, and we have
already seen that it is possible to determine the index of each variable occurrence
statically. We could then compile a variable x using the instruction Search n, where
n is a number, instead of Search x.
De Bruijn indices can be computed at the same time as the compilation is performed: it suffices to compile each term in a variable environment, and to compile the
variable x in the environment e by the instruction Search n, where n is the position
of the variable x in the environment e, counting from the end.
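For instance, assuming the compile-time environment is represented as an OCaml list of variable names with the most recently bound variable at the head, the index used by Search can be computed by a small function such as the following.

(* index x env: the position of x in env, the most recently bound
   variable being at the head of the list and receiving index 0 *)
let index x env =
  let rec aux i = function
    | [] -> failwith ("unbound variable " ^ x)
    | y :: rest -> if y = x then i else aux (i + 1) rest
  in
  aux 0 env

(* for instance, index "x" [ "y"; "x" ] is 1 *)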
This mechanism allows us to dispose of variables in environments, closures, and
instructions Mkclos and Extend. Our abstract machine includes the instructions
Ldi n, Push, Extend, Search n, Pushenv, Popenv, Mkclos i, Apply, Test(i,j),
Add, Sub, Mult and Div.
The state of the abstract machine at the beginning of the 14th execution step for
the program Pushenv, Ldi 1, Extend, Ldi 6, Push, Ldi 5, Push, Ldi 4, Push,
Ldi 3, Push, Ldi 2, Push, Search 0, Add, Add, Add, Add, Add, Popenv.
Exercise 4.3 We extend PCF with the tree operators described in Exercise 3.15.
Write a compiler and an abstract machine for this extension of PCF.
Exercise 4.4 (A bootstrapping compiler) Many kinds of data structures can be represented using the trees described in Exercise 3.15. To start with, we can represent
a natural number n as a tree L n. The character c can be represented by the tree
L n where n is a code, for instance the ASCII code of the character c. If t1 , t2 ,
..., tn are trees, the list t1 , t2 , ..., tn can be represented by the tree
N(t1 , N(t2 , ..., N(tn , L 0)...)). Finally, values of a type defined by
constructors that are themselves representable could be defined by enumerating the
constructors and representing the value C(V1 , V2 , ..., Vn ) by the list L p,
t1 , t2 , ..., tn where p is the number associated to the constructor C and t1 ,
t2 , ..., tn represent the values V1 , V2 , ..., Vn .
Chapter 5
5.1 Types
In Mathematics, the domain of a function is a set (any set). For example, we can
define a function m from 2ℕ to ℕ that associates to each even number its half. Then,
a type symbol -> with two type arguments and which does not bind any variable
in its arguments.
Alternatively, we can define the syntax of the typed version of PCF inductively:

A = X
  | nat
  | A -> A

t = x
  | fun x:A -> t
  | t t
  | n
  | t + t | t - t | t * t | t / t
  | ifz t then t else t
  | fix x:A t
  | let x:A = t in t
e ⊢ t : A    (e, x : A) ⊢ u : B
--------------------------------
e ⊢ let x:A = t in u : B
In the first rule only the rightmost declaration for x is taken into account, the others
are hidden.
The language includes variables of various sorts, in particular type variables for
which we will use capital letters. Since no symbol can bind a type variable, a closed
term will not contain type variables. Moreover, if a closed term t has the type A in
the empty environment, then the type A must be closed too. So, type variables are
not really used here; they will be used in the next chapter.
Let e be an environment and t a term. Reasoning by induction on t, we can
show that the term t has at most one type in the environment e.
We can build a type checking algorithm based on the typing rules given above.
The algorithm will check whether a term t has a type in an environment e, and if it
does, it will give the type as a result. It will do this by typing recursively the direct
subterms of the given term, and will then compute the type of the term using the types
of the subterms.
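As an illustration, here is a minimal sketch, in OCaml, of such a type checker restricted to a small fragment of typed PCF (variables, numbers, addition, fun and application); the datatypes and names are ours, and the remaining constructs are left for Exercise 5.1 below.

type ptype = Nat | Arrow of ptype * ptype

type term =
  | Var of string
  | Num of int
  | Plus of term * term
  | Fun of string * ptype * term
  | App of term * term

(* type_of e t: the type of t in the environment e, or an exception if
   t is ill typed (List.assoc raises Not_found for an unbound variable) *)
let rec type_of env t =
  match t with
  | Var x -> List.assoc x env
  | Num _ -> Nat
  | Plus (t, u) ->
      if type_of env t = Nat && type_of env u = Nat then Nat
      else failwith "ill-typed addition"
  | Fun (x, a, t) -> Arrow (a, type_of ((x, a) :: env) t)
  | App (t, u) -> (
      match type_of env t with
      | Arrow (a, b) when type_of env u = a -> b
      | _ -> failwith "ill-typed application")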
Exercise 5.1 Write a type checker for PCF.
Reduction is still confluent on the typed language, and types bring us an additional property: all the terms that do not contain the operator fix terminate
(Tait's Theorem). It will be impossible to build a term such as (fun x -> (x x))
(fun x -> (x x)), which does not terminate and does not contain fix.
Exercise 5.2 Write typing rules for the version of PCF that uses de Bruijn indices
instead of variable names (see Sect. 3.3).
Exercise 5.3 We extend PCF with the constructs described in Exercise 3.13 to define pairs, and we introduce a symbol × to denote the Cartesian product of two
types. Write typing rules for this extension of PCF. Write a type-checker for this
extension of PCF.
Exercise 5.4 We extend PCF with the constructs described in Exercise 3.14 to define lists, and we introduce a type natlist for these lists. Write typing rules for
this extension of PCF. Write a type-checker for this extension of PCF.
Exercise 5.5 We extend PCF with the constructs described in Exercise 3.15 to define trees, and we introduce a type nattree for these trees. Write typing rules for
this extension of PCF. Write a type-checker for this extension of PCF.
value to an application whose left-hand side has a value that is a numeric constant,
or how to associate a value to an arithmetic operation where the value of one of
the arguments is a term of the form fun, or a value to a conditional where the first
argument has a value that is of the form fun. However, for well-typed terms the
rules are complete. In other words, the three examples that we have just mentioned
cannot arise.
We start by showing a type-preservation-by-interpretation lemma, which states
that if a closed term t has type A then its value, if it exists, also has type A. This
lemma corresponds to the subject reduction lemma of the small-step operational
semantics.
Then we show, as for the small-step semantics, that a term of the form fun
cannot have type nat and, similarly, that a numeric constant cannot have a type of
the form A -> B.
Since we know that the value of a term is either a number or a term of the form
fun, we deduce that the value of a term of type nat is a numeric constant, and
the value of a term of type A -> B is a term of the form fun. Therefore, when
interpreting a well-typed term, the left-hand side of an application will always be
interpreted as a term of the form fun, the arguments of arithmetic operators will
always be interpreted as numeric constants, and the first argument of an ifz will
always be interpreted as a numeric constant.
Exercise 5.6 (Equivalent semantics) Show that the computation of a well-typed
term produces a result under call by name small-step operational semantics if and
only if it produces a result under call by name big-step operational semantics. Moreover, the result is the same in both cases. Show that the same property is true of the
call by value semantics.
Does this result hold also for the untyped version of PCF? Hint: what is the result
of ((fun x -> x) 1) 2?
This is really trivial: a program is a function and its semantics is the same function. Achieving this triviality is one of the goals in the design of functional languages.
Two remarks are in order. First, division by 0 produces an error in PCF, whereas
it is not defined in Mathematics. To be precise, we should add a value error to
each set ⟦A⟧ and adapt the definition given above. Second, in this definition we have
forgotten the construction fix.
5.3.2 Termination
The only construct with a non-trivial denotational semantics is fix, because this
construct is not usually found in everyday definitions of functions in Mathematics.
Unlike PCF, mathematical definitions can only use fixed points of functions that do
have a fixed point, and even then if there are several fixed points it is essential to
specify which one we are taking. We left these issues aside when we defined PCF;
it is now time to deal with them.
Consider a function that does not have a fixed point: the function fun x:nat
-> x + 1. In PCF, we can build the term fix x:nat (x + 1). Similarly,
the function fun f:(nat -> nat) -> fun x:nat -> (f x) + 1 does
not have a fixed point but we can build the term fix f:(nat -> nat) fun
x:nat -> (f x) + 1. On the other hand, the function fun x:nat -> x has
many fixed points, and still we can build the term fix x:nat x.
When we defined the operational semantics of PCF, we gave a reduction rule
fix x:A t ⟶ (fix x:A t/x)t
that explains the idea of a fixed point. Using this rule, we can see that the term
a = fix x:nat (x + 1) reduces to a + 1, then to (a + 1) + 1, ...
without ever reaching an irreducible term. Similarly, if g = fix f:(nat ->
nat) fun x:nat -> (f x) + 1, the term g 0 can be reduced in two steps
to (g 0) + 1 and then ((g 0) + 1) + 1, ... and again will never reach
an irreducible term. The same thing happens with the term b = fix x:nat x,
which reduces to b, and again to b, . . . and will never reach an irreducible term. In
other words, it appears that in PCF, when we take the fixed point of a function that
does not have any, or that has more than one, the program does not terminate.
and we define ⟦fix x:nat t⟧ as the least fixed point of the function ⟦fun
x:nat -> t⟧, thereby forcing the choice of the least fixed point when more than one
fixed point exists. It remains to prove that the least fixed point exists; we will use the fixed point
theorem for this. To apply this theorem, we must show that the ordering relation that
we defined on ⟦nat⟧ is weakly complete, and that the semantics of a program of
type nat -> nat is a continuous function.
More generally, we will build for each type A a set ⟦A⟧ endowed with a weakly
complete ordering relation, and we will show that the semantics of a program of
type A -> B is a continuous function from ⟦A⟧ to ⟦B⟧.
We start by defining the sets ⟦A⟧. The set ⟦nat⟧ will be defined as ℕ ∪ {⊥}, with
the ordering relation given above. The set ⟦A -> B⟧ is defined to be the set of all
continuous functions from ⟦A⟧ to ⟦B⟧, with the ordering relation f ≤ g if for all x
in ⟦A⟧, f x ≤ g x.
We can show that these ordering relations are weakly complete. The ordering on
⟦nat⟧ is weakly complete because any increasing sequence is either constant or has
the form ⊥, ⊥, ..., ⊥, n, n, ... and in both cases there is a limit.
We will now show that if the ordering relations on ⟦A⟧ and ⟦B⟧ are weakly
complete, then so is the ordering on ⟦A -> B⟧. Let us consider an increasing sequence fn over ⟦A -> B⟧. Using the definition of the ordering on ⟦A -> B⟧, for
all x in ⟦A⟧, the sequence fn x, whose values are in ⟦B⟧, is also increasing, and
therefore has a limit. Let us call F the function that associates to x the element
lim_n (fn x). We can show (but we will not do it here) that the function F is
in ⟦A -> B⟧, that is, it is a continuous function (this requires a lemma to permute
limits). By construction, the function F is greater than all the functions fn, and it is
the least such function. Therefore it is the limit of the sequence fn. Any increasing
sequence has a limit and the ordering relation on ⟦A -> B⟧ is therefore weakly
complete.
Each set ⟦A⟧ has a least element, written ⊥_A. The least element of ⟦nat⟧ is ⊥,
and the least element of ⟦A -> B⟧ is the constant function that returns the value
⊥_B for all arguments.
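As a small illustration, the flat ordering on ⟦nat⟧ = ℕ ∪ {⊥} can be modelled in OCaml with an option type, None playing the rôle of ⊥; this is only a sketch of the ordering, not of the full denotational semantics.

type nat_bot = int option  (* None plays the rôle of ⊥ *)

(* leq x y holds exactly when x = ⊥ or x = y: the flat ordering *)
let leq (x : nat_bot) (y : nat_bot) =
  match (x, y) with
  | None, _ -> true
  | Some m, Some n -> m = n
  | Some _, None -> false

With this ordering, every increasing sequence is either constantly None or eventually constantly Some n, which is its limit, as stated above for ⟦nat⟧.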
To show that this definition is correct, we need to prove that if t is a term of type
A then ⟦t⟧ is in ⟦A⟧, that is, we need to prove that the function is continuous. This
is true, but we will not prove it here.
Exercise 5.7 What is the semantics of the term fun x:nat -> 0? And the semantics of fix x:nat x and (fun x:nat -> 0) (fix x:nat x)?
Exercise 5.8 What is the value of ⟦ifz t then u else v⟧e, if ⟦t⟧e = 0,
⟦u⟧e = 0 and ⟦v⟧e = ⊥_nat?
We can now state the equivalence theorem for the two semantics. Let t be a
closed term of type nat and n a natural number: t ↪ n under call by name if and
only if ⟦t⟧ = n. The direct implication is not difficult to prove, but the converse is
not trivial.
Exercise 5.9 Show, using the equivalence theorem, that if t is a closed term of type
nat such that ⟦t⟧ = ⊥, there is no natural number n such that t ↪ n.
Exercise 5.10 Let G be the denotational semantics of the term fun f:(nat ->
nat) -> fun n:nat -> ifz n then 1 else n * (f (n - 1)).
The denotational semantics of the term fix f:(nat -> nat) fun n:nat
-> ifz n then 1 else n * (f (n - 1)) is the least fixed point of G.
By the first fixed point theorem, this is the limit of the sequence G^n(⊥_{nat -> nat}).
Which function is denoted by ⊥_{nat -> nat}? And by G^n(⊥_{nat -> nat})? Identify
the limit of this sequence.
Show that for any natural number p, there exists a natural number m such that
G^m(⊥_{nat -> nat})(p) = lim_n G^n(⊥_{nat -> nat})(p).
Exercise 5.11 We consider the following elements in the set ⟦nat -> nat⟧: the
function u that maps ⊥ to ⊥ and all other elements to 0, the function vi that maps
⊥ to ⊥, i to 1 and all other elements to 0, and the function wi that maps ⊥ to ⊥,
0, 1, ..., i-1 to 0 and all other elements to ⊥.
Let F be an increasing function from ⟦nat -> nat⟧ to ⟦nat⟧, such that F u
= 0 and for all i, F vi = 1. Show that for all i, F wi = ⊥. Show that the
function F is not continuous.
Show that it is not possible to write a PCF function that takes as argument a function g of type nat -> nat and returns 0 if g n = 0 for all n, and 1 otherwise.
Exercise 5.12 (An information-based approach to continuity) It might seem surprising that the notion of continuity is used to define the semantics of PCF, even
though PCF works only with natural numbers, not with real numbers. In fact, the set
of functions from N to N, or the set of sequences of natural numbers, is very similar
to the set of real numbers.
The intuition is that a real function f is continuous if to compute the initial n
decimal places of f x it is sufficient to know a finite number of decimals in x.
Unfortunately, this is technically false if x or f x are decimal numbers. We will
say that a decimal number approximates a real number to the nth decimal place if
the distance between the two is smaller than 10^-n. Thus, the number π has two
approximations to the second decimal place: 3.14 and 3.15, and it makes sense
to say that the function f is continuous if to compute a decimal approximation of f
x to the nth place it is sufficient to have some decimal approximation of x.
The goal of this exercise is to show that, similarly, a function f from sequences
of natural numbers to sequences of natural numbers is continuous if to compute the
first n terms in f x it is sufficient to have an initial segment of x. If we agree to call
a finite initial segment of the sequence a finite approximation, then we can rephrase
it as follows: to compute an approximation of f x with n terms, it is sufficient to
have a certain approximation of x.
Let u be a sequence of natural numbers, and let U be the element of ⟦nat ->
nat⟧ that associates ⊥ to ⊥ and ui to i.
Let V be a sequence with elements in ⟦nat -> nat⟧
[⊥ ↦ ⊥, 0 ↦ ⊥, 1 ↦ ⊥, 2 ↦ ⊥, 3 ↦ ⊥, ...],
[⊥ ↦ ⊥, 0 ↦ u0, 1 ↦ ⊥, 2 ↦ ⊥, 3 ↦ ⊥, ...],
[⊥ ↦ ⊥, 0 ↦ u0, 1 ↦ u1, 2 ↦ ⊥, 3 ↦ ⊥, ...],
[⊥ ↦ ⊥, 0 ↦ u0, 1 ↦ u1, 2 ↦ u2, 3 ↦ ⊥, ...],
...
Show that the sequence V converges to U. Let F be a continuous function on
⟦nat -> nat⟧. Show that the sequence F Vi converges to F U. Show that the
sequence F Vi p converges to F U p. Show that there exists a natural number k
such that F Vk p = F U p. Show that to compute F U p, it suffices to have the
first k terms in U. Show that to compute the first n terms in F U it is sufficient to
know a finite number of terms in U.
Consider the function that associates to a sequence u the number 0 if u is always
0, and 1 otherwise. Is this function continuous? Can it be written in PCF?
Finally, notice that in these two examples, the approximations (decimal numbers
or finite sequences) contain a finite amount of information, whereas the objects that
they approximate (real numbers or infinite sequences) contain an infinite amount
of information.
Exercise 5.13 (Gödel's System T) To avoid non-terminating computations, we can
replace fix by a rec construct to define functions by induction. All the programs
in this language terminate, but the language is no longer Turing complete. Still, it is
not easy to find a program that cannot be represented in this language; you need to
be an expert logician to build such a program.
The function f defined by f 0 = a and f (n + 1) = g n (f n) is written rec a g. The small-step operational semantic rules for this construct are

rec a g 0 ⟶ a
rec a g n ⟶ g (n - 1) (rec a g (n - 1))   if n is a natural number different from 0
Program the factorial function in this language. Give typing rules for rec. Give
a denotational semantics for this language.
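As an illustration, the function computed by rec a g can be sketched in OCaml as follows, directly mirroring the two reduction rules in the exercise above (rec being an OCaml keyword, the function is called recurse here).

(* recurse a g n computes rec a g n, following the two rules above *)
let rec recurse a g n =
  if n = 0 then a else g (n - 1) (recurse a g (n - 1))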
Chapter 6
Type Inference
In many programming languages, for instance Java and C, programmers must declare a type for each of the variables used in the program, writing for example fun
x:nat -> x + 1. However, if we know that + can only work with numbers, it
is not difficult to show that in the term fun x -> x + 1 the variable x has to
be of type nat. We can then let the computer infer the types, rather than asking the
programmer to write them. This is the goal of a type inference algorithm.
e ⊢ x : A    if e contains x : A
e ⊢ u : A    e ⊢ t : A -> B
-----------------------------
e ⊢ t u : B

(e, x : A) ⊢ t : B
-----------------------------
e ⊢ fun x -> t : A -> B

e ⊢ n : nat

e ⊢ u : nat    e ⊢ t : nat
-----------------------------
e ⊢ t ⊕ u : nat

e ⊢ t : nat    e ⊢ u : A    e ⊢ v : A
---------------------------------------
e ⊢ ifz t then u else v : A

(e, x : A) ⊢ t : A
-----------------------------
e ⊢ fix x t : A

e ⊢ t : A    (e, x : A) ⊢ u : B
---------------------------------
e ⊢ let x = t in u : B
Some terms, for example the term fun x -> x, may have more than one
type in this system. For instance, we can derive the judgement ⊢ fun x -> x :
nat -> nat and also the judgement ⊢ fun x -> x : (nat -> nat) ->
(nat -> nat). A closed term may have a type with free variables, for example
the term fun x -> x has type X -> X in the empty environment.
We can prove that if a closed term t has a type A which contains variables in the
empty environment, then t also has type φA for any substitution φ. For example, if
we substitute the variable X by the type nat -> nat in X -> X, we obtain the
type (nat -> nat) -> (nat -> nat) and this is one of the possible types
for the term fun x -> x.
we need to type the term 2, which has type nat, and the term f 1. The term 1
has type nat and the term f has type X. We generate the equation X = nat ->
Y and the type of f 1 is Y. Once the terms 2 and f 1 are typed, we generate the
equations nat = nat and Y = nat, and the type of the term 2 + (f 1) is
nat. Finally, the type of the term fun f -> 2 + (f 1) is X -> nat and the
equations that we need to solve are
X = nat -> Y
nat = nat
Y = nat
This system of equations has a unique solution X = nat -> nat, Y = nat,
and therefore the only type that we can assign to the term fun f -> 2 + (f 1)
is (nat -> nat) -> nat.
We can describe the first part of the algorithm using a set of rules in the style of
the big-step operational semantics (as we did for the type checking algorithm), but
in this case the result of the interpretation of a term will not be a value or a type, it
will be a pair of a type and a set of equations on types. We write e ⊢ t ↪ A, E
to denote the relation between the environment e, the term t, the type A and the set
of equations E.
e ⊢ x ↪ A, ∅    if e contains x : A

e ⊢ u ↪ A, E    e ⊢ t ↪ B, F
-------------------------------------
e ⊢ t u ↪ X, E ∪ F ∪ {B = A -> X}

(e, x : X) ⊢ t ↪ A, E
-------------------------------
e ⊢ fun x -> t ↪ (X -> A), E

e ⊢ n ↪ nat, ∅

e ⊢ u ↪ A, E    e ⊢ t ↪ B, F
----------------------------------------------
e ⊢ t ⊕ u ↪ nat, E ∪ F ∪ {A = nat, B = nat}

e ⊢ t ↪ A, E    e ⊢ u ↪ B, F    e ⊢ v ↪ C, G
----------------------------------------------------------
e ⊢ ifz t then u else v ↪ B, E ∪ F ∪ G ∪ {A = nat, B = C}

(e, x : X) ⊢ t ↪ A, E
-------------------------------
e ⊢ fix x t ↪ A, E ∪ {X = A}

e ⊢ t ↪ A, E    (e, x : A) ⊢ u ↪ B, F
---------------------------------------
e ⊢ let x = t in u ↪ B, E ∪ F
In the application rule, the variable X is an arbitrary variable that does not occur in
e, A, B, E and F. In the rules for fun and fix, it is an arbitrary variable that does
not occur in e.
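The equation-generating part of the algorithm can be sketched in OCaml for the fragment with variables, fun and application; the representation of types, the fresh-variable counter and the function names are assumptions of this sketch.

type ptype = TVar of int | Nat | Arrow of ptype * ptype

type term = Var of string | Fun of string * term | App of term * term

(* a hypothetical supply of fresh type variables *)
let counter = ref 0
let fresh () = incr counter; TVar !counter

(* infer e t: a type and a list of equations, following the rules above *)
let rec infer env t =
  match t with
  | Var x -> (List.assoc x env, [])
  | Fun (x, t) ->
      let x_ty = fresh () in
      let a, e = infer ((x, x_ty) :: env) t in
      (Arrow (x_ty, a), e)
  | App (t, u) ->
      let a, e = infer env u in
      let b, f = infer env t in
      let x = fresh () in
      (x, (b, Arrow (a, x)) :: (e @ f))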
Let t be a closed term and let A and E be the type and the set of equations
computed by this algorithm, that is, we have ⊢ t ↪ A, E. A substitution φ =
B1/X1, ..., Bn/Xn is a solution of E if, for each equation C = D in E, the
types φC and φD are identical. We can show that if a substitution φ is a solution of
the set E, then the type φA is a type for t in the empty environment. In general, if e
The algorithm has the following property: if e ⊢ t ↪ A, φ, then A is a principal type of t in the environment e. The algorithm is defined below.
e ⊢ x ↪ A, ∅    if e contains x : A

e ⊢ u ↪ A, φ    φe ⊢ t ↪ B, ψ
-------------------------------   if ρ = mgu(B = ψA -> X)
e ⊢ t u ↪ ρX, ρψφ

(e, x : X) ⊢ t ↪ A, φ
--------------------------------
e ⊢ fun x -> t ↪ (φX -> A), φ

e ⊢ n ↪ nat, ∅

e ⊢ u ↪ A, φ    φe ⊢ t ↪ B, ψ
-------------------------------   if ρ = mgu(ψA = nat) and ρ′ = mgu(ρB = nat)
e ⊢ t ⊕ u ↪ nat, ρ′ρψφ

e ⊢ t ↪ A, φ    φe ⊢ u ↪ B, ψ    ψφe ⊢ v ↪ C, θ
--------------------------------------------------   if ρ = mgu(θψA = nat) and ρ′ = mgu(ρθB = ρC)
e ⊢ ifz t then u else v ↪ ρ′ρC, ρ′ρθψφ

(e, x : X) ⊢ t ↪ A, φ
-------------------------   if ρ = mgu(A = φX)
e ⊢ fix x t ↪ ρA, ρφ

e ⊢ t ↪ A, φ    (φe, x : A) ⊢ u ↪ B, ψ
-----------------------------------------
e ⊢ let x = t in u ↪ B, ψφ
Again, in the application rule X is an arbitrary variable that does not occur in e, A,
B, φ and ψ, and in the rules for fun and fix, it is a variable that does not occur
in e.
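A most general unifier on this type language can be sketched as follows, representing substitutions as association lists from type variables to types; the function names and this particular way of extending substitutions are ours, not the book's.

type ptype = TVar of int | Nat | Arrow of ptype * ptype

let rec apply subst = function
  | TVar x -> (try List.assoc x subst with Not_found -> TVar x)
  | Nat -> Nat
  | Arrow (a, b) -> Arrow (apply subst a, apply subst b)

let rec occurs x = function
  | TVar y -> x = y
  | Nat -> false
  | Arrow (a, b) -> occurs x a || occurs x b

(* extend an idempotent substitution with the binding x = c *)
let extend x c subst =
  (x, c) :: List.map (fun (y, d) -> (y, apply [ (x, c) ] d)) subst

(* unify subst a b: a most general unifier of a = b, refining subst *)
let rec unify subst a b =
  match (apply subst a, apply subst b) with
  | TVar x, TVar y when x = y -> subst
  | TVar x, c | c, TVar x ->
      if occurs x c then failwith "occurs check" else extend x c subst
  | Nat, Nat -> subst
  | Arrow (a1, b1), Arrow (a2, b2) -> unify (unify subst a1 a2) b1 b2
  | _, _ -> failwith "clash"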
Exercise 6.1 Give a principal type for the term fun x -> fun y ->
(x (y + 1)) + 2. Describe all of its types.
Give a principal type for the term fun x -> x. Describe all of its types.
Exercise 6.2 (Unicity of principal types) A substitution is called a renaming if it
is an injective map associating a variable to each variable. For example, the substitution y/x, z/y is a renaming. Let A be a type and θ, θ′ two substitutions. Show
that if θ′θA = A then θ|FV(A) is a renaming.
Deduce that if A and A′ are two principal types of a term t then there exists a
renaming θ, with domain FV(A), such that θA = A′.
Exercise 6.3 In the general case of a language without binders, we can replace the
first three rules in Robinson's unification algorithm by the two rules
if an equation is of the form f(u1 , ..., un ) = f(v1 , ..., vn ), replace it by u1 = v1 , ..., un = vn ,
6.2 Polymorphism
We have seen that the principal type of the term id = fun x -> x is X -> X.
This means that the term id has type A -> A for any type A. We could give it a
new type ∀X (X -> X) and add a rule so that if a term t has type ∀X A then
it has the type (B/X)A for any type B. A type language that includes a universal
quantifier is polymorphic.
In the system presented in the previous section, the term let id =
fun x -> x in id id was not typeable. Indeed, the typing rule for let requires that we type both fun x -> x and id id, but the latter is not typeable
because we cannot assign the same type to both occurrences of the variable id.
For this reason the term let id = fun x -> x in id id cannot be typed.
This could be seen as a flaw in the type system, because the term (fun x -> x)
(fun x -> x), obtained by replacing id by its definition, is typeable. Indeed,
to type this term it is sufficient to assign type nat -> nat to the first occurrence
of the bound variable x and type nat to the second.
If we give the type ∀X (X -> X) to the symbol id in the term let id =
fun x -> x in id id we can then use a different type for each occurrence of
id in the term id id, and the term becomes typeable.
Typing the term let id = fun x -> x in id id might seem a minor
issue, and adding quantifiers to the type language might seem a high price to pay
to obtain a marginal increase in power. However, this impression is wrong. In
fact, in the extension of PCF with lists (see Exercise 3.14), this feature allows
us to write a single sorting algorithm and apply it to all lists, irrespective of
the type of their elements: let sort = t in u. Polymorphism entails more
code reuse, and therefore more concise programs.
We will therefore give a quantified type to the variables bound in a let, but use
a standard type for variables that are bound in a fun or fix.
A scheme has the form ∀X1 ... ∀Xn A where A is a type. We will then define
a language with two sorts: a sort for types and a sort for schemes. Since the sets of
terms of each sort are disjoint in a many-sorted language, the set of types cannot be
a subset of the set of schemes, and we will need to use a symbol [ ] to inject a type
in the sort of the schemes. Thus, if A is a type, [A] will be the scheme consisting
of the type A without any quantified variable.
The language of types and schemes is defined by
a type constant nat,
a type symbol -> with two type arguments, which does not bind any variable in
its arguments,
a scheme symbol [ ] with one type argument, which does not bind any variable
in its argument,
a scheme symbol ∀ with one scheme argument, which binds a variable in its
argument.
A = X
  | nat
  | A -> A

S = Y
  | [A]
  | ∀X S
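In OCaml, the two sorts can be mirrored by two datatypes such as the following; the constructor names are ours.

type ptype = TVar of string | Nat | Arrow of ptype * ptype
type scheme = SVar of string | Type of ptype | Forall of string * scheme

(* for instance, the scheme ∀X [X -> X] is represented as: *)
let id_scheme = Forall ("X", Type (Arrow (TVar "X", TVar "X")))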
This language includes variables for every sort, in particular scheme variables. However, these variables will not be used.
An environment is now a list associating a scheme to each variable. We define
inductively the relation the term t has the scheme S in the environment e
e ⊢ x : S    if e contains x : S

e ⊢ u : [A]    e ⊢ t : [A -> B]
--------------------------------
e ⊢ t u : [B]

(e, x : [A]) ⊢ t : [B]
--------------------------------
e ⊢ fun x -> t : [A -> B]

e ⊢ n : [nat]

e ⊢ u : [nat]    e ⊢ t : [nat]
--------------------------------
e ⊢ t ⊕ u : [nat]

e ⊢ t : [nat]    e ⊢ u : [A]    e ⊢ v : [A]
--------------------------------------------
e ⊢ ifz t then u else v : [A]

(e, x : [A]) ⊢ t : [A]
--------------------------------
e ⊢ fix x t : [A]

e ⊢ t : S    (e, x : S) ⊢ u : [B]
----------------------------------
e ⊢ let x = t in u : [B]

e ⊢ t : S
----------------   if X does not occur free in e
e ⊢ t : ∀X S
e ⊢ t : ∀X S
----------------
e ⊢ t : (A/X)S
This inductive definition assigns a scheme to each term, in particular to variables.
This is why variables are associated to schemes in the environment. However, when
we type a term of the form fun x -> t or fix x t, we type t in an extended
environment where the variable x is associated to a scheme [A] without quantifiers.
A scheme can be associated to a term t only during the typing of a term of the form
let x = t in u, and then this scheme is associated to the variable x.
To introduce quantifiers in the scheme associated to t we use the penultimate
rule, which allows us to quantify a variable in the scheme S if the variable does not
occur free in e. Thus, in the empty environment, after assigning the scheme [X ->
X] to the term fun x -> x we can assign the scheme ∀X [X -> X] to it. Note
that in the environment x : [X], after assigning the scheme [X] to the variable x
we cannot assign the scheme ∀X [X].
Finally, note that if we have assigned a quantified scheme to a variable, or to an
arbitrary term, we can remove the quantifier and substitute the free variable using
the last rule. For example, in the environment x : ∀X [X -> X] we can assign
the scheme [nat -> nat] to the variable x.
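The generalisation step performed when typing let x = t in u can be sketched as follows, over a restriction of the datatypes sketched above (scheme variables omitted); env_vars, standing for the type variables occurring free in the environment, is an assumption of this sketch.

type ptype = TVar of string | Nat | Arrow of ptype * ptype
type scheme = Type of ptype | Forall of string * scheme

let rec free_vars = function
  | TVar x -> [ x ]
  | Nat -> []
  | Arrow (a, b) -> free_vars a @ free_vars b

(* quantify the variables of a that do not occur free in the environment *)
let generalise env_vars a =
  let vars =
    List.sort_uniq compare
      (List.filter (fun x -> not (List.mem x env_vars)) (free_vars a))
  in
  List.fold_right (fun x s -> Forall (x, s)) vars (Type a)

(* generalise [] (Arrow (TVar "X", TVar "X")) builds the scheme ∀X [X -> X],
   whereas generalise ["X"] (TVar "X") leaves [X] unquantified *)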
Chapter 7
Consider two numbers: π and the temperature in Paris. Today, the number π has
a value between 3.14 and 3.15 and the temperature in Paris is between 16 and
17 degrees. Tomorrow, π will have the same value, but the temperature in Paris will
probably change. In Mathematics, numbers are entities that do not change over time:
the temperature in Paris is not a number that changes, it is a function that varies over
time.
However, formalising the temperature of a system as a function of time is perhaps
too general. It does not take into account the fact that the variation in temperature
at a given point in time depends, in general, on the temperature at this point and
not the temperature ten seconds earlier or ten seconds later. In general, a system
does not have access to the full temperature function, just the current value of the
function. This is why equations in Physics are generally differential equations and
not arbitrary equations on functions.
In Computer Science, programs also use objects that vary over time. For example,
in the program that manages the sale of tickets for a concert, the number of seats
available varies over time: it decreases by one each time a ticket is sold. From the
mathematical point of view, it is a function of time. However, to know whether
it is possible or not to sell a ticket, or whether booking is no longer possible, the
program only needs to know the current value of this function, not the full function:
at a certain point t in time, it needs the value of the function at t.
For this reason, when we write such a program, we do not represent the number
of places available for the concert as a function, that is, as a term of type nat ->
nat (assuming a discrete clock), which would mean that at each instant t we
know the number of seats still available for the concert at each instant t′. This is
clearly impossible, since it requires knowing the number of seats available at each
instant t′ in the future. We cannot express this number by a term of type nat
either, because as a number the value of a term of type nat in PCF cannot change
over time. We have to introduce another sort of terms for the values that change over
time: references, also called variables, but we prefer not to use the word variable in
this context, since the notion of a reference is very different from the notion of a
variable in Mathematics and in functional languages.
If x is a reference, we can do two things with it: get its current value !x, and
modify its value with x := t, that is, contribute to the construction of the function that
we mentioned above, asserting that the value of the function is now, and until further
notice, the current value of the term t.
The issue of equality of numbers that vary over time is subtle. We could compare such a number, the temperature in Paris for instance, with a leaf in a tree: small,
green and flexible in Spring, it becomes bigger, yellow and brittle in Autumn. There
is clearly a change, but we know that it is the same leaf: nobody would believe that
the little green leaf disintegrated and suddenly the big yellow leaf appeared ex nihilo.
Although there is a transformation, the same leaf remains in the tree from March till
October. This is an instance of the old paradox, that something can change while remaining the same. Similarly, the notion of temperature in Paris is always the same,
even if the temperature changes over time. On the other hand, we can easily distinguish the temperature in Paris from the temperature in Rome: these are two different
things, even if from time to time the temperature is the same in both cities.
One way to deal with this paradox is to consider the temperature in Paris and
the temperature in Rome as functions: a function may take different values at two
different points and remain the same function, and two different functions might
take the same value at a given point.
In a program, if x and y are two references and we need to compare them, we
should distinguish carefully between their equality as references, that is, whether x
and y are the same thing or not (in mathematical terms: whether they are the same
function of time) and equality of their contents, that is, whether the numbers !x
and !y are the same at a particular point in time. In particular, equality of references
implies that if we modify the value of x then the value of y also changes, but this is
not the case if they are different references with the same value.
computer. In other languages, the set R is arbitrary. In particular, when we define the
semantics of a language, we do not distinguish between two sets R and R′ of the same
cardinality (i.e., with the same number of elements). This means that programmers
cannot know the exact set of memory addresses used to store the data.
In PCF, as well as in most programming languages, the values associated to references may change over time. Moreover, the set R itself may vary over time: it is
possible to create a reference during the execution of the program. To do this, the
language includes a construct ref. The side effect associated to the interpretation
of the term ref t is the creation of a new reference whose initial value is the current value of the term t. The value computed by this interpretation is the reference
itself.
Since the interpretation of the term ref t produces a value which is a reference,
it is clear that references must be values in this extension of PCF.
In these rules, the judgement e, m ⊢ t ↪ V, m′ means that, in the environment e
and the state m, the term t has the value V and produces the new state m′.

e, m ⊢ x ↪ V, m    if e contains x = V

e′, m ⊢ fix y t ↪ V, m′
--------------------------   if e contains x = ⟨fix y t, e′⟩
e, m ⊢ x ↪ V, m′

e, m ⊢ fun x -> t ↪ ⟨x, t, e⟩, m

e, m ⊢ u ↪ W, m′    e, m′ ⊢ t ↪ ⟨x, t′, e′⟩, m″    (e′, x = W), m″ ⊢ t′ ↪ V, m‴
---------------------------------------------------------------------------------
e, m ⊢ t u ↪ V, m‴

e, m ⊢ n ↪ n, m

e, m ⊢ u ↪ q, m′    e, m′ ⊢ t ↪ p, m″
---------------------------------------   if p ⊕ q = n
e, m ⊢ t ⊕ u ↪ n, m″

e, m ⊢ t ↪ 0, m′    e, m′ ⊢ u ↪ V, m″
---------------------------------------
e, m ⊢ ifz t then u else v ↪ V, m″
e, m ⊢ t ↪ n, m′    e, m′ ⊢ v ↪ V, m″
---------------------------------------   if n is a number different from 0
e, m ⊢ ifz t then u else v ↪ V, m″

(e, x = ⟨fix x t, e⟩), m ⊢ t ↪ V, m′
--------------------------------------
e, m ⊢ fix x t ↪ V, m′

e, m ⊢ t ↪ W, m′    (e, x = W), m′ ⊢ u ↪ V, m″
------------------------------------------------
e, m ⊢ let x = t in u ↪ V, m″
We can now give rules for the three new constructs, ref, ! and :=
e, m ⊢ t ↪ V, m′
---------------------------------   if r is any reference not occurring in m′
e, m ⊢ ref t ↪ r, (m′, r = V)

e, m ⊢ t ↪ r, m′
---------------------   if m′ contains r = V
e, m ⊢ !t ↪ V, m′

e, m ⊢ t ↪ r, m′    e, m′ ⊢ u ↪ V, m″
---------------------------------------
e, m ⊢ t := u ↪ 0, (m″, r = V)
The construction t; u whose semantics is obtained by interpreting t, throwing
away the value obtained, then interpreting u, is not very interesting in a language
without side effects, because in that case the value of the term t; u is always the
same as the value of u, assuming t terminates. We can now add it to PCF
e, m ⊢ t ↪ V, m′    e, m′ ⊢ u ↪ W, m″
---------------------------------------
e, m ⊢ t; u ↪ W, m″
We can now also add constructions whilez, for, ..., which were of no interest in
a language without side effects.
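As an illustration, here is a minimal sketch of how the state can be threaded through an interpreter for the three new constructs; the representation of terms, values, references and the state is ours, and the other cases of PCF are omitted (they are the subject of Exercise 7.1 below).

type term =
  | Const of int
  | Mkref of term
  | Deref of term
  | Assign of term * term

type value = Num of int | Ref of int
type store = (int * value) list

(* a reference not occurring in the state *)
let fresh_ref (store : store) =
  1 + List.fold_left (fun m (r, _) -> max m r) 0 store

(* eval store t returns the value of t and the new state *)
let rec eval store t =
  match t with
  | Const n -> (Num n, store)
  | Mkref t ->
      let v, store = eval store t in
      let r = fresh_ref store in
      (Ref r, (r, v) :: store)
  | Deref t -> (
      match eval store t with
      | Ref r, store -> (List.assoc r store, store)
      | _ -> failwith "not a reference")
  | Assign (t, u) -> (
      match eval store t with
      | Ref r, store ->
          let v, store = eval store u in
          (Num 0, (r, v) :: store)
      | _ -> failwith "not a reference")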
Exercise 7.1 Write an interpreter for the language PCF with references.
The uncertainty that we mentioned at the beginning of the book regarding the
evaluation of nested functions is finally elucidated.
Exercise 7.2 Consider the term
let n = ref 0
in let f = fun x -> fun y -> x
in let g = fun z -> (n := !n + z; !n)
in f (g 2) (g 7)
What is the value of this term? In which order will the arguments be interpreted in
PCF? Why?
Modify the rules given above to obtain the value 2 instead of the value 9 for this
term.
a constant for each of these references. These references will be called mutable
variables. The symbol := applies now to a mutable variable and a term, written
X := t.
If X is a mutable variable, the value that the operational semantics associates to
the term X is the value associated to the reference X in the state available at the time
of interpretation.
Give a big-step operational semantics for this extension of PCF.
Write an interpreter for this extension of PCF.
Exercise 7.9 (A minimal imperative language) Consider a language including integer constants, arithmetic operations, mutable variables (see Exercise 7.8), assignment :=, sequence ;, a conditional ifz and a whilez loop (but without the usual
notion of variable, fun, fix, let or application).
Give rules to define the operational semantics of this language. Write an interpreter for this language. Write a program to compute factorial in this language. What
can we program in this language?
To conclude this chapter, we remark that in most programming languages there
are two different ways to program the factorial function. For example, in Java, we
can program it recursively
static int fact (int x) {
  if (x == 0) return 1;
  return x * (fact (x - 1));
}
or iteratively
static int fact (int x) {
  int k = 1;
  for (int i = 1; i <= x; i = i + 1) k = k * i;
  return k;
}
Should we prefer the first version or the second?
Of course, the theory of programming languages does not give us an answer to
moral questions of the form "Should we...?". We could nevertheless say a few
words about the way this question has evolved.
In the first programming languages (machine languages, assembly languages,
Fortran, Basic, ...) only the second version could be programmed. Indeed, a program with loops and references is easier to execute in a machine that is itself, ultimately, a physical system with a mutable state, than a program that requires evaluating
a function defined via a fixed point.
Lisp was one of the first languages to promote the use of recursive definitions.
With Lisp, for the first time, programs did away with references and side effects,
and this simplified the semantics of the language, brought it close to mathematical
language, allowed programmers to reason about programs more easily, and facilitated the task of writing complex programs. For example, it is much easier to
write a program to compute the derivative of an algebraic expression using recursion than by keeping track of a stack of expressions that are waiting to be processed. It was
then natural to contrast the pure functional style of programming with the impure
imperative one.
But the first implementations of functional languages were very slow in comparison with those of imperative languages, precisely because, as we have said, it is
more difficult to execute a functional program on a machine, which is a physical
system, than it is to execute an imperative program. During the 1990s, the compilation techniques for functional languages made such huge progress that efficiency
is no longer a valid argument against functional programming today, except in the
domain of intensive computation.
Moreover, all modern languages include both functional and imperative features,
which means that today the only valid argument to justify the choice of a particular
style should be its simplicity and ease of use.
From this point of view, it is clear that not all problems are identical. A program
that computes derivatives for functional expressions is easier to express in functional
style. In contrast, when we program the Logo turtle it is more natural to talk about
the position of the turtle, its orientation, . . . that is, its state at a given instant. It
is also natural to talk about the actions that the turtle does: to move, to write a line,
. . . , and it is not easy to express all this in a functional way: in fact, it is not natural
to think of the turtle's actions as functions over the space of drawings.
There is still one point that remains mysterious: programs, whether functional or
imperative, are always functions from inputs to outputs. If imperative programming
brought us new ways of defining functions, which in certain cases are more practical
from a Computer Science point of view than the mathematical definitions that are
typical of functional languages, we could wonder whether they would also be more
practical for mathematicians. However, so far the mathematical language has not
adopted the notion of reference.
Chapter 8
8.1 Records
In the equations describing the movement of two bodies that exert a force on each
other, for example, a star and a planet, their positions are represented by three coordinates (functions of time). This leads to a system of differential equations with
six variables. However, instead of flattening them, we can group them in two
packages of three variables each, obtaining a system of differential equations with
vector variables. There are mathematical tools to pack several values into one: the
notion of a pair, which can be iterated to build tuples, and the notion of a finite
sequence.
In programming languages we also need tools to pack several values into one.
The tools that we have for this are the notion of a pair, the notion of an array,
the notion of a record, the notion of an object and the notion of a module. The
components of those structures are called fields.
The small-step operational semantics of PCF will now include the following rules
{l1 = t1, ..., ln = tn}.li ⟶ ti

{l1 = t1, ..., ln = tn}(li <- u) ⟶ {l1 = t1, ..., li-1 = ti-1, li = u, li+1 = ti+1, ..., ln = tn}
Similarly, the big-step operational semantics is extended with the following rules
t1 ↪ V1    ...    tn ↪ Vn
---------------------------------------------------
{l1 = t1, ..., ln = tn} ↪ {l1 = V1, ..., ln = Vn}

t ↪ {l1 = V1, ..., ln = Vn}
-----------------------------
t.li ↪ Vi

t ↪ {l1 = V1, ..., ln = Vn}    u ↪ W
----------------------------------------------------------------------------
t(li <- u) ↪ {l1 = V1, ..., li-1 = Vi-1, li = W, li+1 = Vi+1, ..., ln = Vn}
Notice that in these rules the terms of sort label are not interpreted. This is because, as mentioned above, these terms are constants.
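As an illustration, the call by value interpretation of records can be sketched in OCaml as follows, representing record values as association lists from labels to values; the datatypes and names are ours, and the rest of the interpreter is omitted.

type term =
  | Const of int
  | Rcd of (string * term) list
  | Field of term * string
  | Update of term * string * term

type value = Num of int | VRecord of (string * value) list

let rec eval t =
  match t with
  | Const n -> Num n
  | Rcd fields ->
      (* call by value: every field is interpreted when the record is built *)
      VRecord (List.map (fun (l, t) -> (l, eval t)) fields)
  | Field (t, l) -> (
      match eval t with
      | VRecord fields -> List.assoc l fields
      | _ -> failwith "not a record")
  | Update (t, l, u) -> (
      match eval t with
      | VRecord fields ->
          let w = eval u in
          VRecord (List.map (fun (l', v) -> if l' = l then (l', w) else (l', v)) fields)
      | _ -> failwith "not a record")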
Exercise 8.2 Write an interpreter for PCF with records.
Exercise 8.3 The goal of this exercise is to represent a Logo turtle with a record
containing an abscissa, an ordinate, and an angle. The turtle should have an internal state so that it can move without changing its identity (see the introduction to
Chap. 7. There are two alternatives: the turtle can be defined as a record of references to real numbers, or as a reference to a record of real numbers. Write the
function move-forward in both cases.
In this exercise we assume that there is a type of real numbers and all the necessary operations.
In the big-step operational semantics that we gave for PCF with records, the interpretation of the term {a = 3 + 4, b = 2} requires performing the addition
of 3 and 4. In contrast, once the value {a = 7, b = 2} is built, an access to the
field a does not require performing any arithmetic operation.
An alternative would be to delay the addition and assume that the term {a =
3 + 4, b = 2} is a value that can be interpreted as itself. In this case, we will
need to interpret the term 3 + 4 each time there is an access to the field a. We
could say that this semantics is a call by name one, as opposed to the semantics we
gave above, which follows the call by value strategy.
In call by name, the rules of the operational semantics are
{l1 = t1, ..., ln = tn} ↪ {l1 = t1, ..., ln = tn}

t ↪ {l1 = t1, ..., ln = tn}    ti ↪ V
--------------------------------------
t.li ↪ V

t ↪ {l1 = t1, ..., ln = tn}
----------------------------------------------------------------------------
t(li <- u) ↪ {l1 = t1, ..., li-1 = ti-1, li = u, li+1 = ti+1, ..., ln = tn}
Exercise 8.4 Write an interpreter for PCF with records following the call by name
semantics.
If we compare these two semantics of records, we are led to make the same
comments as for the semantics of functions in call by value vs. call by name: the
interpretation of let x = {a = fact 10, b = 4} in x.b requires the
computation of the factorial of 10 in call by value, but not in call by name. On
the other hand, the interpretation of let x = {a = fact 10, b = 4} in
x.a + x.a under call by name triggers the computation of the factorial of 10 twice.
The interpretation of let x = {a = fix y y, b = 4} in x.b produces
an infinite loop under call by value, whereas it successfully returns 4 under call by
name. Finally, when we also have references, the side effects of the interpretation
of a field could be repeated several times if we access the field several times (see
Exercise 7.5).
For example, if we build a record x with a field a that is a reference to a natural
number, initially 0, and a function inc that increases this number by one, and then
we write a term that increases this value and returns it, we obtain
let x = {a = ref 0}
in let inc = fun y -> (y.a := 1 + !(y.a))
in (inc x; !(x.a))
Under call by value, this term produces the result 1, as one expects. However, a call
by name interpretation will access three times the field a of the record x, that is, it
will interpret three times the term ref 0, creating three references that point to the
value 0. The third reference, created by the interpretation of the term !(x.a), is
never updated and therefore the interpretation of the program above under call
by name produces the result 0.
To make sure that the call by value and the call by name interpretations produce
the same result, we should avoid side effects (such as the creation of a reference in
the example above) during the interpretation of fields. We can rewrite the term as
follows
let r = ref 0
in let x = {a = r}
in let inc = fun y -> (y.a := 1 + !(y.a))
in (inc x; !(x.a))
which guarantees that the value will be 1, whether in call by value or call by name.
Exercise 8.5 (Types for records) Consider a type person for records with three
fields: surname, name and telephone. Show that we can program the three
8.2 Objects
Programs usually deal with various kinds of data, often structured as records. For example, a company's computer system might deal with order forms from customers,
invoices, pay slips... A customer order might be represented as a record including
the identification of the object ordered, the quantity requested... To print the data
there are several alternatives. We could write a unique function print that starts by
checking which kind of data we want to print (order form, pay slip, ...) and then
prints it in a different format depending on the kind of data. Or we could write several functions: print_order_form, print_pay_slip... Alternatively, we
could define a record print where each field is a printing function. Yet another
option would be to make each printing function a part of the type. Such a data type
is called a class, and its elements are called objects.
In the most radical object-oriented programming style, each object, for instance,
each order form, includes a different function print. An order form is then a record
that contains, in addition to the standard fields (identification of the item requested,
number of items ordered, ...), a field print defining the printing function that
should be used to print the object.
Some languages, for instance Java, associate a print function to each class
rather than to each object. Thus, all the objects in the class share the printing
function, whether static or dynamic. If we do not want to share the printing function for two objects t and u in the same class C, we need to define two sub-classes
T and U of C, which inherit all the fields of C but redefine print differently.
s -> 3. Thus, the term t#a, that is, t.a t or (fun s -> 3) t, is interpreted
as the value 3.
The first argument of each method in the object is then a bound variable, which
is usually called self or this. In fact, most programming languages use a special
variable self or this which is implicitly bound in the object, and which denotes
the object itself.
When all methods in a record are terms of the form fun x -> ..., they can
be interpreted as themselves, and we can simplify the rule
fun x1 -> t1 ↪ V1    ...
---------------------------------------------
{l1 = fun x1 -> t1, ...} ↪ {l1 = V1, ...}
by using
{l1 = fun x1 -> t1, ...} ↪ {l1 = fun x1 -> t1, ...}
Similarly, the rule
t ↪ {l1 = V1, ...}
--------------------
t.li ↪ Vi
specialises to
t ↪ {l1 = fun x1 -> t1, ...}
------------------------------
t.li ↪ fun xi -> ti
and finally the rule
t ↪ {l1 = V1, ...}    u ↪ W
--------------------------------------------
t(li <- u) ↪ {l1 = V1, ..., li = W, ...}
can be replaced by
t ↪ {l1 = fun x1 -> t1, ...}
----------------------------------------------------------------------
t(li <- fun x -> u) ↪ {l1 = fun x1 -> t1, ..., li = fun x -> u, ...}
To force all fields to be functions, we can modify the language of records, passing
from a record language to an object-oriented language. The symbol {} now binds a
variable in each even argument (the terms), the symbol . is replaced by the symbol #,
and the symbol <- now binds a variable in its third argument.
The term {}(l1, s1 t1, ..., ln, sn tn) is written {l1 = s1 t1,
..., ln = sn tn}, the term #(t,l) is written t#l and the term <-(t,l,
s u) is written t(l <- s u). The rules of the big-step operational semantics
are now
{l1 = s1 t1, ..., ln = sn tn} ↪ {l1 = s1 t1, ..., ln = sn tn}

t ↪ {l1 = s1 t1, ..., ln = sn tn}    (t/si)ti ↪ V
---------------------------------------------------
t#li ↪ V

t ↪ {l1 = s1 t1, ...}
------------------------------------------------------------------------------------
t(li <- s u) ↪ {l1 = s1 t1, ..., li-1 = si-1 ti-1, li = s u, li+1 = si+1 ti+1, ...}
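As a small illustration, an object can be modelled in OCaml by a list of methods, each method being a function of the object itself (playing the rôle of the bound variable s); the names used here are ours.

type value = Num of int | Obj of (string * (value -> value)) list

(* t#l: find the method l in the object and apply it to the object
   itself, following the rule for # above *)
let invoke t l =
  match t with
  | Obj methods -> (List.assoc l methods) t
  | _ -> failwith "not an object"

(* an object whose field a ignores its self argument and returns 3,
   as in the example t#a earlier: invoke t "a" evaluates to Num 3 *)
let t = Obj [ ("a", fun _self -> Num 3) ]
let _ = invoke t "a"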
Exercise 8.7 Write an interpreter for the language PCF with objects.
Exercise 8.8 (Late binding) Consider the term
(({x = s 4, f = s fun y -> y + s#x} (x <- s 5))#f) 6
Is the value of this term 10 or 11? Compare this result with that of Exercise 2.8.
Chapter 9
Epilogue
The first goal of this book was to present the main tools to define the semantics
of a programming language: small-step operational semantics, big-step operational
semantics, and denotational semantics.
We have stressed the fact that these three tools have the same purpose. In the
three cases, the goal is to define a relation between a program, an input value
and an output value. Since the goal is to define a relation, the question that arises
naturally is: how do we define relations in mathematical language?
The answer is the same in the three cases: the means to achieve the goal is the
fixed point theorem. However, the similarity is superficial, because the fixed point
theorems are used in different ways in the three semantics. By giving rise to inductive definitions, and hence reflexive-transitive closures, the fixed point theorem
plays a major rôle in operational semantics. In contrast, it plays a minor rôle in
denotational semantics, because it is only used to give the meaning of the construction fix. The denotational semantics of a language without fixed point, such as
Gödel's System T (see Exercise 5.13), can be defined without using the fixed point
theorem.
To highlight the differences, we can look at the rôle of derivations. To establish
that a term t has the value V in operational semantics, it is sufficient to show a
derivation, or a sequence of reductions, that is, finite objects. In contrast, in denotational semantics the meaning of a term of the form fix is given as the least fixed
point of a function, that is, a limit. For this reason, to establish that the value of
a term t is V we sometimes need to compute the limit of a sequence, that is, we
sometimes need to deal with an infinite object.
Operational semantics have an advantage over denotational ones, because the relation can be defined in a more concrete way operationally. But on the other
hand, operationally we can only define relations that are recursively enumerable,
whereas denotationally we can define arbitrary relations. For this reason, in operational semantics we cannot complete the definition of the relation by adding
a value ⊥ for the terms that do not terminate, because the resulting relation is not
recursive: it cannot be effectively defined by induction. In contrast, denotationally it
is not a problem to add such a value.
We see here the dilemma that arises from the undecidability of the halting problem: we cannot complete the relation by adding ⊥ for the non-terminating terms,
and at the same time define it inductively. We have to choose between completing
the relation or defining it inductively, which leads to two different semantics. The
readers who have followed logic courses before will recognise here the same issues
that distinguish the truth judgements that are inductively defined, by the existence
of a proof, from those that are defined by their validity in a model.
The second goal of this book was to give the semantics of some programming language features: explicit definitions of functions, functions defined by fixed
points, assignment, records, objects. . . . Here again, since the goal is to define functions, it is useful to start by looking at the ways in which functions are defined in
Mathematics. In general, the comparison between the mathematical language and
programming languages is fruitful, since the mathematical language is the closest
we have to programming languages. This comparison shows some common points,
but also some differences.
The purpose of the study of programming language features is not to be exhaustive, but to show some informative examples. The point to remember is that, in the
same way that Zoology is not the study of all the animal species one after the other,
the study of programming languages should not consist of studying all languages
one after the other. They should be organised according to their main features.
We could continue this study by defining data types and exceptions. The study
of data types would give us the opportunity to use again the fixed point theorem,
and Robinson's unification algorithm, of which matching is a particular case. Going
forward in this direction we could study the notion of backtracking which leads to
Prolog. Other important points that we have left aside are the polymorphic typing
of references, the notion of array, imperative objects, modules, type systems for
records and objects (and in particular the notion of sub-type), concurrency. . . .
The final goal of this book was to present a number of applications of these
tools, in particular for the design and implementation of interpreters and compilers,
and also the implementation of type inference systems. The main point here is that
the structure of a compiler is derived directly from the operational semantics of
the language to be compiled. The next step would be the study of implementation
techniques for abstract machines, and this would lead us to the study of memory
management and garbage collection. We could also study program analysis, and
design systems to deduce in an automatic or interactive way properties of programs,
for instance, the property that states that the value returned by a sorting algorithm is
a sorted list.
The last point that remains to be discussed is the rôle of the theory of programming
languages, and in particular whether its purpose is to describe the existing programming languages, or to propose new languages.
Astronomers study the galaxies that exist, and do not build new ones, whereas
chemists study the existing molecules and build new ones. We know that in the latter case, the order in which theories and production techniques appear may vary: the
transformation of mass into energy was achieved a long time after the theory of relativity, whereas the steam engine appeared before the principles of thermodynamics
were established.
The theory of programming languages has enabled the development of new features, such as static binding, type inference, polymorphic types, garbage collection,
... which are now available in commercial languages. In contrast, other functionalities, such as assignments and objects, were introduced into programming languages
in a rather wild fashion, and the theory has been slow to follow. The development of a formal semantics for these constructs led in turn to new proposals, such as the recent extensions
of Java with polymorphic types.
The theory of programming languages has neither an exclusively descriptive rôle
nor an exclusively leading rôle. It is this going back and forth between the
description of existing features and the design of new ones that gives the theory of
programming languages its dynamics.
Index
A
Abstract machine, 43
Algorithm
Damas and Milner, 70
Hindley's, 64
Robinson's, 66
α-equivalence, 11
Alphabetic equivalence, 11
Arity, 7
Array, 81
B
β-reduction, 19
Binding
dynamic, 23
late, 88
static, 23
C
Call by name, 26, 28, 33
Call by value, 27, 29, 35
Church numeral, 20
Closed set, 4
Closure, 34
recursive, 38
Compiler, 43
bootstrapping, 43
Composition, 12
Confluence, 24
Constant, 7
Continuous function, 2
D
De Bruijn index, 36
Definition
explicit, 1
inductive, 4
Derivation, 5
Deterministic, vi
E
Environment, 33
semantic, 56
typing, 53
Evaluate, 22
Evaluator, 25
F
Fields, 81
Fixed point
Curry, 23
first theorem, 2
function construction via, 38
in PCF, 17
second theorem, 3
Functionalisation, 78
H
Height, 10
I
Interpreter, 33
L
Label, 82
Language, 7
Limit, 1
List, 42
M
Method, 85
dynamic, 86
static, 86
Module, 81
N
Number of arguments, 7
O
Object, 85
Ordering, 1
Scott's, 58
strongly complete, 2
weakly complete, 1
P
Pair, 42, 81
PCF (Programming language for computable
functions), 15
Polymorphism, 68
Position numerals, 20
R
Record, 82
in call by name, 83
in call by value, 83
Redex, 19
Reduction
call by name, 26
call by value, 27
lazy, 27
weak, 26
Reference, 73
Register, 44
accumulator, 44
code, 45
environment, 45
stack, 44
Renaming, 67
Result, 22
Rule, 4
S
Semantics
big-step operational, 12
denotational, 12
small-step operational, 12
Side effect, 74
Solution, 65
principal, 66
Sort, 9
Strategy, 25
Subject reduction, 55
Substitution, 10
System
F, 71
T, 61
T
Term, 7
closed, 10
irreducible, 22
stuck, 22
Thunk, 33
Tree, 42
Type, 52
checking, 54
inference, 63
principal, 66
Type preservation
by interpretation, 56
Type scheme, 68
U
Unification, 66
V
Value, 22, 26
extended, 35
rational, 40
Variable, 8
capture, 11
environment, 36
mutable, 79