Compiler Construction
William M. Waite
Department of Electrical Engineering
University of Colorado
Boulder, Colorado 80309
USA
email: [email protected]
Gerhard Goos
Institut für Programmstrukturen und Datenorganisation
Fakultät für Informatik
Universität Karlsruhe
D-76128 Karlsruhe
Germany
email: [email protected]
All known errors from the first and second printing (1994 and 1995) have been fixed. While every
precaution has been taken in preparation of this book, the authors assume no responsibility for errors
or omissions, or damages resulting from the use of the information contained here.
© 1984-1994 by Springer-Verlag, Berlin, New York Inc. ISBN 0-387-90821-8 and ISBN 3-540-90821
© 1995 by William M. Waite and Gerhard Goos.
All rights reserved. No part of this book may be translated, reproduced, archived or sold in any form
without written permission from one of the authors.
The content of Compiler Construction is made available via the Web by permission of the authors
as a service to the community and only for educational purposes. The book may be accessed freely
via Web browsers. The URL is ftp://i44ftp.info.uni-karlsruhe.de/pub/papers/ggoos/CompilerConstruction.ps.gz.
Karlsruhe, 22nd February 1996
To all who know more than one language
Preface
Compilers and operating systems constitute the basic interfaces between a programmer and
the machine for which he is developing software. In this book we are concerned with the
construction of the former. Our intent is to provide the reader with a firm theoretical basis
for compiler construction and sound engineering principles for selecting alternate methods,
implementing them, and integrating them into a reliable, economically viable product. The
emphasis is upon a clean decomposition employing modules that can be re-used for many
compilers, separation of concerns to facilitate team programming, and flexibility to
accommodate hardware and system constraints. A reader should be able to understand the
questions he must ask when designing a compiler for language X on machine Y, what tradeoffs
are possible, and what performance might be obtained. He should not feel that any part of the
design rests on whim; each decision must be based upon specific, identifiable characteristics
of the source and target languages or upon design goals of the compiler.
The vast majority of computer professionals will never write a compiler. Nevertheless,
study of compiler technology provides important benefits for almost everyone in the field.
It focuses attention on the basic relationships between languages and machines.
Understanding of these relationships eases the inevitable transitions to new hardware and
programming languages and improves a person's ability to make appropriate tradeoffs
in design and implementation.
It illustrates application of software engineering techniques to the solution of a significant
problem. The problem is understandable to most users of computers, and involves both
combinatorial and data processing aspects.
Many of the techniques used to construct a compiler are useful in a wide variety of applications
involving symbolic data. In particular, every man-machine interface constitutes
a form of programming language and the handling of input involves these techniques.
We believe that software tools will be used increasingly to support many aspects of
compiler construction. Much of Chapters 7 and 8 is therefore devoted to parser generators
and analyzers for attribute grammars. The details of this discussion are only
interesting to those who must construct such tools; the general outlines must be known
to all who use them. We also realize that construction of compilers by hand will remain
an important alternative, and thus we have presented manual methods even for those
situations where tool use is recommended.
Virtually every problem in compiler construction has a vast number of possible solutions.
We have restricted our discussion to the methods that are most useful today, and make no
attempt to give a comprehensive survey. Thus, for example, we treat only the LL and LR
parsing techniques and provide references to the literature for other approaches. Because we
do not constantly remind the reader that alternative solutions are available, we may sometimes
appear overly dogmatic although that is not our intent.
Chapters 5 and 8, and Appendix B, state most theoretical results without proof. Although
this makes the book unsuitable for those whose primary interest is the theory underlying a
compiler, we felt that emphasis on proofs would be misplaced. Many excellent theoretical
texts already exist; our concern is reduction to practice.
A compiler design is carried out in the context of a particular language/machine pair.
Although the principles of compiler construction are largely independent of this context, the
detailed design decisions are not. In order to maintain a consistent context for our major
examples, we therefore need to choose a particular source language and target machine. The
source language that we shall use is defined in Appendix A. We chose not to use an existing
language for several reasons, the most important being that a new language enabled us to
control complexity: Features illustrating significant questions in compiler design could be
included while avoiding features that led to burdensome but obvious detail. It also allows
us to illustrate how a compiler writer derives information about a language, and provides an
example of an informal but relatively precise language definition.
We chose the machine language of the IBM 370 and its imitators as our target. This
architecture is widely used, and in many respects it is a difficult one to deal with. The
problems are representative of many computers, the important exceptions being those (such
as the Intel 8086) without a set of general registers. As we discuss code generation and
assembly strategies we shall point out simplifications for more uniform architectures like
those of the DEC PDP11 and Motorola 68000.
We assume that the reader has a minimum of one year of experience with a block-
structured language, and some familiarity with computer organization. Chapters 5 and 8
use notation from logic and set theory, but the material itself is straightforward. Several
important algorithms are based upon results from graph theory summarized in Appendix B.
This book is based upon many compiler projects and upon the lectures given by the
authors at the Universität Karlsruhe and the University of Colorado. For self-study, we
recommend that a reader with very little background begin with Section 1.1, Chapters 2
and 3, Section 12.1 and Appendix A. His objective should be to thoroughly understand the
relationships between typical programming languages and typical machines, relationships that
define the task of the compiler. It is useful to examine the machine code produced by existing
compilers while studying this material. The remainder of Chapter 1 and all of Chapter 4 give
an overview of the organization of a compiler and the properties of its major data structures,
while Chapter 14 shows how three production compilers have been structured. From this
material the reader should gain an appreciation for how the various subtasks relate to one
another, and the important characteristics of the interfaces between them.
Chapters 5, 6 and 7 deal with the task of determining the structure of the source program.
This is perhaps the best-understood of all compiler tasks, and the one for which the most
theoretical background is available. The theory is summarized in Chapter 5, and applied in
Chapters 6 and 7. Readers who are not theoretically inclined, and who are not concerned
with constructing parser generators, should skim Chapter 5. Their objectives should be to
understand the notation for describing grammars, to be able to deal with finite automata,
and to understand the concept of using a stack to resolve parenthesis nesting. These readers
should then concentrate on Chapter 6, Section 7.1 and the recursive descent parse algorithm
of Section 7.2.2.
The relationship between Chapter 8 and Chapter 9 is similar to that between Chapter 5
and Chapter 7, but the theory is less extensive and less formal. This theory also underlies
parts of Chapters 10 and 11. We suggest that the reader who is actually engaged in com-
piler construction devote more effort to Chapters 8-11 than to Chapters 5-7. The reason is
that parser generators can be obtained "off the shelf" and used to construct the lexical and
syntactic analysis modules quickly and reliably. A compiler designer must typically devote
most of his effort to specifying and implementing the remainder of the compiler, and hence
familiarity with Chapters 8-11 will have a greater effect on his productivity.
The lecturer in a one-semester, three-hour course that includes exercises is compelled to
restrict himself to the fundamental concepts. Details of programming languages (Chapter 2),
machines (Chapter 3) and formal languages and automata theory (Chapter 5) can only be
covered in a cursory fashion or must be assumed as background. The specific techniques
for parser development and attribute grammar analysis, as well as the whole of Chapter 13,
must be reserved for a separate course. It seems best to present theoretical concepts from
Chapter 5 in close conjunction with the specific methods of Chapters 6 and 7, rather than as
a single topic. A typical outline is:
1. The Nature of the Problem 4 hours
1.1. Overview of compilation (Chapter 1)
1.2. Languages and machines (Chapters 2 and 3)
2. Compiler Data Structures (Chapter 4) 4 hours
3. Structural Analysis 10 hours
3.1. Formal Systems (Chapter 5)
3.2. Lexical analysis (Chapter 6)
3.3. Parsing (Chapter 7)
Review and Examination 2 hours
4. Consistency Checking 10 hours
4.1. Attribute grammars (Chapter 8)
4.2. Semantic analysis (Chapter 9)
5. Code Generation (Chapter 10) 8 hours
6. Assembly (Chapter 11) 2 hours
7. Error Recovery (Chapter 12) 3 hours
Review 2 hours
The students do not write a compiler during this course. For several years it has been
run concurrently with a practicum in which the students implement the essential parts of a
LAX compiler. They are given the entire compiler, with stubs replacing the parts they are to
write. In contrast to project courses in which the students must write a complete compiler, this
approach has the advantage that they need not be concerned with unimportant organizational
tasks. Since only the central problems need be solved, one can deal with complex language
properties. At the same time, students are forced to read the environment programs and to
adhere to interface specifications. Finally, if a student cannot solve a particular problem it
does not cause his entire project to fail since he can take the solution given by the instructor
and proceed.
Acknowledgements
This book is the result of many years of collaboration. The necessary research projects and
travel were generously supported by our respective universities, the Deutsche Forschungsge-
meinschaft and the National Science Foundation.
It is impossible to list all of the colleagues and students who have influenced our work.
We would, however, like to specially thank four of our doctoral students, Lynn Carter, Bruce
Haddon, Uwe Kastens and Johannes Röhrich, for both their technical contributions and their
willingness to read the innumerable manuscripts generated during the book's gestation. Mae
Jean Ruehlman and Gabriele Sahr also have our gratitude for learning more than they ever
wanted to know about computers and word processing as they produced and edited those
manuscripts.
Contents
Preface

Contents

1 Introduction and Overview
1.1 Translation and Interpretation
1.2 The Tasks of a Compiler
1.3 Data Management in a Compiler
1.4 Compiler Structure
1.5 Notes and References

2 Properties of Programming Languages
2.1 Overview
2.1.1 Syntax, Semantics and Pragmatics
2.1.2 Syntactic Properties
2.1.3 Semantic Properties
2.2 Data Objects and Operations
2.2.1 Elementary Types
2.2.2 Composite Types
2.2.3 Strings
2.2.4 Pointers
2.2.5 Type Equivalence
2.3 Expressions
2.4 Control Structures
2.5 Program Environments and Abstract Machine States
2.5.1 Constants, Variables and Assignment
2.5.2 The Environment
2.5.3 Binding
2.6 Notes and References

3 Properties of Real and Abstract Machines
3.1 Basic Characteristics
3.1.1 Storage Classes
3.1.2 Access Paths
3.1.3 Operations
3.2 Representation of Language Elements
3.2.1 Elementary Objects
3.2.2 Composite Objects
3.2.3 Expressions
3.2.4 Control Structures
while i ≠ j do
if i > j then i := i - j else j := j - i;
a) An algorithm
Initial: i = 36 j = 24
i = 12 j = 24
Final: i = 12 j = 12
b) A particular sequence of states
Figure 1.1: Algorithms and States
For example, consider the language of a simple computer with a single accumulator and two
data locations called I and J respectively (Exercise 1.3). Suppose that M maps a particular
state of the algorithm given in Figure 1.1a to a set of machine states in which I contains the
value of the variable i, J contains the value of the variable j , and the accumulator contains
any arbitrary value. Figure 1.2a shows a translation of Figure 1.1a for this machine; a partial
state sequence is given in Figure 1.2b.
LOOP LOAD I
SUB J
JZERO EXIT
JNEG SUBI
STORE I
JUMP LOOP
SUBI LOAD J
SUB I
STORE J
JUMP LOOP
EXIT
a) An algorithm
Initial: I = 36 J = 24 ACC = ?
I = 36 J = 24 ACC = 36
I = 36 J = 24 ACC = 12
. . .
Final: I = 12 J = 12 ACC = 0
b) A sequence of states corresponding to Figure 1.1b
Figure 1.2: A Translation of Figure 1.1
In determining the state sequence of Figure 1.1b, we used only the concepts of Pascal as
specified by the language definition. For every programming language, PL, we can define
an abstract machine: The operations, data structures and control structures of PL become
the memory elements and instructions of the machine. A `Pascal machine' is therefore an
imaginary computer with Pascal operations as its machine instructions and the data objects
possible in Pascal as its memory elements. Execution of an algorithm written in PL on such
a machine is called interpretation; the abstract machine is an interpreter.
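To make the notion of an interpreter concrete, here is a small Pascal sketch of an interpreter for the accumulator machine of Exercise 1.3, loaded with the translation of Figure 1.2a. The instruction encoding and all identifiers below are our own assumptions; the text does not prescribe any particular implementation.

program AccMachine;
{ A sketch of an interpreter for the accumulator machine of
  Exercise 1.3, loaded with the program of Figure 1.2a.
  Data location I corresponds to data[0], J to data[1]. }
type
  OpCode = (LOAD, STORE, SUB, JUMP, JZERO, JNEG, HALT);
  Instr = record
    op : OpCode;
    arg : integer   { a data location or an instruction index }
  end;
var
  code : array [0 .. 10] of Instr;
  data : array [0 .. 1] of integer;
  acc, pc, cur : integer;
  done : boolean;

procedure Emit (at : integer; o : OpCode; a : integer);
begin
  code[at].op := o; code[at].arg := a
end;

begin
  Emit(0, LOAD, 0);  Emit(1, SUB, 1);   Emit(2, JZERO, 10);
  Emit(3, JNEG, 6);  Emit(4, STORE, 0); Emit(5, JUMP, 0);
  Emit(6, LOAD, 1);  Emit(7, SUB, 0);   Emit(8, STORE, 1);
  Emit(9, JUMP, 0);  Emit(10, HALT, 0);
  data[0] := 36; data[1] := 24;     { initial state of Figure 1.2b }
  acc := 0; pc := 0; done := false;
  while not done do begin
    cur := pc; pc := pc + 1;        { default successor }
    case code[cur].op of
      LOAD : acc := data[code[cur].arg];
      STORE: data[code[cur].arg] := acc;
      SUB  : acc := acc - data[code[cur].arg];
      JUMP : pc := code[cur].arg;
      JZERO: if acc = 0 then pc := code[cur].arg;
      JNEG : if acc < 0 then pc := code[cur].arg;
      HALT : done := true
    end
  end;
  writeln('I = ', data[0], '  J = ', data[1])  { prints I = 12  J = 12 }
end.

Running this program reproduces the final state of Figure 1.2b.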
A pure interpreter analyzes the character form of each source language instruction every
time that instruction is executed. If the given instruction is only to be executed once, pure
interpretation is the least expensive method of all. Hence it is often used for job control
languages and the `immediate commands' of interactive languages. When instructions are
to be executed repeatedly, a better approach is to analyze the character form of the source
program only once, replacing it with a sequence of symbols more amenable to interpretation.
This analysis is simply a translation of the source language into some target language, which
is then interpreted.
The translation from the source language to the target language can take place as each
instruction of the program is executed for the first time (interpretation with substitution).
Thus only that part of the program actually executed will be translated; during testing this
may be only a fraction of the entire program. Also, the character form of the source program
can often be stored more compactly than the equivalent target program. The disadvantage of
interpretation with substitution is that both the compiler and interpreter must be available
during execution. In practice, however, a system of this kind should not be significantly larger
than a pure interpreter for the same language.
Examples may be found of virtually all levels of interpretation. At one extreme are the
systems in which the compiler merely converts constants to internal form, fixes the meaning
of identifiers and perhaps transforms infix notation to postfix (APL and SNOBOL4 are
commonly implemented this way); at the other are the systems in which the hardware, assisted
by a small run-time system, forms the interpreter (FORTRAN and Pascal implementations
usually follow this strategy).
The analysis task can be broken
down into two parts, the structural analysis to determine the static structure of the source
program, and the semantic analysis to fix the additional information and check its consistency.
Chapter 5 summarizes some results from the theory of formal languages and shows how they
are used in the structural analysis of a program. Two subtasks of the structural analysis are
identified on the basis of the particular formalisms employed: Lexical analysis (Chapter 6)
deals with the basic symbols of the source program, and is described in terms of finite-state
automata; syntactic analysis, or parsing, (Chapter 7) deals with the static structure of the
program, and is described in terms of pushdown automata. Chapter 8 extends the theoretical
treatment of Chapter 5 to cover the additional information attached to the components of the
structure, and Chapter 9 applies the resulting formalism (attribute grammars) to semantic
analysis.
There is little in the way of formal models for the entire synthesis process, although al-
gorithms for various subtasks are known. We view synthesis as consisting of two distinct
subtasks, code generation and assembly. Code generation (Chapter 10) transforms the ab-
stract source program appearing at the analysis/synthesis interface into an equivalent target
machine program. This transformation is carried out in two steps: First we map the algo-
rithm from source concepts to target concepts, and then we select a specific sequence of target
machine instructions to implement that algorithm.
Assembly (Chapter 11) resolves all target addressing and converts the target machine
instructions into an appropriate output format. We should stress that by using the term
`assembly' we do not imply that the code generator will produce symbolic assembly code for
input to the assembly task. Instead, it delivers an internal representation of target instructions
in which most addresses remain unresolved. This representation is similar to that resulting
from analysis of symbolic instructions during the first pass of a normal symbolic assembler.
The output of the assembly task should be in the format accepted by the standard link editor
or loader on the target machine.
Errors may appear at any time during the compilation process. In order to detect as
many errors as possible in a single run, repairs must be made such that the program is
consistent, even though it may not reflect the programmer's intent. Violations of the rules of
the source language should be detected and reported during analysis. If the source algorithm
uses concepts of the source language for which no target equivalent has been defined in a
particular implementation, or if the target algorithm exceeds limitations of a specific target
language interpreter (e.g. requires more memory than a specific computer provides), this
should be reported during synthesis. Finally, errors must be reported if any storage limits of
the compiler itself are violated.
In addition to the actual error handling, it is useful for the compiler to provide extra
information for run-time error detection and debugging. This task is closely related to error
handling, and both are discussed in Chapter 12.
A number of strategies may be followed in an attempt to improve the target program
relative to some specied measure of cost. (Code size and execution speed are typical cost
measures.) These strategies may involve deeper analysis of the source program, more complex
mapping functions, and transformations of the target program. We shall treat the first two
in our discussions of analysis and code generation respectively; the third is the subject of
Chapter 13.
(Diagram: the compilation task divided into analysis and synthesis, communicating through a
local structure tree)
Figure 1.4: Decomposition of the Compiler
Our decomposition is based upon our understanding of the compilation problem and our
perception of the best techniques currently available for its solution. The choice of precise
boundaries is driven by control and data flow considerations, primarily minimization of flow
at interfaces. Specific criteria that influenced our decisions will be discussed throughout the
text.
The decomposition is virtually independent of the underlying implementation, and of
the specic characteristics of the source language and target machine. Clearly these factors
influence the complexity of the modules that we have identified, in some cases reducing them
to trivial stubs, but the overall structure remains unchanged.
Analysis and synthesis may proceed concurrently
and interact as coroutines: As soon as the analyzer has extracted an element of the structure
tree, the synthesizer is activated to process this element further. In this case the structure
tree will never be built as a concrete object, but is simply an abstract data structure; only
the element being processed exists in concrete form.
(Module specification: Code Generation - input: structure tree; output: target tree and error reports)
Exercises
1.1 Consider the Pascal algorithm of Figure 1.1a.
(a) What are the elementary objects and operations?
(b) What are the rules for chronological relations?
(c) What composition rules are used to construct the static program?
1.2 Determine the state transformation function, f , for the algorithm of Figure 1.1a. What
initial states guarantee termination? How do you characterize the corresponding nal
states?
1.3 Consider a simple computer with an accumulator and two data locations. The instruc-
tion set is:
LOAD d: Copy the contents of data location d to the accumulator.
STORE d: Copy the contents of the accumulator to data location d.
SUB d: Subtract the contents of data location d from the accumulator, leaving
the result in the accumulator. (Ignore any possibility of overflow.)
JUMP i: Execute instruction i next.
JZERO i: Execute instruction i next if the accumulator contents are zero.
JNEG i: Execute instruction i next if the accumulator contents are less than
zero.
(a) What are the elementary objects?
(b) What are the elementary actions?
(c) What composition rules are used?
(d) Complete the state sequence of Figure 1.2b.
Chapter 2
Properties of Programming Languages
Programming languages are often described by stating the meaning of the constructs (expressions,
statements, clauses, etc.) interpretively. This description implicitly defines an
interpreter for an abstract machine whose machine language is the programming language.
The output of the analysis task is a representation of the program to be compiled in
terms of the operations and data structures of this abstract machine. By means of code
generation and the run-time system, these elements are modeled by operation sequences and
data structures of the computer and its basic software (operating system, etc.).
In this chapter we explore the properties of programming languages that determine the
construction and possible forms of the associated abstract machines, and demonstrate the
correspondence between the elements of the programming language and the abstract machine.
On the basis of this discussion, we select the features of our example source language, LAX.
A complete definition of LAX is given in Appendix A.
2.1 Overview
The basis of every language implementation is a language definition. (See the Bibliography
for a list of the language definitions that we shall refer to in this book.) Users of the language
read the definition as a user manual: What is the practical meaning of the primitive elements?
How can they be meaningfully used? How can they be combined in a meaningful way? The
compiler writer, on the other hand, is interested in the question of which constructions are
permitted. Even if he cannot at the moment see any useful application of a construct, or if
the construct leads to serious implementation difficulties, he must implement it exactly as
specified by the language definition. Descriptions such as programming textbooks, which are
oriented towards the meaningful applications of the language elements, do not clearly define
the boundaries between what is permitted and what is prohibited. Thus it is difficult to make
use of such descriptions as bases for the construction of a compiler. (Programming textbooks
are also informal, and often cover only a part of the language.)
The semantics of a language relate the constructs
of the language to concepts outside the language (to concepts of mathematics or to the objects
and operations of a computer, for example).
Semantics include properties that can be deduced without executing the program as well
as those only recognizable during execution. Following Griffiths [1973], we denote these
properties static and dynamic semantics respectively. The assignment of a particular property
to one or the other of these classes is partially a design decision by the compiler writer. For
example, some implementations of ALGOL 60 assign the distinction between integer and real
to the dynamic semantics, although this distinction can normally be made at compile time
and thus could belong to the static semantics.
Pragmatic considerations appear in language definitions as unelaborated statements of
existence, as references to other areas of knowledge, as appeals to intuition, or as explicit
statements. Examples are the statements `[Boolean] values are the truth values denoted by the
identifiers true and false' (Pascal Report, Section 6.1.2), `their results are obtained in the sense
of numerical analysis' (ALGOL 68 Revised Report, Section 2.1.3.1.e) or `decimal numbers have
their conventional meaning' (ALGOL 60 Report, Section 2.5.3). Most pragmatic properties
are hinted at through a suggestive choice of words that are not further explained. Statements
that certain constructs only have a defined meaning under specified conditions also belong
to the pragmatics of a language. In such cases the compiler writer is usually free to fix the
meaning of the construct under other conditions. The richer the pragmatics of a language, the
more latitude a compiler writer has for efficient implementation and the heavier the burden
on the user to write his program to give the same answers regardless of the implementation.
We shall set the following goals for our analysis of a language definition:
Stipulation of the syntactic rules specifying construction of programs.
Stipulation of the static semantic rules. These, in conjunction with the syntactic rules,
determine the form into which the analysis portion of the compiler transforms the source
program.
Stipulation of the dynamic semantic rules and differentiation from pragmatics. These
determine the objects and operations of the language-oriented abstract machine, which
can be used to describe the interface between the analysis and synthesis portions of the
compiler: The analyzer translates the source program into an abstract target program
that could run on the abstract machine.
Stipulation of the mapping of the objects and operations of the abstract machine onto
the objects and operations of the hardware and operating system, taking the pragmatic
meanings of these primitives into account. This mapping will be carried out partly by
the code generator and partly by the run-time system; its specification is the basis for
the decisions regarding the partitioning of tasks between these two phases.
2.1.2 Syntactic Properties
The syntactic rules of a language belong to distinct levels according to their meaning. The
lowest level contains the `spelling rules' for basic symbols, which describe the construction
of keywords, identifiers and special symbols. These rules determine, for example, whether
keywords have the form of identifiers (begin) or are written with special delimiters ('BEGIN',
.BEGIN), whether lower case letters are permitted in addition to upper case, and which
spellings (<=, .LE., 'NOT' 'GREATER') are permitted for symbols such as ≤ that cannot be
reproduced on all I/O devices. A common property of these rules is that they do not affect
the meaning of the program being represented. (In this book we have distinguished keywords
by using boldface type. This convention is used only to enhance readability, and does not
imply anything about the actual representation of keywords in program text.)
The second level consists of the rules governing representation and interpretation of constants,
for example rules about the specification of exponents in floating point numbers or
the allowed forms of integers (decimal, hexadecimal, etc.) These rules affect the meanings of
programs insofar as they specify the possibilities for direct representation of constant values.
The treatment of both of these syntactic classes is the task of lexical analysis, discussed in
Chapter 6.
The third level of syntactic rules is termed the concrete syntax. Concrete syntax rules
describe the composition of language constructs such as expressions and statements from basic
symbols. Figure 2.1a shows the parse tree (a graphical representation of the application of
concrete syntax rules) of the Pascal statement `if a or b and c then ... else ...'. Because
the goal of the compiler's analysis task is to determine the meaning of the source program,
semantically irrelevant complications such as operator precedence and certain keywords can
be suppressed. The language constructs are described by an abstract syntax that specifies
the compositional structure of a program while leaving open some aspects of its concrete
representation as a string of basic symbols. Application of the abstract syntax rules can be
illustrated by a structure tree (Figure 2.1b).
(Figure 2.1: a) parse tree and b) structure tree of `if a or b and c then ... else ...')
Such an object's type must then be checked
upon subsequent use and the necessary dynamic type checking increases the computation
time.
A floating point representation with s significand digits may be unable to
represent all valid integers exactly as floating point numbers because s is not large enough to
hold all integer values.
The number of significant digits and the size of the exponent (and similar properties of
other types) vary from computer to computer and implementation to implementation. Since
an algorithm's behavior may depend upon the particular values of such parameters, the values
should be accessible. For this purpose many languages provide environment inquiries; some
languages, Ada for example, allow specifications for the range and precision of numbers in
the form of minimum requirements.
Restriction of the integer domain and similar specification of subranges of finite types is
often erroneously equated to the concept of a type. ALGOL 68, for example, distinguishes an
infinity of `sizes' for integer and real values. Although these sizes define different modes in the
ALGOL 68 sense, the Standard Environment provides identical operators for each; thus they
are indistinguishable according to the definition of type given at the beginning of Section 2.2.
The distinction can only be understood by examination of the internal coding.
The basic arithmetic operations are usually defined by recourse to the reader's mathematical
intuition. Only integer division involving negative operands requires a more exact
stipulation in a language definition. Number theorists recognize two kinds of integer division,
one truncating toward zero (-3 divided by 2 yields -1) and the other truncating toward
negative infinity (-3 divided by 2 yields -2). ALGOL 60 uses the first definition, which also
forms the basis for most hardware realizations.
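The difference is easily demonstrated. In the following Pascal sketch, div truncates toward zero (as ISO Pascal defines it), and floorDiv is our own helper implementing truncation toward negative infinity:

program IntegerDivision;

function floorDiv (a, b : integer) : integer;
{ Truncate toward negative infinity; this sketch assumes b > 0. }
begin
  if (a < 0) and (a <> (a div b) * b) then
    floorDiv := a div b - 1   { a div b rounded toward zero: step down }
  else
    floorDiv := a div b
end;

begin
  writeln((-3) div 2);       { -1: truncation toward zero }
  writeln(floorDiv(-3, 2))   { -2: truncation toward negative infinity }
end.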
We have already seen that a correspondence between the values of a finite type and a
subset of the natural numbers can be defined. This correspondence may be specified by the
language definition, or it may be described but its definition left to the implementor. As
a general principle, similar relationships are possible between the value sets of other types.
For example, the ALGOL 68 Revised Report asserts that for every integer of a given length
there is an equivalent real of that length; the FORTRAN Standard implies a relation between
integer and real values by its definition of assignment, but does not define it precisely.
Even if two values of different types (say 2 and 2.0) are logically equivalent, they must
be distinguished because different operations may be applied to them. If a programmer is to
make use of the equivalence, the abstract machine must provide appropriate transfer (conversion)
operations. This is often accomplished by overloading the assignment operator. For
example, Section 4.2.4 of the ALGOL 60 Report states that `if the type of the arithmetic
expression [in an assignment] differs from that associated with the variables and procedure
identifiers [making up the left part list], appropriate transfer functions are understood to be
automatically invoked'. Another way of achieving this effect is to say that the operator
indication `:=' stands for one of a number of assignment operations, just as `+' stands for
either integer or real addition.
The meaning of `:=' must be determined from the context in the above example. Another
approach to the conversion problem is to use the context to determine the type of value
directly, and allow the compiler to insert a transfer operation if necessary. We say that
the compiler coerces the value to a type appropriate for the context; the inserted transfer
operation is a coercion.
Coercions are most frequently used when the conversion is defined for all values of the type
being converted. If this is not the case, the programmer may be required to write an explicit
transfer function. In Pascal, for example, a coercion is provided from integer to real but not
from real to integer. The programmer must use one of the two explicit transfer functions
trunc or round in the latter case.
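In Pascal terms (a minimal sketch):

program Transfers;
var i : integer;
    x : real;
begin
  i := 2;
  x := i;           { legal: coercion from integer to real }
  x := 2.7;
  { i := x;  would be rejected: no coercion from real to integer }
  i := trunc(x);    { explicit transfer function: i = 2 }
  i := round(x);    { explicit transfer function: i = 3 }
  writeln(i)
end.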
Sometimes coercions are restricted to certain syntactic positions. ALGOL 68 has elaborate
rules of this kind, dividing the complete set of available coercions into four classes and allowing
different classes in different positions. The particular rules are chosen to avoid ambiguity in
the program. Ada provides a set of coercions, but does not restrict their use. Instead, the
language definition requires simply that each construct be unambiguously interpretable.
LAX provides Boolean, integer and real as elementary types. We omitted characters and
programmer-defined finite types because they do not raise any additional significant issues.
Integer division is defined to truncate towards zero to match the behavior of most hardware.
Coercion from integer to real is defined, but there is no way to convert in the opposite
direction. Again, the reason for this omission is that no new issues are raised by it.
The fields appearing in every record of the type are written first, followed by alternative sets
of fields; the c appearing in the case construct describes which alternative set is actually
present.
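A Pascal variant record of the form just described might look like the following sketch (the type and field names are our own); whether c may change during the object's lifetime is exactly the question raised below:

type
  FigureKind = (circle, rectangle);
  Figure = record
    x, y : real;               { fields appearing in every record }
    case c : FigureKind of     { c selects the alternative set }
      circle    : (radius : real);
      rectangle : (width, height : real)
  end;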
A union mode in ALGOL 68 is a special case of a variant record, in which every variant
consists of exactly one field and the fixed part consists only of the variant selector. Syntactically,
the construct is not described as a record and the variant selector is not given explicitly.
In languages such as APL or SNOBOL4, essentially all objects are specified in this manner.
An important question about such objects is whether the variant is fixed for the lifetime of a
particular object, or whether it forms a part of the state and may be changed.
Arrays differ from records in that their components may be selected via a computable,
one-to-one function whose domain is some finite set (such as any finite type or a subrange
p ≤ i ≤ q of the integers). In languages with manifest types, all elements of an array have the
same type. The operation a[e] (`select the component of a corresponding to e') is called
indexing. Most programming languages also permit multi-dimensional rectangular arrays, in
which the index set represents a Cartesian product I1 × I2 × ... × In over a collection of index
domains. Depending upon the time at which the number of elements is bound, we speak of
static (fixed at compile time), dynamic (fixed at the time the object is created) or flexible
(variable by assignment) arrays (cf. Section 2.5.3).
One-dimensional arrays of Boolean values (bit vectors) may also be regarded as tabular
encodings of characteristic functions over the index set I. Every value of an array c corresponds
to {i | c[i] = true}. In Pascal such arrays are introduced as `sets' with type set of
index set; in Ada they are described as here, as Boolean arrays. In both cases, the operations
union (represented by + or or), intersection (*, and), set difference (-), equality (= and
<>), inclusion (<, <=, >, >=) and membership (in) are defined on such sets. Difficulties
arise in specifying set constants: The element type can, of course, be determined by looking
at the elements of the constant. But if sets can be defined over a subrange of a type, it is not
usually possible to determine the appropriate subrange just by looking at the elements. In
Pascal the problem is avoided by regarding all sets made up of elements of a particular scalar
type to be of the same type, regardless of the subrange specified as the index set. (Sets of
integers are regarded as being over an implementation-defined subrange.) In Ada the index
set is determined by the context.
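In Pascal notation, for example (a small sketch; the names are ours):

program SetDemo;
type
  Day = (mon, tue, wed, thu, fri, sat, sun);
var
  work, rest, all : set of Day;
begin
  work := [mon .. fri];
  rest := [sat, sun];
  all  := work + rest;          { union }
  writeln(sat in work);         { membership: FALSE }
  writeln(work * all = work);   { intersection and equality: TRUE }
  writeln(rest <= all)          { inclusion: TRUE }
end.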
Only a few programming languages provide operations (other than set operations) that
may be applied to a composite object as a whole. (APL has the most comprehensive collection
of such operations.) Processing of composite objects is generally carried out componentwise,
with field selection, indexing and component assignment used as access operations on the
composite objects. It may also be possible to describe groups of array elements, for example
entire rows or columns or even arbitrary rectangular index domains (a[i1:i2, j1:j2]
in ALGOL 68); this process is called slicing.
2.2.3 Strings
Strings are exceptional cases in most programming languages. In ALGOL 60, strings are
permitted only as arguments to procedures and can thus ultimately be used only as data
for code procedures (normally I/O routines). ALGOL 68 considers strings as flexible arrays,
and in FORTRAN 77 or PL/1 the size can increase only to a maximum value fixed when
the object is created. In both languages, single characters may be extracted by indexing; in
addition, comparison and concatenation may be carried out on strings whose length is known.
These latter operations consider the entire string as a single unit. In SNOBOL4 strings are
always considered to be single units: Assignment, concatenation, conversion to a pattern,
pattern matching and replacement are elementary operations of the language.
We omitted strings from LAX because they do not lead to any unique problems in compiler
construction.
2.2.4 Pointers
Records, arrays and strings are composite objects constructed as contiguous sequences of
elements. Composition according to the model of a directed graph is possible using pointers,
with which one node can point to another. In all languages providing arrays, pointers can be
represented by indices in an array. Some languages (such as ALGOL 68, Pascal and PL/1)
define pointers as a new kind of type. In PL/1 the type of the object pointed to is not
specified, and hence one can place an arbitrary interpretation upon the target node of the
pointer. In the other languages mentioned, however, the pointer type carries the type of the
object pointed to.
Pointers have the advantage of security over indices in an array: Indices can be confused
with other uses of integers, pointers cannot. Above all, however, pointers can be used to
reference anonymous objects that are created dynamically. The number of objects thus
created need not be known ahead of time. With indices the array bounds fix the maximum
number of objects (except when the array is flexible).
Pascal pointers can reference only anonymous objects, whereas in ALGOL 68 either named
or anonymous objects may be referenced. When named objects have at most a bounded
lifetime, it is possible that a pointer to an object could outlive the object to which it points.
Such dangling references will be discussed in Section 2.5.2.
In addition to the technical questions of pointer implementation, the compiler writer
should be concerned with special testing aids (such as printing programs that can traverse a
structure, outputting links in some reasonable way). The reason is that programs containing
pointers are usually more difficult to debug than those not containing pointers.
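The declarations referred to in the following paragraph fall at a page break in the original; under name equivalence they must have had this general shape, two textually identical right-hand sides bound to distinct identifiers (a hypothetical reconstruction):

type
  m = array [1 .. 10] of integer;
  p = array [1 .. 10] of integer;  { same structure, distinct name }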
Under this definition m and p
are different types, since m and p are distinct identifiers. The right
hand sides of the declarations of m and p are automatically different, since they are not
type identifiers. Name equivalence is obviously easy to check, since it only involves fixing the
identity of type declarations.
Name equivalence seldom appears in pure form. On the one hand it leads to a flood of type
declarations, and on the other to problems in linking to library procedures that have array
parameters. However, name equivalence is the basis for the definition of abstract data types,
where type declarations that carry the details of the representation are not revealed outside the
declaration. This is exactly the effect of name equivalence, whereas structural equivalence
has the opposite result. Most programming languages that permit type declarations use
an intermediate strategy. Euclid uses structural equivalence locally; as soon as a type is
`exported', it is known only by a type identifier and hence name equivalence applies.
If the language allows subranges of the basic types (such as a subrange of integers in
Pascal) the question of whether or not this subrange is a distinct type arises. Ada allows
both: The subrange can be defined as a subtype or as a new type. In the second case, the
predefined operations of the base type will be taken over but later procedures requiring
parameters of the base type cannot be passed arguments of the new type.
The type equivalence rules of LAX embody a representative compromise. They require
textual equivalence as discussed above, but whenever a type is denoted by an identifier it is
considered elementary. (In other words, if the compiler is comparing two type specifications
for equality and an identifier appears in one then the same identifier must appear in the same
position in the other.) Implementation of these rules illustrates the compiler mechanisms
needed to handle both structure and name equivalence.
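A sketch of the comparison these rules call for, with type specifications represented as trees; all names here are our own invention, and the string type is that of modern Pascal dialects:

type
  TypeKind = (namedType, arrayType, recordType, refType);
  TypePtr = ^TypeNode;
  TypeNode = record
    kind : TypeKind;
    id : string;            { significant only when kind = namedType }
    left, right : TypePtr   { component types; nil if absent }
  end;

{ Textual equivalence with identifiers treated as elementary:
  an identifier in one specification must be matched by the same
  identifier at the same position in the other. }
function Equivalent (a, b : TypePtr) : boolean;
begin
  if (a = nil) or (b = nil) then
    Equivalent := a = b
  else if a^.kind <> b^.kind then
    Equivalent := false
  else if a^.kind = namedType then
    Equivalent := a^.id = b^.id
  else
    Equivalent := Equivalent(a^.left, b^.left)
                  and Equivalent(a^.right, b^.right)
end;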
2.3 Expressions
Expressions (or formulas) are examples of composite operations. Their structure resembles
that of composite objects: They consist of a simple operation with operands, which are either
ordinary data objects or further expressions. In other words, an expression is a tree with
operations as interior nodes and data objects as leaves.
An expression written in linear infix notation may lead to distinct trees when interpreted
according to different language definitions (Figure 2.2). In low-level languages modeled upon
PL/360, the operators are strictly left-associative with no operator precedence, and parentheses
are prohibited; APL uses right-associativity with no precedence, but permits grouping by
parentheses. Most higher-level languages employ the normal precedence rules of mathematics
and associate operators of the same precedence to the left. FORTRAN 77 (Section 6.6.4) is
an exception: `Once [a tree] has been established in accordance with [the precedence, association
and parenthesization] rules, the processor may evaluate any mathematically equivalent
expression, provided that the integrity of parentheses is not violated.' The phrase `mathematically
equivalent' implies that a FORTRAN compiler may assume that addition is associative,
even though this is not true for computer implementation of floating point arithmetic. (The
programmer can, however, always indicate the correct sequence by proper use of parentheses.)
The leaves of an expression tree represent activities that can be carried out indepen-
dently of all other nodes of the tree. Interior nodes, on the other hand, depend upon the
values returned by their descendants. The entire tree may thus be evaluated by the following
algorithm:
(Figure 2.2: trees for a*b+c*d - a) left-associative, e.g. PL/360: ((a*b)+c)*d;
b) right-associative, e.g. APL: a*(b+(c*d)); c) normal precedence: (a*b)+(c*d))
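The evaluation algorithm announced before Figure 2.2 falls at a page break in the original; what it describes is the usual postorder traversal, sketched here in Pascal (the node representation is our own assumption):

type
  NodePtr = ^Node;
  Node = record
    op : char;              { '+' or '*' at interior nodes }
    value : integer;        { significant only at leaves }
    left, right : NodePtr   { nil at leaves }
  end;

{ Evaluate the descendants of a node first, then apply its operation. }
function Eval (t : NodePtr) : integer;
begin
  if t^.left = nil then
    Eval := t^.value        { a leaf can be evaluated independently }
  else if t^.op = '+' then
    Eval := Eval(t^.left) + Eval(t^.right)
  else
    Eval := Eval(t^.left) * Eval(t^.right)
end;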
effects are present. In Euclid an attempt was made to restrict the possibilities to the point
where the compiler could perform such a check safely. These restrictions include prohibition
of assignments to result parameters and global variables in functions, and prohibition of I/O
operations in functions.
Some side effects do not destroy referential transparency, and are thus somewhat less
dangerous. Section 6.6 of the FORTRAN 77 Standard formulates the weakest useful restrictions:
`The execution of a function reference in a statement may not alter the value of any other
entity within the statement in which the function reference appears.'
In some expressions the value of a subexpression determines that of the entire expression.
Examples are:
a and (...) when a = false
b or (...) when b = true
c * (...) when c = 0
If the remainder of the expression has no side effect, only the subexpression determining the
value need be computed. The FORTRAN 77 Standard allows this short circuit evaluation
regardless of side effects; the description is such that the program is undefined if side effects
are present, and hence it is immaterial whether the remainder of the expression is evaluated or
not in that case. The wording (Section 6.6.1) is: `If a statement contains a function reference
in a part of an expression that need not be evaluated, all entities that would have become
defined in the execution of that reference become undefined at the completion of evaluation
of the expression containing the function reference.'
ALGOL 60, ALGOL 68 and many other languages require, in principle, the evaluation
of all operands and hence preclude such optimization unless the compiler can guarantee that
no side effects are possible. Pascal permits short circuit evaluation, but only in Boolean
expressions (User Manual, Section 4a): `The rules of Pascal neither require nor forbid the
evaluation of the second part [of a Boolean expression, when the first part fixes the value]'.
Ada provides two sets of Boolean operators, one (and, or) prohibiting short circuit evaluation
and the other (and then, or else) requiring it.
LAX requires complete evaluation of operands for all operators except and and or. The
order of evaluation is constrained only by data flow considerations, so the compiler may
assume referential transparency. This simplifies the treatment of optimization. By requiring
a specific short circuit evaluation for and and or, we illustrate other optimization techniques
and also show how the analysis of an expression is complicated by evaluation order rules.
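The practical consequences show up in searches like the following Pascal sketch, where the safety of the loop test depends on the evaluation rule:

program Search;
const n = 5;
var a : array [1 .. n] of integer;
    i : integer;
begin
  a[1] := 3; a[2] := 1; a[3] := 4; a[4] := 1; a[5] := 5;
  i := 1;
  { Under short circuit evaluation the test a[i] <> 4 is skipped as
    soon as i > n; under complete evaluation a[i] would be referenced
    out of range if 4 never occurred in the array.  Pascal permits,
    but does not require, the short circuit. }
  while (i <= n) and (a[i] <> 4) do
    i := i + 1;
  writeln('found at ', i)
end.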
2.4 Control Structures
The control structures commonly provided by programming languages include:
Conditional clause
Case clause
Iteration (with or without a count)
Jump, exit, etc.
Procedure call
Conditional clauses make the execution of a component S dependent upon fulfillment of
a Boolean condition. In many languages S may only take on one of a restricted number of
forms; in the extreme case, S may only be a jump.
The case clause is a generalization of the conditional clause in which the distinct values of
an expression are associated with distinct statements. The correspondence is either implicit
as in ALGOL 68 (the statements correspond successively to the values 1, 2, 3, ...), or explicit
as in Pascal (the value is used as a case label for the corresponding statement). The latter
construct allows one statement to correspond with more than one value and permits gaps in
the list of values. It also avoids counting errors and enhances program readability.
Several syntactically distinct iteration constructs appear in many programming languages:
with or without counters, test at the beginning or end, etc. The inefficient ALGOL 60 rules
requiring the (arbitrarily complex) step and limit expressions to be re-evaluated for each
iteration have been replaced in newer languages by the requirement that these expressions be
evaluated exactly once. Another interesting point is whether the value of the counter may
be altered by assignment within the body of the iteration (as in ALGOL 60), or whether it
must remain constant (as in ALGOL 68). This last is important for many optimizations of
iterations, as is the usual prohibition on jumps into an iteration.
Many programming languages allow jumps with variable targets. Examples are the use
of indexing in an array of labels (the ALGOL 60 switch) and the use of label variables (the
FORTRAN assigned GOTO). While COBOL or FORTRAN jumps control only the succession
of statements, jumps out of blocks or procedures in ALGOL-like languages influence the
program state (see Section 2.5). Procedure calls also influence the state.
The ALGOL 60 and ALGOL 68 definitions explain the operation of procedure calls by
substitution of the procedure body for the call (copy rule). This copying process could
form the basis for an implementation (open subroutines), if the procedure is not recursive.
Recursion requires that the procedure be implemented as a closed subroutine, a model on
which many other language definitions are based. Particular difficulties await the writer of
compilers for languages such as COBOL, which do not distinguish the beginning and end of
the procedure body in the code. This means that, in addition to the possibility of invoking
the procedure by means of a call (PERFORM in COBOL), the statements could be executed
sequentially as a part of the main program.
Parallel execution of two actions is required if both begin from the same initial state and
alter this state in incompatible ways. A typical example is the parallel assignment x, y :=
y, x, in which the values are exchanged. To represent this in a sequential program, the
compiler must first extend the state so that the condition `identical starting states for both
actions' can be preserved. This can be done here by introducing an auxiliary variable t, to
which x is assigned.
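In compiled form the exchange becomes the familiar three-statement sequence (t is the compiler-introduced auxiliary):

program Swap;
var x, y, t : integer;
begin
  x := 1; y := 2;
  { x, y := y, x realized sequentially: }
  t := x;   { auxiliary variable preserves the initial state of x }
  x := y;
  y := t;
  writeln(x, ' ', y)   { prints 2 1 }
end.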
Another case of parallel execution of two actions arises when explicit synchronization is
embedded in these actions to control concurrent execution. The compiler must fall back upon
coroutines or parallel processing facilities in the operating system in order to achieve such
synchronization; we shall not discuss this further.
Collateral execution of two actions means that the compiler need not fix their sequence
according to source language constraints. It can, for example, exchange actions if this will
lead to a more efficient program. If both actions contain identical sub-actions then it suffices
to carry out this sub-action only once; this has the same effect as the (theoretically possible)
perfectly-synchronized parallel execution of the two identical sub-actions. If a language
specifies collateral evaluation, the question of whether the evaluation of f(x) in the assignment
a[i + 1] := f(x) + a[i + 1] can influence the address calculation for a[i + 1] by
means of a side effect is irrelevant. The compiler need only compute the address of a[i + 1]
once, even if i were the following function procedure:
function i : integer; begin k := k + 1; i := k end;
In this case k will be incremented only once, a further illustration of side effects and the
meaning of the paragraph from the FORTRAN 77 Standard quoted at the end of Section 2.3.
Creation and destruction
of objects are generally associated with procedure call and return, and for this reason the
procedure call hierarchy forms a part of the environment. We shall now consider questions of
lifetime and visibility; the related topic of procedure parameter transmission will be deferred
to Section 2.5.3.
That part of the execution history of a program during which an object exists is called
the extent of the object. The extent rules of most programming languages classify objects as
follows:
Static: The extent of the object is the entire execution history of the program.
Automatic: The extent is the execution of a specified syntactic construct (usually a
procedure or block).
Unrestricted: The extent begins at a programmer-specied point and ends (at least
theoretically) at the end of the program's execution.
Controlled: The programmer specifies both the beginning and end of the extent by
explicit construction and destruction of objects.
Objects in COBOL and the blank common block of FORTRAN are examples of static
extent. Local variables in ALGOL 60 or Pascal, as well as local variables in FORTRAN sub-
programs, are examples of automatic extent. (Labeled common blocks in FORTRAN 66 also
have automatic extent, see Section 10.2.5 of the standard.) List elements in LISP and objects
created by the heap generator of ALGOL 68 have unrestricted extent, and the anonymous
variables of Pascal are controlled (created by new and discarded by dispose).
The possibility of a dangling reference arises whenever a reference can be created to an
object of restricted extent. To avoid errors, we must guarantee that the referenced object
exists at the times when references to it are actually attempted. A sufficient condition to make
this guarantee is the ALGOL 68 rule (also used in LAX) prohibiting assignment of references
or procedures in which the extent of the right-hand side is smaller than the reference to which
it is assigned. It has the advantage that it can be checked by the compiler in many cases,
and a dynamic run-time check can always be made in the absence of objects with controlled
extent. When a language provides objects with controlled extent, as do PL/1 and Pascal,
then the burden of avoiding dangling references falls exclusively upon the programmer.
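A Pascal sketch of the hazard (the variable names are ours):

program Dangle;
type IntPtr = ^integer;
var p, q : IntPtr;
begin
  new(p);        { controlled extent begins }
  p^ := 17;
  q := p;        { a second reference to the same object }
  dispose(p);    { extent ends: q is now a dangling reference }
  { q^ is undefined from here on, and neither the compiler nor
    the run-time system is obliged to detect a use of it }
end.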
LAX constants are the only objects having static extent. Variables are generally automatic,
although it is possible to generate unrestricted variables. The language has no objects with
controlled extent, because such objects do not result in any new problems for the compiler.
Static variables were omitted because the techniques used to deal with automatic variables
apply to them essentially without change.
By the scope of an identifier definition we understand the region of the program within
which we can use the identifier with the defined meaning. The scope of an identifier definition
is generally determined statically by the syntactic construct of the program in which it is
directly contained. A range is a syntactic construct that may have identifier definitions
associated with it. In a block-structured language, inner ranges are not part of outer ranges.
Usually any range may contain at most one definition of an identifier. Exceptions to this
rule may occur when a single identifier may be used for distinct purposes, for example as
an object and as the target of a jump. In ALGOL-like languages the scope of a definition
includes the range in which it occurs and all enclosed ranges not containing definitions of the
same identifier.
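These rules can be seen in a small Pascal sketch (the identifiers are our own):

program scopes;
var i : integer;     (* definition in the outer range *)

procedure inner;
var i : integer;     (* hides the outer definition throughout this range *)
begin
  i := 1             (* refers to the local i *)
end;

begin
  i := 0;
  inner;
  writeln(i : 2)     (* prints 0: the outer i was never touched *)
end.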
Consider the field selection p.f. The position immediately following the dot belongs
to the scope of the declaration of p's record type. In fact, only the field selectors of that
record type are permitted in this position. On the other hand, although the statement s
of the Pascal (or SIMULA) inspection with p do s also belongs to the scope of p's record
type declaration, the definitions from the inspection's environment remain valid in s unless
overridden by field selector definitions. In COBOL and PL/1, f can be written in place of
p.f (partial qualification) if there is no other definition of f in the surrounding range.
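Both effects appear in the following Pascal sketch (our own example):

program inspection;
type rec = record f : integer end;
var p : rec;
    g : integer;
begin
  g := 1;
  with p do begin
    f := 2;          (* f is a field selector of p's record type *)
    g := 3           (* g is no field of rec: the surrounding definition remains valid *)
  end;
  writeln(p.f : 2, g : 2)   (* prints 2 and 3 *)
end.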
The concept of static block structure has the consequence that items not declared in a
procedure are taken from the static surroundings of the procedure. A second possibility is that
used in APL and LISP: nonlocal items of functions are taken from the dynamic environment
of the procedure call.
In the case of recursive procedure calls, identically-declared objects with nested extents
may exist at the same time. Difficulties may arise if an object is introduced (say, by parameter
transmission) into a program fragment where its original declaration is hidden by another
declaration of the same identifier. Figure 2.3 illustrates the problem. This program makes
two nested calls of p, so that two incarnations, q1 and q2, of the procedure q and two variables
i1 and i2 exist at the same time. The program should print the values 1, 4 and 1 of i2, i1
and k. This behavior can be explained by using the contour model.
procedure outer ;
var n , k : integer ;
procedure p (procedure f; var j : integer );
label 1;
var i : integer ;
procedure q ;
label 2;
begin (* q *)
n := n + 1; if n = 4 then q;
n := n + 1; if n = 7 then 2 : j := j + 1;
i := i + 1;
end; (* q *)
begin (* p *)
i := 0;
n := n + 1; if n = 2 then p (q , i ) else j := j + 1;
if n = 3 then 1 : f ;
i := i + 1;
writeln (' i = ', i :1);
end; (* p *)
procedure empty ; begin end;
begin (* outer *)
n := 1; k := 0;
p (empty , k );
writeln (' k = ', k :1);
end; (* outer *)
The contour addressed by ep is called the local contour; only the objects lying in it and
in the contours enclosing it are accessible. The object identified by a given identifier is found
by scanning the contours from inner to outer, beginning at the local contour, until a
definition for the specified identifier is found.
The structure of the state is changed by the following actions:
Construction or removal of an object.
Procedure call or range entry.
Procedure return or range exit.
Jump out of a range.
When an object with automatic extent is created, it lies in a contour corresponding to
the program construct in which it was declared; static objects behave exactly like objects
declared in the main program with automatic extent. Objects with unrestricted extent and
controlled objects lie in their own contours, which do not correspond to program constructs.
Upon entry into a range, a new contour is established within the local contour and the
environment pointer ep is set to point to it. Upon range exit this procedure is reversed: the
local contour is removed and ep set to point to the immediately surrounding contour.
Upon procedure call, a new contour c is established and ep set to point to it. In contrast
to range entry, however, c is established within the contour c' addressed by ep at the time of
procedure declaration. We term c' the static predecessor of c to distinguish it from c'', the
dynamic predecessor, to which ep pointed immediately before the procedure call. The pointer
to c'' must be stored in c as a local object. Upon return from a procedure the local contour
of the procedure is discarded and the environment pointer reset to its dynamic predecessor.
To execute a jump into an enclosing range b , blocks and procedures are exited and the
corresponding contours discarded until a contour c corresponding to b is reached such that
c contained the contour of the jump. c becomes the new local contour, to which ep will
point, and ip is set to the jump target. If the jump target is determined dynamically as a
parameter or the content of a label variable, as is possible in ALGOL 60, then that parameter
or variable must specify both the target address and the contour that will become the new
local contour.
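The bookkeeping just described can be sketched in Pascal by representing a contour as
a record with explicit predecessor fields; the names are ours, and the sketch ignores the
local objects themselves:

program contours;
type pcontour = ^contour;
     contour = record
       staticpred  : pcontour;    (* c' : contour current at procedure declaration *)
       dynamicpred : pcontour;    (* c'': contour current at the point of call *)
       level : integer            (* stands in for the local objects *)
     end;
var ep, outer, inner : pcontour;  (* ep is the environment pointer *)
    c : pcontour;
begin
  new(outer);
  outer^.staticpred := nil; outer^.dynamicpred := nil; outer^.level := 0;
  ep := outer;
  (* procedure call: establish c within the contour of declaration *)
  new(inner);
  inner^.staticpred := outer;     (* static predecessor c' *)
  inner^.dynamicpred := ep;       (* dynamic predecessor c'', stored as a local object *)
  inner^.level := 1;
  ep := inner;
  (* identifier search scans the contours from inner to outer *)
  c := ep;
  while c <> nil do begin writeln(c^.level : 2); c := c^.staticpred end;
  (* procedure return: discard the local contour and reset ep *)
  ep := ep^.dynamicpred;
  dispose(inner)
end.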
Figures 2.4 and 2.5 show the contour model for the state existing at two points during
the execution of the program of Figure 2.3. Notice that several contours correspond to the
same range when a procedure is called recursively. Further, the values of actual parameters
of a procedure call should be computed before the environment pointer is altered. If this is
not done, the pointer for parameter computation must be restored (as is necessary for name
parameters in ALGOL 60).
In order to unify the state manipulation, procedures and blocks are often processed iden-
tically. A block is then a parameterless procedure called `on the spot'. The contour of a block
thus has a dynamic predecessor identical with its static predecessor. The lifetimes of local
objects in blocks can be determined by the compiler, and a static overlay structure for them
can be set up within the contour of the enclosing procedure. The main program is counted
as a procedure for this purpose. The scope rules are not altered by this transformation. Con-
tours for blocks can be dispensed with, and all objects placed in the contour of the enclosing
procedure. Arrays with dynamic bounds lead to difficulties with this optimization, since the
bounds can be determined only at the time of actual block entry.
The rules discussed so far do not permit description of either LISP or SIMULA. In LISP a
function f may have as its result a function g that accesses the local storage of f . Since this
storage must also exist during the call of g , the contour of f must be retained at least until
g becomes inaccessible. Analogously, a SIMULA class k (an object of unrestricted extent)
may have name parameters from the contour in which it was instantiated. This contour must
therefore be retained at least until k becomes inaccessible.
Figure 2.5: Contours Existing When Control Reaches Label 2 in Figure 2.3
We solve these problems by adopting a uniform retention strategy that discards an object
only when that object becomes inaccessible. Accessibility is defined relative to the current
contour. Whenever an object in a contour c references another object in a different contour,
c', we implement that reference by an explicit pointer from c to c'. (Such references include
the dynamic predecessors of the contour, all reference parameters, and any explicit pointers
established by the user.) A contour is accessible if it can be reached from the current contour
by following any sequence of pointers or by a downhill walk. The dangling reference problem
vanishes when this retention strategy is used.
2.5.3 Binding
An identifier b is termed bound (or local) in a range if this range contains a definition for b;
otherwise b is free (or global) in this range. As definitions we have:
Declarations of object identifiers (including procedure identifiers)
Definitions: Label definitions, type definitions, FORTRAN labeled common blocks, etc.
Formal parameter definitions
In the first and second cases the defined value along with all of its attributes is obvious
from the definition. In the third case only the identifier and type of the defined value are
available via the program text. The actual parameter, the argument, will be associated with
the identifier by parameter transmission at the time of the procedure call. We distinguish
five essentially different forms of parameter transmission (two of which are contrasted in the
Pascal sketch following this list):
1. Value (as in ALGOL 60, SIMULA, Pascal, Ada, for example): The formal parameter
identifies a local variable of the procedure, which will be initialized with the argument
value at the procedure call. Assignment to the parameter does not affect the caller.
2. Result (Ada): The formal parameter identifies a local variable of the procedure with
undefined initial value. Upon return from the procedure the content of this local variable
is assigned to the argument, which must be a variable.
3. Value/Result (FORTRAN, Ada): The formal parameter identifies a local variable of
the procedure, which will be initialized with the argument value at the procedure call.
Upon return from the procedure the content of this local variable is assigned to the
argument if the argument is a variable. The argument variable may be fixed prior to
the call or redetermined upon return.
4. Reference (FORTRAN, Pascal, Ada): A reference to the argument is transmitted to
the procedure. All operations on the formal parameter within the procedure are carried
out via this reference. (If the argument is an expression but not a variable, then the
result is placed in a temporary variable for which the reference is constructed. Some
languages, such as Pascal, do not permit use of an expression as an argument in this
case.)
5. Name (ALGOL 60): A parameterless procedure p, which computes a reference to the
argument, is transmitted to the procedure. (If the argument is an expression but not a
variable then p computes the value of the expression, stores it in a temporary variable
h, and yields a reference to h.) All operations on the formal parameter first invoke p
and then operate via the reference yielded by p.
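Mechanisms (1) and (4) can be contrasted directly in Pascal; a minimal sketch:

program transmission;
var g : integer;

procedure byvalue(j : integer);          (* mechanism 1: j is a local variable *)
begin
  j := j + 1                             (* the caller is unaffected *)
end;

procedure byreference(var j : integer);  (* mechanism 4: j denotes a reference *)
begin
  j := j + 1                             (* operates on the argument via the reference *)
end;

begin
  g := 0; byvalue(g);     writeln(g : 2);   (* prints 0 *)
  g := 0; byreference(g); writeln(g : 2)    (* prints 1 *)
end.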
Call by value is occasionally restricted to a strict value transmission in which the formal
parameter identifies not a local variable, but rather a local constant. Call by name is explained
in many language definitions by textual substitution of the argument for the parameter.
ALGOL 60 provides for argument evaluation in the environment of the caller through a
consistent renaming.
The different parameter mechanisms can all be implemented in terms of (strict) call by
value, if the necessary kinds of data are available. For cases (2)-(4), the language must
provide the concept of arbitrary references as values. Call by name also requires the concept
of procedures as values (of procedure variables). Only when these concepts are unavailable are
the transmission mechanisms (2)-(5) important. This is clear in the language SIMULA, which
(in addition to the value and name calls inherited from ALGOL 60) provides call by reference
for classes and strings. A more careful study shows that in truth this could be handled by
an ordinary value call for references. In ALGOL 68 the call by reference is stated in terms of
the strict call by value, by using an identity declaration to make the formal parameter fp an
alias of the argument ap :
ref int fp = ap
Expressions that do not yield references are not permitted as arguments if this explanation
of call by reference is used, since the right hand side of the identity declaration must yield a
reference.
LAX follows the style of ALGOL 68, explaining its argument bindings in terms of identity
declarations. This provides a uniform treatment of all parameter mechanisms, and also elim-
inates the parameter mechanism as a distinct means of creating new access paths. Finally,
the identity declaration gives a simple implementation model.
Many language definitions do not specify parameter transmission mechanisms explicitly.
The compiler writer must therefore attempt to delineate the possibilities by a careful con-
sideration of their effects. For example, both case (3) and case (4) satisfy the conditions of
the FORTRAN 66 Standard, but none of the others do. Ada generally requires case (1), (2)
or (3). For composite objects, however, case (4) is permitted as an alternative. Use of this
alternative is at the discretion of the implementor, and the programmer is warned that any
assumption about the particular transmission mechanism invalidates the program.
Programs whose results depend upon the parameter transmission mechanism are generally
difficult to understand. The dependencies arise when an object has two access paths, say via
two formal parameters or via a global variable and a formal parameter. This can be seen in
the program of Figure 2.6a, which yields the results of Figure 2.6b for the indicated parameter
mechanisms.
In addition to knowing what value an identifier is bound to, it is important to know
when the binding takes place. The parameter transmission differences discussed above can,
to a large extent, be explained in terms of binding times. In general, we can distinguish the
following binding times (explained in terms of the identity declaration ref real x = a[i, j+3]):
1. Binding at each access (corresponding to call by name): Upon each access to x the identity
of a[i, j+3] is re-determined.
2. Binding at first access: Upon the first access to x the identity of a[i, j+3] will be
determined. All assignments to i and j up to that point will have an effect.
3. Binding upon declaration (corresponding to call by reference): After elaboration of the
identity declaration the identity of a[i, j+3] is fixed. In several languages the
identifiers on the right-hand side must not be declared in the same range, to avoid
circular definitions.
4. Static binding: The identity of a[i, j+3] is fixed throughout the entire program. In
this case a must have static extent and statically-determined size. The values of i and
j must be defined prior to program execution and be independent of it (hence they
must be constants).
begin
int m :=1, n ;
proc p = (??? int j , ??? int k ) int:
begin j := j + 1 ; m := m + k; j + k end;
n := p (m , m + 3)
end
Note: `???' depends upon the parameter mechanism.
a) An ALGOL 68 program
Mechanism m n j k Comment
Value 5 6 2 4 Strict value is not possible due to the assignment to j .
Value/Result 2 6 2 4 Pure result is unreasonable in this example.
Reference 6 10 6 4 Only j is a reference parameter because an expression
is illegal as a reference parameter in ALGOL 68. Hence
k is a value parameter.
Name 7 17 7 10
Note: m and n were evaluated at the end of the main program, j and k at the end of p .
b) The eect of dierent parameter mechanisms
Figure 2.6: Parameter Transmission
In this spectrum call by result would be classified as binding after access. Call by value
is a binding of the value, not of the reference.
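Binding at each access can be simulated in Pascal by transmitting a parameterless
function in place of the variable, in the spirit of mechanism (5); a sketch using standard
Pascal functional parameters:

program bindingtimes;
var a : array[1..2] of integer;
    i : integer;

function thunk : integer;        (* re-determines the identity of a[i] *)
begin
  thunk := a[i]
end;

procedure usename(function x : integer);
begin
  i := 1; writeln(x : 3);        (* bound at this access: yields a[1] = 10 *)
  i := 2; writeln(x : 3)         (* re-bound: yields a[2] = 20 *)
end;

begin
  a[1] := 10; a[2] := 20;
  usename(thunk)
end.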
Determination of identity is least costly at run time for static binding and most costly for
binding at access. During the analysis of the program, the compiler writer is most concerned
with gathering as much information as possible, to bind as early as he can. For this reason
static binding breaks into two subcases, which in general depend not upon the language but
upon other considerations:
4a. Binding at compilation time: The identity of the bound values is determined during
compilation.
4b. Binding at program initialization: The identity of files or of external procedures will be
determined in a step preceding program execution.
In case 4a the knowledge of the bound values can be used in optimization. 4b permits
repeated execution of the program with different bindings without re-compilation.
Free identifiers, which are not defined in a procedure, must be explained in the context of
the procedure so that their meaning can be determined. The definitions of standard identifiers,
which may be used in any program without further declaration, are fitted into this scheme
by assuming that the program is embedded in a standard environment containing definitions
for them.
By an external entity we mean an entity identified by a free identifier with no definition
in either the program or the standard environment. A program with external entities cannot
be compiled and then directly executed. Another step, which obtains the objects associated
with external entities from a program library, must be introduced. We shall discuss this step,
the binding of programs, in Chapter 11. In the simplest case the binding can be separated
from the compilation as an independent terminal step. This separation is normally chosen
for FORTRAN implementations. One consequence is that the compiler has no complete
overview of the properties of external entities and hence cannot verify that they are used
consistently. Thus in FORTRAN it is not usually possible for the compiler to determine
whether external subprograms and functions are called with the correct number and type
of parameters. For such checking, but also to develop the correct accesses, the compiler
must have specifications like those for formal parameters for every external entity. Many
implementations of ALGOL 60, Pascal, etc. provide that such specifications precede or be
included in independently compiled procedures. Since in these languages, as in many others,
separate compilation of language units is not specified by the language definition, the compiler
writer himself must design the handling of external values in conjunction with introduction
of these possibilities. Ada contains a far-reaching specification scheme for external entities.
Exercises
2.1 [Housden, 1975; Morrison, 1982] Consider the manipulation of character string data
in a general purpose programming language.
type q = record
       x : real ;
       y : ^record
             x : real ;
             y : ^q
           end
     end;
2.3 Why is the Boolean expression (x >= -1) and (sqrt (1 + x) > y) meaningless in
Pascal, FORTRAN or ALGOL 60? Consider only structurally equivalent expressions
in the various languages, making any necessary syntactic changes. Give a similar
expression in Ada that is meaningful.
2.4 Give the rules for contour creation and destruction necessary to support the module
concept in Ada.
2.5 Consider a block-structured language such as SIMULA, in which coroutines are allowed.
Generalize the contour model with a retention strategy to handle the following situation:
If n coroutines are started in block b , all have contour c as dynamic predecessor.
By means of call-by-name parameters, a coroutine can obtain access to an object o
belonging to c ; on the other hand, contour c can disappear (because execution of b
has terminated) long before termination of the coroutine. o is then nonexistent, but
the access path via the name parameter remains. What possible solutions do you see
for this problem?
2.6 The retention strategy discussed in connection with SIMULA in Exercise 2.5 could be
used to support parallel processing in ALGOL 68. Quote sections of the ALGOL 68
Report to show that a simpler strategy can be used.
2.7 What problems arise from result parameters in a language that permits jumps out of
procedures?
2.8 Consider a program in which several procedures execute on different processors in
a network. Each processor has its own memory. What parameter mechanisms are
appropriate in such a program?
Chapter 3
Properties of Real and Abstract
Machines
In this chapter we shall discuss the target machine properties relevant for code generation,
and the mapping of the language-oriented objects and operations onto objects and operations
of the target machine. Systematic code generation must, of course, take account of the pecu-
liarities and weaknesses of the target computer's instruction set. It cannot, however, become
bogged down in exploitation of these special idiosyncrasies; the payoff in code efficiency will
not cover the implementation cost. Thus the compiler writer endeavors to derive a model of
the target machine that is not distorted by exceptions, but is as uniform as possible, to serve
as a base for code generator construction. To this end some properties of the hardware may
be ignored, or gaps in the instruction set may be filled by subroutine invocations or inline
sequences treated as elementary operations. In particular, the instruction set is extended by
the operations of a run-time system that interfaces input/output and similar actions to the
operating system, and attends to storage management.
Further extension of this idea leads to construction of abstract target machines imple-
mented on a real machine either interpretively or by means of a further translation. (Inter-
pretive abstract machines are common targets of code generation for microprocessors due to
the need for space efficiency.) We shall not attempt a systematic treatment of the goals, meth-
ods and criteria for the design of abstract target machines here; see the Notes and References
for further guidance.
Main storage
Data registers D0,...,D7 serving as integer accumulators or index registers.
Address registers A0,...,A7 serving as base or index registers.
Program counter PC
Condition code
Stack pointer A7
b) Motorola 68000
Figure 3.1: Storage Classes
(Such a decision can be made differently for the generated code and the run-time system,
implying that the memory belongs to one class as far as the generated code is concerned and
another for the run-time system.) Also, since the properties of a storage class depend to a
certain extent upon the available access paths, a Motorola 68000 stack will differ from that
of a Burroughs 6700/7700.
Most storage classes consist of a sequence of numbered elements, the storage cells. (The
numbering may have gaps.) The number of a storage cell is called its address. Every access
path yields an algorithm, the effective address of the access path, for computing the address
of the storage cell being accessed. We speak of byte-oriented computers if the cells in the main
storage class have a size of 8 bits, otherwise (e.g. 16, 24, 32, 48 or 60 bits per cell) we term
the computer word-oriented . For a word-oriented computer the cell sizes in the main storage
and register classes are usually identical, whereas the registers of a byte-oriented computer
(except for some microprocessors) are 2, 4 or possibly 8 bytes long. In this case the storage
cell of the integer accumulator class is usually termed a word.
All storage is ultimately composed of bits. Some early computers (such as the IBM 1400
series) used decimal arithmetic and addressing, and many current computers provide a packed
decimal (4 bits per digit) encoding. None of these architectures, however, consider decimal
digits to be atoms of storage that cannot be further decomposed; all have facilities for accessing
the individual bits of the digit in some manner.
Single bits and bit sequences such as the decimal digits discussed above cannot be accessed
directly on most machines. Instead, the bit sequence is characterized by a partial-word access
path specifying the address of a storage cell containing the sequence, the position of the
sequence from the left or right boundary of this unit, and the size of the sequence. Often this
partial word access path must be simulated by means of shifts and logical operations.
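In a high-level language the same simulation can be written with division and remainder
standing in for the shift and mask; a Pascal sketch (the function name is ours):

program partialword;

function extract(cell, position, size : integer) : integer;
  (* yields the size-bit sequence beginning `position' bits from the
     right boundary of cell; div replaces the shift, mod the mask *)
var scale, i : integer;
begin
  scale := 1;
  for i := 1 to position do scale := scale * 2;
  cell := cell div scale;        (* shift right by `position' bits *)
  scale := 1;
  for i := 1 to size do scale := scale * 2;
  extract := cell mod scale      (* mask off the high-order bits *)
end;

begin
  writeln(extract(300, 2, 4) : 3)   (* 300 = 100101100B; bits 5..2 are 1011B = 11 *)
end.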
Aggregates hold objects too large for a single storage cell. An aggregate will usually be
specified by the address of its first storage cell, and the cells making up the aggregate by their
addresses relative to that point. Often the address of the aggregate must be divisible by a
given integer, called the alignment. Figure 3.2 lists main storage operand sizes and alignments
for typical machines.
Operand Size (bits) Alignment
Byte 8 1
Halfword 16 2
Word 32 4
Doubleword 64 8
String up to 256 8 1
a) IBM 370 - Storage cell is an 8-bit byte
Operand Size (bits) Alignment
Bit 1 -
Digit 4 -
Byte 8 1
Word 16 2
Doubleword 32 2
b) Motorola 68000 - Storage cell is an 8-bit byte
Figure 3.2: Operand Sizes
Aggregates also appear in classes other than main storage. For example, the 16 general
purpose registers of the IBM 370 form a storage class of 4-byte cells addressed by the numbers
0 through 15. Every register whose address is even forms the first element of a larger entity
(a register pair) used in multiplication, division and shift operations. When a single-length
operand for such an operation is supplied, it should be placed in the proper register of a
pair rather than in an arbitrary register. The other register of the pair is then automatically
reserved for the operation, and cannot be used for other purposes.
The entities of a particular level in a hierarchy of aggregates may overlap. This occurs,
for example, for the segments in the main storage class of the Intel 8086 (65536-byte blocks
whose addresses are divisible by 16) or the 4096-byte blocks addressable via a base or index
register in the IBM 370.
Operations on registers usually involve the full register contents. When an object whose
size is smaller than that of a register is moved between a register and storage of some other
class, a change of representation may occur. The value of the object must, however, remain
invariant. Depending upon the type of the object, it may be lengthened by inserting leading
or trailing zeros, or by inserting leading or trailing copies of the sign. When it is shortened,
we must guarantee that no significant information is lost. Thus the working length of an
object must be distinguished from the storage length.
3.1.2 Access Paths
An access path describes the value or location of an operand, result or jump target. We
classify an instruction as a 0-, 1-, 2- or 3-address instruction according to the number of
access paths it specifies. Very seldom are there more than three access paths per instruction,
and if more do exist then they are usually implicit. (For example, in the MVCL instruction
of the IBM 370 the two register specifications R1 and R2 actually define four operands in
registers R1, R1+1, R2 and R2+1 respectively.)
Each access path specifies the initial element of an operand or result in a storage class.
Access paths to some of the storage classes (such as the stack, program counter, condition
code and special registers) are not normally explicit in the instruction. They will appear only
when there is some degree of freedom associated with their use, as in the PDP11 where any
register can be used as a stack pointer.
The most common explicit access paths involve one of the following computations:
Constant. The value appears explicitly in the instruction.
Register. The content of the register is taken as the value.
Register+constant. The sum of the content of the register and a constant appearing
explicitly in the instruction is taken as the value.
Register+register. The sum of the contents of two registers is taken as the value.
Register+register+constant. The sum of the contents of two registers and a constant
appearing in the instruction is taken as the value.
The computed value may itself be used as the operand (immediate), it may be used as the
effective address of the operand in main storage (direct), or it may be used as the address of
an address (indirect). On some machines the object fetched from main storage in the third
case may specify another computation and further indirection, but this feature is rarely used
in practice. Figure 3.3 illustrates these concepts for typical machines.
The addresses of registers must almost always appear explicitly as constants in the instruc-
tion. In special cases they may be supplied implicitly, as when the content of the (unspecied)
program counter is added to a constant given in the instruction (relative addressing ). If the
computed value is used as an address then the registers must belong to the base register
or index register class; the sum of the (unsigned) base address and (signed) index is often
interpreted modulo the address size. The values of constants in instructions are frequently re-
stricted to nonnegative values, and often their maximum values are far less than the maximum
address. (An example is the restriction to the range [0,4095] of the IBM 370.)
Not all computers allow every one of the access paths discussed above; restrictions in the
combination (operation, access path) can also occur. Many of these restrictions arise from
the properties of the machine's registers. We distinguish five architectural categories based
upon register structure:
Storage-to-storage. All operands of a computational operation are taken from main
storage, and the result is placed into main storage (IBM 1400 series, IBM 1620). Storage-
to-storage operations appear as a supplementary concept in many processors.
Stack. All operands of a computational operator are removed from the top of the stack,
and the result is placed onto the top of the stack (Burroughs 5000, 6000 and 7000 series,
ICL 2900 family). The stack appears as a supplementary concept in many processors.
Single Accumulator. One operand of a computational operator is taken from the accu-
mulator, and the result is placed into the accumulator; all other registers, including any
accumulator extension, have special tasks or cannot participate in all operations (IBM
7040/7090, Control Data 3000 series, many process-control computers, Intel 8080 and
microprocessors derived from it).
Multiple Accumulator. One operand of a computational operator is taken from one of
the accumulators, and the result is returned to that accumulator; long operands and
results are accommodated by pairing the accumulators (DEC PDP11, Motorola 68000,
IBM 370, Univac 1100).
Storage Hierarchy. All operands of a computational operator are taken from accumula-
tors, and the result is returned to an accumulator (Control Data 6000, 7000 and Cyber
series). This architecture is identical to the storage-to-storage architecture if we view
the accumulators as primary storage and the main storage as auxiliary storage.
3.1.3 Operations
Usually the instruction set of a computer provides four general classes of operation:
Computation: Implements a function from n-tuples of values to m-tuples of values. The
function may affect the state. Example: A divide instruction whose arguments are a
single-length integer divisor and a double-length integer dividend, whose results are a
single-length integer quotient and a single-length integer remainder, and which may
produce a divide check interrupt.
Data transfer: Copies information, either within one storage class or from one storage
class to another. Examples: A move instruction that copies the contents of one register
to another; a read instruction that copies information from a disc to main storage.
Sequencing: Alters the normal execution sequence, either conditionally or uncondition-
ally. Examples: A halt instruction that causes execution to terminate; a conditional
jump instruction that causes the next instruction to be taken from a given address if a
given register contains zero.
Environment control: Alters the environment in which execution is carried out. The
alteration may involve a transfer of control. Examples: An interrupt disable instruc-
tion that prohibits certain interrupts from occurring; a procedure call instruction that
updates addressing registers, thus changing the program's addressing environment.
It is not useful to attempt to assign each instruction unambiguously to one of these classes.
Rather the classes should be used as templates to evaluate the properties of an instruction
when deciding how to implement language operations (Section 3.2.3).
It must be possible for the control unit of a computer to determine the operation and
all of the access paths from the encoding of an instruction. Older computer designs usually
had a single instruction size of, say, 24 or 36 bits. Fixed subfields were used to specify the
operation and the various access paths. Since not all instructions require the same access
paths, some of these subfields were unused in some cases. In an information-theoretic sense,
this approach led to an inefficient encoding.
Coding efficiency is increased in more modern computers by using several different instruc-
tion sizes. Thus the IBM 370 has 16, 32 and 48 bit (2, 4 and 6 byte) instructions. The first
byte is the operation code, which determines the length and layout of the instruction as well
as the operation to be carried out. Nearly all microprocessors have variable-size operation
codes as well. In this case the encoding process carried out by the assembly task may require
larger tables, but otherwise the compiler is not affected. Variable-length instructions may
also lead to more complex criteria of optimality.
On some machines one or more operation codes remain unallocated to hardware functions.
Execution of an instruction specifying one of these operation codes results in an interrupt,
which can be used to activate a subprogram. Thus these undened operations can be given
meaning by software, allowing the compiler writer to extend the instruction set of the target
machine. Such programmable extension of the instruction set is sometimes systematically
supported by the hardware, in that the access paths to operands at specific positions are
placed at the disposal of the subprogram as parameters. The XOP instruction of the Texas
Instruments 990 has this property. (TRAP allows programmable instruction set extension on
the PDP11, but does not make special access path provisions.)
to over- and underflow. For example, consider a machine with integers in the range
[-32767, 32767]. If a > b is implemented as (a - b) > 0 then an overflow will occur when
comparing the values a = 16384 and b = -16384. The comparison code must either antici-
pate and avoid this case, or handle the overflow and interpret the result properly. In either
case, a long instruction sequence may be required. Underflow may occur in floating point
comparisons implemented by a subtraction when the operand difference is small. Since many
machines deliver 0 as a result, without indicating that an underflow has occurred, anticipation
and avoidance are required.
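One way to anticipate the overflow is to compare the signs first and subtract only when
the operands have the same sign, in which case the difference cannot overflow. A Pascal
sketch of the idea:

program comparison;
var a, b : integer;
    greater : boolean;
begin
  a := 16384; b := -16384;     (* a - b would overflow a 16-bit machine *)
  if (a >= 0) <> (b >= 0) then
    greater := a >= 0          (* signs differ: the nonnegative operand is larger *)
  else
    greater := a - b > 0;      (* signs agree: the difference cannot overflow *)
  writeln(greater)
end.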
Actually, the symptom of the floating point underflow problem is that a comparison asserts
the equality of two numbers when they are really different. We could argue that the inherent
inaccuracy of floating point operations makes equality testing a risky business anyway. The
programmer must thoroughly understand the algorithm and its interaction with the machine
representation before using equality tests, and hence we can inform him of the problem and
then forget about it. This position is defensible provided that we can guarantee that a
comparison will never yield an incorrect relative magnitude (i.e. it will never report a > b
when a is less than b, or vice-versa).
If, as in Pascal, subranges m..n of integers can be specified as types, the compiler writer
must decide what use to make of this information. When the usual integer range can be
exceeded (not possible in Pascal) this forces the introduction of higher-precision arithmetic
(in the extreme case, of variable-length arithmetic). For small subranges the size of the
range can be used to reduce the number of bits required in the representation, if necessary
by replacing the integer i by (i - lower bound), although this last is not recommended. The
important question is whether arithmetic operations exist for the shorter operands, or at least
whether the conversion between working length and storage length can easily be carried out.
(Recall that no significant bits may be discarded when shortening the representation.)
The possibilities for mapping real numbers are constrained by the floating point operations
of the hardware or the given subroutine package. (If neither is available on the target machine
then implementation should follow the IEEE standard.) The only real choice to be made
involves the precision of the significand. This decision must be based upon the milieu in
which the compiler will be used and upon numeric problems whose discussion is beyond the
scope of this book.
For characters and character strings the choice of mapping is restricted to the specification
of the character code. Assuming that this is not fixed by the source language, there are two
choices: either a standard code such as the ISO 7-bit code (ASCII), or the code accepted
by the target computer's operating system for input/output of character strings (EBCDIC
or other 6- or 8-bit code; note that EBCDIC varies from one manufacturer to another).
Since most computers provide quite efficient instructions for character translation, use of the
standard code is often preferable.
The representation of other finite types reduces to the question of suitably representing
the integers 0..n-1, which we have already discussed. One exception is the Boolean values
false and true. Only a few machines are provided with instructions that access single bits.
If these instructions are absent, bit operations must be implemented by long sequences of
code (Figure 3.4). In such cases it is appropriate to implement Boolean variables and values
as bytes or words. Provided that the source language has not constrained their coding, the
choice of representation depends upon the realization of operations with Boolean operands or
Boolean results. In making this decision, note that comparison and relational operations occur
an order of magnitude more frequently than all other Boolean operations. Also, the operands
of and and or are much more frequently relations than Boolean variables. In particular, the
implementation of and and or by jump cascades (Section 3.2.3) introduces the possibilities
(false = 0, true ≠ 0) and (false ≥ 0, true < 0) or their inverses in addition to the classical
(false = 0, true = 1). These possibilities underscore the use of more than one bit to represent
a Boolean value.
1 Bit: The bit position is specified by two masks, M0 = B'0...010...0' and
M1 = B'1...101...1'.
1 Byte: Let 0 represent false, K represent true.
a) Possible representations for Boolean values
Construct        Byte representation     Bit representation
q := p           MVC q,p                 TM M0,p
                                         BO L1
                                         NI M1,q
                                         B L2
                                     L1  OI M0,q
                                     L2  continuation
p := not p       XI K,p                  XI M0,p
q := q or p      OC q,p                  TM M0,p
                                         BZ L1
                                         OI M0,q
                                     L1  continuation
q := q and p     NC q,p                  TM M0,p
                                         BO L1
                                         NI M0,q
                                     L1  continuation
(The masks M0 and M1 are those appropriate to the second operand of the instruction in which they appear.)
b) Code using the masks from (a)
Figure 3.4: Boolean Operations on the IBM 370
Here |M| is the size of an element in address units and address(a[0]) is the `fictitious
starting address' of the array. The address of a[0] is computed from the location of the
array in storage; such an element need not actually exist. In fact, address(a[0]) could be
an invalid address lying outside of the address space.
The usual representation of an object b : array[m1..n1, ..., mr..nr] of M occupies
k1 * k2 * ... * kr * |M| contiguous memory cells, where kj = nj - mj + 1, j = 1, ..., r.
The address of element b[i1, ..., ir] is given by the following storage mapping function when
the array is stored in row-major order:

address(b[m1, ..., mr]) + (i1 - m1) * k2 * ... * kr * |M| + ... + (ir - mr) * |M|
= address(b[0, ..., 0]) + i1 * k2 * ... * kr * |M| + ... + ir * |M|

By appropriate factoring, this last expression can be rewritten as:

address(b[0, ..., 0]) + (...(i1 * k2 + i2) * k3 + ... + ir) * |M|
If the array is stored in column-major order then the order of the indices in the polynomial
is reversed:

address(b[0, ..., 0]) + (...(ir * kr-1 + ir-1) * kr-2 + ... + i1) * |M|
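The factored form translates directly into a loop over the subscripts. A Pascal sketch
for fixed r (the names are ours; the fictitious starting address and the kj are assumed
given):

program mapping;
const r = 3;
type dims = array[1..r] of integer;
var i, k : dims;

function rowmajor(base : integer; var i, k : dims; size : integer) : integer;
  (* evaluates address(b[0,...,0]) + (...(i1*k2 + i2)*k3 + ... + ir)*|M| *)
var a, j : integer;
begin
  a := i[1];
  for j := 2 to r do a := a * k[j] + i[j];
  rowmajor := base + a * size
end;

begin
  k[1] := 2; k[2] := 3; k[3] := 4;      (* kj = nj - mj + 1 *)
  i[1] := 1; i[2] := 2; i[3] := 3;
  writeln(rowmajor(1000, i, k, 4) : 6)  (* 1000 + ((1*3 + 2)*4 + 3)*4 = 1092 *)
end.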
The choice of row-major or column-major order is a significant one. ALGOL 60 does not
specify any particular choice, but many ALGOL 60 compilers have used row-major order.
Pascal implicitly requires row-major order, and FORTRAN explicitly specifies column-major
order. This means that Pascal arrays must be transposed in order to be used as parameters
to FORTRAN library routines. In the absence of language constraints, make the choice that
corresponds to the most extensive library software on the target machine.
Access to b[i1, ..., ir] is undefined if the relationship mj <= ij <= nj is not satisfied for some
j = 1, ..., r. To increase reliability, this relationship should be checked at run time if the
compiler cannot verify it in other ways (for example, by showing that ij is the controlled
variable of a loop whose starting and ending values satisfy the condition). To make the check,
we need to evaluate a storage mapping function with the following fixed parameters (or its
product with the size of the single element):

r, address(b[0, ..., 0]), m1, ..., mr, n1, ..., nr

Together, these parameters constitute the array descriptor. The array descriptor must be
stored explicitly for dynamic and flexible arrays, even in the trivial case r = 1. For static
arrays the parameters may appear directly as immediate operands in the instructions for
computing the mapping function. Several array descriptors may correspond to a single array,
so that in addition to questions of equality of array components we have questions of equality
or identity of array descriptors.
An r-dimensional array b can also be thought of as an array of (r-1)-dimensional arrays.
We might apply this perception to an object c : array[1..m, 1..n] of integer, representing it as
m one-dimensional arrays of type t = array[1..n] of integer. The fictitious starting addresses
of these arrays are then stored in an object a : array[1..m] of ^t. To be sure, this descriptor
technique raises the storage requirements of c from m * n to m * n + m locations for integers
or addresses; in return it speeds up access on many machines by replacing the multiplication
by n in the mapping function address(c[0,0]) + (i * n + j) * |integer| with an indexed memory
reference. The saving may be particularly significant on computers that have no hardware
multiply instruction, but even then there are contraindications: Multiplications occurring in
array accesses are particularly amenable to elimination via simple optimizations.
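In Pascal the technique amounts to an explicit array of row pointers; a sketch:

program rowvectors;
const m = 3; n = 4;
type row = array[1..n] of integer;
     prow = ^row;
var a : array[1..m] of prow;   (* the fictitious starting addresses of the rows *)
    i, j : integer;
begin
  for i := 1 to m do new(a[i]);
  for i := 1 to m do
    for j := 1 to n do
      a[i]^[j] := i * j;       (* an indexed fetch replaces the multiplication by n *)
  writeln(a[2]^[3] : 3)        (* prints 6 *)
end.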
The descriptor technique is supported by hardware on Burroughs 6700/7700 machines.
There, the rows of a two-dimensional array are stored in segments addressed by special seg-
ment descriptors. The segment descriptors, which the hardware can identify, are used to
access these rows. Actual allocation of storage to the rows is handled by the operating sys-
tem and occurs at the rst reference rather than at the declaration. The allocation process,
which is identical to the technique for handling page faults, is also applied to one-dimensional
arrays. Each array or array row is divided into pages of up to 256 words. Huge arrays can
be declared if the actual storage requirements are unknown, and only that portion actually
referenced is ever allocated.
Character strings and sets are usually implemented as arrays of character and Boolean
values respectively. In both cases it pays to pack the arrays. In principle, character string
variables have variable length. Linked lists provide an appropriate implementation; each list
element contains a segment of the string. List elements can be introduced or removed at will.
Character strings with fixed maximum length can be represented by arrays of this length.
When an array of Boolean values is packed, each component is represented by a single bit,
even when simple Boolean variables are represented by larger storage units as discussed above.
A record is represented by a succession of fields. If the fields of a record have alignment
constraints, the alignment of the entire record must be constrained also in order to guarantee
that the alignment constraints of the fields are met. An appropriate choice for the alignment
constraint of the record is the most stringent of the alignment constraints of its fields. Thus
a record containing fields with alignments of 2, 4 and 8 bytes would itself have an alignment
of 8 bytes. Whenever storage for an object with this record type is allocated, its starting
address must satisfy the alignment constraint. Note that this applies to anonymous objects
as well as objects declared explicitly.
The amount of storage occupied by the record may depend strongly upon the order of
the fields, due to their sizes and alignment constraints. For example, consider a byte-oriented
machine on which a character variable is represented by one byte with no alignment constraint
and an integer variable occupies four bytes and is constrained to begin at an address divisible
by 4. If a record contained an integer field followed by a character field followed by a second
integer field then it would occupy 12 bytes: There would be a 3-byte gap following the
character field, due to the alignment constraint on integer variables. By reordering the fields,
this gap could be eliminated. Most programming languages permit the compiler to do such
reordering.
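The example can be written down directly; whether the gap actually appears is
implementation-dependent, and sizeof is a common extension rather than standard Pascal:

program layout;
type unordered = record
       i1 : integer;        (* bytes 0..3 *)
       c  : char;           (* byte 4, followed by a 3-byte gap *)
       i2 : integer         (* bytes 8..11: 12 bytes in all *)
     end;
     reordered = record
       i1, i2 : integer;    (* bytes 0..7 *)
       c : char             (* byte 8: no gap *)
     end;
begin
  writeln(sizeof(unordered) : 3, sizeof(reordered) : 3)
end.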
Records with variants can be implemented with the variants sharing storage. If it is
known from the beginning that only one variant will be used and that the value of the variant
selector will never change, then the storage requirement may be reduced to exactly that
for the specified variant. This requirement is often satisfied by anonymous records; Pascal
distinguishes the calls new(p) and new(p, variant selector) as constructors for anonymous
records. In the latter case the value of the variant selector may not change, whereas in the
former all variants are permitted.
The gaps arising from the alignment constraints on the fields of a record can be eliminated
by simply ignoring those constraints and placing the fields one after another in memory. This
packing of the components generally increases the cost in time and instructions for field
access considerably. The cost almost always outweighs the savings gained from packing a
single record; packing pays only when many identical records are allocated simultaneously.
Packing is often restricted to partial words, leaving objects of word length (register length)
or longer aligned. On byte-oriented machines it may pay to pack only the representation of
sets to the bit level.
Packing alters the access function of the components of a composite object: The selector
must now specify not only the relative address of the component, but also its position within
the storage cell. On some computers extraction of a partial word can be specified as part of an
operand address, but usually extra instructions are required. This has the result that packed
components of arrays, records and sets may not be accessible via normal machine addresses.
They cannot, therefore, appear as reference parameters.
Machine-dependent programs sometimes use records as templates for hardware objects.
For example, the assembly phase of a compiler might use a record to describe the encoding of
a machine instruction. The need for a fixed layout in such cases violates the abstract nature
of the record, and some additional mechanism (such as the representation specification of
Ada) is necessary to specify this. If the language does not provide any special mechanism,
the compiler writer can overload the concept of packing by guaranteeing that the fields of a
packed record will be allocated in the order given by the programmer.
Addresses are normally used to represent pointer values. Addresses relative to the be-
ginning of the storage area containing the objects are often sufficient, and may require less
storage than full addresses. If, as in ALGOL 68, pointers have bounded lifetime, and the
correctness of assignments to reference variables must be checked at run time, we must add
information to the pointer from which its lifetime may be determined. In general the starting
address of the activation record (Section 3.3) containing the reference object serves this pur-
pose; reference objects of unbounded extent are denoted by the starting address of the stack.
A comparison of these addresses for relative magnitude then represents inclusion of lifetimes.
3.2.3 Expressions
Because of the diversity of machine instruction sets, we can only give the general principles
behind the mapping of expressions here. An important point to remember throughout the
discussion, both here and in Section 3.2.4, is that the quality of the generated code is deter-
mined by the way it treats cases normally occurring in practice rather than by its handling
of the general case. Moreover, local code characteristics have a greater impact than any op-
timizations on the overall quality. Figure 3.5 shows the static frequencies of operations in
a large body of Pascal text. Note the preponderance of memory accesses over computation,
but remember that indexing generally involves both multiplication and addition. Remember
also that these are static frequencies; dynamic frequencies might be quite different because
a program usually spends about 90% of its time in heavily-used regions accounting for less
than 10% of the overall code.
Structure Tree Operator Percent of All Operators
Access a variable 27
Assign 13
Select a field of a record 9.7
Access a value parameter 8.1
Call a procedure 7.8
Index an array (each subscript) 6.4
Access an array 6.1
Compare for equality (any operands) 2.7
Access a variable parameter 2.6
Add integers 2.3
Write a text line 1.9
Dereference a pointer variable 1.9
Compare for inequality (any operands) 1.3
Write a single value 1.2
Construct a set 1.0
not 0.7
and 0.7
Compare for greater (any operands) 0.5
Test for an element in a set 0.5
or 0.4
All other operators 3.8
Figure 3.5: Static Frequencies of Pascal Operators [Carter, 1982]
Single target machine instructions directly implement operations appearing in the struc-
ture tree only in the simplest cases (such as integer arithmetic). A node of the structure
tree generally corresponds to a sequence of machine instructions, which may appear either
directly in the generated code or as a subroutine call. If subroutines are used then they may
be gathered together into an interpreter consisting of a control loop containing a large case
statement. The operations are then simply selectors used to choose the proper case, and
may be regarded as instructions of a new (abstract) machine. This approach does not really
answer the question of realizing language elements on a target machine; it merely changes the
target machine, hopefully simplifying the problem.
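A skeletal interpreter of this kind can be sketched in Pascal; the tiny stack machine shown
here is our own illustration, not a proposal for an actual abstract target machine:

program interpreter;
type opcode = (pushconst, add, mul, stop);
     instruction = record
       code  : opcode;
       value : integer              (* operand of pushconst; unused otherwise *)
     end;
var prog : array[1..6] of instruction;
    stack : array[1..10] of integer;
    ip, sp : integer;
begin
  (* abstract machine code for (2 + 3) * 4 *)
  prog[1].code := pushconst; prog[1].value := 2;
  prog[2].code := pushconst; prog[2].value := 3;
  prog[3].code := add;
  prog[4].code := pushconst; prog[4].value := 4;
  prog[5].code := mul;
  prog[6].code := stop;
  ip := 1; sp := 0;
  while prog[ip].code <> stop do begin   (* the control loop *)
    case prog[ip].code of                (* one case per abstract instruction *)
      pushconst : begin sp := sp + 1; stack[sp] := prog[ip].value end;
      add : begin stack[sp-1] := stack[sp-1] + stack[sp]; sp := sp - 1 end;
      mul : begin stack[sp-1] := stack[sp-1] * stack[sp]; sp := sp - 1 end
    end;
    ip := ip + 1
  end;
  writeln(stack[sp] : 3)                 (* prints 20 *)
end.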
A closed sequence is invariably slower than the corresponding open sequence because of
the cost of the transfers in and out. It would therefore be used only if commensurate savings
in space were possible. Some care must be taken in evaluating the tradeoffs, because both
open and closed sequences usually involve setup code for the operands. It is easy to overlook
this code, making erroneous assumptions about the operand locations, and thereby arrive at
the wrong decision. Recall from Section 3.1.3 that it is sometimes possible to take advantage
of unused operation codes to access closed instruction sequences. Depending upon the details
of the hardware, the time overhead for this method may be either higher or lower than that
of a conventional call. It is probably most useful for implementing facilities that might be
provided by hardware. The typical example is floating point arithmetic on a microprocessor
with integer operations only. A floating point operation usually involves a long sequence of
instructions on such a machine (which may not even be capable of integer multiplication or
division), and thus the entry/exit overhead is negligible. If the user later adds a floating-
point chip, and controls it with the previously unused operation codes, no changes to the
code generator are required. Even when different operation codes are used the changes are
minimal.
An object, label or procedure is addressable if its effective address can be expressed by
the relevant access path of an instruction. For entities that are not addressable, additional
operations and temporary storage are required to compute the effective address. The allow-
able combinations of operation and access function exert a very strong influence upon the
code generation process because of this. On the Motorola 68000, for example, specification
of the operation can be largely separated from selection of the access path, and operand ad-
dressability is almost independent of the operator. Many IBM 370 instructions, on the other
hand, work only when the second operand is in a register. In other cases memory access is
possible, but only via a base register without indexing. This leads to the problem that an
operand may be addressable in the context of one operation but not in the context of another.
When an instruction set contains such asymmetries, the simplest solution is to define the
abstract machine for the source-to-target mapping with a uniform access function, reserving
the resources (usually one or two registers) needed to implement the uniform access function
for any instruction. Many code sequences require additional resources internally in any event.
These can often be standardized across the code sequences and used to provide the uniform
access function in addition. The only constraint on resources reserved for the uniform access
function is that they have no inter-sequence meaning; they can be used arbitrarily within a
sequence.
Consider the tree for an expression. The addressability of entities described by leaves
is determined by the way in which the environment is encoded in the machine state. (We
shall discuss possibilities for environment encoding in Section 3.3.) For entities described by
interior nodes, however, the addressability depends upon the code sequence that implements
the node. It is often possible to vary a code sequence, without changing its cost, to meet
the addressability requirements of another node. Figure 3.6 shows a typical example. Here
the constraints of the IBM 370 instruction set require that a multiplicand be in the odd-
numbered register of a pair, and that the even-numbered register of that pair be free. Similarly,
the optimum mechanism for converting a single-length value to double-length requires its
argument to be in the even register of the pair used to hold its result. An important part of
the source-to-target mapping design is the determination of the information made available
by a node to its neighbors in the tree, and how this information aects the individual code
sequences.
Interior nodes whose operations yield addresses, such as indexing and field selection nodes,
may or may not result in code sequences. Addressability is the key factor in this decision:
No code is required if an access function describing the node's result can be built, and if
that access function is acceptable to the instruction using the result. The richer the set of
L R1,I
A R1,J Result in R1
M R0,K Multiplicand from R1, product to (R0,R1)
D R0,L Dividend from (R0,R1)
a) Code for the expression ((i + j) * k / l)
L R0,I
A R0,J
A R0,K Result in R0
SRDA R0,32 Extend to double, result in (R0,R1)
D R0,L Dividend from (R0,R1)
b) Code for the expression ((i + j + k) / l)
Figure 3.6: Optimum Instruction Sequences for the IBM 370
access functions, the more nodes can be implemented simply by access function restructuring.
In fact, it is often possible to absorb nodes describing normal value operations into access
functions that use their result. Figure 3.7 is a tree for b[i + 12]. As we shall see in Section 3.3,
the local byte array b might have access function 36(13) on an IBM 370 (here register 13 gives
the base address of the local contour, and 36 is the relative byte location of b within that
contour). After loading the value of i into register 1, the effects of the index and addition
nodes can be combined into the access function 48(13,1). This access function (Figure 3.3a)
can be used to obtain the second argument in any RX-format instruction on the IBM 370.
Figure 3.7: Tree for b[i + 12] (an INDEX node whose operands are b and the sum i + 12)
b. But fetching a simple variable has no side effect, and hence the short-circuit evaluation
is not detectable. If c were a parameterless function with a side effect then it should be
invoked prior to the start of the code sequence of Figure 3.8b, and the c in that code sequence
would represent temporary storage holding the function result. Thus we see that questions
of short-circuit evaluation affect only the relative placement of code belonging to the jump
cascade and code for evaluating the operands of the relations.
if (a < b ) and (c = d ) or (e > f ) then statement ;
a) A conditional
L R1,a
C R1,b
BNL L10 Note condition reversal here
L R1,c
C R1,d
BEQ L1 Condition is not reversed here
L10 L R1,e
C R1,f
BNH L2 Reversed
L1 ... Code for statement
L2 ... Code following the conditional
generator for each case clause. The source-to-target mapping must specify the parameters to
be used in making this choice.
condition(e, L1, L2)
L1: clause
L2:
a) if e then clause ;
GOTO L
L1: clause
L: condition(e, L1, L2)
L2:
d) while e do clause ;
L1: clause
condition(e, L2, L1)
L2:
e) repeat clause until e
forbegin(i, e1 , e2 , e3 )
clause
forend(i, e2 , e3 )
f) for i := e1 by e2 to e3 do clause ;
Figure 3.9: Implementation Schemata for Control Statements
required. Further, if the target machine can execute independent instructions in parallel, this
schema provides more opportunity for such parallelism than one in which the test is at the
beginning.
`Forbegin' and `forend' can be quite complex, depending upon what the compiler can
deduce about the bounds and step, and how the language definition treats the controlled
variable. As an example, suppose that the step and bounds are constants less than 2^12, the
step is positive, and the language definition states that the value of the controlled variable is
undefined on exit from the loop. Figure 3.10b shows the best IBM 370 implementation for
this case, which is probably one of the most common. (We assume that the body of the loop
is too complex to permit retention of values in registers.) Note that the label LOOP is defined
within the `forbegin' operation, unlike the labels used by the other iterations in Figure 3.9.
If we permit the bounds to be general expressions, but specify the step to be 1, the general
schema of Figure 3.10c holds. This schema works even if the value of the upper bound is the
largest representable integer, since it does not attempt to increment the controlled variable
after reaching the upper bound. More complex cases are certainly possible, but they occur
only infrequently. It is probably best to implement the abstract operations by subroutine
calls in those cases (Exercise 3.9).
target : array [kmin .. kmax] of address;
k : integer;

k := e;
if (k >= kmin) and (k <= kmax) then goto target[k] else goto L0;
a) General schema for `select' (Figure 3.9c)
     LA   1,e1       e1 = constant < 2^12
LOOP ST   1,i
     ...             Body of the clause
     L    1,i
     LA   2,e2       e2 = constant < 2^12
     LA   3,e3       e3 = constant < 2^12
     BXLE 1,2,LOOP
b) IBM 370 code for special-case forbegin ... forend
i := e1 ; t := e3 ;
if i > t then goto l3 else goto l2;
l1 : i := i + 1;
l2 : : : : (* Body of the clause *)
if i < t then goto l1;
l3 :
c) Schema for forbegin...forend when the step is 1
Figure 3.10: Implementing Abstract Operations for Control Structures
Procedure and function invocations are control structures that also manipulate the state.
Development of the instruction sequences making up these invocations involves decisions
about the form of parameter transmission, and the construction of the activation record, the
area of memory containing the parameters and local variables.
A normal procedure invocation, in its most general form, involves three abstract opera-
tions:
Callbegin: Obtain access to the activation record of the procedure.
always treat the argument as a variable. If the programmer uses a constant, the compiler
must either flag it as an error or move the constant value to a temporary storage location
and transmit the address of that temporary.)
For function results, the compiler generally produces temporaries of suitable type at the
call site and in the function. Within the function, the result is assigned to the local temporary.
Upon return, as in the case of a result parameter, the local temporary is copied into the global
temporary. The global temporary is only needed if the result cannot be used immediately.
(An example of this case is the value of cos(x) in cos(x) + sin(y).)
Results delivered by function procedures can, in simple cases, be returned in registers. (For
compatibility with jump cascades, it may be useful for a Boolean function to encode its result
by returning to two different points.) Transmission of composite values as function results
can be difficult, especially when these are arrays whose sizes are not known to the caller. This
means that the caller cannot reserve storage for the result in his own environment a priori;
as a last resort such objects may be left on the heap (Section 3.3.3).
The only advantage of static allocation then consists of the fact that no operations for storage
reservation or release need be generated at block or procedure entry and exit.
3.3.2 Dynamic Storage Management Using a Stack
As we have already noted in Section 2.5.2, all declared values in languages such as Pascal and
SIMULA have restricted lifetimes. Further, the environments in these languages are nested:
The extent of all objects belonging to the contour of a block or procedure ends before that of
objects from the dynamically enclosing contour. Thus we can use a stack discipline to manage
these objects: Upon procedure call or block entry, the activation record containing storage for
the local objects of the procedure or block is pushed onto the stack. At block end, procedure
return or a jump out of these constructs, the activation record is popped off the stack. (The
entire activation record is stacked; we do not deal with single objects individually!)
An object of automatic extent occupies storage in the activation record of the syntactic
construct with which it is associated. The position of the object is characterized by the base
address, b, of the activation record and the relative location (offset), R, of its storage within
the activation record. R must be known at compile time but b cannot be known (otherwise
we would have static storage allocation). To access the object, b must be determined at run
time and placed in a register. R is then either added to the register and the result used
as an indirect address, or R appears as the constant in a direct access function of the form
`register+constant'.
Every object of automatic extent must be decomposable into two parts, one of which has
a size that can be determined statically. (The second part may be empty.) Storage for the
static parts is allocated by the compiler, and makes up the static portion of the activation
record. (This part is often called the first order storage of the activation record.) When a
block or procedure is activated, the static part of its activation record is pushed onto the
stack. If the activation record contains objects whose sizes must be determined at run time,
this determination is carried out and the activation record extended. The extension, which
may vary in size from activation to activation, is often called the second order storage of the
activation record. Storage within the extension is always accessed indirectly via information
held in the static part; in fact, the static part of an object may consist solely of a pointer to
the dynamic part.
An array with dynamic bounds is an example of an object that has both static and
dynamic parts. In most languages, the number of dimensions of an array is fixed, so the size
of the array descriptor is known at compile time. Storage for the descriptor is allocated by the
compiler in the static part of the activation record. On encountering the declaration during
execution, the bounds are evaluated and the amount of storage needed for the array elements
is determined. The activation record is extended by this amount and the array descriptor is
initialized appropriately. All accesses to elements of the array are carried out via the array
descriptor.
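As a concrete illustration (our sketch, with assumed field names), the static part of a one-dimensional array with dynamic bounds might be declared as follows; only this descriptor occupies first order storage, while elements points into the second order extension:

type
  array_descriptor = record
    lower, upper : integer;      (* bounds, evaluated on activation *)
    element_size : integer;      (* in storage units *)
    elements : address           (* start of the element storage *)
  end;

Access to a[i] then validates lower <= i <= upper and computes the address elements + (i - lower) * element_size.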
We have already noted that at compile time we do not know the base address of an
activation record; we know only the range to which it belongs. From this we must determine
the base address, even in the case where recursion leads to a number of activation records
belonging to the same range. The range itself can be specied by its block nesting depth, bnd,
defined according to the following rules based on the static structure of the program:
The main program has bnd = 1.
A range is given bnd = t + 1 if and only if the immediately enclosing range has bnd = t.
Bnd = t indicates that during execution of the range the state consists of a total of t
nested contours.
If, as in all ALGOL-like languages, the scopes of identifiers are statically nested then
at every point in the execution history of a program there is at most one activation record
accessible at a given nesting depth. The base address of a particular activation record can
then be found by noting the corresponding nesting depth at compile time and setting up a
mapping s : nestingdepth ! baseaddress during execution. The position of an object in the
xed part of the activation record is fully specied by the pair (bnd; R); we shall therefore
speak of `the object (bnd; R)'.
The mapping s changes upon range entry and exit, procedure call and return, and jumps
out of blocks or procedures. Updating s is thus one of the tasks (along with stack pointer
updating and parameter or result transmission) of the state-altering operations that we met
in Section 2.5.2. We shall describe them semi-formally below, assuming that the stack is
described by:
k : array[0 .. upper limit ] of storage cell ; k top : 0 .. upper limit ;
We assume further that a storage cell can hold exactly one address, and we shall treat address
variables as integer variables with which we can index k.
The contour nesting and pointer to dynamic predecessor required by the contour model
are represented by address values stored in each activation record. Together with the re-
turn address, and possibly additional information depending upon the implementation, they
constitute the `administrative overhead' of the activation record. A typical activation record
layout is shown in Figure 3.11; the corresponding state change operations are given in Figure
3.12. We have omitted range entry/exit operations. As noted in Section 2.5.2, procedures and
blocks can be treated identically by regarding a block as a parameterless procedure called `on
the spot', or contours corresponding to blocks can be eliminated and objects lying upon them
can be placed on the contour of the enclosing procedure. If blocks are to be given separate
activation records, the block entry/exit operations are identical to those for procedures except
that no return address is saved on entry and ip is not set on exit. Jumps out of blocks are
treated exactly as shown in Figure 3.12c in any case.
        +----------------------------------+
        |       Second-order storage       |
        +----------------------------------+
        |               ...                |
   2    |          Return address          |
   1    |  Pointer to dynamic predecessor  |   First-order storage
   0    |  Pointer to static predecessor   |
        +----------------------------------+
Figure 3.11: Typical Activation Record Layout
The procedure and jump addresses indicated by the comments in Figures 3.12a and c
are supplied by the compiler; the environment pointers must be determined at run time. If
a procedure is invoked directly, by stating its identifier, then it must lie within the current
environment and its static predecessor can be obtained from the stack by following the chain
of static predecessors until the proper block nesting depth is reached:
environment := ep;
for i := bndcaller downto bndprocedure do
  environment := k[environment];
The value (bndcaller - bndprocedure) is known at compile time and is usually small, so the loop is
sometimes `unrolled' to a fixed sequence of environment := k[environment] operations.
k[k top] := (* static predecessor of the procedure *);
k[k top + 1] := ep; (* dynamic predecessor *)
k[k top + 2] := ip; (* return address *)
ep := k top; (* current environment *)
k top := k top + "size"; (* first free location *)
ip := (* procedure code address *)
a) Procedure entry
k top := ep;
ep := k[k top + 1]; (* back to the dynamic predecessor *)
ip := k[k top + 2];
b) Procedure exit
k top := ep;
ep := (* target environment of the jump *);
while k[k top + 1] <> ep do
k top := k[k top + 1]; (* leave all intermediate environments *)
ip := (* target address of the jump *);
c) Jump out of a procedure
Figure 3.12: Environment Change Operations
When a procedure is passed as a parameter and then the parameter is called, the static
predecessor cannot be obtained from the stack because the called procedure may not be in
the environment of the caller. (Figures 2.3 and 2.5 illustrate this problem.) Thus a procedure
parameter must be represented by a pair of addresses: the procedure entry point and the
activation record address for the environment statically enclosing the procedure declaration.
This pair is called a closure. When a procedure parameter is invoked, the address of the
static predecessor is obtained from the closure that represents the parameter. Figure 3.13
shows the stack representing the contours of Figure 2.5; note the closures appearing in the
activation records for procedure p.
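Expressed as a record, a closure is simply the pair just described; a minimal sketch (ours, with assumed names):

type
  closure = record
    code : address;          (* procedure entry point *)
    environment : address    (* activation record of the contour statically
                                enclosing the procedure declaration *)
  end;

When the parameter is invoked, the environment field, rather than a pointer found by chaining down the stack, becomes the static predecessor of the new activation record.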
Jumps out of a procedure also involve changing the state (Figure 3.12c). The mechanism
is essentially the same as that discussed above: If the label is referenced directly then it lies in
the current environment and its environment pointer can be obtained from the stack. A label
variable or label parameter, however, must be represented by a closure and the environment
pointer obtained from that closure.
Access to any object in the environment potentially involves a search down the chain of
static predecessors for the pointer to the activation record containing that object. In order to
avoid the multiple memory accesses required, a copy of the addresses can be kept in an array,
called a display, indexed by the block nesting depth. Access to the object (bnd; R) is therefore
provided by display[bnd] + R; we need only a single memory access, loading display[bnd] into
a base register, to set up the access function.
The Burroughs 6000/7000 series computers have a 32-register display built into the hard-
ware. This limits the maximum block nesting depth to 32, which is no limitation in practice.
Even a restriction to 16 is usually no problem, but 8 is annoying. Thus the implementation
of a display within the register set of a multiple-register machine is generally not possible,
because it leads to unnatural restrictions on the block nesting depth. The display can be
k top = 22    ep = 19    ip = address of label 2

22:                                      (first free location)
21:  location after 1: f
20:  12  (dynamic predecessor)            activation record for procedure q
19:  5   (static predecessor)
18:  i = 0
17:  11  (reference to i)
16:  5   (q's environment)
15:  entry point address for q            activation record for procedure p
14:  location after p(q, i)
13:  5   (dynamic predecessor)
12:  0   (static predecessor)
11:  i = 2
10:  4   (reference to k)
 9:  0   (empty's environment)
 8:  entry point address for empty        activation record for procedure p
 7:  location after p(empty, k)
 6:  0   (dynamic predecessor)
 5:  0   (static predecessor)
 4:  k = 0
 3:  n = 7
 2:  0   (return address)                 activation record for procedure outer
 1:  0   (dynamic predecessor)
 0:  0   (static predecessor)

Figure 3.13: Stack Configuration Corresponding to Figure 2.5
allocated to a fixed memory location, or we might keep only a partial display (made up of the
addresses of the most-frequently accessed activation records) in registers. Which activation
record addresses should be kept is, of course, program-dependent. The current activation
record address and that of the outermost activation record are good choices in Pascal; the
latter should probably be replaced with that of the current module in an implementation of
any language providing modules.
If any sort of display, partial or complete, is used then it must be kept up to date as the
state changes. Figure 3.14 shows a general procedure for bringing the display into synchronism
with the static chain. It will alter only those elements that need alteration, halting when the
remainder is guaranteed to be correct. In many cases the test for termination takes more
time than it saves, however, and a more appropriate strategy may be simply to reload the
entire display from the static chain.
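Figure 3.14 is not reproduced here, but a procedure with the behavior just described can be sketched as follows, assuming the conventions of Figures 3.11 and 3.12 (in particular that k[a] holds the static predecessor of the activation record at address a):

procedure update_display (bnd : integer; a : address);
  (* make display[1..bnd] agree with the static chain rooted at a *)
begin
  if display[bnd] <> a then
    begin
      display[bnd] := a;
      if bnd > 1 then update_display (bnd - 1, k[a])
    end
  (* otherwise the remaining entries are already correct *)
end;

The test against display[bnd] is the termination check discussed above; reloading the entire display from the static chain corresponds to deleting that test.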
Note that the full generality of update display is needed only when returning from a pro-
cedure or invoking a procedure whose identity is unknown. If a procedure at level bndnew in
the current addressing environment is invoked, the single assignment display[bndnew] := a
suffices. (Here a is the address of the new activation record.) Display manipulation can
become a significant overhead for short procedures operating at large nesting depths. Recognition
of special cases in which this manipulation can be avoided or reduced is therefore an
released only if the program fragment (block, procedure, class) to which it belongs has been
left and no pointers to objects within this activation record exist.
Heap allocation is particularly simple if all objects required during execution can fit into
the designated area at the same time. In most cases, however, this is not possible. Either
the area is not large enough or, in the case of virtual storage, the working set becomes too
large. A detailed discussion of heap storage management policies is beyond the scope of this
book (see Section 3.5 for references to the relevant literature). We shall only sketch three
possible recycling strategies for storage and indicate the support requirements placed upon
the compiler by these strategies.
If a language provides an explicit `release' operation, such as Pascal's dispose or PL/1's
free, then heap storage may be recycled by the user. This strategy is simple for the compiler
and the run-time system, but it is unsafe because access paths to the released storage may
still exist and be used eventually to access recycled storage with its earlier interpretation.
The release operation, like the allocation operation, is almost invariably implemented as a
call on a support routine. Arguments that describe the size and alignment of the storage area
must be supplied to these calls by the compiler on the basis of the source type of the object.
Automatic reclamation of heap storage is possible only if the designers of a language
have considered this and made appropriate decisions. The key is that it must be possible
to determine whether or not a variable contains an address. For example, only a variable
of pointer type may contain an address in a Pascal program. A special value, nil, indicates
the absence of a pointer. When a pointer variable is created, it could be initialized to nil.
Unfortunately, Pascal also provides variant records and does not require such records to have
a tag field indicating which variant is in force. If one variant contains a pointer and another
does not, it is impossible to determine whether or not the corresponding variable contains a
pointer. Detailed discussion of the tradeoffs involved in such a decision by a language designer
is beyond the scope of this text.
Storage can be recycled automatically by a process known as garbage collection, which
operates in two steps:
Mark. All accessible objects on the heap are marked as being accessible.
Collect. All heap storage is scanned. The storage for unmarked objects is recycled, and
all marks are erased.
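The mark step is essentially a graph traversal from the externally-held pointers. A minimal recursive sketch (ours), assuming the run-time system can test and set a mark bit and enumerate an object's pointer fields, which is exactly the support discussed below:

procedure mark (p : heap_pointer);
  var i : integer;
begin
  if p <> nil then
    if not marked (p) then
      begin
        set_mark (p);
        for i := 1 to pointer_count (p) do
          mark (pointer_field (p, i))    (* follow each contained pointer *)
      end
end;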
This has the advantage that no access paths can exist to recycled storage, but it requires
considerable support from the compiler and leads to periodic pauses in program execution. In
order to carry out the mark and collect steps, it must be possible for the run-time system to
find all pointers into the heap from outside, find all heap pointers held within a given object
on the heap, mark an object without destroying information, and find all heap objects on a
linear sweep through the heap. Only the questions of finding pointers affect the compiler;
there are three principal possibilities for doing this:
1. The locations of all pointers are known beforehand and coded into the marking algo-
rithm.
2. Pointers are discovered by a dynamic type check. (In other words, by examining a
storage location we can discover whether or not it contains a pointer.)
3. The compiler creates a template for each activation record and for the type of every
object that can appear on the heap. Pointer locations and (if necessary) the object
length can be determined from the template.
62 Properties of Real and Abstract Machines
Pointers in the stack can also be indicated by linking them together into a chain, but this
would certainly take too much storage on the heap.
Most LISP systems use a combination of (1) and (2). For (3) we must know the target type
of every pointer in order to be able to select the proper template for the object referenced.
This could be indicated in the object itself, but storage would be saved if the template carried
the number or address of the proper template as well as the location of the pointer. In this
manner we also solve the problem of distinguishing a pointer to a record from the pointer to
its first component. Thus the template for an ALGOL 68 structure could have the following
structure:
Length of the structure (in storage units)
For each storage unit, a Boolean value `reference'
For each reference, the address of the template of the referenced type.
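In Pascal-like terms such a template might be represented as in the following sketch; the parallel-array layout and the max_units bound are our assumptions:

type
  template_ptr = ^template;
  template = record
    length : integer;              (* of the structure, in storage units *)
    reference : array [1 .. max_units] of boolean;
                                   (* does this unit hold a pointer? *)
    referenced : array [1 .. max_units] of template_ptr
                                   (* template of the referenced type *)
  end;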
If dynamic arrays or variants are allowed in records then single Boolean values indicating
the presence of pointers are no longer adequate. In the first case, the size and number of
components are no longer known statically. The template must therefore indicate the location
of descriptors, so that they can be interpreted by the run-time system. In the second case the
position of the variant selector and the different interpretations based upon its value must be
known. If, as in Pascal, variant records without explicit tag fields are allowed, then garbage
collection is no longer possible.
Garbage collection also requires that all internal temporaries and registers that can contain
references must be identified. Because this is very difficult in general it is best to arrange the
generated code so that, whenever a garbage collection might occur, no references remain in
temporaries or registers.
The third recycling strategy requires us to attach a counter to every object in the heap.
This counter is incremented whenever a reference to the object is created, and decremented
whenever a reference is destroyed. When the counter is decremented to its initial value of 0,
storage for the object can be recycled because the object is obviously inaccessible. Mainte-
nance of the counters results in higher administrative and storage costs, but the overheads are
distributed. The program simply runs slower overall; it does not periodically cease normal
operation to reclaim storage. Unfortunately, the reference counter method does not solve all
problems:
Reference counts in a cyclic structure will not become 0 even after the structure as a
whole becomes inaccessible.
If a counter overflows, the number of references to the object is lost.
A complete solution requires that the reference counters be backed up by a garbage col-
lector.
To support storage management by reference counting, the compiler must be able to iden-
tify all assignments that create or destroy references to heap objects. The code generated for
such assignments must include appropriate updating of the reference counts. Difficulties arise
when variant records may contain references, and assignments to the tag field identifying the
variant are allowed: When such an assignment alters the variant, it destroys the reference
even though no direct manipulation of the reference has taken place. Similar hidden destruction
occurs when there is a jump out of a procedure that leads to deletion of a number of
activation records containing references to heap objects. Creation of references is generally
easier to keep track of, the most difficult situation probably being assignment of a composite
value containing references as minor components.
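For the simplest case, a pointer assignment, the generated code must bracket the store with count updates. A sketch (ours; the count field and the recycle routine are assumed):

procedure assign_reference (var dest : heap_pointer; source : heap_pointer);
begin
  if source <> nil then
    source^.count := source^.count + 1;     (* a reference is created *)
  if dest <> nil then
    begin
      dest^.count := dest^.count - 1;       (* a reference is destroyed *)
      if dest^.count = 0 then recycle (dest)
    end;
  dest := source
end;

Incrementing before decrementing makes the sequence safe when dest and source already refer to the same object.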
Section 1 of the mapping specification relies heavily on the manufacturer's manual for
the target machine. It describes the machine as it will be seen by the code generator, with
anomalies smoothed out and omitted operations (to be implemented by code sequences or
subroutines) in place. The actual details of realizing the abstraction might be included, or this
information might be the subject of a separate specification. We favor the latter approach,
because the abstraction should be almost entirely language-independent. It is clear that
the designer must decide which facilities to include in the abstract machine and which to
implement as part of the operation mapping. We cannot give precise criteria for making this
choice. (The problem is one of modular decomposition, with the abstraction constituting a
module and the operation encoding using the facilities of that module.)
The most difficult part of Section 2 of the mapping specification is Section 2.3, which
is tightly coupled to Section 3.1. Procedure mechanisms advocated by the manufacturer
are often ill-suited to the requirements of a given language. Several alternative mechanisms
should be explored, and detailed cost estimates prepared on the basis of some assumptions
about the relative numbers of calls at various static nesting depths and accesses to variables.
It is imperative that these assumptions be carefully stated, even though there is only tenuous
justification for them; unstated assumptions lead to conflicting judgements and usually to
a suboptimal design. Also, if measurements later indicate that the assumptions should be
changed, the dependence of the design upon them is clearly stated.
Control structure implementation can be described adequately using notation similar to
that of Figure 3.9. When a variety of information is exchanged among nodes of an expres-
sion, however, description of the encoding for each node is complicated. The best notation
available seems to be the extended-entry decision table, which we discuss in this context in
Section 10.3.2.
A mapping specification is arrived at by an iterative process, one that should be allotted
sufficient time in scheduling a compiler development project. The cost is dependent upon
the complexities of both the source language and the target machine. In one specific case,
involving a Pascal implementation for the Motorola 68000, two man-months of effort was
required over a six-month period. One person should be responsible for the specification, but
at least one other (and preferably several) should be involved in frequent critical reviews. The
objective of these reviews should be to test the reasoning based upon the stated assumptions,
making certain that it has no flaws. Challenging the assumptions is less important unless
specific evidence against them is available.
Sections 2.1 and 2.2 of the mapping specification should probably be written first. They
are usually straightforward, and give a basis on which to build. Sections 2.3 and 3.1 should be
next. As indicated earlier, these sections interact strongly and involve difficult decisions. The
remainder of Section 3 is tedious, but should be carried out in full detail. It is only by being
very explicit here that one learns the quirks and problems of the machine, and discovers the
flaws in earlier reasoning about storage mapping. Section 1 should be done last, not because
it is the least important, but because it is basically a modification of the machine manual in
the light of the needs generated by Section 3.
has been defined by a committee of IEEE [Stevenson, 1981]. McLaren [1970] provides a
comprehensive discussion of data structure packing and alignment. Randell and Russell
[1964] detail the implementation of activation record stacks and displays in the context of
ALGOL 60; Hill [1976] updates this treatment to handle the problems of ALGOL 68.
Static storage management is not the only possible strategy for FORTRAN implementa-
tions. Both the 1966 and 1978 FORTRAN standards restrict the extent of objects, and thus
permit dynamic storage management via a stack. We have not pursued the special storage al-
location problems of COMMON blocks and EQUIVALENCE statements here; the interested
reader is referred to Chapter 10 of the book by Aho and Ullman [1977] and the original
literature cited there.
Our statements about the probability of access to objects at various nesting depths are
debatable because no really good statistics exist. These probabilities are dependent upon the
hierarchical organization of the program, and may vary considerably between applications
and system programs.
The fact that a procedure used as a parameter must carry its environment with it ap-
pears in the original treatment of LISP [McCarthy, 1960]. Landin [1964] introduced the
term `closure' in connection with his mechanization of Lambda expressions. More detailed
discussions are given by Moses [1970] and Waite [1973a]. Hill [1976] applied the same
mechanism to the problem of dynamic scope checking in ALGOL 68.
An overall treatment of storage management is beyond the scope of this book. Knuth
[1968a] provides an analysis of the various general strategies, and a full discussion of most
algorithms known at the time. A general storage management package that permits a wide
range of adaptation was presented by Ross [1967]. The most important aspect of this package
is the interface conventions, which are suitable for most storage management modules.
Both general principles of and algorithms for garbage collection and compaction (the
process of moving blocks under the user's control to consolidate the free space into a single
block) are covered by Waite [1973a]. Wegbreit [1972] discusses a specic algorithm with
an improved worst-case running time.
Several authors [Deutsch and Bobrow, 1976; Barth, 1977; Morris, 1978] have shown
how to reduce the cost of reference count systems by taking special cases into account. Clark
and Green [1977] demonstrated empirically that over 90% of the objects in typical LISP
programs never have reference counts greater than 1, a situation in which the technique
operates quite efficiently.
Exercises
3.1 List the storage classes and access paths available on some machine with which you are
familiar. Did you have difficulty in classifying any of the machine's resources? Why?
3.2 Consider access to data occupying a part of a word on some machine with which you
are familiar. Does the best code depend upon the bit position within the word? Upon
the size of the accessed field? Try to characterize the set of `best' code sequences. What
information would you need to choose the proper sequence?
3.3 [Steele, 1977] Consider the best code for implementing multiplication and division of
an integer by a power of 2 on some machine with which you are familiar.
(a) Would multiplication by 2 best be implemented by an add, a multiply or a shift?
Give a detailed analysis, taking into account the location and possible values of
the multiplicand.
(b) If you chose to use a shift for division, would the proper result be obtained when
the dividend was negative? Explain.
(c) If your machine has a condition code that is set as a side effect of arithmetic
operations, would it be set correctly in all of the cases discussed above?
3.4 For some computer with which you are familiar, design encodings for the elementary
types boolean , integer , real of Pascal. Carefully defend your choice.
3.5 Consider the representation of a multi-dimensional array.
(a) In what manner can a user of ALGOL, FORTRAN or Pascal determine whether
the elements are stored in row- or column-major order?
(b) Write optimum code for some computer with which you are familiar that imple-
ments the following doubly-nested loop over an object of type array [1..m, 1..n]
of integer stored in row-major order. Do not alter the sequence of assignments
to array elements. Compare the result with the same code for an array stored in
column-major order.
for i := 1 to m do
  for j := 1 to n do
    a[i, j] := 0;
(c) Explain why a test that the effective address of an array element falls within
the storage allocated to the array is not sufficient to guarantee that the access is
defined.
3.6 Carefully describe the implementation of the access function for an array element (Sec-
tion 3.2.2) in each of the following cases:
(a) The fictitious starting address lies outside of the address space of the computer.
(b) The computer provides only base registers (i.e. the registers involved in the access
computation of Section 3.1.3 cannot hold signed values).
3.7 Consider a computer requiring certain data items to be stored with alignment 2, while
others have no alignment constraints. Give an algorithm that will rearrange any arbi-
trary record to occupy minimum storage. Can this algorithm be extended to a machine
whose alignment constraints require addresses divisible by 2, 4 and 8?
3.8 Give a mapping of a Pascal while statement that places the condition at the begin-
ning and has the same number of instructions as Figure 3.9d. Explain why there is
less opportunity for parallel execution in your mapping than in Figure 3.9d. Under
what circumstances would you expect your expansion to execute in less time than Fig-
ure 3.9d? What information would the compiler need in order to decide between these
schemata on the basis of execution time?
3.9 Consider the mapping of a BASIC FOR statement with the general form:
FOR I= e1 TO e2 STEP e3
...
NEXT I
Give implementations of forbegin and forend under each of the following conditions:
(a) e1 =1, e2 =10, e3 =1
(b) e1 =1, e2 =10, e3 =7
type
  tokens = (                          (* classification of LAX tokens *)
    identifier,                       (* A.1.0.2 *)
    integer denotation,               (* A.1.0.6 *)
    floating point denotation,        (* A.1.0.7 *)
    plus, ..., equivalent,            (* specials: A.1.0.10 *)
    and kw, ..., while kw);           (* keywords: A.1.0.11 *)

  abstract token = record
    location : coordinates;           (* for error reports *)
    case classification : tokens of
      identifier : (sym : symbol);
      integer denotation : (intv : integer value);
      floating point denotation : (fptv : real value);
    end;
A LAX identifier has no intrinsic meaning that can be determined from the character string
constituting that identifier. As a basic symbol, therefore, the only property distinguishing
one identifier from another is its external representation. This property is embodied in the
sym field of the token. Section 4.2.1 will consider the type symbol, and explain how the
external representation is encoded.
The field intv or fptv is a representation of the value denoted by the source language
denotation that the token abstracts. There are several possibilities, depending upon the goals
of the particular compiler; Section 4.2.2 considers them in detail.
abstract syntax. This means that any node corresponding to a rule dening any of these will
have the attributes of an expression attached to it. Figure 4.2b indicates which of the names
defined by rules used in Figure 4.2a are associated with the same abstract syntax construct.
(tree: an A.4.0.2 node whose subtrees are rooted in A.4.0.16a and A.4.0.9b nodes,
each with A.1.0.2 leaves)
a) Structure
expression , assignment , disjunction , conjunction ,
comparison , relation , sum , term , factor , primary :
primode , postmode : entity
name :
mode : entity
identifier :
sym : symbol
ent : entity
b) Attributes
Figure 4.2: Structure Tree for x := y + z
The sym attribute of an identifier is just the value of the sym field of the corresponding
token (Figure 4.1). This attribute is known as soon as the node to which it is attached is
created. We call such attributes intrinsic. All of the other attributes in the tree must be
computed. The details of the computations will be covered in Chapters 8 and 9; here we
merely sketch the process.
Ent characterizes the object (for example, a particular integer variable) corresponding to
the identifier sym. It is determined by the declarations valid at the point where the identifier
is used, and gives access to all of the declarative information. Section 4.2.3 discusses possible
representations for an entity .
The mode attribute of a name is the type of the object named. In our example it can
be obtained directly from the declarative information made accessible by the ent attribute
of the descendant node. In any case, it is computed on the basis of attributes appearing
in the `A.4.0.16a' node and its descendants. The term synthesized is used to describe such
attributes.
Two types are associated with each expression node in the tree. The rst, primode , is the
type determined without regard to the context in which the expression is embedded. This
is a synthesized attribute, and in our example the primode of an expression defined by an
`A.4.0.15b' node is simply the mode of the name below it. The second type, postmode, is the
type demanded by the context in which the expression is embedded. It is computed on the
basis of attributes of the expression node, its siblings, and its ancestors. Such attributes are
called inherited.
If primode ≠ postmode then either a semantic error has occurred or a coercion is neces-
sary. For example, if y and z in Figure 4.2 were declared to be of types boolean and real
respectively then there is an error, whereas if they were declared to be integer and real
then a coercion would be necessary.
Three classes of operation (creation, access and assignment) are necessary to manipulate
the structure tree. A creation operation establishes a new node of a specied type. Assignment
operations are used to interconnect nodes and to set attribute values, while access operations
are used to extract this information. With these operations we can build trees, traverse them
computing attribute values, and alter their structure. Structure tree operations are invoked
as the source program is parsed, constructing the tree and setting intrinsic attribute values.
One or more additional traversals of the completed tree may be necessary to establish all
attribute values. In some cases the structure of the tree may be altered during attribute
computation. Chapter 8 explains how the necessary traversals of the structure tree can be
derived from the dependence relations among the attributes. (Figure 4.3 shows some basic
traversal strategies.)
process node A;
if node A is not a leaf then
process all subtrees of A from left to right;
a) Prefix traversal
if node A is not a leaf then
process all subtrees of A from left to right;
process node A;
b) Postfix traversal
process node A;
while subtrees of A remain do
begin
process next (to the right) subtree of A;
process node A;
end;
c) Hybrid traversal
Figure 4.3: Traversal Strategies
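The three operation classes described above might be collected into a module interface along the following lines; this is a sketch, and all names are hypothetical:

function new_node (kind : node_kind) : node_ptr;                     (* creation *)
procedure set_son (father : node_ptr; i : integer; son : node_ptr);  (* assignment *)
procedure set_attribute (n : node_ptr; a : attribute_id; v : attribute_value);
function son (father : node_ptr; i : integer) : node_ptr;            (* access *)
function attribute (n : node_ptr; a : attribute_id) : attribute_value;

The traversals of Figure 4.3 are then expressed entirely in terms of son and the attribute operations.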
The result of processing a structure tree is a collection of related information. It may
be possible to produce this result without ever actually constructing the tree. In that case,
the structure and attributes of the tree were effectively embedded in the processing code.
Another possibility is to have an explicit data structure representing the tree. Implementation
constraints often prevent the compiler from retaining the entire data structure in primary
memory, and secondary storage must be used. If the secondary storage device is randomly-
addressable, only the implementation of the structure tree operations need be changed. If
it is sequential, however, constraints must be placed upon the sequences of invocations that
are permitted. An appropriate set of constraints can usually be derived rather easily from a
consideration of the structure tree traversals required to compute the attributes.
Any of the traversal strategies described by Figure 4.3 could be used with a sequential
storage device: In each case, the operation `process node A' implies that A is the currently-
accessible element of the device. It may be read, altered, and written to another device.
The remaining operations advance the device's `window', making another element accessible.
Figure 4.4 illustrates the correspondence between the tree and the sequential file. The letters
in the nodes of Figure 4.4a stand for the attribute information. In Figures 4.4b and 4.4c, the
letters show the position of this information on the file. Figure 4.4d differs from the others
in that each interior node is associated with several elements of the file. These elements
correspond to the prefix encounter of the node during the traversal (flagged with `('), some
number of infix encounters (flagged with `,'), and the postfix encounter (flagged with `)').
Information from the node could be duplicated in several of these elements, or divided among
them.
      a
     / \
    b   c
   / \   \
  d   e   f
         / \
        g   h
a) A tree
d e b g h f c a
b) Postfix linearization
a b d e c f g h
c) Prefix linearization
a  b  d  b  e  b  a  c  f  g  f  h  f  c  a
(  (     ,     )  ,  (  (     ,     )  )  )
d) Hybrid linearization
Figure 4.4: Linearization by Tree Traversal
The most appropriate linearization of the tree on the basis of tree traversals and tree
transformations is heavily dependent upon the semantic analysis, optimization and code gen-
eration tasks. We shall return to these questions in Chapter 14. Until then, however, we shall
assume that the structure tree may be expressed as a linked data structure.
stract nature of the computation graph: It uses target operations, but not target instructions,
separating operations from access paths. Moreover, the concept of a value has been separated
from that of a variable. As we shall see in Chapter 13, this is a crucial point for common
subexpression recognition.
(Figure residue: a computation graph whose tuples include SUB(i, j) feeding the
conditional jumps JZERO exit and JNEG, branches that recompute SUB(i, j) and
STORE the result into j or i, and two address computations ADR a, VAL 4, MUL,
PA ending in STI and LDI respectively.)

t1: i↑              t4: j↑
t2: t1 × 4          t5: t4 × 4
t3: a + t2          t6: a + t5
        t7: t3 := t6
Figure 4.7: Human-Readable Representation of Figure 4.1.3
computations. These characteristics are largely independent of both the source language and
the target computer.
The operations necessary to manipulate the target tree fall into the same classes as those
necessary to manipulate the structure tree. As with the structure tree, memory constraints
may require that the target tree be placed in secondary memory. The most reasonable lin-
earization to use in this case is one corresponding closely to the structure of a normal symbolic
assembly language.
Figure 4.8 gives a typical layout for a target tree node. Machine op would be a variant
record that could completely describe any target computer instruction. This record might
have fields specifying the operation, one or more registers, addresses and addressing modes.
Similarly, constant specification must be capable of describing any constant representable
on the target computer. For example, the specification of a literal constant would be similar
to that appearing in a token (Figure 4.1 and Section 4.2.2); an address constant would be
specified by a pointer to an expression node defining the address. In general, the amount of
space to be occupied by the constant must also be given.
type
  instructions = (                    (* classification of target abstractions *)
    operation,                        (* machine instruction *)
    constant,                         (* constant value *)
    label,                            (* address definition *)
    sequence,                         (* code sequence *)
    expression);                      (* address expression *)

  target node = ↑t node block;
  t node block = record
    link : target node;
    case classification : instructions of
      operation : (instr : machine op);
      constant : (value : constant specification);
      label : (addr : address);
      sequence : (seq, origin : target node);
      expression : (rator : expr op; rand 2 : target node);
    end;
operand. It is important to stress that this attribute is not set by the code generator; the
code generator is responsible only for establishing the label node and any linkages to it.
A target program may consist of an arbitrary number of code sequences, each of which
consists of instructions and/or data placed contiguously in the target computer memory. Each
sequence appears in the target tree as a list of operation, constant and label nodes rooted
in a sequence node. If the origin eld of the sequence node species an address expression
then the sequence begins at the address which is the value of that expression. Thus the
placement of a sequence can be specied relative to another sequence or absolutely in the
target computer memory. In the absence of an origin expression, a sequence will be placed
in an arbitrary position that guarantees no overlap between it and any other sequence not
based upon it. (A sequence s1 is based upon a sequence s2 when the origin expression of s1
depends upon a label node in s2 or in some sequence based upon s2 .) Related code sequences
whose origin expressions result in gaps between them serve to reserve uninitialized storage,
while overlapping sequences indicate run-time overlays.
Address expressions may contain integers and machine addresses, combined by the four
basic integer operations with the normal restrictions for subexpressions having machine ad-
dresses as operands. The code generator must guarantee that the result of an address expression
will actually fit into the field in which it is being used. For some machines, this
guarantee cannot be made in general. As a result, either restrictions must be placed upon the
expressions used by the code generator or the assembler must take over some aspects of the
code generation task. Examples of the latter are the final selection of an instruction from a
set whose members differ only in address field size (e.g. short vs. long jumps), and selection
of a base register from a set used to access a block of memory. Chapter 11 will consider such
problems in detail.
Although the symbol table is used primarily for identifiers, we advocate inclusion of keywords
as well. No separate recognition procedure is then required for them. With this
understanding, we shall continue to speak of the symbol table as though its only contents
were identifiers.
The symbol is used later as a key to access the identifier's attributes, so it is often encoded
as a pointer to a table containing those attributes. A pointer is satisfactory when only one such
table exists and remains in main storage. Positive integers provide a better encoding when
several tables must be combined (as for separate compilation in Ada) or moved to secondary
storage. In the simplest case the integers chosen would be 1, 2, ...
Identifiers may be character strings of any length. Since it may be awkward to store
a table of strings of various lengths, many compilers either fix the maximum length of an
identifier or check only a part of the identifier when computing the mapping. We regard
either of these strategies as unacceptable. Clearly the finite size of computer memory will
result in limitations, but these should be placed on the total number of characters rather
than the length of an individual identifier. Failure to check the entire identifier may result in
incorrect analysis of the source program with no indication to the programmer.
The solution is to implement the symbol table as two distinct components: a string table
and a lookup mechanism. The string table is simply a very large, packed array of characters,
capable of holding all of the distinct identifiers appearing in a program. It is implemented
using a conventional virtual storage scheme (Exercise 4.4), which provides for allocation of
storage only as it is needed. The string forms of the identifiers are stored contiguously in this
array, and are specified by initial index and length.
In view of the large number of entries in the symbol table (often resulting mainly from
standard identifiers), hash techniques are preferable to search trees for implementing the
lookup mechanism. The length of the hash table must be specified statically, before the
number of identifiers is known, so we choose the scheme known as `open hashing' or `hash
with chaining': A computation is performed on the string to select one of M lists, which is
then searched sequentially. If the computation distributes the strings uniformly over the lists,
then the length of each will be approximately (number of distinct identifiers)/M. By making
M large enough the lengths of the lists can be reduced to one or two items.
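A sketch of this organization (ours; the entry layout and the string table comparison string_eq are assumptions):

const
  M = 401;                        (* number of lists; see below *)
type
  entry_ptr = ^entry;
  entry = record
    start, length : integer;      (* position in the string table *)
    sym : integer;                (* symbol assigned to this identifier *)
    next : entry_ptr
  end;
var
  hash_list : array [0 .. M - 1] of entry_ptr;

function find (h, start, length : integer) : entry_ptr;
  (* search list h sequentially; nil if the string is not present *)
  var p : entry_ptr;
      found : boolean;
begin
  p := hash_list[h];
  found := false;
  while (p <> nil) and not found do
    if string_eq (p^.start, p^.length, start, length) then
      found := true
    else
      p := p^.next;
  find := p
end;

A failing search is followed by insertion at the head of list h, so each distinct identifier is entered exactly once.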
The first decision to be made is the choice of hash function. It should yield a relatively
smooth distribution of the strings across the M lists, evaluation should be rapid, and it must
be expressible in the implementation language. One computation that gives good results is
to express the string as an integer and take the residue modulo M . M should be a prime
number not close to a power of the number of characters in the character set. For example,
M = 127 would not be a good choice if we were dealing with a 128-character set; M = 401,
on the other hand, should prove quite satisfactory.
There are two problems with the division method: It is time-consuming for strings whose
integer representations exceed the single-length integer range of the implementation language,
and it cannot be expressed at all if the implementation language is strongly typed. To solve
the former, we generally select some substring for the hash computation. Heads or tails of
the string are poor choices because they tend to show regularities (SUM1, SUM2, SUM3
or REDBALL, BLUEBALL, BLACKBALL) that cause the computation to map too many
strings into the same list. A better selection is the center substring:
if |s| <= n then s else substr (s; (|s| - n) div 2; n);
(Here s is the string, |s| is the length of s and n is the length of the longest string representable
as a single-length integer. The function substr (s; f; l) yields the l-character substring of s
beginning at the f th character.)
The constraints of a strongly-typed implementation language could be avoided by provid-
ing a primitive transfer function to convert a sufficiently short string into an integer for type
checking purposes. It is important that this transfer function not involve computation. For
example, if the language provides a transfer function from characters to integers, a transfer
function from strings to integers could be synthesized by a loop. This approach defeats the
whole purpose of the hashing function, however, by introducing a time-consuming computa-
tion. It would probably be preferable to use a single character to select the list in this case
and accept a longer search!
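To fix ideas, the following sketch (ours) combines the center-substring selection with the division method for M = 401. The character-by-character conversion loop merely stands in for a primitive transfer function; applied to a whole identifier it would be exactly the kind of computation warned against above.

const
  M = 401;
  n = 4;            (* characters that fit a single-length integer *)
  max_id = 64;
type
  id_string = packed array [1 .. max_id] of char;

function hash (var s : id_string; length : integer) : integer;
  (* residue modulo M of the center substring of s *)
  var first, last, i, v : integer;
begin
  if length <= n then
    begin first := 1; last := length end
  else
    begin first := (length - n) div 2 + 1; last := first + n - 1 end;
  v := 0;
  for i := first to last do
    v := (v * 256 + ord (s[i])) mod M;   (* reduce as we go to avoid overflow *)
  hash := v
end;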
Comparison of the input identier with the symbols already present in the table can be
speeded up by a variety of quick checks, the simplest of which is comparison of string lengths.
Whether or not such checks are useful depends upon the precise costs of string comparison
and string table access.
In a multi-pass compiler, the lookup mechanism may be discarded after the lexical analysis
has converted identifiers to symbols. The string table must, however, be retained for later
tasks such as module linking.
follow the target machine collating sequence. The range of integer values, however, must
normally be larger than that of the target machine. Suppose that we compile a program
containing the type constructor of the previous paragraph for the PDP11 (maxint = 32767).
Suppose further that l = -5000, u = 5000 and m is real. This is a perfectly legal declaration
of an array that will easily fit into the 65536-byte memory of the PDP11, but computation
of its size in bytes (40004) overflows the PDP11's integer range.
If the compiler is being executed on the target machine, this requirement for increased
range implies that the computational and comparison operations of the constant table must
use a multiple-precision representation. Knuth [1969] describes in detail how to implement
such a package.
Although, as shown above, overflow of the target machine's arithmetic range is legitimate
in some cases, it is often forbidden. When the user writes an expression consisting only of
constants, and that expression overflows the range of the target machine, the overflow must
be detected if the expression is evaluated by the compiler. This leads to a requirement that
the constant table module provide an overflow indicator that is set appropriately by each
computational operator to indicate whether or not the computation would overflow on the
target machine. Regardless of the state of the overflow indicator, however, the constant table
should yield the (mathematically) correct result.
In most programming languages, a particular numeric value can be expressed in many
different ways. For example, each of the following LAX floating point numbers expresses the
value `one thousand':
1000000E-3 1.0E3 .001E6 1000.0
The source-to-internal conversion operators of the constant module should accept only
a standardized input format. Nonzero integers are normally represented by a sequence of
digits, the first of which is nonzero. A suitable representation for nonzero floating point
numbers is the pair (significand, exponent), in which the significand is a sequence of digits
without leading or trailing zeros and the exponent is suitably adjusted. The significand can be
interpreted either as an integer or a normalized decimal fraction. `One thousand' would then
be represented either as ('1',3) or as ('1',4) respectively. A fractional significand is preferable
because it can be truncated or rounded without changing the exponent. Zero is represented
by ('0',0). In Section 6.2 we shall show how the standardized format is obtained by the lexical
analyzer.
If no floating point arithmetic is provided by the constant table then the significand can
be stored in a string table. The internal representation is the triple (string table index,
significand length, adjusted exponent). When compile-time floating point operations are
available, floating point numbers are converted to an internal representation of appropriate
accuracy for which the arithmetic of the target machine can be simulated exactly. (Note that
decimal arithmetic is satisfactory only if the target machine also uses decimal arithmetic.)
choice of information to be included in a definition table.) Thus each form of the structure
tree has, at least conceptually, an associated definition table. Transformations of the structure
tree imply corresponding transformations of the definition table. Whether the definition table
is actually transformed, or a new definition table is built from the transformed tree, is an
implementation decision that depends upon two factors:
The relative costs of transformation and reconstruction.
The relationship between the traversal needed to reconstruct the information and the
traversal using that information.
When assessing the relative costs, we must be certain to consider the extra storage required
during the transformation as well as the code involved.
The second factor mentioned above may require some elaboration: Consider the definition
table used during semantic analysis and that used during code generation. Although the
structure tree may be almost the same for these two processes, the interesting attributes of
defined objects are usually quite different. During semantic analysis we are concerned with
source properties; during code generation with target properties. Thus the definition tables
for the two processes will differ. Suppose further that our code generation strategy requires
a single depth-first, left-to-right traversal of the structure tree given that the definition table
is available.
If the definition table can be rebuilt during a single depth-first, left-to-right traversal of the
structure tree, and every attribute becomes available before it is needed for code generation,
then rebuilding can be combined with code generation and the second factor noted above
does not lead to increased costs. When this condition is not satisfied, the second factor does
increase the rebuilding cost and this must be taken into account. It may then be cheaper to
transform the definition table between the last semantic analysis traversal and the first code
generation traversal. (The attribute dependency analysis presented in Section 8.2 is used to
decide whether the condition is satisfied.)
A definition table is generally an unstructured collection of entries. Any arbitrary entry
can be accessed via a pointer in order to read an attribute or assign a new value. In a one-pass
compiler, a stack strategy could also be used: At every definition a new entry is pushed onto
the top of the stack, and at the end of a range all definitions found in the range are popped.
This organization has the advantage that only relevant entries must be held in storage.
Copies of some of the more-frequently accessed attributes of an entity may be included in
each leaf representing a use of that entity. The choice of such attributes depends upon the
particular compiler design; we shall return to this question several times, in Chapters 9, 10
and 14. It may be that these considerations lead to including all attributes in the leaf. The
definition table then ceases to exist as a separate data structure.
Ullman, 1977]. The concept of separate tables seems to be restricted to descriptions of multi-
pass compilers, as a mechanism for reducing main storage requirements [Naur, 1964]. This
is not invariably true, however, especially when one considers the literature on ALGOL 68
[Peck, 1971]. In his description of a multi-pass Pascal compiler, Hartmann [1977] uses sep-
arate tables both to reduce core requirements and to provide better compiler structure.
Lookup mechanisms have concerned a large number of authors; the most comprehensive
treatment is that of Knuth [1973], who gives details of a variety of mechanisms,
including hashing, and shows how they compare for different applications. It appears that
hashing is the method of choice for symbol table implementation, but there may be some
circumstances in which binary trees are superior [Palmer et al., 1974]. For symbol tables
with a fixed number of known entries (e.g. keywords) Cichelli [1980] and Cercone et al.
[1982] describe a way of obtaining a hash function that does not have any collisions and hence
requires no collision resolution.
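The idea can be illustrated by brute force, without Cichelli's letter-weighting method: search for a table size at which an ordinary hash function happens to be collision-free on the fixed keyword set. A Python sketch (the keyword list is illustrative only, and Python's built-in hash stands in for the machine-independent function a real generator would compute):

def perfect_table(keywords):
    """Find a table size m for which hash(w) % m is collision-free
    on the given fixed set of distinct keywords."""
    m = len(keywords)
    while True:
        slots = {hash(w) % m for w in keywords}
        if len(slots) == len(keywords):   # injective: no collisions
            table = [None] * m
            for w in keywords:
                table[hash(w) % m] = w
            return m, table
        m += 1

m, table = perfect_table(["begin", "end", "if", "then", "else", "while"])

def is_keyword(s):
    # a single probe suffices; no collision chain to follow
    return table[hash(s) % m] == s

Since the table is rebuilt at startup, the per-process randomization of Python's string hash is harmless here; a production keyword table would instead fix the hash function once, as Cichelli does.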
Exercises
4.1 [Sale, 1971; McIlroy, 1974] Specify abstract tokens for FORTRAN 66.
4.2 Specify a target node (Figure 4.1.3) suitable for some machine with which you are
familiar.
4.3 Is a symbol table needed to map identifiers in a compiler for Minimal Standard BASIC?
Explain.
4.4 Implement a string table module, using a software paging scheme: Statically allocate an
array of pointers (a `page table') to blocks of fixed size (`pages'). Initially no additional
blocks are allocated. When a string must be stored, try to fit it into a currently-
allocated page. If this cannot be done, dynamically allocate a new page and place a
pointer to it in the page table. Carefully define the interface to your module.
4.5 Implement a symbol table module that provides a lookup mechanism, and uses the
module of Exercise 4.4 to store the identifier string.
4.6 Identifier strings are specified in the module of Exercise 4.5 by the pair (string table
index, length). On a computer like the DEC PDP11, this specification occupies 8 bytes.
Comment on the relative merits of this scheme versus one in which identifier strings
are stored directly if they are no longer than k bytes, and a string table is used for
those whose length exceeds k. What should the value of k be for the PDP11? Would
this scheme be appropriate for a multipass compiler?
4.7 Consider the FORTRAN expression `X * 3.1415926535897932385 * Y'. Assume that
no explicit type has been given for X, and that Y has been declared DOUBLE PRE-
CISION.
(a) Should the constant be interpreted as a single or double precision value? Explain.
(b) For some machine with which you are familiar, estimate the relative errors in the
single and double precision representations of the constant.
(c) Explain the relevance of this example to the problem of selecting the internal
representation to be provided by the constant table for floating point numbers.
Chapter 5
Elements of Formal Systems
Formal grammars, in particular context-free grammars, are the tools most frequently used
to describe the structure of programs. They permit a lucid representation of that structure
in the form of parse trees, and one can (for the most part mechanically) specify automata
that will accept all correctly-structured programs (and only these). The automata are easy
to modify so that they output any convenient encoding of the parse tree.
We limit our discussion to the definitions and theorems necessary to understand and use
techniques explained in Chapters 6 and 7, and many theorems are cited without proof. In the
cases where we do sketch proofs, we restrict ourselves to the constructive portions upon which
practical algorithms are based. (We reference such constructions by giving the number of the
associated theorem.) A formally complete treatment would exceed both the objectives of and
size constraints on this book. Readers who wish to delve more deeply into the theoretical
aspects of the subject should consult the notes and references at the end of this chapter.
5.1 Definition
Let φ = αω; α, ω ∈ V*. The string α is called a head, and the string ω a tail, of φ. If α ≠ φ
(ω ≠ φ) then it is a proper head (tail) of φ.
Each subset of V* is called a language over vocabulary V. The elements of a language are
called sentences. Interesting languages generally contain infinitely many sentences, and hence
cannot be defined by enumeration. We therefore define each such language, L, by specifying
a process that generates all of its sentences, and no other elements of V*. This process may
be characterized by a binary, transitive relation ⇒⁺ over V*, such that L = {ω | σ ⇒⁺ ω}
for a distinguished string σ ∈ V*. We term the relation ⇒⁺ a derivative relation.
5.2 Definition
A pair (V, ⇒⁺) consisting of a vocabulary V and a derivative relation ⇒⁺ is called a formal
system.
A derivative relation usually cannot be defined by enumeration either. We shall concern
ourselves only with relations that can be described by a finite set of pairs (μ, ν) of strings
from V*. We call such pairs productions, and write them as μ → ν. The transitive closure of
the finite relation described by these productions yields a derivative relation. More precisely:
5.3 Definition
A pair (V, P), consisting of a vocabulary V and a finite set, P, of productions μ → ν (μ, ν ∈
V*) is called a general rewriting (or Semi-Thue) system.
5.4 Definition
A string χ is directly derivable from a string φ (symbolically φ ⇒ χ) by a general rewriting
system (V, P) if there exist strings σ, τ, μ, ν ∈ V* such that φ = σμτ, χ = σντ and μ → ν
is an element of P.
5.5 Definition
A string χ is derivable from a string φ (symbolically φ ⇒⁺ χ) by a general rewriting system
(V, P) if there exist strings χ0, …, χn ∈ V* (n ≥ 1) such that φ = χ0, χn = χ and χi-1 ⇒ χi,
i = 1, …, n. The sequence χ0, …, χn is called a derivation of length n.
We write φ ⇒* χ to indicate that either φ = χ or φ ⇒⁺ χ. If χ is (directly) derivable from
φ, we also say that χ is (directly) reducible to φ. Without loss of generality, we shall assume
that derivations φ ⇒⁺ φ of a string from itself are impossible.
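Definition 5.4 translates directly into code. The following Python sketch (strings are represented with one character per vocabulary symbol, a choice made for this illustration) enumerates all strings directly derivable from a given string:

def directly_derivable(phi, productions):
    """All chi with phi => chi: replace one occurrence of a
    left-hand side mu by the corresponding right-hand side nu."""
    results = set()
    for mu, nu in productions:
        start = 0
        while True:
            i = phi.find(mu, start)      # an occurrence phi = sigma mu tau
            if i < 0:
                break
            results.add(phi[:i] + nu + phi[i + len(mu):])
            start = i + 1
    return results

# The rewriting system of Figure 5.1, one character per symbol:
P = [("E", "T"), ("E", "E+T"), ("T", "F"), ("T", "T*F"),
     ("F", "i"), ("F", "(E)")]
print(directly_derivable("E", P))        # {'T', 'E+T'}

Iterating this function computes ⇒⁺ step by step; the derivations of Figure 5.2 are exactly paths through the resulting relation.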
5.1.2 Grammars
Using the general rewriting system defined by Figure 5.1, it is possible to derive from E
every correct algebraic expression consisting of the operators + and *, the variable i, and
the parentheses ( ). Many other strings can be derived also, as shown in Figure 5.2. In the
remainder of this chapter we shall concentrate on rewriting systems in which the vocabulary is
made up of two disjoint subsets: T, a set of terminals, and N, a set of nonterminals (syntactic
variables). We will ultimately be interested only in those strings derivable from a distinguished
nonterminal (the axiom or start symbol) and consisting entirely of terminals. (Thus we speak
of generative systems. One could instead consider analytic systems in which the axiom is
derived from a string of terminals. We shall return to this concept with Definitions 5.12
and 5.20.)
{E, T, F, +, *, (, ), i}
a) The vocabulary V
{E → T, E → E + T,
T → F, T → T * F,
F → i, F → (E)}
b) The productions P
Figure 5.1: A General Rewriting System (V, P)
E ⇒ T
T ⇒ T * F
T * F ⇒ T * i
a) Some immediate derivations
E ⇒⁺ T * i (length 3)
E ⇒⁺ i + i * i (length 8)
T i E ⇒⁺ i i i (length 5)
T i E ⇒* T i E (length 0)
E ⇒⁺ T (length 1)
b) Additional derivations
Figure 5.2: Derivations
5.6 Definition
A quadruple G = (T, N, P, Z) is called a grammar for the language L(G) = {ω ∈ T* | Z ⇒⁺ ω}
if T and N are disjoint, (T ∪ N, P) is a general rewriting system, and Z is an element of N.
We say that two grammars G and G′ are equivalent if L(G) = L(G′).
Figure 5.3 illustrates these concepts with two grammars that generate algebraic expressions
in the variable i. These grammars are equivalent according to Definition 5.6.
Grammars may be classified by the complexity of their productions:
5.7 Definition (Chomsky Hierarchy)
The grammar G = (T, N, P, Z) is a
type 0 grammar if each production has the form φ → ψ, φ ∈ V⁺ and ψ ∈ V*.
type 1 (context-sensitive) grammar if each production has the form αAβ → αχβ, α, β ∈ V*,
A ∈ N and χ ∈ V⁺.
T = {+, *, (, ), i}
N = {E, T, F}
P = {E → T, E → E + T,
T → F, T → T * F,
F → i, F → (E)}
Z = E
a) A grammar incorporating (V, P) from Figure 5.1
T = {+, *, (, ), i}
N = {E, E′, T, T′, F}
P = {E → T, E → TE′,
E′ → +T, E′ → +TE′,
T → F, T → FT′,
T′ → *F, T′ → *FT′,
F → i, F → (E)}
Z = E
b) A grammar incorporating another general rewriting system
Figure 5.3: Equivalent Grammars
empty string. Such languages can always be described by ε-free grammars (grammars without
ε-productions). Therefore ε-productions will only be used when they result in more convenient
descriptions.
We assume further that every symbol in the vocabulary will appear in the derivation of
at least one sentence. Thus the grammar will not contain any useless symbols. (This is
not always true for actual descriptions of programming languages, as illustrated by the LAX
definition of Appendix A.)
5.1.3 Derivations and Parse Trees
Each production in a regular grammar can have at most one nonterminal on the right-hand
side. This property guarantees (in contrast to the context-free grammars) that each sen-
tence of the language has exactly one derivation when the grammar is unambiguous (Defini-
tion 5.11).
Figure 5.4a is a regular grammar that generates the non-negative integers and real numbers
if n represents an arbitrary sequence of digits. Three derivations according to this grammar
are shown in Figure 5.4b. Each string except the last in a derivation contains exactly one
nonterminal, from which a new string must be derived in the next step. The last string consists
only of terminals. The sequence of steps in each derivation of this example is determined by
the derived sentence.
The situation is different for context-free grammars, which may have any number of non-
terminals on the right-hand side of each production. Figure 5.5 shows that several derivations,
differing only in the sequence of application of the productions, are possible for a given sen-
tence. (These derivations are constructed according to the grammar of Figure 5.3a.)
In the left-hand column, a leftmost derivation was used: At each step a new string was
derived from the leftmost nonterminal. Similarly, a rightmost derivation was used in the
right-hand column. A nonterminal was chosen arbitrarily at each step to produce the center
derivation.
A grammar ascribes structure to a string not by giving a particular sequence of derivation
steps but by showing that a particular substring is derived from a particular nonterminal.
T = {n, ., +, -, E}
N = {C, F, I, X, S, U}
P = {C → n, C → nF, C → .I,
F → .I, F → ES,
I → n, I → nX,
X → ES,
S → n, S → +U, S → -U,
U → n}
Z = C
a) A grammar for real constants
C          C          C
n          .I         nF
           .n         n.I
                      n.nX
                      n.nES
                      n.nE+U
                      n.nE+n
b) Three derivations according to the grammar of (a)
Figure 5.4: Derivations According to a Regular Grammar
For example, in Figure 5.5 the substring i * i is derived from the single nonterminal T. We
interpret this property of the derivation to mean that i * i forms a single semantic unit: an
instance of the operator * applied to the i's as operands. It is important to realize that the
grammar was constructed in a particular way specifically to ascribe a semantically relevant
structure to each sentence in the language. We cannot be satisfied with any grammar that
defines a particular language; we must choose one reflecting the semantic structure of each
sentence. For example, suppose that the rules E → E + T and T → T * F of Figure 5.3a
had been replaced by E → E * T and T → T + F respectively. The modified grammar would
describe the same language, but would ascribe a different structure to its sentences: It would
imply that additions should take precedence over multiplications.
E           E           E
E + T       E + T       E + T
T + T       E + T * F   E + T * F
F + T       T + T * F   E + T * i
i + T       T + F * F   E + F * i
i + T * F   T + F * i   E + i * i
i + F * F   F + F * i   T + i * i
i + i * F   i + F * i   F + i * i
i + i * i   i + i * i   i + i * i
Figure 5.5: Derivations According to a Context-Free Grammar
Substrings derived from single nonterminals are called phrases:
5.8 Definition
Consider a grammar G = (T, N, P, Z). The string φ ∈ V⁺ is a phrase (for X) of μφν if and
only if Z ⇒* μXν ⇒⁺ μφν (μ, ν ∈ V*, X ∈ N). It is a simple phrase of μφν if and only if
Z ⇒* μXν ⇒ μφν.
Notice that a phrase need not consist solely of terminals.
Each of the three derivations of Figure 5.5 identifies the same set of simple phrases. They
are therefore equivalent in the sense that they ascribe identical phrase structure to the string
i + i * i. In order to have a single representation for the entire set of equivalent derivations,
one that makes the structure of the sentence obvious, we introduce the notion of a parse tree
(see Appendix B for the definition of an ordered tree):
5.9 Definition
Consider an ordered tree (K, D) with root k0 and label function f: K → M. Let k1, …, kn
(n > 0) be the immediate successors of k0. (K, D) is a parse tree according to the grammar
(T, N, P, Z) if the following conditions hold:
(a) M ⊆ V ∪ {ε}
(b) f(k0) = Z
(c) Z → f(k1) … f(kn) ∈ P
[Parse tree diagram for i + i * i omitted.]
T = {+, *, i}
N = {E}
P = {E → E + E, E → E * E, E → i}
Z = E
a) An ambiguous grammar
[Two distinct parse trees for i + i * i omitted.]
Derive the empty set of productions for any subtree with h(k0) = ε, and
derive {h(k0) → h(k1)} for any subtree not yet mentioned.
The grammar derived from Figure 5.8c by this process will have more productions than
Figure 5.8a. The extra productions can be removed by a simple substitution: If B ∈ N
occurs exactly twice in a grammar, once in a production of the form A → σBτ and once
in a production of the form B → χ (σ, τ, χ ∈ V*), then B can be eliminated and the two
productions replaced by A → σχτ. After all such substitutions have been made, the resulting
grammar will differ from Figure 5.8a only in the representation of vocabulary symbols.
T = {n, ., +, -, E}
Q = {C, F, I, X, S, U, q}
R = {Cn → q, Cn → F, C. → I,
F. → I, FE → S,
In → q, In → X,
XE → S,
Sn → q, S+ → U, S- → U,
Un → q}
q0 = C
F = {q}
Figure 5.9: An Automaton Corresponding to Figure 5.4a
5.15 Theorem
For every regular grammar, G, there exists a deterministic finite automaton, A, such that
L(A) = L(G).
Following construction 5.13, we can derive an automaton from a regular grammar G =
(T, N, P, Z) such that, during acceptance of a sentence in L(G), the state at each point
specifies the element of N used to derive the remainder of the string. Suppose that the pro-
ductions X → tU and X → tV belong to P. When t is the next input symbol, the remainder
of the string could have been derived either from U or from V. If A is to be deterministic,
however, R must contain exactly one production of the form Xt → q′. Thus the state q′
must specify a set of nonterminals, any one of which could have been used to derive the
remainder of the string. This interpretation of the states leads to the following inductive
algorithm for determining Q, R and F of a deterministic automaton A = (T, Q, R, q0, F). (In
this algorithm, q represents a subset Nq of N ∪ {f}, f ∉ N):
1. Initially let Q = {q0} and R = ∅, with Nq0 = {Z}.
2. Let q be an element of Q that has not yet been considered. Perform steps (3)-(5) for each
t ∈ T.
3. Let next(q, t) = {U | ∃X ∈ Nq such that X → tU ∈ P}.
4. If there is an X ∈ Nq such that X → t ∈ P then add f to next(q, t) if it is not already
present; if there is an X ∈ Nq such that X → ε ∈ P then add f to Nq if it is not already
present.
5. If next(q, t) ≠ ∅ then let q′ be the state representing Nq′ = next(q, t). Add q′ to Q and
qt → q′ to R if they are not already present.
     n    .    +    -    E    Nq
q0   q1   q2                  {C}
q1        q2             q3   {f, F}
q2   q4                       {I}
q3   q5        q6   q6        {S}
q4                       q3   {f, X}
q5                            {f}
q6   q5                       {U}
a) The state table
T = {n, ., +, -, E}
Q = {q0, q1, q2, q3, q4, q5, q6}
R = {q0n → q1, q0. → q2,
q1. → q2, q1E → q3,
q2n → q4,
q3n → q5, q3+ → q6, q3- → q6,
q4E → q3,
q6n → q5}
F = {q1, q4, q5}
b) The complete automaton
Figure 5.10: A Deterministic Automaton Corresponding to Figure 5.4a
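The inductive algorithm above can be rendered as follows; a Python sketch under the assumptions that productions are given as triples (X, t, U), with U = None for productions of the form X → t, that ε-productions are absent, and that the marker "f" is not itself a nonterminal name:

def regular_grammar_to_dfa(T, productions, Z):
    """Subset construction for a regular grammar.
    Each state is the set Nq, represented as a frozenset."""
    start = frozenset({Z})
    states, transitions = {start}, {}
    work = [start]
    while work:
        Nq = work.pop()
        for t in T:
            # step 3: nonterminals reachable by reading t
            nxt = {U for (X, u, U) in productions
                   if X in Nq and u == t and U is not None}
            # step 4: a production X -> t contributes the marker f
            if any(X in Nq and u == t and U is None
                   for (X, u, U) in productions):
                nxt.add("f")
            if nxt:                      # step 5: record the transition
                nq2 = frozenset(nxt)
                transitions[(Nq, t)] = nq2
                if nq2 not in states:
                    states.add(nq2)
                    work.append(nq2)
    final = {q for q in states if "f" in q}
    return states, transitions, start, final

Applied to the grammar of Figure 5.4a this yields exactly the seven states of Figure 5.10, with the final states being those whose set contains f.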
of which avoid the need for irrelevant structuring, are available for regular languages. The
first is the representation of a finite automaton by a directed graph:
5.17 Definition
Let A = (T, Q, R, q0, F) be a finite automaton, D = {(q, q′) | ∃t, qt → q′ ∈ R}, and f:
(q, q′) → {t | qt → q′ ∈ R} be a mapping from D into the powerset of T. The directed graph
(Q, D) with edge labels f((q, q′)) is called the state diagram of the automaton A.
Figure 5.11a is the state diagram of the automaton described in Figure 5.10b. The nodes
corresponding to elements of F have been represented as squares, while the remaining nodes
are represented as circles. Only the state numbers appear in the nodes: 0 stands for q0 , 1 for
q1 , and so forth.
In a state diagram, the sequence of edge labels along a path beginning at q0 and ending at
a state in F is a sentence of L(A). Figure 5.11a has exactly 12 such paths. The corresponding
sentences are given in Figure 5.11b.
A state diagram specifies a regular language. Another characterization is the regular
expression:
5.18 Definition
Given a vocabulary V, and the symbols E, ε, +, *, ( and ) not in V. A string σ over
V ∪ {E, ε, +, *, (, )} is a regular expression over V if
1. σ is a single symbol of V or one of the symbols E or ε, or if
2. σ has the form (X + Y), (XY) or (X)* where X and Y are regular expressions.
[State diagram omitted: the states and transitions of Figure 5.10b, with final states drawn as squares.]
a) State diagram
n       .n      n.n
nEn     nE+n    nE-n
.nEn    .nE+n   .nE-n
n.nEn   n.nE+n  n.nE-n
b) Paths
Figure 5.11: Another Description of Figure 5.10b
Every regular expression results from a finite number of applications of rules (1) and (2). It
describes a language over V: The symbol E describes the empty language, ε describes the
language consisting only of the empty string, v ∈ V describes the language {v}, (X + Y) =
{ω | ω ∈ X or ω ∈ Y}, (XY) = {χω | χ ∈ X, ω ∈ Y}. The closure operator * is defined by
the following infinite sum:
X* = ε + X + XX + XXX + …
As illustrated in this definition, we shall usually omit parentheses. Star is unary, and takes
priority over either binary operator; plus has a lower priority than concatenation. Thus
W + XY* is equivalent to the fully-parenthesized expression (W + (X(Y)*)).
Figure 5.12 summarizes the algebraic properties of regular expressions. The distinct rep-
resentations for X* show that several regular expressions can be given for one language.
X + Y = Y + X                 (commutative)
(X + Y) + Z = X + (Y + Z)     (associative)
(XY)Z = X(YZ)
X(Y + Z) = XY + XZ            (distributive)
(X + Y)Z = XZ + YZ
X + E = E + X = X             (identity)
Xε = εX = X
XE = EX = E                   (zero)
X + X = X                     (idempotent)
(X*)* = X*
X* = ε + XX*
X* = ε + X*X
ε* = ε
E* = ε
Figure 5.12: Algebraic Properties of Regular Expressions
The main advantage in using a regular expression to describe a set of strings is that it
gives a precise specification, closely related to the `natural language' description, which can
be written in text form suitable for input to a computer. For example, let l denote any single
letter and d any single digit. The expression l(l + d)* is then a direct representation of the
natural language description `a letter followed by any sequence of letters and digits'.
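In the notation of a present-day pattern library the same description might be written as follows (a Python sketch; the ASCII character classes chosen for l and d are assumptions):

import re

# l(l + d)* : a letter, then any sequence of letters and digits
identifier = re.compile(r"[A-Za-z][A-Za-z0-9]*")

print(identifier.fullmatch("x1"))    # matches
print(identifier.fullmatch("1x"))    # None: must begin with a letter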
number 1, n is the total number of productions and the i-th production has the form Xi → χi,
χi = xi,1 … xi,m. The length, m, of the right-hand side is also called the length of the
production. We shall denote a leftmost derivation X ⇒* Y by X ⇒*L Y and a rightmost
derivation by X ⇒*R Y.
We find the following notation convenient for describing the properties of strings: The
k-head k : ω of ω gives the first min(k, |ω| + 1) symbols of ω#. FIRSTk(ω) is the set of
all terminal k-heads of strings derivable from ω. The set EFFk(ω) (`ε-free first') contains all
strings from FIRSTk(ω) for which no ε-production A → ε was applied at the last step in
the rightmost derivation. The set FOLLOWk(ω) comprises all terminal k-heads that could
follow ω. By definition FOLLOWk(Z) = {#} for any k. Formally:
FIRSTk(ω) = {k : τ | ω ⇒* τ, τ ∈ T*}
EFFk(ω) = {k : τ | ω ⇒*R τ, τ ∈ T*, with no ε-production applied at the last step}
FOLLOWk(ω) = {τ | Z ⇒* μων and τ ∈ FIRSTk(ν)}
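For k = 1 these sets can be computed by a straightforward fixed-point iteration. A Python sketch, applied to an ε-production variant of the grammar of Figure 5.3b (the representation of productions as (left side, body) pairs and of ε as the empty string are choices made for this illustration):

def first1(grammar, nonterminals):
    """FIRST1 sets by fixed-point iteration.
    grammar: list of (X, body) with body a list of symbols;
    a symbol not in nonterminals is a terminal; [] is an epsilon body."""
    first = {X: set() for X in nonterminals}
    changed = True
    while changed:
        changed = False
        for X, body in grammar:
            derives_empty = True
            for s in body:
                add = first[s] - {""} if s in nonterminals else {s}
                if not add <= first[X]:
                    first[X] |= add
                    changed = True
                if s in nonterminals and "" in first[s]:
                    continue          # s can vanish: look at the next symbol
                derives_empty = False
                break
            if derives_empty and "" not in first[X]:
                first[X].add("")      # X derives the empty string
                changed = True
    return first

# Variant of Figure 5.3b with epsilon-productions for E' and T':
G = [("E", ["T", "E'"]), ("E'", []), ("E'", ["+", "T", "E'"]),
     ("T", ["F", "T'"]), ("T'", []), ("T'", ["*", "F", "T'"]),
     ("F", ["i"]), ("F", ["(", "E", ")"])]
print(first1(G, {"E", "E'", "T", "T'", "F"}))
# e.g. FIRST1(E) = {'i', '('}, FIRST1(E') = {'+', ''}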
T = {+, *, (, ), i}
Q = {q}
R = {Eq → Tq, Eq → T+Eq,
Tq → Fq, Tq → F*Tq,
Fq → iq, Fq → )E(q,
+q+ → q, *q* → q, (q( → q, )q) → q, iqi → q}
q0 = q
F = {q}
S = {+, *, (, ), i, E, T, F}
s0 = E
Figure 5.14: A Pushdown Automaton Constructed from Figure 5.3a
Stack    Input    Leftmost derivation
E q      i+i*i    E
T+E q    i+i*i    E+T
T+T q    i+i*i    T+T
T+F q    i+i*i    F+T
T+i q    i+i*i    i+T
T+ q     +i*i
T q      i*i
F*T q    i*i      i+T*F
F*F q    i*i      i+F*F
F*i q    i*i      i+i*F
F* q     *i
F q      i
i q      i        i+i*i
q
Figure 5.15: Top-Down Analysis
This automaton accepts a string in L(G) by constructing a leftmost derivation of that string
and comparing the symbols generated (from left to right) with the symbols actually appearing
in the string.
Figure 5.14 is a pushdown automaton constructed in this manner from the grammar of
Figure 5.3a. In the left-hand column of Figure 5.15 we show the derivation by which this
automaton accepts the string i + i * i. The right-hand column is the leftmost derivation of
this string, copied from Figure 5.5. Note that the automaton's derivation has more steps due
to the rules that compare a terminal symbol on the stack with the head of the input string
and delete both. Figure 5.16 shows a reduced set of productions combining some of these
steps with those that precede them.
The analysis performed by this automaton is called a top-down (or predictive) analysis
because it traces the derivation from the axiom (top) to the sentence (bottom), predicting
the symbols that should be present. For each configuration of the automaton, the stack
specifies a string from V* used to derive the remainder of the input string. This corresponds
to construction 5.13 for finite automata, with the stack content playing the role of the state
and the state merely serving to mark the point reached in the input scan.
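In a recursive descent parser the stack of the top-down automaton becomes the procedure-call stack. A minimal Python sketch for the equivalent grammar of Figure 5.3b, using one symbol of lookahead and # as terminator (the class and method organization is invented for this sketch):

class Parser:
    """Recursive descent recognizer for the grammar of Figure 5.3b."""

    def __init__(self, text):
        self.tokens = list(text) + ["#"]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, t):
        if self.peek() != t:
            raise SyntaxError(f"expected {t!r} at position {self.pos}")
        self.pos += 1

    def E(self):                     # E ::= T ('+' T)*
        self.T()
        while self.peek() == "+":
            self.expect("+")
            self.T()

    def T(self):                     # T ::= F ('*' F)*
        self.F()
        while self.peek() == "*":
            self.expect("*")
            self.F()

    def F(self):                     # F ::= i | ( E )
        if self.peek() == "i":
            self.expect("i")
        else:
            self.expect("(")
            self.E()
            self.expect(")")

p = Parser("i+i*i")                  # the sentence analyzed in Figure 5.15
p.E()
p.expect("#")                        # the whole input must be consumed

The while-loops correspond to the EBNF form of the productions; writing E′ and T′ as explicit procedures would mirror Figure 5.3b literally.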
We now specify the construction of deterministic, top-down pushdown automata by means
of the LL(k) grammars introduced by Lewis and Stearns [1969]:
R′ = {Eq → Tq, Eq → T+Eq,
Tq → Fq, Tq → F*Tq,
Fqi → q, Fq( → )Eq,
+q+ → q, *q* → q, )q) → q}
Figure 5.16: Reduced Productions for Figure 5.14
5.22 Definition
A context-free grammar G = (T, N, P, Z) is LL(k) for given k ≥ 0 if, for arbitrary derivations
Z ⇒*L μAχ ⇒ μνχ ⇒* μτ      (μ, τ ∈ T*; ν, χ ∈ V*; A ∈ N)
Z ⇒*L μAχ ⇒ μωχ ⇒* μτ′     (τ′ ∈ T*; ω ∈ V*)
(k : τ = k : τ′) implies ν = ω.
5.23 Theorem
For every LL(k) grammar, G, there exists a deterministic pushdown automaton, A, such that
L(A) = L(G).
A reads each sentence of the language L(G) from left to right, tracing a leftmost derivation
and examining no more than k input symbols at each step. (Hence the term `LL(k)'.)
In our discussion of Theorem 5.13, we noted that each state of the finite automaton
corresponding to a given grammar specified the nonterminal of the grammar that must have
been used to derive the string being analyzed. Thus the state of the automaton characterized a
step in the grammar's derivation of a sentence. We can provide an analogous characterization
of a step in a context-free derivation by giving information about the production being applied
and the possible right context: Each state of a pushdown automaton could specify a triple
(p, j, Ω), where 0 ≤ j ≤ np gives the number of symbols from the right-hand side of production
Xp → xp,1 … xp,np already analyzed and Ω is the set of k-heads of strings that could follow
the string derived from Xp. This triple is called a situation, and is written in the following
descriptive form:
[Xp → μ·ν; Ω]      μ = xp,1 … xp,j, ν = xp,j+1 … xp,np
The dot (which is assumed to be outside of the vocabulary) marks the position of the analysis
within the right-hand side. (In most cases Ω contains a single string. We shall then write it
without set brackets.)
Given a grammar (T, N, P, Z), we specify the states Q and transitions R of the automaton
inductively as follows:
1. Initially let Q = {q0} and R = ∅, with q0 = [Z → ·S; #]. (Note that FOLLOWk(Z) =
{#}.) The initial state is q0, which is also the initial stack content of A. (We could
have chosen an arbitrary state as the initial stack content.) The automaton halts if this
state is reached again, the stack is empty, and the next input symbol is the terminator
#.
2. Let q = [X → μ·ν; Ω] be an element of Q that has not yet been considered.
3. If ν = ε then add q → ε to R if it is not already present. (The notation q → ε is
shorthand for the set of spontaneous unstacking transitions q′q → q′ with arbitrary
q′.)
4. If ν = tχ for some t ∈ T and χ ∈ V*, let q′ = [X → μt·χ; Ω]. Add q′ to Q and qt → q′
to R if they are not already present.
is executed, the reading of terminal symbols and the decision to terminate the production
with an unstacking transition proceeds without further lookahead.
There exist grammars that do not have the LL(k) property for any k. Among the possible
reasons is the occurrence of left recursive nonterminals: nonterminals A for which a derivation
A ⇒⁺ Aω, ω ≠ ε, is possible. In a predictive automaton, left recursive nonterminals lead to
cycles that can be broken only by examining a right context of arbitrary length. They can,
however, be eliminated through a transformation of the grammar.
5.24 Theorem
An LL(k) grammar can have no left recursive nonterminal symbols.
5.25 Theorem
For every context-free grammar G = (T, N, P, Z) with left recursive nonterminals, there exists
an equivalent grammar G′ = (T, N′, P′, Z) with no left recursive nonterminals.
Let the elements of N be numbered consecutively: N = {X1, …, Xn}. If we choose the
indices such that the condition i < j holds for all productions Xi → Xjω then G has no left
recursive nonterminals. If such a numbering is not possible for G, we can guarantee it for G′
through the following construction:
1. Let N′ = N, P′ = P. Perform steps (2) and (3) for i = 1, …, n.
2. For j = 1, …, i-1 replace all productions Xi → Xjω ∈ P′ by {Xi → χjω | Xj → χj ∈ P′}.
(After this step, Xi ⇒⁺ Xjχ implies i ≤ j.)
3. Replace the entire set of productions of the form Xi → Xiω ∈ P′ (if any exist) by the
productions {Bi → ωBi | Xi → Xiω ∈ P′} ∪ {Bi → ε}, adding a new symbol Bi to N′.
At the same time, replace the entire set of productions Xi → χ, χ ≠ Xiω, by Xi → χBi.
The symbols added during this step will be given numbers n+1, n+2, …
If the string ω in the production Xi → Xiω does not begin with Xj, j ≤ i, then we can
replace Xi → Xiω by {Bi → ω, Bi → ωBi} and Xi → χ by {Xi → χ, Xi → χBi} in step (3).
This approach avoids the introduction of ε-productions; it was used to obtain the grammar
of Figure 5.3b from that of Figure 5.3a.
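For the common special case of immediate left recursion, step (3) in its ε-free variant can be sketched as follows (Python; the naming convention X′ for the new symbol Bi is a choice made here):

def remove_immediate_left_recursion(X, productions):
    """Replace the productions for X (bodies given as symbol lists) by an
    equivalent set without immediate left recursion, using the epsilon-free
    variant: X -> chi | chi B,  B -> omega | omega B."""
    B = X + "'"
    recursive = [p[1:] for p in productions if p and p[0] == X]   # X -> X omega
    others = [p for p in productions if not p or p[0] != X]       # X -> chi
    if not recursive:
        return {X: productions}
    return {
        X: others + [chi + [B] for chi in others],
        B: recursive + [omega + [B] for omega in recursive],
    }

# E -> T | E + T  becomes  E -> T | T E',  E' -> + T | + T E'
print(remove_immediate_left_recursion("E", [["T"], ["E", "+", "T"]]))

Applied to each production set of Figure 5.3a in turn, this reproduces the grammar of Figure 5.3b.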
Note that left recursion such as E → T, E → E + T is used in the syntax of arithmetic
expressions to reflect the left-association of the operators. This semantic property can also be
seen in the transformed productions E → TE′, E′ → +TE′, E′ → ε, but not in E → T, E →
T + E. In EBNF the left associativity of an expression can be conveniently represented by
E ::= T ('+' T)*.
One of the constructions discussed above results in ε-productions, while the other does
not. We can always eliminate ε-productions from an LL(k) grammar, but by doing this we
may increase the value of k:
5.26 Theorem
Given an LL(k) grammar G with ε-productions. There exists an LL(k+1) grammar without
ε-productions that generates the language L(G) - {ε}.
Conversely, k can be reduced by introducing ε-productions:
5.27 Theorem
For every ε-free LL(k+1) grammar G, k > 0, there exists an equivalent LL(k) grammar with
ε-productions.
The proof of Theorem 5.27 rests upon a grammar transformation known as left-factoring,
illustrated in Figure 5.18. In Figure 5.18a, we cannot distinguish the productions X → Yc
and X → Yd by examining any fixed number of symbols from the input text: No matter
what number of symbols we choose, it is possible for Y to derive a string of that length in
either production.
P = {Z → X,
X → Yc, X → Yd,
Y → a, Y → bY}
a) A grammar that is not LL(k) for any k
P = {Z → X,
X → YX′,
X′ → c, X′ → d,
Y → a, Y → bY}
b) An equivalent LL(1) grammar
Figure 5.18: Left Factoring
We avoid the problem by deferring the decision. Since both productions begin with Y,
it is really not necessary to distinguish them until after the string derived from Y has been
scanned. The productions can be combined by `factoring out' the common portion, as shown
in Figure 5.18b. Now the decision is made at exactly the position where the productions
begin to differ, and consequently it is only necessary to examine a single symbol of the input
string.
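One step of left factoring can be sketched as follows (Python; repeated application factors out common prefixes longer than one symbol, and the fresh-name generator is an assumption of this sketch):

from collections import defaultdict

def left_factor(X, bodies, fresh):
    """One step of left factoring: group the alternatives for X by their
    first symbol and split off a new nonterminal for each proper group.
    bodies are lists of symbols; fresh() yields unused nonterminal names."""
    groups = defaultdict(list)
    for body in bodies:
        groups[body[0] if body else None].append(body)
    result = {X: []}
    for head, group in groups.items():
        if head is None or len(group) == 1:
            result[X].extend(group)            # nothing to factor here
        else:
            Xn = fresh()                       # X -> head Xn
            result[X].append([head, Xn])
            result[Xn] = [body[1:] for body in group]
    return result

names = iter(["X'", "X''"])
# X -> Y c | Y d  becomes  X -> Y X',  X' -> c | d   (Figure 5.18)
print(left_factor("X", [["Y", "c"], ["Y", "d"]], lambda: next(names)))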
In general, by deferring a decision we obtain more information about the input text we
are analyzing. The top-down analysis technique requires us to decide which production to
apply before analyzing the string derived from that production. In the next section we shall
present the opposite technique, which does not require a decision until after analyzing the
string derived from a production. Intuitively, this technique should handle a larger class of
grammars because more information is available on which to base a decision; this intuition can
be proven correct. The price is an increase in the complexity of both the analysis procedure
and the resulting automaton, but in practice the technique remains competitive.
5.3.3 Bottom-Up Analysis and LR(k) Grammars
Again let G = (T, N, P, Z) be a context-free grammar, and consider the pushdown automaton
A = (T, {q}, R, q, {q}, V, ε) with V = T ∪ N, and R defined as follows:
R = {x1 … xnq → Xq | X → x1 … xn ∈ P} ∪ {qt → tq | t ∈ T} ∪ {Zq → q}
This automaton accepts a string in L(G) by working backward through a rightmost derivation
of the string.
Figure 5.19 is a pushdown automaton constructed in this manner from the grammar of
Figure 5.3a. In the left-hand column of Figure 5.20, we show the derivation by which this
automaton accepts the string i + i * i. The right-hand column is the reverse of the rightmost
derivation of this string, copied from Figure 5.5. The number of steps required for the
automaton's derivation can be decreased by combining productions as shown in Figure 5.21.
(This reduction is analogous to that of Figure 5.16.)
The analysis performed by this automaton is called a bottom-up analysis because of the
fact that it traces the derivation from the sentence (bottom) to the axiom (top). In each
T = {+, *, (, ), i}
R = {Tq → Eq, E+Tq → Eq,
Fq → Tq, T*Fq → Tq,
iq → Fq, (E)q → Fq,
q+ → +q, q* → *q, q( → (q, q) → )q, qi → iq,
Eq → q}
S = {+, *, (, ), i, E, T, F}
Figure 5.19: A Pushdown Automaton Constructed from Figure 5.3a
Stack      Input    Reverse rightmost derivation
q          i+i*i    i+i*i
i q        +i*i
F q        +i*i     F+i*i
T q        +i*i     T+i*i
E q        +i*i     E+i*i
E+ q       i*i
E+i q      *i
E+F q      *i       E+F*i
E+T q      *i       E+T*i
E+T* q     i
E+T*i q
E+T*F q             E+T*F
E+T q               E+T
E q                 E
q
Figure 5.20: Bottom-Up Analysis
configuration of the automaton the stack contains a string from S*, from which the portion
of the input text already read can be derived. The state merely serves to mark the point
reached in the input scan. The meaningful information is therefore the pair (σ, τ), where
σ ∈ S* denotes the stack contents and τ ∈ T* denotes the remainder of the input text.
The pairs (σ, τ) that describe the configurations of an automaton tracing such a derivation
may be partitioned into equivalence classes as follows:
5.28 Definition
For p = 1, …, n let Xp → χp be the pth production of a context-free grammar G =
(T, N, P, Z). The reduction classes, Rj, j = 0, …, n, are defined by:
R0 = {(σ, τ) | σ = μν, τ = χω such that Z ⇒*R μAω, A ⇒R′ νχ, χ ≠ ε}
Rp = {(σ, τ) | σ = μχp, τ = ω such that Z ⇒*R μXpω ⇒ μχpω}, p = 1, …, n
`A ⇒R′ νχ' denotes the relation `A ⇒*R νχ and the last step in the derivation does not take
the form νB ⇒ νχ'.
The reduction classes contain all pairs of strings that could appear during the bottom-up
parse of a sentence in L(G) by the automaton described above. Further, the reduction class
to which a pair belongs characterizes the transition carried out by the automaton when that
pair appears as a configuration. There are three possibilities:
1. (σ, τ) ∈ R0. The simple phrase is not yet completely in the stack; the transition qt → tq
with t = 1 : τ is applied (shift transition).
2. (σ, τ) ∈ Rp, 1 ≤ p ≤ n. The simple phrase is complete in the stack and the reduce
transition χpq → Xpq is applied. (For p = 1 the transition Zq → q occurs and the
automaton halts.)
3. (σ, τ) ∉ Rj, 0 ≤ j ≤ n. No further transitions are possible; the input text does not belong
to L(G).
A pushdown automaton that bases its decisions upon the reduction classes is obviously
deterministic if and only if the grammar is unambiguous.
Unfortunately the definition of the sets Rj uses the entire remainder of the input string
in order to determine the reduction class to which a pair (σ, τ) belongs. That means that our
bottom-up automaton must inspect an arbitrarily long lookahead string to make a decision
about the next transition, if it is to be deterministic. If we restrict the number of lookahead
symbols to k, we arrive at the following definition:
5.29 Definition
For some k ≥ 0, the sets Rj,k, j = 0, …, n, are called k-stack classes of a grammar G if:
Rj,k = {(σ, κ) | ∃(σ, τ) ∈ Rj such that κ = k : τ}
If the k-stack classes are pairwise-disjoint, then the pushdown automaton is deterministic
even when the lookahead is restricted to k symbols. This property characterizes a class of
grammars introduced by Knuth [1965]:
5.30 Definition
A context-free grammar G = (T, N, P, Z) is LR(k) for given k ≥ 0 if, for arbitrary derivations
Z ⇒*R μAω ⇒ μνω        (μ, ν ∈ V*; ω ∈ T*; A → ν ∈ P)
Z ⇒*R μ′Bω′ ⇒ μ′ν′ω′   (μ′, ν′ ∈ V*; ω′ ∈ T*; B → ν′ ∈ P)
(|μν| + k) : μνω = (|μν| + k) : μ′ν′ω′ implies μ = μ′, A = B and ν = ν′.
The automaton given at the beginning of this section scans the input text from left to right,
tracing the reverse of a rightmost derivation and examining no more than k input symbols
at each step. (Hence the term `LR(k)'.)
5.31 Theorem
A context-free grammar is LR(k) if and only if its k-stack classes are pairwise-disjoint.
On the basis of this theorem, we can test the LR(k) property by determining the intersection
of the k-stack classes. Unfortunately the k-stack classes can contain infinitely many pairs
(σ, κ): The length restriction permits only a finite number of strings κ, but the lengths of the
stack contents are unrestricted. However, we can give a regular grammar Gj for each k-stack
class Rj,k such that L(Gj) = {σ&κ | (σ, κ) ∈ Rj,k}. Since algorithms exist for determining
whether two regular languages are disjoint, this construction leads to a procedure for testing
the LR(k) property.
5.32 Theorem
Let G = (T, N, P, Z) be a context-free grammar, and let k ≥ 0. Assume that & is not an
element of the vocabulary V = T ∪ N. There exists a set of regular grammars Gj, j = 0, …, n
such that L(Gj) = {σ&κ | (σ, κ) ∈ Rj,k}.
The regular grammars that generate the k-stack classes are based upon the situations intro-
duced in connection with Theorem 5.23:
W = {[X → μ·ν; ω] | X → μν ∈ P, ω ∈ FOLLOWk(X)}
These situations are the nonterminal symbols of the regular grammars. To define the gram-
mars themselves, we first specify a set of grammars that generate the k-stack classes, but are
not regular:
G′j = (V ∪ {&, #}, W, P′ ∪ P′′ ∪ Pj, [Z → ·S; #])
The productions in P′ ∪ P′′ build the σ components of the k-stack class. They provide the finite
description of the infinite strings. Productions in Pj attach the κ component, terminating the
k-stack class:
P′ = {[X → μ·ην; ω] → η[X → μη·ν; ω] | η ∈ V}
5.33 Theorem
For every LR(k) grammar G there exists a deterministic pushdown automaton A such that
L(A) = L(G).
Let G = (T, N, P, Z). We base the construction of the automaton on the grammars Gj, effec-
tively building a machine that simultaneously generates the k-stack classes and checks them
against the reverse of a rightmost derivation of the string. Depending upon the particular
k-stack class, the automaton pushes the input symbol onto the stack or reduces some number
of stacked symbols to a nonterminal. The construction algorithm generates the necessary
situations as it goes, and uses the closure operation discussed above `on the fly' to avoid
considering productions from P′′. As in the construction associated with Theorem 5.15, a
state of the automaton must specify a set of situations, any one of which might have been
used in deriving the current k-stack class. It is convenient to restate the definition of a closure
directly in terms of a set of situations M:
H(M) = M ∪ {[B → ·χ; σ] | ∃[X → μ·Bν; ω] ∈ H(M), B → χ ∈ P, σ ∈ FIRSTk(νω)}
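For k = 1 the closure can be computed by the obvious worklist algorithm. A Python sketch, assuming FIRST1 sets for the nonterminals are already available (for instance as sketched after the FIRSTk definitions); the representation of a situation as a (left side, body, dot position, lookahead) tuple is a choice made here:

def first_of_string(symbols, first1, nonterminals):
    """FIRST1 of a symbol string, given FIRST1 sets for the nonterminals;
    the empty string is represented as ''."""
    result = set()
    for s in symbols:
        f = first1[s] if s in nonterminals else {s}
        result |= f - {""}
        if "" not in f:
            return result
    result.add("")
    return result

def closure(items, grammar, nonterminals, first1):
    """H(M) for k = 1: for each item [X -> mu . B nu; omega] add
    [B -> . chi; sigma] for every production B -> chi and every
    sigma in FIRST1(nu omega)."""
    H = set(items)
    work = list(items)
    while work:
        lhs, body, dot, look = work.pop()
        if dot < len(body) and body[dot] in nonterminals:
            B = body[dot]
            tail = list(body[dot + 1:]) + [look]   # nu omega
            for sigma in first_of_string(tail, first1, nonterminals):
                for X, chi in grammar:
                    if X == B:
                        item = (B, tuple(chi), 0, sigma)
                        if item not in H:
                            H.add(item)
                            work.append(item)
    return H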
The elements of Q and R are determined inductively as follows:
1. Initially let Q = {q0} and R = ∅, with q0 = H({[Z → ·S; #]}).
2. Let q be an element of Q that has not yet been considered. Perform steps (3)-(5) for each
η ∈ V.
3. Let basis(q, η) = {[X → μη·ν; ω] | [X → μ·ην; ω] ∈ q}.
4. If basis(q, η) ≠ ∅, then let next(q, η) = H(basis(q, η)). Add q′ = next(q, η) to Q if it is
not already present.
5. If basis(q, η) ≠ ∅ and η ∈ T then set
R := R ∪ {qη → qq′}                                            if k ≤ 1
R := R ∪ {qητ → qq′τ | [X → μη·ν; ω] ∈ q′, τ ∈ FIRSTk-1(νω)}   otherwise
6. If all elements of Q have been considered, perform step (7) for each q ∈ Q and then stop.
Otherwise return to step (2).
7. For each [X → χ·; ω] ∈ q, where χ = x1 … xn, set R := R ∪ {q1 … qnqω → q1q′ω |
[X → ·χ; ω] ∈ q1, qi+1 = next(qi, xi) (i = 1, …, n-1), q = next(qn, xn), q′ =
next(q1, X)}
The construction terminates in all cases, since only a finite number of situations [X →
μ·ν; ω] exist.
Figure 5.22 illustrates the algorithm by applying it to the grammar of Figure 5.17a with
k = 2. In this example k = 1 would yield the same set of states. (For k = 0, q4 and q6 would
be coalesced, as would q7 and q9.) Nevertheless, a single lookahead symbol is not sufficient to
distinguish between the shift and reduce transitions in state 6. The grammar is thus LR(2),
but not LR(1).
We shall conclude this section by quoting the following theoretical results:
5.34 Theorem
For every LR(k) grammar with k > 1 there exists an equivalent LR(1) grammar.
5.35 Theorem
Every LL(k) grammar is also an LR(k) grammar.
q0: [Z → ·X; #]        q4: [Y → c·; #]
    [X → ·Y; #]            [Y → c·a; #]
    [X → ·bYa; #]      q5: [X → bY·a; #]
    [Y → ·c; #]        q6: [Y → c·; a#]
    [Y → ·ca; #]           [Y → c·a; a#]
q1: [Z → X·; #]        q7: [Y → ca·; #]
q2: [X → Y·; #]        q8: [X → bYa·; #]
q3: [X → b·Ya; #]      q9: [Y → ca·; a#]
    [Y → ·c; a#]
    [Y → ·ca; a#]
a) States
R = {q0bc → q0q3c,
q0c# → q0q4#,
q0ca → q0q4a,
q3ca → q3q6a,
q4a# → q4q7#,
q5a# → q5q8#,
q6aa → q6q9a,
q0q2# → q0q1#,
q0q4# → q0q2#,
q3q6a# → q3q5a#,
q0q4q7# → q0q2#,
q0q3q5q8# → q0q1#,
q3q6q9a# → q3q5a#}
b) Transitions
Figure 5.22: A Deterministic Bottom-Up Automaton for Figure 5.17a
5.36 Theorem
There exist LR(k) grammars that are not LL(k′) for any k′.
5.37 Theorem
There exists an algorithm that, when given an LR(k) grammar G, will decide in a finite
number of steps whether there exists a k′ such that G is LL(k′).
As a result of Theorem 5.34 we see that it might possibly be sufficient to concern ourselves
only with LR(1) grammars. (As a matter of fact, the transformation underlying the proof
of this theorem is unsuitable for practical purposes.) The remaining theorems support our
intuitive thoughts at the end of Section 5.3.2.
BNF notation was first used to describe ALGOL 60 [Naur, 1963]. Many authors have
proposed extensions similar to our EBNF, using quoted terminals rather than bracketed
nonterminals and having a regular expression capability. EBNF definitions are usually shorter
than their BNF equivalents, but the important point is that they are textual representations
of syntax charts [Jensen and Wirth, 1974; ANSI, 1978a]. This means that the context-free
grammar can actually be developed and described to the user by means of pictures.
Pushdown automata were first examined by Samelson and Bauer [1960] and applied
to the compilation of a forerunner of ALGOL 60. Theoretical mastery of the concepts and
the proofs of equivalence to general context-free grammars followed later. Our introduction
of LR(k) grammars via reduction classes follows the work of Langmaack [1971]. Aho and
Ullman [1972] (and many other books dealing with formal languages) cover essentially the
same material as this chapter, but in much greater detail. The proofs that are either outlined
here or omitted entirely can be found in those texts.
Exercises
5.1 Prove that there is no loss of generality by prohibiting formal systems in which a
derivation φ ⇒⁺ φ of a string from itself is possible.
5.2 Choose some useless nonterminal from the LAX definition and briefly justify its inclu-
sion in Appendix A.
5.3 Give an intuitive justification of Theorem 5.10.
5.4 Write a program to examine a finite automaton A and return the accepted language
L(A) in closed form as a regular expression.
5.5 Regular expressions X1, …, Xn can also be defined implicitly via systems of regular
equations of the form:
5.7 Prove that the algorithm for rewriting G to remove productions of the form A → B,
A, B ∈ N results in a grammar G′ such that L(G) = L(G′).
Chapter 6
Lexical Analysis
Lexical analysis converts the source program from a character string to a sequence of
semantically-relevant symbols. The symbols and their encoding form the intermediate lan-
guage output from the lexical analyzer.
In principle, lexical analysis is a subtask of parsing that could be carried out by the normal
parser mechanisms. To separate these functions, the source language grammar G must be
partitioned into subgrammars G0, G1, G2, … such that G1, G2, … describe the structure
of the basic symbols and G0 describes the structure of the language in terms of the basic
symbols. L(G) is then obtained by replacing the terminal symbols of G0 by strings from
L(G1), L(G2), …
The separation of lexical analysis from parsing gives rise to higher organizational costs
that can be justified only by realizing greater savings in other areas. Such savings are possible
in table-driven parsers through reduction in table size. Further, basic symbols usually have
such a simple structure that faster procedures can be used for the lexical analysis than for
the general parsing.
We shall first discuss the partitioning of the grammar and the desired results of lexical
analysis, and then consider implementation with the help of finite automata.
In many languages the grammar for basic symbols (symbol grammar) is not so easily de-
termined from the language definition, or it results in additional difficulties. For example,
the ALGOL 60 Report defines keywords, letters, digits, special characters and special char-
acter combinations as basic symbols; it does not include identifiers, numbers and strings in
this category. This description must be transformed to meet the requirements of compiler
construction. In PL/1, as in other languages in which keywords are lexically indistinguish-
able from identifiers, context determines whether an identifier (e.g. IF) is to be treated as a
keyword or a freely-chosen identifier. Two symbol grammars must therefore be distinguished
on the basis of context; one accepts identifiers and not keywords, the other does the converse.
An example of similar context-dependence in FORTRAN is the first identifier of a statement:
In an assignment it is interpreted as the identifier of a data object, while in most other cases
it is interpreted as a keyword. (Statement classification in FORTRAN is not an easy task;
see the discussion by Sale [1971] for details.)
Even if it is necessary to consult context in order to determine which symbols are possible
at the given point in the input text, a finite automaton often suffices. The automaton in this
case has several starting states corresponding to the distinct symbol grammars. We shall not
pursue this point further.
of the same basic symbol are resolved at this point. For example, if we were to allow the
symbol `<' to be written `LESS' or `LT' also, all three would lead to creation of the same
token. The operation identify symbol is used during token creation to perform the mapping
discussed in Section 4.2.1. If the basic symbol is a literal constant, rather than an identifier,
the enter constant operation is used instead of identify symbol (Section 4.2.2).
6.2 Construction
We assume that the basic symbols are described by some set of regular grammars or regular
expressions as discussed in Section 6.1.1. According to Theorem 5.15 or Theorem 5.19 we
can construct a set of finite automata that accept the basic symbols. Unfortunately, these
automata assume the end of the string to be known a priori; the task of the lexical analyzer is
to extract the next basic symbol from the input text, determining the end of the symbol in the
process. Thus the automaton only partially solves the lexical analysis problem. To enhance
the efficiency of the lexical analyzer we should use the automaton with the fewest states from
the set of automata that accept the given language. Finally, we consider implementation
questions.
In order to obtain the classification for the basic symbol (Figure 4.1) we partition the
final states of the automaton into classes. Each class either provides the classification directly
or indicates that it must be found by using the operation identify symbol. The textual
representation of constants, and the strings used to interrogate the symbol table, are obtained
from the input stream. The automaton is extended for this purpose to a finite-state transducer
that emits a character on each state transition. (In the terminology of switching theory, this
transducer is a special case of a Mealy machine.) The output characters are collected together
into a character string, which is then used to derive the necessary information.
d.E1. Since .EQ. is also a basic symbol, the automaton must look ahead three characters (in
certain cases) before it can determine the end of the symbol string.
By applying the tests of Section 5.3.3 to the original grammar G, we could determine (for
fixed k) whether a k-character lookahead is sufficient to resolve ambiguity. Because of the
effort involved, this is usually not done. Instead, we apply the principle of the longest match:
The automaton continues to read until it reaches a state with no transition corresponding to
the current input character. If that state is a final state, then it accepts the symbol scanned
to that point; otherwise it signals a lexical error. The feasibility of the principle of the longest
match is determined by the representation of the symbols (the grammars G1, G2, …) and by
the sequences of symbols permitted (the grammar G0).
The principle of the longest match in its basic form as stated above is unsuitable for a large
number of grammars. For example, an attempt to extract the next token from `3.EQ.4' using
the rules of FORTRAN would result in a lexical error when `Q' was encountered. The solution
is to retain information about the most-recently encountered final state, thus providing a `fall-
back' position. If the automaton halts in a final state, then it accepts the symbol; otherwise
it restores the input stream pointer to that at the most-recently encountered final state. A
lexical error is signaled only if no final state had been encountered during the scan.
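A sketch of this longest-match scan with fall-back (Python; the transition table is assumed to be a mapping from (state, character) pairs to states, and final maps each final state to its token class):

def next_token(table, final, state0, text, pos):
    """Run the automaton as far as possible, remembering the last final
    state passed; on failure, back up to that fall-back position."""
    state, last = state0, None
    i = pos
    while i < len(text) and (state, text[i]) in table:
        state = table[(state, text[i])]
        i += 1
        if state in final:
            last = (final[state], i)     # remember the fall-back position
    if last is None:
        raise SyntaxError(f"lexical error at position {pos}")
    kind, end = last
    return kind, text[pos:end], end      # class, symbol text, new position

Scanning `3.EQ.4' with a FORTRAN-like table, the automaton would read past `3.E' hoping for an exponent, fail at `Q', and fall back to the final state reached after `3', delivering the integer constant.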
We have tacitly assumed that the initial state of the automaton is independent of the
final state reached by the previous invocation of next token. If this assumption is relaxed,
permitting the state to be retained from the last invocation, then it is sometimes possible to
avoid even the limited backtracking discussed above (Exercise 6.3). Whether this technique
solves all problems is still an open question.
The choice of a representation for the keywords of a language plays a central role in de-
termining the representations of other basic symbols. This choice is largely a question of
language design: The definitions of COBOL, FORTRAN and PL/1 (for example) prescribe
the representations and their relationship to freely-chosen identifiers. In the case of AL-
GOL 60 and its descendants, however, these characteristics are not discussed in the language
definitions. Here we shall briefly review the possibilities and their consequences.
The simplest possibility is the representation of keywords by reserved words: ordinary
identifiers that the programmer is not permitted to use for any other purpose. This approach
requires that identifiers be written without gaps, so that spaces and newlines can serve as
separators between identifiers and between an identifier and a number. Letters may appear
within numbers, and hence they must not be separated from the preceding part of the number
by spaces. The main advantage of this representation is its lucidity and low susceptibility to
typographical errors. Its main disadvantage is that the programmer often does not remember
all of the reserved words and hence incorrectly uses one as a freely-chosen identifier. Further,
it is virtually impossible to modify the language by adding a new keyword because too many
existing programs might have used this keyword as a freely-chosen identifier.
If keywords are distinguished lexically then it is possible to relax the restrictions on place-
ment of spaces and newlines. There is no need for the programmer to remember all of the
keywords, and new ones may be introduced without affecting existing programs. The rules
for distinguishing keywords are known as stropping conventions; the most common ones are:
Underlining the keyword.
Bracketing the keyword by special delimiters (such as the apostrophes used in the DIN
66006 standard for ALGOL 60).
Prefixing the keyword with a special character and terminating it at the first space,
newline or character other than a letter or digit.
Using upper case letters for keywords and lower case for identifiers (or vice-versa).
All of these conventions increase the susceptibility of the input text to typographical errors.
Some also require larger character sets than others or relatively complex line-imaging routines.
6.2.2 State Minimization
Consider a completely-specified finite automaton A = (T, Q, R, q0, F) in which a production
qt → q′ exists for every pair (q, t), q ∈ Q, t ∈ T. Such an automaton is termed reduced when
there exists no equivalent automaton with fewer states.
6.1 Theorem
For every completely-specified finite automaton A = (T, Q, R, q0, F) there exists a
reduced finite automaton A′ = (T, Q′, R′, q0′, F′) with L(A′) = L(A).
To construct A′ we first delete all states q for which there exists no string ω such that
q0ω ⇒* q. (These states are termed unreachable.) We then apply the refinement algorithm
of Section B.3.2 to the state diagram of A, with the initial partition {q | q ∈ F}, {q | q ∉ F}.
Let Q′ be the set of all blocks in the resulting partition, and let [q] denote the block to which
q ∈ Q belongs. The definition of A′ can now be completed as follows:
[State diagram omitted: an l-edge leads from state 0 to final state 1, which loops on l and d.]
Figure 6.2: Reduced Automaton Accepting l(l + d)*
In order to apply the algorithm of Section B.3.2 to this example we must complete the
original automaton, which permits only l as an input character in state q0. To do this we
introduce an `error state', qe, and transitions qt → qe for all pairs (q, t), q ∈ Q, t ∈ T, not
corresponding to transitions of the given automaton. (In the example, q0d → qe suffices.) In
practice, however, it is easier to modify the algorithm so that it does not require explicit error
transitions.
If c denotes any character other than the quote, then the regular expression '' + '(c +
'')(c + '')*' describes the characters and strings of Pascal. Figure 6.3a shows the automaton
constructed from this expression according to the procedure of Theorem 5.19, and the reduced
automaton is shown in Figure 6.3b.
In our application we must modify the equivalence relation still further, and only treat
final states as equivalent when they lead to identical subsequent processing. For an automaton
recognizing the symbol grammar of LAX, we divide the final states into the following classes:
Identifiers or keywords
Special characters
Combinations of special characters
[State diagrams omitted.]
a) Unreduced
b) Reduced
Figure 6.3: Finite Automata Accepting '' + '(c + '')(c + '')*'
Integers
Floating point numbers
Floating point numbers with exponents
This results in the reduced automaton of Figure 6.4. Letters denote the following character
classes:
a = all characters other than '*'
a′ = all characters other than '*' or ')'
d = digits
l = letters
s = '+' '-' '*' '<' '>' '"' ';' ',' ')' '[' ']'
Figure 6.4 illustrates several methods of obtaining the code corresponding to a basic
symbol. States 1, 6, 7, 9, and 12-18 all provide the code directly. Identify symbol must
be used in state 4 to distinguish identifiers from keywords. In state 19 we might also use
identify symbol, or we might use some other direct computation based on the character
codes.
The state reduction in these examples could be performed by hand with no display of
theory, but the theory is required if we wish to mechanically implement a lexical analyzer
based upon regular expressions.
6.2.3 Programming the Lexical Analyzer
In order to extract the basic symbol that follows a given position p in the input stream we must
recognize and delete irrelevant characters such as spaces and newlines, use the automaton to
read the symbol, and fix the terminal position p′.
Superfluous spaces can be deleted by adding transitions q' ' → q to all states q in which
such spaces are permitted. Since newlines (card boundaries or carriage returns) are input
characters if they are significant, we can handle them in the same way as superfluous spaces
in many languages.
[State diagram omitted: states 0-19, with transitions labeled by the character classes defined above.]
Figure 6.4: Finite Automaton Accepting LAX Basic Symbols
There are two possibilities from which to choose when programming the automaton:
Representing the transition table as a matrix, so that the program for the automaton
has the general form:
while basic_symbol_not_yet_complete do
  state := table[state, next_character];
Implementing the transitions directly as program code, with the current state encoded
by the position reached in the program.
The simplest way to provide output from the automaton is to add the input character
to a string { empty at the start of the basic symbol { during each state transition. This
strategy is generally inadequate. For example, the quotes bounding a Pascal character or
string denotation should be omitted and any doubled internal quote should be replaced by
a single quote. Thus more general actions may need to be taken at each state transition. It
usually suces, however, to provide the following four options:
Add (some mapping of) the input character to the output string.
Add a given character to the output string.
Set a pointer or index to the output string.
Do nothing.
Figure 6.5 illustrates three of these actions applied to produce output from the automaton
of Figure 6.3b. A slash separates the output action from the input character; the absence of
a slash indicates the `do nothing' action.
[State diagram omitted: the reduced automaton of Figure 6.3b, with output actions c/c and ''/'' attached to its transitions.]
Figure 6.5: Finite Transducer for Pascal Strings
In order to produce the standard representation of floating point numbers (see Sec-
tion 4.2.2), we require three indices to the characters of the significand:
beg: Initially indexes the first character of the significand, finally indexes the first nonzero
digit.
pnt: Indexes the first position to the right of the decimal point.
lim: Initially indexes the first position to the right of the significand, finally indexes the first
position to the right of the last nonzero digit.
By moving the indices beg and lim, the leading and trailing zeros are removed so that the
significand is left over in standard form. If e is the value of the explicit exponent, then the
adjusted exponent e′ is given by:
e′ := e + (pnt - beg)   significand interpreted as a fraction
e′ := e + (pnt - lim)   significand interpreted as an integer
The standard representation of a floating point zero is the pair ('0', 0). This representation
is obtained by taking a special exit from the standardization algorithm if beg becomes equal
to lim during the zero-removal process.
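A Python sketch of the standardization step, with the significand given as a digit string and the result interpreted as a fraction (function and parameter names are invented here):

def standardize(digits, pnt, e):
    """Standardize a significand given as a digit string.
    pnt: index of the first position right of the decimal point;
    e: explicit exponent. Returns (digits', e') with the significand
    interpreted as a fraction."""
    beg, lim = 0, len(digits)
    while beg < lim and digits[beg] == "0":
        beg += 1                        # strip leading zeros
    while lim > beg and digits[lim - 1] == "0":
        lim -= 1                        # strip trailing zeros
    if beg == lim:
        return "0", 0                   # special exit: the value is zero
    return digits[beg:lim], e + (pnt - beg)

# 00123.4500E2: digits "001234500", pnt = 5, e = 2
print(standardize("001234500", 5, 2))   # ('12345', 5): 0.12345 * 10**5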
Many authors suggest that the next character operation be implemented by a proce-
dure. We have already pointed out that the implementation of next character strongly
influences the overall speed of the compiler; in many cases simple use of a procedure leads to
significant inefficiency. For example, Figure 6.6 shows the results of measuring lexical analysis
times for three translators running on a Control Data 6400 under KRONOS 2.0. RUN 2.3 is a
FORTRAN compiler that reads one line at a time, storing it in an array; the next character
operation is implemented as a fetch and index increment in-line. The COMPASS 2.0 assem-
bler implements some instances of next character by procedure calls and others by in-line
references, while the Pascal compiler uses a procedure call to fetch each character. The two
6.3 Notes and References 119
test programs for the FORTRAN compiler had similar characteristics: Each was about 5000
lines long, composed of 30-40 heavily-commented subprograms. The test program for COM-
PASS contained 900 lines, about one-third of which were comments, and that for Pascal (the
compiler itself) had 5000 lines with very few comments.
Lexical Analysis Time
Translator    Program               Microseconds    Fraction of
                                    per character   total compile time
RUN 2.3       Page Formatter        3.56            14%
                without comments    3.44            9%
              Flowchart Generator   3.3             11.5%
COMPASS 2.0   I/O Package           5.1             21%
Pascal 3.4    Pascal Compiler       35.6            39.6%
Figure 6.6: Lexical Analysis on a Control Data 6400 [Dunn, 1974]
Further measurements on existing compilers for a number of languages indicate that the
major subtasks of lexical analysis can be rank-ordered by amount of time spent as follows:
1. Skipping spaces and comments.
2. Collecting identifiers and keywords.
3. Collecting digits.
4. All other tasks.
In many cases there are large (factor of at least 2) differences in the amount of time spent
between adjacent elements in this hierarchy. Of course the precise breakdown depends upon
the language, compiler, operating system and coding technique of the user. For example, skip-
ping a comment is trivial in FORTRAN; on the other hand, an average non-comment card
in FORTRAN has 48 blank columns out of the 66 allocated to code [Knuth, 1971a]. Taken
together, the measurements discussed in the two paragraphs above lead to the conclusion that
the lexical analyzer should be partitioned further: Tasks 1-3 should be incorporated into a
scanner module that implements the next character operation, and the finite automaton
and its underlying regular grammar (or regular expression) should be defined in terms of
the characters digit string , identifier , keyword , etc. This decomposition drastically
reduces the number of invocations of next character , and also the influence of the automaton
implementation upon the speed of the lexical analyzer.
Tasks 1-3 are trivial, and can be implemented `by hand' using all of the coding tricks and
special instructions available on the target computer. They can be carefully integrated with
the I/O facilities provided by the operating system to minimize overhead. In this way, serious
inefficiencies in the lexical analyzer can be avoided while retaining systematic construction
techniques for most of the implementation.
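The following sketch shows such a hand-coded scanner skeleton (all names are ours): tasks
1-3 are coded directly, and the caller sees whole identifiers and digit strings as single
`characters'. A keyword would be recognized by a table lookup on the collected identifier,
which is omitted here; short-circuit boolean evaluation is assumed:

program scanner_sketch;
type
  scan_class = (identifier_ch, digit_string_ch, delimiter_ch, end_of_line);
var
  line : string;
  pos : integer;
  lexeme : string;

function next_scanner_symbol : scan_class;
begin
  while (pos <= length(line)) and (line[pos] = ' ') do
    pos := pos + 1;                                 (* task 1: skip spaces *)
  if pos > length(line) then
    next_scanner_symbol := end_of_line
  else if line[pos] in ['a'..'z', 'A'..'Z'] then
    begin                                           (* task 2: collect an identifier *)
      lexeme := '';
      while (pos <= length(line)) and
            (line[pos] in ['a'..'z', 'A'..'Z', '0'..'9']) do
        begin lexeme := lexeme + line[pos]; pos := pos + 1 end;
      next_scanner_symbol := identifier_ch
    end
  else if line[pos] in ['0'..'9'] then
    begin                                           (* task 3: collect a digit string *)
      lexeme := '';
      while (pos <= length(line)) and (line[pos] in ['0'..'9']) do
        begin lexeme := lexeme + line[pos]; pos := pos + 1 end;
      next_scanner_symbol := digit_string_ch
    end
  else
    begin                                           (* any other character *)
      lexeme := line[pos]; pos := pos + 1;
      next_scanner_symbol := delimiter_ch
    end
end;

begin
  line := 'count := count + 1'; pos := 1;
  while next_scanner_symbol <> end_of_line do
    writeln(lexeme)
end.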
6.3 Notes and References

There are many indications that the hand-coded product provides significant savings in execution time
over the products of existing generators. Many of the coding details (table formats, output
actions, limited backtrack and character class tradeoffs) are discussed by Waite [1973a] in
his treatment of string-directed pattern matching.
Two additional features, macros and compiler control commands (compiler options,
compile-time facilities) complicate the lexical analyzer and its interface to the parser. Macro
processing can usually be done in a separate pre-pass. If, however, it is integrated into the
language (as in PL/M or Burroughs Extended ALGOL) then it is a task of the lexical
analyzer. This requires additional information from the parser regarding the scope of macro
definitions.
We recommend that control commands always be written on a separate line, and be easily
recognizable by the lexical analyzer. They should also be syntactically valid, so that the parser
can process them if they are not relevant to lexical analysis. Finally, it is important that there
be only one form of control command, since the user should not be forced to learn several
conventions because the compiler writer decides to process commands in several places.
Exercises
6.1 Derive a regular grammar from the LAX symbol grammar of Appendix A.1. Derive a
regular expression.
6.2 [Sale, 1971; McIlroy, 1974] Consider the definition of FORTRAN 66.
(a) Partition the grammar as discussed in Section 6.1.1. Explain why you distinguished
each of the symbol subgrammars Gi.
(b) Carefully specify the lexical analyzer interface. How do you invoke different symbol
subgrammars?
6.3 Consider the following set of tokens, which are possible in a FORTRAN assignment
statement [McIlroy, 1974] (identifier is constructed as usual, d denotes a nonempty
sequence of digits, and s denotes either `+' or `-'):
+ - * / ** ( ) , =
.TRUE. .FALSE.
.AND. .OR. .NOT.
.LT. .LE. .EQ. .NE. .GE. .GT.
identifier
d d. d.d .d
dEd d.Ed d.dEd .dEd
dEsd d.Esd d.dEsd .dEsd
Assume that any token sequence is permissible, and that the ambiguity of `***' may
be resolved in any convenient manner.
(a) Derive an analysis automaton using the methods of Section 5.2, and minimize the
number of states by the method of Section B.3.3.
(b) Derive an analysis automaton using the methods given by Aho and Corasick
[1975], and minimize the number of states.
(c) Describe in detail the interaction between the parser and the automaton derived
in (b). What information must be retained? What form should that information
take?
(d) Can you generalize the construction algorithms of Aho and Corasick to arbitrary
regular expression inputs?
6.4 Write a line-imaging routine to accept an arbitrary sequence of printable characters,
spaces and backspace characters and create an image of the input line. You should
recognize an extended character set which includes arbitrary underlining, plus the fol-
lowing overstruck characters:
c overstruck by / interpreted as `cents'
= overstruck by / interpreted as `not equal'
(Note: Overstrikes may occur in any order.) Your image should be an integer array,
with one element per character position. This integer should encode the character (e.g.
`cents') resulting in that position from the arbitrary input sequence.
6.5 Write a program to implement the automaton of Figure 6.4 as a collection of case
clauses. Compile the program and compare its size to the requirements for the
transition table.
6.6 Attach output specifications to the transitions of Figure 6.4. How will the inclusion of
these specifications affect the program you wrote for Exercise 6.5? Will their inclusion
change the relationship between the program size and transition table size significantly?
6.7 Consider the partition of a lexical analyzer for LAX into a scanner and an automaton.
(a) Restate the symbol grammar in terms of identifier , digit string , etc. to
reflect the partition. Show how this change affects Figure 6.4.
(b) Carefully specify the interface between scanner and automaton.
(c) Rewrite the routine of Exercise 6.5, using the interface defined in (b). Has the
overall size of the lexical analyzer changed? (Don't forget to include the scan-
ner size!) Has the relationship between the two possible implementations of the
automaton (case clauses or transition tables) changed?
(d) Measure the time required for lexical analysis, comparing the implementation of
(c) with that of Exercise 6.5. If they differ, can you attribute the difference to any
specific feature of your environment (e.g. an expensive procedure mechanism)? If
they do not differ, can you explain why?
6.8 Suppose that LAX is being implemented on a machine that supports both upper and
lower case letters. How would your lexical analyzer change under each of the following
assumptions:
(a) Upper and lower case letters are indistinguishable.
(b) Upper and lower case may be mixed arbitrarily in identifiers, but all occurrences of
a given identifier must use the same characters. (In other words, if an identifier is
introduced as ArraySize then no identifier such as arraysize can be introduced
in the same range.) Keywords must always be lower case.
(c) As (b), except that upper and lower case may be mixed arbitrarily in keywords,
and need not always be the same.
(d) Choose one of the schemes (a)-(c) and argue in favor of it on grounds of program
portability, ease of use, documentation value, etc.
Chapter 7
Parsing
The parsing of a source program determines the semantically-relevant phrases and, at the
same time, verifies syntactic correctness. As a result we obtain the parse tree of the program,
at first represented implicitly by the sequence of productions employed during the derivation
from (or reduction to) the axiom according to the underlying grammar.
In this chapter we concern ourselves with the practical implementation of parsers. We
begin with the parser interface and the appropriate choice of parsing technique, and then
go into the construction of deterministic parsers from a given grammar. We shall consider
both the top-down and bottom-up parsing techniques introduced in Sections 5.3.2 and 5.3.3.
Methods for coding parsers by hand and for generating them mechanically will be discussed.
7.1 Design
To design a parser we must define the grammar to be processed, augment it with connection
points (points at which information will be extracted) and choose the parsing algorithm.
Finally, the augmented grammar must be converted into a form suited to the chosen parsing
technique. After this preparation the actual construction of the parser can be carried out
mechanically. Thus the process of parser design is really one of grammar design, in which we
derive a grammar satisfying the restrictions of a particular parsing algorithm and containing
the connection points necessary to determine the semantics of the source program.
Even if we are given a grammar for the language, modifications may be necessary to obtain
a useful parser. We must, of course, guarantee that the modified grammar actually describes
the same language as the original, and that the semantic structure is unchanged. Structural
syntactic ambiguity leading to different semantic interpretations can only be corrected by
altering the language. Other ambiguities can frequently be removed by deleting productions
or restricting their applicability depending upon the parser state.
Figure 7.1: Parser Interfaces (diagram not reproduced: it shows the parser exchanging error
reports and synthesized tokens with the error handler)
In the simplest design the parser module provides the operation parse program . It invokes the
lexical analyzer's next symbol operation for each basic symbol, and reports each connection
point by invoking an appropriate operation of some other module. (We term this invocation
a parser action.) Control of the entire transduction process resides within the parser in this
design. By moving the control out of the parser module, we obtain the two alternative
designs: The parser module provides either an operation parse symbol that is invoked with
a token as an argument, or an operation next connection that is invoked to obtain a
connection point specification.
It is also possible to divide the parsing over more than one pass. Properties of the language
and demands of the parsing algorithm can lead to a situation where we need to know the
semantics of certain symbols before we can parse the context of the definitions of these
symbols. ALGOL 68, for example, permits constructs whose syntactic structure can be
recognized by deterministic left-to-right analysis only if the complete set of type identifiers is
known beforehand. When the parsing is carried out in several passes, the sequence of symbols
produced by the lexical analyzer will be augmented by other information collected by parser
actions during previous passes. The details depend upon the source language.
We have already considered the interface between the parser and the lexical analyzer, and
the representation of symbols. The parser looks ahead some number of symbols in order to
control the parsing. As soon as it has accepted one of the lookahead symbols as a component
of the sentence being analyzed, it reads the next symbol to maintain the supply of lookahead
symbols. Through the use of LL or LR techniques, we can be certain that the program is
syntactically correct up to and including the accepted symbol. The parser thus need not
retain accepted symbols. If the code for these symbols, or their values, must be passed on
to other compiler modules via parser actions, these actions must be connected directly to
the acceptance of the symbol. We shall term connection points serving this purpose symbol
connections.
We can distinguish a second class of connection point, the structure connection. It is
used to connect parser actions to the attainment of certain sets of situations (in the sense
of Section 5.3.2) and permits us to trace the phrases recognized by the parser in the source
program. Note carefully that symbol and structure connections provide the only information
that a compiler extracts from the input text.
In order to produce the parse tree as an explicit data structure, it suffices to provide
one structure connection at each reduction of a simple phrase and one symbol connection at
acceptance of each symbol having a symbol value; at the structure connections we must know
which production was applied. We can fix the connection points for this process mechanically
from the grammar. This process has proved useful, particularly with bottom-up parsing.
Parser actions that enter declarations into tables or generate code directly cannot be fixed
mechanically, but must be introduced by the programmer. Moreover, we often know which
production is to be applied well before the reduction actually takes place, and we can make
good use of this knowledge. In these cases we must explicitly mark the connection points
and parser actions in the grammar from which the parser is produced. We add the symbol
encoding (code and value) taken from the lexical analyzer as a parameter to the symbol
connections, whereas parser actions at structure connections extract all of their information
from the state of the parser.
Expression ::= Term ('+' Term %Addop)* .
Term ::= Factor ('*' Factor %Mulop)* .
Factor ::= Identifier &Ident | '(' Expression ')' .
a) A grammar for expressions

Addop: Output "+"
Mulop: Output "*"
Ident: Output the identifier returned by the lexical analyzer
b) The parser actions
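Anticipating the recursive descent technique of Section 7.2, the following sketch (our own
code, with invented names) realizes these parser actions as output statements, so that parsing
a*(b+c) yields its postfix form abc+*. Identifiers are single letters here and # marks the end
of the input:

program postfix_output;
var
  line : string;
  pos : integer;
  ch : char;

procedure next_symbol;
begin
  pos := pos + 1; ch := line[pos]
end;

procedure syntax_error;
begin
  writeln; writeln('syntax error'); halt
end;

procedure expression; forward;

procedure factor;
begin
  if ch in ['a'..'z'] then
    begin write(ch); next_symbol end             (* &Ident: output the identifier *)
  else if ch = '(' then
    begin
      next_symbol; expression;
      if ch = ')' then next_symbol else syntax_error
    end
  else syntax_error
end;

procedure term;
begin
  factor;
  while ch = '*' do
    begin next_symbol; factor; write('*') end    (* %Mulop: output "*" *)
end;

procedure expression;
begin
  term;
  while ch = '+' do
    begin next_symbol; term; write('+') end      (* %Addop: output "+" *)
end;

begin
  line := 'a*(b+c)#'; pos := 0; next_symbol;
  expression;
  if ch <> '#' then syntax_error;
  writeln
end.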
7.1.2 Selection of the Parsing Algorithm

In this book we consider only parsing algorithms whose time and storage requirements depend linearly
upon program length, avoiding backtrack and the need to unravel parser actions. We have
already pointed out the LL and LR algorithms as special cases of deterministic techniques
that recognize a syntactic error at the first symbol, t, that cannot be the continuation of a
correct program; other algorithms may not discover the error until attempting to reduce the
simple phrase in which t occurs. Moreover, LR(k) grammars comprise the largest class whose
sentences can be parsed using deterministic pushdown automata. In view of these properties
we restrict ourselves to the discussion of LL and LR parsing algorithms. Other techniques
can be found in the literature cited in Section 7.4.
Usually the availability of a parser generator is the strongest motive for the choice between
LL and LR algorithms: If one has such a generator at one's disposal, then the technique it
implements is given preference. If no parser generator is available, then an LL algorithm
should be selected because the LL conditions are substantially easier to verify by hand. Also
a transparent method for obtaining the parser from the grammar exists for LL but not for
LR algorithms. By using this approach, recognizers for large grammars can be programmed
relatively easily by hand.
LR algorithms apply to a larger class of grammars than LL algorithms, because they
postpone the decision about the applicable production until the reduction takes place. The
main advantage of LR algorithms is that they permit more latitude in the representation of
the grammar. As the example at the end of Section 7.1.1 shows, however, this advantage may
be neutralized if distinct structure connections that frustrate deferment of a parsing decision
must be introduced. (Note that LL and LR algorithms behave identically for all language
constructs that begin with a special keyword.)
We restrict our discussion to parsers with only one-symbol lookahead, and thus to LL(1)
and LR(1) grammars. Experience shows that this is not a substantial restriction; program-
ming languages are usually so simply constructed that it is easy to satisfy the necessary
conditions. In fact, to a large extent one can manage with no lookahead at all. The main
reason for the restriction is the considerable increase in cost (both time and space) that must
be invested to obtain more lookahead symbols in the parser generator and in the generated
parser.
When dealing with LR grammars, not even the restriction to the LR(1) case is sufficient
to obtain practical tables. Thus we use an LR(1) parse algorithm, but control it with tables
obtained through a modification of the LR(0) analyzer.
7.1.3 Parser Construction

The presence of don't-cares leads to a possible reduction in table size by combining rows or columns
that differ only in those elements.
The transition function may be stored as program fragments rather than as a matrix.
This is especially useful in an LL parser, where there are simple rules relating the program
fragments to the original grammar.
Parser generation is actually compilation: The source program is a grammar with
embedded connection points, and the target program is some representation of the transition
function. Like all compilers, the parser generator must first analyze its input text. This
analysis phase tests the grammar to ensure that it satisfies the conditions (LL(1), LR(1),
etc.) assumed by the parser. Some generators, like `error correcting' compilers, will attempt
to transform a grammar that does not meet the required conditions. Other transformations
designed to optimize the generated parser may also be undertaken. In Sections 7.2 and 7.3
we shall consider some aspects of the `semantic analysis' (condition testing) and optimization
phases of parser generators.
Grammar type   Test          Parser generation
LL(1)          n^2           n^2
Strong LL(k)   n^(k+1)       n^(k+1)
LL(k)          n^(2k)        2^(n^k + (k+1) log n)
SLR(1)         n^2           2^(n + log n)
SLR(k)         n^(k+2)       2^(n + k log n)
LR(k)          n^(2(k+1))    2^(n^(k+1) + k log n)
Z → E
E → F E1
E1 → ε | + F E1
F → i | ( E )
a) The grammar

q0: [Z → ·E]        q8:  [E1 → ·+FE1]
q1: [Z → E·]        q9:  [F → i·]
q2: [E → ·FE1]      q10: [F → (·E)]
q3: [E → F·E1]      q11: [E1 → +·FE1]
q4: [F → ·i]        q12: [F → (E·)]
q5: [F → ·(E)]      q13: [E1 → +F·E1]
q6: [E → FE1·]      q14: [F → (E)·]
q7: [E1 → ·]        q15: [E1 → +FE1·]
b) The states of the parsing automaton

q0 i → q1 q2 i,   q0 ( → q1 q2 (,
q1 → ε,
q2 i → q3 q4 i,   q2 ( → q3 q5 (,
q3 # → q6 q7 #,   q3 ) → q6 q7 ),   q3 + → q6 q8 +,
q4 i → q9,
q5 ( → q10,
q6 → ε,
q7 → ε,
q8 + → q11,
q9 → ε,
q10 i → q12 q2 i,   q10 ( → q12 q2 (,
q11 i → q13 q4 i,   q11 ( → q13 q5 (,
q12 ) → q14,
q13 # → q15 q7 #,   q13 ) → q15 q7 ),   q13 + → q15 q8 +,
q14 → ε,
q15 → ε
c) The transitions of the parsing automaton
Figure 7.4: A Sample Grammar and its Parsing Automaton
If we already know that a grammar satisfies the LL(1) condition, we can easily use these
transformations to write a parser (either by mechanical means or by hand). With additional
transformation rules we can generalize the technique sufficiently to handle our extended BNF
(Section 5.1.3) and connection points. Some of the additional rules appear in Figure 7.7.
Figure 7.8 illustrates the use of these rules.
procedure parser ;

  procedure E ; forward;

  procedure F ;
  begin (* F *)
    case symbol of
      'i':
        begin
          (* q4: *) if symbol = 'i' then next_symbol else error ;
          (* q9: *)
        end;
      '(':
        begin
          (* q5: *) if symbol = '(' then next_symbol else error ;
          (* q10: *) E ;
          (* q12: *) if symbol = ')' then next_symbol else error ;
          (* q14: *)
        end
      otherwise error
    end;
  end; (* F *)

  procedure E1 ;
  begin (* E1 *)
    case symbol of
      '#', ')': (* q7: *);
      '+':
        begin
          (* q8: *) if symbol = '+' then next_symbol else error;
          (* q11: *) F ;
          (* q13: *) E1 ;
          (* q15: *)
        end
      otherwise error
    end;
  end; (* E1 *)

  procedure E ;
  begin (* E *)
    (* q2: *) F ;
    (* q3: *) E1 ;
    (* q6: *)
  end; (* E *)

begin (* parser *)
  (* q0: *) E ;
  (* q1: *) if symbol <> '#' then error ;
end; (* parser *)
Figure 7.5: A Recursive Descent Parser for the Grammar of Figure 7.4
procedure parser;
procedure E ; forward;
procedure F ;
begin (* F *)
case symbol of
'i' : next_symbol ;
'(':
begin
next_symbol;
E;
if symbol = ')' then next_symbol else error ;
end
otherwise error
end;
end; (* F *)
procedure E1 ;
begin (* E1 *)
case symbol of
'#', ')': ;
'+': begin next_symbol ; F ; E1 end
otherwise error
end;
end; (* E1 *)
procedure E ;
begin F ; E1 end;
begin (* parser *)
E;
if symbol <> '#' then error ;
end; (* parser *)
a) Errors detected within E1
procedure E1 ;
begin (* E1 *)
if symbol = '+' then begin next_symbol ; F ; E1 end;
end; (* E1 *)
b) Errors detected after exit from E1
Figure 7.6: Figure 7.5 Simplied
These procedures are simpler than
arbitrary recursive procedures because they have no parameters or local variables, and there
is only a single global variable. Thus the alteration of the environment pointer on procedure
entry and exit can be omitted.
An interpretive implementation of a recursive descent parser is also possible: The control
program interprets tables generated from the grammar. Every table entry specifies a basic
operation of the parser and the associated data. For example, a table entry might be described
as follows:
type
  parse_table_entry = record
    operation : integer ;             (* Transition *)
    lookahead : set of symbol_code ;  (* Input or lookahead symbol *)
    next : integer                    (* Parse table index *)
  end;
States corresponding to situations that follow one another in a single production follow
one another in the table. Figure 7.9 specifies a recursive descent interpreter assuming that
parse table is an array of parse table entry .
Alternatives (1)-(5) of the case clause in Figure 7.9 supply the program schemata for
the transitions qt → q′, q → ε and qti → q′qi ti introduced in Figure 7.3. As before, the transition
qti → q′qi ti is accomplished in two steps (alternative 3 followed by either 4 or 5). The situations
represented by the alternatives are given as comments. Alternative 6 shows one of the possible
optimizations, namely the combination of selecting a production X → tαi (alternative 4)
with acceptance of its first symbol t (alternative 1). Further optimization is possible
(Exercise 7.6).
7.2.3 Computation of FIRST and FOLLOW Sets
The first step in the generation of an LL(1) parser is to ensure that the grammar G =
(T, N, P, Z) satisfies the LL(1) condition. To do this we compute the FIRST and FOLLOW
sets for all X ∈ N. For each production X → α ∈ P we can then determine the director
set W = FIRST(α FOLLOW(X)). The director sets are used to verify the LL(1) condition,
and also become the lookahead sets used by the parser. With the computation of these sets,
the task of generating the parser is essentially complete. If the grammar does not satisfy the
LL(1) condition, the generator may attempt transformations automatically (for example, left
recursion removal and simple left factoring) or it may report the cause of failure to the user
for correction.
The following algorithm can be used to compute FIRST(X) and initial values for the
director set W of each production X → α.
1. Set FIRST(X) empty and repeat steps (2)-(5) for each production X → α.
2. Let α = x1 ... xn, i = 0 and W = {#}. If n = 0, go to step 5.
3. Set i := i + 1 and W := W ∪ FIRST(xi). (If xi is an element of T, FIRST(xi) = {xi};
if FIRST(xi) is not available, invoke this algorithm recursively to compute it.) Repeat
step 3 until either i = n or # is not an element of FIRST(xi).
4. If # is not an element of FIRST(xi), set W := W − {#}.
5. Set FIRST(X) := FIRST(X) ∪ W.
Note that if the grammar is left recursive, step (3) will lead to an endless recursion and
the algorithm will fail. This failure can be avoided by marking each X when the computation
of FIRST (X ) begins, and clearing the mark when that computation is complete. If step (3)
attempts to invoke the algorithm with a marked nonterminal, then a left recursion has been
detected.
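A sketch of this algorithm, applied to the grammar of Figure 7.4 (the representation and all
names are our own; the busy flag implements the marking scheme just described):

program first_sets;
type
  nonterm = (Z_, E_, E1_, F_);
  term = (plus_, i_, lparen_, rparen_, empty_);   (* empty_ plays the role of # *)
  tset = set of term;
  sym = record
    is_term : boolean;
    t : term;
    n : nonterm
  end;
const
  maxprod = 6; maxlen = 3;
var
  lhs : array [1..maxprod] of nonterm;
  len : array [1..maxprod] of integer;
  rhs : array [1..maxprod, 1..maxlen] of sym;
  first : array [nonterm] of tset;
  busy, done : array [nonterm] of boolean;
  x : nonterm;
  k : integer;

procedure addt(k : integer; tt : term);
begin len[k] := len[k] + 1; rhs[k, len[k]].is_term := true; rhs[k, len[k]].t := tt end;

procedure addn(k : integer; nn : nonterm);
begin len[k] := len[k] + 1; rhs[k, len[k]].is_term := false; rhs[k, len[k]].n := nn end;

procedure compute_first(x : nonterm);
var k, idx : integer; w : tset; no_empty : boolean;
begin
  if not done[x] then
    begin
      if busy[x] then begin writeln('grammar is left recursive'); halt end;
      busy[x] := true; first[x] := [];
      for k := 1 to maxprod do
        if lhs[k] = x then
          begin
            w := [empty_]; idx := 0; no_empty := false;       (* step 2 *)
            while (idx < len[k]) and not no_empty do
              begin
                idx := idx + 1;                                (* step 3 *)
                if rhs[k, idx].is_term then
                  begin w := w + [rhs[k, idx].t]; no_empty := true end
                else
                  begin
                    compute_first(rhs[k, idx].n);
                    w := w + first[rhs[k, idx].n];
                    no_empty := not (empty_ in first[rhs[k, idx].n])
                  end
              end;
            if no_empty then w := w - [empty_];                (* step 4 *)
            first[x] := first[x] + w                           (* step 5 *)
          end;
      busy[x] := false; done[x] := true
    end
end;

begin
  for x := Z_ to F_ do begin busy[x] := false; done[x] := false end;
  for k := 1 to maxprod do len[k] := 0;
  (* 1: Z -> E    2: E -> F E1    3: E1 ->    4: E1 -> + F E1
     5: F -> i    6: F -> ( E )                                *)
  lhs[1] := Z_;  addn(1, E_);
  lhs[2] := E_;  addn(2, F_); addn(2, E1_);
  lhs[3] := E1_;
  lhs[4] := E1_; addt(4, plus_); addn(4, F_); addn(4, E1_);
  lhs[5] := F_;  addt(5, i_);
  lhs[6] := F_;  addt(6, lparen_); addn(6, E_); addt(6, rparen_);
  for x := Z_ to F_ do compute_first(x);
  for x := Z_ to F_ do
    begin
      case x of
        Z_ : write('FIRST(Z)  = ');
        E_ : write('FIRST(E)  = ');
        E1_: write('FIRST(E1) = ');
        F_ : write('FIRST(F)  = ')
      end;
      if i_ in first[x] then write('i ');
      if plus_ in first[x] then write('+ ');
      if lparen_ in first[x] then write('( ');
      if rparen_ in first[x] then write(') ');
      if empty_ in first[x] then write('# ');
      writeln
    end
end.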
procedure parser ;
var
  current : integer ;
  stack : array [1 .. max_stack ] of integer ;
  stack_pointer : 0 .. max_stack ;
begin (* parser *)
  current := 1; stack_pointer := 0;
  repeat
    with parse_table [current ] do
      case operation of
        1: (* X → α·tβ *)
          if symbol in lookahead then
            begin next_symbol ; current := current + 1 end
          else error;
        2: (* X → α· *)
          begin
            current := stack [stack_pointer ];
            stack_pointer := stack_pointer - 1;
          end ;
        3: (* X → α·Bβ *)
          begin
            if stack_pointer = max_stack then abort ;
            stack_pointer := stack_pointer + 1;
            stack [stack_pointer ] := current + 1;
            current := next ;
          end ;
        4: (* X → ·αi (not the last alternative) *)
          if symbol in lookahead then
            current := current + 1
          else current := next ;
        5: (* X → ·αm (last alternative) *)
          if symbol in lookahead then
            current := current + 1
          else error ;
        6: (* X → ·tαi (not the last alternative) *)
          if symbol in lookahead then
            begin next_symbol ; current := current + 1 end
          else current := next
      end ;
  until current = 1;
  if symbol <> '#' then error ;
end ; (* parser *)
Figure 7.9: An Interpretive LL(1) Parser
7.3 LR Parsers
Using construction 5.33, we can both test whether a grammar is LR(1) and construct a parser
for it. Unfortunately, the number of states of such a parser is too large for practical use.
Exactly as in the case of strong LL(k) grammars, many of the transitions in an LR(1) parser
are independent of the lookahead symbol. We can utilize this fact to arrive at a parser with
fewer states, which implements the LR(1) analysis algorithm but in which reduce transitions
depend upon the lookahead symbol only if it is absolutely necessary.
We begin the construction with an LR(0) parser, which does not examine lookahead
symbols at all, and introduce lookahead symbols only as required. The grammars that we
can process with these techniques are the simple LR(1) (SLR(1)) grammars of DeRemer
[1969]. (This class can also be defined for arbitrary k ≥ 1.) Not all LR(1) grammars are also
SLR(1) (there is no equivalence similar to that between ordinary and strong LL(1) grammars),
but the distinction is unimportant in practice except for one class of problems. This class
of problems will be solved by sharpening the definition of SLR(1) to obtain lookahead LR(1)
(LALR(1)) grammars.
The verifications of the LR(1), SLR(1) and LALR(1) conditions are more laborious than
verification of the LL(1) condition. Also, there exists no simple relationship between the
grammar and the corresponding LR pushdown automaton. LR parsers are therefore employed
only if one has a parser generator. We shall first discuss the workings of the parser and in
that way derive the SLR(1) and LALR(1) grammars from the LR(0) grammars. Next we
shall show how parse tables are constructed. Since these tables are still too large in practice,
we investigate the question of compressing them and show examples in which the final tables
are of feasible size. The treatment of error handling will be deferred to Section 12.2.2.
7.3.1 The Parse Algorithm
Consider an LR(k) grammar G = (T, N, P, Z) and the pushdown automaton A = (T, Q, R, q0,
{q0}, Q, q0) of construction 5.33. The operation of the automaton is most easily explained
using the matrix form of the transition function:

f(q, σ) =
    q′        if σ ∈ T and qσ → qq′ ∈ R, or if σ ∈ N and q′ = next(q, σ)   (shift transition)
    X → α     if [X → α·; ω] ∈ q                                           (reduce transition)
    HALT      if σ = # and [Z → S·; #] ∈ q
    ERROR     otherwise

This transition function is easily obtained from construction 5.33: All of the transitions
defined in step (5) deliver shift transitions with one terminal symbol, which will be accepted;
the remaining transitions result from step (7) of the construction. We divide the transition
p1 ... pm q ω → p1 q′ ω referred to in step (7) into two steps: Because [X → α·; ω] is in q we
know that we must reduce according to the production X → α and remove m = |α| states
from the stack. Further we define f(p1, X) = next(p1, X) = q′ to be the new state. If ω = #
and [Z → S·; #] ∈ q then the pushdown automaton halts.
Figure 7.10 gives an example of the construction of a transition function for k = 0. We
have numbered the states and rules consecutively. `+2' indicates that a reduction will be
made according to rule 2; `*' marks the halting of the pushdown automaton. Because k = 0,
the reductions are independent of the following symbols.
Figure 7.10c shows the transition function as the transition diagram of a finite automaton
for the grammars of Theorem 5.32. The distinct grammars correspond to distinct final states.
As an LR parser, the automaton operates as follows: Beginning at the start state 0, we make
a transition to the successor state corresponding to the symbol read. The states through
which we pass are stored on the stack; this continues until a final state is reached. In the final
state we reduce by means of the given production X → α, delete |α| states from the stack
and proceed as though X had been `read'.
(1) Z → E
(2) E → E + F    (3) E → F
(4) F → i        (5) F → ( E )
a) The grammar

     i   (   )   +   #   E   F
0    3   4   .   .   .   1   2
1    .   .   .   5   *
2   +3  +3  +3  +3  +3
3   +4  +4  +4  +4  +4
4    3   4   .   .   .   6   2
5    3   4   .   .   .       7
6    .   .   8   5   .
7   +2  +2  +2  +2  +2
8   +5  +5  +5  +5  +5
b) The transition table
c) The transition diagram (not reproduced in this text rendering)
Figure 7.10: Construction of a Transition Function for k = 0
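The following self-contained sketch (our own encoding, not the book's tables module) drives
the automaton of Figure 7.10b over the input i+(i)#. Shift entries are stored as nonnegative
state numbers and a reduction by production p as -p:

program lr0_parser;
const
  err = -99; hlt = 99; max_stack = 50;
  (* columns: 1=i 2=( 3=) 4=+ 5=# 6=E 7=F; err also covers don't-cares *)
  f : array [0..8, 1..7] of integer =
    ((  3,   4, err, err, err,   1,   2),
     (err, err, err,   5, hlt, err, err),
     ( -3,  -3,  -3,  -3,  -3, err, err),
     ( -4,  -4,  -4,  -4,  -4, err, err),
     (  3,   4, err, err, err,   6,   2),
     (  3,   4, err, err, err, err,   7),
     (err, err,   8,   5, err, err, err),
     ( -2,  -2,  -2,  -2,  -2, err, err),
     ( -5,  -5,  -5,  -5,  -5, err, err));
  (* productions: (2) E -> E+F  (3) E -> F  (4) F -> i  (5) F -> (E) *)
  prod_len : array [2..5] of integer = (3, 1, 1, 3);
  prod_lhs : array [2..5] of char = ('E', 'E', 'F', 'F');
var
  stack : array [0..max_stack] of integer;
  sp, ip, a, p : integer;
  input : string;
  running : boolean;

function col(c : char) : integer;
begin
  case c of
    'i' : col := 1; '(' : col := 2; ')' : col := 3;
    '+' : col := 4; '#' : col := 5; 'E' : col := 6; 'F' : col := 7
  end
end;

begin
  input := 'i+(i)#';
  sp := 0; stack[0] := 0; ip := 1; running := true;
  repeat
    a := f[stack[sp], col(input[ip])];
    if a = hlt then begin writeln('accepted'); running := false end
    else if a = err then begin writeln('error'); running := false end
    else if a >= 0 then
      begin                          (* shift: stack the new state, accept the symbol *)
        sp := sp + 1; stack[sp] := a; ip := ip + 1
      end
    else
      begin                          (* reduce by production p *)
        p := -a;
        writeln('reduce by production ', p);
        sp := sp - prod_len[p];      (* delete |right-hand side| states *)
        sp := sp + 1;                (* proceed as though the left-hand side had been read *)
        stack[sp] := f[stack[sp - 1], col(prod_lhs[p])]
      end
  until not running
end.

For the given input the program announces reductions 4, 3, 4, 3, 5, 2 and then accepts, which
is the reverse of the rightmost derivation of i+(i).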
The only distinction between the mode of operation of an LR(k) parser for k > 0 and the
LR(0) parser of the example is that the reductions may depend upon lookahead symbols. In
the final states of the automaton, reductions will take place only if the context allows them.
Don't-care entries with f(q, σ) = ERROR, i.e. entries (q, σ) such that no input word can
lead to the configuration ωq with σ as the next input symbol (ω being suitable stack contents),
may occur in the matrix representation of the transition function. Note that all entries (q, X),
X ∈ N, with f(q, X) = ERROR are don't-cares. By the considerations in step (3) of
construction 5.33, no error can occur in a transition on a nonterminal; it would have been
recognized at the latest at the preceding reduction. (The true error entries are denoted by `.',
while don't-cares are empty entries in the matrix representation of f(q, σ).)
(1) Z → E
(2) E → E + T    (3) E → T
(4) T → T * F    (5) T → F
(6) F → i        (7) F → ( E )
a) The grammar

     i   (   )   +   *   #   E   T   F
0    4   5   .   .   .   .   1   2   3
1    .   .   .   6   .   *
2    .   .  +3  +3   7  +3
3   +5  +5  +5  +5  +5  +5
4   +6  +6  +6  +6  +6  +6
5    4   5   .   .   .   .   8   2   3
6    4   5   .   .   .   .       9   3
7    4   5   .   .   .   .          10
8    .   .  11   6   .   .
9    .   .  +2  +2   7  +2
10  +4  +4  +4  +4  +4  +4
11  +7  +7  +7  +7  +7  +7
b) The transition table
Figure 7.11: A Non-LR(0) Grammar
7.3.2 SLR(1) and LALR(1) Grammars

By taking account of the lookahead symbol, we can also employ an LR(0) parser for a grammar that
does not satisfy the LR(0) condition. States in which a lookahead symbol must be considered are called
inadequate. They are characterized by having a situation [X → α·] that leads to a reduction,
and also a second situation. This second situation leads either to a reduction with another
production or to a shift transition.
DeRemer [1971] investigated the class of grammars for which these modifications lead to
a parser:
7.5 Definition
A context free grammar G = (T, N, P, Z) is SLR(1) iff the following algorithm leads to a
deterministic pushdown automaton.

The pushdown automaton A = (T, Q, R, q0, {q0}, Q, q0) will be defined by its transition
function f(q, σ) rather than the production set R. The construction follows that of construc-
tion 5.33. We use the following as the closure of a set of situations:

H(M) = M ∪ {[Y → ·γ] | ∃[X → α·Yβ] ∈ H(M)}
1. Initially let Q = {q0}, with q0 = H({[Z → ·S]}).
2. Let q be an element of Q that has not yet been considered. Perform steps (3)-(4) for
each σ ∈ V.
3. Let basis(q, σ) = {[X → ασ·β] | [X → α·σβ] ∈ q}.
4. If basis(q, σ) ≠ ∅, then let next(q, σ) = H(basis(q, σ)). Add q′ = next(q, σ) to Q if it is
not already present.
5. If all elements of Q have been considered, perform step (6) for each q ∈ Q and then
stop. Otherwise return to step (2).
6. For all σ ∈ V, define f(q, σ) by:
f(q, σ) =
    next(q, σ)   if [X → α·σβ] ∈ q
    X → α        if [X → α·] ∈ q and σ ∈ FOLLOW(X)
    HALT         if σ = # and [Z → S·] ∈ q
    ERROR        otherwise
This construction is almost identical to construction 5.33 with k = 0. The only difference is
the additional restriction σ ∈ FOLLOW(X) for the reduction (second case).
SLR(1) grammars cover many practically important language constructs not expressible
by LR(0) grammars. Compared to the LR(1) construction, the given algorithm leads to
substantially fewer states in the automaton. (For the grammar of Figure 7.11a the ratio is
22:12). Unfortunately, even SLR(1) grammars do not suffice for all practical requirements.
State   Situation                  f(q, σ)
0     * [Z → ·E]                   E 1
        [E → ·E+T]
        [E → ·T]                   T 2
        [T → ·T*F]
        [T → ·F]                   F 3
        [F → ·i]                   i 4
        [F → ·(E)]                 ( 5
1     * [Z → E·]                   # HALT
      * [E → E·+T]                 + 6
2     * [E → T·]                   #, ), + reduce 3
      * [T → T·*F]                 * 7
3     * [T → F·]                   reduce 5
4     * [F → i·]                   reduce 6
5     * [F → (·E)]                 E 8
        [E → ·E+T]
        [E → ·T]                   T 2
        [T → ·T*F]
        [T → ·F]                   F 3
        [F → ·i]                   i 4
        [F → ·(E)]                 ( 5
6     * [E → E+·T]                 T 9
        [T → ·T*F]
        [T → ·F]                   F 3
        [F → ·i]                   i 4
        [F → ·(E)]                 ( 5
7     * [T → T*·F]                 F 10
        [F → ·i]                   i 4
        [F → ·(E)]                 ( 5
8     * [F → (E·)]                 ) 11
      * [E → E·+T]                 + 6
9     * [E → E+T·]                 #, ), + reduce 2
      * [T → T·*F]                 * 7
10    * [T → T*F·]                 reduce 4
11    * [F → (E)·]                 reduce 7
Figure 7.12: Derivation of the Automaton of Figure 7.11b
The problem arises whenever there is a particular sequence of tokens that plays different roles
in different places. In LAX, for example, an identifier followed by a colon may be either a label
(A.2.0.6) or a variable serving as a lower bound (A.3.0.4). For this reason the LAX grammar is
not SLR(1), because the lookahead symbol `:' does not determine whether identifier should
be reduced to name (A.4.0.16), or a shift transition building a label definition should take
place.
If the set of lookahead symbols for a reduction could be partitioned according to the
state then we could solve the problem, as can be seen from the example of Figure 7.14. The
productions of Figure 7.14a do not fulfill the SLR(1) condition, as we see in the transition
diagram of Figure 7.14b. In the critical state 5, however, a reduction with lookahead symbol
c need not be considered! If c is to follow B then b must have been read before, and we would
therefore have had the state sequence 0, 3, 7 and not 0, 2, 5. The misjudgement arises through
states in which all of the symbols that could possibly follow B are examined to determine
whether to reduce B → d, without regard to the symbols preceding B. We thus refine the
construction so that we do not admit all lookahead symbols in FOLLOW(X) when deciding
upon a reduction X → α, but distinguish on the basis of predecessor states lookahead symbols
that can actually appear.
We begin by defining the kernel of an LR(1) state to be its LR(0) situations:

kernel(q) = {[X → α·β] | [X → α·β; Γ] ∈ q}
(1) Z → A
(2) A → aBb    (3) A → adc    (4) A → bBc    (5) A → bdd
(6) B → d
a) The grammar
b) The SLR(1) transition diagram (not reproduced; the inadequate states 5 and 7 both call
for the reduction +6 on the lookahead symbols b and c, although in state 5 the lookahead c
cannot actually occur)
     a   b   c   d   #   A   B
0    2   3   .   .   .   1
1    .   .   .   .   *
2    .   .   .   5   .       4
3    .   .   .   7   .       6
4        8   .
5    .  +6   9   .   .
6    .      10
7    .  +6  +6  11   .
8   +2  +2  +2  +2  +2
9   +3  +3  +3  +3  +3
10  +4  +4  +4  +4  +4
11  +5  +5  +5  +5  +5
c) The LALR(1) transition table
Figure 7.14: A Non-SLR(1) Grammar
Construction 7.5 above effectively merges states of the LR(1) parser that have the same kernel,
and hence any lookahead symbol that could have appeared in any of the LR(1) states can
appear in the LR(0) state. The set of all such symbols forms the exact right context upon
which we must base our decisions.
7.6 Definition
Let G = (T, N, P, Z) be a context free grammar, Q be the state set of the pushdown automaton
formed by construction 7.5, and Q′ be the state set of the pushdown automaton formed by
construction 5.33 with k = 1. The exact right context of an LR(0) situation [X → α·β] in a
state q ∈ Q is defined by:

ERC(q, [X → α·β]) = {t ∈ T | ∃q′ ∈ Q′ such that q = kernel(q′) and [X → α·β; t] ∈ q′}
Theorem 5.31 related the LR(k) property to non-overlapping k-stack classes, so it is not
surprising that the definition of LALR(1) grammars involves an analogous condition:
7.7 Definition
Let G = (T, N, P, Z) be a context free grammar and Q be the state set of the pushdown
automaton formed by construction 7.5. G is LALR(1) iff the following sets are pairwise
disjoint for all q ∈ Q, p ∈ P:

Sq,0 = {t | [X → α·β] ∈ q, β ≠ ε, t ∈ EFF(β ERC(q, [X → α·β]))}
Sq,p = ERC(q, [Xp → αp·])
Although Definition 7.6 implies that we need to carry out construction 5.33 to determine the
exact right context, this is not the case. The following algorithm generates only the LR(0)
states, but may consider each of those states several times in order to build the exact right
context. Each time a shift transition into a given state is discovered, we propagate the right
context. If the propagation changes the third element of any triple in the state then the entire
state is reconsidered, possibly propagating the change further. Formally, we define a merge
operation on sets of situations as follows:

merge(A, B) = {[X → α·β; Γ ∪ Δ] | [X → α·β; Γ] ∈ A, [X → α·β; Δ] ∈ B}
The LALR(1) construction algorithm is then:
1. Initially let Q = {q0}, with q0 = H({[Z → ·S; {#}]}).
2. Let q be an element of Q that has not yet been considered. Perform steps (3)-(5) for
each σ ∈ V.
3. Let basis(q, σ) = {[X → ασ·β; Γ] | [X → α·σβ; Γ] ∈ q}.
4. If basis(q, σ) ≠ ∅ and there is a q′ ∈ Q such that kernel(q′) = kernel(H(basis(q, σ)))
then let next(q, σ) = merge(H(basis(q, σ)), q′). If next(q, σ) ≠ q′ then replace q′ by
next(q, σ) and mark q′ as not yet considered.
5. If basis(q, σ) ≠ ∅ and there is no q′ ∈ Q such that kernel(q′) = kernel(H(basis(q, σ)))
then let next(q, σ) = H(basis(q, σ)). Add q′ = next(q, σ) to Q.
6. If all elements of Q have been considered, perform step (7) for each q ∈ Q and then
stop. Otherwise return to step (2).
7. For all σ ∈ V define f(q, σ) as follows:

f(q, σ) =
    next(q, σ)   if basis(q, σ) ≠ ∅
    X → α        if [X → α·; Γ] ∈ q and σ ∈ Γ
    HALT         if σ = # and [Z → S·; {#}] ∈ q
    ERROR        otherwise
Figure 7.14c shows the LALR(1) automaton derived from Figure 7.14a. Note that we
can only recognize a B by reducing production 6, and this can be done only with b or c as
the lookahead symbol (see rows 5 and 7 of Figure 7.14c). States 4 and 6 are entered only
after recognizing a B , and hence the current symbol must be b or c in these states. Thus
Figure 7.14c has don't-care entries for all symbols other than b and c in states 4 and 6.
7.3.3 Shift-Reduce Transitions
For most programming languages 30-50% of the states of an LR parser are LR(0) reduce
states, in which reduction by a specic production is determined without examining the
144 Parsing
context. In Figure 7.12 these states are 3, 4, 10 and 11. We can combine these reductions
with the stacking of the previous symbol to obtain a new kind of transition { the shift-reduce
transition { specifying both the stacking of the last symbol of the right-hand side and the
production by which the next reduction symbol is to be made. Formally:
If f (q0; ) = X ! (or f (q0 ; ) = HALT ) is the only possible action (other than ERROR)
in state q0 then redene f (q; ) to be `shift reduce X ! ' for all states q with f (q; ) = q0
and for all 2 V . Then delete state q0 .
With this simplification the transition function of Figure 7.11 can be written as shown in
Figure 7.15.
     i   (   )   +   *   #   E   T   F
0   -6   5   .   .   .   .   1   2  -5
1    .   .   .   6   .   *
2    .   .  +3  +3   7  +3
5   -6   5   .   .   .   .   8   2  -5
6   -6   5   .   .   .   .       9  -5
7   -6   5   .   .   .   .          -4
8    .   .  -7   6   .   .
9    .   .  +2  +2   7  +2
Figure 7.15: The Automaton of Figure 7.11 Recast for Shift-Reduce Transitions
(The notation remains the same, with the addition of -p to indicate a shift-reduce tran-
sition that reduces according to the p-th production.)
Introduction of shift-reduce transitions into a parsing automaton for LAX reduces the
number of states from 131 to 70.
7.3.4 Chain Production Elimination

Figure 7.16: A Simple Case of Chain Production Elimination (the grammar of part (a), the
state diagram of part (b), and the state diagram of part (c), obtained after elimination of
the chain production (3) E → T, are not reproduced here)

Applying the construction of Section 7.3.2 to the grammar of Figure 7.16a
(a simplified version of Figure 7.11a) yields a parser with the state diagram given in
Figure 7.16b. If we reach state 2, we can reduce to E given the lookahead symbol #, but we
could also reduce to Z immediately. We may therefore take either the actions of state 1 or
those of state 2. Figure 7.16c shows the parser that results from merging these two states.
Note that in Figure 7.16b the actions for states 1 and 2 do not conflict (with the exception
of the reduction E → T being eliminated). This property is crucial to the reduction;
fortunately it follows automatically from the LR(1) property of the grammar: Suppose that
for A ≠ B, A ⇒c C and B ⇒c C. Suppose further that some state q contains situations
[X → α·Aγ; Δ] and [Y → β·Bη; Θ]. The follower condition `FIRST(γΔ) and FIRST(ηΘ)
disjoint' must then hold, since otherwise it would be impossible to decide whether to reduce C
to A or to B in state f(q, C). Consideration of state 0 in Figure 7.16b with A = E, B = C = T
illustrates that the follower condition is identical to the absence of conflict required above.
Situations involving chain productions are always introduced by a closure operation. In-
stead of using these chain production situations when establishing a new state, we use the
situations that introduced them. This is equivalent to saying that reduction to the right-hand
side of the chain production should be interpreted as reduction to the left-hand side. Thus
the only change in construction 7.7 comes in computation of basis(q, σ):

3′. Let basis(q, σ) = {[Y → αa·β; Γ] | [X → μ·σν; Δ], [Y → α·aβ; Γ] ∈ q, a ⇒c σ}
    − {[A → B·; Γ] | A →c B}

Here a ⇒c σ means that a derives σ by a (possibly empty) sequence of chain productions,
and A →c B denotes a chain production.
7.3.5 Implementation

In practice the transition matrix is partitioned to ease
the storage management problems. Because of cost we store
the transition function as a packed data structure and employ an access routine that locates
the value f(q, σ) given (q, σ). Some systems work with a list representation of the (sparse)
transition matrix; the access may be time consuming if such a scheme is used, because lists
must be searched.
The access time is reduced if the matrix form of the transition function is retained, and the
storage requirements are comparable to those of the list method if as many rows and columns
as possible are combined. In performing this combination we take advantage of the fact that
two rows can be combined not only when they agree, but also when they are compatible
according to the following definition:
7.8 Definition
Consider a transition matrix f(q, σ). Two rows q, q′ ∈ Q are compatible if, for each column
σ, either f(q, σ) = f(q′, σ) or one of the two entries is a don't-care entry.

Compatibility is defined analogously for two columns σ, σ′ ∈ V. We shall only discuss the
combination of rows here.
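A small sketch of this compatibility test (our own encoding), applied to the terminal
transition matrix of Figure 7.11b. The flag errors_filtered anticipates the failure matrix
technique described below, under which true error entries become don't-cares:

program row_compatibility;
const
  err = -99; dc = -100; hlt = 99;
  ft : array [0..11, 1..6] of integer =          (* columns i ( ) + * # *)
    ((  4,   5, err, err, err, err),
     (err, err, err,   6, err, hlt),
     (err, err,  -3,  -3,   7,  -3),
     ( -5,  -5,  -5,  -5,  -5,  -5),
     ( -6,  -6,  -6,  -6,  -6,  -6),
     (  4,   5, err, err, err, err),
     (  4,   5, err, err, err, err),
     (  4,   5, err, err, err, err),
     (err, err,  11,   6, err, err),
     (err, err,  -2,  -2,   7,  -2),
     ( -4,  -4,  -4,  -4,  -4,  -4),
     ( -7,  -7,  -7,  -7,  -7,  -7));

function compatible(q1, q2 : integer; errors_filtered : boolean) : boolean;
(* Definition 7.8: rows agree in every column, or one entry is a don't-care *)
var s, e1, e2 : integer; ok : boolean;
begin
  ok := true;
  for s := 1 to 6 do
    begin
      e1 := ft[q1, s]; e2 := ft[q2, s];
      if errors_filtered then
        begin
          if e1 = err then e1 := dc;             (* error entries become don't-cares *)
          if e2 = err then e2 := dc
        end;
      if (e1 <> e2) and (e1 <> dc) and (e2 <> dc) then ok := false
    end;
  compatible := ok
end;

begin
  writeln(compatible(0, 5, false));   (* TRUE:  rows 0 and 5 agree everywhere *)
  writeln(compatible(0, 1, false));   (* FALSE: they differ at i and ( *)
  writeln(compatible(0, 1, true))     (* TRUE:  all differences involve error entries *)
end.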
We inspect the terminal transition matrix, the submatrix of f(q, σ) with σ ∈ T, separately
from the nonterminal transition matrix. Often different combinations are possible for the two
submatrices, and by exploiting them separately we can achieve a greater storage reduction.
This can be seen in the case of Figure 7.18a, which is an implementation of the transition
matrix of Figure 7.17. In the terminal transition matrix rows 0, 4, 5 and 6 are compatible,
but none of these rows are compatible in the nonterminal transition matrix.
In order to increase the number of compatible rows, we introduce a Boolean failure matrix,
F[q, t], q ∈ Q, t ∈ T. This matrix is used to filter the access to the terminal transition matrix:
f(q, t) = if F[q, t] then error else entry in the transition matrix;
For this purpose we define F[q, t] as follows:
F[q, t] = true if f(q, t) = ERROR, false otherwise
(don't-care entries of the transition matrix give don't-care entries in F)

With the true errors caught by the failure matrix, the error entries of the terminal transition
matrix become don't-cares, and more rows become compatible. Combining compatible rows
can be treated as a graph problem: take the rows as nodes, connect each incompatible pair
by a branch, and combine rows by
coloring the nodes with a minimum number of colors such that any pair of nodes connected
by a branch are of different colors. (Graph coloring is discussed in Section B.3.3.) Further
compression may be possible as indicated in Exercises 7.12 and 7.13.
     i   (   )   +   *   #   E   T   F
0   -6   4   .   .   .   .   1   2   2
1    .           5       *
2    .   .   .   5   6   *
4   -6   4   .   .   .   .   7   8   8
5   -6   4   .   .   .   .       9   9
6   -6   4   .   .   .   .          -4
7            -7   5       .
8    .   .  -7   5   6   .
9    .   .  +2  +2   6  +2
a) Transition matrix for Figure 7.17 with shift-reduce transitions

     i      (      )      +      *      #
0   false  false  true   true   true   true
1   true                 false         false
2   true   true   true   false  false  false
4   false  false  true   true   true   true
5   false  false  true   true   true   true
6   false  false  true   true   true   true
7                 false  false         true
8   true   true   false  false  false  true
9   true   true   false  false  false  false
b) Uncompressed failure matrix for (a)

                   i   (   )   +   *   #
0,1,2,4,5,6,7,8   -6   4  -7   5   6   *
9                         +2  +2   6  +2
c) Compressed terminal transition matrix

           E   T,F
0,1,2      1    2
4          7    8
5               9
6,7,8,9        -4
d) Compressed nonterminal transition matrix
Figure 7.18: Table Compression
7.4 Notes and References

Other parsing techniques (precedence grammars or Floyd-Evans Productions, for example) either
apply to smaller language classes or do
not attain the same computational efficiency or error recovery properties as the techniques
treated here. Operator precedence grammars have also achieved significant usage because one
can easily construct parsers by hand for expressions with infix operators. Aho and Ullman
[1972] give quite a complete overview of the available parsing techniques and their optimal
implementation.
Instead of obtaining the LALR(1) parser from the LR(1) parser by merging states, one
could begin with the SLR(1) parser and determine the exact right context only for those states
in which the transition function is ambiguous. This technique reduces the computation time,
but unfortunately does not generalize to an algorithm that eliminates all chain productions.
Construction 7.7 requires a redundant effort that can be avoided in practice. For example,
the closure of a situation [X → α·Bβ; Γ] depends only upon the nonterminal B if the
lookahead set is ignored. The closure can thus be computed ahead of time for each B ∈ N,
and only the lookahead sets must be supplied during parser construction. Also, the repeated
construction of the follower state of an LALR(1) state that develops from the combination
of two LR(1) states with distinct lookahead sets can be simplified. This repetition, which
results from the marking of states as not yet examined, leaves the follower state (specified
as a set of situations) unaltered. It can at most add lookahead symbols to single situations.
This addition can also be accomplished without computing the entire state anew.
Our technique for chain production elimination is based upon an idea of Pager [1974].
Use of the failure matrix to increase the number of don't-care entries in the transition matrix
was first proposed by Joliat [1973, 1974].
Exercises
7.1 Consider a grammar with embedded connection points. Explain why transformations
of the grammar can be guaranteed to leave the invocation sequence of the associated
parser actions invariant.
7.2 State the LL(1) condition in terms of the extended BNF notation of Section 5.1.3.
Prove that your statement is equivalent to Theorem 7.2.
7.3 Give an example of a grammar in which the graph of LAST contains a cycle. Prove
that FOLLOW (A) = FOLLOW (B ) for arbitrary nodes A and B in the same strongly
connected subgraph.
7.4 Design a suitable internal representation of a grammar and program the generation
algorithm of Section 7.2.3 in terms of it.
7.5 Devise an LL(1) parser generation algorithm that accepts the extended BNF notation
of Section 5.1.3. Will you be able to achieve a more efficient parser by operating upon
this form directly, or by converting it to productions? Explain.
7.6 Consider the interpretive parser of Figure 7.9.
(a) Define additional operation codes to implement connection points, and add the
appropriate alternatives to the case statement. Carefully explain the interface
conventions for the parser actions. Would you prefer a different kind of parse
table entry? Explain.
(b) Some authors provide special operations for the situations [X → α·B] and [X →
α·tB]. Explain how some recursion can be avoided in this manner, and write
appropriate alternatives for the case statement.
(c) Once the special cases of (b) are recognized, it may be advantageous to provide
extra operations identical to 4 and 5 of Figure 7.9, except that the conditions are
reversed. Why? Explain.
(d) Recognize the situation [X → α·t] and alter the code of case 4 to absorb the
processing of the 2 operation following it.
(e) What is your opinion of the value of these optimizations? Test your predictions
on some language with which you are familiar.
7.7 Show that the following grammar is LR(1) but not LALR(1):
Z → A,
A → aBcB,  A → B,  A → D,
B → b,  B → Ff,
D → dE,
E → FcA,  E → FcE,
F → b
7.8 Repeat Exercise 7.5 for the LR case. Use the algorithm of Section 7.3.4.
7.9 Show that FIRST (A) can be computed by any marking algorithm for directed graphs
that obtains a `spanning tree', B , for the graph. B has the same node set as the original
graph, G, and its branch set is a subset of that of G.
7.10 Consider the grammar with the following productions:
Z → AXd,  Z → BX,  Z → C,
A → B,  A → C,
B → CXb,
C → c,
X → ε
(a) Derive an LALR(1) parser for this grammar.
(b) Delete the reductions by the chain productions A → B and A → C.
7.11 Use the techniques discussed in Section 7.3.5 to compress the transition matrix
produced for Exercise 7.8.
7.12 [Anderson et al., 1973] Consider a transition matrix for an LR parser constructed by
one of the algorithms of Section 7.3.2.
(a) Show that for every state q there is exactly one symbol z(q) such that f(q′, a) = q
implies a = z(q).
(b) Show that, in the case of shift-reduce transitions introduced by the algorithms
of Sections 7.3.3 and 7.3.4, an unambiguous symbol z(A → α) exists such that
f(q, a) = `shift and reduce A → α' implies a = z(A → α).
(c) Show that the states (and shift-reduce transitions) can be numbered in such a
way that all states in column c have sequential numbers c0 + i, i = 0, 1, ... Thus
it suffices to store only the relative number i in the transition matrix; the base
c0 is only given once for each column. In exactly the same manner, a list of the
reductions in a row can be assigned to this row and retain only the appropriate
index to this list in the transition matrix.
(d) Make these alterations in the transition matrix produced for Exercise 7.8 before
beginning the compression of Exercise 7.11, and compare the result with that
obtained previously.
7.13 [Bell, 1974] Consider an m × n transition matrix, t, in which all unspecified entries
are don't-cares. Show that the matrix can be compressed into a p × q matrix c, two
length-m arrays f and u, and two length-n arrays g and v by the following algorithm:
Initially fi = gj = ∞, 1 ≤ i ≤ m, 1 ≤ j ≤ n, and k = 1. If all occupied columns of
the ith row of t uniformly contain the value r, then set fi := k, k := k + 1, ui := r
and delete the ith row of t. If the jth column is uniformly occupied, delete it also and
set gj := k, k := k + 1, vj := r. Repeat this process until no uniformly-occupied row
or column remains. The remaining matrix is the matrix c. We then enter the row
(column) number in c of the former ith row (jth column) into ui (vj). The following
relation then holds:

ti,j = if fi < gj then ui
       else if gj < fi then vj
       else (* fi = gj = ∞ *) c[ui, vj]

(Hint: Show that the size of c is independent of the sequence in which the rows and
columns are deleted.)
Chapter 8
Attribute Grammars
Semantic analysis and code generation are based upon the structure tree. Each node of the
tree is `decorated' with attributes describing properties of that node, and hence the tree
is often called an attributed structure tree for emphasis. The information collected in the
attributes of a node is derived from the environment of that node; it is the task of semantic
analysis to compute these attributes and check their consistency. Optimization and code
generation can be also described in similar terms, using attributes to guide the transformation
of the tree and ultimately the selection of machine instructions.
Attribute grammars have proven to be a useful aid in representing the attribution of the
structure tree because they constitute a formal definition of all context-free and context-
sensitive language properties on the one hand, and a formal specification of the semantic
analysis on the other. When deriving the specification, we need not be overly concerned with
the sequence in which the attributes are computed because this can (with some restrictions) be
derived mechanically. Storage for the attribute values is also not reflected in the specification.
We begin by assuming that all attributes belonging to a node are stored within that node in
the structure tree; optimization of the attribute storage is considered later.
Most examples in this chapter are included to show constraints and pathological cases;
practical examples can be found in Chapter 9.
8.1 Basic Concepts of Attribute Grammars

A condition B(p) attached to a production p can be used to decide whether a structure tree built
with that production is semantically correct, and hence whether the program it represents is
translatable. We could also regard this condition as the computation of a Boolean attribute
consistent , which we associate with the left-hand side of the production.
As an example, Figure 8.1 gives a simplified attribute grammar for LAX assignments.
Each p ∈ P is marked by the keyword rule and written using EBNF notation (restricted to
express only productions). The elements of R(p) follow the keyword attribution. We use a
conventional expression-oriented programming language notation for the functions f, and ter-
minate each element with a semicolon. Particular instances of an attribute are distinguished
by numbering multiple occurrences of symbols in the production (e.g. name[1] , name[2] )
from left to right. Any condition is also marked by a keyword and terminated by a semicolon.
In order to check the consistency of the assignment and to further identify the + operator,
we must take the operand types into account. For this purpose we define two attributes,
primode and postmode , for the symbols expression and name , and one attribute, mode ,
for the symbol addop . Primode describes the type determined directly from the node and its
descendants; postmode describes the type expected when the result is used as an operand by
other nodes. Any difference between primode and postmode must be resolved by coercions.
The Boolean function coercible(t1, t2) tests whether type t1 can be coerced to t2.
Figure 8.2 shows the analysis of x := y + z according to the grammar of Figure 8.1.
(Assignment.environment would be computed from the declarations of x, y and z , but here
we show it as given in order to make the example self-contained.) Attributes on the same
line of Figure 8.2c can be computed collaterally; every attribute is dependent upon at least
one attribute from the previous line. These dependency relations can be expressed as a graph
(Figure 8.3). Each large box represents the production whose application corresponds to the
node of the structure tree contained within it. The small boxes making up the node itself
represent the attributes of the symbol on the left-hand side of the production, and the arrows
represent the dependency relations arising from the attribution rules of the production. The
node set of the dependency graph is just the set of small boxes representing attributes; its
edge set is the set of arrows representing dependencies.
Figure 8.3: Attribute dependencies in the analysis of x := y + z (the dependency graph itself
is not reproduced; each node carries the attributes environment, primode and postmode of
its symbol, and the arrows show the dependencies)
We must know all of the values upon which an attribute depends before we can compute
the value of that attribute. Clearly this is only possible if the dependency graph is acyclic.
Figure 8.3 is acyclic, but consider the following LAX type definition, which we shall discuss
in more detail in Sections 9.1.2 and 9.1.3:
type t = record (real x , ref t p );
We must compute a type attribute for each of the identifiers t, x and p so that the
associated type is known at each use of the identifier. The type attribute of t consists of
the keyword record plus the types and identifiers of the fields. Now, however, the type of p
contains an application of t, implying that the type identified by t depends upon which type
a use of t identifies. Thus the type t depends cyclically upon itself. (We shall show how to
eliminate the cycle from this example in Section 9.1.3.)
Let us now make the intuition gained from these examples more precise. We begin with
the grammar G, a set of attributes A(X ) for each X in the vocabulary of G, and a set of
attribution rules R(p) (and possibly a condition B (p)) for each p in the production set of G.
8.1 Definition
An attribute grammarSis a 4-tuple, AG = (G; A; R; B ). G = (T; N; P; Z )Sis a reduced context
free grammar, A = X 2T [N A(X )Sis a nite set of attributes, R = p2P R(p) is a nite
set of attribution rules, and B = p2P B (p) is a nite set of conditions. A(X ) \ A(Y ) 6=
; implies X = Y . For each occurrence of X in the structure tree corresponding to a sentence
of L(G), at most one rule is applicable for the computation of each attribute a 2 A(X ).
8.2 Definition
For each p : X0 → X1 ... Xn ∈ P the set of defining occurrences of attributes is AF(p) =
{Xi.a | Xi.a ← f(...) ∈ R(p)}. An attribute X.a is called derived or synthesized if there
exists a production p : X → χ and X.a is in AF(p); it is called inherited if there exists a
production q : Y → χXω and X.a ∈ AF(q).
Synthesized attributes of a symbol represent properties resulting from consideration of the
subtree derived from the symbol in the structure tree. Inherited attributes result from con-
sideration of the environment. In Figure 8.1, the name.primode and addop.operation at-
tributes were synthesized; name.environment and addop.mode were inherited.
Attributes such as the value of a constant or the symbol of an identifier, which arise in
conjunction with structure tree construction, are called intrinsic. Intrinsic attributes reflect
our division of the original context-free grammar into a parsing grammar and a symbol gram-
mar. If we were to use the entire grammar of Appendix A as the parsing grammar, we could
easily compute the symbol attribute of an identifier node from the subtree rooted in that
node. No intrinsic attributes would be needed because constant values could be assigned
to left-hand side attributes in rules such as letter ::= 'a'. Thus our omission of intrinsic
attributes in Definition 8.2 results in no loss of generality.
8.3 Theorem
The following sets are disjoint for all X in the vocabulary of G:
AS(X) = {X.a | ∃p : X → α ∈ P and X.a ∈ AF(p)}
AI(X) = {X.a | ∃q : Y → αXβ ∈ P and X.a ∈ AF(q)}
Further, there exists at most one rule X.a ← f(…) in R(p) for each p ∈ P and a ∈ A(X).
Evaluate name.environment
Move to name
Evaluate expression.environment
Move to expression
Evaluate name.postmode
Move to name
Evaluate expression.postmode
Move to expression
Move to parent
a) Procedure for assignment ::= name ':=' expression
Evaluate name[1].environment
Move to name[1]
Evaluate name[2].environment
Move to name[2]
Evaluate expression.primode
Move to parent
Evaluate name[1].postmode
Move to name[1]
Evaluate addop.mode
Move to addop
Evaluate name[2].postmode
Move to name[2]
Evaluate condition
Move to parent
b) Procedure for expression ::= name addop name
Evaluate name.primode
Move to parent
Evaluate condition
Move to parent
c) Procedure for name ::= identifier
Figure 8.4: Evaluation Procedures for Figure 8.1
the algorithms for p and q. In Figure 8.3, for example, the algorithms for expression ::= name
addop name and assignment ::= name ':=' expression are both involved in computation
of attributes for the expression node. Because all computation begins and ends at the
root, the general pattern of the (coroutine) interaction would be the following: The algorithm
for q computes values for some subset of AI (X ) using a sequence of evaluation instructions.
It then passes control to the algorithm for p by executing `move to child i'. After using a
sequence of evaluation operations to compute some subset of AS (X ), the algorithm for p
returns by executing `move to parent'. (Of course both algorithms could have other attribute
evaluations and moves interspersed with these; here we are considering only computation of
X 's attributes.) This process continues, alternating computation of subsets of AI (X ) and
AS (X ) until all attribute values are available. The last action of each algorithm is `move to
parent'.
Figure 8.4 gives possible algorithms for the grammar of Figure 8.1. Because a symbol like
expression can appear in several productions on the left or right sides, we always identify
the production for the child node by giving only the left-hand-side symbol. We do not answer
the question of which production is really used because in general we cannot know. For the
same reason we do not specify the parent production more exactly.
The attributes of X constitute the only interface between the algorithms for p and q.
When the algorithm for q passes control to the algorithm for p by executing `move to child i',
it expects that a particular subset of AS (X ) will be evaluated before control returns. Since the
algorithms must work for all structure trees, this subset must be evaluated by every algorithm
corresponding to a production of the form X → α. The same reasoning holds for subsets of
AI(X) evaluated by algorithms corresponding to productions of the form Y → αXβ.
8.9 Definition
Given a partition of A(X) into disjoint subsets Ai(X), i = 1, …, m(X) for each X in the
vocabulary of G, the resulting partition of the entire attribute set A is admissible if, for all
X, Ai(X) is a subset of AS(X) for i = m, m-2, … and Ai(X) is a subset of AI(X) for
i = m-1, m-3, … Ai(X) may be empty for any i.
8.10 Definition
An attribute grammar is partitionable if it is locally acyclic and an admissible partition exists
such that for each X in the vocabulary of G the attributes of X can be evaluated in the
order A1(X), …, Am(X). An attribute grammar together with such a partition is termed
partitioned.
Since all attributes can be evaluated, a partitionable grammar must be well-defined.
A set of attribution algorithms satisfying our constraints can be constructed if and only
if the grammar is partitioned. The admissible partition defines a partial ordering on A(X)
that must be observed by every algorithm. Attributes belonging to a subset Ai (X ) may be
evaluated in any order permitted by DDP (p), and this order may vary from one production
to another. No context switch across the X interface occurs while these attributes are being
evaluated, although context switches may occur at other interfaces. A move instruction
crossing the X interface follows evaluation of each subset.
The grammar of Figure 8.1 is partitioned, and the admissible partition used to construct
Figure 8.4 was:
A1(expression) = {environment}    A1(name) = {environment}
A2(expression) = {primode}        A2(name) = {primode}
A3(expression) = {postmode}       A3(name) = {postmode}
A4(expression) = {}               A4(name) = {}
A1(addop) = {mode}
A2(addop) = {operation}
A4 is empty in the cases of both expression and name because the last nonempty subset
in the partition consists of inherited attributes, while Definition 8.9 requires synthesized
attributes. At this point the algorithm actually contains a test of the condition, which we
have already noted can be regarded as a synthesized attribute of the left-hand-side symbol.
With this interpretation, it would constitute the single element of A4 for each symbol.
tree S, because then X.b could not be calculated before X.a as required by the fact that
i > j. DDP(p) gives direct dependencies for all attributes, but the graph of DT(S) includes
indirect dependencies resulting from the interaction of direct dependencies. These indirect
dependencies may lead to a cycle in the graph of DT(S) as shown in Figure 8.5. We need a
way of characterizing these dependencies that is independent of the structure tree.
[Figure 8.6 a): attribution rules and dependency graphs for the productions p, r and s, omitted here.]
IDS(X) = {a → b}
IDS(Y) = {c → e, d → f, e → d, f → c}
b) Induced dependencies for symbols
Figure 8.6: A Well-Defined Grammar
IDP (p) and IDS (X ) are pessimistic approximations to the desired dependency relations.
Any essential dependency that could be present in any structure tree is included in IDP (p)
and IDS (X ), and all are assumed to be present simultaneously. The importance of this point
is illustrated by the grammar of Figure 8.6, which is well-defined but not partitioned. Both
c → e and d → f are included in IDS(Y) even though it is clear from Figure 8.7 that only
one of these dependencies could occur in any structure tree. A similar situation occurs for
e → d and f → c. The result is that IDS(Y) indicates a cycle that will never be present in
any DT .
The pessimism of the indirect dependencies is crucial for the existence of a partitioned
grammar. Remember that it must always be possible to evaluate the attributes of X in
the order specified by the admissible partition. Thus the order must satisfy all dependency
relations simultaneously.
[Figure 8.7: The four structure trees obtainable from the grammar of Figure 8.6 by combining X ::= sY or X ::= tY with Y ::= u or Y ::= v; each tree shows the attributes a, b of X and c, d, e, f of Y.]
rule Z ::= s X X.
attribution
   X[1].a ← X[2].d ;
   X[1].c ← 1;
   X[2].a ← X[1].d ;
   X[2].c ← 2;
rule Z ::= t X X.
attribution
   X[1].a ← 3;
   X[1].c ← X[2].b ;
   X[2].a ← 4;
   X[2].c ← X[1].b ;
rule X ::= u.
attribution
   X.b ← X.a ;
   X.d ← X.c ;
[Figure: dependency graphs for the structure trees rooted in Z ::= sXX and Z ::= tXX, showing the attributes a, b, c, d at each X ::= u node.]
Here m is the smallest n such that Tn-1(X) ∪ Tn(X) = A(X), T-1(X) = T0(X) = ∅, and for
k > 0
T2k-1(X) = {a ∈ AS(X) | a → b ∈ IDS(X) implies b ∈ Tj(X), j ≤ 2k-1}
T2k(X) = {a ∈ AI(X) | a → b ∈ IDS(X) implies b ∈ Tj(X), j ≤ 2k}
This definition requires that all Tj(X) actually exist. Some attributes remain unassigned to
any Tj(X) if (and only if) the grammar is locally acyclic and some IDS contains a cycle.
For the grammar of Figure 8.10, construction 8.16 leads to the `obvious' partition discussed
above, which fails. Thus the grammar is not ordered, and we must conclude that the ordered
grammars form a proper subclass of the partitionable grammars.
Suppose that a partitioned attribute grammar is given, with partitions A1(X), …, Am(X)
for each X in the vocabulary. In order to construct an attribution algorithm for a production
p : X0 → X1 … Xn, we begin by defining a new attribute ci,j corresponding to each subset
Ai(Xj) of attributes not computed in the context of p. (These are the inherited attributes
Ai(X0), i = m-1, m-3, … of the left-hand side and the synthesized attributes Ai(Xj), j ≠
0, i = m, m-2, … of the right-hand side symbols.) For example, the grammar of Figure 8.1
is partitioned as shown at the end of Section 8.2.1. In order to construct the attribution
algorithm of Figure 8.4b, we must define new attributes as shown in Figure 8.11a.
Every occurrence of an attribute from Ai(Xj) is then replaced by ci,j in DP(p) ∪ DDP(p),
as illustrated by Figure 8.11b. DP(p) alone does not suffice in this step because it was derived
(via IDP(p)) from NDDP(p), and thus does not reflect all dependencies of DDP(p). In
Figure 8.11b, for example, the dependencies expression.primode → name[i].postmode
(i = 1, 2) are in DDP but not DP.
Figure 8.11b has a single node for each ci,j because each partition contains a single at-
tribute. In general, however, partitions will contain more than one attribute. The resulting
graph still has only one node for each ci,j. This node represents all of the attributes in Ai(Xj),
and hence any relation involving an attribute in Ai(Xj) is represented by an edge incident
upon this node.
The graph of Figure 8.11b describes a partial order. To obtain an attribution algorithm,
we augment the partial order with additional dependencies, consistent with each other and
with the original partial order, until the nodes are totally ordered. Figure 8.11c shows such
additional dependencies for Figure 8.11b. The total order defines the algorithm: Each element
that is an attribute in AF(p) corresponds to a computation of that attribute, each element
ci,0 corresponds to a move to the parent, and each element ci,j (j > 0) corresponds to a move
to the jth child. Finally, a `move to parent' operation is added to the end of the algorithm.
Figure 8.4b is the algorithm resulting from the analysis of Figure 8.11.
The construction sketched above is correct if we can show that all attribute dependencies
from IDP (p) and DDP (p) are accounted for and that the interaction with the moves between
nodes is proper. Since IDP (p) is a subset of DP (p), problems can only arise from the merging
of attributes that are not elements of AF(p). We distinguish five cases:
Xi.a → Xi.b ∈ IDP(p), a ∉ AF(p), b ∉ AF(p)
Xi.a → Xi.b ∈ IDP(p), a ∈ AF(p), b ∉ AF(p)
Xi.a → Xi.b ∈ IDP(p), a ∉ AF(p), b ∈ AF(p)
Xi.a → Xj.b ∈ IDP(p), i ≠ j, a ∉ AF(p)
Xi.a → Xj.b ∈ IDP(p), i ≠ j, b ∉ AF(p)
In the first case the dependency is accounted for in all productions q for which a and b
are elements of AF(q). In the second and third cases Xi.a and Xi.b must belong to different
subsets Ar(Xi) and As(Xi). The dependency manifests itself in the ordering condition r < s
or s < r, and will not be disturbed by collapsing either subset. In the fourth case we compute
Xj.b only after all of the attributes in the subset to which Xi.a belongs have been computed;
this is simply an additional restriction. The fifth case is excluded by Definition 8.11: Xi.a →
Xj.b cannot be an element of DDP(p) because Xj.b is not in AF(p); it cannot be an element
of any IDS because i ≠ j.
c1,0 = {expression.environment}
c3,0 = {expression.postmode}
c2,1 = {name[1].primode}
c4,1 = {}
c2,2 = {addop.operation}
c2,3 = {name[2].primode}
c4,3 = {}
a) New attributes
[Figure 8.11 b) and c): the dependency graph over these nodes and the attributes computed in the production (e.g. addop.mode and expression.primode), and the additional dependencies completing the total order, omitted here.]
When an algorithm begins with a visit ci,j, this visit may or may not actually be carried
out. Suppose that the structure tree has been completed before the attribution is attempted.
The traversal then begins at the root, and every algorithm will be initiated by a `move to child
i'. Now if the first action of the algorithm is c1,0, i.e. a move to the parent to compute inherited
attributes, this move is superfluous because the child is only invoked if these attributes are
available. Hence the initial c1,0 should be omitted. The situation is reversed if the tree is
being processed bottom-up, as when attribution is merged with a bottom-up parse: An initial
ci,j that causes a move to the leftmost subtree should be omitted.
Semantic conditions are taken care of in this schema by treating them as synthesized
attributes of the left-hand side of the production. They can be introduced into an algorithm
at any arbitrary point following computation of the attributes upon which they depend.
In practice, conditions should be evaluated as early as possible to enhance semantic error
recovery and reduce the lifetime of attributes.
8.20 Theorem
An attribute grammar is LAG(k) if and only if it is locally acyclic and a partition A =
A1 ∪ … ∪ Ak exists such that for all p : X0 → X1 … Xn ∈ P, Xi.a → Xj.b ∈ DDP(p),
a ∈ Au(Xi), b ∈ Av(Xj) implies one of the following conditions:
u < v
u = v and j = 0
u = v and i = 0 and a ∈ AI(X0)
u = v and 1 ≤ i < j
u = v and 1 ≤ i = j and a ∈ AI(Xi)
Theorem 8.20 leads directly to a procedure for determining the partition and the value
of k from a locally acyclic grammar (Figure 8.13; see the sketch after the list below). For
k = 1, 2, … this procedure assumes that all remaining attributes belong to Ak and then
deletes those for which this assumption violates the theorem. There are two distinct stopping
conditions:
No attribute is deleted. The number of traversals is k and the partition is A1, …, Ak.
All attributes are deleted. The conditions of Theorem 8.20 cannot be met and hence
the attribute grammar is not LAG(k) for any k.
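The following Pascal skeleton sketches one way such a procedure might be organized; it is our own reconstruction, not Figure 8.13 itself. The bound max_attr, the numbering of attributes, and the test violates_theorem (which must check the five conditions of Theorem 8.20 against the current candidate set) are assumptions made purely for illustration.

const max_attr = 100;                  (* assumed bound on attribute count *)
type attr_set = set of 1..max_attr;
var traversal : array [1..max_attr] of integer;  (* resulting partition *)

function violates_theorem (a : integer; candidate : attr_set) : boolean;
begin
   (* hypothetical: true if placing a in the current traversal would
      violate one of the five conditions of Theorem 8.20 *)
   violates_theorem := false
end;

function determine_traversals (all_attributes : attr_set;
                               var k : integer) : boolean;
var remaining, candidate : attr_set;
    a : integer;
    deleted : boolean;
begin
   remaining := all_attributes; k := 0;
   repeat
      k := k + 1;
      candidate := remaining;          (* assume all remaining are in Ak *)
      repeat                           (* delete violators to a fixpoint *)
         deleted := false;
         for a := 1 to max_attr do
            if (a in candidate) and violates_theorem (a, candidate) then
               begin candidate := candidate - [a]; deleted := true end
      until not deleted;
      for a := 1 to max_attr do        (* survivors form partition Ak *)
         if a in candidate then traversal[a] := k;
      remaining := remaining - candidate
   until (remaining = []) or (candidate = []);
   determine_traversals := remaining = []  (* false: not LAG(k) for any k *)
end;

If nothing is deleted in some round, all remaining attributes fit into Ak and the function succeeds with k traversals; if everything is deleted, no progress is possible and the grammar is not LAG(k) for any k, exactly the two stopping conditions above.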
Analogous constructions are possible for RAG(k) grammars and for the alternating evalu-
able attribute grammars (AAG(k)). With the latter class, structure tree attributes are evalu-
ated by traversals that alternate in direction: The first is left-to-right, the second right-to-left,
and so forth. We leave the derivation of these definitions and theorems, plus the necessary
processing routines, to the reader.
It is important to note that the algorithm of Figure 8.13 and its analogs for RAG(k) and
AAG(k) assign attributes to the first traversal in which they might be computed. These
algorithms give no indication that it might also be possible to evaluate an attribute in a later
traversal without delaying evaluation of other attributes or increasing the total number of
traversals.
rule Z ::= X .
attribution
   X.b ← 1;
rule X ::= W X.
attribution
   X[1].a ← W.c ;
   X[2].b ← X[1].b ;
   W.d ← X[2].a ;
rule Z ::= X .
attribution
   X.b ← 1;
rule X ::= W X Y.
attribution
   X[1].a ← W.c ;
   X[1].e ← Y.g ;
   X[2].b ← X[1].b ;
   W.d ← X[2].a ;
   Y.f ← X[2].e ;
class expression ;
begin comment Declarations of primode , postmode and environment end;
class name ;
begin comment Declarations of primode , postmode and environment end;
class addop ;
begin comment Declarations of mode and operation end;
expression class p2 ;
begin ref (name) X1 ; ref (addop) X2 ; ref (name) X3 ;
comment Initialization of X1 , X2 and X3 needed here;
detach;
X1.environment := environment ;
resume (X1) ;
X3.environment := environment ;
resume (X3) ;
primode := if ... ;
detach ;
X1.postmode := primode ;
resume (X1) ;
X2.mode := primode ;
resume (X2) ;
X3.postmode := primode ;
resume (X3) ;
if ... ; comment Evaluate the condition;
detach ;
end;
Figure 8.16: SIMULA Implementation of Figure 8.4b
transformed to a collection of recursive procedures, or embodied in a set of tables to be
interpreted. We shall discuss each of these possibilities in turn.
The coroutines can be coded directly in SIMULA as classes, one per symbol and one
per production. Each symbol class defines the attributes of the symbol and serves as a
prefix for classes representing productions with that symbol on the left side. This allows us
to obtain access to a subtree having a particular symbol as its root without knowing the
production by which it was constructed. Terminal nodes t are represented only by the class
t. Each production class contains pointer declarations for all of its descendants X1 … Xn. A
structure tree is built using statements of the form node :- new p (or node :- new t )
to create nodes and assignments of the form node.xi :- subnode to link them. Since a side
effect of new is execution of the class body, the first statement of each class body is detach
(return to caller). (Intrinsic attributes could be initialized by statements preceding this first
detach.) Figure 8.16 gives the SIMULA coding of the procedure from Figure 8.4b.
Figure 8.17 gives an implementation using recursive procedures. The tree is held in a
data structure made up of the nodes defined in Figure 8.17a. When a node corresponding to
application of p : X0 → X1 … Xn is created, its fields are initialized as follows:
s_p = p
x_p[i] = pointer to node representing Xi, i = 1, …, n
The body of a coroutine is broken at the detach statements, with each segment forming
one branch of the case statement in the corresponding procedure. Then detach is imple-
mented by simply returning; resume (Xi) is implemented by sproc_s (x_p[i], k), where sproc_s
type
   tree_pointer = ^tree_node ;
   tree_node = record
      case symbols of
         s : (* one per symbol in the vocabulary *)
            ( ... (* storage for attributes of s *)
              case s_p : integer of
                 p : (* one per production p : s → X1 ... Xn *)
                    (x_p : array [1..n] of tree_pointer ) )
   end ;
a) General structure of a node
procedure pproc_p (t : tree_pointer; k : integer) ;
   (* one procedure per production *)
begin (* pproc_p *)
   case k of
      0 : ... (* actions up to the first detach *)
      ...     (* successive segments *)
   end ;
end; (* pproc_p *)
b) General structure of a production procedure
procedure sproc_s (t : tree_pointer; k : integer) ;
   (* one procedure per symbol *)
begin (* sproc_s *)
   case t^.s_p of
      p : pproc_p (t, k) ; (* one case element per production *)
      ...
   end;
end; (* sproc_s *)
Figure 8.17: Transformation of Coroutines to Procedures
is the procedure corresponding to symbol Xi and k is the segment of that procedure to be ex-
ecuted. Figure 8.18 shows the result of applying the transformation to Figure 8.16. We have
followed the schema closely in constructing this example, but in practice the implementation
can be greatly simplified.
A tabular implementation, in which the stack is explicit, can be derived from Figure 8.17.
It involves a pushdown automaton that walks the structure tree, invoking evaluate in much
the same way that the parsing automata of Chapter 7 invoke parser actions to report connec-
tion points. In each case the automaton communicates with another processor via a sequence
of simple data items. Thus the implementations of the automaton and the communicating
processor are quite distinct, and different techniques may be used to carry them out. The
number of actions is usually very large, and when deciding how to handle them one must take
account of any restrictions imposed by the implementation language and its compiler.
Figure 8.19 shows how the pushdown automaton is implemented. Each entry in the table
corresponds to an element of some algorithm and there is an auxiliary function, segment ,
such that segment (k, p) is the index of the first entry for the kth segment of the algorithm
for production p. If the element corresponds to Xi.a then it specifies the computation in
some appropriate manner (perhaps as a case index or procedure address); otherwise it simply
contains the pair of integers defining the visit. Because the selectors for a visit must be
extracted from the table, rather than being built into the procedure, the tree node must be
represented as shown in Figure 8.19b.
type
   tree_pointer = ^tree_node ;
   tree_node = record
      case symbols of
         expression :
            (expression_environment : environment ;
             expression_primode, expression_postmode : type_specification ;
             case expression_2 : integer of
                2 : (x_2 : array [1..3] of tree_pointer );
                ... );
         name :
            (name_environment : environment ;
             name_primode, name_postmode : type_specification ; ... );
         addop :
            (addop_mode : type_specification ; ... )
   end ;
procedure sproc_expression (t : tree_pointer; k : integer );
begin (* sproc_expression *)
case t^.expression_2 of
2 : pproc_2 (t , k );
end ;
end; (* sproc_expression *)
procedure pproc_2 (t : tree_pointer; k : integer );
begin (* pproc_2 *)
   case k of
      0 : (* construction of subtrees *);
      1 : begin
             t^.x_2[1]^.name_environment := t^.expression_environment ;
             sproc_name (t^.x_2[1], 1 );
             t^.x_2[3]^.name_environment := t^.expression_environment ;
             sproc_name (t^.x_2[3], 1 );
             t^.expression_primode := if ... ;
          end ;
      2 : begin
             t^.x_2[1]^.name_postmode := t^.expression_primode ;
             sproc_name (t^.x_2[1], 2 );
             t^.x_2[2]^.addop_mode := t^.expression_primode ;
             sproc_addop (t^.x_2[2], 1 );
             t^.x_2[3]^.name_postmode := t^.expression_primode ;
             sproc_name (t^.x_2[3], 2 );
             if ... ; (* evaluate the condition *)
          end ;
   end ;
end; (* pproc_2 *)
Simplifications in the general coding procedure are possible for LAG(k), RAG(k) and
AAG(k) grammars. When k = 1 the partition for each X is A1(X) = AI(X), A2(X) =
AS(X), so no intermediate detach operations occur in the coroutines. This, in turn, means
that no case statement is required in the production procedures or in the interpretive model.
For k > 1 there are k + 1 segments in each procedure pproc_p, corresponding to the ini-
tialization and k traversals. It is best to gather together the procedures for each traversal
as though dealing with a grammar for which k = 1, and then run them sequentially. When
parsing by recursive descent, the tree construction, the calculation of intrinsic attributes and
the first tree traversal can be combined with the parsing.
type
   table_entry = record
      case is_computation : boolean of
         true :  (* an attribute computation Xi.a from Rp *)
            (rule : attribute_computation );
         false : (* a visit: segment number and child *)
            (segment_number, child : integer )
   end ;
a) Structure of a table entry
type
tree_pointer = ^tree_node ;
tree_node = record
   production : integer ;
   X : array [1..max_right_hand_side] of tree_pointer
end ;
b) Structure of a tree node
procedure interpret ;
label 1 ;
var t : tree_pointer ;
state , next : integer ;
begin (* interpret *)
t := root_of_the_tree ;
state := segment (0, t^.production );
repeat
next := state + 1;
with table[state] do
if is_computation then evaluate (t , rule )
else if segment_number <> 0 then
begin
stack_push (t , next );
t := t^.X[child] ;
next := segment (segment_number, t^.production );
end
else if stack_empty then goto 1
else stack_pop (t , next );
state := next ;
until false ; (* forever *)
1 : end; (* interpret *)
c) Table interpreter
Figure 8.19: Tabular Implementation of Attribution Algorithms
block or procedure node is given during the processing of the nodes in the subtree, then we
obtain the same environment: First we reach the local definitions in the innermost enclosing
block and, in the same manner, the next outermost, etc. The search of the environment for
a suitable definition thus becomes a search of the local definition lists from inner to outer.
Attributes should often be completely removed from the corresponding nodes and repre-
sented by global variables or linked structures in global storage. We have already noted that
it is usually impossible to retain the entire structure tree in memory. Global storage is used
to guarantee that an attribute accessible by a pointer is not moved to secondary storage with
the corresponding node. Global storage is also useful if the exact size of an attribute cannot
be determined a priori. Finally, global storage has the advantage that it is directly accessible,
without the need to pass pointers as parameters to the evaluation procedures.
If the environment is kept as a global attribute then it is represented by a list of local
definitions belonging to the nested blocks or procedures. In order to be certain that the
`correct' environment is visible at each node we alter the global attribute during the traversal
of the structure tree: When we move to a block or procedure node from its parent, we copy
the local definition set to this environment variable; when we return to the parent we delete
it.
The description in the previous paragraph shows that in reality we are using a global
data structure to describe several related attribute values. This situation usually occurs with
recursive language elements such as blocks. The environment attribute shows the typical
situation for inherited attributes: Upon descent in the tree we alter the attribute value, for
example increasing its size; the corresponding ascent in the tree requires that the previous
state be restored. Sometimes, as in the case of the nesting depth attribute of a LAX block,
restoration is a simple inverse of the computation done on entry to the substructure. Often
there is no inverse, however, and the old value of the attribute must be saved explicitly. (The
environment represents an intermediate situation that we shall consider in Section 9.3.)
By replacing the global variable with a global stack, we can handle such cases directly.
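The following sketch illustrates this save/restore discipline. It is only an illustration under assumed names: env stands for whatever representation the attribute has, and the caller is assumed to compute the new value (for the environment, from the node's local definition set) before descending.

const max_depth = 100;              (* assumed limit on nesting depth *)
type env = ^env_element;            (* stand-in attribute representation *)
     env_element = record next : env end;
var current_env : env;              (* the global attribute *)
    saved : array [1..max_depth] of env;
    top : integer;

procedure enter (new_env : env);    (* called on descent into a block *)
begin
   top := top + 1;
   saved[top] := current_env;       (* save: there may be no inverse *)
   current_env := new_env
end;

procedure leave;                    (* called on the corresponding ascent *)
begin
   current_env := saved[top];       (* restore the previous value *)
   top := top - 1
end;

When the change made on descent does have a simple inverse (as with a nesting depth counter), the stack is unnecessary and the inverse computation replaces the restore.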
Global variables and stacks are also useful for synthesized attributes, and the analysis par-
allels that given above. Here we usually find that attribute values replace each other at suc-
cessive ascents in the tree. An example is the primode computation in a LAX case clause:
Ordered attribute grammars were originated by Kastens [1976, 1980], who used the term
`arranged orderly' to denote a partitioned grammar. OAG is a subclass of ANCAG for which
no decisions about evaluation order are made dynamically; all have been shifted to evaluator
construction time. This means that attribute lifetimes can be determined easily, and the
optimizations discussed in Section 8.3.2 can be applied automatically: In a semantic analyzer
for Pascal, constructed automatically from an ALADIN description by the GAG [Kastens
et al., 1982] system, attributes occupied only about 20% of the total structure tree storage.
Lewis et al. [1974] studied the problem of evaluating all attributes during a single depth-
first, left-to-right traversal of the structure tree. Making no use of the local acyclicity of
DDP(p), they derived the first three conditions we stated in Theorem 8.18. The same con-
ditions were deduced independently by Bochmann [1976], who went on to point out that
dependencies satisfying the fourth condition of Theorem 8.18 are allowed if the relationship
NDDP(p) is used in place of DDP(p). There is no real need for this substitution, however,
because if DDP(p) is locally acyclic then the dependency Xi.a → Xj.b immediately rules out
Xj.b → Xi.a. Thus dependencies satisfying the fourth condition of Theorem 8.18 cannot lead
to any problem in left-to-right evaluation. Since local acyclicity is a necessary condition for
well-definedness, this assumption does not result in any loss of generality.
LAG(k) conditions similar to those of Theorem 8.20 were also stated by Bochmann [1976].
Again, he did not make use of local acyclicity to obtain the last condition of our result.
Systems based upon LAG(k) grammars have been developed at the Université de Montréal
[Bochmann and Lecarme, 1974] and the Technische Universität München [Giegerich,
1979].
The theoretical underpinnings of the latter system are described by Ripken [1977], Wil-
helm [1977] and Ganzinger [1978]. Wilhelm's work combines tree transformation with
attribution.
Alternating-evaluable grammars were introduced by Jazayeri and Walter [1975] as a
generalization of Bochmann's work. Their algorithm for testing the AAG(k) condition does
not provide precise criteria analogous to those of Theorem 8.18, but rather uses specifications
such as `occur before [the current candidate] in the present pass' to convey the basic idea. A
group at the University of Helsinki developed a compiler generator based upon this form of
grammar [Räihä and Saarinen, 1977; Räihä et al., 1978].
Asbrock [1979] and Pozefsky [1979] consider the question of attribute overlap mini-
mization in more detail.
Jazayeri and Pozefsky [1977] and Pozefsky [1979] give a completely different method
of representing a structure tree and evaluating a multi-pass attribute grammar. They propose
that the parser create k sequential files Di such that Di contains the sequence of attribution
rules with parameters for pass i of the evaluation. Thus Di contains, in sequential form,
the entire structure of the tree; only the attribute values, arbitrarily arranged and without
pointers to subnodes, are retained in memory. Pozefsky [1979] also considers the question
of whether the evaluation of a multi-pass grammar can be arranged to permit overlaying of
the attributes in memory.
Exercises
8.1 Write an attribute grammar describing a LAX basic symbol as an identifier , integer
or floating point . (Section A.1 describes these basic symbols.) Your grammar should
compute the intrinsic attributes discussed in Section 4.1.1 for each basic symbol (with
the exception of location) as synthesized attributes. Use no intrinsic attributes in your
grammar. Be sure to invoke the appropriate symbol and constant table operations
during your computation.
8.2 [Banatre et al., 1979] Write a module for a given well-defined attribute grammar
(G, A, R, B) that will build the attributed structure tree of a sentence of L(G). The
interface for the module must provide creation, access and assignment operations as
discussed in Section 4.1.2. The creation and assignment operations will be invoked by
parser actions to build the structure tree and set intrinsic attribute values; the access
operation will be invoked by other modules to examine the structure of the tree and
attribute values of the nodes. Within the module, access and assignment operations are
used to implement attribution rules. You may assume that all invocations of creation
and assignment operations from outside the module will precede any invocation of an
access operation from outside. Invocations from within the module must, of course, be
scheduled according to the dependencies of the attribute grammar. You may provide
an additional operation to be invoked from outside the module to indicate the end of
the sequence of external creation and assignment invocations.
8.3 Consider the following attribute grammar:
rule Z ::= s X.
attribution
   X.a ← X.c ;
   X.b ← X.a ;
rule Z ::= t X.
attribution
   X.b ← X.d ;
   X.a ← X.b ;
rule X ::= u.
attribution
   X.d ← 1;
   X.c ← X.d ;
rule X ::= v.
attribution
   X.c ← 2;
   X.d ← X.c ;
(a) Show that this grammar is partitionable using the admissible partition A1(X) =
{c, d}, A2(X) = {a, b}, A3(X) = {}.
(b) Compute IDP(p) and IDS(X) replacing NDDP(p) by DDP(p) in Defini-
tion 8.12. Explain why the results are cyclic.
(c) Modify the grammar to make IDP(p) and IDS(X) acyclic under the modification
of Definition 8.12 postulated in (b).
(d) Justify the use of NDDP(p) in Definition 8.12 in terms of the modification of (c).
8.4 Compute IDP and IDS for all p and X in the grammar of Figure 8.1. Apply construc-
tion 8.16, obtaining a partition (different from that given at the end of Section 8.2.1),
and verify that Theorem 8.13 is satisfied. Compute DP for all p, and verify that
Theorem 8.15 is satisfied.
8.5 Show that a partitionable grammar that is not ordered can be made into an ordered
grammar by adding suitable `artificial dependencies' X.a → X.b to some IDS(X).
(In other words, the gap between partitionable and ordered grammars can always be
bridged by hand.)
8.6 Define a procedure Evaluate_P for each production of an LAG(1) grammar such that
all attributes of a structure tree can be evaluated by applying Evaluate_Z (where Z
is the production defining the axiom) to the root.
8.7 A right-to-left attribute grammar may have both inherited and synthesized attributes.
All of the attribute values can be obtained in some number of depth-first, right-to-left
traversals of the structure tree. State a formal definition for RAG(k) analogous to
Definition 8.19 and prove a theorem analogous to Theorem 8.20.
8.8 [Jazayeri and Walter, 1975] Define the class of alternating evaluable attribute gram-
mars AAG(k) formally, state the condition they must satisfy, and give an analysis pro-
cedure for verifying this condition. (Hint: Proceed as for LAG(2k), but make some of
the conditions dependent upon whether the traversal number is odd or even.)
8.9 Extend the basic definitions for multi-pass attribute grammars to follow the hybrid
linearization strategy of Figure 4.4d: Synthesized attributes can be evaluated not only
at the last visit to a node but also after the visit to the ith subnode, 1 ≤ i ≤ n, or
even prior to the first subnode visit (i = 0). How does this change the procedure
determine_traversals?
8.10 Show that the LAG(k), RAG(k) or AAG(k) condition can be violated by a partitionable
attribute grammar only when a syntactic rule leads to recursion.
8.11 Complete the class definitions of Figure 8.16 and fill in the remaining details to obtain
a complete program that parses an assignment statement by recursive descent and then
computes the attributes. If you do not have access to SIMULA, convert the schema
into MODULA2, Ada or some other language providing coroutines or processes.
8.12 Under what conditions will the tabular implementation of an evaluator for a partitioned
attribute grammar require less space than the coroutine implementation?
8.13 Give detailed schemata similar to Figure 8.17 for LAG(k) and AAG(k) evaluators,
along the lines sketched at the end of Section 8.3.1.
8.14 Consider the implementation strategies for attribution algorithms exemplified by Fig-
ures 8.17 and 8.19.
(a) Explain why the tree node of Figure 8.19b is less space-efficient than that of
Figure 8.17a.
(b) Show that, by coding the interpreter of Figure 8.19c in assembly language and
assigning appropriate values to the child field of Figure 8.19a, it is possible to use
the tree node of Figure 8.17a and also avoid the need for the sproc_s procedures
of Figure 8.17c.
8.15 Modify Figure 8.1 by replacing name with expression everywhere, and changing the
second rule to expression ::= '(' expression addop expression ')'. Consider an
interpretive implementation of the attribution algorithms that follows the model of
Exercise 8.16.
(a) Show the memory layout of every possible node.
(b) Define another rule, addop ::= '-', with a suitable attribution procedure. What
nodes are affected by this change, and how?
(c) Show that the addop node can be incorporated into the expression node without
changing the attribution procedures for addop . What is the minimum change
necessary to the interpreter and the attribution procedure for expression ? (Hint:
Introduce a second interpretation for ci,j.)
Chapter 9
Semantic Analysis
Semantic analysis determines the properties of a program that are classed as static semantics
(Section 2.1.1), and verifies the corresponding context conditions, the consistency of these
properties.
We have already alluded to all of the tasks of semantic analysis. The first is name anal-
ysis, finding the definition valid at each use of an identifier. Based upon this information,
operator identification and type checking determine the operand types and verify that they
are allowable for the given operator. The terms `operator' and `operand' are used here in
their broadest sense: Assignment is an operator whether the language definition treats it as
such or not; we also speak of procedure parameter transmission and block end (end of extent)
as operations.
Section 9.1 is devoted to developing a formal specification of the source language from
which analysis algorithms can be mechanically generated by the techniques of Chapters 5-8.
Our goal for the specification is clarity, so that we can convince ourselves of its correctness.
This is an important point, because the correspondence between the specification and the
given source language cannot be checked formally. In the interest of clarity, we often use
impractically inefficient descriptions that give the effect of auxiliary functions, but do not
reflect their actual implementation. Section 9.2 discusses the practical implementation of
these auxiliary functions by modules.
advantage of using attribute grammars (or some other formal description tool such as denota-
tional semantics) lies in the fact that one has a comprehensive and unified specification. This
ensures that the parsing grammar, structure tree and semantic analysis `fit together' without
interface problems.
Development of an attribute grammar consists of the following interdependent steps:
Development of the context-free syntax.
Determination of the attributes and specication of their types.
Development of the attribution rules.
Formulation of the auxiliary functions.
Three major aspects of semantic analysis described via attribution are scope and name
analysis, types and type checking, and operator identification in expressions. With a few
exceptions, such as the requirement for distinct case labels in a case clause (Section A.4.6),
all of the static semantic rules of LAX fall into these classes. Sections 9.1.1 to 9.1.4 examine
the relevant attribution rules in detail.
Many of the attribution rules in a typical attribute grammar are simple assignments. To
reduce the number of such assignments that must be written explicitly, we use the following
conventions: A simple assignment to a synthesized attribute of the left-hand side of a pro-
duction may be omitted when there is exactly one symbol on the right-hand side that has
a synthesized attribute with the same name. Similarly, simple assignments of inherited at-
tributes of the left-hand side to same-named inherited attributes of any number of right-hand
side symbols may be omitted. In important cases we shall write these (semantic) transfers
for emphasis. (Attribute grammar specification languages such as ALADIN [Kastens et al.,
1982] contain even more far-reaching conventions.)
We assume for every record type R used to describe attributes the existence of a function
N_R whose parameters correspond to the fields of the record. This function creates a new
record of type R and sets its fields to the parameter values. Further, we may define a list of
objects by records of the form:
type
   t_list = ^t_list_element ;
   t_list_element = record
      first : t ; rest : t_list end;
If e is an object of type t then we shall also regard e as a single element of type
t_list wherever the context requires this interpretation. We write l1 & l2 to indicate
concatenation of two lists, and hence e & l describes addition of the single element e to the
front of the list l. `Value semantics' are assumed for list assignment: A copy of the entire
list is made and this copy becomes the value of the attribute on the left of the arrow.
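As an illustration only, these conventions might be realized by operations like the following Pascal sketch; the stand-in element type t and the names single, concat and copy_list are our inventions, not part of the specification.

type t = integer;                   (* stand-in element type *)
     t_list = ^t_list_element;
     t_list_element = record first : t; rest : t_list end;

function single (e : t) : t_list;   (* e regarded as a one-element list *)
var l : t_list;
begin
   new (l); l^.first := e; l^.rest := nil; single := l
end;

function concat (l1, l2 : t_list) : t_list;   (* l1 & l2 *)
var h : t_list;
begin
   if l1 = nil then concat := l2
   else begin
      new (h);                       (* copy the elements of l1 *)
      h^.first := l1^.first;
      h^.rest := concat (l1^.rest, l2);
      concat := h
   end
end;

function copy_list (l : t_list) : t_list;
begin
   copy_list := concat (l, nil)      (* a complete copy of l *)
end;

With these operations, e & l corresponds to concat (single (e), l), and the value semantics of an attribute assignment corresponds to storing copy_list (...) rather than the pointer itself.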
9.1.1 Scope and Name Analysis
The scope of identifiers is specified in most languages by the hierarchical structure of the
program. In block structured languages the scopes are nested. Languages like FORTRAN
have only a restricted number of levels in the hierarchy (level 1 contains the subprogram and
COMMON names, level 2 the local identifiers of a subprogram including statement numbers).
Further considerations are the use of implicit definition (FORTRAN), the admissibility (AL-
GOL 60) or inadmissibility (LIS) of new definitions in inner blocks for identifiers declared in
outer blocks, and the restriction of scope to the portion of the block following the definition
(C). We shall consider the special properties of field selectors in Section 9.1.3.
Every definition of an identifier is represented in the compiler by a variant record. The
types of Figure 9.1a suffice for LAX; different variants would be required for other languages.
type
   definition_class = (
      object_definition ,   (* Section A.3.1 *)
      type_definition ,     (* Section A.3.1 *)
      label_definition ,    (* Section A.2.6 *)
      unknown_definition ); (* Undefined identifier *)
   definition = record
      uid : integer ;       (* Discussed in Section 9.1.3 *)
      ident : symbol ;      (* Identifier being defined *)
      case k : definition_class of
         object_definition : (object_type : mode );  (* mode is discussed *)
         type_definition : (defined_type : mode );   (* in Section 9.1.2 *)
         label_definition ,
         unknown_definition : ()
   end ;
a) The attributes of an identifier
definition_table = ^dt_element ;
dt_element = record first : definition ; rest : definition_table end;
b) Type of the environment attribute
rule name ::= identifier_use .
condition
identifier_use.corresponding_definition.k = object_definition ;
const i = 17;
type t = ... ; (* First declaration of t *)
procedure p ;
type
   j = i; (* Use of i illegal here *)
   i = 1; (* This makes the previous line illegal *)
type
   tt = ^t ; (* Refers to second declaration of t *)
   t = ... ; (* Second declaration of t *)
declaration of tt is correct and identifies the type whose declaration appears on the next
line. This problem can be solved by a variant of the standard technique for dealing with
declarations in a one-pass ALGOL 60 compiler (Exercise 9.5).
9.1.2 Types
A type specifies the possible operations on an entity and the coercions that can be applied
to it. During semantic analysis this information is used to identify operators and verify
the compatibility of constructs with their environment. We shall concentrate on languages
with manifest types. Languages with latent types, in which type checking and operator
identification occur during execution, are treated in the same manner except that these tasks
are deferred.
In order to perform the tasks outlined in the previous paragraph, every structure tree
node that represents a value must have an attribute describing its type. These attributes
are usually tree-valued, and are built of linked records. For uniformity, the compiler writer
should define a single record format to be used in building all of them. The record format
must therefore be capable of representing the type of any value that could appear in a source
program, regardless of whether the language definition explicitly describes that value as being
typed. For example, the record format used in a LAX compiler must be capable of representing
the type of nil because nil can appear as a value. Section A.3.1 does not describe nil as
having a specific type, but says that it `denotes a value of type ref t, for arbitrary t'.
Figure 9.7 defines a record that can be used to build attributes describing LAX types.
Type class bad_type is used to indicate that errors have made it impossible to determine the
proper type. The type itself must be retained, however, since all attributes must be assigned
values during semantic analysis. Nil_type is the type of the predefined identifier nil. We
also need a special mechanism for describing the result type of a proper procedure. Void_type
specifies this case, and in fact is used whenever a result is to be discarded.
For languages like ALGOL 60 and FORTRAN, which have only a fixed number of types,
an enumeration similar to type_class serves to represent all types. Array types must also
specify the number of dimensions, but the element type can be subsumed into the enumeration
(e.g. integer_array_type or real_array_type). Pascal requires additional specifications for
the index bounds; in LAX the bounds are expressions whose values do not belong to the static
semantics, as illustrated by the rules of Figure 9.8.
Figure 9.9 shows how procedure types are constructed in LAX. (Bad_symbol represents a
nonexistent identifier.) Because parameter transmission is always by value (reference param-
eters are implemented by passing a ref value as discussed in Section 2.5.3) it is not necessary
to give a parameter transmission mechanism. In Pascal or ALGOL 60, however, the trans-
mission mechanism must be included for each parameter. For a language like Ada, in which
type
   type_class = (
      bad_type , nil_type , void_type , bool_type , int_type , real_type ,
      ref_type ,
      arr_type ,
      rec_type ,
      proc_type ,
      unidentified_type , (* See Section 9.1.3 *)
      identified_type );  (* See Section 9.1.3 *)
   mode = record
      case k : type_class of
         bad_type , nil_type , void_type , bool_type ,
         int_type , real_type : ();
         ref_type : (target : ^mode );
         arr_type : (dimensions : integer ; element : ^mode );
         rec_type : (fields : definition_table );
         proc_type : (parameters : definition_table ; result : ^mode );
         unidentified_type : (identifier : symbol );
         identified_type : (definition : integer )
   end ;
Figure 9.7: Representation of LAX Types
keyword association of arguments and parameters is possible, the identifiers must be retained
also. We retain the parameter identifiers, even though this is not required in LAX, to reduce
the number of attributes for the common case of a procedure declaration (A.3.0.8). Here we
can use the procedure type attribute both to validate the type compatibility and to provide
the parameter definitions. If we were to remove the parameter identifiers from the procedure
type this would not be possible.
When types and definitions are represented by attributes, the complete set of declarations
(other than procedure declarations) can, in principle, be deleted from the structure tree
to avoid duplicating information both as attributes and as subtrees of the structure tree.
Actually, however, this compression of the representation should only be carried out under
extreme storage constraints; normally both representations should be retained. The main
reason is that expressions (like dynamic array bounds) appearing within declarations cannot
be abstracted as attributes because they are not evaluated until the program is executed.
Context-sensitive properties of types lead to several relations that can be expressed as
recursive functions over types (objects of type mode ). These basic relations are:
Equivalent : Two types t and t' are semantically equivalent.
Compatible : Usually an asymmetric relation, in which an object of type t can be used in
place of an object of type t' .
Coercible : A type t is coercible to a type t' if it is either compatible with t' or can be
converted to t' by a sequence of coercions.
Type equivalence is defined in Section A.3.1 for LAX; this definition is embodied in the
procedure type_equivalent of Figure 9.10. Type_equivalent must be used in all cases
where two types should be compared. The direct comparison t1 = t2 may not yield true for
equivalent composite types because the pointers contained in the type records may address
equivalent types represented by different records.
The test for equivalence of type identifiers is for the identity of the type declarations
rather than for the equivalence of types they declare. This reflects the name equivalence
rule of Section A.3.1. If structural equivalence is required, as in ALGOL 68, then we must
compare the declared types instead. A simple implementation of this comparison leads to
infinite recursion for types containing pointers to themselves. The recursion can, however, be
stopped as soon as we attempt to compare two types whose comparison has been begun but
has not yet terminated. During comparison we therefore hold such pairs in a stack. Since the
only types that can participate in infinite recursion are those of class identified_type, we
enter pairs of identified_type types into the stack when we begin to compare them. The
next pair is checked against the stack before beginning their comparison; if the pair is found
then they are considered to be equivalent and no further comparison of them is required. (If
they are not equivalent, this will be detected by the first comparison, the one already on the
stack.)
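A sketch of this technique, using the mode record of Figure 9.7: the pair stack operations (push_pair, pair_on_stack, pop_pair), declared_type (which fetches the type declared under a definition's unique number) and components_equiv are hypothetical helpers, not part of Figure 9.10, and the asymmetric case where only one operand is an identified_type is omitted.

function struct_equivalent (t1, t2 : mode) : boolean;
begin
   if (t1.k = identified_type) and (t2.k = identified_type) then begin
      if pair_on_stack (t1.definition, t2.definition) then
         (* comparison of this pair already in progress: assume equal;
            a genuine difference is found by the comparison on the stack *)
         struct_equivalent := true
      else begin
         push_pair (t1.definition, t2.definition);
         struct_equivalent :=
            struct_equivalent (declared_type (t1.definition),
                               declared_type (t2.definition));
         pop_pair
      end
   end
   else if t1.k <> t2.k then struct_equivalent := false
   else if t1.k = ref_type then
      struct_equivalent := struct_equivalent (t1.target^, t2.target^)
   else if t1.k = arr_type then
      struct_equivalent := (t1.dimensions = t2.dimensions)
         and struct_equivalent (t1.element^, t2.element^)
   else if (t1.k = rec_type) or (t1.k = proc_type) then
      (* compare the field or parameter definition lists elementwise *)
      struct_equivalent := components_equiv (t1, t2)
   else
      struct_equivalent := true   (* basic types: equal class suffices *)
end;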
rule type_specification ::= 'ref' type_specification .
attribution
   type_specification[1].repr ← N_mode (ref_type,
      type_specification[2].repr );
The treatment of array variables in Figure 9.12 reflects the requirements of Section A.3.2.
We construct the array type based only on the dimensionality and element type. The bounds
must be integer expressions, but they are to be evaluated at execution time.
Type declarations introduce apparent circularities into the declaration process: The def-
inition of an identifier must be known in order to define that identifier. One obvious example,
the declaration type t = record x : real ; p : ref t end, was mentioned in Section 8.1.
Another is the fact that the analysis process discussed in Section 9.1.1 assumes we can con-
struct definitions for all identifiers in a range and then form an environment for that range.
Unfortunately the definition of a variable identifier includes its type, which might be specified
by a type identifier declared in the same range. Hence the environment must be available to
obtain the type.
We solve the problem in three steps, as shown in Figure 9.13, using the unidenti-
fied_type and identified_type variants of mode:
1. Collect all of the type declarations of a range into one attribute, of type defini-
tion_table. Any type identifiers occurring in the corresponding types are not yet
identified, but are given by the unidentified_type variant.
2. As soon as step (1) has been completed, transform the entire attribute to an-
other definition_table in which each unidentified_type has been replaced by an
identified_type that identifies the proper definition. This transformation uses the
environment inherited by the range as well as the information present in the type dec-
larations.
3. Incorporate the newly-created definition_table into the range's environment, and
then process all of the remaining declarations (none of which are type declarations).
Complete_env is a recursive function that traverses the definitions seeking unidentified
types. Whenever one is found, identify_type (Figure 9.14) is used to obtain the current
definition of the type identifier. Note that identify_type must use a unique representation
of the definition, not the definition itself, corresponding to the type identifier. The reason
is that, if types involve recursive references, we cannot construct any of the definitions until
we have constructed all of them! (Remember that attributes are not variables, so it is not
possible to construct an `empty' definition and then fill it in later.)
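Since Figure 9.14 itself is not reproduced here, the following sketch indicates one possible shape for the transformation. It follows the N_mode constructor convention introduced earlier; uid_of (an environment lookup yielding the unique number of a type definition) and identify_fields (which applies the transformation inside component definitions) are hypothetical helpers of ours.

function identify_type (t : mode; env : definition_table) : mode;
begin
   case t.k of
      unidentified_type :
         (* store only the unique number of the definition; with recursive
            type declarations the definitions themselves cannot be built
            one at a time *)
         identify_type :=
            N_mode (identified_type, uid_of (t.identifier, env));
      ref_type :
         identify_type := N_mode (ref_type, identify_type (t.target^, env));
      arr_type :
         identify_type := N_mode (arr_type, t.dimensions,
                                  identify_type (t.element^, env));
      rec_type, proc_type :
         (* transform the types inside the component definitions *)
         identify_type := identify_fields (t, env);
      bad_type, nil_type, void_type, bool_type, int_type, real_type,
      identified_type :
         identify_type := t   (* nothing to identify *)
   end
end;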
If not voided, the result has the same base type (type after all references and procedures
have been removed) as one of the operands.
If t1 is coercible to the base type of t2 but not to t2 itself, the result type is a dereferencing
and/or deproceduring of t2 .
If LAX types t1 and t2 are coerced to an a posteriori type t0, then the type balance(t1, t2)
always appears as an intermediate step. This may not be true in other languages, however. In
ALGOL 68, for example, balance(integer, real) = real but both types can be coerced
to union(integer, real) and in this case integer is not coerced to real first.
Figure 9.16 illustrates the use of balancing. In addition to the attributes primode and
postmode, this example uses label_values (synthesized, type case_selectors). Postmode
is simply passed through from top to bottom, so we follow our convention of not writing these
transfers explicitly. Label_values collects the values of all case labels into a list so we can
check that no label has occurred more than once (Section A.4.6).
Note that there is no condition checking coercibility of the resulting a priori type of the
case clause to the a posteriori type. Similarly, the a priori type of the selecting expression is
not checked against its a posteriori type in these rules. Such tests appear only in those rules
where the a priori type is determined by considerations other than balancing or transfer from
adjacent nodes.
Figure 9.17 illustrates some typical attribution rules for primode and postmode in ex-
pressions. Table A.2 requires that the left operand of an assignment be a reference, and
Section A.4.2 permits only dereferencing coercions of the left operand. Thus the assignment
rule invokes deproc (Figure 9.18) to obtain an a posteriori type for the name. Note that there
is no guarantee
that the type obtained actually is a reference, so additional checks are needed. Coercible
(Figure 9.11) is invoked to verify that the a priori type of the assignment itself can be coerced
to the a posteriori type required by the context in which the assignment appears. As can be
seen from the remainder of Figure 9.17, this check is made every time an object is created.
Assignment is the only dyadic operator in Table A.2 whose left and right operands have
different types. In all other cases, the types of the operands must be the same. The attribution
type
   case_selectors = ^cs_element ;
   cs_element = record first : integer ; rest : case_selectors end;
a) Type of label_values
rule case_clause ::=
   'case' expression 'of' cases 'else' statement_list 'end'.
attribution
   case_clause.primode ← balance (cases.primode ,
      statement_list.primode );
   expression.postmode ← N_mode (int_type );
condition
   values_unambiguous (cases.label_values );
rules for comparison show how balance can be used in this case to obtain a candidate
operand type. The two rules for eqop illustrate placement of additional requirements upon
this candidate.
The attribution for a simple name sets the a priori type to the type specified by the
identifier's definition, and must also verify (via coercible) that the a priori type satisfies
the requirements of the context as specified by the a posteriori type. Field selection is a bit
trickier. Section A.4.4 states that the name preceding the dot may yield either an object or
a reference to an object. This requirement, which also holds for index selection, is embodied
in one_ref (Figure 9.18). Note that the environment in which the field identifier is sought is
that of the record type definition, not the one in which the field selection appears. We must
therefore write the transfer of the environment attribute explicitly. Finally, the type yielded
by the field selection is a reference if and only if the object yielded by the name to the left of
the dot was a reference (Section A.4.4).
Figure 9.19 shows how the field definitions of the record are obtained. Section A.3 requires
that every record type be given a name. The declaration process described in Figures 9.13
and 9.14 guarantees that if this name is associated with an identified_type, the type
definition will actually be in the current environment. Moreover, the type definition cannot
specify anything but a record. Thus record_env need not verify these conditions.
In most programming languages the specification of the operator and the a posteriori types
of the operands uniquely determines the operation to be carried out, but usually no operation
attribute appears in the language description itself. The reason is that semantic analysis does
Operator identification for Ada depends not only upon the a priori types of the operands,
but also upon the a posteriori type of the result. There is no coercion, so the a priori and
a posteriori types must be compatible, but on the other hand the constant 2 (for example)
could have any of the types `short integer', `integer' and `long integer'. Thus both the
operand types and the result types must be determined by analysis of the tree.
Each operand and result is given one inherited and one synthesized attribute, each of
which is a set of types. We begin at the leaves of the tree and compute the possible (a priori)
types of each operand. Moving up the tree, we specify the possible operations and result types
based upon the possible combinations of operand types and the operator indication. Upon
arriving at the root of the tree for the expression we have a synthesized attribute for every
node giving the possible types for the value of this node. Moving down the tree, these type
sets are now further restricted: An inherited attribute, a subset of the previous synthesized
attribute, is computed for each node. It species the set of types permitted by the use of this
value as an operand in operations further up the tree. At the beginning of the descent, the
previously-computed set of possible result types at the root is used as the inherited attribute
of the root. If this process leads to a unique type for every node of the tree, i.e. if the inherited
attribute is always a singleton set, then the operations are all specified; otherwise at least one
operator (and hence the program) is semantically ambiguous and hence illegal.
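To make the process concrete, the two traversals might be sketched in Pascal as follows. This is only a sketch under assumed names: type_key, type_set, op_indication, a_priori_types, possible_results and operand_types are illustrative inventions standing for the attribution functions, and the node representation is simplified to a binary tree.

type
  type_key = (t_short_int , t_int , t_long_int );  (* assumed universe of types *)
  type_set = set of type_key ;
  node_ptr = ^node ;
  node = record
    is_leaf : boolean ;
    op : op_indication ;
    left , right : node_ptr ;
    up , down : type_set  (* synthesized and inherited type sets *)
  end;

procedure compute_up (n : node_ptr );
  (* bottom-up pass: possible result types from possible operand types *)
begin
  if n^.is_leaf then n^.up := a_priori_types (n )
  else begin
    compute_up (n^.left ); compute_up (n^.right );
    n^.up := possible_results (n^.op , n^.left^.up , n^.right^.up )
  end
end;

procedure compute_down (n : node_ptr ; context : type_set );
  (* top-down pass: restrict each set by its context; a singleton
     identifies the operation, anything else marks an error *)
begin
  n^.down := n^.up * context ;  (* set intersection *)
  if not n^.is_leaf then begin
    compute_down (n^.left , operand_types (n^.op , 1, n^.down ));
    compute_down (n^.right , operand_types (n^.op , 2, n^.down ))
  end
end;

At the root, compute_down is started with the set of result types that compute_up delivered for the root itself.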
Because LAX is an expression-oriented language, statements and statement-like constructs
(statement list , iteration , loop , etc.) also have primode and postmode attributes.
Most rules involving these constructs simply transfer those attributes. Figure 9.20 shows
rules that embody the conditions given in Sections A.2.4 through A.2.6.
rule iteration ::=
    'for' identifier 'from' expression 'to' expression loop .
  attribution
    iteration.primode ← N_mode (void_type );
    expression[1].postmode ← N_mode (int_type );
    expression[2].postmode ← N_mode (int_type );
    loop.environment ←
      N_definition (gennum , identifier.sym , object_definition ,
        N_mode (int_type )) &
      iteration.environment ;
    loop.postmode ← N_mode (void_type );
  condition
    iteration.postmode.k = void_type ;
One of the most common attributes in the structure tree is the environment, which allows
us to determine the meaning of an identifier at a given point in the program. In the simplest
case, for example in several machine-oriented languages, each identifier has exactly one defi-
nition in the program. The definition entry can then be reached directly via a pointer in the
symbol table. In fact, the symbol and definition table can be integrated into a single table in
this case.
[Figure 9.21 (diagram): a range header heads the linear list of possession relations for the range; each relation names a symbol and an entity, and each symbol's stack points at its current possession relation.]
type
  one = record f : integer ; g : ^two end;
  two = record f : boolean ; h : ^one end;
var
  j : ^one ;
...
with j^ do
  begin
  ...
  with g^ do
    begin
    ...
    with h^ do
      begin
      ...
      end
    end
  end;
Figure 9.22: Self-Nesting Ranges
When a range is entered, the stack for each identifier defined in the range must be pushed
down and an entry describing the definition valid in this range placed on top. Conversely,
the stack for each identifier defined in a range must be popped when leaving that range. To
simplify the updating, we represent the range by a linear list of elements specifying a symbol
table entry and a corresponding definition as shown at the top of Figure 9.21. This gives
constant-time access to the stacks to be pushed or popped, and means that the amount of
time required to enter or leave a range is linear in the number of identifiers having definitions
in it.
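In Pascal the updating might be sketched as follows; the list structure mirrors Figure 9.21, but the field names first_possession and stack, and the helpers push and pop, are assumptions made for illustration (each symbol table entry is taken here to carry its own stack).

procedure enter_range (r : range_header_ptr );
var p : possession_ptr ;
begin
  p := r^.first_possession ;
  while p <> nil do begin
    (* stack a pointer to the range list entry, not the entry itself *)
    push (p^.possessing_symbol^.stack , p );
    p := p^.next
  end
end;

procedure leave_range (r : range_header_ptr );
var p : possession_ptr ;
begin
  p := r^.first_possession ;
  while p <> nil do begin
    pop (p^.possessing_symbol^.stack );
    p := p^.next
  end
end;

Both operations clearly take time linear in the number of possession relations of the range.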
We use a pointer to the definition rather than the definition itself in the range list because
many identifiers in different ranges may refer to the same definition. (For example, in Pascal
many type identifiers might refer to the same complex record type.) By using a pointer we
avoid having to store multiple copies of the definition itself, and also we simplify equality
tests on definitions.
We stack a pointer to the appropriate range list entry instead of stacking the range list
entry itself because it is possible to enter a range and then enter it again before leaving it.
(Figure 9.22 is a Pascal fragment that has this property. The statement with j^ enters the
range of the record type one ; the range will be left at the end of that statement. However,
the nested statement with h^ also enters the same range!) When a range is entered twice
without being left, its definitions are stacked twice. If the (single) range list entry were placed
on the stack twice, a cycle would be created and the compiler would fail.
Finally, we stack a pointer to the range list entry rather than a pointer to the definition
to cater for languages (such as COBOL and PL/1) that allow partial qualification: In a field
selection the specification of the containing record may be omitted if it can be determined
unambiguously. (This assumes that, in contrast to LAX, exactly one object exists for each
record type. In other words, the concepts of record and record type merge.)
Figure 9.23 illustrates the problem of partial qualification, using an example from PL/1.
Each qualified name must include sufficient identifiers to resolve any ambiguity within a single
block; the reference is unambiguous if either or both of the following conditions hold:
The reference gives a valid qualification for exactly one declaration.
The reference gives the complete qualification for exactly one declaration.
A: PROCEDURE;
   DECLARE
     1 W,
     ...;
   B: PROCEDURE;
      DECLARE
        P,
        1 Q,
          2 R,
            3 Z,
          2 X,
            3 Y,
            3 Z,
            3 Q;
      Y = R.Z;        /* Q.X.Y from B, Q.R.Z from B */
      W = Q, BY NAME; /* W from A, major Q from B */
      C: PROCEDURE;
         DECLARE Y,
           1 R,
             2 Z;
         Z = Q.Y;         /* R.Z from C, Q.X.Y from B */
         X = R, BY NAME;  /* Q.X from B, R from C */
      END C;
   END B;
END A;
Figure 9.23: Partial Qualification
Most of the references in Figure 9.23 are unambiguous because the first of these conditions
holds. The Q in W = Q, however, gives a valid qualification for either the major structure or
the field Q.X.Q; it is unambiguous because it gives the complete qualification of the major
structure. References Z and Q.Z in procedure B would be ambiguous.
In order to properly analyze Figure 9.23, we must add three items of structural in-
formation to each possession relation in Figure 9.21: The level is the number of identi-
fiers in a fully-qualified reference to the entity possessed. If the level is greater than 1,
containing structure points to the possession relation for the containing structure. In any
case, the range to which the possession belongs must be specified. Figure 9.24 shows the
possession relations for procedure B of Figure 9.23. Note that this range contains two valid
possession relations for Q and two for Z . The symbol stack entries for Z have been included
to show that this results in two stack entries for the same range.
A reference is represented by an array of symbols. The stack corresponding to the last
of these is scanned, and the test of Figure 9.25 applied to each possession relation. When a
relation satisfying the test is found, no further ranges are tested; any other relations for the
same symbol within that range must be tested, however. If more than one relation in a range
satisfies the test, then the reference is ambiguous unless the level of one of the relations is
equal to the number of symbols in the reference.
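Although Figure 9.25 is not reproduced here, the heart of its test can be sketched in Pascal using the level and containing_structure fields just introduced; the function name and the right-to-left matching strategy are our own illustrative choices.

function valid_qualification (ref : symbol_array ; ref_count : integer ;
                              p : possession_ptr ) : boolean ;
  (* ref[ref_count] selected the symbol stack through which p was found;
     match the remaining symbols, right to left, against the chain of
     containing structures, allowing intermediate levels to be omitted *)
var i : integer ;
begin
  i := ref_count - 1;
  while (i >= 1) and (p^.level > 1) do begin
    p := p^.containing_structure ;
    if p^.possessing_symbol = ref [i ] then i := i - 1
  end;
  valid_qualification := (i = 0)
end;

The complete-qualification condition is then simply the additional test that the level of the relation originally found equals ref_count.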
A definition table module might provide the following operations:
New range () → range : Establish a new range.
Add possession (symbol, definition, range) : Add a possession relation to a given
range.
Enter range (range) : Enter a given range.
[Figure 9.24 (diagram): the range header for procedure B heads possession relations for P, Q, R, Z, X, Y, Z and Q, with levels 1, 1, 2, 3, 2, 3, 3 and 3 respectively; the symbol stack headers are shown for Z, whose stack holds two entries for this range.]
Figure 9.24: Range Specification Including Structure
Leave range : Leave the current range.
Current definition (symbol) → definition : Identify the definition corresponding to
a given identifier at the current point in the program.
Definition in range (symbol, range) → definition : Identify the definition corre-
sponding to a given identifier in a given range.
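Rendered as a collection of Pascal headings, the interface might read as follows; the underscored names and the opaque types range, symbol and definition are illustrative, since the text leaves the representation open.

procedure new_range (var r : range );
procedure add_possession (sym : symbol ; def : definition ; r : range );
procedure enter_range (r : range );
procedure leave_range ;
function current_definition (sym : symbol ) : definition ;
function definition_in_range (sym : symbol ; r : range ) : definition ;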
The first two of these operations are used to build the range lists. The next three have been
discussed in detail above. The last is needed for field selection in languages such as Pascal
and LAX. Recall the treatment of field selection in Figure 9.17. There the environment in
which the field identifier was sought consisted only of the field identifiers defined in the record
yielded by name . This is exactly the function of definition in range . If we were to enter
the range corresponding to the record and then use current definition , we would not
achieve the desired effect. If the identifier sought were not defined in the record's range, but
was defined in an enclosing range, the latter definition would be found!
Unfortunately, definition in range must perform a search. (Actually, the search is
slightly cheaper than the incorrect implementation outlined in the previous paragraph.) It
might linearly search the list of definitions for the range representing the record type. This
technique is advantageous if the number of fields in the record is not too large. Alternatively,
we could associate a list of pairs (record type, pointer to a definition entry for a field with
this selector) with each identifier and search that. This would be advantageous if the number
of record types in which an identifier occurred was, on the average, smaller than the number
of fields in a record.
In many compilers the semantic analysis is not treated as a separate task but as a by-
product of parsing or code generation. The result is generally that the static semantic condi-
tions are not fully verified, so erroneous programs are sometimes accepted as correct. We have
taken the view here that semantic analysis is the fundamental target-independent task of the
compiler, and should be the controlling factor in the development of the analysis module.
type
  possession = record
    range : ^range_header ;
    next : ^possession ;
    possessing_symbol : symbol ;
    possessed_entity : entity ;
    level : integer ;
    containing_structure : ^possession
  end;
  symbol_array = array [1..max_qualifiers ] of symbol ;
Many of the techniques presented here for describing specific language facilities were the
result of experience with attribute grammars for PEARL [DIN, 1980], Pascal [Kastens et al.,
1982], and Ada [Uhl et al., 1982] developed at the Universitat Karlsruhe. The representation
of arbitrarily many types by lists was first discussed in conjunction with ALGOL 68 com-
pilers [Peck, 1971]. Koster [1969] described the recursive algorithm for ALGOL 68 mode
equivalence using this representation.
The attribution process for Ada operator identification sketched in Section 9.1.4 is due
to Persch and his colleagues [Persch et al., 1980]. Baker [1982] has proposed a similar
algorithm that computes attributes containing pointers to the operator nodes that must be
identified. The claimed advantage is that, if the nodes can be accessed randomly, a complete
second traversal becomes unnecessary. Operator identification cannot be considered in
isolation, however. It is not at all clear that a second complete traversal will not be required
by other attribution, giving us the operator identification `for free'. This illustrates the
importance of constructing the complete attribute grammar without regard to number of
traversals, and then processing it to determine the overall evaluation order.
Most authors combine the symbol and definition tables into a single `symbol table' [Gries,
1971; Bauer and Eickel, 1976; Aho and Ullman, 1977]. Separate tables appear in descrip-
tions of multi-pass compilers and serve above all to reduce the main storage requirements
[Naur, 1964]; the literature on ALGOL 68 [Peck, 1971] is an exception. In his description of
a multi-pass compiler for `sequential Pascal', Hartmann [1977] separates the tables both to
reduce the storage requirement and to simplify the compiler structure.
The basic structure of the definition table was developed for ALGOL 60 [Randell and
Russell, 1964; Grau et al., 1967; Gries, 1971]. We have refined this structure to allow
it to handle record types and incompletely-qualified identifiers [Busam, 1971]. An algebraic
specification of a module similar to that sketched at the end of Section 9.2 was given by
Guttag [1975, 1977].
Exercises
9.1 Determine the visibility properties of Pascal labels. Write attribution rules that embody
these properties. Treat the prohibition against jumping into a compound statement as
a restriction on the visibility of the label definition (as opposed to the label declaration,
which appears in the declaration part of the block).
9.2 Write the function current definition (Figure 9.1c).
9.3 Write the function unambiguous (Figure 9.2a).
9.4 Note that Figure 9.5 requires additional information: the implicit type of an identifier.
Check the FORTRAN definition to find out how this information is determined. How
would you make it available in the attribute grammar? Be specific, discussing the role
of the lexical analyzer and parser in the process.
9.5 [Sale, 1979] Give attribution rules and auxiliary functions to verify the definition-before-
use constraint in Pascal. Assume that the environment is being passed along the text,
as illustrated by Figure 9.4.
(a) Add a depth field to the definition record, and provide attribution rules that set
this field to the static nesting depth at which the definition occurred.
(b) Add attribution rules that check the definition depth at each use of an identifier.
Maintain a list of identifiers that have been used at a depth greater than their
definition.
(c) When an identifier is defined, check the list to ensure that the identifier has not
previously been used at a level greater than or equal to the current level when it
was defined at a level less than the current level.
(d) Demonstrate that your rules correctly handle Figure 9.6.
9.6 What extensions to the environment attribute are required to support modules as
defined in MODULA2?
9.7 Extend the representation of LAX types to handle enumerated types and records with
variants, described as in Pascal.
9.8 Develop type representations analogous to Figure 9.7 for FORTRAN, ALGOL 60 and
Ada.
9.9 Modify the procedure type equivalent to handle the following alterations in the LAX
definition:
(a) Structural type equivalence similar to that of ALGOL 68 is specified instead of
the equivalence of A.3.1.
(b) Union types union(t1, ..., tn) similar to those of ALGOL 68. The sequence
of types is arbitrary and union(t1, union(t2, t3)) = union(union(t1, t2), t3) =
union(t1, t2, t3).
(d) Given the modified rules of (c), do any of the attributes you listed in (b) satisfy
the conditions for implementation as global variables? As global stacks? How do
your answers to these questions bear upon the implementation of the definition
table as a package vs. an abstract data type?
9.13 Develop definition tables for BASIC, FORTRAN, COBOL and Pascal.
9.14 Add the use-before-definition check of Exercise 9.5 to the definition table of Figure 9.21.
9.15 Give a detailed explanation of the problems encountered when analyzing Figure 9.22 if
possession relation entries are stacked directly.
9.16 How must a Pascal definition table be set up to handle the with statement? (Hint:
Build a stack of with expressions for each record type.)
9.17 Show the development during compilation of the definition table for the program of
Figure 9.23 by giving a sequence of snapshots.
Chapter 10
Code Generation
The code generator creates a target tree from a structure tree. This task has, in principle,
three subtasks:
Resource allocation: Determine the resources that will be required and used during
execution of instruction sequences. (Since in our case the resources consist primarily of
registers, we shall speak of this as register allocation.)
Execution order determination: Specify the sequence in which the descendants of a node
will be evaluated.
Code selection: Select the final instruction sequence corresponding to the operations
appearing in the structure tree under the mapping discussed in Chapter 3.
In order to produce code that is optimal under a cost criterion such as minimum program
length or minimum execution time, these subtasks must be intertwined and iterated. The
problem is NP-complete even for simple machine architectures, so the cost of generating truly
optimal code must be expected to grow exponentially with the number of structure tree nodes.
In view of the simple form of the expressions that actually occur in programs, however, it is
usually sufficient to employ linear-cost algorithms that do not necessarily produce the
optimum code in all cases.
The approach taken in this chapter is to first map the source-language objects onto the
memory of the target machine. An estimate of register usage is then made, and the execution
order determined on the basis of that estimate. Finally, the behavior of the target machine is
simulated during an execution-order traversal of the structure tree, driving the code selection
and register assignment. The earlier estimate of register usage must guarantee that all register
requirements can actually be met during the final traversal. The code may be suboptimal in
some cases because the final register assignment cannot affect the execution order.
The computation graph discussed in Section 4.1.3 is implicit in the execution-order struc-
ture tree traversal. Chapter 13 will make the computation graph explicit, and discuss opti-
mizing transformations that can be applied to it. If a compiler writer follows the strategies of
Chapter 13, some of the optimization discussed here becomes redundant. Nevertheless, the
three code generation subtasks introduced above remain unchanged.
Section 10.1 shows how the memory map is built up, starting with the storage requirements
for elementary objects given by the implementor in the mapping specification of Section 3.4.
We present the basic register usage estimation process in Section 10.2, and show how addi-
tional attributes can be used to improve the generated code. Target machine simulation and
code selection are covered in Section 10.3.
type
  area = ... ;
  size = ... ;
  location = ... ;
  direction = (up , down );
  strategy = (align , pack );

procedure new_area (d : direction ; s : strategy ; var a : area );
(* Establish a new memory area
   On entry -
     d = growth direction for this area
     s = growth strategy for this area
   On exit -
     a specifies the new area
*)
... ;

procedure add_block (a : area ; s : size ; alignment : integer ;
                     var l : location );
(* Allocate a block in an area
   On entry -
     a specifies the area to which the block is to be added
     s = size of the block
     alignment = alignment of the block
   On exit -
     l = relative location of the first cell of the block
*)
... ;

procedure end_area (a : area ; var s : size ; var alignment : integer );
(* Terminate an area
   On entry -
     a specifies the area to be terminated
   On exit -
     s = size of the resulting block
     alignment = alignment of the resulting block
*)
... ;

procedure mark (a : area );
(* Mark the current growth point of an area *)
... ;

procedure back (a : area );
(* Reset the growth point of an area to the last outstanding mark *)
... ;

procedure combine (a : area );
(* Erase the last outstanding mark in an area and
   reset the growth point to the maximum of all previous growths
*)
... ;

Figure 10.1: Memory Mapping Module
adding two output parameters to both back and combine (Figure 10.1), making their calling
sequences identical to that of end_area .
In areas that will become activation records, storage must be reserved for pointers to
static and dynamic predecessors, plus the return address and possibly a template pointer.
The size and alignment of this information is fixed by the mapping specification, which may
also require space for saving registers and for other working storage. It is usually placed
either at the beginning of the record or between the parameters and local variables. (In
the latter case, the available access paths must permit both negative and positive offsets.)
Finally, it is convenient to leave an activation record area open during the generation of code
for the procedure body, so that compiler-generated temporaries may be added. Only upon
completion of the code selection will the area be closed and the size and alignment of the
activation record finally determined.
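The following fragment sketches how the module of Figure 10.1 might be driven when laying out such an activation record; the growth direction and the constants ptr_size, ptr_align, int_size and int_align are assumptions made for illustration.

var
  ar : area ;
  loc : location ;
  ar_size : size ;
  ar_align : integer ;
begin
  new_area (up , align , ar );                  (* one area per activation record *)
  add_block (ar , ptr_size , ptr_align , loc ); (* static predecessor *)
  add_block (ar , ptr_size , ptr_align , loc ); (* dynamic predecessor *)
  add_block (ar , ptr_size , ptr_align , loc ); (* return address *)
  (* ... blocks for parameters and local variables ... *)
  (* the area stays open during code selection, so that compiler-generated
     temporaries can still be added: *)
  add_block (ar , int_size , int_align , loc );
  end_area (ar , ar_size , ar_align )           (* size and alignment now final *)
end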
In principle, the storage module is invoked at the beginning of code generation to fix
the length, relative address and alignment of all declared objects and types. For languages
like Ada, integration with the semantic analyzer is essential because object size may be
interrogated by the program and must be used in verifying semantic conditions. Even in this
case, however, we must continue to regard the storage module as a part of the synthesis task
of the compiler; only the location of the calls, not the modular decomposition, is changed.
(x + y) / (a*b + c*d)
a) A LAX expression

LE  0,x                LE  2,a
AE  0,y                ME  2,b
LE  2,a                LE  0,c
ME  2,b                ME  0,d
LE  4,c                AER 2,0
ME  4,d                LE  0,x
AER 2,4                AE  0,y
DER 0,2                DER 0,2
(uses 3 registers)     (uses 2 registers)
b) Two possible IBM 370 implementations
Figure 10.2: Dependence of Register Usage on Evaluation Order
Global register allocation begins with values specified by the implementation as being held
permanently in registers. This might result in the following allocations for the IBM 370:
Register 15: Subprogram entry address
Register 14: Return address
Register 13: Local activation record base address
Register 12: Global activation record base address
Register 11: Base address for constants
Register 10: Code base address
Register 9: Code offset (Section 11.1.3)
Only two registers are allocated globally as activation record bases; registers for access to
the activation records of intermediate contours are obtained from the local allocation, as are
registers for stack and heap pointers.
Most compilers use no additional global register allocation. Further global allocation
might, for example, be appropriate because most of a program's execution time is spent in
the innermost loops. We could therefore stretch the register usage considerably and shorten
the code if we reserved a xed number of registers (say, 3) for the most-frequently used values
of the innermost loops. The controlled variable of the loop is often one of these values. The
simple approach of assigning the controlled variables of the innermost loops to the reserved
registers gives very good results in practice; more complex analysis is generally unnecessary.
Upon completion of the global allocation, we must ensure that at least n registers always
remain for local allocation. Here n is the maximum number of registers used in a single
instruction. (For the IBM 370, n = 4 in the MVCL instruction.) A rule of thumb says that
we should actually guarantee that n+1 registers remain for local allocation, which allows at
least one additional intermediate result or base address to be held in a register.
Pre-planning of local register allocation would be unnecessary if the number of available
registers always sufficed for the number of simultaneously-existing intermediate results of an
expression. Given a limited number of registers, however, we can guarantee this only for some
subtrees. Outside of these, the register requirement is not fixed unambiguously: Altering the
sequence of operations may change the number of registers required. Figure 10.2 shows an
example.
The general strategy for local register allocation is to seek subtrees evaluable, possibly
with rearrangement, using only the number of registers available to hold intermediate results.
These subtrees can be coded without additional store instructions. We choose the largest,
and generate code to evaluate it and store the result. All registers are then again available to
hold intermediate results in the next subtree.
Consider an expression represented as a structure tree and a machine with n identical
registers ri . The machine's instructions have one of the following forms:
Load: ri := memory location
Store: memory location := ri
Compute: ri := op(vj , ..., vk ), where vh may be either a register or a memory location.
The machine has various computation instructions, each of which requires specific
operands in registers and memory locations. (Note that a load instruction can be consid-
ered to compute the identity function, and require a single operand in a memory location.)
We say that a program fragment is in normal form if it is written as P1 J1 ... Ps-1 Js-1 Ps
such that each J is a store instruction, each P is a sequence containing no store instructions,
and all of the registers are free immediately after each store instruction. Let I1 ... In be one
of the sequences containing no stores. We term this sequence strongly contiguous if, whenever
Ii is used to compute an operand of Ik (i < k), all Ij such that i ≤ j < k are also used in the
computation of operands of Ik . The sequence P1 J1 ... Ps is in strong normal form if Pq is
strongly contiguous for all 1 ≤ q ≤ s.
Aho and Johnson [1976] showed that, provided no operand or result has a size exceeding
the capacity of a single register, an optimal program to evaluate an expression tree on our
assumed machine can be written in strong normal form. (The criterion for optimality is
minimum program length.) Thus to achieve an optimal program it suffices to determine a
suitable sequence in which to evaluate the operands of each operator and, in case the register
requirements exceed n, to introduce store operations at the proper points. The result can
be described in terms of three attributes: register count , store and operand sequence .
Register count specifies the maximum number of registers needed simultaneously at any
point during the computation of the subtree. Store is a Boolean attribute that is true if the
result of this node must be stored. Operand sequence is an array of integers giving the order
in which the operands of the node should be evaluated. (A Boolean attribute suffices if the
maximum number of operands is 2.)
The conditions for a strong normal form stated above are fulfilled on most machines by
floating point expressions with single-length operands and results. For integer expressions
they generally do not hold, since multiplication of single-length values produces a double-
length result and division requires a double-length dividend. Under these conditions the
optimal instruction sequence may involve `oscillation'. Figure 10.3a shows a tree that requires
oscillation in any optimal program. The square nodes produce double-length values, the round
nodes single-length values. An optimal PDP11 program to evaluate the expression appears as
Figure 10.3b. The PDP11 is an `even/odd machine': one that requires double-length values
to be held in a pair of adjacent registers, the first of which has an even register number. No
polynomial algorithm that yields an optimal solution in this case is known.
Under the conditions that the strong normal form theorem holds and, with the exception of
the load instruction, all machine instructions take their operands from registers, the following
register allocation technique leads to minimum register requirements: For the case of two
operands with register requirements k1 > k2, always evaluate the one requiring k1 registers
first. The result remains as an intermediate value in a register, so that while evaluating the
other operand, k2 + 1 registers are actually required. Since k2 < k1, however, the total register
requirement cannot exceed k1.
When k1 = k2, either operand may be evaluated first. The evaluation of the first operand
will still require k1 registers and the result remains in a register. Thus k1 + 1 registers will be
needed to evaluate the second operand, leading to an overall requirement for k1 + 1 registers.
If k1 = k2 = n then it is not possible to evaluate the entire expression in the registers available,
although either subexpression can be evaluated entirely in registers. We therefore evaluate
one operand (usually the second) and store the result. This leaves all n registers free to
evaluate the other operand. Figure 10.4 formalizes the computation of these attributes.
[Figure 10.3 (diagram): an expression tree over operands E, F, G, H, I and J built from DIV, * and + nodes; square nodes produce double-length values, round nodes single-length values. The optimal PDP11 program of part b) is not reproduced.]
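Although Figure 10.4 is not reproduced here, the computation it formalizes can be sketched in Pascal for a binary node. register_count and store are the attributes named above; the Boolean left_first stands in for the operand sequence attribute, and the node representation is an assumption.

procedure allocate (nd : node_ptr ; n : integer );
  (* n = number of registers available for intermediate results *)
var k1 , k2 : integer ;
begin
  if nd^.is_leaf then begin
    nd^.register_count := 1;  (* the operand must be loaded *)
    nd^.store := false
  end else begin
    allocate (nd^.left , n ); allocate (nd^.right , n );
    k1 := nd^.left^.register_count ; k2 := nd^.right^.register_count ;
    if k1 = k2 then nd^.register_count := k1 + 1
    else if k1 > k2 then nd^.register_count := k1
    else nd^.register_count := k2 ;
    nd^.left_first := k1 >= k2 ;  (* larger requirement evaluated first *)
    nd^.store := false ;
    if nd^.register_count > n then begin
      (* only possible when k1 = k2 = n : evaluate one operand
         separately and store its result *)
      nd^.right^.store := true ;
      nd^.register_count := n
    end
  end
end;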
If the second operand may be either in a register or in memory we apply the same rules,
but begin with simple operands having a register count of 0; further, the left operand
count is replaced by max(expression[2].register count , 1), since the first operand must
always be loaded and therefore has a cost of at least one register. Extension to the case in
which the second operand must be in memory (as for halfword arithmetic on the IBM 370)
presents some additional problems (Exercise 10.3). For integer multiplication and division we
must take account of the fact that the result (respectively the first operand) requires two
registers. The resulting sequence is not always optimal in this case.
Several independent sets of registers can also be dealt with in this manner; examples
are general registers and floating point registers, or general registers and index registers. The
problem of the Univac 1108, in which the index registers and general registers overlap, requires
additional thought.
On machines like the PDP11 or Motorola 68000, which have stack instructions in addition
to registers or the ability to execute operations with all operands and the result in memory,
optimization of the local register allocation is a very difficult problem. The minimum register
requirement in these cases is always 0, so that we must include the program length or execution
time as cost criteria. The result is that in general memory-to-memory operations are only
reasonable if no operands are available in registers, and also the result does not appear in a
register and will not be required in one. Operations involving the stack usually have longer
execution time than operations of the same length involving registers. On the other hand,
the operations to move data between registers and the stack are usually shorter and faster
than register-memory moves. As a general principle, then, intermediate results that must be
stored because of insufficient registers should be placed on the stack.
10.2.2 Targeting
Targeting attributes are inherited attributes used to provide information about the desired
destination of a result or target of a jump.
We use the targeting attribute desire to indicate that a particular operand should be in a
register of a particular class. If a descendant can arrange to have its result in a suitable register
at no extra cost, this should be done. Figure 10.5 gives the attribution rules for expressions
containing the four basic arithmetic operations, assuming the IBM 370 as the target machine.
This machine requires a multiplicand to be in an odd register, and a dividend to be in a
register pair. We therefore target a single-length dividend to the even-numbered register of
the pair, so that it can be extended to double-length with a simple shift.
In the case of the commutative operators addition and multiplication, we target both operands
to the desired register class. Then if the register allocation can satisfy our preference for
the second operand but not the first, we make use of commutativity (Section 10.2.3) and
interchange the operands. If neither of the preferences can be satisfied, then an instruction to
move the information to the proper register will be generated as a part of the coding of the
multiplication or division operator. No disadvantages arise from inability to satisfy the stated
preference. This example illustrates the importance of the non-binding nature of targeting
information. We propagate our desire to both branches in the hope it will be satisfied on one
of them. If it is satisfied on one branch then it is actually spurious on the other, and no cost
should be incurred by trying to satisfy it there.
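For multiplication and division the rules of Figure 10.5 might be sketched as follows; the enumeration reg_class and the procedural rendering are illustrative assumptions, since Figure 10.5 itself is written as attribution rules.

type
  reg_class = (any_reg , even_reg , odd_reg , pair_reg );

(* expression[1] ::= expression[2] '*' expression[3] : the multiplicand
   must end up in an odd register; since multiplication is commutative,
   the preference is expressed to both operands *)
procedure target_times (var left_desire , right_desire : reg_class );
begin
  left_desire := odd_reg ;
  right_desire := odd_reg
end;

(* expression[1] ::= expression[2] '/' expression[3] : target the
   single-length dividend to the even register of the pair, so that a
   shift can extend it to double length *)
procedure target_divide (var left_desire , right_desire : reg_class );
begin
  left_desire := even_reg ;
  right_desire := any_reg
end;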
Many Boolean expressions can be evaluated using conditional jumps (Section 3.2.3), and it
is necessary to specify the address at which execution continues after each jump. Figure 10.6
shows the attribution used to obtain short-circuit evaluation, in the context of a conditional
jump. (If short-circuit evaluation is not permitted by the language, the only change is to delay
generation of the conditional jumps until after all operands not containing Boolean operators
have been evaluated, as discussed in Section 3.2.3.) Labels (and procedure entry points) are
specified by references to target tree elements, for which the assembler must later substitute
addresses. Thus the type assembler symbol is defined not by the code generator, but by the
assembler (Section 11.1.1).
Given the attribution of Figure 10.6, it is easy to see how code is generated: A conditional
jump instruction is produced following the code to evaluate each operand that contains no
further Boolean operators (e.g. a relation). The target of the jump is the label that does not
immediately follow the operand, and the condition is chosen accordingly. Boolean operator
nodes generate no code at all. Moreover, the execution order is fixed; no use of commutativity
is allowed.
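The generation scheme can be sketched in Pascal as follows; true_label and false_label play the role of the targeting attributes of Figure 10.6, while encode_operand, negate and gen_cond_jump are assumed interfaces rather than definitions taken from the text.

procedure encode_condition (nd : node_ptr ;
                            true_label , false_label : assembler_symbol ;
                            true_follows : boolean );
  (* true_follows: the code for the 'true' continuation immediately
     follows the code generated for this node *)
begin
  if nd^.op = and_op then begin
    (* a false left operand decides the result; a true one falls
       through into the right operand *)
    encode_condition (nd^.left , true_label , false_label , true );
    encode_condition (nd^.right , true_label , false_label , true_follows )
  end
  else if nd^.op = or_op then begin
    encode_condition (nd^.left , true_label , false_label , false );
    encode_condition (nd^.right , true_label , false_label , true_follows )
  end
  else begin
    (* a relation: evaluate it, then jump to the label that does not
       immediately follow, choosing the condition accordingly *)
    encode_operand (nd );
    if true_follows then gen_cond_jump (negate (nd^.relation ), false_label )
    else gen_cond_jump (nd^.relation , true_label )
  end
end;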
10.2.3 Use of Algebraic Identities
The goal of the attribution discussed in Section 10.2.1 was to reduce the register requirements
of an expression, which usually leads to a reduction in the length of the code sequence. The
length of the code sequence can often be reduced further through use of the algebraic identities
shown in Figure 10.7.
x + y = y + x
x - y = x + (-y) = -(y - x)
-(-x) = x
x * y = y * x = (-x) * (-y)
-(x * y) = (-x) * y = x * (-y)
a) Identities for integer and real operands

L   1,x
LNR 1,1
L   2,y
S   2,z
MR  0,2
b) Computation of (-x) * (y - z)

L   2,z
S   2,y
L   1,x
MR  0,2
c) Computation of x * (z - y), which is equivalent to (b)

L   1,z
S   1,y
M   0,x
d) Computation of (z - y) * x, which is equivalent to (c)
Figure 10.7: Algebraic Identities
The number of computational instructions can be reduced by, for example, using the
identities of Figure 10.7a to remove a change of sign or combine it with a load instruction
(unary complement elimination). Load operations can be avoided by applying commutativity
when the right operand of a commutative operator is already in a register and the left operand
is still in memory. Figures 10.7b-d give a simple example of these ideas.
None of the identities of Figure 10.7a involve the associative or distributive laws of algebra.
Computers do not obey these axioms, and hence transformations based upon them are not
safe. Also, if the target machine uses a radix-complement representation for negative numbers
then the identity -(-x) = x fails when x is the most negative representable value, leaving
commutativity of addition and multiplication as the only safe identities. As implementors,
however, we are free to specify the range of values representable using a given type. By simply
stating that the most negative value does not lie in that range, we can use all of the identities
listed in Figure 10.7a. This does not unduly constrain the programmer, since its only effect is
to make the range symmetric and thus remove an anomaly of the hardware arithmetic. (We
normally remove the analogous anomaly of sign-magnitude representation, the negative zero,
without debate.)
Although use of algebraic identities can reduce the register requirement, the decisive cost
criterion is the code size. Here we assume that every instruction has the same cost; in prac-
tical applications the respective instruction lengths must be introduced. Let us also assume,
for the moment, a machine that only provides register-register arithmetic instructions. All
operands must therefore be loaded into registers before they are used. We shall restrict our-
selves to addition, subtraction, multiplication and negation in this example and assume that
multiplication yields a single-length result. The basic idea consists of attaching a synthesized
attribute, cost , to each expression. Cost specifies the minimum cost (number of instruc-
tions) to compute the result of the expression in its correct and inverse (negated) forms. It is
determined from the costs of the operation, the operand computations, and any complement-
ing required. An inherited attribute, decision , is then computed on the basis of these costs
and specifies the actual form (correct or inverse) that should be used.
To generate code for a node, we must know which operation to actually implement. (In
general this may differ from the operator appearing in the structure tree.) If the actual
operation is not commutative then we have to know whether the operands are to be taken
in the order given by the structure tree or not. Finally, we need to know whether the result
must be complemented. As shown in Table 10.1, all of this information can be deduced from
the structure tree operator and the forms of the operands and result.
Tree   Result  Operand  k  Reverse    Negate  Actual     Method
Node   Form    Forms       Operands           Operation
a+b    c       cc       1  false      false   plus       a + b
               ci       1  false      false   minus      a - (-b)
               ic       1  true       false   minus      b - (-a)
               ii       2  false      true    plus       -(-a + (-b))
       i       cc       2  false      true    plus       -(a + b)
               ci       1  true       false   minus      -b - a
               ic       1  false      false   minus      -a - b
               ii       1  false      false   plus       -a + (-b)
a-b    c       cc       1  false      false   minus      a - b
               ci       1  false      false   plus       a + (-b)
               ic       2  false      true    plus       -(-a + b)
               ii       1  true       false   minus      -b - (-a)
       i       cc       1  true       false   minus      b - a
               ci       2  false      true    plus       -(-a + (-b))
               ic       1  false      false   plus       -a + b
               ii       1  false      false   minus      -a - (-b)
a*b    c       cc       1  false      false   times      a * b
               ci       2  false      true    times      -(a * (-b))
               ic       2  false      true    times      -(-a * b)
               ii       1  false      false   times      -a * (-b)
       i       cc       2  false      true    times      -(a * b)
               ci       1  false      false   times      a * (-b)
               ic       1  false      false   times      -a * b
               ii       2  false      true    times      -(-a * (-b))
c means that the sign of the operand is not inverted
i means that the sign of the operand is inverted
k is a typical cost of the operation in instructions
Table 10.1: Unary Complement Elimination
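The cost computation that drives these decisions might be sketched as follows for an addition node; representing cost as an array indexed by form, and the lookup function k_of for the k column of Table 10.1, are assumptions made for illustration.

type
  form = (correct , inverse );
  cost_pair = array [form ] of integer ;

function plus_cost (a , b : cost_pair ; result_form : form ) : integer ;
  (* minimum instruction count over the four operand-form combinations
     of Table 10.1, for the requested result form *)
var best , c : integer ;
    fa , fb : form ;
begin
  best := maxint ;
  for fa := correct to inverse do
    for fb := correct to inverse do begin
      c := a [fa ] + b [fb ] + k_of (plus_op , result_form , fa , fb );
      if c < best then best := c
    end;
  plus_cost := best
end;

The inherited decision attribute is then computed on the way down by recording, for the form demanded by the context, which operand forms achieved the minimum.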
We took the operand location as fixed in deriving Table 10.2. This meant, for example,
that when the correct left operand was in memory and the inverted right operand was in a
register we used the sequence subtract , negate to obtain the correct value of the expression
(Table 10.2, row 7). We could also have used the sequence load , subtract , but this would
have increased the register requirements. If we allow the unary complement elimination to
alter the register requirements then it must be integrated with the local register allocation,
increasing the number of attribute dependencies and possibly requiring a more complex tree
traversal. Our approach is optimal provided that the cost of a load instruction is never less
than the cost of negating a value in a register.
Result  Operand  Operand    k  Reverse    Negate  Actual     Method
Form    Forms    Locations     Operands           Operation
c       cc       rr         1  false      false   plus       a + b
                 rm         1  false      false   plus       a + b
                 mr         1  true       false   plus       b + a
                 mm         2  false      false   plus       a + b
        ci       rr         1  false      false   minus      a - (-b)
                 rm         1  false      false   minus      a - (-b)
                 mr         2  true       true    minus      -(-b - a)
                 mm         2  false      false   minus      a - (-b)
        ic       rr         1  true       false   minus      b - (-a)
                 rm         2  false      true    minus      -(-a - b)
                 mr         1  true       false   minus      b - (-a)
                 mm         2  true       false   minus      b - (-a)
        ii       rr         2  false      true    plus       -(-a + (-b))
                 rm         2  false      true    plus       -(-a + (-b))
                 mr         2  true       true    plus       -(-b + (-a))
                 mm         3  false      true    plus       -(-a + (-b))
i       cc       rr         2  false      true    plus       -(a + b)
                 rm         2  false      true    plus       -(a + b)
                 mr         2  true       true    plus       -(b + a)
                 mm         3  false      true    plus       -(a + b)
        ci       rr         1  true       false   minus      -b - a
                 rm         2  false      true    minus      -(a - (-b))
                 mr         1  true       false   minus      -b - a
                 mm         2  true       false   minus      -b - a
        ic       rr         1  false      false   minus      -a - b
                 rm         1  false      false   minus      -a - b
                 mr         2  true       true    minus      -(b - (-a))
                 mm         2  false      false   minus      -a - b
        ii       rr         1  false      false   plus       -a + (-b)
                 rm         1  false      false   plus       -a + (-b)
                 mr         1  true       false   plus       -b + (-a)
                 mm         2  false      false   plus       -a + (-b)
c means that the sign of the operand is not inverted
i means that the sign of the operand is inverted
r means that the value of the operand is in a register
m means that the value of the operand is in memory
k is a typical cost of the operation in instructions
Table 10.2: Addition on a Machine with Both Memory and Register Operands
When we apply algebraic identities on a machine with both register-register and register-
memory instructions, the local register allocation process should assume that each computa-
tional instruction can accept any of its operands either in a register or in memory, and returns
its result to a register (the general model proposed in Section 10.2.1). This assumption leads
to the proper register requirement, and allows complete freedom in applying the identities.
Local register allocation decides the evaluation order of the operands, but leaves open the
question of which operand is left and which is right. Algebraic identities, on the other hand,
deal with the choice of left and right operands but make no decisions about evaluation order.
IBM 370 instructions. The situation could be different on the PDP11, where explicit assign-
ments to the program counter are possible. Computers like the Motorola 68000 and PDP11,
which provide stack instructions, also require information about the storage class `stack'. The
actual representation in the descriptor depends upon how many stacks there are and whether
only the top element or also lower elements can be accessed. We restrict ourselves here to
two storage classes: `main storage' and `registers'. Similar techniques can be used for other
storage classes.
type
  main_storage_access = record
    base , index : ^value_descriptor ;
    displacement : internal_int ;
  end;
  value_descriptor = record
    tmode : target_type ; (* Pointer to target definition table *)
    case class : value_class of
      literal_value :
        (lval : internal_int );
      label_reference , procedure_reference :
        (code : assembler_symbol ;
         environment : ^value_descriptor );
      general_register , register_pair , floating_point_register :
        (reg : ^register_descriptor );
      memory_address , memory_value :
        (location : main_storage_access )
  end;
  register_descriptor = record
    state : register_state ;
    content : ^value_descriptor ;
    memory_copy : main_storage_access ;
  end;
Figure 10.10: Descriptors for Implementing LAX on the IBM 370
When an access function is realizable within a given addressing structure, we say that the
accessed object is addressable within that structure. If an object required by the computation
is not addressable then the code generator must issue instructions to manipulate the state,
making it addressable, before it can be used. These manipulations can be divided into two
groups, those required by source language concepts and those required by limitations on the
addressing structure of the target machine. Implementing a reference with a pointer variable
would be an example of the former, while loading a value into an index register illustrates
the latter. The exact division between the groups is determined by the structure of the main
storage access function implemented in the descriptors. We assume that every non-literal leaf
of the structure tree is addressable by this access function. The main storage access function
of Figure 10.10 is stated in terms of a base, an index and a displacement. The base refers
to an allocatable object (Section 10.1) whose address may, in general, be computed during
execution. The index is an integer value computed during execution, while the displacement
is fixed at compile time. Index and displacement values are summed to yield the relative
address of the accessed location within the allocatable object referred to by the base.
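On the IBM 370 the addressability test implied by this access function might be sketched as follows; the nil conventions follow Figure 10.10, while the 12-bit displacement bound is a property of the 370 operand format assumed here.

function addressable (a : main_storage_access ) : boolean ;
  (* true if the access fits a base-index-displacement operand:
     base and index (when present) already in general registers,
     displacement representable in 12 bits *)
begin
  addressable :=
    ((a.base = nil ) or (a.base^.class = general_register )) and
    ((a.index = nil ) or (a.index^.class = general_register )) and
    (a.displacement >= 0) and (a.displacement <= 4095)
end;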
If the access is to statically-allocated storage then the `allocatable object' to which the
accessed object belongs is the entire memory. We indicate this special case by a nil base, and
the displacement becomes the static address. A more interesting situation arises when the
access is to storage in the activation record of a LAX procedure.
Figure 10.11 shows a LAX program with five static nesting levels. If we associate activation
records only with procedures (Section 3.3.2) then we need consider only three levels. Value
descriptors for the three components of the assignment in the body of q could be constructed
as shown in Figure 10.11b.
The level array is built into the compiler with an appropriate maximum size. When the
compiler begins to translate a procedure, it ensures one value descriptor for each level up to the
level of the procedure. Initially, the descriptor at level 1 indicates that the global activation
record base address can be found in register 12 and the descriptor at the procedure's level
indicates that the local activation record base address can be found in register 13. Base
addresses for other activation records can be found by following the static chain, as indicated
by the descriptor at level 2. This initial condition is determined by the mapping specification.
We are assuming here that the LAX-to-IBM 370 mapping specification makes the global
register allocation proposed at the beginning of Section 10.2.1.
When a value descriptor is created for a variable, its base is simply a copy of the level
array element corresponding to the variable's static nesting depth. (The program is assumed
at level 0 here.) The index field for a simple variable's access function is nil (indicated in
Figure 10.11b by an empty field) and the displacement is the offset of the variable within the
activation record. For array variables, the index field points to the value descriptor of the
index, and the displacement is the fictitious offset discussed in Section 3.2.2.
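The construction of such a descriptor might be sketched as follows; value_descriptor_ptr is an assumed alias for ^value_descriptor, level_array is the compiler's level array, and the variant tag is set directly for brevity.

function variable_descriptor (depth : integer ;
                              offset : internal_int ;
                              index : value_descriptor_ptr ) : value_descriptor_ptr ;
var v : value_descriptor_ptr ;
begin
  new (v );
  v^.class := memory_address ;
  v^.location.base := level_array [depth ];  (* copied, not recomputed *)
  v^.location.index := index ;               (* nil for a simple variable *)
  v^.location.displacement := offset ;
  variable_descriptor := v
end;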
The access function for a value may change as instructions that manipulate the value
are generated. For example, suppose that we generate code to carry out the assignment in
Figure 10.11a, starting from the machine state described by Figure 10.11b. We might first
consider generating a load instruction for b. Unfortunately, b is not addressable; the IBM 370
load instruction requires that the base be in a register. Thus we must first obtain a register
(say, general register 1) and load the base address for the activation record at level 2 into
it. When this instruction has been generated, we change the value descriptor for the base to
have a value class of general register and indicate general register 1. Generation of the
load for b is now possible, and the value descriptor for b must be altered to reflect the fact
that it is in (say) general register 3.
There is one register descriptor for each register used by the code generator. This includes
both the registers controlled by the local register allocation and globally-assigned registers.
declare
a : integer ;
procedure p;
declare
b : integer ;
procedure q (c : integer ); a := b + c
begin
b := 1; q (2)
end
begin
p
end
a) A LAX program
[Figure 10.11b (diagram): the level array holds one value descriptor per static nesting level; the level-1 entry is a general register descriptor for register 12, the level-3 entry one for register 13, and the level-2 entry a memory value reached through the static chain; the descriptors for a, b and c are memory addresses formed from the corresponding base plus the variable's offset.]
Immediately after a load or store instruction, the contents of a register are a copy of
the contents of some memory location. This `copy' relationship represents a condition that
occurs during execution, and to specify it the register descriptor must be able to define a
memory access function. This access function is copied into the register descriptor from a
value descriptor at the time the two are linked; it might describe the location from which the
register was loaded or that to which it was stored. Some care must be exercised in deciding
when to establish such a relationship: The code generator must be able to guarantee that
the value in memory will not be altered by side effects without explicitly terminating the
relationship. Use of programmer-defined variables is particularly dangerous because of this
requirement, but use of compiler-generated temporaries and activation record bases is safe.
if free registers exist then choose one arbitrarily
else if copy registers exist then choose the least-recently accessed
else
  begin
  choose the least-recently accessed unique register;
  allocate a temporary memory location;
  emit a store instruction;
  end;
if chosen register has an associated value descriptor then
  de-link the value descriptor;
lock the chosen register;
Figure 10.12: Register Management
The register assignment algorithm should not make a random choice when asked to assign
a register (Figure 10.12). If some register is in state free , it may be assigned without penalty.
A register whose state is copy may be assigned without storing its value, but if this value is
needed again it will have to be reloaded. The contents of a register whose state is unique must
be stored before the register can be reassigned, and a locked register cannot be reassigned
at all. All globally-allocated registers are locked throughout the simulation. The states of
locally-allocated registers change during the simulation; they are always free at a label.
As shown in Figure 10.12, the register assignment algorithm locks a register when it is
assigned. The code selection routine requesting the register then links it to the proper value
descriptor, generating any code necessary to place the value into the register. If the value is
the result of a node with the store attribute then the register descriptor state is changed to
unique . This makes the register available for reassignment, and guarantees that the value
will be saved if the register is actually reassigned. When a value descriptor is destroyed, it is
rst de-linked from any associated register descriptor. The state of the register descriptor is
changed to free if the register descriptor species no memory copy; otherwise it is changed
to copy . In either case it is available for reassignment without any requirement to store
its contents. The local register allocation algorithm of Section 10.2.1 guarantees that the
simulator can never block due to all registers being locked.
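Rendered in Pascal, the assignment algorithm of Figure 10.12 might read as follows; find_register, oldest_register, store_into_temporary and de_link are assumed helpers hiding the register scan and the descriptor bookkeeping.

function assign_register : register_descriptor_ptr ;
var r : register_descriptor_ptr ;
begin
  r := find_register (free );        (* any free register will do *)
  if r = nil then
    r := oldest_register (copy );    (* least-recently accessed copy *)
  if r = nil then begin
    r := oldest_register (unique );  (* must spill: emit a store and
                                        record the memory copy *)
    store_into_temporary (r )
  end;
  if r^.content <> nil then
    de_link (r^.content );           (* detach the old value descriptor *)
  r^.state := locked ;
  assign_register := r
end;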
10.3.2 Code Transformation
We traverse the structure tree in execution order, carrying out a simulation of the target
machine's behavior, in order to obtain the final transformation of the structure tree into a
sequence of instructions. When the traversal reaches a leaf of the tree, we construct a value
descriptor for the object that the leaf represents. When the traversal reaches an interior node,
a decision table specific to that kind of node is consulted. There is at least one decision table
for every abstract operation, and if the traversal visits the node more than once then each
visit may have its own decision table. The condition stubs of these decision tables involve
attributes of the node and its descendants.
Result correct    Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y N N N N N N N N N N N N N N N N
l correct         Y Y Y Y Y Y Y Y N N N N N N N N Y Y Y Y Y Y Y Y N N N N N N N N
r correct         Y Y Y Y N N N N Y Y Y Y N N N N Y Y Y Y N N N N Y Y Y Y N N N N
l in register     Y Y N N Y Y N N Y Y N N Y Y N N Y Y N N Y Y N N Y Y N N Y Y N N
r in register     Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N
swap(l, r)            X       X   X   X X     X       X   X   X X     X       X
lreg(l, desire)         X       X       X       X       X       X       X       X
gen(A, l, r)        X X X                   X X X   X X X                   X X X
gen(AR, l, r)     X                       X       X                       X
gen(S, l, r)                X X X   X X X                   X X X   X X X
gen(SR, l, r)             X       X                       X       X
gen(LCR, l, r)                X     X     X X X X X X X X   X         X
free(r)           X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
result(l, store)  X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
"correct" means the sign is not inverted
l = value descriptor of the left operand, r = value descriptor of the right operand
desire = desire attribute of the current node
store = store attribute of the current node
A, AR, S, SR and LCR are IBM 370 instructions
Figure 10.13: IBM 370 Decision Table for + (integer, integer) → integer, Based on Table 10.2
Figure 10.13 shows a decision table for integer addition on the IBM 370 that is derived
from Table 10.2. The condition stub uses the form and location attributes discussed in
Section 10.2.3 to select a single column, and the elements of the action stub corresponding to
X's in that column are carried out in sequence from top to bottom. These actions are based
primarily upon the value descriptors for the operands, but they may interrogate any of the
node's attributes. They are basically of two kinds, machine state manipulation and instruction
generation, although instructions must often be generated as a side effect of manipulating the
machine state.
Four machine state manipulation actions appear in Figure 10.13: swap (l, r) simply
interchanges the contents of the value descriptors for the left and right operands. A regis-
ter is allocated by lreg (l, desire) , taking into account the preference discussed in Sec-
tion 10.2.2. This action also generates an instruction to load the allocated register with the
value specified by value descriptor l , and then links that value descriptor to the register
descriptor of the allocated register. After the code to carry out the addition has been gen-
erated, registers that might have been associated with the right operand must be freed and
the descriptor for the register holding the left operand must be linked to the value descriptor
for the result. If the store attribute is true then the result register descriptor state is set to
unique ; otherwise it remains locked as discussed in Section 10.3.1.
Figure 10.13 contains one action to generate the RR-format of the add instruction and
another to generate the RX-format. A single action could have been used instead, deferring
the selection to assembly. The choice between having the code generator select the instruction
format and having the assembler select it is made on grounds of convenience. In our case the
code generator possesses all of the information necessary to make the selection; for machines
with several memory addressing formats this is not always true because the proper format
generation tasks. The specifications of the two tasks remain distinct; their merging, however,
is an implementation decision that can be carried out automatically.
`Peephole optimization' [McKeeman, 1965] uses a machine simulation, and capitalizes
upon relationships that arise when certain code fragments are joined together. Wilcox
[1971] proposed a code generator consisting of two components, a transducer (which essen-
tially evaluates attributes) and a simulator (which performs the machine simulation and code
selection). He introduced the concepts of value and register descriptors in a form quite similar
to that discussed here. Davidson and Fraser [1980] use a simulation following a simple
code selector based upon a depth-first, left-to-right traversal of the structure tree with no
attempt to be clever about register allocation. They claim that this approach is easier to
automate, and gives results approaching those of far more sophisticated techniques.
Formulation of the code selection process in terms of decision tables is relatively rare in
the literature, although they seem to be the natural vehicle for describing it. A number of
authors [Elson and Rake, 1970; Wilcox, 1971; Waite, 1976] have proposed special code
generator description languages that effectively lead to programmed decision trees. Gries
[1971] mentions decision tables, but only in the context of a rather specialized implementation
used by the IBM FORTRAN H compiler [Lowry and Medlock, 1969]. This technique,
known as `bit strips', divides the conditions into two classes. Conditions in the first class
select a column of the table, while those in the second are substituted into particular rows of
the selected column. It is useful only when a condition applies to some (but not all) elements
of a row. The technique precludes the use of a bit matrix because it requires each element to
specify one of three possibilities (execute, skip and substitute) instead of two.
Glanville and Graham [1978] use SLR(1) parse tables as a data structure implementa-
tion of the decision tables; this approach has also been used in the context of LALR(1) parse
tables by Jansohn et al. [1982].
Exercises
10.1 Complete the definition of the memory mapping module outlined in Figure 10.1 for a
machine of your choice.
10.2 Devise a linear algorithm to rearrange the fields of a record to minimize waste space,
assuming that the only possible alignments are 1 and 2. (The DEC PDP11 and Intel
8086 have this property.)
10.3 [Aho and Johnson, 1976] Consider an expression tree attributed according to the
rules of Figure 10.4.
(a) State an execution-order traversal algorithm that will produce optimum code when
arithmetic instructions are emitted at the postfix encounters of interior nodes.
(b) State the conditions under which LOAD and STORE instructions will be emitted
during the traversal of (a).
(c) Show that the attribution of Figure 10.4 is inadequate in the case where some
arithmetic operations can be carried out only by instructions that require one
operand in memory.
(d) Show that optimum code can be produced in case (c) if it is possible to create
a queue of pointers to the tree and use this queue to guide the execution-order
traversal.
10.4 Extend the attribution of Figure 10.4 to handle expression nodes with arbitrary num-
bers of operands, all of which must be in registers.
10.5 [Bruno and Lassagne, 1975] Suppose that the target computer has a stack of fixed
depth instead of a set of registers. (This is the case for most floating point chips
available for microprocessors.) Show that your algorithm of Exercise 10.4 will still
work if extra constraints are placed upon the allowable permutations.
10.6 What changes would you make in your solution to Exercise 10.4 if some of a node's
operands had to be in memory and others in registers?
10.7 Show that the attribution rules of Figure 10.6 obey DeMorgan's law, i.e. that either
member of the following pairs of LAX expressions leads to the same set of attributes
for a and b:
not (a and b), not a or not b
not (a or b), not a and not b
10.8 Modify Figure 10.6 for a language that does not permit short-circuit evaluation. What
corresponding changes must be made in the execution-order determination?
10.9 [Elson and Rake, 1970] The PL/1 LENGTH function admits optimizations of string ex-
pressions analogous to short-circuit evaluation of Boolean expressions: LENGTH (A||B )
becomes LENGTH (A) + LENGTH (B ). ('||' is the concatenation operator.) Devise targeting
attributes to carry this information and show how they are propagated.
10.10 Show that the unary complement elimination discussed in Section 10.2.3 also minimizes
register requirements.
10.11 Extend Table 10.1 to include division.
10.12 Show that the following relation holds for the cost attribute (Figure 10.9) of any ex-
pression node:
|cost[correct].length - cost[inverse].length| <= L
where L is the length of a negation operator. (This condition must hold for all op-
erations, not just those illustrated in Table 10.1.) What follows from this if register-
memory instructions are also allowed?
10.13 What changes would be required in Figure 10.9 for a machine with a `load negative'
instruction that places the negative of a memory value into a register?
10.14 Modify Figure 10.8 for a machine with both register-register and register-memory in-
structions. Write a single set of attribution rules incorporating the tasks of both Fig-
ure 10.4 and Figure 10.9.
10.15 Specify descriptors to be used in implementing LAX on some computer other than
the IBM 370. Carefully explain any difference between your specification and that of
Figure 10.10.
10.16 Under what circumstances could a LAX code generator link register values to
programmer-defined variables? Do you believe that the payoff would justify the analysis
required?
10.17 There is no guarantee that the heuristic of Figure 10.12 will produce optimal code.
Under what circumstances would the code improve when unique registers were chosen
before copy registers?
10.18 Give, for a machine of your choice, the remaining decision tables necessary to translate
LAX trees involving simple integer operands and operators from Table A.2.
Chapter 11
Assembly
The task of assembly is to convert the target tree produced by the code generator into the
target code required by the compiler specification. This target code may be a sequence of bit
patterns to be interpreted by the control unit of the target machine, or it may be text subject
to further processing by a link editor or loader. In either case, the assembler must determine
operand addresses and resolve any issues left open by the code generator.
Since the largest fraction of the compilers for most machines originates from the manufac-
turer, the manufacturer's target code format provides a de facto standard that the compiler
writer should use: If the manufacturer's representation is abandoned then all access to the
software already developed using other compilers, and probably all that will be developed in
the future at other installations, is lost. For the same reason, it is best to use manufacturer-
supplied link editors and loaders to carry out the external address resolution. Otherwise, if
the target code format is extended or changed then we must alter not only the compilers,
but also the resolution software that we had developed. We shall therefore assume that the
output of the assembly task is a module rather than a whole program, and that external ad-
dress resolution is to be provided by other software. (If this is not the case, then the encoding
process is somewhat simplied.)
Assembly is essentially independent of the source language, and should be implemented by
a common module that can be used in any compiler for the given machine. To a large extent,
this module can be made machine-independent in design. Regardless of the particular com-
puter, it must be able to resolve operand addresses and encode instructions. The information
required by different link editors and loaders does not vary significantly in content. In this
chapter we shall discuss the two main subtasks of assembly, internal address resolution and
instruction encoding, in some detail. We shall sketch the external address resolution problem
briefly in order to indicate the kind of information that must be provided by the compiler;
two specic examples of the way in which this information is represented can be found in
Chapter 14.
type
  label_element = record
    uid : integer ;              (* Unique identification for the label *)
    base : integer ;             (* Sequence to which the label belongs *)
    relative_address : integer ; (* Address of the label in the sequence *)
  end;
  origin_element = record
    uid : integer ;              (* Unique identification for the sequence *)
    length : integer ;           (* Space occupied by the sequence *)
    case k : origin_class of
      arbitrary : ();
      based : (origin : address_exp )
  end;
a) Types used in the environments of Figure 11.1
type
  address_exp = record
    case k : expr_class of
      absolute :
        (value : integer_value );  (* Provided by the constant table *)
      relative :
        (label : integer );        (* Unique identification of the referenced label *)
      computation :
        (rator : (add , sub );
         right , left : ^address_exp )
  end;
b) Types used to represent address expressions
Figure 11.2: The Environment Attributes
location of its target; in rare cases the length of a constant-setting instruction may depend
upon the value of an expression (LABEL1 - LABEL2 ). In the remainder of this section we shall
consider only the former situation, and restrict the operand of the span-dependent instruction
to a simple label.
Span-dependence does not change the basic attribution of Figure 11.1, but it requires that
an extra attribute be constructed. This attribute, called mod list , consists of linked records
whose form is given in Figure 11.3a. Mod list is initialized and propagated in exactly the
same way as label env . Elements are added to it at span-dependent instructions as shown in
Figure 11.3b. The function instr size returns the minimum length of the span-dependent
instruction, and this value is used to determine origin values as discussed in Section 11.1.1.
The next step is to construct a relocation table that can be consulted whenever a label
value must be determined. Each relocation table entry specifies the total increase in size for
all span-dependent instructions lying below a given address (relative or absolute). When the
label address calculation of Section 11.1.1 indicates an address lying between two relocation
table entries, it is increased by the amount specied in the lower entry.
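As an illustration, the consultation step might be coded as follows. This is only a sketch: the fixed-size table, the linear scan, the assumption that entries are sorted by increasing address, and the names (reloc_entry, relocated) are ours, not part of the book's figures.

program reloc_sketch;
const
  max_reloc = 100;
type
  reloc_entry = record
    address : integer ;  (* address below which the increase applies *)
    increase : integer   (* total growth of span-dependent instructions below it *)
  end;
var
  table : array [1 .. max_reloc] of reloc_entry;
  entries : integer;

(* Adjust a label address computed as in Section 11.1.1: apply the increase
   recorded in the highest entry whose address lies at or below the label. *)
function relocated (addr : integer) : integer;
var
  i, adjust : integer;
begin
  adjust := 0;
  for i := 1 to entries do
    if table[i].address <= addr then adjust := table[i].increase;
  relocated := addr + adjust
end;

begin
  entries := 2;
  table[1].address := 50;  table[1].increase := 1;  (* one word of growth below 50 *)
  table[2].address := 90;  table[2].increase := 3;  (* three words of growth below 90 *)
  writeln (relocated (40));   (* 40: below all growth *)
  writeln (relocated (70));   (* 71: the lower entry applies *)
  writeln (relocated (100))   (* 103 *)
end.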
type
  mod_element = record
    base : integer ;              (* Sequence in which instruction appears *)
    relative_address : integer ;  (* Address of the instruction in the sequence *)
    operand : integer ;           (* Unique identification for the operand label *)
    instr : machine_op ;          (* Characterization of the instruction *)
  end;
a) Type used in mod list
rule nodes ::= nodes span_dependent_operation
attribution
  nodes[1].length ←
    nodes[2].length + instr_size (span_dependent_operation.instr );
  nodes[1].mod_list ←
    nodes[2].mod_list &
      N_mod_element (
        nodes[1].base ,
        nodes[2].length ,
        span_dependent_operation.operand_uid ,
        span_dependent_operation.instr );
b) Calculation of mod list
Figure 11.3: Span-Dependent Instructions
The properties of the span-dependent instructions are embodied in a module that provides
two operations:
Too short (machine op , integer ) boolean : Yields true if the instruction defined by
machine op cannot have its operand at the (signed) distance from the instruction given
by the integer.
Lengthen (machine op , integer ) integer : Updates the given machine op , if necessary, so
that the instruction defined can have its operand at the (signed) distance given by the
integer. Yields the increase in instruction size resulting from the change.
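For a machine with a short branch (one word, operand within -128..127 words) and a long branch (two words), these operations might be sketched as follows. The machine_op encoding, the distance limits and the demonstration values are invented for the example; the book specifies only the two signatures above.

program span_ops_sketch;
type
  machine_op = (short_branch , long_branch );

function too_short (op : machine_op ; distance : integer) : boolean;
begin
  too_short := (op = short_branch) and
               ((distance < -128) or (distance > 127))
end;

function lengthen (var op : machine_op ; distance : integer) : integer;
begin
  if too_short (op , distance) then
    begin
      op := long_branch;   (* the long form reaches any operand *)
      lengthen := 1        (* the instruction grows by one word *)
    end
  else
    lengthen := 0
end;

var
  op : machine_op;
begin
  op := short_branch;
  writeln (lengthen (op , 300));   (* 1: the long form is now required *)
  writeln (lengthen (op , 300))    (* 0: already long enough *)
end.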
The relocation table is built by the following algorithm:
1. Establish an empty relocation table.
2. Make the first element of mod list current.
3. Calculate the addresses of the span-dependent instruction represented by the current
element of mod list and its operand, using the current environments and relocation
table.
4. Apply too short to the (signed) distance between the span-dependent instruction and
its operand. If the result is false , go to step 6.
5. Lengthen the instruction and update the relocation table accordingly. Go to step 2.
6. If elements remain in mod list , make the next element current and go to step 3.
Otherwise stop.
This algorithm has running time proportional to n^2 in the worst case (n is the number
of span-dependent instructions), even when each span-dependent instruction has more than
two lengths.
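The loop might be rendered in the style of the book's figures as follows. This is a sketch only: the next field linking mod list elements, first, and the helper routines (clear_relocation_table, instruction_address, operand_address, update_relocation_table) stand for the machinery described above and are not defined here.

procedure resolve_spans;
var
  m : ^mod_element;
  distance , growth : integer;
begin
  clear_relocation_table;                        (* step 1 *)
  m := first (mod_list );                        (* step 2 *)
  while m <> nil do
    begin
      distance := operand_address (m^) -
                  instruction_address (m^);      (* step 3 *)
      if too_short (m^.instr , distance ) then   (* step 4 *)
        begin
          growth := lengthen (m^.instr , distance );
          update_relocation_table (m^, growth ); (* step 5 *)
          m := first (mod_list )                 (* go to step 2 *)
        end
      else
        m := m^.next                             (* step 6 *)
    end
end;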
Span-dependency must be resolved separately in each portion of the program that depends
upon a different origin (see the end of Section 11.1.1). If span-dependent instructions provide
cross-references between portions based on different origins then either all analysis of span-
dependence must be deferred to external address resolution or some arbitrary assumption
must be made about the cross-referencing instructions. The usual approach is to optimize
span-dependent instructions making internal references and use the longest version of any
cross-referencing instruction.
taken as advice that the compiler writer should design his own representation! As noted at
the beginning of the chapter, we strongly advocate use of manufacturer-supplied link editors
and loaders for external address resolution.
11.2.1 Cross-Referencing
In many respects, external address resolution is analogous to internal address resolution: Each
module is a single code sequence with certain locations (usually called entry points, although
they may be either data or code addresses) distinguished. These locations are analogous
to the label nodes in the internal address resolution case. The module may also contain
address expressions that depend upon values (usually called external references ) not defined
within that module. These values are analogous to the label references in the internal address
resolution case. When the modules are combined, they can be considered to be a list of
independent code sequences and all of the techniques discussed in Section 11.1 can be carried
over.
There can be some benefit in going beyond the analogy discussed in the previous para-
graph, and simply deferring the internal address resolution until all modules have been gath-
ered together. Under those circumstances one could optimize the length of inter-module
references as well as intra-module references (Section 11.1.2). We believe that the benefits
are not commensurate with the costs, however, since inter-module references should be
relatively rare.
Two basic mechanisms are available for establishing inter-module references: transfer
vectors and direct substitution. A transfer vector is best suited to references involving a
transfer of control. It is a block of memory, included in each module that contains external
references, consisting of one element for each distinct external symbol referenced (Figure 11.4).
The internal address resolution process replaces every external reference with a reference to
the corresponding element of the transfer vector, and the external address resolution process
fills each transfer vector element with the address of the proper entry point. When the
machine architecture permits indirect addressing, the initial reference is indirect and may
be either a control or a data reference. If the machine does not provide indirect addressing
via main memory, the transfer vector address must be loaded into a base register for the
access. When the address length permits jumps to arbitrary addresses, we might also place
an unconditional jump to the entry point in the transfer vector and implement a call as a call
to that transfer vector entry.
Direct substitution avoids the indirection inherent in the transfer vector mechanism: The
actual address of an entry point is determined during external address resolution and stored
into the instruction that references it. Even with the transfer vector mechanism, direct
substitution is required within the transfer vector itself. In the final analysis, we use a
transfer vector because it reduces to one the number of changes that must be made when the
address of an entry point changes, and concentrates these changes at a particular point in the
program. Entry point addresses may change statically, as when a module is newly compiled
and bound without altering the program, or they may change dynamically, as when a routine
resides in memory temporarily. For example, service routines in an operating system are
often `transient': they are brought into memory only when needed. The operating system
provides a transfer vector, and all invocations of service routines must go via this transfer
vector. When a routine is not in memory, its transfer vector entry is replaced by a jump to
a loader. Even if the service routines are not transient, a transfer vector is useful: When
changes made to the operating system result in moving the service routine entry points, only
the transfer vector is altered; there is no need to fix up the external references of all user
programs. (Note that in this case the transfer vector is a part of the operating system, not
of each module using the operating system as discussed in the previous paragraph. If the
vector occupies a fixed location in memory, however, it may be regarded either as part of the
module or as part of the operating system.)
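A minimal sketch of the mechanism follows, with addresses modelled as plain integers and all names invented: external address resolution fills the vector (bind), every compiled reference goes through it (call_target), and moving an entry point therefore requires exactly one change.

program transfer_vector_sketch;
const
  vector_size = 3;
type
  address = integer;
var
  transfer_vector : array [1 .. vector_size] of address;

(* External address resolution fills a vector element with an entry point. *)
procedure bind (slot : integer; entry_point : address);
begin
  transfer_vector[slot] := entry_point
end;

(* A compiled call references the slot, never the entry point itself. *)
function call_target (slot : integer) : address;
begin
  call_target := transfer_vector[slot]
end;

begin
  bind (1, 4096);               (* routine currently loaded at 4096 *)
  writeln (call_target (1));    (* the indirect reference yields 4096 *)
  bind (1, 8192);               (* routine moved: one change suffices *)
  writeln (call_target (1))     (* all references now reach 8192 *)
end.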
In the remainder of this section we shall consider the details of the direct substitution
mechanism. As pointed out earlier, this is analogous to internal address resolution. We shall
therefore concern ourselves only with the differences between external and internal resolution.
These differences lie mainly in the representation of the modules.
A control dictionary is associated with each module to provide the following information:
Length of the module.
Locations of entry points relative to the beginning of the module.
Symbols used to denote entry points and external values.
Fields within the module that represent addresses relative to the beginning of the mod-
ule.
Fields within the module that represent external references.
Additional information about the size of external areas may also be carried, to support
external static data areas such as FORTRAN COMMON.
The module length, relative entry point addresses and symbols are used to establish
an attribute analogous to label element . Note that this requires a traversal of the list
of modules, but not of the individual modules themselves. After this attribute is known,
the fields representing relative and external addresses must be updated. A relative address
is updated by adding the address of the module origin; the only information necessary to
characterize the field is the fact that it contains a relative address. One common way of
encoding this information is to associate relocation bits with the module text. The precise
relationship between relocation bits and fields depends upon the machine architecture. For
example, on the PDP11 a relative address occurring in an instruction must occupy one word.
We might therefore use one relocation bit per word, 1 indicating a relative address. Note
that this encoding precludes other placement of relative addresses, and may therefore impose
constraints upon the code generator's mapping of data structures to be initialized by the
compiler.
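The update itself is simple, as the following sketch shows for the one-bit-per-word scheme just described; the module contents and all names are assumptions chosen for the example.

program relocation_sketch;
const
  module_length = 4;
var
  text : array [1 .. module_length] of integer;   (* module text, one word each *)
  reloc : array [1 .. module_length] of boolean;  (* true: word holds a relative address *)
  origin : integer;
  i : integer;
begin
  text[1] := 17;  reloc[1] := false;  (* literal constant: unchanged *)
  text[2] := 2;   reloc[2] := true;   (* relative address: relocate *)
  text[3] := 100; reloc[3] := false;
  text[4] := 0;   reloc[4] := true;
  origin := 1000;                     (* module origin assigned when modules are combined *)
  for i := 1 to module_length do
    if reloc[i] then text[i] := text[i] + origin;
  for i := 1 to module_length do writeln (text[i])
end.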
To characterize an external reference we must specify the particular external symbol in-
volved in addition to the fact that an external reference occurs in the field. The concept of
a relocation bit can be extended to cover the existence of an external reference by adding a
third state: For a particular field the possibilities are `no change', `relative' and `external'.
The field itself then contains an integer specifying the particular external symbol.
There are two disadvantages to this strategy for characterizing external references. The
most important is that it does not permit an address relative to an external symbol, since the
field must be used to define the symbol itself. Data references, especially those to external
arrays like FORTRAN COMMON, tend to violate this constraint. A second disadvantage is
that the number of relocation bits for every field is increased, although only a small minority
of the fields may actually contain external references. Both disadvantages may be overcome
by maintaining a list of all fields containing external references relative to a particular symbol.
The field itself contains the relative address and the symbol address is simply added to it,
exactly as a relative address is updated. (This same strategy can be used instead of relocation
bits for relative addresses on machines whose architectures tend to make relative addresses
infrequent; the IBM 370 is an example.)
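The corresponding sketch for external references keeps, per symbol, a list of the fields that reference it; each field already holds the relative part, so resolution is a simple addition. All names and values here are invented for the illustration.

program fixup_sketch;
const
  module_length = 4;
  max_refs = 2;
var
  text : array [1 .. module_length] of integer;
  ref_field : array [1 .. max_refs] of integer;  (* fields that reference symbol SYM *)
  sym_address, i : integer;
begin
  text[1] := 0;  text[2] := 8;  text[3] := 0;  text[4] := 8;
  ref_field[1] := 2;  ref_field[2] := 4;  (* words 2 and 4 each hold SYM+8 *)
  sym_address := 5000;                    (* resolved entry point of SYM *)
  for i := 1 to max_refs do
    text[ref_field[i]] := text[ref_field[i]] + sym_address;
  for i := 1 to module_length do writeln (text[i])
end.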
The result of the cross-referencing process could be a ready-to-run program, with all
addresses absolute, or it could be a single module with relative addresses, entry points and
external references that can be used as input to further linkage steps. In the latter case, the
input must specify not only the modules to be linked but also the entry points to be retained
after linkage. External references will be retained automatically if and only if they do not
refer to entry points of other input modules.
simply result in invocations of the set location operation. The remaining nodes must be
encoded by invoking one or more of the last three operations dened in the previous section.
Constants may appear as literal values to be incorporated directly into the target code, or
they may be components of address expressions. In the latter case, the result of the expression
could be used as data or as an operand of an instruction. Literal values must be converted
using the internal-to-target conversion operations of the constant table (Section 4.2.2), and
then inserted into the target code by absolute text . An address expression is evaluated
as outlined in Exercise 11.9. If the result is used as data then the appropriate target code
operation is used to insert it; otherwise it is handled by the instruction encoding.
In the simplest case the abstract instructions correspond to unique operation codes of the
real machine. In general, however, the correspondence is not so simple: One abstract opera-
tion can represent several instructions, or one of several operation codes could be appropriate
depending upon the operand access paths. Decisions are thus made during instruction en-
coding on the basis of the abstract operator and the attributes of the operand(s) just as in
the case of code generation.
The basic instruction encoding operations are called formats. They are procedures that
take sets of values and add them to the target code so that the result is a single instruction.
These procedures sometimes correspond to the instruction formats recognized by the target
machine's control unit, and hence their name. In many cases, however, the instruction format
shows regularities that can be exploited to reduce the number of encoding formats. For
example, the five instruction formats of the IBM 370 (Figure 11.5a) might correspond to only
three encoding formats (Figure 11.5b).
RR opcode R1 R2
RX opcode R1 X2 B2 D2
RS opcode R1 R3 B2 D2
SI opcode I2 B1 D1
SS opcode L1 L2 B1 D1 B2 D2
a) Instruction formats
FR opcode R1 R2
FI opcode I
FM B D
b) Encoding formats
Figure 11.5: IBM 370 Formats
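As a concrete illustration, the three encoding formats of Figure 11.5b might be realized as procedures that append bytes to the target code. This is a sketch: the byte buffer, the packing of two 4-bit register numbers into one byte, and the assumption that the base register has already been chosen (where the text says the assembler picks it) are ours.

program format_sketch;
const
  code_size = 100;
var
  code : array [1 .. code_size] of integer;  (* target code, one byte per element *)
  next : integer;

procedure emit (b : integer);
begin
  code[next] := b;
  next := next + 1
end;

procedure FR (opcode , r1 , r2 : integer);    (* opcode byte, two 4-bit register fields *)
begin
  emit (opcode);
  emit (r1 * 16 + r2)
end;

procedure FI (i : integer);                   (* one immediate byte *)
begin
  emit (i)
end;

procedure FM (base , displacement : integer); (* base/displacement packed into two bytes *)
begin
  emit (base * 16 + displacement div 256);    (* B (4 bits) and high 4 bits of D *)
  emit (displacement mod 256)                 (* low 8 bits of D *)
end;

begin
  next := 1;
  FR ($1A, 1, 2);   (* AR 1,2 : an RR instruction is a single FR call *)
  FR ($5A, 1, 3);   (* A 1,100(3,13) : an RX instruction is FR ... *)
  FM (13, 100)      (* ... followed by FM, as in Figure 11.6 *)
end.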
An instruction is encoded by calling a sequence of one or more format-encoding proce-
dures. The process can be described in a language resembling a normal macro assembly
language. Figure 11.6 shows a portion of a description of the IBM 370 instruction encod-
ing cast in this form. Each macro body species the sequence of format invocations, using
constants or macro parameters (denoted by the character `%' followed by the position of the
parameter) as arguments. A separate directive, NAME, is used to associate the macro body
with an instruction because many instructions can often use the same encoding procedure.
AR NAME 1AH
SR NAME 1BH
MACRO ; Register,Register
FR %0,%1,%2
ENDM
A NAME 5AH
S NAME 5BH
MACRO ; Register,Memory,Index
FR %0,%1,%3
FM %2
ENDM
AP NAME 0FAH
SP NAME 0FBH
MACRO ; Memory,Length,Memory,Length
FR %0,%2,%4
FM %1
FM %3
ENDM
Note : Suffix `H' denotes hexadecimal.
Figure 11.6: IBM 370 Instruction Encoding
NAME directives may specify an argument, which becomes parameter 0 of the macro. In
Figure 11.6 the NAME directive has been used to supply the hexadecimal operation code for
each instruction. (A hexadecimal constant begins with a digit and ends with `H'.) We use the
IBM mnemonics to denote the instructions; in practice these macros would be represented
by tables and the node type of an abstract operation would appear in place of the symbolic
operation code.
Formal parameters of the macros in Figure 11.6 are described by comments. (Strings
following `;' on the same line are comments.) The corresponding actual parameters are the
operands of the target tree node, and their values will have been established during code
generation or address resolution. Note that a `memory' operand includes its base register but
not an index register. Thus the `FM' format takes a single memory address and encodes it
as a base and displacement. This reflects the fact that the index register is assigned by the
code generator, while the base register is determined during assembly. In other words, the
abstract IBM 370 from which these macros were derived did not have the concept of a based
access.
Consider the LAX expression a + b↑[c]. If a were in register 1, b↑ in register 2 and
c (multiplied by the appropriate element length) in register 3 then the addition could be
performed by a single IBM 370 add instruction with R1 = 1, B2 = 2, X2 = 3 and D2 a
displacement appropriate to the lower bound of the array being referenced. Given the macros
of Figure 11.6, however, this instruction could not be encoded because the abstract machine
has no concept of a based access. Clearly one solution to this problem is to give FM two
arguments and make the base register explicit in the abstract machine; another is to provide
the abstract machine with two kinds of memory address: one in the code sequence and the
other in data memory. We favor the latter solution because these two kinds of memory address
are specified differently. The code generator defines the former by a label and the latter by
a base register and displacement. The assembler must pick a base register for the former but
A NAME 5AH
S NAME 5BH
MACRO ,LABEL ; Register,Memory,Index
FR %0,%1,%3
FM1 %2
ENDM
MACRO ; Register,Base,Index,Displacement
FR %0,%1,%3
FM2 %2,%4
ENDM
a) Selection of different macros
A NAME 5AH
S NAME 5BH
MACRO ; Either pattern
FR %0,%1,%3
IF @%2=LABEL
FM1 %2
ELSE
FM2 %2,%4
ENDIF
ENDM
b) Conditional within a macro
Figure 11.7: Two Memory Operand Types
not the latter. Because of these differences it is probably useful to have distinct target node
formats for the two cases.
Figure 11.7 shows a modification of the macros of Figure 11.6 to allow our second solution.
In Figure 11.7a the add instruction is associated with two macro bodies, and the attribute of
one of the parameters of the first is specified. The specification gives the attribute that the
operand must possess if this macro is to be selected. By convention, the macros associated
with a given name are checked in the order in which they appeared in the definition; param-
eters with no specied attributes match anything. Figure 11.7b combines the two bodies,
using a conditional to select the proper format invocation. Here the operator `@' is used to
select the attribute rather than the value of the parameter. This emphasizes the fact that
there are two components of an operand, attribute and value, which must be distinguished.
What constitutes an attribute of an operand, and what constitutes a value? These ques-
tions depend intimately upon the design of the abstract machine and its relationship to the
actual target instructions. We shall sketch a specific mechanism for defining and dealing with
attributes as an illustration.
The value and attribute of an operand are arbitrary bit patterns of a specified length.
They may be accessed and manipulated individually, using the normal arithmetic and bitwise-
logical operators. Any expression yields a value consisting of a single bit pattern. Two
expressions may be formed into a value/attribute pair by using the quote operator: e1 "e2 .
(See Figure 11.8 for examples.) An operand is compatible with a parameter of a macro if the
following expression yields true :
(@operand and @parameter ) = parameter
Thus the operand R2 would be compatible with the parameters R2, EVENGR and GENREG
in Figure 11.8; it would not be compatible with ODDGR or LABEL. Clearly any operand is
compatible with ANY, and it is this object that is supplied when a parameter specication
is omitted.
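The test might be sketched as follows, with attributes as small bit patterns; the particular bit assignments for GENREG, EVENGR and LABEL are invented for the example (LABEL is spelled LABEL_ below because label is a Pascal reserved word).

program compat_sketch;
const
  GENREG = 1;   (* any general register *)
  EVENGR = 3;   (* general register with even number: GENREG plus an `even' bit *)
  LABEL_ = 4;   (* code-sequence address *)
  ANY    = 0;   (* matches every operand *)

(* the operand must possess every attribute bit the parameter demands *)
function compatible (operand_attr , parameter_attr : integer) : boolean;
begin
  compatible := (operand_attr and parameter_attr) = parameter_attr
end;

begin
  writeln (compatible (EVENGR, GENREG));  (* TRUE: R2 is a general register *)
  writeln (compatible (EVENGR, LABEL_));  (* FALSE: a register is not a label *)
  writeln (compatible (LABEL_, ANY))      (* TRUE: ANY matches anything *)
end.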
Macro languages similar to the one sketched here have been used to specify instruction
encoding in many contexts. Experience shows that they are useful, but if not carefully imple-
mented can lead to very slow processors. It is absolutely essential to implement the formats
by routines coded in the implementation language of the compiler. Macros can be interpreted,
but the interpretive code must be compact and carefully tailored to the interpretation pro-
cess. The normal implementation of a macro processor as a string manipulator is inadequate.
Names should be implemented as a compact set of integers so that access to lists of macro
bodies is direct. Since the number of bodies associated with a name is usually small, linear
search is adequate. Note that a tradeoff is possible between selection on the basis of the name
and selection on the basis of attributes.
As a by-product of the encoding, it is possible to produce a symbolic assembly code version
of the program to aid in the debugging and maintenance of the compiler itself. If the macro
names are specified symbolically, as in Figures 11.6 and 11.7, these can be used as symbolic
operation codes in the listing. The uid that appears as an intrinsic attribute of the label
nodes can be converted into a normal identifier by prefixing a letter. Only constants need
special treatment: a set of target value-to-character conversion procedures must be provided.
Exercises
11.1 Complete Figure 11.1 by adding rules to describe address expressions and construct
the attribute expression.expr .
11.2 [Galler and Fischer, 1964] Consider the problem of mapping storage described by
FORTRAN COMMON, DIMENSION, EQUIVALENCE and DATA statements onto
a sequence of contiguous blocks of storage (one for each COMMON area and one for
local variables).
(a) How can these statements be translated into a target tree of the form discussed
in Section 4.1.4 and Figure 11.1?
(b) Will the translation you describe in (a) ever produce more than one arbitrary -
origin sequence? Carefully explain why or why not.
(c) Does your target tree require any processing by the assembler in addition to that
described in Section 11.1.1? If so, explain why.
11.3 [Talmadge, 1963] Consider the concatenation of all arbitrary -origin sequences dis-
cussed in Section 11.1.1.
(a) Write a procedure to determine the length of an arbitrary -origin sequence.
(b) Write a procedure to scan origin env , nding two arbitrary -origin sequences
and concatenating them by altering the origin element record for the second.
11.4 Consider the implementation of the span-dependence algorithm of Section 11.1.2.
(a) Show that the algorithm has running time proportional to n^2 in the worst case,
where n is the number of span-dependent instructions.
(b) Define a relocation table entry and write the update routine mentioned in step
(5) of the algorithm.
11.5 [Szymanski, 1978] Modify the span-dependence analysis to allow target expressions of
the form label ± constant.
11.6 Consider the code basing problem of Section 11.1.3.
(a) Define any attributes necessary to maintain the state of q within a code sequence,
and modify the rules of Figures 11.1 and 11.3 to include them.
(b) Explain how the operations too short and lengthen (Section 11.1.2) must be
altered to handle this case. Would you prefer to define other operations instead?
Explain.
11.7 [Robertson, 1979] The Data General Nova has an 8-bit address field; addressing
relative to the program counter is allowed, and any address may be indirect. Constants
must be placed in the code sequence within 127 words of the instruction that references
them. If a jump target is further than 127 words from the jump then the address must
be placed in the code sequence as a constant and the jump made indirect. (The size of
the jump instruction is the same in either case.)
(a) Give an algorithm for placing constants that takes advantage of any unconditional
jumps already present in the code, placing constants after them.
(b) Indicate how the constant blocks might be considered span-dependent instruc-
tions, whose size varies depending upon whether or not they contain jump target
addresses.
(c) Show that the problem of optimizing the span-dependence in (b) is NP-complete.
11.8 [Talmadge, 1963] Some symbolic assemblers provide `multiple location counters',
where each location counter defines a sequence in the sense of Section 11.1.1. Pseudo
operations are available that allow the user to switch arbitrarily from one location
counter to another.
(a) Show how a target tree could represent arbitrary sequence changes by using
internally-generated labels to associate `pieces' of the same sequence.
(b) Some computers (such as the Control Data Cyber series) have instructions that
are smaller than a single memory element, but an address refers only to an entire
memory element. How could labels be represented for such a machine? How does
the choice of label representation impact the solution to (a)?
(c) What changes to Figure 11.1 would be needed if we chose not to represent arbitrary
sequence changes by internally-generated labels, but instead gave every `piece' of
the same sequence the same uid ?
(d) If we used the representation for sequences suggested in (c), how would the answer
to (b) change?
11.9 The ultimate value of an address embedded in the target code must be either a number
or a pair (external symbol, number). A number alone may represent either a numeric
operand or a relative address.
(a) Suppose that A, B and C are labels. What form does the value of (A+B)-C take?
Why is (A+B)+C a meaningless address expression?
(b) Specify an attribute that could be used to distinguish the cases mentioned in (a).
(c) If A were an external symbol, would your answer to (a) change? Would your
answer to (b) change? How?
(d) Would you allow the expression (A+B)-(A+C), A an external symbol, B and C
labels? What form would its value take?
end
i := 1;
a) A legal fragment of an ALGOL 60 program
end;
i := 1;
b) The probable intent of (a)
for i := 1 step 1 until 2 × n + 1
c) A probable ineciency in SIMULA
Figure 12.1: Anomalies
and report them as anomalies before their symptoms arise. An example of such a case is the
fragment of ALGOL 60 shown in Figure 12.1a. Since ALGOL 60 treats text following end
as a comment (terminated by else, end or ;), there is no inconsistency here. However, the
appearance of := in the comment makes one suspicious that the user actually intended the
fragment of Figure 12.1b. Many ALGOL 60 compilers will therefore report an anomaly in
this case.
Note that a detectable error may appear as an anomaly before its symptoms arise: A
LAX compiler could report the expression (1/0) as an anomaly even though its symptoms
would not be detected until execution time. Reports of anomalies therefore differ from error
reports in that they are simply warnings that the user may choose to suppress.
Anomalies may be reported even though there is no reason whatever to believe that
they represent true errors; some compilers are quite prepared to simply comment on the
programmer's style. The SIMULA compiler for the Univac 1108, for example, diagnoses
Figure 12.1c as poor style because, as in ALGOL 60, the upper limit of the iteration is
evaluated 2n + 1 times even though its value probably does not change during execution of
the loop. Such reports may also be used to call the programmer's attention to nonstandard
constructs supported by the particular system on which he is running.
A particular implementation normally places some limitations on the language definition,
due to the finite resources at its disposal. (Examples include the limitation of finite-precision
arithmetic, a limit on the number of identifiers in a program, the number of dimensions in
an array or the maximum depth of parentheses in an expression.) Although violations of
implementation-imposed constraints are not errors in the sense discussed above, they have
the same effect for the user. A major design goal is therefore to minimize the number of such
limitations, and to make them as `reasonable' as possible. They should not be imposed lightly,
simply to ease the task of the implementor, but should be based upon a careful analysis of
the cost/benet ratio for user programs.
12.1.2 Responses
We distinguish three possible levels of response to a symptom:
1. Report: Provide the user with an indication that an error has occurred. Specify the
symptom, locate its position precisely, and possibly attempt a diagnosis.
2. Recover: Make the state of the process (compilation, execution) consistent and continue
in an attempt to find further errors.
3. Repair: On the basis of the observed symptom, attempt a diagnosis of the error. If
confident that the diagnosis is correct, make an appropriate alteration in the program
or data and continue.
Both the compiler and the run-time system must at least report every symptom they
detect (level 1). Recovery (level 2) is generally provided only by the compiler, while repair
may be provided by either. The primary criterion for recovery techniques is that the system
must not collapse, since in so doing it may take the error message (and even the precise
location of the symptom) with it. There is nothing more frustrating than a job that aborts
without telling you why!
A compiler that reports the first symptom detected and then terminates compilation is not
useful in practice, since one run would be needed for each symptom. (In an interactive setting,
however, it may be reasonable for the compiler to halt at the rst symptom, requiring the
programmer to deal with it before continuing.) The compiler should therefore recover from
almost all symptoms, allowing detection of as many as possible in a single run. Some errors
(or restrictions) make it impossible for the compiler to continue; in this case it is best to give
a report and terminate gracefully. We shall term such errors deadly, and attempt to minimize
their number by careful language and compiler design.
Recovery requires that the compiler make some alteration of its state to achieve con-
sistency. This alteration may cause spurious errors to appear in later text that is actually
correct. Such spurious errors constitute an avalanche, and one of the major design criteria
for a recovery scheme is to minimize avalanches. We shall discuss this point in more detail in
Section 12.2.
If the compiler is able to diagnose and repair all errors with a high probability of success,
then the program could safely be executed to permit detection of further errors. We must,
however, be quite clear that a repair is not a correction. Much of the early literature on this
subject used these terms interchangeably. This has unfortunate connotations, particularly for
the novice, indicating that the compiler is capable of actually determining the programmer's
intent.
Repair requires some circumspection, since the cost of execution could be very high and
the particular nature of the repair could render that execution useless or could cause it to
destroy important data files. In general, repair should not be attempted unless the user
specifically requests it.
As in the case of recovery, we may classify certain errors as uneconomic or impossible
to repair. These are termed fatal, and may cause us to refuse to execute the program. If a
program containing a fatal error is to be executed, the compiler should produce code to abort
the program when the error location is reached.
user's natural language), restrained and polite. It should be stated in terms of what the user
has done (or not done) rather than in terms of the compiler's internal state. If the compiler
has recovered from the error, the nature of the recovery should be made clear so that any
resulting avalanche will be understandable.
Ideally, error reports should occur in two places: at the point where the compiler noticed
the symptom, and in a summary at the end of the program. By placing a report at the point
of detection, the compiler can identify the coordinates of the symptom in a simple manner
and spare the programmer the task of switching his attention from one part of the listing to
another. The summary report directs the programmer to the point of error without requiring
him to scan the entire listing, reducing the likelihood that errors will be missed.
Compiler error reports may be classied into several levels according to severity:
1. Note
2. Comment
3. Warning
4. Error
5. Fatal error
6. Deadly error
Levels 1-3 are reports of anomalies: Notes refer to nonstandard constructs, and are only
important for programs that will be transported to other implementations; comments criticize
programming style; warnings refer to possible errors. The remaining levels are reports of
actual errors or violations of limits. Errors at level 4 can be repaired, fatal errors suppress
production of an executable program (but the compiler will recover from them), and deadly
errors cause compilation to terminate.
The user should be able to suppress messages below a given severity level. Both the default
severity cutoff and the number of reports possible on each level will vary with the design goals
of the compiler. A compiler for use in introductory programming courses should probably
have a default cutoff of 0 or 1, and produce a plethora of comments and warnings; one for
use in a production operation with a single type of computer should probably have a cutoff
of 3, and do very little repair. The ability to vary these characteristics is a key component in
the adaptability of a compiler.
The programmer's ability to cope with errors seems to be inversely proportional to the
density of errors. If the error density becomes very large, the compiler should probably
abandon the program and let the programmer deal with those errors found so far. (There is
always the chance that a job control error has been made, and the `program' is really a data file
or a program in another language!) It is difficult to state a precise criterion for abandonment,
but possibly one should consider this response when the number of errors exceeds one-tenth
of the number of lines processed and is greater than 10.
The error report file is maintained by a module that provides a single operation:
Error (position , severity , code )
position : The source text position for the message.
severity : One of the numbers 1-6, as discussed above.
code : An integer defining the error.
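A minimal sketch of such a module follows; the fixed-size report table, the cutoff variable and the demonstration values are assumptions, not part of the book's specification.

program error_module_sketch;
const
  max_reports = 200;
type
  severity_level = 1 .. 6;
  report = record
    position : integer;         (* source text coordinate *)
    severity : severity_level;
    code : integer              (* integer defining the error *)
  end;
var
  reports : array [1 .. max_reports] of report;
  report_count : integer;
  cutoff : integer;             (* user-settable suppression level *)

(* Record a report for the end-of-compilation summary unless suppressed. *)
procedure error (position : integer; severity : severity_level; code : integer);
begin
  if severity >= cutoff then
    begin
      report_count := report_count + 1;
      reports[report_count].position := position;
      reports[report_count].severity := severity;
      reports[report_count].code := code
    end
end;

begin
  report_count := 0;
  cutoff := 3;            (* suppress notes and comments *)
  error (120, 2, 7);      (* a comment: suppressed *)
  error (155, 4, 13);     (* an error: recorded for the summary *)
  writeln (report_count)  (* prints 1 *)
end.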
error was detected. We can retain position information for certain constructs and then use
that information later when we have sufficient context to diagnose an error. For example,
suppose that a label was declared in a Pascal program and then never used. The error would
be diagnosed at the end of the procedure declaring the label, but we would give the position
of the declaration in the report and therefore the message `label never used' would point
directly to the declaration.
none of whose associated operators is consistent with the pattern of operand types given. This
symptom could result from an error in one of the operand expressions, or from an erroneous
operator indication. There is no way to be certain which error has occurred, although the
probability of the former is enhanced if one of the operands is consistent with some operator
associated with the indication. In this case, the choice of operator should be based upon the
consistent operand, and might take into account the use of the result. If this choice is not
correct, however, spurious errors may occur later in the analysis. To prevent an avalanche
in this case, we should carry along the information that a semantic error has been repaired.
Further error messages involving type mismatches of this result should then be suppressed.
Another important class of semantic error is the undeclared identifier. We have already
noted (Section 12.1.1) that this error may arise in several ways. Clearly we should produce
an error message if the problem was that the identifier was misspelled on this use, but if
the declaration were misspelled or omitted the messages attached to each use of the variable
constitute an avalanche, and should be suppressed.
In order to distinguish between these cases, we might set up a definition table entry for the
undeclared identifier specifying as many properties as could be determined from the context
of the use. Subsequent occurrences could then be used to refine the properties, but error
messages would not be issued unless the properties were inconsistent. This strategy attempts
to distinguish the cases on the basis of frequency of use of an identifier: At the first use an
error will be reported; thereafter we assume that the declaration is missing or erroneous and
do not make further reports. This method works well in practice. It breaks down when the
programmer chooses an identifier susceptible to a consistent misspelling, or when the text
is entered into the machine by a typist prone to a certain type of error (usually a character
transposition or replacement).
The specific details of the consistency check are language dependent. As a concrete ex-
ample, consider the algorithm used by the Whetstone Compiler for ALGOL 60 [Randell
and Russell, 1964]. (There the algorithm is not used to suppress avalanches, but rather
to resolve forward references to declared identifiers in a one-pass compilation.) The Whet-
stone Compiler created a property set upon the first use of an (as yet) undeclared identifier,
with each element specifying a distinct property that could be deduced from local context
(Table 12.1). The first three elements of Table 12.1 determine the form of the use, while the
remaining nine elements retain information about its context. For each successive occurrence,
a new set A' was established and checked for consistency with the old one, A: The union of
the two must be identical to either set (i.e. A must be a subset of A' or A' must be a subset
of A). If A' is a superset of A, then the new use provides additional information.
Suppose that we encounter the assignment p := q where neither p nor q have been seen
before. We deduce that both p and q must have the form of simple variables, and that
values could be assigned to each; the type must therefore be real, integer or Boolean. If
the assignment r := p + s; were encountered later, we could deduce that p must possess an
arithmetic (i.e. real or integer) value. This use of p is consistent with the former use, and
provides additional information. (Note that the same deduction can be applied to q, but this
relationship is a bit too devious to pursue.) Figures 12.2a and 12.2b show the sets established
for the first and second occurrences of p. If the statement p[i] := 3; were now encountered,
the union of Figure 12.2c with Figure 12.2b would indicate an inconsistency.
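Pascal's set types make this check direct. The following sketch reproduces the example just given; the property names follow Table 12.1, with Pascal reserved words renamed, and the function name is our own.

program property_sketch;
type
  prop = (simple , arrayref , proc , value , variable , arithmetic ,
          boolean_ , integer_ , location , normal , string_ , nopar );
  prop_set = set of prop;

(* Two uses are consistent iff the union equals one of the two sets,
   i.e. one set contains the other. *)
function consistent (a , a1 : prop_set) : boolean;
begin
  consistent := (a <= a1) or (a1 <= a)
end;

var
  first_use , second_use , third_use : prop_set;
begin
  first_use := [simple , value , variable ];                (* from p := q *)
  second_use := [simple , value , variable , arithmetic ];  (* from r := p + s; *)
  third_use := [arrayref , value , variable ];              (* from p[i] := 3; *)
  writeln (consistent (first_use , second_use ));  (* TRUE: the second use refines the first *)
  writeln (consistent (second_use , third_use ))   (* FALSE: an inconsistency is detected *)
end.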
If a declaration is available, we are usually not able to accept additional information about
the variable. There is one case in ALGOL 60 (and in many other languages) in which the
declaration does not give all of the necessary information: A procedure used as a formal
parameter might or might not have parameters of its own, so the declaration does not specify
which of the properties {simple, proc} should appear (Figure 12.2d). That decision must be
deferred until a call of the procedure is encountered.
Property     Meaning
simple       The use takes the form of a simple variable.
array        The use takes the form of an array reference.
proc         The use takes the form of a procedure call.
value        The object may be used in a context where a value is required.
variable     The object may be used in a context where a value is assigned to it.
arithmetic   The object has an arithmetic (i.e. integer or real) value.
Boolean      The object has a Boolean value.
integer      The object has an integer value.
location     The object is either a label or a switch.
normal       The object is not a label, switch or string.
string       The object is a string.
nopar        The object is a parameterless procedure.
Table 12.1: Identifier Properties in the Whetstone ALGOL Compiler
{simple, value, variable}
a) Property set for both p and q derived from p := q
{simple, value, variable, arithmetic}
b) Property set for p derived from r := p + s;
{array, value, variable}
c) Property set for p derived from p[i] := 3;
procedure x(p); procedure p;
d) A declaration that leaves properties unspecified
Figure 12.2: Consistency Checks
to precede it with comment! For ALGOL-like languages simpler methods that can change
more symbols are often superior. On the other hand, global minimum-distance correction
minimizes avalanches.
The symptom of a syntactic error is termed a parser-defined error. Since we parse a
program deterministically from left to right, the parser-defined error is the first symbol t such
that ω is a head of some string in the language, but ωt is not. For example, the string ω
of Figure 12.3a is certainly a head of a legal FORTRAN program, which might continue as
shown in Figure 12.3b. If t is the end-of-statement marker, #, then ωt is not the head of
any legal program. Hence # constitutes a parser-defined error. Possible minimum-distance
corrections are shown in Figure 12.3d. From the programmer's point of view, the first has
the highest probability of being a correct program. This shows that a parser-defined error
may not always coincide with the point of the error in the user's eyes. This is especially true
for bracketing errors, which are generally the most difficult to repair.
DO 10 I = J(K,L
a) A head, ω, of a FORTRAN program
ω)#
b) A possible continuation (# is end-of-statement)
ω#
c) A parser-defined error
DO 10 I = J,K,L
DO 10 I = J(K,L)
d) Two minimum-distance corrections
Figure 12.3: Syntax Errors
Ad hoc parsing techniques, and even some of the older formal methods, may fail to detect
any errors at all in certain strings not belonging to the language. Other approaches (e.g. simple
precedence) may delay the point of detection arbitrarily. The LL and LR algorithms will
detect the error immediately, and fail to accept t. This not only simplifies the localization of
the symptom in the listing, but also avoids the need to process any syntactically incorrect text.
Recovery is eased, since the immediate context of the error is still available for examination
and alteration.
If ωtη ∈ (T* - L) is an erroneous program with parser-defined error t, then to effect
recovery the parser must alter either ω or tη such that ω'tη ∈ L or ωt'η' ∈ L. Alteration of ω
is unpleasant, since it may involve undoing the effects of connection points. It will also slow
the processing of correct programs to permit backtrack when an error is detected. Thus we
shall only consider alteration of the erroneous symbol t and the following string η.
Our basic technique will be to recover from each error by the following sequence of steps:
1. Determine a continuation, γ, such that ωγ ∈ L.
2. Construct a set of anchors D = {d ∈ T | γ' is a head of γ and ωγ'd is a head of some
string in L}.
3. Find the shortest string δ ∈ T* such that tη = δt''η'; t'' ∈ D.
4. Discard δ from the input string and insert the shortest string μ ∈ T* such that ωμt''
is a head of some string in L.
5. Resume the normal parse.
This procedure can never cause the error recovery process to loop indefinitely, since at
least one symbol (t'') of the input string is consumed each time the parser is restarted. Note
also that it is never necessary to actually alter the input string during step (4); the parser
is simply advanced through the required steps. A dummy symbol of the appropriate kind is
created at each symbol connection encountered during this advance.
The sequence of connection points reported by the parser is always consistent when this
error recovery technique is used. Semantic analysis can therefore proceed without checking
for inconsistent input. Generated symbols, however, must be recognized as having arbitrary
attributes. This is guaranteed by using special `erroneous' attribute values as discussed in
the previous section.
It is clear from the example of Figure 12.3 that we can make no claim regarding the
`correctness' of the continuation determined during step (1). The quality of the recovery in the
eyes of the user depends upon the particular continuation chosen, but it seems unlikely that we
will find an algorithm that `optimizes' this choice at acceptable cost. We therefore advocate
a process that can be incorporated into a parser generator and applied automatically without
any effort on the part of the compiler writer. The most important benefit is a guarantee
that the parser will recover from all syntactic errors, presenting only consistent input to the
semantic analyzer. This guarantee cannot be made with ad hoc error recovery techniques.
P = {Z → E#,
     E → FE',
     E' → +FE', E' → ε,
     F → i, F → (E)}
a) Productions of the grammar
Z → E#
E → FE'
E' → ε
F → i
b) Designated productions
*q0 i → q1 q2 i,   q0 ( → q1 q2 (,
*q1 → ,
*q2 i → q3 q4 i,   q2 ( → q3 q5 (,
*q3 # → q6 q7 #,   q3 ) → q6 q7 ),   q3 + → q6 q8 +,
*q4 i → q9 ,
*q5 ( → q10 ,
*q6 → ,
*q7 → ,
*q8 + → q11 ,
*q9 → ,
*q10 i → q12 q2 i,   q10 ( → q12 q2 (,
*q11 i → q13 q4 i,   q11 ( → q13 q5 (,
*q12 ) → q14 ,
*q13 # → q15 q7 #,   q13 ) → q15 q7 ),   q13 + → q15 q8 +,
*q14 → ,
*q15 →
c) The transitions of the parsing automaton (compare Figure 7.4)
Figure 12.4: Adding Error Recovery to an LL(1) Parser
q0               i+#
q1 q2            i+#
q1 q3 q4         i+#
q1 q3 q9         +#
q1 q3            +#
q1 q6 q8         +#
q1 q6 q11        #
a) Parse to the point of error detection
q1 q6 q11        D = {i, (}
q1 q6 q13 q4
q1 q6 q13 q9
q1 q6 q13        D = {i, (, #, ), +}
q1 q6 q15 q7
q1 q6 q15
q1 q6
q1
b) Continuation to the final state
q1 q6 q11        #
q1 q6 q13 q4     #
q1 q6 q13 q9     #     i is generated by q4 i → q9
q1 q6 q13        #     the normal parse may now continue
c) Continuation to the resume point
Figure 12.5: Recovery Using Figure 12.4c
We begin by designating one production for each nonterminal, such that the set of desig-
nated productions contains no recursion. For example, in the production set of Figure 12.4a
we would designate the productions listed in Figure 12.4b. (With this example the desig-
nation is unique, a condition seldom encountered in larger grammars.) We then reorder the
productions for each nonterminal so that the designated production is rst, and apply the
parser generation algorithms of Chapters 5 and 7. As the transitions of the parsing automata
are derived, certain of them are marked. When an error occurs during the parse, we choose
a valid continuation by allowing the parsing automaton to carry out the marked transitions
until it reaches its final state. No input is read during this process, but at each step the set
of input symbols that could be accepted is added to the set of anchors.
Construction 5.23, as modified in Section 7.2.1 for strong LL(1) grammars, was used to
generate the automaton of Figure 12.4c. The transitions were marked as follows (marked
transitions are preceded by an asterisk in Figure 12.4c):
Any transition introduced by step 3 or step 4 of the construction was marked.
The elements of H in step 5' are listed in the order discussed in the previous paragraph.
The first transition qω → qh[1]ω of a group introduced by step 5' was marked.
To see the details of the recovery, consider the erroneous sentence i+#. Figure 12.5a traces
the actions of the automaton up to the point at which the error is detected. The continuation
is traced in Figure 12.5b. Note that the input is simply ignored, and the stack is updated
as though the parser were reading symbols that caused it to make the marked transition. At
each step, all terminal symbols that could be accepted are added to D. Figure 12.5c shows
(1) Z → E#
(2) E → E + F,   (3) E → F
(4) F → i,   (5) F → (E)
a) The grammar
0: Z → ·E ; #           4: F → (·E) ; #+)
   E → ·F ; #+             E → ·F ; )+
   E → ·E + F ; #+         E → ·E + F ; )+
   F → ·i ; #+             F → ·i ; )+
   F → ·(E) ; #+           F → ·(E) ; )+
1: Z → E· ; #           5: E → E + ·F ; #+)
   E → E· + F ; #+         F → ·i ; #+)
                           F → ·(E) ; #+)
2: E → F· ; #+)
                        6: F → (E·) ; #+)
3: F → i· ; #+)            E → E· + F ; )+
7: E → E + F· ; #+)
8: F → (E)· ; #+)
b) States of the Automaton
       i     (     )     +     #     E     F
0     -4'    4     .     .     .     1    -3
1      .     .     .     5    *1'
4     -4'    4     .     .     .     6    -3
5     -4'    4     .     .     .          -2
6      .     .    -5'    5     .
c) The transition function for the parser
Figure 12.6: Error Recovery in an LR(0) Parser
the remainder of the recovery. No symbols are deleted from the input string, since # is in
the set of anchors. The parser now follows the continuation again, generating any terminal
symbols needed to cause it to make the marked transitions. When it reaches a point where
the rst symbol of the input string can be accepted, the normal parse resumes.
Let us now turn to the LR case. Figure 12.6a shows a left-recursive grammar for the same
language as that defined by the grammar of Figure 12.4a. The designated productions are
1, 3 and 4. If we reorder productions 2 and 3 and then apply Construction 5.33, we obtain
the states of Figure 12.6b. The situations are given in the order induced by the ordering of
the productions and the mechanics of Construction 5.33. Figure 12.6c shows the transition
table of the automaton generated from Figure 12.6b, incorporating shift-reduce transitions.
The marked transition in each state (indicated by a prime) was the first shift, reduce or
shift-reduce transition generated in that state considering the situations in order.
An example of the LR recovery is given in Figure 12.7, using the same format as Fig-
ure 12.5. The erroneous sentence is i+)i#. In this case, ) does not appear in the set of
anchors and is therefore deleted.
One obvious question raised by use of automatic syntactic error recovery is that of provid-
ing meaningful error reports for the user. Fortunately, the answer is also obvious: Describe
q0               i+)i#
q0 q1            +)i#
q0 q1 q5         )i#
a) Parse to the point of error detection
q0 q1 q5         D = {i, (}
q0 q1            D = {i, (, +, #}
b) Continuation to the final state
q0 q1 q5         i#     the normal parse may now continue
c) Continuation to the resume point
Figure 12.7: LR Error Recovery
the repair that was made! This description requires one error number per token class (Sec-
tion 4.1.1) to report insertions, plus a single error number to report deletions. Since token
classes are usually denoted by a nite type, the obvious choice is to use the ordinal of the
token class as the error number to indicate that a token of that class has been inserted.
Missing or superfluous closing brackets always present the danger that avalanches will
occur because brackets are inserted in (globally) unsuitable places. For this reason we must
take cognizance of error recovery when designing the grammar. In particular, we wish to
make bracketed constructs `visible' as such to the error recovery process. Thus the grammar
should be written to ensure that closing brackets appear in the anchor sets for any errors
that could cause them to be deleted from the input string. This condition guarantees that an
opening bracket will not be deleted by mistake and lead to an avalanche error at the matching
closing bracket. It is easy to see that the grammar of Figure 12.4a satisfies the condition, but
that it would not if F were defined as follows:
F → i, F → (F',
F' → E)
12.2.3 Lexical Errors
The lexical analyzer recognizes two classes of lexical error: Violations of the regular grammar
for the basic symbols and illegal characters not belonging to the terminal vocabulary of the
language or, in languages with stropping conventions, misspelled keywords.
Violations of the regular grammar for the basic symbols (`structural' errors), such as the
illegal LAX floating point number .E2, are recovered in essentially the same way as syntax
errors. Characters are not usually deleted from the input string, but insertions are made as
required to force the lexical analyzer to either a final state or a state accepting the next input
character. If a character can neither form part of the current token, nor appear as the first
character of any token, then it must be discarded. A premature transition to a final state
can make two symbols out of one, usually resulting in syntactic avalanche errors. A third
possibility is to skip to a symbol terminator like `space' and then return a suitable symbol
determined in an ad hoc manner. This is interesting because in most languages lexical errors
occur primarily in numbers, where the kind of symbol is known.
Invalid characters are usually deleted without replacement. Occasionally these characters
are returned to the parser so it can give a more informative report. This behavior violates
the important basic principle that each analysis task should cope with its own errors.
When keywords are distinguished by means of underlines or bracketed by apostrophes,
the compiler has sufficient information available to attempt a more complete recovery by
checking for certain common misspellings. If we restrict ourselves to errors consisting of
a : array [1 : 4, 1 : 4] of real;
...
b := a[3, i] / (1 + c * 2)
a) A LAX fragment
1 ≤ 3 ≤ 4
1 ≤ i ≤ 4
1 + c * 2 ≠ 0
b) Relationships implied by the LAX definition and (a)
J=K*L
c) A FORTRAN statement
|K * L| < 2^48
d) Relationship implied by the Control Data 6000 FORTRAN implementation and (c)
ASSERT m = n
e) Relationship explicitly stated by the programmer
Figure 12.8: Implicit and Explicit Relationships
in the program. The numbers may be chosen in various ways: One of the simplest is to
use the address of the first instruction generated by the source line. (This numbering, like
others discussed below, may contain gaps.) The contents of the location counter provide a
direct reference to the program line if the compiler produces absolute code. If the compiler
produces relocatable code and the final target program is drawn from several sources, then
the conversion f(z) first requires identification of the (separately compiled) program unit by
means of a load map produced when the units are linked. This map gives the absolute address
of each program unit. The relative address appearing on the listing is obtained by subtracting
the starting address from the address of the erroneous instruction.
If the compiler has used several areas for instructions (Section 11.2), the monotonicity of
the (relative) addresses is no longer guaranteed and we must use arbitrary sequence numbers.
These numbers could be provided by the programmer himself or supplied by the compiler. In
the latter case the number could be incremented for each line or for each construct of a given
class (for example, assignments).
When arbitrary sequence numbers are used, the compiler must either store f(z) in tabular
form accessible to the run-time system or insert instructions into the target program to place
the current sequence number into some specified memory location. If the table is given in a
file, a relationship between the table and the program must be established by the run-time
system; no further cost is incurred. In the second case all information is held within the
program and a run-time overhead in both time and space is implied.
The line number, and even the position within the line, can be given for each instruction
if a table is used. For dynamic determination of line numbers, the line number must be set
in connection with a suitable syntactic unit of the source program. The instructions making
up an assignment, for example, do not always occur in the order in which they appear in the
source program. This is noticeable when the assignment is spread over several source lines.
Of course the numbering need only be updated at those syntactic units that might fail; it
may be omitted for the empty statement in ALGOL 60, for example.
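As an illustration of the tabular variant, the run-time system can locate the line by binary
search in a compiler-emitted table of (address, line) pairs sorted by address. The following
Pascal sketch assumes such a table; all names and the sample entries are ours:

program linelookup(output);
const
  maxentry = 4;
type
  entry = record addr, line: integer end;
var
  f: array [1..maxentry] of entry;   { sorted by addr }

  { Return the source line of the instruction at address z:  }
  { the line of the last table entry whose address is <= z.  }
  function line_of(z: integer): integer;
  var lo, hi, mid: integer;
  begin
    lo := 1; hi := maxentry;
    while lo < hi do begin
      mid := (lo + hi + 1) div 2;
      if f[mid].addr <= z then lo := mid else hi := mid - 1
    end;
    line_of := f[lo].line
  end;

begin
  f[1].addr := 0;  f[1].line := 1;
  f[2].addr := 12; f[2].line := 2;
  f[3].addr := 20; f[3].line := 5;
  f[4].addr := 36; f[4].line := 6;
  writeln(line_of(22))   { prints 5 }
end.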
provide redundant information, for example special bit patterns in particular places, to aid
in this process. Further details may be found in the literature on symbolic debugging packages
[Hall, 1975; Pierce, 1974; Satterthwaite, 1972; Balzer, 1969; Gaines, 1969].
Exercises
12.1 Define the class of detectable errors for some language available at your installation.
Which of these are detected at compile time? At run time? Are any of the detectable
errors left undetected? Have you made any such errors in your programming?
12.2 We have classified the LAX expression (1/0) as a compile-time anomaly, rather than a
compile-time error. Some authors disagree, arguing that if the expression is evaluated at
run time it will lead to a failure and that if it can never be evaluated then the program
is erroneous for other reasons. Write a cogent argument for or against (whichever you
prefer) our classification.
12.3 The definition of the programming language Euclid specifies minimum limitations that
may be placed on programs by an implementation. For example, the definition requires
that any compiler accept expressions having parentheses nested to depth 7, and
programs having environments nested to depth 31. The danger of setting such minimum
limits is pointed out by Sale [1977], who demonstrates that the requirement
for environments nested to depth 31 effectively precludes implementation of Euclid
on Burroughs 6700 and 7700 equipment. Comment on the advantages and disadvantages
of the Euclid approach, indicating the scope of the problem and possible compromise
solutions.
12.4 Consider some compiler running at your installation. How are its error messages com-
municated to the user? If the result gives less information than the model we discussed
in Section 12.1.3, argue for or against its adequacy. Were there any constraints on the
implementor forcing him to his choice?
12.5 Experiment with some compiler running at your installation, attempting to create
an avalanche based upon a semantic error. If you succeed, analyze the cause of the
avalanche. Could it have been avoided? How? At what cost to correct programs?
If you do not succeed, analyze the cause of your failure. Is the language subject to
avalanches from semantic errors? Is the implementation very clever, possibly at some
cost to correct programs?
12.6 Under what conditions might a simple precedence analyzer [Gries, 1971] delay detec-
tion of an error?
12.7 [Röhrich, 1980] Give an algorithm for designating productions of a grammar so that
there is one production designated for each nonterminal, and the set of designated
productions contains no recursion.
12.8 Apply the syntactic error recovery technique of Section 12.2.2 to a recursive descent
parser based upon extended BNF (Section 7.2.2).
12.9 Apply both the automaton of Figure 12.4c and that of Figure 12.6c to the string
(i(i + i#. Do you feel that the recovery is reasonable?
12.10 [Dunn and Waite, 1981] Consider the modication of Figure 7.9 to support automatic
error recovery.
(a) Assuming that the form of the table entry remained unchanged, how would you
incorporate the definition of the continuation into the tables?
(b) Based upon your answer to (a), write procedures parser_error, get_anchor and
advance_parser to actually carry out the recovery. These procedures should be
nested in parser as follows, and parser should be modified appropriately to
invoke them:
parser
parser_error
get_anchor
advance_parser
(c) Carefully explain your mechanism for generating symbols. Does it require access
to information known only to the lexical analysis module? If so, how do you obtain
this information?
12.11 [Morgan, 1970] Design an algorithm for checking the equivalence of two strings under
the transformations discussed in Section 12.2.3. How would you interface this algorithm
to the analysis process discussed in Chapters 6 and 7? Be specific!
12.12 Consider some compiler running at your installation. How is the static location of
a run-time error determined when using that compiler? To what extent could the
determination be automated without making any change to the compiler? What (if
anything) would such automation add to the cost of running a correct program?
12.13 [Kruseman-Aretz, 1971] A run-time error-reporting system for ALGOL 60 programs
uses a variable lnc to hold the line number of the rst basic symbol of the smallest
statement whose execution has begun but not yet terminated. We wish to minimize
the number of assignments to lnc . Give an algorithm that decides when assignments
to lnc must be generated.
12.14 Consider some compiler running at your installation. How is the dynamic environment
of a run-time error determined when using that compiler? To what extent could the
determination be automated without making any change to the compiler? What (if
anything) would such automation add to the cost of running a correct program?
12.15 [Bayer et al., 1967] Consider some language and machine with which you are familiar.
Define a reasonable symbolic dump format for that language, and specify the information
that a compiler must supply to support it. Give a detailed encoding of the
information for the target computer, and explain the cost increase (if any) for running
a correct program.
Chapter 13
Optimization
Optimization seeks to improve the performance of a program. A true optimum may be too
costly to obtain because most optimization techniques interact, and the entire process of
optimization must be iterated until there is no further change. In practice, therefore, we
restrict ourselves to a fixed sequence of transformations that leads to useful improvement in
commonly-occurring cases. The primary goal is to compensate for inefficiencies arising from
the characteristics of the source language, not to lessen the effects of poor coding by the
programmer. These inefficiencies are inherent in the concept of a high level language, which
seeks to suppress detail and thereby simplify the task of implementing an algorithm.
Every optimization is based upon a cost function, a meaning-preserving transformation,
and a set of relationships occurring within some component of the program. Code size,
execution time and data storage requirements are the most commonly used cost criteria; they
may be applied individually, or combined according to some weighting function.
The boundary between optimization and competent code generation is fuzzy. We have
chosen to regard techniques based upon processing of an explicit computation graph as opti-
mizations. A computation graph is implicit in the execution-order traversal of the structure
tree, as pointed out at the beginning of Chapter 10, but the code generation methods dis-
cussed so far do not require that it ever appear as an explicit data structure. In this chapter
we shall consider ways in which a computation graph can be manipulated to improve the
performance of the generated code.
Our treatment in this chapter differs markedly from that in the remainder of the text.
The nature of most optimization problems makes computationally efficient algorithms highly
unlikely, so the available techniques are all heuristic. Each has limited applicability and
many are quite complex. Rather than selecting a particular approach and exploring it in
detail, we shall try to explain the general tasks and show how they fit together. Citations to
appropriate literature will be given along with the discussion. In Section 13.1 we motivate the
characteristics of the computation graph and sketch its implementation. Section 13.2 focuses
on optimization within a region containing no jumps, while Section 13.3 expands our view
to a complete compilation unit. Finally, Section 13.4 gives an assessment of the gains to be
expected from various optimizations and the costs involved.
program being optimized. In the first place, the structure tree reflects the semantics of the
source language and therefore suppresses detail. Secondly, execution-order tree traversals
depend upon the values of specified attributes and hence cannot be generated mechanically
by the tools of Chapter 8.
Data access operations are often implicit in the target machine code as well: They are
incorporated into the access paths of instructions, rather than appearing as separate
computations. Because of this, it is difficult to isolate them and discover patterns that can be
optimized. The target tree is thus also an unsuitable representation for use by an optimizer.
To avoid these problems, we define the computation graph to have the following properties:
All source operations have been replaced by (sequences of) operations from the instruction
set of the target machine. Coercions appear as machine operations only if
they result in code. Other coercions, which only alter the interpretation of the binary
representation of a value, are omitted.
Every operation appears individually, with the appropriate number of operands.
Operands are either intermediate results or directly-accessible values. Each value has a
specified target type.
All address computations are explicit.
Assignments to program variables are separated from other operations.
Control flow operations are represented by conditional and unconditional jumps.
Although based upon target machine operations, the computation graph is largely
machine-independent because the instruction sets of most Von Neumann machines are very
similar.
We assume that every operation has no more than one result. To satisfy this assumption,
we either ignore any side effects of the machine instruction(s) implementing the operation or
we create a sequence of operations making those side effects explicit. In both cases we rely
upon subsequent processing to generate the proper instructions. For example, the arithmetic
operations of some machines set the condition code as a side effect. We ignore this, producing
comparison operators (whose one result is placed in the condition code) where required.
Peephole optimization (Section 13.2.3) will remove superfluous comparisons in cases where a
preceding arithmetic operation has properly set the condition code. The second approach is
used to deal with the fact that on many machines the integer division instruction yields both
the quotient and the remainder. Here we create a sequence of two operations for both div
and mod. The first operation in each case is divmod; the second is a unary selector, div or
mod respectively, that operates on the result of divmod. Common subexpression elimination
(Section 13.2.1) will remove any superfluous divmod operators.
The atoms of the computation graph are tuples. A tuple consists of an operator of the
(abstract) target machine and one or more operands, each of which is either a value known
to the compiler or the result of a computation described by a tuple. Each appearance of a
tuple in the computation graph is called a program point, and given an integer index greater
than 0.
Let o1 and o2 be operands in a computation graph. These operands are congruent if
they are the same known value, or if they are the results of tuples t1 and t2 with the same
numbers of operands for which operator(t1) = operator(t2) and operand_i(t1) is congruent to
operand_i(t2) for all i. A unique operand identifier is associated with each set of congruent
operands, and this identifier is used to denote all of the operands in the set.
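The definition translates directly into a recursive test. The following Pascal sketch uses a
tuple representation of our own devising; it is not the book's data structure:

program congruencetest(output);
const
  maxop = 2;
type
  tupleptr = ^tuple;
  operand = record
    known: boolean;          { a value known to the compiler? }
    value: integer;          { the known value, if so }
    source: tupleptr         { otherwise the tuple computing it }
  end;
  tuple = record
    op: char;                { abstract target machine operator }
    nops: integer;
    opnd: array [1..maxop] of operand
  end;
var
  t1, t2: tupleptr;
  a, b: operand;

  { direct recursive rendering of the congruence definition }
  function congruent(var x, y: operand): boolean;
  var i: integer; ok: boolean;
  begin
    if x.known or y.known then
      congruent := x.known and y.known and (x.value = y.value)
    else if (x.source^.op <> y.source^.op) or
            (x.source^.nops <> y.source^.nops) then
      congruent := false
    else begin
      ok := true;
      for i := 1 to x.source^.nops do
        ok := ok and congruent(x.source^.opnd[i], y.source^.opnd[i]);
      congruent := ok
    end
  end;

begin
  new(t1); new(t2);                       { two occurrences of 1 + 2 }
  t1^.op := '+'; t1^.nops := 2;
  t1^.opnd[1].known := true; t1^.opnd[1].value := 1;
  t1^.opnd[2].known := true; t1^.opnd[2].value := 2;
  t2^ := t1^;
  a.known := false; a.source := t1;
  b.known := false; b.source := t2;
  writeln(congruent(a, b))                { prints TRUE }
end.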
Figure 13.1b has 12 program points and 9 distinct tuples. Values known to the compiler
have the corresponding source language constructs as their operand identifiers. The full
definition of a tuple is given only at its first occurrence; subsequent occurrences are denoted
procedure has the same properties as a LAX or ALGOL 68 pointer in most languages, except
that the accessibility is limited to objects outside the current activation record. A procedure
call must be assumed to use and potentially modify every variable visible to that procedure,
as well as every variable passed to it as a reference parameter.
To construct the computation graph, we apply the storage mapping, target attribution
and code selection techniques of Sections 10.1-10.3. These methods yield the tuples in an
execution order determined by the target attributes, in particular the register estimate. The
only changes lie in the code selection process (Section 10.3), where the abstract nature of the
computation graph must be reflected.
A new value class, generated, must be introduced in Figure 10.10. If the class of
a value descriptor is generated, the variant part contains a single id field specifying an
operand identifier. Decision tables (such as Figure 10.13) do not have tests of operand value
class in their condition stubs, nor do they generate different instructions for memory and
register operands. The result is a significant reduction in the table size (Figure 13.3). Note
that the gen routine calls in Figure 13.3 still specify machine operation codes, even though
no instruction is actually being produced. This is done to emphasize the fact that the tuple's
operator is actually a machine operator. In this case we have chosen `A' to represent IBM
370 integer addition. A tuple whose operator was A might ultimately be coded using an AR
instruction or appear as an access path of an RX-format instruction, but it would never result
in (say) a floating add.
Result correct Y Y Y Y N N N N
l correct Y Y N N Y Y N N
r correct Y N Y N Y N Y N
swap(l,r) X X
gen(A,l,r) X X X X
gen(S,l,r) X X X X
gen(LCR,l,l) X X
Figure 13.3: Decision Table for +(integer, integer) → integer Based on Figure 10.13
The gen routine's behavior is controlled by the operator and the operand descriptor
classes. When the operands are literal values and the operator is one made available by the
constant table, then the specified computation is performed and the appropriate literal value
delivered as the result. In this case, nothing is added to the computation graph. Memory
operands (either addresses or values) are checked to determine whether they are directly
addressable. If not, tuples are generated to produce the specified results. In any case, the
value descriptors are altered to class generated and an appropriate operand identifier is
inserted. Finally a tuple is generated to describe the current operation and the proper operand
identifier is inserted into the value descriptor for the left operand.
Although we have not shown it explicitly, part of the input to the gen routine specifies
the program variables potentially used and destroyed. This information is used to derive the
dependency sets. An example giving the flavor of the process can be found in the description
of Bliss-11 [Wulf et al., 1975].
Our strategy for optimizing a basic block is to carry out the following steps in the order
indicated:
1. Value Numbering: Perform a `symbolic execution' of the block, propagating symbolic
values and eliminating redundant computations.
2. Coding: Collect access paths for program variables and combine them with operations
to form valid target machine instructions, assuming an infinite set of registers.
3. Peephole Optimization: Attempt to combine sequences of instructions into single
instructions having the same effect.
4. Register Allocation: Map the register set resulting from the coding step onto the
available target machine registers, generating spill code (code to save and/or restore
registers) as necessary.
Throughout this section we assume that all program variables are potentially accessed
after the end of the basic block, and that no tuple values are. The latter assumption fails for
an expression-oriented language, and in that case we must treat the tuple representing the
final value of the expression computed by the block as a program variable. Section 13.3 will
consider the more general case occurring as a result of global optimization.
13.2.1 Value Numbering
Access computations for composite objects are rich sources of common subexpressions. One
classic example is the code for the following FORTRAN statement, used in solving three-
dimensional boundary value problems:
invalid := initialize_vn;
for o ∈ ∪[U_t ∪ D_t] do PV[o] := invalid;
for t := first_tuple to last_tuple do
  begin
  if (t = "v↑") and (PV[v] ≠ invalid) then
    for o ∈ D_t do PV[o] := PV[v]
  else
    begin
    T := evaluate(t);
    if not is_value(T, PV[t]) then
      begin
      result := new_value(T);
      for o ∈ X_t do PV[o] := invalid;
      for o ∈ D_t do PV[o] := result;
      end
    end
  end;
a) The algorithm
Operation                             Meaning
initialize_vn : value_number          Clear the output block and return the first
                                      value number.
evaluate(tuple) : tuple               Create a new tuple by replacing each t in
                                      the argument by PV[t]. Return the newly-
                                      created tuple.
is_value(tuple, operand) : boolean    If the last occurrence of tuple in the out-
                                      put block was associated with PV[operand]
                                      then return true, otherwise return false.
new_value(tuple) : value_number       Add tuple to the output block, associating
                                      it with a new value number. Return the
                                      new value number.
b) Operations of the output module
Figure 13.4: Value Numbering
and t12 result in the last two tuples of Figure 13.5c. As can be seen from this example, value
numbering recognizes common subexpressions even when they are written differently in the
source program.
In more complex examples than Figure 13.5, the precise identity of the accessed object
may not be known. For example, the value of a[i] in Figure 13.2a might be altered even
though none of the assignment tuples in the corresponding straight-line segment has a[i] as
a target. The analysis uses X_t10 to account for this phenomenon, yielding the basic block
of Figure 13.6. Note that the algorithm correctly recognizes the address of a[i] as being a
common subexpression.
The last step in the value numbering process is to delete redundant assignments to program
variables (such as v1 in Figure 13.5c) and, as a byproduct, to develop use counts for all of the
tuples. Figure 13.7 gives the algorithm. Since each tuple value is defined exactly once, and
never used before it is defined, USECOUNT[v] will give the number of uses of v at the end
of the algorithm. The entries for program variables, on the other hand, may not be accurate
because they include potential uses by procedures and pointer assignments.
a := 2;
b := a * X + 1;
a := 2 * X;
c := a + 1 + b;
a) A sequence of assignments
Tuple              U            D           X
t1 : a := 2        {}           {a}         {t2, t4, t5, t6, t9, t11, t12}
t2 : a↑            {a}          {t2}        {}
t3 : X↑            {X}          {t3}        {}
t4 : t2 * t3       {t2, t3}     {t4}        {}
t5 : t4 + 1        {t4}         {t5}        {}
t6 : b := t5       {t5}         {b, t6}     {t10, t11, t12}
t7 : 2 * t3        {t3}         {t7}        {}
t8 : a := t7       {t7}         {a, t8}     {t2, t4, t5, t6, t9, t11, t12}
t9 : t2 + 1        {t2}         {t9}        {}
t10 : b↑           {b}          {t10}       {}
t11 : t9 + t10     {t9, t10}    {t11}       {}
t12 : c := t11     {t11}        {c, t12}    {}
b) Tuples and sets for (a)
v1 : a := 2       v5 : b := v4
v2 : X↑           v6 : a := v3
v3 : 2 * v2       v7 : v4 + v4
v4 : v3 + 1       v8 : c := v7
c) Transformed computation graph
Figure 13.5: Common Subexpression Elimination
The analysis discussed in this section can be easily generalized to extended basic blocks.
Each path through the tree of basic blocks is treated as a single basic block; when the control
flow branches, we save the current information in order to continue the analysis on the other
branch. Should constant folding determine that the condition of a conditional jump is fixed,
we replace this conditional jump by an unconditional jump or remove it. In either case one of
the alternatives and the corresponding basic block is superfluous and its code can be deleted.
These situations arise most frequently in automatically-generated code, or when the if ...
then ... else construct, controlled by a constant defined at the beginning of the program, is
used for conditional compilation.
To generalize Figure 13.7, we begin by analyzing the basic blocks at the leaves of the
extended basic block. The contents of USECOUNT are saved, and analysis restarted on a
for o ∈ ∪_v [U_v ∪ D_v] do USECOUNT[o] := 0;
for o ∈ {program variables} do USECOUNT[o] := 1;
for v := last_tuple downto first_tuple do
  begin
  c := 0;
  for o ∈ D_v do
    begin
    c := c + USECOUNT[o];
    if o is a program variable then USECOUNT[o] := 0;
    end;
  if c = 0 then
    delete tuple v
  else
    for o ∈ U_v do USECOUNT[o] := USECOUNT[o] + 1;
  end;
Figure 13.7: Redundant Assignment Elimination and Use Counting
predecessor block by resetting each element of USECOUNT to the maximum of the saved values
for the successors. We cannot guarantee consistency in the use counts by this method, since
not all of the use counts must reach their maxima along the same execution path. It turns
out, however, that this inconsistency is irrelevant for our purposes.
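For concreteness, the algorithm of Figure 13.7 can be exercised on a small block as follows.
This is a sketch with a fixed representation of our own (operand identifiers 1..4 denote
tuples, 13 and 14 the program variables a and X); it is not the book's implementation:

program usecounts(output);
const
  ntuples = 4; nids = 20;
type
  opset = set of 1..nids;
var
  U, D: array [1..ntuples] of opset;
  progvar: opset;
  usecount: array [1..nids] of integer;
  v, o, c: integer;
begin
  { v1: a := 2;  v2: X^;  v3: 2 * v2;  v4: v3 + 1 }
  U[1] := [];    D[1] := [1, 13];
  U[2] := [14];  D[2] := [2];
  U[3] := [2];   D[3] := [3];
  U[4] := [3];   D[4] := [4];
  progvar := [13, 14];
  for o := 1 to nids do
    if o in progvar then usecount[o] := 1 else usecount[o] := 0;
  for v := ntuples downto 1 do begin
    c := 0;
    for o := 1 to nids do
      if o in D[v] then begin
        c := c + usecount[o];
        if o in progvar then usecount[o] := 0
      end;
    if c = 0 then
      writeln('tuple ', v, ' deleted')   { v4, v3, v2: results unused }
    else
      for o := 1 to nids do
        if o in U[v] then usecount[o] := usecount[o] + 1
  end
end.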
13.2.2 Coding
The coding process is very similar to that of Section 10.3. We maintain a value descriptor
for each operand identifier, and simulate the action of the target computer using these value
descriptors as a data base. There is no need to maintain register descriptors, since we are
assuming an infinite supply.
Figure 13.8 gives two possible codings of Figure 13.1a for the IBM 370. Our notation for
describing the instructions is essentially that of Davidson and Fraser [1980]: `R[...]' means
`contents of register ...' and `M[...]' means `contents of the memory location addressed by
...'. Register numbers greater than 15 represent `abstract registers' of the infinite-register
machine, while those less than 15 represent actual registers whose usage is prescribed by the
mapping specification. (As discussed in Section 10.2.1, register 13 is used to address the local
activation record.)
The register transfer notation of Figure 13.8 is independent of the target machine (although
the particular descriptions of Figure 13.8b are specific to the IBM 370), and is useful
for the peephole optimization discussed at the end of this section. Figure 13.8b is not a
complete description of the register transfers for the given instructions, but it suffices for the
current example. Later we shall show an example that uses a more complete description.
The differences between the left and right columns of Figure 13.8b stem from the choice
of the left operand of the multiply instruction, made when the second line was generated.
Because the multiply is a two-address instruction, the value of the left operand will be replaced
by the value of the result. Wulf et al. [1975] call this operand the target path.
In generating the left column of Figure 13.8b, we used Wulf's criterion: Operand v2 has a
use count greater than 1, and consequently it cannot be destroyed by the operation because it
will be needed again. It should not lie on the target path, because then an extra instruction
would be needed to copy it. Since v3 is only used once, no extra instructions are required
when it is chosen as the target path. Nevertheless, the code in the right column is two bytes
shorter. Why? The byte counts for the first six rows reflect the extra instruction required to
preserve v2 when it is chosen as the target path. However, that instruction is an LR rather
than an L and thus its cost is only two bytes. It happens that the last use of v2 involves an
operation with two memory operands, one of which must be loaded at a cost of 4 bytes! If
the last use involved an operation whose other operand was in a register, we could use an RR
instruction for that operation and hence the byte counts of the two codings would be equal.
This example points up the fact that the criteria for target path selection depend strongly
upon the target computer architecture. Wulf's criterion is the proper one for the DEC PDP11,
but not for the IBM 370.
Figure 13.8b does not account for the fact that the IBM 370 multiply instruction requires
the multiplicand to be in an odd register and leaves the product in a register pair. The
register allocation process must enforce these conditions in any event, and it does not appear
useful to introduce extra notation for them at this stage. We shall treat the problem in detail
in Section 13.2.4.
... := ... + 1
a) Incrementing an arbitrary location
ti : tj↑         tj is the address ...
tk : ti + 1      Increment the value
tl : tj := tk    Store the result
b) The tuple sequence for (a) after value numbering
R[8] := M[R[9]];
R[8] := R[8]+1;
M[R[9]] := R[8];
c) Register transfers for (b) after redundant transfer elimination
M[R[9]] := M[R[9]]+1;
d) The overall effect of (c)
Figure 13.11: Generating an Increment
Figure 13.11 shows how an increment instruction is generated. The `...' in Figure 13.11a
stands for an arbitrarily complex address expression that appears on both sides of the
assignment. This expression is recognized as common during value numbering, and the address it
describes appears as an operand identifier (Figure 13.11b).
Davidson and Fraser [1980] assert that windows larger than 3 are not required. Additional
evidence for this position comes from Tanenbaum's 1982 table of 123 optimization
patterns. Only seven of these were longer than three instructions, and none of the seven
resulted in just a single output instruction. Three of them converted addition or subtraction
of 2 to two increments or decrements; the other four produced multi-word move instructions
from successive single-word moves when the addresses were adjacent. All of these patterns
were applied rather infrequently.
The optimizations of Figures 13.10 and 13.11 could be specified by the following patterns if
we used the second peephole optimization method mentioned at the beginning of this section:
MOV a,b ; CMP a,0            ⇒  MOV a,b
MOV a,b ; ADD 1,b ; MOV b,a  ⇒  INC a
(The second pattern assumes that b is not used elsewhere.)
Any finite-state pattern matching technique, such as that of Aho and Corasick [1975],
can be modified to efficiently match patterns such as these. (Modification is required to
guarantee that the item matching the first occurrence of a or b also matches subsequent
occurrences.) A complete description of a particular algorithm is given by Ramamoorthy
and Jahanian [1976]. As indicated earlier, an extensive set of patterns may be required.
(Tanenbaum and his coauthors 1982 give a representative example.) The particular set of
patterns that will prove useful depends upon the source language, compiler code generation
and optimization strategies, and target machine. It is developed over time by examining
the code output by the compiler and recognizing areas of possible improvement. There is
never any guarantee that significant optimizations have not been overlooked, or that useless
patterns have not been introduced. On the other hand, the processing is significantly faster
than that for the first method because it is unnecessary to `rediscover' the patterns for each
pair of instructions.
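A minimal sketch of such a match for the first pattern above, using an instruction
representation of our own devising (operand identifiers are small integers, 0 standing for
the literal zero):

program peephole(output);
type
  opcode = (movop, cmpop, nopop);
  instr = record
    op: opcode;
    a, b: integer          { operand identifiers; 0 stands for literal 0 }
  end;
var
  code: array [1..2] of instr;
begin
  { window contents:  MOV 5,7 ; CMP 5,0 }
  code[1].op := movop; code[1].a := 5; code[1].b := 7;
  code[2].op := cmpop; code[2].a := 5; code[2].b := 0;
  { pattern  MOV a,b  CMP a,0  =>  MOV a,b :                       }
  { the requirement that both occurrences of a match the same item }
  { appears as the equality test between the two instructions      }
  if (code[1].op = movop) and (code[2].op = cmpop) and
     (code[2].a = code[1].a) and (code[2].b = 0) then
    code[2].op := nopop;               { the comparison is deleted }
  if code[2].op = nopop then writeln('CMP deleted')
end.

A table-driven matcher in the style of Aho and Corasick would replace the explicit if by
state transitions, but the equality constraint on a must still be checked separately.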
13.2.4 Local Register Allocation
The classical approach to register allocation determines the register assignment `on the fly'
as the final code is being output to the assembler. This determination is based upon attributes
calculated by previous traversals of the basic block, and uses value descriptors to maintain
the state of the allocation. We solve the register pair problem by computing a size and
alignment for each abstract register. (Thus the abstract register becomes a block in the sense
of Section 10.1.) In the right column of Figure 13.8b, R[16] and R[17] each have size 1 and
alignment 1 but R[18] has size 2 and alignment 2 because of its use as a multiplicand. Other
machine-specific attributes may be required. For example, R[16] is used as a base register
and thus cannot be assigned to register 0 on the IBM 370.
A register assignment algorithm similar to that described in Section 10.3.1 can be used.
The only modification lies in the choice of a register to free. In Figure 10.12 we chose the
least-recently accessed register; here we should choose the one whose next access is furthest in
the future. (Belady [1966] has shown this strategy to be optimal in the analogous problem
of determining which page to replace in a virtual memory system.) We can easily obtain
this information at the same time we compute the other attributes mentioned in the previous
paragraph. Note that all of the attributes used in register allocation must be computed after
peephole optimization; the peephole optimizer, by combining instructions, may alter some of
the attribute values.
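A sketch of the modified choice, assuming the next-use attribute has been precomputed
(names and the sentinel value are ours):

program spillchoice(output);
const
  nregs = 4; horizon = 1000;
var
  nextuse: array [1..nregs] of integer;  { program point of next access; }
                                         { horizon if never used again   }
  r, victim: integer;
begin
  nextuse[1] := 12; nextuse[2] := 40; nextuse[3] := horizon; nextuse[4] := 17;
  { free the register whose next access lies furthest in the future }
  victim := 1;
  for r := 2 to nregs do
    if nextuse[r] > nextuse[victim] then victim := r;
  writeln('spill register ', victim)   { prints 3 }
end.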
Figure 10.12 makes use of a register state copy that indicates existence of a memory
copy of the register content. If it has been necessary to spill a register then the assignment
algorithm knows that it is in the copy state. However, as the example of Figure 13.8 shows,
a register (e.g. R[16]) may be in the copy state because it has been loaded from a memory
location whose content will not be altered. In order to make use of this fact, we must guarantee
that no side effect will invalidate the memory copy. The necessary information is available in
the sets D and X associated with the original tuples, and must be propagated by the value
numbering and coding processes.
When we are dealing with a machine like the IBM 370, the algorithm of Figure 10.12
should make an effort to maximize the number of available pairs by appropriate choice of a
free register to allocate. Even when this is done, however, we may reach a situation in which
no pair is free but at least two registers are free. We can therefore free a pair by freeing one
register, and we might free that register by moving its content to the second free register
at a cost of two bytes. If the state of one of the candidate registers is copy, then it can
be freed at a cost of two bytes if and only if its next use is the proper operand of an RR
instruction (either operand if the operation is commutative). It appears that we cannot lose
by using an LR instruction. However, suppose that the value being moved must ultimately
(due to other conflicts) be saved in memory. In that case, we are simply paying to postpone
the inevitable! We conclude that the classical strategy cannot be guaranteed to produce an
optimum assignment on a machine with double-length results.
propagation is a good example of this kind of analysis. As the computation graph is being
built, we accumulate a list of all of the program points at which an operand is given a constant
value. During global data flow analysis we define a set USES(o, p) at each program point
p as the set of program points potentially using the value of o defined at p. Similarly, a set
DEFS(o, p) is the set of program points potentially defining the value of operand o used at
program point p. For each element of the list of constant definitions, we can then find all of
the potential uses. For each potential use, in turn, we can find all other potential definitions.
If all definitions yield the same constant then this constant can be substituted for the operand
use in question. Finally, if we substitute constants for all operand uses in a tuple then the
tuple can be evaluated and its program point added to the list. The process terminates when
the list is empty.
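The worklist structure of this process can be seen in skeleton form below; the set
manipulations are left as comments because they depend entirely on the chosen
representation of USES and DEFS (the skeleton is ours, not the book's):

program constprop;
const
  maxpoint = 100;
var
  worklist: array [1..maxpoint] of integer;  { points defining constants }
  top, p: integer;
begin
  top := 0;
  { ... push every program point that assigns a literal constant ... }
  while top > 0 do begin
    p := worklist[top]; top := top - 1;
    { let o be the operand defined at p, with constant value c:   }
    {   for each use point q in USES(o, p):                       }
    {     if every point in DEFS(o, q) yields this same c then    }
    {       substitute c for the use of o at q;                   }
    {     if all operand uses at q have now become constants then }
    {       evaluate q's tuple and push q onto the worklist       }
  end
end.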
For practical reasons, global data flow analysis is carried out in two parts. The first
part gathers information within a single basic block, summarizing it in sets defined at the
entry and/or exit points. This drastically reduces the number of sets that must be processed
during the second part, which propagates the information over the flow graph. The result of
the second part is then again sets defined at the entry and/or exit points of basic blocks. These
sets are finally used to distribute the information within the block. A complete treatment of
the algorithms used to propagate information over the flow graph is beyond the scope of this
book. Kennedy [1981] gives a good survey, and Hecht [1977] covers the subject in depth.
As an example, consider the computation of LIVE(b). We characterize the flow graph
for this computation by two sets:
PRED(b) = {h | h is an immediate predecessor of b in the flow graph}
SUCC(b) = {h | h is an immediate successor of b in the flow graph}
An operand is then live on exit from a block b if it is used by any block in SUCC(b) before
it is either defined or invalidated. Moreover, if a block h ∈ SUCC(b) neither defines nor
invalidates the operand, then it is live on exit from b if it is live on exit from h. Symbolically:

LIVE(b) = ∪_{h ∈ SUCC(b)} [IN(h) ∪ (THRU(h) ∩ LIVE(h))]     (13.1)

IN(h) is the set of operand identifiers used in h before being defined or invalidated, and
THRU(h) is the set of operand identifiers neither defined nor invalidated in h.
We can solve the system of set equations (13.1) iteratively as shown in Figure 13.12. This
algorithm is O(n²), where n is the number of basic blocks: At most n − 1 executions of
the repeat statement are needed to make a change in a basic block b available to another
arbitrary basic block b'. The actual number of iterations depends upon the sequence in which
the basic blocks are considered and the complexity of the program. For programs without
explicit jumps the cost can be reduced to two iterations, if the basic blocks are ordered so
that inner loops are processed before the loops in which they are contained.
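The iteration might look as follows in Pascal. This is a sketch of our own, not Figure 13.12
itself; the flow graph and the IN and THRU sets are illustrative:

program liveness(output);
const
  nblocks = 4; nids = 15;
type
  idset = set of 1..nids;
  blockset = set of 1..nblocks;
var
  succ: array [1..nblocks] of blockset;
  insets, thru, live: array [1..nblocks] of idset;  { IN, THRU, LIVE }
  changed: boolean;
  b, h: integer;
  acc: idset;
begin
  for b := 1 to nblocks do begin
    succ[b] := []; insets[b] := []; thru[b] := []; live[b] := []
  end;
  succ[1] := [2, 3]; succ[2] := [4]; succ[3] := [4];   { sample flow graph }
  insets[2] := [1]; thru[2] := [2]; insets[4] := [2];
  repeat                                  { iterate equation (13.1) }
    changed := false;
    for b := nblocks downto 1 do begin
      acc := [];
      for h := 1 to nblocks do
        if h in succ[b] then
          acc := acc + insets[h] + (thru[h] * live[h]);
      if acc <> live[b] then begin
        live[b] := acc; changed := true
      end
    end
  until not changed;
  writeln(1 in live[1], ' ', 2 in live[1])   { TRUE TRUE }
end.

Processing the blocks in reverse order, as here, lets information flow backward quickly;
with inner loops ordered first the repeat converges in few passes, as noted above.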
Computation of the sets USES(o, p) and DEFS(o, p) provides a more complex example
of global flow analysis. We begin by computing REACHES(b), the set of program points
that define values valid at the entry point of basic block b. Let DEF(b) be the set of program
points within b whose definitions remain valid at the end of b, and let VALID(b) be the
set of program points whose definitions are not changed or invalidated in b. REACHES(b) is
then defined by:

REACHES(b) = ∪_{h ∈ PRED(b)} [DEF(h) ∪ (VALID(h) ∩ REACHES(h))]     (13.2)
unsafe, however, because if k were zero the transformed program would terminate abnormally
and the original would not.
We can think of code motion as a combination of insertions and deletions. An insertion
is safe if the expression being inserted is available at the point of insertion. An expression is
available at a given point if it has been computed on every path leading to that point and
none of its operands have been altered since the last computation. Clearly the program's
result will not be changed by the inserted code if the inserted expression is available, and
if the inserted code were to terminate abnormally then the original program would have
terminated abnormally at one of the earlier computations. This argument guarantees the
safety of the first transformation in Figure 13.14b. We first insert the address computation
and assignment to a[i, j], making it an epilogue of the conditional. The original computations
in the two branches are then redundant and may be removed.
C : array [operand_identifier] of program_point;

for all operand identifiers o do DF(o) := ∅;
for all basic blocks b do
  begin
  for all operand identifiers o do C[o] := 0;
  for i := first program point of b to last program point of b do
    begin
    for o ∈ X_t(i) do C[o] := 0;
    for o ∈ D_t(i) do C[o] := i;
    end;
  DEF(b) := ∅;
  for all operand identifiers o do
    if C[o] ≠ 0 then
      begin
      DEF(b) := DEF(b) ∪ {C[o]};
      DF(o) := DF(o) ∪ {C[o]};
      end;
  end;
for all basic blocks b do
  begin
  VALID(b) := ∅;
  for all operand identifiers o do
    if o ∈ THRU(b) then VALID(b) := VALID(b) ∪ DF(o);
  end;
a) Computation of DEF(b) and VALID(b)

TR := REACHES(b);
for i := first program point of b to last program point of b do
  begin
  DEFS(o, i) := ∅;
  for o ∈ U_t(i) do DEFS(o, i) := TR ∩ DF(o);
  for o ∈ D_t(i) ∪ X_t(i) do TR := TR - DF(o);
  for o ∈ D_t(i) do TR := TR ∪ {i};
  end;
b) Computation of DEFS(o, p)
Figure 13.13: Computing a Set of Program Points
The second transformation in Figure 13.14b involves an insertion where the inserted
expression is not available, but where it is anticipated. An expression is anticipated at a given
point if it appears on every execution path leaving that point and none of its operands could
be altered between the point in question and the first computation on each path. In our
example, (i - 1) * n is anticipated in the prologue of the j loop, but i div k is not. Therefore
it is safe to insert the former but not the latter. Once the insertion has been made, the
corresponding computation in the epilogue of the conditional is redundant because its value
is available.
Let AVAIL(b) be the set of operand identifiers available on entry to basic block b and
ANTIC(b) be the set of operand identifiers anticipated on exit from b. These sets are defined
by the following systems of equations:

AVAIL(b) = ∩_{h ∈ PRED(b)} [OUT(h) ∪ (THRU(h) ∩ AVAIL(h))]

ANTIC(b) = ∩_{h ∈ SUCC(b)} [ANLOC(h) ∪ (THRU(h) ∩ ANTIC(h))]

Here OUT(b) is the set of operand identifiers defined in b and not invalidated after their last
definition, and ANLOC(b) is the set of operand identifiers for tuples computed in b before
any of their operands are defined or invalidated.
The main task of the optimizer is to find code motions that are safe and profitable (reduce
the cost of the program according to the desired measure). Wulf et al. [1975] consider
`α, ω' code motions that move computations from branched constructs to prologues and
epilogues. (The center column of Figure 13.14 illustrates an ω motion; an α motion would
have placed the computation of a[i, j] before the compare instruction.) They also discuss
the movement of invariant computations out of loops, as illustrated by the right column of
Figure 13.14. If loops are nested, invariant code is moved out one region at a time. Morel
and Renvoise [1979] present a method for moving a computation directly to the entrance
block of the outermost strongly-connected region in which it is invariant.
for i := 1 to n do
  for j := 1 to n do
    if j > k then a[i, j] := 0 else a[i, j] := i div k;
a) A Pascal fragment
LA R0,1 LA R0,1 LA R0,1
C R0,n(R13) C R0,n(R13) C R0,n(R13)
BH ENDI BH ENDI BH ENDI
B BODI B BODI B BODI
INCI A R0,=1 INCI A R0,=1 INCI A R0,=1
BODI ST R0,i(R13) BODI ST R0,i(R13) BODI ST R0,i(R13)
C R0,n(R13) C R0,n(R13) C R0,n(R13)
BH ENDJ BH ENDJ BH ENDJ
L R5,i(R13)
S R5,=1
M R4,n(R13)
B BODJ B BODJ B BODJ
INCJ A R0,=1 INCJ A R0,=1 INCJ A R0,=1
BODJ ST R0,j (R13) BODJ ST R0,j (R13) BODJ ST R0,j (R13)
C R0,k(R13) C R0,k(R13) C R0,k(R13)
BNH ELSE BNH ELSE BNH ELSE
SR R1,R1 SR R1,R1 SR R1,R1
L R3,i(R13)
S R3,=1
M R2,n(R13)
A R3,j (R13)
SLA R3,2
ST R1,a-4(R3,R13)
B ENDC B ENDC B ENDC
ELSE L R0,i(R13) ELSE L R0,i(R13) ELSE L R0,i(R13)
SRDA R0,32 SRDA R0,32 SRDA R0,32
D R0,k(R13) D R0,k(R13) D R0,k(R13)
L R3,i(R13) ENDC L R3,i(R13)
S R3,=1 S R3,=1
M R2,n(R13) M R2,n(R13)
A R3,j (R13) A R3,j (R13)
ENDC L R3,j (R13)
AR R3,R5
SLA R3,2 SLA R3,2 SLA R3,2
ST R1,a-4(R3,R13) ST R1,a-4(R3,R13) ST R1,a-4(R3,R13)
ENDC L R0,j (R13) L R0,j (R13) L R0,j (R13)
C R0,n(R13) C R0,n(R13) C R0,n(R13)
BL INCJ BL INCJ BL INCJ
ENDJ L R0,i(R13) ENDJ L R0,i(R13) ENDJ L R0,i(R13)
C R0,n(R13) C R0,n(R13) C R0,n(R13)
BL INCI BL INCI BL INCI
ENDI ENDI ENDI
(142 bytes) (118 bytes) (120 bytes)
b) IBM 370 implementations
Figure 13.14: Code Motion
LA R0,1
C R0,n(R13)
BH ENDI
SR R5,R5 (i - 1) * n initially 0
B BODI
INCI A R0,=1
A R5,=n Increment (i - 1) * n
BODI ST R0,i(R13)
LA R0,1
C R0,n(R13)
BH ENDJ
B BODJ
INCJ A R0,=1
BODJ ST R0,j (R13)
C R0,k(R13)
BNH ELSE
SR R1,R1
B ENDIF
ELSE L R0,i(R13)
SRDA R0,32
D R0,k(R13)
ENDIF L R3,j (R13)
AR R3,R5
SLA R3,2
ST R1,a-4(R3,R13)
L R0,j (R13)
C R0,n(R13)
BL INCJ
ENDJ L R0,i(R13)
C R0,n(R13)
BL INCI
ENDI
(118 bytes)
Figure 13.15: Strength Reduction Applied to Figure 13.14b
Here j and k are either induction values or region constants and i is an induction variable.
The set of induction values is determined by assuming that all values defined in the region
are induction values, and then deleting those that do not satisfy the conditions [Allen et al.,
1981]. The induction values in Figure 13.16 are i, t2, t3 and t7.
To perform a strength reduction transformation on Figure 13.16, we define a variable
V1 to hold the value t9. An assignment must be made to this variable prior to entering
the strongly-connected region, and at program points where t9 has been invalidated and yet
t2 * d1 is anticipated. For example, t9 is invalidated by t8 in Figure 13.16, and yet t2 * d1
is anticipated at that point. An assignment V1 := t2 * d1 should therefore be inserted just
before l2. Since t2 is the value of i↑, i := t7; V1 := t2 * d1 is equivalent to V1 := (t2 + 1) * d1;
i := t7. Using the distributive law, and recalling the invariant that V1 always holds the value
of t9 (= t2 * d1), this sequence can be written as V1 := V1 + d1; i := t7. Figure 13.17 shows the
result of the transformation, after appropriate decomposition into tuples.
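In source terms the effect is the familiar replacement of a multiplication by a running sum.
A small self-contained Pascal illustration of the invariant (our own, not from the text):

program reduce(output);
const
  n = 8; d1 = 10;
var
  i, v1, before, after: integer;
begin
  { before: recompute i * d1 on every iteration }
  before := 0;
  for i := 1 to n do before := before + i * d1;

  { after: v1 holds i * d1 and is incremented by d1 when i is }
  after := 0;
  v1 := d1;                      { value for i = 1 }
  for i := 1 to n do begin
    after := after + v1;
    v1 := v1 + d1                { maintain v1 = i * d1 for the next i }
  end;
  writeln(before, ' ', after)    { both print 360 }
end.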
We could now apply exactly the same reasoning to Figure 13.17, noting that V1, t28, t29,
t31, t35 and t49 are now induction values. The obvious variables then hold t32, t36 and t41.
porating such asymmetries in order to avoid having to exclude certain registers from the
allocation altogether. One allocation scheme [Chaitin et al., 1981; Chaitin, 1982] that
avoids the problem is based on graph coloring (Section B.3.3). The constraints on allocation
are expressed as an interference graph, a graph with one node for each register, both abstract
and actual. An edge connects two nodes if they interfere (i.e. if they exist simultaneously).
Clearly all of the machine registers interfere with each other. In the left column of Figure 13.8,
R[17] and R[18] do not interfere with each other, although they both interfere with R[16];
all abstract registers interfere with each other in the right column. If there are n registers, a
register assignment is equivalent to an n-coloring (Section B.3.3) of the interference graph.
Many asymmetry constraints are easily introduced as interferences. For example, any
abstract register used as a base register on the IBM 370 interferes with machine register
0. Similarly, we can solve a part of the multiplication problem by making the abstract
multiplicand interfere with every even machine register and defining another abstract register
that interferes with every odd machine register and every abstract register that exists during
the multiply. This guarantees that the multiplicand goes into an odd register and that an
even register is free, but it does not guarantee that the multiplicand and free register form a
pair.
The coloring algorithm [Chaitin et al., 1981] used for this problem differs from that of
Section B.3.3 because the constraints are different: There we are trying to find the minimum
number of colors, assuming that the graph is fixed; here we are trying to find an n-coloring,
and the graph can be changed to make that possible. (Spilling a value to memory removes
some of the interferences, changing the graph.) Any node with fewer than n interferences does
not affect the coloring, since there will be a color available for it regardless of the colors chosen
for its neighbors. Thus it (and all edges incident upon it) can be deleted without changing
whether the graph can be n-colored. If we can continue to delete nodes in this manner until
the entire graph disappears, then the original was n-colorable. The coloring can be obtained
by adding the nodes back into the graph in the reverse order of deletion, coloring each as it
is restored.
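A sketch of the two phases (simplify and select) on an adjacency matrix; spill handling is
omitted and the graph is a small example of our own:

program colorgraph(output);
const
  nnodes = 4; ncolors = 2;              { try a 2-coloring of a path }
var
  adj: array [1..nnodes, 1..nnodes] of boolean;
  removed: array [1..nnodes] of boolean;
  stack, color: array [1..nnodes] of integer;
  used: array [1..ncolors] of boolean;
  sp, v, w, deg, c: integer;
  found: boolean;

  procedure edge(x, y: integer);
  begin
    adj[x, y] := true; adj[y, x] := true
  end;

begin
  for v := 1 to nnodes do begin
    removed[v] := false; color[v] := 1;
    for w := 1 to nnodes do adj[v, w] := false
  end;
  edge(1, 2); edge(2, 3); edge(3, 4);   { the interference graph }
  { simplify: repeatedly delete a node with fewer than n neighbours }
  sp := 0;
  repeat
    found := false;
    for v := 1 to nnodes do
      if not removed[v] then begin
        deg := 0;
        for w := 1 to nnodes do
          if (not removed[w]) and adj[v, w] then deg := deg + 1;
        if deg < ncolors then begin
          removed[v] := true; sp := sp + 1; stack[sp] := v; found := true
        end
      end
  until not found;    { nodes left over here would force a spill decision }
  { select: restore in reverse order, giving each node a free colour }
  while sp > 0 do begin
    v := stack[sp]; sp := sp - 1; removed[v] := false;
    for c := 1 to ncolors do used[c] := false;
    for w := 1 to nnodes do
      if (not removed[w]) and adj[v, w] then used[color[w]] := true;
    for c := ncolors downto 1 do
      if not used[c] then color[v] := c
  end;
  for v := 1 to nnodes do write(color[v], ' ');   { prints 2 1 2 1 }
  writeln
end.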
If the coloring algorithm encounters a node with n or more interferences, it must make a
decision about which node to spill. A separate table is used to give the cost of spilling each
register, and the register is chosen for which cost/(incident edges) is as small as possible.
Some local intelligence is included: When a computation is local to a basic block, and no
abstract register lifetimes end between its definition and last use, the cost of spilling it is set
to infinity. The cost algorithm also accounts for the facts that some computations can be
redone instead of being spilled and reloaded, and that if the source or target of a register
copy operation is spilled then that operation can be deleted. It is possible that a particular
spill can have negative cost!
Unfortunately, the introduction of spill code changes the conditions of the problem. Thus,
after all spill decisions are made, the original program is updated with spill code and the
allocation re-run. Chaitin claims that the second iteration usually succeeds, but it may
be necessary to insert more spill code and try again. To reduce the likelihood of multiple
iterations, one can make the first run with n − k registers instead of n registers.
benefits of some of the techniques we have discussed and leave it to the compiler writer to
strike, under pressure from the marketplace, a reasonable balance.
By halving the code size required to implement a language element that accounts for 1%
of a program we reduce the code size of that program by only 0.5%, which certainly does not
justify a high compilation cost. Thus it is important for the compiler writer to know the milieu
in which his compiler will operate. For example, elimination of common subexpressions, code
motion and strength reduction might speed up a numerical computation solving a problem
in linear algebra by a factor of 2 or 3. The same optimizations often improve non-numeric
programs by scarcely 10%. Carter's 1982 measurements of 95,000 lines of Pascal, primarily
non-numeric code, show that the compiler would typically be dealing with basic blocks
containing 2-4 assignments, 10-15 tuples and barely 2 common subexpressions!
Static analysis does not, of course, tell the whole story. Knuth [1971a] found in his study
of FORTRAN that less than 4% of a program generally accounts for half of its running time.
This phenomenon was exploited by Dakin and Poole [1973] to implement an interactive text
editor as a mixture of interpreted and directly-executed code. Their measurements showed
that in a typical editing session over 97% of the execution involved less than 10% of the
code, and more than half of the code was never used at all. Finally, Knuth discovered that
over 25% of the running times of the FORTRAN programs he profiled was spent performing
input/output.
Measure                         Ratios
                     Local/None  Global/None  Global/Local
Compilation time  Min.   0.8        1.0          1.2
                  Avg.   0.9        1.4          1.4
                  Max.   1.0        1.6          1.6
Code space        Min.   0.42       0.38         0.89
                  Avg.   0.54       0.55         1.02
                  Max.   0.69       0.66         1.19
Execution time    Min.   0.32       0.19         0.58
                  Avg.   0.50       0.42         0.82
                  Max.   0.72       0.61         0.94
Table 13.1: Evaluation of PL/1L [Cocke and Markstein, 1980]
Actual measurements of optimization efficacy and cost are rare in the literature, and the
sample size is invariably small. It is thus very difficult to draw general conclusions. Table 13.1
summarizes a typical set of measurements. Cocke and Markstein's [1980] PL/1L, an
experimental optimizing compiler for a PL/1-like language, was run over each of four programs
several times. A different level of optimization was specified for each compilation of a given
program, and measurements made of the compilation time, code space used for the resulting
object program, and execution time of the resulting object program on a set of data. At
every level the compiler allocated registers globally by the graph coloring algorithm sketched
in Section 13.3.4. No other optimizations were performed at the `None' level. The `Local'
optimizations were those discussed in Section 13.2.1, and the `Global' optimizations were those
discussed in Sections 13.3.1 through 13.3.3. It is not clear what (if any) peephole optimization
was done, although the global register allocation supposedly deleted redundant comparisons
following arithmetic operations by treating the condition code as another allocatable register
[Chaitin et al., 1981]. The reduction in compilation time for local optimization clearly
illustrates the strong role that global register allocation played in the compilation time figures.
Local optimization reduced the number of nodes in the interference graph, thus more than
covering its own cost. One of the test programs was also compiled by the standard optimizing
PL/1 compiler in a bit less than half of the time required by the PL/1L compiler. OPT=0
was selected for the PL/1 compiler, and local optimization for the PL/1L compiler. This
ratio changed slightly in favor of the PL/1 compiler (0.44 to 0.38) when OPT=2 and `global'
were selected. When the same program was rewritten in FORTRAN and compiled using
FORTRAN H, the ratios OPT=0/local and OPT=2/global were almost identical at 0.13.
(Section 14.2.3 discusses the internals of FORTRAN H.)
In the late 1970's, Wulf and his students attempted to quantitatively evaluate the size of
the object code produced by an optimizing compiler. They modeled the optimization process
by the following equation:

K(C, P) = Ku(C, P) * ∏_i O_i(C)

K(C, P) is the cost (code space) of program P compiled with compiler C, and Ku is the
corresponding unoptimized cost. Each O_i(C) is a measure of how effectively compiler C applies
optimization i to reduce the code size of a typical program, assuming that all optimizations
1, ..., i − 1 have already been done. They were never able to validate this model to their
satisfaction, and hence the work never reached publication. They did, however, measure the
factors O_i(C) for Bliss-11 [Wulf et al., 1975] (Table 13.2).
Index Description Factor
1 Evaluating constant expressions 0.938
2 Dead code elimination 0.98
3 Peephole optimization 0.88
4 Algebraic laws 0.975
5 CSE in statements 0.987
6 CSE in basic blocks 0.973
7 Global CSE 0.987
8 Global register allocation 0.975
9 Load/store motion 0.987
10 Cross jumping 0.972
11 Code motion 0.985
12 Strength reduction -
Table 13.2: Optimization Factors for Bliss-11 Wulf et al. [1975]
We have considered optimizations 1 and 4 of Table 13.2 to precede formation of the
computation graph; the remainder of 1-6 constitute the local optimizations of Section 13.2. Thus
the product of these factors (roughly 0.76) should approximate the effect of local optimization
alone. Similarly, the product of factors 7-12 (roughly 0.91) should approximate the additional
improvement due to global optimization. Comparing this latter figure with the last column
of Table 13.1 shows the deleterious effect of strength reduction on code space discussed in
Section 13.3.3.
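Multiplying out the entries of Table 13.2 confirms these approximations (factor 12
contributes nothing, having no measured value):

0.938 * 0.98 * 0.88 * 0.975 * 0.987 * 0.973 ≈ 0.76
0.987 * 0.975 * 0.987 * 0.972 * 0.985 ≈ 0.91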
The first column of Table 13.1 shows a code size improvement significantly better than
0.76, implying that the PL/1L compiler generates poorer initial code than Bliss-11, leaving
more to be gained by simple optimizations. This should not be taken as a criticism. After
all, using a sophisticated code generator with an optimizer is a bit like vacuuming the office
before the cleaning crew arrives! Davidson and Fraser [1980] take the position that code
generation should be trivial, producing instructions to simulate a simple stack machine on
an infinite-register analog of the target computer. They then apply the optimizations of
Section 13.2, using a fragment bounded by labels (i.e. a path in an extended basic block) in
lieu of a basic block.
Exercises
13.1 Show how the dependency sets would be derived when building a computation graph
that represents a LAX program for a target machine of your choice.
13.2 Assume that the FORTRAN assignment statement
A(I,J,K) = (A(I,J,K-1) + A(I,J,K+1) +
            A(I,J-1,K) + A(I,J+1,K) +
            A(I-1,J,K) + A(I+1,J,K)) / 6.0
constitutes a single basic block.
(a) Write the initial tuple sequence for the basic block.
(b) Derive a new tuple sequence by the algorithm of Figure 13.4a.
(c) Code the results of (b), using register transfers that describe the instructions of
some machine with which you are familiar.
13.3 Give an example, for some machine with which you are familiar, of a common
subexpression satisfying each of the following conditions. If this is impossible for one or more
of the conditions, carefully explain why.
(a) Always cheaper to recompute than save.
(b) Never cheaper to recompute than save.
(c) Cheaper to recompute iff it must be saved in memory.
13.4 Explain how the first method of peephole optimization described in Section 13.2.3 could
be used to generate patterns for the second. Would it be feasible to combine the two
methods, backing up the second with the first? Explain.
13.5 Assume that the register management algorithm of Figure 10.12 is to be used in an
optimizing compiler. Define precisely the conditions under which all possible changes
in register state will occur.
13.6 Show how the D and X sets are propagated through the value numbering and coding
processes to support the decisions of Exercise 13.5, as described in Section 13.2.4.
13.7 Give examples of safe code motions in which the following behavior is observed:
(a) The transformed program terminates abnormally in a different place than the
original, but with the same error.
(b) The transformed program terminates abnormally in a different place than the
original, with a different error.
13.8 Consider a Pascal for statement with integer constant bounds. Assume that the lower
bound is smaller than the upper bound, which is smaller than maxint. Instead of using
the schema of Figure 3.10c, the implementor chooses the following:
i := e1; t := e3;
l1 : ... (* Body of the loop *)
i := i + 1;
if i ≤ t then goto l1;
(a) Explain why no strength reduction can be carried out in this loop.
(b) Suppose that we ignore the explanation of (a) and carry out the transformation
anyway. Give a specic example in which the transformed program terminates
abnormally but the original does not. Restrict the expressions in your example
to those arising from array subscript calculations. Your array bounds must be
reasonable (i.e. arrays with maxint elements are unreasonable).
Chapter 14
Implementation
In earlier chapters we have developed a general framework for the design of a compiler. We
have considered how the task and its data structures could be decomposed, what tools and
strategies are available to the compiler writer, and what problems might be encountered.
Given a source language, target machine and performance goals for the generated code we
can design a translation algorithm. The result of the design is a set of module specications.
This chapter is concerned with issues arising out of the implementation of these specifica-
tions. We first discuss the decisions that must be made by the implementors and the criteria
that guide these decisions. Unfortunately, we can give no quantitative relationship between
decisions and criteria! Compiler construction remains an art in this regard, and the successful
compiler writer must simply develop a feel for the inevitable compromises. We have there-
fore included three case studies of successful compilers that make very different architectural
decisions. For each we have tried to identify the decisions made and show the outcome.
14.1 Implementation Decisions
14.1.1 Criteria
Maintainability, performance and portability are the three main criteria used in making
implementation decisions. The first is heavily influenced by the structure of the program, and
depends ultimately on the quality of the modular design. Unfortunately, given current imple-
mentation languages, it is sometimes necessary to sacrifice some measure of maintainability
to achieve performance goals. Such tradeoffs run counter to our basic principles. We do not
lightly recommend them, but we recognize that in some cases the compiler will not run at all
unless they are made. We do urge, however, that all other possibilities be examined before
such a decision is taken.
Performance includes memory requirements, secondary storage requirements and process-
ing time. Hardware constraints often place limits on performance tradeoffs, with time the
only really free variable. In Sections 14.1.2 and 14.1.3 we shall be concerned mainly with
tradeoffs between primary and secondary storage driven by such constraints.
Portability can be divided into two sub-properties often called rehostability and retar-
getability. Rehosting is the process of making the compiler itself run on a different machine,
while retargeting is the process of making it generate code for a different machine. Rehosta-
bility is largely determined by the implementation language and the performance tradeoffs
that have been made. Suppose, for example, that we produce a complete design for a Pascal
compiler, specifying all modules and interfaces carefully. If this design is implemented by
writing a FORTRAN program that uses only constructs allowed by the FORTRAN standard,
then there is a good chance of its running unchanged on a wide variety of computers. If, on
the other hand, the design is implemented by writing a program in assembly language for the
Control Data 6000 series then running it on another machine would involve a good deal of
effort.
Even when we fix both the design and the implementation language, performance consid-
erations may affect rehostability. For example, consider the use of bit vectors (say as parser
director sets or error matrices, or as code generator decision table columns) when the imple-
mentation language is Pascal. One possible representation is a set, another is a packed array
of Boolean. Unfortunately, some Pascal implementations represent all sets with the same
number of bits. This usually precludes large sets, and the bit vectors must be implemented
as arrays of sets or packed arrays of Boolean. Other implementations only pack arrays to the
byte level, thus making a packed array of Boolean eight times as large as it should be. Clearly
when the compiler is rehosted from a machine with one of these problems to a machine with
the other, different implementations of bit vectors may be needed to meet performance goals.
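Such representation changes can be confined to a single module that exports the bit vector
operations. The following sketch is ours rather than from any particular compiler, and the
names are chosen for illustration; rehosting then means re-implementing this one module:

const vecmax = 255;                       (* highest element index *)
type bitvector = set of 0 .. vecmax;
     (* alternative for implementations with restricted set size:
        bitvector = packed array [0 .. vecmax] of boolean *)

procedure insert (var v: bitvector; i: integer);
begin
  v := v + [i]                            (* packed array version: v[i] := true *)
end;

function member (var v: bitvector; i: integer): boolean;
begin
  member := i in v                        (* packed array version: member := v[i] *)
end;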
Neither of the situations in the two previous paragraphs affected the design (set of mod-
ules and interfaces). Rehostability is thus quite evidently a property of the implementation.
Retargetability, on the other hand, is more dependent upon the design. It requires a clean
separation between the analysis and synthesis tasks, since the latter must be redesigned in
order to retarget the compiler. If the target machine characteristics have been allowed to
influence the design of the analysis task as well as the synthesis task, then the redesign will
be more extensive. For example, suppose that the design did not contain a separate constant
table module. Operations on constants were carried out wherever they were needed, following
the idiosyncrasies of the target machine. Retargeting would then involve redesign of every
module that performed operations on constants, rather than redesign of a single module.
Although the primary determinant of retargetability is the design, implementation may
have an effect in the form of tradeoffs between modularity and performance that destroy the
analysis/synthesis interface. Such tradeoffs also degrade the maintainability, as indicated at
the beginning of this section. This should not be surprising, because retargeting a compiler is,
after all, a form of maintenance: The behavior of the program must be altered to fit changing
customer requirements.
14.1.2 Pass Structure
One strategy for organizing the compilation is analogous to that of a dental office in which
the patient sits in a chair and is visited in turn by the dentist, hygienist and x-ray technician: The program
is placed in the primary storage of the machine and the phases of the compiler are `passed
by the program', each performing a transformation of the data in memory. This strategy is
appropriate for systems with restricted secondary storage capability. It does not require that
intermediate forms of the program be written and then reread during compilation; a single
read-only file to hold the compiler itself is sufficient. The size of the program that can be
compiled is limited, but it is generally possible to compile programs that will completely fill
the machine's memory at execution time. (Source and intermediate encodings of programs
are often more compact than the target encoding.)
Another strategy is analogous to that of a bureau of motor vehicles in which the applicant
first goes to a counter where application forms are handed in, then to another where written
tests are given, and so on through the eye test, driving test, cashier and photographer: The
compiler `passes over the program', repeatedly reading and writing intermediate forms, until
the translation is complete. This strategy is appropriate for systems with secondary storage
that can support several simultaneously-open sequential files. The size of the program that
can be compiled is limited by the filing system rather than the primary memory. (Of course
primary memory will limit the complexity of the program as discussed in Chapter 1.)
Either strategy requires us to decompose the compilation into a sequence of transforma-
tions, each of which is completed before the next is begun. One fruitful approach to the
decomposition is to consider relationships between tasks and large data structures, organiz-
ing each transformation around a single data structure. This minimizes the information flow
between transformations, narrowing the interfaces. Table 14.1 illustrates the process for a
typical design. Each row represents a transformation. The first column gives the central data
structure for the tasks in the second column. It participates in only the transformation cor-
responding to its row, and hence no two of these data structures need be held simultaneously.
Our second strategy places an extra constraint upon the intermediate representations of
the program: They must be linear, and each will be processed sequentially. The transforma-
tions are carried out by passes, where a pass is a single scan, in either direction, of a linear
intermediate representation of the program. Each pass corresponds to a traversal of the
structure tree, with forward passes corresponding to depth-first, left-to-right traversals and
backward passes corresponding to depth-first, right-to-left traversals. Under this constraint
we are limited to AAG(n) attribution; the attribute dependencies determine the number of
passes and the tasks carried out in each. It is never necessary to build an explicitly-linked
structure tree unless we wish to change traversals. (An example is the change from a depth-
first, left-to-right traversal of an expression tree to an execution-order traversal based upon
register counts.)
The basic Pascal file abstraction is a useful one for the linear intermediate representations
of the program. A module encapsulates the representation, providing an element type and a
single window variable of that type. Operations are available to empty the sequence, add the
content of the window to the sequence, get the first element of the sequence into the window,
get the next element of the sequence into the window, and test for the end of the sequence.
This module acts as a `pipeline' between the passes of the compiler, with each operating
directly on the window. By implementing the module in different ways we can cause the
communicating passes to operate as coroutines or to interact via a file.
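A minimal sketch of such a module follows, with assumed names and an in-memory body
(the element record stands for the real intermediate-representation element). Replacing the
bodies with operations on a Pascal file yields the version in which the passes communicate
via secondary storage:

const maxseq = 1000;                      (* assumed capacity *)
type element = record tag: integer end;   (* stands for the real element type *)

var window: element;                      (* the single window variable *)
    buffer: array [1 .. maxseq] of element;
    count, current: integer;

procedure emptyseq;                       (* empty the sequence *)
begin count := 0; current := 0 end;

procedure putelement;                     (* add the window to the sequence *)
begin count := count + 1; buffer[count] := window end;

procedure getfirst;                       (* first element into the window;
                                             assumes a nonempty sequence *)
begin current := 1; window := buffer[1] end;

procedure getnext;                        (* next element into the window *)
begin current := current + 1; window := buffer[current] end;

function atend: boolean;                  (* test for the end of the sequence *)
begin atend := current >= count end;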
While secondary storage is larger than primary storage, constraints on space are not
uncommon. Moreover, a significant fraction of the passes may be I/O-bound and hence
any reduction in the size of an intermediate representation will be reflected directly in the
compilation time. Our communication module, if it writes information to a file, should
therefore encode that information carefully to avoid redundancy. In particular, the element
will usually be a variant record and the communication module should transmit only the
information present in the stated variant (rather than always assuming the largest variant).
Further compression may be possible given a knowledge of the meanings of the fields. For
example, in the token of Figure 4.1 the line number field of coordinates changes only rarely,
and need be included only when it does change. The fact that the line number is present can
be encoded by the classification field in an obvious way. Because most tokens are completely
specified by the classification field alone, this optimization can reduce the size of a token file
by 30%.
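A sketch of such an encoding routine, with an assumed token layout and marker value (it
abbreviates the actual fields of Figure 4.1):

const linemarker = 1023;                  (* assumed code for `line number follows' *)
type tokenclass = (ident, intconst, addop, mulop);   (* abbreviated *)
     token = record
       lineno: integer;
       case class: tokenclass of
         ident:        (sym: integer);    (* symbol table index *)
         intconst:     (value: integer);
         addop, mulop: ()                 (* classification alone suffices *)
     end;

procedure writetoken (var f: file of integer; var t: token; var curline: integer);
begin
  if t.lineno <> curline then begin       (* transmit the line number only on change *)
    write(f, linemarker); write(f, t.lineno);
    curline := t.lineno
  end;
  write(f, ord(t.class));
  case t.class of
    ident:    write(f, t.sym);
    intconst: write(f, t.value);
    addop, mulop:                         (* nothing further to transmit *)
  end
end;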
resulting block placed in the current element. In any case the pointer to the most-recently
delivered element is advanced along the list. Thus the list acts like a stack, and its final length
is the maximum number of entries the table required at one point in the compilation.
The disadvantage of this strategy is that the storage requirements are those that would
obtain if all tables in each pass reached their maximum requirement simultaneously. Often
this is not the case, and hence larger programs could have been accommodated if storage for
unused entries had been returned to the operating system.
Every pass that manipulates constant values must include the necessary operations of
the abstract data type constant table discussed in Section 4.2.2. Constant table defines an
internal representation for each type of value. This representation can be used as an attribute
value, but any manipulation of it (other than assignment) must be carried out by constant
table operations. We pointed out in Section 4.2.2 that the internal representation might sim-
ply describe an access function for a data structure within the constant table module. This
strategy should be used carefully in a multipass compiler to avoid broadening the interface
between passes: The extra data structure should usually not be retained intact and trans-
mitted from one pass to the next via a separate file. Instead, all of the information about
a constant should be added to the linearized form of the attributed structure tree at an ap-
propriate point. The extra data structure is then reconstituted as the linearized tree is read
in.
The string table is a common exception to the approach suggested above. Careful design
of the compiler can restrict the need for string table access to two tasks: lexical analysis
and assembly. (This is true even though it may be used to store literal strings and strings
representing the fractions of floating point numbers as well as identifiers.) Thus the string
table is often written to a separate file at the completion of lexical analysis. It is only retrieved
during assembly when the character representations of constants must be converted to target
code, and identifiers must be incorporated into external symbol dictionaries.
Note the interdependence of the decisions about representation of tokens and form of the
intermediate code. A 10-bit byte allows values in the range [0,1023]. By using the subrange
[512,1022] for identifiers, one effectively combines the classification and symbol fields of
Figure 4.1. Values less than 512 classify non-identifier tokens, in most cases characterizing
them completely. Only constants need more than a single byte using this scheme, and we
know that constants occur relatively infrequently. Interestingly, only string constants are
handled in pass 1. Those whose machine representations do not exceed 40 bits are replaced
by a marker byte followed by 4 bytes holding the representation. Longer strings are saved
on the drum and replaced in the code by a marker byte followed by 4 bytes giving the drum
track number and relative address. In the terminology of Section 4.2.2, the constant table has
separate fixed-length representations for long and short strings. Numeric constants remain in
the text as strings of bytes, one corresponding to each character of the constant.
Pass 3 performs the normal syntactic analysis, and also converts numeric and logical
constants to a flag byte followed by 4 bytes giving the machine representation. Again in the
terminology of Section 4.2.2, the internal and target representations of numeric constants are
identical. (The flag byte simply serves as the classification field of Figure 4.1; it is not part
of the constant itself.) Naur's description of the compiler strongly suggests that parsing is
carried out by the equivalent of a pushdown automaton while the lexical analysis of pass 1
is more ad-hoc. As we have seen, numeric constants can be handled easily by a pushdown
automaton. The decision to process numeric and logical constants in pass 3 rather than in
pass 1 was therefore probably one of convenience.
The intermediate output from pass 3 consists of the unchanged identifiers and constants,
and a transformed set of delimiters that precisely describe the program's structure. It is
effectively a sequence of connection point numbers and tokens, with the transformed delimiters
specifying structure connections and each identifier or constant specifying a single symbol
connection plus the associated token.
Attribute flow is generally from declaration to use. Since declaration may follow use
in ALGOL 60, reverse attribute flow may occur. Pass 4 is a reverse pass that collects all
declarative information of a block at the head of the block. It merely simplifies subsequent
processing.
In pass 5, the definition table is actually distributed through the text. Each identifier is
replaced by a 4-byte group that is the corresponding definition table entry. It gives the kind
(e.g. variable, procedure), result type, block number, relative address and possibly additional
information. Thus GIER ALGOL does not abstract entities as proposed in Section 4.2.3, but
deposits the necessary information at the leaves of the structure tree. This example empha-
sizes the fact that possessions and definitions are separate. GIER ALGOL uses possessions
virtually identical to those discussed in connection with Figure 9.21 to control placement of
the attributes during pass 5, but it has no explicit definition table at all.
Given the attribute propagation performed by passes 4 and 5, the attribution of pass 6 is
LAG(1). This illustrates the interaction between attribute flow and pass structure. Given an
attribute grammar, we must attempt to partition the relationships and semantic functions
so that they fall into separable components that can be fit into the overall implementation
model. This partitioning is beyond the current state of the art for automatic generators.
We can only carry out the partitioning by hand and then use analysis tools based upon the
theorems of Chapter 8 to verify that we have not made any mistake.
Address calculations are carried out during both pass 7 and pass 8. Backward references
are resolved by pass 7; pass 8 is backward over the program, and hence can trivially resolve
forward references. Literal pooling is also done during pass 7. All of the constants used in
the code on one drum track appear in a literal pool on that track.
basic symbol
program
block
constant
type
simple type
field list
label declaration
constant declaration
type declaration
variable declaration
procedure declaration
parameter list
body
statement
selector
variable
call
expression
simple expression
term
factor
assignment
compound statement
goto statement
if statement
case statement
while statement
repeat statement
for statement
with statement
Figure 14.1: The Structure of the Zurich Pascal Compilers
The overall structure of the compiler was established in step 1; Figure 14.1 shows this
structure. Each line represents a procedure, and nesting is indicated by indentation. At this
step the procedure bodies had the form discussed in Section 7.2.2, and implemented an EBNF
description of the language.
Lexical analysis is carried out by a single procedure that follows the outline of Chapter 6.
It has no separate scanning procedures, and it incorporates the constant table operations
for conversion from source to internal form. Internal form and target form are identical. No
internal-to-target operators are used, and the internal form is manipulated directly via normal
Pascal operations.
There is no symbol table. Identifiers are represented internally as packed arrays of 10
characters - one 60-bit word. If the identifier is shorter than 10 characters then it is padded
on the right with spaces; if it is longer then it is truncated on the right. (We have already
deplored this strategy for a language whose definition places no constraints upon identifier
length.) Although the representation is fixed-length, it still does not define a small enough
address space to be used directly as a pointer or table index. Name analysis therefore requires
searching and, because there may be duplicate identifiers in different contexts, the search space
may be larger than in the case of a symbol table. Omission of the symbol table does not save
much storage because most of the symbol table lookup mechanism must be included in the
name analysis.
Syntactic error recovery is carried out using the technique of Section 12.2.2. A minor
modification was needed because the stack is not accessible when an error is detected: Each
procedure takes an anchor set as an argument. This set describes the anchors after reduction
of the nonterminal corresponding to the procedure. Symbols must be added to this set
to represent anchors within the production currently being examined. Of course all of the
code to update the anchors, check for errors, skip input symbols and advance the parse was
produced by hand. This augmentation of the basic step 1 routines constituted step 2 of
the compiler development. The basic structure of Figure 14.1 remained virtually unchanged;
common routines for error reporting and skipping to an anchor were introduced, with the
former preceding the basic symbol routine (so that lexical errors could be reported) and the
latter following it (so that the basic symbol routine could be invoked when skipping).
Step 3 was concerned with building the environment attribute discussed in Section 9.1.1.
Two record types, identrec and structrec, were added to the existing compiler. The envi-
ronment is a linked data structure made up of records of these types. There is one identrec
per declared identifier, and those for identifiers declared in the same range are linked as an
unbalanced binary tree. An array of pointers to tree roots constitutes the definition of the
current addressing environment. Three of the definition table operations discussed in Sec-
tion 9.2 (add a possession to a range, search the current environment, search a given range)
are implemented as common routines while the others are coded in line. Entering and leaving
a range are trivial operations, involving pointer assignment only, while searching the current
environment is complex. This is exactly the opposite of Figure 9.21, which requires complex
behavior on entry to and exit from a range with simple access to the current environment.
The actual discrepancy between the two techniques is reduced, however, when we recall that
the Zurich compiler does not perform symbol table lookups.
Each identrec carries attribute information as well as the linkages used to implement the
possession table. Thus the possessions and definitions are combined in this implementation.
The type attribute of an identifier is represented by a pointer to a record of type structrec,
and there is one such record for every defined type. Certain types (as for example scalar
types) are defined in terms of identifiers and hence a structrec may point to an identrec. The
identrec contains an extra link field, beyond those used for the range tree, to implement lists
of identifiers such as scalar constants, record fields and formal parameters.
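The following declarations suggest the flavor of these two record types; the field names are
ours, and the actual records carry many more attributes:

type alfa = packed array [1 .. 10] of char;
     identref = ^identrec;
     structref = ^structrec;
     identrec = record
       name: alfa;                 (* the 10-character internal representation *)
       llink, rlink: identref;     (* unbalanced binary tree of one range *)
       next: identref;             (* extra link: scalar constants, record fields,
                                      formal parameters *)
       idtype: structref           (* the type attribute *)
     end;
     structrec = record
       size: integer;              (* block size; its actual form varies with the type *)
       firstelem: identref         (* e.g. a scalar type points to its constants,
                                      so a structrec may point to an identrec *)
     end;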
The procedures of Figure 14.1 can be thought of as carrying out a depth-first, left-to-
right traversal of the parse tree even though that tree never has an explicit incarnation.
Since only one pass is made over the source program, the attribution rules must meet the
LAG(1) condition. They were simply implemented by Pascal statements inserted into the
procedures of Figure 14.1 at the appropriate points. Thus at the conclusion of step 3 the
bodies of these procedures still had the form of Section 7.2.2, but contained additional Pascal
code to calculate the environment attribute. As discussed in Section 8.3.2, attribute storage
optimization led to the representation of the environment attribute as a linked, global data
structure rather than an item stored at each parse tree node. The interesting part of the
structure tree is actually represented by the hierarchy of activation records of the recursive
descent procedures. Attribute values attached to the nodes are stored as values of local
variables of these procedures.
During step 4 of the refinement the remainder of the semantic analysis was added to
the routines of Figure 14.1. This step involved additional attribution and closely followed
the discussion of Chapter 9. Type definitions were introduced for the additional attributes,
global variables were declared for those attributes whose storage could be optimized, and local
variables were declared for the others. The procedures of Figure 14.1 were augmented by the
Pascal code for the necessary attribution rules, and functions were added to implement the
recursive attribute functions.
Ammann [1975] reports that steps 1-4 occupied a bit more than 6 months of the 24-month
project and accounted for just over 2000 of the almost 7000 lines in Pascal-6000. Steps 5 and
6 for Pascal-P were carried out in less than two and a half months and resulted in about 1500
lines of Pascal, while the corresponding numbers for Pascal-6000 were thirteen months and
4000 lines. Step 7 added another three and a half months to the total cost of Pascal-6000,
while increasing the number of lines by less than 1000.
The abstract stack computer that is the target for the Pascal-P compiler is carefully
matched to Pascal. Its elementary operators and data types are those of Pascal, as are
its memory access paths. There are special instructions for procedure entry and exit that
provide exactly the effect of a Pascal procedure invocation, and an indexed jump instruction
for implementing a case selection. Code generation for such a machine is clearly trivial, and
we shall not consider this part of the project further.
Section 10.1 describes storage allocation in terms of blocks and areas. A block is an object
whose size and alignment are known, while an area is an object that is still growing. In Pascal,
blocks are associated with completely-defined types, whereas areas are associated with types
in the process of definition and with activation records. Thus Pascal-6000 represents blocks
by means of a size field in every structrec. The actual form of this field varies with the
type defined by the structrec; there is no uniform "size" attribute like that of Figure 10.1.
Because of the recursive descent architecture and the properties of Pascal, the lifetime of an
area coincides with the invocation of one of the procedures of Figure 14.1 in every case. For
example, an area corresponding to a record type grows only during an invocation of the field
list procedure. This means that the specification of an area can be held in local variables
of a procedure. Step 5 added these local variable declarations and the code to process area
growth to the procedures of Figure 14.1. The size field was also added to structrec in this
step.
Step 6 was the first point at which a `foreign' structure - the structure of the target
machine - appeared. This refinement was thus the first that added a significant number
of procedures to those of Figure 14.1. The added procedures effectively act as modules for
simulation and assembly.
As we pointed out earlier, no explicit structure tree is ever created by Pascal-6000. This
means that the structure tree cannot be decorated with target attributes used to determine
an improved execution order and then traversed according to this execution order for code
selection. Pascal-6000 thus computes no target attributes other than the value descriptors of
Section 10.3.1. They are used in conjunction with a set of register descriptors and register
allocation operations to perform a machine simulation exactly as discussed in Section 10.3.1.
The recursive descent architecture once again manifests itself in the fact that global storage
is provided for only one value descriptor. Most value descriptors are held as local variables of
procedures appearing in Figure 14.1, with the global variable describing the `current' value -
the one that would lie at the `top of the stack'.
The decision tables describing code selection are hand-coded as Pascal conditionals and
case statements within the analysis procedures. Code is generated by invoking register alloca-
tion procedures, common routines such as load and store, and assembly interface procedures
from Table 14.4.
The first four operations of Table 14.4 assemble target code sequentially; Pascal-6000 does
not have the concept of separate sequences discussed in Section 11.1.1. A `location counter'
holds the current relative address, which may be accessed by any routine and saved as a
label. The third operand of a 30-bit instruction may be either an absolute value or a relative
address, and gen30 has a fourth parameter to distinguish these cases. Forward references are
Procedure Description
noop Force code alignment to a word boundary
gen15 Assemble a 15-bit instruction
gen30 Assemble a 30-bit instruction
gen60 Assemble a 60-bit constant
searchextid Set up an external reference
ins Satisfy a given forward reference
lgohead Output PIDL and ENTR
lgotext Output TEXT
lgoend Output XFER and LINK
Table 14.4: Pascal-6000 Assembly Operations
handled by ins, which allows a relative address to be stored at a given position in the code
already assembled.
In keeping with the one-pass architecture, Pascal-6000 retains all of the code for a single
procedure. The assembly `module' is initialized when the `body' procedure (Figure 14.1) is
invoked, and a complete relocatable deck is output at the end of this invocation to finalize
the `module'. Pascal-6000 uses Control Data's standard relocatable binary text as its target
code, in keeping with our admonition at the beginning of Section 11.2. We shall discuss the
layout of that text here in some detail as an illustration; another example, the IBM 370 object
module, will be given at the end of the next section.
[Figure omitted: the layouts of the five table kinds of a relocatable subprogram - PIDL
(code 34), ENTR (code 36), TEXT (code 40), LINK (code 44) and XFER (code 46). A TEXT
table carries a load address, relocation bits and up to n = 15 text words; an ENTR table
pairs each entry point symbol with its address; a LINK table lists each external symbol
followed by the operand fields it must modify; an XFER table names the start symbol.]
Figure 14.2: Control Data 6000 Series Relocatable Binary Code
A relocatable subprogram is a logical record composed of a sequence of tables (Figure 14.2),
which are simply blocks of information with various purposes. The first word of each table
contains an identifying code and specifies the number of additional 60-bit words in the table.
As with any record, a relocatable subprogram may be preceded by a prefix table containing
arbitrary information (such as the date compiled, version of the compiler, etc.), but the first
component of the subprogram proper is the program identification and length (PIDL) table.
PIDL is conventionally followed by an entry point (ENTR) table that associates entry point
symbols with the locations they denote (Section 11.2.1), but in fact the loader places no
constraints on either the number or the position(s) of any tables other than PIDL.
The body of the subprogram is made up of TEXT tables. Each TEXT table specifies a
block of up to 15 words, the first of which should be loaded at the specified address. Four
relocation bits are used for each text word (hence the limit of 15 text words). References to
external symbols are not indicated by the relocation bits, which only distinguish absolute and
signed relative addresses. External references are specified by LINK tables: For each external
symbol, a sequence of operand field definitions is given. The loader will add the address of
the external symbol to each of the fields so defined. Thus a call of "sqrt", for example, would
appear in the TEXT table as an RJ (return jump) instruction with the absolute value 0 as
its operand. This 0-field would then be described in a LINK table by one of the operand field
definitions following the symbol sqrt. When the loader had determined the address of sqrt it
would add it to the 0-field, thus changing the instruction into RJ sqrt. There is no restriction
on the number of LINK tables, the number of times a symbol may appear or the number of
field definitions that may follow a single symbol. As shown in Figure 14.2, each field definition
occupies 30 bits, each symbol occupies 60 bits, and a symbol may be split between words.
The transfer (XFER ) table is conventionally associated with a main program. It gives
the entry point to which control is transferred after the loader has completed loading the
program. Again, however, the loader places no restriction on the number of XFER tables
or the subprograms with which they are associated. An XFER table is ignored if its start
symbol begins with a space, or if a new XFER whose start symbol does not begin with a
space is encountered. The only requirement is that, by the time the load is completed, a start
symbol that is an entry point of some loaded subprogram has been specied.
Internal and external references, either of which may occur in a 30-bit instruction, are
represented quite differently in the target code. This is reflected at the assembly interface
by the presence of searchextid. When a 30-bit instruction is emitted, gen30 checks a global
pointer. If it is not nil then it points to an external symbol, and gen30 adds the target
location of the current instruction's third operand to a list rooted in that symbol. This list
will ultimately be used by lgoend to generate a LINK table. The global pointer checked by
gen30 is set by searchextid and cleared to nil by gen30. When the code generator emits a
30-bit instruction containing an external reference it therefore first invokes searchextid with
the external identifier and then invokes gen30 with the absolute value 0 as the third operand.
Section 11.3.1 gives an alternative strategy.
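In outline the protocol might be implemented as follows; extrec, lookupext and emit30 stand
for details that are elided here, and none of these declarations are the compiler's actual ones:

type alfa = packed array [1 .. 10] of char;
     extref = ^extrec;
     extrec = record
       name: alfa;
       nrefs: integer;
       ref: array [1 .. 100] of integer   (* locations whose third operand must
                                             receive this symbol's address *)
     end;

var extptr: extref;                       (* set by searchextid, cleared by gen30 *)
    loc: integer;                         (* the location counter *)

function lookupext (id: alfa): extref;
  var e: extref;
begin
  new(e);                                 (* a real version first searches a table *)
  e^.name := id; e^.nrefs := 0;
  lookupext := e
end;

procedure emit30 (op, i, j, k: integer; relative: boolean);
begin
  (* pack the fields into the code image; elided *)
end;

procedure searchextid (id: alfa);
begin
  extptr := lookupext(id)
end;

procedure gen30 (op, i, j, k: integer; relative: boolean);
begin
  if extptr <> nil then begin             (* an external reference: remember the spot *)
    extptr^.nrefs := extptr^.nrefs + 1;   (* lgoend later emits these lists as LINK *)
    extptr^.ref[extptr^.nrefs] := loc;
    extptr := nil
  end;
  emit30(op, i, j, k, relative);
  loc := loc + 1
end;

A call of sqrt would thus be emitted as searchextid followed by gen30 with the RJ operation
code and the absolute value 0 as the third operand.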
spaces, and then classified. Based upon the classification, ad hoc analysis routines are used to
deal with the parts of the statement. All of these routines have similar structures: They scan
the statement from left to right, extracting each operand and making an entry for it in the
definition table if one does not already exist, and building a linear list of operator/operand
pairs. The operator of the pair is the operator that preceded the operand; for the first pair
it is the statement class. An operand is represented by a pointer to the definition table plus
its type and kind (constant, simple variable, array, etc.). The type and kind codes are also in
the definition table entry, and are retained in the list solely to simplify access.
Phase 10 performs only a partial syntactic analysis of the source program. It does not
determine the tree structure within a statement, but it does extract the statement number
and classify some delimiters that have multiple meaning. For example, it replaces `(' by `left
arithmetic parenthesis', `left subscript parenthesis' or `function parenthesis' as appropriate.
Name analysis is rudimentary in FORTRAN because the meaning of an identifier is inde-
pendent of the structure of a program unit. This means that no possessions are required, and
the symbol and definition tables can be integrated without penalty. Symbol lookup uses a
simple linear scan of the chained definition table entries, but the organization of the chains is
FORTRAN-specific: There is one ordered chain for each of the six possible identifier lengths,
and each chain is doubly-linked with the header pointing to the center of the chain. Thus a
search on any chain only involves half the entries. (The header is moved as entries are added
to a chain, in order to maintain the balance.) Constants, statement numbers and common
block names also have entries in the definition table. Three chains are used for constants, one
for each allowable length (4, 8 or 16 bytes), and one each for statement numbers and common
block names.
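A sketch of such a centered search (declarations assumed for illustration; insertion, which
keeps the header pointing at the center, is omitted):

type alfa6 = packed array [1 .. 6] of char;
     entryref = ^entry;
     entry = record
       name: alfa6;
       prev, next: entryref               (* doubly-linked, ordered chain *)
     end;

function search (center: entryref; id: alfa6): entryref;
  var p: entryref; done: boolean;
begin
  p := center; done := false;
  while not done do
    if p = nil then done := true
    else if p^.name = id then done := true
    else if id < center^.name then p := p^.prev   (* walk toward smaller names *)
    else p := p^.next;                            (* walk toward larger names *)
  search := p                  (* nil if id is absent; a real version would stop
                                  at the first name past id *)
end;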
The only semantic analysis done during Phase 10 is `declaration processing'. Type, di-
mension, common and equivalence statements are completely processed and the results sum-
marized in the definition table. Because FORTRAN does not require that identifiers be
declared, attribute information must also be gathered from applied occurrences. A minor use
of the attribute information is in the classification of left parentheses (mentioned above), be-
cause FORTRAN does not make a lexical distinction between subscript brackets and function
parentheses.
Phase 15 completes the syntactic analysis, converting the lists of operator/operand pairs
to lists of quadruples where appropriate. Each quadruple consists of an operator, a target type
and three pointers to the definition table. This means that phase 15 also creates a definition
table entry for every anonymous intermediate result. Such `temporary names' are treated
exactly like programmer-defined variables in subsequent processing, and may be eliminated by
various optimizations. The quadruples are chained in a correct (but not necessarily optimum)
execution order and gathered into basic blocks.
Semantic analysis is also completed during phase 15, with all operator identification and
consistency checking done as the quadruples are built. The target type is expressed as a
general type (logical, integer, real) plus an operand type (short, long) for each operand and
for the result.
The syntactic and semantic analysis tasks of phase 15 are carried out by an overlay segment
known as PHAZ15, which also gathers defined/used information for common subexpression
and dead variable analysis. This information is stored in basic block headers as discussed
in Chapter 13. Finally, PHAZ15 links the basic block headers to both their predecessors
and their successors, describing the flowgraph of the program and preparing for dominance
analysis.
CORAL is the second overlay segment of phase 15, which carries out the memory mapping
task. The algorithm is essentially that discussed in Section 10.1, but its only function is to
assign addresses to constants and variables (in other words, to map the activation record).
There are no variant records, but equivalence statements cause variables to share storage. By
convention, the activation record base is in register 13. The layout of the activation record is
given in Figure 14.3. It is followed immediately by the code for the program unit. (Remember
that storage allocation is static in FORTRAN.) The size of the save area (72 bytes) and its
alignment (8) are fixed by the implementation, as is the size of the initial contents for register
12 (discussed below). Storage for the computed GOTO tables and the parameter lists has
already been allocated by Phase 10. CORAL allocates storage for constants first, then
for simple variables and then for arrays. Local variables and arrays mentioned in equivalence
statements come next, completing this part of the activation record. Finally the common
blocks specified by the program unit are mapped as separate areas.
Save area
Initial contents for register 12
Branch tables for computed GOTO's
Parameter lists
Constants and local variables
Address values (`adcons')
Namelist dictionaries
Compiler-generated temporaries
Label addresses
Figure 14.3: FORTRAN H Activation Record
System/360 access paths limit the maximum displacement to 4095. When a larger dis-
placement is generated during CORAL processing, the compiler defines an adcon variable
- a new activation record base - and resets the displacement relative to this new base for
further processing. CORAL does not place either adcons or temporaries into the activation
record at this time, because they may be deleted during optimization.
Phase 20 assigns operands to registers. If the user has specified optimization level 0, the
compiler treats the machine as having one accumulator, one base register and one register for
specifying jump addresses (Table 14.6). Machine simulation (Section 10.3.1) is used to avoid
redundant loads and stores, but no change is made in the execution order of the quadruples.
Attributes are added to the quadruples, specifying the register or base register used for each
operand and for the result.
Level 1 optimization makes use of a pool of general-purpose registers, as shown in Ta-
ble 14.6. Register 13 is always reserved as the base of the activation record. A decision
about whether to reserve some or all of registers 9-12 is made on the basis of the number of
quadruples output by phase 15. This statistic is available prior to register allocation, and it
predicts the size of the subprogram code. Once the register pool is fixed, phase 20 performs
local register assignment within basic blocks and global assignment over the entire program
unit. Again, the order of the quadruples is unchanged and attributes giving the registers used
for each operand or memory access path are added to the quadruples.
Common subexpression elimination, live/dead analysis, code motion and strength reduc-
tion are all performed at optimization level 2. The register assignment algorithms used on
the entire program unit at level 1 are then applied to each loop of the modified program,
starting with the innermost and ending with the entire program unit. This guarantees that
the register assignment within an inner loop will be determined primarily by the activity
of operands within that loop, whereas at level 1 it may be influenced by operand activity
elsewhere in the program.
The basic implementation used for a branch is to load the target address of the branch
into a register and then execute an RR-format branch instruction. This requires an adcon for
every basic block whose first instruction is a branch target. If a register already happened to
hold an address less than 4096 bytes lower than the branch target, however, both the load and
the adcon would be unnecessary. A single RX-format branch instruction would suffice. Thus
the compiler reserves registers to act as code bases. To understand the mechanism involved,
we must consider the layout of information in storage more carefully.
Register   Assignment at optimization level 0        Assignment at levels 1,2
 0         Operands and results
 1
 2
 3         Not used
 4                                                   Operands and results
 5         Branch addresses                          Selected logical operands
 6                                                   Operands representing index values
 7                                                   Base addresses
 8
 9         Not used
10                                                   Code bases or operands and results
11
12                                                   Adcon base
13         Activation record base
14         Computed GOTO, logical results            Operands and results
           of comparisons
15         Computed GOTO
Table 14.6: General-Purpose Register Assignment by FORTRAN H
We have already seen that phase 15 allocates activation record storage for constants
and programmer-defined variables, generating adcons as necessary to satisfy the displace-
ment limit of 4095. When register allocation is complete, all adcons and temporary vari-
ables that have not been eliminated are added to the activation record. The adcons
must all be directly addressable, since they must be loaded to provide base addresses for
memory access. If they are not all within 4095 bytes of the activation record base then
the reserved register 12 is assumed to contain either the address of the first adcon or
(base address of the activation record + 4096), whichever is larger. It is assumed that
the number of adcons will never exceed 1024 (although this is theoretically possible, given
the address space of System/360) and hence all adcons will be directly accessible via either
register 12 or register 13. (Note that a fail-safe decision to reserve register 12 can be made
on the basis of the phase 15 output, without regard to the number of quadruples.)
If the number of quadruples output from phase 15 is large enough, register 11 will be
reserved and initialized to address the 4096th byte beyond that addressed by register 12.
Similarly, for a larger number of quadruples, register 10 will be reserved and initialized to an
address 4096 larger than register 11. Finally, register 9 will be reserved and initialized for an
even larger number of quadruples. Phase 20 can calculate the maximum possible address of
each basic block. Those lying within 4096 bytes of one of the reserved registers are marked
with the register number and displacement. The adcon corresponding to the basic block label
is then deleted. (These deletions, plus the ultimate shortening of the basic blocks due to
optimization of the branch instructions, can never invalidate the addressability conditions on
the basic blocks.)
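The addressability computation itself is simple arithmetic. The following hypothetical helper
(interface and names ours) yields the covering reserved register and displacement, or 0 if the
basic branch implementation must be used:

function coveringreg (blockaddr, base12, lowreg: integer;
                      var disp: integer): integer;
  var r, base: integer;
begin
  coveringreg := 0;
  base := base12;                         (* register 12 covers base12 .. base12+4095 *)
  for r := 12 downto lowreg do begin      (* register 11 starts 4096 higher, and so on *)
    if (blockaddr >= base) and (blockaddr < base + 4096) then begin
      disp := blockaddr - base;
      coveringreg := r
    end;
    base := base + 4096
  end
end;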
The branch optimization described in the previous paragraphs is carried out only at
optimization levels 1 and 2. At optimization level 0 the basic implementation is used for all
branches.
Phase 25 uses decision tables to select the proper sequence of machine instructions. The
algorithm is basically that of Section 10.3.2, except that the action stub of the decision table
is simply a sequence of instruction templates. Actions such as swap and lreg (Figure 10.13)
have already been carried out during phase 20. There is conceptually one table for every
quadruple operator. Actually, several tables are associated with families of operators, and
the individual operator modifies the skeletons as they are extracted. The condition is selected
by a 4-bit status, which may have somewhat different meanings for different operators. It is
used as an index to select the proper column of the table, which in turn identifies the templates
to be used in implementing the operator.
FORTRAN H generates System/360 object modules, which are sequences of 80-character
card images (Figure 14.4). Each card image is output by a normal FORTRAN formatted
write statement. The first byte contains 2, which is the communication control character
STX (start of text). All other fields left blank in Figure 14.4 are unused. Columns 2-4 and
73-80 contain alphanumeric information as indicated, with the serial number consisting of
a four-character deck identifier and a four-digit sequence number. The remaining columns
simply contain whatever character happens to have the value of the corresponding byte as its
EBCDIC code. Thus 24-bit (3-byte) addresses occupy three columns and halfword (2-byte)
integers occupy two columns. Even though the length field n has a maximum value of 56, it
occupies a halfword because System/360 has no byte arithmetic.
Comparing Figure 14.4 with Figure 14.2, we see that essentially the same elements are
present. END optionally carries a transfer address, thus subsuming XFER. ESD plays the
roles of both PIDL and ENTR, and also specifies the symbols from LINK. Its purpose is
to describe the characteristics of the control sections associated with global symbols, and
to define short, fixed-length representations (the esdid's) for those symbols. The esdid in
columns 15-16 identifies a deck or external; only one symbol of these types may appear on
an ESD card. Entry symbols identify the control sections to which they belong (ldid), and
therefore they may be placed on any ESD card where space is available.
RLD provides the remaining function of LINK, and also that of the relocation bits in
TEXT. Each item of relocation information modifies the field at the absolute location specified
in the position esdid and address by either adding or subtracting the value identified by the
relocation esdid. Byte f determines whether the value will be added or subtracted, and also
specifies the width of the field being modified (which may be 1, 2, 3 or 4 bytes). If a sequence
of relocations involve the same esdid's then these specifications are omitted from the second
and subsequent relocations. (The rightmost bit of f is 1 if the following relocation does not
specify esdid's, 0 otherwise.)
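A decoding sketch for f: only the continuation bit is fixed by the description above, and the
positions assumed here for the sign and width bits are illustrative rather than the
authoritative layout:

procedure decodeflag (f: integer; var subtract: boolean;
                      var width: integer; var samenext: boolean);
begin
  samenext := odd(f);              (* rightmost bit: the next item omits the esdids *)
  subtract := odd(f div 2);        (* assumed position of the add/subtract bit *)
  width := (f div 4) mod 4 + 1     (* assumed: two bits encode the widths 1 .. 4 *)
end;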
The decision to use relocation bits on the Control Data machine and the RLD mechanism
on System/360 reflects a fundamental difference in the instruction sets: 30-bit instructions
on the 6000 Series often reference memory directly, and therefore relocatable addresses are
common in the text. On System/360, however, all references to memory are via values in
registers. Only the adcons are relocatable and therefore relocatable addresses are quite rare
in the text.
Compiler writers often have incorrect ideas about the source of bottlenecks in their code. Measurement of critical
parameters of the compiler as it is running is thus imperative. These parameters include the
sizes of various data structures and the states of various allocation and lookup mechanisms,
as well as an execution histogram [Waite, 1973b]. The only description of GIER ALGOL in
the open literature is the paper by Naur [1964] cited earlier, but a very similar compiler for
a variant of Pascal was discussed in great detail by Hartmann [1977].
Ammann [1975] gives an excellent account in German of the development of Zurich Pascal,
and partial descriptions are available in English [Ammann, 1974, 1977].
In addition to the Program Logic Manual [IBM, 1968], descriptions of FORTRAN H have
been given by Lowry and Medlock [1969] and Scarborough and Kolsky [1980]. These
treatments concentrate on the optimization performed by Phase 20, however, and give very
little information about the compiler as a whole.
Appendix A
The Sample Programming
Language LAX
In this Appendix we define the sample programming language LAX (LAnguage eXample),
upon which the concrete compiler design examples in this book are based. LAX illustrates
the fundamental problems of compiler construction, but avoids uninteresting complications.
We shall use extended Backus-Naur form (EBNF) to describe the form of LAX. The
differences between EBNF and normal BNF are:
Each rule is terminated by a period.
Terminal symbols of the grammar are delimited by apostrophes. (Thus the metabrackets
`<' and `>' of BNF are superfluous.)
The following abbreviations are permitted:
Abbreviation       Meaning
X ::= α(β)γ.       X ::= αYγ.  Y ::= β.
X ::= α[β]γ.       X ::= αγ | α(β)γ.
X ::= u+.          X ::= Y.  Y ::= u | Yu.
X ::= u*.          X ::= [u+].
X ::= α||t.        X ::= α(tα)*.
Here α, β and γ are arbitrary right-hand sides of rules, Y is a symbol that does not
appear elsewhere in the specification, u is either a single symbol or a parenthesized
right-hand side, and t is a terminal symbol.
For a more complete discussion of EBNF see Section 5.1.4.
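As an illustration of the last abbreviation, the rule
X ::= expression || ','.
is shorthand for
X ::= expression (',' expression)*.
so that X derives one or more expressions separated by commas; this form is used for bound
pairs, parameters and fields in Section A.3.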
The axiom of the grammar is program. EBNF rules marked with an asterisk in this
Appendix are included to aid in the description of the language, but they do not participate
in the derivation of any sentence. Thus they define useless nonterminals in the sense of
Chapter 5.
one described here for the given computation. The meaning of constructs that do not satisfy
the rules given here is undefined. Whether, and in what manner, a particular implementation
of LAX gives meaning to undefined constructs is outside the scope of this definition.
Before translation, a LAX program is embedded in the following block, which is then
translated and executed:
declare standard declarations begin program end
The standard declarations provide defining occurrences of the predefined identifiers given
in Table A.1. These declarations cannot be expressed in LAX.
Identifier   Meaning
boolean Logical type
false Falsity
integer Integer type
nil Reference to no object
real Floating point type
true Truth
Table A.1: Predefined Identifiers
1. Let R be the text of A, and let B be the block in which the LAX program is embedded.
2. Let R′ be the smallest range properly containing R, and let T be the text of R′ excluding
the text of all ranges nested within it.
3. If T does not contain a defining occurrence of I, and R′ is not B, then let R be R′ and
go to step (2).
4. If T contains a defining occurrence of I then that defining occurrence is D.
Identifier is a defining occurrence in the productions for label definition (A.2.0.6), itera-
tion (A.2.0.7), variable declaration (A.3.0.2), identity declaration (A.3.0.7), procedure dec-
laration (A.3.0.8), parameter (A.3.0.10), type declaration (A.3.0.12) and field (A.3.0.14).
All other instances of identifier are applied occurrences.
A.2.3 Blocks
The execution of a block begins with a consistent renaming: If an identifier has defining
occurrences in this block (excluding all blocks nested within it) then those defining occurrences
and all applied occurrences identifying them are replaced by a new identifier not appearing
elsewhere in the program.
After the consistent renaming, the declarations of the block are executed in the sequence
they were written and then the statements are executed as described for a statement list
(Section A.2.4). The result of this execution is the result of the block. The extent of the
result of a block must be larger than the execution of that block.
A.2.5 Iterations
The iteration
while expression do statement list end
is identical in meaning to the conditional clause:
if expression then
statement list;
while expression do statement list end
end
The iteration
for identifier from initial value to final value do statement list end
is identical in meaning to the block:
A.3 Declarations
A.3.0.1 declaration ::= variable declaration
    | identity declaration
    | procedure declaration
    | type declaration.
A.3.0.2 variable declaration ::= identifier ':' type specification
    | identifier ':' 'array' '[' (bound pair || ',') ']' 'of' type specification.
A.3.0.3 type specification ::= identifier
    | 'ref' type specification
    | 'ref' array type
    | procedure type.
A.3.0.4 bound pair ::= expression ':' expression.
A.3.0.5 array type ::= 'array' '[' ','* ']' 'of' type specification.
A.3.0.6 procedure type ::=
    'procedure' ['(' (type specification || ',') ')'] [result type].
A.3.0.7 identity declaration ::=
    identifier 'is' expression ':' type specification.
A.3.0.8 procedure declaration ::= 'procedure' identifier procedure.
A.3.0.9 procedure ::= ['(' (parameter || ',') ')'] [result type] ';' expression.
A.3.0.10 parameter ::= identifier ':' type specification.
A.3.0.11 result type ::= ':' type specification.
A.3.0.12 type declaration ::= 'type' identifier '=' record type.
A.3.0.13 record type ::= 'record' (field || ',') 'end'.
A.3.0.14 field ::= identifier ':' type specification.
A.3.0.15 * type ::= type specification | array type | procedure type.
See Section A.4 for Expressions.
A.3.1 Values, Types and Objects
Values are abstract entities upon which operations may be performed, types classify values
according to the operations that may be performed upon them, and objects are the concrete
instances of values that are operated upon. Two objects are equal if they are instances of the
same value. Two objects are identical if references (see below) to them are equal. Every object
has a specified extent, during which it can be operated upon. The extents of denotations,
the value nil (see below) and objects generated by new (Section A.4.4) are unbounded; the
extents of other objects are determined by their declarations.
The predefined identifiers boolean, integer and real represent the types of truth values,
integers and floating point numbers respectively. Values of these types are called primitive
values, and have the usual meanings.
An instance of a value of type ref t is a variable that can refer to (or contain) an object
of type t. An assignment to a variable changes the object to which the variable refers, but
does not change the identity of the variable. The predefined identifier nil denotes a value of
type ref t, for arbitrary t. Nil refers to no object, and may only be used in a context that
specifies the referenced type t uniquely.
Values and objects of array and record types are composite. The immediate components
of an array are all of the same type, and the simple selectors are integer tuples. The immediate
components of a record may be of different types, and the simple selectors are represented by
identifiers. No composite object may have a component of its own type.
Values of a procedure type are specifications of computations. If the result type is omitted,
then a call of the procedure yields no result and the procedure is called a proper procedure;
otherwise it is called a function procedure.
If two types consist of the same sequence of basic symbols and, for every identifier in that
sequence, the applied occurrences in one type identify the same defining occurrence as the
applied occurrences in the other, then the two types are the same. In all other cases, the two
types are different.
A.3.2 Variable Declarations
A variable referring to an undefined value (of the specified type) is created, and the identifier
represents this object. The extent of the created variable begins when the declaration is
executed and ends when execution of the smallest range containing the declaration is complete.
If the variable declaration has the form
identifier : t
then the created variable is of type ref t, and may refer to any value of type t. If, on the
other hand, it has the form
identifier : array [l1:u1, ..., ln:un] of t
then the created variable is of type ref array type, and may only refer to values having
the specified number of immediate components. The type of the array is obtained from
the variable declaration by deleting `identifier:' and each bound pair e1:e2; array [l1:
u1, ..., ln:un] of t specifies an array of this type with (u1 - l1 + 1) × ... × (un - ln + 1)
immediate components of type t. The bounds li and ui are integers with li ≤ ui.
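For example, the declaration a : array [1:3, 0:2] of integer creates a variable of type ref
array type whose values have (3 - 1 + 1) × (2 - 0 + 1) = 9 immediate components of type
integer.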
A.3.3 Identity Declarations
A new instance of the value (of the specified type) resulting from evaluation of the expression
is created, and the identifier represents this object. If the expression yields an array or
reference to an array, the new instance has the same bounds. The extent of the created
object is identical to the extent of the result of the expression.
A.3.4 Procedure Declarations
A new instance of the value (of the specified procedure type) resulting from copying the basic
symbol sequence of the procedure is created, and the identifier represents this object. The
extent of the created object begins when the declaration is executed and ends when execution
of the smallest block containing the declaration is complete.
Evaluation of the expression of a function procedure must yield a value of the given
result type.
The procedure type is obtained from the procedure declaration by deleting `identifier'
and `; expression', and removing `identifier :' from each parameter.
A.3.5 Type Declarations
The identifier represents a new record type defined according to the given specification.
A.4 Expressions
A.4.0.1 expression ::= assignment | disjunction.
A.4.0.2 assignment ::= name ':=' expression.
A.4.0.3 disjunction ::= conjunction | disjunction 'or' conjunction.
A.4.2 Coercions
The context in which a language element (statement, argument, expression, operand, name
as a component of an indexed object, procedure call, etc.) appears may permit a stated
set of types for the result of that element, prescribe a single type, or require that the result
be discarded. When the a priori type of the result does not satisfy the requirements of the
2. The block (or parenthesized expression) is executed. If it is not left by a jump, the
result is coerced to the result type of P (or voided, in the case of a proper procedure).
3. As soon as execution is completed, possibly by a jump, the substitution of step 1 is
reversed (i.e. the original call is restored).
The value yielded by the coercion in step (2) is the result of the procedure call.
A.4.6 Clauses
The expression in a conditional clause must deliver a Boolean result. If this result is true then the first statement list will be executed and its result will be taken as the result of the conditional clause; otherwise the second statement list will be executed and its result will be taken as the result of the conditional clause. The first alternative of a one-sided conditional clause, in which the second alternative is omitted, is voided.
The expression in a case clause must deliver an integer result. When the value of the expression is i and one of the case labels is i, the statement list associated with that case label will be executed and its result will be taken as the result of the case clause; otherwise the statement list following else will be executed and its result will be taken as the result of the case clause. All case labels in a case clause must be distinct.
The component statement lists of a clause must be balanced to ensure that the type of
the result yielded is the same regardless of which alternative was chosen. Balancing involves
coercing the result of each component statement list to a common type. If there is no one
type to which all of the result types are coercible then all the results are voided. When the
type returned by the clause is uniquely prescribed by the context then this type is chosen
as the common result type for all alternatives. If the context of the expression is such that
several result types are possible, the one leading to the smallest total number of coercions is
chosen.
Appendix B
Useful Algorithms For Directed Graphs
The directed graph is a formalism well-suited to the description of syntactic derivations, data structures and control flow. Such descriptions allow us to apply results from graph theory to a variety of compiler components. These results yield standard algorithms for carrying out analyses and transformations, and provide measures of complexity for many common tasks. In this appendix we summarize the terminology and algorithms most important to the remainder of the book.
B.1 Terminology
B.1 Definition
A directed graph is a pair (K, D), where K is a finite, nonempty set and D is a subset of K × K. The elements of K are called the nodes of the graph, and the elements of D are the edges.
Figure B.1a is a directed graph, and Figure B.1b shows how this graph might be represented pictorially.
K = {1, 2, 3, 4}
D = {(1, 2), (1, 3), (4, 4), (2, 3), (3, 2), (3, 4)}
a) The components of the graph
[pictorial representation of the four nodes and six edges omitted]
b) Pictorial representation
[Figure B.1c, the condensation graph with nodes K[1], K[2], K[3], is referred to later in this section]
In many cases, a label function, f, is defined on the nodes and/or edges of a graph. Such a function associates a label, which is an element of a finite, nonempty set, with each node or edge. We then speak of a graph with node or edge labels. The labels serve as identification of the nodes or edges, or indicate their interpretation. This is illustrated in Figure B.1b, where a function has been provided to map K into the set {1, 2, 3, 4}.
B.2 Definition
A sequence (k0, ..., kn) of nodes in a directed graph (K, D), n ≥ 1, is called a path of length n if (k_{i-1}, k_i) ∈ D, i = 1, ..., n. A path is called a cycle if k0 = kn.
An edge may appear more than once in a path: In the graph of Figure B.1, the sequence of edges (2,3), (3,2), (2,3), (3,4), (4,4), (4,4) defines the path (2,3,2,3,4,4,4) of length 6.
B.3 Definition
Let (K, D) be a directed graph. Partition K into equivalence classes Ki such that nodes u and v belong to the same class if and only if there is a cycle to which u and v belong. Let Di be the subset of edges connecting pairs of nodes in Ki. The directed graphs (Ki, Di) are the strongly connected components of (K, D).
The graph of Figures B.1a and B.1b has three strongly connected components:
K1 = {1}, D1 = {}
K2 = {4}, D2 = {(4, 4)}
K3 = {2, 3}, D3 = {(2, 3), (3, 2)}
Often we deal with graphs in which all nodes of a strongly connected component are
identical with respect to some property of interest. When dealing with this property, we can
therefore replace the original graph with a graph having one node for each strongly connected
component.
B.4 Definition
Let P = {K1, ..., Kn} be a partition of the node set of a directed graph (K, D). The reduction of (K, D) with respect to the partition P is the directed graph (K', D') such that K' = {k1, ..., kn} and D' = {(ki, kj) | i ≠ j, and (u, v) is an element of D for some u ∈ Ki and v ∈ Kj}.
We term the subsets Ki of an (arbitrary) partition blocks. The reduction with respect to
strongly connected components is the condensation graph.
The condensation graph of Figure B.1b is shown in Figure B.1c. Since every cycle lies
wholly within a single strongly connected region, the condensation graph has no cycles.
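The following sketch shows one way to compute strongly connected components and the condensation graph. It uses Tarjan's lowlink-based algorithm (the same idea referred to in Exercise B.5 below); the Python notation and the adjacency-list representation succ are our own assumptions, not the module notation used in this appendix.

def strongly_connected_components(succ):
    # succ: dict mapping each node to the list of its successors.
    # Returns the list of strongly connected components (as sets).
    index, lowlink = {}, {}
    stack, on_stack, sccs = [], set(), []
    counter = [0]

    def visit(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:        # v is the root of a component
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in succ:
        if v not in index:
            visit(v)
    return sccs

def condensation(succ, sccs):
    # The reduction with respect to the partition into strongly
    # connected components (Definitions B.3 and B.4).
    block = {v: i for i, scc in enumerate(sccs) for v in scc}
    return {(block[u], block[v])
            for u in succ for v in succ[u] if block[u] != block[v]}

# The graph of Figure B.1 yields the components {1}, {4} and {2, 3}:
succ = {1: [2, 3], 2: [3], 3: [2, 4], 4: [4]}
print(strongly_connected_components(succ))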
B.5 Definition
A directed acyclic graph is a directed graph that contains no cycles.
B.6 Definition
A directed acyclic graph is called a tree with root k0 if for every node k ≠ k0 there exists exactly one path (k0, ..., k).
These two special classes of graphs are illustrated in Figure B.2.
If a tree has an edge (k, k'), we say that k' is a child of k and k is the parent of k'. Note that Definition B.6 permits a node to have any number of children. Because the path from the root is unique, however, every node k ≠ k0 has exactly one parent. The root, k0, is the only node with no parent. A tree has at least one leaf, which is a node with no children. If
[pictorial representations omitted: a directed acyclic graph on nodes 1-9, and a tree with root 0 and nodes 1-8]
a) A directed acyclic graph
b) A tree
Figure B.2: Special Cases of Directed Graphs
there is a path in a tree from node k to node k', we say that k' is a descendant of k and k is an ancestor of k'.
B.7 Definition
A tree is termed ordered if, for every node, a linear order is defined on the children of that node.
If we list the children of a node k0 in an ordered tree, we shall always do so in the sense of the ordering; we can therefore take the enumeration as the ordering. The first child of k0 is also called the left child; the child node that follows k in the order of successors of k0 is called the right sibling of k. In Figure B.2b, for example, we might order the children of a node according to the magnitude of their labels. Thus 1 would be the left child of 0, 2 would be the right sibling of 1, and 3 the right sibling of 2. 3 has no right siblings and there is no relationship between 6 and 7.
In an ordered tree, the paths leaving the root can be ordered lexicographically: Consider two paths x = (x0, ..., xm) and y = (y0, ..., yn) with m ≤ n and x0 = y0 being the root. Because both paths begin at the root, there exists some i ≥ 0 such that xj = yj, j = 0, ..., i. We say that x < y either if i = m and i < n, or if x_{i+1} < y_{i+1} according to the ordering of the children of xi (= yi). Since there is exactly one path from the root to any node in the tree, this lexicographic ordering of the paths specifies a linear ordering of all nodes of the tree.
B.8 Definition
A cut in a tree (K, D) is a subset, C, of K such that for each leaf km of (K, D) exactly one element of C lies on the path (k0, ..., km) from the root k0 to that leaf.
Examples of cuts in Figure B.2b are {0}, {1, 2, 3}, {1, 2, 7, 8} and {4, 5, 6, 7, 8}.
In an ordered tree, the nodes of a cut are linearly-ordered on the basis of the ordering of
all nodes. When we describe a cut in an ordered tree, we shall always write the nodes of that
cut in the sense of this order.
B.9 Definition
A spanning forest for a directed graph (K, D) is a set of trees {(K1, D1), ..., (Kn, Dn)} such that the Ki's partition K and each Di is a (possibly empty) subset of D.
All of the nodes of a directed graph can be visited by traversing the trees of some spanning forest. The spanning forest used for such a traversal is often the one corresponding to a depth-first search:
procedure depth_first_search (k : node);
begin
  mark k as having been visited;
  for each successor k' of k do
    if k' has not yet been visited then depth_first_search (k')
end; (* depth_first_search *)
To construct a spanning forest, this procedure is applied to an arbitrary unvisited node and
repeated so long as such nodes exist.
A depth-first search can also be used to number the nodes in the graph:
B.10 Definition
A depth-first numbering is a permutation (k1, ..., kn) of the nodes of a directed graph (K, D) such that k1 is the first node visited by a particular depth-first search, k2 the second and so forth.
Once a spanning forest {(K1, D1), ..., (Kn, Dn)} has been defined for a graph (K, D), the set D can be partitioned into four subsets:
Tree edges, elements of D1 ∪ ... ∪ Dn.
Forward edges, (kp, kq) such that kp is an ancestor of kq in some tree Ki, but (kp, kq) is not an element of Di.
Back edges, (kq, kp) such that either kp is an ancestor of kq in some tree Ki or p = q.
Cross edges, (kp, kq) such that kp is neither an ancestor nor a descendant of kq in any tree Ki.
These definitions are illustrated by Figure B.3.
Figure B.3b shows a spanning forest and depth-first numbering for the graph of Figure B.3a. The forest has two trees, whose roots are nodes 1 and 7 respectively. All edges appearing in Figure B.3b are tree edges. In Figure B.3a (using the numbers of Figure B.3b), (1,3) is a forward edge, (4,2) and (7,7) are back edges, and (5,4), (7,3) and (7,6) are cross edges.
[pictorial representations omitted]
a) A directed graph
b) A depth-first numbering and spanning forest for (a)
Figure B.3: Depth-First Numbering
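The depth-first numbering and the four-way edge classification can be sketched as follows (Python, with our own adjacency-list representation; the classification rules follow the four cases above):

def classify_edges(succ):
    # Returns (dfn, kind): dfn is a depth-first numbering of the nodes,
    # kind maps each edge to 'tree', 'forward', 'back' or 'cross'.
    dfn, kind = {}, {}
    active = set()          # nodes whose depth-first search is still open
    counter = [1]

    def visit(v):
        dfn[v] = counter[0]
        counter[0] += 1
        active.add(v)
        for w in succ[v]:
            if w not in dfn:
                kind[(v, w)] = 'tree'
                visit(w)
            elif w in active:
                kind[(v, w)] = 'back'       # ancestor, or p = q (self-loop)
            elif dfn[w] > dfn[v]:
                kind[(v, w)] = 'forward'    # w is a descendant of v
            else:
                kind[(v, w)] = 'cross'
        active.discard(v)

    for v in succ:              # one call per tree of the spanning forest
        if v not in dfn:
            visit(v)
    return dfn, kind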
A directed graph that is a tree can, of course, be represented by the abstract data type of Figure B.4. In this case, however, a simpler representation (Figure B.5) could also be used. This simplification is based upon the fact that any node in a tree can have at most one parent. Note that the edges do not appear explicitly, but are implicit in the node linkage. The abstract data structure is set up by instantiating the module with the proper number of nodes and then invoking define_edge once for each edge to specify the nodes at its head and tail. If it is desired that the order of the sibling list reflect a total ordering defined on the children of a node, then the sequence of calls on define_edge should be the opposite of this order.
A partition is defined by a collection of blocks (sets of nodes) and a membership relation node ∈ block. The representation of the partition must be carefully chosen so that operations upon it may be carried out in constant time. Figure B.6 defines such a representation.
When a partition module is instantiated, its block set is empty. Blocks may be created by invoking new_block, which returns the index of the new block. This block has no members initially. The procedure add_node is used to make a given node a member of a given block. Since each node can be a member of only one block, this procedure must delete the given node from the block of which it was previously a member (if such exists).
The status of a partition can be determined by invoking number_of_blocks, block_containing, node_count, first_node and next_node. If a node does not belong to any block, then block_containing returns 0; otherwise it returns the number of the block of which the node is a member. Application of the function node_count to a block yields the number of nodes in that block. The procedures first_node and next_node work together to access all of the members of a block: A call of first_node returns the first member of a specific block. (If the block is empty then first_node returns 0.) Each subsequent invocation of next_node returns the next member of that block. When all members have been accessed, next_node returns 0.
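Figure B.6 itself is not reproduced here; the sketch below implements the same interface in Python, trading the constant-time doubly-linked representation for plain lists (so add_node is not constant-time in this sketch). The operation names follow the text; everything else is our assumption.

class Partition:
    def __init__(self, n):
        self.block_of = [0] * (n + 1)   # 0 means 'member of no block'
        self.members = []               # members[b-1] lists block b's nodes

    def new_block(self):
        self.members.append([])
        return len(self.members)

    def add_node(self, node, block):
        old = self.block_of[node]
        if old != 0:                    # a node belongs to one block only
            self.members[old - 1].remove(node)
        self.block_of[node] = block
        self.members[block - 1].append(node)

    def number_of_blocks(self):
        return len(self.members)

    def block_containing(self, node):
        return self.block_of[node]

    def node_count(self, block):
        return len(self.members[block - 1])

    def nodes(self, block):
        # stands in for the first_node/next_node pair of the text
        return list(self.members[block - 1])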
[pictorial representation omitted]
Solid lines represent tree edges. Dashed lines represent actual links maintained by the tree module.
a) Pictorial representation
module tree (n : integer);
(* Representation of a tree
   n = Number of nodes in the tree *)
var
  node : array [1 .. n] of record parent, child, sibling : integer end;
  i : integer;
public procedure define_edge (hd, tl : integer);
begin (* define_edge *)
  with node[tl] do
    begin parent := hd; sibling := node[hd].child end;
  node[hd].child := tl;
end; (* define_edge *)
public function parent (n : integer) : integer;
begin parent := node[n].parent end;
public function child (n : integer) : integer;
begin child := node[n].child end;
public function sibling (n : integer) : integer;
begin sibling := node[n].sibling end;
begin (* tree *)
  for i := 1 to n do
    with node[i] do parent := child := sibling := 0;
end; (* tree *)
b) Abstract data type
Figure B.5: Simplification for a Tree
The membership relation is embodied in a doubly-linked list. Each node specifies the block of which it is a member, and each block specifies the number of members. Figure B.6 uses a single array to store both node and block information. This representation greatly simplifies the treatment of the doubly-linked list, since the last and next fields have identical meanings for node and block entries. The member field specifies the number of members in a block entry, but the block of which the node is a member in a node entry. For our problems, the number of partitions can never exceed the number of nodes. Hence the array is allocated with twice as many elements as there are nodes in the graph being manipulated. (Element 0 is included to avoid zero tests when accessing the next element in a node list.) The first half of the array is indexed by the node numbers; the second half is used to specify the blocks of the partition. Note that the user is not aware of this offset in block indices because all necessary translation is provided by the interface procedures.
B.3.2 Refinement
Consider a graph (K, D) and a partition P = {P1, ..., Pm} of K with m ≥ 2. We wish to find the partition R = {R1, ..., Rr} with smallest r such that:
1. Each Rk is a subset of some Pj (`R is a refinement of P').
2. If a and b are elements of Rk then, for each (a, x) ∈ D and (b, y) ∈ D, x and y are elements of some one Rm (`R is compatible with D').
The state minimization problem discussed in Section 6.2.2 and the determination of struc-
tural equivalence of types from Section 9.1.2 can both be cast in this form.
The obvious strategy for making a refinement is to check the successors of all nodes in a single element of the current partition. This element must be split if two nodes have successors in different elements of the partition. To obtain the refinement, split the element so that these two nodes lie in different elements. The refined partition is guaranteed to satisfy condition (1). The process terminates when no element must be split. Since a partition in which each element contains exactly one node must satisfy condition (2), the process of successive refinement must eventually terminate. It can be shown that this algorithm is quadratic in the number of nodes.
By checking predecessors rather than successors of the nodes in an element, it is possible to reduce the asymptotic behavior of the algorithm to O(n log n), where n is the number of nodes. This reduction is achieved at the cost of a more complex algorithm, however, and may not be worthwhile for small problems. In the remainder of this section we shall discuss the O(n log n) algorithm, leaving the simpler approach to the reader (Exercise B.6).
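Before turning to Figure B.9, here is a sketch of the simpler, successor-based strategy just described (essentially Exercise B.6). It assumes a single-valued successor function, as in the discussion following Figure B.9; the Python representation is our own:

def refine_simple(blocks, f):
    # blocks: initial partition, a list of sets of nodes.
    # f: dict giving each node's (single) successor.
    # Splits blocks until the partition is compatible with f.
    changed = True
    while changed:
        changed = False
        block_of = {v: i for i, b in enumerate(blocks) for v in b}
        refined = []
        for b in blocks:
            groups = {}                 # members of b, keyed by the
            for v in b:                 # block of their successor
                groups.setdefault(block_of[f[v]], set()).add(v)
            refined.extend(groups.values())
            changed = changed or len(groups) > 1
        blocks = refined
    return blocks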
procedure refine (p : partition; f : graph);
(* Make p be the coarsest partition compatible with p and f *)
var
  pending : fixed_depth_stack (f.n);
  i : integer;
procedure split (block : integer);
var
  inv : inverse (f, block, p); (* Construct the inverse of block *)
  b, k, n : integer;
begin (* split *)
  k := inv.next_block;
  while k ≠ 0 do
    begin (* Pk ∩ f⁻¹(block) ≠ ∅, but Pk is not a subset of f⁻¹(block) *)
      b := p.new_block;
      while (n := inv.common_node) ≠ 0 do p.add_node (n, b);
      if pending.member (k) or
         (p.element_count (k) < p.element_count (b))
      then pending.push (b)
      else pending.push (k);
      k := inv.next_block;
    end
end; (* split *)
begin (* refine *)
  for i := 1 to p.block_count do pending.push (i);
  repeat pending.pop (i); split (i) until pending.empty
end; (* refine *)
Figure B.9: Refinement Algorithm
The refinement procedure of Figure B.9 accepts a graph G = (K, D) and a partition {P1, ..., Pm} of K with m ≥ 2. The elements of D correspond to a mapping f : K → K for which (k, k') is an element of D if f(k) = k'. Refine inspects the inverse mappings f⁻¹(Pj). A set Pk must be split into two subsets if and only if Pk ∩ f⁻¹(Pj) is nonempty for some j, and yet Pk is not a subset of f⁻¹(Pj). The two subsets are then Pk' = (Pk ∩ f⁻¹(Pj)) and Pk'' = Pk − Pk'. This split must be carried out once for every Pj. If Pj contributes to the splitting of Pk and is itself split later, both subsets must again be used to split other partitions.
The first step in each execution of the split procedure is to construct the inverse of block Pj. Next the blocks Pk for which Pk ∩ f⁻¹(Pj) is nonempty but Pk is not a subset of f⁻¹(Pj) are split, and the smaller of the two components is returned to the stack of blocks yet to be considered.
Figure B.10 defines an abstract data type that can be used to represent f⁻¹(Pj). When inverse is instantiated, it represents an empty set. Nodes are added to the set by invoking inv_node. After all nodes belonging to inverse(j) have been added to the set, we wish to consider exactly those blocks that contain elements of inverse(j) but are not themselves subsets of inverse(j). The module allows us to obtain a block satisfying these constraints by invoking next_block. (If next_block returns 0, no more such blocks exist.) Once a block has been obtained, successive invocations of common_node yield the elements common to that block and inverse(j). Note that each of the operations provided by the abstract data type requires constant time.
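Figure B.10 is not reproduced here, but its behavior can be sketched as follows (in Python, building on the Partition sketch given earlier; pred, giving the predecessors of each node under f, is an assumption, and this version builds the whole set eagerly rather than via inv_node):

class Inverse:
    # Represents f^-1(block): its elements, grouped by the blocks of p
    # that intersect it without being contained in it.
    def __init__(self, pred, block_nodes, p):
        members = {w for v in block_nodes for w in pred[v]}
        groups = {}
        for w in members:
            groups.setdefault(p.block_containing(w), set()).add(w)
        self.pending = [(k, g) for k, g in groups.items()
                        if len(g) < p.node_count(k)]
        self.current = iter(())

    def next_block(self):
        if not self.pending:
            return 0
        k, g = self.pending.pop()
        self.current = iter(g)
        return k

    def common_node(self):
        return next(self.current, 0)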
B.3.3 Coloring
The problem of minimizing the number of rows in a parse table can be cast as a problem in graph theory as follows: Let each row correspond to a node. Two nodes k and k' are adjacent (connected by edges (k, k') and (k', k)) if the corresponding rows are incompatible and therefore cannot be combined. We seek a partition of the graph such that no two adjacent nodes belong to the same block of the partition. The rows corresponding to the nodes in a single block of the partition then have no incompatibilities, and can be merged. Clearly we would like to find such a partition having the smallest number of blocks, since this will result in maximum compression of the table.
This problem is known in graph theory as the coloring problem, and the minimum number of partitions is the chromatic number of the graph. It has been shown that the coloring problem is NP-complete, and hence we seek algorithms that efficiently approximate the optimum partition.
Most approximation algorithms are derived from backtracking algorithms that decide whether a given number of colors is sufficient for the specified graph. If such an algorithm is given a number of colors equal to the number of nodes in the graph then it will never need to backtrack, and hence all of the mechanism for backtracking can be removed. A good backtracking algorithm contains heuristics designed to prune large portions of the search tree, which, in this case, implies using as few colors as possible for trial colorings. But it is just these heuristics that lead to good approximations when there is no backtracking!
A general approach is to make the most constrained decisions rst. This can be done
by sorting the nodes in order of decreasing incident edge count. The rst node colored has
the maximum number of adjacent nodes and hence rules out the use of its color for as many
nodes as possible. We then choose the node with the most restrictive constraint on its color
next, resolving ties by taking the one with most adjacent nodes. At each step we color the
chosen node with the lowest possible color.
Figure B.11 gives the complete coloring algorithm. We assume that g contains no cycles of length 1. (A graph with cycles of length 1 cannot be colored because some node is adjacent to itself and thus, by definition, must have a different color than itself.) First we partition the nodes according to number of adjacencies, coloring any isolated nodes immediately. Because of our assumptions, block g.n of sort must be empty. The coloring loop then scans the nodes in order of decreasing adjacency count, seeking the most restrictive choice of colors. This node is then assigned the lowest available color, and that color is made unavailable to all of the node's neighbors. Note that we mark a node as having been colored by moving it to block g.n of the sort partition.
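A sketch of this greedy strategy in Python (the adjacency-set representation and names are ours; Figure B.11's partition-based bookkeeping is replaced by straightforward searching, so this version trades the efficiency of the original for brevity):

def color_graph(adj):
    # adj: dict mapping each node to the set of its neighbors
    # (no node may be adjacent to itself). Returns node -> color (1, 2, ...).
    color = {}
    uncolored = set(adj)
    while uncolored:
        def constraint(v):
            used = {color[w] for w in adj[v] if w in color}
            # most distinct neighbor colors first; break ties by degree
            return (len(used), len(adj[v]))
        v = max(uncolored, key=constraint)
        used = {color[w] for w in adj[v] if w in color}
        c = 1
        while c in used:            # lowest available color
            c += 1
        color[v] = c
        uncolored.remove(v)
    return color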
Exercises
B.1 The graph module of Figure B.4 is unpleasant when the number of edges is not known
at the time the module is instantiated: If e is not made large enough then the program
will fail, and if it is made too large then space will be wasted.
(a) Change the module definition so that the array edge is not present. Instead, each edge should be represented by a record allocated dynamically by define_edge.
(b) What is the lifetime of the edge storage in (a)? How can it be recovered?
B.2 Modify the module of Figure B.5 to save space by omitting the parent field of each node. Provide access to the parent via the sibling pointer of the last child. What additional information is required? If the two versions of the module were implemented on a machine with which you are familiar, would there be any difference in the actual storage requirements for a node? Explain.
B.3 Consider the partition module of Figure B.6.
(a) Show that if array p is defined with lower bound 1, execution of add_node may abort due to an illegal array reference. How can this error be avoided if the lower bound is made 1? Why is initialization of p[0] unnecessary?
(b) What changes would be required if we wished to remove a node from all blocks by using add_node to add it to a fictitious block 0?
(c) Under what circumstances would the use of first_node and next_node to scan a block of the partition be unsatisfactory? How could this problem be overcome?
B.4 Explain why the elements of stack are initialized to 0 in Figure B.7 and why the pop
operation resets the element to 0. Could top be set to 0 initially also?
B.5 Consider the application of strongly connected components to the graph of Figure B.3a. Assume that the indexes of the nodes in the graph were assigned `by column': The leftmost node has number 1, the next three have numbers 2-4 (from the top) and the rightmost three have numbers 5-7. Also assume that the lists of edges leaving a node are ordered clockwise from the 12 o'clock position.
(a) Show that the nodes will be visited in the order given by Figure B.3b.
(b) Give a sequence of snapshots showing the procedure activations and the changes
in lowlink .
(c) Show that the algorithm partitions the graph correctly.
B.6 Consider the refinement problem of Section B.3.2.
(a) Implement a Boolean procedure split(block) that will refine block according to the successors of its nodes: If all of the successors of nodes in block lie in the same block of p, then split(block) returns false and p is unchanged. Otherwise, suppose that the successors of nodes in block lie in n distinct blocks, n > 1. Add n − 1 blocks to p and distribute the nodes of block among block and these new blocks on the basis of their successor blocks. Split(block) returns true in this case.
(b) Implement refine as a loop that cycles through the blocks of p, applying split to each. Repeat the loop so long as any one of the applications of split yields true. (Note that for each repetition of the loop, the number of blocks in p will increase by at least one.)
B.7 Consider the problem of structural equivalence of types discussed in Section 9.1.2. We can solve this problem as follows:
(a) Define a graph, each of whose nodes represents a single type. There is an edge from node k1 to node k2 if type k1 `depends upon' type k2. (One type `depends upon' another if its definition uses that type. For example, if k1 is declared to be of type ref k2 then k1 `depends upon' k2.)
(b) Define a partition that groups all of the `similarly defined' types. (Two types are `similarly defined' if their type definitions have the same structure, ignoring any type specifications appearing in them. For example, ref k1 and ref k2 are `similarly defined'.)
(c) Apply the refinement algorithm of Section B.3.2. Assume that array types are `similarly defined' if they have the same dimensions, and record types are `similarly defined' if they have the same field identifiers in the same order. Apply the procedure outlined above to the structural equivalence problem of Exercise 2.2.
B.8 Consider the problem of state minimization discussed in Section 6.2.2. The state diagram is a directed graph with node and edge labels. It defines a function f(i, s), where i is an input symbol selected from the set of edge labels and s is a state selected from the set of node labels.
(a) Assume that the state diagram has been completed by adding an error state, so that there is an edge for every input symbol leaving every node. Define a three-block partition on the graph, with the error state in one block, all final states in the second and all other states in the third. Consider the edges of the state diagram to define a set of functions, fi, one per input symbol. Show that the states of the minimum automaton correspond to the nodes of the reduction of the state diagram with respect to the refinement of the three-block partition compatible with all fi.
(b) Show that Definition B.1 permits only a single edge directed from one specific node to another. Is this limitation enforced by Figure B.4? If so, modify Figure B.4 to remove it.
(c) Modify Figure B.4 to allow attachment of integer edge labels.
(d) Modify Figure B.9 to carry out the refinement of a graph with edge labels, treating each edge label as a distinct function.
(e) Modify the result of (d) to make completion of the state diagram unnecessary:
When a particular edge label is missing, assume that its destination is the error
state.
References
We have repeatedly stressed the need to derive information about a language from the definition of that language rather than from particular implementation manuals or textbooks describing the language. In this book, we have used the languages listed below as sources of examples. For each language we give a reference that we consider to be the `language definition'. Any statement that we make regarding the language is based upon the cited reference, and does not necessarily hold for particular implementations or descriptions of the language found elsewhere in the literature.
Ada The definition of Ada was still under discussion when this book went to press. We have based our examples upon the version described by Ichbiah [1980].
ALGOL 60 Naur [1963].
ALGOL 68 van Wijngaarden et al. [1975].
BASIC Almost every equipment manufacturer provides a version of this language, and the strongest similarity among them is the name. We have followed the standard for `minimal BASIC' ANSI [1978b].
COBOL ANSI [1968]
Euclid Lampson et al. [1977]
FORTRAN We have drawn examples from both the 1966 ANSI [1966] and 1978 ANSI [1978a] standards. When we refer simply to `FORTRAN', we assume the 1978 standard. If we are pointing out differences, or if the particular version is quite important, then we use `FORTRAN 66' and `FORTRAN 77' respectively. (Note that the version described by the 1978 standard is named `FORTRAN 77', due to an unforeseen delay in publication of the standard.)
LIS Rissen et al. [1974].
LISP The examples for which we use LISP depend upon its applicative nature, and hence we
rely upon the original description McCarthy [1960] rather than more modern versions.
MODULA-2 Wirth [1980].
Pascal Pascal was in the process of being standardized when this book went to press. We
have relied for most of our examples on the User Manual and Report Jensen and
Wirth [1974] but we have also drawn upon the draft standard Addyman [1980]. The
examples from the latter have been explicitly noted as such.
SIMULA Nygaard et al. [1970].
SNOBOL-4 Griswold et al. [1971].
ACM [1961]. ACM compiler symposium 1960. Communications of the ACM, 4(1):3-84.
Addyman, A.M. [1980]. A draft proposal of Pascal. ACM SIGPLAN Notices, 15(4):1-66.
Aho, Alfred V. and Corasick, M. J. [1975]. Efficient string matching: An aid to bibliographic search. Communications of the ACM, 18(6):333-340.
Aho, Alfred V., Hopcroft, J. E., and Ullman, Jeffrey D. [1974]. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA.
Aho, Alfred V. and Johnson, Stephen C. [1976]. Optimal code generation for expression trees. Journal of the ACM, 23(3):488-501.
Aho, Alfred V., Johnson, Stephen C., and Ullman, Jeffrey D. [1977]. Code generation for machines with multiregister operations. Journal of the ACM, pages 21-28.
Aho, Alfred V. and Ullman, Jeffrey D. [1972]. The Theory of Parsing, Translation, and Compiling. Prentice-Hall, Englewood Cliffs.
Aho, Alfred V. and Ullman, Jeffrey D. [1977]. Principles of Compiler Design. Addison-Wesley, Reading, MA.
Allen, F. E., Cocke, J., and Kennedy, K. [1981]. Reduction of operator strength. In [Muchnick and Jones, 1981], pages 79-101. Prentice-Hall, Englewood Cliffs.
Ammann, U. [1974]. The method of structured programming applied to the development of a compiler. In Proceedings of the International Computing Symposium 1973, pages 94-99. North-Holland, Amsterdam, NL.
Ammann, U. [1975]. Die Entwicklung eines Pascal-Compilers nach der Methode des Strukturierten Programmierens. Ph.D. thesis, Eidgenössische Technische Hochschule Zürich.
Ammann, U. [1977]. On code generation in a Pascal compiler. Software-Practice and Experience, 7:391-423.
Anderson, T., Eve, J., and Horning, J. J. [1973]. Efficient LR(1) parsers. Acta Informatica, 2:12-39.
ANSI [1966]. FORTRAN. American National Standards Institute, New York. X3.9-1966.
ANSI [1968]. COBOL. American National Standards Institute, New York. X3.23-1968.
ANSI [1978a]. FORTRAN. American National Standards Institute, New York. X3.9-1978.
ANSI [1978b]. Minimal BASIC. American National Standards Institute, New York. X3.9-1978.
Asbrock, B. [1979]. Attribut-Implementierung und -Optimierung für attributierte Grammatiken. Master's thesis, Fakultät für Informatik, Universität Karlsruhe, FRG.
Baker, T. P. [1982]. A one-pass algorithm for overload resolution in Ada. ACM Transactions on Programming Languages and Systems, 4(4):601-614.
Balzer, R. M. [1969]. EXDAMS - extendable debugging and monitoring system. In Spring Joint Computer Conference, volume 34 of AFIPS Conference Proceedings, pages 567-580. AFIPS Press, Montvale, NJ.
Banatre, J. P., Routeau, J. P., and Trilling, L. [1979]. An event-driven compiling technique. Communications of the ACM, 22(1):34-42.
Barron, D. W. and Hartley, D. F. [1963]. Techniques for program error diagnosis on EDSAC2. Computer Journal, 6:44-49.
Cercone, N., Kraus, M., and Boates, J. [1982]. Lexicon design using perfect hash functions. SIGSOC Bulletin, 13(2):69-78.
Chaitin, Gregory J. [1982]. Register allocation & spilling via coloring. ACM SIGPLAN Notices, 17(6):98-105.
Chaitin, Gregory J., Cocke, John, Chandra, A. K., Auslander, Marc A., Hopkins, Martin E., and Markstein, Peter W. [1981]. Register allocation via coloring. Computer Languages, 6:47-57.
Chomsky, N. [1956]. Three models for the description of language. IRE Transactions on Information Theory, IT-2:113-124.
Cichelli, R. J. [1980]. Minimal perfect hash functions made simple. Communications of the ACM, 23(1):17-19.
Clark, D. W. and Green, C. C. [1977]. An empirical study of list structure in LISP. Communications of the ACM, 20(2):78-87.
Cocke, John and Markstein, Peter W. [1980]. Measurement of code improvement algorithms. In Lavington, S. H., editor, Information Processing 80, pages 221-228. North-Holland, Amsterdam, NL.
Cody, William J. and Waite, William M. [1980]. Software Manual for the Elementary Functions. Prentice-Hall, Englewood Cliffs.
Constantine, L. L., Stevens, W. P., and Myers, G. J. [1974]. Structured design. IBM Systems Journal, 2:115-139.
Conway, R. and Wilcox, T. R. [1973]. Design and implementation of a diagnostic compiler for PL/1. Communications of the ACM, 16(3):169-179.
Dakin, R. J. and Poole, Peter Cyril [1973]. A mixed code approach. Computer Journal, 16(3):219-222.
Damerau, F. [1964]. A technique for computer detection and correction of spelling errors. Communications of the ACM, 7(3):171-176.
Davidson, J. W. and Fraser, C. W. [1980]. The design and application of a retargetable peephole optimizer. ACM Transactions on Programming Languages and Systems, 2(2):191-202.
Day, W. H. E. [1970]. Compiler assignment of data items to registers. IBM Systems Journal, 9(4):281-317.
Dencker, Peter [1977]. Ein neues LALR-System. Master's thesis, Fakultät für Informatik, Universität Karlsruhe, FRG.
DeRemer, F. L. [1969]. Practical translators for LR(k) languages. Technical report, MIT, Cambridge, MA. MAC-TR-65.
DeRemer, F. L. [1971]. Simple LR(k) grammars. Communications of the ACM, 14(7):453-460.
DeRemer, F. L. [1974]. Lexical Analysis, pages 109-120. Springer Verlag, Heidelberg, New York.
Holt, R. C., Barnard, David T., Cordy, James R., and Wortman, David B. [1977]. SP/k: a system for teaching computer programming. Communications of the ACM, 20(5):301-309.
Horning, J. J., Lalonde, W. R., and Lee, E. S. [1972]. An LALR(k) parser generator. In Freiman, C. V., editor, Information Processing 71, pages 513-518. North-Holland, Amsterdam, NL.
Housden, R.J.W. [1975]. On string concepts and their implementation. Computer Journal, 18(2):150-156.
Hunt, H. B. I., Szymanski, Thomas G., and Ullman, Jeffrey D. [1975]. On the complexity of LR(k) testing. In Conference Proceedings of the Second ACM Symposium on Principles of Programming Languages, pages 137-148. ACM.
IBM [1968]. IBM System/360 operating system FORTRAN IV (H) compiler program logic manual. Technical Report Y28-6642-3, IBM Corporation.
ICC [1962]. Symbolic Languages in Data Processing. Gordon and Breach, New York.
Ichbiah, J. D. [1980]. Ada Reference Manual, volume 106 of Lecture Notes in Computer Science. Springer Verlag, Heidelberg, New York.
Irons, E. T. [1961]. A syntax-directed compiler for ALGOL 60. Communications of the ACM, 4(1):51-55.
Irons, E. T. [1963a]. An error correcting parse algorithm. Communications of the ACM, 6(11):669-673.
Irons, E. T. [1963b]. Towards more versatile mechanical translators. In Experimental Arithmetic, High Speed Computing and Mathematics, volume 15 of Proceedings of Symposia in Applied Mathematics, pages 41-50. American Mathematical Society, Providence, RI.
Jansohn, Hans-Stephan, Landwehr, Rudolph, and Goos, Gerhard [1982]. Experience with an automatic code generator generator. ACM SIGPLAN Notices, 17(6):56-66.
Jazayeri, M. [1981]. A simpler construction showing the intrinsically exponential complexity of the circularity problem for attribute grammars. Journal of the ACM, 28(4):715-720.
Jazayeri, M., Ogden, W. F., and Rounds, W. C. [1975]. On the complexity of the circularity test for attribute grammars. In Conference Record of the Second Principles of Programming Languages, pages 119-129. ACM.
Jazayeri, M. and Pozefsky, D. P. [1977]. Algorithms for efficient evaluation of multi-pass attribute grammars without a parse tree. Technical Report TP77-001, Department of Computer Science, University of North Carolina, Chapel Hill, NC.
Jazayeri, M. and Walter, K. G. [1975]. Alternating semantic evaluator. In Proceedings of the ACM National Conference, pages 230-234. ACM.
Jensen, Kathleen and Wirth, Niklaus [1974]. Pascal User Manual and Report, volume 18 of Lecture Notes in Computer Science. Springer Verlag, Heidelberg, New York.
Johnson, D. S. [1974]. Worst case behavior of graph coloring algorithms. In Proceedings of the Fifth Southeastern Conference on Combinatorics, Graph Theory and Computing, pages 513-523. Utilitas Mathematica Publishing, Winnipeg, Canada.
Johnson, W. L., Porter, J. H., Ackley, S. I., and Ross, Douglas T. [1968]. Automatic generation of efficient lexical processors using finite state techniques. Communications of the ACM, 11(12):805-813.
Johnston, J. B. [1971]. Contour model of block structured processes. ACM SIGPLAN Notices, 6(2):55-82.
Joliat, M. L. [1973]. On the Reduced Matrix Representation of LR(k) Parser Tables. Ph.D. thesis, University of Toronto.
Joliat, M. L. [1974]. Practical minimization of LR(k) parser tables. In [Rosenfeld, 1974], pages 376-380. North-Holland, Amsterdam, NL.
Jones, C. B. and Lucas, P. [1971]. Proving correctness of implementation techniques. In Engeler, E., editor, Symposium on Semantics of Algorithmic Languages, volume 188 of Lecture Notes in Mathematics, pages 178-211. Springer Verlag, Heidelberg, New York.
Karp, R. M. [1972]. Reducibility among combinatorial problems. In Miller and Thatcher [1972], pages 85-104. Plenum Press, New York.
Kastens, Uwe [1976]. Systematische Analyse semantischer Abhängigkeiten. In Programmiersprachen, number 1 in Informatik Fachberichte, pages 19-32. Springer Verlag, Heidelberg, New York.
Kastens, Uwe [1980]. Ordered attribute grammars. Acta Informatica, 13(3):229-256.
Kastens, Uwe, Zimmermann, Erich, and Hutt, B. [1982]. GAG: A practical compiler generator. Volume 141 of Lecture Notes in Computer Science. Springer Verlag, Heidelberg, New York.
Kennedy, K. [1981]. A survey of data flow analysis techniques. In Muchnick, Steven S. and Jones, Neil D., editors, Program Flow Analysis: Theory and Applications, pages 5-54. Prentice-Hall, Englewood Cliffs.
Kennedy, K. and Ramanathan, J. [1979]. A deterministic attribute grammar evaluator based on dynamic sequencing. ACM Transactions on Programming Languages and Systems, 1:142-160.
Kennedy, K. and Warren, S. K. [1976]. Automatic generation of efficient evaluators for attribute grammars. In Conference Record of the Third Principles of Programming Languages, pages 32-49. ACM.
Klint, P. [1979]. Line numbers made cheap. Communications of the ACM, 22(10):557-559.
Knuth, D. E. [1962]. History of writing compilers. Computers and Automation, 11:8-14.
Knuth, D. E. [1965]. On the translation of languages from left to right. Information and Control, 8(6):607-639.
Knuth, D. E. [1968a]. Fundamental Algorithms, volume 1 of The Art of Computer Programming. Addison-Wesley, Reading, MA.
Knuth, D. E. [1968b]. Semantics of context-free languages. Mathematical Systems Theory, 2(2):127-146. See [Knuth, 1971b].
Nygaard, K., Dahl, O., and Myrhaug, B. [1970]. SIMULA 67 Common Base Language - Publication S-22. Norwegian Computing Center, Oslo.
Pager, D. [1974]. On eliminating unit productions from LR(k) parsers. In Loeckx, J., editor, Automata, Languages and Programming, volume 14 of Lecture Notes in Computer Science, pages 242-254. Springer Verlag, Heidelberg, New York.
Palmer, E. M., Rahimi, M. A., and Robinson, R. W. [1974]. Efficiency of a binary comparison storage technique. Journal of the ACM, 21(3):376-384.
Parnas, D. L. [1972]. On the criteria to be used in decomposing systems into modules. Communications of the ACM, 15(12):1053-1058.
Parnas, D. L. [1976]. On the design and development of program families. IEEE Transactions on Software Engineering, SE-2(1):1-9.
Peck, J. E. L., editor [1971]. ALGOL 68 Implementation. North-Holland, Amsterdam, NL.
Persch, Guido, Winterstein, Georg, Dausmann, Manfred, and Drossopoulou, Sophia [1980]. Overloading in preliminary Ada. ACM SIGPLAN Notices, 15(11):47-56.
Peterson, T. G. [1972]. Syntax Error Detection, Correction and Recovery in Parsers. Ph.D. thesis, Stevens Institute of Technology, Hoboken, NJ.
Pierce, R. H. [1974]. Source language debugging on a small computer. Computer Journal, 17(4):313-317.
Pozefsky, D. P. [1979]. Building Efficient Pass-Oriented Attribute Grammar Evaluators. Ph.D. thesis, University of North Carolina, Chapel Hill, NC.
Quine, W. V. O. [1960]. Word and Object. Wiley, New York.
Räihä, K. [1980]. Bibliography on attribute grammars. ACM SIGPLAN Notices, 15(3):35-44.
Räihä, K. and Saarinen, M. [1977]. An optimization of the alternating semantic evaluator. Information Processing Letters, 6(3):97-100.
Räihä, K., Saarinen, M., Soisalon-Soininen, E., and Tienari, M. [1978]. The compiler writing system HLP (Helsinki Language Processor). Technical Report A-1978-2, Department of Computer Science, University of Helsinki, Finland.
Ramamoorthy, C. V. and Jahanian, P. [1976]. Formalizing the specification of target machines for compiler adaptability enhancement. In Proceedings of the Symposium on Computer Software Engineering, pages 353-366. Polytechnic Institute of New York.
Randell, B. and Russell, L. J. [1964]. ALGOL 60 Implementation. Academic Press, New York.
Richards, M. [1971]. The portability of the BCPL compiler. Software-Practice and Experience, 1:135-146.
Ripken, K. [1977]. Formale Beschreibung von Maschinen, Implementierungen und optimierender Maschinencoderzeugung aus attributierten Programmgraphen. Ph.D. thesis, Technische Universität München.
Rissen, J. P., Heliard, J. C., Ichbiah, J. D., and Cousot, P. [1974]. The system implementation language LIS, reference manual. Technical Report 4549 E/EN, CII Honeywell-Bull, Louveciennes, France.
Robertson, E. L. [1979]. Code generation and storage allocation for machines with span-dependent instructions. ACM Transactions on Programming Languages and Systems, 1(1):71-83.
Röhrich, J. [1978]. Automatic construction of error correcting parsers. Technical Report Interner Bericht 8, Universität Karlsruhe.
Röhrich, J. [1980]. Methods for the automatic construction of error correcting parsers. Acta Informatica, 13(2):115-139.
Rosen, S. [1967]. Programming Systems and Languages. McGraw-Hill.
Rosenfeld, J. L., editor [1974]. Information Processing 74. North-Holland, Amsterdam, NL.
Rosenkrantz, D. J. and Stearns, R. E. [1970]. Properties of deterministic top-down grammars. Information and Control, 17:226-256.
Ross, D. T. [1967]. The AED free storage package. Communications of the ACM, 10(8):481-492.
Rutishauser, H. [1952]. Automatische Rechenplanfertigung bei programmgesteuerten Rechenmaschinen. Mitteilungen aus dem Institut für Angewandte Mathematik der ETH-Zürich, 3.
Sale, Arthur H. J. [1971]. The classification of FORTRAN statements. Computer Journal, 14:10-12.
Sale, Arthur H. J. [1977]. Comments on `report on the programming language Euclid'. ACM SIGPLAN Notices, 12(4):10.
Sale, Arthur H. J. [1979]. A note on scope, one-pass compilers, and Pascal. Pascal News, 15:62-63.
Salomaa, Arto [1973]. Formal Languages. Academic Press, New York.
Samelson, K. and Bauer, Friedrich L. [1960]. Sequential formula translation. Communications of the ACM, 3(2):76-83.
Satterthwaite, E. [1972]. Debugging tools for high level languages. Software-Practice and Experience, 2:197-217.
Scarborough, R. G. and Kolsky, H. G. [1980]. Improved optimization of FORTRAN object programs. IBM Journal of Research and Development, 24(6):660-676.
Schulz, Waldean A. [1976]. Semantic Analysis and Target Language Synthesis in a Translator. Ph.D. thesis, University of Colorado, Boulder, CO.
Seegmüller, G. [1963]. Some remarks on the computer as a source language machine. In Popplewell, C.M., editor, Information processing 1962, pages 524-525. North-Holland, Amsterdam, NL.
Sethi, Ravi and Ullman, Jeffrey D. [1970]. The generation of optimal code for arithmetic expressions. Journal of the ACM, 17(4):715-728.
Steele, G. L. [1977]. Arithmetic shifting considered harmful. ACM SIGPLAN Notices, 12(11):61-69.
Stephens, P. D. [1974]. The IMP language and compiler. Computer Journal, 17:216-223.
Stevenson, D. A. [1981]. Proposed standard for binary floating-point arithmetic. Computer, 14(3):51-62.
Szymanski, T. G. [1978]. Assembling code for machines with span-dependent instructions. Communications of the ACM, 21(4):300-308.
Talmadge, R. B. [1963]. Design of an integrated programming and operating system part II. The assembly program and its language. IBM Systems Journal, 2:162-179.
Tanenbaum, A. S. [1976]. Structured Computer Organization. Prentice-Hall, Englewood Cliffs.
Tanenbaum, A. S. [1978]. Implications of structured programming for machine architecture. Communications of the ACM, 21(3):237-246.
Tanenbaum, Andrew S., van Staveren, H., and Stevenson, J. W. [1982]. Using peephole optimization on intermediate code. ACM Transactions on Programming Languages and Systems, 4(1):21-36.
Tennent, R. D. [1981]. Principles of Programming Languages. Prentice-Hall, Englewood Cliffs.
Tienari, M. [1980]. On the definition of an attribute grammar. In Semantics-Directed Compiler Construction, volume 94 of Lecture Notes in Computer Science, pages 408-414. Springer Verlag, Heidelberg, New York.
Uhl, Jürgen, Drossopoulou, Sophia, Persch, Guido, Goos, Gerhard, Dausmann, Manfred, Winterstein, Georg, and Kirchgässner, Walter [1982]. An Attribute Grammar for the Semantic Analysis of Ada, volume 139 of Lecture Notes in Computer Science. Springer Verlag, Heidelberg, New York.
van Wijngaarden, A., Mailloux, B. J., Lindsey, C. H., Meertens, L. G. L. T., Koster, C. H. A., Sintzoff, M., Peck, J. E. L., and Fisker, R. G. [1975]. Revised report on the algorithmic language ALGOL 68. Acta Informatica, 5:1-236.
Waite, William M. [1973a]. Implementing Software for Non-Numerical Applications. Prentice-Hall, Englewood Cliffs.
Waite, William M. [1973b]. A sampling monitor for applications programs. Software-Practice and Experience, 3(1):75-79.
Waite, William M. [1976]. Semantic analysis. In [Bauer and Eickel, 1976], pages 157-169. Springer Verlag, Heidelberg, New York.
Wegbreit, B. [1972]. A generalised compactifying garbage collector. Computer Journal, 15:204-208.
Wegner, P. [1972]. The Vienna definition language. ACM Computing Surveys, 4(1):5-63.
DAVID A WATT
University of Glasgow, Scotland
and
DERYCK F BROWN
The Robert Gordon University, Scotland
The rights of David A Watt and Deryck F Brown to be identified as authors of this
Work have been asserted by them in accordance with the Copyright, Designs and
Patents Act 1988.
Typeset by 7
Printed and bound in Great Britain by Biddles Ltd, www.biddles.co.uk
Contents
Preface
1 Introduction
1.1 Levels of programming language
1.2 Programming language processors
1.3 Specification of programming languages
1.3.1 Syntax
1.3.2 Contextual constraints
1.3.3 Semantics
1.4 Case study: the programming language Triangle
1.5 Further reading
Exercises
2 Language Processors
2.1 Translators and compilers
2.2 Interpreters
2.3 Real and abstract machines
2.4 Interpretive compilers
2.5 Portable compilers
2.6 Bootstrapping
2.6.1 Bootstrapping a portable compiler
2.6.2 Full bootstrap
2.6.3 Half bootstrap
2.6.4 Bootstrapping to improve efficiency
2.7 Case study: the Triangle language processor
2.8 Further reading
Exercises
3 Compilation
3.1 Phases
3.1.1 Syntactic analysis
3.1.2 Contextual analysis
3.1.3 Code generation
3.2 Passes
4 Syntactic Analysis
4.1 Subphases of syntactic analysis
4.1.1 Tokens
4.2 Grammars revisited
4.2.1 Regular expressions
4.2.2 Extended BNF
4.2.3 Grammar transformations
4.2.4 Starter sets
4.3 Parsing
4.3.1 The bottom-up parsing strategy
4.3.2 The top-down parsing strategy
4.3.3 Recursive-descent parsing
4.3.4 Systematic development of a recursive-descent parser
4.4 Abstract syntax trees
4.4.1 Representation
4.4.2 Construction
4.5 Scanning
4.6 Case study: syntactic analysis in the Triangle compiler
4.6.1 Scanning
4.6.2 Abstract syntax trees
4.6.3 Parsing
4.6.4 Error handling
4.7 Further reading
Exercises
5 Contextual Analysis
5.1 Identification
5.1.1 Monolithic block structure
5.1.2 Flat block structure
5.1.3 Nested block structure
5.1.4 Attributes
5.1.5 Standard environment
5.2 Typechecking
5.3 A contextual analysis algorithm
5.3.1 Decoration
5.3.2 Visitor classes and objects
5.3.3 Contextual analysis as a visitor object
5.4 Case study: contextual analysis in the Triangle compiler
5.4.1 Identification
5.4.2 Type checking
5.4.3 Standard environment
5.5 Further reading
Exercises
6 Run-Time Organization
6.1 Data representation
6.1.1 Primitive types
6.1.2 Records
6.1.3 Disjoint unions
6.1.4 Static arrays
6.1.5 Dynamic arrays
6.1.6 Recursive types
6.2 Expression evaluation
6.3 Static storage allocation
6.4 Stack storage allocation
6.4.1 Accessing local and global variables
6.4.2 Accessing nonlocal variables
6.5 Routines
6.5.1 Routine protocols
6.5.2 Static links
6.5.3 Arguments
6.5.4 Recursion
6.6 Heap storage allocation
6.6.1 Heap management
6.6.2 Explicit storage deallocation
6.6.3 Automatic storage deallocation and garbage collection
6.7 Run-time organization for object-oriented languages
6.8 Case study: the abstract machine TAM
6.9 Further reading
Exercises
7 Code Generation
7.1 Code selection
7.1.1 Code templates
7.1.2 Special-case code templates
7.2 A code generation algorithm
7.2.1 Representation of the object program
7.2.2 Systematic development of a code generator
7.2.3 Control structures
7.3 Constants and variables
7.3.1 Constant and variable declarations
7.3.2 Static storage allocation
7.3.3 Stack storage allocation
8 Interpretation
8.1 Iterative interpretation
8.1.1 Iterative interpretation of machine code
8.1.2 Iterative interpretation of command languages
8.1.3 Iterative interpretation of simple programming languages
8.2 Recursive interpretation
8.3 Case study: the TAM interpreter
8.4 Further reading
Exercises
9 Conclusion
9.1 The programming language life cycle
9.1.1 Design
9.1.2 Specification
9.1.3 Prototypes
9.1.4 Compilers
9.2 Error reporting
9.2.1 Compile-time error reporting
9.2.2 Run-time error reporting
9.3 Efficiency
9.3.1 Compile-time efficiency
9.3.2 Run-time efficiency
9.4 Further reading
Exercises
Projects with the Triangle language processor
Appendices
Answers 7
Answers 8
Answers 9
Bibliography
Preface
and reusability. Secondly, Java itself has experienced a prodigious growth in popularity
since its appearance as recently as 1994, and that for good technical reasons: Java is
simple, consistent, portable, and equipped with an extremely rich class library. Soon we
can expect all computer science students to have at least some familiarity with Java.
Educational software
A Triangle language processor is available for educational use in conjunction with this
textbook. The Triangle language processor consists of: a compiler for Triangle, which
generates code for TAM (Triangle Abstract Machine); an interpreter for TAM; and a
disassembler for TAM. The tools are written entirely in Java, and will run on any
computer equipped with a JVM (Java Virtual Machine). You can download the Triangle
language processor from our Web site:
Readership
This book and its companions are aimed at junior, senior, and graduate students of com-
puter science and information technology, all of whom need some understanding of the
fundamentals of programming languages. The books should also be of interest to profes-
sional software engineers, especially project leaders responsible for language evaluation
and selection, designers and implementors of language processors, and designers of new
languages and extensions to existing languages.
The basic prerequisites for this textbook are courses in programming and data struc-
tures, and a course in programming languages that covers at least basic language con-
cepts and syntax. The reader should be familiar with Java, and preferably at least one
other high-level language, since in studying implementation of programming languages
it is important not to be unduly influenced by the idiosyncrasies of a particular language.
All the algorithms in this textbook are expressed in Java.
The ability to read a programming language specification critically is an essential
skill. A programming language implementor is forced to explore the entire language,
including its darker corners. (The ordinary programmer is wise to avoid these dark
corners!) The reader of this textbook will need a good knowledge of syntax, and ideally
some knowledge of semantics; these topics are briefly reviewed in Chapter 1 for the
benefit of readers who might lack such knowledge. Familiarity with BNF and EBNF
(which are commonly used in language specifications) is essential, because in Chapter 4
we show how to exploit them in syntactic analysis. No knowledge of formal semantics
is assumed.
The reader should be comfortable with some elementary concepts from discrete
mathematics - sets and recursive functions - as these help to sharpen understanding of,
for example, parsing algorithms. Discrete mathematics is essential for a deeper under-
standing of compiler theory; however, only a minimum of compiler theory is presented
in this book.
This book and its companions attempt to cover all the most important aspects of a
large subject. Where necessary, depth has been sacrificed for breadth. Thus the really
serious student will need to follow up with more advanced studies. Each book has an
extensive bibliography, and each chapter closes with pointers to further reading on the
topics covered by the chapter.
Acknowledgments
Most of the methods described in this textbook have long since passed into compiler
folklore, and are almost impossible to attribute to individuals. Instead, we shall mention
people who have particularly influenced us personally.
For providing a stimulating environment in which to think about programming lan-
guage issues, we are grateful to colleagues in the Department of Computing Science at
the University of Glasgow, in particular Malcolm Atkinson, Muffy Calder, Quintin
Cutts, Peter Dickman, Bill Findlay, John Hughes, John Launchbury, Hermano Moura,
John Patterson, Simon Peyton Jones, Fermin Reig, Phil Trinder, and Phil Wadler. We
have also been strongly influenced, in many different ways, by the work of Peter
Buneman, Luca Cardelli, Edsger Dijkstra, Jim Gosling, Susan Graham, Tony Hoare,
Jean Ichbiah, Mehdi Jazayeri, Robin Milner, Peter Mosses, Atsushi Ohori, Bob Tennent,
Jim Welsh, and Niklaus Wirth.
We wish to thank the reviewers for reading and providing valuable comments on an
earlier draft of this book. Numerous cohorts of undergraduate students taking the
Programming Languages 3 module at the University of Glasgow made an involuntary
but essential contribution by class-testing the Triangle language processor, as have three
cohorts of students taking the Compilers module at the Robert Gordon University.
We are particularly grateful to Tony Hoare, editor of the Prentice Hall International
Series in Computer Science, for his encouragement and advice, freely and generously
offered when these books were still at the planning stage. If this book is more than just
another compiler textbook, that is partly due to his suggestion to emphasize the connec-
tions between compilation, interpretation, and semantics.
Glasgow and Aberdeen D.A.W.
July, 1999 D.F.B.
CHAPTER ONE
Introduction
In this introductory chapter we start by reviewing the distinction between low-level and
high-level programming languages. We then see what is meant by a programming lan-
guage processor, and look at examples from different programming systems. We review
the specification of the syntax and semantics of programming languages. Finally, we
look at Triangle, a programming language that will be used as a case study throughout
this book.
Once written, a program could simply be loaded into the machine and run.
Clearly, machine-code programs are extremely difficult to read, write, and edit. The
programmer must keep track of the exact address of each item of data and each instruc-
tion in storage, and must encode every single instruction as a bit string. For small pro-
grams (consisting of thousands of instructions) this task is onerous; for larger programs
the task is practically infeasible.
Programmers soon began to invent symbolic notations to make programs easier to
read, write, and edit. The above instructions might be written, respectively, as follows:
LOAD x
ADD R1 R2
JUMPZ h
where LOAD, ADD, and JUMPZ are symbolic names for operations, R1 and R2 are sym-
bolic names for registers, x is a symbolic name for the address of a particular item of
data, and h is a symbolic name for the address of a particular instruction. Having written
a program like this on paper, the programmer would prepare it to be run by manually
translating each instruction into machine code. This process was called assembling the
program.
The obvious next step was to make the machine itself assemble the program. For this
process to work, it is necessary to standardize the symbolic names for operations and
registers. (However, the programmer should still be free to choose symbolic names for
data and instruction addresses.) Thus the symbolic notation is formalized, and can now
be termed an assembly language.
Even when writing programs in an assembly language, the programmer is still work-
ing in terms of the machine's instruction set. A program consists of a large number of
very primitive instructions. The instructions must be written individually, and put to-
gether in the correct sequence. The algorithm in the mind of the programmer tends to be
swamped by details of registers, jumps, and so on. To take a very simple example, con-
sider computing the area of a triangle with sides a , b, and c, using the formula:
√(s × (s − a) × (s − b) × (s − c))
where s = (a + b + c) / 2
Written in assembly language, the program must be expressed in terms of individual
arithmetic operations, and in terms of the registers that contain intermediate results:
LOAD R1 a; ADD R1 b; ADD R1 c; DIV R1 #2;
LOAD R2 R1;
LOAD R3 R1; SUB R3 a; MULT R2 R3;
LOAD R3 R1; SUB R3 b; MULT R2 R3;
If the program fails to compile, or misbehaves when run, the user reinvokes the
editor to modify the program; then reinvokes the compiler; and so on. Thus program
development is an edit-compile-run cycle.
There is no direct communication between these language processors. If the program
fails to compile, the compiler will generate one or more error reports, each indicating
the position of the error. The user must note these error reports, and on reinvoking the
editor must find the errors and correct them. This is very inconvenient, especially in the
early stages of program development when errors might be numerous.
The essence of the 'software tools' philosophy is to provide a small number of com-
mon and simple tools, which can be used in various combinations to perform a large
variety of tasks. Thus only a single editor need be provided, one that can be used to edit
programs in a variety of languages, and indeed other textual documents too.
What we have described is the 'software tools' philosophy in its purest form. In
practice, the philosophy is compromised in order to make program development easier.
The editor might have a facility that allows the user to compile the program (or indeed
issue any system command) without leaving the editor. Some compilers go further: if
the program fails to compile, the editor is automatically reinvoked and positioned at the
first error.
These are ad hoc solutions. A fresh approach seems preferable: a fully integrated
language processor, designed specifically to support the edit-compile-run cycle.
widely understood. But contextual constraints and semantics are usually specified infor-
mally, because their formal specification is more difficult, and the available notations
are not yet widely understood. A typical language specification, with formal syntax but
otherwise informal, may be found in Appendix B.
1.3.1 Syntax
Syntax is concerned with the form of programs. We can specify the syntax of a pro-
gramming language formally by means of a context-free grammar. This consists of the
following elements:
A finite set of terminal symbols (or just terminals). These are atomic symbols, the
ones we actually enter at the keyboard when composing a program in the language.
Typical examples of terminals in a programming language's grammar are '>=',
'while', and ';'.
A finite set of nonterminal symbols (or just nonterminals). A nonterminal symbol
represents a particular class of phrases in the language. Typical examples of
nonterminals in a programming language's grammar are Program, Command,
Expression, and Declaration.
A start symbol, which is one of the nonterminals. The start symbol represents the
principal class of phrases in the language. Typically the start symbol in a
programming language's grammar is Program.
A finite set of production rules. These define how phrases are composed from termi-
nals and subphrases.
Grammars are usually written in the notation BNF (Backus-Naur Form). In BNF, a
production rule is written in the form N ::= α, where N is a nonterminal symbol, and
where α is a (possibly empty) string of terminal and/or nonterminal symbols. Several
production rules with a common nonterminal N on their left-hand sides, such as N ::= α
and N ::= β, may be grouped into a single rule N ::= α | β. The BNF symbol '::=' is
pronounced 'may consist of', and '|' is pronounced 'or alternatively'.
Expression         ::= primary-Expression
                     | Expression Operator primary-Expression
primary-Expression ::= Integer-Literal
                     | V-name
                     | Operator primary-Expression
                     | ( Expression )
V-name             ::= Identifier
Declaration        ::= single-Declaration
                     | Declaration ; single-Declaration
single-Declaration ::= const Identifier ~ Expression
                     | var Identifier : Type-denoter
Type-denoter       ::= Identifier
Operator           ::= + | - | * | / | < | > | = | \
Identifier         ::= Letter | Identifier Letter | Identifier Digit
Integer-Literal    ::= Digit | Integer-Literal Digit
Comment            ::= ! Graphic* eol
Production rule (1.3f) tells us that a single-command may consist of the terminal
symbol 'begin', followed by a command, followed by the terminal symbol 'end'.
Production rule (1.3a) tells us that a single-command may consist of a value-or-
variable-name, followed by the terminal symbol ':=', followed by an expression.
A value-or-variable-name, represented by the nonterminal symbol V-name, is the
name of a declared constant or variable. Production rule (1.6) tells us that a value-or-
variable-name is just an identifier. (More complex value-or-variable-names can be writ-
ten in full Triangle.)
Production rules (1.2a-b) tell us that a command may consist of a single-command
alone, or alternatively it may consist of a command followed by the terminal symbol ' ;'
followed by a single-command. In other words, a command consists of a sequence of
one or more single-commands separated by semicolons.
In production rules (1.11a-c), (1.12a-b), and (1.13):
eol stands for an end-of-line 'character';
Letter stands for one of the lowercase letters 'a', 'b', ..., or 'z';
This AST's root node is labeled WhileCommand, signifying the fact that this is a while-
command. The root node's second child is labeled SequentialCommand, signifying the
fact that the body of the while-command is a sequential-command. Both children of the
SequentialCommand node are labeled AssignCommand.
When we write down the above command, we need the symbols 'begin' and 'end'
to bracket the subcommands 'n := 0' and 'b := false'. These brackets distinguish
the above command from:
while b do n := 0; b := false
whose meaning is quite different. (See Exercise 1.5.) There is no trace of these brackets
in the abstract syntax, nor in the AST of Figure 1.5. They are not needed because the
AST structure itself represents the bracketing of the subcommands.
A program's AST represents its phrase structure explicitly. The AST is a convenient
structure for specifying the program's contextual constraints and semantics. It is also a
convenient representation for language processors such as compilers. For example, con-
sider again the command 'while E do C'. The meaning of this command can
be specified in terms of the meanings of its subphrases E and C. The translation of this
command into object code can be specified in terms of the translations of E and C into
object code. The command is represented by an AST with root node labeled 'While-
Command' and two subtrees representing E and C, so the compiler can easily access
these subphrases.
In Chapter 3 we shall use ASTs extensively to discuss the internal phases of a com-
piler. In Chapter 4 we shall see how a compiler constructs an AST to represent the
source program. In Chapter 5 we shall see how the AST is used to check that the
program satisfies the contextual constraints. In Chapter 7 we shall see how to translate
the program into object code.
[Figure: ASTs of Mini-Triangle phrases, with nodes such as Program, LetCommand, VarDeclaration, AssignCommand, BinaryExpression, VnameExpression, SimpleVname, Identifier, Operator, and Integer-Literal.]
The function call at point (2) also doubles its argument, because the applied occurrence
of m inside the function f always denotes 2, regardless of what m denotes at the point of
call.
In a language with dynamic binding, on the other hand, the applied occurrence of m
would denote the value to which m was most recently bound. In such a language, the
function call at (1) would double its argument, whereas the function call at (2) would
triple its argument.
Every programming language has a universe of discourse, the elements of which we
call values. Usually these values are classified into types. Each operation in the language
has an associated type rule, which tells us the expected operand type(s), and the type of
the operation's result (if any). Any attempt to apply an operation to a wrongly-typed
value is called a type error.
A programming language is statically typed if a language processor can detect all
type errors without actually running the program; the language is dynamically typed if
type errors cannot be detected until run-time.
The fact that a programming language is statically typed implies the following:
Every well-formed expression E has a unique type T, which can be inferred without
actually evaluating E.
Whenever E is evaluated, it will yield a value of type T. (Evaluation of E might fail
due to overflow or some other run-time error, or it might diverge, but its evaluation
will never fail due to a type error.)
In this book we shall generally assume that the source language exhibits static bind-
ing and is statically typed.
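To make the distinction concrete, here is a small illustration in Java, itself a statically typed language; the program and its names are ours, purely for illustration:

    class StaticTypingDemo {
        public static void main(String[] args) {
            int n = 7;
            double d = n / 2.0;    // type of 'n / 2.0' is inferred to be double,
                                   // without evaluating the expression
            // boolean b = n + 1;  // a type error: rejected at compile time,
                                   // so the faulty program is never run
            System.out.println(d);
        }
    }

In a dynamically typed language, by contrast, the faulty assignment would be detected only when (and if) that statement was actually executed.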
1.3.3 Semantics
Semantics is concerned with the meanings of programs, i.e., their behavior when run.
Many notations have been devised for specifying semantics formally, but so far none
has achieved widespread acceptance. Here we show how to specify the semantics of a
programming language informally.
Our first task is to specify, in general terms, what will be the semantics of each class
of phrase in the language. We may specify the semantics of commands, expressions, and
declarations as follows:
A command is executed to update variables. [It may also have the side effect of per-
forming input-output.]
An expression is evaluated to yield a value. [It may also have the side effect of updat-
ing variables.]
Triangle has the usual variety of operators, standard functions, and standard proce-
dures. These behave exactly like ordinary declared functions and procedures; unlike
Pascal, they have no special type rules or parameter mechanisms. In particular, Triangle
operators behave exactly like functions of one or two parameters.
... such as expressions, commands, declarations - rather than individual lines. You probably spend a lot of time on chores such as good layout. Also think of the common syntactic errors that might reasonably be detected immediately.)

1.4 According to the context-free grammar of Mini-Triangle in Example 1.3, which of the following are Mini-Triangle expressions?
(a) true
(b) sin(x)
(c) -n
(d) m 2 = n
(e) m - n * 2
Draw the syntax tree and AST of each one that is an expression.
Similarly, which of the following are Mini-Triangle commands?
(f) n := n + 1
(g) halt
(h) put(m, n)
(i) if n > m then m := n
(j) while n > 0 do n := n-1
Similarly, which of the following are Mini-Triangle declarations?
(k) const pi ~ 3.1416
(l) const y ~ x+1
(m) var b: Boolean
(n) var m, n: Integer
(o) var y: Integer; const dpy ~ 365
1.5 Draw the syntax tree and AST of the Mini-Triangle command:
while b do n := 0; b := false
cited at the end of Example 1.5. Compare with Figures 1.2 and 1.5.
1.6 According to the syntax and semantics of Mini-Triangle in Examples 1.3 and 1.8, what value is written by the following Mini-Triangle program? (The standard procedure putint writes its argument, an integer value.)
let
  const m ~ 2;
  const n ~ m + 1
in
  putint(m + n * 2)
(Note: Do not be misled by your knowledge of any other languages.)
We use the term x86 to refer to the family of processors represented by the Intel 80386
processor and its successors.
An S-into-T translator is itself a program, and can run on machine M only if it is ex-
pressed in machine code M. When the translator runs, it translates a source program P,
expressed in the source language S, to an equivalent object program P, expressed in the
target language T. This is shown in Figure 2.5. (The object program is shaded gray, to
emphasize that it is newly generated, unlike the translator and source program, which
must be given at the start.)
The second stage of the diagram shows the object program being run, also on an x86
machine.
A cross-compiler is a compiler that runs on one machine (the host machine) but gen-
erates code for a dissimilar machine (the target machine). The object program must be
generated on the host machine but downloaded to the target machine to be run. A cross-
compiler is a useful tool if the target machine has too little memory to accommodate the
compiler, or if the target machine is ill-equipped with program development aids. (Com-
pilers tend to be large programs, needing a good programming environment to develop,
and needing ample memory to run.)
We can now translate the interpreter into some machine code, say M, using the C
compiler:
Figure 2.8 An abstract machine is functionally equivalent to a real machine.
character set or different arithmetic. Written with care, however, application programs
expressed in high-level languages should achieve 95-99% portability.
Similar points apply to language processors, which are themselves programs. Indeed,
it is particularly important for language processors to be portable, because they are
especially valuable and widely-used programs. For this reason language processors are
commonly written in high-level languages such as Pascal, C, and Java.
Unfortunately, it is particularly hard to make compilers portable. A compiler's
function is to generate machine code for a particular machine, a function that is
machine-dependent by its very nature. If we have a C-into-x86 compiler expressed in a
high-level language, we should be able to move this compiler quite easily to run on a
dissimilar machine, but it will still generate x86 machine code! To change the compiler
to generate different machine code would require about half the compiler to be
rewritten, implying that the compiler is only about 50% portable.
It might seem that highly portable compilers are unattainable. However, the situation
is not quite so gloomy: a compiler that generates intermediate language is potentially
much more portable than a compiler that generates machine code.
How can we make this work? It seems that we cannot compile Java programs until
we have an implementation of JVM-code, and we cannot use the JVM-code interpreter
until we can compile Java programs! Fortunately, a small amount of work can get us out
of this chicken-and-egg situation.
Suppose that we want to get the system running on machine M, and suppose that we
already have a compiler for a suitable high-level language, such as C, on this machine.
Then we rewrite the interpreter in C:
JVM → M
(This is a substantial job, but only about half as much work as writing a complete Java-
into-M compiler.) Next, we compile this translator using the existing interpretive
compiler:
This gives an Ada-S compiler for machine M. We can test it by using it to compile and
run Ada-S test programs.
But we prefer not to rely permanently on version 1 of the Ada-S compiler, because it
is expressed in C, and therefore is maintainable only as long as a C compiler is
available. Instead, we make version 2 of the Ada-S compiler, expressed in Ada-S itself:
We compile the modified compiler, using the original compiler, to obtain a cross-
compiler:
The compiler translates Triangle source programs into TAM code. TAM (Triangle
Abstract Machine) is an abstract machine, implemented by an interpreter. TAM has
been designed to facilitate the implementation of Triangle - although it would be
equally suitable for implementing Algol, Pascal, and similar languages. Like JVM-code
(Example 2.15), TAM's primitive operations are more similar to the operations of a
high-level language than to the very primitive operations of a typical real machine. As a
consequence, the translation from Triangle into TAM code is straightforward and fast.
The Triangle-into-TAM compiler and the TAM interpreter together constitute an
interpretive compiler, much like the one described in Example 2.15. (See Exercise 2.2.)
The TAM disassembler translates a TAM machine code program into TAL (Triangle
Assembly Language). It is used to inspect the object programs produced by the
Triangle-into-TAM compiler.
[Tombstone diagrams: a Triangle-into-TAM compiler and a TAM-into-TAL disassembler, both expressed in Java.]
Further reading
A number of authors have used tombstone diagrams to represent language processors
and their interactions. The formalism was fully developed, complete with mathematical
underpinnings, by Earley and Sturgis (1970). Their paper also presents an algorithm that
systematically determines all the tombstones that can be generated from a given initial
set of tombstones.
A case study of compiler development by full bootstrap may be found in Wirth
(1971). A case study of compiler development by half bootstrap may be found in Welsh
and Quinn (1972). Finally, a case study of compiler improvement by bootstrapping may
be found in Ammann (1981). Interestingly, all these three case studies are interlinked:
Wirth's Pascal compiler was the starting point for the other two developments.
Bootstrapping has a longer history, the basic idea being described by several authors
in the 1950s. (At that time compiler development itself was still in its infancy!) The first
well-known application of the idea seems to have been a program called eval, which
was a Lisp interpreter expressed in Lisp itself (McCarthy et al. 1965).
Sun Microsystems' Java Development Kit (JDK) consists of a compiler that trans-
lates Java code to JVM code, a JVM interpreter, and a number of other tools. The
compiler (javac) is written in Java itself, having been bootstrapped from an initial
2.3 Assume that you have the following: a machine M; a C compiler that runs on
machine M and generates machine code M; and a Java-into-C translator ex-
pressed in C. Use tombstone diagrams to represent these language processors.
Also show how you would use these language processors to:
(a) compile and run a program P expressed in C;
(b) compile the Java-into-C translator into machine code;
(c) compile and run a program Q expressed in Java.
2.4 Assume that you have the following: a machine M; a C compiler that runs on
machine M and generates machine code M; a TAM interpreter expressed in C;
and a Pascal-into-TAM compiler expressed in C. Use tombstone diagrams to
represent these language processors. Also show how you would use these lan-
guage processors to:
(a) compile the TAM interpreter into machine code;
2.5 The Gnu compiler kit uses a machine-independent register transfer language,
RTL, as an intermediate language. The kit includes translators from several
high-level languages (such as C, C++, Pascal) into RTL, and translators from
RTL into several machine codes (such as Alpha, PPC, SPARC). It also
includes an RTL 'optimizer', i.e., a program that translates RTL into more
efficient RTL. All of these translators are expressed in C.
(a) Show how you would install these translators on a SPARC machine,
given a C compiler for the SPARC.
Now show how you would use these translators to:
(b) compile a program P, expressed in Pascal, into SPARC machine code;
(c) compile the same program, but using the RTL optimizer to generate more
efficient object code;
(d) cross-compile a program Q, expressed in C++, into PPC machine code.
2.6 The Triangle language processor (see Section 2.7) is expressed entirely in Java.
Use tombstone diagrams to show how the compiler, interpreter, and disassem-
bler would be made to run on machine M. Assume that a Java-into-M compiler
is available.
2.7 Draw tombstone diagrams to illustrate the use of a Java JIT (just-in-time)
compiler. Show what happens when a Java program P is compiled and stored
on a host machine H, and subsequently downloaded for execution on the user's
CHAPTER THREE
Compilation
In this chapter we study the internal structure of compilers. A compiler's basic function
is to translate a high-level source program to a low-level object program, but before
doing so it must check that the source program is well-formed. So compilation is
decomposed into three phases: syntactic analysis, contextual analysis, and code gener-
ation. In this chapter we study these phases and their relationships. We also examine
some possible compiler designs, each design being characterized by the number of
passes over the source program or its internal representation, and discuss the issues
underlying the choice of compiler design.
In this chapter we restrict ourselves to a shallow exploration of compilation. We
shall take a more detailed look at syntactic analysis, contextual analysis, and code
generation in Chapters 4, 5, and 7, respectively.
Inside any compiler, the source program is subjected to several transformations before
an object program is finally generated. These transformations are called phases. The
three principal phases of compilation are as follows:
Syntactic analysis: The source program is parsed to check whether it conforms to the
source language's syntax, and to determine its phrase structure.
Contextual analysis: The parsed program is analyzed to check whether it conforms to
the source language's contextual constraints.
Code generation: The checked program is translated to an object program, in accor-
dance with the semantics of the source and target languages.
The three phases of compilation correspond directly to the three parts of the source
language's specification: its syntax, its contextual constraints, and its semantics.¹
¹ Some compilers include a fourth phase, code optimization. Lexical analysis is sometimes
treated as a distinct phase, but in this book we shall treat it as a sub-phase of syntactic analysis.
let
var n: Integer
in ! ill-formed program
while n / 2 do
m := 'n' > 1
Figure 3.5 An ill-formed Triangle source program.
[Figure: the AST of the program of Figure 3.5, annotated with the types inferred during contextual analysis; errors are detected at points (1), where the while-expression 'n / 2' has type int rather than bool, and (2).]
Figure 3.6 Discovering errors during contextual analysis of the Triangle program of Figure 3.5.
(7) It generates the instruction 'CALL add'. (When executed, this instruction will add
the two previously-fetched values.)
(5) By following the link to the declaration of n, it retrieves this variable's address,
namely 0[SB]. Then it generates the instruction 'STORE 0[SB]'. (When exe-
cuted, this instruction will store the previously-computed value in that variable.)
In this way the code generator translates the whole program into object code.
3.2 Passes
In the previous section we examined the principal phases of compilation, and the flow
of data between them. In this section we go on to examine and compare alternative
compiler designs.
In designing a compiler, we wish to decompose it into modules, in such a way that
each module is responsible for a particular phase. In practice there are several ways of
doing so. The design of the compiler affects its modularity, its time and space require-
ments, and the number of passes over the program being compiled.
A pass is a complete traversal of the source program, or a complete traversal of an
internal representation of the source program (such as an AST). A one-pass compiler
makes a single traversal of the source program; a multi-pass compiler makes several
traversals.
In practice, the design of a compiler is inextricably linked to the number of passes it
makes. In this section we contrast multi-pass and one-pass compilation, and summarize
the advantages and disadvantages of each.
A structure diagram summarizes the modules and module dependencies in a system. The
higher-level modules are those near the top of the structure diagram. A connecting line
represents a dependency of a higher-level module on a lower-level module. This dependency
consists of the higher-level module using the services (e.g., types or methods) provided by the
lower-level module.
After parsing the assignment command 'c := '&'', the syntactic analyzer calls
the contextual analyzer to check type compatibility. It then calls the code generator
to generate instruction 'STORE 1[SB]', using the address retrieved at point (3).
After parsing the value-or-variable-name n, the syntactic analyzer infers (by
calling the contextual analyzer) that it is a variable of type int. It then calls the code
generator to retrieve the variable's address, 0[SB].
While parsing the expression n+1, the syntactic analyzer infers (by calling the
contextual analyzer) that the subexpression n is of type int, that the operator '+' is
of type int × int → int, that the subexpression 1 is of type int, and hence that the
whole expression is of type int. It calls the code generator to generate instructions
'LOAD 0[SB]', 'LOADL 1', and 'CALL add'.
A one-pass Triangle compiler would have been perfectly feasible, so the choice of a
three-pass design needs to be justified. The Triangle compiler is intended primarily for
educational purposes, so simplicity and clarity are paramount. Efficiency is a secondary
consideration; in any case, efficiency arguments for a one-pass compiler are inconclu-
sive, as we saw in Section 3.2.3. So the Triangle compiler was designed to be as
modular as possible, allowing the different phases to be studied independently of one another.
Exercises
3.1 In Examples 3.2 and 3.4, the first assignment command 'c := '&'' was
ignored. Describe how this command would have been subjected to contextual
analysis and code generation.
3.2 The Mini-Triangle source program below left would be compiled to the object
program below right:

    let                          PUSH 1
      const m ~ 7;               LOADL 7
      var x: Integer             LOAD 0[SB]
    in                           CALL mult
      x := m * x                 STORE 0[SB]
                                 POP 1
                                 HALT
Describe the compilation in the same manner as Examples 3.1, 3.2, and 3.4.
(You may ignore the generation of the PUSH and POP instructions.)
3.3 Choose a compiler with which you are familiar. Find out and describe its
phases and its pass structure. Draw a data flow diagram (like Figure 3.1) and a
structure diagram (like Figure 3.8 or Figure 3.9).
CHAPTER FOUR
Syntactic Analysis
In Chapter 3 we saw how compilation can be decomposed into three principal phases,
one of which is syntactic analysis. In this chapter we study syntactic analysis, and
further decompose it into scanning, parsing, and abstract syntax tree construction.
Section 4.1 explains this decomposition.
The main function of syntactic analysis is to parse the source program in order to
discover its phrase structure. Thus the main topic of this chapter is parsing, and in
particular the simple but effective method known as recursive-descent parsing. Sec-
tion 4.3 explains how parsing works, and shows how a recursive-descent parser can be
systematically developed from the programming language's grammar. This
development is facilitated by a flexible grammatical notation (EBNF) and by various
techniques for transforming grammars, ideas that are introduced in Section 4.2.
In a multi-pass compiler, the source program's phrase structure must be represented
explicitly in some way. This choice of representation is a major design decision. One
convenient and widely-used representation is the abstract syntax tree. Section 4.4 shows
how to make the parser construct an abstract syntax tree.
In parsing it is convenient to view the source program as a stream of tokens: symbols
such as identifiers, literals, operators, keywords, and punctuation. Since the source
program text actually consists of individual characters, and a token may consist of
several characters, scanning is needed to group the characters into tokens, and to discard
other text such as blank space and comments. Scanning is the topic of Section 4.5.
literal, and '+' is of kind operator. The criterion for classifying tokens is simply this: all
tokens of the same kind can be freely interchanged without affecting the program's
phrase structure. Thus the identifier 'y' could be replaced by 'x' or 'banana', and the
integer-literal '1' by '7' or '100', without affecting the program's phrase structure. On
the other hand, the token 'let' could not be replaced by 'lot' or 'led' or anything
else; 'let' is the only token of its kind.
Each token is completely described by its kind and spelling. Thus a token can be
represented simply by an object with these two fields. The different kinds of token can
be represented by small integers.
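A sketch of such a Token class in Java follows. It is modeled on the class Token of Example 4.2, but the particular kind codes shown here are illustrative assumptions rather than the book's exact set:

    public class Token {
        public byte kind;          // a small-integer code classifying the token
        public String spelling;    // the characters making up the token

        public Token(byte kind, String spelling) {
            this.kind = kind;
            this.spelling = spelling;
        }

        // Representative kind codes; the full set covers every kind of token.
        public static final byte IDENTIFIER = 0, INTLITERAL = 1,
                                 OPERATOR = 2, BECOMES = 3, SEMICOLON = 4;
    }

For example, the token 'y' would be represented by new Token(Token.IDENTIFIER, "y"), and the token ':=' by new Token(Token.BECOMES, ":=").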
Figure 4.2 The program of Figure 4.1 represented by a stream of tokens.
[Figure: the same program represented by a syntax tree, with Declaration and Expression subtrees built over its stream of tokens.]
In summary:
A regular language - a language that does not exhibit self-embedding - can be
generated by an RE.
A language that does exhibit self-embedding cannot be generated by any RE. To
generate such a language, we must write recursive production rules in either BNF or
EBNF.
Identifier ::= a | b | c | d | e
Operator   ::= + | - | * | /
This grammar generates expressions such as:
e
a + b
a - b - c
a + (b * c)
a * (b + c) / d
a - (b - (c - (d - e)))
Because the production rules defining Expression and primary-Expression are
mutually recursive, the grammar can generate self-embedded expressions.
EBNF combines the advantages of both BNF and REs. It is equivalent to BNF in
expressive power. Its use of RE notation makes it more convenient than BNF for
specifying some aspects of syntax.
These production rules are equivalent in the sense that they generate exactly the
same languages. The production rule N ::= X | N Y states that an N-phrase may consist
either of an X-phrase or of an N-phrase followed by a Y-phrase. This is just a roundabout
way of stating that an N-phrase consists of an X-phrase followed by any number of Y-
phrases. The production rule N ::= X (Y)* states the same thing more concisely.
This production rule is a little more complicated than the form shown above, but we can
left-factorize it:
Identifier ::= Letter
             | Identifier (Letter | Digit)
We can easily generalize this to define the starter set of an extended RE. There is
only one case to add:
starters[[N]] = starters[[X]], where N is a nonterminal symbol defined by
production rule N ::= X
In Example 4.4:
starters[[Expression]] = starters[[primary-Expression
                                    (Operator primary-Expression)*]]
                       = starters[[primary-Expression]]
                       = starters[[Identifier]] ∪ starters[[( Expression )]]
                       = starters[[a | b | c | d | e]] ∪ { ( }
                       = { a, b, c, d, e, ( }
4.3 Parsing
In this section we are concerned with analyzing sentences in some grammar. Given an
input string of terminal symbols, our task is to determine whether the input string is a
sentence of the grammar, and if so to discover its phrase structure. The following
definitions capture the essence of this.
With respect to a particular context-free grammar G:
Recognition of an input string is deciding whether or not the input string is a sentence
of G.
Parsing of an input string is recognition of the input string plus determination of its
phrase structure. The phrase structure can be represented by a syntax tree, or other-
wise.
We assume that G is unambiguous, i.e., that every sentence of G has exactly one
syntax tree. The possibility of an input string having several syntax trees is a compli-
cation we prefer to avoid.
Parsing is a task that humans perform extremely well. As we read a document, or
listen to a speaker, we are continuously parsing the sentences to determine their phrase
structure (and then determine their meaning). Parsing is subconscious most of the time,
but occasionally it surfaces in our consciousness: when we notice a grammatical error,
or realize that a sentence is ambiguous. Young children can be taught consciously to
parse simple sentences on paper.
In this section we are interested in parsing algorithms, which we can use in syntactic
analysis. Many parsing algorithms have been developed, but there are only two basic
parsing strategies: bottom-up parsing and top-down parsing. These strategies are
characterized by the order in which the input string's syntax tree is reconstructed. (In
[Diagram: a Noun-tree built over the input terminal 'cat', with 'the' preceding it.]
(Input terminal symbols not yet examined by the parser are shaded gray.)
(2) Now the parser can apply the production rule 'Subject ::= the Noun' (4.2c), com-
bining the input terminal symbol 'the' and the adjacent Noun-tree into a Subject-
tree:
[Diagram: a Subject-tree built over 'the cat'.]
(3) Now the parser moves on to the next input terminal symbol, 'sees'. Here it can
apply the production rule 'Verb ::= sees' (4.5d), forming a Verb-tree:
[Diagram: the Subject-tree over 'the cat', and a new Verb-tree over 'sees'.]
(4) The next input terminal symbol is 'a'. The parser cannot do anything with this
terminal symbol yet, so it moves on to the following input terminal symbol, 'rat'.
Here it can apply the production rule 'Noun ::= rat' (4.4c), forming a Noun-tree:
[Diagram: the Subject-tree and Verb-tree, and a new Noun-tree over 'rat', built over 'the cat sees a rat'.]
(6) The leftmost stub is now the (second) node labeled Noun. If the parser chooses to
apply production rule 'Noun ::= rat' (4.4c), it can connect the input terminal sym-
bol 'rat' to the tree. This step leaves the parser with a stub labeled '.' that matches
the next (and last) input terminal symbol:
[Diagram: the Sentence-tree under construction, with Subject and Object subtrees and a stub for the final '.'.]
Now let us see how to implement the parser. We need a class to contain all of the
parsing methods; let us call it Parser. This class will also contain an instance variable,
currentTerminal, that will range over the terminal symbols of the input string. (For
example, given the input string of Figure 4.5, currentTerminal will first contain
'the', then 'cat', then 'sees', etc., and finally '.'.) The Parser class, containing
currentTerminal, is declared as follows:
public class Parser {
This type style indicates a command or expression not yet refined into Java. We will use this
convention to suppress minor details.
The parser is initiated using the following method:
public void parse() {
    currentTerminal = first input terminal;
    parseSentence();
    check that no terminal follows the sentence
}
This parser does not actually construct a syntax tree. But it does (implicitly) deter-
mine the input string's phrase structure. For example, parseNoun whenever called
finds the beginning and end of a phrase of class Noun, and parseSubject whenever
called finds the beginning and end of a phrase of class Subject. (See Figure 4.5.)
In general, the methods of a recursive-descent parser cooperate as follows:
The variable currentTerminal will successively contain each input terminal. All
parsing methods have access to this variable.
On entry to method parseN, currentTerminal is supposed to contain the first
terminal of an N-phrase. On exit from parseN, currentTerminal is supposed to
contain the input terminal immediately following that N-phrase.
On entry to method accept with argument t, currentTerminal is supposed to
contain the terminal t. On exit from accept, currentTerminal is supposed to
contain the input terminal immediately following t.
If the production rules are mutually recursive, then the parsing methods will also be
mutually recursive. For this reason (and because the parsing strategy is top-down), the
algorithm is called recursive descent.
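The following self-contained sketch shows one way to realize this contract, including the variant acceptIt that appears in the parsing methods below; the representation of terminals as bare strings, and all names used here, are our assumptions for illustration only:

    public class TerminalStream {
        private final String[] input;     // the input terminal symbols
        private int position = 0;         // index of the current terminal
        public String currentTerminal;

        public TerminalStream(String[] input) {
            this.input = input;
            currentTerminal = input.length > 0 ? input[0] : "";
        }

        private void advance() {
            position++;
            currentTerminal = position < input.length ? input[position] : "";
        }

        // Checks that the current terminal is the expected one, then
        // advances past it, satisfying accept's contract above.
        public void accept(String expected) {
            if (currentTerminal.equals(expected))
                advance();
            else
                throw new RuntimeException("syntactic error: expected " + expected);
        }

        // Advances unconditionally; used where the caller has already
        // inspected currentTerminal (e.g., in a switch or loop condition).
        public void acceptIt() {
            advance();
        }
    }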
These transformations are justified because they will make the grammar more
suitable for parsing purposes. After making similar transformations to other parts of the
grammar, we obtain the following complete EBNF grammar of Mini-Triangle:
Program            ::= single-Command                                (4.6)
Command            ::= single-Command (; single-Command)*            (4.7)
single-Command     ::= Identifier (:= Expression | ( Expression ))   (4.8)
                     | if Expression then single-Command
                          else single-Command
                     | while Expression do single-Command
                     | let Declaration in single-Command
                     | begin Command end
Expression         ::= primary-Expression                            (4.9)
                          (Operator primary-Expression)*
primary-Expression ::= Integer-Literal                               (4.10)
                     | Identifier
                     | Operator primary-Expression
                     | ( Expression )
Declaration        ::= single-Declaration (; single-Declaration)*    (4.11)
single-Declaration ::= const Identifier ~ Expression                 (4.12)
                     | var Identifier : Type-denoter
Type-denoter       ::= Identifier                                    (4.13)
We have excluded production rules (1.10) through (1.13), which specify the syntax
of operators, identifiers, literals, and comments, all in terms of individual characters.
This part of the syntax is called the language's lexicon (or microsyntax). The lexicon is
of no concern to the parser, which will view each identifier, literal, and operator as a
single token. Instead, the lexicon will later be used to develop the scanner, in Section
4.5.
We shall assume that the scanner returns tokens of class Token, defined in Exam-
ple 4.2. Each token consists of a kind and a spelling. The parser will examine only the
kind of each token.
Step (2) is to convert each EBNF production rule to a parsing method. The parsing
methods will be as follows:
private void parseProgram();
private void parseCommand();
private void parseSingleCommand();
private void parseExpression();
private void parsePrimaryExpression();
private void parseDeclaration();
private void parseSingleDeclaration();
        parseSingleCommand();              single-Command
    }                                      )*
}
This method illustrates something new. The EBNF notation '(; single-Command)*'
signifies a sequence of zero or more occurrences of '; single-Command'. To parse this
we use a while-loop, which is iterated zero or more times. The condition for continuing
the iteration is simply that the current token is a semicolon.
Method parseDeclaration is similar to parseCommand. The remaining
methods are as follows:
private void parseProgram() {                Program ::=
    parseSingleCommand();                      single-Command
}

private void parseSingleCommand() {          single-Command ::=
    switch (currentToken.kind) {
    case Token.IDENTIFIER:
        {
            parseIdentifier();                 Identifier
            switch (currentToken.kind) {
            case Token.BECOMES:
                {
                    acceptIt();                :=
                    parseExpression();         Expression
                }
                break;
            case Token.LPAREN:
                {
                    acceptIt();                (
                    parseExpression();         Expression
                    accept(Token.RPAREN);      )
                }
                break;
            default:
                report a syntactic error
            }
        }
        break;
        parseOperator();                       Operator
        parsePrimaryExpression();              primary-Expression
    }                                          )*
}
...
        acceptIt();                            (
        parseExpression();                     Expression
        accept(Token.RPAREN);                  )
        break;
    default:
        report a syntactic error
    }
}
private void parseTypeDenoter() {              Type-denoter ::=
    parseIdentifier();                           Identifier
}
The nonterminal symbol Identifier corresponds to a single token, so the method
parseIdentifier is similar to accept:
private void parseIdentifier() {
    if (currentToken.kind == Token.IDENTIFIER)
        currentToken = scanner.scan();
    else
        report a syntactic error
}
Having worked through a complete example, let us now study in general terms how
we systematically develop a recursive-descent parser from a suitable grammar. The two
main steps are: (1) express the grammar in EBNF, performing any necessary transform-
ations; and (2) convert the EBNF production rules to parsing methods. It will be con-
venient to examine these steps in reverse order.
The reasoning behind this is simple. The input must consist of an X-phrase followed
by a Y-phrase. Since the parser works from left to right, it must parse the X-phrase and
{
    acceptIt();
    parseSingleCommand();
}
In this situation we know already that the current token is a semicolon, so
'acceptIt();' is a correct alternative to 'accept(Token.SEMICOLON);'.
This eliminates the problem, assuming that starters[[Declaration ;]] is disjoint from
starters[[Command]].
The above examples are quite typical. Although the LL(1) condition is quite restric-
tive, in practice most programming language grammars can be transformed to make
them LL(1) and thus suitable for recursive-descent parsing.
In general, a grammar that exhibits left recursion cannot be LL(1). Any attempt to
convert left-recursive production rules directly into parsing methods would result in an
incorrect parser. It is easy to see why. Given the left-recursive production rule:
N ::= X | N Y
we find:
starters[[N Y]] = starters[[N]] = starters[[X]] ∪ starters[[N Y]]
so starters[[X]] and starters[[N Y]] cannot be disjoint.
4.4.1 Representation
The following example illustrates how we can define ASTs in Java.
[Figure: the forms of Mini-Triangle ASTs, e.g., a Program node (1.14) with a Command subtree, and AssignCommand (1.15a) and CallCommand (1.15b) nodes with V, E, and Identifier subtrees; terminal nodes carry a spelling.]
A node with tag 'ConstDeclaration' is the root of a Declaration AST with two
subtrees: an Identifier AST and an Expression AST.
A node with tag 'Identifier' is the root of an Identifier AST. This is just a terminal
node, whose only content is its spelling.
We need to define Java classes that capture the structure of Mini-Triangle ASTs. We
begin by introducing an abstract class, called AST, for all abstract syntax trees:
public abstract class AST {
Program has only a single form, consisting simply of a Command, so the class
Program simply contains an instance variable for the command that is the body of the
program.
For each nonterminal in the Mini-Triangle abstract syntax that has several forms
(such as Command), we introduce an abstract class (such as Command), and several
concrete subclasses.
Command ASTs:
public abstract class Command extends AST { ... }
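For instance, a concrete subclass for sequential commands might be sketched as follows, consistent with the constructor call new SequentialCommand(...) used in parseCommand below; the exact field names are an assumption:

    public class SequentialCommand extends Command {
        public Command C1, C2;    // the two subcommands, in order

        public SequentialCommand(Command c1AST, Command c2AST) {
            C1 = c1AST;
            C2 = c2AST;
        }
    }

Each concrete subclass thus holds one instance variable per subtree, initialized by its constructor.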
This method is fairly typical. It has been enhanced with a local variable, declAST, in
which the AST of the single-declaration will be stored. The method eventually returns
this AST as its result. Local variables iAST, eAST, and tAST are introduced where
required to contain the ASTs of the single-declaration's subphrases.
Here is the enhanced method parseCommand:
private Command parseCommand() {
    Command c1AST = parseSingleCommand();
    while (currentToken.kind == Token.SEMICOLON) {
        acceptIt();
        Command c2AST = parseSingleCommand();
        c1AST = new SequentialCommand(c1AST, c2AST);
    }
    return c1AST;
}
This method contains a loop, arising from the iteration '*' in production rule (4.7),
which in turn was introduced by eliminating the left recursion in (1.2a-b). We must be
careful to construct an AST with the correct structure. The local variable c1AST is used
to accumulate this AST.
Suppose that the command being parsed is 't := x; x := y; y := t'. Then after
the method parses 't := x', it sets c1AST to the AST for 't := x'; after it parses 'x :=
y', it updates c1AST to the AST for 't := x; x := y'; and after it parses 'y := t', it
updates c1AST to the AST for 't := x; x := y; y := t'.
Here is an outline of the enhanced method parseSingleCommand:
private Command parseSingleCommand() {
    Command comAST;
    switch (currentToken.kind) {
    case Token.IDENTIFIER: {
            Identifier iAST = parseIdentifier();
            switch (currentToken.kind) {
            case Token.BECOMES: {
                    acceptIt();
                    Expression eAST = parseExpression();
                    comAST = new AssignCommand(iAST, eAST);
                }
                break;
            case Token.LPAREN: {
                    acceptIt();
                    Expression eAST = parseExpression();
                    accept(Token.RPAREN);
                    comAST = new CallCommand(iAST, eAST);
                }
                break;
            default:
                report a syntactic error
            }
        }
        break;
    case Token.IF:
        ...
    case Token.WHILE:
        ...
    case Token.LET: {
            acceptIt();
            Declaration dAST = parseDeclaration();
            accept(Token.IN);
            Command cAST = parseSingleCommand();
            comAST = new LetCommand(dAST, cAST);
        }
        break;
    case Token.BEGIN: {
            acceptIt();
            comAST = parseCommand();
            accept(Token.END);
        }
        break;
    default:
        report a syntactic error
    }
    return comAST;
}
If the single-command turns out to be of the form 'begin C end', there is no need to
construct a new AST, since the 'begin' and 'end' are just command brackets. So in
this case the method immediately stores C's AST in comAST.
The method parseIdentifier constructs an AST terminal node:
private Identifier parseIdentifier() {
    Identifier idAST;
    if (currentToken.kind == Token.IDENTIFIER) {
        idAST = new Identifier(currentToken.spelling);
        currentToken = scanner.scan();
    } else
        report a syntactic error
    return idAST;
}
The methods parseIntegerLiteral and parseOperator do likewise.
The lexical grammar of Triangle expressed in EBNF may be found in Section B.8.
Before developing the scanner, the lexical grammar was modified in two respects:
The production rule for Token was modified to add end-of-text as a distinct token.
Keywords were grouped with identifiers. (See Exercise 4.18 for an explanation.)
Most nonterminals were eliminated by substitution. The result was a lexical grammar
containing only individual characters, nonterminals that represent individual characters
(i.e., Letter, Digit, Graphic, and Blank), and the nonterminals Token and Separator:
Token ::= Letter (Letter | Digit)* | Digit Digit* |                  (4.24)
          Op-character Op-character* | ' Graphic ' |
          . | , | ; | : | := | ~ | ( | ) | [ | ] | { | } |
          end-of-text
Separator ::= ! Graphic* end-of-line | Blank                         (4.25)
The Triangle scanner was then developed from this lexical grammar, following the
procedure described in Section 4.5.
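As a small illustration of that procedure, the following self-contained Java sketch scans just the first two alternatives of rule (4.24), identifiers and integer-literals; all names here are ours, not the Triangle compiler's:

    public class MiniScanner {
        private final String source;
        private int pos = 0;

        public MiniScanner(String source) { this.source = source; }

        private char currentChar() {
            return pos < source.length() ? source.charAt(pos) : '\u0000';
        }

        // Scans one token and returns its spelling: an identifier is a
        // maximal Letter (Letter | Digit)*, an integer-literal a maximal
        // Digit Digit*; anything else is taken as a one-character token.
        public String scanToken() {
            int start = pos;
            if (Character.isLetter(currentChar())) {
                do { pos++; } while (Character.isLetterOrDigit(currentChar()));
            } else if (Character.isDigit(currentChar())) {
                do { pos++; } while (Character.isDigit(currentChar()));
            } else if (pos < source.length()) {
                pos++;
            }
            return source.substring(start, pos);
        }
    }

The real scanner additionally records each token's kind, handles the remaining alternatives of (4.24), and skips Separators.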
4.6.3 Parsing
The Parser class contains a recursive-descent parser, as described in Section 4.3. The
parser calls the scan method of the Scanner class to scan the source program.
Exercises
Section 4.1
4.1 Perform syntactic analysis of the Mini-Triangle program:
begin while true do putint(1); putint(0) end
along the lines of Figures 4.1 through 4.4.
4.2 Modify the class Token (Example 4.2) so that the instance variable
spelling is left empty unless the token is an identifier, literal, or operator.
4.10* The following EBNF grammar generates a subset of the UNIX shell command
language:
Script   ::= Command*
Command  ::= Filename Argument* eol
           | Variable = Argument eol
           | if Filename Argument* then eol
               Command*
             else eol
               Command*
             fi eol
           | for Variable in Argument* eol
             do eol
               Command*
             od eol
Argument ::= Filename | Literal | Variable
The start symbol is Script. The token eol corresponds to an end-of-line.
Construct a recursive-descent parser for this language. Treat filenames, literals,
and variables as single tokens.
4.11* Consider the rules for converting EBNF production rules to parsing methods
(Section 4.3.4).
(a) Suggest an alternative refinement rule for 'parse X | Y', using an if-
statement rather than a switch-statement.
(b) In some variants of EBNF, [X] is used as an abbreviation for X | ε.
Suggest a refinement rule for 'parse [X]'.
(c) In some variants of EBNF, X+ is used as an abbreviation for X X*.
Suggest a refinement rule for 'parse X+'.
In each case, state any condition that must be satisfied for the refinement rule
to be correct.
another AST node. A terminal node contains a tag and a spelling. The tag dis-
tinguishes between an identifier, a literal, and an operator.
(a) Reimplement the class AST for Mini-Triangle.
(b) Provide this class with a method display, as specified in Exercise 4.15.
Section 4.5
4.17 The Mini-Triangle scanner (Example 4.21) stores the spellings of separators,
including comments, only to discard them later. Modify the scanner to avoid
this inefficiency.
4.18* Suppose that the Mini-Triangle lexical grammar (Example 4.21) were modified
as follows, in an attempt to distinguish between identifiers and keywords (such
as 'if', 'then', 'else', etc.):
Token ::= Identifier | Integer-Literal | Operator |
          if | then | else | ... |
          ; | : | := | ~ | ( | ) | eot
Identifier ::= Letter (Letter | Digit)*
Point out a serious problem with this lexical grammar. (Remember that the ter-
minal symbols are individual characters.) Can you see any way to remedy this
problem?
4.19 (a) Modify the Mini-Triangle lexical grammar (Example 4.21) as follows.
Allow identifiers to contain single embedded underscores, e.g., 'set_up'
(but not 'set__up', nor 'set_', nor '_up'). Allow real-literals, with a
decimal point surrounded by digits, e.g., '3.1416' (but not '4.', nor
'.125').
General
4.20* Consider a hypothetical programming language, Newspeak, with an English-
like syntax (expressed in EBNF) as follows:
Program        ::= Command .
Command        ::= single-Command single-Command*
single-Command ::= do nothing
                 | store Expression in Variable
                 | if Condition : single-Command
                     otherwise : single-Command
                 | do Expression times : single-Command
4.21** Design and implement a complete syntactic analyzer for your favorite pro-
gramming language.
4.22"" A cross-referencer is a language processor that lists each identifier that occurs
in the source program, together with the line numbers where that identifier oc-
curs. Starting with either the Mini-Triangle syntactic analyzer or the syntactic
analyzer you implemented in Exercise 4.21:
(a) Modify the scanner so that every token contains a field for the line number
where it occurs.
(b) Develop a simple cross-referencer, reusing appropriate parts of your syn-
tactic analyzer.
(c) Now make your cross-referencer distinguish between defining and applied
occurrences of each identifier.
A programming language must also specify the appropriate scope rule for the
standard environment. Most programming languages consider the standard environment
to be a scope enclosing the whole program, so that the source program may contain a
declaration of an identifier present in the standard environment without causing a scope
error. Some other programming languages (such as C) introduce the standard environ-
ment at the same scope level as the global declarations of the source program.
If the standard environment is to be at a scope enclosing the whole program, the
declarations of the standard environment should be entered at scope level 0 in the
identification table.
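In outline, the initialization might look like this; the method names (enter, openScope) and the declaration objects named here are illustrative assumptions about the identification table's interface:

    // Scope level 0: the standard environment.
    idTable.enter("Integer", stdIntegerDecl);
    idTable.enter("Boolean", stdBooleanDecl);
    idTable.enter("putint", stdPutintDecl);
    // ... and so on for the remaining standard identifiers.

    // Scope level 1: ready for the global declarations of the source program.
    idTable.openScope();

A source-program declaration of, say, Integer then merely hides the level-0 entry, instead of provoking a 'declared twice' error.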
(We show the inferred type T by annotating the AST node with ': T'.)
If I is declared in a variable declaration, whose right side is type T, then the type of
the applied occurrence of I is T:
[Diagrams: a SimpleVname whose Identifier is linked to a VarDeclaration with right side of type T is annotated ': T'.]
[Diagram: a BinaryExpression 'E1 < E2' whose two subexpressions are annotated ': int' is itself annotated ': bool'.]
The operator '<' is of type int × int → bool. Having checked that the type of E1 is
equivalent to int, and that the type of E2 is equivalent to int, the type checker infers that
the type of 'E1 < E2' is bool. Other operators would be handled similarly.
Of course, Mini-Triangle type checking is exceptionally simple: the representation
of types is trivial, and testing for type equivalence is also trivial. Type checking is more
complicated if the source language has composite types. For example, Triangle array
and record types have component types, which are unrestricted. Thus we need to
represent types by trees.
Furthermore, there are two possible definitions of type equivalence.
Some programming languages (such as Triangle) adopt structural equivalence,
whereby two types are equivalent if and only if their structures are the same. If types are
represented by trees, structural equivalence can be tested by comparing the structures of
these trees. If the implementation language is Java, then this kind of equality is conven-
tionally tested by an equals method in the Type class.
Other programming languages (such as Pascal and Ada) adopt name equivalence.
Every occurrence of a type constructor (e.g., array or record) creates a new and
distinct type. In this case type equivalence can be tested simply by comparing pointers
to the objects representing the types: distinct objects (created at different times)
represent types that are not equivalent, even if they happen to be structurally similar. If
the implementation language is Java, then this kind of equality is tested by the '=='
operator applied to objects of class Type.
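The contrast between the two tests can be seen in a small self-contained Java sketch; the type classes here are ours, for illustration only:

    abstract class Type { }

    class IntType extends Type {
        public boolean equals(Object other) {
            return other instanceof IntType;
        }
    }

    class ArrayType extends Type {
        final int size;
        final Type element;
        ArrayType(int size, Type element) {
            this.size = size;
            this.element = element;
        }
        // Structural equivalence: same constructor, same components.
        public boolean equals(Object other) {
            return other instanceof ArrayType
                && size == ((ArrayType) other).size
                && element.equals(((ArrayType) other).element);
        }
    }

Given two textually separate occurrences of the same type constructor:

    Type t1 = new ArrayType(8, new IntType());
    Type t2 = new ArrayType(8, new IntType());

t1.equals(t2) yields true (structural equivalence), whereas t1 == t2 yields false (name equivalence: the two objects are distinct).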
The work of the contextual analyzer will be done by a set of visitor methods. There
will be exactly one visitor method, visitA, for each concrete AST subclass A. The
visitor methods will cooperate to traverse the AST in the desired order.
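The shape of this design can be sketched as follows; the names are illustrative, not the Triangle compiler's exact interfaces:

    interface Visitor {
        // One method per concrete AST subclass:
        Object visitAssignCommand(AssignCommand ast, Object arg);
        // ... and so on for every other concrete subclass.
    }

    abstract class AST {
        public abstract Object visit(Visitor v, Object arg);
    }

    class AssignCommand extends AST {
        public Object visit(Visitor v, Object arg) {
            return v.visitAssignCommand(this, arg);    // call back the visitor
        }
    }

The contextual analyzer implements Visitor, so a traversal of the whole AST is driven by these mutually recursive calls.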
In Triangle type equivalence is structural. Of the types shown in Figure 5.5, only (5)
and (6) are equivalent to each other. To test whether two types are equivalent, the type
checker just compares their ASTs structurally. This test is performed by defining an
equals method in each subclass of TypeDenoter. Class TypeDenoter itself is
enhanced as follows:
public abstract class TypeDenoter extends AST {
public abstract boolean equals(Object other);
Type identifiers in the AST would complicate the type equivalence test. To remove
this complication, the visitor/checking methods for type-denoters are made to eliminate
all type identifiers. This is achieved by replacing each type identifier by the type it
denotes.
Figure 5.6 shows the ASTs representing the following Triangle declarations:
type Word ~ array 8 of Char;
var w1: Word;
var w2: array 8 of Char
Initially the type subtrees (1) and (2) in the two variable declarations are different. After
these subtrees have been checked, however, the type identifiers 'Char' and 'Word'
have been eliminated. The resulting subtrees (3) and (4) are structurally similar. The
elimination of type identifiers makes it clear that the types of variables w1 and w2 are
equivalent.
A consequence of this transformation is to make each type 'subtree' (and hence the
whole AST) into a directed acyclic graph. Fortunately, this causes no serious complic-
ation in the Triangle compiler. (But recursive types - as found in Pascal, Ada, and ML -
would cause a complication: see Exercise 5.9.)
The Triangle type checker infers and checks the types of expressions and value-or-
variable-names in much the same way as in Example 5.8. Types are tested for structural
equivalence by using the equals method of the TypeDenoter class. (Instead,
comparing types by means of '==' would implement name equivalence.)
Before analyzing a source program, the contextual analyzer initializes the identifi-
cation table with entries for the standard identifiers, at scope level 0, as shown in Figure
5.8. The attribute stored in each of these entries is a pointer to the appropriate 'declar-
ation'. Thus standard identifiers are treated in exactly the same way as identifiers dec-
lared in the source program.
[Figure content: small ASTs numbered (1)-(8), including a TypeDeclaration for Boolean, ConstDeclarations with EmptyExpressions, a FuncDeclaration for eof, ProcDeclarations for get and put with char parameters, (7) an 'operator declaration' for the unary operator '\', and (8) a BinaryOpDeclaration for the binary operator '<'.]
Figure 5.7 Small ASTs representing the Triangle standard environment (abridged).
Figure 5.8 Identification table for the Triangle standard environment (abridged).
The Triangle standard environment also includes a collection of unary and binary
operators. It is convenient to treat operators in much the same way as identifiers, as
shown in Figures 5.7 and 5.8.¹
The representation of the Triangle standard environment therefore includes small
ASTs representing 'operator declarations', such as one for the unary operator ' \ ' (7),
and one for the binary operator '<' (8). (See Figure 5.7.) An 'operator declaration'
merely defines the types of the operator's operand(s) and result. Entries are also made
for operators in the identification table. (See Figure 5.8.) At an application of operator
0, the identification table is used to retrieve the 'operator declaration' of 0,and thus to
find the operand and result types for type checking.
Further reading
For a more detailed discussion of declarations, scope, and block structure, see Chapter 4
of the companion textbook by Watt (1990). Section 2.5 of the same textbook discusses
simple type systems (of the kind found in Triangle, Pascal, and indeed most program-
ming languages). Chapter 7 goes on to explore more advanced type systems. Coercions
(found in most languages) are implicit conversions from one type to another. Overload-
ing (found in Ada and Java) allows several functions/procedures/methods with different
bodies and different types to have a common identifier, even in the same scope. In a
function/procedure/method call with this common identifier, a technique called overload
resolution is needed to identify which of several functions/procedures/methods is being
called. Parametric polymorphism (found in ML) allows a single function to operate

¹ Indeed, some programming languages, such as ML and Ada, actually allow operators to be
declared like functions in the source program. This emphasizes the analogy between operators
and function identifiers.
uniformly on arguments of a family of types (e.g., the list types). Moreover, the types of
functions, parameters, etc., need not be declared explicitly. Polymorphic type inference
is a technique that allows the types in a source program to be inferred in the context of a
polymorphic type system.
For a comprehensive account of type checking, see Chapter 6 of Aho et al. (1985).
As well as elementary techniques, the authors discuss techniques required by the more
advanced type systems: type checking of coercions, overload resolution, and polymor-
phic type inference. For some reason, however, Aho et al. defer discussion of identifi-
cation to Chapter 7 (run-time organization).
A classic paper on polymorphic type inference by Milner (1978) was the genesis of
the type system that was adopted by ML, and borrowed by later functional languages.
For a good short account of contextual analysis in a one-pass compiler for a Pascal
subset, see Chapter 2 of Welsh and McKeag (1980). The authors clearly explain ways of
representing the identification table, attributes, and types. They also present a simple
error recovery technique that enables the contextual analyzer to generate sensible error
reports when an identifier is declared twice in the same scope, or not declared at all.
The visitor pattern used to structure the Triangle compiler is not the only possible
object-oriented design. One alternative design, explained in Appel (1997), is to associate
the checking methods (and the encoding methods in the code generator) for a particular
AST object with the AST object itself. This design is initially easier to understand than
the visitor design pattern, but it has the disadvantage that the checking methods (and
encoding methods) are spread all over the AST subclass definitions instead of being
grouped together in one place.
You should be aware of a lack of standard terminology in the area of contextual
analysis. Identification tables are often called 'symbol tables' or 'declaration tables'.
Contextual analysis itself is often misnamed 'semantic analysis'.
Exercises
Section 5.1
5.1 Consider a source language with monolithic block structure, as in Section
5.1.1, and consider the following ways of implementing the identification table:
(a) an ordered list;
(b) a binary search tree;
(c) a hash table.
In each case implement the IdentificationTable class, including the
methods enter and retrieve.
In efficiency terms, how do these implementations compare with one another?
5.2 Consider a source language with flat block structure, as in Section 5.1.2.
Devise an efficient way of implementing the identification table. Implement the
IdentificationTable class, including the methods enter, retrieve,
openScope, and closeScope.
5.3* For a source language with nested block structure, as in Section 5.1.3, we could
implement the identification table by a stack of binary search trees (BSTs).
Each BST would contain entries for declarations at one scope level. Consider
the innermost block of Figure 5.3, for example. At the stack top there would be
a BST containing the level-3 entries; below that there would be a BST
containing the level-2 entries; and at the stack bottom there would be a BST
containing the level-1 entries.

Implement the IdentificationTable class, including the methods enter,
retrieve, openScope, and closeScope.

In efficiency terms, how does this implementation compare with that used in
the Triangle compiler (Section 5.4.1)?
5.4* For a source language with nested block structure, we can alternatively imple-
ment the identification table by a sparse matrix, with columns indexed by scope
levels and rows indexed by identifiers. Each column links the entries at a par-
ticular scope level. Each row links the entries for a particular identifier, in order
from innermost scope to outermost scope. In the innermost block of Figure 5.3,
for example, the table would look like Figure 5.9.

Implement the IdentificationTable class, including the methods enter,
retrieve, openScope, and closeScope.

In efficiency terms, how does this implementation compare with that used in
the Triangle compiler (Section 5.4.1), and with a stack of binary search trees
(Exercise 5.3)?
5.5* Outline an identification algorithm that does not use an identification table, but
instead searches the AST. For simplicity, assume monolithic block structure.
In efficiency terms, how does this algorithm compare with one based on an
identification table?
CHAPTER SIX
Run-Time Organization
Programming languages provide high-level data types such as truth values, integers,
characters, records, and arrays, together with operations over these types. Target
machines provide only machine 'types' such as bits, bytes, words, and double-words,
together with low-level arithmetic and logical operations. To bridge the semantic gap
between the source language and the target machine, the implementor must decide how
to represent the source language's types and operations in terms of the target machine's
types and operations.
In the following subsections we shall survey representations of various types. As we
study these representations, we should bear in mind the following fundamental
principles of data representation:

Nonconfusion: Different values of a given type should have different representations.

Uniqueness: Each value should always have the same representation.
The nonconfusion requirement should be self-evident. If two different values are
confused, i.e., have the same representation, then comparison of these values will
incorrectly treat the values as equal.
Nevertheless, confusion does arise in practice. A well-known example is the approx-
imate representation of real numbers: real numbers that are slightly different mathemat-
ically might have the same approximate representation. This confusion is inevitable,
however, given the design of our digital computers. So language designers must formul-
ate the semantics of real-number operations with care; and programmers on their part
must learn to live with the problem, by avoiding naive comparisons of real numbers.
On the other hand, confusion can and must be avoided in the representations of
discrete types, such as truth values, characters, and integers.
If the source language is statically typed, the nonconfusion requirement refers only
to values of the same type; values of distinct types need not have distinct represent-
ations. Thus the binary word 00...00 may represent the truth value false, the integer 0,
the real number 0.0, and so on. Compile-time type checks will ensure that values of
different types cannot be used interchangeably at run-time, and therefore cannot be
confused. Thus we can be sure that if 00...00 turns up as an operand of a logical
operation, it represents false, whereas if it turns up as an operand of an arithmetic
operation, it represents the integer 0.
The uniqueness requirement is likewise self-evident. Comparison of values would be
complicated by the possibility of any value having more than one representation. Cor-
rect comparison is possible, however, so uniqueness is desirable rather than essential.
Figure 6.1 (a) Direct representation of a value x; (b) indirect representation of a value x;
(c) indirect representation of a value y, of the same type as x but requiring
more space.
Indirect representation is essential for types whose values vary greatly in size. For
example, a list or dynamic array may have any number of elements, and clearly the total
amount of space depends on the number of elements. For types such as this, indirect
representation is the only way to satisfy the constant-size requirement. This is illustrated
in Figure 6.1(b) and (c) where, although the values x and y occupy different amounts of
space, the handles to x and y occupy the same amount of space.
We now survey representations of the more common types found in programming
languages. We shall assume direct representation wherever possible, i.e., for primitive
types, records, disjoint unions, and static arrays. But we shall see that indirect represen-
tation is necessary for dynamic arrays and recursive types.
We shall use the following notation:

#T stands for the cardinality of type T, i.e., the number of distinct values of type T.
For example, #[[Boolean]] = 2.

size T stands for the amount of space (in bits, bytes, or words) occupied by each value
of type T. If indirect representation is used, only the handle is counted.

We use emphatic brackets [[...]] to enclose a specific type-denoter, as in #[[Boolean]]
or size[[Boolean]] or size[[array 8 of Char]].
If a direct representation is chosen for values of type T, we can assert the inequality:

    size T >= log2 (#T), or equivalently 2^(size T) >= #T    (6.1)

where size T is expressed in bits. This follows from the nonconfusion requirement: in n
bits we can represent at most 2^n distinct values if we are to avoid confusion.
The values of the type Char are the elements of a character set. Sometimes the
source language specifies a particular character set. For example, Ada specifies the ISO-
Latin1 character set, which consists of 2^8 distinct characters, and Java specifies the
Unicode character set, which consists of 2^16 distinct characters. Most programming
languages, however, are deliberately unspecific about the character set. This allows the
compiler writer to choose the target machine's 'native' character set. Typically this
consists of 2^7 or 2^8 distinct characters. In any case, the choice of character set
determines the representation of individual characters. For example, ISO defines the
representation of character 'A' to be 01000001 (in binary). We can represent a character
by one byte or one word.
The values of the type Integer are integer numbers. Obviously we cannot repre-
sent an unbounded range of integers within a fixed amount of space. All major program-
ming languages take account of this in their semantics: Integer denotes an
implementation-defined bounded range of integers. The binary representation of
integers is determined by the target machine's arithmetic unit, and almost always
occupies one word. The source language's integer operations can then, for the most part,
be implemented by the corresponding machine operations.

In Pascal and Triangle, Integer denotes the range -maxint, ..., -1, 0, +1, ...,
+maxint, where the constant maxint is implementation-defined. In this case we have
#[[Integer]] = 2 x maxint + 1, and therefore we can specialize (6.1) as follows:

    2^(size[[Integer]]) >= 2 x maxint + 1    (6.2)

If the word size is w bits, then size[[Integer]] = w. To ensure that (6.2) is satisfied, the
implementation should define maxint = 2^(w-1) - 1.
In Java, int denotes the range -2^31, ..., -1, 0, +1, ..., +2^31 - 1. In this case we have
#[[int]] = 2^32.
Some programming languages allow the programmer to define new primitive types.
An example is the enumeration type of Pascal. The values of such a type are called
enumerands. Enumerands can be represented by small integers. Consider an
enumeration type T with enumerands I1, ..., In. We can represent each Ii by (the binary
equivalent of) i. Since #T = n, size T >= log2 n bits.
The enumeration type is equipped with operations such as succ (which returns the
successor of the given enumerand) and ord (which returns the integer representation of
the given enumerand). The representation chosen allows succ to be implemented by
the target machine's INC operation (if available). The ord operation becomes a NOP.
We must distinguish between the identifiers and the enumerands they denote, because the
identifiers could be redeclared.
6.1.2 Records
Now we proceed to examine the representation of composite types. These are types
whose values are composed from simpler values.
A record consists of several fields, each of which has an identifier. A record type
designates the identifiers and types of its fields, and all the records of a particular type
have fields with the same identifiers and types. The fundamental operation on records is
field selection, whereby we use one of the field identifiers to access the corresponding
field.
Records occur obviously in Pascal, Ada, and Triangle, and as s t r u c t s in C.
There is an obvious and good direct representation for records: we simply juxtapose
the fields, i.e., make them occupy consecutive positions in storage. This representation
is compact, and makes it easy to implement field selection very efficiently.
Assume for simplicity that each primitive value occupies one word. Then the
variables today and her (after initialization) would look like this:

    today:  [ y ][ m ][ d ]
    her:    [ female ][ dob.y ][ dob.m ][ dob.d ][ status ]
Each box in the diagram is a word. A variable of type Date (such as today) occupies
three consecutive words, one for each of its fields. A variable of type Details (such
as her) occupies five consecutive words: one for its field female, three for its field
dob, and one for its field status.
We can predict not only the total size of each record variable, but also the position of
each field relative to the base of the record. If today is located at address 100 (i.e., it
occupies the words at addresses 100 through 102), then today.y is located at address
100, today.m is located at address 101, and today.d is located at address 102. In
other words, the fields y, m, and d have offsets of 0, 1, and 2, respectively, within any
record of type Date. Likewise, the fields female, dob, and status have offsets of
0, 1, and 4, respectively, within any record of type Details.
Summarizing:

    size[[Date]] = 3 words
    address[[today.y]] = address[[today]] + 0
    address[[today.m]] = address[[today]] + 1
    address[[today.d]] = address[[today]] + 2

    size[[Details]] = 5 words
    address[[her.female]] = address[[her]] + 0
    address[[her.dob]] = address[[her]] + 1
    address[[her.dob.y]] = address[[her.dob]] + 0 = address[[her]] + 1
    address[[her.dob.m]] = address[[her.dob]] + 1 = address[[her]] + 2
    address[[her.dob.d]] = address[[her.dob]] + 2 = address[[her]] + 3
    address[[her.status]] = address[[her]] + 4
□
We shall use the notation address v to stand for the address of variable v. If the
variable occupies several words, this means the address of the first word. We use
emphatic brackets [[...]] to enclose a specific variable-name, as in address[[her.dob]].
Let us now generalize from this example. Consider a record type T and variable r:

    type T = record I1: T1, ..., In: Tn end;    (6.3)
    var r: T

We represent each record of type T by juxtaposing its n fields, as shown in Figure 6.2. It
is clear that:

    size T = size T1 + ... + size Tn    (6.4)

This satisfies the constant-size requirement. If size T1, ..., and size Tn are all constant,
then size T is also constant.
The implementation of field selection is simple and efficient. To access field Ii of the
record r, we use the following address computation:

    address[[r.Ii]] = address r + (size T1 + ... + size Ti-1)    (6.5)

Since size T1, ..., and size Ti-1 are all constant, the address of the field r.Ii is just a
constant offset from the base address of r itself. Thus, if the compiler knows the address
of the record, it can determine the exact address of any field, and can generate code to
access the field directly. In these circumstances, field selection is a zero-cost operation!
However, note that some machines have alignment restrictions, which may force
unused space to be left between record fields. Such alignment restrictions invalidate
equations (6.4) and (6.5). (See Exercise 6.9.)
[Figure 6.2 not reproduced: a record of type T, represented by juxtaposing a value of
type T1, a value of type T2, ..., and a value of type Tn.]
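To make equations (6.4) and (6.5) concrete, here is a small illustrative sketch. (It is not
part of the Triangle compiler; the class RecordLayout and all its names are invented for
this example.)

    // A record type viewed simply as a list of field identifiers and
    // field sizes (in words), all known at compile-time.
    class RecordLayout {
      final String[] fieldIds;    // I1, ..., In
      final int[] fieldSizes;     // size T1, ..., size Tn

      RecordLayout (String[] ids, int[] sizes) {
        fieldIds = ids; fieldSizes = sizes;
      }

      // Equation (6.4): size T = size T1 + ... + size Tn.
      int size () {
        int total = 0;
        for (int s : fieldSizes) total += s;
        return total;
      }

      // Equation (6.5): the offset of field Ii is
      // size T1 + ... + size Ti-1.
      int offsetOf (String id) {
        int offset = 0;
        for (int i = 0; i < fieldIds.length; i++) {
          if (fieldIds[i].equals(id)) return offset;
          offset += fieldSizes[i];
        }
        throw new IllegalArgumentException("no field " + id);
      }
    }

For the Details record above, offsetOf("status") would yield 1 + 3 = 4, and size()
would yield 5 words, in agreement with the offsets and size computed earlier.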
Some values of type Number occupy two words; others occupy three words. This
apparently contradicts the constant-size requirement, a contradiction we wish to avoid at
all costs. We want the compiler to allocate a fixed amount of space to each variable of
type Number, and let it change form within this space. To be safe we must allocate three
words: one word for the tag field, and two words for the variant part. The fields i and r
can be overlaid within the latter two words. When the tag is true, one word is unused
(shaded gray in the diagram), but this is a small price to pay for satisfying the constant-
size requirement. Thus:

    size[[Number]] = 3 words
Now consider the following variant record type, which illustrates an empty variant
and a variant with more than one field:

    type Shape  = (point, circle, box);
         Figure = record
                    case s: Shape of
                      point:  ();
                      circle: (r: Integer);
                      box:    (h, w: Integer)
                  end;
    var fig: Figure
Every value of type Figure has a tag field, named s, and a variant part. The value of
the tag (point, circle, or box) determines the form of the variant part. If the tag is point,
the variant part is empty. If the tag is circle, the variant part is an integer field named r.
If the tag is box, the variant part is a pair of integer fields named h and w.
Assume that each primitive value occupies one word. Then the variable fig would
look like this: [diagram not reproduced]

(The enumerands point, circle, and box would be represented by small integers, as
discussed in Section 6.1.1.)
It is easy to see that:

    size[[Figure]] = 3 words
    address[[fig.s]] = address[[fig]] + 0
    address[[fig.r]] = address[[fig]] + 1
    address[[fig.h]] = address[[fig]] + 1
    address[[fig.w]] = address[[fig]] + 2
Let us now generalize. Consider a Pascal variant record type T and variable u:

    type T = record
               case Itag: Ttag of
                 v1: (I1: T1);
                 ...
                 vn: (In: Tn)
             end;
    var u: T

where each variant is labeled by one possible value of the type Ttag = {v1, ..., vn}. We
represent each record of type T by juxtaposing its tag field and variant part. Within the
variant part we overlay the different variants, which are of types T1, T2, ..., and Tn. This
representation is shown in Figure 6.3. It is clear that:

    size T = size Ttag + max (size T1, ..., size Tn)    (6.8)

This satisfies the constant-size requirement. If size Ttag, size T1, ..., and size Tn are all
constant, then size T is also constant.
The operations on variant records are easily implemented. To access the tag and
variant fields of the variant record u, we use the following address computations:

    address[[u.Itag]] = address u + 0    (6.9)
    address[[u.Ii]] = address u + size Ttag    (6.10)

- both being constant offsets from the base address of u.
This analysis can easily be generalized to variants with no fields or many fields, as in
Example 6.5.
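A sketch of equation (6.8) in the same illustrative style (again an invented helper, not
code from the book):

    // Equation (6.8): a variant record needs space for the tag plus
    // space for its largest variant, since the variants are overlaid.
    static int variantRecordSize (int tagSize, int[] variantSizes) {
      int largest = 0;
      for (int s : variantSizes)
        if (s > largest) largest = s;
      return tagSize + largest;
    }

For Figure, variantRecordSize(1, new int[] {0, 1, 2}) yields 3 words, as
computed above.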
But what is the significance of this number 102? It is just address[[grade[0]]]. We
call this address the origin of the array grade. An array's origin coincides with its base
address only if its lower bound is zero.

Similarly, address[[gnp[i]]] = address[[gnp[0]]] + i, where the origin of the array
gnp is address[[gnp[0]]] = address[[gnp]] - 2000. Of course, this particular array has
no element with index 0, but that does not prevent us from using its origin (which is just
a number!) to compute the addresses of its elements at run-time.
□
Let us now generalize. Consider a Pascal array type T and array variable a:

    type T = array [l..u] of Telem;    (6.14)
    var a: T

The constants l and u are the lower and upper index bounds, respectively, of the array
type. Each array of type T has (u - l + 1) elements, indexed from l through u. As before,
we represent each array by juxtaposing its elements, as shown in Figure 6.5. It is clear
that:

    size T = (u - l + 1) x size Telem    (6.15)

Again, this satisfies the constant-size requirement, since l and u are constant.

The element of array a with index i is addressed as follows:

    address[[a[i]]] = address a + (i - l) x size Telem
                    = address a - (l x size Telem) + (i x size Telem)

From this we can determine the origin address[[a[0]]], and use it to simplify the
formula:

    address[[a[0]]] = address a - (l x size Telem)    (6.16)
    address[[a[i]]] = address[[a[0]]] + (i x size Telem)    (6.17)

Equation (6.17) has the same form as (6.13). The only difference is that a[0] no longer
need be the first element of the array a. Indeed, a[0] might not even exist! But that
does not matter, as we saw in Example 6.7, because address[[a[0]]] is just a number.
There is more to array indexing than an address computation. An index check is also
needed, to ensure that the evaluated index lies within the array's index bounds. When an
array of the type T of (6.14) is indexed by i, the index check must ensure that:

    l <= i <= u

Since the index bounds l and u are known at compile-time, the compiler can easily
generate such an index check.
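As an illustration of equations (6.16) and (6.17) together with the index check, here is a
sketch with invented names (it mimics, in ordinary Java, what the generated object code
would do):

    // Address of a[i] for 'type T = array [l..u] of Telem',
    // where 'base' is the address of the array variable a.
    static int elementAddress (int base, int l, int u,
                               int elemSize, int i) {
      if (i < l || i > u)                      // the index check
        throw new IndexOutOfBoundsException("index " + i);
      int origin = base - l * elemSize;        // equation (6.16): a[0]
      return origin + i * elemSize;            // equation (6.17)
    }

When the compiler knows the array's address, the origin is a compile-time constant, so
only the check and the multiplication survive into the object code.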
The values of type String are arrays of characters, indexed by integers. Different
arrays of type String may have different index bounds; moreover, these index bounds
may be evaluated at run-time. Operations such as concatenation and lexicographic
comparison are applicable to any arrays of type String, even if they have different
numbers of elements. But any attempt to assign one array of type String to another
will fail at run-time unless they happen to have the same number of elements.
A suitable representation for arrays of type String is as follows. Each array's
handle contains the array's origin, i.e., the address of the (possibly notional) element
with index 0. The handle also contains the array's lower and upper index bounds. The
array's elements are stored separately.
Suppose that the variables k, m, and n turn out to have values 7, 0, and 4, respec-
tively. Then the array d will have index bounds 1 and 7, and the array s will have index
bounds 0 and 3. The arrays will look like this:
[Diagram not reproduced: each array's handle, containing the origin, lower bound, and
upper bound, points to the separately stored elements.]
Each array's handle occupies exactly 3 words (assuming that integers and addresses
occupy one word each). The elements of d occupy 7 words, whereas the elements of s
occupy 4 words (assuming that characters occupy one word each). Since the elements
are stored separately, we take size[[String]] to be the size of the handle:

    size[[String]] = 3 words
Likewise, we shall take address[[d]] to be the address of d's handle. The address of
element d(0) is stored at offset 0 within the handle. Thus the address of an arbitrary
element can be computed as follows:

    address[[d(i)]] = origin of d + (i x size[[Char]])

where the origin of d is fetched from offset 0 of d's handle.
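The run-time counterpart of this computation could look like the following sketch
(invented names; the handle layout is the 3-word one described above):

    // Index a dynamic array through its handle: word 0 holds the
    // origin, words 1 and 2 hold the lower and upper bounds.
    static int dynElementAddress (int[] store, int handleAddr,
                                  int elemSize, int i) {
      int origin = store[handleAddr];
      int lower  = store[handleAddr + 1];
      int upper  = store[handleAddr + 2];
      if (i < lower || i > upper)              // run-time index check
        throw new IndexOutOfBoundsException("index " + i);
      return origin + i * elemSize;
    }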
Let us now generalize. Consider an Ada array type T and array variable a:

    type T is array (Integer range <>) of Telem;    (6.19)
The object code for expression evaluation in registers is efficient but rather compli-
cated. A compiler generating such code must assign a specific register to each
intermediate result. It is important to do this well, but quite tricky. In particular, a
problem arises when there are not enough registers for all the intermediate results. (See
Exercise 6.11.)
A very different kind of machine is one that provides a stack for holding
intermediate results. This allows us to evaluate expressions in a very natural way. Such
a machine typically provides instructions like those listed in Table 6.2.
In Figure 6.7 and throughout this book, the stack is shown growing downwards, with the stack
top nearest the bottom of the diagram. If this convention seems perverse, recall the convention
for drawing trees in computer science textbooks! Shading indicates the unused space beyond
the stack top.
These desirable and simple properties of evaluation on the stack hold true regardless
of how complicated the expression is. An expression involving function calls can be
evaluated in just the same way. Likewise, an expression involving operands of different
types (and therefore different sizes) can be evaluated in just the same way.
And so on.
The collection of registers LB, L1, L2, ..., and SB is often called the display. The
display allows access to local, nonlocal, and global variables. The display changes
whenever a routine is called or returns.
The critical property of the display is that the compiler can always determine which
register to use to access any variable. A global variable is always addressed relative to
SB. A local variable is always addressed relative to LB. A nonlocal variable is addressed
relative to one of the registers L1, L2, .... The appropriate register is determined entirely
by the nesting levels of the routines in the source program.
We assign routine levels as follows: the main program is at routine level 0; the body
of each routine declared at level 0 is at routine level 1; the body of each routine declared
at level 1 is at routine level 2; and so on.
Let v be a variable declared at routine level l, and let v's address displacement be d.
Then the current value of v is fetched by various parts of the code as follows:

    If l = 0 (i.e., v is a global variable):
        LOAD d[SB]    - for any code to fetch the value of v
    If l = cl, the routine level of the fetching code (i.e., v is a local variable):
        LOAD d[LB]
    If 0 < l < cl (i.e., v is a nonlocal variable):
        LOAD d[Ln], where n = cl - l
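In other words, the register is a pure compile-time function of the two routine levels. A
sketch (an invented helper returning the register's name, not code from this book):

    // Which display register addresses a variable declared at routine
    // level l, from code at routine level cl?
    static String displayRegister (int l, int cl) {
      if (l == 0)       return "SB";            // global variable
      else if (l == cl) return "LB";            // local variable
      else              return "L" + (cl - l);  // nonlocal: L1, L2, ...
    }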
Routines
A routine (or subroutine) is the machine-code equivalent of a procedure or function in a
high-level language. Control is transferred to a routine by means of a call instruction (or
instruction sequence). Control is transferred back to the caller by means of a return
instruction in the routine.
When a routine is called, some arguments may be passed to it. An argument could
be, for example, a value or an address. There may be zero, one, or many arguments. A
routine may also return a result - that is, if it corresponds to a function in the high-level
language.
We have already studied one aspect of routines, namely allocation of storage for
local variables. In this section we study other important aspects:
protocols for passing arguments to routines and returning their results
how static links are determined
[Diagram not reproduced: seven snapshots of the stack, taken just before and after calls
to and returns from the routines involved.]
6.26* Using the class definitions from Exercise 6.25, consider the following hypo-
thetical class definition:

    class TeachingAssistant extends Staff, Student
A special-case code template is worth having if phrases of the special form occur fre-
quently, and if they allow translation into particularly efficient object code. The follow-
ing example illustrates another common special case.
Code template (7.12a) specifies that the code 'elaborate [[const n ~ 7]]' will deposit
the value 7 in a suitable cell (at the current stack top). Whenever n is used, code
template (7.10) specifies that the value will be loaded from that cell. The following
translation illustrates these code templates:
    execute [[let const n ~ 7;
                  var i: Integer
              in  i := n*n]] =

        LOADL 7        elaborate [[const n ~ 7]]
        PUSH 1         elaborate [[var i: Integer]]
        LOAD n         \
        LOAD n          | execute [[i := n*n]]
        CALL mult       |
        STORE i        /
        POP(0) 2

The first instruction 'LOADL 7' makes space for the constant n on the stack top.
Instructions of the form 'LOAD n' fetch the constant's value, wherever required. The
final instruction 'POP(0) 2' pops the constant and variable off the stack.
A much better translation is possible: simply use the literal value 7 wherever n is
fetched. This special treatment is possible whenever an identifier is bound to a known
value in a constant declaration. This is expressed by the following special-case code
templates:

    fetch [[I]] =                                          (7.16)
        LOADL v    where v = value bound to I (if known)

    elaborate [[const I ~ IL]] =                           (7.17)
        (i.e., no code)

In (7.17) no code is required to elaborate the constant declaration. It is sufficient that the
value of the integer-literal IL is bound to I for future reference. In (7.16) that value is
incorporated into a LOADL instruction. Thus the object code is more efficient in both
places. The following alternative translation illustrates these special-case code
templates:

    execute [[let const n ~ 7;
                  var i: Integer
              in  i := n*n]] =

        PUSH 1         elaborate [[var i: Integer]]
        LOADL 7        \
        LOADL 7         | execute [[i := n*n]]
        CALL mult       |
        STORE i        /
        POP(0) 1
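In the code generator, this special case shows up wherever an applied identifier is
fetched: if its entity description is a KnownValue, a LOADL instruction can be emitted
directly. A sketch (the field name value is an assumption, and the emit operands follow
the pattern used elsewhere in this chapter):

    // Inside a fetch routine of the encoder: template (7.16).
    if (entity instanceof KnownValue) {
      short v = ((KnownValue) entity).value;    // assumed field name
      emit(Instruction.LOADLop, 0, 0, v);       // LOADL v
    }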
Many of these visitor methods will simply be encoding methods. For example, the
visitor/encoding methods for commands will be visitAssignCommand, visit-
CallCommand, etc., and their implementations will be determined by the code
templates for 'execute [[V := E]]', 'execute [[I(E)]]', etc.
Table 7.2 Summary of visitor/encoding methods for the Mini-Triangle code generator.

    Phrase class    Visitor/encoding method    Behavior of visitor/encoding method
    Program         visitProgram               Generate code as specified by 'run P'.
    Command         visit...Command            Generate code as specified by 'execute C'.
    Expression      visit...Expression         Generate code as specified by 'evaluate E'.
    V-name          visit...Vname              Return an entity description for the given
                                               value-or-variable-name (explained in Section 7.3).
    Declaration     visit...Declaration        Generate code as specified by 'elaborate D'.
    Type-denoter    visit...TypeDenoter        Return the size of the given type.
for control structures. Thereafter, Sections 7.3 and 7.4 deal with the problems of
generating code for declared constants and variables, procedures, functions, and
parameters.
      short g = nextInstrAddr;                  // g:
      com.C.visit(this, arg);                   //   execute C
      short h = nextInstrAddr;                  // h:
      patch(j, h);
      com.E.visit(this, arg);                   //   evaluate E
      emit(Instruction.JUMPIFop, 1,
           Instruction.CBr, g);                 //   JUMPIF(1) g
      return null;
    }

    public Object visitIfCommand               // execute [[if E then C1
        (IfCommand com, Object arg) {          //            else C2]] =
      com.E.visit(this, arg);                  //   evaluate E
      short i = nextInstrAddr;                 // i:
      emit(Instruction.JUMPIFop, 0,
           Instruction.CBr, 0);                //   JUMPIF(0) g
      com.C1.visit(this, arg);                 //   execute C1
      short j = nextInstrAddr;                 // j:
      emit(Instruction.JUMPop, 0,
           Instruction.CBr, 0);                //   JUMP h
      short g = nextInstrAddr;                 // g:
      patch(i, g);
      com.C2.visit(this, arg);                 //   execute C2
      short h = nextInstrAddr;                 // h:
      patch(j, h);
      return null;
    }
Here we have used the following auxiliary method for patching instructions:

    private void patch (short addr, short d) {
      // Store d in the operand field of the instruction at address addr.
      code[addr].d = d;
    }
[Figure not reproduced: a decorated AST in which one declaration's entity description
records a known address and another's records an unknown value.]

Figure 7.2 Entity descriptions for a known address and an unknown value.
Consider now a source language with procedures and local variables. As explained in
Section 6.4, stack storage allocation is appropriate for such a language. The code
generator cannot predict a local variable's absolute address, but it can predict the
variable's address displacement relative to the base of a frame - a frame belonging to
the procedure within which the variable was declared. At run-time, a display register
will point to the base of that frame, and the variable can be addressed relative to that
register. The appropriate register is determined entirely by a pair of routine levels
known to the code generator: the routine level of the variable's declaration, and the
routine level of the code that is addressing the variable. (See Section 6.4.2 for details.)
To make the code generator implement stack storage allocation, we must modify the
form of addresses in entity descriptions. The address of a variable will now be held as a
pair (l, d), where l is the routine level of the variable's declaration, and d is the
variable's address displacement relative to its frame base. As in Section 6.4.2, we assign
a routine level of 0 to the main program, a routine level of 1 to the body of each
procedure or function declared at level 0, a routine level of 2 to the body of each
procedure or function declared at level 1, and so on.
The following methods show how the entity descriptions are now set up:

    public Object visitConstDeclaration
        (ConstDeclaration decl, Object arg) {
      Frame frame = (Frame) arg;
      if (decl.E instanceof IntegerExpression) {
        IntegerLiteral IL =
            ((IntegerExpression) decl.E).IL;
        decl.entity = new KnownValue
            (1, valuation(IL.spelling));
        return new Short((short) 0);
      } else {
        short s =
            shortValueOf(decl.E.visit(this, frame));
        decl.entity = new UnknownValue
            (s, frame.level, frame.size);
        return new Short(s);
      }
    }
    public Object visitVarDeclaration
        (VarDeclaration decl, Object arg) {
      Frame frame = (Frame) arg;
      short s = shortValueOf(decl.T.visit(this, null));
      emit(Instruction.PUSHop, 0, 0, s);
      decl.entity = new KnownAddress
          (1, frame.level, frame.size);
      return new Short(s);
    }
When the appropriate visitor/encoding method is called to translate a procedure body,
the frame level must be incremented by one and the frame size set to 3, leaving just
enough space for the link data:

    Frame outerFrame = ... ;
    Frame localFrame = new Frame(outerFrame.level + 1, 3);
Finally, method encode starts off with a frame at level 0 and with no storage
allocated:

    public void encode (Program prog) {
      Frame globalFrame = new Frame(0, 0);
      prog.visit(this, globalFrame);
    }
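For reference, the Frame argument threaded through these methods can be pictured as
the following minimal class (a sketch consistent with the uses of frame.level and
frame.size above; the exact declaration is not shown in this excerpt):

    public class Frame {
      public byte level;    // routine level of the frame
      public short size;    // number of words allocated in the frame so far

      public Frame (int level, int size) {
        this.level = (byte) level;
        this.size = (short) size;
      }
    }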
    elaborate [[proc I () ~ C]] =               (7.23)
            JUMP g
        e:  execute C
            RETURN(0) 0
        g:
The generated routine body consists simply of the object code 'execute C' followed by a
RETURN instruction. The two zeros in the RETURN instruction indicate that the routine
has no result and no arguments. Since we do not want the routine body to be executed at
the point where the procedure is declared, only where the procedure is called, we must
generate a jump round the routine body. The routine's entry address, e, must be bound
to I for future reference.
The code template specifying translation of a procedure call would be:

    execute [[I()]] =                           (7.24)
        CALL(SB) e    where e = entry address of routine bound to I

This is straightforward. The net effect of executing this CALL instruction will be simply
to execute the body of the routine bound to I.
□
Example 7.18 Object code for Mini-Triangle plus global procedures

The following extended Mini-Triangle program illustrates a procedure declaration and
call:

    let
        var n: Integer;
        proc P () ~
            n := n * 2
    in
        begin
            n := 9;
            P()
        end
The corresponding object program illustrates code templates (7.23) and (7.24):

    elaborate [[var n: Integer]]        0: PUSH 1
    elaborate [[proc P () ~             1: JUMP 7
        n := n*2]]                      2: LOAD 0[SB]   \
                                        3: LOADL 2       | execute [[n := n*2]]
                                        4: CALL mult     |
                                        5: STORE 0[SB]  /
                                        6: RETURN(0) 0
    execute [[begin n := 9;             7: LOADL 9
        P() end]]                       8: STORE 0[SB]
                                        9: CALL(SB) 2
                                       10: POP(0) 1
                                       11: HALT

The corresponding decorated AST and entity descriptions are shown in Figure 7.4.
□
A function is translated in much the same way as a procedure. The only essential
difference is in the code that returns the function result.
[Figure 7.4 not reproduced: the decorated AST of the program, in which the entity
description of P records a known routine address with entry address 2.]
7.4.3 Parameters
Now let us consider how the code generator implements parameter passing. Every
source language has one or more parameter mechanisms, the means by which
arguments are associated with the corresponding formal parameters.
As explained in Section 6.5.1, a routine protocol is needed to ensure that the calling
code deposits the arguments in a place where the called routine expects to find them. If
the operating system does not impose a routine protocol, the language implementor must
design one, taking account of the source language's parameter mechanisms and the
target machine architecture.
¹ In principle, nul in this example could be treated as bound to a known value. However, the
code generator would have to be enhanced to evaluate the expression 'chr(0)' itself, using a
technique called constant folding.
now nearly all programs - even operating systems - are written in high-level languages.
So it makes more sense for the machine to support the code generator by, for example,
providing a simple regular instruction set. A lucid discussion of the interaction between
code generation and machine design may be found in Wirth (1986).
Almost all real machines have general-purpose and/or special-purpose registers;
some have a stack as well. The number of registers is usually small and always limited.
It is quite hard to generate object code that makes effective use of registers. Code
generation for register machines is therefore beyond the scope of this introductory
textbook. For a thorough treatment, see Chapter 9 of Aho et al. (1985).
The code generator described in this chapter works in the context of a multi-pass
compiler: it traverses an AST that represents the entire source program. In the context of
a one-pass compiler, the code generator would be structured rather differently: it would
be a collection of methods, which can be called by the syntactic analyzer to generate
code 'on the fly' as the source program is parsed. For a clear account of how to organize
code generation in a one-pass compiler, see Welsh and McKeag (1980).
The sheer diversity of machine architectures is a problem for implementors. A
common practice among software vendors is to construct a family of compilers, trans-
lating a single source language to several different target machine languages. These
compilers will have a common syntactic analyzer and contextual analyzer, but a distinct
code generator will be needed for each target machine. Unfortunately, a code generator
suitable for one target machine might be difficult or impossible to adapt to a dissimilar
target machine. Code generation by pattern matching is an attractive way to reduce the
amount of work to be done. In this method the semantics of each machine instruction is
expressed in terms of low-level operations. Each source-program command is translated
to a combination of these low-level operations; code generation then consists of finding
an instruction sequence that corresponds to the same combination of operations. A
survey of code generation by pattern matching may be found in Ganapathi et al. (1982).
Fraser and Hanson (1995) describe in detail a C compiler with three alternative
target machines. This gives a clear insight into the problems of code generation for
dissimilar register machines.
Exercises
Section 7.1
7.1 The Triangle compiler uses code template (7.8e) for while-commands, but
many compilers use the following alternative code template:

        execute [[while E do C]] =
            g:  evaluate E
                JUMPIF(0) h
                execute C
                JUMP g
            h:

    Convince yourself that the alternative code template is semantically equivalent
    to (7.8e).

    Apply the alternative code template to determine the object code of:

        execute [[while n > 0 do n := n - 2]]

    Compare with Example 7.3, and show that the object code is less efficient.
    Why, do you think, is the alternative code template commonly used?
(b) let D in E

    This is a block expression: the declaration D is elaborated, and the resultant
    bindings are used in the evaluation of E.
Section 7.2
7.4* Implement the visitor/encoding methods visit ...Expression (along the
lines of Example 7.8) for the expressions of Exercise 7.3.
Section 7.3
7.6 Classify the following declarations according to whether they bind identifiers
to known or unknown values, variables, or routines.
(a) Pascal constant, variable, and procedure declarations, and Pascal value,
variable, and procedural parameters.
Section 7.4
7.9* Modify the Mini-Triangle code generator to deal with parameterized
procedures, using the code templates of Example 7.24.
CHAPTER EIGHT

Interpretation
The following method is the emulator proper. Its control structure is a switch-
statement within a loop, preceded by initialization of the registers. Each case of the
switch-statement follows directly from Table 8.1.
public void emulate () {
/ / Initialize ...
PC = 0; ACC = 0; status = RUNNING;
This emulator has been kept as simple as possible, for clarity. But it might behave
unexpectedly if, for example, an ADD or SUB instruction overflows. A more robust
version would set status to FAILED in such circumstances. (See Exercise 8.1.)
□
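For example, the ADD case of the emulator could guard against overflow along these
lines (a sketch: Hypo's exact instruction decoding, accumulator width, and operand
names are assumptions here):

    case ADDop: {
      int result = ACC + data[d];               // widen before adding
      if (result > Short.MAX_VALUE || result < Short.MIN_VALUE)
        status = FAILED;                        // overflow detected
      else
        ACC = (short) result;
      break;
    }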
When we write an interpreter like that of Example 8.1, it makes no difference
whether we are interpreting a real machine code or an abstract machine code. For an
abstract machine code, the interpreter will be the only implementation. For a real
machine code, a hardware interpreter (processor) will be available as well as a software
interpreter (emulator). Of these, the processor will be much the faster. But an emulator
is much more flexible than a processor: it can be adapted cheaply for a variety of
purposes. An emulator can be used for experimentation before the processor is ever
constructed. An emulator can also easily be extended for diagnostic purposes. (Exercises
8.2 and 8.3 suggest some of the possibilities.) So, even when a processor is available, an
emulator for the same machine code complements it nicely.
    print filename number     Print the given number of copies of the named file.
    filename arg1 ... argn    Run the executable program contained in the named
                              file, with the given arguments.
Production rules for Filename and Literal have been omitted here.

In the Mini-Shell interpreter, we can represent commands as follows:

    public class MiniShellCommand {
      public String name;
      public String[] args;
    }
The following class represents the Mini-Shell state:

    public class MiniShellState {
      // File store ...
      public ... ;
      // Registers ...
      public byte status;   // RUNNING or HALTED or FAILED
      public static final byte   // status values
          RUNNING = 0, HALTED = 1, FAILED = 2;
    }
There is no need for either a code store or a code pointer, since each command will be
executed only once, as soon as it is entered.
The following class will implement the Mini-Shell interpreter:

    public class MiniShell extends MiniShellState {

      // Initialize ...
      status = RUNNING;
      do {
        // Fetch and analyze the next instruction ...
        MiniShellCommand com = readAnalyze();
        // Execute this instruction ...
        ...
        else if (com.name.equals("delete"))
          delete(com.args);
        else if (com.name.equals("edit"))
          edit(com.args[0]);
        else if (com.name.equals("list"))
          list();
        else if (com.name.equals("print"))
          print(com.args[0], com.args[1]);
        else if (com.name.equals("quit"))
          status = HALTED;
        else   // executable program
          exec(com.name, com.args);
      } while (status == RUNNING);
(a) Source text: Each command must be scanned and parsed at run-time (i.e., every
time the command is fetched from the code store).
(b) Token sequence: Each command must be scanned at load-time, and parsed at run-
time.
(c) AST: All commands must be scanned and parsed at load-time.
Choice (a), illustrated in Figure 8.3, would slow the interpreter drastically. Choice (c) is
better but would slow the loader somewhat. Choice (b) is a reasonable compromise, so
let us adopt it here:
    class Token {
      byte kind;
      String spelling;
    }

    class ScannedCommand {
      Token[] tokens;
    }
Later we shall define concrete subclasses for particular forms of commands and expres-
sions. These will implement the methods execute and evaluate, which we shall
call interpreting methods.

Note that we must allow the interpreting methods to access the state of the Mini-
Basic abstract machine, hence their argument state. The following class will represent
the abstract machine state:
    public class MiniBasicState {
      public static final short CODESIZE = 4096;
      public static final short DATASIZE = 26;
      // Code store ...
      public ScannedCommand[] code =
          new ScannedCommand[CODESIZE];
      // Data store ...
      public float[] data = new float[DATASIZE];
      // Registers ...
      public short CP;
      public byte status;
      public static final byte   // status values
          RUNNING = 0, HALTED = 1, FAILED = 2;
    }

      // Initialize ...
      CP = 0; status = RUNNING;
      do {
        ...
        // Fetch the next instruction ...
        ScannedCommand scannedCom = code[CP++];
        // Analyze this instruction ...
        Command analyzedCom = parse(scannedCom);
        // Execute this instruction ...
        analyzedCom.execute((MiniBasicState) this);
      } while (status == RUNNING);
    }
Now we must define how to represent and execute analyzed commands. We introduce
a subclass of Command for each form of command in Mini-Basic:

    public class AssignCommand extends Command {
      byte V;         // left-side variable address
      Expression E;   // right-side expression
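A sketch of how the interpreting method of AssignCommand might be completed (the
exact signature of evaluate is an assumption):

      public void execute (MiniBasicState state) {
        // Assign the value of the right-side expression to the
        // left-side variable in the data store.
        state.data[V] = E.evaluate(state);
      }
    }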
The alternative to dynamic method selection would have been to make the interpret-
er test the subclass of each command before executing it, along the following lines:

    // Execute this instruction ...
    if (analyzedCom instanceof AssignCommand) {
      AssignCommand com = (AssignCommand) analyzedCom;
      data[com.V] = evaluate(com.E);
    }
    else if (analyzedCom instanceof GoCommand) {
      GoCommand com = (GoCommand) analyzedCom;
      CP = com.L;
    }
    else ...

But this would not be in the true spirit of object-oriented design!
Examples 1.3 and 1.8. Assume that the analyzed program is to be represented by a
decorated AST. The source program will be subjected to syntactic and contextual
analysis, and also storage allocation, before execution commences.

We must choose a representation of Mini-Triangle values. These include not only
truth values and integers, but also undefined (which is the initial value of a variable).
The following classes represent all these types of values:

    public abstract class Value { }

    public class IntValue extends Value {
      public short i;
    }

    public class BoolValue extends Value {
      public boolean b;
    }

    public class UndefinedValue extends Value {
    }
This Mini-Triangle processor is a visitor object (see Section 5.3.2), in which the visitor
methods act as interpreting methods.
      // Code store ...
      public Instruction[] code =
          new Instruction[CODESIZE];
      // Data store ...
      public short[] data = new short[DATASIZE];
      // Registers ...
      public final short CB = 0;
      public short CT;
      public final short PB = CODESIZE;
      public final short PT = CODESIZE + 28;
      public final short SB = 0;
      public short ST;
      public final short HB = DATASIZE;
      public short HT;
      public short LB;
      public short CP;
      public byte status;
      public static final byte   // status values
          RUNNING = 0, HALTED = 1, FAILED = 2;
    }

The following class implements the TAM interpreter proper:

    public class Interpreter extends State {

      // Initialize ...
      ST = SB; HT = HB; LB = SB; CP = CB;
      status = RUNNING;
      do {
        // Fetch the next instruction ...
        Instruction instr = code[CP++];
        // Analyze this instruction ...
        byte op = instr.op;
        byte r = instr.r;
        byte n = instr.n;
        short d = instr.d;
        // Execute this instruction ...
        switch (op) {
          case LOADop:   ...
          case LOADAop:  ...
          case LOADIop:  ...
          case LOADLop:  ...
          case STOREop:  ...
          case STOREIop: ...
          case CALLop:   ...
          case CALLIop:  ...
          case RETURNop: ...
          case PUSHop:   ...
          case POPop:    ...
          case JUMPop:   ...
          case JUMPIFop: ...
          case HALTop:   status = HALTED; break;
          default:       status = FAILED;
        }
      } while (status == RUNNING);
    }
The fact that TAM is a stack machine gives rise to many differences in detail from
an interpreter for a register machine. Load instructions push values on to the stack, and
store instructions pop values off the stack. For example, the TAM LOADL instruction is
interpreted as follows:
    case LOADLop:
      data[ST++] = d;
      break;

(Register ST points to the word immediately above the stack top, as shown in
Figure C.1.)
Further differences arise from the special design features of TAM (outlined in
Section 6.8).
For example, the LOAD and STORE instructions (on the simplifying assumption that the
length field n is 1) would be interpreted as follows:

    case LOADop: {
      short addr = relative(d, r);
      data[ST++] = data[addr];
      break;
    }

    case STOREop: {
      short addr = relative(d, r);
      data[addr] = data[--ST];
      break;
    }
The operand of a CALL, JUMP, or JUMPIF instruction is also of the form 'd[r]',
where r is generally CB or PB, and d is a constant displacement. As usual, the displace-
ment d is added to the content of register r. The auxiliary method relative also
handles these cases.
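A minimal sketch of relative, assuming register codes SBr, LBr, and PBr analogous
to the Instruction.CBr constant used earlier (the display registers L1, L2, ... are
omitted for brevity):

    private short relative (short d, byte r) {
      // Map an operand of the form d[r] to an absolute address by
      // adding the displacement d to the content of register r.
      switch (r) {
        case Instruction.SBr: return (short) (SB + d);
        case Instruction.LBr: return (short) (LB + d);
        case Instruction.CBr: return (short) (CB + d);
        case Instruction.PBr: return (short) (PB + d);
        default:              status = FAILED; return 0;
      }
    }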
indeed. Its control structures were more typical of a low-level language, making it
unattractive for serious programmers. More recently, 'structured' dialects of Basic have
become more popular, and compilation has become an alternative to interpretation.
Recursive interpretation is less common. However, this form of interpretation has
long been associated with Lisp (McCarthy et al. 1965). A Lisp program is not just
represented by a tree: it is a tree! Several features of the language - dynamic binding,
dynamic typing, and the possibility of manufacturing extra program code at run-time -
make interpretation of Lisp much more suitable than compilation. A description of a
Lisp interpreter may be found in McCarthy et al. (1965). Lisp has always had a devoted
band of followers, but not all are prepared to tolerate slow execution. A more recent
successful dialect, Scheme (Kelsey et al. 1998), has discarded Lisp's problematic
features in order to make compilation feasible.
It is noteworthy that two popular programming languages, Basic and Lisp, both
suitable for interpretation but otherwise utterly different, have evolved along somewhat
parallel lines, spawning structured dialects suitable for compilation!
Another example of a high-level language suitable for interpretation is Prolog. This
language has a very simple syntax, a program being a flat collection of clauses, and it
has no scope rules and few type rules to worry about. Interpretation is almost forced by
the ability of a program to modify itself by adding and deleting clauses at run-time.
Exercises
8.1 Make the Hypo interpreter of Example 8.1 detect the following exceptional
conditions, and set the status register accordingly:
(a) overflow;
(b) invalid instruction address;
(c) invalid data address.
(Assume that Hypo may have less than 4096 words of code store and less than
4096 words of data store, thus making conditions (b) and (c) possible.)

8.2 Make the Hypo interpreter of Example 8.1 display a summary of the machine
state after executing each instruction. Display the contents of ACC and CP, the
instruction just executed, and a selected portion of the data store.

8.3 Write an emulator for a real machine with which you are familiar.
Expressions, operators, and variables are unchanged, but labels are removed.
Write a recursive interpreter for this structured dialect.
8.10** The TAM interpreter (Section 8.3) sacrifices efficiency for clarity. For
example, the fetch/analyze/execute cycle could be combined and replaced by a
single switch-statement of the form:
    switch ((instr = code[CP++]).op) {
case LOADop: ...
Another efficiency gain could be achieved by holding the top one or two stack
elements in simple variables, and possibly avoiding the unnecessary updating
of the stack pointer during a long sequence of arithmetic operations. (This is
effectively turning TAM into a register machine!)
Consider these and other possible improvements to the TAM interpreter, and
develop a more efficient implementation. Compare your version with the origi-
nal TAM interpreter, and measure the performance gain.
CHAPTER NINE
Conclusion
In any case, the language might have to be redesigned, respecified, and reimple-
mented, perhaps several times. This is bound to be costly, i.e., time-consuming and ex-
pensive. It is necessary, therefore, to plan the life cycle in order to minimize costs.
Figure 9.1 illustrates a life cycle model that has much to recommend it. Design is
immediately followed by specification. (This is needed to communicate the design to
implementors and programmers.) Development of a prototype follows, and development
of compilers follows that. Specification, prototyping, and compiler development are
successively more costly, so it makes sense to order them in this way. The designer gets
the fastest possible feedback, and costly compiler development is deferred until the
language design has more or less stabilized.
[Figure 9.1 not reproduced: the language life cycle, in which design is followed by
specification (from which manuals are produced), then by prototypes, and finally by
compilers.]
9.1.1 Design
The essence of programming language design is that the designer selects concepts and
decides how to combine them. This selection is, of course, determined largely by the
intended use of the language. A variety of concepts have found their way into program-
ming languages: basic concepts such as values and types, storage, bindings, and abstrac-
tion; and more advanced concepts such as encapsulation, polymorphism, exceptions,
and concurrency. A single language that supports all these concepts is likely to be very
large and complex indeed (and its implementations will be large, complex, and costly).
Therefore a judicious selection of concepts is necessary.
different contexts (assignment, array indexing, loop parameters); whereas Algol from
the start had just one class of expression, permissible in all contexts.

Similarly, formal specification of semantics tends to encourage semantic simplicity
and regularity. Unfortunately, few language designers yet attempt this. Semantic
formalisms are much more difficult to master than BNF. Even then, writing a semantic
specification of a real programming language (as opposed to a toy language) is a
substantial task. Worst of all, the designer has to specify, not a stable well-understood
language, but one that is gradually being designed and redesigned. Most semantic
formalisms are ill-suited to meet the language designer's requirements, so it is not
surprising that almost all designers content themselves with writing informal semantic
specifications.
The advantages of formality and the disadvantages of informality should not be
underestimated, however. Informal specifications have a strong tendency to be inconsis-
tent or incomplete or both. Such specification errors lead to confusion when the langu-
age designer seeks feedback from colleagues, when the new language is implemented,
and when programmers try to learn the new language. Of course, with sufficient invest-
ment of effort, most specification errors can be detected and corrected, but an informal
specification will probably never be completely error-free. The same amount of effort
could well produce a formal specification that is at least guaranteed to be precise.

The very act of writing a specification tends to focus the designer's mind on aspects
of the design that are incomplete or inconsistent. Thus the specification exercise
provides valuable and timely feedback to the designer. Once the design is completed,
the specification (whether formal or informal) will be used to guide subsequent
implementations of the new language.
9.1.3 Prototypes

A prototype is a cheap low-quality implementation of a new programming language.
Development of a prototype helps to highlight any features of the language that are hard
to implement. The prototype also gives programmers an early opportunity to try out the
language. Thus the language designer gains further valuable feedback. Moreover, since
a prototype can be developed relatively quickly, the feedback is timely enough to make
a language revision feasible. A prototype might lack speed and good error reporting; but
these qualities are deliberately sacrificed for the sake of rapid implementation.

For a suitable programming language, an interpreter might well be a useful
prototype. An interpreter is very much easier and quicker to implement than a compiler
for the same language. The drawback of an interpreter is that an interpreted program
will run perhaps 100 times more slowly than an equivalent machine-code program.
Programmers will quickly tire of this enormous inefficiency, once they pass the stage of
trying out the language and start to use it to build real applications.
A more durable form of prototype is an interpretive compiler. This consists of a
translator from the programming language to some suitable abstract machine code,
together with an interpreter for the abstract machine. The interpreted object program
will run 'only' about 10 times more slowly than a machine-code object program.
Developing the compiler and interpreter together is still much less costly than
developing a compiler that translates the programming language to real machine code.
Indeed, a suitable abstract machine might be available 'off the shelf', saving the cost of
writing the interpreter.
Another method of developing the prototype implementation is to implement a
translator from the new language into an existing high-level language. Such a translation
is usually straightforward (as long as the target language is chosen with care). Clearly
the existing target language must already be supported by a suitable implementation.
This was precisely the method chosen for the first implementation of C++, which used
the cfront translator to convert the source program into C.
Development of the prototype must be guided by the language specification, whether
the specification is formal or informal. The specification tells the implementor which
programs are well-formed (i.e., conform to the language's syntax and contextual
constraints) and what these programs should do when run.
9.1.4 Compilers
A prototype is not suitable for use over an extended period by a large number of
programmers building real applications. When it has served its purpose of allowing
programmers to try out the new language and provide feedback to the language
designer, the prototype should be superseded by a higher-quality implementation. This
is invariably a compiler - or, more likely, a family of compilers, generating object code
for a number of target machines. Such a high-quality implementation is referred to as an
industrial-strength compiler.
The work that went into developing a prototype need not go to waste. If the
prototype was an interpretive compiler, for example, we can bootstrap it to make a
compiler that generates real machine code (see Section 2.6).
Development of compilers must be guided by the language specification. A syntactic
analyzer can be developed systematically from the source language's syntactic specifi-
cation (see Chapter 4). A specification of the source language's scope rules and type
rules should guide the development of a contextual analyzer (see Chapter 5). Finally, a
specification of the source language's semantics should guide the development of a code
specification, which should in turn be used to develop a code generator systematically
(see Chapter 7).
In practice, contextual constraints and semantics are rarely specified formally. If we
compare separately-developed compilers for the same language, we often find that they
are consistent with respect to syntax, but inconsistent with respect to contextual con-
straints and semantics. This is no accident, because syntax is usually specified formally,
and therefore precisely, and everything else informally, leading inevitably to misunder-
standing.
To facilitate error recovery during type checking, it is useful for the type checker to
ascribe a special improper type, error-type, to any ill-typed expression. The type
checker can then ignore error-type whenever it is subsequently encountered. This
technique would avoid both the spurious error reports mentioned in Example 9.2.
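As a rough illustration, here is a minimal Java sketch of the error-type technique; the
class and method names are invented for this sketch and are not the Triangle compiler's
own code:

    class ErrorTypeDemo {
        static final String INT = "int", ERROR = "error-type";

        // Infer the type of 'E1 + E2' without cascading spurious reports.
        static String checkAdd(String left, String right) {
            if (left.equals(ERROR) || right.equals(ERROR))
                return ERROR;                 // already reported: stay quiet
            if (left.equals(INT) && right.equals(INT))
                return INT;
            System.err.println("operands of + must be integers");
            return ERROR;                     // report once, then propagate
        }
    }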
As these examples illustrate, it is easy for a compiler to discover that the source
program is ill-formed, and to generate error reports; but it is difficult to ensure that the
compiler never generates misleading error reports. There is a genuine tension between
the task of compiling well-formed source programs and the need to make some sense of
ill-formed programs. A compiler is structured primarily to deal with well-formed source
programs, so it must be enhanced with special error recovery algorithms to make it deal
reasonably with ill-formed programs.
' If the language is dynamically typed, i.e., a variable can take values of different types at
different times, then type errors also are run-time errors. However, we do not consider
dynamically-typed languages here.
...
end
Assume that characters and integers occupy one word each, and that the addresses of
global variables name and i are 200 and 204, respectively. Thus name occupies words
200 through 203; and the address of name[i] is 200 + i, provided that 0 <= i <= 3.
The Triangle compiler does not currently generate index checks. The assignment
command at (1) will be translated to object code like this (omitting some minor details):

    LOADL 48        - fetch the blank character
    LOAD 204        - fetch the value of i
    LOADL 200       - fetch the address of name[0]
    CALL add        - compute the address of name[i]
    STOREI          - store the blank character at that address
This code is dangerous. If the value of i is out of range, the blank character will be
stored, not in an element of name, but in some other variable - possibly of a different
type. (If the value of i happens to be 4, then i itself will be corrupted in this way.)
We could correct this deficiency by making the compiler generate object code with
index checks, like this:
    LOADL 48         - fetch the blank character
    LOAD 204         - fetch the value of i
    LOADL 0          - fetch the lower bound of name
    LOADL 3          - fetch the upper bound of name
    CALL rangecheck  - check that the index is within range
    LOADL 200        - fetch the address of name[0]
    CALL add         - compute the address of name[i]
    STOREI           - store the blank character at that address

The index check consists of the three instructions that load the bounds and call
rangecheck. The auxiliary routine rangecheck, when called with arguments i, m,
and n, is supposed to return i if m <= i <= n, or to fail otherwise. The space cost of
the index check is three instructions, and the time cost is three instructions plus the
time taken by rangecheck itself.
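The behavior expected of rangecheck can be sketched in Java; this is an assumption
about its semantics based on the description above, not the actual TAM primitive:

    class RuntimeChecks {
        // Return the index when m <= i <= n, otherwise fail at run-time.
        static int rangecheck(int i, int m, int n) {
            if (m <= i && i <= n)
                return i;
            throw new RuntimeException(
                "index " + i + " out of range " + m + ".." + n);
        }
    }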
Software run-time checks are expensive in terms of object-program size and speed.
Without them, however, the object program might overlook a run-time error, eventually
failing somewhere else, or terminating with meaningless results. And, let it be empha-
sized, if a compiler generates object programs whose behavior differs from the language
specification, it is simply incorrect. The compiler should, at the very least, allow the
programmer the option of including or suppressing run-time checks. Then a program's
unpredictable behavior would be the responsibility of the programmer who opts to
suppress run-time checks.
Whether the run-time check is performed by hardware or software, there remains the
problem of generating a suitable error report. This should not only describe the nature of
the error (e.g., 'arithmetic overflow' or 'index out of range'), but should also locate it in
the source program. An error report stating that overflow occurred at instruction address
1234 (say) would be unhelpful to a programmer who is trying to debug a high-level
language program. A better error report would locate the error at a particular line in the
source program.
The general principle here is that error reports should relate to the source program
rather than the object program. Another example of this principle is a facility to display
the current values of variables during or after the running of the program. A simple
storage dump is of little value: the programmer cannot understand it without a detailed
knowledge of the run-time organization assumed by the compiler (data representation,
storage allocation, layout of stack frames, layout of the heap, etc.). Better is a symbolic
dump that displays each variable's source-program identifier, together with its current
value in source-language syntax.
This information is hard to understand, to put it mildly. It is not clear which array
indexing operation failed. There is no indication that some of the words in the data store
constitute an array. There is no distinction between different types of data such as
integers and characters.
The following error report and storage dump are expressed more helpfully in source-
program terms:
Array indexing error at line 45.
Data store at this point:
name = ['J', 'a', 'v', 'a']
i = 10
Here the programmer can tell at a glance what went wrong.
But how can the source-program line number be determined at run-time? One
possible technique is this. We dedicate a register (or storage cell) that will contain the
current line number. The compiler generates code to update this register whenever
control passes from one source-program line to another. Clearly, however, this
technique is costly in terms of extra instructions in the object program.
An alternative technique is as follows. The compiler generates a table relating line
numbers to instruction addresses. If the object program stops, the code pointer is used to
search the table and determine the corresponding line number. This technique has the
great advantage of imposing no time or space overheads on the object program. (The
line-number table can be stored separately from the object program, and loaded only if
required.)
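A minimal Java sketch of this table, assuming the compiler emits (instruction address,
line number) pairs sorted by address; the class is hypothetical, not part of the Triangle
compiler:

    import java.util.TreeMap;

    class LineNumberTable {
        private final TreeMap<Integer, Integer> table = new TreeMap<>();

        void add(int instrAddr, int line) { table.put(instrAddr, line); }

        // Map the code pointer at the moment the program stopped to the
        // source line of the last instruction at or before that address.
        int lineAt(int codePointer) {
            var entry = table.floorEntry(codePointer);
            return (entry == null) ? -1 : entry.getValue();
        }
    }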
The generation of reliable line-number information, however, is extremely difficult
in the presence of heavily-optimized code. In this case, the code generator may have
eliminated some of the original instructions, and substantially re-ordered others, making
it very difficult to identify the line number of a given instruction. In the worst case, a
single instruction may actually be part of the code for several different lines of source
code.
To generate a symbolic storage dump requires more sophisticated techniques. The
compiler must generate a 'symbol table' containing the identifier, type, and address of
each variable in the source program, and the identifier and entry address of each
procedure (and function). If the object program stops, using the symbol table each (live)
variable can be located in the data store. The variable's identifier can be printed along
with its current value, formatted according to its type. If one or more procedures are
active at the time when the program stops, the store will contain one or more stack
frames. To allow the symbolic dump to cover local variables, the symbol table must
record which variables are local to which procedures, and the procedure to which each
frame belongs must be identified in some way. (See Exercise 9.16.)
This problem is compounded on a register machine, where a variable might be
located in a register and not in the store. It is also compounded for heavily-optimized
code, where several variables with disjoint lifetimes may share the same memory
location.
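One plausible shape for such a symbol-table entry, with illustrative field names (not
the book's exact design):

    class DumpSymbol {
        String identifier;   // source-program name, e.g. "name" or "i"
        String type;         // source-language type, e.g. "array 4 of Char"
        int address;         // store address (or frame offset for locals)
        String procedure;    // owning procedure for locals; null for globals
    }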
9.3 Efficiency
When we consider efficiency in the context of a compiler, we must carefully distinguish
between compile-time efficiency and run-time efficiency. They are not the same thing at
all; indeed, there is often a tradeoff between the two. The more a compiler strives to
generate efficient (compact and fast) object code, the less efficient (bulkier and slower)
the compiler itself tends to become.
The most efficient compilers are those that generate abstract machine code, where
the abstract machine has been designed specifically to support the operations of the
source language. Compilation is simple and fast because there is a straightforward trans-
lation from the source language to the target language, with few special cases to worry
about. Such is the Triangle compiler used as a case study in this book. Of course, the
object code has to be interpreted, imposing a significant speed penalty at run-time.
Compilers that generate code for real machines are generally less efficient. They
must solve a variety of awkward problems. There is often a mismatch between the
operations of the source language and the operations provided by the target machine.
The target-machine operations are often irregular, complicating the translation. There
might be many ways of translating the same source program into object code, forcing
the compiler writer to implement lots of special cases in an attempt to generate the best
possible object code.
* The O-notation is a way of estimating the efficiency of a program. Let n be the size of the
program's input. If we state that the program's running time is O(n), we mean that its running
time is proportional to n. (The actual running time could be 100n or 0.01n.) Similarly, O(n log n)
time means time proportional to n log n, O(n^2) time means time proportional to n^2, and so
on. In estimates of algorithmic complexity, the constants of proportionality are generally less
important than the difference between, for example, O(n) and O(n^2).
Suppose that phase A runs in time an, and phase B in time bn (where a and b are constants).
Then the combination of these phases will run in time an + bn = (a + b)n, which is still O(n).
CALL add
CALL sub
STORE a
As we saw in Chapter 7, a simple efficient code generator can easily perform this
translation. The code generator has no registers to worry about.
Now suppose that the target machine has a pool of registers and a typical one-
address instruction set. Now the command might be translated to object code like this:
LOAD R1 b
MULT R1 c
LOAD R2 d
LOAD R3 e
MULT R3 f
ADD R2 R3
SUB R1 R2
STORE R1 a
Although this is comparatively straightforward, some complications are already evident.
The code generator must allocate a register for the result of each operation. It must
ensure that the register is not reused until that result has been used. (Thus R1 cannot be
used during the evaluation of 'd + (e*f)', because at that time it contains the unused
result of evaluating 'b*c'.) Furthermore, when the right operand of an operator is a
simple variable, the code generator should avoid a redundant load by generating, for
example, 'MULT R1 c' rather than 'LOAD R2 c' followed by 'MULT R1 R2'.
The above is not the only possible object code, nor even the best. One improvement
is to evaluate 'd + (e*f)' before 'b*c'. A further improvement is to evaluate '(e*f)
+ d' instead of 'd + (e*f)', exploiting the commutativity of '+'. The combined effect
of these improvements is to save an instruction and a register:
LOAD R1 e
MULT R1 f
ADD R1 d
LOAD R2 b
MULT R2 c
SUB R2 R1
STORE R2 a
The trick illustrated here is to evaluate the more complicated subexpression of a binary
operator first.
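This trick can be made systematic. The following Java sketch shows the standard
Sethi-Ullman labeling (a well-known technique, not this book's own algorithm), which
computes how many registers each subtree needs; a code generator then evaluates the
operand with the larger label first:

    abstract class Node { abstract int label(); }

    class Leaf extends Node {
        int label() { return 1; }    // a simple variable needs one register
    }

    class Op extends Node {
        final Node left, right;
        Op(Node left, Node right) { this.left = left; this.right = right; }
        int label() {
            int l = left.label(), r = right.label();
            // If the labels differ, evaluate the larger side first and hold
            // its result in one register while the smaller side is evaluated.
            return (l == r) ? l + 1 : Math.max(l, r);
        }
    }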
But that is not all. The compiler might decide to allocate registers to selected
variables throughout their lifetimes. Supposing that registers R6 and R7 are thus
allocated to variables a and d, the object code could be further improved as follows:
LOAD R1 e
MULT R1 f
ADD R1 R7
LOAD R6 b
MULT R6 c
SUB R6 R1
Several factors make code generation for a register machine rather complicated.
Register allocation is one factor. Another is that compilers must in practice achieve code
improvements of the kind illustrated above - programmers demand nothing less!
But even a compiler that achieves such improvements will still generate rather
mediocre object code (typically four times slower than hand-written assembly code). A
variety of algorithms have been developed that allow a compiler to generate much more
efficient object code (typically twice as slow as hand-written assembly code). These are
called code transformation (or code optimization') algorithms. Some of the more
common code transformations are:
Constant folding: If an expression depends only on known values, it can be evaluated
at compile-time rather than run-time.
Common subexpression elimination: If the same expression occurs in two different
places, and is guaranteed to yield the same result in both places, it might be possible
to save the result of the first evaluation and reuse it later.
Code movement: If a piece of code executed inside a loop always has the same effect,
it might be possible to move that code out of the loop, where it will be executed fewer
times.
' The more widely used term, code optimization, is actually inappropriate: it is infeasible for a
compiler to generate truly optimal object code.
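As an illustration of the first of these transformations, here is a minimal Java sketch
of constant folding over a tiny expression tree; the node classes are invented for this
sketch and are not the Triangle compiler's AST:

    abstract class Expr {}

    class Lit extends Expr {
        final int value;
        Lit(int value) { this.value = value; }
    }

    class Bin extends Expr {
        final char op; final Expr left, right;
        Bin(char op, Expr left, Expr right) {
            this.op = op; this.left = left; this.right = right;
        }
    }

    class Folder {
        static Expr fold(Expr e) {
            if (!(e instanceof Bin b)) return e;
            Expr l = fold(b.left), r = fold(b.right);
            if (l instanceof Lit x && r instanceof Lit y) {
                switch (b.op) {              // evaluate at compile time
                    case '+': return new Lit(x.value + y.value);
                    case '*': return new Lit(x.value * y.value);
                }
            }
            return new Bin(b.op, l, r);      // not foldable: rebuild
        }
    }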
(assuming that each integer occupies one word). Furthermore, if the compiler decides
that address[[h]] = 20 (relative to SB), then address[[h[2].m]] can be folded into
the constant address 27. This is shown in the following object code:

    LOADL 12
    STORE 27[SB]

Address folding makes field selection into a compile-time operation. It even makes
indexing of a static array by a literal into a compile-time operation.
where i is the value of variable i, and where we have assumed that each value of type T
occupies four words.
The common subexpression 'x - y' could have been eliminated by modifying the
source program. But the common subexpression 'i * 4' can be eliminated only by the
compiler, because it exists only at the target machine level.
(assuming that each character occupies one word). A straightforward translation of this
program fragment will generate code to evaluate address[[name]] + (i * 10) inside the
inner loop. But this code will yield the same address in every iteration of the inner loop,
since the variable i is not updated by the inner loop.
The object program would be more efficient if this code were moved out of the inner
loop. (It cannot be moved out of the outer loop, of course, because the variable i is
updated by the outer loop.)
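Rendered in Java for concreteness (the array and constants here are stand-ins for the
generated address arithmetic, not the book's example code):

    class CodeMovementDemo {
        static final int ROWS = 10, COLS = 10;
        static char[] store = new char[ROWS * COLS];

        static void beforeTransformation() {
            for (int i = 0; i < ROWS; i++)
                for (int j = 0; j < COLS; j++)
                    store[i * COLS + j] = ' ';   // i * COLS recomputed every inner iteration
        }

        static void afterTransformation() {
            for (int i = 0; i < ROWS; i++) {
                int rowBase = i * COLS;          // invariant code moved out of the inner loop
                for (int j = 0; j < COLS; j++)
                    store[rowBase + j] = ' ';
            }
        }
    }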
(a) Find out the object code that would be generated by the Triangle
compiler.
(b) Write down the object code that would be generated by a Triangle com-
piler that performs code transformations such as constant folding,
common subexpression elimination, and code movement.
9.9** Extend Triangle with unary and binary operator declarations of the form:

    func O (I1: T1) : T - E
    func O (I1: T1, I2: T2) : T - E

Operators are to be treated like functions. A unary operator application 'O E' is
to be treated like a function call 'O(E)', and a binary operator application 'E1
O E2' is to be treated like a function call 'O(E1, E2)'.
which creates a new and distinct primitive type with n values, and respectively
binds the identifiers I1, ..., and In to these values. Make the generic operations
of assignment, '=', and '\=' applicable to enumeration types. (They are applicable
to all Triangle types.) Provide new operations of the form 'succ E' (successor)
and 'pred E' (predecessor), where succ and pred are keywords.
Extend Triangle with a new family of types, string n, whose values are
strings of exactly n characters (n >= 1). Provide string-literals of the form
"e1...en". Make the generic operations of assignment, '=', and '\=' applicable
to strings. Provide a new binary operator '<<' (lexicographic comparison). Finally,
provide an array-like string indexing operation of the form 'V[E]',
where V names a string value or variable. (Hint: Represent a string in the same
way as a static array.)
Or:
Extend Triangle with a new type, String, whose values are character strings
of any length (including the empty string). Provide string-literals of the form
"e1...en" (n >= 0). Make the generic operations of assignment, '=', and '\=' applicable
to strings. Provide new binary operators '<<' (lexicographic comparison)
and '++' (concatenation). Finally, provide an array-like string indexing
operation of the form 'V[E]', and a substring operation of the form
'V[E1:E2]', where V names a string value or variable. But do not permit
string variables to be selectively updated. (Hint: Use an indirect representation
for strings. The handle should consist of a length field and a pointer to an array
of characters stored in the heap. In the absence of selective updating, string
assignment can be implemented simply by copying the handle.)
APPENDIX A
Answers to Selected Exercises
Specimen answers to about half of the exercises are given here. Some of the answers are
given only in outline.
Answers 1
1.1 Other kinds of language processor: syntax checkers, cross-referencers, pretty-
printers, high-level translators, program transformers, symbolic debuggers, etc.
1.4 Mini-Triangle expressions: (a) and (e) only. (Mini-Triangle has no functions,
no unary operators, and no operator '>='.)
Commands: (f) and (j) only. (Mini-Triangle procedures have exactly one parameter
each, and there is no if-command without an else-part.)
Declarations: (l), (m), and (o). (Mini-Triangle has no real-literals, and no multiple
variable declarations.)
1.5 AST: [tree diagram, not reproduced here: a WhileCommand whose subtrees are
AssignCommands built from VnameExpressions, IntegerExpressions, SimpleVnames,
Identifiers, and an Integer-Literal]
Answers 3
3.3 The contextual errors are (i) 'Logical' is not declared; (ii) the expression of
the if-command is not of type bool; and (iii) 'yes' is not declared:
[decorated AST diagram, not reproduced here]
3.5 In brief, compile one subprogram at a time. After parsing a subprogram and
constructing its AST, perform contextual analysis and code generation on the
AST. Then prune the AST: replace the subprogram's body by a stub, and retain
only the part(s) of the AST that will be needed to compile subsequent calls to
the subprogram (i.e., its identifier, formal parameters, and result type if any).
The maximum space requirement will be for the largest subprogram's AST,
plus the pruned ASTs of all the subprograms.
Answers 4
4.3 After repeated left factorization and elimination of left recursion:

    Numeral ::= Digits (. Digits | e) (e Sign Digits | e)
    Digits  ::= Digit Digit*
    while (currentChar == '+' || currentChar == '-') {
        char op = currentChar;
        acceptIt();
        int numval = parseNumeral();
        switch (op) {
        case '+': expval += numval; break;
        case '-': expval -= numval; break;
        }
    }
    return expval;
}
private int parseNumeral() {
    int numval = parseDigit();
    while (isDigit(currentChar))
        numval = 10 * numval + parseDigit();
    return numval;
}

private byte parseDigit() {
    if ('0' <= currentChar && currentChar <= '9') {
        byte digval = (byte) (currentChar - '0');
        currentChar = next input character;
        return digval;
    } else
        report a lexical error
}
This is correct if and only if starters[[X]] is disjoint from the set of tokens
that can follow X+ in this particular context.
case Token.IF: {
    acceptIt();
    parseExpression();
    accept(Token.THEN);
    parseSingleCommand();
4.18 This lexical grammar is ambiguous. The scanning procedure would turn out as
follows:
private byte scanToken() {
    switch (currentChar) {
    case 'i':
        takeIt(); take('f');
        return Token.IF;
    case 't':
        takeIt(); take('h'); take('e'); take('n');
        return Token.THEN;
    case 'e':
        takeIt(); take('l'); take('s'); take('e');
        return Token.ELSE;
This method will not compile. Moreover, there is no reasonable way to fix it.
Answers 5
5.2 One possibility would be a pair of subtables, one for globals and one for locals.
(Each subtable could be an ordered binary tree or a hash table.) There would
also be a variable, the current level, set to either global or local. Constructor
IdentificationTable would set the current level to global, and would
empty both subtables. Method enter would add the new entry to the global or
local subtable, according to the current level. Method retrieve would search
the local subtable first, and if unsuccessful would search the global subtable
second. Method openScope would change the current level to local. Method
closeScope would change it to global, and would also empty the local
subtable.
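A minimal Java sketch of this two-level table, with simplified types (the real
compiler stores Declaration attributes rather than plain Objects):

    import java.util.HashMap;
    import java.util.Map;

    class TwoLevelIdTable {
        private final Map<String, Object> globals = new HashMap<>();
        private final Map<String, Object> locals = new HashMap<>();
        private boolean atLocalLevel = false;   // the current level

        void openScope()  { atLocalLevel = true; }
        void closeScope() { atLocalLevel = false; locals.clear(); }

        void enter(String id, Object attr) {
            (atLocalLevel ? locals : globals).put(id, attr);
        }

        // Search the local subtable first, then the global subtable.
        Object retrieve(String id) {
            Object attr = locals.get(id);
            return (attr != null) ? attr : globals.get(id);
        }
    }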
[AST diagrams, not reproduced here: the mutually recursive type declarations give
rise to TypeDeclaration, FieldList, Field, SimpleT., and Ident. nodes for the types
IntList and IntNode]
The AST has been transformed to a directed graph, with the mutually recursive
types giving rise to a cycle.
The complication is that the equals method must be able to compare two
(possibly cyclic) graphs for structural equivalence. It must be implemented
carefully to avoid nontermination.
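One standard way to guarantee termination is to record the pairs of nodes already
under comparison and assume they are equal when revisited. A Java sketch, with a
hypothetical TypeNode class (not the Triangle compiler's own):

    import java.util.HashSet;
    import java.util.Set;

    class TypeNode {
        String kind;                          // e.g. "int" or "record"
        TypeNode[] fields = new TypeNode[0];  // component types

        boolean structurallyEquals(TypeNode other) {
            return equal(this, other, new HashSet<>());
        }

        private static boolean equal(TypeNode a, TypeNode b,
                                     Set<String> seen) {
            String pair = System.identityHashCode(a) + ":"
                        + System.identityHashCode(b);
            if (!seen.add(pair))
                return true;   // pair already under comparison: a cycle
            if (!a.kind.equals(b.kind) || a.fields.length != b.fields.length)
                return false;
            for (int i = 0; i < a.fields.length; i++)
                if (!equal(a.fields[i], b.fields[i], seen))
                    return false;
            return true;
        }
    }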
5.10 Consider the function call 'I(E)'. Check that I has been declared by a
function declaration, say 'func I (I': T') : T - E''. Check that the type of
the actual parameter E is equivalent to the formal parameter type T'. Infer that
the type of the function call is T.
Answers 6
6.3 Advantage of single-word representation:
It is economical in storage.
Advantages of double-word representation:
It is closer to the mathematical (unbounded) set of integers.
Overflow is less likely.
6.5 (a)
    pixel[red]
    pixel[orange]
    pixel[yellow]
    pixel[green]
    ... pixel[blue]
    freq['z']
6.8 Make the handle contain the lower and upper bounds in both dimensions, as
well as a pointer to the elements. Store the elements themselves row by row (as
in Example 6.6). If l, u, l', and u' are the values of E1, E2, E3, and E4, respectively,
then we get:
[diagram, not reproduced here: the handle contains the origin, lower bound 1, upper
bound 1, lower bound 2, and upper bound 2, followed by the rows of elements of
type T_elem]
6.16 Let each frame consist of a static part and a dynamic part. The static part
accommodates variables of primitive type, and the handles of dynamic arrays.
The dynamic part expands as necessary to accommodate elements of dynamic
arrays. The frame containing v would look like this:
[diagram, not reproduced here: the frame contains the link data and the static part,
followed by the dynamic part, which expands as needed]
Since everything in the static part is of constant size, the compiler can
determine each variable's address relative to the frame base. This is not true for
the dynamic part, but the array elements there are always addressed indirectly
through the handles.
6.17 There are three cases of interest. If n = m+1, S is local to the caller. If n = m, S
is at the same level as the caller. If n < m, S encloses the caller.
(a) On call, push S's frame on to the stack. In all cases, set Dn to point to the
base of the new frame. (Note: If n < m, D(n+1), ..., and Dm become
undefined.)
(b) On return, pop S's frame off the stack. If n = m+1, do nothing else. If n =
m, reset Dm to point to the base of the (now) topmost frame. If n < m,
reset the other display registers using the static links: D(m-1) <- content
(Dm); ...; Dn <- content (D(n+1)). (Note: If n = m+1, Dn becomes
undefined.)
There is no need to change D0, D1, ..., D(n-1) at either the call or the return,
since these registers point to the same frames before, during, and after the activation
of S.
Advantages and disadvantages (on the assumption that D0, D1, etc., are all true
registers):
Nonlocal variables can be accessed as efficiently as local or global variables.
(e) execute[[repeat C1 while E do C2]] =
        JUMP h
    g:  execute C2
    h:  execute C1
        evaluate E
        JUMPIF(1) g
7.3 (a) evaluate[[if E1 then E2 else E3]] =
        evaluate E1
        JUMPIF(0) g
        evaluate E2
        JUMP h
    g:  evaluate E3
    h:

(b) evaluate[[let D in E]] =
        elaborate D
        evaluate E
        POP(n) s    if s > 0
    where s = amount of storage allocated by D;
          n = size (type of E)

(c) evaluate[[begin C; yield E end]] =
        execute C
        evaluate E
7.10 (a) Reserve space for the result variable just above the link data in the function's
frame (i.e., at address 3[LB]):

        JUMP g
    e:  PUSH n          where n = size T
        execute C
        RETURN(n) d     where d = size of FP
    g:

    execute[[result E]] =
        evaluate E
        STORE(n) 3[LB]  where n = size (type of E)
(b)
public Object visitFuncDeclaration
        (FuncDeclaration decl,
         Object arg) {
    Frame f = (Frame) arg;
    short i = nextInstrAddr;
    emit(Instruction.JUMPop, 0,
         Instruction.CBr, 0);
    short e = nextInstrAddr;
    decl.entity =
        new KnownRoutine(2, f.level, e);
    Frame f1 = new Frame(f.level + 1, 0);
    short d = shortValueOf(
        decl.FP.visit(this, f1));
    // ... creates a run-time entity for the formal parameter,
    // and returns the size of the parameter.
    short n = shortValueOf(
        decl.T.visit(this, null));
    emit(Instruction.PUSHop, 0, 0, n);
    Frame f2 = new Frame(f.level + 1, 3 + n);
    decl.C.visit(this, f2);
    emit(Instruction.RETURNop, n, 0, d);
    short g = nextInstrAddr;
    patch(i, g);
    return new Short(0);
}
Answers 8
8.3 In outline:

    public abstract class UserCommand {
        public abstract void perform
            (HypoInterpreter interp);
    }
8.8 In outline:

    public class MiniShell extends MiniShellState {

        } else if (com.name.equals("call")) {
            File input = new File(com.args[0]);
            FileInputStream script =
                new FileInputStream(input);
            while (more commands in script) {
                MiniShellCommand subCom =
                    readAnalyze(script);
                execute(subCom);
            }
        } else // executable program
            exec(com.name, com.args);
    }

    public void interpret() {
        // Initialize ...
        status = RUNNING;
Answers 9
9.5 In outline:
Common subexpressions are: 'i < j' at points (1); the address of a[i] at
points (2); the address of a[j] at points (3); the address of a[n] at points (4).

    var a : array ... of Integer
    ...
    i := m - 1; j := n; pivot := a[n];
    while i < j(1) do
    begin
        i := i + 1;
        while a[i](2) < pivot do i := i + 1;
        j := j - 1;
        while a[j](3) > pivot do j := j - 1;
        if i < j(1) then
        begin
            t := a[i](2);
            a[i](2) := a[j](3);
            a[j](3) := t
        end
    end;
    t := a[i](2);
    a[i](2) := a[n](4);
    a[n](4) := t
APPENDIX B
Informal Specification of the Programming Language Triangle
B.1 Introduction
Triangle is a regularized extensible subset of Pascal. It has been designed as a model
language to assist in the study of the concepts, formal specification, and implementation
of programming languages.
The following sorts of entity can be declared and used in Triangle:
A value is a truth value, integer, character, record, or array.
A variable is an entity that may contain a value and that can be updated. Each variable
has a well-defined lifetime.
A procedure is an entity whose body may be executed in order to update variables. A
procedure may have constant, variable, procedural, and functional parameters.
A function is an entity whose body may be evaluated in order to yield a value. A
function may have constant, variable, procedural, and functional parameters.
A type is an entity that determines a set of values. Each value, variable, and function
has a specific type.
Each of the following sections specifies part of the language. The subsection headed
Syntax specifies its grammar in BNF (except for Section B.8 which uses EBNF). The
subsection headed Semantics informally specifies the semantics (and contextual
constraints) of each syntactic form. Finally, the subsection headed Examples illustrates
typical usage.
B.2 Commands
A command is executed in order to update variables. (This includes input-output.)
Syntax
A single-command is a restricted form of command. (A command must be enclosed
between begin ... end brackets in places where only a single-command is allowed.)
Command ::= single-Command
    | Command ; single-Command

single-Command ::=
    | V-name := Expression
    | Identifier ( Actual-Parameter-Sequence )
    | begin Command end
    | let Declaration in single-Command
    | if Expression then single-Command
          else single-Command
    | while Expression do single-Command

(The first form of single-command is empty.)
Semantics
The skip command ' ' has no effect when executed.
The assignment command 'V := E' is executed as follows. The expression E is
evaluated to yield a value; then the variable identified by V is updated with this value.
(The types of V and E must be equivalent.)
The procedure calling command 'I(APS)' is executed as follows. The actual-parameter-sequence
APS is evaluated to yield an argument list; then the procedure
bound to I is called with that argument list. (I must be bound to a procedure. APS
must be compatible with that procedure's formal-parameter-sequence.)
The sequential command 'C1 ; C2' is executed as follows. C1 is executed first; then
C2 is executed.
The bracketed command 'begin C end' is executed simply by executing C.
The block command 'let D in C' is executed as follows. The declaration D is
elaborated; then C is executed, in the environment of the block command overlaid by
the bindings produced by D. The bindings produced by D have no effect outside the
block command.
The if-command ' i f E then C1 else C2' is executed as follows. The expression E
is evaluated; if its value is true, then C1 is executed; if its value is false, then C2 is
executed. (The type of E must be Boolean.)
The while-command 'while E do C' is executed as follows. The expression E is
evaluated; if its value is true, then C is executed, and then the while-command is
executed again; if its value is false, then execution of the while-command is com-
pleted. (The type of E must be Boolean.)
Examples
The following examples assume the standard environment (Section B.9), and also the
following declarations:
var i: Integer;
var s: array 8 of Char;
var t: array 8 of Char;
proc sort (var a: array 8 of Char) - ...
Expressions
An expression is evaluated to yield a value. A record-aggregate is evaluated to construct
a record value from its component values. An array-aggregate is evaluated to construct
an array value from its component values.
Syntax
A secondary-expression and a primary-expression are progressively more restricted
forms of expression. (An expression must be enclosed between parentheses in places
where only a primary-expression is allowed.)
Expression ::= secondary-Expression
    | let Declaration in Expression
    | if Expression then Expression else Expression

secondary-Expression ::= primary-Expression
    | secondary-Expression Operator primary-Expression
primary-Expression ::= Integer-Literal
    | Character-Literal
    | V-name
    | Identifier ( Actual-Parameter-Sequence )
    | Operator primary-Expression
    | ( Expression )
    | { Record-Aggregate }
    | [ Array-Aggregate ]
Semantics
The expression 'IL' yields the value of the integer-literal IL. (The type of the expression
is Integer.)
The expression 'CL' yields the value of the character-literal CL. (The type of the
expression is Char.)
The expression 'V', where V is a value-or-variable-name, yields the value identified
by V, or the current value of the variable identified by V. (The type of the expression
is the type of V.)
The function calling expression 'I(APS)' is evaluated as follows. The actual-parameter-sequence
APS is evaluated to yield an argument list; then the function
bound to I is called with that argument list. (I must be bound to a function. APS must
be compatible with that function's formal-parameter-sequence. The type of the
expression is the result type of that function.)
The expression 'O E' is, in effect, equivalent to a function call 'O(E)'.
The expression 'E1 O E2' is, in effect, equivalent to a function call 'O(E1, E2)'.
The expression '(E)' yields just the value yielded by E.
The block expression 'let D in E' is evaluated as follows. The declaration D is
elaborated; then E is evaluated, in the environment of the block expression overlaid
by the bindings produced by D. The bindings produced by D have no effect outside
the block expression. (The type of the expression is the type of E.)
The if-expression 'if E1 then E2 else E3' is evaluated as follows. The expression
E1 is evaluated; if its value is true, then E2 is evaluated; if its value is false, then E3 is
evaluated. (The type of E1 must be Boolean. The type of the expression is the same
as the types of E2 and E3, which must be equivalent.)
The expression '{RA}' yields just the value yielded by the record-aggregate RA. (The
type of '{I1 - E1, ..., In - En}' is 'record I1: T1, ..., In: Tn end', where the
type of each Ei is Ti. The identifiers I1, ..., In must all be distinct.)
The expression '[AA]' yields just the value yielded by the array-aggregate AA. (The
type of '[E1, ..., En]' is 'array n of T', where the type of every Ei is T.)
The record-aggregate 'I - E' yields a record value, whose only field has the identifier
I and the value yielded by E.
The record-aggregate 'I - E, RA' yields a record value, whose first field has the
identifier I and the value yielded by E, and whose remaining fields are those of the
record value yielded by RA.
The array-aggregate 'E' yields an array value, whose only component (with index 0)
is the value yielded by E.
The array-aggregate 'E, AA' yields an array value, whose first component (with
index 0) is the value yielded by E, and whose remaining components (with indices 1,
2, ...) are the components of the array value yielded by AA.
Examples
The following examples assume the standard environment (Section B.9), and also the
following declarations:
var current: Char;
type Date - record
    y: Integer, m: Integer, d: Integer
end;
var today: Date;
func multiple (m: Integer, n: Integer) : Boolean -
    ...
func leap (yr: Integer) : Boolean - ...

(a) {y - today.y + 1, m - 1, d - 1}
(b) [31, if leap(today.y) then 29 else 28,
     31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
in
    if capital(current)
    then chr(ord(current) + shift)
    else current
Syntax
V-name ::= Identifier
    | V-name . Identifier
    | V-name [ Expression ]
Semantics
The simple value-or-variable-name 'I' identifies the value or variable bound to I. (I
must be bound to a value or variable. The type of the value-or-variable-name is the
type of that value or variable.)
The qualified value-or-variable-name 'V.I' identifies the field I of the record value or
variable identified by V. (The type of V must be a record type with a field I. The type
of the value-or-variable-name is the type of that field.)
The indexed value-or-variable-name 'V[E]' identifies that component, of the array
value or variable identified by V, whose index is the value yielded by the expression
E. If the array has no such index, the program fails. (The type of E must be
Integer, and the type of V must be an array type. The type of the value-or-variable-name
is the component type of that array type.)
Examples
The following examples assume the standard environment (Section B.9), and also the
following declarations:
type Date -
    record
        m : Integer, d : Integer
    end;
const xmas - {m - 12, d - 25};
var easter : Date;
var holiday : array 10 of Date

(a) easter
(b) xmas
form 'func I', and the argument function is the one bound to I. (I must be bound to a
function, and that function must have a formal-parameter-sequence equivalent to FPS
and a result type equivalent to the type denoted by T.)
Examples
The following examples assume the standard environment (Section B.9):
(a) while \eol() do
        begin get(var ch); put(ch) end;
    geteol(); puteol()
(b) proc increment (var count: Integer) -
    count := count + 1
A type-denoter denotes a data type. Every value, constant, variable, and function has a
specified type.
A record-type-denoter denotes the structure of a record type.
Syntax
Type-denoter ::= Identifier
    | array Integer-Literal of Type-denoter
    | record Record-Type-denoter end

Record-Type-denoter ::= Identifier : Type-denoter
    | Identifier : Type-denoter , Record-Type-denoter
Semantics
The type-denoter 'I' denotes the type bound to I.
The type-denoter 'array IL of T' denotes a type whose values are arrays. Each
array value of this type has an index range whose lower bound is zero and whose
upper bound is one less than the integer-literal IL. Each array value has one
component of type T for each value in its index range.
The type-denoter 'record RT end' denotes a type whose values are records. Each
record value of this type has the record structure denoted by RT.
The record-type-denoter 'I : T' denotes a record structure whose only field has the
identifier I and the type T.
The record-type-denoter 'I : T, RT' denotes a record structure whose first field has
the identifier I and the type T, and whose remaining fields are determined by the
record structure denoted by RT. I must not be a field identifier of RT.
(Type equivalence is structural:
Two primitive types are equivalent if and only if they are the same type.
The type record ..., Ii: Ti, ... end is equivalent to record ..., Ii': Ti', ...
end if and only if each Ii is the same as Ii' and each Ti is equivalent to Ti'.
The type array n of T is equivalent to array n' of T' if and only if n = n' and T
is equivalent to T'.)
Examples
(a) Boolean
(b) array 80 of Char
(Note: The symbols space, tab, and end-of-line stand for individual characters that
cannot stand for themselves in the syntactic rules.)
Semantics
The value of the integer-literal dn...d1d0 is dn*10^n + ... + d1*10 + d0.
The value of the character-literal ' c ' is the graphic character c.
Every character in an identifier is significant. The cases of the letters in an identifier
are also significant.
Every character in an operator is significant. Operators are, in effect, a subclass of
identifiers (but they are bound only in the standard environment, to unary and binary
functions).
Examples
(a) Integer-literals: 0 1 9 87
(b) Character-literals: '%' 'z' ' '
(c) Identifiers: x pi vlOl Integer get gasFlowRate
(d) Operators: + * <= \/
Programs
A program communicates with the user by performing input-output.
Syntax
Program ::= Command
Semantics
The program 'C' is run by executing the command C in the standard environment.
Standard environment
The standard environment includes the following constant, type, procedure, and
function declarations:
type Boolean - ... ; ! truth values
proc puteol () -
    ... ; ! write an end-of-line to output

In addition, the following functions are available for every type T:

func = (val1: T, val2: T) : Boolean -
    ... ; ! true iff val1 is equal to val2
func \= (val1: T, val2: T) : Boolean -
    ... ! true iff val1 is not equal to val2
APPENDIX C
TAM is an abstract machine whose design makes it especially suitable for executing
programs compiled from a block-structured language (such as Algol, Pascal, or Trian-
gle). All evaluation takes place on a stack. Primitive arithmetic, logical, and other
operations are treated uniformly with programmed functions and procedures.
(4) The return instruction 'RETURN(n) d' pops the topmost frame and replaces the d
words of arguments by the n-word result. LB is reset using the dynamic link, and
control is transferred to the instruction at the return address.
Since R's arguments lie immediately below its frame, R can access the arguments
using negative displacements relative to LB. For example:

    LOAD(1) -d[LB]   - for R to load its first argument (1 word)
    LOAD(1) -1[LB]   - for R to load its last argument (1 word)
A primitive routine is one that performs an elementary arithmetic, logical, input-
output, heap, or general-purpose operation. The primitive routines are summarized in
Table C.3. Each primitive routine has a fixed address in the primitive segment. TAM
traps every call to an address in that segment, and performs the corresponding operation
directly.
[Table C.3 (fragmentary): PB + 23 geteol - read past an end-of-line; PB + 24 puteol -
write an end-of-line; a routine that reads an integer-literal (optionally preceded by
blanks and/or signed) and stores its value at address a; a routine that writes an
integer-literal whose value is i; a routine that sets a' = address of a newly allocated
n-word object]
This appendix uses class diagrams to summarize the structure of the Triangle compiler,
which is available from our Web site (see Preface, page xv).
The Triangle compiler has broadly the same structure as the Mini-Triangle compiler
used throughout the text of this book. It is discussed in more detail in Sections 3.3, 4.6,
5.4, and 7.5.
The class diagrams are expressed in UML (Unified Modeling Language). UML is
described in detail in Booch et al. (1999). However, the following points are worth
noting. The name of an abstract class is shown in italics, whereas the name of a concrete
class is shown in bold. Private attributes and methods are prefixed by a minus sign (-),
whereas public attributes and methods are prefixed by a plus sign (+). The definition of
a class attribute or method is underlined. The name of a method parameter is omitted
where it is of little significance.
D.1 Compiler
The following diagram shows the overall structure of the compiler, including the
syntactic analyzer (scanner and parser), the contextual analyzer, and the code generator:
[Class diagram, not reproduced here: Triangle::ErrorReporter (+ <<constructor>>
ErrorReporter(), + reportError(: String, : String, : SourcePosition) : void,
+ reportRestriction(: String) : void), together with
Triangle::AbstractSyntaxTrees::Visitor, Triangle::ContextualAnalyzer::Checker,
Triangle::StdEnvironment, and the AST classes ActualParameter,
ActualParameterSequence, ArrayAggregate, Command, Declaration, Expression,
FormalParameter, FormalParameterSequence, Program, ...]
D.2.1 Commands
The following diagram shows the individual concrete classes for each form of
command:
[Class diagram, not reproduced here: Command with concrete subclasses
AssignCommand, CallCommand, IfCommand, LetCommand, ...]
D.2.2 Expressions
The following diagram shows the individual concrete classes for each form of
expression:
[Class diagram, not reproduced here: Expression with concrete subclasses
EmptyExpression, VnameExpression, ...]
The following diagram shows the individual concrete subclasses for a record
aggregate: [diagram not reproduced here]
The following diagram shows the individual concrete subclasses for an array
aggregate: [diagram, not reproduced here: SingleArrayAggregate, ...]
D.2.4 Declarations
The following diagram shows the individual concrete classes for each form of
declaration:
[Class diagram, not reproduced here: Declaration with concrete subclasses
BinaryOperatorDeclaration, ConstDeclaration, FuncDeclaration, ProcDeclaration,
SequentialDeclaration, TypeDeclaration, UnaryOperatorDeclaration, ..., and
FormalParameter with subclasses FuncFormalParameter, ProcFormalParameter, ...]
D.2.6 Type-denoters
The following diagram shows the individual concrete subclasses for each form of type-
denoter:
[Class diagram, not reproduced here: TypeDenoter with concrete subclasses
AnyTypeDenoter, ArrayTypeDenoter, BoolTypeDenoter, CharTypeDenoter,
ErrorTypeDenoter, IntTypeDenoter, SimpleTypeDenoter, ..., and FieldTypeDenoter
with subclasses MultipleFieldTypeDenoter and SingleFieldTypeDenoter]
D.2.7 Terminals
The following diagram shows the individual concrete subclasses for each form of
terminal node:
[Class diagram, not reproduced here: Terminal with concrete subclasses
CharacterLiteral, Identifier, IntegerLiteral, Operator]
[Class diagram, not reproduced here:

IdentificationTable
  - level : int
  - latest : IdEntry
  + <<constructor>> IdentificationTable()
  + openScope() : void
  + closeScope() : void
  + enter(: String, : Declaration) : void
  + retrieve(: String) : Declaration

IdEntry
  + attr : Declaration
  + level : int
  + previous : IdEntry
  + <<constructor>> IdEntry(: String, : Declaration, : int, : IdEntry)

Checker
  - idTable : IdentificationTable
  + <<constructor>> Checker(: ErrorReporter)
  + check(: Program) : void]
The above declarations of the standard environment are not syntactically valid in
Mini-Triangle, and so cannot be introduced by processing a normal input file. In fact,
these declarations are entered into the identification table using a method called
establishStandardEnvironment, which the contextual analyzer calls before checking
the source program.
Once the standard environment is entered in the identification table, the source
program can be checked for any type errors. At every applied occurrence of an
identifier, the identification table will be searched in exactly the same way (regardless of
whether the identifier turns out to be in the standard environment or the source
program), and its corresponding attribute used to determine its type.
Type checking
The second task of the contextual analyzer is to ensure that the source program contains
no type errors. The key property of a statically-typed language is that the compiler can
detect any type errors without actually running the program. In particular, for every
expression E in the language, the compiler can infer either that E has some type T or
that E is ill-typed. If E does have type T, then evaluating E will always yield a value of
that type T. If E occurs in a context where a value of type T' is expected, then the
compiler can check that T is equivalent to T', without actually evaluating E. This is the
task that we call type checking.
Here we shall focus on the type checking of expressions. Bear in mind, however,
that some phrases other than expressions have types, and therefore also must be type-
checked. For example, a variable-name on the left-hand side of an assignment command
has a type. Even an operator has a type. We write a unary operator's type in the form
T1 -> T2, meaning that the operator must be applied to an operand of type T1, and will
yield a result of type T2. We write a binary operator's type in the form T1 x T2 -> T3,
meaning that the operator must be applied to a left operand of type T1 and a right
operand of type T2, and will yield a result of type T3.
For most statically-typed programming languages, type checking is straightforward.
The type checker infers the type of each expression bottom-up (i.e., starting with literals
and identifiers, and working up through larger and larger subexpressions):
Literal: The type of a literal is immediately known.
Identifier: The type of an applied occurrence of identifier I is obtained from the
corresponding declaration of I.
Unary operator application: Consider the expression 'O E', where O is a unary
operator of type T1 -> T2. The type checker ensures that E's type is equivalent to T1,
and thus infers that the type of 'O E' is T2. Otherwise there is a type error.
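A minimal Java sketch of this rule, with invented names (not the book's own checker
code): the operator's declared type T1 -> T2 is compared against the operand's
inferred type:

    class UnaryCheckDemo {
        record OperatorType(String argType, String resultType) {}

        // Returns the inferred type of 'O E', or "error-type" on a mismatch.
        static String checkUnary(OperatorType opType, String operandType) {
            if (!operandType.equals(opType.argType())) {
                System.err.println("operand has wrong type for this operator");
                return "error-type";
            }
            return opType.resultType();
        }
    }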