Unit-4-4
UNIT-4
SRMIST, Vadapalani Campus
Intermediate Languages
● Syntax trees and postfix notation are two kinds of
intermediate representation.
● A third, called three-address code, will be used here.
● The semantic rules for generating three-address code from common
programming language constructs are similar to those for constructing
syntax trees or for generating postfix notation.
CS416 Compiler Design
Graphical Representations
● A syntax tree depicts the natural hierarchical structure of a source program.
● A dag gives the same information but in a more compact way because common subexpressions are identified.
● Consider a syntax tree and a dag for the assignment statement
a := b * -c + b * -c
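The sharing of common subexpressions in a dag can be sketched in a few lines. The following is only an illustration, not the construction used on the slide: the names `Dag`, `node`, `leaf`, and `three_address` are hypothetical, and nodes are shared by looking up each (operator, left, right) triple in a dictionary before creating it.

```python
class Dag:
    """Node table that creates each (op, left, right) node only once."""

    def __init__(self):
        self.nodes = []      # index -> (op, left, right)
        self.index_of = {}   # (op, left, right) -> index

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if key not in self.index_of:
            self.index_of[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index_of[key]

    def leaf(self, name):
        return self.node("id", name)


def three_address(nodes):
    """Emit one statement per interior node, in creation order."""
    name, code = {}, []
    for i, (op, left, right) in enumerate(nodes):
        if op == "id":
            name[i] = left
        elif op == "assign":
            code.append(f"{name[left]} := {name[right]}")
        else:
            name[i] = f"t{len(code) + 1}"
            if op == "uminus":
                code.append(f"{name[i]} := - {name[left]}")
            else:
                code.append(f"{name[i]} := {name[left]} {op} {name[right]}")
    return code


dag = Dag()
# b * -c is requested twice; the second request returns the existing node.
lhs = dag.node("*", dag.leaf("b"), dag.node("uminus", dag.leaf("c")))
rhs = dag.node("*", dag.leaf("b"), dag.node("uminus", dag.leaf("c")))
root = dag.node("assign", dag.leaf("a"), dag.node("+", lhs, rhs))
code = three_address(dag.nodes)
```

Because `lhs` and `rhs` are the same node, the product is computed once and the sum uses the same temporary for both operands.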
DECLARATIONS
● As the sequence of declarations in a procedure block is examined, storage is laid out for
names local to the procedure.
● For each local name, a symbol-table entry is created with information such as the type and the relative
address of the storage for the name.
● The relative address consists of an offset from the base of the static data area or
the field for local data in an activation record.
● When the front end generates addresses, it may have a target machine in mind.
● Suppose that addresses of consecutive integers differ by 4 on a byte-addressable
machine.
● The address calculations generated by the front end may therefore include
multiplications by 4.
Declarations in a procedure
● The syntax of languages such as C, Pascal, and Fortran allows all the
declarations in a single procedure to be processed as a group.
● A global variable, offset, can keep track of the next available relative
address.
● The procedure enter(name, type, offset) creates a symbol-table entry for
name, and gives it type type and relative address offset in its data area.
● We use synthesized attributes type and width for nonterminal T to indicate
the type and the width (the number of memory units taken by objects of that
type).
● Attribute type represents a type expression constructed from the basic
types integer and real by applying the type constructors pointer and array.
● If type expressions are represented by graphs, then attribute type might be
a pointer to the node representing a type expression.
In the translation scheme (SDT) for declarations, nonterminal P generates a sequence
of declarations of the form id : T.
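As a rough sketch of how offset advances over a sequence of declarations id : T, consider the following. The widths (4 for integer, 8 for real, 4 for a pointer) and the tuple encoding of type expressions are assumptions made for this illustration, and `layout` stands in for the combined effect of the semantic actions; it is not the translation scheme itself.

```python
# Assumed widths in memory units; real targets may differ.
BASIC_WIDTH = {"integer": 4, "real": 8}
POINTER_WIDTH = 4


def width(t):
    """Width of a type expression: a basic type name,
    ("array", n, elem_type), or ("pointer", target_type)."""
    if isinstance(t, str):
        return BASIC_WIDTH[t]
    if t[0] == "array":
        return t[1] * width(t[2])
    if t[0] == "pointer":
        return POINTER_WIDTH
    raise ValueError(f"unknown type expression: {t!r}")


def layout(declarations):
    """Process declarations as a group, mimicking enter(name, type, offset)."""
    symtab = {}
    offset = 0                      # next available relative address
    for name, t in declarations:
        symtab[name] = (t, offset)  # enter(name, type, offset)
        offset += width(t)          # offset := offset + T.width
    return symtab, offset


symtab, total = layout([
    ("i", "integer"),
    ("x", "real"),
    ("a", ("array", 10, "integer")),
    ("p", ("pointer", "real")),
])
```

Each name receives the value of offset at the moment it is entered, so the relative addresses here are 0, 4, 12, and 52.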
Example: Nested Procedures
The nesting of procedure definitions in the Pascal program is
indicated by the following indentation:
The semantic rules are defined in terms of the following operations:
1. mktable(previous):
- Creates a new symbol table and returns a pointer to the new table.
- The argument previous points to a previously created symbol table.
- The pointer previous is placed in a header for the new symbol table,
along with additional information such as the nesting depth of a procedure.
2. enter(table, name, type, offset):
- Creates a new entry for name name in the symbol table pointed to by table.
- Places type type and relative address offset in fields within the entry.
3. addwidth(table, width):
- Records the cumulative width of all the entries in table in the header of that table.
4. enterproc(table, name, newtable):
- Creates a new entry for procedure name in the symbol table pointed to by table.
- The argument newtable points to the symbol table for this procedure name.
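The four operations above can be sketched as follows. This is a minimal illustration under assumed data layouts: the `SymbolTable` class, its field names, and the dictionary entries are hypothetical choices, and the procedure name `readarray` is used only as sample data.

```python
class SymbolTable:
    """A table whose header links to the enclosing table."""

    def __init__(self, previous):
        self.previous = previous   # header: enclosing table, or None
        self.depth = 0 if previous is None else previous.depth + 1
        self.width = 0             # header: cumulative width of entries
        self.entries = {}


def mktable(previous):
    return SymbolTable(previous)


def enter(table, name, type_, offset):
    table.entries[name] = {"type": type_, "offset": offset}


def addwidth(table, width):
    table.width = width


def enterproc(table, name, newtable):
    table.entries[name] = {"type": "proc", "table": newtable}


# Outermost scope with one variable and one nested procedure.
outer = mktable(None)
enter(outer, "a", "integer", 0)

inner = mktable(outer)
enter(inner, "k", "integer", 0)
addwidth(inner, 4)

enterproc(outer, "readarray", inner)
addwidth(outer, 4)
```

Following `previous` from `inner` reaches `outer`, which is how a name that is not local to the nested procedure could be found in an enclosing scope.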
The action for nonterminal M initializes stack tblptr with a
symbol table for the outermost scope, created by operation
mktable(nil).
When the declarations of a nested procedure have been processed, stacks tblptr and offset are popped and the translation reverts to examining the declarations in the enclosing
procedure. The name of the enclosed procedure is then entered into the symbol table of the enclosing procedure.
Field Names in Records
After the fields have been examined, the top of stack offset
holds the total width of all the data objects within the
record.
● The semantic actions use procedure emit to
emit three-address statements to an output
file, rather than building up code attributes for
nonterminals
● Example:
Suppose that the context in which an assignment
appears is given by the following grammar.
Reusing Temporary Names
● So far we have assumed that newtemp generates a new temporary name each time a temporary
is needed.
● It is useful, especially in optimizing compilers, to actually create a distinct name each time
newtemp is called.
● However, the temporaries used to hold intermediate values in expression calculations
each need separate space allocated to hold their values.
● Temporaries can be reused by changing newtemp.
● From the rules for the synthesized attribute E.place it follows that, when E1 + E2 is evaluated, the temporaries t1 and t2 holding the operand values are not
used elsewhere in the program.
● The lifetimes of these temporaries are nested like matching pairs of balanced
parentheses.
● It is possible to modify newtemp so that it uses a small array in a procedure's data
area to hold temporaries.
The table shows the sequence of three-address statements that would be generated by the semantic rules if newtemp were
modified in this way. It also indicates the "current" value of the count c after the generation of each
statement.
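One common way to realize this reuse, and the one assumed in the sketch below, is to keep a count c: generating a new temporary uses the name $c and increments c, and using a temporary as an operand decrements c. The class `TempPool` and the helper names are hypothetical, and the statement x := a * b + c * d - e * f is chosen only as sample input.

```python
class TempPool:
    """Hands out temporaries $0, $1, ... from a single count c."""

    def __init__(self):
        self.c = 0

    def new(self):
        name = f"${self.c}"
        self.c += 1
        return name

    def release_if_temp(self, operand):
        # A temporary used as an operand is dead afterwards.
        if operand.startswith("$"):
            self.c -= 1


def translate(pool):
    """Translate x := a * b + c * d - e * f, recording c after each statement."""
    trace = []

    def binary(op, left, right):
        pool.release_if_temp(left)
        pool.release_if_temp(right)
        result = pool.new()
        trace.append((f"{result} := {left} {op} {right}", pool.c))
        return result

    def copy(target, source):
        pool.release_if_temp(source)
        trace.append((f"{target} := {source}", pool.c))

    t = binary("*", "a", "b")
    u = binary("*", "c", "d")
    v = binary("+", t, u)
    w = binary("*", "e", "f")
    z = binary("-", v, w)
    copy("x", z)
    return trace


trace = translate(TempPool())
for statement, c in trace:
    print(f"{statement:<16} c = {c}")
```

Only two temporary names are ever live at once here, because the lifetimes nest like balanced parentheses.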
Temporaries that may be assigned and/or used
more than once cannot be given names in this last-in, first-out manner.
Addressing Array Elements
● Elements of an array can be accessed quickly if the elements are stored in a block of
consecutive locations. If the width of each array element is w, then the ith element of
array A begins in location
$base + (i - low) \times w$
● where low is the lower bound on the subscript and base is the relative address of the
storage allocated for the array, i.e., base is the relative address of A[low].
● The above expression can be partially evaluated at compile time if it is rewritten as
$i \times w + (base - low \times w)$
● For a two-dimensional array stored in row-major form, the relative address of A[i1, i2] is
$base + ((i_1 - low_1) \times n_2 + i_2 - low_2) \times w$
where low1 and low2 are the lower bounds on the values of i1
and i2, and n2 is the number of values that i2 can take.
● Assuming that i1 and i2 are the only values that are not known at compile time,
rewrite the above expression as
$((i_1 \times n_2) + i_2) \times w + (base - ((low_1 \times n_2) + low_2) \times w)$
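The rewriting step can be checked by expanding the row-major address of A[i1, i2] and collecting the terms that do not depend on i1 and i2. The derivation below uses the symbols base, w, low1, low2, and n2 as defined above.

```latex
\begin{align*}
base + \bigl((i_1 - low_1)\, n_2 + i_2 - low_2\bigr)\, w
  &= base + (i_1 n_2 + i_2)\, w - (low_1 n_2 + low_2)\, w \\
  &= (i_1 n_2 + i_2)\, w + \bigl(base - (low_1 n_2 + low_2)\, w\bigr)
\end{align*}
```

The last parenthesized term contains only quantities fixed by the declaration, so it can be computed once at compile time; only the first term needs run-time code.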
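Row-major and column-major forms generalize to many dimensions. In row-major form, the rightmost subscripts vary fastest as we scan down a block of storage.
The expression above generalizes to the following expression for the relative
address of A[i1,i2,i3,…,ik]:
$((\cdots((i_1 n_2 + i_2) n_3 + i_3)\cdots) n_k + i_k) \times w + base - ((\cdots((low_1 n_2 + low_2) n_3 + low_3)\cdots) n_k + low_k) \times w$
Column-major form generalizes to the opposite arrangement, with the leftmost subscripts varying fastest.
Some languages allow array sizes to be specified dynamically at run time. The formulas for accessing the elements of such arrays are the same as for
fixed-size arrays, but the upper and lower limits are not known at compile time.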
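For k dimensions in row-major order, the index part of the address can be accumulated one subscript at a time, multiplying the running value by the size of the next dimension and adding the next subscript. The function below is a sketch under that assumption; `element_address` and its parameter names are hypothetical, and `sizes[0]` is accepted but never used because the address does not depend on n1.

```python
def element_address(base, w, lows, sizes, idx):
    """Relative address of A[idx[0], ..., idx[k-1]] in row-major order.

    lows[m]  : lower bound of dimension m
    sizes[m] : number of values subscript m can take
    """
    e = idx[0]        # run-time part, built up subscript by subscript
    low_e = lows[0]   # same recurrence over the lower bounds
    for m in range(1, len(idx)):
        e = e * sizes[m] + idx[m]
        low_e = low_e * sizes[m] + lows[m]
    c = base - low_e * w   # known at compile time when bounds are fixed
    return e * w + c


# 10 x 20 array with lower bounds 1 and element width 4.
first = element_address(0, 4, [1, 1], [10, 20], [1, 1])
inner = element_address(0, 4, [1, 1], [10, 20], [2, 3])
# 2 x 3 x 4 array with lower bounds 0, width 1, placed at base 100.
cube = element_address(100, 1, [0, 0, 0], [2, 3, 4], [1, 2, 3])
```

For dynamically sized arrays the same loop applies, but `c` would have to be computed at run time once the bounds are known.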
Example:
Array references can be permitted in assignments if nonterminal L, with the following productions, is allowed where id appears:
These productions allow a pointer to the symbol-table entry for the array name to be passed as a
synthesized attribute array of Elist.
Elist.ndim records the number of dimensions (index expressions) in the Elist.
Elist.place denotes the temporary holding a value computed from the index expressions in Elist.
An l-value L has two attributes, L.place and L.offset. If L is a simple name, L.place is a pointer to the symbol-table entry for that name and L.offset is null, indicating that the
l-value is a simple name rather than an array reference.
We generate a normal assignment if L is a simple name, and an indexed assignment into the
location denoted by L otherwise:
The code for arithmetic expressions is
Here L.offset is a new temporary representing the first term of the multi-dimensional array address expression.
E.place holds both the value of the expression E and the value of the index expression for m = 1.
Example: Let A be a 10 × 20 array with
low1 = low2 = 1. Then n1 = 10 and n2 = 20. Take w to be 4.
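With these numbers the compile-time constant is (low1 × n2 + low2) × w = (1 × 20 + 1) × 4 = 84. The sketch below shows three-address code that an assignment such as x := A[y, z] might translate to. The function name, the temporaries t1 to t4, and the notation `addr(A) - 84` for the precomputed constant part are illustrative assumptions.

```python
def translate_2d_fetch(target, array, i1, i2, low1, low2, n2, w):
    """Three-address code for target := array[i1, i2], row-major layout."""
    c_offset = (low1 * n2 + low2) * w   # folded at compile time
    code = [
        f"t1 := {i1} * {n2}",                  # i1 * n2
        f"t1 := t1 + {i2}",                    # i1 * n2 + i2
        f"t2 := addr({array}) - {c_offset}",   # constant part of the address
        f"t3 := {w} * t1",                     # scale by element width
        "t4 := t2[t3]",                        # indexed fetch
        f"{target} := t4",
    ]
    return c_offset, code


c_offset, code = translate_2d_fetch("x", "A", "y", "z", 1, 1, 20, 4)
for line in code:
    print(line)
```

Only the multiplications and the addition involving y and z are performed at run time; the subtraction of 84 is done once by the compiler.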
Type Conversions within Assignment
The compiler must keep track of both the types and relative addresses of the fields of a record.
An advantage of keeping this information in symbol-table entries for the field names is that the routine for
looking up names in the symbol table can also be used for field names.
If t is a pointer to the symbol table for a record type, then the type record(t), formed by applying the
constructor record to the pointer, is returned as T.type.
We use the expression p↑.info + 1 to illustrate how a pointer to the symbol table can be extracted
from an attribute E.type.
Here p must be a pointer to a record with a field name info whose type is arithmetic.
BOOLEAN EXPRESSIONS
● Boolean expressions have two primary purposes. They are used to compute logical
values, but more often they are used as conditional expressions in statements that
alter the flow of control, such as if-then, if-then-else, or while-do statements.
● Boolean expressions are composed of the boolean operators (and, or, and not)
applied to elements that are boolean variables or relational expressions.
● Relational expressions are of the form E1 relop E2, where E1 and E2 are arithmetic
expressions.
● Consider boolean expressions generated by the following grammar:
Methods of Translating Boolean Expressions
● There are two principal methods of representing the value of a boolean expression.
● The first method is to encode true and false numerically and to evaluate a boolean
expression analogously to an arithmetic expression. Often 1 is used to denote true
and 0 to denote false.
Example: other encodings are possible, e.g., let any nonnegative quantity denote true and any negative number denote
false.
● The second principal method of implementing boolean expressions is by flow of
control, i.e., representing the value of a boolean expression by a position reached
in a program.
● This method is convenient in implementing the boolean expressions in
flow-of-control statements, such as the if-then and while-do statements.
Example: Consider the expression E1 or E2, if we determine that E1 is true, then we
can conclude that the entire expression is true without having to evaluate E2.
Numerical Representation
● Consider the implementation of boolean expressions using 1 to denote true and 0
to denote false.
● Expressions will be evaluated completely, from left to right, similar to arithmetic
expressions.
● Example: translation for
Example:
A translation scheme for producing three-address code for boolean expressions is shown here.
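To make the numerical representation concrete, the sketch below generates code for a < b or c < d and e < f, storing 1 or 0 in a temporary for each relational expression and then combining the temporaries. The `Emitter` class, the starting statement number 100, and the helper names are assumptions made for the illustration.

```python
class Emitter:
    """Collects three-address statements numbered from `start`."""

    def __init__(self, start=100):
        self.start = start
        self.code = []
        self.ntemps = 0

    @property
    def nextstat(self):
        return self.start + len(self.code)

    def emit(self, text):
        self.code.append(text)

    def newtemp(self):
        self.ntemps += 1
        return f"t{self.ntemps}"


def relop(out, left, op, right):
    """Set a temporary to 1 if the comparison holds, otherwise 0."""
    t = out.newtemp()
    out.emit(f"if {left} {op} {right} goto {out.nextstat + 3}")
    out.emit(f"{t} := 0")
    out.emit(f"goto {out.nextstat + 2}")
    out.emit(f"{t} := 1")
    return t


def binary(out, op, left, right):
    t = out.newtemp()
    out.emit(f"{t} := {left} {op} {right}")
    return t


out = Emitter()
t1 = relop(out, "a", "<", "b")
t2 = relop(out, "c", "<", "d")
t3 = relop(out, "e", "<", "f")
t4 = binary(out, "and", t2, t3)   # and binds tighter than or
t5 = binary(out, "or", t1, t4)

for number, statement in enumerate(out.code, start=out.start):
    print(f"{number}: {statement}")
```

Every relational expression is evaluated, whatever the outcome of the earlier ones, which is the property that jumping code avoids.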
Short-Circuit Code
The scheme (previous slide) would generate the three-address code for the expression
a < b or c < d and e < f.
We can also translate a boolean expression into three-address code without generating code for any
of the boolean operators and without having the code necessarily evaluate the entire expression.
This style of evaluation is sometimes called "short-circuit" or "jumping" code.
Flow-of-control statements
● Consider the translation of boolean expressions into three-address code in the
context of if-then, if-then-else, and while-do statements such as are generated by
the following grammar:
An alternative method, called "backpatching", emits code for
such statements in one pass.
Control-Flow Translation of Boolean Expressions
E is translated into a sequence of three-address statements that evaluates E as a sequence of
conditional and unconditional jumps to one of two locations: E.true, the place control is to reach if E
is true, and E.false, the place control is to reach if E is false.
The basic idea behind the translation is the following. Suppose E is of the form a < b.
Then the generated code is of the form
    if a < b goto E.true
    goto E.false
Suppose E is of the form E1 or E2. If E1 is true, then E itself is true, so E1.true is the same as E.true.
If E1 is false, then E2 must be evaluated, so E1.false is made the label of the first statement in the code for E2.
The true and false exits of E2 can be made the same as the true and false exits of E.
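The rules for true and false exits can be sketched as a recursive generator that receives the two target labels from its parent. The tuple encoding of expressions, the function `gen`, and the label names are assumptions for this example; the and and not cases follow by the same reasoning as the or case described above.

```python
import itertools


def gen(e, true, false, labels):
    """Jumping code for boolean expression e with exits `true` and `false`.

    e is ("rel", a, op, b), ("or", e1, e2), ("and", e1, e2) or ("not", e1).
    `labels` is an iterator supplying fresh label numbers.
    """
    kind = e[0]
    if kind == "rel":
        _, a, op, b = e
        return [f"if {a} {op} {b} goto {true}", f"goto {false}"]
    if kind == "or":
        mid = f"L{next(labels)}"      # E1.false: start of the code for E2
        return (gen(e[1], true, mid, labels) + [f"{mid}:"]
                + gen(e[2], true, false, labels))
    if kind == "and":
        mid = f"L{next(labels)}"      # E1.true: start of the code for E2
        return (gen(e[1], mid, false, labels) + [f"{mid}:"]
                + gen(e[2], true, false, labels))
    if kind == "not":
        return gen(e[1], false, true, labels)   # swap the exits
    raise ValueError(f"unknown expression kind: {kind!r}")


expr = ("or", ("rel", "a", "<", "b"),
        ("and", ("rel", "c", "<", "d"), ("rel", "e", "<", "f")))
code = gen(expr, "Ltrue", "Lfalse", itertools.count(1))
for line in code:
    print(line)
```

No statement is generated for or, and, or not themselves; their meaning is carried entirely by where the jumps lead.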
As these examples show, the generated code can contain
redundant instructions. Redundant instructions of this
form can be subsequently removed by a simple
peephole optimizer.
Mixed-Mode Boolean Expressions
E + E produces an integer
arithmetic result.
The second statement is a jump over the third. E → id is also arithmetic.
CASE STATEMENTS
● The "switch" or "case" statement is available in a variety of
languages. The switch-statement syntax is
● There is a selector expression, which is to be evaluated,
followed by n constant values that the expression might
take, perhaps including a default "value," which always
matches the expression if no other value does
● The intended translation of a switch is code to:
1. Evaluate the expression
2. Find which value in the list of cases is the same as the
value of the expression. Recall that the default value
matches the expression if none of the values explicitly
mentioned in cases does
3. Execute the statement associated with the value found
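The three steps above can be sketched as a generator of three-address code. One common layout, assumed here, places the code for all the statements first and a sequence of tests at the end, so that the tests are emitted only after every case has been seen. The function `translate_switch` and the labels `test`, `next`, and `L1`, `L2`, ... are hypothetical names.

```python
def translate_switch(selector, cases, default):
    """Three-address code for a switch statement.

    selector : text of the selector expression
    cases    : list of (constant, statement_text) pairs
    default  : statement_text executed when no constant matches
    """
    code = [f"t := {selector}", "goto test"]    # step 1: evaluate
    tests = []
    for k, (value, statement) in enumerate(cases, start=1):
        code += [f"L{k}: {statement}", "goto next"]   # step 3 for case k
        tests.append(f"if t = {value} goto L{k}")
    default_label = f"L{len(cases) + 1}"
    code += [f"{default_label}: {default}", "goto next"]
    code.append("test:")                        # step 2: find the match
    code += tests
    code.append(f"goto {default_label}")        # default matches otherwise
    code.append("next:")
    return code


code = translate_switch("x + 1", [(1, "a := 10"), (2, "a := 20")], "a := 0")
for line in code:
    print(line)
```

A linear sequence of tests is reasonable for a handful of cases; for many cases a compiler might instead use a hash table or a jump table, which this sketch does not attempt.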
Syntax Directed Translation of Case Statements
With an SDT scheme, it is convenient to translate
this case statement into intermediate code
We can also generate a sequence of three-address statements of the form
BACK PATCHING
● The easiest way to implement a syntax-directed definition (SDD) is to use two passes.
● First, construct a syntax tree for the input, and then walk the tree in depth-first
order, computing the translations given in the definition.
● The main problem with generating code for boolean expressions and flow-of-control
statements in a single pass is that we may not know
the labels that control must go to at the time the jump statements are generated.
● We can get around this problem by generating a series of branching statements
with the targets of the jumps temporarily left unspecified.
● Each such statement will be put on a list of goto statements whose labels will be
filled in when the proper label can be determined.
● This subsequent filling in of labels is called backpatching.
● We show how backpatching can be used to generate code for boolean
expressions and flow-of-control statements in one pass.
● To manipulate lists of labels, we use three functions:
○ makelist(i) creates a new list containing only i, an index into the array of
quadruples. makelist returns a pointer to the list it has made.
○ merge (p1,p2) concatenates the lists pointed to by p1 and p2, and returns a
pointer to the concatenated list.
○ backpatch(p,i) inserts i as the target label for each of the statements on the
list pointed to by p
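The three list functions can be sketched as follows. For simplicity this illustration uses Python lists of quadruple indices instead of pointers to linked lists, and the quadruple layout `[op, arg1, arg2, target]`, the helper `relop`, and the variable `m_quad` (the index of the first quadruple of the second operand) are assumptions. The demonstration translates a < b or c < d.

```python
quads = []  # quadruple array; each entry is [op, arg1, arg2, target]


def nextquad():
    return len(quads)


def emit(op, arg1=None, arg2=None, target=None):
    quads.append([op, arg1, arg2, target])   # target None = not yet filled


def makelist(i):
    return [i]


def merge(p1, p2):
    return p1 + p2


def backpatch(p, i):
    for q in p:
        quads[q][3] = i


def relop(left, op, right):
    """Emit an incomplete conditional jump and an incomplete goto."""
    truelist = makelist(nextquad())
    falselist = makelist(nextquad() + 1)
    emit(f"if {op}", left, right)
    emit("goto")
    return truelist, falselist


e1_true, e1_false = relop("a", "<", "b")
m_quad = nextquad()                   # where the code for E2 begins
e2_true, e2_false = relop("c", "<", "d")

backpatch(e1_false, m_quad)           # if E1 is false, evaluate E2
e_true = merge(e1_true, e2_true)      # either operand can make E true
e_false = e2_false                    # E is false only if E2 is false
```

The jumps on `e_true` and `e_false` remain unfilled until the enclosing construct knows where control should go.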
Boolean Expressions
● We now construct a translation scheme suitable for producing quadruples for boolean
expressions during bottom-up parsing.
● We insert a marker nonterminal M into the grammar to cause a semantic action to
pick up the index of the next quadruple to be generated.
Synthesized attributes truelist and falselist of nonterminal E are
used to generate jumping code for boolean expressions.
As code is generated for E, jumps to the true and false exits are
left incomplete, with the label field unfilled.
The variable nextquad holds the index of the next quadruple to follow.
For the production E → E1 and M E2, this value is backpatched onto E1.truelist.
The translation scheme is
as follows:
The marker nonterminal M in the production E → E1 or M E2
records the value of nextquad
Flow-of-Control Statements
We assume that the code that follows a given statement in execution also follows it physically in the quadruple array.
If that is not true, an explicit jump must be provided.
Schemes to Implement the Translation
For (1):
Backpatch the jumps taken when E is true
to the quadruple M1.quad, which is
the beginning of the code for S1.
Labels and Gotos
PROCEDURE CALLS
● The procedure is such an important and frequently used programming
construct that it is imperative for a compiler to generate good code for
procedure calls and returns.
● The run-time routines that handle procedure argument passing, calls, and
returns are part of the run-time support package
● We discuss the code that is typically generated for procedure calls and returns.
Calling Sequences
● The translation for a call includes a calling sequence, a sequence of actions taken
on entry to and exit from each procedure
● When a procedure call occurs, space must be allocated for the activation record
of the called procedure.
● The arguments of the called procedure must be evaluated and made available to
the called procedure in a known place
● Environment pointers must be established to enable the called procedure to access
data in enclosing blocks
● The state of the calling procedure must be saved so it can resume execution after
the call.
● The return address is usually the location of the instruction that follows the call in
the calling procedure.
● Finally, a jump to the beginning of the code for the called procedure must be
generated.
● When a procedure returns, several actions also must take place.
● If the called procedure is a function, the result must be stored in a known
place
● The activation record of the calling procedure must be restored.
● A jump to the calling procedure’s return address must be generated
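A Simple Example
Let us consider a simple example in which parameters are passed by reference and
storage is statically allocated.
The value of E.place for each argument expression must be saved until the call itself is emitted.
A convenient data structure in which to save these values is a queue, a first-in first-out list.
The semantic routine for each argument expression E will include a step to store E.place on the queue.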
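The role of the queue can be sketched as follows. The statement forms `param` and `call`, the function `translate_call`, and the sample places are assumptions made for the illustration; the code that evaluates each argument into its place is taken to have been emitted already.

```python
from collections import deque


def translate_call(procedure, argument_places):
    """Emit param statements in argument order, then the call.

    argument_places lists E.place for each argument expression, in the
    order the arguments appear in the call.
    """
    queue = deque()
    for place in argument_places:
        queue.append(place)          # store E.place on the queue

    code = []
    while queue:
        code.append(f"param {queue.popleft()}")   # first in, first out
    code.append(f"call {procedure}")
    return code


code = translate_call("f", ["t1", "x", "t2"])
for line in code:
    print(line)
```

A queue rather than a stack is used so that the param statements come out in the same left-to-right order as the arguments.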