Unit-Iv: Intermediate Code Generation
In the analysis-synthesis model of a compiler, the front end analyzes a source program and
creates an intermediate representation, from which the back end generates target code. This
separation facilitates retargeting: a back end for a new machine can be attached to an
existing front end.
A compiler front end is organized as in the figure above, where parsing, static checking,
and intermediate-code generation are done sequentially; sometimes they can be combined
and folded into parsing. All of these schemes can be implemented by creating a syntax tree
and then walking the tree.
Static Checking
This includes type checking, which ensures that operators are applied to compatible
operands. It also includes any syntactic checks that remain after parsing, such as:
• Flow-of-control checks
o Ex: A break statement must appear within an enclosing loop construct
• Uniqueness checks
o Ex: Labels in a case statement must be distinct
• Name-related checks
Intermediate Representations
We could translate the source program directly into the target language. However, there
are benefits to having an intermediate, machine-independent representation.
Instructor: P Naga Deepthi
An IR can be either an actual language or a group of internal data structures that are shared
by the phases of the compiler. C is sometimes used as an intermediate language because it is
flexible, compiles into efficient machine code, and its compilers are widely available.
In all cases, the intermediate code is a linearization of the syntax tree produced during
syntax and semantic analysis. It is formed by breaking down the tree structure into sequential
instructions, each of which is equivalent to a single machine instruction or a small number of
them. Machine code can then be generated (access to symbol tables and similar structures
might be required). Three-address code (TAC) can range from high- to low-level, depending
on the choice of operators. In general, each TAC statement contains at most three
addresses or operands.
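A minimal Python sketch of this linearization, given only as an illustration: a post-order walk
of an expression tree emits one three-address statement per interior node, each of the shape
result := operand operator operand. The tuple encoding of the tree and the names gen_tac and
translate are assumptions of this sketch, not part of the notes.

```python
from itertools import count

def gen_tac(expr, code, temps):
    """Return the address holding expr, appending TAC lines to code.

    expr is a string (variable or constant) or a tuple (op, left, right).
    This encoding is an assumption made for the sketch.
    """
    if isinstance(expr, str):
        return expr                      # a leaf is already an address
    op, left, right = expr
    y = gen_tac(left, code, temps)       # post-order: operands first
    z = gen_tac(right, code, temps)
    x = f"t{next(temps)}"                # fresh temporary for the result
    code.append(f"{x} := {y} {op} {z}")
    return x

def translate(target, expr):
    """Linearize expr and assign its value to target."""
    code = []
    result = gen_tac(expr, code, count(1))
    code.append(f"{target} := {result}")
    return code

# a*a + 2*(a*b) written as a nested tuple tree
tree = ("+", ("*", "a", "a"), ("*", "2", ("*", "a", "b")))
for line in translate("r", tree):
    print(line)
```

Every emitted statement names at most three addresses: two operands and one result.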
The general form is x := y op z, where “op” is an operator, x is the result, and y and z are
operands. x, y, z are variables, constants, or “temporaries”. A three-address instruction
Unconditional jump
goto L
Creates label L and generates the three-address code 'goto L'.
v. Creates label L and generates code for the expression exp. If exp evaluates to true,
control goes to the statement labelled L. If exp evaluates to false, control goes to the
statement immediately following the if statement.
Function call
For a function fun with n arguments a1, a2, a3, …, an, i.e.,
fun(a1, a2, a3, …, an),
TRIPLES
A triple uses only three fields in its record structure: one field for the operator and two
fields for the operands, named arg1 and arg2. The value of a temporary is referred to by the
position of the statement that computes it, not by an explicit result location as in quadruples.
Example: a = -b * d + c + (-b) * d
Triples for the above example are as follows
Arg1 and arg2 may be pointers to the symbol table for program variables, to the literal table
for constants, or into the triple structure for intermediate results.
Example: Triples for the statement x[i] = y, which generates two records, are as follows
Triples are an alternative way of representing a syntax tree or directed acyclic graph (DAG)
for program-defined names.
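As an illustration, the following Python sketch generates triples for a = -b * d + c + (-b) * d.
It is a sketch under stated assumptions: the tuple encoding of the expression, the operator
name uminus, and the function name gen_triples are hypothetical, and an integer operand
stands for the position of the triple that computes the value.

```python
def gen_triples(expr, triples):
    """Append triples for expr and return an operand reference.

    A leaf (string) is returned as its name. An interior result is
    returned as the integer position of the triple that computes it.
    """
    if isinstance(expr, str):
        return expr
    op, *args = expr
    refs = [gen_triples(arg, triples) for arg in args]
    arg1 = refs[0]
    arg2 = refs[1] if len(refs) > 1 else None   # unary operators leave arg2 empty
    triples.append((op, arg1, arg2))
    return len(triples) - 1

def show(ref):
    """Format a reference: (n) for a triple position, the name otherwise."""
    if ref is None:
        return ""
    return f"({ref})" if isinstance(ref, int) else ref

neg_b_times_d = ("*", ("uminus", "b"), "d")
expr = ("+", ("+", neg_b_times_d, "c"), neg_b_times_d)

triples = []
result = gen_triples(expr, triples)
triples.append(("=", "a", result))              # final assignment to a

for position, (op, arg1, arg2) in enumerate(triples):
    print(f"({position})  {op:7} {show(arg1):5} {show(arg2)}")
```

There is no result field: triple (1) refers to the value of -b simply as (0).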
Indirect Triples
Indirect triples add a level of indirection: instead of listing the triples themselves, the
representation lists pointers to triples.
Example: a = -b * d + c + (-b) * d
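A minimal Python sketch of the idea, assuming the triples for this example are stored in a
pool and the program order is a separate list of positions into that pool. The names
triple_pool, statements, move and listing are hypothetical. Because operands refer to pool
positions, a statement can be moved by editing only the list of pointers.

```python
# Pool of triples; an integer operand is a position in this pool.
triple_pool = [
    ("uminus", "b", None),   # 0
    ("*", 0, "d"),           # 1
    ("+", 1, "c"),           # 2
    ("uminus", "b", None),   # 3
    ("*", 3, "d"),           # 4
    ("+", 2, 4),             # 5
    ("=", "a", 5),           # 6
]

# The statement list holds pointers (here: indices) into the pool.
statements = [0, 1, 2, 3, 4, 5, 6]

def move(order, src, dst):
    """Return a new statement list with the entry at src moved to dst."""
    new_order = list(order)
    new_order.insert(dst, new_order.pop(src))
    return new_order

def listing(order, pool):
    """Return the triples in execution order."""
    return [pool[p] for p in order]

# Move the second uminus (pool position 3) to run right after the first.
reordered = move(statements, 3, 1)
print(reordered)
for triple in listing(reordered, triple_pool):
    print(triple)
```

The pool is untouched by the move, so references such as (3) in triple 4 stay valid. With
plain triples, the same move would renumber positions and force operands to be rewritten.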
Conditional operator and operands. Representations include quadruples, triples and indirect
triples.
SYNTAX TREES
Syntax trees are a high-level IR. They depict the natural hierarchical structure of the source
program. Nodes represent constructs in the source program, and the children of a node
represent meaningful components of the construct. Syntax trees are suited for static type
checking.
Using the SDD to draw a syntax tree or DAG for a given expression:
• Draw the parse tree
• Perform a post-order traversal of the parse tree
• Perform the semantic actions at every node during the traversal
– A DAG is constructed if, before creating a new node, the node-creating functions check
whether an identical node already exists; if so, the existing node is returned.
An SDD to produce syntax trees or DAGs is shown below.
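The check for an identical node can be sketched in Python as follows. This is an
illustration under assumptions: the class name DagBuilder and its methods leaf and node are
hypothetical stand-ins for the node-creating functions, and nodes are identified by their
position in a list.

```python
class DagBuilder:
    """Creates nodes, reusing an existing node when an identical one exists."""

    def __init__(self):
        self.nodes = []   # node id -> (label, left id, right id)
        self.index = {}   # (label, left id, right id) -> node id

    def _get(self, key):
        if key not in self.index:          # create only if no identical node
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def leaf(self, name):
        return self._get((name, None, None))

    def node(self, op, left, right=None):
        return self._get((op, left, right))

# Build the DAG for -b * d + c + (-b) * d in post-order.
dag = DagBuilder()
first = dag.node("*", dag.node("uminus", dag.leaf("b")), dag.leaf("d"))
total = dag.node("+", first, dag.leaf("c"))
second = dag.node("*", dag.node("uminus", dag.leaf("b")), dag.leaf("d"))
root = dag.node("+", total, second)

print(first == second)    # both occurrences of -b * d share one node
print(len(dag.nodes))     # number of distinct nodes in the DAG
```

Dropping the lookup in _get, so that every call appends a new node, would produce the
ordinary syntax tree instead of the DAG.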
BASIC BLOCKS
A basic block is a sequence of consecutive statements in which flow of control
enters at the beginning and leaves at the end without halt or possibility of branching except at
the end. The following sequence of three-address statements forms a basic block:
t1 := a*a
t2 := a*b
t3 := 2*t2
t4 := t1+t3
t5 := b*b
t6 := t4+t5
A three-address statement x := y+z is said to define x and to use (or reference) y and z.
A name in a basic block is said to be live at a given point if its value is used after that
point in the program, perhaps in another basic block.
The following algorithm can be used to partition a sequence of three-address statements into
basic blocks.
Algorithm 1: Partition into basic blocks.
Input: A sequence of three-address statements.
Output: A list of basic blocks with each three-address statement in exactly one block.
Method:
1. We first determine the set of leaders, the first statements of basic blocks.
The rules we use are the following:
I) The first statement is a leader.
II) Any statement that is the target of a conditional or unconditional goto is a leader.
III) Any statement that immediately follows a goto or conditional goto statement is a
leader.
2. For each leader, its basic block consists of the leader and all statements up to but not
including the next leader or the end of the program.
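Algorithm 1 can be sketched in Python as follows. The sketch assumes statements are plain
strings numbered from 1 and that every jump is written as goto (n), where n is a statement
number. That convention and the names find_leaders and partition are assumptions made for
illustration. The input is a 12-statement dot-product listing.

```python
import re

def find_leaders(stmts):
    """Return the sorted statement numbers (1-based) of all leaders."""
    leaders = {1} if stmts else set()              # rule I
    for number, stmt in enumerate(stmts, start=1):
        jump = re.search(r"goto \((\d+)\)", stmt)
        if jump:
            leaders.add(int(jump.group(1)))        # rule II: jump target
            if number < len(stmts):
                leaders.add(number + 1)            # rule III: follows a goto
    return sorted(leaders)

def partition(stmts):
    """Return basic blocks as lists of statement numbers."""
    leaders = find_leaders(stmts)
    bounds = leaders + [len(stmts) + 1]
    # Each block runs from its leader up to, not including, the next leader.
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(leaders))]

code = [
    "prod := 0",
    "i := 1",
    "t1 := 4*i",
    "t2 := a [ t1 ]",
    "t3 := 4*i",
    "t4 := b [ t3 ]",
    "t5 := t2*t4",
    "t6 := prod + t5",
    "prod := t6",
    "t7 := i+1",
    "i := t7",
    "if i<=20 goto (3)",
]

print(find_leaders(code))
print(partition(code))
```

Each statement number lands in exactly one block, as the output specification requires.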
Example 3: Consider the fragment of source code shown in fig. 7; it computes the dot
product of two vectors a and b of length 20. A list of three-address statements performing
this computation on our target machine is shown in fig. 8.
begin
prod := 0;
i := 1;
do begin
prod := prod + a[i] * b[i];
i := i+1;
end
while i<= 20
end
Let us apply Algorithm 1 to the three-address code in fig. 8 to determine its basic
blocks. Statement (1) is a leader by rule (I), and statement (3) is a leader by rule (II), since
the last statement can jump to it. By rule (III), the statement following (12) is a leader.
Therefore, statements (1) and (2) form a basic block. The remainder of the program,
beginning with statement (3), forms a second basic block.
(1) prod := 0
(2) i := 1
(3) t1 := 4*i
(4) t2 := a [ t1 ]
(5) t3 := 4*i
(6) t4 := b [ t3 ]
(7) t5 := t2*t4
(8) t6 := prod +t5
(9) prod := t6
(10) t7 := i+1
(11) i := t7
(12) if i<=20 goto (3)
STRUCTURE-PRESERVING TRANSFORMATIONS
The primary structure-preserving transformations on basic blocks are:
1. Common sub-expression elimination
2. Dead-code elimination
3. Renaming of temporary variables
4. Interchange of two independent adjacent statements
We assume basic blocks have no arrays, pointers, or procedure calls.
1. Common sub-expression elimination
Consider the basic block
a:= b+c
b:= a-d
c:= b+c
d:= a-d
The second and fourth statements compute the same expression, namely b+c-d, and hence
this basic block may be transformed into the equivalent block
a:= b+c
b:= a-d
c:= b+c
d:= b
Although the 1st and 3rd statements in both cases appear to have the same expression
on the right, the second statement redefines b. Therefore, the value of b in the 3rd
statement is different from the value of b in the 1st, and the 1st and 3rd statements do
not compute the same expression.
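A minimal Python sketch of this transformation on the block above. It is an illustration
under assumptions: statements are tuples (dst, op, arg1, arg2), a reused value is written
with the hypothetical operator copy, and the function name eliminate_cse is invented here.
The sketch matches expressions textually and does not exploit commutativity.

```python
def eliminate_cse(block):
    """Replace recomputations of an available expression by a copy."""
    available = {}   # (op, arg1, arg2) -> variable currently holding that value
    result = []
    for dst, op, arg1, arg2 in block:
        key = (op, arg1, arg2)
        holder = available.get(key)
        if holder is not None:
            result.append((dst, "copy", holder, None))
        else:
            result.append((dst, op, arg1, arg2))
        # dst is redefined: forget expressions that read dst or are held in dst.
        available = {
            expr: var for expr, var in available.items()
            if dst not in (expr[1], expr[2]) and var != dst
        }
        # Record the new expression unless dst is one of its own operands.
        if holder is None and dst not in (arg1, arg2):
            available[key] = dst
    return result

block = [
    ("a", "+", "b", "c"),
    ("b", "-", "a", "d"),
    ("c", "+", "b", "c"),
    ("d", "-", "a", "d"),
]

for stmt in eliminate_cse(block):
    print(stmt)
```

The third statement is left alone because the redefinition of b removes b+c from the
available expressions, while the fourth becomes a copy of b.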
2. Dead-code elimination
Suppose x is dead, that is, never subsequently used, at the point where the statement
x:= y+z appears in a basic block. Then this statement may be safely removed without
changing the value of the basic block.
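Dead-code elimination within one block can be sketched as a backward pass, assuming the set
of names live on exit is supplied from outside and that statements have no side effects
(consistent with the assumption of no arrays, pointers, or procedure calls). The tuple
format (dst, op, arg1, arg2) and the name eliminate_dead_code are assumptions of this sketch.

```python
def eliminate_dead_code(block, live_on_exit):
    """Drop statements whose result is never used afterwards."""
    live = set(live_on_exit)
    kept = []
    for dst, op, arg1, arg2 in reversed(block):
        if dst in live:
            kept.append((dst, op, arg1, arg2))
            live.discard(dst)                    # dst is defined here
            live.update(v for v in (arg1, arg2) if v is not None)
        # Otherwise dst is dead at this point and the statement is dropped.
    kept.reverse()
    return kept

block = [
    ("t1", "*", "a", "a"),
    ("t2", "*", "a", "b"),
    ("x", "+", "y", "z"),     # x is never used later
    ("t3", "+", "t1", "t2"),
]

for stmt in eliminate_dead_code(block, live_on_exit={"t3"}):
    print(stmt)
```

If x were added to live_on_exit, the third statement would be kept.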
3. Renaming temporary variables
Suppose we have a statement t:= b+c, where t is a temporary. If we change this statement to
u:= b+c, where u is a new temporary variable, and change all uses of this instance of t to u,
then the value of the basic block is not changed.
4. Interchange of statements
Suppose we have a block with the two adjacent statements
t1:= b+c
t2:= x+y
Then we can interchange the two statements without affecting the value of the block if and
only if neither x nor y is t1 and neither b nor c is t2. A normal-form basic block permits all
statement interchanges that are possible.
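The interchange condition can be written as a small Python predicate. The tuple format
(dst, op, arg1, arg2) and the name can_interchange are assumptions of this sketch. The extra
check that the two destinations differ is also an assumption added here. The text takes it
for granted because t1 and t2 are distinct temporaries.

```python
def can_interchange(first, second):
    """True if two adjacent statements may be swapped safely."""
    dst1, _, a1, b1 = first
    dst2, _, a2, b2 = second
    return (
        dst1 not in (a2, b2)      # second does not read the result of first
        and dst2 not in (a1, b1)  # first does not read the result of second
        and dst1 != dst2          # assumption: distinct destinations
    )

print(can_interchange(("t1", "+", "b", "c"), ("t2", "+", "x", "y")))
print(can_interchange(("t1", "+", "b", "c"), ("t2", "+", "t1", "y")))
```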
2. Designate as output nodes those N whose values are live on exit, an officially-mysterious
term meaning values possibly used in another block. (Determining the live on exit values
requires global, i.e., inter-block, flow analysis.) As we shall see in the next few sections
various basic-block optimizations are facilitated by using the DAG.
Finding Local Common Subexpressions
As we create nodes for each statement, proceeding in the static order of the statements, we
might notice that a new node is just like one already in the DAG in which case we don't need
a new node and can use the old node to compute the new value in addition to the one it
already was computing. Specifically, we do not construct a new node if an existing node has
the same children in the same order and is labeled with the same operation.
Consider computing the DAG for the following block of code.
a=b+c
c=a+x
d=b+c
b=a+x
The DAG construction is explained as follows.
1. First we construct leaves with the initial values.
2. Next we process a = b + c. This produces a node labeled + with a attached and having b0
and c0 as children.
3. Next we process c = a + x.
4. Next we process d = b + c. Although we have already computed b + c in the first
statement, the c's are not the same, so we produce a new node.
5. Then we process b = a + x. Since we have already computed a + x in statement 2, we do
not produce a new node, but instead attach b to the old node.
6. Finally, we tidy up and erase the unused initial values.
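The steps above can be sketched in Python. This is an illustration under assumptions:
statements are tuples (dst, op, arg1, arg2), the function name build_block_dag is
hypothetical, and leaves for initial values such as b0 are created on first use rather than
all up front, so no unused leaves need to be erased.

```python
def build_block_dag(block):
    """Return (nodes, current) for a basic block.

    nodes[i] is (label, left id, right id). current maps each variable
    to the id of the node that holds its latest value.
    """
    nodes = []
    index = {}     # (label, left id, right id) -> node id
    current = {}

    def get(key):
        if key not in index:               # reuse an identical node if present
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def operand(name):
        if name not in current:            # first use: leaf for the initial value
            current[name] = get((name + "0", None, None))
        return current[name]

    for dst, op, arg1, arg2 in block:
        left, right = operand(arg1), operand(arg2)
        current[dst] = get((op, left, right))   # attach dst to the node
    return nodes, current

block = [
    ("a", "+", "b", "c"),
    ("c", "+", "a", "x"),
    ("d", "+", "b", "c"),
    ("b", "+", "a", "x"),
]

nodes, current = build_block_dag(block)
for node_id, node in enumerate(nodes):
    names = sorted(v for v, n in current.items() if n == node_id)
    print(node_id, node, names)
```

In the result, b and c are attached to the same node, while d has its own node because its
right operand is the new c rather than c0.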
You might think that with only three computation nodes in the DAG, the block could be
reduced to three statements (dropping the computation of b). However, this is wrong. Only if
b is dead on exit can we omit the computation of b. We can, however, replace the last
statement with the simpler b = c. Sometimes a combination of techniques finds
improvements that no single technique would find. For example if a-b is computed, then both
a and b are incremented by one, and then a-b is computed again, it will not be recognized as a
common subexpression even though the value has not changed. However, when combined
with various algebraic transformations, the common value can be recognized.
Strength reduction
Another class of simplifications is strength reduction, where we replace one operation by a
cheaper one. A simple example is replacing 2*x by x+x on architectures where addition is
cheaper than multiplication. A more sophisticated strength reduction is applied by compilers
that recognize induction variables (loop indices). Inside a for i from 1 to N loop, the
expression 4*i can be strength reduced to j=j+4 and 2^i can be strength reduced to j=2*j
(with suitable initializations of j just before the loop). Other uses of algebraic identities are
possible; many require a careful reading of the language
reference manual to ensure their legality. For example, even though it might be advantageous
to convert ((a + b) * f(x)) * a to ((a + b) * a) * f(x)
it is illegal in Fortran since the programmer's use of parentheses to specify the order of
operations cannot be violated.
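The induction-variable case described above can be checked with a small Python sketch. The
function names are hypothetical. Each reduced version keeps a running variable j,
initialized just before the loop, in place of the multiplication or exponentiation.

```python
def scaled_indices(n):
    """Original form: computes 4*i on every iteration."""
    return [4 * i for i in range(1, n + 1)]

def scaled_indices_reduced(n):
    """Strength-reduced form: 4*i becomes j = j + 4."""
    values = []
    j = 0                          # initialization just before the loop
    for _ in range(1, n + 1):
        j = j + 4
        values.append(j)
    return values

def powers(n):
    """Original form: computes 2^i on every iteration."""
    return [2 ** i for i in range(1, n + 1)]

def powers_reduced(n):
    """Strength-reduced form: 2^i becomes j = 2*j."""
    values = []
    j = 1                          # initialization just before the loop
    for _ in range(1, n + 1):
        j = 2 * j
        values.append(j)
    return values

print(scaled_indices(5) == scaled_indices_reduced(5))
print(powers(5) == powers_reduced(5))
```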
Does
a=b+c
x=y+c+b+r
contain a common sub expression of b+c that need be evaluated only once?
The answer depends on whether the language permits the use of the associative and
commutative law for addition. (Note that the associative law is invalid for floating point
numbers.)
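The floating-point caveat is easy to observe. The Python sketch below uses a hypothetical
helper, reassociation_changes_result, to compare the two groupings of a three-term sum in
IEEE 754 double precision, which is what Python floats use on common platforms.

```python
def reassociation_changes_result(a, b, c):
    """True if (a + b) + c differs from a + (b + c)."""
    return (a + b) + c != a + (b + c)

# Floating point: the grouping matters for these values.
print(reassociation_changes_result(0.1, 0.2, 0.3))

# Integers: addition is associative, so the grouping does not matter.
print(reassociation_changes_result(1, 2, 3))
```

A compiler that rewrote y+c+b+r to reuse b+c would therefore risk changing a floating-point
result, even though the same rewrite is safe for integers that do not overflow.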