Intermediate Code Generation
- Types of Three address code,
Representation, Declarations
Intermediate Code Generation
• Facilitates retargeting: enables attaching a back end
for the new machine to an existing front end
Front end → intermediate code → Back end → target machine code
• Enables machine-independent code optimization
Intermediate Representations
• Graphical representations
• AST
• Postfix notation: operations on values stored on operand stack
• JVM bytecode
• Three-address code: x := y op z
• Variation of three-address code: two-address code, x := op y
Syntax-Directed Translation of
Abstract Syntax Trees
Production      Semantic Rule
S → id := E     S.nptr := mknode(':=', mkleaf(id, id.entry), E.nptr)
E → E1 + E2     E.nptr := mknode('+', E1.nptr, E2.nptr)
E → E1 * E2     E.nptr := mknode('*', E1.nptr, E2.nptr)
E → - E1        E.nptr := mknode('uminus', E1.nptr)
E → ( E1 )      E.nptr := E1.nptr
E → id          E.nptr := mkleaf(id, id.entry)
Abstract Syntax Trees
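Input: a * (b + c)
Annotated parse tree: every E node carries an E.nptr attribute, and the ( E ) node passes the pointer of its inner expression up unchanged.
Resulting AST:
    *
   / \
  a   +
     / \
    b   c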
Abstract Syntax Trees versus DAGs
a := b * -c + b * -c
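Tree (the subtree for b * -c appears twice):
:=
  a
  +
    *
      b
      uminus
        c
    *
      b
      uminus
        c
DAG (both operands of + point to the same * node):
:=
  a
  +
    * (shared by both operands)
      b
      uminus
        c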
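The difference between the tree and the DAG can be sketched in code. The following is a minimal illustration, not the slides' own implementation: it assumes a hypothetical DagBuilder class whose mknode and mkleaf return an existing node whenever one with the same operator and children was already created. The tuple-based node table is also an assumption made for brevity.

```python
class DagBuilder:
    """Hypothetical helper: mknode/mkleaf that share structurally equal nodes."""

    def __init__(self):
        self.nodes = []    # node id -> (op, left, right)
        self.index = {}    # (op, left, right) -> node id

    def mknode(self, op, left=None, right=None):
        key = (op, left, right)
        if key not in self.index:          # first occurrence: allocate a node
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]             # otherwise reuse the existing node

    def mkleaf(self, name):
        return self.mknode("id", name)


def build_assignment(dag):
    """Build a := b * -c + b * -c, calling mknode as the semantic rules would."""
    def product():
        return dag.mknode("*", dag.mkleaf("b"), dag.mknode("uminus", dag.mkleaf("c")))
    return dag.mknode(":=", dag.mkleaf("a"), dag.mknode("+", product(), product()))


dag = DagBuilder()
root = build_assignment(dag)
print(len(dag.nodes))  # number of distinct nodes in the DAG
```

With sharing, the statement needs 7 nodes; the plain syntax tree for the same statement has 11.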
Postfix Notation
a := b * -c + b * -c
Postfix: a b c uminus * b c uminus * + assign
Postfix notation represents operations on a stack
Bytecode (for example):
iload 2 // push b
iload 3 // push c
ineg // uminus
imul // *
iload 2 // push b
iload 3 // push c
ineg // uminus
imul // *
iadd // +
istore 1 // store a
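As a rough sketch of how postfix code relates to the syntax tree and to an operand stack, the following assumes the tree is written as nested Python tuples (an encoding chosen here only for illustration). to_postfix is a post-order walk, and run_postfix is a small hypothetical stack interpreter for the three operators used in the example.

```python
def to_postfix(node):
    """Post-order walk: emit the operands first, then the operator."""
    if isinstance(node, str):
        return [node]
    op, *children = node
    tokens = []
    for child in children:
        tokens.extend(to_postfix(child))
    tokens.append(op)
    return tokens


def run_postfix(tokens, env):
    """Evaluate postfix tokens on an operand stack; returns the updated variables."""
    env = dict(env)
    stack = []

    def value(x):
        # Names stay on the stack as names and are looked up only when used.
        return env[x] if isinstance(x, str) else x

    for tok in tokens:
        if tok == "uminus":
            stack.append(-value(stack.pop()))
        elif tok in ("+", "*"):
            right = value(stack.pop())
            left = value(stack.pop())
            stack.append(left + right if tok == "+" else left * right)
        elif tok == "assign":
            right = value(stack.pop())
            target = stack.pop()           # the name being assigned to
            env[target] = right
        else:
            stack.append(tok)              # operand: push its name
    return env


ast = ("assign", "a",
       ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c"))))
tokens = to_postfix(ast)
print(" ".join(tokens))
print(run_postfix(tokens, {"b": 2, "c": 3})["a"])
```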
Three-Address Code
a := b * -c + b * -c
Linearized representation of a syntax tree:
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5
Linearized representation of a syntax DAG:
t1 := - c
t2 := b * t1
t5 := t2 + t2
a := t5
Three address code
• In three-address code there is at most one operator on the right side of an instruction
• Example: a + a * (b - c) + (b - c) * d, where the DAG shares the node for b - c
t1 = b - c
t2 = a * t1
t3 = a + t2
t4 = t1 * d
t5 = t3 + t4
Types of three address codes
• x := y op z
• x := op y
• x := y
• goto L
• if x goto L and ifFalse x goto L
• if x relop y goto L
Types of three address code
• Procedure calls using:
• param x
• call p, n
• y := call p, n
• return y
• x := y[i] and x[i] := y
• x := &y
• x := *y and *x := y
Example
• do i = i+1; while (a[i] < v);
Symbolic labels:
L: t1 = i + 1
   i = t1
   t2 = i * 8
   t3 = a[t2]
   if t3 < v goto L
Position numbers:
100: t1 = i + 1
101: i = t1
102: t2 = i * 8
103: t3 = a[t2]
104: if t3 < v goto 100
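The step from symbolic labels to position numbers can be sketched as a small pass over the instruction list. This is an illustrative sketch only: the (label, text) pair encoding and the resolve_labels function are assumptions made here, and the starting position 100 is taken from the example.

```python
def resolve_labels(instrs, start=100):
    """Replace symbolic jump targets with instruction position numbers.

    instrs is a list of (label, text) pairs; label is None for unlabeled
    instructions. A jump is any text ending in 'goto <label>'.
    """
    # First pass: record the position of every labeled instruction.
    positions = {label: start + i
                 for i, (label, _) in enumerate(instrs) if label}
    # Second pass: rewrite jump targets and prefix each position number.
    out = []
    for i, (_, text) in enumerate(instrs):
        head, sep, target = text.rpartition("goto ")
        if sep and target in positions:
            text = f"{head}goto {positions[target]}"
        out.append(f"{start + i}: {text}")
    return out


loop = [
    ("L", "t1 = i + 1"),
    (None, "i = t1"),
    (None, "t2 = i * 8"),
    (None, "t3 = a[t2]"),
    (None, "if t3 < v goto L"),
]
for line in resolve_labels(loop):
    print(line)
```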
Representing three address codes
• Quadruples
• Has four fields: op, arg1, arg2 and result
• The contents of arg1, arg2, and result are usually pointers to the symbol table
entries for the names represented by these fields
• Unary operators like x := -y do not use arg2
• Operators like param use neither arg2 nor result
• Conditional and unconditional jumps put the target label in result
Three address code
Example: a := b * -c + b * -c
Three-address code:
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5
Quadruples:
     op      arg1  arg2  result
(0)  uminus  c           t1
(1)  *       b     t1    t2
(2)  uminus  c           t3
(3)  *       b     t3    t4
(4)  +       t2    t4    t5
(5)  :=      t5          a
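A quadruple maps naturally onto a four-field record. The sketch below is one possible encoding, assuming plain strings in place of symbol-table pointers; the Quad type and the to_text helper are names introduced here for illustration.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]   # unused by unary operators and copies
    result: str


# a := b * -c + b * -c
quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),
]


def to_text(q):
    """Render one quadruple as a three-address statement."""
    if q.op == ":=":
        return f"{q.result} := {q.arg1}"
    if q.arg2 is None:
        return f"{q.result} := {q.op} {q.arg1}"
    return f"{q.result} := {q.arg1} {q.op} {q.arg2}"


for i, q in enumerate(quads):
    print(f"({i}) {to_text(q)}")
```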
Representing three address codes
• Triples
• To avoid entering temporary names into the symbol table, we can refer to a
temporary value by the position of the statement that computes it
• Has 3 fields: op, arg1, and arg2
• arg1, and arg2 are either pointers to the symbol table (for programmer-
defined names or constants) or pointers to the triple structure (for temporary
values)
• Parenthesized numbers represent pointers into the triple structure itself
Example
• Example: a := b * -c + b * -c
Three-address code:
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5
Triples:
     op      arg1  arg2
(0)  uminus  c
(1)  *       b     (0)
(2)  uminus  c
(3)  *       b     (2)
(4)  +       (1)   (3)
(5)  :=      a     (4)
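To make the relationship between temporaries and triple positions concrete, here is a minimal sketch that converts quadruples into triples. The encoding is an assumption made for this illustration: names are strings, and a reference to an earlier triple is a Python int holding its position. The function quads_to_triples is hypothetical and handles only the operators of the example.

```python
def quads_to_triples(quads):
    """Replace each temporary by the position of the triple that computes it."""
    position = {}   # temporary name -> index of its defining triple
    triples = []

    def ref(arg):
        # A temporary becomes a position number; other names stay as they are.
        return position.get(arg, arg)

    for op, arg1, arg2, result in quads:
        if op == ":=":
            # A copy into a programmer-defined name: the target goes in arg1.
            triples.append((op, result, ref(arg1)))
        else:
            position[result] = len(triples)
            triples.append((op, ref(arg1), ref(arg2)))
    return triples


quads = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "a"),
]
triples = quads_to_triples(quads)
for i, t in enumerate(triples):
    print(i, t)
```

No temporary name survives the conversion, which is the point of the representation: nothing has to be entered into the symbol table for t1 to t5.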
Triples for arrays – ternary operation
x[i] := y
     op      arg1  arg2
(0)  []=     x     i
(1)  assign  (0)   y
x := y[i]
     op      arg1  arg2
(0)  =[]     y     i
(1)  assign  x     (0)
Representing three address codes
• Indirect triples
• In addition to the triples, we keep a separate list of pointers to triples
• The list gives the order in which the triples are to be executed
Example
Instruction list (pointers to triples):
      statement
(30)  (0)
(31)  (1)
(32)  (2)
(33)  (3)
(34)  (4)
(35)  (5)
Triples:
     op      arg1  arg2
(0)  uminus  c
(1)  *       b     (0)
(2)  uminus  c
(3)  *       b     (2)
(4)  +       (1)   (3)
(5)  :=      a     (4)
Comparison of Representations
• Benefit of quadruples over triples can be seen in an optimizing
compiler, where instructions are often moved around.
• With quadruples, if we move an instruction that computes a temporary t,
then the instructions that use t require no change.
• With triples, the result of an operation is referred to by its position, so moving
an instruction may require us to change all references to that result.
• This problem does not occur with indirect triples: an optimizing compiler can
move an instruction by reordering the instruction list, without affecting the
triples themselves.
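The reordering argument can be checked with a small experiment. The sketch below assumes triples encoded as Python tuples, with an int argument standing for a reference to another triple, and a hypothetical run function that executes the triples in the order given by a separate instruction list. Only the instruction list changes between the two runs.

```python
# Triples for a := b * -c + b * -c; an int argument refers to another triple.
triples = [
    ("uminus", "c", None),   # (0)
    ("*", "b", 0),           # (1)
    ("uminus", "c", None),   # (2)
    ("*", "b", 2),           # (3)
    ("+", 1, 3),             # (4)
    (":=", "a", 4),          # (5)
]


def run(order, triples, env):
    """Execute the triples in the order given by the instruction list."""
    env = dict(env)
    results = {}             # triple index -> computed value

    def get(arg):
        return results[arg] if isinstance(arg, int) else env[arg]

    for i in order:
        op, arg1, arg2 = triples[i]
        if op == "uminus":
            results[i] = -get(arg1)
        elif op == "*":
            results[i] = get(arg1) * get(arg2)
        elif op == "+":
            results[i] = get(arg1) + get(arg2)
        elif op == ":=":
            env[arg1] = get(arg2)
    return env


original = [0, 1, 2, 3, 4, 5]
reordered = [0, 2, 1, 3, 4, 5]   # triple (2) moved ahead of triple (1)
print(run(original, triples, {"b": 2, "c": 3}))
print(run(reordered, triples, {"b": 2, "c": 3}))
```

Moving triple (2) needed no change to any triple. With plain triples, the same move would renumber (1) and (2) and force the references in (3) and (4) to be updated.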
SDT into Three address code
• Three-address code is generated construct by construct, following the grammar
• Non-terminals have attributes
• Code: the sequence of three-address statements that evaluates the expression
• Place (written addr in the rules that follow): the address or name that holds the value
• Value
Three address code for expression
Production        Semantic Rules
S → id := E ;     S.code = E.code || gen(top.get(id.lexeme) ':=' E.addr)
E → E1 + E2       E.addr = new Temp()
                  E.code = E1.code || E2.code ||
                           gen(E.addr ':=' E1.addr '+' E2.addr)
E → - E1          E.addr = new Temp()
                  E.code = E1.code || gen(E.addr ':=' 'uminus' E1.addr)
E → ( E1 )        E.addr = E1.addr
                  E.code = E1.code
Three address code
Production        Semantic Rules
E → E1 * E2       E.addr = new Temp()
                  E.code = E1.code || E2.code ||
                           gen(E.addr ':=' E1.addr '*' E2.addr)
E → id            E.addr = top.get(id.lexeme)
                  E.code = '' (empty)
Example
• a := b + c * d
S ⇒ id := E ⇒ id := E + E ⇒ id := E + E * E ⇒ id := id + id * id
The corresponding syntax tree would be
    :=
   /  \
  a    +
      / \
     b   *
        / \
       c   d
Example
* node: new temp t1, with E1.addr = c and E2.addr = d
    t1 := c * d
+ node: new temp t2, with E1.addr = b and E2.addr = t1
    t2 := b + t1
Root node:
    a := t2
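The way the addr and code attributes are synthesized bottom-up can be sketched as a recursive function that returns both for each node. This is an illustrative sketch, assuming the syntax tree is given as nested Python tuples; translate and its temporary naming (t1, t2, ...) are assumptions, and list concatenation plays the role of the || operator.

```python
import itertools


def translate(stmt):
    """Return the three-address code for an assignment (':=', target, expr)."""
    temps = (f"t{i}" for i in itertools.count(1))   # plays the role of new Temp()

    def expr(node):
        """Return (addr, code) for an expression node."""
        if isinstance(node, str):                   # E -> id: no code needed
            return node, []
        if node[0] == "uminus":                     # E -> - E1
            addr1, code1 = expr(node[1])
            t = next(temps)
            return t, code1 + [f"{t} := uminus {addr1}"]
        op, left, right = node                      # E -> E1 op E2
        addr1, code1 = expr(left)
        addr2, code2 = expr(right)
        t = next(temps)
        return t, code1 + code2 + [f"{t} := {addr1} {op} {addr2}"]

    _, target, rhs = stmt
    addr, code = expr(rhs)
    return code + [f"{target} := {addr}"]


for line in translate((":=", "a", ("+", "b", ("*", "c", "d")))):
    print(line)
```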
Declarations – Three address code
• Declarations can appear inside a procedure, so the compiler must track each variable's scope and keep a symbol table
• Semantic rules compute the relative address of each declared name while the three-address code is generated
• Each name is recorded with its type, width and offset
Declarations
Production                  Semantic rules
P → D                       { offset = 0 }  (performed before D is processed)
D → D ; D
D → id : T                  { enter(id.name, T.type, offset);
                              offset = offset + T.width }
T → integer                 { T.type = integer;
                              T.width = 4 }
T → real                    { T.type = real;
                              T.width = 8 }
T → array [ num ] of T1     { T.type = array(num.val, T1.type);
                              T.width = num.val * T1.width }
Declarations
Production                  Semantic rules
T → ↑ T1                    { T.type = pointer(T1.type);
                              T.width = 4 }
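The offset bookkeeping in these rules amounts to a running sum of widths. A minimal sketch follows, using the widths from the rules (integer 4, real 8, pointer 4). Representing types as strings and tuples, and the function names type_width and layout, are assumptions made for the illustration.

```python
def type_width(t):
    """Width in bytes of a type, following the T productions."""
    if t == "integer":
        return 4
    if t == "real":
        return 8
    kind = t[0]
    if kind == "array":                 # ("array", num, element_type)
        _, num, elem = t
        return num * type_width(elem)
    if kind == "pointer":               # ("pointer", target_type)
        return 4
    raise ValueError(f"unknown type: {t!r}")


def layout(decls):
    """Process 'id : T' declarations in order; return (table, total width)."""
    offset = 0
    table = {}
    for name, t in decls:
        table[name] = (t, offset)       # enter(id.name, T.type, offset)
        offset += type_width(t)         # offset = offset + T.width
    return table, offset


table, total = layout([
    ("i", "integer"),
    ("x", "real"),
    ("a", ("array", 10, "integer")),
    ("p", ("pointer", "real")),
])
for name, (t, off) in table.items():
    print(name, off)
print("total width:", total)
```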
Declarations
• P → D
• D → D ; D | id : T | proc id ; D ; S
• A new symbol table is created when proc id ; D ; S is encountered
Symbol Table Functions
• mktable(previous) creates a new symbol table and returns a pointer
to the new table that is linked to a previous table in the outer scope.
The pointer previous is placed in the header of the new symbol table.
• enter(table, name, type, offset) creates a new entry for name in the
symbol table pointed to by table
• Arguments: the address of the current table, then the variable's name, type and offset
• addwidth(table, width) records the cumulative width of all entries in
table in the header associated with this symbol table
Symbol Table functions
• enterproc(table, name, newtable) creates a new entry in table for
procedure name with local scope newtable
• Arguments: the existing (enclosing) table, the procedure name, and the address of the new table
• lookup(table, name) returns a pointer to the entry in the table for
name by following linked tables
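The five functions can be sketched over a simple linked table structure. This is a sketch under stated assumptions rather than a definitive implementation: the Table class, the use of Python dictionaries for entries, and the "proc" type tag are all choices made here for illustration. Only the function names and argument orders come from the descriptions above.

```python
class Table:
    """Hypothetical symbol table: a header (previous, width) plus entries."""

    def __init__(self, previous=None):
        self.previous = previous    # link to the table of the outer scope
        self.width = 0              # cumulative width, set by addwidth
        self.entries = {}


def mktable(previous):
    return Table(previous)


def enter(table, name, type_, offset):
    table.entries[name] = {"type": type_, "offset": offset}


def addwidth(table, width):
    table.width = width


def enterproc(table, name, newtable):
    table.entries[name] = {"type": "proc", "table": newtable}


def lookup(table, name):
    """Search the current table, then each enclosing table in turn."""
    while table is not None:
        if name in table.entries:
            return table.entries[name]
        table = table.previous
    return None


# A global x and a procedure p with one local variable t.
globals_table = mktable(None)
enter(globals_table, "x", "integer", 0)
local = mktable(globals_table)
enter(local, "t", "integer", 0)
addwidth(local, 4)
enterproc(globals_table, "p", local)
addwidth(globals_table, 4)

print(lookup(local, "x"))           # found by following the link outward
print(lookup(globals_table, "t"))   # not visible from the outer scope
```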
Example
struct S
{ int a;
  int b;
} s;
void swap(int a, int b)
{ int t;
  t = a;
  a = b;
  b = t;
}
Table nodes, with each entry's (offset) and each table's [width]:
globals:    prev=nil  [4]    s (0), swap, foo
Trec S:     prev=nil  [8]    a (0), b (4)
Tfun swap:  prev      [12]   a (0), b (4), t (8)
Tfun foo:   prev      [0]
Type nodes shown in the figure: Trec S, Tfun swap, Tfun foo, Tref, Tint
Example
void foo()
{ …
swap(s.a, s.b);
…
}
Example function call
void foo()
{ call A()
  call B()
  call C()
}
void A()
{ call D()
}
Calling stack
Every snapshot has Main at the bottom with foo above it. A, B and C are pushed above foo in turn, and D is pushed above A while A is running.
Symbol Table semantic rules
Productions                Semantic Rule
P → M D ; S                { addwidth(top(tblptr), top(offset));
                             pop(tblptr); pop(offset) }
M → ε                      { t := mktable(nil); push(t, tblptr); push(0, offset) }
D → id : T                 { enter(top(tblptr), id.name, T.type, top(offset));
                             top(offset) := top(offset) + T.width }
D → proc id ; N D1 ; S     { t := top(tblptr); addwidth(t, top(offset));
                             pop(tblptr); pop(offset);
                             enterproc(top(tblptr), id.name, t) }
Symbol Table creation
Productions                Semantic Rule
N → ε                      { t := mktable(top(tblptr)); push(t, tblptr); push(0, offset) }
D → D1 ; D2
T → integer                { T.type := 'integer'; T.width := 4 }
T → real                   { T.type := 'real'; T.width := 8 }
T → array [ num ] of T1    { T.type := array(num.val, T1.type);
                             T.width := num.val * T1.width }
T → ^ T1                   { T.type := pointer(T1.type); T.width := 4 }
Symbol table tracking
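• A stack, tblptr, keeps track of the symbol tables of the enclosing scopes; a parallel stack, offset, holds the next free offset in each
• When a procedure declaration is encountered, a new symbol table is created and its
pointer is pushed onto tblptr, with 0 pushed onto offset
• When the procedure declaration has been processed, tblptr and offset are popped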
Declarations and Records in Pascal
Production               Semantic rule
T → record L D end       { T.type := record(top(tblptr)); T.width := top(offset);
                           addwidth(top(tblptr), top(offset)); pop(tblptr);
                           pop(offset) }
L → ε                    { t := mktable(nil); push(t, tblptr); push(0, offset) }
SDT’s of Statements
• S → S ; S
• S → id := E
    { p := lookup(top(tblptr), id.name);
      if p = nil then
          error()
      else if p.level = 0 then     // global variable
          emit(id.place ':=' E.place)
      else                         // local variable in subroutine frame
          emit(fp[p.offset] ':=' E.place) }
Assignment statements
• Names are kept in the symbol table
• Variables are referred to by their addresses
• lookup(id.name) finds the entry for variable id in the symbol table
• emit is used to write three-address statements to the output file
Translation scheme
Production        Semantic Rules
S → id := E ;     { p := lookup(id.name);
                    if p ≠ nil then emit(p ':=' E.place) else error }
E → E1 + E2       { E.place := newtemp();
                    emit(E.place ':=' E1.place '+' E2.place) }
E → E1 * E2       { E.place := newtemp();
                    emit(E.place ':=' E1.place '*' E2.place) }
E → - E1          { E.place := newtemp();
                    emit(E.place ':=' 'uminus' E1.place) }
E → ( E1 )        { E.place := E1.place }
E → id            { p := lookup(id.name);
                    if p ≠ nil then E.place := p else error }
Reusing Temporary Names
For E1 + E2, generate:
    Evaluate E1 into t1
    Evaluate E2 into t2
    t3 := t1 + t2
If t1 is no longer used, t1 can be reused instead of a new temporary t3
Modify newtemp() to use a "stack":
• Keep a counter c, initialized to 0
• newtemp() returns temporary $c and then increments c
• Decrement the counter on each use of a $i as an operand in a three-address statement
Reusing temporary name
x := a * b + c * d - e * f
Statement           c (after the statement)
                    0
$0 := a * b         1
$1 := c * d         2
$0 := $0 + $1       1
$1 := e * f         2
$0 := $0 - $1       1
x := $0             0
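The counter scheme can be traced with a short sketch. It assumes the expression is given as nested Python tuples and that every temporary is used exactly once, in last-in first-out order, which is what makes a single counter sufficient. TempStack and gen are hypothetical names, and the trace list records the value of c after each emitted statement.

```python
class TempStack:
    """Counter-based temporary allocator, as described for newtemp()."""

    def __init__(self):
        self.c = 0

    def newtemp(self):
        name = f"${self.c}"       # hand out $c, then increment c
        self.c += 1
        return name

    def use(self, addr):
        if addr.startswith("$"):  # each use of a temporary decrements c
            self.c -= 1


def gen(target, expr_tree):
    """Emit code for target := expr_tree; also record c after each statement."""
    temps = TempStack()
    code, trace = [], []

    def emit(text):
        code.append(text)
        trace.append(temps.c)

    def expr(node):
        if isinstance(node, str):
            return node
        op, left, right = node
        addr1 = expr(left)
        addr2 = expr(right)
        temps.use(addr1)          # operands are consumed by this statement
        temps.use(addr2)
        t = temps.newtemp()
        emit(f"{t} := {addr1} {op} {addr2}")
        return t

    addr = expr(expr_tree)
    temps.use(addr)
    emit(f"{target} := {addr}")
    return code, trace


# x := a * b + c * d - e * f
tree = ("-", ("+", ("*", "a", "b"), ("*", "c", "d")), ("*", "e", "f"))
code, trace = gen("x", tree)
for line, c in zip(code, trace):
    print(f"{line:<16}{c}")
```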