3 Intermediate Code Generation
LEARNING OBJECTIVES
P11 = makeleaf (id, d)
P12 = makenode (*, P10, P11)
P13 = makenode (+, P7, P12)

Example 2: a := a - 10
[Figure: tree for a := a - 10, with root :=, an interior node -, and leaves a and 10]

Three-Address Code
In three-address code, each statement usually contains three addresses: two for the operands and one for the result.
Example: x = y OP z
• x, y, z are names, constants or compiler-generated temporaries.
• OP stands for any operator: an arithmetic operator or a logical operator.

Example: Consider the statement x = y * -z + y * -z
[Figure: syntax tree with root =, whose children are x and +; each child of + is a * node with children y and unary-minus; each unary-minus has child z]

The corresponding three-address code will be like this:

Syntax Tree         DAG
t1 = -z             t1 = -z
t2 = y * t1         t2 = y * t1
t3 = -z             t5 = t2 + t2
t4 = y * t3         x = t5
t5 = t4 + t2
x = t5

The postfix notation for the syntax tree is: x y z unaryminus * y z unaryminus * + =.
• Three-address code is a 'linearized representation' of a syntax tree.
• Basic data of all variables can be formulated as syntax-directed translation. Add attributes whenever necessary.

Example: Consider the SDD below with the following specifications:
E might have E.place and E.code.
E.place: the name that holds the value of E.
E.code: the sequence of intermediate code statements that evaluates E.
newtemp: returns a new temporary variable each time it is called.
newlabel: returns a new label.
Then the SDD to produce three-address code for expressions is given below:
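The following is only an illustrative sketch in C, not the SDD itself. It shows how E.place and newtemp cooperate when code is generated bottom-up from a syntax tree. The Node layout, the tac buffer and the function names gen_expr and gen_assign are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical AST node: op is 'i' for an identifier leaf, 'u' for unary
   minus, or a binary operator character such as '+' or '*'. */
typedef struct Node {
    char op;
    const char *name;          /* identifier name, leaves only */
    struct Node *left, *right; /* right is NULL for unary minus */
} Node;

char tac[1024];   /* emitted three-address code, one statement per line */
int temp_count;   /* counter behind newtemp() */

/* newtemp: writes a fresh temporary name (t1, t2, ...) into buf. */
void newtemp(char *buf, size_t n) {
    snprintf(buf, n, "t%d", ++temp_count);
}

/* gen_expr: emits code for e and stores E.place in place. */
void gen_expr(const Node *e, char *place, size_t n) {
    char l[16], r[16], line[64];
    if (e->op == 'i') {               /* E -> id: the place is the name itself */
        snprintf(place, n, "%s", e->name);
        return;
    }
    gen_expr(e->left, l, sizeof l);
    if (e->op == 'u') {               /* E -> -E1 */
        newtemp(place, n);
        snprintf(line, sizeof line, "%s = -%s\n", place, l);
    } else {                          /* E -> E1 op E2 */
        gen_expr(e->right, r, sizeof r);
        newtemp(place, n);
        snprintf(line, sizeof line, "%s = %s %c %s\n", place, l, e->op, r);
    }
    strncat(tac, line, sizeof tac - strlen(tac) - 1);
}

/* gen_assign: emits code for "target = e". */
void gen_assign(const char *target, const Node *e) {
    char place[16], line[64];
    gen_expr(e, place, sizeof place);
    snprintf(line, sizeof line, "%s = %s\n", target, place);
    strncat(tac, line, sizeof tac - strlen(tac) - 1);
}
```

Calling gen_assign("x", tree) on a tree for y * -z + y * -z that does not share subtrees yields six statements using t1 to t5, in the same shape as the syntax-tree column of the example. Sharing nodes as in a DAG would need an extra lookup that this sketch leaves out.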
Error handling routine: error-msg (error information);
The error messages can be written and stored in another file.

Temp space management:
• This is used for generating code for expressions.
• newtemp (): allocates a temp space.
• freetemp (): frees t if it is allocated in the temp space.

Label management
• This is needed in generating branching statements.
• newlabel (): generates a label in the target code that has never been used.

Type conversions

Start_addr: starting address
1D Array: A[i]
• Start_addr + (i - low) * w = i * w + (start_addr - low * w)
• The value called base, (start_addr - low * w), can be computed at compile time and then stored in the symbol table.
Example: array [-8 … 100] of integer.
This declares an integer array with elements [-8], [-7], …, [100] in Pascal.

2D Array: A[i1, i2]
Row major order: row by row. A[i] means the ith row.
1st row: A[1, 1]
         A[1, 2]

For k dimensions, with $n_i$ the number of elements along dimension $i$ and $w$ the element width:
$$\ldots = \left( i_1 \prod_{i=2}^{k} n_i + i_2 \prod_{i=3}^{k} n_i + \cdots + i_k \right) \times w$$
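The two address calculations above can be sketched in C as follows. The function names and the dims array layout are assumptions made for this illustration. The multi-dimensional version evaluates the sum of products by Horner's rule, which gives the same value with fewer multiplications.

```c
#include <stddef.h>

/* Address of A[i] for a 1D array declared A[low..high] with element
   width w: i * w + base, where base = start_addr - low * w can be
   computed once at compile time. */
long addr_1d(long start_addr, long low, long w, long i) {
    long base = start_addr - low * w;
    return i * w + base;
}

/* Relative offset of A[i1, ..., ik] in row-major order:
   ((i1 * n2 + i2) * n3 + ... + ik) * w.
   idx[j] holds i_(j+1); dims[j] holds n_(j+1). dims[0] is never read. */
long offset_row_major(const long *idx, const long *dims, size_t k, long w) {
    long e = idx[0];
    for (size_t j = 1; j < k; j++)
        e = e * dims[j] + idx[j];
    return e * w;
}
```

For the Pascal declaration array [-8 … 100] of integer with 4-byte elements starting at address 1000, addr_1d(1000, -8, 4, -8) gives 1000, the address of the first element.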
L → id
L → Elist ]
Elist → Elist1, E
Elist → id [ E
E → id
E → E + E
E → (E)

• S → L := E        { if L.offset = null then   /* L is a simple id */
                          gen (L.place, ":=", E.place);
                      else
                          gen (L.place, "[", L.offset, "]", ":=", E.place); }
• E → E1 + E2       { E.place := newtemp ();
                      gen (E.place, ":=", E1.place, "+", E2.place); }
• E → (E1)          { E.place := E1.place }
• E → L             { if L.offset = null then   /* L is a simple id */
                          E.place := L.place;
                      else begin
                          E.place := newtemp ();
                          gen (E.place, ":=", L.place, "[", L.offset, "]");
                      end }
• L → id            { P := lookup (id.name, top (tblptr));
                      if P ≠ null then
                      begin
                          L.place := P.place;
                          L.offset := null;
                      end
                      else
                          error ("Var undefined", id.name); }
• L → Elist ]       { L.offset := newtemp ();
                      gen (L.offset, ":=", Elist.elesize, "*", Elist.place);
                      freetemp (Elist.place);
                      L.place := Elist.base; }
• Elist → Elist1, E { t := newtemp (); m := Elist1.ndim + 1;
                      gen (t, ":=", Elist1.place, "*", limit (Elist1.array, m));
                      gen (t, ":=", t, "+", E.place); freetemp (E.place);
                      Elist.array := Elist1.array;
                      Elist.place := t; Elist.ndim := m; }
• Elist → id [ E    { Elist.place := E.place; Elist.ndim := 1;
                      P := lookup (id.name, top (tblptr)); check for id errors;
                      Elist.elesize := P.size; Elist.base := P.base;
                      Elist.array := P.place; }
• E → id            { P := lookup (id.name, top (tblptr));
                      check for id errors; E.place := P.place; }

Boolean Expressions
There are two choices for implementation of Boolean expressions:
1. Numerical representation
2. Flow of control

Numerical representation: Encode true and false values numerically, 1: true, 0: false.
Flow of control: Representing the value of a Boolean expression by a position reached in a program.
Short circuit code: Generate the code to evaluate a Boolean expression in such a way that it is not necessary for the code to evaluate the entire expression.
• If a1 or a2: when a1 is true, a2 is not evaluated.
• If a1 and a2: when a1 is false, a2 is not evaluated.

Numerical representation
B → id1 relop id2
{ B.place := newtemp ();
  gen ("if", id1.place, relop.op, id2.place, "goto", nextstat + 3);
  gen (B.place, ":=", "0");
  gen ("goto", nextstat + 2);
  gen (B.place, ":=", "1"); }

Example 1: Translate the statement (if a < b or c < d and e < f) without short circuit evaluation.
100: if a < b goto 103
101: t1 := 0
102: goto 104
103: t1 := 1      /* true */
104: if c < d goto 107
105: t2 := 0      /* false */
106: goto 108
107: t2 := 1
108: if e < f goto 111
109: t3 := 0
110: goto 112
111: t3 := 1
112: t4 := t2 and t3
113: t5 := t1 or t4

Flow of Control Statements
B → id1 relop id2
{
  B.true := newlabel ();
  B.false := newlabel ();
  B.code := gen ("if", id1, relop, id2, "goto",
6.42 | Unit 6 • Compiler Design
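As a rough sketch of the short-circuit idea, the C code below emits jumping code for Boolean expressions built from relational tests with or and and. The BExpr layout, the bcode buffer, the label names and the function gen_bool are assumptions for this example. The emitted layout, a conditional jump followed by an unconditional one, is one common form and is not taken from the rules above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical Boolean expression node: kind is 'r' for a relational
   leaf such as "a < b", '|' for or, '&' for and. */
typedef struct BExpr {
    char kind;
    const char *rel;              /* text of the comparison, leaves only */
    struct BExpr *left, *right;
} BExpr;

char bcode[1024];   /* emitted jumping code */
int label_count;    /* counter behind newlabel() */

/* newlabel: writes a label name that has not been used before. */
void newlabel(char *buf, size_t n) {
    snprintf(buf, n, "L%d", ++label_count);
}

void emit_line(const char *line) {
    strncat(bcode, line, sizeof bcode - strlen(bcode) - 1);
}

/* gen_bool: emits code that jumps to label t when b is true and to
   label f when b is false. */
void gen_bool(const BExpr *b, const char *t, const char *f) {
    char mid[16], line[96];
    if (b->kind == 'r') {
        snprintf(line, sizeof line, "if %s goto %s\ngoto %s\n", b->rel, t, f);
        emit_line(line);
        return;
    }
    newlabel(mid, sizeof mid);       /* label placed in front of the right operand */
    if (b->kind == '|')
        gen_bool(b->left, t, mid);   /* a1 true: jump past a2 to the true target */
    else
        gen_bool(b->left, mid, f);   /* a1 false: jump past a2 to the false target */
    snprintf(line, sizeof line, "%s:\n", mid);
    emit_line(line);
    gen_bool(b->right, t, f);
}
```

For a < b or c < d and e < f, the code for c < d and e < f is reached only through the false exit of a < b, so it is skipped whenever a < b holds. The redundant goto that immediately precedes its own target label could be removed by a later pass.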
1. If-then implementation:
S → if B then S1    { gen (B.false, ":"); }

Code layout:
         B.code          (jumps to B.true or to B.false)
B.true:  S1.code
B.false:

2. If-then-else:
P → S               { S.next := newlabel ();
                      P.code := S.code || gen (S.next, ":") }
S → if B then S1 else S2
                    { S1.next := S.next;
                      S2.next := S.next;
                      S.code := B.code || S1.code ||
                                gen ("goto", S.next) || gen (B.false, ":") ||
                                S2.code }
Need to use inherited attributes of S to define the attributes of S1 and S2.

Code layout:
         B.code          (jumps to B.true or to B.false)
B.true:  S1.code
         goto S.next
B.false: S2.code
S.next:

3. While loop:
B → id1 relop id2   { B.true := newlabel ();
                      B.false := newlabel ();
                      B.code := gen ('if', id1, relop, id2, 'goto', B.true, 'else', 'goto', B.false) ||
                                gen (B.true, ':'); }
S → while B do S1   { S.begin := newlabel ();
                      S.code := gen (S.begin, ':') || B.code || S1.code ||
                                gen ('goto', S.begin) || gen (B.false, ':'); }

Code layout:
S.begin: B.code          (jumps to B.true or to B.false)
B.true:  S1.code
         goto S.begin
B.false:

4. Switch/case statement:
The C-like syntax of switch case is
switch expr {
    case V[1]: S[1]

Translation sequence
• Evaluate the expression.
• Find which value in the list matches the value of the expression; match default only if there is no match.
• Execute the statement associated with the matched value.

How to find the matched value? The matched value can be found in the following ways:
1. Sequential test
2. Lookup table
3. Hash table
4. Back patching

Two different translation schemes for sequential test are shown below:

1.
        code to evaluate E into t
        goto test
L[1]:   code for S[1]
        goto next
        ...
L[k]:   code for S[k]
        goto next
L[d]:   code for S[d]
        goto next
test:   if t = V[1] goto L[1]
        ...
        goto L[d]
next:

2. Can easily be converted into a lookup table
        if t <> V[1] goto L[1]
        code for S[1]
        goto next
L[1]:   if t <> V[2] goto L[2]
        code for S[2]
        goto next
        ...
L[k-1]: if t <> V[k] goto L[k]
        code for S[k]
        goto next
L[k]:   code for S[d]
next:
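A minimal sketch of a generator for the first sequential-test layout (case bodies first, tests collected at the end) might look as follows in C. The function name gen_switch, the placeholder text for the case bodies and the label spelling L1 … Lk, Ld are assumptions for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* gen_switch: writes the sequential-test layout for k case values into
   out (capacity n, n >= 1). values[i] is the text of V[i+1]. */
void gen_switch(const char *const *values, int k, char *out, size_t n) {
    char line[64];
    out[0] = '\0';
    strncat(out, "code to evaluate E into t\ngoto test\n", n - 1);
    for (int i = 1; i <= k; i++) {          /* one labelled body per case */
        snprintf(line, sizeof line, "L%d: code for S[%d]\ngoto next\n", i, i);
        strncat(out, line, n - strlen(out) - 1);
    }
    strncat(out, "Ld: code for S[d]\ngoto next\ntest:\n", n - strlen(out) - 1);
    for (int i = 1; i <= k; i++) {          /* tests run in source order */
        snprintf(line, sizeof line, "if t = %s goto L%d\n", values[i - 1], i);
        strncat(out, line, n - strlen(out) - 1);
    }
    strncat(out, "goto Ld\nnext:\n", n - strlen(out) - 1);
}
```

Because all the tests sit together after the label test, this layout lets a later phase replace the chain of comparisons with a table search without touching the case bodies.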
2. Lookup table: Use a table and a loop to find the address to jump to.

Value    Label    Code
V[1]     L[1]     L[1]: S[1]
V[2]     L[2]     L[2]: S[2]
V[3]     L[3]

3. Hash table: When there are more than two entries, use a hash table to find the correct table entry.
4. Back patching:
• Generate a series of branching statements with the targets of jumps temporarily left unspecified.
• A to-be-determined label table is kept: each entry contains a list of places that need to be back patched.
• Can also be used to implement labels and gotos.

Procedure Calls
• Space must be allocated for the activation record of the called procedure.
• Arguments are evaluated and made available to the called procedure in a known place.
• Save the current machine status.
• When a procedure returns:
  • Place the return value in a known place.
  • Restore the activation record.

Example:
S → call id (Elist)   { for each item p on the queue Elist.queue do
                            gen ('PARAM', p);
                        gen ('call', id.place); }
Elist → Elist1, E     { append E.place to the end of Elist.queue }
Elist → E             { initialize Elist.queue to contain only E.place }

Use a queue to hold parameters, then generate code for the params:
    code for E1, store in t1
    ...
    code for Ek, store in tk
    PARAM t1
    ...
    PARAM tk
    call P

Terminology:
Procedure declaration: parameters, formal parameters.
Procedure call: arguments, actual parameters.
The values of a variable: x = y
r-value: the value of the variable, i.e., on the right side of the assignment. Ex: y, in the above assignment.
l-value: the location/address of the variable, i.e., on the left side of the assignment. Ex: x, in the above assignment.

There are different modes of parameter passing:
1. call-by-value
2. call-by-reference
3. call-by-value-result (copy-restore)
4. call-by-name

Call by value
The calling procedure copies the r-values of the arguments into the called procedure's activation record.
Changing a formal parameter has no effect on the actual parameter.

Example:
#include <stdio.h>

void add (int c)
{
    c = c + 10;                 /* changes only the local copy */
    printf ("\nc = %d", c);
}
int main ()
{
    int a = 5;
    printf ("a = %d", a);
    add (a);                    /* the r-value of a is copied into c */
    printf ("\na = %d", a);
    return 0;
}
In main, a will not be affected by calling add (a). It prints a = 5 both before and after the call.
Only the value of c in add ( ) will be changed to 15.

Usage:
1. Used by PASCAL and C++ if we use non-var parameters.
2. It is the only mode used in C.
Advantages:
1. No aliasing.
2. Easier for static optimization analysis.
3. Faster execution because there is no need for redirecting.

Call by reference
The calling procedure copies the l-values of the arguments into the called procedure's activation record, i.e., the address is passed to the called procedure.
• Changing a formal parameter affects the corresponding actual parameter.
• It will have some side effects.

Example:
void add (int *c)
{
    *c = *c + 10;               /* updates the caller's variable through its address */
    printf ("\nc = %d", *c);
}
int main ()
{
    int a = 5;
    printf ("\na = %d", a);
    add (&a);                   /* the l-value (address) of a is passed */
    printf ("\na = %d", a);
    return 0;
}
Output:
a = 5
c = 15
a = 15
That is, here the actual parameter is also modified.

Advantages
1. Efficiency in passing large objects.
2. Only need to copy addresses.

Call-by-value-result
Equivalent to call-by-reference except when there is aliasing.
That is, the program produces the same result, but the generated code is not the same.
Aliasing: Two expressions that have the same l-value are called aliases. They access the same location from different places.
Aliasing happens through pointer manipulation:
1. Call by reference with a global variable as an argument.
2. Call by reference with the same expression as an argument twice.
Example: test (x, y, x)

Advantages:
1. If there is no aliasing, we can implement it by using call-by-reference for large objects.
2. No implicit side effect if pointers are not passed.

Call by name
Used in Algol.
• The procedure body is substituted for the call in the calling procedure.
• Each occurrence of a parameter in the called procedure is replaced with the corresponding argument.
• Similar to macro expansion.
• A parameter is not evaluated unless its value is needed during computation.

Example:
void show (int x)
{
    for (int y = 0; y < 10; y++)
        x++;
}
main ()
{
    int j;
    j = -1;
    show (j);
}
Actually it will be like this:
main ()
{
    int j;
    j = -1;
    for (int y = 0; y < 10; y++)
        j++;                    /* the parameter x is replaced by the argument j */
}

• Instead of passing values or addresses as arguments, a function is passed for each argument.
• These functions are called thunks.
• Each time a parameter is used, the thunk is called, then the address returned by the thunk is used.
y = 0: use the return value of the thunk for y as the l-value.

Advantages
• More efficient when passing parameters that are never used.
• This saves a lot of time, because evaluating an unused parameter can take a long time.

Code Generation
Code generation is the final phase of the compiler model.

[Figure: position of the code generator. Source program → Front end → Intermediate code → Code optimization → Intermediate code → Code generation → Target program]

The requirements imposed on a code generator are
1. Output code must be correct.
2. Output code must be of high quality.
3. Code generator should run efficiently.

Issues in the Design of a Code Generator
The generic issues in the design of code generators are
• Input to the code generator
• Target programs
• Memory management
• Instruction selection
• Register allocation
• Choice of evaluation order

Input to the code generator
Intermediate representation with symbol table will be the input for the code generator.
• High-level intermediate representation
  Example: Abstract Syntax Tree (AST)
• Medium-level intermediate representation
  Example: control flow graph of complex operations
• Low-level intermediate representation
• A name in a three-address statement refers to a symbol-table entry for the name.
• Stack, heap, garbage collection is done here.

Instruction selection
Instruction selection depends on factors like
• Uniformity
• Completeness of the instruction set
• Instruction speed
• Machine idioms
• Choose a set of instructions equivalent to the intermediate representation code.
• Minimize execution time, used registers and code size.

Mode                Form     Address                       Cost
Absolute            M        M                             2
Register            R        R                             1
Indexed             C(R)     C + contents(R)               2
Indirect register   *R       contents(R)                   1
Indirect indexed    *C(R)    contents(C + contents(R))     2

Example: x := y - z
MOV y, R0    → cost = 2
SUB z, R0    → cost = 2
MOV R0, x    → cost = 2
Total cost = 6
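The cost figures in the example can be reproduced with a small C sketch. It assumes one particular reading of the table: the Cost column is taken to be 1 (for the instruction word) plus an added cost of 1 when the operand carries a memory address or constant, and a two-operand instruction costs 1 plus the added cost of each operand. The names Mode, added_cost and instruction_cost are hypothetical.

```c
/* Addressing modes listed in the table. */
typedef enum {
    ABSOLUTE, REGISTER, INDEXED, INDIRECT_REGISTER, INDIRECT_INDEXED
} Mode;

/* added_cost: extra cost an operand contributes (assumed convention:
   1 if the operand needs an address or constant word, 0 otherwise). */
int added_cost(Mode m) {
    switch (m) {
    case ABSOLUTE:
    case INDEXED:
    case INDIRECT_INDEXED:
        return 1;
    default:
        return 0;
    }
}

/* instruction_cost: 1 for the instruction itself plus operand costs. */
int instruction_cost(Mode src, Mode dst) {
    return 1 + added_cost(src) + added_cost(dst);
}
```

Under this reading, MOV y, R0 is instruction_cost(ABSOLUTE, REGISTER) = 2, and the three instructions for x := y - z add up to 6, as in the example.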
Example: Three-address statement: if x < y goto z
It can be implemented by subtracting y from x in register R, then jumping to z if the value of R is negative.
2. Based on a set of condition codes that indicate whether the last quantity computed or loaded into a location is negative, zero or positive.
• A compare instruction sets the codes without actually computing the value.
  Example: CMP x, y
           CJL z
• Maintain a condition code descriptor, which tells the name that last set the condition codes.
  Example: x := y + z
           if x < 0 goto z
  can be implemented by
           MOV y, R0
           ADD z, R0
           MOV R0, x
           CJN z

DAG Representation of Basic Blocks
• DAGs are useful data structures for implementing transformations on basic blocks.
• A DAG tells how the value computed by a statement is used in subsequent statements.
• It is a good way of determining common subexpressions.
• A DAG for a basic block has the following labels on the nodes:
  • Leaves are labeled by unique identifiers, either variable names or constants.
  • Interior nodes are labeled by an operator symbol.
  • Nodes are also optionally given a sequence of identifiers as labels.

Example:
1: t1 := 4 * i
2: t2 := a [t1]
3: t3 := 4 * i
4: t4 := b [t3]
5: t5 := t2 * t4
6: t6 := prod + t5
7: prod := t6
8: t7 := i + 1
9: i := t7
10: if i <= 20 goto (1)

[Figure: DAG for the basic block above. Leaves: a, b, 4, i0, prod, 20. Interior nodes: * labeled t1, t3; two [] nodes, one labeled t4; * labeled t5; + labeled t6, prod; + labeled t7, i; <= with target (1)]

Code Generation from DAG:

Original code               Code from the DAG
S1 = 4 * i                  S1 = 4 * i
S2 = addr(A) - 4            S2 = addr(A) - 4
S3 = S2 [S1]                S3 = S2 [S1]
S4 = 4 * i
S5 = addr(B) - 4            S5 = addr(B) - 4
S6 = S5 [S4]                S6 = S5 [S1]
S7 = S3 * S6                S7 = S3 * S6
S8 = prod + S7              prod = prod + S7
prod = S8
S9 = I + 1
I = S9                      I = I + 1
if I <= 20 goto (1)         if I <= 20 goto (1)

Rearranging order of the code
Consider the following basic block
t1 := a + b
t2 := c + d
t3 := e - t2
x := t1 - t3
and its DAG.

[Figure: DAG with root - labeled x; its children are + labeled t1 (over a and b) and - labeled t3 (over e and a + node labeled t2, which is over c and d)]

Three-address code for the DAG:
(Assuming only two registers are available)
MOV a, R0
ADD b, R0
MOV c, R1
ADD d, R1
MOV R0, t1      (register spilling)
MOV e, R0
SUB R1, R0
MOV t1, R1      (register reloading)
SUB R0, R1
MOV R1, x

Rearranging the code as
t2 := c + d
t3 := e - t2
t1 := a + b
x := t1 - t3

The rearrangement gives the code:
MOV c, R0
ADD d, R0
MOV e, R1
SUB R0, R1
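Returning to common subexpressions in a basic block: the C sketch below is a simplified backward scan, not a full DAG construction. For a given statement it looks for an earlier statement in the same block that computes the same operator on the same operands, and gives up as soon as one of the operands is reassigned in between. The Stmt layout and the function name find_common are assumptions for this illustration.

```c
#include <string.h>

/* One three-address statement "dst = a op b" (hypothetical layout).
   Array indexing a[t1] is stored with op '[' . */
typedef struct {
    const char *dst;
    const char *a;
    char op;
    const char *b;
} Stmt;

/* find_common: returns the index of an earlier statement that computes
   the same value as statement i, or -1 if there is none. */
int find_common(const Stmt *s, int i) {
    for (int j = i - 1; j >= 0; j--) {
        /* An assignment to either operand ends the search: anything
           earlier was computed from an older value of that operand. */
        if (strcmp(s[j].dst, s[i].a) == 0 || strcmp(s[j].dst, s[i].b) == 0)
            return -1;
        if (s[j].op == s[i].op &&
            strcmp(s[j].a, s[i].a) == 0 &&
            strcmp(s[j].b, s[i].b) == 0)
            return j;
    }
    return -1;
}
```

Applied to the ten-statement block in the DAG example, statement 3 (t3 := 4 * i) is matched with statement 1 (t1 := 4 * i), which corresponds to the single DAG node carrying both labels t1 and t3. A real implementation would also track which name currently holds the shared value.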
Exercises
Practice Problems 1
Directions for questions 1 to 15: Select the correct alternative from the given choices.
1. Consider the following expression tree on a machine with load-store architecture in which memory can be accessed only through load and store instructions. The variables p, q, r, s and t are initially stored in memory. The binary operators used in this expression tree can be evaluated by the machine only when the operands are in registers. The instructions produce results only in a register. If no intermediate results can be stored in memory, what is the minimum number of registers needed to evaluate this expression?

[Figure: expression tree with root +; its left child is - over p and q; its right child is - over t and a + node, which is over r and s]

(A) 2    (B) 9
(C) 5    (D) 3

2. Consider the program given below with lexical scoping and nesting of procedures permitted.
Program main ( )
{
    Var …
    Procedure A1 ( )
    {
        Var …
        call A2;
    }
    Procedure A2 ( )
    {
        Var …
        Procedure A21 ( )
        {
            Var …
            call A1 ( );
        }
        call A21;
    }
    call A1;
}
Consider the calling chain: main ( ) → A1 ( ) → A2 ( ) → A21 ( ) → A1 ( ).
The correct set of activation records along with their access links is given by

[Figure: options (A) to (D), each showing the stack of activation records main, A1, A2, A21, A1 with a frame pointer and a different set of access links]
How many blocks are there in the flow graph for the above code?
(A) 5
(B) 6
(C) 8
(D) 7
7. A basic block can be analyzed by
(A) Flow graph
(B) A graph with cycles
(C) DAG
(D) None of these
18. In call by value the actual parameters are evaluated. What type of values is passed to the called procedure?
(A) l-values
(B) r-values
(C) Text of actual parameters
(D) None of these
19. Which of the following is FALSE regarding a block?
(A) The first statement is a leader.
(B) Any statement that is a target of a conditional/unconditional goto is a leader.
(C) The statement immediately following a goto is a leader.
(D) The last statement is a leader.
Answer Keys
Exercises
Practice Problems 1
1. D 2. D 3. C 4. C 5. B 6. A 7. B 8. A 9. A 10. A
11. B 12. C 13. A 14. C 15. B
Practice Problems 2
1. B 2. B 3. A 4. B 5. B 6. A 7. B 8. C 9. A 10. A
11. B 12. D 13. A 14. C 15. C 16. A 17. C 18. B 19. D