0% found this document useful (0 votes)
33 views

Chapter 6 Intermediate Code Generation

Uploaded by

Yohannes Dereje
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
33 views

Chapter 6 Intermediate Code Generation

Uploaded by

Yohannes Dereje
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 47

Principles of Compiler

Design

Chapter 6: Intermediate Code


Generation
Contents

• Introduction

– Intermediate code generation

– Why we need Intermediate code generator

• Intermediate code representation

– Parse trees, syntax trees, and Direct Acyclic

Graph(DAG),

– Postfix notation,

– Three-address code,

• Data structure of TAC

– quadruples, triples and indirect Triples


Introduction — Intermediate code
generation
• In compiler, the front-end translates a source program into an

intermediate representation from which the back end

generates target code.

• Intermediate code is:

– The output of the Parser and the input to the Code Generator.

– Relatively machine-independent and allows the compiler to

be retargeted.

– Relatively easy to manipulate (optimize).

• Intermediate code serves as a bridge between the high-


level source language and the low-level machine code.
Why Intermediate Code?
• Facilitates retargeting: enables attaching a back
end for the new machine to an existing front end.
• Enables machine-independent code optimization.
• Simplicity : Conversion to Target Code is made
easy.
Without IR vs. With IR
• Without using intermediate code for n-source languages

and m-Target languages, for each source-target pair we

need (n*m) Compilers.

• If we use Intermediate code, We require n-Compilers to

convert each source language into Intermediate code and

m-Compilers to convert Intermediate code into m-target

languages. Thus We require only (n+m) Compilers.


Intermediate code Representations

• In compiler, the front-end translates a source program into

an intermediate representation from which the back end

generates target code.


Intermediate Languages

 The most commonly used intermediate representations

were :
– Syntax Tree
– Direct Acyclic Graph (DAG)
– Postfix Notation
– 3 Address Code
 Graphical Representation

• Includes-
– Syntax Tree
In addition, we use:
• Static Single Assignment (SSA)
– Direct Acyclic Graph(DAG)
• Control Flow Graph (CFG)
Syntax Tree or Abstract Syntax Tree(AST)
 Syntax Tree depicts the hierarchical structure of a

source program.

 Syntax tree (AST) is a condensed form of parse tree

useful for representing language constructs.

 Graphical Intermediate Representation

 Ex
Syntax Tree / Abstract Syntax Tree(AST)

 Example - Parse tree and syntax tree for 3 * 5 + 4

as follows.
Parse Tree VS Syntax Tree
Parse Tree Syntax Tree

• A parse tree is a graphical • A syntax tree (AST) is a


representation of a condensed form of parse
replacement process in a tree
derivation

• Each interior node • Each interior node


represents a grammar Rule represents an operator
• Each leaf node represents a • Each leaf node represents
terminal an operand

• Parse tree represent every • Syntax tree does not


detail from the real syntax represent every detail from
the real syntax
Constructing Syntax Tree For Expression

• Each node in a syntax tree can be implemented in a record


with several fields.
• In the node of an operator, one field contains operator and
remaining field contains pointer to the nodes for the
operands.
• When used for translation, the nodes in a syntax tree may
contain addition of fields to hold the values of attributes
attached to the node.
• Following functions are used to create syntax tree

1. mknode(op,left,right): creates an operator node with


label op and two fields containing pointers to left and
right.
Constructing Syntax Tree For Expression

• Example -
a–4+c
• The tree is constructed bottom up

P1 = mkleaf(id,entry a)
P2 = mkleaf(num, 4)
P3 = mknode(-, P1, P2)
P4 = mkleaf(id,entry c)
P5 = mknode(+, P3, P4)
Syntax directed definition
• Syntax trees for assignment statements are produced
by the syntax-directed definition.
• Non terminal S generates an assignment statement.
• The two binary operators + and * are examples of the
full operator set in a typical language.
• Operator associates and precedencies are the usual
ones, even though they have not been put into the
grammar.
• This definition constructs the tree from the input a:=b*
-c + b* -c
Syntax directed definition
This definition constructs the tree from the input
a:=b* -c + b* -c
Syntax directed definition
 Two representations of the syntax tree are as
follows.
Direct Acyclic Graph (DAG)
 DAG : Direct Acyclic Graph
 Variant of Syntax tree with a unique node for
each value. [contains no cycles]
 Helps optimize the code by identifying common
sub expressions in syntax tree.
 SDD used to generate Syntax tree will be used to
construct DAG too, with a simple check:

 The process of making this check is an overhead;


hence constructing DAG is costly.
Direct Acyclic Graph (DAG)
 DAG also gives the hierarchical structure of source
program but in a more compact way because
common sub expressions are identified.
Direct Acyclic Graph (DAG)
Postfix Notation
 Linearized representation of syntax tree
 In postfix notation, each operator appears
immediately after its last operand.
 Operators can be evaluated in the order in which
they appear in the string
 EXAMPLE
 Source String : a := b * -c + b * -c
 Postfix String: a b c uminus * b c uminus * +
assign
Postfix Notation - Rules
1) If E is a variable or constant, then the postfix notation for
E is E itself.
2) If E is an expression of the form E1 op E2 then postfix
notation for E is
– E1E2op.

3) If E is an expression of the form (E), then the postfix


notation for E is the same as the postfix notation for E.
4) For unary operation –E the postfix is E
– Ex: postfix notation for 9- (5+2) is 952+-

 Postfix notation of an infix expression can be obtained


using stack
Three-address Code

• In Three address statement, at most 3 addresses are used


to represent any statement.
• The reason for the term “three address code” is that each
statement contains 3 addresses at most.
– Two for the operands and one for the result.
• General Form of three Address Code is:

a = b op c where,
– a, b, c are the operands that can be names, constants
or compiler generated temporaries.
– op represents operator, such as fixed or floating point
arithmetic operator or a logical operator on Boolean
valued data.
Three-address Code
• Linearized representation of syntax tree or DAG.
• At most one operator on RHS of an instruction.
• Each instruction can have up to three addresses.
• Address can either be a
– name(identifier)
– constant(number)
– Temporary (holds intermediate result)
• Thus a source language expression like x + y * z
might be translated into a sequence
Three-address Code

• Example #2: Consider the assignment a:=b*-c+b*-c:


• The tree address code(TAC) representation will be;
T1 := -c
T2: = b*T1
T3: = T2 + T2
a := T3
• Let us calculate addresses based on widths, using the formula
address(List[i]) = base+i1*w1+i2*w2+………………+ik*wk,
where
– base – the address of first element
– W – width of the array, for Integer =4, char =1, Float =4, double =8, …
– i –the index of array
• Implementation of Arrays
• Example: calculate address(list[1]), assume base = 12 and
element size =4.
12 16 20 24 28 32 36 40 44 48
10 5 20 12 25 21 48 52 99 01

• Access function for single-dimensioned arrays:


address(list[k])= address(list[1])+((k - 1) * element_size)
address(list[4])= 12+((4-1)*4) = > 12+12 =24
address(list[3])= 12+((3-1)*4) = > 12+8 =20
address(list[2])= 12+((2-1)*4) = > 12+4 =16
address(list[1])= 12+((1-1)*4) = > 12+0 =12
• Translation of Array References – Example
– Let a denote an integer array of size 5, and
let c and i be integers.
– Assuming that the width of an integer is 4.
– Three address code for expression c+a[i] is
t1 = i * 4
t2 = a [ t 1 ] or a+t1
t 3 = C + t2
• TAC generation for Arrays
• Implementing Multi-dimensioned Arrays
• Multi-dimensioned arrays must be mapped to linear memory.
• Given this 3 x 3 array: A[3][3] =
347
625
138
• There are two common ordering ways:
• Row major order (stored by rows) – used in most languages
– 3, 4, 7, 6, 2, 5, 1, 3, 8
• Column major order (stored by columns) – used in Fortran
– 3, 6, 1, 4, 2, 3, 7, 5, 8
• Accessing an Element in a Multi-dimensioned Array
Location (a[i,j]) = address of a [row_lb,col_lb] +
(((number of rows above ith row) * size of row) +
(number of elements left to the jth column)) *
element_size
Location (a[i,j]) = address of a [row_lb,col_lb] +(((i -
row_lb) * n) + (j - col_lb)) * element_size
• Translation of Array References – Example
• Example 2- Let a denote a 3 x 3 array of integers, and let c, i,
and j all denote integers.
• Three address code for expression c+a[i][i] is

t1 = i * 12
t2=j*4
t 3 = t 1+ t 2
t 4 = a [ t 3 ] or a + t3
t s = C + t4
Implementation of Three-Address Statements

• Three address code is represented as record

structure with fields for operator and operands.

• These records can be stored as array or linked list.

• Those representations are

– Quadruples

– Triples

– Indirect triples
Quadruples

• A quadruple is a record structure with four fields,

which are op, ag1, arg2 and result

• The op field contains an internal code for the operator.

The three address statement


– x:= y op z is represented by placing y in arg1, z in arg2 and x

in result.

• Disadvantage – Temporary names have to be entered


Quadruples
Quadruples
TRIPLES
 In triples representation, the use of temporary
variables is avoided & instead serial number of
statement is used.
 So three address statements can be represented by
records with only there fields OP, arg1 & arg2.
 Since, there fields are used this intermediated code
formal is known as triples
 Advantages
 No need to use temporary variable which saves memory
as well as time
 Disadvantages
 Code Immovability
TRIPLES
INDIRECT TRIPLES
 This representation is an enhancement over triple
representation.
 It uses an additional instruction array to led the pointer
to the triples in the desired order.
 Since, it uses pointers instead of position to stage
reposition the expression to produce an optimized
code.
Indirect Triples - Example
Con…
Con…
PROBLEM 1
• Translate the following expression to quadruple tuples &
indirect tuples
a=b*-c+b*-c
• Sol : - Three address code for given expression is
TAC
t1 = uniminus c
t2 = b* t1
t3 = uniminus c
t4 = b* t3
t5 = t2 + t4
Q = t5
Summary

• Data Structures for TAC :


• Three address code is represented as record structure with
fields for operator and operands.
• These records can be stored as array or linked list.
• We discuss the following arrangements :
1. Quadruples

2. 2. Triples
3. 3. Indirect Triples

You might also like