Syntax-Directed Translation

Uploaded by

wiviyap974

Course Name:

COMPILER DESIGN

Course code:
SCS1303
August 11, 2024 SCS1303 Compiler Design 1
UNIT 3
INTERMEDIATE CODE GENERATION

• Syntax directed translation scheme


• Three Address Code
• Representation of three address code
• Intermediate code generation for:
– Assignment statements
– Boolean statements
– Switch case statement
– Procedure call
• Symbol Table Generation
Introduction



Rest of Analysis Phase

• To help in semantic analysis.


• To help in intermediate code generation.
• Two tools are used:
– Semantic rules

– Semantic actions



Syntax Directed Translation(SDT)

• The notational framework for intermediate code generation is an extension of context-free grammar called a syntax-directed translation scheme.
• It allows subroutines or semantic actions to be attached to the
productions of a CFG.
• Grammar symbols are associated with attributes to associate
information with the programming language constructs
that they represent.
• Values of these attributes are evaluated by the semantic rules
associated with the production rules.
Grammar + semantic rule = SDT (syntax directed translation)



EXAMPLE



SDT (Contd..)
• Evaluation of the semantic rules:
– may generate intermediate codes
– may put information into the symbol table
– may perform type checking
– may issue error messages
– may perform some other activities

• An attribute may hold almost anything: a string, a number, a memory location, or a complex record.


SDT (Contd..)
• When we associate semantic rules with productions, we use
two notations:
– Syntax-Directed Definitions
– Translation Schemes
• Syntax-Directed Definitions:
– give high-level specifications for translations
– hide many implementation details such as order of
evaluation of semantic actions.
• Translation Schemes:
– indicate the order of evaluation of semantic actions
associated with a production rule.
– In other words, translation schemes reveal some of the implementation details.
Syntax-Directed Definitions
• A syntax-directed definition is a generalization of a context-free
grammar in which:
– Each grammar symbol is associated with a set of attributes.
– This set of attributes for a grammar symbol is partitioned
into two subsets called synthesized and inherited attributes of
that grammar symbol.
– Each production rule is associated with a set of semantic rules.

A→XYZ {Y.VAL=2*A.VAL}
Annotated Parse Tree
• A parse tree showing the values of attributes at each node is called
an annotated parse tree.
• The process of computing the attribute values at the nodes is called annotating (or decorating) the parse tree.
• The order of these computations depends on the dependency graph induced by the semantic rules.


Syntax-Directed Definitions
• Semantic rules set up dependencies between attributes which
can be represented by a dependency graph.
• This dependency graph determines the evaluation order of
these semantic rules.

For synthesized attributes, information flows bottom-up: the attributes of all the children are computed before those of the parent.
Example



Implementation of SDT

• Use extra fields in the parser stack entries corresponding to the grammar symbols.
• These fields hold the values of the corresponding translations.

        STATE   VAL
TOP →   Z       Z.VAL
        Y       Y.VAL
        X       X.VAL

STACK BEFORE REDUCTION


SDT scheme for Desk calculator
Production        Semantic Action
S → E$            { print E.VAL }
E → E1 + E2       { E.VAL := E1.VAL + E2.VAL }
E → E1 * E2       { E.VAL := E1.VAL * E2.VAL }
E → (E1)          { E.VAL := E1.VAL }
E → I             { E.VAL := I.VAL }
I → I1 digit      { I.VAL := 10 * I1.VAL + LEXVAL }
I → digit         { I.VAL := LEXVAL }


Parse Tree

S→E$
E → E1 + E2
E → E1 * E2
E → (E1)
E→I
I → I1 digit
I → digit

Input: 23*5+4$



Parse tree with translations
Input: 23*5+4$



Implementation of Desk calculator
Production        Program Fragment
S → E$            print VAL[TOP]
E → E + E         VAL[TOP] := VAL[TOP] + VAL[TOP-2]
E → E * E         VAL[TOP] := VAL[TOP] * VAL[TOP-2]
E → (E)           VAL[TOP] := VAL[TOP-1]
E → I             none
I → I digit       VAL[TOP] := 10 * VAL[TOP] + LEXVAL
I → digit         VAL[TOP] := LEXVAL

Stack contents for E + E:

        STATE   VAL
TOP →   E       E.VAL
        +
        E       E.VAL
Sequence of Moves

Move   Input      STATE   VAL          Production used
(1)    23*5+4$    _       _
(2)    3*5+4$     2       _
(3)    3*5+4$     I       2            I → digit
(4)    *5+4$      I3      2 _
(5)    *5+4$      I       (23)         I → I digit
(6)    *5+4$      E       (23)         E → I
(7)    5+4$       E*      (23) _
(8)    +4$        E*5     (23) _ _
(9)    +4$        E*I     (23) _ 5     I → digit
(10)   +4$        E*E     (23) _ 5     E → I
(11)   +4$        E       (115)        E → E * E
(12)   4$         E+      (115) _
(13)   $          E+4     (115) _ _
(14)   $          E+I     (115) _ 4    I → digit
(15)   $          E+E     (115) _ 4    E → I
(16)   $          E       (119)        E → E + E
(17)   _          E$      (119) _
(18)   _          S       PRINT 119    S → E$
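The value-stack idea behind the desk calculator can be sketched in Python. This is only an illustration under stated assumptions: the function name desk_calc is hypothetical, and a small operator-precedence loop stands in for the LR parser tables that the slides assume. Since the slide grammar is ambiguous, the sketch assumes that * binds tighter than +, which is also the order the trace for 23*5+4$ follows. The input must contain no spaces and must end in $.

```python
def desk_calc(text):
    """Evaluate digits, +, *, ( ) terminated by '$' using a VAL stack."""
    prec = {"+": 1, "*": 2}
    val = []   # VAL stack: one entry per reduced E
    ops = []   # pending operators and '(' (stands in for parser states)

    def reduce_top():
        # E -> E op E : combine the two topmost VAL entries
        op = ops.pop()
        right = val.pop()
        left = val.pop()
        val.append(left + right if op == "+" else left * right)

    i = 0
    while text[i] != "$":
        ch = text[i]
        if ch.isdigit():
            n = 0
            while text[i].isdigit():
                # I -> I digit : VAL := 10 * VAL + LEXVAL
                n = 10 * n + int(text[i])
                i += 1
            val.append(n)          # E -> I
            continue
        if ch == "(":
            ops.append(ch)
        elif ch == ")":
            while ops[-1] != "(":
                reduce_top()
            ops.pop()              # E -> (E)
        else:
            # reduce while the operator on the stack binds at least as tightly
            while ops and ops[-1] != "(" and prec[ops[-1]] >= prec[ch]:
                reduce_top()
            ops.append(ch)
        i += 1
    while ops:
        reduce_top()
    return val[-1]                 # S -> E$ : print VAL[TOP]


print(desk_calc("23*5+4$"))  # 119
```

The intermediate values 23, 115 and 119 appear on val in the same order as in the VAL column of the trace.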
Intermediate Code



• In the analysis-synthesis model of a compiler, the front end translates a source program into a machine-independent intermediate code; the back end then uses this intermediate code to generate the target code (which can be understood by the machine).

• The benefits of using machine-independent intermediate code are:
– Intermediate code eliminates the need for a new full compiler for every unique machine by keeping the analysis portion the same for all the compilers.
– The second part of the compiler, synthesis, is changed according to the target machine.
– It becomes easier to improve code performance by applying code optimization techniques to the intermediate code.
Intermediate Representation
• Intermediate codes can be represented in a variety of ways
and they have their own benefits.
• High Level IR - High-level intermediate code representation is
very close to the source language itself. They can be easily
generated from the source code and we can easily apply code
modifications to enhance performance. But for target
machine optimization, it is less preferred.
• Low Level IR - This one is close to the target machine, which
makes it suitable for register and memory allocation,
instruction set selection, etc. It is good for machine-
dependent optimizations.
• Intermediate code can be either language specific (e.g., Byte
Code for Java) or language independent (three-address code).



What is intermediate code?
• During the translation of a source program into the object
code for a target machine, a compiler may generate a middle-
level language code, which is known as intermediate
code or intermediate text.
• The complexity of this code lies between the source language
code and the object code.
• The intermediate code can be represented in the form of:
– Postfix notation,
– Syntax tree,
– Three-address code.

Postfix Notation

• The ordinary (infix) way of writing the sum of a and b is with the operator in the middle: a + b

• The postfix notation for the same expression places the operator at the right end: ab+

• In general, if e1 and e2 are any postfix expressions, and + is any binary operator, the postfix notation is:
e1 e2 +
• No parentheses are needed in postfix notation because the position and arity (number of arguments) of the operators permit only one way to decode a postfix expression.


Postfix Notation (Contd..)

• Postfix notation is a useful form of intermediate code when the source language consists mostly of expressions.

• Postfix notation is also called 'suffix notation' and 'reverse Polish' notation.

• Postfix notation is a linear representation of a syntax tree.

• In postfix notation, any expression can be written unambiguously without parentheses.


Example
The postfix representation of the expression
1. (a+b)*c ab+c*
2. a*(b+c) abc+*
3. (a+b)*(c+d) ab+cd+*
4. (a – b) * (c + d) + (a – b) ab - cd + *ab - +
5. Let us introduce a 3-ary (ternary) operator, the conditional expression. Consider
if e then x else y, whose value is
    x, if e ≠ 0
    y, if e = 0

Using ? as a ternary postfix operator, we can represent this expression as:
exy?
Example

What is the postfix?


if a then if c-d then a+c else a*c else a+b
Soln:
acd-ac+ac*?ab+?



Evaluation of Postfix Expression

• Consider the postfix expression ab+c*. Suppose a, b and c have values 1, 3 and 5 respectively.
• To evaluate 1 3 + 5 *, perform the following:
1. Stack 1
2. Stack 3
3. Add the two topmost elements, pop them off the stack, and then stack the result, 4.
4. Stack 5
5. Multiply the two topmost elements, pop them off the stack, and then stack the result, 20.
The value on top of the stack at the end is the value of the entire expression.
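The stack procedure above can be sketched in a few lines of Python. The names eval_postfix and env are illustrative assumptions, not part of the slides; operands are single-letter names looked up in env, and ? is the ternary conditional operator introduced earlier. Note that in pure postfix form both branches of ? are evaluated before one of them is chosen.

```python
import operator

BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_postfix(tokens, env):
    """Evaluate a postfix token sequence; env maps operand names to values."""
    stack = []
    for tok in tokens:
        if tok in BINARY:
            # pop the two topmost elements and stack the result
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[tok](left, right))
        elif tok == "?":
            # e x y ? : value is x if e != 0, otherwise y
            y = stack.pop()
            x = stack.pop()
            e = stack.pop()
            stack.append(x if e != 0 else y)
        else:
            stack.append(env[tok])
    return stack.pop()


print(eval_postfix("ab+c*", {"a": 1, "b": 3, "c": 5}))  # 20
```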


Syntax-Directed translation to Postfix code

• E.code is a string-valued translation.

• The value of the translation E.code for the first production is the concatenation of E1.code, E2.code and the symbol op.


Sequence of moves
• Processing the input a+b*c ,a syntax-directed translator based
on an LR parser, has the following sequence of moves:
1. Shift a
2. Reduce by E→ id and print a
3. Shift +
4. Shift b
5. Reduce by E→ id and print b
6. Shift *
7. Shift c
8. Reduce by E→ id and print c
9. Reduce by E→ E op E and print *
10. Reduce by E→ E op E and print +
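The same "print at each reduction" behaviour can be imitated with a small Python sketch. As a stated substitution, it uses a precedence-climbing recursive-descent parser rather than the LR parser of the slides; what matters is that an operand is emitted where E → id would be reduced and an operator is emitted where E → E op E would be reduced. The names to_postfix and PREC are hypothetical, and operands are assumed to be single characters.

```python
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(text):
    """Translate an infix expression over single-letter operands to postfix."""
    tokens = list(text.replace(" ", ""))
    pos = 0
    out = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            expr(1)
            pos += 1              # skip the matching ')'
        else:
            out.append(tok)       # E -> id : print id

    def expr(min_prec):
        nonlocal pos
        primary()
        while peek() in PREC and PREC[peek()] >= min_prec:
            op = tokens[pos]
            pos += 1
            expr(PREC[op] + 1)    # parse the right operand first
            out.append(op)        # E -> E op E : print op

    expr(1)
    return "".join(out)


print(to_postfix("a+b*c"))  # abc*+
```

For a+b*c the output order a, b, c, *, + matches the prints in the move sequence above.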



Syntax Tree
• A syntax tree is a condensed form of a parse tree.

• The operator and keyword nodes of the parse tree are moved to their parents, and a chain of single productions is replaced by a single link. In the syntax tree, the internal nodes are operators and the child nodes are operands.

• To form a syntax tree, first parenthesize the expression fully; this makes it easy to see which operands belong to each operator.


Parse tree vs. Syntax tree
• A parse tree contains more detail than is actually needed, which makes it cumbersome for the compiler to process. Take the following parse tree as an example:


Parse tree vs. Syntax tree

• In the parse tree, most of the leaf nodes are the only child of their parent nodes.
• In the syntax tree, we can eliminate this extra information.
• A syntax tree is a variant of a parse tree in which interior nodes are operators and leaves are operands.
• A syntax tree is the usual choice for representing a program in a tree structure.
• A sentence id + id * id would have the following syntax tree:


Example
• The syntax tree for the expression a*(b+c)/d

          /
        /   \
       *     d
     /   \
    a     +
        /   \
       b     c

• The syntax tree for the statement if a=b then a=c+d else b=c-d

            if-then-else
          /      |       \
         =       =        =
        / \     / \      / \
       a   b   a   +    b   -
                  / \      / \
                 c   d    c   d


• Example:
x = (a + b * c) / (a – b * c)



Syntax-directed construction of syntax tree
• Abstract syntax tree can be represented as:

• Abstract syntax trees are important data structures in a compiler. They contain very little unnecessary information.
• Abstract syntax trees are more compact than parse trees and can be easily used by a compiler.


Three address code
• Three-address code is an intermediate code. It is used by optimizing compilers.

• In three-address code, the given expression is broken down into several separate instructions. These instructions translate easily into assembly language.

• Each three-address code instruction has at most three operands. It is a combination of an assignment and a binary operator.
Example
Given an Expression:
a := (-c * b) + (-c * d)
Three-address code is as follows:
t1 := -c
t2 := b*t1
t3 := -c
t4 := d * t3
t5 := t2 + t4
a := t5
The temporaries t1 ... t5 can be mapped to registers in the target program.
Three-address code can be represented in three forms: quadruples, triples and indirect triples.
Types of Three Address code



• Unconditional Jump:
goto L
– Jumps to the three address code with the label L and the
execution continues from that statement.
e.g. goto L
...
...
L: x=y+z
...
• Conditional Jump:
If x relop y goto L

Three-address statements
x := y op z             assignment
x := op y               unary assignment
x := y                  copy
goto L                  unconditional jump
if x relop y goto L     conditional jump
param x                 procedure call
call p, n               procedure call
return y                procedure call
x := y[i]               indexed assignment
x[i] := y               indexed assignment
x := &y                 pointer assignment
x := *y                 pointer assignment
*x := y                 pointer assignment
• The three address code can be represented by:
– Quadruples

– Triples and

– Indirect triples.



Quadruple
A quadruple is a structure with four fields: op, arg1, arg2 and result.
– op denotes the operator,
– arg1 and arg2 denote the two operands, and
– result stores the result of the expression.
Advantages:
• Easy to rearrange code for global optimization.
• Can quickly access the value of temporary variables using the symbol table.
Disadvantages:
• Contains a lot of temporaries.
• Temporary variable creation increases time and space complexity.
Example : Consider expression a = b * – c + b * – c.
The three address code is:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5



Triples
• Triples do not use extra temporary variables.
• When a reference to another triple's value is needed, a pointer to that triple is used.
• So a triple consists of only three fields: op, arg1 and arg2.
• With the help of a pointer, one can directly access the symbol table entry.

Disadvantages:
• Temporaries are implicit, which makes it difficult to rearrange code.
• It is difficult to optimize because optimization involves moving intermediate code. When a triple is moved, any other triple referring to it must also be updated.


Example : Consider expression a = b * – c + b * – c



Triples Eg.
Unconditional jump:
goto L // for label

Triples:
       op     arg1   arg2
(0)    goto   L      --

Conditional jump:
if x < y goto L

Triples:
       op     arg1   arg2
(0)    <      x      y
(1)    if     (0)    L


Indexed statements:
x := y[i]

op arg1 arg2
(0) =[] y i

(1) = x (0)

x[i] := y

op arg1 arg2
(0) []= x i

(1) = (0) y



Indirect Triples

• This representation uses a separately stored list of pointers to the triples, instead of listing the triples themselves in execution order.
• It is similar in utility to the quadruple representation but requires less space.


Example : Consider expression a = b * – c + b * – c



Problem
1. Write the quadruples, triples and indirect triples for the following expression:
(x + y) * (y + z) + (x + y + z)

The three address code is:


t1 = x + y
t2 = y + z
t3 = t1 * t2
t4 = t1 + z
t5 = t3 + t4
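One way to work this problem is to start from the quadruples and derive the other two forms mechanically. The Python sketch below does that under a few assumptions: the tuple layout (op, arg1, arg2, result), the helper name to_triples and the "(n)" string used for a reference to triple n are all illustrative choices, with triples numbered from (0) as in the earlier triple examples.

```python
# Quadruples for (x + y) * (y + z) + (x + y + z): (op, arg1, arg2, result)
quads = [
    ("+", "x", "y", "t1"),
    ("+", "y", "z", "t2"),
    ("*", "t1", "t2", "t3"),
    ("+", "t1", "z", "t4"),
    ("+", "t3", "t4", "t5"),
]


def to_triples(quads):
    """Drop the result field; a temporary becomes a reference to its triple."""
    index_of = {}   # temporary name -> index of the triple that computes it
    triples = []
    for op, arg1, arg2, result in quads:
        def ref(arg):
            return f"({index_of[arg]})" if arg in index_of else arg
        triples.append((op, ref(arg1), ref(arg2)))
        index_of[result] = len(triples) - 1
    return triples


triples = to_triples(quads)

# Indirect triples: a separate statement list of pointers into `triples`.
# Reordering code only permutes this list; the triples stay untouched.
statements = list(range(len(triples)))

for number, (op, arg1, arg2) in enumerate(triples):
    print(f"({number})  {op}  {arg1}  {arg2}")
```

The third triple comes out as * (0) (1) and the last as + (2) (3), which is how the temporaries t1 to t4 disappear in the triple form.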

Implementation

The three-address code for (a+(b*c))-d/(b*c) is below :



Calculate one solution to the quadratic equation:
x = (-b + sqrt(b^2 - 4*a*c)) / (2*a)
t1 := b * b
t2 := 4 * a
t3 := t2 * c
t4 := t1 - t3
t5 := sqrt(t4)
t6 := - b
t7 := t5 + t6
t8 := 2 * a
t9 := t7 / t8
x := t9



Why IR?

Next Topics

Intermediate code generation for:


– Assignment statements
– Boolean statements
– Switch case statement
– Procedure call



Translation of Assignment Statement
• In syntax-directed translation, an assignment statement mainly deals with expressions.
• The expression can be of type real, integer, array, and so on.

Consider the grammar


S → id := E
E → E1 + E2
E → E1 * E2
E → (E1)
E → id



SDT into Three address code
• The non-terminal E has two attributes:
– E.place , the name that will hold the value of E
– E.code , the sequence of three address statements evaluating E

• The value of the non-terminal E on the left side of E → E1 + E2 will be computed into a new temporary t.

• The three-address code for id = E consists of code to evaluate E into some temporary t, followed by the assignment id.place = t.

• The synthesized attribute S.code represents the three-address code for the assignment S.
• The function NEWTEMP() returns a sequence of distinct names t1, t2, ... in response to successive calls.

• We use the notation gen(x '=' y '+' z) to represent the three-address statement x = y + z.

• The GEN(A = B + C) procedure emits the three-address statement A = B + C with actual values substituted for A, B and C.

• GEN would enter the operator + and the values of A, B and C into the quadruple array.


S → id := E   { p = look_up(id.name);
                if p ≠ nil then gen(p '=' E.place)
                else error; }

Callout: T = NEWTEMP(); E.PLACE = T (a new temporary T holds the value of E)


Sequence of moves in the syntax-directed translation of A = −B*(C+D)
( _ marks a stack entry that has no PLACE value)

Input          Stack               PLACE              Generated Code
A=−B*(C+D)
=−B*(C+D)      id                  A
−B*(C+D)       id =                A _
B*(C+D)        id = −              A _ _
*(C+D)         id = − id           A _ _ B
*(C+D)         id = − E            A _ _ B            T1 = −B
*(C+D)         id = E              A _ T1
(C+D)          id = E *            A _ T1 _
C+D)           id = E * (          A _ T1 _ _
+D)            id = E * ( id       A _ T1 _ _ C
+D)            id = E * ( E        A _ T1 _ _ C
D)             id = E * ( E +      A _ T1 _ _ C _
)              id = E * ( E + id   A _ T1 _ _ C _ D
)              id = E * ( E + E    A _ T1 _ _ C _ D   T2 = C + D
)              id = E * ( E        A _ T1 _ _ T2
               id = E * ( E )      A _ T1 _ _ T2 _
               id = E * E          A _ T1 _ T2        T3 = T1 * T2
               id = E              A _ T3             A = T3
               S
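The roles of NEWTEMP, GEN and E.PLACE in this translation can be sketched in Python. This is a simplified illustration, not the table-driven parser of the slides: the class name TACGen is hypothetical, the expression is assumed to arrive already parsed as nested tuples, and the symbol-table look-up for the target identifier is omitted.

```python
class TACGen:
    """Emit three-address code for an assignment from a tuple-based tree."""

    def __init__(self):
        self.count = 0
        self.code = []

    def newtemp(self):
        # returns the distinct names T1, T2, ... on successive calls
        self.count += 1
        return f"T{self.count}"

    def gen(self, stmt):
        self.code.append(stmt)

    def expr(self, node):
        """Emit code for node and return its place (E.PLACE)."""
        if isinstance(node, str):
            return node                      # E -> id
        if node[0] == "neg":
            place = self.expr(node[1])
            t = self.newtemp()
            self.gen(f"{t} = - {place}")     # unary minus
            return t
        op, left, right = node
        p1 = self.expr(left)
        p2 = self.expr(right)
        t = self.newtemp()
        self.gen(f"{t} = {p1} {op} {p2}")    # E -> E1 op E2
        return t

    def assign(self, name, node):
        self.gen(f"{name} = {self.expr(node)}")   # S -> id := E


g = TACGen()
g.assign("A", ("*", ("neg", "B"), ("+", "C", "D")))
print("\n".join(g.code))
```

For A = −B*(C+D) the emitted statements are T1 = - B, T2 = C + D, T3 = T1 * T2 and A = T3, the same code as in the Generated Code column.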
Assignment Statement with mixed types
• The constants and variables may be of different types, so the compiler must either reject certain mixed-mode operations or generate appropriate coercion (mode conversion) instructions.

• Consider two modes:
– REAL
– INTEGER, with integer converted to real when necessary.

An additional field in the translation for E is E.MODE, whose value is either REAL or INTEGER.


• The semantic rule for E.MODE associated with the production E → E1 + E2 is:

E → E1 + E2   { if E1.MODE = INTEGER and E2.MODE = INTEGER
                then E.MODE = INTEGER
                else E.MODE = REAL }

• When necessary, the three-address code
A = inttoreal B
is generated, whose effect is to convert integer B to a real of equal value called A.


Example
X=Y+I*J
Assume X and Y to be REAL and I and J have mode
INTEGER.

Three-address code:
T1=I int * J
T2= inttoreal T1
T3=Y real +T2
X=T3

The semantic rule uses two translation fields, E.PLACE and E.MODE, for the non-terminal E.


Semantic rule for E → E1 op E2
T = NEWTEMP()
if E1.MODE = INTEGER and E2.MODE = INTEGER then
begin
    GEN(T = E1.PLACE int op E2.PLACE)
    E.MODE = INTEGER
end
else if E1.MODE = REAL and E2.MODE = REAL then
begin
    GEN(T = E1.PLACE real op E2.PLACE)
    E.MODE = REAL
end
else if E1.MODE = INTEGER and E2.MODE = REAL then
begin
    U = NEWTEMP()
    GEN(U = inttoreal E1.PLACE)
    GEN(T = U real op E2.PLACE)
    E.MODE = REAL
end
else if E1.MODE = REAL and E2.MODE = INTEGER then
begin
    U = NEWTEMP()
    GEN(U = inttoreal E2.PLACE)
    GEN(T = E1.PLACE real op U)
    E.MODE = REAL
end
E.PLACE = T
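A compact Python sketch of this rule follows. The names MixedGen, to_real and binary are hypothetical, and an expression is represented as a (place, mode) pair. One deliberate deviation is worth flagging: the conversion temporary is allocated before the result temporary, so that the numbering matches the X = Y + I * J example (T2 for the conversion, T3 for the sum); the rule as written calls NEWTEMP for T first.

```python
INTEGER, REAL = "INTEGER", "REAL"


class MixedGen:
    """Generate three-address code with int-to-real coercion."""

    def __init__(self):
        self.count = 0
        self.code = []

    def newtemp(self):
        self.count += 1
        return f"T{self.count}"

    def to_real(self, place, mode):
        """Return a REAL place for the operand, converting if needed."""
        if mode == REAL:
            return place
        u = self.newtemp()
        self.code.append(f"{u} = inttoreal {place}")
        return u

    def binary(self, op, e1, e2):
        """e1 and e2 are (place, mode) pairs; returns (E.PLACE, E.MODE)."""
        (p1, m1), (p2, m2) = e1, e2
        if m1 == INTEGER and m2 == INTEGER:
            mode = INTEGER
        else:
            mode = REAL
            p1 = self.to_real(p1, m1)
            p2 = self.to_real(p2, m2)
        t = self.newtemp()
        kind = "int" if mode == INTEGER else "real"
        self.code.append(f"{t} = {p1} {kind}{op} {p2}")
        return t, mode


g = MixedGen()
product = g.binary("*", ("I", INTEGER), ("J", INTEGER))
total = g.binary("+", ("Y", REAL), product)
g.code.append(f"X = {total[0]}")
print("\n".join(g.code))
```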
Boolean Expressions

• Boolean expressions have two primary purposes.
1. They are used for computing logical values.
2. They are also used as conditional expressions that alter the flow of control, such as in if-then-else or while-do.

• Boolean expressions are composed of the boolean operators (and, or, and not) applied to elements that are boolean variables or relational expressions.

• Relational expressions are of the form E1 relop E2, where E1 and E2 are arithmetic expressions.


Boolean Expressions (Contd..)
Consider the grammar
E → E or E
E → E and E
E → not E
E → (E)
E → id relop id
E → id
E → TRUE
E → FALSE

• The relop is any of <, ≤, =, ≠, >, or ≥.
• We assume that or and and are left associative.
• or has the lowest precedence, then and, then not.


Methods of Translating Boolean Expressions
• There are two methods of representing the value of a boolean expression.

1. Numerical representation – encode true and false numerically and evaluate a boolean expression analogously to an arithmetic expression.
• 1 is used to denote true.
• 0 is used to denote false.

2. Flow of control – represent the value of a boolean expression by a position reached in a program.
• Used in flow of control statements such as if-then-else or while-do statements.
Translation into three-address code

• Use of branching statements of the form

goto L
If A goto L
If A relop B goto L

• where A and B are simple variables or constants.


• L is a quadruple label.
• relop is any of <,≤,=, ≠, >, or ≥.



Numerical Representation

• First consider the implementation of boolean expressions using 1 to denote TRUE and 0 to denote FALSE.
• Expressions will be evaluated from left to right.
e.g-1.
A or B and C
Three address sequence
T1 = B and C
T2 = A or T1


Numerical Representation

e.g-2.
A relational expression A< B is equivalent to the conditional
statement
if A < B then 1 else 0

Three address sequence:


(1) if A < B goto (4)
(2) T=0
(3) goto (5)
(4) T=1
(5) ---



Semantic Rules for Boolean Expression
E → E1 or E2        { E.place = newtemp();
                      gen(E.place = E1.place or E2.place) }

E → E1 and E2       { E.place = newtemp();
                      gen(E.place = E1.place and E2.place) }

E → not E1          { E.place = newtemp();
                      gen(E.place = not E1.place) }

E → (E1)            { E.place = E1.place }

E → id1 relop id2   { E.place = newtemp();
                      gen(if id1.place relop.op id2.place goto nextquad + 3);
                      gen(E.place = 0);
                      gen(goto nextquad + 2);
                      gen(E.place = 1) }

E → TRUE            { E.place = newtemp();
                      gen(E.place = 1) }

E → FALSE           { E.place = newtemp();
                      gen(E.place = 0) }

Notes:
• E.place = newtemp() is shorthand for T = NEWTEMP(); E.PLACE = T.
• nextquad is the next available entry in the quadruple array.
Translation of A< B or C
Three address code is:
(1) if A < B goto (4)
(2) T1=0
(3) goto (5)
(4) T1=1
(5) T2 =T1 or C
If triples are used, the numbers must be changed.

Unconditional jump:
goto L
       op     arg1   arg2
(0)    goto   L      --

Conditional jump:
if x < y goto L
       op     arg1   arg2
(0)    <      x      y
(1)    if     (0)    L
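The nextquad + 3 and nextquad + 2 offsets in the relational rule are easy to get wrong, so a small Python sketch may help. The class QuadList and its method names are assumptions made for illustration; quadruples are stored as plain strings and numbered from 1, and nextquad is read at the moment each statement is generated.

```python
class QuadList:
    """Numerical representation of boolean expressions, quads numbered from 1."""

    def __init__(self):
        self.quads = []
        self.count = 0

    @property
    def nextquad(self):
        # index of the next available entry in the quadruple array
        return len(self.quads) + 1

    def gen(self, text):
        self.quads.append(text)

    def newtemp(self):
        self.count += 1
        return f"T{self.count}"

    def relop(self, a, op, b):
        """E -> id1 relop id2 : leave 1 or 0 in a new temporary."""
        t = self.newtemp()
        self.gen(f"if {a} {op} {b} goto ({self.nextquad + 3})")
        self.gen(f"{t} = 0")
        self.gen(f"goto ({self.nextquad + 2})")
        self.gen(f"{t} = 1")
        return t

    def boolean(self, op, p1, p2):
        """E -> E1 or E2 / E -> E1 and E2."""
        t = self.newtemp()
        self.gen(f"{t} = {p1} {op} {p2}")
        return t


q = QuadList()
t1 = q.relop("A", "<", "B")
t2 = q.boolean("or", t1, "C")
for number, quad in enumerate(q.quads, start=1):
    print(f"({number}) {quad}")
```

Run on A < B or C, the sketch numbers its jumps (4) and (5), as in the five-statement sequence shown for this expression.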


Control Flow Representation
• Flow of control statements are translated with the jump method (short-circuit code). Consider the following rules:
S → if E then S1 | if E then S1 else S2
S → while E do S1
Attributes used
if E then S1 else S2

E is associated with two labels:
– E.true: the label to which control flows if E is true
– E.false: the label to which control flows if E is false

S.next is a label attached to the first three-address instruction to be executed after the code for S (S1 or S2).
Example
Semantic rules for S → if E then S1 else S2:
E.true = newlabel
E.false = newlabel
S1.next = S.next
S2.next = S.next
S.code = E.code ||
         gen(E.true ':') ||
         S1.code ||
         gen(goto S.next) ||
         gen(E.false ':') ||
         S2.code

The basic idea behind the translation: suppose E is of the form A < B; then the generated code is of the form
if A < B goto E.true
goto E.false

e.g.
if A < B then A = A + 1 else B = B + 1

Three address code:
if A < B goto L1
goto L2
L1: A = A + 1
goto L3
L2: B = B + 1
L3: ----
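The way S.code is assembled from E.code, S1.code and S2.code can be sketched in Python. The function names newlabel and gen_if_else are illustrative assumptions, the condition is taken to be a single relational test, and S.next is passed in by the caller because it is an inherited attribute. The sketch uses the placeholder name NEXT for it, where the example above uses L3.

```python
import itertools

_label_ids = itertools.count(1)


def newlabel():
    """Return a fresh label L1, L2, ..."""
    return f"L{next(_label_ids)}"


def gen_if_else(cond, s1_code, s2_code, s_next):
    """S -> if E then S1 else S2, with E a simple relational test."""
    e_true = newlabel()
    e_false = newlabel()
    return (
        # E.code: jump to E.true or E.false
        [f"if {cond} goto {e_true}", f"goto {e_false}", f"{e_true}:"]
        + s1_code
        + [f"goto {s_next}", f"{e_false}:"]
        + s2_code
    )


code = gen_if_else("A < B", ["A = A + 1"], ["B = B + 1"], s_next="NEXT")
print("\n".join(code))
```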
Backpatching
• The syntax-directed definition can be implemented in two or more passes (we have both synthesized attributes and inherited attributes).
– Build the tree first.
– Walk the tree in depth-first order.

• The main difficulty with intermediate code generation in one pass is that we may not know the target of a branch when we generate code for flow of control statements.
• Backpatching is the technique to get around this problem.
• Generate branch instructions with empty targets.
• When the target is known, fill in the label of the branch instructions (backpatching).
Backpatching
Backpatching fills in the address in place of a label once the proper label is determined.
Backpatching algorithms perform three types of operations:

1) makelist(i) – creates a new list containing only i, an index into the array of quadruples, and returns a pointer to the list it has made.

2) merge(i, j) – concatenates the lists pointed to by i and j, and returns a pointer to the concatenated list.

3) backpatch(p, i) – inserts i as the target label for each of the statements on the list pointed to by p.
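The three operations can be tried out with a short Python sketch. It is a hand-driven illustration rather than a parser: the helper names emit and nextquad and the [text, target] layout of a quadruple are assumptions, Python lists stand in for the pointers, and quadruples are numbered from 1. The calls below generate code for while (A < C and B > D) do C = C + 1; A = A + B in a single pass, leaving jump targets empty until they are known.

```python
quads = []   # each entry is [text, target]; target stays None until patched


def nextquad():
    return len(quads) + 1


def emit(text, target=None):
    """Append a quadruple and return its 1-based index."""
    quads.append([text, target])
    return len(quads)


def makelist(i):
    return [i]


def merge(p1, p2):
    return p1 + p2


def backpatch(p, target):
    for i in p:
        quads[i - 1][1] = target


begin = nextquad()                          # loop re-test point
t1 = makelist(emit("if A < C goto"))        # (1) target unknown
f1 = makelist(emit("goto"))                 # (2) target unknown
backpatch(t1, nextquad())                   # 'and': true exit goes to (3)
t2 = makelist(emit("if B > D goto"))        # (3)
f2 = makelist(emit("goto"))                 # (4)
falselist = merge(f1, f2)                   # both false exits leave the loop
backpatch(t2, nextquad())                   # body starts at (5)
for stmt in ("T1 = C + 1", "C = T1", "T2 = A + B", "A = T2"):
    emit(stmt)
emit("goto", begin)                         # (9) back to the test
backpatch(falselist, nextquad())            # loop exit is (10)

for number, (text, target) in enumerate(quads, start=1):
    print(number, text, "" if target is None else f"({target})")
```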


Code for While-do Statement

Generate three address code for the following code:

while (A < C and B > D) do
    C = C + 1
    A = A + B

Three address code for the given code is:
1. if (A < C) goto (3)
2. goto (10)
3. if (B > D) goto (5)
4. goto (10)
5. T1 = C + 1
6. C = T1
7. T2 = A + B
8. A = T2
9. goto (1)
10. --


Code for While-do Statement

Generate three address code for the following code:

while (A < C and B > D) do
    if A = 1 then C = C + 1
    else A = A + B

Three address code for the given code is:
1. if (A < C) goto (3)
2. goto (13)
3. if (B > D) goto (5)
4. goto (13)
5. if (A = 1) goto (7)
6. goto (10)
7. T1 = C + 1
8. C = T1
9. goto (1)
10. T2 = A + B
11. A = T2
12. goto (1)
13. --


Code for do-while Statement

Generate three address code for the following code:

c = 0
do
{
    if (a < b) then
        x++;
    else
        x--;
    c++;
} while (c < 5)

Three address code for the given code is:
1. c = 0
2. if (a < b) goto (4)
3. goto (7)
4. T1 = x + 1
5. x = T1
6. goto (9)
7. T2 = x – 1
8. x = T2
9. T3 = c + 1
10. c = T3
11. if (c < 5) goto (2)
12. --
Case Statements
• Switch and case statement is available in a variety of
languages. The syntax of case statement is as follows:



• When the switch keyword is seen, a new temporary t and two new labels, test and next, are generated.
• For each case keyword, a new label Li is created and entered into the symbol table.
• The value Vi of each case constant and a pointer to this symbol-table entry are placed on a stack.


Translation of Switch statement

case V1 L1
case V2 L2
...
case Vn-1 Ln-1
case T Ln
label NEXT

Equivalent code: each "case Vi Li" stands for "if T = Vi goto Li".


Code for Switch-case Statement

Generate three address code for the following code:

switch (ch)
{
    case 1 : c = a + b;
             break;
    case 2 : c = a – b;
             break;
}

Three address code for the given code is:
      if ch = 1 goto L1
      if ch = 2 goto L2
      goto Last
L1:   T1 = a + b
      c = T1
      goto Last
L2:   T2 = a – b
      c = T2
      goto Last
Last:
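The layout of the generated code (all tests first, then one labelled block per case) can be produced by a short Python function. The name gen_switch and the (value, body) pair format are assumptions for illustration; every case is taken to end in break. The sketch also emits an explicit goto Last after the tests, so that a selector value matching no case skips all the case bodies.

```python
def gen_switch(selector, cases, last="Last"):
    """cases is a list of (value, body) pairs; every case ends in break."""
    labels = [f"L{i}" for i in range(1, len(cases) + 1)]
    code = []
    # test sequence: one conditional jump per case constant
    for (value, _), label in zip(cases, labels):
        code.append(f"if {selector} = {value} goto {label}")
    code.append(f"goto {last}")          # no case matched
    # case bodies, each followed by the jump that implements break
    for (_, body), label in zip(cases, labels):
        code.append(f"{label}:")
        code.extend(body)
        code.append(f"goto {last}")
    code.append(f"{last}:")
    return code


code = gen_switch(
    "ch",
    [
        (1, ["T1 = a + b", "c = T1"]),
        (2, ["T2 = a - b", "c = T2"]),
    ],
)
print("\n".join(code))
```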


Procedure Calls
• The procedure is an important and frequently used programming construct, so a compiler is expected to generate good code for procedure calls and returns.
Calling sequence:
The translation for a call includes a sequence of actions taken on entry to and exit from each procedure. The following actions take place in a calling sequence:
– When a procedure call occurs, space is allocated for the activation record.
– The arguments of the called procedure are evaluated.
– The state of the calling procedure is saved so that it can resume execution after the call.
– The return address is also saved. It is the address of the location to which the called routine must transfer control after it has finished.
– Finally, a jump to the beginning of the code for the called procedure is generated.
Procedure Calls
• Consider the grammar for a simple procedure call statement:
(1) S → call id(Elist)
(2) Elist → Elist, E      argument list: arguments separated by commas
(3) Elist → E             single argument


Syntax-Directed Translation
Production Rule         Semantic Action
S → call id(Elist)      { for each item p on QUEUE do
                              GEN(param p);
                          GEN(call id.PLACE) }
Elist → Elist, E        { append E.PLACE to the end of QUEUE }
Elist → E               { initialize QUEUE to contain only E.PLACE }

* QUEUE is used to store the list of parameters in the procedure call.
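A minimal Python sketch of these actions is given below. The function name gen_call is hypothetical, and the argument places are assumed to have been computed already (code for an argument such as a+1 would be emitted before the call). The argument count after the procedure name follows the examples in these slides, which write call sum, 3.

```python
from collections import deque


def gen_call(proc, arg_places):
    """Emit param/call statements for: call proc(arg_places...)."""
    queue = deque()
    for place in arg_places:
        # Elist -> E starts the queue; Elist -> Elist , E appends to it
        queue.append(place)
    code = [f"param {p}" for p in queue]     # S -> call id(Elist)
    code.append(f"call {proc}, {len(queue)}")
    return code


print("\n".join(gen_call("sum", ["x", "y", "z"])))
```

Collecting the places in a queue first keeps all the param statements together, immediately before the call, even when evaluating an argument emits code of its own.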


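Parse Tree

e.g. sum(x,y,z)

Three address code:
param x
param y
param z
call sum, 3

Parse tree (QUEUE contents shown at each Elist node):

S
├── call
├── id
├── (
├── Elist                         QUEUE: x y z
│   ├── Elist                     QUEUE: x y
│   │   ├── Elist                 QUEUE: x
│   │   │   └── E                 E.place = x
│   │   ├── ,
│   │   └── E                     E.place = y
│   ├── ,
│   └── E                         E.place = z
└── )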


Example
Procedure call: sum(a+1, b, 5)

void main()
{
    sum(a+1, b, 5);
}
void sum(int x, int y, int z)

Three address code:
T1 = a + 1
T2 = 5
param T1
param b
param T2
call sum, 3


Next Topic:
Symbol Table



Symbol Table
• An important data structure created and maintained by the compiler in order to keep track of the semantics of variables.

• It stores scope and binding information about names, and information about instances of various entities such as variable and function names, classes, objects, etc.

• It is built in the lexical and syntax analysis phases.

• The information is collected by the analysis phases of the compiler and is used by the synthesis phases to generate code.

• It is used by the compiler to achieve compile-time efficiency.
It is used by the various phases of the compiler as follows:
Lexical Analysis: Creates new entries in the table, for example
entries about tokens.
Syntax Analysis: Adds information regarding attribute type, scope,
dimension, line of reference, use, etc. to the table.
Semantic Analysis: Uses the available information in the table to check
semantics, i.e. to verify that expressions and assignments are
semantically correct (type checking), and updates the table accordingly.
Intermediate Code generation: Refers to the symbol table to know how
much and what type of run-time storage is allocated; the table also helps in
adding temporary variable information.
Code Optimization: Uses information present in the symbol table for
machine-dependent optimization.
Target Code generation: Generates code by using the address information
of identifiers present in the table.



Symbol Table entries
• Each entry in the symbol table is associated with attributes that support
the compiler in its different phases.
• The symbol table is searched every time a name is encountered in the
source text.
• Changes to the table occur if a new name or new information about an
existing name is discovered.
• The size of the symbol table may be fixed at compile time, or the table
may grow dynamically.
• Each entry in the symbol table is for the declaration of a name.
• The format of entries is not uniform.

Items stored in the symbol table:
1. Variable names and constants
2. Procedure and function names
3. Literal constants and strings
4. Compiler generated temporaries
5. Labels in source languages

The following information about identifiers is stored in the symbol table:
1. The name.
2. The data type.
3. The block level.
4. Its scope (local, global).
5. Pointer / address
6. Its offset from base pointer
7. Function name, parameter and variable.
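
One way to picture a single entry is as a record with one field per item of information listed above. The Python sketch below is illustrative only: the class name `SymbolEntry`, its field names and the sample values (such as the offset of -4) are assumptions, not a layout prescribed by the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SymbolEntry:
    name: str                        # the name
    data_type: str                   # the data type
    block_level: int = 0             # the block (nesting) level
    scope: str = "global"            # "local" or "global"
    address: Optional[int] = None    # pointer / address, if known
    offset: Optional[int] = None     # offset from the base pointer
    params: List[str] = field(default_factory=list)  # for functions only


# A local variable and a function entry (sample values are hypothetical).
x = SymbolEntry("x", "int", block_level=1, scope="local", offset=-4)
total = SymbolEntry("sum", "int", params=["x", "y", "z"])
print(x)
print(total)
```

Optional fields default to empty values, which reflects the point that the format of entries is not uniform: a function entry carries parameters, a variable entry does not.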
• A symbol table may serve the following purposes depending
upon the language :
– To store the names of all entities in a structured form at
one place.
– To verify if a variable has been declared.
– To implement type checking, by verifying assignments and
expressions.
– To determine the scope of a name (scope resolution).



Operations of Symbol table

The basic operations defined on a symbol table include:



Insert
• The insert() operation is used most frequently in the analysis
phase, when tokens are identified and names are stored in
the table.
• It adds information to the symbol table, such as each unique
name occurring in the source code.
• The attributes of a symbol are the information associated with
that symbol in the source code: its state, value, type and scope.

For example:
int x;
should be processed by the compiler as:
insert (x, int)
Lookup
In the symbol table, the lookup() operation is used to search for a name.
It is used to determine:
• Whether the symbol exists in the table.
• Whether the symbol is declared before it is used.
• Whether the name is used within its scope.
• Whether the symbol is initialized.
• Whether the name is declared multiple times.

The basic format of the lookup() function is as follows:

lookup (symbol)

This format varies according to the programming language.
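
The two operations can be sketched together in Python. This is a minimal illustration that assumes a single flat table backed by a dictionary; the class name `SymbolTable` and the choice to signal a repeated declaration with `KeyError` are assumptions made for the example.

```python
class SymbolTable:
    def __init__(self):
        self._entries = {}                 # name -> attributes

    def insert(self, name, type_):
        """Record a declaration such as `int x;` as insert(x, int)."""
        if name in self._entries:
            raise KeyError(f"multiply defined name: {name}")
        self._entries[name] = {"type": type_}

    def lookup(self, name):
        """Return the attributes of name, or None if it is undeclared."""
        return self._entries.get(name)


table = SymbolTable()
table.insert("x", "int")                   # int x;
print(table.lookup("x"))                   # attributes of x
print(table.lookup("y"))                   # None: y was never declared
```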



Implementation

• The symbol table can be implemented as an unordered list if
the compiler only has to handle a small amount of data.

• A symbol table can be implemented using one of the
following techniques:
– Linear (sorted or unsorted) list
– Hash table
– Binary search tree

• Symbol tables are mostly implemented as hash tables.


List
• An array is used to store names and associated information.
• A pointer “available” is maintained at the end of all stored
records, and new names are added in the order in which they arrive.
• To search for a name, we scan from the beginning of the list up to the
available pointer; if the name is not found, we report the error “use of
undeclared name”.
• While inserting a new name, we must ensure that it is not
already present; otherwise the error “multiply defined
name” occurs.
• Appending at the available pointer is fast, O(1), but lookup
(including the duplicate check before insertion) is slow for large
tables: O(n) on average.
• The advantage is that it takes a minimum amount of space.
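
The list organization described above might look like the following Python sketch. It is illustrative only: a Python list stands in for the array, its end plays the role of the “available” pointer, and the class and method names are hypothetical.

```python
class ListSymbolTable:
    def __init__(self):
        self.records = []          # new records are appended at the end

    def lookup(self, name):
        """Linear scan from the start of the list: O(n)."""
        for record in self.records:
            if record[0] == name:
                return record
        return None

    def insert(self, name, info):
        """Append a new name after checking that it is not already present."""
        if self.lookup(name) is not None:
            raise ValueError(f"multiply defined name: {name}")
        self.records.append((name, info))

    def use(self, name):
        """Return the information for a name that must already be declared."""
        record = self.lookup(name)
        if record is None:
            raise NameError(f"use of undeclared name: {name}")
        return record[1]


table = ListSymbolTable()
table.insert("a", "int")
table.insert("b", "float")
print(table.use("b"))
```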



• The compiler plans out the activation record for each procedure.
• The simplest and easiest to implement data structure for a symbol table is
a linear list of records.
• We use a single array, or equivalently several arrays, to store names and
their associated information.
• If the symbol table contains n names, then to find the data about a name we
search n/2 names on average, so the cost of an inquiry is
proportional to n.

Drawbacks:
1. As the array fills, collisions become more frequent, which reduces
performance.
2. Table size is an issue: dynamically increasing the table size is difficult.



Storing the characters of the name
Example:
static int a;
float b;

Name    Type     Attribute
a       int      static
b       float    -

1. Fixed length
e.g.
int calculate
int sum
int a,b

Name                 Type
c a l c u l a t e    int
s u m                int
a                    int
b                    int



2. Variable length

Index:      0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18
Character:  c  a  l  c  u  l  a  t  e  $  s  u  m  $  a  $  b  $

Starting index    Length    Type
0                 10        int
10                4         int
14                2         int
16                2         int
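
The variable-length scheme can be sketched in Python as one shared character string plus a table of (starting index, length, type) rows. As in the table above, the length includes the `$` terminator. The class name `NamePool` and its methods are hypothetical.

```python
class NamePool:
    def __init__(self, terminator="$"):
        self.chars = ""            # all names, each followed by the terminator
        self.rows = []             # (starting index, length, type)
        self.terminator = terminator

    def add(self, name, type_):
        start = len(self.chars)
        self.chars += name + self.terminator
        # The stored length counts the terminator as well.
        self.rows.append((start, len(name) + 1, type_))

    def name_at(self, row):
        start, length, _ = self.rows[row]
        return self.chars[start:start + length - 1]   # drop the terminator


pool = NamePool()
for name in ["calculate", "sum", "a", "b"]:
    pool.add(name, "int")

print(pool.chars)
for row in pool.rows:
    print(row)
```

Compared with fixed-length fields, no space is wasted on short names such as `a`, and long names are not truncated.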



Hash Tables for Symbol Tables
• This method is generally more efficient than linear lists.
• This scheme gives the capability of performing e inquiries on
n names in time proportional to n(n + e)/m, for any constant
m.

The basic hashing scheme:

• There are two parts to the data structure:
– A hash table consisting of a fixed array of m pointers to
table entries.
– Table entries organized into m separate linked lists, called
buckets (some buckets may be empty). Each record in the
symbol table appears on exactly one of these lists.
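
The two-part structure can be sketched in Python as follows. This is an illustration under assumptions: Python lists stand in for the linked lists, the table size m = 8 is arbitrary, and the hash function (sum of character codes modulo m) is a toy choice made so that the example is deterministic.

```python
class HashSymbolTable:
    def __init__(self, m=8):
        self.m = m
        # Fixed array of m buckets; some may stay empty.
        self.buckets = [[] for _ in range(m)]

    def _index(self, name):
        # Toy hash function (an assumption): sum of character codes mod m.
        return sum(ord(c) for c in name) % self.m

    def insert(self, name, info):
        bucket = self.buckets[self._index(name)]
        for entry_name, _ in bucket:
            if entry_name == name:
                raise ValueError(f"multiply defined name: {name}")
        bucket.append((name, info))        # each record is on exactly one list

    def lookup(self, name):
        # Only the one bucket selected by the hash function is searched.
        for entry_name, info in self.buckets[self._index(name)]:
            if entry_name == name:
                return info
        return None


table = HashSymbolTable()
table.insert("ab", "int")
table.insert("ba", "float")                # same bucket as "ab" under this hash
table.insert("sum", "int")
print(table.lookup("ba"))
```

Since a lookup scans one bucket rather than the whole table, the average list length is about n/m, which is where the improvement over a linear list comes from.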

Representing Scope Information
• A simple approach is to maintain a separate symbol table for
each scope.
• The symbol table for a procedure or scope is the compile-time
equivalent of an activation record.
• A linked list is a good fit for representing scope information.



• Ideally, the hash function will assign each key to a unique bucket.
• However, most hash table designs employ an imperfect hash
function, which might cause hash collisions, where the hash
function generates the same index for more than one key.

[Figure: A small phone book as a hash table]

[Figure: Hash collision resolved by separate chaining]



Binary Search Tree
• Another approach to implementing a symbol table is to use a binary
search tree, i.e. to add two link fields (left and right child) to each record.
• All names are inserted as descendants of the root node in a way that always
preserves the binary search tree property.
• Insertion and lookup are O(log2 n) on average.
Binary tree search routine
while P ≠ null do              /* P is initially a pointer to the root */
    if NAME = NAME(P) then
        NAME found; do required action and exit
    else if NAME < NAME(P) then
        P = LEFT(P)            /* visit left child */
    else P = RIGHT(P)          /* visit right child */
/* if we fall through the loop, we have failed to find NAME */
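
The search routine, together with a matching insertion, can be sketched in Python. This is a minimal unbalanced tree, so the O(log2 n) average holds only when names arrive in roughly random order. The names `Node`, `insert` and `lookup` are hypothetical.

```python
class Node:
    def __init__(self, name, info):
        self.name, self.info = name, info
        self.left = self.right = None      # the two link fields


def insert(root, name, info):
    """Insert name below root and return the (possibly new) root."""
    new = Node(name, info)
    if root is None:
        return new
    p = root
    while True:
        if name == p.name:
            raise ValueError(f"multiply defined name: {name}")
        if name < p.name:
            if p.left is None:
                p.left = new
                return root
            p = p.left                     # visit left child
        else:
            if p.right is None:
                p.right = new
                return root
            p = p.right                    # visit right child


def lookup(root, name):
    """Follow the search routine; return info or None if NAME is not found."""
    p = root
    while p is not None:
        if name == p.name:
            return p.info
        p = p.left if name < p.name else p.right
    return None


root = None
for n in ["sum", "calculate", "a", "b"]:
    root = insert(root, n, "int")
print(lookup(root, "b"), lookup(root, "z"))
```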
Data structure for symbol table

• A compiler contains two types of symbol table:
1. global symbol table and
2. scope symbol table.
• The global symbol table can be accessed by all the procedures
and scope symbol tables.



The scopes of names and their symbol tables are arranged in a
hierarchical structure, as shown below:
int value=10;
int sum_num()
{
    int num_1;
    int num_2;
    {
        int num_3;
        int num_4;
    }
    int num_5;
    {
        int num_6;
        int num_7;
    }
}
int sum_id()
{
    int id_1;
    int id_2;
    {
        int id_3;
        int id_4;
    }
    int num_5;
}



The global symbol table contains one global variable (value) and two
procedure names (sum_num and sum_id). The names declared in the sum_num
table are not available to sum_id and its child tables.
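
A hierarchy of scope tables can be sketched in Python by giving each table a link to its parent and searching outward from the current scope. This is an illustration only: the class name `Scope`, its methods and the scope labels are hypothetical, and only a few of the names from the example are declared.

```python
class Scope:
    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent               # enclosing scope, None for global
        self.names = {}

    def declare(self, name, type_):
        self.names[name] = type_

    def lookup(self, name):
        """Search this table, then each enclosing table, up to the global one."""
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.label         # label of the table that holds name
            scope = scope.parent
        return None


global_scope = Scope("global")
global_scope.declare("value", "int")
global_scope.declare("sum_num", "procedure")
global_scope.declare("sum_id", "procedure")

sum_num = Scope("sum_num", global_scope)
sum_num.declare("num_1", "int")
inner = Scope("sum_num inner 1", sum_num)
inner.declare("num_3", "int")

sum_id = Scope("sum_id", global_scope)
sum_id.declare("id_1", "int")

print(inner.lookup("num_1"))    # found in the enclosing sum_num table
print(inner.lookup("value"))    # found in the global table
print(sum_id.lookup("num_1"))   # None: sum_num's names are not visible here
```

The search never moves sideways or downward, which is why names in the sum_num table stay hidden from sum_id, and names in an inner block stay hidden from the enclosing procedure.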



Improvised Symbol Table Structure

• The symbol table is the environment where variables and
functions/methods exist according to their scope, and where the most
recently updated values are kept for the successful running of the
code.

• It helps the code to function. It is created during compilation and is
maintained and used while the code runs.

• Adding a utility called a common file can help in the conversion of one
code to another.

• The common file can be described as a file containing the
common functionalities of different languages. For example, every language
has a print function but with different syntax; these different syntaxes
of print are added to the common file, which helps in the conversion.


THANK YOU
