18CS61 - SS and C - Module 5
VI Semester
System Software and Compiler Design (18CS61)
Module 5
Syntax Directed Translation
Contents :
Syntax Directed Translation: Intermediate code generation, Code generation
Text book 2: Chapter 5.1, 5.2, 5.3, 6.1, 6.2, 8.1, 8.2
Introduction
Semantic analysis is the third phase of the compiler. Its main goal is to check the program for
semantic correctness so that it can execute properly.
The semantic analysis phase acts as an interface between the syntax analysis phase and the code
generation phase. It accepts the parse tree from the syntax analysis phase, adds semantic
information to the parse tree, and performs certain checks based on this information.
Example:
Production : E→E+T
Semantic rule: E.val = E.val + T.val
Here E.val and T.val are attributes.
Note :
A semantic rule is associated with a production.
The attribute name val is associated with each non-terminal used in the rule.
Attribute
Examples of attributes
1. The data types associated with variables such as int, float, char etc.
2. The value of an expression
3. The location of a variable in memory
4. The object code of a function
5. The number of significant digits in a number etc
Semantic rule
The rule that describes how to compute the attribute values associated with a grammar symbol is
called semantic rule.
Example :
Consider the production E → E + T
The attribute value of E on the LHS of the production, denoted by E.val, is calculated by
adding the attribute values of E and T on the RHS of the production.
E.val = E.val + T.val
Types of attributes
1. Synthesized attribute
2. Inherited attribute
Synthesized attribute
An attribute of a non-terminal A whose value is derived from the attribute values of its children or
of A itself is called a synthesized attribute.
Hence the values of synthesized attributes are passed from the children to the parent node in a
bottom-up manner.
Example:
Production : E→E+T
Semantic rule: E.val = E.val + T.val
For example, if E.val = 10 and T.val = 20 on the RHS, then E.val = 30 on the LHS.
Inherited attribute
An attribute of a non-terminal A whose value is derived from the attribute values of its siblings, its
parent or A itself is called an inherited attribute.
Hence the values of inherited attributes are passed from siblings or from the parent to the children
in a top-down manner.
Example:
Production : D → T V
Where D – declaration
T – type such as int
V – variable such as sum
The type int obtained from the Lexical Analyzer is already stored in T.type, and its value is
transferred to its sibling V, i.e., V.inh = T.type
Since the attribute value of V is obtained from its sibling, it is an inherited attribute and is
denoted by inh.
Similarly, the value int stored in V.inh is transferred to its child id. Hence entry is an
inherited attribute of id, and the attribute value is denoted by id.entry.
A synthesized attribute can be evaluated during a single bottom-up traversal of the parse tree.
An inherited attribute can be evaluated during a single top-down traversal of the parse tree.
A parse tree showing the attribute values of each node is called annotated parse tree.
The terminals in the annotated parse tree can have only synthesized attribute values and they
are obtained directly from Lexical Analyzer. So, there are no semantic rules in SDD to get
lexical values into terminals of the annotated parse tree. Terminals can never have inherited
attributes.
The other nodes in the annotated parse tree may have synthesized or inherited attributes.
Questions
Solution :
Let us assume an input string 4 * 5 + 6 for computing synthesized attributes. The annotated parse
tree for the input string is
2. Write the grammar and SDD for a simple desk calculator and show annotated parse tree
for the expression (3+4)*(5+6).
Solution :
A simple desk calculator performs operations such as addition, subtraction, multiplication and
division with or without ().
Grammar :
S → E n          where n represents the end-of-file marker
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | digit
Annotated parse tree for the expression (3+4)*(5+6) consisting of attribute values for each non-
terminal is given below.
Solution :
Productions      Semantic rules
S → E N          S.val = E.val
E → E + T        E.val = E.val + T.val
E → E - T        E.val = E.val - T.val
E → T            E.val = T.val
T → T * F        T.val = T.val * F.val
T → T / F        T.val = T.val / F.val
T → F            T.val = F.val
F → ( E )        F.val = E.val
F → digit        F.val = digit.lexval
N → ;            ;
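The semantic rules above can be sketched as a small evaluator in which every function returns the synthesized attribute val of its non-terminal. This is only a minimal Python sketch under stated assumptions: the names tokenize, evaluate, parse_E, parse_T and parse_F are hypothetical, multi-digit numbers are accepted for convenience, and the left-recursive productions are handled with loops, which yields the same val attributes.

```python
import re

def tokenize(text):
    # Numbers, the four operators and parentheses; whitespace is skipped.
    return re.findall(r"\d+|[-+*/()]", text)

def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_F():
        # F -> ( E )   F.val = E.val
        # F -> digit   F.val = digit.lexval
        tok = advance()
        if tok == "(":
            val = parse_E()
            if advance() != ")":
                raise SyntaxError("expected )")
            return val
        return int(tok)

    def parse_T():
        # T -> T * F | T / F | F
        val = parse_F()
        while peek() in ("*", "/"):
            op = advance()
            f_val = parse_F()
            val = val * f_val if op == "*" else val / f_val
        return val

    def parse_E():
        # E -> E + T | E - T | T
        val = parse_T()
        while peek() in ("+", "-"):
            op = advance()
            t_val = parse_T()
            val = val + t_val if op == "+" else val - t_val
        return val

    return parse_E()  # S -> E n : S.val = E.val

print(evaluate("(3+4)*(5+6)"))  # 77
print(evaluate("4*5+6"))        # 26
```

The values printed are the val attributes at the root of the annotated parse trees for the two inputs.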
4. The SDD to translate binary integer number into decimal is shown below.
Construct the parse tree and annotated parse tree for the string 1100.
In Syntax Directed Translation, informal notations are associated with the productions of the
grammar. These notations are called semantic rules.
Syntax Directed Translation = Grammar + Semantic rules
SDTs are used
To build syntax trees for programming constructs
To translate infix expressions into postfix notation
To evaluate expressions
Types of SDT
1. S-attributed SDT
2. L-attributed SDT
S-attributed SDT
L-attributed SDT
1) It uses both synthesized and inherited attributes. An inherited attribute can take its value only
from the parent or from left siblings.
Example :
A → B C D { C.inh = A.inh, C.inh = B.inh, D.inh = B.inh, D.inh = A.inh }
But C.inh = D.inh is invalid as it takes its value from a right sibling.
2) Semantic actions can be placed anywhere on the RHS of the production.
Example :
A → {…} B C | B {…} C | B C {…}
3) Attributes are evaluated by traversing the parse tree - depth first, left to right.
Note
If a definition is S-attributed, then it is also L-attributed but NOT vice-versa.
S-attributed SDT: uses bottom-up parsing; semantic rules are always written at the rightmost position in the RHS.
L-attributed SDT: uses top-down parsing (depth first, left to right); semantic rules are written anywhere in the RHS.
Problems
4. Write the SDD for a simple type declaration and write the annotated parse tree for the
declaration “ float id1, id2, id3”.
Solution :
The grammar for the simple type declaration is
D → T L
T → int | float
L → L1 , id | id
Where
D : Declaration
T : Data type (int / float)
L : List of identifiers or a single identifier
1) The declaration D consists of a basic data type T followed by a list L of identifiers. T can be
either int or float. The type corresponding to the token int or float, obtained from the
Lexical Analyzer, is copied into the attribute T.type. The corresponding productions and
semantic rules are
Production       Semantic rule
T → int          T.type = integer
T → float        T.type = float
2) The attribute value T.type available in the left subtree must be transferred to the right
subtree L. Since the value is transferred from the left sibling to the right sibling, the attribute
must be an inherited attribute. It is denoted by L.inh and is computed by the following
semantic rule.
Production       Semantic rule
D → T L          L.inh = T.type
3) The type L.inh must be passed down to every identifier id, so it is copied into L1.inh,
where L1 is the leftmost child on the RHS of the production L → L1 , id. This is done by
the following semantic rule.
Production       Semantic rule
L → L1 , id      L1.inh = L.inh
4) The attribute value L.inh must in turn be entered as the type of the identifier id in the
production L → id. This can be done as follows.
Production       Semantic rule
L → id           Addtype(id.entry, L.inh)
Addtype()
• id.entry is a lexical value that points to the symbol table.
• L.inh is the type being assigned to every identifier in the list.
• The function installs L.inh as the type of the corresponding identifier.
5) The attribute value L.inh must also be entered as the type of the identifier id that is the
rightmost child on the RHS of the production L → L1 , id. This can be done as follows.
Production       Semantic rule
L → L1 , id      Addtype(id.entry, L.inh)
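The five rules above can be sketched in a few lines of Python. This is a hedged illustration, not the textbook's code: symbol_table, addtype and declare are hypothetical names, the symbol table is modelled as a dictionary, and id.entry is modelled as the identifier's name.

```python
symbol_table = {}

def addtype(entry, type_name):
    # Installs the inherited type as the type of the identifier.
    symbol_table[entry] = type_name

def declare(declaration):
    # D -> T L : split the declaration into the type word and the identifier list.
    type_word, id_list = declaration.split(None, 1)
    # T -> int | float : T.type is integer or float.
    t_type = {"int": "integer", "float": "float"}[type_word]
    # D -> T L : L.inh = T.type (value flows from the left sibling to the right sibling).
    l_inh = t_type
    # L -> L1 , id | id : L1.inh = L.inh, so every id receives the same inherited type.
    for name in id_list.split(","):
        addtype(name.strip(), l_inh)

declare("float id1, id2, id3")
print(symbol_table)  # {'id1': 'float', 'id2': 'float', 'id3': 'float'}
```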
Dependency Graph
A graph that shows the flow of information which helps in computation of various attribute values
in a particular parse tree is called dependency graph.
An edge from one attribute instance to another attribute instance indicates that the attribute value of
the first is needed to compute the attribute value of the second.
While Annotated parse tree shows the values of attributes, Dependency graph shows how these
values are computed.
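Because an edge means "must be computed first", any topological order of the dependency graph is a valid evaluation order for the attributes. The sketch below assumes Python 3.9 or later (for the standard graphlib module) and uses hypothetical instance names such as E1.val and T3.val for the attribute instances of the input 7+3*2; each key maps to the set of attributes it depends on.

```python
from graphlib import TopologicalSorter

# Attribute instances for 7 + 3 * 2 (names are illustrative assumptions).
depends_on = {
    "F1.val": {"digit1.lexval"},      # F -> digit  (7)
    "T1.val": {"F1.val"},             # T -> F
    "E1.val": {"T1.val"},             # E -> T
    "F2.val": {"digit2.lexval"},      # F -> digit  (3)
    "T2.val": {"F2.val"},             # T -> F
    "F3.val": {"digit3.lexval"},      # F -> digit  (2)
    "T3.val": {"T2.val", "F3.val"},   # T -> T * F
    "E.val": {"E1.val", "T3.val"},    # E -> E + T
}

order = list(TopologicalSorter(depends_on).static_order())
print(order)  # one valid evaluation order; E.val always comes last
```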
Example :
Production       Semantic rules
E → E + T        E.val = E.val + T.val
In the above figure, the dotted lines and the nodes connected to them represent the parse tree.
The shaded nodes labelled val, together with the solid arrows that originate at one node and end at
another, form the dependency graph.
Example :
S → E
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | digit
Input : 7+3*2
Question :
1. Give the SDD to process a sample variable declared in C and dependency graph for the
input “int a, b, c”
Dependency graph
A syntax tree is an abstract representation of the language constructs. Syntax trees are used to
write translation routines using an SDD. Constructing the syntax tree for an expression is similar
to translating the expression into postfix form.
Functions
1. mknode(op, left, right)
This function creates an operator node with label op and two pointers, left and right.
2. mkleaf(id, entry)
This function creates a node for an identifier with label id and a pointer to its symbol table entry,
given by entry.
3. mkleaf(num, val)
This function creates a node for a number with label num, where val is the value of that number.
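A minimal Python sketch of these three functions follows. The Node class and the postfix helper are assumptions added for illustration, and the symbol table pointer is modelled simply as the identifier's name. The calls build the tree for x * y - 5 + z, chosen here as an illustrative input.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Node:
    label: str                           # operator, "id" or "num"
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Union[str, int, None] = None  # symbol table entry or numeric value

def mknode(op, left, right):
    # Interior node labelled with the operator and pointing to two subtrees.
    return Node(op, left, right)

def mkleaf(label, value):
    # Leaf for an identifier (label "id") or a number (label "num").
    return Node(label, value=value)

def postfix(node):
    # Post-order walk of the tree gives the postfix form of the expression.
    if node.left is None and node.right is None:
        return str(node.value)
    return f"{postfix(node.left)} {postfix(node.right)} {node.label}"

p1 = mkleaf("id", "x")
p2 = mkleaf("id", "y")
p3 = mknode("*", p1, p2)
p4 = mkleaf("num", 5)
p5 = mknode("-", p3, p4)
p6 = mkleaf("id", "z")
p7 = mknode("+", p5, p6)
print(postfix(p7))  # x y * 5 - z +
```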
Questions
Symbol Operation
x p1=mkleaf(id, ptr for x)
y p2=mkleaf(id, ptr for y)
* p3=mknode(*, p1, p2)
5 p4=mkleaf(num, 5)
- p5=mknode(-, p3, p4)
z p6=mkleaf(id, ptr for z)
+ p7=mknode(+, p5, p6)
Syntax tree
Symbol Operation
3 p1=mkleaf(num, 3)
5 p2=mkleaf(num, 5)
* p3=mknode(*, p1, p2)
4 p4=mkleaf(num, 4)
+ p5=mknode(+, p3, p4)
Syntax tree
3. Assuming suitable SDD, construct syntax tree for the expression a-4+e
Solution :
Symbol Operation
a p1=mkleaf(id, ptr for a)
4 p2=mkleaf(num, 4)
- p3=mknode(-, p1, p2)
e p4=mkleaf(id, ptr for e)
+ p5=mknode(+, p3, p4)
Syntax tree
In the analysis-synthesis model of a compiler, the front end translates a source program into a
machine-independent intermediate code, and the back end then uses this intermediate code to
generate the target code.
The benefits of using machine-independent intermediate code are
It is easy to change the source or the target language by adapting only the front-end or back-end.
It makes optimization easier.
The intermediate representation can be directly interpreted.
Intermediate representations
2. Graphical representation
a) Syntax Tree
A syntax tree is a condensed form of a parse tree. The operator and keyword nodes of the
parse tree are moved to their parents, and a chain of single productions is replaced by a single
link in the syntax tree. The internal nodes are operators and the child nodes are operands. To
form the syntax tree, first fully parenthesize the expression. This makes it easy to recognize
which operands should be combined first.
Example :
x = -a * b + -a * b
Example 1 :
t0 = a + b
t1 = t0 + c
d = t0 + t1
Example 2 :
a + a * (b - c) + (b - c) * d
Construction of DAG
Step 1:
For each 3-address instruction of the form x := y op z do the following activities
a) Find a node labeled y. If none exists, create it.
b) Find a node labeled z. If none exists, create it.
c) Find a node labeled op with y as the left child and z as the right child. If none is found,
create one. Call this node N, whether it was found or newly created, and add x to the list of
identifiers attached to N.
Step 2:
For each 3-address instruction of the form x := y, do the following activities.
a) Find a node labeled y. If it does not exist, create it. Call this node N.
b) Add x to the list of identifiers attached to N.
Step 3:
For each 3-address instruction of the form x := -y, do the following activities.
a) Find a node labeled y. If node does not exist, create it.
b) Find a node labeled – with y as the child. If none exists, create it. Call this node N.
c) Add x to the list of identifiers attached to N.
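The three steps can be sketched in Python as follows. This is an illustrative sketch under assumptions, not a definitive implementation: build_dag is a hypothetical name, each instruction is given as a tuple (x, op, y, z) with op set to None for a copy and z set to None for a unary operator, and the sketch also detaches x from the node it was previously attached to, a detail the steps above leave implicit.

```python
def build_dag(instructions):
    nodes = []     # each node: {"label": ..., "children": (...), "ids": [...]}
    current = {}   # name -> index of the node that currently holds its value

    def leaf(name):
        # "Find a node labeled y. If none exists, create it."
        if name not in current:
            nodes.append({"label": name, "children": (), "ids": []})
            current[name] = len(nodes) - 1
        return current[name]

    def interior(op, children):
        # Reuse an existing node with the same operator and the same children.
        for index, node in enumerate(nodes):
            if node["label"] == op and node["children"] == children:
                return index
        nodes.append({"label": op, "children": children, "ids": []})
        return len(nodes) - 1

    for x, op, y, z in instructions:
        if op is None:                       # Step 2: x := y
            n = leaf(y)
        elif z is None:                      # Step 3: x := op y
            n = interior(op, (leaf(y),))
        else:                                # Step 1: x := y op z
            n = interior(op, (leaf(y), leaf(z)))
        old = current.get(x)
        if old is not None and x in nodes[old]["ids"]:
            nodes[old]["ids"].remove(x)      # assumption: x now names node n only
        nodes[n]["ids"].append(x)
        current[x] = n
    return nodes

# a + a * (b - c) + (b - c) * d
dag = build_dag([
    ("t1", "-", "b", "c"),
    ("t2", "*", "a", "t1"),
    ("t3", "*", "t1", "d"),
    ("t4", "+", "a", "t2"),
    ("t5", "+", "t4", "t3"),
])
print(len(dag))  # 9 nodes: 4 leaves and 5 operators; b - c is shared
```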
Questions
Solution :
1. x = x*3
2. y = y+x
3. x = y-z
4. y = x
Solution :
Convert to 3-address instructions.
t1= a+b
t2=t1+c
t3=t1*t2
Solution :
Note : There are no parentheses around a+a. Among +, * and -, the operator * has the highest precedence.
Hence the expression is evaluated as a + (a * (b - c)) + ((b - c) * d)
Convert to 3-address instructions.
t1 = b - c
t2 = a * t1
t3 = t1 * d
t4 = a + t2
t5 = t4 + t3
Solution :
a = b*c
d = b
e = d*c
5. Obtain the DAG representation for the expression (((a+a) + (a+a)) + ((a+a) + (a+a)))
Solution :
Convert to 3-address instructions.
t1=a+a
t2=t1+t1
t3=t2
t4=t3+t3
t3=t2+c t4=t3+d
t4=t2*d t5=t3+t4
7. Obtain the DAG representation for the expression ((x+y) - ((x+y) * (x-y))) + ((x+y) * (x-y))
Solution :
Convert to 3-address instructions.
t1=x+y
t2=x-y
t3=t1*t2
t4=t1-t3
t5=t3
t6=t4+t5
6 Procedure calls
   pqr (x)
   {
   …..
   return y;
   }
   Here x is used as the parameter to procedure pqr. The return statement returns the value of y.
7 Indexed assignment
   x := y[i]    The value of array y at the ith index is assigned to x.
   x[i] := y    The value of identifier y is assigned to index i of array x.
8 Address and pointer assignment
   x := &y      The value of x will be the address (location) of y.
   x := *y      y is a pointer; the value stored at the location it points to is assigned to x.
   *x := y      The r-value of the object pointed to by x is set to the r-value of y.
a) Quadruple representation
In quadruple representation, each instruction is divided into four fields - op, arg1, arg2 and
result.
• The op field is used to represent the internal code for the operator.
• The arg1 and arg2 represent two operands
• The result is used to store the result of the expression.
Example :
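a := -b * c + d
Since unary minus binds tightest, followed by * and then +, the statement is evaluated as ((-b) * c) + d.
Three-address code    Location    op        arg1    arg2    result
t1 = -b               (0)         uminus    b       -       t1
t2 = t1 * c           (1)         *         t1      c       t2
t3 = t2 + d           (2)         +         t2      d       t3
a = t3                (3)         :=        t3      -       a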
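Quadruples for such a statement can be generated by a post-order walk over the expression tree. The sketch below is only an illustration: it borrows Python's own parser (the standard ast module) to build the tree, so the statement is written with = instead of :=, and the function name quadruples is hypothetical. With the usual precedence, unary minus is applied first, then *, then +.

```python
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def quadruples(statement):
    assign = ast.parse(statement).body[0]
    quads = []      # each entry is (op, arg1, arg2, result)
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def gen(node):
        # Returns the name that holds the value of this subexpression.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            arg = gen(node.operand)
            temp = new_temp()
            quads.append(("uminus", arg, "-", temp))   # "-" marks an unused field
            return temp
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = gen(node.left)
            right = gen(node.right)
            temp = new_temp()
            quads.append((OPS[type(node.op)], left, right, temp))
            return temp
        raise ValueError("unsupported expression")

    result = gen(assign.value)
    quads.append((":=", result, "-", assign.targets[0].id))
    return quads

for location, quad in enumerate(quadruples("a = -b * c + d")):
    print(f"({location})", *quad)
```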
b) Triple representation
In this representation, the use of temporary variables is avoided.
Instead, an operand refers to the location of the instruction that computes it.
A triple is a record containing three fields: op, arg1 and arg2.
Example :
a := -b * c + d
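A minimal sketch of how triples for this statement could be produced is given below. The function name triples and the nested-tuple form of the expression are assumptions made for illustration; the expression is grouped as ((-b) * c) + d, following the usual operator precedence, and locations are numbered from (0).

```python
def triples(target, expr):
    table = []   # position in the list is the location of the triple

    def gen(node):
        # A plain string is a name; a tuple is (op, operand, ...).
        if isinstance(node, str):
            return node
        op, *args = node
        refs = [gen(arg) for arg in args]
        while len(refs) < 2:
            refs.append("-")              # unused arg2 for unary operators
        table.append((op, refs[0], refs[1]))
        return f"({len(table) - 1})"      # refer to the instruction, not a temporary

    table.append((":=", target, gen(expr)))
    return table

expression = ("+", ("*", ("uminus", "b"), "c"), "d")
for location, triple in enumerate(triples("a", expression)):
    print(f"({location})", *triple)
# (0) uminus b -
# (1) * (0) c
# (2) + (1) d
# (3) := a (2)
```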
Example :
a = b * -c + b * -c
Three-address code    Location    op        arg1    arg2
t1 = -c               (1)         uminus    c       -
t2 = b * t1           (2)         *         b       (1)
t3 = -c               (3)         uminus    c       -
Address    Location
30         (1)
31         (2)
32         (3)
Code Generation
2. Target program
The target program is the output of the code generator. The output may be absolute machine
language, relocatable machine language or assembly language. Absolute machine language has
the advantage that it can be placed in a fixed memory location and executed immediately.
Relocatable machine language allows subprograms and subroutines to be compiled separately.
Relocatable object modules can be linked together and loaded by a linking loader, at the
added expense of linking and loading. Assembly language as output makes code generation
easier, since we can generate symbolic instructions and use the macro facilities of the
assembler. However, it needs an additional assembly step after code generation.
3. Memory Management
Mapping the names in the source program to the addresses of data objects is done by the front
end and the code generator. A name in the three address statements refers to the symbol table
entry for name. Then from the symbol table entry, a relative address can be determined for the
name.
4. Instruction selection
Selecting the best instructions improves the efficiency of the program. The instruction set
should be complete and uniform. Instruction speeds and machine idioms also play a major role
when efficiency is considered. But if we do not care about the efficiency of the target
program, then instruction selection is straightforward.
For example, the three-address statements below would be translated into the code sequence
that follows them:
P:=Q+R
S:=P+T
MOV Q, R0
ADD R, R0
MOV R0, P
MOV P, R0
ADD T, R0
MOV R0, S
Here the fourth statement is redundant, as it reloads the value of P that has just been stored
by the previous statement. This leads to an inefficient code sequence. A given intermediate
representation can be translated into many code sequences, with significant cost differences
between the different implementations. Prior knowledge of instruction costs is needed in order
to design good sequences, but accurate cost information is difficult to obtain.
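The redundant load described above can be removed by a simple pass over the generated code. The following Python sketch is an assumption-laden illustration rather than a complete optimizer: remove_redundant_loads is a hypothetical name, instructions are modelled as (op, source, destination) tuples, and the pass is only safe when the removed instruction is not the target of a jump.

```python
def remove_redundant_loads(code):
    result = []
    for instr in code:
        if result:
            prev = result[-1]
            # "MOV R0, P" followed by "MOV P, R0": the value is already in R0.
            if (instr[0] == "MOV" and prev[0] == "MOV"
                    and instr[1] == prev[2] and instr[2] == prev[1]):
                continue
        result.append(instr)
    return result

code = [
    ("MOV", "Q", "R0"),
    ("ADD", "R", "R0"),
    ("MOV", "R0", "P"),
    ("MOV", "P", "R0"),   # redundant reload of P
    ("ADD", "T", "R0"),
    ("MOV", "R0", "S"),
]
for op, src, dst in remove_redundant_loads(code):
    print(f"{op} {src}, {dst}")
```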
Evaluation order
The code generator decides the order in which the instructions will be executed. The order of
computations affects the efficiency of the target code. Among the many possible orders,
some require fewer registers to hold the intermediate results. However, picking the
best order in the general case is a difficult NP-complete problem.
A code generator must always generate correct code. Correctness is especially important because
of the number of special cases that a code generator might face. Some of the design goals of a code generator are:
Correct
Easily maintainable
Testable
Efficient
The various types of instructions that are supported by the target machine are
1) Load instructions
2) Store instructions
3) Computational instructions
4) Unconditional jumps
5) Conditional jumps
1. Load instructions
They are used to copy the data into the destination operand which must be a register.
Syntax :
LD destination, address
The second operand can either be a register or a memory location.
Example :
LD R1, R2
LD R1, A
2. Store instructions
It is the opposite of load instruction. It is used to copy the data into memory location specified in the
destination operand.
Syntax :
ST destination, register
Destination must be a memory location
Example :
ST A, R1
3. Arithmetic instruction
The arithmetic operations are performed using these instructions.
Syntax :
OP destination, source1, source2
Example :
ADD R0, R1, R2
SUB R0, R0, R1
MUL R2, R0, R1
The addressing modes that are supported by generalized target machine are
a) Direct addressing
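The address of the data to be accessed is directly present in the instruction, i.e., if a location is
identified by a variable name x, the value stored in that memory location can be accessed
directly using x.
Example : LD R1, x
Load the content of memory location x into register R1.
(The instruction LD R1, R2, which loads the content of register R2 into R1, uses register operands
rather than a memory address.)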
b) Indexed addressing
The data can be accessed from a memory location using an index.
Example : LD R1, A[R2]
d) Indirect addressing
The contents of data can be accessed by de-referencing using * operator.
Example :
LD R1, *(R2)
LOAD R1, @100
Load the content of memory address stored at memory address 100 to the register R1.
e) Immediate addressing
The data to be manipulated is directly present in the instruction and is preceded by #.
Example : LD R1, #100
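How a source operand is resolved under each of these addressing modes can be sketched as below. This is a hedged Python illustration of a generalized target machine, not a real instruction set: operand_value is a hypothetical name, registers and memory are modelled as dictionaries, symbols maps variable names to assumed addresses, and the @100 form is not covered.

```python
def operand_value(operand, registers, memory, symbols):
    if operand.startswith("#"):                        # immediate: #100
        return int(operand[1:])
    if operand.startswith("*(") and operand.endswith(")"):
        return memory[registers[operand[2:-1]]]        # indirect: *(R2)
    if "[" in operand and operand.endswith("]"):       # indexed: A[R2]
        base, index = operand[:-1].split("[")
        return memory[symbols[base] + registers[index]]
    if operand in registers:                           # register operand: R2
        return registers[operand]
    return memory[symbols[operand]]                    # direct: x

registers = {"R1": 0, "R2": 2}
symbols = {"x": 100, "A": 200}          # assumed addresses of x and array A
memory = {100: 7, 202: 9, 2: 5}

print(operand_value("#100", registers, memory, symbols))   # 100
print(operand_value("x", registers, memory, symbols))      # 7
print(operand_value("A[R2]", registers, memory, symbols))  # 9
print(operand_value("*(R2)", registers, memory, symbols))  # 5
```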
Questions
Example:
Circumference of Circle = (22/7) x Diameter
Here,
This technique evaluates the expression 22/7 at compile time.
The expression is then replaced with its result, approximately 3.14.
This saves the time at run time.
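Compile-time evaluation of constant subexpressions can be sketched with Python's standard ast module (ast.unparse needs Python 3.9 or later). The class ConstantFolder and the function fold are hypothetical names, and only the four basic arithmetic operators are handled; this is an illustration of the idea, not how a production compiler is organized.

```python
import ast
import operator

FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}

class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)   # fold the operands first (bottom-up)
        if (isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and type(node.op) in FOLDABLE):
            try:
                value = FOLDABLE[type(node.op)](node.left.value, node.right.value)
            except (ZeroDivisionError, TypeError):
                return node        # leave the expression for run time
            return ast.copy_location(ast.Constant(value=value), node)
        return node

def fold(source):
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

print(fold("circumference = (22 / 7) * diameter"))
print(fold("x = 2 * 3 + y"))   # x = 6 + y
```

In the first statement the subexpression 22 / 7 is replaced by its value, while diameter is left untouched because it is not a constant.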
b) Constant Propagation
In this technique,
If a variable has been assigned a constant value, the compiler replaces later uses of that
variable with the constant during compilation.
The condition is that the value of the variable must not be altered in between.
Example:
pi = 3.14, radius = 10, Area of circle = pi x radius x radius
Here,
This technique substitutes the values of the variables 'pi' and 'radius' at compile time.
It then evaluates the expression 3.14 x 10 x 10.
The expression is then replaced with its result 314.
This saves the time at run time.
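For straight-line code, constant propagation can be sketched as follows, again using Python's standard ast module (Python 3.9 or later). The function propagate and the dictionary known are hypothetical names, and the sketch assumes a sequence of simple assignments with no branches or loops. A variable is forgotten as soon as it is reassigned a non-constant value, which is the "must not be altered in between" condition.

```python
import ast

def propagate(source):
    known = {}   # variable name -> constant value currently known

    class Substitute(ast.NodeTransformer):
        def visit_Name(self, node):
            # Replace a use of a variable by its known constant value.
            if isinstance(node.ctx, ast.Load) and node.id in known:
                return ast.copy_location(ast.Constant(value=known[node.id]), node)
            return node

    tree = ast.parse(source)
    for stmt in tree.body:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise ValueError("only simple assignments are supported")
        stmt.value = Substitute().visit(stmt.value)
        target = stmt.targets[0].id
        if isinstance(stmt.value, ast.Constant):
            known[target] = stmt.value.value
        else:
            known.pop(target, None)   # value is no longer a known constant
    return ast.unparse(ast.fix_missing_locations(tree))

program = "pi = 3.14\nradius = 10\narea = pi * radius * radius"
print(propagate(program))
# pi = 3.14
# radius = 10
# area = 3.14 * 10 * 10
```

Constant folding can then evaluate the resulting expression at compile time.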
Example :
3. Code Movement
In this technique,
As the name suggests, it involves movement of the code.
Code inside a loop is moved outside the loop if its result is the same in every iteration,
i.e., it does not matter whether it is present inside or outside.
Otherwise such code is executed unnecessarily again and again with each iteration of the loop.
This leads to wastage of time at run time.
Example:
Code Before Optimization
for ( int j = 0 ; j < n ; j ++)
{
x = y + z;
a[j] = 6 x j;
}
Code After Optimization
x = y + z;
for ( int j = 0 ; j < n ; j ++)
{
a[j] = 6 x j;
}
5. Strength Reduction
In this technique,
As the name suggests, it involves reducing the strength of expressions.
This technique replaces expensive operators with simpler and cheaper ones.
Example :
Code Before Optimization : B = A x 2
Code After Optimization : B = A + A
Here,
The expression “A x 2” is replaced with the expression “A + A”.
This is because the cost of the multiplication operator is higher than that of the addition
operator.