unit3part1

The document covers compiler design, specifically focusing on top-down parsing, semantic analysis, and intermediate code generation. It discusses various parsing techniques, semantic rules, and types of intermediate representations, including three-address code and its structures like quadruples and triples. Additionally, it highlights the importance of machine-independent intermediate code for optimization and portability in compilers.

Uploaded by

nirajdhanore04
Copyright: © All Rights Reserved

COMPILER DESIGN :

UNIT III
Question

2) Which one of the following is a top-down parser?

A. Recursive descent parser.
B. Operator precedence parser.
C. An LR(k) parser.
D. An LALR(k) parser.

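Answer: Option A
Recursive descent parsing is a top-down parsing technique; its predictive
(non-backtracking) form corresponds to LL(1) parsing. The other three
options are bottom-up parsers.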
Question

For the grammar below, a partial LL(1) parsing table is also presented
along with the grammar. Entries that need to be filled are indicated as
E1, E2, and E3. ε is the empty string, $ indicates end of input,
and | separates alternate right-hand sides of productions.
Answer: (A)

For the grammar in the above question:

FIRST(S) = { a, b, ε }
FIRST(A) = FIRST(S) = { a, b, ε }
FIRST(B) = FIRST(S) = { a, b, ε }
FOLLOW(A) = { a, b }
FOLLOW(S) = { $ } ∪ FOLLOW(A) = { a, b, $ }
FOLLOW(B) = FOLLOW(S) = { a, b, $ }
Question

5. Consider the grammar given below:
S → Aa
A → BD
B → b | ε
D → d | ε
Let a, b, d, and $ be indexed as follows:

Symbol   a   b   d   $
Index    3   2   1   0

Compute the FOLLOW set of the non-terminal B and write the index
values for the symbols in the FOLLOW set in descending order.
(For example, if the FOLLOW set is {a, b, d, $}, then the answer
should be 3210.)
(GATE CSE 2019)
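Answer
• FOLLOW(B) = (FIRST(D) − {ε}) ∪ FOLLOW(A), from the production A → BD.
  [D can derive ε, so everything in FOLLOW(A) can also follow B.]
• FIRST(D) = {d, ε}, and FOLLOW(A) = {a} from the production S → Aa.

∴ FOLLOW(B) = {d} ∪ {a} = {a, d}

As a = 3 and d = 1, the answer in descending order is 31.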
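
The FOLLOW computation above can be checked mechanically. The following is a minimal sketch of the usual fixed-point computation of FIRST and FOLLOW, applied to the grammar of this question. The function names, the dictionary encoding of the grammar and the INDEX table (a = 3, b = 2, d = 1, $ = 0, as implied by the worked example) are assumptions made for illustration.

```python
EPS = "ε"

# Grammar from the question: S → Aa, A → BD, B → b | ε, D → d | ε
GRAMMAR = {
    "S": [["A", "a"]],
    "A": [["B", "D"]],
    "B": [["b"], [EPS]],
    "D": [["d"], [EPS]],
}
START = "S"
# Index values implied by the question's example (assumed here).
INDEX = {"a": 3, "b": 2, "d": 1, "$": 0}


def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each non-terminal."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol in the string can derive ε
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                new = first_of_sequence(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue  # FOLLOW is defined for non-terminals only
                    rest = first_of_sequence(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # the tail can vanish: add FOLLOW(lhs)
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def index_answer(symbols):
    """Write the index values of a set of symbols in descending order."""
    return "".join(str(i) for i in sorted((INDEX[s] for s in symbols), reverse=True))


if __name__ == "__main__":
    follow = compute_follow(compute_first())
    print(sorted(follow["B"]), index_answer(follow["B"]))
```

Running the sketch reports FOLLOW(B) = {a, d} and the answer string 31, in agreement with the hand computation.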


Question
A canonical set of items is given below:
S → L. > R
Q → R.
On input symbol < the set has? (GATE Question)

A. a shift-reduce conflict and a reduce-reduce conflict.
B. a shift-reduce conflict but not a reduce-reduce conflict.
C. a reduce-reduce conflict but not a shift-reduce conflict.
D. neither a shift-reduce nor a reduce-reduce conflict.

Answer: Option D
Explanation
• The question asks about the symbol '<', which does not appear in
the given canonical set of items.
• Hence there is neither a shift-reduce conflict nor a reduce-reduce
conflict on the symbol '<'.
• So option D is the correct choice.
• Had the question asked about the symbol '>', there would have been
a shift-reduce conflict: the item S → L. > R calls for a shift on '>',
while the item Q → R. calls for a reduction.
UNIT III
INTERMEDIATE CODE GENERATION USING SYNTAX DIRECTED TRANSLATION SCHEMES (SDTS)
Objectives:
• To introduce the semantic analysis phase of the compiler
• To discuss syntax-directed definition and translation
• To determine the semantic rules and their evaluation order
• To discuss construction of the syntax tree and DAG
• To discuss types, type expressions, the type system and the type checker
Semantic Analysis
• The compiler should check both the syntactic and the semantic conventions of the source language. The semantics of the source code are verified after the syntax is checked.
• Semantic analysis validates the meaning of the code by checking that the sequence of tokens:
  • is meaningful and correct
  • is associated with the correct type
  • is consistent and correct in the way control structures and data types are used.
• The semantics of the language are validated with the help of semantic rules.
• Semantic analysis is not always a separate phase of the compiler; it is usually combined with other phases, such as parsing.
Semantic Analysis (contd.)
• Semantic rules can be attached to the grammar to perform type checking.
• Semantic analysis typically involves type checking, label checking, uniqueness checking, name-related checking and control-flow checks.
• Semantic rules are a collection of procedures called at appropriate times by the parser as the grammar requires.
• Semantic rules can be applied to the grammar by attaching attributes to the CFG.
• CFG + semantic rules → attribute grammar
Attributes
• Attribute grammar: An attribute grammar is a special form of CFG in which additional information, called attributes, is attached to one or more non-terminals for performing semantic analysis, intermediate code generation, or both.

  E → E1 + T   { E.value = E1.value + T.value }

• Based on the way the attributes obtain their values, they are divided into two categories:
  • Synthesized: obtain their values from child nodes
  • Inherited: obtain their values from the parent or siblings
Synthesized attributes
• Attributes that obtain their values from the attribute values of their child nodes.

Example: 5+3

Production     Semantic Rules
L → E          print(E.val)
E → E1 + T     E.val = E1.val + T.val
E → T          E.val = T.val
T → T1 * F     T.val = T1.val * F.val
T → F          T.val = F.val
F → digit      F.val = digit.lexval
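
The semantic rules in the table can be exercised on the example input 5+3. The sketch below is only an illustration: it writes the parse tree by hand as nested tuples (an assumed encoding), and evaluates it in post-order so that every node's val is computed from the val of its children, which is exactly what makes the attribute synthesized. The rule for L → E returns E.val instead of printing it.

```python
# Parse tree for the input 5+3, written as (production, child, child, ...).
# Leaves are ("digit", lexval). This tuple encoding is an assumption.
TREE = (
    "L -> E",
    (
        "E -> E1 + T",
        ("E -> T", ("T -> F", ("F -> digit", ("digit", 5)))),
        ("T -> F", ("F -> digit", ("digit", 3))),
    ),
)

# One semantic rule per production, taking the children's values as arguments.
RULES = {
    "L -> E": lambda e: e,                 # print(E.val) in the table
    "E -> E1 + T": lambda e1, t: e1 + t,   # E.val = E1.val + T.val
    "E -> T": lambda t: t,                 # E.val = T.val
    "T -> T1 * F": lambda t1, f: t1 * f,   # T.val = T1.val * F.val
    "T -> F": lambda f: f,                 # T.val = F.val
    "F -> digit": lambda lexval: lexval,   # F.val = digit.lexval
}


def val(node):
    """Post-order evaluation: children first, then the node's own rule."""
    label, *children = node
    if label == "digit":
        return children[0]  # lexval supplied by the lexer
    child_vals = [val(child) for child in children]
    return RULES[label](*child_vals)


if __name__ == "__main__":
    print(val(TREE))
```

Because each rule reads only the values of children, a single bottom-up pass over the tree is enough; no value ever has to flow down from a parent.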
Inherited attributes
• Inherited attributes take their values from the parent and/or siblings.
→ An example will be taken up later for type checking.

D → integer id   { Enter(id.type = integer);
                   D.attr = integer; }
D → real id      { Enter(id.type = real);
                   D.attr = real; }
D → D1 , id      { Enter(id.place = D1.attr);
                   D.attr = D1.attr; }
Synthesized Attribute:
• The value of a synthesized attribute at a node is computed from the values of the attributes at the children of that node in the parse tree.
• In the production E → E1 + T, the parent node E obtains its value from its child nodes E1 and T.
• In the parse tree, these attributes are passed up the tree.
Inherited Attribute:
• The value of an inherited attribute at a node is computed from the values of the attributes at the siblings and the parent of that node in the parse tree.
• In the production S → ABC, A can take values from S, B and C; B can take values from S, A and C; likewise C can take values from S, A and B.
• In the parse tree, these attributes are passed down the tree.
Syntax-directed Translation Scheme
• A syntax-directed translation scheme associates semantic rules with productions.
Introduction to Intermediate Code Generation

• Many compilers convert the source code to an intermediate representation.
• Intermediate code is the interface between the front end and the back end of a compiler.
• Ideally, the details of the source language are confined to the front end and the details of the target machine to the back end.
• The benefits of using machine-independent intermediate code are as follows:
  • It reduces the number of optimizers and code generators.
  • It is easy to generate and to translate into the target program.
  • It enhances portability.
  • It is easier to optimize than machine-dependent code.
• Intermediate code can also be executed directly by a program, which is referred to as an interpreter.
• Intermediate code can be either language specific (e.g., bytecode for Java) or language independent (three-address code).
Introduction to Intermediate Code Generation (contd.)

• Intermediate representations span the gap between the source and target languages:
  • closer to the target language;
  • (more or less) machine independent;
  • allow many optimizations to be done in a machine-independent way.
• Implementable via syntax-directed translation, so they can be folded into the parsing process.
Benefits of machine-independent intermediate code
• It is easy to generate and to translate into the target program.
• It enhances portability. If a compiler translates the source language directly into the target machine language, with no option to generate intermediate code, then a full native compiler is required for each new machine, because the compiler has to be modified according to each machine's specification.
• It is easier to optimize than machine-dependent code.
Intermediate Code Representation

• Intermediate code is machine-independent code that is closer to machine instructions than the source program is.
• The designer of the compiler determines the intermediate language.
• The intermediate representation can be selected based on the following criteria:
⮚ It should be easy to translate the source code into the intermediate representation.
⮚ It should be easy to translate the intermediate representation into machine code.
⮚ The intermediate representation should be suitable for optimization.
⮚ It should be neither too high level nor too low level.
Types of Intermediate Languages
• High-level representations (e.g., syntax trees):
  • closer to the source language
  • easy to generate from an input program
  • code optimizations may not be straightforward
• Low-level representations (e.g., 3-address code):
  • closer to the target machine
  • easier for optimizations and final code generation
Forms of Intermediate representation

• Graphical representation
• Linear representation
Graphical representation:
• Syntax tree: depicts the hierarchical structure of the source program
• DAG: gives the same information in a more compact way

[Figure: parse tree and syntax tree for the input id + id * id]
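
To make the "more compact" claim concrete, here is a small sketch that builds both a syntax tree and a DAG for the expression b * -c + b * -c, an expression with a repeated subexpression chosen for illustration. The NodeTable class and its share flag are hypothetical names; the DAG variant simply returns the existing node whenever the same operator is applied to the same children again.

```python
class NodeTable:
    """Stores expression nodes; with share=True identical nodes are reused."""

    def __init__(self, share):
        self.share = share
        self.nodes = []   # node id -> (op, children or leaf name)
        self._seen = {}   # (op, args) -> node id, used only when sharing

    def node(self, op, *args):
        key = (op, args)
        if self.share and key in self._seen:
            return self._seen[key]  # reuse the common subexpression
        self.nodes.append(key)
        node_id = len(self.nodes) - 1
        self._seen[key] = node_id
        return node_id


def build(table):
    """Build b * -c + b * -c and return the id of the root node."""
    left = table.node(
        "*", table.node("id", "b"), table.node("uminus", table.node("id", "c"))
    )
    right = table.node(
        "*", table.node("id", "b"), table.node("uminus", table.node("id", "c"))
    )
    return table.node("+", left, right)


if __name__ == "__main__":
    tree, dag = NodeTable(share=False), NodeTable(share=True)
    build(tree)
    build(dag)
    print(len(tree.nodes), "tree nodes;", len(dag.nodes), "DAG nodes")
```

For this expression the tree needs nine nodes while the DAG needs five, because b, c, -c and b * -c are each stored once and the root's two children are the same node.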


Linear representation:
• Postfix notation
• Three-address code: TAC instructions are of the form
  x = y op z,
  where x, y, z are names, constants or compiler-generated temporaries, and op is any operator.
Consider the statement
x = a * b + c
It can be written as
T1=a*b
T2=T1+c
x=T2
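
The translation of x = a * b + c into T1, T2 and a final copy can be reproduced with a short recursive walk over the expression. The sketch below is an illustration only: it borrows Python's own ast module to parse the statement (an assumption made to keep the example short, since the slides do not prescribe a parser), and the helper names to_tac, gen and new_temp are hypothetical.

```python
import ast
import itertools

# Binary operators supported by this sketch.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_tac(statement):
    """Translate one assignment such as 'x = a * b + c' into TAC lines."""
    stmt = ast.parse(statement).body[0]
    if (
        not isinstance(stmt, ast.Assign)
        or len(stmt.targets) != 1
        or not isinstance(stmt.targets[0], ast.Name)
    ):
        raise ValueError("expected a simple assignment to one name")

    code = []
    counter = itertools.count(1)

    def new_temp():
        return f"T{next(counter)}"

    def gen(node):
        """Emit code for node and return the name that holds its value."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = gen(node.left)
            right = gen(node.right)
            temp = new_temp()
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            operand = gen(node.operand)
            temp = new_temp()
            code.append(f"{temp} = -{operand}")
            return temp
        raise ValueError("unsupported expression")

    value = gen(stmt.value)
    code.append(f"{stmt.targets[0].id} = {value}")
    return code


if __name__ == "__main__":
    print("\n".join(to_tac("x = a * b + c")))
```

Each operator node produces exactly one instruction with a fresh temporary, which is why no instruction ever has more than one operator on its right-hand side.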
Three Address Code

• At most one operator is allowed on the RHS, so there are no "built-up" expressions. Instead, expressions are computed using temporaries (compiler-generated variables).
• Source:
if ( x + y*z > x*y + z)
a = 0;
• Three Address Code:
tmp1 = y*z
tmp2 = x+tmp1 // x + y*z
tmp3 = x*y
tmp4 = tmp3+z // x*y + z
if (tmp2 > tmp4) goto L
goto L1
L: a = 0
L1:
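
Since intermediate code can be executed directly by an interpreter, the translation of the if statement can be tested by running it. The following toy interpreter is a sketch under stated assumptions: it accepts only the instruction shapes used in this example, the textual syntax (spaces between tokens, no parentheses around the condition) is simplified for easy splitting, and the branch uses >, the sense that matches the source condition, so that a = 0 runs exactly when x + y*z > x*y + z.

```python
import operator
import re

PROGRAM = """
tmp1 = y * z
tmp2 = x + tmp1
tmp3 = x * y
tmp4 = tmp3 + z
if tmp2 > tmp4 goto L
goto L1
L: a = 0
L1:
"""

BINOPS = {"+": operator.add, "*": operator.mul}
RELOPS = {">": operator.gt, "<=": operator.le}


def run(program, env):
    """Execute the TAC program, updating and returning the variable map env."""
    labels, instrs = {}, []
    for line in program.strip().splitlines():
        line = line.strip()
        match = re.match(r"(\w+):\s*(.*)", line)
        if match:  # 'L: instr' or a bare 'L:' marks the next instruction
            labels[match.group(1)] = len(instrs)
            line = match.group(2)
        if line:
            instrs.append(line.split())

    def value(token):
        return int(token) if token.isdigit() else env[token]

    pc = 0
    while pc < len(instrs):
        ins = instrs[pc]
        pc += 1
        if ins[0] == "goto":                      # goto L
            pc = labels[ins[1]]
        elif ins[0] == "if":                      # if x relop y goto L
            if RELOPS[ins[2]](value(ins[1]), value(ins[3])):
                pc = labels[ins[5]]
        elif len(ins) == 3:                       # x = y
            env[ins[0]] = value(ins[2])
        else:                                     # x = y op z
            env[ins[0]] = BINOPS[ins[3]](value(ins[2]), value(ins[4]))
    return env


if __name__ == "__main__":
    print(run(PROGRAM, {"x": 1, "y": 2, "z": 3, "a": 9})["a"])
```

With x = 1, y = 2, z = 3 the condition 7 > 5 holds and a is set to 0; with x = 5, y = 1, z = 1 both sides equal 6 and a keeps its old value.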
Intermediate Code Representation
• Graphical representations can be parse trees, abstract syntax trees, DAGs, etc.
• Linear representations are non-graphical, like three-address code (TAC), static single assignment (SSA), etc.
• Representations of TAC:
  • Quadruples
  • Triples
  • Indirect triples
3 Address Instruction Set
• Assignment:
  • x = y op z (op binary)
  • x = op y (op unary)
  • x = y
• Jumps:
  • if ( x op y ) goto L (L a label)
  • goto L
• Pointer and indexed assignments:
  • x = y[ z ]
  • y[ z ] = x
  • x = &y
  • x = *y
  • *y = x
• Procedure call/return:
  • param x, k (x is the kth param)
  • call p
  • return
• Type conversion:
  • x = cvt_A_to_B y (A, B base types), e.g. cvt_int_to_float
• Miscellaneous:
  • label L
Data structures for three address codes

• Quadruples:
  • A quadruple has four fields: op, arg1, arg2 and result.
  • It uses temporaries to store intermediate results.
Advantages:
• It is easy to rearrange code for global optimization.
• The value of a temporary variable can be accessed quickly through the symbol table.
Disadvantages:
• It contains a lot of temporaries.
• Creating temporary variables increases time and space complexity.
Data structures for three address codes (contd.)

• Triples:
  • Temporaries are not used; instead, an operand refers to the instruction (another triple) that computes the value.
  • A triple consists of three fields: op, arg1 and arg2. There is no result field.
Disadvantages:
• Temporaries are implicit, so it is difficult to rearrange code.
• It is difficult to optimize because optimization involves moving intermediate code. When a triple is moved, every other triple referring to it must be updated as well.
Data structures for three address codes (contd.)

• Indirect triples:
  • In addition to the triples, we keep a list of pointers to triples in the desired execution order.
  • The triples themselves are stored separately, and the pointer list records the order in which the computations are performed. This representation is similar in utility to quadruples but requires less space. Temporaries are implicit, and the code is easier to rearrange because only the pointer list has to be reordered.
Intermediate Code Representation - Example
Represent the expression a = (b + c) * −c in quadruple, triple and indirect triple representation.
• TAC:
T1 = b + c
T2 = −c
T3 = T1 * T2
a = T3
Intermediate Code Representation - Example (Contd.)
Quadruple
      op   x (operand1)   y (operand2)   z (result)
(1)   +    b              c              T1
(2)   -    c                             T2
(3)   *    T1             T2             T3
(4)   =    T3                            a
Intermediate Code Representation - Example (Contd.)
Triple
      op   x (operand1)   y (operand2)
(1)   +    b              c
(2)   -    c
(3)   *    (1)            (2)
(4)   =    a              (3)
Intermediate Code Representation - Example (Contd.)
Indirect Triple
Pointer to triple         op   x (operand1)   y (operand2)
(1)                 (1)   +    b              c
(2)                 (2)   -    c
(3)                 (3)   *    (1)            (2)
(4)                 (4)   =    a              (3)
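
The three representations of a = (b + c) * −c are closely related, and one can be derived from another. The sketch below starts from the quadruples and derives the triples by replacing every temporary with a reference to the triple that computes it. The names Quad and quads_to_triples, the use of "uminus" to distinguish unary minus from subtraction, and the string form "(n)" for references are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]
    result: str


# Quadruples for a = (b + c) * -c
QUADS = [
    Quad("+", "b", "c", "T1"),
    Quad("uminus", "c", None, "T2"),
    Quad("*", "T1", "T2", "T3"),
    Quad("=", "T3", None, "a"),
]


def quads_to_triples(quads):
    """Replace each temporary by a reference to the triple that computes it."""
    defined_at = {}  # temporary name -> 1-based number of its triple
    triples = []

    def ref(arg):
        return f"({defined_at[arg]})" if arg in defined_at else arg

    for quad in quads:
        if quad.op == "=":
            # Copy statement: the target is arg1 and the value is arg2.
            triples.append(("=", quad.result, ref(quad.arg1)))
        else:
            triples.append((quad.op, ref(quad.arg1), ref(quad.arg2)))
            defined_at[quad.result] = len(triples)
    return triples


if __name__ == "__main__":
    triples = quads_to_triples(QUADS)
    # Indirect triples: a separate list of pointers gives the execution order.
    statements = list(range(1, len(triples) + 1))
    # Triples (1) and (2) are independent, so they may be swapped by
    # reordering the pointer list only; the triples stay untouched.
    statements[0], statements[1] = statements[1], statements[0]
    for number in statements:
        print(number, triples[number - 1])
```

The final loop shows the point of the indirection: changing the execution order touches only the pointer list, so references such as (1) and (2) inside the triples remain valid.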
Example
• a = b * -c + b * -c

Three address code
t1 = - c
t2 = b * t1
t3 = - c
t4 = b * t3
t5 = t2 + t4
a = t5
Quadruples
     op   arg1   arg2   result
     -    c             t1
     *    b      t1     t2
     -    c             t3
     *    b      t3     t4
     +    t2     t4     t5
     =    t5            a

Triples
     op   arg1   arg2
0    -    c
1    *    b      (0)
2    -    c
3    *    b      (2)
4    +    (1)    (3)
5    =    a      (4)

Indirect Triples
     op               op   arg1   arg2
35   (0)         0    -    c
36   (1)         1    *    b      (0)
37   (2)         2    -    c
38   (3)         3    *    b      (2)
39   (4)         4    +    (1)    (3)
40   (5)         5    =    a      (4)
Home work
• Generate the three-address code for the given arithmetic expression by constructing the annotated parse tree. Also represent the TAC as quadruples, triples and indirect triples.
- ( a + b ) * ( c + d ) - ( a + b + c )
Syntax-directed Translation into Three-address Code: Principle
• To translate any construct of a programming language, its syntactic structure must be specified.
• Semantic actions should be defined in the production rules of the grammar.
• The syntax-directed translation (SDT) scheme is used to generate the TAC.
Syntax-directed translation scheme to convert infix to postfix
Grammar        Semantic rule
E → E1 + T     E.string = E1.string || T.string || '+'
E → T          E.string = T.string
T → T1 * F     T.string = T1.string || F.string || '*'
T → F          T.string = F.string
F → (E)        F.string = E.string
F → num        F.string = num.string

Annotated parse tree for the input string 2*3+6*5
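
The string attribute of the scheme can be computed for the input 2*3+6*5 with a small parser. This is a sketch under stated assumptions: it is a recursive descent parser in which the left-recursive productions E → E1 + T and T → T1 * F are handled by loops, the function names are hypothetical, and the pieces of the postfix string are joined with spaces (rather than plain concatenation) so that multi-digit numbers stay readable.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[()+*])")


def tokenize(text):
    """Split the input into numbers, parentheses, '+' and '*'."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def to_postfix(text):
    """Return E.string for the input, with items separated by spaces."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_f():
        tok = peek()
        if tok == "(":                      # F -> (E)   F.string = E.string
            advance()
            string = parse_e()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return string
        if tok is not None and tok.isdigit():
            advance()                       # F -> num   F.string = num.string
            return [tok]
        raise SyntaxError("expected a number or '('")

    def parse_t():
        string = parse_f()                  # T -> F
        while peek() == "*":                # T -> T1 * F
            advance()
            string = string + parse_f() + ["*"]
        return string

    def parse_e():
        string = parse_t()                  # E -> T
        while peek() == "+":                # E -> E1 + T
            advance()
            string = string + parse_t() + ["+"]
        return string

    result = parse_e()
    if peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return " ".join(result)


if __name__ == "__main__":
    print(to_postfix("2*3+6*5"))
```

For 2*3+6*5 the sketch yields 2 3 * 6 5 * +, which is the value of E.string at the root of the annotated parse tree.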


Thank You

Happy Learning !
