Compiler Design Quantum
1
UNIT
Introduction to
Compiler
CONTENTS
Part-1 : Introduction to Compiler : ....................... 1–2C to 1–6C
Phases and Passes
PART-1
Introduction to Compiler, Phases and Passes.
Questions-Answers
Answer
A compiler contains 6 phases which are as follows :
i. Phase 1 (Lexical analyzer) :
a. The lexical analyzer is also called scanner.
b. The lexical analyzer phase takes source program as an input and
separates characters of source language into groups of strings
called token.
c. These tokens may be keywords, identifiers, operator symbols and
punctuation symbols.
ii. Phase 2 (Syntax analyzer) :
a. The syntax analyzer phase is also called parsing phase.
b. The syntax analyzer groups tokens together into syntactic
structures.
c. The output of this phase is parse tree.
iii. Phase 3 (Semantic analyzer) :
a. The semantic analyzer phase checks the source program for
semantic errors and gathers type information for subsequent code
generation phase.
b. It uses parse tree and symbol table to check whether the given
program is semantically consistent with language definition.
c. The output of this phase is annotated syntax tree.
iv. Phase 4 (Intermediate code generation) :
a. The intermediate code generation takes syntax tree as an input
from semantic phase and generates intermediate code.
b. It generates variety of code such as three address code, quadruple,
triple.
v. Phase 5 (Code optimization) : The code optimization phase improves the intermediate code so that a faster-running (or shorter) target program can ultimately be produced.
vi. Phase 6 (Code generation) : The code generation phase takes the optimized intermediate code as input and produces the target program.
[Fig. 1.1.1 : Phases of compiler — source program → lexical analyzer → syntax analyzer → semantic analyzer → intermediate code generator → code optimizer → code generator → target program]
The working of the phases on the statement id1 = (id2 + id3) * (id2 + id3) * 2 (i.e., a = (b + c) * (b + c) * 2) :
Lexical analyzer : produces the token stream id1 = (id2 + id3) * (id2 + id3) * 2.
Syntax analyzer : builds the parse tree for the expression.
Semantic analyzer : produces the annotated syntax tree, in which the integer constant 2 is converted through int_to_real.
Intermediate code generation :
t1 = b + c
t2 = t1 * t1
t3 = int_to_real (2)
t4 = t2 * t3
id1 = t4
Code optimization (optimized code) :
t1 = b + c
t2 = t1 * t1
id1 = t2 * 2
Code generation (machine code) :
MOV R1, b
ADD R1, R1, c
MUL R2, R1, R1
MUL R2, R1, # 2.0
ST id1, R2
Answer
Types of passes :
1. Single-pass compiler :
a. In a single-pass compiler, when a line source is processed it is
scanned and the tokens are extracted.
b. Then the syntax of the line is analyzed and the tree structure,
some tables containing information about each token are built.
2. Multi-pass compiler : A multi-pass compiler scans the input source
once and produces a first modified form, then scans the modified form
and produces a second modified form, and so on, until the object form is
produced.
Answer
Role of compiler writing tools :
1. Compiler writing tools are used for automatic design of compiler
components.
2. Every tool uses a specialized language.
3. Writing tools are also used as debuggers and version managers.
PART-2
Bootstrapping.
Questions-Answers
Answer
Cross compiler : A cross compiler is a compiler capable of creating executable
code for a platform other than the one on which the compiler is running.
Bootstrapping :
1. Bootstrapping is the process of writing a compiler (or assembler) in the
source programming language that it intends to compile.
2. Bootstrapping leads to a self-hosting compiler.
3. An initial minimal core version of the compiler is generated in a different
language.
The T-diagram shown in Fig. 1.4.1 is also used to depict the same
compiler :
[Fig. 1.4.1 : T-diagram of a compiler with source language S and target language T]
6. To create a new language, L, for machine A :
a. Create C(S → A in A), a compiler for a subset, S, of the desired language, L,
using language A, which runs on machine A. (Language A may be
assembly language.)
[Fig. 1.4.2 : T-diagram with source S, target A, written in A]
b. Create C(L → A in S), a compiler for language L written in the subset S of L.
[Fig. 1.4.3 : T-diagram with source L, target A, written in S]
c. Compiling C(L → A in S) with C(S → A in A) yields C(L → A in A), a compiler for L that runs directly on machine A.
[Fig. 1.4.4 : Composition of the two T-diagrams giving the self-hosted compiler]
PART-3
Finite State Machines and Regular Expressions and their
Application to Lexical Analysis, Optimization of DFA
Based Pattern Matchers.
Questions-Answers
Answer
1. Regular expression is a formula in a special language that is used for
specifying simple classes of strings.
2. A string is a sequence of symbols; for the purpose of most text-based
search techniques, a string is any sequence of alphanumeric characters
(letters, numbers, spaces, tabs, and punctuation).
Formal recursive definition of regular expression :
Formally, a regular expression is an algebraic notation for characterizing a
set of strings.
1. Any terminal, i.e., any symbol belonging to Σ, is a regular expression.
The null string (ε) and the null set (φ) are also regular expressions.
2. If P and Q are two regular expressions then the union of the two
regular expressions, denoted by P + Q is also a regular expression.
3. If P and Q are two regular expressions then their concatenation denoted
by PQ is also a regular expression.
4. If P is a regular expression then the iteration (repetition or closure)
denoted by P* is also a regular expression.
5. If P is a regular expression then (P) is also a regular expression.
6. The expressions obtained by repeated application of the rules (1) to (5)
over Σ are also regular expressions.
Que 1.6. Define and differentiate between DFA and NFA with an
example.
Answer
DFA :
1. A finite automata is said to be deterministic if we have only one transition
on the same input symbol from some state.
2. A DFA is a set of five tuples and represented as :
M = (Q, Σ, δ, q0, F)
where, Q = A set of non-empty finite states
Σ = A set of non-empty finite input symbols
q0 = Initial state of DFA
F = A non-empty finite set of final states
δ = A transition function, δ : Q × Σ → Q
NFA :
1. A finite automata is said to be non-deterministic, if we have more than
one possible transition on the same input symbol from some state.
2. A non-deterministic finite automata is a set of five tuples and represented
as :
M = (Q, Σ, δ, q0, F)
where, Q = A set of non-empty finite states
Σ = A set of non-empty finite input symbols
q0 = Initial state of NFA and member of Q
F = A non-empty finite set of final states and subset of Q
δ = A transition function, δ : Q × Σ → 2^Q ; i.e., a move
δ(q, a) may lead to a set of states {q1, q2, ....., qn}
[Fig. 1.6.1 : A move δ(q0, a) leading to any of the states q1, q2, ....., qn]
Example : DFA for the language that contains the strings ending with
0 over Σ = {0, 1} :
[Fig. 1.6.2 : DFA with start state q0 and final state qf — q0 on 0 goes to qf and loops on 1 ; qf loops on 0 and returns to q0 on 1]
NFA for the language L which accepts all the strings in which the third
symbol from the right end is always a over Σ = {a, b} :
[Fig. 1.6.3 : NFA with states q0, q1, q2, q3 — q0 loops on a, b ; q0 on a goes to q1 ; q1 on a, b goes to q2 ; q2 on a, b goes to the final state q3]
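Simulating a DFA is a direct table lookup : keep one current state and index a transition table with each input symbol. The following minimal C sketch (an assumed example, not from the text) runs the DFA of Fig. 1.6.2 :
#include <stdio.h>
#include <string.h>

/* state 0 = q0 (start), state 1 = qf (final) */
int accepts(const char *w)
{
    /* delta[state][symbol] : column 0 for input '0', column 1 for input '1' */
    int delta[2][2] = { {1, 0},    /* q0 --0--> qf, q0 --1--> q0 */
                        {1, 0} };  /* qf --0--> qf, qf --1--> q0 */
    int state = 0;
    for (size_t i = 0; i < strlen(w); i++)
        state = delta[state][w[i] - '0'];
    return state == 1;             /* accept iff we end in qf */
}

int main(void)
{
    printf("%d %d %d\n", accepts("10"), accepts("0"), accepts("01")); /* 1 1 0 */
    return 0;
}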
Answer
Thompson’s construction :
1. It is an algorithm for transforming a regular expression to equivalent
NFA.
2. Following rules are defined for a regular expression as a basis for the
construction :
i. The NFA representing the empty string (ε) is :
[state 0 --ε--> state 1]
ii. If the regular expression is a single symbol, say a, the NFA is :
[state 0 --a--> state 1]
iii. The union a|b is represented by a choice of transitions from a new
start state, the two branches (labelled a and b) joining again at a new
accepting state.
iv. Concatenation simply involves connecting one NFA to the other;
thus ab can be represented as :
[state 0 --a--> state 1 --b--> state 2]
v. The Kleene closure must allow for taking zero or more instances
of the letter from the input; thus a* looks like :
[states 0, 1, 2, 3 with 1 --a--> 2, ε-moves 0 → 1, 2 → 3, 0 → 3 and a back ε-move 2 → 1]
For example :
Construct NFA for r = (a|b)*a
For r1 = a : [start state 2 --a--> state 3]
For r2 = b : [start state 4 --b--> state 5]
For r3 = a|b : [new start state 1 with ε-moves to states 2 and 4 ; states 3 and 5 have ε-moves to the new accepting state 6]
The NFA for r4 = (r3)* : [new start state 0 and new accepting state 7, with ε-moves 0 → 1, 6 → 7, 0 → 7 and 6 → 1 around the NFA for r3]
Finally, NFA for r5 = r4·r1 = (a|b)*a : [the NFA for r4, states 0, 1, ..., 7, followed by an a-transition from state 7 to the final state 8]
Que 1.8. Construct the NFA for the regular expression a|abb|a*b+
by using Thompson’s construction methodology.
AKTU 2017-18, Marks 10
Answer
Given regular expression : a + abb + a*b+
Step 1 : [q1 --(a + abb + a*b+)--> qf]
Step 2 : [the single edge is split into three parallel paths from q1 to qf : one labelled a ; one through q2 and q3 labelled a, b, b ; and one labelled a*b+]
Step 3 : [final NFA : path q1 --a--> qf ; path q1 --a--> q2 --b--> q3 --b--> qf ; and a path for a*b+ through states q4, q5, q6, with a loop for a* and a loop for b+]
Answer
Step 1 : NFA for a : [state --a--> state]
Step 2 : NFA for b* : [closure construction around a b-transition]
Step 3 : NFA for b : [state --b--> state]
Step 4 : NFA for ab* : [concatenation of Steps 1 and 2]
Step 5 : NFA for ab : [concatenation of Steps 1 and 3]
Step 6 : NFA for ab*|ab : union of Steps 4 and 5 :
[Fig. 1.9.1 : NFA of ab*|ab — start state 1 with ε-moves to the upper path 2 --a--> 3, states 4, 5 for b*, and the lower path 6 --a--> 7 --b--> 8 → 9, both paths joining the accepting state 10]
Que 1.10. Discuss conversion of NFA into a DFA. Also give the
Answer
Conversion from NFA to DFA :
Suppose there is an NFA N = (Q, Σ, δ, q0, F) which recognizes a language L.
Then the DFA D = (Q′, Σ, δ′, q0′, F′) can be constructed for language L as :
Step 1 : Initially Q′ = φ.
Step 2 : Add q0 to Q′.
Step 3 : For each state in Q′, find the possible set of states for each input
symbol using the transition function of the NFA. If this set of states is not
in Q′, add it to Q′.
Step 4 : Final state of DFA will be all states which contain F (final states of
NFA).
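A compact C sketch of steps 1 to 4 (an assumed example for an NFA without ε-moves; sets of NFA states are encoded as bitmasks). The NFA hard-coded below is the one obtained in Que 1.11 after removing ε (q1 = bit 0, q3 = bit 1, q4 = bit 2, qf = bit 3) :
#include <stdio.h>

#define NSTATES 4
/* nfa[q][a] = bitmask of states reachable from q on input symbol a (0 or 1) */
unsigned nfa[NSTATES][2] = {
    {0x3, 0x3},   /* q1 : on 0 -> {q1, q3}, on 1 -> {q1, q3} */
    {0x0, 0x4},   /* q3 : on 1 -> {q4} */
    {0x8, 0x0},   /* q4 : on 0 -> {qf} */
    {0x0, 0x0}    /* qf : no moves */
};

int main(void)
{
    unsigned dstates[16];
    int n = 0, done = 0;
    dstates[n++] = 0x1;                         /* Step 2 : start subset {q1} */
    while (done < n) {                          /* Step 3 : process each subset */
        unsigned S = dstates[done++];
        for (int a = 0; a <= 1; a++) {
            unsigned T = 0;
            for (int q = 0; q < NSTATES; q++)
                if (S & (1u << q)) T |= nfa[q][a];
            int known = 0;
            for (int i = 0; i < n; i++) if (dstates[i] == T) known = 1;
            if (!known && T != 0) dstates[n++] = T;   /* new DFA state found */
            printf("delta(0x%X, %d) = 0x%X%s\n", S, a, T,
                   (T & 0x8) ? "  [final : contains qf]" : "");
        }
    }
    return 0;
}
Running it discovers exactly the subsets A = {q1}, B = {q1, q3}, C = {q1, q3, q4} and D = {q1, q3, qf} of Que 1.11 below.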
Que 1.11. Construct the minimized DFA for the regular expression
Answer
Given regular expression : (0 + 1)*(0 + 1)10
NFA for the given regular expression :
[The NFA is built stepwise from q1 --(0 + 1)*(0 + 1)10--> qf : q1 --(0 + 1)*--> q2 --(0 + 1)--> q3 --1--> q4 --0--> qf, with the (0 + 1)* part turned into a loop on 0, 1 and the (0 + 1) edge split into parallel 0 and 1 edges]
If we remove ε-transitions we get :
[NFA : q1 loops on 0, 1 ; q1 --0, 1--> q3 ; q3 --1--> q4 ; q4 --0--> qf]
[ε can be neglected, so q1 = q5 = q2]
Now, we convert above NFA into DFA :
Transition table for NFA :
δ | 0 | 1
q1 | {q1, q3} | {q1, q3}
q3 | – | q4
q4 | qf | –
*qf | – | –
Transition table for DFA :
δ | 0 | 1 | Let
{q1} | {q1, q3} | {q1, q3} | {q1} as A
{q1, q3} | {q1, q3} | {q1, q3, q4} | {q1, q3} as B
{q1, q3, q4} | {q1, q3, qf} | {q1, q3, q4} | {q1, q3, q4} as C
*{q1, q3, qf} | {q1, q3} | {q1, q3, q4} | {q1, q3, qf} as D
In terms of the renamed states, the minimized DFA is :
δ | 0 | 1
A | B | B
B | B | C
C | D | C
*D | B | C
Que 1.12. How does finite automata useful for lexical analysis ?
Answer
1. Lexical analysis is the process of reading the source text of a program
and converting it into a sequence of tokens.
2. Since the lexical structure of every programming language can be specified
by a regular language, a common way to implement a lexical analyzer
is to :
a. Specify regular expressions for all of the kinds of tokens in the
language.
b. The disjunction of all of the regular expressions thus describes
any possible token in the language.
c. Convert the overall regular expression specifying all possible
tokens into a Deterministic Finite Automaton (DFA).
d. Translate the DFA into a program that simulates the DFA. This
program is the lexical analyzer.
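A hedged C sketch of step (d) — the toy DFA below (identifiers [a-z]+ and numbers [0-9]+) is an assumption, not the book's : the lexer runs the DFA from the current position, remembers the last accepting state reached, and cuts the longest matching lexeme (maximal munch).
#include <stdio.h>
#include <ctype.h>

/* Toy combined DFA : state 1 accepts identifiers, state 2 accepts numbers */
int step(int s, char c)
{
    if ((s == 0 || s == 1) && islower((unsigned char)c)) return 1;
    if ((s == 0 || s == 2) && isdigit((unsigned char)c)) return 2;
    return -1;                                  /* dead : no transition */
}

int main(void)
{
    const char *in = "abc42x7";
    int pos = 0;
    while (in[pos]) {
        int s = 0, i = pos, last_acc = -1, last_pos = pos;
        while (in[i] && (s = step(s, in[i])) != -1) {
            last_acc = s;                       /* last accepting state seen */
            last_pos = ++i;
        }
        if (last_acc < 0) { pos++; continue; }  /* skip a char no token matches */
        printf("token kind %d : %.*s\n", last_acc, last_pos - pos, in + pos);
        pos = last_pos;                         /* resume after the lexeme */
    }
    return 0;
}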
Answer
1. DFA for all strings over {a, b} such that the fifth symbol from the right is a :
Regular expression : (a + b)* a (a + b) (a + b) (a + b) (a + b)
2. Regular expression :
[00(0 + 1)(0 + 1) + 0(0 + 1)0(0 + 1) + 0(0 + 1)(0 + 1)0 + (0 + 1)00(0 + 1) + (0 + 1)0(0 + 1)0 + (0 + 1)(0 + 1)00]
[Fig. 1.14.1 : ε-NFA with states q0, q1, q2 and transitions labelled a, b, c and ε, referred to in the next answer]
Answer
Transition table for ε-NFA :
δ | a | b | c
q0 | q1 | q2 | {q1, q2}
q1 | q0 | q2 | {q0, q2}
q2 | – | – | –
ε-closure of {q0} = {q0, q1, q2}
ε-closure of {q1} = {q1}
ε-closure of {q2} = {q2}
[Fig. 1.14.2 : the resulting DFA ; D is a dead state with self-loops on a, b, c]
PART-4
Implementation of Lexical Analyzers, Lexical Analyzer Generator,
LEX Compiler.
Questions-Answers
Answer
Lexical analyzer can be implemented in the following steps :
1. Input to the lexical analyzer is a source program.
2. By using input buffering scheme, it scans the source program.
3. Regular expressions are used to represent the input patterns.
4. Now this input pattern is converted into an NFA by using finite automata.
[Flow : regular expression → finite automata, with entries recorded in the symbol table]
Answer
1. For efficient design of compiler, various tools are used to automate the
phases of compiler. The lexical analysis phase can be automated using a
tool called LEX.
Answer
1. Automatic generation of lexical analyzer is done using LEX
programming language.
2. The LEX specification file can be denoted using the extension .l (often
pronounced as dot L).
3. For example, let us consider specification file as x.l.
4. This x.l file is then given to LEX compiler to produce lex.yy.c as shown
in Fig. 1.17.1. This lex.yy.c is a C program which is actually a lexical
analyzer program.
[Fig. 1.17.1 : The LEX specification file x.l is given to the LEX compiler, which produces lex.yy.c, the lexical analyzer program]
5. The LEX specification file stores the regular expressions for the token
and the lex.yy.c file consists of the tabular representation of the
transition diagrams constructed for the regular expression.
6. In specification file, LEX actions are associated with every regular
expression.
7. These actions are simply the pieces of C code that are directly carried
over to the lex.yy.c.
8. Finally, the C compiler compiles this generated lex.yy.c and produces
an object program a.out as shown in Fig. 1.17.2.
9. When some input stream is given to a.out then sequence of tokens
gets generated. The described scenario is shown in Fig. 1.17.2.
[Fig. 1.17.2 : Generation of lexical analyzer using LEX — lex.yy.c is compiled by the C compiler into the executable program a.out ; an input stream of strings from the source program is given to a.out, which produces the stream of tokens]
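On a typical Unix system, the whole pipeline of Fig. 1.17.1 and Fig. 1.17.2 can be driven from the shell as follows (file names as in the figures; -ll links the LEX library, which supplies a default main when one is not written) :
lex x.l
cc lex.yy.c -ll
./a.out < source.txt
Here source.txt stands for any input file of the source program; a.out writes the recognized tokens to the output.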
Answer
The LEX program consists of three parts :
%{
Declaration section
%}
%%
Rule section
%%
Auxiliary procedure section
1. Declaration section :
a. In the declaration section, declaration of variable constants can be
done.
b. Some regular definitions can also be written in this section.
c. The regular definitions are basically components of regular
expressions.
2. Rule section :
a. The rule section consists of regular expressions with associated
actions. These translation rules can be given in the form as :
R1 {action1}
R2 {action2}
.
.
.
Rn {actionn}
where each Ri is a regular expression and each actioni is a program
fragment describing what action is to be taken for the corresponding
regular expression.
b. These actions can be specified by piece of C code.
3. Auxiliary procedure section :
a. In this section, all the procedures are defined which are required
by the actions in the rule section.
Answer
%{
#include <stdio.h>
int count = 0;
/* program to recognize the keywords */
%}
%%
[ \t]+ ; /* "+" indicates one or more; this pattern is used for ignoring the white spaces */
auto|double|if|static|break|else|int|struct|case|enum|long|switch|char|extern|near|typedef|const|float|register|union|unsigned|void|while|default { count++; printf("C keyword(%d) :\t%s\n", count, yytext); }
[a-zA-Z]+ { printf("%s : is not the keyword\n", yytext); }
%%
main()
{
yylex();
}
Que 1.20. What are the various LEX actions that are used in LEX
programming ?
Answer
There are following LEX actions that can be used for ease of programming
using LEX tool :
1. BEGIN : It indicates the start state. The lexical analyzer starts at state
0.
2. ECHO : It emits the input as it is.
3. yytext :
a. yytext is a null terminated string that stores the lexeme when the
lexer recognizes a token in the input stream.
b. When a new token is found, the contents of yytext are replaced by
the new lexeme.
Answer
Token :
1. A token is a pair consisting of a token name and an optional attribute
value.
2. The token name is an abstract symbol representing a kind of lexical
unit.
3. Tokens can be identifiers, keywords, constants, operators and
punctuation symbols such as commas and parentheses.
Lexeme :
1. A lexeme is a sequence of characters in the source program that matches
the pattern for a token.
2. Lexeme is identified by the lexical analyzer as an instance of that token.
Pattern :
1. A pattern is a description of the form that the lexemes of a token may
take.
2. Regular expressions play an important role for specifying patterns.
3. If a keyword is considered as token, pattern is just sequence of characters.
PART-5
Formal Grammars and their Application to Syntax Analysis,
BNF Notation.
Questions-Answers
Answer
A grammar or phrase structured grammar is a combination of four tuples and
can be represented as G = (V, T, P, S), where :
1. V is a finite non-empty set of variables/non-terminals. Generally non-
terminals are represented by capital letters like A, B, C, ......, X, Y, Z.
2. T is a finite non-empty set of terminals, sometimes also represented by
Σ or VT. Generally terminals are represented by a, b, c, x, y, z, etc.
3. P is a finite set whose elements are of the form α → β, where α and β are
strings made up by combination of V and T, i.e., α, β ∈ (V ∪ T)*, and α has
at least one symbol from V. Elements of P are called productions or
production rules or rewriting rules.
4. S is a special variable/non-terminal known as the starting symbol.
While writing a grammar, it should be noted that V ∩ T = φ, i.e., no terminal
can belong to the set of non-terminals and no non-terminal can belong to the
set of terminals.
Answer
E → E E
E → (E)
E → id
Answer
BNF notation :
1. The BNF (Backus-Naur Form) is a notation technique for context free
grammar. This notation is useful for specifying the syntax of the
language.
2. The BNF specification is as :
<symbol> ::= Exp1 | Exp2 | Exp3 ...
Where <symbol> is a non terminal, and Exp1, Exp2 is a sequence of
symbols. These symbols can be combination of terminal or non
terminals.
3. For example :
<Address> : = <fullname> : “,” <street> “,” <zip code>
<fullname> : = <firstname> “–” <middle name> “–” <surname>
<street> : = <street name> “,” <city>
We can specify first name, middle name, surname, street name, city
and zip code by valid strings.
4. The BNF notation is more often non-formal and in human readable
form. But commonly used notations in BNF are :
a. Optional symbols are written with square brackets.
b. For repeating the symbol for 0 or more number of times asterisk
can be used.
For example : {name}*
c. For repeating the symbols for at least one or more number of
times + is used.
For example : {name}+
d. The alternative rules are separated by vertical bar.
e. The group of items must be enclosed within brackets.
PART-6
Ambiguity, YACC.
Questions-Answers
Answer
Ambiguous grammar : A context free grammar G is ambiguous if there
is at least one string in L(G) having two or more distinct derivation tree.
Proof : Let the production rules be given as :
E → EE+
E → E(E)
E → id
Parse tree for id(id)id+ is :
[parse tree : E at the root with children E, E, + ; the first child E expands by E → E(E), and the remaining E's derive id]
Only one parse tree is possible for id(id)id+, so the given grammar is unambiguous.
Answer
i. Context free grammar : Refer Q. 1.23, Page 1–23C, Unit-1.
ii. YACC parser generator :
1. YACC (Yet Another Compiler - Compiler) is the standard parser
generator for the Unix operating system.
2. An open source program, YACC generates code for the parser in
the C programming language.
3. It is a Look Ahead Left-to-Right (LALR) parser generator, generating
a parser, the part of a compiler that tries to make syntactic sense of
the source code.
Answer
Given :
S → AB | aaB
A → a | Aa
B → b
Let us generate the string aab from the given grammar. Parse trees for generating
the string aab are as follows :
[Fig. 1.28.1 : two distinct parse trees for aab — one using S → aaB, the other using S → AB with A → Aa → aa]
Here, for the same string, we are getting more than one parse tree. Hence, the
grammar is an ambiguous grammar.
The grammar
S → AB
A → Aa | a
B → b
is an unambiguous grammar equivalent to G. Now this grammar has only
one parse tree for the string aab :
[Fig. 1.28.2 : the unique parse tree — S with children A, B ; A → Aa → aa ; B → b]
PART-7
The Syntactic Specification of Programming Languages : Context
Free Grammar (CFG), Derivation and Parse Trees,
Capabilities of CFG.
Questions-Answers
Que 1.29. Define pars e tree. What are the conditions for
constructing a parse tree from a CFG ?
Answer
Parse tree :
1. A parse tree is an ordered tree in which left hand side of a production
represents a parent node and children nodes are represented by the
production’s right hand side.
2. Parse tree is the tree representation of deriving a Context Free Language
(CFL) from a given Context Free Grammar (CFG). These types of trees
are sometimes called derivation trees.
Conditions for constructing a parse tree from a CFG :
i. Each vertex of the tree must have a label. The label is a non-terminal or
terminal or null (ε).
ii. The root of the tree is the start symbol, i.e., S.
iii. The labels of the internal vertices are non-terminal symbols ∈ VN.
iv. If there is a production A → X1X2 .... Xk, then for a vertex with label A, the
children of that node will be X1X2 .... Xk.
v. A vertex n is called a leaf of the parse tree if its label is a terminal
symbol or null (ε).
Answer
5. We say that the production α → β is applied to the string γαδ to obtain
γβδ, or we say that γαδ directly derives γβδ.
6. Now suppose α1, α2, α3, ..., αm are strings in (VN ∪ VT)*, m ≥ 1, and
α1 ⇒ α2, α2 ⇒ α3, ..., αm–1 ⇒ αm, each step being a derivation in G.
7. Then we say that α1 ⇒* αm, i.e., we say α1 derives αm in grammar G. If α1
derives αm by exactly i steps, we write α1 ⇒i αm.
Que 1.31. What do you mean by left most derivation and right
most derivation with example ?
Answer
Left most derivation : The derivation S ⇒ s is called a left most derivation,
if the production is applied only to the left most variable (non-terminal) at
every step.
Example : Let us consider a grammar G that consists of production rules
E → E + E | E * E | id.
Firstly take the production E → E + E :
E ⇒ E + E
⇒ E * E + E (Replace E → E * E)
⇒ id * E + E (Replace E → id)
⇒ id * id + E (Replace E → id)
⇒ id * id + id (Replace E → id)
Right most derivation : A derivation S ⇒ s is called a right most derivation,
if the production is applied only to the right most variable (non-terminal) at
every step.
Example : Let us consider a grammar G having productions
E → E + E | E * E | id.
Start with the production
E ⇒ E * E
⇒ E * E + E (Replace E → E + E)
⇒ E * E + id (Replace E → id)
⇒ E * id + id (Replace E → id)
⇒ id * id + id (Replace E → id)
Answer
Various capabilities of CFG are :
1. Context free grammar is useful to describe most of the programming
languages.
2. If the grammar is properly designed then an efficient parser can be
constructed automatically.
3. Using the features of associativity and precedence information,
grammars for expressions can be constructed.
4. Context free grammar is capable of describing nested structures like :
balanced parenthesis, matching begin-end, corresponding if-then-else’s
and so on.
[Exam question referring to the ε-NFA of Fig. 1.14.1, with states q0, q1, q2 and transitions on a, b, c]
Ans. Refer Q. 1.14.
Q. 6. Explain the term token, lexeme and pattern.
Ans. Refer Q. 1.21.
Q. 7. What is an ambiguous grammar ? Is the following grammar
ambiguous ? Prove for E → EE+ | E(E) | id.
Ans. Refer Q. 1.26.
2
UNIT
Basic Parsing
Techniques
CONTENTS
Part-1 : Basic Parsing Techniques : ...................... 2–2C to 2–4C
Parsers, Shift Reduce Parsing
PART-1
Basic Parsing Techniques : Parsers, Shift Reduce Parsing.
Questions-Answers
Que 2.1. What is parser ? Write the role of parser. What are the
most popular parsing techniques ?
OR
Explain about basic parsing techniques. What is top-down parsing ?
Explain in detail.
Answer
A parser for any grammar is a program that takes as input string w and
produces as output a parse tree for w.
Role of parser :
1. The role of parsing is to determine the syntactic validity of a source
string.
2. Parser helps to report any syntax errors and recover from those errors.
3. Parser helps to construct parse tree and passes it to rest of phases of
compiler.
There are basically two types of parsing techniques :
1. Top-down parsing :
a. Top-down parsing attempts to find the left-most derivation for an
input string w, that start from the root (or start symbol), and
create the nodes in pre-defined order.
b. In top-down parsing, the input string w is scanned by the parser
from left to right, one symbol/token at a time.
c. The left-most derivation generates the leaves of parse tree in left
to right order, which matches to the input scan order.
d. In the top-down parsing, parsing decisions are based on the
lookahead symbol (or sequence of symbols).
2. Bottom-up parsing :
a. Bottom-up parsing can be defined as an attempt to reduce the input
string w to the start symbol of a grammar by finding out the right-
most derivation of w in reverse.
b. Parsing involves searching for the substring that matches the right
side of any of the productions of the grammar.
c. This substring is replaced by the left hand side non-terminal of the
production.
d. Process of replacing the right side of the production by the left side
non-terminal is called “reduction”.
Answer
Bottom-up parsing : Refer Q. 2.1, Page 2–2C, Unit-2.
Bottom-up parsing techniques are :
1. Shift-reduce parser :
a. Shift-reduce parser attempts to construct parse tree from leaves
to root and uses a stack to hold grammar symbols.
b. A parser goes on shifting the input symbols onto the stack until a
handle comes on the top of the stack.
c. When a handle appears on the top of the stack, it performs
reduction.
[Fig. 2.2.1 : Shift-reduce parser — a read/write head scans the input while the parser seeks a handle on top of the stack (stack bottom marked $)]
d. This parser performs following basic operations :
i. Shift
ii. Reduce
iii. Accept
iv. Error
2. LR parser : LR parser is the most efficient method of bottom-up
parsing which can be used to parse the large class of context free
grammars. This method is called LR(k) parsing; here L stands for
left-to-right scanning of the input, R stands for constructing a right
most derivation in reverse, and k denotes the number of lookahead
symbols used to make parsing decisions.
Answer
There are two common conflicts encountered in a shift-reduce parser :
1. Shift-reduce conflict :
a. The shift-reduce conflict is the most common type of conflict found
in grammars.
b. This conflict occurs because some production rule in the grammar
is shifted and reduced for the particular token at the same time.
c. This error is often caused by recursive grammar definitions where
the system cannot determine when one rule is complete and
another is just started.
2. Reduce-reduce conflict :
a. A reduce-reduce conflict is caused when a grammar allows two or
more different rules to be reduced at the same time, for the same
token.
b. When this happens, the grammar becomes ambiguous since a
program can be interpreted more than one way.
c. This error can be caused when the same rule is reached by more
than one path.
PART-2
Operator Precedence Parsing.
Questions-Answers
Answer
1. A grammar G is said to be operator precedence if it possesses the following
properties :
a. No production on the right side is ε.
b. There should not be two adjacent non-terminals on the right side
of any production.
Answer
Algorithm for computing precedence function :
Input : An operator precedence matrix.
Output : Precedence functions representing the input matrix or an indication
that none exist.
Method :
1. Create symbols fa and ga for each a that is a terminal or $.
2. Partition the created symbols into as many groups as possible, in such a
way that if a ≐ b, then fa and gb are in the same group.
3. Create a directed graph whose nodes are the groups found in step 2. For
any a and b, if a ⋖ b, place an edge from the group of gb to the group
of fa. If a ⋗ b, place an edge from the group of fa to that of gb.
4. If the graph constructed in step 3 has a cycle, then no precedence
functions exist. If there are no cycles, let f(a) be the length of the longest
path from the group of fa; let g(b) be the length of the longest path from
the group of gb. Then there exist precedence functions.
Precedence graph for the above matrix :
[Fig. 2.5.1 : precedence graph over the nodes fa, ga, f(, g(, f), g), f;, g;, f$, g$]
From the longest path lengths, the precedence functions are :
  | a | ( | ) | ; | $
f | 1 | 0 | 2 | 2 | 0
g | 3 | 3 | 0 | 1 | 0
Answer
Operator precedence parsing algorithm :
Let the input string be a1 a2 ..... an $. Initially, the stack contains $.
1. Set p to point to the first symbol of w$.
2. Repeat : Let a be the topmost terminal symbol on the stack and let b be
the current input symbol (pointed to by p).
i. If only $ is on the stack and only $ is the input, then accept and
break,
else
begin
ii. If a ⋖ b or a ≐ b, then shift b onto the stack and increment p to the next
input symbol,
iii. else if a ⋗ b, then reduce :
iv. Repeat :
c := pop the stack
v. Until the top stack terminal is related by ⋖ to the terminal most
recently popped,
vi. else call the error correcting routine
end
Operator precedence table :
   | +  | *  | (  | )  | id | $
+  | ⋗  | ⋖  | ⋖  | ⋗  | ⋖  | ⋗
*  | ⋗  | ⋗  | ⋖  | ⋗  | ⋖  | ⋗
(  | ⋖  | ⋖  | ⋖  | ≐  | ⋖  |
)  | ⋗  | ⋗  |    | ⋗  |    | ⋗
id | ⋗  | ⋗  |    | ⋗  |    | ⋗
$  | ⋖  | ⋖  | ⋖  |    | ⋖  |
Parsing :
$(< · id > + (< · id · > * < · id · >))$ Handle id is obtained between < · · >
Reduce this by F id
(F + (< · id · > * < · id · >))$ Handle id is obtained between < · · >
Reduce this by F id
(F + (F * < · id · >))$ Handle id is obtained between < · · >
Reduce this by F id
(F + (F * F)) Remove all the non-terminals.
(+(*)) Insert $ at the beginning and at the
end.
Also insert the precedence operators.
$(< · + · >(< · * · >)) $ The * operator is surrounded by < · · >.
This indicates that * becomes handle.
That means we have to reduce T*F
operation first.
$<·+·>$ Now + becomes handle. Hence we
evaluate E + T.
$ $ Parsing is done.
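With precedence functions, the shift/reduce decision reduces to comparing f(top terminal) with g(next input) : shift while f ≤ g, reduce when f > g. The following hedged C sketch (an assumed example, not the book's implementation) evaluates 2 + 3 * 4 with the classic values f(id) = 4, g(id) = 5, f(+) = 2, g(+) = 1, f(*) = 4, g(*) = 3, f($) = g($) = 0 :
#include <stdio.h>
#include <ctype.h>

int f(char a){ return isdigit((unsigned char)a) ? 4 : a == '+' ? 2 : a == '*' ? 4 : 0; }
int g(char a){ return isdigit((unsigned char)a) ? 5 : a == '+' ? 1 : a == '*' ? 3 : 0; }

int main(void)
{
    const char *in = "2+3*4$";       /* input already terminated by $ */
    char ops[64];  int otop = 0;     /* terminal stack, ops[0] = '$' */
    long vals[64]; int vtop = 0;     /* values of already-reduced handles */
    int i = 0;
    ops[0] = '$';
    for (;;) {
        char a = ops[otop], b = in[i];
        if (a == '$' && b == '$') break;          /* accept */
        if (f(a) <= g(b)) {                       /* a <. b or a =. b : shift */
            ops[++otop] = b; i++;
        } else {                                  /* a .> b : reduce the handle */
            char op = ops[otop--];
            if (isdigit((unsigned char)op)) vals[vtop++] = op - '0';
            else { vals[vtop-2] = (op == '+') ? vals[vtop-2] + vals[vtop-1]
                                              : vals[vtop-2] * vals[vtop-1];
                   vtop--; }
        }
    }
    printf("%ld\n", vals[0]);                     /* prints 14 */
    return 0;
}
Note how * is reduced before +, exactly as the handle-finding trace above reduces T * F first.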
PART-3
Top-down Parsing, Predictive Parsers.
Questions-Answers
Answer
Problems with top-down parsing are :
1. Backtracking :
a. Backtracking is a technique in which for expansion of non-terminal
symbol, we choose alternative and if some mismatch occurs then
we try another alternative if any.
b. If for a non-terminal, there are multiple production rules beginning
with the same input symbol then to get the correct derivation, we
need to try all these alternatives.
Compiler Design 2–9 C (CS/IT-Sem-5)
A Subtree
A A
A A
A A
Fig. 2.7.1. Left recursion.
e. This causes major problem in top-down parsing and therefore
elimination of left recursion is must.
3. Left factoring :
a. Left factoring is occurred when it is not clear that which of the two
alternatives is used to expand the non-terminal.
b. If the grammar is not left factored then it becomes difficult for the
parser to make decisions.
Que 2.8. What do you understand by left factoring and left
recursion and how it is eliminated ?
Answer
Left factoring and left recursion : Refer Q. 2.7, Page 2–8C, Unit-2.
Left factoring can be eliminated by the following scheme :
a. In general if
A → αβ1 | αβ2 | ... | αβn | γ1 | ... | γm
is a production, then it is not possible for the parser to take a decision
whether to choose the first rule or the second.
b. In such a situation, the given grammar can be left factored as
A → αA′ | γ1 | ... | γm
A′ → β1 | β2 | ... | βn
Left recursion can be eliminated by the following scheme :
a. In general if
A → Aα1 | Aα2 | ... | Aαn | β1 | β2 | ... | βm
where no βi begins with an A.
b. In such a situation, we replace the A-productions by
A → β1A′ | β2A′ | ... | βmA′
A′ → α1A′ | α2A′ | ... | αnA′ | ε
Answer
Given grammar :
S → AB
A → BS | b
B → SA | a
Substituting for the leading non-terminals to expose the hidden left recursion and then eliminating it (the intermediate working, as recoverable from the source, is condensed here) :
For S : S → AB gives S → BSB | bB, and substituting B gives S → SASB | aSB | bB ;
eliminating the left recursion :
S → aSBS′ | bBS′
S′ → ASBS′ | ε
For B : substitution gives B → ABA | a and then B → ABBA | bBA | a ;
eliminating the left recursion :
B → bBAB′ | aB′
B′ → ABBAB′ | ε
For A : substitution gives A → BAAB | aAB | a ;
eliminating the left recursion :
A → aABA′ | aA′
A′ → BAABA′ | ε
The productions after left recursion elimination are :
S → aSBS′ | bBS′
S′ → ASBS′ | ε
A → aABA′ | aA′
A′ → BAABA′ | ε
B → bBAB′ | aB′
B′ → ABBAB′ | ε
Que 2.10. Write short notes on top-down parsing. What are top-
down parsing techniques ?
Answer
Top-down parsing : Refer Q. 2.1, Page 2–2C, Unit-2.
Top-down parsing techniques are :
1. Recursive-descent parsing :
i. A top-down parser that executes a set of recursive procedures to
process the input without backtracking is called recursive-descent
parser and parsing is called recursive-descent parsing.
ii. The recursive procedures can be easy to write and fairly efficient
if written in a language that implements the procedure call
efficiently.
2. Predictive parsing :
i. A predictive parsing is an efficient way of implementing recursive-
descent parsing by handling the stack of activation records
explicitly.
ii. The predictive parser has an input, a stack, a parsing table, and an
output. The input contains the string to be parsed, followed by $,
the right end-marker.
[Model of a predictive parser : input buffer a + b $, a stack X Y Z $, the predictive parsing program, a parsing table M, and the output]
iii. The program considers X, the symbol on top of the stack, and a, the
current input symbol. These two symbols determine the action of
the parser.
iv. Following are the possibilities :
a. If X = a = $, the parser halts and announces successful
completion of parsing.
b. If X = a ≠ $, the parser pops X off the stack and advances the
input pointer to the next input symbol.
c. If X is a non-terminal, the program consults entry M[X, a] of
the parsing table M. This entry will be either an X-production
of the grammar or an error entry.
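A minimal recursive-descent sketch in C (an assumed example, not from the book) for the left-factored, non-left-recursive grammar E → TE′, E′ → +TE′ | ε, T → FT′, T′ → *FT′ | ε, F → (E) | digit. Each non-terminal becomes one procedure, and the single lookahead symbol selects the production, so no backtracking is needed :
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

const char *p;                         /* lookahead pointer into the input */
void error(void){ printf("syntax error at '%c'\n", *p); exit(1); }
void match(char c){ if (*p == c) p++; else error(); }

void E(void);
void F(void){ if (*p == '(') { match('('); E(); match(')'); }
              else if (isdigit((unsigned char)*p)) match(*p);
              else error(); }
void Tp(void){ if (*p == '*') { match('*'); F(); Tp(); } }   /* T' (or epsilon) */
void T(void){ F(); Tp(); }
void Ep(void){ if (*p == '+') { match('+'); T(); Ep(); } }   /* E' (or epsilon) */
void E(void){ T(); Ep(); }

int main(void)
{
    p = "(1+2)*3";
    E();
    if (*p == '\0') printf("accepted\n"); else error();
    return 0;
}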
Que 2.12. What are the problems with top-down parsing ? Write
Answer
Problems with top-down parsing : Refer Q. 2.7, Page 2–8C, Unit-2.
Answer
Non-recursive descent parsing (Predictive parsing) : Refer Q. 2.10,
Page 2–11C, Unit-2.
Numerical :
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → F* | a | b
First we remove the left recursion from F → F* | a | b :
F → aF′ | bF′
F′ → *F′ | ε
FIRST(E) = FIRST(T) = FIRST(F) = {a, b}
FIRST(E′) = {+, ε}, FIRST(F′) = {*, ε}
FIRST(T′) = {*, ε}
FOLLOW(E) = { $ }
FOLLOW(E′) = { $ }
FOLLOW(T) = {+, $}
FOLLOW(T′) = {+, $}
FOLLOW(F) = {*, +, $}
FOLLOW(F′) = {*, +, $}
PART-4
Automatic Generation of Efficient Parsers : LR Parsers, The
Canonical Collections of LR(0) Items
Questions-Answers
Answer
Working of LR parser :
1. The working of LR parser can be understood by using block diagram as
shown in Fig. 2.15.1.
[Fig. 2.15.1 : Model of an LR parser — input tape a1 a2 ... an $, a stack holding states and grammar symbols (S0 at the bottom, Sm on top), the LR parsing program, the action/goto table, and the output]
2. In LR parser, it has input buffer for storing the input string, a stack for
storing the grammar symbols, output and a parsing table comprised of
two parts, namely action and goto.
3. There is a driver program and reads the input symbol one at a time from
the input buffer. This program is same for all LR parser.
4. It reads the input string one symbol at a time and maintains a stack.
5. The stack always maintains the following form :
S0 X1 S1 X2 S2 ........... Xm–1 Sm–1 Xm Sm
where Xi is a grammar symbol, each Si is a state, and the state Sm is on
top of the stack.
6. The action of the driver program depends on action [Sm, ai], where ai is
the current input symbol.
7. Following actions are possible for input ai ai+1 ...... an :
a. Shift : If action [Sm, ai] = shift S, the parser shifts the input symbol
ai onto the stack and then stacks state S. The current input symbol
now becomes ai+1 :
Stack : S0 X1 S1 X2 ........... Xm Sm ai S    Input : ai+1 ai+2 ............ an $
b. Reduce : If action [Sm, ai] = reduce A → β, the parser executes a
reduce move using the production A → β of the grammar. If β
has r grammar symbols, the first 2r symbols are popped off the stack
(r state symbols and r grammar symbols). So, the top of the stack
now becomes Sm–r ; then A is pushed on the stack, and then state
goto [Sm–r, A] is pushed on the stack. The current input symbol is
still ai :
Stack : S0 X1 S1 X2 ........... Xm–r Sm–r A S    Input : ai ai+1 ............ an $
where, S = goto [Sm–r, A]
c. Accept : If action [Sm, ai] = accept, parsing is completed.
d. Error : If action [Sm, ai] = error, the parser has discovered a syntax
error.
LR parser is widely used for following reasons :
1. LR parsers can be constructed to recognize most of the programming
language for which context free grammar can be written.
2. The class of grammar that can be parsed by LR parser is a superset of
class of grammars that can be parsed using predictive parsers.
3. LR parser works using non-backtracking shift-reduce technique.
4. LR parser is an efficient parser as it detects syntactic error very quickly.
Answer
1. LR(0) items : The LR (0) item for grammar G is production rule in
which symbol • is inserted at some position in R.H.S. of the rule. For
example
S → •ABC
S → A•BC
S → AB•C
S → ABC•
The production S → ε generates only one item, S → •.
2. Augmented grammar : If a grammar G has start symbol S, then the
augmented grammar G′ has a new start symbol S′ and the production
S′ → S. The purpose of this grammar is to indicate the acceptance of
input; that is, when the parser is about to reduce by S′ → S, it reaches the
acceptance state.
3. Kernel items : It is the collection of the item S′ → •S and all the items whose
dots are not at the left most end of the R.H.S. of the rule.
Non-kernel items : The collection of all the items in which the dot • is at the
left end of the R.H.S. of the rule.
4. Functions closure and goto : These are the two important functions
required to create the collection of canonical sets of LR(0) items.
PART-5
Constructing SLR Parsing Tables.
Questions-Answers
Answer
The SLR parsing can be done as :
[Fig. 2.17.1 : Working of SLR parser — the input string and the SLR parsing table drive the parsing of the input, producing the output]
Algorithm for construction of an SLR parsing table :
Input : C the canonical collection of sets of items for an augmented grammar
G.
Output : If possible, an LR parsing table consisting of parsing action function
ACTION and a goto function GOTO.
Method :
Let C = {I0, I1, ........, In}. The states of the parser are 0, 1, .... n, state i being
constructed from Ii.
The parsing actions for state i are determined as follows :
1. If [A → α•aβ] is in Ii and GOTO (Ii, a) = Ij, then set ACTION [i, a] to
“shift j”. Here a is a terminal.
2. If [A → α•] is in Ii, then set ACTION [i, a] to “reduce A → α” for all ‘a’ in
FOLLOW (A).
3. If [S′ → S•] is in Ii, then set ACTION [i, $] to “accept”.
The goto transitions for state i are constructed using the rule :
4. If GOTO (Ii, A) = Ij then GOTO [i, A] = j.
5. All entries not defined by rules (1) through (4) are made “error”.
6. The initial state of the parser is the one constructed from the set of
items containing [S′ → •S].
The parsing table consisting of the parsing ACTION and GOTO function
determined by this algorithm is called the SLR parsing table for G. An
LR parser using the SLR parsing table for G is called the SLR parser
for G and a grammar having an SLR parsing table is said to be SLR(1).
Answer
The augmented grammar G for the above grammar G is :
S′ → S
S → A)
S → A, P
S → (P, P
P → {num, num}
The canonical collection of sets of LR(0) items for the grammar is as follows :
I0 : S′ → •S
S → •A)
S → •A, P
S → •(P, P
P → •{num, num}
I1 = GOTO (I0, S)
I1 : S′ → S•
I2 = GOTO (I0, A)
I2 : S → A•)
S → A•, P
I3 = GOTO (I0, ( )
I3 : S → (•P, P
P → •{num, num}
I4 = GOTO (I0, { )
I4 : P → {•num, num}
I5 = GOTO (I2, ) )
I5 : S → A)•
I6 = GOTO (I2, , )
I6 : S → A, •P
P → •{num, num}
I7 = GOTO (I3, P)
I7 : S → (P•, P
I8 = GOTO (I4, num)
I8 : P → {num•, num}
I9 = GOTO (I6, P)
I9 : S → A, P•
I10 = GOTO (I7, , )
I10 : S → (P, •P
P → •{num, num}
I11 = GOTO (I8, , )
I11 : P → {num, •num}
I12 = GOTO (I10, P)
I12 : S → (P, P•
I13 = GOTO (I11, num)
I13 : P → {num, num•}
I14 = GOTO (I13, })
I14 : P → {num, num}•
Item set | ) | , | ( | { | num | } | $ || S | A | P
0 |  |  | S3 | S4 |  |  |  || 1 | 2 |
1 |  |  |  |  |  |  | accept ||  |  |
2 | S5 | S6 |  |  |  |  |  ||  |  |
3 |  |  |  | S4 |  |  |  ||  |  | 7
4 |  |  |  |  | S8 |  |  ||  |  |
5 |  |  |  |  |  |  | r1 ||  |  |
6 |  |  |  | S4 |  |  | r2 ||  |  | 9
7 |  | S10 |  |  |  |  |  ||  |  |
8 |  | S11 |  |  |  |  |  ||  |  |
9 |  |  |  |  |  |  | r2 ||  |  |
10 |  |  |  | S4 |  |  |  ||  |  | 12
11 |  |  |  |  | S13 |  |  ||  |  |
12 |  |  |  |  |  |  | r3 ||  |  |
13 |  |  |  |  |  | S14 |  ||  |  |
14 | r4 | r4 |  |  |  |  |  ||  |  |
Answer
The augmented grammar is :
S′ → S
S → AS | b
A → SA | a
The canonical collection of LR(0) items is :
I0 : S′ → •S
S → •AS | •b
A → •SA | •a
I1 = GOTO (I0, S)
I1 : S′ → S•
A → S•A
S → •AS | •b
A → •SA | •a
I2 = GOTO (I0, A)
I2 : S → A•S
S → •AS | •b
A → •SA | •a
I3 = GOTO (I0, b)
I3 : S → b•
I4 = GOTO (I0, a)
I4 : A → a•
I5 = GOTO (I1, A)
I5 : A → SA•
I6 = GOTO (I1, S) = I1
I7 = GOTO (I1, a) = I4
I8 = GOTO (I2, S)
I8 : S → AS•
I9 = GOTO (I2, A) = I2
I10 = GOTO (I2, b) = I3
Let us number the production rules in the grammar as :
1. S → AS
2. S → b
3. A → SA
4. A → a
FIRST(S) = FIRST(A) = {a, b}
FOLLOW(S) = {$, a, b}
FOLLOW(A) = {a, b}
[DFA over the item sets : I0 --S--> I1 --A--> I5 ; I0 --A--> I2 --S--> I8 ; I0 --b--> I3 ; I0 --a--> I4 ; I1 and I2 return to themselves and to I3, I4 as noted above]
Answer
The augmented grammar is :
E′ → E
E → E + E
E → E * E
E → (E)
E → id
The set of LR(0) items is as follows :
I0 : E′ → •E
E → •E + E
E → •E * E
E → •(E)
E → •id
I1 = GOTO (I0, E)
I1 : E′ → E•
E → E• + E
E → E• * E
I2 = GOTO (I0, ( )
I2 : E → (•E)
E → •E + E
E → •E * E
E → •(E)
E → •id
I3 = GOTO (I0, id)
I3 : E → id•
I4 = GOTO (I1, +)
I4 : E → E + •E
E → •E + E
E → •E * E
E → •(E)
E → •id
I5 = GOTO (I1, *)
I5 : E → E * •E
E → •E + E
E → •E * E
E → •(E)
E → •id
I6 = GOTO (I2, E)
I6 : E → (E•)
E → E• + E
E → E• * E
I7 = GOTO (I4, E)
I7 : E → E + E•
E → E• + E
E → E• * E
I8 = GOTO (I5, E)
I8 : E → E * E•
E → E• + E
E → E• * E
I9 = GOTO (I6, ))
I9 : E → (E)•
State | id | + | * | ( | ) | $ || E
0 | S3 |  |  | S2 |  |  || 1
1 |  | S4 | S5 |  |  | accept ||
2 | S3 |  |  | S2 |  |  || 6
3 |  | r4 | r4 |  | r4 | r4 ||
4 | S3 |  |  | S2 |  |  || 7
5 | S3 |  |  | S2 |  |  || 8
6 |  | S4 | S5 |  | S9 |  ||
7 |  | r1 | S5 |  | r1 | r1 ||
8 |  | r2 | r2 |  | r2 | r2 ||
9 |  | r3 | r3 |  | r3 | r3 ||
Que 2.21. Perform shift reduce parsing for the given input strings
using the grammar S → (L) | a, L → L, S | S
i. (a, (a, a)) ii. (a, a) AKTU 2018-19, Marks 07
Answer
i.
Stack contents | Input string | Action
$ | (a, (a, a))$ | Shift (
$( | a, (a, a))$ | Shift a
$(a | , (a, a))$ | Reduce S → a
$(S | , (a, a))$ | Reduce L → S
$(L | , (a, a))$ | Shift ,
$(L, | (a, a))$ | Shift (
$(L, ( | a, a))$ | Shift a
$(L, (a | , a))$ | Reduce S → a
$(L, (S | , a))$ | Reduce L → S
$(L, (L | , a))$ | Shift ,
$(L, (L, | a))$ | Shift a
$(L, (L, a | ))$ | Reduce S → a
$(L, (L, S | ))$ | Reduce L → L, S
$(L, (L | ))$ | Shift )
$(L, (L) | )$ | Reduce S → (L)
$(L, S | )$ | Reduce L → L, S
$(L | )$ | Shift )
$(L) | $ | Reduce S → (L)
$S | $ | Accept
ii.
Stack contents | Input string | Action
$ | (a, a)$ | Shift (
$( | a, a)$ | Shift a
$(a | , a)$ | Reduce S → a
$(S | , a)$ | Reduce L → S
$(L | , a)$ | Shift ,
$(L, | a)$ | Shift a
$(L, a | )$ | Reduce S → a
$(L, S | )$ | Reduce L → L, S
$(L | )$ | Shift )
$(L) | $ | Reduce S → (L)
$S | $ | Accept
Answer
The augmented grammar is :
S′ → S
S → cB | ccA
A → cA | a
B → ccB | b
The canonical collection of LR(0) items is :
I0 : S′ → •S
S → •cB | •ccA
A → •cA | •a
B → •ccB | •b
I1 = GOTO (I0, S)
I1 : S′ → S•
I2 = GOTO (I0, c)
I2 : S → c•B | c•cA
A → c•A
B → c•cB
A → •cA | •a
B → •ccB | •b
I3 = GOTO (I0, a)
I3 : A → a•
I4 = GOTO (I0, b)
I4 : B → b•
I5 = GOTO (I2, B)
I5 : S → cB•
I6 = GOTO (I2, A)
I6 : A → cA•
I7 = GOTO (I2, c)
I7 : S → cc•A
B → cc•B
A → c•A
B → c•cB
A → •cA | •a
B → •ccB | •b
I8 = GOTO (I7, A)
I8 : S → ccA•
A → cA•
I9 = GOTO (I7, B)
I9 : B → ccB•
I10 = GOTO (I7, c)
I10 : B → cc•B
A → c•A
B → c•cB
B → •ccB | •b
A → •cA | •a
I11 = GOTO (I10, A)
I11 : A → cA•
DFA for the sets of items :
[Fig. 2.22.1 : transition diagram over I0 ... I11 — I0 --S--> I1, I0 --c--> I2, I0 --a--> I3, I0 --b--> I4 ; I2 --B--> I5, I2 --A--> I6, I2 --c--> I7 ; I7 --A--> I8, I7 --B--> I9, I7 --c--> I10 ; I10 --A--> I11]
States | a | b | c | $ || A | B | S
I0 | S3 | S4 | S2 |  ||  |  | 1
I1 |  |  |  | accept ||  |  |
I2 | S3 | S4 | S7 |  || 6 | 5 |
I3 | r4 | r4 | r4 | r4 ||  |  |
I4 | r6 | r6 | r6 | r6 ||  |  |
I5 | r1 | r1 | r1 | r1 ||  |  |
I6 | r3 | r3 | r3 | r3 ||  |  |
I7 | S3 | S4 | S10 |  || 8 | 9 |
I9 | r5 | r5 | r5 | r5 ||  |  |
I10 | S3 | S4 | S10 |  || 11 |  |
I11 | r3 | r3 | r3 | r3 ||  |  |
PART-6
Constructing Canonical LR Parsing Tables.
Questions-Answers
Answer
Algorithm for construction of canonical LR parsing table :
Input : An augmented grammar G.
Output : The canonical parsing table function ACTION and GOTO for G.
Method : Construct C = {I0, I1, ...., In}, the collection of sets of LR(1) items of
G′. State i of the parser is constructed from Ii.
1. The parsing actions for state i are determined as follows :
a. If [A → α•aβ, b] is in Ii and GOTO (Ii, a) = Ij, then set
ACTION [i, a] to “shift j”. Here, a is required to be a terminal.
b. If [A → α•, a] is in Ii, A ≠ S′, then set ACTION [i, a] to “reduce
A → α”.
c. If [S′ → S•, $] is in Ii, then set ACTION [i, $] to “accept”.
The goto transitions for state i are determined as follows :
2. If GOTO (Ii, A) = Ij then GOTO [i, A] = j.
3. All entries not defined by rules (1) and (2) are made “error”.
4. The initial state of the parser is the one constructed from the set containing
the item [S′ → •S, $].
If the parsing action function has no multiple entries, then the grammar is
said to be LR(1), or simply LR.
PART-7
Constructing LALR Parsing Tables Using Ambiguous Grammars, An
Automatic Parser Generator, Implementation of LR Parsing Tables.
Questions-Answers
Answer
The algorithm for construction of LALR parsing table is as :
Input : An augmented grammar G.
Output : The LALR parsing table function ACTION and GOTO for G.
Method :
1. Construct C = {I0, I1, ........., In}, the collection of sets of LR(1) items.
2. For each core present among the LR(1) items, find all sets having that
core, and replace these sets by their union.
3. Let C′ = {J0, J1, .......... Jm} be the resulting sets of LR(1) items. The
parsing actions for state i are constructed from Ji. If there is a parsing
action conflict, the algorithm fails to produce a parser and the
grammar is said not to be LALR(1).
4. The goto table is constructed as follows : if J is the union of one or more
sets of LR(1) items, i.e., J = I1 ∪ I2 ∪ ........ ∪ Ik, then the cores of
GOTO (I1, X), GOTO (I2, X), ......, GOTO (Ik, X) are the same, since I1,
I2, ......., Ik all have the same core. Let K be the union of all sets of
items having the same core as GOTO (I1, X). Then GOTO (J, X) = K.
The table produced by this algorithm is called the LALR parsing table for
grammar G. If there are no parsing action conflicts, then the given
grammar is said to be an LALR(1) grammar.
The collection of sets of items constructed in step (3) of this algorithm is
called the LALR(1) collection.
Answer
Augmented grammar :
S′ → S
S → aAd | bBd | aBe | bAe
A → f
B → f
Canonical collection of LR(1) items :
I0 : S′ → •S, $
S → •aAd, $
S → •bBd, $
S → •aBe, $
S → •bAe, $
A → •f, d/e
B → •f, d/e
I1 = GOTO (I0, S)
I1 : S′ → S•, $
I2 = GOTO (I0, a)
I2 : S → a•Ad, $
S → a•Be, $
A → •f, d
B → •f, e
I3 = GOTO (I0, b)
I3 : S → b•Bd, $
S → b•Ae, $
A → •f, e
B → •f, d
I4 = GOTO (I2, A)
I4 : S → aA•d, $
I5 = GOTO (I2, B)
I5 : S → aB•e, $
I6 = GOTO (I2, f)
I6 : A → f•, d
B → f•, e
I7 = GOTO (I3, B)
I7 : S → bB•d, $
I8 = GOTO (I3, A)
I8 : S → bA•e, $
I9 = GOTO (I3, f)
I9 : A → f•, e
B → f•, d
I10 = GOTO (I4, d)
I10 : S → aAd•, $
I11 = GOTO (I5, e)
I11 : S → aBe•, $
I12 = GOTO (I7, d)
I12 : S → bBd•, $
I13 = GOTO (I8, e)
I13 : S → bAe•, $
Answer
Augmented grammar G for the given grammar :
S′ → S
S → Aa
S → bAc
S → Bc
S → bBa
A → d
B → d
The canonical collection of sets of LR(1) items for the grammar is as follows :
I0 : S′ → •S, $
S → •Aa, $
S → •bAc, $
S → •Bc, $
S → •bBa, $
A → •d, a
B → •d, c
I1 = GOTO (I0, S)
I1 : S′ → S•, $
I2 = GOTO (I0, A)
I2 : S → A•a, $
I3 = GOTO (I0, b)
I3 : S → b•Ac, $
S → b•Ba, $
A → •d, c
B → •d, a
I4 = GOTO (I0, B)
I4 : S → B•c, $
I5 = GOTO (I0, d)
I5 : A → d•, a
B → d•, c
I6 = GOTO (I2, a)
I6 : S → Aa•, $
I7 = GOTO (I3, A)
I7 : S → bA•c, $
I8 = GOTO (I3, B)
I8 : S → bB•a, $
I9 = GOTO (I3, d)
I9 : A → d•, c
B → d•, a
I10 = GOTO (I4, c)
I10 : S → Bc•, $
I11 = GOTO (I7, c)
I11 : S → bAc•, $
I12 = GOTO (I8, a)
I12 : S → bBa•, $
The action/goto table will be designed as follows :
[Table 2.26.1 : the LR(1) action/goto table — its entries are not recoverable from the source]
Since the table does not have any conflict, it is LR(1).
For the LALR(1) table, item set 5 and item set 9 have the same core. Thus we
merge both the item sets : (I5, I9) = item set I59. Now, the resultant parsing
table becomes :
Table 2.26.2.
State | a | b | c | d | $ || S | A | B
0 |  | S3 |  | S59 |  || 1 | 2 | 4
1 |  |  |  |  | accept ||  |  |
2 | S6 |  |  |  |  ||  |  |
3 |  |  |  | S59 |  ||  | 7 | 8
4 |  |  | S10 |  |  ||  |  |
59 | r5, r6 |  | r5, r6 |  |  ||  |  |
6 |  |  |  |  | r1 ||  |  |
7 |  |  | S11 |  |  ||  |  |
8 | S12 |  |  |  |  ||  |  |
10 |  |  |  |  | r3 ||  |  |
11 |  |  |  |  | r2 ||  |  |
12 |  |  |  |  | r4 ||  |  |
The merged state 59 has reduce-reduce conflicts (on a and on c), so this grammar is LR(1) but not LALR(1).
Que 2.27. Construct the LALR parsing table for the following
grammar :
S → AA
A → aA
A → b
is LR (1) but not LALR(1). AKTU 2015-16, Marks 10
Answer
The given grammar is :
S → AA
A → aA | b
The augmented grammar will be :
S′ → S
S → AA
A → aA | b
Table 2.27.1.
State | a | b | $ || S | A
0 | S3 | S4 |  || 1 | 2
1 |  |  | accept ||  |
2 | S6 | S7 |  ||  | 5
3 | S3 | S4 |  ||  | 8
4 | r3 | r3 |  ||  |
5 |  |  | r1 ||  |
6 | S6 | S7 |  ||  | 9
7 |  |  | r3 ||  |
8 | r2 | r2 |  ||  |
9 |  |  | r2 ||  |
Since the table does not contain any conflict, it is LR(1).
For the LALR table, I3 and I6 are unioned, I4 and I7 are unioned, and I8 and
I9 are unioned.
So, I36 : A → a•A, a/b/$
A → •aA, a/b/$
A → •b, a/b/$
I47 : A → b•, a/b/$
I89 : A → aA•, a/b/$ and the LALR table will be :
Table 2.27.2.
State | a | b | $ || S | A
0 | S36 | S47 |  || 1 | 2
1 |  |  | accept ||  |
2 | S36 | S47 |  ||  | 5
36 | S36 | S47 |  ||  | 89
47 | r3 | r3 | r3 ||  |
5 |  |  | r1 ||  |
89 | r2 | r2 | r2 ||  |
Since the LALR table does not contain any conflict, it is also LALR(1).
DFA :
[Fig. 2.27.1 : DFA of the merged item sets — I0 --S--> I1 ; I0 --A--> I2 --A--> I5 ; I0, I2 and I36 go to I36 on a and to I47 on b ; I36 --A--> I89]
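A hedged C sketch of the table-driven LR parsing program of Que 2.15, instantiated with Table 2.27.2 above (states 36, 47, 89 renumbered 3, 4, 6; the encoding — positive entries shift, negative entries reduce by the numbered rule, ACC accepts — is an assumption for illustration) :
#include <stdio.h>

enum { ACC = 100 };
int sym(char c){ return c == 'a' ? 0 : c == 'b' ? 1 : 2; }  /* columns a, b, $ */

int action[7][3] = {
    { 3,  4,  0 },   /* 0  : s36 s47        */
    { 0,  0, ACC },  /* 1  :         accept */
    { 3,  4,  0 },   /* 2  : s36 s47        */
    { 3,  4,  0 },   /* 36 : s36 s47        */
    {-3, -3, -3 },   /* 47 : r3 r3 r3       */
    { 0,  0, -1 },   /* 5  :         r1     */
    {-2, -2, -2 },   /* 89 : r2 r2 r2       */
};
int go_S[7] = { 1, 0, 0, 0, 0, 0, 0 };          /* GOTO on S */
int go_A[7] = { 2, 0, 5, 6, 0, 0, 0 };          /* GOTO on A */
int rhs_len[4] = { 0, 2, 2, 1 };                /* |AA|, |aA|, |b| */
char lhs[4]    = { 0, 'S', 'A', 'A' };

int main(void)
{
    const char *in = "abb$";
    int stack[64], top = 0, i = 0;
    stack[0] = 0;                               /* initial state */
    for (;;) {
        int act = action[stack[top]][sym(in[i])];
        if (act == ACC)   { printf("accepted\n"); return 0; }
        else if (act > 0) { stack[++top] = act; i++; }          /* shift */
        else if (act < 0) {                                     /* reduce */
            top -= rhs_len[-act];               /* pop |rhs| states */
            int s = stack[top];
            stack[++top] = (lhs[-act] == 'S') ? go_S[s] : go_A[s];
            printf("reduce by rule %d\n", -act);
        }
        else { printf("error\n"); return 1; }
    }
}
On the input abb it prints the reductions A → b, A → aA, A → b, S → AA and then accepts.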
Q. 10. Perform shift reduce parsing for the given input strings
using the grammar S → (L) | a, L → L, S | S
i. (a, (a, a))
ii. (a, a)
Ans. Refer Q. 2.21.
3
UNIT
Syntax-Directed
Translations
CONTENTS
Part-1 : Syntax-Directed Translation : ................. 3–2C to 3–5C
Syntax-Directed Translation Scheme,
Implementation of Syntax-Directed
Translators
PART-1
Syntax-Directed Translation : Syntax-Directed Translation
Schemes, Implementation of Syntax-Directed Translators.
Questions-Answers
Answer
1. Syntax directed definition/translation is a generalization of context
free grammar in which each grammar production X → α is associated
with a set of semantic rules of the form a := f(b1, b2, ...., bk), where a is
an attribute obtained from the function f.
2. Syntax directed translation is a kind of abstract specification.
3. It is done for static analysis of the language.
4. It allows subroutines or semantic actions to be attached to the
productions of a context free grammar. These subroutines generate
intermediate code when called at appropriate time by a parser for that
grammar.
5. The syntax directed translation is partitioned into two subsets called
the synthesized and inherited attributes of grammar.
[Fig. 3.1.1 : Syntax directed translation in the front end — lexical analysis produces the token stream, syntax analysis the parse tree, semantic analysis the dependency graph ; from the dependency graph an evaluation order for the semantic rules is derived, giving the translation of constructs]
[Fig. 3.1.2 : Annotated parse tree for the expression (4 * 7 + 1) * 2 — id.lexval values 4, 7, 1 and 2 ; T.val = 4, F.val = 7, T.val = 28, F.val = 1, E.val = 29 inside the parentheses, and F.val = 2 at the right operand of the root]
Answer
Syntax directed translation : Refer Q. 3.1, Page 3–2C, Unit-3.
Semantic actions are attached with every node of annotated parse tree.
Example : A parse tree along with the values of the attributes at nodes
(called an “annotated parse tree”) for an expression 2 + 3*5 with synthesized
attributes is shown in the Fig. 3.2.1.
[Fig. 3.2.1 : Annotated parse tree for 2 + 3 * 5 with synthesized attribute val — E.val = 2, T.val = 15, root E.val = 17]
Answer
Attributes :
1. Attributes are associated information with language construct by
attaching them to grammar symbols representing that construct.
2. Attributes are associated with the grammar symbols that are the labels
of parse tree node.
3. An attribute can represent anything (reasonable) such as string, a
number, a type, a memory location, a code fragment etc.
4. The value of an attribute at parse tree node is defined by a semantic rule
associated with the production used at that node.
Synthesized attribute :
1. An attribute at a node is said to be synthesized if its value is computed
from the attributed values of the children of that node in the parse tree.
2. A syntax directed definition that uses synthesized attributes exclusively
is said to be an S-attributed definition.
3. Thus, a parse tree for S-attributed definition can always be annotated
by evaluating the semantic rules for the attributes at each node from
leaves to root.
4. If the translations are specified using S-attributed definitions, then the
semantic rules can be conveniently evaluated by the parser itself during
the parsing.
For example : A parse tree along with the values of the attributes at
nodes (called an “annotated parse tree”) for an expression 2 + 3*5 with
synthesized attributes is shown in the Fig. 3.3.1.
[Fig. 3.3.1 : Annotated parse tree for 2 + 3 * 5 — E.val = 2, T.val = 15, root E.val = 17]
Inherited attribute :
1. An inherited attribute is one whose value at a node in a parse tree is
defined in terms of attributes at the parent and/or sibling of that node.
2. Inherited attributes are convenient for expressing the dependence of a
programming language construct.
For example : Syntax directed definitions that uses inherited attribute
are given as :
D → TL : L.type := T.type
T → int : T.type := integer
T → real : T.type := real
L → L1, id : L1.type := L.type ; enter(id.ptr, L.type)
L → id : enter(id.ptr, L.type)
The parse tree, along with the attribute values at the parse tree nodes, for an
input string int id1, id2, id3 is shown in :
[Fig. 3.3.2 : parse tree for int id1, id2, id3 — T.type = int, and L.type = int is inherited down the chain of L nodes, entering each of id1, id2 and id3 with type int]
PART-2
Intermediate Code, Postfix Notation, Parse Trees and Syntax Trees.
Questions-Answers
Answer
Intermediate code generation is the fourth phase of compiler which takes
parse tree as an input from semantic phase and generates an intermediate
code as output.
The benefits of intermediate code are :
1. Intermediate code is machine independent, which makes it easy to
retarget the compiler to generate code for newer and different processors.
2. Intermediate code is nearer to the target machine as compared to the
source language so it is easier to generate the object code.
3. The intermediate code allows the machine independent optimization of
the code by using specialized techniques.
4. Syntax directed translation implements the intermediate code
generation, thus by augmenting the parser, it can be folded into the
parsing.
Answer
Postfix (reverse polish) translation : It is the type of translation in which
the operator symbol is placed after its two operands.
For example :
Consider the expression : (20 + (–5) * 6 + 12)
Postfix for the above expression can be calculated as :
(20 + t1 * 6 + 12) where t1 = 5 – (unary minus in postfix)
(20 + t2 + 12) where t2 = t1 6 *
(t3 + 12) where t3 = 20 t2 +
t4 where t4 = t3 12 +
Now putting the values of t4, t3, t2, t1 :
t4 = t3 12 +
= 20 t2 + 12 +
= 20 t1 6 * + 12 +
= 20 5 – 6 * + 12 +
Hence the postfix notation is : 20 5 – 6 * + 12 +
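The payoff of postfix form is that it can be evaluated in a single left-to-right scan with a stack and no parentheses. A minimal C sketch (an assumed example) evaluating the stream 20 5 – 6 * + 12 + from above, where 'm' stands for the unary minus :
#include <stdio.h>

int main(void)
{
    const char *toks[] = { "20", "5", "m", "6", "*", "+", "12", "+", 0 };
    long st[32]; int top = 0;                 /* evaluation stack */
    for (int i = 0; toks[i]; i++) {
        const char *t = toks[i];
        if (t[0] == 'm')      st[top-1] = -st[top-1];           /* unary minus */
        else if (t[0] == '*') { st[top-2] *= st[top-1]; top--; }
        else if (t[0] == '+') { st[top-2] += st[top-1]; top--; }
        else { long v; sscanf(t, "%ld", &v); st[top++] = v; }   /* operand */
    }
    printf("%ld\n", st[0]);   /* prints 2, i.e., 20 + (-5)*6 + 12 */
    return 0;
}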
Que 3.7. Define parse tree. Why parse tree construction is only
possible for CFG ?
Answer
Parse tree : A parse tree is an ordered tree in which left hand side of a
production represents a parent node and children nodes are represented by
the production’s right hand side.
Conditions for constructing a parse tree from a CFG are :
i. Each vertex of the tree must have a label. The label is a non-terminal or
terminal or null (ε).
ii. The root of the tree is the start symbol, i.e., S.
iii. The labels of the internal vertices are non-terminal symbols ∈ VN.
iv. If there is a production A → X1X2 .... Xk, then for a vertex with label A, the
children nodes will be X1X2 .... Xk.
v. A vertex n is called a leaf of the parse tree if its label is a terminal
symbol or null (ε).
Parse tree construction is only possible for CFG. This is because the properties
of a tree match with the properties of CFG.
Que 3.8. What is syntax tree ? What are the rules to construct
syntax tree for an expression ?
Answer
1. A syntax tree is a tree that shows the syntactic structure of a program
while omitting irrelevant details present in a parse tree.
2. Syntax tree is condensed form of the parse tree.
3. The operator and keyword nodes of a parse tree are moved to their
parent and a chain of single production is replaced by single link.
Rules for constructing a syntax tree for an expression :
1. Each node in a syntax tree can be implemented as a record with several
fields.
2. In the node for an operator, one field identifies the operator and the
remaining field contains pointer to the nodes for the operands.
3. The operator often is called the label of the node.
4. The following functions are used to create the nodes of syntax trees for
expressions with binary operators. Each function returns a pointer to
newly created node.
a. Mknode(op, left, right) : It creates an operator node with label op
and two fields containing pointers to left and right.
b. Mkleaf(id, entry) : It creates an identifier node with label id and a
field containing entry, a pointer to the symbol table entry for the
identifier.
c. Mkleaf(num, val) : It creates a number node with label num and a
field containing val, the value of the number.
[Fig. 3.8.1 : The syntax tree for a – 4 + c, built from mkleaf(id, entry for a), mkleaf(num, 4), mkleaf(id, entry for c) and two mknode calls for – and +]
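A hedged C sketch of these constructor functions (the record layout is an assumption, not the book's exact field format), building the tree of Fig. 3.8.1 for a – 4 + c bottom-up :
#include <stdio.h>
#include <stdlib.h>

typedef struct Node {
    const char *op;            /* label : operator, "id" or "num"     */
    struct Node *left, *right; /* operand pointers (NULL for leaves)  */
    const char *entry;         /* symbol-table entry for identifiers  */
    int val;                   /* value for numeric leaves            */
} Node;

Node *mknode(const char *op, Node *l, Node *r)
{
    Node *n = calloc(1, sizeof *n);           /* one record per node */
    n->op = op; n->left = l; n->right = r;
    return n;
}
Node *mkleaf_id(const char *entry)
{ Node *n = mknode("id", NULL, NULL); n->entry = entry; return n; }
Node *mkleaf_num(int val)
{ Node *n = mknode("num", NULL, NULL); n->val = val; return n; }

int main(void)
{
    Node *t = mknode("+",
                     mknode("-", mkleaf_id("a"), mkleaf_num(4)),
                     mkleaf_id("c"));
    printf("root label : %s\n", t->op);       /* prints + */
    return 0;
}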
Answer
Syntax tree for given expression : a * (b + c) – d/2
–
* /
a + d 2
b c
Fig. 3.9.1.
PART-3
Three Address Code, Quadruples and Triples.
Questions-Answers
Answer
1. Three address code is an abstract form of intermediate code that can be
implemented as a record with the address fields.
2. The general form of three address code representation is :
a := b op c
where a, b and c are operands that can be names, constants and op
represents the operator.
3. The operator can be fixed or floating point arithmetic operator or logical
operators or boolean valued data. Only single operation at right side of
the expression is allowed at a time.
4. At most three addresses are allowed (two for the operands and
one for the result). Hence the name of this representation is three address
code.
For example : The three address code for the expression a = b + c + d
will be :
t1 := b + c
t2 := t1 + d
a := t2
Here t1 and t2 are the temporary names generated by the compiler.
Que 3.11. What are different ways to write three address code ?
Answer
Different ways to write three address code are :
1. Quadruple representation :
a. The quadruple is a structure with at most four fields such as op,
arg1, arg2, result.
b. The op field is used to represent the internal code for operator, the
arg1 and arg2 represent the two operands used and result field is
used to store the result of an expression.
For example : Consider the input statement x := –a * b + –a * b.
The three address code is :
t1 := minus a
t2 := t1 * b
t3 := minus a
t4 := t3 * b
t5 := t2 + t4
x := t5
and the last two quadruples are (+, t2, t4, t5) and (:=, t5, , x).
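A minimal C sketch (an assumed record layout) of the quadruple structure, emitting the six quadruples above :
#include <stdio.h>

typedef struct { const char *op, *arg1, *arg2, *result; } Quad;

int main(void)
{
    Quad code[] = {
        { "uminus", "a",  "",   "t1" },
        { "*",      "t1", "b",  "t2" },
        { "uminus", "a",  "",   "t3" },
        { "*",      "t3", "b",  "t4" },
        { "+",      "t2", "t4", "t5" },
        { ":=",     "t5", "",   "x"  },
    };
    printf("%-8s %-6s %-6s %s\n", "op", "arg1", "arg2", "result");
    for (int i = 0; i < 6; i++)
        printf("%-8s %-6s %-6s %s\n", code[i].op, code[i].arg1,
               code[i].arg2, code[i].result);
    return 0;
}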
Que 3.12. Write the quadruples, triple and indirect triple for the
following expression :
(x + y) * (y + z) + (x + y + z)
AKTU 2018-19, Marks 07
Answer
The three address code for given expression :
t1 := x + y
t2 := y + z
t3 := t1* t2
t4 := t1 + z
t5 := t3 + t4
i. The quadruple representation :
Location Operator Operand 1 Operand 2 Result
(1) + x y t1
(2) + y z t2
(3) * t1 t2 t3
(4) + t1 z t4
(5) + t3 t4 t5
Que 3.13. Generate three address code for the following code :
switch a + b
{
case 1 : x = x + 1
case 2 : y = y + 2
case 3 : z = z + 3
default : c = c – 1
} AKTU 2015-16, Marks 10
Answer
101 : t = a + b
102 : goto 103
103 : if t = 1 goto 105
104 : goto 107
105 : t2 = x + 1
106 : x = t2
107 : if t = 2 goto 109
108 : goto 111
109 : t3 = y + 2
110 : y = t3
111 : if t = 3 goto 113
112 : goto 115
113 : t4 = z + 3
114 : z = t4
115 : t5 = c – 1
116 : c = t5
117 : Next statement
Answer
Given : low1 = 1 and low2 = 1, n1 = 10, n2 = 20.
B[i, j] = ((i × n2) + j) × w + (base – ((low1 × n2) + low2) × w)
B[i, j] = ((i × 20) + j) × 4 + (base – ((1 × 20) + 1) × 4)
B[i, j] = 4 × (20 i + j) + (base – 84)
Similarly, A[i, j] = 4 × (20 i + j) + (base – 84)
and, D[i, j] = 4 × (20 i + j) + (base – 84)
Hence, C = A[i, j] + B[i, j] + D[i, j]
= [4 × (20 i + j) + (base – 84)] × 3
= 12 × (20 i + j) + (base – 84) × 3
Therefore, three address code will be
t1 = 20 × i
t2 = t1 + j
t3 = base – 84
t4 = 12 × t2
t5 = 3 × t3
t6 = t4 + t5
PART-4
Translation of Assignment Statements.
Questions-Answers
Que 3.15. How would you convert the following into intermediate
code ? Give a suitable example.
i. Assignment statements
ii. Case statements AKTU 2016-17, Marks 15
Answer
i. Assignment statements :
S → id := E { id_entry := look_up(id.name);
if id_entry ≠ nil then
append (id_entry ‘:=’ E.place)
else error; /* id not declared */
}
E → E1 + E2 { E.place := newtemp();
append (E.place ‘:=’ E1.place ‘+’ E2.place)
}
E → E1 * E2 { E.place := newtemp();
append (E.place ‘:=’ E1.place ‘*’ E2.place)
}
E → – E1 { E.place := newtemp();
append (E.place ‘:=’ ‘minus’ E1.place)
}
E → id { id_entry := look_up(id.name);
if id_entry ≠ nil then
E.place := id_entry
else error; /* id not declared */
}
1. The look_up returns the entry for id.name in the symbol table if it exists
there.
2. The function append is used for appending the three address code to the
output file. Otherwise, an error will be reported.
3. Newtemp() is the function used for generating new temporary variables.
4. E.place is used to hold the value of E.
Example : x := (a + b) * (c + d)
We will assume all these identifiers are of the same type. With a
bottom-up parse, the scheme emits :
t1 := a + b
t2 := c + d
t3 := t1 * t2
x := t3
ii. Case statements : The general form of a switch/case statement is :
switch expression
{
case value : statement
case value : statement
...
case value : statement
default : statement
}
Example :
switch(ch)
{
case 1 : c = a + b;
break;
case 2 : c = a – b;
break;
}
The three address code can be
if ch = 1 goto L1
if ch = 2 goto L2
L1 : t1 := a + b
c := t1
goto last
L2 : t2 := a – b
c := t2
goto last
last :
Answer
1. Boolean expressions are used along with if-then, if-then-else,
while-do and do-while statement constructs.
2. S → if E then S1 | if E then S1 else S2 | while E do S1 | do S1 while E
3. In all these statements, E corresponds to a boolean expression to be evaluated.
4. This expression E should be converted to three address code.
5. This is then integrated in the context of the control statement.
Translation procedure for if-then and if-then-else statement :
1. Consider a grammar for if-else
S → if E then S1 | if E then S1 else S2
2. Syntax directed translation scheme for if-then is given as follows :
S → if E then S1
E.true := new_label()
E.false := S.next
S1.next := S.next
For example, for if a < b then a := a + 5 else a := a + 7, the generated code has the shape :
if a < b goto E.true
goto E.false
E.true : a := a + 5
E.false : a := a + 7
PART-5
Boolean Expressions, Statements that alter the Flow of Control.
Questions-Answers
Answer
1. Backpatching is the activity of filling up unspecified information of labels
using appropriate semantic actions during the code generation process.
2. Backpatching refers to the process of resolving forward branches that
have been used in the code, when the value of the target becomes
known.
3. Backpatching is done to overcome the problem of processing the
incomplete information in one pass.
4. Backpatching can be used to generate code for boolean expressions and
flow of control statements in one pass.
To generate code using backpatching, the following functions are used :
1. Makelist(i) : Makelist is a function which creates a new list from one
item where i is an index into the array of instructions.
2. Merge(p1, p2) : Merge is a function which concatenates the lists pointed
by p1 and p2, and returns a pointer to the concatenated list.
3. Backpatch(p, i) : Backpatch is a function which inserts i as the target
label for each of the instructions on the list pointed to by p.
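A minimal C sketch of these list-handling functions is given below (the shape of the quad structure and the code[] array are illustrative assumptions) :

#include <stdlib.h>

struct quad { int result; };            /* only the field that backpatching fills */
struct quad code[100];                  /* the emitted instruction array (assumed) */
struct list { int instr; struct list *next; };

struct list *makelist(int i)            /* a list holding the single instruction i */
{
    struct list *p = malloc(sizeof *p);
    p->instr = i;
    p->next = NULL;
    return p;
}

struct list *merge(struct list *p1, struct list *p2)   /* concatenate two lists */
{
    struct list *t = p1;
    if (p1 == NULL) return p2;
    while (t->next != NULL) t = t->next;
    t->next = p2;
    return p1;
}

void backpatch(struct list *p, int target)  /* fill target into every listed goto */
{
    for (; p != NULL; p = p->next)
        code[p->instr].result = target;
}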
Answer
The translation scheme for boolean expressions can be understood through the
following example.
Consider the boolean expression generated by the following grammar :
E → E OR E
E → E AND E
E → NOT E
E → (E)
E → id relop id
E → TRUE
E → FALSE
Here the relop is denoted by ≤, ≥, =, <, >. OR and AND are left associative.
NOT has the highest precedence, then AND, and lastly OR.
The translation scheme for boolean expressions having numerical
representation is as given below :
The translation scheme (using the append and newtemp conventions introduced
earlier; nextstat denotes the index of the next generated statement) :
E → E1 OR E2 { E.place := newtemp();
append (E.place ‘:=’ E1.place ‘OR’ E2.place) }
E → E1 AND E2 { E.place := newtemp();
append (E.place ‘:=’ E1.place ‘AND’ E2.place) }
E → NOT E1 { E.place := newtemp();
append (E.place ‘:=’ ‘NOT’ E1.place) }
E → (E1) { E.place := E1.place }
E → id1 relop id2 { E.place := newtemp();
append (‘if’ id1.place relop.op id2.place ‘goto’ nextstat + 3);
append (E.place ‘:=’ ‘0’);
append (‘goto’ nextstat + 2);
append (E.place ‘:=’ ‘1’) }
E → TRUE { E.place := newtemp(); append (E.place ‘:=’ ‘1’) }
E → FALSE { E.place := newtemp(); append (E.place ‘:=’ ‘0’) }
PART-6
Postfix Translation : Array References in Arithmetic Expressions.
Questions-Answers
Answer
1. In a production A → α, the translation rule for A.CODE consists of the
concatenation of the CODE translations of the non-terminals in α, in the
same order as the non-terminals appear in α.
2. Productions can be factored to achieve postfix form.
Postfix translation of while statement :
Production : S → while M1 E do M2 S1
Can be factored as :
1. S → C S1
2. C → W E do
3. W → while
A suitable translation scheme is given as :
Answer
Postfix notation : Refer Q. 3.6, Page 3–6C, Unit-3.
Numerical : Syntax directed translation scheme to specify the translation of
an expression into postfix notation are as follow :
Production :
E → E1 + T
E → T
T → T1 × F
T → F
F → (E)
F → id
Schemes :
E.code = E1.code || T.code || ‘+’
E.code = T.code
T.code = T1.code || F.code || ‘×’
T.code = F.code
F.code = E.code
F.code = id
where ‘||’ sign is used for concatenation.
PART-7
Procedures Call.
Questions-Answers
Answer
Procedures call :
1. Procedures are an important and frequently used programming construct
that a compiler must handle.
2. It is used to generate code for procedure calls and returns.
3. Queue is used to store the list of parameters in the procedure call.
4. The translation for a call includes a sequence of actions taken on entry
and exit from each procedure. Following actions take place in a calling
sequence :
a. When a procedure call occurs then space is allocated for activation
record.
b. Evaluate the argument of the called procedure.
c. Establish the environment pointers to enable the called procedure
to access data in enclosing blocks.
d. Save the state of the calling procedure so that it can resume execution
after the call.
e. Also save the return address. It is the address of the location to
which the called routine must transfer after it is finished.
f. Finally generate a jump to the beginning of the code for the called
procedure.
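As an illustration, a call such as p(a1, a2) is commonly translated into three address code of the following shape (a sketch of the usual param/call convention, not a fixed format) :
param a1
param a2
call p, 2
The param statements queue the evaluated arguments, and call transfers control, its second operand recording the number of parameters.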
Answer
1. An array is a collection of elements of similar data type. Here, we assume
the static allocation of array, whose subscripts ranges from one to some
limit known at compile time.
2. If the width of each array element is w, then the ith element of array A
begins in location
base + (i – low) * w
where low is the lower bound on the subscript and base is the relative
address of the storage allocated for the array, i.e., base is the relative
address of A[low] (a numeric instance follows Fig. 3.22.1).
3. A two dimensional array is normally stored in one of two forms, either
row-major (row by row) or column-major (column by column).
4. The two layouts are shown in Fig. 3.22.1 :
Row-major order : A[1, 1], A[1, 2], A[1, 3], A[2, 1], A[2, 2], A[2, 3]
Column-major order : A[1, 1], A[2, 1], A[1, 2], A[2, 2], A[1, 3], A[2, 3]
Fig. 3.22.1.
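As a numeric instance of the addressing formula in point 2 (with illustrative values base = 1000, low = 1 and w = 4), the element A[5] begins at 1000 + (5 – 1) * 4 = 1016.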
PART-8
Declarations Statements.
Questions-Answers
Answer
In declarative statements, the data items along with their data types are
declared.
For example :
S → D {offset := 0}
D → id : T {enter_tab(id.name, T.type, offset);
offset := offset + T.width}
T → integer {T.type := integer;
T.width := 4}
T → real {T.type := real;
T.width := 8}
T → array[num] of T1 {T.type := array(num.val, T1.type);
T.width := num.val × T1.width}
T → *T1 {T.type := pointer(T1.type);
T.width := 4}
1. Initially, the value of offset is set to zero. The computation of offset can
be done by using the formula offset = offset + width.
2. In the above translation scheme, T.type, T.width are the synthesized
attributes. The type indicates the data type of corresponding identifier
and width is used to indicate the memory units associated with an
identifier of corresponding type. For instance integer has width 4 and
real has 8.
3. The rule D → id : T is a declarative statement for an id declaration. The
enter_tab function creates the symbol table entry for the identifier along
with its type and offset.
4. The width of array is obtained by multiplying the width of each element
by number of elements in the array.
5. The width of pointer types is taken to be 4.
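For instance (an illustrative trace of the scheme above), for the declarations x : integer; y : real, the scheme enters x with offset 0 and width 4, then y with offset 4 and width 8, leaving offset = 12 for the next declaration.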
4
UNIT
Symbol Tables
CONTENTS
Part-1 : Symbol Tables : .......................................... 4–2C to 4–7C
Data Structure for
Symbol Tables
PART-1
Symbol Tables : Data Structure for Symbol Tables.
Questions-Answers
Answer
1. A symbol table is a data structure used by a compiler to keep track of
scope, life and binding information about names.
2. This information is used in the source program to identify the various
program elements, like variables, constants, procedures, and the labels
of statements.
3. A symbol table must have the following capabilities :
a. Lookup : To determine whether a given name is in the table.
b. Insert : To add a new name (a new entry) to the table.
c. Access : To access the information related with the given name.
d. Modify : To add new information about a known name.
e. Delete : To delete a name or group of names from the table.
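A minimal linked-list symbol table supporting lookup and insert can be sketched in C as follows (the field widths and names are illustrative assumptions; a real table also records scope and other attributes) :

#include <stdlib.h>
#include <string.h>

struct sym {
    char name[32];          /* the identifier */
    char type[16];          /* its type information */
    struct sym *next;
};
static struct sym *head = NULL;

struct sym *lookup(const char *name)    /* is the given name in the table ? */
{
    for (struct sym *p = head; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

struct sym *insert(const char *name, const char *type)  /* add a new entry */
{
    struct sym *p = malloc(sizeof *p);
    strcpy(p->name, name);              /* no bounds check : a sketch only */
    strcpy(p->type, type);
    p->next = head;
    head = p;
    return p;
}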
Que 4.2. What are the symbol table requirements ? What are the
demerits in the uniform structure of symbol table ?
Answer
The basic requirements of a symbol table are as follows :
1. Structural flexibility : Based on the usage of identifier, the symbol
table entries must contain all the necessary information.
2. Fast lookup/search : The table lookup/search depends on the
implementation of the symbol table and the speed of the search should
be as fast as possible.
3. Efficient utilization of space : The symbol table must be able to
grow or shrink dynamically for an efficient usage of space.
4. Ability to handle language characteristics : The characteristic of
a language such as scoping and implicit declaration needs to be handled.
Demerits in uniform structure of symbol table :
1. The uniform structure cannot handle a name whose length exceeds the
upper bound (limit) of the name field.
2. If the length of a name is small, then the remaining space is wasted.
Answer
1. The symbol table is searched (looked up) every time a name is
encountered in the source text.
2. When a new name or new information about an existing name is
discovered, the content of the symbol table changes.
3. Therefore, a symbol table must have an efficient mechanism for accessing
the information held in the table as well as for adding new entries to the
symbol table.
4. In any case, the symbol table is a useful abstraction to aid the compiler
to ascertain and verify the semantics, or meaning of a piece of code.
5. It makes the compiler more efficient, since the file does not need to be
re-parsed to discover previously processed information.
For example : Consider the following outline of a C function :
void scopes ( )
{
int a, b, c; /* level 1 */
.......
{
int a, b; /* level 2 */
....
}
{
float c, d; /* level 3 */
{
int m; /* level 4 */
.....
}
}
}
The symbol table could be represented by an upwards growing stack as :
i. Initially the symbol table is empty.
ii. After the first three declarations, the symbol table will be
c int
b int
a int
iii. After the declarations of Level 2 :
b int
a int
c int
b int
a int
Que 4.4. What is the role of symbol table ? Discuss different data
structures used for symbol table.
OR
Discuss the various data structures used for symbol table with
suitable example.
Answer
Role of symbol table :
1. It keeps the track of semantics of variables.
2. It stores information about scope.
3. It helps to achieve compile time efficiency.
Different data structures used in implementing symbol table are :
1. Unordered list :
a. Simple to implement symbol table.
b. It is implemented as an array or a linked list.
c. Linked list can grow dynamically that eliminate the problem of a
fixed size array.
d. Insertion of a variable takes O(1) time, but lookup is slow for large
tables, i.e., O(n).
2. Ordered list :
a. If an array is sorted, it can be searched using binary search in
O(log2 n) time.
b. Insertion into a sorted array is expensive : it takes O(n) time on
average.
c. Ordered list is useful when set of names is known i.e., table of
reserved words.
3. Search tree :
a. Insertion and lookup in a search tree are done in logarithmic time.
b. A search tree is kept balanced using AVL or red-black tree
algorithms.
4. Hash tables and hash functions :
a. A hash function translates each name into a value in a fixed range,
called its hash value, which indexes the hash table.
b. Hash table can be used to minimize the movement of elements in
the symbol table.
c. The hash function helps in uniform distribution of names in symbol
table.
For example : Consider a part of C program
int x, y;
msg ( );
1. Unordered list :
Name Type
x int
y int
msg function
2. Ordered list :
Name Type
msg function
x int
y int
3. Search tree :
x
msg y
4. Hash table :
[Fig. : A hash table whose entries point into a storage table; each storage-table record holds a Name, its Data, and a Link to the next name with the same hash value.]
Que 4.5. Describe symbol table and its entries. Also, discuss
various data structure used for symbol table.
AKTU 2015-16, Marks 10
Answer
Symbol table : Refer Q. 4.1, Page 4–2C, Unit-4.
Entries in the symbol table are as follows :
1. Variables :
a. Variables are identifiers whose value may change between
executions and during a single execution of a program.
b. They represent the contents of some memory location.
c. The symbol table needs to record both the variable name as well as
its allocated storage space at runtime.
2. Constants :
a. Constants are identifiers that represent a fixed value that can never
be changed.
b. Unlike variables or procedures, no runtime location needs to be
stored for constants.
c. These are typically placed right into the code stream by the compiler
at compilation time.
3. Types (user defined) :
a. A user defined type is combination of one or more existing types.
b. Types are accessed by name and reference a type definition
structure.
4. Classes :
a. Classes are abstract data types which restrict access to its members
and provide convenient language level polymorphism.
b. This includes the location of the default constructor and destructor,
and the address of the virtual function table.
5. Records :
a. Records represent a collection of possibly heterogeneous members
which can be accessed by name.
b. The symbol table probably needs to record each of the record’s
members.
Various data structure used for symbol table : Refer Q. 4.4, Page 4–4C,
Unit-4.
PART-2
Representing Scope Information.
Questions-Answers
Answer
1. Scope information characterizes the declaration of identifiers and the
portions of the program where it is allowed to use each identifier.
2. Different languages have different scopes for declarations. For example,
in FORTRAN, the scope of a name is a single subroutine, whereas in
ALGOL-like block structured languages the scope of a name is the block
or procedure in which it is declared.
Answer
1. Scoping is a method of keeping variables in different parts of a program
distinct from one another.
2. Scoping is generally divided into two classes :
a. Static scoping : Static scoping is also called lexical scoping. In this
scoping a variable always refers to its top level environment.
b. Dynamic scoping : In dynamic scoping, a global identifier refers
to the identifier associated with the most recent environment.
Answer
Difference : Refer Q. 4.8, Page 4–9C, Unit-4.
Access to non-local names in static scope :
1. Static chain is the mechanism to implement non-local names (variable)
access in static scope.
2. A static chain is a chain of static links that connects certain activation
record instances in the stack.
3. The static link, static scope pointer, in an activation record instance for
subprogram A points to one of the activation record instances of A’s
static parent.
[Fig. : Static chains on the control stack for the call sequence A calls E, E calls B, B calls D, D calls C; each static link points to an activation record instance of the subprogram's static parent.]
PART-3
Run-Time Administration : Implementation of Simple Stack
Allocation Scheme.
Questions-Answers
Answer
1. Activation record is used to manage the information needed by a single
execution of a procedure.
2. An activation record is pushed into the stack when a procedure is called
and it is popped when the control returns to the caller function.
Answer
Sub-division of run-time memory into codes and data areas is shown in
Fig. 4.11.1.
Code
Static
Heap
Free Memory
Stack
Fig. 4.11.1.
1. Code : It stores the executable target code, which is of fixed size and
does not change during execution.
2. Static allocation :
a. The static allocation is for all the data objects at compile time.
b. The size of the data objects is known at compile time.
c. The names of these objects are bound to storage at compile time
only and such an allocation of data objects is done by static allocation.
d. In static allocation, the compiler can determine amount of storage
required by each data object. Therefore, it becomes easy for a
compiler to find the address of these data in the activation record.
e. At compile time, compiler can fill the addresses at which the target
code can find the data on which it operates.
3. Heap allocation : There are two methods used for heap management :
a. Garbage collection method :
i. When all access paths to an object are destroyed but the data object
continues to exist, such an object is said to be garbage.
ii. The garbage collection is a technique which is used to reuse
that object space.
iii. In garbage collection, all the elements whose garbage collection
bit is ‘on’ are garbaged and returned to the free space list.
b. Reference counter :
i. Reference counter attempt to reclaim each element of heap
storage immediately after it can no longer be accessed.
ii. Each memory cell on the heap has a reference counter
associated with it that contains a count of number of values
that point to it.
iii. The count is incremented each time a new value points to the
cell and decremented each time a value ceases to point to it (a
small sketch of this appears after this list).
4. Stack allocation :
a. Stack allocation is used to store data structure called activation
record.
b. The activation records are pushed and popped as activations begin
and end, respectively.
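A small C sketch of the reference counter operations described in 3(b) above (the cell layout and the free_cell helper are assumptions for illustration) :

#include <stdlib.h>

struct cell { int refcount; /* data fields omitted */ };

static void free_cell(struct cell *c) { free(c); }  /* reclaim the cell's storage */

void acquire(struct cell *c) { c->refcount++; }     /* a new value points to c */

void release(struct cell *c)                        /* a value stops pointing to c */
{
    if (--c->refcount == 0)
        free_cell(c);       /* no references remain : reclaim immediately */
}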
Answer
Run-time storage management is required because :
1. A program needs memory resources to execute instructions.
2. The storage management must connect to the data objects of programs.
3. It takes care of memory allocation and deallocation while the program is
being executed.
Simple stack implementation is implemented as :
1. In stack allocation strategy, the storage is organized as stack. This stack
is also called control stack.
2. As an activation begins, its activation record is pushed onto the stack,
and on completion of the activation the corresponding record is popped.
3. The locals are stored in the each activation record. Hence, locals are
bound to corresponding activation record on each fresh activation.
4. The data structures can be created dynamically for stack allocation.
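Schematically, the control stack of point 1 can be sketched in C as follows (the record fields and sizes are illustrative assumptions) :

struct activation {
    int ret_addr;        /* where control resumes in the caller */
    int locals[8];       /* locals bound afresh on each activation */
};
struct activation stack[100];    /* the control stack */
int top = -1;

void push_activation(int ret_addr)   /* on procedure call */
{
    stack[++top].ret_addr = ret_addr;
}

void pop_activation(void)            /* on return to the caller */
{
    --top;
}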
Answer
i. Call by name :
1. In call by name, the actual parameters are substituted for formals
in all the places where formals occur in the procedure.
PART-4
Storage Allocation in Block Structured Language.
Questions-Answers
Answer
1. Hashing is an important technique used to search the records of symbol
table. This method is superior to list organization.
2. In hashing scheme, a hash table and symbol table are maintained.
3. The hash table consists of k entries from 0, 1 to k – 1. These entries are
basically pointers to symbol table pointing to the names of symbol table.
4. To determine whether a Name is in the symbol table, we use a hash
function h such that h(name) yields an integer between 0 and k – 1. We
can search for any name by position = h(name).
5. Using this position, we can obtain the exact locations of name in symbol
table.
6. The hash table and symbol table are shown in Fig. 4.14.1.
[Fig. 4.14.1 : A hash table whose entries point into the symbol table; names such as Sum, i, j and avg that hash to the same value are chained through the hash link field.]
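A hash function h of the kind used in point 4 can be sketched in C as follows (the multiplier 31 is an arbitrary illustrative choice) :

int h(const char *name, int k)       /* maps a name to an integer in 0 .. k-1 */
{
    unsigned sum = 0;
    while (*name != '\0')
        sum = sum * 31 + (unsigned char)*name++;
    return (int)(sum % (unsigned)k);
}
/* position = h(name, k); the chain at hash_table[position] is then searched. */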
PART-5
Error Detection and Recovery : Lexical Phase Errors, Syntactic
Phase Errors, Semantic Errors.
Questions-Answers
Que 4.15. Define error recovery. What are the properties of error
message ? Discuss the goals of error handling.
Answer
Error recovery : Error recovery is an important feature of any compiler,
through which the compiler can continue to read and process the complete
program even if it has some errors.
Que 4.16. What are lexical phase errors, syntactic phase errors
and semantic phase errors ? Explain with suitable example.
AKTU 2015-16, Marks 10
Answer
1. Lexical phase error :
a. A lexical phase error is a sequence of characters that does not match
the pattern of any token, i.e., while scanning the source program, the
compiler may not be able to form a valid token from the source program.
b. Reasons due to which errors are found in lexical phase are :
i. The addition of an extraneous character.
ii. The removal of a character that should be present.
iii. The replacement of a character with an incorrect character.
iv. The transposition of two characters.
For example :
i. In FORTRAN, an identifier more than 7 characters long is a
lexical error.
ii. In a Pascal program, occurrence of the characters ~, & or @ is a
lexical error.
2. Syntactic phase errors (syntax error) :
a. Syntactic errors are those errors which occur due to the mistake
done by the programmer during coding process.
OR
Explain logical phase error and syntactic phase error. Also suggest
methods for recovery of error. AKTU 2017-18, Marks 10
Answer
Lexical and syntactic error : Refer Q. 4.16, Page 4–17C, Unit-4.
Various error recovery methods are :
1. Panic mode recovery :
a. This is the simplest method to implement and used by most of the
parsing methods.
b. When the parser detects an error, it discards input symbols one at a
time until one of a designated set of synchronizing tokens is found.
c. Panic mode correction often skips a considerable amount of input
without checking it for additional errors, but it is guaranteed not to
go into an infinite loop.
For example :
Let us consider a piece of code :
a = b + c;
d = e + f;
Using panic mode, the parser skips a = b + c without checking it for
further errors.
2. Phrase-level recovery :
a. When the parser detects an error, it may perform local
correction on the remaining input.
b. It may replace a prefix of the remaining input by some string that
allows parser to continue.
c. A typical local correction would replace a comma by a semicolon,
delete an extraneous semicolon or insert a missing semicolon.
For example :
Let us consider a piece of code :
while (x > 0) y = a + b;
Here phrase-level recovery performs a local correction by adding
‘do’, and parsing is continued.
Answer
a. Missing operand
b. Unbalanced right parenthesis
c. Missing right parenthesis
d. Missing operators
5
UNIT
Code Generation
CONTENTS
Part-1 : Code Generation : ...................................... 5–2C to 5–3C
Design Issues
PART-1
Code Generation : Design Issues.
Questions-Answers
Answer
1. Code generation is the final phase of compiler.
2. It takes as input the Intermediate Representation (IR) produced by the
front end of the compiler, along with relevant symbol table information,
and produces as output a semantically equivalent target program as
shown in Fig. 5.1.1.
PART-2
The Target Language, Addresses in the Target Code.
Questions-Answers
Answer
1. Addresses in the target code show how names in the IR can be converted
into addresses in the target code by looking at code generation for
simple procedure calls and returns using static and stack allocation.
2. Addresses in the target code represent executing program runs in its
own logical address space that was partitioned into four code and data
areas :
a. A statically determined area code that holds the executable target
code. The size of the target code can be determined at compile
time.
PART-3
Basic Blocks and Flow Graphs, Optimization of Basic
Blocks, Code Generator.
Questions-Answers
Answer
The algorithm for construction of basic block is as follows :
Input : A sequence of three address statements.
Output : A list of basic blocks with each three address statements in exactly
one block.
Method :
1. We first determine the set of leaders, the first statement of basic block.
The rules we use are given as :
a. The first statement is a leader.
b. Any statement which is the target of a conditional or unconditional
goto is a leader.
c. Any statement which immediately follows a conditional goto is a
leader.
2. For each leader construct its basic block, which consist of leader and all
statements up to the end of program but not including the next leader.
Any statement not placed in the block can never be executed and may
now be removed, if desired.
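A minimal C sketch of the leader-marking rules above (the instruction representation is an assumption for illustration) :

struct instr { int is_goto; int target; };   /* assumed shape of a TAC statement */

void mark_leaders(const struct instr *code, int n, int *leader)
{
    for (int i = 0; i < n; i++) leader[i] = 0;
    if (n > 0) leader[0] = 1;                    /* rule (a) : first statement */
    for (int i = 0; i < n; i++)
        if (code[i].is_goto) {
            leader[code[i].target] = 1;          /* rule (b) : target of a goto */
            if (i + 1 < n) leader[i + 1] = 1;    /* rule (c) : statement after it */
        }
}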
Answer
1. A flow graph is a directed graph in which the flow control information is
added to the basic blocks.
2. The nodes to the flow graph are represented by basic blocks.
3. The block whose leader is the first statement is called initial blocks.
4. There is a directed edge from block Bi – 1 to block Bi if Bi immediately
follows Bi – 1 in the given sequence. We can say that Bi – 1 is a predecessor
of Bi.
For example : Consider the three address code as :
1. prod := 0 2. i := 1
3. t1 := 4 * i 4. t2 := a[t1] /* computation of a[i] */
5. t3 := 4 * i 6. t4 := b[t3] /* computation of b[i] */
7. t5 := t2 * t4 8. t6 := prod + t5
9. prod := t6 10. t7 := i + 1
11. i := t7 12. if i <= 10 goto (3)
The flow graph for the given code can be drawn as follows :
Block B1 (the initial block) :
prod := 0
i := 1
Block B2 :
t1 := 4 * i
t2 := a[t1]
t3 := 4 * i
t4 := b[t3]
t5 := t2 * t4
t6 := prod + t5
prod := t6
t7 := i + 1
i := t7
if i <= 10 goto (3)
Answer
Loop is a collection of nodes in the flow graph such that :
1. All such nodes are strongly connected i.e., there is always a path from
any node to any other node within that loop.
2. The collection of nodes has unique entry. That means there is only one
path from a node outside the loop to the node inside the loop.
3. The loop that contains no other loop is called inner loop.
Following term constitute a loop in flow graph :
1. Dominators :
a. In control flow graphs, a node d dominates a node n if every path
from the entry node to n must go through d. This is denoted as d
dom n.
b. By definition, every node dominates itself.
c. A node d strictly dominates a node n if d dominates n and d is not
equal to n.
d. The immediate dominator (or idom) of a node n is the unique node
that strictly dominates n but does not strictly dominate any other
node that strictly dominates n. Every node, except the entry node,
has an immediate dominator.
e. A dominator tree is a tree where each node’s children are those
nodes it immediately dominates. Because the immediate dominator
is unique, it is a tree. The start node is the root of the tree.
2. Natural loops :
a. A natural loop is defined by a back edge n → d : it consists of the
collection of all the nodes that can reach n without going through
d, together with d itself.
b. A loop in a flow graph can be denoted by an edge n → d such that d dom n.
c. Such edges are called back edges, and a loop can have
more than one back edge.
d. If there is an edge p → q, then q is the head and p is the tail, and the
head dominates the tail.
3. Pre-header :
a. The pre-header is a new block created such that successor of this
block is the header block.
b. All the computations that can be made before the header block
can be made before the pre-header block.
[Fig. : A pre-header block B0 inserted so that its only successor is the loop header.]
Answer
Different issues in code optimization are :
1. Function preserving transformation : The function preserving
transformations are basically divided into following types :
a. Common sub-expression elimination :
i. A common sub-expression is nothing but the expression which
is already computed and the same expression is used again
and again in the program.
ii. If the result of the expression not changed then we eliminate
computation of same expression again and again.
For example :
Before common sub-expression elimination :
a = t * 4 – b + c;
........................
........................
m = t * 4 – b + c;
........................
........................
n = t * 4 – b + c;
After common sub-expression elimination :
temp = t * 4 – b + c;
a = temp;
........................
........................
m = temp;
........................
........................
n = temp;
iii. In the given example, the expression t * 4 – b + c occurs several
times, so it is computed once and its value stored in the
variable temp.
b. Dead code elimination :
i. Dead code is code which can be omitted from the program
without changing the result.
ii. A variable is live when its value is used later in the program;
otherwise it is dead, because that variable is never used again
and is therefore useless.
iii. Dead code is usually not introduced intentionally by the
programmer.
For example :
#define False 0
if (False)
{
........................
........................
}
iv. Since False is guaranteed to be zero, the code inside the ‘if’
statement will never be executed. So, there is no need to generate or
write code for this statement because it is dead code.
c. Copy propagation :
i. Copy propagation is the concept of copying the result of a
common sub-expression and using it in the program.
ii. In this technique, the value of a variable is propagated and the
computation of an expression is done at compilation time.
For example :
pi = 3.14;
r = 5;
Area = pi * r * r;
Here at the compilation time the value of pi is replaced by 3.14
and r by 5.
d. Constant folding (compile time evaluation) :
i. Constant folding is defined as replacement of the value of one
constant in an expression by equivalent constant value at the
compile time.
ii. In constant folding all operands in an operation are constant.
Original evaluation can also be replaced by result which is also
a constant.
For example : a = 3.14157/2 can be replaced by a = 1.570785
thereby eliminating a division operation.
2. Algebraic simplification :
a. Peephole optimization is an effective technique for algebraic
simplification.
b. The statements such as
x:=x+0
or x:=x*1
can be eliminated by peephole optimization.
Answer
Transformation :
1. A number of transformations can be applied to basic block without
changing set of expression computed by the block.
2. Transformation helps us in improving quality of code and act as optimizer.
3. There are two important classes as local transformation that can be
applied to the basic block :
a. Structure preserving transformation : They are as follows :
i. Common sub-expression elimination : Refer Q. 5.6,
Page 5–7C, Unit-5.
ii. Dead code elimination : Refer Q. 5.6, Page 5–7C, Unit-5.
iii. Interchange of statement : Suppose we have a block with
the two adjacent statements,
temp1 = a + b
temp2 = m + n
Then we can interchange the two statements without affecting
the value of the block if and only if neither m nor n is the
temporary temp1 and neither a nor b is the temporary temp2.
Thus a normal-form basic block permits all statement
interchanges that are possible.
b. Algebraic transformation : Refer Q. 5.6, Page 5–7C, Unit-5.
PART-4
Machine Independent Optimizations, Loop Optimization.
Questions-Answers
Answer
Code optimization :
1. The code optimization refers to the techniques used by the compiler to
improve the execution efficiency of generated object code.
2. It involves a complex analysis of intermediate code and performs various
transformations but every optimizing transformation must also
preserve the semantic of the program.
Classification of code optimization :
Code optimization
c. The use of intermix instructions along with the data increases the
speed of execution.
2. Machine independent : The machine independent optimization can
be achieved using following criteria :
a. The code should be analyzed completely and use alternative
equivalent sequence of source code that will produce a minimum
amount of target code.
b. Use appropriate program structure in order to improve the
efficiency of target code.
c. By eliminating the unreachable code from the source program.
d. Move two or more identical computations at one place and make
use of the result instead of each time computing the expressions.
Answer
Following term constitute a loop in flow graph : Refer Q. 5.5,
Page 5–5C, Unit-5.
Loop optimization is the process of reducing the execution time and the
overhead associated with loops.
The loop optimization is carried out by following methods :
1. Code motion :
a. Code motion is a technique which moves the code outside the
loop.
b. If some expression in the loop whose result remains unchanged
even after executing the loop for several times, then such an
expression should be placed just before the loop (i.e., outside the
loop).
c. Code motion is done to reduce the execution time of the program.
2. Induction variables :
a. A variable x is called an induction variable of loop L if its value
changes on every iteration :
b. it is either decremented or incremented by some constant.
3. Reduction in strength :
a. In strength reduction technique the higher strength operators
can be replaced by lower strength operators.
b. The strength of certain operator is higher than other.
c. The strength reduction is not applied to the floating point
expressions because it may yield different results.
4. Loop invariant method : In loop invariant method, the computation
inside the loop is avoided and thereby the computation overhead on
compiler is avoided.
5. Loop unrolling : In this method, the number of jumps and tests can
be reduced by writing the code two times.
For example :
Before unrolling :
int i = 1;
while (i <= 100)
{
    a[i] = b[i];
    i++;
}
After unrolling :
int i = 1;
while (i <= 100)
{
    a[i] = b[i];
    i++;
    a[i] = b[i];
    i++;
}
Answer
a. Basic blocks and flow graph :
1. Since the first statement of a program is a leader statement,
PROD = 0 is a leader.
2. Fragmented code represented by two blocks is shown below :
PROD = 0
B1
I=1
T1 = 4 * I
T2 = addr(A) – 4
T3 = T2 [T1]
T4 = addr(B) – 4
T5 = T4 [T1]
T6 = T3 * T5
PROD = PROD + T6 B2
I=I+1
If I <= 20 goto B2
Fig. 5.11.1.
b. Function preserving transformation :
1. Common sub-expression elimination : No block contains a sub-expression
that is computed twice, so the flow graph is unchanged.
2. Copy propagation : No instruction in block B2 is a direct
assignment of the form x = y, so the flow graph and basic blocks are unchanged.
3. Dead code elimination : No instruction in block B2 is dead, so
the flow graph and basic blocks are unchanged.
4. Constant folding : No constant expression is present in the basic
blocks, so the flow graph and basic blocks are unchanged.
Loop optimization :
1. Code motion : In block B2 we can see that value of T2 and T4 is calculated
every time when loop is executed. So, we can move these two instructions
outside the loop and put in block B1 as shown in Fig. 5.11.2.
PROD = 0
I=1
T2 = addr(A) – 4 B1
T4 = addr(B) – 4
T1 = 4 * I
T3 = T2 [T1]
T5 = T4 [T1]
T6 = T3 * T5
PROD = PROD + T6 B2
I=I+1
If I <= 20 goto B2
Fig. 5.11.2.
2. Reduction in strength : T1 = 4 * I is computed once before the loop;
inside the loop it is replaced by the cheaper addition T1 = T1 + 4, and
the loop test is rewritten as T1 <= 80, as shown in Fig. 5.11.3 :
B1 :
PROD = 0
T1 = 4 * I
T2 = addr(A) – 4
T4 = addr(B) – 4
B2 :
T1 = T1 + 4
T3 = T2[T1]
T5 = T4[T1]
T6 = T3 * T5
PROD = PROD + T6
if T1 <= 80 goto B2
Fig. 5.11.3
Que 5.12. Write short notes on the following with the help of
example :
i. Loop unrolling
ii. Loop jamming
iii. Dominators
iv. Viable prefix AKTU 2018-19, Marks 07
Answer
i. Loop unrolling : Refer Q. 5.10, Page 5–11C, Unit-5.
ii. Loop jamming : Refer Q. 5.10, Page 5–11C, Unit-5.
iii. Dominators : Refer Q. 5.5, Page 5–5C, Unit-5.
For example : In the flow graph,
[Fig. 5.12.1 : (a) A flow graph over nodes 1 to 10; (b) its dominator tree.]
The initial node, node 1, dominates every node.
Node 2 dominates only itself. Node 3 dominates all nodes but 1 and 2. Node 4
dominates all but 1, 2 and 3.
Nodes 5 and 6 dominate only themselves, since the flow of control can skip
around either by going through the other. Node 7 dominates 7, 8, 9 and 10.
Node 8 dominates 8, 9 and 10.
Nodes 9 and 10 dominate only themselves.
iv. Viable prefix : Viable prefixes are the prefixes of right sentential forms
that can appear on the stack of a shift-reduce parser.
For example :
Let : S → x1x2x3x4
A → x1x2
and let w = x1x2x3
SLR parse trace :
STACK INPUT
$ x1x2x3
$x1 x2x3
$x1x2 x3
$A x3
$Ax3 $
.
.
.
As we see, x1x2x3 will never appear on the stack. So, it is not a viable
prefix.
PART-5
DAG Representation of Basic Blocks.
Questions-Answers
Answer
DAG :
1. The abbreviation DAG stands for Directed Acyclic Graph.
2. DAGs are useful data structure for implementing transformations on
basic blocks.
3. A DAG gives picture of how the value computed by each statement in
the basic block is used in the subsequent statement of the block.
4. Constructing a DAG from three address statement is a good way of
determining common sub-expressions within a block.
5. A DAG for a basic block has following properties :
a. Leaves are labeled by unique identifier, either a variable name or
constants.
b. Interior nodes are labeled by an operator symbol.
c. Nodes are also optionally given a sequence of identifiers for labels.
6. DAG is used in code optimization; the output of code optimization is
machine code, and machine code uses registers to store the variables
used in the source program.
Advantage of DAG :
1. We automatically detect common sub-expressions with the help of
DAG algorithm.
2. We can determine which identifiers have their values used in the
block.
3. We can determine which statements compute values which could be
used outside the block.
Que 5.14. What is DAG ? How DAG is created from three address
code ? Write algorithm for it and explain it with a relevant example.
Answer
DAG : Refer Q. 5.13, Page 5–17C, Unit-5.
Algorithm :
Input : A basic block.
Output : A DAG with label for each node (identifier).
Method :
1. Create nodes; each node has at most a left and a right child.
2. Create linked list of attached identifiers for each node.
3. Maintain all identifiers for which a node is associated.
4. Node (identifier) represents value that identifier has the current point
in DAG construction process. Symbol table store the value of node
(identifier).
5. If there is an expression of the form x = y op z then DAG contain “op”
as a parent node and node(y) as a left child and node(z) as a right child.
For example :
Given expression : a * (b – c) + (b – c) * d
The construction of the DAG with three address code proceeds as follows :
Step 1 : t1 = b – c — create a – node with children b and c, labelled t1.
Step 2 : t2 = (b – c) * d — create a * node whose children are the existing – node and d, labelled t2.
Step 3 : t3 = a * (b – c) — create a * node with children a and the existing – node, labelled t3.
Step 4 : t4 = a * (b – c) + (b – c) * d — create a + node whose children are the t3 and t2 nodes, labelled t4.
The – node for (b – c) is created once and shared, which is how the common sub-expression is detected.
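The sharing step can be sketched in C : before creating an operator node, search for an existing node with the same operator and children (a minimal illustration; the node layout and table size are assumptions) :

#include <stdlib.h>

struct dnode { char op; struct dnode *left, *right; };
static struct dnode *nodes[100];
static int nnodes = 0;

struct dnode *get_node(char op, struct dnode *l, struct dnode *r)
{
    for (int i = 0; i < nnodes; i++)         /* look for an existing node first */
        if (nodes[i]->op == op && nodes[i]->left == l && nodes[i]->right == r)
            return nodes[i];                 /* common sub-expression : reuse it */
    struct dnode *p = malloc(sizeof *p);     /* otherwise create a fresh node */
    p->op = op; p->left = l; p->right = r;
    return nodes[nnodes++] = p;
}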
Que 5.15. How DAG is different from syntax tree ? Construct the
DAG for the following basic blocks :
a := b + c
b := b – d
c := c + d
e := b + c
Answer
DAG v/s Syntax tree :
1. Directed Acyclic Graph is a data structure for transformations on the
basic block. While syntax tree is an abstract representation of the
language constructs.
2. DAG is constructed from three address statement while syntax tree is
constructed directly from the expression.
DAG for the given code is :
+ a + e
+ b + c
b0
c0 d0
Fig. 5.15.1.
Answer
Step 1 : t1 = b – c
Step 2 : t2 = (b – c) * d
Step 3 : t3 = a * (b – c)
Step 4 : t4 = a * (b – c) + (b – c) * d
Step 5 : t5 = a + (a * (b – c) + (b – c) * d) — a + node with children a and the Step 4 node; the – node for (b – c) is shared throughout.
Que 5.17. How would you represent the following equation using
DAG ?
a= b*–c+b*–c AKTU 2018-19, Marks 07
Answer
Code representation using DAG of the equation a = b * – c + b * – c :
Step 1 : t1 = – c — a unary minus node over c.
Step 2 : t2 = b * t1 — a * node with children b and the – node.
Step 3 : t3 = t2 + t2 — a + node whose two children are both the shared * node.
Step 4 : a = t3 — an = node assigning the + node to a.
Que 5.18. Give the algorithm for the elimination of local and global
common sub-expressions algorithm with the help of example.
AKTU 2017-18, Marks 10
Answer
Algorithm for elimination of local common sub-expression : DAG
algorithm is used to eliminate local common sub-expression.
DAG : Refer Q. 5.13, Page 5–17C, Unit-5.
PART-6
Value Numbers and Algebraic Laws, Global Data Flow Analysis.
Questions-Answers
Answer
1. Data flow analysis is a process in which the values are computed using
data flow properties.
2. In this analysis, the analysis is made on data flow.
3. A program’s Control Flow Graph (CFG) is used to determine those parts
of a program to which a particular value assigned to a variable might
propagate.
4. A simple way to perform data flow analysis of programs is to set up data
flow equations for each node of the control flow graph and solve them
by repeatedly calculating the output from the input locally at each node
until the whole system stabilizes, i.e., it reaches a fix point.
5. Reaching definitions is used by data flow analysis in code optimization.
Reaching definitions :
1. A definition D reaches at point p if there is a path from D to p along
which D is not killed.
2. For example, consider the two flow graphs :
B1 : d1 : y := 2
B2 : d2 : x := y + 2
and
B1 : d1 : y := 2
B2 : d2 : y := y + 2
B3 : d3 : x := y + 2
3. In both graphs, the definition d1 is a reaching definition for block B2. But
in the second graph, d1 is not a reaching definition for block B3, because
it is killed by definition d2 in block B2.
Que 5.20. Write short notes (any two) :
i. Global data flow analysis
ii. Loop unrolling
iii. Loop jamming
AKTU 2015-16, Marks 15
OR
Write short note on global data analysis.
AKTU 2017-18, Marks 05
Answer
i. Global data flow analysis :
1. Global data flow analysis collects the information about the entire
program and distributed it to each block in the flow graph.
2. Data flow can be collected in various block by setting up and solving
a system of equation.
3. A data flow equation is given as :
OUT(s) = (IN(s) – KILL(s)) ∪ GEN(s)
where :
OUT(s) : definitions that reach the exit of block s.
GEN(s) : definitions within block s that reach the end of s.
IN(s) : definitions that reach the entry of block s.
KILL(s) : definitions that never reach the end of block s.
(A small worked instance follows this list.)
ii. Loop unrolling : Refer Q. 5.10, Page 5–11C, Unit-5.
iii. Loop fusion or loop jamming : Refer Q. 5.10, Page 5–11C, Unit-5.
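As an illustrative instance of the data flow equation in (i) : if IN(s) = {d1, d2}, KILL(s) = {d2} and GEN(s) = {d3}, then OUT(s) = ({d1, d2} – {d2}) ∪ {d3} = {d1, d3}.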
Answer
Role of macros in programming language are :
1. Macros are used to define words that occur frequently in a program.
2. They automate complex tasks.
3. They help to reduce the use of complex statements in a program.
4. They make the program run faster.
a + a * (b – c) + (b – c) * d.
Ans. Refer Q. 5.16.
1
UNIT
Introduction to
Compiler
(2 Marks Questions)
Compiler :
1. It scans all the lines of the source program and lists out all syntax errors at a time.
2. The object code produced by the compiler gets saved in a file; hence the file does not need to be compiled again and again.
3. It takes less time to execute.
Interpreter :
1. It scans one line at a time; if there is any syntax error, the execution of the program terminates immediately.
2. The machine code produced by the interpreter is not saved in any file; hence we need to interpret the file each time.
3. It takes more time to execute.
[Fig. 1 : A four-state DFA (states q0 to q3) over {a, b}.]
Regular expression for above DFA :
(aa + bb + (ab + aa)(aa + bb)* (ab + ba))*
2
UNIT
Basic Parsing
Techniques
(2 Marks Questions)
Ans. We need LR parsing tables to parse the input string using shift
reduce method.
3
UNIT
Syntax Directed
Translation
(2 Marks Questions)
Ans.
1. Quadruples are preferred over triples in an optimizing compiler, as
instructions are often found to move around in it.
2. In the triples notation, the result of any given operations is referred
by its position and therefore if one instruction is moved then it is
required to make changes in all the references that lead to that
result.
3.8. What is a syntax tree ? Draw the syntax tree for the
following statement : c b c b a – * + – * =
AKTU 2016-17, Marks 02
Ans.
1. A syntax tree is a tree that shows the syntactic structure of a
program while omitting irrelevant details present in a parse tree.
4
UNIT
Symbol Tables
(2 Marks Questions)
5
UNIT
Code Generation
(2 Marks Questions)
5.1. What do you mean by code optimization ?
Ans. Code optimization refers to the technique used by the compiler to
improve the execution efficiency of the generated object code.
[Fig. 5.9.1 : Flow graph of a loop that computes sum = sum + i for i = 0 to 10.]
B. Tech.
(SEM. VI) EVEN SEMESTER THEORY
EXAMINATION, 2015-16
COMPILER DESIGN
Time : 3 Hours Max. Marks : 100
Section - A
g. Define DAG.
Section - B
2. Attempt any five question from this section. (10 × 5 = 50)
a. Construct an SLR(1) parsing table for the following
grammar :
S → A)
S → A, P | (P, P
P → {num, num}
a ( ) ; $
a > > >
( < < = <
) > > >
; < < > >
$ < <
Section-C
Compiler :
1. It scans all the lines of the source program and lists out all syntax errors at a time.
2. The object code produced by the compiler gets saved in a file; hence the file does not need to be compiled again and again.
3. It takes less time to execute.
Interpreter :
1. It scans one line at a time; if there is any syntax error, the execution of the program terminates immediately.
2. The machine code produced by the interpreter is not saved in any file; hence we need to interpret the file each time.
3. It takes more time to execute.
g. Define DAG.
Ans.
1. The abbreviation DAG stands for Directed Acyclic Graph.
2. DAGs are useful data structure for implementing transformations
on basic blocks.
3. A DAG gives picture of how the value computed by each statement
in the basic block is used in the subsequent statement of the
block.
Section - B
2. Attempt any five question from this section. (10 × 5 = 50)
a. Construct an SLR(1) parsing table for the following
grammar :
S → A)
S → A, P | (P, P
P → {num, num}
Action Goto
Item ) , ( { Num } $ S A P
Set
0 S3 S4 1 2
1 accept
2 S5 S6
3 S4 6
4 S8
5 r1
6 S4 r2 9
7 S10
8 S11
9 r2
10 S4 12
11 S13
12 r3
13 S14
14 r4 r4
a ( ) ; $
a > > >
( < < = <
) > > >
; < < > >
$ < <
3. Create a directed graph whose nodes are the groups found in step
2. For any a and b, if a <. b, place an edge from the group of gb to the
group of fa. If a .> b, place an edge from the group of fa to that of gb.
4. If the graph constructed in step 3 has a cycle, then no precedence
functions exist. If there are no cycles, let f(a) be the length of the
longest path from the group of fa, and let g(b) be the length of the
longest path from the group of gb. Then there exist precedence
functions.
Precedence graph for the above matrix is :
[Fig. 1 : Precedence graph over the nodes fa, ga, f(, g(, f), g), f;, g;, f$, g$.]
From the precedence graph, the precedence function using
algorithm calculated as follows :
  a ( ) ; $
f 1 0 2 2 0
g 3 3 0 1 0
S → AA
A → aA | b
The LR(1) items will be :
I0 : S′ → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I1 = GOTO (I0, S)
I1 : S′ → S•, $
I2 = GOTO (I0, A)
I2 : S → A•A, $
A → •aA, $
A → •b, $
I3 = GOTO (I0, a)
I3 : A → a•A, a/b
A → •aA, a/b
A → •b, a/b
I4 = GOTO (I0, b)
I4 : A → b•, a/b
I5 = GOTO (I2, A)
I5 : S → AA•, $
I6 = GOTO (I2, a)
I6 : A → a•A, $
A → •aA, $
A → •b, $
I7 = GOTO (I2, b)
I7 : A → b•, $
I8 = GOTO (I3, A)
I8 : A → aA•, a/b
I9 = GOTO (I6, A)
I9 : A → aA•, $
Table 1.
State Action Goto
a b $ S A
0 S3 S4 1 2
1 accept
2 S6 S7 5
3 S3 S4 8
4 r3 r3
5 r1
6 S6 S7 9
7 r3
8 r2 r2
9 r2
Since the table does not contain any conflict, the grammar is LR(1).
For the LALR table, I3 and I6 are unioned, I4 and I7 are unioned,
and I8 and I9 are unioned.
So, I36 : A → a•A, a/b/$
A → •aA, a/b/$
A → •b, a/b/$
I47 : A → b•, a/b/$
I89 : A → aA•, a/b/$ and the LALR table will be :
Table 2.
State Action Goto
a b $ S A
0 S36 S47 1 2
1 accept
2 S36 S47 5
36 S36 S47 89
47 r3 r3 r3
5 r1
89 r2 r2 r2
Since the LALR table does not contain any conflict, the grammar is also
LALR(1).
DFA :
[Fig. 2 : GOTO graph over the item sets I0, I1, I2, I36, I47, I5, I89.]
S → •Bc, $
S → •bBa, $
A → •d, a
B → •d, c
I1 = GOTO (I0, S)
I1 : S′ → S•, $
I2 = GOTO (I0, A)
I2 : S → A•a, $
I3 = GOTO (I0, b)
I3 : S → b•Ac, $
S → b•Ba, $
A → •d, c
B → •d, a
I4 = GOTO (I0, B)
I4 : S → B•c, $
I5 = GOTO (I0, d)
I5 : A → d•, a
B → d•, c
I6 = GOTO (I2, a)
I6 : S → Aa•, $
I7 = GOTO (I3, A)
I7 : S → bA•c, $
I8 = GOTO (I3, B)
I8 : S → bB•a, $
I9 = GOTO (I3, d)
I9 : A → d•, c
B → d•, a
I10 = GOTO (I4, c)
I10 : S → Bc•, $
I11 = GOTO (I7, c)
I11 : S → bAc•, $
I12 = GOTO (I8, a)
I12 : S → bBa•, $
The action/goto table will be designed as follows :
Table 3.
Since the table does not contain any conflict, the grammar is LR(1).
For LALR(1) table, item set 5 and item set 9 are same. Thus we
merge both the item sets (I5, I9) = item set I59. Now, the resultant
parsing table becomes :
Table 4.
State Action Goto
a b c d $ S A B
0 S3 S59 1 2 4
1 accept
2 S6
3 S59 7 8
4 S10
59 r5, r6 r6, r5
6 r1
7 S11
8 S12
10 r3
11 r2
12 r4
2. Constants :
a. Constants are identifiers that represent a fixed value that can
never be changed.
b. Unlike variables or procedures, no runtime location needs to
be stored for constants.
c. These are typically placed right into the code stream by the
compiler at compilation time.
3. Types (user defined) :
a. A user defined type is combination of one or more existing
types.
b. Types are accessed by name and reference a type definition
structure.
4. Classes :
a. Classes are abstract data types which restrict access to its
members and provide convenient language level polymorphism.
b. This includes the location of the default constructor and
destructor, and the address of the virtual function table.
5. Records :
a. Records represent a collection of possibly heterogeneous
members which can be accessed by name.
b. The symbol table probably needs to record each of the record’s
members.
Various data structure used for symbol table :
1. Unordered list :
a. Simple to implement symbol table.
b. It is implemented as an array or a linked list.
c. Linked list can grow dynamically that eliminate the problem of
a fixed size array.
d. Insertion of a variable takes O(1) time, but lookup is slow for
large tables, i.e., O(n).
2. Ordered list :
a. If an array is sorted, it can be searched using binary search in
O(log2 n) time.
b. Insertion into a sorted array is expensive : it takes O(n)
time on average.
c. Ordered list is useful when set of names is known i.e., table of
reserved words.
3. Search tree :
a. Insertion and lookup in a search tree are done in logarithmic time.
b. A search tree is kept balanced using AVL or red-black tree
algorithms.
4. Hash tables and hash functions :
a. A hash function translates each name into a value in a fixed range,
called its hash value, which indexes the hash table.
b. Hash table can be used to minimize the movement of elements
in the symbol table.
[Fig. 3 : DAG for the block, with leaves b0, c0 and d0.]
1. The two occurrences of sub-expressions b + c compute the same
value.
2. The values computed by a and e are the same.
Applications of DAG :
1. Scheduling : Directed acyclic graphs representations of partial
orderings have many applications in scheduling for systems of tasks.
2. Data processing networks : A directed acyclic graph may be
used to represent a network of processing elements.
3. Data compression : Directed acyclic graphs may also be used as a
compact representation of a collection of sequences; in this type of
application, one finds a DAG in which the paths form the sequences.
4. It helps in finding statements that can be reordered.
7. T5 := T4[T1]
8. T6 := T3 * T5
9. PROD := PROD + T6
10. I := I + 1
11. If I <= 20 goto (3)
Perform loop optimization.
Ans. Basic blocks and flow graph :
1. Since the first statement of a program is a leader statement,
PROD = 0 is a leader.
2. Fragmented code represented by two blocks is shown below :
PROD = 0
B1
I=1
T1 = 4 * I
T2 = addr(A) – 4
T3 = T2 [T1]
T4 = addr(B) – 4
T5 = T4 [T1]
T6 = T3 * T5
PROD = PROD + T6 B2
I=I+1
If I <= 20 goto B2
Fig. 4.
Loop optimization :
1. Code motion : In block B2 we can see that value of T2 and T4 is
calculated every time when loop is executed. So, we can move
these two instructions outside the loop and put in block B1 as shown
in Fig. 5.
PROD = 0
I=1
T2 = addr(A) – 4 B1
T4 = addr(B) – 4
T1 = 4 * I
T3 = T2 [T1]
T5 = T4 [T1]
T6 = T3 * T5
PROD = PROD + T6 B2
I=I+1
If I <= 20 goto B2
Fig. 5.
2. Reduction in strength : T1 = 4 * I is computed once before the loop;
inside the loop it is replaced by the cheaper addition, and the loop test
is rewritten on T1 :
T1 = T1 + 4
T3 = T2[T1]
T5 = T4[T1]
T6 = T3 * T5
PROD = PROD + T6
if T1 <= 80 goto B2
Fig. 6.
5. Write short notes on :
i. Global data flow analysis
ii. Loop unrolling
iii. Loop jamming
Ans.
i. Global data flow analysis :
1. Global data flow analysis collects the information about the entire
program and distributed it to each block in the flow graph.
2. Data flow can be collected in various block by setting up and solving
a system of equation.
3. A data flow equation is given as :
OUT(s) = (IN(s) – KILL(s)) ∪ GEN(s)
where OUT(s) are the definitions that reach the exit of block s, GEN(s)
the definitions within s that reach its end, IN(s) the definitions that
reach the entry of s, and KILL(s) the definitions that never reach the
end of s.
ii. Loop unrolling : In this method, the number of jumps and tests
can be reduced by writing the code two times.
For example :
Before unrolling :
int i = 1;
while (i <= 100)
{
    a[i] = b[i];
    i++;
}
After unrolling :
int i = 1;
while (i <= 100)
{
    a[i] = b[i];
    i++;
    a[i] = b[i];
    i++;
}
iii. Loop fusion or loop jamming : In loop fusion method, several
loops are merged to one loop.
For example :
Before jamming :
for i := 1 to n do
    for j := 1 to m do
        a[i, j] := 10
After jamming :
for i := 1 to n*m do
    a[i] := 10
B. Tech.
(SEM. VI) EVEN SEMESTER THEORY
EXAMINATION, 2016-17
COMPILER DESIGN
Time : 3 Hours Max. Marks : 100
Section-C
[Fig. 1 : A four-state DFA (states q0 to q3) over {a, b}.]
Regular expression for above DFA :
(aa + bb + (ab + aa)(aa + bb)* (ab + ba))*
[Fig. 2 : Syntax tree for the given statement, with = at the root and the operator nodes *, –, +, *, – nested below over the leaves a, b and c.]
[Fig. 3 : Flow graph of a loop that computes sum = sum + i for i = 0 to 10.]
Source program
Lexical analyzer
Syntax analyzer
Semantic analyzer
Code optimizer
Code generator
Target program
Fig. 4.
v. Phase 5 (Code optimization) : This phase is designed to improve
the intermediate code so that the ultimate object program runs
faster and takes less space.
vi. Phase 6 (Code generation) :
a. It is the final phase for compiler.
b. It generates the assembly code as target language.
c. In this phase, the address in the binary code is translated from
logical address.
Symbol table / table management : A symbol table is a data
structure containing a record that allows us to find the record for
each identifier quickly and to store or retrieve data from that
record quickly.
Error handler : The error handler is invoked when a flaw in the
source program is detected.
[Fig. 5 : Translation of id1 = (id2 + id3) * (id2 + id3) * 2 through the phases : the lexical analyzer produces the token stream, the syntax analyzer builds the parse tree over id2 and id3, the semantic analyzer yields the annotated syntax tree with int_to_real applied to 2, and intermediate code generation emits :]
t1 = b + c
t2 = t1 * t1
t3 = int_to_real (2)
t4 = t2 * t3
id1 = t4
Code optimization produces the optimized code :
t1 = b + c
t2 = t1 * t1
id1 = t2 * 2
Machine code :
MOV R1, b
ADD R1, R1, c
MUL R2, R1, R1
MUL R2, R1, #2.0
ST id1, R2
[NFA construction diagrams lost in extraction.] ε can be neglected, so q1 = q5 = q2 ; after removing the ε-transitions the NFA has the transitions :
q1 on 0 → {q1, q3} ; q1 on 1 → {q1, q3} ; q3 on 1 → {q4} ; q4 on 0 → {qf}.

Transition table for NFA :

δ        0            1
q1       {q1, q3}     {q1, q3}
q3       –            {q4}
q4       {qf}         –
*qf      –            –
Now, we convert the above NFA into a DFA by subset construction :

δ                  0                1                 Let
{q1}               {q1, q3}         {q1, q3}          {q1} as A
{q1, q3}           {q1, q3}         {q1, q3, q4}      {q1, q3} as B
{q1, q3, q4}       {q1, q3, qf}     {q1, q3, q4}      {q1, q3, q4} as C
*{q1, q3, qf}      {q1, q3}         {q1, q3, q4}      {q1, q3, qf} as D

Renamed DFA transition table :

δ      0     1
A      B     B
B      B     C
C      D     C
*D     B     C
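For reference, this subset construction can be sketched in C with bit vectors for the state sets. The sketch below hard-codes the NFA transition table of this example; the encoding and all names are ours, not the book's :

#include <stdio.h>

typedef unsigned int Set;          /* bit i set => NFA state i is in the set */
enum { Q1, Q3, Q4, QF, NSTATES };  /* NFA states of the example above */

/* NFA transition table move[state][symbol] for symbols 0 and 1 */
static const Set move[NSTATES][2] = {
    { (1u<<Q1)|(1u<<Q3), (1u<<Q1)|(1u<<Q3) },  /* q1 */
    { 0,                 (1u<<Q4)           },  /* q3 */
    { (1u<<QF),          0                  },  /* q4 */
    { 0,                 0                  },  /* qf */
};

static Set step(Set s, int sym) {   /* image of a subset under a symbol */
    Set out = 0;
    for (int q = 0; q < NSTATES; q++)
        if (s & (1u << q)) out |= move[q][sym];
    return out;
}

int main(void) {
    Set dfa[16];
    int ndfa = 0;
    dfa[ndfa++] = 1u << Q1;                   /* start subset A = {q1} */
    for (int i = 0; i < ndfa; i++)            /* worklist of discovered subsets */
        for (int sym = 0; sym <= 1; sym++) {
            Set t = step(dfa[i], sym);
            int j;
            for (j = 0; j < ndfa && dfa[j] != t; j++)
                ;
            if (j == ndfa) dfa[ndfa++] = t;   /* a new DFA state */
            printf("%c --%d--> %c%s\n", 'A' + i, sym, 'A' + j,
                   (t & (1u << QF)) ? " (final)" : "");
        }
    return 0;
}

Running it prints exactly the transitions A, B, C, D of the renamed table above, marking D as final.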
[Parse tree diagram lost in extraction.]
Only one parse tree is possible for id(id)id+, so the given grammar is unambiguous.
[Steps 2-6 : Thompson-construction NFA fragments for b*, b, ab*, ab and ab* | ab, numbered over states 1-10 ; diagrams lost in extraction.]
ii. After the first three declarations, the symbol table will be :
    c    int
    b    int
    a    int
iii. After the second declaration of Level 2 :
    b    int
    a    int
    c    int
    b    int
    a    int
iv.-v. [labels lost in extraction] After leaving Level 2 and making the declarations of Level 3 :
    d    float
    c    float
    c    int
    b    int
    a    int
vi. After entering into Level 4 :
    m    int
    d    float
    c    float
    c    int
    b    int
    a    int
vii. On leaving the control from Level 4 :
    d    float
    c    float
    c    int
    b    int
    a    int
viii. On leaving the control from Level 3 :
    c    int
    b    int
    a    int
ix. On leaving the function entirely, the symbol table will again be empty.
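The behaviour traced above is what a block-structured (scoped) symbol table implements: entering a scope pushes a new table, lookups search from the innermost level outward, and leaving a scope pops its table. A minimal sketch in C follows; the linked-list layout and all names are our own illustration, not the book's :

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Entry {               /* one identifier record */
    const char *name, *type;
    struct Entry *next;              /* next entry in the same scope */
} Entry;

typedef struct Scope {               /* one nesting level */
    Entry *entries;
    struct Scope *parent;
} Scope;

static Scope *enter_scope(Scope *parent) {
    Scope *s = calloc(1, sizeof *s);
    s->parent = parent;
    return s;
}

static Scope *leave_scope(Scope *s) {    /* pop: discard this level */
    Scope *p = s->parent;
    /* (a real compiler would free the entries here) */
    free(s);
    return p;
}

static void declare(Scope *s, const char *name, const char *type) {
    Entry *e = malloc(sizeof *e);
    e->name = name; e->type = type;
    e->next = s->entries;
    s->entries = e;
}

static const char *lookup(Scope *s, const char *name) {
    for (; s; s = s->parent)             /* innermost scope outward */
        for (Entry *e = s->entries; e; e = e->next)
            if (strcmp(e->name, name) == 0) return e->type;
    return NULL;
}

int main(void) {
    Scope *sc = enter_scope(NULL);       /* Level 1 */
    declare(sc, "a", "int"); declare(sc, "b", "int"); declare(sc, "c", "int");
    sc = enter_scope(sc);                /* Level 2 */
    declare(sc, "a", "int"); declare(sc, "b", "int");
    printf("c is %s\n", lookup(sc, "c")); /* found in Level 1 */
    sc = leave_scope(sc);                /* back to Level 1 */
    return 0;
}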
next leader. Any statement not placed in a block can never be executed and may now be removed, if desired.
if (!false)
{
    ........................
    ........................
}
iv. If false is guaranteed to be zero, the outcome of the test is known at compile time, so there is no need to generate or write code for the statement : it is dead code.
c. Copy propagation :
i. Copy propagation replaces the uses of a variable with the value that was directly assigned to it.
ii. In this technique the value of the variable is propagated, so that the computation of an expression can be done at compilation time.
For example :
pi = 3.14;
r = 5;
Area = pi * r * r;
Here, at compilation time, pi is replaced by 3.14 and r by 5.
d. Constant folding (compile time evaluation) :
i. Constant folding is the replacement of a constant expression by its equivalent constant value at compile time.
ii. In constant folding, all operands of an operation are constants ; the original evaluation can therefore be replaced by its result, which is also a constant.
For example : a = 3.14157/2 can be replaced by a = 1.570785, thereby eliminating a division operation.
2. Algebraic simplification :
a. Peephole optimization is an effective technique for algebraic simplification.
b. Statements such as
x := x + 0
or x := x * 1
can be eliminated by peephole optimization, as the sketch below illustrates.
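A toy peephole pass over such statements might look as follows in C; the three-address instruction representation here is invented for this sketch :

#include <stdio.h>

typedef struct {            /* x := y op z, with z a constant */
    char x[8], y[8];
    char op;
    int z;
} Quad;

int main(void) {
    Quad code[] = {
        { "t1", "a",  '+', 0 },  /* t1 := a + 0   : algebraic identity */
        { "t2", "t1", '*', 1 },  /* t2 := t1 * 1  : algebraic identity */
        { "t3", "t2", '*', 8 },  /* t3 := t2 * 8  : kept               */
    };
    int n = sizeof code / sizeof code[0];
    for (int i = 0; i < n; i++) {
        Quad *q = &code[i];
        /* x := y + 0 and x := y * 1 reduce to the plain copy x := y */
        if ((q->op == '+' && q->z == 0) || (q->op == '*' && q->z == 1))
            printf("%s := %s\n", q->x, q->y);
        else
            printf("%s := %s %c %d\n", q->x, q->y, q->op, q->z);
    }
    return 0;
}

A real peephole optimizer would then run copy propagation so the emitted copies disappear entirely.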
Fig. 6 shows the typical subdivision of runtime memory into areas :
Code | Static | Heap | Free memory | Stack
Fig. 6.
1. Code : It stores the executable target code, which is of fixed size and does not change during execution.
2. Static allocation :
a. Static allocation is done for all the data objects known at compile time.
b. The sizes of the data objects are known at compile time.
c. The names of these objects are bound to storage at compile time only, and such an allocation of data objects is done by static allocation.
d. In static allocation, the compiler can determine the amount of storage required by each data object. Therefore, it becomes easy for the compiler to find the addresses of these data in the activation record.
e. At compile time, the compiler can fill in the addresses at which the target code can find the data it operates on.
3. Heap allocation : Two methods are used for heap management :
a. Garbage collection method :
i. When all access paths to an object are destroyed but the data object continues to exist, such an object is said to be garbage.
ii. Garbage collection is a technique used to reclaim and reuse that object space.
iii. In garbage collection, all the elements whose garbage-collection bit is 'on' are treated as garbage and returned to the free-space list.
b. Reference counter :
i. Reference counting attempts to reclaim each element of heap storage immediately after it can no longer be accessed (see the sketch after this list).
ii. Each memory cell on the heap has a reference counter associated with it that contains a count of the number of values that point to it.
iii. The count is incremented each time a new value points to the cell and decremented each time a value ceases to point to it.
4. Stack allocation :
a. Stack allocation is used to store the data structures called activation records.
b. The activation records are pushed and popped as activations begin and end, respectively.
c. Storage for the locals in each call of a procedure is contained in the activation record for that call. Thus, locals are bound to fresh storage in each activation, because a new activation record is pushed onto the stack when a call is made.
d. The values of locals are deleted when the activation ends.
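As a rough sketch of the reference counter scheme described in 3.b above, here is a minimal illustration in C; the cell layout and function names are our own, not the book's :

#include <stdio.h>
#include <stdlib.h>

typedef struct {        /* a heap cell with an attached reference count */
    int refcount;
    int value;
} Cell;

static Cell *cell_new(int value) {
    Cell *c = malloc(sizeof *c);
    c->refcount = 1;    /* one pointer refers to it at creation */
    c->value = value;
    return c;
}

static Cell *retain(Cell *c) {      /* a new value starts pointing to c */
    c->refcount++;
    return c;
}

static void release(Cell *c) {      /* a value ceases to point to c */
    if (--c->refcount == 0) {       /* unreachable: reclaim at once */
        printf("reclaiming cell with value %d\n", c->value);
        free(c);
    }
}

int main(void) {
    Cell *p = cell_new(42);
    Cell *q = retain(p);    /* second reference to the same cell */
    release(p);             /* count 2 -> 1, cell still live     */
    release(q);             /* count 1 -> 0, cell reclaimed      */
    return 0;
}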
Section-C
[Fig. 7 : DFA for the sets of LR items, states I0-I8, with transitions on S, A, a and b ; diagram lost in extraction.]
Table 2 : Parsing of the input abab using the parse table. [Table lost in extraction.]
S → id := E    { id_entry := look_up(id.name);
                 if id_entry ≠ nil then
                     append(id_entry ‘:=’ E.place)
                 else error; /* id not declared */
               }
E → E1 + E2    { E.place := newtemp();
                 append(E.place ‘:=’ E1.place ‘+’ E2.place)
               }
E → E1 * E2    { E.place := newtemp();
                 append(E.place ‘:=’ E1.place ‘*’ E2.place)
               }
E → – E1       { E.place := newtemp();
                 append(E.place ‘:=’ ‘minus’ E1.place)
               }
E → id         { id_entry := look_up(id.name);
                 if id_entry ≠ nil then
                     E.place := id_entry
                 else error; /* id not declared */
               }
1. look_up returns the entry for id.name in the symbol table if it exists there ; otherwise an error is reported.
2. The function append is used for appending the three-address code to the output file.
3. newtemp() is the function used for generating new temporary variables.
4. E.place is used to hold the value of E.
Example : x := (a + b) * (c + d)
We will assume all these identifiers are of the same type. Using a bottom-up parsing method, the reductions and the code generated are :
E → id          E.place := a
E → id          E.place := b
E → E1 + E2     E.place := t1      t1 := a + b
E → id          E.place := c
E → id          E.place := d
E → E1 + E2     E.place := t2      t2 := c + d
E → E1 * E2     E.place := t3      t3 := t1 * t2, i.e., (a + b) * (c + d)
S → id := E                        x := t3
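The helper routines newtemp() and append used above can be sketched in C. The following minimal illustration emits exactly the code of this example; the representation is our own rendering of the book's pseudocode :

#include <stdio.h>

static char pool[16][8];        /* storage for temporary names */
static int ntemps = 0;

static const char *newtemp(void) {   /* generate t1, t2, ... */
    snprintf(pool[ntemps], sizeof pool[0], "t%d", ntemps + 1);
    return pool[ntemps++];
}

/* append(): emit one three-address instruction x := y op z */
static void append(const char *x, const char *y, char op, const char *z) {
    printf("%s := %s %c %s\n", x, y, op, z);
}

int main(void) {
    /* bottom-up reduction order for x := (a + b) * (c + d) */
    const char *t1 = newtemp(); append(t1, "a", '+', "b");
    const char *t2 = newtemp(); append(t2, "c", '+', "d");
    const char *t3 = newtemp(); append(t3, t1, '*', t2);
    printf("x := %s\n", t3);
    return 0;
}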
switch (ch)
{
case 1 : c = a + b;
break;
case 2 : c = a – b;
break;
}
The three address code can be
if ch = 1 goto L1
if ch = 2 goto L2
L1 : t1 := a + b
c := t1
goto last
L2 : t2 := a – b
c := t2
goto last
last :
Step 2 : t2 = (b – c) * d
Step 3 : t3 = a * (b – c)
Step 4 : t4 = a * (b – c) + (b – c) * d
Step 5 : t5 = a + a * (b – c) + (b – c) * d
[Syntax-tree diagrams for each step lost in extraction.]
B. Tech.
(SEM. VI) EVEN SEMESTER THEORY
EXAMINATION, 2017-18
COMPILER DESIGN
Time : 3 Hours Max. Marks : 100
Note : 1. Attempt all Sections. If any data is missing, then choose suitably.
2. Any special paper-specific instruction.
SECTION-A
SECTION-B
SECTION-C
Note : 1. Attempt all Sections. If any data is missing, then choose suitably.
2. Any special paper-specific instruction.
SECTION-A
SECTION-B
[Steps 2-3 : NFA construction diagrams over states q1-q6 and qf, with transitions such as q1 -a→ q2 -b→ q3 -b→ qf ; diagrams lost in extraction.]
For S, removing the left recursion gives :
S → aSBS′ | bBS′
S′ → ASBS′ | ε

For B, the left-recursive productions B → BABBA | bBA | a become :
B → bBAB′ | aB′
B′ → ABBAB′ | ε

For A, the left-recursive productions A → ABAAB | aAB | a become :
A → aABA′ | aA′
A′ → BAABA′ | ε

[The substitution steps such as B → ABA | a and A → BS | a, A → SAS | aS | a, used to expose the left recursion, are partially lost in extraction.]

The productions after removing left recursion are :
S → aSBS′ | bBS′
S′ → ASBS′ | ε
A → aABA′ | aA′
A′ → BAABA′ | ε
B → bBAB′ | aB′
B′ → ABBAB′ | ε
[Model of a predictive parser : an input buffer a + b $, a stack holding X, Y, Z, $, the predictive parsing program, the parsing table, and the output stream ; diagram lost in extraction.]
The left-recursive productions F → F* | a | b become :
F → aF′ | bF′
F′ → *F′ | ε
FIRST(E) = FIRST(T) = FIRST(F) = {a, b}
FIRST(E′) = { +, ε }, FIRST(F′) = { *, ε }
FIRST(T′) = { *, ε }
FOLLOW(E) = { $ }
FOLLOW(E′) = { $ }
FOLLOW(T) = { +, $ }
FOLLOW(T′) = { +, $ }
FOLLOW(F) = { *, +, $ }
FOLLOW(F′) = { *, +, $ }
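To connect the parser model and these sets : the predictive parser repeatedly compares the stack top X with the lookahead a and consults the table entry M[X, a]. Below is a compressed C sketch of that driver loop for the toy LL(1) grammar S → aSb | ε ; the grammar, the table entries and the encoding are invented for this sketch, not taken from this answer :

#include <stdio.h>

int main(void) {
    const char *input = "aabb$";
    char stack[64] = "$S";       /* bottom marker $, start symbol on top */
    int top = 1, ip = 0;

    while (stack[top] != '$') {
        char X = stack[top], a = input[ip];
        if (X == 'a' || X == 'b') {          /* terminal on top of stack */
            if (X != a) { puts("error"); return 1; }
            top--; ip++;                     /* match and advance */
        } else if (X == 'S') {               /* consult the parse table */
            if (a == 'a') {                  /* M[S, a] : S -> aSb */
                top--;
                stack[++top] = 'b';          /* push RHS in reverse */
                stack[++top] = 'S';
                stack[++top] = 'a';
                puts("S -> aSb");
            } else {                         /* M[S, b], M[S, $] : S -> epsilon */
                top--;
                puts("S -> epsilon");
            }
        }
    }
    puts(input[ip] == '$' ? "accepted" : "error");
    return 0;
}

On the input aabb$ the sketch prints the two expansions and then "accepted".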
SECTION-C
I0 : S′ → •S, $
S → •aAd, $
S → •bBd, $
S → •aBe, $
S → •bAe, $
I1 := GOTO(I0, S)
I1 : S′ → S•, $
I2 := GOTO(I0, a)
I2 : S → a•Ad, $
S → a•Be, $
A → •f, d
B → •f, e
I3 := GOTO(I0, b)
I3 : S → b•Bd, $
S → b•Ae, $
A → •f, e
B → •f, d
I4 := GOTO(I2, A)
I4 : S → aA•d, $
I5 := GOTO(I2, B)
I5 : S → aB•e, $
I6 := GOTO(I2, f)
I6 : A → f•, d
B → f•, e
I7 := GOTO(I3, B)
I7 : S → bB•d, $
I8 := GOTO(I3, A)
I8 : S → bA•e, $
I9 := GOTO(I3, f)
I9 : A → f•, e
B → f•, d
I10 := GOTO(I4, d)
I10 : S → aAd•, $
I11 := GOTO(I5, e)
I11 : S → aBe•, $
I12 := GOTO(I7, d)
I12 : S → bBd•, $
I13 := GOTO(I8, e)
I13 : S → bAe•, $
The states I6 and I9 have the same core { A → f•, B → f• }, so an LALR parser would merge them ; but the merged state calls for reducing by both A → f and B → f on the same lookaheads d and e, which is a reduce-reduce conflict. So, the LALR table cannot be constructed from the LR(1) parsing table.
Schemes :
E → E1 + T    E.code = E1.code || T.code || ‘+’
E → T         E.code = T.code
T → T1 * F    T.code = T1.code || F.code || ‘×’
T → F         T.code = F.code
F → (E)       F.code = E.code
F → id        F.code = id.code
where the sign ‘||’ is used for concatenation.
E → •E * E
E → •(E)
E → •id
I6 = GOTO(I2, E)
I6 : E → (E•)
E → E• + E
E → E• * E
I7 = GOTO(I4, E)
I7 : E → E + E•
E → E• + E
E → E• * E
I8 = GOTO(I5, E)
I8 : E → E * E•
E → E• + E
E → E• * E
I9 = GOTO(I6, ))
I9 : E → (E)•
SLR parsing table :

                     Action                          Goto
State    id     +      *      (      )      $         E
0        S3                   S2                      1
1               S4     S5                   accept
2        S3                   S2                      6
3               r4     r4            r4     r4
4        S3                   S2                      7
5        S3                   S2                      8
6               S4     S5            S9
7               r1     S5            r1     r1
8               r2     r2            r2     r2
9               r3     r3            r3     r3
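The table above can be exercised with the standard LR driver loop. The following C sketch hard-codes the table with our own encoding ("S5" = shift 5, "r2" = reduce by production 2, empty string = error) and parses id + id * id :

#include <stdio.h>
#include <string.h>

/* terminals: 0=id 1=+ 2=* 3=( 4=) 5=$ (our own encoding) */
static const char *action[10][6] = {
    {"S3","",  "",  "S2","",  ""   },
    {"",  "S4","S5","",  "",  "acc"},
    {"S3","",  "",  "S2","",  ""   },
    {"",  "r4","r4","",  "r4","r4" },
    {"S3","",  "",  "S2","",  ""   },
    {"S3","",  "",  "S2","",  ""   },
    {"",  "S4","S5","",  "S9",""   },
    {"",  "r1","S5","",  "r1","r1" },
    {"",  "r2","r2","",  "r2","r2" },
    {"",  "r3","r3","",  "r3","r3" },
};
static const int go_E[10] = {1,-1,6,-1,7,8,-1,-1,-1,-1}; /* GOTO on E */
static const int rhs_len[5] = {0,3,3,3,1};   /* |RHS| of productions 1-4 */

int main(void) {
    int input[] = {0,1,0,2,0,5};             /* id + id * id $ */
    int stack[64] = {0}, top = 0, ip = 0;
    for (;;) {
        const char *a = action[stack[top]][input[ip]];
        if (!a[0]) { puts("error"); return 1; }
        if (!strcmp(a, "acc")) { puts("accepted"); return 0; }
        if (a[0] == 'S') {                   /* shift the next state */
            stack[++top] = a[1] - '0';
            ip++;
        } else {                             /* reduce by production p */
            int p = a[1] - '0';
            top -= rhs_len[p];               /* pop |RHS| states */
            stack[top + 1] = go_E[stack[top]]; /* push GOTO[t, E] */
            top++;
            printf("reduce by production %d\n", p);
        }
    }
}

Tracing it by hand shows id * id is reduced before the addition, i.e., the table gives * the higher precedence.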
Block B1 :
PROD = 0
I = 1

Block B2 :
T1 = 4 * I
T2 = addr(A) – 4
T3 = T2[T1]
T4 = addr(B) – 4
T5 = T4[T1]
T6 = T3 * T5
PROD = PROD + T6
I = I + 1
if I <= 20 goto B2
Fig. 2.
b. Function preserving transformations :
1. Common sub-expression elimination : No block contains a sub-expression that is computed twice. So, there is no change in the flow graph.
2. Copy propagation : No instruction in block B2 is a direct assignment, i.e., of the form x = y. So, there is no change in the flow graph or basic blocks.
3. Dead code elimination : No instruction in block B2 is dead. So, there is no change in the flow graph or basic blocks.
4. Constant folding : No constant expression is present in the basic blocks. So, there is no change in the flow graph or basic blocks.
3. Reduction in strength :
a. In the strength reduction technique, higher strength (costlier) operators are replaced by lower strength (cheaper) operators, as the sketch after this list illustrates.
b. The strength of certain operators is higher than that of others.
c. Strength reduction is not applied to floating point expressions because it may yield different results.
4. Loop invariant method : In the loop invariant method, computations that do not change inside the loop are moved out of it, and the repeated computation overhead is thereby avoided.
5. Loop unrolling : In this method, the number of jumps and tests can be reduced by replicating the body of the loop.
For example :
int i = 1;
while (i <= 100)
{
    a[i] = b[i];
    i++;
}
can be written as
int i = 1;
while (i <= 100)
{
    a[i] = b[i];
    i++;
    a[i] = b[i];
    i++;
}
6. Loop fusion or loop jamming : In loop fusion, several loops are merged into one loop.
For example :
for i := 1 to n do
    for j := 1 to m do
        a[i, j] := 10
can be written as
for i := 1 to n*m do
    a[i] := 10
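As an illustration of reduction in strength (an example of our own, not from the question) : a multiplication such as 4 * i inside a loop can be replaced by a running addition on the induction variable, and a multiplication by a power of two by a shift :

#include <stdio.h>

int main(void) {
    int a[21], b[21];

    /* before: each iteration performs the multiplication 4 * i */
    for (int i = 1; i <= 20; i++)
        a[i] = 4 * i;

    /* after strength reduction: the multiplication becomes a
       running addition on the induction variable t (t == 4 * i) */
    int t = 0;
    for (int i = 1; i <= 20; i++) {
        t = t + 4;
        b[i] = t;
    }

    /* x * 8 can likewise be replaced by a cheaper shift */
    int x = 5;
    printf("%d %d\n", x * 8, x << 3);   /* both print 40 */
    printf("%d %d\n", a[20], b[20]);    /* both print 80 */
    return 0;
}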
B. Tech.
(SEM. VI) EVEN SEMESTER THEORY
EXAMINATION, 2018-19
COMPILER DESIGN
Time : 3 Hours Max. Marks : 100
Note : 1. Attempt all Sections. If any data is missing, then choose suitably.
SECTION-A
SECTION-B
SECTION-C
[Fig. 1 : the given ε-NFA with states q0, q1, q2 and transitions labelled over a, b, c ; diagram lost in extraction.]
Note : 1. Attempt all Sections. If any data is missing, then choose suitably.
SECTION-A
SECTION-B
[Fig. 1 : a left-recursive derivation grows an unbounded left spine of A-subtrees ; diagram lost in extraction.]
e. This causes a major problem in top-down parsing, and therefore elimination of left recursion is a must.
3. Left factoring :
a. Left factoring is needed when it is not clear which of two alternatives should be used to expand a non-terminal.
b. If the grammar is not left factored, then it becomes difficult for the parser to make decisions.
Algorithm for FIRST and FOLLOW :
1. FIRST function :
i. FIRST(X) is the set of terminal symbols that can appear first in the strings derived from X.
ii. Following are the rules used to compute the FIRST function :
a. If X is a terminal symbol ‘a’, then FIRST(X) = {a}.
ii.
Stack contents    Input string    Action
$                 (a, a)$         Shift (
$(                a, a)$          Shift a
$(a               , a)$           Reduce S → a
$(S               , a)$           Reduce L → S
$(L               , a)$           Shift ,
$(L,              a)$             Shift a
$(L, a            )$              Reduce S → a
$(L, S            )$              Reduce L → L, S
$(L               )$              Shift )
$(L)              $               Reduce S → (L)
$S                $               Accept
Flow graph 1 :
B1 : d1 : y := 2
B2 : d2 : x := y + 2

Flow graph 2 :
B1 : d1 : y := 2
B2 : d2 : y := y + 2
B3 : d3 : x := y + 2
I5 = GOTO(I2, B)
I5 : S → cB•
I6 = GOTO(I2, A)
I6 : A → cA•
I7 = GOTO(I2, c)
I7 : S → cc•A
B → cc•B
A → c•A
B → c•cB
A → •cA | •a
B → •ccB | •b
I8 = GOTO(I7, A)
I8 : S → ccA•
A → cA•
I9 = GOTO(I7, B)
I9 : B → ccB•
I10 = GOTO(I7, c)
I10 : B → cc•B
A → c•A
B → c•cB
B → •ccB | •b
A → •cA | •a
I11 = GOTO(I10, A)
I11 : A → cA•
[Fig. 2 : DFA for the sets of items I0-I11, with transitions on S, A, B, a, b and c ; diagram lost in extraction.]
Let us number the production rules in the grammar :
1. S → cB
2. S → ccA
3. A → cA
4. A → a
5. B → ccB
6. B → b
                  Action                      GOTO
States    a      b      c      $        A     B     S
I0        S3     S4     S2                          1
I1                             Accept
I2        S3     S4     S7              6     5
I3        r4     r4     r4     r4
I4        r6     r6     r6     r6
I5        r1     r1     r1     r1
I6        r3     r3     r3     r3
I7        S3     S4     S10             8     9
I9        r5     r5     r5     r5
I10       S3     S4     S10             11
I11       r3     r3     r3     r3
SECTION-C
[Fig. 3 : the given ε-NFA with states q0, q1, q2 ; diagram lost in extraction.]
Ans. Transition table for the ε-NFA (the ε-edges appear only in Fig. 3, which was lost in extraction) :

δ      a     b     c
q0     q1    q2    {q1, q2}
q1     q0    q2    {q0, q2}
q2     –     –     –

ε-closure of {q0} = {q0, q1, q2}
[Fig. 4 : dead state D with self-loops on a, b, c.]
Ans.
i. Call by name :
1. In call by name, the actual parameters are substituted for the formals in all the places where the formals occur in the procedure.
2. It is also referred to as lazy evaluation, because a parameter is evaluated only when it is needed.
For example (pseudo-C : under call by name the formals c and d are textually replaced by n1 and n2, so the swap is visible in the caller) :
main() {
    int n1 = 10, n2 = 20;
    printf("n1: %d, n2: %d\n", n1, n2);
    swap(n1, n2);
    printf("n1: %d, n2: %d\n", n1, n2);
}
swap(int c, int d) {
    int t;
    t = c;
    c = d;
    d = t;
    printf("n1: %d, n2: %d\n", n1, n2);
}
Output :
n1: 10, n2: 20
n1: 20, n2: 10
n1: 20, n2: 10
ii. Call by reference :
1. In call by reference, the location (address) of actual arguments is
passed to formal arguments of the called function. This means by
accessing the addresses of actual arguments we can alter them
within the called function.
2. In call by reference, alteration to actual arguments is possible within
called function; therefore the code must handle arguments carefully
else we get unexpected results.
For example :
#include <stdio.h>
void swapByReference(int*, int*); /* Prototype */
int main() /* Main function */
{
    int n1 = 10, n2 = 20;
    /* actual arguments will be altered */
    swapByReference(&n1, &n2);
    printf("n1: %d, n2: %d\n", n1, n2);
    return 0;
}
void swapByReference(int *a, int *b)
{
int t;
t = *a; *a = *b; *b = t;
}
Output : n1: 20, n2: 10
Step 2 : t2 = b * t1
Step 3 : t3 = t2 + t2
Step 4 : t4 = a := t3
[Syntax-tree diagrams for each step lost in extraction.]
Ans. Difference :
[Comparison table of lexical (static) scope vs. dynamic scope lost in extraction.]
[Figure : static frames linked by static links for the nested procedures A, B, C, D, E, together with the calls A calls E, E calls B, B calls D, D calls C ; diagram lost in extraction.]
iii. Dominators :
a. In control flow graphs, a node d dominates a node n if every path
from the entry node to n must go through d. This is denoted as d
dom n.
b. By definition, every node dominates itself.
c. A node d strictly dominates a node n if d dominates n and d is not
equal to n.
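Dominators can be computed with the standard iterative data-flow scheme Dom(n) = {n} ∪ ⋂ Dom(p) over the predecessors p of n. A minimal C sketch with bit-vector sets follows ; the five-node flow graph is invented for illustration :

#include <stdio.h>

#define N 5                   /* hypothetical CFG with 5 nodes, node 0 = entry */
typedef unsigned int Set;     /* bit i set => node i is in the set */

int main(void) {
    /* pred[n]: predecessors of node n, -1 terminated (example graph:
       0->1, 1->2, 1->3, 2->4, 3->4) */
    int pred[N][N] = { {-1}, {0,-1}, {1,-1}, {1,-1}, {2,3,-1} };
    Set dom[N];
    Set all = (1u << N) - 1;

    dom[0] = 1u << 0;         /* the entry dominates only itself */
    for (int n = 1; n < N; n++) dom[n] = all;

    int changed = 1;
    while (changed) {         /* iterate to a fixed point */
        changed = 0;
        for (int n = 1; n < N; n++) {
            Set d = all;
            for (int i = 0; pred[n][i] != -1; i++)
                d &= dom[pred[n][i]];   /* intersect predecessor sets */
            d |= 1u << n;               /* every node dominates itself */
            if (d != dom[n]) { dom[n] = d; changed = 1; }
        }
    }
    for (int n = 0; n < N; n++)
        printf("Dom(%d) = %#x\n", n, dom[n]);
    return 0;
}

For the example graph this yields Dom(4) = {0, 1, 4} : node 4 is reached through either 2 or 3, so neither strictly dominates it.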
As we see, x1x2x3 will never appear on the stack. So, it is not a viable
prefix.
S → if E then S1
E.true := new_label()
E.false := S.next
S1.next := S.next
S.code := E.code || gen_code(E.true ‘:’) || S1.code
3. In the given translation scheme, || is used to concatenate the strings.
4. The function gen_code evaluates the non-quoted arguments passed to it and concatenates the complete string.
5. S.code is the important rule, which ultimately generates the three-address code.
S → if E then S1 else S2
E.true := new_label()
E.false := new_label()
S1.next := S.next
S2.next := S.next
S.code := E.code || gen_code(E.true ‘:’) || S1.code || gen_code(‘goto’ S.next) ||
          gen_code(E.false ‘:’) || S2.code
For example : Consider the statement if a < b then a = a + 5 else a = a + 7. Its code has the layout :
if a < b goto E.true
goto E.false
E.true : a := a + 5
goto S.next
E.false : a := a + 7
Switch statement :
switch expression
{
case value : statement
case value : statement
...
case value : statement
default : statement
}
Example :
switch(ch)
{
case 1 : c = a + b;
break;
case 2 : c = a – b;
break;
}
The three address code can be
if ch = 1 goto L1
if ch = 2 goto L2
L1 : t1 := a + b
c := t1
goto last
L2 : t2 := a – b
c := t2
goto last
last :
[Fig. 7 : annotated parse tree for (4 * 7 + 1) * 2 : id.lexval = 4 and id.lexval = 7 give T.val = 28 ; adding F.val = 1 gives E.val = 29 inside the parentheses ; multiplying by F.val = 2 (id.lexval = 2) yields the value at the root ; diagram lost in extraction.]
[Fig. 8 : Pre-header : a new, initially empty block (the pre-header) is inserted immediately before the loop header B0, so that loop-invariant statements can be moved into it.]
5. Loop unrolling : In this method, the number of jumps and tests can be reduced by replicating the body of the loop.
For example :
int i = 1;
while (i <= 100)
{
    a[i] = b[i];
    i++;
}
can be written as
int i = 1;
while (i <= 100)
{
    a[i] = b[i];
    i++;
    a[i] = b[i];
    i++;
}
6. Loop fusion or loop jamming : In loop fusion, several loops are merged into one loop.
For example :
for i := 1 to n do
    for j := 1 to m do
        a[i, j] := 10
can be written as
for i := 1 to n*m do
    a[i] := 10