CD Model Set-4 Answer Key

SET- 4 Reg. No.

IFET COLLEGE OF ENGINEERING


(An Autonomous Institution)
INTERNAL ASSESSMENT EXAMINATION-II
DEPARTMENT OF CSE & IT
SUB CODE: 19UCSPC601 MAX MARKS: 60
SUB NAME: COMPILER DESIGN DURATION: 120Min
DATE: .05.2024/FN YEAR/ SEMESTER: III/VI
ANSWER KEY
PART-A (10 × 2 = 20)
Answer All Questions
(Each answer should have minimum 7 lines)
1. Define the term cross compiler R CO1
There may be a compiler which runs on one machine and produces the target
code for another machine. Such a compiler is called a cross compiler.
Basically, there exist three types of languages:
• Source language, i.e. the application program.
• Target language, in which the machine code is written.
• Implementation language, in which the compiler is written.

2. Point out why buffering is used in lexical analysis U CO1


• The lexical analyzer scans the input from left to right one character at a time.
• It uses two pointers, begin ptr (bp) and forward ptr (fp), to keep track of
the position in the input scanned so far.
• Input buffering is an important concept in compiler design that refers
to the way in which the compiler reads input from the source code.
• In many cases, the compiler reads input one character at a time,
which can be a slow and inefficient process. Input buffering is a
technique that allows the compiler to read input in larger chunks,
which can improve performance and reduce overhead.

3. Eliminate left recursion from the grammar: S → Aa | b, A → Ac | Sd | ε. A CO2
To remove left recursion from productions of the form A → Aα | β, we rewrite
them as:
A → βA'    A' → αA' | ε
Given grammar: S → Aa | b, A → Ac | Sd | ε
After substituting S into A to expose the indirect left recursion:
S → Aa | b
A → Ac | Aad | bd | ε    (here α = c, ad and β = bd, ε)
So, the grammar after removing left recursion:
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε
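The rewrite A → Aα | β ⇒ A → βA', A' → αA' | ε can be sketched in a few lines (a simplified sketch: nonterminals are single characters, right-hand sides are plain strings, and "eps" stands for ε — these representation choices are assumptions):

```python
def eliminate_immediate_left_recursion(nt, rhss):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b1 A' | ..., A' -> a1 A' | ... | eps."""
    # alphas: tails of left-recursive productions; betas: the rest
    alphas = [r[len(nt):] for r in rhss if r.startswith(nt)]
    betas  = [r for r in rhss if not r.startswith(nt)]
    if not alphas:                      # no immediate left recursion
        return {nt: rhss}
    new = nt + "'"
    return {
        nt:  [b + new for b in betas],  # A  -> beta A'
        new: [a + new for a in alphas] + ["eps"],  # A' -> alpha A' | eps
    }
```

Applied to A → Ac | Aad | bd | ε (the empty string standing for ε), it yields A → bdA' | A' and A' → cA' | adA' | eps, matching the answer above.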
4. What is handle pruning? R CO2
• A rightmost derivation in reverse can be obtained by "handle pruning".
• The process of discovering a handle and reducing it to the appropriate
left-hand side is called handle pruning.
 Construct right sentential form and handle for
grammar E → E + E,
E → E * E,
E → ( E ),
E → id

5. Write Three Address Code for the following expression: if A < B then 1 else 0 A CO3
101: if A < B goto 104
102: T1 = 0
103: goto 105
104: T1 = 1
105: stop

6. List out the types of SDD R CO3


Syntax-directed definitions come in two different varieties. The two types of
Syntax Directed Definitions are
1. S-attributed definitions and
2. L-attributed definitions.
L-attributed translation is performed while the parsing is being done, whereas
S-attributed translation is combined with bottom-up parsing.

7. State the limitations of static allocation. U CO4


• It is incompatible with recursive subprograms.
• It is not possible to use variables whose size has to be determined at run
time.
• Static allocation is possible only if the size of the data object is known
at compile time.
• Static allocation is not efficient.
• It is not highly scalable.

8. Mention the four principal uses of registers in code generation U CO4


• In most machine architectures, some or all of the operands of an operation must
be in registers in order to perform the operation.
• Registers make good temporaries: places to hold the result of a subexpression
while a larger expression is being evaluated, or more generally, a place to hold a
variable that is used only within a single basic block.
• Registers are used to hold (global) values that are computed in one basic block
and used in other blocks, for example, a loop index that is incremented going
around the loop and is used several times within the loop.
• Registers are often used to help with run-time storage management, for
example, to manage the run-time stack, including the maintenance of stack
pointers and possibly the top elements of the stack itself.
• These are competing needs, since the number of registers available is limited.
The machine instructions are of the form
LD reg, mem
ST mem, reg
OP reg, reg, reg

9. Represent the following in flow graph i=1; sum=0;while (i<=10){sum+=i; i++;} A CO5
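The flow-graph figure is not reproduced here; a minimal sketch of the basic blocks and edges, assuming a standard three-address lowering (block names are illustrative):

```python
# Basic blocks for: i = 1; sum = 0; while (i <= 10) { sum += i; i++; }
BLOCKS = {
    "B1": ["i = 1", "sum = 0"],                       # entry
    "B2": ["if i <= 10 goto B3", "goto B4"],          # loop test
    "B3": ["sum = sum + i", "i = i + 1", "goto B2"],  # loop body
    "B4": ["exit"],
}
# Control-flow edges: entry to test, test to body/exit, body back to test
EDGES = {("B1", "B2"), ("B2", "B3"), ("B2", "B4"), ("B3", "B2")}
```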

10. Apply the basic block concepts, how would you represent the dummy blocks S CO5
with no statements indicated in global dataflow analysis?
• In global dataflow analysis in compiler design, dummy blocks are introduced to
represent certain program structures or behaviors that may not have a direct
equivalent in the original code.

PART-B (Total=40 Marks)


(2 × 16 = 32 & 1 × 8 = 8)
Answer All Questions
(Each answer should be written for minimum 5 pages with minimum 25 lines per page)
11. A (i) Discuss in detail about the role of Lexical Analyzer U (8) CO1
Lexical Analyzer
The main task of the lexical analyser is to read the source program, scan
the input characters, group them into lexemes and produce the tokens as
output. The stream of tokens is sent to the parser for syntax analysis. It
interacts with the symbol table. When it discovers a lexeme constituting an
identifier, it needs to enter that lexeme into the symbol table. In some cases,
information regarding the kind of identifier may be read from the symbol
table by the lexical analyser to assist it in determining the proper token it must
pass to the parser. The interaction between the lexical analyser and the parser
is implemented by the getNextToken command (Fig 1.9). When the
command is issued by the parser, the lexical analyser reads characters from
its input until it identifies the next lexeme and produces for it the next
token, which it returns to the parser.

Functions of Lexical analyser


The following are some other tasks performed by the lexical analyser:
• It eliminates whitespace and comments.
• It generates the symbol table which stores the information about
identifiers and constants encountered in the input.
 It keeps track of line numbers.
 It reports the error encountered while generating the tokens.
 It expands the macros when the source program uses a
macro-pre-processor
Processes of lexical analyser
Sometimes, the lexical analyser is divided into a cascade of two processes:
 Scanning: It performs reading of input characters, removal
of white spaces and comments
 Tokenization: It is the more complex portion, where the
scanner produces the sequence of tokens as output
Lexical Analysis versus Parsing
There are a number of reasons why the analysis portion of a compiler is
normally separated into lexical analysis and parsing (syntax analysis)
phases:
1. Simplicity of design is the most important consideration. The separation
of lexical and syntactic analysis often allows us to simplify at least one of
these tasks. The removal of white space and comments by the lexical
analyzer lets the syntax analyzer deal only with meaningful syntactic constructs.
2. Compiler efficiency is improved. A separate lexical analyzer allows us
to apply specialized techniques that serve only the lexical task, not the job
of parsing. In addition, specialized buffering techniques for reading input
characters can speed up the compiler significantly
3. Compiler portability is enhanced. Input-device-specific peculiarities can
be restricted to the lexical analyser
Tokens, Patterns and Lexemes
When discussing lexical analysis, we use three related but distinct terms:
Token: Token is a valid sequence of characters which are given by
lexeme. In a programming language, Keywords, constant, identifiers,
numbers, operators and punctuations symbols are possible tokens to be
identified.
Pattern: Pattern describes a rule that must be matched by sequence of
characters (lexemes) to form a token. It can be defined by regular
expressions or grammar rules
Lexeme: Lexeme is a sequence of characters that matches the pattern for
a token i.e., instance of a token.
Table 1.4 Examples of token

Attributes of tokens:
• The lexical analyzer collects information about tokens into their
associated attributes. The tokens influence parsing decisions and the
attributes influence the translation of tokens.
• Usually a token has a single attribute: a pointer to the symbol-table
entry in which the information about the token is kept.
Lexical Errors
• A character sequence that cannot be scanned into any valid token is a
lexical error.
• Lexical errors are uncommon, but they still must be handled by a
scanner.
• Misspelling of identifiers, keyword, or operators are considered as
lexical errors. Usually, a lexical error is caused by the appearance of some
illegal character, mostly at the beginning of a token.
Lexical error handling approaches
Lexical errors can be handled by the following actions:
• Deleting one character from the remaining input.
• Inserting a missing character into the remaining input.
• Replacing a character by another character.
• Transposing two adjacent characters.
11. A ii) Elaborate in detail about the recognition of tokens U (8) CO1
Recognition of token explains how to take the patterns for all needed
tokens. It builds a piece of code that examines the input string and finds a
prefix that is a lexeme matching one of the patterns.
Rules for conditional statement or branching statement can be given as
follows
stmt → if expr then stmt
     | if expr then stmt else stmt
expr → term relop term
     | term
term → id
     | number
For relop, we use the comparison operations of languages like Pascal or
SQL where = is “equals” and <> is “not equals” because it presents an
interesting structure of lexemes.
The terminal of grammar, which are if, then, else, relop, id and numbers
are the names of tokens as far as the lexical analyzer is concerned, the
patterns for the tokens are described using regular definitions.
The patterns for the tokens:
digit  → [0-9]
digits → digit+
number → digits (. digits)? (E [+-]? digits)?
letter → [A-Za-z]
id     → letter (letter | digit)*
if     → if
then   → then
else   → else
relop  → < | > | <= | >= | = | <>
For easy recognition, keywords are considered as reserved words even
though their lexemes match the pattern for identifiers. In addition, we
assign the lexical analyzer the job of stripping out white space, by
recognizing the "token" ws defined by:
ws-> (blank |tab| newline)+
or
delim-> blank| tab | newline
ws->delim+
Here, blank, tab and newline are abstract symbols that we use to express
the ASCII characters of the same names. Token ws is different from the
other tokens in that, when we recognize it, we do not return it to the parser,
but rather restart the lexical analysis from the character that follows the
white space.
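The regular definitions above can be sketched as a tokenizer built from one combined regular expression (the use of Python's `re` named groups and the exact token spec are illustrative assumptions):

```python
import re

# Token definitions from the text, as (name, pattern) pairs. Order matters:
# keywords before id, and ws first so it can be recognized and skipped.
TOKEN_SPEC = [
    ("ws",     r"[ \t\n]+"),                       # recognized but never returned
    ("if",     r"if\b"),
    ("then",   r"then\b"),
    ("else",   r"else\b"),
    ("number", r"\d+(?:\.\d+)?(?:E[+-]?\d+)?"),    # digits (. digits)? (E [+-]? digits)?
    ("id",     r"[A-Za-z][A-Za-z0-9]*"),           # letter (letter | digit)*
    ("relop",  r"<=|>=|<>|<|>|="),                 # longest alternatives first
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(text):
    tokens = []
    for m in MASTER.finditer(text):
        if m.lastgroup != "ws":                    # restart after white space
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

For example, `tokenize("if x <= 10 then y = 2")` yields the token stream if, id, relop, number, then, id, relop, number.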
(OR)
11. B Construct the minimized DFA for the regular expression (aa*|bb*) A (16) CO1
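The DFA figure is not reproduced here; a minimal sketch of the minimized DFA for (aa*|bb*) as a transition table (state names are illustrative; missing entries stand for the dead state):

```python
# Minimized DFA for (aa*|bb*): after the first symbol the machine commits
# to all-a's or all-b's; any mixed or empty input is rejected.
DELTA = {
    ("q0", "a"): "qa", ("qa", "a"): "qa",
    ("q0", "b"): "qb", ("qb", "b"): "qb",
}
ACCEPT = {"qa", "qb"}

def accepts(s):
    state = "q0"
    for ch in s:
        state = DELTA.get((state, ch))
        if state is None:          # fell into the (implicit) dead state
            return False
    return state in ACCEPT
```

So `accepts("aaa")` and `accepts("b")` hold, while `accepts("ab")` and `accepts("")` do not.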

12. A Construct a SLR parsing table for the following grammar. A (16) CO2
E→E+T|T
T → TF | F
F →F* | a | b
Solution
Step1 − Construct the augmented grammar and number the productions.
(0) E′ → E
(1) E → E + T
(2) E → T
(3) T → TF
(4) T → F
(5) F → F ∗
(6) F → a
(7) F → b
Step2 − Find closure & goto Functions to construct LR (0) items.
Box represents the New states, and the circle represents the Repeating
State.
Computation of FOLLOW:
We can find out FOLLOW(E) = {+, $}
FOLLOW(T) = {+, a, b, $}
FOLLOW(F) = {+,*, a, b, $}
(OR)
12. B i) Explain in detail about the Context-free Grammar U (8) CO2
Refer model set – 3 answer key

12. B ii) Describe about the parse tree with suitable example U (8) CO2
Refer model set – 3 answer key

13. A i) Construct a Syntax-Directed Translation scheme that translates S (8) CO3


arithmetic expressions from infix into postfix notation. Your
solution should include the context-free grammar, the semantic
attributes for each of the grammar symbols, and semantic rules.
Show the application of your scheme to the input "3*4+5*2".
Syntax Directed Translation is a set of productions that have semantic
rules embedded inside it. The syntax-directed translation helps in the
semantic analysis phase in the compiler. SDT has semantic actions along
with the production in the grammar. Postfix SDTs are the SDTs that have
semantic actions at the right end of the production, which is the form used
in this answer.
Postfix Translation Schemes:
 The syntax-directed translation which has its semantic actions at the
end of the production is called the postfix translation scheme.
 This type of translation of SDT has its corresponding semantics at
the last in the RHS of the production.
 SDTs which contain the semantic actions at the right ends of the
production are called postfix SDTs.
Context-Free Grammar (CFG) For Arithmetic Expressions:
E -> E + T { print('+') }
E -> E - T { print('-') }
E -> T
T -> T * F { print('*') }
T -> T / F { print('/') }
T -> F
F -> ( E )
F -> id { print(id.lexeme) }
In this CFG, E represents an expression, T represents a term,
and F represents a factor. The terminal id represents an identifier (i.e., a
number or a variable).
The semantic rules are embedded within the CFG. Whenever a
production rule is used, the corresponding semantic rule is executed. In
this case, the semantic rules are simply to print the operator.
Now, let’s apply this SDT scheme to the input “3*4+5*2”. We’ll
assume that each number is an id.
1. Parse “3*4+5*2” using the CFG. This will result in the following
parse tree:
E
├── E
│   └── T
│       ├── T
│       │   └── F → id (3)
│       ├── *
│       └── F → id (4)
├── +
└── T
    ├── T
    │   └── F → id (5)
    ├── *
    └── F → id (2)

2. Traverse the parse tree in post order (left, right, root), executing the
semantic rules as you go. This will result in the postfix expression
“34*52*+”.
3. So, the infix expression “3*4+5*2” translates to the postfix expression
“34*52*+” using this SDT scheme. Note that this scheme assumes that
the input is a well-formed infix expression and does not handle errors.
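The scheme above can be sketched as a recursive-descent translator that performs each semantic action at the right end of its production (single-character tokens and the function names are illustrative assumptions):

```python
# Postfix SDT sketch: emit operands as they are matched (F -> id { emit(id) })
# and each operator at the right end of its production.
def to_postfix(expr):
    out, pos = [], 0

    def peek():
        return expr[pos] if pos < len(expr) else None

    def factor():                       # F -> id { emit(id) }
        nonlocal pos
        out.append(expr[pos]); pos += 1

    def term():                         # T -> T * F { emit('*') } | T / F { emit('/') } | F
        nonlocal pos
        factor()
        while peek() in ('*', '/'):
            op = expr[pos]; pos += 1
            factor(); out.append(op)    # action at the right end

    def expression():                   # E -> E + T { emit('+') } | E - T { emit('-') } | T
        nonlocal pos
        term()
        while peek() in ('+', '-'):
            op = expr[pos]; pos += 1
            term(); out.append(op)

    expression()
    return "".join(out)
```

Running `to_postfix("3*4+5*2")` reproduces the traversal result above: `"34*52*+"`.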

13. A ii) Explain in detail about syntax directed definitions and its types U (8) CO3
Syntax-directed definition
A syntax-directed definition (SDD) is a context-free grammar
together with attributes and rules. Attributes are associated with grammar
symbols and rules are associated with productions. If X is a symbol and a
is one of its attributes, then we write X.a to denote the value of a at a
particular parse-tree node labeled X. If we implement the nodes of the
parse tree by records or objects, then the attributes of X can be
implemented by data fields in the records that represent the nodes for X.
Attributes may be of any kind: numbers, types, table references, or
strings, for instance. The strings may even be long sequences of code, say
code in the intermediate language used by a compiler.
Annotated Parse Tree – The parse tree containing the values of attributes
at each node for given input string is called annotated or decorated parse
tree.
Inherited and Synthesized Attributes
There are two kinds of attributes for non terminals:
1. A synthesized attribute for a nonterminal A at a parse-tree node
N is defined by a semantic rule associated with the production at N. Note
that the production must have A as its head. A synthesized attribute at
node N is defined only in terms of attribute values at the children of N
and at N itself.
Example:
E --> E1 + T { E.val = E1.val + T.val}
In this, E.val derive its values from E1.val and T.val

2. An inherited attribute for a nonterminal B at a parse-tree node N


is defined by a semantic rule associated with the production at the parent
of N. Note that the production must have B as a symbol in its body. An
inherited attribute at node N is defined only in terms of attribute values at
N's parent, N itself, and N's siblings.
Example:
A --> BCD { C.in = A.in, C.type = B.type }
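The synthesized-attribute rule E.val = E1.val + T.val can be sketched as a bottom-up (post-order) evaluation over parse-tree nodes, where each node's attribute depends only on its children — the node class and labels here are illustrative:

```python
# Bottom-up evaluation of a synthesized attribute: a node's val is computed
# only from its children's val, mirroring E -> E1 + T { E.val = E1.val + T.val }.
class Node:
    def __init__(self, label, children=(), val=None):
        self.label, self.children, self.val = label, list(children), val

def evaluate(node):
    for child in node.children:
        evaluate(child)                   # children first (post-order)
    if node.label == "E+T":
        node.val = node.children[0].val + node.children[1].val
    elif node.label == "E*T":
        node.val = node.children[0].val * node.children[1].val
    # leaves ("num") already carry their val
    return node.val
```

For instance, the annotated parse tree for 3 + 4 * 5 evaluates to 23, with each internal node's val filled in from below.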

(OR)
13. B How could you generate the intermediate code for the flow of control U (16) CO3
statement? Explain with an example
Refer model set – 3 answer key
14. A Illustrate in detail about the code generation algorithm with an U (16) CO4
example
The Code-Generation Algorithm:

An essential part of the algorithm is a function getReg(I), which selects


registers for each memory location associated with the three-address
instruction I. Function getReg has access to the register and address
descriptors for all the variables of the basic block, and may also have
access to certain useful data- flow information such as the variables that
are live on exit from the block. We shall discuss getReg after presenting
the basic algorithm. While we do not know the total number of registers
available for local data belonging to a basic block, we assume that there
are enough registers so that, after freeing all available registers by storing
their values in memory, there are enough registers to accomplish any three-
address operation.
In a three-address instruction such as x=y+z, we shall treat + as a generic
operator and ADD as the equivalent machine instruction. We do not,
therefore, take advantage of commutativity of +. Thus, when we
implement the operation, the value of y must be in the second register
mentioned in the ADD instruction, never the third. A possible
improvement to the algorithm is to generate code for both x = y + z and
x=z+y whenever + is a commutative operator, and pick the better code
sequence.
Machine Instructions for Operations
For a three-address instruction such as x = y + z, do the following:
1. Use getReg(x = y + z) to select registers for x, y, and z. Call these
Rx, Ry, and Rz.
2. If y is not in Ry (according to the register descriptor for Ry), then issue
an instruction LD Ry, y', where y' is one of the memory locations for y
(according to the address descriptor for y).
3. Similarly, if z is not in Rz, issue an instruction LD Rz, z', where z' is a
location for z.
4. Issue the instruction ADD Rx, Ry, Rz.
Machine Instructions for Copy Statements
There is an important special case: a three-address copy statement
of the form x = y. We assume that getReg will always choose the same
register for both x and y. If y is not already in that register Ry, then
generate the machine instruction LD Ry, y. If y was already in Ry, we do
nothing. It is only necessary that we adjust the register descriptor for Ry,
so that it includes x as one of the values found there.
Ending the Basic Block
As we have described the algorithm, variables used by the block
may wind up with their only location being a register. If the variable is a
temporary used only within the block, that is fine; when the block ends,
we can forget about the value of the temporary and assume its register is
empty. However, if the variable is live on exit from the block, or if we
don't know which variables are live on exit, then we need to assume that
the value of the variable is needed later. In that case, for each variable x
whose address descriptor does not say that its value is located in the
memory location for x, we must generate the instruction ST x, R, where R
is a register in which x's value exists at the end of the block.
Managing Register and Address Descriptors
As the code-generation algorithm issues load, store, and other
machine instructions, it needs to update the register and address
descriptors. The rules are as follows:
1. For the instruction LD R, x
(a) Change the register descriptor for register R so it holds only x.
(b) Change the address descriptor for x by adding register R as an
additional location.
2. For the instruction ST x, R, change the address descriptor for x
to include its own memory location.
3. For an operation such as ADD Rx, Ry, Rz, implementing a
three-address instruction x = y + z:
(a) Change the register descriptor for Rx, so that it holds only x.
(b) Change the address descriptor for x so that its only location is
Rx. Note that the memory location for x is not now in the address
descriptor for x.
(c) Remove Rx from the address descriptor of any variable other
than x.
4. When we process a copy statement x = y, after generating the
load for y into register Ry, if needed, and after managing descriptors as
for all load statements (per rule 1):
(a) Add x to the register descriptor for Ry.
(b) Change the address descriptor for x so that its only location is Ry.
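The descriptor updates for LD, ST and ADD can be sketched as dictionary manipulations (the data layout and names are illustrative assumptions, not a prescribed implementation):

```python
# Register and address descriptors, following rules 1-3 above.
reg_desc  = {}     # register -> set of variables it currently holds
addr_desc = {}     # variable -> set of locations ("mem" or register names)

def load(r, x):    # LD r, x: r holds only x; r becomes an extra location for x
    reg_desc[r] = {x}
    addr_desc.setdefault(x, {"mem"}).add(r)

def store(x, r):   # ST x, r: x's own memory location becomes current again
    addr_desc.setdefault(x, set()).add("mem")

def add(rx, ry, rz, x):   # ADD rx, ry, rz for x = y + z
    reg_desc[rx] = {x}
    addr_desc[x] = {rx}   # the memory location for x is no longer current
    for var, locs in addr_desc.items():
        if var != x:
            locs.discard(rx)   # rx no longer holds any other variable
```

For example, after LD R1, a; LD R2, b; ADD R2, R1, R2 (for t = a - b, treating SUB like ADD here), R2's descriptor holds only t, t's only location is R2, and b's only location is memory.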

Example: Let us translate the basic block consisting of the three-address


statements
t=a-b
u=a-c
v=t+u
a=d
d=v+u

Here we assume that t, u, and v are temporaries, local to the block,


while a, b, c, and d are variables that are live on exit from the block. Since
we have not yet discussed how the function getReg might work, we shall
simply assume that there are as many registers as we need, but that when a
register's value is no longer needed (for example, it holds only a
temporary, all of whose uses have been passed), then we reuse its register.

For the first three-address instruction, t=a-b we need to issue three


instructions, since nothing is in a register initially. Thus, we see a and b
loaded into registers R1 and R2, and the value t produced in register R2.
Notice that we can use R2 for t because the value b previously in R2 is not
needed within the block. Since b is presumably live on exit from the
block, had it not been in its own memory location (as indicated by its
address descriptor), we would have had to store R2 into b first. The
decision to do so, had we needed R2, would be taken by getReg. The
second instruction, u=a-c, does not require a load of a, since it is already
in register R1. Further, we can reuse R1 for the result, u, since the value of
a, previously in that register, is no longer needed within the block, and its
value is in its own memory location if a is needed outside the block. Note
that we change the address descriptor for a to indicate that it is no longer
in R1, but is in the memory location called a. The third instruction, v=t+u,
requires only the addition. Further, we can use R3 for the result, v, since
the value of c in that register is no longer needed within the block, and c
has its value in its own memory location. The copy instruction, a = d,
requires a load of d, since it is not in a register. We show register R2's
descriptor holding both a and d. The addition of a to the register descriptor
is the result of our processing the copy statement, and is not the result of
any machine instruction. The fifth instruction, d=v+u, uses two values that
are in registers. Since u is a temporary whose value is no longer needed,
we have chosen to reuse its register R1 for the new value of d. Notice that
d is now in only R1, and is not in its own memory location. The same
holds for a, which is in R2 and not in the memory location called a. As a
result, we need a "coda" to the machine code for the basic block that
stores the live-on-exit variables a and d into their memory locations. We
show these as the last two instructions.
(OR)
14. B Explain the use of symbol table in compilation process. List out the U (16) CO4
various attributes for implementing the symbol table
Refer model set – 3 answer key
15. A Can you provide a detailed explanation of the principal sources of R (16) CO5
optimization?
Refer model set – 2 answer key
(OR)
15. B i) Construct an algorithm for building a Dominator Tree for a flow graph U (8) CO5
Refer model set – 3 answer key
15. B ii) Explain the concept of data flow analysis in flow graphs with U (8) CO5
suitable example
Refer model set – 3 answer key

Staff In-charge HoD
