Compiler Design Question and Answer Key - 1
1. What is a Compiler?
2. State some compiler construction tools.
3. What is a lexeme? Define a regular set.
4. What are the Error-recovery actions in a lexical analyzer?
5. List the properties of LR parser.
6. Write short notes on YACC.
7. What are kernel and non-kernel items?
8. Define back patching.
9. Define basic block and flow graph.
10. What are the characteristics of peephole optimization?
PART B (5 x 16 = 80 Marks)
11. (a) (i) What is a Compiler? Write notes on LEX tools. (8)
(ii) Briefly explain grouping of phases. (8)
OR
(b) Explain the phases of a compiler with a neat sketch. (16)
12. (a) Define a nondeterministic finite automaton (NFA). Write an algorithm to simulate an NFA. (16)
OR
(b) (i) Explain specification of tokens. (8)
(ii) What is the role of the lexical analyzer in a compilation process? What are lexemes and tokens? (8)
14. (a) (i) Generate intermediate code for the following code segment, along with the required syntax-directed translation scheme: (8)
if (a > b)
x = a + b
else
x = a - b
where a and x are of type real and b is of type int.
(ii) Write short notes on back-patching. (8)
OR
(b) (i) Explain the code generation phase with a simple code generation algorithm. (10)
(ii) Write short notes on next-use information with a suitable example. (6)
1. A compiler is a program that reads a program written in one language (the source language) and
translates it into an equivalent program in another language (the target language). As an important
part of this translation process, the compiler reports to its user the presence of errors in the source
program.
2. i. Parser generators
ii. Scanner generators
iii. Syntax-directed translation engines
iv. Automatic code generators
v. Data-flow engines
3. A Lexeme is a sequence of characters in the source program that is matched by the pattern for a
token.
A language denoted by a regular expression is said to be a regular set.
5. i. LR parsers can be constructed to recognize most programming language constructs for which
context-free grammars can be written.
ii. The class of grammars that can be parsed by LR parsers is a superset of the class of grammars that
can be parsed using predictive parsers.
iii. LR parsing is a non-backtracking shift-reduce technique, yet it can be implemented efficiently.
7. i. Kernel items: the initial item, S' -> .S, and all items whose dots are not at the left end.
ii. Non-kernel items: all other items, which have their dots at the left end.
9. A basic block is a sequence of consecutive statements in which flow of control enters at the
beginning and leaves at the end without halt or possibility of branching except at the end.
A flow graph is a directed graph that adds flow-of-control information to the set of basic blocks
making up a program: its nodes are the basic blocks and its edges show which block can follow which.
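The two definitions in answer 9 can be illustrated with a small sketch that partitions three-address code into basic blocks and then builds the flow graph. It uses the usual leader rule (the first statement, any jump target, and any statement following a jump start a new block). The instruction format (strings ending in "goto N", with N an index into the list) and the names find_leaders, basic_blocks and flow_graph are assumptions made for this illustration only.

```python
import re

def find_leaders(code):
    """Return the sorted indices of leaders (first statements of basic blocks).

    Assumed format: each instruction is a string, and a jump ends with
    'goto N', where N is the index of the target instruction.
    """
    leaders = {0} if code else set()
    for i, instr in enumerate(code):
        m = re.search(r"goto (\d+)$", instr)
        if m:
            target = int(m.group(1))
            if target < len(code):
                leaders.add(target)      # the target of a jump is a leader
            if i + 1 < len(code):
                leaders.add(i + 1)       # so is the statement after a jump
    return sorted(leaders)

def basic_blocks(code):
    """Each block runs from one leader up to, but not including, the next."""
    leaders = find_leaders(code)
    bounds = leaders + [len(code)]
    return [code[bounds[k]:bounds[k + 1]] for k in range(len(leaders))]

def flow_graph(code):
    """Return (blocks, edges); an edge (i, j) means block j can follow block i."""
    leaders = find_leaders(code)
    blocks = basic_blocks(code)
    block_of = {leader: b for b, leader in enumerate(leaders)}
    edges = set()
    for b, block in enumerate(blocks):
        last = block[-1]
        m = re.search(r"goto (\d+)$", last)
        if m and int(m.group(1)) in block_of:
            edges.add((b, block_of[int(m.group(1))]))   # jump edge
        if not last.startswith("goto") and b + 1 < len(blocks):
            edges.add((b, b + 1))                       # fall-through edge
    return blocks, sorted(edges)

code = [
    "i = 1",               # 0
    "t1 = i * 2",          # 1
    "i = i + 1",           # 2
    "if i <= 10 goto 1",   # 3
    "x = t1",              # 4
]
blocks, edges = flow_graph(code)
```

Here the leaders are statements 0, 1 and 4, so the program splits into three blocks, and the loop shows up as an edge from the second block back to itself.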
LEXICAL ANALYSIS:
It is the first phase of the compiler. It gets input from the source program and produces tokens as output.
It reads the characters one by one, from left to right, and forms the tokens.
Token: It represents a logically cohesive sequence of characters such as keywords, operators, identifiers, special symbols etc.
Example: a + b = 20
Here, a, b, +, =, 20 are all separate tokens.
The group of characters forming a token is called the lexeme.
The lexical analyser not only generates a token but also enters the lexeme into the symbol table if it is not already there.
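The behaviour described above can be sketched as a small scanner. This is only an illustration: the token class names (NUMBER, ID, OP), the patterns and the function name tokenize are assumptions for a toy language, not part of the original notes.

```python
import re

# Hypothetical token classes for a toy language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Scan the source from left to right and return (tokens, symbol_table)."""
    tokens, symbol_table = [], {}
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue                      # white space separates tokens but is not one
        if kind == "ID":
            # Enter the lexeme into the symbol table only if it is not already there.
            symbol_table.setdefault(lexeme, {"kind": "identifier"})
        tokens.append((kind, lexeme))
    return tokens, symbol_table

tokens, table = tokenize("a + b = 20")
```

For the example a + b = 20 this yields five tokens, and only the identifiers a and b are entered into the symbol table.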
SYNTAX ANALYSIS:
It is the second phase of the compiler. It is also known as the parser. It gets the token stream as input from the
lexical analyser of the compiler and generates a syntax tree as the output.
Syntax tree:
It is a tree in which interior nodes are operators and leaf nodes are operands.
Example: For a=b+c*2, the syntax tree has = at the root, with a as its left child and + as its right child; + has children b and *, and * has children c and 2.
SEMANTIC ANALYSIS:
It is the third phase of the compiler.
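It gets the parse tree from syntax analysis as input and checks whether the program is semantically correct, for example that each operator has operands of compatible types.
It performs type conversions where needed, such as converting an integer operand to real when it is used with a real operand.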
INTERMEDIATE CODE GENERATION:
It is the fourth phase of the compiler. It gets input from the semantic analysis and converts the input into
output as intermediate code such as three address code.
The three-address code consists of a sequence of instructions, each of which has at most three operands.
Example: t1=t2+t3
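As an illustration, the sketch below turns the syntax tree of a = b + c * 2 into three-address code by walking the tree bottom-up and creating one temporary per operator. The nested-tuple tree representation and the name translate_assignment are assumptions made for this example.

```python
from itertools import count

def translate_assignment(target, expr):
    """Translate target = expr into a list of three-address instructions.

    expr is either a leaf (a string) or a tuple (op, left, right).
    """
    code = []
    temps = count(1)

    def place(node):
        # A leaf is its own "place"; an operator node gets a fresh temporary.
        if isinstance(node, str):
            return node
        op, left, right = node
        left_place = place(left)
        right_place = place(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {left_place} {op} {right_place}")
        return temp

    code.append(f"{target} = {place(expr)}")
    return code

tac = translate_assignment("a", ("+", "b", ("*", "c", "2")))
# tac == ["t1 = c * 2", "t2 = b + t1", "a = t2"]
```

Each generated instruction has at most three operands, one result and two arguments.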
CODE OPTIMIZATION:
It is the fifth phase of the compiler. It gets the intermediate code as input and produces optimized
intermediate code as output. This phase reduces the redundant code and attempts to improve the
intermediate code so that faster-running machine code will result.
During the code optimization, the result of the program is not affected.
To improve the generated code, the optimization involves
- detection and removal of dead code (unreachable code).
- evaluation of constant expressions and terms at compile time.
- collapsing of repeated expressions into a temporary variable.
- loop unrolling.
- moving code outside the loop.
- removal of unwanted temporary variables.
CODE GENERATION:
It is the final phase of the compiler.
It gets input from code optimization phase and produces the target code or object code as result.
Intermediate instructions are translated into a sequence of machine instructions that perform the same
task.
The code generation involves
- allocation of register and memory
- generation of correct references
- generation of correct data types
- generation of missing code
SYMBOL TABLE MANAGEMENT:
Symbol table is used to store all the information about identifiers used in the program.
It is a data structure containing a record for each identifier, with fields for the attributes of the identifier.
It allows the compiler to find the record for each identifier quickly and to store or retrieve data from that record.
Whenever an identifier is detected in any of the phases, it is stored in the symbol table.
ERROR HANDLING:
Each phase can encounter errors. After detecting an error, a phase must handle the error so that
compilation can proceed.
In lexical analysis, errors occur in separation of tokens. In syntax analysis, errors occur during
construction of the syntax tree. In semantic analysis, errors occur when the compiler detects constructs with
the right syntactic structure but no meaning, and during type conversion. In code optimization, errors occur when
the result is affected by the optimization. In code generation, errors are reported when, for example, code is missing.
14 A I Intermediate code
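One possible shape of the three-address code for the segment in question 14 (a) (i) is sketched below. This is an illustration, not the official answer: the operator names inttoreal, real+ and real-, the labels L1 to L3 and the helper names are all assumptions, and the syntax-directed translation scheme itself is not reproduced.

```python
from itertools import count

TYPES = {"a": "real", "x": "real", "b": "int"}   # declarations given in the question

def translate_if_else():
    """Emit three-address code for: if (a > b) x = a + b else x = a - b."""
    code, temps = [], count(1)

    def new_temp():
        return f"t{next(temps)}"

    def as_real(name):
        # An int operand is converted to real before it meets a real operand.
        if TYPES[name] == "real":
            return name
        temp = new_temp()
        code.append(f"{temp} = inttoreal {name}")
        return temp

    def assign(target, left, op, right):
        left_place, right_place = as_real(left), as_real(right)
        temp = new_temp()
        code.append(f"{temp} = {left_place} real{op} {right_place}")
        code.append(f"{target} = {temp}")

    code.append(f"if {as_real('a')} > {as_real('b')} goto L1")
    code.append("goto L2")
    code.append("L1:")
    assign("x", "a", "+", "b")
    code.append("goto L3")
    code.append("L2:")
    assign("x", "a", "-", "b")
    code.append("L3:")
    return code

listing = translate_if_else()
```

The listing starts with t1 = inttoreal b followed by if a > t1 goto L1, and each branch converts b again before the real addition or subtraction; a later optimization pass could reuse the first conversion.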
14 A II back-patching. (8)
Backpatching is the activity of filling in unspecified label information, using
appropriate semantic actions, during the code generation process. (2)
The functions used in the semantic actions are: (2)
mklist(i) - creates a new list containing only i, an index into the array of quadruples.
merge(p1, p2) - merges the two lists pointed to by p1 and p2.
backpatch(p, j) - inserts the target label j into each quadruple on the list pointed to by p.
Example: (4)
Source:
if a or b then
if c then
x= y+1
Translation:
if a goto L1
if b goto L1
goto L3
L1: if c goto L2
goto L3
L2: x= y+1
L3:
After Backpatching:
100: if a goto 103
101: if b goto 103
102: goto 106
103: if c goto 105
104: goto 106
105: x=y+1
106:
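The example can be reproduced with a short sketch of mklist, merge and backpatch. The quadruple representation (a string ending in "goto _" for a jump whose target is not yet known), the helpers emit and nextquad, and the hand-written driver translate_example are assumptions for this illustration; a real translator would call these functions from its semantic actions.

```python
BASE = 100    # address of the first quadruple, matching the listing above
quads = []    # generated quadruples; an unfilled jump ends with "goto _"

def nextquad():
    return BASE + len(quads)

def emit(text):
    """Append a quadruple and return its address."""
    quads.append(text)
    return nextquad() - 1

def mklist(i):
    return [i]

def merge(p1, p2):
    return p1 + p2

def backpatch(p, j):
    """Fill in j as the target of every unfilled jump on list p."""
    for i in p:
        text = quads[i - BASE]
        assert text.endswith("goto _"), "only unfilled jumps can be patched"
        quads[i - BASE] = text[:-1] + str(j)

def translate_example():
    """Hand-driven translation of: if a or b then if c then x = y + 1."""
    quads.clear()
    a_true = mklist(emit("if a goto _"))      # 100
    b_true = mklist(emit("if b goto _"))      # 101
    b_false = mklist(emit("goto _"))          # 102
    backpatch(merge(a_true, b_true), nextquad())   # "a or b" is true: go to 103
    c_true = mklist(emit("if c goto _"))      # 103
    c_false = mklist(emit("goto _"))          # 104
    backpatch(c_true, nextquad())             # "c" is true: go to 105
    emit("x = y + 1")                         # 105
    backpatch(merge(b_false, c_false), nextquad())  # both false exits: go to 106
    return [f"{BASE + k}: {q}" for k, q in enumerate(quads)]

listing = translate_example()
```

The jumps are emitted before their targets exist, and each list of unfilled jumps is patched as soon as the address of its target becomes known.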
14 B I code generation phase with simple code generation algorithm. (10)
It generates target code for a sequence of three address statements. (2)
Assumptions:
For each operator in three address statement, there is a corresponding target language operator.
Computed results can be left in registers as long as possible.
E.g. a = b + c: (2)
Add Rj, Ri (cost 1), if Ri holds b and Rj holds c; the result is left in Ri.
Add c, Ri (cost 2), if Ri holds b and c is in memory; the result is left in Ri.
Mov c, Rj; Add Rj, Ri (cost 3), if Ri holds b and c is first loaded into Rj.
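The costs quoted above are consistent with the usual rule that an instruction costs one plus one for each operand that is a memory address. The sketch below applies that rule; the rule itself, the register names Ri and Rj, and the function names are assumptions for this illustration.

```python
REGISTERS = {"Ri", "Rj"}   # assumed register names, as used in the example

def instruction_cost(instr):
    """Cost = 1 for the instruction plus 1 for every memory operand."""
    _, operands = instr.split(None, 1)
    return 1 + sum(1 for o in operands.split(",") if o.strip() not in REGISTERS)

def sequence_cost(sequence):
    """Total cost of a ';'-separated instruction sequence."""
    return sum(instruction_cost(i.strip()) for i in sequence.split(";") if i.strip())

costs = [
    sequence_cost("Add Rj, Ri"),
    sequence_cost("Add c, Ri"),
    sequence_cost("Mov c, Rj; Add Rj, Ri"),
]
# costs == [1, 2, 3]
```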
Register descriptor: Keeps track of what is currently in each register.
Address descriptor: Keeps track of the location where the current value of a name can be found at run
time. (2)
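Code generation algorithm: For x = y op z (2)
Invoke the function getreg to determine the location L where the result of y op z should be stored
(a register or a memory location).
Consult the address descriptor for y to determine y', the current location of y. If the value of y is not already in L, generate the instruction Mov y', L.
Generate the instruction op z', L, where z' is the current location of z, and update the address descriptor of x to indicate that x is in L.
If the current values of y and/or z have no next uses and are in registers, alter the register descriptor to indicate that those registers no longer hold y and/or z.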
Getreg: (2)
If y is in a register that holds the value of no other names, and y is not live and has no next use after the statement, return the register of y for L.
Failing that, return an empty register for L if there is one.
Failing that, if x has a next use in the block, find an occupied register, store its value in memory, and use it for L.
If x is not used in the block, or no suitable register is found, select the memory location of x as L.
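A simplified sketch of the algorithm and of getreg follows. It is not a complete implementation: it always chooses a register for L (the memory-location case of getreg is left out), it spills the first register when none is free, and it computes liveness by scanning the rest of the block. The statement format (x, op, y, z), the mnemonics MOV, ADD, SUB and MUL, and the register names R0 and R1 are assumptions for this illustration.

```python
OPS = {"+": "ADD", "-": "SUB", "*": "MUL"}

def generate(block, live_on_exit, registers=("R0", "R1")):
    """Generate code for a basic block of statements (x, op, y, z)."""
    reg = {r: set() for r in registers}   # register descriptor
    addr = {}                             # address descriptor: name -> locations
    out = []

    def locs(name):
        # A name not seen before is assumed to be in its own memory location.
        return addr.setdefault(name, {name})

    def where(name):
        in_regs = sorted(l for l in locs(name) if l in reg)
        return in_regs[0] if in_regs else name    # prefer a register copy

    def live_after(i, name):
        for x, _, y, z in block[i + 1:]:
            if name in (y, z):
                return True
            if name == x:
                return False
        return name in live_on_exit

    def getreg(i, y):
        ry = where(y)
        if ry in reg and reg[ry] == {y} and not live_after(i, y):
            return ry                             # reuse the register of y
        for r in registers:
            if not reg[r]:
                return r                          # an empty register
        r = registers[0]                          # naive choice of victim
        for name in sorted(reg[r]):
            if name not in locs(name):            # no up-to-date memory copy
                out.append(f"MOV {r}, {name}")
                locs(name).add(name)
            locs(name).discard(r)
        reg[r] = set()
        return r

    for i, (x, op, y, z) in enumerate(block):
        L = getreg(i, y)
        if where(y) != L:
            out.append(f"MOV {where(y)}, {L}")
        out.append(f"{OPS[op]} {where(z)}, {L}")
        for name in reg[L]:
            locs(name).discard(L)                 # L loses its old contents
        for r in reg:
            reg[r].discard(x)                     # older copies of x are stale
        reg[L] = {x}
        addr[x] = {L}
        for name in {y, z} - {x}:
            if not live_after(i, name):           # free registers of dead operands
                for r in reg:
                    reg[r].discard(name)
                locs(name).intersection_update({name})

    for name in sorted(live_on_exit):             # store live values at block end
        if name not in locs(name):
            out.append(f"MOV {where(name)}, {name}")
            locs(name).add(name)
    return out

block = [("t", "-", "a", "b"), ("u", "-", "a", "c"),
         ("v", "+", "t", "u"), ("d", "+", "v", "u")]
target = generate(block, live_on_exit={"d"})
```

For this block, which computes d = (a - b) + (a - c) + (a - c), the sketch produces seven instructions and reuses R0 for t, v and d because each of those names is dead once the next result is computed.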
15 Ai Code optimization is needed to make the code run faster or take less space
or both.
Function preserving transformations:
Common sub expression elimination
Copy propagation
Dead-code elimination
Constant folding
Common sub expression elimination: (2)
An occurrence of an expression E is called a common subexpression if E was previously computed and the
values of the variables in E have not changed since the previous computation.
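A minimal sketch of this idea within a single basic block is given below. The statement format (target, op, arg1, arg2), the "copy" pseudo-operator and the function name are assumptions for this illustration; the sketch compares expressions textually and does not recognise a + b and b + a as the same expression.

```python
def eliminate_common_subexpressions(block):
    """Replace recomputations of an available expression by a copy."""
    available = {}    # (op, arg1, arg2) -> name that currently holds the value
    out = []
    for target, op, arg1, arg2 in block:
        key = (op, arg1, arg2)
        if key in available:
            out.append((target, "copy", available[key], None))
        else:
            out.append((target, op, arg1, arg2))
        # Assigning to target invalidates every expression that uses target
        # as an operand, and every expression whose value was held in target.
        available = {k: v for k, v in available.items()
                     if target not in (k[1], k[2]) and v != target}
        if target not in (arg1, arg2):
            available.setdefault(key, target)
    return out

block = [
    ("t1", "+", "a", "b"),
    ("t2", "+", "a", "b"),    # same expression, a and b unchanged
    ("a", "*", "t1", "c"),    # changes a, so a + b is no longer available
    ("t3", "+", "a", "b"),    # must be recomputed
]
optimized = eliminate_common_subexpressions(block)
```

Only the second statement is replaced by a copy of t1; the last one is kept because a changed in between.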
Copy propagation: (2)
Assignments of the form f := g are called copy statements, or copies for short. The idea
is to use g for f wherever possible after the copy statement.
Dead code elimination: (2)
A variable is live at a point in the program if its value can be used subsequently;
otherwise it is dead at that point. Statements that compute values that are never used are dead code and can be removed.
Constant folding:
Deducing at compile time that the value of an expression is a
constant and using the constant instead is called constant folding.
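Constant folding can be sketched on a small expression tree as follows. The nested-tuple representation (op, left, right), with numbers as constants and strings as variable names, and the function name fold are assumptions for this illustration.

```python
import operator

FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(expr):
    """Replace every subexpression whose operands are constants by its value."""
    if not isinstance(expr, tuple):
        return expr                       # a constant or a variable name
    op, left, right = expr
    left, right = fold(left), fold(right)
    both_constant = isinstance(left, (int, float)) and isinstance(right, (int, float))
    if both_constant and op in FOLDABLE:
        return FOLDABLE[op](left, right)  # computed once, at "compile time"
    return (op, left, right)

folded = fold(("+", "x", ("*", 2, ("+", 3, 4))))
# folded == ("+", "x", 14)
```

The subexpression 2 * (3 + 4) is replaced by 14, while the addition involving the variable x is left for run time.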
Loop optimization: (2)
Code motion: Moving code outside the loop.
It takes an expression that yields the same result regardless of how many times
the loop is executed (a loop-invariant computation) and places the expression before
the loop.
Induction variable elimination
Reduction in strength: Replacing an expensive operation by a cheaper one.
15 A II storage organization
Run-time storage: The block of memory obtained by the compiler from the operating system for the
compiled program to run in. It is subdivided into
Generated target code
Data objects
Stack to keep track of the activations
Heap to store all other information
Activation record: (Frame)
It is used to store the information required by a single procedure call.
The fields of an activation record are:
Returned value
Actual parameters
Optional control link
Optional access link
Saved machine status
Local data
Temporaries
Temporaries are used to hold values that arise in the evaluation of expressions. Local data is the data that
is local to the execution of the procedure. Saved machine status represents the status of the machine just before the
procedure is called. The control link (dynamic link) points to the activation record of the calling procedure.
The access link refers to the non-local data held in other activation records. Actual parameters are the values
passed to the called procedure. The returned value field is used by the called procedure to return a
value to the calling procedure.
Compile time layout of local data:
The amount of storage needed for a name is determined by its type. The field for the local data is laid out
as the declarations in a procedure are examined at compile time. The storage layout for data objects is
strongly influenced by the addressing constraints on the target machine.
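As an illustration, the sketch below assigns relative addresses to local names in declaration order and pads each one to a multiple of its own width, which is one common form of addressing constraint. The type widths in SIZES and the alignment rule are assumptions; real targets differ.

```python
SIZES = {"char": 1, "int": 4, "real": 8}   # assumed widths in bytes

def layout(declarations, sizes=SIZES):
    """Return ({name: offset}, total size) for a list of (name, type) pairs."""
    offset = 0
    offsets = {}
    for name, type_name in declarations:
        width = sizes[type_name]
        # Round the offset up to the next multiple of width (padding).
        offset = (offset + width - 1) // width * width
        offsets[name] = offset
        offset += width
    return offsets, offset

offsets, total = layout([("c", "char"), ("i", "int"), ("r", "real"), ("d", "char")])
# offsets == {"c": 0, "i": 4, "r": 8, "d": 16}, total == 17
```

Three bytes of padding are left after c so that i starts on a 4-byte boundary.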
(2) Parameter passing.
Call by value
A formal parameter is treated just like a local name. Its storage is in the activation record of the called
procedure.
The caller evaluates the actual parameters and places their r-values in the storage for the formals.
Call by reference
If an actual parameter is a name or an expression having an l-value, then that l-value itself is passed.
However, if the actual parameter is an expression that has no l-value (e.g. a+b or 2), the expression is evaluated in a new location
and the address of that location is passed.
(Figure: subdivision of run-time memory into code, static data, stack and heap.)
Copy-restore: a hybrid between call by value and call by reference (copy in, copy out).
Before the call, the actual parameters are evaluated; their r-values are passed and the l-values of the actuals are determined.
When the called procedure is done, the current r-values of the formals are copied back into
the l-values of the actuals.
Call by name
The procedure is treated like a macro (in-line expansion): its body is substituted for the call, with the actual parameters substituted for the formals.
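The difference between call by value, call by reference and copy-restore shows up when the called procedure also changes the actual parameter through a global name. The sketch below simulates the three mechanisms for a procedure with body x = x + 1; a = a * 10, called with the global a as the actual parameter. The dictionary-based memory model and all function names are assumptions made for this illustration.

```python
def body(read_x, write_x, env):
    """The called procedure: x = x + 1; a = a * 10 (a is a global)."""
    write_x(read_x() + 1)
    env["a"] = env["a"] * 10

def call_by_value(env):
    frame = {"x": env["a"]}                  # r-value copied into the formal
    body(lambda: frame["x"], lambda v: frame.__setitem__("x", v), env)

def call_by_reference(env):
    # The formal x is an alias for the l-value of the actual a.
    body(lambda: env["a"], lambda v: env.__setitem__("a", v), env)

def copy_restore(env):
    frame = {"x": env["a"]}                  # copy in
    body(lambda: frame["x"], lambda v: frame.__setitem__("x", v), env)
    env["a"] = frame["x"]                    # copy out

def run(mechanism):
    env = {"a": 1}
    mechanism(env)
    return env["a"]

results = (run(call_by_value), run(call_by_reference), run(copy_restore))
# results == (10, 20, 2)
```

Starting from a = 1, call by value leaves a = 10, call by reference leaves a = 20, and copy-restore leaves a = 2 because the copy-out overwrites the change made through the global name.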
15 B Peephole optimization