CD Imp Ques 1
UNIT 1
2M
Panic Mode Recovery is a common error recovery technique used in compilers to handle syntax
errors efficiently. It allows the parser to recover from errors and continue parsing the rest of the
input instead of stopping abruptly.
➢ When a syntax error is detected, the parser discards input symbols until it finds a
synchronizing token (such as a semicolon ; or a closing bracket }).
➢ The parser then resumes normal parsing from that point, preventing a cascade of spurious
error messages.
➢ The synchronizing tokens are chosen based on the language’s grammar (e.g., keywords,
delimiters).
7.DEFINE FA
8.DEFINE RE
12M
LEXICAL ANALYSER
Reads the stream of characters making up the source program and groups the characters into
meaningful sequences called lexemes. For each lexeme, the lexical analyser produces as output a
token of the form <token-name, attribute-value>, where token-name is an abstract symbol used
during syntax analysis and attribute-value points to an entry in the symbol table.
SYNTAX ANALYSER
Also called the parser. This phase groups the tokens produced by the lexical analyser into
grammatical phrases (syntactic structures).
SEMANTIC ANALYSIS
Checks for semantic errors. Concentrates on type checking, i.e., whether operands are type-compatible.
After semantic analysis, some compilers generate an explicit intermediate representation of the
source program. This representation should be easy to produce and easy to translate into the target
program. There are a variety of forms.
The most commonly used representation is the three-address format. The format consists of a
sequence of instructions, each of which has at most three operands.
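For example, the statement a = b + c * d could be translated into three-address code as:

t1 = c * d
t2 = b + t1
a = t2

Each instruction has at most one operator on the right-hand side, and the compiler-generated temporaries t1 and t2 hold intermediate results.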
CODE OPTIMISATION
This phase attempts to improve the intermediate code, so that faster running machine code will
result. There are various techniques used by most of the optimizing compilers, such as:
1. Constant folding
2. Copy propagation
3. Code motion
4. Reduction in strength
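For instance (illustrative fragments):

x = 4 * 2 → x = 8 (constant folding: evaluate constant expressions at compile time)
y = x; z = y + 1 → z = x + 1 (copy propagation: use the original variable instead of its copy)
a loop-invariant computation t = a + b is moved outside the loop (code motion)
i * 2 → i + i (reduction in strength: replace a costly operation with a cheaper one)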
CODE GENERATION
The final phase of the compiler is the generation of target code, consisting of relocatable machine
code or assembly code. Each intermediate instruction is translated into a sequence of machine
instructions that performs the same task. A crucial aspect is the assignment of variables to registers.
2.CONVERT NFA TO M-DFA
3.EXPLAIN INPUT BUFFERING 6M
Input buffering is a technique used in lexical analysis to efficiently read and process input characters
from a source file while minimizing the overhead of frequent system calls. Since reading characters
one at a time from disk or memory is slow, buffering improves efficiency by reading large blocks of
data at once.
Buffering Techniques
1. Buffer Pairs (Two-Buffer Scheme)
➢ The input is divided into two N-character buffers (typically 1024 or 4096 bytes each).
Working Mechanism:
➢ Lexeme Beginning Pointer (lexeme_beginning): marks the start of the current lexeme
(token).
➢ Forward Pointer (forward): moves ahead to identify the end of the token.
➢ End-of-buffer Handling: when the forward pointer reaches the end of one buffer, the other
buffer is loaded, and scanning continues seamlessly.
2. Sentinels
➢ Instead of checking for the end of the buffer on every character read (which adds extra
comparisons), a special sentinel character (EOF) is placed at the end of each buffer.
➢ When the forward pointer encounters the sentinel, it triggers buffer reloading (or signals the
true end of input).
➢ Benefit: eliminates extra condition checks and speeds up processing.
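The scheme above can be sketched in Python (a minimal illustration; BUF_SIZE and the '\0' sentinel are assumed values, and real scanners read from a file rather than a string):

```python
# Sketch of the two-buffer scheme with sentinels.
BUF_SIZE = 8   # small for illustration; typically 1024 or 4096
EOF = "\0"     # sentinel character

class TwoBufferReader:
    def __init__(self, text):
        self.text = text
        self.pos = 0            # position in the underlying "file"
        self.buffers = [None, None]
        self.cur = 0            # index of the buffer being scanned
        self.forward = 0        # forward pointer within the current buffer
        self._reload(0)

    def _reload(self, which):
        chunk = self.text[self.pos:self.pos + BUF_SIZE]
        self.pos += len(chunk)
        # place the sentinel right after the valid characters
        self.buffers[which] = chunk + EOF

    def next_char(self):
        c = self.buffers[self.cur][self.forward]
        if c == EOF:
            if self.forward < BUF_SIZE:      # sentinel before a full buffer: true end of input
                return EOF
            other = 1 - self.cur             # end of buffer: reload the other one
            self._reload(other)
            self.cur, self.forward = other, 0
            return self.next_char()
        self.forward += 1
        return c
```

Only one comparison (against the sentinel) is needed per character; the end-of-buffer and end-of-input cases are distinguished only when the sentinel is actually seen.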
UNIT 2
PARSER
A parser is a component of a compiler or interpreter that analyses the syntax of a given input
(usually a program) according to the rules of a formal grammar. It ensures that the structure of the
code is correct before further processing.
A parser for a grammar is a program that takes as input a string w (as a stream of tokens obtained
from the lexical analyser) and produces as output either a parse tree for w, if w is a valid sentence of
the grammar, or an error message indicating that w is not a valid sentence of the grammar.
2.DEFINE CFG
A Context-Free Grammar (CFG) is a formal grammar used to define the syntax of programming
languages and natural languages. It consists of a set of rules (productions) that describe how strings
in a language can be generated.
Components of a CFG
G = (V, T, P, S)
where
V (Variables / Non-terminals) – A finite set of symbols that can be replaced by strings of terminals
and non-terminals.
T (Terminals) – A finite set of symbols (tokens) that make up the actual strings of the language.
P (Productions) – A finite set of rules of the form A → α,
where
A is a non-terminal and α is a string of terminals and/or non-terminals.
S (Start Symbol) – A designated non-terminal from which every derivation begins.
3. Consider the grammar G,
E → E + E | E * E | (E) | - E | id
Derive the string id + id * id using leftmost derivation and rightmost derivation.
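A worked answer using the grammar above:

Leftmost derivation (always expand the leftmost non-terminal):
E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

Rightmost derivation (always expand the rightmost non-terminal):
E ⇒ E + E ⇒ E + E * E ⇒ E + E * id ⇒ E + id * id ⇒ id + id * id

Note that other derivations of the same string also exist, since this grammar is ambiguous.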
4.DIFF BW AMBIGUOUS AND UNAMBIGUOUS GRAMMAR
PHRASE-LEVEL RECOVERY
➢ In this method, when a parser encounters an error, it performs a local correction on the
remaining input so that the rest of the input allows the parser to continue.
➢ The correction can be deleting an extra semicolon, replacing a comma with a semicolon, or
inserting a missing semicolon.
➢ While performing a correction, utmost care must be taken not to go into an infinite loop.
➢ A disadvantage is that it is difficult to handle situations where the actual error occurred
before the point of detection.
12M
Shift-Reduce Parser
A Shift-Reduce Parser is a bottom-up parsing technique that reduces an input string to the start
symbol of a grammar using shifting and reducing operations. It is widely used in bottom-up parsers
like LR, SLR, LALR, and CLR parsers.
Shift
Push the next input symbol onto the stack and advance the input pointer.
Reduce
Replace a sequence of symbols on top of the stack with the corresponding non-terminal, based on a
grammar rule.
Accept
If the stack contains only the start symbol and the input is fully consumed, the string is accepted.
Error
If no shift or reduce action is possible, report a syntax error.
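The four actions can be sketched for the toy grammar E → E + E | id (a simplified illustration; real shift-reduce parsers drive these actions from LR parsing tables rather than the ad-hoc handle checks below):

```python
# Minimal shift-reduce sketch for the toy grammar E -> E + E | id.
def shift_reduce(tokens):
    stack = []
    moves = []
    tokens = tokens + ["$"]   # end-of-input marker
    i = 0
    while True:
        # Reduce whenever a handle appears on top of the stack
        if stack[-3:] == ["E", "+", "E"]:
            stack[-3:] = ["E"]
            moves.append("reduce E -> E + E")
        elif stack[-1:] == ["id"]:
            stack[-1:] = ["E"]
            moves.append("reduce E -> id")
        # Accept: only the start symbol remains and input is consumed
        elif tokens[i] == "$" and stack == ["E"]:
            moves.append("accept")
            return moves
        # Shift: push the next input symbol
        elif tokens[i] != "$":
            stack.append(tokens[i])
            i += 1
            moves.append(f"shift {stack[-1]}")
        # Error: no action applies
        else:
            moves.append("error")
            return moves

print(shift_reduce(["id", "+", "id"]))
```

For the input id + id this produces the move sequence shift id, reduce E → id, shift +, shift id, reduce E → id, reduce E → E + E, accept.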
Predictive Parser
A recursive-descent parser that needs no backtracking is called a predictive parser. A grammar can
be made suitable for predictive parsing by eliminating left recursion and left factoring. Let us
understand how to eliminate left recursion and left factoring.
i) Eliminating Left Recursion
A grammar is said to be left recursive if it has a non-terminal A such that there is a derivation A ⇒ Aα
for some string α. Top-down parsing methods cannot handle left-recursive grammars, hence left
recursion must be eliminated. A left-recursive pair of productions
A → Aα | β
can be replaced by:
A → βA'
A' → αA' | ε
For example, consider the grammar:
E → E+T | T
T → T*F | F
F → (E) | id
After eliminating left recursion, we get:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
ii)Eliminating Left factoring
Left factoring is a grammar transformation that is useful for producing a grammar suitable for
predictive parsing. When it is not clear which of two alternative productions to use to expand a non-
terminal A, we can rewrite the A-productions to defer the decision until we have seen enough of the
input to make the right choice.
In general, productions of the form
A → αβ1 | αβ2
can be left-factored as:
A → αA'
A' → β1 | β2
S → iEtS | iEtSeS | a
E→b
Here, i, t, e stand for if, then, and else, and E and S stand for "expression" and "statement".
S → iEtSS' | a
S' → eS | ε
E→b
With input id+id*id the predictive parser makes the sequence of moves
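Assuming the transformed grammar above (E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id), one possible trace is (stack top on the left):

Stack      | Input      | Action
E$         | id+id*id$  | E → TE'
TE'$       | id+id*id$  | T → FT'
FT'E'$     | id+id*id$  | F → id
idT'E'$    | id+id*id$  | match id
T'E'$      | +id*id$    | T' → ε
E'$        | +id*id$    | E' → +TE'
+TE'$      | +id*id$    | match +
TE'$       | id*id$     | T → FT'
FT'E'$     | id*id$     | F → id
idT'E'$    | id*id$     | match id
T'E'$      | *id$       | T' → *FT'
*FT'E'$    | *id$       | match *
FT'E'$     | id$        | F → id
idT'E'$    | id$        | match id
T'E'$      | $          | T' → ε
E'$        | $          | E' → ε
$          | $          | accept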
3.OPERATOR PRECEDENCE PARSING
Operator precedence parser – An operator precedence parser is a bottom-up parser that interprets
an operator grammar. This parser is only used for operator grammars. Ambiguous grammars are not
allowed in any parser except operator precedence parser.
There are two methods for determining what precedence relations should hold between a pair of
terminals:
1. Based on the associativity and precedence of the operators (the intuitive method).
2. Constructing the relations from the grammar itself.
This parser relies on the following three precedence relations: ⋖, ≐, ⋗
a ⋖ b means a "yields precedence to" b.
a ⋗ b means a "takes precedence over" b.
a ≐ b means a "has the same precedence as" b.
No relation is given between id and id, since id is never compared with id and two variables cannot
appear side by side. A disadvantage of this table is that if we have n operators, the size of the
table will be n*n and the space complexity will be O(n²).
In order to decrease the size of the table, we use a precedence function table. Operator precedence
parsers usually do not store the precedence table with the relations; rather, they are implemented in
a special way.
Operator precedence parsers use precedence functions that map terminal symbols to integers, and
the precedence relations between the symbols are implemented by numerical comparison. The
parsing table can be encoded by two precedence functions f and g that map terminal symbols to
integers.
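For the grammar E → E+E | E*E | id, one valid pair of precedence functions (a standard textbook assignment; other assignments also work) is:

      +   *   id   $
f:    2   4   4    0
g:    1   3   5    0

For example, f(+) = 2 < g(*) = 3 encodes + ⋖ *, while f(*) = 4 > g(+) = 1 encodes * ⋗ +, and f(+) = 2 > g(+) = 1 makes + left-associative.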
Use the stack implementation of an operator precedence parser to check the sentence id + id using
this grammar: E → E+E | E*E | id
Sol:
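A possible worked solution (the relations below follow from + being left-associative and id never being compared with id):

Relations used: $ ⋖ id, id ⋗ +, $ ⋖ +, + ⋖ id, id ⋗ $, + ⋗ $

Stack     | Input   | Relation | Action
$         | id+id$  | $ ⋖ id   | shift id
$ id      | +id$    | id ⋗ +   | reduce E → id
$ E       | +id$    | $ ⋖ +    | shift +
$ E +     | id$     | + ⋖ id   | shift id
$ E + id  | $       | id ⋗ $   | reduce E → id
$ E + E   | $       | + ⋗ $    | reduce E → E + E
$ E       | $       |          | accept

The input id + id is therefore accepted by the grammar.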