
Lexical_Syntax_Semantic_Analyzers_Latest

The document outlines the principles of Compiler Design, focusing on Lexical Analysis, tokenization, and parsing techniques. It explains the roles of lexical analyzers, tokens, and regular expressions, as well as various parsing methods including LL and LR parsing. Additionally, it covers the construction of parsing tables and the significance of FIRST and FOLLOW sets in grammar analysis.

Uploaded by

bunnyreddy0402

Compiler Design

Academic Year: 2024 - 25

Dr. Praveen Kumar Alapati


Department of Computer Science and Engineering
Ecole Centrale School of Engineering

Dr. Praveen (Mahindra University) Compiler Design CS3101 1 / 62


Lexical Analysis

▶ Breaking down text: The lexical analyzer reads the source code
character by character and groups these characters into meaningful
sequences called lexemes.
▶ Tagging Tokens: Each lexeme is classified into a token category such
as keyword, identifier, literal, or operator.
▶ Detecting illegal tokens: The lexical analyzer detects and reports any
character sequence that does not match the pattern of any token,
which gives early diagnostics of lexical errors.
▶ Removing Comments: The lexical analyzer removes comments, which are
not needed for syntax analysis.
▶ Removing Whitespace: The lexical analyzer removes spaces, tabs, and
newline characters that are not needed for syntax analysis.
▶ Stream of Tokens: The output of the lexical analyzer is a stream of
tokens, which is the input to the syntax analyzer.
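
The duties listed above can be sketched as a small regex-driven scanner. This is only an illustration: the token categories, the patterns, and the keyword list are assumptions chosen for a tiny C-like language, not a complete C lexer.

```python
import re

# Hypothetical token patterns for a tiny C-like language (illustrative only).
TOKEN_SPEC = [
    ("COMMENT",    r"/\*.*?\*/|//[^\n]*"),
    ("WHITESPACE", r"\s+"),
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=<>!]=?"),
    ("PUNCT",      r"[;,(){}\[\]]"),
]
KEYWORDS = {"int", "if", "else", "while", "return"}  # assumed keyword list
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC), re.DOTALL
)

def tokenize(source):
    """Return a list of (token, lexeme) pairs, dropping comments and whitespace."""
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            # No pattern matches here: report an illegal character.
            raise SyntaxError(f"illegal character {source[pos]!r} at offset {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind in ("COMMENT", "WHITESPACE"):
            continue                      # not needed by the syntax analyzer
        if kind == "IDENTIFIER" and lexeme in KEYWORDS:
            kind = "KEYWORD"              # tag reserved words separately
        tokens.append((kind, lexeme))
    return tokens

print(tokenize("int x = 42; // answer"))
```

For the input shown, the scanner yields the pairs (KEYWORD, int), (IDENTIFIER, x), (OPERATOR, =), (NUMBER, 42), (PUNCT, ;), and the comment is discarded.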



Tokens, Patterns, and Lexemes

▶ A lexeme is the lowest-level syntactic unit in the source code:
a sequence of characters that forms a meaningful piece according to
the language syntax.
▶ Pattern is a description of the form that the lexemes of a token may
take.
▶ Token is a symbol that the compiler or interpreter uses during the
syntax analysis phase. Each token represents a category of lexemes
that share a common role in the language syntax.
▶ Tokens are essentially classifications provided to lexemes.
For example, consider 12345 as a lexeme, then it can be categorized
under the token type NUMBER.



Delimiters in C Language

Delimiters separate tokens from each other.

▶ Whitespace: Space, tab, newline, vertical tab, form feed.

▶ Punctuation Delimiters: Semicolon, comma, parentheses, braces,
brackets, and colon. Depending on the context, they serve as both
delimiters and tokens.

▶ Operator Symbols: Serve as delimiters when they appear between
literals and identifiers.

▶ Miscellaneous: Dot, arrow, and question mark.



Uses of regular expressions:
▶ Text Search and Replace: Used in text editors, IDEs, and word
processors for searching and replacing text.
▶ Data Validation: Regular expressions are widely used to validate user
input in web forms, databases, and software applications (e.g., checking
that the data entered matches a specific format, such as an email
address, phone number, or URL).
▶ Syntax Highlighting: Helps developers distinguish between elements
of source code such as keywords, strings, comments, and operators by
coloring them differently based on matching patterns.
▶ Parsing and Extracting Information: To parse text files and extract
information in various applications, including web scraping, log file
analysis, and data migration tasks.
▶ Machine Learning and Natural Language Processing: To pre-process
text data, remove unwanted characters, or format data in a way that is
suitable for ML or NLP.

Regular Expressions to NFA



NFA for the Regular Expression (a|b)∗abb


Transitions

▶ ϵ-closure(s): Set of NFA states reachable from NFA state s on
ϵ-transitions alone.
▶ ϵ-closure(T): Set of NFA states reachable from some NFA state s ∈ T
on ϵ-transitions alone.
▶ move(T, a): Set of NFA states to which there is a transition on input
symbol a from some state s in T.
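
The two operations can be sketched on the NFA for (a|b)∗abb. The state numbering 0 to 10 below is an assumption (the usual Thompson-construction layout), since the slide's diagram is not reproduced in this text.

```python
# NFA for (a|b)*abb; the state numbers are an assumed Thompson-style layout.
EPS = ""  # label used for epsilon-transitions
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4},
    (2, "a"): {3},    (4, "b"): {5},
    (3, EPS): {6},    (5, EPS): {6},
    (6, EPS): {1, 7},
    (7, "a"): {8},    (8, "b"): {9}, (9, "b"): {10},
}

def epsilon_closure(states):
    """All NFA states reachable from `states` using epsilon-transitions alone."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(states, symbol):
    """All NFA states reachable from some state in `states` on `symbol`."""
    return frozenset(t for s in states for t in NFA.get((s, symbol), ()))

start = epsilon_closure({0})
print(sorted(start))                                  # [0, 1, 2, 4, 7]
print(sorted(move(start, "a")))                       # [3, 8]
print(sorted(epsilon_closure(move(start, "a"))))      # [1, 2, 3, 4, 6, 7, 8]
```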



Subset Construction Algorithm

Here, Dstate denotes a DFA state (a set of NFA states) and Dtran denotes
a DFA transition on an input symbol.
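
A minimal sketch of subset construction, again on an assumed Thompson-style NFA for (a|b)∗abb with states 0 to 10. The names dstates and dtran mirror the slide's terminology; everything else is illustrative.

```python
EPS = ""  # label used for epsilon-transitions
NFA = {   # assumed NFA for (a|b)*abb, start state 0, accepting state 10
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}
START, ACCEPT, ALPHABET = 0, 10, ("a", "b")

def epsilon_closure(states):
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(states, symbol):
    return frozenset(t for s in states for t in NFA.get((s, symbol), ()))

def subset_construction():
    """Build the DFA states (sets of NFA states) and the transition table."""
    start = epsilon_closure({START})
    dstates, dtran, unmarked = [start], {}, [start]
    while unmarked:
        T = unmarked.pop()                 # mark T
        for a in ALPHABET:
            U = epsilon_closure(move(T, a))
            if not U:
                continue                   # no transition on a: dead state omitted
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    accepting = {T for T in dstates if ACCEPT in T}
    return start, dstates, dtran, accepting

def dfa_accepts(text):
    start, _, dtran, accepting = subset_construction()
    state = start
    for ch in text:
        state = dtran.get((state, ch))
        if state is None:
            return False
    return state in accepting

print(len(subset_construction()[1]))          # 5 DFA states
print(dfa_accepts("babb"), dfa_accepts("ab")) # True False
```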

NFA to DFA using Subset Construction Algorithm



Syntax of Lex

Lex is a tool that generates lexical analyzers, which convert sequences of
characters into tokens based on regular expression rules.


How to Compile a Lex Program


Purpose of yytext and yylex

▶ yytext is a variable declared and maintained by Lex in the generated
scanner code. It points to a character array (string) that holds
the text of the token most recently matched by the scanner.

▶ yylex is the main function generated by Lex. When compiled and
linked, it becomes the entry point for the lexical analysis functionality
provided by the Lex-generated code.

▶ yylex repeatedly reads the input stream, identifies the longest prefix
that matches any of the specified regular expression patterns, and
executes the corresponding action. It continues until an action returns
or the input is exhausted.


Parser
A parser builds a structural representation of the input while checking
for correct syntax according to a formal grammar.


Functionalities of Parser

▶ Tokenization: The input must first be converted into a form that the
parser can understand, i.e., broken into meaningful elements called
tokens (normally done by the lexical analyzer).
▶ Syntax checking: The parser checks the syntax of the input and verifies
whether it can be derived from the rules of a specific grammar.
▶ Errors: When the input cannot be derived from the grammar, the
parser is expected to report the errors effectively, with meaningful
messages that help fix the issue.
▶ No-Error: The parser builds a parse tree or an abstract syntax tree that
represents the grammatical structure of the input.


Classification of Parsers


Operator Precedence Parser
▶ It is a bottom-up parser, suitable for grammars where there is no
ambiguity in the precedence and associativity of operators.
▶ It uses the relative precedence of operators to decide how to parse a
sequence of tokens.
▶ The core of an operator precedence parser is the operator precedence
table, which specifies relationships between pairs of terminal symbols.
1 Less than (a < b): a yields precedence to b (a has lower precedence).
2 Greater than (a > b): a takes precedence over b (a has higher precedence).
3 Equal to (a = b): a and b have the same precedence and belong to the
same handle, as with a matching pair of parentheses.
▶ The parser reads tokens from left to right and pushes them onto a stack.
Based on the relation in the table between the topmost terminal on the
stack and the current token, the parser decides whether to shift or reduce.
Shift: When the current token has higher (or equal) precedence relative to
the topmost terminal on the stack.
Reduce: When the current token has lower precedence than the topmost
terminal on the stack; the handle on top of the stack is reduced.

Operator Precedence Parsing Algorithm

1 Initialize a stack with a special symbol ($) and append $ to the input.
2 Read the next token from the input.
3 Compare the precedence of the topmost terminal on the stack with the
current token:
▶ If the token has higher (or equal) precedence, shift it onto the stack.
▶ If the token has lower precedence, reduce the stack by applying a
grammar rule.
4 Repeat until the end of input is reached and the stack is reduced to the
start symbol.
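
The steps above can be sketched as follows. The precedence table is an assumption: it is the standard table for the terminals id, +, ∗ and $ with ∗ binding tighter than + and both left-associative, because the slide's own table is not reproduced in this text. The sketch keeps only terminals on the stack and therefore does not check handles against productions.

```python
# Assumed precedence relations: key (a, b) gives the relation between the
# topmost stack terminal a and the current input token b.
PREC = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

def parse(tokens):
    """Return the list of shift/reduce moves; raise SyntaxError on an error entry."""
    stack = ["$"]
    tokens = list(tokens) + ["$"]
    pos, trace = 0, []
    while True:
        top, tok = stack[-1], tokens[pos]
        if top == "$" and tok == "$":
            trace.append("accept")
            return trace
        rel = PREC.get((top, tok))
        if rel in ("<", "="):
            stack.append(tok)              # shift
            pos += 1
            trace.append(f"shift {tok}")
        elif rel == ">":
            # Reduce: pop until the new stack top yields precedence (<)
            # to the terminal most recently popped.
            while True:
                popped = stack.pop()
                if PREC.get((stack[-1], popped)) == "<":
                    break
            trace.append(f"reduce {popped}")
        else:
            raise SyntaxError(f"no relation between {top!r} and {tok!r}")

for step in parse(["id", "+", "id", "*", "id"]):
    print(step)
```

In the resulting trace the reduction involving ∗ happens before the one involving +, which reflects the higher precedence of ∗.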



Operator Precedence Table

E → E + T /T
T → T ∗ F /F
F → id



Parse the input id1 + id2 * id3 using a shift-reduce parser


Handle

In a shift-reduce parser, a handle is a substring on top of the stack that
matches the right-hand side of a production and whose reduction is one
step of a rightmost derivation in reverse.


Viable Prefix

▶ It is a prefix of a right-sentential form that does not extend past the
right end of the handle.
▶ A viable prefix is a sequence of symbols that can appear on the stack of
a shift-reduce parser before it performs a shift or reduce operation.
▶ A viable prefix contains only the symbols that the parser has
already processed and does not depend on what comes next (i.e.,
it does not include the remaining input).
▶ Viable prefixes help the parser decide whether to shift (continue
processing more input) or reduce (apply a production rule to the
current symbols on the stack).


LL Parsing: Reads the input from Left to right and
constructs a Leftmost derivation of the sentence

E → TE′; E′ → +TE′ | ϵ; T → FT′; T′ → ∗FT′ | ϵ; F → (E) | id


Parsing using LL Parsing Table


LR Parsing: Reads the input from Left to right, and
considers a Rightmost derivation in reverse



Parsing using LR Parsing Table


FIRST and FOLLOW

FIRST of a non-terminal in a grammar is the set of terminal
symbols that begin the strings derivable from that non-terminal
(plus ϵ, if the non-terminal can derive the empty string).

FOLLOW of a non-terminal in a grammar is the set of
terminal symbols that appear immediately to the right of that
non-terminal in some sentential form.
For the start symbol S, FOLLOW(S) contains $.
Example: Find the FIRST and FOLLOW of the following grammar:
S→AB
A→aA/a
B→bB/b



Rules for Calculating FIRST Set:

Apply the following rules until no more terminals or ϵ can be added to any
FIRST set.

1 If X is a terminal, then FIRST(X) = {X}.

2 If X is a non-terminal and X → ϵ is a production, then ϵ ∈ FIRST(X).

3 If X is a non-terminal and X → Y1 Y2 . . . Yn is a production:
▶ Add FIRST(Y1) to FIRST(X), excluding ϵ.
▶ If FIRST(Y1) contains ϵ, also add FIRST(Y2) excluding ϵ, and so on.
▶ If all Yi (1 ≤ i ≤ n) have ϵ in their FIRST sets, then add ϵ to FIRST(X).
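
A minimal fixed-point sketch of these rules, run on the example grammar S → AB, A → aA | a, B → bB | b. The dictionary encoding of the grammar and the function name compute_first are assumptions made for illustration.

```python
EPSILON = "ε"

# Each non-terminal maps to a list of right-hand sides (tuples of symbols).
# An empty tuple would stand for an ε-production.
GRAMMAR = {
    "S": [("A", "B")],
    "A": [("a", "A"), ("a",)],
    "B": [("b", "B"), ("b",)],
}

def compute_first(grammar):
    """Apply the FIRST rules until no set changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                for sym in body:
                    # Rule 1: FIRST of a terminal is the terminal itself.
                    sym_first = first[sym] if sym in grammar else {sym}
                    first[head] |= sym_first - {EPSILON}
                    if EPSILON not in sym_first:
                        break              # later symbols cannot contribute
                else:
                    # Every symbol can vanish (or the body is empty): add ε.
                    first[head].add(EPSILON)
                changed |= len(first[head]) != before
    return first

print(compute_first(GRAMMAR))
```

For this grammar the result is FIRST(S) = FIRST(A) = {a} and FIRST(B) = {b}.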



Rules for Calculating FOLLOW Set:

Apply the following rules until no more terminals can be added to any
FOLLOW set.

FOLLOW(A) is the set of terminals that can appear immediately to the right
of A in some sentential form.

1 For the start symbol S: FOLLOW(S) contains the end-of-input symbol $.

2 If A → αBβ is a production, everything in FIRST(β) except ϵ is added
to FOLLOW(B).

3 If A → αBβ is a production and ϵ ∈ FIRST(β), or if A → αB is a
production, then add FOLLOW(A) to FOLLOW(B).
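
A minimal sketch of the FOLLOW rules on the expression grammar E → TE′, E′ → +TE′ | ϵ, T → FT′, T′ → ∗FT′ | ϵ, F → (E) | id. To keep it short, the FIRST sets are supplied as data (the standard values for this grammar); the encoding and function names are assumptions.

```python
EPSILON, END = "ε", "$"

GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
# FIRST sets of the non-terminals, given as input to this sketch.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPSILON},
    "T": {"(", "id"}, "T'": {"*", EPSILON},
    "F": {"(", "id"},
}

def first_of_string(symbols):
    """FIRST of a sequence of grammar symbols (ε if all of them can vanish)."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal is its own FIRST
        result |= sym_first - {EPSILON}
        if EPSILON not in sym_first:
            return result
    result.add(EPSILON)
    return result

def compute_follow(start):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(END)                              # rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue                        # only non-terminals
                    before = len(follow[sym])
                    beta_first = first_of_string(body[i + 1:])
                    follow[sym] |= beta_first - {EPSILON}   # rule 2
                    if EPSILON in beta_first:               # rule 3
                        follow[sym] |= follow[head]
                    changed |= len(follow[sym]) != before
    return follow

for nt, terminals in compute_follow("E").items():
    print(nt, sorted(terminals))
```

The sets obtained are FOLLOW(E) = FOLLOW(E′) = { ), $ }, FOLLOW(T) = FOLLOW(T′) = { +, ), $ }, and FOLLOW(F) = { +, ∗, ), $ }.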



Construction of LL Parsing Table: Algorithm



Construction of LL Parsing Table

E → TE′; E′ → +TE′ | ϵ; T → FT′; T′ → ∗FT′ | ϵ; F → (E) | id
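
A sketch of the usual LL(1) table construction for this grammar, assuming the textbook rule: for each production A → α, enter it at M[A, a] for every terminal a in FIRST(α), and, if ϵ ∈ FIRST(α), also at M[A, b] for every b in FOLLOW(A). The FIRST and FOLLOW sets are supplied as data, and all names are illustrative.

```python
EPSILON, END = "ε", "$"

PRODUCTIONS = [
    ("E",  ("T", "E'")),
    ("E'", ("+", "T", "E'")), ("E'", ()),
    ("T",  ("F", "T'")),
    ("T'", ("*", "F", "T'")), ("T'", ()),
    ("F",  ("(", "E", ")")),  ("F", ("id",)),
]
FIRST = {"E": {"(", "id"}, "E'": {"+", EPSILON}, "T": {"(", "id"},
         "T'": {"*", EPSILON}, "F": {"(", "id"}}
FOLLOW = {"E": {")", END}, "E'": {")", END}, "T": {"+", ")", END},
          "T'": {"+", ")", END}, "F": {"+", "*", ")", END}}

def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPSILON}
        if EPSILON not in sym_first:
            return result
    result.add(EPSILON)
    return result

def build_table():
    """Map (non-terminal, lookahead) to the production body to expand."""
    table = {}
    for head, body in PRODUCTIONS:
        body_first = first_of_string(body)
        lookaheads = body_first - {EPSILON}
        if EPSILON in body_first:
            lookaheads |= FOLLOW[head]
        for a in lookaheads:
            if (head, a) in table:
                raise ValueError(f"not LL(1): conflict at M[{head}, {a}]")
            table[(head, a)] = body
    return table

def parse(tokens, start="E"):
    """Table-driven predictive parse; returns True if the input is accepted."""
    table = build_table()
    stack, tokens, pos = [END, start], list(tokens) + [END], 0
    while stack:
        top = stack.pop()
        if top in FIRST:                         # non-terminal: expand
            body = table.get((top, tokens[pos]))
            if body is None:
                return False                     # empty table entry: error
            stack.extend(reversed(body))
        elif top == tokens[pos]:                 # terminal: match
            pos += 1
        else:
            return False
    return pos == len(tokens)

print(parse(["id", "+", "id", "*", "id"]))  # True
print(parse(["id", "+"]))                   # False
```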



If-Else Grammar

S → iEtS | iEtSeS
E → b

Left Factored Grammar is

S → iEtSS′
S′ → eS | ϵ
E → b



If-Else Grammar: Parsing Table

S → iEtSS′
S′ → eS | ϵ
E → b


Closure of an LR(0) Item

Algorithm 1 CLOSURE(I)
J = I
repeat
    for (each item A → α.Bβ in J)
        for (each production B → γ of G)
            if (B → .γ is not in J)
                add B → .γ to J;
until no more items are added to J in one round
return J;
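
A minimal sketch of CLOSURE for LR(0) items. As test data it uses the augmented grammar S′ → S, S → L = R | R, L → ∗R | id, R → L from the SLR(1) example in this deck; representing an item as a (head, body, dot position) triple is an assumption made for illustration.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S":  [("L", "=", "R"), ("R",)],
    "L":  [("*", "R"), ("id",)],
    "R":  [("L",)],
}

def closure(items):
    """items: iterable of LR(0) items (head, body, dot)."""
    result, worklist = set(items), list(items)
    while worklist:
        head, body, dot = worklist.pop()
        # Only a non-terminal immediately after the dot adds new items.
        if dot < len(body) and body[dot] in GRAMMAR:
            for gamma in GRAMMAR[body[dot]]:
                item = (body[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)

def show(item):
    head, body, dot = item
    return f"{head} → {' '.join(body[:dot] + ('.',) + body[dot:])}"

I0 = closure({("S'", ("S",), 0)})
for line in sorted(show(item) for item in I0):
    print(line)
```

The closure of [S′ → .S] contains six items, one for each production of the augmented grammar with the dot at the left end.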



Canonical Sets of LR Items

Algorithm 2 Sets of LR Items (G′)
C = {CLOSURE([S′ → .S])}
repeat
    for (each set of items I in C)
        for (each grammar symbol X)
            if (GOTO(I, X) is not empty and not in C)
                add GOTO(I, X) to C;
until no new sets of items are added to C in one round
return C;
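
A sketch of the canonical LR(0) collection, assuming items are (head, body, dot position) triples and using the augmented grammar S′ → S, S → L = R | R, L → ∗R | id, R → L as test data. The helper names are illustrative.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S":  [("L", "=", "R"), ("R",)],
    "L":  [("*", "R"), ("id",)],
    "R":  [("L",)],
}

def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        head, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for gamma in GRAMMAR[body[dot]]:
                item = (body[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(h, b, d + 1) for (h, b, d) in items if d < len(b) and b[d] == symbol}
    return closure(moved)

def canonical_collection():
    start = closure({("S'", ("S",), 0)})
    collection, worklist = [start], [start]
    symbols = sorted({s for bodies in GRAMMAR.values() for body in bodies for s in body})
    while worklist:
        I = worklist.pop()
        for X in symbols:
            J = goto(I, X)
            if J and J not in collection:      # skip empty GOTO sets
                collection.append(J)
                worklist.append(J)
    return collection

C = canonical_collection()
print(len(C))   # 10 sets of LR(0) items for this grammar
```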



Sets of LR(0) Items



Constructing an SLR Parsing Table

INPUT: An augmented grammar G′.
OUTPUT: The SLR-parsing table functions ACTION and GOTO for G′.
1 Construct C = {I0, I1, I2, ..., In}, the sets of LR(0) items for G′.
2 State i is constructed from Ii. The parsing actions for state i are:
a. If [A → α.aβ] ∈ Ii and GOTO(Ii, a) = Ij, then set ACTION[i, a] = shift j.
Here a must be a terminal.
b. If [A → α.] is in Ii and A ≠ S′, then set ACTION[i, a] = reduce by A → α
for all a in FOLLOW(A).
c. If [S′ → S.] is in Ii, then set ACTION[i, $] = accept.
3 The GOTO transitions for state i are constructed for all non-terminals A
using the rule: If GOTO(Ii, A) = Ij, then GOTO[i, A] = j.
4 All entries not defined by rules (2) and (3) are errors.
5 The initial state of the parser is the one constructed from the set of
items containing [S′ → .S].


Is the following grammar SLR(1)?

Grammar

S → L = R | R
L → ∗R | id
R → L

Augmented Grammar

S′ → S
S → L = R | R
L → ∗R | id
R → L


FIRST and FOLLOW of the grammar

FIRST of the Non-Terminals

FIRST(S)=FIRST(L)=FIRST(R)={*, id}

FOLLOW of the Non-Terminals

FOLLOW(S)={$}
FOLLOW(L)={$, =}
FOLLOW(R)={$, =}



Is the following grammar SLR(1)?

State I0:

S′ → .S
S → .L = R | .R
L → . ∗ R | .id
R → .L

I0 on S gives state I1:

S′ → S.

I0 on L gives state I2:

S → L. = R
R → L.


Is the following grammar SLR(1)?

I0 on L gives state I2:

S → L. = R
R → L.

State I2 contains the item R → L.
So the SLR parser calls for a reduce action using the production R → L on
each terminal in FOLLOW(R) = {$, =}.
The item S → L. = R also makes the SLR parser call for a shift action on =,
hence the SLR parser has a shift-reduce conflict on the input symbol =
and the grammar is not SLR(1).
No right-sentential form of the grammar has the viable prefix R =, so
a reduction using the production R → L is of no use in state I2 when the
next input symbol is =.
To eliminate such useless reductions (which in turn reduces conflicts), one
can carry one or more lookahead symbols in the sets of LR items.
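
The conflict in state I2 can be checked mechanically. The sketch below applies the SLR action rules to the two items of I2 only, using the FOLLOW sets listed for this grammar; the item encoding (head, body, dot position) and the function name are assumptions.

```python
# FOLLOW sets of the grammar S → L = R | R, L → ∗R | id, R → L.
FOLLOW = {"S": {"$"}, "L": {"$", "="}, "R": {"$", "="}}

# State I2: S → L . = R   and   R → L .
I2 = [("S", ("L", "=", "R"), 1), ("R", ("L",), 1)]

def actions_for_state(items):
    """Collect the SLR actions proposed for each terminal in one state."""
    actions = {}
    for head, body, dot in items:
        if dot < len(body):
            sym = body[dot]
            if sym not in FOLLOW:          # a terminal after the dot: shift
                actions.setdefault(sym, set()).add("shift")
        else:                               # dot at the end: reduce on FOLLOW(head)
            for a in FOLLOW[head]:
                actions.setdefault(a, set()).add(f"reduce {head} → {' '.join(body)}")
    return actions

conflicts = {a: acts for a, acts in actions_for_state(I2).items() if len(acts) > 1}
print(conflicts)
```

Only the entry for = receives two actions (a shift and the reduction by R → L); the entry for $ receives the reduction alone.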



Sets of LR(0) Items (Example2)



Closure of an LR(1) Item

Algorithm 3 CLOSURE(I)
J = I
repeat
    for (each item [A → α.Bβ, a] in J)
        for (each production B → γ in G′)
            for (each terminal b in FIRST(βa))
                add [B → .γ, b] to J;
until no more items are added to J in one round
return J;


Canonical Sets of LR(1) Items

Algorithm 4 Sets of LR(1) Items (G′)
C = {CLOSURE([S′ → .S, $])}
repeat
    for (each set of items I in C)
        for (each grammar symbol X)
            if (GOTO(I, X) is not empty and not in C)
                add GOTO(I, X) to C;
until no new sets of items are added to C in one round
return C;


GOTO Function on an Item

Algorithm 5 GOTO(I, X) for a set of LR(1) items I
Initialize J to be the empty set;
for (each item [A → α.Xβ, a] in I)
    add [A → αX.β, a] to set J;
return CLOSURE(J);
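
CLOSURE and GOTO for LR(1) items can be sketched together on the grammar S′ → S, S → L = R | R, L → ∗R | id, R → L. An item is assumed to be a (head, body, dot position, lookahead) tuple. Because this grammar has no ϵ-productions, FIRST(βa) is simply FIRST of the first symbol of βa; that shortcut is an assumption of this sketch and not a general implementation.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S":  [("L", "=", "R"), ("R",)],
    "L":  [("*", "R"), ("id",)],
    "R":  [("L",)],
}
# FIRST sets of the non-terminals of this grammar.
FIRST = {"S": {"*", "id"}, "L": {"*", "id"}, "R": {"*", "id"}}

def first_of_string(symbols):
    # Valid only because no symbol of this grammar derives ε.
    return FIRST.get(symbols[0], {symbols[0]})

def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        head, body, dot, la = worklist.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            beta_a = body[dot + 1:] + (la,)
            for gamma in GRAMMAR[body[dot]]:
                for b in first_of_string(beta_a):
                    item = (body[dot], gamma, 0, b)
                    if item not in result:
                        result.add(item)
                        worklist.append(item)
    return frozenset(result)

def goto(items, symbol):
    moved = {(h, b, d + 1, la) for (h, b, d, la) in items
             if d < len(b) and b[d] == symbol}
    return closure(moved)

I0 = closure({("S'", ("S",), 0, "$")})
I2 = goto(I0, "L")
reduce_lookaheads = {la for (h, b, d, la) in I2 if d == len(b)}
print(len(I0), sorted(reduce_lookaheads))   # 8 ['$']
```

In GOTO(I0, L) the item R → L. carries only the lookahead $, so the reduction is no longer proposed on = and the shift-reduce conflict seen with SLR does not arise in this state.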



Sets of LR(1) Items



CLR Parsing Table



Constructing a CLR Parsing Table

INPUT: An augmented grammar G′.
OUTPUT: The CLR-parsing table functions ACTION and GOTO for G′.
1 Construct C = {I0, I1, I2, ..., In}, the sets of LR(1) items for G′.
2 State i is constructed from Ii. The parsing actions for state i are:
a. If [A → α.aβ, b] ∈ Ii and GOTO(Ii, a) = Ij, then set ACTION[i, a] = shift j.
Here a must be a terminal.
b. If [A → α., a] is in Ii and A ≠ S′, then set ACTION[i, a] = reduce by A → α.
c. If [S′ → S., $] is in Ii, then set ACTION[i, $] = accept.
3 The GOTO transitions for state i are constructed for all non-terminals A
using the rule: If GOTO(Ii, A) = Ij, then GOTO[i, A] = j.
4 All entries not defined by rules (2) and (3) are errors.
5 The initial state of the parser is the one constructed from the set of
items containing [S′ → .S, $].


Constructing an LALR Parsing Table
INPUT: An augmented grammar G′.
OUTPUT: The LALR-parsing table functions ACTION and GOTO for G′.
1 Construct C = {I0, I1, I2, ..., In}, the sets of LR(1) items for G′.
2 Group the sets of LR(1) items that have the same core and replace each
group by its union.
3 Let C′ = {J0, J1, J2, ..., Jm} be the resulting sets of LR(1) items.
4 State i is constructed from Ji. The parsing actions for state i are:
a. If [A → α.aβ, b] ∈ Ji and GOTO(Ji, a) = Jj, then set ACTION[i, a] = shift j.
b. If [A → α., a] is in Ji and A ≠ S′, then set ACTION[i, a] = reduce by A → α.
c. If [S′ → S., $] is in Ji, then set ACTION[i, $] = accept.
5 The GOTO transitions for state i are constructed using the rule:
If GOTO(Ji, A) = Jj, then GOTO[i, A] = j.
6 All entries not defined by rules (4) and (5) are errors.
7 The initial state of the parser is the one constructed from the set of
items containing [S′ → .S, $].

LALR Parsing Table



LR(0) Grammar



LR(0) Grammar Parsing Table



Important Points

1 Every SLR(1) grammar is unambiguous, but there are many
unambiguous grammars that are not SLR(1).


Operator Grammar and Operator-Precedence Grammar

▶ Operator Grammar: A grammar in which no production right side has two
adjacent non-terminals.
E → EAE | (E) | id
A → + | ∗
The grammar above is not an operator grammar, because the right side EAE
has adjacent non-terminals. Substituting for A gives the operator grammar
E → E + E | E ∗ E | (E) | id.
▶ Operator-Precedence Grammar: An operator grammar is said to
be an operator-precedence grammar iff
(i) it is an ϵ-free operator grammar, and
(ii) for any pair of terminals a and b, at most one of the
relations a < b, a = b, and a > b holds.


Procedure to Compute Operator Precedence Relations

Let G be an ϵ-free operator grammar, and let a and b be two terminals.

▶ a = b if there is a right side of a production of the form αaβbγ,
where β is either ϵ or a single non-terminal.

▶ a < b if for some non-terminal A there is a right side of the form
αaAβ and A ⇒⁺ γbδ, where γ is either ϵ or a single non-terminal.

▶ a > b if for some non-terminal A there is a right side of the form
αAbβ and A ⇒⁺ γaδ, where δ is either ϵ or a single non-terminal.


LEADING and TRAILING
▶ LEADING(A) = { a | A ⇒⁺ γaδ, where γ is either ϵ or a single
non-terminal }
▶ TRAILING(A) = { a | A ⇒⁺ γaδ, where δ is either ϵ or a single
non-terminal }
Algorithm for LEADING:
1 a is in LEADING(A) if there is a production of the form A → γaδ,
where γ is either ϵ or a single non-terminal.
2 If a is in LEADING(B) and there is a production of the form A → Bα,
then a is in LEADING(A).
Algorithm for TRAILING:
1 a is in TRAILING(A) if there is a production of the form A → γaδ,
where δ is either ϵ or a single non-terminal.
2 If a is in TRAILING(B) and there is a production of the form A → αB,
then a is in TRAILING(A).
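
Both algorithms can be sketched with one fixed-point loop, since TRAILING is LEADING computed on reversed right sides. The test grammar is E → E + T | T, T → T ∗ F | F, F → id; the function name edge_terminals and the dictionary encoding are assumptions, and the sketch relies on the grammar being an ϵ-free operator grammar.

```python
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("id",)],
}

def edge_terminals(reverse=False):
    """LEADING sets when reverse is False, TRAILING sets when reverse is True."""
    result = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                syms = body[::-1] if reverse else body
                before = len(result[head])
                if syms[0] not in GRAMMAR:
                    result[head].add(syms[0])         # rule 1, γ (or δ) is ϵ
                else:
                    result[head] |= result[syms[0]]   # rule 2
                    if len(syms) > 1 and syms[1] not in GRAMMAR:
                        result[head].add(syms[1])     # rule 1, single non-terminal
                changed |= len(result[head]) != before
    return result

print(edge_terminals())               # LEADING
print(edge_terminals(reverse=True))   # TRAILING
```

For this grammar LEADING and TRAILING coincide: {+, ∗, id} for E, {∗, id} for T, and {id} for F.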

Computing Operator Precedence Relations
Input: An operator grammar G
Output: The relations <, =, and > for G
Method:
1 Compute LEADING(A) and TRAILING(A) for each non-terminal A.
2 Set $ < a for all a in LEADING(S) and set b > $ for all b in
TRAILING(S), where S is the start symbol of G.
3 for (each production A → X1 X2 X3 ... Xn) do
    for (i = 1 to n − 1) do
        if Xi and Xi+1 are both terminals then set Xi = Xi+1
        if i ≤ n − 2 and Xi and Xi+2 are terminals and
           Xi+1 is a non-terminal then set Xi = Xi+2
        if Xi is a terminal and Xi+1 is a non-terminal then
            for (all a in LEADING(Xi+1)) set Xi < a
        if Xi is a non-terminal and Xi+1 is a terminal then
            for (all a in TRAILING(Xi)) set a > Xi+1
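
The method can be sketched for the grammar E → E + T | T, T → T ∗ F | F, F → id. To keep the example short, LEADING and TRAILING are supplied as data (their values for this grammar); the function name and the error raised on a conflicting entry are illustrative choices.

```python
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("id",)],
}
LEADING  = {"E": {"+", "*", "id"}, "T": {"*", "id"}, "F": {"id"}}
TRAILING = {"E": {"+", "*", "id"}, "T": {"*", "id"}, "F": {"id"}}

def precedence_relations(start="E"):
    rel = {}

    def set_rel(a, b, r):
        # At most one relation may hold between a pair of terminals.
        if rel.setdefault((a, b), r) != r:
            raise ValueError(f"conflicting relations for ({a}, {b})")

    for a in LEADING[start]:                      # step 2
        set_rel("$", a, "<")
    for b in TRAILING[start]:
        set_rel(b, "$", ">")
    for bodies in GRAMMAR.values():               # step 3
        for body in bodies:
            for i in range(len(body) - 1):
                x, y = body[i], body[i + 1]
                x_term, y_term = x not in GRAMMAR, y not in GRAMMAR
                if x_term and y_term:
                    set_rel(x, y, "=")
                if i + 2 < len(body) and x_term and not y_term \
                        and body[i + 2] not in GRAMMAR:
                    set_rel(x, body[i + 2], "=")
                if x_term and not y_term:
                    for a in LEADING[y]:
                        set_rel(x, a, "<")
                if not x_term and y_term:
                    for a in TRAILING[x]:
                        set_rel(a, y, ">")
    return rel

rel = precedence_relations()
print(rel[("+", "*")], rel[("*", "+")], rel[("+", "+")])   # < > >
```

The pair (id, id) gets no relation at all, which corresponds to an error entry in the precedence table.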



Error Recovery Strategies in LR Parsing

In LR parsing, error recovery is crucial:
▶ It helps the parser continue parsing after encountering an error.
▶ Error recovery strategies ensure that the parser can handle errors
gracefully, continue parsing the rest of the input, and possibly
report more errors in a single pass.
Error Recovery Strategies: Panic Mode Recovery
1 When an error is detected, the parser discards input symbols one by
one until a synchronizing token is found. Synchronizing tokens are
predesignated symbols (e.g., delimiters like ;, }, or )), where parsing
can safely resume.
2 Advantages: Simple to implement. It allows the parser to skip over
problematic sections and continue parsing later parts of the input.
3 Disadvantages: It can cause the parser to skip over large sections of
input, potentially missing multiple errors in the skipped section.


Error Recovery Strategies: Phrase-Level Recovery


1 This strategy tries to correct the error locally by inserting, deleting, or
replacing tokens to transform the erroneous input into something that
could be valid.
2 Advantages: It provides a more localized and less disruptive
recovery. The parser doesn’t discard large sections of the input.
3 Disadvantages: Determining the correct
modifications can be complex, and incorrect modifications might
introduce more errors or hide the actual problem.

Error Recovery Strategies: Error Productions


1 The grammar is augmented with special error productions that define
common errors the parser expects. If the parser encounters a syntax
error that matches an error production, it applies the error rule and
attempts to continue parsing.
2 Advantages: Allows the parser to handle specific expected errors.
3 Disadvantages: Requires foresight into common error patterns, and
writing the error productions can complicate the grammar.
