CH-3 Syntax Analyzer
Contents
Chapter 3: Syntax Analysis (6hr) 3.9. Bottom-Up Parsing
▪ The syntax analyzer (parser) takes the stream of tokens produced by the lexical analyzer as input and generates a syntax tree or parse tree.
• Planning the error handling right from the start can both simplify the structure of a
compiler and improve its response to errors.
2. Modern parsing methods are precise: they can detect the presence of syntactic errors in programs
very efficiently.
▪ Accurately detecting the presence of semantic and logical errors at compile time is a much
more difficult task.
▪ In this section, we present a few basic techniques for recovering from syntax errors; their
implementation is discussed in conjunction with the parsing methods in this chapter.
▪ Functions of the error handler:
1. It should report the presence of errors clearly and accurately.
2. It should recover from each error quickly enough to be able to detect subsequent errors.
3. It should not significantly slow down the processing of correct programs.
▪ Several parsing methods, such as the LL and LR methods, detect an error as soon as
possible.
▪ LL and LR methods have the viable-prefix property, meaning they detect that an error
has occurred as soon as they see a prefix of the input that is not a prefix of any string in
the language.
Error-Recovery Strategies
▪ Once an error is detected, how should the parser recover?
▪ There are many different general strategies that a parser can employ to
recover from a syntactic error.
▪ These are:
1. Panic-mode recovery
2. Phrase-level recovery
3. Error productions
4. Global correction
▪ Panic mode recovery:
✓On discovering an error, the parser discards input symbols one at a time until a synchronizing
token is found.
✓The synchronizing tokens are usually delimiters, such as semicolon or end.
✓It has the advantage of simplicity and does not go into an infinite loop.
✓When multiple errors in the same statement are rare, this method is quite useful.
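The panic-mode idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy statement form id = num ; the set SYNC_TOKENS and the function name parse_statements are all hypothetical and not part of the course material.

```python
# Synchronizing tokens (assumed set, following the "semicolon or end" remark).
SYNC_TOKENS = {";", "end"}

def parse_statements(tokens):
    """Parse toy statements of the form: id = num ;
    Returns (number_of_valid_statements, positions_where_errors_started)."""
    pos, valid, errors = 0, 0, []
    expected = ["id", "=", "num", ";"]
    while pos < len(tokens):
        if tokens[pos:pos + 4] == expected:
            valid += 1
            pos += 4
            continue
        # Syntax error: report it, then discard input symbols one at a
        # time until a synchronizing token is found.
        errors.append(pos)
        while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
            pos += 1
        pos += 1  # consume the synchronizing token, then resume parsing
    return valid, errors
```

Because pos grows on every iteration, the loop always terminates, which mirrors the "does not go into an infinite loop" property. The price is that everything between the error and the next synchronizing token is skipped without being checked.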
▪ Global correction:
✓Given an incorrect input string x and grammar G, certain algorithms can be used to
find a parse tree for a string y, such that the number of insertions, deletions and
changes of tokens is as small as possible.
✓However, these methods are in general too costly in terms of time and space.
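The cost measure used by global correction (the number of token insertions, deletions and changes) can be sketched as an edit distance over token lists. The sketch below is not a global-correction algorithm: it assumes a small, explicitly given list of candidate strings y instead of searching the whole language of G, and the function names are hypothetical.

```python
def token_edit_distance(x, y):
    """Minimum number of token insertions, deletions and changes turning x into y."""
    m, n = len(x), len(y)
    # dist[i][j] = cost of turning the first i tokens of x into the first j tokens of y
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i          # delete all i tokens
    for j in range(n + 1):
        dist[0][j] = j          # insert all j tokens
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            change = 0 if x[i - 1] == y[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,           # deletion
                dist[i][j - 1] + 1,           # insertion
                dist[i - 1][j - 1] + change,  # change (or match)
            )
    return dist[m][n]

def closest_candidate(x, candidates):
    """Pick the candidate y with the smallest edit distance to x (assumed candidate list)."""
    return min(candidates, key=lambda y: token_edit_distance(x, y))
```

Even this restricted version takes time proportional to len(x) * len(y) per candidate, which hints at why searching over every string the grammar can generate is too costly in practice.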
Context Free Grammar (CFG)
▪ Due to the limitations of regular expressions, the lexical analyzer
cannot check the syntax of a given sentence.
➢A context-free grammar consists of a set of production rules that specify how to generate strings of
symbols in the language.
• A CFG is defined by a 4-tuple (V, Σ, R, S), where:
➢V: is a set of non-terminal symbols or variables
➢Σ: is a set of terminal symbols or terminals.
➢R: is a set of production rules or rewrite rules.
➢S: is the start symbol.
▪ Example of context-free grammar:
▪ The following grammar defines simple
arithmetic expressions:
expr → expr op expr
expr → (expr)
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑
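To make the 4-tuple (V, Σ, R, S) concrete, the arithmetic-expression grammar above can be written down as data. This is a minimal sketch: the class name CFG, its field names and the is_well_formed check are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    variables: frozenset   # V: non-terminal symbols
    terminals: frozenset   # Σ: terminal symbols
    rules: tuple           # R: (head, body) pairs; body is a tuple of symbols
    start: str             # S: start symbol

    def is_well_formed(self):
        """Check the basic constraints a 4-tuple must satisfy to be a CFG."""
        symbols = self.variables | self.terminals
        return (
            self.start in self.variables
            and not (self.variables & self.terminals)
            and all(
                head in self.variables and all(s in symbols for s in body)
                for head, body in self.rules
            )
        )

# The expression grammar from the example, one (head, body) pair per production.
EXPR_GRAMMAR = CFG(
    variables=frozenset({"expr", "op"}),
    terminals=frozenset({"id", "+", "-", "*", "/", "↑", "(", ")"}),
    rules=(
        ("expr", ("expr", "op", "expr")),
        ("expr", ("(", "expr", ")")),
        ("expr", ("-", "expr")),
        ("expr", ("id",)),
        ("op", ("+",)), ("op", ("-",)), ("op", ("*",)),
        ("op", ("/",)), ("op", ("↑",)),
    ),
    start="expr",
)
```

Every production head is a single non-terminal, which is exactly what makes the grammar context-free.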
Parse Tree
▪ A parse tree is a graphical depiction of a derivation.
▪ It is convenient to see how strings are derived from the start symbol.
▪ The start symbol of the derivation becomes the root of the parse tree.
2. Right-most Derivation
▪ Example: Given grammar G: E → E+E | E*E | ( E ) | - E | id
▪ Sentence to be derived : - (id+id)
Left-most derivation        Right-most derivation
E ⇒ -E                      E ⇒ -E
  ⇒ -(E)                      ⇒ -(E)
  ⇒ -(E+E)                    ⇒ -(E+E)
  ⇒ -(id+E)                   ⇒ -(E+id)
  ⇒ -(id+id)                  ⇒ -(id+id)
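The two derivations of - (id+id) differ only in which non-terminal is expanded at each step. A small Python sketch makes this mechanical; the function derive and the constant names are hypothetical, and symbols are written space-separated for easy splitting.

```python
NONTERMINALS = {"E"}
# Production bodies of G: E -> E+E | E*E | (E) | -E | id
PRODUCTION_BODIES = {"E + E", "E * E", "( E )", "- E", "id"}

def derive(bodies, leftmost=True, start="E"):
    """Apply each production body in turn to the leftmost (or rightmost)
    non-terminal. Returns every sentential form, starting with the start symbol."""
    form = [start]
    steps = [" ".join(form)]
    for body in bodies:
        if body not in PRODUCTION_BODIES:
            raise ValueError(f"not a production of G: E -> {body}")
        positions = [i for i, s in enumerate(form) if s in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + body.split() + form[i + 1:]
        steps.append(" ".join(form))
    return steps

# The same productions, in the same order, drive both derivations.
BODIES = ["- E", "( E )", "E + E", "id", "id"]
leftmost_steps = derive(BODIES, leftmost=True)
rightmost_steps = derive(BODIES, leftmost=False)
```

Both lists end in the same sentence; they differ only in the second-to-last sentential form, where the leftmost derivation has already replaced the left E and the rightmost derivation the right E.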
Exercise:
➢Derivation and Parse Tree: Let G be a Context Free Grammar for which
the production Rules are given below:
S → aB|bA
A → a|aS|bAA
B → b|bS|aBB
Derive the string aaabbabbba using the above grammar (using both the leftmost
derivation and the rightmost derivation).
Ambiguous Grammar/ Ambiguity
▪ A grammar G is said to be ambiguous if it produces more than one parse tree
(equivalently, more than one leftmost derivation or more than one rightmost
derivation) for at least one sentence.
▪ Example: Given grammar G: E → E+E | E*E | ( E ) | - E | id
▪ The sentence id+id*id has the following two distinct leftmost derivations:
Derivation 1                Derivation 2
E ⇒ E + E                   E ⇒ E * E
  ⇒ id + E                    ⇒ E + E * E
  ⇒ id + E * E                ⇒ id + E * E
  ⇒ id + id * E               ⇒ id + id * E
  ⇒ id + id * id              ⇒ id + id * id
▪ What will be the parse tree generated by the string or sentence?
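One way to confirm ambiguity for a given sentence is to count its parse trees. The sketch below is hard-wired to the grammar G: E → E+E | E*E | ( E ) | - E | id; the function name count_parse_trees is hypothetical, and the span-splitting approach is one possible counting method rather than a standard parser.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the distinct parse trees of tokens under
    E -> E+E | E*E | (E) | -E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of parse trees deriving tokens[i:j] from E."""
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and tokens[i] == "id":            # E -> id
            total += 1
        if tokens[i] == "-":                            # E -> - E
            total += count(i + 1, j)
        if tokens[i] == "(" and tokens[j - 1] == ")":   # E -> ( E )
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):                   # E -> E + E | E * E
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

For id + id * id the count is 2: one tree has + at the root and the other has * at the root, matching the two leftmost derivations. For - ( id + id ) the count is 1, so that sentence is not a witness of ambiguity.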
• For the string id + id - id, this grammar generates two parse trees:
➢Write the leftmost derivation and the rightmost derivation for each of the above parse trees.
Eliminating ambiguity:
▪ The ambiguity of a grammar that produces more than one parse tree for some sentence can
often be eliminated by rewriting the grammar.
This grammar is ambiguous because the string if E1 then if E2 then S1 else S2 has the following two
parse trees:
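➢A top-down parser builds the parse tree from the root, i.e. it starts from the start symbol and ends on the terminals.
➢A recursive-descent parser with backtracking basically generates the parse tree by using brute force and backtracking.
➢A non-recursive (predictive, LL(1)) parser uses a parsing table to generate the parse tree instead of backtracking.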
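The table-driven approach in the last point can be sketched as follows. The grammar E → T E', E' → + T E' | ε, T → id | ( E ) and its hand-written parsing table are assumptions chosen to keep the example small; they do not come from the slides.

```python
NONTERMINALS = {"E", "E'", "T"}

# Parsing table: (non-terminal, lookahead) -> production body.
# An empty list stands for the ε-production.
TABLE = {
    ("E", "id"): ["T", "E'"],
    ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", ")"): [],
    ("E'", "$"): [],
    ("T", "id"): ["id"],
    ("T", "("): ["(", "E", ")"],
}

def ll1_parse(tokens):
    """Return True if tokens form a sentence of the toy grammar, else False."""
    input_ = list(tokens) + ["$"]   # $ marks the end of the input
    stack = ["$", "E"]              # start symbol on top of the end marker
    pos = 0
    while stack:
        top = stack.pop()
        look = input_[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:        # empty table entry: syntax error
                return False
            stack.extend(reversed(body))  # leftmost body symbol ends up on top
        elif top == look:           # terminal matches the lookahead: consume it
            pos += 1
        else:
            return False
    return pos == len(input_)
```

Each step is decided by a single table lookup on the current non-terminal and one lookahead token, so the parser never has to undo a choice.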
2. Bottom-up Parser:
➢A bottom-up parser generates the parse tree for the given input
string with the help of the grammar productions by reducing substrings of the input
to non-terminals, i.e. it starts from the terminals and ends at the start symbol.
1. LR parser
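The bottom-up idea (shift input symbols onto a stack and reduce them to non-terminals until only the start symbol is left) can be sketched with a naive shift-reduce loop. This is not an LR parser: it reduces greedily whenever a production body appears on top of the stack, where a real LR parser would consult a parsing table. The grammar E → E+E | E*E | ( E ) | id and all names are assumptions for illustration.

```python
# (head, body) pairs of the assumed grammar E -> E+E | E*E | (E) | id
RULES = [
    ("E", ["E", "+", "E"]),
    ("E", ["E", "*", "E"]),
    ("E", ["(", "E", ")"]),
    ("E", ["id"]),
]

def shift_reduce(tokens):
    """Return the list of actions taken, or None if the input is rejected."""
    stack, actions = [], []
    for tok in tokens:
        stack.append(tok)                         # shift
        actions.append(f"shift {tok}")
        reduced = True
        while reduced:                            # reduce while a body is on top
            reduced = False
            for head, body in RULES:
                if stack[-len(body):] == body:
                    del stack[-len(body):]        # pop the body ...
                    stack.append(head)            # ... and push its head
                    actions.append(f"reduce {head} -> {' '.join(body)}")
                    reduced = True
                    break
    # Accept only if everything was reduced to the start symbol.
    return actions if stack == ["E"] else None
```

For the input id + id the recorded actions read: shift id, reduce E -> id, shift +, shift id, reduce E -> id, reduce E -> E + E. Read in reverse, the reductions trace out a rightmost derivation of the sentence.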
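▪ The first step in running the yacc program p.y is the command yacc -d p.y. It generates
the parser source file y.tab.c; the option -d additionally generates the header file y.tab.h,
which contains the token definitions. y.tab.c can then be compiled with the cc compiler
using the command cc y.tab.c (together with a scanner that supplies yylex()), which
generates the executable file a.out.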
Example-1: Yacc program
➢ Program to test the validity of a simple expression involving operators +, -, * and /
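%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);            /* supplied by the lex-generated scanner */
int yyerror(char *msg);
%}
%token NUMBER ID NL
%left '+' '-'
%left '*' '/'
%%
stmt : exp NL { printf("Valid Expression\n"); exit(0); }
     ;
exp  : exp '+' exp
     | exp '-' exp
     | exp '*' exp
     | exp '/' exp
     | '(' exp ')'
     | ID
     | NUMBER
     ;
%%
/* Called by yyparse() when a syntax error is detected. */
int yyerror(char *msg)
{
    printf("Invalid Expression\n");
    exit(0);
}
int main(void)
{
    printf("Enter the expression\n");
    yyparse();
    return 0;
}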
Example-2
➢ Program to recognize nested IF control statements and display the levels of nesting
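%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);            /* supplied by the lex-generated scanner */
int yyerror(char *msg);
int count = 0;              /* number of IF statements reduced so far */
%}
%token IF RELOP S NUMBER ID
%%
stmt    : if_stmt { printf("No of nested if statements=%d\n", count); exit(0); }
        ;
if_stmt : IF '(' cond ')' if_stmt { count++; }
        | S
        ;
cond    : x RELOP x
        ;
x       : ID
        | NUMBER
        ;
%%
/* Called by yyparse() when a syntax error is detected. */
int yyerror(char *msg)
{
    printf("Invalid Expression\n");
    exit(0);
}
int main(void)
{
    printf("Enter the statement\n");
    yyparse();
    return 0;
}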