CH-3 Syntax Analyzer

Injibara University

Department of Computer Science


Compiler Design (CoSc3112)

Chapter 3: Syntax Analysis


Minychil F.

Contents
Chapter 3: Syntax Analysis (6hr)
3.1. Parsing
3.2. Top-down Parsing
3.3.1. Predictive Parsing
3.4.1. Top-down Parsing principles of CFG
3.5. Regular Expression Vs Context Free Grammar (CFG)
3.6. Top-down Parsing Implementation - Recursive Descent Parsing
3.7. Non-Recursive Predictive Parsing
3.8. LL(1) Grammar
3.9. Bottom-Up Parsing
3.10. Handles
3.11. Stack Implementation of Shift Reduce Parsing
3.12. LR Parsers-Implementation - LR Parsing Algorithms
3.13. SLR, CLR and LALR parser
3.3. Error Recovery
3.4. Parser Generator
Syntax Analysis
▪ Syntax analysis is the second phase of the compiler.

▪ It takes the stream of tokens produced by the lexical analyzer as input and generates a syntax tree or parse tree.

▪ Advantages of grammar for syntactic specification:


1. A grammar gives a precise and easy-to-understand syntactic specification of a
programming language.
2. An efficient parser can be constructed automatically from a properly designed grammar.
3. A grammar imparts a structure to a source program that is useful for its translation into
object code and for the detection of errors.
4. New constructs can be added to a language more easily when there is a grammatical
description of the language.
THE ROLE OF PARSER
▪ The parser or syntactic analyzer obtains a string of tokens from the lexical analyzer
and verifies that the string can be generated by the grammar for the source language.
▪ It reports any syntax errors in the program.
▪ It also recovers from commonly occurring errors so that it can continue processing its
input.

Fig. 3.1 Position of parser in compiler model


Cont…
Functions of the parser:
1. It verifies the structure generated by the tokens based on the grammar.
2. It constructs the parse tree.
3. It reports the errors.
4. It performs error recovery.

Issues in syntax analyzer:
▪ The parser cannot detect errors such as:
1. Variable re-declaration.
2. Use of a variable before initialization.
3. Data type mismatch for an operation.
▪ The above issues are handled by the semantic analysis phase.
Syntax Error Handling
• Most programming language specifications do not describe how a compiler should
respond to errors; the response is left to the compiler designer.

• Planning the error handling right from the start can both simplify the structure of a
compiler and improve its response to errors.

• Programs can contain errors at many different levels.

• For example, errors can be


1. Lexical, such as misspelling an identifier, keyword or operator.
2. Syntactic, such as an arithmetic expression with unbalanced parentheses.
3. Semantic, such as an operator applied to an incompatible operand.
4. Logical, such as an infinitely recursive call.
Cont…
▪ Much of the error detection and recovery in a compiler is centered on the syntax analysis
phase, for two reasons:
1. Many errors are syntactic in nature, or are exposed when the stream of tokens coming from the lexical
analyzer disobeys the grammatical rules defining the programming language.

2. Modern parsing methods are precise: they can detect the presence of syntactic errors in programs
very efficiently.

▪ Accurately detecting the presence of semantic and logical errors at compile time is a much
more difficult task.

▪ In this section, we present a few basic techniques for recovering from syntax errors; their
implementation is discussed in conjunction with the parsing methods in this chapter.
Cont…
▪ Functions of error handler :
1. It should report the presence of errors clearly and accurately.
2. It should recover from each error quickly enough to be able to detect subsequent errors.
3. It should not significantly slow down the processing of correct programs.

▪ Several parsing methods, such as the LL and LR methods, detect an error as soon as
possible.

▪ LL and LR methods have the viable-prefix property, meaning they detect that an error
has occurred as soon as they see a prefix of the input that is not a prefix of any string in
the language.
Error-Recovery Strategies
▪ Once an error is detected, how should the parser recover?

▪ There are many different general strategies that a parser can employ to
recover from a syntactic error.

▪ These are
1. Panic mode recovery
2. Phrase level recovery
3. Error productions recovery
4. Global correction recovery
Cont…
▪ Panic mode recovery:
✓On discovering an error, the parser discards input symbols one at a time until a synchronizing
token is found.
✓The synchronizing tokens are usually delimiters, such as semicolon or end.
✓It has the advantage of simplicity and does not go into an infinite loop.
✓When multiple errors in the same statement are rare, this method is quite useful.

▪ Phrase level recovery:


✓On discovering an error, the parser performs local correction on the remaining input that
allows it to continue.
✓Example: Insert a missing semicolon or delete an extraneous semicolon etc.
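Panic-mode recovery as described above can be sketched in C. This is only an illustration under stated assumptions: tokens are single characters, `;` is the only synchronizing token, and the names `is_sync_token` and `panic_mode_skip` are hypothetical.

```c
#include <stddef.h>

/* Assumption for this sketch: tokens are single characters and ';'
 * is the only synchronizing token. */
static int is_sync_token(char tok)
{
    return tok == ';';
}

/* Called when an error is detected at tokens[pos].
 * Discards tokens one at a time until a synchronizing token is found,
 * then returns the index just past it, where parsing can resume.
 * Returns len if the input ends first, so the loop always terminates. */
size_t panic_mode_skip(const char *tokens, size_t len, size_t pos)
{
    while (pos < len && !is_sync_token(tokens[pos]))
        pos++;                 /* discard one input symbol */
    if (pos < len)
        pos++;                 /* consume the synchronizing token itself */
    return pos;
}
```

Because every iteration consumes one token, the skip cannot loop forever, which is the property the slide attributes to panic mode.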
Cont…
▪ Error productions:
✓The parser is constructed from a grammar augmented with error productions, which generate
common erroneous constructs.
✓If an error production is used by the parser, appropriate error diagnostics can be
generated to indicate the erroneous construct recognized in the input.

▪ Global correction:
✓Given an incorrect input string x and grammar G, certain algorithms can be used to
find a parse tree for a related string y, such that the number of insertions, deletions and
changes of tokens required to transform x into y is as small as possible.
✓However, these methods are in general too costly in terms of time and space.
Context Free Grammar (CFG)
▪ Due to the limitations of regular expressions, the lexical analyzer
cannot check the syntax of a given sentence.

▪ Regular expressions cannot check balancing tokens, such as parentheses.

▪ The syntax analysis phase therefore uses a context-free grammar (CFG), whose language is
recognized by a push-down automaton.

▪ Context-free grammars are a superset of regular grammars: every regular language can
also be described by a CFG.
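A small C sketch shows why balancing needs more than a finite automaton: nesting can be arbitrarily deep, so some unbounded memory is required. The function name `balanced` is illustrative, and the depth counter stands in for the push-down stack, which is enough here only because a single kind of bracket is checked.

```c
/* Returns 1 if every '(' in s has a matching ')', else 0.
 * The depth counter plays the role of the stack of a push-down
 * automaton (sufficient because only one bracket type is checked). */
int balanced(const char *s)
{
    long depth = 0;
    for (; *s != '\0'; s++) {
        if (*s == '(') {
            depth++;               /* "push" */
        } else if (*s == ')') {
            if (depth == 0)
                return 0;          /* ')' with nothing to match */
            depth--;               /* "pop" */
        }
    }
    return depth == 0;             /* a leftover '(' means unbalanced */
}
```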


Cont…
Definition of CFG:
➢A Context-Free Grammar (CFG) is a formal grammar that describes the syntax
or structure of a formal language.

➢It consists of a set of production rules that specify how to generate strings of
symbols in the language.
• A CFG is defined by a 4-tuple (V, Σ, R, S), where:
➢V: is a set of non-terminal symbols or variables
➢Σ: is a set of terminal symbols or terminals.
➢R: is a set of production rules or rewrite rules.
➢S: is the start symbol.
Cont…
▪ Example of context-free grammar:
▪ The following grammar defines simple
arithmetic expressions:
expr → expr op expr
expr → (expr)
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑
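One possible way to hold this grammar in a program is a plain table of productions that mirrors the 4-tuple (V, Σ, R, S). The layout and the names `Production`, `rules` and `count_alternatives` are assumptions made for illustration, and `^` stands in for ↑.

```c
#include <stddef.h>
#include <string.h>

/* One production: a non-terminal on the left, a string of symbols on the right. */
typedef struct {
    const char *lhs;
    const char *rhs;
} Production;

const char *const nonterminals[] = { "expr", "op" };                         /* V     */
const char *const terminals[] = { "id", "+", "-", "*", "/", "^", "(", ")" }; /* Sigma */
const Production rules[] = {                                                 /* R     */
    { "expr", "expr op expr" },
    { "expr", "( expr )" },
    { "expr", "- expr" },
    { "expr", "id" },
    { "op", "+" }, { "op", "-" }, { "op", "*" }, { "op", "/" }, { "op", "^" },
};
const char *const start_symbol = "expr";                                     /* S     */

/* Counts how many productions have the given non-terminal on the left. */
int count_alternatives(const char *nonterminal)
{
    size_t i;
    int n = 0;
    for (i = 0; i < sizeof rules / sizeof rules[0]; i++)
        if (strcmp(rules[i].lhs, nonterminal) == 0)
            n++;
    return n;
}
```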
Parse Tree
▪ A parse tree is a graphical depiction of a derivation.

▪ It is convenient to see how strings are derived from the start symbol.

▪ The start symbol of the derivation becomes the root of the parse tree.

▪ Let us see this by an example from the last topic.

▪ Example: We take the left-most derivation of a + b * c

▪ The left-most derivation is:


Derivation
• Derivation is the process of generating a valid string from the grammar by
repeatedly replacing a non-terminal with the right-hand side of one of its
productions, starting from the start symbol.

• Example: Consider the following grammar for arithmetic expressions:


E→E+E|E*E|(E)|-E| id
▪ To generate the valid string - ( id + id ) from this grammar, the steps are:
E ⇒ - E
  ⇒ - ( E )
  ⇒ - ( E + E )
  ⇒ - ( id + E )
  ⇒ - ( id + id )
Cont…
▪ There are two types of derivation:
1. Left-most derivation: the leftmost non-terminal is replaced at every step.
2. Right-most derivation: the rightmost non-terminal is replaced at every step.
▪ Example: Given grammar G: E → E+E | E*E | ( E ) | - E | id
▪ Sentence to be derived: - (id+id)

Left-most derivation     Right-most derivation
E ⇒ -E                   E ⇒ -E
  ⇒ -(E)                   ⇒ -(E)
  ⇒ -(E+E)                 ⇒ -(E+E)
  ⇒ -(id+E)                ⇒ -(E+id)
  ⇒ -(id+id)               ⇒ -(id+id)
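The only difference between the two derivations is which occurrence of E is rewritten at each step. A minimal C sketch of one derivation step, assuming sentential forms are stored as plain strings in which `E` is the only non-terminal (the function name `derive_step` is hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* Replaces one occurrence of the non-terminal 'E' in form with rhs.
 * leftmost != 0 rewrites the leftmost 'E', otherwise the rightmost.
 * form is a buffer of cap bytes. Returns 1 on success, or 0 if form
 * contains no 'E' or the result would not fit in the buffer. */
int derive_step(char *form, size_t cap, const char *rhs, int leftmost)
{
    char *pos = leftmost ? strchr(form, 'E') : strrchr(form, 'E');
    size_t form_len = strlen(form);
    size_t rhs_len = strlen(rhs);

    /* New length is form_len - 1 + rhs_len, plus one byte for '\0'. */
    if (pos == NULL || form_len + rhs_len > cap)
        return 0;
    /* Move the tail after 'E' (including '\0') to its new position. */
    memmove(pos + rhs_len, pos + 1, strlen(pos + 1) + 1);
    memcpy(pos, rhs, rhs_len);
    return 1;
}
```

Starting from `"E"` and applying the right-hand sides `-E`, `(E)`, `E+E`, `id`, `id` with `leftmost = 1` walks through the left-most column above; passing `leftmost = 0` walks through the right-most column.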
Exercise:
➢Derivation and Parse Tree: Let G be a Context Free Grammar for which
the production Rules are given below:

S → aB|bA
A → a|aS|bAA
B → b|bS|aBB

Derive the string aaabbabbba using the above grammar (using both the leftmost
derivation and the rightmost derivation).
Ambiguous Grammar/ Ambiguity
▪ A grammar G is said to be ambiguous if it produces more than one parse tree (equivalently,
more than one leftmost derivation or more than one rightmost derivation) for at least one sentence.
▪ Example: Given grammar G: E → E+E | E*E | ( E ) | - E | id
▪ The sentence id+id*id has the following two distinct leftmost derivations:

Derivation 1             Derivation 2
E ⇒ E + E                E ⇒ E * E
  ⇒ id + E                 ⇒ E + E * E
  ⇒ id + E * E             ⇒ id + E * E
  ⇒ id + id * E            ⇒ id + id * E
  ⇒ id + id * id           ⇒ id + id * id
▪ What will be the parse tree generated by the string or sentence?
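Ambiguity can also be checked mechanically for small inputs by counting parse trees. The C sketch below is restricted, as an assumption, to the productions E → E+E | E*E | id, with the single character `i` standing for the token id; the function names are hypothetical.

```c
#include <string.h>

/* Counts the parse trees of tok[lo..hi] under E -> E+E | E*E | id,
 * where the character 'i' stands for the token id. */
static unsigned long count_range(const char *tok, int lo, int hi)
{
    unsigned long total = 0;
    int k;

    if (lo == hi)
        return tok[lo] == 'i';          /* E -> id */
    /* Try every operator as the root of the tree: E -> E op E. */
    for (k = lo + 1; k < hi; k++)
        if (tok[k] == '+' || tok[k] == '*')
            total += count_range(tok, lo, k - 1) * count_range(tok, k + 1, hi);
    return total;
}

/* Returns the number of distinct parse trees for the whole string
 * (0 means the string is not in the language of this sub-grammar). */
unsigned long count_parse_trees(const char *tok)
{
    int n = (int)strlen(tok);
    return n == 0 ? 0 : count_range(tok, 0, n - 1);
}
```

For `"i+i*i"` the count is 2, one tree with `+` at the root and one with `*` at the root, which matches the two leftmost derivations of id+id*id.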
Cont…
• For the string id + id – id, this grammar generates two parse trees:

➢Write the leftmost and rightmost derivations for each of the above parse trees.
Eliminating ambiguity:
▪ Ambiguity, where the grammar produces more than one parse tree (more than one leftmost or
rightmost derivation) for some sentence, can often be eliminated by rewriting the grammar.

▪ Consider this example:

G: stmt → if expr then stmt | if expr then stmt else stmt | other

▪ This grammar is ambiguous: the string if E1 then if E2 then S1 else S2 has two parse trees,
one attaching the else to the inner if and one attaching it to the outer if.

▪ To eliminate the ambiguity, the following grammar may be used; it matches each else with the
closest unmatched then:


➢ stmt → matched_stmt | unmatched_stmt

➢ matched_stmt → if expr then matched_stmt else matched_stmt | other

➢ unmatched_stmt → if expr then stmt | if expr then matched_stmt else unmatched_stmt
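The effect of the rewritten grammar, each else pairing with the nearest unmatched if, can be sketched as a tiny recursive procedure in C. The token encoding (`i` for "if expr then", `e` for "else", `s` for any other statement) and the names `parse_stmt` and `else_binds_to` are assumptions for this sketch. It reproduces the same association by consuming an else greedily, and is not derived mechanically from the grammar above.

```c
/* Token encoding (an assumption for this sketch):
 *   'i' = "if expr then", 'e' = "else", 's' = any other statement. */

/* Parses one stmt at *p and advances *p past it.
 * level is the nesting level this statement has if it is an if
 * (outermost = 1). *bound receives the level of the if that the most
 * recently parsed else attached to.
 * Returns 1 on success, 0 on a syntax error. */
static int parse_stmt(const char **p, int level, int *bound)
{
    if (**p == 's') {
        (*p)++;
        return 1;
    }
    if (**p != 'i')
        return 0;
    (*p)++;                                  /* if expr then */
    if (!parse_stmt(p, level + 1, bound))    /* then-branch */
        return 0;
    if (**p == 'e') {                        /* nearest open if takes the else */
        (*p)++;
        *bound = level;
        return parse_stmt(p, level + 1, bound);
    }
    return 1;
}

/* Returns the level of the if matched by the last else in the input,
 * 0 if there is no else, or -1 on a syntax error. */
int else_binds_to(const char *tokens)
{
    int bound = 0;
    const char *p = tokens;

    if (!parse_stmt(&p, 1, &bound) || *p != '\0')
        return -1;
    return bound;
}
```

For the token string `"iises"` (if E1 then if E2 then S1 else S2) the result is 2: the else belongs to the inner if.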


Types of parsing
▪ Parsing is the process of analyzing a continuous stream of input in order to
determine its grammatical structure with respect to a given formal grammar.
▪ Parse tree:
✓Graphical representation of a derivation or deduction is called a parse tree.
✓Each interior node of the parse tree is a non-terminal; the children of the node can be
terminals or non-terminals.

▪ The parser is mainly classified into two categories:


1. Top down parsing: A parser can start with the start symbol and try to transform it to the input string.
2. Bottom up parsing: A parser can start with input and attempt to rewrite it into the start symbol.
Cont…
1. Top-Down Parser:
➢A top-down parser builds the parse tree for the given input string from the
grammar productions by expanding non-terminals,

➢i.e. it starts from the start symbol and ends at the terminals.

➢It uses the leftmost derivation.

➢Top-down parsers are further classified into two types:

1. Recursive descent parser

2. Non-recursive descent parser


Cont…
i. Recursive descent parser
➢In its general form it may need backtracking, which is why it is also known as the
brute-force parser or the backtracking parser.

➢It builds the parse tree by trying the alternatives of a production in turn and
backtracking when one fails.

ii. Non-recursive descent parser

➢It is also known as the LL(1) parser, predictive parser, parser without backtracking,
or dynamic parser.

➢It uses a parsing table to build the parse tree instead of backtracking.
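A recursive-descent parser is a set of mutually recursive procedures, one per non-terminal. The C sketch below is a recognizer for an assumed left-recursion-free expression grammar (shown in the comment, with `i` standing for id). Because that grammar is LL(1), one character of lookahead selects the production and no backtracking is needed. All function names are hypothetical.

```c
/* Assumed grammar for this sketch ('i' = id):
 *   E  -> T E'
 *   E' -> + T E' | epsilon
 *   T  -> F T'
 *   T' -> * F T' | epsilon
 *   F  -> ( E ) | i
 */
static const char *look;                 /* current lookahead position */

static int parse_E(void);

static int parse_F(void)
{
    if (*look == 'i') {                  /* F -> i */
        look++;
        return 1;
    }
    if (*look == '(') {                  /* F -> ( E ) */
        look++;
        if (!parse_E() || *look != ')')
            return 0;
        look++;
        return 1;
    }
    return 0;                            /* no production applies */
}

static int parse_Tprime(void)
{
    if (*look == '*') {                  /* T' -> * F T' */
        look++;
        return parse_F() && parse_Tprime();
    }
    return 1;                            /* T' -> epsilon */
}

static int parse_T(void)
{
    return parse_F() && parse_Tprime();
}

static int parse_Eprime(void)
{
    if (*look == '+') {                  /* E' -> + T E' */
        look++;
        return parse_T() && parse_Eprime();
    }
    return 1;                            /* E' -> epsilon */
}

static int parse_E(void)
{
    return parse_T() && parse_Eprime();
}

/* Returns 1 if the whole input is a sentence of the grammar, else 0. */
int parse(const char *input)
{
    look = input;
    return parse_E() && *look == '\0';
}
```

A table-driven (non-recursive) predictive parser makes the same decisions, but looks them up in a parsing table and keeps an explicit stack instead of using the call stack.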
Cont…
2. Bottom-up Parser:
➢A bottom-up parser builds the parse tree for the given input string from the
grammar productions by reducing substrings to non-terminals,

➢i.e. it starts from the terminals (the input string) and ends at the start symbol.

➢It traces the rightmost derivation in reverse.

➢Further Bottom-up parser is classified into two types:

1. LR parser

2. Operator precedence parser


Cont…
i. LR parser
➢It is a bottom-up parser that builds the parse tree for the given string using an
unambiguous grammar.
➢It follows the reverse of the rightmost derivation.
➢LR parsers are of four types: a- LR(0) b- SLR(1) c- LALR(1) d- CLR(1)

ii. Operator precedence parser

➢It builds the parse tree from the given grammar and string, with the condition that
neither two consecutive non-terminals nor epsilon ever appears on the right-hand side of any
production.
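The bottom-up idea, shifting input onto a stack and reducing until only the start symbol remains, can be sketched in C for the assumed grammar E → E+E | E*E | ( E ) | i (`i` = id). This is a naive recognizer with hypothetical names: it reduces as soon as a right-hand side appears on top of the stack, so it ignores operator precedence. A real LR or operator-precedence parser consults a table to choose between shift and reduce.

```c
#include <stddef.h>
#include <string.h>

/* Tries one reduction on the top of the stack; returns 1 if one was made.
 * Productions: E -> i | ( E ) | E + E | E * E */
static int try_reduce(char *stack, int *top)
{
    int n = *top;

    if (n >= 1 && stack[n - 1] == 'i') {                 /* E -> i */
        stack[n - 1] = 'E';
        return 1;
    }
    if (n >= 3 && stack[n - 3] == '(' &&
        stack[n - 2] == 'E' && stack[n - 1] == ')') {    /* E -> ( E ) */
        stack[n - 3] = 'E';
        *top = n - 2;
        return 1;
    }
    if (n >= 3 && stack[n - 3] == 'E' &&
        (stack[n - 2] == '+' || stack[n - 2] == '*') &&
        stack[n - 1] == 'E') {                           /* E -> E + E | E * E */
        *top = n - 2;                                    /* stack[n-3] is already 'E' */
        return 1;
    }
    return 0;
}

/* Returns 1 if input reduces to the start symbol E, else 0. */
int shift_reduce_accepts(const char *input)
{
    char stack[256];
    int top = 0;

    for (; *input != '\0'; input++) {
        if (strchr("i+*()", *input) == NULL || top == (int)sizeof stack)
            return 0;                      /* unknown token or stack full */
        stack[top++] = *input;             /* shift */
        while (try_reduce(stack, &top))    /* reduce while a right-hand side is on top */
            ;
    }
    return top == 1 && stack[0] == 'E';    /* accept: only the start symbol is left */
}
```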
Assignment-1
1. Recursive descent parsing
2. Backtracking parsing
a. Define backtracking parsing
b. Implementation of backtracking parsing
3. Predictive parsing or LL(1) parsing
a. Explain predictive parsing or LL(1) parsing.
b. Show the block diagram of predictive parsing.
c. Write the steps to construct a predictive parser, supported with examples.
d. Error recovery in predictive parsing
4. Explain shift-reduce parsing.
5. Explain LR parsing and error recovery in LR parsing.
6. Explain SLR parsing.
7. Explain CLR parsing.
8. Explain LALR parsing.
9. Explain operator precedence parsing.
Yacc Specifications
▪ Every Yacc specification file consists of three sections:
➢ The declarations,
➢ The (grammar) rules, and
➢ The programs.
▪ The sections are separated by double percent "%%" marks.
▪ (The percent "%" is generally used in Yacc specifications as an escape character.)
▪ In other words, a full specification file looks like:
Declarations
%%
Rules
%%
Auxiliary programs
▪ The grammar
A → B
A → C
A → D
can be given to yacc as follows:
A : B
  | C
  | D
  ;
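Cont…
▪ The parser program in yacc is saved using the file extension .y

▪ The first step in building the yacc program p.y is to run yacc -d p.y, which generates the
parser source file y.tab.c; the option -d additionally generates the header file y.tab.h
containing the token definitions.

▪ y.tab.c can then be compiled with the cc compiler using the command cc y.tab.c, which
produces the executable file a.out. The generated parser calls yylex(), so a lexical analyzer
providing it (for example, one generated by lex) must be compiled and linked as well.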
Example-1: Yacc program
➢ Program to test the validity of a simple expression involving operators +, -, * and /
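%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);               /* supplied by the lexical analyzer */
int yyerror(const char *msg);
%}
%token NUMBER ID NL
%left '+' '-'
%left '*' '/'
%%
stmt : exp NL { printf("Valid Expression\n"); exit(0); }
     ;
exp  : exp '+' exp
     | exp '-' exp
     | exp '*' exp
     | exp '/' exp
     | '(' exp ')'
     | ID
     | NUMBER
     ;
%%
/* Called by yyparse() when the input does not match the grammar. */
int yyerror(const char *msg)
{
    printf("Invalid Expression\n");
    exit(0);
}
int main(void)
{
    printf("Enter the expression\n");
    yyparse();
    return 0;
}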
Example-2
➢ Program to recognize nested IF control statements and display the levels of nesting
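%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);               /* supplied by the lexical analyzer */
int yyerror(const char *msg);
int count = 0;                 /* number of nested IF statements seen */
%}
%token IF RELOP S NUMBER ID
%%
stmt    : if_stmt { printf("No of nested if statements=%d\n", count); exit(0); }
        ;
if_stmt : IF '(' cond ')' if_stmt { count++; }
        | S
        ;
cond    : x RELOP x
        ;
x       : ID
        | NUMBER
        ;
%%
/* Called by yyparse() when the input does not match the grammar. */
int yyerror(const char *msg)
{
    printf("Invalid Expression\n");
    exit(0);
}
int main(void)
{
    printf("Enter the statement\n");
    yyparse();
    return 0;
}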
