Presented by Jyoti Thakur
Outline
Introduction
Phases of compiler
Tokens, Patterns, Lexemes
Lexical Analysis
Types of Grammar
Parsers
Introduction
A compiler is a language processor that translates a source program into an object program (machine language). The target can also be assembly language.
Source Program (input) -> Compiler -> Target Program (output)
Errors found during translation are reported as error messages.
The Analysis-Synthesis Model
of Compilation
There are two parts to compilation:
Analysis
Synthesis
Phases of compiler
Lexical Analysis
Syntax Analysis
Semantic Analysis
Intermediate Code Generator
Machine-independent code optimization
Target code generation
Machine-dependent code optimization
Tokens, Patterns and Lexemes
A lexeme is a sequence of input characters that comprises a single token. Ex: total, =, 2, etc.
Tokens are classes of similar lexemes. Ex: identifier, keyword, constant, etc.
A pattern is a rule that describes the lexemes of a token. Ex: the pattern for an identifier is a letter followed by letters or digits.
Lexical Analysis
It reads the high-level language program one character at a time.
It breaks the program into tokens.
It removes white spaces and comments.
It maintains the line number of the program.
It creates storage for identifiers in the symbol table.
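As a rough sketch of how these tasks fit together (the token names and regular expressions below are my own illustration, not from the slides), a lexer can be driven by one combined regular expression, skipping white space and comments and recording identifiers in a symbol table:

import re

# Token patterns (regular expressions); the names are illustrative.
TOKEN_SPEC = [
    ("COMMENT",    r"//[^\n]*"),              # removed, like white space
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),  # letter followed by letters or digits
    ("ASSIGN",     r"="),
    ("PLUS",       r"\+"),
    ("NEWLINE",    r"\n"),
    ("SKIP",       r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    symbol_table = {}                          # identifier -> line of first occurrence
    line = 1
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "NEWLINE":
            line += 1                          # maintain the line number
        elif kind in ("SKIP", "COMMENT"):
            continue                           # remove white space and comments
        else:
            if kind == "IDENTIFIER":
                symbol_table.setdefault(lexeme, line)
            yield kind, lexeme, line
    print("symbol table:", symbol_table)

for token in tokenize("total = total + 2  // update total\n"):
    print(token)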
Types of Grammar
There are two types of grammar:
1. Left recursive grammar.
2. Right recursive grammar.
• A left recursive grammar and its right recursive counterpart generate the same language, but they give different derivations (parse trees), so one is not simply interchangeable with the other.
• Left recursion makes a top-down parser expand the same non-terminal forever, i.e. it creates an infinite loop in the program.
• Right recursion doesn't create this problem.
Elimination of left recursion
A grammar is left recursive if it has a non-terminal A such that there is a derivation A =>+ Aα (in one or more steps).
Top-down parsing methods can't handle left-recursive grammars.
A simple rule for direct left recursion elimination:
For a rule like:
S-> Sa|b
We may replace it with
S -> b S’
S’ -> a S’ | ɛ
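A small sketch of this replacement rule in code (my own illustration; grammar symbols are strings and ε is written literally):

def eliminate_direct_left_recursion(head, alternatives):
    """alternatives is a list of symbol lists, e.g. [["S", "a"], ["b"]] for S -> Sa | b."""
    new_head = head + "'"
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]   # the α parts
    others    = [alt for alt in alternatives if not alt or alt[0] != head]    # the β parts
    if not recursive:
        return {head: alternatives}            # nothing to eliminate
    return {
        head:     [beta + [new_head] for beta in others],                 # A  -> β A'
        new_head: [alpha + [new_head] for alpha in recursive] + [["ε"]],  # A' -> α A' | ε
    }

# S -> Sa | b   becomes   S -> bS',  S' -> aS' | ε
print(eliminate_direct_left_recursion("S", [["S", "a"], ["b"]]))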
Left factoring
When two or more alternatives of a production share a common prefix, the parser cannot decide which alternative to use; left factoring rewrites such productions to avoid backtracking.
For a rule like:
S -> aα1 | aα2 | aα3 …
We may replace it with:
S -> aS'
S' -> α1 | α2 | α3 …
Left factoring (cont.)
Example:
S -> i E t S | i E t S e S | a
E -> b
Left factoring the two alternatives that start with i E t S gives:
S -> i E t S S' | a
S' -> e S | ε
E -> b
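The small sketch below reproduces this factoring mechanically (my own illustration; it only handles a common prefix shared by alternatives that begin with the same symbol):

def common_prefix(seqs):
    """Longest common prefix of a list of symbol lists."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(head, alternatives):
    new_head = head + "'"
    groups = {}
    for alt in alternatives:                   # group alternatives by their first symbol
        groups.setdefault(alt[0] if alt else "", []).append(alt)
    rules = {head: [], new_head: []}
    for alts in groups.values():
        if len(alts) == 1:
            rules[head].append(alts[0])        # no common prefix to factor out
        else:
            prefix = common_prefix(alts)
            rules[head].append(prefix + [new_head])                 # A  -> a A'
            for alt in alts:
                rules[new_head].append(alt[len(prefix):] or ["ε"])  # A' -> α1 | α2 | ...
    return rules

# S -> iEtS | iEtSeS | a   becomes   S -> iEtSS' | a,  S' -> ε | eS
print(left_factor("S", [list("iEtS"), list("iEtSeS"), ["a"]]))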
Introduction to top-down parsing
A top-down parser tries to create a parse tree from the root towards the leaves, scanning the input from left to right.
It can also be viewed as finding a leftmost derivation for an input string.
Example: input id+id*id with the grammar
E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E) | id
Leftmost derivation (each step expands the leftmost non-terminal):
E => TE' => FT'E' => id T'E' => id E' => id + TE' => id + FT'E' => id + id T'E' => id + id * FT'E' => id + id * id T'E' => id + id * id E' => id + id * id
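A minimal recursive-descent sketch for this grammar (my own illustration, with one function per non-terminal; no backtracking is needed because the grammar has no left recursion and is left factored):

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def match(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected}, got {peek()}")
        pos += 1

    def E():                       # E -> T E'
        T(); E_prime()

    def E_prime():                 # E' -> + T E' | ε
        if peek() == "+":
            match("+"); T(); E_prime()

    def T():                       # T -> F T'
        F(); T_prime()

    def T_prime():                 # T' -> * F T' | ε
        if peek() == "*":
            match("*"); F(); T_prime()

    def F():                       # F -> ( E ) | id
        if peek() == "(":
            match("("); E(); match(")")
        else:
            match("id")

    E()
    match("$")                     # the whole input must be consumed
    return "accepted"

print(parse(["id", "+", "id", "*", "id", "$"]))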
Example
S -> cAd
A -> ab | a        Input: cad
The parser expands S -> cAd and matches c. It first tries A -> ab: a matches, but the next input symbol is d while the tree expects b, so the parser backtracks and tries A -> a. Now a matches and the remaining d matches as well, so the input is accepted.
Computing First
Rule 1:
If A is ε, then First(A) = {ε}.
Rule 2:
If A is a terminal, then First(A) = {A}.
Rule 3:
If A is a variable with a production A -> x1 x2 x3 …, then First(A) includes First(x1); if First(x1) contains ε, also add First(x2), and so on.
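For example, for the expression grammar used later in these slides (E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id):
First(F) = {(, id} by rule 2, since both alternatives of F start with a terminal.
First(T) = First(F) = {(, id} by rule 3, since T -> FT' and First(F) does not contain ε.
First(E) = First(T) = {(, id} for the same reason.
First(E') = {+, ε} and First(T') = {*, ε}, because each has an ε-alternative.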
Computing follow
To compute Follow(A) for all non-terminals A, apply the following rules until nothing can be added to any Follow set:
1. Place $ in Follow(S), where S is the start symbol.
2. If there is a production B -> αAβ, then everything in First(β) except ε is in Follow(A).
3. If there is a production B -> αA, or a production B -> αAβ where First(β) contains ε, then everything in Follow(B) is in Follow(A).
Example!
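As a small illustration (the fixed-point loop and helper names below are my own, not from the slides), the First and Follow rules can be applied to the expression grammar until nothing changes:

GRAMMAR = {                                    # [] is the ε-alternative
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"

def first_of_seq(seq, first):
    """First of a sequence of grammar symbols; 'ε' marks the empty string."""
    result = set()
    for sym in seq:
        f = first.get(sym, {sym})              # a terminal's First is just itself
        result |= f - {"ε"}
        if "ε" not in f:
            return result
    result.add("ε")                            # every symbol in seq can derive ε
    return result

def compute_first_follow():
    first = {A: set() for A in GRAMMAR}
    follow = {A: set() for A in GRAMMAR}
    follow[START].add("$")                     # rule 1: $ goes into Follow(start)
    changed = True
    while changed:
        changed = False
        for A, bodies in GRAMMAR.items():
            for body in bodies:
                f = first_of_seq(body, first)  # First rules: First(A) grows from First(body)
                if not f <= first[A]:
                    first[A] |= f
                    changed = True
                for i, B in enumerate(body):   # Follow rules for each non-terminal B in body
                    if B not in GRAMMAR:
                        continue
                    rest = first_of_seq(body[i + 1:], first)
                    new = rest - {"ε"}
                    if "ε" in rest:            # rule 3: Follow(A) flows into Follow(B)
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return first, follow

first, follow = compute_first_follow()
for A in GRAMMAR:
    print(A, "First:", first[A], "Follow:", follow[A])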
Construction of the predictive parsing table (LL(1))
For each production A -> α in the grammar, do the following:
1. For each terminal a in First(α), add A -> α to M[A, a].
2. If ε is in First(α), then for each terminal b in Follow(A) add A -> α to M[A, b]. If ε is in First(α) and $ is in Follow(A), add A -> α to M[A, $] as well.
If, after performing the above, there is no production in M[A, a], then set M[A, a] to error.
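A small sketch of these table-filling rules (my own illustration; the First and Follow sets are hard-coded from the example that follows, and first_of_body is simplified because no production body in this grammar starts with a nullable non-terminal):

GRAMMAR = [                                    # (head, body) pairs; [] is ε
    ("E",  ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T",  ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F",  ["(", "E", ")"]),  ("F",  ["id"]),
]
FIRST  = {"E": {"(", "id"}, "E'": {"+", "ε"}, "T": {"(", "id"},
          "T'": {"*", "ε"}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of_body(body):
    if not body:                               # the ε-production
        return {"ε"}
    sym = body[0]
    return FIRST.get(sym, {sym})               # terminal or non-nullable non-terminal

def build_table():
    table = {}
    for head, body in GRAMMAR:
        f = first_of_body(body)
        terminals = f - {"ε"}                  # rule 1: entries for First(body)
        if "ε" in f:
            terminals |= FOLLOW[head]          # rule 2: entries for Follow(head)
        for a in terminals:
            if (head, a) in table:
                print("conflict at M[%s, %s]: grammar is not LL(1)" % (head, a))
            table[(head, a)] = body            # entries never set remain "error"
    return table

for (A, a), body in sorted(build_table().items()):
    print(f"M[{A}, {a}] = {A} -> {' '.join(body) or 'ε'}")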
Example
Grammar:
E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E) | id

Non-terminal    First      Follow
E               {(, id}    {), $}
E'              {+, ε}     {), $}
T               {(, id}    {+, ), $}
T'              {*, ε}     {+, ), $}
F               {(, id}    {+, *, ), $}

First row of the parsing table (input symbols id, +, *, (, ), $):
M[E, id] = E -> TE'        M[E, (] = E -> TE'        (all other entries of the E row are error)
Parsing table for the left-factored grammar S -> iEtSS' | a, S' -> eS | ε, E -> b (input symbols a, b, e, i, t, $):
M[S, a]  = S -> a          M[S, i]  = S -> iEtSS'
M[S', e] = S' -> ε and S' -> eS
M[S', $] = S' -> ε
M[E, b]  = E -> b
The entry M[S', e] holds two productions, so this grammar is not LL(1) (the dangling-else ambiguity).
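For completeness, a sketch of the table-driven driver that consumes such a table (my own illustration; the table literal is the full LL(1) table for the expression grammar, written out by hand):

TABLE = {   # (non-terminal, lookahead) -> production body to expand with
    ("E", "id"): ["T", "E'"],        ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],   ("E'", ")"): [],   ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],        ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],   ("T'", "+"): [],   ("T'", ")"): [],   ("T'", "$"): [],
    ("F", "id"): ["id"],             ("F", "("): ["(", "E", ")"],
}
NON_TERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens, start="E"):
    stack = ["$", start]                       # parsing stack; top is the end of the list
    tokens = tokens + ["$"]
    i = 0
    while stack:
        top, lookahead = stack.pop(), tokens[i]
        if top not in NON_TERMINALS:           # terminal (or $): must match the input
            if top != lookahead:
                raise SyntaxError(f"expected {top}, got {lookahead}")
            i += 1
        else:
            body = TABLE.get((top, lookahead))
            if body is None:                   # an empty table entry means error
                raise SyntaxError(f"no entry in M[{top}, {lookahead}]")
            print(f"{top} -> {' '.join(body) or 'ε'}")
            stack.extend(reversed(body))       # leftmost symbol of the body ends up on top
    return "accepted"

print(predictive_parse(["id", "+", "id", "*", "id"]))

The stack is seeded with $ and the start symbol; a non-terminal on top is replaced by the body the table selects, and a terminal on top must match the lookahead.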
Introduction to bottom-up parsing
A bottom-up parser constructs a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
Example: id*id
The input is reduced step by step: id*id, then F*id, then T*id, then T*F, then T, and finally E; each step replaces a handle by the non-terminal that derives it.
Shift-reduce parser
The key decisions during bottom-up parsing are about
when to reduce and about what production to
apply
A reduction is a reverse of a step in a derivation
The goal of a bottom-up parser is to construct a
derivation in reverse:
E=>T=>T*F=>T*id=>F*id=>id*id
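For instance, a possible shift-reduce trace for id*id (the stack grows to the right), mirroring that reverse derivation:

Stack        Input      Action
$            id*id$     shift id
$ id         *id$       reduce by F -> id
$ F          *id$       reduce by T -> F
$ T          *id$       shift *
$ T *        id$        shift id
$ T * id     $          reduce by F -> id
$ T * F      $          reduce by T -> T*F
$ T          $          reduce by E -> T
$ E          $          accept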
LR Parsing
The most prevalent type of bottom-up parser.
The structure of an LR parser is similar to that of an LL(1) parser.
It applies to unambiguous grammars.
It uses a canonical collection of LR(0) items.
States of an LR parser
States represent set of items
An LR(0) item of G is a production of G with the dot at
some position of the body:
For A->XYZ we have following items
A->.XYZ
A->X.YZ
A->XY.Z
A->XYZ.
In a state containing A -> .XYZ we hope to see a string derivable from XYZ next on the input.
What about A -> X.YZ? It means we have already seen input derivable from X and now hope to see a string derivable from YZ.
Constructing canonical LR(0)
item sets
Augmented grammar:
G with the addition of a production S' -> S.
Closure of item sets:
If I is a set of items, closure(I) is the set of items constructed from I by the following rules:
1. Add every item in I to closure(I).
2. If A -> α.Bβ is in closure(I) and B -> γ is a production, add the item B -> .γ to closure(I); repeat until no more items can be added.
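A small sketch of closure (and the companion goto operation) in code (my own illustration; an item is represented as a (head, body, dot position) triple):

GRAMMAR = {                                    # augmented expression grammar
    "E'": [["E"]],
    "E":  [["E", "+", "T"], ["T"]],
    "T":  [["T", "*", "F"], ["F"]],
    "F":  [["(", "E", ")"], ["id"]],
}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in GRAMMAR:      # dot stands before a non-terminal B
                for production in GRAMMAR[body[dot]]:
                    item = (body[dot], tuple(production), 0)  # add the item B -> .γ
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then take the closure."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)

def show(items):
    for head, body, dot in sorted(items):
        print("  %s -> %s.%s" % (head, " ".join(body[:dot]), " ".join(body[dot:])))

I0 = closure({("E'", ("E",), 0)})
print("I0:"); show(I0)
print("goto(I0, T):"); show(goto(I0, "T"))     # this is the item set I2 of the example below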
Example
Grammar: E -> E+T | T,  T -> T*F | F,  F -> (E) | id, augmented with E' -> E.
Canonical collection of LR(0) item sets:
I0 = closure({E' -> .E}) = { E' -> .E, E -> .E+T, E -> .T, T -> .T*F, T -> .F, F -> .(E), F -> .id }
I1 = { E' -> E., E -> E.+T }
I2 = { E -> T., T -> T.*F }
I3 = { T -> F. }
I4 = { F -> (.E), E -> .E+T, E -> .T, T -> .T*F, T -> .F, F -> .(E), F -> .id }
I5 = { F -> id. }
I6 = { E -> E+.T, T -> .T*F, T -> .F, F -> .(E), F -> .id }
I7 = { T -> T*.F, F -> .(E), F -> .id }
I8 = { F -> (E.), E -> E.+T }
I9 = { E -> E+T., T -> T.*F }
I10 = { T -> T*F. }
I11 = { F -> (E). }
Transitions: from I0, E leads to I1, T to I2, F to I3, ( to I4 and id to I5; I1 goes to I6 on +; I2 goes to I7 on *; I4 goes to I8 on E (and to I2, I3, I4, I5 on T, F, (, id as from I0); I6 goes to I9 on T (and to I3, I4, I5 on F, (, id); I7 goes to I10 on F (and to I4, I5 on (, id); I8 goes to I11 on ) and to I6 on +; I9 goes to I7 on *. In I1, the parser accepts on $.
THANK YOU!!