Top-Down Parsing
6.035
Martin Rinard
Laboratory for Computer Science
Massachusetts Institute of Technology
Orientation
• Language specification
• Lexical structure – regular expressions
• Syntactic structure – grammar
• This Lecture - recursive descent parsers
• Code parser as set of mutually recursive procedures
• Structure of program matches structure of grammar
Starting Point
• Sentential Form: Start
• Applied Production: Start → Expr
• New Sentential Form: Expr
• The parser tracks its current position in the parse tree
Parsing Example
• Remaining Input: 2-2*2
• Sentential Form: Expr - Term
• Parse Tree: Start[ Expr[ Expr - Term ] ]
Parsing Example
• Match Input Token!
• Remaining Input: -2*2
• Sentential Form: 2 - Term
• Parse Tree: Start[ Expr[ Expr[ Term[ Int 2 ] ] - Term ] ]
Parsing Example
• Match Input Token!
• Remaining Input: 2*2
• Sentential Form: 2 - Term
• Parse Tree: Start[ Expr[ Expr[ Term[ Int 2 ] ] - Term ] ]
Parsing Example
• Remaining Input: 2*2
• Sentential Form: 2 - Term * Int
• Applied Production: Term → Term * Int
• Parse Tree: Start[ Expr[ Expr[ Term[ Int 2 ] ] - Term[ Term * Int ] ] ]
Parsing Example
• Remaining Input: 2*2
• Sentential Form: 2 - Int * Int
• Applied Production: Term → Int
• Parse Tree: Start[ Expr[ Expr[ Term[ Int 2 ] ] - Term[ Term[ Int ] * Int ] ] ]
Parsing Example
• Match Input Token!
• Remaining Input: 2*2
• Sentential Form: 2 - 2 * Int
• Parse Tree: Start[ Expr[ Expr[ Term[ Int 2 ] ] - Term[ Term[ Int 2 ] * Int ] ] ]
Parsing Example
• Match Input Token!
• Remaining Input: *2
• Sentential Form: 2 - 2 * Int
• Parse Tree: Start[ Expr[ Expr[ Term[ Int 2 ] ] - Term[ Term[ Int 2 ] * Int ] ] ]
Parsing Example
• Match Input Token!
• Remaining Input: 2
• Sentential Form: 2 - 2 * Int
• Parse Tree: Start[ Expr[ Expr[ Term[ Int 2 ] ] - Term[ Term[ Int 2 ] * Int ] ] ]
Parsing Example
• Parse Complete!
• Remaining Input: (none)
• Sentential Form: 2 - 2 * 2
• Parse Tree: Start[ Expr[ Expr[ Term[ Int 2 ] ] - Term[ Term[ Int 2 ] * Int 2 ] ] ]
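The steps of the parsing example can be replayed mechanically. The sketch below is only an illustration: the function name leftmost_derivation and the PRODUCTIONS list are hypothetical, and the sequence of productions is supplied by hand (an oracle), which is exactly the choice a real parser has to make on its own. It expands the leftmost nonterminal or matches the next terminal, recording the sentential form after every step.

```python
NONTERMINALS = {"Start", "Expr", "Term"}

# Productions in the order the example applies them (chosen by hand, not by the parser).
PRODUCTIONS = [
    ("Start", ["Expr"]),
    ("Expr", ["Expr", "-", "Term"]),
    ("Expr", ["Term"]),
    ("Term", ["Int"]),
    ("Term", ["Term", "*", "Int"]),
    ("Term", ["Int"]),
]


def leftmost_derivation(productions, tokens):
    """Return the list of sentential forms produced while parsing tokens."""
    form, cursor, pos = ["Start"], 0, 0
    pending = list(productions)
    history = [" ".join(form)]
    while cursor < len(form):
        symbol = form[cursor]
        if symbol in NONTERMINALS:
            # Apply the next production to the current (leftmost) nonterminal.
            if not pending or pending[0][0] != symbol:
                raise SyntaxError(f"no production supplied for {symbol}")
            _, rhs = pending.pop(0)
            form[cursor:cursor + 1] = rhs
        else:
            # Match a terminal against the next input token.
            if pos >= len(tokens):
                raise SyntaxError("ran out of input")
            token = tokens[pos]
            if not (symbol == token or (symbol == "Int" and token.isdigit())):
                raise SyntaxError(f"cannot match {symbol} with {token}")
            form[cursor] = token
            cursor += 1
            pos += 1
        history.append(" ".join(form))
    if pos != len(tokens):
        raise SyntaxError("input left over")
    return history


history = leftmost_derivation(PRODUCTIONS, list("2-2*2"))
```

Printing history line by line reproduces the sentential forms from the slides, starting at Start and ending at 2 - 2 * 2.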
Summary
Backtracking Example
• Remaining Input: 2-2*2
• Sentential Form: Expr
• Applied Production: Start → Expr
• Parse Tree: Start[ Expr ]
Backtracking Example
• Remaining Input: 2-2*2
• Sentential Form: Expr + Term
• Applied Production: Expr → Expr + Term
• Parse Tree: Start[ Expr[ Expr + Term ] ]
Backtracking Example
• Remaining Input: 2-2*2
• Sentential Form: Term + Term
• Applied Production: Expr → Term
• Parse Tree: Start[ Expr[ Expr[ Term ] + Term ] ]
Backtracking Example
• Match Input Token!
• Remaining Input: 2-2*2
• Sentential Form: Int + Term
• Applied Production: Term → Int
• Parse Tree: Start[ Expr[ Expr[ Term[ Int ] ] + Term ] ]
Backtracking Example
• Can’t Match Input Token!
• Remaining Input: -2*2
• Sentential Form: 2 + Term (the next input token - does not match +)
• Applied Production: Term → Int
• Parse Tree: Start[ Expr[ Expr[ Term[ Int 2 ] ] + Term ] ]
Backtracking Example
• So Backtrack!
• Remaining Input: 2-2*2
• Sentential Form: Expr
• Applied Production: Start → Expr
• Parse Tree: Start[ Expr ]
Backtracking Example
• Remaining Input: 2-2*2
• Sentential Form: Expr - Term
• Applied Production: Expr → Expr - Term
• Parse Tree: Start[ Expr[ Expr - Term ] ]
Backtracking Example
• Remaining Input: 2-2*2
• Sentential Form: Term - Term
• Applied Production: Expr → Term
• Parse Tree: Start[ Expr[ Expr[ Term ] - Term ] ]
Backtracking Example
• Remaining Input: 2-2*2
• Sentential Form: Int - Term
• Applied Production: Term → Int
• Parse Tree: Start[ Expr[ Expr[ Term[ Int ] ] - Term ] ]
Backtracking Example
• Match Input Token!
• Remaining Input: -2*2
• Sentential Form: 2 - Term
• Parse Tree: Start[ Expr[ Expr[ Term[ Int 2 ] ] - Term ] ]
Backtracking Example
• Match Input Token!
• Remaining Input: 2*2
• Sentential Form: 2 - Term
• Parse Tree: Start[ Expr[ Expr[ Term[ Int 2 ] ] - Term ] ]
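Backtracking can be sketched as a search over sentential forms: try each production for the leftmost nonterminal in order, and fall through to the next one when a terminal fails to match. The code below is a rough sketch, not the lecture's parser. It assumes a right-recursive variant of the expression grammar (Expr → Term + Expr | Term - Expr | Term, Term → Int * Term | Int), because a naive search over a left-recursive grammar does not terminate. The names GRAMMAR, parse, matches and accepts are hypothetical.

```python
# Assumed right-recursive variant of the expression grammar (not the slides' grammar).
GRAMMAR = {
    "Expr": [["Term", "+", "Expr"], ["Term", "-", "Expr"], ["Term"]],
    "Term": [["Int", "*", "Term"], ["Int"]],
}


def matches(terminal, token):
    """Int matches any digit token; other terminals match themselves."""
    return token.isdigit() if terminal == "Int" else token == terminal


def parse(symbols, tokens, pos):
    """Yield every input position reachable after deriving symbols from tokens[pos:]."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Try each production in turn; moving to the next one is the backtrack.
        for rhs in GRAMMAR[head]:
            yield from parse(rhs + rest, tokens, pos)
    elif pos < len(tokens) and matches(head, tokens[pos]):
        yield from parse(rest, tokens, pos + 1)
    # Otherwise the terminal cannot match: yield nothing, so the caller tries another choice.


def accepts(tokens):
    """True if some sequence of choices consumes the whole input."""
    return any(end == len(tokens) for end in parse(["Expr"], tokens, 0))
```

For the input 2-2*2, the search first tries the + production, fails to match - against +, and only then tries the - production, mirroring the slides.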
Left Recursion + Top-Down Parsing
= Infinite Loop
• Example Production: Term → Term*Num
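• Potential parsing steps: Term is expanded to Term * Num, the leftmost Term is expanded to Term * Num again, and so on, without ever consuming an input token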
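The loop is easy to reproduce. In the sketch below (hypothetical function names, written only for illustration), the procedure for Term follows the production Term → Term * Num literally, so its first action is to call itself at the same input position. Python reports the runaway recursion as a RecursionError.

```python
def parse_term(tokens, pos):
    """Naive procedure for Term -> Term * Num."""
    # The first symbol on the right-hand side is Term itself, so we recurse
    # before consuming any input: pos never advances.
    pos = parse_term(tokens, pos)
    if pos + 1 < len(tokens) and tokens[pos] == "*" and tokens[pos + 1].isdigit():
        return pos + 2
    raise SyntaxError("expected * Num")


def hits_recursion_limit(tokens):
    """Return True if parsing overflows the call stack."""
    try:
        parse_term(tokens, 0)
    except RecursionError:
        return True
    return False
```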
General Search Issues
• Three components
• Search space (parse trees)
• Search algorithm (parsing algorithm)
• Goal to find (parse tree for input program)
• Would like to (but can’t always) ensure that the search
• Finds the goal (hopefully quickly) if it exists
• Terminates if it does not
• Handled in various ways in various contexts
• Finite search space makes it easy
• Exploration strategies for infinite search space
• Sometimes one goal more important (model checking)
• For parsing, hack grammar to remove left recursion
Eliminating Left Recursion
• Start with productions of form
• A → A α
• A → β
• α, β sequences of terminals and nonterminals that do not start with A
• Repeated application of A → A α builds a left-leaning parse tree: each A has children A and α, and the bottom-most A derives β, so the tree yields β α ... α
Eliminating Left Recursion
• Replacement productions
– Old: A → A α and A → β
– New: A → β R, R → α R, R → ε (R is a new nonterminal)
• Old Parse Tree (for β α α): A[ A[ A[ β ] α ] α ]
• New Parse Tree (for β α α): A[ β R[ α R[ α R[ ε ] ] ] ]
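The replacement can be written as a small grammar transformation. This is a minimal sketch under stated assumptions: productions are lists of symbols, the empty list stands for ε, only immediate left recursion is handled, and the function name eliminate_left_recursion is hypothetical. A production of the form A → A (empty α) is not handled.

```python
def eliminate_left_recursion(nonterminal, productions, new_name):
    """Rewrite A -> A alpha | beta as A -> beta R, R -> alpha R | epsilon.

    productions is a list of right-hand sides, each a list of symbols.
    The empty list [] stands for epsilon.
    """
    # alpha parts: what follows the leading A in each left-recursive production.
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    # beta parts: productions that do not start with A.
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not alphas:
        return {nonterminal: productions}  # nothing to eliminate
    return {
        nonterminal: [beta + [new_name] for beta in betas],
        new_name: [alpha + [new_name] for alpha in alphas] + [[]],
    }


hacked = eliminate_left_recursion(
    "Term",
    [["Term", "*", "Int"], ["Term", "/", "Int"], ["Int"]],
    "Term'",
)
```

Applied to Term → Term * Int | Term / Int | Int, this yields Term → Int Term’ and Term’ → * Int Term’ | / Int Term’ | ε.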
Hacked Grammar
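• Parse tree with original grammar (Int * Int * Int): Term[ Term[ Term[ Int ] * Int ] * Int ]
• Parse tree with hacked grammar (Int * Int * Int): Term[ Int Term’[ * Int Term’[ * Int Term’[ ε ] ] ] ]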
Predictive Parsing
• Alternative to backtracking
• Useful for programming languages, which can be designed to make parsing easier
• Basic idea
• Look ahead in input stream
• Decide which production to apply based on
next tokens in input stream
• We will use one token of lookahead
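A minimal sketch of the one-token-lookahead idea, assuming the hacked grammar Term → Int Term’, Term’ → * Int Term’ | / Int Term’ | ε. The recognizer keeps the unmatched part of the sentential form on a stack and picks the Term’ production by looking only at the next token. Choosing ε whenever the lookahead is neither * nor / is a simplification for this tiny grammar. The function name recognize is hypothetical.

```python
def recognize(tokens):
    """Return True if tokens form a Term, deciding each step from one lookahead token."""
    stack = ["Term"]  # unmatched part of the sentential form, top at the end
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos] if pos < len(tokens) else None
        if top == "Term":
            # Only one production: Term -> Int Term'
            stack.extend(reversed(["Int", "Term'"]))
        elif top == "Term'":
            if look in ("*", "/"):
                # Lookahead selects Term' -> * Int Term' or Term' -> / Int Term'
                stack.extend(reversed([look, "Int", "Term'"]))
            # Any other lookahead selects Term' -> epsilon: push nothing.
        elif top == "Int":
            if look is None or not look.isdigit():
                return False
            pos += 1
        else:
            # Operator terminal: must equal the next token.
            if look != top:
                return False
            pos += 1
    return pos == len(tokens)
```

No choice is ever undone: each decision is final once the lookahead token has been inspected.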
Predictive Parsing Example Grammar
• Notation
• T is a terminal, NT is a nonterminal, S is a terminal or nonterminal, and β is a sequence of terminals or nonterminals
Rules + Request Generate System of Subset Inclusion Constraints
Grammar
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Request: What is First(Term’)?
Rules
1) T ∈ First(T)
2) First(S) ⊆ First(S β)
3) NT derives ε implies First(β) ⊆ First(NT β)
4) NT → S β implies First(S β) ⊆ First(NT)
Constraints
First(* Int Term’) ⊆ First(Term’)
First(/ Int Term’) ⊆ First(Term’)
First(*) ⊆ First(* Int Term’)
First(/) ⊆ First(/ Int Term’)
* ∈ First(*)
/ ∈ First(/)
Constraint Propagation Algorithm
Constraints
First(* Int Term’) ⊆ First(Term’)
First(/ Int Term’) ⊆ First(Term’)
First(*) ⊆ First(* Int Term’)
First(/) ⊆ First(/ Int Term’)
* ∈ First(*)
/ ∈ First(/)
Solution
First(Term’) = {}
First(* Int Term’) = {}
First(/ Int Term’) = {}
First(*) = {*}
First(/) = {/}
Initialize Sets to {}
Propagate Constraints Until Fixed Point
Constraint Propagation Algorithm
Grammar
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Constraints
First(* Int Term’) ⊆ First(Term’)
First(/ Int Term’) ⊆ First(Term’)
First(*) ⊆ First(* Int Term’)
First(/) ⊆ First(/ Int Term’)
* ∈ First(*)
/ ∈ First(/)
Solution
First(Term’) = {}
First(* Int Term’) = {}
First(/ Int Term’) = {}
First(*) = {*}
First(/) = {/}
Constraint Propagation Algorithm
Grammar
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Constraints
First(* Int Term’) ⊆ First(Term’)
First(/ Int Term’) ⊆ First(Term’)
First(*) ⊆ First(* Int Term’)
First(/) ⊆ First(/ Int Term’)
* ∈ First(*)
/ ∈ First(/)
Solution
First(Term’) = {}
First(* Int Term’) = {*}
First(/ Int Term’) = {/}
First(*) = {*}
First(/) = {/}
Constraint Propagation Algorithm
Grammar
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Constraints
First(* Int Term’) ⊆ First(Term’)
First(/ Int Term’) ⊆ First(Term’)
First(*) ⊆ First(* Int Term’)
First(/) ⊆ First(/ Int Term’)
* ∈ First(*)
/ ∈ First(/)
Solution
First(Term’) = {*,/}
First(* Int Term’) = {*}
First(/ Int Term’) = {/}
First(*) = {*}
First(/) = {/}
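The propagation can be sketched as a loop that keeps adding elements to First sets until nothing changes. The code below is an illustrative sketch rather than the lecture's algorithm verbatim: it tracks First only for nonterminals, treats First of a terminal as the terminal itself (rule 1), and handles rule 3 with a separate nullable set. The name first_sets and the grammar encoding (lists of symbols, [] for ε) are assumptions.

```python
def first_sets(grammar, terminals):
    """Compute First for each nonterminal by iterating to a fixed point.

    grammar maps a nonterminal to a list of right-hand sides (lists of symbols);
    the empty list [] stands for epsilon.
    """
    first = {nt: set() for nt in grammar}  # initialize sets to {}
    nullable = set()                       # nonterminals that derive epsilon
    changed = True
    while changed:                         # propagate until fixed point
        changed = False
        for nt, right_hand_sides in grammar.items():
            for rhs in right_hand_sides:
                if nt not in nullable and all(s in nullable for s in rhs):
                    nullable.add(nt)
                    changed = True
                for sym in rhs:
                    # Rule 1 for terminals, subset constraint for nonterminals.
                    addition = {sym} if sym in terminals else first[sym]
                    if not addition <= first[nt]:
                        first[nt] |= addition
                        changed = True
                    if sym not in nullable:
                        break              # later symbols cannot contribute
    return first


result = first_sets(
    {
        "Term": [["Int", "Term'"]],
        "Term'": [["*", "Int", "Term'"], ["/", "Int", "Term'"], []],
    },
    terminals={"*", "/", "Int"},
)
```

For the example grammar the loop stabilizes with First(Term’) = {*, /}, matching the solution on the slide.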
Building A Parse Tree
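• Parse tree from hacked grammar for 2*3*4: Term[ Int 2 Term’[ * Int 3 Term’[ * Int 4 Term’[ ε ] ] ] ]
• Tree in the shape of the original grammar for 2*3*4: Term[ Term[ Int 2 * Int 3 ] * Int 4 ]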
Why Use Hand-Coded Parser?
• Why not use parser generator?
• What do you do if your parser doesn’t work?
• Recursive descent parser – write more code
• Parser generator
• Hack grammar
• But if parser generator doesn’t work, nothing you can do
• If you have complicated grammar
• Increase chance of going outside comfort zone of parser generator
• Your parser may NEVER work
Bottom Line
• Recursive descent parser properties
• Probably more work
• But less risk of a disaster - you can almost always make a recursive descent parser work
• May have easier time dealing with resulting code
• Single language system
• No need to deal with potentially flaky parser generator
• No integration issues with automatically generated code
• If your parser development time is small compared to rest of project, or you have a really complicated language, use hand-coded recursive descent parser
Summary
• Top-Down Parsing
• Use Lookahead to Avoid Backtracking
• Parser is
• Hand-Coded
• Set of Mutually Recursive Procedures
Direct Generation of Abstract Tree
• TermPrime builds an incomplete tree
• Missing leftmost child
• Returns root and incomplete node
• (root, incomplete) = TermPrime()
• Called with token = *
• Remaining tokens = 3 * 4
Code for Term
Input to parse: 2*3*4
Term()
  if (token = Int n)
    leftmostInt = token; token = NextToken();
    (root, incomplete) = TermPrime();
    if (root == NULL) return leftmostInt;
    incomplete.leftChild = leftmostInt;
    return root;
  else throw SyntaxError
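The pseudocode for Term can be turned into a runnable sketch. The version below is an illustration under assumptions: the Parser, Int and BinOp classes are hypothetical, tokens are Python ints and operator strings, and the body of term_prime is a guess at a procedure that satisfies the contract stated earlier (build a tree missing its leftmost child, return the root and the incomplete node). The slides do not show TermPrime's code.

```python
from dataclasses import dataclass


@dataclass
class Int:
    value: int


@dataclass
class BinOp:
    op: str
    left: object   # None while the node is still incomplete
    right: Int


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next_token(self):
        self.pos += 1

    def term(self):
        """Term -> Int Term', returning a left-associative tree."""
        if isinstance(self.token, int):
            leftmost = Int(self.token)
            self.next_token()
            root, incomplete = self.term_prime()
            if root is None:
                return leftmost
            incomplete.left = leftmost  # fill in the missing leftmost child
            return root
        raise SyntaxError("expected Int")

    def term_prime(self):
        """Term' -> * Int Term' | / Int Term' | epsilon (assumed implementation)."""
        if self.token in ("*", "/"):
            op = self.token
            self.next_token()
            if not isinstance(self.token, int):
                raise SyntaxError("expected Int after operator")
            node = BinOp(op, None, Int(self.token))
            self.next_token()
            root, incomplete = self.term_prime()
            if root is None:
                return node, node       # this node is both root and incomplete
            incomplete.left = node      # hang this node below the later operators
            return root, node
        return None, None               # Term' -> epsilon


tree = Parser([2, "*", 3, "*", 4]).term()
```

For 2*3*4 the result groups as (2*3)*4, the left-associative shape of the original grammar, even though the parser follows the hacked grammar.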
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.