
Top-Down Parsing

This document discusses top-down parsing using recursive descent parsers. It describes how a parser can be implemented as a set of mutually recursive procedures that match the structure of the grammar. The basic approach is to start with the start symbol and build a leftmost derivation by choosing productions and applying them if the leftmost symbol matches. Backtracking is used to try alternative productions if the current attempt fails.

Uploaded by

Ahmad Abba

MIT 6.035
Top-Down Parsing

Martin Rinard
Laboratory for Computer Science
Massachusetts Institute of Technology
Orientation

• Language specification
• Lexical structure – regular expressions
• Syntactic structure – grammar
• This Lecture - recursive descent parsers
• Code parser as set of mutually recursive procedures
• Structure of program matches structure of grammar
Starting Point

• Assume lexical analysis has produced a sequence of tokens
• Each token has a type and value
• Types correspond to terminals
• Values to contents of token read in
• Examples
• Int 549 – integer token with value 549 read in
• if - if keyword, no need for a value
• AddOp + - add operator, value +
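The token stream assumed above can be sketched in Python as follows. This is only an illustration: the `Token` tuple, the `MulOp` class, and the `tokenize` helper are names chosen for this sketch and do not come from the lecture.

```python
import re
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    type: str              # corresponds to a terminal of the grammar
    value: Optional[str]   # contents read in, or None if the type says it all


# Hypothetical token classes for this sketch (only Int, if, AddOp are in the slides).
TOKEN_SPEC = [
    ("Int", r"[0-9]+"),
    ("If", r"if\b"),
    ("AddOp", r"[+-]"),
    ("MulOp", r"[*/]"),
    ("Skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text: str) -> List[Token]:
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind = m.lastgroup
        if kind == "If":
            tokens.append(Token("if", None))      # keyword: no value needed
        elif kind != "Skip":
            tokens.append(Token(kind, m.group()))
        pos = m.end()
    return tokens
```

For example, `tokenize("549")` yields a single `Int` token whose value is the text `549`.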
Basic Approach

• Start with Start symbol
• Build a leftmost derivation
• If leftmost symbol is nonterminal, choose a production and apply it
• If leftmost symbol is terminal, match against input
• If all terminals match, have found a parse!
• Key: find correct productions for nonterminals
Graphical Illustration of Leftmost Derivation
Sentential Form: NT1 T1 T2 T3 NT2 NT3
[Figure: the production is applied at NT1, the leftmost nonterminal, and not at NT2 or NT3]
Grammar for Parsing Example
Start → Expr
Expr → Expr + Term
Expr → Expr - Term
Expr → Term
Term → Term * Int
Term → Term / Int
Term → Int
• Set of tokens is { +, -, *, /, Int }, where Int = [0-9][0-9]*
• For convenience, may represent each Int n token by n
Parsing Example
Input is 2-2*2. Each step lists the sentential form, the remaining input, and the action taken.
1) Start | 2-2*2 | initial state
2) Expr | 2-2*2 | apply Start → Expr
3) Expr - Term | 2-2*2 | apply Expr → Expr - Term
4) Term - Term | 2-2*2 | apply Expr → Term
5) Int - Term | 2-2*2 | apply Term → Int
6) 2 - Term | -2*2 | match input token 2
7) 2 - Term | 2*2 | match input token -
8) 2 - Term * Int | 2*2 | apply Term → Term * Int
9) 2 - Int * Int | 2*2 | apply Term → Int
10) 2 - 2 * Int | *2 | match input token 2
11) 2 - 2 * Int | 2 | match input token *
12) 2 - 2 * 2 | (empty) | match input token 2
13) Parse complete!
[Figure: the parse tree grows with each applied production, with a marker for the current position in the parse tree. Final tree: Start → Expr; Expr → Expr - Term; the left Expr → Term → Int 2; the right Term → Term * Int 2 with its inner Term → Int 2]
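The derivation traced in this example can be replayed mechanically. The sketch below applies a fixed list of productions, always at the leftmost nonterminal, and records each sentential form. The helper names `apply_leftmost` and `replay` are made up for this illustration, and `Int` stands for any integer token.

```python
NONTERMINALS = {"Start", "Expr", "Term"}


def apply_leftmost(form, lhs, rhs):
    """Replace the leftmost nonterminal of `form` (which must be `lhs`) by `rhs`."""
    for i, symbol in enumerate(form):
        if symbol in NONTERMINALS:
            if symbol != lhs:
                raise ValueError(f"leftmost nonterminal is {symbol}, not {lhs}")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to expand")


# Productions in the order they are applied for the input 2-2*2.
DERIVATION = [
    ("Start", ["Expr"]),
    ("Expr", ["Expr", "-", "Term"]),
    ("Expr", ["Term"]),
    ("Term", ["Int"]),
    ("Term", ["Term", "*", "Int"]),
    ("Term", ["Int"]),
]


def replay(derivation):
    forms = [["Start"]]
    for lhs, rhs in derivation:
        forms.append(apply_leftmost(forms[-1], lhs, rhs))
    return forms


FORMS = replay(DERIVATION)
for form in FORMS:
    print(" ".join(form))
```

The last form printed is `Int - Int * Int`, the token-type shape of 2-2*2. The sketch does not decide which production to apply; that choice is the hard part of parsing.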
Summary

• Three Actions (Mechanisms)
• Apply production to expand current nonterminal in parse tree
• Match current terminal (consuming input)
• Accept the parse as correct
• Parser generates preorder traversal of parse tree
• visit parents before children
• visit siblings from left to right
Policy Problem
• Which production to use for each nonterminal?
• Classical Separation of Policy and Mechanism
• One Approach: Backtracking
• Treat it as a search problem
• At each choice point, try next alternative
• If it is clear that current try fails, go back to
previous choice and try something different
• General technique for searching
• Used a lot in classical AI and natural language processing (parsing, speech recognition)
Backtracking Example
Input is 2-2*2. Each step lists the sentential form, the remaining input, and the action taken.
1) Start | 2-2*2 | initial state
2) Expr | 2-2*2 | apply Start → Expr
3) Expr + Term | 2-2*2 | apply Expr → Expr + Term
4) Term + Term | 2-2*2 | apply Expr → Term
5) Int + Term | 2-2*2 | apply Term → Int
6) 2 + Term | -2*2 | match input token 2
7) 2 + Term | -2*2 | can’t match input token: the sentential form needs +, the input has -
8) Expr | 2-2*2 | so backtrack! Return to the choice made for Expr
9) Expr - Term | 2-2*2 | apply Expr → Expr - Term
10) Term - Term | 2-2*2 | apply Expr → Term
11) Int - Term | 2-2*2 | apply Term → Int
12) 2 - Term | -2*2 | match input token 2
13) 2 - Term | 2*2 | match input token -
[Figure: parse tree after each step; the subtree built under Expr → Expr + Term is discarded when the parser backtracks]
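Backtracking as a search can be sketched with Python generators: each alternative of a nonterminal is a choice point, and a dead end simply ends the generator so the caller tries the next alternative. One loud assumption: the grammar below is a right-recursive variant written for this sketch, not the lecture's grammar, because a naive backtracking search on a left-recursive grammar would not terminate. It accepts the same token sequences but groups operators differently.

```python
# Hypothetical right-recursive grammar used only for this sketch.
GRAMMAR = {
    "Start": [["Expr"]],
    "Expr": [["Term", "+", "Expr"], ["Term", "-", "Expr"], ["Term"]],
    "Term": [["Int", "*", "Term"], ["Int", "/", "Term"], ["Int"]],
}


def derive(symbols, tokens, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Choice point: try each production for the leftmost nonterminal.
        for production in GRAMMAR[head]:
            yield from derive(production + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == head:
        # Leftmost symbol is a terminal that matches: consume it.
        yield from derive(rest, tokens, pos + 1)
    # Otherwise this attempt fails; returning lets the caller backtrack.


def accepts(tokens):
    return any(end == len(tokens) for end in derive(["Start"], tokens, 0))
```

With the token types of 2-2*2, `accepts(["Int", "-", "Int", "*", "Int"])` first tries the `+` alternative, fails at the `-`, and then succeeds with the `-` alternative, mirroring the trace above.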
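Left Recursion + Top-Down Parsing = Infinite Loop
• Example Production: Term → Term*Num
• Potential parsing steps:
[Figure: three successive parse trees; each step expands the leftmost Term into Term * Num again, so the tree keeps growing down its left edge and no input is consumed]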
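A small Python sketch makes the loop concrete. The procedure below is a direct, naive transcription of Term → Term * Num (the function names are invented for this illustration). It calls itself before consuming any input, so Python eventually raises `RecursionError` instead of looping forever.

```python
def term(tokens, pos):
    # Term -> Term * Num: the first symbol on the right-hand side is Term
    # itself, so the procedure recurses at the same input position.
    end = term(tokens, pos)
    if end + 1 < len(tokens) and tokens[end] == "*" and tokens[end + 1] == "Num":
        return end + 2
    raise SyntaxError("expected * Num")


def recursion_never_bottoms_out(tokens):
    try:
        term(tokens, 0)
    except RecursionError:
        return True
    return False


print(recursion_never_bottoms_out(["Num", "*", "Num"]))
```

The result does not depend on the input: the recursion happens before the tokens are ever inspected.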
General Search Issues
• Three components
• Search space (parse trees)
• Search algorithm (parsing algorithm)
• Goal to find (parse tree for input program)
• Would like to (but can’t always) ensure that
• Find goal (hopefully quickly) if it exists
• Search terminates if it does not
• Handled in various ways in various contexts
• Finite search space makes it easy
• Exploration strategies for infinite search space
• Sometimes one goal more important (model checking)
• For parsing, hack grammar to remove left recursion
Eliminating Left Recursion
• Start with productions of form
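• A → A α
• A → β
• α, β sequences of terminals and nonterminals that do not start with A
• Repeated application of A → A α builds parse tree like this:
[Figure: left-leaning parse tree; the root A has children A and α, each inner A again has children A and α, and the chain ends in β at the bottom left]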
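Eliminating Left Recursion
• Replacement productions
– Old: A → A α, A → β
– New: A → β R, R → α R, R → ε (R is a new nonterminal)
[Figure: old parse tree (left-leaning: A → A α applied twice, then A → β) next to new parse tree (right-leaning: A → β R, then R → α R twice, then R → ε); both derive β α α]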
Hacked Grammar
Original Grammar Fragment
Term → Term * Int
Term → Term / Int
Term → Int
New Grammar Fragment
Term → Int Term’
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
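The rewrite from A → A α, A → β to A → β R, R → α R, R → ε can be sketched as a small Python function. The representation is an assumption of this sketch: a production is a list of symbols, the empty list stands for ε, and `eliminate_left_recursion` is a hypothetical helper that handles only direct left recursion.

```python
def eliminate_left_recursion(nt, productions, new_nt):
    """Rewrite nt -> nt α | β as nt -> β new_nt, new_nt -> α new_nt | ε."""
    alphas = [p[1:] for p in productions if p and p[0] == nt]   # the α parts
    betas = [p for p in productions if not p or p[0] != nt]     # the β parts
    if not alphas:
        return {nt: productions}   # nothing to do: no direct left recursion
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],  # [] is ε
    }


TERM = [["Term", "*", "Int"], ["Term", "/", "Int"], ["Int"]]
NEW_GRAMMAR = eliminate_left_recursion("Term", TERM, "Term'")
for lhs, alternatives in NEW_GRAMMAR.items():
    for rhs in alternatives:
        print(lhs, "->", " ".join(rhs) or "ε")
```

Applied to the Term fragment, this prints the four productions of the new grammar fragment.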
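Parse Tree Comparisons
[Figure: parse trees for Int * Int * Int. Original grammar: left-leaning tree, Term → Term * Int applied at the root and again at its left child, ending in Term → Int. New grammar: right-leaning tree, Term → Int Term’, then Term’ → * Int Term’ twice, ending in Term’ → ε]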
Eliminating Left Recursion

• Changes search space exploration algorithm
• Eliminates direct infinite recursion
• But grammar less intuitive
• Sets things up for predictive parsing
Predictive Parsing

• Alternative to backtracking
• Useful for programming languages, which can be designed to make parsing easier
• Basic idea
• Look ahead in input stream
• Decide which production to apply based on
next tokens in input stream
• We will use one token of lookahead
Predictive Parsing Example Grammar
Start → Expr
Expr → Term Expr’
Expr’ → + Term Expr’
Expr’ → - Term Expr’
Expr’ → ε
Term → Int Term’
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Choice Points
• Assume Term’ is current position in parse tree
• Have three possible productions to apply
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
• Use next token to decide
• If next token is *, apply Term’ → * Int Term’
• If next token is /, apply Term’ → / Int Term’
• Otherwise, apply Term’ → ε
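The decision rule for Term’ amounts to a lookup on the next token. A minimal Python sketch, where the table and function names are invented for illustration and the empty list stands for ε:

```python
# Maps the lookahead token to the right-hand side chosen for Term'.
TERM_PRIME_TABLE = {
    "*": ["*", "Int", "Term'"],
    "/": ["/", "Int", "Term'"],
}


def choose_term_prime(next_token):
    # Any other token (including end of input) selects Term' -> ε.
    return TERM_PRIME_TABLE.get(next_token, [])


print(choose_term_prime("*"))
print(choose_term_prime("+"))
```

One token of lookahead is enough here because the two non-empty alternatives start with different terminals.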
Predictive Parsing + Hand Coding =
Recursive Descent Parser
• One procedure per nonterminal NT
• Productions NT → β1 , …, NT → βn
• Procedure examines the current input symbol T to determine which production to apply
• If T∈First(βk)
• Apply production k
• Consume terminals in βk (check for correct terminal)
• Recursively call procedures for nonterminals in βk
• Current input symbol stored in global variable token
• Procedures return
• true if parse succeeds
• false if parse fails
Example
Grammar
Term → Int Term’
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Code
Boolean Term()
  if (token = Int n)
    token = NextToken(); return(TermPrime())
  else return(false)
Boolean TermPrime()
  if (token = *)
    token = NextToken();
    if (token = Int n)
      token = NextToken(); return(TermPrime())
    else return(false)
  else if (token = /)
    token = NextToken();
    if (token = Int n)
      token = NextToken(); return(TermPrime())
    else return(false)
  else return(true)
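A runnable Python rendering of these two procedures might look like the sketch below. Some choices are assumptions of the sketch rather than part of the lecture: tokens are plain strings ("Int", "*", "/"), the current token lives in an object instead of a global variable, an "EOF" marker is appended to the input, and `parse_term` additionally checks that all input was consumed.

```python
class TermParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["EOF"]   # sentinel so lookahead never runs out
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]           # current input symbol

    def next_token(self):
        self.pos += 1

    def term(self):
        # Term -> Int Term'
        if self.token == "Int":
            self.next_token()
            return self.term_prime()
        return False

    def term_prime(self):
        # Term' -> * Int Term' | / Int Term' | ε
        if self.token in ("*", "/"):
            self.next_token()
            if self.token == "Int":
                self.next_token()
                return self.term_prime()
            return False
        return True                             # ε: consume nothing


def parse_term(tokens):
    parser = TermParser(tokens)
    return parser.term() and parser.token == "EOF"
```

For instance, `parse_term(["Int", "*", "Int", "/", "Int"])` succeeds, while `parse_term(["Int", "*"])` fails because no Int follows the operator.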
Multiple Productions With Same
Prefix in RHS
• Example Grammar
NT → if then
NT → if then else
• Assume NT is current position in parse tree, and if is the next token
• Unclear which production to apply
• Multiple k such that T∈First(βk)
• if ∈ First(if then)
• if ∈ First(if then else)
Solution: Left Factor the Grammar

• New Grammar Factors Common Prefix Into Single Production
NT → if then NT’
NT’ → else
NT’ → ε
• No choice when next token is if!
• All choices have been unified in one production.
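Left factoring can be sketched in Python as pulling out the longest prefix shared by the alternatives. The sketch is deliberately limited and its names are hypothetical: `left_factor` only handles the case where all productions of the nonterminal share the prefix, productions are lists of symbols, and the empty list stands for ε.

```python
def left_factor(nt, productions, new_nt):
    """Factor the longest prefix common to all productions of nt."""
    prefix = []
    for symbols in zip(*productions):           # walk the alternatives in lockstep
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    if not prefix or len(productions) < 2:
        return {nt: productions}                # nothing to factor
    return {
        nt: [prefix + [new_nt]],
        new_nt: [p[len(prefix):] for p in productions],   # the differing tails
    }


FACTORED = left_factor("NT", [["if", "then"], ["if", "then", "else"]], "NT'")
for lhs, alternatives in FACTORED.items():
    for rhs in alternatives:
        print(lhs, "->", " ".join(rhs) or "ε")
```

For the example grammar this produces NT → if then NT’, NT’ → ε and NT’ → else.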
Nonterminals

• What about productions with nonterminals?
NT → NT1 α1
NT → NT2 α2
• Must choose based on possible first terminals that NT1 and NT2 can generate
• What if NT1 or NT2 can generate ε?
• Must choose based on α1 and α2
NT derives ε
• Two rules
• NT → ε implies NT derives ε
• NT → NT1 ... NTn and for all 1≤i≤n NTi derives ε implies NT derives ε
Fixed Point Algorithm for Derives ε

for all nonterminals NT
  set NT derives ε to be false
for all productions of the form NT → ε
  set NT derives ε to be true
while (some NT derives ε changed in last iteration)
  for all productions of the form NT → NT1 ... NTn
    if (for all 1≤i≤n NTi derives ε)
      set NT derives ε to be true
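The fixed point algorithm translates almost line for line into Python. In this sketch a grammar is a dict from nonterminal to a list of productions, a production is a list of symbols, and the empty list stands for ε. The nonterminal `Tail` is a hypothetical addition used only to exercise the second rule.

```python
def derives_epsilon(grammar):
    nonterminals = set(grammar)
    nullable = {nt: False for nt in grammar}
    # Rule 1: NT -> ε implies NT derives ε.
    for nt, productions in grammar.items():
        if [] in productions:
            nullable[nt] = True
    # Rule 2: iterate until no entry changes.
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            if nullable[nt]:
                continue
            for rhs in productions:
                if rhs and all(s in nonterminals and nullable[s] for s in rhs):
                    nullable[nt] = True
                    changed = True
                    break
    return nullable


GRAMMAR = {
    "Start": [["Expr"]],
    "Expr": [["Term", "Expr'"]],
    "Expr'": [["+", "Term", "Expr'"], ["-", "Term", "Expr'"], []],
    "Term": [["Int", "Term'"]],
    "Term'": [["*", "Int", "Term'"], ["/", "Int", "Term'"], []],
    "Tail": [["Expr'", "Term'"]],   # hypothetical, not in the lecture grammar
}
NULLABLE = derives_epsilon(GRAMMAR)
print(NULLABLE)
```

Here Expr’ and Term’ are marked by the first rule, and `Tail` is marked one iteration later because every symbol on its right-hand side derives ε.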
First(β)
• T∈First(β) if T can appear as the first symbol in a derivation starting from β
1) T∈First(T)
2) First(S) ⊆ First(S β)
3) NT derives ε implies First(β) ⊆ First(NT β)
4) NT → S β implies First(S β) ⊆ First(NT)

• Notation
• T is a terminal, NT is a nonterminal, S is a terminal or nonterminal, and β is a sequence of terminals or nonterminals
Rules + Request Generate System of Subset Inclusion Constraints
Request: What is First(Term’)?
Grammar
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Rules
1) T∈First(T)
2) First(S) ⊆ First(S β)
3) NT derives ε implies First(β) ⊆ First(NT β)
4) NT → S β implies First(S β) ⊆ First(NT)
Constraints
First(* Num Term’) ⊆ First(Term’)
First(/ Num Term’) ⊆ First(Term’)
First(*) ⊆ First(* Num Term’)
First(/) ⊆ First(/ Num Term’)
*∈First(*)
/∈First(/)
Constraint Propagation Algorithm
• Initialize Sets to {}
• Propagate Constraints Until Fixed Point
Grammar
Term’ → * Int Term’
Term’ → / Int Term’
Term’ → ε
Constraints
First(* Num Term’) ⊆ First(Term’)
First(/ Num Term’) ⊆ First(Term’)
First(*) ⊆ First(* Num Term’)
First(/) ⊆ First(/ Num Term’)
*∈First(*)
/∈First(/)
Solution, step by step
1) First(*) = {*}, First(/) = {/}, First(* Num Term’) = {}, First(/ Num Term’) = {}, First(Term’) = {}
2) First(* Num Term’) = {*}, First(/ Num Term’) = {/}, First(Term’) = {}
3) First(Term’) = {*,/}
4) No set changes any more: fixed point reached, First(Term’) = {*,/}
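The same fixed point can be computed directly over the grammar instead of over explicit constraints. The Python sketch below is one possible arrangement, not the lecture's algorithm verbatim: `first_sets` and `first_of_sequence` are hypothetical names, the derives-ε information is passed in as a dict, and the grammar uses `Int` as its integer terminal.

```python
def first_sets(grammar, nullable):
    first = {nt: set() for nt in grammar}        # initialize sets to {}

    def first_of_sequence(symbols):
        result = set()
        for s in symbols:
            if s not in grammar:                 # rule 1: a terminal is its own First
                result.add(s)
                return result
            result |= first[s]                   # rule 2: First(S) ⊆ First(S β)
            if not nullable[s]:
                return result
            # rule 3: s derives ε, so First of the rest also counts
        return result

    changed = True
    while changed:                               # propagate until fixed point
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:              # rule 4: NT -> S β
                new = first_of_sequence(rhs) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first


GRAMMAR = {
    "Term": [["Int", "Term'"]],
    "Term'": [["*", "Int", "Term'"], ["/", "Int", "Term'"], []],
}
NULLABLE = {"Term": False, "Term'": True}
FIRST = first_sets(GRAMMAR, NULLABLE)
print(FIRST)
```

For Term’ the result is the set containing * and /, matching the fixed point reached by constraint propagation.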
Building A Parse Tree

• Have each procedure return the section of the parse tree for the part of the string it parsed
• Use exceptions to make code structure clean
Building Parse Tree In Example
Term()
  if (token = Int n)
    oldToken = token; token = NextToken();
    node = TermPrime();
    if (node == NULL) return oldToken;
    else return(new TermNode(oldToken, node));
  else throw SyntaxError
TermPrime()
  if (token = *) || (token = /)
    first = token; next = NextToken();
    if (next = Int n)
      token = NextToken();
      return(new TermPrimeNode(first, next, TermPrime()))
    else throw SyntaxError
  else return(NULL)
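A Python sketch of the same idea follows, using an exception for syntax errors so the success path stays uncluttered. The details are assumptions made for illustration: tokens are `(kind, value)` pairs, tree nodes are nested tuples instead of TermNode and TermPrimeNode objects, and `ParseError` and `parse_concrete_term` are invented names.

```python
class ParseError(Exception):
    pass


def parse_concrete_term(tokens):
    toks = list(tokens) + [("EOF", None)]        # sentinel marks end of input
    pos = 0

    def term():
        nonlocal pos
        kind, value = toks[pos]
        if kind != "Int":
            raise ParseError(f"expected Int at position {pos}")
        pos += 1
        node = term_prime()
        leaf = ("Int", value)
        return leaf if node is None else ("Term", leaf, node)

    def term_prime():
        nonlocal pos
        op = toks[pos][0]
        if op in ("*", "/"):
            pos += 1
            kind, value = toks[pos]
            if kind != "Int":
                raise ParseError(f"expected Int at position {pos}")
            pos += 1
            return ("TermPrime", op, ("Int", value), term_prime())
        return None                              # Term' -> ε

    tree = term()
    if toks[pos][0] != "EOF":
        raise ParseError(f"unexpected token at position {pos}")
    return tree


TOKENS = [("Int", 2), ("*", None), ("Int", 3), ("*", None), ("Int", 4)]
TREE = parse_concrete_term(TOKENS)
print(TREE)
```

The resulting tree nests to the right, one TermPrime node per operator, which is the concrete shape dictated by the hacked grammar.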
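Parse Tree for 2*3*4
[Figure: the concrete parse tree is right-leaning: Term → Int 2 Term’, Term’ → * Int 3 Term’, Term’ → * Int 4 Term’, Term’ → ε. The desired abstract parse tree is left-leaning: the root Term has children Term * Int 4, and its left child Term has children Int 2 * Int 3]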
Why Use Hand-Coded Parser?
• Why not use parser generator?
• What do you do if your parser doesn’t work?
• Recursive descent parser – write more code
• Parser generator
• Hack grammar
• But if parser generator doesn’t work, nothing you can do
• If you have complicated grammar
• Increase chance of going outside comfort zone of parser generator
• Your parser may NEVER work
Bottom Line
• Recursive descent parser properties
• Probably more work
• But less risk of a disaster - you can almost always make a recursive descent parser work
• May have easier time dealing with resulting code
• Single language system
• No need to deal with potentially flaky parser generator
• No integration issues with automatically generated code
• If your parser development time is small compared to rest of project, or you have a really complicated language, use hand-coded recursive descent parser
Summary

• Top-Down Parsing
• Use Lookahead to Avoid Backtracking
• Parser is
• Hand-Coded
• Set of Mutually Recursive Procedures
Direct Generation of Abstract Tree
• TermPrime builds an incomplete tree
• Missing leftmost child
• Returns root and incomplete node
• (root, incomplete) = TermPrime()
• Called with token = *
• Remaining tokens = 3 * 4
[Figure: root points to the top Term node, with children Term * Int 4; incomplete points to its left child Term, with children (missing) * Int 3. The missing left child is to be filled in by the caller]
Code for Term
Input to parse: 2*3*4
Term()
  if (token = Int n)
    leftmostInt = token; token = NextToken();
    (root, incomplete) = TermPrime();
    if (root == NULL) return leftmostInt;
    incomplete.leftChild = leftmostInt;
    return root;
  else throw SyntaxError
[Figure: on entry, token is Int 2. After TermPrime() returns, root points to the top Term node (Term * Int 4), incomplete points to its left child (missing * Int 3), and leftmostInt is Int 2. Assigning incomplete.leftChild = leftmostInt completes the tree (Int 2 * Int 3) * Int 4]
Code for TermPrime
TermPrime()
  if (token = *) || (token = /)
    op = token; next = NextToken();
    if (next = Int n)
      token = NextToken();
      (root, incomplete) = TermPrime();
      if (root == NULL)
        root = new ExprNode(NULL, op, next);
        return (root, root);
      else
        newChild = new ExprNode(NULL, op, next);
        incomplete.leftChild = newChild;
        return(root, newChild);
    else throw SyntaxError
  else return(NULL,NULL)
• The NULL first argument of ExprNode is the missing left child to be filled in by caller
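The (root, incomplete) technique can be tried out with the Python sketch below. It follows the structure of Term() and TermPrime() under a few assumptions chosen for brevity: integer tokens are Python ints, operators are the strings "*" and "/", an "EOF" marker ends the input, and `to_tuple` is a helper added here only to make the tree easy to inspect.

```python
class ExprNode:
    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right

    def to_tuple(self):
        left = self.left.to_tuple() if isinstance(self.left, ExprNode) else self.left
        return (left, self.op, self.right)


def parse_abstract_term(tokens):
    toks = list(tokens) + ["EOF"]
    pos = 0

    def term_prime():
        nonlocal pos
        if toks[pos] in ("*", "/"):
            op = toks[pos]
            pos += 1
            operand = toks[pos]
            if not isinstance(operand, int):
                raise SyntaxError("expected Int after operator")
            pos += 1
            root, incomplete = term_prime()
            new_child = ExprNode(None, op, operand)   # left child still missing
            if root is None:
                return new_child, new_child
            incomplete.left = new_child               # plug into the tree built so far
            return root, new_child
        return None, None

    def term():
        nonlocal pos
        leftmost = toks[pos]
        if not isinstance(leftmost, int):
            raise SyntaxError("expected Int")
        pos += 1
        root, incomplete = term_prime()
        if root is None:
            return leftmost
        incomplete.left = leftmost                    # caller fills the missing child
        return root

    tree = term()
    if toks[pos] != "EOF":
        raise SyntaxError("unexpected trailing input")
    return tree


print(parse_abstract_term([2, "*", 3, "*", 4]).to_tuple())
```

For 2*3*4 the printed tree is `((2, '*', 3), '*', 4)`: left-associative, even though the recursion itself runs to the right.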
MIT OpenCourseWare
http://ocw.mit.edu

6.035 Computer Language Engineering
Spring 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
