Lecture 9: Bottom-Up Parsing
Today’s lecture:
Bottom-Up parsing
18 Aug 2022 COMP36512 Lecture 9 1
Bottom-Up Parsing: What is it all about?
Goal: Given a grammar, G, construct a parse tree for a string (i.e.,
sentence) by starting at the leaves and working to the root (i.e., by
working from the input sentence back toward the start symbol S).
Recall: the point of parsing is to construct a derivation:
S ⇒ γ0 ⇒ γ1 ⇒ γ2 ⇒ ... ⇒ γn-1 ⇒ sentence
To derive γi-1 from γi, we match some rhs β in γi, then replace β with its
corresponding lhs, A. This is called a reduction (it assumes A → β).
The parse tree is the result of the tokens and the reductions.
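Example: Consider the grammar below and the input string abbcde.
Grammar:
1. Goal → a A B e
2. A → A b c
3.   | b
4. B → d
Reductions (Position is the position of the rightmost symbol of the matched rhs):
Sentential Form   Production   Position
abbcde            3            2
a A bcde          2            4
a A de            4            3
a A B e           1            4
Goal              -            -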
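The reduction sequence in the table above can be replayed mechanically. The following is a minimal sketch, not part of the original slides: the names PRODUCTIONS, reduce_at and replay are hypothetical, and each step is a (production, position) pair read off the table, with position counted from 1.

```python
# Grammar from the example; production numbers follow the slide.
PRODUCTIONS = {
    1: ("Goal", ["a", "A", "B", "e"]),
    2: ("A", ["A", "b", "c"]),
    3: ("A", ["b"]),
    4: ("B", ["d"]),
}


def reduce_at(form, prod_no, k):
    """Replace the rhs of production prod_no, whose last symbol sits at
    1-based position k of form, by the production's lhs."""
    lhs, rhs = PRODUCTIONS[prod_no]
    start = k - len(rhs)  # 0-based index of the first symbol of the rhs
    if start < 0 or form[start:k] != rhs:
        raise ValueError(f"rhs of production {prod_no} does not end at position {k}")
    return form[:start] + [lhs] + form[k:]


def replay(sentence, steps):
    """Apply each (production, position) step in turn; return every form seen."""
    forms = [list(sentence)]
    for prod_no, k in steps:
        forms.append(reduce_at(forms[-1], prod_no, k))
    return forms


# The (Production, Position) columns of the table, top to bottom.
STEPS = [(3, 2), (2, 4), (4, 3), (1, 4)]
FORMS = replay("abbcde", STEPS)

for form in FORMS:
    print(" ".join(form))
```

Each printed line is one sentential form, starting from the input and ending at Goal; a step whose rhs does not actually end at the stated position raises an error rather than silently producing a wrong form.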
Finding Reductions
• What are we trying to find?
– A substring β that matches the right-hand side of a production that occurs as
one step in the rightmost derivation. Informally, this substring is called a handle.
• Formally, a handle of a right-sentential form γ is a pair <A→β, k>,
where A→β ∈ P and k is the position in γ of β’s rightmost symbol.
(right-sentential form: a sentential form that occurs in some rightmost derivation).
– Because γ is a right-sentential form, the substring to the right of a handle
contains only terminal symbols. Therefore, the parser doesn’t need to scan past
the handle.
– If a grammar is unambiguous, then every right-sentential form has a unique
handle (sketch of proof by definition: if the grammar is unambiguous, then the
rightmost derivation is unique; then there is a unique production at each step
that produces a sentential form; then there is a unique position at which the
rule is applied; hence, a unique handle).
If we can find those handles, we can build a derivation!
Motivating Example
Given the grammar below, find a rightmost derivation for x – 2*y
(starting from Goal there is only one; the grammar is not ambiguous!).
In each step, identify the handle.
Grammar:
1. Goal → Expr
2. Expr → Expr + Term
3.   | Expr – Term
4.   | Term
5. Term → Term * Factor
6.   | Term / Factor
7.   | Factor
8. Factor → number
9.   | id
First steps of the derivation:
Production   Sentential Form   Handle
-            Goal              -
1            Expr              1,1
3            Expr – Term       3,3
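One way to work through the remaining steps is to expand the rightmost non-terminal repeatedly and record, for each step, the pair (production, position of the last rhs symbol). The sketch below is an illustration under assumptions, not the lecture's own code: the helper name expand_rightmost and the production sequence in SEQUENCE are supplied here, x and y are treated as id tokens, 2 as a number token, and ASCII "-" stands for the minus sign.

```python
# Expression grammar from the slide; production numbers follow the slide.
GRAMMAR = {
    1: ("Goal", ["Expr"]),
    2: ("Expr", ["Expr", "+", "Term"]),
    3: ("Expr", ["Expr", "-", "Term"]),
    4: ("Expr", ["Term"]),
    5: ("Term", ["Term", "*", "Factor"]),
    6: ("Term", ["Term", "/", "Factor"]),
    7: ("Term", ["Factor"]),
    8: ("Factor", ["number"]),
    9: ("Factor", ["id"]),
}
NONTERMINALS = {"Goal", "Expr", "Term", "Factor"}


def expand_rightmost(form, prod_no):
    """Rewrite the rightmost non-terminal of form with production prod_no.
    Returns the new form and the handle (prod_no, k), where k is the
    1-based position of the last symbol of the inserted rhs."""
    lhs, rhs = GRAMMAR[prod_no]
    for i in range(len(form) - 1, -1, -1):
        if form[i] in NONTERMINALS:
            if form[i] != lhs:
                raise ValueError(f"rightmost non-terminal is {form[i]}, not {lhs}")
            return form[:i] + rhs + form[i + 1:], (prod_no, i + len(rhs))
    raise ValueError("no non-terminal left to expand")


# Assumed production sequence for the token string: id - number * id
SEQUENCE = [1, 3, 5, 9, 7, 8, 4, 7, 9]

form = ["Goal"]
HANDLES = []
for prod_no in SEQUENCE:
    form, handle = expand_rightmost(form, prod_no)
    HANDLES.append(handle)
    print(prod_no, " ".join(form), handle)

FINAL_FORM = form
```

The first two handles printed agree with the table above (1,1 and 3,3). Reading the printed lines from bottom to top gives the order in which a bottom-up parser would perform the reductions.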
In LR(1), L stands for left-to-right scanning of the input; R for constructing a rightmost derivation in reverse; 1 for the
number of input symbols of lookahead.
LR Parsing: Background
• Read tokens from an input buffer (same as with shift-reduce
parsers)
• Add extra state information after each symbol on the
stack. The state summarises the information contained in
the stack below it. The stack would look like:
$ S0 Expr S1 - S2 num S3
• Use a table that consists of two parts:
– action[state_on_top_of_stack, input_symbol]: returns one of: shift
s (push a symbol and a state); reduce by a rule; accept; error.
– goto[state_on_top_of_stack,non_terminal_symbol]: returns a new
state to push onto the stack after a reduction.
Skeleton code for an LR Parser
push $ onto the stack
push s0
token = next_token()
repeat
    s = top_of_the_stack                /* not pop! */
    if ACTION[s,token] == 'reduce A→β'
        then pop 2*|β| entries off the stack  /* a symbol and a state per symbol of β */
             s = top_of_the_stack       /* not pop! */
             push A; push GOTO[s,A]
    elseif ACTION[s,token] == 'shift sx'
        then push token; push sx
             token = next_token()
    elseif ACTION[s,token] == 'accept'
        then break
    else report_error
end repeat
report_success
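The skeleton can be turned into a small runnable sketch. Everything specific below is an assumption made for illustration: the ACTION and GOTO tables were built by hand for the tiny CatNoise grammar used as an example later in this lecture (1. Goal → CatNoise, 2. CatNoise → CatNoise miau, 3. CatNoise → miau), and the function name lr_parse and the tuple encoding of table entries are hypothetical.

```python
# Productions that can be reduced; production 1 is handled by 'accept'.
PRODUCTIONS = {
    2: ("CatNoise", ["CatNoise", "miau"]),
    3: ("CatNoise", ["miau"]),
}

# Hand-built tables (an assumption, see the lead-in). Missing entries mean error.
ACTION = {
    (0, "miau"): ("shift", 2),
    (1, "miau"): ("shift", 3),
    (1, "eof"): ("accept", None),
    (2, "miau"): ("reduce", 3),
    (2, "eof"): ("reduce", 3),
    (3, "miau"): ("reduce", 2),
    (3, "eof"): ("reduce", 2),
}
GOTO = {(0, "CatNoise"): 1}


def lr_parse(tokens):
    """Table-driven LR driver; returns the production numbers in reduction order."""
    stream = iter(list(tokens) + ["eof"])
    stack = ["$", 0]            # symbols and states alternate on the stack
    token = next(stream)
    reductions = []
    while True:
        state = stack[-1]       # inspect the top state, do not pop it
        action = ACTION.get((state, token))
        if action is None:
            raise SyntaxError(f"unexpected {token!r} in state {state}")
        kind, arg = action
        if kind == "reduce":
            lhs, rhs = PRODUCTIONS[arg]
            del stack[-2 * len(rhs):]       # one symbol and one state per rhs symbol
            state = stack[-1]
            stack += [lhs, GOTO[(state, lhs)]]
            reductions.append(arg)
        elif kind == "shift":
            stack += [token, arg]
            token = next(stream)
        else:                   # accept
            return reductions


print(lr_parse(["miau", "miau"]))
```

For the input miau miau the driver shifts one miau, reduces it by production 3, shifts the second miau, reduces by production 2 and then accepts on eof.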
goto(s, x) computes the state that the parser would reach if it recognised an x
while in state s.
Example (• marks the position reached within the production):
S1 (x=CatNoise): [Goal → CatNoise •, eof], [CatNoise → CatNoise • miau, eof],
[CatNoise → CatNoise • miau, miau]
S2 (x=miau): [CatNoise → miau •, eof], [CatNoise → miau •, miau]
S3 (from S1): [CatNoise → CatNoise miau •, eof], [CatNoise → CatNoise miau •, miau]
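The states in this example can be recomputed with a short sketch of the closure and goto operations on LR(1) items. The encoding is an assumption made here, not taken from the slides: an item is a tuple (lhs, rhs, dot, lookahead), where dot counts how many rhs symbols have already been recognised, and the FIRST sets are written out by hand, which is enough because no production of this grammar is empty.

```python
GRAMMAR = [
    ("Goal", ("CatNoise",)),
    ("CatNoise", ("CatNoise", "miau")),
    ("CatNoise", ("miau",)),
]
NONTERMINALS = {"Goal", "CatNoise"}
# Hand-written FIRST sets (an assumption; valid since nothing derives the empty string).
FIRST = {"miau": {"miau"}, "eof": {"eof"}, "Goal": {"miau"}, "CatNoise": {"miau"}}


def closure(items):
    """Add every item [B -> . gamma, b] implied by an item [A -> alpha . B delta, a],
    for each b in FIRST(delta a)."""
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, lookahead = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            rest = rhs[dot + 1:] + (lookahead,)
            for b in FIRST[rest[0]]:
                for lhs2, rhs2 in GRAMMAR:
                    if lhs2 == rhs[dot]:
                        new_item = (lhs2, rhs2, 0, b)
                        if new_item not in items:
                            items.add(new_item)
                            work.append(new_item)
    return frozenset(items)


def goto(state, x):
    """State reached from state after recognising symbol x."""
    moved = {(lhs, rhs, dot + 1, la)
             for (lhs, rhs, dot, la) in state
             if dot < len(rhs) and rhs[dot] == x}
    return closure(moved)


S0 = closure({("Goal", ("CatNoise",), 0, "eof")})
S1 = goto(S0, "CatNoise")
S2 = goto(S0, "miau")
S3 = goto(S1, "miau")

for name, state in [("S0", S0), ("S1", S1), ("S2", S2), ("S3", S3)]:
    print(name, sorted(state))
```

Under this encoding S1 holds three items, while S2 and S3 hold two each, matching the item sets listed in the example.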
Example (slide 1 of 4)
Simplified expression grammar:
Goal → Expr
Expr → Term - Expr
Expr → Term
Term → Factor * Term
Term → Factor
Factor → id
FIRST(Goal) = FIRST(Expr) = FIRST(Term) = FIRST(Factor) = FIRST(id) = {id}
FIRST(-) = {-}
FIRST(*) = {*}
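These FIRST sets can be checked with a small fixed-point computation. This is a simplified sketch that relies on an assumption valid for this grammar only: no production has an empty right-hand side, so the first rhs symbol alone determines what is added. The function name first_sets is hypothetical.

```python
GRAMMAR = [
    ("Goal", ["Expr"]),
    ("Expr", ["Term", "-", "Expr"]),
    ("Expr", ["Term"]),
    ("Term", ["Factor", "*", "Term"]),
    ("Term", ["Factor"]),
    ("Factor", ["id"]),
]
TERMINALS = {"id", "-", "*"}


def first_sets(grammar, terminals):
    """Iterate until no FIRST set grows any further."""
    first = {t: {t} for t in terminals}      # FIRST of a terminal is itself
    for lhs, _ in grammar:
        first.setdefault(lhs, set())
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            before = len(first[lhs])
            # Simplification: no empty productions, so only rhs[0] contributes.
            first[lhs] |= first[rhs[0]]
            if len(first[lhs]) != before:
                changed = True
    return first


FIRST = first_sets(GRAMMAR, TERMINALS)
for symbol in ["Goal", "Expr", "Term", "Factor", "id", "-", "*"]:
    print(f"FIRST({symbol}) = {sorted(FIRST[symbol])}")
```

A grammar with empty productions would need the fuller version of the algorithm, which also looks past rhs symbols that can derive the empty string.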