
Top Down Parsing

When we are parsing, we produce a unique syntax tree from a legal sentence.
An unambiguous grammar gives rise to a single leftmost derivation for any sentence in the language.
So, if we are trying to recognise a sentence, what we are trying to do is grow a parse tree corresponding to that sentence.
We will always be looking at the leftmost nonterminal.
This follows the push down automaton model of the previous lecture.
The parser must choose the correct production from the set of productions that correspond to the current state of the parse.
If at any time there is no candidate production corresponding to the state of the parse, we must have made a wrong turn at some earlier stage and we will need to backtrack.

2003, 2004 Kevin J. Maciunas, Charles Lakos
CC&P 2004

Example

Suppose we have a grammar:
  S → E
  E → T | E+T
  T → F | T*F
  F → unit | (E)
and the expression:
  1+2*3
We are trying to find the leftmost derivation: a top-down parser constructs a leftmost parse.
One attempt at a parse:
      Sentential form   Remaining input   Action
  1.  S                 1+2*3             only one rule: S → E
  2.  E                 1+2*3             choose E → E+T
  3.  E+T               1+2*3             choose E → T → F → unit
  4.  unit+T            1+2*3             match 1 and +
  5.  1+T               2*3               choose T → F → unit
  6.  1+unit            2*3               match 2
  7.  1+2               *3                WRONG: nothing is left to derive *3
This means that to parse this sentence some backtracking is required, i.e., we must put input symbols back!
Backtracking in compilers is nontrivial and to be avoided!!

Example #2

Using the same grammar and expression, another attempt at a parse:
      Sentential form   Remaining input   Action
  1.  S                 1+2*3             only one rule: S → E
  2.  E                 1+2*3             choose E → E+T
  3.  E+T               1+2*3             choose E → E+T
  4.  E+T+T             1+2*3             choose E → E+T
  5.  E+T+T+T           1+2*3             choose E → E+T
  ...you can see what happens: the sentential form keeps growing while no input is consumed.

Solutions?

The problem is simple: Left Recursion!
We could rearrange the productions so that the left recursive ones come at the end, and always choose the first matching production.
For the previous examples, this has already been done: the left recursive ones are at the end of the list!
Note that this is not an easy task in general, since mutually recursive grammars have the same problems:
  A → B | C D
  C → E | A F
(here A ⇒ C D ⇒ A F D, so A is left recursive through C).
In general, rearranging productions will not help; the parser will still have problems.
Even if it does help, a parser which needs to backtrack an arbitrary distance is inefficient.
What we need is a way to deterministically parse a grammar in a top-down fashion without backtracking.
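To make the failure concrete, here is a minimal Python sketch of a naive backtracking top-down parser. It is an illustration, not the course's parser. It always expands the leftmost nonterminal and tries alternatives in the order written. Resuming a generator after a failure plays the role of backtracking (the consumed input is "put back"). The names `derive`, `accepts` and the `max_depth` cut-off are hypothetical, and `unit` is assumed to match any single-digit token.

```python
from typing import Dict, Iterator, List

# A grammar maps each nonterminal to its alternatives (lists of symbols).
# Assumption for this sketch: the terminal "unit" matches any single-digit token.
Grammar = Dict[str, List[List[str]]]

EXPR: Grammar = {
    "S": [["E"]],
    "E": [["T"], ["E", "+", "T"]],   # left-recursive alternative placed last
    "T": [["F"], ["T", "*", "F"]],
    "F": [["unit"], ["(", "E", ")"]],
}


def derive(g: Grammar, symbols: List[str], tokens: List[str],
           pos: int, depth: int, max_depth: int) -> Iterator[int]:
    """Yield each input position reachable by deriving `symbols` from tokens[pos:].

    Alternatives are tried in order; asking the generator for another result
    after a failure is the backtracking step (we retry from the same `pos`).
    """
    if depth > max_depth:
        raise RecursionError(f"gave up after {max_depth} nested expansions")
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in g:  # leftmost symbol is a nonterminal: try each alternative in turn
        for alt in g[head]:
            for mid in derive(g, alt, tokens, pos, depth + 1, max_depth):
                yield from derive(g, rest, tokens, mid, depth + 1, max_depth)
    elif pos < len(tokens) and (tokens[pos] == head or
                                (head == "unit" and tokens[pos].isdigit())):
        # leftmost symbol is a terminal that matches the next input symbol
        yield from derive(g, rest, tokens, pos + 1, depth + 1, max_depth)


def accepts(g: Grammar, tokens: List[str], max_depth: int = 200) -> bool:
    """True if some leftmost derivation from S consumes all of `tokens`."""
    return any(end == len(tokens)
               for end in derive(g, ["S"], tokens, 0, 0, max_depth))


print(accepts(EXPR, ["1"]))  # the first choices happen to be the right ones
try:
    accepts(EXPR, list("1+2*3"))
except RecursionError as err:
    print("left recursion:", err)
```

On the input `1` the first alternatives happen to succeed. On `1+2*3` they fail, so the parser backtracks into T → T*F. That alternative expands T again at the same input position, and it keeps doing so until the depth cut-off fires. This is the same behaviour traced in Example #2 for E → E+T.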

Eliminating left recursion

An algorithm to eliminate arbitrary left recursion (by replacing it with right recursion) is as follows:
1. Arbitrarily order the non-terminals: N1, N2, N3, ...
2. Apply the following steps to the productions for N1, then N2, ...
3. For Ni:
   a) For all productions Ni → Nk γ, where k < i, if the productions for Nk are Nk → δ1 | δ2 | δ3 | ..., then expand the reference to Nk,
      i.e. replace the production Ni → Nk γ by Ni → δ1 γ | δ2 γ | ...
   b) If the productions for Ni are now
        Ni → β1 | β2 | ... | Ni α1 | Ni α2 | ...
      (where the first few are not left recursive while the latter are), then replace them with
        Ni → β1 Ni' | β2 Ni' | ...
        Ni' → ε | α1 Ni' | α2 Ni' | ...

Example of eliminating left recursion

Consider the productions:
  A → a | Ba
  B → b | Cb
  C → c | Ac
1. Arbitrarily order the non-terminals: A, B, C
2. Consider the productions for A: no change
3. Consider the productions for B: no change (B → Cb refers to C, which comes later in the order)
4. Consider the productions for C:
   a) Replace C → Ac by C → ac | Bac
   a) Replace C → Bac by C → bac | Cbac (B also comes before C)
   Productions for C are now: C → c | ac | bac | Cbac
   b) Replace the productions for C by:
        C → cC' | acC' | bacC'
        C' → ε | bacC'
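The algorithm can be sketched in Python as follows. This is an illustrative implementation under stated assumptions. A grammar is a dict from nonterminal to a list of alternatives, `[]` stands for ε, and each new nonterminal is named by appending a prime. As is usual for this algorithm, the input grammar is assumed to have no ε-productions or cycles. The function names are hypothetical.

```python
from typing import Dict, List

Grammar = Dict[str, List[List[str]]]  # nonterminal -> alternatives; [] is ε


def eliminate_left_recursion(g: Grammar, order: List[str]) -> Grammar:
    """Replace left recursion by right recursion, following steps 1-3 above.

    Assumes the input grammar has no ε-productions and no cycles (A ⇒+ A).
    """
    result: Grammar = {}
    for i, ni in enumerate(order):
        alts = [list(a) for a in g[ni]]
        earlier = set(order[:i])
        # Step 3a: expand Ni → Nk γ (k < i) until no alternative starts with an earlier Nk.
        changed = True
        while changed:
            changed = False
            expanded: List[List[str]] = []
            for alt in alts:
                if alt and alt[0] in earlier:
                    # result[Nk] already has its own left recursion removed
                    expanded.extend(delta + alt[1:] for delta in result[alt[0]])
                    changed = True
                else:
                    expanded.append(alt)
            alts = expanded
        # Step 3b: split into Ni → β (not left recursive) and Ni → Ni α.
        betas = [a for a in alts if not a or a[0] != ni]
        alphas = [a[1:] for a in alts if a and a[0] == ni]
        if alphas:
            prime = ni + "'"
            result[ni] = [b + [prime] for b in betas]
            result[prime] = [[]] + [a + [prime] for a in alphas]
        else:
            result[ni] = alts
    return result


def show(g: Grammar) -> List[str]:
    """Format productions in the slide notation, writing [] as ε."""
    return [f"{nt} → " + " | ".join("".join(a) or "ε" for a in alts)
            for nt, alts in g.items()]


EXAMPLE: Grammar = {"A": [["a"], ["B", "a"]],
                    "B": [["b"], ["C", "b"]],
                    "C": [["c"], ["A", "c"]]}
for line in show(eliminate_left_recursion(EXAMPLE, ["A", "B", "C"])):
    print(line)
```

Run on the example, it prints the same productions derived on the slide: A → a | Ba, B → b | Cb, C → cC' | acC' | bacC' and C' → ε | bacC'. Applied to the expression grammar with the order E, T, F, it yields E → TE', E' → ε | +TE', T → FT' and T' → ε | *FT', and it leaves F unchanged.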

A Workable Solution

Observation
The trouble which gives rise to nondeterminacy and backtracking in top down parsers shows itself in only one place: when a parser has to choose between several alternatives with the same left hand side.
The only information which we can use to make the correct decision is the input stream itself.
In the example, we (humans) could see which alternative to choose by looking at the input yet to be read.
If we are going to look ahead in order to make the correct decision, we need a buffer in which to store the next few symbols.
In practice, this buffer is of a fixed length.

Definitions

A parser which can make a deterministic decision about which alternative to choose when faced with one, if given a buffer of k symbols, is called an LL(k) parser:
  L: Left-to-right scan of input
  L: Leftmost derivation
  k: k symbols of look-ahead
The grammar that an LL(k) parser recognises is an LL(k) grammar, and any language that has an LL(k) grammar is an LL(k) language.
We are constructing an LL(1) compiler that recognises LL(1) grammars.
So the question is: how do we know when we have an LL(1) grammar?
We also have LR(k) grammars and other variations, but our focus is currently on LL(1) grammars.
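Here is a minimal sketch of what a one-symbol look-ahead buffer buys us. Applying the left-recursion elimination algorithm to the example grammar gives E → T E', E' → ε | + T E', T → F T', T' → ε | * F T' and F → unit | ( E ). A recursive-descent parser for that grammar can choose every alternative by peeking at the single next token, so it never backtracks. Several details are assumptions made only for illustration: the class names, the treatment of each `unit` as one digit, the `$` end-marker token, and evaluating the expression while parsing.

```python
from typing import List


class ParseError(Exception):
    pass


class Parser:
    """Recursive-descent parser for the expression grammar without left recursion:

        E  → T E'        E' → ε | + T E'
        T  → F T'        T' → ε | * F T'
        F  → unit | ( E )
    """

    def __init__(self, text: str) -> None:
        # One character per token; "$" marks the end of the input.
        self.tokens: List[str] = [c for c in text if not c.isspace()] + ["$"]
        self.pos = 0

    def peek(self) -> str:
        """The one-symbol look-ahead buffer."""
        return self.tokens[self.pos]

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def parse(self) -> int:
        value = self.E()
        self.expect("$")  # the whole input must be consumed
        return value

    def E(self) -> int:              # E → T E'
        return self.E_prime(self.T())

    def E_prime(self, left: int) -> int:
        if self.peek() == "+":        # E' → + T E'
            self.expect("+")
            return self.E_prime(left + self.T())
        return left                   # E' → ε

    def T(self) -> int:              # T → F T'
        return self.T_prime(self.F())

    def T_prime(self, left: int) -> int:
        if self.peek() == "*":        # T' → * F T'
            self.expect("*")
            return self.T_prime(left * self.F())
        return left                   # T' → ε

    def F(self) -> int:
        if self.peek() == "(":        # F → ( E )
            self.expect("(")
            value = self.E()
            self.expect(")")
            return value
        tok = self.peek()             # F → unit (a single digit in this sketch)
        if not tok.isdigit():
            raise ParseError(f"expected a unit, found {tok!r}")
        self.pos += 1
        return int(tok)


print(Parser("1+2*3").parse())
```

The ε alternatives are taken whenever the next token is not `+` or `*`. Any genuine error is then caught later by `expect` or `F`, which is a common simplification in hand-written LL(1) parsers.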

Definition of LL(1)

When faced with a production such as:
  A → α1 | α2 | α3
we choose one of the αi uniquely by looking at the next input symbol.
We employ two sets, first and follow, to help us.
Recall:
  First(X) is the set of all terminal symbols that can start a string derived from X
  Follow(X) is the set of terminal symbols that can immediately follow an X

Definition of First

To compute FIRST(X) for all grammar symbols X, apply the following algorithm until no more terminals or ε can be added to any FIRST set.
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a nonterminal and X → Y1 Y2 ... Yn is a production, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); that is, Y1 Y2 ... Yi-1 ⇒* ε.
   If ε is in FIRST(Yj) for all j = 1, 2, ..., n, then add ε to FIRST(X).
   For example, everything in FIRST(Y1) is surely in FIRST(X). If Y1 does not derive ε, then we add nothing more to FIRST(X), but if Y1 ⇒* ε, then we add FIRST(Y2), and so on.
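The three rules translate directly into a fixed-point computation. The Python below is a sketch that uses assumed representations. A grammar is a dict from nonterminal to alternatives, `[]` is an ε-alternative, and ε itself is stored as the string "ε". The helper `first_of_seq` (a hypothetical name) applies rule 3 to a whole string of symbols, which is also what FOLLOW and the LL(1) check need.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]  # nonterminal -> alternatives; [] is ε
EPS = "ε"


def first_of_seq(seq: List[str], first: Dict[str, Set[str]], g: Grammar) -> Set[str]:
    """FIRST of a symbol string Y1 Y2 ... Yn (rule 3); contains ε if all Yi ⇒* ε."""
    out: Set[str] = set()
    for sym in seq:
        f = first[sym] if sym in g else {sym}  # rule 1: FIRST(terminal) = {terminal}
        out |= f - {EPS}
        if EPS not in f:  # Yi cannot vanish, so later symbols contribute nothing
            return out
    out.add(EPS)          # every Yi derives ε (or the string is empty: rule 2)
    return out


def first_sets(g: Grammar) -> Dict[str, Set[str]]:
    first: Dict[str, Set[str]] = {nt: set() for nt in g}
    changed = True
    while changed:        # repeat until no terminal or ε can be added anywhere
        changed = False
        for nt, alts in g.items():
            for alt in alts:
                new = first_of_seq(alt, first, g)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


# Expression grammar after left-recursion elimination.
EXPR_LL1: Grammar = {
    "E": [["T", "E'"]],
    "E'": [[], ["+", "T", "E'"]],
    "T": [["F", "T'"]],
    "T'": [[], ["*", "F", "T'"]],
    "F": [["unit"], ["(", "E", ")"]],
}
for nt, fs in first_sets(EXPR_LL1).items():
    print(f"FIRST({nt}) = {{{', '.join(sorted(fs))}}}")
```

For this grammar the result is FIRST(E) = FIRST(T) = FIRST(F) = {(, unit}, FIRST(E') = {+, ε} and FIRST(T') = {*, ε}. The iteration also terminates on left-recursive grammars, because the sets can only grow.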

Definition of Follow

To compute FOLLOW(A) for all nonterminals A, apply the following algorithm until nothing can be added to any FOLLOW set.
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).

Definition of LL(1) property

Definition: A grammar G is LL(1) if and only if, for all rules
  A → α1 | α2 | ... | αn

$$\mathrm{director}(\alpha_i) \cap \mathrm{director}(\alpha_k) = \emptyset \quad \text{for all } i \neq k$$

where:

$$\mathrm{director}(\alpha_i) = \begin{cases} \mathrm{first}(\alpha_i) \cup \mathrm{follow}(A) & \text{if } \alpha_i \Rightarrow^{*} \varepsilon \\ \mathrm{first}(\alpha_i) & \text{otherwise} \end{cases}$$
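The sketch below builds on the same representation. It computes FOLLOW sets and then checks the LL(1) condition by comparing director sets pairwise. The FIRST helpers are repeated so that the block runs on its own. All names are hypothetical and `$` is the end marker. Director sets keep ε exactly as in the definition, so two alternatives that both derive ε are always reported as a conflict (they share ε).

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[List[str]]]  # nonterminal -> alternatives; [] is ε
EPS, END = "ε", "$"


def first_of_seq(seq: List[str], first: Dict[str, Set[str]], g: Grammar) -> Set[str]:
    """FIRST of a string of grammar symbols (contains ε if the whole string can vanish)."""
    out: Set[str] = set()
    for sym in seq:
        f = first[sym] if sym in g else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def first_sets(g: Grammar) -> Dict[str, Set[str]]:
    first: Dict[str, Set[str]] = {nt: set() for nt in g}
    changed = True
    while changed:
        changed = False
        for nt, alts in g.items():
            for alt in alts:
                new = first_of_seq(alt, first, g)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(g: Grammar, start: str, first: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    follow: Dict[str, Set[str]] = {nt: set() for nt in g}
    follow[start].add(END)                      # rule 1
    changed = True
    while changed:
        changed = False
        for a, alts in g.items():
            for alt in alts:
                for i, b in enumerate(alt):
                    if b not in g:              # only nonterminals have FOLLOW sets
                        continue
                    rest = first_of_seq(alt[i + 1:], first, g)
                    new = rest - {EPS}          # rule 2: FIRST(β) except ε
                    if EPS in rest:             # rule 3: β is empty or β ⇒* ε
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


def ll1_conflicts(g: Grammar, start: str) -> List[Tuple[str, int, int]]:
    """(A, i, k) for each pair of alternatives of A whose director sets intersect."""
    first = first_sets(g)
    follow = follow_sets(g, start, first)
    conflicts: List[Tuple[str, int, int]] = []
    for a, alts in g.items():
        director = []
        for alt in alts:
            f = first_of_seq(alt, first, g)
            director.append(f | follow[a] if EPS in f else f)
        for i in range(len(alts)):
            for k in range(i + 1, len(alts)):
                if director[i] & director[k]:
                    conflicts.append((a, i, k))
    return conflicts


LEFT: Grammar = {"S": [["E"]], "E": [["T"], ["E", "+", "T"]],
                 "T": [["F"], ["T", "*", "F"]], "F": [["unit"], ["(", "E", ")"]]}
RIGHT: Grammar = {"S": [["E"]], "E": [["T", "E'"]], "E'": [[], ["+", "T", "E'"]],
                  "T": [["F", "T'"]], "T'": [[], ["*", "F", "T'"]],
                  "F": [["unit"], ["(", "E", ")"]]}
print(ll1_conflicts(LEFT, "S"))
print(ll1_conflicts(RIGHT, "S"))
```

For the original expression grammar the check reports conflicts for E and T, because both alternatives of each can start with unit or `(`. The grammar with left recursion removed reports none.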

Making Grammars LL(1)

We can't always make a grammar which is not LL(1) into an equivalent LL(1) grammar.
Some tricks to help are factorisation and substitution.
Consider the grammar
  S → T
  T → L B | L C array
  L → long | ε
  C → B | ε
  B → real | integer
Both alternatives for T begin with L, so their director sets overlap (both contain long, for instance).
Transform the grammar by factorisation:
  S → T
  T → L X
  X → B | C array
  L → long | ε
  C → B | ε
  B → real | integer

Fixing the problem

Substitute for C wherever it occurs:
  S → T
  T → L X
  X → B | B array | array
  L → long | ε
  B → real | integer
Note that it still is not LL(1)! The alternatives B and B array both begin with real or integer.
Factorisation of B in X gives:
  S → T
  T → L X
  X → B Y | array
  Y → array | ε
  L → long | ε
  B → real | integer
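Once the grammar is LL(1), each nonterminal becomes one procedure that chooses its alternative from the next token alone. The following is a sketch of a recogniser for the final grammar. It assumes whitespace-separated keywords and a `$` end marker, and the class and function names are hypothetical. The director sets quoted in the comments were worked out by hand from the definitions above.

```python
from typing import List


class TypeParser:
    """Recursive-descent recogniser for the final grammar:

        S → T
        T → L X
        X → B Y | array
        Y → array | ε
        L → long | ε
        B → real | integer
    """

    def __init__(self, text: str) -> None:
        self.tokens: List[str] = text.split() + ["$"]  # "$" marks end of input
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos]

    def match(self, tok: str) -> None:
        if self.peek() != tok:
            raise ValueError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def S(self) -> None:                    # S → T, then the end marker
        self.T()
        self.match("$")

    def T(self) -> None:                    # T → L X
        self.L()
        self.X()

    def L(self) -> None:
        if self.peek() == "long":           # L → long, director = {long}
            self.match("long")
        # otherwise L → ε, director = FOLLOW(L) = {real, integer, array}

    def X(self) -> None:
        if self.peek() in ("real", "integer"):  # X → B Y, director = {real, integer}
            self.B()
            self.Y()
        else:                                   # X → array, director = {array}
            self.match("array")

    def Y(self) -> None:
        if self.peek() == "array":          # Y → array, director = {array}
            self.match("array")
        # otherwise Y → ε, director = FOLLOW(Y) = {$}

    def B(self) -> None:                    # B → real | integer
        if self.peek() in ("real", "integer"):
            self.pos += 1
        else:
            raise ValueError(f"expected real or integer, found {self.peek()!r}")


def accepts(text: str) -> bool:
    try:
        TypeParser(text).S()
        return True
    except ValueError:
        return False


for s in ["long real array", "integer", "long array", "real real"]:
    print(f"{s!r}: {accepts(s)}")
```

Here `long array` is accepted through L → long and X → array. `real real` is rejected because, after Y → ε, the parser expects the end marker but finds a second `real`.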
