Lexical Analyzer
• Union
– L1 ∪ L2 = { s | s ∈ L1 or s ∈ L2 }
• Exponentiation:
– L^0 = {ε}   L^1 = L   L^2 = LL
• Kleene Closure
– L* = ⋃_{i=0}^{∞} L^i
• Positive Closure
– L+ = ⋃_{i=1}^{∞} L^i
• Example: L1 = {a,b,c,d}, L2 = {1,2}
• L1L2 = {a1,a2,b1,b2,c1,c2,d1,d2}
• L1 ∪ L2 = {a,b,c,d,1,2}
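The operations above can be tried out on small finite languages. The following is a minimal sketch, assuming L1 = {a,b,c,d} and L2 = {1,2} (the values the two results above imply) and using the empty Python string for ε. The helper names `concat`, `power` and `closure_up_to` are made up for this illustration. Since L* is infinite, only the union of L^i up to a chosen bound is computed.

```python
from itertools import product

def concat(l1, l2):
    """L1L2: every string of l1 followed by every string of l2."""
    return {s + t for s, t in product(l1, l2)}

def power(lang, n):
    """L^n, with L^0 = {""} (the empty string stands for ε)."""
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result

def closure_up_to(lang, max_i, start=0):
    """Union of L^i for start <= i <= max_i.

    start=0 gives a finite slice of L*, start=1 a finite slice of L+.
    """
    result = set()
    for i in range(start, max_i + 1):
        result |= power(lang, i)
    return result

L1 = {"a", "b", "c", "d"}
L2 = {"1", "2"}
print(sorted(concat(L1, L2)))            # concatenation L1L2
print(sorted(L1 | L2))                   # union
print(sorted(closure_up_to({"0"}, 3)))   # ε, 0, 00, 000
```

Calling `closure_up_to(lang, n, start=1)` leaves out L^0, which is the only difference between the positive closure and the Kleene closure.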
• (r)+ = (r)(r)*
• (r)? = (r) | ε
• Ex:
Σ = {0,1}
– 0|1 => {0,1}
– (0|1)(0|1) => {00,01,10,11}
– 0* => {ε,0,00,000,0000,...}
– (0|1)* => all strings of 0s and 1s, including the empty string
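These examples can be checked mechanically. The sketch below uses Python's `re` module as a stand-in for the regular expressions of the slides (an assumption: Python's syntax is a superset of the notation used here) and enumerates all strings over Σ = {0,1} up to a small length. The helper names `strings_up_to` and `language` are hypothetical.

```python
import re
from itertools import product

def strings_up_to(alphabet, max_len):
    """All strings over alphabet of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def language(pattern, alphabet="01", max_len=3):
    """Strings (up to max_len) that the pattern matches as a whole."""
    rx = re.compile(pattern)
    return [s for s in strings_up_to(alphabet, max_len) if rx.fullmatch(s)]

print(language(r"0|1"))
print(language(r"(0|1)(0|1)"))
print(language(r"0*"))
# (r)+ = (r)(r)*  and  (r)? = (r) | ε, checked on all strings up to length 3
print(language(r"(0|1)+") == language(r"(0|1)(0|1)*"))
print(language(r"(0|1)?") == language(r"(0|1)|"))
```

`fullmatch` is used rather than `search` because a regular expression denotes a set of whole strings, not substrings.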
• ε-transitions are allowed in NFAs. In other words, we can move from one state
to another one without consuming any symbol.
• An NFA accepts a string x if and only if there is a path from the starting state to
one of the accepting states such that the edge labels along this path spell out x.
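The acceptance condition can be sketched by tracking the set of states reachable after each input symbol. The NFA below is a hypothetical one for (a|b)*ab with a single ε-transition added for illustration; it is not taken from a figure in these slides. The names `eps_closure`, `accepts` and `delta` are assumptions of this sketch.

```python
EPS = ""  # label used for ε-transitions in this sketch

def eps_closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(x, start, accepting, delta):
    """True if some path from `start` to an accepting state spells out x."""
    current = eps_closure({start}, delta)
    for ch in x:
        moved = {t for s in current for t in delta.get((s, ch), ())}
        current = eps_closure(moved, delta)
    return bool(current & accepting)

# Hypothetical NFA for (a|b)*ab: state 0 is the start, state 3 is accepting.
delta = {
    (0, EPS): {1},
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
}

for word in ["ab", "aab", "ba", ""]:
    print(repr(word), accepts(word, 0, {3}, delta))
```

Keeping a set of current states explores all paths at once, so no backtracking is needed even though the automaton is nondeterministic.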
[Figure: DFA with states 0, 1, 2 and transitions labeled a and b.]
The language recognized by this DFA is also (a|b)*ab.
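[Figure: NFA for r1 | r2. A new start state i has ε-transitions into N(r1) and N(r2); each of them has an ε-transition to a new final state f.]
[Figure: NFA for r1 r2.]
[Figure: NFA for r*. A new start state i and a new final state f are connected to N(r) by ε-transitions.]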
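The three constructions can be sketched as functions that combine NFA fragments. This is a minimal illustration and not necessarily the exact construction drawn on the slides: an NFA is represented as a hypothetical tuple `(start, final, edges)`, concatenation joins the two fragments with an ε-edge instead of merging states, and the ε-edges for r* follow the usual Thompson-style layout. The small `accepts` helper is only there to try the result.

```python
import itertools

EPS = None                      # label for ε-edges in this sketch
_new_state = itertools.count()  # source of fresh state numbers

def symbol(c):
    """NFA for a single symbol c: i --c--> f."""
    i, f = next(_new_state), next(_new_state)
    return i, f, [(i, c, f)]

def union(n1, n2):
    """NFA for r1 | r2: new i and f, linked to both fragments by ε-edges."""
    (i1, f1, e1), (i2, f2, e2) = n1, n2
    i, f = next(_new_state), next(_new_state)
    return i, f, e1 + e2 + [(i, EPS, i1), (i, EPS, i2),
                            (f1, EPS, f), (f2, EPS, f)]

def concat(n1, n2):
    """NFA for r1 r2: final state of N(r1) is linked to the start of N(r2)."""
    (i1, f1, e1), (i2, f2, e2) = n1, n2
    return i1, f2, e1 + e2 + [(f1, EPS, i2)]

def star(n):
    """NFA for r*: skip N(r) entirely, or loop through it any number of times."""
    i1, f1, e1 = n
    i, f = next(_new_state), next(_new_state)
    return i, f, e1 + [(i, EPS, i1), (f1, EPS, f), (f1, EPS, i1), (i, EPS, f)]

def accepts(nfa, x):
    """Simulate the NFA on x by tracking the set of reachable states."""
    start, final, edges = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for src, label, dst in edges:
                if src == s and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in x:
        current = closure({dst for src, label, dst in edges
                           if src in current and label == ch})
    return final in current

# (a|b)*a
nfa = concat(star(union(symbol("a"), symbol("b"))), symbol("a"))
print(accepts(nfa, "abba"), accepts(nfa, "ab"))
```

Each operator adds at most two states, so the NFA grows linearly with the size of the regular expression.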
[Figure: NFA construction for (a|b)*a, step by step: NFAs for a and b, then (a | b), then (a|b)*, then (a|b)*a.]
[Figure: DFA with states S0, S1, S2 and transitions labeled a and b.]
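Syntax tree of (a|b)*a#
[Figure: syntax tree of (a|b)*a#. The root is a concatenation node (•) whose right child is the leaf # (position 4); the leaves a, b, a are positions 1, 2, 3.]
• each symbol is numbered (positions)
• each symbol is at a leaf
• inner nodes are operators
For example, (a|b)*a# has the positions a = 1, b = 2, a = 3, # = 4.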
• If firstpos and lastpos have been computed for each node, followpos
of each position can be computed by making one depth-first traversal
of the syntax tree.
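A depth-first traversal of this kind can be sketched as follows for the tree of (a|b)*a#. The tuple encoding of the nodes and the function name `analyze` are hypothetical, and the rules used for nullable, firstpos and lastpos are the standard textbook ones, assumed here because they are not restated in this section. For brevity the sketch computes firstpos, lastpos and followpos in the same postorder pass.

```python
from collections import defaultdict

# Hypothetical node encoding:
#   ("leaf", symbol, position), ("or", left, right),
#   ("cat", left, right), ("star", child)
def analyze(node, followpos):
    """Return (nullable, firstpos, lastpos); fill followpos on the way up."""
    kind = node[0]
    if kind == "leaf":
        return False, {node[2]}, {node[2]}
    if kind == "star":
        _, first, last = analyze(node[1], followpos)
        # After the last position of r, a new iteration of r may start.
        for p in last:
            followpos[p] |= first
        return True, first, last
    n1, f1, l1 = analyze(node[1], followpos)
    n2, f2, l2 = analyze(node[2], followpos)
    if kind == "or":
        return n1 or n2, f1 | f2, l1 | l2
    # Concatenation: whatever ends the left part is followed by
    # whatever can start the right part.
    for p in l1:
        followpos[p] |= f2
    first = f1 | f2 if n1 else f1
    last = l1 | l2 if n2 else l2
    return n1 and n2, first, last

# (a|b)*a# with positions a=1, b=2, a=3, #=4
tree = ("cat",
        ("cat",
         ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2))),
         ("leaf", "a", 3)),
        ("leaf", "#", 4))

followpos = defaultdict(set)
nullable, firstpos, lastpos = analyze(tree, followpos)
print("firstpos(root) =", firstpos)
for p in (1, 2, 3):
    print(f"followpos({p}) =", followpos[p])
```

Only star and concatenation nodes contribute to followpos; an alternation node merely passes the position sets of its children upward.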
S1=firstpos(root)={1,2,3}
⇓ mark S1
a: followpos(1) ∪ followpos(3)={1,2,3,4}=S2 move(S1,a)=S2
b: followpos(2)={1,2,3}=S1 move(S1,b)=S1
⇓ mark S2
a: followpos(1) ∪ followpos(3)={1,2,3,4}=S2 move(S2,a)=S2
b: followpos(2)={1,2,3}=S1 move(S2,b)=S1
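start state: S1
accepting states: {S2}
[Figure: resulting DFA with states S1 and S2. S1 goes to S2 on a and to itself on b; S2 goes to itself on a and to S1 on b.]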
S1=firstpos(root)={1,2}
⇓ mark S1
a: followpos(1)={2}=S2 move(S1,a)=S2
b: followpos(2)={3,4}=S3 move(S1,b)=S3
⇓ mark S2
b: followpos(2)={3,4}=S3 move(S2,b)=S3
⇓ mark S3
c: followpos(3)={3,4}=S3 move(S3,c)=S3
[Figure: resulting DFA with states S1, S2, S3.]
start state: S1
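The marking procedure used in both examples can be sketched as a worklist loop. The function name `build_dfa` and its parameters are hypothetical; DFA states are represented as frozensets of positions. The followpos tables are copied from the two examples above, and for the second example it is an assumption of this sketch that position 4 is the end marker #.

```python
def build_dfa(firstpos_root, followpos, symbol_at, end_pos):
    """Return (start, transitions, accepting); states are sets of positions."""
    start = frozenset(firstpos_root)
    transitions, unmarked, seen = {}, [start], {start}
    while unmarked:
        state = unmarked.pop()  # "mark" this state
        by_symbol = {}
        for p in state:
            if p != end_pos:
                # Union of followpos(p) over all positions p carrying the symbol
                by_symbol.setdefault(symbol_at[p], set()).update(followpos[p])
        for sym, target in by_symbol.items():
            target = frozenset(target)
            transitions[(state, sym)] = target
            if target not in seen:
                seen.add(target)
                unmarked.append(target)
    accepting = {s for s in seen if end_pos in s}
    return start, transitions, accepting

# First example: (a|b)*a#
start1, trans1, acc1 = build_dfa(
    {1, 2, 3},
    {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: set()},
    {1: "a", 2: "b", 3: "a", 4: "#"},
    end_pos=4)

# Second example: followpos values as listed above (position 4 assumed to be #)
start2, trans2, acc2 = build_dfa(
    {1, 2},
    {1: {2}, 2: {3, 4}, 3: {3, 4}, 4: set()},
    {1: "a", 2: "b", 3: "c", 4: "#"},
    end_pos=4)

for (state, sym), target in trans1.items():
    print(sorted(state), sym, "->", sorted(target))
```

A state is accepting exactly when it contains the position of the end marker, which is why the end marker is appended to the regular expression before the tree is built.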
CS416 Compiler Design 33
Minimizing Number of States of a DFA
• partition the set of states into two groups:
– G1 : set of accepting states
– G2 : set of non-accepting states
• Start state of the minimized DFA is the group containing the start state
of the original DFA.
• Accepting states of the minimized DFA are the groups containing the accepting
states of the original DFA.
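[Figure: minimized DFA with states {1,3} and {2}.]
[Figure: DFA with states 1, 2, 3, 4 to be minimized.]
Groups: {1,2,3} {4}
        a       b
      1->2    1->3
      2->2    2->3
      3->4    3->3
{1,2,3} is split into {1,2} {3}; no more partitioning
[Figure: minimized DFA with states {1,2}, {3}, {4}.]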
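The partitioning can be sketched as a loop that keeps splitting groups until nothing changes. The refinement rule used here is the usual one and is an assumption of this sketch: two states stay in the same group only if, for every input symbol, their transitions lead into the same group. The function name `minimize` is hypothetical, the transitions of states 1 to 3 are those of the table above, and the transitions of state 4 are invented so that the DFA is complete.

```python
def minimize(states, alphabet, delta, accepting):
    """Refine {accepting, non-accepting} until no group can be split."""
    accepting = frozenset(accepting)
    groups = [g for g in (accepting, frozenset(states) - accepting) if g]
    while True:
        group_of = {s: g for g in groups for s in g}
        refined = []
        for g in groups:
            buckets = {}
            for s in sorted(g):
                # Signature: the group reached on each input symbol
                key = tuple(group_of[delta[(s, a)]] for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            refined.extend(frozenset(b) for b in buckets.values())
        if len(refined) == len(groups):  # no group was split
            return set(refined)
        groups = refined

delta = {
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 2, (2, "b"): 3,
    (3, "a"): 4, (3, "b"): 3,
    (4, "a"): 2, (4, "b"): 3,   # assumed: not given in the table above
}
groups = minimize({1, 2, 3, 4}, "ab", delta, accepting={4})
print(sorted(sorted(g) for g in groups))
```

Because refinement only ever splits groups, comparing the number of groups before and after a pass is enough to detect that the partition is stable.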
• What is the end of a token? Is there any character which marks the end
of a token?
– It is normally not defined.
– If the number of characters in a token is fixed, there is no problem: + -
– But after <, the token may be < or <> (in Pascal)
– The end of an identifier: a character that cannot be in an identifier marks the end of the
token.
– We may need a lookahead
• In Prolog: p :- X is 1. p :- X is 1.5.
The dot followed by a white space character can mark the end of a number.
But if that is not the case, the dot must be treated as a part of the number.
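Both cases can be sketched with one character of lookahead. The function names `scan_number` and `scan_relop` are hypothetical, and the number scanner makes a simplifying assumption: the dot is kept as part of the number only when a digit follows it, which covers the two Prolog clauses above but is not a full Prolog tokenizer.

```python
def scan_number(text, pos):
    """Return (lexeme, next_pos) for an unsigned number starting at text[pos]."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    # Lookahead: peek one character past the dot before consuming it.
    if end + 1 < len(text) and text[end] == "." and text[end + 1].isdigit():
        end += 1
        while end < len(text) and text[end].isdigit():
            end += 1
    return text[pos:end], end

def scan_relop(text, pos):
    """Pascal-style: after '<', look at the next character before deciding."""
    if text.startswith("<>", pos):
        return "<>", pos + 2
    return "<", pos + 1

clauses = "p :- X is 1. p :- X is 1.5."
first = clauses.index("1")
second = clauses.index("1.5")
print(scan_number(clauses, first))
print(scan_number(clauses, second))
print(scan_relop("a <> b", 2), scan_relop("a < b", 2))
```

In both scanners the lookahead character is inspected but not consumed unless it turns out to belong to the current token, so the next call starts at the right position.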