Lexical Analysis - Part II From Regular Expression To Scanner Comp 412
FALL 2010
Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved.
Students enrolled in Comp 412 at Rice University have explicit permission to make copies
of these materials for their personal use.
Faculty from other educational institutions may use these materials for nonprofit
educational purposes, provided this copyright notice is preserved.
Quick Review
Last class:
— The scanner is the first stage in the front end
— Specifications can be expressed using regular expressions
— Build tables and code from a DFA
Recognizer for r (0|1|2| … 9) (0|1|2| … 9)*

[Figure: DFA with states S0, S1, S2. S0 goes to S1 on r; S1 goes to S2 on (0|1|2| … 9); S2 loops on (0|1|2| … 9). S2 is the accepting state.]

Skeleton recognizer:

    Char ← next character
    State ← s0
    while (Char ≠ EOF)
        State ← δ(State,Char)
        Char ← next character
    if (State is a final state )
        then report success
        else report failure

Transition table:

    δ     r     0,1,2,3,4,5,6,7,8,9   All others
    s0    s1    se                    se
    s1    se    s2                    se
    s2    se    s2                    se
    se    se    se                    se

Skeleton recognizer with actions:

    Char ← next character
    State ← s0
    while (Char ≠ EOF)
        Next ← δ(State,Char)
        Act ← α(State,Char)
        perform action Act
        State ← Next
        Char ← next character
    if (State is a final state )
        then report success
        else report failure

Combined table (each entry: next state from δ, then action from α):

    δ, α   r            0,1,2,3,4,5,6,7,8,9   All others
    s0     s1  start    se  error             se  error
    s1     se  error    s2  add               se  error
    s2     se  error    s2  add               se  error
    se     se  error    se  error             se  error
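The skeleton recognizer with actions can be sketched in Python roughly as follows. The table entries are taken from the slide; the function and variable names are hypothetical, and the meaning given to the actions (start resets a value, add appends a digit to it) is an assumption of this sketch, since the slide names the actions without defining them.

```python
DIGITS = "0123456789"

def char_class(ch):
    """Map a character to a table column: 'r', 'digit', or 'other'."""
    if ch == "r":
        return "r"
    if ch in DIGITS:
        return "digit"
    return "other"

# TABLE[state][column] -> (next state from delta, action from alpha)
TABLE = {
    "s0": {"r": ("s1", "start"), "digit": ("se", "error"), "other": ("se", "error")},
    "s1": {"r": ("se", "error"), "digit": ("s2", "add"), "other": ("se", "error")},
    "s2": {"r": ("se", "error"), "digit": ("s2", "add"), "other": ("se", "error")},
    "se": {"r": ("se", "error"), "digit": ("se", "error"), "other": ("se", "error")},
}
FINAL_STATES = {"s2"}

def recognize(text):
    """Run the table-driven loop; return (accepted, value)."""
    state = "s0"
    value = None
    for ch in text:                      # the end of the string plays the role of EOF
        next_state, action = TABLE[state][char_class(ch)]
        if action == "start":            # assumed meaning: begin a new number
            value = 0
        elif action == "add":            # assumed meaning: append this digit
            value = value * 10 + int(ch)
        else:                            # "error": discard any partial value
            value = None
        state = next_state
    accepted = state in FINAL_STATES
    return accepted, (value if accepted else None)
```

Calling `recognize("r17")` walks s0, s1, s2, s2 and ends in the accepting state; an input such as `"r"` stops in s1 and is rejected. The loop itself never mentions the language being recognized: all of that lives in the table.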
[Figure: DFA with states S0 through S6. S0 goes to S1 on r; S1 goes to S2 on 0,1,2; S2 goes to S3 on (0|1|2| … 9); S1 goes to S5 on 3; S5 goes to S6 on 0,1; S1 goes to S4 on 4,5,6,7,8,9.]

State / Action table (each entry: next state, then action; e is the error state):

    State   r          0,1       2         3         4,5,6,7,8,9   other
    0       1 start    e         e         e         e             e
    1       e          2 add     2 add     5 add     4 add         e
    2       e          3 add     3 add     3 add     3 add         e exit
    3,4     e          e         e         e         e             e exit
    5       e          6 add     e         e         e             e exit
    6       e          e         e         e         e             x exit
    e       e          e         e         e         e             e

Don't worry (much) about the number of states.
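To illustrate why the number of states matters little, here is a minimal Python sketch of the seven-state table above, using the same kind of driver loop as the three-state recognizer. Only the transitions are encoded; the names are hypothetical. Treating states 2 through 6 as accepting is an assumption of this sketch, based on the exit action in their "other" column.

```python
def column(ch):
    """Map a character to one of the table's six columns."""
    if ch == "r":
        return "r"
    if ch in ("0", "1"):
        return "0,1"
    if ch == "2":
        return "2"
    if ch == "3":
        return "3"
    if ch in ("4", "5", "6", "7", "8", "9"):
        return "4-9"
    return "other"

# Sparse encoding: only non-error entries are stored.
# Any (state, column) pair that is missing leads to the error state "e".
DELTA = {
    0: {"r": 1},
    1: {"0,1": 2, "2": 2, "3": 5, "4-9": 4},
    2: {"0,1": 3, "2": 3, "3": 3, "4-9": 3},
    5: {"0,1": 6},
}
ACCEPTING = {2, 3, 4, 5, 6}   # assumption: the states whose "other" entry is exit

def accepts(text):
    """Same driver loop as before; only the table has changed."""
    state = 0
    for ch in text:
        state = DELTA.get(state, {}).get(column(ch), "e")
    return state in ACCEPTING
```

The cost per input character is still one table lookup, regardless of how many states the table has.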
Where are we going?
• We will show how to construct a finite state automaton
to recognize any RE (this introduces NFAs)
• Overview:
— Direct construction of a nondeterministic finite automaton
(NFA) to recognize a given RE
→ Easy to build in an algorithmic way
→ Requires ε-transitions to combine regular subexpressions
— Construct a deterministic finite automaton (DFA) to simulate
the NFA
→ Use a set-of-states construction
— Minimize the number of states in the DFA (optional, but worthwhile)
→ Hopcroft state minimization algorithm
— Generate the scanner code
→ Additional specifications needed for the actions
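[Figure: DFA with states S0, S1, S2, S3. The path S0 to S1 on a, S1 to S2 on b, S2 to S3 on b runs left to right; the remaining edges are labeled a and b.]
[Figure: NFA with states S0, S1, S2, S3, S4. S0 loops on a|b; S0 goes to S1 on ε; S1 goes to S2 on a; S2 goes to S3 on b; S3 goes to S4 on b.]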
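The set-of-states construction can be tried on the five-state NFA from the figure (S0 loops on a|b, then ε, a, b, b), which recognizes (a|b)*abb. The Python below is a minimal sketch under stated assumptions: the dictionary encoding, the use of the empty string as the ε label, and all function names are choices made for this illustration, not something the slides prescribe.

```python
from collections import deque

EPS = ""  # label standing in for ε in this sketch

# NFA[state][label] -> set of next states (states 0..4 mirror S0..S4)
NFA = {
    0: {"a": {0}, "b": {0}, EPS: {1}},
    1: {"a": {2}},
    2: {"b": {3}},
    3: {"b": {4}},
    4: {},
}
NFA_START, NFA_FINAL = 0, {4}
ALPHABET = "ab"

def eps_closure(states):
    """All NFA states reachable from `states` using only ε-transitions."""
    closure, work = set(states), list(states)
    while work:
        s = work.pop()
        for t in NFA[s].get(EPS, ()):
            if t not in closure:
                closure.add(t)
                work.append(t)
    return frozenset(closure)

def subset_construction():
    """Build a DFA whose states are sets of NFA states."""
    start = eps_closure({NFA_START})
    delta, seen, work = {}, {start}, deque([start])
    while work:
        current = work.popleft()
        for ch in ALPHABET:
            moved = set()
            for s in current:
                moved |= NFA[s].get(ch, set())
            target = eps_closure(moved)
            delta[(current, ch)] = target
            if target not in seen:       # a new DFA state was discovered
                seen.add(target)
                work.append(target)
    finals = {q for q in seen if q & NFA_FINAL}
    return start, delta, finals

def dfa_accepts(text):
    """Run the constructed DFA; characters outside the alphabet are rejected."""
    start, delta, finals = subset_construction()
    state = start
    for ch in text:
        if ch not in ALPHABET:
            return False
        state = delta[(state, ch)]
    return state in finals
```

For this NFA the worklist discovers four sets of NFA states, which matches the four-state DFA in the first figure.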
[Figure: two NFAs joined by an ε-transition. NFA → ε → NFA becomes an NFA.]
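Joining NFAs with ε-transitions is what makes the direct RE-to-NFA construction mechanical. The Python below is a rough sketch in the style of Thompson's construction, which is assumed here as the construction method. The `Frag` class and every function name are hypothetical. Each fragment has one start state and one accepting state, and ε-edges carry the label `None`.

```python
import itertools

_ids = itertools.count()  # source of fresh state numbers

class Frag:
    """An NFA fragment: one start state, one accepting state, a list of edges."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def symbol(ch):
    """NFA for a single character."""
    s, t = next(_ids), next(_ids)
    return Frag(s, t, [(s, ch, t)])

def concat(f, g):
    """NFA, then ε, then NFA: join f's accepting state to g's start state."""
    return Frag(f.start, g.accept, f.edges + g.edges + [(f.accept, None, g.start)])

def alt(f, g):
    """NFA for f|g: a new start and accept state wired in with ε-edges."""
    s, t = next(_ids), next(_ids)
    extra = [(s, None, f.start), (s, None, g.start),
             (f.accept, None, t), (g.accept, None, t)]
    return Frag(s, t, f.edges + g.edges + extra)

def star(f):
    """NFA for f*: ε-edges allow skipping f or repeating it."""
    s, t = next(_ids), next(_ids)
    extra = [(s, None, f.start), (s, None, t),
             (f.accept, None, f.start), (f.accept, None, t)]
    return Frag(s, t, f.edges + extra)

def closure(frag, states):
    """States reachable from `states` through ε-edges only."""
    out, work = set(states), list(states)
    while work:
        s = work.pop()
        for src, label, dst in frag.edges:
            if src == s and label is None and dst not in out:
                out.add(dst)
                work.append(dst)
    return out

def matches(frag, text):
    """Simulate the NFA by tracking the set of states it could be in."""
    current = closure(frag, {frag.start})
    for ch in text:
        step = {dst for src, label, dst in frag.edges
                if src in current and label == ch}
        current = closure(frag, step)
    return frag.accept in current

# (a|b)*abb, built bottom-up from its subexpressions
a_or_b_star = star(alt(symbol("a"), symbol("b")))
nfa = concat(concat(concat(a_or_b_star, symbol("a")), symbol("b")), symbol("b"))
```

The resulting NFA has more states and ε-edges than the hand-drawn one, which is the usual price of a purely mechanical construction. The later conversion to a DFA and the minimization step absorb that overhead.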
Scanner generators
• Lex and Flex work along these lines
• Algorithms are well-known and well-understood
• Key issue is interface to parser (define all parts of speech)
• You could build one in a weekend!