
COMP 412

FALL 2010

Lexical Analysis — Part II


From Regular Expression
to Scanner
Comp 412

Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved.
Students enrolled in Comp 412 at Rice University have explicit permission to make copies
of these materials for their personal use.
Faculty from other educational institutions may use these materials for nonprofit
educational purposes, provided this copyright notice is preserved.
Quick Review

[Figure: the scanner at compile time and at design time]
Compile time: source code → Scanner → parts of speech & words
Design time: specifications → Scanner Generator → tables or code (used by the Scanner)
Represent words as indices into a global table
Specifications written as “regular expressions”

Last class:
—  The scanner is the first stage in the front end
—  Specifications can be expressed using regular expressions
—  Build tables and code from a DFA

Comp 412, Fall 2010 1


Quick Review of Regular Expressions
•  All strings of 1s and 0s ending in a 1
( 0 | 1 )* 1

•  All strings over lowercase letters where the vowels (a, e, i, o, & u) occur exactly once, in ascending order
Let Cons be (b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|y|z)
Cons* a Cons* e Cons* i Cons* o Cons* u Cons*

•  All strings of 1s and 0s that do not contain three 0s in a row:


( 1* ( ε | 01 | 001 ) 1* )* ( ε | 0 | 00 )
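The three review expressions can be tried out with Python's `re` module. This is only a sketch: the names `ENDS_IN_ONE`, `VOWELS_ONCE`, and `NO_THREE_ZEROS` are made up for illustration, ε is written as an empty alternative, and `fullmatch` is used so that the whole string must match.

```python
import re

# All strings of 1s and 0s ending in a 1.
ENDS_IN_ONE = re.compile(r"(0|1)*1")

# Vowels a, e, i, o, u exactly once, in ascending order.
CONS = "(b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|y|z)"
VOWELS_ONCE = re.compile(f"{CONS}*a{CONS}*e{CONS}*i{CONS}*o{CONS}*u{CONS}*")

# No three 0s in a row; the empty alternative "(|01|001)" stands for ε.
NO_THREE_ZEROS = re.compile(r"(1*(|01|001)1*)*(|0|00)")


def matches(pattern, text):
    """True if the entire text is in the language of the pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for word in ["0101", "10", "facetious", "education", "0010", "1000"]:
        print(word,
              matches(ENDS_IN_ONE, word),
              matches(VOWELS_ONCE, word),
              matches(NO_THREE_ZEROS, word))
```

Python's `re` engine is a backtracking matcher, not the table-driven DFA developed in these slides, so it only serves to check which strings the expressions describe.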



Example (from Lab 1)
Consider the problem of recognizing ILOC register names
Register → r (0|1|2| … | 9) (0|1|2| … | 9)*
•  Allows registers of arbitrary number
•  Requires at least one digit

RE corresponds to a recognizer (or DFA)

[Figure: S0 --r--> S1 --(0|1|2| … |9)--> S2, with a self-loop on S2 labeled (0|1|2| … |9); S2 is the accepting state]

Recognizer for Register

Transitions on other inputs go to an error state, se



Example (continued)
DFA operation
•  Start in state S0 & make transitions on each input character
•  DFA accepts a word x iff x leaves it in a final state (S2)

[Figure: S0 --r--> S1 --(0|1|2| … |9)--> S2, with a self-loop on S2 labeled (0|1|2| … |9); S2 is the accepting state]

Recognizer for Register


So,
•  r17 takes it through s0, s1, s2 and accepts
•  r takes it through s0, s1 and fails
•  a takes it straight to se



Example (continued)
To be useful, the recognizer must be converted into code

Skeleton recognizer

    Char ← next character
    State ← s0
    while (Char ≠ EOF)
        State ← δ(State, Char)
        Char ← next character
    if (State is a final state)
        then report success
        else report failure

Table encoding the RE

    δ     r     0,1,2,3,4,5,6,7,8,9    All others
    s0    s1    se                     se
    s1    se    s2                     se
    s2    se    s2                     se
    se    se    se                     se

O(1) cost per character (or per transition)

[Figure: S0 --r--> S1 --(0|1|2| … |9)--> S2, with a self-loop on S2 labeled (0|1|2| … |9)]
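As a rough illustration, the skeleton recognizer and its table might be written in Python as follows. The names `char_class`, `DELTA`, `FINAL`, and `recognize` are invented for this sketch, and the end of the Python string stands in for EOF.

```python
DIGITS = "0123456789"


def char_class(ch):
    """Map a character to its table column: 'r', 'digit', or 'other'."""
    if ch == "r":
        return "r"
    if ch in DIGITS:
        return "digit"
    return "other"


# DELTA[state][column] is the next state; "se" is the error state.
DELTA = {
    "s0": {"r": "s1", "digit": "se", "other": "se"},
    "s1": {"r": "se", "digit": "s2", "other": "se"},
    "s2": {"r": "se", "digit": "s2", "other": "se"},
    "se": {"r": "se", "digit": "se", "other": "se"},
}
FINAL = {"s2"}


def recognize(word):
    """Run the DFA over word: one table lookup per character."""
    state = "s0"
    for ch in word:  # reaching the end of the string plays the role of EOF
        state = DELTA[state][char_class(ch)]
    return state in FINAL


if __name__ == "__main__":
    for word in ["r17", "r", "a"]:
        print(word, recognize(word))
```

The loop body is a single dictionary lookup, which is the O(1) cost per character noted on the slide.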
Example (continued)
We can add “actions” to each transition

Skeleton recognizer

    Char ← next character
    State ← s0
    while (Char ≠ EOF)
        Next ← δ(State, Char)
        Act ← α(State, Char)
        perform action Act
        State ← Next
        Char ← next character
    if (State is a final state)
        then report success
        else report failure

Table encoding RE (each entry gives the next state δ and the action α)

    δ / α    r             0,1,2,3,4,5,6,7,8,9    All others
    s0       s1 / start    se / error             se / error
    s1       se / error    s2 / add               se / error
    s2       se / error    s2 / add               se / error
    se       se / error    se / error             se / error

Typical action is to capture the lexeme
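The recognizer with actions could be sketched in Python along these lines. The slide does not define what each action does, so the behavior here is an assumption: `start` begins a new lexeme, `add` appends the character, and `error` discards the lexeme. The names `TABLE` and `scan_register` are hypothetical.

```python
DIGITS = "0123456789"


def char_class(ch):
    """Map a character to its table column: 'r', 'digit', or 'other'."""
    if ch == "r":
        return "r"
    if ch in DIGITS:
        return "digit"
    return "other"


ERR = ("se", "error")
# TABLE[state][column] = (next state from δ, action from α)
TABLE = {
    "s0": {"r": ("s1", "start"), "digit": ERR, "other": ERR},
    "s1": {"r": ERR, "digit": ("s2", "add"), "other": ERR},
    "s2": {"r": ERR, "digit": ("s2", "add"), "other": ERR},
    "se": {"r": ERR, "digit": ERR, "other": ERR},
}
FINAL = {"s2"}


def scan_register(word):
    """Return the captured lexeme if word is a register name, else None."""
    state, lexeme = "s0", ""
    for ch in word:
        nxt, act = TABLE[state][char_class(ch)]
        if act == "start":      # assumed meaning: begin a new lexeme
            lexeme = ch
        elif act == "add":      # assumed meaning: extend the lexeme
            lexeme += ch
        else:                   # assumed meaning of "error": drop the lexeme
            lexeme = ""
        state = nxt
    return lexeme if state in FINAL else None


if __name__ == "__main__":
    for word in ["r17", "r", "17"]:
        print(word, scan_register(word))
```

Storing δ and α in one table keeps the two lookups for a transition side by side; keeping them as separate tables, as the pseudocode does, would work just as well.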


What if we need a tighter specification?
r Digit Digit* allows arbitrary numbers
•  Accepts r00000
•  Accepts r99999
•  What if we want to limit it to r0 through r31 ?
Write a tighter regular expression
—  Register → r ( (0|1|2) (Digit | ε) | (4|5|6|7|8|9) | (3|30|31) )
—  Register → r0|r1|r2| … |r31|r00|r01|r02| … |r09

Produces a more complex DFA


•  DFA has more states
•  DFA has same cost per transition (or per character)
•  DFA has same basic implementation
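To see exactly which names the tighter expression admits, it can be transcribed into Python's `re` syntax and run over candidate names. This is a sketch with invented names (`REGISTER`, `is_register`); Digit is written as `[0-9]` and ε as an empty alternative.

```python
import re

# r ( (0|1|2) (Digit | ε) | (4|5|6|7|8|9) | (3|30|31) )
REGISTER = re.compile(r"r((0|1|2)([0-9]|)|(4|5|6|7|8|9)|(3|30|31))")


def is_register(name):
    """True if the whole name matches the tighter register specification."""
    return REGISTER.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["r0", "r09", "r31", "r32", "r99999"]:
        print(name, is_register(name))
```

Names such as r00 through r09 are accepted because the first alternative allows a leading 0, 1, or 2 followed by any digit.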



Tighter register specification (continued)

The DFA for


Register → r ( (0|1|2) (Digit | ε) | (4|5|6|7|8|9) | (3|30|31) )

[Figure: DFA for the tighter Register specification]
S0 --r--> S1
S1 --0,1,2--> S2 --(0|1|2| … |9)--> S3
S1 --3--> S5 --0,1--> S6
S1 --4,5,6,7,8,9--> S4

•  Accepts a more constrained set of register names


•  Same set of actions, more states



Tighter register specification (continued)

Table encoding RE for the tighter register specification

    δ     r     0,1   2     3     4-9   All others
    s0    s1    se    se    se    se    se
    s1    se    s2    s2    s5    s4    se
    s2    se    s3    s3    s3    s3    se
    s3    se    se    se    se    se    se
    s4    se    se    se    se    se    se
    s5    se    s6    se    se    se    se
    s6    se    se    se    se    se    se
    se    se    se    se    se    se    se

•  This table runs in the same skeleton recognizer
•  This table uses the same O(1) time per character
•  The extra precision costs us table space, not time

[Figure: DFA for the tighter Register specification, repeated from the previous slide]
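A sketch of this table running in the same skeleton recognizer, in Python. The column names and helper names are invented, and the set of accepting states (s2 through s6) is an assumption inferred from the regular expression, since the extracted table does not mark them.

```python
COLUMNS = ["r", "0,1", "2", "3", "4-9", "other"]


def char_class(ch):
    """Map a single character to one of the six table columns."""
    if ch == "r":
        return "r"
    if ch in ("0", "1"):
        return "0,1"
    if ch in ("2", "3"):
        return ch
    if ch in ("4", "5", "6", "7", "8", "9"):
        return "4-9"
    return "other"


# One row per state, in column order; "se" is the error state.
ROWS = {
    "s0": "s1 se se se se se",
    "s1": "se s2 s2 s5 s4 se",
    "s2": "se s3 s3 s3 s3 se",
    "s3": "se se se se se se",
    "s4": "se se se se se se",
    "s5": "se s6 se se se se",
    "s6": "se se se se se se",
    "se": "se se se se se se",
}
DELTA = {s: dict(zip(COLUMNS, row.split())) for s, row in ROWS.items()}

# Assumption: every state reached after at least one legal digit accepts.
FINAL = {"s2", "s3", "s4", "s5", "s6"}


def recognize(word):
    """Same skeleton as before: one table lookup per input character."""
    state = "s0"
    for ch in word:
        state = DELTA[state][char_class(ch)]
    return state in FINAL


if __name__ == "__main__":
    for word in ["r3", "r31", "r32", "r007"]:
        print(word, recognize(word))
```

Only the table and the character classifier changed; the recognizer loop is identical, which is the point of the slide: more table space, the same time per character.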
Tighter register specification (continued)

Each entry gives the next state and, below it, the action.

    State   r       0,1     2       3       4,5,6,7,8,9   other
    0       1       e       e       e       e             e
            start
    1       e       2       2       5       4             e
                    add     add     add     add
    2       e       3       3       3       3             e
                    add     add     add     add           exit
    3,4     e       e       e       e       e             e
                                                          exit
    5       e       6       e       e       e             e
                    add                                   exit
    6       e       e       e       e       e             x
                                                          exit
    e       e       e       e       e       e             e

[Figure: DFA for the tighter Register specification, repeated from the earlier slide]

We care about path lengths (time) and finite size of set of states (representability), but we don’t worry (much) about number of states.
Where are we going?
•  We will show how to construct a finite state automaton to recognize any RE
•  Overview:
—  Direct construction of a nondeterministic finite automaton (NFA) to recognize a given RE (introduce NFAs)
→  Easy to build in an algorithmic way
→  Requires ε-transitions to combine regular subexpressions
—  Construct a deterministic finite automaton (DFA) to simulate the NFA
→  Use a set-of-states construction
—  Minimize the number of states in the DFA (optional, but worthwhile)
→  Hopcroft state minimization algorithm
—  Generate the scanner code
→  Additional specifications needed for the actions



Non-deterministic Finite Automata
What about an RE such as ( a | b )* abb ?

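[Figure: DFA for ( a | b )* abb with states S0, S1, S2, S3; the path S0 --a--> S1 --b--> S2 --b--> S3 spells abb, with additional edges labeled b, a, a, a]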

Each RE corresponds to a deterministic finite automaton (DFA)


•  We know a DFA exists for each RE
•  The DFA may be hard to build directly
•  Automatic techniques will build it for us …
Nothing here that would change the O(1) cost per transition
Non-deterministic Finite Automata
Here is a simpler recognizer for ( a | b )* abb

[Figure: S0 --ε--> S1 --a--> S2 --b--> S3 --b--> S4, with a self-loop on S1 labeled a|b]

This recognizer is more intuitive


•  Structure seems to follow the RE’s structure
This recognizer is not a DFA
•  S0 has a transition on ε
•  S1 has two transitions on a
This is a non-deterministic finite automaton (NFA)

This NFA needs one more transition, at O(1) cost per transition
Non-deterministic Finite Automata
An NFA accepts a string x iff ∃ a path through the
transition graph from s0 to a final state such that
the edge labels spell x, ignoring ε’s
•  Transitions on ε consume no input
•  To “run” the NFA, start in s0 and guess the right transition at
each step
—  Always guess correctly
—  If some sequence of correct guesses accepts x then accept
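A program cannot literally "guess correctly", so one common way to run an NFA is to follow every possible guess at once by tracking the set of states the NFA could be in. The sketch below does this for the NFA for ( a | b )* abb shown earlier; the dictionary encoding and the names `NFA`, `eps_closure`, and `nfa_accepts` are assumptions made for illustration.

```python
EPS = ""  # label used here for ε-transitions

# NFA for (a|b)* abb: (state, label) -> set of possible next states.
NFA = {
    ("s0", EPS): {"s1"},
    ("s1", "a"): {"s1", "s2"},  # two transitions on a: the nondeterminism
    ("s1", "b"): {"s1"},
    ("s2", "b"): {"s3"},
    ("s3", "b"): {"s4"},
}
START = "s0"
FINAL = {"s4"}


def eps_closure(states):
    """All states reachable from `states` using only ε-transitions."""
    closure, work = set(states), list(states)
    while work:
        s = work.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                work.append(t)
    return closure


def nfa_accepts(word):
    """Accept iff some path from START to a final state spells word."""
    current = eps_closure({START})
    for ch in word:
        moved = set()
        for s in current:
            moved |= NFA.get((s, ch), set())
        current = eps_closure(moved)
    return bool(current & FINAL)


if __name__ == "__main__":
    for word in ["abb", "babb", "ab", "abba"]:
        print(word, nfa_accepts(word))
```

If any sequence of guesses would have reached the final state, that state is in the tracked set when the input ends, so the result agrees with the "always guess correctly" definition.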

Why study NFAs?


•  They are the key to automating the RE→DFA construction
•  We can paste together NFAs with ε-transitions

[Figure: NFA --ε--> NFA becomes an NFA]



Relationship between NFAs and DFAs
DFA is a special case of an NFA
•  DFA has no ε transitions
•  DFA’s transition function is single-valued
•  Same rules will work

DFA can be simulated with an NFA


—  Obviously

NFA can be simulated with a DFA (less obvious)


•  Simulate sets of possible states
•  Possible exponential blowup in the state space
•  Still, one state transition per character in the input stream

Rabin & Scott, 1959
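The "simulate sets of possible states" idea can be sketched as a small set-of-states (subset) construction in Python. Each DFA state is a frozen set of NFA states. The encoding of the NFA and all function names are assumptions for illustration, and the example NFA is the one for ( a | b )* abb from the earlier slide.

```python
EPS = ""  # label used here for ε-transitions

# NFA for (a|b)* abb: (state, label) -> set of possible next states.
NFA = {
    ("s0", EPS): {"s1"},
    ("s1", "a"): {"s1", "s2"},
    ("s1", "b"): {"s1"},
    ("s2", "b"): {"s3"},
    ("s3", "b"): {"s4"},
}


def eps_closure(nfa, states):
    """All NFA states reachable from `states` using only ε-transitions."""
    closure, work = set(states), list(states)
    while work:
        s = work.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                work.append(t)
    return frozenset(closure)


def subset_construction(nfa, start, finals, alphabet):
    """Build a DFA whose states are sets of NFA states."""
    d0 = eps_closure(nfa, {start})
    delta, seen, work = {}, {d0}, [d0]
    while work:
        q = work.pop()
        for ch in alphabet:
            targets = {t for s in q for t in nfa.get((s, ch), ())}
            nxt = eps_closure(nfa, targets)
            delta[(q, ch)] = nxt
            if nxt not in seen:       # a set of NFA states not met before
                seen.add(nxt)
                work.append(nxt)
    dfa_finals = {q for q in seen if q & finals}
    return d0, delta, dfa_finals, seen


def dfa_accepts(d0, delta, dfa_finals, word):
    """Run the constructed DFA: one lookup per input character."""
    state = d0
    for ch in word:
        state = delta[(state, ch)]
    return state in dfa_finals


if __name__ == "__main__":
    d0, delta, finals, states = subset_construction(NFA, "s0", {"s4"}, "ab")
    print(len(states), "DFA states")
    for word in ["abb", "babb", "ab"]:
        print(word, dfa_accepts(d0, delta, finals, word))
```

In the worst case the number of reachable sets grows exponentially with the number of NFA states, which is the blowup the slide warns about; the resulting DFA is also not necessarily minimal.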


Automating Scanner Construction
To convert a specification into code:
1  Write down the RE for the input language
2  Build a big NFA
3  Build the DFA that simulates the NFA
4  Systematically shrink the DFA
5  Turn it into code

Scanner generators
•  Lex and Flex work along these lines
•  Algorithms are well-known and well-understood
•  Key issue is interface to parser (define all parts of speech)
•  You could build one in a weekend!
