Ch3 Modified
CHAPTER 3
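The Reason Why Lexical Analysis Is a Separate Phase
[Figure: interaction of the lexical analyzer with the parser; both can report errors, and both use the symbol table]
The Role of the Lexical Analyzer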
Attributes of Tokens
<id, “y”> <assign, > <num, 31> <‘+’, > <num, 28> <‘*’, > <id, “x”>
Examples of tokens
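Each token in the stream above pairs a token name with an optional attribute value. A minimal C sketch of that representation follows. The names `TokenName`, `Token`, `format_token`, and `format_stream` are hypothetical, and the stream is stored as data rather than produced by a scanner.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token names for the stream above. */
typedef enum { TK_ID, TK_ASSIGN, TK_NUM, TK_PLUS, TK_TIMES } TokenName;

typedef struct {
    TokenName name;
    const char *lexeme; /* attribute of an identifier (NULL otherwise) */
    int value;          /* attribute of a number */
} Token;

/* The token stream shown above, as data. */
static const Token stream[] = {
    {TK_ID, "y", 0},   {TK_ASSIGN, NULL, 0}, {TK_NUM, NULL, 31}, {TK_PLUS, NULL, 0},
    {TK_NUM, NULL, 28}, {TK_TIMES, NULL, 0}, {TK_ID, "x", 0},
};

/* Writes one token as <name, attribute> into buf. */
void format_token(const Token *t, char *buf, size_t size) {
    switch (t->name) {
    case TK_ID:     snprintf(buf, size, "<id, \"%s\">", t->lexeme); break;
    case TK_ASSIGN: snprintf(buf, size, "<assign, >"); break;
    case TK_NUM:    snprintf(buf, size, "<num, %d>", t->value); break;
    case TK_PLUS:   snprintf(buf, size, "<'+', >"); break;
    case TK_TIMES:  snprintf(buf, size, "<'*', >"); break;
    }
}

/* Writes the whole stream, tokens separated by single spaces. */
void format_stream(char *out, size_t size) {
    char buf[32];
    assert(size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof stream / sizeof stream[0]; i++) {
        format_token(&stream[i], buf, sizeof buf);
        if (i > 0) strncat(out, " ", size - strlen(out) - 1);
        strncat(out, buf, size - strlen(out) - 1);
    }
}
```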
The "lexemeBegin" pointer marks the start of the current lexeme, like putting your finger at the start of a word you're about to read.
The "forward" pointer scans ahead until it finds the end of the lexeme, like your eyes scanning ahead to find the end of that word.
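The two pointers can be sketched in C as indices into a buffer. This is an illustration under simplifying assumptions: the hypothetical `next_lexeme` treats a run of letters and digits, or any single other character, as one lexeme, and it skips only blanks.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copies the next lexeme of buf (starting the search at pos) into out and
   returns the position just after it. lexemeBegin marks the first character
   of the lexeme; forward scans ahead until the character no longer belongs. */
size_t next_lexeme(const char *buf, size_t pos, char *out, size_t size) {
    size_t lexemeBegin = pos;
    while (buf[lexemeBegin] == ' ') lexemeBegin++;      /* skip blanks */
    size_t forward = lexemeBegin;
    if (isalnum((unsigned char)buf[forward])) {
        while (isalnum((unsigned char)buf[forward])) forward++;  /* scan ahead */
    } else if (buf[forward] != '\0') {
        forward++;                                        /* one-character lexeme */
    }
    size_t len = forward - lexemeBegin;
    assert(len < size);
    memcpy(out, buf + lexemeBegin, len);
    out[len] = '\0';
    return forward;  /* the search for the next lexeme starts here */
}
```

Calling it repeatedly on y = 31 + 28*x yields y, =, 31, +, 28, *, x and finally an empty lexeme at the end of the buffer; a real scanner would also decide which token each lexeme belongs to.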
Terms for Parts of Strings
Tokens: Definitions
$s^0 = \epsilon$ (the empty string)
$s^i = s^{i-1}s$ for $i > 0$
Note that $s\epsilon = \epsilon s = s$.
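The recursive definition of $s^i$ translates directly into a loop. A small C sketch; `str_power` is a hypothetical helper, and the caller must supply a buffer of at least i·|s| + 1 bytes.

```c
#include <assert.h>
#include <string.h>

/* Computes s^i following s^0 = ε and s^k = s^(k-1) s, writing it to out. */
void str_power(const char *s, unsigned i, char *out, size_t size) {
    size_t n = strlen(s);
    assert((size_t)i * n < size);
    out[0] = '\0';                             /* s^0 = ε */
    for (unsigned k = 1; k <= i; k++)          /* s^k = s^(k-1) s */
        memcpy(out + (k - 1) * n, s, n + 1);   /* also copies the terminator */
}
```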
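Specification of Patterns for Tokens
Union
$L \cup M = \{s \mid s \in L \text{ or } s \in M\}$
Concatenation
$LM = \{xy \mid x \in L \text{ and } y \in M\}$
Exponentiation
$L^0 = \{\epsilon\}$; $L^i = L^{i-1}L$
Kleene closure
$L^* = \bigcup_{i=0,\ldots,\infty} L^i$
Positive closure
$L^+ = \bigcup_{i=1,\ldots,\infty} L^i$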
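For finite languages, union, concatenation, and exponentiation can be computed outright. A C sketch, assuming a hypothetical fixed-capacity set type `Lang` (at most 64 strings, each shorter than 16 characters). The closures are infinite whenever L contains a non-empty string, so only $L^i$ for a given i is computed.

```c
#include <assert.h>
#include <string.h>

#define MAXW 64   /* hypothetical capacity: strings per language */
#define MAXLEN 16 /* each string is shorter than this */

typedef struct { int n; char w[MAXW][MAXLEN]; } Lang;

int lang_has(const Lang *L, const char *s) {
    for (int k = 0; k < L->n; k++)
        if (strcmp(L->w[k], s) == 0) return 1;
    return 0;
}

/* Adds s unless already present: a language is a set. */
static void add(Lang *L, const char *s) {
    if (lang_has(L, s)) return;
    assert(L->n < MAXW && strlen(s) < MAXLEN);
    strcpy(L->w[L->n++], s);
}

/* L ∪ M = { s | s in L or s in M }; out must not alias L or M. */
void lang_union(const Lang *L, const Lang *M, Lang *out) {
    out->n = 0;
    for (int i = 0; i < L->n; i++) add(out, L->w[i]);
    for (int i = 0; i < M->n; i++) add(out, M->w[i]);
}

/* LM = { xy | x in L and y in M }; out must not alias L or M. */
void lang_concat(const Lang *L, const Lang *M, Lang *out) {
    char buf[2 * MAXLEN];
    out->n = 0;
    for (int i = 0; i < L->n; i++)
        for (int j = 0; j < M->n; j++) {
            strcpy(buf, L->w[i]);
            strcat(buf, M->w[j]);
            add(out, buf);
        }
}

/* L^0 = {ε}; L^k = L^(k-1) L */
void lang_power(const Lang *L, unsigned i, Lang *out) {
    Lang prev;
    out->n = 0;
    add(out, "");                   /* L^0 = {ε} */
    for (unsigned k = 1; k <= i; k++) {
        prev = *out;
        lang_concat(&prev, L, out);
    }
}
```

A bounded approximation of $L^*$ is the union of `lang_power(L, 0, ...)` through `lang_power(L, k, ...)` for some chosen k.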
Specification of Patterns for Tokens
Basis symbols:
$\epsilon$ is a regular expression denoting the language $\{\epsilon\}$
$a \in \Sigma$ is a regular expression denoting $\{a\}$
If r and s are regular expressions denoting languages L(r) and M(s) respectively, then
$r \mid s$ is a regular expression denoting $L(r) \cup M(s)$
$rs$ is a regular expression denoting $L(r)M(s)$
$r^*$ is a regular expression denoting $L(r)^*$
$(r)$ is a regular expression denoting $L(r)$
A language defined by a regular expression is called a regular set.
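The inductive rules above define L(r) case by case, and a matcher can follow them literally. A C sketch, assuming a hypothetical syntax-tree type `Re` and inputs shorter than 64 characters: `ends` returns, as a bit set, every position j such that the input from `start` up to j is in the language of the subexpression. It is for illustration only; lexical analyzer generators use automata instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { EPS, SYM, ALT, CAT, STAR } Kind;
typedef struct Re { Kind kind; char c; const struct Re *l, *r; } Re;

/* Bit j of the result is set iff s[start..j) is in L(re). */
static uint64_t ends(const Re *re, const char *s, size_t n, size_t start) {
    uint64_t out = 0;
    switch (re->kind) {
    case EPS:                                  /* L(ε) = {ε} */
        return UINT64_C(1) << start;
    case SYM:                                  /* L(a) = {a} */
        return (start < n && s[start] == re->c) ? UINT64_C(1) << (start + 1) : 0;
    case ALT:                                  /* L(r|s) = L(r) ∪ L(s) */
        return ends(re->l, s, n, start) | ends(re->r, s, n, start);
    case CAT: {                                /* L(rs) = L(r)L(s) */
        uint64_t mid = ends(re->l, s, n, start);
        for (size_t j = start; j <= n; j++)
            if (mid >> j & 1) out |= ends(re->r, s, n, j);
        return out;
    }
    case STAR: {                               /* L(r*): repeat r until no new ends */
        uint64_t seen = UINT64_C(1) << start, frontier = seen;
        while (frontier) {
            uint64_t next = 0;
            for (size_t j = start; j <= n; j++)
                if (frontier >> j & 1) next |= ends(re->l, s, n, j);
            frontier = next & ~seen;
            seen |= next;
        }
        return seen;
    }
    }
    return out;
}

/* 1 iff the whole string s is in L(re). */
int re_match(const Re *re, const char *s) {
    size_t n = strlen(s);
    assert(n < 64);
    return (int)(ends(re, s, n, 0) >> n & 1);
}

/* Example tree for (a|b)*abb, built by hand. */
static const Re ra = {SYM, 'a', NULL, NULL}, rb = {SYM, 'b', NULL, NULL};
static const Re ra_or_b = {ALT, 0, &ra, &rb};
static const Re rstar = {STAR, 0, &ra_or_b, NULL};
static const Re r1 = {CAT, 0, &rstar, &ra};
static const Re r2 = {CAT, 0, &r1, &rb};
const Re example_re = {CAT, 0, &r2, &rb};
```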
Algebraic Laws for Regular Expressions
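Specification of Patterns for Tokens: Regular Definitions
A regular definition is a sequence of definitions of the form
$d_1 \to r_1$
$d_2 \to r_2$
$\ldots$
$d_n \to r_n$
where:
Each $d_i$ is a new symbol, not in $\Sigma$ and not the same as any other of the d's, and
Each $r_i$ is a regular expression over the alphabet $\Sigma \cup \{d_1, d_2, \ldots, d_{i-1}\}$.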
Specification of Patterns for Tokens
Example:
letter → A | B | … | Z | a | b | … | z
digit → 0 | 1 | … | 9
id → letter ( letter | digit )*
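The definition of id can be checked by hand, one character class at a time. A minimal C sketch; `is_id` and its helpers are hypothetical names and recognize ASCII letters only.

```c
#include <assert.h>
#include <stddef.h>

/* letter -> A | B | ... | Z | a | b | ... | z */
static int is_letter(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
/* digit -> 0 | 1 | ... | 9 */
static int is_digit(char c) { return c >= '0' && c <= '9'; }

/* id -> letter ( letter | digit )* : 1 iff the whole string s is an id */
int is_id(const char *s) {
    assert(s != NULL);
    if (!is_letter(*s)) return 0;            /* must start with a letter */
    for (s++; *s; s++)
        if (!is_letter(*s) && !is_digit(*s)) return 0;
    return 1;
}
```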
Examples:
digit → [0-9]
num → digit+ (. digit+)? ( E (+|-)? digit+ )?
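The optional parts of num map onto optional branches in code. A C sketch of a whole-string check; `is_num` is a hypothetical name, and only an uppercase E is accepted, as in the definition.

```c
#include <assert.h>
#include <stddef.h>

static int is_digit(char c) { return c >= '0' && c <= '9'; }

/* Consumes digit+ at *p; returns 0 if there is not at least one digit. */
static int digits(const char **p) {
    if (!is_digit(**p)) return 0;
    while (is_digit(**p)) (*p)++;
    return 1;
}

/* num -> digit+ (. digit+)? ( E (+|-)? digit+ )? : 1 iff all of s matches */
int is_num(const char *s) {
    assert(s != NULL);
    if (!digits(&s)) return 0;               /* digit+ */
    if (*s == '.') {                         /* (. digit+)? */
        s++;
        if (!digits(&s)) return 0;
    }
    if (*s == 'E') {                         /* ( E (+|-)? digit+ )? */
        s++;
        if (*s == '+' || *s == '-') s++;
        if (!digits(&s)) return 0;
    }
    return *s == '\0';
}
```

Inputs such as 3. and 1E are rejected because a dot or an E must be followed by at least one digit.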
Regular Definitions and Grammars
Grammar:
stmt → if expr then stmt
     | if expr then stmt else stmt
expr → term relop term
     | term
term → id
     | num
Regular definitions:
if → if
then → then
else → else
relop → < | <= | <> | > | >= | =     (relational operators)
id → letter ( letter | digit )*
num → digit+ (. digit+)? ( E (+|-)? digit+ )?
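The lexemes if, then, and else also match the id pattern. One common way to resolve this (an assumption here, since the slides list the keywords as separate patterns) is to match id first and then consult a reserved-word table. A C sketch with hypothetical names `Tok`, `reserved`, and `classify_word`:

```c
#include <assert.h>
#include <string.h>

typedef enum { TOK_IF, TOK_THEN, TOK_ELSE, TOK_ID } Tok;

/* Hypothetical reserved-word table for the keywords of the grammar above. */
static const struct { const char *lexeme; Tok tok; } reserved[] = {
    {"if", TOK_IF}, {"then", TOK_THEN}, {"else", TOK_ELSE},
};

/* Classifies a lexeme that already matched id -> letter ( letter | digit )*:
   a keyword takes precedence over the general id pattern. */
Tok classify_word(const char *lexeme) {
    assert(lexeme != NULL);
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strcmp(reserved[i].lexeme, lexeme) == 0) return reserved[i].tok;
    return TOK_ID;
}
```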
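Coding Regular Definitions in Transition Diagrams
relop → < | <= | <> | > | >= | =
Transition diagram for relop (start state 0; a * marks a state that retracts the forward pointer by one character):
0 on < → 1
1 on = → 2: return(relop, LE)
1 on > → 3: return(relop, NE)
1 on other → 4*: return(relop, LT)
0 on = → 5: return(relop, EQ)
0 on > → 6
6 on = → 7: return(relop, GE)
6 on other → 8*: return(relop, GT)
id → letter ( letter | digit )*
Transition diagram for id: 9 on letter → 10; 10 on letter or digit → 10; 10 on other → 11*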
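The relop diagram can be coded as a small state machine. A C sketch, assuming a hypothetical `relop` function that reads a NUL-terminated string and uses an index as the forward pointer; the retracting states 4 and 8 give back the extra character they read.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { LT, LE, EQ, NE, GT, GE, NOT_RELOP } Relop;

/* Runs the relop diagram on s from *forward. On success returns the attribute
   and leaves *forward just past the lexeme; on failure *forward is unchanged. */
Relop relop(const char *s, size_t *forward) {
    assert(s != NULL && forward != NULL);
    size_t f = *forward;
    int state = 0;
    for (;;) {
        char c = s[f++];                                   /* nextchar() */
        switch (state) {
        case 0:
            if (c == '<') state = 1;
            else if (c == '=') { *forward = f; return EQ; } /* state 5 */
            else if (c == '>') state = 6;
            else return NOT_RELOP;                          /* fail */
            break;
        case 1:
            if (c == '=') { *forward = f; return LE; }      /* state 2 */
            if (c == '>') { *forward = f; return NE; }      /* state 3 */
            *forward = f - 1; return LT;                    /* state 4*: retract */
        case 6:
            if (c == '=') { *forward = f; return GE; }      /* state 7 */
            *forward = f - 1; return GT;                    /* state 8*: retract */
        }
    }
}
```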
What Else Does the Lexical Analyzer Do?
token nexttoken()
{ while (1) {
    switch (state) {
    case 0: c = nextchar();
      if (c==blank || c==tab || c==newline) {
        state = 0;
        lexeme_beginning++;
      }
      else if (c=='<') state = 1;
      else if (c=='=') state = 5;
      else if (c=='>') state = 6;
      else state = fail();
      break;
    case 1:
      …
    case 9: c = nextchar();
      if (isletter(c)) state = 10;
      else state = fail();
      break;
    case 10: c = nextchar();
      if (isletter(c)) state = 10;
      else if (isdigit(c)) state = 10;
      else state = 11;
      break;
    …

int fail()   /* decides the next start state to check */
{ forward = lexeme_beginning;
  switch (start) {
    case 0: start = 9; break;
    case 9: start = 12; break;
    case 12: start = 20; break;
    case 20: start = 25; break;
    case 25: recover(); break;
    default: /* error */ break;
  }
  return start;
}
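The role of fail() is to reset forward to lexeme_beginning and move on to the next diagram's start state. A condensed C sketch of that control flow, assuming hypothetical diagram functions that return the length of the lexeme they recognize (0 on failure) instead of stepping through numbered states; the num diagram here accepts digit+ only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { T_RELOP, T_ID, T_NUM, T_ERROR } TokKind;

/* Hypothetical diagrams: length of the lexeme recognized at p, or 0. */
static size_t relop_len(const char *p) {
    if (p[0] == '<') return (p[1] == '=' || p[1] == '>') ? 2 : 1;
    if (p[0] == '>') return p[1] == '=' ? 2 : 1;
    return p[0] == '=' ? 1 : 0;
}
static size_t id_len(const char *p) {
    size_t n = 0;
    if (!isalpha((unsigned char)p[0])) return 0;
    while (isalnum((unsigned char)p[n])) n++;
    return n;
}
static size_t num_len(const char *p) {   /* digit+ only, for brevity */
    size_t n = 0;
    while (isdigit((unsigned char)p[n])) n++;
    return n;
}

/* Skip white space, then try each diagram in order; after a failure, forward
   is reset to lexeme_beginning and the next start state is tried. */
TokKind next_token(const char *buf, size_t *lexeme_beginning, size_t *forward) {
    static size_t (*const diagrams[])(const char *) = {relop_len, id_len, num_len};
    static const TokKind kinds[] = {T_RELOP, T_ID, T_NUM};
    assert(buf != NULL);
    while (buf[*lexeme_beginning] == ' ' || buf[*lexeme_beginning] == '\t' ||
           buf[*lexeme_beginning] == '\n')
        (*lexeme_beginning)++;
    for (size_t start = 0; start < 3; start++) {
        *forward = *lexeme_beginning;                /* what fail() does */
        size_t n = diagrams[start](buf + *lexeme_beginning);
        if (n > 0) { *forward += n; return kinds[start]; }
    }
    return T_ERROR;                                  /* recover() would go here */
}
```

After each call, the caller sets lexeme_beginning to forward before asking for the next token.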
The Lex and Flex Scanner Generators
lex source program lex.l → lex (or flex) → lex.yy.c
lex.yy.c → C compiler → a.out
input stream → a.out → sequence of tokens
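LEX Specification
Definitions:
digit   [0-9]
id      {letter}({letter}|{digit})*
......
%%
Rules:
if      { return(IF); }
then    { return(THEN); }

Example:
%{
#include <stdio.h>
%}
%%
[0-9]+  { printf("%s\n", yytext); }
.|\n    { }
%%
int main(void)
{ yylex();   /* invokes the lexical analyzer */
  return 0;
}
In the translation rules, yytext contains the matching lexeme: [0-9]+ prints each integer in the input, and .|\n matches any other character and does nothing.

Build and run:
lex spec.l
gcc lex.yy.c -ll
./a.out < spec.l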
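To see what the generated scanner does with the example rules, here is a hand-written C approximation (not Lex output): it echoes each longest run of digits on its own line and discards every other character. The name `print_numbers` is hypothetical, and output goes to a buffer instead of stdout so it can be checked.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* [0-9]+ : longest run of digits, echoed with a newline.
   .|\n   : any other single character, discarded. */
void print_numbers(const char *in, char *out, size_t size) {
    size_t used = 0;
    assert(size > 0);
    out[0] = '\0';
    for (const char *p = in; *p; ) {
        if (isdigit((unsigned char)*p)) {
            const char *begin = p;                    /* like yytext */
            while (isdigit((unsigned char)*p)) p++;   /* longest match */
            int n = snprintf(out + used, size - used, "%.*s\n", (int)(p - begin), begin);
            assert(n >= 0 && (size_t)n < size - used);
            used += (size_t)n;
        } else {
            p++;                                      /* empty action */
        }
    }
}
```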
[Figure: lexical analyzer generator: regular expressions → NFA → DFA, with the DFA step marked optional]
Automata
Transition Graph
S = {0,1,2,3}
Σ = {a,b}
s0 = 0
F = {3}
start → 0; 0 on a or b → 0 (self-loop); 0 on a → 1; 1 on b → 2; 2 on b → 3
Transition Table
δ(0,a) = {0,1}
δ(0,b) = {0}
δ(1,b) = {2}
δ(2,b) = {3}

State | Input a | Input b
0     | {0, 1}  | {0}
1     |         | {2}
2     |         | {3}
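The transition table can drive a direct simulation that tracks the set of current states. A C sketch using bit sets (`delta` and `nfa_accepts` are hypothetical names); this NFA accepts exactly the strings over {a, b} that end in abb.

```c
#include <assert.h>
#include <stdint.h>

/* delta[state][symbol] as a bit set of targets; symbol 0 = 'a', 1 = 'b'.
   Missing table entries are the empty set. */
static const uint8_t delta[4][2] = {
    /* 0 */ {1u << 0 | 1u << 1, 1u << 0},
    /* 1 */ {0, 1u << 2},
    /* 2 */ {0, 1u << 3},
    /* 3 */ {0, 0},
};

/* Tracks every state reachable so far (this NFA has no ε-moves) and accepts
   iff the final set contains a state of F = {3}. */
int nfa_accepts(const char *s) {
    uint8_t current = 1u << 0;                  /* s0 = 0 */
    for (; *s; s++) {
        assert(*s == 'a' || *s == 'b');         /* Σ = {a, b} */
        uint8_t next = 0;
        for (int q = 0; q < 4; q++)
            if (current >> q & 1) next |= delta[q][*s - 'a'];
        current = next;
    }
    return (current >> 3 & 1) != 0;             /* F = {3} */
}
```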
The Language Defined by an NFA
[Figure: the NFA is converted into a DFA by the subset construction]
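From Regular Expression to NFA (Thompson's Construction)
ε: start → i; i on ε → f
a: start → i; i on a → f
r1 | r2: start → i; i on ε → N(r1) and i on ε → N(r2); the accepting states of N(r1) and N(r2) go on ε → f
r1 r2: start → i, through N(r1) and then N(r2), to f (the accepting state of N(r1) becomes the start state of N(r2))
r*: start → i; i on ε → N(r) and N(r) on ε → f; further ε-edges go from i to f (zero repetitions) and from the end of N(r) back to its start (repetition)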
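Thompson's construction can be sketched in C as follows. Assumptions: the expression is given in postfix form ('.' for concatenation) to avoid writing a parser, and concatenation adds an ε-edge from the final state of N(r1) to the start of N(r2) instead of merging the two states, a common variant that accepts the same language. All names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAXS 128
enum { EPS = -1 };      /* sym of a state whose out-edges are ε-edges */

/* One edge on a symbol (to[0]) or up to two ε-edges; -1 means no edge. */
typedef struct { int sym; int to[2]; } State;
typedef struct { int start, final; } Frag;   /* N(r): start i, final f */

static State st[MAXS];
static int nstates;

static int new_state(int sym) {
    assert(nstates < MAXS);
    st[nstates].sym = sym;
    st[nstates].to[0] = st[nstates].to[1] = -1;
    return nstates++;
}

/* Postfix: '.' = concatenation, '|' = union, '*' = closure, else a symbol.
   Final states start with no out-edges and get ε-edges only when wrapped. */
Frag thompson(const char *postfix) {
    Frag stack[MAXS], f, r1, r2;
    int sp = 0;
    nstates = 0;
    for (const char *p = postfix; *p; p++) {
        switch (*p) {
        case '.':                                   /* r1 r2 */
            assert(sp >= 2);
            r2 = stack[--sp]; r1 = stack[--sp];
            st[r1.final].to[0] = r2.start;
            f.start = r1.start; f.final = r2.final;
            break;
        case '|':                                   /* r1 | r2 */
            assert(sp >= 2);
            r2 = stack[--sp]; r1 = stack[--sp];
            f.start = new_state(EPS); f.final = new_state(EPS);
            st[f.start].to[0] = r1.start; st[f.start].to[1] = r2.start;
            st[r1.final].to[0] = f.final; st[r2.final].to[0] = f.final;
            break;
        case '*':                                   /* r* */
            assert(sp >= 1);
            r1 = stack[--sp];
            f.start = new_state(EPS); f.final = new_state(EPS);
            st[f.start].to[0] = r1.start; st[f.start].to[1] = f.final;   /* enter or skip */
            st[r1.final].to[0] = r1.start; st[r1.final].to[1] = f.final; /* repeat or leave */
            break;
        default:                                    /* i on a -> f */
            f.start = new_state((unsigned char)*p); f.final = new_state(EPS);
            st[f.start].to[0] = f.final;
            break;
        }
        assert(sp < MAXS);
        stack[sp++] = f;
    }
    assert(sp == 1);
    return stack[0];
}

/* Adds s and every state reachable from it by ε-edges to in[]. */
static void closure(int s, char in[]) {
    if (s < 0 || in[s]) return;
    in[s] = 1;
    if (st[s].sym == EPS) { closure(st[s].to[0], in); closure(st[s].to[1], in); }
}

/* Simulates N(r): ε-closure, then move on each symbol followed by ε-closure. */
int nfa_match(Frag nfa, const char *s) {
    char cur[MAXS] = {0}, next[MAXS];
    closure(nfa.start, cur);
    for (; *s; s++) {
        memset(next, 0, sizeof next);
        for (int q = 0; q < nstates; q++)
            if (cur[q] && st[q].sym == (unsigned char)*s) closure(st[q].to[0], next);
        memcpy(cur, next, sizeof cur);
    }
    return cur[nfa.final];
}
```

In this postfix notation, (a|b)*abb is written ab|*a.b.b. and a* is written a*. The state table is global, so each call to thompson replaces the previous NFA.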
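Combining the NFAs of a Set of Regular Expressions
a      { action1 }    NFA: start → 1; 1 on a → 2
abb    { action2 }    NFA: start → 3; 3 on a → 4; 4 on b → 5; 5 on b → 6
a*b+   { action3 }    NFA: start → 7; 7 on a → 7; 7 on b → 8; 8 on b → 8
Combined NFA: a new start state 0 with ε-edges to 1, 3, and 7.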
Simulating the Combined NFA
Example 1
Accepting states: 2 (action1), 6 (action2), 8 (action3).
Input: a a b a
State sets: {0,1,3,7} on a → {2,4,7} on a → {7} on b → {8} on a → none
Must find the longest match: continue until no further moves are possible, then execute the action of the last accepting state reached (here state 8 after aab, so action3).
Simulating the Combined NFA
Example 2
Input: a b b a
State sets: {0,1,3,7} on a → {2,4,7} on b → {5,8} on b → {6,8} on a → none
The set {6,8} contains two accepting states, 6 (action2) and 8 (action3).
When two or more accepting states are reached, the first action given in the Lex specification is executed (here action2).
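Both examples can be reproduced by simulating the combined NFA. A C sketch with bit sets; `longest_match` is a hypothetical name. It returns the action number together with the length of the longest match, and it prefers the first-listed pattern when several accepting states are present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* move_tab[q][c]: targets of state q on 'a' (c = 0) or 'b' (c = 1). */
static const uint16_t move_tab[9][2] = {
    [1] = {1u << 2, 0}, [3] = {1u << 4, 0}, [4] = {0, 1u << 5},
    [5] = {0, 1u << 6}, [7] = {1u << 7, 1u << 8}, [8] = {0, 1u << 8},
};
/* Accepting states in the order their patterns appear: a, abb, a*b+. */
static const int accept_state[3] = {2, 6, 8};

/* Returns the action (1..3, or 0 if nothing matches) and stores the length
   of the longest match. Reading stops when no further moves are possible. */
int longest_match(const char *s, size_t *len) {
    assert(s != NULL && len != NULL);
    uint16_t set = 1u << 0 | 1u << 1 | 1u << 3 | 1u << 7;   /* ε-closure({0}) */
    int action = 0;
    *len = 0;
    for (size_t i = 0;; i++) {
        for (int k = 0; k < 3; k++)          /* first-listed pattern wins */
            if (set >> accept_state[k] & 1) { action = k + 1; *len = i; break; }
        if (s[i] != 'a' && s[i] != 'b') break;   /* end of input or other symbol */
        uint16_t next = 0;
        for (int q = 0; q < 9; q++)
            if (set >> q & 1) next |= move_tab[q][s[i] - 'a'];
        if (next == 0) break;                /* no further moves possible */
        set = next;
    }
    return action;
}
```

On aaba it reports action3 with length 3, and on abba action2 with length 3, matching the traces above; a scanner would then restart at that position.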
Example DFA
start → 0; accepting state 3
State | a | b
0     | 1 | 0
1     | 1 | 2
2     | 1 | 3
3     | 1 | 0
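A DFA needs only one table lookup per input symbol. A C sketch of a table-driven run, assuming the transitions δ(0,a)=1, δ(0,b)=0, δ(1,a)=1, δ(1,b)=2, δ(2,a)=1, δ(2,b)=3, δ(3,a)=1, δ(3,b)=0 with accepting state 3 (the standard DFA for (a|b)*abb); `dtran` and `dfa_accepts` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Rows are states 0..3, columns are 'a' and 'b'. */
static const int dtran[4][2] = {
    /* 0 */ {1, 0},
    /* 1 */ {1, 2},
    /* 2 */ {1, 3},
    /* 3 */ {1, 0},
};

/* Exactly one move per input symbol; accept iff the run ends in state 3. */
int dfa_accepts(const char *s) {
    assert(s != NULL);
    int state = 0;                               /* start state */
    for (; *s; s++) {
        if (*s != 'a' && *s != 'b') return 0;    /* symbol outside Σ */
        state = dtran[state][*s - 'a'];
    }
    return state == 3;
}
```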
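Conversion of an NFA into a DFA
Applied to the combined NFA for a { a1 }, abb { a2 }, a*b+ { a3 }:
Dstates:
A = {0,1,3,7} (start state)
B = {2,4,7}, accepting: a1
C = {8}, accepting: a3
D = {7}
E = {5,8}, accepting: a3
F = {6,8}, accepting: a2 (state 6 for abb takes priority over state 8 for a*b+)
State | a | b
A     | B | C
B     | D | E
C     | - | C
D     | D | C
E     | - | F
F     | - | C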
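The subset construction that produces A to F can be sketched in C with bit sets for sets of NFA states. The names are hypothetical; `dtran` records -1 where no edge is drawn (the empty set).

```c
#include <assert.h>
#include <stdint.h>

#define NQ 9
typedef uint16_t Set;   /* bit q set = NFA state q is in the set */

/* Combined NFA: eps[q] are ε-successors; mv[q][c] are successors on
   'a' (c = 0) or 'b' (c = 1). */
static const Set eps[NQ] = {[0] = 1u << 1 | 1u << 3 | 1u << 7};
static const Set mv[NQ][2] = {
    [1] = {1u << 2, 0}, [3] = {1u << 4, 0}, [4] = {0, 1u << 5},
    [5] = {0, 1u << 6}, [7] = {1u << 7, 1u << 8}, [8] = {0, 1u << 8},
};

/* ε-closure(T): add ε-successors until nothing changes. */
static Set eps_closure(Set t) {
    Set prev;
    do {
        prev = t;
        for (int q = 0; q < NQ; q++)
            if (t >> q & 1) t |= eps[q];
    } while (t != prev);
    return t;
}

/* move(T, c): states reachable from T on symbol c. */
static Set move_set(Set t, int c) {
    Set r = 0;
    for (int q = 0; q < NQ; q++)
        if (t >> q & 1) r |= mv[q][c];
    return r;
}

#define MAXD 16
Set dstates[MAXD];      /* Dstates in discovery order */
int dtran[MAXD][2];     /* Dtran[T][c], -1 for the empty set */
int ndstates;

static int find_or_add(Set t) {
    for (int i = 0; i < ndstates; i++)
        if (dstates[i] == t) return i;
    assert(ndstates < MAXD);
    dstates[ndstates] = t;
    return ndstates++;
}

/* Start with ε-closure({0}); for each unmarked state T (index >= i) and each
   symbol c, U = ε-closure(move(T, c)) becomes a new state if not seen. */
void subset_construction(void) {
    ndstates = 0;
    find_or_add(eps_closure(1u << 0));
    for (int i = 0; i < ndstates; i++)
        for (int c = 0; c < 2; c++) {
            Set u = eps_closure(move_set(dstates[i], c));
            dtran[i][c] = u ? find_or_add(u) : -1;
        }
}
```

The states are discovered in the order A, B, C, D, E, F, so indices 0 to 5 line up with the letters used above.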
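Minimizing the Number of States of a DFA
Before (start state A, accepting state E):
State | a | b
A     | B | C
B     | B | D
C     | B | C
D     | B | E
E     | B | C
After (the equivalent states A and C are merged into AC):
State | a | b
AC    | B | AC
B     | B | D
D     | B | E
E     | B | AC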
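A sketch of state minimization by partition refinement in C: start from the accepting and non-accepting groups and keep splitting any group whose members move to different groups on some symbol. The transition table is the five-state DFA above with accepting state E (treat it as an assumption, since it is reconstructed from the figure), and `minimize` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define N 5                       /* states A..E are 0..4 */
static const int delta[N][2] = {  /* columns: 'a', 'b' */
    /* A */ {1, 2}, /* B */ {1, 3}, /* C */ {1, 2}, /* D */ {1, 4}, /* E */ {1, 2},
};
static const int accepting[N] = {0, 0, 0, 0, 1};   /* F = {E} */

/* On return group[q] names the block of state q; returns the block count. */
int minimize(int group[N]) {
    assert(group != NULL);
    int prev = 0;
    for (int q = 0; q < N; q++) group[q] = accepting[q];   /* {S - F, F} */
    for (;;) {
        int next[N], blocks = 0;
        for (int q = 0; q < N; q++) {
            next[q] = -1;
            /* join an earlier state with the same group and same target groups */
            for (int p = 0; p < q && next[q] < 0; p++)
                if (group[p] == group[q] &&
                    group[delta[p][0]] == group[delta[q][0]] &&
                    group[delta[p][1]] == group[delta[q][1]])
                    next[q] = next[p];
            if (next[q] < 0) next[q] = blocks++;            /* start a new block */
        }
        for (int q = 0; q < N; q++) group[q] = next[q];
        if (blocks == prev) return blocks;                  /* nothing was split */
        prev = blocks;
    }
}
```

Each pass can only split groups, so an unchanged block count means the partition is stable.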
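From Regular Expression to DFA Directly
From Regular Expression to DFA Directly (Algorithm)
[Residue of the nullable/firstpos/lastpos rules table and an annotated syntax tree; not recoverable]
From Regular Expression to DFA Directly: followpos
From Regular Expression to DFA Directly: Example
followpos for the augmented expression (a|b)*abb#, positions 1 to 6:
Node | followpos
1    | {1, 2, 3}
2    | {1, 2, 3}
3    | {4}
4    | {5}
5    | {6}
6    | -
Resulting DFA (start state {1,2,3}; accepting state {1,2,3,6}):
State     | a         | b
{1,2,3}   | {1,2,3,4} | {1,2,3}
{1,2,3,4} | {1,2,3,4} | {1,2,3,5}
{1,2,3,5} | {1,2,3,4} | {1,2,3,6}
{1,2,3,6} | {1,2,3,4} | {1,2,3}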
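Once followpos is known, the DFA follows mechanically. A C sketch that hard-codes the positions and followpos table of the example (hypothetical names; position 6 is the end marker #).

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t Pos;        /* bit p = position p (1..6) */
#define P(p) (1u << (p))

/* Positions of (a|b)*abb#: 1=a 2=b 3=a 4=b 5=b 6=# */
static const char sym[7] = {0, 'a', 'b', 'a', 'b', 'b', '#'};
static const Pos followpos[7] = {
    0, P(1) | P(2) | P(3), P(1) | P(2) | P(3), P(4), P(5), P(6), 0,
};
static const Pos firstpos_root = P(1) | P(2) | P(3);

#define MAXD 8
Pos dstates[MAXD];
int dtran[MAXD][2];         /* columns 'a', 'b'; -1 = no move */
int ndstates;

static int find_or_add(Pos s) {
    for (int i = 0; i < ndstates; i++)
        if (dstates[i] == s) return i;
    assert(ndstates < MAXD);
    dstates[ndstates] = s;
    return ndstates++;
}

/* The start state is firstpos(root); from state S on symbol c, go to the union
   of followpos(p) over the positions p in S that are labeled c. */
void build_dfa(void) {
    ndstates = 0;
    find_or_add(firstpos_root);
    for (int i = 0; i < ndstates; i++)
        for (int c = 0; c < 2; c++) {
            Pos u = 0;
            for (int p = 1; p <= 6; p++)
                if ((dstates[i] & P(p)) && sym[p] == "ab"[c]) u |= followpos[p];
            dtran[i][c] = u ? find_or_add(u) : -1;
        }
}

/* A state is accepting iff it contains the position of the end marker #. */
int is_accepting(int i) { return (dstates[i] & P(6)) != 0; }
```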
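Time-Space Tradeoffs
Automaton | Space (worst case) | Time (worst case)
NFA       | $O(|r|)$           | $O(|r| \times |x|)$
DFA       | $O(2^{|r|})$       | $O(|x|)$
Here |r| is the length of the regular expression and |x| is the length of the input string.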