A Typical Lexical Analyzer Generator: NFA to DFA and DFA Analysis
The Structure of a Compiler (Phases of a Compiler)
[Figure: phases of a compiler. Surviving labels: syntax analyzer, semantic analyzer, code optimizer, backend, code generator, symbol table manager, error handler.]
Lexical Analysis
• The first phase of a compiler is called lexical analysis, or scanning.
• The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes.
• For each lexeme, the lexical analyzer produces as output a token of the form
• <token-name, attribute-value>
• This token is passed on to the subsequent phase, syntax analysis.
Dr. Deepak K. Sinha, JIT, JU 4
• In the token, the first component, token-name, is an abstract symbol that is used during syntax analysis.
• The second component, attribute-value, points to an entry in the symbol table for this token.
• The assignment symbol = is a lexeme that is mapped into the token <=>
• i → <id, 2>
• + → <+>
• r → <id, 3>
• * → <*>
• 60 → <60>
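A minimal sketch of how a scanner could emit <token-name, attribute-value> pairs like the ones above. The input statement `p = i + r * 60`, the identifier `p`, and the names `tokenize` and `TOKEN_RE` are assumptions made for illustration; they are chosen so that i and r land in symbol-table entries 2 and 3, as in the list.

```python
import re

# One alternative per kind of lexeme; leading blanks are skipped.
TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>[=+*]))")


def tokenize(text):
    """Return (tokens, symbol_table) for a tiny expression language.

    Identifiers become ("id", n), where n is the 1-based position of the
    name in the symbol table. Operators and numbers carry no attribute,
    mirroring tokens such as <+> and <60>.
    """
    text = text.rstrip()
    symbol_table = []
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"lexical error at position {pos}")
        pos = match.end()
        kind = match.lastgroup
        lexeme = match.group(kind)
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        else:
            tokens.append((lexeme, None))
    return tokens, symbol_table


tokens, symbol_table = tokenize("p = i + r * 60")
```

Here the symbol table is a plain list; a real compiler would store more than the name in each entry.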
[Figure: interaction of the lexical analyzer with the parser. Surviving labels: source program, lexical analyzer, parser, token, Nexttoken(), symbol table.]
• Two issues in lexical analysis:
– How to specify tokens (patterns)?
– How to recognize the tokens, given a token specification (that is, how to implement the nexttoken() routine)?
• Generating a lexical analyzer
– generic methods
– specific tool: lex
[Figure: program text and a token description feed a scanner generator, which produces the lexical analysis phase; tokens go on to syntax analysis (AST) and then context handling (annotated AST).]
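Token description
• base [bo]
  integer digit+ base?
• rule = expr + action
• {} signal application of a description
• The same description as lex input (no blank inside the pattern, since in lex a blank ends the pattern and starts the action):
    %{
    #include "lex.h"
    %}
    base    [bo]
    digit   [0-9]
    %%
    {digit}+{base}?    { return INTEGER; }
    %%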
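Automatic generation
[Figure: program text and a token description feed a scanner generator, which produces the lexical analysis phase; tokens go on to syntax analysis (AST).]
• FSA (finite-state automaton)
– Initial state S0
– set of accepting states
[Figure: two example automata. In the first, S0 moves to S1 on a digit and S1 loops on digit. In the second, S0 moves to S1 on 'i' and S1 moves to S2 on 'f'.]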
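FSA examples
[Figure: integral number: S0 moves to S1 on digit, S1 loops on digit. Fixed-point number: S0 loops on digit, S0 moves to S2 on '.', S2 moves to S3 on digit, S3 loops on digit.]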
Concurrent recognition
• integral_number [0-9]+
• fixed_point_number [0-9]* '.' [0-9]+
• recognize both tokens in one pass
• correct approach: a single automaton with the transition table

  state   digit   dot   other   recognized token
  S0      S1      S2    -
  S1      S1      S2    -       integer
  S2      S3      -     -
  S3      S3      -     -       fixed point

[Figure: transition graph of the combined automaton: S0 to S1 on digit, S1 loops on digit, S0 and S1 to S2 on '.', S2 to S3 on digit, S3 loops on digit.]
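The transition table above can drive a scanner directly. The following is a sketch only: the names `TABLE`, `char_class` and `next_token` are hypothetical, and the longest-match rule (remember the last accepting state and fall back to it) is an assumption that the slide does not state.

```python
# Transition table for the combined automaton; a missing entry means "no move".
TABLE = {
    ("S0", "digit"): "S1", ("S0", "dot"): "S2",
    ("S1", "digit"): "S1", ("S1", "dot"): "S2",
    ("S2", "digit"): "S3",
    ("S3", "digit"): "S3",
}
ACCEPTING = {"S1": "integer", "S3": "fixed point"}


def char_class(ch):
    """Map a character to the column of the table it belongs to."""
    if ch in "0123456789":
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


def next_token(text, start=0):
    """Scan from `start` and return (token, lexeme) for the longest match.

    Returns None when no prefix of the remaining input is a token.
    """
    state = "S0"
    last_accept = None
    pos = start
    while pos < len(text):
        following = TABLE.get((state, char_class(text[pos])))
        if following is None:
            break
        state = following
        pos += 1
        if state in ACCEPTING:
            last_accept = (ACCEPTING[state], text[start:pos])
    return last_accept
```

With this rule, the input `12.` yields the integer `12`, because the automaton is in the non-accepting state S2 when the input ends and the scanner falls back to the last accepting state.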
The role of the lexical analyzer
6. Lexical Errors
– Deleting an extraneous character
– Inserting a missing character
– Replacing an incorrect character by a correct character
– Transposing two adjacent characters (such as fi => if)
– Pre-scanning
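The four single-character repairs in the list above (delete, insert, replace, transpose) can be sketched as follows. The function names, the lowercase alphabet and the keyword list are assumptions for illustration; pre-scanning is not covered.

```python
def single_edit_repairs(word, alphabet):
    """All strings reachable from `word` by one repair action."""
    repairs = set()
    for i in range(len(word)):
        # Delete the character at position i.
        repairs.add(word[:i] + word[i + 1:])
        # Replace the character at position i.
        for c in alphabet:
            repairs.add(word[:i] + c + word[i + 1:])
    for i in range(len(word) + 1):
        # Insert a character before position i.
        for c in alphabet:
            repairs.add(word[:i] + c + word[i:])
    for i in range(len(word) - 1):
        # Transpose the characters at positions i and i + 1.
        repairs.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    repairs.discard(word)
    return repairs


def suggest(word, keywords, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Keywords that are exactly one repair away from `word`."""
    return sorted(single_edit_repairs(word, alphabet) & set(keywords))
```

For example, `suggest("fi", ["if", "for", "while"])` finds `if` through the transposition case.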
Specification of Tokens
1. Usage of FA
– Precisely recognize the regular sets
– A regular set is the set of sentences (strings) denoted by a regular expression
2. Kinds of FA
– Deterministic FA
– Non-deterministic FA
LEXICAL ANALYSIS
Finite automata
3. Deterministic FA (DFA)
Note: 1) In a DFA, no state has an ε-transition;
2) In a DFA, for each state s and input symbol a, there is at most one edge labeled a leaving s;
3) To describe an FA, we use the transition graph or the transition table;
4) A DFA accepts an input string x if and only if there is some path in the transition graph from the start state to some accepting state whose edge labels spell out x.
e.g. DFA M = ({0,1,2,3}, {a,b}, move, 0, {3})
move: move(0,a)=1 move(0,b)=2 move(1,a)=3 move(1,b)=2
      move(2,a)=1 move(2,b)=3 move(3,a)=3 move(3,b)=3
Transition table

  state   a   b
  0       1   2
  1       3   2
  2       1   3
  3       3   3

[Figure: transition graph of M.]
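Running the DFA M above on an input string is a single loop over the characters. A minimal sketch; the names `MOVE` and `accepts` are hypothetical, and characters outside {a, b} are not handled (they raise KeyError).

```python
# move function of M, copied from the transition table.
MOVE = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 3, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 3, (3, "b"): 3,
}
START = 0
ACCEPTING = {3}


def accepts(x):
    """Follow the unique path labeled x and test the state it ends in."""
    state = START
    for ch in x:
        state = MOVE[(state, ch)]
    return state in ACCEPTING
```

Reading the table this way, state 1 means "the last character was a", state 2 means "the last character was b", and state 3 is reached once aa or bb has been seen.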
e.g. Construct a DFA M which accepts the strings that begin with a or b, or that begin with c and contain at most one a.
[Figure: transition graph of M. State 0 moves to 1 on a or b and to 2 on c; state 1 loops on a, b, c; state 2 loops on b, c and moves to 3 on a; state 3 loops on b, c.]
So, the DFA is
M = ({0,1,2,3}, {a,b,c}, move, 0, {1,2,3})
move: move(0,a)=1 move(0,b)=1
      move(0,c)=2 move(1,a)=1
      move(1,b)=1 move(1,c)=1
      move(2,a)=3 move(2,b)=2
      move(2,c)=2 move(3,b)=3
      move(3,c)=3
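A sketch that checks this DFA against the wording of the problem by brute force. It takes c from state 0 to state 2, as the transition graph shows (0 c 2); the missing entry move(3,a) is treated as a rejecting dead end. The names `accepts`, `in_language` and `mismatches` are hypothetical.

```python
from itertools import product

# Partial move function: move(3, a) is deliberately absent.
MOVE = {
    (0, "a"): 1, (0, "b"): 1, (0, "c"): 2,
    (1, "a"): 1, (1, "b"): 1, (1, "c"): 1,
    (2, "a"): 3, (2, "b"): 2, (2, "c"): 2,
    (3, "b"): 3, (3, "c"): 3,
}
ACCEPTING = {1, 2, 3}


def accepts(x):
    """Run the DFA; an undefined transition rejects the input."""
    state = 0
    for ch in x:
        state = MOVE.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING


def in_language(x):
    """Direct reading of the problem statement."""
    if x[:1] in ("a", "b"):
        return True
    return x[:1] == "c" and x.count("a") <= 1


def mismatches(max_len=6):
    """Strings over {a, b, c} on which the DFA and the statement disagree."""
    return [
        "".join(w)
        for n in range(max_len + 1)
        for w in product("abc", repeat=n)
        if accepts("".join(w)) != in_language("".join(w))
    ]
```

An empty result from `mismatches()` is evidence, not proof, that the automaton matches the specification.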
Definition of an Automaton
[Figure: a finite-state automaton with a reading head over a tape divided into cells of finite length; the cells shown hold a1 a2 a3 b b b.]
A Transition System
[Figure: transition-system diagrams; only scattered state and edge labels (q0, q1, 0/0, 0, 1, 0,1) survive.]

  states   0    1
  q0       q2   q1
  q1       q3   q0
  q2       q0   q3
  q3       q1   q2
Example
[Figure: q0 moves to q1 on 1 and q1 moves to q2 on 0; q0 loops on 0, q1 loops on 1, q2 loops on 0,1.]

  states   0    1
  q0       q0   q1
  q1       q2   q1
  q2       q2   q2
Non-deterministic Finite Automata (NFA)
[Figure: NFA transition graphs over {0, 1} with states q0 to q4.]
δ(q0, 0100) = {q0, q3, q4}
4. Non-deterministic FA (NFA)
Note: 1) In an NFA, the same character can label two or more transitions out of one state;
2) In an NFA, ε is a legal input symbol;
3) A DFA is a special case of an NFA;
4) An NFA accepts an input string x if and only if there is some path in the transition graph from the start state to some accepting state whose edge labels spell out x. A path can be represented by a sequence of state transitions called moves;
5) The language defined by an NFA is the set of input strings it accepts.
e.g. An NFA M = ({q0,q1}, {0,1}, move, q0, {q1})

  state   0          1
  q0      {q0}       {q1}
  q1      {q0, q1}   {q0}

[Figure: transition graph of M.]
Regular expression: (0+1)*01(0+1)*
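An NFA such as M = ({q0,q1}, {0,1}, move, q0, {q1}) can be simulated by tracking the set of all states it could be in after each character. A minimal sketch with hypothetical names (`MOVE`, `accepts`); ε-moves are not handled because M has none.

```python
# move function of the NFA M; each entry is a set of possible next states.
MOVE = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q1"},
    ("q1", "0"): {"q0", "q1"},
    ("q1", "1"): {"q0"},
}
START = "q0"
ACCEPTING = {"q1"}


def accepts(x):
    """Accept if at least one path labeled x ends in an accepting state."""
    current = {START}
    for ch in x:
        current = set().union(*(MOVE.get((s, ch), set()) for s in current))
    return bool(current & ACCEPTING)
```

For the input 10 the state set goes {q0}, {q1}, {q0, q1}, so the string is accepted because one of the possible paths ends in q1.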
NFA to DFA construction: Example
• L = {w | w ends in 01}
[Figure: NFA with states q0, q1, q2: q0 loops on 0,1; q0 moves to q1 on 0; q1 moves to q2 on 1. DFA with states {q0}, {q0,q1}, {q0,q2}.]

NFA (* marks an accepting state):

  δN     0          1
  q0     {q0,q1}    {q0}
  q1     Ø          {q2}
  *q2    Ø          Ø

0. Enumerate all possible subsets
1. Determine transitions

  δD             0          1
  Ø              Ø          Ø
  {q0}           {q0,q1}    {q0}
  {q1}           Ø          {q2}
  *{q2}          Ø          Ø
  {q0,q1}        {q0,q1}    {q0,q2}
  *{q0,q2}       {q0,q1}    {q0}
  *{q1,q2}       Ø          {q2}
  *{q0,q1,q2}    {q0,q1}    {q0,q2}

2. Retain only those states reachable from {q0}

  δD             0          1
  {q0}           {q0,q1}    {q0}
  {q0,q1}        {q0,q1}    {q0,q2}
  *{q0,q2}       {q0,q1}    {q0}
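Steps 0 to 2 can be merged by generating subsets on demand, starting from {q0}, so that unreachable subsets are never built. The following is a sketch of that lazy variant for the example NFA; the names are hypothetical and ε-moves are not handled.

```python
from collections import deque

# δN of the example NFA; a missing entry stands for the empty set.
NFA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
ALPHABET = "01"
NFA_ACCEPTING = {"q2"}


def subset_construction(start):
    """Return (δD, reachable DFA states, accepting DFA states)."""
    start_set = frozenset([start])
    delta_d = {}
    reachable = {start_set}
    queue = deque([start_set])
    while queue:
        subset = queue.popleft()
        for c in ALPHABET:
            # The DFA target is the union of the NFA moves of all members.
            target = frozenset(t for s in subset for t in NFA.get((s, c), ()))
            delta_d[(subset, c)] = target
            if target not in reachable:
                reachable.add(target)
                queue.append(target)
    accepting = {s for s in reachable if s & NFA_ACCEPTING}
    return delta_d, reachable, accepting


delta_d, reachable, accepting = subset_construction("q0")
```

Only three of the eight subsets are ever created, which matches the reduced δD table.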
6. Minimizing the number of states of a DFA
a) Basic idea
Find all groups of states that can be distinguished by some input string. At the beginning of the process, we assume two distinguished groups of states: the group of non-accepting states and the group of accepting states. We then repeatedly partition the existing groups into smaller groups: two states stay in the same group only if, on every input symbol, they move into the same group. The process stops when no group can be split any further.
• e.g. Minimize the following DFA.
[Figure: transition graph of a DFA over {a, b} with states 0 to 6.]
• 1. Initialization: Π0 = {{0,1,2}, {3,4,5,6}}
• 2.1 For the non-accepting states in Π0:
– a: move({0,2},a)={1}; move({1},a)={3}. States 1 and 3 are not in the same subgroup of Π0.
– So, Π1' = {{1},{0,2},{3,4,5,6}}
– b: move({0},b)={2}; move({2},b)={5}. States 2 and 5 are not in the same subgroup of Π1'.
– So, Π1'' = {{1},{0},{2},{3,4,5,6}}
• 2.2 For the accepting states in Π0:
– a: move({3,4,5,6},a)={3,6}, which is a subset of {3,4,5,6} in Π1''
– b: move({3,4,5,6},b)={4,5}, which is a subset of {3,4,5,6} in Π1''
– So, Π1 = {{1},{0},{2},{3,4,5,6}}.
• 3. Apply step (2) again to Π1 to get Π2.
– Π2 = {{1},{0},{2},{3,4,5,6}} = Π1,
– So, Πfinal = Π1
• 4. Let state 3 represent the state group {3,4,5,6}.
So, the minimized DFA is:
[Figure: transition graph of the minimized DFA with states 0, 1, 2 and 3, where 3 stands for the group {3,4,5,6}.]
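The partitioning procedure can be sketched as a loop that keeps splitting groups until nothing changes. Two caveats: this version splits on all input symbols at once rather than one symbol at a time, so the intermediate partitions differ from the worked steps; and the individual transitions of states 3 to 6, as well as move(1,b), are assumptions, chosen only to agree with move({3,4,5,6},a)={3,6} and move({3,4,5,6},b)={4,5}.

```python
# Transitions of the 7-state DFA. Entries marked "assumed" are not given
# in the text and were picked to be consistent with the worked steps.
MOVE = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 3, (1, "b"): 2,   # move(1, b) assumed
    (2, "a"): 1, (2, "b"): 5,
    (3, "a"): 3, (3, "b"): 4,   # assumed
    (4, "a"): 6, (4, "b"): 5,   # assumed
    (5, "a"): 6, (5, "b"): 4,   # assumed
    (6, "a"): 3, (6, "b"): 5,   # assumed
}


def minimize(states, alphabet, move, accepting):
    """Return the final partition of `states` as a set of frozensets."""
    groups = (frozenset(states - accepting), frozenset(accepting))
    partition = [g for g in groups if g]
    while True:
        index = {s: i for i, g in enumerate(partition) for s in g}
        refined = []
        for group in partition:
            buckets = {}
            for s in group:
                # Two states stay together only if every symbol sends
                # them into the same group of the current partition.
                signature = tuple(index[move[(s, c)]] for c in alphabet)
                buckets.setdefault(signature, set()).add(s)
            refined.extend(frozenset(b) for b in buckets.values())
        if len(refined) == len(partition):
            return set(refined)
        partition = refined


final = minimize(set(range(7)), "ab", MOVE, {3, 4, 5, 6})
```

Because refinement only ever splits groups, an unchanged group count means the partition is stable.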
Construction of Finite Automata Equivalent to A RE
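• Construct an FA equivalent to the regular expression (0+1)*(00+11)(0+1)*
[Figure: step-by-step construction. It starts from a single edge from q0 to qf labeled (0+1)*(00+11)(0+1)*, which is then split into Λ-moves and edges for the sub-expressions through intermediate states (q1, q2, q5, q6, q8 appear in the diagram).]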
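One way to gain confidence in a constructed automaton is to compare it with the regular expression on all short strings. The sketch below does this for a hand-built four-state DFA; that DFA and its state names are assumptions (the automaton in the figure is not recoverable), and the + of the slide notation is written as | in Python's `re` syntax.

```python
import re
from itertools import product

# (0+1)*(00+11)(0+1)* in Python regular-expression syntax.
PATTERN = re.compile(r"(0|1)*(00|11)(0|1)*")

# Assumed DFA: "z" = last symbol was 0, "o" = last symbol was 1,
# "f" = 00 or 11 has been seen (accepting), "s" = start.
MOVE = {
    ("s", "0"): "z", ("s", "1"): "o",
    ("z", "0"): "f", ("z", "1"): "o",
    ("o", "0"): "z", ("o", "1"): "f",
    ("f", "0"): "f", ("f", "1"): "f",
}


def dfa_accepts(x):
    state = "s"
    for ch in x:
        state = MOVE[(state, ch)]
    return state == "f"


def regex_accepts(x):
    return PATTERN.fullmatch(x) is not None


def disagreements(max_len=8):
    """Binary strings on which the DFA and the expression differ."""
    return [
        "".join(w)
        for n in range(max_len + 1)
        for w in product("01", repeat=n)
        if dfa_accepts("".join(w)) != regex_accepts("".join(w))
    ]
```

An empty list from `disagreements()` shows agreement on every string up to the chosen length, which is a sanity check rather than an equivalence proof.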