Lecture 3
Part 2
Lexical Analysis - Outline of the Lecture
Lexical Analysis - Nondeterministic FSA
NFAs are FSA that allow zero, one, or more transitions from a state on a given input symbol
An NFA is a 5-tuple (Q, Σ, δ, q0, F) as before, but the transition function δ is different
δ(q, a) = the set of all states p such that there is a transition labelled a from q to p
δ : Q × Σ → 2^Q
A string is accepted by an NFA if there exists a sequence of transitions corresponding to the string that leads from the start state to some final state
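The acceptance condition can be sketched in C by tracking the set of all states the NFA could be in after each input symbol. The three-state NFA below is a hypothetical example chosen for illustration; the bitmask encoding and the names `delta`, `start_set`, `final_set`, and `nfa_accepts` are assumptions, not part of the lecture.

```c
/* Hypothetical 3-state NFA over {a, b}; bit i of a mask stands for state q_i. */
enum { NUM_STATES = 3, NUM_SYMBOLS = 2 };

/* delta[q][sym] is a SET of successor states (a subset of Q, stored as a bitmask). */
static const unsigned delta[NUM_STATES][NUM_SYMBOLS] = {
    /* q0 */ { 0x3u /* a -> {q0, q1} */, 0x0u /* b -> {}       */ },
    /* q1 */ { 0x0u /* a -> {}       */, 0x6u /* b -> {q1, q2} */ },
    /* q2 */ { 0x0u,                     0x0u },
};
static const unsigned start_set = 0x1u;   /* {q0} */
static const unsigned final_set = 0x4u;   /* {q2} */

/* Returns 1 if some sequence of transitions on s ends in a final state, else 0. */
int nfa_accepts(const char *s)
{
    unsigned current = start_set;
    for (; *s != '\0'; ++s) {
        int sym;
        if (*s == 'a') sym = 0;
        else if (*s == 'b') sym = 1;
        else return 0;                    /* symbol outside the alphabet */

        unsigned next = 0;
        for (int q = 0; q < NUM_STATES; ++q)
            if (current & (1u << q))      /* q is one of the possible current states */
                next |= delta[q][sym];    /* follow every transition out of q */
        current = next;
    }
    return (current & final_set) != 0;
}
```

With this table, `nfa_accepts("aabb")` returns 1 and `nfa_accepts("ba")` returns 0, since the set of reachable states becomes empty after the leading b.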
Every NFA can be converted to an equivalent deterministic FA (DFA) that accepts the same language as the NFA
Lexical Analysis - Nondeterministic FSA Example - 1
Lexical Analysis - An NFA and an Equivalent DFA
Lexical Analysis - Example of NFA to DFA conversion
The start state of the DFA would correspond to the set {q0} and will be represented by [q0]
Starting from δ([q0], a), the new states of the DFA are constructed on demand
Each subset of NFA states is a possible DFA state
All the states of the DFA containing some final state as a member would be final states of the DFA
For the NFA presented before (whose equivalent DFA was also presented)
    δ([q0], a) = [q0, q1],        δ([q0], b) = φ
    δ([q0, q1], a) = [q0, q1],    δ([q0, q1], b) = [q1, q2]
    δ(φ, a) = φ,                  δ(φ, b) = φ
    δ([q1, q2], a) = φ,           δ([q1, q2], b) = [q1, q2]
[q1, q2] is the final state
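The on-demand construction can be sketched in C as a worklist over subsets. The NFA table below is inferred from the DFA transitions listed above and is an assumption (in particular, q2 is given no outgoing transitions, which is consistent with the table but not determined by it); the `DFA` struct and the function names are hypothetical.

```c
enum { NFA_STATES = 3, SYMBOLS = 2, MAX_DFA = 1 << NFA_STATES };

/* NFA assumed from the transitions listed above; bit i stands for q_i. */
static const unsigned nfa_delta[NFA_STATES][SYMBOLS] = {
    { 0x3u, 0x0u },   /* q0: a -> {q0,q1}, b -> {}      */
    { 0x0u, 0x6u },   /* q1: a -> {},      b -> {q1,q2} */
    { 0x0u, 0x0u },   /* q2: no outgoing transitions (assumption) */
};
static const unsigned nfa_final = 0x4u;   /* {q2} */

typedef struct {
    int      count;                    /* number of DFA states discovered so far */
    unsigned subset[MAX_DFA];          /* subset[i]: NFA states forming DFA state i */
    int      next[MAX_DFA][SYMBOLS];   /* DFA transition table (0 = a, 1 = b) */
    int      is_final[MAX_DFA];
} DFA;

/* Return the index of subset s, adding it as a new DFA state if unseen. */
static int dfa_state(DFA *d, unsigned s)
{
    for (int i = 0; i < d->count; ++i)
        if (d->subset[i] == s) return i;
    d->subset[d->count] = s;
    d->is_final[d->count] = (s & nfa_final) != 0;   /* contains an NFA final state */
    return d->count++;
}

/* Build the DFA on demand, starting from [q0]. */
void subset_construction(DFA *d)
{
    d->count = 0;
    dfa_state(d, 0x1u);                        /* start state [q0] */
    for (int i = 0; i < d->count; ++i) {       /* d->count grows as states are found */
        for (int sym = 0; sym < SYMBOLS; ++sym) {
            unsigned target = 0;
            for (int q = 0; q < NFA_STATES; ++q)
                if (d->subset[i] & (1u << q))
                    target |= nfa_delta[q][sym];
            d->next[i][sym] = dfa_state(d, target);
        }
    }
}
```

Only the subsets reachable from [q0] are created, so this run yields four DFA states ([q0], [q0, q1], φ, [q1, q2]) rather than all eight subsets of {q0, q1, q2}.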
Lexical Analysis - NFA with ϵ-Moves
ϵ-NFA is equivalent to NFA in power
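The equivalence rests on the ϵ-closure of a state set: all states reachable using ϵ-moves alone. Replacing each state set by its ϵ-closure during simulation or subset construction removes the need for ϵ-moves. A minimal sketch in C, where the four-state `eps` table is a hypothetical example and the bitmask encoding is an assumption:

```c
enum { EPS_STATES = 4 };

/* eps[q]: bitmask of states reachable from q by ONE epsilon-move.
   Hypothetical example: q0 -eps-> q1, q1 -eps-> q2; q2 and q3 have none. */
static const unsigned eps[EPS_STATES] = { 0x2u, 0x4u, 0x0u, 0x0u };

/* Epsilon-closure of a set of states: the set itself plus everything
   reachable from it using only epsilon-moves. */
unsigned eps_closure(unsigned set)
{
    unsigned closure = set;
    unsigned previous;
    do {                                   /* iterate until no new state is added */
        previous = closure;
        for (int q = 0; q < EPS_STATES; ++q)
            if (closure & (1u << q))
                closure |= eps[q];
    } while (closure != previous);
    return closure;
}
```

Here `eps_closure(0x1u)` returns `0x7u`, that is {q0, q1, q2}, because q2 is reached from q0 through two successive ϵ-moves.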
Lexical Analysis - Regular Expressions
Let Σ be an alphabet. The REs over Σ and the languages they denote (or generate) are defined as below
1 φ is an RE. L(φ) = φ
2 ϵ is an RE. L(ϵ) = {ϵ}
3 For each a ∈ Σ, a is an RE. L(a) = {a}
4 If r and s are REs denoting the languages R and S, respectively
    (rs) is an RE, L(rs) = R.S = {xy | x ∈ R ∧ y ∈ S}
    (r + s) is an RE, L(r + s) = R ∪ S
    (r*) is an RE, L(r*) = R* = ⋃_{i=0}^{∞} R^i
    (L* is called the Kleene closure or closure of L)
Lexical Analysis - Examples of Regular Expressions
1 L = set of all strings of 0's and 1's
    r = (0 + 1)*
    How to generate the string 101?
    (0 + 1)* ⇒^4 (0 + 1)(0 + 1)(0 + 1)ϵ ⇒^4 101
2 L = set of all strings of 0's and 1's, with at least two consecutive 0's
    r = (0 + 1)*00(0 + 1)*
3 L = {w ∈ {0, 1}* | w has two or three occurrences of 1, the first and the second of which are not consecutive}
    r = 0*10*010*(10* + ϵ)
4 r = (1 + 10)*
    L = set of all strings of 0's and 1's, beginning with 1 and not having two consecutive 0's (together with the empty string ϵ, which r also generates)
5 r = (0 + 1)*011
    L = set of all strings of 0's and 1's ending in 011
6 r = c*(a + bc*)*
    L = set of all strings over {a, b, c} that do not have the substring ac
7 L = {w | w ∈ {a, b}* ∧ w ends with a}
    r = (a + b)*a
8 L = {if, then, else, while, do, begin, end}
    r = if + then + else + while + do + begin + end
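Some of these examples can be checked mechanically with the POSIX `regcomp`/`regexec` interface from `<regex.h>`. This is only a sketch: the helper `re_full_match` and the pattern constant names are hypothetical, and the lecture notation has to be translated into POSIX extended syntax, where the union operator + is written | and the anchors ^ and $ force a match of the whole string.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the whole string s matches the POSIX extended regex pattern,
   0 if it does not, and -1 if the pattern fails to compile. */
int re_full_match(const char *pattern, const char *s)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int matched = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}

/* Lecture REs rewritten in POSIX extended syntax ('+' becomes '|'). */
static const char RE_TWO_ZEROS[] = "^(0|1)*00(0|1)*$";                     /* example 2 */
static const char RE_ENDS_011[]  = "^(0|1)*011$";                          /* example 5 */
static const char RE_NO_AC[]     = "^c*(a|bc*)*$";                         /* example 6 */
static const char RE_KEYWORDS[]  = "^(if|then|else|while|do|begin|end)$";  /* example 8 */
```

For instance, `re_full_match(RE_NO_AC, "cab")` returns 1, while `re_full_match(RE_NO_AC, "bac")` returns 0 because the string contains the substring ac.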
Lexical Analysis - Examples of Regular Definitions
A regular definition is a sequence of "equations" of the form d1 = r1; d2 = r2; ... ; dn = rn, where each di is a distinct name, and each ri is a regular expression over the symbols Σ ∪ {d1, d2, ..., d(i−1)}
1 identifiers and integers
    letter = a + b + c + d + e; digit = 0 + 1 + 2 + 3 + 4;
    identifier = letter (letter + digit)*; number = digit digit*
2 unsigned numbers
    digit = 0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9;
    digits = digit digit*;
    optional_fraction = .digits + ϵ;
    optional_exponent = (E (+ | − | ϵ) digits) + ϵ;
    unsigned_number = digits optional_fraction optional_exponent
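The unsigned number definition translates almost line by line into a hand-written recognizer. The C sketch below follows the three parts of `unsigned_number`; the function names and the convention of returning the length of the longest matching prefix are assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* digits = digit digit* : returns the number of digit characters at the start of s. */
static size_t match_digits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) ++n;
    return n;
}

/* Length of the longest prefix of s that is an unsigned_number, or 0 if there is none. */
size_t match_unsigned_number(const char *s)
{
    size_t len = match_digits(s);               /* digits (mandatory) */
    if (len == 0) return 0;

    if (s[len] == '.') {                        /* optional_fraction = .digits + eps */
        size_t frac = match_digits(s + len + 1);
        if (frac > 0) len += 1 + frac;          /* a bare '.' is not consumed */
    }
    if (s[len] == 'E') {                        /* optional_exponent = (E (+|-|eps) digits) + eps */
        size_t i = len + 1;
        if (s[i] == '+' || s[i] == '-') ++i;
        size_t exp = match_digits(s + i);
        if (exp > 0) len = i + exp;             /* an 'E' without digits is not consumed */
    }
    return len;
}
```

For example, `match_unsigned_number("12.5E+3")` returns 7, while `match_unsigned_number("1.")` returns 1 because the fraction requires at least one digit after the point.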
Lexical Analysis - Equivalence of REs and FSA
Lexical Analysis - Construction of FSA from RE - r = φ, ϵ, or a
Lexical Analysis - FSA for r = r1 + r2
Lexical Analysis - FSA for r = r1 r2
Lexical Analysis - FSA for r = r1*
Lexical Analysis - NFA Construction for r = (a+b)*c
Lexical Analysis - Transition Diagrams
Lexical Analysis - Lexical Analyzer Implementation from Trans. Diagrams
TOKEN gettoken() {
    TOKEN mytoken; char c;
    while (1) { switch (state) {
        /* recognize reserved words and identifiers */
        case 0: c = nextchar();
                if (letter(c)) state = 1;
                else state = failure();
                break;
        case 1: c = nextchar();
                if (letter(c) || digit(c)) state = 1;
                else state = 2;
                break;
        case 2: retract(1);
                mytoken.token = search_token();
                if (mytoken.token == IDENTIFIER)
                    mytoken.value = get_id_string();
                return(mytoken);
        case 6: c = nextchar();
                if (digithex(c)) state = 6;
                else if ((c == 'u') || (c == 'U') || (c == 'l') || (c == 'L'))
                    state = 8;
                else state = 7;
                break;
        case 7: retract(1);
                /* fall through to case 8, to save coding */
        case 8: mytoken.token = INT_CONST;
                mytoken.value = eval_hex_num();
                return(mytoken);
        case 9: c = nextchar();
                if (digitoct(c)) state = 9;
                else if ((c == 'u') || (c == 'U') || (c == 'l') || (c == 'L'))
                    state = 11;
                else state = 10;
                break;
        /* recognize integer constants */
        case 12: c = nextchar();
                 if (digit(c)) state = 13;
                 else state = failure();
                 break;  /* without this break, control would fall into case 13 */
        case 13: c = nextchar();
                 if (digit(c)) state = 13;
                 else if ((c == 'u') || (c == 'U') || (c == 'l') || (c == 'L'))
                     state = 15;
                 else state = 14;
                 break;
        case 14: retract(1);
                 /* fall through to case 15, to save coding */
        case 15: mytoken.token = INT_CONST;
                 mytoken.value = eval_int_num();
                 return(mytoken);
        default: recover();
    }
    }
}
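The slides do not show `nextchar()`, `retract()`, or `failure()`. The sketch below is a reduced, self-contained variant of the same state-machine pattern that handles only identifiers (states 0 to 2) and unsuffixed decimal integers (states 12 to 14). The helper bodies, the `Token` type, `lexer_init`, and the order in which `failure()` tries the diagrams are all assumptions, not the lecture's implementation.

```c
#include <ctype.h>
#include <string.h>

typedef enum { TOK_EOF, TOK_IDENTIFIER, TOK_INT_CONST, TOK_ERROR } TokenKind;
typedef struct { TokenKind kind; char text[32]; } Token;

static const char *input = "";   /* text being scanned */
static size_t begin;             /* start of the current lexeme */
static size_t forward;           /* next character to read */
static int state, start;         /* current state, start state of current diagram */

void lexer_init(const char *text) { input = text; begin = forward = 0; }

static int  nextchar(void)    { return (unsigned char)input[forward++]; }
static void retract(size_t n) { forward -= n; }

/* Give up on the current diagram: rewind the input and return the start
   state of the next diagram to try (0 first, then 12, then none). */
static int failure(void)
{
    forward = begin;
    start = (start == 0) ? 12 : -1;
    return start;
}

/* Copy the lexeme input[begin..forward) into a token of the given kind. */
static Token make_token(TokenKind kind)
{
    Token t;
    size_t n = forward - begin;
    t.kind = kind;
    if (n >= sizeof t.text) n = sizeof t.text - 1;   /* truncate long lexemes */
    memcpy(t.text, input + begin, n);
    t.text[n] = '\0';
    return t;
}

Token gettoken(void)
{
    int c;
    while (isspace((unsigned char)input[forward])) ++forward;
    begin = forward;
    if (input[forward] == '\0') return make_token(TOK_EOF);
    state = start = 0;
    for (;;) {
        switch (state) {
        case 0:  c = nextchar(); state = isalpha(c) ? 1 : failure(); break;
        case 1:  c = nextchar(); state = isalnum(c) ? 1 : 2;         break;
        case 2:  retract(1); return make_token(TOK_IDENTIFIER);
        case 12: c = nextchar(); state = isdigit(c) ? 13 : failure(); break;
        case 13: c = nextchar(); state = isdigit(c) ? 13 : 14;        break;
        case 14: retract(1); return make_token(TOK_INT_CONST);
        default: /* no diagram matched: consume one character and report an error */
            forward = begin + 1;
            return make_token(TOK_ERROR);
        }
    }
}
```

After `lexer_init("count 42")`, successive calls to `gettoken()` yield an identifier `count`, an integer constant `42`, and then `TOK_EOF`. Keeping `begin` and `forward` as two separate positions is what makes `retract()` and `failure()` cheap: backing up only means moving `forward`.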