Lecture 3

Uploaded by

as.business.023

Lexical Analysis - Part 2

Outline of the Lecture

What is lexical analysis? (covered in part 1)
Why should LA be separated from syntax analysis? (covered in part 1)
Tokens, patterns, and lexemes (covered in part 1)
Difficulties in lexical analysis (covered in part 1)
Recognition of tokens - finite automata and transition diagrams
Specification of tokens - regular expressions and regular definitions
LEX - A Lexical Analyzer Generator

Nondeterministic FSA

NFAs are FSA which allow 0, 1, or more transitions from a state on a given input symbol
An NFA is a 5-tuple as before, but the transition function δ is different
  δ(q, a) = the set of all states p, such that there is a transition labelled a from q to p
  δ : Q × Σ → 2^Q
A string is accepted by an NFA if there exists a sequence of transitions corresponding to the string, that leads from the start state to some final state
Every NFA can be converted to an equivalent deterministic FA (DFA), that accepts the same language as the NFA

[Figure: Nondeterministic FSA Example - 1]
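
The acceptance condition above can be checked mechanically by tracking the set of all states the NFA could be in after each input symbol. The following is a minimal sketch in C, not the lecture's own code. It assumes a three-state NFA over {a, b} (q0 start, q2 final, with q0 going to {q0, q1} on a and q1 going to {q1, q2} on b). The names nfa_delta, nfa_final and nfa_accepts are hypothetical.

```c
#include <stdbool.h>

/* Assumed example NFA over {a, b}: states q0, q1, q2; q0 is the start
   state and q2 the only final state. A set of states is a bitmask:
   bit i set means qi is in the set. */
enum { NFA_NSTATES = 3, NFA_NSYMS = 2 };
#define NFA_Q(i) (1u << (i))

/* nfa_delta[q][sym] = delta(q, sym), with sym 0 = 'a' and sym 1 = 'b'. */
static const unsigned nfa_delta[NFA_NSTATES][NFA_NSYMS] = {
    /* q0 */ { NFA_Q(0) | NFA_Q(1), 0 },
    /* q1 */ { 0, NFA_Q(1) | NFA_Q(2) },
    /* q2 */ { 0, 0 },
};
static const unsigned nfa_final = NFA_Q(2);

/* Returns true if some sequence of transitions on `input` leads from
   q0 to a final state. */
bool nfa_accepts(const char *input)
{
    unsigned current = NFA_Q(0);              /* start in {q0} */
    for (const char *p = input; *p != '\0'; ++p) {
        if (*p != 'a' && *p != 'b')
            return false;                     /* symbol outside the alphabet */
        int sym = *p - 'a';
        unsigned next = 0;
        for (int q = 0; q < NFA_NSTATES; ++q)
            if (current & NFA_Q(q))
                next |= nfa_delta[q][sym];    /* union of all possible moves */
        current = next;
    }
    return (current & nfa_final) != 0;
}
```

With this assumed NFA, nfa_accepts("aabb") is true and nfa_accepts("aba") is false, because the set of reachable states becomes empty on the last a.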

[Figure: An NFA and an Equivalent DFA]

Example of NFA to DFA conversion

The start state of the DFA would correspond to the set {q0} and will be represented by [q0]
Starting from δ([q0], a), the new states of the DFA are constructed on demand
Each subset of NFA states is a possible DFA state
All the states of the DFA containing some final state as a member would be final states of the DFA
For the NFA presented before (whose equivalent DFA was also presented)
  δ([q0], a) = [q0, q1], δ([q0], b) = φ
  δ([q0, q1], a) = [q0, q1], δ([q0, q1], b) = [q1, q2]
  δ(φ, a) = φ, δ(φ, b) = φ
  δ([q1, q2], a) = φ, δ([q1, q2], b) = [q1, q2]
  [q1, q2] is the final state
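
The on-demand construction can be sketched in C as a worklist over subsets. This is an illustrative sketch under assumptions: the NFA table below is reconstructed from the DFA transitions listed above (q2 is taken to have no outgoing transitions, which the listing does not contradict), and the names Dfa, sc_lookup and sc_build are hypothetical.

```c
enum { SC_NSTATES = 3, SC_NSYMS = 2, SC_MAXDFA = 1 << SC_NSTATES };

/* Assumed NFA as bitmasks (bit i = qi); column 0 is 'a', column 1 is 'b'. */
static const unsigned sc_nfa[SC_NSTATES][SC_NSYMS] = {
    { 0x3, 0x0 },   /* q0: a -> {q0,q1}, b -> {}      */
    { 0x0, 0x6 },   /* q1: a -> {},      b -> {q1,q2} */
    { 0x0, 0x0 },   /* q2: no transitions (assumed)   */
};
static const unsigned sc_nfa_final = 0x4;   /* {q2} */

typedef struct {
    int      count;                      /* DFA states discovered so far  */
    unsigned subset[SC_MAXDFA];          /* NFA-state set of each state   */
    int      next[SC_MAXDFA][SC_NSYMS];  /* DFA transition table          */
    int      is_final[SC_MAXDFA];        /* 1 if subset has a final state */
} Dfa;

/* Return the index of the DFA state for `set`, creating it if it is new. */
static int sc_lookup(Dfa *d, unsigned set)
{
    for (int i = 0; i < d->count; ++i)
        if (d->subset[i] == set)
            return i;
    d->subset[d->count] = set;
    d->is_final[d->count] = (set & sc_nfa_final) != 0;
    return d->count++;
}

/* Build only the DFA states reachable from [q0]. */
void sc_build(Dfa *d)
{
    d->count = 0;
    sc_lookup(d, 0x1);                           /* start state [q0] */
    for (int s = 0; s < d->count; ++s) {         /* count grows as states appear */
        for (int a = 0; a < SC_NSYMS; ++a) {
            unsigned target = 0;
            for (int q = 0; q < SC_NSTATES; ++q)
                if (d->subset[s] & (1u << q))
                    target |= sc_nfa[q][a];      /* union over member states */
            d->next[s][a] = sc_lookup(d, target);
        }
    }
}
```

For the assumed NFA, sc_build discovers four DFA states in the order [q0], [q0, q1], φ, [q1, q2], and marks only [q1, q2] as final, which agrees with the transitions listed above. Only 4 of the 8 possible subsets are ever created.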

NFA with ϵ-Moves

ϵ-NFA is equivalent to NFA in power

Regular Expressions

Let Σ be an alphabet. The REs over Σ and the languages they denote (or generate) are defined as below
1 φ is an RE. L(φ) = φ
2 ϵ is an RE. L(ϵ) = {ϵ}
3 For each a ∈ Σ, a is an RE. L(a) = {a}
4 If r and s are REs denoting the languages R and S, respectively
    (rs) is an RE, L(rs) = R.S = {xy | x ∈ R ∧ y ∈ S}
    (r + s) is an RE, L(r + s) = R ∪ S
    (r∗) is an RE, L(r∗) = R∗ = ⋃_{i=0}^{∞} R^i
    (L∗ is called the Kleene closure or closure of L)

Examples of Regular Expressions

1 L = set of all strings of 0’s and 1’s
  r = (0 + 1)∗
  How to generate the string 101?
  (0 + 1)∗ ⇒^4 (0 + 1)(0 + 1)(0 + 1)ϵ ⇒^4 101
2 L = set of all strings of 0’s and 1’s, with at least two consecutive 0’s
  r = (0 + 1)∗ 00 (0 + 1)∗
3 L = {w ∈ {0, 1}∗ | w has two or three occurrences of 1, the first and the second of which are not consecutive}
  r = 0∗10∗010∗(10∗ + ϵ)
4 r = (1 + 10)∗
  L = set of all strings of 0’s and 1’s, beginning with 1 and not having two consecutive 0’s (together with ϵ, which r also generates)
5 r = (0 + 1)∗ 011
  L = set of all strings of 0’s and 1’s ending in 011
6 r = c∗(a + bc∗)∗
  L = set of all strings over {a,b,c} that do not have the substring ac
7 L = {w | w ∈ {a, b}∗ ∧ w ends with a}
  r = (a + b)∗ a
8 L = {if, then, else, while, do, begin, end}
  r = if + then + else + while + do + begin + end
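
Several of these examples can be tried out with the POSIX regex library. The sketch below is an illustration, not part of the lecture. It assumes a POSIX system providing regex.h, writes the lecture's + as | in POSIX extended syntax, writes (x + ϵ) as (x)?, and adds ^ and $ so that the whole string must match. The helper name re_full_match is hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `text` matches the POSIX extended regular expression
   `ere`. The caller anchors the pattern with ^ and $ to require that the
   whole string is in the language. */
bool re_full_match(const char *ere, const char *text)
{
    regex_t re;
    if (regcomp(&re, ere, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                    /* pattern failed to compile */
    bool ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

For instance, example 2 becomes "^(0|1)*00(0|1)*$", which matches 10010 but not 10101. Example 6 becomes "^c*(a|bc*)*$", which matches cabca but not bac.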

Examples of Regular Definitions

A regular definition is a sequence of "equations" of the form d1 = r1; d2 = r2; ... ; dn = rn, where each di is a distinct name, and each ri is a regular expression over the symbols Σ ∪ {d1, d2, ..., di−1}
1 identifiers and integers
  letter = a + b + c + d + e; digit = 0 + 1 + 2 + 3 + 4;
  identifier = letter (letter + digit)∗; number = digit digit∗
2 unsigned numbers
  digit = 0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9;
  digits = digit digit∗;
  optional_fraction = .digits + ϵ;
  optional_exponent = (E (+ | − | ϵ) digits) + ϵ;
  unsigned_number = digits optional_fraction optional_exponent
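
A regular definition of this kind translates almost line by line into a hand-written recognizer. The following C sketch mirrors the unsigned_number definition, with one helper per named sub-definition. The function names un_digits and is_unsigned_number are hypothetical, and the sketch checks a whole string rather than scanning a prefix of a longer input.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* digits = digit digit*: consume one or more digits starting at p.
   Returns a pointer just past them, or NULL if there is no digit at p. */
static const char *un_digits(const char *p)
{
    if (!isdigit((unsigned char)*p))
        return NULL;
    while (isdigit((unsigned char)*p))
        ++p;
    return p;
}

/* unsigned_number = digits optional_fraction optional_exponent */
bool is_unsigned_number(const char *s)
{
    const char *p = un_digits(s);            /* digits */
    if (p == NULL)
        return false;
    if (*p == '.') {                         /* optional_fraction = .digits + eps */
        p = un_digits(p + 1);
        if (p == NULL)
            return false;
    }
    if (*p == 'E') {                         /* optional_exponent = E (+|-|eps) digits + eps */
        ++p;
        if (*p == '+' || *p == '-')
            ++p;
        p = un_digits(p);
        if (p == NULL)
            return false;
    }
    return *p == '\0';                       /* nothing may follow */
}
```

Under this definition 6.02E+23 and 7E3 are accepted, while .5, 12. and 1E+ are rejected, because digits requires at least one digit wherever it appears.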

Equivalence of REs and FSA

Let r be an RE. Then there exists an NFA with ϵ-transitions that accepts L(r). The proof is by construction.
If L is accepted by a DFA, then L is generated by an RE. The proof is tedious.

[Figure: Construction of FSA from RE - r = φ, ϵ, or a]

[Figure: FSA for r = r1 + r2]

[Figure: FSA for r = r1 r2]

[Figure: FSA for r = r1*]

[Figure: NFA Construction for r = (a+b)*c]
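
To make the construction concrete, the sketch below hard-codes one possible ϵ-NFA for (a+b)*c and simulates it using ϵ-closures. The state numbering (0 to 9) and the exact placement of ϵ-edges follow the usual Thompson-style composition and are an assumption; they need not match the numbering in the lecture's figure. The names en_eps, en_closure and en_accepts are hypothetical.

```c
#include <stdbool.h>

enum { EN_STATES = 10 };   /* state 0 is the start state, state 9 is final */

/* en_eps[q] = bitmask of states reachable from q by a single eps-move. */
static const unsigned en_eps[EN_STATES] = {
    [0] = (1u << 1) | (1u << 7),   /* star: enter the loop body or skip it */
    [1] = (1u << 2) | (1u << 4),   /* union: choose the a branch or b branch */
    [3] = 1u << 6,                 /* end of a branch */
    [5] = 1u << 6,                 /* end of b branch */
    [6] = (1u << 1) | (1u << 7),   /* star: repeat the body or leave */
    [7] = 1u << 8,                 /* concatenation with c */
};

/* At most one labelled edge per state; label 0 means "no labelled edge". */
static const char en_label[EN_STATES]  = { [2] = 'a', [4] = 'b', [8] = 'c' };
static const int  en_target[EN_STATES] = { [2] = 3,   [4] = 5,   [8] = 9   };

/* eps-closure: keep adding eps-successors until the set stops growing. */
static unsigned en_closure(unsigned set)
{
    unsigned prev;
    do {
        prev = set;
        for (int q = 0; q < EN_STATES; ++q)
            if (set & (1u << q))
                set |= en_eps[q];
    } while (set != prev);
    return set;
}

bool en_accepts(const char *input)
{
    unsigned current = en_closure(1u << 0);
    for (const char *p = input; *p != '\0'; ++p) {
        unsigned moved = 0;
        for (int q = 0; q < EN_STATES; ++q)
            if ((current & (1u << q)) && en_label[q] == *p)
                moved |= 1u << en_target[q];
        current = en_closure(moved);   /* take eps-moves after each symbol */
    }
    return (current & (1u << 9)) != 0;
}
```

With these assumed edges, en_accepts("abc") and en_accepts("c") are true, while en_accepts("ab") and en_accepts("ca") are false.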

Transition Diagrams

Transition diagrams are generalized DFAs with the following differences
  Edges may be labelled by a symbol, a set of symbols, or a regular definition
  Some accepting states may be indicated as retracting states, indicating that the lexeme does not include the symbol that brought us to the accepting state
  Each accepting state has an action attached to it, which is executed when that state is reached. Typically, such an action returns a token and its attribute value
Transition diagrams are not meant for machine translation but only for manual translation

Lexical Analyzer Implementation from Trans. Diagrams

TOKEN gettoken() {
    TOKEN mytoken; char c;
    while(1) {
        switch (state) {
        /* recognize reserved words and identifiers */
        case 0: c = nextchar();
                if (letter(c)) state = 1;
                else state = failure();
                break;
        case 1: c = nextchar();
                if (letter(c) || digit(c)) state = 1;
                else state = 2;
                break;
        case 2: retract(1);
                mytoken.token = search_token();
                if (mytoken.token == IDENTIFIER)
                    mytoken.value = get_id_string();
                return(mytoken);
        /* recognize hexa and octal constants */
        case 3: c = nextchar();
                if (c == '0') state = 4;
                else state = failure();
                break;
        case 4: c = nextchar();
                if ((c == 'x') || (c == 'X')) state = 5;
                else if (digitoct(c)) state = 9;
                else state = failure();
                break;
        case 5: c = nextchar();
                if (digithex(c)) state = 6;
                else state = failure();
                break;

        case 6: c = nextchar();
                if (digithex(c)) state = 6;
                else if ((c == 'u') || (c == 'U') || (c == 'l') || (c == 'L'))
                    state = 8;
                else state = 7;
                break;
        case 7: retract(1);
                /* fall through to case 8, to save coding */
        case 8: mytoken.token = INT_CONST;
                mytoken.value = eval_hex_num();
                return(mytoken);
        case 9: c = nextchar();
                if (digitoct(c)) state = 9;
                else if ((c == 'u') || (c == 'U') || (c == 'l') || (c == 'L'))
                    state = 11;
                else state = 10;
                break;

        case 10: retract(1);
                /* fall through to case 11, to save coding */
        case 11: mytoken.token = INT_CONST;
                mytoken.value = eval_oct_num();
                return(mytoken);

        /* recognize integer constants */
        case 12: c = nextchar();
                if (digit(c)) state = 13;
                else state = failure();
                break;
        case 13: c = nextchar();
                if (digit(c)) state = 13;
                else if ((c == 'u') || (c == 'U') || (c == 'l') || (c == 'L'))
                    state = 15;
                else state = 14;
                break;
        case 14: retract(1);
                /* fall through to case 15, to save coding */
        case 15: mytoken.token = INT_CONST;
                mytoken.value = eval_int_num();
                return(mytoken);
        default: recover();
        }   /* end of switch */
    }       /* end of while */
}
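
The code above leaves nextchar(), retract() and failure() undefined. The following self-contained C sketch shows one way such helpers could work over an in-memory string, restricted to the identifier and decimal-integer diagrams. Everything here is an assumption made for illustration: the Scanner and Token types, the sk_ prefixed names, and the choice that failure() rewinds to the lexeme beginning and returns the start state of the next diagram.

```c
#include <ctype.h>
#include <string.h>

typedef enum { TK_EOF, TK_IDENTIFIER, TK_INT_CONST, TK_ERROR } TokenKind;
typedef struct { TokenKind kind; char text[32]; } Token;
typedef struct {
    const char *src;      /* NUL-terminated input                 */
    size_t start;         /* index where the current lexeme began */
    size_t forward;       /* index of the next unread character   */
} Scanner;

static int  sk_nextchar(Scanner *s) { return (unsigned char)s->src[s->forward++]; }
static void sk_retract(Scanner *s, size_t n) { s->forward -= n; }
/* Rewind to the lexeme beginning and try the diagram starting at `next`. */
static int  sk_failure(Scanner *s, int next) { s->forward = s->start; return next; }

static void sk_copy_lexeme(const Scanner *s, Token *t)
{
    size_t n = s->forward - s->start;
    if (n >= sizeof t->text) n = sizeof t->text - 1;   /* truncate long lexemes */
    memcpy(t->text, s->src + s->start, n);
    t->text[n] = '\0';
}

Token sk_gettoken(Scanner *s)
{
    Token tok = { TK_ERROR, "" };
    int state = 0, c;
    while (isspace((unsigned char)s->src[s->forward]))
        s->forward++;                                  /* skip white space */
    s->start = s->forward;
    if (s->src[s->start] == '\0') { tok.kind = TK_EOF; return tok; }
    for (;;) {
        switch (state) {
        case 0:  c = sk_nextchar(s);                   /* identifier diagram */
                 state = isalpha(c) ? 1 : sk_failure(s, 12); break;
        case 1:  c = sk_nextchar(s);
                 state = isalnum(c) ? 1 : 2; break;
        case 2:  sk_retract(s, 1);                     /* retracting state */
                 tok.kind = TK_IDENTIFIER; sk_copy_lexeme(s, &tok); return tok;
        case 12: c = sk_nextchar(s);                   /* integer diagram */
                 state = isdigit(c) ? 13 : sk_failure(s, -1); break;
        case 13: c = sk_nextchar(s);
                 state = isdigit(c) ? 13 : 14; break;
        case 14: sk_retract(s, 1);                     /* retracting state */
                 tok.kind = TK_INT_CONST; sk_copy_lexeme(s, &tok); return tok;
        default: s->forward = s->start + 1;            /* no diagram matched: */
                 sk_copy_lexeme(s, &tok); return tok;  /* skip one character  */
        }
    }
}
```

Scanning the input "x1 42 +" with a Scanner initialised to { "x1 42 +", 0, 0 } yields the identifier x1, the integer constant 42, an error token for +, and then end of input. States 2 and 14 play the role of the retracting states described earlier: the character that ended the lexeme is pushed back so that the next call sees it again.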