Lect 07

Uploaded by

chmianarainyahya
Copyright
© © All Rights Reserved

Finite State Machines

• An automaton is an abstract machine that determines whether a given string belongs to a specified language.
• A finite state machine (FSM) is an automaton that recognizes regular languages (the languages described by regular expressions).

Finite State Machines (cont’d)

• In particular, finite automata can be used to describe the process of recognizing patterns in input strings, and so can be used to construct scanners.

Finite State Machines (cont’d)

• The formal basis for lexical analysis is the finite state automaton (FSA)
– REs generate regular sets
– FSAs recognize regular sets
• FSA – informal definition:
– A finite set of states
– Transitions between states
– An initial state (start)
– A set of final states (accepting states)

Finite Automata and Lexical Analysis

• The tokens of a language are specified using regular expressions.
• A scanner is a big DFA, essentially the “aggregate” of the automata for the individual tokens.

Two Kinds of FSA

• Non-deterministic finite automata (NFA)
– There may be multiple possible transitions on the same input, or some transitions that do not require an input (ε)
• Deterministic finite automata (DFA)
– The transition from each state is uniquely determined by the current input character
   • For each state, at most 1 edge labeled ‘a’ leaving the state
– No ε transitions

Implementing the Scanner

• Three methods
– Hand-coded approach:
   • Draw the DFSM, then implement it with a loop and a case statement
– Hybrid approach:
   • Define tokens using regular expressions, convert to an NFSM, apply an algorithm to obtain the minimal DFSM
   • Hand-code the resulting DFSM
– Automated approach:
   • Use a regular grammar as input to a lexical scanner generator (e.g. LEX)

Hand-coding

• Branch depending on first character:
– If digit, scan numeric literal
– If letter, scan identifier or keyword
– If operator, check next character (++, etc.)
• Return token found
• Write aggressively efficient code: gotos, global variables

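The branching described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the keyword set, the operator sets, and the token names NUM, ID, KEYWORD and OP are hypothetical and do not come from the slides.

```python
# Hypothetical token vocabulary, chosen only for this illustration.
KEYWORDS = {"if", "while", "return"}
ONE_CHAR_OPS = set("+-*/=<>")
TWO_CHAR_OPS = {"++", "--", "==", "<=", ">="}


def scan_token(text, pos):
    """Return (kind, lexeme, next_pos) for the token starting at text[pos]."""
    ch = text[pos]
    if ch.isdigit():                       # digit: scan a numeric literal
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        return ("NUM", text[pos:end], end)
    if ch.isalpha():                       # letter: identifier or keyword
        end = pos
        while end < len(text) and text[end].isalnum():
            end += 1
        lexeme = text[pos:end]
        kind = "KEYWORD" if lexeme in KEYWORDS else "ID"
        return (kind, lexeme, end)
    if ch in ONE_CHAR_OPS:                 # operator: check the next character
        two = text[pos:pos + 2]
        if two in TWO_CHAR_OPS:
            return ("OP", two, pos + 2)
        return ("OP", ch, pos + 1)
    raise ValueError(f"unexpected character {ch!r} at position {pos}")


def tokenize(text):
    """Skip whitespace and collect (kind, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        kind, lexeme, pos = scan_token(text, pos)
        tokens.append((kind, lexeme))
    return tokens
```

For example, tokenize("x1 ++ 42") yields an ID, an OP and a NUM token. A production scanner would more likely use the loop-and-case structure over an explicit DFA, as the slide suggests.
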
NFAs & DFAs

• Non-Deterministic Finite Automata (NFAs) easily represent regular expressions, but are somewhat less precise: the same input can follow more than one path.
• Deterministic Finite Automata (DFAs) require more complexity to represent regular expressions, but offer more precision: each input follows at most one path.

Non-Deterministic Finite Automata

• An NFA is a mathematical model that consists of:
– A set of states, S
– A set of input symbols, Σ (the input symbol alphabet)
– A transition function, move, that maps state-symbol pairs to sets of states:
   move(state, symbol) → set of states
– A state s0 that is distinguished as the start (or initial) state
– A set of states F, distinguished as accepting (or final) states

Representing NFAs

• Transition Diagrams: numbered states (circles), arcs, final states, …
• Transition Tables: more suitable for representation within a computer

We’ll see examples of both!

Example NFA

• S = { 0, 1, 2, 3 }
• s0 = 0
• F = { 3 }
• Σ = { a, b }

Transition diagram: start --> 0; 0 --a--> 0; 0 --b--> 0; 0 --a--> 1; 1 --b--> 2; 2 --b--> 3 (accepting)

What language is defined? (a|b)*abb

What is the transition table?

state | a        | b
0     | { 0, 1 } | { 0 }
1     | --       | { 2 }
2     | --       | { 3 }
3     | --       | --

ε (null) moves are also possible: i --ε--> j switches state but does not use any input symbol.

How Does An NFA Work?

Transition diagram: the NFA for (a|b)*abb from the Example NFA slide.

• Given an input string, we trace moves
• If no more input & in final state, ACCEPT

EXAMPLE: Input: ababb

First trace:
move(0, a) = 1
move(1, b) = 2
move(2, a) = ? (undefined)
REJECT!

-OR-

Second trace:
move(0, a) = 0
move(0, b) = 0
move(0, a) = 1
move(1, b) = 2
move(2, b) = 3
ACCEPT!

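Rather than guessing a single trace, a program can follow every possible trace at once by tracking the set of states the NFA could be in. The sketch below does this for the (a|b)*abb NFA. The names NFA_MOVE and nfa_accepts are illustrative, and the code assumes an NFA with no ε moves.

```python
# Transition function of the example NFA: (state, symbol) -> set of states.
# Missing entries stand for undefined moves.
NFA_MOVE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
ACCEPTING = {3}


def nfa_accepts(text):
    """Return True if at least one trace of the NFA accepts text."""
    current = {START}
    for ch in text:
        following = set()
        for state in current:
            # An undefined move contributes no states (that trace dies).
            following |= NFA_MOVE.get((state, ch), set())
        current = following
    # Accept if any surviving trace ended in an accepting state.
    return bool(current & ACCEPTING)
```

With this approach ababb is accepted, because one of its traces reaches state 3, even though other traces get stuck or end in state 0.
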
How Does An NFA Work? (cont’d)

An NFA can be represented diagrammatically by a labeled directed graph, called a transition graph, in which the nodes are the states and the labeled edges represent the transition function. This graph looks like a transition diagram, but the same character can label two or more transitions out of one state, and edges can be labeled by the special symbol ε as well as by input symbols.

How Does An NFA Work? (cont’d)

Consider the transition graph for an NFA that recognizes the language (a | b)*abb. The set of states of the NFA is { 0, 1, 2, 3 } and the input symbol alphabet is { a, b }. State 0 is distinguished as the start state, and the accepting state 3 is indicated by a double circle.

How Does An NFA Work? (cont’d)

• When describing an NFA we use the transition graph representation. In a computer, the transition function of an NFA can be implemented in several different ways.

How Does An NFA Work? (cont’d)

• The easiest implementation is a transition table in which there is a row for each state and a column for each input symbol (and for ε, if necessary). The entry for row i and symbol a in the table is the set of states (or, more likely in practice, a pointer to the set of states) that can be reached by a transition from state i on input a. The transition table for the NFA is shown on the Example NFA slide.

• The transition table representation has the advantage that it provides fast access to the transitions of a given state on a given character; its disadvantage is that it can take up a lot of space when the input alphabet is large and most transitions are to the empty set.

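The space trade-off can be made concrete with a small sketch. The dense table below stores one entry per (state, symbol) pair for the (a|b)*abb NFA, while the sparse dictionary keeps only the non-empty entries. The names DENSE, SPARSE and COLUMN are illustrative, and the sparse layout is just one possible alternative, not something the slides prescribe.

```python
# Column index for each input symbol.
COLUMN = {"a": 0, "b": 1}

# Dense table: one row per state, one column per symbol.
DENSE = [
    [{0, 1}, {0}],    # state 0
    [set(), {2}],     # state 1
    [set(), {3}],     # state 2
    [set(), set()],   # state 3
]


def move_dense(state, symbol):
    """Constant-time lookup by row and column."""
    return DENSE[state][COLUMN[symbol]]


# Sparse alternative: keep only the entries that are not the empty set.
SPARSE = {
    (state, symbol): DENSE[state][col]
    for state in range(len(DENSE))
    for symbol, col in COLUMN.items()
    if DENSE[state][col]
}


def move_sparse(state, symbol):
    """Lookup that treats a missing entry as the empty set."""
    return SPARSE.get((state, symbol), set())


# Number of dense entries that hold the empty set.
EMPTY_ENTRIES = sum(1 for row in DENSE for entry in row if not entry)
```

Here half of the eight dense entries are empty. With a large alphabet, such as all character codes, the proportion of empty entries is typically much higher.
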
Handling Undefined Transitions

We can handle undefined transitions by defining one more state, a “death” state, and sending every previously undefined transition to this death state.

Transition diagram: the NFA for (a|b)*abb (states 0 to 3) plus the death state 4, with the added edges 1 --a--> 4, 2 --a--> 4 and 3 --a, b--> 4.

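As a rough sketch, the death state can be added mechanically by filling in every missing table entry. The function name add_dead_state is hypothetical. The sketch also assumes that the death state loops to itself on every symbol, which is the usual convention but is not spelled out on the slide.

```python
def add_dead_state(move, states, alphabet, dead):
    """Return a copy of move in which every undefined entry goes to dead."""
    total = dict(move)
    for state in list(states) + [dead]:
        for symbol in alphabet:
            # Only fill entries that were undefined before.
            # For the dead state itself this creates the self-loops (assumed).
            total.setdefault((state, symbol), {dead})
    return total


# The (a|b)*abb NFA, with undefined moves simply absent.
NFA_MOVE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}

TOTAL_MOVE = add_dead_state(NFA_MOVE, [0, 1, 2, 3], ["a", "b"], dead=4)
```

After this step every (state, symbol) pair has an entry, so a simulator never has to test for an undefined move.
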
NFA: Regular Expressions & Compilation

Problems with NFAs for Regular Expressions:
1. Valid input might not be accepted along the path that happens to be chosen
2. An NFA may behave differently on the same input

Relationship of NFAs to Compilation:
1. A regular expression is “recognized” by an NFA
2. A regular expression is the “pattern” for a “token”
3. Tokens are the building blocks for lexical analysis
4. A lexical analyzer can be described by a collection of NFAs, one NFA for each language token.

Second NFA Example

• Given the regular expression: (a (b*c)) | (a (b | c+)?)
• Find a transition diagram NFA that recognizes it.

[Transition diagram: states 0 to 5, start state 0, edges labeled a, b, c and ε]

The string abbc can be accepted.

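Alternative Solution Strategy

[Figure: two separate NFAs. The first, for a (b*c), uses states 1, 2 and 3; the second, for a (b | c+)?, starts at state 4 and includes states 5 and 7. Edges are labeled a, b and c.]
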
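Using Null Transitions to “OR” NFAs

[Figure: the two NFAs for a (b*c) and a (b | c+)? combined into a single NFA with states 0 to 7, where the new start state 0 is joined to both by null (ε) transitions.]
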
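The following sketch shows how such an “OR” of two NFAs can be simulated. The exact edges are an assumption: they are a plausible reading of the figure (state 6 as the target of the b edge from state 5, and accepting states 3, 5, 6 and 7), not a confirmed transcription. The names EPS, eps_closure and accepts are illustrative.

```python
EPS = ""  # marker used here for an ε (null) move; never an input character

# Assumed edges: state 0 has ε moves into the two sub-automata.
MOVE = {
    (0, EPS): {1, 4},
    # a (b*c): 1 --a--> 2, 2 --b--> 2, 2 --c--> 3
    (1, "a"): {2}, (2, "b"): {2}, (2, "c"): {3},
    # a (b | c+)?: 4 --a--> 5, 5 --b--> 6, 5 --c--> 7, 7 --c--> 7
    (4, "a"): {5}, (5, "b"): {6}, (5, "c"): {7}, (7, "c"): {7},
}
ACCEPTING = {3, 5, 6, 7}


def eps_closure(states):
    """All states reachable from states using only ε moves."""
    closure, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for target in MOVE.get((state, EPS), set()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def accepts(text):
    """Simulate the combined NFA by tracking sets of states."""
    current = eps_closure({0})
    for ch in text:
        moved = set()
        for state in current:
            moved |= MOVE.get((state, ch), set())
        current = eps_closure(moved)
    return bool(current & ACCEPTING)
```

Under these assumptions accepts("abbc") is true through the first sub-automaton and accepts("acc") is true through the second, while accepts("abb") is false.
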
Other Concepts

Not all paths may result in acceptance.

Transition diagram: the NFA for (a|b)*abb from the Example NFA slide.

aabb is accepted along the path: 0 --a--> 0 --a--> 1 --b--> 2 --b--> 3

BUT… it is not accepted along the valid path: 0 --a--> 0 --a--> 0 --b--> 0 --b--> 0

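NFA Example

Recognizes: aa* | b | ab

Transition diagram: start --> 0; 0 --ε--> 1; 0 --ε--> 2; 0 --ε--> 3; 1 --a--> 4; 4 --a--> 4; 3 --a--> 2; 2 --b--> 5

An FA can be represented with either a graph or a transition table:

state | ε       | a     | b
0     | 1, 2, 3 | -     | -
1     | -       | 4     | Error
2     | -       | Error | 5
3     | -       | 2     | Error
4     | -       | 4     | Error
5     | -       | Error | Error
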
Deterministic Finite Automata

A DFA is an NFA with a few restrictions:
– No epsilon (ε) transitions
– For every state s, there is at most one transition (s, x) from s for any symbol x in Σ

ADVANTAGES
– Easy to implement a DFA with an algorithm!
– Deterministic behavior

Simulating a DFA

INPUT:
An input string x terminated by an end-of-file character eof (or any other delimiter). A DFA ‘d’ with start state s0 and a set of accepting states F.
OUTPUT:
The answer “yes” if ‘d’ accepts x, “no” otherwise.

Simulating a DFA (cont’d)

METHOD:
Apply the algorithm to the input string x. The function move(s, c) gives the state to which there is a transition from state s on input character c. The function nextchar returns the next character of the input string x.

Simulating a DFA (cont’d)

s = s0;
c = nextchar;
while c ≠ eof do
    s = move(s, c);
    c = nextchar;
end;
if s is in F then return “yes”
else return “no”

The following transition graph shows a DFA accepting the same language, (a|b)*abb, as that accepted by the NFA. With this DFA and the input string ababb, the algorithm follows the sequence of states 0, 1, 2, 1, 2, 3 and returns “yes”.

[Figure: DFA accepting (a|b)*abb, states 0 to 3, edges labeled a and b, traced on the string ababb]

Recall the original NFA: start --> 0; 0 --a--> 0; 0 --b--> 0; 0 --a--> 1; 1 --b--> 2; 2 --b--> 3 (accepting)

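The simulation algorithm translates almost line for line into Python. In the sketch below the transition table DFA_MOVE is an assumption: it is the standard four-state DFA for (a|b)*abb, chosen because it reproduces the state sequence 0, 1, 2, 1, 2, 3 quoted above. The end of the Python string plays the role of eof.

```python
# Assumed DFA for (a|b)*abb: exactly one target per (state, symbol) pair.
DFA_MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
START = 0
ACCEPTING = {3}


def simulate_dfa(text):
    """Return ("yes" or "no", list of visited states) for the input text."""
    s = START
    visited = [s]
    for c in text:              # the loop ends where the pseudocode sees eof
        s = DFA_MOVE[(s, c)]    # s = move(s, c)
        visited.append(s)
    answer = "yes" if s in ACCEPTING else "no"
    return answer, visited
```

Because the table is total over { a, b }, the lookup never fails for strings over that alphabet. A character outside the alphabet would raise KeyError in this sketch.
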
DFA Example

Recognizes: aa* | b | ab

[Transition diagram: DFA with states 0 to 3, start state 0, edges labeled a and b]

Finite State Machines (cont’d)

The pattern for identifiers as commonly defined in programming languages is given by the following regular definition:
ID = Letter (Letter | Digit)*
This represents a string that begins with a letter and continues with any sequence of letters and/or digits. The process of recognizing such a string can be described by the diagram.

Finite State Machines (cont’d)

Transition diagram for ID: start --> 1; 1 --Letter--> 2; 2 --Letter--> 2; 2 --Digit--> 2

Finite State Machines (cont’d)

In the diagram, the circles numbered 1 and 2 represent states, which are locations in the process of recognition that record how much of the pattern has already been seen. The arrowed lines represent transitions that record a change from one state to another upon a match of the character or characters by which they are labeled.

Finite State Machines (cont’d)

In the sample diagram, state 1 is the start state, or the state at which the recognition process begins. By convention, the start state is indicated by drawing an unlabeled arrowed line to it. In state 2 any number of letters and/or digits may be seen, and a match to these returns us to state 2.

Finite State Machines (cont’d)

The states that represent the end of the recognition process, in which we can declare success, are called accepting states and are indicated by drawing a double-line border around the state in the diagram. There may be more than one of these. In the sample diagram, state 2 is an accepting state, indicating that after a letter is seen, any subsequent sequence of letters and digits represents a legal identifier.

Example

The process of recognizing an actual character string as an identifier can now be indicated by listing the sequence of states and transitions in the diagram that are used in the recognition process. For example, the process of recognizing xtemp as an identifier can be indicated as follows:

1 --x--> 2 --t--> 2 --e--> 2 --m--> 2 --p--> 2

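The same trace can be produced by a few lines of code. This sketch hard-codes the two-state identifier diagram. The function name trace_identifier is hypothetical, and Letter and Digit are approximated with Python's str.isalpha and str.isdigit, which also accept non-ASCII letters and digits.

```python
def trace_identifier(text):
    """Return (visited states, accepted?) for the ID = Letter (Letter | Digit)* diagram."""
    state = 1                       # start state
    visited = [state]
    for ch in text:
        if state == 1 and ch.isalpha():
            state = 2               # 1 --Letter--> 2
        elif state == 2 and (ch.isalpha() or ch.isdigit()):
            state = 2               # 2 --Letter--> 2 and 2 --Digit--> 2
        else:
            return visited, False   # no transition drawn: an error
        visited.append(state)
    return visited, state == 2      # state 2 is the accepting state
```

Calling trace_identifier("xtemp") returns the state list 1, 2, 2, 2, 2, 2 together with True, matching the trace written out above.
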
Where Are The Missing Transitions?

The answer is that they represent errors; that is, in recognizing an identifier we cannot accept any characters other than letters from the start state, and letters or digits after that. The convention is that these error transitions are not drawn in the diagram but are simply assumed to always exist. If we were to draw them, the diagram for an identifier would look as shown on the next slide.

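Where Are The Missing Transitions? (cont’d)

Transition diagram with the error transitions drawn: Start --Letter--> In_id; In_id --Letter--> In_id; In_id --Digit--> In_id; Start --Other--> Error; In_id --Other--> Error; Error --Any--> Error
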
Where Are The Missing Transitions? (cont’d)

In the figure, we have labeled the new state Error (since it represents an erroneous occurrence), and we have labeled the error transitions Other. By convention, Other represents any character not appearing in any other transition from the state where it originates. Thus the definition of Other coming from the start state is
Other = ~Letter (any character that is not a letter)

Where Are The Missing Transitions? (cont’d)

The definition of Other coming from the state In_id is
Other = ~(Letter|Digit) (any character that is neither a letter nor a digit)

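One way to make the per-state meaning of Other concrete is to compute an edge label for each character from the labels that leave the current state. The sketch below does that for the identifier diagram with its explicit Error state. The names TABLE, label_for and run are illustrative, and Letter and Digit are assumed to mean ASCII letters and digits.

```python
import string

LETTERS = set(string.ascii_letters)   # assumed meaning of Letter
DIGITS = set(string.digits)           # assumed meaning of Digit

# Edge labels leaving each state, including the usually undrawn error edges.
TABLE = {
    "Start": {"Letter": "In_id", "Other": "Error"},
    "In_id": {"Letter": "In_id", "Digit": "In_id", "Other": "Error"},
    "Error": {"Any": "Error"},
}


def label_for(state, ch):
    """Pick the edge label that ch matches when leaving state."""
    if state == "Error":
        return "Any"
    if ch in LETTERS:
        return "Letter"
    if ch in DIGITS and "Digit" in TABLE[state]:
        return "Digit"
    # Other: any character not covered by another edge out of this state.
    return "Other"


def run(text):
    """Return the state reached after reading all of text."""
    state = "Start"
    for ch in text:
        state = TABLE[state][label_for(state, ch)]
    return state
```

Note that a digit is classified as Other when leaving Start but as Digit when leaving In_id, which is exactly the difference between the two definitions of Other.
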
Where Are The Missing Transitions? (cont’d)

[Transition diagram: Start --Letter--> In_id; In_id --Letter--> In_id; In_id --Digit--> In_id; In_id --{Other}--> Finish (return ID)]

Structure of a Scanner Automaton

How much should we match?

• In general, find the longest match possible.
• E.g., on input 123.45, match this as
   num_const(123.45)
  rather than
   num_const(123), “.”, num_const(45).

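A common way to get the longest match is to keep running the automaton as far as it can go while remembering the last position at which it was in an accepting state. The sketch below applies this to a hypothetical num_const pattern of digits with an optional fractional part. The state names and the function longest_num_const are assumptions made for illustration.

```python
DIGITS = "0123456789"


def longest_num_const(text, pos=0):
    """Return the end index of the longest num_const starting at pos, or None."""
    state = "start"
    last_accept = None              # end index of the longest match seen so far
    i = pos
    while i < len(text):
        ch = text[i]
        if state in ("start", "int") and ch in DIGITS:
            state = "int"           # integer part
        elif state == "int" and ch == ".":
            state = "dot"           # seen "." but no fraction digit yet
        elif state in ("dot", "frac") and ch in DIGITS:
            state = "frac"          # fractional part
        else:
            break                   # no transition: stop scanning
        i += 1
        if state in ("int", "frac"):
            last_accept = i         # accepting state: remember this position
    return last_accept


def scan_num_const(text):
    """Return the longest num_const lexeme at the start of text, or None."""
    end = longest_num_const(text)
    return None if end is None else text[:end]
```

On 123.45 the scanner passes through the accepting position after 123 but keeps going and returns the whole lexeme. On 123.x it backs up to the last accepting position and returns 123.
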
ASSIGNMENT

THANKS