
Lecture 2b

The document outlines the concepts of transition diagrams, finite automata (FA), and nondeterministic finite automata (NFA) in the context of compilers. It explains the construction of transition diagrams for tokens, the formal definitions of finite automata, and the differences between deterministic and nondeterministic automata. Additionally, it discusses how NFAs relate to regular expressions and their role in lexical analysis.

Uploaded by enochmack04
Copyright © All Rights Reserved

WUCS405

COMPILERS
Lecture 2b
Lecture Outline
■ Constructing Transition Diagrams
■ Finite Automata (FA)
■ Nondeterministic Finite Automata (NFA)
Constructing Transition Diagrams for Tokens
■ Transition Diagrams (TDs) are used to represent tokens
– they are automata
■ As characters are read, the relevant TDs are used to
attempt to match the lexeme to a pattern
■ Each TD has:
– States: represented by circles
– Transitions (actions): represented by arrows between states
– A start state: the beginning of a pattern (marked by an incoming arrowhead)
– Final state(s): the end of a pattern (concentric circles)
■ Each TD is deterministic
Components of Transition Diagrams

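■ A circle is a state
■ A circle with an incoming arrow labelled start is the start state
■ An arrow labelled a between two states is a transition on a ∈ Σ
■ A double (concentric) circle is an accepting state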
Example - Automatons
■ A tool to specify a token

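■ Transition diagram for >= and > (start state 0):
– 0 on '>' → 6
– 6 on '=' → 7 (accepting): RTN(GE)
– 6 on other → 8* (accepting): RTN(G)
■ In state 8 we have accepted ">" but have also read one other
character, which must be unread (retracted); the * marks this.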
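The diagram above can be hand-coded as a small state machine. The sketch below is a minimal illustration, assuming the input is a Python string and that the empty string "" stands for end of input; the function name `scan_gt` and the `(token, next_pos)` return shape are hypothetical, not part of the lecture. Retracting the extra character is modelled by returning `forward - 1` as the position where the next scan should start.

```python
from typing import Optional, Tuple


def scan_gt(text: str, pos: int) -> Optional[Tuple[str, int]]:
    """Simulate the > / >= diagram starting at text[pos].

    Returns (token, next_pos), where next_pos is where the next scan
    should begin, or None if the diagram fails.
    """
    state = 0
    forward = pos
    while True:
        # "" stands for end of input (an assumption of this sketch)
        ch = text[forward] if forward < len(text) else ""
        forward += 1
        if state == 0:
            if ch != ">":
                return None  # no edge out of state 0 on this character
            state = 6
        elif state == 6:
            if ch == "=":
                return ("GE", forward)  # state 7: accept, keep the '='
            # state 8*: accept ">", but retract the extra character read
            return ("G", forward - 1)


print(scan_gt("a >= b", 2))  # ('GE', 4)
print(scan_gt("a > b", 2))   # ('G', 3)
```

In the second call the space after ">" is read in state 6 and then given back, so the next scan starts at that space.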
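Sample Regular Expressions and Tokens
Regular Expression   Token   Attribute-Value
ws                   -       -
if                   if      -
then                 then    -
else                 else    -
id                   id      pointer to table entry
num                  num     pointer to table entry
<                    relop   LT
<=                   relop   LE
=                    relop   EQ
<>                   relop   NE
>                    relop   GT
>=                   relop   GE

Note: Each token has a unique token identifier that defines a category of lexemes. The keyword then and the operators = (equality) and <> (not equal) follow a Pascal-like notation; C has no then keyword and writes these operators as == and !=.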
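A table like this maps directly onto a small tokenizer. The sketch below uses Python's `re` module with one named group per token class. The exact patterns for `id` and `num` are assumptions (the table names these tokens but does not give their REs), and the "pointer to table entry" is modelled as an index into a plain list standing in for the symbol table. Keywords are matched as identifiers and then looked up in a reserved-word set, which is one common design choice.

```python
import re
from typing import List, Tuple

# Assumed REs: the table names the tokens, but the exact patterns for
# id and num below are illustrative choices, not taken from the lecture.
TOKEN_SPEC = [
    ("ws", r"[ \t\n]+"),
    ("num", r"\d+(?:\.\d+)?"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("relop", r"<=|<>|<|>=|>|="),  # longer operators listed first
]
KEYWORDS = {"if", "then", "else"}
RELOP_ATTR = {"<": "LT", "<=": "LE", "=": "EQ", "<>": "NE", ">": "GT", ">=": "GE"}
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(text: str) -> Tuple[List[Tuple[str, object]], List[str]]:
    """Return (tokens, table); id/num attributes are indices into table."""
    tokens: List[Tuple[str, object]] = []
    table: List[str] = []  # stand-in for the symbol table
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue  # ws produces no token: it is skipped
        if kind == "id" and lexeme in KEYWORDS:
            tokens.append((lexeme, None))  # keyword: token name is the lexeme
        elif kind in ("id", "num"):
            if lexeme not in table:
                table.append(lexeme)
            tokens.append((kind, table.index(lexeme)))  # "pointer" = index
        else:
            tokens.append(("relop", RELOP_ATTR[lexeme]))
    return tokens, table


tokens, table = tokenize("if x1 <= 42 then y")
print(tokens)  # [('if', None), ('id', 0), ('relop', 'LE'), ('num', 1), ('then', None), ('id', 2)]
```

Python's alternation picks the first alternative that matches, not the longest, which is why `<=` and `<>` are listed before `<`.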
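Example: All RELOPs
■ Transition diagram for relop (start state 0; * marks a state that retracts one character):
– 0 on '<' → 1
– 1 on '=' → 2: return(relop, LE)
– 1 on '>' → 3: return(relop, NE)
– 1 on other → 4*: return(relop, LT)
– 0 on '=' → 5: return(relop, EQ)
– 0 on '>' → 6
– 6 on '=' → 7: return(relop, GE)
– 6 on other → 8*: return(relop, GT)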
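Instead of hand-coding each state, the relop diagram can be stored as data and driven by one loop. In this sketch (names such as `EDGES`, `ACCEPT`, `OTHER` and `relop` are hypothetical), each edge is a `(state, char) -> state` entry, the "other" edges use a marker key, and each accepting state records its attribute and whether it is starred. The empty string is assumed to represent end of input.

```python
from typing import Dict, Optional, Tuple

OTHER = "other"  # hypothetical key for the diagram's "other" edges
EDGES: Dict[Tuple[int, str], int] = {
    (0, "<"): 1, (0, "="): 5, (0, ">"): 6,
    (1, "="): 2, (1, ">"): 3, (1, OTHER): 4,
    (6, "="): 7, (6, OTHER): 8,
}
# Accepting states: (attribute value, starred / must retract?)
ACCEPT: Dict[int, Tuple[str, bool]] = {
    2: ("LE", False), 3: ("NE", False), 4: ("LT", True),
    5: ("EQ", False), 7: ("GE", False), 8: ("GT", True),
}


def relop(text: str, pos: int = 0) -> Optional[Tuple[str, str, int]]:
    """Return ("relop", attribute, next_pos), or None if no relop starts at pos."""
    state, forward = 0, pos
    while state not in ACCEPT:
        ch = text[forward] if forward < len(text) else ""  # "" = end of input
        forward += 1
        nxt = EDGES.get((state, ch))
        if nxt is None:
            nxt = EDGES.get((state, OTHER))  # fall back to an "other" edge
        if nxt is None:
            return None  # fail: this diagram does not match here
        state = nxt
    attr, retract = ACCEPT[state]
    if retract:
        forward -= 1  # starred state: give back the extra character
    return ("relop", attr, forward)


print(relop("a<b", 1))  # ('relop', 'LT', 2)
```

Keeping the diagram in tables means adding an operator only touches `EDGES` and `ACCEPT`, not the driver loop.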
Example: combined finite automaton
Finite State Automaton
■ Finite automata are recognizers:
– they take an input string and determine whether it is a
valid string of the language.
– they simply say "yes" or "no" about each possible input
string.
■ An FA can be used to recognize the tokens specified by an RE
■ An FA is a simple, idealized computer that recognizes strings
as belonging to regular sets
Finite State Automaton
■ A finite automaton is, in the abstract sense, a machine that
has a finite number of states and a finite number of
transitions between these.
■ A transition between states is usually labelled by a
character from the input alphabet, but we can also use
transitions marked with ε (epsilon transitions)
■ A finite automaton can be used to decide whether an input string
is a member of some particular set of strings.
Finite Automata
■ A Finite Automaton consists of:
– A finite set of states
– A finite vocabulary, denoted Σ
– A set of transitions (or moves) from one state to another,
labeled with characters in Σ
– A special state called the start state
– A subset of the states called the accepting, or final, states
■ An FA can also be represented graphically by a TD
Finite Automata
■ To begin, we select one of the states of the automaton as
the starting state.
■ We start in this state and in each step, we can do one of
the following:
– Follow an epsilon transition to another state, or
– Read a character from the input and follow a transition
labelled by that character.
■ When all characters from the input are read, we see if the
current state is marked as being accepting.
■ If so, the string we have read from the input is in the
language defined by the automaton.
Formal Definition of Finite Automata
■ A Finite Automaton is a five-tuple $(Q, \Sigma, \delta, q_0, F)$, where:
– $Q$ is a finite set of states
– $\Sigma$ is a finite set of input symbols called the alphabet
– $\delta: Q \times \Sigma \to Q$ is the transition function (the moves from one
state to another)
– $q_0 \in Q$ is a special state called the start state
– $F \subseteq Q$ is the subset of states called the accepting or final states
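Example of a Finite Automaton
[State diagram of M1: q1 (start) loops on 0 and goes to q2 on 1; q2 (accepting) loops on 1 and goes to q3 on 0; q3 goes to q2 on 0 and on 1]
■ Transition table:
       0    1
q1     q1   q2
q2     q3   q2
q3     q2   q2

■ We can describe M1 formally by writing $M_1 = (Q, \Sigma, \delta, q_1, F)$, where
■ $Q = \{q_1, q_2, q_3\}$
■ $\Sigma = \{0, 1\}$
■ $q_1$ is the start state
■ $F = \{q_2\}$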
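The five-tuple translates almost literally into code: `DELTA` is $\delta$ as a nested dictionary, `START` is $q_1$ and `ACCEPTING` is $F$. The sketch below runs M1 by following exactly one transition per input symbol. The plain-English description of M1's language in `described_language` is our own reading of the table (at least one 1, and an even number of 0s after the last 1); the loop at the end checks that reading on every binary string up to length 8 rather than taking it on trust.

```python
import itertools
from typing import Dict, Set

# M1's transition function: DELTA[state][symbol] -> next state
DELTA: Dict[str, Dict[str, str]] = {
    "q1": {"0": "q1", "1": "q2"},
    "q2": {"0": "q3", "1": "q2"},
    "q3": {"0": "q2", "1": "q2"},
}
START = "q1"
ACCEPTING: Set[str] = {"q2"}


def run_dfa(delta: Dict[str, Dict[str, str]], start: str,
            accepting: Set[str], w: str) -> bool:
    """Read w one symbol at a time; accept iff we end in an accepting state."""
    state = start
    for ch in w:
        state = delta[state][ch]  # deterministic: exactly one next state
    return state in accepting


def described_language(w: str) -> bool:
    """At least one 1, and an even number of 0s after the last 1."""
    return "1" in w and (len(w) - w.rindex("1") - 1) % 2 == 0


# Cross-check M1 against the description on all binary strings up to length 8.
for n in range(9):
    for bits in itertools.product("01", repeat=n):
        w = "".join(bits)
        assert run_dfa(DELTA, START, ACCEPTING, w) == described_language(w), w
```

For example, 100 ends in q2 (two 0s after the last 1) and is accepted, while 10 ends in q3 and is rejected.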
Example of a Finite Automaton
■ $M = (\{s, r, t, u, v\}, \{a, b\}, \delta, s, \{r, t\})$
■ Transition table δ:
       a    b
s      t    r
t      t    u
r      v    r
u      t    u
v      v    r
[State diagram: start state s goes to t on a and to r on b; t and r are accepting]
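Tracing the states a DFA visits makes the table easy to check by hand. This sketch encodes the table above (the function name `trace` is hypothetical) and returns the full state sequence. The closing loop tests our reading of the table, namely that M accepts exactly the nonempty strings that start and end with the same symbol, on every string over {a, b} up to length 6.

```python
import itertools
from typing import Dict, List, Set, Tuple

DELTA: Dict[str, Dict[str, str]] = {
    "s": {"a": "t", "b": "r"},
    "t": {"a": "t", "b": "u"},
    "r": {"a": "v", "b": "r"},
    "u": {"a": "t", "b": "u"},
    "v": {"a": "v", "b": "r"},
}
START = "s"
ACCEPTING: Set[str] = {"r", "t"}


def trace(w: str) -> Tuple[List[str], bool]:
    """Return the sequence of states visited on w and whether w is accepted."""
    states = [START]
    for ch in w:
        states.append(DELTA[states[-1]][ch])
    return states, states[-1] in ACCEPTING


print(trace("abba"))  # (['s', 't', 'u', 'u', 't'], True)
print(trace("ab"))    # (['s', 't', 'u'], False)

# Our reading of the table: nonempty strings that start and end with the same symbol.
for n in range(7):
    for chars in itertools.product("ab", repeat=n):
        w = "".join(chars)
        assert trace(w)[1] == (len(w) > 0 and w[0] == w[-1]), w
```

The a-branch (t, u) remembers "started with a" and the b-branch (r, v) remembers "started with b"; within each branch the state records whether the last symbol matched the first.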
Two Types of Finite automata:
■ Nondeterministic finite automata (NFA) have no restrictions
on the labels of their edges.
– A symbol can label several edges out of the same state, and ε, the
empty string, is a possible label.
■ We may have a choice of several actions at each step:
– We can choose between an epsilon transition and a transition
on an alphabet character, and if there are several transitions with
the same symbol, we can choose among these.
– This makes the automaton nondeterministic, as the choice of action
is not determined solely by looking at the current state and input.
– It may be that some choices lead to an accepting state while others
do not.
Two Types of Finite automata:
■ Deterministic finite automata (DFA) have, for each state
and for each symbol of the input alphabet, exactly one edge
with that symbol leaving that state.
■ Deterministic and nondeterministic finite automata recognize
exactly the same class of languages (the regular languages).
– Either kind can be used to decide whether an input string
is a member of some particular set of strings.
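One standard way to see the equivalence is the subset construction, which is not covered on these slides: each DFA state is the set of NFA states the NFA could currently be in. The sketch below applies it to the ε-free NFA for (a|b)*abb that appears later in this lecture; handling ε-transitions would additionally require taking ε-closures, which is omitted here for brevity. Names such as `subset_construction` and `dtran` are hypothetical. The final loop compares the resulting DFA with Python's regex engine.

```python
import itertools
import re
from typing import Dict, FrozenSet, List, Set, Tuple

# NFA for (a|b)*abb; missing entries mean the empty set.
NFA_MOVE: Dict[Tuple[int, str], Set[int]] = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
NFA_START, NFA_FINAL, SIGMA = 0, {3}, ("a", "b")

DState = FrozenSet[int]


def subset_construction() -> Tuple[DState, Dict[DState, Dict[str, DState]], Set[DState]]:
    """Each DFA state is a set of NFA states (ε-free NFA assumed)."""
    start: DState = frozenset({NFA_START})
    dtran: Dict[DState, Dict[str, DState]] = {}
    worklist: List[DState] = [start]
    while worklist:
        S = worklist.pop()
        if S in dtran:
            continue  # already processed
        dtran[S] = {}
        for c in SIGMA:
            T = frozenset(t for s in S for t in NFA_MOVE.get((s, c), set()))
            dtran[S][c] = T
            worklist.append(T)
    accepting = {S for S in dtran if S & NFA_FINAL}
    return start, dtran, accepting


start, dtran, accepting = subset_construction()


def dfa_accepts(w: str) -> bool:
    state = start
    for ch in w:
        state = dtran[state][ch]  # exactly one move per symbol
    return state in accepting


# The DFA agrees with Python's regex engine on every string up to length 8.
for n in range(9):
    for chars in itertools.product("ab", repeat=n):
        w = "".join(chars)
        assert dfa_accepts(w) == (re.fullmatch(r"(a|b)*abb", w) is not None), w
```

Here the construction yields four DFA states, {0}, {0, 1}, {0, 2} and {0, 3}, and only {0, 3} is accepting. In general the DFA can have up to 2^|Q| states, which is the price paid for determinism.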
Nondeterministic Finite Automata (NFA)
■ An NFA is one that may have a choice of edges (labelled with the
same symbol) to follow out of a state.
■ Or it may have special edges labelled with ε (epsilon) that
can be followed without consuming any symbol from the input.
– It may be that some choices lead to an accepting state while others
do not.
– In an NFA, several choices may exist for the next state at any
point.
NFA
■ Transitions between states are usually labelled by a
character from the input alphabet (or by ε).
■ We start in the start state and in each step, we can do one
of the following:
– Follow an epsilon transition to another state, or
– Read a character from the input and follow a transition
labelled by that character
■ When all characters from the input are read, we see if the
current state is marked as being accepting.
■ If so, the string we have read from the input is in the
language defined by the automaton.
Definition
■ A nondeterministic finite automaton consists of a set S of
states. One of these states, $s_0 \in S$, is called the starting
state (initial state) of the automaton, and a subset $F \subseteq S$ of
the states are the accepting states (final states).
■ Additionally, we have a set T of transitions. Each transition t
connects a pair of states $s_1$ and $s_2$ and is labelled with a
symbol, which is either a character c from the alphabet Σ,
or the symbol ε, which indicates an epsilon-transition.
■ A transition from state s to state t on the symbol c is written
as $s \xrightarrow{c} t$.
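The set T of labelled transitions and a transition function carry the same information: grouping the triples (s, c, t) by (s, c) gives, for each state and symbol, the set of possible next states. The sketch below (the function name `delta_from_transitions` is hypothetical) does this grouping for the (a|b)*abb NFA that appears later in this lecture. An ε-transition would simply be a triple whose label is whatever value is chosen to stand for ε.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def delta_from_transitions(
    T: Iterable[Tuple[int, str, int]],
) -> Dict[Tuple[int, str], Set[int]]:
    """Group transitions (s, c, t) into delta[(s, c)] = {t | (s, c, t) in T}.

    Pairs with no transition are simply absent (read them as the empty set).
    """
    delta: Dict[Tuple[int, str], Set[int]] = defaultdict(set)
    for s, c, t in T:
        delta[(s, c)].add(t)
    return dict(delta)


# The (a|b)*abb NFA written as the set T of transitions.
T = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "b", 3)}
delta = delta_from_transitions(T)
print(delta[(0, "a")])  # {0, 1}: one symbol labels two edges out of state 0
```

The set-valued result is exactly what makes the automaton nondeterministic: a DFA would have a single target for every (state, symbol) pair.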
Formal Definition of NFA
■ A nondeterministic finite automaton (NFA) is a five-tuple
$(Q, \Sigma, \delta, q_0, F)$ where:
– $Q$ is a finite set of states
– $\Sigma$ is a finite input alphabet
– $\delta: Q \times \Sigma_\varepsilon \to \mathcal{P}(Q)$, with $\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}$, is the transition function, written
move(state, symbol) → set of states
– $q_0 \in Q$ is the start state (initial state)
– $F \subseteq Q$ is the set of final or accepting states
NFA
■ The figure below shows an NFA
■ This NFA recognizes the language described by the regular
expression a*(a|b).
■ As an exercise, show the sequence of transitions that recognizes the
string aab.
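Since the figure is not reproduced here, the sketch below uses a HYPOTHETICAL NFA for a*(a|b) that may differ from the slide's: state 0 loops on a, goes to state 1 on a or on b, and state 1 is accepting. The function `trace` (also a hypothetical name) records the set of states the NFA could be in after each prefix of the input, which covers every possible choice at once.

```python
from typing import Dict, List, Set, Tuple

# HYPOTHETICAL NFA for a*(a|b): the slide's figure is not available.
# State 0 loops on a; 0 --a--> 1 and 0 --b--> 1; state 1 is accepting.
MOVE: Dict[Tuple[int, str], Set[int]] = {(0, "a"): {0, 1}, (0, "b"): {1}}
START, FINAL = 0, {1}


def trace(w: str) -> Tuple[List[Set[int]], bool]:
    """Return the set of possible states after each prefix of w, and acceptance."""
    current = {START}
    steps = [current]
    for ch in w:
        current = {t for s in current for t in MOVE.get((s, ch), set())}
        steps.append(current)
    return steps, bool(current & FINAL)


steps, ok = trace("aab")
print(steps, ok)  # [{0}, {0, 1}, {0, 1}, {1}] True
```

The accepting path hidden in these sets is 0 on a → 0, 0 on a → 0, 0 on b → 1. A path that moves to state 1 on the first a gets stuck on the second a, but that does not matter because another path succeeds.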
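Example NFA
[State diagram of N1: q1 (start) loops on 0, 1 and goes to q2 on 1; q2 goes to q3 on 0 or ε; q3 goes to q4 on 1; q4 (accepting) loops on 0, 1]

$N_1 = (\{q_1, q_2, q_3, q_4\}, \{0, 1\}, \delta, q_1, \{q_4\})$

δ =
       0       1           ε
q1     {q1}    {q1, q2}    ∅
q2     {q3}    ∅           {q3}
q3     ∅       {q4}        ∅
q4     {q4}    {q4}        ∅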
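N1 has an ε-transition, so simulating it needs one extra step: after every move, also add all states reachable by ε-edges (the ε-closure). The sketch below encodes the table above, with the empty-set entries omitted and the string "ε" assumed as the label for ε-edges; the names `eps_closure` and `n1_accepts` are hypothetical. The final loop checks the well-known description of N1's language (binary strings containing 101 or 11 as a substring) on all strings up to length 8.

```python
import itertools
from typing import Dict, Set, Tuple

EPS = "ε"  # label used for epsilon-transitions in this sketch
# N1's transition function; entries that are the empty set are omitted.
DELTA: Dict[Tuple[str, str], Set[str]] = {
    ("q1", "0"): {"q1"}, ("q1", "1"): {"q1", "q2"},
    ("q2", "0"): {"q3"}, ("q2", EPS): {"q3"},
    ("q3", "1"): {"q4"},
    ("q4", "0"): {"q4"}, ("q4", "1"): {"q4"},
}
START, FINAL = "q1", {"q4"}


def eps_closure(states: Set[str]) -> Set[str]:
    """All states reachable from `states` using only ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        for t in DELTA.get((stack.pop(), EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def n1_accepts(w: str) -> bool:
    current = eps_closure({START})
    for ch in w:
        moved = {t for s in current for t in DELTA.get((s, ch), set())}
        current = eps_closure(moved)  # follow ε-edges after every move
    return bool(current & FINAL)


# N1 accepts exactly the strings containing 101 or 11 as a substring.
for n in range(9):
    for bits in itertools.product("01", repeat=n):
        w = "".join(bits)
        assert n1_accepts(w) == ("101" in w or "11" in w), w
```

The ε-edge from q2 to q3 is what lets N1 accept 11: after reading the first 1 it may already be in q3, ready to read the second 1 into q4.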
Example
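The transition graph for an NFA recognizing the language of the
regular expression (a|b)*abb:
[Transition graph: 0 (start) loops on a and b; 0 goes to 1 on a; 1 goes to 2 on b; 2 goes to 3 on b; 3 is accepting]

S = {0, 1, 2, 3}
$s_0 = 0$
F = {3}
Σ = {a, b}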
How Does An NFA Work?
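[Transition graph: the same NFA for (a|b)*abb]

■ Given an input string, we trace moves
■ If there is no more input and we are in a final state, ACCEPT
EXAMPLE: Input: ababb
– One path:
move(0, a) = 0
move(0, b) = 0
move(0, a) = 1
move(1, b) = 2
move(2, b) = 3
ACCEPT!
– Another path:
move(0, a) = 1
move(1, b) = 2
move(2, a) = ? (undefined)
REJECT!
■ The NFA accepts ababb, because at least one path consumes all the input and ends in a final state; a path that gets stuck does not by itself cause rejection.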
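Tracing moves by hand amounts to exploring every possible path. The sketch below enumerates all paths of the (a|b)*abb NFA on an input with a recursive generator (the name `all_paths` is hypothetical); a path ends either when the input is used up or when a move is undefined. The string is accepted if any path accepts.

```python
from typing import Dict, Iterator, List, Tuple

# The (a|b)*abb NFA; a missing key means move(state, symbol) is undefined.
MOVE: Dict[Tuple[int, str], List[int]] = {
    (0, "a"): [0, 1], (0, "b"): [0], (1, "b"): [2], (2, "b"): [3],
}
FINAL = {3}


def all_paths(w: str, state: int = 0) -> Iterator[Tuple[List[int], bool]]:
    """Yield (states visited, accepted?) for every way of reading w from `state`."""
    if not w:
        yield [state], state in FINAL  # input used up: accept iff final
        return
    targets = MOVE.get((state, w[0]), [])
    if not targets:
        yield [state], False  # stuck: the move is undefined
        return
    for t in targets:
        for rest, ok in all_paths(w[1:], t):
            yield [state] + rest, ok


for path, ok in all_paths("ababb"):
    print(path, "ACCEPT" if ok else "REJECT")
print("ababb accepted:", any(ok for _, ok in all_paths("ababb")))
```

For ababb this prints three paths: [0, 0, 0, 0, 0, 0] REJECT, [0, 0, 0, 1, 2, 3] ACCEPT and [0, 1, 2] REJECT, followed by "ababb accepted: True". The number of paths can grow exponentially with the input length; tracking the set of all current states at once avoids that blow-up.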
Transition Table
■ We can also represent an NFA by a transition table, whose
rows correspond to states, and whose columns correspond
to the input symbols and ε.
– The transition table has the advantage that we can easily
find the transitions on a given state and input.
– The disadvantage is that it takes a lot of space when the
input alphabet is large.
State   a        b      ε
0       {0, 1}   {0}    ∅
1       ∅        {2}    ∅
2       ∅        {3}    ∅
3       ∅        ∅      ∅
Handling Undefined Transitions
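■ We can handle undefined transitions by defining one more
state, a "death" state, and redirecting every previously
undefined transition to this death state.
[Transition graph: the NFA for (a|b)*abb with a death state 4 added]
– 1 on a → 4
– 2 on a → 4
– 3 on a, b → 4
– 4 on a, b → 4 (the death state is not accepting and never leaves itself)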
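Adding a death state is a mechanical change to the transition table: every empty (∅) entry, including those of the new state itself, is pointed at the death state. The sketch below does this for the table above; the function name `add_death_state` and the choice of 4 as the death state's number are assumptions. Because the death state is not accepting and only loops to itself, the accepted language does not change.

```python
from typing import Dict, Iterable, Set, Tuple

Table = Dict[Tuple[int, str], Set[int]]

# Transition table of the (a|b)*abb NFA; missing entries are ∅ (undefined).
TABLE: Table = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}


def add_death_state(table: Table, states: Iterable[int],
                    sigma: Iterable[str], dead: int) -> Table:
    """Return a copy of `table` where every undefined move goes to `dead`.

    `dead` is not accepting and loops to itself, so the language is unchanged.
    """
    symbols = list(sigma)
    full = dict(table)
    for s in list(states) + [dead]:
        for c in symbols:
            if not full.get((s, c)):
                full[(s, c)] = {dead}
    return full


full = add_death_state(TABLE, [0, 1, 2, 3], "ab", dead=4)
for (s, c), targets in sorted(full.items()):
    print(s, c, sorted(targets))
```

After the change every (state, symbol) pair has at least one target, so a simulation never has to special-case an undefined move.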
NFA – Regular Expressions & Compilation
■ Problems with NFAs for Regular Expressions:
– A valid input might not be accepted if the automaton makes a wrong choice at some step
– The NFA may behave differently (follow different paths) on the same input
■ Relationship of NFAs to Compilation:
– A regular expression is "recognized" by an NFA
– A regular expression is the "pattern" for a "token"
– Tokens are the building blocks of lexical analysis
– A lexical analyzer can be described by a collection of NFAs,
one NFA per language token.
Example
■ Given the regular expression (a (b*c)) | (a (b | c+)?), find
a transition diagram NFA that recognizes it.
[Figure: NFA transition diagram with states 0 to 5, start state 0, and two ε-transitions]
Alternative Solution
■ a (b*c): 1 on a → 2, 2 on b → 2, 2 on c → 3
■ a (b | c+)?: 4 on a → 5, 5 on b → 6, 5 on c → 7, 7 on c → 7

Now that you have the individual diagrams, "or" them as follows:
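Using Null Transitions to "OR" NFAs
■ Add a new start state 0 with ε (null) transitions to the start states 1 and 4 of the two diagrams.
[Figure: the two NFAs joined by ε-transitions from the new start state 0]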
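The "OR" step can be checked mechanically. The sketch below encodes an NFA of the shape described for this example: a new start state 0 with ε-edges into the two branches. The exact states and which ones are accepting (3 for the first branch; 5, 6 and 7 for the optional second part) are our assumption and may not match the slide's figure. The loop at the end compares it with Python's `re` module on every string over {a, b, c} up to length 5.

```python
import itertools
import re
from typing import Dict, Optional, Set, Tuple

EPS: Optional[str] = None  # None stands for an ε (null) transition here
# ASSUMED NFA (cross-checked below, not necessarily the slide's):
#   0 --ε--> 1, 0 --ε--> 4                    the "OR" of the two branches
#   1 --a--> 2, 2 --b--> 2, 2 --c--> 3        a(b*c)
#   4 --a--> 5, 5 --b--> 6, 5 --c--> 7, 7 --c--> 7   a(b|c+)?
MOVE: Dict[Tuple[int, Optional[str]], Set[int]] = {
    (0, EPS): {1, 4},
    (1, "a"): {2}, (2, "b"): {2}, (2, "c"): {3},
    (4, "a"): {5}, (5, "b"): {6}, (5, "c"): {7}, (7, "c"): {7},
}
START, FINAL = 0, {3, 5, 6, 7}


def closure(states: Set[int]) -> Set[int]:
    """Add every state reachable through ε-transitions."""
    result, stack = set(states), list(states)
    while stack:
        for t in MOVE.get((stack.pop(), EPS), set()):
            if t not in result:
                result.add(t)
                stack.append(t)
    return result


def accepts(w: str) -> bool:
    current = closure({START})
    for ch in w:
        current = closure({t for s in current for t in MOVE.get((s, ch), set())})
    return bool(current & FINAL)


# Cross-check against Python's regex engine on every string of length <= 5.
PATTERN = re.compile(r"a(b*c)|a(b|c+)?")
for n in range(6):
    for chars in itertools.product("abc", repeat=n):
        w = "".join(chars)
        assert accepts(w) == (PATTERN.fullmatch(w) is not None), w
```

State 5 is accepting because the whole (b | c+) part is optional, so the single string a is in the language.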
Converting a regular expression to an NFA
■ We construct an NFA compositionally from an RE.
■ For each subexpression, we can construct an NFA
fragment and then combine these fragments into bigger
fragments.
■ A fragment is not a complete NFA, so we complete the
construction by adding the necessary components to make
a complete NFA.
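The slides' fragment figures are not reproduced here, so the sketch below uses Thompson-style fragments, which may differ in detail from the lecture's: every fragment has one entry state and one exit state, and the basic cases (a symbol, concatenation, union, star) are glued together with ε-edges. The fragment is "completed" by treating its exit as the only accepting state. Names such as `symbol`, `concat`, `union` and `star` are hypothetical. The example builds (a|b)*ac bottom-up and checks it against Python's `re` module.

```python
import itertools
import re
from typing import Dict, Optional, Set, Tuple

Fragment = Tuple[int, int]  # (entry state, exit state)


class NFA:
    """States are ints; edges[(state, label)] is a set of targets; label None means ε."""

    def __init__(self) -> None:
        self.edges: Dict[Tuple[int, Optional[str]], Set[int]] = {}
        self.count = 0

    def new_state(self) -> int:
        self.count += 1
        return self.count - 1

    def add(self, s: int, label: Optional[str], t: int) -> None:
        self.edges.setdefault((s, label), set()).add(t)


def symbol(nfa: NFA, c: str) -> Fragment:
    s, t = nfa.new_state(), nfa.new_state()
    nfa.add(s, c, t)
    return s, t


def concat(nfa: NFA, f1: Fragment, f2: Fragment) -> Fragment:
    nfa.add(f1[1], None, f2[0])  # ε from the end of f1 to the start of f2
    return f1[0], f2[1]


def union(nfa: NFA, f1: Fragment, f2: Fragment) -> Fragment:
    s, t = nfa.new_state(), nfa.new_state()
    for entry, exit_ in (f1, f2):
        nfa.add(s, None, entry)
        nfa.add(exit_, None, t)
    return s, t


def star(nfa: NFA, f: Fragment) -> Fragment:
    s, t = nfa.new_state(), nfa.new_state()
    nfa.add(s, None, f[0])     # enter the body
    nfa.add(s, None, t)        # or skip it (zero repetitions)
    nfa.add(f[1], None, f[0])  # repeat the body
    nfa.add(f[1], None, t)     # or leave
    return s, t


def accepts(nfa: NFA, frag: Fragment, w: str) -> bool:
    """Complete the fragment by treating its exit as the only accepting state."""

    def closure(states: Set[int]) -> Set[int]:
        result, stack = set(states), list(states)
        while stack:
            for t in nfa.edges.get((stack.pop(), None), set()):
                if t not in result:
                    result.add(t)
                    stack.append(t)
        return result

    current = closure({frag[0]})
    for ch in w:
        current = closure({t for s in current for t in nfa.edges.get((s, ch), set())})
    return frag[1] in current


# (a|b)*ac, built bottom-up from its subexpressions.
nfa = NFA()
a_or_b = union(nfa, symbol(nfa, "a"), symbol(nfa, "b"))
whole = concat(nfa, concat(nfa, star(nfa, a_or_b), symbol(nfa, "a")), symbol(nfa, "c"))

for n in range(6):
    for chars in itertools.product("abc", repeat=n):
        w = "".join(chars)
        assert accepts(nfa, whole, w) == (re.fullmatch(r"(a|b)*ac", w) is not None), w
```

Each rule adds at most two states and four ε-edges, so the NFA grows linearly with the size of the regular expression.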
Example
■ NFA for the regular expression (a|b)∗ac
Try
■ Given the regular expression a∗(a|b)aa, construct an
equivalent NFA using the method described here.

■ Given the regular expression ((a|b)(a|bb))∗, construct an
equivalent NFA.
