Lecture 2b
COMPILERS
Lecture Outline
■ Constructing Transition Diagrams
■ Finite Automata (FA)
■ Nondeterministic Finite Automata (NFA)
Constructing Transition Diagrams for Tokens
■ Transition Diagrams (TD) are used to represent the tokens
– these are automata
■ As characters are read, the relevant TDs are used to
attempt to match a lexeme to a pattern
■ Each TD has:
– States : Represented by Circles
– Actions : Represented by Arrows between states
– Start State : Beginning of a pattern (Arrowhead)
– Final State(s) : End of pattern (Concentric Circles)
■ Each TD is Deterministic
Components of Transition Diagrams
[Figure: an arrow labeled a is a transition on 𝑎 ∈ Σ; a double circle is an accepting state]
Example - Automatons
■ A tool to specify a token
[Figure: an edge labeled "other" leads into accepting state 8, marked *, with action RTN(G)]
■ We have accepted ">" and have read one other character, which must be
unread.
Sample REs and Tokens in C Language
Regular Expression   Token   Attribute-Value
ws                   -       -
if                   if      -
then                 then    -
else                 else    -
id                   id      pointer to table entry
num                  num     pointer to table entry
<                    relop   LT
<=                   relop   LE
=                    relop   EQ
<>                   relop   NE
>                    relop   GT
>=                   relop   GE
Note: Each token has a unique token identifier that defines its category of lexemes
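The table above can be read as a lookup from a lexeme to a (token, attribute) pair. The sketch below is only an illustration: the function name `classify`, the use of a Python list as the symbol table, and the rule "starts with a digit means num" are assumptions, not part of the lecture.

```python
# Token table from the slide: lexeme -> (token, attribute value).
# `classify` and the list-based symbol table are illustrative assumptions.
KEYWORDS = {"if", "then", "else"}
RELOPS = {"<": "LT", "<=": "LE", "=": "EQ", "<>": "NE", ">": "GT", ">=": "GE"}

def classify(lexeme, symbol_table):
    """Return a (token, attribute) pair for one non-empty, already-isolated lexeme."""
    if lexeme.isspace():
        return None                      # ws: no token, no attribute
    if lexeme in KEYWORDS:
        return (lexeme, None)            # keywords carry no attribute
    if lexeme in RELOPS:
        return ("relop", RELOPS[lexeme])
    # id and num both carry a "pointer to table entry"; a list index stands in for it.
    token = "num" if lexeme[0].isdigit() else "id"
    if lexeme not in symbol_table:
        symbol_table.append(lexeme)
    return (token, symbol_table.index(lexeme))
```

All six relational operators share the single token relop and are told apart only by the attribute value, which is why the table has one token name for six regular expressions.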
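Example: All C RELOPs
[Figure: transition diagram for relop, start state 0]
State   Input   Next state   Action
0       <       1
0       =       5            return(relop, EQ)
0       >       6
1       =       2            return(relop, LE)
1       >       3            return(relop, NE)
1       other   4 *          return(relop, LT)
6       =       7            return(relop, GE)
6       other   8 *          return(relop, GT)
* : the "other" character has been read and must be unread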
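The relop diagram can be followed by hand-written code with one branch per state. This is a minimal sketch, not the lecture's implementation: the function name `relop`, the `(token, attribute), next_pos` return shape, and treating end of input as "other" are all assumptions. Instead of reading the "other" character and then unreading it, the sketch peeks at it without consuming it, which has the same effect as the * states.

```python
def relop(text, pos=0):
    """Follow the relop transition diagram starting at text[pos].

    Returns ((token, attribute), next_pos), or None if no relop starts here.
    """
    def peek(i):
        return text[i] if i < len(text) else ""   # "" stands for end of input

    c = peek(pos)
    if c == "<":                                   # state 0 -> 1
        nxt = peek(pos + 1)
        if nxt == "=":
            return ("relop", "LE"), pos + 2        # state 2
        if nxt == ">":
            return ("relop", "NE"), pos + 2        # state 3
        return ("relop", "LT"), pos + 1            # state 4*: "other" is not consumed
    if c == "=":
        return ("relop", "EQ"), pos + 1            # state 5
    if c == ">":                                   # state 0 -> 6
        if peek(pos + 1) == "=":
            return ("relop", "GE"), pos + 2        # state 7
        return ("relop", "GT"), pos + 1            # state 8*: "other" is not consumed
    return None
```

The returned position tells the caller where the next lexeme starts, so after ">" followed by another character the scan resumes at that character.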
Example: combined finite automaton
Finite State Automaton
■ Finite automata are recognizers:
– they take an input string and determine whether it is a
valid string of the language.
– they simply say "yes" or "no" about each possible input
string.
■ An FA can be used to recognize the tokens specified by an RE
■ An FA is a simple, idealized computer that recognizes strings
as belonging to regular sets
Finite State Automaton
■ A finite automaton is, in the abstract sense, a machine that
has a finite number of states and a finite number of
transitions between these.
■ A transition between states is usually labelled by a
character from the input alphabet, but we can also use
transitions marked with ε (epsilon transitions)
■ A finite automaton can be used to decide whether an input string
is a member of some particular set of strings.
Finite Automata
■ A Finite Automaton consists of:
– A finite set of states
– A finite vocabulary, denoted Σ
– A set of transitions (or moves) from one state to another,
labeled with characters in Σ
– A special state called the start state
– A subset of the states called the accepting, or final, states
■ An FA can also be represented graphically using a TD
Finite Automata
■ To begin, we select one of the states of the automaton as
the starting state.
■ We start in this state and in each step, we can do one of
the following:
– Follow an epsilon transition to another state, or
– Read a character from the input and follow a transition
labelled by that character.
■ When all characters from the input are read, we see if the
current state is marked as being accepting.
■ If so, the string we have read from the input is in the
language defined by the automaton.
Formal Definition of Finite Automata
■ A Finite Automaton is a five-tuple (𝑄, Σ, 𝛿, 𝑞₀, 𝐹), where:
– 𝑄 is a finite set of states
– Σ is a finite set of input symbols called the alphabet
– 𝛿: 𝑄×Σ → 𝑄 is the transition function (the moves from one
state to another)
– 𝑞₀ ∈ 𝑄 is a special state called the start state
– 𝐹 ⊆ 𝑄 is a subset of states called accepting or final states
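The five-tuple maps directly onto a small table-driven recognizer. The sketch below assumes 𝛿 is stored as a Python dictionary keyed by (state, symbol). The example automaton (binary strings ending in 1) is hypothetical and chosen only for illustration.

```python
def dfa_accepts(delta, start, accepting, string):
    """Run a deterministic FA given as (delta, start, accepting) on `string`."""
    state = start
    for symbol in string:
        state = delta[(state, symbol)]   # exactly one move per (state, symbol)
    return state in accepting            # "yes" only if we end in a final state

# Hypothetical example: binary strings that end in 1.
Q = {"q0", "q1"}
SIGMA = {"0", "1"}
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q1",
}
START = "q0"
FINAL = {"q1"}
```

A call such as `dfa_accepts(DELTA, START, FINAL, "0101")` walks one transition per input character and only checks membership in 𝐹 after the whole input has been read.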
Example of a Finite Automaton
[Figure: transition diagram over states 𝑞₀, 𝑞₁, 𝑞₂ with edges labeled 0, 1, and 0,1]
Transition table:
        0     1
𝑞₀     𝑞₀    𝑞₂
𝑞₂     𝑞₁    𝑞₂
𝑞₁     𝑞₂    𝑞₂
[Figure: transition diagram over states s, t, r, u, v with edges labeled a and b]
𝛿 =
       a    b
s      t    r
t      t    u
r      v    r
u      t    u
v      v    r
Two Types of Finite automata:
■ Nondeterministic finite automata (NFA) have no restrictions
on the labels of their edges.
– A symbol can label several edges out of the same state, and ε, the
empty string, is a possible label.
■ We may have a choice of several actions at each step:
– We can choose between either an epsilon transition or a transition
on an alphabet character, and if there are several transitions with
the same symbol, we can choose between these.
– This makes the automaton nondeterministic, as the choice of action
is not determined solely by looking at the current state and input.
– It may be that some choices lead to an accepting state while others
do not.
Two Types of Finite automata:
■ Deterministic finite automata (DFA) have, for each state
and for each symbol of the input alphabet, exactly one edge
with that symbol leaving that state.
■ Both deterministic and nondeterministic finite automata
are capable of recognizing the same languages.
– A finite automaton can be used to decide if an input string
is a member in some particular set of strings.
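One way to see that DFAs and NFAs recognize the same languages is the subset construction, in which each DFA state is a set of NFA states. The lecture text here does not spell out a conversion, so the sketch below is a standard technique swapped in for illustration. It assumes an NFA without ε-transitions, stored as a dictionary from (state, symbol) to a set of states. The example NFA (strings over {a, b} ending in "abb") is hypothetical.

```python
def nfa_to_dfa(delta, start, accepting, alphabet):
    """Subset construction for an NFA without ε-transitions."""
    start_set = frozenset({start})
    dfa_delta, seen, todo = {}, {start_set}, [start_set]
    while todo:
        subset = todo.pop()
        for symbol in sorted(alphabet):
            # The DFA move is the union of all NFA moves from states in `subset`.
            target = frozenset(t for s in subset for t in delta.get((s, symbol), ()))
            dfa_delta[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A subset is accepting if it contains at least one accepting NFA state.
    dfa_accepting = {subset for subset in seen if subset & accepting}
    return dfa_delta, start_set, dfa_accepting

# Hypothetical NFA: strings over {a, b} that end in "abb".
NFA = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
```

The resulting table has exactly one entry per (subset, symbol) pair, which is the DFA property, and an empty subset plays the role of a dead state when the NFA has no move.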
Nondeterministic Finite Automata (NFA)
■ An NFA is one that has a choice of edges (labeled with the
same symbol) to follow out of a state.
■ Or it may have special edges labeled with ε (epsilon) that
can be followed without consuming any symbol from the input.
– It may be that some choices lead to an accepting state while others
do not.
– In an NFA, several choices may exist for the next state at any
point.
NFA
■ A transition between states is usually labelled by a
character from the input alphabet.
■ We start in the start state and in each step, we can do one
of the following:
– Follow an epsilon transition to another state, or
– Read a character from the input and follow a transition
labelled by that character
■ When all characters from the input are read, we see if the
current state is marked as being accepting.
■ If so, the string we have read from the input is in the
language defined by the automaton.
Definition
■ A nondeterministic finite automaton consists of a set 𝑆 of
states. One of these states, 𝑠₀ ∈ 𝑆, is called the starting
state (initial state) of the automaton and a subset 𝐹 ⊆ 𝑆 of
the states are accepting states (final states).
■ Additionally, we have a set 𝑇 of transitions. Each transition 𝑡
connects a pair of states 𝑠₁ and 𝑠₂ and is labelled with a
symbol, which is either a character 𝑐 from the alphabet Σ,
or the symbol ε, which indicates an epsilon-transition.
■ A transition from state 𝑠 to state 𝑡 on the symbol 𝑐 is written
as 𝑠ᶜ𝑡.
Formal Definition of NFA
■ A nondeterministic finite automaton (NFA) is a five-tuple
(𝑄, Σ, 𝛿, 𝑞₀, 𝐹) where:
– 𝑄 is a finite set of states
– Σ is a finite input alphabet
– 𝛿: 𝑄×Σ_ε → P(𝑄), with Σ_ε = Σ ∪ {ε} and P(𝑄) the power set of 𝑄, is the
transition function that gives move(state, symbol) → set of states
– 𝑞₀ ∈ 𝑄 is a start state (initial state)
– 𝐹 ⊆ 𝑄 is the set of final or accepting states
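Because 𝛿 returns a set of states, an NFA can be simulated by tracking every state it could be in, rather than guessing one path. The sketch below is one possible way to do this. It assumes the empty string "" encodes ε in transition labels and that 𝛿 is a dictionary from (state, symbol) to a set of states. The example NFA (strings over {a, b} ending in "abb") is hypothetical.

```python
EPS = ""  # stands for ε in transition labels (an encoding choice for this sketch)

def eps_closure(delta, states):
    """All states reachable from `states` using only ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def nfa_accepts(delta, start, accepting, string):
    """Track every state the NFA could be in after each input symbol."""
    current = eps_closure(delta, {start})
    for symbol in string:
        moved = set()
        for s in current:
            moved |= delta.get((s, symbol), set())   # move(state, symbol)
        current = eps_closure(delta, moved)
    # Accept if at least one sequence of choices ends in a final state.
    return bool(current & accepting)

# Hypothetical NFA: strings over {a, b} that end in "abb".
NFA_DELTA = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
```

The input is accepted when any one of the possible paths reaches an accepting state, which matches the remark that some choices lead to acceptance while others do not.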
NFA
■ The figure below shows an NFA
[Figure: NFA with four states; edges labeled 1; 0, ε; 1]
[Figure: NFA with states 0, 1, 2, 3; start state 0; edges labeled a, b, b]
S = { 0, 1, 2, 3 }
s₀ = 0
F = { 3 }
Σ = { a, b }
How Does An NFA Work?
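[Figure: NFA with states 0, 1, 2, 3; start state 0; edges labeled a, b, b]
[Figure: a second diagram over the same states with an additional state 4 and edges labeled a, b]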
NFA – Regular Expressions & Compilation
■ Problems with NFAs for Regular Expressions:
– A valid input might not be accepted along a particular sequence of choices
– An NFA may behave differently on the same input, depending on the choices made
■ Relationship of NFAs to Compilation:
– Regular expression “recognized” by NFA
– Regular expression is “pattern” for a “token”
– Tokens are building blocks for lexical analysis
– Lexical analyzer can be described by a collection of NFAs.
Each NFA is for a language token.
Example
■ Given the regular expression (a (b*c)) | (a (b | c+)?), find
an NFA transition diagram that recognizes it.
[Figure: NFA with start state 0 and states 1 to 5; edges labeled a, b, c, and ε]
Alternative Solution
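[Figure: NFA for a (b*c) with states 1, 2, 3 and edges labeled a, b, c]
[Figure: NFA for a (b | c+)? with states numbered from 4 and edges labeled a, b, c]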
Now that you have the individual diagrams, “or” them as follows:
Using Null Transitions to “OR” NFAs
[Figure: a new start state 0 joined by ε-transitions to the two NFAs for a (b*c) and a (b | c+)?, states 1 to 7]
Converting a regular expression to an NFA
■ We construct an NFA compositionally from an RE.
■ From each subexpression, we can construct an NFA
fragment and then combine these fragments into bigger
fragments.
■ A fragment is not a complete NFA, so we complete the
construction by adding the necessary components to make
a complete NFA.
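The compositional idea can be sketched in code as one function per regular-expression operator, each returning a fragment. The construction figures from the slides are not reproduced in this text, so the sketch below uses a Thompson-style construction as a stand-in; its details may differ from the method shown in the lecture. A fragment is assumed to be a (start, end, edges) triple, "" encodes ε, and all function names are illustrative.

```python
import itertools

EPS = ""                      # label used for ε-transitions in this sketch
_ids = itertools.count()      # source of fresh state numbers

def symbol(c):
    """Fragment for a single character c: start --c--> end."""
    s, e = next(_ids), next(_ids)
    return s, e, [(s, c, e)]

def concat(f, g):
    """Fragment for fg: link f's end to g's start with ε."""
    return f[0], g[1], f[2] + g[2] + [(f[1], EPS, g[0])]

def union(f, g):
    """Fragment for f|g: new start and end joined to both fragments by ε."""
    s, e = next(_ids), next(_ids)
    edges = [(s, EPS, f[0]), (s, EPS, g[0]), (f[1], EPS, e), (g[1], EPS, e)]
    return s, e, f[2] + g[2] + edges

def star(f):
    """Fragment for f*: allow skipping f or repeating it."""
    s, e = next(_ids), next(_ids)
    edges = [(s, EPS, f[0]), (s, EPS, e), (f[1], EPS, f[0]), (f[1], EPS, e)]
    return s, e, f[2] + edges

def accepts(fragment, string):
    """Complete the fragment into an NFA (its end is the accepting state) and run it."""
    start, end, edges = fragment

    def closure(states):
        todo, seen = list(states), set(states)
        while todo:
            s = todo.pop()
            for (p, label, q) in edges:
                if p == s and label == EPS and q not in seen:
                    seen.add(q)
                    todo.append(q)
        return seen

    current = closure({start})
    for ch in string:
        current = closure({q for (p, label, q) in edges if p in current and label == ch})
    return end in current

# (a|b)*ac, built bottom-up from its subexpressions
nfa = concat(concat(star(union(symbol("a"), symbol("b"))), symbol("a")), symbol("c"))
```

Completing the fragment here means marking its start as the start state and its end as the single accepting state, which is what `accepts` does before running the input.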
Example
■ NFA for the regular expression (a|b)∗ac
Try
■ Given the regular expression a∗(a|b)aa. Construct an
equivalent NFA using the method described here.