Answer for Automata

This document discusses automata theory and its applications in bioinformatics. It gives examples of how automata can be used in bioinformatics: (1) to find an exact string, (2) to find an inexact string or motif, and (3) to find genes or structures, which involve long-range correlations. It also defines deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) and proves that they accept exactly the same languages, even though an NFA may have several transition options for the same symbol. Finally, it introduces regular expressions and εNFAs.

Uploaded by Mehari Temesgen

Lecture: Algorithms and Data Structures for Bioinformatics (19400001)

Winter semester 2015/2016
Week 12

Tim Conrad
Medical Bioinformatics group (AG Medical Bioinformatics)
Institut für Mathematik & Informatik (Institute of Mathematics & Computer Science), Freie Universität Berlin

Based on material by Andrej Bogdanov, U Hong Kong


What is Automata Theory?

• Study of abstract computing devices, or “machines”
• Automaton = an abstract computing device
• Note: a “device” need not even be physical hardware!
Automata in Bioinformatics

• Main idea: given a text T, check for occurrences of a pattern

• Example I: find an exact string (KMP, the Knuth-Morris-Pratt algorithm)
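To make Example I concrete, here is a minimal sketch of the string-matching automaton behind KMP: a DFA whose state j records that the last j symbols read equal the first j symbols of the pattern. The function names `kmp_dfa` and `find_all` and the DNA alphabet in the usage are illustrative assumptions; the table-filling construction is the usual textbook one, not something specified in these slides.

```python
def kmp_dfa(pattern: str, alphabet: str) -> list[dict[str, int]]:
    """Build the KMP string-matching DFA for a non-empty pattern.

    State j means: the last j symbols read equal pattern[:j], with j as
    large as possible. Reaching state len(pattern) signals a match.
    """
    m = len(pattern)
    delta = [dict.fromkeys(alphabet, 0) for _ in range(m + 1)]
    delta[0][pattern[0]] = 1
    x = 0  # restart state: the state reached after reading pattern[1:j]
    for j in range(1, m + 1):
        delta[j] = dict(delta[x])         # on a mismatch, behave like state x
        if j < m:
            delta[j][pattern[j]] = j + 1  # on a match, advance one state
            x = delta[x][pattern[j]]
    return delta


def find_all(text: str, pattern: str, alphabet: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of pattern."""
    delta = kmp_dfa(pattern, alphabet)
    m, state, hits = len(pattern), 0, []
    for i, c in enumerate(text):
        state = delta[state][c]  # exactly one table lookup per text symbol
        if state == m:
            hits.append(i - m + 1)
    return hits
```

For example, `find_all("ACGACGTACG", "ACG", "ACGT")` reports the occurrences starting at positions 0, 3 and 7. The text is read once, left to right, with no backtracking, which is exactly what a DFA provides.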
Automata in Bioinformatics

• Main idea: given a text T, check for occurrences of a pattern

• Example II: find an inexact string (motif, correlations)
Example shown: serine carboxypeptidases, histidine active site
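Motifs of this kind are often written as PROSITE-style patterns, which are regular expressions in disguise. The sketch below translates a simplified subset of that notation into a Python regular expression and scans a sequence. The motif `[LIVF]-x(2)-{P}-H` and the sequence in the usage are made up for illustration; they are not the real PROSITE entry for the serine carboxypeptidase histidine active site. Also note that Python's `re` engine uses backtracking internally rather than building a finite automaton.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Supported subset only: 'x' = any residue, [ABC] = one of A, B, C,
    {ABC} = none of A, B, C, and repeat counts such as x(2) or x(2,4).
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                         # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"     # excluded residues
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """(start, matched substring) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]
```

With the hypothetical motif above, `prosite_to_regex` yields `[LIVF].{2}[^P]H`, and `find_motif("MKLAAGHWLSTVH", "[LIVF]-x(2)-{P}-H")` finds the hits `LAAGH` at position 2 and `LSTVH` at position 8.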
Automata in Bioinformatics

• Main idea: given a text T, check for occurrences of a pattern

• Example III: find a gene (long-range correlations)
Automata in Bioinformatics

• Main idea: given a text T, check for occurrences of a pattern

• Example IV: find a structure (long-range correlations)

• Parsing regular languages
– DFA <-> NFA <-> RE
• Limits of regular languages
– No long-range correlation
• Context-free languages
– Long-range correlation
• NOT covered:
– Chomsky normal form
– Parsing CFGs (push-down automaton, CYK algorithm)

Recall: Finite Automata
Example of a finite automaton

[Diagram: states off and on; each f moves the automaton from off to on and from on to off.]

• There are two states, off and on; the automaton starts in off and tries to reach the “good state” on
• What sequences of fs lead to the good state?
• Answer: {f, fff, fffff, …} = {f^n : n is odd}
• This is an example of a deterministic finite automaton over alphabet {f}
Deterministic finite automata

• A deterministic finite automaton (DFA) is a 5-tuple (Q, Σ, δ, q0, F) where
– Q is a finite set of states
– Σ is an alphabet
– δ: Q × Σ → Q is a transition function
– q0 ∈ Q is the initial state
– F ⊆ Q is a set of accepting states (or final states).
• In diagrams, accepting states are drawn as double circles
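The 5-tuple maps directly onto a small data structure. The sketch below is one possible encoding (the class name `DFA`, its field names, and storing δ as a dictionary keyed by (state, symbol) are our own choices). It is instantiated with the off/on automaton, assuming, as the answer {f^n : n is odd} implies, that each f toggles between off and on.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A DFA as the 5-tuple (Q, Σ, δ, q0, F); δ is a dict keyed by (state, symbol)."""
    states: set
    alphabet: set
    delta: dict
    start: str
    accepting: set

    def run(self, word: str) -> str:
        """Read word left to right and return the state the DFA ends in."""
        q = self.start
        for a in word:
            if a not in self.alphabet:
                raise ValueError(f"symbol {a!r} is not in the alphabet")
            q = self.delta[(q, a)]  # δ is total, so this lookup always succeeds
        return q

    def accepts(self, word: str) -> bool:
        return self.run(word) in self.accepting


# The off/on example: each f toggles the state; "on" is the good (accepting) state
switch = DFA(
    states={"off", "on"},
    alphabet={"f"},
    delta={("off", "f"): "on", ("on", "f"): "off"},
    start="off",
    accepting={"on"},
)
```

Here `switch.accepts("fff")` is true and `switch.accepts("ff")` is false, matching the answer on the previous slide.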
Example

[Diagram: q0 →1 q1 →0 q2, with self-loops 0 on q0, 1 on q1, and 0,1 on q2.]

alphabet Σ = {0, 1}
states Q = {q0, q1, q2}
initial state q0
accepting states F = {q0, q1}

transition function δ:
            inputs
  states    0     1
  q0        q0    q1
  q1        q2    q1
  q2        q2    q2
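The transition table can be executed directly. This sketch (the names `DELTA`, `run` and `accepted_words` are illustrative) follows δ from q0 and lists every accepted string up to a given length, which is a quick way to guess a DFA's language.

```python
from itertools import product

# δ from the example table: DELTA[state][symbol] -> next state
DELTA = {
    "q0": {"0": "q0", "1": "q1"},
    "q1": {"0": "q2", "1": "q1"},
    "q2": {"0": "q2", "1": "q2"},
}
START = "q0"
ACCEPTING = {"q0", "q1"}


def run(word: str) -> str:
    """Follow δ from the initial state and return the state reached."""
    q = START
    for a in word:
        q = DELTA[q][a]
    return q


def accepted_words(max_len: int) -> list[str]:
    """Every string over {0, 1} of length <= max_len that this DFA accepts."""
    words = ("".join(w) for n in range(max_len + 1) for w in product("01", repeat=n))
    return [w for w in words if run(w) in ACCEPTING]
```

`accepted_words(2)` gives `["", "0", "1", "00", "01", "11"]`. The pattern suggests the language 0*1*: once a 0 follows a 1, the DFA falls into q2 and can never leave it.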
Language of a DFA

The language of a DFA (Q, Σ, δ, q0, F) is the set of all strings over Σ that, starting from q0 and following the transitions as the string is read left to right, will reach some accepting state.

M: [Diagram: the off/on automaton; each f toggles between off and on, and on is accepting.]

• Language of M is {f, fff, fffff, …} = {f^n : n is odd}


Examples

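[Diagram 1: states q0 and q1; q0 and q1 each have a self-loop on 0; 1 moves q0 → q1 and q1 → q0.]

[Diagram 2: states q0 and q1; q0 has a self-loop on 0 and q1 a self-loop on 1; 1 moves q0 → q1 and 0 moves q1 → q0.]

[Diagram 3: q0 →1 q1 →0 q2, with self-loops 0 on q0, 1 on q1, and 0,1 on q2.]

What are the languages of these DFAs?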


Examples

• Construct a DFA that accepts the language L = {010, 1} (Σ = {0, 1})
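• Answer (qdie is a trap state that loops to itself on 0, 1):
qε →0 q0, qε →1 q1
q0 →1 q01, q0 →0 qdie
q01 →0 q010, q01 →1 qdie
q010 →0,1 qdie
q1 →0,1 qdie
accepting states: q1, q010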
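A quick way to check an answer like this is to run the DFA on every short string. In this sketch only the useful transitions are stored and every missing entry means "go to qdie"; that compact encoding is our own choice and is equivalent to the full diagram because qdie loops to itself.

```python
from itertools import product

# Transitions of the answer DFA; every move not listed goes to the trap state qdie
DELTA = {
    ("qε", "0"): "q0",
    ("qε", "1"): "q1",
    ("q0", "1"): "q01",
    ("q01", "0"): "q010",
}
ACCEPTING = {"q1", "q010"}


def accepts(word: str) -> bool:
    q = "qε"
    for a in word:
        # qdie has no entries, so once there the DFA stays there on 0 and 1
        q = DELTA.get((q, a), "qdie")
    return q in ACCEPTING


def all_words(max_len: int):
    """Yield every string over {0, 1} of length at most max_len."""
    for n in range(max_len + 1):
        for w in product("01", repeat=n):
            yield "".join(w)
```

`sorted(w for w in all_words(6) if accepts(w))` returns `["010", "1"]`, i.e. exactly L.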
Examples

• Construct a DFA over alphabet {0, 1} that accepts all strings that end in 101
• Hint: The DFA must “remember” the last 3 bits of the string it is reading
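• Sketch of answer:
[Diagram (partial): states are named by the bits read so far: q0 and q1 after one bit; q00, q01, q10, q11 after two; q000, q001, …, q101, …, q111 after three (“…” marks omitted parts). Each transition appends the new bit, e.g. q0 →0 q00 →1 q001 and q1 →0 q10 →1 q101, while from a three-bit state only the last 3 bits are kept, e.g. q111 →1 q111. The accepting state is q101.]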
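The hint translates almost literally into code: the state is the string of the last (up to) 3 bits, so the state "10" plays the role of q10, and so on. The helper names below are illustrative. The sketch also counts the reachable states to show how large this "remember everything" DFA is.

```python
def step(state: str, bit: str) -> str:
    """Append the new bit and keep only the last 3 bits (the DFA's memory)."""
    return (state + bit)[-3:]


def ends_in_101(word: str) -> bool:
    state = ""                    # nothing read yet
    for bit in word:
        state = step(state, bit)  # e.g. state "10" corresponds to q10
    return state == "101"         # q101 is the only accepting state


def all_states() -> set[str]:
    """States reachable from the start: every bit string of length 0 to 3."""
    states, frontier = {""}, {""}
    while frontier:
        frontier = {step(s, b) for s in frontier for b in "01"} - states
        states |= frontier
    return states
```

This DFA has 1 + 2 + 4 + 8 = 15 states. A minimal DFA for the same language needs only 4 (tracking the longest suffix that is also a prefix of 101), but the 15-state version follows the hint directly.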

Nondeterminism
Would be easier if…

• Suppose we could guess when the string we are reading has only 3 symbols left
• Then we could simply look for the sequence 101 and accept if we see it

[Diagram: after guessing “3 symbols left”, read 1, 0, 1 and accept; any other symbol leads to qdie.]
This is not a DFA!
Nondeterminism

• Nondeterminism is the ability to make guesses, which we can later verify
• Informal nondeterministic algorithm for the language of strings that end in 101:
1. Guess whether you are approaching the end of the input
2. If the guess is yes, look for 101 and accept if you see it
3. If the guess is no, read one more symbol and go to step 1
Nondeterministic finite automaton

• This is a kind of automaton that allows you to make guesses

[Diagram: NFA q0 →1 q1 →0 q2 →1 q3, with a self-loop 0, 1 on q0; q3 is accepting.]

• Each state can have zero, one, or more outgoing transitions labeled by the same symbol
Semantics of guessing

[Diagram: NFA q0 →1 q1 →0 q2 →1 q3, with a self-loop 0, 1 on q0.]

• State q0 has two transitions labeled 1
• Upon reading 1, we have the choice of staying in q0 or moving to q1
Semantics of guessing

[Diagram: NFA q0 →1 q1 →0 q2 →1 q3, with a self-loop 0, 1 on q0.]

• State q1 has no transition labeled 1
• Upon reading 1 in q1, we die; upon reading 0, we continue to q2
Semantics of guessing

[Diagram: NFA q0 →1 q1 →0 q2 →1 q3, with a self-loop 0, 1 on q0.]

• State q3 has no outgoing transitions
• Upon reading anything in q3, we die
Meaning of automaton

[Diagram: NFA q0 →1 q1 →0 q2 →1 q3, with a self-loop 0, 1 on q0, annotated:]
– Guess if you are 3 symbols away from the end of input
– If so, guess you will see the pattern 101
– Check that you are at the end of input
Formal definition

• A nondeterministic finite automaton (NFA) is a 5-tuple (Q, Σ, δ, q0, F) where
– Q is a finite set of states
– Σ is an alphabet
– δ: Q × Σ → subsets of Q is a transition function
– q0 ∈ Q is the initial state
– F ⊆ Q is a set of accepting states (or final states).
• The only difference from a DFA is that the output of δ is a set of states
Example

[Diagram: NFA q0 →1 q1 →0 q2 →1 q3, with a self-loop 0, 1 on q0.]

alphabet Σ = {0, 1}
states Q = {q0, q1, q2, q3}
initial state q0
accepting states F = {q3}

transition function δ:
            inputs
  states    0       1
  q0        {q0}    {q0, q1}
  q1        {q2}    ∅
  q2        ∅       {q3}
  q3        ∅       ∅
Language of an NFA

The language of an NFA is the set of all strings for which there is some path that, starting from the initial state, leads to an accepting state as the string is read left to right.

• Example
[Diagram: NFA q0 →1 q1 →0 q2 →1 q3, with a self-loop 0, 1 on q0.]
– 1101 is accepted, but 0110 is not
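"Some path" can be checked without enumerating paths: it suffices to track the set of all states the NFA could be in after each symbol. This sketch encodes δ from the example table (missing dictionary entries stand for ∅); the names are illustrative.

```python
# δ of the example NFA: (state, symbol) -> set of states; missing entries mean ∅
DELTA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", "0"): {"q2"},
    ("q2", "1"): {"q3"},
}
START, ACCEPTING = "q0", {"q3"}


def nfa_accepts(word: str) -> bool:
    """Accept iff some path ends in an accepting state."""
    current = {START}  # all states some path can be in so far
    for a in word:
        # follow every choice at once; paths with no transition simply die
        current = set().union(*(DELTA.get((q, a), set()) for q in current))
    return bool(current & ACCEPTING)
```

For 1101 the sets of states are {q0}, {q0, q1}, {q0, q1}, {q0, q2}, {q0, q1, q3}, so the word is accepted; 0110 ends in {q0, q2} and is rejected. Tracking sets of states in this way is the idea behind the DFA construction that follows.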


NFAs are as powerful as DFAs

• Obviously, an NFA can do everything a DFA can do


• But can it do more?
NO!
• Theorem
A language L is accepted by some DFA if and only if it is accepted by some NFA.
Proof of theorem

• To prove the theorem, we have to show that for every NFA there is a DFA that accepts the same language
• We will give a general method for simulating any NFA by a DFA
• Let’s do an example first
Simulation example

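[Diagram, NFA: q0 →1 q1 →0 q2, with a self-loop 0, 1 on q0.]

[Diagram, DFA: states q0, “q0 or q1” and “q0 or q2”; q0 →0 q0, q0 →1 “q0 or q1”, “q0 or q1” →1 “q0 or q1”, “q0 or q1” →0 “q0 or q2”, “q0 or q2” →0 q0, “q0 or q2” →1 “q0 or q1”.]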
General method

NFA → DFA
• States: q0, q1, …, qn → ∅, {q0}, {q1}, {q0,q1}, …, {q0,…,qn}, one for each subset of states in the NFA
• Initial state: q0 → {q0}
• Transitions: δ → δ’({qi1,…,qik}, a) = δ(qi1, a) ∪ … ∪ δ(qik, a)
• Accepting states: F ⊆ Q → F’ = {S : S contains some state in F}
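The general method can be sketched in a few lines. One deliberate deviation from the slide: instead of creating all subsets up front, this version only builds the subsets reachable from {q0}, which accepts the same language. It is applied to the NFA of the simulation example, assuming (the figure did not survive) that q2 is its accepting state; all names are illustrative.

```python
from itertools import chain


def subset_construction(nfa_delta, alphabet, start, accepting):
    """Build the reachable part of the subset-construction DFA.

    nfa_delta maps (state, symbol) -> set of NFA states (missing key = ∅).
    Each DFA state is a frozenset of NFA states.
    """
    dfa_start = frozenset({start})
    dfa_delta = {}
    seen, todo = {dfa_start}, [dfa_start]
    while todo:
        S = todo.pop()
        for a in alphabet:
            # δ'(S, a) = union of δ(q, a) over all q in S
            T = frozenset(chain.from_iterable(nfa_delta.get((q, a), ()) for q in S))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    # F' = all subsets that contain some accepting NFA state
    dfa_accepting = {S for S in seen if S & accepting}
    return seen, dfa_delta, dfa_start, dfa_accepting


def dfa_accepts(dfa_delta, dfa_start, dfa_accepting, word):
    S = dfa_start
    for a in word:
        S = dfa_delta[(S, a)]
    return S in dfa_accepting


# NFA from the simulation example, assuming q2 is its accepting state
NFA_DELTA = {("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"}, ("q1", "0"): {"q2"}}
states, delta, start, accepting = subset_construction(NFA_DELTA, "01", "q0", {"q2"})
```

Only three subsets are reachable here, {q0}, {q0, q1} and {q0, q2}, matching the DFA of the simulation example; building all 2^(n+1) subsets gives the same language plus unreachable states.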
Proof of correctness

• Lemma
After reading n symbols, the DFA is in state {qi1,…,qik} if and only if qi1,…,qik are exactly the states the NFA can be in

• Proof by induction on n
• At the end, the DFA accepts iff it is in a state that contains some accepting state of the NFA
• By the lemma, this is true iff the NFA can reach an accepting state

Regular Expressions
Operations on strings

• Given two strings s = a1…an and t = b1…bm, we define their concatenation st = a1…anb1…bm
s = abb, t = cba: st = abbcba
• We define s^n as the concatenation ss…s (n times)
s = 011: s^3 = 011011011
Operations on languages

• The concatenation of languages L1 and L2 is
L1L2 = {st : s ∈ L1, t ∈ L2}
• Similarly, we write L^n for LL…L (n times)
• The union of languages L1 ∪ L2 is the set of all strings that are in L1 or in L2
• Example: L1 = {01, 0}, L2 = {ε, 1, 11, 111, …}.
What are L1L2 and L1 ∪ L2?
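For finite languages these operations are one-liners on Python sets. Since a set can only hold finitely many strings, the sketch replaces the infinite L2 by its strings of length at most 3 (the name `L2_upto_3` is ours), so the results below are only the corresponding finite parts of L1L2 and L1 ∪ L2.

```python
def concat(L1: set[str], L2: set[str]) -> set[str]:
    """Concatenation L1L2 = {st : s in L1, t in L2}."""
    return {s + t for s in L1 for t in L2}


def power(L: set[str], n: int) -> set[str]:
    """L^n = LL...L (n times), with L^0 = {ε}."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result


L1 = {"01", "0"}
# L2 = {ε, 1, 11, 111, …} is infinite; keep only its strings of length <= 3
L2_upto_3 = {"1" * k for k in range(4)}

print(sorted(concat(L1, L2_upto_3)))  # part of L1L2
print(sorted(L1 | L2_upto_3))         # part of L1 ∪ L2
```

The first line prints `['0', '01', '011', '0111', '01111']`, which hints that L1L2 = {0, 01, 011, …}, i.e. a 0 followed by any number of 1s.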
Operations on languages

• The star (Kleene closure) of L is the set of all strings made up of zero or more chunks from L:
L* = L^0 ∪ L^1 ∪ L^2 ∪ …
– This always contains ε (since L^0 = {ε}), and it is infinite unless L is ∅ or {ε}
• Example: L1 = {01, 0}, L2 = {ε, 1, 11, 111, …}.
What are L1* and L2*?
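L* is usually infinite, but its strings up to a fixed length can be listed by repeatedly appending one more chunk from L. The sketch below (the function name `star_upto` is ours) stops as soon as no new string of length at most `max_len` appears, so it also terminates when ε ∈ L.

```python
def star_upto(L: set[str], max_len: int) -> set[str]:
    """All strings of L* with length <= max_len."""
    result = {""}      # L^0 = {ε}
    frontier = {""}
    while frontier:
        # strings built from one more chunk of L, kept only if short enough
        new = {s + t for s in frontier for t in L if len(s + t) <= max_len}
        frontier = new - result
        result |= frontier
    return result


print(sorted(star_upto({"01", "0"}, 3)))  # strings of L1* up to length 3
print(sorted(star_upto(set(), 5)))        # ∅* = {ε}
```

The output for L1 = {01, 0} contains 0, 00, 01, 000, 001, 010 but never a string with 11 or a leading 1: every 1 must be the second half of a 01 chunk. The last call checks the edge case ∅* = {ε}.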
Constructing languages with operations

• Let’s fix an alphabet, say Σ = {0, 1}
• We can construct languages by starting with simple ones, like {0} and {1}, and combining them

{0}({0}∪{1})*, written 0(0+1)*: all strings that start with 0
({0}{1}*)∪({1}{0}*), written 01*+10*
Regular expressions

• A regular expression over Σ is an expression formed using the following rules:
– The symbol ∅ is a regular expression
– The symbol ε is a regular expression
– For every a ∈ Σ, the symbol a is a regular expression
– If R and S are regular expressions, so are RS, R+S and R*.

• Definition of regular language
A language is regular if it is represented by a regular expression
Examples

1. 01* = {0, 01, 011, 0111, …}
2. (01*)(01) = {001, 0101, 01101, 011101, …}
3. (0+1)*
4. (0+1)*01(0+1)*
5. ((0+1)(0+1)+(0+1)(0+1)(0+1))*
6. ((0+1)(0+1))*+((0+1)(0+1)(0+1))*
7. (1+01+001)*(ε+0+00)
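One way to explore these examples is to translate them into Python's `re` syntax and test them on all short binary strings. The translation (R+S becomes R|S, ε becomes an empty alternative) is our own mapping, and the descriptions checked in the usage below, such as "example 7 is the set of strings without three consecutive 0s", are our reading of the examples rather than statements from the slides.

```python
import re

# Examples 1-7 in Python re syntax: R+S -> R|S, ε -> an empty alternative
EXAMPLES = {
    1: "01*",
    2: "(01*)(01)",
    3: "(0|1)*",
    4: "(0|1)*01(0|1)*",
    5: "((0|1)(0|1)|(0|1)(0|1)(0|1))*",
    6: "((0|1)(0|1))*|((0|1)(0|1)(0|1))*",
    7: "(1|01|001)*(|0|00)",
}


def in_language(example: int, word: str) -> bool:
    """True if the whole word matches the numbered example expression."""
    return re.fullmatch(EXAMPLES[example], word) is not None


def binary_words(max_len: int) -> list[str]:
    """Every string over {0, 1} of length at most max_len, including ε."""
    return [""] + [format(i, f"0{n}b") for n in range(1, max_len + 1) for i in range(2 ** n)]
```

By brute force over `binary_words(8)`: example 4 is the set of strings containing 01, example 5 contains every length except 1, example 6 contains the strings whose length is even or a multiple of 3, and example 7 is the set of strings without 000.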
Examples

• Construct a regular expression over Σ = {0,1} that represents
– All strings that have two consecutive 0s.
(0+1)*00(0+1)*
– All strings except those with two consecutive 0s.
(1*01)*1* + (1*01)*1*0
– All strings with an even number of 0s.
1*(01*01*)*
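Answers like these are easy to get subtly wrong, so a brute-force comparison against a direct definition is a useful sanity check. The sketch uses Python's `re` with + written as |; the constant names are illustrative.

```python
import re

TWO_ZEROS = "(0|1)*00(0|1)*"
NO_TWO_ZEROS = "(1*01)*1*|(1*01)*1*0"
EVEN_ZEROS = "1*(01*01*)*"


def matches(regex: str, word: str) -> bool:
    """True if the whole word matches regex."""
    return re.fullmatch(regex, word) is not None


def binary_words(max_len: int) -> list[str]:
    """Every string over {0, 1} of length at most max_len, including ε."""
    return [""] + [format(i, f"0{n}b") for n in range(1, max_len + 1) for i in range(2 ** n)]


for w in binary_words(8):
    assert matches(TWO_ZEROS, w) == ("00" in w)
    assert matches(NO_TWO_ZEROS, w) == ("00" not in w)
    assert matches(EVEN_ZEROS, w) == (w.count("0") % 2 == 0)
```

The check also shows why a 1* outside the starred group is needed for the last answer: (1*01*01*)* alone rejects the string 1, which has zero (an even number of) 0s.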
Main theorem for regular languages

• Theorem
A language is regular iff (if and only if) it is the language of some DFA

[Diagram: regular expressions, DFAs and NFAs all describe the same class, the regular languages.]
Proof plan

• For every regular expression, we have to give a DFA for the same language
regular expression → εNFA → NFA → DFA

• For every DFA, we give a regular expression for the same language
What is an εNFA?

• An εNFA is an extension of the NFA where some transitions can be labeled by ε
– Formally, the transition function of an εNFA is a function δ: Q × (Σ ∪ {ε}) → subsets of Q
• The automaton is allowed to follow ε-transitions without consuming an input symbol
Example of εNFA

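[Diagram: εNFA over Σ = {a, b} with states q0, q1, q2; q0 has a self-loop on a; q0 →ε,b q1; q1 →ε q2; q1 →a q0; accepting state q2.]

• Which of the following are accepted by this εNFA:
– aab, bab, ab, bb, a, ε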
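The question can be answered mechanically by extending the set-of-states simulation with ε-closures: before and after each input symbol, add every state reachable by ε-transitions alone. The transitions below are our reading of the figure (q0 loops on a, q0 →ε,b q1, q1 →ε q2, q1 →a q0, accepting q2), cross-checked against the NFA transition table given for the conversion example; treat them as an assumption.

```python
EPS = ""  # label used for ε-transitions in this sketch

# εNFA of the example as read from the figure (assumption): (state, label) -> states
DELTA = {
    ("q0", "a"): {"q0"},
    ("q0", EPS): {"q1"},
    ("q0", "b"): {"q1"},
    ("q1", EPS): {"q2"},
    ("q1", "a"): {"q0"},
}
START, ACCEPTING = "q0", {"q2"}


def eps_closure(states: set[str]) -> set[str]:
    """All states reachable from `states` using only ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in DELTA.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def accepts(word: str) -> bool:
    """Accept iff some path, mixing ε-moves with the input symbols, ends in q2."""
    current = eps_closure({START})
    for a in word:
        moved = set().union(*(DELTA.get((q, a), set()) for q in current))
        current = eps_closure(moved)
    return bool(current & ACCEPTING)


for w in ["aab", "bab", "ab", "bb", "a", ""]:
    print(repr(w), accepts(w))
```

Under this reading, every candidate except bb is accepted: after the first b the automaton can only be in q1 or q2, and neither has a b-transition.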
Examples: regular expression → εNFA

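• R1 = 0
q0 →0 q1

• R2 = 0 + 1
M2: q0 →ε q2 →0 q3 →ε q1 and q0 →ε q4 →1 q5 →ε q1

• R3 = (0 + 1)*
q’0 →ε M2 →ε q’1, plus two further ε-transitions (drawn as arcs) that let the automaton skip M2 or repeat it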
General method

regular expr → εNFA

∅ → q0 (no transitions, not accepting)

ε → q0 (accepting)

symbol a → q0 →a q1

RS → q0 →ε MR →ε MS →ε q1
Convention

• When we draw a box around an εNFA:
– The arrow going in points to the start state
– The arrow going out represents all transitions going out of accepting states
– None of the states inside the box is accepting
– The labels of the states inside the box are distinct from all other states in the diagram
General method continued

regular expr εNFA

MR ε
ε
R+S q0 q1
ε ε
MS

ε
ε
R* q0 ε MR ε q1
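The general method is essentially Thompson's construction and can be sketched recursively. Assumptions of this sketch: regular expressions are given as nested tuples (a hypothetical encoding, not from the slides); every case, including ∅ and ε, produces a fragment with one start and one accepting state, a slight variation of the single-state diagrams above; and for R* the two extra ε-arcs go from q0 to q1 (skip) and from the exit of MR back to its entry (repeat), the common textbook choice.

```python
from itertools import count

EPS = None  # label of ε-transitions


def build(regex, edges, fresh):
    """Add an εNFA fragment for `regex` to `edges`; return its (start, accept) states.

    regex is a nested tuple (illustrative encoding):
      ("empty",) ∅  ("eps",) ε  ("sym", a)  ("cat", R, S)  ("alt", R, S)  ("star", R)
    edges is a list of (from_state, label, to_state); fresh yields unused state ids.
    """
    kind = regex[0]
    if kind not in {"empty", "eps", "sym", "cat", "alt", "star"}:
        raise ValueError(f"unknown node {kind!r}")
    q0, q1 = next(fresh), next(fresh)
    if kind == "eps":
        edges.append((q0, EPS, q1))
    elif kind == "sym":
        edges.append((q0, regex[1], q1))
    elif kind == "cat":  # q0 -ε-> M_R -ε-> M_S -ε-> q1
        r0, r1 = build(regex[1], edges, fresh)
        s0, s1 = build(regex[2], edges, fresh)
        edges += [(q0, EPS, r0), (r1, EPS, s0), (s1, EPS, q1)]
    elif kind == "alt":  # q0 -ε-> M_R or M_S, both -ε-> q1
        r0, r1 = build(regex[1], edges, fresh)
        s0, s1 = build(regex[2], edges, fresh)
        edges += [(q0, EPS, r0), (q0, EPS, s0), (r1, EPS, q1), (s1, EPS, q1)]
    elif kind == "star":  # enter M_R and leave it, skip it, or go round again
        r0, r1 = build(regex[1], edges, fresh)
        edges += [(q0, EPS, r0), (r1, EPS, q1), (q0, EPS, q1), (r1, EPS, r0)]
    # for ("empty",) no edges are added, so q1 can never be reached
    return q0, q1


def accepts(regex, word: str) -> bool:
    """Simulate the εNFA built for regex on word."""
    edges = []
    start, accept = build(regex, edges, count())

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for p, label, r in edges:
                if p == q and label is EPS and r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = closure({start})
    for a in word:
        current = closure({r for p, label, r in edges if p in current and label == a})
    return accept in current


# (0+1)*01: all strings ending in 01
ENDS_IN_01 = ("cat", ("star", ("alt", ("sym", "0"), ("sym", "1"))),
              ("cat", ("sym", "0"), ("sym", "1")))
```

For instance, `accepts(ENDS_IN_01, "1101")` is true and `accepts(ENDS_IN_01, "0110")` is false.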
Road map

[Road map: regular expression → εNFA → NFA → DFA]
Example of εNFA to NFA conversion

[Diagram, εNFA over Σ = {a, b}: q0 has a self-loop on a; q0 →ε,b q1; q1 →ε q2; q1 →a q0.]

Transition table of the corresponding NFA:

            inputs
  states    a               b
  q0        {q0, q1, q2}    {q1, q2}
  q1        {q0, q1, q2}    ∅
  q2        ∅               ∅

Accepting states of NFA: {q0, q1, q2}


Example of εNFA to NFA conversion

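[Diagram, εNFA: q0 has a self-loop on a; q0 →ε,b q1; q1 →ε q2; q1 →a q0.]

[Diagram, NFA: the same states q0, q1, q2, all accepting; transitions q0 →a q0, q0 →a,b q1, q0 →a,b q2, q1 →a q0, q1 →a q1, q1 →a q2.]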
General method

• To convert an εNFA to an NFA:
– States stay the same
– Start state stays the same
– The NFA has a transition from qi to qj labeled a iff the εNFA has a path from qi to qj that contains one transition labeled a and all other transitions labeled ε
– The accepting states of the NFA are all states that can reach some accepting state of the εNFA using only ε-transitions
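The four rules can be implemented directly: an NFA transition on a is "ε-closure, then one a-step, then ε-closure". As a check, the sketch recomputes the transition table of the conversion example. The εNFA transitions are our reading of that example's figure (q0 loops on a, q0 →ε,b q1, q1 →ε q2, q1 →a q0, accepting q2), which is an assumption; the function names are illustrative.

```python
EPS = ""  # label of ε-transitions in this sketch

# εNFA of the example (read from the figure): (state, label) -> set of states
E_DELTA = {
    ("q0", "a"): {"q0"},
    ("q0", EPS): {"q1"},
    ("q0", "b"): {"q1"},
    ("q1", EPS): {"q2"},
    ("q1", "a"): {"q0"},
}


def eps_closure(delta, states):
    """All states reachable from `states` using only ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def to_nfa(e_delta, states, alphabet, e_accepting):
    """qi -a-> qj in the NFA iff the εNFA has a path ε...ε a ε...ε from qi to qj."""
    nfa_delta = {}
    for q in states:
        for a in alphabet:
            before = eps_closure(e_delta, {q})  # ε-moves before reading a
            moved = set().union(*(e_delta.get((p, a), set()) for p in before))
            nfa_delta[(q, a)] = eps_closure(e_delta, moved)  # ε-moves after a
    # accepting: states that reach an accepting εNFA state by ε-moves alone
    nfa_accepting = {q for q in states if eps_closure(e_delta, {q}) & e_accepting}
    return nfa_delta, nfa_accepting


nfa_delta, nfa_accepting = to_nfa(E_DELTA, ["q0", "q1", "q2"], "ab", {"q2"})
```

The resulting `nfa_delta` and `nfa_accepting` agree entry by entry with the transition table and accepting states {q0, q1, q2} given for the example.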
Why the conversion works

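In the original εNFA, when given input a1a2…an the automaton goes through a sequence of states:
q0 → q1 → q2 → … → qm
Some ε-transitions may be in the sequence:

$$q_0 \xrightarrow{\varepsilon} \cdots \xrightarrow{a_1} \cdots \xrightarrow{\varepsilon} q_{i_1} \xrightarrow{\varepsilon} \cdots \xrightarrow{a_2} \cdots \xrightarrow{\varepsilon} q_{i_2} \xrightarrow{\varepsilon} \cdots \xrightarrow{\varepsilon} q_{i_n}$$

In the new NFA, each sequence of states of the form (for 0 ≤ k < n, with q_{i_0} = q_0)

$$q_{i_k} \xrightarrow{\varepsilon} \cdots \xrightarrow{a_{k+1}} \cdots \xrightarrow{\varepsilon} q_{i_{k+1}}$$

will be represented by a single transition

$$q_{i_k} \xrightarrow{a_{k+1}} q_{i_{k+1}}$$

because of the way we construct the NFA.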
Proof that the conversion works

• More formally, we have the following invariant for any k ≥ 1:
After reading k input symbols, the sets of states that the εNFA and the NFA can be in are exactly the same

• We prove this by induction on k
• When k = 0, the εNFA can be in more states, while the NFA must be in q0
Proof that the conversion works

• When k ≥ 1 (input is not the empty string)
– If the εNFA is in an accepting state, so is the NFA
– Conversely, if the NFA is in an accepting state qi, then some accepting state of the εNFA is reachable from qi using only ε-transitions, so the εNFA also accepts
• When k = 0 (input is the empty string)
– The εNFA accepts iff one of its accepting states is reachable from q0 using only ε-transitions
– This is true iff q0 is an accepting state of the NFA
From DFA to regular expressions

[Road map diagram: regular expression, εNFA, NFA, DFA]
