Unit-8 String Matching
(ADA)
GTU # 3150703
Text T[1..13] a b c a b a a b c a b a c
Pattern P[1..4] a b a a
Example: Text T[1..6] = a c a a b c, Pattern P[1..3] = a a b

s=0   a c a a b c
      a a b
s=1   a c a a b c
        a a b
s=2   a c a a b c
          a a b        Pattern matched with shift 2
s=3   a c a a b c
            a a b

A shift s is valid when P[1..m] = T[s+1..s+m]
Dr. Gopi Sanghani #3150703 (ADA) Unit 8 – String Matching 5
Naive String Matching - Algorithm
NAIVE-STRING MATCHER (T,P)
1. n = T.length
2. m = P.length
3. for s = 0 to n-m
4.     if P[1..m] == T[s+1..s+m]
5.         print "Pattern occurs with shift" s

Example: T[1..6] = a c a a b c, P[1..3] = a a b
The shifts s = 0, 1, 2, 3 are tried in turn.
Pattern occurs with shift 2
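The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `naive_string_matcher` and the choice to return a list of shifts instead of printing them are assumptions made for the example. Python strings are 0-indexed, so the slice `text[s:s + m]` plays the role of T[s+1..s+m].

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every shift s for which pattern equals text[s:s+m]."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):            # s = 0 .. n-m, as in step 3
        if text[s:s + m] == pattern:      # P[1..m] == T[s+1..s+m]
            shifts.append(s)
    return shifts


print(naive_string_matcher("acaabc", "aab"))          # [2]
print(naive_string_matcher("abcabaabcabac", "abaa"))  # [3]
```

Each of the n-m+1 shifts may compare up to m characters, which is why the naive method takes O((n-m+1)m) time in the worst case.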
Pattern P = 2 6
Choose a random prime number q = 11
Let p = P mod q = 26 mod 11 = 4
Let ts denote the value, modulo q, of the text window of length m that starts at shift s

Text T   3 1 4 1 5 9 2 6 5 3 5
ts       9 3 8 4 4 4 4 10 9 2

ts == p at shifts s = 3, 4, 5 (windows 15, 59, 92): spurious hits
ts == p at shift s = 6 (window 26): valid match

if ts == p
    if P[1..m] == T[s+1..s+m]
        print "pattern occurs with shift" s
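The ts row of this example can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the helper name `window_hashes` is invented for the illustration, and each window is hashed from scratch here, which is simpler than what a real Rabin-Karp implementation would do.

```python
def window_hashes(text: str, m: int, q: int) -> list[int]:
    """Value mod q of every length-m window of a digit string."""
    return [int(text[s:s + m]) % q for s in range(len(text) - m + 1)]


text, pattern, q = "31415926535", "26", 11
p = int(pattern) % q                       # 26 mod 11 = 4
ts = window_hashes(text, len(pattern), q)
print(ts)                                  # [9, 3, 8, 4, 4, 4, 4, 10, 9, 2]

# Shifts where the hash agrees with p, split into real and spurious hits.
hits = [s for s, t in enumerate(ts) if t == p]
valid = [s for s in hits if text[s:s + len(pattern)] == pattern]
spurious = [s for s in hits if s not in valid]
print(valid, spurious)                     # [6] [3, 4, 5]
```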
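Rabin-Karp Algorithm
We can compute $t_{s+1}$ from $t_s$ in constant time using the following formula (the standard Rabin-Karp recurrence, where $d$ is the alphabet size and $h = d^{m-1} \bmod q$):
$$t_{s+1} = \bigl(d\,(t_s - T[s+1]\,h) + T[s+m+1]\bigr) \bmod q$$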
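A rolling-hash version of the matcher might look like the sketch below. It assumes the usual textbook recurrence for Rabin-Karp: drop the leading digit weighted by h = d^(m-1) mod q, multiply by the radix d, and add the incoming digit. The function name, the default values d = 10 and q = 11, and the returned pair of lists are choices made for this example only.

```python
def rabin_karp_matcher(text: str, pattern: str, d: int = 10, q: int = 11):
    """Return (valid_shifts, spurious_shifts) for strings of decimal digits."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], []
    h = pow(d, m - 1, q)                   # weight of the leading digit, mod q
    p = t = 0
    for i in range(m):                     # hash of P and of the first window
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q
    valid, spurious = [], []
    for s in range(n - m + 1):
        if t == p:                         # hash hit: verify the characters
            if text[s:s + m] == pattern:
                valid.append(s)
            else:
                spurious.append(s)
        if s < n - m:                      # roll the window one digit right
            t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
    return valid, spurious


print(rabin_karp_matcher("31415926535", "26"))   # ([6], [3, 4, 5])
```

Python's `%` always returns a non-negative result for a positive modulus, so no extra correction is needed when `t - int(text[s]) * h` is negative.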
Transition Table of a Finite Automaton

State   Input a   Input b
0       1         0
1       0         0

(State diagram: states 0 and 1, with edges labelled a and b as given by the table.)
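To see how a transition table drives an automaton, the two-state table above can be run in Python. This is only a sketch: the names `TWO_STATE_DELTA` and `run_automaton` are hypothetical, and treating state 0 as the start state is an assumption, since the table itself does not record the start or accepting states.

```python
# Two-state table from above: TWO_STATE_DELTA[state][symbol] -> next state.
TWO_STATE_DELTA = {
    0: {"a": 1, "b": 0},
    1: {"a": 0, "b": 0},
}


def run_automaton(s: str, delta=TWO_STATE_DELTA, start: int = 0) -> int:
    """Feed s to the automaton one symbol at a time; return the final state."""
    q = start                      # assumed start state
    for ch in s:
        q = delta[q][ch]
    return q


print(run_automaton("abaaa"))      # 1
print(run_automaton("abbaa"))      # 0
```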
Suffix of String
A suffix of a string is any number of trailing symbols of that string. If a string w is a suffix of a string x, this is denoted by w ⊐ x.
Finite Automata Matcher
FINITE-AUTOMATON MATCHER(T, δ, m)
n ← length[T]
q ← 0
for i ← 1 to n do
    q ← δ(q, T[i])
    if q == m then
        print "Pattern occurs with shift" i - m

Transition function δ
State   a   b   c
0       1   0   0
1       1   2   0
2       3   0   0
3       1   4   0
4       5   0   0
5       1   4   6
6       7   0   0
7       1   2   0

          1 2 3 4 5 6 7
Pattern   a b a b a c a

          1 2 3 4 5 6 7 8 9 10 11
Text      a b a b a b a c a b a
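The matcher and its transition table can be tried out with the Python sketch below. `DELTA` is the table from this slide for the pattern a b a b a c a. The helper `compute_transition_function` is an addition for illustration: it rebuilds the table by brute force, taking δ(q, x) as the length of the longest prefix of P that is a suffix of P[1..q] followed by x. All function and variable names are assumptions.

```python
# Transition table for the pattern "ababaca": DELTA[state][symbol] -> state.
DELTA = [
    {"a": 1, "b": 0, "c": 0},
    {"a": 1, "b": 2, "c": 0},
    {"a": 3, "b": 0, "c": 0},
    {"a": 1, "b": 4, "c": 0},
    {"a": 5, "b": 0, "c": 0},
    {"a": 1, "b": 4, "c": 6},
    {"a": 7, "b": 0, "c": 0},
    {"a": 1, "b": 2, "c": 0},
]


def finite_automaton_matcher(text: str, delta, m: int) -> list[int]:
    """Return every shift at which the automaton reaches state m."""
    shifts = []
    q = 0
    for i, ch in enumerate(text, start=1):   # i = 1 .. n, as in the pseudocode
        q = delta[q][ch]
        if q == m:
            shifts.append(i - m)
    return shifts


def compute_transition_function(pattern: str, alphabet: str):
    """Brute-force construction of the table (illustrative, not optimized)."""
    m = len(pattern)
    delta = []
    for q in range(m + 1):
        row = {}
        for ch in alphabet:
            k = min(m, q + 1)
            # Shrink k until P[1..k] is a suffix of P[1..q] followed by ch.
            while k > 0 and not (pattern[:q] + ch).endswith(pattern[:k]):
                k -= 1
            row[ch] = k
        delta.append(row)
    return delta


print(finite_automaton_matcher("abababacaba", DELTA, 7))    # [2]
print(compute_transition_function("ababaca", "abc") == DELTA)  # True
```

On the text above the automaton reaches state 7 after reading T[9], which gives the shift 9 - 7 = 2.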
Suffix & Prefix of a String
            1 2 3 4 5 6 7
Pattern     a b a b a c a
Prefix(π)   0 0 1 2 3 0 1

π[q] is the length of the longest proper prefix of P[1..q] that is also a suffix of P[1..q].

P[1..q]   Possible prefixes     Possible suffixes     π[q]
a         no possible prefix    no possible suffix    0
ab        a                     b                     0
aba       a, ab                 a, ba                 1
abab      a, ab, aba            b, ab, bab            2
ababa     a, ab, aba, abab      a, ba, aba, baba      3
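The prefix values above can be checked by listing the possible prefixes and suffixes directly, as in this Python sketch. It is a brute-force illustration only, much slower than the prefix function used by KMP, and the helper names are invented for the example.

```python
def proper_prefixes(s: str) -> list[str]:
    """All non-empty prefixes of s that are shorter than s."""
    return [s[:k] for k in range(1, len(s))]


def proper_suffixes(s: str) -> list[str]:
    """All non-empty suffixes of s that are shorter than s."""
    return [s[k:] for k in range(1, len(s))]


def prefix_table_brute_force(pattern: str) -> list[int]:
    """pi[q-1] = length of the longest proper prefix of P[1..q] that is also a suffix."""
    pi = []
    for q in range(1, len(pattern) + 1):
        head = pattern[:q]
        common = set(proper_prefixes(head)) & set(proper_suffixes(head))
        pi.append(max((len(w) for w in common), default=0))
    return pi


print(proper_prefixes("abab"))               # ['a', 'ab', 'aba']
print(proper_suffixes("abab"))               # ['bab', 'ab', 'b']
print(prefix_table_brute_force("ababaca"))   # [0, 0, 1, 2, 3, 0, 1]
```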
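    1 2 3 4 5 6 7
P   a c a c a g t
π   0 0 1 2 3 0 0

Initially set π[1] = 0
k is the length of the longest prefix found so far
q is the current index of the pattern

Flowchart, repeated for q = 2 to 7:
1. Test P[k+1] == P[q].
2. If true: k = k+1, then π[q] = k.
3. If false and k > 0: k = π[k], then go back to step 1.
4. If false and k = 0: π[q] = k.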
KMP- Compute Prefix Function
COMPUTE-PREFIX-FUNCTION(P)
m ← length[P]
π[1] ← 0
k ← 0
for q ← 2 to m
    while k > 0 and P[k + 1] ≠ P[q]
        k ← π[k]
    end while
    if P[k + 1] == P[q] then
        k ← k + 1
    end if
    π[q] ← k
return π
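A Python rendering of COMPUTE-PREFIX-FUNCTION could look like the sketch below. Because Python lists are 0-indexed, `pi[q]` here holds the value the pseudocode calls π[q+1], and `pattern[k]` stands for P[k+1]. The function name and the list return type are choices made for this example.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """pi[q] = length of the longest proper prefix of pattern[:q+1]
    that is also a suffix of pattern[:q+1]."""
    m = len(pattern)
    pi = [0] * m                   # pi[0] = 0 corresponds to π[1] = 0
    k = 0                          # length of the prefix matched so far
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]          # fall back to the next shorter border
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


print(compute_prefix_function("acacagt"))   # [0, 0, 1, 2, 3, 0, 0]
print(compute_prefix_function("ababaca"))   # [0, 0, 1, 2, 3, 0, 1]
```

The while loop can only undo increments of k made in earlier iterations, so the total work is O(m).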
KMP String Matching
            1 2 3 4 5 6 7
Pattern     a c a c a g t
Prefix(π)   0 0 1 2 3 0 0

T   a c a t a c g a c a c a g t
P   a c a c a g t
Mismatch at T[4] after q = 3 matched characters. Check the value in the prefix table: π[3] = 1, so the pattern moves ahead by q - π[q] = 2 positions (skip unnecessary shifts).

The same check is repeated at each later mismatch until the pattern lines up with T[8..14]:

T   a c a t a c g a c a c a g t
P                 a c a c a g t
Pattern matches with shift 7
KMP-MATCHER
KMP-MATCHER(T, P)
n ← length[T]
m ← length[P]
π ← COMPUTE-PREFIX-FUNCTION(P)
q ← 0                                   //Number of characters matched.
for i ← 1 to n                          //Scan the text from left to right.
    while q > 0 and P[q + 1] ≠ T[i]
        q ← π[q]                        //Next character does not match.
    if P[q + 1] == T[i] then
        q ← q + 1                       //Next character matches.
    if q == m then                      //Is all of P matched?
        print "Pattern occurs with shift" i - m
        q ← π[q]                        //Look for the next match.
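The matcher can be sketched in Python as follows. The block repeats a 0-indexed prefix function so that it runs on its own. Returning a list of shifts instead of printing them, the early return for an empty pattern, and all the names are assumptions made for this illustration.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """0-indexed prefix table: pi[q] corresponds to π[q+1] in the pseudocode."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text: str, pattern: str) -> list[int]:
    """Return every shift at which pattern occurs in text."""
    m = len(pattern)
    if m == 0:
        return []                          # assumption: no shifts for an empty pattern
    pi = compute_prefix_function(pattern)
    shifts = []
    q = 0                                  # number of characters matched
    for i, ch in enumerate(text):          # scan the text from left to right
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]                  # next character does not match
        if pattern[q] == ch:
            q += 1                         # next character matches
        if q == m:                         # all of P matched
            shifts.append(i + 1 - m)
            q = pi[q - 1]                  # look for the next match
    return shifts


print(kmp_matcher("acatacgacacagt", "acacagt"))   # [7]
print(kmp_matcher("aaaa", "aa"))                  # [0, 1, 2]
```

The text index i never moves backwards, so matching takes O(n) time after the O(m) preprocessing of the pattern.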
Thank You!