
Unit-8 String Matching


Analysis and Design of Algorithms

(ADA)
GTU # 3150703

Unit-8:
String Matching

Dr. Gopi Sanghani


Computer Engineering Department
Darshan Institute of Engineering & Technology, Rajkot
[email protected]
9825621471
 Outline
 Introduction
 The Naive String Matching Algorithm
 The Rabin-Karp Algorithm
 String Matching with Finite Automata
 The Knuth-Morris-Pratt Algorithm
Introduction
 Text-editing programs frequently need to find all occurrences of a pattern in the text.
 Efficient algorithms for this problem are called string-matching algorithms.
 Among its many applications, string matching is widely used to search for patterns in DNA sequences and in Internet search engines.
 Assume that the text is represented in the form of an array 𝑻[𝟏…𝒏] and the pattern is an
array 𝑷[𝟏…𝒎].

Text T[1..13] a b c a b a a b c a b a c

Pattern P[1..4] a b a a

Dr. Gopi Sanghani #3150703 (ADA)  Unit 8 – String Matching 3


Naive String Matching
Algorithm
Naive String Matching - Example
 The naive algorithm finds all valid shifts using a loop that checks the condition P[1..m] =
T[s+1..s+m]

Text T[1..6] = a c a a b c, Pattern P[1..3] = a a b

s = 0    a c a a b c
         a a b              mismatch
s = 1    a c a a b c
           a a b            mismatch
s = 2    a c a a b c
             a a b          match: P[1..m] = T[s+1..s+m]
s = 3    a c a a b c
               a a b        mismatch

Pattern matched with shift 2.
Naive String Matching - Algorithm
NAIVE-STRING-MATCHER(T, P)
1. n = T.length
2. m = P.length
3. for s = 0 to n-m
4.     if P[1..m] == T[s+1..s+m]
5.         print “Pattern occurs with shift” s

Example: T[1..6] = a c a a b c, P[1..3] = a a b
The loop tries s = 0, 1, 2, 3 and prints: Pattern occurs with shift 2

The naive string matcher takes O((n-m+1)m) time in the worst case.
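
The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not the slides' own code: the function name and the choice to return a list of shifts instead of printing them are assumptions. Python strings are 0-indexed, so the slice text[s:s+m] plays the role of T[s+1..s+m] and the shift values stay the same.

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every shift s for which pattern equals text[s:s+m]."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try each of the n - m + 1 candidate shifts in turn.
    for s in range(n - m + 1):
        # The slice comparison costs up to m character comparisons.
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


print(naive_string_matcher("acaabc", "aab"))  # [2]
```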



Rabin-Karp Algorithm
Text T = 3 1 4 1 5 9 2 6 5 3 5
Pattern P = 2 6

Choose a random prime number q = 11.
Let p = P mod q = 26 mod 11 = 4.
Let t_s denote the value of the length-m window T[s+1..s+m] modulo q.

s              0   1   2   3   4   5   6   7   8   9
T[s+1..s+2]   31  14  41  15  59  92  26  65  53  35
t_s            9   3   8   4   4   4   4  10   9   2

t_s = p at s = 3, 4, 5 and 6. Shifts 3, 4 and 5 are spurious hits (15, 59 and 92 differ from 26); shift 6 is a valid match.

Every hit must therefore be verified character by character:

if t_s == p
    if P[1..m] == T[s+1..s+m]
        print “pattern occurs with shift” s
Rabin-Karp Algorithm
 We can compute t_{s+1} from t_s using the following formula:

t_{s+1} = 10 (t_s - 10^{m-1} T[s+1]) + T[s+m+1]

Text T = 3 1 4 1 5 9 2 6 5 3 5

For m = 2 and s = 0, t_s = 31.

We wish to remove the high-order digit T[s+1] = 3 and bring in the new low-order digit T[s+m+1] = 4:

t_{s+1} = 10 (31 - 10·3) + 4 = 10(1) + 4 = 14
t_{s+2} = 10 (14 - 10·1) + 1 = 10(4) + 1 = 41
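
As a rough check of the update formula, the sketch below slides a window over the digit string from the example and applies the same step each time. The function name window_values is hypothetical, and the code assumes the text consists of decimal digits and that m is at least 1.

```python
def window_values(digits: str, m: int) -> list[int]:
    """Values of all length-m windows, computed with the rolling update."""
    n = len(digits)
    high = 10 ** (m - 1)      # weight of the high-order digit, 10^(m-1)
    t = int(digits[:m])       # first window, computed directly
    values = [t]
    for s in range(n - m):
        # Drop the high-order digit digits[s], shift left by one decimal
        # place, and bring in the new low-order digit digits[s + m].
        t = 10 * (t - high * int(digits[s])) + int(digits[s + m])
        values.append(t)
    return values


values = window_values("31415926535", 2)
print(values)                    # [31, 14, 41, 15, 59, 92, 26, 65, 53, 35]
print([v % 11 for v in values])  # [9, 3, 8, 4, 4, 4, 4, 10, 9, 2]
```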



Rabin-Karp-Matcher
RABIN-KARP-MATCHER(T, P, d, q)
n ← length[T]
m ← length[P]
h ← d^(m-1) mod q
p ← 0
t_0 ← 0
for i ← 1 to m do
    p ← (d·p + P[i]) mod q
    t_0 ← (d·t_0 + T[i]) mod q
for s ← 0 to n – m do
    if p == t_s then
        if P[1..m] == T[s+1..s+m] then
            print “pattern occurs with shift” s
    if s < n – m then
        t_{s+1} ← (d(t_s – T[s+1]·h) + T[s+m+1]) mod q

Example values: T = 31415926535, P = 26, d = 10, q = 11, n = 11, m = 2, h = 10, p = 4, t_0 = 9
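
A minimal Python sketch of the matcher, under the assumption that the text and pattern are strings of decimal digits (so int(ch) gives each symbol's value, as in the example with d = 10). The function name and the extra list of spurious hits are illustrative additions, not part of the pseudocode.

```python
def rabin_karp_matcher(text: str, pattern: str, d: int = 10, q: int = 11):
    """Return (valid_shifts, spurious_hits) for a digit string and pattern."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], []
    h = pow(d, m - 1, q)   # d^(m-1) mod q
    p = 0                  # hash of the pattern
    t = 0                  # hash of the current text window
    for i in range(m):
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q
    valid, spurious = [], []
    for s in range(n - m + 1):
        if p == t:
            # Equal hashes are only a hint, so verify the characters.
            if text[s:s + m] == pattern:
                valid.append(s)
            else:
                spurious.append(s)
        if s < n - m:
            # Rolling update; Python's % always yields a value in 0..q-1.
            t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
    return valid, spurious


print(rabin_karp_matcher("31415926535", "26"))  # ([6], [3, 4, 5])
```

With q = 11 the example produces three spurious hits; a larger prime q would typically produce fewer, at the cost of larger intermediate values.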
String Matching with Finite
Automata
Introduction to Finite Automata
 A finite automaton (FA) is a simple machine used to recognize patterns.
 It has a set of states and rules for moving from one state to another.
 It takes a string of symbols as input and changes its state accordingly. Each symbol read causes a transition defined by these rules.
 On a transition, the automaton can either move to another state or stay in the same state.
 When the input string has been processed completely and the automaton is in a final (accepting) state, it accepts the input string.
 The string-matching automaton is very efficient: it examines each character of the text exactly once and reports all the valid shifts.



Introduction to Finite Automata
 A finite automaton M is a 5-tuple (Q, q0, A, Σ, δ), where

 Q is a finite set of states, e.g. Q = {0, 1}
 q0 ∈ Q is the start state, e.g. q0 = 0
 A ⊆ Q is the set of accepting states, e.g. A = {1}
 Σ is a finite input alphabet, e.g. Σ = {a, b}
 δ is the transition function of M, e.g. δ(1, b) = 0

Transition table

State    Input a    Input b
0        1          0
1        0          0

Finite automaton (state diagram): state 0 moves to state 1 on a and stays in state 0 on b; state 1 returns to state 0 on a or b.
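
The two-state automaton from the transition table can be simulated in a few lines. This is only a sketch: the names DELTA, START, ACCEPTING and accepts are hypothetical, and the dictionary simply encodes the table above.

```python
# Transition function delta as a dictionary: (state, symbol) -> next state.
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 0}
START = 0
ACCEPTING = {1}


def accepts(word: str) -> bool:
    """Run the automaton on word and report whether it ends in an accepting state."""
    state = START
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


print(accepts("abaaa"))  # True  (states visited: 0 1 0 1 0 1)
print(accepts("abbaa"))  # False (states visited: 0 1 0 0 1 0)
```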
Suffix of String
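 A suffix of a string is any number of trailing symbols of that string. If a string w is a suffix of a string x, this is denoted by w ⊐ x.
 P_k denotes the k-character prefix P[1..k] of the pattern P, with P_0 = ϵ (the empty string).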



Compute Transition Function
COMPUTE-TRANSITION-FUNCTION(P, Σ)
m ← length[P]
for q ← 0 to m do
    for each character α ∈ Σ do
        k ← min(m + 1, q + 2)
        repeat k ← k – 1 until P_k ⊐ P_q α
        δ(q, α) ← k
return δ
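
One possible Python rendering of this procedure is sketched below. The representation of δ as a list of dictionaries (one per state) is an assumption made for illustration, and str.endswith stands in for the suffix test ⊐. Since the repeat loop decrements k before its first test, the sketch starts directly at min(m, q + 1).

```python
def compute_transition_function(pattern: str, alphabet: str) -> list[dict[str, int]]:
    """delta[q][symbol] = length of the longest prefix of pattern that is a
    suffix of pattern[:q] + symbol."""
    m = len(pattern)
    delta = []
    for q in range(m + 1):
        row = {}
        for symbol in alphabet:
            candidate = pattern[:q] + symbol   # P_q followed by the symbol
            k = min(m, q + 1)                  # largest k worth testing
            # Shrink k until P_k is a suffix of the candidate string.
            # k = 0 always succeeds because the empty string is a suffix.
            while not candidate.endswith(pattern[:k]):
                k -= 1
            row[symbol] = k
        delta.append(row)
    return delta


for state, row in enumerate(compute_transition_function("ababaca", "abc")):
    print(state, row["a"], row["b"], row["c"])
```

This direct approach takes O(m^3 |Σ|) time, which is acceptable for short patterns such as the one on the slides.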



Compute Transition Function - Example

Position   1 2 3 4 5 6 7
Pattern    a b a b a c a        Σ = {a, b, c}, m = 7

For each state q and each input character ω, candidate values of k are tried in decreasing order until P_k ⊐ P_q ω holds.

q = 0
ω = a    k = 2    P_2 ⊐ P_0 ω    ab ⊐ ϵa      false
         k = 1    P_1 ⊐ P_0 ω    a ⊐ ϵa       true     δ(0, a) = 1
ω = b    k = 2    P_2 ⊐ P_0 ω    ab ⊐ ϵb      false
         k = 1    P_1 ⊐ P_0 ω    a ⊐ ϵb       false
         k = 0    P_0 ⊐ P_0 ω    ϵ ⊐ ϵb       true     δ(0, b) = 0
ω = c    k = 2    P_2 ⊐ P_0 ω    ab ⊐ ϵc      false
         k = 1    P_1 ⊐ P_0 ω    a ⊐ ϵc       false
         k = 0    P_0 ⊐ P_0 ω    ϵ ⊐ ϵc       true     δ(0, c) = 0

q = 1
ω = a    k = 3    P_3 ⊐ P_1 ω    aba ⊐ aa     false
         k = 2    P_2 ⊐ P_1 ω    ab ⊐ aa      false
         k = 1    P_1 ⊐ P_1 ω    a ⊐ aa       true     δ(1, a) = 1
ω = b    k = 3    P_3 ⊐ P_1 ω    aba ⊐ ab     false
         k = 2    P_2 ⊐ P_1 ω    ab ⊐ ab      true     δ(1, b) = 2
ω = c    k = 3    P_3 ⊐ P_1 ω    aba ⊐ ac     false
         k = 2    P_2 ⊐ P_1 ω    ab ⊐ ac      false
         k = 1    P_1 ⊐ P_1 ω    a ⊐ ac       false
         k = 0    P_0 ⊐ P_1 ω    ϵ ⊐ ac       true     δ(1, c) = 0

q = 2 (from here on, most failing values of k are omitted)
ω = a    k = 4    P_4 ⊐ P_2 ω    abab ⊐ aba   false
         k = 3    P_3 ⊐ P_2 ω    aba ⊐ aba    true     δ(2, a) = 3
ω = b    k = 0    P_0 ⊐ P_2 ω    ϵ ⊐ abb      true     δ(2, b) = 0
ω = c    k = 0    P_0 ⊐ P_2 ω    ϵ ⊐ abc      true     δ(2, c) = 0

q = 3
ω = a    k = 1    P_1 ⊐ P_3 ω    a ⊐ abaa     true     δ(3, a) = 1
ω = b    k = 4    P_4 ⊐ P_3 ω    abab ⊐ abab  true     δ(3, b) = 4
ω = c    k = 0    P_0 ⊐ P_3 ω    ϵ ⊐ abac     true     δ(3, c) = 0

Completed transition table

         input
State    a   b   c
0        1   0   0
1        1   2   0
2        3   0   0
3        1   4   0
4        5   0   0
5        1   4   6
6        7   0   0
7        1   2   0
Finite Automata Matcher
FINITE-AUTOMATON-MATCHER(T, δ, m)
n ← length[T]
q ← 0
for i ← 1 to n do
    q ← δ(q, T[i])
    if q == m then
        print "Pattern occurs with shift" i – m

         input
State    a   b   c
0        1   0   0
1        1   2   0
2        3   0   0
3        1   4   0
4        5   0   0
5        1   4   6
6        7   0   0
7        1   2   0

Position   1 2 3 4 5 6 7
Pattern    a b a b a c a

Position   1 2 3 4 5 6 7 8 9 10 11
Text       a b a b a b a c a b  a

Trace (q starts at 0):

i       1 2 3 4 5 6 7 8 9
T[i]    a b a b a b a c a
q       1 2 3 4 5 4 5 6 7

At i = 9 the state is q = 7 = m, so the pattern occurs with shift i – m = 2.
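
The matcher loop can be sketched in Python as follows. The table is typed in directly from the slide so that the block is self-contained; the name DELTA, the list-of-dictionaries layout and the returned list of shifts are illustrative assumptions.

```python
# Transition table for the pattern "ababaca", copied from the slide.
DELTA = [
    {"a": 1, "b": 0, "c": 0},  # state 0
    {"a": 1, "b": 2, "c": 0},  # state 1
    {"a": 3, "b": 0, "c": 0},  # state 2
    {"a": 1, "b": 4, "c": 0},  # state 3
    {"a": 5, "b": 0, "c": 0},  # state 4
    {"a": 1, "b": 4, "c": 6},  # state 5
    {"a": 7, "b": 0, "c": 0},  # state 6
    {"a": 1, "b": 2, "c": 0},  # state 7 (accepting)
]


def finite_automaton_matcher(text: str, delta: list[dict[str, int]], m: int) -> list[int]:
    """Scan text once; report shift i - m whenever the accepting state m is reached."""
    q = 0
    shifts = []
    for i, ch in enumerate(text, start=1):  # i is 1-based, as in the pseudocode
        q = delta[q][ch]
        if q == m:
            shifts.append(i - m)
    return shifts


print(finite_automaton_matcher("abababacaba", DELTA, 7))  # [2]
```

Once the table exists, matching takes one table lookup per text character, so the scan itself is O(n).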
Suffix & Prefix of a String

Suffix of a string Prefix of a string



String Matching with Knuth-Morris-Pratt
Algorithm
Introduction
 The KMP algorithm relies on the prefix function (π).
 Proper prefix: All the characters in a string, with one or more cut off the end. “S”, “Sn”,
“Sna”, and “Snap” are all the proper prefixes of “Snape”.
 Proper suffix: All the characters in a string, with one or more cut off the beginning. “agrid”,
“grid”, “rid”, “id”, and “d” are all proper suffixes of “Hagrid”.
 KMP algorithm works as follows:
 Step-1: Calculate Prefix Function
 Step-2: Match Pattern with Text



Longest Common Prefix and Suffix

1 2 3 4 5 6 7
Pattern a b a b a c a
Prefix(π) 0 0 1 2 3 0 1

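For each prefix of the pattern, π is the length of the longest proper prefix that is also a proper suffix:

Prefix of P    Possible prefixes     Possible suffixes     Longest common    π
a              none                  none                  none              0
ab             a                     b                     none              0
aba            a, ab                 a, ba                 a                 1
abab           a, ab, aba            b, ab, bab            ab                2
ababa          a, ab, aba, abab      a, ba, aba, baba      aba               3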
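
The definition can be checked by brute force. The sketch below is not the KMP prefix-function algorithm itself; it simply compares every proper prefix with the suffix of the same length, longest first. The function name longest_prefix_suffix is hypothetical.

```python
def longest_prefix_suffix(s: str) -> int:
    """Length of the longest proper prefix of s that is also a suffix of s."""
    # Try the longest proper prefix first and stop at the first match.
    for k in range(len(s) - 1, 0, -1):
        if s[:k] == s[-k:]:
            return k
    return 0


pattern = "ababaca"
print([longest_prefix_suffix(pattern[:q]) for q in range(1, len(pattern) + 1)])
# [0, 0, 1, 2, 3, 0, 1]
```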



Calculate Prefix Function - Example
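Index    1 2 3 4 5 6 7
P        a c a c a g t
π        0 0 1 2 3 0 0

Initially set π[1] = 0.
k is the length of the longest prefix found so far.
q is the current index of the pattern (q runs from 2 to 7).

For each q:
  1. Is P[k+1] == P[q]?
       true:  k = k + 1, then go to step 3
       false: go to step 2
  2. Is k > 0?
       true:  k = π[k], then go back to step 1
       false: go to step 3
  3. π[q] = k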
KMP- Compute Prefix Function
COMPUTE-PREFIX-FUNCTION(P)
m ← length[P]
π[1] ← 0
k ← 0
for q ← 2 to m
    while k > 0 and P[k + 1] ≠ P[q]
        k ← π[k]
    end while
    if P[k + 1] == P[q] then
        k ← k + 1
    end if
    π[q] ← k
return π
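
A Python sketch of the same procedure, with the indices shifted to Python's 0-based strings: pattern[k] corresponds to P[k + 1], and pi[k - 1] corresponds to π[k]. The function name and the list return type are illustrative choices.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """pi[q] = length of the longest proper prefix of pattern[:q+1]
    that is also a suffix of it (0-based version of the pseudocode)."""
    m = len(pattern)
    pi = [0] * m
    k = 0  # length of the longest prefix matched so far
    for q in range(1, m):
        # Fall back to shorter prefixes until the next character fits.
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


print(compute_prefix_function("acacagt"))  # [0, 0, 1, 2, 3, 0, 0]
print(compute_prefix_function("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```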



KMP String Matching
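Position     1 2 3 4 5 6 7
Pattern      a c a c a g t
Prefix (π)   0 0 1 2 3 0 0

T   a c a t a c g a c a c a g t
    a c a c a g t
Shift 0: a c a matches, then T[4] = t ≠ P[4] = c. Mismatch, so check the value in the prefix table: q = 3 characters were matched and π[3] = 1, so the pattern can move q – π[q] = 2 positions (skipping the unnecessary shift 1).

T   a c a t a c g a c a c a g t
        a c a c a g t
Shift 2: P[1] = a is already known to match T[3]; T[4] = t ≠ P[2] = c. Mismatch, so check the value in the prefix table: q = 1 and π[1] = 0, so the pattern moves 1 position.

T   a c a t a c g a c a c a g t
          a c a c a g t
Shift 3: T[4] = t ≠ P[1] = a. Mismatch with q = 0, so the pattern moves 1 position.

T   a c a t a c g a c a c a g t
            a c a c a g t
Shift 4: a c matches, then T[7] = g ≠ P[3] = a. Mismatch, so check the value in the prefix table: q = 2 and π[2] = 0, so the pattern can move 2 positions (skipping the unnecessary shift 5).

T   a c a t a c g a c a c a g t
                a c a c a g t
Shift 6: T[7] = g ≠ P[1] = a. Mismatch with q = 0, so the pattern moves 1 position.

T   a c a t a c g a c a c a g t
                  a c a c a g t
Shift 7: all seven characters match T[8..14]. Pattern matches with shift 7.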
KMP-MATCHER
KMP-MATCHER(T, P)
n ← length[T]
m ← length[P]
π ← COMPUTE-PREFIX-FUNCTION(P)
q ← 0                                    //Number of characters matched.
for i ← 1 to n                           //Scan the text from left to right.
    while q > 0 and P[q + 1] ≠ T[i]
        q ← π[q]                         //Next character does not match.
    if P[q + 1] == T[i] then
        q ← q + 1                        //Next character matches.
    if q == m then                       //Is all of P matched?
        print "Pattern occurs with shift" i - m
        q ← π[q]                         //Look for the next match.
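
Putting both steps together, a self-contained Python sketch of the matcher might look like this. As before, indices are 0-based (pattern[q] is P[q + 1], pi[q - 1] is π[q]), and the function names and the returned list of shifts are assumptions made for illustration.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """0-based prefix function: pi[q] belongs to the prefix pattern[:q+1]."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text: str, pattern: str) -> list[int]:
    """Return all shifts at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    pi = compute_prefix_function(pattern)
    q = 0                       # number of characters matched
    shifts = []
    for i in range(n):          # scan the text from left to right
        while q > 0 and pattern[q] != text[i]:
            q = pi[q - 1]       # next character does not match
        if pattern[q] == text[i]:
            q += 1              # next character matches
        if q == m:              # all of the pattern matched
            shifts.append(i + 1 - m)
            q = pi[q - 1]       # look for the next match
    return shifts


print(kmp_matcher("acatacgacacagt", "acacagt"))  # [7]
```

The text index i never moves backwards, which is why the matching phase runs in O(n) time after the O(m) preprocessing of the prefix function.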
Thank You!
