Lectures 34, 35, 36 - String Matching Algorithms
Algorithms
Instructor: Dr. Zuhair Zafar
Symbol / Operator   Representation
Σ                   Sigma: the set of all characters that can appear in the text (the alphabet)
Σ∗                  Sigma star: the set of all finite-length strings formed using characters from Σ
ε                   Epsilon: the zero-length (empty) string
⊏                   Prefix notation: w ⊏ x means w is a prefix of x
⊐                   Suffix notation: w ⊐ x means w is a suffix of x
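The sketch below makes the ⊏ and ⊐ notation concrete. It is a minimal Python illustration; Python is an assumption, since the lecture uses pseudocode.

- The helper names `is_prefix` and `is_suffix` are hypothetical.
- The constant `EPSILON` is also hypothetical.
- The two relations map directly onto Python's built-in `str.startswith` and `str.endswith`.
- The empty string plays the role of ε.

```python
EPSILON = ""  # the empty string ε (hypothetical name)


def is_prefix(w: str, x: str) -> bool:
    """w ⊏ x: x can be written as w followed by some string in Σ*."""
    return x.startswith(w)


def is_suffix(w: str, x: str) -> bool:
    """w ⊐ x: x can be written as some string in Σ* followed by w."""
    return x.endswith(w)


if __name__ == "__main__":
    print(is_prefix("ab", "abcca"))    # True
    print(is_suffix("cca", "abcca"))   # True
    # ε is both a prefix and a suffix of every string.
    print(is_prefix(EPSILON, "abcca"), is_suffix(EPSILON, "abcca"))  # True True
    print(is_prefix("abd", "abcca"))   # False
```

Note that both relations are reflexive: every string is a prefix and a suffix of itself.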
String Matching Algorithms
⚫ The above figure portrays the naïve string-matching algorithm as sliding a template containing the pattern over the text.
⚫ The pattern occurs at shift s = 2, so s = 2 is a valid shift.
⚫ Shifts s = 0, s = 1 and s = 3 are invalid, because the pattern does not occur at those shifts.
⚫ The goal is to find all valid shifts.
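⚫ Running time: let n be the length of the text and m the length of the pattern. There are n − m + 1 candidate shifts, and checking one shift takes O(m) character comparisons. The naïve algorithm therefore runs in O((n − m + 1)m) time in the worst case.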
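The sliding-template procedure can be sketched in Python as follows. This is a minimal sketch under a few assumptions:

- Shifts are 0-based, matching the s = 0, 1, 2, ... convention above.
- The function name `naive_string_matcher` is hypothetical.
- The outer loop visits the n − m + 1 shifts, and the inner loop performs at most m comparisons per shift.

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every valid shift s (0-based) at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    valid_shifts = []
    # Slide the template over all n - m + 1 possible shifts s = 0, 1, ..., n - m.
    for s in range(n - m + 1):
        # Compare the pattern with text[s .. s + m - 1]: at most m comparisons.
        j = 0
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:  # every character matched, so s is a valid shift
            valid_shifts.append(s)
    return valid_shifts


if __name__ == "__main__":
    print(naive_string_matcher("ababcabcabababd", "ababd"))  # [10]
    print(naive_string_matcher("aaaa", "aa"))                # [0, 1, 2]
```

The second call shows that overlapping occurrences are all reported. After a mismatch, the naïve matcher re-examines text characters it has already seen; KMP is designed to avoid exactly this.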
Prefix Function (π) Examples
π[q] is the length of the longest proper prefix of P[1..q] that is also a suffix of P[1..q].
Example 1
Index      1  2  3  4  5  6  7  8  9  10
Pattern    a  b  c  d  a  b  e  a  b  f
Pi values  0  0  0  0  1  2  0  1  2  0

Example 2
Index      1  2  3  4  5  6  7  8  9  10
Pattern    a  a  b  c  a  d  a  a  b  e
Pi values  0  1  0  0  1  0  1  2  3  0

Example 3
Index      1  2  3  4  5  6  7  8  9  10 11
Pattern    a  b  c  d  e  a  b  f  a  b  c
Pi values  0  0  0  0  0  1  2  0  1  2  3

Example 4
Index      1  2  3  4  5  6  7  8  9
Pattern    a  a  a  a  b  a  a  c  d
Pi values  0  1  2  3  0  1  2  0  0
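The Pi values in the four examples can be reproduced with a short Python sketch of the prefix function. The sketch follows the usual textbook formulation, which may differ in detail from the lecture's own pseudocode, so its structure need not line up with the step numbers referred to later.

Assumptions to keep in mind:

- The name `compute_prefix_function` is hypothetical.
- Python lists are 0-based, so list position j holds π for table index j + 1.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """Return the pi values of pattern; list position j holds pi[j + 1].

    pi[q] is the length of the longest proper prefix of pattern[1..q]
    that is also a suffix of pattern[1..q].
    """
    m = len(pattern)
    pi = [0] * m  # pi for the first character is always 0
    k = 0         # length of the longest border matched so far
    for q in range(1, m):
        # Fall back to shorter borders until pattern[q] can extend one (or k = 0).
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


if __name__ == "__main__":
    print(compute_prefix_function("abcdabeabf"))   # [0, 0, 0, 0, 1, 2, 0, 1, 2, 0]
    print(compute_prefix_function("aabcadaabe"))   # [0, 1, 0, 0, 1, 0, 1, 2, 3, 0]
    print(compute_prefix_function("abcdeabfabc"))  # [0, 0, 0, 0, 0, 1, 2, 0, 1, 2, 3]
    print(compute_prefix_function("aaaabaacd"))    # [0, 1, 2, 3, 0, 1, 2, 0, 0]
```

The fallback `k = pi[k - 1]` reuses values that were already computed. This reuse is what keeps the total work linear in m.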
How the KMP Algorithm Works
⚫ Illustration: given a string S and a pattern p as follows:

Index   1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
S       a  b  a  b  c  a  b  c  a  b  a  b  a  b  d

Index   0  1  2  3  4  5
p          a  b  a  b  d
[Figure: step-by-step trace of KMP scanning S with text pointer i and pattern pointer q. Step 3: if q == m, a full occurrence of p has been found.]
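The trace above can be reproduced with the following Python sketch of the KMP matcher. It follows the common textbook formulation rather than the slide's own pseudocode.

How the sketch works:

- The text pointer i only moves forward.
- q counts the pattern characters matched so far.
- On a mismatch, q falls back through the π values instead of re-reading the text.
- The `q == m` test marks a complete match.

Assumptions:

- The function names `compute_prefix_function` and `kmp_matcher` are hypothetical.
- Shifts are 0-based.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """Return the pi values of pattern; list position j holds pi[j + 1]."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text: str, pattern: str) -> list[int]:
    """Return every valid shift s (0-based) at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return list(range(n + 1))  # the empty pattern matches at every shift
    pi = compute_prefix_function(pattern)
    q = 0            # number of pattern characters matched so far
    shifts = []
    for i in range(n):  # i scans the text left to right and never moves back
        while q > 0 and pattern[q] != text[i]:
            q = pi[q - 1]  # fall back in the pattern instead of re-reading text
        if pattern[q] == text[i]:
            q += 1
        if q == m:  # the whole pattern has matched, ending at text position i
            shifts.append(i - m + 1)
            q = pi[q - 1]  # keep going to find further (possibly overlapping) matches
    return shifts


if __name__ == "__main__":
    print(kmp_matcher("ababcabcabababd", "ababd"))  # [10]
    print(kmp_matcher("aaaa", "aa"))                # [0, 1, 2]
```

In the illustration's 1-based indexing, shift 10 means p occupies positions 11 to 15 of S.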
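Running time of the prefix function: in the above pseudocode for computing the prefix function, steps 1 to 3 take constant time. The for loop from step 4 to step 10 runs once per pattern position, i.e., O(m) times. The inner while loop can move k back several times within one iteration. However, k increases by at most 1 per iteration and never drops below 0, so the while loop runs at most m times in total across the whole for loop. Hence the running time of the compute-prefix function is Θ(m).

Running time of the matcher: steps 1 to 4 take constant time. The for loop beginning in step 5 runs n times, i.e., once for each character of the string S. By the same argument applied to q, the inner while loop runs at most n times in total, so the running time is dominated by the for loop. Thus the running time of the matching function is Θ(n), and KMP as a whole (preprocessing plus matching) runs in Θ(m + n) time.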
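The amortized argument can be checked empirically by instrumenting both loops and counting how often the inner while loop executes. This Python sketch does not prove the bound; it only confirms that the totals stay within m and n on sample inputs.

- The names `prefix_function_with_count` and `kmp_with_count` are hypothetical.
- Both functions count only the while-loop fallbacks.

```python
def prefix_function_with_count(pattern: str) -> tuple[list[int], int]:
    """Compute the pi values (0-based list) and count inner while-loop iterations."""
    m = len(pattern)
    pi = [0] * m
    k = 0
    fallbacks = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]  # k strictly decreases but never goes below 0
            fallbacks += 1
        if pattern[k] == pattern[q]:
            k += 1  # k grows by at most 1 per for-loop iteration
        pi[q] = k
    return pi, fallbacks


def kmp_with_count(text: str, pattern: str) -> tuple[list[int], int]:
    """Run KMP, returning (valid 0-based shifts, inner while-loop iterations)."""
    n, m = len(text), len(pattern)
    if m == 0:
        return list(range(n + 1)), 0
    pi, _ = prefix_function_with_count(pattern)
    q = 0
    fallbacks = 0
    shifts = []
    for i in range(n):
        while q > 0 and pattern[q] != text[i]:
            q = pi[q - 1]  # q strictly decreases but never goes below 0
            fallbacks += 1
        if pattern[q] == text[i]:
            q += 1  # q grows by at most 1 per character of the text
        if q == m:
            shifts.append(i - m + 1)
            q = pi[q - 1]
    return shifts, fallbacks


if __name__ == "__main__":
    print(kmp_with_count("ababcabcabababd", "ababd"))  # ([10], 4)
    print(prefix_function_with_count("ababd"))        # ([0, 0, 1, 2, 0], 1)
    # Even on a repetitive input the totals stay within n and m.
    text, pattern = "a" * 1000 + "b", "a" * 20 + "b"
    _, text_fallbacks = kmp_with_count(text, pattern)
    _, pattern_fallbacks = prefix_function_with_count(pattern)
    print(text_fallbacks <= len(text), pattern_fallbacks <= len(pattern))  # True True
```

On the lecture's example (n = 15), the matcher falls back only 4 times in total. Each fallback undoes at least one earlier increment of q, so the number of fallbacks can never exceed the number of increments.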