KMP algorithm
Pattern matching
Exact Matching Algorithms
A string is a sequence of characters (0-indexed)
Examples of strings
C++ Code
HTML document
DNA sequence
Digitized Image
Applications
Text Editors, compilers
Search Engines
Biological Research
String Matching : Example
Index         0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
Text (T)      A  A  B  A  A  C  A  A  D  A  A  B  A  A  B  A  A  D  A  A
Pattern (P)   A  A  B  A
Pattern found at 0, 9, 12
Naïve String Matching Algorithm
Slide the Pattern (P) over Text (T) one by one and check for match
Index         0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
Text (T)      A  A  B  A  A  C  A  A  D  A  A  B  A  A  B  A  A  D  A  A
Pattern (P)   A  A  B  A   (match at 0)
                 A  A  B  A
                    A  A  B  A
                       A  A  B  A
                          A  A  B  A
                             A  A  B  A
                                A  A  B  A
                                   A  A  B  A
                                      A  A  B  A
                                         A  A  B  A   (match at 9) …
Text size: n
Pattern size: m
Each alignment takes up to m character comparisons
n - m + 1 alignments in total (at most n)
O(mn)
Naïve String Searching Algorithm
Slide the Pattern (P) over the Text (T) one position at a time and check for a match
Index         0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
Text (T)      A  A  B  A  A  C  A  A  D  A  A  B  A  A  B  A  A  D  A  A
Pattern (P)   A  A  B  A
Shift   i (text)   j (pattern)
0       0-3        0-3           Match at 0
1       1-4        0-3
2       2-5        0-3
3       3-6        0-3
4       4-7        0-3
5       5-8        0-3
6       6-9        0-3
7       7-10       0-3
8       8-11       0-3
9       9-12       0-3           Match at 9 …
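The sliding procedure above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides; the function name `naive_search` is an assumption made for this example.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every 0-indexed shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    matches = []
    for shift in range(n - m + 1):          # each possible window start
        j = 0
        # Compare the window text[shift .. shift+m-1] with the pattern.
        while j < m and text[shift + j] == pattern[j]:
            j += 1
        if j == m:                          # all m characters matched
            matches.append(shift)
    return matches


print(naive_search("AABAACAADAABAABAADAA", "AABA"))  # [0, 9, 12]
```

The outer loop runs n - m + 1 times and the inner loop up to m times, which is where the O(mn) bound comes from.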
Worst case scenario of Naïve approach
Index        0 1 2 3 4 5 6 7
Text (T)     a a a a a a a d      i     j
Pattern (P)  a a a d              0-3   0-3   X (mismatch at j = 3)
               a a a d            1-4   0-3   X (mismatch at j = 3)
                 a a a d          2-5   0-3   X (mismatch at j = 3)
                   a a a d        3-6   0-3   X (mismatch at j = 3)
                     a a a d      4-7   0-3   Match
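To make the cost of this worst case concrete, the sketch below counts character comparisons made by the naive approach. The helper name `count_comparisons` is hypothetical and only serves this illustration.

```python
def count_comparisons(text: str, pattern: str) -> int:
    """Count character comparisons made by the naive sliding approach."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for shift in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if text[shift + j] != pattern[j]:
                break                       # mismatch: slide to the next shift
    return comparisons


print(count_comparisons("aaaaaaad", "aaad"))  # 20
```

Here n = 8 and m = 4, so there are n - m + 1 = 5 alignments, and each one needs all 4 comparisons before it is decided: 5 × 4 = 20.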
Found a match at index 5
(T) T R A I L T R A I N      i     j
(P) T R A I N                0-4   0-4   X (mismatch at i = 4, j = 4)
              T R A I N      5-9   0-4   Match
• Here index i moves to the position just after the mismatch (i only ever moves forward)
• Match found at index 5
• Does this always work?
How about this?
Index  0 1 2 3 4 5 6 7 8 9 10
(T)    O N I O N I O N S P L      i     j
(P)    O N I O N S                0-5   0-5   X (mismatch at i = 5)
                   O N I          6-8   0-2   X (mismatch at i = 8)
                         O        9     0     X
                           O      10    0     X
• Here index i moves to the position just after the mismatch
• No match is found, although the pattern does occur in the text (at index 3)
• Why does this NOT work? Overlapping sub-patterns
(T) O N I O N I O N S P L
(P) O N I O N S
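The shortcut tried on the last two slides can be written out to see where it goes wrong. The function below is a deliberately flawed sketch (the name `skip_ahead_search` is made up here); it assumes a non-empty pattern.

```python
def skip_ahead_search(text: str, pattern: str) -> int:
    """Flawed shortcut: after a mismatch, restart the pattern at the next text index.

    Returns the start index of the first match found, or -1.
    Assumes pattern is non-empty.
    """
    i = j = 0
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == len(pattern):
                return i - j                # start index of the match
        else:
            i += 1                          # never look back in the text
            j = 0                           # always restart the pattern from scratch
    return -1


print(skip_ahead_search("TRAILTRAIN", "TRAIN"))    # 5
print(skip_ahead_search("ONIONIONSPL", "ONIONS"))  # -1, yet ONIONS occurs at index 3
```

The second call fails because the "ON" at text indices 3-4 is both the end of the failed attempt and the start of the real occurrence; resetting j to 0 throws that overlap away.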
How about this?
Index  0 1 2 3 4 5 6 7 8 9 10
(T)    O N I O N I O N S P L      i     j
(P)    O N I O N S                0-5   0-5   X (mismatch at i = 5, j = 5)
             O N I O N S          5-8   2-5   Match at index 3
2. We already know some of the characters in the text of the next window
Does the beginning part of the pattern occur anywhere else in the pattern?
For each sub-pattern pat[0..i], where i = 0 to m-1, lps[i] stores the length of the longest proper prefix of pat[0..i] that is also a suffix of pat[0..i].
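The definition of lps can be turned into code almost word for word. The sketch below is a slow, brute-force version meant only to make the definition concrete; the name `lps_by_definition` is an assumption for this example and it is not the linear-time construction normally used in KMP.

```python
def lps_by_definition(pat: str) -> list[int]:
    """For each prefix pat[0..i], find the longest proper prefix that is also a suffix."""
    lps = []
    for i in range(len(pat)):
        sub = pat[: i + 1]                  # the sub-pattern pat[0..i]
        best = 0
        for k in range(1, len(sub)):        # proper prefix: length strictly less than len(sub)
            if sub[:k] == sub[-k:]:         # prefix of length k equals suffix of length k
                best = k
        lps.append(best)
    return lps


print(lps_by_definition("ONIONS"))  # [0, 0, 0, 1, 2, 0]
```

For "ONION" (i = 4) the prefix "ON" is also a suffix, so lps[4] = 2; that is the overlap the earlier ONIONS example needed to keep.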
KMP – Pre-processing
Pattern  A B C D A B E A B F
LPS      0 0 0 0 1 2 0 1 2 0

Pattern  A B C D E A B F A B C
LPS      0 0 0 0 0 1 2 0 1 2 3

Pattern  A A B C A D A A B E
LPS      0 1 0 0 1 0 1 2 3 0

Pattern  A A A A B A A C D
LPS      0 1 2 3 0 1 2 0 0
KMP – Pre-processing
Pattern  A A A A
LPS      0 1 2 3

Pattern  A A A C A A A A A C
LPS      0 1 2 0 1 2 3 3 3 4

Pattern  A B C D E
LPS      0 0 0 0 0

Pattern  A A A B A A A
LPS      0 1 2 0 1 2 3

Pattern  A A B A A C A A B A A
LPS      0 1 0 1 2 0 1 2 3 4 5
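The LPS rows above can be computed in a single left-to-right pass over the pattern. The following is a sketch of the commonly used linear-time construction; the slides do not show code, so the function name `compute_lps` and the variable names are assumptions.

```python
def compute_lps(pat: str) -> list[int]:
    """Build the LPS table: lps[i] is the length of the longest proper
    prefix of pat[0..i] that is also a suffix of pat[0..i]."""
    lps = [0] * len(pat)
    length = 0                              # length of the current prefix-suffix
    i = 1                                   # lps[0] is always 0
    while i < len(pat):
        if pat[i] == pat[length]:
            length += 1                     # extend the current prefix-suffix
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]        # fall back to a shorter candidate
        else:
            lps[i] = 0                      # no prefix-suffix ends at i
            i += 1
    return lps


print(compute_lps("AABAACAABAA"))  # [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5]
```

The fallback step reuses entries already computed, so the table takes O(m) time rather than re-checking every prefix from scratch.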
KMP Way
Index  0 1 2 3 4 5 6 7 8 9 10
(T)    O N I O N I O N S P L
(P)    O N I O N S
LPS    0 0 0 1 2 0
LPS Table
KMP Way…
Index  0 1 2 3 4 5
(P)    O N I O N S
LPS    0 0 0 1 2 0

Index  0 1 2 3 4 5 6 7 8 9 10
(T)    O N I O N I O N S P L      i     j
(P)    O N I O N S                0-5   0-5   X (mismatch at i = 5, j = 5)
             O N I O N S          5-8   2-5   j restarts at LPS[4] = 2; match at index 3
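Putting the LPS table and the "i never moves back" idea together gives the search itself. The sketch below is one common way to write it, not code taken from the slides; `compute_lps` and `kmp_search` are names chosen for this example, and an empty pattern is treated as having no matches by assumption.

```python
def compute_lps(pat: str) -> list[int]:
    """lps[i] = length of the longest proper prefix of pat[0..i] that is also a suffix."""
    lps = [0] * len(pat)
    length, i = 0, 1
    while i < len(pat):
        if pat[i] == pat[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]
        else:
            i += 1
    return lps


def kmp_search(text: str, pat: str) -> list[int]:
    """Return all 0-indexed positions where pat occurs in text."""
    if not pat:
        return []                           # assumption: empty pattern yields no matches
    lps = compute_lps(pat)
    matches = []
    i = j = 0                               # i indexes text, j indexes pat
    while i < len(text):
        if text[i] == pat[j]:
            i += 1
            j += 1
            if j == len(pat):
                matches.append(i - j)       # full match ending at i - 1
                j = lps[j - 1]              # keep the overlap for the next match
        elif j > 0:
            j = lps[j - 1]                  # reuse the matched prefix; i stays put
        else:
            i += 1                          # nothing matched; move on in the text
    return matches


print(kmp_search("ONIONIONSPL", "ONIONS"))          # [3]
print(kmp_search("AABAACAADAABAABAADAA", "AABA"))   # [0, 9, 12]
```

Because i never decreases and every fallback shortens j, the search does O(n) work after the O(m) pre-processing.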
(T) A B A B C A B C A B A B A B D      (indices 0-14)
(P) A B A B D
LPS 0 0 1 2 0
i       j     Result
0-4     0-4   Mismatch at i = 4, j = 4; j = LPS[j-1] = 2
4       2     Mismatch at i = 4, j = 2; j = LPS[j-1] = 0
4       0     Mismatch at i = 4, j = 0; i moves to 5
5-7     0-2   Mismatch at i = 7, j = 2; j = LPS[j-1] = 0
7       0     Mismatch at i = 7, j = 0; i moves to 8
8-12    0-4   Mismatch at i = 12, j = 4; j = LPS[j-1] = 2
12-14   2-4   Match at 10
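A step-by-step trace like the one on this slide can be generated rather than written by hand. The sketch below logs every mismatch and the fallback taken; it assumes the usual 0-indexed convention in which a mismatch at pattern index j > 0 falls back to lps[j - 1]. The names `compute_lps` and `kmp_trace` are hypothetical.

```python
def compute_lps(pat: str) -> list[int]:
    """lps[i] = length of the longest proper prefix of pat[0..i] that is also a suffix."""
    lps = [0] * len(pat)
    length, i = 0, 1
    while i < len(pat):
        if pat[i] == pat[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]
        else:
            i += 1
    return lps


def kmp_trace(text: str, pat: str) -> list[str]:
    """Run KMP on a non-empty pattern and describe each mismatch and match."""
    lps = compute_lps(pat)
    events = []
    i = j = 0
    while i < len(text):
        if text[i] == pat[j]:
            i += 1
            j += 1
            if j == len(pat):
                events.append(f"match at {i - j}")
                j = lps[j - 1]
        elif j > 0:
            new_j = lps[j - 1]              # fall back inside the pattern, i unchanged
            events.append(f"mismatch i={i} j={j} -> j={new_j}")
            j = new_j
        else:
            events.append(f"mismatch i={i} j=0 -> i={i + 1}")
            i += 1                          # no prefix to reuse; advance in the text
    return events


for line in kmp_trace("ABABCABCABABABD", "ABABD"):
    print(line)
```

Printing the events for a text and pattern is a quick way to check a hand-drawn i/j table against what the algorithm actually does.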