Knuth Morris Pratt Algorithm
Knuth Morris Pratt Algorithm
KMP Algorithm is one of the most popular patterns matching algorithms. KMP stands for
Knuth Morris Pratt. KMP algorithm was invented by Donald Knuth and Vaughan Pratt
together and independently by James H Morris in the year 1970. In the year 1977, all the
three jointly published KMP Algorithm. KMP algorithm was the first linear time
complexity algorithm for string matching. KMP algorithm is one of the string-matching
algorithms used to find a Pattern in a Text.
KMP algorithm is used to find a "Pattern" in a "Text". This algorithm compares character
by character from left to right. But whenever a mismatch occurs, it uses a preprocessed table
called "Prefix Table" to skip characters comparison while matching. Sometimes prefix table
is also known as LPS Table. Here LPS stands for "Longest proper Prefix which is also
Suffix".
The Knuth Morris Pratt algorithm is a string-matching algorithm that searches for
occurrences of a “word” W within a string “S” in O (n +m) time, where n is the length of S
and m is the length of W. The core of the KMP algorithm is the ability to memorize the
matches of the pattern within the text. This efficiency is achieved by precomputing a table of
how far the search position should jump ahead when a mismatch occurs. The key idea is to
avoid redundant checking of characters in S that have already been matched against W.
The main steps of the KMP algorithm are:
• Preprocessing Phase: Before the actual search begins, the algorithm preprocesses W
to create a partial match table (known as “prefix table”). This table is used to
determine how far to skip ahead in the string S after a mismatch. The preprocessing
focuses on finding the longest proper prefix which is also a suffix in the sub patterns
of W.
• Searching Phase: The algorithm then begins to search for W in S by matching
characters from the beginning. When a mismatch occurs, the partial match table is
used to find out how many characters can be safely skipped. Instead of starting the
search anew from the next character, the algorithm uses the information in the partial
match table to skip non-viable positions, thereby reducing the number of comparisons
needed.
Initially, we will add ‘0’ to all the characters because it is the first time they are occurring,
starting from the leftmost end of the pattern. As a, b, c and d occurs for the first time, hence
corresponding index of them is 0.
a b c d a b e a b f
0 0 0 0
At index 5, ‘a’ occurs once more. As character at index 1 is same as character at index 5, So,
we write 1 for ‘a’. Similarly, character at index 2 is same as character at index 6, So, we
write 2 for ‘b’.
a b c d a b e a b f
0 0 0 0 1 2
At index 7 the character is ‘e’ which appears for the first time. So, corresponding value for
‘e’ at LPP table is 0.
a b c d a b e a b f
0 0 0 0 1 2 0
At index 8, ‘a’ occurs once again. As ‘a’ appears first time at index 1, so, we write 1 for ‘a’
at LPP table. Similarly, we write 2 for ‘b’. As ‘f’ appears for the first time, so the
corresponding value for ‘f’ is 0.
a b c d a b e a b f
0 0 0 0 1 2 0 1 2 0
The LPP table for some other strings is as below:
a b c d e a b f a b c
The String
1 2 3 4 5 6 7 8 9 10 11
a b c d e a b f a b c
The LPP
0 0 0 0 0 1 2 0 1 2 3
a a b c a d a a b e
The String
1 2 3 4 5 6 7 8 9 10
a a b c a d a a b e
The LPP
0 1 0 0 1 0 1 2 3 0
Look at index 7 and 8. We have written 1 and 2 at the LPP table, as ‘a’ appears at index 1,
and repeating ‘a’ appears at index 2. Hence for ‘a’ at index 8, the corresponding value at
LPP table is 2.
Algorithm for Creating LPS Table (Prefix Table)
Step 1 Define a one-dimensional array with the size equal to the length of the Pattern.
(LPS [size])
Step 2 Define variables i & j. Set i = 0, j = 1 and LPS [0] = 0.
Step 3 Repeat from Step 4 to Step 6 till all the values of LPS [] are not filled
Step 4 Compare the characters at Pattern[i] and Pattern[j].
Step 5 If they are same then
▪ set LPS[j] = i+1
▪ Increase value of i & j by one.
▪ Goto to Step 3.
Step 6 If they are not same then
▪ Check the value of variable 'i'.
✓ If it is '0' then set LPS[j] = 0
▪ Increase value of ‘j’ by one
✓ If it is not '0' then set i = LPS[i-1].
▪ Goto Step 3.
Step 7 Stop