Knuth Morris Pratt Algorithm

The KMP algorithm is a linear time string matching algorithm that uses a prefix table to skip characters and reduce comparisons when a mismatch occurs. It preprocesses the pattern to create the prefix table, then searches for the pattern in the text by matching characters and using the table to determine how far to skip ahead after a mismatch.

Uploaded by Dhiman1001
Copyright © All Rights Reserved

Knuth-Morris-Pratt Algorithm

The KMP algorithm is one of the most popular pattern-matching algorithms. KMP stands for
Knuth-Morris-Pratt. The algorithm was invented by Donald Knuth and Vaughan Pratt
together, and independently by James H. Morris, in 1970. In 1977, the three published the
KMP algorithm jointly. KMP was the first string-matching algorithm with linear time
complexity.
The KMP algorithm finds a "Pattern" in a "Text". It compares characters one by one from
left to right, but whenever a mismatch occurs, it uses a preprocessed table called the
"Prefix Table" to skip character comparisons that cannot lead to a match. The prefix table
is also known as the LPS table, where LPS stands for "Longest proper Prefix which is also a
Suffix".
The Knuth-Morris-Pratt algorithm searches for occurrences of a “word” W within a string S
in O(n + m) time, where n is the length of S and m is the length of W. Its core idea is to
remember how much of the pattern has already matched the text, so that characters of S that
have already been matched against W are never checked again. This is achieved by
precomputing, from W alone, a table that tells how far the search position can jump ahead
when a mismatch occurs.
The main steps of the KMP algorithm are:
• Preprocessing Phase: Before the actual search begins, the algorithm preprocesses W
to build a partial match table (also known as the "prefix table"). For every prefix of
W, the table records the length of the longest proper prefix that is also a suffix of
that prefix. The table is then used to determine how far to skip ahead in S after a
mismatch.
• Searching Phase: The algorithm then scans S from the beginning, matching its
characters against W. When a mismatch occurs, the partial match table tells how many
characters can be safely skipped. Instead of restarting the comparison at the next
position of S, the algorithm uses this information to skip positions that cannot yield a
match, so its position in S never moves backwards and far fewer comparisons are
needed.
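To make the idea of skipping non-viable positions concrete, here is a small Python sketch (an illustration, not part of the original notes) that counts the character comparisons made by a straightforward search, which restarts at the next text position after every mismatch, and by a KMP-style search. The function names naive_search and kmp_search and the comparison counter are hypothetical, introduced only for this comparison.

```python
def naive_search(text: str, pattern: str) -> tuple[int, int]:
    """Try every start position; return (first match index or -1, comparisons)."""
    comparisons = 0
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        k = 0
        while k < m:
            comparisons += 1
            if text[start + k] != pattern[k]:
                break
            k += 1
        if k == m:
            return start, comparisons
    return -1, comparisons


def kmp_search(text: str, pattern: str) -> tuple[int, int]:
    """KMP-style search; return (first match index or -1, comparisons)."""
    m = len(pattern)
    if m == 0:
        return 0, 0
    # Prefix (LPS) table: lps[q] = length of the longest proper prefix
    # of pattern[:q + 1] that is also a suffix of it.
    lps = [0] * m
    length = 0
    for q in range(1, m):
        while length > 0 and pattern[q] != pattern[length]:
            length = lps[length - 1]
        if pattern[q] == pattern[length]:
            length += 1
        lps[q] = length

    comparisons = 0
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):  # the text index never moves backwards
        while True:
            comparisons += 1
            if ch == pattern[j]:
                j += 1
                break
            if j == 0:
                break
            j = lps[j - 1]  # reuse the part of the pattern known to match
        if j == m:
            return i - m + 1, comparisons
    return -1, comparisons


text = "a" * 20 + "b"
print(naive_search(text, "aaaaab"))  # (15, 96)
print(kmp_search(text, "aaaaab"))    # (15, 36)
```

Both searches report the match at index 15, but the naive version re-reads the same a's from every start position, while the KMP version compares each text character at most twice here.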

Working of KMP Algorithm


In the KMP algorithm, one index variable tracks the current character of the text and
another tracks the current character of the pattern; the characters at these two positions
are compared with each other.
The algorithm relies on a π (pi) table, also called the Longest Proper Prefix (LPP) table,
which is the same as the LPS table described above. Building this table uses two notions:
the prefixes and the suffixes of a string.
For example, consider the string:
abab
• The proper prefixes of this string are a, ab and aba. A proper prefix never includes
the last character of the string.
• The proper suffixes of this string are b, ab and bab. A proper suffix never includes
the first character of the string.
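These definitions translate directly into string slicing. The hypothetical Python helpers below (names chosen for this illustration) list the proper prefixes and proper suffixes of a string and pick the longest string that appears in both lists, which is exactly the quantity an LPP/LPS entry stores.

```python
def proper_prefixes(s: str) -> list[str]:
    # All non-empty prefixes except s itself (the last character is never included).
    return [s[:k] for k in range(1, len(s))]


def proper_suffixes(s: str) -> list[str]:
    # All non-empty suffixes except s itself (the first character is never included).
    return [s[-k:] for k in range(1, len(s))]


def longest_prefix_suffix(s: str) -> str:
    # Longest proper prefix of s that is also a suffix of s ("" if there is none).
    suffixes = set(proper_suffixes(s))
    common = [p for p in proper_prefixes(s) if p in suffixes]
    return max(common, key=len, default="")


print(proper_prefixes("abab"))        # ['a', 'ab', 'aba']
print(proper_suffixes("abab"))        # ['b', 'ab', 'bab']
print(longest_prefix_suffix("abab"))  # ab
```

For abab the only string in both lists is ab, so the LPP value for the whole string would be 2.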
Consider an example to make an LPP table.
a b c d a b e a b f
1 2 3 4 5 6 7 8 9 10

We fill the table from left to right. The first entry is always 0, because a single
character has no proper prefix. The entries for ‘b’, ‘c’ and ‘d’ are also 0: none of ab,
abc and abcd has a proper prefix that is also a suffix, since none of them ends with ‘a’.
a b c d a b e a b f
0 0 0 0
At index 5, ‘a’ occurs again. The substring abcda ends with ‘a’, which is also its first
character, so the longest proper prefix that is also a suffix is a, and we write 1 for ‘a’.
At index 6, the substring abcdab ends with ab, which is also its prefix, so we write 2 for ‘b’.
a b c d a b e a b f
0 0 0 0 1 2
At index 7 the character is ‘e’. No proper prefix of abcdabe ends with ‘e’, so no prefix
can also be a suffix, and the value for ‘e’ in the LPP table is 0.
a b c d a b e a b f
0 0 0 0 1 2 0
At index 8, ‘a’ occurs again: abcdabea ends with ‘a’, which matches the prefix a, so we
write 1 for ‘a’. At index 9, abcdabeab ends with ab, which matches the prefix ab, so we
write 2 for ‘b’. At index 10, no proper prefix ends with ‘f’, so its value is 0.
a b c d a b e a b f
0 0 0 0 1 2 0 1 2 0
The LPP tables for some other strings are as follows:

The String:  a   b   c   d   e   a   b   f   a   b   c
Index:       1   2   3   4   5   6   7   8   9   10  11
The LPP:     0   0   0   0   0   1   2   0   1   2   3

The String:  a   a   b   c   a   d   a   a   b   e
Index:       1   2   3   4   5   6   7   8   9   10
The LPP:     0   1   0   0   1   0   1   2   3   0
Look at indices 7 and 8 of the second string. We write 1 at index 7 because aabcada ends
with ‘a’, matching the prefix a. At index 8, aabcadaa ends with aa, which matches the prefix
aa formed by the characters at indices 1 and 2, so the value is 2. At index 9 the match
extends to aab, giving 3.
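The three tables above can be double-checked by computing each entry straight from the definition: for every prefix of the string, try all proper prefix lengths and keep the longest one that is also a suffix. The function name lps_by_definition is hypothetical; this brute-force version runs in roughly O(m³) time and is meant only for checking. Note that the tables in the text are numbered from 1, while Python lists start at 0.

```python
def lps_by_definition(pattern: str) -> list[int]:
    """LPP/LPS table computed directly from its definition (slow, for checking only)."""
    table = []
    for end in range(1, len(pattern) + 1):
        sub = pattern[:end]  # the first `end` characters of the pattern
        best = 0
        # Try every proper prefix length; keep the longest that is also a suffix.
        for k in range(1, end):
            if sub[:k] == sub[-k:]:
                best = k
        table.append(best)
    return table


print(lps_by_definition("abcdabeabf"))   # [0, 0, 0, 0, 1, 2, 0, 1, 2, 0]
print(lps_by_definition("abcdeabfabc"))  # [0, 0, 0, 0, 0, 1, 2, 0, 1, 2, 3]
print(lps_by_definition("aabcadaabe"))   # [0, 1, 0, 0, 1, 0, 1, 2, 3, 0]
```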
Algorithm for Creating LPS Table (Prefix Table)
Step 1 Define a one-dimensional array LPS[m], where m is the length of the Pattern.
Step 2 Define variables i & j. Set i = 0, j = 1 and LPS[0] = 0.
Step 3 Repeat Step 4 to Step 6 until all the values of LPS[] are filled (that is, while j < m).
Step 4 Compare the characters Pattern[i] and Pattern[j].
Step 5 If they are the same then
▪ set LPS[j] = i + 1
▪ increase the values of i & j by one
▪ go to Step 3.
Step 6 If they are not the same then
▪ check the value of variable 'i':
✓ If it is '0', then set LPS[j] = 0 and increase the value of 'j' by one.
✓ If it is not '0', then set i = LPS[i-1] (j is left unchanged).
▪ go to Step 3.
Step 7 Stop
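A minimal Python sketch of Steps 1 to 7, keeping the same variable names i and j so each branch can be matched to its step. The function name build_lps is an assumption made for this illustration.

```python
def build_lps(pattern: str) -> list[int]:
    """Prefix (LPS) table built with the i/j procedure of Steps 1-7."""
    m = len(pattern)
    lps = [0] * m  # Step 1 (this also sets LPS[0] = 0 from Step 2)
    i, j = 0, 1    # Step 2
    while j < m:   # Step 3: until every entry is filled
        if pattern[i] == pattern[j]:  # Steps 4-5: characters match
            lps[j] = i + 1
            i += 1
            j += 1
        elif i == 0:                  # Step 6, i is 0: no shorter prefix to try
            lps[j] = 0
            j += 1
        else:                         # Step 6, i is not 0: fall back, keep j
            i = lps[i - 1]
    return lps    # Step 7


print(build_lps("abcdabeabf"))  # [0, 0, 0, 0, 1, 2, 0, 1, 2, 0]
print(build_lps("aabcadaabe"))  # [0, 1, 0, 0, 1, 0, 1, 2, 3, 0]
```

Each loop iteration either advances j or moves i back (lps[i - 1] is always smaller than i), and i can only grow together with j, so the table is built in O(m) time.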

Pattern Matching Algorithm


KMP_Search (Pattern, Text)
Searches for the Pattern in the Text and returns the index of its first occurrence, or -1
if there is none.
Step 1 Construct the LPS table of the Pattern
Step 2 [Initialization]
a. n ← length of Text
b. m ← length of Pattern
c. i ← 0 [index into Text]
d. j ← 0 [index into Pattern]
Step 3 Repeat from Step 4 to Step 6 while i < n
Step 4 [If the characters match]
If Pattern[j] = Text[i] then
[Increase i and j by 1]
a. i ← i + 1
b. j ← j + 1
Step 5 If j = m then
return (i - j) [index of the match]
Step 6 Else if i < n && Pattern[j] ≠ Text[i] then
if j ≠ 0, then
j ← LPS[j - 1]
else
i ← i + 1
[End of Step 3 loop]
Step 7 [No match found]
return -1
Step 8 Stop
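A minimal Python sketch of KMP_Search, following the steps above. The helper build_lps repeats the LPS construction so the block runs on its own. The function names, and returning 0 for an empty pattern (mirroring Python's str.find), are assumptions of this sketch rather than part of the original algorithm.

```python
def build_lps(pattern: str) -> list[int]:
    # LPS table via the i/j procedure described earlier.
    lps = [0] * len(pattern)
    i, j = 0, 1
    while j < len(pattern):
        if pattern[i] == pattern[j]:
            lps[j] = i + 1
            i += 1
            j += 1
        elif i == 0:
            j += 1  # lps[j] is already 0
        else:
            i = lps[i - 1]
    return lps


def kmp_search(text: str, pattern: str) -> int:
    """Index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0  # assumption: empty pattern matches at 0, like str.find
    lps = build_lps(pattern)            # Step 1
    n, m = len(text), len(pattern)      # Step 2
    i = j = 0
    while i < n:                        # Step 3
        if pattern[j] == text[i]:       # Step 4: characters match
            i += 1
            j += 1
            if j == m:                  # Step 5: whole pattern matched
                return i - j
        elif j != 0:                    # Step 6: mismatch after a partial match
            j = lps[j - 1]              # keep i, shift the pattern
        else:                           # Step 6: mismatch at the pattern start
            i += 1
    return -1                           # Step 7: no match


print(kmp_search("abcxabcdabxabcdabcdabcy", "abcdabcy"))  # 15
```

To report every occurrence instead of only the first, one common variation is to record i - j at Step 5 and continue with j ← LPS[j - 1] instead of returning.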
