5. The Knuth-Morris-Pratt Algorithm

The document describes the Knuth-Morris-Pratt (KMP) string matching algorithm. It discusses how the KMP algorithm uses the prefix function to compute the overlap between the pattern and the text so that matches can be found efficiently. It works through an example that shows the prefix function and the matching process on a sample pattern and text. It also includes two lemmas about properties of the prefix function that are used to prove the correctness of the KMP algorithm.

Uploaded by

Shubham Taneja
Copyright
© All Rights Reserved

The Knuth-Morris-Pratt algorithm
KMP Matcher Algorithm
Prefix Function Algorithm
Alternative Prefix function algorithm
Input: pattern P of length m
Overlap[1] := 0
for k := 1 to m-1                      // consider P[1..k+1]
    c := P[k+1]                        // current character of P
    v := Overlap[k]
    while P[v+1] ≠ c and v ≠ 0         // until overlap can be extended
        v := Overlap[v]                // find next largest precomputed overlap
    if P[v+1] = c then
        Overlap[k+1] := v+1            // extend the current overlap
    else
        Overlap[k+1] := 0              // no overlap exists
return Overlap
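The pseudocode above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the function name compute_overlap and the padding trick (a dummy character at index 0 so that the indices stay 1-based like the slides) are assumptions made for illustration.

```python
def compute_overlap(pattern):
    """Return a list `overlap` where overlap[k] is the Overlap value for P[1..k].

    Index 0 is padding so that the indices match the 1-based pseudocode.
    """
    m = len(pattern)
    p = " " + pattern            # p[1..m] now holds the pattern
    overlap = [0] * (m + 1)      # overlap[1] = 0 by definition
    for k in range(1, m):        # consider P[1..k+1]
        c = p[k + 1]             # current character of P
        v = overlap[k]
        while v != 0 and p[v + 1] != c:
            v = overlap[v]       # fall back to the next largest overlap
        overlap[k + 1] = v + 1 if p[v + 1] == c else 0
    return overlap


print(compute_overlap("ATTATACA")[1:])  # [0, 0, 0, 1, 2, 1, 0, 1]
```

The printed values agree with the Overlap row of the worked example for the pattern ATTATACA.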
Matching Algorithm
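i=1, j=1, k=1                          // i: position in T, j: position in P, k: start of current alignment
While (n-k) ≥ m-1 do                   // P still fits in T[k..n]; Overlap(0) is taken to be 0
    while j ≤ m and T[i] = P[j] do
        i++, j++
    if j > m then output k             // full match starting at position k
    if Overlap(j-1) > 0 then
        k = i - Overlap(j-1)           // slide P so that the overlap stays aligned
    else
        if i == k then i++             // mismatch on the first character of P
        k = i
    if j > 1 then j = Overlap(j-1) + 1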
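A fairly direct Python transcription of this matching loop is sketched below. The names overlap_table and kmp_positions are hypothetical, positions are reported 1-based as in the slides, and two assumptions are made explicit: Overlap(0) is treated as 0, and the loop runs while n - k ≥ m - 1 so that an occurrence ending at the last character of T is still reported.

```python
def overlap_table(pattern):
    """Overlap values for P[1..k], with overlap[0] = 0 as padding."""
    m = len(pattern)
    p = " " + pattern
    overlap = [0] * (m + 1)
    for k in range(1, m):
        v = overlap[k]
        while v != 0 and p[v + 1] != p[k + 1]:
            v = overlap[v]
        overlap[k + 1] = v + 1 if p[v + 1] == p[k + 1] else 0
    return overlap


def kmp_positions(text, pattern):
    """Return the 1-based start positions of every occurrence of pattern."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    t, p = " " + text, " " + pattern   # shift to 1-based indexing
    overlap = overlap_table(pattern)
    found = []
    i = j = k = 1                      # text index, pattern index, alignment
    while n - k >= m - 1:              # pattern still fits at alignment k
        while j <= m and t[i] == p[j]:
            i += 1
            j += 1
        if j > m:
            found.append(k)            # full match starting at k
        if overlap[j - 1] > 0:
            k = i - overlap[j - 1]     # keep the overlapping prefix aligned
        else:
            if i == k:
                i += 1                 # mismatch on the first character
            k = i
        if j > 1:
            j = overlap[j - 1] + 1
    return found


print(kmp_positions("ATCGCACATTATACATTATTATACAT", "ATTATACA"))  # [8, 18]
```

The two reported positions are the same values of k that the worked example outputs.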
Computation of Prefix and Matching
j           1  2  3  4  5  6  7  8
P           A  T  T  A  T  A  C  A
Overlap(j)  0  0  0  1  2  1  0  1

i   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
T   A  T  C  G  C  A  C  A  T  T  A  T  A  C  A  T  T  A  T  T  A  T  A  C  A  T
Example
• i=1, j=1, k=1 – match
• i=2, j=2 – match
• i=3, j=3 – no match
• Since Overlap(j-1) = 0, j = overlap(j-1) + 1 => j = 1, k = i=3
• i=3, j=1 – no match
• Since i = k, i++ => i=4, j =1 – no match
• Since i = k, i++ => i=5, j =1 – no match
• Since i = k, i++ => i=6, j =1 – match
• i=7, j=2 – no match
• Since Overlap(j-1) = 0, j = overlap(j-1) + 1 => j = 1, k = i=7
• i=7, j=1 – no match
Example
• i=8, j=1 – match
• i=9, j=2 - match
• i=10, j=3 - match
• i=11, j=4 - match
• i=12, j=5 - match
• i=13, j=6 – match
• i=14, j=7 - match
• i=15, j=8 - match
• i=16, j=9 – j > m => output k = 8 (the position where the pattern is found)
• Overlap(8) = 1 > 0 => k = i – overlap(j-1) = 16-1 = 15
• Start matching at j = overlap(j-1) + 1 = 1+1 = 2
Example
• i=16, j=2 – match
• i=17, j=3 – match
• i=18, j=4 – match
• i=19, j=5 – match
• i=20, j=6 – no match
• Overlap(j-1) = 2 > 0 => k = i - overlap(j-1) = 20-2 = 18; j = overlap(j-1) + 1 = 2+1 = 3
• i=20, j = 3 – match
• i=21, j = 4 – match
• i=22, j = 5 – match
• i=23, j = 6 – match
• i=24, j = 7 – match
• i=25, j = 8 – match
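• i=26, j = 9 – j > m => output k = 18
• Overlap(8) = 1 > 0 => k = i - overlap(j-1) = 26-1 = 25; the pattern no longer fits in the remaining text, so the search stops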
Example 2
Running time
• For prefix computation – Θ(m)
• For matching - Θ(n)
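One way to see the linear matching bound is to count character comparisons. The sketch below uses a standard 0-based formulation of KMP rather than the i/j/k loop from the slides; prefix_function and count_comparisons are hypothetical names. Every failed comparison either moves on to the next text character or undoes progress gained by an earlier successful one, so the count stays within 2n for a text of length n.

```python
def prefix_function(pattern):
    """0-based prefix function: pi[q] is the overlap of pattern[:q + 1]."""
    pi = [0] * len(pattern)
    v = 0
    for q in range(1, len(pattern)):
        while v > 0 and pattern[v] != pattern[q]:
            v = pi[v - 1]
        if pattern[v] == pattern[q]:
            v += 1
        pi[q] = v
    return pi


def count_comparisons(text, pattern):
    """Return (0-based match positions, number of text/pattern comparisons).

    Assumes a non-empty pattern.
    """
    pi = prefix_function(pattern)
    matches, comparisons, q = [], 0, 0
    for i, ch in enumerate(text):
        while True:
            comparisons += 1
            if pattern[q] == ch:
                q += 1                 # extend the current partial match
                break
            if q == 0:
                break                  # no partial match left to shorten
            q = pi[q - 1]              # fall back and compare again
        if q == len(pattern):
            matches.append(i - q + 1)
            q = pi[q - 1]
    return matches, comparisons


text = "ATCGCACATTATACATTATTATACAT"
matches, comparisons = count_comparisons(text, "ATTATACA")
assert comparisons <= 2 * len(text)
print(matches)  # [7, 17], i.e. positions 8 and 18 in 1-based numbering
```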
Lemma 32.5 (Prefix function iteration lemma)
• π*[q] is the set of all values obtained by repeatedly applying the prefix function π to q: π*[q] = {π[q], π^(2)[q], …, π^(t)[q]}, where π^(t)[q] = 0.
• Notation: P_k is the k-character prefix P[1..k] of P, and x ⊐ y means that x is a suffix of y.
• Lemma: Let P be a pattern of length m with prefix function π. Then, for q = 1, 2, …, m, we have π*[q] = {k : k < q and P_k ⊐ P_q}.
• Proof: We first prove that i ∈ π*[q] implies P_i ⊐ P_q.
• If i ∈ π*[q], then i = π^(u)[q] for some u > 0. We prove the claim by induction on u.
• For u = 1, we have i = π[q], and the claim follows since i < q and P_π[q] ⊐ P_q.
Lemma 32.5
• Using the relations π[i] < i and P_π[i] ⊐ P_i and the transitivity of < and ⊐ establishes the claim for all i in π*[q].
• Therefore, π*[q] ⊆ {k : k < q and P_k ⊐ P_q}.
• We prove that {k : k < q and P_k ⊐ P_q} ⊆ π*[q] by contradiction.
• Suppose to the contrary that there is an integer in the set {k : k < q and P_k ⊐ P_q} − π*[q], and let j be the largest such value.
Lemma 32.5
• Because π[q] is the largest value in {k : k < q and P_k ⊐ P_q} and π[q] ∈ π*[q], we must have j < π[q], and so we let j' denote the smallest integer in π*[q] that is greater than j.
• We can choose j' = π[q] if there is no other number in π*[q] that is greater than j.
• We have P_j ⊐ P_q because j ∈ {k : k < q and P_k ⊐ P_q}, and we have P_j' ⊐ P_q because j' ∈ π*[q].
Lemma 32.5
• Thus, P_j ⊐ P_j' by Lemma 32.1, and j is the largest value less than j' with this property.
• Therefore, we must have π[j'] = j and, since j' ∈ π*[q], we must have j ∈ π*[q] as well.
• This contradiction proves the lemma.
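The statement of Lemma 32.5 can be checked by brute force on the example pattern. The sketch below is only an illustration under assumed names (prefix_function, pi_star and suffix_prefix_lengths are hypothetical helpers); it iterates π to build π*[q] and compares the result with the set of prefix lengths k < q for which the k-character prefix is a suffix of the q-character prefix.

```python
def prefix_function(pattern):
    """pi[q] for q = 1..m, 1-based; pi[0] is unused padding."""
    m = len(pattern)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and pattern[k] != pattern[q - 1]:
            k = pi[k]
        if pattern[k] == pattern[q - 1]:
            k += 1
        pi[q] = k
    return pi


def pi_star(pi, q):
    """Apply pi repeatedly to q until 0 is reached: {pi[q], pi[pi[q]], ..., 0}."""
    values = set()
    while q > 0:
        q = pi[q]
        values.add(q)
    return values


def suffix_prefix_lengths(pattern, q):
    """{k : k < q and the k-character prefix is a suffix of the q-character prefix}."""
    return {k for k in range(q) if pattern[:q].endswith(pattern[:k])}


pattern = "ATTATACA"
pi = prefix_function(pattern)
for q in range(1, len(pattern) + 1):
    assert pi_star(pi, q) == suffix_prefix_lengths(pattern, q)

print(sorted(pi_star(pi, 5)))  # [0, 2]
```

For q = 5 the prefix is ATTAT, whose only proper prefixes that are also suffixes are the empty string and AT, which matches the printed set.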
Lemma 32.6
• Let P be a pattern of length m, and let π be the prefix function for P. For q = 1, 2, …, m, if π[q] > 0, then π[q] − 1 ∈ π*[q − 1].
• Proof: If r = π[q] > 0, then r < q and P_r ⊐ P_q; thus r − 1 < q − 1 and P_(r−1) ⊐ P_(q−1) (by dropping the last character from P_r and P_q).
• By Lemma 32.5, therefore, π[q] − 1 = r − 1 ∈ π*[q − 1].
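Lemma 32.6 can be checked numerically in the same spirit. The following is a small sketch with hypothetical helper names (prefix_function, pi_star, lemma_32_6_holds), not part of the original slides: for every q with π[q] > 0 it verifies that π[q] − 1 appears among the iterated prefix-function values of q − 1.

```python
def prefix_function(pattern):
    """pi[q] for q = 1..m, 1-based; pi[0] is unused padding."""
    m = len(pattern)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and pattern[k] != pattern[q - 1]:
            k = pi[k]
        if pattern[k] == pattern[q - 1]:
            k += 1
        pi[q] = k
    return pi


def pi_star(pi, q):
    """Apply pi repeatedly to q until 0 is reached."""
    values = set()
    while q > 0:
        q = pi[q]
        values.add(q)
    return values


def lemma_32_6_holds(pattern):
    """True if pi[q] - 1 is in pi*[q - 1] whenever pi[q] > 0."""
    pi = prefix_function(pattern)
    return all(
        pi[q] - 1 in pi_star(pi, q - 1)
        for q in range(2, len(pattern) + 1)   # pi[1] is always 0
        if pi[q] > 0
    )


print(lemma_32_6_holds("ATTATACA"))  # True
```

For example, with q = 5 the pattern ATTATACA gives π[5] = 2 and π*[4] = {1, 0}, and 2 − 1 = 1 is indeed in that set.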