0% found this document useful (0 votes)
3 views

M3-string_matching

The document discusses the Naive String-Matching Algorithm, which identifies valid shifts of a pattern within a text using a loop. The algorithm checks for matching conditions across all possible shifts, resulting in a time complexity of O((n-m+1)*m), which can be θ(n²) when the pattern length is half of the text length. Various tables illustrate the matching process through examples.

Uploaded by

ssanjayreg
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

M3-string_matching

The document discusses the Naive String-Matching Algorithm, which identifies valid shifts of a pattern within a text using a loop. The algorithm checks for matching conditions across all possible shifts, resulting in a time complexity of O((n-m+1)*m), which can be θ(n²) when the pattern length is half of the text length. Various tables illustrate the matching process through examples.

Uploaded by

ssanjayreg
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 74

String Matching Algorithms

Naive String-Matching Algorithm

• Finds all valid shifts using a loop that checks the condition

P [1..m] = T[s+1..s+m] for each of the n - m + 1 possible

values of s

• takes time O((n-m+1)*m), which is θ(n2) if m = ⌊n/2⌋


Naive String-Matching Algorithm
Naive String-Matching Algorithm
Naive String-Matching Algorithm 1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 4
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 5
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 6
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 7
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 8
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 9
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 10
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 11
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 12
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 13
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 14
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 15
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 16
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 17
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 18
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 19
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 20
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 21
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 22
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 23
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 24
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 25
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 27
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 28
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 29
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 30
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm 31
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 0 0 0 1 0 1 0 0 0 1

1 2 3 4
0 0 0 1
Naive String-Matching Algorithm
Rabin-Karp algorithm

• Performs well in practice and that also generalizes to other

algorithms for related problems, such as two-dimensional

pattern matching

• uses θ(m) preprocessing time, and its worst-case running

time is θ((n-m+1)*m)
Rabin-Karp algorithm

• Makes use of elementary number-theoretic notions such as


the equivalence of two numbers modulo a third number

• Assume that Σ= {0, 1, 2,...,9}, so that each character is a


decimal digit

• Can then view a string of k consecutive characters as


representing a length-k decimal number
Rabin-Karp algorithm

• Character string 31415 thus corresponds to the decimal


number 31,415

• Given a pattern P [1..m], let p denote its corresponding


decimal value.

• In a similar manner, given a text T[1..n], let t s denote


decimal value of the length-m substring T [s+1..s + m]
Rabin-Karp algorithm

• Can compute p in time θ(m) using Horner’s rule:

• Similarly compute t0 from T [1..m] in time θ(m)

• t1 is computed from [2..m+1] and so on


Rabin-Karp algorithm
• Observe that we can compute ts+ 1from ts in constant time

• For example, if m = 5 and t s = 31415, then we wish to


remove the high-order digit T [s + 1] = 3 and bring in the
new low-order digit (suppose it is T [s + 5 + 1] = 2) to
obtain
• ts+1 = 10(31415 - 10000 * 3) + 2 = 14152
Rabin-Karp algorithm
• We can find all occurrences of the pattern P[1..m] in the text
T[1..n] with θ(m) preprocessing time and θ(n - m + 1)
matching time
• One problem: p and t s may be too large to work with
conveniently
• If P contains m characters, then we cannot reasonably assume
that each arithmetic operation on p (which is m digits long)
takes “constant time.”
Rabin-Karp algorithm
• Fortunately, we can solve this problem easily, as Figure 32.5
shows: compute p and the ts values modulo a suitable modulus
q
• Choose modulus q as a prime such that 10q just fits within
one computer word
• then we can perform all the necessary computations with
single-precision arithmetic
Rabin-Karp algorithm
Rabin-Karp algorithm
Rabin-Karp algorithm
• If q is large enough, then we hope that spurious hits occur
infrequently enough that the cost of the extra checking is low.

• The inputs to the procedure are the text T , the pattern P , the
radix d to use (which is typically taken to be |Σ|), and the
prime q to use
Rabin-Karp algorithm
The Knuth-Morris-Pratt algorithm
• Linear-time string-matching algorithm

• using just an auxiliary function π, which we precompute


from the pattern in time θ(m) and store in an array π[1..m]

• prefix function π for a pattern encapsulates knowledge about


how the pattern matches against shifts of itself
The Knuth-Morris-Pratt algorithm
• avoid testing useless shifts in the naive pattern-matching
algorithm

• For this example, q = 5 of the characters have matched


successfully, but the 6th pattern character fails to match the
corresponding text character
The Knuth-Morris-Pratt algorithm
• Knowing these q text characters allows us to determine
immediately that certain shifts are invalid

• In this example, shift s + 1 is necessarily invalid, since first


character (a) would be aligned with a text character that we
know does not match the first pattern character, but does
match the second pattern character (b).
The Knuth-Morris-Pratt algorithm
• Given a pattern P [1..m], the prefix function for the pattern P
is the function π: {1, 2, ... , m} -> {0, 1, ..., m-1} such that

• π[q] = max {k : k < q and Pk > Pq }


KMP algorithm
KMP algorithm
KMP algorithm
Trace
• T - bacbababaababacac

• P - ababaca
Example
Example
Example
Example
Example
Example
Example
Example
Example
Example
Example
Trace
• P - ababca

• T - abgababcababcabdababebcedaababcababcabdababe

You might also like