
ANALYSIS AND DESIGN OF

ALGORITHMS

UNIT-III
CHAPTER 7:
SPACE AND TIME TRADEOFFS

OUTLINE
 Space and Time Tradeoffs

 Sorting by Counting

 Input Enhancement in String Matching


 Horspool’s Algorithm

 Hashing
 Open Hashing (Separate Chaining)
 Closed Hashing (Open Addressing)
Space and Time Tradeoffs
 The two techniques based on pre-processing of data thereby
increasing the speed of the algorithm are:
• Input enhancement
• Pre-structuring

 Input Enhancement Technique: preprocess the problem’s input, in whole
or in part, and store the additional information obtained to accelerate
solving the problem afterwards.
We discuss the following algorithms based on it:
 Counting methods for sorting.
 Horspool’s algorithm, a simplified version of the Boyer-Moore
algorithm

 Pre-structuring: This technique that exploits space-for-time tradeoffs


simply uses extra space to facilitate faster and/or more flexible access
to the data.
We illustrate this approach by
 Hashing.
Sorting by Counting
 For each element of a list to be sorted, count the total number of elements
smaller than this element and record the results in a table.

 These numbers will indicate the positions of the elements in sorted list:
Example: If the count is 3 for some element, it should be in the 4th
position in the sorted array.

 Thus, we will be able to sort the list by simply copying its elements to
their appropriate positions in a new, sorted list.

 This algorithm is called Comparison counting sort.


Index            0   1   2   3   4   5
A: Elements     20  35  10  18  40  15
   Count         3   4   0   2   5   1

S: Sorted       10  15  18  20  35  40
Sorting by Counting
ALGORITHM ComparisonCountingSort(A[0 . . . n-1])
//Sorts an array by comparison counting
//Input: An array A[0 . . . n-1] of orderable elements
//Output: Array S[0 . . . n-1] of A’s elements sorted in nondecreasing order
for i ← 0 to n-1 do Count[i] ← 0
for i ← 0 to n-2 do
    for j ← i+1 to n-1 do
        if A[i] < A[j]
            Count[j] ← Count[j] + 1
        else Count[i] ← Count[i] + 1
for i ← 0 to n-1 do S[Count[i]] ← A[i]
return S
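The pseudocode translates almost line for line into a runnable language.
Below is a minimal Python sketch (the function and variable names are ours,
not from the slides):

def comparison_counting_sort(a):
    # For each element, count how many elements of the list are smaller;
    # that count is the element's position in the sorted output.
    n = len(a)
    count = [0] * n
    for i in range(n - 1):
        for j in range(i + 1, n):
            if a[i] < a[j]:
                count[j] += 1   # a[j] is larger, so it lands after a[i]
            else:
                count[i] += 1   # a[i] is larger (or equal), lands after a[j]
    s = [None] * n
    for i in range(n):
        s[count[i]] = a[i]      # count[i] is a[i]'s final position
    return s

print(comparison_counting_sort([20, 35, 10, 18, 40, 15]))
# [10, 15, 18, 20, 35, 40]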
Tracing of Comparison counting Sort
Let us illustrate the working of this algorithm by taking the elements 20, 35,10, 18,
40, 15. The outermost loop varies from 0 to n-2 and thus total number of passes
required will be n-1. In this example, since there are 6 elements, we require 5
passes. The figure below provides the number of elements less than the
corresponding item at each pass.
Index                        0   1   2   3   4   5
Array A[0 . . . 5]          20  35  10  18  40  15

Initially        Count[]     0   0   0   0   0   0
After pass i = 0 Count[]     3   1   0   0   1   0
After pass i = 1 Count[]         4   0   0   2   0
After pass i = 2 Count[]             0   1   3   1
After pass i = 3 Count[]                 2   4   1
After pass i = 4 Count[]                     5   1
Final state      Count[]     3   4   0   2   5   1

Array S[0 . . . 5] sorted   10  15  18  20  35  40

(After pass i, Count[i] is final; each row shows it together with the
updated counts of the elements still to its right.)

Analysis of Comparison count sort
 The input size metric for this algorithm is n.

 The basic operation is the comparison statement “if A[i] < A[j]” in the
innermost for loop.

 The number of comparisons can be obtained as shown below:


C(n) = \sum_{i=0}^{n-2} \sum_{j=i+1}^{n-1} 1 = \sum_{i=0}^{n-2} \big[(n-1) - (i+1) + 1\big] = \sum_{i=0}^{n-2} (n-1-i) = \frac{n(n-1)}{2}

Since the algorithm makes the same number of key comparisons as
selection sort (Θ(n²)) and in addition uses a linear amount of extra
space, it can hardly be recommended for practical use.

Sorting by Distribution counting
 Store the frequency of occurrence of each element in an array.
 Then we can copy the elements into a new array S[0 . . . n-1] that holds
the sorted list, as follows:
 Elements of A whose values equal the lowest value l are copied into the
first F[0] positions of S, i.e., positions 0 through F[0]-1; elements with
the next higher value are copied into positions F[0] through
(F[0]+F[1])-1, and so on.

 Since such accumulated sums of frequencies are called a distribution in


statistics, the method itself is known as distribution counting.

Example: Consider sorting the array 13 11 12 13 12 12

whose values are known to come from the set {11, 12, 13} and should not
be overwritten in the process of sorting.
The frequency and distribution arrays are as follows:

Array values           11  12  13
Frequencies             1   3   2
Distribution values     1   4   6
Sorting by Distribution counting
 Note that the distribution values indicate the proper positions for the
last occurrences of their elements in the final sorted array. If we index
array positions from 0 to n-1, the distribution values must be reduced
by 1 to get corresponding element positions.

Input array 13 11 12 13 12 12

It is convenient to process the input array from right to left.


For example, the last element in the list above is 12 and, since its
distribution value is 4, we place this 12 in position 4 - 1 = 3 of the
sorted array S.
 Then we decrease the 12’s distribution value by 1 and proceed to the
next element (from the right) in the given array.
Sorting by Distribution counting
Input array 13 11 12 13 12 12

               D[0 . . . 2]             S[0 . . . 5]
               11  12  13        0    1    2    3    4    5

A[5] = 12       1   4   6                       12
A[4] = 12       1   3   6                  12
A[3] = 13       1   2   6                                 13
A[2] = 12       1   2   5             12
A[1] = 11       1   1   5        11
A[0] = 13       0   1   5                            13

Figure: Example of sorting by distribution counting. Each row shows the
distribution values at the moment the next element (taken from the right)
is processed; the value for that element is then decremented, and the
element is placed into the indicated cell of S.
Sorting by Distribution counting
ALGORITHM DistributionCounting(A[0 . . . n-1], l, u)
//Sorts an array of integers from a limited range by distribution counting
//Input: An array A[0 . . . n-1] of integers between l and u (l ≤ u)
//Output: Array S[0 . . . n-1] of A’s elements sorted in nondecreasing order
for j ← 0 to u-l do D[j] ← 0                      //initialize frequencies
for i ← 0 to n-1 do D[A[i]-l] ← D[A[i]-l] + 1     //compute frequencies
for j ← 1 to u-l do D[j] ← D[j-1] + D[j]          //reuse for distribution
for i ← n-1 downto 0 do
    j ← A[i]-l
    S[D[j]-1] ← A[i]
    D[j] ← D[j]-1
return S
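A runnable Python sketch of the same algorithm (names are ours; lo and hi
play the roles of l and u):

def distribution_counting_sort(a, lo, hi):
    # Frequencies of each value in the known range [lo, hi].
    d = [0] * (hi - lo + 1)
    for x in a:
        d[x - lo] += 1
    # Turn frequencies into distribution values (accumulated sums).
    for j in range(1, len(d)):
        d[j] += d[j - 1]
    # Scan right to left, placing each element just below its
    # distribution value; this also keeps the sort stable.
    s = [None] * len(a)
    for x in reversed(a):
        d[x - lo] -= 1
        s[d[x - lo]] = x
    return s

print(distribution_counting_sort([13, 11, 12, 13, 12, 12], 11, 13))
# [11, 12, 12, 12, 13, 13]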
Analysis of sorting by distribution counting
 The input size metric for this algorithm is n.

 The statements within the final for loop can be considered as the basic
operation.

 The number of times basic operation is executed can be obtained as


shown below:
C(n) = \sum_{i=0}^{n-1} 1 = (n-1) - 0 + 1 = n

So, time complexity of sorting by distribution counting is Θ(n).

This is a better time-efficiency class than that of the most efficient
sorting algorithms we have encountered - mergesort, quicksort, and
heapsort, which are all Θ(n log₂ n).

Input Enhancement in String Matching
 The pattern matching algorithm using Brute-force method had the
worst-case efficiency of Θ(mn) where m is the length of the pattern
and n is the length of the text. In the average-case, its efficiency turns
out to be in Θ(n).

 Several better algorithms have been discovered. Most of them exploit


the input enhancement idea: preprocess the pattern to get some
information about it, store this information in a table, and then use this
information during an actual search for the pattern in a given text.

 The various algorithms which use the input enhancement technique for
string matching are:
 Knuth-Morris-Pratt algorithm
 Boyer-Moore algorithm
 Horspool’s algorithm, which is a simplified version of the Boyer-Moore
algorithm.
Horspool’s Algorithm
A simplified version of Boyer-Moore algorithm:

 preprocesses pattern to generate a shift table that


determines how much to shift the pattern when a
mismatch occurs

 Determines the size of the shift by looking at the character c of the
text that was aligned against the last character of the pattern.
How far to shift?
In general, the following four possibilities can occur:

Case 1: Look at the character c of the text that was aligned against the
last (rightmost) character of the pattern; c is not in the pattern at all:

s0 . . . . . S . . . . . . . . . . . . . sn-1     (S not in the pattern)
   B A O B A B
               B A O B A B

If there are no c’s in the pattern - e.g., c is the letter S in the example
above - we can safely shift the pattern by its entire length.

Case 2: The character c is in the pattern (but is not its last character):

s0 . . . . . O . . . . . . . . . . . . . sn-1     (O occurs once in the pattern)
   B A O B A B
         B A O B A B

s0 . . . . . A . . . . . . . . . . . . . sn-1     (A occurs twice in the pattern)
   B A O B A B
       B A O B A B

If there are occurrences of character c in the pattern but it is not the
last one there - e.g., c is the letter O or the letter A in the examples
above - the shift should align the rightmost occurrence of c in the
pattern with the c in the text.
Case 3: If c happens to be the last character in the pattern but there are
no c’s among its other m-1 characters, the shift should be similar to
that of Case 1: the pattern should be shifted by the entire pattern’s
length m:

s0 . . . . M E R . . . . . . . . . . . sn-1
     L E A D E R
                 L E A D E R

Case 4: Finally, if c happens to be the last character in the pattern and
there are other c’s among its first m-1 characters, the shift should be
similar to that of Case 2: the rightmost occurrence of c among the first
m-1 characters in the pattern should be aligned with the text’s c:

s0 . . . . . . O R . . . . . . . . . . sn-1
     R E O R D E R
           R E O R D E R

 We can precompute shift sizes and store them in a table. The table will
be indexed by all possible characters that can be encountered in a text,
including, for natural language texts, the space, punctuation symbols,
and other special characters.
HORSPOOL’S ALGORITHM
 The table’s entries will indicate the shift sizes computed by the formula:

t(c) = the pattern’s length m, if c is not among the first m-1 characters
       of the pattern; otherwise, t(c) = the distance from the rightmost c
       among the first m-1 characters of the pattern to its last character.

Example: For the pattern BAOBAB, all the table’s entries will be equal to 6,
except for the entries A, B, and O, which will be 1, 2, and 3, respectively.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
1 2 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6 6 6

Figure: Shift table contents for the above example.


ALGORITHM FOR COMPUTING THE SHIFT TABLE ENTRIES
ALGORITHM ShiftTable(P[0 . . . m-1])
//Fills the shift table used by Horspool’s algorithm
//Input: Pattern P[0 . . . m-1] and an alphabet of possible characters
//Output: Table[0 . . . size-1] indexed by the alphabet’s characters and
//        filled with the shift sizes computed by the formula above
initialize all the elements of Table with m
for j ← 0 to m-2 do Table[P[j]] ← m-1-j
return Table

 Initialize all the entries of the table to the pattern’s length m.
 Scan the pattern left to right, repeating the following step m-1 times:
for the jth character of the pattern (0 ≤ j ≤ m-2), overwrite its entry in
the table with m-1-j, which is the character’s distance to the right end
of the pattern. Since the algorithm scans the pattern from left to right,
the last overwrite happens for a character’s rightmost occurrence.
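A small Python sketch of the table construction (a dictionary with a
default value at lookup time stands in for an array indexed by the whole
alphabet; names are ours):

def shift_table(pattern):
    # Characters absent from the first m-1 positions keep the default
    # shift m, supplied via dict.get() when the table is consulted.
    m = len(pattern)
    table = {}
    for j in range(m - 1):              # skip the pattern's last character
        table[pattern[j]] = m - 1 - j   # distance to the right end
    return table

t = shift_table("BAOBAB")
print(t["A"], t["B"], t["O"], t.get("S", 6))   # 1 2 3 6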
HORSPOOL’S ALGORITHM
Step 1 For a given pattern of length m and the alphabet used in both the
pattern and text, construct the shift table.

Step 2 Align the pattern against the beginning of text.

Step 3 Repeat the following until either a matching substring is found


or the pattern reaches beyond the last character of the text.
Starting with the last character in the pattern, compare the
corresponding characters in the pattern and text until either all m
characters are matched (then stop) or a mismatching pair is
encountered. In the latter case, retrieve the entry t(c) from the c’s
column of the shift table where c is the text’s character currently
aligned against the last character of the pattern and shift the
pattern by t(c) characters to the right along the text.
HORSPOOL’S ALGORITHM
ALGORITHM HorspoolMatching(P[0…m-1], T[0…n-1])
//Implements Horspool’s algorithm for string matching
//Input: Pattern P[0…m-1] and text T[0…n-1]
//Output: The index of the left end of the first matching substring
//        or -1 if there are no matches
ShiftTable(P[0…m-1])       //generate Table of shifts
i ← m-1                    //position of the pattern’s right end
while i ≤ n-1 do
    k ← 0                  //number of matched characters
    while k ≤ m-1 and P[m-1-k] = T[i-k] do
        k ← k+1
    if k = m
        return i-m+1
    else i ← i + Table[T[i]]
return -1
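The same algorithm in runnable Python (a self-contained sketch; the demo
string anticipates the BARBER example that follows):

def horspool_matching(pattern, text):
    # Returns the index of the left end of the first match, or -1.
    m, n = len(pattern), len(text)
    # Shift table: the rightmost occurrence wins because j increases.
    table = {pattern[j]: m - 1 - j for j in range(m - 1)}
    i = m - 1                           # text index under the pattern's right end
    while i <= n - 1:
        k = 0                           # number of matched characters
        while k <= m - 1 and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1            # left end of the matching substring
        i += table.get(text[i], m)      # characters not in the table shift by m
    return -1

print(horspool_matching("BARBER", "JIM_SAW_ME_IN_A_BARBERSHOP"))   # 16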
HORSPOOL’S ALGORITHM
Example: For the pattern BARBER, all the table’s entries will be equal
to 6, except for the entries E, B, R, and A, which will be 1, 2, 3,
and 4, respectively.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
4 2 6 6 1 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6

The actual search in a particular text proceeds as follows:

J I M _ S A W _ M E _ I N _ A _ B A R B E R S H O P
B A R B E R
        B A R B E R
          B A R B E R
                      B A R B E R
                          B A R B E R
                                B A R B E R
ANALYSIS OF HORSPOOL’S ALGORITHM
 m (pattern length) and n (text length) are the measures of the
input’s size.

 The basic operation is the comparison statement
“P[m-1-k] = T[i-k]” in the inner while loop.

 Total number of times the basic operation is executed is


given by:
C_{worst}(n) = \sum_{i=m-1}^{n-1} \sum_{k=0}^{m-1} 1 = \sum_{i=m-1}^{n-1} m = m(n-m+1) \in \Theta(nm)

So, the time complexity of pattern matching in the worst case is Θ(nm),
but for random texts it is in Θ(n). Horspool’s algorithm is therefore
faster on average than the brute-force algorithm.
Hashing
 A very efficient method for implementing a dictionary, i.e.,
a set with the operations:
 search
 insert
 delete

 Based on representation-change and space-for-time tradeoff


ideas

 Important applications:
 symbol tables – a table of computer program symbols
generated during compilation.
 databases –hashing is useful for storing very large
dictionaries on disks; this variation of hashing is called
extendible hashing.
Hash tables and hash functions
Here, we assume that we have to implement a dictionary of n records
with keys K1, K2, . . . , Kn.
 Hashing is based on the idea of distributing keys among a one-
dimensional array H[0…m-1] called the hash table.

 The distribution is done by computing, for each of the keys, the value
of some predefined function h called the hash function. This function
assigns to each key an integer between 0 and m-1, called the hash address.
Example: student records, key = SSN.
Hash function: h(K) = K mod m where m is some integer
(typically, prime)
If m = 37, where is record with SSN= 314159265 stored?

 In General, a hash function should:


 be easy to compute
 distribute keys about evenly throughout the hash table

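As a quick check of the question above (a sketch; the key and table size
are the ones from the slide):

m = 37                    # hash table size (typically a prime)
key = 314159265           # SSN used as the key
print(key % m)            # 35, so the record is stored in cell 35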
Collisions
 If we choose a hash table’s size m to be smaller than the number of
keys n, we will get collisions - a phenomenon of two (or more) keys
being hashed into the same cell of the hash table.
          Ki        Kj
            \      /
 [ . . . . | b | . . . . ]
  0                   m-1

Figure: Collision of two keys in hashing: h(Ki) = h(Kj).


 In the worst case, all the keys could be hashed to the same cell of the
hash table. With an appropriately chosen size of the hash table and a
good hash function, this situation happens rarely.

 Two principal hashing schemes handle collisions differently:
   Open hashing (Separate chaining)
   Closed hashing (Open addressing) - in case of collision, finds
     another cell of the table itself by a) linear probing or
     b) double hashing
Open hashing (Separate chaining)
 Keys are stored in linked lists attached to cells of a hash table.
 Each list contains all the keys hashed to its cell.
 Example: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED
h(K) = sum of K’s letters’ positions in the alphabet MOD 13
(for ARE the hash address is (1 + 18 + 5) mod 13 = 11; the hash
addresses of the other keys are computed similarly.)

Keys             A  FOOL  AND  HIS  MONEY  ARE  SOON  PARTED
Hash addresses   1    9    6    10     7    11    11      12

Cell:  0   1   2   3   4   5    6      7     8    9    10    11     12
           A                   AND   MONEY       FOOL  HIS   ARE  PARTED
                                                              |
                                                             SOON

Figure: Example of a hash table construction with separate chaining.
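A minimal Python sketch of this construction (lists of lists stand in for
linked lists; names are ours):

def h(key, m=13):
    # Sum of the letters' positions in the alphabet, mod m.
    return sum(ord(c) - ord('A') + 1 for c in key) % m

table = [[] for _ in range(13)]           # one chain per cell
for key in ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]:
    table[h(key)].append(key)             # insert at the end of the chain

print(table[11])                          # ['ARE', 'SOON'] -- a collision chain
print("KID" in table[h("KID")])           # False -- the unsuccessful search below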
Open Hashing (Separate chaining)
To search for a specific key:
Example: If we want to search for the key KID in the hash table, we first
compute the value of the hash function for the key: h(KID) = 11.
Since the list attached to cell 11 is not empty, it may contain the
search key. After comparing the string KID first with the string
ARE and then with the string SOON, we end up with an
unsuccessful search.

 The efficiency of searching depends on the lengths of the linked lists,
which, in turn, depend on the dictionary and table sizes, as well as on
the quality of the hash function.
 If the hash function distributes n keys among m cells of the hash table
about evenly, each list will be about n/m keys long.
 The ratio α = n/m, called the load factor of the hash table, plays a
crucial role in the efficiency of hashing.
Open Hashing (Separate Chaining)
 The average numbers of pointers (chain links) inspected in successful
searches, S, and unsuccessful searches, U, turn out to be
S ≈ 1 + α/2 and U = α, respectively.

 The load factor α is typically kept close to 1. Having it too small
would imply a lot of empty lists and hence inefficient use of space;
having it too large would mean longer linked lists and hence longer
search times.

 Two other dictionary operations - insertion and deletion - are almost
identical to searching, and they are all Θ(1) in the average case if the
number of keys n is about equal to the hash table’s size m.
 Insertions are normally done at the end of a list.
 Deletion is performed by searching for a key to be deleted and then
removing it from its list.

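For the separate-chaining example above, these estimates work out as
follows (a minimal sketch; n = 8 keys, m = 13 cells):

n, m = 8, 13                     # keys and cells from the example above
alpha = n / m                    # load factor, about 0.62
print(round(1 + alpha / 2, 2))   # S: about 1.31 links per successful search
print(round(alpha, 2))           # U: about 0.62 links per unsuccessful search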
Closed hashing (Open addressing)
All keys are stored inside the hash table itself without the use of linked lists.
(This implies that the table size m must be at least as large as the number of keys n.)

Key     A  FOOL  AND  HIS  MONEY  ARE  SOON  PARTED
h(K)    1    9    6    10     7    11    11      12

Cell:     0     1   2   3   4   5    6      7     8    9    10    11    12
                A
                A                                     FOOL
                A                   AND               FOOL
                A                   AND               FOOL  HIS
                A                   AND   MONEY       FOOL  HIS
                A                   AND   MONEY       FOOL  HIS   ARE
                A                   AND   MONEY       FOOL  HIS   ARE  SOON
       PARTED   A                   AND   MONEY       FOOL  HIS   ARE  SOON

(SOON collides with ARE in cell 11 and is placed in cell 12; PARTED then
collides in cell 12 and wraps around to cell 0.)
Figure: Example of a hash table construction with linear probing.
Closed Hashing (Open Addressing)
 Different strategies can be employed for collision resolution.
 Linear probing: This strategy checks the cell following the one
where the collision occurs. If that cell is empty, the new key is
installed there; if the next cell is already occupied, the availability of
that cell’s immediate successor is checked, and so on. Note that if
the end of the hash table is reached, the search wraps around to the
beginning of the table; i.e., the table is treated as a circular array.
o To search for a given key K, we start by computing h(K), where h is
the hash function used in the table’s construction. If the cell h(K) is
empty, the search is unsuccessful. If the cell is not empty, we must
compare K with the cell’s occupant: if they are equal, we have found a
matching key; if they are not, we compare K with a key in the next cell
and continue in this manner until we encounter either a matching key (a
successful search) or an empty cell (an unsuccessful search).

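A minimal, self-contained Python sketch of insertion and search with
linear probing (no deletions; the toy hash function and all names are ours):

def h(key, m=13):
    # Toy hash function from the chaining example: letter positions mod m.
    return sum(ord(c) - ord('A') + 1 for c in key) % m

def lp_insert(table, key):
    # Assumes the table is not full (n < m); otherwise this loops forever.
    i = h(key, len(table))
    while table[i] is not None:        # probe successive cells,
        i = (i + 1) % len(table)       # wrapping around circularly
    table[i] = key

def lp_search(table, key):
    i = h(key, len(table))
    while table[i] is not None:        # an empty cell ends the search
        if table[i] == key:
            return i                   # successful search
        i = (i + 1) % len(table)
    return -1                          # unsuccessful search

table = [None] * 13
for key in ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]:
    lp_insert(table, key)
print(table.index("PARTED"), lp_search(table, "SOON"))   # 0 12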
Closed hashing (cont.)
 Does not work if n > m
 Avoids pointers
 Deletions are not straightforward
 The number of probes needed to find/insert/delete a key depends on the
load factor α = n/m (the hash table’s density) and on the collision
resolution strategy. For linear probing:
S ≈ (½)(1 + 1/(1-α)) and U ≈ (½)(1 + 1/(1-α)²)
 As the table gets filled (α approaches 1), the number of probes in
linear probing increases dramatically, as the sketch below illustrates.
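Evaluating these estimates for a few load factors shows how quickly the
probe counts grow (a sketch of the arithmetic only):

for alpha in (0.5, 0.75, 0.9, 0.99):
    S = 0.5 * (1 + 1 / (1 - alpha))          # successful search
    U = 0.5 * (1 + 1 / (1 - alpha) ** 2)     # unsuccessful search
    print(alpha, round(S, 1), round(U, 1))
# 0.5  1.5  2.5
# 0.75 2.5  8.5
# 0.9  5.5  50.5
# 0.99 50.5 5000.5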
End of Chapter 7

