Ch09 Space and Time Tradeoffs
Objectives
• After studying this chapter, you should be able to:
1. Explain the importance of the space-time tradeoff in programming.
2. Describe the process of sorting by counting.
3. Analyze the process of input enhancement in string matching.
4. Define the role of hashing in the space and time tradeoff.
5. Explain the B-tree technique with respect to the space and time tradeoff.
Sorting
• Input enhancement is based on preprocessing the
instance to obtain additional information that can
be used to solve the instance in less time.
• Sorting is an example of input enhancement that
achieves time efficiency.
Distribution Counting
• Distribution counting is a sorting method that uses the accumulated frequencies (the distribution) of the element values to place each element at its final position in a separate array.
• In this method the elements are distributed into that array from the 0th to the (n-1)th position.
• Because each element is copied directly into its final slot, the method ensures that elements do not get overwritten.
Distribution Counting Algorithm
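• Sketched below is a minimal Python version of distribution counting; the function name and the assumption that the input values are integers in a known range [lower, upper] are illustrative choices made for the example. It counts frequencies, accumulates them into the distribution, and copies each element into its final position.

def distribution_counting_sort(a, lower, upper):
    """Sort list a, whose values are integers in [lower, upper]."""
    d = [0] * (upper - lower + 1)
    for value in a:                       # count the frequency of each value
        d[value - lower] += 1
    for j in range(1, len(d)):            # accumulate: d[j] = number of values <= lower + j
        d[j] += d[j - 1]
    s = [None] * len(a)
    for i in range(len(a) - 1, -1, -1):   # right-to-left pass copies each element
        d[a[i] - lower] -= 1              # into its final position without overwriting
        s[d[a[i] - lower]] = a[i]
    return s

# Example: distribution_counting_sort([13, 11, 12, 13, 12, 12], 11, 13)
# returns [11, 12, 12, 12, 13, 13]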
Input Enhancement in String Matching
• String matching is an important application of input enhancement.
• The brute-force method is the simplest, but it is time consuming because every character of the pattern may be compared with every character of the text.
• Hence faster algorithms have been developed; some of these are listed below:
Horspool's algorithm
The Boyer-Moore algorithm
The Knuth-Morris-Pratt algorithm
Horspool’s Algorithm
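• In Horspool's algorithm the pattern is aligned against the text, characters are compared from right to left, and on a mismatch the pattern is shifted by an amount taken from a precomputed shift table indexed by the text character aligned with the pattern's last position. A minimal Python sketch follows (function and variable names are illustrative).

def horspool_shift_table(pattern):
    """Shift distances indexed by the text character aligned with the pattern's last cell."""
    m = len(pattern)
    table = {}
    for i in range(m - 1):                # every pattern character except the last one
        table[pattern[i]] = m - 1 - i
    return table                           # characters absent from the table shift by m

def horspool_search(pattern, text):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    table = horspool_shift_table(pattern)
    i = m - 1                              # text index aligned with the pattern's last character
    while i < n:
        k = 0                              # number of characters matched so far
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1               # full match found
        i += table.get(text[i], m)         # shift decided by the character under the last cell
    return -1

# Example: horspool_search("BARBER", "JIM_SAW_ME_IN_A_BARBERSHOP") returns 16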
Boyer-Moore Algorithm
• The Boyer-Moore algorithm uses two heuristics: the good-suffix shift and the bad-character shift.
• These shifts are applied when a mismatch occurs.
• The bad-character shift decides the number of places to shift:
- As in Horspool's algorithm, if the rightmost character mismatches against a text character that does not occur in the pattern, the pattern is shifted to the right by its entire length.
Boyer-Moore Algorithm
- When the rightmost character of the pattern matches the corresponding text character, the remaining characters are compared from right to left.
- If at some point a mismatch occurs against a text character T after K characters of the pattern have matched, the resulting bad-character shift is denoted by P.
Good Suffix Shift
• This shift makes use of the matched part of the pattern (the good suffix) and is denoted by Q.
• The good-suffix shift Q is applied after 0 < K < m characters have been matched.
• Q = the distance between the matched suffix of size K and its rightmost occurrence in the pattern that is not preceded by the same character as the suffix.
Good Suffix Shift
• Until the pattern is found or the text is exhausted, do the following:
- If all m characters of the pattern match the text, stop: the pattern has been found.
- If a mismatching pair occurs, compute the bad-character shift P and the good-suffix shift Q, and use them to compute the shift size:
- R = max{P, Q}, where K > 0.
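• A small numeric illustration (the values are invented for the example): if a mismatch occurs after K = 2 characters have matched, and the bad-character shift gives P = 4 while the good-suffix shift gives Q = 6, the pattern is moved to the right by R = max{4, 6} = 6 positions.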
Hash Methodology
• Hashing is the method by which a string of characters or a large amount of data is transformed into a usually shorter, fixed-length value or key that represents the original data.
• This key is used to index and retrieve items from a database.
• We can find items faster using the shorter hashed key than by using the original value.
• Hashing is performed on arbitrary data by a hash function.
• The code generated acts as a compact fingerprint of the data it came from, although different data can occasionally produce the same code (a collision, discussed later).
Hash Function
• What is a hash function?
• A hash function is a function that converts data into either a number or an alphanumeric code.
• The algorithm that performs this hashing is called the hash function.
• Hash functions are primarily used in hash tables, to locate a data record given its search key.
• The hash function maps the search key to the hash value, which serves as an index into the table.
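• As a concrete illustration, a hash function might map a string key to a bucket index with a simple modular scheme, as in the Python sketch below (the multiplier 31 and the function name are arbitrary choices made for the example).

def string_hash(key, table_size):
    """Map a string key to a bucket index in the range [0, table_size)."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % table_size   # fold each character into the running value
    return h

# Example: string_hash("algorithm", 13) yields an index between 0 and 12.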
Hash Function
• The index gives the place where the corresponding record is stored.
• Since several keys may map to the same index, each slot in a hash table is associated with a set of records rather than a single record.
• Each slot in a hash table is called a bucket, and hash values are also called bucket indices.
• The values a hash function returns are called hash values, hash codes, hash sums, checksums or simply hashes.
• The hash function hints at the record's location: it tells where one should start looking for it.
Uses of Hash Functions
• Hash functions are used to speed up table lookup and data comparison tasks, such as finding items in a database, detecting duplicate or similar records in a large file, and so on.
• Hash functions are also used to determine whether two objects are equal or similar, to compute checksums over large amounts of data, and to find an entry in a database by a key value.
• The UNIX C shell uses a hash table to store the locations of executable programs.
Collision resolution
• A hash function can map two or more keys to the same hash value; such hash collisions are unavoidable when hashing a random subset of a large set of possible keys.
• In many applications it is desirable to minimize the occurrence of such collisions, which means that the hash function must map the keys to the hash values as evenly as possible.
• Therefore, hash table implementations have some collision resolution strategy to handle such events.
Collision resolution
• The most common strategies are:
1. Open hashing (separate chaining)
2. Closed hashing (open addressing)
Load Factor
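• The load factor of a hash table is the ratio of the number of stored entries n to the number of buckets m: load factor = n / m.
• A low load factor means many empty buckets (wasted space), while a high load factor means more collisions and slower operations; choosing it is the space-time tradeoff at the heart of hashing.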
Separate chaining
• Hash collisions can be resolved by separate chaining, also called open hashing or closed addressing.
• In this strategy, each slot of the bucket array is a pointer to a linked list which contains the key-value pairs that hash to the same location.
• Lookup scans the list for an entry with the given key.
• Insertion adds a new entry record at either end of the list belonging to the hashed slot.
Separate chaining
• Deletion involves searching the list and removing the element.
• Chained hash tables with linked lists are popular because they require only basic data structures and simple algorithms, and they can use simple hash functions that would be unsuitable for other methods.
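• A minimal sketch of a separately chained table in Python is shown below (class and method names are illustrative, and Python lists stand in for the linked lists described above).

class ChainedHashTable:
    """Separate chaining: each bucket holds the key-value pairs that hash to it."""

    def __init__(self, num_buckets=11):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):    # replace the value if the key is already present
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))            # otherwise add a new entry at the end of the chain

    def lookup(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None                            # the key is not in the table

    def delete(self, key):
        i = self._index(key)
        self.buckets[i] = [(k, v) for (k, v) in self.buckets[i] if k != key]

# Example: t = ChainedHashTable(); t.insert("cat", 1); t.lookup("cat") returns 1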
Open addressing
• Collision resolution by open addressing is also called closed hashing. The term "open addressing" indicates that the location ("address") of an item is not determined solely by its hash value.
• In open addressing, the entry records are stored in the bucket array itself.
Open addressing
• When a new entry has to be made, the bucket array is examined, starting with the hashed-to slot and proceeding in some probe sequence, until an unoccupied slot is found.
• When searching for an entry, the buckets are scanned in the same probe sequence until either the target record is found or an unused array slot is found, which indicates that there is no such key in the table.
• The popular probe sequences are:
Open addressing
- Double hashing: the interval between probes is computed by another hash function.
- Linear probing: the interval between probes is fixed (usually 1).
- Quadratic probing: the interval between probes is increased by adding the successive outputs of a quadratic polynomial to the starting value given by the original hash computation.
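• A minimal sketch of open addressing with linear probing in Python is shown below (names are illustrative; deletion is omitted because it requires extra bookkeeping that the slides do not cover).

class LinearProbingTable:
    """Open addressing: entries live in the bucket array itself; on a collision
    the probe sequence simply tries the next slot (interval fixed at 1)."""

    def __init__(self, num_slots=11):
        self.keys = [None] * num_slots
        self.values = [None] * num_slots

    def insert(self, key, value):
        i = hash(key) % len(self.keys)
        for _ in range(len(self.keys)):             # probe at most every slot once
            if self.keys[i] is None or self.keys[i] == key:
                self.keys[i], self.values[i] = key, value
                return
            i = (i + 1) % len(self.keys)             # linear probe: move to the next slot
        raise RuntimeError("hash table is full")     # entries cannot exceed the number of slots

    def lookup(self, key):
        i = hash(key) % len(self.keys)
        for _ in range(len(self.keys)):
            if self.keys[i] is None:                 # an unused slot means the key is absent
                return None
            if self.keys[i] == key:
                return self.values[i]
            i = (i + 1) % len(self.keys)
        return None

# Example: t = LinearProbingTable(); t.insert("cat", 1); t.lookup("cat") returns 1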
Open addressing
• A drawback to the open addressing schemes is that the
number of stored entries cannot exceed the number of
slots in the bucket array. In fact, even with good hash
functions, their performance decreases when the load
factor grows beyond 0.7 or so.
• Open addressing schemes also place stricter requirements on the hash function. The function must distribute the keys uniformly over the buckets and minimize clustering of hash values that are consecutive in the probe order.
Open addressing
• Open addressing saves memory if the entries are
small (less than 4 times the size of a pointer).
• Open addressing is a waste if the load factor is
close to zero (that is, there are far more buckets
than stored entries), even if each entry is just two
words.
Indexing Schemes
• An effective indexing scheme will help in easy
retrieval of data.
• Different tradeoffs are involved in different
indexing techniques.
• An indexing scheme which is faster may require
more storage.
• The B-tree is one such important index
organization.
B-Tree Technique
• The B-tree is an indexing scheme at the high-speed end of the spectrum.
• The tradeoff is that, in exchange for fast access times, you pay in terms of code complexity, memory buffer size, and disk space for the B-tree itself.
• It uses an index to the data records: a file is built that contains enough information to describe the position and key of each record in the data file. The index is organized into a branching tree structure, and the tree is kept balanced.
B-Tree Technique
• The number of index accesses needed to reach a record is proportional to the logarithm of the number of records in the file.
• That is to say, access time increases slowly as the number of records increases.
• As the branching factor increases, the height of the tree decreases, making access quicker (see the example below).
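• As a rough worked example (the numbers are chosen only for illustration): with a branching factor of 100, an index over 1,000,000 records needs a tree of height about log base 100 of 1,000,000, which is 3, so any record can be reached in roughly three index accesses.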
B-Tree Technique
• A B-tree of order m (the maximum number of children for each node) is a tree which satisfies the following properties:
- Every node has at most m children.
- Every non-leaf node (except the root) has at least ⌈m/2⌉ (that is, m/2 rounded up) children.
- The root has at least two children if it is not a leaf node.
- All leaves appear on the same level, and carry information.
- A non-leaf node with k children contains k-1 keys.
Search
• Searching is similar to searching a binary search tree. Starting from the root, the tree is traversed recursively from top to bottom. At each level the search follows the child pointer (subtree) whose separating values lie on either side of the search value.
• For example, consider figure 9.8. To search for the number 5 we start at the root node and traverse the tree. The number 5 is compared with the root value 13, and since it is less than 13 the search moves to the left subtree.
• At the second level the number 5 is compared with the keys 4 and 7. Since 5 lies between 4 and 7, the search follows the child pointer between those two keys. The comparison continues in this way until the node containing 5 is found.
[Figure 9.8: the example B-tree used in the search above]
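• A minimal Python sketch of this search is given below (the node layout, a list of keys plus a list of child subtrees, is an assumption made for the example).

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                   # sorted keys stored in this node
        self.children = children or []     # the i-th child holds keys smaller than keys[i];
                                           # the last child holds keys larger than keys[-1]

def btree_search(node, target):
    """Return True if target occurs in the B-tree rooted at node."""
    while node is not None:
        i = 0
        while i < len(node.keys) and target > node.keys[i]:
            i += 1                         # find the first key that is >= target
        if i < len(node.keys) and node.keys[i] == target:
            return True                    # the key is stored in this node
        if not node.children:              # reached a leaf without finding the key
            return False
        node = node.children[i]            # descend into the subtree between the surrounding keys
    return False

# Example shaped like the search described above (root key 13, with 4 and 7 in its left child):
# leaf = lambda *ks: BTreeNode(list(ks))
# root = BTreeNode([13], [BTreeNode([4, 7], [leaf(1, 3), leaf(5, 6), leaf(8, 11)]),
#                         BTreeNode([17], [leaf(14, 16), leaf(20, 23)])])
# btree_search(root, 5) returns True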
Insertion
• To understand the process of insertion, let us consider an empty B-tree of order 5 and insert the following numbers into it: 3, 14, 7, 1, 8, 5, 11, 17, 13.
• A tree of order 5 has a maximum of 5 children and 4 keys per node. All nodes other than the root must have a minimum of 2 keys.
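• A brief trace of these insertions, assuming the usual rule that a new key goes into a leaf and an overflowing leaf splits at its median key, which moves up to the parent:
- 3, 14, 7 and 1 fill a single node: [1, 3, 7, 14].
- Inserting 8 would give five keys, so the node splits at its median 7; the root becomes [7] with leaves [1, 3] and [8, 14].
- 5 goes into the left leaf, giving [1, 3, 5]; 11 and 17 go into the right leaf, giving [8, 11, 14, 17].
- Inserting 13 overflows the right leaf, which splits at its median 13; the root becomes [7, 13] with leaves [1, 3, 5], [8, 11] and [14, 17].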
Deletion
• Considering the B-tree in figure 9.13, let us understand the process of deletion by deleting the following numbers one by one.
• To delete 20, which is not in a leaf node, we find its successor 23 (the next item in ascending order) and move 23 up to replace 20.
• We can then remove 23 from the leaf, since that leaf has extra keys.
Summary
• We usually analyze the efficiency of an algorithm in terms of its time and space requirements. The efficiency of an algorithm is stated as a function relating the input length to the number of steps (time complexity) or storage locations (space complexity).
• Distribution counting is an input-enhancement method wherein a separate array is used to store the information generated during the sorting process, and this array speeds up the sorting.
• Horspool's and Boyer-Moore algorithms are string matching algorithms wherein the pattern is compared with the text and the pattern is shifted by a computed shift size. This makes the searching operation faster.
Summary
• Hashing is a technique that uses a hash key to find items. A collision occurs when the hash key values of two items turn out to be the same. This is handled by two collision resolution methods: separate chaining and open addressing.
• A larger branching factor in the B-tree technique speeds up access.
• A leaf node is a node in a tree data structure that has no child nodes.
• A linked list is a data structure which consists of a sequence of data records, where each record contains a field that holds a reference to the next record.
Terminal Questions
1. Explain distribution counting with an example.
2. Explain the two types of collision resolution in
hashing.
3. Describe the algorithms based on input
enhancement in string matching.
4. What is a hash function?
5. How does B-Tree technique enhance space and
time tradeoff?
Self Assessment Questions
1. Input enhancement is based on Preprocessing the instance.
2. The information which is used to place the elements at their proper positions is the accumulated sum of frequencies, which is called the Distribution.
3. Sorting is an example of input enhancement that achieves Time Efficiency.
4. Input enhancement is used to Preprocess the input pattern.
5. In Horspool's algorithm, the characters of the pattern are matched Right to Left.
Self Assessment Questions
6. The two heuristics in the Boyer-Moore algorithm are the Good Suffix and Bad Character shifts.
7. Each slot of a hash table is often called a Bucket.
8. A collision occurs when a hash function maps two or more keys to the Same Hash Value.
9. When the interval between probes is computed by another hash function, it is Double Hashing.
10. As the Branching Factor increases, the height of the tree decreases, thus speeding access.
11. Access time increases slowly as the number of records Increases.
12. Insertions in a B-tree start from a Leaf node.
END