3 Hashing
General Idea
• Hashing facilitates search, ideally in O(1) time.
• The ideal hash table structure is merely an array of some fixed
size, containing the items.
• A stored item needs to have a data member, called key, that
will be used in computing the index value for the item.
• The key could be an integer, a string, etc.
▪ e.g. a name or ID that is part of a large employee structure
• The size of the array is TableSize.
• The items that are stored in the hash table are indexed by
values from 0 to TableSize – 1.
• Each key is mapped into some number in the range 0 to
TableSize – 1.
• The mapping is called a hash function.
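The idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Employee` record, the table size of 10, and the `toy_hash` function (sum of character codes modulo the table size) are hypothetical choices, not a recommended hash function, and the sketch does nothing about two keys landing in the same cell.

```python
from dataclasses import dataclass
from typing import List, Optional

TABLE_SIZE = 10  # assumed table size, for illustration only


@dataclass
class Employee:
    name: str    # the data member used as the key
    salary: int


def toy_hash(key: str, table_size: int) -> int:
    """Hypothetical hash: sum of character codes, reduced to 0..table_size-1."""
    return sum(ord(ch) for ch in key) % table_size


# The table is just a fixed-size array; None marks an empty cell.
table: List[Optional[Employee]] = [None] * TABLE_SIZE

for emp in (Employee("john", 25000), Employee("phil", 31250), Employee("dave", 27500)):
    index = toy_hash(emp.name, TABLE_SIZE)
    table[index] = emp  # no collision handling in this sketch

# Search: recompute the index from the key and look in that one cell.
found = table[toy_hash("phil", TABLE_SIZE)]
```

Searching costs one hash computation plus one array access, which is where the ideal O(1) figure comes from.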
Example Hash Table
Each item (name, salary) is placed in the cell that the hash function computes from its key, the name.
Cell   Item
0
1
2
3      john 25000
4      phil 31250
5
6      dave 27500
7      mary 28200
8
9
Hash Function
• A hash function H maps the set K of keys to the set L of memory
locations:
▪ H: K → L
• The hash function:
▪ must be simple to compute.
▪ must distribute the keys evenly among the cells.
• If we knew in advance which keys will occur, we could
write a perfect hash function, but usually we do not.
• Problems:
▪ Keys may not be numeric.
▪ Number of possible keys is much larger than the space
available in table.
• Different keys may map into same location
▪ Hash function is not one-to-one => collision.
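A small experiment illustrates why collisions cannot be avoided once there are more keys than cells. The table size of 7, the 50 integer keys, and the remainder-based mapping are all assumptions chosen only to make the effect visible.

```python
from collections import defaultdict

TABLE_SIZE = 7           # assumed, deliberately small
keys = range(100, 150)   # 50 hypothetical integer keys

# Group the keys by the cell they are mapped to.
cells = defaultdict(list)
for k in keys:
    cells[k % TABLE_SIZE].append(k)  # simple remainder mapping, for illustration

# 50 keys spread over 7 cells: some cell must receive at least 8 of them.
fullest = max(cells.values(), key=len)
```

Whatever mapping is used, 50 keys cannot fit into 7 cells one-to-one, so a practical hash table needs a collision resolution strategy.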
Some popular hash functions
• Division Method
• Midsquare Method
• Folding Method
Division Method
• h(k) = k mod M
• Generally, it is best to choose M to be a prime
number, because a prime M makes it more likely
that the keys are mapped uniformly across the
output range of values.
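The effect of the choice of M can be seen with a short sketch. The key set (multiples of 10) and the two values of M are assumptions picked to show a worst case for a non-prime M; real key sets will behave less dramatically.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod M."""
    return k % m


keys = [10, 20, 30, 40, 50, 60, 70, 80]  # hypothetical keys sharing a factor of 10

cells_m10 = {division_hash(k, 10) for k in keys}  # M = 10: every key lands in cell 0
cells_m11 = {division_hash(k, 11) for k in keys}  # M = 11 (prime): eight distinct cells
```

With M = 10 the keys share a common factor with M and all collide; with the prime M = 11 the same keys occupy eight different cells.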
Midsquare Method
• Step 1: Square the value of the key. That is, find k².
• Step 2: Extract the middle r digits of the result obtained in
Step 1, where r is the number of digits in a hash table address.
Example: Calculate the hash value for keys 1234 and 5642 using the
midsquare method. The hash table has 100 memory locations.
Note that the hash table has 100 memory locations whose indices vary from
0 to 99. This means only two digits are needed to map the key to a
location in the hash table, so r = 2.
▪ k = 1234: k² = 1522756, so h(1234) = 27
▪ k = 5642: k² = 31832164, so h(5642) = 21
Observe that the 3rd and 4th digits starting from the right are chosen.
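The example can be checked with a minimal sketch. The function name `midsquare_hash` and its `skip` parameter are hypothetical; `skip=2` encodes the choice made above of taking the 3rd and 4th digits from the right, which is one convention for "middle" among several.

```python
def midsquare_hash(k: int, r: int = 2, skip: int = 2) -> int:
    """Midsquare method on decimal digits.

    Squares the key, drops the `skip` rightmost digits, and keeps the
    next `r` digits. With r=2 and skip=2 this selects the 3rd and 4th
    digits counted from the right.
    """
    squared = k * k
    return (squared // 10**skip) % 10**r


h1 = midsquare_hash(1234)  # 1234**2 = 1522756  -> 27
h2 = midsquare_hash(5642)  # 5642**2 = 31832164 -> 21
```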
Folding Method
• The folding method works in two steps.
• Step 1: Divide the key value into a number of parts. That is,
divide k into parts k1, k2, …, kn, where each part has the same
number of digits except the last part, which may have fewer
digits than the other parts.
• Step 2: Add the individual parts. That is, obtain the sum k1 +
k2 + … + kn. The hash value is produced by ignoring the last carry,
if any.
• Note that the number of digits in each part of the key varies
with the size of the hash table. For example, a hash table of
size 1000 has 1000 locations, with indices 0 to 999. To address
these 1000 locations we need three digits, so each part of the
key must have three digits, except the last part, which may have
fewer.
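The two steps can be sketched as follows. The function name `folding_hash` and the sample keys are assumptions for illustration; the sketch works on the decimal digits of a non-negative integer key and drops the carry by keeping only the lowest `part_digits` digits of the sum.

```python
def folding_hash(k: int, part_digits: int) -> int:
    """Folding method on the decimal digits of a non-negative key."""
    digits = str(k)
    # Step 1: split into parts of part_digits digits; the last part may be shorter.
    parts = [int(digits[i:i + part_digits])
             for i in range(0, len(digits), part_digits)]
    # Step 2: add the parts and ignore any carry beyond part_digits digits.
    return sum(parts) % 10**part_digits


a = folding_hash(5678, 2)        # 56 + 78 = 134          -> 34
b = folding_hash(34567, 2)       # 34 + 56 + 7 = 97       -> 97
c = folding_hash(123456789, 3)   # 123 + 456 + 789 = 1368 -> 368
```

Two-digit parts suit a table with 100 locations, and three-digit parts suit a table with 1000 locations.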
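Collision Resolution
Example: each cell holds a linked list of the keys that hash to it
(here every key sits in cell key mod 10).
Cell   List
1      81 → 1
2
4      64 → 4
5      25
6      36 → 16
7
9      49 → 9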
Operations
• Initialization: all entries are set to NULL
• Find:
– Locate the cell using the hash function.
– Search sequentially through the linked list in that cell.
• Insertion:
– Locate the cell using the hash function.
– If the item is not already in the list, insert it as the
first item in the list.
• Deletion:
– Locate the cell using hash function.
– Delete the item from the linked list.
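The four operations can be sketched as a small class. This is a minimal illustration, not a definitive implementation: the names `Node`, `ChainedHashTable`, and `chain`, the integer keys, and the hash function key mod table size are all assumptions.

```python
from typing import List, Optional


class Node:
    """One linked-list cell holding a key and a link to the next node."""
    def __init__(self, key: int, next: "Optional[Node]" = None):
        self.key = key
        self.next = next


class ChainedHashTable:
    def __init__(self, table_size: int = 10):
        # Initialization: every entry starts out empty (NULL).
        self.table_size = table_size
        self.cells: List[Optional[Node]] = [None] * table_size

    def _index(self, key: int) -> int:
        return key % self.table_size  # assumed hash function

    def find(self, key: int) -> bool:
        node = self.cells[self._index(key)]   # locate the cell
        while node is not None:               # sequential search in its list
            if node.key == key:
                return True
            node = node.next
        return False

    def insert(self, key: int) -> None:
        if not self.find(key):                # only if not already present
            i = self._index(key)
            self.cells[i] = Node(key, self.cells[i])  # new first item

    def delete(self, key: int) -> None:
        i = self._index(key)
        prev, node = None, self.cells[i]
        while node is not None and node.key != key:
            prev, node = node, node.next
        if node is None:
            return                            # key not in the table
        if prev is None:
            self.cells[i] = node.next         # removed the first item
        else:
            prev.next = node.next

    def chain(self, i: int) -> List[int]:
        """Helper for inspection: keys in cell i, front to back."""
        out, node = [], self.cells[i]
        while node is not None:
            out.append(node.key)
            node = node.next
        return out


t = ChainedHashTable(10)
for key in (1, 4, 9, 16, 25, 36, 49, 64, 81):
    t.insert(key)
```

Because new items go to the front of a list, inserting 1 and then 81 leaves cell 1 holding 81 followed by 1.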