CS 218 - Data Structures: Hashing
HASHING
MAP
• A map allows us to store elements so they can
be located quickly using keys.
• Specifically, a map stores key-value pairs.
• How do we implement a map? Use a hash table.
• The basic goal of hashing is to find, insert
and delete in (approximately) constant time.
Hashing
• Values are mapped to positions in a hash table.
• Structure: treat the indexes as buckets; each value
is stored in the bucket chosen by the hash function.
• A basic hash function (division method):
• H(k) = k mod table-size
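The division method is a one-liner. The sketch below assumes integer keys and a table size of 10 (the size used in the later examples); the function name `division_hash` is hypothetical.

```python
def division_hash(key: int, table_size: int) -> int:
    """Division method: H(k) = k mod table-size."""
    return key % table_size

# Keys used in the later examples, with an assumed table size of 10.
for k in [89, 18, 49, 58, 69]:
    print(k, "->", division_hash(k, 10))
```

Note that 89, 49 and 69 all map to index 9: this is exactly the collision problem the rest of the section deals with.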
Basic hash functions
• Division
• Radix method
• Character (digit) selection
• Addition
• Folding
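Three of the listed methods can be sketched as follows. These are one common formulation of each (addition of character codes, shift folding of decimal digit groups, selection of fixed digit positions); the function names and the group size of 2 are hypothetical choices, and the radix method is not shown.

```python
def addition_hash(key: str, table_size: int) -> int:
    """Addition: sum the character codes of the key, then reduce mod table-size."""
    return sum(ord(ch) for ch in key) % table_size

def folding_hash(key: int, table_size: int, group: int = 2) -> int:
    """Folding (shift folding): split the decimal digits into groups of
    `group` digits, add the groups, then reduce mod table-size."""
    digits = str(key)
    parts = [int(digits[i:i + group]) for i in range(0, len(digits), group)]
    return sum(parts) % table_size

def digit_selection_hash(key: int, positions: list[int], table_size: int) -> int:
    """Selection: keep only the digits at the given positions (0-based, from the left)."""
    digits = str(key)
    selected = "".join(digits[p] for p in positions)
    return int(selected) % table_size

print(addition_hash("abc", 10))                            # 97 + 98 + 99 = 294 -> 4
print(folding_hash(123456789, 100))                        # 12+34+56+78+9 = 189 -> 89
print(digit_selection_hash(123456789, [1, 3, 5], 1000))    # digits 2, 4, 6 -> 246
```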
Separate chaining
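• Maintain a list of all the elements that hash
to the same value.
• Array of linked lists
• Example: 0, 81, 1, 9, 49, 25, 36, 16
• Load factor λ = number of elements / table-size
• The average length of a list is the load factor λ.
• Searching: hash to the bucket, then traverse only
that bucket's list; an unsuccessful search examines
λ nodes on average.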
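A minimal separate-chaining sketch, assuming integer keys, hash(x) = x mod table-size, and a table size of 10 for the example (the slide does not state the size). Python lists stand in for the linked lists, and new keys are appended at the end of their bucket; the class name `ChainedHashTable` is hypothetical.

```python
class ChainedHashTable:
    """Separate chaining: an array of buckets, each holding a list of keys."""

    def __init__(self, table_size: int = 10):
        self.table_size = table_size
        self.buckets = [[] for _ in range(table_size)]
        self.count = 0

    def _hash(self, key: int) -> int:
        return key % self.table_size

    def insert(self, key: int) -> None:
        bucket = self.buckets[self._hash(key)]
        if key not in bucket:          # ignore duplicate keys
            bucket.append(key)
            self.count += 1

    def contains(self, key: int) -> bool:
        # Search: hash to one bucket, then scan only that bucket's list.
        return key in self.buckets[self._hash(key)]

    def load_factor(self) -> float:
        return self.count / self.table_size

table = ChainedHashTable(10)
for k in [0, 81, 1, 9, 49, 25, 36, 16]:
    table.insert(k)
print(table.buckets)        # bucket 1 holds [81, 1], bucket 6 holds [36, 16], bucket 9 holds [9, 49]
print(table.load_factor())  # 8 elements / 10 buckets = 0.8
```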
Separate chaining disadvantages
• Uses linked lists, so two data structures are
needed (array + linked lists)
• Better strategy: why not simply use the
unused (free) cells of the array itself?
• Concept of open addressing
• Change: the table must now be bigger, since
every element is stored in the array itself.
Collisions between keys
• Different keys hash to the same index.
• Factors to consider: table-size, hash function
and load factor
Open addressing
• Concept of probing: on a collision, try alternative
cells until an empty one is found
• Linear probing
• Quadratic probing
• Double hashing
• Main problem? Deleting an element
• Solution: mark each cell as {empty, filled, deleted}
(lazy deletion)
• Avoid this technique when deletions are
frequent
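Why a "deleted" marker is needed: if a deleted cell were simply reset to empty, a later search would stop there and miss keys placed further along the same probe sequence. The sketch below assumes linear probing, integer keys and hash(x) = x mod table-size; the names `State` and `OpenAddressingTable` are hypothetical.

```python
from enum import Enum

class State(Enum):
    EMPTY = 0
    FILLED = 1
    DELETED = 2

class OpenAddressingTable:
    def __init__(self, table_size: int = 10):
        self.size = table_size
        self.keys = [None] * table_size
        self.state = [State.EMPTY] * table_size

    def _probe(self, key: int):
        # Linear probing: hash(key) + i for i = 0, 1, ..., size - 1.
        start = key % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def find(self, key: int):
        for idx in self._probe(key):
            if self.state[idx] == State.EMPTY:
                return None            # a truly empty cell ends the search
            if self.state[idx] == State.FILLED and self.keys[idx] == key:
                return idx
            # DELETED cells are skipped: the probe sequence continues past them
        return None

    def insert(self, key: int) -> None:
        first_free = None
        for idx in self._probe(key):
            if self.state[idx] == State.FILLED:
                if self.keys[idx] == key:
                    return             # already present
            elif self.state[idx] == State.DELETED:
                if first_free is None:
                    first_free = idx   # reusable, but keep checking for duplicates
            else:                      # EMPTY: the key cannot appear further along
                if first_free is None:
                    first_free = idx
                break
        if first_free is None:
            raise RuntimeError("table is full")
        self.keys[first_free] = key
        self.state[first_free] = State.FILLED

    def delete(self, key: int) -> None:
        idx = self.find(key)
        if idx is not None:
            self.state[idx] = State.DELETED   # lazy deletion keeps the chain intact

t = OpenAddressingTable(10)
t.insert(89)           # index 9
t.insert(49)           # collides at 9, placed at index 0
t.delete(89)           # index 9 becomes DELETED, not EMPTY
print(t.find(49))      # still found at index 0
t.insert(69)           # reuses the DELETED cell at index 9
```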
Linear probing
• H(k)=k mod table-size
• In general, the i-th probe is
H(x) = (hash(x) + f(i)) mod TableSize
• For linear probing f(i) = i, with f(0) = 0
• Keep the load factor λ ≤ 0.5
• Example: insert keys {89, 18, 49, 58, 69} into a hash
table, assuming the hash function is key
mod table-size
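The example can be traced with a small open-addressing helper. It assumes a table size of 10 (the size used in the quadratic and double hashing examples), and the name `probe_insert` is hypothetical; `f` is the probe-offset function f(i).

```python
def probe_insert(keys, table_size, f):
    """Open addressing: the i-th probe for x is (hash(x) + f(i)) mod table_size,
    with hash(x) = x mod table_size. Returns the resulting table."""
    table = [None] * table_size
    for x in keys:
        for i in range(table_size):
            idx = (x % table_size + f(i)) % table_size
            if table[idx] is None:
                table[idx] = x
                break
        else:
            raise RuntimeError(f"no free cell found for {x}")
    return table

# Linear probing: f(i) = i.
table = probe_insert([89, 18, 49, 58, 69], 10, lambda i: i)
print(table)  # [49, 58, 69, None, None, None, None, None, 18, 89]
```

The keys 49, 58 and 69 pile up in indexes 0, 1 and 2 after wrapping around: a small instance of primary clustering.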
Linear probing
• Problem: primary clustering (occupied cells form
long runs, and any key hashing into a run makes it longer)
• Expected number of probes using linear probing
is roughly $\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$ for insertions and
unsuccessful searches
• Expected number of probes using linear probing is roughly
$\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$ for successful searches.
• The higher the load factor, the more probes
are required.
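Plugging a few load factors into the two formulas above shows how quickly the cost grows; the function name `linear_probe_costs` is hypothetical.

```python
def linear_probe_costs(lam: float) -> tuple[float, float]:
    """Approximate expected probes for linear probing at load factor lam (0 <= lam < 1)."""
    unsuccessful = 0.5 * (1 + 1 / (1 - lam) ** 2)   # insertions and unsuccessful searches
    successful = 0.5 * (1 + 1 / (1 - lam))          # successful searches
    return unsuccessful, successful

for lam in (0.5, 0.75, 0.9):
    u, s = linear_probe_costs(lam)
    print(f"lambda={lam}: insert/unsuccessful ~ {u:.1f}, successful ~ {s:.1f}")
```

At λ = 0.5 an insertion costs about 2.5 probes, while at λ = 0.9 it costs about 50.5, which is why the load factor is kept at or below 0.5.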
Quadratic probing
• Simply change to f(i) = i²
• There is no guarantee of finding an empty cell
once the table gets more than half full, or
even before the table gets half full if the table
size is not prime.
• Problem: secondary clustering (keys that hash to the
same cell probe the same sequence of alternative cells)
Example
• Insert keys {89, 18, 49, 58, 69} into a hash table
of size 10 using quadratic probing
Solution:
• 49 -> index 0
• 58 -> index 2
• 69 -> index 3
• 18 -> index 8
• 89 -> index 9
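The solution above can be reproduced with the same open-addressing loop as for linear probing, only with f(i) = i². The helper name `probe_insert` is hypothetical, and trying at most table_size probes is a simplification (quadratic probing is not guaranteed to reach every cell).

```python
def probe_insert(keys, table_size, f):
    """Open addressing: the i-th probe for x is (hash(x) + f(i)) mod table_size,
    with hash(x) = x mod table_size. Returns the resulting table."""
    table = [None] * table_size
    for x in keys:
        for i in range(table_size):
            idx = (x % table_size + f(i)) % table_size
            if table[idx] is None:
                table[idx] = x
                break
        else:
            raise RuntimeError(f"no free cell found for {x}")
    return table

# Quadratic probing: f(i) = i * i.
table = probe_insert([89, 18, 49, 58, 69], 10, lambda i: i * i)
print(table)  # [49, None, 58, 69, None, None, None, None, 18, 89]
```

For example, 58 tries index 8, then 8 + 1 = 9, then (8 + 4) mod 10 = 2.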
Double hashing
• f(i) = i · hash2(x)
• Since i is multiplied by the second hash
function, hash2 must be chosen carefully;
in particular, it must never evaluate to
zero.
• hash2(x) = R − (x mod R), with R a prime
smaller than TableSize, works well
Example
• Insert keys {89, 18, 49, 58, 69} into a hash table
of size 10 using double hashing with hash2(x) = 7 − (x mod 7)
Solution:
• 49 -> index 6
• 58 -> index 3
• 69 -> index 0
• 18 -> index 8
• 89 -> index 9
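The indexes above follow from R = 7, the value consistent with this solution. A minimal sketch with a hypothetical helper name:

```python
def double_hash_insert(keys, table_size, r):
    """Double hashing: the i-th probe is (hash(x) + i * hash2(x)) mod table_size,
    with hash(x) = x mod table_size and hash2(x) = r - (x mod r)."""
    table = [None] * table_size
    for x in keys:
        step = r - (x % r)            # always in 1..r, so never zero
        for i in range(table_size):
            idx = (x % table_size + i * step) % table_size
            if table[idx] is None:
                table[idx] = x
                break
        else:
            raise RuntimeError(f"no free cell found for {x}")
    return table

table = double_hash_insert([89, 18, 49, 58, 69], 10, 7)
print(table)  # [69, None, None, 58, None, None, 49, None, 18, 89]
```

Because 10 is not prime, some probe sequences are short: 58 has hash2 = 5, so its probes alternate between indexes 8 and 3. A prime table size avoids this.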
Rehashing
• What if the table gets too full?
• Consider a table of size 7 containing 13, 15,
6, 24
• Now insert 23. Is the table almost full?
• Create a new, larger table. But what size should
it be?
• When to rehash?
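A sketch of one common answer to both questions: rehash once the load factor passes a threshold, into a table whose size is the first prime at least twice the old size. The slide does not fix these choices, so the following are assumptions: linear probing, hash(x) = x mod size, and a threshold of 0.7 (inserting 23 makes the table 5/7 ≈ 0.71 full). The names `RehashingTable` and `next_prime` are hypothetical.

```python
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def next_prime(n: int) -> int:
    """Smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n

class RehashingTable:
    """Linear probing table that rehashes when the load factor exceeds max_load."""

    def __init__(self, size: int = 7, max_load: float = 0.7):
        self.size = size
        self.max_load = max_load
        self.table = [None] * size
        self.count = 0

    def _place(self, key: int) -> None:
        # Linear probing with hash(x) = x mod size; a free cell always exists
        # because the load factor is kept below 1.
        idx = key % self.size
        while self.table[idx] is not None:
            idx = (idx + 1) % self.size
        self.table[idx] = key

    def insert(self, key: int) -> None:
        self._place(key)
        self.count += 1
        if self.count / self.size > self.max_load:
            self.rehash()

    def rehash(self) -> None:
        old_table = self.table
        self.size = next_prime(2 * self.size)   # 7 -> 17
        self.table = [None] * self.size
        for key in old_table:                   # reinsert with the new hash function
            if key is not None:
                self._place(key)

t = RehashingTable(7)
for k in [13, 15, 6, 24]:
    t.insert(k)      # 4/7 ~ 0.57: no rehash yet
t.insert(23)         # 5/7 ~ 0.71 > 0.7: rehash into a table of size 17
print(t.size)        # 17
```

Rehashing costs O(N) for N items, but since it happens only after many insertions, the cost averages out to a constant per insertion.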
Worst-case O(1)
• What if we know the number of items in advance?
• If we are allowed to rearrange items as they
are inserted, then O(1) worst-case cost is
achievable for searches.
Perfect hashing
• O(1) worst-case time for search
• 2-level hashing
• First level: apply a hash function and build a
table, as in separate chaining, where each index
held a linked list
• Now each index stores a (secondary) hash table
instead of a linked list. That is the second
level of hashing.
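A sketch of two-level hashing for a fixed set of keys, following the standard FKS-style construction. The specific choices are assumptions not stated on the slide: bucket i with n_i keys gets a secondary table of size n_i², hash functions are drawn from the universal family ((a·k + b) mod P) mod m, and a function is redrawn until it works. Keys are assumed to be integers in [0, P); `PerfectHashTable` and `make_hash` are hypothetical names.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; keys are assumed to be ints in [0, P)

def make_hash(m: int, rng: random.Random):
    """Draw h(k) = ((a*k + b) mod P) mod m from a universal family."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda k: ((a * k + b) % P) % m

class PerfectHashTable:
    """Static two-level perfect hashing for a fixed set of keys."""

    def __init__(self, keys, seed: int = 0):
        rng = random.Random(seed)
        keys = list(dict.fromkeys(keys))   # drop duplicates (they would always collide)
        self.m = max(1, len(keys))
        # Level 1: hash n keys into n buckets; redraw until the total
        # secondary space (sum of n_i^2) stays linear in n.
        while True:
            self.h1 = make_hash(self.m, rng)
            buckets = [[] for _ in range(self.m)]
            for k in keys:
                buckets[self.h1(k)].append(k)
            if sum(len(b) ** 2 for b in buckets) <= 4 * self.m:
                break
        # Level 2: bucket i gets a table of size n_i^2 and a hash function
        # redrawn until its keys land in distinct slots.
        self.levels = []
        for bucket in buckets:
            size = len(bucket) ** 2
            if size == 0:
                self.levels.append((None, []))
                continue
            while True:
                h2 = make_hash(size, rng)
                slots = [None] * size
                for k in bucket:
                    j = h2(k)
                    if slots[j] is not None:
                        break          # collision: draw a new h2
                    slots[j] = k
                else:
                    break              # no collision: keep this h2
            self.levels.append((h2, slots))

    def contains(self, key) -> bool:
        # Two hash evaluations and one comparison: O(1) in the worst case.
        h2, slots = self.levels[self.h1(key)]
        return bool(slots) and slots[h2(key)] == key

keys = [89, 18, 49, 58, 69, 13, 15, 6, 24, 23]
table = PerfectHashTable(keys)
print(all(table.contains(k) for k in keys))  # True
```

With a table of size n_i², a random function from this family is collision-free with probability at least 1/2, so each redraw loop is expected to run only a couple of times.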
Practice question
• Let H be a hash table where collisions are
handled by separate chaining and
where rehashing is done each time the load
factor (number of items in the table divided by
the size of the table) exceeds 0.5. Assume that the
initial size of H is 2 and that rehashing
doubles the size of the table. After inserting
10 items with different keys, what is the size of
the hash table H?
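One way to check an answer is to simulate the rule directly. This sketch assumes the load factor is checked right after each insertion and the table doubles whenever it exceeds 0.5; the function name `size_after_inserts` is hypothetical.

```python
def size_after_inserts(n_items: int, initial_size: int = 2, max_load: float = 0.5) -> int:
    size = initial_size
    for count in range(1, n_items + 1):
        # After inserting item number `count`, double until the load factor is <= max_load.
        while count / size > max_load:
            size *= 2
    return size

print(size_after_inserts(10))  # 32
```

The size doubles after the 2nd, 3rd, 5th and 9th insertions (2 -> 4 -> 8 -> 16 -> 32).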