CSD203 Hashing
Objectives
• Why Hashing?
• Hash Table
• Hash Functions
• Collision Resolution
• Deletion
• Perfect Hash Functions
• Hash Functions for Extendible Files
• Hash code
• Maps
• Hashing in java.util
07/29/24 Data Structures and Algorithms
Why hashing?
• If the data collection is a sorted array, we can search for an
item in O(log n) time using the binary search
algorithm.
• However, with a sorted array, inserting and deleting
items take O(n) time.
• If the data collection is a balanced binary search tree, then
inserting, searching, and deleting all take O(log n)
time.
• Is there a data structure in which inserting, deleting, and
searching for items are more efficient?
• The answer is “Yes”: a hash table, which supports all three
operations in O(1) time on average.
Figure: Buckets of a hash table of size 11 with entries (1,D), (25,C), (3,F), (14,Z), (6,A), (39,C),
and (7,Q), using a modulo-division hash function.
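The bucket placement described in the caption can be reproduced with a short sketch. The entries and the table size 11 come from the caption. The names `h` and `build_buckets` are illustrative, and storing colliding entries together in a per-bucket list is an assumption, since the caption does not say how a bucket holds more than one entry.

```python
# Entries and table size are taken from the figure caption.
ENTRIES = [(1, "D"), (25, "C"), (3, "F"), (14, "Z"), (6, "A"), (39, "C"), (7, "Q")]
TABLE_SIZE = 11


def h(key, size=TABLE_SIZE):
    """Modulo-division hash function: the remainder of key divided by size."""
    return key % size


def build_buckets(entries, size=TABLE_SIZE):
    """Place each (key, value) entry in the bucket selected by h(key).

    Assumption: entries that collide are kept together in a list.
    """
    buckets = [[] for _ in range(size)]
    for key, value in entries:
        buckets[h(key, size)].append((key, value))
    return buckets


buckets = build_buckets(ENTRIES)
for index, bucket in enumerate(buckets):
    print(index, bucket)
```

Keys 25, 3, and 14 all leave remainder 3 when divided by 11, so they share bucket 3; keys 6 and 39 share bucket 6.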
Deletion
Consider the table in which the keys are stored using linear
probing. Suppose we delete A4 and then try to find B4. When
searching for B4, we hash it to position 4, see that this
position is empty, and conclude that B4 is not in the table
(which is not true).
Figure: Linear search in the situation where both insertion and deletion of keys are permitted.
To avoid this situation, we only mark deleted positions instead of emptying them. When
a new element is inserted into a marked position, we overwrite it with the new
element's information. When there are too many positions marked as deleted,
the table is refreshed (d).
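A minimal sketch of the marking idea, assuming integer keys, a table of size 10, and a modulo-division hash function (none of which are fixed by the slide). The class `LinearProbingTable` and the sentinels `EMPTY` and `DELETED` are hypothetical names. The `refresh` method shows one possible reading of "refreshed", namely reinserting the surviving keys into a clean table; the slide does not say how the refresh is done.

```python
EMPTY = object()    # a cell that has never held a key
DELETED = object()  # marker left behind when a key is deleted


class LinearProbingTable:
    """Open-addressing table with deletion markers (illustrative sketch)."""

    def __init__(self, size=10, hash_function=None):
        self.size = size
        self.cells = [EMPTY] * size
        # Assumed default: modulo division on integer keys.
        self.hash_function = hash_function or (lambda key: key % size)

    def _probe(self, key):
        """Yield cell indices in linear-probing order from the home position."""
        start = self.hash_function(key) % self.size
        for step in range(self.size):
            yield (start + step) % self.size

    def insert(self, key):
        first_free = None
        for i in self._probe(key):
            cell = self.cells[i]
            if cell is EMPTY:
                if first_free is None:
                    first_free = i
                break
            if cell is DELETED:
                if first_free is None:
                    first_free = i  # reusable, but keep looking for the key
            elif cell == key:
                return  # key already stored
        if first_free is None:
            raise RuntimeError("table is full")
        self.cells[first_free] = key

    def contains(self, key):
        for i in self._probe(key):
            cell = self.cells[i]
            if cell is EMPTY:
                return False  # only a never-used cell ends the search
            if cell is not DELETED and cell == key:
                return True
        return False

    def delete(self, key):
        for i in self._probe(key):
            cell = self.cells[i]
            if cell is EMPTY:
                return False
            if cell is not DELETED and cell == key:
                self.cells[i] = DELETED  # mark the position, do not empty it
                return True
        return False

    def refresh(self):
        """Assumed refresh strategy: reinsert the live keys into a clean table."""
        live = [c for c in self.cells if c is not EMPTY and c is not DELETED]
        self.cells = [EMPTY] * self.size
        for key in live:
            self.insert(key)


table = LinearProbingTable(size=10)
table.insert(4)    # home position 4, plays the role of A4
table.insert(14)   # also hashes to 4, lands in cell 5, plays the role of B4
table.delete(4)    # cell 4 is marked, not emptied
found = table.contains(14)  # the search passes over the marker and reaches cell 5
table.insert(24)   # home position 4: the marked cell is reused
```

Because `contains` stops only at a never-used cell, the search for 14 continues past the marked position 4 and finds the key in position 5.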
Perfect Hash Functions
• If a hash function h transforms different keys
into different numbers, it is called a perfect
hash function.
• If a perfect hash function requires only as many cells in the
table as there are data items, so that no empty
cell remains after hashing is completed, it is
called a minimal perfect hash function.
• Cichelli’s method is an algorithm for constructing
a minimal perfect hash function.
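The two definitions above can be checked mechanically for a fixed key set. The sketch below is only a checker, not Cichelli's method; the key set `KEYS` and the function `length_minus_two` are made-up examples chosen so that the difference between "perfect" and "minimal perfect" is visible.

```python
def is_perfect(h, keys):
    """True if h maps every distinct key to a distinct number (no collisions)."""
    keys = set(keys)
    return len({h(k) for k in keys}) == len(keys)


def is_minimal_perfect(h, keys):
    """True if h is perfect and its values fill the cells 0..n-1 exactly."""
    keys = set(keys)
    return {h(k) for k in keys} == set(range(len(keys)))


# Hypothetical key set: four words of lengths 3, 5, 2, and 4.
KEYS = ["cat", "horse", "ox", "bird"]


def length_minus_two(word):
    """Toy hash function that happens to suit KEYS: lengths 2..5 become 0..3."""
    return len(word) - 2


print(is_perfect(len, KEYS))                      # distinct lengths, so no collisions
print(is_minimal_perfect(len, KEYS))              # values 2..5 leave cells 0 and 1 unused
print(is_minimal_perfect(length_minus_two, KEYS)) # values 0..3 fill a table of size 4
```

Here `len` is perfect for `KEYS` but would need a table with unused cells, while `length_minus_two` uses exactly four cells for four keys.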
Hash Functions for Extendible Files
• There are two categories of hashing: Static hashing (the
is a general strategy.
– Unless the keys happen to have some undesirable property
(e.g. all keys end in 0 and we use mod 10).
• If the keys are not integers, the hash function needs more care.
– The first action that a hash function performs is to
convert an arbitrary key k to an integer that is called the
hash code for k; this integer need not be in the range
[0, M−1], and may even be negative. For example, if the
keys are real numbers between 0 and 1 (1 excluded), we might just
multiply by M and round down to an integer to get
an index between 0 and M−1.
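A short sketch of these two steps, hash code first and then reduction into the table range. The helper names `compress`, `index_for`, and `index_for_unit_float` are illustrative, and using Python's built-in `hash` as the hash code is an assumption made for the example. The real-number helper rounds down so that the result always stays below M.

```python
import math


def compress(hash_code, m):
    """Map any integer hash code, including a negative one, into 0..m-1."""
    return hash_code % m  # in Python, % with a positive m never returns a negative value


def index_for(key, m):
    """Hash code first (built-in hash, an assumed choice), then compression."""
    return compress(hash(key), m)


def index_for_unit_float(x, m):
    """Index for a real key with 0 <= x < 1: scale by m and round down."""
    return math.floor(x * m)


# Undesirable key property: every key ends in 0 and the table size is 10.
keys = (10, 20, 30)
print([compress(k, 10) for k in keys])  # every key lands in cell 0
print([compress(k, 11) for k in keys])  # a table size of 11 spreads them out

print(compress(-7, 11))                 # a negative hash code still gets a valid index
print(index_for_unit_float(0.99, 10))   # stays below the table size
print(index_for("abc", 11))             # string key; the value varies between runs
```

The first two prints show why keys ending in 0 are a poor match for mod 10: all of them collide in cell 0, while mod 11 places them in different cells.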
Maps - 1
A map is an abstract data type designed to efficiently store and retrieve
values based upon a uniquely identifying search key for each.
Specifically, a map stores key-value pairs (k,v), which we call entries,
where k is the key and v is its corresponding value. Keys are required to
be unique, so that the association of keys to values defines a mapping.
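As an illustration of the map operations described above, Python's built-in `dict` can be used as a map. The `inventory` data is made up for the example.

```python
inventory = {}                    # an empty map
inventory["apple"] = 3            # add the entry ("apple", 3)
inventory["pear"] = 5             # add the entry ("pear", 5)
inventory["apple"] = 7            # same key: the value is replaced, no duplicate entry

apple_count = inventory["apple"]  # retrieve the value stored under a key
missing = inventory.get("plum")   # None, because no entry has the key "plum"
removed = inventory.pop("pear")   # remove the entry and return its value

print(apple_count, missing, removed, len(inventory))
```

Assigning to an existing key overwrites its value, which is how the map keeps keys unique.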