Lecture 14 Hashing
What is Hashing
• The running time of a search technique depends on the number of
comparisons it makes; e.g., a sequential search of an array A with n
elements needs up to n comparisons
• To increase the efficiency, i.e., to reduce the searching time, we
need to avoid unnecessary comparisons
• Hashing is a technique where we can compute the location of the
desired record in order to retrieve it in a single access (or without
comparison)
• A hash table is a data structure
– that uses a hash function to efficiently translate certain keys
(e.g., person names) into associated values (e.g., their
telephone numbers).
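To make the contrast above concrete, here is a minimal sketch that counts the comparisons made by a sequential search and then retrieves the same key from a table whose slot is computed from the key. The names linear_search and TABLE_SIZE are assumptions made for this illustration, the modulus 91 is borrowed from the division-method example in this lecture, and collisions are ignored here.

```python
def linear_search(items, target):
    """Return (index, comparisons); index is -1 if target is absent."""
    comparisons = 0
    for index, item in enumerate(items):
        comparisons += 1
        if item == target:
            return index, comparisons
    return -1, comparisons


keys = [2103, 6147, 3750]

# Sequential search: the last key costs one comparison per element.
print(linear_search(keys, 3750))  # (2, 3)

# Hashed access: the slot is computed from the key, so no scan is needed.
# Assumption: these three keys do not collide for TABLE_SIZE = 91.
TABLE_SIZE = 91
table = [None] * TABLE_SIZE
for k in keys:
    table[k % TABLE_SIZE] = k

print(table[3750 % TABLE_SIZE])  # 3750
```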
Introduction
• Suppose there is a table of n employee records, where each
record holds an employee name and is identified by a unique
employee code, which is the key to the record
• If the key (or employee code) is used as the array
index, then the record can be accessed by the key
directly
– Ideally the hash function should map each possible key to a different slot index
– but this goal is rarely achievable in practice.
– Most hash table designs assume that hash collisions — pairs of different keys with
the same hash values — are normal occurrences, and accommodate them in some
way.
Applications
• Real-time databases
• Organizing files on the hard disk
• Air traffic control
• Packet routing
• Correct delivery of data in computer networks
Hash Tables
• Hashing is used for storing relatively large amounts
of data in a table called a hash table ADT.
• The hash table usually has a fixed size, H-size, which is
larger than the amount of data that we want to store.
• We define the load factor to be the ratio of the number of
stored items to the size of the hash table.
• The hash function maps an item into an index in the range
0 to H-size-1.
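The load factor defined above is a single division. A minimal sketch follows; the function name load_factor and the sample numbers are assumptions for illustration.

```python
def load_factor(num_items, table_size):
    """Ratio of stored items to the number of slots in the hash table."""
    if table_size <= 0:
        raise ValueError("table_size must be positive")
    return num_items / table_size


# Illustrative numbers: 8 items stored in a table with 13 slots.
print(round(load_factor(8, 13), 3))  # 0.615
```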
[Figure: the key of an item is passed to the hash function, which maps it to an index 0, 1, 2, 3, ..., H-1 in the hash table.]
Hash Tables
• Hashing is a technique used to perform insertions, deletions, and
searches/finds in constant average time.
• To insert or find a data element, we assign it a key and use a
function, called the hash function, to determine the location of the
element within the table.
• Hash tables are arrays of cells with fixed size containing data or keys
corresponding to data.
• For each key, we use the hash function to map the key into some number in
the range 0 to H-size-1.
• Unfortunately, such a function H may not yield distinct values (indices):
two different keys k1 and k2 may yield the same hash address.
• This situation is called a hash collision, which is discussed later.
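A collision is easy to reproduce. The sketch below assumes a division-style hash with table size 91 (the value used in this lecture's division-method example); the second key, 2194, is a made-up value chosen as 2103 + 91 so that both keys land in the same slot.

```python
def h(key, table_size=91):
    """Division-style hash: the remainder of key divided by the table size."""
    return key % table_size


k1 = 2103
k2 = 2194  # hypothetical key, equal to k1 + 91

# Two different keys, one hash address: a collision.
print(h(k1), h(k2))  # 10 10
```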
Hash Function
• The basic idea of a hash function is the
transformation of the key into the corresponding
location in the hash table
• A hash function H can be defined as a function
that takes a key as input and transforms it into a
hash table index
1. Division method
2. Mid Square method
3. Folding method
• In the division method, H(k) = k mod m
• Choose m in such a way that it is greater than 90
• Suppose m = 91. Then for the following employee codes (or
keys k):
    H(2103) = 2103 mod 91 = 10
    H(6147) = 6147 mod 91 = 50
    H(3750) = 3750 mod 91 = 19
• So, given an employee code, the hash function lets us
retrieve the record TABLE[H(k)] directly
[Figure: TABLE with slots 0 to 90; slot 10 holds 2103, slot 19 holds 3750, and slot 50 holds 6147.]
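The computations above can be checked with a short sketch. The names division_hash and TABLE are assumptions for this example, and the table stores only the employee code, standing in for a full record.

```python
M = 91  # table size used in the example above


def division_hash(key, m=M):
    """Division method: the remainder of the key divided by m."""
    return key % m


# Place each employee code in the slot computed from the code itself.
TABLE = [None] * M
for code in (2103, 6147, 3750):
    TABLE[division_hash(code)] = code

for code in (2103, 6147, 3750):
    print(code, "->", division_hash(code))
# 2103 -> 10
# 6147 -> 50
# 3750 -> 19

# Retrieval is a single indexed access.
print(TABLE[division_hash(6147)])  # 6147
```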
A good hash function should:
• Minimize collisions
• Be easy and quick to compute
• Distribute key values evenly in the hash table
• Use all the information provided in the key
Algorithms & Data Structures
For improvement
• Extra milling can also be applied: the even-numbered parts,
k2, k4, ..., are each reversed before the addition
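A minimal sketch of folding with this reversal step follows. The description of plain folding does not appear in this part of the notes, so the sketch assumes the common form: split the decimal digits of the key into parts of part_digits digits from the left, then add the parts. The name fold_hash and its parameters are hypothetical, and reducing the sum to a valid table index is left to the caller.

```python
def fold_hash(key, part_digits=2, reverse_even=False):
    """Split the key's decimal digits into parts and add them.

    With reverse_even=True, the even-numbered parts (k2, k4, ...) are
    reversed before the addition.
    """
    digits = str(key)
    parts = [digits[i:i + part_digits]
             for i in range(0, len(digits), part_digits)]
    total = 0
    for position, part in enumerate(parts, start=1):
        if reverse_even and position % 2 == 0:
            part = part[::-1]  # e.g. "03" becomes "30"
        total += int(part)
    return total


print(fold_hash(2103))                     # 21 + 03 = 24
print(fold_hash(2103, reverse_even=True))  # 21 + 30 = 51
print(fold_hash(6147, reverse_even=True))  # 61 + 74 = 135
```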
• h(k) = k mod 13
• Insert keys: 18 41 22 44 59 32 31 72

Key        18  41  22  44  59  32  31  72
k mod 13    5   2   9   5   7   6   5   7

Resulting table (collisions resolved by linear probing):

Index   0   1   2   3   4   5   6   7   8   9  10  11  12
Key     -   -  41   -   -  18  44  59  32  22  31  72   -
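The final layout above is what linear probing produces for these keys. A minimal sketch that reproduces it is given below; the function name insert_linear is an assumption, and an empty cell is represented by None.

```python
def insert_linear(table, key):
    """Insert key with linear probing; return the slot that was used."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i) % size  # step one cell at a time, wrapping around
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


table = [None] * 13
for key in (18, 41, 22, 44, 59, 32, 31, 72):
    insert_linear(table, key)

print(table)
# [None, None, 41, None, None, 18, 44, 59, 32, 22, 31, 72, None]
```

Note how 31 hashes to slot 5 but ends up in slot 10 after five occupied cells, which is the clustering effect discussed next.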
Clustering
• Sometimes data will cluster: this happens when many
elements hash to the same (or similar) locations and
linear probing has been used often, so occupied cells
form long contiguous runs. We can reduce this problem
by carefully choosing the divisor in our hash function
and the table size.
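The effect of the divisor can be seen with a small experiment. The sketch below uses made-up keys (multiples of 10) and compares a table size of 10 with a prime table size of 13; slot_counts is a hypothetical helper name.

```python
from collections import Counter


def slot_counts(keys, table_size):
    """Count how many keys land in each slot under k mod table_size."""
    return Counter(key % table_size for key in keys)


keys = range(10, 130, 10)  # 10, 20, ..., 120 (illustrative data)

# Divisor 10: every key lands in slot 0.
print(len(slot_counts(keys, 10)))  # 1

# Divisor 13 (prime): the 12 keys land in 12 different slots.
print(len(slot_counts(keys, 13)))  # 12
```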
Problems of Linear Probing
• The majority of the problems are caused by
clustering. These problems can be reduced by
using quadratic probing instead.
Quadratic probing
• Eliminates primary clustering by selecting f(i) = i²
• Quadratic probing runs into trouble when the hash table is
more than half full: a free cell is no longer guaranteed to be found.
• You have to select an appropriate table size that is not
the square of a number.
• We can prove that quadratic probing with a prime table
size and a table that is at least half empty will always find a
location for an element.
• Elements that hash to the same location will probe the
same alternative cells (secondary clustering).
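The role of the table size can be observed directly from the probe sequence. The sketch below lists the cells visited by f(i) = i² for a prime size (13) and for a size that is a perfect square (16); the function name quadratic_probes and the home position 4 are assumptions for illustration. It shows one instance of the behaviour and is not a proof.

```python
def quadratic_probes(home, table_size, count):
    """Cells visited by the first `count` quadratic probes from `home`."""
    return [(home + i * i) % table_size for i in range(count)]


# Prime size 13: the first 7 probes (about half the table) are all distinct.
print(quadratic_probes(4, 13, 7))  # [4, 5, 8, 0, 7, 3, 1]

# Size 16 (a perfect square): the probes revisit the same four cells.
print(quadratic_probes(4, 16, 8))  # [4, 5, 8, 13, 4, 13, 8, 5]
```

Any key whose home position is 4 follows exactly the same sequence, which is the secondary clustering mentioned in the last bullet.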
Quadratic Probing
• Works like linear probing but instead of
stepping to the next position, the next
location is chosen by looking at the positions
that are 1², 2², 3², etc. positions ahead of the home position.
Quadratic Probing
• Consider the data with keys 24, 42, 34, 62, 73 inserted
into a table of size 10. These entries can be
placed into the table at the following
locations:

Key      24  42  34  62  73
H(key)    4   2   4   2   3

Index   0   1   2   3   4   5   6   7   8   9
Key     -   -  42  62  24  34   -  73   -   -
Quadratic Probing
• 24 % 10 = 4. Position is free. 24 placed into element 4
• 42 % 10 = 2. Position is free. 42 placed into element 2
• 34 % 10 = 4. Position is occupied. Try the place 1² away in the
table (5). 34 placed into position 5.
• 62 % 10 = 2. Position is occupied. Try the place 1² away in the
table (3). 62 placed into position 3.
• 73 % 10 = 3. Position is occupied. Try the place 1² away in the
table (4). Same problem. Try the place 2² away in the table (7). 73
is placed into position 7.
– Thus, we jumped over the existing cluster.
• This doesn't completely solve our problem, but it helps.
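The walkthrough above can be reproduced with a short sketch. The function name insert_quadratic is an assumption, and an empty cell is represented by None. Note that a table size of 10 is used only to match the example; it is not prime, so the guarantee about always finding a free cell does not apply to it.

```python
def insert_quadratic(table, key):
    """Insert key using probes at home + i*i; return the slot used."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i * i) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 10
for key in (24, 42, 34, 62, 73):
    insert_quadratic(table, key)

print(table)
# [None, None, 42, 62, 24, 34, None, 73, None, None]
```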
Double Hashing
• Use two hash functions h(key) and hp(key)
• hi(key) = [h(key) + i * hp(key)] % TableSize, where i = 0, 1, 2, ... is the probe number
(a) using double hashing with the first hash function: h(key) = key % 13 and the second hash
function: hp(key) = 1 + key % 12
(b) using double hashing with the first hash function: h(key) = key % 13 and the second hash
function: hp(key) = 7 - key % 7
Show all computations.
Double Hashing (cont’d)
h0(64) = 64%13 = 12
h0(47) = 47%13 = 8
h0(96) = 96%13 = 5 (collision)
hp(96) = 1 + 96%12 = 1
h1(96) = (5 + 1*1)%13 = 6 (collision)
h2(96) = (5 + 2*1)%13 = 7
h0(36) = 36%13 = 10
h0(70) = 70%13 = 5 (collision)
hp(70) = 1 + 70%12 = 11
h1(70) = (5 + 1*11)%13 = 3
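The probe positions computed above can be generated with a short sketch. It only produces the probe sequence for a key and does not model which cells are occupied, since the full key list is not shown in this part of the notes. The function names hp_a, hp_b, and probe are assumptions; hp_a and hp_b correspond to the second hash functions of parts (a) and (b).

```python
TABLE_SIZE = 13


def h(key):
    """First hash function: key % 13."""
    return key % TABLE_SIZE


def hp_a(key):
    """Second hash function from part (a): 1 + key % 12."""
    return 1 + key % 12


def hp_b(key):
    """Second hash function from part (b): 7 - key % 7."""
    return 7 - key % 7


def probe(key, i, hp):
    """Position of the i-th probe: (h(key) + i * hp(key)) % TABLE_SIZE."""
    return (h(key) + i * hp(key)) % TABLE_SIZE


print([probe(96, i, hp_a) for i in range(3)])  # [5, 6, 7]
print([probe(70, i, hp_a) for i in range(2)])  # [5, 3]
```

Passing hp_b instead of hp_a gives the probe positions for part (b).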
Open Hashing
• For search, we use the hash function to determine
which linked list holds the element, and then traverse
the linked list to find the element.
• Deletion is performed on the appropriate
linked list after we find the element to be deleted.
• Instead of linked lists, we could use another data structure,
such as a tree or another hash table, for each cell in the
hash table to resolve collisions.
• The main advantage of this method is the fact that it
can handle any amount of data (dynamic expansion).
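The search and deletion steps described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name ChainedHashTable is hypothetical, Python lists stand in for the linked lists, and keys are assumed to be integers hashed with key % size.

```python
class ChainedHashTable:
    """Open hashing sketch: each cell holds a chain of (key, value) pairs."""

    def __init__(self, size=13):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # The hash function selects which chain holds the key.
        return self.buckets[key % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key already present: update it
                return
        bucket.append((key, value))

    def search(self, key):
        # Traverse only the chain selected by the hash function.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


table = ChainedHashTable()
for key in (18, 44, 31):  # all three keys hash to cell 5
    table.insert(key, "record-%d" % key)

print(len(table.buckets[5]))  # 3
print(table.search(44))       # record-44
print(table.delete(44))       # True
print(table.search(44))       # None
```

Because each chain can grow as needed, the table accepts more items than it has cells, at the cost of longer traversals.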
Perfect Hashing
• If all of the keys that will be used are known ahead of time,
and there are no more keys than can fit the hash table, a
perfect hash function can be used to create a perfect hash
table, in which there will be no collisions.
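As a small illustration of the idea, the sketch below searches by brute force for a table size m at which k mod m is collision-free for a fixed, known set of integer keys. This is only one simple way to obtain a perfect hash function for a tiny key set, not how perfect hashing is constructed in practice; the function name find_collision_free_modulus is hypothetical, and the keys are the employee codes used earlier in this lecture.

```python
def find_collision_free_modulus(keys):
    """Smallest m >= len(keys) such that k % m is distinct for every key."""
    keys = list(keys)
    if len(set(keys)) != len(keys):
        raise ValueError("keys must be distinct")
    m = max(len(keys), 1)
    while True:
        if len({k % m for k in keys}) == len(keys):
            return m
        m += 1  # terminates: a large enough m separates all distinct keys


keys = [2103, 6147, 3750]
m = find_collision_free_modulus(keys)
print(m)                      # 5
print([k % m for k in keys])  # [3, 2, 0]
```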
Summary
• Hash table: an array
• Hash function: a function that maps a key into a
number in the range [0, size of hash table)
• Collision resolution
– Open hashing: separate chaining
– Closed hashing (open addressing): linear probing,
quadratic probing, double hashing
Summary
• Advantage
– Constant Running time + time to resolve Collision
• Disadvantage
– Difficult (not efficient) to print all elements in hash
table
– Inefficient to find minimum element or maximum
element
– Not growable (for closed hash/open addressing)
– Waste some space
Conclusions