Unit 3 2nd Half 2024
Hashing:
Hashing is a technique used to store, search for and retrieve data in a data structure called a
hash table.
It overcomes the drawbacks of linear search (which compares the key against every element) and binary search (which requires a sorted list).
It involves two important concepts:
1. Hash Table 2. Hash Function
1. Hash table
A hash table is a data structure used to store and retrieve data (keys) very quickly.
It is an array of some fixed size that contains the keys. The indices of a hash table run from 0 to Tablesize – 1.
Each key is mapped to some number in the range 0 to Tablesize – 1. This mapping is called the hash
function.
Data is inserted into the hash table at the position given by the hash key, the value returned by the hash function.
Using the same hash key, the data can later be retrieved from the hash table with one or more key
comparisons.
The load factor of a hash table is calculated using the formula:
Load factor = (Number of data elements in the hash table) / (Size of the hash table)
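The load factor formula can be written as a one-line helper. This is only a sketch: the function name load_factor and its parameter names are chosen here for illustration and do not come from the notes.

```c
#include <stddef.h>

/* Load factor = (number of stored elements) / (table size).
   Names are illustrative assumptions, not part of the notes. */
double load_factor(size_t num_elements, size_t table_size)
{
    if (table_size == 0)
        return 0.0;   /* guard against division by zero */
    return (double)num_elements / (double)table_size;
}
```

For instance, a table of size 10 that holds 5 keys has a load factor of 0.5.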
Factors affecting Hash Table Design
1. Hash function 2. Table size 3. Collision handling scheme
[Figure: Simple hash table with table size = 10; the cells are indexed 0 to 9]
2. Hash function:
It is a function that maps each key to a cell of the hash table. A good hash function distributes the keys evenly among the cells.
The same hash function is used both to store data in the hash table and to retrieve it.
The hash function is what the hash table is built on.
The integer value returned by the hash function is called the hash key.
If the input keys are integers, a commonly used hash function is
H ( key ) = key % Tablesize
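A minimal sketch of this modulo hash function in C is shown below. The name hash_mod is an assumption made for illustration. The adjustment for negative keys is also an addition: in C, the % operator can return a negative remainder, which would not be a valid index.

```c
/* Division-method hash: maps an integer key to an index in 0..table_size-1.
   Assumes table_size > 0. hash_mod is an illustrative name. */
int hash_mod(int key, int table_size)
{
    int h = key % table_size;
    /* C's % may yield a negative result for a negative key; fold it back. */
    return (h < 0) ? h + table_size : h;
}
```

With a table of size 10, the key 44 maps to index 4 and the key 89 maps to index 9.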
Krishna / Unit 5 / Data Structures 2
For example, in the mid-square method, if the key is 44, we first compute 44^2 = 1,936.
Extract the middle two digits, 93, from the result and store the key 44 at index 93.
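The squaring example can be sketched in C as follows. This is a hedged illustration only: the name mid_square is invented here, the table is assumed to have 100 cells (indices 0 to 99), and for a square with an odd number of digits the choice of "middle" digits is a convention picked for this sketch.

```c
/* Returns two middle digits of key * key as an index in 0..99.
   Sketch only: assumes key * key fits in an unsigned long. */
int mid_square(unsigned long key)
{
    unsigned long sq = key * key;
    int digits = 0;
    unsigned long t;

    for (t = sq; t > 0; t /= 10)   /* count the digits of the square */
        digits++;

    /* Remove low-order digits until the middle pair is at the right end. */
    int drop = (digits > 2) ? (digits - 2) / 2 : 0;
    while (drop-- > 0)
        sq /= 10;

    return (int)(sq % 100);
}
```

For the key 44 the square is 1936, one low-order digit is dropped to give 193, and the result is 93.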
4. Extraction:
In this method, some digits are extracted from the key to form the address in the hash table.
Example: Suppose the first, third and fourth digits from the left are selected as the hash key for the key 497824.
4 9 7 8 2 4
The selected digits give 478, so the key is stored at location 478 in a hash table of size 1000.
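The extraction example can be sketched with integer arithmetic. The function name extract_hash and the restriction to six-digit keys are assumptions made for this illustration.

```c
/* Builds an index from the 1st, 3rd and 4th digits (counted from the left)
   of a six-digit key. Returns -1 if the key does not have six digits. */
int extract_hash(long key)
{
    if (key < 100000L || key > 999999L)
        return -1;

    int d1 = (int)(key / 100000 % 10);   /* first digit from the left */
    int d3 = (int)(key / 1000 % 10);     /* third digit from the left */
    int d4 = (int)(key / 100 % 10);      /* fourth digit from the left */

    return d1 * 100 + d3 * 10 + d4;      /* index in 0..999 */
}
```

For the key 497824 the three selected digits are 4, 7 and 8, giving index 478.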
Collision:
If two or more keys hash to the same index, the corresponding records cannot be stored in the same location.
This condition is known as a collision.
Characteristics of a Good Hash Function:
It should be simple to compute.
The number of collisions should be small when records are placed in the hash table.
A hash function that produces no collisions is called a perfect hash function.
It should produce hash keys that are distributed uniformly over the hash table.
It should depend on every bit of the key. Thus a hash function that simply extracts
a portion of the key is not suitable.
1. Separate Chaining:
If two keys map to the same hash value, the elements are chained together in a list.
Initial configuration of the hash table with separate chaining.
Here a singly linked list (SLL) is used to chain the elements.
Example:
Insert the following eight keys 10, 11, 81, 10, 7, 34, 94, 17 into a hash table of size 10 using separate chaining.
The hash function is H(key) = key % 10
Insertion:
To insert an element, traverse the appropriate list to check whether the element is
already present.
If the element is new, it is inserted either at the front of the list or at the end of the list.
If it is a duplicate, an extra count field kept with the element is incremented.
Insert 10:
Hash (k) = k% Tablesize
Hash (10) = 10 % 10
Hash (10) = 0
Insert 11:
Hash (11) = 11 % 10
Hash (11) = 1
Insert 81:
Hash (81) = 81% 10
Hash (81) = 1
The key 81 collides with 11, which already has the hash value 1. To place 81 at this position, do the
following:
Traverse the list to check whether 81 is already present.
Since it is not already present, insert it at the end of the list. The rest of the elements are inserted in the same way.
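The insertion steps for separate chaining can be sketched in C as below. This is an illustrative sketch, not the notes' own routine: the names Node, chain_table and chain_insert are assumptions, keys are assumed to be non-negative, new keys are appended at the end of the list, and a duplicate only increments a count field.

```c
#include <stdlib.h>

#define TABLE_SIZE 10

typedef struct Node {
    int key;
    int count;             /* how many times this key has been inserted */
    struct Node *next;
} Node;

/* One list head per cell; global storage starts out as all NULL. */
Node *chain_table[TABLE_SIZE];

/* Inserts key; returns 0 on success, -1 if memory allocation fails. */
int chain_insert(int key)
{
    int idx = key % TABLE_SIZE;          /* assumes key >= 0 */
    Node **link = &chain_table[idx];

    /* Traverse the list, looking for a duplicate. */
    while (*link != NULL) {
        if ((*link)->key == key) {
            (*link)->count++;
            return 0;
        }
        link = &(*link)->next;
    }

    /* Not found: append a new node at the end of the list. */
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = key;
    n->count = 1;
    n->next = NULL;
    *link = n;
    return 0;
}
```

Inserting 10, 11, 81, 10, 7, 34, 94, 17 in that order leaves 10 (count 2) in list 0, 11 followed by 81 in list 1, 34 followed by 94 in list 4, and 7 followed by 17 in list 7.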
Advantages:
1. More elements can be inserted than there are cells, because each cell holds a linked list.
Disadvantages:
1. It requires pointers, which occupy additional memory.
2. Searching takes time, because the hash function must be evaluated and the list must then be traversed.
2. Open Addressing:
It is also called closed hashing.
It is a collision resolution technique.
It uses Hi(X) = (Hash(X) + F(i)) mod Tablesize, with F(0) = 0.
When a collision occurs, alternative cells are tried until an empty cell is found.
Types:-
1. Linear Probing
2. Quadratic Probing
3. Double Hashing
Hash function: H (key) = key % table size.
Insert Operation (separate chaining):
To insert a key, use the hash function to identify the list into which the element should be inserted.
Then traverse the list to check whether the element is already present.
If it exists, increment its count.
Otherwise, place the new element at the front of the list.
1. Linear Probing:
It is the easiest method of handling collisions.
Apply the hash function H(key) = key % Tablesize
Hi(X) = (Hash(X) + F(i)) mod Tablesize, where F(i) = i.
How to probe:
First probe: given a key k, hash to H(key).
Second probe: if H(key) is occupied, try H(key) + f(1).
Third probe: if H(key) + f(1) is occupied, try H(key) + f(2), and
so forth.
Probing Properties:
We force f(0) = 0.
The i-th probe is to (H(key) + f(i)) % Tablesize.
If i reaches Tablesize – 1 without finding an empty cell, the probe has failed. Depending on f(i), the probe may fail sooner. Long probe
sequences are costly.
Probe Sequence is:
H(key) % Tablesize
(H(key) + 1) % Tablesize
(H(key) + 2) % Tablesize
1. H(key) = key mod Tablesize
This is the basic formula, applied first in every case.
If a collision occurs, use formula 2.
2. Hi(key) = (H(key) + i) mod Tablesize
where i = 1, 2, 3, ... etc.
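Example 1: Insert 89, 18, 49, 58, 69; Tablesize = 10
1. H(89) = 89 % 10 = 9
2. H(18) = 18 % 10 = 8
3. H(49) = 49 % 10 = 9 (collides with 89, so try the next free cell using formula 2)
i=1 h1(49) = (H(49) + 1) % 10
= (9 + 1) % 10
= 10 % 10
= 0
4. H(58) = 58 % 10 = 8 (collides with 18)
i=1 h1(58) = (H(58) + 1) % 10
= (8 + 1) % 10
= 9 % 10
= 9 => collision again
i=2 h2(58) = (H(58) + 2) % 10
= (8 + 2) % 10
= 10 % 10
= 0 => collision again
i=3 h3(58) = (H(58) + 3) % 10
= (8 + 3) % 10
= 11 % 10
= 1
5. H(69) = 69 % 10 = 9 (collides with 89)
i=1 h1(69) = (9 + 1) % 10 = 0 => collision again
i=2 h2(69) = (9 + 2) % 10 = 1 => collision again
i=3 h3(69) = (9 + 3) % 10 = 2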
Index  Empty  After 89  After 18  After 49  After 58  After 69
0                                 49        49        49
1                                           58        58
2                                                     69
3
4
5
6
7
8                       18        18        18        18
9             89        89        89        89        89
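Linear probing as traced in the table above can be sketched in C. The names lp_init and lp_insert, the EMPTY sentinel, and the assumption that all keys are non-negative are introduced here for illustration only. Duplicate keys are not detected in this sketch.

```c
#define TABLE_SIZE 10
#define EMPTY (-1)    /* sentinel; assumes every real key is >= 0 */

/* Marks every cell as empty. */
void lp_init(int table[TABLE_SIZE])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Inserts key using linear probing, F(i) = i.
   Returns the index used, or -1 if every cell is occupied. */
int lp_insert(int table[TABLE_SIZE], int key)
{
    int home = key % TABLE_SIZE;

    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (home + i) % TABLE_SIZE;   /* i-th probe, with wrap-around */
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

Inserting 89, 18, 49, 58, 69 in that order into an initialised table places them at indices 9, 8, 0, 1 and 2.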
2. Quadratic Probing
Quadratic probing can be used to resolve the primary clustering problem of linear probing. With quadratic probing,
rather than always moving one spot, move i^2 spots from the point of collision, where i is the number of
attempts made to resolve the collision.
It is another collision resolution method, one that distributes items more evenly.
From the original index H, if the slot is filled, try cells H + 1^2, H + 2^2, H + 3^2, ..., H + i^2 with wrap-around.
Hi(X) = (Hash(X) + F(i)) mod Tablesize, F(i) = i^2
Hi(X) = (Hash(X) + i^2) mod Tablesize
Example: Insert 18, 89, 21, 58, 68
Limitation: at most half of the table can be used as alternative locations to resolve collisions.
This means that once the table is more than half full, it is difficult to find an empty spot. A second
problem, known as secondary clustering, arises because elements that hash to the same hash key always
probe the same alternative cells.
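A short C sketch of quadratic probing follows. The names qp_init and qp_insert and the EMPTY sentinel are illustrative assumptions. The notes do not state a table size for the example keys 18, 89, 21, 58, 68, so a size of 10 is assumed here.

```c
#define TABLE_SIZE 10  /* assumed size; not stated for the example keys */
#define EMPTY (-1)     /* sentinel; assumes every real key is >= 0 */

/* Marks every cell as empty. */
void qp_init(int table[TABLE_SIZE])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Inserts key using quadratic probing, F(i) = i^2.
   Returns the index used, or -1 if TABLE_SIZE probes all hit occupied
   cells. A result of -1 does not prove that the table is full. */
int qp_insert(int table[TABLE_SIZE], int key)
{
    int home = key % TABLE_SIZE;

    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (home + i * i) % TABLE_SIZE;   /* i-th probe */
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

Under the size-10 assumption, inserting 18, 89, 21, 58, 68 in that order places them at indices 8, 9, 1, 2 and 7: the key 58 collides at 8 and 9 before landing on 8 + 4 = 12 mod 10 = 2, and 68 collides at 8, 9 and 2 before landing on 8 + 9 = 17 mod 10 = 7.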
3. Double Hashing
Double hashing applies a second hash function to the key when a collision
occurs. The result of the second hash function is the number of positions to move from the point of
collision on each probe.
There are a couple of requirements for the second function:
It must never evaluate to 0.
It must make sure that all cells can be probed.
Hi(X) = (Hash(X) + i * Hash2(X)) mod Tablesize
A popular second hash function is Hash2(key) = R - (key % R), where R is a prime number that is
smaller than the size of the table.
Example 2: Given the input {4371, 1323, 6173, 4199, 4344, 9679, 1989} and the hash function h(x) = x % 10,
show the resulting hash table for each of the following:
(a). separate chaining hash table (Note -- pseudo collisions are added to the end of the list.)
0 /
1 --> 4371
2 /
3 --> 1323 --> 6173
4 --> 4344
5 /
6 /
7 /
8 /
9 --> 4199 --> 9679 --> 1989
(b). closed hash table using linear probing
0 9679 x % 10 = 9 insert(9+1)
1 4371 x % 10 = 1
2 1989 x % 10 = 9 (collision at 9+1, 9+2) insert(9+3)
3 1323 x % 10 = 3
4 6173 x % 10 = 3 insert(3+1)
5 4344 x % 10 = 4 insert(4+1)
6 /
7 /
8 /
9 4199 x % 10 = 9
(c). closed hash table using quadratic probing
0 9679 x % 10 = 9 insert(9+1)
1 4371 x % 10 = 1
2 /
3 1323 x % 10 = 3
4 6173 x % 10 = 3 insert(3+1)
5 4344 x % 10 = 4 insert(4+1)
6 /
7 /
8 1989 x % 10 = 9 (collision at 9+1, 9+4) insert(9+9)
9 4199 x % 10 = 9
Example 3: Given the input {43, 160, 61, 44, 67, 94, 37}, a table of size 10 and the hash functions
h1(x) = (x / 10) % 10 (integer division) and h2(x) = 7 - (x % 7), use double hashing to show the resulting hash table.
0 /
1 /
2 67 [h1(67) = 67/10 % 10 = 6] [h2(67) = 7-(67%7) = 7-4 = 3]
Try: [6 + 1*3 = 9]
[6 + 2*3 = 12 ==> 2]
3 94 [h1(94) = 94/10 % 10 = 9] [h2(94) = 7-(94%7) = 7-3 = 4]
Try: [9 + 1*4 = 13 ==> 3]
4 43 [h1(43) = 43/10 % 10 = 4]
5 /
6 160 [h1(160) = 160/10 % 10 = 16 ==> 6]
7 /
8 61 [h1(61) = 61/10 % 10 = 6] [h2(61) = 7-(61%7) = 7-5 = 2]
Try: [6 + 1*2 = 8]
9 44 [h1(44) = 44/10 % 10 = 4] [h2(44) = 7-(44%7) = 7-2 = 5]
Try: [4 + 1*5 = 9]
Problem with 37:
[h1(37) = 37/10 % 10 = 3] [h2(37) = 7-(37%7) = 7-2 = 5]
Try: [3+1*5= 8]
[3+2*5=13==>3]
[3+3*5=18==>8]
[3+4*5=23==>3]
[3+5*5=28==>8]
[3+6*5=33==>3]
...
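This is a problem: the probe sequence cycles between 3 and 8, both of which are already occupied, so 37 can never be inserted. It shows the problem with a non-prime hash table size: here the step 5 and the table size 10 share the factor 5.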
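Example 3 can be replayed with a small C sketch of double hashing. The functions h1 and h2 follow the example. The names dh_init and dh_insert, the EMPTY sentinel and the cap of TABLE_SIZE probes are assumptions made for this illustration.

```c
#define TABLE_SIZE 10
#define EMPTY (-1)    /* sentinel; assumes every real key is >= 0 */

int h1(int x) { return (x / 10) % TABLE_SIZE; }   /* integer division */
int h2(int x) { return 7 - (x % 7); }             /* 1..7 for x >= 0, never 0 */

/* Marks every cell as empty. */
void dh_init(int table[TABLE_SIZE])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Inserts key using double hashing: the i-th probe is h1 + i * h2.
   Returns the index used, or -1 if TABLE_SIZE probes all hit occupied
   cells (the probe sequence may cycle without covering the table). */
int dh_insert(int table[TABLE_SIZE], int key)
{
    int home = h1(key);
    int step = h2(key);

    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (home + i * step) % TABLE_SIZE;
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

Inserting 43, 160, 61, 44, 67, 94 in that order places them at indices 4, 6, 8, 9, 2 and 3. Inserting 37 afterwards returns -1, because its probes only ever visit cells 3 and 8.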
Rehashing:
Once the hash table gets too full, operations start to take too long and insertions may fail.
To solve this problem, a table at least twice the size of the original is built and the elements are
transferred to it by re-inserting each one with the hash function for the new table size.
The question then becomes: when should rehashing be applied?
With quadratic probing, rehashing can be triggered in several ways:
Rehash as soon as the table is half full.
Rehash only when an insertion fails.
Rehash when the table reaches a certain load factor.
Advantages:
The programmer does not have to worry about the table size.
It is simple to implement.
It can be used in other data structures as well.
Routine:
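/* Assumes HashTable, cell, InitializeTable, Insert and Legitimate are
   defined elsewhere, and that free is declared by <stdlib.h>. */
HashTable Rehash (HashTable H)
{
    int i, oldsize;
    cell *oldcells;

    /* Remember the old array and its size. */
    oldcells = H->Thecells;
    oldsize = H->Tablesize;

    /* Build a new, empty table of twice the size. */
    H = InitializeTable (2 * oldsize);

    /* Re-insert every occupied cell into the new table. */
    for (i = 0; i < oldsize; i++)
        if (oldcells[i].Info == Legitimate)
            Insert (oldcells[i].Element, H);

    free (oldcells);
    return H;
}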
Example:
Suppose the elements 13, 15, 24 and 6 are inserted into an open addressing hash table of size 7 with hash
function h(X) = X mod 7. If linear probing is used to resolve collisions, the resulting hash table
appears as follows:
13 % 7 = 6, 15 % 7 = 1, 24 % 7 = 3 and 6 % 7 = 6 (collides with 13, so linear probing places 6 at index 0)
0 6
1 15
2 /
3 24
4 /
5 /
6 13
Figure: Open addressing hash table with linear probing after inserting 13, 15, 24, 6
If 23 is inserted into the table, the resulting table will be over 70 percent full (5 of 7 cells). 23 % 7 = 2
0 6
1 15
2 23
3 24
4 /
5 /
6 13
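The size-7 table above, followed by a rehash, can be reproduced with the self-contained sketch below. It is a hedged illustration rather than the routine from these notes: the Table type and the names table_create, table_insert and table_rehash are invented here, linear probing is assumed, keys are assumed to be non-negative, and the new size is simply twice the old one (14), which is not prime.

```c
#include <stdlib.h>

#define EMPTY (-1)    /* sentinel; assumes every real key is >= 0 */

typedef struct {
    int *cells;
    int size;
    int count;        /* number of stored keys */
} Table;

/* Allocates an empty table; returns NULL if allocation fails. */
Table *table_create(int size)
{
    Table *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->cells = malloc((size_t)size * sizeof *t->cells);
    if (t->cells == NULL) {
        free(t);
        return NULL;
    }
    for (int i = 0; i < size; i++)
        t->cells[i] = EMPTY;
    t->size = size;
    t->count = 0;
    return t;
}

/* Linear-probing insert; returns the index used, or -1 if the table is full. */
int table_insert(Table *t, int key)
{
    for (int i = 0; i < t->size; i++) {
        int idx = (key % t->size + i) % t->size;
        if (t->cells[idx] == EMPTY) {
            t->cells[idx] = key;
            t->count++;
            return idx;
        }
    }
    return -1;
}

/* Builds a table of twice the size, re-inserts every key and frees the
   old table. Returns NULL (old table left untouched) if allocation fails. */
Table *table_rehash(Table *old)
{
    Table *bigger = table_create(2 * old->size);
    if (bigger == NULL)
        return NULL;
    for (int i = 0; i < old->size; i++)
        if (old->cells[i] != EMPTY)
            table_insert(bigger, old->cells[i]);
    free(old->cells);
    free(old);
    return bigger;
}
```

Starting from a table of size 7, inserting 13, 15, 24, 6 and 23 fills cells 6, 1, 3, 0 and 2. After table_rehash, the same keys sit in a table of size 14 at indices 13, 1, 10, 6 and 9 respectively, each at its own home position key % 14.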