Hashing Methods (1)
Hashing
Presented by Pr. Nabil KESKES
year 2023-2024.
PLAN
Introduction.
Hash Functions
Collision
Conclusion
1. Introduction
A hash table uses an array as its storage medium and applies a hash function to a key to
generate the index at which an element is inserted or looked up.
2. Types of Hash Functions in Data Structures
A hash function maps a large number or string to a small integer that can be
used as the index in the hash table.
A good hash function should have the following characteristics:
It should be deterministic. This means that a given input should always produce the
same output.
It should be collision free as far as possible.
1. Division Method:
This is the simplest method for generating a hash value. The hash function
divides the key k by M and uses the remainder as the hash value.
Formula:
h(k) = k mod M
Here,
k is the key value, and
M is the size of the hash table.
Example:
k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
k = 1276
M = 11
h(1276) = 1276 mod 11
=0
Pros:
This method works with any value of M.
The division method is very fast since it requires only a single division operation.
Cons:
This method can lead to poor performance, since consecutive keys map to consecutive hash
values in the hash table.
Extra care is sometimes needed when choosing the value of M.
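The division method of this section can be sketched in Python as follows. This is a minimal illustration only; the function name division_hash is an assumption made for the example. It reproduces the two worked examples and shows how consecutive keys land in consecutive slots.

```python
def division_hash(k: int, m: int) -> int:
    """Division-method hash: the remainder of key k divided by table size m."""
    if m <= 0:
        raise ValueError("table size m must be positive")
    return k % m


# The two worked examples from the slides.
print(division_hash(12345, 95))  # 90
print(division_hash(1276, 11))   # 0

# Consecutive keys map to consecutive hash values (the drawback listed under Cons).
print([division_hash(k, 11) for k in range(100, 105)])  # [1, 2, 3, 4, 5]
```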
2. Mid Square Method
The steps involved in computing this hash method are:
Square the key value k, i.e. compute k x k.
Extract the middle r digits of the result as the hash value.
Example:
Suppose the hash table has 100 memory locations. So r = 2 because two digits
are required to map the key to the memory location.
k = 60
k x k = 60 x 60
= 3600
h(60) = 60 (the middle two digits of 3600)
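A possible Python sketch of the mid-square computation is given here. The function name mid_square_hash is hypothetical, and the way the "middle" is chosen when the square has an odd number of surplus digits (rounding the start position down) is an assumption; the slides do not specify it.

```python
def mid_square_hash(k: int, r: int) -> int:
    """Square the key and return the middle r digits of the square."""
    squared = str(k * k).zfill(r)      # pad with zeros if the square is shorter than r digits
    start = (len(squared) - r) // 2    # assumption: round down when the middle is ambiguous
    return int(squared[start:start + r])


# Example from the slides: 60 x 60 = 3600, middle two digits are 60.
print(mid_square_hash(60, 2))  # 60
```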
3. Digit Folding Method:
This method involves two steps:
Divide the key value k into a number of parts k1, k2, k3, ..., kn, where each part has
the same number of digits except for the last part, which can have fewer digits than the other
parts.
Add the individual parts. The hash value is obtained by ignoring the last carry, if any.
Formula:
k = k1, k2, k3, k4, ..., kn
s = k1 + k2 + k3 + k4 + ... + kn
h(k) = s
Here,
s is obtained by adding the parts of the key k
Example:
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(k) = 51
Note:
The number of digits in each part depends on the size of the hash table.
For example, if the size of the hash table is 100, then each part must have two
digits, except for the last part, which can have fewer digits.
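The folding steps can be sketched in Python as shown here. The function name folding_hash is hypothetical, and "ignoring the last carry" is interpreted as keeping only the lowest part_digits digits of the sum, which is an assumption about the intended meaning.

```python
def folding_hash(k: int, part_digits: int) -> int:
    """Split the key into groups of part_digits digits, add them, drop any carry."""
    digits = str(k)
    # Parts are taken from the left; only the last part may be shorter.
    parts = [int(digits[i:i + part_digits])
             for i in range(0, len(digits), part_digits)]
    total = sum(parts)
    # Assumption: dropping the carry means keeping the lowest part_digits digits.
    return total % (10 ** part_digits)


# Example from the slides: 12 + 34 + 5 = 51.
print(folding_hash(12345, 2))  # 51
```

With a key such as 9999 and two-digit parts, the sum 99 + 99 = 198 produces a carry, and under the assumption above the hash value is 98.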
Hash Collision
A hash collision happens when the same hash value is produced for two different input
values by a hash algorithm. But it's important to point out that collisions aren't a
Linear Probing
Linear probing resolves a collision by scanning the hash table sequentially, starting from the
slot given by the hash function. If that slot is already occupied, the next slot is examined, and
so on. The interval between the probes is fixed (generally, to a value of 1).
Here hash(n) is the value computed by the hash function for key n, and T is the table size.
If the slot at index (hash(n) % T) is full, the next slot index is calculated by adding 1:
((hash(n) + 1) % T).
The probe sequence is:
hash(n) % T
(hash(n) + 1) % T
(hash(n) + 2) % T
(hash(n) + 3) % T ... and so on.
Example -
SL. NO   KEY   HASH     INDEX   INDEX (AFTER LINEAR PROBING)
1        3     3%20     3       3
2        2     2%20     2       2
3        46    46%20    6       6
4        6     6%20     6       7
5        11    11%20    11      11
6        13    13%20    13      13
7        53    53%20    13      14
8        12    12%20    12      12
9        70    70%20    10      10
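The example can be replayed with a short Python sketch. It assumes that hash(n) is simply the key itself (as the HASH column suggests) and that empty slots are marked with None; the function name linear_probe_insert is hypothetical.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key using linear probing (step 1) and return the slot index used."""
    size = len(table)
    home = key % size                  # assumption: hash(n) = n, so index = n % T
    for i in range(size):
        idx = (home + i) % size        # probe home, home + 1, home + 2, ...
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("hash table is full")


table = [None] * 20
keys = [3, 2, 46, 6, 11, 13, 53, 12, 70]
placed = [linear_probe_insert(table, k) for k in keys]
print(placed)  # [3, 2, 6, 7, 11, 13, 14, 12, 10]
```

Keys 6 and 53 collide with 46 and 13 respectively, so each moves one slot further, matching the last column of the table.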
Double Hashing
Double hashing is a collision resolution technique used in hash tables. It works by using
two hash functions to compute two different hash values for a given key. The first hash
function is used to compute the initial hash value, and the second hash function is used
to compute the step size for the probing sequence.
Example -
KEY   k mod 5        3 - (k mod 3)
4     4 mod 5 = 4
9     9 mod 5 = 4    3 - (9 mod 3) = 3
14    14 mod 5 = 4   3 - (14 mod 3) = 1
1     1 mod 5 = 1
19    19 mod 5 = 4   3 - (19 mod 3) = 2

Resulting table:
INDEX   0    1    2    3    4
KEY     14   1    9    19   4
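The example can be reproduced with the following Python sketch. It assumes the first hash function is k mod 5, the step size is 3 - (k mod 3), and the i-th probe is at (first hash + i * step) mod 5; these are read off the worked figures rather than stated explicitly, and the function name double_hash_insert is hypothetical.

```python
def double_hash_insert(table: list, key: int) -> int:
    """Insert key using double hashing and return the slot index used."""
    size = len(table)
    home = key % size                  # first hash: initial slot
    step = 3 - (key % 3)               # second hash: step size (1, 2 or 3)
    for i in range(size):
        idx = (home + i * step) % size
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("no free slot found")


table = [None] * 5
for key in [4, 9, 14, 1, 19]:
    double_hash_insert(table, key)
print(table)  # [14, 1, 9, 19, 4]
```

Key 19 needs two extra probes: slot 4 is taken by 4, slot (4 + 2) mod 5 = 1 is taken by 1, and slot (4 + 4) mod 5 = 3 is free.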
Separate Chaining:
This method is implemented using the linked list data structure. As a result,
when numerous elements are hashed into the same slot index, those
Let's use "key mod 7" as our simple hash function with the following key values: 50, 700, 76, 85, 92, 73, 101.
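A minimal Python sketch of this example follows. Python lists stand in for the linked lists, and new keys are appended at the end of each chain; both choices are assumptions made for illustration, as is the function name chain_insert.

```python
def chain_insert(buckets: list, key: int) -> None:
    """Append key to the chain of the bucket selected by key mod (number of buckets)."""
    buckets[key % len(buckets)].append(key)


buckets = [[] for _ in range(7)]       # one empty chain per slot
for key in [50, 700, 76, 85, 92, 73, 101]:
    chain_insert(buckets, key)

for index, chain in enumerate(buckets):
    print(index, chain)
# 0 [700]
# 1 [50, 85, 92]
# 2 []
# 3 [73, 101]
# 4 []
# 5 []
# 6 [76]
```

Keys 50, 85 and 92 all hash to slot 1, and keys 73 and 101 both hash to slot 3, so they share a chain instead of displacing each other.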
Estimation of overflows
N*P(x) is therefore an estimate of the number of boxes that have been chosen x times
during the insertion of r data into the table. The total number of overflow data
is then estimated at: N*P(2) + 2*N*P(3) + 3*N*P(4) + 4*N*P(5) + ...
When inserting 1000 data into a table of 1000 boxes (density = 1), we estimate that:
N*P(0) = 368 boxes will not receive any data
N*P(1) = 368 boxes will have been chosen only once
N*P(2) = 184 boxes will have been chosen twice
N*P(3) = 61 boxes will have been chosen 3 times
N*P(4) = 15 boxes will have been chosen 4 times
N*P(5) = 3 boxes will have been chosen 5 times
N*P(6) = 0 boxes will have been chosen 6 times
The number of overflow data is close to: 184 + 2*61 + 3*15 + 4*3 = 363, or about 36% of the data.
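These figures can be checked with a short Python sketch. It assumes that P(x) is the Poisson probability with mean equal to the density r/N, which is not stated on this slide but reproduces the counts for x = 0 to 5; the function name poisson is chosen for the example.

```python
import math


def poisson(x: int, density: float) -> float:
    """Assumed form of P(x): Poisson probability with mean = density = r / N."""
    return density ** x * math.exp(-density) / math.factorial(x)


N = 1000                               # number of boxes
r = 1000                               # number of data inserted
density = r / N

# Estimated number of boxes chosen x times, for x = 0..5, rounded to whole boxes.
counts = [round(N * poisson(x, density)) for x in range(6)]
print(counts)  # [368, 368, 184, 61, 15, 3]

# A box chosen x times contributes x - 1 overflow data.
overflow = sum((x - 1) * counts[x] for x in range(2, 6))
print(overflow)  # 363
```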