
Hashing Methods (1)

The document presents an overview of hashing techniques used in data structures, focusing on hash functions, collision resolution methods, and their characteristics. It covers various hashing methods such as the Division Method, Mid Square Method, Digit Folding Method, Linear Probing, Double Hashing, and Separate Chaining, along with examples and their pros and cons. Additionally, it discusses the estimation of overflows in hash tables based on the density of data insertion.

Uploaded by aymen Beskri
Copyright © All Rights Reserved

FILE ORGANIZATION

Hashing
Presented by Pr. Nabil KESKES

year 2023-2024.

PLAN

Introduction
Hash Functions
Collision
Conclusion

1. Introduction
A hash table uses an array as its storage medium and uses a hashing technique to generate the index at which an element is to be inserted or from which it is to be retrieved.

Hashing is a technique that converts a range of key values into a range of array indexes.

2. Types of Hash Functions in Data Structures
A hash function maps a large number or a string to a small integer that can be used as an index in the hash table.
A good hash function should have the following characteristics:

It should be deterministic: a given input must always produce the same output.

It should be fast to compute.

It should spread keys uniformly over the table, so that its output is hard to predict from the input.

It should minimize collisions (they cannot be avoided entirely when there are more possible keys than table slots).
1. Division Method:
This is the simplest method to generate a hash value. The hash function divides the key k by M and uses the remainder as the hash value.

Formula:

h(k) = k mod M

Here,
k is the key value, and
M is the size of the hash table.

Example:

k = 12345, M = 95
h(12345) = 12345 mod 95 = 90   (since 12345 = 95 × 129 + 90)

k = 1276, M = 11
h(1276) = 1276 mod 11 = 0   (since 1276 = 11 × 116)
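A minimal Python sketch of the division method, checked against the two examples above (the name division_hash is hypothetical):

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod M, where M is the table size."""
    if m <= 0:
        raise ValueError("table size M must be positive")
    return k % m  # the remainder is the slot index, in the range 0 .. M-1


print(division_hash(12345, 95))
print(division_hash(1276, 11))
```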
Pros:
It can be used with any table size M.
It is very fast, since it requires only a single division operation.
Cons:
Consecutive keys map to consecutive hash values, which can cause clustering and poor performance.
The value of M must be chosen with care: for example, if M is a power of 2, h(k) depends only on the low-order bits of k. A prime not too close to a power of 2 is a common choice.
2. Mid Square Method

Computing this hash involves two steps:

Square the key k (compute k × k).

Take the middle r digits of the result as the hash value.

Formula: h(k) = middle r digits of (k × k)

(where k is the key value)

Example:

Suppose the hash table has 100 memory locations. Then r = 2, because two digits are required to map the key to a memory location (00 to 99).

k = 60
k × k = 60 × 60 = 3600
h(60) = 60, the middle two digits of 3600.
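A minimal Python sketch of the mid-square method (mid_square_hash is a hypothetical name). The slides do not say which digits count as "the middle" when they cannot be split evenly; this sketch assumes that in that case one more digit is dropped on the right than on the left, and it left-pads short squares with zeros:

```python
def mid_square_hash(k: int, r: int) -> int:
    """Mid-square method: square the key and keep the middle r digits."""
    square = str(k * k).zfill(r)  # pad with zeros so at least r digits exist
    # Assumption: when the surplus digits are odd in number, the extra one
    # is dropped on the right (integer division rounds the start down).
    start = (len(square) - r) // 2
    return int(square[start:start + r])


print(mid_square_hash(60, 2))     # 3600 -> middle digits "60"
print(mid_square_hash(12345, 2))  # 152399025 -> middle digits "39"
```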

3. Digit Folding Method:
This method involves two steps:

Divide the key k into parts k1, k2, k3, ..., kn, where every part has the same number of digits except the last, which may have fewer digits than the others.

Add the parts together. The hash value is this sum, ignoring the final carry if there is one.

Formula:
k = k1, k2, k3, k4, ..., kn
s = k1 + k2 + k3 + k4 + ... + kn
h(k) = s
Here,
s is obtained by adding the parts of the key k.

Example:
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(k) = 51
Note:

The number of digits in each part depends on the size of the hash table.

For example, if the hash table has 100 slots, each part must have two digits, except the last part, which may have fewer.
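A minimal Python sketch of digit folding (fold_hash and part_len are hypothetical names). It assumes the key is split from the left, and it interprets "ignoring the final carry" as keeping only the low part_len digits of the sum, which keeps the result inside a table of 10^part_len slots:

```python
def fold_hash(k: int, part_len: int) -> int:
    """Digit folding: split the key's digits into parts of part_len digits
    (the last part may be shorter), add the parts, drop the final carry."""
    digits = str(k)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    total = sum(parts)
    # Dropping the carry = keeping only the lowest part_len digits of the sum.
    return total % (10 ** part_len)


print(fold_hash(12345, 2))   # 12 + 34 + 5 = 51
print(fold_hash(987654, 2))  # 98 + 76 + 54 = 228, carry dropped -> 28
```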

Hash Collision

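A hash collision happens when a hash function produces the same hash value for two different input values. Collisions are not a defect in themselves: whenever there are more possible keys than table slots, some keys must share a hash value. They are a fundamental aspect of hashing, so a hash table needs a collision resolution method such as linear probing, double hashing or separate chaining.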

Linear Probing

Linear probing resolves a collision by searching the table sequentially, starting from the slot given by the hash function. If that slot is already occupied, the next slot is examined, and so on, wrapping around to the beginning of the table after the last slot. In linear probing the interval between probes is fixed, usually to 1.

The initial index is computed with the hash function, for example: index = key % hashTableSize

Let hash(n) be the index computed by the hash function for key n, and T the table size. If slot hash(n) % T is full, the next slot tried is (hash(n) + 1) % T.
The probe sequence is:
index = hash(n) % T
(hash(n) + 1) % T
(hash(n) + 2) % T
(hash(n) + 3) % T ... and so on.
Example -

For a hash table with Table Size = 20

Keys = 3, 2, 46, 6, 11, 13, 53, 12, 70, 90

No.   Key   Hash    Index   Index after linear probing
1     3     3%20    3       3
2     2     2%20    2       2
3     46    46%20   6       6
4     6     6%20    6       7
5     11    11%20   11      11
6     13    13%20   13      13
7     53    53%20   13      14
8     12    12%20   12      12
9     70    70%20   10      10
10    90    90%20   10      15
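A minimal Python sketch of insertion with linear probing, assuming integer keys, the division hash key % T, and None as the marker for an empty slot (linear_probe_insert is a hypothetical name). Only insertion is shown; deletion needs extra care in open addressing.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key with h(k) = k mod T and linear probing; return the slot used."""
    size = len(table)
    home = key % size  # initial index: hash(n) % T
    for i in range(size):
        index = (home + i) % size  # (hash(n) + i) % T, wrapping around
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 20
for key in [3, 2, 46, 6, 11, 13, 53, 12, 70, 90]:
    print(key, "->", linear_probe_insert(table, key))
```

Running it on the keys of the example reproduces the last column of the table: 90 starts at slot 10 and walks past the occupied slots 10 to 14 before landing in slot 15.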
Double Hashing

Double hashing is a collision resolution technique used in hash tables. It uses two hash functions to compute two different values for a given key: the first hash function gives the initial slot, and the second gives the step size of the probing sequence.

On the i-th probe (i = 0, 1, 2, ...), double hashing examines the slot:

(hash1(key) + i * hash2(key)) % TABLE_SIZE

Here hash1() and hash2() are hash functions and TABLE_SIZE is the size of the hash table. hash2(key) must never be 0, otherwise every probe would return to the same slot.

Example -

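Insert keys: 4, 9, 14, 1, 19

Table size = 5, h1(x) = x mod 5, h2(x) = 3 - (x mod 3)

4: h1(4) = 4 mod 5 = 4 -> slot 4
9: h1(9) = 9 mod 5 = 4 (occupied); h2(9) = 3 - (9 mod 3) = 3; (4 + 1·3) mod 5 = 2 -> slot 2
14: h1(14) = 14 mod 5 = 4 (occupied); h2(14) = 3 - (14 mod 3) = 1; (4 + 1·1) mod 5 = 0 -> slot 0
1: h1(1) = 1 mod 5 = 1 -> slot 1
19: h1(19) = 19 mod 5 = 4 (occupied); h2(19) = 3 - (19 mod 3) = 2; (4 + 1·2) mod 5 = 1 (occupied); (4 + 2·2) mod 5 = 3 -> slot 3

Final table:
Slot:  0    1    2    3    4
Key:   14   1    9    19   4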
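A minimal Python sketch of insertion with double hashing, using the two hash functions of the example and None as the empty-slot marker (double_hash_insert is a hypothetical name):

```python
def h1(x: int) -> int:
    return x % 5  # initial slot


def h2(x: int) -> int:
    return 3 - (x % 3)  # step size: always 1, 2 or 3, never 0


def double_hash_insert(table: list, key: int, hash1, hash2) -> int:
    """Probe (hash1(key) + i * hash2(key)) % size for i = 0, 1, 2, ..."""
    size = len(table)
    for i in range(size):
        index = (hash1(key) + i * hash2(key)) % size
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free slot found on the probe sequence")


table = [None] * 5
for key in [4, 9, 14, 1, 19]:
    double_hash_insert(table, key, h1, h2)
print(table)
```

Because the table size 5 is prime and the step is between 1 and 3, each probe sequence visits every slot before repeating, so the loop bound of size probes is enough to find any free slot.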

Separate Chaining:

This method is implemented with linked lists: each slot of the table holds a chain, which is a singly linked list. When several elements are hashed into the same slot index, they are all added to that slot's chain.

Let's use "key mod 7" as our simple hash function with the following key values: 50, 700, 76, 85, 92, 73, 101.
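A minimal Python sketch of separate chaining with h(k) = key mod 7, assuming Python lists stand in for the singly linked chains and each new key is appended at the end of its chain (chain_insert and chain_search are hypothetical names):

```python
def chain_insert(table: list, key: int) -> None:
    """Append key to the chain of slot key mod len(table)."""
    table[key % len(table)].append(key)


def chain_search(table: list, key: int) -> bool:
    """Only the chain of the key's own slot needs to be scanned."""
    return key in table[key % len(table)]


table = [[] for _ in range(7)]
for key in [50, 700, 76, 85, 92, 73, 101]:
    chain_insert(table, key)

for slot, chain in enumerate(table):
    print(slot, chain)
```

With these keys, slot 0 holds [700], slot 1 holds [50, 85, 92], slot 3 holds [73, 101], slot 6 holds [76], and slots 2, 4 and 5 stay empty.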

Estimation of overflows

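Consider a table of N slots (boxes) into which we want to insert r records.

The filling percentage (density) is therefore: d = r / N

Let P(x) be the probability that exactly x of the r records are hashed to the same given box. If each record lands in any of the N boxes with probability 1/N, this is a binomial probability:

$$P(x) = \binom{r}{x} \left(1 - \frac{1}{N}\right)^{r-x} \left(\frac{1}{N}\right)^{x}$$

Assuming a uniform hash function, the Poisson distribution is a good approximation:

$$P(x) = \frac{d^{x} e^{-d}}{x!} \qquad \text{with } d = r/N$$

N·P(x) is therefore an estimate of the number of boxes chosen x times during the insertion of the r records into the table. A box chosen x times keeps one record at its primary address and sends the other x - 1 to overflow, so the total number of overflow records is estimated at:

$$N P(2) + 2 N P(3) + 3 N P(4) + 4 N P(5) + \dots = \sum_{x \ge 2} (x-1)\, N P(x)$$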
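Under the same Poisson approximation the series can be summed in closed form. This worked step uses only two facts about the Poisson distribution with parameter d: its mean is d, and P(0) = e^{-d}; the x = 1 term of (x - 1)P(x) is zero, so the sum can start at x = 1:

$$\sum_{x \ge 2} (x-1)\, N P(x) = N\Big(\sum_{x \ge 1} x\,P(x) - \sum_{x \ge 1} P(x)\Big) = N\big(d - (1 - e^{-d})\big) = r - N\big(1 - e^{-d}\big)$$

Dividing by r gives the fraction of overflowing records, 1 - (1 - e^{-d})/d: about 0.368 for d = 1 and about 0.213 for d = 0.5.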

When inserting 1000 records into a table of 1000 boxes (density d = 1), we estimate that:

N·P(0) = 368 boxes will not receive any record
N·P(1) = 368 boxes will be chosen exactly once
N·P(2) = 184 boxes will be chosen twice
N·P(3) = 61 boxes will be chosen 3 times
N·P(4) = 15 boxes will be chosen 4 times
N·P(5) = 3 boxes will be chosen 5 times
N·P(6) ≈ 0 boxes will be chosen 6 times (fewer than one)

The number of overflow records is close to 184 + 2·61 + 3·15 + 4·3 = 363, or about 36% of the data, against 631 records stored at their primary addresses (368 + 184 + 61 + 15 + 3 = 631).

For a density d = 0.5 (e.g. r = 500 and N = 1000), about 21% of the data would overflow.
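These figures can be reproduced numerically. A minimal Python sketch, assuming the Poisson approximation and truncating the infinite sum at a hypothetical cutoff x_max = 50, far beyond the point where the terms become negligible (poisson and overflow_estimate are hypothetical names):

```python
import math


def poisson(x: int, d: float) -> float:
    """Poisson approximation P(x) = d^x e^(-d) / x!."""
    return d ** x * math.exp(-d) / math.factorial(x)


def overflow_estimate(r: int, n: int, x_max: int = 50) -> float:
    """Estimated overflow records: sum over x >= 2 of (x - 1) * N * P(x)."""
    d = r / n
    return sum((x - 1) * n * poisson(x, d) for x in range(2, x_max + 1))


for x in range(7):
    print(f"N.P({x}) = {1000 * poisson(x, 1.0):.1f}")

print(overflow_estimate(1000, 1000) / 1000)  # overflow fraction for d = 1
print(overflow_estimate(500, 1000) / 500)    # overflow fraction for d = 0.5
```

Summing all terms gives about 368 overflow records for d = 1 (36.8%), slightly more than the 363 obtained above by stopping at P(5) with rounded terms; for d = 0.5 it gives about 21.3%.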
