
Unit 3 2nd Half 2024

The document provides an overview of hashing, focusing on hash tables and hash functions, which are used for efficient data storage and retrieval. It discusses various types of hash functions, collision resolution strategies such as separate chaining and open addressing, and their respective advantages and disadvantages. Additionally, it includes examples of inserting keys into hash tables using different collision resolution techniques.

Uploaded by nctitacademic
Copyright © All Rights Reserved

Krishna / Unit 5 / Data Structures 1

Hashing:
• Hashing is a technique used to store, retrieve and find data in a data structure called a hash table.
• It overcomes the drawbacks of linear search (which needs many comparisons) and binary search (which needs the list in sorted order).
It involves two important concepts:
1. Hash Table 2. Hash Function
1. Hash table
• A hash table is a data structure used to store and retrieve data (keys) very quickly.
• It is an array of some fixed size containing the keys. Its slots are indexed from 0 to Tablesize – 1.
• Each key is mapped to some number in the range 0 to Tablesize – 1. This mapping is called the hash function.
• A key is inserted at the position given by the value the hash function computes for it.
• Using the same hash key value, the data can be retrieved from the hash table with only a few key comparisons.
• The load factor of a hash table is calculated using the formula:
  Load factor = (Number of data elements in the hash table) / (Size of the hash table)
• Factors affecting hash table design:
1. Hash function 2. Table size 3. Collision handling scheme
[Figure: Simple hash table with table size = 10, slots 0 to 9]
2. Hash function:
• It is a function that distributes the keys evenly among the cells of the hash table.
• The same hash function is used to retrieve data from the hash table.
• The hash function is used to implement the hash table.
• The integer value returned by the hash function is called the hash key.
• If the input keys are integers, the commonly used hash function is
H(key) = key % Tablesize

Types of Hash Functions:


1. Division Method
2. Mid Square Method
3. Multiplicative Hash Function
4. Extraction
1. Division Method:
• It uses the remainder of a division.
• The divisor is the table size.
• Formula: H(key) = key % Tablesize
• E.g. consider the keys 36, 18, 72, 43, 6 with table size = 8: H(36) = 4, H(18) = 2, H(72) = 0, H(43) = 3 and H(6) = 6.

2. Mid Square Method:
• First square the key.
• Then extract some portion (usually the middle digits) of the result.

For example, if the key is 44, we first compute 44² = 1,936, then extract the middle two digits, 93. The key 44 is stored at index 93.

3. Multiplicative Hash Function:
• The key is multiplied by a constant value.
• The hash function is given by H(key) = floor(P × (key × A))
o where P = integer constant [e.g. P = 50]
o A = constant real number, A ≈ 0.6180339887 (that is, (√5 − 1)/2), the constant suggested by Donald Knuth
• Example: key 107
H(107) = floor(50 × (107 × 0.6180339887))
= floor(3306.4818...)
= 3306
• Consider a table of size 5000 (slots 0 to 4999): key 107 is stored at slot 3306.

4. Extraction:
In this method, some digits are extracted from the key to form the address in the hash table.
Example: suppose the first, third and fourth digits from the left of the key 497824 are selected for the hash key.
4 9 7 8 2 4 → digits 1, 3 and 4 give 478
The key can be stored at location 478 in a hash table of size 1000.

_______________________________________________________________________________

Collision:
• If two or more keys hash to the same index, their records cannot all be stored at that location.
• This condition is known as a collision.
Characteristics of a Good Hash Function:
• It should be simple to compute.
• It should produce few collisions when placing records in the hash table.
• A hash function that produces no collisions is called a perfect hash function.
• It should distribute the keys uniformly across the hash table.
• It should depend on every bit of the key. Thus a hash function that simply extracts a portion of the key is not suitable.

Collision Resolution Strategies / Techniques (CRT):
• When a collision occurs, it must be handled by applying some technique. Such a technique is called a CRT.
• There are a number of collision resolution techniques, but the most popular are:

1. Separate chaining (Open Hashing)
2. Open addressing (Closed Hashing)
• Linear Probing
• Quadratic Probing
• Double Hashing

1. Separate chaining (Open Hashing):
• It is an open hashing technique.
• It is implemented using the singly linked list concept.
• A pointer (ptr) field is added to each record.
• When a collision occurs, a separate chain (linked list) is maintained for the colliding data. New elements are inserted at the front (or the end) of the list.
• H(key) = key % Tablesize
• Two operations are provided:
o Insert
o Find

Structure Definition for Node


typedef struct node *Position;

/* Defines a node of a chain */
struct node
{
    int data;
    Position next;
};

Structure Definition for Hash Table
typedef Position List;

/* Defines the hash table, which contains an array of linked lists */
struct Hashtbl
{
    int Tablesize;
    List *theLists;
};

If two keys map to the same value, the elements are chained together.
Initial configuration of the hash table with separate chaining.
Here we use the SLL (Singly Linked List) concept to chain the elements.

Example:
Insert the keys 10, 11, 81, 10, 7, 34, 94, 17 into a hash table of size 10 using separate chaining.
The hash function is H(key) = key % 10

Insertion:
• To insert an element, traverse the appropriate list to check whether the element is already present.
• If the element is new, it is inserted either at the front of the list or at the end of the list.
• If it is a duplicate, it is not inserted again; instead an extra count field is kept in the node and incremented.
Insert 10:
Hash(k) = k % Tablesize
Hash(10) = 10 % 10
Hash(10) = 0
Insert 11:
Hash(11) = 11 % 10
Hash(11) = 1
Insert 81:
Hash(81) = 81 % 10
Hash(81) = 1
• The element 81 collides with 11 at hash value 1. To place 81 at this position, do the following:
• Traverse the list to check whether 81 is already present.
• Since it is not, insert it at the end of the list. The remaining elements are inserted the same way: the second 10 is a duplicate, 34 and 94 are chained at slot 4, and 7 and 17 are chained at slot 7.
Advantages:
1. More elements can be inserted than there are slots, because each slot of the array holds a linked list.
Disadvantages:
1. It requires extra pointers, which occupy more memory space.
2. Search takes longer, since it requires evaluating the hash function and then traversing the list.

2. Open Addressing:
• It is also called closed hashing.
• It is a collision resolution technique.
• It uses Hi(X) = (Hash(X) + F(i)) mod Tablesize
• When a collision occurs, alternative cells are tried until an empty cell is found.
Types:
1. Linear Probing
2. Quadratic Probing
3. Double Hashing
Hash function: H(key) = key % Tablesize.

Insert Operation:
• To insert a key, use the hash function to find its home cell H0(key).
• Probe the cells H0(key), H1(key), H2(key), ... in order.
• If a probed cell already holds the key, it is a duplicate (a count can be incremented).
• Otherwise, the new key is placed in the first empty cell found.
1. Linear Probing:
• It is the easiest method to handle collisions.
• Apply the hash function H(key) = key % Tablesize
• Hi(X) = (Hash(X) + F(i)) mod Tablesize, where F(i) = i.
How Probing Works:
• First probe: given a key k, hash it to H(key).
• If that cell is occupied, try (H(key) + f(1)) % Tablesize; if that is occupied too, try (H(key) + f(2)) % Tablesize, and so forth.
Probing Properties:
• We force f(0) = 0.
• The ith probe is to (H(key) + f(i)) % Tablesize.
• If i reaches Tablesize – 1 without finding an empty cell, the probe has failed. Depending on f(i), the probe may fail sooner. Long probe sequences are costly.
Probe Sequence is:
• H(key) % Tablesize
• (H(key) + 1) % Tablesize
• (H(key) + 2) % Tablesize
1. H(key) = key mod Tablesize
• This is the common formula that you should apply for any hashing.
• If a collision occurs, use Formula 2.
2. Hi(key) = (H(key) + i) mod Tablesize
• where i = 1, 2, 3, ... etc.
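Example 1: 89 18 49 58 69; Tablesize = 10
1. H(89) = 89 % 10 = 9
2. H(18) = 18 % 10 = 8
3. H(49) = 49 % 10 = 9 (collides with 89, so try the next free cell using Formula 2)
i = 1: H1(49) = (H(49) + 1) % 10
= (9 + 1) % 10
= 10 % 10
= 0
4. H(58) = 58 % 10 = 8 (collides with 18)
i = 1: H1(58) = (H(58) + 1) % 10
= (8 + 1) % 10
= 9 => again a collision
i = 2: H2(58) = (H(58) + 2) % 10
= (8 + 2) % 10
= 10 % 10
= 0 => again a collision
i = 3: H3(58) = (H(58) + 3) % 10
= 11 % 10
= 1 => empty, so 58 is stored at 1
5. H(69) = 69 % 10 = 9 (collides with 89); cells 0 and 1 are also occupied, so i = 3 gives (9 + 3) % 10 = 2.

Table contents after each insertion (- means empty):
Slot | Empty | 89 | 18 | 49 | 58 | 69
  0  |   -   |  - |  - | 49 | 49 | 49
  1  |   -   |  - |  - |  - | 58 | 58
  2  |   -   |  - |  - |  - |  - | 69
  3  |   -   |  - |  - |  - |  - |  -
  4  |   -   |  - |  - |  - |  - |  -
  5  |   -   |  - |  - |  - |  - |  -
  6  |   -   |  - |  - |  - |  - |  -
  7  |   -   |  - |  - |  - |  - |  -
  8  |   -   |  - | 18 | 18 | 18 | 18
  9  |   -   | 89 | 89 | 89 | 89 | 89
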
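Example 2: 76 93 40 47 10 55; Tablesize = 7
76 % 7 = 6, 93 % 7 = 2, 40 % 7 = 5, 47 % 7 = 5 (collides with 40; slot 6 is also taken, so 47 goes to 0), 10 % 7 = 3, 55 % 7 = 6 (collides with 76; slot 0 is also taken, so 55 goes to 1)

Table contents after each insertion (- means empty):
Slot | Empty | 76 | 93 | 40 | 47 | 10 | 55
  0  |   -   |  - |  - |  - | 47 | 47 | 47
  1  |   -   |  - |  - |  - |  - |  - | 55
  2  |   -   |  - | 93 | 93 | 93 | 93 | 93
  3  |   -   |  - |  - |  - |  - | 10 | 10
  4  |   -   |  - |  - |  - |  - |  - |  -
  5  |   -   |  - |  - | 40 | 40 | 40 | 40
  6  |   -   | 76 | 76 | 76 | 76 | 76 | 76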

2. Quadratic Probing
• Quadratic probing can be used to resolve the primary clustering problem of linear probing. Rather than always moving one cell, it moves i² cells from the point of collision, where i is the number of attempts made to resolve the collision.
• It is another collision resolution method, and it distributes items more evenly.
• From the original index H, if the slot is filled, try cells H + 1², H + 2², H + 3², ..., H + i², with wrap-around.
• Hi(X) = (Hash(X) + F(i)) mod Tablesize, F(i) = i²
• Hi(X) = (Hash(X) + i²) mod Tablesize
Example: Insert 18, 89, 21, 58, 68

Limitation: when the table size is prime, quadratic probing is guaranteed to find an empty cell only while the table is at most half full; at most half of the table can be used as alternative locations to resolve collisions.
• This means that once the table is more than half full, it can be difficult (or impossible) to find an empty spot.
• Quadratic probing also suffers from secondary clustering: elements that hash to the same hash key always probe the same alternative cells.

3. Double Hashing
• Double hashing applies a second hash function to the key when a collision occurs. The result of the second hash function is the number of positions to move from the point of collision.
• There are a couple of requirements for the second function:
• It must never evaluate to 0.
• It must make sure that all cells can be probed.
Hi(X) = (Hash(X) + i * Hash2(X)) mod Tablesize

• A popular second hash function is Hash2(key) = R - (key % R), where R is a prime number smaller than the size of the table.

Hashing Problems solutions


Example: 1: Given the values {2341, 4234, 2839, 430, 22, 397, 3920}, a hash table of size 7, and hash
function h(x) = x mod 7, show the resulting tables after inserting the values in the given order with each
of these collision strategies.
{2341, 4234, 2839, 430, 22, 397, 3920}
h(x) = x mod 7
2341 % 7 = 3
4234 % 7 = 6
2839 % 7 = 4
430 % 7 = 3
22 % 7 = 1
397 % 7 = 5
3920 % 7 = 0
1. Separate chaining
0 [3920] 1 [22] 2 [ ] 3 [2341, 430] 4 [2839] 5 [397] 6 [4234]
2. Linear probing
0 [397] 1 [22] 2 [3920] 3 [2341] 4 [2839] 5 [430] 6 [4234]
3. Quadratic probing
0 [430] 1 [22] 2 [3920] 3 [2341] 4 [2839] 5 [397] 6 [4234]
430 collides at 3:
next: 3 + 1 = 4 (occupied)
next: 3 + 4 = 7 % 7 = 0 (empty)
3920 collides at 0:
next: 0 + 1 = 1 (occupied)
next: 0 + 4 = 4 (occupied)
next: 0 + 9 = 9 % 7 = 2 (empty)
4. Double hashing with second hash function h'(x) = (2x - 1) mod 7
0 [3920] 1 [430] 2 [22] 3 [2341] 4 [2839] 5 [397] 6 [4234]
430: h(430) = 3 (occupied), h'(430) = (2*430 - 1) % 7 = 5
3 + 1*5 = 8 % 7 = 1 (empty)
22: h(22) = 1 (occupied by 430), h'(22) = (2*22 - 1) % 7 = 1
1 + 1*1 = 2 (empty)

Example 2: Given input {4371, 1323, 6173, 4199, 4344, 9679, 1989} and a hash function h(x) = x % 10, show the resulting:
(a) separate chaining hash table (Note: colliding keys are added to the end of the list.)
0 /
1 --> 4371
2 /
3 --> 1323 --> 6173
4 --> 4344
5 /
6 /
7 /
8 /
9 --> 4199 --> 9679 --> 1989
(b). closed hash table using linear probing
0 9679 x % 10 = 9 insert(9+1)
1 4371 x % 10 = 1
2 1989 x % 10 = 9 (collision at 9+1, 9+2) insert(9+3)
3 1323 x % 10 = 3
4 6173 x % 10 = 3 insert(3+1)
5 4344 x % 10 = 4 insert(4+1)
6 /
7 /
8 /
9 4199 x % 10 = 9
(c). closed hash table using quadratic probing
0 9679 x % 10 = 9 insert(9+1)
1 4371 x % 10 = 1
2 /
3 1323 x % 10 = 3
4 6173 x % 10 = 3 insert(3+1)
5 4344 x % 10 = 4 insert(4+1)
6 /
7 /
8 1989 x % 10 = 9 (collision at 9+1, 9+4) insert(9+9)
9 4199 x % 10 = 9

Example 3: Given input {43, 160, 61, 44, 67, 94, 37} and hash functions
h1(x) = x/10 % 10 (where x/10 is integer division) and h2(x) = 7 - (x % 7), use double hashing with a table of size 10 to show the resulting hash table.
0 /
1 /
2 67 [h1(67) = 67/10 % 10 = 6] [h2(67) = 7-(67%7) = 7-4 = 3]
Try: [6 + 1*3 = 9]
[6 + 2*3 = 12 ==> 2]
3 94 [h1(94) = 94/10 % 10 = 9] [h2(94) = 7-(94%7) = 7-3 = 4]
Try: [9 + 1*4 = 13 ==> 3]
4 43 [h1(43) = 43/10 % 10 = 4]
5 /
6 160 [h1(160) = 160/10 % 10 = 16 ==> 6]
7 /
8 61 [h1(61) = 61/10 % 10 = 6] [h2(61) = 7-(61%7) = 7-5 = 2]
Try: [6 + 1*2 = 8]
9 44 [h1(44) = 44/10 % 10 = 4] [h2(44) = 7-(44%7) = 7-2 = 5]
Try: [4 + 1*5 = 9]
Problem with 37:
[h1(37) = 37/10 % 10 = 3] [h2(37) = 7-(37%7) = 7-2 = 5]
Try: [3+1*5= 8]
[3+2*5=13==>3]
[3+3*5=18==>8]
[3+4*5=23==>3]
[3+5*5=28==>8]
[3+6*5=33==>3]
...
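The probe sequence cycles between 3 and 8, both occupied, so 37 can never be inserted even though cells 0, 1, 5 and 7 are empty. This shows the problem with a non-prime table size: the step h2(37) = 5 shares the factor 5 with the table size 10, so only 10 / 5 = 2 distinct cells are ever probed. With a prime table size, every nonzero step smaller than the table size reaches all cells.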

Rehashing:
• Once the hash table gets too full, operations start to take too long, and insertions may fail.
• To solve this problem, a new table at least twice the size of the original is built and the elements are transferred to the new table.
The question becomes: when should rehashing be applied? With quadratic probing, there are several options:
• Rehash as soon as the table is half full.
• Rehash only when an insertion fails.
• Rehash when the table reaches a certain load factor.

Advantages:
• The programmer does not need to worry about the table size.
• It is simple to implement.
• It can be used in other data structures as well.

The new size of the hash table:
• It should be prime, as well as at least twice the old size.
• Because the table size changes, each element's position is recalculated with the new hash function (hence the name rehashing).
• This is a very expensive operation: O(N), since there are N elements to rehash and the new table size is roughly 2N. This is acceptable because rehashing happens infrequently.

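Routine:
/* Relies on HashTable, cell, Legitimate, InitializeTable and Insert
   being declared elsewhere; free() needs <stdlib.h> */
HashTable Rehash(HashTable H)
{
    int i, oldsize;
    cell *oldcells;

    oldcells = H->Thecells;     /* remember the old cell array */
    oldsize = H->Tablesize;

    /* Get a new, empty table about twice as large */
    H = InitializeTable(2 * oldsize);

    /* Scan the old table and reinsert every occupied (legitimate) cell */
    for (i = 0; i < oldsize; i++)
        if (oldcells[i].Info == Legitimate)
            Insert(oldcells[i].Element, H);

    free(oldcells);
    return H;
}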

Example:
Suppose the elements 13, 15, 6 and 24 are inserted into an open addressing hash table of size 7 with hash function h(X) = X mod 7, and linear probing is used to resolve collisions. The resulting hash table is as follows:

13 % 7 = 6, 15 % 7 = 1, 6 % 7 = 6 (collides with 13, so linear probing places 6 in slot 0) and 24 % 7 = 3

0 6
1 15
2
3 24
4
5
6 13

In the above figure: open addressing hash table with linear probing, input 13, 15, 6, 24.
If 23 is inserted into the table, the resulting table will be over 70 percent full (5 of 7 cells). 23 % 7 = 2

0 6
1 15
2 23
3 24
4
5
6 13

A new table is created, as the table is so full.
The size of this table is 17, as this is the first prime that is at least twice as large as the old table size (2 × 7 = 14).
The new hash function is h(X) = X mod 17.
The old table is scanned and the elements 6, 15, 23, 24 and 13 are inserted into the new table.
