DSA Unit 1 notes_part 2
Step 2 – Check Global Depth of the directory. Suppose the global depth of
the Hash-directory is 3.
Step 7 – Handling the Overflow Condition during Data Insertion: While
inserting data into a bucket, the bucket may overflow. In such cases, we need
to follow an appropriate procedure to avoid mishandling of data.
First, compare the local depth of the overflowing bucket with the global
depth, then apply one of the cases below.
Case 1: If the local depth of the overflowing bucket is equal to the
global depth, then Directory Expansion as well as a Bucket Split needs
to be performed. Increment both the global depth and the local depth
by 1, and assign the appropriate pointers.
Directory expansion doubles the number of directory entries present in
the hash structure.
Case 2: If the local depth is less than the global depth, then only a
Bucket Split takes place. Increment only the local depth by 1, and
assign the appropriate pointers.
Step 8 – Rehashing of Split Bucket Elements: The elements present in the
overflowing bucket that is split are rehashed w.r.t. the new, incremented
local depth of the bucket.
Step 9 – The element is successfully hashed.
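The whole procedure (Steps 2–9) can be sketched in Python. This is a minimal illustrative sketch, not a production implementation; the class name `ExtendibleHash` and the `[local_depth, keys]` bucket representation are assumptions, not from the notes:

```python
class ExtendibleHash:
    """Sketch of extendible hashing; a bucket is [local_depth, list_of_keys]."""

    def __init__(self, bucket_size=3):
        self.bucket_size = bucket_size
        self.global_depth = 1
        # Initially one bucket per directory entry (directory ids 0 and 1).
        self.dirs = [[1, []], [1, []]]

    def _hash(self, key):
        # Directory id = the global_depth least significant bits of the key.
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.dirs[self._hash(key)]
        if len(bucket[1]) < self.bucket_size:
            bucket[1].append(key)          # Step 9: hashed successfully
            return
        # Step 7: overflow handling.
        if bucket[0] == self.global_depth:
            # Case 1: directory expansion (doubling); new entries alias old buckets.
            self.dirs = self.dirs + self.dirs
            self.global_depth += 1
        # Bucket split (part of both cases): local depth goes up by 1.
        local = bucket[0] + 1
        zero, one = [local, []], [local, []]
        # Repoint every directory entry that referenced the old bucket.
        for i, b in enumerate(self.dirs):
            if b is bucket:
                self.dirs[i] = one if (i >> (local - 1)) & 1 else zero
        # Step 8: rehash the split bucket's elements (plus the new key).
        for k in bucket[1] + [key]:
            self.insert(k)
```

Running the worked example below through this sketch reproduces the final table: a global depth of 3, with the keys ending in 1 still sharing a single local-depth-1 bucket.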
Solved Example:
Add following elements in hash table by applying extendible hashing mechanism
16,4,6,22,24,10,31,7,9,20,26.
Bucket Size: 3 (Assume)
Hash Function: Suppose the global depth is X. Then the hash function returns
the X least significant bits (LSBs) of the key.
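Taking the X least significant bits of a key is a simple bit-mask. A one-line sketch (the name `lsb_hash` is illustrative):

```python
def lsb_hash(key, x):
    """Return the x least significant bits of key, i.e. the directory id."""
    return key & ((1 << x) - 1)

# With global depth 1, 16 (10000) hashes to 0; with depth 2, 22 (10110)
# hashes to 10 (= 2); with depth 3, 26 (11010) hashes to 010 (= 2).
```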
Solution: First, calculate the binary forms of each of the given numbers.
16- 10000
4- 00100
6- 00110
22- 10110
24- 11000
10- 01010
31- 11111
7- 00111
9- 01001
20- 10100
26- 11010
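The binary column above can be reproduced with Python's `format`; the 5-bit zero padding is an assumption matching the table's width:

```python
keys = [16, 4, 6, 22, 24, 10, 31, 7, 9, 20, 26]
for k in keys:
    # '05b' formats the key as binary, zero-padded to 5 digits.
    print(f"{k}- {format(k, '05b')}")
```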
Initially, the global depth and local depth are always 1. Thus, the hashing frame
looks like this:
Inserting 16:
The binary format of 16 is 10000 and global-depth is 1. The hash function
returns 1 LSB of 10000 which is 0. Hence, 16 is mapped to the directory with
id=0.
Inserting 4 and 6:
Both 4 (100) and 6 (110) have 0 as their LSB. Hence, they are hashed as follows:
Inserting 22: The binary form of 22 is 10110. Its LSB is 0. The bucket pointed
to by directory 0 is already full. Hence, overflow occurs.
As directed by Step 7, Case 1, since Local Depth = Global Depth, the bucket
splits and directory expansion takes place. Rehashing of the numbers present
in the overflowing bucket takes place after the split. Since the global depth
is incremented by 1, the global depth is now 2. Hence, 16, 4, 6, 22 are
rehashed w.r.t. 2 LSBs [16 (10000), 4 (00100), 6 (00110), 22 (10110)].
*Notice that the bucket which did not overflow has remained untouched. But,
since the number of directory entries has doubled, we now have two directories,
01 and 11, pointing to the same bucket. This is because the local depth of that
bucket has remained 1, and any bucket with a local depth less than the global
depth is pointed to by more than one directory entry.
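This sharing rule can be checked directly: when a bucket's local depth is d and the global depth is g, all 2^(g-d) directory ids that agree on their d LSBs point to that bucket. A small sketch (the helper name `shared_dirs` is illustrative):

```python
def shared_dirs(bucket_bits, local_depth, global_depth):
    """All directory ids (as bit strings) whose local_depth LSBs equal bucket_bits."""
    mask = (1 << local_depth) - 1
    return [format(i, f'0{global_depth}b')
            for i in range(1 << global_depth)
            if i & mask == bucket_bits]

# The local-depth-1 bucket for keys ending in 1, under global depth 2:
print(shared_dirs(0b1, 1, 2))   # ['01', '11']
```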
Inserting 24 and 10: 24 (11000) and 10 (01010) are hashed to the buckets pointed
to by directories 00 and 10 respectively. Here, we encounter no overflow condition.
Inserting 31, 7, 9: All of these elements [31 (11111), 7 (00111), 9 (01001)] have
either 01 or 11 as their 2 LSBs. Hence, they are mapped to the single bucket
pointed to by both 01 and 11. We do not encounter any overflow condition here.
Inserting 20: Insertion of data element 20 (10100) will again cause the overflow
problem.
20 is inserted in the bucket pointed to by 00. As directed by Step 7, Case 1,
since the local depth of the bucket = global depth, directory expansion
(doubling) takes place along with a bucket split. The elements present in the
overflowing bucket are rehashed with the new global depth. Now, the new hash
table looks like this:
Inserting 26: The binary form of 26 is 11010, so it maps to the directory entry
010. The bucket overflows, and, as directed by Step 7, Case 2, since the local
depth of the bucket < global depth (2 < 3), the directory is not doubled; only
the bucket is split and its elements are rehashed.
Finally, the output of hashing the given list of numbers is obtained.
The hashing of all 11 numbers is thus complete.
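The final layout can be cross-checked by grouping the keys on their 3 LSBs. A standalone sketch, not tied to any particular implementation:

```python
from collections import defaultdict

keys = [16, 4, 6, 22, 24, 10, 31, 7, 9, 20, 26]
groups = defaultdict(list)
for k in keys:
    # Group each key by its 3 least significant bits (global depth 3).
    groups[format(k & 0b111, '03b')].append(k)

print(dict(groups))
# The even-ending keys fill four split buckets of two elements each;
# '111' -> [31, 7] and '001' -> [9] map to the SAME physical bucket in the
# final table, because that bucket's local depth is still 1.
```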
Key Observations:
1. A bucket will have more than one directory pointer pointing to it if its local
depth is less than the global depth.
2. When an overflow occurs in a bucket, all the entries in that bucket are
rehashed with respect to its new (incremented) local depth.
3. The size of a bucket cannot be changed after the data insertion process begins.
Limitations:
1. The directory size may increase significantly if several records hash to the
same directory entry, i.e., when the key distribution is non-uniform.
2. The size of every bucket is fixed.
3. Memory is wasted on directory pointers when the gap between the global depth
and the local depths becomes large.
4. This method is complicated to code.